Side projects and experiments: expanding the reach of page caching

Posted by Luke Francl
on Monday, September 29

One of the many benefits of side projects is that you get to try out new things. In my job I can’t screw around too much—I’ve got a site to run. But with side projects, I can play with new APIs and try out ideas. Lately, Twistr has been my playground.

Twistr. Twitter + Flickr = LOLs?

In the case of Twistr, shared hosting is the mother of invention. To give Twistr “teh snappy” on crappy shared hosting, I wanted to use page caching. Page caching is the simplest and fastest Rails caching mechanism: it writes the result of a request to an HTML file, which the web server then serves for subsequent requests. Rails is bypassed entirely.

Page caching: First request
Page Caching diagram - first request

Page caching: subsequent requests
Page caching diagram - subsequent requests
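
The two request flows can be sketched in plain Ruby. This is only an illustrative model, not Rails' own code: the TinyPageCache class and its renderer block are made-up names standing in for the web server and the Rails action.

```ruby
require "fileutils"
require "tmpdir"

# Illustrative stand-in for the web server + Rails pair (hypothetical class).
class TinyPageCache
  attr_reader :render_count

  def initialize(public_dir, &renderer)
    @public_dir = public_dir
    @renderer = renderer   # plays the role of the Rails action
    @render_count = 0
  end

  def get(path)
    file = File.join(@public_dir, "#{path}.html")
    # Subsequent requests: the file exists, so it is served directly
    # and the renderer ("Rails") is never called.
    return File.read(file) if File.exist?(file)

    # First request: render, then write the result to disk for next time.
    @render_count += 1
    html = @renderer.call(path)
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, html)
    html
  end
end

Dir.mktmpdir do |public_dir|
  cache = TinyPageCache.new(public_dir) { |path| "<h1>#{path}</h1>" }
  first  = cache.get("/photos/1")
  second = cache.get("/photos/1")
  puts first == second     # same HTML both times
  puts cache.render_count  # how many times the renderer actually ran
end
```

In a real deployment the "file exists" check is done by the web server itself, which is why the second request costs no Ruby time at all.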

The limitation of page caching is that you’ve got to show the same thing to every user. No dynamic content. But you can bend the rules a bit. Here’s what I did to create a page caching solution that works pretty well and allows Twistr to perform solidly on shared hosting.

Cacheable Flash

With page caching, per-request content such as flash messages is normally impossible, because every visitor receives the same HTML file. The Cacheable Flash plugin works around this: it stores the flash message in a cookie and writes it onto the page using JavaScript. This technique greatly expands the number of pages you can use page caching with.

I forked the plugin and removed the dependence on the JSON gem and Prototype to make it more lightweight and easier to deploy on Dreamhost.
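
To make the cookie idea concrete, here is a minimal sketch of the server-side half. It is not the plugin's actual code or API: the helper names are hypothetical, and it uses Ruby's bundled json library purely for brevity (the fork described above avoids that dependency).

```ruby
require "erb"
require "json"
require "uri"

# Hypothetical helper: serialize the flash hash into a cookie-safe string.
# ERB::Util.url_encode emits %20 for spaces, which JavaScript's
# decodeURIComponent can reverse on the client.
def flash_to_cookie_value(flash)
  ERB::Util.url_encode(JSON.generate(flash))
end

# Ruby-side inverse, mirroring what the page's JavaScript would do
# (decodeURIComponent followed by a JSON parse) before showing the message.
def cookie_value_to_flash(value)
  JSON.parse(URI.decode_www_form_component(value))
end

value = flash_to_cookie_value("notice" => "Vote counted!")
puts value
puts cookie_value_to_flash(value).inspect
```

Because the message travels in the cookie rather than in the HTML, the cached page itself stays identical for every visitor.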

Caching paginated results

On Twistr, the WIN and FAIL pages show a list of mashed-up photos, ordered by votes. Twistr uses will_paginate, but by default will_paginate is not compatible with page caching because it creates links with query parameters (for example /win?page=2).

Fortunately, this is easy to fix!

Just add a route with a :page path like this:

map.win_page '/win/:page', :controller => 'photos', :action => 'win'

Now each results page will get a separate page cached file (for example /win/2).
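
Why the query-string form fails can be seen from a simplified model of how the cache file name is chosen. The function below is a hypothetical sketch, assuming the usual behavior that only the URL path (not the query string) determines the file, with an .html extension added:

```ruby
require "uri"

# Simplified, hypothetical model of page cache file naming: the query string
# is discarded, so only the path decides which file is written and served.
def page_cache_file(request_uri)
  path = URI.parse(request_uri).path
  path = "/index" if path.empty? || path == "/"
  "public#{path}.html"
end

puts page_cache_file("/win")         # public/win.html
puts page_cache_file("/win?page=2")  # public/win.html (collides with page 1)
puts page_cache_file("/win/2")       # public/win/2.html
```

With the :page segment in the path, every results page maps to its own file, so the web server can serve each one without consulting Rails.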

Expiring page caches by time

I decided to use a time-based approach to expire the page caches. This happily punts on the hard problem of figuring out which pages actually need to be purged from the cache.

Twistr has four pages: the home page, the photo show page, the win page, and the fail page.

Photo pages are never expired. Using Cacheable Flash means I do not need to have these pages be dynamic. When the site design changes, I simply delete the photo pages and allow them to regenerate.

Cache expiration for the other pages is triggered by a user casting a vote. I created a cache sweeper that observes the Vote class after_create to do this.

The home page is a photo page, but it needs to be expired periodically so that it doesn’t always have the same image. When a vote is cast, the cache sweeper deletes the home page if it is more than 20 minutes old.

The win/fail pages are the only pages that actually change because of user input. Up-to-the-second results are not necessary, but I want them to be fairly current. When a vote is cast, the win and fail pages are expired if they are more than 5 minutes old.

Note that this only happens after a vote is cast—if no vote is cast, the pages will stay page cached forever.
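
The age check behind this scheme can be sketched in plain Ruby. This is not the actual sweeper: the helper names are hypothetical, and the file layout (index.html, win.html, win/2.html and so on under the public directory) is an assumption based on the routes described earlier.

```ruby
require "fileutils"

# Hypothetical helper: delete a cached page only if it is older than
# max_age seconds. Returns true when the file was removed.
def expire_if_stale(file, max_age, now = Time.now)
  return false unless File.exist?(file)
  return false if now - File.mtime(file) <= max_age

  FileUtils.rm_f(file)
  true
end

# What a sweeper might run after a vote is created, using the thresholds
# from the text: 20 minutes for the home page, 5 minutes for win/fail.
def expire_after_vote(public_dir, now = Time.now)
  expire_if_stale(File.join(public_dir, "index.html"), 20 * 60, now)

  # Covers win.html and fail.html plus paginated files such as win/2.html.
  Dir.glob("{win,fail}{.html,/*.html}", base: public_dir).each do |name|
    expire_if_stale(File.join(public_dir, name), 5 * 60, now)
  end
end
```

The file's modification time doubles as the cache timestamp, so no extra bookkeeping is needed; the cost is that nothing expires until somebody votes.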

A better solution might use cron to purge the files every 5 minutes, then fire up a GET request to warm up the cache for the next person who requests the resource (attempting to avoid the dog pile effect).
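
A rough sketch of what such a cron job could run follows. Everything here is hypothetical: the purge_and_warm name, the injectable fetch parameter (there so the sketch can be exercised without a live server), and the localhost URL in the usage comment.

```ruby
require "fileutils"
require "net/http"
require "uri"

# Hypothetical cron job body: delete each cached page, then immediately
# request it so the cache is rebuilt before a real visitor arrives.
# By default `fetch` issues a real HTTP GET.
def purge_and_warm(public_dir, base_url, paths,
                   fetch: ->(url) { Net::HTTP.get(URI(url)) })
  paths.each do |path|
    name = path == "/" ? "index" : path
    FileUtils.rm_f(File.join(public_dir, "#{name}.html"))  # purge the stale copy
    fetch.call("#{base_url}#{path}")                        # warm it right away
  end
end

# Example invocation (assumed host and paths):
#   purge_and_warm("public", "http://localhost:3000", ["/", "/win", "/fail"])
```

There is still a short window between the delete and the warming request in which a visitor could hit Rails, so this narrows the dog pile window rather than closing it.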

Lessons learned

Using page caching, I can get adequate performance from shared hosting. Almost every request on the site serves a cached page. For most users, only submitting a vote will hit Rails.

Comments


  1. Jason Watkins, September 29, 2008 @ 11:05 PM

    I think page caching and action caching are too often overlooked. Separating common from user specific content does add complexity, but the gains are so tremendous it’s usually justified. When doing design it pays to put some effort into putting user specific content into a few small organized blocks rather than scattered everywhere in the page. Luckily this is often good user interface design anyhow.

    Action caching in particular is underused IMHO. It’s quite useful to get a dynamic hook on a request to check permissions, log impressions or make a more complex choice about which cached page to serve up. And while it’s not as blazing fast as static pages are, it is easy to get requests processed in Rails in a few ms or so, as long as you’re limiting database interaction and don’t render a view.

    One very useful way to avoid the dog pile is to expire behind. Similar to a write-behind cache: on checking a TTL and finding the content expired, still serve up the stale content for that request, but also asynchronously start rebuilding the content via a queue/background worker. If you randomize your TTLs a bit, this results in very even system load.

  2. Luke Francl, September 30, 2008 @ 08:45 AM

    Jason,

    Thanks for your comment.

    I like this expire behind idea. I can see how you’d do it for action caching and other cache types that involve Rails more in the request. Is it possible to implement with page caching? I’m just not sure where you’d put the hook to expire the cache.

  3. Jason Watkins, October 01, 2008 @ 03:27 AM

    Luke,

    I think proxies that have a dynamic rule language like Varnish or BigIP’s might be able to do it without involving a rails process.

  4. Neeraj, October 02, 2008 @ 09:27 AM

    How would you add pagination for a resource which is RESTful and defined as map.resources :articles?

  5. Luke Francl, October 02, 2008 @ 12:50 PM

    Neeraj,

    Maybe you could try something like this:

    map.resources :articles

    map.article_pages '/articles/pages/:page', :controller => 'articles', :action => 'index'

    That would create /articles/pages/1, etc.

  6. Jonathan, October 02, 2008 @ 09:58 PM

    If a cached page skips Rails altogether, what about password-protected resources? Can you cache things that you normally protect with a filter?

    Great article, btw.

  7. Luke Francl, October 02, 2008 @ 11:17 PM

    Jonathan,

    Nope. If the same path exists in the public directory (where page caches are stored) it gets served up by your web server without invoking Rails at all.

    So a page cache isn’t appropriate for password protected resources, unless you’ve got some HTTP Auth thing going on.

    You might want to look into action caching. It works like page caching, but Rails is invoked, so you can run filters and such. Since Rails runs, the performance isn’t as high as page caching, but like they say, correctness matters more than performance.

  8. PunNeng, October 05, 2008 @ 01:02 PM

    How about using :if after caches_page, like this:

    caches_page :show, :if => Proc.new { |p| p.request.session.data["flash"].blank? }

    it looks ok to me

  9. Luke Francl, October 05, 2008 @ 09:27 PM

    PunNeng, the thing about that is that once a cached page exists, Rails will be bypassed, flash message or no.