
How to set up caching, especially with PaginationHelper

By  Sean Reifschneider Date October 2, 2005

Rails has the neat "caches_page()" method that will cache a rendered page and put it into your "public" directory. The benefit of this is that once the cached page is rendered, Apache will serve it as static content. This is an order of magnitude faster than having to pass off the request to Rails, even if Rails is using the cached page results.

However, this is only true if the filename that Rails writes the cached page to matches up with incoming URLs that Apache sees. That's the problem.

Worse, if you're using the PaginationHelper to page a long list, it will normally use URLs of the form "http://sitename.example.com/app/controller/action?page=1". Apache ignores the "?page=1" GET parameter when it looks for a static file: it looks for a page named after "action" in the "controller" directory, no matter which page was requested. Unless the page number is moved into the path, all of the pages end up showing the contents of the first page that's cached.
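To see why the pages collide, here is a small stand-alone Ruby sketch (not Rails code) that pulls the path out of two paginated URLs. The host name is the example one used above. The point is only that the path, which is what a static file lookup is based on, is identical for both.

```ruby
require "uri"

# Two paginated URLs that differ only in the GET parameter.
urls = [
  "http://sitename.example.com/app/controller/action?page=1",
  "http://sitename.example.com/app/controller/action?page=2",
]

# The path part is what a static file lookup is based on;
# the query string is kept separately and plays no part in it.
paths = urls.map { |u| URI.parse(u).path }

puts paths.uniq.inspect
```

Both URLs reduce to the same path, so a single cached file would answer for every page.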

Cachable Pagination

You can change the Pagination so that cachable URLs are used by modifying your "routes.rb" Rails Routing table. I chose to make my URLs be "http://sitename.example.com/app/controller/action/1" for the first page, etc. To do this, I added the following lines early in my "routes.rb":

map.connect(':controller/:page',
      :action => 'index',
      :page => /\d+(\.html)?/
      )
map.connect(':controller/:action/:page',
      :page => /\d+(\.html)?/
      )

Note that I made it so that the ":page" value can match just a number (like "2") or the number with a ".html" suffix ("2.html"). This is required for the Apache static page cache which I'll discuss next.
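As a quick check of that pattern, the following plain Ruby sketch tests a few candidate values. It assumes the routing requirement has to match the whole ":page" segment, which is modeled here with explicit anchors. The constant name PAGE_PATTERN is made up for the illustration.

```ruby
# The pattern from the route requirement, anchored to the whole segment.
# (The anchoring is an assumption about how the requirement is applied.)
PAGE_PATTERN = /\A\d+(\.html)?\z/

candidates = %w[2 2.html 10.html abc 2.htm]
results = candidates.map { |s| [s, !PAGE_PATTERN.match(s).nil?] }

results.each { |value, ok| puts "#{value.inspect} matches: #{ok}" }

# String#to_i reads the leading digits and ignores the rest, so code that
# converts the value this way sees the same number with or without ".html".
puts "2.html".to_i
```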

I found that PaginationHelper doesn't mind if the page number has ".html" appended to it, so I didn't have to modify my controller at all. Also note that this kind of demonstrates a weakness of Rails Routing. It has a forward mapping (from URL into :controller/:action/:id, for example), but it doesn't have an explicit reverse mapping even though it's used in reverse.

In some cases it's easy to do the reverse mapping, but in the above case there's no way of specifying that "url_for()" should create URLs of the form "controller/action/1.html". If you set a map of ":controller/:action/:page.html", it maps properly forward, but the reverse results in "controller/action/:page.html". In other words, the ":page.html" is literally left in the resulting URL. I don't know if this is just a bug, or is a weakness of the fact that Rails doesn't have a reverse map. Some way to optionally specify the reverse mapping would be useful. I love that the default action works without any extra effort, but sometimes you just need to "take the stick".

After I made the Rails Routing change, I figured I'd then have to go into the Pagination helper and stop it from using the "?page=" parameter in the pagination links it generates. Having recently read the PaginationHelper documentation, I wasn't sure that this was going to be possible without hacking the code. As I was digging around, I realized that Rails had already worked all this out and no changes were necessary. The "url_for()" helper generates URLs using the Rails Routing map, so the new URL form was picked up automatically. Deee-lightful.

Now that we have URLs that are unique for every page without relying on the "?page=" parameter, we need to make it so that Apache can find these pages.

Apache Static Pages

There's a pretty significant problem with how Rails is generating the cache. If you go to the URL "http://sitename.example.com/app/controller", which calls the "index()" controller method, the cached page is stored in ".../public/app/controller.html". If you go to the URL "http://.../app/controller/action" the cached page is stored in ".../public/app/controller/action.html".
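The mapping just described can be written down as a tiny Ruby function. This is only a model of the behavior described above, not Rails's actual implementation. The name cache_file_for and the "public" default root are made up for the illustration.

```ruby
# Model of the described behavior: the cached file is the URL path
# with ".html" appended, placed under the public directory.
# (cache_file_for is a hypothetical helper, not a Rails method.)
def cache_file_for(url_path, public_root = "public")
  relative = url_path.sub(%r{\A/}, "")   # drop the leading slash
  File.join(public_root, relative + ".html")
end

puts cache_file_for("/app/controller")
puts cache_file_for("/app/controller/action")
puts cache_file_for("/app/controller/action/2")
```

Note that none of the incoming URLs ends in ".html", which is why Apache cannot find these files without some help.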

Now, if the first URL above were put in ".../public/app/controller/index.html", that would be better for Apache, except that Apache internally generates redirects for a directory without a trailing "/" to the directory with the trailing slash. This is fine in most cases, but Rails does not do that.

So, we've got Apache receiving URLs (potentially) in more than one form, for example with or without a trailing slash, when the actual static content is at ".../public/app/controller.html". Luckily, mod_rewrite comes to the rescue. In my main Apache configuration I added the following lines, though I'd imagine that they'd work just as well in the ".htaccess" file:

RewriteEngine On
#RewriteLog /tmp/rewrite
#RewriteLogLevel 2

RewriteRule ^/CONTROLLER/ACTION/(\d+)$ /CONTROLLER/ACTION/$1.html [L]
RewriteRule ^/CONTROLLER/ACTION/$ /CONTROLLER/ACTION.html [L]
RewriteRule ^/CONTROLLER/ACTION$ /CONTROLLER/ACTION.html [L]

RewriteRule ^/CONTROLLER/(\d+)$ /CONTROLLER/$1.html [L]
RewriteRule ^/CONTROLLER/$ /CONTROLLER.html [L]
RewriteRule ^/CONTROLLER$ /CONTROLLER.html [L]

The words in caps will need to be changed for your deployment. These are all internal RewriteRules (in other words, they do not cause a redirect to be sent to the browser), and the "[L]" means that if it matches no further rules will be tried.

The first RewriteRule line in each group above is meant to handle the PaginationHelper changes we made above. So, a request for "http://.../controller/1" gets changed into a request for "http://.../controller/1.html". This means that Apache will find the page if a cached version has been written to your "public" directory.

The second and third RewriteRule lines (in each group) are to handle requests for the base action or controller. You will need one for the controller and one to match each action. Or, you could get fancy and do:

RewriteEngine On
#RewriteLog /tmp/rewrite
#RewriteLogLevel 2

RewriteRule ^/([^/]+)/([^/]+)/(\d+)$ /$1/$2/$3.html [L]
RewriteRule ^/([^/]+)/([^/]+)/$ /$1/$2.html [L]
RewriteRule ^/([^/]+)/([^/]+)$ /$1/$2.html [L]

RewriteRule ^/([^/]+)/(\d+)$ /$1/$2.html [L]
RewriteRule ^/([^/]+)/$ /$1.html [L]
RewriteRule ^/([^/]+)$ /$1.html [L]

These rules use regular expressions instead of hard-coded names. The first set is for ":controller/:action" URLs, and the second set is for just reaching the controller (via the index() method). The first rule of each set handles the pagination mapping.
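To check what the generic rules do with a given URL, here is a rough Ruby simulation. It is a sketch, not mod_rewrite itself: the rules are transcribed by hand, the first matching rule wins (mimicking "[L]"), and the names RULES and rewrite are made up. The "/list" paths are hypothetical examples.

```ruby
# Hand transcription of the six generic rules: [pattern, substitution].
RULES = [
  [%r{\A/([^/]+)/([^/]+)/(\d+)\z}, '/\1/\2/\3.html'],
  [%r{\A/([^/]+)/([^/]+)/\z},      '/\1/\2.html'],
  [%r{\A/([^/]+)/([^/]+)\z},       '/\1/\2.html'],
  [%r{\A/([^/]+)/(\d+)\z},         '/\1/\2.html'],
  [%r{\A/([^/]+)/\z},              '/\1.html'],
  [%r{\A/([^/]+)\z},               '/\1.html'],
]

# Apply the first rule that matches, then stop (like the [L] flag).
# Paths that match no rule are returned unchanged.
def rewrite(path)
  RULES.each do |pattern, substitution|
    return path.sub(pattern, substitution) if pattern.match(path)
  end
  path
end

%w[/list/show/3 /list/show/ /list/show /list/2 /list/ /list
   /images/logo.png].each do |path|
  puts "#{path} -> #{rewrite(path)}"
end
```

One thing the simulation suggests: a two-segment path such as "/images/logo.png" also matches the generic ":controller/:action" rule and picks up an ".html" suffix. If the sketch is faithful to the real rules, the generic form would need an exclusion for static assets, which the per-controller rules avoid by naming the controller explicitly.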

Cleaning The Cache

There's a problem with the Apache caching though... If you also want to have static content in your "public" directory (images, stylesheets, other static pages), it can be difficult to separate the cached pages from the static pages. They kind of need to exist in the same directory (though the Rewrite module may be able to help). What I did was to make the static files owned by user and group root, whereas the cached files are written by the web server. So, in this case all I have to do is "find /path/to/public -group www-data" to get the list of files generated from the cache.

So, if you need to flush the cache, use "find" with the group or user option to get the list of cached files.
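If you would rather do this from a Ruby script than from the shell, the same selection can be sketched with the standard "find" library. The function name files_owned_by_group is made up, and the "www-data" group and "/path/to/public" in the usage comment are simply the values from the "find" command above. The sketch only lists files; deleting them is left to the caller.

```ruby
require "find"

# List regular files under root whose group id equals gid.
# Roughly what "find ROOT -type f -group GROUP" would print.
def files_owned_by_group(root, gid)
  Find.find(root).select do |path|
    File.file?(path) && File.stat(path).gid == gid
  end
end

# Usage sketch (group name and path are assumptions for your system):
#   require "etc"
#   gid = Etc.getgrnam("www-data").gid
#   puts files_owned_by_group("/path/to/public", gid)
```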

Now, as I mentioned above, mod_rewrite can do some pretty fancy things. In particular, you could move the cache directory to a different location and then use RewriteCond to conditionally rewrite requests to that location only when a file exists there. That way, the cache files can live in a directory structure entirely separate from the static pages. This setup, since I'm not using it currently, is left as an exercise to the reader. :-)
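The decision such a setup has to make can be sketched in Ruby. This is not Apache configuration, just the logic: look in a separate cache root, and use the file only if it exists. The name cached_file and the cache root argument are made up, and the sketch does no hardening against hostile paths.

```ruby
# Return the cached file for url_path if one exists under cache_root,
# or nil to signal "no cached copy, handle the request normally".
# (cached_file is a hypothetical helper for illustration only.)
def cached_file(cache_root, url_path)
  trimmed = url_path.sub(%r{/\z}, "")            # ignore a trailing slash
  candidate = File.join(cache_root, trimmed + ".html")
  File.file?(candidate) ? candidate : nil
end

# Usage sketch (the cache root is an assumption):
#   cached_file("/path/to/cache", "/controller/action/2")
```

With the cache in its own directory, flushing it no longer depends on file ownership, since everything under the cache root is by definition a cached page.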
