How to set up caching, especially with PaginationHelper
(Written by Sean Reifschneider, tummy.com, ltd., October 2, 2005)
Rails has the neat "caches_page()" method that will cache a rendered
page and put it into your "public" directory. The benefit of this is that
once the cached page is rendered, Apache will serve it as static content.
This is an order of magnitude faster than having to pass off the request
to Rails, even if Rails is using the cached page results.
However, this is only true if the filename that Rails writes the cached
page to matches the incoming URLs that Apache sees. That's the problem.
Worse, if you're using the PaginationHelper to page a long list, it will
normally use URLs of the form
"http://sitename.example.com/app/controller/action?page=1". Apache ignores
the "?page=1" GET parameter when it looks for a static file: it looks for a
page named after "action" in the "controller" directory, whatever the page
number. Unless this is fixed, every page ends up showing the contents of
whichever page was cached first.
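To see the collision concretely, here is a minimal Ruby sketch. It only
models the lookup: the helper name "static_lookup_path" is made up for this
illustration, and Apache's real URL-to-file mapping involves more than this.
The point is that the path component is identical for every page number.

```ruby
require 'uri'

# Hypothetical helper: the part of the URL a static file lookup is based on.
# The query string ("?page=N") is not part of the path.
def static_lookup_path(url)
  URI.parse(url).path
end

page1 = static_lookup_path('http://sitename.example.com/app/controller/action?page=1')
page2 = static_lookup_path('http://sitename.example.com/app/controller/action?page=2')

puts page1            # /app/controller/action
puts page1 == page2   # true: both pages would map to the same cached file
```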
Cachable Pagination
You can change the Pagination so that cachable URLs are used by
modifying your "routes.rb" Rails Routing table. I chose to make my URLs be
"http://sitename.example.com/app/controller/action/1" for the first page,
etc. To do this, I added the following lines early in my "routes.rb":

    map.connect(':controller/:page',
      :action => 'index',
      :page => /\d+(\.html)?/
    )
    map.connect(':controller/:action/:page',
      :page => /\d+(\.html)?/
    )

Note that I made it so that the ":page" value can match just a number
(like "2") or the number with a ".html" suffix ("2.html"). This is
required for the Apache static page cache which I'll discuss next.
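A quick way to check which values the ":page" pattern accepts is to try it
outside Rails. This is only a sketch: the "\A" and "\z" anchors are added
here on the assumption that the routing requirement has to match the whole
URL segment, and the helper name is invented for the example.

```ruby
# The ":page" requirement from the routes above, anchored to the whole segment.
PAGE_PATTERN = /\A\d+(\.html)?\z/

# Hypothetical helper: would this URL segment be accepted as a page number?
def valid_page_segment?(segment)
  !!(segment =~ PAGE_PATTERN)
end

puts valid_page_segment?('2')        # true
puts valid_page_segment?('2.html')   # true
puts valid_page_segment?('list')     # false

# String#to_i reads the leading digits and ignores the rest.
puts '2.html'.to_i                   # 2
```

If the code that consumes the parameter converts it with String#to_i (an
assumption, not something checked against the PaginationHelper source), the
".html" suffix is simply dropped.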
I found that PaginationHelper doesn't mind if the page number has
".html" appended to it, so I didn't have to modify my controller at all.
Also note that this demonstrates a weakness of Rails Routing. It has a
forward mapping (from URL into :controller/:action/:id, for example), but
it doesn't have an explicit reverse mapping even though it's used in
reverse. In some cases it's easy to do the reverse mapping, but in the
above case there's no way of specifying that "url_for()" should create
URLs of the form "controller/action/1.html". If you set a map of
":controller/:action/:page.html", it maps properly forward, but the reverse
results in "controller/action/:page.html". In other words, the
":page.html" is literally left in the resulting URL. I don't know if this
is just a bug, or a consequence of the fact that Rails doesn't have a
reverse map. Some way to optionally specify the reverse mapping would be
useful. I love that the default action works without any extra effort, but
sometimes you just need to "take the stick".
After I made the Rails Routing change, I figured I'd then have to go
into the Pagination helper and stop the pagination links it generates
from using the "?page=" parameter. Having recently read the
PaginationHelper documentation, I wasn't sure that this was going to be
possible without hacking the code. As I was digging around, I realized
that Rails had already worked all this out and no changes were necessary:
the "url_for()" helper generates URLs using the Rails Routing map, so it
happened automatically. Deee-lightful.
Now that we have URLs that are unique for every page without relying
on the "?page=" parameter, we need to make it so that Apache can find these
pages.
Apache Static Pages
There's a pretty significant problem with how Rails is generating the
cache. If you go to the URL
"http://sitename.example.com/app/controller", which calls the "index()"
controller method, the cached page is stored in
".../public/app/controller.html". If you go to the URL
"http://.../app/controller/action" the cached page is stored in
".../public/app/controller/action.html".
Now, if the page for the first URL above were stored in
".../public/app/controller/index.html", that would be better for Apache,
except that Apache answers a request for a directory without a trailing
"/" by redirecting to the same URL with the trailing slash. This is fine
in most cases, but Rails does not follow that convention.
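The URL-to-file mapping described at the start of this section can be
modelled in a few lines of Ruby. This is a simplified model of the observed
behaviour, not Rails' own code; the function name and the "public" default
are assumptions made for the illustration.

```ruby
# Simplified model (an assumption, not Rails' implementation): the cached
# file is the URL path with ".html" appended, under the public directory.
def cache_file_for(url_path, public_dir = 'public')
  File.join(public_dir, url_path + '.html')
end

puts cache_file_for('/app/controller')           # public/app/controller.html
puts cache_file_for('/app/controller/action')    # public/app/controller/action.html
puts cache_file_for('/app/controller/action/1')  # public/app/controller/action/1.html
```

Note that nothing in this mapping ever produces ".../controller/index.html",
which is why a request for the bare controller URL needs help to find its
cached file.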
So, we've got Apache receiving URLs (potentially) like:

    http://sitename.example.com/app/controller
    http://sitename.example.com/app/controller/
    http://sitename.example.com/app/controller/index.html

when the actual static content is at ".../public/app/controller.html".
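Luckily, mod_rewrite comes to the rescue. In my main Apache configuration
I added the following lines. I'd imagine that they'd work in the
".htaccess" file as well, though there the leading "/" would have to be
dropped from each pattern, because per-directory rules are matched against
the path without it: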

    RewriteEngine On
    #RewriteLog /tmp/rewrite
    #RewriteLogLevel 2

    RewriteRule ^/CONTROLLER/ACTION/(\d+)$ /CONTROLLER/ACTION/$1.html [L]
    RewriteRule ^/CONTROLLER/ACTION/$ /CONTROLLER/ACTION.html [L]
    RewriteRule ^/CONTROLLER/ACTION$ /CONTROLLER/ACTION.html [L]

    RewriteRule ^/CONTROLLER/(\d+)$ /CONTROLLER/$1.html [L]
    RewriteRule ^/CONTROLLER/$ /CONTROLLER.html [L]
    RewriteRule ^/CONTROLLER$ /CONTROLLER.html [L]

The words in caps will need to be changed for your deployment.
These are all internal RewriteRules (in other words, they do not cause
a redirect to be sent to the browser), and the "[L]" means that if it
matches no further rules will be tried.
The first RewriteRule line in each group above is meant to handle the
PaginationHelper changes we made earlier. So, a request for
"http://.../controller/1" gets changed into a request for
"http://.../controller/1.html". This means that Apache will find the
page if a cached version has been written to your "public" directory.
The second and third RewriteRule lines (in each group) are to handle
requests for the base action or controller. You will need one for the
controller and one to match each action. Or, you could get fancy and do:

    RewriteEngine On
    #RewriteLog /tmp/rewrite
    #RewriteLogLevel 2

    RewriteRule ^/([^/]+)/([^/]+)/(\d+)$ /$1/$2/$3.html [L]
    RewriteRule ^/([^/]+)/([^/]+)/$ /$1/$2.html [L]
    RewriteRule ^/([^/]+)/([^/]+)$ /$1/$2.html [L]

    RewriteRule ^/([^/]+)/(\d+)$ /$1/$2.html [L]
    RewriteRule ^/([^/]+)/$ /$1.html [L]
    RewriteRule ^/([^/]+)$ /$1.html [L]

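These rules use regular expressions in place of the fixed names. The
first set handles :controller/:action, and the second set handles requests
that reach just the controller (via the index() method). The first rule of
each set is the one for the pagination mapping. Be aware that these
generic patterns also match other two-segment paths, for example
"/images/logo.png" (which would be rewritten to "/images/logo.png.html"),
so static files in "public" may need rules of their own placed ahead of
these.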
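To check what the generic rules do to a given request path without
touching Apache, the same patterns can be tried in Ruby. This is only a
simulation under stated assumptions: the rule table and the "rewrite"
function are invented for the illustration, and "[L]" is modelled as
"stop at the first rule that matches".

```ruby
# Each entry mirrors one RewriteRule above: [pattern, substitution].
REWRITE_RULES = [
  [%r{\A/([^/]+)/([^/]+)/(\d+)\z}, '/\1/\2/\3.html'],
  [%r{\A/([^/]+)/([^/]+)/\z},      '/\1/\2.html'],
  [%r{\A/([^/]+)/([^/]+)\z},       '/\1/\2.html'],
  [%r{\A/([^/]+)/(\d+)\z},         '/\1/\2.html'],
  [%r{\A/([^/]+)/\z},              '/\1.html'],
  [%r{\A/([^/]+)\z},               '/\1.html']
].freeze

# Apply the first matching rule and stop, like "[L]"; unmatched paths
# are returned unchanged.
def rewrite(path)
  REWRITE_RULES.each do |pattern, substitution|
    return path.sub(pattern, substitution) if path =~ pattern
  end
  path
end

puts rewrite('/controller/action/2')  # /controller/action/2.html
puts rewrite('/controller/action/')   # /controller/action.html
puts rewrite('/controller/action')    # /controller/action.html
puts rewrite('/controller/2')         # /controller/2.html
puts rewrite('/controller/')          # /controller.html
puts rewrite('/controller')           # /controller.html
```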
Cleaning The Cache
There's a problem with the Apache caching though... If you also want
to have static content in your "public" directory (images, stylesheets,
other static pages), it then can be difficult to separate the cached pages
out from the static pages. They kind of need to exist in the same
directory (though the Rewrite module may be able to help). What I did was
to make the static files owned by user and group root, whereas the cached
files are written by the web server. So, in this case all I have to do is
"find /path/to/public -group www-data" to get the list of files generated
from the cache.
So, if you need to flush the cache, use "find" with the group or user
option to get the list of cached files, and remove those.
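The same flush can be scripted in Ruby. This is a rough sketch, assuming
the cached files are the only regular files under "public" that belong to
the web server's group; the function names are made up for this example,
and "www-data" is just the group name used above.

```ruby
require 'etc'
require 'find'

# List regular files under +root+ whose group id is +gid+.
# Directories and symlinks are skipped.
def files_owned_by_group(root, gid)
  Find.find(root).select do |path|
    stat = File.lstat(path)
    stat.file? && stat.gid == gid
  end
end

# Delete every file found above, leaving directories in place.
def flush_cache(root, gid)
  files_owned_by_group(root, gid).each { |path| File.delete(path) }
end

# Example use (not run here; it raises if the group does not exist):
#   gid = Etc.getgrnam('www-data').gid
#   flush_cache('/path/to/public', gid)
```

It is worth printing the result of files_owned_by_group first and checking
the list before deleting anything.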
Now, as I mentioned above, mod_rewrite can do some pretty fancy
things. In particular, you could move the cache directory to a different
location and then use RewriteCond to conditionally rewrite to that
location only when a file exists there. That way, the cache files can live
in a directory structure entirely separate from the static pages. This
setup, since I'm not using it currently, is left as an exercise for the
reader. :-)