I've been thinking about our mirror site, mirrors.tummy.com, and today while I was doing some metalworking I was trying to figure out a good way to do atomic updates of the contents of the mirror site. This can also apply to web sites or other content you want to update so that users never have a chance to see a partially updated copy.
Read on for my thoughts on making web and other data available reliably during an update.
I guess it's less of an issue now that tools like yum will try other mirrors if the one they are using does not have the file they are looking for. Still, it's desirable for an update not to leave the contents of the directory too out of date. Typically rsync arguments like “--delete-after” are used for this, and they work well.
But, as a thought exercise I was thinking about how btrfs might help here. I figured that you could clone the site contents directory, and then run the rsyncs in the clone. When you're done, you need to make the web/ftp server atomically see the new contents.
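As a rough sketch of that clone-then-sync step, assuming the served directory is itself a btrfs subvolume (so that “btrfs subvolume snapshot” can clone it), something like the following would build the two commands. The function name, the paths and the upstream URL are hypothetical; by default it only returns the commands instead of running them.

```python
import subprocess


def clone_and_sync(upstream: str, active: str, clone: str,
                   dry_run: bool = True) -> list[list[str]]:
    """Snapshot the served tree, then rsync into the snapshot only.

    All names and paths here are illustrative assumptions.
    """
    # A trailing slash makes rsync copy the directory contents.
    source = upstream.rstrip("/") + "/"
    commands = [
        # Writable snapshot of the tree currently being served.
        ["btrfs", "subvolume", "snapshot", active, clone],
        # Update the snapshot; the served tree is never touched.
        ["rsync", "-a", "--delete-after", source, clone + "/"],
    ]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for argv in clone_and_sync("rsync://mirror.example.org/fedora",
                               "/srv/mirror/fedora-a",
                               "/srv/mirror/fedora-b"):
        print(" ".join(argv))
```

Once the rsync finishes, the remaining problem is making the server switch from the old tree to the clone in one step.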
I figured the best option there would be to use a symlink pointing to the “active” directory. However, the “mv” command tries to be clever: if you “mv” a symlink onto another symlink that points to a directory, it follows the destination and puts the source inside that directory, rather than overwriting the destination like it would if they were simple files:
    guin:slt$ ln -s foo1 foo
    guin:slt$ ls -l
    total 8.0K
    lrwxrwxrwx. 1 jafo jafo 4 Oct 4 16:51 foo -> foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:36 foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:37 foo2/
    guin:slt$ ln -s foo2 foo.tmp
    guin:slt$ mv foo.tmp foo
    guin:slt$ ls -l
    total 8.0K
    lrwxrwxrwx. 1 jafo jafo 4 Oct 4 16:51 foo -> foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:51 foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:37 foo2/
    guin:slt$
The rename(2) system call (which “mv” uses under the covers) will do what we want:
    guin:slt$ ln -s foo1 foo
    guin:slt$ ln -s foo2 foo.tmp
    guin:slt$ ls -l
    total 8.0K
    lrwxrwxrwx. 1 jafo jafo 4 Oct 4 16:53 foo -> foo1/
    lrwxrwxrwx. 1 jafo jafo 4 Oct 4 16:53 foo.tmp -> foo2/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:51 foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:37 foo2/
    guin:slt$ python -c 'import os; os.rename("foo.tmp", "foo")'
    guin:slt$ ls -l
    total 8.0K
    lrwxrwxrwx. 1 jafo jafo 4 Oct 4 16:53 foo -> foo2/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:51 foo1/
    drwxr-xr-x. 2 jafo jafo 4.0K Oct 4 16:37 foo2/
    guin:slt$
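The one-liner above could be wrapped up as a small helper. This is only a minimal sketch: the function name and the temporary-name scheme are my own assumptions, and it uses os.replace(), which on POSIX systems ends up in the same rename(2) call as os.rename().

```python
import os


def atomic_symlink_swap(target: str, link_path: str) -> None:
    """Make link_path a symlink to target, replacing any old symlink.

    The new symlink is created under a temporary name next to
    link_path and then renamed over it, so readers see either the old
    link or the new one, never a missing link.
    """
    # Hypothetical temporary name; it must live in the same directory
    # as link_path so the rename stays on one filesystem.
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target, tmp_path)
    try:
        # rename(2) replaces the destination symlink itself instead of
        # following it into the directory it points to.
        os.replace(tmp_path, link_path)
    except OSError:
        os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    os.mkdir(os.path.join(workdir, "foo1"))
    os.mkdir(os.path.join(workdir, "foo2"))
    link = os.path.join(workdir, "foo")
    atomic_symlink_swap("foo1", link)
    atomic_symlink_swap("foo2", link)
    print(os.readlink(link))
```

If you would rather stay in the shell, GNU mv has a “-T” (“--no-target-directory”) option that treats the destination as a normal file, so “mv -T foo.tmp foo” should do the same rename.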
This updates all of the data atomically, but there is still a window between when clients get their index files (say, “apt-get update”) and when they try to get the files associated with that index. So a user could get the index files before the update, and then try referencing files that no longer exist after the update.
Ideally what you'd probably want to do is use “--delete-after”, but somehow trigger a new clone to be taken after the transfer but before the delete is done. Or log the deletes and run them after another clone. However, rsync doesn't natively support this. So the best bet would probably be to run one rsync without any “--delete” option, then make a clone and do another rsync with “--delete”.
The whole goal of separating out the delete step is that you'd end up with a clone that has both the previous set of data and the new set of data. Then you'd run the delete phase to get just the new data, to use as the basis for the next rsync.
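Here is one way the two-phase idea might be laid out, as a sketch that only builds the command list. It reflects my reading of the steps above: the tree being synced starts out as the clean tree left over from the previous cycle, it is published once it holds old plus new files, and the “--delete” pass runs in a fresh clone. The function name and all paths are hypothetical, and the publish step assumes GNU mv with its “-T” option.

```python
def two_phase_plan(upstream: str, union_tree: str, clean_tree: str,
                   live_link: str) -> list[list[str]]:
    """Return the commands for one update cycle, in order.

    union_tree: not currently served; starts as last cycle's clean tree.
    clean_tree: must not exist yet; becomes the basis for the next cycle.
    live_link:  the symlink the web/ftp server follows.
    """
    source = upstream.rstrip("/") + "/"
    tmp_link = live_link + ".tmp"
    return [
        # Phase 1: add and update files but delete nothing, so the
        # tree ends up holding both the previous and the new data.
        ["rsync", "-a", source, union_tree + "/"],
        # Publish the union tree with an atomic symlink rename.
        ["ln", "-s", union_tree, tmp_link],
        ["mv", "-T", tmp_link, live_link],
        # Clone the union so the delete pass never touches what is served.
        ["btrfs", "subvolume", "snapshot", union_tree, clean_tree],
        # Phase 2: drop files that are gone upstream, in the clone only.
        ["rsync", "-a", "--delete", source, clean_tree + "/"],
    ]


if __name__ == "__main__":
    for argv in two_phase_plan("rsync://mirror.example.org/debian",
                               "/srv/mirror/debian-union",
                               "/srv/mirror/debian-clean",
                               "/srv/mirror/debian"):
        print(" ".join(argv))
```

On the next cycle the clean tree takes the role of the union tree, and the old union can be removed once nothing is being served from it.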
So this way you have any files that were deleted recently, without keeping all deleted files forever.
One thing that I've seen for some web sites, typically with CSS files or similar, is that these files are never updated in place. They are always referenced via a specific version such as “style-1.css” and “style-2.css”. This means that a page load in a browser that then needs to get the CSS after an update will still get the one that is compatible with the page contents. It also means that stale cached copies are avoided: because the new stylesheet has a new name, the browser fetches it instead of returning a cached copy of the old one.
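A common variant of this naming scheme, which is not quite the numbered “style-1.css” approach above, is to derive the version from a hash of the file contents, so the name changes exactly when the contents do. A minimal sketch, with a hypothetical function name and an arbitrary choice of eight hex digits:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename: str, content: bytes,
                   digest_chars: int = 8) -> str:
    """Return filename with a short content hash before the extension."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    # "css/style.css" becomes "css/style-<digest>.css".
    return str(path.with_name(f"{path.stem}-{digest}{path.suffix}"))


if __name__ == "__main__":
    old = versioned_name("css/style.css", b"body { color: black; }")
    new = versioned_name("css/style.css", b"body { color: navy; }")
    # Different contents give different names, so both versions can
    # sit side by side on the server during and after an update.
    print(old)
    print(new)
```

Pages generated before the update keep pointing at the old name and pages generated afterwards point at the new one, as long as the old file is kept around for a while.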
Another nice btrfs feature for our mirroring situation would be to use subvolumes. Right now we have something around a dozen directories of software that we mirror (Fedora, Debian, Ubuntu, etc…). Each of those is in a single large file-system, because we don't want to manage the exact space required by each repo we mirror.
But with btrfs we can have our disc space be a single large pool, and each software repository be a sub-volume. They'd share space, but we could individually clone them for doing individual updates as mentioned above.
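Setting that up might look roughly like this. It is only a sketch that builds the “btrfs subvolume create” commands, and the mount point and directory names are assumptions rather than our actual layout.

```python
def subvolume_commands(pool_mount: str, repos: list[str]) -> list[list[str]]:
    """Build one 'btrfs subvolume create' command per mirrored repo."""
    base = pool_mount.rstrip("/")
    # Each repo gets its own subvolume, but they all draw on the same
    # pool of free space, so no per-repo sizing is needed.
    return [["btrfs", "subvolume", "create", f"{base}/{name}"]
            for name in repos]


if __name__ == "__main__":
    for argv in subvolume_commands("/srv/mirror",
                                   ["fedora", "debian", "ubuntu"]):
        print(" ".join(argv))
```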