
I really like the idea of rdiff-backup, but the drawbacks kept stopping me from deploying it more widely. The nicest thing is that it stores deltas as files change: if you have a large file that changes a little bit every day, rdiff-backup stores only the little bit that changed. If you use the rsync hard-link trick to keep historic data around, the whole file is duplicated every day, which can quickly add up on a slowly changing multi-gigabyte database file.
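For contrast, the hard-link trick mentioned above usually looks something like this minimal sketch (the paths and the dated-directory naming are made up for illustration). Every unchanged file becomes a hard link into yesterday's tree, but a file that changed at all gets stored in full again:

```shell
#!/bin/sh
# Sketch of the rsync hard-link rotation; all paths are hypothetical.
SRC=/home/
DEST=/backups/laptop
TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(date -d yesterday +%Y-%m-%d)

# Unchanged files become hard links into yesterday's tree (no new
# data blocks); any file that changed is copied over in full.
if [ -d "$DEST/$YESTERDAY" ]; then
    rsync -a --delete --link-dest="$DEST/$YESTERDAY" "$SRC" "$DEST/$TODAY"
fi
# (On the very first run there is no previous tree; just use a plain
# rsync -a without --link-dest.)
```

Note that `--link-dest` only helps with files that are byte-for-byte identical; that's exactly why the multi-gigabyte database file gets duplicated daily.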

Problems with rdiff-backup include:

  • Server and client versions need to match. In a mixed environment, that means maintaining your own packages across CentOS 4, CentOS 5, Hardy, Karmic, Fedora 12… My personal backup server has clients running most of those…
  • rdiff-backup doesn't deal very well with intermittently connected systems. If you have a big set of changes that takes several days to push up at a throttled rate, and you are disconnected part way through, it needs to start over from scratch.
  • Worse, this failure may require running the next rdiff-backup with a special option to clean up the broken backup directory.
  • No throttling like rsync's “--bwlimit” option.

After giving this all some thought, I came up with the idea of using rsync to pull the data over the network, and using rdiff-backup to maintain the historic backup information. Read below for my experiences with this.

rsync is very good at pulling data from remote sites via slow cable modems and the like, such as backing up our laptops to a central location. And using the “--partial --inplace” options means that a single huge file (like an ISO or virtual machine image) will eventually get pushed across, even if you rate limit it so the slow outbound connection at home doesn't get saturated.
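The pull step is essentially just the following (the host name, paths, and rate are examples, not my real configuration):

```shell
#!/bin/sh
# Pull the remote tree into a local mirror at a throttled rate.
# laptop.example.com and the paths below are placeholders.
#
#   --partial --inplace : an interrupted transfer of a huge file
#                         resumes where it left off instead of
#                         starting over
#   --bwlimit=100       : cap the transfer at ~100 KB/s so the slow
#                         uplink at home isn't saturated
if [ -d /backups/laptop/current ]; then
    rsync -a --partial --inplace --bwlimit=100 \
        laptop.example.com:/home/ /backups/laptop/current/
fi
```

If an rsync run dies part way through, the next run simply picks up where it left off; there's no broken state to repair.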

Once the rsync finishes, run an rdiff-backup from the rsync destination directory to another directory. This means you don't have to worry about keeping rdiff-backup versions in sync, since it only runs on one machine. Unfortunately, it does mean you have another copy of all the files, doubling the space used.
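That step is nothing more than a local rdiff-backup invocation; a sketch, with placeholder paths:

```shell
#!/bin/sh
# Both trees live on the backup server, so only one rdiff-backup
# version is ever involved. The paths are hypothetical.
CUR=/backups/laptop/current
HIST=/backups/laptop/history

# Skip hosts whose rsync mirror hasn't been created yet.
if [ -d "$CUR" ]; then
    rdiff-backup "$CUR" "$HIST"
fi
```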

I had this clever idea of hard-linking the rdiff-backup directory files to the rsync destination files, but that's incompatible with “rsync --inplace”, and rdiff-backup seems to go through and break the hard links anyway.

So, the question is whether the deltas save enough space to make up for the duplicate copy of the current system data. My experience with rsync hardlinks and also with BackupPC makes me think that it probably does in most cases.

Because it's running locally, I should never have to worry about an incomplete rdiff-backup run. If I'm running rdiff-backup directly to my laptop, I do have to worry about it not being complete before I do a reboot, go to or from the coffee shop, etc…

Another thing I'm trying out is having multiple rdiff-backup directories with different backup intervals. On the first day of the month I'll rdiff-backup to a “monthly” directory, and on other days I'll go to a “daily” directory. I could do the same for the first day of the week, but currently I'm just doing daily (kept for 30 days) and monthly in case I need to go back a really long time.
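The dispatch is only a few lines of shell. Here's a sketch under assumed directory names, not my actual script:

```shell
#!/bin/sh
# Pick an increment directory based on the day of the month.
# All names below are illustrative.
BASE=/backups/laptop

pick_interval() {
    # "monthly" on the 1st of the month, "daily" otherwise
    if [ "$1" = "01" ]; then
        echo monthly
    else
        echo daily
    fi
}

DEST="$BASE/$(pick_interval "$(date +%d)")"

if [ -d "$BASE/current" ]; then
    rdiff-backup "$BASE/current" "$DEST"
    # Prune only the daily tree; monthly increments stay around.
    # --force lets rdiff-backup remove several increments at once.
    if [ "$DEST" = "$BASE/daily" ]; then
        rdiff-backup --remove-older-than 30D --force "$DEST"
    fi
fi
```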

With the hard-link trick and rsync, or even BackupPC (which uses its own rsync implementation so that it can deduplicate and compress the stored files), the amount of space required for keeping a reasonable amount of history just explodes. For servers in particular I like to keep a week or two of daily backups, at least 6 weeks of weekly, and at least a year of monthly history.

So far I've been running this configuration on around a dozen machines for 3 weeks. It's worked as well as I was hoping it would. Once I got the controlling script worked out, it's been maintenance free and has just worked. There have been no issues with rdiff-backup getting upset.

Of course, what I really want is to use filesystem snapshots to manage the deltas between rsync backups. But LVM snapshots just won't cut it there: the copy-on-write space has to be reserved ahead of time, and write performance degrades as snapshots accumulate. So I'm kind of stuck waiting until btrfs matures.

