Author: Sean Reifschneider
Subject: Nope, hard-links are what I'm trying to avoid.
rsync's --link-dest is exactly the sort of "linking trick" I referred to in my original post. It works fine for fairly small data sets, but removing the archived backups is quite expensive: it takes hundreds of thousands, millions, or more file deletions, plus the directory traversal to find them all.
The biggest issue: say you have a 20GB file that gets 100KB appended to it every night, or has 100KB within it updated (you know, a fairly typical database file), and you are keeping 30 incremental copies...
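With hard links, this requires 600GB of storage: a file that changed at all can't be linked to the previous copy, so each of the 30 copies is a full 20GB.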
Using rdiff-backup or ZFS snapshots, which store only the changed data, you end up using a little over 20GB for the same data set (20GB plus 30 × 100KB, or about 20.003GB), a space savings of about 97%.
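To check the arithmetic, here is a back-of-envelope sketch. It assumes decimal units (1GB = 10^9 bytes) and ignores metadata and block-size rounding; ZFS writes whole records (128KB by default), so real snapshot usage would be somewhat higher, though still tiny next to a full copy.

```python
# Back-of-envelope storage comparison: a 20GB file, 100KB of changes
# per night, 30 retained copies. Decimal units; metadata ignored.
GB = 1_000_000_000
KB = 1_000

file_size = 20 * GB
nightly_change = 100 * KB
copies = 30

# Hard-link trees (rsync --link-dest): a file that changed at all cannot be
# linked, so every retained copy holds the whole file.
hardlink_total = file_size * copies

# Delta-based storage (rdiff-backup, ZFS snapshots): one full copy plus only
# the changed data for each retained copy. Ignores block-size rounding.
delta_total = file_size + nightly_change * copies

savings = 1 - delta_total / hardlink_total

print(f"hard links: {hardlink_total / GB:.3f} GB")
print(f"deltas:     {delta_total / GB:.3f} GB")
print(f"savings:    {savings:.1%}")
```

This prints 600.000 GB for hard links, 20.003 GB for deltas, and a savings of 96.7%.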
Disk space is cheap, but it's not that cheap when you're talking about 5TB versus 150TB...
As for LVM snapshots, you have to know beforehand how much space each snapshot is going to require. Otherwise you have to over-commit and make the snapshot volume larger than you ever expect it to grow, or set up the snapshots to extend automatically when they run short (but not out) of space.
As for over-allocating: I'm looking at having a thousand backup copies on one of our larger systems. Over-allocating each one by even 1GB wastes a terabyte right there, and I probably can't guess sizes that closely anyway. The alternative is to snapshot the whole backup file system and count on each snapshot being rather large; but then, if I move a backup from one host to another, all of its old data is still going to stay reserved (probably through the old snapshot copies) for the next year.
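The over-allocation cost is easy to model: each fixed-size snapshot volume ties up its whole reservation whether or not it is used. A minimal sketch with hypothetical figures (1,000 snapshots, each actually using 4GB but reserved at 5GB); the function name is made up for illustration.

```python
GB = 1_000_000_000


def reserved_but_unused(used_per_snapshot, reserved_size):
    """Space held by fixed-size snapshot volumes but never written.

    Every snapshot gets the same fixed reservation up front; whatever it does
    not use stays allocated (and wasted) until the snapshot is removed.
    """
    for used in used_per_snapshot:
        # A classic LVM snapshot whose copy-on-write space fills up becomes
        # invalid, so treat outgrowing the reservation as an error here.
        if used > reserved_size:
            raise ValueError("a snapshot outgrew its reservation")
    return sum(reserved_size - used for used in used_per_snapshot)


# Hypothetical figures: 1,000 snapshots, each actually using 4GB but
# reserved at 5GB (a 1GB safety margin apiece).
used = [4 * GB] * 1_000
waste = reserved_but_unused(used, reserved_size=5 * GB)
print(f"{waste / GB:,.0f} GB reserved but unused")  # 1,000 GB, i.e. 1TB
```

The trade-off is visible in the guard: shrink the margin to save space and you risk a snapshot filling up and becoming unusable.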
Oh, and if you have 30 snapshots of a piece of data and it changes, you have to write and keep 30 copies of the changed data (one for each snapshot). The alternative is rolling snapshots (one snapshot of another snapshot; can you even do that?), but then you can only trim off the ends, never from the middle, and you need a separate chain of rolling snapshots for each backup type.
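The write amplification described above comes from classic (non-thin) LVM snapshots each keeping their own copy-on-write area. A toy model, assuming one origin volume and a list of independent snapshots; the class is hypothetical and illustrates the behavior, not LVM's implementation or on-disk format.

```python
class ClassicSnapshotVolume:
    """Toy origin volume with independent (non-thin) snapshots.

    Each snapshot owns a separate copy-on-write store, mirroring how classic
    LVM snapshots behave. Illustration only, not LVM's on-disk format.
    """

    def __init__(self, blocks):
        self.origin = dict(enumerate(blocks))   # block number -> data
        self.snapshots = []                     # one {block: old data} per snapshot
        self.blocks_copied = 0                  # total copy-on-write copies made

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        old = self.origin[block_no]
        for cow in self.snapshots:
            # Every snapshot that hasn't saved this block yet needs its own
            # copy of the pre-write contents.
            if block_no not in cow:
                cow[block_no] = old
                self.blocks_copied += 1
        self.origin[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        # Unchanged blocks are read straight from the origin.
        return self.snapshots[snap_id].get(block_no, self.origin[block_no])


vol = ClassicSnapshotVolume(["db-v1", "other"])
for _ in range(30):
    vol.snapshot()
vol.write(0, "db-v2")
print(vol.blocks_copied)  # 30: the same old block copied once per snapshot
```

Writing block 0 again copies nothing more, since every snapshot already holds its original contents; but with 30 snapshots, each newly touched block costs 30 copies.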
If you have had good luck setting up and managing 1,000 LVM-based snapshots, I'd love to hear about it. However, it seems like it would be a maintenance and performance nightmare. I've toyed with trying it out, since I think LVM snapshots are more robust than ZFS or btrfs right now, but I just haven't worked up the urge to try it.
So, the mechanism I wrote about is similar in concept to many of these, but I believe it's dramatically simpler than LVM snapshots while saving more space than hard links.
What I'm aiming for is something like ZFS snapshots. They manage their space automatically: you don't have to guess how much space a snapshot is going to use, because it gets allocated out of the shared pool as needed. This works because modified blocks are written to new locations instead of overwriting the old ones (copy-on-write), so a snapshot simply keeps referencing the old blocks. And you can create many "lightweight" file systems within one ZFS pool.
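By contrast, the ZFS-style approach can be sketched as a reference-counted block pool: writes always go to fresh blocks, a snapshot is just another set of references, and a block is freed only when nothing refers to it. This is a simplified toy model with hypothetical class and method names, not ZFS's actual data structures.

```python
class CowPool:
    """Toy copy-on-write store where the live file system and snapshots share blocks."""

    def __init__(self):
        self.blocks = {}      # physical block id -> data
        self.refcount = {}    # physical block id -> number of references
        self.next_id = 0
        self.live = {}        # live file system: logical block -> physical id
        self.snapshots = {}   # snapshot name -> frozen logical->physical map

    def _alloc(self, data):
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        self.refcount[pid] = 0
        return pid

    def _unref(self, pid):
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:   # nothing points at it: space is freed
            del self.blocks[pid], self.refcount[pid]

    def write(self, lblock, data):
        pid = self._alloc(data)       # never overwrite in place
        self.refcount[pid] += 1
        old = self.live.get(lblock)
        self.live[lblock] = pid
        if old is not None:
            self._unref(old)

    def snapshot(self, name):
        # A snapshot is just another set of references; no data is copied.
        self.snapshots[name] = dict(self.live)
        for pid in self.live.values():
            self.refcount[pid] += 1

    def destroy_snapshot(self, name):
        for pid in self.snapshots.pop(name).values():
            self._unref(pid)

    def read(self, lblock, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[block_map[lblock]]

    def blocks_in_use(self):
        return len(self.blocks)


pool = CowPool()
for i in range(100):            # a 100-block "file"
    pool.write(i, f"v0-{i}")
for night in range(1, 31):      # 30 nights: snapshot, then change one block
    pool.snapshot(f"night-{night}")
    pool.write(99, f"v{night}-99")
print(pool.blocks_in_use())     # 130: 100 original blocks + 30 changed ones
pool.destroy_snapshot("night-15")
print(pool.blocks_in_use())     # 129: only the version unique to night-15 is freed
```

This also shows why trimming from the middle is no problem here: destroying `night-15` frees only the block version no other snapshot references, and every other snapshot still reads its own data.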
So, I would create a "backups" ZFS file system, and within it one file system for each host. Every night I would snapshot each of those, and I would delete snapshots as they aged so that I kept every nightly snapshot for the first 14 days, weekly ones out to 6 weeks, and monthly ones beyond that.
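That retention schedule can be expressed as a small pruning function. This sketch rests on assumptions the post doesn't spell out: snapshots are named by date (for example `backups/host1@2010-03-31`, a hypothetical dataset and naming scheme), weeks are ISO weeks, and the oldest snapshot in each weekly or monthly bucket is the one kept. It only prints `zfs destroy` commands rather than running them.

```python
from datetime import date, timedelta

DAILY_DAYS = 14    # keep every nightly snapshot this many days
WEEKLY_DAYS = 42   # then one per week out to 6 weeks; one per month after that

DATASET = "backups/host1"  # hypothetical dataset name


def bucket(snap_day, today):
    """Retention bucket for a snapshot, or None if it is in the daily window."""
    age = (today - snap_day).days
    if age <= DAILY_DAYS:
        return None
    if age <= WEEKLY_DAYS:
        year, week, _ = snap_day.isocalendar()
        return ("week", year, week)
    return ("month", snap_day.year, snap_day.month)


def snapshots_to_destroy(snap_days, today):
    """Keep the oldest snapshot in each weekly/monthly bucket; return the rest."""
    keepers = {}   # bucket -> the snapshot kept for it
    doomed = []
    for day in sorted(snap_days):
        b = bucket(day, today)
        if b is None:
            continue              # daily window: keep everything
        if b in keepers:
            doomed.append(day)    # bucket already has an older keeper
        else:
            keepers[b] = day
    return doomed


# Example: 120 nightly snapshots ending today.
today = date(2010, 3, 31)
nightly = [today - timedelta(days=n) for n in range(120)]
doomed = snapshots_to_destroy(nightly, today)
for day in doomed:
    print(f"zfs destroy {DATASET}@{day.isoformat()}")
```

Run nightly, this keeps the count bounded: 15 daily snapshots, a handful of weekly ones, and one per month beyond six weeks.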
ZFS managed all the complexity behind the scenes.
Sean