
Not to take anything away from the recent 0.6.0 release of ZFS-FUSE, which I think is great… But the thing I'm really interested in is the deduplication code that was released just recently and isn't even in the current OpenSolaris release yet. Read below the fold for more information on dedup in ZFS-FUSE.

For those of you who aren't up on it, the ZFS dedup code works at the block level, which is great news for files that are largely the same but differ in a few small pieces. Copies of the same file (say, from rsyncing multiple similar machines to a backup server) will obviously be deduplicated. But it will also deduplicate (most of) prelinked files that are largely the same on multiple machines but may differ slightly in the prelinking information.
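To make the block-level idea concrete, here is a minimal Python sketch. It is not ZFS code: the `split_blocks` and `dedup_stats` helpers are hypothetical names made up for illustration, and it assumes fixed-size blocks keyed by a SHA-256 checksum. It counts how many blocks two nearly identical files would need if identical blocks were stored only once.

```python
import hashlib

# Assumed block size for illustration only; real pools use their own record size.
BLOCK_SIZE = 4096


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedup_stats(files, block_size=BLOCK_SIZE):
    """Return (total_blocks, unique_blocks) across all the given files."""
    seen = set()
    total = 0
    for data in files:
        for block in split_blocks(data, block_size):
            total += 1
            # Blocks with the same checksum are counted as one stored block.
            seen.add(hashlib.sha256(block).digest())
    return total, len(seen)


# Two four-block "files" that differ only in their third block.
file_a = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE + b"D" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"X" * BLOCK_SIZE + b"D" * BLOCK_SIZE

total, unique = dedup_stats([file_a, file_b])
print(f"{total} blocks written, {unique} unique")  # prints "8 blocks written, 5 unique"
```

Note that this only pays off when the differing bytes stay inside their own blocks. If a change shifts everything after it by a few bytes, the later blocks no longer line up and fixed-size block matching finds nothing to share.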

The 0.6.0 ZFS-FUSE release doesn't include dedup, not surprisingly. I did some digging around and I found this git repository which has a version of ZFS-FUSE that includes the dedup code:

git clone '' zfs-fuse-dedupe

I've installed this on a test system and am currently running some stress testing of it and some basic testing of dedup, and everything seems to be working as expected. It'll take weeks or more before I'm ready to try putting any real data on it though. My first test, which I started Sunday, was interrupted by what I suspect was a hardware problem with one of my drive enclosures.

I've been running with “dedup=verify”, which I believe ought to be the default. The actual default is to treat blocks that have the same checksum as the same block. “dedup=verify” takes it one step further and verifies that the contents are indeed identical before deduplicating the block.
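The difference between the two modes can be sketched in a few lines of Python. This is a toy model, not how ZFS implements its dedup table: the `DedupStore` class and its `checksum` parameter are hypothetical, and the deliberately weak checksum in the usage part exists only to force a collision that a real checksum would make extremely unlikely.

```python
import hashlib


class DedupStore:
    """Toy dedup table: checksum -> blocks stored under that checksum."""

    def __init__(self, verify=False, checksum=None):
        self.verify = verify
        # Assumed default checksum for this sketch.
        self.checksum = checksum or (lambda b: hashlib.sha256(b).digest())
        self.table = {}

    def write(self, block):
        """Store a block; return True if it was deduplicated."""
        candidates = self.table.setdefault(self.checksum(block), [])
        if candidates:
            if not self.verify:
                # Trust the checksum: same checksum means same block.
                return True
            if any(existing == block for existing in candidates):
                # Checksum matched and a byte-for-byte compare confirmed it.
                return True
        # New checksum, or a collision caught by verify: store the block.
        candidates.append(block)
        return False

    def stored_blocks(self):
        return sum(len(blocks) for blocks in self.table.values())


# A deliberately terrible checksum (first byte only) to force a collision.
weak = lambda b: b[:1]

trusting = DedupStore(verify=False, checksum=weak)
trusting.write(b"ab")
trusting.write(b"ac")  # wrongly treated as a duplicate of b"ab"

careful = DedupStore(verify=True, checksum=weak)
careful.write(b"ab")
careful.write(b"ac")  # collision detected, stored separately

print(trusting.stored_blocks(), careful.stored_blocks())  # prints "1 2"
```

The cost of verifying is an extra read and compare whenever a checksum matches, which is the trade-off for not having to rely on the checksum alone.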

One other thing I've really been impressed with in that version is the memory consumption. My system running an old version of ZFS-FUSE is currently consuming 2.6GB of RAM. The new test system running the above version never went over 600MB. That's good, because my test system is maxed out at 2GB of RAM (it's an Atom 330 system with the older 2GB chipset).

I hope to convert my main home storage server over to 0.6.0 at some point soon here. But first I'd like to run some stress testing… It's not a huge deal if I lose the data on there, it's all backed up. But I'd prefer to get a bit more comfortable with it first.
