As Daniel Webb pointed out in a comment on my previous post, under Linux you have to use FUSE to run ZFS. He just replied before I had a chance to get the next post in this series out. :-)

We've been using ZFS under OpenSolaris for the last year or two in our hosting business for backup servers. It has some really compelling features (beyond what I mentioned in my last post) when used for backups. While it has worked well, it hasn't been entirely trouble-free. For a home backup/storage server I wanted to use ZFS, but I absolutely had to keep the data encrypted.

ZFS under OpenSolaris doesn't currently support on-disc encryption, though they are working on it. Linux has very mature disc encryption support; it's in the stock kernels, and many installers support it now. That, plus my being very familiar with Linux, prompted me to look at ZFS under Linux again. Read on for my user report.

I had built and tested ZFS on FUSE previously, and it was working. I was planning to put it to some serious use, and had several options (including running Linux+crypto on the base machine, exporting the block devices via iSCSI, and running OpenSolaris on another machine or in a virtual machine). Because of this, I decided to start my research by looking at the mailing lists.

ZFS on FUSE hasn't gotten a real release in quite a long time, around 15 months at the time of this writing. However, on the mailing list I saw a healthy amount of discussion and regular fixes being applied to it. So for my current set of tests I started with the latest code from version control.

My test system was running CentOS 5 with ten 250GB hard drives. Because of CentOS I was running an older kernel and FUSE, but things worked relatively well. I had some problems initially because one of the hard drives was failing – something I already knew about, since RAID issues from that drive had trashed the file-system and freed this system up for ZFS testing. :-)

I tracked down that bad drive (over a year out of warranty), and things got better but I still ran into a couple of situations where the system would lock up while I was running multiple backups.

My theory on this was that with only 2GB of RAM I was just thrashing the system while running rsyncs. ZFS under FUSE is known to use a lot of memory in the first place, and rsync version 2 stores the full file list in memory. So I upgraded the test system to 3GB of RAM, and at that point I didn't have any problems. I also installed rsync version 3, which builds the file list incrementally and saved a ton of memory.
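
The backup jobs themselves were plain rsync runs, roughly like the sketch below (the host, paths, and bandwidth cap here are made-up examples); with rsync 3 on both ends the file list is built incrementally rather than held in memory all at once:

    # -a: archive mode, -H: preserve hard links;
    # --bwlimit (KB/s) keeps a backup run from hogging the uplink.
    rsync -aH --delete --bwlimit=2000 laptop:/home/ /tank/backups/laptop/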

However, my performance was pretty limited. This had nothing to do with ZFS under FUSE, though: a 3GHz Celeron just doesn't have the huevos to keep up with 10 encryption processes plus the ZFS checksumming, etc…

I finally decided that I was happy enough with my testing that I was ready to bite the bullet and start trying to deploy the final system.

I upgraded the system to 14 500GB drives and a quad-core 2.4GHz Core 2 CPU, but with only 2GB of RAM for now. I left a 250GB drive in place for the system disc (previously I had just set aside 4GB on every drive and used the first two drives as a mirror for the system). On this I installed Ubuntu 8.04 (an LTS release similar to CentOS, but with more recent software, since it was released only a few months ago).

I built the latest development checkout of ZFS on FUSE and set up the 14 500GB drives as a raidz2 (redundant storage with two parity drives) on top of the encrypted partitions.
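
The layout was roughly as follows (a minimal sketch: the device names, the “crypt*” mapping names, and the pool name “tank” are placeholders, and luksFormat destroys whatever is on the partition):

    # Encrypt each data partition and open it as a device-mapper node.
    for dev in /dev/sd[b-o]1; do
        cryptsetup luksFormat $dev                     # one-time, destructive
        cryptsetup luksOpen $dev crypt$(basename $dev)
    done

    # Build the double-parity pool on top of the decrypted nodes.
    zpool create tank raidz2 /dev/mapper/cryptsd*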

I then copied over the “zfs send” copies of the file-systems I had created on my test system, around 400GB of data. These are low-level copies of the file-system snapshots, containing the backups I've been making of a bunch of our laptops. I loaded the dumps back into their respective ZFS mount-points with no problems, which I was happy about. These backups took weeks to complete, because they were coming over slim upload pipes and further rate-limited so that they wouldn't impact other use of our networks while backups were running.
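
A “zfs send” dump is just a byte stream, so the round trip looks something like this (the pool, file-system, and snapshot names are examples):

    # On the old box: serialize a snapshot of the file-system to a file.
    zfs snapshot tank/laptop1@migrate
    zfs send tank/laptop1@migrate > /mnt/transfer/laptop1.zfs

    # On the new box: recreate the file-system from the dump.
    zfs receive tank/laptop1 < /mnt/transfer/laptop1.zfs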

I next copied 2.2TB of data over from my storage server. This took a couple of days, but copied over with no problems at all. I found my old storage server had only a 100Mbps network adapter in it. I took it down to add a gigabit adapter, and then realized I had no more free PCI slots (because several were being used for 4-port SATA adapters). That was probably just as well; the 3.2GHz Celeron in the sending machine could only handle around 30MB/sec with all the crypto going on…

At this point I had around 3TB on the system, so I started a “scrub” – the ZFS equivalent of a RAID verify. I've done a few of them with no problems. It runs at over 240MB/sec during the verify, which is perfectly reasonable. The OpenSolaris machines running native ZFS see around 100MB/sec to 200MB/sec with no crypto, but those have 12 250GB drives or 8 500GB drives.
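
Starting and checking a scrub is a one-liner each (again assuming a pool named “tank”):

    zpool scrub tank     # kick off a full read-and-verify of the pool
    zpool status tank    # reports scrub progress and any checksum errors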

The only real gotcha is that on OpenSolaris you can access snapshots via “/pool/fsname/.zfs/snapshot/snapshotname”. Under Linux you don't have access to the hidden “.zfs” directory, so to access a snapshot you have to clone it. This is a very workable solution, but I had to spend a bit of time hunting around before I figured out what the story was.
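
In practice that looks something like the following (the snapshot and clone names are illustrative):

    # Make a snapshot browsable by cloning it into a regular file-system.
    zfs clone tank/laptop1@2008-09-01 tank/restore
    ls /tank/restore             # pull out whatever you need
    zfs destroy tank/restore     # drop the clone when finished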

So far I've had absolutely no issues with the new system; it's been running great. I've only been using ZFS under Linux seriously for about the last month, but so far things are looking really good.

Am I considering switching our OpenSolaris systems to Linux? Absolutely. Part of that is that we are much more familiar with Linux than OpenSolaris. Another part is that the hardware support under OpenSolaris is much more limited than under Linux. It was fairly painful to find SATA cards that were supported under OpenSolaris.

One final note, and this is true of ZFS under both platforms… A “zfs scrub” restarts whenever you create or destroy a snapshot. The scrub is an important part of ensuring your data is happy and healthy, but if a scrub takes longer than the interval between your snapshots, it will never finish. With ZFS you can do neat things like create a snapshot, run a “zfs send” to replicate the data to another machine, then destroy the previous snapshot and repeat. However, if you do this frequently (say, via cron every minute), it's going to seriously mess with the scrub.
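
For concreteness, a replication cycle like that might look something like this sketch (the pool, file-system, snapshot, and host names are hypothetical); every snapshot created or destroyed in the loop restarts an in-progress scrub:

    # Incremental replication: send only what changed since the last cycle.
    zfs snapshot tank/data@new
    zfs send -i tank/data@prev tank/data@new | \
        ssh backuphost zfs receive tank/data
    zfs destroy tank/data@prev
    zfs rename tank/data@new tank/data@prev   # roll names for the next cycle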
