Saturday July 05, 2008 at 15:00
Subject: ZFS Under Linux: A User Report
Keywords:
Technical, ZFS
Posted by: Sean Reifschneider
Related entries: Why I Like ZFS. by Sean Reifschneider, Saturday July 05, 2008 at 02:38
Putting it all together: The Ultimate Storage Box by Sean Reifschneider, Monday July 07, 2008 at 02:44
A month of ZFS under Linux by Sean Reifschneider, Friday August 08, 2008 at 16:57
As was pointed out by Daniel Webb in a comment to my previous post,
under Linux you have to use FUSE
to use ZFS. He just replied before I had a chance to get the next post
in this series out. :-)
We've been using ZFS under OpenSolaris for the last year or two in
our hosting business for backup servers. It has some really compelling
features (beyond what I mentioned in my last post) when used for backups.
While it has worked well, it hasn't been entirely trouble-free. For a
home backup/storage server I wanted to use ZFS but I absolutely have to
keep the data encrypted.
ZFS under OpenSolaris doesn't currently support on-disc encryption,
though they are working on it. Linux has very mature disc encryption
support; it's in the stock kernels and many installers support it now.
That plus me being very familiar with Linux prompted me to look at ZFS
under Linux again. Read on for my user report.
I had built and tested ZFS on FUSE previously, and it was working.
This time I was planning on some serious use, and had several options
(including running Linux+Crypto on the base machine, exporting the block
devices via iSCSI, and running OpenSolaris on another machine or in a
virtual machine). Because of this, I decided to start my research by
looking at the mailing lists.
ZFS on FUSE hasn't gotten a real release in quite a long time, around
15 months at the time of this writing. However, on the mailing list I saw
a healthy amount of discussion and regular fixes being applied to it. So
for my current set of tests I started with the latest code from version
control.
My test system was running CentOS 5 with ten 250GB hard drives.
Because of CentOS I was running an older kernel and FUSE, but things
worked relatively well. I had some problems initially because one of my
hard drives was failing -- something I already knew about, since the RAID
issues it caused had trashed my file-system, which is how this system
became available for testing ZFS. :-)
I tracked down that bad drive (over a year out of warranty), and
things got better but I still ran into a couple of situations where the
system would lock up while I was running multiple backups.
My theory on this was that with only 2GB of RAM I was just thrashing
the system while running rsyncs. ZFS under FUSE is known to use a lot of
memory in the first place, and rsync version 2 stores a full file list in
memory. So I upgraded the test system to 3GB of RAM, and at that point I
didn't have any problems. I also installed rsync version 3, which can do
incremental file lists, which saved a ton of memory.
However, my performance was pretty limited. This had nothing to do
with ZFS under FUSE. A Celeron 3GHz just doesn't have the huevos for
keeping up with 10 encryption processes plus the ZFS checksumming, etc...
I finally decided that I was happy enough with my testing that I was
ready to bite the bullet and start trying to deploy the final system.
I upgraded the system to 14 500GB drives and a quad core 2.4GHz Core 2
CPU, but with only 2GB of RAM currently. I left a 250GB drive in place
for the system disc (previously I just saved 4GB on every drive and used
the first two drives as a mirror for the system). On this I installed
Ubuntu 8.04 (an LTS release similar to CentOS, but with more recent
software since it was released only a few months ago).
I built the latest development checkout of ZFS for FUSE and set up the
14 500GB drives as a raidz2 (redundant storage with two parity drives) on
top of the encrypted partitions.
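As a rough sanity check on that layout: raidz2 gives up about two drives'
worth of space to parity, so the usable capacity can be estimated as in
the sketch below. This is back-of-the-envelope arithmetic only. The
function name is made up for illustration, and it assumes equal-sized
drives and ignores ZFS metadata overhead.

```python
def raidz2_usable_gb(drive_count, drive_size_gb):
    """Rough usable capacity of one raidz2 group, in GB.

    Assumes every drive is the same size and that two drives' worth
    of space goes to parity. Metadata and padding overhead are ignored.
    """
    if drive_count <= 2:
        raise ValueError("raidz2 needs more than two drives")
    return (drive_count - 2) * drive_size_gb


# The layout from this post: 14 drives of 500GB each.
print(raidz2_usable_gb(14, 500))
```

For the 14 x 500GB layout this comes out at roughly 6TB before overhead.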
I then copied over the "zfs send" copies of the file-systems I had
created on my test system, around 400GB of data. These are low-level
copies of the file-system snapshots, containing the backups I've been
making of a bunch of our laptops. I loaded the dumps back into their
respective ZFS mount-points with no problems, which I was happy about.
These backups took weeks to complete, because they were coming over slim
upload pipes and were further rate-limited so that they wouldn't impact
other use of our networks while backups were running.
I next copied 2.2TB of data over from my storage server. This took a
couple of days, but copied over with no problems at all. I found my old
storage server had only a 100Mbps network adapter in it. I took it down to
add a gigabit adapter, and then realized I had no more free PCI slots
(several were being used for 4-port SATA adapters). That was probably
just as well: the 3.2GHz Celeron in the sending computer could only
handle around 30MB/sec with all the crypto going on...
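The "couple of days" figure lines up with simple arithmetic. The sketch
below assumes decimal units (1TB = 10^12 bytes), a link running at its
full nominal rate with no protocol overhead, and a made-up helper name;
real transfers will be somewhat slower.

```python
def transfer_hours(size_bytes, rate_bytes_per_sec):
    """Hours needed to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_sec / 3600


DATA = 2.2e12              # 2.2TB copied from the old storage server
FAST_ETHERNET = 100e6 / 8  # 100Mbps expressed in bytes/sec (assumes no overhead)
CRYPTO_LIMIT = 30e6        # 30MB/sec, the Celeron's limit with crypto

print(round(transfer_hours(DATA, FAST_ETHERNET), 1))  # hours over 100Mbps
print(round(transfer_hours(DATA, CRYPTO_LIMIT), 1))   # hours at 30MB/sec
```

That works out to about 49 hours over the 100Mbps link, and about 20
hours if the network had not been the bottleneck and the 30MB/sec CPU
limit applied instead.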
At this point I have around 3TB on the system, so I started a "scrub"
-- the ZFS equivalent of a RAID verify. I've done a few of them with no
problems. It's running over 240MB/sec during the verify, which is
perfectly reasonable. The OpenSolaris machines running native ZFS are
seeing around 100MB/sec to 200MB/sec with no crypto, but those have
either twelve 250GB drives or eight 500GB drives.
The only real gotcha is that on OpenSolaris you can access the
snapshots via "/pool/fsname/.zfs/snapshot/snapshotname". Under Linux you
don't have access to the ".zfs" hidden directory. So to access a snapshot
you have to
clone it. This is a very workable solution, but I had to spend a bit of
time hunting around before I figured out what the story was.
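For anyone else hunting, the workaround is the "zfs clone" subcommand,
which takes a snapshot name and a name for the new file-system. The
sketch below only builds and prints the command line rather than running
it. The helper function and the "pool/fsname-restore" clone name are
placeholders invented for illustration.

```python
import shlex


def clone_command(dataset, snapshot, clone_name):
    """Build the argument list for cloning a snapshot into a new dataset."""
    return ["zfs", "clone", f"{dataset}@{snapshot}", clone_name]


# Placeholder names; substitute your own pool, file-system and snapshot.
cmd = clone_command("pool/fsname", "snapshotname", "pool/fsname-restore")
print(shlex.join(cmd))

# To actually run it (needs ZFS installed and sufficient privileges):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Once you have copied what you need out of the clone, "zfs destroy" on
the clone removes it again.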
So far I've had absolutely no issues with it. It's been running
great. I've only been using ZFS under Linux seriously for about the last
month, but so far things are looking really good.
Am I considering switching our OpenSolaris systems to Linux?
Absolutely. Part of that is that we are much more familiar with Linux than
OpenSolaris. Another part is that the hardware support under OpenSolaris
is much more limited than under Linux. It was fairly painful to find SATA
cards that were supported under OpenSolaris.
One final parting note, and this is true of ZFS under both
platforms... The scrub ("zpool scrub") restarts whenever you create or
delete a snapshot. The scrub is an important part of ensuring your data is
happy and healthy, but if the scrub takes longer than the interval between
your snapshots, it will never finish.
With ZFS you can do neat things like create a snapshot, run a "zfs send" to
replicate the data to another machine, then destroy the previous snapshot
and repeat. However, if you do this frequently (say, via cron every
minute), it's going to seriously mess with the scrub.
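To put numbers on that, here is a rough check using the figures from
this post (around 3TB of data, scrubbing at about 240MB/sec). It assumes
a constant scrub rate and that every snapshot event restarts the scrub
from the beginning, as described above. The function names are made up
for illustration.

```python
def scrub_seconds(data_bytes, scrub_rate_bytes_per_sec):
    """Time for one uninterrupted scrub at a constant rate."""
    return data_bytes / scrub_rate_bytes_per_sec


def scrub_can_finish(data_bytes, scrub_rate_bytes_per_sec, snapshot_interval_sec):
    """True if a full scrub fits between two snapshot events."""
    return scrub_seconds(data_bytes, scrub_rate_bytes_per_sec) < snapshot_interval_sec


DATA = 3e12    # about 3TB on the pool
RATE = 240e6   # about 240MB/sec observed during the scrub

print(round(scrub_seconds(DATA, RATE) / 3600, 2))  # scrub length in hours
print(scrub_can_finish(DATA, RATE, 60))            # snapshot every minute
print(scrub_can_finish(DATA, RATE, 24 * 3600))     # snapshot once a day
```

With these numbers a scrub needs about three and a half hours, so a
once-a-minute snapshot cycle never lets it complete while a daily one
does. A snapshot/send/destroy cycle both creates and destroys a snapshot
each round, so the real gap between restarts is even shorter than the
cron interval.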
| Comment |
Daniel Webb Subject: Very promising |
Wow, this sounds very promising. Normally I wouldn't even consider being an early adopter of a filesystem, but in this case, ZFS is so much better than other filesystems with respect to integrity/robustness that I am very tempted.
| Comment |
Larry Hastings Subject: Just in time! |
Just last night I started ordering the parts for my new 10TB RAID. I was planning on going with OpenSolaris--solely so I could use ZFS RAIDZ2. I've been keeping an eye on ZFS-for-FUSE but didn't get the impression it was production ready. If I can run Linux, that's just fabulous news.
The Sun HCL listed a SATA card with a SI 3114 chipset; do those not work in your experience? 'Cause those are a dime a dozen.
| Comment |
doug Subject: 32 bit or 64 bit? |
Just wondering whether your set-up is 32-bit or 64-bit? I've heard ZFS is less buggy in 64-bit.
| Comment |
Ken Roberts Subject: Hardware? |
Hi Sean,
Thanks for such an informative post. I have long been interested in ZFS+Linux, and really appreciate a post from someone who has worked with it in production.
I may be building a big fileserver soon-- would you mind sharing details about your hardware (case, mobo, and sata cards)?
Thank you.
| Comment |
Author:
Sean Reifschneider Subject: Answers about the configuration... |
The system is running 64-bit, primarily so that I could run virtualization in 64-bit if I decide I want to, but also so that I could run the SMP F@H client on it (which requires 64-bit).
As for the Silicon Image 3114 chipset, I initially started with those but had problems. That chipset only works with the non-RAID version of those boards, but all the boards I found had the RAID bit set. This may have been fixed since, but getting it working under OpenSolaris at the time required re-flashing the BIOS on the board, which I had no luck with. This is one of the benefits of running ZFS under Linux that I mentioned: much more hardware support. If you want to have the option of running under OpenSolaris, you'll want to pick another board. I'd expect that my current system is not capable of running directly under OpenSolaris.
As far as details about the setup, I plan to write up more on this soon. However, one of the cards I'm using is the one I wrote about a few days ago. Another is the Supermicro 8-port SATA PCI-X card. And finally I'm using 4 ports on the motherboard.
Sean