Monday July 17, 2006 at 12:55
Subject: Nexenta and ZFS: The State of the Union.
Keywords:
Nexenta, OpenSolaris, Technical, ZFS
Posted by: Sean Reifschneider
Related entries: "ZFS, the latest word in file-systems." by Sean Reifschneider, Wednesday May 17, 2006 at 01:46
I've been using ZFS on a backup server, via the Nexenta distribution,
for around 3 months now. I've also been testing the latest Nexenta release.
While ZFS is insanely cool, the road has been far from smooth. Here are
some of my impressions.
The concepts behind ZFS are, of course, exceedingly cool. All data
written to disc is checksummed, so the file-system can detect corruption at
any stage, including a disc or bus silently corrupting data you write to
it. The transparent compression is also great; it's saving us around 25%
on storage space.
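To make the checksumming idea concrete, here is a toy Python model. ToyPool is a hypothetical class, not ZFS code. It stores a checksum for each block separately from the data, verifies that checksum on every read, and offers a scrub() that re-checks every block. SHA-256 from hashlib is only a stand-in; ZFS's real checksum algorithms and on-disc layout are different.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Hypothetical toy model: each block's checksum is kept apart from its data."""
    blocks: dict = field(default_factory=dict)     # block id -> bytes "on disc"
    checksums: dict = field(default_factory=dict)  # block id -> expected SHA-256 digest

    def write(self, block_id, data: bytes) -> None:
        # The checksum is computed at write time and stored separately.
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id) -> bytes:
        # Every read is verified, so silent corruption surfaces as an error
        # instead of bad data being handed back to the application.
        data = self.blocks[block_id]
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            raise OSError(f"checksum mismatch in block {block_id!r}")
        return data

    def scrub(self) -> list:
        # A scrub walks every block and returns the ids whose checksum fails.
        return [bid for bid, data in self.blocks.items()
                if hashlib.sha256(data).digest() != self.checksums[bid]]


pool = ToyPool()
pool.write(0, b"hello")
pool.write(1, b"world")
pool.blocks[1] = b"w0rld"  # simulate a disc or bus silently changing the data
print(pool.scrub())        # [1]
```

In real ZFS a pool with redundancy (mirror or RAID-Z) can also repair the bad block from a good copy; this sketch only detects the damage.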
Nexenta, if you aren't familiar with it, is an OpenSolaris
distribution that combines many parts of the OpenSolaris codebase (kernel,
SMF, etc.) with apt-get and the GNU tools from Ubuntu Dapper. It's currently
in its fifth alpha release; the first version I tested was the second
alpha.
The cool thing about Nexenta is that it includes apt-get to easily
install packages. One of the guys at the NCLUG meeting last week is a
Solaris guy, and he was griping about not being able to just install
packages for Postfix and Mailman. Nexenta should help with that, except
that with Alpha 5 I haven't been able to get a single package to install
successfully via "apt-get"; it always errors out. Luckily, the install CD
includes many packages, and for others I can download the Debian package
directly from the Nexenta web site. I'm installing pretty minimal sets of
packages on these boxes, so that's not a big problem.
The bigger problem is that the kernel update packages come pretty
slowly. This wouldn't be as much of a problem except that I'm having
serious problems with ZFS, and would like the ability to try newer kernels.
The OpenSolaris developers seem to be releasing new kernels every 2 weeks
like clockwork, but I have yet to see a kernel update package available.
In the early days of Nexenta, kernel updates via packages
weren't possible, but as of Alpha 5 they are supposed to be. However,
"apt-get dist-upgrade" doesn't show anything newer than what the CD
installed. Alpha 5 has been out for just over a month, so I'd expect at
least some updates by now.
As I mentioned above, ZFS is having some issues. The initial install
was running fairly well, but once the amount of data on the system
reached around 1TB, the "scrub" process (which goes through the pool and
verifies checksums) would hang the ZFS storage pool after running for about
12 hours. This "scrub" process is also how you recover from a disc
failure.
I'm happy to report that my testing of Alpha 5 shows that this is
resolved. I was running some tests with a simple set of 1TB of data and
running repeated scrubs. It would do the scrub in around 3 hours, and I
kept running them for 4 days without the pool hanging.
So, my plan was to migrate the data from the existing ZFS to the new
one on the new box I'm setting up. ZFS includes this nice backup utility
that will dump the raw file-system snapshots, so I could preserve all the
data and snapshots on the original system, given a bit of scripting to wrap
around it.
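The "bit of scripting" could look roughly like this minimal Python sketch. It assumes the backup utility in question is zfs send / zfs recv (sending the oldest snapshot as a full stream, then each later snapshot with -i as an increment from the one before) and that the new box is reachable over ssh. The function name, host and dataset names are hypothetical. It only builds the pipelines so they can be reviewed before anything is run.

```python
import shlex


def migration_pipelines(src_fs, snapshots, dest_host, dest_fs):
    """Build shell pipelines that copy src_fs and its snapshots to dest_host.

    snapshots: snapshot names (without the '@'), ordered oldest to newest.
    Assumes plain dataset names: shlex.quote only protects the local shell,
    not the remote shell that ssh hands the "zfs recv" command to.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot to send")
    # Receiving side: pipe each stream over ssh into "zfs recv" on the new box.
    recv = f"ssh {shlex.quote(dest_host)} zfs recv {shlex.quote(dest_fs)}"
    names = [shlex.quote(f"{src_fs}@{snap}") for snap in snapshots]
    # The oldest snapshot goes over as a full stream...
    pipelines = [f"zfs send {names[0]} | {recv}"]
    # ...and each later one as an incremental stream from its predecessor.
    for prev, cur in zip(names, names[1:]):
        pipelines.append(f"zfs send -i {prev} {cur} | {recv}")
    return pipelines


for cmd in migration_pipelines("tank/backup", ["daily-1", "daily-2", "daily-3"],
                               "newbox", "newtank/backup"):
    print(cmd)
```

Printing instead of executing is deliberate: the incremental steps are the risky part, so it helps to run them one at a time and check the pool after each.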
Sadly, in my testing of copying a file-system with its 10 or so
snapshots, the initial file-system copied over fine, but receiving the
incremental snapshots caused a kernel panic and the system rebooted.
I guess I need to set up a test system with Alpha 2 on it, create a ZFS pool,
and then try upgrading to Alpha 5 to see whether the pool survives the
upgrade.
In general, I have found the Nexenta user-space to be pretty solid.
The only real problems I've had have been related to ZFS. It still seems
that ZFS is not ready for prime time. Interestingly enough, it looks like
ZFS is being used in Solaris 10 now: Sun announced a new server that has
24 discs in a 4U chassis, and it uses ZFS for the file-system.
I imagine that the attention that gets will translate pretty quickly to
more stability in OpenSolaris.
Finally, there is a Google Summer of Code project to port ZFS to
Linux. I imagined that would be a pretty tough port, because of the way
ZFS unifies RAID, LVM, and the file-system. However, at BLUG, Kevin looked
up the project and found that it's being ported as a user-space
file-system. That's not really usable for many things, but I could
definitely make use of it for some archive storage needs. The checksummed
data would be really nice to have on Linux.
Comment
Posted by: Daniel Webb
Subject: ZFS Linux port
Even as a user-space port, I'm drooling. For low-performance applications like I normally use (backup, music collection, subversion repositories) this would be totally awesome!
Comment
Posted by: Sean Reifschneider
Subject: Absolutely...
Yeah, that's the way I feel too. It would be great if it were in-kernel, so I could use it for my /home directory on top of a dm-crypt volume, and for my root file-system as well. However, because ZFS touches not only the file-system but also unifies RAID and LVM into it, I'm not expecting that to happen for quite some time. They had to overcome the same issues on Solaris, but I imagine that will make it hard to get ZFS into Linux at all, and the licensing may prevent it from ever going into the kernel.
The benefits it provides are huge. Its snapshots are just incredible. For example, I've used them in cases where I had a huge mass of historic data I didn't want to get rid of yet: I copied it over, made a snapshot, and then deleted the historic stuff. In 6 months, when I'm sure I don't need it, I can destroy the snapshot and nuke the data quite quickly; in this case, much more quickly than the original "rm -rf" of a few million files went.
Also, the checksumming is great. I had done something similar at the application level for one project in particular: a simple archive package I wrote for a client years and years ago. About a year after I wrote it, a kernel upgrade brought them a new version of ext3 that was very unstable. Their archive needed to be fscked every week, and files were often destroyed. Luckily, I had written a checksum of each file to the database, so I could detect the damage quickly and recover the file from a backup; I ran the check job daily from cron. I switched the data volume to JFS, and it was happy after that. Now, of course, ext3 is stable, but for a while it was not.
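A per-file checksum database like the one described above might be sketched as follows. This is not the original archive package; it assumes SQLite via Python's sqlite3 module and SHA-256 via hashlib, and the table and function names are hypothetical. record() is called when a file enters the archive, and check() returns every archived path that is missing or whose contents no longer match.

```python
import hashlib
import os
import sqlite3


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_db(db_path):
    """Open (or create) the hypothetical checksum table."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS checksums "
                 "(path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)")
    return conn


def record(conn, path):
    """Store a file's checksum when it is added to the archive."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT OR REPLACE INTO checksums (path, sha256) VALUES (?, ?)",
                     (path, file_digest(path)))


def check(conn):
    """Return archived paths that are missing or whose contents changed."""
    bad = []
    for path, expected in conn.execute("SELECT path, sha256 FROM checksums ORDER BY path"):
        if not os.path.exists(path) or file_digest(path) != expected:
            bad.append(path)
    return bad
```

A daily cron job would open the database, call check(), and alert on any path it returns so that file can be restored from backup.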
Sean
Comment
Posted by: David Comay
Subject: Re: Nexenta and ZFS: The State of the Union.
Very interesting observations. Have you joined the ZFS community at
http://www.opensolaris.org/os/community/zfs/ and the zones-discuss
mailing list to report the problem you've seen?
As for being ready for prime time, while there are clearly bugs to be fixed
and many future enhancements in the works, ZFS is being used in production
at many places. At Sun, we've been using it in engineering for several
years now and we have found it invaluable from a performance and reliability
perspective (not even counting the greater ease of use).
Comment
Posted by: Sean Reifschneider
Subject: Contacting the ZFS community?
I haven't contacted the ZFS community. Partly, this is because Nexenta is always pretty far behind on the kernels, as I mentioned in the original post. One of the things that I'm sure the community would want to know, and I'd want to provide, is feedback on whether the problem exists in the current kernel.
You paint a pretty rosy picture of the stability of ZFS, but in my experience ZFS is pretty immature. Admittedly, I've only tried the versions in Nexenta Alpha 2 and 5; however, both of those had serious bugs. In one, the zpool would stop responding after some amount of I/O to the pool. In the other, receiving a zfs incremental backup would cause a panic. I'd consider both of those critical bugs.
On the plus side, I have never found corrupted data in the ZFS pool. Admittedly, on the system with the first problem above, the zpool locks up partway through a scrub (though it also locks up the same way after about 2 weeks of normal I/O to the file-system), so a scrub has never checked the whole pool there. I've been running the other system for less than a week, but in my initial testing it seemed immune to the above bug, so I'm just avoiding "zfs recv".
Undoubtedly, Sun has access to more information about where there be dragons in ZFS. Perhaps they are running it in production while avoiding those problem areas. For example, maybe my problems are all related to RAID-Z, and it would be fine if I were running on hardware RAID or mirror pairs.
Sean
Comment
Posted by: taggart
Subject: zfs update
It's been 5 months since you wrote this entry. Any updates?
Comment
Posted by: Sean Reifschneider
Subject: Not many updates...
I don't have many updates on it. Alpha 5 was just recently released, and I haven't had a chance to upgrade to it and run any significant tests. Our Alpha 4 system has serious issues with cron dying, and "zpool scrub" never completes; it just loops back and starts over again. Still, the system rarely actually locks up in my use. I basically fixed cron by setting up a daemon that watches for cron to stop responding and then dies. I've been reluctant to do any software updates on the system because the kernel is mostly working well.
Some of the problems I'd had in the past with mirror reachability and performance may be resolved now that tummy.com is a Nexenta mirror.
My plan is to test Alpha 5 at some point, but I currently have around a dozen projects of that sort on the back burner, so it may be a while.
Sean