
OpenSolaris has this new file-system called ZFS. It's insanely cool. The downside, which I haven't seen mentioned in any of the ZFS discussions so far, is that it's still very much beta. I've heard that it was 90% complete in Solaris, and then they said “Hey, OpenSolaris guys! Why don't you finish it up?” In my experience, as long as I reboot before and after running a “zpool scrub”, I don't see any ZFS hangs. Still, ZFS has some really nice features.

First of all, something most places discussing ZFS don't mention: it's very much beta. Combine this with NexentaOS's “Alpha” status, and you probably don't want to put it into production quite yet. That said, I have been running it for file storage for a few months, and it has worked. I have found, however, that if I reboot immediately before running “zpool scrub”, the scrub completes instead of hanging partway through. I then reboot again immediately after it completes, around 18 hours later (for 1.8TB).

ZFS is a file-system tightly coupled with a logical volume manager and a RAID system. On top of this, every block in the file-system, data and metadata alike, is checksummed, and snapshots can easily be created and removed. All of this combines into an extremely cool user experience.

For example, the “RAID-Z” it includes (in addition to RAID-0 and RAID-1 style striping and mirroring) doesn't require that the discs be the same size, although each disc in a RAID-Z group only contributes as much space as the smallest one. The file-system manages writes so that data is stored redundantly across multiple drives. Because everything is checksummed, corruption, whether from accidentally overwriting part of a disc or from a fault at the controller or disc level, can be detected with a “zpool scrub”. If you have redundancy, the damaged data is even repaired during the scrub.
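To make the scrub-and-repair idea concrete, here is a minimal Python sketch of a toy two-way mirror. It is not how ZFS is implemented; the class and method names are hypothetical, and SHA-256 stands in as the checksum (it is one of the algorithms ZFS can be configured to use). What it shows is that the expected checksum is stored apart from the data, so a scrub can tell which copy is bad and rewrite it from a copy that still matches.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def checksum(data: bytes) -> str:
    # SHA-256: one of the checksum algorithms ZFS can be configured to use.
    return hashlib.sha256(data).hexdigest()


@dataclass
class MirroredPool:
    """Toy two-way mirror (hypothetical, for illustration only).

    Each block is written to both discs. The expected checksum is kept apart
    from the data, much as ZFS keeps it in the parent block pointer, so a
    corrupted copy cannot vouch for itself.
    """
    discs: List[Dict[int, bytes]] = field(default_factory=lambda: [{}, {}])
    checksums: Dict[int, str] = field(default_factory=dict)

    def write(self, block_id: int, data: bytes) -> None:
        self.checksums[block_id] = checksum(data)
        for disc in self.discs:
            disc[block_id] = data

    def scrub(self) -> Dict[str, int]:
        """Verify every copy of every written block; fix bad copies from a good one."""
        errors = repaired = 0
        for block_id, expected in self.checksums.items():
            copies = [disc.get(block_id) for disc in self.discs]
            good = [c for c in copies if c is not None and checksum(c) == expected]
            for disc, copy in zip(self.discs, copies):
                if copy is None or checksum(copy) != expected:
                    errors += 1
                    if good:  # redundancy available: rewrite from a verified copy
                        disc[block_id] = good[0]
                        repaired += 1
        return {"errors": errors, "repaired": repaired}


pool = MirroredPool()
pool.write(0, b"hello")
pool.write(1, b"world")
pool.discs[1][0] = b"hellp"  # simulate silent corruption on the second disc
print(pool.scrub())  # {'errors': 1, 'repaired': 1}
print(pool.scrub())  # {'errors': 0, 'repaired': 0}
```

If both copies of a block are damaged, this scrub still reports the error but has no good copy to repair from, which is why repair depends on having redundancy.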

Another nice thing about combining the file-system and the RAID is that a repair or check only touches the blocks that actually contain data. A traditional RAID has no idea which blocks are in use, so it has to rebuild or check its entire size. A test or repair of a 1TB volume holding only 1MB of data therefore takes seconds on ZFS, instead of potentially hours on a traditional RAID-5 setup.
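A back-of-the-envelope Python sketch of why this matters. The throughput figure is an assumption derived from the roughly 18-hour scrub of 1.8TB mentioned earlier (about 27.8 MB/s) and is used purely to make the arithmetic concrete; real rebuild and scrub rates vary widely. The helper names are hypothetical.

```python
TB = 10**12
MB = 10**6

# Assumed throughput, derived from the ~18-hour scrub of 1.8TB described above.
# It is only a ballpark for illustration; real rates depend on the hardware.
OBSERVED_RATE = 1.8 * TB / (18 * 3600)  # bytes per second, about 27.8 MB/s


def bytes_to_scan(volume_size: int, used: int, knows_allocation: bool) -> int:
    """Hypothetical helper: how much a check or rebuild has to read.

    A block-level RAID cannot tell which blocks hold data, so it scans the whole
    volume. ZFS walks its own block tree, so it only visits allocated blocks.
    """
    return used if knows_allocation else volume_size


def scan_seconds(nbytes: int, rate: float = OBSERVED_RATE) -> float:
    return nbytes / rate


for label, aware in (("traditional RAID", False), ("ZFS scrub", True)):
    secs = scan_seconds(bytes_to_scan(1 * TB, 1 * MB, aware))
    print(f"{label}: {secs / 3600:.2f} hours ({secs:.3f} s)")
```

With those numbers, scanning the full 1TB volume works out to about 10 hours, while scanning only the 1MB actually in use takes a few hundredths of a second.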

Snapshots on ZFS are effortless. Unlike Linux LVM, where you have to allocate a “copy on write” volume to hold the changed blocks for each snapshot, in ZFS you just run “zfs snapshot” to create a snapshot and “zfs destroy” to remove it. You can also create sub-file-systems as easily as you create directories, with “zfs create”. For example, you could create one of these file-systems for the home directory of every user on the system, even if there are thousands of them.
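Here is a minimal Python sketch of copy-on-write snapshots, under the simplifying assumption that each file occupies a single block. It is a toy model with hypothetical names, not ZFS's on-disk format. Because writes always go to new blocks, a snapshot only has to remember which blocks were live at that moment; nothing needs to be reserved in advance the way an LVM snapshot volume does. Real ZFS is cheaper still, since a snapshot just preserves a pointer to the root of the block tree rather than copying a map.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CowFilesystem:
    """Toy copy-on-write file-system (hypothetical): one block per file."""
    blocks: Dict[int, bytes] = field(default_factory=dict)  # block id -> data
    live: Dict[str, int] = field(default_factory=dict)      # file -> block id
    snapshots: Dict[str, Dict[str, int]] = field(default_factory=dict)
    next_id: int = 0

    def write(self, name: str, data: bytes) -> None:
        # Never overwrite in place: write a new block, then repoint the file at it.
        self.blocks[self.next_id] = data
        self.live[name] = self.next_id
        self.next_id += 1
        self._free_unreferenced()

    def read(self, name: str, snapshot: Optional[str] = None) -> bytes:
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]

    def snapshot(self, snap: str) -> None:
        # Only the file -> block map is copied; the data blocks are shared.
        self.snapshots[snap] = dict(self.live)

    def destroy(self, snap: str) -> None:
        del self.snapshots[snap]
        self._free_unreferenced()

    def _free_unreferenced(self) -> None:
        # A block is freed once neither the live tree nor any snapshot uses it.
        in_use = set(self.live.values())
        for table in self.snapshots.values():
            in_use.update(table.values())
        for block_id in list(self.blocks):
            if block_id not in in_use:
                del self.blocks[block_id]


fs = CowFilesystem()
fs.write("notes.txt", b"v1")
fs.snapshot("monday")
fs.write("notes.txt", b"v2")
print(fs.read("notes.txt"), fs.read("notes.txt", "monday"))  # b'v2' b'v1'
fs.destroy("monday")
print(len(fs.blocks))  # 1
```

Destroying a snapshot in this model simply frees the blocks that nothing references any more.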

A “df” shows the status of every file-system, whether top-level or nested below another. If you create a ZFS file-system for every user on your system, a quick “df” shows how much space each individual user is consuming. You could also let users create or delete snapshots of their own home directories independently. If one user wants to roll back, no other user's files are affected.
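For the per-user layout, a hedged Python sketch that builds the commands and, by default, only prints them. The “zfs create”, “zfs snapshot”, and “zfs rollback” subcommands are real; the dataset name tank/home, the snapshot names, and the helper functions are assumptions for illustration, and tank/home must already exist before per-user file-systems can be created under it.

```python
import shlex
import subprocess
from typing import List, Sequence

HOME_ROOT = "tank/home"  # hypothetical parent dataset; must already exist


def per_user_commands(users: Sequence[str], snap: str) -> List[List[str]]:
    """Hypothetical helper: a file-system plus an initial snapshot per user."""
    cmds: List[List[str]] = []
    for user in users:
        dataset = f"{HOME_ROOT}/{user}"
        cmds.append(["zfs", "create", dataset])
        cmds.append(["zfs", "snapshot", f"{dataset}@{snap}"])
    return cmds


def rollback_command(user: str, snap: str) -> List[str]:
    # Only this user's file-system is rolled back; other users are untouched.
    return ["zfs", "rollback", f"{HOME_ROOT}/{user}@{snap}"]


def run(cmds: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))  # show what would be executed
        else:
            subprocess.run(list(cmd), check=True)


run(per_user_commands(["alice", "bob"], "initial"))
run([rollback_command("alice", "initial")])
```

Note that “zfs rollback” returns to the most recent snapshot by default; rolling back to an older one requires -r, which destroys the newer snapshots. Running “zfs list -r -o name,used tank/home” is another way to see per-user consumption alongside “df”.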

Another current constraint of ZFS is that you cannot use it for your root file-system. This is being worked on, and it sounds like they expect to have it resolved in the next few months. Also, it's currently only available on OpenSolaris kernels. I've heard talk of porting it to DragonFly BSD and MacOS. I imagine it will start getting attention in Linux before long, but it's hard to say how easy the port will be. Since it is a file-system, a RAID system, and a volume manager all in one, it may be fairly challenging to port.

I've been experimenting with it on NexentaOS. This is an OS that uses the OpenSolaris kernel and its required components, but with apt-get and much of the GNU/Debian software available in its repositories. As I said earlier, it's currently in Alpha, but on some of my systems it installs and runs reasonably well. Note that there is a learning curve, because other Solaris components come along with the kernel. Instead of /etc/rc*.d, you have to learn to deal with the Service Management Framework (SMF). This imposes some annoying restrictions; for example, you can't turn off inetd without SSH also going away.

SMF is very cool, though; it just takes time to learn. Other software, such as “ps”, follows Solaris conventions instead of Linux ones. And the networking configuration is painful: switching from DHCP to a static address is a nightmare. Even though I followed a recipe for changing things in the 15 different places I needed to, the network didn't come up after a reboot. This is because (wait for it) shutting down DHCP during the reboot munged many of the settings I had made. As near as I can tell, switching from DHCP to static is something you have to be at the console to do.

NexentaOS is a good way to get and try out OpenSolaris, particularly if you come from Linux. In fact, as near as I can tell, it's the only way to get an OpenSolaris install right now that doesn't involve building the kernel from scratch and boot-strapping.

Likewise, ZFS is still pretty early in its life-cycle. However, it is a very solid concept, and I am looking forward to using it more over the next couple of years.
