
The data on computer systems is what makes them valuable. Most file-systems and RAID designs go to great lengths to make sure that their own meta-data are correct, but do very little to protect the user data that they contain. ZFS, in contrast, checksums everything that's written to disc: meta-data as well as file contents. It can detect when data on the disc has silently been corrupted and, given redundancy, recover from it. Read on for more of the benefits of ZFS.

Much discussion in the Linux world has come about because ZFS is a file-system that also has a deep understanding of the RAID layer beneath it. ZFS includes RAID and volume-manager-like functionality within it. Why not use an external RAID/LVM?

We now have very fast file-system make times, even on large discs, with most file-systems. EXT3 is a notable exception, but JFS, Reiser and XFS can all make file-systems on large devices in a few seconds. On a newly installed system, though, putting these file-systems on a RAID array with redundancy still requires a lengthy initial sync of the array, usually hours. When you create a ZFS pool with redundancy, after a few seconds you have both the RAID and the file-system created and fully synced.

This also applies to RAID rebuilds and verifies. If you have a 10TB array with only a couple of small files on it, say a few MB, a ZFS verify or rebuild will take only a few seconds. Because the file-system and the RAID are tightly coupled, the system only verifies or reconstructs the used data, not the nearly 10TB of unused space as the normal Linux “md” driver would.
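
To put rough numbers on that difference, here is a back-of-the-envelope sketch. The 10TB array size comes from the example above; the 5MB of used data and the 100MB/s sustained read rate are assumed figures chosen only for illustration, and the calculation ignores seek time and meta-data overhead.

```python
def resync_seconds(bytes_to_read: int, throughput_bytes_per_s: float) -> float:
    """Time to read the given number of bytes at a sustained rate."""
    return bytes_to_read / throughput_bytes_per_s


TB = 10**12
MB = 10**6

ARRAY_SIZE = 10 * TB    # array size from the text
USED_DATA = 5 * MB      # "a few MB"; 5MB is an assumed figure
THROUGHPUT = 100 * MB   # assumed sustained read rate, bytes per second

whole_device_s = resync_seconds(ARRAY_SIZE, THROUGHPUT)
used_only_s = resync_seconds(USED_DATA, THROUGHPUT)

print(f"whole-device resync: {whole_device_s / 3600:.1f} hours")
print(f"used-data resync:    {used_only_s:.2f} seconds")
```

Under these assumptions the whole-device pass takes more than a day, while the used-data pass is dominated by fixed overhead rather than by the amount of data read.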

A common failure case with most RAID systems is that the array is in service for a long time and then a drive falls out of it. You replace the drive, and the system reads all the data on the remaining drives to reconstruct the missing one. But then one of those remaining drives turns out to have a marginal sector in a part of the drive that isn't actively being used for storing data, and now a second drive falls out of the array. If you don't have multiple redundancy, you now have to recover manually.
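
The scenario can be modelled with a toy sketch. The function name `rebuild_failures`, the list-of-sectors drive model and the `allocated_stripes` parameter are all hypothetical, made up for this illustration; this is not how md or ZFS is implemented.

```python
def rebuild_failures(surviving_drives, allocated_stripes=None):
    """Return the stripes that cannot be reconstructed onto a replacement drive.

    surviving_drives: list of drives; each drive is a list of sectors, where
        None marks a sector that returns a read error.
    allocated_stripes: stripe numbers holding live data, or None to model a
        RAID layer that has no such knowledge and must read every stripe.
    """
    total = len(surviving_drives[0])
    stripes = range(total) if allocated_stripes is None else sorted(allocated_stripes)
    return [s for s in stripes
            if any(drive[s] is None for drive in surviving_drives)]


# Two surviving drives of eight stripes each; only stripes 0 and 1 hold data.
drive_a = [b"data"] * 8
drive_b = [b"data"] * 8
drive_b[6] = None  # marginal sector in a region no file is using

whole_device_failures = rebuild_failures([drive_a, drive_b])
used_only_failures = rebuild_failures([drive_a, drive_b], allocated_stripes={0, 1})

print("whole-device rebuild fails on stripes:", whole_device_failures)
print("used-data rebuild fails on stripes:", used_only_failures)
```

The whole-device rebuild trips over the bad sector in stripe 6 even though nothing is stored there, while a rebuild that knows which stripes are in use never touches it.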

The primary defence against this is to run regular verifies of the array, where the array reads all the data on all the drives and checks that the parity is consistent with the data on the other drives. This is a lengthy process on large discs, even if they are only filled a little bit. If a read error turns up during a verify, the RAID array can mark that drive as bad and let you know before you need that sector to reconstruct a failed drive (in sectors you are actively using).
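
As a minimal sketch of what such a verify pass does, assume a simplified layout in which the last drive always holds XOR parity (real RAID-5 rotates parity across the drives). The names `xor_blocks` and `verify` and the list-of-sectors drive model are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def verify(drives):
    """Read every stripe on every drive and report problems.

    drives: list of drives, each a list of sectors (bytes, or None for a
        sector that returns a read error). The last drive holds the parity.
    Returns (read_errors, mismatches): read_errors is a list of
        (drive, stripe) pairs, mismatches a list of stripe numbers.
    """
    read_errors, mismatches = [], []
    for stripe, sectors in enumerate(zip(*drives)):
        bad = [d for d, sector in enumerate(sectors) if sector is None]
        if bad:
            read_errors.extend((d, stripe) for d in bad)
            continue
        *data, parity = sectors
        if xor_blocks(data) != parity:
            mismatches.append(stripe)
    return read_errors, mismatches


d0 = [b"\x01\x02", b"\x0a\x0b", b"\x00\x00"]
d1 = [b"\x10\x20", b"\xa0\xb0", None]      # stripe 2 is unreadable
parity = [xor_blocks([d0[0], d1[0]]),       # consistent with the data
          b"\xff\xff",                      # deliberately inconsistent
          b"\x00\x00"]

read_errors, mismatches = verify([d0, d1, parity])
print("read errors (drive, stripe):", read_errors)
print("parity mismatches in stripes:", mismatches)
```

Note that the pass walks every stripe whether or not it holds data, and that a parity mismatch identifies only the stripe, not which drive in it is wrong.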

If, on the other hand, one of the drives has slightly corrupted data, then without redundant parity you can't tell whether it's one of the data slices or the parity slice that is corrupt, so you're hosed. Because ZFS checksums everything, it can detect and repair this situation. In fact, this is one of the first demos of ZFS I saw: create a ZFS equivalent of a RAID-5 set, put some data on it, and then dump /dev/zero directly over one of the disc devices. Then tell ZFS to figure out what happened and clean up from it.
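
Here is a small sketch of why the checksums matter. It uses single XOR parity and SHA-256 as a stand-in checksum; the function names and data are hypothetical, it handles only one corrupted data slice with intact parity, and it is not ZFS's actual on-disc format or repair algorithm.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def checksum(block):
    """Stand-in checksum for this sketch."""
    return hashlib.sha256(block).digest()


def find_and_repair(slices, parity, sums):
    """Use per-slice checksums to locate a bad slice, then rebuild it.

    Returns (indices repaired, repaired copy of the slices).
    """
    repaired, fixed = list(slices), []
    for i, block in enumerate(slices):
        if checksum(block) == sums[i]:
            continue  # this slice is intact
        others = [b for j, b in enumerate(slices) if j != i]
        candidate = xor_blocks(others + [parity])
        if checksum(candidate) == sums[i]:  # confirm the reconstruction
            repaired[i] = candidate
            fixed.append(i)
    return fixed, repaired


original = [b"alpha---", b"bravo---", b"charlie-"]
parity = xor_blocks(original)
sums = [checksum(b) for b in original]  # stored apart from the data itself

on_disc = list(original)
on_disc[1] = bytes(len(on_disc[1]))     # slice 1 overwritten with zeros

parity_consistent = xor_blocks(on_disc) == parity
fixed, repaired = find_and_repair(on_disc, parity, sums)

print("parity consistent:", parity_consistent)
print("slices repaired:", fixed)
print("data restored:", repaired == original)
```

Parity alone only reports that the stripe is inconsistent. The checksums pinpoint slice 1 as the bad one, and they also confirm that the block rebuilt from the other slices and the parity is the right data.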

ZFS has demonstrated that combining the RAID layer with the file-system really makes sense. It is, quite honestly, the only file-system that I truly trust my data with for long-term storage, because it's the only one that is designed to detect and recover from corruption. EXT3 has performed well for us, and I have a lot of systems using it, but when it comes right down to it EXT3 does very little to detect or correct data corruption that happens behind the scenes.
