Saturday July 05, at 02:38
Subject: Why I Like ZFS.
Keywords:
Technical, ZFS
Posted by: Sean Reifschneider
Related entries: ZFS Under Linux: A User Report by Sean Reifschneider, Saturday July 05, at 15:00
The data on computer systems is what makes them valuable. Most
file-systems and RAID designs go to great lengths to make sure that their
own meta-data is correct, but do very little to protect the user data that
they contain. ZFS, in contrast, checksums everything that's written to
disc -- meta-data as well as file contents. It can detect if data on the
disc has silently been corrupted, and recover from it. Read on for more of
the benefits of ZFS.
Much discussion in the Linux world has come about because ZFS is a
file-system that also has a deep understanding of the RAID layer beneath
it. ZFS includes RAID and volume-manager-like functionality within it.
Why not use an external RAID/LVM?
We now have very fast file-system creation times, even on large discs,
with most file-systems. EXT3 is a notable exception, but JFS, Reiser, and
XFS can all make file-systems on large devices in a few seconds. On a
newly installed system, though, putting these file-systems on a RAID array
with redundancy still requires a lengthy rebuild process -- usually hours.
When you create a ZFS file-system with redundancy, after a few seconds you
have both the RAID and the file-system created and fully synced.
This also applies to RAID rebuilds or verifies. If you have a 10TB ZFS
array with only a couple of small files on it, say a few MB, a verify or
rebuild will only take a few seconds. Because the file-system and the
RAID are tightly coupled, the system only verifies or reconstructs the used
data, not the 9.9999TB of unused space that the normal Linux "md" driver
would also have to process.
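To put rough numbers on that difference, here is a back-of-the-envelope
sketch. The 100 MB/s sustained throughput and the 5 MB of used data are
assumptions picked for illustration, not measurements, and the function
name is made up for this example.

```python
# Rough estimate of how long a verify/rebuild must spend reading data.
# ASSUMPTIONS: 100 MB/s sustained throughput, 5 MB of data in use.

TB = 10**12
MB = 10**6

def read_time_seconds(bytes_to_read, throughput_bytes_per_s):
    """Time needed just to read the given number of bytes once."""
    return bytes_to_read / throughput_bytes_per_s

throughput = 100 * MB  # assumed drive throughput

# A block-level RAID has to walk the whole 10TB device.
whole_device = read_time_seconds(10 * TB, throughput)

# A file-system-aware rebuild only has to walk the data in use.
used_only = read_time_seconds(5 * MB, throughput)

print(f"whole device: {whole_device / 3600:.1f} hours")
print(f"used data only: {used_only:.2f} seconds")
```

The second figure only counts raw read time; a real verify would likely
have fixed overhead on top of it, which fits with "a few seconds" rather
than a fraction of one.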
A common case with most RAID systems is that they are used for a long
time and then a drive falls out of the array. You replace the drive, and
the system reads all the data on the remaining drives to reconstruct the
missing one. But then one of those other drives turns out to have a
marginal sector in a part of the drive that isn't actively being used for
storing data, and now a second drive falls out of the array. If you don't
have multiple redundancy, you will now have to recover manually.
The primary defence against this is to regularly run verifies of the
array, where the array reads all the data on all the drives and checks
that the parity is consistent across them. This is a lengthy process on
large discs even if they are only filled a little bit. If you get a read
error while doing this, the RAID array can mark that drive as bad and let
you know before you need that sector to reconstruct a failed drive (in
sectors you are actively using).
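As a minimal sketch of what such a verify pass does, assume a simple
single-parity layout where each stripe's parity block is the XOR of its
data blocks. The in-memory representation and the function names here are
hypothetical, not the md driver's actual code.

```python
# Toy parity verify: walk every stripe and check XOR parity.
# ASSUMPTION: single XOR parity per stripe, stripes held in memory.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def verify_stripes(stripes):
    """Return the indices of stripes whose stored parity does not match."""
    bad = []
    for index, (data_blocks, parity) in enumerate(stripes):
        if xor_blocks(data_blocks) != parity:
            bad.append(index)
    return bad

good_stripe = ([b"ab", b"cd"], xor_blocks([b"ab", b"cd"]))
stale_stripe = ([b"ef", b"gh"], b"\x00\x00")  # parity does not match
mismatched = verify_stripes([good_stripe, stale_stripe])
print(mismatched)
```

Note that the loop visits every stripe whether or not it holds live data,
which is why the pass takes so long on a large, mostly empty array.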
If, on the other hand, one of the drives has slightly corrupted data,
without redundant parity you can't tell whether it's one of the data slices
or the parity slice that is corrupt, so you're hosed. Because ZFS checksums
everything, it can detect and recover from this situation. In fact, this is
one of the first demos of ZFS I saw -- create a ZFS equivalent of a RAID-5
set, put some data on it, and then dump /dev/zero directly over one of the
disc devices. Then tell ZFS to figure out what happened and clean up from
it.
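The idea behind that demo can be sketched in a few lines. This is a toy
model, not ZFS's on-disc format: it assumes one XOR parity block per
stripe and per-block checksums stored separately from the data, with
SHA-256 standing in for whatever checksum the file-system is configured to
use. All names are made up for illustration.

```python
# Toy model: per-block checksums say WHICH block is bad; parity rebuilds it.
# ASSUMPTIONS: single XOR parity, SHA-256 checksums kept apart from the data.
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def checksum(block):
    return hashlib.sha256(block).digest()

def repair_stripe(blocks, parity, checksums):
    """Return (repaired_blocks, index_of_repaired_block or None)."""
    bad = [i for i, block in enumerate(blocks) if checksum(block) != checksums[i]]
    if not bad:
        return list(blocks), None
    if len(bad) > 1:
        raise ValueError("more than one damaged block; single parity cannot repair")
    i = bad[0]
    others = [block for j, block in enumerate(blocks) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if checksum(rebuilt) != checksums[i]:
        raise ValueError("rebuilt block fails its checksum; parity may be damaged too")
    repaired = list(blocks)
    repaired[i] = rebuilt
    return repaired, i

data = [b"hello world!", b"zfs checksum", b"third block."]
parity = xor_blocks(data)
sums = [checksum(block) for block in data]

# Simulate dumping zeros over the second "disc".
damaged = [data[0], bytes(len(data[1])), data[2]]
repaired, fixed_index = repair_stripe(damaged, parity, sums)
print(fixed_index, repaired == data)
```

A plain parity check on the damaged stripe would only report a mismatch.
The checksums are what identify the zeroed block as the bad one, so the
parity can be used to rebuild it rather than guessing.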
ZFS has demonstrated that the combining of the RAID with the
file-system really makes sense. It is, quite honestly, the only
file-system that I truly trust my data with for long-term storage, because
it's the only one that is designed to detect and recover from corruption.
EXT3 has performed well for us, and I have a lot of systems using it, but
when it gets right down to it EXT3 does very little to detect or correct
data corruption that happens behind the scenes.
Comment from Daniel Webb
Subject: How?
Are you using the FUSE version of ZFS on Linux or are you using Solaris?
I have been pining for ZFS since I first heard about it. I did some research for a while into what would be required to make truly robust backups for current filesystems, and it is not pretty. ZFS does nearly everything the right way, but I'm very hesitant to use a beta release of something as important as a filesystem!