
Tonight Stephen Warren noticed that one of the backups of his storage server was transferring a file that it shouldn't have been. He was running rsync with the “-c” option, which compares files by full checksum rather than by size and modification time, so apparently the contents of this file had changed.

Stephen provided these instructions for comparing the two copies of a file on a RAID-1 array, to verify that they are the same or, in his case, different. He wanted me to post them so he could find them again when he has to search Google for them in the future. :-)

The details of how to read from a single drive on a RAID-1 array follow.

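You can tell MD to avoid reading from one member of an array by marking it write-mostly, which means the kernel reads from that member only when no other member can serve the request. These commands assume you are steering reads away from member “sda3” of “md1”: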

cd /sys/block/md1/md/dev-sda3
echo writemostly > state

Then check the contents of the file: first drop the Linux kernel's page cache so that the read actually goes to disk, then run md5sum:

echo 3 > /proc/sys/vm/drop_caches
md5sum /some/file

Then, to put the RAID array back into its normal state, where any given block may be read from either device:

cd /sys/block/md1/md/dev-sda3
echo -writemostly > state

Then repeat the above for the other drive in the array, say sdb3 instead of sda3.
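
The steps above can be wrapped in a small helper so the procedure is easy to repeat for each member. This is only a sketch: the function name checksum_avoiding and the MD_SYSFS and DROP_CACHES variables are made up for illustration, the defaults assume the “md1” array from the example, and it has to run as root.

```shell
# Hypothetical helper, not part of Stephen's instructions.
# Defaults assume the "md1" array used in the example above.
MD_SYSFS=${MD_SYSFS:-/sys/block/md1/md}
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

# checksum_avoiding MEMBER FILE
# Marks MEMBER write-mostly, drops the kernel caches, prints the md5 of FILE,
# then clears the write-mostly flag again.
checksum_avoiding() {
    ca_state="$MD_SYSFS/dev-$1/state"

    printf '%s\n' writemostly > "$ca_state" || return 1
    sync                                  # flush dirty pages before dropping caches
    printf '%s\n' 3 > "$DROP_CACHES"      # force the next read to go to disk
    ca_sum=$(md5sum "$2" | cut -d' ' -f1)
    printf '%s\n' -writemostly > "$ca_state"

    printf '%s\n' "$ca_sum"
}

# Example (as root):
#   checksum_avoiding sda3 /some/file    # reads served by the other member
#   checksum_avoiding sdb3 /some/file
```

If the two copies agree, both calls should print the same checksum.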

Stephen consistently got one checksum when reading the file from one drive in the array, a different checksum when reading it from the other, and, seemingly at random, one of the two when reading from the RAID array with both drives enabled.
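
To see where the two copies differ, and not just that they do, one option is to save a copy of the file while each member in turn is marked write-mostly, and then compare the saved copies byte by byte. A minimal sketch, assuming the copies were saved on a different filesystem under the hypothetical names shown in the comment; count_differing_bytes is likewise a made-up helper name:

```shell
# count_differing_bytes FILE_A FILE_B
# Prints how many byte positions differ between two files of equal length.
# cmp -l prints one line per differing byte (offset plus the two octal values).
count_differing_bytes() {
    { cmp -l "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Example (file names are placeholders):
#   cmp -l /mnt/other/file.avoiding-sda3 /mnt/other/file.avoiding-sdb3
#   count_differing_bytes /mnt/other/file.avoiding-sda3 /mnt/other/file.avoiding-sdb3
```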

In Stephen's case the files differed by just 4 bytes, which is very odd. Unfortunately, because of the way RAID works, it can only tell when a drive fails. When reads from both drives in the array succeed but return different results, it can't do anything about it; during normal reads it doesn't even notice, because only one copy is read. Most RAID levels, except probably RAID-6, can't recover from or even detect such corruption.
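
One partial exception: Linux MD can be asked to scrub an array, reading every copy and counting the sectors that disagree, although for RAID-1 it still has no way of knowing which copy is the good one. A sketch of how that might look, again assuming the “md1” array; the function names and the MD_SYSFS variable are hypothetical, while sync_action and mismatch_cnt are the kernel's own sysfs files:

```shell
# Hypothetical helpers; the default assumes the "md1" array from the example.
MD_SYSFS=${MD_SYSFS:-/sys/block/md1/md}

# Ask MD to read all copies and compare them.
# "check" only counts mismatches; "repair" would also rewrite them.
start_md_check() {
    printf '%s\n' check > "$MD_SYSFS/sync_action"
}

# Print the number of mismatched sectors found by the most recent check.
# Only meaningful once sync_action reads "idle" again.
md_mismatch_count() {
    cat "$MD_SYSFS/mismatch_cnt"
}

# Example (as root):
#   start_md_check
#   cat /sys/block/md1/md/sync_action    # repeat until it prints "idle"
#   md_mismatch_count
```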

Not to rub it in (I did plenty of that in person :-), but this is exactly why I love ZFS so dearly. ZFS computes a checksum for each block it stores, and can use this information to detect data corruption even if the read succeeds and, when a redundant copy exists, to correct it automatically.
