My laptop, apparently sensing that I think about work too much, decided today that it would eat my /home partition. It seems like it might have been caused by XFS corruption. However, thanks to good backups and xfs_repair, I seem to have recovered every scrap of information, after converting back to JFS. To be honest, XFS has been nothing but trouble on the laptop. JFS had its quirks, but it wasn't bad.
XFS seems to delay writing data for a very long time, even on an otherwise idle system. When I've had problems that required resetting my laptop, which happens about once a month (though a bit more frequently this month because of the FC3 upgrade and software suspend trickery), it would lose amazingly large chunks of data. In one instance I remember, it was as if the file-system had been rolled back at least an hour.
To be fair, I've been running XFS on a server for 6+ months and haven't had any trouble with it. It also has scarcely been rebooted in that time, maybe once or twice. The problem with my laptop today was that disc activity started just hanging while I was using the system. “dmesg” didn't say anything out of the ordinary. So, I did a few “sync”s and a power-cycle, and when it came up it couldn't mount /home. I tried manually doing a mount, but it gave me what looked like a traceback. xfs_repair, of course, wants you to replay the journal by doing a mount, so when I ran xfs_repair and forced it, it was very unhappy.
I decided to just verify that my backups were current, save off any files modified in the last couple of days, and then start from scratch. This also gave me a free partition to move / into, so I could then rebuild / as JFS instead of XFS. So now I'm back to running JFS, which I had been running happily for about a year.
Well, fairly happily. I had a few issues with it. One was that writing a file to a directory doesn't update the mtime on the directory. This sounds like it might be fairly minor, but I store much of my mail in Maildirs, and mutt uses the directory mtime to determine if new e-mail has come in since the last check. I've had to add a “touch Maildir/new” to my “~/.forward” file because of that.
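For anyone wanting to check whether a file-system shows this directory-mtime behaviour, here is a minimal Python sketch. It is not the tool I used at the time, and the function name is made up for illustration. It pushes the directory's mtime into the past, creates a file, and reports whether the mtime moved. Run it against a scratch directory on the file-system in question rather than a live Maildir, since it resets the directory's timestamps.

```python
import os
import tempfile


def dir_mtime_changes_on_create(directory):
    """Return True if creating a file in `directory` updates the directory's mtime.

    Hypothetical helper for illustration only.
    """
    # Set the directory's atime/mtime to an arbitrary moment far in the past
    # so that any update caused by the file creation is unambiguous.
    old = 1_000_000_000
    os.utime(directory, (old, old))
    before = os.stat(directory).st_mtime_ns

    # Create a new file inside the directory, then look at the mtime again
    # before cleaning up (the unlink would also touch the directory).
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    after = os.stat(directory).st_mtime_ns
    os.unlink(path)

    return after != before
```

If this returns False for a directory on a given file-system, a mail reader that polls the directory mtime will miss new deliveries there, which is what the “touch” workaround papers over.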
The other problem I've run into is that some files with special characters in their name end up being undeletable. For example, Pebble Linux, which I had a few development copies of on my laptop for a project, has a file in it which, once in place, you can't remove when using XFS, even if you run an fsck.xfs on the file-system. I also ran into a bunch of files in “lost+found” which I couldn't delete, even after an fsck.
Maybe it's time to try Reiser again. It's been at least several years since I last used Reiser on my laptop. I've been reluctant to use it because at the time we had all of our laptops running it, and every one of them experienced at least some serious data corruption within a year. Kevin's had to be totally reloaded from backups.
I've been avoiding ext3 for fairly similar reasons. One system we maintain, which acts as an archive of data for other systems, was running into file-system corruption issues once or twice a week, requiring fsck to be run. I had set up a job to verify that all the files on the system were present and to check their checksums, so it was easy to find even the damaged files that weren't showing up as corrupted directories. Once we switched this data store over to JFS, the corruption problems entirely went away. I left the root file-system ext3 there, and it hasn't had problems, but the data partition was definitely having issues. We've also seen other issues with ext3 elsewhere, particularly with quotas enabled.
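The verification job mentioned above isn't reproduced here, but the idea is simple enough to sketch in Python. Everything in this example is an assumption made for illustration: the function names, the choice of SHA-256, and the in-memory manifest (a real job would presumably store the manifest somewhere off the file-system being checked).

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify(root, manifest):
    """Return (missing, corrupted) lists of relative paths, each sorted."""
    missing, corrupted = [], []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            missing.append(rel)       # file vanished entirely
        elif file_sha256(full) != expected:
            corrupted.append(rel)     # file present but contents changed
    return missing, corrupted
```

Build the manifest once while the data is known to be good, then run verify from cron. Anything it reports is either missing outright or silently damaged, which is the kind of corruption that doesn't necessarily show up as a broken directory.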
I wish I had a good answer, but I don't. I've written a program which tries to simulate the activity that was causing so many problems with ext3, but I've had no luck reproducing it on another system. For now, it seems like JFS is the way to go for my laptop.
Happy holidays.