Follow up to the Hitachi issues. (tummy.com, ltd. Journal Entry)

Wednesday October 08, 2008 at 18:57
Subject: Follow up to the Hitachi issues.
Keywords: Hard Drives, Hitachi, Kernel
Posted by: Sean Reifschneider

Related entries:
   What's the deal with Hitachi drives lately? by Sean Reifschneider, Tuesday September 16, 2008 at 23:22
   Current state of the Hitachi Hard Drives by Sean Reifschneider, Friday December 05, 2008 at 10:56

After a month of working with Hitachi on this, and digging through some kernel changes, it looks like the issue I previously reported (part number 0A35415) is a kernel bug.

This seems to be a change in how the firmware handles addressing one specific block on the drive, which other Hitachi drives were more tolerant of. However, this does conform to the ATA specification, so the problem is definitely in the Linux ATA driver.

A fix has been committed to kernel 2.6.27-rc7. However, until this fix is rolled into the distributions' install media, the bug can still cause grief, particularly for Red Hat Enterprise, CentOS, and Ubuntu LTS users, who may not get a re-roll of the install media for quite some time.

Read on for more details and work-arounds.

We finally got a good response from Hitachi. Rick Prijatel of Hitachi GST got back to me, confirmed which model numbers were and were not working, and was then able to provide information about the failure mode. The short form is that there is a boundary on the drives above which 48-bit addressing needs to be used, and the kernel is switching between 28-bit and 48-bit addressing at the wrong point.

So, whenever you try to read or write this one specific block, the drive aborts the command and you get an I/O error.

In 2.6.27-rc7, in commit 97b697a11b07e2ebfa69c488132596cc5eb24119, "Taisuke Yamada" demonstrates the problem with:

# dd if=/dev/sdc bs=512 count=1 skip=268435455 >/dev/null
dd: reading `/dev/sdc': Input/output error
0+0 records in
0+0 records out
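
The sector number in that "dd" is not arbitrary: 268435455 is 2^28 - 1, the largest value that fits in a 28-bit address. A small sketch of the arithmetic follows. The function names are made up for illustration, and it assumes 512-byte sectors and a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Sketch only: lba28_last_sector and sector_to_bytes are hypothetical
# helper names, not part of any existing tool.

# Largest sector number that fits in a 28-bit LBA: 2^28 - 1.
lba28_last_sector() {
    echo $(( (1 << 28) - 1 ))
}

# Byte offset at which a given sector starts, assuming 512-byte sectors.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}
```

Calling sector_to_bytes on the result of lba28_last_sector gives 137438952960, so the problem block sits roughly 137GB (just under 128GiB) into the drive.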

This seems to be a case of the firmware in these drives being stricter than the firmware in previous Hitachi drives, and it may also affect drives from other vendors. Note that these drives do conform to the ATA specification in this regard; the problem is with the Linux ATA driver.

So, in conclusion: until this fix is in the distro installers, be particularly careful to test whether the kernel is compatible with the drive you are installing on, using something like a read/write badblocks run or the "dd" above. Or, at the very least, run that dd test after the install to see if your drive is affected.
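
As a rough sketch of that check, the "dd" test can be wrapped in a small shell function. The name probe_sector and its optional second argument are made up for illustration and are not part of any tool. It only reads from the device, and it treats anything short of a full 512-byte read as a failure, so a drive that ends before the boundary sector will also report failure.

```shell
#!/bin/sh
# Sketch: read-only probe of one 512-byte sector.
# probe_sector is a hypothetical helper, not an existing utility.

LBA28_LAST=268435455    # 2^28 - 1, the sector used in the dd test

probe_sector() {
    dev=$1
    sector=${2:-$LBA28_LAST}
    # Count the bytes dd actually delivers; an aborted read (or a device
    # that ends before this sector) yields fewer than 512.
    got=$(dd if="$dev" bs=512 count=1 skip="$sector" 2>/dev/null | wc -c | tr -d ' ')
    if [ "$got" -eq 512 ]; then
        echo "ok: read sector $sector of $dev"
    else
        echo "FAILED: could not read sector $sector of $dev" >&2
        return 1
    fi
}
```

Run it as root against the whole-disk device rather than a partition, for example "probe_sector /dev/sdc".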

Workarounds

The most obvious work-around is to make sure that none of your partitions extend beyond around 100GB on the disk until you have a kernel on the system with this patch applied. (The problem block, sector 268435455, sits at roughly 137GB with 512-byte sectors, so 100GB leaves a comfortable margin.) So, you could do an install and specify an LVM or partition size of 100GB and know that no data will be written at this boundary. Then, once you have a safe kernel on the machine, you can extend the partition and file-system to use the whole drive.
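
To double-check an existing layout, a sketch like the following tests whether a partition covers the boundary sector. The function name is hypothetical. It takes the partition's start and size in 512-byte sectors, which I am assuming you read from "fdisk -l -u" or from the start and size files under /sys/block.

```shell
#!/bin/sh
# Sketch: does a partition include the LBA28 boundary sector?
# partition_contains_boundary is a hypothetical helper name.

LBA28_LAST=268435455    # 2^28 - 1

# Usage: partition_contains_boundary START_SECTOR SIZE_IN_SECTORS
# Exit status 0 means the partition covers the boundary sector.
partition_contains_boundary() {
    start=$1
    size=$2
    end=$(( start + size - 1 ))    # last sector inside the partition
    [ "$start" -le "$LBA28_LAST" ] && [ "$end" -ge "$LBA28_LAST" ]
}
```

A 100GB partition starting at sector 63 (195312500 sectors) ends well short of the boundary, so the function returns non-zero for it. A partition spanning a whole 500GB drive returns zero.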

Alternatively, you could use the "-c" option to mke2fs on the partition that crosses this boundary to cause the drive to be checked for bad blocks. This should flag that block as bad, but the check will of course add several hours to formatting a 500GB drive.

What we will probably do is just not use these drives, and instead stick with a different model such as the 7K1000 0A35770 500GB drive. We have had those drives pass our testing, though that could change if Hitachi rolls new firmware onto that model in production.

Comment
taggart
Subject: nice work
Nice work tracking this down! Now I can buy drives again. Too bad Hitachi didn't catch it in their own Linux QA :( I also wonder if Hitachi could issue a firmware workaround; that would save a lot of hassle when doing installs.
Comment
Durval Menezes
Subject: A data-point on Seagate drives.

For the record, at least Seagate drives seem to be more "tolerant" of Linux's mishandling of the 28-bit to 48-bit LBA switch:

Recent Seagate 750GB drives:

       sdparm /dev/sdd
           /dev/sdd: ATA       ST3750330AS       SD15
           [...]
       dd if=/dev/sdd bs=512 count=1 skip=268435455 >/dev/null
           1+0 records in
           1+0 records out
           512 bytes (512 B) copied, 7.05896 s, 0.1 kB/s
       -> This was run in an up-to-date Ubuntu 8.04 system:
           Linux felwithe 2.6.24-21-generic #1 SMP Mon Aug 25 17:32:09 UTC 2008 i686 GNU/Linux

Older Seagate 500GB drives:

       sdparm /dev/sdb
           /dev/sdb: ATA       ST3500641AS       3.AA
           [...]
       dd if=/dev/sdb bs=512 count=1 skip=268435455 >/dev/null
           1+0 records in
           1+0 records out
       -> This was run on my own distribution, based on CentOS4 but with
          (among other things) a newer, custom-compiled kernel:
       uname -a
           Linux carbon 2.6.20.14p4 #2 Tue Jul 1 17:29:02 BRT 2008 i686 i686 i386 GNU/Linux
Comment
Ellen
Subject: hitachi drives
These posts about the Hitachi drives have been very useful to me. Thank you! I had been having problems using the dd command with a 750 GB Hitachi drive, and it seems like this is the explanation.