
After a month of working with Hitachi on this, and digging through some kernel changes, it looks like the issue I previously reported (part number 0A35415) is a kernel bug.

This seems to come down to the new firmware being stricter about how one specific block on the drive is addressed, where other Hitachi drives have been more forgiving. The stricter behavior does conform to the ATA specification, however, so the problem is squarely in the Linux ATA driver.

A fix has been committed to kernel 2.6.27-rc7. However, until that change is rolled into the distributions' install media, this bug will have the opportunity to cause grief, particularly for Red Hat Enterprise, CentOS, and Ubuntu LTS users, who may not get a re-roll of the install media for quite some time…

Read on for more details and work-arounds.

We finally got a good response from Hitachi. Rick Prijatel of Hitachi GST got back to me, confirmed which model numbers were and were not affected, and then provided details about the failure mode. The short form is that there is a boundary on the drives above which 48-bit addressing must be used, and the kernel is switching between 28-bit and 48-bit addressing at the wrong point.

So, whenever you try to read or write data at this one specific block, the drive aborts the command.
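For reference, 28-bit LBA addressing covers 2^28 sectors. With 512-byte sectors that is 128GiB (about 137GB) into the drive, and the last block reachable with 28-bit commands is LBA 268435455, the block that the “dd” test below reads. Shell arithmetic shows the numbers:

# echo $(( 1 << 28 ))          # sectors addressable with 28-bit LBA
268435456
# echo $(( (1 << 28) - 1 ))    # the last 28-bit LBA, the problem block
268435455
# echo $(( (1 << 28) * 512 ))  # bytes below the boundary: 128GiB, roughly 137GB
137438953472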

In 2.6.27-rc7, in commit 97b697a11b07e2ebfa69c488132596cc5eb24119, Taisuke Yamada demonstrates the problem with:

# dd if=/dev/sdc bs=512 count=1 skip=268435455 >/dev/null
dd: reading `/dev/sdc': Input/output error
0+0 records in
0+0 records out

This appears to be a case of the firmware in these drives being stricter than previous Hitachi firmware, and it may affect other drives as well. Note that these drives do conform to the ATA specification in this regard; the problem is with the Linux ATA driver.

So, in conclusion: until this fix is in the distro installers, be particularly careful to run tests such as a read/write badblocks pass or the “dd” command above, to see whether the kernel is compatible with the drive you are installing onto. At the very least, you might want to run that dd test after the install to see whether your drive is affected.
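For example, assuming the drive in question shows up as /dev/sdb (adjust the device name for your system), a quick spot-check of the problem block and a non-destructive read/write badblocks pass would look something like this:

# dd if=/dev/sdb bs=512 count=1 skip=268435455 >/dev/null
# badblocks -svn -b 512 /dev/sdb

The dd command should complete without an I/O error on a good kernel/drive combination; the badblocks “-n” mode does a slow non-destructive read-write scan and must not be run on a mounted filesystem.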

Workarounds

The most obvious work-around is to make sure that none of your partitions extend beyond around 100GB on the disk (comfortably below the boundary, which falls at roughly 137GB) until you have a kernel with this patch applied on the system. So, you could do an install and specify an LVM or partition size of 100GB and know that no data will be written at this boundary. Then, once you have a safe kernel on the machine, you can extend the partition and filesystem to use the whole drive.
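As a rough sketch, assuming the install put the root filesystem on an LVM logical volume (the volume group “vg0” and logical volume “root” names, and the size, are just examples), growing it after booting a fixed kernel would look something like:

# lvextend -L +350G /dev/vg0/root    # grow the logical volume into the free space
# resize2fs /dev/vg0/root            # grow the ext3 filesystem to match

On reasonably recent kernels resize2fs can grow ext3 online, so this does not even require unmounting the filesystem.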

Alternatively, you could use the “-c” option to mke2fs on the partition that crosses this boundary, causing the drive to be checked for bad blocks. This should identify that block as bad, but it will of course take several hours on a 500GB drive.
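For instance, on a hypothetical partition /dev/sdb1 that crosses the boundary:

# mke2fs -j -c /dev/sdb1       # ext3 with a read-only bad block scan
# mke2fs -j -c -c /dev/sdb1    # or, giving “-c” twice, a slower read-write scan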

What we will probably do is simply avoid these drives and stick with a different model, such as the 7K1000 0A35770 500GB drive. Those drives have passed our testing, and should continue to work unless Hitachi rolls a new firmware onto them in production.

