
I've been following this Intel forum thread on failures of the Intel 320 Series SSDs for a while now, after our friends at Pattern Review pointed me to the Tom's Hardware article about it.

So far the thread has been mostly speculation, with little concrete information available… A week ago Intel responded that firmware fixes were being worked on.

Read on for more details about this, including the failure I've experienced.

This has been curious to me, because we've been using many flavors of Intel drives for several years, and they've been rock solid. That includes laptops, where power-management issues often result in the systems being power-cycled, which is one of the events reported to cause this issue.

That is, until this weekend… Now, here's the catch: all the reports so far are related to the Series 320 drives. The drive I had fail is a 32GB X25-E, and it failed in exactly the same way, reporting a capacity of 8MB (not GB, mind you :-). This drive is dramatically different from the Series 320 drives in that it uses SLC rather than MLC flash.

Another interesting data point is that this drive almost certainly did not suffer a power loss when the problem occurred. The drive is in a system in our facility, which has an in-rack transfer switch that switches between two power circuits fed from two different in-room transfer switches, each fed from two distinct UPSes and generators. In short, we don't lose power…

Here are the details I know:

  • The drive is a 32GB X25-E sold under the Kingston label. I bought it used on eBay, so I don't know which firmware version it has.
  • There was existing data on it, so I wiped it when it arrived. I installed Ubuntu Natty on it in a laptop and didn't notice any problems. I then moved it over to act as the boot drive in a Supermicro server (500MB or so as /boot), and the remaining space was used as a ZFS intent log and level-2 cache ("L2ARC" in the parlance of ZFS).
  • This worked fine for around 2 weeks.
  • Around three days ago, I got a bunch of drive errors related to this device, and ZFS noticed the errors and stopped using it. Because it was only used for logging and cache, its failure didn't interrupt service.
  • I decided to try to reboot and see if I could access it again. Before I did, I tried to “umount /boot”, and got the output I'm pasting below.
  • On the reboot, the BIOS was hanging when it was trying to detect that drive.
  • I did a power-cycle.
  • It no longer hangs, but it won't boot off that drive. The BIOS sees the drive, but now it says it's an 8MB drive. See attached screen-shots.
  • The server was moved from one rack to another several days before this failure, but it was shut down cleanly for the move, and the SSD worked fine for several days afterward, including having at least 32GB written to it. Its primary use is as a cache, which starts off empty at boot, and the cache was full at the time the drive failed.
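If you want to catch this failure mode on your own machines, one option is to compare the capacity Linux reports for a device against what you expect it to be. The following is a minimal sketch, assuming a Linux host where /sys/block/<dev>/size holds the device size in 512-byte sectors. The device name `sda`, the expected 32GB capacity, and the 1% threshold are hypothetical values chosen for illustration; nothing here comes from Intel.

```python
from pathlib import Path

# /sys/block/<dev>/size is reported in 512-byte sectors regardless of the
# drive's physical sector size.
SECTOR_BYTES = 512


def reported_bytes(sectors: int) -> int:
    """Convert a sysfs sector count to bytes."""
    return sectors * SECTOR_BYTES


def looks_like_8mb_failure(sectors: int, expected_bytes: int,
                           threshold: float = 0.01) -> bool:
    """True if the drive reports less than `threshold` of its expected size.

    The 1% default is a hypothetical cutoff: an 8MB report on a 32GB drive
    is far below it, while normal rounding differences are not.
    """
    return reported_bytes(sectors) < expected_bytes * threshold


def scan(expected: dict[str, int],
         sys_block: Path = Path("/sys/block")) -> list[str]:
    """Return the devices whose reported size is suspiciously small."""
    suspicious = []
    for dev, expected_bytes in expected.items():
        size_file = sys_block / dev / "size"
        if not size_file.exists():
            # Device not present (or not a Linux host); skip it.
            continue
        sectors = int(size_file.read_text().strip())
        if looks_like_8mb_failure(sectors, expected_bytes):
            suspicious.append(dev)
    return suspicious


if __name__ == "__main__":
    # Hypothetical device and capacity; adjust for your own system.
    print(scan({"sda": 32 * 10**9}))
```

A drive in the state described above would report 16384 sectors (8MB), so `sda` would show up in the returned list. This only flags the symptom after the fact; it does nothing to prevent the failure.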

It's hard to say anything definitive at this point, and since Intel is being fairly tight-lipped about it, all I can do is speculate. But either I've had a totally unrelated failure that looks very similar to what others are reporting with the Series 320 drives, or the problem is more widespread than initially thought.

All these details have been provided to Intel, but so far there has been no response.

As always, make sure you have good backups.

