Massive Google hard drive survey turns up very interesting things

When your server farm is in the hundreds of thousands and you're using cheap, off-the-shelf hard drives as your primary means of storage, you've probably got a pretty damned good data set for looking at the health and failure patterns of hard drives. Google studied a hundred thousand SATA and PATA drives ranging from 80 to 400GB in capacity and 5400 to 7200rpm in spindle speed, and while unfortunately they didn't call out specific brands or models that had high failure rates, they did find a few interesting patterns in failing hard drives. The one we thought was most intriguing was that drives often needed replacement for issues that SMART drive status polling didn't or couldn't detect: 56% of failed drives did not raise any significant SMART flags (and that's interesting, of course, because SMART exists solely to survey hard drive health). Other notable patterns: failure rates are indeed correlated with drive manufacturer, model, and age; failure rates did not correspond to drive usage except in very young and very old drives (i.e. heavy data "grinding" is not a significant factor in failure); and there is less correlation between drive temperature and failure rates than might have been expected, with drives that are cooled excessively actually failing more often than those running a little hot. Normally we'd just tell you to go ahead and read the document, but be ready for a seriously academic and scientific analysis. [Warning: PDF link]

[Via Slashdot, photo by Uwe Hermann]