Google study: effect of temperature on server HDDs

Silencing hard drives, optical drives and other storage devices


Aris
Posts: 2299
Joined: Mon Dec 15, 2003 10:29 am
Location: Bellevue, Nebraska

Post by Aris » Tue Apr 03, 2007 3:10 am

From personal experience working with computers, both at my job and at home, the parts that fail the most are the ones that either:

1. are handled by people regularly, or
2. have moving parts.

Parts that are never physically touched by people and don't have any moving parts almost never fail.

To me, that says vibration and friction are the key culprits. Heat definitely affects lubricants' ability to keep friction down in moving parts.

Now, I work in the military, and I have to say I have never once heard that "every 10 °C lower doubles electronics lifespan" BS. That doesn't mean heat doesn't affect them; I just don't think it affects them this severely.

I laughed a little inside when I read that postscript from Google saying that SMART causes drive failures. ROFL. Didn't they ever stop to think that maybe, if it's relocating data because of possible bad sectors, that's an indication of a flawed product, and not that the SMART utility is killing the drive? I mean, honestly, all it's doing is moving data around. I guarantee you that you will manually move that data around a lot more than the SMART utility ever does. No product is perfect; there are always going to be duds and ones that fail early. It's just the nature of the beast.

Hard drive manufacturers list a maximum safe operating temperature. They don't just pull this number out of their ass. As long as you're below that number, you're good. They have tested their own products, and they know what their products are capable of.

I can definitely see drives failing if they are too cold, though, just like if they are too hot. When the temperature drops, lubricants get thicker. Think about how you change the oil in your car for the winter. The only problem is that you can't adjust the oil viscosity in your HD bearings as the temperature changes. This is why drives have operating temperature guidelines.

Flodis
Posts: 1
Joined: Fri Jan 05, 2007 6:29 am
Location: Stockholm, Sweden

Post by Flodis » Tue Apr 03, 2007 6:05 am

Rusty075 wrote: The temperature results list average temperature readings for the drives.

Take two identical drives. Place one under continuous Medium utilization, where its drive temperature stays at a near-constant 45°. Place the other drive under Low utilization, where it spends, say, 2/3 of its time idling at 25° and 1/3 of its time at 100% use, where its temperature peaks at over 50°. The Low drive will have an average temperature that is much lower than the Medium-usage drive's (33° in my hypothetical). But I could almost guarantee you that the repetitive thermal cycling that comes from alternating periods of high and low usage will be harder on that drive than the 12° hotter temperature is on the Medium drive, thus making the Low drive statistically more likely to fail. For many high-precision mechanical parts, thermal cycling is more damaging than conventional wear... it seems reasonable that HDDs would have a similar reaction.

I was just about to post the same argument myself when I saw that you had already posted it. I agree wholeheartedly: not including thermal cycling in the equation is just plain bad.

Google's study is an impressive collection of data, but without more in-depth analysis on the part of the researchers, you just can't draw any conclusions from it - at least not regarding the thermal issue.

I'm gonna continue keeping my HDDs as cool as possible. That way the temperature delta between power-off and working temperature is kept at a minimum.
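To make Rusty075's arithmetic concrete, here is a minimal Python sketch showing that a time-weighted average collapses both hypothetical drives into a single number and hides the swing entirely. The `Phase` class and the function names are made up for illustration, and "swing" here is just max minus min temperature, an assumed crude proxy for cycling amplitude; it doesn't count cycles or model any actual wear mechanism.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    fraction: float  # share of total time spent in this phase (fractions sum to 1)
    temp_c: float    # drive temperature during this phase, degrees C


def average_temp(phases):
    """Time-weighted mean temperature: the kind of single figure the study reports."""
    return sum(p.fraction * p.temp_c for p in phases)


def temp_swing(phases):
    """Peak-to-trough spread across phases: a crude stand-in for cycling amplitude."""
    temps = [p.temp_c for p in phases]
    return max(temps) - min(temps)


# Rusty075's hypothetical profiles
medium = [Phase(1.0, 45.0)]
low = [Phase(2 / 3, 25.0), Phase(1 / 3, 50.0)]

for name, profile in [("Medium", medium), ("Low", low)]:
    print(f"{name}: avg {average_temp(profile):.1f} C, swing {temp_swing(profile):.1f} C")
```

The Low profile averages about 33.3 °C against the Medium profile's 45 °C, yet it swings 25 °C while the Medium drive doesn't swing at all; an average-only chart can't tell those two situations apart.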

alleycat
Posts: 740
Joined: Sun Oct 20, 2002 10:32 am
Location: Melbourne, Australia

Post by alleycat » Tue Apr 03, 2007 8:18 am

I totally agree with Rusty. Maintaining a consistent drive temperature should maximize longevity.

J. Sparrow
Posts: 414
Joined: Wed Jan 17, 2007 7:55 am
Location: EU

Post by J. Sparrow » Tue Apr 03, 2007 9:22 am

Rusty075 wrote: Place the other drive under Low utilization, where it spends say 2/3rds of its time idling at 25°, and 1/3rd of its time at 100% use where its temp peaks at over 50°.

Yours is an interesting point and really worth noting.

My experience is that an HDD inside a case will operate in a narrower temperature range, though; it won't idle at 25 °C if it's a 7200 rpm unit unless you put it in a sizeable airflow. And in that case, it probably won't reach 50 °C anyway.

If their environment is not entirely different from mine, a drive would have to be ubercooled and idle for most of its life to show an average under 30 °C. Thus whiic's conjecture about 5400 rpm units being the cool-running, undependable ones makes sense IMO.
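J. Sparrow's "idle for most of its life" claim can be checked with a little algebra. Assuming a simple two-state drive (idle at one temperature, busy at another, an assumption not in the study), the idle fraction f must satisfy f·T_idle + (1 − f)·T_load ≤ T_target, so f ≥ (T_load − T_target) / (T_load − T_idle). A minimal Python sketch; the function name is hypothetical:

```python
def min_idle_fraction(t_idle, t_load, t_target):
    """Smallest share of time a two-state drive must spend idling so that its
    time-weighted average temperature is at or below t_target.

    Solves f * t_idle + (1 - f) * t_load <= t_target for f.
    Returns None if even 100% idling can't reach t_target.
    """
    if not t_idle < t_load:
        raise ValueError("expected t_idle < t_load")
    if t_target < t_idle:
        return None  # the average can never drop below the idle temperature
    if t_target >= t_load:
        return 0.0  # satisfied even when busy all the time
    return (t_load - t_target) / (t_load - t_idle)


# Rusty075's figures: idle at 25 C, busy at 50 C, target average of 30 C
print(min_idle_fraction(25.0, 50.0, 30.0))
```

With Rusty075's 25°/50° figures the drive would need to idle at least 80% of the time to average 30 °C. If a 7200 rpm drive in a case idles at, say, 35 °C (an assumed value in line with J. Sparrow's experience), the function returns None: no usage pattern gets it under 30 °C.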

mattthemuppet
*Lifetime Patron*
Posts: 618
Joined: Mon May 23, 2005 7:05 am
Location: State College, PA

Post by mattthemuppet » Tue Apr 03, 2007 2:56 pm

Still, do you disagree with any of the following:
- 5400rpm drives usually run cooler than 7200rpm drives
- 5400rpm drives are a dying breed and usually use older technology and ball bearings
- ball bearings compromise HDD reliability over a longer period of use, as they tend to wear out (thus increasing non-repeatable runout (NRRO) and causing errors during I/O).

I don't disagree with your points at all; they're all perfectly valid. Where I was coming from is that all these different factors can be taken into account in any given analysis. You can quite easily have two- or three-factor interactions, e.g. DRIVE AGE * TEMP or DRIVE AGE * TEMP * SPINDLE SPEED, which will tell you, if TEMP is a significant factor, how much of that "significance" is attributable to SPINDLE SPEED, and so on. It all depends on how well the statisticians did their analysis and what they included as variables, something which I'm afraid I can't vouch for.

After all, as they say, "there are lies, damned lies, and statistics" :)
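For readers unfamiliar with interaction terms, here is a minimal Python sketch of what terms like DRIVE AGE * TEMP * SPINDLE SPEED mean in practice: extra predictor columns formed by multiplying the main effects together, which a model (e.g. a logistic regression on failure) would then fit coefficients for. The function name and the example values are hypothetical, and nothing here reflects what Google's statisticians actually did.

```python
from itertools import combinations
from math import prod


def with_interactions(row, max_order=3):
    """Expand a dict of numeric predictors into main effects plus every
    interaction product up to max_order (pairs, triples, ...).

    Interaction keys join the sorted predictor names with '*'.
    """
    names = sorted(row)
    features = dict(row)  # main effects are kept unchanged
    for order in range(2, max_order + 1):
        for combo in combinations(names, order):
            features["*".join(combo)] = prod(row[n] for n in combo)
    return features


# Hypothetical drive: 2 years old, 40 C average, 7200 rpm
features = with_interactions({"AGE": 2, "TEMP": 40, "SPINDLE_SPEED": 7200})
for name, value in features.items():
    print(name, value)
```

Three predictors give three main effects, three two-way products and one three-way product. In a real fit you'd typically center or scale the predictors first, since raw products like SPINDLE_SPEED*TEMP are large and strongly correlated with the main effects.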

Rusty075
SPCR Reviewer
Posts: 4000
Joined: Sun Aug 11, 2002 3:26 pm
Location: Phoenix, AZ

Post by Rusty075 » Tue Apr 03, 2007 5:56 pm

If you read the Google report for tone, and not just content, it seems pretty clear that they're as confused about some of the results as we are. I would think that if the difference between 5400 and 7200 drive failure rates accounted for the anomalies, they wouldn't be grasping at straws for explanations. Obviously their database has the drives all sorted by model number... even a cursory examination of it would reveal whether the cool drives that were failing were mostly of one variety or another.

From reading the report, I suspect that its preparation wasn't so much a matter of "Hey, let's share these interesting conclusions with the world" as it was "Let's give the really interested parties a taste of what kind of data we have been collecting, and then see how valuable they think the raw data would be to them." There's clearly been a lot more analysis going on within Google that won't show up in any public report given away for free.

whiic
Posts: 575
Joined: Wed Sep 06, 2006 11:48 pm
Location: Finland

Post by whiic » Wed Apr 04, 2007 4:49 am

Rusty075:
I would think that if the difference between 5400 and 7200 drive failure rates accounted for the anomalies they wouldn't be grasping at straws for explanations. Obviously their database has the drives all sorted out by model number...even a cursory examination of it would reveal if the cool drives that were failing were mostly of one variety or another.

Most likely so. The question is, would they reveal their findings? It's not in their interest to rank HDD manufacturers as good or bad, and Maxtor has been practically the only manufacturer offering 5400rpm drives lately. Would Google want to label Maxtor's brand (now owned by Seagate) as time bombs?

Rusty075:
"Let's give the really interested parties a taste of what kind of data we have been collecting, and then see how valuable they think the raw data would be to them." There's clearly been a lot more analysis going on within Google that won't show up in any public report given away for free.

Also, there's a lot more raw data within Google that won't show up in this or any other public report given away for free. Whether or not ball-bearing drives cause the anomaly of higher failure rates at low temperatures is one of the questions that data could answer.

Google's study confirmed previous findings on how accurately SMART diagnostics predict drive failures. That's the relevant part of the study. The temperature vs. reliability result is a pretty much worthless piece of information unless they publish some additional raw data that might tell us the cause of the lower temperatures. Possible causes could be lower power consumption (5400rpm ball-bearing drives), more airflow, or a cooler ambient temperature. If the reason for the cooler drive temperature lies within the drive itself, it's simply wrong to say cooling (airflow and ambient temperature) is detrimental to the drive's reliability.
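whiic's point is a classic confounding problem: if the cool drives are cool because of what they are (say, 5400rpm ball-bearing units), then grouping the whole fleet by temperature mixes a drive-type effect into the temperature curve. A minimal Python sketch of the obvious check is to compare failure rates within each drive type as well as across the pooled fleet. The function name, the field names (`rpm`, `temp_band`, `failed`) and the seven records are invented for illustration only; they are not Google's data and say nothing about real drives.

```python
from collections import defaultdict


def failure_rates(records, keys):
    """Fraction of drives that failed, grouped by the given record fields.

    Each record is a dict containing those fields plus a boolean 'failed'.
    Returns {group_tuple: failure_fraction}.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        failures[group] += int(r["failed"])
    return {g: failures[g] / totals[g] for g in totals}


# Toy records, invented purely to exercise the function
toy_records = [
    {"rpm": 5400, "temp_band": "cool", "failed": True},
    {"rpm": 5400, "temp_band": "cool", "failed": False},
    {"rpm": 7200, "temp_band": "cool", "failed": False},
    {"rpm": 7200, "temp_band": "warm", "failed": False},
    {"rpm": 7200, "temp_band": "warm", "failed": True},
    {"rpm": 7200, "temp_band": "warm", "failed": False},
    {"rpm": 7200, "temp_band": "warm", "failed": False},
]

print(failure_rates(toy_records, ["temp_band"]))          # pooled by temperature
print(failure_rates(toy_records, ["rpm", "temp_band"]))   # stratified by drive type
```

With these toy records the pooled view makes the cool band look worse (1 failure in 3 drives vs 1 in 4), while among the 7200rpm drives alone the cool ones have no failures at all; the pooled difference comes entirely from the 5400rpm drives. With real data you'd also want far more drives per group before reading anything into a rate.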

J. Sparrow
Posts: 414
Joined: Wed Jan 17, 2007 7:55 am
Location: EU

Post by J. Sparrow » Wed Apr 04, 2007 5:49 am

whiic wrote: Would Google want to label Maxtor's brand (now owned by Seagate) as time-bombs?

They could have stated that ball-bearing disks were less dependable and left the reader the task of connecting the dots :)

However, it seems pretty 'careless' to discard data from Hitachi drives just because they didn't bother finding a way to read the temperature data (was it that hard? Or do they have so few Hitachis that it wouldn't have been worth the hassle?)
