Max safe temp for hard drives?


MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada

Post by MikeC » Thu Oct 16, 2003 9:01 am

This is the question that came up in Ralf Hutter's review of the Antec SLK3700BQE. His displeasure at seeing a max temp of 43C for a 2-platter Barracuda IV in the BQE prompted me to post a POINT * COUNTERPOINT addendum at the end of the review. (Please read that before posting further comments here.)

It also prompted me to review a lot of the documentation from HDD manufacturers about safe temperatures. It made me realize that they are pretty cagey on the topic in many ways.

For one, they most commonly talk about "operational temperature" not for the drives themselves (by which I mean the readout from the S.M.A.R.T internal temp diode) but for "ambient." The one exception I've found so far is Seagate:

Seagate specifies in their spec doc 100129212b.pdf for the Barracuda IV -- "Ambient temperature: 0° to 60°C (op.), –40° to 70°C (nonop.)"

And: "Ambient temperature is defined as the temperature of the environment immediately surrounding the drive. Actual drive case temperature should not exceed 69°C (156°F) within the operating ambient conditions."

These are exactly the same as specified for the 7200.7 drives.

The recommended position for measuring the "Actual drive case temperature" is at the bottom center of the front edge (see p23 of the pdf -- p.31 as read by Acrobat)

The implication of all the above is that there is a ~10C difference between drive temp and ambient temp.

WD's thermal specs are harder to find, but this document on Thermal Monitoring for Advanced Data Protection refers to S.M.A.R.T. default warning temp as 60C and shutdown as 65C. This would suggest that WD's maximum ambient temp recommendation would be ~55C...

No temp refs found thus far at Maxtor, but operating ambient temps are 55C max.

So the questions are:

1) what do you think the max long term safe S.M.A.R.T drive temp should be (for most/any modern 7200 rpm drives)?
2) What experience / evidence do you have to support this?
3) can you point to any definitive info regarding max long term safe S.M.A.R.T drive temp?

Or should we simply say anything over 55C is unsafe and how much below that you want to go is a matter of personal comfort -- much like max CPU temp? (The argument is that except for burnouts caused by catastrophic failures like a heatsink falling off, there is little or no evidence of CPUs actually getting damaged by running them close -- say -10% -- to the max die temp for extended periods.) Or do we discriminate between the all-electronic CPU vs. the electro-mechanical hard drive?
Last edited by MikeC on Mon May 31, 2004 9:01 am, edited 3 times in total.

frosty
Posts: 636
Joined: Fri Jun 06, 2003 9:40 am
Location: USA

Post by frosty » Thu Oct 16, 2003 9:50 am

Thanks for the info Mike:

To me it depends on your data and comfort level, for me if my Maxtor dies it gives the excuse and right to buy a cuda, but once a cuda is bought I will prolly mount it in the lower end of the case rather than in the 5.25" drive bay. Plus I do not do any real critical stuff at home like many here do.

Some have suggested to keep your hard drive within 5 to 6c of your case temps, I think that can only be achieved on decoupled drives with a fan blowing on them.

Jan Kivar
Friend of SPCR
Posts: 1310
Joined: Mon Apr 28, 2003 4:37 am
Location: Finland

Post by Jan Kivar » Thu Oct 16, 2003 10:56 am

NOTE: For all temps, I mean "idle" values. Like when one is surfing/playing MP3s/watching a video etc., not the highest temps when really pushing the drive to the limits (file copy/defrag). Reader can skip the small text if he/she sees fit.

Firstly, I'd like to point out that there is no simple high limit for drive temperature; as it depends on the ambient temp and the case temp. Say, with 30°C ambient You will be hitting over 40°C for the drive. You can't get the drive easily below 40°C even with active cooling.

The situation changes dramatically, if the ambient is only 20°C. If You are hitting over 40°C then, it's because You have the drive decoupled (or enclosed), or there is no airflow across the drive. Especially decoupling can really affect temps, as the heat has no way to conduct to the case.

Then again, can SMART sensors be trusted? As we all understand, motherboards have their hot and cold spots. Same applies to hard drives too. One of our national computer magazines (MikroBitti) used a thermographic camera to measure temps of various parts. [You can download few videos here.] Unfortunately there isn't any for a hard drive, but in the magazine they had a picture of Maxtor D740X (IIRC), and the chips were about 50°C, when the drive was out of the case. Placing the sensor near these chips will produce misleading temps (unless the sensor is "fixed").

And, as we all know, different motherboards give different temps for the same CPU. So, we can't do "accurate" direct comparisons between two brands, maybe not even between two different model series from the same manufacturer. I've seen Maxtor drives (new FireBall 3's, IIRC, which are 5400 rpm) that run near 50°C inside a case, and that drive replaced an IBM 75 GXP, which ran at 37°C in the same mounting. One would have to use external sensors, but the placement of these sensors gives again some pitfalls.


One thing to consider is the "10°C rule for electronics/mechanics": A temperature rise of 10°C will halve the expected lifetime. I think that CPUs are designed to work at high temperatures (like transistors in general), but for hard drives there is "optimum" operating temperature IMHO.

I've once experienced a "cold" hard drive: Once when we got back to work after a weekend, the A/C had acted out (outside temps either rose or dropped by 20°C) and dropped the temp in our office to 10°C. When I booted up the machine, the hard drive made very audible whining. DTemp showed only 14°C when I got to Windows. After the drive temp rose over 22-24°C, the noise dropped considerably. This Maxtor drive (D740X) used to idle at 46°C in 23°C ambient, BTW.

I'd like to use "rise above ambient" for calculating the max. desirable temp. I tend to agree with Ralf; going over 40°C upsets me, as I use a fan to cool the drives. Consider that the ambient temps here are near 20-22°C nearly all the time, so the delta T is roughly 18°C. During summer we had ambient temps near 30°C, so considering that the limit would elevate to close to 50°C, as lower values aren't obtainable without changing the cooling setup. In "normal" cases (=no extra fans or directed airflow), I'd set the limit to ambient + 23°C.

These are all assumptions based on the use of one drive. I've noticed that if one is running two or more drives, the topmost is always running hotter than the lower ones.

Hopefully this wasn't a boring read... :wink:

Cheers,

Jan

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada

Post by MikeC » Thu Oct 16, 2003 11:39 am

Jan Kivar wrote:...there is no simple high limit for drive temperature...
Well there is if you consider the S.M.A.R.T. temp, which is off the internal temp diode, to be a reasonable representation of internal drive temp. This would naturally include the effect of ambient temp.

Regarding the "10°C rule for electronics/mechanics": A temperature rise of 10°C will halve the expected lifetime.

There is some question about where this originated. I recall reading somewhere about it being pulled out of the air by some smartass contractor for the US military who wanted to sell more cooling electronic gear?... Probably totally distorted.

Here's an insight from a more informed source at X-bit Labs:
Speaking seriously, we can’t help recalling Arrhenius’ rule from the U.S. Department of Defense Military Handbook 217 (this book used to be the court of first instance in all questions concerning electronics reliability). This rule suggests that for the temperature range from –20 to 140C every temperature drop by 10C doubles the life term of the equipment. Military Handbook 217 is no longer used nowadays and the rule shouldn’t be taken directly as is. For example, temperature may vary in different parts of a single PC case. Still, the book had its truth. High temperature of a chip doesn’t tell well on its life term.

By the way, we have mentioned temperature variations inside the PC case. This is more of a problem now than it used to be before. Traditionally, the central processor is the warmest spot, but lately the chipset, the graphics processor, and even the hard disk drive have become very warm, too. Together with a complex pattern of airflows inside, the whole picture is too complicated to fully comply with the “golden rule” about 10C.

josephclemente
Posts: 580
Joined: Sun Aug 11, 2002 3:26 pm
Location: USA (Phoenix, AZ)

Post by josephclemente » Thu Oct 16, 2003 11:44 am

I am very interested in this subject.

Being a slave to temperature readings is a serious barrier in achieving a quiet PC.

I have a Shuttle SS51G XPC (small form factor) with a grommet-mounted 7200 RPM 80GB Maxtor 6Y080P0.

I use SpeedFan, and I have it set to speed up my 92mm blowhole fan when my Maxtor reports higher than 40C.

Right now, ambient is 28C (NOT from a good lab thermometer, just a digital clock) and the drive is 39C (idle).

But should I even bother spinning up the fan after 40C? Maybe 45C or 50C? It would be great to know the optimum internal temperature for reliability/life. Overcooling beyond that just means unnecessary noise.

lucienrau
Posts: 197
Joined: Thu Mar 27, 2003 4:22 pm
Location: Boston, MA

Post by lucienrau » Thu Oct 16, 2003 12:04 pm

Sort of off topic here, what kind of noise do you get from your modded shuttle?

Jan Kivar
Friend of SPCR
Posts: 1310
Joined: Mon Apr 28, 2003 4:37 am
Location: Finland

Post by Jan Kivar » Thu Oct 16, 2003 12:10 pm

MikeC wrote:
Jan Kivar wrote:...there is no simple high limit for drive temperature...
Well there is if you consider the S.M.A.R.T. temp, which is off the internal temp diode, to be a reasonable representation of internal drive temp. This would naturally include the effect of ambient temp.
I was trying to make a point that You can't easily say what the temp will be if You use Drive A in Case B, with Ambient C, apart from the fact that it will be lower than 55°C in most cases.
MikeC wrote:Regarding the "10°C rule for electronics/mechanics": A temperature rise of 10°C will halve the expected lifetime.

There is some question about where this originated. I recall reading somewhere about it being pulled out of the air by some smartass contractor for the US military who wanted to sell more cooling electronic gear?... Probably totally distorted.
I've understood that this rule can be applied especially to the power supplies. Running PSUs with slow fans (or, better yet, without a fan) can kill the PSU sooner.

Jan

josephclemente
Posts: 580
Joined: Sun Aug 11, 2002 3:26 pm
Location: USA (Phoenix, AZ)

Post by josephclemente » Thu Oct 16, 2003 12:17 pm

I don't have anything to measure, but my XPC is definitely very quiet after my mods. I have a Zalman on the video card, and use a Speedfan-controlled 60mm external fan in place of the original 40mm PSU fan. I just wish it came out-of-the-box like this. :)

lucienrau
Posts: 197
Joined: Thu Mar 27, 2003 4:22 pm
Location: Boston, MA

Post by lucienrau » Thu Oct 16, 2003 12:27 pm

I've been contemplating a sff with the same mods, but just haven't had the need to do it yet as I recently bought an Nforce 2 mobo and some new ram before I got the sff modding bug. Though I think my next project will be to find an old Philco cathedral radio and put a small Mobo to see if I can make that my primary silent system.

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada

Post by MikeC » Thu Oct 16, 2003 12:30 pm

Jan Kivar wrote:1 - I was trying to make a point that You can't easily say what the temp will be if You use Drive A in Case B, with Ambient C, apart from the fact that it will be lower than 55°C in most cases.

2 - I've understood that this rule can be applied especially to the power supplies. Running PSUs with slow fans (or, better yet, without a fan) can kill the PSU sooner.
1 - Oh, ok, you mean to predict temps? No I totally agree, you can't predict it but you don't need to, you can measure directly with thermal diodes in almost any modern drive and DTemp. I'd recommend anyone who has concern about data safety to have a drive with thermal diodes and DTemp or similar and to monitor temps at least from time to time.

2 - I have no quibble with the basic notion that more heat shortens component life -- just the precise expression of +10C = 1/2 life. I would think this depends entirely on how close you are to overheating. Say with a drive rated for safe operation to 60C internal temp. If you run it at 40C instead of 30C, it will halve the lifespan? Somehow I doubt it. But if you run it at 55C, it is much more likely to halve the life compared to running it at 45C, I would think.

Jan Kivar
Friend of SPCR
Posts: 1310
Joined: Mon Apr 28, 2003 4:37 am
Location: Finland

Post by Jan Kivar » Thu Oct 16, 2003 12:51 pm

MikeC wrote:2 - I have no quibble with the basic notion that more heat shortens component life -- just the precise expression of +10C = 1/2 life. I would think this depends entirely on how close you are to overheating. Say with a drive rated for safe operation to 60C internal temp. If you run it at 40C instead of 30C, it will halve the lifespan? Somehow I doubt it. But if you run it at 55C, it is much more likely to halve the life compared to running it at 45C, I would think.
Yeah, You're right. I was trying to say this with the "optimum" temperature. Having too low temperature can hurt the hard drive also. IIRC some guy was using a watercooling setup which had water temp lower than the ambient (and the case was isolated etc.). He mentioned that running the drive only at ~20°C made the drive whine more. I have experienced similar effects, as I mentioned in my post.

45°C is safe, if the HD sees no airflow. With airflow, the temperature will be lower. This was posted today, and clearly shows the importance of having some airflow across the drive also. It's just the level of quietness one wishes to achieve...

Cheers,

Jan

dukla2000
*Lifetime Patron*
Posts: 1465
Joined: Sun Mar 09, 2003 12:27 pm
Location: Reading.England.EU

Post by dukla2000 » Wed Oct 22, 2003 3:45 am

I can't answer any of Mike's original 3 questions, but figure some thoughts I started in another thread are worth repeating.

1) Based on the relative importance we should place on the reliability of our hdd, it is worth making an effort to keep them running. Despite claims for relatively high operating temps (50/60C) and because of claims that cooler is more likely longer life, I subscribe to the 'over ambient' target. My main thought is this: with a typical hdd power consumption around 10W, I figure a well designed case/airflow/hdd location should easily be able to keep a hdd 10C over ambient, if not 5C over ambient. (OK if we enclose them for silence things change.)

2) Also because these things are mechanical, I figure the rate of change of temp is significant. Seagate 7200.7 says 20C/hour. If you do have a design that runs more than 20C above ambient, then there is every chance that on start (from cold) your hdd will heat faster than is good for it. (So maybe 20C over ambient is a sensible design max for hdd temp?)

Ralf Hutter
SPCR Reviewer
Posts: 8636
Joined: Sat Nov 23, 2002 6:33 am
Location: Sunny SoCal

Post by Ralf Hutter » Wed Oct 22, 2003 5:42 am

dukla2000 wrote:I can't answer any of Mike's original 3 questions, but figure some thoughts I started in another thread are worth repeating.

1) Based on the relative importance we should place on the reliability of our hdd, it is worth making an effort to keep them running. Despite claims for relatively high operating temps (50/60C) and because of claims that cooler is more likely longer life, I subscribe to the 'over ambient' target. My main thought is this: with a typical hdd power consumption around 10W, I figure a well designed case/airflow/hdd location should easily be able to keep a hdd 10C over ambient, if not 5C over ambient. (OK if we enclose them for silence things change.)

2) Also because these things are mechanical, I figure the rate of change of temp is significant. Seagate 7200.7 says 20C/hour. If you do have a design that runs more than 20C above ambient, then there is every chance that on start (from cold) your hdd will heat faster than is good for it. (So maybe 20C over ambient is a sensible design max for hdd temp?)
My thoughts exactly.

Add this to the counterpoint ("What Price Data Safety") that I posted in my case review and you'll see my position on this issue. Which still stands. And since I figured I had nothing new to add to this topic I haven't posted here until now. So consider this post as an exclamation point to my original counterpoint reply.

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada

Post by MikeC » Wed Oct 22, 2003 7:01 am

dukla2000 wrote:(So maybe 20C over ambient is a sensible design max for hdd temp?)
I have no issue with the position that more heat is worse than less heat. (as long as we're not dipping down to too cold.) The only real question I am asking is what temps are unsafe and is there empirical data to support that?

Both of dukla2000's points are sound. The second combines with the max safe temp spec provided by drive makers to give a much more curtailed high temp, especially if you turn your PC on/off as opposed to running it 24/7. (Constant rotation must be more benign than off/on for most devices like fans and hard drives; the start/stop process involves a huge number of mechanical stresses from the jerk start to overcome inertia to large temp changes and so on, much like for a car engine, which tends to get the greatest wear & tear from ignition.)

BTW, going back to Ralf's review, we find the stated ambient is 75F = 24C. The worst case temp was 43C or 19C over ambient, within the sensible design max suggested by dukla2000. Add the 5V front fan Ralf favors, and it drops to 37C or just 13C over ambient.

pingu666
Friend of SPCR
Posts: 739
Joined: Sun Aug 11, 2002 3:26 pm
Location: swindon- england :/

Post by pingu666 » Wed Oct 22, 2003 10:31 am

i think a rule of thumb is, if your data is important u cool the drive somehow
i need to sort my via, and find a solution for my dads

Bluefront
*Lifetime Patron*
Posts: 5316
Joined: Sat Jan 18, 2003 2:19 pm
Location: St Louis (county) Missouri USA

Post by Bluefront » Wed Oct 22, 2003 4:54 pm

Me being the cheap fellow I am, I value both the data and the drive. The worst scare I ever got in this matter was after a long XP install with a low-powered computer, and a Maxtor drive. (at least 2.5 hrs)

I thought the drive had enough airflow, but it was a new untested setup. For whatever reason I opened the case immediately after the install, and I swear I burned my hand on the drive. No telling how hot it was.

Since that experience, when I install an OS, it's with the side of the case open, and a large house fan blowing in...heh.

I use 40c as a max temp point, for no particular reason except most of my setups with moderate airflow around the drives, stay under that temp.

josephclemente
Posts: 580
Joined: Sun Aug 11, 2002 3:26 pm
Location: USA (Phoenix, AZ)

Post by josephclemente » Wed Oct 22, 2003 5:21 pm

I don't think even the drive manufacturers know.

At work we have loads of computers in vent-blocking locations, filled to the max with dust-balls, and hard drives operating at temperatures I wouldn't dare touch. Not bad for 10 years of 10 hour days, 5 days per week. Of course, today's drives are different with their 1 year warranty period...

grandpa_boris
Posts: 255
Joined: Thu Jun 05, 2003 9:45 am
Location: CA

Post by grandpa_boris » Wed Oct 22, 2003 6:15 pm

Bluefront wrote:Me being the cheap fellow I am, I value both the data and the drive
seagate gives the MTBF for 7200.7 as 600,000 power-on hours @ 25'C. that's over 68 years of operation. if we take the "+10'C == 1/2 life" rule seriously, then 7200.7 running @ 35'C will last for 34 years before a failure, 17 years @ 45'C, and over 8 years @ 55'C :shock: .
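The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the deliberately naive reading used here: it treats the MTBF figure as one drive's lifetime and applies the "+10°C halves the life" rule literally. The function names are made up for illustration, and a year is taken as 8760 hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF figure in power-on hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def halved_life_years(base_years: float, base_temp_c: float, temp_c: float,
                      step_c: float = 10.0) -> float:
    """Apply the rule of thumb literally: every step_c degrees of rise halves the life."""
    return base_years * 0.5 ** ((temp_c - base_temp_c) / step_c)


base = mtbf_years(600_000)  # Seagate's quoted figure at 25 C
for temp in (25, 35, 45, 55):
    print(f"{temp} C: {halved_life_years(base, 25, temp):.1f} years")
```

This prints roughly 68.5, 34.2, 17.1 and 8.6 years, matching the figures quoted above.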

i would like you all to think back to the systems you were using 8 years ago and drives you were using then. or even better, think back 17 years and recall what system and what sort of drive you were using back in 1986. my system in 1986 was a 80186 with a 20MB disk drive. in 1995 i was running 486/66 with a 1.2GB drive :!: .

now think about how much data you had there. and think about how much data you have now. plot the curve in your mind. it's roughly a 60-100%/year slope, i think. the average drive capacity definitely grows @ 60%/year (i did some survey and data analysis in this area some 6 months ago and my numbers matched the projections from IBM and seagate) now, consider your 120GB 7200.7 drive that probably has 20-30GB of data on it. you will run out of capacity on that drive in about 4 years. at that point the average size of a disk drive in "moderate" price range (say $100-150) will be around 500GB, and you are probably going to be upgrading your system at that point, anyway.
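The "run out of capacity" estimate can be checked the same way. This is a rough sketch that assumes the amount of stored data grows at a fixed compound rate each year; the function name is hypothetical, and the 20-30GB and 60-100%/year inputs are the ones mentioned above.

```python
import math


def years_to_fill(data_gb: float, capacity_gb: float, annual_growth: float) -> float:
    """Years until data growing at a fixed compound annual rate reaches the drive capacity."""
    return math.log(capacity_gb / data_gb) / math.log(1.0 + annual_growth)


for data_gb in (20, 30):
    for growth in (0.60, 1.00):
        years = years_to_fill(data_gb, 120, growth)
        print(f"{data_gb} GB growing {growth:.0%}/year fills 120 GB in {years:.1f} years")
```

Under these assumptions the 20GB at 60%/year case comes out just under 4 years, and the other combinations fill the drive sooner.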

so if you are pro-active and move your valuable data from where it is now to a new disk on a new system every 4 years or so, you should be keeping well away from sudden heat death of your disk drive. if you start noticing a lot of seek delays and recalibration grinding of a drive, and use that as a sign that it's time to migrate the data, you can accommodate even cases of really badly made drives (i am thinking here of my 30GB IBM deskstar that started making nasty noises and grind-seeking after less than 2 years of fairly cool operation).

so unless i am building a system that will be entombed in a wall and has to work unattended for the next 20 years, based on this discussion and my thinking on the matter, i expect that i will not be worrying about a disk drive running @ 55'C.

i am disconnecting the 40mm fan on my formerly and soon again fanless via box as soon as i get home.
josephclemente wrote:I don't think even the drive manufacturers know.
perhaps we should ask....

Inexplicable
Posts: 226
Joined: Sat Sep 06, 2003 5:59 am
Location: Finland

Post by Inexplicable » Thu Oct 23, 2003 12:03 am

I found the following white paper on Maxtor's web site. It discusses choosing a hard drive for DVR/PVR systems and has a lot of stuff that's relevant for silent systems. An interesting read. The temperature graphs seem to pan out with what I'm observing with my two 7200 rpm DiamondMax Plus 9 SATA drives. I have a Sonata case, which has restricted air flow around the drive cages and rubber grommet mounting, and my SMART temps are typically hovering around 45 C.

al bundy
Posts: 667
Joined: Thu Feb 20, 2003 5:38 pm
Location: Chicago, IL

Post by al bundy » Thu Oct 23, 2003 12:28 am

grandpa_boris wrote:... i expect that i will not be worrying about a disk drive running @ 55'C...
My thoughts exactly, at least for a Barracuda drive.

I have never cooled a hard drive and I have never experienced a "heat-related failure". Nobody else that I know has ever experienced a "heat-related" hard drive failure. Although we have had manufacturers acknowledge various mechanical failures on occasion, these were inherent (structural) issues and (according to the manufacturer) not related to heat. Even the dreaded DeathStar drives :wink: , which I have had plenty of, are now made safe with the recent downloadable IBM firmware fixes.

My own safety-temp values, where I start to get worried about a hard drive, are 56C for a Barracuda drive and 51C for all others. I've run various drives for years, in on/off fashion too, at temps just under these and never had a single issue.

If I ever found a drive hitting these temp-marks, this would indicate a bigger problem, and I would want to find a better case-cooling solution rather than putting a fan in front of the hard drive.

I do understand the feelings of those that worry about cooling their hard drives though, as we are all sensitive to heat when minimizing noise from our PC's. I just think that most people waaaaaaaaaay underestimate the 'workhorse' nature of our hard drives.

8)

dukla2000
*Lifetime Patron*
Posts: 1465
Joined: Sun Mar 09, 2003 12:27 pm
Location: Reading.England.EU

BEWARE THE MAGIC MTBF ILLUSION!!!

Post by dukla2000 » Thu Oct 23, 2003 2:38 am

al bundy (et al) - Sure in the past/historically/actual data very few (if anyone) have actual heat related hdd failures. But as per investment performance, past experience is no guarantee of the future.
grandpa_boris wrote:... that's over 68 years of operation. ...
Not necessarily. My stats is virtually zero, but I remember somewhere a good post on what MTBF really means, and this Googled result is more or less what I remember. Now I can't work the arithmetic in the example
R = exp(-43800/250000) = 0.839289
But bottom line, with a 7200.7 over (say) a 4 year life then the stats is actually saying there is an x% (say 92%? - I can't interpret what exp function is in that equation!) probability my drive will last that long.

By looking after the drive environment I am trying to increase the probability of no failure. Coming back to the 'bad' environments: again the stats is only saying the probability is lower you will survive, not necessarily zero. In no way am I finger pointing or asserting your drive WILL fail: it is just an inner smugness that I believe my drive has a better chance of lasting 4 years than yours.

[edit] ps - managed to work the arithmetic: it is natural log (e) based. So for 600000 MTBF, 4 year life, probability is 94.3% of operation without failure. [/edit]
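For anyone who wants to redo this arithmetic, here is a minimal sketch. It assumes the constant-failure-rate (exponential) model that the R = exp(-t/MTBF) formula implies, which is a simplification of how real drives fail; the function name is made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def survival_probability(hours: float, mtbf_hours: float) -> float:
    """Probability of running `hours` without failure, assuming a constant failure rate."""
    return math.exp(-hours / mtbf_hours)


# The worked example quoted above: 43800 hours (5 years) against a 250,000 hour MTBF.
print(f"{survival_probability(43_800, 250_000):.4f}")

# A 600,000 hour MTBF drive over a 4 year life.
print(f"{survival_probability(4 * HOURS_PER_YEAR, 600_000):.3f}")
```

The two results come out at about 0.8393 and 0.943, in line with the 94.3% figure above.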
Last edited by dukla2000 on Thu Oct 23, 2003 2:52 am, edited 1 time in total.

Bluefront
*Lifetime Patron*
Posts: 5316
Joined: Sat Jan 18, 2003 2:19 pm
Location: St Louis (county) Missouri USA

Post by Bluefront » Thu Oct 23, 2003 2:49 am

I guess my experience differs from others. I collect old computers that I gather from work, other places. Most died from failed hard drives, some from a failed PSU. Whether the drive failure was heat-related is debatable, but they did die. So I do worry about heat......

SometimesWarrior
Patron of SPCR
Posts: 700
Joined: Thu Mar 13, 2003 2:38 pm
Location: California, US

Post by SometimesWarrior » Thu Oct 23, 2003 3:17 am

grandpa_boris wrote:seagate gives the MTBF for 7200.7 as 600,000 power-on hours @ 25'C. that's over 68 years of operation. if we take the "+10'C == 1/2 life" rule seriously, then 7200.7 running @ 35'C will last for 34 years before a failure, 17 years @ 45'C, and over 8 years @ 55'C :shock:
I think you may be misinterpreting the MTBF rating. StorageReview has a good article on the topic. Hard drives don't really last half a century. ;)

grandpa_boris
Posts: 255
Joined: Thu Jun 05, 2003 9:45 am
Location: CA

Post by grandpa_boris » Fri Oct 24, 2003 1:09 pm

SometimesWarrior wrote:I think you may be misinterpreting the MTBF rating.


deliberately so :-). my point is that the disk will be practically useless and subject to replacement with a cheaper, better drive long before it reaches the end of its useful life. so shaving that life span down by running it near the operating limits may not be such a great threat.

grandpa_boris
Posts: 255
Joined: Thu Jun 05, 2003 9:45 am
Location: CA

disk temperature vs longevity

Post by grandpa_boris » Fri Oct 31, 2003 2:43 am

:oops: this is very embarrassing. this message was supposed to be a personal note to MikeC, hence the questions about possibly hosting images, obviously incomplete information, and some specifics that i am now editing out. i should be more careful next time. but i decided to leave the message posted because it may be of general interest after all...

in a recent discussion, you said:
MikeC wrote:1 - Oh, ok, you mean to predict temps? No I totally agree, you can't predict it but you don't need to, you can measure directly with thermal diodes in almost any modern drive and DTemp. I'd recommend anyone who has concern about data safety to have a drive with thermal diodes and DTemp or similar and to monitor temps at least from time to time.
actually, you can to some extent. at USENIX FAST 2003 conference there was a paper presented on calculating power consumption of a disk drive given a work load pattern. it isn't yet available to non-USENIX members and the math was sufficiently baroque that it's probably of little interest to anyone outside of the academia.
MikeC wrote:2 - I have no quibble with the basic notion that more heat shortens component life -- just the precise expression of +10C = 1/2 life. I would think this depends entirely on how close you are to overheating. Say with a drive rated for safe operation to 60C internal temp. If you run it at 40C instead of 30C, it will halve the lifespan? Somehow I doubt it. But if you run it at 55C, it is much more likely to halve the life compared to running it at 45C, I would think.
i had an opportunity to ask my contacts within a disk manufacturer's research arm to see what they can find out about the relationship between disk temperatures and disk longevity and reliability, and take a couple of minutes in our recent meeting to give me a synopsis.

turns out they don't have much to tell and all that they did have was proprietary, internal and subject to the usual ugly NDAs.

i don't have any hard numbers or charts to pass on the forums here. they didn't have anything that was publicly available, but promised they'll look for public info they can pass on to me. if that happens and if you are interested in placing it on this site, i'll get it to you.

what it all comes down to is that if a disk has an error rate of X @ 25°C, it is derated to (.44 * X) @ 45°C and (.2 * X) @ 65°C. at the nominal temperature of 25°C, a typical disk's service life is 5 years. they use temperature-induced aging to stress test drives, but they don't publish the data at higher temperatures. the implication is that the service life is guaranteed within the operating range of the drive. the manual for 7200.7 states "Actual drive case temperature should not exceed 69°C (156°F) within the operating ambient conditions." does that mean i can run my disk @ 68°C for 5 years? they didn't know.

if i get any solid info, it may be worth sharing it with the people here. i have no way of hosting images of charts or pdf copies of papers, if they get me any. would it be possible to have you host them on SPCR if they aren't too big and of sufficiently broad appeal to the people here?
Last edited by grandpa_boris on Fri Oct 31, 2003 10:31 am, edited 1 time in total.

dukla2000
*Lifetime Patron*
Posts: 1465
Joined: Sun Mar 09, 2003 12:27 pm
Location: Reading.England.EU

Post by dukla2000 » Fri Oct 31, 2003 3:44 am

Interesting stuff. In particular the numbers for the error rate decrease (well I guess the quoted number will decrease => an increase in the number of errors) as temp increases. But after that I feel it is all lies, damned lies and statistics :)

I can understand they are loath to publish raw data: based on the litigation nature of some societies it is simple to imagine the consequences. And it also sets them up for a willy-waving spec contest with other manufacturers.

But your notes did make me recheck the 7200.7 Sata specs (Publication number: 100270024, Rev. C) which has under Reliability (pg 21) "Mean time between failures (MTBF) 600,000 power-on hours (nominal power, 25°C ambient temperature)". (My emphasis) I am sure the error rate degradation numbers you quote are used to 'normalise' the temperature induced aging. So the following may be misuse of correct data for an incorrect purpose, but what the hell :twisted:

600000 hours @ 25C becomes
264000 hours @ 45C and
120000 hours @ 65C

Now the probability of no failure during a 5 year life becomes
0.93 @ 25C
0.85 @ 45C and
0.69 @ 65C

And presumably the probability at 65C is starting to be 'significantly low' which is why they spec the environment max as 60C? Now if these stats are 'meaningful' then for every 100 systems delivered, a system builder can expect between 7 and 30 hdd failures in 5 years (depending on the ambient temps): any builders out there with any records of failures?
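For anyone who wants to check the arithmetic above, here is a rough sketch in Python. It assumes a constant failure rate (a simple exponential model), 24/7 operation, and that the quoted derating factors can be applied directly to the 600,000 hour MTBF figure, which may well be a misuse of the numbers as already noted. The function and variable names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 power-on hours, assuming 24/7 operation

# Derating factors quoted earlier in the thread, relative to 25C.
# Applying them straight to MTBF is an assumption, not a manufacturer method.
DERATING = {25: 1.0, 45: 0.44, 65: 0.2}


def derated_mtbf(mtbf_25c_hours, temp_c):
    """Scale the 25C MTBF by the quoted derating factor for temp_c."""
    return mtbf_25c_hours * DERATING[temp_c]


def survival_probability(mtbf_hours, years):
    """P(no failure) over the period, assuming a constant failure rate."""
    return math.exp(-(years * HOURS_PER_YEAR) / mtbf_hours)


for temp_c in sorted(DERATING):
    mtbf = derated_mtbf(600_000, temp_c)
    p = survival_probability(mtbf, 5)
    print(f"{temp_c}C: MTBF {mtbf:,.0f} h, P(no failure in 5 years) = {p:.2f}")
```

Under those assumptions this gives the same 0.93, 0.85 and 0.69 figures listed above. A different failure model (for example one with early-life or wear-out failures) would give different numbers.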

One pedantry: I suggest your "the implication is that the service life is guaranteed within the operating range of the drive" would be more accurately written as "the implication is that the service life has high statistical probability of no failure within the operating range of the drive". I am sure it is not their intention to guarantee anything!

But certainly anything you can get would interest me: not least even their own numbers or correction of any GIGO I may be propagating!

grandpa_boris
Posts: 255
Joined: Thu Jun 05, 2003 9:45 am
Location: CA

Post by grandpa_boris » Fri Oct 31, 2003 10:48 am

dukla2000 wrote: One pedantry: I suggest your "the implication is that the service life is guaranteed within the operating range of the drive" would be more accurately written as "the implication is that the service life has high statistical probability of no failure within the operating range of the drive". I am sure it is not their intention to guarantee anything!
for the consumer market this is most likely the case (although seagate seems to be willing to make some efforts to improve on that image in some markets). but for the "enterprise" market segment (i.e. major OEMs like Sun, EMC, etc.) they will support and fix drives within their service life.

bob670
Posts: 49
Joined: Fri Oct 10, 2003 7:47 am

Post by bob670 » Fri Oct 31, 2003 12:25 pm

I try not to be a slave to component temps, but a little OCD kicks in and I get obsessed with it. I think the 55 degree number rings with me and my WDs seem to be hanging around 43-45 full time. That said, I have been repairing OEM PCs for a long time, and as manufacturers have sought to stuff more stuff into a smaller space I have encountered some really hot and poorly ventilated drives in systems from Dell, Compaq and IBM, and none of their failure rates have struck me as so high as to worry about it. I have one client with over 300 GX 50s which have the drive mounted in the front, top half of small desktop cases with practically zero airflow, and a lot of heat from the case seems to flow right UP into these drives. If you shut one of these machines down and check temps they are always in the high 50s or worse, and so far after 2 years of 24/7/364 usage I have only replaced one hard drive for this customer. I won't do the math, but judging by what I have seen from Dell and IBM in the past, if temps that high were really that much of a life shortener, they would find a solution or change case design to reduce warranty replacement cost.

I have to amend that, I ran Dtemp for the first time since I rearranged my cabling and changed heatsinks, both my WD 40 Gig SEs are idling at 28 degrees. Gotta' like that, although I'm not sure how accurate those diodes are? Any opinion?

exrcoupe
Posts: 31
Joined: Sat Sep 20, 2003 9:43 pm
Contact:

Post by exrcoupe » Tue Nov 11, 2003 8:23 am

So after all this debate, it is still undecided and it's left up to what you're comfortable with? But it seems that the general consensus is that 55C is a safe range, correct?

I ask this because I had just installed a cdrw so I had to re-arrange my setup with 2 hard drives and another cdrom. So my cables are all jumbled now and my hdd's aren't mounted the way I want them.

scalar
Posts: 90
Joined: Sat May 17, 2003 12:54 am

Post by scalar » Sun Nov 16, 2003 7:50 am

I'm not sure on what these max safe temperatures are based. I assume this is for the cover gaskets and drive bearing oil?


In the worst case scenario, the actual data on a hard drive should be safe up to approximately 500F, even if the circuitry may not survive. You can probably set the drive in a wood stove for an hour, then send it to a drive recovery service and get all the data back.

Somewhere around 500F there is a problem where the magnetic domains in metal can relax, and cause the data to essentially fade away as the domains begin realigning on the recording surface to a pattern of lowest energy.


I seem to recall that all modern circuit boards are built using a process called wave soldering where the whole board with components is dipped into a huge pool of liquid solder at around 350 degrees F, so the circuitry itself can therefore survive a nonoperating temperature at least that high. Even those ribbon cables in a drive are often wave soldered, so they too can withstand such high heat.

When operating, the circuitry will obviously generate heat, but the heat output is fairly small and stable, since the drive only needs to maintain a constant RPM with fairly low-friction bearings. Therefore, the circuitry itself could probably tolerate running at least at 300 degrees F continuously without failure.


Probably the real temperature concern is for the following two items:
- foam/rubber gaskets/insulators/isolators
- spindle motor/bearing oil

The drive cover is sealed to the frame using a gasket, and in all likelihood, this is a cheap foam-rubber gasket. This probably cannot handle temperatures in excess of 250 degrees F before it begins to melt and bubble.

In the worst case, the foam seals could liquify, flow into the drive, and get on the platters, gumming up the head/arm assembly. Or it could melt and form a gap between the cover plates, and allow dust to get inside that eventually crashes the drive heads.

Additionally, many drives have foam or a plastic insulating sheet under the circuit board to insulate it from the metal drive frame. It is certainly possible for this plastic to shrink or perhaps melt, and perhaps warp the circuit board until the board cracks or touches a wire to bare metal.


The other weak spot is likely the spindle lubricating/bearing oil. Get the drive sufficiently hot, and you can probably boil the oil right out of the drive's motor bearings, evaporating the oil until the bearings are dry and the thing will no longer spin.

Since this oil is already likely very thin and light, it probably does not take too much heat to get the internal bearing pressure high enough until it leaks out of the seals and escapes into the air.

Very old drives develop a problem known as stiction, which is some sort of failure of the bearings. Either the oil has leaked out, or the high heat over the years has caused the oil to form a thick, sticky jelly that makes the spindle difficult to turn. Usually drives with stiction can be manually spun up with inertial techniques, and once running will continue to run, but if stopped for a while will go right back to being stuck again.


-Scalar
