Max safe temp for hard drives?
We need someone to do a drive overheating test.
Put the hard drive in a small toaster oven with a temperature probe and see just how warm the drive can get before it fails: starting at 150F, slowly raise the oven temperature by 25 degrees every 15 minutes, up to 450F, running a full seek and read/write test of the drive at each step.
Just be warned that the power and data/ribbon cables may begin to melt and short before the drive actually fails.
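The proposed heating schedule works out as follows. This is a minimal sketch that only tabulates the setpoints. It assumes the 25-degree steps are Fahrenheit, as the 150F and 450F endpoints suggest, and converts each step to Celsius so it can be compared with the SMART temperatures quoted elsewhere in this thread. `oven_schedule` is a hypothetical helper name.

```python
def f_to_c(deg_f: float) -> float:
    """Convert Fahrenheit to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def oven_schedule(start_f: int = 150, stop_f: int = 450,
                  step_f: int = 25, minutes_per_step: int = 15):
    """List (elapsed_minutes, setpoint_F, setpoint_C) for each heating step."""
    steps = []
    elapsed = 0
    temp_f = start_f
    while temp_f <= stop_f:
        steps.append((elapsed, temp_f, round(f_to_c(temp_f), 1)))
        temp_f += step_f
        elapsed += minutes_per_step
    return steps


schedule = oven_schedule()
for minute, temp_f, temp_c in schedule:
    print(f"t={minute:3d} min  {temp_f} F  ({temp_c} C)")
print(f"{len(schedule)} steps, {len(schedule) * 15} minutes in total")
```

The run has 13 steps and lasts 195 minutes. Even the first step (about 65.6 C) is already above the 55-60 C maximums that drive specs typically quote.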
How To Really And Fully Wipe A Hard Drive?
You need to raise the temperature of the magnetic coating above the Curie temperature (770 C for iron). But as the platters are probably aluminum, and the melting point of aluminum is around 660 C -- you're probably going to have to settle for melting the platters and stirring them up.
Digital Media Life Expectancy and Care
The only magnetic reasons for an unsuccessful data recovery are erasure or extreme temperature exposure. The Curie temperature for 8mm Metal Particle DAT Tape is 1000C! Hence we need not be concerned about the magnetic properties because these temperatures will destroy the binder and the base film long before the magnetic properties are affected.
Magneto-Optical systems
All magnetic materials have a characteristic temperature, called the Curie temperature, above which they lose magnetization due to a complete disordering of their magnetic domains. Therefore, they lose all the data previously stored on them. More importantly, the material's coercivity, which is a measure of the material's resistance to having its magnetization changed by an applied magnetic field, decreases as the temperature approaches the Curie point, and reaches zero when this temperature is exceeded. For the modern magnetic materials used in MO systems, this Curie temperature is on the order of 200°C. It is important (since this is a multiply-erasable system) that the only change to the material when it is heated and cooled is the change in magnetization, with no damage to the material itself.
notebook hard disk Temps
Hi, does anybody know if notebook hard disk temps are the same as desktop PC ones? How much heat can a notebook hard disk take? Normally this type of disk doesn't have good airflow....
Just to disturb those who believe in HDD cooling and such things: my drives are constantly over 50 degrees. Let's take a look: 58 right now, and working fine for years.
Mostly Maxtors 6Y120P0 and 6Y160P0, and some Seagate Barracuda IV 80G ones.
Record temp: 83 degrees for my 6Y160P0. Well, fitting the drive with foam into a 5 1/4" bay is a big no-no.
Record temp for the Seagate: 65 degrees, after backing up 80G of data plus some more work/defragmenting with the drive in an uncooled IDE drawer (cooler removed because of the damn noise).
So far, no drive failure yet
Re: notebook hard disk Temps
shadow947 wrote: Hi, does anybody know if notebook hard disk temps are the same as desktop PC ones? How much heat can a notebook hard disk take? Normally this type of disk doesn't have good airflow....
Most 3.5" desktop drive specs suggest no higher than 55C. Most notebook 2.5" drive specs suggest no higher than 60C. The nice thing about notebook drives is that they generally consume far less power, and thus produce less heat too. Producing less heat while tolerating more heat is a very good combination to have.
Mine (for Seagates and Maxtors) seem accurate. The temp is always higher on the sides and always lower on the top: like 58C on the sides, 53C on top. SMART says 57 degrees. Pretty accurate, if you ask me.
Measured with this:
http://deltatrak.com/infrared_thermo_8.shtml
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
dukla2000 wrote: al bundy (et al) - Sure, in the past/historically/actual data, very few (if any) have had actual heat-related HDD failures. But as with investment performance, past experience is no guarantee of the future.
I took a statistics class, and that formula doesn't make any sense to me because it has no way of accounting for variability.
grandpa_boris wrote: ... that's over 68 years of operation. ...
Not necessarily. My stats knowledge is virtually zero, but I remember a good post somewhere on what MTBF really means, and this Googled result is more or less what I remember. Now I can't work out the arithmetic in the example:
R = exp(-43800/250000) = 0.839289
But bottom line, with a 7200.7 over (say) a 4-year life, the stats are actually saying there is an x% (say 92%? I can't interpret what the exp function is in that equation!) probability my drive will last that long.
By looking after the drive environment I am trying to increase the probability of no failure. Coming back to the 'bad' environments: again, the stats only say that your probability of survival is lower, not that it is zero. In no way am I pointing fingers or asserting your drive WILL fail: it is just an inner smugness that I believe my drive has a better chance of lasting 4 years than yours.
[edit] ps - managed to work out the arithmetic: exp is the natural exponential function (base e). So for a 600000-hour MTBF and a 4-year life, the probability of operation without failure is 94.3%. [/edit]
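The exponential reliability formula discussed here can be checked with a few lines of Python. This is a minimal sketch that assumes, as the quoted example does, a constant failure rate of 1/MTBF. The 43800 hours in the example is 5 years of continuous operation. `HOURS_PER_YEAR` is an illustrative constant that ignores leap days.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap days


def reliability(hours: float, mtbf_hours: float) -> float:
    """Probability of running `hours` without failure, assuming a
    constant failure rate of 1/MTBF (exponential model)."""
    return math.exp(-hours / mtbf_hours)


# The quoted example: 43800 h (5 years) against a 250000 h MTBF.
r_quoted = reliability(43800, 250_000)
# The 4-year case against a 600000 h MTBF.
r_four_years = reliability(4 * HOURS_PER_YEAR, 600_000)

print(f"{r_quoted:.6f}")
print(f"{r_four_years:.1%}")
```

The first value reproduces the quoted 0.839289, and the second reproduces the 94.3% from the edit, a little above the 92% guess.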
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
Elixer wrote: I took a statistics class, and that formula doesn't make any sense to me because it has no way of accounting for variability.
worse yet, a simple statistical model is inapplicable. disk failures aren't evenly spread over time; they follow a bathtub-shaped curve. disks either fail shortly after being deployed, or run well for a long time -- and then all disks from the same batch fail at almost the same time. i have actually seen this happen in real life. it ain't pretty.
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
grandpa_boris wrote: and then all disks from the same batch fail almost at the same time. i have actually seen this happen in real life. it ain't pretty.
So if I buy a bunch of the same hard drives at once and they all happen to be from the same batch, then put them into a RAID5 array thinking I'll be fine as long as no more than one drive fails at a time, I might be surprised down the road? That would suck.
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
shunx wrote: So if I buy a bunch of the same hard drives at once and they all happen to be from the same batch, then put them into a RAID5 array thinking I'll be fine as long as no more than one drive fails at a time, I might be surprised down the road? That would suck.
that's what the numbers i've seen (and referenced) suggest. as i have mentioned, i've seen this happen in real life with enterprise-grade SCSI disks. however, it takes time for a disk to fail. if you monitor the SMART info from your drives, you'll be able to detect deterioration before it spreads and becomes fatal. if you quickly replace (and resync) the failing or about-to-fail drives, you should be able to get through the "mass die-off" with little trouble.
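To put rough numbers on the RAID5 worry, here is a minimal sketch of the chance that a second drive fails while the array is degraded. It assumes independent, exponentially distributed failures with the 600000-hour MTBF used earlier in the thread. The 4-drive array, the 24-hour replace-and-resync window and the `hazard_multiplier` knob are all hypothetical values chosen for illustration. A same-batch die-off breaks the independence assumption, and the multiplier only crudely mimics that effect.

```python
import math


def p_second_failure(n_drives: int, window_hours: float, mtbf_hours: float,
                     hazard_multiplier: float = 1.0) -> float:
    """Probability that at least one of the n-1 surviving drives fails
    within the rebuild window, assuming independent exponential failures.

    hazard_multiplier is a hypothetical knob: values above 1 mimic a
    same-batch die-off in which the survivors' failure rate is elevated.
    """
    survivors = n_drives - 1
    rate = hazard_multiplier / mtbf_hours  # failures per hour, per drive
    return 1.0 - math.exp(-survivors * rate * window_hours)


for k in (1, 100):
    p = p_second_failure(n_drives=4, window_hours=24,
                         mtbf_hours=600_000, hazard_multiplier=k)
    print(f"hazard x{k}: {p:.4%}")
```

With independent failures the risk is tiny (about 0.012%). Correlated failures are what make it bite. Shortening the window by replacing suspect drives early scales the risk down roughly in proportion.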
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
grandpa_boris wrote: if you monitor the SMART info from your drives, you'll be able to detect deterioration before it spreads and becomes fatal. if you quickly replace (and resync) the failing or about-to-fail drives, you should be able to get through the "mass die-off" with little trouble.
Other than the temperature, what other things in particular should users look for in order to detect potential failures? I've merely used MBM to check temperatures.
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
shunx wrote: Other than the temperature, what other things in particular should users look for in order to detect potential failures? I've merely used MBM to check temperatures.
look for accumulations of successfully retried read and write errors, which the SMART firmware in the disk drives keeps stats on. SMART readouts, at some point, start suggesting disk replacement. it's quite unambiguous.
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
grandpa_boris wrote: look for accumulations of successfully retried read and write errors, which the SMART firmware in the disk drives keeps stats on. SMART readouts, at some point, start suggesting disk replacement. it's quite unambiguous.
I didn't know it was possible to do this. What software could we use to get the info? Thanks.
Re: BEWARE THE MAGIC MTBF ILLUSION!!!
shunx wrote: I didn't know it was possible to do this. What software could we use to get the info? Thanks.
DTemp will certainly do it.
Thanks for the replies. Strangely, DTemp says my ST3200822A drives are 128.00GB in capacity, even though they are 200GB drives. Anyway, here are some readouts that seem relevant:
----------------------[Device S.M.A.R.T. status]----------------------
Attribute                    Value   Thresh
----------------------------------------------------------------------
Raw read error rate             49        6
Reallocated sector count       100       36
Seek error rate                 83       30
Spin up retry count            100       97
Hardware ECC recovered          49        0
----------------------------------------------------------------------
So, which attributes are important, and what should the ideal range be?
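One common way to read this kind of table is as follows. The Value column is the drive's normalized score, which for most attributes starts high and counts down as the drive degrades. An attribute is considered failing once its value drops to or below Thresh, and a threshold of 0 means the attribute is informational and never trips. The sketch below applies that general SMART convention to the readings above. It is an illustration, not DTemp's own logic. The readings are hard-coded rather than read from the drive, and `assess` is a hypothetical helper name.

```python
# Readings copied from the DTemp output above: (attribute, value, threshold).
READINGS = [
    ("Raw read error rate", 49, 6),
    ("Reallocated sector count", 100, 36),
    ("Seek error rate", 83, 30),
    ("Spin up retry count", 100, 97),
    ("Hardware ECC recovered", 49, 0),
]


def assess(value: int, thresh: int) -> str:
    """Classify one normalized SMART attribute against its threshold."""
    if thresh == 0:
        return "informational"  # a zero threshold never trips
    if value <= thresh:
        return "FAILING"
    return f"ok (margin {value - thresh})"


for name, value, thresh in READINGS:
    print(f"{name:<26} {value:>4} {thresh:>4}  {assess(value, thresh)}")
```

By that rule none of these attributes is near failure. Spin up retry count has the smallest margin (3). A value still at 100, however, is likely still at its starting point. A value that keeps falling between readings is the more telling sign.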
Not gonna quote since the original stuff is so old, but also remember that MTBF = mean time BETWEEN failures, not BEFORE. Like StorageReview says, it's more like the chance of having a drive (or the drive, if you only use one) die, assuming you regularly replace it and run it under reasonable conditions, so it's not like the "timer" resets when you replace the drive. Any deviation from that will only make it much worse; I seriously doubt a modern (read: disposable) drive would have a chance at running for 25 years at 50C.
Hottest drive I've seen myself was an old SCSI Quantum Fireball ST that got too hot to even tap after about 30 seconds... don't think that counts though, since it had already died shortly after arrival and the heat was presumably from the motor straining against the crashed heads.
Aside from that (empty I think) drive, I've "only" ever completely lost two drives with no warning. Both happened within a week or so of each other, but they were completely different models (both Quantum though I think), just a one in several billion fluke. Lost everything on a 1GB Seagate SCSI drive sometime after that for some other reason... just got very flaky very fast, and I had no way of getting the data off it asap. Had a ton of sort of historically important data on it too.
i have a Maxtor 6Y120P0 and it's been running at 55-57C (according to SMART) for the past 6 months or so. i haven't experienced any failures in my general usage of it, but it's only an email server and there's only about 2GB of stuff on it.
however, running a SMART extended test reports "read failures" on it. the drive has only 6000 hours on it. a test run at 3000 hours had no errors. between 3000 and 6000 hours, it's just been inside my computer operating at 55C (at idle).
the failure might not be heat related but i'm leaning towards it. Maxtor's specs state 55C as the max operating temperature.
Recent hot weather and some poor temporary drive mountings pushed my seagate 7200.7 200gb sata drives over 50 deg C. Windows logged drive errors in the event viewer. SMART for one of the drives logged a max temp of 55 deg C.
I also have 2 120gb 7200.7 sata drives.. they probably got even hotter, because an ide cable fell on the top drive, blocking most of the airflow. Oops! I had to disconnect them to get my computer to boot.. I couldn't get SMART info because they are a raid-0 pair. I do have dvd backups of the data
-EDIT-
Ok, I'm not so sure HDD temp caused the problems; will update when I get around to fixing it.
Last edited by brad on Thu Jan 20, 2005 6:09 pm, edited 1 time in total.
Jan Kivar wrote: ...there is no simple high limit for drive temperature...
MikeC wrote: Well, there is if you consider the S.M.A.R.T. temp, which comes from the internal temp diode, to be a reasonable representation of internal drive temp. This would naturally include the effect of ambient temp.
Arrhenius’ rule is from chemistry... A 10C change in temperature doubles (or halves) a chemical reaction rate. Rules of thumb are stretched in all directions. For reliability, the temperature delta is important, but not nearly as important as the temperature delta divided by the time taken to change the temperature (dT/dt).
Regarding the "10°C rule for electronics/mechanics": A temperature rise of 10°C will halve the expected lifetime.
There is some question about where this originated. I recall reading somewhere about it being pulled out of the air by some smartass contractor for the US military who wanted to sell more cooling electronic gear?... Probably totally distorted.
Speaking seriously, we can’t help recalling Arrhenius’ rule from the U.S. Department of Defense Military Handbook 217 (this book used to be the court of first instance in all questions concerning electronics reliability). This rule suggests that, for the temperature range from –20 to 140C, every 10C drop in temperature doubles the service life of the equipment. Military Handbook 217 is no longer used nowadays, and the rule shouldn’t be taken literally. For example, temperature may vary in different parts of a single PC case. Still, the book had its truth: high temperature does a chip's service life no good.
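Taken literally, the rule scales expected life by a factor of two for every 10C. The following is a minimal sketch of that rule of thumb only. It assumes the handbook's stated –20 to 140C range and includes nothing drive-specific. `life_factor` is a hypothetical helper name, and `delta_t_c` is the temperature rise above whatever reference temperature you pick.

```python
def life_factor(delta_t_c: float, doubling_step_c: float = 10.0) -> float:
    """Relative expected life under the '10 C rule' of thumb.

    delta_t_c > 0 means running hotter than the reference temperature
    (life shrinks); delta_t_c < 0 means running cooler (life grows).
    """
    return 2.0 ** (-delta_t_c / doubling_step_c)


for delta in (-10, 0, 10, 15, 20):
    print(f"{delta:+d} C -> expected life x{life_factor(delta):.2f}")
```

Under this rule, 10C hotter halves the expected life and 20C hotter quarters it. As the posters note, the rule has been stretched well beyond its origin, so treat the output as a rough guess rather than a prediction for any particular drive.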
I was putting together a server a while ago. I have a Dell Precision Workstation 410 which has a 4 drive rack in the front of the case. I filled it up with 4 scsi drives. As I was installing Linux, it froze. So I turned it off and tried again and got the same results.
I finally noticed, when I touched the drive rack, that it was damn hot! 4 SCSI drives right on top of each other can create a lot of heat. Then I noticed that the temperature-controlled 92mm exhaust fan wasn't moving much. Even pinching the thermistor between my fingers didn't speed it up at all. Lastly, I noticed that the airflow wasn't very good in this case. There was plenty of open area in the front, but there were too many openings below the drives. Given the room that the 4 drives and cables took up, the air took the easiest path, completely bypassing the drives and leaving them in a dead zone.
I replaced the exhaust fan with a Panaflo 92M and duct-taped all extra openings except the ones right in front of the drives. Now all the air flows right over the drives and they barely get warm at all.
BUT.... I did appear to lose my two 18GB drives. One was a Seagate Barracuda and the other was an IBM double-height drive. Both of these drives reported errors after this. I don't know what temp they got up to when they crashed, but I don't see how it could have been caused by anything except the temp.
I'm fighting hard drive heat myself now.
Finally got around to attempting to decouple my Cuda IV, which would eliminate the final source of noise in my quiet little PC.
I've got to figure out a way to strap some heatsinks to the side of the drive so that it can stay a little cooler. I'm not comfortable with 45C (under a defrag)
Lots of good ideas in here though.
- M4H
a few interesting disk drive reliability links:
http://www.ewh.ieee.org/r6/scv/rs/articles/ss030326.pdf
http://www.usenix.org/events/fast03/tec ... rson_html/
the link i was actually looking for is Jon G. Elerath's IEEE paper titled "Specifying reliability in the disk drive industry: No more MTBF", but it's not accessible without an IEEE login. i have a PDF version, but i don't have a way to make it available.
iMac HD runs 57C
exrcoupe wrote: So after all this debate, it is still left undecided and left up to what you're comfortable with? But it seems that the general consensus is that 55C is a safe range, correct?
I came upon this forum while googling safe hard drive temperatures. We have similar debates on the Apple boards about whether the iMac G5 runs too hot and the ramifications of hard drive temps.
You guys might find it shocking, but iMac hard drives (e.g. Barracuda ST3160023AS 160gb on mine; others have Maxtors etc.) typically settle around 57.5C after some hours, SMART = about 57 or 58C. This is just surfing and email, not heavy-duty usage, and is not unusual for this machine. This is 3 degrees away from the max! As you can imagine, there is debate about whether the iMac's 2" thick system design compromises airflow, but you don't hear about a lot of hard drive failures (of course, the machines are only a couple of years old).
So this question you're debating is even more apt to my iMac than for your PCs. You guys seem to be worrying about 45C, jeez.
The other big complaint is that the fans are really loud, people think it's the resonance of the cpu fan case together with the fact the cpu is 18" in front of your ears.
Allow me to give my two cents here...
Every time I see these debates it makes me laugh.
People worry about the "long-term effects" of overclocking CPUs and "hot" running components.
Here's how it works.
The hotter the component, the shorter the life.
Now here is the reality of it all.
Even if you run a component hot, you will most likely upgrade before it dies.
If it runs stably through 24 hours of stress testing and you will more than likely have all-new components within 5 years, don't sweat it!
If you are still using the same components after 5 years, you're living in the stone age.
P.S. I have a 74GB Raptor suspended with a Nexus 120mm case fan at 5v, I never touch 40C under load.
Disk Reliability
Hi,
It seems that a lot of folks are worried about their disk failure rates and how these are impacted by temperature. I have some experience in reliability assessment and modelling in datacentres so I thought I'd join in.
Thermal impact on Reliability;
Disk reliability is significantly affected by operating temperature. See the following link from Hitachi Data Storage for a graphic example from a manufacturer:
http://www.hitachigst.com/hdd/technolo/ ... vetemp.htm
As can be seen, the rule of doubling for every 10 degrees C rise is not applicable, as the hard drive is electromechanical, not purely electronic. More interesting (to me anyway) is that the stated MTBF can be improved upon by running the disk colder.
From the chart, running this disk 15 degrees above its design temperature multiplies the probability of failure by 1.4 for every day you run the disk hot. Conversely, if you chill it down to 15 degrees below design, the probability of failure is divided by 1.5.
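The gap between the rule of thumb and the chart is easy to quantify. This sketch compares the failure-rate multipliers quoted from the Hitachi chart (x1.4 at +15C, /1.5 at -15C) with what a strict doubling every 10C would predict. The chart values are taken from this post, not re-read from the truncated link, and `rule_of_thumb_factor` is a hypothetical helper name.

```python
def rule_of_thumb_factor(delta_t_c: float) -> float:
    """Failure-rate multiplier if the rate doubled for every 10 C rise."""
    return 2.0 ** (delta_t_c / 10.0)


# Failure-rate multipliers as quoted from the Hitachi chart in this post.
CHART_FACTORS = {15: 1.4, -15: 1 / 1.5}

for delta, chart in CHART_FACTORS.items():
    rule = rule_of_thumb_factor(delta)
    print(f"{delta:+d} C: chart x{chart:.2f}, 10 C rule x{rule:.2f}")
```

At +15C the doubling rule predicts about x2.83, roughly twice the x1.4 read off the chart. At -15C it predicts about x0.35 against the chart's x0.67.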
MTBF & Reliability;
There has been discussion in this thread of what MTBF means and how it affects you, so I will offer some views on this.
MTBF normally stands for Mean Time Between Failures, but this applies to repairable systems. Hard drives are typically replaced, not repaired, so in this case it would mean Mean Time Before Failure (strictly, the metric for non-repairable items is MTTF, Mean Time To Failure).
As has already been pointed out the reliability will be of the characteristic "bathtub curve" where the following three effects combine:
1) Initial high infant mortality due to manufacturing defects. This will typically show up during the first few days; format the disk and then benchmark it for a day to get through this bit.
2) Normal low-level random failure following an exponential reliability model (hence the base-e exponential in the equation already given)
3) End of design life high failure due to component wearout
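One common way to model the three effects above is to add three hazard (instantaneous failure-rate) terms. A Weibull term with shape below 1 covers infant mortality. A constant term covers random failures; this is the exponential model, which is a Weibull with shape 1. A Weibull term with shape above 1 covers wear-out. This is a sketch of that technique, not the post's own model. Every parameter is hypothetical and chosen only to make the bathtub shape visible, apart from the 600000-hour MTBF used earlier in the thread.

```python
def weibull_hazard(t_hours: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), t > 0."""
    return (shape / scale) * (t_hours / scale) ** (shape - 1)


def bathtub_hazard(t_hours: float) -> float:
    """Failures per hour at age t_hours; all parameters are hypothetical."""
    infant = weibull_hazard(t_hours, shape=0.5, scale=1e8)      # falls with age
    steady = 1 / 600_000                                        # constant (exponential)
    wearout = weibull_hazard(t_hours, shape=5.0, scale=60_000)  # rises with age
    return infant + steady + wearout


for age in (10, 1_000, 10_000, 30_000, 50_000):
    print(f"{age:>6} h: {bathtub_hazard(age):.2e} failures/hour")
```

With these made-up numbers, the rate starts high and bottoms out in the middle years, where the constant term is the largest contributor and the exponential model is a fair approximation. It then climbs again as wear-out takes over.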
So if your disk survives its first few days, it is then going to be subject to a constant probability of random failure. The cumulative probability of survival (e.g. the probability that a disk will last 100,000 hours) is given by the exponential model. This means that the probability of your disk failing tomorrow is the same all the way through the design lifetime.
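The "same probability tomorrow" property is the memorylessness of the exponential model, and it is easy to check numerically. This minimal sketch assumes the 600000-hour MTBF quoted earlier in the thread and ignores infant mortality and wear-out. The chance of failing within the next 24 hours, given survival so far, comes out the same at every age.

```python
import math

MTBF_HOURS = 600_000  # example figure from earlier in the thread


def survival(hours: float) -> float:
    """Probability of lasting `hours` under the exponential model."""
    return math.exp(-hours / MTBF_HOURS)


def p_fail_next_day(age_hours: float) -> float:
    """P(failure within the next 24 h | the disk has survived to age_hours)."""
    return 1 - survival(age_hours + 24) / survival(age_hours)


print(f"P(lasting 100,000 h) = {survival(100_000):.3f}")
for age in (0, 8_760, 35_040):  # new, 1 year and 4 years of continuous use
    print(f"age {age:>6} h: P(fail in next 24 h) = {p_fail_next_day(age):.6%}")
```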
MTBF and Warranty;
The MTBF given and the design life of the disk will be substantially larger than the warranty period, for the simple reason that warranty returns cost the manufacturer a lot of money. This is why you see disks with 5-year warranties and huge MTBFs. The manufacturer will want to limit returns within warranty to a small percentage for cost reasons.
e.g.
Disks with MTBF of 83500 Hours are sold by manufacturer with a 1 year warranty.
Ignoring infant mortality and assuming no end of life failures;
probability of each disk surviving the first year is 90%
(in Excel use the formula =EXP(-(1/MTBF)*Hours) to give the reliability, subtract this from 1 to get the probability of failure)
So this manufacturer giving a 1 year warranty will have to replace 10% of all the disks they sell which will cost them more than the profit for the whole batch.
This is why the MTBF for disks is so high: the manufacturers who offer useful warranties (5 years) have to make disks of which only a very small percentage will fail within warranty in order to make money.
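The Excel formula above translates directly to code. This minimal sketch makes the same assumptions as the post: a constant failure rate, no infant mortality or wear-out, and 365-day years. It reproduces the 83500-hour example and adds a hypothetical 1,000,000-hour MTBF with a 5-year warranty for comparison. It also shows what an MTBF implies for a population. The expected number of failures in a fleet test is roughly disks × hours / MTBF, and half of a population has failed only after MTBF × ln 2 hours. The helper names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_fraction(mtbf_hours: float, years: float) -> float:
    """Share of disks expected to fail within `years`: 1 - EXP(-(1/MTBF)*Hours)."""
    return 1 - math.exp(-(1 / mtbf_hours) * years * HOURS_PER_YEAR)


def expected_failures(n_disks: int, hours: float, mtbf_hours: float) -> float:
    """Approximate failures when n_disks all run for `hours` at a constant
    rate (valid while failures are a small share of the fleet)."""
    return n_disks * hours / mtbf_hours


def median_life(mtbf_hours: float) -> float:
    """Age by which half of a population has failed: MTBF * ln 2."""
    return mtbf_hours * math.log(2)


print(f"{failure_fraction(83_500, 1):.1%}")      # the post's 1-year example
print(f"{failure_fraction(1_000_000, 5):.1%}")   # hypothetical 5-year case
print(expected_failures(10_000, 100, 1_000_000))  # 10,000 disks for 100 hours
print(round(median_life(1_000_000)))
```

The first line comes out at about 10%, matching the post, and the 5-year case at about 4.3%. The fleet test gives about one failure, and the median life is roughly 693,000 hours. This is why a 1,000,000-hour MTBF says little about how long any single disk will last.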
This MTBF is, however, an artificial value and should not be read as "my disk will last for 1,000,000 operating hours" because this is not true. Given how the number is used and the testing methods used to derive it, what it means is:
"with 10,000 of my disks all running together for 100 hours (1,000,000 disk-hours in total), roughly one will fail during the test"
End of Design Life;
Your disks will probably die shortly after the end of the design life due to mechanical wearout. This will be a safe (for the manufacturer) margin beyond the end of the warranty.
I have no data on how this is affected by temperature but it is reasonable to make the following assumptions:
1) Percentage of time powered up will impact EOL
2) Temperature whilst running will impact EOL (see the discussion above about the various seals etc in the disk)
3) Extent of use will impact EOL, if it is seeking continuously then the head motors and bearings are going to fail sooner.
I hope this helps. If anyone wants me to clarify anything or wishes to know more, please let me know.
Thx
Liam
I had two IBM 120GXP drives or something (3 or 4 years ago) in my PC which ran super hot. The drives got up to 65C in summer under load.
I replaced them after 1 year of temps between 50C and 65C.
I don't know if it was just the old drives or the heat, but they were LOUD when spinning outside their drive enclosure.
Now I try not to go above 50C for hard drives.
No hard drive has ever failed me.
I also normally replace any drive older than 2 years.
Seriously, drive temps under 55C for 2 years should not be a problem if you replace the drive after 2 years.
The bigger problem is running the drives 24/7 rather than heat.
They are not built for this. All desktop drives are built with 7 hours a day, 5 days a week in mind.
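The duty-cycle point is easy to quantify. This minimal sketch takes the 7-hours-a-day, 5-days-a-week pattern at face value and assumes 52 working weeks and 365-day years. Under those assumptions, a drive running 24/7 accumulates power-on hours almost five times faster.

```python
HOURS_24_7 = 24 * 365       # power-on hours per year when never switched off
HOURS_DESKTOP = 7 * 5 * 52  # 7 h/day, 5 days/week, 52 weeks

ratio = HOURS_24_7 / HOURS_DESKTOP
print(f"desktop pattern: {HOURS_DESKTOP} h/year, 24/7: {HOURS_24_7} h/year")
print(f"24/7 accumulates {ratio:.1f}x the hours")
print(f"2 years of 24/7 use ~ {2 * ratio:.1f} years of the desktop pattern")
```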