Intelligent thermal management is the key to silent computing, but CPU temperature reporting mechanisms in current processors and motherboards are inaccurate, with results that can vary by 10°C or more. The causes of the inaccuracies are complex, but correcting them to a more reasonable margin of error is not terribly difficult. Russ shows you how.
October 6, 2004 by Russ Kinder
In the quest for a quieter PC, most people quickly discover that heat is the fundamental cause of noise problems. Every source of noise in the computer, with the exception of the hard drives, is a fan that is there to deal with heat. Because of this, the quest for PC silence hinges on understanding thermal issues.
The single greatest source of heat is the CPU, which is also one of the most heat-sensitive components in the machine and often the most expensive. Unfortunately, our understanding of CPU heat is limited by how the temperature is reported. The CPU temperature reporting mechanisms incorporated into current processors and motherboards are incredibly inaccurate, with results that can vary by 10°C or more. The causes of the inaccuracies are complex, but correcting them to a more reasonable margin of error is not terribly difficult.
The single greatest source of heat in a PC is the CPU.
Before we get into correcting the CPU temperature reporting inaccuracies, a few observations:
1. Most PC users do not need to worry about precise CPU temps. It’s an odd
disclaimer for an article about how to correct CPU temps, but the oft-quoted
axiom of "If your PC is stable, your temps are fine"
is true. As long as your system is stable, you really don’t need to worry about whether your reported CPU temperatures are correct; running at 50°C or 60°C makes no
difference to any measurable performance benchmark. Computer tinkerers and power
enthusiasts have become temperature obsessed, but in reality, rounding your
CPU temps off to the nearest 10°C is all the accuracy 99% of users need.
So why write this? First, the temperature obsession isn’t likely to go away anytime soon, so at the very least we can help clear some of the fog of misinformation surrounding CPU temps. Second, there will always be a small group of people who want to know precisely what their machine is doing, so that they can tread more closely to that fine line at the limit of the performance envelope. A third, even smaller group is made up of people for whom accurate temperature data is an integral part of their work. If you want to compare the efficiency of different cooling systems, or benchmark your current system, you need a higher level of accuracy. This group obviously includes technical reviewers such as the SPCR staff.
2. What this article describes as "accurate" is a matter of debate.
This methodology attempts to remove the motherboard-induced variations from the CPU on-die thermal diode readings. However, the accuracy of the temperature output from the CPU thermal diode itself is a completely different issue. The P4 is known to report temperatures well below those of the hottest portions of the core, and to a lesser extent this is true of AMD processors as well. The nice aspect of reading directly from the CPU diode is that at least you know that everyone else’s readings are equally inaccurate. (On a philosophical level, if everyone is equally wrong, does that make everyone right?)
Simulated temperature plot of a P4 processor. The arrow on the
upper left marks the location of the P4’s thermal diode, while the arrow on
the lower right is the hottest portion of the die. The image, many times larger than the actual size of a P4 core, is from Differentiating
PCs in a ‘Toaster World’ by Robin Getz of Analog Devices, published
in the April 2002 edition of the Intel
Developer Update Magazine.
3. The calibration described here only works for
a specific CPU in a specific motherboard. Swap the CPU after you calibrate the thermal reporting system, and it won’t be accurate
anymore. This is because there is enough variance between different samples of the same model processor that their heat output will not be the same. Change the motherboard, and it won’t be accurate, again because of sample variances. Changing to a different model virtually guarantees a different temp result. Even removing and
replacing the CPU from the socket may affect the results, due to the potential
for altered resistance between the thermal diode output pins and the motherboard
socket. Changing heatsinks will have little effect, as long as
the heatsink is installed the same way with the same thermal interface material. (There is some debate about variability of TIM applications and the effects on CPU temp, but it’s generally true that over time, any differences become nullified.)
4. This article contains very little original thought. Nearly everything
in here is based on the work of someone else, or upon commonly understood principles
that have been explained elsewhere, often in much greater detail. This methodology
is a distillation of these ideas into a single convenient place. If the subject
interests you, I encourage you to do more reading elsewhere. Some excellent
places to start are the articles by Derek Peak (aka pHaestus) at Procooling.com:
from AMD’s Thermal Diode, and Fun
with AMD Diodes: CPU Mutilation, calibration and testing. Another important
source is the useful, if obscure Calibrating
the Internal Thermal Diode in an Intel PIII CPU at Arctic
Ok, for those of you brave enough to proceed…
CALIBRATING YOUR CPU THERMAL DIODE OUTPUT
To ensure accurate temperature reports, you need to check for and correct two big issues in your readings: Linearity and Offset.
Essentially, if your temp readings are linear they may be wrong, but
at least they are always wrong or offset by the same amount. Offset is
pretty easy to fix, but non-linearity is much trickier. So tricky, in fact,
that non-linearity is probably not correctable.
What you need for testing:
1. A CPU/motherboard combination that allows for reading from the CPU
diode. If you are running an Intel CPU made in the past 5 years, you’ve
got one. If you are running an AMD system, it’s a bit more complicated: First
of all, it has to be an Athlon XP or newer to have a diode, and secondly,
your motherboard has to read from the diode. Do not assume that your brand
new motherboard does. Quite a few of the Socket A motherboards available today still read from an in-socket thermistor rather than the CPU diode. The
trouble with reading from the socket is that its separation from the core
ensures temperature compression and non-linearity especially as temperature rises.
A simple way to tell if your socket A board is reading from the diode
is to watch the CPU temp change when a load is suddenly added to the system.
Set the update time on Motherboard Monitor (or other CPU temp monitoring utility) to 1 second, and then fire up CPUBurn or Folding@Home. If
the temp doesn’t jump several degrees in the first second or two, you’re not
reading from the diode.
An illustration of the difference in Diode/Socket temperature
readings from an AMD XP CPU. Note the 5° jump in the reading from the
diode in the first second of CPUBurn running, while the socket temp hasn’t
even begun to move
A second way to find out how your motherboard reports CPU temperature is to look up your model in Motherboard Monitor’s Motherboard List. Your board may be able to read the diode, but only if MBM is set to read from the correct sensor. If your motherboard does not support diode readings, you’re still welcome to try this method, but there are no guarantees that your invested time will amount to anything.
2. A mobo/CPU combination that allows under/overclocking. You don’t need a huge swing in CPU speed to get meaningful results, but statistically, the bigger the variation in speeds you can get, the better. Under/overclocking via either FSB, multiplier, or both is fine, but do not adjust the Vcore at any point during the testing.
3. Monitoring software, such as Motherboard Monitor 5. MBM5, although no longer being updated, is still probably the best choice. The ability to apply the calibration adjustment automatically to its readings is a very convenient feature.
4. Wattage calculation software, such as CPUHeat
Accuracy is surprisingly unimportant. CPU wattage varies linearly with MHz,
so the other variables in the software’s calculations have no real effect
on the outcomes of these tests (they get canceled out in the calculations).
So as long as you use the same source for all your wattages, and don’t change
the Vcore, you’re fine.
5. CPU stressing software. CPUBurn
is the default choice here: it’s simple, small, and unlike Prime95 or Folding@Home,
it produces consistent results.
6. Fixed fan speed on the CPU heatsink, and a PSU that does not increase its fan speed under load so much that it affects CPU temperature. Basically, the airflow/cooling conditions in the system must remain constant through all the tests. If you have a HSF that adjusts its RPM based on temperature, you will need to find a way to lock the fan at the same RPM for all the tests, otherwise non-linearity with changing temperature is assured. For motherboard-controlled fans, a BIOS setting tweak may be required. For PSUs with bottom-intake fans that ramp up fast at load, you may need to move the PSU outside the case temporarily.
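The load-step check described under item 1 (watching whether the reported temperature jumps in the first second or two of load) can be sketched in Python. This is only an illustration: the function name, the 2-second window, and the 3°C threshold standing in for "several degrees" are all assumptions, and the sample readings are made up rather than logged from real hardware.

```python
def reads_from_diode(samples, load_start_s, window_s=2.0, min_jump_c=3.0):
    """Guess whether temperature samples come from the CPU diode.

    samples      -- list of (time_s, temp_c) tuples, sorted by time
    load_start_s -- time at which the CPU load was started
    window_s     -- how long after load start to look (assumed: 2 s)
    min_jump_c   -- rise that counts as "several degrees" (assumed: 3 °C)
    """
    before = [temp for t, temp in samples if t <= load_start_s]
    during = [temp for t, temp in samples
              if load_start_s < t <= load_start_s + window_s]
    if not before or not during:
        raise ValueError("need samples both before and just after load start")
    # Compare the last idle reading with the peak inside the window.
    return max(during) - before[-1] >= min_jump_c


# Made-up 1-second samples: (seconds, °C); load starts at t = 1 s.
diode_like = [(0, 40.0), (1, 40.0), (2, 45.0), (3, 46.0)]
socket_like = [(0, 38.0), (1, 38.0), (2, 38.0), (3, 38.5)]

print(reads_from_diode(diode_like, load_start_s=1))
print(reads_from_diode(socket_like, load_start_s=1))
```

The samples would come from whatever your monitoring utility logs at a 1-second update interval; nothing here reads a sensor directly.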
The basis of the calculations is a series of temperature readings taken while putting the CPU under maximum load over a range of speeds. Run a series of max CPU temp tests across as wide a range of FSB speeds as you can run stably. The general method to use is:
A. Run CPUBurn until the reported CPU temp stops increasing, and then
record that temperature, the ambient temperature, and the CPU speed.
B. Adjust the FSB to the next level, and repeat A. Noting the ambient temperature is important, since it is unlikely that it will remain constant throughout your testing. Assuming your CPU heatsink fan draws air into the heatsink, the best place to measure this is within 6" of the intake point.
A wide CPU speed range is more important than the number of tests, but the more tests you have the more reliable your results will be, and the easier it will be to see patterns. I made 15 different readings for my testbed, but 5 would probably be enough. For my tests I set up a little spreadsheet to keep the values organized and to do the math for me, but just recording the results on the back of an envelope would work just fine too.
C. For each of the tests calculate and/or record the following values:
Temperature Rise from Ambient (CPU max temp minus the ambient temp), wattage,
MHz, and °C/W (°C/W=Temp Rise/Wattage).
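If you would rather not use a spreadsheet, the bookkeeping in step C can be sketched in a few lines of Python. The class name and field names are my own, and the wattage figures are placeholders (in practice they would come from your wattage calculator); only the two formulas, temperature rise and °C/W, are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    mhz: float        # CPU clock speed for this run
    cpu_max_c: float  # stabilised CPU temperature under load, °C
    ambient_c: float  # ambient temperature near the heatsink intake, °C
    watts: float      # estimated CPU wattage (placeholder values below)

    @property
    def rise_c(self) -> float:
        # Temperature Rise from Ambient = CPU max temp - ambient temp
        return self.cpu_max_c - self.ambient_c

    @property
    def c_per_w(self) -> float:
        # °C/W = Temp Rise / Wattage
        return self.rise_c / self.watts


# Illustrative readings only; the wattages are invented for the example.
runs = [
    TestRun(mhz=1300, cpu_max_c=35.5, ambient_c=22.0, watts=26.0),
    TestRun(mhz=2300, cpu_max_c=47.5, ambient_c=22.5, watts=46.0),
]

for run in runs:
    print(f"{run.mhz:6.0f} MHz  rise {run.rise_c:5.1f} °C  "
          f"{run.watts:5.1f} W  {run.c_per_w:.3f} °C/W")
```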
Part 1: Linearity
Linearity is simple to check now that you’ve done all that testing. Just scroll down the table looking at the °C/W results. Are they the same, or fairly close to one another? If yes, then your CPU temperature monitoring system is linear. If not, then it is not linear. If it is not linear you could, in theory, derive a mathematical equation (that’s the tricky part I mentioned above) to equate your results to linear ones.
If the system is linear, go on to part 2.
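The "fairly close to one another" test from Part 1 can be made explicit with a small helper. This is a sketch under an assumption: the text does not say how close is close enough, so the 10% tolerance around the mean is an arbitrary choice that you should adjust to taste.

```python
def is_linear(c_per_w_values, tolerance=0.10):
    """Return True if every °C/W value is within `tolerance` of the mean.

    tolerance is a fraction of the mean (0.10 = 10%); this cut-off is an
    assumption, not a figure from the calibration method itself.
    """
    if not c_per_w_values:
        raise ValueError("need at least one °C/W value")
    mean = sum(c_per_w_values) / len(c_per_w_values)
    return all(abs(v - mean) <= tolerance * mean for v in c_per_w_values)


# Made-up °C/W columns for two hypothetical systems.
print(is_linear([0.50, 0.51, 0.52]))
print(is_linear([0.40, 0.50, 0.65]))
```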
Part 2: Offset
This is slightly more complicated and it involves more math.
First, the premise: Since wattage scales linearly with clock speed, so should
the change in temperature (dT). Luckily we have very accurate data on what the
clock speeds are, and we can use that to determine the accuracy of the temperatures.
Whatever the actual wattage is, it is irrelevant. (thus sparing the TDP vs MDP
Here comes the math….
Pick two different sets of test results from your data table (the wider the spread in clock speed, the more accurate the math). For convenience we will call them Low and High. From the data we need 4 numbers:
- Low clock speed = LS
- High clock speed = HS
- Low speed dT = LT
- High speed dT = HT
In a perfect world: HS / LS = HT / LT
In plain English, the above equation means the ratio of the high clock speed to the low clock speed is equal to the ratio of the high clock speed’s temperature change to the low clock speed’s temperature change.
What you’ll likely find is that the above equation doesn’t actually come out equal for your numbers. Do not despair, we really didn’t think it would. Since we know from Part 1 that the temperature results are linear, we know that HT and LT must both be offset by the same constant. Adding that to the equation we get:
HS / LS = (HT + c) / (LT + c) where c is the calibration offset. (Note that c can be positive or negative)
You can do the algebra to solve for c:
c = ((HS / LS) * LT – HT) / (1 – (HS / LS))
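For anyone who wants to see the intermediate steps, the algebra can be written out as follows. The symbol r is shorthand for HS / LS, introduced here only for convenience; the result requires the two clock speeds to differ, so that r is not 1.

```latex
\begin{align*}
r &= \frac{HS}{LS} \\
r\,(LT + c) &= HT + c \\
r\,LT - HT &= c - r\,c = c\,(1 - r) \\
c &= \frac{r\,LT - HT}{1 - r}
\end{align*}
```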
I created an XLS file to solve the above formula which you can download here for your convenience: Calculate C.
You can also just start plugging in trial values for c and see whether each one
brings the two sides of the equation closer to being equal.
Accuracy to 0.5° is really the very best you can hope to achieve; chances are, this is already beyond the resolution of the CPU diode and
motherboard circuitry. Eventually you will find a single value
for c that will make the equation true, or at least pretty close to true.
That is your motherboard/CPU temperature offset. Try repeating the offset calculation
with a couple of other data set pairs to confirm it. If your numbers are accurate,
it should come out the same each time (or pretty close).
Perhaps an example would help. From my testbed calibrations:
LS = 1300 MHz
HS = 2300 MHz
LT = 13.5°
HT = 25°
HS / LS = HT / LT
2300 / 1300 = 25 / 13.5
1.77 = 1.85 ……Hmm….not quite right. Let’s start working out "c"
1.77 = (25 + c) / (13.5 + c) …………..I’ll try -2° as a first guess
1.77 = (25 – 2) / (13.5 – 2)
1.77 = 2 ………………..drat, the difference is bigger now, I must have gone the wrong way. Let’s try +2°
1.77 = (25 + 2) / (13.5 + 2)
1.77 = 1.74 …………………….pretty close, but I went a bit past it, I’ll try +1.5°
1.77 = (25 + 1.5) / (13.5 + 1.5)
1.77 = 1.77……………… We have a winner! My CPU diode is off by 1.5°.
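As a cross-check on the trial-and-error above, the closed-form expression for c can be evaluated directly. The function name below is my own; the formula and the four input numbers are the ones from this example. It gives roughly 1.45°, which rounds to the same 1.5° at the 0.5° resolution discussed earlier.

```python
def calibration_offset(ls, hs, lt, ht):
    """Solve HS / LS = (HT + c) / (LT + c) for the offset c.

    ls, hs -- low and high clock speeds (same units, e.g. MHz)
    lt, ht -- temperature rise over ambient at the low and high speeds, °C
    """
    if hs == ls:
        raise ValueError("the two clock speeds must differ")
    ratio = hs / ls
    return (ratio * lt - ht) / (1 - ratio)


c = calibration_offset(ls=1300, hs=2300, lt=13.5, ht=25.0)
print(f"c = {c:.2f} °C")
```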
For my testing I set up a spreadsheet matrix to solve for c across all 15 sets of test data, then averaged c over all the pairs that produced valid answers. The average came out to something like 1.53°C. I settled on 1.5°C and entered this value into MBM to have it adjust the CPU readings automatically to ensure correct readings for my heatsink tests.
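The same pairwise averaging can be sketched in Python instead of a spreadsheet. Treat this as an approximation of the idea rather than a copy of the spreadsheet: the function names are invented, a pair is treated as "valid" simply when its two clock speeds differ (an assumption), and the middle data point is made up to round out the two real ones from the example.

```python
from itertools import combinations
from statistics import mean


def calibration_offset(ls, hs, lt, ht):
    """Solve HS / LS = (HT + c) / (LT + c) for c."""
    ratio = hs / ls
    return (ratio * lt - ht) / (1 - ratio)


def average_offset(tests):
    """Average c over every pair of tests with different clock speeds.

    tests -- list of (mhz, rise_over_ambient_c) tuples
    """
    offsets = []
    for a, b in combinations(tests, 2):
        if a[0] == b[0]:
            continue  # same speed: the ratio is 1 and c is undefined
        (ls, lt), (hs, ht) = sorted([a, b])  # order the pair low, high
        offsets.append(calibration_offset(ls, hs, lt, ht))
    if not offsets:
        raise ValueError("need at least two tests at different speeds")
    return mean(offsets)


# (MHz, rise in °C); the 1800 MHz row is a made-up illustration.
tests = [(1300, 13.5), (1800, 19.3), (2300, 25.0)]
avg_c = average_offset(tests)
print(f"average c = {avg_c:.2f} °C")
```

Pairs whose clock speeds are close together are the most sensitive to small reading errors, so you may also want to skip pairs below some minimum speed gap.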
My AMD XP CPU test rig proved to be off by 1.5°.
SPCR reviewers have tried this method on a couple of different heatsink testbeds, one Socket A, and the other P4, and it appears to work as described here. Anyone who takes the time to complete the process is welcome to post their results. Perhaps with enough results we can look for patterns in the inaccuracies among CPU and motherboard models and makes.
* * *