No good deed goes unpunished...
Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee
Gentlemen,
any assistance in diagnosing an unstable quiet system would be much appreciated. As any good cousin would, I am giving my current P4-2.4 quiet system to a needy relative for Xmas, but my new replacement P4-3.0C setup is bedeviling me with spontaneous reboots under load. I've never experienced such a difficult time getting a system stabilized.
Overview:
Case: Antec 3700QBE
MB: Albatron PX865PE Pro
CPU: Intel 3.0Ghz / 800FSB (oem)
Heatsink: Zalman 7000-AlCu w/Arctic Silver 3 paste.
RAM: pair of vanilla 512MB PC3200, also tried single and pair of Mushkin PC2700 for sanity
Power: Seasonic Super Tornado 300, also tried standard Antec 365 that came with case.
Video: Radeon 9000 with Catalyst 3.9 drivers
Other cards: Firewire PCI card
HD: IBM 80GB with standard 80-conductor ribbon cable
CDs: Pioneer DVD and LiteOn CDRW
OS: fresh XP SP1 installation ...
At first I attempted to run the Zalman at the lowest speed, but even Memtest86 would trigger a reboot after a while, so I cranked it back up to full speed (already not very quiet!).
So now at idle in XP, MBM indicates CPU temps of around 40-41. Is that already an indication of bad CPU / heatsink pairing?
Next up, Prime95 / Sandra stress tests cannot run for more than about 10 minutes before it reboots spontaneously (i.e. not an XP Bluescreen/bugcheck due to device drivers). The temp might get up to 48 by this point, but that doesn't seem outrageous, does it?
I've swapped power supplies, swapped RAM, but neither seemed to help. Maybe my power supplies are all underpowered, but there isn't that much in this rig.
At this point, my next thought is to pop out the CPU and try running my 2.4 Ghz in it, which would help point the finger either at the CPU or the Motherboard.
I've also been assuming it isn't a software problem, but perhaps that is a mistake. I've just never seen a machine reboot so randomly. One thing I've noticed is that NTFS hasn't appeared to have gotten corrupted due to this.
Thanks for any tips,
Mark
I think the CPU swap is a step in the right direction. You've already swapped out the PSU and RAM. Since you've got the one good system, use it for good donor parts and swap out one item at a time. I've usually been able to identify the problem part that way. Heck, one time it was the keyboard causing lock-ups.
Of course, if you're unlucky it'll be a combination of things.
I have been having a similar problem, although with an ASUS NForce2 and AMD. My problem is that the BIOS temperature reads a different value than the ASUS tool in WinXP. The ASUS tool is significantly cooler, peaking at 42 when under 100% load for an extended period of time. The BIOS will turn the computer off when the ASUS tool is at about 46. A quick check in the BIOS says that the CPU is at 77 degrees! I have a hard time believing this because the heatsink is only warm to the touch. You might want to check your BIOS temps.
Destron
Believe it or not, problems like you're describing can often be caused by a faulty Firewire PCI card. Try taking out that card and see if the problem persists.
Other possible culprits:
Try the "failsafe" BIOS settings and see if the problem still happens. If so, then I'd bet (at this point in the troubleshoot) that you have a short occurring between your motherboard and the case somewhere. Another possibility is a poorly-seated/cooled processor (or simply a bad processor).
If the original RAM was bad when you installed the OS, then the critical OS files can be corrupt, causing spontaneous reboot issues even when the RAM (and other hardware) gets replaced.
Turn off "automatically reboot" in the Startup and Recovery options and see if anything in the way of evidence shows up at next system crash.
Good luck man.
-
- Posts: 226
- Joined: Sat Sep 06, 2003 5:59 am
- Location: Finland
-
- *Lifetime Patron*
- Posts: 142
- Joined: Thu May 01, 2003 10:46 pm
- Location: Philadelphia, PA
- Contact:
I experienced a similar situation with my HTPC. The instability in my system was isolated to a Seasonic 300W Super Tornado (replacing this PSU with a 350W Fortron PSU w/120mm fan resolved the problem...and the Fortron is also fairly quiet).
EDIT: Well, after reading the rest of your post, I realize that you have already tried changing PSUs...sorry for the knee-jerk response...I suppose that's what happens when you're running on 6 hours of sleep in two days...oh well...
-
- Posts: 1283
- Joined: Wed Sep 03, 2003 1:35 am
- Location: Sweden, Linkoping
Have you checked your voltages in MBM to see if you have any fluctuations?
This might give you an indication if you are close to maxing out on the PSU. Especially check the 12V line.
Doing this might rule out any PSU issues.
Just to rule out all software issues in one single go I suggest that you download and burn a Demo-Linux CD. With that CD you can boot on the CD and it will not use your harddrive at all.
If you can't provoke any crash under Linux you can start looking into software problems in windows, possibly reinstalling windows etc.
-
- SPCR Reviewer
- Posts: 8636
- Joined: Sat Nov 23, 2002 6:33 am
- Location: Sunny SoCal
mh wrote: RAM: pair of vanilla 512MB PC3200, also tried single and pair of Mushkin PC2700 for sanity
At first I attempted to run the Zalman at the lowest speed, but even Memtest86 would trigger a reboot after a while, so I cranked it back up to full speed (already not very quiet!).
This is a bad sign right out of the box. If you're not stable in Memtest86 you're wasting your time doing anything else.
Take your board out of the case and set it on top of the mobo box or a piece of cardboard. Leave everything unplugged except for CPU, RAM, vidcard, FDD and keyboard/mouse. Reset the CMOS, but check the memory timings afterwards: a lot of 875/865 boards have trouble setting the timings by SPD. I'm already leery of your "plain vanilla RAM", so you'd better make sure the board is setting it at timings that will work. I don't know what it's rated at, but you might try 2.5-3-3-7 or 3-3-3-7 to start. Then run Memtest86 for a while. Assuming your memory can run some loops (maybe at least 10+) without errors (only zero errors is acceptable), I'd probably do a fresh OS install and start all over again. If the RAM's bad, get yourself some better RAM and try again. You need to make sure you have a good foundation before you even start.
The thing is, rebooting in Memtest86 is real weird. This kind of points to a bad PSU or MoBo to me.
mh wrote: So now at idle in XP, MBM indicates CPU temps of around 40-41. Is that already an indication of bad CPU / heatsink pairing?
That's certainly higher than normal. With that setup (and the fan running at 12V) I'd expect that you'd be at least 10°C cooler. What's your ambient temp?
mh wrote:Next up, Prime95 / Sandra stress tests cannot run for more than about 10 minutes before it reboots spontaneously (i.e. not an XP Bluescreen/bugcheck due to device drivers). The temp might get up to 48 by this point, but that doesn't seem outrageous, does it?
First you gotta get Memtest86 to run without errors before you start worrying about this stuff. 48°C isn't bad at all, it kinda sounds like your CPU's not running at full load.
mh wrote: I've swapped power supplies, swapped RAM, but neither seemed to help. Maybe my power supplies are all underpowered, but there isn't that much in this rig.
Your PSUs aren't underpowered at all. Max load on your setup is probably around 175W. Using MBM, keep an eye on your rails when you put your CPU under load. They shouldn't fluctuate much.
mh wrote: At this point, my next thought is to pop out the CPU and try running my 2.4 Ghz in it, which would help point the finger either at the CPU or the Motherboard.
Certainly couldn't hurt, but look at your heatsink mounting while you're at it. Make sure both surfaces are squeaky clean before applying TIM and make sure you screw both mounting screws all the way down until they bottom.
mh wrote: I've also been assuming it isn't a software problem, but perhaps that is a mistake. I've just never seen a machine reboot so randomly. One thing I've noticed is that NTFS hasn't appeared to have gotten corrupted due to this.
How do you know? Bad memory can make a real mess of your OS, even though NTFS is especially bullet-proof.
mh wrote: Video: Radeon 9000 with Catalyst 3.9 drivers
If you don't do a fresh install you might try one of those ATI driver removal apps (go to Rage3D forums and look at their stickies and FAQs) to get rid of those Cat 3.9's and go with the 3.7's instead. If you do a fresh install, pass on the 3.9's and go with the 3.7's.
Thanks for the tips so far!
You guys are great! Thanks for the help so far.
I had prepared a longer response, but it took me so long to type it in that this BBS posting page timed me out so I lost it -- always compose in Notepad and cut/paste!
Anyway, I will be trying out each of your tips before giving up on the whole thing and returning the components.
To follow-up on some of the missing data:
#1. Using the Antec 385 PS & MBM, the readings for CPUTemp/Core0/Core1/3.3v/5.0v/12v/-12v/-5v were:
Idle:
35C/1.49v/1.52v/3.28v/5.05v/12.16v/-12.68v/-5.85v
After 1 minute of Prime95:
43C/1.34v/1.41v/3.26v/5.03v/11.98v/-12.93v/-5.95v
After 5 minutes of Prime95:
46C/1.30v/1.34v/3.28v/5.05v/12.04v/-13.01v/-5.90v
This was looking too stable, so right after taking these measurements I launched Sandra to run the burn-in wizard. It spontaneously rebooted before the Sandra splash screen even showed up!
#2. The BIOS settings for timing and voltages have always been at the failsafe/default levels. The memory timing is via "By SPD", and for the Mushkin "value" PC2700 memory in dual channel mode, the BIOS determined the timing should be 2.5-6-3-3. My only somewhat reliable recollection of the DDR400 settings was 3-8-3-3.
The default CPU voltage was also shown as 1.525v
The BIOS Health screen showed the voltages as:
Vcore: 1.52
3.3v: 3.29
5v: 5.05
12v: 11.97-12.09
-12v: -12.77
VBAT(v): 3.15
5VSB(v): 4.99
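As a rough sanity check on the BIOS Health figures above, the readings can be compared against the usual PSU regulation windows. This is only a sketch: the tolerances (±5% on the positive rails, ±10% on -12v) are an assumption taken from the ATX12V design guide, not something stated in this thread, and the helper name `rail_in_spec` is made up for illustration. Motherboard sensors are approximate, so a pass here does not prove the PSU is fine.

```python
# Nominal rail voltages and allowed deviation (fraction of nominal).
# ASSUMPTION: tolerances follow the ATX12V design guide; they are not
# taken from the posts in this thread.
ATX_TOLERANCE = {
    "3.3v": (3.3, 0.05),
    "5v": (5.0, 0.05),
    "12v": (12.0, 0.05),
    "-12v": (-12.0, 0.10),
    "5VSB": (5.0, 0.05),
}


def rail_in_spec(rail, measured):
    """Return True if the measured voltage is within tolerance for the rail."""
    nominal, tolerance = ATX_TOLERANCE[rail]
    return abs(measured - nominal) <= abs(nominal) * tolerance


# BIOS Health screen values from the post (12v appears twice because the
# post gives a low and a high reading for it).
bios_readings = [
    ("3.3v", 3.29),
    ("5v", 5.05),
    ("12v", 11.97),
    ("12v", 12.09),
    ("-12v", -12.77),
    ("5VSB", 4.99),
]

for rail, volts in bios_readings:
    status = "ok" if rail_in_spec(rail, volts) else "OUT OF SPEC"
    print(f"{rail:>5}: {volts:7.2f} V  {status}")
```

Under those assumed tolerances every rail reported by the BIOS falls inside its window, so these figures on their own do not point at the power supply.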
Another question for the gurus:
1. I really like the bootable CDs like Memtest86, although I wish it could show temps & voltages too. But as for the "live CD" Linuxes, I am not familiar with any burn-in software under Linux. Does the Demo-Linux CD or other distros on www.distrowatch.com emphasize stress testing? Most seem to talk about recovery utilities.
Thanks,
Mark
-
- SPCR Reviewer
- Posts: 8636
- Joined: Sat Nov 23, 2002 6:33 am
- Location: Sunny SoCal
Re: Thanks for the tips so far!
mh wrote: Idle:
35C/1.49v/1.52v/3.28v/5.05v/12.16v/-12.68v/-5.85v
After 1 minute of Prime95:
43C/1.34v/1.41v/3.26v/5.03v/11.98v/-12.93v/-5.95v
After 5 minutes of Prime95:
46C/1.30v/1.34v/3.28v/5.05v/12.04v/-13.01v/-5.90v
Pretty big dip in your Vcore. I've never seen anything like that. Something's wrong, I'd say almost for sure that's your problem right there. Big Vcore sag under load ≠ stability. That's a drop of nearly 0.2V, roughly 13%!
You do have that square Aux 12V connector plugged into your mobo, right?
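For reference, the size of the sag can be worked out directly from the MBM readings quoted above. This is a minimal sketch: the Core0/Core1 values are copied from the post, while the function name `sag_percent` and the choice to measure the drop relative to the idle reading are assumptions made for illustration.

```python
# MBM readings quoted in the post: (label, Core0 volts, Core1 volts)
samples = [
    ("idle", 1.49, 1.52),
    ("1 min Prime95", 1.34, 1.41),
    ("5 min Prime95", 1.30, 1.34),
]


def sag_percent(idle_volts, load_volts):
    """Percentage drop from the idle reading to the loaded reading."""
    return (idle_volts - load_volts) / idle_volts * 100.0


_, idle_core0, idle_core1 = samples[0]
for label, core0, core1 in samples[1:]:
    print(
        f"{label}: Core0 down {idle_core0 - core0:.2f} V "
        f"({sag_percent(idle_core0, core0):.1f}%), "
        f"Core1 down {idle_core1 - core1:.2f} V "
        f"({sag_percent(idle_core1, core1):.1f}%)"
    )
```

After five minutes of Prime95, Core0 is down about 0.19 V from idle, which is a little under 13% of the idle reading.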
Vcore dropping and Hyperthreading
Ralf,
yes, the dropping Vcore has been consistent with both powersupplies, etc. The 4pin power connector has always been attached. I just guessed it was something intentional, where the Vcore dropped to compensate for rising CPU temps.
And for reference, a few other test results:
1. removing the Firewire card had no effect.
2. memtest86 was very stable for more than 10 passes
3. bumping the DDR RAM voltages up by 0.2 volts did not help
4. reverting to default BIOS settings allowed me to run Prime95 + Sandra burn-in for quite a while.
It turns out that the default BIOS settings disable Hyperthreading. When I turn on Hyperthreading, the machine can still run Prime95 indefinitely, but the moment I launch another application (not just Sandra), the system reboots.
That almost sounds like a software problem again, but with HT turned off and both Prime95 + Sandra running, the Vcore stays up around 1.47 instead of dropping off to the low to mid 1.30v.
That still doesn't explain whether the CPU is faulty, or the motherboard, or both, but my guess is that if I swapped in my non-HT capable P4-2.4Ghz, it would be stable as a rock just like the 3.0 appears to be when HT is off.
Of course, one of the big appeals of getting the P4C models is HT.
Does this ring any bells with people?
Thanks,
Mark
-
- Site Admin
- Posts: 12285
- Joined: Sun Aug 11, 2002 3:26 pm
- Location: Vancouver, BC, Canada
- Contact:
It would be a shocker if the CPU was at fault. I've never heard of a CPU that was partly working -- it's always been works 100% or is dead. If the Vcore drop happens with both PSUs, it really looks like the motherboard to me. Motherboards often have return rates in the double-digit percentages because of the complexity (and user error) -- about the highest failure rate of all PC components. Try another motherboard if you can.
-
- Posts: 226
- Joined: Sat Sep 06, 2003 5:59 am
- Location: Finland
Sounds like the CPU voltage regulator on the motherboard is faulty or suffering from heat problems. As Ralf pointed out, a 0.2V drop in vcore is not normal even under maximum load. Intel specs say that a drop of 0.1V is typical (with good mobos you see maybe 0.05V) and that the minimum acceptable voltage for your CPU is 1.315, so the regulator circuit is definitely scraping the lower end of the range. You may be able to stabilize your system by upping the vcore by a notch or two or significantly underclocking the CPU, but my advice would be to swap the mobo.
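The comparison described above can be sketched in a few lines. The 1.315 V minimum and the 0.1 V typical drop are the figures as quoted in the post (not independently checked here), the load readings are Mark's MBM Core0 values from earlier in the thread, and the helper `assess` is a hypothetical name used only for this example.

```python
# Figures as quoted in the post (not independently verified here).
MIN_VCORE = 1.315      # minimum acceptable voltage, per the post
TYPICAL_DROOP = 0.10   # typical idle-to-load drop, per the post

IDLE_CORE0 = 1.49      # MBM idle reading from earlier in the thread

# MBM Core0 readings under load from earlier in the thread.
load_readings = {
    "1 min Prime95": 1.34,
    "5 min Prime95": 1.30,
}


def assess(load_volts):
    """Return (margin above the minimum, droop beyond the typical figure)."""
    margin = load_volts - MIN_VCORE
    excess_droop = (IDLE_CORE0 - load_volts) - TYPICAL_DROOP
    return margin, excess_droop


for label, volts in load_readings.items():
    margin, excess = assess(volts)
    verdict = "below minimum" if margin < 0 else "above minimum"
    print(
        f"{label}: {volts:.2f} V, margin {margin:+.3f} V ({verdict}), "
        f"droop exceeds typical by {excess:+.2f} V"
    )
```

On these numbers the five-minute reading sits slightly under the quoted minimum, and the one-minute reading clears it by only a few hundredths of a volt. Software sensor readings are approximate, so treat this as an indication rather than a measurement.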
-
- SPCR Reviewer
- Posts: 8636
- Joined: Sat Nov 23, 2002 6:33 am
- Location: Sunny SoCal
Let me reiterate: the Vcore drop is a very bad thing.
Just for grins, up your Vcore (in the BIOS) to 1.60-1.65V and run your Windows-based stress tests again. Keep an eye on your Vcore when you do this. See how low the Vcore dips and see if you can keep running this time. I'm betting that if you can keep your Vcore over 1.45V you'll continue to run fine. If it's stable under load with the increased Vcore and both PSUs give you the same results I'd RMA the board.