No good deed goes unpunished...

The forum for non-component-related silent pc discussions.

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

mh
Patron of SPCR
Posts: 11
Joined: Wed Jan 01, 2003 11:18 pm
Location: Palo Alto, California, USA

No good deed goes unpunished...

Post by mh » Tue Dec 02, 2003 7:26 pm

Gentlemen,
any assistance in diagnosing an unstable quiet system would be much appreciated. As any good cousin would, I am giving my current P4-2.4 quiet system to a needy relative for Xmas, but my new replacement P4-3.0C setup is bedeviling me with spontaneous reboots under load. I've never experienced such a difficult time getting a system stabilized.

Overview:
Case: Antec 3700BQE
MB: Albatron PX865PE Pro
CPU: Intel 3.0GHz / 800FSB (OEM)
Heatsink: Zalman 7000-AlCu w/Arctic Silver 3 paste.
RAM: pair of vanilla 512MB PC3200, also tried single and pair of Mushkin PC2700 for sanity
Power: Seasonic Super Tornado 300, also tried standard Antec 365 that came with case.
Video: Radeon 9000 with Catalyst 3.9 drivers
Other cards: Firewire PCI card
HD: IBM 80GB with standard 80-conductor ribbon cable
CDs: Pioneer DVD and LiteOn CDRW
OS: fresh XP SP1 installation ...

At first I attempted to run the Zalman at the lowest speed, but even Memtest86 would trigger a reboot after a while, so I cranked it back up to full speed (already not very quiet!).

So now at idle in XP, MBM indicates CPU temps of around 40-41. Is that already an indication of bad CPU / heatsink pairing?

Next up, Prime95 / Sandra stress tests cannot run for more than about 10 minutes before it reboots spontaneously (i.e. not an XP Bluescreen/bugcheck due to device drivers). The temp might get up to 48 by this point, but that doesn't seem outrageous, does it?

I've swapped power supplies, swapped RAM, but neither seemed to help. Maybe my power supplies are all underpowered, but there isn't that much in this rig.

At this point, my next thought is to pop out the CPU and try running my 2.4 GHz in its place, which would help point the finger at either the CPU or the motherboard.

I've also been assuming it isn't a software problem, but perhaps that is a mistake. I've just never seen a machine reboot so randomly. One thing I've noticed is that NTFS doesn't appear to have been corrupted by any of this.

Thanks for any tips,
Mark

Zyzzyx
Friend of SPCR
Posts: 1063
Joined: Mon Dec 23, 2002 12:55 pm
Location: Richland, WA

Post by Zyzzyx » Tue Dec 02, 2003 7:34 pm

I think the CPU swap is a step in the right direction. You've already swapped out the PSU and RAM. Since you've got the one good system, use it for good donor parts and swap out one item at a time. I've usually been able to identify the problem part that way. Heck, one time it was the keyboard causing lock-ups.

Of course, if you're unlucky it'll be a combination of things. :(

Destron
Posts: 27
Joined: Wed Jan 08, 2003 9:17 pm
Location: Fukui, Japan

Post by Destron » Tue Dec 02, 2003 8:14 pm

I have been having a similar problem, although with an ASUS NForce2 and AMD. My problem is that the BIOS temperature reads a different value than the ASUS tool in WinXP. The ASUS tool is significantly cooler, peaking at 42 when under 100% load for an extended period of time. The BIOS will turn the computer off when the ASUS tool is at about 46. A quick check in the BIOS says that the CPU is at 77 degrees! I have a hard time believing this because the heatsink is only warm to the touch. You might want to check your BIOS temps.

Destron

bondiablo
Posts: 155
Joined: Sat May 17, 2003 7:14 pm
Location: Milwaukee, WI

Post by bondiablo » Tue Dec 02, 2003 9:37 pm

What memory settings are you using in the BIOS? DDR:CPU ratio, DDR voltage, CAS latency, etc.

al bundy
Posts: 667
Joined: Thu Feb 20, 2003 5:38 pm
Location: Chicago, IL

Post by al bundy » Tue Dec 02, 2003 9:52 pm

Believe it or not, problems like you're describing can often be caused by a faulty Firewire PCI card. Try taking out that card and see if the problem persists.

Other possible culprits:

Try the "failsafe" BIOS settings and see if the problem still happens. If so, then I'd bet (at this point in the troubleshoot) that you have a short occurring between your motherboard and the case somewhere. Another possibility is a poorly-seated/cooled processor (or simply a bad processor).

If the original RAM was bad when you installed the OS, then the critical OS files can be corrupt, causing spontaneous reboot issues even when the RAM (and other hardware) gets replaced.

Turn off "automatically reboot" in the Startup and Recovery options and see if anything in the way of evidence shows up at next system crash.

Good luck man.

8)

Inexplicable
Posts: 226
Joined: Sat Sep 06, 2003 5:59 am
Location: Finland

Post by Inexplicable » Tue Dec 02, 2003 10:33 pm

Try upping the RAM voltage to 2.8V. The 865PE chipset can be really picky about RAM, especially in dual channel mode.

LushMD
*Lifetime Patron*
Posts: 142
Joined: Thu May 01, 2003 10:46 pm
Location: Philadelphia, PA

Post by LushMD » Wed Dec 03, 2003 1:27 am

I experienced a similar situation with my HTPC. The instability in my system was isolated to a Seasonic 300W Super Tornado (replacing this PSU with a 350W Fortron PSU w/120mm fan resolved the problem...and the Fortron is also fairly quiet).

EDIT: Well, after reading the rest of your post, I realize that you have already tried changing PSUs...sorry for the knee-jerk response...I suppose that's what happens when you're running on 6 hours of sleep in two days...oh well...

silvervarg
Posts: 1283
Joined: Wed Sep 03, 2003 1:35 am
Location: Sweden, Linkoping

Post by silvervarg » Wed Dec 03, 2003 1:30 am

Have you checked your voltages in MBM to see if you have any fluctuations?
This might give you an indication of whether you are close to maxing out the PSU. Especially check the 12V line.
Doing this might rule out any PSU issues.

Just to rule out all software issues in one go, I suggest that you download and burn a Demo-Linux CD. With that CD you can boot from the CD and it will not use your hard drive at all.
If you can't provoke any crash under Linux, you can start looking into software problems in Windows, possibly reinstalling Windows etc.

Ralf Hutter
SPCR Reviewer
Posts: 8636
Joined: Sat Nov 23, 2002 6:33 am
Location: Sunny SoCal

Post by Ralf Hutter » Wed Dec 03, 2003 5:55 am

mh wrote: RAM: pair of vanilla 512MB PC3200, also tried single and pair of Mushkin PC2700 for sanity

At first I attempted to run the Zalman at the lowest speed, but even Memtest86 would trigger a reboot after a while, so I cranked it back up to full speed (already not very quiet!).
This is a bad sign right out of the box. If you're not stable in Memtest86 you're wasting your time doing anything else.

Take your board out of the case and set it on top of the mobo box or a piece of cardboard. Leave everything unplugged except for CPU, RAM, vidcard, FDD and keyboard/mouse. Reset the CMOS, but check the memory timings afterwards: a lot of 875/865 boards have trouble setting the timings by SPD. I'm already leery of your "plain vanilla RAM", so you'd better make sure the board is setting it at timings that will work. I don't know what it's rated at, but you might try 2.5-3-3-7 or 3-3-3-7 to start. Then run Memtest86 for a while. Assuming your memory can run some loops (maybe 10 or more) without errors (zero errors is the only acceptable result), I'd probably do a fresh OS install and start all over again. If the RAM's bad, get yourself some better RAM and try again. You need to make sure you have a good foundation before you even start.

The thing is, rebooting in Memtest86 is real weird. This kind of points to a bad PSU or MoBo to me.

mh wrote:So now at idle in XP, MBM indicates CPU temps of around 40-41. Is that already an indication of bad CPU / heatsink pairing?
That's certainly higher than normal. With that setup (and the fan running at 12V) I'd expect that you'd be at least 10°C cooler. What's your ambient temp?
mh wrote:Next up, Prime95 / Sandra stress tests cannot run for more than about 10 minutes before it reboots spontaneously (i.e. not an XP Bluescreen/bugcheck due to device drivers). The temp might get up to 48 by this point, but that doesn't seem outrageous, does it?


First you gotta get Memtest86 to run without errors before you start worrying about this stuff. 48°C isn't bad at all, it kinda sounds like your CPU's not running at full load.
mh wrote:I've swapped power supplies, swapped RAM, but neither seemed to help. Maybe my power supplies are all underpowered, but there isn't that much in this rig.
Your PSUs aren't underpowered at all. Max load on your setup is probably around 175W. Using MBM, keep an eye on your rails when you put your CPU under load. They shouldn't fluctuate much.
mh wrote:At this point, my next thought is to pop out the CPU and try running my 2.4 Ghz in it, which would help point the finger either at the CPU or the Motherboard.
Certainly couldn't hurt, but look at your heatsink mounting while you're at it. Make sure both surfaces are squeaky clean before applying TIM, and make sure you screw both mounting screws all the way down until they bottom out.
mh wrote:I've also been assuming it isn't a software problem, but perhaps that is a mistake. I've just never seen a machine reboot so randomly. One thing I've noticed is that NTFS hasn't appeared to have gotten corrupted due to this.
How do you know? Bad memory can make a real mess of your OS, even though NTFS is especially bullet-proof.
mh wrote: Video: Radeon 9000 with Catalyst 3.9 drivers
If you don't do a fresh install you might try one of those ATI driver removal apps (go to Rage3D forums and look at their stickies and FAQs) to get rid of those Cat 3.9's and go with the 3.7's instead. If you do a fresh install, pass on the 3.9's and go with the 3.7's.

mh
Patron of SPCR
Posts: 11
Joined: Wed Jan 01, 2003 11:18 pm
Location: Palo Alto, California, USA

Thanks for the tips so far!

Post by mh » Wed Dec 03, 2003 8:17 am

You guys are great! Thanks for the help so far.

I had prepared a longer response, but it took me so long to type it in that this BBS posting page timed me out so I lost it -- always compose in Notepad and cut/paste! :-)

Anyway, I will be trying out each of your tips before giving up on the whole thing and returning the components.

To follow-up on some of the missing data:

1. Using the Antec 385 PS & MBM, the readings for CPUTemp/Core0/Core1/3.3v/5.0v/12v/-12v/-5v were:

Idle:
35C/1.49v/1.52v/3.28v/5.05v/12.16v/-12.68v/-5.85v

After 1 minute of Prime95:
43C/1.34v/1.41v/3.26v/5.03v/11.98v/-12.93v/-5.95v

After 5 minutes of Prime95:
46C/1.30v/1.34v/3.28v/5.05v/12.04v/-13.01v/-5.90v

This was looking too stable, so right after taking these measurements I launched Sandra to run the burn-in wizard. It spontaneously rebooted before the Sandra splash screen even showed up!

2. The BIOS settings for timing and voltages have always been at the failsafe/default levels. The memory timing is via "By SPD", and for the Mushkin "value" PC2700 memory in dual channel mode, the BIOS determined the timing should be 2.5-6-3-3. My only somewhat reliable recollection of the DDR400 settings was 3-8-3-3.

The default CPU voltage was also shown as 1.525v

The BIOS Health screen showed the voltages as:
Vcore: 1.52
3.3v: 3.29
5v: 5.05
12v: 11.97-12.09
-12v: -12.77
VBAT(v): 3.15
5VSB(v): 4.99
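
The MBM readings in item 1 can be checked against nominal rail values with a few lines of Python. This is only a minimal sketch: the ±5% limit is an assumption (the tolerance commonly quoted for the positive ATX rails, not something stated in this thread), the function names are made up for illustration, and the negative rails are left out to keep it short.

```python
# Nominal voltages for the positive rails.
NOMINAL = {"3.3v": 3.3, "5v": 5.0, "12v": 12.0}

# Assumption: +/-5% is the tolerance commonly quoted for positive ATX rails.
TOLERANCE_PCT = 5.0

# Positive-rail readings copied from the MBM figures in the post.
READINGS = {
    "idle":         {"3.3v": 3.28, "5v": 5.05, "12v": 12.16},
    "prime95_1min": {"3.3v": 3.26, "5v": 5.03, "12v": 11.98},
    "prime95_5min": {"3.3v": 3.28, "5v": 5.05, "12v": 12.04},
}


def deviation_pct(rail, measured):
    """Signed deviation of a measured voltage from nominal, in percent."""
    nominal = NOMINAL[rail]
    return (measured - nominal) / nominal * 100.0


def worst_deviation(readings):
    """For each rail, keep the reading that strays furthest from nominal."""
    worst = {}
    for sample in readings.values():
        for rail, volts in sample.items():
            dev = deviation_pct(rail, volts)
            if rail not in worst or abs(dev) > abs(worst[rail]):
                worst[rail] = dev
    return worst


worst = worst_deviation(READINGS)
for rail, dev in worst.items():
    status = "ok" if abs(dev) <= TOLERANCE_PCT else "out of range"
    print(f"{rail}: worst deviation {dev:+.2f}% ({status})")
```

With these figures every positive rail stays well inside the assumed limit, so whatever is sagging, it does not show up on the main PSU rails as MBM reports them.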


Another question for the gurus:

1. I really like the bootable CDs like Memtest86, although I wish it could show temps & voltages too. But as for the "live CD" Linuxes, I am not familiar with any burn-in software under Linux. Does the Demo-Linux CD or other distros on www.distrowatch.com emphasize stress testing? Most seem to talk about recovery utilities.

Thanks,
Mark

Ralf Hutter
SPCR Reviewer
Posts: 8636
Joined: Sat Nov 23, 2002 6:33 am
Location: Sunny SoCal

Re: Thanks for the tips so far!

Post by Ralf Hutter » Wed Dec 03, 2003 1:11 pm

mh wrote: Idle:
35C/1.49v/1.52v/3.28v/5.05v/12.16v/-12.68v/-5.85v

After 1 minute of Prime95:
43C/1.34v/1.41v/3.26v/5.03v/11.98v/-12.93v/-5.95v

After 5 minutes of Prime95:
46C/1.30v/1.34v/3.28v/5.05v/12.04v/-13.01v/-5.90v
Pretty big dip in your Vcore. I've never seen anything like that. Something's wrong; I'd say almost for sure that's your problem right there. Big Vcore sag under load ≠ stability. That's a drop of almost 0.2V, roughly 13%!
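
The size of the sag can be worked out directly from the quoted idle and 5-minute Prime95 readings. A minimal Python sketch follows; the helper name `percent_drop` is made up for illustration.

```python
def percent_drop(idle_v, load_v):
    """Voltage sag from idle to load, as a percentage of the idle value."""
    return (idle_v - load_v) / idle_v * 100.0


# Core0 and Core1 values from the quoted MBM readings (idle vs. 5 min of Prime95).
core0_drop = percent_drop(1.49, 1.30)
core1_drop = percent_drop(1.52, 1.34)

print(f"Core0: {1.49 - 1.30:.2f}V sag, {core0_drop:.1f}% of idle")
print(f"Core1: {1.52 - 1.34:.2f}V sag, {core1_drop:.1f}% of idle")
```

Both sensors show a sag of a little under 0.2V, which comes to about 12-13% of the idle voltage.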

You do have that square Aux 12V connector plugged into your mobo, right?

mh
Patron of SPCR
Posts: 11
Joined: Wed Jan 01, 2003 11:18 pm
Location: Palo Alto, California, USA

Vcore dropping and Hyperthreading

Post by mh » Wed Dec 03, 2003 7:18 pm

Ralf,
yes, the dropping Vcore has been consistent with both power supplies, etc. The 4-pin power connector has always been attached. I just guessed it was something intentional, where the Vcore dropped to compensate for rising CPU temps.

And for reference, a few other test results:

1. removing the Firewire card had no effect.
2. memtest86 was very stable for more than 10 passes
3. bumping the DDR RAM voltages up by 0.2 volts did not help
4. reverting to default BIOS settings allowed me to run Prime95 + Sandra burn-in for quite a while.

It turns out that the default BIOS settings disable Hyperthreading. When I turn on Hyperthreading, the machine can still run Prime95 indefinitely, but the moment I launch another application (not just Sandra), the system reboots.

That almost sounds like a software problem again, but with HT turned off and both Prime95 + Sandra running, the Vcore stays up around 1.47v instead of dropping off to the low-to-mid 1.30s.

That still doesn't explain whether the CPU is faulty, or the motherboard, or both, but my guess is that if I swapped in my non-HT capable P4-2.4GHz, it would be stable as a rock just like the 3.0 appears to be when HT is off.

Of course, one of the big appeals of getting the P4C models is HT.

Does this ring any bells with people?

Thanks,
Mark

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada

Post by MikeC » Wed Dec 03, 2003 7:31 pm

It would be a shocker if the CPU was at fault. I've never heard of a CPU that was partly working; it's always either worked 100% or been dead. If the Vcore drop happens with both PSUs, it really looks like the motherboard to me. Motherboards often have return rates in the double-digit percentages because of the complexity (and user error), about the highest failure rate of all PC components. Try another motherboard if you can.

Inexplicable
Posts: 226
Joined: Sat Sep 06, 2003 5:59 am
Location: Finland

Post by Inexplicable » Thu Dec 04, 2003 1:00 am

Sounds like the CPU voltage regulator on the motherboard is faulty or suffering from heat problems. As Ralf pointed out, a 0.2V drop in Vcore is not normal even under maximum load. Intel specs say that a drop of 0.1V is typical (with good mobos you see maybe 0.05V) and that the minimum acceptable voltage for your CPU is 1.315V, so the regulator circuit is definitely scraping the lower end of the range. You may be able to stabilize your system by upping the Vcore by a notch or two or significantly underclocking the CPU, but my advice would be to swap the mobo.
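
To put the two figures in this post (0.1V typical drop, 1.315V minimum) next to Mark's readings, here is a minimal Python sketch. The thresholds are taken from the post as quoted, not checked against an Intel datasheet, and the function name `assess_vcore` is hypothetical.

```python
VCORE_MIN = 1.315     # minimum acceptable Vcore, as quoted in the post above
TYPICAL_DROOP = 0.10  # typical idle-to-load drop, as quoted in the post above


def assess_vcore(idle_v, load_v):
    """Compare an idle/load Vcore pair against the quoted thresholds."""
    droop = idle_v - load_v
    return {
        "droop": round(droop, 3),
        "excess_droop": droop > TYPICAL_DROOP,
        "below_minimum": load_v < VCORE_MIN,
        "margin": round(load_v - VCORE_MIN, 3),  # negative means under the minimum
    }


# Idle vs. 5-minute Prime95 readings from earlier in the thread.
core0 = assess_vcore(1.49, 1.30)
core1 = assess_vcore(1.52, 1.34)

for name, result in (("Core0", core0), ("Core1", core1)):
    print(name, result)
```

On these numbers the Core0 sensor dips just below the quoted minimum while Core1 sits only a few hundredths of a volt above it, which matches the "scraping the lower end" description.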

Ralf Hutter
SPCR Reviewer
Posts: 8636
Joined: Sat Nov 23, 2002 6:33 am
Location: Sunny SoCal

Post by Ralf Hutter » Thu Dec 04, 2003 5:24 am

Let me reiterate: the Vcore drop is a very bad thing.

Just for grins, up your Vcore (in the BIOS) to 1.60-1.65V and run your Windows-based stress tests again. Keep an eye on your Vcore when you do this. See how low the Vcore dips and see if you can keep running this time. I'm betting that if you can keep your Vcore over 1.45V you'll continue to run fine. If it's stable under load with the increased Vcore and both PSUs give you the same results, I'd RMA the board.
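
The test plan above boils down to a small decision rule, sketched here in Python. Everything in it is illustrative: the `StressRun` record, the `recommend` function and the sample numbers are hypothetical, and only the 1.45V figure and the "stable on both PSUs means RMA" rule come from the post.

```python
from dataclasses import dataclass

VCORE_FLOOR = 1.45  # Vcore level the post expects to be enough for stability


@dataclass
class StressRun:
    psu: str                     # which power supply was installed
    min_vcore_under_load: float  # lowest Vcore seen during the stress test
    stable: bool                 # True if the run finished without a reboot


def recommend(runs):
    """Turn raised-Vcore stress results into the next step suggested above."""
    if len({run.psu for run in runs}) < 2:
        return "repeat the test with the second PSU"
    if all(run.stable for run in runs):
        return "RMA the board"
    return "inconclusive: keep troubleshooting"


# Hypothetical results, purely to show the rule in action.
example_runs = [
    StressRun(psu="Seasonic", min_vcore_under_load=1.47, stable=True),
    StressRun(psu="Antec", min_vcore_under_load=1.46, stable=True),
]

for run in example_runs:
    held = run.min_vcore_under_load > VCORE_FLOOR
    print(f"{run.psu}: Vcore held above {VCORE_FLOOR}V: {held}, stable: {run.stable}")
print(recommend(example_runs))
```

The Vcore floor is only reported, not used in the decision, because the post treats it as the expected reason for stability rather than as a separate pass/fail test.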
