Memory bandwidth tests... any real differences (part 2)

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Post by graysky » Sat May 10, 2008 2:41 pm

About 7 months ago I posted data comparing two memory dividers (1:1 and 3:5 @ 333 MHz) on my then Q6600/P965 based system and concluded that for the 67 % increase in memory bandwidth, the marginal gains in actual performance weren't worth the extra voltage/heat.

Since then I've upgraded my hardware to an X3360/P35 setup and wanted to revisit this issue. Again, I compared two dividers, this time at each of two FSB speeds: one pair running at 8.5x333=2.83 GHz and another at 8.5x400=3.40 GHz:

333 MHz FSB:
1:1 a.k.a. PC2-5300 (667 MHz)
5:8 a.k.a. PC2-8500 (1,067 MHz)

400 MHz FSB:
1:1 a.k.a. PC2-6400 (800 MHz)
4:5 a.k.a. PC2-8000 (1,000 MHz)

I figured there would be a much greater difference in the 333 MHz FSB case since the memory bandwidth increased by 60 % vs. 25 % in the 400 MHz FSB case. All other BIOS settings were held constant with the exception of the divider (and the strap) and the given FSB. Subtimings were set to auto and as such could vary as managed by the board. I found this was required, since manually setting some of the subtimings led to either an incomplete POST or an unstable system.
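For anyone who wants to check the 60 % vs. 25 % figures, here is a minimal Python sketch of the divider arithmetic. It assumes the divider is read as FSB:DRAM and that the quoted DDR2 "MHz" figure is twice the DRAM clock; the function names are made up for illustration. It uses the rounded FSB values from the post (333 rather than 333.33), so the absolute rates come out a hair below the nominal 667/1,067.

```python
from fractions import Fraction


def ddr2_rate_mhz(fsb_mhz, divider):
    """Effective DDR2 data rate for an FSB:DRAM divider.

    divider is (fsb_part, dram_part), e.g. (5, 8) for the 5:8 divider.
    """
    fsb_part, dram_part = divider
    dram_clock = Fraction(fsb_mhz) * dram_part / fsb_part
    return float(dram_clock * 2)  # double data rate: two transfers per clock


def percent_gain(base, new):
    """Percentage increase going from base to new."""
    return (new - base) / base * 100.0


for fsb, divider in [(333, (5, 8)), (400, (4, 5))]:
    base = ddr2_rate_mhz(fsb, (1, 1))
    fast = ddr2_rate_mhz(fsb, divider)
    gain = percent_gain(base, fast)
    print(f"{fsb} MHz FSB: {base:.0f} -> {fast:.0f} MHz (+{gain:.0f} %)")
```

Note that the percentage gain depends only on the divider, not on the FSB itself.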

The benchmarks were broken down into three categories:
1) "Real-World" Applications
2) 3D Games
3) Synthetic Benchmarks

The following "real-world" apps were chosen: x264, WinRAR, and the trial version of Photoshop CS3. All were run on a freshly installed version of Windows XP Pro x64 SP2 w/ all relevant hotfixes. The 3D games were just Doom3 (an older game) and Crysis (a newer game). Finally, I threw in some synthetic benchmarks consisting of the WinRAR self test, Super Pi-mod, and Everest's synthetic memory benchmark. Here is an explanation of the specifics:

Trial of Photoshop CS3 – The batch function in PSCS3 v10.0.1 was used to process a total of fifty-six 10.1 MP jpeg files (226 MB in total):

1) bicubic resize 10.1 MP to 2.2 MP (3872x2592 --> 1800x1200) which is the perfect size for a 4x6 print @ 300 dpi.
2) smart sharpen (120 %, 0.9 px radius, more accurate, lens blur setting)
3) auto levels
4) saved the resulting files as a quality 10 jpg.

Benchmark results are an average of two runs timed with a stopwatch.

RAR version 3.71 – rar.exe ran my standard backup batch file which generated about 955 MB of rars containing 5,210 files in total. Here is the command line used:

Code:

rar a -m3 -md4096 -v100m -rv40p -msjpg;mp3;tif;avi;zip;rar;gpg;jpg "f:\Backups\Backup.rar" @list.txt
where list.txt is a list of all the target files/dirs included in the backup set. Benchmark results are an average of two runs timed with a stopwatch.
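The runs above were hand-timed. As an alternative, the timing could be scripted; the following is only a sketch of that idea, not what was used for these results. The helper name time_command is made up, and it assumes the command can simply be re-run (for the rar backup you would need to delete the archive between runs, otherwise the second run updates an existing archive).

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=2):
    """Run cmd `runs` times and return the mean wall-clock time in seconds.

    cmd is a list of arguments as accepted by subprocess.run.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Hypothetical usage with the command line from the post:
# seconds = time_command([
#     "rar", "a", "-m3", "-md4096", "-v100m", "-rv40p",
#     "-msjpg;mp3;tif;avi;zip;rar;gpg;jpg",
#     r"f:\Backups\Backup.rar", "@list.txt",
# ])
```

Passing the arguments as a list avoids shell quoting problems with the semicolons in the -ms switch.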

x264 Benchmark HD – Automatically runs a 2-pass encode on the same 720p MPEG-2 (1280x720 DVD source) file four times in total. It contains two versions of x264.exe and runs the encode with both. The benchmark is the best three of four runs (FPS) converted to total encode time.
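To make the "best three of four, converted to encode time" step concrete, here is a small Python sketch. The FPS values and the frame count below are hypothetical placeholders, not data from this benchmark, and the helper names are made up for illustration.

```python
def best_of(values, keep=3):
    """Mean of the `keep` highest values, e.g. the best three of four FPS runs."""
    top = sorted(values, reverse=True)[:keep]
    return sum(top) / len(top)


def encode_seconds(frame_count, fps):
    """Time to encode frame_count frames at an average rate of fps."""
    return frame_count / fps


runs_fps = [50.0, 52.0, 51.0, 47.0]  # hypothetical FPS from four runs
frame_count = 1530                   # hypothetical number of frames in the clip

avg_fps = best_of(runs_fps)          # drops the slowest run (47.0)
print(f"average of best three: {avg_fps:.1f} fps")
print(f"encode time: {encode_seconds(frame_count, avg_fps):.1f} s")
```

Dropping the slowest run is a simple way to keep a one-off hiccup (background task, disk activity) from skewing the average.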

Shameless promotion --> you can read more about the x264 Benchmark HD at this URL which contains results for hundreds of systems. You can also download the benchmark and test your own machine.

3D Games Based Benchmarks

Doom3 - Ran timedemo demo1 a total of three times and averaged the fps as the result. Settings were 1,280x1,024, ultra quality with 8x AA.

Crysis - Ran the included "Benchmark_CPU.bat" and "Benchmark_GPU.bat", both of which run the pre-defined timedemo, looped four times. I took the best three of four (average FPS) and averaged them together as the benchmark. Settings were 1,024x768, very high for all (used the DX9 very high settings hack), and 2x AA.

"Synthetic" Application Based Tests

WinRAR version 3.71 – If you hit alt-B in WinRAR, it'll run a synthetic benchmark. This was run twice (stopped after 150 MB) and is the average of four runs.

SuperPI / mod1.5 XS – The 16M test was run twice, and the average of the two is the benchmark.

Everest v4.50.1330 Memory Benchmark - Ran this benchmark a total of three times and averaged the results.

Hardware specs:

Code:

D.F.I. LP LT P35-TR2 (BIOS: LP35D317)
Intel X3360 @ 8.5x400=3.40 GHz
Corsair Dominator DDR2-1066 (TWIN2X4096-8500C5DF)
   2x 2Gb @ 5-5-5-15 (all subtimings on auto)

 (tRD=8) @ 667 MHz (1:1) @ 2.100V
 (tRD=7) @ 1,066 MHz (5:8) @ 2.100V
 (tRD=8) @ 800 MHz (1:1) @ 2.100V
 (tRD=6) @ 1,000 MHz (4:5) @ 2.100V

EVGA Geforce 8800GTS (G92) w/ 512 meg
Core=770 MHz
Shader=1,923 MHz
Memory=2,000 MHz
Note: the performance levels (tRD) are set automatically by the board which wouldn't POST if I manually tweaked them. Even though they're different, I still feel the data are valid since this is the only way I can run them. In other words, if I'm going to run the higher dividers, it'll be as such or it won't POST!

Without further ado, here are the data starting first with a 333 MHz FSB comparing the 1:1 vs. 5:8 divider (DDR2-667 vs. DDR2-1066):
Image

Here are the averaged data visualized graphically:
Image

Now on to the 400 MHz FSB comparing the 1:1 vs. 4:5 divider (DDR2-800 vs. DDR2-1000):
Image

And graphically:
Image

As you can see, there was nothing spectacular in either the real-world category or the 3D games category in comparison to the massive increase in memory bandwidth (shown on the graphs in red). In fact, I was surprised to see that there were really no gains by Doom3 and minimal gains by Crysis. This is probably due to the fact that the video card shoulders the burden of these games, with Doom3 being the lightweight of the two. As expected, the synthetic benchmarks did pick up on the larger bandwidth, but only in the case of the 400 MHz FSB did I see anything approaching the theoretical increase (14 % of 25 % vs. 15 % of 60 %).
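One way to read the "14 % of 25 % vs. 15 % of 60 %" comparison is as the fraction of the extra bandwidth that actually showed up in the synthetic results. A minimal Python sketch of that ratio, using only the four percentages quoted above (the function name is made up for illustration):

```python
def realized_fraction(measured_gain_pct, bandwidth_gain_pct):
    """Share of the theoretical bandwidth increase seen in the benchmark."""
    return measured_gain_pct / bandwidth_gain_pct


cases = {
    "400 MHz FSB (1:1 -> 4:5)": (14, 25),
    "333 MHz FSB (1:1 -> 5:8)": (15, 60),
}

for label, (measured, theoretical) in cases.items():
    share = realized_fraction(measured, theoretical)
    print(f"{label}: {share:.0%} of the bandwidth increase realized")
```

By this measure the 400 MHz FSB case realizes a bit over half of its bandwidth increase, while the 333 MHz FSB case realizes about a quarter.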

If you read my first memory bandwidth post, perhaps the same conclusions can be drawn from these new data. One thing I'll add is that this new motherboard doesn't require extra voltage like my older P5B-Deluxe did to run the higher dividers, so it's not producing that much more heat. That said, I'm actually running the system with the 4:5 divider, since things seem to feel faster to me (windows opening, responsiveness, etc.), which are all unfortunately intangibles I can't measure.

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Post by graysky » Sun May 11, 2008 10:44 am

I edited the first post switching the highest 400 MHz FSB run from 5:6 to 4:5 (960 MHz vs. 1,000 MHz) and included some info about subtimings to make things more clear.

AZBrandon
Friend of SPCR
Posts: 867
Joined: Sun Mar 21, 2004 5:47 pm
Location: Phoenix, AZ

Post by AZBrandon » Mon May 12, 2008 8:12 am

Kind of makes you wonder why we continue to see the march from DDR to DDR2 to DDR3 when there's little to no evidence that memory is a bottleneck. Worse still is that Intel was the first to make the jump to DDR3 and they still don't even have an integrated memory controller like AMD has had for years! It seems like memory speeds are all smoke and mirrors and that memory quantity has far more to do with system performance than memory bandwidth does.

Plekto
Posts: 398
Joined: Tue Feb 19, 2008 2:08 pm
Location: Los Angeles

Post by Plekto » Mon May 12, 2008 2:35 pm

The problem also has a lot to do with the fact that the CPUs run at 166 or 200 MHz internally. All a fancy multiplier does is net you more CPU speed, but the thing's hopelessly crippled by the actual I/O (FSB) speed, which can never be faster than the base CPU speed. As others have pointed out, blame the memory controller and ancient architecture.

It's like adding a 200 amp alternator onto your car. If the thing can't use more than 50 amps of it, the rest is largely wasted.

jaganath
Posts: 5085
Joined: Tue Sep 20, 2005 6:55 am
Location: UK

Post by jaganath » Mon May 12, 2008 3:29 pm

AZBrandon wrote:Worse still is that Intel was the first to make the jump to DDR3 and they still don't even have an integrated memory controller like AMD has had for years!
Nehalem, which will debut at the end of 2008, will have an integrated memory controller. Given that Intel chips have been handily outperforming AMD chips, even with the handicap of an external memory controller, presumably this means Nehalem will rip to shreds anything AMD has to offer.

AZBrandon
Friend of SPCR
Posts: 867
Joined: Sun Mar 21, 2004 5:47 pm
Location: Phoenix, AZ

Post by AZBrandon » Mon May 12, 2008 3:40 pm

That still says nothing about the memory however. Having a fast CPU means you have a fast CPU. If you have a fast system due to the CPU being fast, then again, it's still the CPU. I suspect that if you brought the memory speed of a modern system down to half its current speed, you'd still probably have 99% of the original real world performance due to there being so much more available memory bandwidth than any application needs to use.

Correct me if I'm wrong, but is there anything that really demands fast memory speed in order to deliver fast performance to the end user?

Plekto
Posts: 398
Joined: Tue Feb 19, 2008 2:08 pm
Location: Los Angeles

Post by Plekto » Tue May 13, 2008 12:48 pm

Only two things that I can think of - real time rendering and raytracing and something like pure number crunching for CAD or engineering purposes.

Games and so on - hardly more than 5% difference on average. Considering that the money you spend on more expensive ram could buy the next model up video card and net you 30-50% faster speed in most cases...

sjoukew
Posts: 401
Joined: Mon Nov 27, 2006 6:51 am
Location: The Netherlands (NL)

Post by sjoukew » Tue May 13, 2008 1:30 pm

you should read xbit labs and second xbit labs link.
They did this "research" in 2006, and their conclusion is the same as yours ;)

AZBrandon
Friend of SPCR
Posts: 867
Joined: Sun Mar 21, 2004 5:47 pm
Location: Phoenix, AZ

Post by AZBrandon » Tue May 13, 2008 3:42 pm

I did some poking around myself and finally came up with an app that demands memory speed: Ulead Videostudio 11.

System: AMD S939 Opteron 185 (dual core), running about 2.75 GHz for all testing, 2 x 1 GB OCZ Platinum DDR400 sticks.

Memory bandwidth is from SiSandra; rendering time is for a 5-minute MPEG video to be upscaled and resampled to a different bitrate:

230x12 (2760cpu, 460mem), 5840 MB/sec, 401 seconds
212x13 (2756cpu, 424mem), 5357 MB/sec, 411 seconds
230x12 (2760cpu, 368mem), 4684 MB/sec, 417 seconds
230x12 (2760cpu, 103mem), 1297 MB/sec, 626 seconds

The first two show the difference between dropping the multiplier and increasing bus speed versus a high mult and lower bus. The memory bandwidth was 9% higher and rendering time was 97.6% as long.

For the next two, I decreased memory speed within BIOS, so the CPU speed remained unaffected and memory alone changed. With memory bandwidth only 80% as high, rendering was still about 96% as fast (417 vs. 401 seconds). With 22% of the memory bandwidth it was 64% as fast.
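For anyone who wants to recompute those ratios, here is a short Python sketch that normalizes the four runs listed above against the fastest one. The bandwidth and time figures are copied from the list; the function name and the choice of "speed = baseline time / run time" are assumptions made for illustration.

```python
# (label, memory bandwidth in MB/sec, render time in seconds), from the list above
runs = [
    ("230x12, 460 mem", 5840, 401),
    ("212x13, 424 mem", 5357, 411),
    ("230x12, 368 mem", 4684, 417),
    ("230x12, 103 mem", 1297, 626),
]


def relative_to(runs, baseline=0):
    """Return (label, bandwidth ratio, speed ratio) relative to runs[baseline].

    Speed ratio is baseline time / run time, so 1.0 means as fast as the baseline.
    """
    _, base_bw, base_sec = runs[baseline]
    return [(label, bw / base_bw, base_sec / sec) for label, bw, sec in runs]


for label, bw_ratio, speed_ratio in relative_to(runs):
    print(f"{label}: {bw_ratio:.0%} of bandwidth, {speed_ratio:.0%} as fast")
```

Comparing the two ratios per row shows how much of a bandwidth cut actually turns into lost rendering speed.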

So there you have it - there's at least one application that can fully utilize every last drop of DDR speed you have. As for DDR2 and DDR3, I don't know, but I felt good about finding at least one application that increased performance with every bit of memory speed I could give it, right up to the point where my memory is absolutely overclocked to the fullest of its abilities.

lobuni
Posts: 73
Joined: Thu Aug 23, 2007 2:33 am

Post by lobuni » Tue May 13, 2008 11:34 pm

AZBrandon wrote:Correct me if I'm wrong, but is there anything that really demands fast memory speed in order to deliver fast performance to the end user?
Maybe integrated graphics chipsets, that rely on system ram for memory.

AZBrandon
Friend of SPCR
Posts: 867
Joined: Sun Mar 21, 2004 5:47 pm
Location: Phoenix, AZ

Post by AZBrandon » Wed May 14, 2008 7:24 am

lobuni wrote:Maybe integrated graphics chipsets, that rely on system ram for memory.
Good point - the latest talk in the news is Nvidia claiming that Intel is irrelevant now, that the future of computing will rely on faster and faster accessory GPUs, and Intel countering that accessory GPUs are going to become irrelevant as integrated graphics solutions improve and will one day move on-die with the CPU.

Looking at how AMD was in a hurry to acquire ATI and move ahead improving their integrated graphics solutions, I think the future will be somewhere in the middle - perhaps CPUs will include a decent GPU on-die but then have an additional socket on the mobo for an even faster accessory GPU, and over time move to faster and faster on-die GPUs until they reach a point where there's no reason to buy an accessory one.

One of the things that I have noticed is that Intel's next CPU will have not just dual channel memory support, but triple channel DDR3 support with the on-die memory controller. About a week ago I spent a while going over Sun's new Ultrasparc T2 processor since we're buying a ton of them for work. Sun calls it the "system on a chip" since it has a bazillion things all integrated right into the CPU die itself. The effect has been stunning - it delivered 3 - 50x the performance per dollar of any other system it competes against in the eval test we did.

The more research I did on memory bandwidth in the last couple of days, the more it seems that current systems, especially ones like mine that use the older DDR(1) chips and the Intel procs without onboard memory controllers, don't benefit much from really high speed memory. The future of computing, though, looks like it will absolutely stand to benefit from faster memory speeds, especially for anything multithreaded.
