
Intel Core i7: Nehalem Launched

Core 2 has dominated desktop processors for the last couple of years. Now comes Core i7, Intel’s next gen architecture with the capacity to scale up to eight Hyper-threaded cores. What does the new platform bring to silent computing?

November 8, 2008 by Lawrence Lee with Mike Chin

The release of the Core line of desktop processors was
Intel’s great redemption. After struggling for some years to
compete with AMD’s Athlon 64 line, Intel abandoned the
NetBurst architecture of their Pentium 4 and Pentium D chips in
favor of Core and Core 2. The Core 2 took back the performance
crown and Intel regained much of the market share it had lost,
forcing AMD to drop prices in order to compete at least in the
middle and low end of the processor market, at great financial
cost. To this day, no AMD dual-core processor can outperform the
fastest of the initially released Core 2s, the 2.93GHz Core
2 Extreme X6800, released all the way back in July of 2006, an
eternity in the life cycle of processors; AMD still has a lot of
catching up to do. While it has been a great run for Core 2,
Intel is not resting on its laurels now. The next generation Core
i7 is here, and with it, Intel is pushing to keep its lead into
the future.


A Core i7 processor. Bare on the left, with heatspreader in the
center, and the back side on the right. No, it’s not quite
square.

The Core i7 line is the first release of Intel’s new
Nehalem architecture. Though it is built on the same 45nm
process as current Core 2 processors, Nehalem is not simply the
next generation of Core — it is a drastic re-design in many
respects. First and foremost, the processors feature an
integrated memory controller — a move that proved very
advantageous for AMD’s Opteron and Athlon 64. Having a memory
controller built directly on the CPU die allows CPU/memory
operations to bypass routing via the northbridge chip. This
increases overall memory performance and frees up bandwidth for
other interfaces; multiple cores benefit greatly. The controller
also supports triple channel memory, presumably a step above dual
channel.


A Nehalem die.

Following in the footsteps of AMD once again, Intel has
developed its own version of HyperTransport, called QuickPath
Interconnect (QPI). QPI allows for a 20-bit wide, 25.6 GB/s link
between the CPU and northbridge, doubling the peak bandwidth of
the fastest 1600MHz Front Side Bus currently available on Intel
platforms. Nehalem also sees Intel’s return to Hyper-Threading, a
feature from the Pentium 4 era not present in the Core 2 line.
There are plans for 8-core Nehalem processors capable of handling
16 threads simultaneously.
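
As a rough sanity check on the "doubling" claim, the peak figures can be
worked out from the link parameters. This is only a sketch: the article gives
the 25.6 GB/s and 1600MHz numbers, while the 6.4 GT/s transfer rate, the 16
data bits out of QPI's 20, the two directions, and the 64-bit FSB width are
assumptions brought in for the calculation.

```python
# Peak theoretical bandwidth in GB/s for the two links discussed above.
# Assumptions (not stated in the article): 16 of QPI's 20 bits carry data,
# the link runs at 6.4 GT/s in each of two directions, and the FSB is
# 64 bits wide.

def qpi_bandwidth_gbs(transfers_gt_s=6.4, payload_bits=16, directions=2):
    """Giga-transfers per second x payload bytes x number of directions."""
    return transfers_gt_s * payload_bits / 8 * directions


def fsb_bandwidth_gbs(transfers_mt_s=1600, width_bits=64):
    """Mega-transfers per second x bus width in bytes, converted to GB/s."""
    return transfers_mt_s * width_bits / 8 / 1000


qpi = qpi_bandwidth_gbs()
fsb = fsb_bandwidth_gbs()
print(f"QPI: {qpi:.1f} GB/s, FSB: {fsb:.1f} GB/s, ratio: {qpi / fsb:.1f}x")
```

Under the same assumptions, passing `transfers_gt_s=4.8` gives the lower
figure that would apply to a 4.8 GT/s link.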

Power management is handled differently as well. Each Core i7
chip has a PCU (Power Control Unit) which can alter core
frequencies and voltages dynamically depending on load. When the
cores are idle, the PCU can underclock and undervolt them to use
less power, or put them in a sort of standby mode where they use
almost no power at all. To complement this, Turbo Boost Technology
will overclock active cores while the idle ones are asleep. As
many programs cannot fully take advantage of 4 cores, a boost in
the clock speed of the ones they can use is useful. Turbo mode
will also overclock all cores if it deems that the processor is
receiving sufficient cooling.

With all these changes comes a new socket, LGA 1366.
Enthusiasts often cry in outrage when a socket change occurs,
usually because it’s seen as an unwarranted money-grab that
requires not only a new CPU for an upgrade but new motherboard
and heatsink as well. This time around, with all the changes to
the architecture, the die size is much larger so a new socket
appears to have been unavoidable. The thing does have 1366
contact points!

The Core i7 line, code-named Bloomfield, currently comprises
three quad-core chips. Each has 256KB of L2 cache per core, 8MB
of L3 cache and a TDP of 130W. The 920 runs at 2.66GHz, the 940
at 2.93GHz, and the 965 Extreme at 3.2GHz. Pricing is set at
$284, $562, and $999 USD respectively, in large quantities for
Intel’s biggest buyers.

CORE i7 REVIEW KIT

As you’ve surmised by now, we were among the recipients of
the Intel i7 press review package. Our kit included:

  • Core i7 920 and Core i7 965 (non-retail processor samples)
  • Intel DX58SO motherboard – Intel X58 chipset.
  • Intel X25-M 80GB solid-state hard drive
  • Stock Intel heatsink/fan for LGA1366
  • Thermalright Ultra-120 eXtreme RT heatsink with fan
  • QiMonda DDR3 RAM – PC3-1066, 3 x 1024 MB

If you are expecting pages and pages of benchmarks of the
above gear using programs you’ve never heard of and will
never use, we will disappoint. SPCR’s focus is on acoustics,
thermals and power. These are the aspects we’ll examine,
using test tools you may be familiar with from other SPCR reviews.
The physical details of the new platform are also important for
us to examine — CPU die size, heatspreader composition and
dimensions, heatsink mounting, etc. Finally, we designed a basic
test suite with a few time-sensitive operations that the majority
of users may find themselves doing from time to time. These tests
were run on two similarly equipped systems — one Core i7,
one Core 2 quad-core — for a comparison of practical
desktop performance differences between the Core i7 platform and
its closest Core 2 equivalents.

PHYSICAL DETAILS

The Core i7 in the LGA1366 package is physically larger than
the Core 2 socket 775 processors. It has some 731 million
transistors in a 263 mm² die, made on the same 45nm
fabrication process as “Penryn” Core 2 chips, each of which
has 410 million transistors on a 107 mm² die. Two such dies
are needed for a quad-core Core 2 processor.


LGA1366 processor on left, LGA775 on right.

 


A Core i7 in its socket. Notice the screws in each corner —
they attach to a back plate on the trace side of the
motherboard.

 


The socket is slightly larger, and rectangular, to match the
processor.

 


The heatsink mounting holes remain symmetrical, though the
push-pin system from the LGA775 cooler unfortunately remains. The
lever that locks the CPU in place is longer and more prone to
bending. Watch your fingers — the amount of force required
is similar to that of a mouse trap.

 


The included stock cooler is somewhat taller than the stock
LGA775 cooler.

 


The stock heatsink also has a larger diameter as the mounting
holes are further apart on LGA1366 motherboards. They form an 80mm
square — LGA775 mounting holes are 72mm apart.

 


Thermalright’s 1366 version of the Ultra-120 eXtreme. The actual heatsink is unchanged,
but it ships with a Thermalright branded fan with a re-designed
plastic mount — the wire clips have been done away
with.

INTEL DX58SO MOTHERBOARD

With our Core i7 review package from Intel came an Intel
DX58SO motherboard. Intel’s X58 chipset is the
only one that supports Core i7 processors at the moment. It is a
board bursting with features.


Chipset diagram.

 

Layout. The DX58SO has two PCI-E 16x slots for
use in CrossFire and four memory slots (the fourth in its own
individual channel). There is also an additional 4-pin power plug
near the back panel.

 


The board’s northbridge heatsink is screwed on and its fins
are oriented to take advantage of right-left airflow. There are
also heatsinks on the VRMs around the CPU socket.

 


The trace side of the board. The CPU and northbridge back plates
are visible.

 


The back panel. The DX58SO comes with one FireWire and two eSATA
ports at the back.

 


Heatsink and memory installed.

TESTING SETUP

Core i7 Test Platform

  • Intel Core i7 920 (Bloomfield core) – 2.66GHz, 4.8 GT/s QPI, 130W TDP
  • Intel Core i7 965 Extreme – 3.2GHz, 6.4 GT/s QPI, 130W TDP
  • Intel DX58SO motherboard – Intel X58 chipset
  • Thermalright Ultra-120 eXtreme RT heatsink
  • QiMonda DDR3 RAM – PC3-1066, 3 x 1024 MB
  • Asus ENGTX260 512MB graphics card
  • Intel X25-M 80GB solid-state hard drive
  • NesteQ ECS7001 ATX power supply
  • Microsoft Windows Vista SP1 operating system – Home Premium, 32-bit
  • nVidia Forceware graphics driver version 178.24

Core 2 Test Platform

Measurement and Analysis Tools

Benchmark Test Details

  • Eset NOD32: In-depth virus scan of a folder containing 32 files
    of varying size, many of them RAR and ZIP archives.
  • WinRAR: Archive creation with a folder containing 68 files of
    varying size (less than 50MB).
  • iTunes: Conversion of an MP3 file to AAC.
  • TMPGEnc Xpress: Encoding a 1-minute long XVID AVI file to VC-1
    (1280×720, 30fps, 20Mbps).

TEST RESULTS

While other sites will fill pages and pages with
benchmarks using programs you’ve never heard of and will
never use, we designed a simple, basic test suite with a few
time-sensitive operations that the majority of users may find
themselves doing from time to time: NOD32 for anti-virus
scanning, WinRAR for file archiving, iTunes for audio encoding,
and TMPGEnc for video encoding. For more comprehensive
benchmarking, a round-up of Core i7 reviews can be found on
Mike Chin’s blog entry.

Performance

For our tests, we ran the Core 2 Extreme QX9650
at its stock speed of 3GHz and underclocked to 2.66GHz to match
the Core i7 920 clock for clock. We opted to underclock the
QX9650 rather than overclock the Core i7 920 because the latter
would require a bus adjustment, which would in turn alter the
memory frequency and thus create another variable to consider. We
also tested the Core i7 920 with dual channel and triple
channel memory configurations to see if that made any
significant difference in performance. Except for the motherboard
and CPU cooler, the components used in the two platforms were
identical, down to the memory speed and timings.

Benchmarks

| CPU | QX9650 | QX9650 | i7 920 | i7 920 | i7 965XE |
|---|---|---|---|---|---|
| Clock Speed | 2.66GHz (UC) | 3.00GHz (Stock) | 2.66GHz (Stock) | 2.66GHz (Stock) | 3.20GHz (Stock) |
| System RAM | 2 x 1GB | 2 x 1GB | 2 x 1GB | 3 x 1GB | 3 x 1GB |
| NOD32 | 209s | 197s | 210s | 209s | 175s |
| WinRAR | 185s | 177s | 153s | 151s | 136s |
| iTunes | 214s | 189s | 209s | 200s | 176s |
| TMPGEnc | 210s | 189s | 177s | 178s | 151s |
| 3DMark06 | 13058 | 14077 | 15187 | 15200 | 16307 |

Not surprisingly, the Core i7 965XE dominated the
benchmarks. NOD32 performance seemed to be based mostly on clock
speed, with the QX9650 at 3GHz coming in second place. WinRAR
heavily favored the Bloomfield system. iTunes encoding received a
small boost from Core i7, but clock speed was again the most
critical factor, putting the QX9650 @ 3GHz in second place. TMPGEnc,
probably the most complex and thread-aware application in our
suite, performed better on the Core i7 system as did 3DMark. Keep
in mind that the Turbo Boost Technology in i7 overclocks
all or some of the cores under load, while there is no such
auto-overclock function in the QX9650. For example, the 965XE
core clock speed went up to 3.33GHz under load.

Overall, in day-to-day use, the Core i7 gives a
nice boost to performance over an equivalently clocked Core 2.
However, for those running programs that aren’t smart enough
to take advantage of the improvements in the new architecture, it
may be wiser to put the extra money that would be spent on DDR3
memory and an X58 board toward a faster Core 2 quad-core instead. Of
the programs that did see a significant increase in performance,
the gain was on the order of 15-20%. The effect of triple
channel memory on the Core i7 system barely registered in any of
our benchmarks with the i7 920.
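
The 15-20% figure can be checked against the clock-matched columns of the
benchmark table. The sketch below expresses the gain as the share of run time
saved by the i7 920 relative to the underclocked QX9650; that is one
reasonable reading of "increase in performance", not necessarily the exact
method used here, and the helper name is made up for illustration.

```python
# Completion times in seconds, copied from the benchmark table:
# (QX9650 underclocked to 2.66GHz, Core i7 920 with 2 x 1GB), both at 2.66GHz.
TIMES = {
    "NOD32": (209, 210),
    "WinRAR": (185, 153),
    "iTunes": (214, 209),
    "TMPGEnc": (210, 177),
}


def time_saved_pct(core2_seconds, i7_seconds):
    """Percentage of run time saved by the Core i7 (negative means slower)."""
    return (core2_seconds - i7_seconds) / core2_seconds * 100


for name, (core2, i7) in TIMES.items():
    print(f"{name:8s} {time_saved_pct(core2, i7):+5.1f}%")
```

WinRAR and TMPGEnc land in the 15-20% band, while NOD32 and iTunes barely
move at equal clock speed.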

Memory

The incorporation of an integrated memory
controller is a giant step for Intel, and the memory benchmarks
provided by Everest show that memory bandwidth and latency
dramatically improved. While this does not necessarily translate
into better performance for day-to-day desktop applications,
server applications and such which are heavily memory-bound will
be much happier.

Memory Performance (Everest Ultimate Edition)

| CPU | QX9650 | QX9650 | i7 920 | i7 920 | i7 965XE |
|---|---|---|---|---|---|
| Clock Speed | 2.66GHz (UC) | 3.00GHz (Stock) | 2.66GHz (Stock) | 2.66GHz (Stock) | 3.20GHz (Stock) |
| System RAM | 2 x 1GB | 2 x 1GB | 2 x 1GB | 3 x 1GB | 3 x 1GB |
| Read MB/s | 7334 | 7404 | 13212 | 13624 | 15022 |
| Write MB/s | 7074 | 7079 | 9681 | 9682 | 12054 |
| Copy MB/s | 6432 | 6259 | 11881 | 13239 | 15388 |
| Latency | 72.8 ns | 72.2 ns | 38.1 ns | 42.3 ns | 39.4 ns |
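
To put numbers on "dramatically improved", the Everest results for the two
clock-matched, dual-channel configurations can be compared directly. This is
a minimal sketch using only the table values above; the dictionary keys and
the `ratio` helper are illustrative names.

```python
# Everest results copied from the table above:
# (QX9650 at 2.66GHz, Core i7 920 with 2 x 1GB)
EVEREST = {
    "read_mb_s": (7334, 13212),
    "write_mb_s": (7074, 9681),
    "copy_mb_s": (6432, 11881),
    "latency_ns": (72.8, 38.1),
}


def ratio(core2_value, i7_value):
    """Core i7 result divided by the Core 2 result for the same metric."""
    return i7_value / core2_value


for metric, (core2, i7) in EVEREST.items():
    print(f"{metric:11s} i7 / Core 2 = {ratio(core2, i7):.2f}x")
```

A ratio above 1 is better for the bandwidth rows; for latency, lower is
better, so a ratio near 0.5 means the delay was roughly halved.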

POWER

With all the changes under the hood of the
processor and the increased die size, you might assume that the
Core i7 chips use more power at idle. This was not the case:
idle power was almost identical across all of these system
configurations. During playback of a VC-1 video, the power draw
was very similar across the board as CPU usage was very low, due
to the hardware acceleration capabilities of the nVidia graphics
card. The Core i7 system did not exhibit significantly higher
power draw until the Prime95 load tests. With half the cores
stressed, the Core i7 920 system pulled 10W more than the QX9650
at 2.66GHz and 3W more than the QX9650 at 3.00GHz. At full load,
the Core i7 920 system used an extra 45-50W compared to the
clock-matched QX9650. The extra power draw of the 965XE
system running Prime95 is directly attributable to the higher
clock speed of its cores.

System Power Consumption (AC)

| CPU | QX9650 | QX9650 | i7 920 | i7 920 | i7 965XE |
|---|---|---|---|---|---|
| Clock Speed | 2.66GHz (UC) | 3.00GHz (Stock) | 2.66GHz (Stock) | 2.66GHz (Stock) | 3.20GHz (Stock) |
| System RAM | 2 x 1GB | 2 x 1GB | 2 x 1GB | 3 x 1GB | 3 x 1GB |
| Off | 2W | 2W | 2W | 2W | 2W |
| Sleep | 3W | 3W | 5W | 5W | 5W |
| Idle | 102W | 103W | 102W | 104W | 104W |
| VC-1 | 137W | 140W | 143W | 140W | 138W |
| Prime95 (2/4) | 152W | 159W | 162W | 160W | 173W |
| Prime95 (4/4) | 171W | 177W | 216W | 215W | 236W |
| Prime95 (4/4) + Furmark06 | 295W | 302W | 341W | 343W | 367W |

Note: The number of threads used in
Prime95 was doubled for the Core i7 920 due to Hyper-Threading.
To stress 2/4 cores, 4 threads must be run on the Core i7 920
while the QX9650 requires only 2.

The extra DIMM in the triple-channel i7
configuration barely made an impression on our power
readings, which varied by an average of 2W. During some of the
tests the power draw was actually lower. Normally adding an
extra stick of memory increases power across the board, but not
so for the Core i7. Initially, when we were contemplating
overclocking the Core i7 920 to 3.00GHz to match the QX9650’s
stock clock speed, we found that it would blue screen with only 2
DIMMs at 3GHz, yet it was perfectly stable with 3 DIMMs.
Rearranging and using different modules did not change this
behavior. This particular system seemed to work best with triple
channel memory.

ENERGY EFFICIENCY

Timed benchmarks give us an opportunity to
analyze power efficiency while keeping performance in mind. Once
a task is completed, the system sits idle, and in our case the
Core 2 and Core i7 systems in our test setups idle using the same
amount of power. So how long the program takes to finish its task
and how much power it draws while doing so ultimately determine
energy efficiency. With that in mind we calculated the
watt-seconds for each benchmark by multiplying the time by the
average power consumption during the task. Watt-hours would be easy to
obtain from this number, but since the tasks themselves were very
short (typically no more than 3 minutes), that seemed
unwarranted.
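
The calculation described above is a single multiplication. As a minimal
sketch, here it is applied to the stock QX9650 WinRAR run (177s at an average
of 135W, taken from the table that follows), along with the watt-hour
conversion we chose not to tabulate. The function names are illustrative.

```python
def energy_ws(seconds, avg_watts):
    """Energy used by a task in watt-seconds (equivalent to joules)."""
    return seconds * avg_watts


def ws_to_wh(watt_seconds):
    """Convert watt-seconds to watt-hours (3600 seconds per hour)."""
    return watt_seconds / 3600


winrar_ws = energy_ws(177, 135)
print(f"{winrar_ws} Ws = {ws_to_wh(winrar_ws):.4f} Wh")
```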

Benchmark Energy Efficiency

| CPU | QX9650 | QX9650 | i7 920 | i7 920 | i7 965XE |
|---|---|---|---|---|---|
| Clock Speed | 2.66GHz (UC) | 3.00GHz (Stock) | 2.66GHz (Stock) | 2.66GHz (Stock) | 3.20GHz (Stock) |
| System RAM | 2 x 1GB | 2 x 1GB | 2 x 1GB | 3 x 1GB | 3 x 1GB |
| WinRAR (time @ power) | 185s @ 133W | 177s @ 135W | 153s @ 133W | 151s @ 130W | 140s @ 136W |
| WinRAR (energy) | 24605 Ws (+3%) | 23895 Ws (100%) | 20349 Ws (-15%) | 19630 Ws (-18%) | 19040 Ws (-20%) |
| iTunes (time @ power) | 214s @ 125W | 189s @ 128W | 209s @ 136W | 200s @ 133W | 175s @ 141W |
| iTunes (energy) | 26750 Ws (+10%) | 24192 Ws (100%) | 28424 Ws (+17%) | 26600 Ws (+10%) | 24675 Ws (+2%) |
| TMPGEnc (time @ power) | 210s @ 165W | 189s @ 170W | 177s @ 188W | 178s @ 189W | 151s @ 208W |
| TMPGEnc (energy) | 34650 Ws (+8%) | 32130 Ws (100%) | 33276 Ws (+4%) | 33642 Ws (+5%) | 31408 Ws (-2%) |

The QX9650 system at its stock 3GHz clock speed was
used as the reference point for each benchmark. The energy
consumption of the other systems was scored as needing more
(plus %) or less (minus %) energy compared to that used by the
stock-clock QX9650 system. The lowest energy consumption was
recorded by the 965XE in WinRAR and TMPGEnc, and by the
stock-clock QX9650 in iTunes.
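
The plus and minus percentages can be reproduced from the time and power
pairs in the table. The sketch below does so for the WinRAR row; the
configuration labels and the helper function are illustrative names, and the
result is rounded to the nearest whole percent for display.

```python
# (seconds, average watts) for the WinRAR test, copied from the table above.
WINRAR = {
    "QX9650 @ 2.66GHz": (185, 133),
    "QX9650 @ 3.00GHz": (177, 135),
    "i7 920, 2 x 1GB": (153, 133),
    "i7 920, 3 x 1GB": (151, 130),
    "i7 965XE": (140, 136),
}
BASELINE = "QX9650 @ 3.00GHz"


def relative_energy_pct(config, runs, baseline=BASELINE):
    """Energy of `config` relative to the baseline, as a signed percentage."""
    seconds, watts = runs[config]
    base_seconds, base_watts = runs[baseline]
    return (seconds * watts) / (base_seconds * base_watts) * 100 - 100


for config in WINRAR:
    print(f"{config:18s} {relative_energy_pct(config, WINRAR):+.0f}%")
```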

In WinRAR, the Core i7 systems used about the same amount of power
as the Core 2 configurations, but finished a fair bit faster, resulting
in much better energy efficiency, with the 965XE using a
substantial 20% less energy than the baseline. The QX9650 at 2.66GHz
and the i7 920/triple channel configuration were about dead even when
it came to iTunes encoding efficiency, but the QX9650 at its
stock speed of 3GHz proved to be best, by a small margin over the
top i7. Video encoding with TMPGEnc was faster on the Core i7 920
configurations, but the extra power they consumed doing so made them
slightly less efficient overall. The extra power demanded by the
965XE, however, was more than compensated for by the reduced time,
which gave it a 2% advantage.

From these limited tests it would seem that Core
i7 is fairly close to the power efficiency of Core 2, with
variance depending on the application used. However, it seems
likely that with highly demanding, multi-threaded applications,
the i7 will scale up better than the Core 2s.

CPU CLOCK CONTROL

When idle, CPU-Z reported fluctuating CPU frequencies
ranging from 1.6GHz to 2.8GHz for the Core i7 920, so the Power
Control Unit seemed to be working somewhat mysteriously, possibly
detecting slight variations in demand by the OS and responding
dynamically with overclocking or underclocking. With Prime95
load, CPU-Z reported the CPU speed as 2.8GHz, no matter how many
worker threads were used. With only one active core, we were
expecting a larger increase in clock speed, but that didn’t
happen. Turbo mode seems to be a crude way of saying “slightly
overclock mode.”


Minimum and maximum CPU states according to CPU-Z.

 


Task Manager, performance tab.

As the CPU supports Hyper-Threading, eight CPUs were detected
by Windows. With Prime95 running four worker threads, the CPU
usage of cores 0 and 3 hit 100%, while the other two remained
idle.


Intel Desktop Control Center GUI.

Intel’s Desktop Control Center is a beta application for
adjusting and monitoring system settings. It is available only on
select Intel branded boards, so it is unfortunately not an
equivalent to AMD Overdrive. It does allow for a variety of
settings to be changed from the desktop, though most changes
require a reboot. CPU, memory, and bus settings can be adjusted
and there is monitoring available for various temperatures and
voltages via additional menus. You can also define presets for
easy switching.


Overclocked to 3GHz: minimum and maximum CPU states
according to CPU-Z.

Using the Desktop Control Center, we briefly overclocked the
i7 920 processor to 3GHz by raising the CPU base frequency to
150MHz. The same CPU speed behavior that was in effect at stock
speed also prevailed when overclocked: the multiplier shifted up
and down, sometimes overclocking the CPU past its set speed.
Running Prime95 with any number of threads resulted in 3.15GHz
according to CPU-Z.
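
The CPU-Z readings above are consistent with the core clock being a base
frequency times a multiplier, with Turbo mode adding a single multiplier
step. The sketch below assumes a 133.33MHz stock base frequency and a 20x
stock multiplier for the i7 920, and a one-step Turbo increase; none of these
values is stated in the article, so treat them as working assumptions.

```python
STOCK_BASE_MHZ = 133.33  # assumed stock base frequency of the i7 920
STOCK_MULTIPLIER = 20    # assumed stock multiplier of the i7 920
TURBO_STEPS = 1          # assumed: Turbo mode adds one multiplier step


def core_clock_mhz(base_mhz, multiplier):
    """Core clock is the base frequency multiplied by the CPU multiplier."""
    return base_mhz * multiplier


for base in (STOCK_BASE_MHZ, 150.0):
    nominal = core_clock_mhz(base, STOCK_MULTIPLIER)
    turbo = core_clock_mhz(base, STOCK_MULTIPLIER + TURBO_STEPS)
    print(f"base {base:6.2f}MHz: nominal {nominal:.0f}MHz, turbo {turbo:.0f}MHz")
```

With these assumptions, the stock case matches the 2.8GHz reading under load
and the 150MHz case matches the 3.15GHz reading.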

It’s hard to tell exactly what is going on under
the hood as CPU-Z does not allow each core to be monitored. No
software we know of tells whether inactive cores in a multi-core
CPU underclock and undervolt or go to sleep.

FINAL THOUGHTS

There was never any doubt that the Core i7 would be the new
fastest processor family for the desktop. With all the changes in
the new architecture, it’s somewhat surprising that the
increase in performance is evolutionary, not revolutionary. For
more extensive performance benchmarks, check out the reviews at
The Tech Report, Anandtech, and X-bit Labs. They concur that Core i7
is the new king of desktops, garnering about 20% more performance
than equivalently clocked Core 2s (Turbo mode aside).
However, the best gains are experienced with professional level
applications written to take advantage of the Core i7’s new
features and instructions.

The improvements to the core, namely the integrated memory
controller and QPI, will allow Nehalem processors to scale very
well as more cores are added and clock speed increased, and
reduce overhead in multi-processor environments like servers. For
the average desktop user, Core i7 isn’t going to provide
dramatic improvements. Part of the problem may be that Core 2 was
such a huge improvement over Pentium 4 and Pentium D that we are
somewhat spoiled: we expect a leap in performance with
every new generation of processors, and Core 2 is pretty darn
fast to begin with. The changes made were necessary, however, for
a base from which Intel can improve further.

In terms of power efficiency, Core i7 is at least equal to its
predecessor for typical desktop applications. The extra
performance it delivers more or less justifies the increased
power usage. The new power management system, however, may not work
exactly as intended; it’s hard to tell exactly what is
going on with each core. There’s little question that Core
i7’s idle power consumption is equal to Core 2’s, regardless of
how that’s achieved.

The higher peak power consumption is probably what prompted
Intel to increase the size of the heatsink mounting pattern, to
accommodate larger coolers. The default installation method still
involves push-pins. This is too bad — the last thing we
need is bigger, heavier heatsinks that mount with push-pins. The
inclusion of a metal back plate for the CPU is welcome, however,
somewhat mitigating the push-pins: You won’t have to worry
about the PCB bending, only the push-pins completely failing and
popping off.

Bloomfield delivers better performance than Yorkfield, but it
is not enough to offset the additional costs involved in a
platform change for most desktop users. DDR3 memory is twice the
price of DDR2 and X58 is the only compatible chipset. The DX58SO
will debut at around $300, and even the cheapest of the i7s, the
920, will probably retail for at least $300. Until memory prices
drop and cheaper, mainstream chipsets are released, it’s a
lot to pay unless the absolute highest level of performance is
required and/or money is no object. As usual, early adopters will
pay a heavy tax for the latest and greatest, especially
for the $1,000 3.2GHz version.

For silent computing enthusiasts, the Core i7 isn’t really
a step forward, what with the promise of higher performance at
the price of higher thermals, which may cost a decibel or two.
But it’s not a step back, either, as the lessons learned by
silencers during the era of overheating P4s remain, and the
thermal design of the new platform appears good enough to allow
quiet cooling as with today’s top socket 775 systems. In
short, the increased socket size and heatsink area are mostly
benefits, especially for lower TDP Core i7 processors that will
surely come down the pipeline. Most of us will wait for the
technology to flow down to those ranks — or be content with
Core 2, which will likely remain in production for some time to
come.

Our thanks to Intel, QiMonda, and
Thermalright for the various samples.

* * *

Articles of Related Interest
Core i7 News
Intel Developers Forum, Fall 2008
Desktop CPU Power Survey, April 2006

* * *

Discuss this article in the SPCR
forums.

 
