Core i7,i5,i3-2


The 6-series Platform

At launch Intel is offering two chipset families for Sandy Bridge: P-series and H-series, just like with Lynnfield. The high-level differentiation is easy to understand: P-series doesn’t support processor graphics, H-series does.

There are other differences as well. The P67 chipset supports 2x8 CrossFire and SLI while H67 only supports a single x16 slot off of the SNB CPU (the chip has 16 PCIe 2.0 lanes that stem from it).

While H67 allows for memory and graphics overclocking, it doesn’t support any amount of processor overclocking. If you want to overclock your Sandy Bridge, you need a P67 motherboard.

6Gbps

Had SSDs not arrived when they did, I wouldn’t have cared about faster SATA speeds. That’s how it worked after all in the evolution of the hard drive. We’d get a faster ATA or SATA protocol, and nothing would really change. Sure we’d eventually get a drive that could take advantage of more bandwidth, but it was a sluggish evolution that just wasn’t exciting. SSDs definitely changed all of that. Today there’s only a single 6Gbps consumer SSD on the market—Crucial’s RealSSD C300. By the middle of the year we’ll have at least two more high-end offerings, including SandForce’s SF-2000. All of these SSDs will be able to fully saturate a 3Gbps SATA interface in real world scenarios.
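To put the saturation claim in numbers, here is a minimal sketch of the usable bandwidth of each SATA generation. It relies on the fact that SATA uses 8b/10b encoding and ignores command and framing overhead, so treat the results as upper bounds; the function name is just for illustration.

```python
def sata_payload_mb_per_s(line_rate_gbps):
    """Upper bound on payload bandwidth for a SATA link, in MB/s.

    SATA uses 8b/10b encoding: every 8 data bits travel as 10 bits on the wire.
    Command and framing overhead is ignored here (simplifying assumption).
    """
    wire_bits_per_s = line_rate_gbps * 1_000_000_000
    data_bits_per_s = wire_bits_per_s * 8 / 10
    return data_bits_per_s / 8 / 1_000_000

for rate in (3, 6):
    print(f"{rate}Gbps SATA: at most {sata_payload_mb_per_s(rate):.0f} MB/s")
```

A drive that can sustain more than about 300 MB/s is therefore held back by a 3Gbps port no matter how good the controller is.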

Intel's DP67BG—The blue SATA ports on the right are 6Gbps, the black ones are 3Gbps

To meet the soon-to-be-growing need for 6Gbps SATA ports, Intel outfits the 6-series PCH with two 6Gbps SATA ports in addition to its four 3Gbps SATA ports. I dusted off my 128GB RealSSD C300 and ran it through a bunch of tests on five different configurations: Intel’s X58 (3Gbps), Intel’s P67 (3Gbps and 6Gbps), AMD’s 890GX (6Gbps) and Intel’s X58 with a Marvell 9128 6Gbps SATA controller. The Marvell 91xx controller is what you’ll find on most 5-series motherboards with 6Gbps SATA support.

I ran sequential read/write and random read/write tests at a queue depth of 32 to really stress the limits of each chipset’s SATA protocol implementation. I ran the sequential tests for a minute straight and the random tests for three minutes. I tested a multitude of block sizes ranging from 512 bytes all the way up to 32KB. All transfers were 4KB aligned to simulate access in a modern OS. Each benchmark started at LBA 0 and was allowed to use the entire LBA space for accesses. The SSD was TRIMed between runs involving writes.
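The test plan described above can be written down as a small matrix. This is only a sketch of the parameters, not the benchmark tool actually used: the power-of-two block size steps are an assumption (only the 512-byte and 32KB endpoints are stated), and the helper names are hypothetical.

```python
import random

ALIGNMENT = 4 * 1024                      # all transfers were 4KB aligned
QUEUE_DEPTH = 32
DURATION_S = {"sequential": 60, "random": 180}

# Assumption: power-of-two steps between the stated 512-byte and 32KB endpoints.
BLOCK_SIZES = [512 << i for i in range(7)]

def test_matrix():
    """Every (pattern, operation, block size) combination in the plan."""
    return [
        {"pattern": pattern, "op": op, "block_size": size,
         "queue_depth": QUEUE_DEPTH, "duration_s": DURATION_S[pattern]}
        for pattern in ("sequential", "random")
        for op in ("read", "write")
        for size in BLOCK_SIZES
    ]

def random_aligned_offset(capacity_bytes, block_size, rng=random):
    """Pick a 4KB-aligned byte offset such that the transfer fits on the drive."""
    last_slot = (capacity_bytes - block_size) // ALIGNMENT
    return rng.randrange(last_slot + 1) * ALIGNMENT

print(len(test_matrix()), "test runs per platform")
print(random_aligned_offset(128 * 10**9, 512))
```

Under the power-of-two assumption that works out to 28 runs per configuration, before counting the TRIM pass between write runs.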

Among Intel chipsets I found that the X58 has stellar 3Gbps SATA performance, which is why I standardize on it for my SSD testbed. Even compared to the new 6-series platform, the X58 holds slight advantages at high queue depths. Looking at 6Gbps performance though, there’s no comparison: the X58 is dated in this respect. Thankfully all of the contenders do well in our 6Gbps tests. AMD’s 8-series platform is a bit faster at certain block sizes, but for the most part it, Intel’s 6-series and Marvell’s 91xx controllers perform identically. I hate to be a bore but when it comes to SATA controllers an uneventful experience is probably the best you can hope for.

	Intel P67	Intel P55	Intel X58
Time from Power on to Boot Loader	22.4 seconds	29.4 seconds	29.3 seconds

Z68

In developing its 6-series chipsets Intel wanted to minimize risk as much as possible, so much of the underlying chipset architecture is borrowed from Lynnfield’s 5-series platform. The conservative chipset development for Sandy Bridge left a hole in the lineup. The P67 chipset lets you overclock CPU and memory but it lacks the flexible display interface (FDI) necessary to support SNB’s HD Graphics. The H67 chipset has an FDI so you can use the on-die GPU, however it doesn’t support CPU overclocking, only memory and graphics overclocking. What about those users who don’t need a discrete GPU but still want to overclock their CPUs? With the chipsets that Intel is launching today, you’re effectively forced to buy a discrete GPU if you want to overclock your CPU. This is great for AMD/NVIDIA, but not so great for consumers who don’t need a discrete GPU and not the most sensible decision on Intel’s part.

There is a third member of the 6-series family that will begin shipping in Q2: Z68. Take P67, add processor graphics support and you’ve got Z68. It’s as simple as that. Z68 is also slated to support something called SSD Caching once version 10.5 of Intel’s Rapid Storage Technology drivers ships, although Intel hasn’t said anything to us about it yet. This sounds like the holy grail of SSD/HDD setups, where you have a single drive letter and the driver manages what goes on your SSD vs. HDD. Whether SSD Caching is indeed a DIY hybrid hard drive technology remains to be seen. It’s also unclear whether or not P67/H67 will get SSD Caching once 10.5 ships.

LGA-2011 Coming in Q4

One side effect of Intel’s tick-tock cadence is a staggered update schedule across market segments. For example, Nehalem’s release in Q4 2008 took care of the high-end desktop market, however it didn’t see an update until the beginning of 2010 with Gulftown. Similarly, while Lynnfield debuted in Q3 2009 it was left out of the 32nm refresh in early 2010. Sandy Bridge is essentially that 32nm update to Lynnfield.

So where does that leave Nehalem and Gulftown owners? For the most part, the X58 platform is a dead end. While there are some niche benefits (more PCIe lanes, more memory bandwidth, 6-core support), the majority of users would be better served by Sandy Bridge on LGA-1155.

For the users who need those benefits however, there is a version of Sandy Bridge for you. It’s codenamed Sandy Bridge-E and it’ll debut in Q4 2011. The chips will be available in both 4 and 6 core versions with a large L3 cache (Intel isn’t being specific at this point).

SNB-E will get the ring bus, on-die PCIe and all of the other features of the LGA-1155 Sandy Bridge processors, but it won’t have an integrated GPU. While current SNB parts top out at 95W TDP, SNB-E will run all the way up to 130W—similar to existing LGA-1366 parts.

The new high-end platform will require a new socket and motherboard (LGA-2011). Expect CPU prices to start off at around the $294 level of the new i7-2600 and run all the way up to $999.

A Near-Perfect HTPC

Since 2006 Intel’s graphics cores have supported sending 8-channel LPCM audio over HDMI. In 2010 Intel enabled bitstreaming of up to eight channels of lossless audio typically found on Blu-ray discs via Dolby TrueHD and DTS-HD MA codecs. Intel’s HD Graphics 3000/2000 don’t add anything new in the way of audio or video codec support.

Dolby Digital, TrueHD (up to 7.1), DTS, DTS-HD MA (up to 7.1) can all be bitstreamed over HDMI. Decoded audio can also be sent over HDMI. From a video standpoint, H.264, VC-1 and MPEG-2 are all hardware accelerated. The new GPU enables HDMI 1.4 and Blu-ray 3D support. Let’s run down the list:

Dolby TrueHD Bitstreaming? Works:

DTS-HD MA bitstreaming? Yep:

Blu-ray 3D? Make that three:

How about 23.976 fps playback? Sorry guys, even raking in $11 billion a quarter doesn’t make you perfect.

Here’s the sitch: most movie content is stored at 23.976 fps but incorrectly referred to as 24p or 24 fps. That sub-30 fps frame rate is what makes movies look like, well, movies and not soap operas (this is also why interpolated 120Hz modes on TVs make movies look cheesy since they smooth out the 24 fps film effect). A smaller portion of content is actually mastered at 24.000 fps and is also referred to as 24p. In order to smoothly play back either of these formats you need a player and a display device capable of supporting the frame rate. Many high-end TVs and projectors support this just fine, however on the playback side Intel only supports the less popular of the two: 24.000Hz.

This isn’t intentional, but rather a propagation of an oversight that started back with Clarkdale. Despite having great power consumption and feature characteristics, Clarkdale had one glaring issue that home theater enthusiasts discovered: despite having a 23Hz setting in the driver, Intel’s GPU would never output anything other than 24Hz to a display.

The limitation is entirely in hardware, particularly in what’s supported by the 5-series PCH (remember that display output is routed from the processor’s GPU to the video outputs via the PCH). One side effect of trying to maintain Intel’s aggressive tick-tock release cadence is there’s a lot of design reuse. While Sandy Bridge was a significant architectural redesign, the risk was mitigated by reusing much of the 5-series PCH design. As a result, the hardware limitation that prevented a 23.976Hz refresh rate made its way into the 6-series PCH before Intel discovered the root cause.

Intel had enough time to go in and fix the problem in the 6-series chipsets, however the fix requires a non-trivial amount of work and would have put the chipset schedule at risk. Not wanting to introduce more risk into an already risky project (brand new out of order architecture, first on-die GPU, new GPU architecture, first integrated PLL), Intel chose to not address it this round, which is why we still have the problem today.

Note the frame rate

What happens when you try to play 23.976 fps content on a display that refreshes itself 24.000 times per second? You get a repeated frame approximately every 40 seconds to synchronize the source frame rate with the display frame rate. That repeated frame appears to your eyes as judder in motion, particularly evident in scenes involving a panning camera. How big of an issue this is depends on the user. Some can just ignore the judder, others will attempt to smooth it out by setting their display to 60Hz, while others will be driven absolutely insane by it.

If you fall into the last category, your only option for resolution is to buy a discrete graphics card. Currently AMD’s Radeon HD 5000 and 6000 series GPUs correctly output a 23.976Hz refresh rate if requested. These GPUs also support bitstreaming Dolby TrueHD and DTS-HD MA, while the 6000 series supports HDMI 1.4a and stereoscopic 3D. The same is true for NVIDIA’s GeForce GT 430, which happens to be a pretty decent discrete HTPC card.

Intel has committed to addressing the problem in the next major platform revision, which unfortunately seems to be Ivy Bridge in 2012. There is a short-term solution for HTPC users absolutely set on Sandy Bridge. Intel has a software workaround that enables 23.97Hz output. There’s still a frame rate mismatch at 23.97Hz, but it would be significantly reduced compared to the current 24.000Hz-only situation.
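The mismatch figures in this section are easy to check. The sketch below computes how long it takes for content and display to drift apart by one whole frame, which is when a frame has to be repeated or dropped. It uses the nominal 23.976 fps figure quoted above; the function name is only illustrative.

```python
from fractions import Fraction

def seconds_per_frame_slip(content_fps, display_hz):
    """Seconds for content and display to drift apart by one whole frame.

    Rates are passed as strings so Fraction keeps them exact.
    Returns None when the two rates match.
    """
    drift = abs(Fraction(display_hz) - Fraction(content_fps))
    return None if drift == 0 else float(1 / drift)

CONTENT = "23.976"  # nominal film rate used in the text

for display in ("24.000", "23.97", "23.976"):
    gap = seconds_per_frame_slip(CONTENT, display)
    label = "no slip" if gap is None else f"one frame every {gap:.1f} s"
    print(f"{display}Hz display: {label}")
```

With these nominal figures the 24.000Hz case works out to a slip roughly every 42 seconds, while the 23.97Hz workaround stretches that to roughly 167 seconds, about a fourfold improvement.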

MPC-HC Compatibility Problems

Just a heads up. Media Player Classic Home Cinema doesn't currently play well with Sandy Bridge. Enabling DXVA acceleration in MPC-HC will cause stuttering and image quality issues during playback. As far as I know, the issue is that MPC-HC doesn't properly detect SNB. Intel has reached out to the developer for a fix.

Intel’s Quick Sync Technology

In recent years video transcoding has become one of the most widespread consumers of CPU power. The popularity of YouTube alone has turned nearly everyone with a webcam into a producer, and every PC into a video editing station. The mobile revolution hasn’t slowed things down either. No smartphone can play full bitrate/resolution 1080p content from a Blu-ray disc, so if you want to carry your best quality movies and TV shows with you, you’ll have to transcode to a more compressed format. The same goes for the new wave of tablets.

At a high level, video transcoding involves taking a compressed video stream and further compressing it to better match the storage and decoding abilities of a target device. The reason this is transcoding and not encoding is because the source format is almost always already encoded in some sort of a compressed format. The most common, these days, being H.264/AVC.

Transcoding is a particularly CPU intensive task because of the three dimensional nature of the compression. Each individual frame within a video can be compressed; however, since sequential frames of video typically have many of the same elements, video compression algorithms look at data that’s repeated temporally as well as spatially.
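A toy illustration of temporal redundancy, assuming two tiny grayscale frames represented as nested lists: when little changes between frames, the frame-to-frame difference is mostly zeros and is far cheaper to store than the frame itself. Real codecs use motion-compensated prediction rather than a plain subtraction, so this only shows the basic idea.

```python
def residual(prev_frame, cur_frame):
    """Per-pixel difference between two equally sized frames."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)]

def nonzero_count(frame):
    """Number of values in a frame that are not zero."""
    return sum(1 for row in frame for value in row if value != 0)

# Toy 4x6 frames: flat background, one bright pixel that moves one step right.
frame_a = [[50] * 6 for _ in range(4)]
frame_b = [[50] * 6 for _ in range(4)]
frame_a[1][2] = 200
frame_b[1][3] = 200

diff = residual(frame_a, frame_b)
print(nonzero_count(frame_b), "non-zero values in the raw frame")
print(nonzero_count(diff), "non-zero values in the residual")
```

Here the raw frame has 24 non-zero values while the residual has only 2, one where the bright pixel left and one where it arrived.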

I remember sitting in a hotel room in Times Square while Godfrey Cheng and Matthew Witheiler of ATI explained to me the challenges of decoding HD-DVD and Blu-ray content. ATI was about to unveil hardware acceleration for some of the stages of the H.264 decoding pipeline. Full hardware decode acceleration wouldn’t come for another year at that point.

The advent of fixed function video decode in modern GPUs is important because it helped enable GPU accelerated transcoding. The first step of the video transcode process is to decode the source video. Since transcoding involves taking a video already in a compressed format and encoding it in a new format, hardware accelerated video decode is key. How fast a decode engine is has a tremendous impact on how fast a hardware accelerated video encode can run. This is true for two reasons.

First, unlike in a playback scenario where you only need to decode faster than the frame rate of the video, when transcoding the video decode engine can run as fast as possible. The faster frames can be decoded, the faster they can be fed to the transcode engine. The second and less obvious point is that some of the hardware you need to accelerate video encoding is already present in a video decode engine (e.g. iDCT/DCT hardware).

With video transcoding as a feature of Sandy Bridge’s GPU, Intel beefed up the video decode engine from what it had in Clarkdale. In the first generation Core series processors, video decode acceleration was split between fixed function decode hardware and the GPU’s EU array. With Sandy Bridge and the second generation Core CPUs, video decoding is done entirely in fixed function hardware. This is not ideal from a flexibility standpoint (e.g. newer video codecs can’t be fully hardware accelerated on existing hardware), but it is the most efficient method to build a video decoder from a power and performance standpoint. Both AMD and NVIDIA have fixed function video decode hardware in their GPUs now; neither relies on the shader cores to accelerate video decode.

The resulting hardware is both performance and power efficient. To test the performance of the decode engine I launched multiple instances of a 15Mbps 1080p high profile H.264 video running at 23.976 fps. I kept launching instances of the video until the system could no longer maintain full frame rate in all of the simultaneous streams. The graph below shows the maximum number of streams I could run in parallel:

AMD’s Radeon HD 6000 series GPUs can only manage a single high profile, 1080p H.264 stream, which is perfectly sufficient for video playback. NVIDIA’s GeForce GTX 460 does much better; it could handle three simultaneous streams. Sandy Bridge however takes the cake as a single Core i5-2500K can decode five streams in tandem.
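The stream counts translate into rough bounds on decode throughput. The sketch below assumes the decoder is the bottleneck and that throughput scales linearly with stream count, neither of which the test proves: a part that sustains N streams but not N+1 decodes at least N × 23.976 fps and less than (N+1) × 23.976 fps.

```python
FPS = 23.976        # frame rate of the test clip
BITRATE_MBPS = 15   # bitrate of the test clip

max_streams = {
    "Intel Core i5-2500K": 5,
    "NVIDIA GeForce GTX 460": 3,
    "AMD Radeon HD 6870": 1,
}

def decode_bounds(streams):
    """(lower, upper) bound on sustained decode rate in frames per second."""
    return streams * FPS, (streams + 1) * FPS

for name, count in max_streams.items():
    low, high = decode_bounds(count)
    print(f"{name}: {low:.1f} to {high:.1f} fps, "
          f"{count * BITRATE_MBPS} Mbps of compressed input")
```

By this estimate the Core i5-2500K is decoding at least about 120 frames of 1080p H.264 per second, or roughly five times real time for a single stream.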

The Sandy Bridge decoder is likely helped by the very large (and high bandwidth) L3 cache connected to it. This is the first advantage Intel has in what it calls its Quick Sync technology: a very fast decode engine.

The decode engine is also reused during the actual encode phase. Once frames of the source video are decoded, they are actually fed to the programmable EU array to be split apart and prepared for transcoding. The data in each frame is transformed from the spatial domain (the value of each pixel at each location) to the frequency domain (how quickly pixel values change across a block); this is done using a discrete cosine transform. You may remember that inverse discrete cosine transform hardware is necessary to decode video; well, that same hardware is useful in the domain transform needed when transcoding.
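To make the domain transform concrete, here is a minimal sketch of a textbook 1-D DCT-II applied to a row of eight pixel values. It is not what Intel's hardware does: real codecs work on 2-D blocks, and H.264 uses an integer approximation of the DCT. The point is only that smooth image data collapses into a few low-frequency coefficients, which is what makes it compressible.

```python
import math

def dct_ii(samples):
    """Unnormalized 1-D DCT-II of a list of samples."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(samples))
        for k in range(n)
    ]

flat = [100] * 8                         # no variation across the row
ramp = [100 + 2 * i for i in range(8)]   # gentle gradient

for name, row in (("flat", flat), ("ramp", ramp)):
    coeffs = dct_ii(row)
    print(name, [round(c, 1) for c in coeffs])
```

The flat row produces a single non-zero coefficient (the average level), and the gentle ramp puts nearly all of its remaining energy into the lowest frequencies.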

Motion search, the most compute intensive part of the transcode process, is done in the EU array. It's the combination of the fast decoder, the EU array, and fixed function hardware that makes up Intel's Quick Sync engine.
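To show why motion search is so expensive, here is a sketch of the generic textbook approach: exhaustive block matching using the sum of absolute differences (SAD). This is an illustration of the general technique, not Intel's algorithm, and the frame layout (nested lists of grayscale values) and function names are assumptions.

```python
def sad(frame, ref, fy, fx, ry, rx, size):
    """Sum of absolute differences between a block in frame and one in ref."""
    return sum(
        abs(frame[fy + y][fx + x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )

def motion_search(frame, ref, by, bx, size, radius):
    """Exhaustive search: return (dy, dx, cost) of the best match in ref."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= ry <= height - size and 0 <= rx <= width - size:
                cost = sad(frame, ref, by, bx, ry, rx, size)
                if best is None or cost < best[2]:
                    best = (dy, dx, cost)
    return best

# Toy frames: a 2x2 bright square moves two pixels right between ref and cur.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (2, 3):
        ref[y][x] = 255
    for x in (4, 5):
        cur[y][x] = 255

print(motion_search(cur, ref, by=3, bx=4, size=2, radius=3))
```

The search finds the block two pixels to the left in the reference frame with zero error. Each block costs up to (2 × radius + 1)² candidate positions times size² pixel comparisons, and a 1080p frame has thousands of blocks, which is why this stage dominates.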


	Intel Core i5-2500K	NVIDIA GeForce GTX 460	AMD Radeon HD 6870
Number of Parallel 1080p HP Streams	5 streams	3 streams	1 stream