TopNax |
Quick Sync: The Best Way to TranscodeCurrently Intel’s Quick Sync transcode is only supported by two applications: Cyberlink’s Media Espresso 6 and Arcsoft’s Media Converter 7. Both of these applications are video to go applications targeted at users who want to take high resolution/high bitrate content and transcode it to more compact formats for use on smartphones, tablets, media streamers and gaming consoles. The intended market is not users who are attempting to make high quality archives of Blu-ray content. As a result, there’s no support for multi-channel audio; both applications are limited to 2-channel MP3 or AAC output. There’s also no support for transcoding to anything higher than the main profile of H.264. Intel indicates that these are not hardware limitations of Quick Sync, but rather limitations of the transcoding software. To that extent, Intel is working with developers of video editing applications to bring Quick Sync support to applications that have a more quality-oriented usage model. These applications are using Intel’s Media SDK 2.0 which is publicly available. Intel says that any developer can get access to and use it. For the purposes of this comparison I’ve used Media Converter 7, but that’s purely a personal preference thing. The performance and image quality should be roughly identical between the two applications as they both use the same APIs. Jarred's look at Mobile Sandy Bridge will focus on MediaEspresso. Where image quality isn’t consistent however is between transcoding methods in either application. Both applications support four codepaths: ATI Stream, Intel Quick Sync, NVIDIA CUDA, and x86. While you can set any of these codepaths to the same transcoding settings, the method by which they arrive at the transcoded image will differ. This makes sense given how different all four target architectures are (e.g. a Radeon HD 6870 doesn’t look anything like a NVIDIA GeForce GTX 460). Each codepath makes a different set of performance vs. quality tradeoffs which we’ll explore in this section. The first but not as obvious difference is if you use the Sandy Bridge CPU cores vs. Quick Sync to transcode you will actually get a different image. The image quality is slightly better on the x86 path, but the two are similar. The reason for the image quality difference is easy to understand. CPUs are inherently not very parallel beasts. We get tremendous speedup on highly parallel tasks on multi-core CPUs, but compared to a GPU’s ability to juggle hundreds or thousands of threads, even a 6-core CPU doesn’t look too wide. As a result of this serial vs. parallel difference, transcoding algorithms optimized for CPUs are very computationally efficient. They have to be, because you can’t rely on hundreds of cores running in parallel when you’re running on a CPU. Take the same code and run it on a GPU and you’ll find that the majority of your execution resources are wasted. A new codepath is needed that can take advantage of the greater amount of compute at your disposal. For example, a GPU can evaluate many different compression modes in parallel whereas on a CPU you generally have to pick a balance between performance and quality up front regardless of the content you’re dealing with. There’s also one more basic difference between code running on the CPU vs. integrated GPU. At least in Intel’s case, certain math operations can be performed with higher precision on Sandy Bridge’s SSE units vs. the GPU’s EUs. |
Intel tuned the PSNR of the Quick Sync codepath to be as similar to the x86 codepath as possible. The result is, as I mentioned above, quite similar: Now let’s tackle the other GPUs. When I first started my Quick Sync investigations I did a little experiment. Without forming any judgments of my own, I quickly transcoded a ~15Mbps 1080p movie into a iPhone 4 compatible 720p H.264 at 4Mbps. I then trimmed it down to a single continuous 4 minute scene and passed the movie along to six AnandTech editors. I sent the editors three copies of the 4 minute scene. One transcoded on a GeForce GTX 460, one using Intel’s Quick Sync, and one using the standard x86 codepath. I named the three movies numerically and told no one which platform was responsible for each output. All I asked for was feedback on which ones they thought were best. Here are some of the comments I received: “Wow... there are some serious differences in quality. I'm concerned that the 1.mp4 is the accelerated transcode, in which case it looks like poop..” “Video 1: Lots of distracting small compression blocks, as if the grid was determined pre-encoding (I know that generally there are blocks, but here the edges seem to persist constantly). Persistent artifacts after black. Quality not too amazing, I wouldn't be happy with this.” Video one, which many assumed was Quick Sync, actually came from the GeForce GTX 460. The CUDA codepath, although extremely fast, actually produces a much worse image. Videos 2 and 3 were outputs from Sandy Bridge, and the editors generally didn’t agree on which one of those two looked better just that they were definitely better than the first video. To confirm whether or not this was a fluke I set up three different transcodes. Lossy video compression is hard to get right when you’re transcoding scenes that are changing quickly, so I focused on scenes with significant movement. The first transcode involves taking the original Casino Royale Blu-ray, stripping it of its DRM using AnyDVD HD, and feeding that into MC7 as a source. The output in this case was a custom profile: 15Mbps 1080p main profile H.264. This is an unrealistic usage model simply because the output file only had 2-channel audio, making it suitable only for PC use and likely a waste of bitrate. I simply wanted to see how the various codepaths looked and performed with an original BD source. |
Let’s look at performance first. The entire movie has around 200,000 frames, the transcoding frame rate is below: As we’ve been noting in our GPU reviews for quite some time now, there’s no advantage to transcoding on a GPU faster than the $200 mainstream parts. Remember that the transcode process isn’t all infinitely parallel, we are ultimately bound by the performance of the sequential components of the algorithm. As a result, the Radeon HD 6970 offers no advantage over the 6870 here. Both of these AMD GPUs end up being just as fast as a Core i5-2500K. NVIDIA’s GPUs offer a 15.7% performance advantage, but as I mentioned earlier, the advantage comes at the price of decreased quality (which we’ll get to in a moment). Inte’s Quick Sync is untouchable though. It’s 48% faster than NVIDIA’s GeForce GTX 460 and 71% faster than the Radeon HD 6970. I don’t want to proclaim that discrete GPU based transcoding is dead, but based on these results it sure looks like it. What about image quality? My image quality test scene isn’t anything absurd. Bond and Vespyr are about to meet Mathis for the first time. Mathis walks towards the two and the camera pans to follow him. With only one character and the camera both moving at a predictable rate, using some simple motion estimation most high quality transcoders should be able to handle this scene without getting tripped up too much.
|
Comparing the shots above the only real outlier is NVIDIA’s GeForce GTX 460. The CUDA path clearly errs on the side of performance vs. quality and produces a far noisier image. The ATI Stream codepath produces an image that’s very close to the standard x86 and Quick Sync output. In fact, everything but the GTX 460 does well here. The next test uses an already transcoded 15Mbps 1080p x264 rip of Quantum of Solace Blu-ray disc. For many this is likely what you’ll have stored on your movie server rather than a full 50GB Blu-ray rip. Our destination this time is the iPhone 4. The settings are as follows: 4Mbps 720p H.264 |
Comparing the shots above the only real outlier is NVIDIA’s GeForce GTX 460. The CUDA path clearly errs on the side of performance vs. quality and produces a far noisier image. The ATI Stream codepath produces an image that’s very close to the standard x86 and Quick Sync output. In fact, everything but the GTX 460 does well here. The next test uses an already transcoded 15Mbps 1080p x264 rip of Quantum of Solace Blu-ray disc. For many this is likely what you’ll have stored on your movie server rather than a full 50GB Blu-ray rip. Our destination this time is the iPhone 4. The settings are as follows: 4Mbps 720p H.264. At only 4Mbps there’s a lot of compression going on, image quality isn’t going to be nearly as good as the previous test. Performance is considerably higher as the encoders are able to discard more data and optimize for performance over absolute quality. The entire movie has 152,000 frames that are transcoded in this test: The six-core Phenom II X6 1100T is faster than the Core i5-2500K thanks to the latter’s lack of Hyper Threading. Both are around the speed of the Radeon HD 6870. The GeForce GTX 460 is faster than any standalone x86 CPU, regardless of core count. However once again, Quick Sync blows them all out of the water. At 200 frames per second Quick Sync is more than twice the speed of a standard Core i5-2500K or even the Phenom II X6 1100T. And it’s nearly twice as fast as the GTX 460. |
The image quality comparison scene is also far more stressful on the transcoders. There’s a lot of unpredictable movement going on as Bond is in pursuit of a double agent at the beginning of the film.
|
The image quality story is about the same for AMD’s GPUs and the x86 path, however Quick Sync delivers a noticeably worse quality image. It’s no where near as bad as the GTX 460, but it’s just not as sharp as what you get from the software or ATI Stream codepaths. The problem here seems to be that when transcoding from a lower quality source, the tradeoffs NVIDIA makes are amplified. Even Quick Sync isn’t perfect here. I’d say Quick Sync is closer to the pure x86 path than CUDA. Given the tremendous performance advantage I’d say the tradeoff is probably worth it in this case. For our final test we’ve got a 12Mbps 1080p x264 rip of The Dark Knight. Our target this time is a 640x480, 1.5Mbps iPod Touch compatible format |
Surprisingly enough the 6970 shows a slight performance advantage compared to the 6870 in this test, but still not enough to approach the speed of the x86 CPUs in this test. Quick Sync is almost 4x faster than the Radeon HD 6970 and twice as fast as everything else. Our Dark Knight image quality test is also the most strenuous of the review. We’re looking at a very dark, high motion scene with a sudden explosion. The frame we’re looking at is right after the Joker fires a rocket at the rear of a police car. The sudden explosion casts light everywhere which can’t be predicted based on the previous frame. |
The GeForce GTX 460 looks horrible here. The output looks like an old film, it’s simply inexcusable. The Radeon HD 6870 produces a frame that has similar sharpness to the x86 codepath, but with muted colors. Quick Sync maintains color fidelity but loses the sharpness of the x86 path, similar to what we saw in the previous test. In this case the loss of sharpness does help smooth out some aliasing in the paint on the police car but otherwise is undesirable. Overall, based on what I’ve seen in my testing of Quick Sync, it isn’t perfect but it does deliver a good balance of image quality and performance. With Quick Sync enabled you can transcode a ~2.5 hour Blu-ray disc in around 35 minutes. If you’ve got a lower quality source (e.g. a 15GB Blu-ray re-encode), you can plan on doing a full movie in around 13 minutes. Quick Sync will chew through TV shows in a couple of minutes, without a tremendous loss in quality. With CUDA on NVIDIA GPUs we had to choose between high quality or high performance. (Perhaps other applications will do the transcode better as well, but at least Arcsoft's Media Converter 7 has serious image quality problems with CUDA.) With Quick Sync you can have both, and better performance than we’ve ever seen from any transcoding solution in desktops or notebooks. |
Intel’s Gen 6 GraphicsAll 2nd generation Core series processors that fit into an LGA-1155 motherboard will have one of two GPUs integrated on-die: Intel’s HD Graphics 3000 or HD Graphics 2000. Intel’s upcoming Sandy Bridge E for LGA-2011 will not have an on-die GPU. All mobile 2nd generation Core series processors feature HD Graphics 3000. The 3000 vs. 2000 comparison is pretty simple. The former has 12 cores or EUs as Intel likes to call them, while the latter only has 6. Clock speeds are the same although the higher end parts can turbo up to higher frequencies. Each EU is 128-bits wide, which makes a single EU sound a lot like a single Cayman SP. Unlike Clarkdale, all versions of HD Graphics on Sandy Bridge support Turbo. Any TDP that is freed up by the CPU running at a lower frequency or having some of its cores shut off can be used by the GPU to turbo up. The default clock speed for both HD 2000 and 3000 on the desktop is 850MHz; however, the GPU can turbo up to 1100MHz in everything but the Core i7-2600/2600K. The top-end Sandy Bridge can run its GPU at up to 1350MHz. |
Mobile is a bit different. The base GPU clock in all mobile SNB chips is 650MHz but the max turbo is higher at 1300MHz. The LV/ULV parts also have different max clocks, which we cover in the mobile article. As I mentioned before, all mobile 2nd gen Core processors get the 12 EU version—Intel HD Graphics 3000. The desktop side is a bit more confusing. In desktop, the unlocked K-series SKUs get the 3000 GPU while everything else gets the 2000 GPU. That’s right: the SKUs most likely to be paired with discrete graphics are given the most powerful integrated graphics. Of course those users don’t pay any penalty for the beefier on-die GPU; when not in use the GPU is fully power gated. Despite the odd perk for the K-series SKUs, Intel’s reasoning behind the GPU split does makes sense. The HD Graphics 2000 GPU is faster than any desktop integrated GPU on the market today, and it’s easy to add discrete graphics to a desktop system if the integrated GPU is insufficient. The 3000 is simply another feature to justify the small price adder for K-series buyers. On the mobile side going entirely with 3000 is simply because of the quality of integrated or low-end graphics in mobile. You can’t easily add in a discrete card so Intel has to put its best foot forward to appease OEMs like Apple. I suspect the top-to-bottom use of HD Graphics 3000 in mobile is directly responsible for Apple using Sandy Bridge without a discrete GPU in its entry level notebooks in early 2011. I’ve been careful to mention the use of HD Graphics 2000/3000 in 2nd generation Core series CPUs, as Intel will eventually bring Sandy Bridge down to the Pentium brand with the G800 and G600 series processors. These chips will feature a version of HD Graphics 2000 that Intel will simply call HD Graphics. Performance will be similar to the HD Graphics 2000 GPU, however it won’t feature Quick Sync. Image Quality and ExperiencePerhaps the best way to start this section is with a list. Between Jarred and I, these are the games we’ve tested with Intel’s on-die HD 3000 GPU: Assassin’s Creed This is over two dozen titles, both old and new, that for the most part worked on Intel’s integrated graphics. Now for a GPU maker, this is nothing to be proud of, but given Intel’s track record with game compatibility this is a huge step forward. We did of course run into some issues. Fallout 3 (but not New Vegas) requires a DLL hack to even run on Intel integrated graphics, and we saw some shadow rendering issues in Mafia II, but for the most part the titles—both old and new—worked.
Now the bad news. Despite huge performance gains and much improved compatibility, even the Intel HD Graphics 3000 requires that you run at fairly low detail settings to get playable frame rates in most of these games. There are a couple of exceptions but for the most part the rule of integrated graphics hasn’t changed: turn everything down before you start playing.
This reality has been true for more than just Intel integrated graphics however. Even IGPs from AMD and NVIDIA had the same limitations, as well as the lowest end discrete cards on the market. The only advantage those solutions had over Intel in the past was performance. Realistically we need at least another doubling of graphics performance before we can even begin to talk about playing games smoothly at higher quality settings. Interestingly enough, I’ve heard the performance of Intel’s HD Graphics 3000 is roughly equal to the GPU in the Xbox 360 at this point. It only took six years for Intel to get there. If Intel wants to contribute positively to PC gaming, we need to see continued doubling of processor graphics performance for at least the next couple generations. Unfortunately I’m worried that Ivy Bridge won’t bring another doubling as it only adds 4 EUs to the array. |
Intel HD Graphics 2000/3000 PerformanceI dusted off two low-end graphics cards for this comparison: a Radeon HD 5450 and a Radeon HD 5570. The 5450 is a DX11 part with 80 SPs and a 64-bit memory bus. The SPs run at 650MHz and the DDR3 memory interface has a 1600MHz data rate. That’s more compute power than the Intel HD Graphics 3000 but less memory bandwidth than a Sandy Bridge if you assume the CPU cores aren’t consuming more than half of the available memory bandwidth. The 5450 will set you back $45 at Newegg and is passively cooled. The Radeon HD 5570 is a more formidable opponent. Priced at a whopping $70, this GPU comes with 400 SPs and a 128-bit memory bus. The core clock remains at 650MHz and the DDR3 memory interface has an 1800MHz data rate. This is more memory bandwidth and much more compute than the HD Graphics 3000 can offer. Based on what we saw in our preview I’d expect performance similar to the Radeon HD 5450 and significantly lower than the Radeon HD 5570. Both of these cards were paired with a Core i5-2500K to remove any potential CPU bottlenecks. On the integrated side we have a few representatives. AMD’s 890GX is still the cream of the crop for AMD integrated for at least a few more months. I paired it with a 6-core 1100T to keep the CPU from impacting things. Representing Clarkdale I have a Core i5-661 and 660. Both chips run at 3.33GHz but the 661 has a 900MHz GPU while the 660 runs at 733MHz. These are the fastest representatives with last year’s Intel HD Graphics, but given the margin of improvement I didn’t feel the need to show anything slower. And finally from Sandy Bridge we have three chips: the Core i5-2600K and 2500K both with Intel HD Graphics 3000 (but different turbo modes) and the Core i3-2100 with HD Graphics 2000. Nearly all of our test titles were run at the lowest quality settings available in game at 1024x768. We ran with the latest drivers available as of 12/30/2010. Note that all of the screenshots used below were taken on Intel's HD Graphics 3000. For a comparison of IQ between it and the Radeon HD 5450 I've zipped up originals of all of the images here. Dragon Age: OriginsDAO has been a staple of our integrated graphics benchmark for some time now. The third/first person RPG is well threaded and is influenced both by CPU and GPU performance. We ran at 1024x768 with graphics and texture quality both set to low. Our benchmark is a FRAPS runthrough of our character through a castle. The new Intel HD Graphics 2000 is roughly the same performance level as the highest clock speed HD Graphics offered with Clarkdale. The Core i3-2100 and Core i5-661 deliver about the same level of performance here. Both are faster than AMD’s 890GX and all three of them are definitely playable in this test.HD Graphics 3000 is a huge step forward. At 71.5 fps it’s 70% faster than Clarkdale’s integrated graphics, and fast enough that you can actually crank up some quality settings if you’d like. The higher end HD Graphics 3000 is also 26% faster than a Radeon HD 5450. What Sandy Bridge integrated graphics can’t touch however is the Radeon HD 5570. At 112.5 fps, the 5570’s compute power gives it a 57% advantage over Intel’s HD Graphics 3000. |
Quick Sync with a Discrete GPUThere’s just one hangup to all of this Quick Sync greatness: it only works if the processor’s GPU is enabled. In other words, on a desktop with a single monitor connected to a discrete GPU, you can’t use Quick Sync. This isn’t a problem for mobile since Sandy Bridge notebooks should support switchable graphics, meaning you can use Quick Sync without waking up the discrete GPU. However there’s no standardized switchable graphics for desktops yet. Intel indicated that we may see some switchable solutions in the coming months on the desktop, but until then you either have to use the integrated GPU alone or run a multimonitor setup with one monitor connected to Intel’s GPU in order to use Quick Sync. |
Processor |
Intel HD Graphics |
EUs |
Quick Sync |
Graphics Clock |
Graphics Max Turbo |
Intel Core i7-2600K |
3000 |
12 |
Y |
850MHz |
1350MHz |
Intel Core i7-2600 |
2000 |
6 |
Y |
850MHz |
1350MHz |
Intel Core i5-2500K |
3000 |
12 |
Y |
850MHz |
1100MHz |
Intel Core i5-2500 |
2000 |
6 |
Y |
850MHz |
1100MHz |
Intel Core i5-2400 |
2000 |
6 |
Y |
850MHz |
1100MHz |
Intel Core i5-2300 |
2000 |
6 |
Y |
850MHz |
1100MHz |
Intel Core i3-2120 |
2000 |
6 |
Y |
850MHz |
1100MHz |
Intel Core i3-2100 |
2000 |
6 |
Y |
850MHz |
1100MHz |
Intel Pentium G850 |
Intel HD Graphics |
6 |
N |
850MHz |
1100MHz |
Intel Pentium G840 |
Intel HD Graphics |
6 |
N |
850MHz |
1100MHz |
Intel Pentium G620 |
Intel HD Graphics |
6 |
N |
850MHz |
1100MHz |