TopNax


AMD Fusion: Brazos Gets Previewed: Part 1

Today is AMD’s 2010 Financial Analyst Day, and we have some more details on its Fusion APUs. Llano is still a few months away. In the meantime, we have a preview of the Brazos platform, which will voraciously tackle mobility under the $500 price point.

According to recent Q3 market share numbers, the perpetual back and forth between AMD and Intel is once again in a state of flux. The larger company is taking substantial chunks away from AMD in the high-end server market, thanks mostly to the Xeon family's transition to Nehalem-based designs in the 1P, 2P, and MP segments. And we do mean substantial, as AMD’s overall server market share fell from ~10% to 6.5% (source: IDC). The Opteron 6000-series is holding its own, but it really isn’t putting up a good enough fight to retain its foothold.

 

This reinforces our belief that the Magny-Cours-based server processors only serve as a temporary placeholder. That product line is meant to hold Intel back until Bulldozer becomes available. This isn’t necessarily a bad strategy, as Q3 was the fourth consecutive quarter in which we saw the price paid per CPU rise (Oct. 27th report from Mercury Research), which only strengthens the case for AMD’s lower price points.

 

 

 

AMD is doing better in the desktop world, which makes up the largest portion of its CPU portfolio. Comparatively, Intel has a much beefier portion of the notebook market. However, all three market segments have seen some of the slowest quarterly growth on record (1.9% for Q3), less than a third of the historical numbers. Server sales still outrank desktop, while mobile CPU numbers fall to third place. This disproportionate growth is why AMD saw some mild gains in Q3, as the company's larger desktop CPU foothold helped shore up revenue lost to a slowdown in the mobile market.

 

 

Source: IDC, Jon Peddie, Mercury, Intel

 

Why does all this matter? We’ve covered much of this information in our manager surveys, but Intel and AMD are both about to throw monkey wrenches into the graphics battle, too. The two companies are on the cusp of unveiling new processor platforms that feature integrated graphics technology. In September, we saw Sandy Bridge at IDF, which will put CPU and graphics processing on a single piece of silicon etched at 32 nm. It is definitely exciting stuff. But Intel's solution is, ironically, in the unlikely position of underdog, as very few industry veterans have much faith in the company's ability to deliver even mediocre graphics capabilities. Moreover, Intel is keeping the architecture's fixed-function media encoder close to its chest. Very few folks have seen it in action.

 

Meanwhile on October 19th, AMD showed off its upcoming Llano APU (Accelerated Processing Unit--the company’s acronym for a CPU/GPU hybrid). 

 

 

In general, the real divider for the graphics industry isn't integrated versus discrete. What separates the proverbial men from the boys is performance. It’s the reason every gamer worth his salt cringes when you stick him with an IGP-based platform. Performance is the very reason people complain about high bit rate HD video playback on low-end systems like netbooks.

 

Historically, IGPs have never approached the low-end discrete space. There's just too much of a power/heat difference between something you stick under a passive heatsink the size of a postage stamp soldered onto a motherboard and the real estate available on even a single-slot add-in card. As a result, those two markets are as divided as oil and water. Intel’s Sandy Bridge architecture and AMD’s Fusion initiative are about bringing in the tide that'll blur this line in the sand.

 

 

Remember, Intel has a bigger portion of the graphics pie, thanks to its northbridge-based integrated graphics solutions and more recent HD Graphics engine, built onto the Clarkdale and Arrandale CPUs.

That leaves AMD and Nvidia to duke it out in the discrete graphics space, while Intel watches from its cushy vantage point, not really needing a competitive offering. Even when we were dealing with front-side bus-hobbled CPUs, Intel could always outsell Nvidia, AMD, SiS, and VIA chipsets thanks to price and compatibility. System vendors could always trust an Intel CPU paired to an Intel chipset. This isn’t to say that third-party chipsets didn’t work. However, they often required extra effort on the part of the ODM or OEM. As they say, when you have a problem, it's always better to have one throat to choke.

As we start working with more proprietary interconnects like DMI and UMI, Intel and AMD can both deny Nvidia the ability to sell its own compatible chipsets. Particularly now that we no longer need a separate northbridge, the integrated graphics fight is going to be purely AMD versus Intel--that is, until the Delaware courts tell Nvidia otherwise or VIA achieves more than a 1% market share.

As far as graphics performance goes, it's fair to say that Intel has a lot more to prove with Sandy Bridge than AMD does with the upcoming designs in its Fusion program, if only because of the expertise introduced by ATI. Of course, we'll spend more time with Sandy Bridge soon, but today is AMD’s 2010 Financial Analyst Day, and so we can finally spill a few beans on what Fusion will mean in the months and years to come.

AMD Fusion: What Can It Do?

If you haven’t yet seen our earlier coverage of AMD’s 2011 Code Names, now is a good time to play some catch-up.

 

Fusion: AMD is using the word Fusion to describe an approach to processor design and software development, in its words: “…delivering powerful CPU and GPU capabilities for HD, 3D, and data-intensive workloads in a single-die processor called an APU (accelerated processing unit). APUs combine high-performance serial and parallel processing cores with other special-purpose hardware accelerators, enabling breakthroughs in visual computing, security, performance-per-watt and device form factor.”

 

In short, an APU designed according to AMD’s Fusion initiative will include a CPU and a GPU on a single piece of silicon. The improvements an APU is supposed to deliver include enhanced mainstream gaming performance and accelerated video transcoding, to name a couple of specific examples.

Fusion is the culmination of the AMD and ATI merger. AMD sees this as the next step in processor design as we near the apogee of the “Multi-Core Era.”

 

 

Remember that any system is only as fast as its slowest link. This means computational bottlenecks are always going to be a mix of bandwidth and latency. AMD considers its APU a Heterogeneous Processing Unit, since it pairs a massive SIMD GPU array with general-purpose programmable scalar and vector processor cores. This means that, while the GPU consumes the most real estate on the die, APUs stand to benefit from its parallel processing capabilities in particular, as the shared memory helps enable lower latencies than integrated graphics processors have ever enjoyed.

 

Discrete GPU solutions clearly still have their place, but APUs will still offer a big boost to value-oriented customers compared to anything seen from AMD before.

 

The real question becomes: how can Fusion speed up everyday tasks? Even if the processor cores are on par with what we see today, what can that on-die GPU do? We had a discussion with Tom Vaughan, CyberLink’s director of business development, about this very issue. His company's software is quite often at the forefront of supporting brand new hardware technologies. Looking to the future, CyberLink sees a time when the capabilities of an APU and a discrete graphics solution are additive through an API like OpenCL. For example, if you have an APU with 400 stream processors and an add-in card with 1600, there could be gains tied to using them cooperatively. Or say you're running your display from the on-die graphics, and only spinning up the discrete card when a 3D application needs it. There might be power-oriented benefits there.

 

More immediately, though, you should be able to operate an APU and see near-identical performance to a comparable CPU/discrete graphics solution. Taking the PCI Express bus out of the equation does cut down on some latencies, but you'd be hard-pressed to tell either configuration apart. The real gain stems from the integration. Putting one more (capable) subsystem into the processor eliminates a discrete card, which in turn cuts back on cost, power, and motherboard complexity. This is really about doing the same job for less money than it would have cost previously. Surely, that's one of AMD's hallmarks. 

 

In the past, the integration of graphics was a value-add that companies like Intel tried to sugarcoat. The reality was that only a subset of its customers could use the built-in graphics. Everyone else needed an add-in chip from a vendor like AMD or Nvidia. And when it comes to mobility, we all know what building with a discrete GPU does to cost. Now, integrated graphics is relevant to a far larger customer base. There is still a point where an add-in chip becomes necessary to support the habits of a more hardcore user. But that threshold shifts to the right, as the diagram below illustrates. Atom cannot achieve that today.

 

 

Even though we are only looking at Brazos today (it’s a mainstream value platform), there are very clear performance benefits associated with optimizing for an architecture designed under Fusion's charter. When we get to look at Sabine and Lynx, things will look a lot more exciting, and the benefits of a beefier APU will become clearer. We’ll talk more about Sabine and Lynx soon enough, but today is, again, about Brazos.

AMD Fusion: Brazos Platform

During August’s Hot Chips discussion, AMD revealed information about two APUs: Llano and Ontario. We already know that the first generation of Bulldozer-powered processors won’t be APUs. Roughly two weeks later, AMD unveiled Zacate, which fills the gap between Llano and Ontario. Zacate is basically identical to Ontario (manufactured at 40 nm, armed with DX11-class graphics, fixed-function UVD for video playback, and dual low-power Bobcat cores). Bobcat is the second of AMD's two new x86 architectures and is aimed at the low-power, ultrathin notebook and netbook spaces.

The Brazos platform is built around one of two APUs: Ontario or Zacate. Brazos is intended for the low end of the mobile spectrum, where AMD has predominantly lagged behind Intel ever since Atom was introduced. AMD is specifically targeting users who instant message, word process, Web-browse, email, watch video, and maybe engage in some casual mainstream gaming.

Zacate, in particular, is aimed at Intel’s more entry-level CULV Pentiums, cheaper Celeron offerings, and premium Atom-based configurations. Just think about systems priced from $399 to $500. Meanwhile, Ontario will go head-to-head with more budget-focused Atom-based systems. And even though Ontario's 9 W maximum is 1 W higher than that of Intel's single-core Atom, which is rated for up to 8 W, it is supposed to include a more potent graphics bite than Intel’s decidedly mediocre GMA 3150. Discrete GPUs are still an option, but they hook directly into the APU by way of a four-lane PCIe link.

Technically, the Ontario/Zacate APU is only one half of the Brazos platform. The other half is the SB750 southbridge (codenamed "Hudson," a derivative of the SB800). The SB750 connects via AMD’s proprietary UMI (Unified Media Interface) interconnect. Details on that interface are still forthcoming. We should point out that, contrary to earlier reports, there is no native USB 3.0 support.

The Brazos architectural overview provides a few more details. Notice that AMD is making a point to differentiate between a first-gen and second-gen implementation of the Brazos platform, the latter of which looks to include more PCI Express connectivity.

For netbooks, AMD's configuration combines an SB750 southbridge with an Ontario. Since the wireless device is directly hooked up to the APU's single PCIe link, there are improvements in power management, as the southbridge can go into an idle state without sacrificing connectivity.

For notebooks, the plan is to pair an SB750 southbridge with an Ontario or Zacate, with the additional use of a four-lane PCIe connection to a discrete GPU. However, this sacrifices some power management savings by hooking the wireless device to the southbridge. AMD's logic is that in a netbook, users would be less likely to need a discrete graphics solution. By moving wireless connectivity up to the APU, the southbridge only needs to deal with I/O devices like the keyboard, touchpad, USB devices, and flash media. Given that mobile users are less likely to use USB devices and flash media while on the road, the SB750 only has to transfer small bits of data from the keyboard and touchpad, which translates into higher power savings.

Everyone can benefit from power savings, and in a world where we leave our wireless connection active, the ideal situation would be to always have the wireless device hooked directly into the APU. However, this is not possible if you are using a discrete graphics solution in an x4 configuration. Remember, there are five PCIe controllers off the APU, and one is reserved for the UMI link. The other four are intended for peripherals. For discrete graphics, you can use a single x1, x2, or x4 link. So in practice, it is possible to connect a discrete graphics chip with x2 and still simultaneously have two x1 connections available. Meanwhile, all "Hudson-M1" southbridges come with a UMI connection to the APU (Ontario/Zacate) that is based on a single x4 PCIe connection, probably with some aspect of proprietary signal handling.
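To make the lane arithmetic concrete, here is a minimal Python sketch of the budget as we understand it from the description above. The function and constant names are our own, and the sketch assumes that whatever lanes the discrete GPU does not use can each be exposed as an x1 link; AMD has only confirmed the x2-plus-two-x1 case.

```python
# Hypothetical model of the peripheral lane budget described in the text.
# Names are illustrative only; they are not AMD terminology.
PERIPHERAL_LANES = 4           # lanes left over once UMI has its controller
GPU_LINK_WIDTHS = (1, 2, 4)    # discrete graphics link widths listed in the text


def spare_x1_links(gpu_width: int) -> int:
    """Return how many x1 links remain after the GPU takes its lanes.

    ASSUMPTION: every lane the GPU does not use can become an x1 link.
    A width of 0 means no discrete GPU is fitted.
    """
    if gpu_width not in (0, *GPU_LINK_WIDTHS):
        raise ValueError(f"unsupported GPU link width: x{gpu_width}")
    return PERIPHERAL_LANES - gpu_width


def wireless_can_attach_to_apu(gpu_width: int) -> bool:
    """The wireless device needs one spare x1 link on the APU side."""
    return spare_x1_links(gpu_width) >= 1


for width in (0, *GPU_LINK_WIDTHS):
    label = "no discrete GPU" if width == 0 else f"GPU at x{width}"
    print(f"{label}: {spare_x1_links(width)} spare x1 link(s), "
          f"wireless on APU: {wireless_can_attach_to_apu(width)}")
```

Under these assumptions, only the x4 graphics configuration pushes the wireless device down to the southbridge, which matches the trade-off described for notebooks.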

It isn't quite clear in the diagrams, but we should point out that PCIe 1.0 and PCIe 2.0 are simply implementation blueprints. An Ontario APU armed with first-gen PCIe doesn't differ at the hardware level from an Ontario APU running second-gen PCIe signaling. According to AMD, the chips are from the same yield process, so this PCIe 1.0 versus 2.0 configuration is implemented at the BIOS level. This means that all PCIe connections on the Brazos platform are PCIe 1.0- and 2.0-capable. It is up to the system integrator to choose the implementation.
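For a rough sense of what that BIOS-level choice means, the sketch below computes peak one-direction link bandwidth. AMD did not quote bandwidth figures to us; the per-lane numbers are the commonly cited PCIe values (250 MB/s for first-gen and 500 MB/s for second-gen signaling, after 8b/10b encoding), and the function name is our own.

```python
# Per-lane, per-direction throughput in MB/s after 8b/10b encoding overhead.
# ASSUMPTION: these are the generic PCIe figures, not Brazos measurements.
MB_PER_S_PER_LANE = {"1.0": 250, "2.0": 500}


def link_bandwidth_mb_s(generation: str, lanes: int) -> int:
    """Peak one-direction bandwidth of a PCIe link in MB/s."""
    if generation not in MB_PER_S_PER_LANE:
        raise ValueError(f"unknown PCIe generation: {generation}")
    if lanes not in (1, 2, 4):
        raise ValueError("only x1, x2, and x4 links are described for Brazos")
    return MB_PER_S_PER_LANE[generation] * lanes


for gen in ("1.0", "2.0"):
    for lanes in (1, 2, 4):
        print(f"PCIe {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes)} MB/s")
```

In other words, an integrator who picks second-gen signaling doubles the theoretical headroom of every link, including an x4 connection to a discrete GPU, without changing the silicon.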

Ontario/Zacate (Bobcat-Based APUs) And Beta Drivers

It is interesting to point out that a single-core Zacate is still manufactured as a dual-core, except with one core disabled. Given the unlocking workarounds that the motherboard folks have exposed with existing dual- and triple-core Phenom IIs, we have been told it is possible to unlock that second core, provided you are married to the southbridge in the OEM's attempt to gain an additional core. Obviously there are risks, particularly as we don’t know what kind of tolerance threshold AMD is using to disable a core. Moreover, as Brazos APUs are only provided in a BGA package, the risk is bricking the entire system.

“Zacate” (18 W max)

· AMD E-350 with AMD Radeon HD 6310 Graphics (dual-core CPU @ 1.6 GHz & DX11 SIMD @ 500 MHz)

· AMD E-240 with AMD Radeon HD 6310 Graphics (single-core CPU @ 1.5 GHz & DX11 SIMD @ 500 MHz)


“Ontario” (9 W max)

· AMD C-50 with AMD Radeon HD 6250 Graphics (dual-core CPU @ 1.0 GHz & DX11 SIMD @ 280 MHz)

· AMD C-30 with AMD Radeon HD 6250 Graphics (single-core CPU @ 1.2 GHz & DX11 SIMD @ 280 MHz)


The final details of the Bobcat core have been released. Most of the technical specs are as we expected, including out-of-order execution, something we see on most current x86 processors. Generally speaking, out-of-order processors increase die size because of the additional real estate required for reordering instructions. As a result, power consumption goes up. Intel addressed this by creating Atom to be an in-order microprocessor. You can think of the difference between the two execution approaches as the difference between a to-do list and a step-by-step flowchart. For example, when you make a sandwich, in-order execution means the lettuce always goes on before the tomato. Out-of-order means you just grab whatever ingredient is fastest to slap onto your slab of bread regardless of order; it ends up being faster, but it generally isn't an energy-efficient process.
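If the sandwich analogy doesn't do it for you, a toy simulation might. The Python sketch below models a single-issue core that either must wait on the oldest pending instruction (in-order) or may pick the oldest instruction whose inputs are ready (out-of-order). It is a deliberately simplified illustration with made-up instruction names and latencies; it does not represent the actual Bobcat or Atom pipelines.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    latency: int        # cycles until the result is available
    deps: tuple = ()    # names of instructions whose results are needed


def run(program, out_of_order: bool) -> int:
    """Return the cycle in which the last result becomes available."""
    done_at = {}                 # name -> cycle when its result is ready
    pending = list(program)
    cycle = 0
    while pending:
        # An in-order core may only consider the oldest pending instruction.
        window = pending if out_of_order else pending[:1]
        for instr in window:
            if all(d in done_at and done_at[d] <= cycle for d in instr.deps):
                done_at[instr.name] = cycle + instr.latency
                pending.remove(instr)
                break            # single-issue: at most one per cycle
        cycle += 1
    return max(done_at.values())


# Hypothetical program: two independent load/use pairs.
PROGRAM = [
    Instr("load_a", 3),
    Instr("use_a", 1, ("load_a",)),
    Instr("load_b", 3),
    Instr("use_b", 1, ("load_b",)),
]

print("in-order cycles:    ", run(PROGRAM, out_of_order=False))
print("out-of-order cycles:", run(PROGRAM, out_of_order=True))
```

The out-of-order run finishes sooner because the second load is issued while the first is still in flight. The bookkeeping needed to find that ready instruction is exactly the extra hardware that costs die area and power.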

Unlike Atom, though, AMD’s goal wasn’t to develop a “fast-enough processor” in order to achieve low power. The company’s goal was to achieve a fast processor with low power. In doing so, it favored OoO and introduced aggressive core-gating. AMD tells us it can gate off CPU cores and even portions of the APU, such as the UVD block, when not in use. This is critical to AMD’s plans to produce a processor that can deliver excess performance at a fraction of previous processor power consumption figures.

We are told that Ontario and Zacate come from the same manufacturing process, but AMD is separating them based on clock speeds. So, at the architectural level, they look identical.  There are two Bobcat cores with 1 MB L2 and a 64-bit FPU, along with a massive SIMD array, a dedicated DDR3 memory controller, unified video decoder, and a dedicated bus tying everything together. We say massive because the diagram doesn’t proportionally lay out the real estate of the individual components. Once you look at the die shot, you can see how much space the two SIMD engines take up. Boosting the computational speed for this first generation of APUs are 80 stream processors (40 per SIMD engine).
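As a back-of-the-envelope check on what those 80 stream processors could deliver, the sketch below estimates peak single-precision throughput at the SIMD clocks listed for each model. AMD did not give us a per-clock throughput figure, so the two operations per stream processor per clock (one multiply-add) is our assumption, and the function name is our own.

```python
STREAM_PROCESSORS = 80        # from the text: two SIMD engines x 40 each
FLOPS_PER_SP_PER_CLOCK = 2    # ASSUMPTION: one multiply-add per clock

# SIMD clocks in MHz, taken from the model list above.
SIMD_CLOCK_MHZ = {
    "Radeon HD 6310 (Zacate)": 500,
    "Radeon HD 6250 (Ontario)": 280,
}


def peak_gflops(clock_mhz: float) -> float:
    """Theoretical peak throughput in GFLOPS under the stated assumption."""
    return STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000


for gpu, clock in SIMD_CLOCK_MHZ.items():
    print(f"{gpu}: {peak_gflops(clock):.1f} GFLOPS peak")
```

These are theoretical ceilings only. Real workloads will land well below them, particularly with the GPU sharing system memory bandwidth with the CPU cores.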

Furthermore, AMD claims to achieve 90% of K8 performance at one-third of the size. The GPU basically performs somewhere in between a Radeon HD 5400 and 5500 with the added benefit of a 6000-series card (3rd gen) UVD. According to AMD, it's directly hooking the on-die PCIe controller into the ultra-wide platform bus. There is no HyperTransport link used here.
