AMD Created Navi For Sony's PlayStation 5, Vega Suffered

AmazingAvery · Age of Conan Advocate · Member Uncommon · Posts: 7,188
Forbes article - https://www.forbes.com/sites/jasonevangelho/2018/06/12/sources-amd-created-navi-for-sonys-playstation-5-vega-suffered/#a16310d24fda

The vast majority of AMD and Sony's Navi collaboration took place while Raja Koduri -- Radeon Technologies Group boss and chief architect -- was at AMD.


On a related note, a new rumor emerged recently about Navi for desktop being merely a midrange part and not competing with Nvidia's high-end GeForce cards. That makes perfect sense if it was developed primarily for a console first.

hmm, 

Navi 10, the small Navi, manufactured on 7nm, is set to arrive at some point in 2019. There will be a big gaming GPU called Navi 20, but this one won’t come until 2020.

When we say 2020, you should not be thinking of the first few months of the year; as is traditional with AMD, you can expect it later rather than sooner. It could be as late as the tail end of the year, with the possibility it might slip to the following year. Navi 20 is built on a 7nm process and will share its DNA with Navi 10.

My take: if all that lines up, Nvidia launches their next-gen cards (20xx / 11xx series) within the next 4 months, with a "Ti" version maybe 8-12 months later. AMD has a Navi GPU solution for the PS5 that IMO arrives for the Fall 2019 holidays, with the potential for a consumer mid-range Navi around the same time. That won't be able to compete (I feel) at the gaming high end with either of Nvidia's launches, and it also gives Nvidia time to revise their 2018 offerings. The larger gaming Navi comes in 2020, speculating towards the end of the year; if it's sooner, great. However, that could land just in time for Nvidia's next cycle. So the speculation is another 2 years before a high-end AMD gaming card.



Comments

  • Ridelynn · Member Epic · Posts: 7,383
    edited June 2018
    We heard something awfully similar before the Polaris release.

    Navi will be the first post-GCN GPU if I’m not mistaken. It’s being made for a console, so mid/low range parts make sense. Just like Polaris on this generation.

    But I would not automatically assume that this translates into a big Navi 20 part. They said the same thing would happen with Polaris years ago, and the big chip never came... we ended up with just a small refresh of small Polaris before hitting Vega.

    GCN has been around for a long time, so it’s a big gamble moving away. Those dice could roll any which direction.

    All we can really infer is that it’s going into a next gen console in some manner, so we can make some extremely rough guesses as to power envelope and performance.

    A PS5 has to beat out the PS4 Pro by a good margin, and has to at least match the XB1X, most likely outperforming it by a decent margin as well. But being a console, folks will be particular about how much heat and noise it generates, and it has to look good sitting under the TV.

    My guess - Navi will be about 25% faster than Polaris once the drivers mature out. But it will take a bit for the driver to settle down.  Similar power profiles to existing Polaris SKUs (RX 680/670/660). 680 will likely be billed as “entry level 4K” on a larger VRAM SKU (probably 8G again). GDDR5 is my guess based on price/performance, but global supply issues may force a move there - I don’t expect HBM2 in a discrete card in this price tier (although I would not rule it out on PS5).

    I do not really expect another high-end AMD card, ever. Vega is mostly an offshoot of their professional card lineup, and I think that is what we will see again in about two more years' time: a new Pro lineup, and a halfhearted gaming lineup built from it so they have a higher-end product. nVidia will continue to carry the performance crown for the foreseeable future. AMD's near-term future lies in selling high-volume low/mid-tier cards and the semi-custom business, not in the top performance tier, where it takes significant R&D to compete any longer. Only “Pro” cards carry enough margin there to make sense, and for AMD I only see that happening if they line up another semi-custom deal to help fund it (believe it or not, Vega today owes a lot to the iMac Pro 5K).
  • Vrika · Member Legendary · Posts: 7,888
    I think Vega suffered because Vega was bad for graphics cards. AMD realized they weren't going to end up with a good product anyway, and transferred resources to their next-generation GPUs instead.
  • Quizzical · Member Legendary · Posts: 25,351
    Focusing on who has the fastest top end chip misses the point.  The relevant metrics of architecture goodness are performance per watt and performance per mm^2.

    If both vendors are willing to burn the same amount of power, then whoever has the best energy efficiency will have the fastest top end chip.  If the vendor with the less efficient architecture ends up with the fastest top end chip by being willing to burn more power, then the other vendor will probably take the crown simply by making a higher power, higher performance version.

    Performance per mm^2 matters because that's basically the cost to build the GPU chips, and a major chunk of the cost to build a completed video card.  It's less of a clean comparison than it used to be, now that Nvidia and AMD use different foundries to produce their GPU chips, but if it costs one vendor twice as much to produce a video card with a given level of performance, that's probably going to filter into their cards costing more at retail.  At least if the miners aren't buying all the cards at MSRP and bidding up the prices.

    On 28 nm, the Fury X and Titan X were close in the efficiency metrics.  They were close in all three of die size, power consumption, and gaming performance.  That meant that they were also close in performance per mm^2 and performance per watt.

    Then both vendors did a die shrink without substantially changing their architectures, and after the die shrink, they were no longer close in the efficiency metrics.  For graphical performance, Nvidia is way ahead on both performance per watt and performance per mm^2.  A Radeon RX Vega 64 uses more power than a GeForce GTX 1080 Ti and has a larger die, but that doesn't translate into more gaming performance.
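
    As a minimal sketch of those two metrics (the die sizes and board powers below are the commonly published figures for these cards; the relative gaming performance is an assumed, rough illustrative value, not a benchmark result):

```python
# Rough comparison of the two efficiency metrics: performance per watt and per mm^2.
# Die sizes and board powers are the commonly published figures for these cards;
# rel_perf is an assumed, illustrative number (GTX 1080 Ti normalized to 1.0).
cards = {
    "RX Vega 64":  dict(die_mm2=486, power_w=295, rel_perf=0.75),
    "GTX 1080 Ti": dict(die_mm2=471, power_w=250, rel_perf=1.00),
}

for name, c in cards.items():
    perf_per_watt = c["rel_perf"] / c["power_w"]
    perf_per_mm2 = c["rel_perf"] / c["die_mm2"]
    print(f"{name:12s} perf/W = {perf_per_watt:.5f}  perf/mm^2 = {perf_per_mm2:.5f}")
```

    With those inputs, the larger, higher-power die also comes out behind on both ratios, which is the point: the relevant comparison is the ratio, not which single chip happens to be fastest.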

    What I want to know is, how did that happen?  I see three possibilities that aren't mutually exclusive:

    1)  TSMC's 16 nm process node is simply better for high performance GPUs than Global Foundries' 14 nm process node.  Don't get caught up in the difference between 14 nm and 16 nm.  If that's the case, then Nvidia doesn't really have a better architecture than AMD, but only a better process node.  In that case, that tells us nothing about how they'll compare in the future once they both move to 7 nm.

    2)  Nvidia had simply botched some things in their implementation of Maxwell in silicon at 28 nm that they fixed in Pascal.  Meanwhile, AMD had a lot less to gain from redoing the chip as Fiji was a much better implementation of GCN than GM200 was of Maxwell.  Nvidia got larger gains from a die shrink because they had some low-hanging fruit available to improve their architecture and AMD simply didn't.  While GCN has been around for longer than Maxwell, GM200 launched more than a year after the GeForce GTX 750, so I'm skeptical that this accounts for the discrepancy.  Still, if this is the case, then AMD won't be competitive again until they have a major new architecture.

    3)  AMD could have made Polaris and Vega competitive with Pascal if they had put in the engineering effort to lay out the chips more efficiently, but they didn't.  Nvidia did put in the effort with Pascal, and that's how they ended up with more efficient parts.  It's notable that Nvidia saw their clock speeds jump a lot more from the die shrink than AMD did.  If this is the case, then simply putting in the necessary engineering effort in the future as they had in the past would be enough to catch up.  If the article is correct that Vega was starved for resources because AMD focused them elsewhere, then being uncompetitive for a generation is a one-off oddity, not indicative of the future.
  • Quizzical · Member Legendary · Posts: 25,351
    It's a severe mistake to think of architectures as being designed for large or small dies.  Both AMD and Nvidia make GPUs of a given architecture with a variety of die sizes.  For the consumer version of Pascal alone, Nvidia offers dies with 2, 6, 10, 20, or 30 compute units, though the 2 is only for the integrated GPU in the Tegra X2.  AMD had comparable scaling of their GCN parts, with dies ranging from 2 compute units (integrated GPU in Kabini) to 64 (Fiji).
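
    As a small sketch of that scaling range (assuming the commonly listed SM counts for the consumer Pascal dies and the Tegra X2 iGPU, with 128 CUDA cores per SM on those parts):

```python
# One architecture, many die sizes: consumer Pascal spans a ~15x range of SM counts.
# SM counts follow the figures cited above (2 SMs being the Tegra X2 integrated GPU);
# consumer Pascal packs 128 CUDA cores per SM, so core counts follow directly.
CORES_PER_SM = 128

pascal_dies = {              # die: SM count
    "GP10B (Tegra X2)": 2,
    "GP107": 6,
    "GP106": 10,
    "GP104": 20,
    "GP102": 30,
}

smallest = min(pascal_dies.values())
for die, sms in pascal_dies.items():
    print(f"{die:18s} {sms:2d} SMs = {sms * CORES_PER_SM:4d} cores "
          f"({sms / smallest:.0f}x the smallest)")
```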

    It's possible that on 7 nm, AMD and/or Nvidia will go with smaller dies first and not really release a big die until later.  They surely know a lot more about what yields look like on the upcoming process nodes than we do.  But that won't be driven by "the architecture can't do it" but by "the foundry will make too many chips defective if we go larger".  It's also possible that they'll wait until the EUV updates before going for larger dies if they expect that to markedly improve yields.

    Historically, AMD got to new process nodes before Nvidia, but Nvidia came along later and built bigger dies.  That Pascal arrived before Polaris marked the first time that Nvidia had been ahead in process nodes in over a decade.  Today, with half node die shrinks basically dead and even some full nodes being skipped (how many 20 nm discrete GPUs are there, again?), that changes the options available.
  • Quizzical · Member Legendary · Posts: 25,351
    Ridelynn said:
    My guess - Navi will be about 25% faster than Polaris once the drivers mature out. But it will take a bit for the driver to settle down.  Similar power profiles to existing Polaris SKUs (RX 680/670/660). 680 will likely be billed as “entry level 4K” on a larger VRAM SKU (probably 8G again). GDDR5 is my guess based on price/performance, but global supply issues may force a move there - I don’t expect HBM2 in a discrete card in this price tier (although I would not rule it out on PS5).
    If the combination of a die shrink from 14 nm to 7 nm and their first major new architecture in 8 years only means gains of 25% in performance per watt or performance per mm^2, then something will have gone severely wrong.  The shrink from 28 nm to 14 nm got AMD much larger gains than that and here we're talking about what's wrong with Vega.

    It's also important to remember that it's easier to improve on a bad product than on a good one.  If one vendor is ahead at a given point in time, then we can expect the one that is trailing behind to get larger gains out of their next generation.  The history of GPUs is one of the lead changing hands often, not of one vendor remaining permanently ahead.  AMD was slightly ahead on efficiency as recently as four years ago (until the launch of the GeForce GTX 980), and far ahead for a period of about 3 1/2 years from the launch of the Radeon HD 4870 until the launch of the GeForce GTX 680.

    Nvidia will almost certainly have the more efficient consumer GPUs until AMD launches some on 7 nm, but the combination of new process nodes and a major new architecture at least for AMD (and possibly also for Nvidia) will basically reroll everything on efficiency.  Nvidia retaining the lead isn't really any more likely than AMD claiming it.
  • Cleffy · Member Rare · Posts: 6,412
    I think the difference with Pascal is quite obvious when you look at compute. They sacrificed compute for graphics.
    The die size on Navi doesn't matter. It's intended to be a pasted together construction like Thread Ripper is in CPUs. Ideally you want a smaller die.
  • Quizzical · Member Legendary · Posts: 25,351
    Cleffy said:
    I think the difference with Pascal is quite obvious when you look at compute. They sacrificed compute for graphics.
    The die size on Navi doesn't matter. It's intended to be a pasted together construction like Thread Ripper is in CPUs. Ideally you want a smaller die.
    Multichip modules have been around for a long time for CPUs (e.g. the Core 2 Quad), but have never been done for GPUs.  The problem is that the bandwidth needed to connect the two dies of a GPU would be enormous--several hundred GB/sec in each direction.  I'm not saying that it's impossible, but I am saying that it's hard enough that in the past, vendors have always found it easier to make a larger die and then rely on salvage parts to fix the yields.
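
    As a back-of-the-envelope sketch of why that inter-die link is so demanding (all inputs here are illustrative assumptions, not vendor figures: roughly Vega-class total memory bandwidth, memory split evenly across two dies, and a varying share of each die's traffic landing in the other die's memory):

```python
# Back-of-the-envelope: cross-die bandwidth a hypothetical two-die GPU package might need.
# All numbers are illustrative assumptions, not vendor figures.
TOTAL_MEM_BW_GB_S = 480.0   # approx. total memory bandwidth of a Vega-class card
PCIE3_X16_GB_S = 15.75      # PCIe 3.0 x16, per direction, for comparison

# Memory is assumed split evenly between the two dies; each die generates half the
# total traffic, and some fraction of that targets the other die's memory.
for remote_fraction in (0.25, 0.5, 1.0):
    per_direction = (TOTAL_MEM_BW_GB_S / 2) * remote_fraction
    print(f"remote fraction {remote_fraction:.2f}: ~{per_direction:.0f} GB/s per direction "
          f"(~{per_direction / PCIE3_X16_GB_S:.0f}x a PCIe 3.0 x16 link)")
```

    And that is before any die-to-die synchronization or duplicated data is accounted for, which is why the interconnect is the hard part.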

    Kepler pretty severely sacrificed compute for the sake of graphics.  Maxwell/Pascal actually beefed up compute considerably as compared to Kepler, though it still trails behind GCN/Polaris/Vega there.
  • Ridelynn · Member Epic · Posts: 7,383
    Quizzical said:
    It's a severe mistake to think of architectures as being designed for large or small dies.  
    There is a fairly distinct line where HBM2 makes sense and where it does not. That's what I imagine AMD means by "Big" and "Small", and that's how I took it as well.
  • Ridelynn · Member Epic · Posts: 7,383
    Quizzical said:
    Ridelynn said:
    My guess - Navi will be about 25% faster than Polaris once the drivers mature out. But it will take a bit for the driver to settle down.  Similar power profiles to existing Polaris SKUs (RX 680/670/660). 680 will likely be billed as “entry level 4K” on a larger VRAM SKU (probably 8G again). GDDR5 is my guess based on price/performance, but global supply issues may force a move there - I don’t expect HBM2 in a discrete card in this price tier (although I would not rule it out on PS5).
    If the combination of a die shrink from 14 nm to 7 nm and their first major new architecture in 8 years only means gains of 25% in performance per watt or performance per mm^2, then something will have gone severely wrong.  
    ...

    It's also important to remember that it's easier to improve on a bad product than on a good one.
    We'll see. I don't have any inside knowledge or anything special to go off of. But I put a guess down on the table, and I still wager it's closer than not.
  • Vrika · Member Legendary · Posts: 7,888
    edited June 2018
    Quizzical said:

    What I want to know is, how did that happen?  I see three possibilities that aren't mutually exclusive:

    1)  TSMC's 16 nm process node is simply better for high performance GPUs than Global Foundries' 14 nm process node.  Don't get caught up in the difference between 14 nm and 16 nm.  If that's the case, then Nvidia doesn't really have a better architecture than AMD, but only a better process node.  In that case, that tells us nothing about how they'll compare in the future once they both move to 7 nm.

    2)  Nvidia had simply botched some things in their implementation of Maxwell in silicon at 28 nm that they fixed in Pascal.  Meanwhile, AMD had a lot less to gain from redoing the chip as Fiji was a much better implementation of GCN than GM200 was of Maxwell.  Nvidia got larger gains from a die shrink because they had some low-hanging fruit available to improve their architecture and AMD simply didn't.  While GCN has been around for longer than Maxwell, GM200 launched more than a year after the GeForce GTX 750, so I'm skeptical that this accounts for the discrepancy.  Still, if this is the case, then AMD won't be competitive again until they have a major new architecture.

    3)  AMD could have made Polaris and Vega competitive with Pascal if they had put in the engineering effort to lay out the chips more efficiently, but they didn't.  Nvidia did put in the effort with Pascal, and that's how they ended up with more efficient parts.  It's notable that Nvidia saw their clock speeds jump a lot more from the die shrink than AMD did.  If this is the case, then simply putting in the necessary engineering effort in the future as they had in the past would be enough to catch up.  If the article is correct that Vega was starved for resources because AMD focused them elsewhere, then being uncompetitive for a generation is a one-off oddity, not indicative of the future.
    I'd like to add a fourth explanation:

    4) AMD developed Vega with an HBM2 memory controller, counting on having HBM2's bandwidth. At some point they realized it wasn't going to be available at the prices or in the quantities they needed to be successful, but by then they were too far into development to change anything, so they just moved resources to other projects.
  • Ridelynn · Member Epic · Posts: 7,383
    edited June 2018
    Reading the Forbes article... resources weren't moved away from RTG - they were re-prioritized to deal with custom volume partners, rather than consumer/reference designs. 

    So if you look at it in that light: Sony is obviously the biggest partner there (with around 80M PS4 units shipped, who knows how many on contract to produce, and the PS5 on the horizon). Microsoft is a not-quite-so-close second (with around 30M XB1s shipped). And each of those released an upgrade mid-cycle. Those upgrades use Polaris, so Polaris received a lot of attention.

    Apple was touting Vega for their new Pro lineup, and that ended up being the iMac Pro 5K. I can't imagine it has sold anywhere near as many units as any of the consoles have. Intel has since also licensed Vega for Kaby Lake-G (although it appears Intel is actually using Polaris, and it's just branded as "Vega").

    If we assume that's true, then of all the money/resources RTG was allocated, the vast majority (according to Forbes, somewhere around 66%) went toward custom chip development and not discrete GPU development (which is where all those various SKUs and salvage parts come into play). Consoles don't want/need big, high-power parts. Apple was the only one looking for anything like that directly, so AMD developed something for Apple, and whatever came out of that is what made it into Vega 56/64.

    And that seems to jibe with what we have seen play out in the retail space: a bit of movement on Polaris (with a refresh that happened to mirror the upgraded consoles), and one Vega (which happened to mirror the Apple release). It may not be the exact truth, but it's fairly plausible and checks out with what we see going on.


  • Quizzical · Member Legendary · Posts: 25,351
    Vrika said:
    Quizzical said:

    What I want to know is, how did that happen?  I see three possibilities that aren't mutually exclusive:

    1)  TSMC's 16 nm process node is simply better for high performance GPUs than Global Foundries' 14 nm process node.  Don't get caught up in the difference between 14 nm and 16 nm.  If that's the case, then Nvidia doesn't really have a better architecture than AMD, but only a better process node.  In that case, that tells us nothing about how they'll compare in the future once they both move to 7 nm.

    2)  Nvidia had simply botched some things in their implementation of Maxwell in silicon at 28 nm that they fixed in Pascal.  Meanwhile, AMD had a lot less to gain from redoing the chip as Fiji was a much better implementation of GCN than GM200 was of Maxwell.  Nvidia got larger gains from a die shrink because they had some low-hanging fruit available to improve their architecture and AMD simply didn't.  While GCN has been around for longer than Maxwell, GM200 launched more than a year after the GeForce GTX 750, so I'm skeptical that this accounts for the discrepancy.  Still, if this is the case, then AMD won't be competitive again until they have a major new architecture.

    3)  AMD could have made Polaris and Vega competitive with Pascal if they had put in the engineering effort to lay out the chips more efficiently, but they didn't.  Nvidia did put in the effort with Pascal, and that's how they ended up with more efficient parts.  It's notable that Nvidia saw their clock speeds jump a lot more from the die shrink than AMD did.  If this is the case, then simply putting in the necessary engineering effort in the future as they had in the past would be enough to catch up.  If the article is correct that Vega was starved for resources because AMD focused them elsewhere, then being uncompetitive for a generation is a one-off oddity, not indicative of the future.
    I'd like to add a fourth explanation:

    4) AMD developed Vega with an HBM2 memory controller, counting on having HBM2's bandwidth. At some point they realized it wasn't going to be available at the prices or in the quantities they needed to be successful, but by then they were too far into development to change anything, so they just moved resources to other projects.
    That is extremely implausible.  You have to tape out a product about a year before it launches.  By the time a product tapes out, nearly all of the development and optimization work is done; all that's left is a bit of debugging.  There isn't much engineering-resource savings to be had by doing everything as normal up to the point of fixing things in a few respins and then not bothering to do the respins.  And if you don't do the respins needed to fix whatever is broken, you don't just end up with a product that performs 20% slower.  You end up with dies that flatly don't work at all.

    In order for there to be a major resource shift away from Vega, it would have had to have happened long before Vega taped out.  That may well have happened, but if that happened 2-3 years before HBM2 entered mass production, the decision really couldn't have been based on the state of HBM2.

    Furthermore, even if HBM2 had been the problem forcing a change that far out, it would have been solvable by dropping HBM2 from Vega and instead using GDDR5 or GDDR5X.  It's common for a given GPU architecture to use different memory standards.  See, for example, how some Pascal GPUs use GDDR5 and others use GDDR5X.  Or see how both AMD and Nvidia have for many years used GDDR5 for mid-range to higher-end GPUs but DDR3 for the low end.
  • AmazingAvery · Age of Conan Advocate · Member Uncommon · Posts: 7,188
    Quizzical said:
    Cleffy said:
    I think the difference with Pascal is quite obvious when you look at compute. They sacrificed compute for graphics.
    The die size on Navi doesn't matter. It's intended to be a pasted together construction like Thread Ripper is in CPUs. Ideally you want a smaller die.
    Multichip modules have been around for a long time for CPUs (e.g. the Core 2 Quad), but have never been done for GPUs.  The problem is that the bandwidth needed to connect the two dies of a GPU would be enormous--several hundred GB/sec in each direction.  I'm not saying that it's impossible, but I am saying that it's hard enough that in the past, vendors have always found it easier to make a larger die and then rely on salvage parts to fix the yields.

    Kepler pretty severely sacrificed compute for the sake of graphics.  Maxwell/Pascal actually beefed up compute considerably as compared to Kepler, though it still trails behind GCN/Polaris/Vega there.
    No MCM for Navi -

    AMD’s Navi will be a traditional monolithic GPU, not a multi-chip module
    https://www.pcgamesn.com/amd-navi-mo...esign?tw=PCGN1

    It’s definitely something AMD’s engineering teams are investigating, but it still looks a long way from being workable for gaming GPUs, and definitely not in time for the AMD Navi release next year. “We are looking at the MCM type of approach,” says Wang, “but we’ve yet to conclude that this is something that can be used for traditional gaming graphics type of application.”

    “To some extent you’re talking about doing CrossFire on a single package. The challenge is that unless we make it invisible to the ISVs [independent software vendors], you’re going to see the same sort of reluctance.”


