The power of multiGPU

Malabooga Member UncommonPosts: 2,977
So, fairly recently there was some discussion about multiGPU, with a lot of people saying how it sucks and whatnot. But here's the thing:



You can now see that 2xRX480 ($400-500) beat or are on par with a $1200 single GPU when multiGPU is properly used.

So next time you go ranting about how "multiGPU sucks", stop and think about your own interests: instead of ranting about multiGPU, demand better multiGPU support and proper usage of it.

And this was the initial idea of multiGPU: instead of making insanely expensive single-chip cards, use smaller, much cheaper chips that perform the same for much less money. It's completely irrelevant which vendor it is (although it's nice to see the RX480 performing so well), but NVidia is going in the completely opposite direction by removing SLI capability from the GTX1060 entirely.

The point is: if you bought a $1200 GPU....or a $700 GPU....why in the world would you need another? Aren't those supposed to be fast enough? And why go against the customer and remove the capability to pair 2 lower-priced GPUs to get good performance, since that was the actual idea and actually makes sense?

Comments

  • Ridelynn Member EpicPosts: 7,383
    That's been the case since the 2xGTX460 could beat out a GTX480, and maybe even before then. Several cheaper cards ~can~ beat out a single more expensive card.

    The issue has always been that they usually don't.

    Maybe DX12 will help with that, but right now, if you want to support SLI/CF, the game developer and the driver developer have to work together to do a lot of optimization. That doesn't happen very often, it's very time consuming (therefore expensive), and to date there hasn't been a good method of homogenizing it across different games - even those using the same game engine.

    Therefore, multiGPU has sucked. For a long while now. With only a few exceptions - those exceptions run exceptionally well, but they are still very much exceptions.
  • laserit Member LegendaryPosts: 7,591
    edited September 2016
    The point is that it doesn't work unless the software in question supports it. And correct me if I'm in error but it also requires driver support for said software. 

    People think that you can go Crossfire or SLI and it just works with everything but that is not the case.

    "Be water my friend" - Bruce Lee

  • Ridelynn Member EpicPosts: 7,383
    Why is nVidia pulling it out of the lower end?

    Well, for the exact reason you bring up - they don't want people buying lots of low margin cards. They want people buying the high margin cards.

    And then, for those people where the high margin cards still aren't enough, that's the niche where SLI/CF has found itself in lately. 

    Maybe DX12 changes that, but I sorta doubt it.
  • Quizzical Member LegendaryPosts: 25,355
    The problem is that there just isn't a nice way to spread rendering a game across multiple GPUs.  It's not a problem of DirectX.  It's a problem of rasterization and of rendering games in real time as you play.

    CrossFire and SLI previously relied on driver magic, which required game-specific optimizations because how to make it work varied by game.  DirectX 12 and Vulkan give the developer the ability to use multiple GPUs in a way that makes sense in their particular game, rather than relying on Nvidia and AMD to make custom driver optimizations for them.  But that just doesn't fix the basic problem that there simply isn't a nice way to spread rendering a game across multiple GPUs.

    There are some things you can try.  For example, you could have the left half of the frame on one GPU and the right half on the other.  But you don't find out which half of the frame a vertex is in until after four of the five programmable shader stages are done.  You can do some culling host side, but if you try to go this route, you're guaranteed to have a lot of work replicated on both GPUs before you learn which GPU "should" have done it.
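
    To make that concrete, here's a toy sketch of the split-frame idea (plain Python with a made-up scene, not any real graphics API): both GPUs have to push every triangle through the vertex stages before either one knows which triangles actually land on its half of the screen, so a big chunk of work gets done twice.

    ```python
    # Toy model of split-frame rendering (SFR) work duplication.
    # All scene data and numbers are invented for illustration.
    import random

    random.seed(1)
    SCREEN_W = 1920

    def project_x(triangle):
        """Stand-in for the vertex stages: returns the screen-space x range.
        In a real pipeline this is only known after vertex shading has run."""
        xs = [x for (x, _y, _z) in triangle]
        return min(xs), max(xs)

    # Fake scene: 10,000 small triangles scattered across the screen.
    triangles = []
    for _ in range(10_000):
        cx = random.uniform(0, SCREEN_W)
        triangles.append([(cx + random.uniform(-20, 20),
                           random.uniform(0, 1080), 1.0) for _ in range(3)])

    transformed = {"left_gpu": 0, "right_gpu": 0}  # vertex work each GPU performs
    kept = {"left_gpu": 0, "right_gpu": 0}         # triangles each GPU actually rasterizes

    for tri in triangles:
        # Without host-side culling, BOTH GPUs run the vertex stages first...
        transformed["left_gpu"] += 1
        transformed["right_gpu"] += 1
        # ...and only then find out whose half of the frame the triangle touches.
        x_min, x_max = project_x(tri)
        if x_min < SCREEN_W / 2:
            kept["left_gpu"] += 1
        if x_max >= SCREEN_W / 2:
            kept["right_gpu"] += 1

    for gpu in transformed:
        print(f"{gpu}: transformed {transformed[gpu]}, rasterized {kept[gpu]}, "
              f"wasted {transformed[gpu] - kept[gpu]}")
    ```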

    Another approach is to simply have different GPUs handle different draw calls, and then piece together the final frame at the end.  If you go this route, then not only is there extra work at the end to combine the two partial frames, but you also have no reliable way to divide the work evenly between the two GPUs.  Furthermore, both GPUs have to do extra work that would have been skipped if they had the other GPU's depth buffer to discard fragments sooner.  And both have to finish and merge before you can do any post processing effects.
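
    And a sketch of the merge pass that the draw-call-splitting approach needs (Python/NumPy, with random data standing in for the two partial frames): per pixel, keep whichever GPU drew the closer surface. Every fragment the "losing" GPU shaded is work a shared depth buffer would have let it skip.

    ```python
    # Toy compositing pass for "different GPUs handle different draw calls":
    # each GPU returns its own color + depth buffer and a merge keeps, per pixel,
    # whichever GPU drew the closer surface. Buffer contents are random stand-ins.
    import numpy as np

    H, W = 1080, 1920
    rng = np.random.default_rng(0)

    color_a, depth_a = rng.random((H, W)), rng.random((H, W))  # GPU A's partial frame
    color_b, depth_b = rng.random((H, W)), rng.random((H, W))  # GPU B's partial frame

    # Merge pass: per pixel, keep the fragment with the smaller depth.
    nearer_a = depth_a <= depth_b
    final_color = np.where(nearer_a, color_a, color_b)
    final_depth = np.where(nearer_a, depth_a, depth_b)

    # Every pixel was shaded by both GPUs but kept from only one; with a shared
    # depth buffer the "losing" GPU could have skipped much of that work.
    print(f"fragments shaded: {2 * H * W:,}, fragments kept: {H * W:,}")
    ```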

    You could also try doing the initial rendering on one GPU, and then post processing on the other, using the two GPUs in a pipeline of sorts.  This again creates the problem that you can't divide the workload evenly between the two GPUs, on top of the extra overhead of transferring frame buffers and whatnot.

    So maybe you can make two of GPU X offer 50% or 70% or whatever better performance than one of GPU X.  And you can do this for all recent and future GPUs, not just whichever ones AMD and Nvidia are willing to provide custom driver support for.  But the development effort spent on this could have been spent on something else, instead of something that is cool in some esoteric sense but only 1% of your playerbase will ever benefit from.
  • Ridelynn Member EpicPosts: 7,383
    edited September 2016
    Now, if I'm honest about it, @Malabooga does bring up a good point.

    A modern GPU is made up of a handful of Streaming Multiprocessors, or GCN cores, depending on which brand you choose. They aren't exactly the same, but conceptually, work with me here. Each of those is made up of a bunch of shader cores, some scheduling logic, some texture handling, and some register space - to extremely simplify the thing.

    The difference between a $170 GTX1060 and a $700 GTX1080, aside from some video RAM, is pretty much that the 1060 has 9 SMs, whereas the 1080 has 20 SMs. Those SMs aren't all that dissimilar. The real difference in the power of a GPU across a generation is mostly to do with those SM or GCN core counts, and lower tiered cards are often cut down higher tiered cards that just have some disabled.

    At its heart, that's what SLI/CF should be - allowing two cards to merge their resources into one pool. In reality, it's what Quiz describes - you have 2 (or more) physical cards, each acting very independently and trying to share workloads. Rather than, say, having a 1080 and a 1060, and having 29 total SMs available (I know, a pipe dream, there are other things like core frequency and VRAM and things besides SMs, but bear with me here), and having a better experience than either card individually.
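
    Taking the SM counts above at face value, the back-of-the-envelope upside of that pipe dream looks something like this (perfect scaling and identical clocks are assumptions, not reality):

    ```python
    # Idealized "pool the SMs" arithmetic, using the SM counts quoted above.
    # Perfect scaling and identical clocks are assumed purely for illustration.
    cards = {"GTX 1060": 9, "GTX 1080": 20}

    pooled_sms = sum(cards.values())          # 29 SMs if resources truly merged
    best_single = max(cards.values())
    ideal_speedup = pooled_sms / best_single  # ~1.45x over the 1080 by itself

    print(f"pooled SMs: {pooled_sms}, ideal speedup vs the 1080 alone: {ideal_speedup:.2f}x")
    ```

    Real SLI/CF never gets close to that, for the reasons the other posts go into: the two pools of SMs don't share an L2 cache, a memory bus, or a depth buffer.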

    So I do get Malabooga's point, I really do. GPU architectures are highly scalable. And it ~seems~ like it should be an easy thing to just extend that past a die to multiGPU. It's really a shame that real life hasn't panned out that way. Maybe it's a good case for a third party to swoop in, come up with a scalable architecture (although, it seems to me that SMs and GCN cores are pretty scalable already, they just need a more flexible controlling architecture and maybe higher bandwidth between physical cards), that really can rock the socks off of current multiGPU implementations.
  • Quizzical Member LegendaryPosts: 25,355
    Higher bandwidth between cards doesn't fix the problem unless you get to really insane bandwidth connecting the cards.  It's not just that all of the compute units on a GPU are on the same board.  They all share the same L2 cache and global memory with full access to all of its bandwidth.  L2 cache bandwidth alone can be in the ballpark of 1 TB/s.  If you get that kind of bandwidth connecting two GPUs, then yeah, you could make it scale.  But for the cost of doing that, it would be cheaper to just get a bigger GPU.

    One issue specific to rasterization is that you don't find out where a vertex is on the screen, or even if it's on the screen, until far into the rendering pipeline.  Try to spread a single framebuffer and depth buffer across two GPUs and each very often needs to grab the other's buffer.  You'd need a ton of bandwidth to make that alone work.  And it's not just bandwidth; you'd need both GPUs to be able to atomically access those buffers and somehow mitigate the race conditions that would happen with a naive implementation.

    The problems with multi GPU scaling basically come down to trying to find ways to work around the inability to spread a depth buffer and frame buffer across two GPUs.
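
    A back-of-the-envelope version of that bandwidth argument (the ~1 TB/s L2 figure is from the post above; the PCIe rate and buffer formats are rough assumptions):

    ```python
    # Rough time to move one 4K color + depth buffer between two GPUs.
    BYTES_PER_PIXEL = 4 + 4          # RGBA8 color + 32-bit depth (assumed formats)
    PIXELS_4K = 3840 * 2160
    buffer_bytes = PIXELS_4K * BYTES_PER_PIXEL   # roughly 66 MB

    links = {
        "PCIe 3.0 x16 (~16 GB/s)": 16e9,
        "on-die L2 (~1 TB/s)": 1e12,
    }

    for name, bytes_per_sec in links.items():
        ms = buffer_bytes / bytes_per_sec * 1000
        print(f"{name}: {ms:.2f} ms per full buffer copy")

    # At 60 fps there are only ~16.7 ms per frame, and a frame would need many
    # such exchanges, not one, which is why it ends up cheaper to just get a bigger GPU.
    ```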
  • Cleffy Member RarePosts: 6,412
    edited September 2016
    That would be interesting. Having each GPU take a different element in the shader pipeline, like having one GPU do the first part of the shading pipeline and pass it on to the second one, which rasterizes and applies the pixel shaders. There would still need to be a large buffer to pass info quickly, and it would need some degree of frame control. A lot of the info would need to be mirrored across both GPUs. There would be latency. For ideal speed you would be looking at the previous frame. You would also need to write custom code to dictate when the GPU must stop in the pipeline and pass the info forward.
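
    Here's a toy timeline of that kind of pipeline split (Python, with invented stage times): steady-state throughput is set by the slowest stage, the split between the two cards is rarely even, and every frame still pays the full latency of both stages plus the transfer.

    ```python
    # Toy two-GPU pipeline: GPU0 runs geometry/shading for frame N while GPU1
    # post-processes frame N-1. Stage times are invented round numbers.
    GEOMETRY_MS = 10.0    # GPU0 work per frame
    POST_MS = 4.0         # GPU1 work per frame
    TRANSFER_MS = 2.0     # handing the intermediate buffer across the link

    # Throughput is limited by the slowest stage, not the sum of the stages...
    frame_interval = max(GEOMETRY_MS, POST_MS + TRANSFER_MS)
    # ...but each individual frame still pays every stage before it is displayed.
    frame_latency = GEOMETRY_MS + TRANSFER_MS + POST_MS

    single_gpu_ms = GEOMETRY_MS + POST_MS
    print(f"single GPU: {1000 / single_gpu_ms:.0f} fps, {single_gpu_ms:.0f} ms per frame")
    print(f"pipelined : {1000 / frame_interval:.0f} fps, {frame_latency:.0f} ms per frame")
    # Note the imbalance: GPU1 idles whenever post-processing is cheaper than
    # geometry, so the workload split is rarely even.
    ```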
  • Ridelynn Member EpicPosts: 7,383
    Quizzical said:
    Higher bandwidth between cards doesn't fix the problem unless you get to really insane bandwidth connecting the cards.  It's not just that all of the compute units on a GPU are on the same board.  They all share the same L2 cache and global memory with full access to all of its bandwidth.  L2 cache bandwidth alone can be in the ballpark of 1 TB/s.  If you get that kind of bandwidth connecting two GPUs, then yeah, you could make it scale.  But for the cost of doing that, it would be cheaper to just get a bigger GPU.

    One issue specific to rasterization is that you don't find out where a vertex is on the screen, or even if it's on the screen, until far into the rendering pipeline.  Try to spread a single framebuffer and depth buffer across two GPUs and each very often needs to grab the other's buffer.  You'd need a ton of bandwidth to make that alone work.  And it's not just bandwidth; you'd need both GPUs to be able to atomically access those buffers and somehow mitigate the race conditions that would happen with a naive implementation.

    The problems with multi GPU scaling basically come down to trying to find ways to work around the inability to spread a depth buffer and frame buffer across two GPUs.
    Well, if it were easy, it would have already been done. But thanks for the more detailed explanation. I still can't quite wrap my head around it, but I suppose I don't make GPUs for a living either.
  • Vrika Member LegendaryPosts: 7,888
    edited September 2016
    MultiGPU is a perfect option if your aim is to build an expensive computer that's able to run 66% of available games really fast.

    If on the other hand you want to be able to play all games, then you're going to need a single GPU solution.


    If it were otherwise, and two RX 480s could reliably match a Titan X, AMD would attach them together at the factory and sell them as a package. AMD are not complete idiots. If they had an economical way to match the speed of a Titan X they would be selling it.
     
  • Malabooga Member UncommonPosts: 2,977
    edited September 2016
    Ridelynn said:
    Quizzical said:
    Higher bandwidth between cards doesn't fix the problem unless you get to really insane bandwidth connecting the cards.  It's not just that all of the compute units on a GPU are on the same board.  They all share the same L2 cache and global memory with full access to all of its bandwidth.  L2 cache bandwidth alone can be in the ballpark of 1 TB/s.  If you get that kind of bandwidth connecting two GPUs, then yeah, you could make it scale.  But for the cost of doing that, it would be cheaper to just get a bigger GPU.

    One issue specific to rasterization is that you don't find out where a vertex is on the screen, or even if it's on the screen, until far into the rendering pipeline.  Try to spread a single framebuffer and depth buffer across two GPUs and each very often needs to grab the other's buffer.  You'd need a ton of bandwidth to make that alone work.  And it's not just bandwidth; you'd need both GPUs to be able to atomically access those buffers and somehow mitigate the race conditions that would happen with a naive implementation.

    The problems with multi GPU scaling basically come down to trying to find ways to work around the inability to spread a depth buffer and frame buffer across two GPUs.
    Well, if it were easy, it would have already been done. But thanks for the more detailed explanation. I still can't quite wrap my head around it, but I suppose I don't make GPUs for a living either.
    Well, Raja Koduri said in an interview that they are making advancements on the mGPU side. The problem can be tackled from the software angle and from the hardware angle. AMD is most likely taking the hardware angle.

    So there are rumors that AMD will make a small ARM chip and put it on a dual-GPU card to manage the GPUs' resources. From any point of view outside the card, it would be seen as a single card, as it would communicate through the ARM chip, which would then control the 2 graphics chips' resources internally.

    And with Microsoft pushing into mGPU, the new Scorpio might be based on such a design, as well as future AMD cards.

    The thing with that is that it wouldn't have to stop at only 2 chips, as theoretically you could put any number of chips on a PCB and tie them together with the ARM controller to be seen as a single resource from the outside.
  • Malabooga Member UncommonPosts: 2,977
    Some benchmarks of mGPU done well



  • filmoret Member EpicPosts: 4,906
    For some reason the Nvidia cards are much better at 4K resolution.  They almost always have a 2x performance boost when running 4K.  The lower resolutions are a bit different and idk exactly why that is the case.

    https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080_SLI/14.html

    Out of the 20 games they tested there is almost always a performance boost with SLI.
    Are you onto something or just on something?
  • H0urg1ass Member EpicPosts: 2,380
    I'm no GPU scientist, and honestly, it's not my cup of tea.  What I will say, however, is that I tried two iterations of SLI and they both sucked.  I tried dual 460's and dual 560's and ever since then I've gone with X70 series and had much better luck.

    The dual 460's would flat out refuse to play some games like World of Tanks which would crash on launching a match immediately.  The dual 560's had so many issues that I frequently had to disable one GPU to play several different games.  Issues from micro-stuttering to horrible screen tearing to FPS that was worse than with one GPU turned off.

    Never again.
  • Quizzical Member LegendaryPosts: 25,355
    filmoret said:
    For some reason the Nvidia cards are much better at 4K resolution.  They almost always have a 2x performance boost when running 4K.  The lower resolutions are a bit different and idk exactly why that is the case.

    https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080_SLI/14.html

    Out of the 20 games they tested there is almost always a performance boost with SLI.
    You should try reading your own links before you post them.

    That review found that at 4K, SLI on average increased the frame rates by 52%.  Even if you include non-scaling games, it was an average of 71%.  That's not double, and not terribly close to it.  Furthermore, because Nvidia and AMD tend to heavily optimize their drivers specifically for the games that appear in reviews, you can expect that other games will tend to have worse SLI scaling--and likely not scale with SLI at all for games not released near the time that Pascal was Nvidia's latest architecture, as even the most popular games released a few years from now will have the game-specific driver optimizations go to Volta or whatever rather than then-older Pascal cards.

    And that's even assuming that SLI works flawlessly, with no micro-stutter or added latency.  Which is, of course, a false assumption.  The micro-stutter problem has lessened considerably with improved frame pacing over the last few years, but there's nothing that can be done about the added latency from using SLI.  Even with perfect frame pacing, x frames per second with a single GPU will tend to give you a better gaming experience than 1.2x frames per second with SLI, as a lower net frame rate is compensated for by lower latency.  With imperfect frame pacing, that 1.2 factor can get much larger and even approach 2 in degenerate cases.

    The reason there is better scaling at higher resolutions is simple and obvious:  higher resolutions add far more GPU work but not much more CPU work, making CPU bottlenecks less common and less severe.
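
    The latency point above is easy to put numbers on. With alternate-frame rendering each GPU only produces every other frame, so the time any single frame spends being rendered is roughly 2 / (total fps); a quick sketch with illustrative numbers:

    ```python
    # Rough arithmetic behind "x fps on one GPU can beat 1.2x fps in SLI".
    def afr_frame_time_ms(total_fps):
        # With AFR each GPU renders every other frame.
        return 2.0 / total_fps * 1000.0

    def single_frame_time_ms(fps):
        return 1.0 / fps * 1000.0

    single_fps = 60.0
    for scaling in (1.2, 1.5, 1.9):
        sli_fps = single_fps * scaling
        print(f"SLI at {scaling:.1f}x scaling: {sli_fps:.0f} fps shown, but each frame "
              f"took {afr_frame_time_ms(sli_fps):.1f} ms to render "
              f"(single GPU: {single_frame_time_ms(single_fps):.1f} ms)")
    ```

    Only near-2x scaling gets the per-frame time back to where a single card already was, and that's before any frame-pacing problems.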
  • Rich84 Member UncommonPosts: 55
    SLI and Crossfire are great...when they work.

    The problem, besides the extra cost of components, is dealing with the bugs to get it working, and most games don't launch with proper support for SLI/Crossfire. It can take weeks or months for drivers to be released.
  • Malabooga Member UncommonPosts: 2,977
    edited December 2016
    The frame times in Deus Ex are almost IDENTICAL to a single card



    https://www.pcper.com/news/Graphics-Cards/DX12-Multi-GPU-scaling-and-running-Deus-Ex-Mankind-Divided

    And in DX12 with SFR, the rendering process is done exactly the same as on a single card, as the 2 cards act as one, both rendering the same frame at the same time. Everything is pooled together, even VRAM, so if you have 8GB cards you effectively have 16GB of VRAM, not 8 as you have with AFR.

    And games release in a broken state for single cards too, just look at Watch Dogs 2/Dishonored 2/Mafia 3. Completely broken, with a plethora of problems.

    Now ask yourself why you can't buy 2xRX480 for $400 and have the same performance as a $1200 Titan XP (now proven in 3 games, ~90% scaling needed). Or why NVidia removed SLI capability from the GTX1060 (same class, a little slower than the RX480, but it would be as fast as a Titan XP, just as 2xRX480s are).

    And yeah, GTX1080 SLI? Shouldn't a $700 card run everything great WITHOUT the need for a second $700 card?


  • filmoret Member EpicPosts: 4,906
    edited December 2016
    Malabooga said:
    The frame times in Deus Ex are almost IDENTICAL to a single card





    And yeah, GTX1080 SLI? Shouldn't a $700 card run everything great WITHOUT the need for a second $700 card?


    Everything does run well on a single card.  It's the most powerful card on the market.  How is your statement logical at all?  People use 2 cards because they want more, and that will always happen.  No matter how fast something is, they will always want more.

    Back to the OP.  You might not always get the boost you want out of dual cards.  But sometimes you will, and you never actually get less.
    Are you onto something or just on something?
  • Vrika Member LegendaryPosts: 7,888
    Malabooga said:

    Now ask yourself why you can't buy 2xRX480 for $400 and have the same performance as a $1200 Titan XP (now proven in 3 games, ~90% scaling needed).


    What you are doing is like proving that shooting people in the head is not fatal because it's possible to find 3 people who've survived it.

    Smart people ignore Malabooga and instead look at some RX480 CrossFire review where games were more randomly picked, like this one:
      https://www.techpowerup.com/reviews/AMD/RX_480_CrossFire/
     
  • NitemareMMO Member UncommonPosts: 239
    edited December 2016
    I might go with an SLI/Crossfire setup down the line ONLY when 99% of games support it flawlessly AND the dual setup doesn't exceed the budget of a single board AND I get better performance.

    Besides, Win10 adoption isn't going all that great (about 23%), and almost 50% of users are still running Win7, which means no DX12. There's no incentive for devs to support multi-GPU (Vulkan could help here, being more broadly available, but I've no info about its multiGPU support).
  • Quizzical Member LegendaryPosts: 25,355
    Malabooga said:

    And in DX12 with SFR, the rendering process is done exactly the same as on a single card, as the 2 cards act as one, both rendering the same frame at the same time. Everything is pooled together, even VRAM, so if you have 8GB cards you effectively have 16GB of VRAM, not 8 as you have with AFR.


    Kinda, sorta, but not really.  It depends greatly on the fine details of how you split the workload, but a lot of things are probably going to need to be replicated across both cards in order to be available from either one.

    Really, though, if a game can't run well on 8 GB of video memory, multi-GPU scaling is the least of their worries.
  • Muke Member RarePosts: 2,614
    edited December 2016
    Malabooga said:
    So, fairly recently there was some discussion about multiGPU, with a lot of people saying how it sucks and whatnot. But here's the thing:



    You can now see that 2xRX480 ($400-500) beat or are on par with a $1200 single GPU when multiGPU is properly used.

    So next time you go ranting about how "multiGPU sucks", stop and think about your own interests: instead of ranting about multiGPU, demand better multiGPU support and proper usage of it.

    And this was the initial idea of multiGPU: instead of making insanely expensive single-chip cards, use smaller, much cheaper chips that perform the same for much less money. It's completely irrelevant which vendor it is (although it's nice to see the RX480 performing so well), but NVidia is going in the completely opposite direction by removing SLI capability from the GTX1060 entirely.

    The point is: if you bought a $1200 GPU....or a $700 GPU....why in the world would you need another? Aren't those supposed to be fast enough? And why go against the customer and remove the capability to pair 2 lower-priced GPUs to get good performance, since that was the actual idea and actually makes sense?
    You do know that many games do not even support SLI/Crossfire?

    GL running those games on a dual-GPU system.

    "going into arguments with idiots is a lost cause, it requires you to stoop down to their level and you can't win"

  • botrytis Member RarePosts: 3,363
    edited December 2016
    Unless one is playing at high res (4K) with all the eye candy on, multi-GPU systems are not needed. Remember, many games are still using DX 9.0c, which is more CPU bound than GPU bound.


  • Malabooga Member UncommonPosts: 2,977
    edited December 2016
    I might go with an SLI/Crossfire setup down the line ONLY when 99% of games support it flawlessly AND the dual setup doesn't exceed the budget of a single board AND I get better performance.

    Besides, Win10 adoption isn't going all that great (about 23%), and almost 50% of users are still running Win7, which means no DX12. There's no incentive for devs to support multi-GPU (Vulkan could help here, being more broadly available, but I've no info about its multiGPU support).
    The majority of gamers are on Win10, and every major release in 2016 had DX12/Vulkan, with some 2015 games getting the treatment too. Not only is DX12 here, it's taken over lol



    Quizzical said:
    Malabooga said:

    And in DX12 with SFR, the rendering process is done exactly the same as on a single card, as the 2 cards act as one, both rendering the same frame at the same time. Everything is pooled together, even VRAM, so if you have 8GB cards you effectively have 16GB of VRAM, not 8 as you have with AFR.


    Kinda, sorta, but not really.  It depends greatly on the fine details of how you split the workload, but a lot of things are probably going to need to be replicated across both cards in order to be available from either one.

    Really, though, if a game can't run well on 8 GB of video memory, multi-GPU scaling is the least of their worries.
    Well, that's what the API and drivers are for, along with devs fine-tuning. And since DX12 mGPU is vendor agnostic, you can use any 2 cards....

    Memory is just an example, since up until now there was no pooling: if your cards had 8GB each, you got 8GB. Having it pooled to 16GB is just a bonus in the process ;P