
NVIDIA @ CES

PyrateLV Member Common, Posts: 1,096

2013 product reveal

Nvidia Tegra4

Nvidia Shield

Nvidia Grid

 

Looks like they are coming out with some interesting stuff.

 

Oh, and they showed all this on the new LG 83" 4K LED Smart TV.

Tried: EQ2 - AC - EU - HZ - TR - MxO - TTO - WURM - SL - VG:SoH - PotBS - PS - AoC - WAR - DDO - SWTOR
Played: UO - EQ1 - AO - DAoC - NC - CoH/CoV - SWG - WoW - EVE - AA - LotRO - DFO - STO - FE - MO - RIFT
Playing: Skyrim
Following: The Repopulation
I want a Virtual World, not just a Game.
ITS TOO HARD! - Matt Firor (ZeniMax)


Comments

  • Quizzical Member Legendary, Posts: 25,348

    Interesting stuff.  Tegra 4 is, as had been rumored, a quad core Cortex A15 chip together with a 72-shader GPU.  I'd like to know the TDP on it, as that could easily get rather power-hungry for a tablet.  I'm curious about API compliance, as rumors had said Tegra 4 would support the full OpenGL 4.  Nvidia's web site only says WebGL and HTML 5, though surely it supports more than that.  Even Tegra 3 supports more than that.

    Nvidia Shield is a hand-held gaming device, with both a gamepad and a screen built in.  It's powered by Tegra 4, of course, though a 38 watt-hour battery could easily make it awkwardly heavy.  It runs Google Android, so it might be a nifty device for Android gaming.  No word on the price tag, though.

    Nvidia Grid is Nvidia's attempt at making a cloud-gaming GPU.  Cloud-gaming enthusiasts have claimed that it will offer high-end gaming performance because you can get a high-end video card rendering the game and then streaming it to you.  Nvidia Grid most unambiguously cannot do that, as it's based on GK107, the lowest end GPU chip of the current generation.

    But hey, at least Nvidia was able to demonstrate that it works.  If you're only streaming over a LAN.  And running Ethernet rather than WiFi.  And happy with extremely low monitor resolutions that will make some games borderline unplayable.

    Which is how exactly no one in the world will ever reasonably use it.  If you want to stream long distances, Ethernet isn't an option.  If you want to stream over a LAN, you can get a higher end video card.  Such as every single current-generation card on the market apart from the crippled GeForce GT 640.  More than a few of them will probably cost less than Nvidia Grid, too, as Grid has four GPUs in a single "card", and enterprise-focused products tend to carry a heavy price premium.  See pricing of Tesla and Quadro cards for examples of this.

    Still, I can see why they made that choice:  less cost and less heat for companies like OnLive that run cloud gaming services.  And OnLive could probably run medium settings, tell their customers that it's super-ultra-max settings, and hardly any of them would ever know the difference.  Often, you wouldn't even be able to tell the difference after the very lossy video compression, anyway.
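
    To put a rough number on how lossy that compression has to be, here is a quick back-of-envelope sketch; the 720p, 30 fps, and 5 Mbit/s figures are illustrative assumptions, not anything Nvidia or OnLive has published.

        #include <cstdio>

        // Rough estimate of how heavily a game stream must be compressed.
        // Every input figure here is an illustrative assumption.
        int main() {
            const double width = 1280, height = 720;  // assumed 720p stream
            const double fps = 30.0;                  // assumed frame rate
            const double bits_per_pixel = 24.0;       // uncompressed RGB

            double raw_mbps = width * height * bits_per_pixel * fps / 1e6;  // ~660 Mbit/s
            double stream_mbps = 5.0;                 // assumed H.264 stream bitrate

            printf("Raw video:         %.0f Mbit/s\n", raw_mbps);
            printf("Streamed video:    %.0f Mbit/s\n", stream_mbps);
            printf("Compression ratio: roughly %.0f : 1\n", raw_mbps / stream_mbps);
            return 0;
        }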

  • TheLizardbones Member Common, Posts: 10,910


    Originally posted by Quizzical
    Interesting stuff.  Tegra 4 is, as had been rumored, a quad core Cortex A15 chip together with a 72-shader GPU.  I'd like to know the TDP on it, as that could easily get rather power-hungry for a tablet.  I'm curious about API compliance, as rumors had said Tegra 4 would support the full OpenGL 4.  Nvidia's web site only says WebGL and HTML 5, though surely it supports more than that.  Even Tegra 3 supports more than that.

    Nvidia Shield is a hand-held gaming device, with both a gamepad and a screen built in.  It's powered by Tegra 4, of course, though a 38 watt-hour battery could easily make it awkwardly heavy.  It runs Google Android, so it might be a nifty device for Android gaming.  No word on the price tag, though.

    Nvidia Grid is Nvidia's attempt at making a cloud-gaming GPU.  Cloud-gaming enthusiasts have claimed that it will offer high-end gaming performance because you can get a high-end video card rendering the game and then streaming it to you.  Nvidia Grid most unambiguously cannot do that, as it's based on GK107, the lowest end GPU chip of the current generation.

    But hey, at least Nvidia was able to demonstrate that it works.  If you're only streaming over a LAN.  And running Ethernet rather than WiFi.  And happy with extremely low monitor resolutions that will make some games borderline unplayable.

    Which is how exactly no one in the world will ever reasonably use it.  If you want to stream long distances, Ethernet isn't an option.  If you want to stream over a LAN, you can get a higher end video card.  Such as every single current-generation card on the market apart from the crippled GeForce GT 640.  More than a few of them will probably cost less than Nvidia Grid, too, as Grid has four GPUs in a single "card", and enterprise-focused products tend to carry a heavy price premium.  See pricing of Tesla and Quadro cards for examples of this.

    Still, I can see why they made that choice:  less cost and less heat for companies like OnLive that run cloud gaming services.  And OnLive could probably run medium settings, tell their customers that it's super-ultra-max settings, and hardly any of them would ever know the difference.  Often, you wouldn't even be able to tell the difference after the very lossy video compression, anyway.

    I think the big deal with GRID is that they don't need to have 1 GPU per customer. They have 20 devices in a rack that is comparable to 700 Xbox 360s. Performance is important though...I don't know if people will be stoked about Xbox 360 type performance even this year, much less in the next few years.
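
    Taking those marketing numbers at face value, the per-box arithmetic is simple enough to sketch; the figures below are just the claims quoted above, not independent measurements.

        #include <cstdio>

        // Back-of-envelope check on the claim above:
        // 20 devices in a rack comparable to 700 Xbox 360s.
        int main() {
            const int xbox_equivalents = 700;  // claimed
            const int boxes_per_rack   = 20;   // claimed
            printf("Xbox-360-class sessions per box: %d\n",
                   xbox_equivalents / boxes_per_rack);  // 35
            // So the claim amounts to roughly 35 concurrent users per box,
            // each seeing something like 2005-era console performance.
            return 0;
        }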

    I can not remember winning or losing a single debate on the internet.

  • Garkan Member, Posts: 552
    As a gadget fan the Tegra news is exciting; I love my tablets and smartphones. The rest is fairly meh; the handheld is dead in the water. The Vita shows no one really wants a high-performance handheld, and devs and publishers don't seem all that bothered about releasing games for it. I don't exactly regret getting one, but at the current pace of game releases for it, that time might not be too far away. I won't fall for buying another non-Nintendo handheld again.

    Currently playing:

    EVE online (Ruining low sec one hotdrop at a time)

    Gravity Rush,
    Dishonoured: The Knife of Dunwall.

    (Waiting for) Metro: Last Light,
    Company of Heroes II.

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by lizardbones

     





    I think the big deal with GRID is that they don't need to have 1 GPU per customer. They have 20 devices in a rack that is comparable to 700 Xbox 360s. Performance is important though...I don't know if people will be stoked about Xbox 360 type performance even this year, much less in the next few years.

     

    Instead of having one entry-level GPU per customer, they can have one entry-level GPU split among multiple customers?  That gets you performance that can't even keep pace with integrated graphics, even before the problems of streaming.  What reason is there to buy such a service apart from not knowing any better?

  • TheLizardbones Member Common, Posts: 10,910


    Originally posted by Quizzical

    Instead of having one entry-level GPU per customer, they can have one entry-level GPU split among multiple customers?  That gets you performance that can't even keep pace with integrated graphics, even before the problems of streaming.  What reason is there to buy such a service apart from not knowing any better?

    No, one box would give you 35 entry-level GPUs. So twenty boxes give you 700 customers, each with entry-level GPU performance. This is according to them.

    ** edit **
    So you have the equivalent of an entry-level GPU per customer, just with far fewer physical GPUs. OnLive needed a physical GPU per customer.

    I can not remember winning or losing a single debate on the internet.

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by Garkan
    As a gadget fan the Tegra news is exciting, I love my tablets and smartphones. The rest is fairly meh, the handheld is dead in the water. The Vita shows noone really wants a high performance handheld and devs and publishers dont seem all that bothered about releasing games for it. I dont exactly regret getting one but at the current pace of game releases for it that time might not be to far away. I wont be falling for buying another non Nintendo handheld again.

    This is very different from the PlayStation Vita.  In order to have a game run on Shield, you don't need to code the game specifically for Shield.  For starters, any Android game will run--and usually run at least as well on this as on any other Android device in existence--and there are already quite a lot of those.  Additionally, if a game has a Linux or Mac version even if not specifically an Android version, even that might (or might not) run with minimal effort.

    Additionally, it's an open platform.  That means that if you make a game for Nvidia Shield, you get to keep whatever revenue you can get from it.  For Vita, you have to give a substantial chunk of that revenue to Sony.  Furthermore, even if you make a game with Nvidia Shield in mind, porting it to run on a lot of other tablets that are coming very soon will be easy--if it even requires any additional work on the developer's part at all.  If you make a game for Vita, it runs on Vita, and making it run on anything else is far from trivial.

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by lizardbones

    No, one box would give you 35 entry level gpus. So twenty boxes give you 700 customers, each with entry level gpu performance. This is according to them.

    ** edit **
    So you have the equivalent of an entry level gpu per customer, just with far fewer physical gpus. On Live needed a physical gpu per customer.

     

    One physical Nvidia Grid card has four physical GPUs in it, and they're the lowest end discrete GPU of the generation--from either major graphics vendor.  They'll probably be clocked down from GeForce cards, and might even be paired with DDR3 memory instead of GDDR5.  That's not going to get you the performance of 35 entry level GPUs at once, unless by "entry-level", you mean something ancient like GeForce G 210 or Radeon HD 4350.
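
    For a sense of what the DDR3-versus-GDDR5 difference would mean, here is a rough bandwidth comparison on a 128-bit bus; the transfer rates are typical illustrative values, not confirmed Grid specifications.

        #include <cstdio>

        // Memory bandwidth = bus width in bytes * effective transfer rate.
        // The transfer rates below are illustrative, not confirmed Grid specs.
        int main() {
            const double bus_bytes  = 128.0 / 8.0;  // 128-bit bus
            const double ddr3_rate  = 1800e6;       // ~DDR3-1800, transfers/s
            const double gddr5_rate = 5000e6;       // ~5 GT/s GDDR5

            printf("DDR3  (128-bit): %.1f GB/s\n", bus_bytes * ddr3_rate  / 1e9);  // ~28.8
            printf("GDDR5 (128-bit): %.1f GB/s\n", bus_bytes * gddr5_rate / 1e9);  // ~80.0
            return 0;
        }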

  • TheLizardbones Member Common, Posts: 10,910


    Originally posted by Quizzical
    One physical Nvidia Grid card has four physical GPUs in it, and they're the lowest end discrete GPU of the generation--from either major graphics vendor.  They'll probably be clocked down from GeForce cards, and might even be paired with DDR3 memory instead of GDDR5.  That's not going to get you the performance of 35 entry level GPUs at once, unless by "entry-level", you mean something ancient like GeForce G 210 or Radeon HD 4350.


    I have no idea. What they are saying is each box has 24 GPUs, which is a substantial cost improvement over having 24 discrete video cards. These are GPUs developed specifically for the GRID servers, not their regular GPUs. They've combined this with software that allows for load balancing and virtual hardware stacks(?), which means one GPU can support several users at a hardware cost reduction, and also a substantial power requirement reduction.
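
    Combining the two figures quoted in this thread (roughly 35 users per box from the 700-Xbox claim, and 24 GPUs per box) gives a sense of how much sharing is actually involved; this is only a sketch built on those quoted numbers.

        #include <cstdio>

        // Sessions per physical GPU, using only the figures quoted in the thread.
        int main() {
            const double users_per_box = 700.0 / 20.0;  // ~35, from the rack claim
            const double gpus_per_box  = 24.0;          // quoted above
            printf("Concurrent sessions per physical GPU: ~%.1f\n",
                   users_per_box / gpus_per_box);       // ~1.5
            // Under those numbers each GPU is shared by only one or two
            // active game sessions, not dozens.
            return 0;
        }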

    I can not remember winning or losing a single debate on the internet.

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by lizardbones

     



    I have no idea. What they are saying is each box has 24 GPUs, which is a substantial cost improvement over having 24 discrete video cards. These are GPUs developed specifically for the GRID servers, not their regular GPUs. They've combined this with software that allows for load balancing and virtual hardware stacks(?), which means one GPU can support several users at a hardware cost reduction, and also a substantial power requirement reduction.

     

    Nvidia now says that there are two such cards.

    http://www.nvidia.com/object/grid-boards.html

    The lower end version (Grid K1) is four GK107 chips paired with DDR3 memory in a 130 W TDP.  That means it's basically four of these, except clocked a lot lower:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16814130818

    That's stupidly overpriced, by the way; on a price/performance basis, you could maybe justify paying $70, but not more than a far superior Radeon HD 7750 costs.  Oh, and that's before you turn the clock speeds way down to save on power consumption.

    The higher end version (Grid K2) is two GK104 GPUs in a 225 W TDP.  That means basically two 4 GB GeForce GTX 680s, except clocked a lot lower, in order to have two of them on a card barely use more power than a single "real" GTX 680.
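
    Dividing the board TDPs by the GPU count gives a rough idea of how far down those clocks have to come; the GeForce figures used for comparison are the commonly quoted retail TDPs.

        #include <cstdio>

        // Per-GPU power budget implied by the Grid board TDPs Nvidia lists.
        int main() {
            printf("Grid K1: 130 W / 4 GPUs = %.1f W per GK107\n", 130.0 / 4);  // ~32.5 W
            printf("Grid K2: 225 W / 2 GPUs = %.1f W per GK104\n", 225.0 / 2);  // ~112.5 W
            // For comparison, retail cards built on the same chips are commonly
            // rated around 65 W (GeForce GT 640) and 195 W (GeForce GTX 680),
            // so the Grid parts must run at noticeably lower clocks and voltages.
            return 0;
        }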

    Now yes, the Grid cards might make a lot of sense for a service like OnLive.  (AMD either offers or will soon offer Trinity-based Opteron chips with integrated graphics that might also make a ton of sense for something like OnLive.)  What doesn't make sense is for customers to pay for any streaming service based on the Grid K1 cards.  But OnLive was always targeted mainly at the clueless, so nothing changes there.

    They aren't custom chips.  You don't do custom chips for low-volume parts.  Nvidia doesn't even do custom chips for Quadro cards, and that's a huge cash cow.  A different bin of existing chips, yes, but that's far from doing a custom chip.  It might be a special salvage bin with something fused off that the consumer cards need.  Nvidia even explicitly says that they're Kepler GPU chips.

  • Quizzical Member Legendary, Posts: 25,348

    Actually, hang on a moment.  Nvidia says Grid K1 has 768 shaders.  That would mean 4 SMXes.  Spread across 4 GPU chips, that's one SMX per GPU.  GK107 has two.  So Grid K1 could either be GK107 salvage parts or the inaugural part on a new low end GPU chip, perhaps a GK118 or GK119 of sorts.  That would be just like how with Fermi, they made GF108 with two SMs, then later made GF119 with one.  That explains how they got the power consumption down.
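
    The arithmetic behind that: Kepler groups its shaders into SMX units of 192, so the shader count Nvidia quotes pins down the SMX count directly. A quick sketch:

        #include <cstdio>

        // Kepler packs 192 shaders ("CUDA cores") into each SMX.
        int main() {
            const int shaders_per_smx = 192;  // Kepler SMX size
            const int grid_k1_shaders = 768;  // Nvidia's figure for Grid K1
            const int gpus_on_board   = 4;

            int smx_total = grid_k1_shaders / shaders_per_smx;         // 4
            printf("SMXes on the whole K1 board: %d\n", smx_total);
            printf("SMXes per GPU: %d\n", smx_total / gpus_on_board);  // 1
            // GK107 has two SMXes, so one SMX per GPU implies either salvage
            // parts or a new, smaller chip -- which is the point made above.
            return 0;
        }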

    Entry-level, indeed.  That's going to get crushed by integrated graphics, no matter how you clock it.  And that's if you've just got one customer trying to use the card at a time.  Want to split it among 36?  Well then, Nvidia Shield would probably give you more graphics performance.  In a tablet.  Maybe soon in a cell phone.

    And that leaves me wondering what the intended market for Grid is.  Maybe you sell a few dozen racks to OnLive before they go broke again.  Then again, maybe not, as the official reason they went broke the first time is that they bought five times as much capacity as they needed and couldn't get customers to use it.

    And then what?  Virtualization over a LAN in a corporate environment?  Maybe that's what Nvidia has in mind, as their page talks about VMware and Citrix.  For office applications, streaming is probably more computationally intensive than just rendering it locally.  Maybe being able to do relatively simple graphics work while having everything stored and secured centrally is the real goal.

    Or maybe this is supposed to be an energy-efficient alternative to Tesla for certain workloads that scale well to many weak GPUs that don't need to communicate with each other much?  In other words, Grid is to Tesla as Calxeda and SeaMicro are to Xeon and Opteron?  I don't see it playing out that way, and Nvidia doesn't seem to be pushing that, but it's plausible.

    Nvidia does seem to think that Grid will be useful for cloud gaming.  But I don't see how, other than a service that thinks they can separate idiots from their money by overcharging for a badly inferior product.  It's worthless to home users, as if you only need to stream one game at a time, a single GeForce (or Radeon) card will do it better and cheaper.

    I can't see it making sense for cyber cafes, either, unless somewhere in the world the culture is that people go there looking for low-end graphics performance.  Nvidia will probably charge enough for Grid that they'd be better off getting ordinary GeForce or Radeon cards, just like they do now.

  • Gabby-air Member Uncommon, Posts: 3,440
    Originally posted by Quizzical

    This is very different from the PlayStation Vita.  In order to have a game run on Shield, you don't need to code the game specifically for Shield.  For starters, any Android game will run--and usually run at least as well on this as on any other Android device in existence--and there are already quite a lot of those.  Additionally, if a game has a Linux or Mac version even if not specifically an Android version, even that might (or might not) run with minimal effort.

    Additionally, it's an open platform.  That means that if you make a game for Nvidia Shield, you get to keep whatever revenue you can get from it.  For Vita, you have to give a substantial chunk of that revenue to Sony.  Furthermore, even if you make a game with Nvidia Shield in mind, porting it to run on a lot of other tablets that are coming very soon will be easy--if it even requires any additional work on the developer's part at all.  If you make a game for Vita, it runs on Vita, and making it run on anything else is far from trivial.

    Google takes a 30% cut on revenue from android apps. 

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by Gabby-air

    Google takes a 30% cut on revenue from android apps. 

    Only if you buy them through Google's store.

  • Quizzical Member Legendary, Posts: 25,348

    Apparently Tegra 4 does support the full OpenGL 4.0.  But only 4.0, not 4.3.

    Multiple sites are reporting that Tegra 4 does not feature unified shaders.  That's a peculiar development that leads me to wonder how many shaders of each type it has available.  And how it will manage to make disparate shaders work with OpenCL.  You could have dedicated vertex shaders and fragment shaders when you only needed two types.  But now there are six programmable pipeline stages (five if they don't do compute shaders), and having six separate types of dedicated shaders strikes me as rather strange.

    Maybe it's two different types of shaders or something.  What you'd need for tessellation evaluation shaders is nearly identical to what you'd need for vertex shaders, for example.  Indeed, porting shaders from OpenGL 4 to OpenGL 3 is largely a matter of renaming tessellation evaluation shaders as vertex shaders, and then making some minor tweaks.

  • Ridelynn Member Epic, Posts: 7,383


    Originally posted by Quizzical

    Instead of having one entry-level GPU per customer, they can have one entry-level GPU split among multiple customers?  That gets you performance that can't even keep pace with integrated graphics, even before the problems of streaming.  What reason is there to buy such a service apart from not knowing any better?

    The idea is that you have 24 GPUs in a chassis. These are presumably SLI'ed in some fashion (similarly to how they are doing it for high performance applications). And with a nice backplane, there's no reason you can't take multiple chassis and have them talk to each other in a single cabinet. And multiple cabinets talk to each other...

    You end up with something like this.

    As a gamer, your game may be running on 10 GPUs, or it may run on a single shared GPU. It will use as many GPU cycles as the owner of the Grid allows it to, but the Grid device is designed to just be a scalable computing resource.

    It's not necessarily one GPU per client. It's "a big mess of GPU compute power" provided for a "big mess of customer-driven graphics workloads".

  • TheLizardbones Member Common, Posts: 10,910


    Originally posted by Ridelynn

    The idea is that you have 24 GPUs in a chassis. These are presumably SLI'ed in some fashion (similarly to how they are doing it for high performance applications). And with a nice backplane, there's no reason you can take multiple chassis and have them talk to each other in a single cabinet. And multiple cabinets talk to each other...

    You end up with Something like this.

    As a gamer, your game may be running on 10 GPUs, or it may run on a single shared GPU. It will use as many GPU cycles as whomever the owner of the Grid allows it to, but the Grid device is designed to just be a scalable computing resource.

    It's not necessarily one GPU per client. It's "A big mess of GPU computer power" provided for a "big mess of customer-driver graphics workloads".




    It seems to work the same way that virtual machines work in a business environment. The system just pushes power around to wherever it's needed, regardless of how many or how few gpu cycles it requires. Except instead of cpu or memory, it does this with the GPU, which if I understand correctly hasn't been done before.

    You can stack the devices together, but I don't know if they all work together as one large unit. You might have 35 players per box, or 700 players per 20 boxes. Dunno.

    I can not remember winning or losing a single debate on the internet.

  • KhinRunite Member, Posts: 879
    Originally posted by Garkan
    As a gadget fan the Tegra news is exciting, I love my tablets and smartphones. The rest is fairly meh, the handheld is dead in the water. The Vita shows noone really wants a high performance handheld and devs and publishers dont seem all that bothered about releasing games for it. I dont exactly regret getting one but at the current pace of game releases for it that time might not be to far away. I wont be falling for buying another non Nintendo handheld again.

    Vita is a closed platform, and is expensive. While we don't know for sure if Shield will be any cheaper, it's definitely running on a more open ecosystem. You've got all your Android games available, and you get to stream from the PC. Vita only has the PSN.

    The 3DS also struggled before the price cut.

  • KyutaSyuko Member Uncommon, Posts: 288
    Originally posted by KhinRunite

    Vita is a closed platform, and is expensive. While we don't know for sure if Shield will be any cheaper, it's definitely running on a more open ecosystem. You got your all your android games available, and you get to stream from the PC. Vita only has the PSN.

    The 3DS also struggled before the price cut.

    To go along with this -- if I remember correctly -- the original PSP wasn't doing so well either, and everyone figured it would end up tanking.  Just saying I'm not giving up hope on my Vita.  Though if the third-party support doesn't pick up by the end of this year...

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by lizardbones

     





    It seems to work the same way that virtual machines work in a business environment. The system just pushes power around to wherever it's needed, regardless of how many or how few gpu cycles it requires. Except instead of cpu or memory, it does this with the GPU, which if I understand correctly hasn't been done before.

    You can stack the devices together, but I don't know if they all work together as one large unit. You might have 35 players per box, or 700 players per 20 boxes. Dunno.

    In order for a supercomputer to be useful, you have to have an application that readily splits into an enormous number of threads that don't need to communicate with each other all that much.  That's basically the opposite of what a GPU does with games.

    Neither AMD nor Nvidia has figured out how to split rendering a single frame of a game across two GPUs.  Or at least, they can't do it well enough for it to have a point.  Lucid tried, with results that were often slower than merely ignoring the slower GPU and using only the faster one.  Even with two GPUs on a single card, or a dedicated CrossFire or SLI bridge, the latency is far too high and the bandwidth far too low.

    And now you think that Nvidia Grid is going to split rendering a game across 10 GPUs that can barely communicate with each other at all?  I am totally not buying that.

    If you want to split rendering a frame across multiple GPUs, then for starters, all of the buffered data would have to be duplicated on every GPU.  That includes vertex data, textures, framebuffers, depth buffers, and so forth.  Oh, and framebuffers and the depth buffer change constantly as you render a frame, so you'd have to send massive amounts of data back and forth.
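
    For scale, even just keeping the render targets themselves in sync is non-trivial; here is a sketch assuming a 1080p target with 32-bit color and 32-bit depth (illustrative figures, not any particular game's setup).

        #include <cstdio>

        // Size of the buffers that would have to stay coherent across GPUs,
        // assuming a 1080p target with 32-bit color and 32-bit depth.
        int main() {
            const double pixels   = 1920.0 * 1080.0;
            const double color_mb = pixels * 4 / 1e6;  // ~8.3 MB
            const double depth_mb = pixels * 4 / 1e6;  // ~8.3 MB
            printf("Color + depth per frame: ~%.1f MB\n", color_mb + depth_mb);
            // And both buffers are rewritten many times as a frame is drawn,
            // so the traffic needed to keep them coherent is far larger than
            // ~17 MB per frame.
            return 0;
        }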

    If you let each GPU compute half of a frame and then collate the data afterwards, they'll each have to do a ton of work on fragments that should have been skipped because of a failed depth buffer test, except that the fragment that previously wrote to the depth buffer was processed on the other GPU so you never see it and have to do all of the fragment processing anyway.  How much more work will that create?  20%?  50%?  If you're trying to spread it across 10 GPUs, you could easily double or triple the amount of work that you have to do.  That's not a recipe for efficiency.

    Trying to process objects on one side of the screen on one GPU and objects on the other side on another GPU isn't practical, either.  The GPU doesn't find out where on the screen an object is (or even whether it's on the screen at all) until rasterization, which comes after four programmable shader stages plus multiple fixed-function pipeline stages.

    Nor is it practical to do the earlier pipeline stages on one GPU, then hand it all off to another GPU to finish it.  Trying to load balance that between the GPUs would be impossible, as the proportion of the work done in different stages varies wildly from game to game, or by changing graphical settings within a single game, or even from one map area to another at the same settings in the same game.  The bandwidth needed would massively overwhelm even a PCI Express 3.0 x16 connection.

    While I'm not privy to the internal details of how GPUs work, I'd be shocked if they don't try to store at least some of the data passed from one pipeline stage to the next in GPU cache rather than writing it all to video memory, as the latter could easily overwhelm your video memory bandwidth.  If a GPU accessing its own video memory is far too slow for some purposes, then passing the data along to another GPU is going to be much worse.

    That's why CrossFire and SLI use alternate frame rendering.  Each GPU renders a complete frame on its own, and both are rendering at the same time.  That's a way to use multiple GPUs per user, and doesn't scale nearly as well as simply having a more powerful GPU.  You know that, of course.  But it still works vastly better than trying to split a frame across multiple GPUs.

    -----

    I think that the Grid K1 is meant for enterprise virtualization use similar to how they virtualize processors and system memory now.  I don't think it's meant for gaming.  It could perhaps be used by services like OnLive for games that are really light on GPU load.  If someone wants to stream Solitaire through OnLive, giving him a GeForce GTX 680 for it is quite a waste.  But the Grid K2 is the gaming one, and having multiple games rendering simultaneously on what is effectively a single GTX 680 could easily be viable.

  • Ridelynn Member Epic, Posts: 7,383


    Originally posted by Quizzical

    In order for a supercomputer to be useful, you have to have an application that readily splits into an enormous number of threads that don't need to communicate with each other all that much.  That's basically the opposite of what a GPU does with games.

    Not necessarily. Think about it - internal to a GPU there are hundreds of compute units - whether you call them Shader Processors or CUDA Cores or Stream Units or what have you. The GPU workload is already split across these in an extremely parallel fashion.

    The problem with SLI/CFX (and presumably Grid) isn't that the work doesn't split across an enormous number of threads: we already know that a typical GPU workload does that extremely well; the problem has been mainly in the software implementation. We've seen, in properly packaged software with driver support, efficiencies that approach 100% for dual GPU setups (I'll admit they taper off greatly after that in desktops, but the point is we can get scaling to work across multiple GPUs; maybe Grid doesn't use PCIe 3 or has some other means than the SLI bridge to accomplish better scaling - I don't know). The main problem is that it hasn't been generically that great - you can't just throw multiple GPUs at something and expect it to work; the software has to incorporate it to some degree, and the drivers have to be tweaked just so. And there's some loss of efficiency in making it convenient - using PCI buses with only a simple bridge connection to make it easy to install, and making the drivers compromising enough to allow for slight hardware mismatches and timing inaccuracies.

    With virtualization software, it would be very easy to genericize any number of discrete computing assets and then re-allocate them as needed. Amazon does this with general computing power and the Elastic Compute Cloud. It's not really any different; the driver just has to work a bit differently than it does on a typical Windows platform - and since nVidia is making all the hardware, and writes the driver, it wouldn't be very hard for them at all.

    It's definitely not meant for consumer gaming - and I agree to some extent, it's intended for enterprise-level applications. It's basically nVidia taking their recent supercomputer experience and trying to play a video game on it. We may see it used in baby supercomputers (for universities, say), or render farms (they already use similar technology). OnLive is the obvious assumption, but I think that model is too far ahead of its time - the internet infrastructure can't keep up with that. It obviously isn't for home use. But we may see something new come from it... it has some obvious implications for the Shield, and there could be more to that story that just hasn't been told yet.

  • Quizzical Member Legendary, Posts: 25,348
    Originally posted by Ridelynn

     



     

    Not necessarily. Think about it - internal to a GPU there are hundreds of compute units - rather you call them Shader Processors or CUDA Cores or Stream Units what have you. The GPU work load is split across these extremely paralleled.

    The problem with SLI/CFX (and presumably Grid) isn't that the work doesn't split across an enormous number of threads: We already know that a typical GPU workload does that extremely well already. it's been mainly in software implementation. We've seen in properly packaged software, with driver support, with efficiencies that approach 100% for dual GPU setups (I'll admit they taper off greatly after that in Desktops, but the point is we can get scaling to work across multiple GPUs, maybe Grid doesn't use PCI3 or has some other means than the SLI bridge to accomplish better scaling - I don't know). The main problem is that it hasn't been generically that great - you can't just throw multiple GPU's and expect it to work, the software has to incorporate it to some degree, and the drivers have to be tweaked just so. And there's some loss of efficiency in making it convenient - using PCI busses with only a simple bridge connection to make it easy to install, and making the drivers compromising enough to allow for slight hardware mismatches and timing inaccuracies.

    With virtualization software, it would be very easy to genericize any number of discrete computing assests, and then re-allocate them as needed. Amazon does this with general computing power and Elastic Cloud Compute. It's not really any different, just the driver has to work a bit differently than it does on a typical Windows platform - and since nVidia is making all the hardware, and writes the driver, it wouldn't be very hard for them at all.

    It's definitely not meant for consumer gaming - and I agree to some extent, it's intended for enterprise-level applications. It's basically nVidia taking their recent supercomputer experience, and trying to play a video game on it. We may see it used in baby super computers (for like universities), or render farms (they already use similar technology). OnLive is the obvious assumption, but I think that model too far ahead of it's time - the internet superstructure can't keep up with that. It obviously isn't for home use. But we may see something new come from it... it has some obvious implications with the Shield, and there could be more to that story that just hasn't been told yet.

    In order for a GPU-friendly workload (which requires massive amounts of SIMD and little to no branching, among other things) to scale well to multiple GPUs, it's not enough for it to merely be trivial to break into an enormous number of threads.  It's also critical that those threads not need to communicate with each other very much.  Graphics computations trivially break into many thousands of threads if you need them to, but those threads have to communicate to an enormous degree.

    Let's dig into how the OpenGL pipeline works; DirectX is similar.  You start with a vertex shader.  This usually takes data from a vertex array object in video memory, though it can be streamed in from the CPU.  This reads in some vertex data and some uniforms, does some computations, produces some data, and outputs it.  One vertex shader doesn't have to talk to another vertex shader directly, but in order to properly vectorize the data, GPUs will probably tend to run the same vertex shader on a bunch of different vertices at the same time, since they all apply the same instructions in the same order, but merely to different starting data.  (That's what SIMD, or Single Instruction Multiple Data means.)

    Then you take outputs from vertex shaders and have to look at which patches (if you think of them as triangles, that's close enough) contain which vertices.  Each invocation of a tessellation control shader has to read in the vertex shader outputs from each of the vertices that the patch contains.  That means a tessellation control shader input typically corresponds to three or so sets of vertex shader outputs.  It doesn't have to be three; it can be any arbitrary number, but it will usually be three if you do tessellation in what I think is the obvious, straightforward manner; it's certainly the geometrically intuitive manner.

    Meanwhile, the same vertex is likely to be part of several different patches, so an average set of vertex shader outputs needs to get read in as input by several different tessellation control shaders.  For what I'm working on, the average number of tessellation control shader invocations that will read in a given vertex shader output is around 3-4, though it varies by which program you're using.  And it's not a simple case where the outputs corresponding to these three vertex shaders get read in inputs corresponding to those three patches.  Which vertices correspond to which patches can be arbitrary, and it's a many to many relationship.
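
    That many-to-many relationship is easy to see from an ordinary index buffer; here is a minimal sketch (the tiny two-quad mesh is made up purely for illustration).

        #include <cstdio>
        #include <vector>

        // Count how many triangles (patches) reference each vertex in an index
        // buffer.  Shared vertices are exactly why one vertex shader output is
        // read by several downstream shader invocations.
        int main() {
            // Two quads built from 6 vertices and 4 triangles (illustrative data).
            std::vector<unsigned> indices = {
                0, 1, 2,   2, 1, 3,   // first quad
                2, 3, 4,   4, 3, 5    // second quad
            };
            std::vector<int> uses(6, 0);
            for (unsigned i : indices) uses[i]++;

            for (size_t v = 0; v < uses.size(); ++v)
                printf("vertex %zu is read by %d triangles\n", v, uses[v]);
            // Vertices 2 and 3 are each read by three triangles here; the post
            // above quotes an average of roughly 3-4 for a real mesh.
            return 0;
        }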

    Then the tessellation control shader does a little bit of work before handing its outputs off to the hardware tessellator.  In some video cards, such as my Radeon HD 5850, there is only one hardware tessellator in the entire chip.  It doesn't matter which patch the data came from, or which surface, or even which program; it all has to go to the same hardware tessellator.  Some cards have multiple tessellators, but I think that's more for latency or bandwidth reasons than raw tessellation performance.

    So then the tessellator does its thing and creates a bunch of new vertices and a bunch of new data on how the vertices are adjacent.  A single patch sent to the tessellator can easily produce hundreds of vertices and hundreds of triangles, though it will typically produce far fewer.  (The OpenGL 4.2 specification requires hardware to support any tessellation degree up to at least 64, though hardware can optionally support higher than that.  Setting all tessellation degrees to 64 for triangles means each patch will correspond to 2977 vertices and 6144 triangles.)

    Then the vertex data that was output from the tessellation control shader gets input into tessellation evaluation shaders.  All of the data that was output for a given patch must be available as inputs for the next stage, and this includes outputs for each vertex of the patch.  Additionally, the hardware tessellator provides for each vertex barycentric coordinates to describe where within a patch each particular vertex that it outputs is.  This time, while each vertex in a tessellation evaluation shader corresponds to only one patch in a tessellation control shader, a single patch can correspond to many vertices.

    Then the tessellation evaluation shader does its thing, and outputs a bunch of data for geometry shaders.  The way data gets passed from tessellation evaluation shaders to geometry shaders is a lot like the way it gets passed from vertex shaders to tessellation control shaders:  each output will usually be input into several invocations of the next stage, and each input gathers data from several outputs from the previous stage.

    Then the geometry shader does whatever it does and outputs data for each triangle that is to be drawn.  This goes to the rasterizer that figures out which pixels on the screen correspond to a particular triangle.  It produces a fragment for each such pixel, and then sends the fragments on to fragment shaders.  There can be many fragments corresponding to the same triangle.  In extreme cases, a single triangle can correspond to millions of pixels.  While that's unusual, getting thousands of fragments from a single triangle is not.

    Meanwhile, the rasterizer has to take the data output from three separate vertices of a triangle and produce data at the particular point in the triangle corresponding to the pixel on the screen for each fragment.  It gives you a few choices on how data will be scaled, but it usually interpolates between the vertices, so that a given triangle from a geometry shader can correspond to many different fragments, all of which get different input data.

    Then the fragment shaders do their thing, and output the color of a fragment and possibly also the depth.  Then they take the depth that is output and check to see whether there is already some other fragment that has been drawn for that pixel in "front" of it.  If so, then the fragment that was computed is discarded as being "behind" something else in the scene.  If not, then the color and depth output from a fragment shader get written to that particular pixel for the framebuffer and depth buffer, respectively.

    Here, every fragment in every fragment shader has to use exactly the same framebuffer and depth buffer.  It doesn't matter if they're part of the same triangle, or generated by the same API drawing call, or even generated by the same program.

    So as you can see, there's a tremendous amount of communication between threads.  Passing data from one stage of the pipeline to the next is rarely a one to one relationship.  Sometimes it's one to many or many to one, or even many to many.

    Enormous amounts of data have to be passed around.  From a quick calculation, it's not at all obvious whether the rate at which non-texture data is fed into my fragment shaders alone stays below the video memory bandwidth of my entire video card.  And that's ignoring outputs, textures, and all of the other pipeline stages, not to mention the work that is actually done within the fragment shader.  That calculation does count uniforms separately for each fragment shader invocation, and they account for about half of the input data.
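
    To see why the numbers get scary, here's a deliberately rough calculation.  The per-fragment byte count, overdraw factor, and frame rate below are assumptions picked for illustration, not measurements from my renderer:

    #include <cstdio>

    int main()
    {
        // All of these figures are illustrative assumptions.
        double pixels       = 1920.0 * 1080.0;  // one 1080p frame
        double overdraw     = 4.0;              // average fragments shaded per pixel
        double bytesPerFrag = 256.0;            // interpolated inputs plus uniforms, counted per invocation
        double fps          = 60.0;

        double gbPerSecond = pixels * overdraw * bytesPerFrag * fps / 1e9;
        std::printf("fragment-stage input traffic: ~%.0f GB/s\n", gbPerSecond);

        // That lands in the same ballpark as the roughly 128 GB/s of dedicated
        // GDDR5 bandwidth on a card like a Radeon HD 5850, and it ignores texture
        // reads, framebuffer writes, and every other pipeline stage.
        return 0;
    }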

    But if you're having to pass around too much data from one stage to the next for even a GPU's own dedicated video memory to have enough bandwidth, then passing that sort of data from one GPU to another is completely out of the question.

    You can readily have different GPUs rendering different frames simultaneously.  That way, a GPU computes an entire frame on its own, and then only has to send the completed framebuffer somewhere else.  There is little dependence between the computations in one frame and those in the next, unless you neglect to wipe the framebuffer and depth buffer.
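
    That frame-at-a-time split is basically what alternate frame rendering does.  The scheduling idea fits in a few lines (hypothetical types, just to show that no mid-frame pipeline data ever has to cross between GPUs):

    #include <vector>

    struct Gpu   { int id; };
    struct Frame { long number; };

    // Alternate-frame rendering in its simplest form: frame N goes to
    // GPU (N mod gpuCount).  Each GPU renders its whole frame out of its own
    // memory and only ships the finished framebuffer somewhere else.
    static const Gpu& pickGpuForFrame(const std::vector<Gpu>& gpus, const Frame& f)
    {
        return gpus[f.number % gpus.size()];
    }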

    If you're trying to render a movie, where you can work on a bunch of frames simultaneously and don't care if each frame takes an hour to render, then this could work.  Though if that's your goal, I'd question why you need the "36 users for a single card" functionality.

    But spreading that across several GPUs simply isn't useful for rendering games, due to latency issues.  1000 GPUs each rendering one frame per second would get you an amazing 1000 frames per second--and leave the game completely unplayable because all you can see is the state of the game as it was a full second ago.
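
    The throughput-versus-latency distinction is just arithmetic:

    #include <cstdio>

    int main()
    {
        double secondsPerFrame = 1.0;    // each GPU takes a full second per frame
        int    gpuCount        = 1000;

        double throughputFps  = gpuCount / secondsPerFrame;  // 1000 frames per second
        double latencySeconds = secondsPerFrame;             // unchanged by adding GPUs

        std::printf("throughput: %.0f fps, but every frame shows the game as it was %.1f s ago\n",
                    throughputFps, latencySeconds);
        return 0;
    }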

  • RidelynnRidelynn Member EpicPosts: 7,383

    You're assuming that the GPU is a complete video-card-like unit, like on a PC - with its own power supply and memory on a card, and cross-talk between GPUs being forced to mirror the memory and push information over a PCI bus.

    What's to stop Grid from just having a lot of GPUs, but a common VRAM bank, or (much) higher-bandwidth communication lanes?

    It just says 24 GPUs per 4U rack unit. Nothing says these are 24 "video cards", or even how much VRAM they have access to or what the communication infrastructure is. If you look at a picture of a rack (there are 24 GPUs per rack, up to 20 racks per cabinet)... you can't even identify the discrete GPU units - it just looks like 4 hard drive cages and a big mess of fans running down the center of the rack (see link below for picture).


    Looking at nVidia's page - they include a hypervisor to "virtualize" the driver for software access, acting as a ringmaster for the hardware. It's set up almost identically to their high performance units (Titan Supercomputer is the latest example).

    http://www.nvidia.com/object/cloud-gaming-benefits.html

    No big surprises. I think it will work out fairly well as far as actual rendering performance. I just remain skeptical of the network latency - which has always been the real concern with "cloud gaming".

  • lkc673lkc673 Member Posts: 149

    Weird that companies are suddenly busting out handheld consoles. They must have done some good research to see that there is a market for handhelds with casual games. I'm not sure if I would fork out big bucks for a high-end handheld, though. With my games I would want to mod or do some creative stuff, and I doubt these handhelds can do that yet.

    Guess Microsoft is the only one without a handheld device now!!! Let's see if they make one soon!

  • QuizzicalQuizzical Member LegendaryPosts: 25,348
    Originally posted by Ridelynn

    You're assuming that the GPU is a complete video-card-like unit, like on a PC - with its own power supply and memory on a card, and cross-talk between GPUs being forced to mirror the memory and push information over a PCI bus.

    What's to stop Grid from just having a lot of GPUs, but a common VRAM bank, or (much) higher-bandwidth communication lanes?

    It just says 24 GPUs per 4U rack unit. Nothing says these are 24 "video cards", or even how much VRAM they have access to or what the communication infrastructure is. If you look at a picture of a rack (there are 24 GPUs per rack, up to 20 racks per cabinet)... you can't even identify the discrete GPU units - it just looks like 4 hard drive cages and a big mess of fans running down the center of the rack (see link below for picture).


    Looking at nVidia's page - they include a hypervisor to "virtualize" the driver for software access, acting as a ringmaster for the hardware. It's set up almost identically to their high performance units (Titan Supercomputer is the latest example).

    http://www.nvidia.com/object/cloud-gaming-benefits.html

    No big surprises. I think it will work out fairly well as far as actual rendering performance. I just remain skeptical of the network latency - which has always been the real concern with "cloud gaming".

    Nvidia tells you the number of discrete GPU chips on a card:

    http://www.nvidia.com/object/grid-boards.html

    It's four on a Grid K1 and two on a Grid K2.

    -----

    Getting enough bandwidth to feed everything is one of the key problems restricting GPU performance.  (Power consumption is the other big one.)  Indeed, the impracticality of getting enough memory bandwidth for it is the only thing stopping AMD from releasing integrated graphics that handily beats $100 discrete video cards.  AMD and Nvidia didn't move to expensive GDDR5 memory rather than cheap DDR3 just for fun.  They did it because there's no other practical way to get the video memory bandwidth needed for a higher end gaming card.
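
    For a sense of scale, the peak bandwidth math is simple; the bus widths and transfer rates below are typical example figures rather than any specific product:

    #include <cstdio>

    // Peak theoretical bandwidth = effective transfer rate * bus width in bytes.
    static double peakGBps(double gigatransfersPerSecond, int busBits)
    {
        return gigatransfersPerSecond * (busBits / 8.0);
    }

    int main()
    {
        // Typical example figures, not any specific product.
        std::printf("128-bit DDR3-1600:       ~%.1f GB/s\n", peakGBps(1.6, 128));
        std::printf("256-bit GDDR5 @ 6 GT/s:  ~%.1f GB/s\n", peakGBps(6.0, 256));
        return 0;
    }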

    Meanwhile, look at how the memory is laid out on a video card.  They put the GPU chip in the middle of a ring of memory chips, with a ton of traces all over the place connecting the memory to the GPU.  Getting enough bandwidth to a single higher end GPU that doesn't have to communicate with other GPUs at all is hard.

    And now you're surmising that Nvidia will magically deliver that kind of bandwidth spread evenly between two GPU chips at once?  And with ridiculous amounts of bandwidth connecting the two GPU chips?  If they could do that, then cloud rendering of graphics would be way down the list of obvious applications for it.  For starters, how about SLI and CrossFire that actually work perfectly 100% of the time?

  • QuizzicalQuizzical Member LegendaryPosts: 25,348
    Originally posted by lkc673

    Weird that companies are suddenly busting out handheld consoles. They must have done some good research to see that there is a market for handhelds with casual games. I'm not sure if I would fork out big bucks for a high-end handheld, though. With my games I would want to mod or do some creative stuff, and I doubt these handhelds can do that yet.

    Guess Microsoft is the only one without a handheld device now!!! Let's see if they make one soon!

    There's more likely to be a market after the hardware to do it decently is available (i.e., when the new generation of chips such as Tegra 4 comes to market) than before.  Will there be a strong market for it?  We'll find out.

  • TheLizardbonesTheLizardbones Member CommonPosts: 10,910


    Originally posted by Quizzical

    Originally posted by Ridelynn  

    Originally posted by Quizzical

    Originally posted by lizardbones  

    Originally posted by Ridelynn The idea is that you have 24 GPUs in a chassis. These are presumably SLI'ed in some fashion (similarly to how they are doing it for high-performance applications). And with a nice backplane, there's no reason you can't take multiple chassis and have them talk to each other in a single cabinet. And multiple cabinets talk to each other... You end up with something like this. As a gamer, your game may be running on 10 GPUs, or it may run on a single shared GPU. It will use as many GPU cycles as the owner of the Grid allows it to, but the Grid device is designed to just be a scalable computing resource. It's not necessarily one GPU per client. It's "a big mess of GPU computing power" provided for a "big mess of customer-driven graphics workloads".
    It seems to work the same way that virtual machines work in a business environment. The system just pushes power around to wherever it's needed, regardless of how many or how few GPU cycles it requires. Except instead of CPU or memory, it does this with the GPU, which if I understand correctly hasn't been done before. You can stack the devices together, but I don't know if they all work together as one large unit. You might have 35 players per box, or 700 players per 20 boxes. Dunno.
    In order for a supercomputer to be useful, you have to have an application that readily splits into an enormous number of threads that don't need to communicate with each other all that much.  That's basically the opposite of what a GPU does with games.
    Not necessarily. Think about it - internal to a GPU there are hundreds of compute units, whether you call them Shader Processors or CUDA Cores or Stream Units or what have you. The GPU workload is split across these in an extremely parallel fashion. The problem with SLI/CFX (and presumably Grid) isn't that the work doesn't split across an enormous number of threads: we already know that a typical GPU workload does that extremely well. It's been mainly in the software implementation. We've seen, in properly packaged software with driver support, efficiencies that approach 100% for dual-GPU setups (I'll admit they taper off greatly after that in desktops, but the point is we can get scaling to work across multiple GPUs; maybe Grid doesn't use PCIe 3.0 or has some other means than the SLI bridge to accomplish better scaling - I don't know).

    The main problem is that it hasn't been generically that great - you can't just throw multiple GPUs at it and expect it to work; the software has to incorporate it to some degree, and the drivers have to be tweaked just so. And there's some loss of efficiency in making it convenient - using PCI busses with only a simple bridge connection to make it easy to install, and making the drivers compromising enough to allow for slight hardware mismatches and timing inaccuracies.

    With virtualization software, it would be very easy to genericize any number of discrete computing assets, and then re-allocate them as needed. Amazon does this with general computing power and Elastic Compute Cloud. It's not really any different; the driver just has to work a bit differently than it does on a typical Windows platform - and since nVidia is making all the hardware, and writes the driver, it wouldn't be very hard for them at all.

    It's definitely not meant for consumer gaming - and I agree to some extent, it's intended for enterprise-level applications. It's basically nVidia taking their recent supercomputer experience and trying to play a video game on it. We may see it used in baby supercomputers (for like universities), or render farms (they already use similar technology). OnLive is the obvious assumption, but I think that model is too far ahead of its time - the internet infrastructure can't keep up with that. It obviously isn't for home use. But we may see something new come from it... it has some obvious implications with the Shield, and there could be more to that story that just hasn't been told yet.
    In order for a GPU-friendly workload (which requires massive amounts of SIMD and little to no branching, among other things) to scale well to multiple GPUs, it's not enough for it to merely be trivial to break into an enormous number of threads.  It's also critical that those threads not need to communicate with each other very much.  Graphics computations trivially break into many thousands of threads if you need them to, but those threads have to communicate to an enormous degree.

    Let's dig into how the OpenGL pipeline works; DirectX is similar.  You start with a vertex shader.  This usually takes data from a vertex array object in video memory, though it can be streamed in from the CPU.  This reads in some vertex data and some uniforms, does some computations, produces some data, and outputs it.  One vertex shader doesn't have to talk to another vertex shader directly, but in order to properly vectorize the data, GPUs will probably tend to run the same vertex shader on a bunch of different vertices at the same time, since they all apply the same instructions in the same order, but merely to different starting data.  (That's what SIMD, or Single Instruction Multiple Data means.)

    Then you take outputs from vertex shaders and have to look at which patches (if you think of them as triangles, that's close enough) contain which vertices.  Each invocation of a tessellation control shader has to read in the vertex shader outputs from each of the vertices that the patch contains.  That means a tessellation control shader input typically corresponds to three or so sets of vertex shader outputs.  It doesn't have to be three; it can be any arbitrary number, but it will usually be three if you do tessellation in what I think is the obvious, straightforward manner; it's certainly the geometrically intuitive manner.

    Meanwhile, the same vertex is likely to be part of several different patches, so an average set of vertex shader outputs needs to get read in as input by several different tessellation control shaders.  For what I'm working on, the average number of tessellation control shader invocations that will read in a given vertex shader output is around 3-4, though it varies by which program you're using.  And it's not a simple case where the outputs corresponding to these three vertex shaders get read in as inputs corresponding to those three patches.  Which vertices correspond to which patches can be arbitrary, and it's a many-to-many relationship.

    Then the tessellation control shader does a little bit of work before handing its outputs off to the hardware tessellator.  In some video cards, such as my Radeon HD 5850, there is only one hardware tessellator in the entire chip.  It doesn't matter which patch the data came from, or which surface, or even which program; it all has to go to the same hardware tessellator.  Some cards have multiple tessellators, but I think that's more for latency or bandwidth reasons than raw tessellation performance.

    So then the tessellator does its thing and creates a bunch of new vertices and a bunch of new data on how the vertices are adjacent.  A single patch sent to the tessellator can easily produce hundreds of vertices and hundreds of triangles, though it will typically produce far fewer.  (The OpenGL 4.2 specification requires hardware to support any tessellation degree up to at least 64, though hardware can optionally support higher than that.  Setting all tessellation degrees to 64 for triangles means each patch will correspond to 2977 vertices and 6144 triangles.)

    Then the vertex data that was output from the tessellation control shader gets input into tessellation evaluation shaders.  All of the data that was output for a given patch, including the outputs for each vertex of the patch, must be available as inputs to every tessellation evaluation shader invocation generated from that patch.  Additionally, for each vertex it creates, the hardware tessellator provides barycentric coordinates describing where within the patch that vertex lies.  This time, while each vertex in a tessellation evaluation shader corresponds to only one patch in a tessellation control shader, a single patch can correspond to many vertices.

    Then the tessellation evaluation shader does its thing, and outputs a bunch of data for geometry shaders.  The way data gets passed from tessellation evaluation shaders to geometry shaders is a lot like the way it gets passed from vertex shaders to tessellation control shaders:  each output will usually be input into several invocations of the next stage, and each input gathers data from several outputs from the previous stage.

    Then the geometry shader does whatever it does and outputs data for each triangle that is to be drawn.  This goes to the rasterizer that figures out which pixels on the screen correspond to a particular triangle.  It produces a fragment for each such pixel, and then sends the fragments on to fragment shaders.  There can be many fragments corresponding to the same triangle.  In extreme cases, a single triangle can correspond to millions of pixels.  While that's unusual, getting thousands of fragments from a single triangle is not.

    Meanwhile, the rasterizer has to take the data output from three separate vertices of a triangle and produce data at the particular point in the triangle corresponding to the pixel on the screen for each fragment.  It gives you a few choices on how the data will be interpolated, but it usually blends between the three vertices' outputs, so that a given triangle from a geometry shader can correspond to many different fragments, all of which get different input data.

    Then the fragment shaders do their thing, and output the color of a fragment and possibly also the depth.  Then they take the depth that is output and check to see whether there is already some other fragment that has been drawn for that pixel in "front" of it.  If so, then the fragment that was computed is discarded as being "behind" something else in the scene.  If not, then the color and depth output from a fragment shader get written to that particular pixel for the framebuffer and depth buffer, respectively.

    Here, every fragment in every fragment shader has to use exactly the same framebuffer and depth buffer.  It doesn't matter if they're part of the same triangle, or generated by the same API drawing call, or even generated by the same program.

    So as you can see, there's a tremendous amount of communication between threads.  Passing data from one stage of the pipeline to the next is rarely a one to one relationship.  Sometimes it's one to many or many to one, or even many to many.

    Enormous amounts of data have to be passed around.  From a quick calculation, it's not at all obvious whether the rate at which non-texture data is fed into my fragment shaders alone stays below the video memory bandwidth of my entire video card.  And that's ignoring outputs, textures, and all of the other pipeline stages, not to mention the work that is actually done within the fragment shader.  That calculation does count uniforms separately for each fragment shader invocation, and they account for about half of the input data.

    But if you're having to pass around too much data from one stage to the next for even a GPU's own dedicated video memory to have enough bandwidth, then passing that sort of data from one GPU to another is completely out of the question.

    You can readily have different GPUs rendering different frames simultaneously.  That way, a GPU computes an entire frame on its own, and then only has to send the completed framebuffer somewhere else.  There is little dependence between the computations in one frame and those in the next, unless you neglect to wipe the framebuffer and depth buffer.

    If you're trying to render a movie, where you can work on a bunch of frames simultaneously and don't care if each frame takes an hour to render, then this could work.  Though if that's your goal, I'd question why you need the "36 users for a single card" functionality.

    But spreading that across several GPUs simply isn't useful for rendering games, due to latency issues.  1000 GPUs each rendering one frame per second would get you an amazing 1000 frames per second--and leave the game completely unplayable because all you can see is the state of the game as it was a full second ago.



    Sometimes I think you are just trying to punch people with words. I'm going to leave all those words in there, even though anyone else who gets through them will have been punched in the brain.

    Wouldn't it make a lot more sense to have more than one player's game per GPU than to have multiple GPUs working on one player's game? That seems to be what they are attempting to do. Most games do not utilize 100% of a GPU's processing, so being able to capitalize on that would lead to hardware and power savings, even if it doesn't lead to a huge performance improvement. I'm pretty sure their incredibly boring presentation talked about power and cost savings, not a huge performance increase.

    Another thing is they are saying what they have doesn't exist anywhere else. Using multiple GPUs per game already exists with the SLI stuff. Multiple people utilizing a single GPU doesn't exist outside of their hardware. It's virtualized hardware for the GPU. I've not even looked at this type of thing for servers, but when running virtual machines, the virtual GPU hardware is bad. Bad, bad, bad, bad, bad. If they can work out a system where virtual GPUs are not garbage, they would have a new product. Much better than an old product that is being used in a suspicious way.
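
    Nvidia hasn't said how the sharing actually works, so take this as nothing more than a sketch of the general "many light sessions per GPU" idea; the types and the least-loaded placement policy here are invented for illustration and have nothing to do with GRID's real hypervisor:

    #include <vector>
    #include <cstddef>

    struct Session { int userId; double gpuLoad; };  // 0.0 - 1.0 of one GPU
    struct Gpu     { double load = 0.0; std::vector<int> users; };

    // Purely illustrative bin-packing: place each user's rendering session on the
    // least-loaded GPU, so several light sessions can share one chip.  Nothing
    // here reflects how NVIDIA's GRID hypervisor is actually implemented.
    static void assignSessions(std::vector<Gpu>& gpus, const std::vector<Session>& sessions)
    {
        for (const Session& s : sessions) {
            std::size_t best = 0;
            for (std::size_t i = 1; i < gpus.size(); ++i)
                if (gpus[i].load < gpus[best].load) best = i;
            gpus[best].load += s.gpuLoad;
            gpus[best].users.push_back(s.userId);
        }
    }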

    Whether or not they can do it is another question entirely. You seem to have questions about whether even one of their GPUs will give decent performance, much less multiple players' games utilizing a single GPU.


    I can not remember winning or losing a single debate on the internet.
