The Radeon RX Vega 64 tends to be a little faster than a GeForce GTX 1080 in DX12 or Vulkan games, and a little slower in games that use older APIs. The Vega 56 is somewhere between a GTX 1070 and a GTX 1080. Both are clearly slower than a GTX 1080 Ti at gaming.
If the only thing you care about is price and performance, then both the Radeon RX Vega 56 and 64 are a fairly good value if they show up at their stated MSRPs. The many out-of-stock links on Newegg make it clear that that hasn't happened yet.
But the power consumption merits mentioning, too, and it's a lot. Power consumption matters less in a desktop than pretty much any other form factor, but using 100 W more than a GTX 1080 to only roughly match its performance is a significant difference.
I'm reminded in a way of the Fermi versus Evergreen generation, though with the roles reversed. Back then, Nvidia built a compute card and AMD built a graphics card. Not surprisingly, Fermi was better at compute, while Evergreen was more efficient at graphics. This time, Vega is better at compute, while consumer Pascal is more efficient at graphics.
There are a couple of important differences here. One is that in the Evergreen versus Fermi generation, Nvidia built a big chip and AMD didn't. That allowed Nvidia to have the fastest card, even if it took about 70% more power and 70% more die space to squeeze out 10% more performance. That would be pretty neatly reversed this time if the GP102 die (GeForce GTX 1080 Ti and some Titan cards) didn't exist. But it does.
Another important difference is that people who want a monster compute card from Nvidia can get the Tesla P100, which adds a lot of compute stuff that the consumer Pascal cards lack. Double precision compute and ECC memory get the headlines, but I tend to see the improved cache sizes and bandwidths--which basically match Vega, by the way--as mattering more. There wasn't a great compute card version of Evergreen. Still, that's $6000 for the Tesla P100, as compared to $500 for a Vega 64. And I'd be absolutely shocked if the Vega 64 doesn't beat a Tesla P100 outright at a whole lot of compute applications, not to mention completely destroying a GTX 1080 Ti at a lot of compute purposes.
And yet another very important difference is that AMD was a lot more aggressive on pricing with Evergreen than Nvidia has been with Pascal. The Radeon HD 5870 was pretty indisputably the fastest single GPU card available for several months, and with an MSRP of only $380. Nvidia is charging considerably higher prices than that for Pascal, which allows AMD to price Vega in line with its performance to offer a decent value while still making good money on every card sold.
Good compute benchmarks are hard to come by and generally don't make it into reviews. If you write stupid code that does stupid stuff, you can get bar charts that aren't representative of anything--and most of your readers won't know the difference. For that matter, the person writing the review probably won't know the difference, either. And compute performance varies so wildly from one application to the next that there is no such thing as a "typical" compute score in roughly the sense that we can think of typical gaming performance. Benchmarking is hard, and GPU compute doesn't have a lot of well-known, well-optimized benchmarks that you can grab and run like gaming does.
But that doesn't matter that much to consumer cards. Most people who buy a consumer graphics card want to play games on it. Nvidia focusing more narrowly on graphics while AMD puts more compute hardware all up and down their lineup is a significant contributing factor to the fact that, for what most people buy a discrete video card for, Nvidia is simply more efficient. As with Fermi so many years ago, you can make a good case for buying the less efficient architecture if you don't care much about power consumption. But that's really not the ideal spot for AMD to be in.
Comments
https://imgur.com/5gLr5Zv
JHH at nVidia today announces that Volta isn't coming to the desktop anytime soon (but will be available in compute cards very soon), hints at a possible Pascal refresh for an 1100 series. Tells gamers to go buy Pascal...
He doesn't come out and say this part but... because Vega doesn't change anything, so nVidia can continue milking what it has.
Can comfortably beat an optimized 1070 and Polaris in raw throughput, but can't touch them in Hashes / Watt.
So unless things change soon, not looking like it will be a big miner card, for similar reasons that the 1080/1080Ti haven't been big.
Let that sink in.
Nvidia managed to produce cards that perform similarly (the 980 Ti and Titan Xm) using an EOL process node, stock-standard GDDR5, and two and a half years ahead.
https://hardforum.com/threads/amd-radeon-rx-vega-64-video-card-review-h.1941804/page-2#post-1043161564
https://techgage.com/article/a-look-at-amds-radeon-rx-vega-64-workstation-compute-performance/
Interesting read, as Quiz had guessed
I think one thing none of the reviews I've read have mentioned is OC headroom. IMO that usually gives you a good indication of the quality of the architecture. If AMD is basically pulling another Fiji and just selling the cards OC'd to hell and back so they can match Nvidia... then I'm not even remotely impressed, especially given the power consumption.
However, if there is 10-15% more headroom in there, then that means that they have a good, albeit power inefficient architecture on their hands and that bodes well.
One thing we can all agree on is that it's good for Nvidia to have something resembling competition at this point.
Edit: I should mention this is honestly my suspicion. Power consumption typically goes through the roof when you OC GPUs and CPUs, and the fact that the "highest end" version REQUIRED water cooling to reach those clocks IMO is very telling.
This is just a guess, but I'm thinking when these are out in the field for a while we're gonna see little to no overclockability.
"The surest way to corrupt a youth is to instruct him to hold in higher esteem those who think alike than those who think differently."
- Friedrich Nietzsche
Graphics is a really weird workload. I realize that it's the usual thing for video cards, but video cards have to do all sorts of crazy contortions in order to be something other than terrible at graphics.
Once you move away from graphics, the situation is a lot simpler. Unless you have a broken architecture like Kepler, or have latency issues usually caused by insufficient thread residency, you typically use 100% of some resource and your performance in a given kernel is very strongly correlated with a synthetic benchmark of that resource.
Usually the limiting factor is one of:
1) instruction throughput, as you're keeping the shaders busy
2) local memory bandwidth, as you're saturating local memory
3) global memory bandwidth, as you're saturating global memory (real-world usable bandwidth is far less than theoretical peak bandwidth), or
4) latency, as you don't have enough threads resident to saturate any resource
It is possible for some other things to be a bottleneck, such as L2 cache or heavy use of atomic operations. But those are much rarer.
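A rough way to picture the list above is to estimate how long a kernel would take on each resource alone and see which is largest. The sketch below does only that. The function name, the kernel's work totals, and the card's peak rates are all hypothetical, and latency (too few resident threads) is deliberately left out.

```python
def limiting_resource(flops, local_bytes, global_bytes,
                      peak_flops, local_bw, global_bw):
    """Return (name, seconds) for the resource that would take longest.

    Assumes the kernel fully saturates whichever resource it is bound by.
    Latency from insufficient thread residency is not modeled.
    """
    times = {
        "instruction throughput": flops / peak_flops,
        "local memory bandwidth": local_bytes / local_bw,
        "global memory bandwidth": global_bytes / global_bw,
    }
    name = max(times, key=times.get)
    return name, times[name]


# Hypothetical kernel on a hypothetical card, for illustration only.
name, seconds = limiting_resource(
    flops=2e12, local_bytes=1e12, global_bytes=1e11,
    peak_flops=12e12, local_bw=12e12, global_bw=0.4e12,
)
print(name, seconds)
```

With these made-up numbers the kernel does far more arithmetic than global memory traffic, yet global memory still ends up as the limit because its bandwidth is so much lower.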
Instruction throughput can vary some depending on which instructions you're using. But unless you're making heavy use of an instruction that one architecture has and another doesn't (or merely doesn't have enough of)--and this can as easily skew things in favor of AMD as in favor of Nvidia--it tends to be pretty strongly correlated with theoretical TFLOPS. There, a Vega 64 beats all else, including both a Tesla P100 and a Titan Xp. Though it's at least close.
For local memory bandwidth, you can just read off the paper specs and get real-world performance almost perfectly correlated with that. There are bank collisions, but they work basically identically on all of the recent GPU architectures. Theoretical local memory bandwidth in bytes per second is the TFLOPS number exactly on a Vega 64 or a Tesla P100, or half of it on consumer Pascal. A Vega 64 will beat all else there, and more than double a Titan Xp.
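The relationship between TFLOPS and local memory bandwidth can be checked with a little arithmetic. This is a sketch under assumptions: the unit counts, lanes per unit, 128 bytes per clock of local memory per unit, and boost clocks below are commonly quoted spec figures, not numbers taken from this post, and real clocks vary by card and load.

```python
import math


def peak_flops(units, lanes_per_unit, clock_hz):
    # One fused multiply-add per lane per clock is counted as 2 FLOPs.
    return units * lanes_per_unit * 2 * clock_hz


def local_bandwidth(units, bytes_per_clock_per_unit, clock_hz):
    return units * bytes_per_clock_per_unit * clock_hz


# (CUs or SMs, lanes per unit, local bytes/clock per unit, boost clock in Hz)
# Assumed spec figures, not from the post.
CARDS = {
    "Radeon RX Vega 64": (64, 64, 128, 1.546e9),
    "Tesla P100": (56, 64, 128, 1.480e9),
    "Titan Xp": (30, 128, 128, 1.582e9),
}


def summarize(card):
    units, lanes, local_bytes, clock = CARDS[card]
    return (peak_flops(units, lanes, clock),
            local_bandwidth(units, local_bytes, clock))


for card in CARDS:
    flops, bw = summarize(card)
    print(f"{card}: {flops / 1e12:.2f} TFLOPS, {bw / 1e12:.2f} TB/s local")
```

Under these assumptions, bytes per second equals FLOPS on the first two cards because each unit has 64 lanes, while the Titan Xp has 128 lanes per unit sharing the same 128 bytes per clock.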
For global memory bandwidth, the Tesla P100 is the top dog, and by quite a bit. Four stacks of HBM2 is more than two, and that's that. It looks like the Vega 64 is a little behind a GTX 1080 Ti, and further behind a Titan Xp, in the benchmark at your link. But a lot depends on the details of the benchmark. A lot of GPUs have global memory bandwidth vary significantly depending on whether a thread reads or writes 4 bytes at a time, or 8, or 16, or 32. Additionally, recent Nvidia GPUs tend to be faster at writes than reads, while recent AMD GPUs tend to be faster at reads than writes. Different GPUs have different preferences, and a single benchmark can't capture all of the relevant information.
Global memory bandwidth bottlenecks are pretty common in GPU compute. Sometimes it's basically forced on you by the nature of the algorithm. Often, it's caused by stupid code. If you're writing non-graphical GPU compute code and don't know what your bottleneck is, you probably have a global memory bottleneck caused by you doing something stupid in your code. If all you know how to do is put a wrapper around a CPU version of code, you'll probably hit global memory a lot more than necessary, and create a global memory bandwidth bottleneck whether there should have been one or not.
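One hypothetical way code ends up hitting global memory more than necessary is running a chain of elementwise steps as separate passes, each reading and writing the whole array, instead of one fused pass that keeps intermediates in registers. The sketch below just counts the traffic for that case. The function and its parameters are made up for illustration and do not describe any particular program.

```python
def global_traffic_bytes(n_elements, n_steps, fused, bytes_per_element=4):
    """Bytes read from plus written to global memory for a chain of
    n_steps elementwise steps over an array of n_elements values.

    Unfused: every step reads the array and writes it back.
    Fused: the array is read once and written once.
    """
    passes = 1 if fused else n_steps
    return 2 * n_elements * bytes_per_element * passes


unfused = global_traffic_bytes(1_000_000, 5, fused=False)
fused = global_traffic_bytes(1_000_000, 5, fused=True)
print(unfused, fused, unfused / fused)
```

In this toy case the arithmetic is identical either way, but the unfused version moves five times as much data through global memory.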
And then there is latency. Nvidia has a more sophisticated scheduler than AMD, which helps significantly. But the ability to have more threads resident helps more. Basically, you can have more threads until you run out of either register capacity or local memory capacity. The theoretical numbers there are pretty easy to read off:
Register capacity:
Radeon RX Vega 64: 16 MB
Tesla P100: 14 MB
GeForce GTX Titan Xp: 7.5 MB
Local memory capacity:
Radeon RX Vega 64: 4 MB
Tesla P100: 3.5 MB
GeForce GTX Titan Xp: 2.8125 MB
Guess who's going to win if the issue is latency? It's not guaranteed that Vega 64 will beat a Tesla P100 in such cases, but Nvidia's scheduler advantage isn't likely to let a Titan Xp keep pace with a Vega 64 here.
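The capacity figures above can be reproduced as the number of compute units (or SMs) times the per-unit size. The unit counts and per-unit sizes below are assumptions taken from commonly quoted specs rather than from this post, though they do multiply out to the same totals.

```python
KB = 1024
MB = 1024 * KB

# (CUs or SMs, register file bytes per unit, local memory bytes per unit)
# Assumed spec figures, not from the post.
SPECS = {
    "Radeon RX Vega 64": (64, 256 * KB, 64 * KB),
    "Tesla P100": (56, 256 * KB, 64 * KB),
    "GeForce GTX Titan Xp": (30, 256 * KB, 96 * KB),
}


def capacities_mb(card):
    """Return (total register MB, total local memory MB) for a card."""
    units, regs_per_unit, local_per_unit = SPECS[card]
    return units * regs_per_unit / MB, units * local_per_unit / MB


for card in SPECS:
    regs, local = capacities_mb(card)
    print(f"{card}: {regs} MB registers, {local} MB local memory")
```

Seen this way, the Titan Xp trails mainly because it has fewer, wider units, so there is less total on-chip storage to keep threads resident.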
It's also possible for different cards to have a different bottleneck from the same code. For example, if you have three instructions per local memory access and no bank collisions, you can get a local memory bottleneck on a Titan Xp, but an instruction throughput bottleneck on a Vega 64.
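The three-instructions-per-access example can be worked through per compute unit. This is a sketch assuming 4-byte accesses, 128 bytes per clock of local memory per unit, and 64 lanes per unit on Vega versus 128 on consumer Pascal. Those per-unit figures are assumptions, not numbers from this post.

```python
def limiter(instr_per_access, lanes_per_clock, local_bytes_per_clock,
            bytes_per_access=4):
    """Compare the clocks one unit spends on instructions versus local
    memory traffic, per local memory access, and name the larger one."""
    instr_clocks = instr_per_access / lanes_per_clock
    local_clocks = bytes_per_access / local_bytes_per_clock
    if instr_clocks > local_clocks:
        return "instruction throughput"
    if local_clocks > instr_clocks:
        return "local memory bandwidth"
    return "balanced"


# Assumed per-unit figures: (lanes per clock, local bytes per clock)
print("Vega 64:", limiter(3, 64, 128))
print("Titan Xp:", limiter(3, 128, 128))
```

Under these assumptions the balance point is two instructions per access on Vega and four on consumer Pascal, so three lands on opposite sides of it for the two cards.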
It's also possible for a benchmark to be a chain of multiple kernels, where different kernels have different bottlenecks. That gets you a weighted average of performance against different bottlenecks, kind of like if you had an SSD run a pure read benchmark until it read 10 GB of data, then a pure write benchmark until it wrote 10 GB of data, then reported the total time as a single benchmark score. I tend to find that sort of benchmark less enlightening about what is going on, but I'd expect it to be common. A single game benchmark is a weighted average of so many different things that no one really knows exactly how it ended up with the final numbers.
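The SSD analogy can be made concrete. When phases run back to back and only the total time is reported, the combined rate is total work over total time, which weights the slow phase more heavily than a simple average would. The read and write speeds below are hypothetical.

```python
def combined_rate(phases):
    """phases: list of (amount_of_work, rate) pairs run one after another.

    Returns total work divided by total time, i.e. the single score a
    benchmark would report for the whole chain.
    """
    total_work = sum(work for work, _ in phases)
    total_time = sum(work / rate for work, rate in phases)
    return total_work / total_time


# Hypothetical SSD: read 10 GB at 500 MB/s, then write 10 GB at 250 MB/s.
score = combined_rate([(10_000, 500), (10_000, 250)])
print(f"{score:.1f} MB/s")
```

The single score that comes out is lower than the 375 MB/s simple average and tells you nothing about which phase was the slow one.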
Note that I barely mentioned a GTX 1080 Ti above, and this closing paragraph is my first mention of a GeForce GTX 1080. The latter especially doesn't even belong in the discussion of good GPU compute cards. But a Radeon RX Vega 64 absolutely does.
I think this will be a trend for coming years. Hardware is already at the table top -- it's the software that's behind.
I've never understood the nvidia vs ati war. I've used both over the years -- 2 extra frames doesn't impress me, and technological support has always been the more pressing factor in what to buy. One advantage Nvidia still has right now, for example, is that its backwards support is still top notch. It's way behind AMD right now in benchmarks on the new graphics libraries and APIs, though. (OpenGL 4.3+, Vulkan (which is just OpenGL simplified and implementing new shader tech and OpenCL 2.0+))
I think what you meant is the advantage nVidia has with certain DirectX versions. The cards are just more efficient at it now. Since DirectX 10, nVidia has been lagging at implementation of the standard. As a result AMD typically played newer DirectX versions better until nVidia eventually caught up and surpassed AMD. I expect the same to happen with Volta. With Vulkan, the API is based on Mantle, so AMD will have an advantage here for some time.
https://m.hardocp.com/article/2017/01/30/amd_video_card_driver_performance_review_fine_wine/14
Having used Vulkan... Could care less if it's based on Mantle or whatever. It's OpenGL in the coding, but you no longer have to name everything an int, pretty much, and shader options have been simplified on the human side. Under the hood, it's still OpenGL + OpenCL. OpenGL with threading options, essentially.
Gamers definitely don't need ECC VRAM or Gigateramegaflops of memory bandwidth. Most of that is for very high precision and/or low fault tolerance work. There is a market for it, and you would know if you needed it.
Stability is very important in mission critical industry. In home, not so much. Even an operating system is considered non-critical data. It's always been possible however to get parts with ECC for home use, but who would facilitate it? Save for maybe a small business owner running PoS systems etc. (Point of Sale, not piece of shit.) ECC vram I could see being beneficial in CAD and home 3-D printing/modelling. That framebuffer gets locked up and you could end up between a rock and a hard place.
The other benefits would be in some bleeding edge. AMD's cpu to gpu sharing in new APU and Ryzen chips. Resource sharing VRAM as system RAM, using the gpu as a second processor (Why not it's already pretty much a cpu?) Adding ECC on top makes that significantly more stable.
It is possible for games to use a GPU for compute in addition to graphics. It used to be Nvidia pushing that via PhysX so that they could have major portions of a game run on the GPU on Nvidia and on the CPU if an ATI GPU was present, and then say, look, the game runs five times as fast on Nvidia as on ATI. That AMD GPUs were still branded as ATI is some indicator of how long ago that was.
Today, AMD is pushing GPU compute in games more than Nvidia, simply because AMD's GPU architecture will handle it better. That's partially because AMD puts more compute hardware not specific to graphics in their GPUs all up and down the lineup, and partially because AMD GPUs can handle splitting the same GPU between graphics and compute to have both running at the same time much better than Nvidia.
If a game used the GPU as much to do compute as graphics, it wouldn't be surprising if a Vega 64 beat a GeForce GTX 1080 Ti outright. Yes, a Ti, not just a 1080. But doing graphics properly is a lot of work, and if a game has the GPU do 10% as much compute work as graphics, that won't jump out at you in benchmarks.
Still, if all you do with your video card is play games and display the desktop, non-graphical compute shouldn't matter to you very much. It's the people who need massive amounts of compute performance and realize that a single server with 4 GPUs in it can sometimes replace an entire rack full of CPU based servers that really need to care about GPU compute performance. Unless you're considering buying racks full of servers for your home, that's not you.
Most memory technologies used for GPUs have no way to implement ECC, however. HBM2 is the first with a way to do it in hardware. There have been several GPUs that offered ECC over the years (Nvidia GF100/GF110, GK110/GK210, GP100, and AMD Hawaii), but it was only enabled in professional cards that cost several thousand dollars, and it came with a significant hit to memory capacity (as the extra bits for ECC had to be stored somewhere) and a large hit to memory bandwidth. If turning on ECC only cost you 20% of your memory bandwidth, that was doing pretty well.
The cheapest GPU to offer ECC memory ever had an MSRP of something like $3000. It's not a consumer feature.
The real question is whether you need ECC memory for a GPU, and even for enterprise use, the answer is usually "no". I strongly suspect that a lot of the people who think they need ECC on a GPU are just being stupid about it and thinking of it in terms of CPUs where a bit going bad can crash the OS.
It's a question of how bad one bit getting flipped is. If it will quickly wash out of the system and be gone, then ECC probably doesn't gain you anything. For graphics, if a bit getting flipped in memory means that once per year, one pixel of one frame at some random time is the wrong color, and it's gone as soon as that frame is over, do you care? Do you care enough about that one pixel per year to accept losing 10% off of your frame rates across the board for the entire year to avoid it? I say you shouldn't. If the bit is part of a texture, it might mean that one particular texture looks slightly wrong as viewed from certain distances until it gets reloaded onto the GPU, but that's still not a big deal.
Now, there are some algorithms where one bit getting flipped will propagate through the system and turn everything to garbage. If you're doing something that is spread across a ton of GPUs in an HPC and constantly passing data back and forth between them, so that one bit getting flipped means that everything on all of the GPUs ends up as garbage, that's much more of a problem. If you have an HPC with 1000 GPUs in it, and one bit getting flipped means that everything that the entire HPC did yesterday is garbage, then you want ECC memory on your GPU.
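How bad a single flipped bit is depends a lot on which bit it is and what happens to the value afterward. As a small illustration (a sketch using Python's standard struct module, not anything GPU-specific), flipping the lowest bit of a 32-bit float barely changes it, while flipping a high exponent bit can turn 1.0 into infinity, which will poison anything it is later multiplied or added into.

```python
import struct


def flip_bit(value, bit):
    """Flip one bit (0 = least significant) of a 32-bit float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped


print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny change
print(flip_bit(1.0, 30))  # top exponent bit: no longer a finite number
```

If the damaged value is a pixel color that is discarded after one frame, either flip washes out. If it feeds an iteration shared across many GPUs, the second kind is the one that turns everything to garbage.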
It's also important to note that even without ECC memory, there's a lot of other error detection and correction on a GPU. PCI Express data transfers have error correction protocols in place, which is why moving 128 bits of data from the CPU to the GPU or vice versa requires actually transmitting at least 130 bits. GDDR5 has a way to detect that a bit was transmitted wrongly from memory and ask for it to be resent. Various critical logic circuits throughout a GPU have various sorts of error detection and correction even on lower end consumer grade hardware. ECC memory on a GPU only protects against data sitting in memory having a bit flipped while the data just sits there not being used. And even without ECC, that doesn't happen very often.
Even if you're explicitly writing kernel code to do image processing on a GPU, I wouldn't expect ECC to benefit you much. The image quality improvements from a slight improvement to your algorithm could easily be millions of times as large as the gains from ECC.
There is error correction built into a lot of things, and how much depends greatly on how likely errors are. For SATA, the overhead of error detection and correction is 20% of the data sent. Hard drives and solid state drives use a significant percentage of their physical storage space for error correction. For some wireless things, error detection and correction and other protocol overhead can account for a considerable majority of the data transmitted, to try to make up for sending data over the air being intrinsically so unreliable.
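The PCI Express and SATA figures mentioned above can be compared as fractions of the bits actually sent. The 128-in-130 figure is from the discussion above. Treating the SATA 20% as 8 payload bits per 10 transmitted is an assumption about where that number comes from.

```python
def overhead_fraction(payload_bits, transmitted_bits):
    """Fraction of transmitted bits that are not payload."""
    return (transmitted_bits - payload_bits) / transmitted_bits


pcie = overhead_fraction(128, 130)  # 130 bits sent per 128 bits of data
sata = overhead_fraction(8, 10)     # assumed 10 bits sent per 8 bits of data
print(f"PCI Express: {pcie:.2%}, SATA: {sata:.2%}")
```

The gap between roughly 1.5% and 20% shows how much the overhead a link is willing to pay can differ from one interface to the next.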
https://techgage.com/news/amd-issues-statement-regarding-price-hiked-vega/
I tried AMD cards and I have serious issues with them. Unless they want to donate me a $1000 card to test, I am done with Nvidia until I see proof it's really improved, or I'd have to purchase a product at the store and test it, and then, if it does perform as promised, return it for a refund.