I have a longstanding theory that the first part of a major new architecture is far more important than derivative parts. I don't mean from a commercial perspective, but more in terms of predicting how the company is going to do in the next few years, as a stock trader might wish to do. The reason that the FX-8150 was so disastrous for AMD is not that it was merely a single bad SKU. It meant that outside of the cat cores for the low end, AMD's CPUs would be rather bad for the next 5 1/2 years.
Sometimes the first part of a new architecture is an oddball that makes it hard to judge the architecture. That was the case with Maxwell and the GeForce GTX 750 Ti, for example. The bottom of the line Maxwell part had what was at the time the largest L2 cache for any GPU ever. That took a lot of die space and made the card cost far too much for a lower end card, making it hard to gauge how efficient the architecture was otherwise. It wasn't until the launch of the GeForce GTX 980 later that year that it became clear that Nvidia had a winner on their hands.
The launch of Volta/Turing may well be the most extreme case of this ever. It started with GV100 in the Titan V, which wasn't optimized for graphics. The TU102, TU104, and TU106 dies of the higher end Turing cards were bloated by ray-tracing and tensor cores, with the former somewhat dubious for gaming in the near future and the latter completely worthless to gamers. On a performance per watt basis, or performance per model number (which is a stupid way to analyze cards), they were a fine advance.
The problem was the price tag. $1200 for a consumer card is an awfully tough sell, no matter what it can do. For a $700 card, gamers expect the top of the line, not something far removed from it. The prices were high because the die size was large, in part because it was bloated by unnecessary junk. That made Turing look bad up front, but held out the hope that it could be a fine architecture if you chop out the bloat.
Well, Nvidia just did exactly that. The TU116 die of the GeForce GTX 1660 Ti doesn't have ray tracing logic, nor does it have tensor cores. It does feature a stupid name that is completely out of whack with Nvidia's longstanding naming scheme, though it's less bad than the erratic names that AMD has been giving their Vega cards. Some back of the envelope arithmetic indicates that ray tracing and tensor cores combine for about 15%-20% of the die size of the earlier Turing cards.
So now we see what Turing can really do. The GTX 1660 Ti is about on par in performance with the GTX 1070, while using less power. That makes it a fine choice for consumers at $280, considering the rest of today's market.
But I'm not so interested in a particular SKU as in what this says about what Nvidia will have to offer for the next few years. And on that count, the news is decidedly bad.
The key point of reference is the GP104 die of the GeForce GTX 1080. That was a 314 mm^2 die on the Pascal architecture, as compared to 284 mm^2 for the TU116 die of the GTX 1660 Ti. So the new GPU has over 90% of the die size of one that Nvidia paper launched about 33 months earlier--an eternity in technology. The problem is that it doesn't offer 90% of the performance of the GTX 1080.
Over the course of 33 months, performance per mm^2 actually went down. That's awful, and almost never happens, at least if you exclude top end compute parts bloated by non-graphical stuff. And it's in spite of moving to a new, better process node. To be fair, moving from 16 nm to 12 nm isn't a full node die shrink, or even a half node. It's not so much a shrink at all as a more mature node tuned better for how Nvidia wanted it. But with the analogous move from Polaris 10 (Radeon RX 480) to Polaris 20 (Radeon RX 580) to Polaris 30 (Radeon RX 590), AMD at least saw performance per mm^2 go up, not down. It wasn't anything earth-shattering, but it sure beats going in the wrong direction entirely.
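For anyone who wants to check the Nvidia die-size arithmetic above, here is a rough sketch in Python. The two die areas are the ones quoted in this post. The function names are my own, and the relative performance figure is a made-up placeholder rather than a benchmark result, so plug in whatever number you trust from reviews.

```python
# Die areas quoted in the post, in mm^2.
GP104_AREA_MM2 = 314.0  # GeForce GTX 1080 (Pascal)
TU116_AREA_MM2 = 284.0  # GeForce GTX 1660 Ti (Turing)


def area_ratio(new_area: float, old_area: float) -> float:
    """Fraction of the old die's area that the new die occupies."""
    return new_area / old_area


def perf_per_area_change(relative_perf: float, new_area: float, old_area: float) -> float:
    """Ratio of new to old performance per mm^2.

    relative_perf is new performance divided by old performance.
    A result below 1.0 means performance per mm^2 went down.
    """
    return relative_perf / area_ratio(new_area, old_area)


# The new die must reach at least this fraction of the old die's
# performance just to hold performance per mm^2 flat.
break_even = area_ratio(TU116_AREA_MM2, GP104_AREA_MM2)
print(f"TU116 is {break_even:.1%} of GP104's area")

# HYPOTHETICAL input for illustration only, not a measured result.
ASSUMED_RELATIVE_PERF = 0.80
change = perf_per_area_change(ASSUMED_RELATIVE_PERF, TU116_AREA_MM2, GP104_AREA_MM2)
print(f"Performance per mm^2 relative to GP104: {change:.2f}x")
```

The useful number is the break-even: any relative performance below the area ratio means performance per mm^2 went backwards, whatever the exact benchmark figure turns out to be.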
Now, the GP104 die of the GTX 1080 was a terrific chip, far more so than we realized at the time. AMD wouldn't launch Polaris until later that year, and even that had rather small dies; they wouldn't launch Vega until about a year after Pascal. And it still isn't entirely clear how GlobalFoundries' 14 nm node compares to TSMC's 16 nm. GP104 also had the disadvantage of being Nvidia's lead chip on a new and then-immature process node, while TU116 is launching on what is now a very mature process node.
That doesn't mean that Turing is catastrophic for Nvidia. This isn't a Bulldozer-level catastrophe that will threaten the company's viability. It's not a bad architecture on an absolute scale; it's not like Fermi for graphics or Kepler for compute. But it might be more like a GPU version of Kaby Lake: nice at launch, and fine in its own right, but merely treading water while your competitor is greatly advancing. Depending on how good AMD's upcoming Navi architecture is, it's also very possible that the better comparison for Turing will be Broadwell: too expensive and not much of an advance, but still ahead of the competition.
The GTX 1660 Ti launching now is also bad news for Nvidia in another sense. AMD already has a 7 nm GPU out. Navi is coming, with various rumors putting it in July or October, and AMD promising to say a lot more about it sometime this year. That Nvidia is launching a new $280 GPU today means that they surely aren't going to have a $280 GPU on 7 nm anytime soon. If they had a new 7 nm lineup coming by the middle of the year, they'd have canceled this part long ago. That lends credibility to the rumors that Nvidia won't have anything on 7 nm until 2020.