A server with 8 of the new GPUs in it costs $149,000. Nvidia is taking pre-orders now and promising delivery starting in Q3 of this year. Outside of that $149k server, the cards won't be available until Q4, assuming no delays between now and then. Remember that Nvidia announced GP100 around this time last year, but the cards didn't go on sale until March of this year and still aren't widely available.
It's apparently on a "new" process node that TSMC made custom for Nvidia. Nvidia is calling it 12 nm FFN. Transistor density is basically unchanged from 16 nm, which makes me wonder if it's just a modified 16 nm process that they call 12 nm for marketing reasons. Claimed energy efficiency is up slightly as compared to GP100, but not as compared to consumer Pascal chips.
It's an extremely ambitious chip with a die size of 815 mm^2. That's not merely the largest GPU ever made; it's about 1/3 larger than the second largest. And that's on a brand new process node. Yields are sure to be terrible, but with what Nvidia is charging for it, you can afford to throw most of the chips produced in the garbage as defective and still make a good profit. That's not profitable if you're hoping to sell GeForce cards for $700, however, so don't expect to see this show up in GeForce cards.
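To see why bad yields can still be profitable at these prices, here's a back-of-the-envelope sketch. The $149k server price is from the announcement; the yield rate and per-die fabrication cost are made-up numbers purely for illustration:

```python
# Back-of-the-envelope yield economics. Only the server price is real;
# yield and per-die cost below are hypothetical illustrations.
server_price = 149_000                         # 8 GPUs per server
revenue_per_gpu = server_price / 8             # ~$18,625, ignoring the rest of the server

good_yield = 0.20                              # hypothetical: only 1 in 5 huge dies works
cost_per_die_attempt = 1_000                   # hypothetical fab cost per die, good or bad
cost_per_good_gpu = cost_per_die_attempt / good_yield

print(cost_per_good_gpu)   # 5000.0 -- well under ~$18.6k, but far above a $700 GeForce
```

Even throwing away 4 out of every 5 dies, the silicon cost per working chip stays far below what the accelerator sells for, while the same arithmetic sinks any hope of a $700 consumer card.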
Whereas Pascal was basically a die shrink of Maxwell, this really is a new architecture. Nvidia put massive die space into tensor operations, which will let you do a 4x4 matrix multiply-add of 16-bit floating point values very efficiently. That, like 64-bit floating point operations, is pretty much useless for graphics. But that's kind of the point: while this is technically a GPU, it's really not for graphics.
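The operation a single tensor core performs can be sketched in a few lines of NumPy. This is a minimal model of the math, not Nvidia's actual API; the function name is my own, and the FP32 accumulation matches what Nvidia describes for mixed-precision mode:

```python
import numpy as np

def tensor_core_fma(A, B, C):
    """Sketch of one tensor core op: D = A x B + C on 4x4 matrices.
    A and B are FP16; the multiply-accumulate happens at FP32 precision."""
    return A.astype(np.float32) @ B.astype(np.float32) + C

A = np.ones((4, 4), dtype=np.float16)
B = np.ones((4, 4), dtype=np.float16)
C = np.zeros((4, 4), dtype=np.float32)

D = tensor_core_fma(A, B, C)
print(D)  # every element is 4.0: a row of ones dotted with a column of ones
```

That's 64 multiplies and 64 adds per op, which is why packing many of these units onto the die buys so much deep-learning throughput while doing nothing for graphics.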
Volta beefs up some cache sizes as compared to the GP100 version of Pascal, but GP100 had already done the same as compared to Maxwell while consumer Pascal didn't, so it's not clear whether GeForce cards will get the larger caches. Note that AMD has long had more register and local memory capacity in its consumer GPUs than Nvidia; increasing those cache sizes in GP100 really only brought the chip up to parity with the GCN/Polaris cards AMD has been selling since 2012.
Nvidia is also creeping away from the pure SIMD approach. While previous GPUs had 32 threads in a warp that had to stay in lockstep, Nvidia now claims some degree of independent scheduling within a warp. It's not at all clear what that means in practice, however. It's hard to imagine it being purely a big-chip feature, though, so I expect the modified scheduling to come to GeForce cards, too.
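For context on what's being relaxed, here's a toy model of the classic lockstep behavior. When the 32 threads of a warp diverge on a branch, the hardware runs each path in turn with the non-participating threads masked off, so the warp pays for both sides. The cost model below is my own simplification, not a description of Volta's actual scheduler:

```python
# Toy model of lockstep SIMD branch execution within one 32-thread warp.
# Divergent warps execute both paths serially with masking; this is the
# behavior Volta's "independent scheduling" claim would loosen.
WARP_SIZE = 32

def simd_branch_cost(predicates, then_cost, else_cost):
    """Cycles an if/else costs one warp under lockstep SIMD (simplified)."""
    cost = 0
    if any(predicates):        # some threads take the 'then' path...
        cost += then_cost      # ...while the rest idle behind the mask
    if not all(predicates):    # some threads take the 'else' path
        cost += else_cost
    return cost

# A fully divergent warp pays for both sides of the branch:
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]
print(simd_branch_cost(divergent, then_cost=10, else_cost=10))  # 20

# A uniform warp pays for only one:
uniform = [True] * WARP_SIZE
print(simd_branch_cost(uniform, then_cost=10, else_cost=10))    # 10
```

Whatever Nvidia's new scheduling actually does, the interesting question is how much of that serialization penalty it can avoid, and at what hardware cost.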