Intel drains Cooper Lake

Quizzical, Member (Legendary), Posts: 25,355
The original scoop was here, but it's behind a paywall:

https://semiaccurate.com/2020/03/14/intel-kills-off-a-server-program/

Intel confirmed the story here:

https://www.servethehome.com/intel-cooper-lake-rationalized-still-launching-1h-2020/

I'm also taking some details from here:

https://www.anandtech.com/show/15631/intels-cooper-lake-plans-the-chip-that-wasnt-meant-to-exist-dies-for-you

The short version is that Intel's next major server platform, Cooper Lake, is almost entirely cancelled.  It will still exist, but only in 4-socket and 8-socket versions, and only for select customers.  It's possible that the only customer for it at all will be Facebook.

In one sense, this is shocking.  We've just had Intel cancel a generation of mainstream server CPUs in favor of continuing to sell previous generations.  That just doesn't go over well in markets for integrated circuits that scale with Moore's Law.  By the time a new generation is due to launch, the previous one is usually a little dated and not nearly as competitive as it was when it first launched.

But in another sense, this was expected.  The basic problem is that Intel was putting a ton of work into making a new generation of parts that wasn't meaningfully better than the old one.  Cooper Lake wasn't going to have much of a point, anyway, so the cancellation isn't a big deal.  Even so, it's shocking in another way that Intel put so much work into a new generation of server parts that would be nearly pointless.

Intel's main problem these days is that their 10 nm process node is a complete disaster.  It has been delayed by years.  Even now, it can only produce tiny chips in low volumes at high cost.  And those chips aren't even very good.

If you don't have a new process node to move to, you can keep using the old one.  The problem is that Intel decided that their new Sunny Cove cores would be built on 10 nm, not 14 nm.  That reduces Intel to producing old CPU cores on an old process node.  That's little more than an opportunity to fix things that were broken and take advantage of a more mature process node.

In gaming desktops, that has worked out all right.  You could tweak things a little to allow higher clock speeds at the expense of higher power consumption.  So we've seen Intel's Sky Lake cores go from a max turbo of 4.2 GHz in Sky Lake to 4.5 GHz in Kaby Lake to 4.7 GHz in Coffee Lake to 5.0 GHz in Coffee Lake Refresh to probably something more than 5.0 GHz in the upcoming desktop version of Comet Lake.  Power consumption at load has gone way up, but that's not a big deal in a gaming desktop.

That does not work in servers.  If you only have one gaming desktop in your house and it occasionally uses an extra 50 W, that's not a big deal.  If you have ten thousand CPUs in a data center and each of them needs to use an extra 50 W around the clock all at once, that is called a problem.
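To put a rough number on that, here is a back-of-the-envelope sketch in Python.  The 50 W and ten thousand CPUs come from the example above; the function names are made up for illustration, and this ignores cooling overhead, which would only make it worse.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years

def extra_power_kw(cpu_count: int, extra_watts_per_cpu: float) -> float:
    """Total extra draw in kilowatts if every CPU uses the extra power at once."""
    return cpu_count * extra_watts_per_cpu / 1000.0

def extra_energy_kwh_per_year(cpu_count: int, extra_watts_per_cpu: float) -> float:
    """Extra energy per year in kilowatt-hours, assuming the load runs around the clock."""
    return extra_power_kw(cpu_count, extra_watts_per_cpu) * HOURS_PER_YEAR

if __name__ == "__main__":
    print(extra_power_kw(10_000, 50))             # 500.0 kW of extra draw
    print(extra_energy_kwh_per_year(10_000, 50))  # 4380000.0 kWh per year
```

That's half a megawatt of extra draw before you've paid for the cooling to get rid of the heat.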

It's also important to understand that the gaming desktop CPU market isn't very big.  The server CPU market is huge.  The CPUs that Intel sells for desktops are primarily intended for servers or laptops, and then they tweak them a little to offer a desktop version.

So what could Intel do to improve their servers without a new process node or a new CPU core?  They had moved from Sky Lake to Cascade Lake, and the plan was to move to Cooper Lake.  The main headline difference between Sky Lake and Cascade Lake is that the latter had Meltdown/Spectre fixes.  There was also a slight bump in clock speeds allowed by a maturing process node.  For example, the Xeon Platinum 8176 had a base clock speed of 2.1 GHz and a max turbo of 3.8 GHz.  Its successor, the Xeon Platinum 8276, had a base clock speed of 2.2 GHz and a max turbo of 4.0 GHz in the same 165 W TDP.  There were minor improvements like that all up and down the line.

With Cascade Lake built on an already mature process node, Cooper Lake likely had less to offer there.  Instead, the headline feature was support for bfloat16.  Why this matters (or more to the point, doesn't) involves digging into the weeds quite a ways.

Floating-point data types generally have 1 bit for the sign of a number, some number of bits for the exponent, and some number for the mantissa.  Add those numbers together and you get the total number of bits that it takes to store the number.  A smaller total allows your data to take less space and makes it cheaper to do computations with it.  More bits for the exponent allow you to store larger or smaller (as in, 0.000000001) numbers, while more bits for the mantissa allow you to store more precise numbers.

A normal 32-bit float has 8 bits for the exponent and 23 bits for the mantissa.  The typical 16-bit half has 5 bits for the exponent and 10 bits for the mantissa.  For a lot of purposes, that's just not enough.  For example, the largest number that you can store in a half is 65504.  A 32-bit float can handle numbers up to a little over 340 undecillion.
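For anyone who wants to check those limits, here is a minimal Python sketch that derives the largest finite value from the bit counts alone.  It assumes the usual IEEE 754 layout, where the all-ones exponent is reserved for infinity and NaN; the function name is just for illustration.

```python
def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-754-style binary format."""
    # The all-ones exponent is reserved for infinity/NaN, so the largest
    # usable unbiased exponent equals the bias: 2**(exponent_bits - 1) - 1.
    max_exponent = 2 ** (exponent_bits - 1) - 1
    # The largest significand is 1.111...1 in binary, i.e. 2 - 2**-mantissa_bits.
    max_significand = 2.0 - 2.0 ** -mantissa_bits
    return max_significand * 2.0 ** max_exponent

if __name__ == "__main__":
    print(max_finite(5, 10))  # half: 65504.0
    print(max_finite(8, 23))  # 32-bit float: roughly 3.4e38
```

The extra three exponent bits are what buy the huge jump in range; the mantissa bits barely move the maximum at all.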

The idea of a bfloat16 is that you still only use 16 bits as with a half, but reallocate them as 8 bits for the exponent and 7 for the mantissa.  That gives you about the same range as a 32-bit float.  It comes at the expense of having much less precision.  For example, you cannot store the number 257 in a bfloat16, as it would be rounded to 256 or 258 depending on the rounding mode (256 under the usual round-to-nearest-even).  With a 32-bit float, the smallest positive integer that you cannot store because it would be rounded to some other is 16777217.
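Here is a small Python sketch of that precision loss.  It emulates bfloat16 by keeping only the top 16 bits of a 32-bit float, using round-to-nearest-even; that rounding mode is an assumption on my part (hardware may offer others), and the function names are made up for illustration.  Under round-to-nearest-even, 257 lands on 256; a mode that rounds ties upward would give 258 instead.  It only handles ordinary finite inputs, not NaN.

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), as a Python float.

    Assumes x is a finite number that fits in a 32-bit float; NaN is not handled.
    """
    # Reinterpret the 32-bit float as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit rounds half-way cases to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Drop the low 16 bits: what is left is sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def first_unstorable_integer(mantissa_bits: int) -> int:
    """Smallest positive integer that needs more bits than the mantissa has."""
    # With the implicit leading 1 there are mantissa_bits + 1 significant bits,
    # so every integer up to 2**(mantissa_bits + 1) is exact; the next one is not.
    return 2 ** (mantissa_bits + 1) + 1

if __name__ == "__main__":
    print(to_bfloat16(256.0))            # 256.0 (exact)
    print(to_bfloat16(257.0))            # 256.0 (tie, rounds to even)
    print(to_bfloat16(258.0))            # 258.0 (exact)
    print(first_unstorable_integer(7))   # 257 for bfloat16
    print(first_unstorable_integer(23))  # 16777217 for a 32-bit float
```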

Thus, CPUs heavily optimized to do arithmetic with bfloat16 data really aren't very useful, which is why they haven't previously existed.  Now, Cooper Lake would also be able to handle half and float and double and so forth.  But adding bfloat16 support was set to be the most notable feature difference between Cooper Lake and Cascade Lake.  Intel pushed it hard, only to find that no one except Facebook actually wanted it.  So Intel decided to produce the particular SKUs that Facebook wanted and cancel all of the rest--including all of the one and two socket versions that would have handled most of the x86 server market.  To put that into perspective, AMD doesn't even offer 4-socket or 8-socket server CPUs at all, as that portion of the market is too small to bother with, even though they've got some terrific server CPUs.  If you want a new Intel server CPU, they'll happily continue to sell you Cascade Lake.

So in that sense, the cancellation wasn't surprising.  But it probably means no new Intel CPUs until they can make them on 10 nm.  Ice Lake is supposed to come this year.  Of course, it was also supposed to come in 2019.  And in 2018.

Comments

  • Quizzical, Member (Legendary), Posts: 25,355
    But Cooper Lake probably would have been at least a little better than Cascade Lake.  After all, Intel put a ton of work into making it exist in the first place.  So the cancellation could herald the good news that Intel's 10 nm process node is finally working right and Ice Lake server parts are coming soon.  You may cancel a generation of parts if it was barely going to make it to market before its successor, and the successor is much better.  And that's possible.

    But the history of Intel's 10 nm process node isn't encouraging.  It's not just that Cannon Lake was so awful that Intel is ambivalent as to whether to acknowledge that it ever existed.  For example, see here:

    https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html

    Follow the link to "Products formerly Cannon Lake".  It's a dead link.

    But the laptop version of Ice Lake has been rather troubled, too.  More than half a year after launch, the only laptop vendors selling them at all on Newegg are Dell, HP, Acer, and Microsoft.  For comparison, the 14 nm Comet Lake--which launched at the same time--has also been heavily adopted by Asus, MSI, and Lenovo, all major laptop vendors who have seen fit to simply ignore Ice Lake.  I don't think it has ever happened before that major laptop vendors ignored CPUs on a new process node because the ones on an older process node were more suitable for their premium products.

    Intel's 7 nm node can't come soon enough.