
Intel has figured out how to stack logic dies and is now trying to determine if it is useful

Quizzical Member Legendary Posts: 25,355
Intel calls their die stacking technology Foveros, and their first product to use it is Lakefield.  You can read a lot more about it here:

https://www.anandtech.com/show/15877/intel-hybrid-cpu-lakefield-all-you-need-to-know

There's no intrinsic reason why you can't stack dies on top of each other.  Memory chips have been doing this for quite some time.  But for logic dies, it's a lot harder.

One thing that makes memory chips easier to stack is that they put out very little power.  If one chip that uses a tiny fraction of a watt is stacked on top of another that also uses a tiny fraction of a watt, keeping them sufficiently cooled is easy.  So long as they're not particularly insulated, they'll be fine.

If you want to have one chip that is putting out 100 W and stack that on top of another chip that is also putting out 100 W, you have a problem.  Traditional CPUs and GPUs have a planar design where whatever is putting out a lot of heat is as close as possible to a heat spreader, and then a heatsink, to pull that heat away.  If you have another hot chip in between them, then the heatsink may be able to properly cool the top chip, but the bottom one is going to get rather toasty.

Cooling is hardly the only problem.  There's also thermal expansion.  If one die gets hotter than the other and expands more, you can crack whatever you're using to hold them together.  That can be a problem with the bumps to attach the die to a package even without stacking dies, of course.  But having two dies with independent hotspots makes it into a bigger problem.  And if you want the two dies to have lots of little connections to allow them to communicate directly, that makes it a much bigger problem.  When memory dies are stacked on top of each other, the dies are independent and don't need to communicate with each other at all.
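To put rough numbers on that, here's a quick back-of-the-envelope sketch using the standard linear expansion formula (change in length = coefficient x length x temperature change).  The expansion coefficients are textbook values; the 10 mm die size and 50 °C swing are assumptions purely for illustration.

```python
# Linear thermal expansion: dL = alpha * L * dT
# Coefficients are textbook values; the die size and temperature swing are
# assumed, illustrative figures, not measurements of any particular chip.
ALPHA_SILICON = 2.6e-6   # per kelvin
ALPHA_COPPER = 17e-6     # per kelvin

def expansion_micrometers(alpha_per_k, length_mm, delta_t_k):
    """Growth of a part of the given length over the given temperature rise."""
    return alpha_per_k * (length_mm * 1000.0) * delta_t_k

if __name__ == "__main__":
    die_mm = 10.0   # assumed die width
    dt_k = 50.0     # assumed difference in temperature swing between the dies
    print(f"silicon, 10 mm, 50 K: {expansion_micrometers(ALPHA_SILICON, die_mm, dt_k):.1f} um")
    print(f"copper,  10 mm, 50 K: {expansion_micrometers(ALPHA_COPPER, die_mm, dt_k):.1f} um")
    # Even a micrometer or two of mismatch between bonded dies is a lot when
    # the bumps holding them together are only tens of micrometers across.
```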

With Lakefield, Intel starts off small, with a 7 W TDP.  Naturally, that makes the problem much simpler than if you had a 100 W TDP.  One logic die is for the CPU cores, while the other is for I/O.  AMD does something similar with their Zen 2 desktop and server CPUs, except that AMD doesn't try to stack the dies on top of each other.  Lakefield also sticks memory on top of the entire package, which makes the whole stack harder to cool.  But you can get away with that at a 7 W TDP, even if you couldn't at a higher one.

What Intel does with the CPU cores is to have one Sunny Cove (Ice Lake) core and four Tremont Atom cores.  Every previous x86 CPU has used identical copies of the same core throughout the entire chip.  Smartphones have had mixed cores for quite some time, with the idea that you can get better energy efficiency by having big cores for the tasks that really need them, along with small cores where you can shove background tasks that don't need much performance.

But this creates all sorts of problems.  For starters, in order to put the tasks that need high performance on a high-performance core and those that don't on a low-power core, the operating system needs to know how to do this.  And Windows doesn't.  Nor does Linux.  Nor does Mac OS X, though this probably isn't coming to a Mac at all.
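To make that concrete, here's a minimal sketch of the kind of placement decision that has to happen somewhere, written against Linux's processor-affinity API.  The core numbering below is a made-up assumption (real hybrid parts enumerate cores however the platform decides), and in practice you'd want the scheduler to do this automatically rather than hard-coding it per application.

```python
import os

# Hypothetical hybrid layout: assume logical CPU 0 is the one big core and
# CPUs 1-4 are the small cores.  Real numbering depends on the platform.
BIG_CORES = {0}
SMALL_CORES = {1, 2, 3, 4}

def run_pinned(cores, work, *args):
    """Pin the current process to the given cores, then run the work there."""
    os.sched_setaffinity(0, cores)   # 0 means the calling process (Linux only)
    return work(*args)

def latency_sensitive(n):
    # Work the user is waiting on: give it the big core.
    return sum(i * i for i in range(n))

def background_chore(n):
    # Work nobody is waiting on: let it crawl along on the small cores.
    return sum(range(n))

if __name__ == "__main__":
    print(run_pinned(BIG_CORES, latency_sensitive, 1_000_000))
    print(run_pinned(SMALL_CORES, background_chore, 1_000_000))
```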

Another problem is that while the big and little cores can do things somewhat differently internally, they need to be able to execute exactly the same instructions or else programs won't work.  That means that Lakefield can only afford to expose the instructions common to both the Sunny Cove and Tremont cores--and yes, the latter has some that the former doesn't.  So you get a crippled big core combined with some crippled small cores.
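As an illustration of what "common instructions only" means for software, here's a rough sketch of least-common-denominator feature detection: collect the ISA flags each logical CPU advertises and only use the features that every core supports.  It reads /proc/cpuinfo, so it's Linux-specific, and the flag names are just examples.  (On Lakefield itself the hardware already hides the mismatch by exposing only the shared subset on every core; the sketch just shows the general principle.)

```python
def per_cpu_flags(path="/proc/cpuinfo"):
    """Collect the ISA feature flags each logical CPU advertises (Linux)."""
    flags = []
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags.append(set(line.split(":", 1)[1].split()))
    return flags

def common_flags(per_cpu):
    """Only features present on every core are safe to use on any thread."""
    common = per_cpu[0].copy()
    for s in per_cpu[1:]:
        common &= s
    return common

if __name__ == "__main__":
    safe = common_flags(per_cpu_flags())
    # Pick a code path based on what the whole package can actually run.
    if "avx512f" in safe:
        print("use the AVX-512 path")
    elif "avx2" in safe:
        print("use the AVX2 path")
    else:
        print("use the baseline x86-64 path")
```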

So why?  Why take on all of these problems?  For hybrid CPU cores, the answer is that, if it all works right, it's the best way to get the maximum amount of performance out of a fixed TDP.  Of course, it's pretty much guaranteed not to work how you'd hope, so that alone will probably make it a dumb product.

Of course, that maximum performance will surely get squashed by the stacked dies of Lakefield.  Normally, if you build an x86 CPU with a "7 W" TDP, you can go way above that 7 W for short periods of time.  It's okay to use 20 W briefly, then throttle back as the chip heats up.  The "7 W" is the maximum sustained power usage.  But if stacking dies will cause mismatched hot spots to overheat faster or even crack apart, then you can't have turbo use 20 W.  Intel claims that the max turbo power is 9.5 W.  So much for maximizing performance.
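For a sense of how the "burst above TDP, then throttle" budgeting works, here's a toy model in the spirit of Intel's PL1/PL2/tau scheme: the chip may draw the short-term limit as long as an exponentially weighted average of recent power stays at or below the sustained limit.  The 7 W and 9.5 W figures are the ones quoted above; the time constant and time step are assumptions for illustration, not Lakefield's real firmware settings.

```python
# Toy power-limit model: burst at PL2 until the running (exponentially
# weighted) average of drawn power reaches PL1, then settle at PL1.
PL1 = 7.0    # sustained power limit, watts (the "TDP")
PL2 = 9.5    # short-term turbo limit, watts
TAU = 28.0   # assumed averaging time constant, seconds
DT = 0.1     # simulation step, seconds

def seconds_of_turbo(pl1=PL1, pl2=PL2, tau=TAU, dt=DT):
    """How long a fully rested chip can sit at PL2 before the average hits PL1."""
    avg = 0.0   # start with the budget fully recovered
    t = 0.0
    alpha = dt / tau
    while avg < pl1:
        avg = (1.0 - alpha) * avg + alpha * pl2   # EWMA of recent power draw
        t += dt
    return t

if __name__ == "__main__":
    print(f"~{seconds_of_turbo():.0f} s at {PL2} W before settling back to {PL1} W")
    # A more typical 20 W short-term limit would let the chip do far more work
    # in its burst window; capping turbo at 9.5 W is why the headroom is so modest.
```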

As for stacking dies, the immediate point seems to be that you can have a smaller package.  So instead of your CPU package taking somewhat under a square inch (a square inch is roughly 645 mm²), the Lakefield package with CPU and memory together can be a square only 12 mm on a side, or 144 mm².  That's a big advantage in all of those situations where an x86 CPU is used but a package a square inch in size is just too big.  That's a decent enough description of a phone (which won't use x86) or of Intel's Compute Stick, but not of a laptop whose physical size is driven by the screen, let alone desktops or servers.

So the upshot is that you can pay a lot of money to get a low performance device that will cause all sorts of problems for software.  That sure sounds like a terrible product to me.

So what's the point?  As best as I can tell, it's more a test vehicle than a real consumer product, even if it will be offered to consumers.  Rather than having die stacking kill one product while you work out its kinks, and hybrid CPU cores kill another product while you fix their problems, just sacrifice a single product to both sets of problems at once.  See what goes wrong and fix it in the next die stacking product, and also in the next hybrid core product.  Assuming that there is a next product for both technologies.  Or either of them.

Longer term, hybrid CPU core approaches could have a future if you want to push x86 into lower power envelopes.  They're already ubiquitous in phones, and the same reasons should apply to x86 as to ARM.  Of course, that assumes that you actually want to use x86 cores and not ARM when you need low power.

Stacking logic dies in the manner of Lakefield doesn't have any obvious applications in the near future, however.  Rather, this could be a longer-term development effort.  Everyone knows that you can't keep doing die shrinks forever.  How do you keep improving performance after you can't shrink silicon any further?  There have been some proposals to do so by going to other materials, but whatever you use, it can only shrink so far.

Another proposal is to stack dies on top of each other in a 3D manner.  Rather than having a single 20x20 mm die, why not have four 10x10 mm dies stacked on top of each other?  If you can have the dies communicate in a fine-grained manner, you only need to move data a fraction of a millimeter to the next die down, rather than 10 mm for the next die over.  That could save power, and ultimately allow more performance in the same TDP as before.
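As a rough sense of scale for that argument, here's a back-of-the-envelope comparison.  The energy-per-bit-per-millimeter figure is an assumed, illustrative number rather than data from any real chip; the only point is the ratio between a long lateral route and a short vertical hop.

```python
# Back-of-the-envelope: energy to move one 64-bit word to the next die over
# on a planar layout versus down to a die stacked directly underneath.
# The pJ/bit/mm value is an assumed illustrative constant, not a measurement.
PJ_PER_BIT_PER_MM = 0.2   # assumption

def transfer_energy_pj(bits, distance_mm):
    return bits * distance_mm * PJ_PER_BIT_PER_MM

if __name__ == "__main__":
    bits = 64
    lateral = transfer_energy_pj(bits, 10.0)    # roughly to the next 10x10 mm die over
    vertical = transfer_energy_pj(bits, 0.1)    # down to the die stacked below
    print(f"10 mm lateral hop  : {lateral:.1f} pJ")
    print(f"0.1 mm vertical hop: {vertical:.2f} pJ")
    print(f"ratio: {lateral / vertical:.0f}x less energy per transfer")
```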

Of course, actually building all those tiny wires to connect stacked dies is hard.  Lakefield has some communication between dies, but it's still on a much larger scale than the metal layers within a single chip.  But you have to start somewhere, and this might be Intel's attempt at starting to build toward full 3D logic chips.  Start seeing what goes wrong now and fix it before it actually needs to work right.  Then shrink the connections as you go, or at least shrink them if you can: the smaller you make the wires, the less thermal expansion it takes to snap them off entirely.

Comments

  • Quizzical Member Legendary Posts: 25,355
    I'm not sure why you posted that on this thread, as it's not about die stacking, a mixed CPU core architecture, or anything else relevant to this thread.
  • Wizardry Member Legendary Posts: 19,332
    They are trying to create a whole new arms race, a costlier option for the consumer.
    Oh look, space saving, faster speeds... please deposit your bank loan right here >>>> $$$$.

    What will this do for our games? Not a damn thing. It might help all these poorly optimized games run faster at a higher cost, but the quality of games is not going anywhere.

    What the industry has needed for a very long time is ONE infrastructure that makes it easier and less costly to make games and cuts Bill Gates out of the equation. If we just continue down this same path, then 20 years from now we'll have 5% better games, 500% more costly computers, and everyone will be playing BR cash-shop gaming on a $6,000-$10,000 computer.

    Never forget Three Mile Island and never trust a government official or company spokesman.

  • Nanfoodle Member Legendary Posts: 10,617
    Wizardry said:
    They are trying to create a whole new arms race, a costlier option for the consumer.
    Oh look, space saving, faster speeds... please deposit your bank loan right here >>>> $$$$.

    What will this do for our games? Not a damn thing. It might help all these poorly optimized games run faster at a higher cost, but the quality of games is not going anywhere.

    What the industry has needed for a very long time is ONE infrastructure that makes it easier and less costly to make games and cuts Bill Gates out of the equation. If we just continue down this same path, then 20 years from now we'll have 5% better games, 500% more costly computers, and everyone will be playing BR cash-shop gaming on a $6,000-$10,000 computer.
    Often things like this have nothing to do with gaming. The applications are unknown, but it's experimental work that can lead to who knows what.
  • Quizzical Member Legendary Posts: 25,355
    Wizardry said:

    What the industry has needed for a very long time is ONE infrastructure that makes it easier and less costly to make games and cuts Bill Gates out of the equation. If we just continue down this same path, then 20 years from now we'll have 5% better games, 500% more costly computers, and everyone will be playing BR cash-shop gaming on a $6,000-$10,000 computer.
    Computer prices have been dropping dramatically.  A low end piece of junk PC used to cost over $1000.  Now you can actually get something decently nice for under $1000--and decently nice by today's standards, not just by the standards of 25 years ago.
  • Cuddleheart Member Uncommon Posts: 391
    Great read!  Thanks!

    Questions: 

    Do you think Intel might be trying more to break into the mobile processor market?  Losing Apple and the hype battle for the 10-series chips might have them looking to diversify?  First GPUs, now this?  Speaking of Apple... am I mistaken in thinking the next gen of MacBooks running their proprietary chips might have at least some implementation of ARM?
  • Quizzical Member Legendary Posts: 25,355
    Cuddleheart said:
    Great read!  Thanks!

    Questions:

    Do you think Intel might be trying more to break into the mobile processor market?  Losing Apple and the hype battle for the 10-series chips might have them looking to diversify?  First GPUs, now this?  Speaking of Apple... am I mistaken in thinking the next gen of MacBooks running their proprietary chips might have at least some implementation of ARM?
    While Intel has been trying to break into the mobile market, I don't think that chip stacking helps with that.  Hybrid core architectures could, but even one big core might be too much for a cell phone.  I think that's more a play toward lower power usage (and hence higher battery life) in laptops, if they can ever get it to work well enough to be useful.

    Intel's discrete GPUs might be more an effort at getting into the HPC market than consumer graphics.  Even so, I think the reason why GPUs have gotten some traction in HPCs and some other architectures haven't is that graphics is such a pathological problem to do well.  If you can do graphics well, then there's necessarily a whole lot of things that you can do well, and that translates to being good at some compute problems, too.  That avoids the problem of building a new architecture from scratch and hoping that by dumb luck, it will be good at something other than the few problems that you specifically had in mind when you built it.

    Intel has already attempted to build discrete GPUs in the past.  Twice, even.  It didn't go well.  About a decade ago, Larrabee never even made it to market before Intel killed off the graphics application and tried to rebrand it as a compute card ("Xeon Phi") that was awful at compute.  A decade prior to that, the Starfighter Real3D i740 did actually make it to market, but was something of a laughingstock.

    Intel's previous efforts at building a discrete GPU tried to be too clever.  The point of the i740 was to say, this new AGP port is so fast that we don't need onboard video memory, and can just use system memory instead.  Intel was wildly wrong about that, and later made a PCI version of the card that was faster than the AGP version because it had onboard video memory.

    Meanwhile, Larrabee was an attempt at making a GPU out of x86 cores rather than something more customized for graphics.  Intel seemed to think that this would make it good at compute, but it really didn't.  GPUs have largely solved the problem of how to make your code scale well to very wide SIMD architectures in the sense that GPUs use them.  CPUs really haven't figured out how to do that, and even today, trying to use SSE and AVX efficiently in x86 CPUs is a mess.  So all that Intel ended up doing was making something that was inferior to ordinary Xeon CPUs in code that isn't trivially parallelizable, far inferior to GPUs in code that is embarrassingly parallel, and a lot harder to write code for than either.
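    To illustrate the distinction being leaned on here (in plain Python for readability, not actual SIMD intrinsics): in the first function every element is independent, which is exactly the shape of work that wide SIMD units and GPUs eat up; the second has a loop-carried dependence, so extra vector width or extra GPU lanes don't help at all.

    ```python
    import numpy as np

    def embarrassingly_parallel(x):
        # Each output depends only on its own input element, so the whole array
        # can be processed in wide, independent lanes (SIMD- and GPU-friendly).
        return np.sqrt(x) * 2.0 + 1.0

    def loop_carried(x):
        # Each step needs the previous result, so the work is inherently serial
        # no matter how many lanes the hardware offers.
        out = np.empty_like(x)
        acc = 0.0
        for i, v in enumerate(x):
            acc = 0.5 * acc + v
            out[i] = acc
        return out

    if __name__ == "__main__":
        data = np.arange(1.0, 9.0)
        print(embarrassingly_parallel(data))
        print(loop_carried(data))
    ```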

    As for Apple, yes, it's ARM cores.  Apple has a ton of experience at designing their own ARM cores, ARM already has quite an ecosystem built up rather than Apple having to create something from scratch, and ARM licenses are pretty cheap.
  • Cuddleheart Member Uncommon Posts: 391
    Quizzical said:
    While Intel has been trying to break into the mobile market, I don't think that chip stacking helps with that.  Hybrid core architectures could, but even one big core might be too much for a cell phone.  I think that's more a play toward lower power usage (and hence higher battery life) in laptops, if they can ever get it to work well enough to be useful.

    Intel's discrete GPUs might be more an effort at getting into the HPC market than consumer graphics.  Even so, I think the reason why GPUs have gotten some traction in HPCs and some other architectures haven't is that graphics is such a pathological problem to do well.  If you can do graphics well, then there's necessarily a whole lot of things that you can do well, and that translates to being good at some compute problems, too.  That avoids the problem of building a new architecture from scratch and hoping that by dumb luck, it will be good at something other than the few problems that you specifically had in mind when you built it.

    Intel has already attempted to build discrete GPUs in the past.  Twice, even.  It didn't go well.  About a decade ago, Larrabee never even made it to market before Intel killed off the graphics application and tried to rebrand it as a compute card ("Xeon Phi") that was awful at compute.  A decade prior to that, the Starfighter Real3D i740 did actually make it to market, but was something of a laughingstock.

    Intel's previous efforts at building a discrete GPU tried to be too clever.  The point of the i740 was to say, this new AGP port is so fast that we don't need onboard video memory, and can just use system memory instead.  Intel was wildly wrong about that, and later made a PCI version of the card that was faster than the AGP version because it had onboard video memory.

    Meanwhile, Larrabee was an attempt at making a GPU out of x86 cores rather than something more customized for graphics.  Intel seemed to think that this would make it good at compute, but it really didn't.  GPUs have largely solved the problem of how to make your code scale well to very wide SIMD architectures in the sense that GPUs use them.  CPUs really haven't figured out how to do that, and even today, trying to use SSE and AVX efficiently in x86 CPUs is a mess.  So all that Intel ended up doing was making something that was inferior to ordinary Xeon CPUs in code that isn't trivially parallelizable, far inferior to GPUs in code that is embarrassingly parallel, and a lot harder to write code for than either.

    As for Apple, yes, it's ARM cores.  Apple has a ton of experience at designing their own ARM cores, ARM already has quite an ecosystem built up rather than Apple having to create something from scratch, and ARM licenses are pretty cheap.

    Thanks for the info!  So how are you going to get the computing power out of ARM to run all the graphics and audio software that people used to swear by Macs for?  I heard that they're going to make a new iteration of Rosetta, but it seems far-fetched to have full and reliable x86-to-ARM translation.  I know nothing about Macs, but the processor switch raised a lot of questions.
  • Ridelynn Member Epic Posts: 7,383
    edited July 2020

    Cuddleheart said:
    Thanks for the info!  So how are you going to get the computing power out of ARM to run all the graphics and audio software that people used to swear by Macs for?  I heard that they're going to make a new iteration of Rosetta, but it seems far-fetched to have full and reliable x86-to-ARM translation.  I know nothing about Macs, but the processor switch raised a lot of questions.
    The latest rumor is that x86 won’t be going away entirely. 

    The Pro line would stay x86 (at least for the mid-term), and the consumer line would switch to ARM. Catalyst lets you run ARM code on the x86 platform, and Rosetta lets you run x86 code on ARM, so stuff still runs regardless of which platform you’re on.

    It’s just a question of how well it runs. Ideally, devs would put out an optimized version for both, and universal app packages actually allow you (or at least they used to) to include the binaries for both in the same bundle.

    Rosetta 1.0 (the PPC-to-x86 transition) ran pretty well; it was fairly transparent for the most part. Things that were optimized for AltiVec took a big performance hit, but there were not a whole lot of those applications that didn’t get re-optimized for x86 pretty quickly.

    The recently posted Geekbench scores on the ARM development box look fairly competitive (for consumer devices), and that is running under Rosetta right now.
  • Quizzical Member Legendary Posts: 25,355

    Cuddleheart said:
    Thanks for the info!  So how are you going to get the computing power out of ARM to run all the graphics and audio software that people used to swear by Macs for?  I heard that they're going to make a new iteration of Rosetta, but it seems far-fetched to have full and reliable x86-to-ARM translation.  I know nothing about Macs, but the processor switch raised a lot of questions.
    There is no reason why x86 has to be fast and ARM has to be slow.  Any CPU architecture can be built for high performance at the expense of high power consumption, or low power consumption at the expense of low performance.  That x86 has traditionally gone relatively more for the former and ARM more for the latter doesn't mean that it couldn't be the other way around.

    Apple is the first company to care about making high-performance ARM cores for a desktop environment, but they design their own ARM cores, so they can build cores specifically for that use case.  There's no reason why they can't make ARM cores competitive in desktops with what Intel and AMD make.  They might try and fail, but even if so, that won't be because it was impossible.  It's also quite possible to make chips with large numbers of ARM cores.  Here's a company that will sell you server CPUs with 80 ARM cores each:

    https://amperecomputing.com/altra/

    Marvell and Amazon have somewhat similar products of server CPUs with many ARM cores, too.

    As for other things like a GPU, you can match any CPU with any GPU, and it's just a matter of having the right drivers.  Imagination graphics have already been matched with both ARM and x86 (one generation of Atom).  AMD's Radeon graphics are going to end up in a Samsung SoC with ARM cores soon.
  • Quizzical Member Legendary Posts: 25,355
    Ridelynn said:
    The latest rumor is that x86 won’t be going away entirely.

    The Pro line would stay x86 (at least for the mid-term), and the consumer line would switch to ARM. Catalyst lets you run ARM code on the x86 platform, and Rosetta lets you run x86 code on ARM, so stuff still runs regardless of which platform you’re on.

    It’s just a question of how well it runs. Ideally, devs would put out an optimized version for both, and universal app packages actually allow you (or at least they used to) to include the binaries for both in the same bundle.

    Rosetta 1.0 (the PPC-to-x86 transition) ran pretty well; it was fairly transparent for the most part. Things that were optimized for AltiVec took a big performance hit, but there were not a whole lot of those applications that didn’t get re-optimized for x86 pretty quickly.

    The recently posted Geekbench scores on the ARM development box look fairly competitive (for consumer devices), and that is running under Rosetta right now.
    I thought Apple said that they were going to do the full transition over the course of two years, with the first products coming later this year.  That was generally interpreted as the consumer products moving to ARM later this year, while the Mac Pro makes the move two years from now.  Trying to keep Mac Pro on x86 while everything else moves to ARM and then trying to make software compatible with both sounds like a pain.
  • Ridelynn Member Epic Posts: 7,383
    Quizzical said:
    I thought Apple said that they were going to do the full transition over the course of two years, with the first products coming later this year.  That was generally interpreted as the consumer products moving to ARM later this year, while the Mac Pro makes the move two years from now.  Trying to keep Mac Pro on x86 while everything else moves to ARM and then trying to make software compatible with both sounds like a pain.
    True.

    But Apple has committed to supporting x86 “for years to come”, and they have put out a new Mac Pro model about every 8 years lately, so...
  • Quizzical Member Legendary Posts: 25,355
    Ridelynn said:
    True.

    But Apple has committed to supporting x86 “for years to come”, and they have put out a new Mac Pro model about every 8 years lately, so...
    What is Apple supposed to say?  "Don't buy a Mac now because support will be discontinued next year?"
  • Cleffy Member Rare Posts: 6,412
    I imagine with a stacked die, you would encounter either heat issues in the center, or latency issues with transistors further from the connection points. 
  • Cleffy Member Rare Posts: 6,412
    I wish I could earn 1900 buckets a week.