For clarity, "high-end desktop" (HEDT) is Intel's terminology for the consumer versions of their platforms that tend to have more cores than their "normal" desktops but no integrated GPU. The cheapest CPUs in the HEDT lineup have generally been around $300 or more. The generations of it have been Nehalem, Gulftown, Sandy Bridge-E, Ivy Bridge-E, Haswell-E, and most recently, Broadwell-E.
Yesterday AMD announced Threadripper, which is basically a two-die Ryzen-based solution with up to 16 Zen cores. This is really AMD's first credible shot at the HEDT market since Intel split their lines to invent it in 2008.
One traditional problem with HEDT parts is that packing in so many cores means the cores can't clock all that high. Thus, HEDT parts typically trail the normal consumer quad-core CPUs in single-threaded performance. Intel has also commonly given the HEDT market older CPU cores and older process nodes than the normal consumer market gets, in part because more cores mean larger dies, and larger dies require more mature process nodes to get acceptable yields.
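To put a rough number on the yield point, here's a toy sketch in C using the classic Poisson yield model, where yield falls off exponentially with die area. The defect density and die areas below are invented round figures for illustration, not real fab data:

```c
/* Toy Poisson yield model: yield = exp(-defect_density * die_area).
 * All figures are invented for illustration, not real fab data.
 * Build with: gcc yield.c -lm */
#include <math.h>
#include <stdio.h>

int main(void) {
    double defects_per_cm2 = 0.5;   /* assumed defect density      */
    double quad_cm2 = 1.6;          /* assumed ~160 mm^2 quad-core */
    double hedt_cm2 = 4.5;          /* assumed ~450 mm^2 HEDT die  */

    /* A larger die is exponentially more likely to catch a defect. */
    printf("quad-core yield: ~%.0f%%\n",
           exp(-defects_per_cm2 * quad_cm2) * 100);   /* ~45% */
    printf("HEDT yield:      ~%.0f%%\n",
           exp(-defects_per_cm2 * hedt_cm2) * 100);   /* ~11% */
    return 0;
}
```

Because the falloff is exponential, the big die yields far worse than its extra area alone would suggest, which is why big many-core dies want a mature, low-defect process.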
There have at times been efforts to build a two-socket desktop, in which you use two separate CPUs. I don't mean two cores; I mean two entirely separate chips, each with its own CPU socket, memory pool, and so forth. The two CPUs communicate over some bus; Intel has called it QPI in recent years. Spreading the cores across two sockets means you can double the number of cores in the system without creating cooling problems or losing clock speed from packing so many cores so close together.
The problem with the two-socket approach is that for many things, it just doesn't work very well. If a thread running on one CPU needs memory attached to the other CPU, it has to go across the QPI link to get it. For occasional accesses, that's fine, but if half of your memory accesses have to go over QPI, you have a huge bottleneck in a hurry. And since most programs are never written to specify which memory pool to use in a two-socket system, if threads migrate from one CPU to the other a lot without releasing and reallocating their memory, you can expect a whole lot of memory accesses to go over QPI.
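NUMA-aware code can sidestep this, but it has to opt in explicitly. Here's a minimal sketch using Linux's libnuma; the node number and buffer size are arbitrary choices for illustration:

```c
/* Sketch: keep a worker thread and its memory on the same NUMA node
 * using Linux's libnuma. Build with: gcc numa_demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int node = 0;                    /* arbitrary: pin everything to node 0 */
    size_t size = 64 * 1024 * 1024;  /* arbitrary 64 MiB scratch buffer */

    numa_run_on_node(node);          /* keep this thread on node 0's CPUs */

    /* Allocate physical pages on node 0, so no access crosses QPI. */
    char *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", node);
        return 1;
    }

    memset(buf, 1, size);            /* every touch here is node-local */
    numa_free(buf, size);
    return 0;
}
```

Very little software outside of databases and HPC bothers with this, which is the crux of the two-socket problem: the scheduler moves threads, the memory stays put, and QPI eats the difference.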
Intel did push two-socket for high-end desktops as recently as Skulltrail, which was basically two Core 2 Quad CPUs. But because the front-side bus of the era became a bottleneck, many programs performed worse on two CPUs than they did on one. After that, Intel relegated the multi-socket approach to servers and used single CPUs for their HEDT platforms.
AMD's proposal is to keep all the benefits of a two-socket HEDT system, including double the memory bus width and capacity. But instead of having a giant QPI (or HyperTransport, as AMD has traditionally used) bottleneck, put the two physical CPUs in the same socket, connected by an interposer. That way, you get enormous bandwidth connecting the two CPUs, rather than a big bottleneck.
So why can't they just add a ton of bandwidth to traditional two-socket systems and fix the problem that way? One issue is pin count. You've only got so much room for I/O coming out of your package. Pins take space, and if you want more pins, you need bigger, more expensive packages, with all of the attendant drawbacks. An interposer allows massively smaller "pins" (microbumps), so you can have far more of them and very wide buses connecting multiple dies in the same package. That doesn't get you massive bandwidth to anything outside of the package, but it does let you connect two CPUs in the same package, or a CPU to a GPU in the same package, or, as we've already seen with Fiji, a GPU to HBM.
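For a rough sense of the scale involved, here's a back-of-envelope comparison. The figures are ballpark, not spec-exact: a QPI-style link moves 16 data bits per transfer at up to 9.6 GT/s in each direction, while Fiji's HBM interface is 4096 bits wide at roughly 1 GT/s:

```c
/* Back-of-envelope link bandwidth: width_bits / 8 * rate (GT/s)
 * gives GB/s in one direction. Figures are ballpark, not spec-exact. */
#include <stdio.h>

static double gb_per_s(double width_bits, double gt_per_s) {
    return width_bits / 8.0 * gt_per_s;
}

int main(void) {
    /* Off-package, QPI-style: 16 data bits at 9.6 GT/s -> ~19 GB/s */
    printf("socket-to-socket link: ~%.0f GB/s\n", gb_per_s(16, 9.6));
    /* On-interposer, Fiji-style HBM: 4096 bits at ~1 GT/s -> ~512 GB/s */
    printf("interposer to HBM:     ~%.0f GB/s\n", gb_per_s(4096, 1.0));
    return 0;
}
```

A bus thousands of bits wide simply can't come out through package pins, but through microbumps on an interposer it's routine; that's the roughly 25x gap in a nutshell.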
You might ask, what's the difference between this and the multi-die server chips AMD has had in the past, like Magny-Cours and Interlagos? The answer is the power budget. If one die is going to burn 130 W all by itself, what is a two-die package supposed to burn? Clock speeds for those server chips were far too low to be appropriate for consumer use. Ryzen 7, in contrast, tops out at 95 W, with a 65 W consumer version that keeps all eight cores active. Two such dies fit in roughly 130 W, which leaves plenty of room to add more cores without taking a huge hit to clock speeds: at minimum, they could probably match Ryzen 7 1700 clock speeds in 130 W, or go higher if they're willing to burn more power. Using an interposer also makes it at least possible to spread the dies out a little, which can help with cooling.
But why stop at two dies? Yesterday AMD also announced Epyc, their new server line. The top-end part has four dies, for 32 cores total. That will mean considerable drops in clock speeds as compared to Ryzen 7, so it would be a stupid part for consumer use. AMD will offer a two-socket version of Epyc, but they're really trying to make a single-socket alternative to what would previously have been two-socket Xeon systems. Now that AMD finally has competitive CPU cores for the first time since Conroe arrived in 2006, the massive room available to undercut Xeon prices while still making a hefty profit means that Intel's server division should be scared.