
Ryzen and memory clock speeds

Quizzical Member LegendaryPosts: 24,444
Some people have noted that AMD Ryzen CPUs tend to have a stronger preference for higher memory clock speeds, while Intel CPUs don't care so much.  I finally found a good explanation for it online, so I thought I'd share it.

Let's start with the obvious reason that isn't actually the culprit.  Ryzen benefiting more from higher memory clock speeds could easily happen if Ryzen simply needed more memory bandwidth.  For example, if it had to do more work because it was performing better, that could burn through more bandwidth.  If it had a smaller (or no) L3 cache, so that more data accesses had to go off the chip, that would also use more bandwidth.

But outside of programs that scale well to many CPU cores and thus allow an 8-core Ryzen CPU to handily beat a 4-core Kaby Lake, neither of those is true.  And outside of such programs, Ryzen doesn't typically need more memory bandwidth than Kaby Lake.  Indeed, it commonly needs less.  Often, the bottleneck at lower memory clock speeds isn't the system memory at all.

Rather, it's the Infinity Fabric.  Ryzen has two core complexes with four CPU cores each, as well as 8 MB of L3 cache per core complex.  The L3 cache is not unified across the entire chip as it is on Kaby Lake; each core complex has its own separate L3 cache.  Having far more L3 cache capacity and bandwidth offers Ryzen some real advantages over Kaby Lake.

The problem comes when a thread moves from one core complex to the other.  Now all of its cached data is in the wrong core complex, and it has to move across the infinity fabric.  That has considerably less bandwidth than L2 or L3 caches.  And the clock speed of the infinity fabric is set to match the clock speed of the DDR4 memory.  Higher clocked memory means more infinity fabric bandwidth, and less of a bottleneck.
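To put rough numbers on that relationship, here is a small sketch.  On first-generation Ryzen, the Infinity Fabric clock runs at the memory clock, which is half the DDR4 transfer rate.  The 32-byte-per-cycle link width used below is an assumption for illustration only, not an official hardware figure:

```python
# Illustrative only: fabric clock = half the DDR4 transfer rate (first-gen Ryzen);
# the 32 B/cycle link width is an assumed round number, not a datasheet value.
LINK_BYTES_PER_CYCLE = 32

def fabric_bandwidth_gbps(ddr4_rate_mt_s):
    """Approximate one-way Infinity Fabric bandwidth for a DDR4 speed grade."""
    fabric_clock_mhz = ddr4_rate_mt_s / 2      # DDR makes two transfers per clock
    return fabric_clock_mhz * 1e6 * LINK_BYTES_PER_CYCLE / 1e9

for rate in (2133, 2666, 3200, 3600):
    print(f"DDR4-{rate}: fabric ~{fabric_bandwidth_gbps(rate):.1f} GB/s")
```

Whatever the exact link width really is, the proportionality is the point: going from DDR4-2133 to DDR4-3600 raises the fabric's bandwidth by the same ~69% that it raises the memory clock.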

This is only a problem when a thread moves from one core complex to the other.  Moving from one core within a core complex to a different core within the same core complex does not cause this problem.  It does mean that data has to move from one L1 and L2 cache to another, but the bandwidth available for that is massively higher and Intel CPUs have the same problem, anyway.

So long as a thread is active, constantly moving it around from one CPU core to another is a stupid thing for an OS to do.  Recent versions of Windows are smart enough not to do this.  But what if there are a bunch of threads bouncing back and forth between being active and not being active?  Then a thread was active, became inactive so that a different thread was scheduled on its CPU core, and then the first thread gets some more work to do again.  Is the OS supposed to make the first thread wait until that core is free?  That could mean waiting a long time.  It's reasonable to just move it to a different core.
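For software that knows it is latency-sensitive, one workaround is to pin itself to a single core complex so the OS never migrates its threads across the fabric.  A minimal Linux-only sketch, assuming a first-generation Ryzen layout where the first core complex is logical cores 0-3 (the real mapping depends on the CPU and on how SMT threads are enumerated):

```python
import os

# Hypothetical sketch: confine this process to one core complex (CCX).
# CCX0 = {0, 1, 2, 3} is an assumed core numbering, not guaranteed by the OS.
CCX0 = {0, 1, 2, 3}

available = os.sched_getaffinity(0)        # cores the OS currently allows us
target = (CCX0 & available) or available   # fall back on machines with fewer cores
os.sched_setaffinity(0, target)            # threads now stay within one CCX
```

Game engines generally can't assume this topology, which is why the burden tends to fall on the OS scheduler instead.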

What sort of software might do something like that a lot?  What if a program is trying to do something perhaps 60 times per second, and each time it does it, a bunch of threads all go through a cycle of being active for a while, then inactive for a while?  That's a pretty bad scenario for Ryzen.  But if that thing you're doing is the CPU work to render a frame, I've just described a whole lot of game engines.

I'm not sure just how bad the problem is for Ryzen.  But it does explain why Ryzen benefits from higher clocked memory in a lot of situations where Kaby Lake doesn't.  If Raven Ridge has only four CPU cores, that would likely mean one core complex, and hence complete immunity to this problem.  Of course, if you're using the integrated GPU, that's likely to make Raven Ridge need a ton of actual memory bandwidth, so it might well still want higher memory clock speeds.

Comments

  • Ridelynn Member EpicPosts: 7,341
    Wasn't there a similar problem when Bulldozer first came out, with its dual integer cores sharing a common FPU, where threads that jumped to different core pairs caused a significant degradation in performance?

    Seems that was ~mostly~ cleared up by some tweaks to the Windows scheduler (at least in real-world cases, you could always devise something synthetic that exaggerates the issue), and seems this issue with Ryzen could similarly be mitigated in the same fashion.
  • Quizzical Member LegendaryPosts: 24,444
    edited August 2017
    Ridelynn said:
    Wasn't there a similar problem when Bulldozer first came out, with its dual integer cores sharing a common FPU, where threads that jumped to different core pairs caused a significant degradation in performance?

    Seems that was ~mostly~ cleared up by some tweaks to the Windows scheduler (at least in real-world cases, you could always devise something synthetic that exaggerates the issue), and seems this issue with Ryzen could similarly be mitigated in the same fashion.
    No, that's a different issue entirely, and more analogous to running multiple threads on the same core with hyperthreading rather than on different cores.

    Think of a two-socket server and what happens when a thread that was running on a core of one socket later ends up running on a core on the other socket.  Ryzen's issue here isn't nearly as bad as that, but it's the same sort of issue.

    And then imagine if, for whatever reason, the bus connecting the two sockets were set to run at the same clock speed as system memory.  Clocking system memory higher could then improve performance purely because you're clocking the bus connecting the two CPU sockets higher.  That's analogous to what's going on here.
  • Ridelynn Member EpicPosts: 7,341
    edited August 2017
    So can't the scheduler just be made smart enough to not have threads jump across core complexes?  When possible, of course.

    I still fail to see a significant reason why that isn't possible, or why it wouldn't help alleviate the problem,

    because that's what it does for hyperthreading and Bulldozer and whatever other architectures, in my crude layman's understanding of how it works.
  • Cleffy Member RarePosts: 6,390
    edited August 2017
    Speed.  You always want to keep the core functionality as simple as possible.  AMD has always had a fast HyperTransport or Infinity Fabric.  I really don't think it's a big deal to pair Ryzen with better memory.  It's not some huge night-and-day difference for the typical user.  Power users would already be going after 3200MHz+ memory.
  • 13lake Member UncommonPosts: 719
    edited August 2017
    Memory at 3600MHz makes a Ryzen 7 1800X at 4.1/4.2GHz behave like it's clocked at 4.6/4.7GHz (FPS-wise for gaming; different gains for other things).

    And 3466MHz is easy to do, 3600MHz is a little trickier, and 3600MHz+ is possible but requires a golden-sample IMC and heavy tweaking of memory timings and other BIOS settings.
  • Quizzical Member LegendaryPosts: 24,444
    Ridelynn said:
    So can't the scheduler just be made smart enough to not have threads jump across core complexes?  When possible, of course.

    I still fail to see a significant reason why that isn't possible, or why it wouldn't help alleviate the problem,

    because that's what it does for hyperthreading and Bulldozer and whatever other architectures, in my crude layman's understanding of how it works.
    Is it possible to improve the situation with smarter OS scheduling?  Probably.  But this is a milder version of what multi-socket servers have had for many years, and that's far from solved.