
The difficulty of benchmarking processors for gaming use

Quizzical Member LegendaryPosts: 25,355

A common question that people building or buying a gaming rig have is what processor to get.  You know that the processor is important; the real question is how well various processors will tend to perform in various games.  With video cards, there are benchmarks all over the place that make it pretty clear how they'll tend to perform.  With processors, good gaming benchmarks are so scarce that I'm not aware of a single site I can point to and say, they've got good data that won't lead you to a wildly wrong conclusion.

The real occasion for this post is that AnandTech tried to do such an article today.  I won't link to it, as it's basically worthless.  But it's worth examining how they botched the testing to make the results worthless outside of some esoteric cases.

Some games tend to put more load on a video card, while others put more load on a processor.  Video card reviews want games that put a heavy load on the video card, so that you won't immediately have a processor bottleneck.  Otherwise, they'd say, we tested these ten cards and they all got 80 frames per second, because that's all that the CPU can handle.

There's a general principle that if you're benchmarking a type of component, you want a test in which results will depend mostly on that component.  If you're benchmarking video cards, you want the video card to be the limiting factor.  If you're benchmarking solid state drives, you want the SSD to be the limiting factor.  If you're benchmarking USB controllers, you want the USB controller to be the limiting factor.  And if you're benchmarking processors, you want the processor to be the limiting factor.

But tech sites tend to put a lot of focus on benchmarking video cards, and while they do benchmark processors, they don't put much emphasis on games when benchmarking processors.  When they do, sometimes they reuse the same games that they use to benchmark video cards--that is, games specifically chosen because the CPU probably won't be the limiting factor.  Oops.

And then there's the choice of what graphical settings to use.  Turning up adjustable graphical settings tends to put far more extra load on the GPU than the CPU, but it does vary by setting and by game.  Some settings are almost pure GPU load and the difference in CPU load is negligible--or in some cases, literally zero added CPU load over the course of a benchmarking run.  Anything that you can set in video drivers and any post-processing effects are almost invariably in this category.  Turning these on can be useful if you're trying to benchmark video cards.  Even so, anti-aliasing and anisotropic filtering can be dangerous to turn on, as different video cards may use different algorithms that give different image quality, and thus a different relative performance hit because they're not doing the same things.

But if you're trying to benchmark anything other than GPU performance, all such settings should be turned off or set to the minimum.  Different sites take different approaches to CPU benchmarks.  Some go with preset low or medium settings.  AnandTech went with max settings, which means that all of the pure GPU load settings were set to the maximum.  Oops.

The result was that in every game they benchmarked, adding a second Radeon HD 7970 or GeForce GTX 580 greatly increased performance for many of the CPUs--which pretty conclusively proves that the games had largely been GPU-limited before.  The CrossFire or SLI results aren't a clean way to get rid of a GPU bottleneck, either, as at that point, you may well be measuring CrossFire or SLI weirdness as much as CPU performance.  (For analogous reasons, some sites have taken to using SSDs when benchmarking other components, just so that the hard drive doing something weird won't interfere with results.)  That's how you end up with worthless results.

Obviously, AnandTech isn't the only tech review site out there.  Tom's Hardware publishes monthly lists of the best gaming CPUs for the money, purporting to offer an answer to the question at the start.  Unfortunately, their lists tend to be about as good as assuming that all current generation processors have prices in line with their performance and not looking up benchmarks anywhere.  There are obvious problems with that last approach, and they're very different from the problems with Tom's Hardware's approach.

So it is instructive to look at where Tom's Hardware goes wrong in their CPU benchmarks.  A game engine doesn't scale perfectly to n threads and no more, outside of single-threaded games.  Different portions of an engine may scale to different numbers of threads.  Within the rendering loop, a lot of work will scale well to many threads, but some won't--and in particular, at the start and end of a frame, you may have a substantial single-threaded component.  Some work such as loading data off of the hard drive or processing network activity isn't part of the normal rendering loop, and can also affect results.  Which components are waiting on which others can bounce back and forth a lot, and you could easily have the CPU waiting on the GPU one millisecond and the other way around the next.

If a game is mostly GPU bound, you'll often see results where processors with higher single-threaded performance fare a little better but not a lot better.  You have to get to a situation where you're almost purely GPU bound in order for a bunch of different processors to give almost exactly the same frame rate.

So what does Tom's Hardware do?  They benchmark games that don't put that much load on the CPU (remember, video card reviews get a lot more attention), note that better single-threaded performance gives you slightly higher frame rates when you're mostly GPU bound, and then conclude that Intel processors are better because they have higher single-threaded performance.  So they recommend almost entirely Intel processors all up and down the lineup--and in particular, they recommend a Core i3-3220 over an FX-6300, which I think is sheer lunacy if you want a gaming rig that will last you several years.

Now, you could say, if the Core i3-3220 slightly beats an FX-6300 in situations where you're mostly GPU-bound and they're the same price, then why not get it?  You'll be GPU limited in a lot of games, after all.  Even if it's 100 frames per second for the former and 90 for the latter, 100 is more than 90, so even if the difference doesn't matter much, why not get the 100?  Of course, I'd argue that if you want to increase performance in situations where you're mostly GPU limited, the place to spend more is the video card, not the processor.

The problem is, what happens if you pick up a game that puts a lot of load on the CPU?  For the computations that games tend to do, most things can readily be scaled to as many CPU cores as you care to use.  In that case, you could easily end up largely CPU-limited in a game.  In such a game, you're all but guaranteed that the FX-6300 will beat the Core i3-3220, and probably by a lot--likely in the ballpark of a 50%-100% performance advantage.

So if the Core i3-3220 wins slightly in situations where the difference doesn't matter, but the FX-6300 will predictably win by a huge margin in situations where the difference does matter, which do you prefer?  It's sort of a philosophical question, but even if you don't think that the situations where the difference matters will be very common, I still say that preferring the FX-6300 is an easy call.

But Tom's Hardware hasn't come across any such games in their benchmarks (and remember, games to get benchmarked largely get chosen on the basis of putting a heavy load on the GPU, not the CPU), so they recommend the Core i3-3220.  In fairness, such games aren't common and won't be for quite some time; a game that struggles to run on a Core i3-3220 today wouldn't have a very big market.  But it used to be that games that required a dual core processor wouldn't have much of a market; that's hardly the case today.

Now, I didn't come here to bury AnandTech and Tom's Hardware.  AnandTech has some excellent articles explaining what various components do.  Both often have useful hardware reviews and benchmarks.  It's not like they're sites run by fanboy idiots who haven't a clue what they're talking about.

But that just points back to the title:  if even some of the better tech sites have tried to come up with benchmarking methods that predict how well various processors will run games and have failed miserably, while some sites that review gaming hardware have declined to even try, then it surely isn't easy to do.

Comments

  • Cleffy Member RarePosts: 6,412
    I comment on this on Tom's Hardware after almost every CPU benchmark comparison.  Not one strategy game or MMO in the benchmarking suite, yet these are the CPU-limited games.
  • Ridelynn Member EpicPosts: 7,383

    I think you hit upon the major problem in a round-about method.

    You can generically benchmark a CPU: there are several different metrics (Overall, Floating Point, Single Thread, etc) using various methods (Passmark, BogoMIPS, Prime95 benchmarks, etc). Those can give you a definitive answer as to which CPU will be faster in a specific benchmark suite. You can even stabilize conditions and pit different CPUs against each other, with all other hardware alike, and run game benchmarks, and get a definitive answer as to which is best, again, for that particular instance.

    But it can't answer which is "better for gaming".

    Each game, and taking it further, each program, is going to be unique. The instructions land in a different order, and utilize different features of the CPU - which each CPU will handle somewhat differently. Multi-threaded application is just one (very important) facet of the overall performance of a particular piece of software on a particular CPU.

    That, and there is also the important question: what is "good enough" for gaming? There is a very steep curve of diminishing returns, and we've been on the far end of that curve, with regard to CPU performance, for a few years now - I would argue. Is the difference between 100 and 90 FPS that important? Sure, 100 is "better", but 90 is surely good enough (for those of us not running 120Hz monitors) - just as you outline in the OP. Is it worth $10 for those 10FPS? $100? $1000? That's roughly the price spread of the current crop of available options, and in many cases, especially those on single-monitor 1080p displays, that's a pretty typical FPS spread for a typical game as well.

    And the final question, even if you could answer the above 2 questions with absolute certainty, citing benchmarks for every single game you hope to play: How long will it be the best/good enough? Software is evolving. We are learning how to utilize more cores more efficiently. We are learning how to offload to the GPU, and to the Server/Cloud, and to any number of other mechanisms that reduce the burden on the CPU. How long will that CPU be able to perform adequately before either it can't provide enough performance any longer, or technology has shifted sufficiently to obsolete it in some other manner?

    The three questions that matter:
    Between a closed set of options, which hardware is better for gaming?
    Inside of that same closed set of options, which hardware is good enough for gaming now?
    How long will those two choices of hardware be good enough?

    Benchmarks can help answer the first two questions, if taken in context - and you lay out a lot of the concerns about taking a benchmark out of context. But there is no real definitive answer to the last question.

  • Quizzical Member LegendaryPosts: 25,355
    Originally posted by Ridelynn

    Each game, and taking it further, each program, is going to be unique. The instructions land in a different order, and utilize different features of the CPU - which each CPU will handle somewhat differently. Multi-threaded application is just one (very important) facet of the overall performance of a particular piece of software on a particular CPU.

    ...

    We are learning how to utilize more cores more efficiently. We are learning how to offload to the GPU, and to the Server/Cloud, and to any number of other mechanisms that reduce the burden on the CPU. How long will that CPU be able to perform adequately before either it can't provide enough performance any longer, or technology has shifted sufficiently to obsolete it in some other manner?

    On the first quoted portion, that's certainly true.  But different instructions in different orders is true of video cards, too.  With video cards, you can benchmark a bunch of games, take kind of an "average" result, and assume that future games are likely to perform proportionally to today's average result.  And that assumption likely won't be all that wildly wrong.  If CPU usage of AVX, FMA, AES-NI or some such becomes widespread in games in the future, that could skew results in favor of CPUs that support the instructions and against those that don't, but I wouldn't expect to see that happen.

    Processors also have the issue of scaling to more cores.  Video cards kind of have that issue in scaling to more shaders and such, but it's not a major concern, as it's pretty trivial for real games to scale.  It would be easy to write a program that didn't scale to use as many shaders as you've got if you were so inclined, but that would be intentional sabotage, not merely a case of accidentally not working very well.  There, I'm optimistic that games that need the computational power that more cores provide will use it, precisely because it's not that hard to do in games.

    Yes, people are getting better at writing programs that scale well to more CPU cores.  But in games, the only real trick you need is a standard producer-consumer queue.  There are some situations where other threading approaches are more efficient (both in running speed and in code readability), but I'd be surprised if there are more than a handful of games out there that do anything that needs more CPU power than, say, a 1 GHz single-core Atom would offer, but wouldn't be easy to scale to many cores even if your only threading paradigm is the producer-consumer queue.

    There are some programs where offloading work to the cloud is a big deal.  Games aren't among them today, and aren't likely to be among them anytime soon.  While online games do need servers, the point of that isn't to offload work from the CPU.  But you know that, and weren't talking about games in particular.

    As for offloading work to the GPU, I'd argue that that's largely a matter of using the new capabilities of recent video cards.  Some cutting-edge stuff like tessellation or GPU physics may well be hard to do.  But geometry shaders aren't.  While I've argued that the reason tessellation isn't used more is because people lack the math background, that's not true of geometry shaders.  If you've got the math background to write older vertex or pixel/fragment shaders by any method other than copying formulas that you found somewhere, then you've got the background to get a lot of mileage out of geometry shaders.

    Now, games that have a DirectX 10 or later version probably do use geometry shaders.  But so long as they're still trying to make a DirectX 9.0c version for backward compatibility, it limits what you can do with them to some degree, as it means that everything you do with geometry shaders either has to be for a non-essential effect that can be turned off, or else you have to make a DirectX 9.0c version of the same effect that doesn't use geometry shaders.

    Geometry shaders let you do two key things.  First, they let you see an entire primitive at once rather than only one vertex or one pixel at a time.  Second, they let you emit a different number and type of primitives than you take in.  That adds a ton of versatility.

    For an example of why the first is good, consider the situation of wrapping a texture around a cylinder (or perhaps rather, something homeomorphic to a cylinder).  There are a lot of cases where you'd want to do this, such as if you're drawing a character's arm or leg.

    The way that OpenGL works is that to apply a texture, fragment shaders need texture coordinates that correspond to the given pixel, which tells it where in the texture to look.  You specify your texture coordinates at each vertex in the last stage before rasterization (geometry shaders if you use them, vertex shaders if not), and the video card will interpolate intelligently to give you texture coordinates at each fragment that it generates.
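
    To make that concrete, here's a bare-bones sketch of that setup without a geometry shader.  The names are made up for illustration (aside from fTexCoord, which I'll reuse below), and the #version and layout details are just one plausible way to set it up:

    // Vertex shader: copy the per-vertex texture coordinate into an output.
    #version 330 core
    layout(location = 0) in vec3 aPosition;
    layout(location = 1) in vec2 aTexCoord;
    uniform mat4 uModelViewProjection;
    out vec2 fTexCoord;    // the hardware interpolates this across each triangle

    void main() {
        fTexCoord = aTexCoord;
        gl_Position = uModelViewProjection * vec4(aPosition, 1.0);
    }

    // Fragment shader: fTexCoord here is already the interpolated value for this pixel.
    #version 330 core
    in vec2 fTexCoord;
    uniform sampler2D uTexture;
    out vec4 outColor;

    void main() {
        outColor = texture(uTexture, fTexCoord);
    }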

    When you want to wrap around a cylinder, this causes problems.  Let's suppose that you have four vertices in a loop around the cylinder, and that they have texture coordinates in the relevant axis of 0.1, 0.35, 0.6, and 0.85, respectively.  When you go from the first vertex to the second, if you have 5 pixels, it will interpolate them to give values somewhere around 0.125, 0.175, 0.225, 0.275, and 0.325, respectively.  (For technical reasons, it's unlikely to be these exact values or even that exact spacing, but it works properly.)  That works pretty well.  To go from the second vertex to the third and from the third to the fourth, it also works properly.

    But to go from the fourth vertex to the first, you want texture coordinates in the ballpark of 0.875, 0.925, 0.975, 0.025, and 0.075.  But it won't do that; it will wrap around in the other direction, and give you coordinates around 0.775, 0.625, 0.475, 0.325, and 0.175.  That's probably hard to follow, so I'll give you a picture:

    [image: the cylinder texture wrapping the wrong way between the fourth vertex and the first]

    Having more vertices filled in doesn't fix the problem.  It arguably makes it look worse:

    [image: the same wrap-around artifact with more vertices filled in]

    To let you see the texture without artifacting, here it is:

    [image: the texture by itself, without the artifacting]

    There are a variety of ways to fix this without using geometry shaders.  If you could tell the video card that, when connecting the fourth vertex to the first, you want to use a texture coordinate of 1.1 rather than 0.1, it works.  Without geometry shaders, you can split your model so that the first vertex is two different vertices with two different texture coordinates, and those vertices have to be processed completely separately.  This also forces you to include texture coordinates in your vertex data.  It works, but it's something of a kludge.

    With geometry shaders, it's pretty simple:  if two vertices in the same primitive have texture coordinates that are too different, then you change the texture coordinates of a vertex for that primitive only.  Or, to give you explicit source code:

    fTexCoord.y += round(gTexCoord[0].y - gTexCoord[1].y);

    You have addition, subtraction, and rounding to an integer.  That's elementary arithmetic, and nothing fancy.  It completely fixes the problem, means that you don't have to mutilate your models to replicate vertices, and doesn't require the CPU to do anything at all.
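
    To show where that line lives, here's a rough sketch of a whole pass-through geometry shader built around it.  This is just an illustration, not code from an actual game:  the layout details and the choice to correct every vertex against vertex 0 are assumptions, but the idea is the same.

    // Geometry shader: take each triangle, fix up wrap-around texture
    // coordinates, and otherwise pass the triangle through unchanged.
    #version 330 core
    layout(triangles) in;
    layout(triangle_strip, max_vertices = 3) out;

    in vec2 gTexCoord[];    // per-vertex texture coordinates from the vertex shader
    out vec2 fTexCoord;     // interpolated and handed to the fragment shader

    void main() {
        for (int i = 0; i < 3; ++i) {
            fTexCoord = gTexCoord[i];
            // If this vertex's coordinate is more than half a texture away from
            // vertex 0's, shift it by a whole number so that interpolation goes
            // the short way around the cylinder.  (For i == 0 this adds zero.)
            fTexCoord.y += round(gTexCoord[0].y - gTexCoord[i].y);
            gl_Position = gl_in[i].gl_Position;
            EmitVertex();
        }
        EndPrimitive();
    }

    The correction is applied per primitive, so a vertex that sits on the seam can get one coordinate in the triangle on one side and a different coordinate in the triangle on the other side--which is exactly what you can't do when texture coordinates are baked into the vertex data once.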

    But if you're also going to make a DirectX 9.0c version of your game, you have to also implement the awkward approach, and that means different art assets for the different versions.  Maybe you can store the DirectX 10 version only and compute the DirectX 9.0c art assets from it when you load them, but that's something of a pain.

    Above, geometry shaders were really just a convenience.  Let's give another example of something that is much harder to do without geometry shaders:  particle effects with lots of particles moving in complex ways.

    Let's suppose that you want to have a character's attack consisting of sending 100 particles from the character to his target.  They could be individual pixels on the screen, or small tetrahedra, or whatever.  But you want 100 of them, you want them to go from this point to that point, and you want them to disappear when they get there.  You want them to spread out a bit so that some appear a little before others and disappear a little before others, and the paths spread out over a cylinder or cone or ellipsoid or some such.

    If you've got geometry shaders, this is pretty easy to do.  You can have simple data consisting of 100 floats as your vertex data, and uniforms for the starting and ending positions and how much time has passed.  You can have the vertex shaders decide where to put each particle, and then geometry shaders convert the point into a tetrahedron or whatever.  Also importantly, you can have the geometry shader simply discard some of the particles if it's before they were supposed to be created or after they disappear.  You could even have the geometry shader convert the particles into a small explosion at the end by changing the number and position of primitives it outputs for each vertex it takes in.
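
    To sketch that out (with made-up names and an arbitrary formula for the particle paths--the point is the structure, not the specific math):

    // Vertex shader: the only per-particle vertex data is a single float index.
    #version 330 core
    layout(location = 0) in float aParticleIndex;   // 0, 1, 2, ..., 99
    uniform vec3  uStart;        // where the attack starts
    uniform vec3  uEnd;          // where it ends
    uniform float uElapsed;      // seconds since the attack began
    uniform float uDuration;     // how long each particle takes to travel
    out float gProgress;         // 0 at launch, 1 at arrival; outside [0, 1] = not visible

    void main() {
        // Stagger the launch times and scatter the paths a little off the straight line.
        float delay = 0.3 * fract(sin(aParticleIndex * 12.9898) * 43758.5453);
        gProgress = (uElapsed - delay) / uDuration;
        float t = clamp(gProgress, 0.0, 1.0);
        vec3 offset = 0.2 * vec3(sin(aParticleIndex * 7.0), cos(aParticleIndex * 13.0), 0.0);
        // The sin() factor makes the offset zero at both endpoints, so every
        // particle starts exactly at uStart and ends exactly at uEnd.
        gl_Position = vec4(mix(uStart, uEnd, t) + offset * sin(3.14159 * t), 1.0);
    }

    // Geometry shader: discard particles outside their time window, and expand
    // each surviving point into a small screen-aligned quad.
    #version 330 core
    layout(points) in;
    layout(triangle_strip, max_vertices = 4) out;

    in float gProgress[];
    uniform mat4 uViewProjection;
    uniform float uSize;         // half-width of a particle quad in clip space

    void main() {
        // This is the part that's awkward without geometry shaders:  if the
        // particle hasn't launched yet, or has already arrived, emit nothing.
        if (gProgress[0] < 0.0 || gProgress[0] > 1.0) return;

        vec4 center = uViewProjection * gl_in[0].gl_Position;
        vec2 corners[4] = vec2[](vec2(-1.0, -1.0), vec2(1.0, -1.0), vec2(-1.0, 1.0), vec2(1.0, 1.0));
        for (int i = 0; i < 4; ++i) {
            gl_Position = center + vec4(corners[i] * uSize, 0.0, 0.0);
            EmitVertex();
        }
        EndPrimitive();
    }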

    You can actually make this better yet by using tessellation.  You input some token data and a uniform saying how many particles you want, and the hardware tessellator generates exactly that many points.  This eliminates the need to cull stuff at the ends.

    And you can do all of that on the GPU so that it carries essentially no performance hit.  You could have a thousand particles instead of a hundred.  You could have 20 characters using that attack at once, for thousands of particles moving around on the screen.  All that the CPU needs to see is the starting location, the ending location, how much time has passed, and whatever is necessary to draw a single particle.  And again, doing this in geometry shaders would carry virtually no performance hit; indeed, it could even increase your performance on net (as compared to drawing nothing) if you don't do any texturing or fancy lighting computations for the particles, because it lets you skip running some more expensive pixel shaders for the pixels that are covered up by the particles.

    If you want to port that back to DirectX 9.0c, you're going to have to do a lot of the work of determining which particles go where on the CPU instead--and then you're going to have to redo it and re-upload it every single frame.  That carries a big performance hit.  And you can't just decide not to draw it, or else the attack is now invisible.  You could have the DirectX 9.0c version do something much simpler to draw, but then you're looking at the DirectX 9.0c version having completely independent artwork from the DirectX 10 version.  If you want to do that in very many places in the game, you can bloat the development cost in a hurry.

    I'm hoping that extensive use of geometry shaders like that will become common, as it's the next real thing that needs to be done to offload more work to the GPU.  But fully embracing it means no more DirectX 9.0c versions of games, and that's what's holding it back.

    Did I just derail my own thread?  :D

  • Quizzical Member LegendaryPosts: 25,355

    And so, of course, the very day after I post this, Tom's Hardware comes out with this:

    http://www.tomshardware.com/reviews/neverwinter-performance-benchmark,3495-8.html

    Want a game that is CPU-bound?  Neverwinter sure is.  At max detail, they basically couldn't tell the difference between a Radeon HD 7790 and a 7970, or between a GeForce GTX 650 Ti and a GTX 680.  And nothing could touch 60 frames per second, though you might be able to get there by overclocking a Core i5-3570K a good bit.

    Meanwhile, it only scales to about 3 CPU cores.  So it's not possible to feed the game the sort of CPU power that it needs.

    Even so, it's not really that surprising that Neverwinter has such a CPU bottleneck.  It's basically the same game engine as Champions Online and Star Trek Online, which were also mostly CPU-bound.

  • ShakyMo Member CommonPosts: 7,207
    I look at benches for games that push both the CPU and GPU

    E.g. PlanetSide 2, Skyrim with the HD mod