To be clear, by "hyperthreading", I'm talking about the feature in many recent Intel processors in which each core has extra scheduling resources so that a single core can have two threads active at once. If one thread doesn't have anything ready to execute (e.g., because it's waiting to get data from system memory), then the other thread can use the core briefly. This lets a CPU core bounce back and forth between two threads on a nanosecond scale to fill small gaps when a core would otherwise be idle. This is much faster than an OS can switch them back and forth.
Having one core with hyperthreading isn't nearly as good as having two real cores, of course. Hyperthreading lets a single core juggle two threads, but those threads share a single core's execution resources, so they can never both run at full speed at once the way they trivially can on two separate cores.
In gaming benchmarks, a quad core with hyperthreading typically has little to no advantage over a quad core without hyperthreading. The reasons for this are pretty simple: many games don't scale to more than four cores, and even those that do are likely to be video card limited rather than processor limited on a system with four fast cores.
But if you look at dual cores, it's a different picture entirely. Intel's Pentium and Celeron branded processors tend to offer miserable gaming performance. They'll still make many games playable, but they're a huge step down from even a budget AMD quad core.
Meanwhile, a Core i3, which is nearly the same thing except with hyperthreading, often fares much better. Yes, the Core i3 is clocked higher and has more L3 cache, but it often beats a Pentium dual core of the same architecture by far more than you'd predict from that. Sometimes it even beats a comparably-priced AMD quad core in games where a Pentium dual core loses badly. That wouldn't happen in a game that scaled well to four cores. Yet hyperthreading ought to be useless in programs that can't put more than two cores to good use.
For quite a while, I found this rather puzzling. But having spent a lot of time recently programming a game, I think I have the answer.
Games typically have only one CPU thread that communicates with the video card, because if two threads try to talk to the video card at the same time, they'll trip over each other and break everything. For technical reasons, my game actually has two threads that communicate with the video card: one initializes some things and then dies, and the other isn't allowed to talk to the video card until the first is done. For complicated reasons, this makes the game load faster. Regardless, only one thread communicates with the video card at a time. While DirectX 11 offers multithreaded rendering to get around this, there are compelling reasons not to use it unless you know that the client has many CPU cores. We'll get there shortly.
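To make that hand-off concrete, here is a minimal sketch of the pattern, assuming C++ and standard threads; InitGraphics and RenderLoop are hypothetical stand-ins, not my actual code:

```cpp
#include <future>
#include <thread>

// Hypothetical stand-ins for the real work.
void InitGraphics() { /* create the device, load startup resources */ }
void RenderLoop()   { /* the long-lived rendering loop */ }

int main() {
    std::promise<void> initDone;
    std::future<void> ready = initDone.get_future();

    // Thread 1: initializes some things and then dies.
    std::thread init([&] {
        InitGraphics();
        initDone.set_value();  // hand the video card off
    });

    // Thread 2: may not touch the video card until thread 1 is finished.
    std::thread render([&] {
        ready.wait();  // block until initialization completes
        RenderLoop();  // from here on, this is the only thread talking to the GPU
    });

    init.join();
    render.join();
}
```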
Since only one thread can handle the video card, what you do is have other threads handle everything else, while the one rendering thread mostly just passes along the data that the other threads have prepared, doing a little work of its own to keep things organized. The advantage of this approach is that the rest of the work doesn't have to wait on passing data to the video card, but can go as soon as it's ready. That way, you're not forced to do a large fraction of the work in the rendering thread, which would leave you unable to scale well to many CPU cores.
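Here's a bare-bones sketch of that arrangement with made-up names: the generic producer/consumer pattern, not my exact code. Worker threads call Push the moment their data is ready; the rendering thread loops on Pop and forwards each item to the video card.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical prepared-draw data; the real thing would carry vertex
// buffers, transforms, shader parameters, and so on.
struct DrawItem { int meshId; };

class RenderQueue {
    std::queue<DrawItem> items_;
    std::mutex mutex_;
    std::condition_variable ready_;
public:
    // Called from any worker thread as soon as its data is prepared.
    void Push(DrawItem item) {
        { std::lock_guard<std::mutex> lock(mutex_); items_.push(item); }
        ready_.notify_one();
    }
    // Called only from the rendering thread, which passes the item
    // along to the video card.
    DrawItem Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        DrawItem item = items_.front();
        items_.pop();
        return item;
    }
};
```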
The "problem" comes when the processor can pass data and commands along to the video card faster than the video card can process them. That's an inevitable result of a single drawing command easily being able to cause hundreds of thousands of shader invocations on the video card. What I think happens is that the video drivers just put some things in a queue that the video card will handle when it's ready.
If the queue gets big enough, however, the video card basically tells the rendering thread to stop and wait for it to process some data before continuing. This is a good thing: if a video card let you have 20 frames' worth of rendering commands sitting in a queue, the resulting display latency would probably make the game unplayable.
The video card might well be ready again a few microseconds later, so you don't want to stop the rendering thread entirely: an OS-level sleep could last milliseconds and leave the video card idle for a good chunk of that. Instead, the rendering thread remains "active" and occupies a core for those few microseconds while it waits for the go-ahead from the video card to continue.
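I can't see inside the driver, of course, but the behavior I observe is consistent with something like the sketch below: a bounded command queue whose producer spins instead of sleeping. This is purely my guess at the mechanism, not actual driver code.

```cpp
#include <atomic>
#include <immintrin.h>  // for _mm_pause on x86

// My guess at the driver's behavior, not actual driver code: a bounded
// command queue where the submitting thread spins rather than sleeps.
constexpr int kMaxQueuedCommands = 3;   // e.g., a few frames of commands
std::atomic<int> queuedCommands{0};     // decremented as the GPU finishes work

void SubmitCommand(/* command data */) {
    // Spin until the GPU drains the queue a bit. The thread accomplishes
    // nothing here, but it still occupies a core the whole time: exactly
    // the kind of gap hyperthreading lets another thread fill.
    while (queuedCommands.load(std::memory_order_acquire) >= kMaxQueuedCommands)
        _mm_pause();  // hint to the core that this is a spin-wait
    queuedCommands.fetch_add(1, std::memory_order_release);
    // ... hand the command to the video card ...
}
```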
If a game is very much GPU-bound, the rendering thread could easily spend two-thirds of its time waiting on the video card. Yet it occupies a CPU core that entire time, even though for most of it, it isn't accomplishing anything beyond the wait itself.
And yes, I'm pretty sure that this does happen. In my game, I've tried settings that should be very GPU-heavy with not much CPU load, and I get CPU usage of one core plus a small fraction of another. Cut the per-frame GPU load in half (say, by turning off SSAA) without changing the per-frame CPU load and you double the frame rate, while CPU usage only grows to one core plus a slightly larger fraction of another--even though the CPU is now doing twice as much work as before, to prepare twice as many frames.
So what does this have to do with hyperthreading? A high-priority thread that occupies a core while executing very few instructions is a tailor-made scenario for hyperthreading to shine. Put two threads on a core and the rendering thread mostly leaves gaps for the other thread to fill, so that other thread can reach a large fraction of the performance it would have with the core to itself.
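If you want to watch this happen, you can pin two threads to the two logical processors of a single physical core and compare against pinning them to separate cores. A Windows-flavored sketch, assuming logical processors 0 and 1 are siblings on the same core (the usual Intel layout; GetLogicalProcessorInformation can confirm it on a given machine):

```cpp
#include <windows.h>
#include <thread>

// Experimental sketch: put the rendering thread and one worker on the two
// logical processors of one physical core. ASSUMPTION: logical processors
// 0 and 1 share a core, which is typical but should be verified with
// GetLogicalProcessorInformation.
int main() {
    std::thread render([] {
        SetThreadAffinityMask(GetCurrentThread(), 1 << 0);  // logical processor 0
        // ... submit commands, spinning while the video card is busy ...
    });
    std::thread worker([] {
        SetThreadAffinityMask(GetCurrentThread(), 1 << 1);  // logical processor 1
        // ... do "real" work in the gaps the rendering thread leaves ...
    });
    render.join();
    worker.join();
}
```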
That leads to a strange conclusion: hyperthreading is likely to be particularly useful in games that are largely GPU-bound. But that's not as strange as you might think. On a second-to-second basis, the relative amount of work that a GPU does as compared to a CPU typically doesn't fluctuate that wildly for a given game at given settings on given hardware. That's what you see if you try to measure CPU or GPU load with Windows Task Manager, CPU-Z, GPU-Z, Catalyst Control Center, or whatever.
But on a millisecond scale, the relative load can fluctuate wildly. At the start of a new frame, the CPU knows that there are a bunch of objects the game might potentially want to draw, so it can have a bunch of threads process those objects at once to get them ready and stick them in a queue for the rendering thread. If you're GPU-bound, the CPU (other than the rendering thread) may be done with its work for that frame while the video card is only a third of the way through its own, so the CPU then gets to sit there and wait. It can easily happen that CPU usage is near 100% for a few milliseconds, then nothing is active but the rendering thread for the rest of the frame, bouncing between those two extremes over the course of most frames.
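In code, the per-frame burst looks something like this fork/join sketch (hypothetical names again; a real engine would batch objects rather than spawn a task apiece):

```cpp
#include <future>
#include <vector>

// Hypothetical stand-ins for per-object preparation work.
struct SceneObject { /* meshes, positions, animation state, ... */ };
void PrepareForDrawing(const SceneObject&) { /* cull, animate, fill buffers */ }

void PrepareFrame(const std::vector<SceneObject>& objects) {
    // Burst: every core lights up at the start of the frame...
    std::vector<std::future<void>> tasks;
    for (const SceneObject& obj : objects)
        tasks.push_back(std::async(std::launch::async,
                                   PrepareForDrawing, std::cref(obj)));
    for (std::future<void>& t : tasks)
        t.wait();
    // ...and then, if the game is GPU-bound, the CPU is done long before
    // the video card finishes the frame and mostly idles until the next one.
}
```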
Now, that's not a bad thing, really. Remember where I said above that the rendering thread tries to organize things? That's a lot easier to do if you have everything for a frame available to be organized than if you only get a few items at a time. Switching programs (the shaders the video card runs) is particularly expensive, so if you can sort a bunch of surfaces to draw by program, you can draw many surfaces between program switches. For example, switch to the program that draws ellipsoids, draw all of the ellipsoids for the entire frame at once, then switch to the program that draws tree branches, draw all of the tree branches for the entire frame, and so forth. If the game is mostly waiting on the CPU instead, the rendering thread may get one surface, send it along to the video card, then switch programs to whatever the next surface with data ready needs, and end up switching programs nearly every single time it draws anything.
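As a sketch of that sorting idea, with made-up types: group the queued surfaces by program so the rendering thread pays for one program switch per group instead of one per surface.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw record keyed by the program that draws it.
struct Surface { int programId; /* vertex data, uniforms, ... */ };

void SwitchProgram(int) { /* e.g., glUseProgram; the expensive part */ }
void Draw(const Surface&) { /* submit the draw call to the video card */ }

void DrawSorted(std::vector<Surface>& surfaces) {
    // Sort so that all surfaces using the same program are adjacent.
    std::sort(surfaces.begin(), surfaces.end(),
              [](const Surface& a, const Surface& b) {
                  return a.programId < b.programId;
              });
    int currentProgram = -1;
    for (const Surface& s : surfaces) {
        if (s.programId != currentProgram) {   // switch only at group boundaries
            SwitchProgram(s.programId);
            currentProgram = s.programId;
        }
        Draw(s);  // all the ellipsoids together, then all the tree branches, etc.
    }
}
```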
Hyperthreading means that the rendering thread doesn't have to waste a core while it's just waiting for the video card to be ready. On a dual core, that's the difference between one core available for "real" work and two, and yes, that's a huge difference. It can account for the chasm in performance between a Core i3 and a Pentium dual core even in games that don't scale well to four cores. Meanwhile, a game that can't use four real cores leaves the extra cores of an AMD quad core without much to do, so it may well lose to the Core i3 too. Four slower cores don't beat two faster cores if you can't put more than two to good use.
And yet, the real problem here is that you're mostly GPU bound, which is why the games where this happens commonly show a bunch of CPUs bunched near the top, with a Core i3 maybe beating, say, an FX-4300 by several frames per second, but only losing to a Core i7-3770K by about that same margin. Get a faster video card (or just turn down graphical settings that don't put much load on the CPU) and the relative CPU results could easily change.
The upshot is that hyperthreading matters a lot for gaming on a dual core. But if you're buying a new processor today with gaming in mind, you don't want a dual core. Which is almost to say that hyperthreading doesn't actually matter much for gaming at all. But not for the reasons you thought.