The trouble with DLSS and related algorithms

Quizzical · Member, Legendary · Posts: 25,355
The short version is that there is no free lunch.  There is no substitute for having enough samples to produce high quality images at high frame rates.  But if you can't generate as many samples as you want as fast as you want, then there are trade-offs as to what penalty you take for that.

I made a thread on a related topic a little shy of two years ago.  It's still accurate and relevant to the topic at hand, so you may wish to brush up on it if you'd like more background.  The old thread is here:

https://forums.mmorpg.com/discussion/491977/the-truth-about-high-resolutions-dlss-and-variable-rate-shading

I'm not going to rehash the information in that old thread, however.  This thread has some new stuff to say.  Today, I want to talk about the trade-offs inherent in using DLSS, FSR, and Nvidia's new frame generation, and important ways that they're different from each other.  Such novel methods of generating frames from samples create new options, but it's good to understand the trade-offs.  They don't just magically give you better image quality or higher frame rates for free.

Let's back up a bit and explain some terms.  The traditional way to do 3D graphics is to have one sample per pixel per frame.  You do a bunch of computations in order to conclude that this pixel of this frame should be this particular color.  You do analogous computations for each other pixel in a frame, and that's how you render a frame.

But it doesn't necessarily have to be one sample per pixel.  You can generate a sample at any arbitrary location on the screen, not just the center of a pixel.  You can have multiple samples per pixel and then average their colors to get the color that will be displayed on a monitor.  That's called super sample anti-aliasing, or SSAA.  You can also have multiple pixels per sample, such as having one sample at the center of each 2x2 grid of pixels and using the color chosen for all four of the pixels.  Doing that in the traditional way is called upscaling.
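
To make the sample-to-pixel mapping concrete, here's a toy sketch in Python with NumPy.  This is not how any real driver implements these techniques; it just shows the bookkeeping: SSAA averages several samples down into each pixel, while simple upscaling reuses one sample for several pixels.

    import numpy as np

    def ssaa_downsample(samples, factor):
        # Average each factor x factor block of samples into one output pixel (SSAA).
        h, w, c = samples.shape
        blocks = samples.reshape(h // factor, factor, w // factor, factor, c)
        return blocks.mean(axis=(1, 3))

    def nearest_upscale(samples, factor):
        # Reuse each sample for a factor x factor block of output pixels (upscaling).
        return np.repeat(np.repeat(samples, factor, axis=0), factor, axis=1)

    lo = np.random.rand(4, 4, 3)          # tiny stand-in for a grid of rendered samples
    hi = np.random.rand(8, 8, 3)

    print(ssaa_downsample(hi, 2).shape)   # (4, 4, 3): four samples per pixel
    print(nearest_upscale(lo, 2).shape)   # (8, 8, 3): one sample per four pixels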

Fundamentally, a sample is one set of computations that say, in this particular frame, the color at this particular point on the screen should be such and such.  Generating the samples is most of the work of rendering a frame.  For a simple method of one sample per pixel, mapping samples to pixels is trivial.  Some recent methods such as DLSS, FSR, and Intel's new XeSS use more complicated methods to map samples to pixels, with each sample affecting multiple pixels (sometimes spread across multiple frames) and each pixel relying on multiple samples.

For simplicity, let's suppose that a video card can generate a fixed number of samples per second.  This isn't actually true, for a variety of reasons.  For starters, it will vary a lot by game, and also by the graphical settings that you use within a particular game.  Let's suppose that we're talking about some fixed game and some fixed settings that you want, with various options turned on or off, scaled up or down, or whatever is relevant.

The assumption of a fixed number of samples per second is also untrue because most stages of the graphics pipeline don't actually depend on the monitor resolution chosen.  Rather, the video card computes various information about triangles or quads in screen space without knowing the resolution, then determines which pixels (or more properly, sample locations) each triangle covers at rasterization.  Only after this does it do separate computations for each sample.  Still, as you can see from many benchmark results, performance scales strongly with monitor resolution, so a large fraction of the workload usually comes after rasterization, and thus scales linearly with the number of samples generated.

Another reason why the fixed number of samples per second is untrue is because more complex methods of generating frames from them carry additional computational overhead.  If the GPU is spending 90% of its time generating samples and 10% of its time assembling them into frames to display on your monitor, then it's naturally going to generate fewer samples than if it can spend 100% of its time generating them.  This effect makes all of the newer methods such as DLSS worse than they would otherwise be, but it's not a large effect.

Even so, let's suppose that you can generate some fixed number of samples per second for the sake of simplicity.  For concreteness, let's suppose that it's 248,832,000.  That's enough to render a game at 3840x2160 at 30 frames per second, or 2560x1440 at 67.5 frames per second, or 1920x1080 at 120 frames per second.  Even using traditional rendering, you get a strong trade-off between monitor resolution and frame rate.
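
If you want to check the arithmetic, it's just the sample budget divided by the pixels per frame.  A quick sketch (the budget number is the one assumed above, not a real GPU spec):

    SAMPLE_BUDGET = 248_832_000   # samples per second, as assumed above

    for w, h in [(3840, 2160), (2560, 1440), (1920, 1080)]:
        fps = SAMPLE_BUDGET / (w * h)   # one sample per pixel per frame
        print(f"{w}x{h}: {fps:.1f} frames per second")
    # 3840x2160: 30.0 / 2560x1440: 67.5 / 1920x1080: 120.0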

Being able to generate more samples per second makes everything better.  If you double the number of samples per second that you can generate, then you can double your frame rate at a given resolution.  Alternatively, you could maintain the same frame rate at a higher resolution.  Similarly, fewer samples per second makes everything worse.

The idea of FSR (here I mean the original, spatial FSR, which builds each frame purely from that frame's own samples) is basically to say, let's not generate all of the samples for a frame.  Let's only generate half of them, and then use some fancy algorithm to create the full frame.  Because you're only generating half as many samples per frame, you double your frame rate.  However, because you only have half as many samples available, you get worse image quality.  It doesn't have to be exactly half, and could be any other ratio.  But the general rule is that the more samples that you have, the better the image quality that you can get.
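
In terms of the fixed sample budget, rendering at a reduced scale per axis cuts the samples per frame by the square of that scale, and the frame rate goes up by the inverse.  A sketch, with scale values chosen purely for illustration:

    def frame_rate_multiplier(render_scale):
        # Per-axis render scale -> frame rate gain, assuming cost scales with samples.
        return 1.0 / (render_scale ** 2)

    for scale in (0.77, 0.67, 0.59, 0.50):
        print(f"scale {scale:.2f} per axis: ~{frame_rate_multiplier(scale):.1f}x frame rate")
    # 0.50 per axis means a quarter of the samples and roughly 4x the frame rate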

DLSS takes a fundamentally different approach.  It says, each time we generate a frame, let's only generate half of the samples.  That allows us to double the frame rate.  However, rather than only using half as many samples to generate a frame, let's use the samples that we generate for one frame and also the samples that we generated for the previous frame.  Now we have the full number of samples, so we can get higher image quality.

There is, of course, the drawback that the samples from the previous frame will be wrong because things have moved.  They improve on this by attaching a motion vector to each sample.  That is, if a particular object is moving up and left on the screen at a rate of 0.04 pixels per millisecond, and 20 milliseconds have passed since the last frame time, then the sample for the new frame will be the old sample color with the position moved up and left by 0.8 pixels.  If everything is moving at constant speed on your screen and not accelerating at all, then this works great.
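
As a sketch of the reprojection step (my own simplified version, not Nvidia's actual code), the previous frame's samples are just shifted along their motion vectors by the elapsed time:

    import numpy as np

    def reproject(prev_positions, velocities, dt_ms):
        # Shift last frame's sample positions along their motion vectors,
        # assuming constant velocity over the gap between frames.
        return prev_positions + velocities * dt_ms

    # The example from above: 0.04 pixels/ms up and left, 20 ms since the last frame.
    speed = 0.04
    direction = np.array([-1.0, -1.0]) / np.sqrt(2.0)   # up-left unit vector
    pos = np.array([[100.0, 100.0]])                    # (x, y) in pixels
    new_pos = reproject(pos, speed * direction, 20.0)
    print(np.linalg.norm(new_pos - pos))                # 0.8 pixels, as in the text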

DLSS doesn't say that you have to take exactly half of the samples or use samples from exactly two consecutive frames.  As with FSR, you can adjust the ratio of samples to pixels, with more samples inevitably resulting in higher image quality.  You can also adjust the number of frames that you consider, and give less weight to older frames.
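
Giving less weight to older frames is typically done with a running blend; an exponential moving average is one common choice.  A sketch, with the 0.1 weight picked purely for illustration:

    def accumulate(history_color, new_sample_color, alpha=0.1):
        # Blend the new sample into the accumulated history; older frames'
        # contributions decay geometrically, so recent samples count the most.
        return alpha * new_sample_color + (1.0 - alpha) * history_color

    color = 0.0
    for frame in range(5):
        color = accumulate(color, 1.0)   # an object just turned white
    print(color)                         # ~0.41: still far from 1.0 after 5 frames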

Because DLSS is able to recycle samples from previous frames, it can get a lot more samples.  When it works well, that will give you much higher image quality than FSR.  Provided that everything is moving at constant speed and not changing color or brightness, with enough samples and enough frames, DLSS image quality can even rival that of SSAA.  Thus, used properly, DLSS will tend to result in better image quality than FSR.

Comments

  • Quizzical · Member, Legendary · Posts: 25,355
    But there is still no free lunch.  DLSS has two enormous drawbacks that FSR is categorically immune to.  One of them is hinted at in the "constant speed" and related caveats above.  DLSS attempts to recycle old samples to predict new ones, but those predictions will sometimes be wrong.  Being slightly wrong doesn't make a very big difference, but being wildly wrong can produce awful image quality, far worse than you'd get from traditional upscaling.

    There are two ways that this can be wrong.  One is if the object is accelerating.  This will cause the sample to be in the wrong place.  So long as the acceleration is slow relative to the frame rate, this won't make very much of a difference.  If older samples are in the wrong spot by 0.1 pixels, the image quality degradation will be modest.  If older samples are off by several pixels, then the resulting artifacting will look terrible.

    For objects under constant acceleration, how far off the samples will be is proportional to the acceleration times (time per frame)^2.  Note that time per frame is squared here.  Things that look fine at high frame rates can easily look terrible at low frame rates.  If you're relying on DLSS to be the difference between a slide show and a playable frame rate, then you're probably going to be disappointed in the image quality.

    For objects that instantly change direction, how far off the samples will be is proportional to (change in velocity) times (time per frame).  Again, higher frame rates and thus shorter time per frame makes the image quality problems look less bad.  In this case, the effect is less dependent on frame rates than in the constant acceleration case, as the time per frame isn't squared.
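
    To put rough numbers on both cases (the specific values are made up for illustration; the point is the scaling, and the 0.5 factor is the usual constant-acceleration term):

        def accel_error_px(accel, dt_ms):
            # Constant-velocity reprojection misses by ~0.5 * a * dt^2 under
            # constant acceleration a (pixels per millisecond squared).
            return 0.5 * accel * dt_ms ** 2

        def turn_error_px(delta_v, dt_ms):
            # An instant velocity change of delta_v (pixels per millisecond)
            # puts old samples off by ~delta_v * dt.
            return delta_v * dt_ms

        for fps in (120, 60, 30):
            dt = 1000.0 / fps
            print(f"{fps} fps: accel error {accel_error_px(0.001, dt):.2f} px, "
                  f"turn error {turn_error_px(0.05, dt):.2f} px")
        # Halving the frame rate quadruples the acceleration error,
        # but only doubles the direction-change error.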

    The other way that this can be wrong is if the color of an object changes.  Blinking lights are a huge problem, as if a light was off when a sample was generated in a previous frame and then on in a new frame, those old samples will assume that the light is still off.  This doesn't just affect seeing lights themselves, but also anything that they illuminate.

    The other problem with DLSS is latency.  If a completed frame averages the samples generated across several frames, then this will add substantial latency to the time before an object appears to start moving.  If an object starts to move, then when the game generates a new frame, it will average several frames that assume that it hasn't moved with one that assumes that it has, and the object will mostly not move.  To the extent that it looks like it moves at all in the first frame, it will merely be due to some artifacting showing up.

    In some games, this won't matter.  In twitchy games, this effect can be deadly.  If it takes an extra 30 or 40 milliseconds before you realize that something has happened, then that will often be the difference between winning and losing.

    How bad the latency problem of DLSS is depends on how far back in time the old samples go.  That depends both on how many frames are used, and also on how much time per frame.  Again, faster true frame rates make everything better here.  If everything feels like it's delayed by two frames, then that's going to be a huge problem at 40 frames per second but only a minor one at 200 frames per second.
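
    As a rough rule of thumb, assuming the delay is about the number of history frames times the frame time:

        def history_lag_ms(frames_of_history, fps):
            # Approximate age of the oldest samples still blended into a frame.
            return frames_of_history * 1000.0 / fps

        print(history_lag_ms(2, 40))    # 50.0 ms: very noticeable
        print(history_lag_ms(2, 200))   # 10.0 ms: a minor problem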

    It's important to understand that FSR is categorically immune to both of these problems.  Because it generates each frame from completely new samples, it doesn't introduce any latency from recycling old samples or artifacting from those old samples now being wrong.  That doesn't mean that FSR is better than DLSS.  There is still no free lunch, and still it's a matter of trade-offs.  When the artifacting is minimal, DLSS will tend to produce significantly better image quality than FSR.

    And then there is Nvidia's new frame interpolation method.  The idea here is that you generate two full frames, then interpolate another frame to be halfway between them.  You use both the samples from the older frame moved forward in time and also the samples from the newer frame moved backward in time.  That gives you enough samples to interpolate a decent frame halfway between them.  It can double your apparent frame rate at a minimal cost in GPU work.  And unlike FSR or DLSS, where every frame still requires the full CPU work per frame, the interpolated frames require essentially no CPU work at all.

    The problem is that the frame interpolation will cause both degraded image quality and increased latency, for about the same reasons as DLSS.  In the generated, intermediate frames, all of the samples will be at least slightly wrong for reasons similar to DLSS.

    But there is also increased latency, as once a new frame is ready, you can't display it yet.  Rather, you create a new frame, create an interpolated frame between the previous frame and the new one, and then display the interpolated frame.  Only later do you actually display the new, true frame.  This delays frames appearing on your monitor by half of the time that it takes to make a rendered frame, or one frame in the newly higher nominal frame rate.
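
    In numbers, the added display delay is half of the true frame time, which is the same as one frame at the doubled output rate:

        def interp_delay_ms(true_fps):
            # A finished frame is held back half a true frame time so the
            # interpolated frame can be shown first.
            return 0.5 * 1000.0 / true_fps

        print(interp_delay_ms(60))    # 8.3 ms added at 60 true fps
        print(interp_delay_ms(30))    # 16.7 ms added at 30 true fps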
  • Quizzical · Member, Legendary · Posts: 25,355
    At this point, one could stop to ask why higher frame rates are better.  It's obvious that they are if you aren't playing any shenanigans like DLSS or frame interpolation, but I'd like to explain exactly why.  When frame rates get very low, the human brain interprets them as being a sequence of unrelated frames.  Only when they get high enough does your brain interpret the series of frames as motion.  The slide show effect is terrible and makes it hard to follow what is happening in a game when you can't intuitively see motion.  But a steady 24 frames per second is generally enough to avoid that.

    Animation does look smoother as you go to higher frame rates.  But once you get past 40 or so, provided that it's a steady frame rate (which often isn't the case!), the effect of smoother animations is pretty minimal.  This is especially the case if you avoid the implicit judder of reused frames by using Adaptive-Sync or something analogous to display frames in a steadier manner.

    The other benefit of higher frame rates, and the only real benefit once frame rates get high enough, is reduced latency.  If you're running a game at 50 frames per second, generating a new frame every 20 milliseconds, then on average, the frame on the screen is 10 ms older than it would be if you had an infinite frame rate.  At 100 frames per second, you generate a new frame every 10 ms, and the frame on the screen is on average 5 ms older than if you had an infinite frame rate.  Being able to see what is happening 5 ms sooner lets you react 5 ms sooner.  That matters significantly in twitchy games, though not so much in slower paced games.
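
    The average-age figure is just half the frame time, as a quick check shows:

        def avg_frame_age_ms(fps):
            # On average, the displayed frame is half a frame time old compared
            # to a hypothetical infinite frame rate.
            return 0.5 * 1000.0 / fps

        print(avg_frame_age_ms(50))     # 10.0 ms, as in the example above
        print(avg_frame_age_ms(100))    # 5.0 ms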

    The problem with the increased latency of DLSS is that you lose this latter benefit.  Frame interpolation will always make your latency strictly worse than it would be without it.  If DLSS adjusts to generate half as many samples per frame while going back twice as many frames, perhaps this doubles your frame rate, but the samples still span the same amount of real time, so you've spent that entire doubling just to maintain the same latency and image quality as before.  That really isn't a net win.

    Again, these problems are much, much worse at low frame rates.  If someone says, hey, I get 100 frames per second with DLSS off and 200 per second with DLSS on, and I don't see any noticeable artifacting or latency problems, then at 100 frames per second even without DLSS, he's probably telling the truth.  He doesn't notice artifacting or latency problems, because they're so small.  But the benefits are likewise very small, so that the only particularly noticeable difference (whether good or bad) of using DLSS is on the frame rate meter.  If you're relying on DLSS to be the difference between 25 frames per second and 50, it's not likely to work out well for you.

    It's also important to understand that still shots comparing image quality from DLSS on and off will tend to be misleading.  Remember that DLSS works great if everything is moving at constant speed, which includes not moving at all.  People trying to generate the same screenshot with DLSS off or on tend to pick situations where nothing is moving, as that's far easier to replicate than a bunch of monsters moving around crazily in a battle.  That creates the best possible situation for DLSS, as it hides the artifacting that would otherwise be present.

    Nor can you see the difference properly in videos recorded by others, at least unless the video encoding is completely lossless.  If the artifacts of video encoding are much greater than the artifacts introduced by DLSS, then the latter won't look like a problem.

    FSR is not affected by this.  Because FSR uses purely samples from a single frame, motion or acceleration will not affect the image quality at all.  Trying to generate the same screenshot of nothing moving with FSR on or off is a clean comparison of the image quality degradation that you get at a given quality setting of FSR.

    If used stupidly, DLSS can make your gaming experience markedly worse than before.  For example, if you're CPU limited and then turn on DLSS, you get all of the latency and artifacting, but none of the increase in frame rate, as you're still CPU limited.  Similarly, if you're limited by your monitor's refresh rate, you might not see much of the effects of an increased frame rate, especially if vertical sync is on.

    That a setting can be used stupidly doesn't necessarily mean that it's a bad setting, however.  I'll conclude this thread the same way that I did the other thread linked above:  don't be an idiot.  Use graphical settings responsibly.
  • Vrika · Member, Legendary · Posts: 7,888
    edited October 2022
    Quizzical said:

    If used stupidly, DLSS can make your gaming experience markedly worse than before.  For example, if you're CPU limited and then turn on DLSS, you get all of the latency and artifacting, but none of the increase in frame rate, as you're still CPU limited.
    If you're running CPU limited then DLSS 3 (or DLSS Frame Generation) can double your framerate since the GPU doesn't need any CPU updates for the new interpolated frames.

    So DLSS 2 or earlier can't increase your FPS when you're CPU bound, but DLSS 3 can.