On FidelityFX
Overview
From: https://gpuopen.com/fidelityfx-superresolution/
FidelityFX Super Resolution is a spatial upscaler: it works by taking the current anti-aliased
frame and upscaling it to display resolution without relying on other data such as frame history
or motion vectors.
At the heart of FSR is a cutting-edge algorithm that detects and recreates high-resolution edges
from the source image. Those high-resolution edges are a critical element required for turning
the current frame into a “super-resolution” image.
FSR provides consistent upscaling quality regardless of whether the frame is in motion, which
can provide quality advantages compared to other types of upscalers.
FSR is composed of two main passes:
● An upscaling pass called EASU (Edge-Adaptive Spatial Upsampling) that also performs edge
reconstruction. In this pass, the input frame is analyzed and the main part of the
algorithm detects gradient reversals – essentially looking at how neighboring gradients
differ – from a set of input pixels. The intensity of the gradient reversals defines the
weights to apply to the reconstructed pixels at display resolution.
● A sharpening pass called RCAS (Robust Contrast-Adaptive Sharpening) extracts pixel
detail in the upscaled image.
FSR also comes with helper functions for color space conversions, dithering, and tone mapping
to assist with integrating it into common rendering pipelines used with today’s games.
FidelityFX Super Resolution looks for gradient reversals in the source image to reconstruct
high-definition edges at upscaled resolution.
EASU Explanation
In the following we explain everything we have gathered and understood about this technique.
EASU preprocessing
● Image should be well antialiased (TAA, MSAA, etc.)
● Image should be in perceptual space
○ This means we should do a conversion. The ideal conversion is described
elsewhere, for instance in these blog posts (in particular, the second one):
https://lettier.github.io/3d-game-shaders-for-beginners/gamma-correction.html
http://chilliant.blogspot.com/2012/08/srgb-approximations-for-hlsl.html
However, at the Unity presentation at SIGGRAPH, they say they used sqrt to go
from RGB to sRGB, and squaring to go the other way around. Not as accurate,
but probably faster.
● Input image must be normalized to [0,1]
○ Negative input causes RCAS to output NaN!
● Image should be generated using negative MIP bias to increase texture detail
● Image should be noise-free
○ Add any noise/grain AFTER upscaling with FidelityFX
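The sqrt/square shortcut mentioned above can be sketched as follows. This is our own illustration for comparison only; the function names are made up, and only the idea (sqrt to encode, square to decode) comes from the notes above:

```python
def linear_to_srgb_exact(x):
    # Exact sRGB encoding curve (IEC 61966-2-1), for comparison.
    return 12.92 * x if x <= 0.0031308 else 1.055 * x ** (1.0 / 2.4) - 0.055

def linear_to_perceptual_cheap(x):
    # Cheap encode per the Unity SIGGRAPH remark: just sqrt...
    return x ** 0.5

def perceptual_to_linear_cheap(x):
    # ...and square to go back. Not exact sRGB, but much faster.
    return x * x

x = 0.5
# The cheap curve tracks the exact one to within a few percent over [0,1].
print(linear_to_perceptual_cheap(x), linear_to_srgb_exact(x))
```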
EASU Algorithm
● Uses a fixed 12-tap kernel window, selecting the nearest 12 taps in a circular pattern.
EASU requires an analysis of those 12 taps before it can shape the filter kernel. The
reason 12 taps was chosen, instead of 16, is that with 12 taps only 36 registers are
needed for the 32-bit version.
● Does the analysis on each ‘+’ pattern that surrounds the inner 2x2 quad, in luma
(r+2g+b). Looking at the 12-tap kernel, there are four taps in the center, and for each of
those the algorithm computes the analysis for direction and length. The analysis works in
luma, and by luma we mean an approximation: red plus two times green plus blue. So it
is not a complicated luma; it is more of a “get all the channels included so we don’t
miss anything” approximation.
The analysis is done on the 2x2 quad, and this is effectively a form of pass merging. The
analysis could have been done in a separate pass, but that would require two round trips
through memory, so instead a small amount of work is duplicated in the shader and we
avoid going through memory many times. Once the analysis is finished, it is bilinearly
interpolated at the position we actually want to filter at, and that result is used to
shape the final filter kernel.
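The luma approximation described above is just a weighted channel sum; a minimal sketch (our own illustration, not the shader code):

```python
def easu_luma(r, g, b):
    # Approximate luma used by the EASU analysis: r + 2g + b.
    # Green is weighted double; the goal is simply to include every
    # channel so no edge is missed, not to be perceptually exact.
    return r + 2.0 * g + b

# Even a pure-red or pure-blue edge contributes to the analysis.
print(easu_luma(1.0, 0.0, 0.0))  # 1.0
print(easu_luma(0.0, 1.0, 0.0))  # 2.0
```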
However, that is only the setup. The function where this is really applied, i.e., where the
textures are sampled, is FsrEasuTapF (line 239):
void FsrEasuTapF(
inout AF3 aC, // Accumulated color, with negative lobe.
inout AF1 aW, // Accumulated weight.
AF2 off, // Pixel offset from resolve position to tap.
AF2 dir, // Gradient direction.
AF2 len, // Length.
AF1 lob, // Negative lobe strength.
AF1 clp, // Clipping point.
AF3 c){ // Tap color.
…
}
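The role of the tap function is to accumulate a weighted color and a total weight per tap, with a final normalization by the accumulated weight. A heavily simplified sketch of that accumulation pattern (the real FsrEasuTapF also rotates the offset by the gradient direction, scales by the length, and applies the windowed kernel to derive the weight):

```python
def accumulate_tap(acc_color, acc_weight, tap_color, weight):
    # Accumulate one weighted tap; EASU divides by the total weight at the end.
    acc_color = [a + c * weight for a, c in zip(acc_color, tap_color)]
    return acc_color, acc_weight + weight

color, weight = [0.0, 0.0, 0.0], 0.0
for tap_color, w in [([1.0, 0.0, 0.0], 0.25), ([0.0, 1.0, 0.0], 0.75)]:
    color, weight = accumulate_tap(color, weight, tap_color, w)
resolved = [c / weight for c in color]
print(resolved)  # [0.25, 0.75, 0.0]
```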
EASU Sampling
EASU Analysis
Edge direction is estimated from a central difference: for the analysis, once the taps are in, the
edge direction is estimated using a central difference. The central difference does miss
single-pixel features; however, as we will see later, as the feature length becomes very small
the filter kernel becomes symmetric and non-directional, so we don’t care about directionality
for thin features.
A diagonal diff would have been more expensive and would carry a 0.5-texel offset: therefore a
diagonal diff is not used; besides the cost, the half-texel offset would have made the logic a
little more complicated.
It is OK to miss single-pixel features (the feature length forces a symmetric, non-directional
filter in those cases anyway). Once the edge direction is finished, we look at feature length,
which we estimate by looking at 3 texels in the horizontal and 3 texels in the vertical.
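A minimal sketch of the central-difference direction estimate on one ‘+’ pattern (our own illustration; the tap names follow the a..e layout used later in these notes):

```python
def plus_direction(lA, lB, lC, lD, lE):
    # '+' pattern of luma taps around center c:
    #     a
    #   b c d
    #     e
    # Central differences: horizontal uses d-b, vertical uses e-a.
    # Note the center tap c is not needed for the direction itself.
    return (lD - lB, lE - lA)

# A strong horizontal luma ramp yields a purely horizontal direction.
print(plus_direction(0.5, 0.0, 0.5, 1.0, 0.5))  # (1.0, 0.0)
```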
The luma computation itself is cheap, but if the input were linear and we converted to
perceptual inside EASU, we would have to do that conversion 12 times, once per tap. Since
linear-to-perceptual transforms are expensive and there are 12 taps, it is much better, and in
fact required for good performance, to factor any linear-to-perceptual translation into the pass
prior to EASU. The one compromise is that all the filtering then runs in perceptual space, but
as it turns out that is typically acceptable in this case.
EASU Kernel Shaping
Analysis after interpolation produces {direction, length}:
● The ‘direction’ is used to rotate the filter kernel
● The ‘length’ drives post-rotation kernel scaling and kernel window adjustment,
(a*(b*x^2-1)^2-(a-1)),
where the window term ‘w’ varies from 1/4 for a {+/- 2} kernel to 1/2 for a {+/- sqrt(2)} kernel
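The base-times-window kernel shape described above can be sketched in Python. The constants a = 25/16 and b = 2/5 are the values used in the public ffx_fsr1.h source; the window factor (w*x^2-1)^2, with w varied between 1/4 and 1/2, is applied on top of the base lobe:

```python
def easu_kernel(x2, w):
    # x2 is the squared distance from the filter center.
    # Base lobe: a*(b*x^2 - 1)^2 - (a - 1), with a = 25/16, b = 2/5.
    # It equals 1 at x = 0 and crosses zero at x = 1 (negative lobe) and x = 2.
    a, b = 25.0 / 16.0, 2.0 / 5.0
    base = a * (b * x2 - 1.0) ** 2 - (a - 1.0)
    # Window: (w*x^2 - 1)^2. w = 1/4 puts the support edge at x = 2;
    # w = 1/2 pulls it in to x = sqrt(2).
    window = (w * x2 - 1.0) ** 2
    return base * window

print(easu_kernel(0.0, 0.25))  # 1.0 at the center
print(easu_kernel(4.0, 0.25))  # 0.0 at the +/-2 support edge
```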
EASU Deringing
The local 2x2 texel quad {min,max} is used to clamp the EASU output
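Deringing is then a simple clamp against the local quad’s range; a minimal sketch (our own illustration, not the shader code):

```python
def dering(value, quad):
    # Clamp the filtered result to the min/max of the local 2x2 texel quad,
    # removing over/undershoot (ringing) introduced by the negative lobe.
    lo, hi = min(quad), max(quad)
    return max(lo, min(hi, value))

print(dering(1.3, [0.2, 0.9, 0.5, 0.7]))  # 0.9 (overshoot clamped)
```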
The first part is implemented in two functions, FsrEasuSetF and FsrEasuF. First, FsrEasuSetF
accumulates direction and length:
void FsrEasuSetF(
inout AF2 dir,
inout AF1 len,
AF2 pp,
AP1 biS,AP1 biT,AP1 biU,AP1 biV,
AF1 lA,AF1 lB,AF1 lC,AF1 lD,AF1 lE){
…
}
  a
b c d
  e
Then, the algorithm takes the magnitude from the abs average of both sides of 'c'. The length
term converts a gradient reversal to 0, smoothly reaching 1 at non-reversal, is shaped, and then
the horizontal and vertical terms are added. This is done as:
AF1 dc=lD-lC;
AF1 cb=lC-lB;
AF1 lenX=max(abs(dc),abs(cb));
lenX=APrxLoRcpF1(lenX);
AF1 dirX=lD-lB;
dir.x+=dirX*w;
lenX=ASatF1(abs(dirX)*lenX);
lenX*=lenX;
len+=lenX*w;
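A line-by-line Python mirror of that snippet may make the intent clearer. This is our own transcription, not the shader source; APrxLoRcpF1 is replaced by an exact guarded reciprocal for readability:

```python
def axis_terms(lB, lC, lD, w):
    # Direction is the central difference across 'c'; length measures
    # how much the two one-sided gradients agree (1) or reverse (0).
    dc = lD - lC
    cb = lC - lB
    lenX = max(abs(dc), abs(cb))
    lenX = 1.0 / lenX if lenX > 0.0 else 0.0  # shader uses APrxLoRcpF1
    dirX = lD - lB
    lenX = min(1.0, max(0.0, abs(dirX) * lenX))  # saturate
    lenX *= lenX                                 # shaping
    return dirX * w, lenX * w                    # (dir.x term, len term)

# Monotonic ramp: one-sided gradients agree, length term saturates.
print(axis_terms(0.0, 0.5, 1.0, 1.0))  # (1.0, 1.0)
# Gradient reversal: dirX cancels, length term goes to 0.
print(axis_terms(0.0, 1.0, 0.0, 1.0))  # (0.0, 0.0)
```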
Then the code repeats for the y axis, accumulating into the variables dir and len. Here,
APrxLoRcpF1 (from ffx_a.h) is a low-precision approximate reciprocal, AMinF1 and AMaxF1 are
simple minimum- and maximum-computing functions, AF1_AU1 is a bit-cast (asfloat) from uint to
float, and AU1_ is a simple uint constructor.
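For reference, APrxLoRcpF1 is the classic bit-trick approximate reciprocal: reinterpret the float as a uint, subtract it from a magic constant, and reinterpret back. A Python sketch (the constant 0x7ef07ebb is the one in ffx_a.h):

```python
import struct

def asuint(f):
    # Reinterpret a 32-bit float's bits as a uint (HLSL asuint).
    return struct.unpack('<I', struct.pack('<f', f))[0]

def asfloat(u):
    # Reinterpret a uint's bits as a 32-bit float (HLSL asfloat).
    return struct.unpack('<f', struct.pack('<I', u & 0xFFFFFFFF))[0]

def aprx_lo_rcp(a):
    # Low-precision approximate 1/a: one integer subtract, no Newton step.
    return asfloat(0x7ef07ebb - asuint(a))

print(aprx_lo_rcp(2.0))  # roughly 0.47, vs. the exact 0.5
```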
void FsrEasuF(
out AF3 pix,
AU2 ip, // Integer pixel position in output.
AU4 con0, // Constants generated by FsrEasuCon().
AU4 con1,
AU4 con2,
AU4 con3){
…
}
This code first gets the position of sample “f” from ip, the position of the pixel to calculate. Then
it applies the 12-tap kernel.
  b c
e f g h
i j k l
  n o
For packed FP16, we need either the {rg} or {ab} swizzle, so the following setup is used for the
gathers in all versions. Also note that the taps marked ‘z’ are unused:
//    a b    <- unused (z)
//    r g
//  a b a b
//  r g r g
//    a b
//    r g    <- unused (z)
// Allowing dead-code removal to remove the 'z's.
Then it applies the simplest multi-channel approximate luma possible (luma times 2, in 2
FMA/MAD):
AF4 bczzL=bczzB*AF4_(0.5)+(bczzR*AF4_(0.5)+bczzG);
AF4 ijfeL=ijfeB*AF4_(0.5)+(ijfeR*AF4_(0.5)+ijfeG);
AF4 klhgL=klhgB*AF4_(0.5)+(klhgR*AF4_(0.5)+klhgG);
AF4 zzonL=zzonB*AF4_(0.5)+(zzonR*AF4_(0.5)+zzonG);
AF2 dir=AF2_(0.0);
AF1 len=AF1_(0.0);
FsrEasuSetF(dir,len,pp,true, false,false,false,bL,eL,fL,gL,jL);
FsrEasuSetF(dir,len,pp,false,true ,false,false,cL,fL,gL,hL,kL);
FsrEasuSetF(dir,len,pp,false,false,true ,false,fL,iL,jL,kL,nL);
FsrEasuSetF(dir,len,pp,false,false,false,true ,gL,jL,kL,lL,oL);
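Going back to the two-FMA luma lines above: each packed lane evaluates to (r + 2g + b)/2, a constant multiple of the r+2g+b approximation used by the analysis, so the scaling does not change the direction or length estimates. A quick arithmetic check:

```python
def two_fma_luma(r, g, b):
    # Mirrors the shader's L = B*0.5 + (R*0.5 + G), i.e. (r + 2g + b) / 2,
    # computable in two FMA/MAD operations.
    return b * 0.5 + (r * 0.5 + g)

r, g, b = 0.2, 0.4, 0.6
print(two_fma_luma(r, g, b), (r + 2 * g + b) / 2)  # both 0.8
```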