Amortized Supersampling

Lei Yang¹   Diego Nehab²   Pedro V. Sander¹   Pitchaya Sitthi-amorn³   Jason Lawrence³   Hugues Hoppe²
¹Hong Kong UST   ²Microsoft Research   ³University of Virginia
Abstract
We present a real-time rendering scheme that reuses shading samples from earlier time frames to achieve practical antialiasing of procedural shaders. Using a reprojection strategy, we maintain several sets of shading estimates at subpixel precision, and incrementally update these such that for most pixels only one new shaded sample is evaluated per frame. The key difficulty is to prevent accumulated blurring during successive reprojections. We present a theoretical analysis of the blur introduced by reprojection methods. Based on this analysis, we introduce a nonuniform spatial filter, an adaptive recursive temporal filter, and a principled scheme for locally estimating the spatial blur. Our scheme is appropriate for antialiasing shading attributes that vary slowly over time. It works in a single rendering pass on commodity graphics hardware, and offers results that surpass 4×4 stratified supersampling in quality, at a fraction of the cost.

Figure 1: For a moving scene with a procedural shader (top row of Figure 14), comparison of (a) no antialiasing (140 fps), (b) jittered reprojection (88 fps), (c) 4×4 stratified supersampling (11 fps), (d) our amortized supersampling (63 fps), and (e) ground-truth reference image.
1 Introduction

The use of antialiasing to remove sampling artifacts is an important and well studied area in computer graphics. In real-time rasterization, antialiasing typically involves two hardware-supported techniques: mipmapped textures for prefiltered surface content, and framebuffer multisampling to remove jaggies at surface silhouettes. With the increasing programmability of graphics hardware, many functionalities initially developed for offline rendering are now feasible in real time. These include procedural materials and complex shading functions. Unlike prefiltered textures, procedurally defined signals are not usually bandlimited [Ebert et al. 2003], and producing a bandlimited version of a procedural shader is a difficult and ad-hoc process [Apodaca and Gritz 2000].

To reduce aliasing artifacts in procedurally shaded surfaces, a common approach is to increase the spatial sampling rate using supersampling (Figure 1c). However, it can be prohibitively expensive to execute a complex procedural shader multiple times at each pixel. Fortunately, it is often the case that at any given surface point, expensive elements of the surface shading (such as albedo) vary slowly or are constant over time. A number of techniques can automatically factor a procedural shader into static and dynamic layers [Guenter 1994; Jones et al. 2000; Sitthi-amorn et al. 2008]. Our idea is to sample the static and weakly dynamic layers at a lower temporal rate to achieve a higher spatial sampling rate for the same computational budget. The strongly dynamic layers can be either sampled at the native resolution or supersampled using existing techniques.

We present a real-time scheme, amortized supersampling, that evaluates the static and weakly dynamic components of the shading function only once for the majority of pixels, and reuses samples computed in prior framebuffers to achieve good spatial antialiasing. The general idea of reusing shading information across frames has been studied extensively, as reviewed in Section 2. Our approach builds on the specific strategy of real-time reprojection, whereby the GPU pixel shader "pulls" information associated with the same surface point in earlier frames. Recent work on reprojection has focused on reusing expensive intermediate shading computations across frames [Nehab et al. 2007] and on temporally smoothing shadow map boundaries [Scherzer et al. 2007]. In contrast, our goal is effective supersampling of more general shading functions.

Amortized supersampling faces many challenges not present in ordinary supersampling. Due to scene motion, the set of samples computed in earlier frames forms an irregular pattern when reprojected into the current frame. Moreover, some samples become invalid due to occlusion. Thus the set of spatio-temporal samples available for reconstruction has much less structure than a typical grid of stratified stochastic samples.

We build on the jittered sampling and recursive exponential smoothing introduced in prior reprojection work. An important contribution of this paper is a theoretical analysis of the spatio-temporal blur introduced by these techniques as a function of the relative scene motion and the smoothing factor applied in the recursive filter. We show that by adjusting this smoothing factor adaptively, the basic reprojection algorithm can be made to converge to perfect reconstruction (infinite supersampling) for stationary views of static scenes (Section 4.1). Furthermore, we show that for moving surfaces, straightforward reprojection leads to excessive blurring (Figure 1b). Our scheme makes several contributions in addressing this key issue:

• Use of multiple subpixel buffers to maintain reprojection estimates at a higher spatial resolution;
• Irregular round-robin update of these subpixel buffers to improve reconstruction quality, while still requiring only one sample evaluation per pixel per frame;
• A principled approach to estimate and limit the amount of blur introduced during reprojection and exponential smoothing;
• Adaptive evaluation of additional samples in disoccluded pixels to reduce aliasing;
• A strategy to estimate and react to slow temporal changes in the shading.
Amortized supersampling is compatible with the modern rasterization pipeline implemented on commodity graphics hardware. It is lightweight, requiring no preprocessing, and thus provides a practical approach for antialiasing existing procedural shaders. Also, it requires only a single rendering pass, and can be used in conjunction with hardware multisampling for antialiasing geometry silhouettes. We show that it achieves results that are qualitatively comparable or superior to 4×4 stratified supersampling, but at a fraction of the rendering cost (Figure 1d).

2 Related work

Data caching and reuse  Many offline and interactive ray-based rendering systems exploit the spatio-temporal coherence of animation sequences [Cook et al. 1987; Badt 1988; Chen and Williams 1993; Bishop et al. 1994; Adelson and Hodges 1995; Mark et al. 1997; Walter et al. 1999; Bala et al. 1999; Ward and Simmons 1999; Havran et al. 2003; Tawara et al. 2004]. The idea is also used in hybrid systems that employ some form of hardware acceleration [Simmons and Séquin 2000; Stamminger et al. 2000; Walter et al. 2002; Woolley et al. 2003; Gautron et al. 2005; Zhu et al. 2005; Dayal et al. 2005]. These systems focus primarily on reusing expensive global illumination or geometry calculations such as ray-scene intersections, indirect lighting estimates, and visibility queries. A related set of methods opportunistically reuses shading information to accelerate real-time rendering applications by reprojecting the contents of the previous frame into the current frame [Hasselgren and Akenine-Möller 2006; Nehab et al. 2007; Scherzer et al. 2007; Scherzer and Wimmer 2008]. Although we also apply a reprojection strategy to reuse shading information over multiple frames, we do so to achieve spatial antialiasing; this application was first noted by Bishop et al. [1994], but not pursued. Furthermore, we add to this area of research a rigorous theoretical analysis of the type of blur introduced by repeatedly resampling a framebuffer, a fundamental operation in these systems.

Antialiasing of procedural shaders  There is a considerable body of research on antialiasing procedural shaders, recently reviewed by Brooks Van Horn III and Turk [2008]. Creating a bandlimited version of a procedural shader can be a difficult task because analytically integrating the signal is often infeasible. Several practical approaches are reviewed in the book by Ebert et al. [2003]. These include clamping the high-frequency components of a shader that is defined in the frequency domain [Norton et al. 1982], precomputing mipmaps for tabulated data such as lookup tables [Hart et al. 1999], and obtaining approximations using affine arithmetic [Heidrich et al. 1998]. However, the most general and common approach is still to numerically integrate the signal using supersampling [Apodaca and Gritz 2000]. Our technique brings the simplicity of supersampling to real-time applications at an acceptable increase in rendering cost.

Post-processing of rendered video  The pixel tracing filter of Shinya [1993] is related to our approach. Starting with a rendered video sequence, the tracing filter tracks the screen-space positions of corresponding scene points, and combines color samples at these points to achieve spatial antialiasing in each frame. The filtering operation is applied as a post-process, and assumes that the full video sequence is accessible. In contrast, our approach is designed for real-time evaluation. We maintain only a small set of reprojection buffers in memory, and update these efficiently with adaptive recursive filtering.
3 Review of reprojection

Reprojection methods [Nehab et al. 2007; Scherzer et al. 2007] allow reusing values generated at the pixel level over consecutive frames. We next summarize the basic approach, which has two main parts: reprojection and recursive exponential smoothing.

Reprojection  The core idea is to let the rendering of the current frame gather and reuse shading information from surfaces visible in the previous frame. Conceptually, when rasterizing a surface at a given pixel, we determine the projection of the surface point into the previous framebuffer, and test if its depth matches the depth stored in the previous depth buffer. If so, the point was previously visible, and its attributes can be safely reused. Formally, let buffer 𝑓𝑡 hold the cached pixel attributes at time 𝑡, and buffer 𝑑𝑡 hold the pixel depths. Let 𝑓𝑡[𝑝] and 𝑑𝑡[𝑝] denote the buffer values at pixel 𝑝 ∈ ℤ², and let 𝑓𝑡(⋅) and 𝑑𝑡(⋅) denote bilinear sampling. For each pixel 𝑝 = (𝑥, 𝑦) at time 𝑡, we determine the 3D clip-space position of its generating scene point at time 𝑡-1, denoted (𝑥′, 𝑦′, 𝑧′) = 𝜋𝑡-1(𝑝). The reprojection operation 𝜋𝑡-1(𝑝) is obtained using a simple computation in the vertex program and interpolator, as described by Nehab et al. [2007]. If the reprojected depth 𝑧′ lies within some tolerance of the bilinearly interpolated depth 𝑑𝑡-1(𝑥′, 𝑦′), we conclude that 𝑓𝑡[𝑝] has some correspondence with the interpolated value 𝑓𝑡-1(𝑥′, 𝑦′). If the depths do not match (due to occlusion), or if (𝑥′, 𝑦′) lies outside the view frustum at time 𝑡-1, no correspondence exists and we denote this by 𝜋𝑡-1(𝑝) = ∅.
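To make the validity test concrete, the sketch below mirrors it on the CPU in Python. It is only an illustration under simplifying assumptions, not the paper's GPU code: buffers are NumPy arrays, the clip-space reprojection (𝑥′, 𝑦′, 𝑧′) is assumed to have been computed already per pixel, and the helper names (bilinear, reproject) and the depth tolerance are ours.

```python
# Minimal sketch of the reprojection validity test (illustrative names, not the paper's code).
import numpy as np

def bilinear(buf, x, y):
    """Bilinearly sample a 2D (or per-channel) buffer at a fractional position (x, y)."""
    h, w = buf.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    return ((1 - fx) * (1 - fy) * buf[y0, x0] + fx * (1 - fy) * buf[y0, x1] +
            (1 - fx) * fy * buf[y1, x0] + fx * fy * buf[y1, x1])

def reproject(p_prev, z_prev, d_prev, f_prev, depth_tol=1e-3):
    """Return the reused attribute f_{t-1}(x', y'), or None if reprojection fails."""
    x, y = p_prev
    h, w = d_prev.shape
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return None                       # outside the previous view frustum
    if abs(z_prev - bilinear(d_prev, x, y)) > depth_tol:
        return None                       # depth mismatch: occluded in the previous frame
    return bilinear(f_prev, x, y)         # previously visible: reuse the cached attribute
```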
Recursive exponential smoothing  Both Nehab et al. [2007] and Scherzer et al. [2007] showed how this reprojection strategy can be combined with a recursive temporal filter for antialiasing. We first review this basic principle before extending it to a more general setting.

At each pixel 𝑝, the shader is evaluated at some jittered position to obtain a new sample 𝑠𝑡[𝑝]. This sample is combined with a running estimate of the antialiased value maintained in 𝑓𝑡[𝑝] according to a recursive exponential smoothing filter:

$$f_t[p] \;\leftarrow\; \alpha\, s_t[p] + (1-\alpha)\, f_{t-1}\bigl(\pi_{t-1}(p)\bigr). \qquad (1)$$

Note that the contribution a single sample makes to this estimate decreases exponentially in time, and the smoothing factor 𝛼 regulates the tradeoff between variance reduction and responsiveness to changes in the scene. For example, a small value of 𝛼 reduces the variance in the estimate and therefore produces a less aliased result, but introduces more lag if the shading changes between frames. If reprojection fails at any pixel (i.e., 𝜋𝑡-1(𝑝) = ∅), then 𝛼 is locally reset to 1 to give full weight to the current sample. This produces higher variance (greater aliasing) in recently disoccluded regions.
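The update in (1) is a one-line blend per pixel. The hedged sketch below, reusing the hypothetical reproject() helper above, shows the rule together with the reset of 𝛼 to 1 when reprojection fails.

```python
# Minimal sketch of the recursive exponential smoothing step, Equation (1).
def smooth_pixel(s_t, p_prev, z_prev, d_prev, f_prev, alpha):
    """Blend the new jittered sample s_t with the reprojected history estimate."""
    history = reproject(p_prev, z_prev, d_prev, f_prev)
    if history is None:
        alpha, history = 1.0, 0.0         # reprojection failed: keep only the new sample
    return alpha * s_t + (1.0 - alpha) * history
```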
4 Amortized supersampling: theory

Spatial antialiasing is achieved by convolving the screen-space shading function 𝑆 with a low-pass filter 𝐺 [Mitchell and Netravali 1988]. We use a Monte Carlo algorithm with importance sampling [Robert and Casella 2004] to approximate this convolution (𝑆 ∗ 𝐺)(𝑝) at each pixel 𝑝:

$$f_N[p] \;\leftarrow\; \frac{1}{N}\sum_{i=1}^{N} S\bigl(p + g_i[p]\bigr). \qquad (2)$$

Here 𝑔𝑖[𝑝] plays the role of a per-pixel random jitter offset, distributed according to 𝐺. Our choice for 𝐺 is a 2D Gaussian kernel with standard deviation 𝜎𝐺 = 0.4737, as this closely approximates the kernel of Mitchell and Netravali [1988] while avoiding negative lobes, which interfere with importance sampling [Ernst et al. 2006]. It is easily shown that the variance of the estimator is

$$\operatorname{Var}\bigl(f_N[p]\bigr) \;=\; \frac{1}{N}\operatorname{Var}\bigl(f_1[p]\bigr), \qquad (3)$$

where Var(𝑓1[𝑝]) is the per-pixel variance of the Monte Carlo estimator using just one sample. Using a recursive exponential smoothing filter, we can amortize the cost of evaluating the sum in (2) over multiple frames:

$$f_t[p] \;\leftarrow\; \alpha_t[p]\, S\bigl(p + g_t[p]\bigr) + \bigl(1-\alpha_t[p]\bigr)\, f_{t-1}\bigl[\pi_{t-1}(p)\bigr]. \qquad (4)$$

In words, a running estimate of (2) is maintained at each pixel 𝑝 in the buffer 𝑓𝑡 and is updated at each frame by combining a new jittered sample 𝑆(𝑝 + 𝑔𝑡[𝑝]) with the previous estimate according to the smoothing factor 𝛼𝑡[𝑝]. Note that this formulation allows 𝛼𝑡[𝑝] to vary over time and with pixel location.

We first present an antialiasing scheme for stationary views and static scenes, and then consider the more general case of arbitrary scene and camera motion. Detailed derivations of key results in these sections are found in the appendix.

4.1 Stationary viewpoint and static scene

In the case of a stationary camera viewpoint and static scene, the reprojection map 𝜋 is simply the identity. In this case, the smoothing factor can be gradually decreased over time as

$$\alpha_t[p] \;=\; \frac{1}{t}, \qquad (5)$$

resulting in an ever-increasing accumulation of samples, all with uniform weights. This causes the estimates 𝑓𝑡[𝑝] to converge to perfect antialiasing (infinite supersampling) as 𝑡 → ∞, with variance decreasing as Var(𝑓1[𝑝])/𝑡.
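As a quick sanity check (ours, not from the paper), the snippet below verifies numerically that the recursion (4) with 𝛼𝑡 = 1/𝑡 reproduces the uniform average of (2) at a static pixel; the random numbers simply stand in for jittered shader evaluations.

```python
# Numerical check: exponential smoothing with alpha_t = 1/t equals the running mean.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.random(1000)               # stand-ins for S(p + g_i[p]) over 1000 frames

f = 0.0
for t, s in enumerate(samples, start=1):
    alpha = 1.0 / t                      # Equation (5)
    f = alpha * s + (1.0 - alpha) * f    # Equation (4) with identity reprojection

assert np.isclose(f, samples.mean())     # identical to the N-sample average of (2)
```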
4.2 Moving viewpoint and dynamic scene

As discussed later on, the presence of scene motion requires us to limit this indefinite accumulation of uniformly weighted samples, even for pixels that remain visible over many frames. To do so, Sections 5.2 and 6.2 prescribe lower bounds on the value of 𝛼𝑡[𝑝] that override (6). This effectively changes the rate at which new samples are accumulated, leading to a new update rule for 𝑁𝑡[𝑝] (see the appendix for a derivation):

$$N_t[p] \;\leftarrow\; \left(\alpha_t[p]^2 + \frac{\bigl(1-\alpha_t[p]\bigr)^2}{N_{t-1}\bigl[\pi_{t-1}(p)\bigr]}\right)^{-1}. \qquad (8)$$

Note that (8) reduces to (7) when 𝛼𝑡[𝑝] is not overridden, so in practice we always use (8).
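The following small sketch (illustrative only) applies the update (8). Assuming the unoverridden smoothing factor has the form 𝛼 = 1/(𝑁𝑡-1 + 1) suggested by Equation (21), the effective count grows by exactly one sample per frame, while imposing a lower bound on 𝛼 makes the count plateau.

```python
# Sketch of the effective-sample-count update, Equation (8).
def update_effective_samples(alpha, n_prev):
    return 1.0 / (alpha**2 + (1.0 - alpha)**2 / n_prev)

n = 1.0
for _ in range(5):
    n = update_effective_samples(1.0 / (n + 1.0), n)
print(n)        # ~6.0: one additional effective sample per frame

n = 1.0
for _ in range(100):
    n = update_effective_samples(max(1.0 / (n + 1.0), 0.25), n)
print(n)        # plateaus near 7: a lower bound on alpha limits accumulation
```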
4.2.2 Modeling blur due to resampling

In general, the reprojected position 𝜋𝑡-1(𝑝) used in (4) lies somewhere between the set of discrete samples in buffer 𝑓𝑡-1, and thus some form of resampling is required. This resampling involves computing a weighted sum of the values in the vicinity of 𝜋𝑡-1(𝑝). Repeatedly resampling values at intermediate locations has the effect of progressively increasing the number of samples that contribute to the final estimate at each pixel. Moreover, the radius of this neighborhood of samples increases over time (Figure 2), leading to undesirable blurring (Figure 1b). Our goal is to limit this effect. We first model it mathematically.

[Figure 3 plots: left panel "Simulation", right panel "Equation (11)"; both axes span 0.1–0.5, with contour labels ranging from 0.54 to 2.2.]

Figure 3: Experimental validation of Equation (11). For each velocity 𝑣 and weight 𝛼 we rendered a resolution chart until convergence using (4). We compared the rendered result to a set of images of the same resolution chart rendered with (2) using a low-pass filter 𝐺′ and a range of standard deviations 𝜎𝐺′. The left plot shows the 𝜎𝐺′ that gives the best match (highest PSNR) as a function of 𝑣 and 𝛼; the right plot shows the blur standard deviation predicted by (11). The RMSE between the observed and predicted blur standard deviations is only 0.0382 pixels.

Figure 4: Sampling from multiple subpixel buffers. To limit the amount of blur, we use nonuniform blending weights defined by a tent function centered on the quadrant being updated. (a) In the absence of local motion, no resampling blur is introduced; (b) for a moving scene, our method selects those samples closest to the desired quadrant center to limit the amount of blur.

The value stored at a single pixel is given by a weighted sum of a number of samples 𝑛(𝑡, 𝑝) evaluated at different positions:

$$f_t[p] \;=\; \sum_{i=1}^{n(t,p)} \omega_{t,i}\, S\bigl(p + \delta_{t,i}[p]\bigr) \quad\text{with}\quad \sum_{i=1}^{n(t,p)} \omega_{t,i} = 1. \qquad (9)$$

The weights 𝜔𝑡,𝑖 are a function of the particular resampling strategy employed and the sequence of weights 𝛼𝑡 used in the recursive filter.
The offsets 𝛿𝑡,𝑖 denote the position of each contributing sample with respect to the center of the pixel 𝑝. Note that each displacement 𝛿𝑡,𝑖[𝑝] is the result of a combination of offsets due to jitter and reprojection. Following the derivation in the appendix, the amount of blur at a pixel can be characterized by the average weighted spatial variance across both dimensions:

$$\sigma_t^2[p] \;=\; \tfrac{1}{2}\operatorname{Var}_x\!\bigl(\{\delta_{t,i}[p]\}_{i=1}^{n(t,p)}\bigr) + \tfrac{1}{2}\operatorname{Var}_y\!\bigl(\{\delta_{t,i}[p]\}_{i=1}^{n(t,p)}\bigr). \qquad (10)$$

Although obtaining a closed-form expression for 𝜎𝑡²[𝑝] is impractical for arbitrary scene motion, the case of constant panning motion is tractable and provides an insightful case that will serve as the basis of our approach for estimating (and limiting) unwanted blur. Moreover, other types of motion are well approximated locally by translations. This type of motion resamples each pixel at a constant offset given by the fractional velocity 𝑣 = 𝜋𝑡-1(𝑝) − ⌊𝜋𝑡-1(𝑝)⌋. Furthermore, let us assume for now that standard bilinear interpolation is used to reconstruct intermediate values and that a constant smoothing factor 𝛼 is used in (4). As shown in the appendix, under these assumptions the expected blur variance 𝐸(𝜎𝑡²[𝑝]) converges (as 𝑡 → ∞) to

$$\sigma_v^2 \;=\; \sigma_G^2 + \frac{1-\alpha}{\alpha}\cdot\frac{v_x(1-v_x) + v_y(1-v_y)}{2}. \qquad (11)$$

The simulation results shown in Figure 3 confirm the accuracy of this expression. Each factor in (11) suggests a different approach for reducing the amount of blur. The factor (𝑣𝑥(1 − 𝑣𝑥) + 𝑣𝑦(1 − 𝑣𝑦)) arises from the choice of bilinear interpolation. We encourage the fractional velocities 𝑣𝑥 and 𝑣𝑦 to concentrate around 0 or 1 by maintaining an estimate of the framebuffer at a higher resolution, and by avoiding resampling whenever possible (Section 5.1). In addition, we reduce the factor (1 − 𝛼)/𝛼 by using larger values of 𝛼, although this has the disadvantage of limiting the magnitude of antialiasing. We present a strategy for setting 𝛼 that controls this tradeoff (Section 5.2). Together, these ideas form the backbone of our antialiasing algorithm.
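The helper below (ours, not from the paper) simply evaluates (11) for a given velocity and smoothing factor; it is useful for getting a feel for how quickly blur grows once 𝛼 is small and the fractional velocity approaches half a pixel.

```python
# Evaluate the asymptotic blur standard deviation of Equation (11).
import math

SIGMA_G = 0.4737                              # std. dev. of the Gaussian jitter kernel G

def blur_std(alpha, vx, vy, sigma_g=SIGMA_G):
    var = sigma_g**2 + (1.0 - alpha) / alpha * (vx * (1.0 - vx) + vy * (1.0 - vy)) / 2.0
    return math.sqrt(var)

print(blur_std(1.0, 0.5, 0.5))                # alpha = 1: only the jitter blur remains (~0.47 px)
print(blur_std(0.1, 0.5, 0.5))                # small alpha at half-pixel drift: heavy blur (~1.6 px)
```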
5 Algorithm

As described in this section, our antialiasing algorithm uses multiple subpixel buffers to limit the amount of blur, adapts sample evaluation at disoccluded pixels, and adjusts the smoothing factor 𝛼 to control the tradeoff between blurring and aliasing.

5.1 Subpixel buffers

We decrease unwanted blur by maintaining screen-space estimates 𝑓 at twice the screen resolution, as this tends to reduce the terms 𝑣𝑥(1 − 𝑣𝑥) and 𝑣𝑦(1 − 𝑣𝑦) in (11) by half. We associate estimates with the four quadrants of each pixel, and update these in round-robin fashion. These 2×2 quadrant samples are deinterleaved to form 𝐾 = 4 independent subpixel buffers {𝑏𝑘}, 𝑘 ∈ {0, 1, 2, 3}, each at screen resolution. Each buffer 𝑏𝑘 stores an estimate offset by 𝜙𝑘 ∈ {(−0.25, −0.25), (0.25, −0.25), (−0.25, 0.25), (0.25, 0.25)} relative to the center of the pixel.

Note that in the absence of scene motion, these four subpixel buffers effectively form a higher-resolution framebuffer. However, under scene motion, the subpixel samples computed in earlier frames reproject to offset locations, as indicated in Figure 4.

At each frame, we compute one new sample per pixel, and update one of the subpixel buffers, 𝑏𝑖(𝑡), according to (4) using information gathered from all the subpixel buffers. (For now, let 𝑖(𝑡) = 𝑡 mod 𝐾.) We then compute the final pixel color as a weighted average of these subpixel buffers. Figure 5 illustrates the steps executed by the pixel shader at each frame 𝑡:

1. 𝑠𝑡[𝑝] ← EvaluateSample(𝑝 + 𝜙𝑖(𝑡) + 𝑔𝑡[𝑝])
   Evaluate the procedural shader at a new sample position;
2. 𝑏𝑖(𝑡)[𝑝] ← UpdateSubpixel({𝑏𝑘[𝑝]}, 𝑠𝑡[𝑝], 𝑡)
   Update one subpixel buffer using all previous subpixel buffers and the new sample;
3. 𝑓𝑡[𝑝] ← ComputePixelColor({𝑏𝑘[𝑝]}, 𝑠𝑡[𝑝])
   Compute the output pixel color given the new sample and the subpixel buffers;
4. 𝑁𝑡[𝑝] ← UpdateEffectiveNumberOfSamples(𝑁𝑡[𝑝])
   Update the new per-pixel effective number of samples.

These steps are implemented in the real-time graphics pipeline. The vertex shader computes the reprojection coordinates needed to access the four subpixel buffers (as with traditional reprojection [Nehab et al. 2007]), and the fragment shader outputs three render targets: the framebuffer 𝑓𝑡, an updated version 𝑏𝑖(𝑡) of one of the subpixel buffers, and updated values for the per-pixel number 𝑁𝑡 of samples. All steps are performed together in a single GPU rendering pass, as described below.
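The control flow of these four steps is summarized by the CPU-side sketch below. All function and buffer names are illustrative stand-ins for the shader-side routines; the paper executes the steps in a single GPU pass.

```python
# CPU-side sketch (illustrative only) of the four per-pixel steps of Section 5.1.
K = 4
PHI = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]   # quadrant offsets phi_k

def render_pixel(p, t, g_t, b, N, evaluate_sample, update_subpixel,
                 compute_pixel_color, update_effective_samples):
    i = t % K                                  # round-robin buffer choice (Section 5.2 later
                                               # replaces this with an irregular sequence)
    x = (p[0] + PHI[i][0] + g_t[0], p[1] + PHI[i][1] + g_t[1])
    s_t = evaluate_sample(x)                   # 1. evaluate the shader at the jittered position
    b[i][p] = update_subpixel(b, s_t, t, p)    # 2. refresh one subpixel buffer from all buffers
    f_t = compute_pixel_color(b, s_t, p)       # 3. weighted average of buffers and new sample
    N[p] = update_effective_samples(N, p)      # 4. per-pixel effective sample count, Eq. (8)
    return f_t
```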
We then approximate the footprint of the projected tent function by an axis-aligned square of radius

$$r_k \;=\; \left\lVert J_{\pi_{t-k-1}}[p]\begin{pmatrix}0.5\\0.5\end{pmatrix}\right\rVert_\infty, \qquad (15)$$

where 𝐽𝜋𝑡-𝑘-1 is the Jacobian of each reprojection map, which is directly available using the ddx/ddy shader instructions. This accounts for changes in scale during minification or magnification.

Rather than accumulating all of the samples in the resulting footprint (which would be too expensive), we consider only the four nearest samples in each subpixel buffer (altogether sixteen samples), located at ⌊𝑝𝑘⌋ + Δ with Δ ∈ {(0,0), (1,0), (0,1), (1,1)}:

$$\tilde{b}\bigl(p + \phi_{i(t)}\bigr) \;=\; \frac{\sum_{k,\Delta} w_{k,\Delta}\, b_{i(t-k-1)}\bigl[\lfloor p_k \rfloor + \Delta\bigr]}{\sum_{k,\Delta} w_{k,\Delta}}. \qquad (16)$$

5.2 Limiting the amount of blur

Given a threshold 𝜏𝑏 on the amount of blur (variance of the sample distribution) and the velocity 𝑣 due to constant panning motion, we would like to compute the smallest smoothing factor 𝛼𝜏𝑏(𝑣) that provides the greatest degree of antialiasing without exceeding this blur threshold. Unlike in the case of traditional bilinear reprojection, which admits a bound on 𝛼 by inverting (11), our more elaborate reconstruction does not admit such a simple closed form; we therefore make two changes relative to the basic scheme:
1. Rather than updating the subpixel buffers in simple round-robin order (𝑖(𝑡) = 𝑡 mod 𝐾), we use an irregular update sequence that breaks the drift coherence. We found that the following update sequence works well in practice: (0, 1, 2, 3, 0, 2, 1, 3, 1, 0, 3, 2, 1, 3, 0, 2, 2, 3, 0, 1, 2, 0, 3, 1, 3, 2, 1, 0, 3, 1, 2, 0).

2. Rather than bound the variance 𝜎² of the sample distribution (the second moment about the mean 𝜇), we bound the quantity 𝜎² + 𝜇² (the second moment about zero, the pixel center). This simultaneously limits the degree of blur and drift.

The irregular update sequence makes it difficult to derive a closed-form expression for the second moment of the sample distribution. Instead, we compute 𝜎² + 𝜇² numerically in an off-line simulation. Specifically, we compute the moments of the sample distribution in each subpixel buffer over a range of values of 𝛼 and 𝑣. We then average the moments over an entire period of the update sequence, over the 𝑥 and 𝑦 directions according to (10), and over all subpixel buffers. These results are finally inverted to produce a table 𝛼𝜏𝑏(𝑣). At runtime, this table is accessed to retrieve the value of 𝛼 that meets the desired threshold for the measured per-pixel velocity. We found that a 64³ table is sufficient to capture the variation in 𝛼𝜏𝑏(𝑣). Figure 7 shows slices of this function for two different values of 𝜏𝑏.

Figure 7: 2D plot of the smallest smoothing factor 𝛼𝜏𝑏(𝑣) that respects a blur threshold 𝜏𝑏, as a function of the 2D velocity vector 𝑣.

Putting everything together, we apply the following update rule in place of (6):

$$\alpha_t[p] \;\leftarrow\; K \max\!\left(\frac{1}{N_{t-1}+1},\; \alpha_{\tau_b}(v)\right). \qquad (21)$$

The factor 𝐾 = 4 is due to the fact that we update each subpixel buffer every fourth frame. At pixels where the scene is stationary, the effect of (21) is to progressively reduce the value of 𝛼, just as in (6). In the case of a moving scene, however, the value of 𝛼 is set to give the greatest reduction in aliasing at an acceptable amount of blur, as shown in Figure 8.
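At runtime, only a table lookup and a max remain. The sketch below (with illustrative names) applies Equation (21) for one pixel; it assumes the table was produced by the off-line simulation just described and, for simplicity, a single fixed blur threshold, which reduces the lookup to the two fractional velocity components.

```python
# Runtime side of Equation (21): lower-bounded smoothing factor per pixel.
import numpy as np

K = 4
RES = 64                                              # table resolution per velocity axis

def smoothing_factor(v, n_prev, alpha_table):
    """v = fractional velocity (vx, vy) in [0, 1); n_prev = N_{t-1} at the reprojected pixel."""
    ix = min(int(v[0] * RES), RES - 1)
    iy = min(int(v[1] * RES), RES - 1)
    alpha_tau_b = alpha_table[iy, ix]
    return K * max(1.0 / (n_prev + 1.0), alpha_tau_b)

# Example with a placeholder table (a real table comes from the off-line simulation).
table = np.full((RES, RES), 0.05)
print(smoothing_factor((0.3, 0.7), n_prev=40.0, alpha_table=table))   # -> 0.2
```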
5.3 Adaptive evaluation when reprojection fails

For surfaces that recently became visible, the reprojection from one or more subpixel buffers may fail. In the worst case, only the new sample 𝑠𝑡[𝑝] is available. In these cases, the shading is prone to aliasing, as seen in Figure 9b.

To reduce these artifacts, for each reprojection that fails when computing 𝑓̃(𝑝) in (20) (i.e., 𝜋𝑡-𝑘-1(𝑝 + 𝜙𝑖(𝑡)) = ∅), we invoke the procedural shader at the appropriate quadrant offset 𝑝 + 𝜙𝑖(𝑡). Consequently, the shader may be evaluated from one to five times (Figure 9a) to improve rendering quality (Figure 9c). Fortunately, these troublesome regions tend to be spatially contiguous and thus map well to the SIMD architecture of modern GPUs.

6 Accounting for signal changes

The preceding analysis and algorithm assume that the input signal 𝑆 does not change over time. However, it is often the case that the surface shading does vary temporally due to, for example, light- and view-dependent effects such as cast shadows and specular highlights. In these cases it is possible to apply our supersampling technique to only the constant view- and light-independent layers and evaluate the remaining portions of the shading at the native screen resolution or with an alternative antialiasing technique. However, providing a unified framework for antialiasing time-varying surface effects is a worthy goal, and we present a preliminary solution to this problem in this section. We describe how to compute a lower bound on 𝛼 that avoids unwanted temporal blur in the case of shading changes.

(a) No signal adaptation  (b) Small 𝜏𝜖  (c) Good 𝜏𝜖  (d) Large 𝜏𝜖  (e) Reference
Figure 10: Effect of the residual tolerance 𝜏𝜖 on the teapot scene, which has bump-mapped specular highlights and moving lights. Small values of 𝜏𝜖 give too little weight to earlier estimates and lead to aliasing. Large values result in excessive temporal blurring when the shading function varies over time, as evident in this close-up view. The full-scene difference images are shown at the top.
6.1 Estimating the residual

Detecting temporal changes in the shading requires estimating the residual between our current shading estimate and its true value:

$$\epsilon_t[p] \;=\; (S * G)_t(p) - f_t[p]. \qquad (22)$$

Since the correct value of (𝑆 ∗ 𝐺)𝑡 is naturally unavailable, we would like to use the most current information 𝑠𝑡 to estimate this residual 𝜖̂𝑡. However, since we expect 𝑠𝑡 to be aliased (otherwise we would not need supersampling), we must smooth this value in both space and time. This corresponds to our assumption that although 𝑆𝑡(𝑝) may contain high-frequency spatial information, its partial derivative with respect to time ∂𝑆𝑡(𝑝)/∂𝑡 is smooth over the surface. In other words, we assume that temporal changes in the signal affect contiguous regions of the surface evenly. When this is not the case, our strategy for setting 𝛼 will fail, as discussed in Section 8.

Let the buffers 𝑒𝑘 store the differences between the recent samples 𝑠𝑡-𝑘[𝑝] and the values reconstructed from the previous contents of the subpixel buffers 𝑏𝑖(𝑡-𝑘)[𝑝] over the last 𝐾 frames:

$$e_k[p] \;=\; s_{t-k}[p] - b_{i(t-k)}[p], \qquad k \in \{0,1,2,3\}. \qquad (23)$$

We temporally smooth these values by retaining at each pixel the difference with the smallest magnitude,

$$e_{\mathrm{smin}}[p] \;=\; e_j[p] \quad\text{where}\quad j = \arg\min_k \bigl|e_k[p]\bigr|, \qquad (24)$$

and obtain our final estimate of the residual by spatially smoothing these using a box filter 𝐵𝑟 of radius 𝑟 = 3:

$$\hat{\epsilon}_t[p] \;=\; \bigl(B_3 * e_{\mathrm{smin}}\bigr)[p]. \qquad (25)$$

Note that this approach risks underestimating the residual. In other words, when presented with the choice between aliasing or a slower response to signal changes, our policy is to choose the latter.
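A whole-frame version of (23)–(25) is straightforward to express with array operations. The sketch below is illustrative only and assumes NumPy and SciPy; s[k] and b_prev[k] stand for the last 𝐾 samples and the corresponding previously reconstructed subpixel values.

```python
# Sketch of the residual estimate of Equations (23)-(25) on full frames.
import numpy as np
from scipy.ndimage import uniform_filter

def estimate_residual(s, b_prev, radius=3):
    e = np.stack([s[k] - b_prev[k] for k in range(4)])        # Equation (23)
    j = np.abs(e).argmin(axis=0)                              # Equation (24): smallest magnitude
    e_smin = np.take_along_axis(e, j[None], axis=0)[0]
    return uniform_filter(e_smin, size=2 * radius + 1)        # Equation (25): box filter B_3
```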
6.2 Limiting the residual

Similar to our approach for limiting the degree of spatial blur, we would like to establish a lower bound on 𝛼 such that the residual 𝜖̂𝑡+1 in the next frame remains within a threshold 𝜏𝜖. The choice of 𝜏𝜖 controls the tradeoff between the degree of antialiasing and the responsiveness of the system to temporal changes in the shading.

Following the derivation in the appendix, our strategy to adapt to temporal changes is to replace (21) with

$$\alpha_t[p] \;\leftarrow\; K \max\!\left(\frac{1}{N_{t-1}+1},\; \alpha_{\tau_b}(v),\; \alpha_{\tau_\epsilon}\right), \qquad (26)$$

where

$$\alpha_{\tau_\epsilon} \;=\; 1 - \frac{\tau_\epsilon}{\hat{\epsilon}_t[p]}. \qquad (27)$$

At pixels where the shading is constant, 𝜖̂𝑡 is less than 𝜏𝜖 and 𝛼𝑡 progresses according to the previous rules. When the residual increases, the value of 𝛼 also increases, shrinking the temporal window over which samples are aggregated and producing a more accurate estimate of the shading. Figure 10 illustrates this tradeoff between aliasing and temporal lag.

The selection of 𝜏𝜖 is closely related to the characteristics of the shading signal. In our experiments, we simply select the 𝜏𝜖 that achieves the best PSNR. Alternatively, other numerical or visual metrics can be used to limit both temporal lag and aliasing to acceptable amounts.
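For a single pixel, (26) and (27) combine into one expression. The sketch below is a hedged illustration with placeholder inputs; the final clamp to 1 is our addition (a blend weight cannot exceed 1), not something stated in the text.

```python
# Sketch combining Equations (26) and (27) for one pixel.
K = 4

def smoothing_factor_dynamic(n_prev, alpha_tau_b, eps_hat, tau_eps):
    # Equation (27); negative/undefined values are irrelevant because of the max below.
    alpha_tau_eps = 1.0 - tau_eps / abs(eps_hat) if abs(eps_hat) > tau_eps else 0.0
    # Equation (26), clamped to a valid blend weight.
    return min(1.0, K * max(1.0 / (n_prev + 1.0), alpha_tau_b, alpha_tau_eps))

print(smoothing_factor_dynamic(60.0, 0.02, eps_hat=0.002, tau_eps=0.01))  # 0.08: static shading
print(smoothing_factor_dynamic(60.0, 0.02, eps_hat=0.05,  tau_eps=0.01))  # 1.0: history discarded
```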
7 Results

Scenes  We tested our algorithm using several expensive procedural shaders with high-frequency spatial details that are prone to aliasing. The brick scene combines a random-color brick pattern with noisy mortar and pits; bump mapping is used for the lighting. The horse scene includes an animated wooden horse galloping over a marble checkered floor. The teapot consists of a procedural Voronoi cell pattern modulating both the color and the height field; the added rough detail is bump-mapped with specular highlights.

In addition to the basic scenes above, we show results for an indoor scene that has a greater variety of shaders, dynamic elements, and more geometric complexity. The scene consists of several procedurally shaded objects that altogether have over 100,000 triangles.

[Figure 11 plots: three panels ("Panning", "Rotation", "Minification / Magnification"), each graphing PSNR (dB) versus animation time (frames) for our method, jittered reprojection, 2×2/3×3/4×4 stratified supersampling, and no antialiasing.]

Figure 11: PSNR comparison of our approach with traditional supersampling and jittered reprojection for the brick scene, using real-time animation sequences exhibiting different types of motion. The red line indicates the frame used in Figure 14.
[Figure 12 plots: three panels ("Horse", "Teapot", "Indoor"), each graphing PSNR (dB) versus animation time (frames) for our method, reprojection, 2×2/3×3/4×4 stratified supersampling, and no antialiasing.]

Figure 12: Additional PSNR comparisons of our approach with traditional supersampling and jittered reprojection. The horse is animated, the teapot is dynamically lit, and the indoor scene is animated and dynamically lit. The red lines indicate the frames used in Figures 10 and 14.
These objects include bump-mapped brick walls, a shiny colored bumpy sphere, a glistening uneven marble floor, a reflective stone-bricked teapot, a finely carved wooden box, and a rusty metal exotic creature. The animated scene from the accompanying video combines several types of fast animation and camera motion, exhibiting complexity similar to what one would expect in a typical game scene. It also includes a rotating masked gobo light, which casts a procedurally generated pattern onto the scene objects. The fast-moving light and the significant disocclusion caused by the fast motion and complex geometry make the scene extremely difficult to handle with traditional reprojection techniques. However, with multiple subpixel buffers and adaptive evaluation, our method produces satisfying results.

Memory usage  The depth values required by reprojection are stored in the alpha channel of each subpixel buffer, which is 16-bit RGBA. We store the variance reduction factors 𝑁𝑡[𝑝] in an 8-bit auxiliary buffer. For the teapot and indoor scenes, residuals for the four subpixel buffers are stored in the four channels of one additional 8-bit RGBA buffer. A 1024×768 backbuffer consumes about 3MB, and our scheme uses an additional 27MB to supersample all shaders in a scene.

Comparisons  All results are generated using an Intel Core2 2.13GHz PC with 2GB RAM and an AMD HD4870 graphics board at a screen resolution of 1024×768. We measure image quality using the peak signal-to-noise ratio (PSNR) with respect to a ground-truth image generated with 256 samples/pixel, weighted as in Section 4.1. We show comparisons between conventional rendering (no antialiasing), our algorithm, and traditional 2×2, 3×3, and 4×4 stratified supersampling (performed on the GPU by rendering to a larger framebuffer and then downsampling). In addition, we compare to the basic reprojection method [Nehab et al. 2007; Scherzer et al. 2007] that uses uniform jittered sampling, a single cached buffer, and a value of 𝛼 chosen to maximize PSNR.

Figure 14 compares our algorithm to these alternatives. Note that standard rasterization is very efficient, but produces significant aliasing, especially under motion (see the accompanying video). Jittered reprojection is also faster than our technique, but has inferior quality because it lacks high-resolution estimates and does not adapt 𝛼 to limit blur. Our technique is also superior to traditional 2×2 and 3×3 stratified supersampling in terms of both rendering speed and quality. Finally, note that our technique gives higher PSNR than 4×4 stratified supersampling in the majority of cases. The teapot from Figure 10 is the most challenging scene because it contains a procedural shader with a high-frequency yet fast-changing dynamic lighting component. As a result, our method limits the effective number of samples it aggregates to avoid temporal blurring, and this reduces the degree of antialiasing. For the indoor scene in motion, we chose not to antialias the gobo lighting computation to avoid excessive blurring of the pattern (see Section 8). Note, however, that the moving specular highlights over the shiny objects (the teapot scene, and the sphere, floor, and teapot in the indoor scene) are still properly antialiased at different scales without introducing noticeable temporal blur. This demonstrates that our technique can handle signals that change with moderate speed and in a spatially correlated manner. The indoor scene also shows the ability of our approach to preserve antialiased details in the presence of complex and very dynamic changes in occlusion. Overall, our algorithm is significantly faster than 4×4 supersampling on all scenes (about 5-10× faster, depending on the relative amount of vertex and pixel processing).

Figures 11 and 12 graph the rendering quality (in PSNR) of our scenes for each of the techniques using different animation sequences. The red vertical lines denote the images shown in Figures 10 and 14. For the brick scene, Figure 11 demonstrates the superior quality of our technique under different types of motion: panning, rotation, and repeated magnification and minification. The small oscillations in PSNR do not visibly affect rendering quality, as can be verified in the accompanying video. Figure 12 shows that similar results are achieved for the other three test scenes, which include various types of motion. Again, the accompanying video shows animated versions of these results.
Let 𝑀𝑋(𝑢) denote the moment-generating function of 𝑋:

$$
\begin{aligned}
M_X(u) &= E\Bigl(\lim_{t\to\infty} e^{u X_t}\Bigr) && (39)\\
&= \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \alpha(1-\alpha)^{j+k}\binom{j+k}{j}\, a^j c^k\, e^{u(jb+kd)} && (40)\\
&= \sum_{q=0}^{\infty}\sum_{j=0}^{q} \alpha(1-\alpha)^{q}\binom{q}{j}\, a^j c^{q-j}\, e^{u(jb+(q-j)d)} && (41)\\
&= \alpha \sum_{q=0}^{\infty} \bigl((1-\alpha)(a\,e^{ub} + c\,e^{ud})\bigr)^{q} && (42)\\
&= \frac{\alpha}{1 - (1-\alpha)(a\,e^{ub} + c\,e^{ud})}. && (43)
\end{aligned}
$$

This function can be used to compute the desired moments:

$$\mu_X = E(X) = M_X'(0) = \frac{(ab+cd)\,\alpha(1-\alpha)}{\bigl(1-(1-\alpha)(a+c)\bigr)^2} \qquad (44)$$

$$\sigma_X^2 + \mu_X^2 = \operatorname{Var}(X) + \mu_X^2 = M_X''(0) = \frac{(ab^2+cd^2)\,\alpha(1-\alpha)}{\bigl(1-(1-\alpha)(a+c)\bigr)^2} + \frac{2(ab+cd)^2\,\alpha(1-\alpha)^2}{\bigl(1-(1-\alpha)(a+c)\bigr)^3}. \qquad (45)$$

The introduction of sample jittering simply adds a Gaussian random variable 𝐺 to 𝑋. Since Var(𝑋 + 𝐺) = Var(𝑋) + Var(𝐺), we can substitute (36) into (44)–(45) to obtain

$$\mu_v = 0 \qquad (46)$$

$$\sigma_v^2 = \sigma_G^2 + \frac{1-\alpha}{\alpha}\, v(1-v). \qquad (47)$$

Extending this result to 2D requires two modifications. First, the fractional velocity 𝑣 and the offsets 𝛿𝑡,𝑖 become 2D vectors. Second, (35) now contains four terms with 𝐹𝑡-1, corresponding to the four pixels involved in bilinear resampling. Because the bilinear weights are separable in 𝑣𝑥 and 𝑣𝑦, we can consider the 𝑥 and 𝑦 components of these moments separately, and the respective sample sets reduce to the 1D case in (35). Therefore, we can use (47) and (10) to reach (11).

Derivations of (18) and (19)  For the sums in (16) and (18) to be equal, the weights associated with each value 𝑏𝑖(𝑡-𝑘-1)[⌊𝑝𝑘⌋ + Δ] must be the same. This leads to a system of equations for each 𝑜𝑘:

$$
\begin{cases}
(1-o_{kx})(1-o_{ky}) = w_{k,(0,0)}/w_k = \beta_{k,0}\,\gamma_{k,0}\\[2pt]
(1-o_{kx})\,o_{ky} = w_{k,(0,1)}/w_k = \beta_{k,0}\,\gamma_{k,1}\\[2pt]
o_{kx}\,(1-o_{ky}) = w_{k,(1,0)}/w_k = \beta_{k,1}\,\gamma_{k,0}\\[2pt]
o_{kx}\,o_{ky} = w_{k,(1,1)}/w_k = \beta_{k,1}\,\gamma_{k,1}
\end{cases} \qquad (51)
$$

Note that both sides add up to one, and recall that the tent filters are axis-aligned and separable, which allows us to factor the weights 𝑤𝑘,Δ into products 𝛽 ⋅ 𝛾 as shown above. Therefore there exists a unique solution to each of these systems, given by (19).

Derivation of (27)  The residual in the next frame is equal to

$$
\begin{aligned}
\hat{\epsilon}_{t+1}[p] &\approx s_{t+1}[p] - f_{t+1}[p] && (52)\\
&= s_{t+1}[p] - \bigl(\alpha\, s_{t+1}[p] + (1-\alpha)\, f_t[p]\bigr) && (53)\\
&\approx s_{t+1}[p] - \alpha\, s_{t+1}[p] - (1-\alpha)\bigl(s_t[p] - \hat{\epsilon}_t[p]\bigr) && (54)\\
&= (1-\alpha)\bigl(s_{t+1}[p] - s_t[p]\bigr) + (1-\alpha)\,\hat{\epsilon}_t[p]. && (55)
\end{aligned}
$$

Here we do not attempt to predict the additional residual introduced by future signal changes, so we set 𝑠𝑡+1[𝑝] = 𝑠𝑡[𝑝] in (55). This leads to the relation

$$\hat{\epsilon}_{t+1}[p] \approx (1-\alpha)\,\hat{\epsilon}_t[p]. \qquad (56)$$

Requiring |𝜖̂𝑡+1| to be smaller than 𝜏𝜖, we reach

$$\alpha \;>\; \alpha_{\tau_\epsilon} = 1 - \frac{\tau_\epsilon}{\hat{\epsilon}_t[p]}. \qquad (27)$$
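Returning to result (47), a small Monte Carlo experiment (ours, not from the paper) confirms it numerically: draw the geometric number of resampling steps, accumulate the per-step displacements of bilinear resampling at fractional velocity 𝑣, add the Gaussian jitter, and compare the empirical variance of the resulting sample offset with (47).

```python
# Monte Carlo check of Equation (47) for 1D panning at fractional velocity v.
import numpy as np

rng = np.random.default_rng(1)
alpha, v, sigma_G = 0.3, 0.3, 0.4737
trials = 200_000

n_steps = rng.geometric(alpha, trials) - 1            # P(n) = alpha * (1 - alpha)^n
right = rng.binomial(n_steps, v)                      # steps that picked the right neighbor
offsets = (rng.normal(0.0, sigma_G, trials)           # jitter of the originating sample
           + right * (1 - v) - (n_steps - right) * v) # accumulated resampling displacement

predicted = sigma_G**2 + (1 - alpha) / alpha * v * (1 - v)   # Equation (47)
print(offsets.var(), predicted)                       # agree to within Monte Carlo noise
```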
Figure 14: Comparison between our approach, no antialiasing, stratified supersampling, and jittered reprojection. Frame rates and PSNR for each scene (moving/still views where applicable):
Horse scene: No AA 140 fps, 15.68 dB; 2×2 SS 37 fps, 22.22 dB; 3×3 SS 19 fps, 24.62 dB; 4×4 SS 11 fps, 25.50 dB; Reproj moving/still 88 fps, 22.72/26.30 dB; Ours moving/still 64 fps, 30.54/40.04 dB.
Brick scene: No AA 166 fps, 21.82 dB; 2×2 SS 35 fps, 26.54 dB; 3×3 SS 17 fps, 28.71 dB; 4×4 SS 9.8 fps, 29.72 dB; Reproj moving/still 113 fps, 25.52/28.70 dB; Ours moving/still 84 fps, 31.96/35.11 dB.
Indoor scene: No AA 112 fps, 24.79 dB; 2×2 SS 35 fps, 30.33 dB; 3×3 SS 17 fps, 32.38 dB; 4×4 SS 10 fps, 33.26 dB; Reproj moving/still 92 fps, 27.27/31.24 dB; Ours moving/still 52 fps, 33.93/38.37 dB.