
Amortized Supersampling

Lei Yang¹   Diego Nehab²   Pedro V. Sander¹   Pitchaya Sitthi-amorn³   Jason Lawrence³   Hugues Hoppe²

¹Hong Kong UST   ²Microsoft Research   ³University of Virginia

Abstract

We present a real-time rendering scheme that reuses shading samples from earlier time frames to achieve practical antialiasing of procedural shaders. Using a reprojection strategy, we maintain several sets of shading estimates at subpixel precision, and incrementally update these such that for most pixels only one new shaded sample is evaluated per frame. The key difficulty is to prevent accumulated blurring during successive reprojections. We present a theoretical analysis of the blur introduced by reprojection methods. Based on this analysis, we introduce a nonuniform spatial filter, an adaptive recursive temporal filter, and a principled scheme for locally estimating the spatial blur. Our scheme is appropriate for antialiasing shading attributes that vary slowly over time. It works in a single rendering pass on commodity graphics hardware, and offers results that surpass 4×4 stratified supersampling in quality, at a fraction of the cost.

Figure 1: For a moving scene with a procedural shader (top row of Figure 14), comparison of (a) no antialiasing (140 fps), (b) jittered reprojection (88 fps), (c) 4×4 stratified supersampling (11 fps), (d) our amortized supersampling (63 fps), and (e) ground truth reference image.

1 Introduction

The use of antialiasing to remove sampling artifacts is an important and well studied area in computer graphics. In real-time rasterization, antialiasing typically involves two hardware-supported techniques: mipmapped textures for prefiltered surface content, and framebuffer multisampling to remove jaggies at surface silhouettes.

With the increasing programmability of graphics hardware, many functionalities initially developed for offline rendering are now feasible in real-time. These include procedural materials and complex shading functions. Unlike prefiltered textures, procedurally defined signals are not usually bandlimited [Ebert et al. 2003], and producing a bandlimited version of a procedural shader is a difficult and ad-hoc process [Apodaca and Gritz 2000].

To reduce aliasing artifacts in procedurally shaded surfaces, a common approach is to increase the spatial sampling rate using supersampling (Figure 1c). However, it can be prohibitively expensive to execute a complex procedural shader multiple times at each pixel. Fortunately, it is often the case that at any given surface point, expensive elements of the surface shading (such as albedo) vary slowly or are constant over time. A number of techniques can automatically factor a procedural shader into static and dynamic layers [Guenter 1994; Jones et al. 2000; Sitthi-amorn et al. 2008]. Our idea is to sample the static and weakly dynamic layers at a lower temporal rate to achieve a higher spatial sampling rate for the same computational budget. The strong dynamic layers can be either sampled at the native resolution or supersampled using existing techniques.

We present a real-time scheme, amortized supersampling, that evaluates the static and weak dynamic components of the shading function only once for the majority of pixels, and reuses samples computed in prior framebuffers to achieve good spatial antialiasing. The general idea of reusing shading information across frames has been studied extensively, as reviewed in Section 2. Our approach builds on the specific strategy of real-time reprojection, whereby the GPU pixel shader "pulls" information associated with the same surface point in earlier frames. Recent work on reprojection has focused on reusing expensive intermediate shading computations across frames [Nehab et al. 2007] and temporally smoothing shadow map boundaries [Scherzer et al. 2007]. In contrast, our goal is effective supersampling of more general shading functions.

Amortized supersampling faces many challenges not present in ordinary supersampling. Due to scene motion, the set of samples computed in earlier frames forms an irregular pattern when reprojected into the current frame. Moreover, some samples become invalid due to occlusion. Thus the set of spatio-temporal samples available for reconstruction has much less structure than a typical grid of stratified stochastic samples.

We build on the jittered sampling and recursive exponential smoothing introduced in prior reprojection work. An important contribution of this paper is a theoretical analysis of the spatio-temporal blur introduced by these techniques as a function of the relative scene motion and smoothing factor applied in the recursive filter. We show that by adjusting this smoothing factor adaptively, the basic reprojection algorithm can be made to converge to perfect reconstruction (infinite supersampling) for stationary views of static scenes (Section 4.1). Furthermore, we show that for moving surfaces, straightforward reprojection leads to excessive blurring (Figure 1b). Our scheme makes several contributions in addressing this key issue:

• Use of multiple subpixel buffers to maintain reprojection estimates at a higher spatial resolution;
• Irregular round-robin update of these subpixel buffers to improve reconstruction quality, while still only requiring one sample evaluation per pixel per frame;
• A principled approach to estimate and limit the amount of blur introduced during reprojection and exponential smoothing;
• Adaptive evaluation of additional samples in disoccluded pixels to reduce aliasing;
• A strategy to estimate and react to slow temporal changes in the shading.
Amortized supersampling is compatible with the modern rasterization pipeline implemented on commodity graphics hardware. It is lightweight, requiring no preprocessing, and thus provides a practical approach for antialiasing existing procedural shaders. Also, it requires only a single rendering pass, and can be used in conjunction with hardware multisampling for antialiasing geometry silhouettes. We show that it achieves results that are qualitatively comparable or superior to 4×4 stratified supersampling, but at a fraction of the rendering cost (Figure 1d).

2 Related work

Data caching and reuse  Many offline and interactive ray-based rendering systems exploit the spatio-temporal coherence of animation sequences [Cook et al. 1987; Badt 1988; Chen and Williams 1993; Bishop et al. 1994; Adelson and Hodges 1995; Mark et al. 1997; Walter et al. 1999; Bala et al. 1999; Ward and Simmons 1999; Havran et al. 2003; Tawara et al. 2004]. The idea is also used in hybrid systems that use some form of hardware acceleration [Simmons and Séquin 2000; Stamminger et al. 2000; Walter et al. 2002; Woolley et al. 2003; Gautron et al. 2005; Zhu et al. 2005; Dayal et al. 2005]. These systems focus primarily on reusing expensive global illumination or geometry calculations such as ray-scene intersections, indirect lighting estimates, and visibility queries. A related set of methods opportunistically reuse shading information to accelerate real-time rendering applications by reprojecting the contents of the previous frame into the current frame [Hasselgren and Akenine-Moller 2006; Nehab et al. 2007; Scherzer et al. 2007; Scherzer and Wimmer 2008]. Although we also apply a reprojection strategy to reuse shading information over multiple frames, we do so to achieve spatial antialiasing; this application was first noted by Bishop et al. [1994], but not pursued. Furthermore, we add to this area of research a rigorous theoretical analysis of the type of blur introduced by repeatedly resampling a framebuffer, a fundamental operation in these systems.

Antialiasing of procedural shaders  There is a considerable body of research on antialiasing procedural shaders, recently reviewed by Brooks Van Horn III and Turk [2008]. Creating a bandlimited version of a procedural shader can be a difficult task because analytically integrating the signal is often infeasible. Several practical approaches are reviewed in the book by Ebert et al. [2003]. These include clamping the high-frequency components of a shader that is defined in the frequency domain [Norton et al. 1982], precomputing mipmaps for tabulated data such as lookup tables [Hart et al. 1999], and obtaining approximations using affine arithmetic [Heidrich et al. 1998]. However, the most general and common approach is still to numerically integrate the signal using supersampling [Apodaca and Gritz 2000]. Our technique brings the simplicity of supersampling to real-time applications at an acceptable increase in rendering cost.

Post-processing of rendered video  The pixel tracing filter of Shinya [1993] is related to our approach. Starting with a rendered video sequence, the tracing filter tracks the screen-space positions of corresponding scene points, and combines color samples at these points to achieve spatial antialiasing in each frame. The filtering operation is applied as a post-process, and assumes that the full video sequence is accessible. In contrast, our approach is designed for real-time evaluation. We maintain only a small set of reprojection buffers in memory, and update these efficiently with adaptive recursive filtering.

3 Review of reprojection

Reprojection methods [Nehab et al. 2007; Scherzer et al. 2007] allow reusing values generated at the pixel level over consecutive frames. We next summarize the basic approach, which has two main parts: reprojection and recursive exponential smoothing.

Reprojection  The core idea is to let the rendering of the current frame gather and reuse shading information from surfaces visible in the previous frame. Conceptually, when rasterizing a surface at a given pixel, we determine the projection of the surface point into the previous framebuffer, and test if its depth matches the depth stored in the previous depth buffer. If so, the point was previously visible, and its attributes can be safely reused. Formally, let buffer f_t hold the cached pixel attributes at time t, and buffer d_t hold the pixel depths. Let f_t[p] and d_t[p] denote the buffer values at pixel p ∈ Z², and let f_t(·) and d_t(·) denote bilinear sampling. For each pixel p = (x, y) at time t, we determine the 3D clip-space position of its generating scene point at time t−1, denoted (x′, y′, z′) = π_{t-1}(p). The reprojection operation π_{t-1}(p) is obtained using a simple computation in the vertex program and interpolator as described by Nehab et al. [2007]. If the reprojected depth z′ lies within some tolerance of the bilinearly interpolated depth d_{t-1}(x′, y′), we conclude that f_t[p] has some correspondence with the interpolated value f_{t-1}(x′, y′). If the depths do not match (due to occlusion), or if (x′, y′) lies outside the view frustum at time t−1, no correspondence exists and we denote this by π_{t-1}(p) = ∅.

Recursive exponential smoothing  Both Nehab et al. [2007] and Scherzer et al. [2007] showed how this reprojection strategy can be combined with a recursive temporal filter for antialiasing. We first review this basic principle before extending it to a more general setting.

At each pixel p, the shader is evaluated at some jittered position to obtain a new sample s_t[p]. This sample is combined with a running estimate of the antialiased value maintained in f_t[p] according to a recursive exponential smoothing filter:

    f_t[p] ← α s_t[p] + (1 − α) f_{t-1}(π_{t-1}(p)).    (1)

Note that the contribution a single sample makes to this estimate decreases exponentially in time, and the smoothing factor α regulates the tradeoff between variance reduction and responsiveness to changes in the scene. For example, a small value of α reduces the variance in the estimate and therefore produces a less aliased result, but introduces more lag if the shading changes between frames. If reprojection fails at any pixel (i.e., π_{t-1}(p) = ∅), then α is locally reset to 1 to give full weight to the current sample. This produces higher variance (greater aliasing) in recently disoccluded regions.
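To make this basic scheme concrete, the following is a minimal per-pixel sketch of Eq. (1) in Python. The reprojection, bilinear sampling, and depth tolerance below are stand-in stubs (not part of the paper); a real implementation runs in a GPU pixel shader.

```python
DEPTH_EPS = 1e-3  # hypothetical depth tolerance

def bilinear(buf, x, y):
    """Stub for bilinear sampling f_t(.); nearest lookup in a dict buffer."""
    return buf.get((round(x), round(y)), 0.0)

def reproject(p):
    """Stub for pi_{t-1}(p): returns (x', y', z') or None if off-screen."""
    return (p[0], p[1], 0.0)

def smooth(p, s_t, f_prev, d_prev, alpha=0.25):
    """f_t[p] <- alpha s_t[p] + (1 - alpha) f_{t-1}(pi_{t-1}(p)), Eq. (1)."""
    prev = reproject(p)
    if prev is not None:
        x, y, z = prev
        if abs(z - bilinear(d_prev, x, y)) < DEPTH_EPS:  # depth test passed
            return alpha * s_t + (1 - alpha) * bilinear(f_prev, x, y)
    return s_t  # reprojection failed: alpha is locally reset to 1
```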
4 Amortized supersampling: theory

Spatial antialiasing is achieved by convolving the screen-space shading function S with a low-pass filter G [Mitchell and Netravali 1988]. We use a Monte Carlo algorithm with importance sampling [Robert and Casella 2004] to approximate this convolution (S ∗ G)(p) at each pixel p:

    f_N[p] ← (1/N) Σ_{i=1}^{N} S(p + g_i[p]).    (2)

Here g_i[p] plays the role of a per-pixel random jitter offset, distributed according to G. Our choice for G is a 2D Gaussian kernel with standard deviation σ_G = 0.4737, as this closely approximates the kernel of Mitchell and Netravali [1988] while avoiding negative lobes, which interfere with importance sampling [Ernst et al. 2006].
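As an illustration, this small Python sketch approximates (S ∗ G)(p) with the jittered estimator of Eq. (2); the toy shader and all names are ours, not the paper's.

```python
import numpy as np

SIGMA_G = 0.4737                       # paper's Gaussian jitter kernel G
rng = np.random.default_rng(1)

def supersample(S, p, n):
    """Average n shader evaluations jittered around pixel center p, Eq. (2)."""
    offsets = rng.normal(0.0, SIGMA_G, size=(n, 2))   # g_i[p] drawn from G
    return np.mean([S(p[0] + dx, p[1] + dy) for dx, dy in offsets])

# Toy high-frequency "procedural shader" prone to aliasing.
S = lambda x, y: float(np.sin(40 * x) * np.sin(40 * y) > 0)
print(supersample(S, (0.5, 0.5), 1), supersample(S, (0.5, 0.5), 256))
```

The variance of the estimate shrinks as more jittered evaluations are averaged, which is exactly the behavior quantified by Eq. (3) below.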
It is easily shown that the variance of the estimator is

    Var(f_N[p]) = (1/N) Var(f_1[p]),    (3)

where Var(f_1[p]) is the per-pixel variance of the Monte Carlo estimator using just one sample. Using a recursive exponential smoothing filter, we can amortize the cost of evaluating the sum in (2) over multiple frames:

    f_t[p] ← α_t[p] S(p + g_t[p]) + (1 − α_t[p]) f_{t-1}(π_{t-1}(p)).    (4)

In words, a running estimate of (2) is maintained at each pixel p in the buffer f_t and is updated at each frame by combining a new jittered sample S(p + g_t[p]) with the previous estimate according to the smoothing factor α_t[p]. Note that this formulation allows α_t[p] to vary over time and with pixel location.

We first present an antialiasing scheme for stationary views and static scenes, and then consider the more general case of arbitrary scene and camera motion. Detailed derivations of key results in these sections are found in the appendix.

4.1 Stationary viewpoint and static scene

In the case of a stationary camera viewpoint and static scene, the reprojection map π is simply the identity. In this case, the smoothing factor can be gradually decreased over time as

    α_t[p] = 1/t,    (5)

resulting in an ever-increasing accumulation of samples, all with uniform weights. This causes the estimates f_t[p] to converge to perfect antialiasing (infinite supersampling) as t → ∞, with variance decreasing as Var(f_1[p])/t.

4.2 Moving viewpoint and dynamic scene

The presence of relative scene motion (caused by a moving camera or moving/deforming scene objects) significantly complicates the task of amortizing (2) over multiple frames. Specifically, any algorithm must both respond to changes in visibility and avoid unwanted blurring due to successive resamplings. We next describe our approach for addressing these two issues.

Figure 2: Illustration of the samples that contribute to the estimate at a single pixel f_t[p] under a constant fractional velocity field v (and assuming no jitter for simplicity). Note that the sample points spread out over time, but that their weights (proportional to the radius of each dot) decrease at a rate determined by the smoothing factor α. We characterize the amount of blur at each pixel by the weighted spatial variance of these sample offsets.

4.2.1 Accounting for visibility changes

In dynamic scenes, the number of consecutive frames that a surface point has been visible varies across pixels. We therefore maintain a record of the effective number of samples that have contributed to the current estimate at each pixel. More precisely, we store the variance reduction factor N_t[p] for each pixel, and use this value to set the per-pixel smoothing factor α_t[p] at each frame, with the goal of reproducing the stationary case in Section 4.1.

When a surface point becomes visible for the first time (either at start-up or due to a disocclusion) we initialize α_t[p] ← 1 and N_t[p] ← 1. In subsequent updates, we apply the rules

    α_t[p] ← 1/(N_{t-1}[p] + 1)    (6)

and

    N_t[p] ← N_{t-1}[p] + 1.    (7)

As discussed later on, the presence of scene motion requires us to limit this indefinite accumulation of uniformly weighted samples, even for pixels that remain visible over many frames. To do so, Sections 5.2 and 6.2 prescribe lower bounds on the value of α_t[p] that override (6). This effectively changes the rate at which new samples are accumulated, leading to a new update rule for N_t[p] (see the appendix for a derivation):

    N_t[p] ← ( α_t[p]² + (1 − α_t[p])² / N_{t-1}[π_{t-1}(p)] )^{-1}.    (8)

Note that (8) reduces to (7) when α_t[p] is not overridden, so in practice we always use (8).
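A minimal Python sketch of this per-pixel bookkeeping, where `alpha_floor` stands in for whatever lower bound Sections 5.2 and 6.2 prescribe (0 when no override applies):

```python
def update_alpha_N(N_prev, alpha_floor=0.0):
    """Return (alpha_t, N_t) per Eqs. (6) and (8)."""
    alpha = max(1.0 / (N_prev + 1.0), alpha_floor)      # Eq. (6) + override
    N = 1.0 / (alpha**2 + (1.0 - alpha)**2 / N_prev)    # Eq. (8)
    return alpha, N

# With no override this reduces to Eq. (7): N_t = N_{t-1} + 1.
alpha, N = 1.0, 1.0        # first frame after a disocclusion
for _ in range(5):
    alpha, N = update_alpha_N(N)
    print(round(alpha, 3), round(N, 3))   # N counts 2, 3, 4, ...
```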
Figure 3: Experimental validation of Equation (11). For each velocity v and weight α we rendered a resolution chart until convergence using (4). We compared the rendered result to a set of images of the same resolution chart rendered with (2) using a low-pass filter G′ and a range of standard deviations σ_{G′}. The left plot shows the σ_{G′} that gives the best match (highest PSNR) as a function of v and α; the right plot shows the blur standard deviation predicted by (11). The RMSE between the observed and predicted blur standard deviations is only 0.0382 pixels.

Figure 4: Sampling from multiple subpixel buffers. To limit the amount of blur, we use nonuniform blending weights defined by a tent function centered on the quadrant being updated. (a) In the absence of local motion, no resampling blur is introduced; (b) for a moving scene, our method selects those samples closest to the desired quadrant center to limit the amount of blur.

4.2.2 Modeling blur due to resampling

In general, the reprojected position π_{t-1}(p) used in (4) lies somewhere between the set of discrete samples in buffer f_{t-1}, and thus some form of resampling is required. This resampling involves computing a weighted sum of the values in the vicinity of π_{t-1}(p). Repeatedly resampling values at intermediate locations has the effect of progressively increasing the number of samples that contribute to the final estimate at each pixel. Moreover, the radius of this neighborhood of samples increases over time (Figure 2), leading to undesirable blurring (Figure 1b). Our goal is to limit this effect. We first model it mathematically.

The value stored at a single pixel is given by a weighted sum of a number of samples n(t, p) evaluated at different positions:

    f_t[p] = Σ_{i=1}^{n(t,p)} ω_{t,i} S(p + δ_{t,i}[p])   with   Σ_{i=1}^{n(t,p)} ω_{t,i} = 1.    (9)

The weights ω_{t,i} are a function of the particular resampling strategy employed and the sequence of weights α_t used in the recursive filter. The offsets δ_{t,i} denote the position of each contributing sample with respect to the center of the pixel p. Note that each displacement δ_{t,i}[p] is the result of a combination of offsets due to jitter and reprojection. Following the derivation in the appendix, the amount of blur at a pixel can be characterized by the average weighted spatial variance across both dimensions:

    σ_t²[p] = ½ Var_x({δ_{t,i}[p]}_{i=1}^{n(t,p)}) + ½ Var_y({δ_{t,i}[p]}_{i=1}^{n(t,p)}).    (10)

Although obtaining a closed-form expression for σ_t²[p] is impractical for arbitrary scene motion, the case of constant panning motion is tractable and provides an insightful case that will serve as the basis of our approach for estimating (and limiting) unwanted blur. Moreover, other types of motion are well approximated locally by translations. This type of motion resamples each pixel at a constant offset given by the fractional velocity v = π_{t-1}(p) − ⌊π_{t-1}(p)⌋. Furthermore, let us assume for now that standard bilinear interpolation is used to reconstruct intermediate values and that a constant smoothing factor α is used in (4). As shown in the appendix, under these assumptions the expected blur variance E(σ_t²[p]) converges (as t → ∞) to

    σ_v² = σ_G² + ((1 − α)/α) · (v_x(1 − v_x) + v_y(1 − v_y))/2.    (11)

The simulation results shown in Figure 3 confirm the accuracy of this expression. Each factor in (11) suggests a different approach for reducing the amount of blur. The factor (v_x(1 − v_x) + v_y(1 − v_y)) arises from the choice of bilinear interpolation. We encourage the fractional velocity components v_x and v_y to concentrate around 0 or 1 by maintaining an estimate of the framebuffer at a higher resolution, and by avoiding resampling whenever possible (Section 5.1). In addition, we reduce the factor (1 − α)/α by using larger values of α, although this has the disadvantage of limiting the magnitude of antialiasing. We present a strategy for setting α that controls this tradeoff (Section 5.2). Together, these ideas form the backbone of our antialiasing algorithm.
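As a concrete illustration, the sketch below evaluates Eq. (11) and inverts it to obtain the smallest α respecting a blur budget τ_b. The closed-form inversion applies only to this traditional bilinear-reprojection analysis (Section 5.2 explains why the full algorithm instead uses a precomputed table), and the function names are ours.

```python
SIGMA_G2 = 0.4737**2   # variance of the jitter kernel G

def blur_variance(alpha, vx, vy):
    """Steady-state expected blur variance sigma_v^2 of Eq. (11)."""
    bilinear_term = 0.5 * (vx * (1 - vx) + vy * (1 - vy))
    return SIGMA_G2 + (1 - alpha) / alpha * bilinear_term

def alpha_bound(tau_b, vx, vy):
    """Smallest alpha keeping Eq. (11) within tau_b (assumes tau_b > sigma_G^2)."""
    b = 0.5 * (vx * (1 - vx) + vy * (1 - vy))
    if b == 0.0:
        return 0.0                       # no resampling blur at all
    return min(1.0, b / (tau_b - SIGMA_G2 + b))

print(blur_variance(alpha_bound(1.0, 0.5, 0.5), 0.5, 0.5))  # ~= 1.0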
5 Algorithm

As described in this section, our antialiasing algorithm uses multiple subpixel buffers to limit the amount of blur, adapts sample evaluation at disoccluded pixels, and adjusts the smoothing factor α to control the tradeoff between blurring and aliasing.

5.1 Subpixel buffers

We decrease unwanted blur by maintaining screen-space estimates f at twice the screen resolution, as this tends to reduce the terms v_x(1 − v_x) and v_y(1 − v_y) in (11) by half. We associate estimates with the four quadrants of each pixel, and update these in round-robin fashion. These 2×2 quadrant samples are deinterleaved to form K = 4 independent subpixel buffers {b_k}, k ∈ {0, 1, 2, 3}, each at screen resolution. Each buffer b_k stores an estimate offset by φ_k ∈ {(−0.25, −0.25), (0.25, −0.25), (−0.25, 0.25), (0.25, 0.25)} relative to the center of the pixel.

Note that in the absence of scene motion, these four subpixel buffers effectively form a higher-resolution framebuffer. However, under scene motion, the subpixel samples computed in earlier frames reproject to offset locations, as indicated in Figure 4.

At each frame, we compute one new sample per pixel, and update one of the subpixel buffers, b_{i(t)}, according to (4) using information gathered from all the subpixel buffers. (For now, let i(t) = t mod K.) We then compute the final pixel color as a weighted average of these subpixel buffers. Figure 5 illustrates the steps executed by the pixel shader at each frame t:

1. s_t[p] ← EvaluateSample(p + φ_{i(t)} + g_t[p])
   Evaluate the procedural shader at a new sample position;
2. b_{i(t)}[p] ← UpdateSubpixel({b_k[p]}, s_t[p], t)
   Update one subpixel buffer using all previous subpixel buffers and the new sample;
3. f_t[p] ← ComputePixelColor({b_k[p]}, s_t[p])
   Compute the output pixel color given the new sample and the subpixel buffers;
4. N_t[p] ← UpdateEffectiveNumberOfSamples(N_t[p])
   Update the new per-pixel effective number of samples.

These steps are implemented in the real-time graphics pipeline. The vertex shader computes the reprojection coordinates needed to access the four subpixel buffers (as with traditional reprojection [Nehab et al. 2007]), and the fragment shader outputs three render targets: the framebuffer f_t, an updated version b_{i(t)} of one of the subpixel buffers, and updated values for the per-pixel number N_t of samples. All steps are performed together in a single GPU rendering pass, as described below.
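The following self-contained Python sketch mirrors the four steps above on the CPU. The shader and the two resampling helpers are deliberately trivial stubs of our own; the real versions (Eqs. (16) and (20)) are detailed below.

```python
import random

K = 4
PHI = [(-0.25, -0.25), (0.25, -0.25), (-0.25, 0.25), (0.25, 0.25)]

evaluate_shader = lambda x, y: 0.0                       # stub for S
resample_quadrant = lambda b, p, k: b[k][p]              # stub for Eq. (16)
resample_center = lambda b, p: sum(q[p] for q in b) / K  # stub for f~(p)

def render_frame(t, pixels, b, f, N):
    k = t % K   # i(t); Section 5.2 replaces this with an irregular order
    for p in pixels:
        gx, gy = random.gauss(0, 0.2575), random.gauss(0, 0.2575)
        # Step 1: one new jittered sample in quadrant i(t).
        s = evaluate_shader(p[0] + PHI[k][0] + gx, p[1] + PHI[k][1] + gy)
        alpha = 1.0 / (N[p] + 1.0)      # Eq. (6); Eqs. (21)/(26) add floors
        # Step 2: update subpixel buffer i(t) from all buffers, Eq. (12).
        b[k][p] = alpha * s + (1 - alpha) * resample_quadrant(b, p, k)
        # Step 3: final color from a pixel-center estimate, Eq. (20).
        f[p] = (alpha / K) * s + (1 - alpha / K) * resample_center(b, p)
        # Step 4: effective number of samples, Eq. (8).
        N[p] = 1.0 / (alpha**2 + (1 - alpha)**2 / N[p])
```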
Figure 5: The main steps of our fragment shader algorithm. All steps are performed together in the main rendering pass.

Step 1: Evaluate a new sample  The first step is to evaluate the procedural shader at a new sample position. The offset φ_{i(t)} centers the jitter distribution on the appropriate quadrant. The jitter value g_t[p] is drawn from a Gaussian distribution with standard deviation 0.2575. When the values from all the subpixel buffers are combined (and in the absence of motion), this standard deviation gives the best fit to the wider Gaussian kernel (σ_G = 0.4737). We jitter each pixel independently by offsetting the interpolated values received from the rasterizer before they are sent to the fragment shader.

Step 2: Update one subpixel buffer  During this step, we access all subpixel buffers using reprojection to reconstruct b̃(p + φ_{i(t)}), which is then combined with the sample s_t[p] from step 1:

    b_{i(t)}[p] ← α s_t[p] + (1 − α) b̃(p + φ_{i(t)}).    (12)

To minimize the amount of blur, b̃ should favor samples nearest the center of the quadrant being updated. As illustrated in Figure 4, our approach is to define a bilinear tent function Λ_r that encompasses the current quadrant in the framebuffer (with r = 0.5):

    Λ_r(v) = clamp(1 − |v_x|/r, 0, 1) · clamp(1 − |v_y|/r, 0, 1).    (13)

Ideally, we would like to compute the weighted sum of all the samples that fall under the support of this function, after these are forward-projected into the current frame. However, certain approximations are required to make this operation efficient on graphics hardware. Instead of using forward reprojection, we compute the positions p_k of the centers of these tents reprojected back into each subpixel buffer (reverse reprojection), properly accounting for the differences in quadrant offsets:

    p_k = π_{t-k-1}(p + φ_{i(t)}) − φ_{i(t-k-1)},   k ∈ {0, 1, 2, 3}.    (14)

We then approximate the footprint of the projected tent function by an axis-aligned square of radius

    r_k = | J_{π_{t-k-1}}[p] (0.5, 0.5)ᵀ |,    (15)

where J_{π_{t-k-1}} is the Jacobian of each reprojection map, which is directly available using the ddx/ddy shader instructions. This is to account for changes in scale during minification or magnification.

Rather than accumulating all of the samples in the resulting footprint (which would be too expensive), we consider only the four nearest samples in each subpixel buffer (altogether sixteen samples), located at ⌊p_k⌋ + Δ with Δ ∈ {(0,0), (1,0), (0,1), (1,1)}:

    b̃(p + φ_{i(t)}) = Σ_{k,Δ} w_{k,Δ} b_{i(t-k-1)}[⌊p_k⌋ + Δ] / Σ_{k,Δ} w_{k,Δ},    (16)

weighting these nearest samples according to the tent function as

    w_{k,Δ} = Λ_{r_k}(p_k − (⌊p_k⌋ + Δ)).    (17)

As proved in the appendix, this is equivalent to taking the weighted average of the values of each subpixel buffer resampled at positions ⌊p_k⌋ + o_k using standard bilinear interpolation

    b̃(p + φ_{i(t)}) = Σ_k w_k b_{i(t-k-1)}(⌊p_k⌋ + o_k) / Σ_k w_k,    (18)

where w_k = Σ_Δ w_{k,Δ} and

    o_k = ( (w_{k,(1,0)} + w_{k,(1,1)}) / w_k ,  (w_{k,(0,1)} + w_{k,(1,1)}) / w_k )ᵀ.    (19)

This formulation leverages hardware support for bilinear interpolation and accelerates the resampling process. Finally, note that for the special case of a static scene, the weight w_{i(t-4)} is exactly one while the others are zero and the offset vector o_{i(t-4)} is zero, so the previous point estimate is retrieved unaltered. Thus, no blur is introduced and the running estimate converges exactly. Also, in the special case of constant panning motion, each subpixel buffer contributes exactly one sample to the sum in (16) due to the fact that the tent function Λ_{0.5} has unit diameter. Thus, each subpixel buffer effectively undergoes nearest-sampling.

Step 3: Compute pixel color  The computation of the final pixel color follows a process very similar to that of step 2. We access all subpixel buffers using reprojection to form an estimated value f̃(p) at the center of the pixel, and combine this estimate with the fresh sample s_t[p]:

    f_t[p] ← (α/K) s_t[p] + (1 − α/K) f̃(p).    (20)

The estimate f̃(p) is obtained just as in (18), but using position p′_k = π_{t-k-1}(p) − φ_{i(t-k-1)} reprojected from the center of the pixel rather than the center of one quadrant, and using a tent function with radius r′_k = 2 r_k that is twice as large. The current sample s_t[p] appears in this formula because the estimate f̃(p) is computed using the old contents of buffer b_{i(t)}, before it is updated in step 2 in the same rendering pass. It is weighted by α/K because it would contribute to only one of the K subpixel buffers.

Step 4: Update effective number of samples  Buffer N_t is updated from N_{t-1} and α_t using formula (8). The algorithm uses a pair of ping-pong buffers to maintain N_t and N_{t-1}.
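To illustrate Eqs. (13) and (16)-(17), here is a minimal Python sketch of the sixteen-sample gather; the buffer layout and names are illustrative, not the paper's GPU implementation.

```python
import math

def tent(v, r):
    """Separable tent weight Lambda_r of Eq. (13)."""
    return max(0.0, 1 - abs(v[0]) / r) * max(0.0, 1 - abs(v[1]) / r)

def gather(buffers, centers, radii):
    """Weighted average over the 4 nearest texels of each of the K
    subpixel buffers, Eqs. (16)-(17). `centers[k]` is the reprojected
    tent center p_k and `radii[k]` its footprint radius r_k."""
    num, den = 0.0, 0.0
    for buf, pk, rk in zip(buffers, centers, radii):
        fx, fy = math.floor(pk[0]), math.floor(pk[1])
        for dx in (0, 1):
            for dy in (0, 1):
                w = tent((pk[0] - (fx + dx), pk[1] - (fy + dy)), rk)
                num += w * buf[fx + dx, fy + dy]
                den += w
    return num / den if den > 0 else None  # None: no valid history
```

On the GPU, the equivalent form (18)-(19) collapses each buffer's four taps into a single hardware bilinear fetch.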
Figure 6: Comparison of signal drift for a simple round-robin subpixel update sequence (left) and our irregular sequence (right).

Figure 7: 2D plot of the smallest smoothing factor α_{τ_b}(v) that respects a blur threshold τ_b, as a function of the 2D velocity vector v; slices are shown for τ_b = 1 and τ_b = 2.

Figure 8: Effect of the blur tolerance τ_b, shown in a close-up view (from the top row of Figure 14): (a) small τ_b, 28.16dB; (b) good τ_b, 30.54dB; (c) large τ_b, 23.65dB; (d) reference. Values of τ_b that are too small aggressively discard earlier estimates and can lead to aliasing, while values that are too large may combine too much resampled data and cause blurring.

Figure 9: As shown in this close-up view (from Figure 14), we reduce aliasing in newly disoccluded regions by adaptively evaluating additional samples when reprojection fails: (a) eval map, where the shades of red indicate the number (1-4) of additional samples; (b) one eval; (c) adaptive eval; (d) reference.

5.2 Limiting the amount of blur

Given a threshold τ_b on the amount of blur (variance of the sample distribution) and the velocity v due to constant panning motion, we would like to compute the smallest smoothing factor α_{τ_b}(v) that provides the greatest degree of antialiasing without exceeding this blur threshold. Unlike in the case of traditional bilinear reprojection, which admits a bound on α by inverting (11), our more complex algorithm does not lend itself to a similar analysis. The appendix provides a more detailed explanation of why extending these earlier results is impractical. In particular, the mean μ of the sample distribution is no longer guaranteed to be zero. This is due to the fact that the reconstruction kernel Λ_{0.5} in (17) effectively reduces to nearest-sampling during panning motion. This can cause the signal to drift spatially and may lead to visible artifacts (Figure 6). We combine two strategies to address this problem:

1. Rather than updating the subpixel buffers in simple round-robin order (i(t) = t mod K), we use an irregular update sequence that breaks the drift coherence. We found that the following update sequence works well in practice: (0, 1, 2, 3, 0, 2, 1, 3, 1, 0, 3, 2, 1, 3, 0, 2, 2, 3, 0, 1, 2, 0, 3, 1, 3, 2, 1, 0, 3, 1, 2, 0).

2. Rather than bound the variance σ² of the sample distribution (second moment about the mean μ), we bound the quantity σ² + μ² (second moment about zero, the pixel center). This simultaneously limits the degree of blur and drift.

The irregular update sequence makes it difficult to derive a closed-form expression for the second moment of the sample distribution. Instead, we compute σ² + μ² numerically in an off-line simulation. Specifically, we compute the moments of the sample distribution in each subpixel buffer over a range of values of α and v. We then average the moments over an entire period of the update sequence, over the x and y directions according to (10), and over all subpixel buffers. These results are finally inverted to produce a table α_{τ_b}(v). At runtime, this table is accessed to retrieve the value of α that meets the desired threshold for the measured per-pixel velocity. We found that a 64³ table is sufficient to capture the variation in α_{τ_b}(v). Figure 7 shows slices of this function for two different values of τ_b.
Putting everything together, we apply the following update rule in place of (6):

    α_t[p] ← K max( 1/(N_{t-1} + 1), α_{τ_b}(v) ).    (21)

The factor K = 4 is due to the fact that we update each subpixel buffer every fourth frame. At pixels where the scene is stationary, the effect of (21) is to progressively reduce the value of α just as in (6). In the case of a moving scene, however, the value of α is set to give the greatest reduction in aliasing at an acceptable amount of blur as shown in Figure 8.

5.3 Adaptive evaluation when reprojection fails

For surfaces that recently became visible, the reprojection from one or more subpixel buffers may fail. In the worst case, only the new sample s_t[p] is available. In these cases, the shading is prone to aliasing as seen in Figure 9b.

To reduce these artifacts, for each reprojection that fails when computing f̃(p) in (20) (i.e., π_{t-k-1}(p + φ_{i(t)}) = ∅) we invoke the procedural shader at the appropriate quadrant offset p + φ_{i(t)}. Consequently, the shader may be evaluated from one to five times (Figure 9a) to improve rendering quality (Figure 9c). Fortunately these troublesome regions tend to be spatially contiguous and thus map well to the SIMD architecture of modern GPUs.
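A small Python sketch combining the irregular update sequence of Section 5.2 with the Eq. (21) rule; the table stub and the final clamp to 1 are our assumptions, not prescribed by the paper.

```python
K = 4
SEQUENCE = (0, 1, 2, 3, 0, 2, 1, 3, 1, 0, 3, 2, 1, 3, 0, 2,
            2, 3, 0, 1, 2, 0, 3, 1, 3, 2, 1, 0, 3, 1, 2, 0)

def update_index(t):
    """Irregular subpixel-buffer order that breaks drift coherence."""
    return SEQUENCE[t % len(SEQUENCE)]

alpha_tau_b = lambda v: 0.1   # stub for the precomputed table lookup

def alpha_t(N_prev, v):
    """Eq. (21); clamping to 1 is an assumption on our part."""
    return min(1.0, K * max(1.0 / (N_prev + 1.0), alpha_tau_b(v)))

print([update_index(t) for t in range(8)], alpha_t(10.0, (0.3, 0.2)))
```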
6 Accounting for signal changes

The preceding analysis and algorithm assume that the input signal S does not change over time. However, it is often the case that the surface shading does vary temporally due to, for example, light and view-dependent effects such as cast shadows and specular highlights.

Figure 10: Effect of the residual tolerance τ_ε on the teapot scene, which has bump-mapped specular highlights and moving lights: (a) no signal adaptation, 17.86dB; (b) small τ_ε, 25.90dB; (c) good τ_ε, 28.75dB; (d) large τ_ε, 22.41dB; (e) reference. Small values of τ_ε give too little weight to earlier estimates and lead to aliasing. Large values result in excessive temporal blurring when the shading function varies over time, as evident in this close-up view. The full-scene difference images are shown at the top.

In these cases it is possible to apply our supersampling technique to only the constant view- and light-independent layers and evaluate the remaining portions of the shading at the native screen resolution or with an alternative antialiasing technique. However, providing a unified framework for antialiasing time-varying surface effects is a worthy goal and we present a preliminary solution to this problem in this section. We describe how to compute a lower bound on α to avoid unwanted temporal blur in the case of shading changes.

6.1 Estimating the residual

Detecting temporal changes in the shading requires estimating the residual between our current shading estimate and its true value:

    ε_t[p] = (S ∗ G)_t(p) − f_t[p].    (22)

Since the correct value of (S ∗ G)_t is naturally unavailable, we would like to use the most current information s_t to estimate this residual ε̂_t. However, since we expect s_t to be aliased (otherwise we would not need supersampling), we must smooth this value in both space and time. This corresponds to our assumption that although S_t(p) may contain high-frequency spatial information, its partial derivative with respect to time ∂S_t(p)/∂t is smooth over the surface. In other words, we assume that temporal changes in the signal affect contiguous regions of the surface evenly. When this is not the case, our strategy for setting α will fail, as discussed in Section 8.

Let the buffers e_k store the differences between the recent samples s_{t-k}[p] and the values reconstructed from the previous contents of the subpixel buffers b_{i(t-k)}[p] over the last K frames:

    e_k[p] = s_{t-k}[p] − b_{i(t-k)}[p],   k ∈ {0, 1, 2, 3}.    (23)

We temporally smooth these values by retaining at each pixel the difference with the smallest magnitude

    e_smin[p] = e_j[p]   where   j = arg min_k |e_k[p]|,    (24)

and obtain our final estimate of the residual by spatially smoothing these using a box filter B_r of radius r = 3:

    ε̂_t[p] = (B_3 ∗ e_smin)[p].    (25)

Note that this approach risks underestimating the residual. In other words, when presented with the choice between aliasing or a slower response to signal changes, our policy is to choose the latter.

6.2 Limiting the residual

Similar to our approach for limiting the degree of spatial blur, we would like to establish a lower bound on α such that the residual ε̂_{t+1} in the next frame remains within a threshold τ_ε. The choice of τ_ε controls the tradeoff between the degree of antialiasing and the responsiveness of the system to temporal changes in the shading.

Following the derivation in the appendix, our strategy to adapt to temporal changes is to replace (21) with

    α_t[p] ← K max( 1/(N_{t-1} + 1), α_{τ_b}(v), α_{τ_ε} ),    (26)

where

    α_{τ_ε} = 1 − τ_ε / ε̂_t[p].    (27)

At pixels where the shading is constant, ε̂_t is less than τ_ε and α_t progresses according to the previous rules. When the residual increases, the value of α also increases, shrinking the temporal window over which samples are aggregated and producing a more accurate estimate of the shading. Figure 10 illustrates this tradeoff between aliasing and temporal lag.

The selection of τ_ε is closely related to the characteristics of the shading signal. In our experiments, we simply select the τ_ε that achieves the best PSNR. Alternatively, other numerical or visual metrics can be used to limit both temporal lag and aliasing to acceptable amounts.
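A minimal Python sketch of Eqs. (23)-(25) and (27), assuming the K difference buffers are stacked in one array and using scipy's uniform_filter as the box filter B_3; all names are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def residual_estimate(e, r=3):
    """Eqs. (24)-(25): per-pixel min-magnitude difference, box-filtered.
    `e` has shape (K, H, W), one difference image per subpixel buffer."""
    idx = np.abs(e).argmin(axis=0)                       # Eq. (24)
    e_min = np.take_along_axis(e, idx[None], axis=0)[0]
    return uniform_filter(e_min, size=2 * r + 1)         # Eq. (25)

def alpha_tau_eps(eps_hat, tau_eps):
    """Eq. (27): lower bound on alpha keeping the next residual in budget."""
    return np.clip(1.0 - tau_eps / np.maximum(np.abs(eps_hat), 1e-8),
                   0.0, 1.0)
```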
7 Results

Scenes  We tested our algorithm using several expensive procedural shaders with high-frequency spatial details that are prone to aliasing. The brick scene combines a random-color brick pattern with noisy mortar and pits. Bump mapping is used for the lighting. The horse scene includes an animated wooden horse galloping over a marble checkered floor. The teapot consists of a procedural Voronoi cell pattern modulating both the color and the height field. The added rough detail is bump-mapped with specular highlights.

In addition to the basic scenes above, we show results for an indoor scene that has a higher variety of shaders, dynamic elements, and more geometric complexity. The scene consists of several procedurally shaded objects that altogether have over 100,000 triangles.
Figure 11: PSNR comparison of our approach with traditional supersampling and jittered reprojection for the brick scene using real-time animation sequences exhibiting different types of motion (panels: panning, rotation, minification/magnification; curves: ours, jittered reprojection, 2×2/3×3/4×4 stratified supersampling, and no antialiasing, plotted over animation time in frames). The red line indicates the frame used in Figure 14.
Figure 12: Additional PSNR comparisons of our approach with traditional supersampling and jittered reprojection, plotted over animation time in frames for the horse, teapot, and indoor scenes. The horse is animated, the teapot is dynamically lit, and the indoor scene is animated and dynamically lit. The red lines indicate the frames used in Figures 10 and 14.

These objects include bump-mapped brick walls, a shiny colored bumpy sphere, a glistening uneven marble floor, a reflective stone-bricked teapot, a fine-carved wooden box, and a rusty metal exotic creature. The animated scene from the accompanying video combines several types of fast animation and camera motion, exhibiting similar complexity to what one would expect in a typical game scene. It also includes a rotating masked gobo light, which sheds a procedurally generated pattern onto the scene objects. The fast moving light and the significant disocclusion caused by the fast motion and complex geometry details make the scene extremely difficult to handle with traditional reprojection techniques. However, with multiple subpixel buffers and adaptive evaluation, our method produces satisfying results.

Memory usage  The depth values required by reprojection are stored in the alpha channel of each subpixel buffer, which are 16-bit RGBA. We store the variance reduction factors N_t[p] in an 8-bit auxiliary buffer. For the teapot and indoor scenes, residuals for the four subpixel buffers are stored in the four channels of one additional 8-bit RGBA buffer. A 1024×768 backbuffer consumes about 3MB, and our scheme uses an additional 27MB to supersample all shaders in a scene.

Comparisons  All results are generated using an Intel Core2 2.13GHz PC with 2GB RAM and an AMD HD4870 graphics board at a screen resolution of 1024×768. We measure image quality using the peak signal-to-noise ratio (PSNR) with respect to a ground truth image generated with 256 samples/pixel weighted as in Section 4.1. We show comparisons between conventional rendering (no antialiasing), our algorithm, and traditional 2×2, 3×3 and 4×4 stratified supersampling (performed on the GPU by rendering to a larger framebuffer and then downsampling). In addition, we compare to the basic reprojection method [Nehab et al. 2007; Scherzer et al. 2007] that uses uniform jittered sampling, a single cached buffer, and a value of α chosen to maximize PSNR.

Figure 14 compares our algorithm to these alternatives. Note that standard rasterization is very efficient, but produces significant aliasing, especially under motion (see accompanying video). Jittered reprojection is also faster than our technique, but has inferior quality because it lacks high-resolution estimates and does not adapt α to limit blur. Our technique is also superior to traditional 2×2 and 3×3 stratified supersampling in terms of both rendering speed and quality. Finally, note that our technique gives higher PSNR when compared to 4×4 stratified supersampling in the majority of cases. The teapot from Figure 10 is the most challenging scene due to the fact that it contains a procedural shader with a high-frequency yet fast changing dynamic lighting component. As a result, our method limits the effective number of samples it aggregates to avoid temporal blurring, and this reduces the degree of antialiasing. For the indoor scene in motion, we chose not to antialias the gobo lighting computation to avoid excessive blurring of the pattern (see Section 8). Note, however, that the moving specular highlights over the shiny objects (such as the teapot scene, and the sphere, floor, and teapot in the indoor scene) at different scales are still properly antialiased without introducing noticeable temporal blur. This demonstrates that our technique can handle signals that change with moderate speed and in a spatially correlated manner. The indoor scene also shows the ability of our approach to preserve antialiased details in the presence of complex and very dynamic changes in occlusion. Overall, note that our algorithm is significantly faster than 4×4 supersampling on all scenes (about 5-10× faster depending on the relative amount of vertex and pixel processing).

Figures 11 and 12 graph the rendering quality (in PSNR) of our scenes for each of the techniques using different animation sequences. The red vertical lines denote the images shown in Figures 10 and 14. For the brick scene, Figure 11 demonstrates the superior quality of our technique under different types of motion: panning, rotation, and repeated magnification and minification. The small oscillations in PSNR do not visibly affect rendering quality, as can be verified in the accompanying video. Figure 12 shows that similar results are achieved for the other three test scenes, which include various different types of motion. Again, the accompanying video shows animated versions of these results.

8 Limitations and future work

As with previous reprojection methods, our supersampling algorithm results in an increase in the amount of vertex processing and raster interpolation, as well as pixel shader computation. However, this cost is negligible if the bottleneck lies in the pixel shader, which is increasingly the case.

The work of Méndez-Feliu et al. [2006] makes the interesting point that due to differences in the portion of the scene that is seen below each pixel in adjacent frames, a technique based on multiple importance sampling is required to obtain an unbiased estimator for reprojected color. However, they note that the difference is negligible when reused frames are close to each other, as in our case.

Figure 13: Comparison between performing the gobo computation (left) within our amortized framework, or (right) separately at the native resolution. Note that it is necessary to defer computation of such signals that move rapidly across the object surface.
The main limitation of our technique is that it cannot properly detect and antialias arbitrary temporal changes in the shading, such as very fast moving sharp specular highlights and shadow boundaries, as well as parallax occlusion mapping. These types of effects cannot be accurately predicted by our reprojection framework because they involve a signal that rapidly moves relative to the object surface. However, we have provided numerous examples of our technique applied to the constant components in complex real-world shaders (i.e., those that are independent of the view and light directions) as well as slowly varying signals, such as the lighting in the teapot and indoor scenes. Note, however, that any strongly dynamic "lighting" effects can be evaluated at the native screen resolution independent of our technique. In the indoor scene, for example, the fast moving gobo light computation was deferred to avoid being severely blurred. Figure 13 shows a side-by-side comparison between amortized and native-resolution gobo lighting, clearly demonstrating that temporally varying effects that result in such extremely fast changes in surface color cannot be properly handled by our framework.

Extending our technique to handle a broader range of temporal effects is a clear direction of future work. We would also like to investigate extending our algorithm to allow supersampling geometry silhouettes in addition to the surface shading, and to allow the efficient rendering of motion blur. Finally, we believe it would be possible to modify the number of subpixel buffers over time in order to achieve a target render quality or framerate.
9 Conclusion

Amortized supersampling is a practical scheme for antialiasing procedural shaders. A key challenge this paper addresses is to characterize the spatial blur introduced by reprojection schemes that involve repeatedly reconstructing intermediate values in a discrete framebuffer. Based on this analysis, we introduced a principled approach for setting the smoothing factor in a recursive temporal filter and demonstrated how the shading can be maintained at a higher spatial resolution while still requiring roughly one shader invocation per pixel. We also introduced a method for handling temporally varying shaders that is appropriate when the temporal derivative has low spatio-temporal frequencies. We demonstrated the efficacy of our approach compared to fixed stratified supersampling and naive extensions of prior reprojection techniques.

References

ADELSON, S. J., AND HODGES, L. F. 1995. Generating exact ray-traced animation frames by reprojection. IEEE Computer Graphics and Applications 15(3), 43-52.

APODACA, A., AND GRITZ, L. 2000. Advanced RenderMan: Creating CGI for Motion Pictures. Morgan Kaufmann.

BADT, S. 1988. Two algorithms for taking advantage of temporal coherence in ray tracing. The Visual Computer 4(3), 123-132.

BALA, K., DORSEY, J., AND TELLER, S. 1999. Radiance interpolants for accelerated bounded-error ray tracing. ACM Transactions on Graphics 18(3), 213-256.

BISHOP, G., FUCHS, H., MCMILLAN, L., AND ZAGIER, E. J. S. 1994. Frameless rendering: Double buffering considered harmful. In Proceedings of ACM SIGGRAPH 94, 175-176.

BROOKS VAN HORN III, R., AND TURK, G. 2008. Antialiasing procedural shaders with reduction maps. IEEE Transactions on Visualization and Computer Graphics 14(3), 539-550.

CHEN, S. E., AND WILLIAMS, L. 1993. View interpolation for image synthesis. In Proceedings of ACM SIGGRAPH 93, 279-288.

COOK, R. L., CARPENTER, L., AND CATMULL, E. 1987. The REYES image rendering architecture. In Computer Graphics (Proceedings of ACM SIGGRAPH 87), vol. 21, 95-102.

DAYAL, A., WOOLLEY, C., WATSON, B., AND LUEBKE, D. 2005. Adaptive frameless rendering. In Eurographics Symposium on Rendering, 265-275.

EBERT, D. S., MUSGRAVE, F. K., PEACHEY, D., PERLIN, K., AND WORLEY, S. 2003. Texturing and Modeling: A Procedural Approach, 3rd ed. Morgan Kaufmann.

ERNST, M., STAMMINGER, M., AND GREINER, G. 2006. Filter importance sampling. In IEEE Symposium on Interactive Ray Tracing, 125-132.

GAUTRON, P., KŘIVÁNEK, J., BOUATOUCH, K., AND PATTANAIK, S. 2005. Radiance cache splatting: A GPU-friendly global illumination algorithm. In Eurographics Symposium on Rendering, 55-64.

GUENTER, B. 1994. Motion compensated noise reduction. Technical Report MSR-TR-94-05, Microsoft Research.

HART, J., CARR, N., KAMEYA, M., TIBBITTS, S., AND COLEMAN, T. 1999. Antialiased parameterized solid texturing simplified for consumer-level hardware implementation. In Graphics Hardware, 45-53.

HASSELGREN, J., AND AKENINE-MOLLER, T. 2006. An efficient multi-view rasterization architecture. In Eurographics Symposium on Rendering, 61-72.

HAVRAN, V., DAMEZ, C., MYSZKOWSKI, K., AND SEIDEL, H.-P. 2003. An efficient spatio-temporal architecture for animation rendering. In Eurographics Symposium on Rendering, 106-117.

HEIDRICH, W., SLUSALLEK, P., AND SEIDEL, H.-P. 1998. Sampling procedural shaders using affine arithmetic. ACM Transactions on Graphics 17(3), 158-176.

HOGG, R., AND TANIS, E. 2001. Probability and Statistical Inference, 6th ed. Prentice Hall.

JONES, T. R., PERRY, R. N., AND CALLAHAN, M. 2000. Shadermaps: A method for accelerating procedural shading. Technical report, Mitsubishi Electric Research Laboratories.

MARK, W. R., MCMILLAN, L., AND BISHOP, G. 1997. Post-rendering 3D warping. In Proceedings of the Symposium on Interactive 3D Graphics, 7-12.

MÉNDEZ-FELIU, À., SBERT, M., AND SZIRMAY-KALOS, L. 2006. Reusing frames in camera animation. Journal of WSCG 14.

MITCHELL, D. P., AND NETRAVALI, A. N. 1988. Reconstruction filters in computer graphics. In Computer Graphics (Proceedings of ACM SIGGRAPH 88), vol. 22, 221-228.

NEHAB, D., SANDER, P. V., LAWRENCE, J., TATARCHUK, N., AND ISIDORO, J. R. 2007. Accelerating real-time shading with reverse reprojection caching. In Graphics Hardware, 25-35.

NORTON, A., ROCKWOOD, A. P., AND SKOLMOSKI, P. T. 1982. Clamping: a method of antialiasing textured surfaces by bandwidth limiting in object space. In Computer Graphics (Proceedings of ACM SIGGRAPH 82), vol. 16, 1-8.

ROBERT, C. P., AND CASELLA, G. 2004. Monte Carlo Statistical Methods, 2nd ed. Springer.

SCHERZER, D., JESCHKE, S., AND WIMMER, M. 2007. Pixel-correct shadow maps with temporal reprojection and shadow test confidence. In Eurographics Symposium on Rendering, 45-50.

SCHERZER, D., AND WIMMER, M. 2008. Frame sequential interpolation for discrete level-of-detail rendering. Computer Graphics Forum (Proceedings of Eurographics Symposium on Rendering 2008) 27(4), 1175-1181.

SHINYA, M. 1993. Spatial anti-aliasing for animation sequences with spatio-temporal filtering. In Proceedings of ACM SIGGRAPH 93, 289-296.

SIMMONS, M., AND SÉQUIN, C. H. 2000. Tapestry: dynamic mesh-based display representation for interactive rendering. In Eurographics Workshop on Rendering, 329-340.

SITTHI-AMORN, P., LAWRENCE, J., YANG, L., SANDER, P., NEHAB, D., AND XI, J. 2008. Automated reprojection-based pixel shader optimization. ACM Transactions on Graphics 27(5), 127.

STAMMINGER, M., HABER, J., SCHIRMACHER, H., AND SEIDEL, H.-P. 2000. Walkthroughs with corrective texturing. In Eurographics Workshop on Rendering, 377-388.

TAWARA, T., MYSZKOWSKI, K., AND SEIDEL, H.-P. 2004. Exploiting temporal coherence in final gathering for dynamic scenes. In Proceedings of Computer Graphics International, 110-119.

WALTER, B., DRETTAKIS, G., AND GREENBERG, D. P. 2002. Enhancing and optimizing the render cache. In Eurographics Workshop on Rendering, 37-42.

WALTER, B., DRETTAKIS, G., AND PARKER, S. 1999. Interactive rendering using the render cache. In Eurographics Workshop on Rendering, 235-246.

WARD, G., AND SIMMONS, M. 1999. The holodeck ray cache: an interactive rendering system for global illumination in nondiffuse environments. ACM Transactions on Graphics 18(4), 361-368.

WOOLLEY, C., LUEBKE, D., WATSON, B., AND DAYAL, A. 2003. Interruptible rendering. In Proceedings of the Symposium on Interactive 3D Graphics, 143-151.

ZHU, T., WANG, R., AND LUEBKE, D. 2005. A GPU accelerated render cache. In Pacific Graphics.
Appendix: Detailed derivations

Derivation of (8)  According to our definition, N_t is the variance reduction factor that appears in (3), or intuitively the effective number of samples that have so far contributed to the estimate. Omitting the dependency on p for simplicity:

    1/N_t = Var(f_t)/Var(f_1) = Var( α_t S + (1 − α_t) f_{t-1} ) / Var(f_1)    (28)
          = α_t² Var(f_1)/Var(f_1) + (1 − α_t)² Var(f_{t-1})/Var(f_1)    (29)
          = α_t² + (1 − α_t)² / N_{t-1}.    (30)

Taking the reciprocal of both sides results in (8).

Derivation of (10)  We choose to characterize the blur at each pixel by the covariance matrix of the collection of weighted samples that contribute to it:

    V[p] = [ Var_x(δ_t[p])    Cov_xy(δ_t[p]) ;
             Cov_yx(δ_t[p])   Var_y(δ_t[p]) ],    (31)

where δ_t[p] = {δ_{t,i}[p]}_{i=1}^{n(t,p)}. The two eigenvalues λ_1 and λ_2 of V[p] correspond to the variances along the directions of minimum and maximum variance. We use their average as our estimate for the degree of blur:

    σ_t²[p] = (λ_1 + λ_2)/2 = ½ Tr V[p]    (32)
            = ½ Var_x(δ_t[p]) + ½ Var_y(δ_t[p]).    (10)

Derivation of (11)  We first perform the derivation in 1D and then extend it to the 2D case. At time t, the value of a pixel is given by the weighted sum of a set of samples. We can represent each sample in the set by a pair carrying its weight and displacement:

    F_t = { (ω_{t,i}, δ_{t,i}) }_{i=1}^{n(t)}.    (33)

Here, the displacements δ_{t,i} are relative to the pixel center and the weights ω_{t,i} sum to one. To simplify notation let us define two operations:

    a·F_t = { (a ω_{t,i}, δ_{t,i}) }_{i=1}^{n(t)},   F_t + b = { (ω_{t,i}, δ_{t,i} + b) }_{i=1}^{n(t)},    (34)
𝑎⋅𝐹𝑡 = (𝑎 𝜔𝑡,𝑖 , 𝛿𝑡,𝑖 ) 𝑖=1 , 𝐹𝑡 +𝑏 = (𝜔𝑡,𝑖 , 𝛿𝑡,𝑖 +𝑏) 𝑖=1 , (34)
which scale all of the weights or translate all of the displacements Beyond linear resampling The analysis leading to (44–45)
by an equal amount, respectively. If we ignore the effect of jittering applies without modification to the more sophisticated resam-
and assume constant panning motion, then every pixel will contain pling scheme of Section 5.1, if we restrict ourselves to a simple
the same set 𝐹𝑡 . Furthermore, using (4) and assuming linear resam- round-robin update sequence in one dimension (with two subpixel
pling, 𝐹𝑡 can be written in terms of 𝐹𝑡-1 at neighboring pixels: buffers). The weights 𝑎, 𝑐 and offsets 𝑏, 𝑑 then become:
( )
𝐹𝑡 = 𝛼 ⋅ {(1, 0)} ∪ (1-𝛼) ⋅ 𝑎 ⋅ (𝐹𝑡-1 + 𝑏) ∪ (𝑐 ⋅ 𝐹𝑡-1 + 𝑑) , (35) 𝑏 = frac(2𝑣 + 1/2) − 1/2 𝑑 = frac(𝑣) − 1/2 (48)
where 𝑎=𝑎 ¯/(¯
𝑎 + 𝑐¯) 𝑐 = 𝑐¯/(¯
𝑎 + 𝑐¯), where (49)
¯ = max(0, 1 − 2∣𝑏∣)
𝑎 𝑐¯ = max(0, 1 − 2∣𝑑∣), (50)
𝑎 = 1 − 𝑣, 𝑏 = −𝑣, 𝑐 = 𝑣, 𝑑 = 1 − 𝑣. (36)
with frac(𝑥) = 𝑥 − ⌊𝑥⌋. Substituting (48–49) into (44–45) intro-
Note that in (35) all samples are of the form duces three complications not previously present in (46–47): (i) the
fact that (45) is a cubic makes it inconvenient to obtain an explicit
𝛼 (1-𝛼)𝑗+𝑘 𝑎𝑗 𝑐𝑘 , 𝑗 𝑏 + 𝑘 𝑑 .
( )
(37) solution for 𝛼 given a bound 𝜏𝑏 on the variance of the sample dis-
(𝑗+𝑘) tribution; (ii) since 𝑎𝑏 + 𝑐𝑑 is generally nonzero, the mean of the
Furthermore, for large enough 𝑡, each one appears 𝑗 times. We sample distribution is also nonzero (and this is the source of the
can now rephrase the problem using probability theory and take drift in Figure 6); (iii) the weights 𝑎, 𝑐 are no longer separable, so
advantage of several mathematical tools [Hogg and Tanis 2001]. the extension to 2D would result in expressions for the 𝑥 and 𝑦 com-
First, let 𝑋𝑡 be a random variable that takes on values in the set ponents of the moments that depend on both components 𝑣𝑥 and 𝑣𝑦
𝑛(𝑡)
{𝛿𝑡,𝑖 }𝑖=1 , with a probability mass function (p.m.f.) proportional to of the velocity vector.
the corresponding weights 𝜔𝑡,𝑖 . Our goal is to compute Var(𝑋),
where 𝑋 = lim𝑡→∞ 𝑋𝑡 . This can be accomplished by computing In addition, our use of an irregular update sequence (Section 5.2)
the first and second moments of 𝑋. The p.m.f. is given by invalidates the simple recurrence in (35) altogether. Therefore as
explained in the text, we instead precompute a numerical table in
𝑃 (𝑋 = 𝑗 𝑏 + 𝑘 𝑑) = 𝑗+𝑘 𝛼(1-𝛼)𝑗+𝑘 𝑎𝑗 𝑐𝑘 .
( )
𝑗
(38) an off-line preprocess.

Let 𝑀𝑋 (𝑢) denote the moment-generating function of 𝑋: Derivations of (18) and (19) For the sums in (16)[ and (18) to]
be equal, the weights associated to each value 𝑏𝑖(𝑡-𝑘-1) ⌊𝑝𝑘 ⌋ + Δ
𝑀𝑋 (𝑢) = 𝐸 lim 𝑒𝑢𝑋𝑡
( )
(39)
𝑡→∞ must be the same. This leads to a system of equations for each 𝑜𝑘 :
∞ ∑
∑ ∞
(1 − 𝑜𝑘𝑥 )(1 − 𝑜𝑘𝑦 ) = 𝑤𝑘,(0) /𝑤𝑘 = 𝛽𝑘,0 ⋅ 𝛾𝑘,0
(𝑗+𝑘)
𝛼(1-𝛼)𝑗+𝑘 𝑎𝑗 𝑐𝑘 𝑒𝑢(𝑗𝑏+𝑘𝑑)

= 𝑗
(40) 
 0
𝑗=0 𝑘=0

(1 − 𝑜𝑘𝑥 ) 𝑜𝑘𝑦 = 𝑤𝑘,(0) /𝑤𝑘 = 𝛽𝑘,0 ⋅ 𝛾𝑘,1



𝑞
∞ ∑ 1
∑ (𝑞) . (51)
= 𝑗
𝛼(1-𝛼)𝑞 𝑎𝑗 𝑐𝑞-𝑗 𝑒𝑢(𝑗𝑏+(𝑞-𝑗)𝑑) (41) 
 𝑜𝑘𝑥 (1 − 𝑜𝑘𝑦 ) = 𝑤𝑘,(1) /𝑤𝑘 = 𝛽𝑘,1 ⋅ 𝛾𝑘,0
 0
𝑞=0 𝑗=0 


⎩ 𝑜𝑘𝑥 𝑜𝑘𝑦 = 𝑤𝑘,(1) /𝑤𝑘 = 𝛽𝑘,1 ⋅ 𝛾𝑘,1
∑ )𝑞 1
(1-𝛼)(𝑎 𝑒𝑢𝑏 + 𝑐 𝑒𝑢𝑑 )
(
=𝛼 (42)
𝑞=0 Note that both sides add up to one, and recall that the tent fil-
𝛼 ters are axis-aligned and separable, which allows us to factor the
= . (43) weights 𝑤𝑘,Δ into products 𝛽 ⋅ 𝛾 as shown above. Therefore there
1 − (1-𝛼)(𝑎 𝑒𝑢𝑏 + 𝑐 𝑒𝑢𝑑 )
exists a unique solution to each of these systems, given by (19).
This function can be used to compute the desired moments:
Derivation of (27) The residual in the next frame is equal to
′ (𝑎𝑏 + 𝑐𝑑)𝛼(1-𝛼)
𝜇𝑋 = 𝐸(𝑋) = 𝑀𝑋 (0) = ( )2 (44)
1 − (1-𝛼)(𝑎 + 𝑐) 𝜖ˆ𝑡+1 [𝑝] ≈ 𝑠𝑡+1 [𝑝] − 𝑓𝑡+1 [𝑝] (52)
2 ′′
𝜇2𝑋 𝜇2𝑋
( )
𝜎𝑋 + = Var(𝑋) + = 𝑀𝑋 (0) = 𝑠𝑡+1 [𝑝] − (𝛼)𝑠𝑡+1 [𝑝] + (1 − 𝛼)𝑓𝑡 [𝑝] (53)
2 2 2 2
( )
(𝑎𝑏 + 𝑐𝑑 )𝛼(1-𝛼) 2(𝑎𝑏 + 𝑐𝑑) 𝛼(1-𝛼) ≈ 𝑠𝑡+1 [𝑝] − (𝛼)𝑠𝑡+1 [𝑝] − (1 − 𝛼) 𝑠𝑡 [𝑝] − 𝜖ˆ𝑡 [𝑝] (54)
= ( )2 + ( )3 . (45) ( )
1 − (1-𝛼)(𝑎 + 𝑐) 1 − (1-𝛼)(𝑎 + 𝑐) = (1 − 𝛼) 𝑠𝑡+1 [𝑝] − 𝑠𝑡 [𝑝] + (1 − 𝛼)ˆ 𝜖𝑡 [𝑝]. (55)
The introduction of sample jittering simply adds a Gaussian random Here we do not attempt to predict the additional residual introduced
variable 𝐺 to 𝑋. Since Var(𝑋 + 𝐺) = Var(𝑋) + Var(𝐺), we due to future signal changes, so we set 𝑠𝑡+1 [𝑝] = 𝑠𝑡 [𝑝] in (55). This
can substitute (36) into (44–45) to obtain leads to the relation
𝜇𝑣 = 0 (46) 𝜖ˆ𝑡+1 [𝑝] ≈ (1 − 𝛼)ˆ
𝜖𝑡 [𝑝]. (56)
1−𝛼
𝜎𝑣2 = 𝜎𝐺
2
+ 𝑣 (1 − 𝑣). (47)
𝛼 Requiring ∣ˆ
𝜖𝑡+1 ∣ to be smaller than 𝜏𝜖 , we reach
Extending this result to 2D requires two modifications. First, the 𝜏𝜖
fractional velocity 𝑣 and offsets 𝛿𝑡,𝑖 become 2D vectors. Second, 𝛼 > 𝛼𝜏𝜖 = 1 − . (27)
(35) now contains four terms with 𝐹𝑡-1 , corresponding to the four 𝜖ˆ𝑡 [𝑝]
pixels involved in bilinear resampling. Because the bilinear weights
are separable in 𝑣𝑥 and 𝑣𝑦 , we can consider the 𝑥 and 𝑦 components
of these moments separately, and the respective sample sets reduce
to the 1D case in (35). Therefore, we can use (47) and (10) to
reach (11).
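The 1D result (47) can also be checked numerically by iterating the recurrence (35)-(36) directly; the following Python sketch does so (jitter omitted), and is entirely our own verification code.

```python
from collections import defaultdict

def stationary_variance(v, alpha, steps=500):
    """Iterate Eq. (35) with the weights of Eq. (36) and return the
    weighted variance of the sample displacements."""
    a, b, c, d = 1 - v, -v, v, 1 - v            # Eq. (36)
    F = {0.0: 1.0}                               # weight 1 at pixel center
    for _ in range(steps):
        G = defaultdict(float)
        G[0.0] += alpha                          # new sample, weight alpha
        for delta, w in F.items():
            G[round(delta + b, 9)] += (1 - alpha) * a * w
            G[round(delta + d, 9)] += (1 - alpha) * c * w
        F = {k: w for k, w in G.items() if w > 1e-12}  # prune tiny weights
    mean = sum(w * k for k, w in F.items())      # ~0, per Eq. (46)
    return sum(w * (k - mean) ** 2 for k, w in F.items())

v, alpha = 0.3, 0.2
print(stationary_variance(v, alpha), (1 - alpha) / alpha * v * (1 - v))
```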
Figure 14: Comparison between our approach, no antialiasing, stratified supersampling, and jittered reprojection (frame rate, PSNR):

Horse scene:  No AA 140fps/15.68dB · 2×2 SS 37fps/22.22dB · 3×3 SS 19fps/24.62dB · 4×4 SS 11fps/25.50dB · Reproj mov 88fps/22.72dB · Reproj still 88fps/26.30dB · Ours mov 64fps/30.54dB · Ours still 64fps/40.04dB · Reference
Brick scene:  No AA 166fps/21.82dB · 2×2 SS 35fps/26.54dB · 3×3 SS 17fps/28.71dB · 4×4 SS 9.8fps/29.72dB · Reproj mov 113fps/25.52dB · Reproj still 113fps/28.70dB · Ours mov 84fps/31.96dB · Ours still 84fps/35.11dB · Reference
Indoor scene: No AA 112fps/24.79dB · 2×2 SS 35fps/30.33dB · 3×3 SS 17fps/32.38dB · 4×4 SS 10fps/33.26dB · Reproj mov 92fps/27.27dB · Reproj still 92fps/31.24dB · Ours mov 52fps/33.93dB · Ours still 52fps/38.37dB · Reference
