Bilagrid Web
Figure 1: The bilateral grid enables edge-aware image manipulations such as local tone mapping on high resolution images in real time.
This 15 megapixel HDR panorama was tone mapped and locally refined using an edge-aware brush at 50 Hz. The inset shows the original
input. The process used about 1 MB of texture memory.
To appear in the ACM SIGGRAPH conference proceedings
…applications [Bennett and McMillan 2005; Xiao et al. 2006; Bae et al. 2006; Winnemöller et al. 2006]. A common drawback of nonlinear filters is speed: a direct implementation of the bilateral filter can take several minutes for a one megapixel image. However, recent work has demonstrated acceleration and obtained good performance on CPUs, on the order of one second per megapixel [Durand and Dorsey 2002; Pham and van Vliet 2005; Paris and Durand 2006; Weiss 2006]. Nevertheless, these approaches still do not achieve real-time performance on high-definition content.

In this work, we dramatically accelerate and generalize the bilateral filter, enabling a variety of edge-aware image processing applications in real time on high-resolution inputs. Building upon the technique by Paris and Durand [2006], who use linear filtering in a higher-dimensional space to achieve fast bilateral filtering, we extend their high-dimensional approach and introduce the bilateral grid, a new compact data structure that enables a number of edge-aware manipulations. We parallelize these operations using modern graphics hardware to achieve real-time performance at HD resolutions. In particular, our GPU bilateral filter is two orders of magnitude faster than the equivalent CPU implementation.

This paper makes the following contributions:

• The bilateral grid, a new data structure that naturally enables edge-aware manipulation of images.

• Real-time bilateral filtering on the GPU, which is two orders of magnitude faster than previous CPU techniques, enabling real-time processing of HD content.

• Edge-aware algorithms. We introduce a number of real-time, edge-aware algorithms including edge-preserving painting, scattered data interpolation, and local histogram equalization.

Our approach is implemented as a flexible library which is distributed in open source to facilitate research in this domain. The code is available at http://groups.csail.mit.edu/graphics/bilagrid/. We believe that the bilateral grid data structure is general and future research will develop many new applications.

1.1 Related Work

Bilateral Filter The bilateral filter is a nonlinear process that smoothes images while preserving their edges [Aurich and Weule 1995; Smith and Brady 1997; Tomasi and Manduchi 1998]. For an image I, at position p, it is defined by:

    bf(I)p = (1/Wp) Σq∈N(p) Gσs(||p − q||) Gσr(|Ip − Iq|) Iq        (1a)

    Wp = Σq∈N(p) Gσs(||p − q||) Gσr(|Ip − Iq|)        (1b)

The output is a simple weighted average over a neighborhood where the weight is the product of a Gaussian on the spatial distance (Gσs) and a Gaussian on the pixel value difference (Gσr), also called the range weight. The range weight prevents pixels on one side of a strong edge from influencing pixels on the other side since they have different values. This also makes it easy to take into account edges over a multi-channel image (such as RGB) since the Gaussian on the pixel value difference can have an arbitrary dimension.
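To make Equation 1 concrete, here is a brute-force NumPy sketch of the filter; the function name, the truncation of the neighborhood N(p) to a 3σs window, and the grayscale assumption are our own choices rather than anything prescribed by the paper.

    import numpy as np

    def bilateral_filter_bruteforce(I, sigma_s, sigma_r):
        """Brute-force bilateral filter (Equation 1) for a grayscale image I in [0, 1]."""
        H, W = I.shape
        r = int(np.ceil(3 * sigma_s))            # truncate the spatial neighborhood N(p)
        out = np.zeros_like(I)
        for y in range(H):
            for x in range(W):
                y0, y1 = max(0, y - r), min(H, y + r + 1)
                x0, x1 = max(0, x - r), min(W, x + r + 1)
                patch = I[y0:y1, x0:x1]
                yy, xx = np.mgrid[y0:y1, x0:x1]
                G_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
                G_r = np.exp(-((patch - I[y, x]) ** 2) / (2 * sigma_r ** 2))
                w = G_s * G_r                    # product of spatial and range weights
                out[y, x] = np.sum(w * patch) / np.sum(w)   # normalize by W_p
        return out

Even for a one megapixel image, this direct evaluation is far too slow for interactive use, which motivates the grid-based reformulation developed below.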
Fast numerical schemes for approximating the bilateral filter are able to process large images in less than a second. Pham and van Vliet [2005] describe a separable approximation that works well for the small kernels used in denoising but suffers from artifacts with the larger kernels used in other applications. Weiss [2006] maintains local image histograms, which unfortunately limits this approach to box spatial kernels instead of the smooth Gaussian kernel.

Paris and Durand [2006] extend the fast bilateral filter introduced by Durand and Dorsey [2002] and recast the computation as a higher-dimensional linear convolution followed by trilinear interpolation and a division. We generalize the ideas behind their higher-dimensional space into a data structure, the bilateral grid, that enables a number of operations, including the bilateral filter, edge-aware painting and interpolation, and local histogram equalization.

Other edge-preserving techniques include anisotropic diffusion [Perona and Malik 1990; Tumblin and Turk 1999], which is related to the bilateral filter [Durand and Dorsey 2002; Barash 2002]. Optimization [Levin et al. 2004; Lischinski et al. 2006] can also be used to interpolate values while respecting strong edges. Our work introduces an alternative to these approaches.

High-Dimensional Image Representation The interpretation of 2D images as higher-dimensional structures has received much attention. Sochen et al. [1998] describe images as 2D manifolds embedded in a higher-dimensional space. For instance, a color image is embedded in a 5D space: 2D for x and y plus 3D for color. This enables an interpretation of PDE-based image processes in terms of geometric properties such as curvature. Our bilateral grid shares the use of higher-dimensional spaces with this approach. It is nonetheless significantly different since we consider values over the entire space and not only on the manifold. Furthermore, we store homogeneous values, which allows us to handle weighted functions and the normalization of linear filters. Our approach is largely inspired by signal processing whereas Sochen et al. follow a differential geometry perspective.

Felsberg et al. [2006] describe an edge-preserving filter based on a stack of channels that can be seen as a volumetric structure similar in spirit to the bilateral grid. Each channel stores spline coefficients representing the encoded image. Edge-preserving filtering is achieved by smoothing the spline coefficients within each channel and then reconstructing the splines from the filtered coefficients. In comparison, the bilateral grid does not encode the data into splines and allows direct access. This enables faster computation; in particular, it makes it easier to leverage the computational power of modern parallel architectures.

2 Bilateral Grid

The effectiveness of the bilateral filter in respecting strong edges comes from the inclusion of the range term Gσr(|Ip − Iq|) in the weighted combination of Equation 1: although two pixels across an edge are close in the spatial dimension, from the filter's perspective, they are distant because their values differ widely in the range dimension. We turn this principle into a data structure, the bilateral grid, which is a 3D array that combines the two-dimensional spatial domain with a one-dimensional range dimension, typically the image intensity. In this three-dimensional space, the extra dimension makes the Euclidean distance meaningful for edge-aware image manipulation.

The bilateral grid first appeared as an auxiliary data structure in Paris and Durand's fast bilateral filter [2006]. In that work, it was used as an algebraic re-expression of the original bilateral filter equation. Our perspective is different in that we view the bilateral grid as a primary data structure that enables a variety of edge-aware operations in real time.
2.1 A Simple Illustration

Before formally defining the bilateral grid, we first illustrate the concept with the simple example of an edge-aware brush. The edge-aware brush is similar to brushes offered by traditional editing packages except that when the user clicks near an edge, the brush does not paint across the edge.

When the user clicks on an image E at a location (xu, yu), the edge-aware brush paints directly in the three-dimensional bilateral grid at position (xu, yu, E(xu, yu)). The spatial coordinates are determined by the click location, while the range coordinate is the intensity of the clicked pixel. The edge-aware brush has a smooth falloff in all three dimensions. The falloff along the two spatial dimensions is the same as in classical 2D brushes, but the range falloff is specific to our brush and ensures that only a limited interval of intensity values is affected. To retrieve the painted value V in image space at location (x, y), we read the value of the bilateral grid at position (x, y, E(x, y)). We use the same terminology as Paris and Durand [2006] and call this operation slicing. In regions where the image E is nearly constant, the edge-aware brush behaves like a classical brush with a smooth spatial falloff. Since the range variations are small, the range falloff has little influence. At edges, E is discontinuous and if only one side has been painted, the range falloff ensures that the other side is unaffected, thereby creating a discontinuity in the painted values V. Although the grid values are smooth, the output value map is piecewise-smooth and respects the strong edges of E because we slice according to E. Figure 2 illustrates this process on a 1D image.

Figure 2: Example of brush painting on a 1D image E: (a) grid and reference function, (b) reconstructed values V(x). When the user clicks at location x, we add a smooth brush shape in the grid at location (x, E(x)). The image space result of the brush operation at a position x is obtained by interpolating the grid values at location (x, E(x)). Although the grid is smooth, we obtain an image space result that respects the image discontinuity.

This simple example demonstrates the edge-aware properties of the bilateral grid. Although we manipulate only smooth values and ignore the issue of edge preservation, the resulting function generated in image space is piecewise-smooth and respects the discontinuities of the reference image thanks to the final slicing operation. Furthermore, computations using the bilateral grid are generally independent and require a coarser resolution than the reference image. This makes it easy to map grid operations onto parallel architectures.

2.2 Definition

Data Structure The bilateral grid is a three-dimensional array, where the first two dimensions (x, y) correspond to 2D position in the image plane and form the spatial domain, while the third dimension z corresponds to a reference range. Typically, the range axis is image intensity.

Sampling A bilateral grid is regularly sampled in each dimension. We name ss the sampling rate of the spatial axes and sr the sampling rate of the range axis. Intuitively, ss controls the amount of smoothing, while sr controls the degree of edge preservation. The number of grid cells is inversely proportional to the sampling rate: a smaller ss or sr yields a larger number of grid cells and requires more memory. In practice, most operations on the grid require only a coarse resolution, where the number of grid cells is much smaller than the number of image pixels. In our experiments, we use between 2048 and 3 million grid cells for the useful range of parameters. The required resolution depends on the operation and we will discuss it on a per-application basis.

Data Type and Homogeneous Values The bilateral grid can store in its cells any type of data such as scalars and vectors. For many operations on the bilateral grid, it is important to keep track of the number of pixels (or a weight w) that correspond to each grid cell. Hence, we store homogeneous quantities such as (wV, w) for a scalar V or (wR, wG, wB, w) if we deal with RGB colors. This representation makes it easy to compute weighted averages: (w1 V1, w1) + (w2 V2, w2) = (w1 V1 + w2 V2, w1 + w2), where normalizing by the homogeneous coordinate (w1 + w2) yields the expected result of averaging V1 and V2 weighted by w1 and w2. Intuitively, the homogeneous coordinate w encodes the importance of its associated data V. It is also similar to the homogeneous interpretation of premultiplied alpha colors [Blinn 1996; Willis 2006].
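As a minimal illustration of the homogeneous representation (our own toy example, not code from the paper's library), adding homogeneous cells and dividing by the accumulated weight recovers the weighted average:

    import numpy as np

    # Two homogeneous scalar samples stored as (w*V, w).
    a = np.array([3.0 * 0.8, 3.0])   # V1 = 0.8 seen by w1 = 3 pixels
    b = np.array([1.0 * 0.2, 1.0])   # V2 = 0.2 seen by w2 = 1 pixel
    s = a + b                        # (w1*V1 + w2*V2, w1 + w2)
    average = s[0] / s[1]            # = (3*0.8 + 1*0.2) / 4 = 0.65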
2.3 Basic Usage of a Bilateral Grid

Since the bilateral grid is inspired by the bilateral filter, we use it as the primary example of a grid operation. We describe a set of new manipulations in Section 4. In general, the bilateral grid is used in three steps. First, we create a grid from an image or other user input. Then, we perform processing inside the grid. Finally, we slice the grid to reconstruct the output (Fig. 3). Construction and slicing are symmetric operations.

Grid Creation Given an image I normalized to [0, 1], and ss and sr, the spatial and range sampling rates, we construct the bilateral grid Γ as follows:

1. Initialization: For all grid nodes (i, j, k), Γ(i, j, k) = (0, 0).

2. Filling: For each pixel at position (x, y):

    Γ([x/ss], [y/ss], [I(x, y)/sr]) += (I(x, y), 1)

where [·] is the closest-integer operator. We use the notation Γ = c(I) for this construction. Note that we accumulate both the image intensity and the number of pixels into each grid cell using homogeneous coordinates.
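A minimal NumPy sketch of the construction operator c(I), assuming a grayscale image normalized to [0, 1]; the function name and the (y, x, z, component) array layout are our own conventions, not the paper's GPU implementation:

    import numpy as np

    def create_grid(I, s_s, s_r):
        """Build a homogeneous bilateral grid Gamma = c(I) from image I in [0, 1]."""
        H, W = I.shape
        gh = int(round((H - 1) / s_s)) + 1        # grid extent along y
        gw = int(round((W - 1) / s_s)) + 1        # grid extent along x
        gd = int(round(1.0 / s_r)) + 1            # grid extent along z (intensity)
        grid = np.zeros((gh, gw, gd, 2))          # last axis holds (w*I, w)
        ys, xs = np.mgrid[0:H, 0:W]
        gy = np.rint(ys / s_s).astype(int)        # closest-integer operator [.]
        gx = np.rint(xs / s_s).astype(int)
        gz = np.rint(I / s_r).astype(int)
        np.add.at(grid, (gy, gx, gz, 0), I)       # accumulate intensity
        np.add.at(grid, (gy, gx, gz, 1), 1.0)     # accumulate pixel count
        return grid

With ss = 16 and sr = 0.07, this construction produces the 15 intensity levels quoted later for the GPU data layout.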
Processing Any function f that takes a 3D function as input can be applied to a bilateral grid Γ to obtain a new bilateral grid Γ̃ = f(Γ). For the bilateral filter, f is a convolution by a Gaussian kernel, where the variances along the domain and range dimensions are σs and σr respectively [Paris and Durand 2006].

Extracting a 2D Map by Slicing Slicing is the critical bilateral grid operation that yields a piecewise-smooth output. Given a bilateral grid Γ and a reference image E, we extract a 2D value map M by accessing the grid at (x/ss, y/ss, E(x, y)/sr) using trilinear interpolation. We use the notation M = sE(Γ). If the grid stores homogeneous values, we first interpolate the grid to retrieve the homogeneous vector; then, we divide the interpolated vector to access the actual data.
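A matching sketch of the slicing operator sE(Γ), with trilinear interpolation and the final homogeneous division; again these are our own NumPy conventions, whereas the GPU implementation of Section 3 relies on hardware texture filtering:

    import numpy as np

    def slice_grid(grid, E, s_s, s_r, eps=1e-8):
        """Extract M = s_E(Gamma) by sampling the grid at (x/s_s, y/s_s, E/s_r)."""
        H, W = E.shape
        ys, xs = np.mgrid[0:H, 0:W]
        fy, fx, fz = ys / s_s, xs / s_s, E / s_r       # continuous grid coordinates
        y0 = np.floor(fy).astype(int)
        x0 = np.floor(fx).astype(int)
        z0 = np.floor(fz).astype(int)
        out = np.zeros((H, W, grid.shape[3]))
        for dy in (0, 1):                              # 8-corner trilinear interpolation
            for dx in (0, 1):
                for dz in (0, 1):
                    y = np.clip(y0 + dy, 0, grid.shape[0] - 1)
                    x = np.clip(x0 + dx, 0, grid.shape[1] - 1)
                    z = np.clip(z0 + dz, 0, grid.shape[2] - 1)
                    wgt = ((1 - np.abs(fy - (y0 + dy))) *
                           (1 - np.abs(fx - (x0 + dx))) *
                           (1 - np.abs(fz - (z0 + dz))))
                    out += wgt[..., None] * grid[y, x, z]
        return out[..., 0] / (out[..., 1] + eps)       # homogeneous division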
Slicing is symmetric to the creation of the grid from an image. If a grid matches the resolution and quantization of the image used for creation, slicing will result in the same image, although in practice, the grid is much coarser than the image. Processing in the grid between creation and slicing is what enables edge-aware operations. Moreover, the particular grid operation determines the required sampling rate.

Figure 3: Bilateral filtering using a bilateral grid demonstrated on a 1D example: (a) input 1D image I, (b) grid created from the 1D image, (c) filtered grid and slicing image, (d) filtered 1D image bf(I). Blue grid cells are empty (w = 0).
Recap: Bilateral Filtering Using our notation, the algorithm by Paris and Durand [2006] to approximate the bilateral filter in Equation 1 becomes (Fig. 3):

    bf(I) = sI( Gσs,σr ⊗ c(I) )        (2)

We embed the image I in a bilateral grid, convolve this grid with a 3D Gaussian kernel with spatial parameter σs and range parameter σr, and slice it with the input image. A contribution of our work is to demonstrate that the bilateral grid extends beyond bilateral filtering and enables a variety of edge-preserving processing.
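Putting the three steps together, here is a sketch of Equation 2 built from the create_grid and slice_grid sketches above; scipy.ndimage.gaussian_filter is our stand-in for the 3D Gaussian convolution that the paper performs with fragment shaders:

    from scipy.ndimage import gaussian_filter

    def grid_bilateral_filter(I, sigma_s, sigma_r):
        """Approximate bf(I) = s_I(G_{sigma_s,sigma_r} convolved with c(I)), Equation 2."""
        s_s, s_r = sigma_s, sigma_r              # sample the grid at the kernel bandwidth
        grid = create_grid(I, s_s, s_r)          # c(I): homogeneous (w*I, w) grid
        # Blur both homogeneous components with a 3D Gaussian; in grid units the
        # kernel has unit standard deviation along each axis, and the component
        # axis (sigma = 0) is left untouched.
        blurred = gaussian_filter(grid, sigma=(1.0, 1.0, 1.0, 0.0))
        return slice_grid(blurred, I, s_s, s_r)  # s_I: trilinear slice + division

Sampling the grid at the kernel bandwidth (ss = σs, sr = σr) matches the accuracy/storage trade-off described in Section 3.2, so the blur reduces to a small, fixed-size kernel in grid units.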
In developing the bilateral grid, one of our major goals was to facilitate parallelization on graphics hardware. Our benchmarks showed that on a CPU, the bottleneck lies in the slicing stage, where the cost is dominated by trilinear interpolation. We take advantage of hardware texture filtering on the GPU to efficiently perform slicing. The GPU's fragment processors are also ideally suited to executing grid processing operations such as Gaussian blur.

3.1 Grid Creation

To create a bilateral grid from an image, we accumulate the value of each input pixel into the appropriate grid voxel. Grid creation is inherently a scatter operation since the grid position depends on a pixel's value. Since the vertex processor is the only unit that can perform a true scatter operation [Harris and Luebke 2004], we rasterize a vertex array of single-pixel points and use a vertex shader to determine the output position. On modern hardware, the vertex processor can efficiently access texture memory. The vertex array consists of (x, y) pixel positions; the vertex shader looks up the corresponding image value I(x, y) and computes the output position. On older hardware, however, vertex texture fetch is a slow operation. Instead, we store the input image as a vertex color attribute: each vertex is a record (x, y, r, g, b) and we can bypass the vertex texture fetch. The disadvantage of this approach is that during slicing, where the input image needs to be accessed as a texture, we must copy the data. For this, we use the pixel buffer object extension to do a fast copy within GPU memory.
Data Layout We store bilateral grids as 2D textures by tiling the z levels across the texture plane. This layout lets us use hardware bilinear texture filtering during the slicing stage and reduce the number of texture lookups from 8 to 2. To support homogeneous coordinates, we use four-component, 32-bit floating point textures. In this format, for typical values of ss = 16 and sr = 0.07 (15 intensity levels), a bilateral grid requires about 1 megabyte of texture memory per megapixel. For the extreme case of ss = 2, the storage cost is about 56 megabytes per megapixel. In general, a grid requires 16 × (number of pixels)/(ss² × sr) bytes of texture memory. Grids that do not require homogeneous coordinates are stored as single-component floating point textures and require 1/4 the memory. In our examples, we use between 50 kB and 40 MB.
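The tiled layout amounts to simple index arithmetic. The sketch below packs z levels into a 2D array; the tiles_per_row parameter is an illustrative choice of ours and not necessarily the layout used by the paper's implementation:

    import numpy as np

    def tile_grid_2d(grid, tiles_per_row=4):
        """Pack a (gh, gw, gd, c) bilateral grid into a 2D texture by tiling z levels."""
        gh, gw, gd, c = grid.shape
        rows = int(np.ceil(gd / tiles_per_row))
        tex = np.zeros((rows * gh, tiles_per_row * gw, c), dtype=grid.dtype)
        for z in range(gd):
            ty, tx = divmod(z, tiles_per_row)     # which tile holds level z
            tex[ty * gh:(ty + 1) * gh, tx * gw:(tx + 1) * gw] = grid[:, :, z]
        return tex

    # During slicing, a lookup at (x, y, z) fetches the two nearest z tiles with
    # hardware bilinear filtering and lerps between them, i.e. 2 texture reads.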
3.2 Low-Pass Filtering

As described in Section 2.3, the grid processing stage of the bilateral filter is a convolution by a 3D Gaussian kernel. In constructing the grid, we set the sampling rates ss and sr to correspond to the bandwidths of the Gaussian, σs and σr, which provides a good trade-off between accuracy and storage [Paris and Durand 2006]. Since the Gaussian kernel is separable, we convolve the grid in each dimension with a 5-tap 1D kernel using a fragment shader.

After processing the bilateral grid, we slice the grid using the input image I to extract the final 2D output. We slice on the GPU by rendering I as a textured quadrilateral and using a fragment shader to look up the stored grid values in a texture. To perform trilinear interpolation, we enable bilinear texture filtering, fetch the two nearest z levels, and interpolate between the results. By taking advantage of hardware bilinear interpolation, each output pixel requires only 2 indirect texture lookups.

3.4 Performance

We benchmark the grid-based bilateral filter on three generations of GPUs on the same workstation. Our GPUs consist of an NVIDIA GeForce 8800 GTX (G80), which features unified shaders, a GeForce 7800 GT (G70), and a GeForce 6800 GT (NV40). They were released in 2006, 2005, and 2004, respectively. Our CPU is an Intel Core 2 Duo E6600 (2.4 GHz, 4 MB cache).

The first benchmark varies the image size while keeping the bilateral filter parameters constant (σs = 16, σr = 0.1). We use the same image subsampled at various resolutions and report the average runtime over 1000 iterations. Figure 4 shows that our algorithm is linear in the image size, ranging from 4.5 ms for 1 megapixel to 44.7 ms for 10 megapixels on the G80. We consistently measured slowdowns beyond 9 megapixels for the older GPUs, which we suspect is due to an overflow in the vertex cache. For comparison, our CPU implementation ranges from 0.2 to 1.9 seconds on the same inputs. Our GPU implementation outperforms Weiss's CPU bilateral filter [2006] by a factor of 50.

The second benchmark keeps the image size at 8 megapixels and the range kernel size σr at 0.1 while varying the spatial kernel size σs. As with the first benchmark, we use the same image and report the average runtime over 1000 iterations. Figure 5 shows the influence of the kernel size. With the exception of a few bumps in the curve …
Figure 4: Bilateral filter running times as a function of the image size (using σs = 16 and σr = 0.1), measured in ms for the NVIDIA NV40, G70, and G80 (x-axis: image size in megapixels). The memory requirements increase linearly from 625 kB at 1 megapixel to 6.25 MB at 10 megapixels.

Figure 5: Bilateral filter running time (in ms) for the second benchmark, as a function of the spatial kernel size σs.
3.5 Further Acceleration

On current hardware, we can run multiple bilateral filters per frame on 1080p HD video, but on older hardware, we are limited to a single filter per frame. For temporally coherent data, we propose an acceleration based on subsampling. A cell of the grid stores the weighted average of a large number of pixels and we can obtain a good estimate with only a subset of those pixels. For typical values of σs ∈ [10, 50] and σr ∈ [0.05, 0.4], using only 10% of the input pixels produces an output with no visual artifacts. We choose the 10% of pixels by rotating through a sequence of precomputed Poisson-disk patterns to obtain a good coverage. To combat "swimming" artifacts introduced by time-varying sampling patterns, we apply a temporal exponential filter with a decay constant of 5 frames. This produces results visually indistinguishable from the full bilateral filter except at hard scene transitions.

4 Image Manipulation with the Bilateral Grid

The bilateral grid has a variety of applications beyond bilateral filtering. The following sections introduce new ways of creating, processing and slicing a bilateral grid.

4.1 Cross-Bilateral Filtering

A direct extension to the bilateral filter is the cross-bilateral filter [Petschnigg et al. 2004; Eisemann and Durand 2004], where the notion of image data is decoupled from image edges. We define a new grid creation operator with two parameters: an image I, which defines the grid values, and an edge image E which determines the grid position:

    Γ([x/ss], [y/ss], [E(x, y)/sr]) += (I(x, y), 1)        (3)
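A sketch of this cross-bilateral creation operator as a small variant of the create_grid sketch above (our own naming; I and E are assumed to be grayscale arrays in [0, 1]); the grid would then be blurred and sliced with the edge image E, mirroring the pipeline of Section 3:

    import numpy as np

    def create_cross_grid(I, E, s_s, s_r):
        """Equation 3: values come from I, grid positions come from the edge image E."""
        H, W = I.shape
        gh = int(round((H - 1) / s_s)) + 1
        gw = int(round((W - 1) / s_s)) + 1
        gd = int(round(1.0 / s_r)) + 1
        grid = np.zeros((gh, gw, gd, 2))
        ys, xs = np.mgrid[0:H, 0:W]
        gy = np.rint(ys / s_s).astype(int)
        gx = np.rint(xs / s_s).astype(int)
        gz = np.rint(E / s_r).astype(int)       # position determined by E, not I
        np.add.at(grid, (gy, gx, gz, 0), I)     # value accumulated from I
        np.add.at(grid, (gy, gx, gz, 1), 1.0)
        return grid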
(a) input and stroke; (b) intermediate; (c) output.
Figure 6: Bilateral Grid Painting allows the user to paint without bleeding across image edges. The user clicks on the input (a) and strokes the mouse. The intermediate (b) and final (c) results are shown. The entire 2 megapixel image is updated at 60 Hz. Memory usage was about 1.5 MB for a 20 × 20 brush and sr = 0.05.

GPU Implementation We tile Γbrush as a single-component 2D texture. When the user clicks the mouse, a fragment shader renders the brush shape using blending. Slicing is identical to the case of the bilateral filter. A modern GPU can support bilateral grid painting on very large images. For a 2 × 2 brush with sr = 0.05, the grid requires 20 MB of texture memory per megapixel; a 5 × 5 brush consumes less than 1 MB per megapixel.

Results In Figure 6, the user manipulates the hue channel of an image without creating a mask. An initial mouse click determines (x0, y0, z0) in the grid. Subsequent mouse strokes vary in x and y, but z0 is fixed. Hence, the brush affects only the selected intensity layer and does not cross image edges.
(a) grid and reference function; (b) smoothly interpolated grid; (c) grid after sigmoid; (d) extracted influence map M(x).
Figure 7: Edge-aware interpolation with a bilateral grid demonstrated on a 1D example. The user clicks on the image to indicate sparse constraints (a) (unconstrained cells are shown in blue). These values are interpolated into a smooth function (b) (constraints are shown in green). We filter the grid using a sigmoid to favor consistent regions (c). The resulting grid is sliced with the input image to obtain the influence map M (d).
4.3 Edge-Aware Scattered Data Interpolation

Inspired by Levin et al. [2004], Lischinski et al. [2006] introduced a scribble interface to create an influence map M over an image I. The 2D map M interpolates a set of user-provided constraints {M(xi, yi) = mi} (the scribbles) while respecting the edges of the underlying image I. We use a scalar bilateral grid Γint to achieve a similar result: instead of solving a piecewise-smooth interpolation in the image domain, we solve a smooth interpolation in the grid domain and then slice.

We lift the user-provided constraints into the 3D domain, {Γint([x/ss], [y/ss], [I(x, y)/sr]) = mi}, and minimize the variations of the grid values:

    argmin ||grad(Γint)||²        (5)

under the constraints Γint(x/ss, y/ss, I(x, y)/sr) = mi.

The 2D influence map is obtained by slicing: M = sI(Γint). Finally, we bias the influence map toward 0 and 1 akin to Levin et al. [2007]. We achieve this by applying a sigmoid function to the grid values. Figure 7 summarizes this process and Figure 8 shows a sample result.
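A CPU sketch of this interpolation that uses Jacobi iterations on the grid Laplacian in place of the paper's GPU multigrid solver; the initialization, iteration count, sigmoid steepness, and periodic boundary handling are our own simplifications (slice_grid is the sketch from Section 2.3):

    import numpy as np

    def interpolate_influence(I, scribbles, s_s, s_r, iters=500, steep=10.0):
        """scribbles: list of (x, y, m) hard constraints; returns the influence map M."""
        H, W = I.shape
        gh = int(round((H - 1) / s_s)) + 1
        gw = int(round((W - 1) / s_s)) + 1
        gd = int(round(1.0 / s_r)) + 1
        grid = np.full((gh, gw, gd), 0.5)
        mask = np.zeros_like(grid, dtype=bool)          # True where constrained
        for x, y, m in scribbles:
            j, i, k = int(round(y / s_s)), int(round(x / s_s)), int(round(I[y, x] / s_r))
            grid[j, i, k], mask[j, i, k] = m, True
        for _ in range(iters):                          # smooth (Laplace) interpolation
            avg = np.zeros_like(grid)
            for axis in range(3):                       # 6-neighbor average (periodic
                avg += np.roll(grid, 1, axis) + np.roll(grid, -1, axis)   # boundaries)
            avg /= 6.0
            grid = np.where(mask, grid, avg)            # keep the hard constraints
        grid = 1.0 / (1.0 + np.exp(-steep * (grid - 0.5)))   # bias toward 0 and 1
        return slice_grid(np.stack([grid, np.ones_like(grid)], axis=-1), I, s_s, s_r)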
Discussion Compared to image-based approaches [Levin et al. 2004; Lischinski et al. 2006], our method does not work at the pixel level, which may limit accuracy in some cases, although our experiments did not reveal any major problems. On the other hand, the bilateral grid transforms a difficult image-dependent and non-homogeneous 2D optimization into a simpler smooth and homogeneous interpolation in 3D. Furthermore, the grid resolution is decoupled from the resolution of the image, which prevents the complexity from growing with image resolution.

Another difference is that image-based techniques use the notion of "geodesic distance" over the image manifold, while we consider the Euclidean distance in a higher-dimensional space. The comparison of those two measures deserves further investigation and we believe that which method is most appropriate is an application-dependent choice.
GPU Implementation Analogous to grid painting, we rasterize scribble constraints into the bilateral grid using a fragment shader. To obtain a globally smooth bilateral grid that respects the constraints, we solve Laplace's equation by extending a GPU multigrid algorithm [Goodnight et al. 2003] to handle irregular 3D domains. The domain has holes because the user-specified hard constraints create additional boundaries.

Results We demonstrate real-time scribble interpolation with a simple color adjustment application. The user paints scribbles over an image; white scribbles denote regions where the hue should be adjusted, while black scribbles protect the image region. In practice, the sampling rate of Γint can be very coarse and still yield good influence maps. The example in Figure 8 was generated using a coarse grid containing about 600 variables, allowing our GPU solver to generate the influence map in real time (40 Hz). When finer adjustments are required, we can still achieve interactive rates (1 Hz) on finely sampled grids with over 500,000 variables. Refer to the supplemental video for a screen capture of an editing session.

(a) input & scribbles; (b) influence map; (c) output.
Figure 8: Fast scribble interpolation using the Bilateral Grid. The user paints scribbles over the input (a). Our algorithm extracts an influence map (b), which is used to adjust the input hue and produce the output (c). The entire 2 megapixel image is updated at 20 Hz. Memory usage was about 62 kB for ss = 256 and sr = 0.05.

4.4 Local Histogram Equalization

Histogram equalization is a standard technique for enhancing the contrast of images [Gonzales and Woods 2002]. However, for some inputs, such as X-Ray and CT medical images that have high dynamic range, histogram equalization can obscure small details that span only a limited intensity range. For these cases, it is more useful to perform histogram equalization locally over image windows. We perform local histogram equalization efficiently using a bilateral grid, and achieve real-time performance using the GPU.

Given an input image I, we construct a scalar bilateral grid Γhist. We initialize Γhist = 0 and fill it with:

    Γhist([x/ss], [y/ss], [I(x, y)/sr]) += 1        (6)

We denote this operator Γhist = chist(I). Γhist stores the number of pixels in a grid cell and can be considered a set of local histograms. For each (x, y), the corresponding column splits the ss × ss covered pixels into intensity intervals of size sr. By using the closest-integer operator when constructing Γhist, we perform a box filter in space. If a smoother spatial kernel is desired, we blur each z level of the grid by a spatial kernel (e.g., Gaussian). We perform local histogram equalization by applying a standard histogram equalization to each column and slicing the resulting grid with the input image I.
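A NumPy sketch of this operator: build the counting grid of Equation 6, turn each (x, y) column into a normalized cumulative distribution function, and slice with I. The function name is ours and slice_grid is the sketch from Section 2.3; the two normalization passes mirror the GPU implementation described below:

    import numpy as np

    def local_histogram_equalization(I, s_s, s_r, eps=1e-8):
        """Local histogram equalization via a counting bilateral grid (Equation 6)."""
        H, W = I.shape
        gh = int(round((H - 1) / s_s)) + 1
        gw = int(round((W - 1) / s_s)) + 1
        gd = int(round(1.0 / s_r)) + 1
        hist = np.zeros((gh, gw, gd))
        ys, xs = np.mgrid[0:H, 0:W]
        np.add.at(hist,
                  (np.rint(ys / s_s).astype(int),
                   np.rint(xs / s_s).astype(int),
                   np.rint(I / s_r).astype(int)), 1.0)   # Gamma_hist = c_hist(I)
        cdf = np.cumsum(hist, axis=2)                    # unnormalized CDF per column
        cdf = cdf / (cdf[:, :, -1:] + eps)               # normalize to [0, 1]
        # Slice the equalized grid with I (homogeneous weight of 1 per cell).
        hom = np.stack([cdf, np.ones_like(cdf)], axis=-1)
        return slice_grid(hom, I, s_s, s_r)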
(a) input 1D image I; (b) histogram grid from the 1D image; (c) equalized grid & slicing image; (d) locally equalized 1D image lhe(I).
Figure 9: Local histogram equalization demonstrated on a 1D image. We build a grid that counts the number of pixels in each bin (b). Each grid column corresponds to the histogram of the image region it covers. By equalizing each column, we obtain a grid (c) which leads to an image-space signal with an enhanced contrast that exploits the whole intensity range (d).
GPU Implementation We construct the bilateral grid the same way as in bilateral filtering, except we can ignore the image data. Next, we execute a fragment shader that accumulates over each (x, y) column of the bilateral grid. This yields a new grid where each (x, y) column is an unnormalized cumulative distribution function. We run another pass to normalize the grid to between 0 and 1 by dividing out the final bin value. Finally, we slice the grid using the input image.

Results Our algorithm achieves results visually similar to MATLAB's adapthisteq (Figure 10). In both cases, low-contrast details are revealed while the organ shapes are preserved. Our method based on the bilateral grid achieves a speed-up of one order of magnitude: 100 ms compared to 1.5 s on a 3.7 megapixel HDR image.

5.1 High Resolution Video Abstraction

Winnemöller et al. [2006] demonstrated a technique for stylizing and abstracting video in real time. A major bottleneck in their approach was the bilateral filter, which limited the video to DVD resolution (0.3 megapixels) and the framerate to 9 to 15 Hz. To attain this framerate, they used a separable approximation to the bilateral filter with a small kernel size and iterated the approximation to obtain a sufficiently large spatial support [Pham and van Vliet 2005]. Using the bilateral grid with our GPU acceleration technique (without the additional acceleration described in Section 3.5), we are able to perform video abstraction at 42 Hz on 1080p HD video (1.5 megapixels).
(a) input (1760 × 2140, HDR); (b) MATLAB's adapthisteq (1.5 s); (c) our result (100 ms).
Figure 10: Local histogram equalization reveals low-contrast details by locally remapping the intensity values. The input (a) is an HDR chest X-Ray (tone mapped for display). Our algorithm (c) based on the bilateral grid has a similar visual appearance to MATLAB's adapthisteq (b) while achieving a speedup of an order of magnitude. For this example, we used ss = 243.75, sr = 0.0039; memory usage was 500 kB total for the two grids.
5.2 Transfer of Photographic Look

Bae et al. [2006] introduced a method to transfer the "look" of a model photograph to an input photograph. We adapt their work to operate on videos in real time. We describe two modifications to handle the constraints inherent to video. We use a simplified pipeline that yields real-time performance while retaining most of the effects from the original process. We also specifically deal with noise and compression artifacts to handle sources such as DVDs and movies recorded using consumer-grade cameras.

Processing Pipeline Akin to Bae et al., we use the bilateral filter on each input frame I and name the result the base layer Bi = bf(I) and its residual the detail layer Di = I − Bi. We perform the same decomposition on the model image M to get Bm and Dm. We use histogram transfer to transform the input base Bi so that it matches the histogram of Bm. We denote by ht the histogram transfer operator and Bo = ht_Bm(Bi) the base output. For the detail layer, we match the histogram of the amplitudes: |Do| = ht_|Dm|(|Di|). We obtain the detail layer of the output by using the sign of the input detail: Do = sign(Di) |Do|. The output frame O is reconstructed by adding the base and detail layers: O = Bo + Do.
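A sketch of this simplified pipeline, where match_histograms from scikit-image stands in for the histogram transfer operator ht and grid_bilateral_filter is the sketch from Section 2.3; the model image M is assumed to be decomposed once into Bm = bf(M) and Dm = M − Bm beforehand:

    import numpy as np
    from skimage.exposure import match_histograms   # our stand-in for "ht"

    def transfer_look(I, B_m, D_m, sigma_s, sigma_r):
        """Transfer the look of a model photo (base B_m, detail D_m) to frame I."""
        B_i = grid_bilateral_filter(I, sigma_s, sigma_r)    # base layer of the input
        D_i = I - B_i                                       # detail (residual) layer
        B_o = match_histograms(B_i, B_m)                    # B_o = ht_Bm(B_i)
        D_o = np.sign(D_i) * match_histograms(np.abs(D_i), np.abs(D_m))  # |D_o| = ht_|Dm|(|D_i|)
        return B_o + D_o                                    # O = B_o + D_o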
Figure 11: Tone management pipeline (the input frame is split by bf into a base Bi and a detail Di; histogram transfer ht is applied to Bi and to |Di|; the output is O = Bo + Do).

Denoising A number of videos are noisy or use a compression algorithm that introduces artifacts. These defects are often not noticeable in the original video but may be revealed as we increase the contrast or level of detail. A naïve solution would be to denoise the input frames before processing them but this produces "too clean" images that look unrealistic. We found that adding back the denoising residual after processing yields superior results with a more realistic appearance. In practice, since noise amplitude is low compared to scene edges, we use a bilateral filter with small sigmas.

Discussion Compared to the process described by Bae et al., our method directly relies on the detail amplitudes to estimate the level of texture of a frame. Although the textureness measure proposed by Bae et al. captures more sophisticated effects, it induces three additional bilateral filtering steps whose computational cost would prevent our algorithm from running in real time on HD sequences. Our results show that detail amplitude is a satisfying approximation. Furthermore, it provides sufficient leeway to include a denoising step that broadens the range of possible inputs.
5.3 Local Tone Mapping

We describe a user-driven method to locally tone map HDR images based on grid painting. We build upon Durand and Dorsey's tone mapping algorithm [2002], where the log luminance L of an image is decomposed into a base layer B = bf(L) and a detail layer D = L − B. The contrast of the base is reduced using a simple linear remapping B′ = αB + β while the detail layer D is unaffected. This reduces the overall dynamic range without losing local detail. The final output is obtained by taking the exponential of B′ + D and preserving color ratios.

Our method extends this global remapping of the base layer and lets users locally modify the remapping function using an edge-aware brush. We represent the remapping function with a grid ΓTM initialized with a linear ramp:

    ΓTM(x, y, z) = αz + β        (7)

If ΓTM is unedited, slicing ΓTM with B yields the same remapped base layer as Durand and Dorsey's operator: B′ = sB(ΓTM).

Users edit ΓTM with an edge-aware brush to locally modify the grid values. The modified base layer is still obtained by slicing according to B. Clicking with the left button on the pixel at location (x, y) adds a 3D Gaussian centered at (x/ss, y/ss, L(x, y)/sr) to the grid values. A right click subtracts a 3D Gaussian. In practice, we use Gaussian kernels with a user-specified amplitude A and parameters σs = ss and σr = sr. The spatial sampling ss controls the size of the brush and the range sampling sr controls its sensitivity to edges. If users hold the mouse button down, we lock the z coordinate to the value of the first click, L(x0, y0)/sr, thereby enabling users to paint without affecting features at different intensities.
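A sketch of the remapping grid of Equation 7 and of the edge-aware brush edit; the dense NumPy representation and the function names are our own, while α, β, the amplitude A, and the choice σs = ss, σr = sr follow the description above:

    import numpy as np

    def init_tone_map_grid(shape, alpha, beta, s_r):
        """Equation 7: Gamma_TM(x, y, z) = alpha * z + beta, a linear ramp along z."""
        gh, gw, gd = shape
        z = np.arange(gd) * s_r                      # intensity value of each z level
        return np.broadcast_to(alpha * z + beta, (gh, gw, gd)).copy()

    def brush(grid_tm, x, y, L_click, s_s, s_r, A, sign=+1):
        """Add (left click, sign=+1) or subtract (right click, sign=-1) a 3D Gaussian
        centered at (x/s_s, y/s_s, L_click/s_r); with sigma_s = s_s and sigma_r = s_r
        the standard deviation is one grid cell along each axis."""
        gh, gw, gd = grid_tm.shape
        jj, ii, kk = np.mgrid[0:gh, 0:gw, 0:gd]      # axis 0 = y, axis 1 = x, axis 2 = z
        d2 = (jj - y / s_s) ** 2 + (ii - x / s_s) ** 2 + (kk - L_click / s_r) ** 2
        grid_tm += sign * A * np.exp(-0.5 * d2)
        return grid_tm

The modified base layer is then obtained, as before, by slicing grid_tm with the base B, and the final image by exponentiating B′ + D.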
Using our GPU algorithm for the bilateral filter that creates the base layer and for grid painting, we tone map a 15 megapixel image at 50 Hz (Figure 1). Refer to the video for a screen capture of a local tone mapping session.

6 Discussion and Limitations

Memory Requirements Our approach draws its efficiency from the coarser resolution of the bilateral grid compared to the 2D image. However, operations such as a bilateral filter with a small spatial kernel require fine sampling, which results in large memory and computation costs for our technique. In this case, Weiss's approach is more appropriate [2006]. Nonetheless, for large kernels used in computational photography applications, our method is significantly faster than previous work.

Due to memory constraints, the bilateral grid is limited to a one-dimensional range that stores image intensities and can cause problems at isoluminant edges. We found that in most cases, we achieve good results with 7 to 20 levels in z. A future direction of research is to consider how to efficiently store higher-dimensional bilateral grids: 5D grids that can handle color edges and even 6D grids for video. Another possibility is to look at fast dimensionality reduction techniques to reduce the memory limitations.

Interpolation We rely on trilinear interpolation during slicing for optimal performance. Higher-order approaches can potentially yield higher-quality reconstruction. We would like to investigate the tradeoff between quality and cost in using these filters.

Thin Features Techniques based on the bilateral grid have the same properties as the bilateral filter at thin image features. For example, in an image with a sky seen through a window frame, the edge-aware brush affects the sky independently of the frame; that is, the brush paints across the frame without altering it. Whether or not a filter stops at thin features is a fundamental difference between bilateral filtering and diffusion-based techniques. We believe that both behaviors can be useful, depending on the application.

7 Conclusion

We have presented a new data structure, the bilateral grid, that enables real-time edge-preserving image manipulation. By lifting image processing into a higher-dimensional space, we are able to design algorithms that naturally respect strong edges in an image. Our approach maps well onto modern graphics hardware and enables real-time processing of high-definition video.
Acknowledgements We thank the MIT Computer Graphics Group and the anonymous reviewers for their comments. We are especially grateful to Jonathan Ragan-Kelley for fruitful discussions on GPU programming and Tom Buehler for his assistance in making the video. This work was supported by a National Science Foundation CAREER award 0447561 "Transient Signal Processing for Realistic Imagery," an NSF Grant No. 0429739 "Parametric Analysis and Transfer of Pictorial Style," and a grant from Royal Dutch/Shell Group. Jiawen Chen is partially supported by an NSF Graduate Research Fellowship and an NVIDIA Fellowship. Frédo Durand acknowledges a Microsoft Research New Faculty Fellowship and a Sloan Fellowship.
References

Aurich, V., and Weule, J. 1995. Non-linear Gaussian filters performing edge preserving diffusion. In Proceedings of the DAGM Symposium.

Bae, S., Paris, S., and Durand, F. 2006. Two-scale tone management for photographic look. ACM Transactions on Graphics 25, 3, 637–645. Proceedings of the ACM SIGGRAPH conference.

Barash, D. 2002. A fundamental relationship between bilateral filtering, adaptive smoothing and the nonlinear diffusion equation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 6, 844.

Bennett, E. P., and McMillan, L. 2005. Video enhancement using per-pixel virtual exposures. ACM Transactions on Graphics 24, 3 (July), 845–852. Proceedings of the ACM SIGGRAPH conference.

Blinn, J. F. 1996. Fun with premultiplied alpha. IEEE Computer Graphics and Applications 16, 5, 86–89.

Chiu, K., Herf, M., Shirley, P., Swamy, S., Wang, C., and Zimmerman, K. 1993. Spatially nonuniform scaling functions for high contrast images. In Proceedings of the conference on Graphics Interface, 245–253.

DeCarlo, D., and Santella, A. 2002. Stylization and abstraction of photographs. ACM Transactions on Graphics 21, 3. Proceedings of the ACM SIGGRAPH conference.

Durand, F., and Dorsey, J. 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM Transactions on Graphics 21, 3. Proceedings of the ACM SIGGRAPH conference.

Eisemann, E., and Durand, F. 2004. Flash photography enhancement via intrinsic relighting. ACM Transactions on Graphics 23, 3 (July). Proceedings of the ACM SIGGRAPH conference.

Felsberg, M., Forssén, P.-E., and Scharr, H. 2006. Channel smoothing: Efficient robust smoothing of low-level signal features. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 2 (February), 209–222.

Gonzales, R. C., and Woods, R. E. 2002. Digital Image Processing. Prentice Hall. ISBN 0201180758.

Goodnight, N., Woolley, C., Lewin, G., Luebke, D., and Humphreys, G. 2003. A multigrid solver for boundary value problems using programmable graphics hardware. In Proceedings of the ACM SIGGRAPH / EUROGRAPHICS conference on Graphics Hardware.

Harris, M., and Luebke, D. 2004. GPGPU. In Course notes of the ACM SIGGRAPH conference.

Levin, A., Lischinski, D., and Weiss, Y. 2004. Colorization using optimization. ACM Transactions on Graphics 23, 3 (July). Proceedings of the ACM SIGGRAPH conference.

Levin, A., Rav-Acha, A., and Lischinski, D. 2007. Spectral matting. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition.

Lischinski, D., Farbman, Z., Uyttendaele, M., and Szeliski, R. 2006. Interactive local adjustment of tonal values. ACM Transactions on Graphics 25, 3, 646–653. Proceedings of the ACM SIGGRAPH conference.

Oh, B. M., Chen, M., Dorsey, J., and Durand, F. 2001. Image-based modeling and photo editing. In Proceedings of the ACM SIGGRAPH conference.

Paris, S., and Durand, F. 2006. A fast approximation of the bilateral filter using a signal processing approach. In Proceedings of the European Conference on Computer Vision.

Perona, P., and Malik, J. 1990. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 12, 7 (July), 629–639.

Petschnigg, G., Agrawala, M., Hoppe, H., Szeliski, R., Cohen, M., and Toyama, K. 2004. Digital photography with flash and no-flash image pairs. ACM Transactions on Graphics 23, 3 (July). Proceedings of the ACM SIGGRAPH conference.

Pham, T. Q., and van Vliet, L. J. 2005. Separable bilateral filtering for fast video preprocessing. In Proceedings of the IEEE International Conference on Multimedia and Expo.

Smith, S. M., and Brady, J. M. 1997. SUSAN – a new approach to low level image processing. International Journal of Computer Vision 23, 1 (May), 45–78.

Sochen, N., Kimmel, R., and Malladi, R. 1998. A general framework for low level vision. IEEE Transactions in Image Processing 7, 310–318.

Tomasi, C., and Manduchi, R. 1998. Bilateral filtering for gray and color images. In Proceedings of the IEEE International Conference on Computer Vision, 839–846.

Tumblin, J., and Turk, G. 1999. LCIS: A boundary hierarchy for detail-preserving contrast reduction. In Proceedings of the ACM SIGGRAPH conference, 83–90.

Weiss, B. 2006. Fast median and bilateral filtering. ACM Transactions on Graphics 25, 3, 519–526. Proceedings of the ACM SIGGRAPH conference.

Willis, P. J. 2006. Projective alpha colour. Computer Graphics Forum 25, 3 (Sept.), 557–566.

Winnemöller, H., Olsen, S. C., and Gooch, B. 2006. Real-time video abstraction. ACM Transactions on Graphics 25, 3, 1221–1226. Proceedings of the ACM SIGGRAPH conference.

Xiao, J., Cheng, H., Sawhney, H., Rao, C., and Isnardi, M. 2006. Bilateral filtering-based optical flow estimation with occlusion detection. In Proceedings of the European Conference on Computer Vision.