
Computer Graphics – CSE 306

École Polytechnique

Nicolas Bonneel
[email protected]
https://perso.liris.cnrs.fr/nicolas.bonneel/
Contents

1 Preamble 7
1.1 Preamble of the preamble 7
1.2 Image Representation 8
1.3 Vector Image Representation 8
1.4 A Vector Class 9
1.5 A Triangle Mesh Class 10

2 Rendering 13
2.1 Real-Time Rendering 13
2.1.1 Projection and homogeneous coordinates 13
2.1.2 Rasterization 14
2.1.3 The Z-Buffer 15
2.1.4 Coloring pixels 15
2.1.5 OpenGL and DirectX 15
2.1.6 Advanced effects 16
2.2 Physically-Based Rendering 17
2.2.1 Raytracing / Path-Tracing 17
2.2.2 Photon Mapping 58
2.2.3 Precomputed Radiance Transfer 58
2.2.4 Radiosity 61
2.3 Discussion 63

3 Image Processing 65
3.1 Filtering 65
3.1.1 Gaussian filtering 65
3.1.2 Bilateral filtering 69
3.1.3 Non-local means 71
3.2 Color Matching 72
3.2.1 Simple mean/standard deviation matching 72
3.2.2 Sliced optimal transport matching 73
3.3 Image Retargeting 74
3.4 Poisson Image Editing 76
3.4.1 A simple approach 77
3.4.2 Possible improvements 79

4 Geometry Processing 81
4.1 Representing shapes 81
4.2 Polygon clipping 83
4.3 Voronoï diagrams and Delaunay triangulations 84
4.3.1 Bowyer–Watson algorithm 86
4.3.2 Jump Flooding 88
4.3.3 Voronoï Parallel Linear Enumeration 90
4.4 More than Voronoï 92
4.4.1 Centroidal Voronoï Tessellation 93
4.4.2 Restricted Voronoï Diagrams 95
4.4.3 Power diagrams 97
4.4.4 Semi-discrete Optimal Transport 98
4.5 The Marching Cubes algorithm 100
4.6 Surface parameterization 102
4.6.1 Tutte's mapping 104
4.6.2 Conformal mapping 104

5 Fluid simulation 111
5.1 Principles 111
5.1.1 Helmholtz-Hodge decomposition 111
5.1.2 Navier-Stokes and Incompressible Euler 112
5.1.3 Chorin's projection 113
5.2 Marker-and-Cell Method 113
5.3 Smoothed particle hydrodynamics 115
5.4 Using optimal transport 117

This class will cover several aspects of computer graphics, including overviews of rendering, geometry, simulation and image processing. A selected few aspects will be discussed in more detail as they will contribute to labs, where you will need to implement them from scratch in C++. Little to no code will be provided: these aspects will need to be fully understood. I consider that nothing is fully understood if you cannot implement it from scratch, and conversely, once understood, coding is merely a matter of touch typing. In return, you will get the satisfaction of having implemented your own tools producing beautiful computer graphics results.

These labs will include a path tracer (Sec. 2.2.1, 4 labs), an image retargeting algorithm (Sec. 3.3, 1 lab) or sliced optimal transport color matching (Sec. 3.2.2), a fluid simulator that uses Voronoï diagrams (Sec. 4.3 and Sec. 5.4, 3 labs), and a mesh parameterization algorithm from Tutte (Sec. 4.6.1).
Chapter 1

Preamble

This chapter gives an overview of what is considered common knowledge (although you may not have formally learnt it) and is a prerequisite for the rest of the course. As labs will be implemented in C++, typical C++ prototypes are given.

1.1 Preamble of the preamble

This class will require you to code. You are strongly advised not to write any line of code before you are 100% sure the line is correct. The most time-consuming aspect of programming is often debugging, and you should strive to minimize this amount of time.

⚠ In a program, random lines of code have close to a 0% chance of working, but a near-100% chance of requiring you to spend time finding and fixing them. You are better off not writing them in the first place.

This is particularly true for what we will implement: when implementing a path tracer, code errors such as mistakes in probability density functions (or even basic vector math operators) can go unnoticed for some time before producing noticeable artifacts, and they will thus become hard to track down; when implementing a fluid solver, errors such as off-by-one indices typically result in completely wrong simulations but are also hard to track down due to the number of indices in the code.

However, bugs happen. Make sure you master a real debugger, preferably with an IDE and ways to quickly step through the code execution (setting breakpoints, stepping inside/over lines of code, inspecting variable values including arrays, structure members, arrays of structures, etc.). I will use Visual Studio for that purpose, but other debuggers exist (and I would not recommend small tools such as gdb used directly in the command line – the goal is to be efficient).

Regarding languages, I let students choose the language they are most comfortable with, but support will only/mostly be provided on C++ code. However, I strongly advise against scripted languages such as Matlab or Python, particularly if you are not an expert in them, as they are extremely slow. A naive path tracer implemented in C++ would take a few seconds to run where its Python equivalent would take several hours: you would not be able to debug and experiment with your code. While our code will not reach state-of-the-art performance, we will still try to avoid large performance bottlenecks and maintain good coding practices regarding performance whenever this only requires minor effort in code writing. For instance, this involves avoiding unneeded square root computations, passing const reference parameters instead of entire objects, or using simple parallelization instructions. I will most often give running times (obtained on a good desktop computer) and code lengths for you to check whether you have done anything stupid in the code (e.g., if you get a 100x slowdown or a code 3x as long), to see the impact of design choices on running times, and to compare different approaches (e.g., realtime OpenGL vs. slow path tracing). I also believe that code length is a good metric to see if an algorithm is worthwhile. Note that highly tuned code with clever algorithmic tricks would be orders of magnitude faster.

⚠ I occasionally see students compiling without optimization flags and complaining about speed. Do not forget to turn on optimizations! With GCC, use -O3; with Visual Studio, use the Release mode.

Regarding libraries, from an educational perspective I will strive to minimize the number of libraries used in this course. Of course, in a professional setting, you would probably use a library such as Embree to compute intersections quickly rather than the code you developed during this course. However, a few functionalities are much less interesting to code, and I will thus recommend libraries or give pieces of code for a few functionalities. Notably, I will recommend the C++ header-only stb_image and stb_image_write libraries (https://github.com/nothings/stb) to read and write images, and I will provide code to read .obj mesh files. Unless you want to go further (e.g., adding a GUI), you will not need other code.

1.2 Image Representation

For this course, we will consider that an image is a 2d array of pixels, each pixel being a triplet of red (R), green (G) and blue (B) values. For implementation purposes, we will consider all rows of the image stored consecutively (row-major ordering), interleaving R, G and B values. A typical C/C++ implementation with 0-based array indexing would access coordinate (x, y) in the image using:

image[y * W * 3 + x * 3 + 0] = red_value;
image[y * W * 3 + x * 3 + 1] = green_value;
image[y * W * 3 + x * 3 + 2] = blue_value;

with red_value, green_value and blue_value between 0 and 255.
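
As an illustration of this layout, here is a minimal hedged sketch that fills such an interleaved RGB buffer and saves it with the stb_image_write library recommended in Sec. 1.1; the file name and gradient pattern are arbitrary choices for the example:

#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <vector>

int main() {
    const int W = 512, H = 512;
    std::vector<unsigned char> image(W * H * 3);
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            image[y * W * 3 + x * 3 + 0] = x * 255 / W;  // red ramp along x
            image[y * W * 3 + x * 3 + 1] = y * 255 / H;  // green ramp along y
            image[y * W * 3 + x * 3 + 2] = 0;            // no blue
        }
    }
    // writes an 8-bit RGB PNG; the last argument is the length of one row in bytes
    stbi_write_png("image.png", W, H, 3, &image[0], W * 3);
    return 0;
}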

Note that other representations are commonly encountered. For instance, a camera sensor stores a
file where pixels are interleaved in a Bayer pattern. Certain applications require multispectral images
consisting of multiple (>3) sampled wavelengths (e.g., additional infra-red channels) or including
transparency (an additional alpha channel).

1.3 Vector Image Representation

A vector image is an image defined by parametric shapes: lines, circles, squares, etc., with parametric rendering types (e.g., gradients). Vector images can support animation. The .svg file format is a simple text file format that describes vector graphics. I uploaded an svg writer in C++ that saves polygon soups and supports animations (just call save_svg_animated repeatedly, once for each frame of your animation) at https://pastebin.com/bEYVtqYy.

The idea of the .svg file format is to describe shapes using shape commands in an xml-like fashion.
A rectangle can be obtained using:

<rect width="10" height="10" x="0" y="0" fill="blue" />

a line is represented by:

<line x1="0" y1="0" x2="1" y2="1" stroke="red" />

while a general (closed) polygon is described by pairs of coordinates for each vertex:

<polygon points="0,0 10,0 7,10 3,10" />

Objects can be grouped using the <g> ... </g> pair. All parameters can be animated. To
describe frame-by-frame animations, the option I used in my svg writer above is to superimpose all
polygons of all frames in the same image, but grouping all polygons that belong to the same frame,
and animating the visibility parameter of each group (called the “display” attribute).
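
To make the format concrete, here is a small hedged sketch (not the svg writer mentioned above) that writes the polygon of the previous example into a stand-alone .svg file; the canvas size and fill color are arbitrary:

#include <cstdio>

int main() {
    FILE* f = fopen("polygon.svg", "w");
    fprintf(f, "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"100\" height=\"100\">\n");
    fprintf(f, "  <polygon points=\"0,0 10,0 7,10 3,10\" fill=\"black\" />\n");
    fprintf(f, "</svg>\n");
    fclose(f);
    return 0;
}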

⚠ As I realize .svg animations are extremely slow and you will not be able to load svg files containing even a few hundred frames of a few thousand Voronoï cells, I uploaded a code that rasterizes these cells and saves a bitmap (it will only work for convex polygonal cells such as Voronoï or Laguerre cells): https://pastebin.com/jVcNAE5Q

1.4 A Vector Class

While it is bad practice in software engineering, we will consider everything that has 3 floating point coordinates as a Vector. It is considered bad practice since it violates several software design rules (e.g., allowing cross products between mathematical vectors is fine, but not between colors, points, etc.; similarly, adding two vectors or a point and a vector is fine, but not two points). Still, in practice, it has become widespread in computer graphics to consider a single Vector class, to the point that it is the standard for programming languages designed for graphics cards such as GLSL or HLSL. These languages implement classes such as vec3 or float3 that contain 3 floating point values that can be accessed either via .x, .y and .z, or via .r, .g and .b (or even .s, .t and .p when referring to texture coordinates). In general, this course will take shortcuts to quickly implement prototypes and will not be a reference for software design!

A typical (partial) example of such a Vector class is provided below. You will need to fully
implement it, including operations such as dot products, cross products, vector normalization and
norm, multiplication by a scalar etc.

class Vector {
public:
    explicit Vector(double x = 0., double y = 0., double z = 0.) {
        coords[0] = x;
        coords[1] = y;
        coords[2] = z;
    }
    Vector& operator+=(const Vector& b) {
        coords[0] += b[0];
        coords[1] += b[1];
        coords[2] += b[2];
        return *this;
    }
    const double& operator[](int i) const { return coords[i]; }
    double& operator[](int i) { return coords[i]; }

private:
    double coords[3];
};
Vector operator+(const Vector& a, const Vector& b) {
    return Vector(a[0] + b[0], a[1] + b[1], a[2] + b[2]);
}
double dot(const Vector& a, const Vector& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

The explicit keyword indicates that the Vector's constructor cannot be called from implicit conversions. For instance, the code:

Vector myVector1(1., 2., 3.);
Vector result = myVector1 + 1.;

would otherwise produce the Vector result = (2., 2., 3.), resulting from the implicit conversion of the real value 1. to a Vector by an implicit call to Vector(1.). This is prone to bugs, and explicit prevents that from happening.
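
As a hedged sketch of the remaining operations mentioned above (written as free functions in the same style as operator+ and dot; the exact set and names are up to you, and norm/normalize need <cmath>):

Vector operator-(const Vector& a, const Vector& b) {
    return Vector(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}
Vector operator*(double t, const Vector& a) {            // multiplication by a scalar
    return Vector(t * a[0], t * a[1], t * a[2]);
}
Vector cross(const Vector& a, const Vector& b) {          // cross product
    return Vector(a[1] * b[2] - a[2] * b[1],
                  a[2] * b[0] - a[0] * b[2],
                  a[0] * b[1] - a[1] * b[0]);
}
double norm(const Vector& a) { return sqrt(dot(a, a)); }  // vector norm
Vector normalize(const Vector& a) { return (1. / norm(a)) * a; }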

1.5 A Triangle Mesh Class

This course will mostly manipulate triangle meshes, as they are widely used and efficient (for instance, they are natively supported by your graphics card!). These meshes consist of a set of vertices, and triplets of vertices are connected together to form triangles. The most common structure to store meshes consists of an array of vertices, and an array of triangle faces referencing these vertices. Often, additional information is stored per vertex (e.g., a color, UV coordinates, normals, etc., as we shall see later).

The most common implementation of a triangle mesh consists of an array of Vector, and an array of triplets of indices referring to the previous array. As in most cases other geometric information is stored as well (typically, at least a normal vector per vertex, but also UV coordinates that we will discuss later), we will consider multiple arrays as in the example below:

struct TriangleIndices {
    int vtxindices[3];    // refers to 3 indices in the vertices array of the class Mesh
    int normalindices[3]; // refers to 3 indices in the normals array of the class Mesh
    int uvindices[3];     // refers to 3 indices in the uvs array of the class Mesh
};
class Mesh {
public:
    // ...
private:
    std::vector<Vector> vertices;
    std::vector<Vector> normals;
    std::vector<Vector> uvs;
    std::vector<TriangleIndices> triangles;
};

The .obj file format implements this structure. It is a text file. Each line starting with a v defines a vertex coordinate (e.g., v 1.0 3.14 0.00), and each line starting with an f defines a face (most often a triangle, but it also supports more general polygonal faces – e.g., f 1 2 3 defines a triangle consisting of the first 3 vertices, as indexing starts at 1). Negative indices correspond to offsets relative to the end of the vertex list. Normal vectors start with vn, and uv coordinates with vt. The general syntax to define a triangle that has normal and uv coordinates is f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3. I uploaded a (poorly coded) obj file reader at https://pastebin.com/CAgp9r15.
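
For reference, here is a hedged sketch of a minimal reader handling only plain v and f lines (the provided reader above is more complete and also parses the f v/vt/vn syntax); it reuses the Vector and TriangleIndices types defined earlier:

#include <cstdio>
#include <vector>

// reads only "v x y z" and "f i j k" lines; 1-based obj indices are converted to 0-based
void read_obj_minimal(const char* filename, std::vector<Vector>& vertices,
                      std::vector<TriangleIndices>& triangles) {
    FILE* f = fopen(filename, "r");
    char line[1024];
    while (fgets(line, 1024, f)) {
        if (line[0] == 'v' && line[1] == ' ') {
            double x, y, z;
            sscanf(line, "v %lf %lf %lf", &x, &y, &z);
            vertices.push_back(Vector(x, y, z));
        } else if (line[0] == 'f' && line[1] == ' ') {
            TriangleIndices t;
            sscanf(line, "f %d %d %d", &t.vtxindices[0], &t.vtxindices[1], &t.vtxindices[2]);
            for (int k = 0; k < 3; k++) t.vtxindices[k]--;   // 1-based to 0-based
            triangles.push_back(t);
        }
    }
    fclose(f);
}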
Chapter 2

Rendering

Two main approaches to rendering have been adopted, focusing either on producing images at fast framerates for realtime applications (video games, simulators, fast previews of complex scenes, visualization, augmented reality, etc.) or on producing images that are realistic (mostly for the movie industry) or even physically accurate (lighting simulation for architecture, car paint and light design, etc.). We will briefly discuss real-time rendering (Sec. 2.1), and cover physically-based rendering in more depth (Sec. 2.2). From an implementation point of view, realtime rendering systems based on OpenGL or DirectX are often more time-consuming to produce results of similar quality than methods tailored for physically correct results; however, they allow framerates that are difficult to achieve with physically correct methods (Fig. 2.1). To remedy implementation issues, most realtime 3d applications are based on complex rendering engines, such as Unity, Unreal Engine, or Amazon Lumberyard (or CryEngine), that make development much faster.

2.1 Real-Time Rendering

Modern real-time rendering systems rely on the rasterization of triangle meshes and the use of Graphics
Processing Units (GPUs). These triangles are thus projected on the screen, and shaded according to
their materials and the various light sources in the scene. This section briefly describes the process,
and will be skimmed over in class.

2.1.1 Projection and homogeneous coordinates

Given the 3d coordinates of the vertices of a triangle, these coordinates are first projected on the 2d screen using:

$$p' = P\,V\,M\,p$$

where M is the model matrix that represents the 3d transformation applied to the 3d mesh itself, V the view matrix that represents the inverse of the camera transformation (typically, if the camera translates to the right, it is equivalent to translating the point to the left!), and P is a projection matrix that mainly depends on the field of view.

As you notice, these are linear transforms, and you may wonder how this could ever produce a perspective projection or even a translation (which is an affine transform, not purely linear)... In fact, we work with homogeneous coordinates that handle projective geometry! In this context, our point p typically consists of coordinates (x, y, z, w = 1). The model and view matrices are 4 × 4 matrices


             Basic                                    Intermediate               Advanced
OpenGL       ~60 (deprecated) to 4000 lines of C++    ~600 (deprecated) lines    ~18k lines
             Realtime (< millisecond)                 Realtime (< millisecond)   Realtime (milliseconds)
Path tracing ~120 lines                               ~300 lines                 ~900 lines
             ~Realtime (~4 ms)                        2 seconds                  4 min.

Figure 2.1: Basic OpenGL rendering can be obtained in a few lines of deprecated OpenGL (reference implementation in about 60 useful lines of code here: https://www.opengl.org/archives/resources/code/samples/redbook/colormat.c using glutSolidSphere) or thousands of non-deprecated lines of code (e.g., see here for a typical implementation using shaders: http://www.songho.ca/opengl/gl_sphere.html). Intermediate OpenGL rendering that features soft shadows can be performed in about 600 lines of code with deprecated features (http://fabiensanglard.net/shadowmappingPCF/). Advanced OpenGL rendering with indirect lighting quickly becomes difficult to implement (example implementation in 18k lines of C++ code: https://github.com/djbozkosz/Light-Propagation-Volumes). Note that OpenGL 3.1 (March 2009) removed many features that had been deprecated in OpenGL 3.0 (August 2008), making codes significantly longer and also explaining variations in the above code lengths. The images shown for path tracing are those obtained from the code developed in this class, and even include indirect lighting and participating media (advanced rendering). Also, sacrificing readability can always shorten code. A small path tracer that produces something along the Intermediate result in terms of complexity but Advanced in terms of lighting simulation can be achieved in 99 lines of code here: https://www.kevinbeason.com/smallpt/.

that now allow for translating the geometry via their fourth column. Similarly, the fourth component $w$ gets transformed to $w'$ via the fourth row of the 4 × 4 matrix $P$. The projection on the screen is performed by simply considering the projected point $p'' = \left[\frac{p'_x}{p'_w}, \frac{p'_y}{p'_w}\right]$, and its corresponding depth $\frac{p'_z}{p'_w}$ is used to determine which parts are visible or occluded.
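
For concreteness, one common (OpenGL-style) form of $P$ is sketched below; it is not given in the text, so the notations (vertical field of view $\alpha_v$, aspect ratio $a = W/H$, near and far clipping planes $n$ and $f$) are assumptions of this illustration:

$$
P = \begin{pmatrix}
\frac{1}{a\,\tan(\alpha_v/2)} & 0 & 0 & 0\\
0 & \frac{1}{\tan(\alpha_v/2)} & 0 & 0\\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n}\\
0 & 0 & -1 & 0
\end{pmatrix}
$$

Its last row sets $p'_w = -p_z$, which is precisely what makes the division by $p'_w$ above produce a perspective projection.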

2.1.2 Rasterization

Once 2d screen projections are known for the three vertices of a triangle, it remains to fill the pixels inside this 2d triangle. This is often performed by computing an axis-aligned bounding box around the triangle, and testing whether the center of each pixel within this box belongs to the triangle (we will implement such a test in 3d for raytracing 3d meshes in Sec. 2.2.1 using barycentric coordinates). This also allows interpolating quantities stored at the vertices of the triangle such as normals, colors or the depth value that will be used next.
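
As an illustration of this coverage test, here is a minimal hedged 2d sketch using signed edge functions (equivalent to checking the signs of barycentric coordinates); the function names and the counter-clockwise orientation assumption are mine, not from the text:

// signed area of the parallelogram spanned by (b-a) and (p-a); positive if p lies
// to the left of the edge a->b (counter-clockwise triangle orientation assumed)
double edge_function(double ax, double ay, double bx, double by, double px, double py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// returns true if the pixel center (px, py) is inside the 2d triangle (x0,y0),(x1,y1),(x2,y2)
bool inside_triangle(double px, double py,
                     double x0, double y0, double x1, double y1, double x2, double y2) {
    double w0 = edge_function(x1, y1, x2, y2, px, py);
    double w1 = edge_function(x2, y2, x0, y0, px, py);
    double w2 = edge_function(x0, y0, x1, y1, px, py);
    return (w0 >= 0 && w1 >= 0 && w2 >= 0);  // w0, w1, w2 are unnormalized barycentric coordinates
}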

2.1.3 The Z-Buffer

When two 2d triangles overlap, their depth should be used to determine which one is in front of the other. An approximate solution is to use what is known as the painter's algorithm: 3d triangles are simply sorted by the distance between their center and the camera before being rasterized. However, this technique can be costly for large scenes, and cannot handle all cases (triangle A partly in front of triangle B, which is partly in front of triangle C, which is partly in front of triangle A). For this reason, Wolfgang Straßer, Ivan Sutherland (Turing award) and Ed Catmull (founder of Pixar and former president of Disney Animation Studios) independently described in 1974 a technique called the Z-Buffer. This technique simply stores the depth of each pixel being rasterized in a buffer called the z-buffer. If the depth of the current pixel being rasterized is further than the depth already stored in the z-buffer for this pixel, the pixel is simply ignored.
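
A hedged sketch of the corresponding depth test (the buffer layout and function name are illustrative choices, not from the text):

#include <vector>

// writes one rasterized fragment at pixel (x, y) only if it is closer than what is
// already stored; zbuffer holds W*H depths initialized to +infinity
void write_fragment(int x, int y, float depth,
                    unsigned char r, unsigned char g, unsigned char b,
                    int W, std::vector<float>& zbuffer, std::vector<unsigned char>& image) {
    if (depth < zbuffer[y * W + x]) {       // closer than the closest fragment seen so far?
        zbuffer[y * W + x] = depth;
        image[(y * W + x) * 3 + 0] = r;
        image[(y * W + x) * 3 + 1] = g;
        image[(y * W + x) * 3 + 2] = b;
    }                                        // otherwise the fragment is occluded and ignored
}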

2.1.4 Coloring pixels

A relevant color should be assigned to the pixel being rasterized. This color depends on the material
of the object, the orientation of the surface (its normal), and the various light sources. The precise
illumination model that is routinely used will be described later in the course in the context of
raytracing, and consists in the Blinn-Phong BRDF (Sec. 2.2.1).

2.1.5 OpenGL and DirectX

The operations described previously are automatically performed by graphics libraries such as OpenGL
or DirectX. These libraries take as input triangle meshes seen as arrays of vertices and polygons as
well as transformation matrices (camera transformation, model transformation, projection matrix),
and perform rasterization on the GPU.

Typical OpenGL < 3.0 code to display a triangle looks like:

glBegin(GL_TRIANGLES);
glNormal3f(0.f, 0.f, -1.f);
glColor4f(1.f, 0.f, 0.f, 1.f);
glVertex3f(0.f, 1.f, 0.f);
glColor4f(0.f, 1.f, 0.f, 1.f);
glVertex3f(-1.f, 0.f, 0.f);
glColor4f(0.f, 0.f, 1.f, 1.f);
glVertex3f(1.f, 0.f, 0.f);
glEnd();

while in OpenGL ≥ 3.0, a similar triangle would look like:

// define geometry
GLfloat positions[3 * 3] = { 0.f, 1.f, 0.f, -1.f, 0.f, 0.f, 1.f, 0.f, 0.f };
GLfloat normals[3 * 3] = { 0.f, 0.f, -1.f, 0.f, 0.f, -1.f, 0.f, 0.f, -1.f };
GLfloat colors[4 * 3] = { 1.f, 0.f, 0.f, 1.f, 0.f, 1.f, 0.f, 1.f, 0.f, 0.f, 1.f, 1.f };
GLuint indices[3] = { 0, 1, 2 };

// generates a vertex array with 3 buffers for positions, normals and colors
GLuint vertexArray, vertexBufferObject[3];
glGenVertexArrays(1, &vertexArray);
glBindVertexArray(vertexArray);
glGenBuffers(3, vertexBufferObject);

// associate the arrays on the CPU to attributes stored on the GPU
// glBufferData uploads the array to the GPU
glBindBuffer(GL_ARRAY_BUFFER, vertexBufferObject[0]);
glBufferData(GL_ARRAY_BUFFER, 9 * sizeof(GLfloat), positions, GL_STATIC_DRAW);
glVertexAttribPointer((GLuint)0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(0);

glBindBuffer(GL_ARRAY_BUFFER, vertexBufferObject[1]);
glBufferData(GL_ARRAY_BUFFER, 9 * sizeof(GLfloat), normals, GL_STATIC_DRAW);
glVertexAttribPointer((GLuint)1, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(1);

glBindBuffer(GL_ARRAY_BUFFER, vertexBufferObject[2]);
glBufferData(GL_ARRAY_BUFFER, 12 * sizeof(GLfloat), colors, GL_STATIC_DRAW);
glVertexAttribPointer((GLuint)2, 4, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(2);

// define a triangle as 3 vertex indices
GLuint elementbuffer;
glGenBuffers(1, &elementbuffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, elementbuffer);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, 3 * sizeof(unsigned int), &indices[0], GL_STATIC_DRAW);

// draw this triangle
glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, (void*)0);

Note that an entire mesh would be handled by creating a larger array of triangles (and not by repeating this routine for all triangles!). In both examples, large parts would be missing to see something, including the definition of lights, cameras and materials. However, an important missing routine in the second example is the definition of shaders that actually perform the computations (projecting the vertices on screen in a vertex shader or vertex program, and computing the pixel color based on the lighting in the scene in a pixel shader or fragment program) – by default, OpenGL < 3.0 has these things already handled.

These functions remain extremely simple and limited: they merely allow you to draw triangles on screen, very efficiently. But most of the job remains to be done: making these triangles look realistic. This is most often accomplished via a number of tricks, of which we will give a few examples below.

2.1.6 Advanced effects

Hard shadows from a spot light. The first thing that is obviously missing when displaying triangles on screen is cast shadows. The most common way to cast shadows in a rasterizer is to compute a shadow map. A shadow map is a view of the scene from the light's point of view, storing only depth information (you do not need to keep or even compute color values). This depth image is precomputed for each light source (Fig. 2.2). Then, when computing the color value of a given pixel, it becomes easy to reproject this pixel onto the shadow map via transformation matrices, and to check whether the light arriving from the light source is obstructed, by comparing the distance between the shaded point and the light source with the value stored in the shadow map.

Indirect lighting. A simple solution to obtain indirect lighting¹ consists of the Instant Radiosity technique. This technique, despite its name, has nothing to do with the usual radiosity algorithm (see Sec. 2.2.4): it consists in rendering the scene (both colors and z-buffer) from the light sources, and then placing new secondary light sources (often called Virtual Point Lights) in the scene whose colors match the rendered image. Clustering can be performed in realtime to group all nearby pixels of the same color in the image rendered from the light source, to construct a single secondary light source for each cluster of pixels.

1. See Sec. 2.2.1 for more details on indirect lighting in the context of raytracing.

Figure 2.2: Realtime rendering with shadows (left) obtained by a shadowmap (right) that corresponds to the z-buffer of the scene as seen from the light source.

2.2 Physically-Based Rendering

This section covers basics of physically-based rendering to the point that you should be able to im-
plement a path tracer (while it will not work at the speed of production engines, it would give close
to production level quality), and have minimal knowledge of other techniques.

2.2.1 Raytracing / Path-Tracing

Path tracing works by launching rays of light from a virtual camera throughout the scene, computing ray/scene intersections, evaluating light contributions from light sources and making these light rays bounce off the objects. While this approach works counter-intuitively to real-world physics (in which light rays are emitted from light sources rather than the camera!), it can be shown to be strictly equivalent due to the Helmholtz reciprocity principle: all that counts is the set of light paths joining the camera sensor and the light sources. In fact, an approach called bidirectional path tracing benefits both from rays emitted from the camera and rays emitted from light sources to construct these light paths.

Rendering basic spheres

We will first write a small program that renders and shades a few spheres with direct lighting. First, "launching rays" from the camera into the scene corresponds to generating half-lines (rays) which originate at the camera location and point towards each pixel of the camera sensor, and computing the point of intersection of these half-lines with the scene (i.e., the spheres).

The ingredients thus are:

1. Defining classes and operators for handling geometric computations

2. Defining a scene

3. Computing the direction of rays

4. Computing the intersection between a ray and a sphere

5. Computing the intersection between a ray and the scene



6. Computing the color

1 Classes Regarding operators, we will define classes for Vector (see Sec. 1.4), Sphere (a center Vector C and a double radius R; we will also add a color, called albedo, stored in a Vector as an RGB triple ∈ [0, 1]³), Ray (an origin Vector O and a unit direction Vector u), and Scene (an array/std::vector of Spheres). A Sphere will further possess a function intersect that computes the point of intersection between a Ray and the sphere, if any (at this stage, we can either return a bool indicating whether an intersection occurred and pass the relevant intersection information as parameters passed by reference; or we can return an Intersection structure that contains all the relevant information, including a bool flag). A Scene will also possess a similar function.

2 Scene For reproducibility purposes, we can define a standard scene as in Fig. 2.3, which we will use throughout this course. To simplify the introduction, we will first focus on the center sphere. Also for simplicity, we consider the camera as standing upright and looking in the −z direction.

Figure 2.3: We define a standard scene that consists of walls, a ground and a ceiling, all made of gigantic spheres approximating planes, plus a center sphere which we will focus on as a first step. In the diagram, the camera is at Q = (0, 0, 55) with a 60° horizontal field of view, looking towards −z; the center sphere has radius R = 10 and center C = (0, 0, 0); the light source is at S = (-10, 20, 40); the ceiling sphere is centered at (0, 1000, 0) with radius 1000-60 = 940, the back and front wall spheres are centered at (0, 0, -1000) and (0, 0, 1000) with radius 1000-60 = 940, and the ground sphere is centered at (0, -1000, 0) with radius 1000-10 = 990.

3 Computing the direction of rays Our camera consists of a center $Q$ and a virtual plane that makes up the screen (or similarly the sensor, if you see our camera as a pinhole camera, see Sec. 2.2.1), see Fig. 2.4. Assuming the screen is at a distance $f$ from the camera center $Q = (Q_x, Q_y, Q_z)$, we will consider that, in our configuration, pixel $(x, y)$ is located at coordinate $(Q_x + x + 0.5 - W/2,\ Q_y + y + 0.5 - H/2,\ Q_z - f)$. However, one usually only knows $\alpha$, the visual angle covering the $W$ pixels in width (called horizontal field of view, or fov), not $f$. Simple trigonometry shows that $\tan(\alpha/2) = (W/2)/f$, such that pixels are located at coordinates $(Q_x + x + 0.5 - W/2,\ Q_y + y + 0.5 - H/2,\ Q_z - W/(2\tan(\alpha/2)))$. Note that in our pixel grid, we will index pixels by their row and column number $(i, j)$. Since image rows are most often stored from top to bottom, this corresponds to using $(x, y) = (j, H - i - 1)$, with $i \in \{0..H-1\}$ and $j \in \{0..W-1\}$. From the coordinate of each pixel and the camera center, we can simply compute a normalized ray direction.

Figure 2.4: Notations for a virtual camera (3D view and top view).
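
Putting the previous paragraph into code, here is a hedged sketch of the ray generation loop; it assumes the Vector/Ray classes described above, the camera center Q and image size W × H of the standard scene, and <cmath> for tan and M_PI:

double alpha = 60. * M_PI / 180.;              // horizontal field of view of the standard scene
double dist = W / (2. * tan(alpha / 2.));      // distance from Q to the virtual screen
for (int i = 0; i < H; i++) {
    for (int j = 0; j < W; j++) {
        // pixel (i, j) corresponds to (x, y) = (j, H - i - 1); the screen lies at z = Qz - dist
        Vector dir(j + 0.5 - W / 2., (H - i - 1) + 0.5 - H / 2., -dist);
        Ray ray(Q, normalize(dir));            // unit ray direction from the camera center Q
        // ... intersect the ray with the scene and shade (next paragraphs)
    }
}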

4 Ray-Sphere intersection A parametric equation of a ray of origin $O$ and direction $u$ is $X(t) = O + t\,u$, with $t > 0$. An implicit equation of a sphere centered at $C$ with radius $R$ is $\|X - C\|^2 = R^2$. A point of intersection $P$, if any, satisfies both equations. Plugging the first equation into the second yields $\|O + t\,u - C\|^2 = R^2$. Expanding the squared norm and using the bilinearity of the scalar product yields $t^2\|u\|^2 + 2\,t\,\langle u, O - C\rangle + \|O - C\|^2 = R^2$. Assuming unit norm for $u$ leads to the simple quadratic equation:

$$t^2 + 2\,t\,\langle u, O - C\rangle + \|O - C\|^2 - R^2 = 0$$

A quadratic equation has 0, 1 or 2 real solutions depending on the discriminant, which has geometric interpretations here (see Fig. 2.5). Denoting $\Delta = \langle u, O - C\rangle^2 - (\|O - C\|^2 - R^2)$ the reduced discriminant, no intersection between the line (not the ray) and the sphere is found if $\Delta < 0$, one (double) intersection is found if $\Delta = 0$, and two are found if $\Delta > 0$. However, one needs to further check that the solution parameter $t$ is non-negative, since otherwise the intersection would occur behind the ray origin. Further, in the context of ray tracing, only the first non-negative intersection is of interest, i.e., the (positive) intersection closest to the ray origin. If $\Delta \geq 0$, the two possible intersection parameters are $t_1 = \langle u, C - O\rangle - \sqrt{\Delta}$ and $t_2 = \langle u, C - O\rangle + \sqrt{\Delta}$. If $t_2 < 0$, the ray does not intersect the sphere. Otherwise, if $t_1 \geq 0$, then $t = t_1$, else $t = t_2$. The intersection point $P$ is located at $P = O + t\,u$. For further lighting computations, we will also need to retrieve the unit normal $N$ at $P$. It can be simply obtained using $N = \frac{P - C}{\|P - C\|}$. We are now ready to produce a first image, by scanning all pixels in the pixel grid, throwing rays, and testing if there is any intersection. If any intersection is found, just setting the pixel white results in Fig. 2.6 (considering only the central sphere of our standard scene in Fig. 2.3).
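
A hedged sketch of the corresponding intersection routine follows; the exact signature (returning the first positive t, the point P and the normal N by reference) is one possible choice, and it relies on the Vector helpers (dot, normalize, operators) of Sec. 1.4:

bool Sphere::intersect(const Ray& r, Vector& P, Vector& N, double& t) const {
    Vector OC = r.O - C;
    double b = dot(r.u, OC);                          // = <u, O - C>
    double delta = b * b - (dot(OC, OC) - R * R);     // reduced discriminant
    if (delta < 0) return false;                      // the line misses the sphere
    double sqdelta = sqrt(delta);
    double t2 = -b + sqdelta;                         // = <u, C - O> + sqrt(delta)
    if (t2 < 0) return false;                         // the sphere is entirely behind the ray
    double t1 = -b - sqdelta;
    t = (t1 >= 0) ? t1 : t2;                          // first non-negative intersection
    P = r.O + t * r.u;
    N = normalize(P - C);                             // unit outward normal
    return true;
}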

5 Ray-Scene intersection Our scene is composed of multiple spheres (for now). The intersection we are interested in, between a ray and the scene, is the ray-sphere intersection that is closest to the ray origin among all candidates (if any). It can also be useful to return the specific sphere or object ID that has been hit, to retrieve object-specific properties such as material parameters.
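
A hedged sketch of the ray-scene intersection, reusing the Sphere::intersect signature sketched above and keeping the closest hit over all spheres (<limits> provides the initial "infinite" distance); the parameters match the intersect(ray, P, N, sphere_id) call used in the pseudo-code further below:

bool Scene::intersect(const Ray& r, Vector& P, Vector& N, int& sphere_id) const {
    double best_t = std::numeric_limits<double>::max();
    bool has_inter = false;
    for (int i = 0; i < (int)spheres.size(); i++) {
        Vector localP, localN;
        double t;
        if (spheres[i].intersect(r, localP, localN, t) && t < best_t) {
            best_t = t;            // closest intersection so far
            P = localP;
            N = localN;
            sphere_id = i;         // remember which object was hit
            has_inter = true;
        }
    }
    return has_inter;
}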

6 Shading and shadows computation For now, we will use a simple material model: the Lambertian model. This model assumes that materials scatter light equally in all directions, regardless of the incoming light direction. This represents diffuse materials such as plaster well, but will not handle shiny materials such as metals or plastics. Under this model, the intensity reflected off a surface at point $P$ with albedo $\rho$ and normal $N$, illuminated by an omnidirectional light source of intensity $I$ at position $S$, is given by

$$L = \frac{I}{4\pi d^2}\,\frac{\rho}{\pi}\,V_P(S)\,\langle N, \omega_i\rangle$$

with $\omega_i = \frac{S - P}{\|S - P\|}$ and $d = \|S - P\|$. The visibility term $V_P(S)$ is such that $V_P(S) = 1$ if $S$ is "visible" from $P$ and $0$ otherwise. "Visible" means that launching a ray from $P$ with direction $\omega_i$ (towards $S$) will either encounter no intersection, or encounter an intersection that is further than the light source², that is, $t > d$.

2. These visibility or shadow rays often benefit from faster intersection routines, as the exact point of intersection is not required but merely the presence of an intersection within an interval; feel free to do that.

Figure 2.5: Ray-sphere intersections lead to solving a quadratic equation. Depending on the sign of the discriminant, this leads to either 0, 1 or 2 points of intersection. Here, ray A misses the sphere ($\Delta < 0$), ray B leads to one (double) intersection ($\Delta = 0$), ray C produces a first intersection of interest at $t_1$, ray D produces the intersection of interest at $t_2$ (the other intersection being behind the ray origin), and ray E only has intersections behind its origin ($t_1 < 0$, $t_2 < 0$).

Figure 2.6: Computing the ray-sphere intersection at each pixel leads to our first image. Ok, that’s
just a plain white disk, don’t be too excited.

The term $\frac{I}{4\pi d^2}$ merely says that a light intensity of $I$ Watt will be spread over a sphere surface of $4\pi d^2$, and the amount reaching point $P$ is thus $\frac{I}{4\pi d^2}$ Watt.sr$^{-1}$.m$^{-1}$ (sr stands for steradian, a unit of solid angle). The term $\frac{\rho}{\pi}$ is essentially a convention: with albedo values $\rho$ ranging in $[0..1]$, the material respects energy conservation (see Sec. 2.2.1) if $\int_{S^+} c\,\langle N, \omega_i\rangle\,\mathrm{d}\omega_i \leq 1$ for some normalization constant $c$, where $S^+$ is the hemisphere above the surface. Since $\int_{S^+} \langle N, \omega_i\rangle\,\mathrm{d}\omega_i = \pi$, $c = 1/\pi$.

⚠ Due to numerical precision issues, you will certainly observe extreme noise levels (see Fig. 2.7). This is due to the fact that when launching a ray from point $P$ towards the light source $S$, the first point of intersection that may be found is $P$ itself, since precision is limited. The solution to this issue is to launch the ray not from $P$, but from a point slightly above the surface, $P + \varepsilon N$. Since we are launching rays from a slightly elevated position, it could be that $\langle N, \omega_i\rangle < 0$ at grazing angles. For safety, we will use $\max(\langle N, \omega_i\rangle, 0)$ instead.
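
As a summary of this paragraph, here is a hedged fragment of the diffuse-shading code, to be placed at the intersection point P with normal N inside the color computation; it reuses the hypothetical Scene::intersect signature sketched earlier, the light position S and intensity I of the standard scene, the albedo of the intersected sphere, the ε offset just discussed, and the Vector helpers (<cmath> and <algorithm> assumed):

Vector omega_i = normalize(S - P);
double d2 = dot(S - P, S - P);                      // squared distance to the light source
double visibility = 1.;
Vector shadowP, shadowN;
int shadow_id;
Ray shadow_ray(P + 1e-4 * N, omega_i);              // offset origin: P + epsilon*N
if (scene.intersect(shadow_ray, shadowP, shadowN, shadow_id)) {
    double d2_shadow = dot(shadowP - shadow_ray.O, shadowP - shadow_ray.O);
    if (d2_shadow < d2) visibility = 0.;            // an occluder lies between P and the light
}
double scale = I / (4. * M_PI * d2) / M_PI * visibility * std::max(dot(N, omega_i), 0.);
Vector color = scale * albedo;                      // albedo of the intersected sphere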

Gamma correction. Computer screens do not react linearly to the pixel intensities they are fed. For instance, a linear ramp from pure black to pure white results in a midpoint that seems too dark (Fig. 2.8). To compensate for this effect, we apply gamma correction to the images

Figure 2.7: Due to numerical precision issues in shadow computations, the image appears noisy (left). Launching rays from an offset origin solves this issue (right).

produced by our path tracer. This consists in raising RGB values (in a normalized range [0, 1]) to the power $1/\gamma$, with typically $\gamma = 2.2$. One reason for the need to gamma-correct images is a more perceptually uniform image encoding. Indeed, noise, compression artifacts or quantization artifacts are often more visible on dark pixels than on bright pixels. To allow for more accuracy on darker values, the quantization is made non-uniform by storing gamma-corrected images. Additionally, (integer) pixel values should be clamped to the range {0..255} to avoid overflowing the unsigned char type, which would result in wraparound. You can see the result of gamma correction on our test scene in Fig. 2.9.

Figure 2.8: A linear ramp (top) and gamma-corrected linear ramp (bottom, with γ = 2.2). The linear
ramp’s midpoint appears too dark. Note that perceived results may vary depending on specific screen
settings.

Figure 2.9: Without gamma correction, the scene appears too contrasted (here, $I = 2\cdot10^7$). With gamma correction (and $I = 2\cdot10^{10}$), the scene appears more natural. At this stage, we have roughly 170 lines of (verbose) code which runs in 50 ms without parallelization (see end of Sec. 2.2.1) for a 512x512 image.

⚠ A common bug is to gamma-correct or clamp intensity values at every bounce of the light, which is not correct. This typically results in a lack of contrast. These operations compensate for specific image formats. For instance, High Dynamic Range (HDR) formats such as .exr, .pfm or .hdr do not need gamma correction, as this step is usually performed by the image viewer. As such, these should be the very last steps, performed only once, right before saving the image to disk, and should not be involved in the light simulation process.

Adding reflections and refractions

Reflections. Contrary to Lambertian surfaces that scatter light in all directions, (purely) reflective/specular surfaces only reflect light in a single direction. It is easy to see that the direction $\omega_r$ reflected from an incident direction $\omega_i$ off a surface with normal $N$ is $\omega_r = \omega_i - 2\langle\omega_i, N\rangle N$ (see Fig. 2.10). A perfect mirror thus only transfers light energy from the incident direction to the reflected direction.

Figure 2.10: The reflected direction is $\omega_r = \omega_i - 2\langle\omega_i, N\rangle N$ (left); refracted direction $\omega_t$ across an interface between media of refraction indices $n_1$ and $n_2$, with incident and transmitted angles $\theta_i$ and $\theta_t$ (right).

⚠ A common bug is to compute a visibility term (or shadows) on top of the reflected light. You should not do it. Visibility is a shadowing term that refers to specific light sources. A reflective surface will not see our point light sources (they are infinitesimally small), and light sources will not play any role here. Mirrors reflect light coming from all directions, not just that of our point light sources.

In terms of implementation, handling reflections will add one of the most important building blocks of our path tracer. Reflective surfaces lead to recursive code: to compute the light arriving at the camera sensor, you need to know the amount of light arriving at P from the reflected direction $\omega_r$. But the light coming from this reflected direction could itself be the result of another mirror reflecting light from elsewhere (and so on). As such, you will now build your first path throughout the scene. A typical recursive implementation/pseudo-code would look like:

Vector Scene::getColor(const Ray& ray, int ray_depth) {
    if (ray_depth < 0) return Vector(0., 0., 0.);  // terminates recursion at some point

    if (intersect(ray, P, N, sphere_id)) {
        if (spheres[sphere_id].mirror) {
            Ray reflected_ray = ....;
            return getColor(reflected_ray, ray_depth - 1);
        } else {
            // handle diffuse surfaces
        }
    }

}

int main() {
    // first define the scene, variables, ...
    // then scan all pixels
    for (int i = 0; i < H; i++) {
        for (int j = 0; j < W; j++) {
            Ray ray(...);  // cast a ray from the camera center to pixel i, j
            Vector color = scene.getColor(ray, max_path_length);
            pixel[i * W * 3 + j * 3 + 0] = std::min(255., std::pow(color[0], 1. / 2.2));  // stores the R channel
            // same for green and blue
        }
    }
    // save image and return 0
}

Note that similarly to cast shadows, you need to offset the starting point of the reflected ray off
the surface to avoid numerical issues. This will also be the case later for transparent surfaces, indirect
lighting etc. and will not be repeated any further.

Refractions. The case of transparent surfaces is very similar to that of mirrors. For transparent objects, rays also continue their lives by bouncing off the surface, but this time, passing through it. The computation of the refracted direction is however slightly more involved. For that, we assume the Snell-Descartes law, written here as $n_1 \sin\theta_i = n_2 \sin\theta_t$. This law essentially says that the tangential component of the transmitted ray ($\sin\theta_t$) is stretched from that of the incoming ray ($\sin\theta_i$) by a factor $n_1/n_2$. Decomposing the transmitted direction $\omega_t$ into tangential and normal components $\omega_t = \omega_t^T + \omega_t^N$, it is easy to deduce that

$$\omega_t^T = \frac{n_1}{n_2}\left(\omega_i - \langle\omega_i, N\rangle N\right)$$

where we have used the fact that the tangential component of $\omega_i$ is $\omega_i$ minus its normal component (its projection on $N$).

Regarding the normal component, we have $\omega_t^N = -N\cos\theta_t$ (considering the normal $N$ is pointing towards the incoming ray). This amounts to $\omega_t^N = -N\sqrt{1 - \sin^2\theta_t}$. And since we have the Snell-Descartes law, this equals $\omega_t^N = -N\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\sin^2\theta_i} = -N\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2(1 - \cos^2\theta_i)}$. The cosine can be computed by projecting on the normal $N$, so:

$$\omega_t^N = -N\sqrt{1 - \left(\frac{n_1}{n_2}\right)^2\left(1 - \langle\omega_i, N\rangle^2\right)}$$

From this equation, one can see that if $1 - \left(\frac{n_1}{n_2}\right)^2\left(1 - \langle\omega_i, N\rangle^2\right)$ becomes negative, the square root would lead to imaginary results... This can only occur if $n_1 > n_2$. This corresponds to a total internal reflection, and occurs if $\sin\theta_i > \frac{n_2}{n_1}$.

⚠ During the computations, we made sure the normal $N$ was pointing towards the incoming ray. This is typically the case when the ray enters a sphere. However, when the ray exits the sphere, the geometric normal returned by our intersection test has the wrong sign. Make sure to use the correct refraction indices and normal sign in this case! You can detect the case of a ray exiting the transparent sphere when $\langle\omega_i, N\rangle > 0$. Also, make sure to offset the starting point of your refracted ray... on the correct side! In general, for refraction, beware of signs.
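
The two bounce directions derived above can be summarized in the following hedged sketch; wi denotes the incoming ray direction, N the unit normal oriented towards the incoming ray, and the helper names are mine (they rely on the Vector operators of Sec. 1.4):

Vector reflect(const Vector& wi, const Vector& N) {
    return wi - 2. * dot(wi, N) * N;              // mirror direction
}

// returns false in case of total internal reflection (only the reflected ray then exists)
bool refract(const Vector& wi, const Vector& N, double n1, double n2, Vector& wt) {
    double cosi = dot(wi, N);
    double k = 1. - (n1 / n2) * (n1 / n2) * (1. - cosi * cosi);
    if (k < 0) return false;                      // total internal reflection
    Vector wtT = (n1 / n2) * (wi - cosi * N);     // tangential component
    Vector wtN = -sqrt(k) * N;                    // normal component
    wt = wtT + wtN;
    return true;
}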

A trick to simulate hollow spheres is to create two spheres with the same center and slightly different radii, and then invert the normals of the inner sphere. A result showing reflection and refraction on a full and a hollow sphere is shown in Fig. 2.11. Also, ideally, the index of refraction should depend on the wavelength. To achieve dispersion, we would throw rays of a single wavelength each and combine them on the sensor; we will not do that here.

Figure 2.11: A sphere with reflection, a full sphere with refraction, and a hollow sphere with refraction. Notice how the full sphere inverts the scene behind it, as it acts as a lens. The refraction index used is 1.5, corresponding to glass. The image is computed in 75 ms (without parallelization) with about 230 lines of code.

Fresnel law. Both the coefficients of reflection and transmission are fully determined by the refraction indices $n_1$ and $n_2$, via the Fresnel equations. In practice, these equations are relatively costly to evaluate, and one often relies on Schlick's approximation of the Fresnel coefficients. For dielectrics, this reads:

$$k_0 = \frac{(n_1 - n_2)^2}{(n_1 + n_2)^2}$$
$$R = k_0 + (1 - k_0)\left(1 - |\langle N, \omega_i\rangle|\right)^5$$
$$T = 1 - R$$

where $k_0$ is the reflection coefficient at normal incidence, $R$ is the reflection coefficient for incidence $\omega_i$, and $T$ the transmission coefficient. An option could be to call our function Scene::getColor twice, once for the reflected ray and once for the refracted ray, modulate the two resulting colors with the reflection and transmission coefficients, and sum them. However, this would double the number of rays in the scene for each light bounce. Instead, we will randomly launch either a reflection ray, or a refraction ray. For that, we draw a (uniform) random number $u$ between 0 and 1, and launch a reflection ray if $u < R$ and a refraction ray otherwise. We then do not need to rescale the resulting value. Of course, this would result in an extremely noisy image since adjacent pixels will get assigned different random numbers. As such, we will launch multiple rays for each pixel, resulting in multiple paths, and average the resulting color. This scheme will be further discussed along with Monte Carlo integration next, in Sec. 2.2.1.
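
A hedged fragment illustrating Schlick's approximation and the random choice between reflection and refraction inside Scene::getColor; engine/uniform are assumed to be a std::default_random_engine and a std::uniform_real_distribution<double>(0, 1) from <random>, wi is the incident ray direction, n1/n2 are assumed already swapped when the ray exits the sphere, and reflect/refract are the hypothetical helpers sketched above:

double k0 = (n1 - n2) * (n1 - n2) / ((n1 + n2) * (n1 + n2));
double R = k0 + (1. - k0) * std::pow(1. - std::abs(dot(N, wi)), 5.);  // Schlick's reflection coefficient
if (uniform(engine) < R) {
    Ray reflected_ray(P + 1e-4 * N, reflect(wi, N));        // reflect with probability R
    return getColor(reflected_ray, ray_depth - 1);
} else {
    Vector wt;
    if (!refract(wi, N, n1, n2, wt))                        // total internal reflection
        return getColor(Ray(P + 1e-4 * N, reflect(wi, N)), ray_depth - 1);
    Ray refracted_ray(P - 1e-4 * N, wt);                    // offset on the other side!
    return getColor(refracted_ray, ray_depth - 1);
}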

⚠ To avoid noisy images, you need to average the result of multiple paths. It is extremely important that for each light bounce in the scene, a single call to Scene::getColor is performed. To make it clearer: you launch K rays from the camera center C to the same pixel (i, j), then for each light bounce of these rays you send (at most) one secondary ray (for reflection, transmission, or indirect lighting as we will see next). This results in K paths throughout the scene, resulting in K different colors. You then average these K colors to obtain the pixel value. Never recursively call Scene::getColor more than once: this would result in impractically many secondary rays.

Note: you can similarly handle multiple point light sources by adding the contribution of just one
randomly chosen light source and averaging different realizations, rather than adding all contributions.
This becomes interesting when one can weigh this randomness by the intensity or distance of light
sources. We will see a similar approach next, to handle indirect lighting.
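
A hedged sketch of the resulting main loop with K paths per pixel (the ray construction and the gamma correction are as in the earlier pseudo-code):

int K = 1000;  // number of paths per pixel
for (int i = 0; i < H; i++) {
    for (int j = 0; j < W; j++) {
        Vector color(0., 0., 0.);
        for (int k = 0; k < K; k++) {
            Ray ray = ....;                                  // camera ray towards pixel (i, j)
            color += scene.getColor(ray, max_path_length);   // one full path per call
        }
        color = (1. / K) * color;                            // average of the K paths
        // gamma-correct, clamp and store as before
    }
}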

Figure 2.12: Same as Fig. 2.11 but with Fresnel reflection taken into account on transparent surfaces.
For this image, I took 1000 rays per pixel, which resulted in a rendering that took about 1 minute
(without parallelization, and about 260 lines of code).

Adding indirect lighting

Indirect lighting is an extremely important factor for realism. To my knowledge, it was first introduced in a physically correct manner (at least via Virtual Point Lights, as opposed to artists manually tuning light sources) in Pirates of the Caribbean 2 (2006) with the Renderman renderer. Indirect lighting is the reason why the ceiling of your classroom does not appear black, although no (direct) light sources are illuminating it (Fig. 2.13). Simulating indirect lighting is probably one of the most difficult aspects of rendering, and will require several ingredients: understanding the rendering equation, understanding Monte Carlo integration, and implementing good importance sampling strategies.

The Rendering Equation. The equation that gives the outgoing spectral radiance (i.e., the

Figure 2.13: Classroom illuminated only via direct lighting (left), and direct+indirect lighting (right). Notice the overly dark ceiling on the left. Model from https://www.blendswap.com/blend/15639.

result of Scene::getColor) is:

$$L_o(x, \omega_o, \lambda, t) = L_e(x, \omega_o, \lambda, t) + \int_\Omega f(x, \omega_i, \omega_o, \lambda, t)\,L_i(x, \omega_i, \lambda, t)\,\langle\omega_i, N\rangle\,\mathrm{d}\omega_i \quad (2.1)$$

This equation simply says that your Scene::getColor function depends on the point $x$ in the scene (in our case, it is evaluated at intersection points $P$), the (opposite of the) ray direction $-\omega_o$, the light wavelength $\lambda$ (in our case, we merely render R, G and B channels) and a time parameter $t$. It results in the sum of the emitted light $L_e$ at $x$ in the direction $\omega_o$ (at wavelength $\lambda$ and time $t$) and the contribution of all light reflected at point $x$. The light reflected at $x$ is simply the sum of all incoming light contributions $L_i$ from the hemisphere $\Omega$ falling on $x$, modulated by a function $f$ called the Bidirectional Reflectance Distribution Function or BRDF, which describes the appearance or shininess of materials, and a dot product/cosine term that accounts for the projected area of light sources (a small area light at grazing angle will see its contribution smeared over a large area). Notations can be seen in Fig. 2.14.

Figure 2.14: Notations for the Rendering Equation. Note that from now on, we denote by convention $\omega_i$ a vector that points outwards from the surface, like $\omega_o$. Since this mostly influences dot product signs, this is usually understood from context.

It is interesting to see that the incoming light at point $x$ from direction $\omega_i$ is exactly the outgoing light at a point $x'$ from direction $-\omega_i$, assuming a vacuum medium (we will see in Sec. 2.2.1 how to handle participating media). As such, using the rendering equation at point $x'$ (and ignoring spectral and temporal variables for conciseness; we will also occasionally ignore position variables when the context is clear enough in the future), we could rewrite Eq. 2.1 at point $x$ as

$$L_o(x, \omega_o) = L_e(x, \omega_o) + \int_\Omega f(x, \omega_i, \omega_o)\left(L_e(x', -\omega_i) + \int_{\Omega'} f(x', \omega_i', -\omega_i)\,L_i(x', \omega_i')\,\langle\omega_i', N'\rangle\,\mathrm{d}\omega_i'\right)\langle\omega_i, N\rangle\,\mathrm{d}\omega_i$$

and recursively, the lighting reaching point $x'$ comes from other locations in the scene, and so on. This type of recursive integral equation is called a Fredholm integral equation of the second kind, as, in fact, there is a single unknown radiance function $L$ to be determined, which appears both outside and inside the integral.

This results in an integration over an infinite-dimensional domain, called Path Space, that represents a sum of light paths with 0, 1, 2, ..., ∞ bounces, and that needs to be performed numerically.

Bidirectional Reflectance Distribution Functions (BRDFs). An important function in the rendering equation above is the term $f$, the BRDF. This term describes the amount of light being reflected off a surface towards a direction $\omega_o$ if it arrives from a direction $\omega_i$ (Fig. 2.15). Conditions for their physical meaningfulness are that they are positive ($f \geq 0$), that they respect the Helmholtz reciprocity principle, that is, they are symmetric ($f(\omega_i, \omega_o) = f(\omega_o, \omega_i)$)³, and that they preserve energy, that is, $\int_\Omega f(\omega_i, \omega_o)\,\langle\omega_i, N\rangle\,\mathrm{d}\omega_i \leq 1,\ \forall\omega_o$.⁴

Figure 2.15: Typical BRDFs: a specular surface, a diffuse surface, and a glossy surface.

These BRDFs can be provided as tabulated functions, for instance coming from gonioreflectometers,
physical devices that measure the light reflected off surfaces at different angular values. Notable
databases of BRDFs include the MERL dataset of 100 isotropic BRDFs5 (see Fig. 2.16; note that isotropic
BRDFs can be reparameterized using only 3 dimensions, θi, θr, φd, instead of 4 angular values θi, φi,
θr, φr – a parameterization called the Rusinkiewicz parameterization), Ngan's 4 anisotropic BRDFs6, and
the UTIA dataset of 150 anisotropic BRDFs7. These tabulated values can be heavy to store and manipulate, and
can further be compressed, for instance by projecting them onto spherical harmonics. Applications of
these spherical-harmonic-projected BRDFs will be discussed in the context of Precomputed Radiance
Transfer in Sec. 2.2.3.

BRDFs can also be described via closed-form expressions, that can either be ad hoc (also coined
as "phenomenological" for political correctness, but they are all more or less Gaussian lobes around
the purely specular direction – we will see the Blinn-Phong BRDF model in Sec. 2.2.1) or derived
from microgeometry analysis assuming microfacet models (e.g., Cook-Torrance, Oren-Nayar, Torrance-
Sparrow, Ashikhmin-Shirley, He et al., ...).

For now, we have seen and will focus on three particular cases: fr(ωi, ωo) = δωr(ωo) with ωr the
reflection of ωi around the normal N as we have seen in Sec. 2.2.1, ft(ωi, ωo) = δωt(ωo) with ωt the
transmission of ωi inside the surface of normal N as we have seen in Sec. 2.2.1, and fd(ωi, ωo) = c/π
(with c the albedo) the diffuse BRDF as in Sec. 2.2.1. Note that fr and ft involve Dirac distributions, and Eq. 2.1 should
thus be (re-)interpreted in the sense of distributions. We will see later in Sec. 2.2.1 how to implement
the Blinn-Phong BRDF.

Figure 2.16: BRDFs from the MERL dataset.

3. This is not always the case, though most often. Notably, for transparent surfaces, f(ωi, ωo) = (n2/n1)² f(ωo, ωi).
4. This can be derived from the fact that \int_\Omega \int_\Omega L_i(\omega_i)\, f(\omega_i, \omega_o)\, \langle \omega_i, N \rangle\, d\omega_i\, d\omega_o \leq 1, \forall L_i.
5. https://www.merl.com/brdf/
6. https://people.csail.mit.edu/addy/research/brdf/
7. http://btf.utia.cas.cz/?brdf_dat_dwn

Monte Carlo integration. We need to perform numerical integration to evaluate Eq. 2.1. You
have probably seen during your curriculum various ways to numerically integrate functions, such as
the rectangle method (midpoint rule), the trapezoidal rule, or even higher-order methods such as Newton–
Cotes. These methods divide the integration domain into regular intervals, and consider the function
to be piecewise-something within these intervals. The major drawback is that regularly dividing an
integration domain of dimension d (let alone an infinite-dimensional space!) produces exponentially
many intervals, such that even dividing each dimension of a 4-d domain into 10 intervals would result
in 10^4 intervals (remember that this integration needs to be performed for possibly millions of pixels,
that in practice we often need more than 4 dimensions, and that 10 intervals per dimension would
likely miss important high-frequency features).

To alleviate this issue, Monte Carlo integration has been proposed as a way to stochastically evaluate
integrals. This technique was historically developed in the context of the Manhattan project
for nuclear simulation and is now widely used in computer graphics, but also in economics,
nuclear physics and medical imaging. In general terms, it is simply expressed as:

\int_\Omega f(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}

where the x_i are random samples following the probability density function p. This converges to the true
integral assuming p > 0 wherever f ≠ 0. The intuition is that if you give a sample half the
probability of occurring, you need to compensate by counting it twice. However, this process
converges slowly: the integration error decreases in O(1/N^{0.5})8.

A major tool to improve the integration error is importance sampling. Importance sampling tries
to find a probability density function p that is nearly proportional to f. In fact, if p is exactly
proportional to f, that is, p = αf, then

\int_\Omega f(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{\alpha f(x_i)} \quad (2.2)
= \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\alpha} \quad (2.3)
= \frac{1}{\alpha} \quad (2.4)

that is, the estimator converges with a single sample (in fact, any single sample gives the exact value), in O(1)! This is due to the definition of
probability distributions: they should integrate to 1, so if they integrate to 1 and are proportional
to f, then the constant of proportionality is the (inverse of the) integral. In short, if you are able
to build an exactly proportional probability density function (pdf), then you do not need numerical
integration in the first place! However, this method is interesting if you know that your p is a good
approximation of f, up to a constant (unknown) scaling factor.

Exercise. To test your understanding of Monte Carlo integration, please write a program that
estimates

F = \int_{[-\pi/2,\,\pi/2]^3} \cos(x\, y\, z)\, dx\, dy\, dz

using an isotropic Gaussian probability density function p of standard deviation σ = 1 (this p is not really
proportional to the integrand, but it at least gives more weight to values near (0, 0, 0) and is sufficient for
the sake of the exercise – a better proxy would give higher values around each axis).
For that, we will use the <random> header from the STL, which provides reasonably good random
numbers (at least, as opposed to the rand() function), and we will use the Box-Muller transform,
which produces 2 Gaussian samples given 2 uniform random values:

#include <random>
static std::default_random_engine engine(10);  // random seed = 10
static std::uniform_real_distribution<double> uniform(0, 1);

void boxMuller(double stdev, double &x, double &y) {
    double r1 = uniform(engine);
    double r2 = uniform(engine);
    x = sqrt(-2 * log(r1)) * cos(2 * M_PI * r2) * stdev;
    y = sqrt(-2 * log(r1)) * sin(2 * M_PI * r2) * stdev;
}

Note that this 3-dimensional Gaussian has a pdf given by p(x, y, z) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^3 \exp\left(-\frac{x^2+y^2+z^2}{2\sigma^2}\right), as the joint density of 3 independent 1-dimensional Gaussians. The exact value is
close to 24.3367. With 10000 samples, you should at least get the 24 part correct...
8. This can be somewhat improved to O\left(\frac{(\log N)^d}{N}\right) by using well-chosen deterministic samples that uniformly cover the
integration domain such as the commonly used Sobol sequence or other low-discrepancy sequences – a technique called
quasi-Monte Carlo. See Variance Analysis for Monte Carlo Integration https://dl.acm.org/doi/pdf/10.1145/2766930

" The resulting code should only have 1 for loop, and not 3 nested loops, looping over x,
y and z (like for the midpoint rule for example) ! This would otherwise entirely miss the point of
Monte Carlo integration: having a code whose complexity does not depend on the dimensionality of
the integrand. This remark is akin to that of Fresnel refraction: in fact, when we randomly chose
between reflecting or refracting rays, we actually did Monte Carlo integration, with p being a discrete
probability distribution!
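For reference, a minimal single-loop estimator for this exercise could look like the following sketch (it reuses the boxMuller helper, engine and uniform defined above; samples falling outside [−π/2, π/2]³ simply contribute zero to the sum):

#include <cmath>
#include <iostream>

int main() {
    int N = 10000;
    double sigma = 1.;
    double sum = 0;
    for (int i = 0; i < N; i++) {  // a single loop, whatever the dimension of the integrand
        double x, y, z, dummy;
        boxMuller(sigma, x, y);
        boxMuller(sigma, z, dummy);  // we only need one extra coordinate
        if (std::abs(x) > M_PI / 2 || std::abs(y) > M_PI / 2 || std::abs(z) > M_PI / 2) continue;  // integrand is zero outside the cube
        double p = std::pow(1. / (sigma * sqrt(2 * M_PI)), 3) * std::exp(-(x * x + y * y + z * z) / (2 * sigma * sigma));
        sum += std::cos(x * y * z) / p;  // f(x_i) / p(x_i)
    }
    std::cout << sum / N << std::endl;  // should approach 24.3367
    return 0;
}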

Implementing indirect lighting. We are now ready to add indirect lighting to our path tracer.
Realizing that we actually did implement indirect lighting already for mirror and transparent surfaces,
we will consider for now that our surfaces are either purely diffuse with albedo ρ (and Le = 0), or
emissive (with f = 0). We aim at building a path that samples the path space where the light contribution
is reasonably high, and at each light bounce over a diffuse surface at point x we locally evaluate the
interaction and use it recursively:

L_o(x, \omega_o) = \frac{\rho}{\pi} \int_\Omega L_i(x, \omega_i)\, \langle \omega_i, N \rangle\, d\omega_i \quad (2.5)

To importance sample a diffuse surface, we would ideally sample the integrand Li(x, ωi)⟨ωi, N⟩. But
as noted before, this is simply impossible (otherwise the problem would already be solved). A simple
option is to only sample according to the second term ⟨ωi, N⟩. Assuming N = (0., 0., 1.), this can be
achieved by using a formula similar to the Box-Muller formula:

r_1, r_2 \sim \mathcal{U}(0, 1) \quad (2.6)
x = \cos(2\pi r_1)\sqrt{1 - r_2} \quad (2.7)
y = \sin(2\pi r_1)\sqrt{1 - r_2} \quad (2.8)
z = \sqrt{r_2} \quad (2.9)

It is easy to see that this formula directly gives a vector of unit norm, and the pdf of these samples
is p((x, y, z)) = z/π. Using a frame change formula, one can easily bring it to a frame such that
the z coordinate above is aligned with our actual normal vector N . Producing a local frame around
N can be achieved by first generating two orthogonal tangent vectors T1 and T2 . To generate T1 ,
we could directly use a normalized version of the vector (Nz , 0, −Nx ) for example, since it is easy to
see that hN, T1 i = 0 by construction. This would often work, until numerical issues arise near the
normal vector N = (0, 1, 0), which would produce a tangent vector near T1 = (0., 0., 0.). To avoid
that, we detect the smallest component of N (in absolute value!), force it to be zero, swap the two
other components and negate one of them to produce T1 , which we normalize. Then T2 is obtained by
taking the cross product between N and T1 . And given N , T1 and T2 , we obtain the random Vector
in the correct frame by using V = x T1 + y T2 + z N , where (x, y, z) were generated by the formula
above. We will call this function random cos(const Vector &N).
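As a sketch, random_cos could be implemented as follows (it assumes the Vector class provides operator[], norm() and a cross() helper, and reuses the uniform/engine random generator defined earlier; adapt to your own Vector interface):

Vector random_cos(const Vector &N) {
    double r1 = uniform(engine);
    double r2 = uniform(engine);
    double x = cos(2 * M_PI * r1) * sqrt(1 - r2);
    double y = sin(2 * M_PI * r1) * sqrt(1 - r2);
    double z = sqrt(r2);
    // tangent T1: zero out the smallest component of N (in absolute value),
    // swap the two others and negate one of them
    Vector T1;
    if (std::abs(N[0]) <= std::abs(N[1]) && std::abs(N[0]) <= std::abs(N[2]))
        T1 = Vector(0., N[2], -N[1]);
    else if (std::abs(N[1]) <= std::abs(N[0]) && std::abs(N[1]) <= std::abs(N[2]))
        T1 = Vector(N[2], 0., -N[0]);
    else
        T1 = Vector(N[1], -N[0], 0.);
    T1 = T1 / T1.norm();
    Vector T2 = cross(N, T1);          // second tangent
    return x * T1 + y * T2 + z * N;    // cosine-distributed direction around N, with pdf <V, N>/pi
}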

With the method above to generate random vectors, and the known pdf p, it becomes easy to
perform Monte Carlo integration. You will realize that the cosine terms cancel out, as well as the factor
π (the π in ρ/π is cancelled by the π from the pdf p = ⟨N, ωi⟩/π when dividing by the pdf).

Other importance sampling formulas can be found in the Global Illumination Compendium by
Philip Dutré9 .

Now, you may realize that working only with point light sources (for now) will result in strictly no
rays arriving by chance on these infinitesimally small lights. To address this issue, we directly sample
our point light source using the formulas we used until now, resulting in the direct lighting contribution,
and add it to the random contribution we are generating (called the indirect lighting contribution).

9. https://people.cs.kuleuven.be/~philip.dutre/GI/TotalCompendium.pdf

Similarly to Fresnel, if you sample one ray per pixel the resulting image will be extremely noisy due
to all that randomness, but shooting many rays per pixel will make it converge to a nice and smooth
image. If you have already implemented this strategy for Fresnel materials, you do not need to change
anything.

Also, realize that the code you just wrote for handling indirect lighting on diffuse surfaces just
looks like the code for mirror surfaces – just the reflected ray goes in a random direction instead of a
deterministic mirror direction. The code should look like:

Vector Scene::getColor(const Ray& ray, int ray_depth) {
    if (ray_depth < 0) return Vector(0., 0., 0.);  // terminates the recursion at some point

    if (intersect(ray, P, N, sphere_id)) {
        if (spheres[sphere_id].mirror) {
            // handle mirror surfaces...
        } else {
            // handle diffuse surfaces
            Vector Lo(0., 0., 0.);
            // add direct lighting
            double visibility = ...;  // computes the visibility term by launching a ray towards the light source
            Lo = light_intensity / (4 * M_PI * squared_distance_light) * albedo / M_PI * visibility * std::max(dot(N, light_direction), 0.);

            // add indirect lighting
            Ray randomRay = ...;  // randomly sample a ray using random_cos
            Lo += albedo * getColor(randomRay, ray_depth - 1);

            return Lo;
        }
    }
}

and should produce results similar to those of Fig. 2.17.

Russian Roulette. Until now, we have truncated light paths to a maximum number of bounces
controlled by the initial value of ray_depth. This leads to a biased rendering: one can construct a
scene that requires an arbitrarily high number of light bounces (for instance, take an arbitrary number
of mirrors redirecting one light source to a room). We thus did not integrate over the entire infinite-
dimensional space of light paths, but over a truncated version of it. It is however possible to integrate
over this infinite-dimensional space. Instead of killing rays after a certain number of bounces, you only
kill them with some probability, and divide the light contribution by this probability. You can fine-
tune this probability to be proportional to the current path intensity (if the first 5 encountered albedos
are very dark, it is unlikely that any future light source will be sufficiently bright to compensate the light
absorption, so we make a 6th bounce unlikely – but if it occurs, then we compensate this low probability
by putting a large weight), but in any case, this results in an unbiased rendering. Unfortunately, this
also tends to introduce significant noise (there is always a tradeoff between bias and noise), so we will
not implement it here.
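For completeness, the principle of Russian roulette could be sketched as follows inside the diffuse branch of getColor (this is not part of our implementation; q is a survival probability that you could also tie to the path throughput):

// Russian roulette sketch: instead of a hard cutoff on the number of bounces,
// kill the path with probability 1 - q and compensate surviving paths by 1 / q
double q = 0.9;                          // survival probability (illustrative value)
if (uniform(engine) > q) return Lo;      // the path is terminated here
Ray randomRay = ...;                     // sample the next bounce as before
Lo += albedo * getColor(randomRay, ray_depth) / q;  // unbiased: divide by the survival probability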

Parallelization. Our code starts to be relatively slow, due to the number of paths that need to
be generated. An easy parallelization instruction is:

#pragma omp parallel for
for (int i = 0; i < H; i++) {
    // ...
}

Figure 2.17: Rendering with indirect lighting (290 lines of code). First row, the renderings with
either a diffuse or transparent central spheres take about 35 seconds in parallel (or 7 minutes without
parallelization) using 1000 paths per pixel, and a maximum ray depth (max path length in the code
below) of 5. Second row, the rendering takes 1.2 seconds (in parallel) for 32 paths per pixel.


This instructs the compiler to perform the for loop in parallel. Make sure to enable OpenMP, using
Project Properties -> Configuration Properties -> C/C++ -> Language -> Open MP Support
with Visual Studio, or -fopenmp on recent GCC or -openmp on old GCC. Old versions of Clang do not support
OpenMP. On MacOS, you may need to link with OpenMP using -L/usr/local/opt/libomp/lib
-I/usr/local/opt/libomp/include -fopenmp -lomp. Parallelization instructions should in general
go on the outermost loop, since starting threads has an inherent non-negligible system cost. By default,
the above instruction evenly splits the H lines of pixels into OMP_NUM_THREADS blocks (or as many
as the number of cores you have), and runs these blocks in parallel. This is equivalent to #pragma omp
parallel for schedule(static, ceil(H/(double)omp_get_num_threads())) and is ideal when all
rows of pixels have the same computational time. However, when this is not the case (which often
occurs), threads end up waiting for other threads to finish, doing nothing. A dynamic schedule can
then be used, as in #pragma omp parallel for schedule(dynamic, 1), which instructs OpenMP
to feed threads one row as soon as one is available. Dynamic scheduling is generally more costly than
static scheduling, though the scheduling cost is here negligible with respect to computation times.

" The std::default random engine is not thread safe. Also, the thread local directive is not
compatible with OpenMP threads. You may need to instantiate one random number generator per
thread.
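One possible way to do so is sketched below (it assumes OpenMP is enabled; omp_get_max_threads() and omp_get_thread_num() come from <omp.h>, and each engine gets its own seed):

#include <omp.h>
#include <random>
#include <vector>

static std::vector<std::default_random_engine> engines;   // one engine per thread
static std::uniform_real_distribution<double> uniform(0, 1);

void init_random_engines() {
    for (int i = 0; i < omp_get_max_threads(); i++)
        engines.push_back(std::default_random_engine(i));  // distinct seed per thread
}

double thread_uniform() {
    return uniform(engines[omp_get_thread_num()]);          // each thread only touches its own engine
}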

Antialiasing

As we are always sampling rays in the middle of each pixels, there is a discontinuity between adjacent
pixels: a ray may hit the sphere for a pixel and miss it in the next pixel. This results in a phenomenon
called aliasing. In fact, camera sensor cells have an area, they are not points. More precisely, actual
camera sensor cells are arranged in a pattern called Bayer pattern (Fig. 2.18). Each sensor cell is
sensitive to either red, green and blue through a colored filter array, and since the eye is more sensitive
to green light than red or blue, there are twice as many “green cells” (or rather grayscale cells covered
with a green filter) than red or blue cells. Once a photograph is taken, the resulting raw image is then
converted to an RGB pixel grid using demosaicing (or debayering) algorithms. We will not simulate
Bayer patterns as we can directly emulate an RGB-sensitive pixel array.

Figure 2.18: Camera sensor cells are arranged in a Bayer pattern, interleaving red, green and blue
filtered sensors.

The idea here is to integrate the radiance that reaches the camera sensor over the surface of each
pixel. For that, we are actually integrating:

L^{i,j} = \int_{A_{i,j}} L_i(x, \omega_i(x))\, dx

where {i, j} are the pixel indices, A_{i,j} represents the surface of pixel (i, j), and L_i(x, \omega_i(x)) represents
the light reaching the camera sensor at point x from a direction that is fully determined by x and the
camera center (\omega_i(x) = \frac{x - C}{\|x - C\|}). In practice, this amounts to box filtering the input radiance,
which is not spectrally ideal and could still result in some amount of aliasing (notably for high-frequency
textures or geometries).

Instead, we would rather filter the signal more smoothly, by integrating:

L^{i,j} = \int_{A_{i,j}} L_i(x, \omega_i(x))\, h^{i,j}(x)\, dx

where h is some nice smooth kernel (see Sec. 3.1.1 for more details on filtering and convolutions). While
interesting choices include Mitchell–Netravali filtering or windowed sinc filters, we will simply use

a Gaussian filter centered in the middle of pixel (i, j) for our function h. We have now seen Monte
Carlo integration, and it is becoming clear that the above computation is well suited to it: we can
efficiently design an importance sampling approach that produces samples more often near the middle of
each pixel according to a Gaussian probability! In fact, we have already implemented the Box-Muller
technique earlier as an exercise. And while evaluating the Monte Carlo estimate, one realizes that
again, the Gaussian kernel h and the pdf p exactly cancel out since we have importance sampled the
integrand according to h.

Our main function now looks like:

int main() {
    // first define the scene, variables, ...
    // then scan all pixels
#pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < H; i++) {
        for (int j = 0; j < W; j++) {
            Vector pixelColor(0., 0., 0.);
            for (int k = 0; k < NB_PATHS; k++) {
                Vector rand_dir = ...;  // as before, but targeting pixel (i, j) + boxMuller() * spread
                Ray ray(C, rand_dir);   // cast a ray from the camera center C with direction rand_dir
                pixelColor += scene.getColor(ray, max_path_length);
            }
            pixel[i * W * 3 + j * 3 + 0] = std::min(255., std::pow(pixelColor[0] / NB_PATHS, 1. / 2.2));  // stores the R channel (gamma-corrected)
            // same for green and blue
        }
    }
    // save image and return 0
}
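For reference, the rand_dir computation above could be sketched as follows, assuming the same pinhole camera convention as earlier in the chapter (camera center C, horizontal field of view alpha, image of size W × H, camera looking down the −z axis); adapt the signs and axes to your own setup:

double dx, dy;
boxMuller(0.5, dx, dy);  // Gaussian offset of standard deviation ~half a pixel
Vector rand_dir(j + dx + 0.5 - W / 2.,
                H / 2. - (i + dy + 0.5),
                -W / (2. * tan(alpha / 2.)));   // distance from C to the virtual sensor
rand_dir = rand_dir / rand_dir.norm();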

and produces the image in Fig. 2.19.

Figure 2.19: Image without (left) and with (right) antialiasing.



Remark. It is now clear that, using a Gaussian importance sampling strategy, samples for pixel (i, j)
have some probability of falling outside of pixel (i, j) (in fact, as soon as the Box-Muller function
returns one value larger than 0.5). Given the cost of retrieving Li(x, ωi(x)), it would be a waste to only
use it for pixel (i, j) and not for all the neighboring pixels (i', j') for which h^{i',j'} is sufficiently large.
It is indeed interesting to splat Li(x, ωi(x)) over a small pixel neighborhood. However, care must be
taken to avoid concurrency issues while parallelizing the code. To simplify the implementation, we will
not implement this technique, which correlates samples received by neighboring pixels.

Spherical / area light sources

Another important factor to realism is the presence of soft shadows (Fig. 2.20). Soft shadows are
the result of light sources having an area and not being points, hence resulting in penumbras. For
simplicity, we will support spherical light sources (since we have primitives for them), but the method
extends to other shapes.

Figure 2.20: Classroom image without (left) and with (right) soft shadows. Notice the shadow of the
blackboard on the wall and tables on the ground.

A naive solution would simply be to set a positive value for the emission Le of all spherical light
sources, and wait for our random rays to reach these light sources (and remove our point light source).
This would theoretically work, but also produces very noisy images. In fact, the smaller the light
source, the less likely light paths will randomly reach it, and the noisier the image (Fig. 2.21).

Figure 2.21: Naively handling soft shadows using spherical light sources of radius 1, 2, 10, and 20.
As the radius increases, light paths have more chances to randomly reach light sources, which reduces
noise. Also notice the soft shadows appearing. These renderings still have 1000 (uncorrelated) samples
per pixel, which is very large for typical scenes. The rendering takes about 25 seconds (in parallel) for
280 lines of code.

Recall that for diffuse surfaces, we are looking to numerically evaluate an expression of the form:

L_o(x, \omega_o) = \frac{\rho}{\pi} \int_\Omega L_i(x, \omega_i)\, \langle \omega_i, N \rangle\, d\omega_i

Similarly to point light sources, we will separate direct and indirect contributions. The formalism
will be made clearer here: we split the integration domain Ω in two parts: the part Ωd (d for direct)
that consists in the area of the hemisphere where spherical light sources project, and the rest of the
hemisphere, Ωi (i for indirect). Ωd is such that launching rays in a direction ωi ∈ Ωd from x would
reach a spherical light source, unless blocked by some geometry. This is akin to point light sources,
where Ωd was an infinitesimally small domain.

We hence keep our process in which we add indirect and direct lighting together. For indirect
lighting, we will only make a small change to our existing code (since these rays do not directly reach
light sources, they can be importance sampled according to the cosine term as we did before): if we
launch a random ray for indirect lighting contribution but it still hits a light, then we should count
its contribution as zero (otherwise this value would be counted twice, once in the direct lighting com-
putation, and once in the indirect lighting computation). We are left with implementing importance
sampling for direct lighting, that is, light rays directed towards light sources.

We could use a formula for importance sampling directions within the spherical cap Ωd . But
it is easier and more general to re-parameterize the rendering equation via a change of variable for
which instead of integrating over (part of) an hemisphere, we would integrate over (part of) the scene
directly. This means that we would sum over small area patches in the scene rather than small solid
angles (see Fig. 2.22).


Figure 2.22: Notations for integrating over elements in the scene.

As always, when making a change of variables within an integral, one needs to account for the
determinant of the Jacobian of this change of variables. It appears that this determinant is

D = \frac{\langle N', -\omega_i \rangle\, V_x(x')}{\|x - x'\|^2}

where V_x is still the visibility function, and N' the normal of the area patch around
point x'. The rendering equation for purely diffuse surfaces now looks like:

L_o(x, \omega_o) = \frac{\rho}{\pi} \int_S L_i(x, \omega_i(x'))\, \langle \omega_i(x'), N \rangle\, \frac{\langle N', -\omega_i(x') \rangle\, V_x(x')}{\|x - x'\|^2}\, dx'

with \omega_i(x') = \frac{x' - x}{\|x' - x\|}, and S the surface of our light source.

In fact, the coefficient G(x, x') = \langle \omega_i(x'), N \rangle\, \frac{\langle N', -\omega_i(x') \rangle\, V_x(x')}{\|x - x'\|^2} is often called the form factor between
x and x'. We will also use it later in the context of Radiosity (Sec. 2.2.4).

We will now seek to stochastically sample points on our spherical light sources in the scene (instead of directly
sampling directions towards them). Given the term ⟨N', −ωi(x')⟩, it is obvious that we should avoid
sampling points on the "edge" of the spherical light, as this dot product will be close to zero, and that
we would prefer sampling values for which ⟨N', −ωi(x')⟩ is large. Also, the visibility term Vx is such
that half of our spherical light source will be occluded by the other half... so we would like to sample
points only on the visible side. Fortunately, we have already written some code, random_cos(const
Vector &N), that takes a Vector N (that used to be our normal vector, but could be anything) and
returns a random Vector which has more chances of being sampled around N than orthogonally to it.
It samples them according to a probability density function p(V) = ⟨V, N⟩/π.

To generate a point x' on our spherical light source S of center C and radius R, as seen from a point x, we
first build the vector D = (x − C)/‖x − C‖ that defines the visible hemisphere of S, we call V = random_cos(D)
to obtain a unit direction that has more chance of facing D, and finally obtain x' using x' = R V + C.
The probability density function at x' is p(x') = \frac{\langle V, D \rangle}{\pi} \cdot \frac{1}{R^2}, where the 1/R² accounts for the samples being
stretched in two dimensions by a factor R.
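A sketch of this routine (it assumes the random_cos function above and a single light sphere of center centerLight and radius R; returning the pdf together with the point is just one possible design):

Vector random_point_on_light_sphere(const Vector &x, double &pdf) {
    Vector D = (x - centerLight) / (x - centerLight).norm();  // axis of the visible hemisphere, from the light center towards x
    Vector V = random_cos(D);                                  // cosine-distributed direction around D
    Vector xprime = centerLight + R * V;                       // point on the light sphere
    pdf = dot(V, D) / (M_PI * R * R);                          // density of xprime on the sphere surface
    return xprime;
}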

Regarding Li(x, ωi(x')), we now need to spread our I Watts of light power over the surface of a
sphere of radius R (area 4πR²), with each of these points radiating in all directions of the hemisphere with a cosine
factor (which integrates to π). The emitted radiance, in W·m⁻²·sr⁻¹, is thus I/(4π²R²).

The code now looks like:

1 Vector Scene : : getColor ( c o n s t Ray& ray , i n t ray_depth , b o o l last_bounce_diffuse ) {


2 i f ( ray_depth < 0 ) r e t u r n Vector ( 0 . , 0 . , 0 . ) ; // t e r m i n a t e s r e c u r s i o n a t some ←-
point
3
4 i f ( intersect ( ray , P , N , sphere_id ) ) {
5 i f ( spheres [ sphere_id ] . is_light ) {
6 i f ( last_bounce_diffuse ) { // i f t h i s i s an i n d i r e c t d i f f u s e bounce
7 // i f we h i t a l i g h t s o u r c e by chance v i a an i n d i r e c t d i f f u s e bounce , r e t u r n ←-
0 to avoid counting i t twice
8 r e t u r n Vector ( 0 . , 0 . , 0 . ) ;
9 } else {
10 r e t u r n Vector ( 1 . , 1 . , 1 . ) * light_intensity / ( 4 * M_PI * M_PI * R * R ) ; // R i s t h e ←-
spherical light radius
11 }
12 }
13 i f ( spheres [ sphere_id ] . is_diffuse ) {
14 // h a n d l e d i f f u s e s u r f a c e s
15 Vector Lo ( 0 . , 0 . , 0 . ) ;
16 // add d i r e c t l i g h t i n g
17 Vector xprime = random_point_on_light_sphere ( ) ;
18 Vector Nprime = ( xprime−centerLight ) / ( xprime−centerLight ) . norm ( ) ;
19 Vector omega_i = ( xprime−P ) / ( xprime−P ) . norm ( ) ;
20 d o u b l e visibility = . . . ; // computes t h e v i s i b i l i t y term by l a u n c h i n g a ray ←-
o f d i r e c t i o n o me g a i
21 d o u b l e pdf = dot ( Nprime , ( x−centerLight ) / ( x−centerLight ) . norm ( ) ) / ( M_PI * R * R ) ;
22 Lo = light_intensity / ( 4 * M_PI * M_PI * R * R ) * albedo / M_PI * visibility * std : : max ( ←-
dot ( N , omega_i ) , 0 . ) * std : : max ( dot ( Nprime , −omega_i ) , 0 . ) / ( ( xprime−P ) . ←-
squared_norm ( ) * pdf ) ;
23
24 // add i n d i r e c t l i g h t i n g
25 Ray randomRay = . . . ; // randomly sample ray u s i n g random cos
26 Lo += albedo * getColor ( randomRay , ray_depth −1) ;
27
38 CHAPTER 2. RENDERING

28 r e t u r n Lo ;
29 }
30 }
31 }

Note the similarity of this approach to an approach that would consider the scene to have a single
point light whose position is not deterministic but stochastically sampled on the surface of a sphere.
This code can simulate simple caustics (Fig. 2.24).

" Always replace in your code hx, yi by max(hx, yi, 0). After millions of rays being launched in
all directions, you will be sure to find numerically small but negative values that could mess with
your simulation. Also, you now test the visibility by launching a ray towards a point sampled on
the light source and testing for intersections. However, your light source is a sphere that is part of
the scene. It is thus possible for our visibility query to return a point on the light source that is
numerically almost the same as the point that has been sampled on the light source (if there is no
shadow, the resulting intersection point and the point sampled on the sphere should be mathematically
the same, but numerical errors will arise). An epsilon should be added in the visibility test to avoid
self shadowing, in a similar way that rays were launched by a slightly offseted point above the surface.

Figure 2.23: Soft shadows by directly sampling the spherical light source (the code is now about 320
lines) of radius 1, 2, 10 and 20. Using 1000 samples per pixel and 5 light bounces (top row), it takes
about 1 minute per image. Using 32 samples per pixel (bottom row), about 2 seconds. Note that
noise could be decreased by taking into account correlations between pixels (see text).

Depth of Field, motion blur and camera models

Our generated images are sharp at all distances. However, photographs tend to be sharp only around a
certain distance, called the focus distance. In fact, our camera model corresponds to what is known as
a pinhole camera (Fig. 2.26): just a dark box of length f (called the focal length) pierced with a tiny hole
(in practice, the optimal hole size is d = 2\sqrt{f\lambda}). This kind of setup has been known for a long time.
In fact, it is suspected that it was known since paleolithic times10. In more recent times, pinholes were

10. See http://paleo-camera.com/ for discussions on suspected paleolithic and neolithic setups.

Figure 2.24: Moving the light a little bit reveals caustics in the transparent scene. Here the light
sphere is at position (−10, 25, −10) and of radius 5. These indirect specular bounces are hard to
capture and thus produce much higher levels of noise (here, 5000 samples per pixel were used). Other
techniques such as bidirectional path tracing or photon mapping better capture caustics.

used to paint realistic scenes by projecting landscapes on a canvas, a setup called camera obscura,
locus obscurus or camera clausa – for instance this led to early realistic depictions of Venice sceneries
(Fig. 2.25)11 .

Figure 2.25: The camera obscura was used for precisely painting scenes. This was used by a number
of artists such as Canaletto (1697-1768, left), or Luca Carlevarijs (1663-1730, right: Venetians arriving
in London in 1707)

To implement depth of field (DoF), we will assume a circular aperture. The idea is to realize that
all points at the focus distance describe a plane whose points project to points on the sensor and remain
sharp (Fig. 2.27), and that light passes through the aperture before reaching the lens. The result is
exactly as if we made infinitely many renderings from pinhole cameras, where the tiny hole location
varies inside a small disk the size of the aperture, and then averaged the results. For implementation
purposes, similarly to the pinhole case, we will keep the camera sensor and lens locations swapped. As
such, we will simply find new starting points for our rays that are slightly tangentially offset from
the camera location Q, and recompute their directions such that all rays targeting a given pixel cross
at the plane that remains in focus (up to antialiasing).
11. In fact, the Hockney-Falco thesis says that the drastic increase in realism in the 17th century is due to such
technological advances; other famous artists may have used such devices, such as Vermeer (1632-1675) https://en.wikipedia.org/wiki/Hockney%E2%80%93Falco_thesis.

Figure 2.26: A pinhole camera is just a small hole in a dark chamber that lets light come in and
displays a sharp view of the outside world on the screen. The image is flipped: in our path tracer,
we have just put our sensor at a virtual location at a distance f outside the box for a more intuitive
implementation and non-flipped renderings (in our setup, the camera location C is the hole, and the
pixel grid is outside).

(Diagram panels: camera sensor, lens and objects in focus; objects out of focus with a circle of confusion; and, with an aperture, a smaller circle of confusion.)

Figure 2.27: Top row. Using a camera lens, an object placed at the focus distance will appear sharp
(left), as the image of a point of the object is a point on the sensor. However, moving the object
closer to (or away from) the camera makes it appear blurry, as the image of a point is a small
disk called the circle of confusion. Bottom row. By adding a shutter aperture (setup on the left,
photo on the right), the circle of confusion can be made much smaller, resulting in sharper images
away from the focus distance, at the cost of blocking light (and hence darker images). If the circle of
confusion is smaller than a pixel, the image appears sharp. Cameras allow varying the position of
F', the distance of the lens to the camera sensor, and the size of the aperture (the first two
vary together in parfocal lenses to remain in focus while zooming).

To achieve that, we first generate a ray from the camera center Q (the pinhole center) as before
(red ray in Fig. 2.27). Then we find the point P that would be in focus. This point is given in our
case by P = Q + \frac{D}{|u_z|} u, where D is the distance at which objects appear in focus, u is the (original) ray
direction, and u_z its z coordinate (since objects appear sharp on a plane at a distance D from Q along the
optical axis – up to Petzval field curvature)12. Once P is found, you can generate a point inside the
aperture shape (here, a disk, but you can simulate bokeh of various shapes) which will serve as your

12. You may also simulate tilt-shift photography by changing the orientation of this plane.

new origin Q' and compute the ray direction as the unit vector u' towards P (Fig. 2.28). Generating a
point on a disk can be performed in polar coordinates by choosing the square root of a uniform random
number as the radius r, and a uniform random angle θ in [0, 2π]. Results can be seen in Fig. 2.29.
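A sketch of this ray generation (it assumes the pinhole ray (Q, u) was generated as before, that the optical axis is the z axis, and that focus_distance and aperture_radius are hypothetical user parameters):

double r = sqrt(uniform(engine)) * aperture_radius;              // radius: square root of a uniform number
double theta = uniform(engine) * 2 * M_PI;                       // uniform angle in [0, 2*pi]
Vector P_focus = Q + (focus_distance / std::abs(u[2])) * u;      // point of this ray kept in focus
Vector Qprime = Q + Vector(r * cos(theta), r * sin(theta), 0.);  // new origin inside the aperture disk
Vector uprime = (P_focus - Qprime) / (P_focus - Qprime).norm();  // new direction crossing at the focus plane
Ray ray(Qprime, uprime);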


Figure 2.28: Depth of field can be obtained in our path tracer by starting rays from a point on the
aperture shape (instead of the pinhole center) such that rays cross at the focusing distance D.

Figure 2.29: Depth of field result in our path tracer, adding less than 10 lines of code, bringing it to
330 lines. Here, 2000 samples per pixel were used because of specular paths, though depth of field
often necessitates more samples.

Similarly, while the shutter of the camera is open, objects may have moved. This produces another
kind of blur called motion blur. This is easily simulated in our path tracer: now, rays have an additional
time parameter, and objects have a way of describing their motion (in my simple implementation,
they merely have a single speed vector defaulting to zero, but more complex motion is possible). To
simulate motion blur, we randomly select the time parameter of the generated ray within the time the
shutter is open, and compute the intersection with a scene that includes object motion. In my simple
implementation, I merely translate the sphere center by the sphere's speed multiplied by the ray time
parameter in the ray-sphere intersection test. By essentially adding two lines of code and modifying
a couple of others, we obtain the result shown in Fig. 2.30.
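The two modifications could be sketched as follows (names are illustrative: it assumes a time member on the Ray class, a shutter_time parameter, and a per-sphere speed vector):

// when generating the camera ray: pick a random time while the shutter is open
ray.time = uniform(engine) * shutter_time;

// in the ray-sphere intersection: translate the sphere center according to the ray time (linear motion)
Vector moving_center = center + ray.time * speed;
// ... then intersect the ray with the sphere centered at moving_center, as before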

Figure 2.30: Motion blur is obtained by adding a time parameter to the rays. The time value is selected
randomly within the interval of time the camera shutter is kept open. The ray-sphere intersection
here considers a linear motion of the sphere. This merely adds 2 lines of code, and modifies a couple
of others.

Meshes

The next big thing in our path tracer is the support of triangle meshes. It is highly uninteresting for
me to make you implement a loader for mesh files, so I provide an ugly one that can be downloaded
at: https://pastebin.com/CAgp9r15
Sure, that adds 200 lines to our path tracer, but let’s start simpler.

Ray-Plane intersection. A plane is defined by a normal vector N and a point A that belongs
to the plane. All points P of the plane thus satisfy ⟨P − A, N⟩ = 0. Substituting P
by the equation of a ray starting at O with direction u leads to ⟨O + t u − A, N⟩ = 0, and hence the
unique solution, if it exists, is:

t = \frac{\langle A - O, N \rangle}{\langle u, N \rangle}

We are still only interested in positive solutions.

Ray-Triangle intersection. A point P is within a triangle defined by vertices A, B and C if
P = α A + β B + γ C, with α, β, γ ∈ [0, 1] and α + β + γ = 1. α, β and γ are called the barycentric
coordinates of P, and when P is inside ABC, they represent ratios of areas, e.g., α = area(PBC)/area(ABC). It
is often impractical to have 3 barycentric coordinates for something intrinsically 2-dimensional, so we
often reparameterize it by saying that P = A + β e₁ + γ e₂ where e₁ = B − A and e₂ = C − A (also
using the fact that α + β + γ = 1). Using the ray equation, we obtain a linear system for the point of
intersection of the form β e₁ + γ e₂ − t u = O − A. In matrix form:

\begin{pmatrix} e_1 & e_2 & -u \end{pmatrix} \begin{pmatrix} \beta \\ \gamma \\ t \end{pmatrix} = O - A

We note that for a 3×3 matrix,

\det\begin{pmatrix} A & B & C \end{pmatrix} = \langle A, B \times C \rangle
2.2. PHYSICALLY-BASED RENDERING 43

where × denotes the cross product, and that swapping columns changes the sign of the determinant
while a circular permutation does not. We also denote by N the non-normalized normal, N = e₁ × e₂.
Using Cramer's formula, we obtain the solution of this linear system as ratios of determinants:

\beta = \frac{\langle O - A,\, e_2 \times (-u) \rangle}{\langle e_1,\, e_2 \times (-u) \rangle} = \frac{\langle e_2,\, (A - O) \times u \rangle}{\langle u, N \rangle} \quad (2.11)
\gamma = \frac{\langle e_1,\, (O - A) \times (-u) \rangle}{\langle e_1,\, e_2 \times (-u) \rangle} = -\frac{\langle e_1,\, (A - O) \times u \rangle}{\langle u, N \rangle} \quad (2.12)
\alpha = 1 - \beta - \gamma \quad (2.13)
t = \frac{\langle e_1,\, e_2 \times (O - A) \rangle}{\langle e_1,\, e_2 \times (-u) \rangle} = \frac{\langle A - O, N \rangle}{\langle u, N \rangle} \quad (2.14)

We obtain the Möller–Trumbore intersection algorithm.
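A possible implementation of these formulas (a sketch using the Vector class with dot() and cross() helpers; it returns false when the ray is parallel to the triangle, the point falls outside the triangle, or the intersection is behind the ray origin):

bool intersect_triangle(const Vector &O, const Vector &u,
                        const Vector &A, const Vector &B, const Vector &C,
                        double &t, double &beta, double &gamma) {
    Vector e1 = B - A;
    Vector e2 = C - A;
    Vector N = cross(e1, e2);                    // non-normalized normal
    double uN = dot(u, N);
    if (std::abs(uN) < 1E-12) return false;      // ray (almost) parallel to the triangle plane
    Vector AOu = cross(A - O, u);                // (A - O) x u, shared by beta and gamma
    beta  =  dot(e2, AOu) / uN;                  // Eq. 2.11
    gamma = -dot(e1, AOu) / uN;                  // Eq. 2.12
    double alpha = 1 - beta - gamma;             // Eq. 2.13
    t = dot(A - O, N) / uN;                      // Eq. 2.14
    if (beta < 0 || gamma < 0 || alpha < 0 || t < 0) return false;
    return true;
}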

Ray-Mesh intersection. A mesh will be considered as a set of triangles, so, for now, we will
merely traverse all triangles and return the intersection closest to the camera, in exactly the same
way we traverse all objects of the scene to return the closest intersection to the camera. This will be
considerably slow, but we will improve next.

To obtain our first mesh renderings, we will now need to make the class Sphere inherit from a more
general Geometry abstract class. An abstract class is a class that has some pure virtual functions
(functions that are not implemented at all: they are tagged virtual and their prototype ends with
= 0 to indicate no implementation is provided), and so such classes cannot be instantiated. Here, our
pure virtual function is the intersect() routine. We will now use the provided TriangleMesh class,
and make it inherit from Geometry as well. Our scene will now consist of an array of pointers to
Geometry rather than (pointers to) Sphere.
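As a sketch, the hierarchy could look like this (the exact intersect() signature should match whatever your current routine returns; the members shown are only illustrative):

class Geometry {
public:
    virtual bool intersect(const Ray &ray, Vector &P, Vector &N, double &t) const = 0;  // pure virtual
    Vector albedo;        // common properties belong to the parent class only
    bool mirror = false;
};

class Sphere : public Geometry {
public:
    bool intersect(const Ray &ray, Vector &P, Vector &N, double &t) const override {
        /* ... ray-sphere intersection as before ... */ return false;
    }
    Vector center;
    double radius;
};

class TriangleMesh : public Geometry {
public:
    bool intersect(const Ray &ray, Vector &P, Vector &N, double &t) const override {
        /* ... loop over all triangles, keep the closest hit ... */ return false;
    }
    // vertices, indices, normals, uvs, ...
};

std::vector<Geometry*> objects;  // the scene now stores pointers to Geometry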

" A common bug is to duplicate properties such as materials/albedo/transparency... in the parent


(Geometry) and children (Sphere and TriangleMesh) classes, which results in the wrong variables
being used. Be sure to have all common properties only in the parent class. You may want to debug
your code using a mesh consisting of a single manually constructed triangle.

For a simple demo object, we will be using a low poly cat mesh, available at http://www.cadnav.
com/3d-models/model-47556.html. It has 3954 polygons to test.

" Unless you made a fancy GUI, normalized your models upon loading, or know or made your 3d
model, it is a good habit to check the obj file as a text file or display the bounding box to make sure
sizes are reasonable and the orientation looks correct. 3D modelers can use different units so you could
end up with a kilometer-sized cat or millimeter-sized cat that will not be visible, and the orientation
is not standardized either so that the up vector can be arbitrarily the +Y or +Z coordinate (most
often). Here, our cat model is roughly in the range (−35..30, 0..45, −8..8) which means our cat is a
pretty big boi (given our sphere are of radius 10), and given our ground is at an Y coordinate of −10,
our cat is floating in the air. I will first scale it by a factor 0.6 and translate it by (0, −10, 0) to obtain
Fig. 2.31.

Acceleration structures – Bounding Box. Right now, the rendering is pretty slow due to the
linear time spent in checking all triangles of the mesh – more than 6 minutes for 32 samples per pixel

Figure 2.31: Our cat model, just scaled by a factor 0.6 and translated by the vector (0, −10, 0).
At 32 samples per pixel (spp) and 5 light bounces, it took 6 min and 20 seconds (in parallel) by
naively testing all triangles using the Möller–Trumbore intersection algorithm. By adding a simple
ray-bounding box test (and 30 lines of code) this falls to 1 min and 10 seconds. Using a simple BVH
(and about 50 additional lines), the rendering time even falls down to less than 3 seconds, with a close
to 150x speedup compared to the naive approach !

and 5 light bounces – though only adding about 40 lines of code (excluding the 200 lines obj loader).
A simple optimization is to test whether the axis aligned bounding box of the object is intersected by
the ray, and then only checking all triangles if the ray intersects the bounding box.

We have seen the equation for a ray-plane intersection. A bounding box is defined by the intersection
of the volumes enclosed by pairs of planes. As such, a simple algorithm consists in considering
the pairs of intersections between the ray and pairs of planes. These pairs of intersections define 3
intervals: one for the two planes of constant X, one for the two planes of constant Y and one for the
two planes of constant Z. If these intervals have a non-empty intersection, a ray-bounding
box intersection exists, and the ray-triangle intersections can be performed. The interval intersection
test hence corresponds to testing whether min(t^x_1, t^y_1, t^z_1) > max(t^x_0, t^y_0, t^z_0) (and if this is true, the
actual intersection is at max(t^x_0, t^y_0, t^z_0)), denoting t^x_0 the intersection along the ray with the first plane of
constant x (similarly for subscript 1 and superscripts y and z – see Fig. 2.32 for notations in 2-D).
It is also interesting to see that a ray-plane intersection with axis-aligned planes takes a particularly
simple form.

We can now write a BoundingBox class containing the two extrema of our bounding box (Bmin and
Bmax), compute the bounding box of the mesh, and write a function for the ray-bounding box intersection.
This makes the routine 6 times faster.
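A sketch of this test (Bmin and Bmax are the two extrema of the box; each slab needs its two plane intersections sorted since the ray direction components can be negative, and division by a zero component is left to IEEE infinities here):

bool BoundingBox::intersect(const Ray &ray, double &t) const {
    double t0 = -std::numeric_limits<double>::max();   // latest entry so far
    double t1 =  std::numeric_limits<double>::max();   // earliest exit so far
    for (int axis = 0; axis < 3; axis++) {
        double invd = 1. / ray.direction[axis];
        double ta = (Bmin[axis] - ray.origin[axis]) * invd;  // axis-aligned ray-plane intersections
        double tb = (Bmax[axis] - ray.origin[axis]) * invd;
        if (ta > tb) std::swap(ta, tb);
        t0 = std::max(t0, ta);
        t1 = std::min(t1, tb);
    }
    if (t1 < t0 || t1 < 0) return false;  // intervals do not overlap, or box entirely behind the ray
    t = t0;                               // actual ray-box intersection distance
    return true;
}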

" Beware of computing the bounding box after having translated and scaled your model !

Acceleration structures – Bounding Volume Hierarchies (BVH). The previous idea can be
implemented recursively: if the ray hits the bounding box of the mesh, we can further test if it hits the
two bounding boxes containing each just half of the mesh (and so on with a quarter of the mesh etc.).
The idea is to build a binary tree, with the root being the entire mesh’s bounding box. We then take
the longest axis of the bounding box. Then for each triangle, we determine if its barycenter is within


Figure 2.32: A ray-bounding box intersection can be performed by testing the overlap between intervals
defined by pairs of ray-planes intersections.

the first half or the second half of this axis. This determines two sets of triangles, for which we can
compute their bounding boxes and which can be set as the two children of the root node. This process
is recursively performed for these two children nodes, until some criterion is met (for instance, until the
number of triangles in a leaf node is smaller than some threshold). Beware that this procedure does
not produce a space partition: bounding boxes can overlap, since the decision to put a triangle on one
side or the other is only based on its barycenter, while bounding boxes are computed using the
triangle's 3 vertices (see Fig. 2.33).

(Diagram: the leaves of the BVH shown here contain triangles 0..7, 8..17, 18..23 and 24..31.)
Figure 2.33: A BVH recursively computes bounding boxes. The overall bounding box (black) is split
into 2 categories (blue and red) based on a vertical split in the middle of the black box. Triangles are
assigned to either the blue or red categories based on their centroid. The bounding boxes of these two
sets of triangles are computed (and they may overlap), and then each subdivided into 2 new categories
(cyan and green, and orange and grey). The process can go further. Here, the 4 leaves of the tree
contain consecutive indices of triangles referring to a permutation of the original set of triangles.

In practice, building this tree can be performed using a method akin to Quick Sort: triangles are
simply reordered in a way that consecutive triangles are in the same bounding box. This can be done
by keeping track of a pivot and performing swaps such that elements before the pivot are smaller,
while elements after it are always larger. This looks like:
46 CHAPTER 2. RENDERING

node->bbox = compute_bbox(starting_triangle, ending_triangle);  // bounding box from starting_triangle (included) to ending_triangle (excluded)
node->starting_triangle = starting_triangle;
node->ending_triangle = ending_triangle;
Vector diag = compute_diag(node->bbox);
Vector middle_diag = node->bbox.Bmin + diag * 0.5;
int longest_axis = get_longest(diag);
int pivot_index = starting_triangle;
for (int i = starting_triangle; i < ending_triangle; i++) {
    Vector barycenter = compute_barycenter(indices[i]);
    // the swap below guarantees that triangles whose barycenter is smaller than middle_diag are before "pivot_index"
    if (barycenter[longest_axis] < middle_diag[longest_axis]) {
        std::swap(indices[i], indices[pivot_index]);
        pivot_index++;
    }
}
// stopping criterion
if (pivot_index <= starting_triangle || pivot_index >= ending_triangle - 1 || ending_triangle - starting_triangle < 5) return;
recursive_call(node->child_left, starting_triangle, pivot_index);
recursive_call(node->child_right, pivot_index, ending_triangle);

Remark: We used the middle of the axis as a criterion for separating triangles. In unbalanced scenes
(with many more triangles on one side than the other) this may not be optimal. A heuristic consists
in minimizing the Surface Area Heuristic (SAH)13 to find a better place to cut.

Once the tree is built, the ray-BVH intersection can be performed by recursively visiting boxes
that are intersected. An interesting option is to perform a depth-first traversal until a triangle is
intersected (if any), and to avoid visiting bounding boxes that are further than the best triangle found
so far14 :

if (!root.bbox.intersect(ray)) return false;
std::list<Node*> nodes_to_visit;
nodes_to_visit.push_front(&root);
double best_inter_distance = std::numeric_limits<double>::max();
while (!nodes_to_visit.empty()) {
    Node* curNode = nodes_to_visit.back();
    nodes_to_visit.pop_back();
    // if there is one child, then it is not a leaf, so test the bounding boxes
    if (curNode->child_left) {
        double inter_distance;
        if (curNode->child_left->bbox.intersect(ray, inter_distance)) {
            if (inter_distance < best_inter_distance) {
                nodes_to_visit.push_back(curNode->child_left);
            }
        }
        if (curNode->child_right->bbox.intersect(ray, inter_distance)) {
            if (inter_distance < best_inter_distance) {
                nodes_to_visit.push_back(curNode->child_right);
            }
        }
    } else {
        // test all triangles between curNode->starting_triangle
        // and curNode->ending_triangle as before.
        // if an intersection is found, update best_inter_distance if needed
    }
}

14. A similar remark holds between objects of the scene: it is not useful to test the triangles of a mesh whose bounding
box is further than the best triangle found so far.

Doing so drastically improves the render time: now less than 3 seconds for 32 spp! The traversal
order can also be optimized: since we perform a depth first traversal, it can be useful to first traverse
boxes that are closer to the ray origin. Feel free to add this to your pathtracer !

Normals and Textures

Now that we have computed geometric intersections with triangles, we can use barycentric coordinates
to interpolate values on the mesh. The first thing we will do is interpolate normals. In fact, 3D models
are often provided with per-vertex normals (or even per-vertex-per-triangle: one vertex can have
different normals depending on which triangle it is considered to belong to). These artist-defined normals
control the perceived smoothness of the shape, without changing the geometry itself, by allowing
each shaded point to receive a normal that is interpolated from the normals of the vertices of the intersected
triangle. Specifically, we can compute the shading normal as

\hat{N}(P) = \frac{\alpha(P)\, N_A + \beta(P)\, N_B + \gamma(P)\, N_C}{\|\alpha(P)\, N_A + \beta(P)\, N_B + \gamma(P)\, N_C\|}

where α(P), β(P) and γ(P) are the barycentric coordinates of P within the triangle ABC whose artist-
defined normals at A, B and C are respectively NA, NB, and NC. This shading normal can be used
in all lighting computations15. This process is called Phong interpolation (and has nothing to do with
the Phong BRDF except that it comes from the same inventor...). The result can be seen in Fig. 2.34.
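In code, once the ray-triangle routine returns β and γ, this interpolation is only a few lines (a sketch; NA, NB and NC denote the per-vertex normals of the intersected triangle):

double alpha = 1 - beta - gamma;
Vector shadingN = alpha * NA + beta * NB + gamma * NC;  // barycentric interpolation of the vertex normals
shadingN = shadingN / shadingN.norm();                  // renormalize before using it for lighting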

Figure 2.34: Cat model without (left) and with (right) interpolation of normals.

Similarly, vertices are associated with "per-vertex-per-triangle" UV coordinates. These coordinates
correspond to a parameterization of the mesh, which is non-trivial to obtain in the general case. We
will see in Sec. 4.6 how they can be obtained. UV coordinates associate to each vertex of each
triangle a 2D point within a texture map. The texture domain is normalized to the range [0, 1]².
Interpolated UV coordinates are often interpreted modulo 1, that is, only the fractional part of the
texture coordinates is used (if values are positive – consider the texture as a flat torus), which can
be useful for tiling textures (a wall made of bricks can be geometrically modeled by a single quad,
with UV coordinates (0, 0) and (N, N) at its extremities: a texture of a single brick can then be
used, and will produce a tiling of N × N bricks). UV coordinate interpolation is similarly performed:
\widehat{UV}(P) = \alpha(P)\, UV_A + \beta(P)\, UV_B + \gamma(P)\, UV_C. The interpolated UV coordinates are then scaled by
the width and height of the texture, and the texture color is then queried at the corresponding pixel
(Fig. 2.35). This color can serve as the albedo, for example.

15. One can however notice that tweaking the integration hemisphere may break BRDF energy conservation...


Figure 2.35: UV coordinates associate for each 3D vertex a 2D coordinate in the texture map, that
can be interpolated using barycentric coordinates.

We are now ready to implement textures. We will be using stb_image (see Sec. 1.1) to load the
image with the stbi_load function, and retrieve its width and height. Each triangle is associated with
a group that corresponds to the material index within the associated cat.mtl file. This material file,
in this case, contains a single material, so all group values are set to 0 for all triangles – this may
not be the case for more complex objects, where different textures can be used for different parts of
the mesh. You can add a function to load one (or several) textures upon loading the mesh, and your
intersection routine should now return an albedo computed locally. The result can be seen in Fig. 2.36.

" Albedo values are in the range [0, 1] while textures are integers stored in unsigned chars. Do not
forget to divide by 255 ! But at this stage, you may realize that your texture was saved in a gamma-
corrected color space, so you would also need to apply a gamma function of color2 .2 to the queried
colors. Also, make sure to convert your pixel coordinates (x, y) to integers before accessing textures
with formulas such as texture[y*W*3+x*3+c]. If these coordinates contain fractional parts, the wrong
pixel will be accessed ! Finally, beware that the origin (0, 0) of UV coordinates is conventionally the
top left of the texture, while most often textures are loaded from bottom to top.
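Putting these remarks together, a texel lookup could be sketched as follows (assuming the texture was loaded by stbi_load into an unsigned char array texture of dimensions texW × texH × 3, and that uv holds the interpolated UV coordinates; the vertical flip may need adapting to your models):

double uu = uv[0] - std::floor(uv[0]);                 // keep only the fractional part (tiling)
double vv = uv[1] - std::floor(uv[1]);
int x = std::min((int)(uu * texW), texW - 1);          // integer pixel coordinates
int y = std::min((int)((1 - vv) * texH), texH - 1);    // vertical flip between UV and image conventions
Vector albedo;
for (int c = 0; c < 3; c++) {
    double value = texture[y * texW * 3 + x * 3 + c] / 255.;  // back to [0, 1]
    albedo[c] = std::pow(value, 2.2);                          // undo the gamma of the stored texture
}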

Blinn-Phong BRDF

Our materials were until now "perfect": perfectly diffuse, perfectly specular or perfectly transparent.
However, most real-world materials are combinations of these materials, or have some aspects
of these materials. A simple model was initially presented by Phong, called the Phong BRDF, and is
formulated as f(\omega_i, \omega_o) = \frac{\rho_d}{\pi} + \rho_s \langle \omega_i, R_N(\omega_o) \rangle^\alpha, with R_N(\omega_o) the reflection of \omega_o around the normal
N, and α the Phong exponent that controls the frequency of the reflection (a high α produces smaller

Figure 2.36: Cat model with textures, with a gamma function applied. The code is now about 700
lines long, including the 200 lines obj file reader.

Figure 2.37: The original Phong model does not appropriately model the distortion of highlights at
grazing angles (left of each pair) while this is solved by the Blinn-Phong model (right of each pair).
Left image pair by an unknown author. Right image pair by Lecocq et al. 2017.

highlights, giving the impression of a more shiny material, see Fig. 2.38). However, this model does
not model well the highlight distortions at grazing angles (Fig. 2.37). A modified Phong BRDF model is
given by the Blinn-Phong model:

f(\omega_i, \omega_o) = \frac{\rho_d}{\pi} + \rho_s\, \frac{\alpha + 8}{8\pi}\, \left\langle N, \frac{\omega_i + \omega_o}{\|\omega_i + \omega_o\|} \right\rangle^\alpha

which better handles grazing incidences (the correct normalization factor is also slightly more complex).
The term H = \frac{\omega_i + \omega_o}{\|\omega_i + \omega_o\|} is called the half-vector, a vector halfway between the incident and outgoing
directions (considering that both vectors point away from the surface). We will implement this model.

To implement the Blinn-Phong BRDF, you could simply replace the diffuse BRDF we used by this
BRDF. That would work – up to large noise levels for specular materials. Our importance sampling
strategy consisted in sampling the hemisphere according to a simple cosine function, which produces
more often directions near the surface normal and few directions at grazing angles. However, if the
BRDF is highly specular, we expect the integrand to be very large near the reflected direction, and
very low far from it. This does not coincide with where we importance sampled our directions.

The goal will be to produce an importance sampling strategy for the Blinn-Phong model.

Importance sampling the specular lobe. We will first focus on the specular component and
assume ρd = 0 and ρs = 1. We have seen how to importance sample a direction that follows a cosine
law around the normal of the surface – we called this function random_cos(const Vector &N). There
is a generalization of this importance sampling strategy that allows sampling according to some power
of a cosine law16:

r_1, r_2 \sim \mathcal{U}(0, 1)
x = \cos(2\pi r_1)\sqrt{1 - r_2^{\frac{2}{\alpha+1}}}
y = \sin(2\pi r_1)\sqrt{1 - r_2^{\frac{2}{\alpha+1}}}
z = r_2^{\frac{1}{\alpha+1}}

where the pdf is given by p(X) = \frac{\alpha+1}{2\pi} \cos^\alpha \theta, where θ is the angle with the +z axis (or any
other vector, up to a frame change as we did earlier).

We can use this formula to sample a half-vector H (which is the direction that follows some
lobe-shaped law around the normal), and bring it to our local frame with the same change of variables
as before. We finally need to mirror ωo around H to obtain the desired sampled direction ωi.
This last step introduces a transformation that needs to be taken care of in the pdf: we now
have p(\omega_i) = \frac{1}{4\langle \omega_o, H \rangle}\, \frac{\alpha+1}{2\pi}\, \langle H, N \rangle^\alpha. Let us call this entire sampling procedure random_pow(const
Vector &N, double alpha).
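A sketch of random_pow (it reuses the same tangent-frame construction as random_cos, abstracted here as a hypothetical build_tangent helper; the last two lines show how the sampled half-vector could then be used to mirror ωo):

Vector random_pow(const Vector &N, double alpha) {
    double r1 = uniform(engine);
    double r2 = uniform(engine);
    double cos_theta = std::pow(r2, 1. / (alpha + 1));   // z = r2^(1/(alpha+1))
    double sin_theta = sqrt(1 - cos_theta * cos_theta);
    double x = cos(2 * M_PI * r1) * sin_theta;
    double y = sin(2 * M_PI * r1) * sin_theta;
    Vector T1 = build_tangent(N);   // hypothetical helper: same tangent construction as in random_cos
    Vector T2 = cross(N, T1);
    return x * T1 + y * T2 + cos_theta * N;  // direction following a cos^alpha lobe around N
}

// usage: sample a half-vector around the normal, then mirror omega_o around it
Vector H = random_pow(N, alpha);
Vector omega_i = 2 * dot(omega_o, H) * H - omega_o;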

Importance sampling a mixture model. We would like to sample a distribution of the form
p(x) = \sum_i \alpha_i\, p_i(x), with \sum_i \alpha_i = 1. This can be achieved by using a uniform random number between
0 and 1 to determine which of the p_i to sample, with probability α_i. But then, multiple choices
are possible to numerically evaluate the integral I = \int f(x)\, dx = \sum_i \int f_i(x)\, dx (writing the integrand as
a sum of terms f_i – in our case, a diffuse and a specular term). The first, most
immediate, option is to ignore the particular form of the integrand, and compute the estimate as

I \approx \frac{1}{N} \sum_{k=1}^{N} \frac{f(x_k)}{\sum_i \alpha_i\, p_i(x_k)}

However, this requires evaluating p_i(x_k) for all p_i. In our context, we have two
p_i's: one for the diffuse part, which is cheap to compute, and one for the specular part, which is expensive
to compute. Having to evaluate the pdf for the specular part although we sampled the diffuse part is
not optimal. There is another option that also works, by realizing that you actually evaluate a sum
of integrals. In that case, the uniform random number that you initially chose actually corresponds
to selecting which of the f_i you want to evaluate. The estimator becomes

I \approx \frac{1}{N} \sum_{k=1}^{N} \frac{f_{i(k)}(x_k)}{\alpha_{i(k)}\, p_{i(k)}(x_k)}

where i(k) is the index of the k-th randomly sampled pdf p_i, and x_k the corresponding sample17. Doing so
allows to first determine which term is sampled, and then only evaluate this part. This implies that
if the diffuse component is chosen, there is no other complex function to evaluate compared to our
implementation for diffuse materials18. We end up with a code similar to:

Vector Scene::getColor(const Ray& ray, int ray_depth) {
    if (ray_depth < 0) return Vector(0., 0., 0.); // terminates the recursion at some point

    Vector Lo(0., 0., 0.);
    if (intersect(ray, P, N, sphere_id)) {
        if (spheres[sphere_id].mirror) {
            // handle mirror surfaces...
        } else {
            // handle Phong materials
            // add direct lighting
            Vector xprime = random_point_on_light_sphere();
            Vector Nprime = (xprime - centerLight) / (xprime - centerLight).norm();
            Vector omega_i = (xprime - P) / (xprime - P).norm();
            double visibility = ...; // computes the visibility term by launching a ray of direction omega_i
            double pdf = dot(Nprime, (P - centerLight) / (P - centerLight).norm()) / (M_PI * R * R);
            Vector brdf_direct = PhongBRDF(...); // the entire Blinn-Phong model
            Lo = light_intensity / (4 * M_PI * M_PI * R * R) * brdf_direct * visibility
                 * std::max(dot(N, omega_i), 0.) * std::max(dot(Nprime, -omega_i), 0.)
                 / ((xprime - P).squared_norm() * pdf);

            // add indirect lighting
            double diffuse_probability = rho_d / (rho_d + rho_s); // we should use some color average of rho_d and rho_s
            if (uniform(engine) < diffuse_probability) { // we sample the diffuse lobe
                Ray randomRay = ...; // randomly sample a ray using random_cos
                Lo += albedo / diffuse_probability * getColor(randomRay, ray_depth - 1);
            } else { // we sample the specular lobe
                Ray randomRay = ...; // randomly sample a ray using random_pow and mirroring of ray.direction
                if (dot(randomRay.direction, N) < 0) return Vector(0., 0., 0.); // make sure we sampled the upper hemisphere
                Vector brdf_indirect = rho_s * (alpha + 8) / (8 * M_PI) * PhongSpecularLobe(...); // just the specular part of the Blinn-Phong model
                double pdf_pow = ...; // the pdf associated with our function random_pow, including the reflection
                Lo += brdf_indirect * std::max(dot(N, randomRay.direction), 0.)
                      / ((1 - diffuse_probability) * pdf_pow) * getColor(randomRay, ray_depth - 1);
            }
        }
    }
    return Lo;
}

16
See again Philip Dutré's Global Illumination Compendium https://people.cs.kuleuven.be/~philip.dutre/GI/TotalCompendium.pdf
17
Similar weighting strategies exist for more general classes of integrands; this is called Multiple Importance Sampling.
18
There is a third option, but it works less well in practice – see Variance reduction for Russian-roulette http://cg.iit.bme.hu/~szirmay/c29.pdf for details.

Regarding the choice of ρs , it is usually taken as white for dielectrics (e.g., plastics), but can be
colored for metals. Results can be seen in Fig. 2.38.

Camera and object motion

We can move an object by a transformation T by instead transforming the rays via the inverse T^-1 of T.
Specifically, considering a 4x4 affine transform T, you need to transform the ray origin (Ox, Oy, Oz, 1.0)
and direction (ux, uy, uz, 0.0) by T^-1. The intersection point found should then be transformed by
T, and its normal should be transformed by the inverse transpose matrix (T^-1)^T = T^-T. Doing so has
several advantages over directly transforming the vertices of each object upon loading them. First,
a BVH can be appropriate for a mesh but not for a rotated version of it. Second, and more
importantly, this allows for instantiating objects by merely storing several transforms of the same
geometry. And finally, it allows for animating objects by merely playing with the transformations,
rather than rebuilding a BVH at each frame of the animation.

Moving the camera is more straightforward: just transform the origin and direction of the ray
when initially generating rays.

(Sub-figure labels: ρs = 0.02; ρs = 0.1; α = 10, α = 1000, α = 1000.)

Figure 2.38: Cat model with a Blinn-Phong BRDF with varying α and ρs (here, ρs is not colored).
α controls the roughness of the material (i.e., the size of highlights) while ρs controls the intensity
of highlights. Note that since ρd is guided by a texture between 0 and 1 and ρs is a constant, this
particular rendering may not preserve energy. The rendering takes 1min20 for 1000spp – the code is
about 740 lines long.

Recall that the inverse of a rotation is its transpose, the inverse of a diagonal scaling matrix is a
diagonal matrix with the inverse of the scaling factors, and the inverse of a translation is a translation in
the opposite direction. Our affine transforms are usually compositions of these elementary transforms.
So, if a matrix encodes the transformation y = s R x + t with s a scaling factor, R a rotation matrix and
t a translation, then x = R^T (y − t)/s. As such, when there is no scaling factor and when dealing with
vectors such as the normal vector, the inverse transposed transformation is the original transformation.
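As a minimal sketch (with hypothetical class and helper names – TransformedGeometry, transform_point and transform_vector are not part of the course code), intersecting a transformed object could look like this:

// T is the object-to-world 4x4 matrix, Tinv its inverse, TinvT its inverse transpose.
bool TransformedGeometry::intersect(const Ray& ray, Vector& P, Vector& N, double& t) const {
    Ray localRay(transform_point(Tinv, ray.origin),        // w = 1: the translation applies
                 transform_vector(Tinv, ray.direction));   // w = 0: the translation is ignored
    Vector localP, localN;
    if (!geometry->intersect(localRay, localP, localN, t)) return false;
    P = transform_point(T, localP);                        // hit point back to world space
    N = transform_vector(TinvT, localN);                   // normals use the inverse transpose
    N = N / N.norm();
    return true;
}

Not renormalizing the transformed ray direction keeps the intersection parameter t directly comparable between transformed and untransformed objects.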

" Beware: you may have used the coordinates of the light source in your code, outside of the
ray-object intersection test (e.g., during the shading computation). Do not forget to also transform
these coordinates if you want to move the light source !

Normal Mapping

A common way to fake small details without increasing the geometric complexity of the mesh is to use
normal maps – a second way to tweak the shading normal, the fake normal used during the shading
computation in place of the geometric normal. A normal map is simply a texture that stores the
shading normal in some local frame. The two coordinates UV within the normal maps are mapped
to tangent and bi-tangent vectors (i.e., two vectors orthogonal to the geometric normal that form
an orthogonal basis, as we did when we first implemented indirect lighting), and the RGB value

Figure 2.39: Another model (see http://www.cadnav.com/3d-models/model-45798.html) with


Blinn-Phong BRDFs. The BRDF parameters can be found in the supplemental materials of “Ex-
perimental Analysis of BRDF Models” (https://people.csail.mit.edu/wojciech/BRDFAnalysis/
BRDFFits.pdf), a document that contains fits of several analytical BRDF models on 100 measured
materials. Here, they correspond to metallic-gold, alum-bronze, and green-metallic-paint. Note that
this 3d mesh has both few very large triangles and many small triangles. This results in a highly
unbalanced BVH, and the rendering time suffers: about 25min for 1000 spp and 5 bounces, for (only)
143k triangles – something that could be fixed with the Surface Area Heuristic for better balancing.
The mesh has first been scaled by a factor 0.1, then translated by Vector(0, 21, 45) ; the focus
distance is 44 instead of 55.

Figure 2.40: I rotated the cat around the vertical axis by 45 degrees using matrix transforms (along
with a hardcoded translation), and rotated the camera by -10 degrees around the x axis.

within each pixel encodes the shading normal vector in this local frame. As such, most normal maps
are blueish: the blue component represents the normal component of the shading normal, and the
shading normal is most often close to the geometric normal that would be encoded as pure blue:
(0, 0, 1). However, to handle negative values, RGB pixel values are transformed using an RGB ∗ 2 − 1
mapping, so in fact, a shading normal identical to the geometric normal would
be encoded as (0.5, 0.5, 1) (or (127, 127, 255) in unsigned char).

To obtain the tangent and bitangent vectors, we will not proceed as before. In fact, our tangent
vectors did not matter before since our reflectance model was isotropic. Conventionally, these vectors
T and B (for Tangent and Bi-tangent) are aligned with the UV parameterization: a vector V (P ) in
3D space at point P , can be expressed as a linear combination of T (P ) and B(P ) at P : V (P ) =
Vu (P ) T (P ) + Vv (P ) B(P ).

As such, in a triangle DEF with UV coordinates Du and Dv (similarly for E and F ), and space

coordinates Dx , Dy , Dz (similarly for E and F ), we have


E − D = (E_u − D_u) T + (E_v − D_v) B
F − D = (F_u − D_u) T + (F_v − D_v) B

In matrix form, this reads:

\begin{pmatrix} T_x & B_x \\ T_y & B_y \\ T_z & B_z \end{pmatrix} \begin{pmatrix} E_u - D_u & F_u - D_u \\ E_v - D_v & F_v - D_v \end{pmatrix} = \begin{pmatrix} E_x - D_x & F_x - D_x \\ E_y - D_y & F_y - D_y \\ E_z - D_z & F_z - D_z \end{pmatrix}

It becomes easy to invert the system, as:

\begin{pmatrix} T_x & B_x \\ T_y & B_y \\ T_z & B_z \end{pmatrix} = \begin{pmatrix} E_x - D_x & F_x - D_x \\ E_y - D_y & F_y - D_y \\ E_z - D_z & F_z - D_z \end{pmatrix} \begin{pmatrix} E_u - D_u & F_u - D_u \\ E_v - D_v & F_v - D_v \end{pmatrix}^{-1}

where the inverse of a 2x2 matrix is easily computed using A^{-1} = \frac{1}{\det(A)} \mathrm{Cof}(A)^T with Cof the cofactor matrix:

\begin{pmatrix} E_u - D_u & F_u - D_u \\ E_v - D_v & F_v - D_v \end{pmatrix}^{-1} = \frac{1}{(E_u - D_u)(F_v - D_v) - (F_u - D_u)(E_v - D_v)} \begin{pmatrix} F_v - D_v & -(F_u - D_u) \\ -(E_v - D_v) & E_u - D_u \end{pmatrix}

Written differently, we have:

T = \frac{1}{\det}\big( (E - D)(F_v - D_v) - (F - D)(E_v - D_v) \big)    (2.16)
B = \frac{1}{\det}\big( -(E - D)(F_u - D_u) + (F - D)(E_u - D_u) \big)    (2.17)
\det = (E_u - D_u)(F_v - D_v) - (F_u - D_u)(E_v - D_v)    (2.18)

Now, we can easily compute normalized T and B at each vertex of each triangle of the mesh19 ,
and interpolate these vectors at the desired intersection point P using barycentric coordinates. The
resulting shading normal becomes N̂ = r(P )T (P ) + g(P )B(P ) + b(P )N (P ) where r, g, b represent
the red, green and blue components of the normal map (with the affine transform to bring them in
[−1, 1]), and T (P ), B(P ), N (P ) represent the tangent, bitangent and (geometric) normal vectors at
point P 20 . See Fig. 2.41 for the result.
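A per-triangle computation of T and B following Eq. 2.16–2.18 could look like the sketch below (assuming the Vector class of Sec. 1.4 with the usual operators; the per-vertex averaging and orthogonalization of footnote 19 is done afterwards):

#include <cmath>

void tangent_bitangent(const Vector& D, const Vector& E, const Vector& F,                 // positions
                       double Du, double Dv, double Eu, double Ev, double Fu, double Fv,  // UVs
                       Vector& T, Vector& B) {
    Vector e1 = E - D, e2 = F - D;
    double du1 = Eu - Du, dv1 = Ev - Dv;
    double du2 = Fu - Du, dv2 = Fv - Dv;
    double det = du1 * dv2 - du2 * dv1;                          // Eq. 2.18
    if (std::abs(det) < 1E-12) {                                 // degenerate UVs: no stable tangent
        T = Vector(0., 0., 0.); B = Vector(0., 0., 0.); return;
    }
    T = (e1 * dv2 - e2 * dv1) / det;                             // Eq. 2.16
    B = (e2 * du1 - e1 * du2) / det;                             // Eq. 2.17
}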

Participating Media

Until now we have considered that the medium in which light travels is just vacuum. It is however quite
common for the medium to scatter light – for instance, fog, clouds, the atmosphere, dust... These
media are called “participating media”. We will simulate that.

The first thing to observe is that light is absorbed and scattered away as it travels through the
medium. Light is absorbed exponentially with the distance traveled, following the Beer-Lambert law. But
there is another phenomenon: light reaching neighboring particles is also in-scattered, adding its
contribution to the light ray being considered. This is illustrated in Fig. 2.42.
19
You may need to fiddle a little bit with the code: you may or may not have per-vertex normals, and you may want to
obtain per-vertex (and not per-vertex-per-triangle) tangents and bitangents. Here, we will consider that we have obtained
one tangent T per vertex of the mesh by averaging the T computed for all triangles containing this vertex, orthogonalized
it w.r.t. the per-vertex normal by removing its component along the normal, and then computed the bitangent B as the
cross product between N and T.
20
Similarly to the smooth shading normals we have implemented in Sec. 2.2.1, having a shading normal that is not
exactly the geometric normals can lead to issues in energy conservation.

Figure 2.41: Horse model without (left) and with (middle) normal mapping; the normal map of the
body is illustrated on the right. The code is about 850 lines and runs in 1min 12s (left) or 1min 15s
(right) using 1000spp and 5 bounces. The mesh has only 5333 polygons, but normal mapping makes
it look more complex. The mesh can be downloaded here: http://www.cadnav.com/3d-models/model-46223.html.
It has been rotated like the cat, scaled by 0.15 and translated by (10, -10, 0).
The order of the textures to be loaded (since there is no .mtl file) is: body2_d.tga, body2_d.tga,
gear_d.tga, gear_d.tga, body2_d.tga

Figure 2.42: The light coming from direction ωi is absorbed by the medium, but the medium also
contributes positively (arrows in blue) to the light reaching point P2 .

These phenomena transform the rendering equation by modifying the intensity of the light reaching
a point P2 if it came from P1 in a direction ωi , while up to now, the light emitted from P1 in direction ωi
was the same as the light received by P2 from that direction. The absorption of light can be described
by a multiplicative factor T (t) that depends on the distance parameter t the light has traveled through
the medium. The in-scattered light will be denoted Lv . We have:

Li (P2 , ωi ) = T (kP1 − P2 k)Lo (P1 , ωi ) + Lv (P1 , ωi )

The factor T (t) is called the transmittance function, and equals:


T(t) = \exp\left( -\int_0^t \sigma_t(P(r))\, dr \right)

where P(r) = P_1 + r\,\omega_i and σ_t is the extinction coefficient of the medium, which can be seen as the
gas density of the medium, with σ_t = σ_a + σ_s the sum of the absorption coefficient and the scattering
coefficient. In a few cases of interest, this integral can be computed in closed form. This is the case of
homogeneous media, where σ_t is a constant and thus T(t) = \exp(-\sigma_t t). This is also the case for expo-
nentially decaying fog (such as in the atmosphere) where \sigma_t(y) = \alpha \exp(-\beta(y - y_0)) with y the altitude
over some ground level y_0, in which case

T(t) = \exp\left( \frac{\alpha}{\beta\,\omega_{i,y}} \big( \exp(-\beta(P_{2,y} - y_0)) - \exp(-\beta(P_{1,y} - y_0)) \big) \right)

with \omega_{i,y} the y component of the direction \omega_i, and similarly for P_{1,y} and P_{2,y}. Here P_1 is the ray origin
while P_2 is the first ray-scene intersection along the ray direction \omega_i. An illustration of the effect of
absorption can be seen in Fig. 2.43.

Figure 2.43: The absorption term T , using a uniform extinction coefficient (left, σt = 0.03) and
exponentially decreasing model (right, σt = exp(−0.3(y + 10))).
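As a small sketch, the two closed-form transmittance expressions above could be implemented as follows (assuming the y coordinate of a Vector is accessed with operator[](1); the special case of a nearly horizontal ray is handled separately since σt is then constant along the ray):

#include <cmath>

double transmittance_uniform(double sigma_t, double t) {
    return std::exp(-sigma_t * t);
}

double transmittance_exp_fog(double alpha, double beta, double y0,
                             const Vector& P1, const Vector& omega_i, double t) {
    double P1y = P1[1], P2y = P1[1] + t * omega_i[1];   // altitudes of the segment end points
    if (std::abs(omega_i[1]) < 1E-9)                    // horizontal ray: constant extinction along it
        return std::exp(-alpha * std::exp(-beta * (P1y - y0)) * t);
    return std::exp(alpha / (beta * omega_i[1]) *
                    (std::exp(-beta * (P2y - y0)) - std::exp(-beta * (P1y - y0))));
}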

Regarding Lv the in-scattered radiance, it corresponds to all light reaching points along the ray
that scatter light in the direction ωi . It can also simply be expressed as:
L_v(P_1, \omega_i) = \int_0^t \sigma_s(P(r))\, T(r) \int_{S^2} f(\omega_i, v)\, L_i(P(r), v)\, dv\, dr

Here f is called the phase function and acts similarly to a BRDF. This function tells how much
light is reflected off a particle (e.g., of dust) or a molecule (e.g., of gas), similarly to the way a
BRDF describes how much light is reflected off a surface. For simplicity, we will implement a uniform
phase function (i.e., f = 1/(4π)), though you can look up the Mie scattering formula for large particles,
and Rayleigh scattering for particles smaller than the light wavelength (which gives the sky its color).

At first sight, it seems that adding this integration to our path-tracer would be extremely costly.
In fact, recall that what we are doing is Monte Carlo integration, which essentially does not care about the dimen-
sionality of the integrand! We are here merely adding a couple of dimensions to an integral equation
that already had many. All we need is to be able to evaluate the integrand with random
parameters for r and v, and the way we average over all light paths will take care of evaluating the
integral. Our code should just look like:

Vector Scene::getColor(const Ray& ray, int ray_depth) {
    if (ray_depth < 0) return Vector(0., 0., 0.); // terminates the recursion at some point

    Vector Lo(0., 0., 0.);
    if (intersect(ray, P, N, sphere_id)) {
        if (spheres[sphere_id].mirror) {
            // handle mirror surfaces...
        } else {
            // handle Phong materials
        }
    }
    // return Lo; // previous code without participating media

    double T = ...;  // transmittance function (use the closed form expression)
    Vector Lv = sigma_s_r * T_r * phase_func * getColor(random_ray, ray_depth - 1); // evaluate the integrand with a random "r" and random "v"
    double pdf = ...; // pdf for the choice of "r" and "v"
    return T * Lo + Lv / pdf; // return the radiance modified by the participating medium
}

The problem is that the above code contains 2 calls to getColor, one (hidden) to compute the
indirect lighting contribution for Phong materials, and another (shown) for the participating medium
computation. This will make the number of rays in the scene explode. While one option is to use
a smaller ray depth for the participating medium, a simpler solution lies in Single Scattering.
In the (direct) single scattering approximation, only the direct component is sampled instead of the
entire sphere for the in-scattered contribution (while the light source contributes a lot to the in-
scattered radiance, the indirect lighting from objects and from nearby particles is often a much smaller
contribution). We will thus not call getColor but send rays toward the light source, for which the
intensity is either that of the light source, or zero if it is occluded.

Regarding the random distance r, we could use a uniform random number in (0, t) (with t the
distance between the origin of the ray and the nearest intersection). But the exponentially decaying
nature of the absorption makes it less relevant to sample a point that is very far away (since the light
that will reach P2 will be highly absorbed). We could instead use an importance sampling strategy
that maximizes the contribution of light sources21 . Instead, we will adopt a slightly simpler strategy:
using an exponential distribution. To sample r with an exponential distribution of parameter λ, we
can use r = − log(u)/λ with u a uniform random number in (0, 1)22 and the corresponding pdf is
p(x) = λ exp(−λx).
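In code, this distance sampling is only a couple of lines (a sketch, with uniform/engine as in the previous labs; the choice of λ is left open):

double lambda = ...;                           // e.g., related to the extinction coefficient
double u = uniform(engine);
double r = -std::log(u) / lambda;              // r follows an exponential distribution of parameter lambda
double pdf_r = lambda * std::exp(-lambda * r);
// if r >= t (the distance to the nearest surface), the sample falls beyond the surface:
// its in-scattering contribution is zero and can simply be skipped for this path.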

Regarding the random choice of v, we will sample a point on the light source, use the change
of variable formula (which includes the visibility term, squared distance..), and throw a ray in this
direction v. We will use the same pdf as we computed earlier for sampling spherical area light sources.
The resulting images can be seen in Fig. 2.44.

" You may see very few bright pixels that do not seem to make sense. These are called fireflies
and correspond to events of very low probability that would require many many more rays to be
compensated.... You may want to discard paths where the pdf is smaller than an epsilon. Beware
however that it biases the rendering, but again, variance vs. bias is a tradeoff.

To conclude this course on path-tracing, I will just show a nicer scene. Because let's face it: the
colors I previously used are just ugly. See Fig. 2.45.

21
See for instance Importance Sampling of Area Lights in Participating Media http://library.imageworks.com/pdfs/
imageworks-library-importance-sampling-of-area-lights-in-participating-media.pdf
22
This can be easily demonstrated using the inverse cumulative distribution function.

Figure 2.44: Adding the in-scattered radiance to the models presented in Fig. 2.43, with uniform (left,
σs = 0.004) and exponential (right, σs = 0.5 exp(−0.3(y + 10))) fog. I used λ = 0.3t. The code is 900
lines and renders in 3min 40sec.

2.2.2 Photon Mapping

A completely different approach relies on launching photons from all light sources, making them interact
with the scene and storing photons on the 3d geometry at each bounce: this produces a photon map
that contains millions of photons deposited in the scene (Fig. 2.46). This photon map is stored within
an acceleration structure tailored for spatial search (while we could use a BVH as well, kd-trees that
produce a space partitioning are often preferred in photon mapping). The scene is finally raytraced
from the camera (without making the ray bounce), and at each ray-scene intersection, nearby photons
are collected using the acceleration structure, and density estimation is performed to estimate how
much energy is reflected towards the camera. Density estimation can be performed by looking for
a fixed number of neighbors and checking how far we need to look for these photons, or it can be
performed by counting how many photons fall within a fixed search radius. This raytracing step is
the final gathering.

Similarly to bidirectional path-tracing, launching photons from light sources makes it possible to better cap-
ture phenomena like caustics, which are otherwise difficult to capture with (unidirectional) path-tracing.
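To make the density estimation step concrete, here is a rough sketch using a hypothetical PhotonMap interface (k_nearest, Photon::power, Photon::incident_direction and brdf are placeholder names, not actual course code):

#include <vector>

Vector radiance_estimate(const PhotonMap& map, const Vector& P, const Vector& N,
                         const Vector& omega_o, int k) {
    std::vector<Photon> nearest;
    double r2;                                  // squared radius enclosing the k nearest photons
    map.k_nearest(P, k, nearest, r2);
    Vector L(0., 0., 0.);
    for (const Photon& p : nearest)
        L += brdf(p.incident_direction, omega_o, N) * p.power;  // flux carried by each photon
    return L / (M_PI * r2);                     // density estimation over the enclosing disc area
}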

2.2.3 Precomputed Radiance Transfer

Let’s write the rendering equation without emissivity:


L_o(\omega_o) = \int_\Omega f(\omega_i, \omega_o) L_i(\omega_i) \langle \omega_i, N\rangle\, d\omega_i

We can easily decompose the different quantities on orthogonal basis functions defined on the
(hemi-)sphere: {F_k}_k. Let's denote the decomposition using hat symbols, and include the cosine term
in the BRDF:

f(\omega_i, \omega_o)\langle\omega_i, N\rangle = \sum_k \hat{f}^k_{\omega_o} F_k(\omega_i)    (2.19)
L_i(\omega_i) = \sum_k \hat{L}^k_i F_k(\omega_i)    (2.20)

Figure 2.45: A nicer scene that includes an exponential fog, better colors for the walls and the ground,
and the Davy Jones model that can be found at http://www.cadnav.com/3d-models/model-45279.
html. Since there is no .mtl file, the textures (by number) should be loaded in that order: 2, 3, 11, 5,
1, 0, 9, 8, 6, 10, 7, 4. These textures include alpha maps (used in this rendering) that tell whether an
intersection should be considered as opaque or transparent (it should be tested inside the ray-triangle
intersection test), as well as specular maps (not used in this rendering) that give the ρs coefficient par
pixel. Rendering time: 4 min. for 1000 spp.

With this decomposition, one can rewrite the above rendering equation:

L_o(\omega_o) = \int_\Omega \sum_k \hat{f}^k_{\omega_o} F_k(\omega_i) \sum_l \hat{L}^l_i F_l(\omega_i)\, d\omega_i
            = \sum_k \sum_l \hat{f}^k_{\omega_o} \hat{L}^l_i \int_\Omega F_k(\omega_i) F_l(\omega_i)\, d\omega_i

If the basis functions are orthonormal with respect to the inner product \langle F_k, F_l\rangle = \int_\Omega F_k(\omega_i)F_l(\omega_i)\, d\omega_i,
this means that

L_o(\omega_o) = \sum_k \hat{f}^k_{\omega_o} \hat{L}^k_i

In other words, one can easily compute the integral by just performing a scalar product between
vectors of coefficients. It can become easy to use this technique for rendering, by precomputing
tabulated values for the decomposition of a BRDF onto some basis functions and the decomposition
of some incident lighting (e.g., computed using photon mapping, or modeled using an environment
map, see Fig. 2.47 and 2.49), and performing the dot product in realtime.
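At runtime, the whole shading computation then boils down to a dot product between coefficient vectors, e.g. (a sketch; fhat and Lhat would be tabulated offline, one vector per color channel):

double Lo = 0.;
for (int k = 0; k < nb_coeffs; k++)
    Lo += fhat[k] * Lhat[k];   // scalar product of the BRDF and lighting coefficients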

Spherical Harmonics. Spherical Harmonics (SH) are commonly used orthogonal basis functions
on the sphere (Fig. 2.48). They are analogous to the Fourier transform on the plane (eigenfunctions

Figure 2.46: Interior scene: (a) Traditional ray tracing. (b) Photon map. (c) Precomputed radiance
estimates at 1/4 of the photon positions. (d) Complete image with direct illumination, specular
reflection, and soft indirect illumination. Fig. 5.2 of the SIGGRAPH 2002 course “A Practical Guide
to Global Illumination using Photon Mapping”.

Figure 2.47: An environment map (or envmap) is simply a panoramic image representing the incident
radiance at a point. It is often used for outdoor scenes since it well approximates distant illumination,
and can be captured by mobile phone apps that stitch photographs into a panoramic image, or by
taking photo(s) of a chrome ball (left). Here, the same environment map is shown with 3 different
parameterizations: Latitude-Longitude, light probe, and cube map.

of the Laplacian operator are sines and cosines on the plane, and are spherical harmonics on the sphere;
if you are not sure about what a Fourier transform is, see Sec. 13). They hence represent a frequency
decomposition of the signal. Similarly to the Fourier transform, they possess a discrete version that can
be efficiently evaluated using Fast Fourier Transforms. Additionally, they possess rotation formulas:
one can obtain the SH decomposition of a rotated version of the signal using a simple (block diagonal)
matrix-vector multiplication of the SH coefficients of the original signal. This property can be useful
for frame changes and interpolation. The m = 0 subset of SH are called Zonal Harmonics.

Finally, from the rendering equation expressed in terms of a dot product between SH coefficients,
it becomes clear that a low-frequency illumination over a high-frequency (e.g., specular) surface will
produce the same result as a high-frequency illumination over a low-frequency (e.g., diffuse) material
– see Fig. 2.49. This is one reason why photographers use light diffusers: they make skin more
matte, and remove shiny reflections.

Spherical Wavelets. Spherical Harmonics have the same limitations as Fourier basis functions:

Figure 2.48: Spherical Harmonics up to degree 5 (Source Dr Franz Zotter, Wikimedia Commons).

Figure 2.49: First image: input environment map, as a light probe. Other images: Progressively
increasing the number of terms in the spherical harmonics decomposition increases accuracy. In that
order: 3, 5, 6, 7, 30, 55, 80 spherical harmonic bands (N bands correspond to N 2 coefficients).

they are non-local, and tend to induce ringing artifacts when clamped abruptly. Compressing highly
specular BRDFs with SH is thus not very efficient. In this context, wavelets, which were introduced for
image processing, have been extended to work on the sphere. A simple Haar wavelet decomposition
on the sphere can be obtained via successive triangulations of the sphere, filtering and differences.
A detailed hands-on introduction to spherical wavelets in Matlab can be found in Gabriel Peyré's
Numerical Tours: https://www.numerical-tours.com/matlab/meshwav_4_haar_sphere/.

2.2.4 Radiosity

In the special case of diffuse surfaces, with isotropic omnidirectional emissivity (Le does not depend on
ωo ), and assuming vacuum (then the incident radiance Li is exactly the outgoing radiance Lo coming
from another point, at equilibrium, and is simply our unknown denoted L) the rendering equation can
be further simplified:

Figure 2.50: Progressively increasing the number of spherical harmonic bands to represent a gold
BRDF makes it more shiny. In that order: 1, 2, 3, 4, 6, 12 SH bands (N bands correspond to N 2
coefficients).

L(x) = L_e(x) + \frac{\rho(x)}{\pi} \int_\Omega L(x, \omega_i) \langle \omega_i, N\rangle\, d\omega_i

Notice how the result does not depend on any direction: one can freely navigate in the scene
without needing to recompute anything. We will rewrite the rendering equation so as to integrate
over the scene surface elements rather than directions, as we did in Sec. 2.2.1:

L(x) = L_e(x) + \frac{\rho(x)}{\pi} \int_S L(x')\, G(x, x')\, dx'

with G(x, x') = \langle \omega_i(x'), N_x\rangle \frac{\langle N_{x'}, -\omega_i(x')\rangle\, V(x, x')}{\|x - x'\|^2} the form factor we talked about earlier in Sec. 2.2.1.

The idea is to decompose again the unknown radiance L onto basis functions. Typically, either
constant or piecewise linear functions are used per triangle of the mesh. For instance, using constant
basis functions per triangle, and denoting B_k the basis function that is 1 over triangle k and 0
elsewhere, we can rewrite the above expression in this basis as:

L^k = L_e^k + \frac{\rho_k}{\pi} \sum_l L^l G^{k,l}

This yields a particularly simple linear system, written in matrix/vector form:

L = L_e + \mathrm{diag}\left(\frac{\rho}{\pi}\right) G\, L

and by rearranging terms:

L = \left( \mathrm{Id} - \mathrm{diag}\left(\frac{\rho}{\pi}\right) G \right)^{-1} L_e = M^{-1} L_e

Solving linear systems in general is out of the scope of this class23. However, a particularly simple
approach is to use Jacobi iterations, which read at iteration n + 1:

L_{i,n+1} = \frac{1}{M_{i,i}} \left( L_e^i - \sum_{j \neq i} M_{i,j} L_{j,n} \right)

where Li,n is the radiosity at triangle i and iteration n, and converges to the true solution Li as
n → ∞. It happens that each additional Jacobi iteration simulates one new light bounce.
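A sketch of one such Jacobi iteration, for the dense matrix M above and one color channel (the matrix is assumed stored row-major in a std::vector):

#include <vector>

std::vector<double> jacobi_iteration(const std::vector<double>& M, const std::vector<double>& Le,
                                     const std::vector<double>& L_prev, int n) {
    std::vector<double> L_next(n);
    for (int i = 0; i < n; i++) {
        double s = Le[i];
        for (int j = 0; j < n; j++)
            if (j != i) s -= M[i * n + j] * L_prev[j];
        L_next[i] = s / M[i * n + i];   // each iteration simulates one new light bounce
    }
    return L_next;
}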

The last detail I did not mention is how to compute the matrix G. This matrix (assuming piecewise
constant basis functions) has coefficients G_{i,j} = \int_{T_i}\int_{T_j} G(x, x')\, dx\, dx' where the integration is over all
pairs of triangles. Since G(x, x') includes a visibility term, there is no real hope of having a closed form
expression in the general case: this integral is performed by sampling pairs of points, computing the
visibility term by raytracing, and evaluating the integral using Monte Carlo integration. To generate
uniformly random points within a triangle (with pdf p(x) = 1/area), one can again rely on the Global
Illumination Compendium:

r_1, r_2 \sim U(0, 1)    (2.21)
\alpha = 1 - \sqrt{r_1}    (2.22)
\beta = (1 - r_2)\sqrt{r_1}    (2.23)
\gamma = r_2 \sqrt{r_1}    (2.24)

with α, β and γ the barycentric coordinates of the sampled point. A radiosity result can be seen in
Fig. 2.51. More recent approaches allow for glossy materials24 .
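For reference, sampling a uniformly random point inside a triangle (A, B, C) with Eq. 2.21–2.24 can be written as follows (a sketch, with uniform/engine and the Vector class as before):

#include <cmath>

Vector sample_triangle(const Vector& A, const Vector& B, const Vector& C) {
    double r1 = uniform(engine), r2 = uniform(engine);
    double s = std::sqrt(r1);
    double alpha = 1. - s, beta = (1. - r2) * s, gamma = r2 * s;  // barycentric coordinates
    return alpha * A + beta * B + gamma * C;
}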

2.3 Discussion

Nowadays, most work on radiosity has been abandoned: this approach is most often costly and
(almost) limited to diffuse scenes, but above all, highly dependent on the mesh quality. Rendering a
large diffuse flat wall cannot be done with a single quadrilateral (or two triangles) but requires many triangles
that would ideally align with cast shadows (a few approaches try to progressively refine the mesh
where needed).

Also, offline, costly path-tracing and real-time GPU rasterization tend to get inspired by each
other. A couple of game engines have started to integrate path-traced effects on the GPU for rendering
specular or transparent objects, mostly by using very few samples per pixel combined with clever
filtering tricks (e.g., using deep learning). Conversely, it can sometimes be useful to rasterize the first
bounce of a path-tracer, since the first intersection between camera rays and the scene can often be
found directly by rasterizing (if no depth-of-field effect is desired).

23
You can see a couple of slides I wrote at https://projet.liris.cnrs.fr/origami/math/presentations/matrices.
pdf
24
Implicit visibility and antiradiance for interactive global illumination, https://hal.inria.fr/inria-00606794/PDF/
ImplicitVisibilityAndAntiradiance.pdf

Figure 2.51: Radiosity result – my (very naive) implementation has 100 lines for building and solving
the linear system + 450 lines for defining basic classes (Vector, Triangle, Mesh...), reading obj files,
constructing the BVH, intersecting. There are 10 light bounces (i.e., Jacobi iterations), 62 892 triangles
and piecewise constant basis functions. The entire matrix M is densely stored so it is huge in memory
(about 100GB in total for one matrix M per RGB color channel) – much better strategies exist –
and the computing time is a few hours. The mesh is available at https://perso.liris.cnrs.fr/
nbonneel/radiositymesh.obj – triangles with group==3 are emissive.
Chapter 3

Image Processing

This chapter covers a couple of image processing techniques that are popular within the computer
graphics community.

3.1 Filtering

This section will detail a couple of commonly used filtering operations in computer graphics.

3.1.1 Gaussian filtering

I will briefly go over Gaussian filtering. We have seen some ways of doing Gaussian filtering in the
context of our path-tracer. Gaussian filtering amounts to performing a convolution between a signal
and a Gaussian function. In 1-d the Gaussian function is expressed as:

h(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

The convolution between the signal f and the Gaussian h is defined, in 1D again and continuously, as

(f * h)(y) = \int_{-\infty}^{\infty} f(x)\, h(y - x)\, dx

This operation averages values in the neighborhood of y with some Gaussian weight so that values
far from y contribute less to the average and values close to y contribute more. This tends to blur the
image, as can be seen in Fig. 3.1. Filtering a 2-d image with a Gaussian can be performed in several
ways.

Discrete, separable, truncated, convolution

The first thing to realize is that a d-dimensional isotropic Gaussian function is the product of d 1-
dimensional Gaussian functions. It happens that in most cases, we will be interested in isotropic
Gaussians anyway. As such, the convolution in d dimensions for isotropic Gaussians is separable.
Concretely, it means that a Gaussian convolution over a 2-d image can be performed by first con-
volving each row independently with a 1-d Gaussian, and then convolving each column of the result
independently with a 1-d Gaussian.


(Panels: Noisy image; Gaussian filtered – 0.1 s, 25 lines; Bilateral filtered – 0.24 s, 25 lines; Non-local Means – 0.4 s, 45 lines.)

Figure 3.1: Image filtering. To handle RGB values, we perform 3 independent filterings, 1 for each
color channel. First row. Our Davy Jones rendering (without fog) at 4spp contains much noise and
is computed in 0.6 seconds in 512x512 (left). A simple Gaussian (σ = 7, 31x31 windows) aggressively
blurs the result without accounting for edges. The bilateral filter (σx = 7, σv = 45, 31x31 windows)
reduces noise while preserving edges. A naive bilateral filter implementation runs in 0.24s, though a
(naive) bilateral grid already brings it to 10 ms and takes 80 lines of code instead. A non-local mean
filtering (h = 52 ∗ 98) similarly preserve edges. Second row. Same things at 32 spp – rendering time:
5 seconds. Gaussian: σ = 5. Bilateral filter: σx = 5, σv = 15 and 31x31 windows. The bilateral grid
filtering time is 35 ms and is more expensive since the grid is finer. Non-local mean: h = 152 ∗ 98. All
timings performed using parallel code. Recall that we did not correlate samples among neighboring
pixels, so we could have (largely) reduced noise without post filtering.

The second thing to realize is that Gaussian functions drop quickly, such that truncating a Gaussian
at 1 standard deviation σ preserves 68% of its integral, and at 2σ, 95% remains and less than
5% is lost. For filtering purposes, it is often the case that losing the last 5% is ok1. Additionally, in
certain cases it can be simpler to use the fact that convolving with a Gaussian of standard deviation σ
amounts to convolving twice with a cheaper Gaussian of standard deviation σ/√2. Finally, we will
see in Sec. 3.1.2 that since Gaussian functions are low-pass filters (they smooth things out and remove
high frequencies), Gaussian filtering can be approximately computed on a coarser image resolution.

1
A few algorithms such as the Sinkhorn algorithm for optimal transport are not robust to Gaussian truncation or
other approximations.

Overall, for filters with relatively small σ, a separable truncated convolution can be cheaply performed. A typical algorithm looks like:

Algorithm 1: Compute a Gaussian convolution of image I with a Gaussian of standard deviation σ, assuming 0 outside of the image I.
Input: Image I of size W × H, standard deviation σ
Output: Filtered image F
S ← 2σ  // Support of the filter
h[−S..S] ← 1/(σ√(2π)) exp(−[−S..S]²/(2σ²))  // Precompute a 1-d filter
T[0..W, 0..H] ← 0  // Temporary image
// Filter each row
for i = 1..H do  // For each row (in parallel)
  for j = 1..W do  // For each column
    for k = max(j − S, 1) − j .. min(j + S, W) − j do  // For each filter value
      T(i, j) += h(k) ∗ I(i, j + k)
/* Transpose the image so that columns become rows. This can be done in place if the image is
   square by swapping rows and columns. Transposing the image allows for better cache coherence
   rather than operating directly on columns, since images are stored row by row. */
T ← transpose(T)
// Filter each column
for i = 1..W do  // For each column (in parallel)
  for j = 1..H do  // For each row
    for k = max(j − S, 1) − j .. min(j + S, H) − j do  // For each filter value
      F(i, j) += h(k) ∗ T(i, j + k)
// Transpose back to get the original orientation.
return transpose(F)

Using FFTs

An important identity states that a convolution in the spatial domain amounts to a product in the
frequency domain. As such, denoting F the Fourier transform, and F^{-1} its inverse, we have f ∗
h = F^{-1}(F(f).F(h)). Assuming our images are periodic and stored as discrete pixels, the discrete
analogue of the Fourier transform, called the discrete Fourier transform, can be computed via the Fast
Fourier Transform algorithm, and the theorem still holds.

The Discrete Fourier Transform. A detailed introduction to the Fourier transform is out of
the scope of this class, but I will give some intuition about it. The idea is to decompose a signal
into a sum of waves of different frequencies. With a discrete 1-d signal {xn }, this can be achieved by
computing:
X_k = \sum_{n=0}^{N-1} x_n \left( \cos\left(\frac{2\pi}{N}kn\right) - i\,\sin\left(\frac{2\pi}{N}kn\right) \right)

This amounts to projecting our sequence {x_n} onto complex exponential basis functions of integer
frequencies e^{-\frac{i2\pi}{N}kn}: for the 0th frequency (k = 0) we simply get the sum of the signal (N times its
average), for the 1st frequency, it is a sum weighted by a complex exponential of 1 period, etc. This
transform hence well represents the different frequencies in the signal, and one can recover the initial
signal based on its frequency decomposition:

x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \left( \cos\left(\frac{2\pi}{N}kn\right) + i\,\sin\left(\frac{2\pi}{N}kn\right) \right)
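As an illustration only, the 1-d formula above translates directly into a naive O(N²) transform (use an FFT library such as FFTW for real workloads):

#include <vector>
#include <complex>
#include <cmath>

std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (size_t k = 0; k < N; k++)
        for (size_t n = 0; n < N; n++)
            X[k] += x[n] * std::complex<double>(std::cos(2. * M_PI * k * n / N),
                                                -std::sin(2. * M_PI * k * n / N));
    return X;
}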

For 2-d images, formulas are similar, except that now the frequency is a 2-d vector representing
directional waves. We can thus extract “vertical” frequencies, “horizontal” frequencies, “diagonal”
frequencies and so on (see Fig. 3.2):

X_{k,\ell} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x_{m,n} \left( \cos\left(2\pi\left(\frac{km}{M} + \frac{\ell n}{N}\right)\right) - i\,\sin\left(2\pi\left(\frac{km}{M} + \frac{\ell n}{N}\right)\right) \right)

and

x_{m,n} = \frac{1}{MN} \sum_{k=0}^{M-1}\sum_{\ell=0}^{N-1} X_{k,\ell} \left( \cos\left(2\pi\left(\frac{km}{M} + \frac{\ell n}{N}\right)\right) + i\,\sin\left(2\pi\left(\frac{km}{M} + \frac{\ell n}{N}\right)\right) \right)

This corresponds to projecting on directional complex exponentials e^{-i2\pi\left(\frac{km}{M} + \frac{\ell n}{N}\right)}, where now the
frequency is the 2-d vector (k, \ell) and (m, n) is a 2-d image coordinate.

This formula seems computationally intensive since each 2-d frequency involves a summation over
the entire image plane. Fortunately, two strategies make it very cheap. First, similarly to the
separable Gaussian convolution, one can see that a 2-d Fourier decomposition can be achieved by
performing 1-d Fourier decompositions along rows and then columns of the image. Second is a
particularly fast algorithm by Cooley and Tukey called the Fast Fourier Transform2. This algorithm
recursively splits the summation in 2 parts (it hence works best for images whose sizes are powers of
2), which results in an algorithm of complexity O(N log(N)) where (here) N is the number of pixels
in the image for 2-d images, or the number of values for 1-d (or any-dimensional) data.

The filtering process hence consists in precomputing the FFT of the Gaussian and of the im-
age, computing their pixel-wise product, and performing an inverse FFT. This technique is ideal for
Gaussian filters with large σ. In fact, additional speedup can be obtained by considering the Discrete
Cosine Transform, i.e., the restriction of the Discrete Fourier Transform to only cosines (so there is
no imaginary part and the cosine transform is real), since the Gaussian function is even3.

Using recursive approximations

The Deriche filter consists in considering that a Gaussian is composed of a sum of 2 half-Gaussians,
each of which can be approximated by a recursive causal or anticausal filter, by approximating these half-
Gaussians with sums of (complex) exponentials. He developed the following approximation to filter in
1-d (and again, this can be performed on rows and then columns) that can be expressed, away from
boundaries, as:
X^+_i = \frac{1}{\sigma\sqrt{2\pi}} \Big[ 1.0051\, x_i
  + \Big( e^{-\frac{1.512}{\sigma}}\big(-2.9031\cos(\tfrac{1.475}{\sigma}) + 1.021\sin(\tfrac{1.475}{\sigma})\big) + 0.8929\, e^{-\frac{1.556}{\sigma}} \Big) x_{i-1}
  + \Big( e^{-\frac{3.068}{\sigma}}\big(-0.8929\cos(\tfrac{1.475}{\sigma}) - 1.021\sin(\tfrac{1.475}{\sigma})\big) + 1.898\, e^{-\frac{3.024}{\sigma}} \Big) x_{i-2} \Big]
  + \Big( 2 e^{-\frac{1.512}{\sigma}}\cos(\tfrac{1.475}{\sigma}) + e^{-\frac{1.556}{\sigma}} \Big) X^+_{i-1}
  - \Big( 2 e^{-\frac{3.068}{\sigma}}\cos(\tfrac{1.475}{\sigma}) + e^{-\frac{3.024}{\sigma}} \Big) X^+_{i-2}
  + e^{-\frac{4.58}{\sigma}} X^+_{i-3}
2
See https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm. A typical C++ library for that is
FFTW
3
See A Survey of Gaussian Convolution Algorithms, https://www.ipol.im/pub/art/2013/87/article.pdf. Note
that FFT assumes periodic boundary conditions while DCT assumes symmetric boundary conditions, which may work
better in most cases.

Image Discrete Fourier Transform (log-abs)

Figure 3.2: Discrete Fourier Transform results on images. The result is shown in the log-domain,
and only the magnitude of the complex values are shown (the phase is harder to interprete). As can
be seen, repeating edges in the images result in lines that are orthogonal to the edges in the Fourier
domain. The Fourier Transform makes it easier to find repeating oriented structures.

for the causal part (i.e., values depend on previously computed values), and
 
X^-_i = \frac{1}{\sigma\sqrt{2\pi}} \Big[ \Big( e^{-\frac{1.512}{\sigma}}\big(-0.8929\cos(\tfrac{1.475}{\sigma}) + 1.021\sin(\tfrac{1.475}{\sigma})\big) + 1.898\, e^{-\frac{1.556}{\sigma}} \Big) x_{i+1}
  + \Big( e^{-\frac{3.068}{\sigma}}\big(-2.9031\cos(\tfrac{1.475}{\sigma}) - 1.021\sin(\tfrac{1.475}{\sigma})\big) + 0.8929\, e^{-\frac{3.024}{\sigma}} \Big) x_{i+2}
  + 1.0051\, e^{-\frac{4.58}{\sigma}}\, x_{i+3} \Big]
  + \Big( 2 e^{-\frac{1.512}{\sigma}}\cos(\tfrac{1.475}{\sigma}) + e^{-\frac{1.556}{\sigma}} \Big) X^-_{i+1}
  - \Big( 2 e^{-\frac{3.068}{\sigma}}\cos(\tfrac{1.475}{\sigma}) + e^{-\frac{3.024}{\sigma}} \Big) X^-_{i+2}
  + e^{-\frac{4.58}{\sigma}} X^-_{i+3}

and (x ∗ h)_i = X^+_i + X^-_i gives the final result4. A result can be seen in Fig. 3.3.

3.1.2 Bilateral filtering

An issue with Gaussian filtering is that it tends to blur edges, so it cannot be used for heavy denoising
(Fig. 3.1). To alleviate this issue, the bilateral filter adds a weighting term that penalize spatial
smoothing in places where edges occur. Bilateral filtering a 1-d signal {fi } can be performed using
the following formula to compute the ith value of the result:
4
See A Survey of Gaussian Convolution Algorithms for longer, more accurate recursions. They have accompanying
code, but beware, they have typos in Algorithm 10 of the paper: for k = K, b^-_K = -a_K b^+_0, and the first sum of the
anticausal filter runs from 1 to K: \sum_{k=1}^{K} b^-_k f_{n+k} - \sum_{k=1}^{K} a_k q^-_{n+k} (i.e., b^-_0 = 0).


Figure 3.3: Filtering a signal with a Gaussian (σ = 10) vs. using Deriche’s approximation (the 3-term
recurrence of Sec. 13). You can’t see both curves ? Sure, that means the approximation is good!

F_i = \frac{\sum_{j=-K}^{K} \exp\left(-\frac{j^2}{2\sigma_x^2}\right) \exp\left(-\frac{(f_i - f_{i+j})^2}{2\sigma_v^2}\right) f_{i+j}}{\sum_{j=-K}^{K} \exp\left(-\frac{j^2}{2\sigma_x^2}\right) \exp\left(-\frac{(f_i - f_{i+j})^2}{2\sigma_v^2}\right)}

Without the second term involving σv, this would exactly be a (truncated) Gaussian filter. How-
ever, the extra term is such that large differences in the signal (i.e., edges) reduce the contribution of
the neighboring pixels. With σv = ∞, this amounts to Gaussian blurring, while σv = 0 results in no
denoising at all. This formula can again be easily generalized to 2-d images or higher dimensional sig-
nals5. Implemented naively, this formula can become quite costly for large images and neighborhoods: like
for Gaussians, it results in a complexity of O(K²MN) with K the width of the window of neighbors,
and MN the number of pixels in the image. However, it cannot be further sped up with the same
tricks, as this filter is nonlinear: it cannot be directly computed via the Fourier Transform, and it is
not separable, so it cannot be performed dimension by dimension.
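A naive implementation of this filter on a grayscale image (stored row-major as a std::vector<double>, with K the half window size) could look like this sketch:

#include <vector>
#include <cmath>
#include <algorithm>

std::vector<double> bilateral(const std::vector<double>& I, int W, int H,
                              int K, double sigma_x, double sigma_v) {
    std::vector<double> F(W * H);
    for (int i = 0; i < H; i++)
        for (int j = 0; j < W; j++) {
            double num = 0., den = 0.;
            for (int di = -K; di <= K; di++)
                for (int dj = -K; dj <= K; dj++) {
                    int i2 = std::min(std::max(i + di, 0), H - 1);  // clamp at the image borders
                    int j2 = std::min(std::max(j + dj, 0), W - 1);
                    double dv = I[i * W + j] - I[i2 * W + j2];
                    double w = std::exp(-(di * di + dj * dj) / (2. * sigma_x * sigma_x))
                             * std::exp(-dv * dv / (2. * sigma_v * sigma_v));
                    num += w * I[i2 * W + j2];
                    den += w;
                }
            F[i * W + j] = num / den;
        }
    return F;
}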

An interesting trick has been found via the Bilateral Grid data structure6 . This considers that
the above filtering can be computed as a 3-D Gaussian filtering in the space × intensity domain with
homogeneous coordinates (for the normalization), and that since this is a low-pass filter, it can be
performed at a much coarser resolution. In fact, a filter of standard deviation σ can be performed on a
grid that is σ times coarser, and this amounts to using a Gaussian function of standard deviation 1 on
this coarser grid (assuming the size of the image can be divided by σ and σ is an integer). Specifically,

5
In fact, it can also be generalized by considering that the second term does not depend on (fi − fj )2 but on (gi − gj )2
with g a different image that serves as a guide. This is called the Cross Bilateral Filter.
6
A Fast Approximation of the Bilateral Filter using a Signal Processing Approach, https://people.csail.mit.edu/
sparis/publi/2006/tr/Paris_06_Fast_Bilateral_Filter_MIT_TR.pdf

a simple version of the algorithm is as follows:

Algorithm 2: Compute a bilateral filtering of image I using a bilateral grid structure, a spatial standard deviation σx, and an intensity standard deviation σv. [x] denotes the rounding of x to the nearest integer.
Input: Grayscale image I of size W × H, standard deviations σx, σv
Output: Filtered image F
Ĩ ← zeros(W/σx, H/σx, 255/σv, 2)  // Downsampled homogeneous data
// Downsample
for i = 1..H do  // For each row
  for j = 1..W do  // For each column
    Ĩ([i/σx], [j/σx], [I(i, j)/σv], 1) += I(i, j)
    Ĩ([i/σx], [j/σx], [I(i, j)/σv], 2) += 1
/* Standard isotropic truncated Gaussian convolution with σ = 1. A 5x5 window can be used. */
F̃ ← GaussianFilter(Ĩ, σ = 1)
/* Upsample to the original resolution. A tri-linear interpolation is recommended; nearest
   filtering somewhat works, so let's do that. */
for i = 1..H do  // For each row
  for j = 1..W do  // For each column
    F(i, j) = F̃([i/σx], [j/σx], [I(i, j)/σv], 1) / F̃([i/σx], [j/σx], [I(i, j)/σv], 2)
return F

3.1.3 Non-local means

A similar idea shared with non-local means is to weigh pixels that are more similar to the current pixel
when spatially averaging. In non-local means filtering, the idea is to perform a weighted average of a
large neighborhood (the original paper7 mentions a 21x21 window), where the weight is computed as
a Gaussian function of a similarity metric. While in bilateral filtering, this similarity metric is just the
difference between pixel values, here the similarity metric is an `2 distance between 7x7 neighborhoods.
The formula becomes (directly in 2D) :

F_{i,j} = \frac{\sum_{k=-10}^{10}\sum_{\ell=-10}^{10} \exp\left(-\frac{\sum_{m=-3}^{3}\sum_{n=-3}^{3}(f_{i+k+m,j+\ell+n} - f_{i+m,j+n})^2}{h}\right) f_{i+k,j+\ell}}{\sum_{k=-10}^{10}\sum_{\ell=-10}^{10} \exp\left(-\frac{\sum_{m=-3}^{3}\sum_{n=-3}^{3}(f_{i+k+m,j+\ell+n} - f_{i+m,j+n})^2}{h}\right)}

Again, this can become quite costly to evaluate (each pixel is seen 21² ∗ 7² = 21609 times).
Accelerations and generalizations have been proposed, for instance by using fast nearest-neighbor
data structures such as PatchMatch8.

In the context of denoising, more recent restoration algorithms make use of deep convolutional
neural networks to learn noise models.

7
A non-local algorithm for image denoising: https://www.iro.umontreal.ca/~mignotte/IFT6150/Articles/
Buades-NonLocal.pdf
8
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing: https://gfx.cs.princeton.
edu/pubs/Barnes_2009_PAR/patchmatch.pdf

3.2 Color Matching

A typical problem for colorists is to get a good color palette in an image. While this can be performed
manually9, we will see popular techniques for transferring the color palette from one image to another.
We will call the model image the image from which we want to extract the color style, and the input image
the image we want to transform.

(Panels: Input image; Model image; Mean/stdev – 80 ms, 50 lines; Sliced – 5 s, 20 lines.)

Figure 3.4: Color matching. Matching the mean and standard deviation does not precisely respect
the color distribution (e.g., see the brighter ground on the second row) but gives the overall atmosphere.
A sliced optimal transport approach precisely respects the color distribution, but this can produce
artifacts such as too large contrasts (e.g., the background wall in both results). I used 100 iterations
for the sliced optimal transport approach.

3.2.1 Simple mean/standard deviation matching

A pioneering work on color transfer is a simple procedure that matches the means and standard deviations
of pixel values in some lαβ color space10. The lαβ color space represents a luminance value l and
chrominances α (yellow-blue) and β (red-green), computed using a linear transform of the log
of the LMS color space. The LMS color space represents the eye's response to light relative to Long,
Medium and Short cones. Converting RGB values (normalized in [0, 1]) to lαβ values can be done
with the following transforms:

    
L 0.3811 0.5783 0.0402 R
M  = 0.1967 0.7244 0.0782 G
S 0.0241 0.1288 0.8444 B

9
and I urge you to get familiar with color spaces. We have seen the RGB color space that directly maps to displays,
but other exists such as Lab (that is perceptually uniform and ideal to compute perceptual distances between colors ;
supposedly, a Euclidean distance of 1 represents a “just noticeable difference” in term of colors), HSV (that is intuitive
and good for colorists), XYZ (that is good for spectral rendering), CMYK (that is good for printing), LMS (that represent
eye’s photoreceptor responses) and many others.
10
Color Transfer between Images: https://users.cs.northwestern.edu/~bgooch/PDFs/ColorTransfer.pdf

    
l 0.5774 0.5774 0.5774 log10 (L)
α = 0.4082 0.4082 −0.8165 log10 (M )
β 0.7071 −0.7071 0 log10 (S)

Once all the pixels of the input and model images have been transformed to this lαβ color space,
means and standard deviations11 are matched for each color channel independently. Specifically,
denoting \mu_\alpha^i and \sigma_\alpha^i the mean and standard deviation of the α color channel of the input image,
called \alpha_i, and \mu_\alpha^m, \sigma_\alpha^m the mean and standard deviation of the α channel of the model image, called
\alpha_m, the α channel of the transformed image reads \alpha_t = \frac{\sigma_\alpha^m}{\sigma_\alpha^i}(\alpha_i - \mu_\alpha^i) + \mu_\alpha^m. And similarly for the l and
β channels. The assumption behind this model is that pixels (in lαβ) follow an isotropic Gaussian
distribution.

Finally, the transformed values are converted back to RGB using:

    
log10 (L) 0.5774 0.4082 0.7071 l
log10 (M ) = 0.5774 0.4082 −0.7071 α
log10 (S) 0.5774 −0.8165 0 β

    
R 4.4679 −3.5873 0.1193 L
G = −1.2186 2.3809 −0.1624 M 
B 0.0497 −0.2439 1.2045 S

Results can be seen in Fig. 3.4.
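The moment matching itself is a few lines per channel; a sketch (assuming the images have already been converted to lαβ and each channel is stored as a std::vector<double>):

#include <vector>
#include <cmath>
#include <algorithm>
#include <utility>

void match_channel(std::vector<double>& input, const std::vector<double>& model) {
    auto moments = [](const std::vector<double>& v) {
        double m = 0., m2 = 0.;
        for (double x : v) { m += x; m2 += x * x; }
        m /= v.size(); m2 /= v.size();
        return std::make_pair(m, std::sqrt(std::max(1E-12, m2 - m * m)));
    };
    auto [mu_i, sigma_i] = moments(input);
    auto [mu_m, sigma_m] = moments(model);
    for (double& x : input)
        x = sigma_m / sigma_i * (x - mu_i) + mu_m;   // match mean and standard deviation
}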

3.2.2 Sliced optimal transport matching

Another approach for transferring colors consists in considering an optimal transport problem12. Here,
we will only consider images with the same number of pixels, and this so-called optimal transport
problem becomes, in this case, a linear assignment problem. The goal here is to find a one-to-one
assignment between pixels of the input image (for instance in the RGB color space) and pixels of
the model image minimizing some cost. We hence try to match two point clouds of pixels that live
in a 3-dimensional (RGB) space. Once the matching is done, we can simply move the pixels of the
input image towards their assigned pixel in the model image, and the color distributions will perfectly
match.

Such a linear assignment problem can be quite costly to compute, and the sliced approach consists
in projecting the initial problem of matching pixels in 3-d onto 1-d lines where optimal transport is
trivial. The overall algorithm consists in, iteratively, first finding a uniformly random direction on the
sphere. For that, we can go back to the Global Illumination Compendium, and use the formula:

r_1, r_2 \sim U(0, 1)
x = 2\cos(2\pi r_1)\sqrt{r_2(1 - r_2)}
y = 2\sin(2\pi r_1)\sqrt{r_2(1 - r_2)}
z = 1 - 2 r_2
q P 2
11
Recall that computing a standard deviation can be done via \sigma_X = \sqrt{\frac{1}{N}\sum_i x_i^2 - \left(\frac{1}{N}\sum_i x_i\right)^2}.
12
Somewhat similar results can be found in N-dimensional probability density function transfer and its application
to color transfer : https://github.com/frcs/colour-transfer/blob/master/publications/pitie05iccv.pdf but the
technique is that of Sliced and Radon Wasserstein Barycenters of Measures: https://hal.archives-ouvertes.fr/
hal-00881872/document which has a similar framework to that of Wasserstein Barycenter and its Application to Texture
Mixing: https://hal.archives-ouvertes.fr/file/index/docid/476064/filename/TexturesECCV10.pdf

Then, we project the input and model point clouds (i.e., pixel RGB values) onto this direction (that
is, we simply compute the dot product between this random direction and the pixel coordinates). Once
projected, the optimal matching that is of interest to us simply consists in matching the first projected
point of the input point cloud to the first projected point of the model point cloud, the second with
the second and so on. To do that, we simply sort the two projected point clouds according to their
computed dot product, while keeping track of the pixel index (for instance, using a std::pair<>: by
default, std::sort will sort according to the first element in the pair). Finally, we advect each point
of the input point cloud along the randomly chosen direction by the projected distance to its matched
point in the model image. We then iterate with newly chosen random directions. To summarize the
algorithm:

Algorithm 3: Sliced optimal transport color transfer algorithm.
Input: Color input image I and model M, both consisting of n pixels.
Output: Color matched image: I is modified in-place
for iter = 1..nbiter do
  v ← random_direction()
  // Project. We store the dot product and pixel index as a pair of values
  for i = 1..n do  // For each pixel
    projI(i) ← (⟨I(i), v⟩, i)
    projM(i) ← (⟨M(i), v⟩, i)
  // Sort according to the dot product
  sort(projI)
  sort(projM)
  // Advect the initial point cloud
  for i = 1..n do  // For each pixel
    I(projI(i)[2]) += (projM(i)[1] − projI(i)[1]) v
return I

Results can be seen in Fig. 3.413 .

3.3 Image Retargeting

Input image Cropped Stretched Seam Carving

Figure 3.5: Image Retargeting. We would like to make the input image square. Cropping can lose
useful information or, as here, the image composition. Stretching, here, distorts the sculpture. The
Seam Carving approach removes the vertical seams that are least useful. It here removed 450 vertical
seams on a 1800x1350 image. The seam carving result was computed in 22 seconds and less than 50
lines of C++ code. Image by Jean-Pol Grandmont, CC-BY 3.0.
13
This approach precisely matches histograms: it is thus sensitive in the image content. If a landscape input image
with 40% sky and 60% grass is recolored from a beach model image of 60% sky and 40% sand, the recolored image will
have 20% unpleasant beach-colored sky. We have published a partial sliced optimal transport framework to solve this
issue: SPOT: Sliced Partial Optimal Transport: https://hal.archives-ouvertes.fr/hal-02111220/document

Another common problem in image and video processing is that of resizing an image/video so that
it matches a certain display size. For instance, adapting a 4:3 movie to a 16:9 screen, or going from a
portrait to a landscape image mode (without rotating). Simple solutions involve cropping (but that can
lose particularly important information) or stretching (but that can significantly distort the image).

We will now see a popular approach to downsizing images while preserving as much content as
possible: the Seam Carving method14. Without loss of generality, I will focus on the case of horizontal
downsizing. As its name says, this approach finds a seam – a vertical 1-pixel-wide path – in the
image that would go unnoticed if removed. Repeating this operation multiple times allows us to shrink
the image while preserving most of the interesting content and reducing deformations.

The first step is to determine what kind of features need to be preserved, i.e., what would not go
unnoticed if removed. A simple heuristic is that we want to preserve edges: any flat uniform surface
can be stretched down without producing many artifacts, but stretching down a tree may pose more
problems. The first step is thus to compute an energy map that detects edges. A simple approach for
that is to consider that the value of the energy map E(x, y) at pixel (x, y) is given by

E(x, y) = abs(I(x + 1, y) − I(x − 1, y)) + abs(I(x, y + 1) − I(x, y − 1))

where I(x, y) is the intensity of the original image. For color images, a simple approach is to take

I(x, y) = R(x, y) + G(x, y) + B(x, y)

where R, G, and B are respectively the red, green and blue channels of the image (Fig. 3.6).

Figure 3.6: Original photo and the associated energy that consists of edges.

The second step is to compute a cumulated energy map, C(x, y), that describes optimal paths.
Such methods come from the field of dynamic programming, but are relatively easy to understand.
This map can be computed with a rather simple formula:

C(x, y) = min(C(x − 1, y − 1), C(x, y − 1), C(x + 1, y − 1)) + E(x, y)

where E is the energy map above. This function tells you that if you currently are at the position
(x, y) in the image, on row y, your best move to go to row y − 1 is to either go toward the pixel
(x − 1, y − 1) or (x, y − 1) or (x + 1, y − 1) (choosing the one that minimizes this cumulative energy
map). And the value at pixel (x, y) is the minimum cost of reaching this pixel via a vertical seam.
This map is called the value function (Fig. 3.7).

The third step is to go to the very last row of the cumulative energy map, and select the pixel
with the lowest cumulative energy. This is the starting point of the seam.

The fourth step is to start from the pixel chosen in step 3, and progressively build the seam.
For that, you will use the insight given in step two: if your current pixel is (x, y), your best move
76 CHAPTER 3. IMAGE PROCESSING

Figure 3.7: From the energy (left) we compute the cumulative energy (here, y = 0 is the top row).

Figure 3.8: From the cumulative energy, we compute a seam that goes through consecutive pixels of
lowest cumulative energy.

to build the seam is to choose the pixel that minimizes the cumulative energy C(x, y) among pixels
(x − 1, y − 1), (x, y − 1), and (x + 1, y − 1). This operation is called backtracking (Fig. 3.8).

The final step is to remove the seam. A simple way to see that is that all pixels on the left of the
seam remain unchanged, while all pixels on the right of the seam are translated by one pixel to the
left. After this step is done, you can finally crop your image by one pixel: you have now successfully reduced the original image width by one pixel. Repeat the operation as necessary to reduce the image width by the desired amount15.
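Putting the steps together, a compact C++ sketch that removes a single vertical seam from a grayscale image could look as follows (flat row-major storage; names and layout are assumptions, not the code used for the timings above). Calling it repeatedly reduces the width by the desired amount.

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch: removes one vertical seam from a W x H grayscale image stored row-major.
void remove_one_seam(std::vector<double>& img, int& W, int H) {
    auto I = [&](int x, int y) {
        x = std::clamp(x, 0, W - 1); y = std::clamp(y, 0, H - 1);
        return img[y * W + x];
    };
    // Step 1: energy map E = |dI/dx| + |dI/dy| (central differences)
    std::vector<double> E(W * H), C(W * H);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            E[y * W + x] = std::abs(I(x + 1, y) - I(x - 1, y)) + std::abs(I(x, y + 1) - I(x, y - 1));
    // Step 2: cumulative energy map, filled from the top row downwards
    for (int x = 0; x < W; x++) C[x] = E[x];
    for (int y = 1; y < H; y++)
        for (int x = 0; x < W; x++) {
            double best = C[(y - 1) * W + x];
            if (x > 0)     best = std::min(best, C[(y - 1) * W + x - 1]);
            if (x < W - 1) best = std::min(best, C[(y - 1) * W + x + 1]);
            C[y * W + x] = best + E[y * W + x];
        }
    // Step 3: the seam starts at the minimum of the last row
    int sx = 0;
    for (int x = 1; x < W; x++)
        if (C[(H - 1) * W + x] < C[(H - 1) * W + sx]) sx = x;
    // Steps 4 and 5: backtrack the seam upwards and remove one pixel per row
    std::vector<double> out((W - 1) * H);
    for (int y = H - 1; y >= 0; y--) {
        for (int x = 0; x < sx; x++)     out[y * (W - 1) + x] = img[y * W + x];
        for (int x = sx + 1; x < W; x++) out[y * (W - 1) + x - 1] = img[y * W + x];
        if (y > 0) { // move to the lowest-cumulative-energy predecessor on the row above
            int bx = sx;
            if (sx > 0     && C[(y - 1) * W + sx - 1] < C[(y - 1) * W + bx]) bx = sx - 1;
            if (sx < W - 1 && C[(y - 1) * W + sx + 1] < C[(y - 1) * W + bx]) bx = sx + 1;
            sx = bx;
        }
    }
    img = out;
    W--;
}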

3.4 Poisson Image Editing

It is quite common as well to integrate (part of) a photo onto another photo, for compositing purposes (Fig. 3.9). This kind of problem arose in early photography, when photographic plates were very sensitive to blue-violet wavelengths, leading to over-exposed skies. In 1852, Hippolyte Bayard proposed to combine two negatives – this technique was first used by William Lake Price in 1855 and popularized by Gustave Le Gray in 1856–1858 (Fig. 3.10). Most photos of skies from the XIXth century were faked. In the modern digital photography era, we are able to copy-paste photographic elements16. However, naively
While matching colors is a good start, it does not solve the matting problem.
14
Seam Carving for Content-Aware Image Resizing: http://www.eng.tau.ac.il/~avidan/papers/imretFinal.pdf
15
It is also possible to enlarge an image by considering multiple seams at once, and duplicating them.
16
For a direct application to sky manipulation, see Sky is Not the Limit: Semantic-Aware Sky Replacement, https:
//sites.google.com/site/yihsuantsai/research/siggraph16-sky

(Panels of Fig. 3.9: Background, Inset, Mask; Copy & Paste vs. Poisson.)

Figure 3.9: Seamless Cloning. We would like to insert the kid image onto the background. Copy-
pasting the image using a crude mask leads to unnatural results: boundaries are highly visible, and
colors do not match. Using Poisson Image Editing for the task of seamless cloning reduces these
artifacts. This took 200ms to compute on a 1008x752 image, and 80 lines of code using a multiscale
strategy, or just 25 lines without multiscale. Statue image by Jean-Pol Grandmont, CC-BY 3.0.

The idea behind Poisson image editing17 for seamless cloning is that the human eye is more sensitive
to color differences than to absolute values (see Adelson's checkerboard, Fig. 3.11). As such, it tries to
preserve gradients from an input image to be pasted onto another image.

3.4.1 A simple approach

Preserving gradients can be expressed as the minimization of the following functional:

$$\min_u J(u) = \min_u \int_\Omega \|\nabla u(x) - \nabla f(x)\|^2 \, dx$$
$$\text{subject to } u(x) = g(x), \quad x \in \partial\Omega$$

where Ω is the inpainted area, u is the solution we are looking for, f is the image to paste and g is the background image.

We will denote h = u − f for simplicity. Minimizing this functional amounts to solving the following PDE, called the Poisson equation18:

$$-\Delta h = 0, \quad x \in \Omega$$
$$h = g - f, \quad x \in \partial\Omega$$

This can be demonstrated by considering a small variation around h19 : h + εv, where v = 0 on
the boundary ∂Ω (so that h + εv still respects the original boundary conditions). Minimizing the
17
Poisson Image Editing: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.451.1843&rep=rep1&
type=pdf
18
We often use minus the Laplacian to keep a symmetric positive definite linear system instead of negative.
19
or by applying Euler-Lagrange formulas!

Figure 3.10: A photograph by Le Gray in 1857 (La Grande Vague) featuring an early combination
of two negatives, one for the sea, one for the sky.

Figure 3.11: In Adelson’s checkerboard, the squares flagged A and B are of the same absolute intensity:
the human eye is more sensitive to intensity differences than to absolute intensities.

functional J amounts to having all variations of J equal to 0 around the minimizer. We thus compute the variation of J:

$$\frac{J(h + \varepsilon v) - J(h)}{\varepsilon} = \frac{1}{\varepsilon}\int_\Omega \|\nabla h + \varepsilon \nabla v\|^2 - \|\nabla h\|^2 \, dx \qquad (3.1)$$
$$= \frac{1}{\varepsilon}\int_\Omega 2\varepsilon \langle \nabla h, \nabla v\rangle + \varepsilon^2 \|\nabla v\|^2 \, dx \qquad (3.2)$$
$$= 2\int_\Omega \langle \nabla h, \nabla v\rangle \, dx + \varepsilon \int_\Omega \|\nabla v\|^2 \, dx \qquad (3.3)$$

We are interested in infinitesimally small variations, to get a notion of derivative of J:

$$\lim_{\varepsilon \to 0} \frac{J(h + \varepsilon v) - J(h)}{\varepsilon} = 2\int_\Omega \langle \nabla h, \nabla v\rangle \, dx$$

By applying Green's identity (i.e., integrating by parts!):

$$\lim_{\varepsilon \to 0} \frac{J(h + \varepsilon v) - J(h)}{\varepsilon} = 2\int_{\partial\Omega} v\, \langle \nabla h, \vec{n}\rangle \, d\Gamma - 2\int_\Omega v\, \Delta h \, dx$$

where $\vec{n}$ is the normal of the boundary ∂Ω. Since we have taken v to be 0 on the boundary, and setting this infinitesimally small variation to 0, we have:

$$\int_\Omega v\, \Delta h \, dx = 0$$

In this context, the fundamental lemma of calculus of variations says that ∆h = 0 (intuitively, if the
integral of the product of a function F with all test functions is zero, then F ought to be zero).
We can discretize this equation by realizing that $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$, and using a centered finite difference discretization of the second derivatives. We will denote $h_{i,j}$ the discretized value of h at pixel (i, j). This yields:

$$-\Delta h(x_i, y_j) \approx 4h_{i,j} - h_{i+1,j} - h_{i-1,j} - h_{i,j+1} - h_{i,j-1}$$

This leads to a linear system, that we can again solve with Jacobi iterations (see Sec. 2.2.4). Taking more time to detail the Jacobi method here, the idea is that a square matrix M can be decomposed into a sum of a diagonal matrix D and two upper and lower triangular matrices (minus the diagonal) E and F such that M = D − E − F. Solving Mx = b corresponds to solving Dx = (E + F)x + b, so $x = D^{-1}(E + F)x + D^{-1}b$. The idea behind Jacobi iterations is to consider the iterations $x_{k+1} = D^{-1}(E + F)x_k + D^{-1}b$, which converge if the largest eigenvalue (in modulus) of $D^{-1}(E + F)$ is smaller than 1 (a sufficient condition is for the matrix to be strictly diagonally dominant).

In our case, this leads to the iterates:

$$h^{n+1}_{i,j} = \frac{1}{4}\left(h^n_{i+1,j} + h^n_{i-1,j} + h^n_{i,j+1} + h^n_{i,j-1}\right), \quad (i, j) \in \Omega$$
$$h^{n+1}_{i,j} = g_{i,j} - f_{i,j}, \quad (i, j) \in \partial\Omega$$

Performing these iterations (in parallel!) should yield a solution h, and u can be recovered using u = h + f.
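For a single color channel, one Jacobi sweep is an embarrassingly parallel loop. The sketch below assumes a flat row-major layout and a binary mask marking Ω; names and layout are assumptions, not the reference lab code.

#include <vector>

// One Jacobi sweep for -Δh = 0 with h = g - f imposed outside Ω (one color channel).
// Images are W x H, row-major; mask[i] != 0 marks pixels strictly inside Ω.
void jacobi_sweep(const std::vector<double>& f, const std::vector<double>& g,
                  const std::vector<unsigned char>& mask,
                  const std::vector<double>& h, std::vector<double>& h_next,
                  int W, int H) {
#pragma omp parallel for
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int i = y * W + x;
            if (!mask[i] || x == 0 || y == 0 || x == W - 1 || y == H - 1)
                h_next[i] = g[i] - f[i];   // boundary condition h = g - f
            else
                h_next[i] = 0.25 * (h[i + 1] + h[i - 1] + h[i + W] + h[i - W]);
        }
    }
}
// After enough sweeps (swapping h and h_next each time), u = h + f inside Ω and u = g outside.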

However, these iterations converge extremely slowly in the context of the Poisson equation. In
fact, so slowly that the difference between two consecutive iterations can reach machine precision, and
the iterations get stuck way before mathematical convergence. To alleviate this issue, we will proceed with a multiscale strategy (see Fig. 3.12)20. We first downsample the images by some power of two (on
this result, I downsampled by a factor 16 in both width and height), iteratively solve the problem at
this resolution, then upsample the result by a factor of 2, then iteratively solve the problem but using
the upsampled result as a starting point, and so on.

In practice, for downsampling the mask, we can use a nearest neighbor downsampling (i.e., querying the pixel value at the nearest integer pixel) as we cannot average binary mask values. For downsampling RGB values, simple averaging is sufficient. For upsampling, linear interpolation would be good – in
my results, I used a nearest neighbor for simplicity.

3.4.2 Possible improvements

The solution heavily relies on boundary values of g −f – the rest is merely a diffusion process that does
not involve the input images. As such, artifacts that can occasionally be observed mostly come from
boundary values. It has been suggested to find an optimal boundary using dynamic programming,
which reduces bleeding artifacts in practice21 .
20
The correct way to do it would be a geometric multigrid approach – however, proceeding as we will do works
reasonably well in practice.
21
Drag-and-Drop Pasting: https://www.cse.cuhk.edu.hk/~leojia/all_project_webpages/ddp/dragdroppasting.
pdf

Figure 3.12: A multiscale strategy for solving the Poisson image editing problem.

While multigrid approaches converge in linear time complexity, they can still remain costly in practice.
It has been suggested to use mean value coordinates to solve a similar approximate problem22 .

Finally, we have considered minimizing a squared ℓ2 norm, resulting in a smooth membrane interpolation. However, many other approaches allow accounting for edges, minimizing other norms, adding anisotropy, etc. Notably:

ℓ1-norm minimization. The goal is here to minimize $\int_\Omega |\nabla u(x) - \nabla f(x)| \, dx$, which penalizes extreme values less. The corresponding PDE becomes:

$$\mathrm{div}\left(\frac{\nabla u(x) - \nabla f(x)}{|\nabla u(x) - \nabla f(x)|}\right) = 0$$

It is of course much more difficult to solve as it is non-linear.

Spatial weighting. It can sometimes be interesting to weigh some parts of the image differently, using a weight map w. We thus want to minimize $\int_\Omega w(x)\|\nabla u(x) - \nabla f(x)\|^2 \, dx$. The corresponding PDE to solve is:

$$\mathrm{div}\left(w(x)(\nabla u(x) - \nabla f(x))\right) = 0$$
General formula. When minimizing a general form of the problem $\int_\Omega F(|\nabla u(x) - \nabla f(x)|) \, dx$ for some functional F, the resulting PDE becomes:

$$\mathrm{div}\left(\frac{F'(|\nabla u(x) - \nabla f(x)|)}{|\nabla u(x) - \nabla f(x)|}\,(\nabla u(x) - \nabla f(x))\right) = 0$$
22
Coordinates for Instant Image Cloning: https://www.cse.huji.ac.il/~danix/mvclone/files/mvc-final-opt.
pdf
Chapter 4

Geometry Processing

This chapter will give basic tools of geometry processing and modeling.

4.1 Representing shapes

(Panels of Fig. 4.1: Triangle mesh, Quad mesh, Point cloud, Subdivision surface; Revolution (NURBS), Extrusion (NURBS), Voxels, Tet mesh.)

Figure 4.1: Geometry representation. Surfaces can be represented by triangle meshes, quad
meshes, subdivision surfaces (here, Catmull-Clark on a quad mesh), or parametric surfaces such as
surfaces of revolution or extrusion (here, using NURBS). Volumes are often represented by tetrahedral
meshes, voxel grids or hexahedral meshes (voxel grids are particular instances of hexahedral meshes).
In-between are point set representations. Other representations include cell complexes, triangle (or
polygon) soups or implicit representations.

There are various ways to represent shapes (Fig. 4.1), the two main categories being implicit
and parametric representations. Parametric representations represent geometries with parametric
equations of the form (x, y, z) = f (u, v). As such, a centered unit sphere, for instance can be described
by the parametric equation:
x = sin θ cos φ (4.1)
y = sin θ sin φ (4.2)
z = cos θ (4.3)
It is however not always the case that complex geometries have nice equations of this form. A common
way to represent parametric shapes is via tensor products of curves, such as B-splines, Bezier curves


or Non-Uniform Rational B-Splines (NURBS). For instance, a Bezier curve is a polynomial curve defined by a set of control points $\{P_i\}_{i=0..n}$ as $P(t) = \sum_{i=0}^n B_i^n(t) P_i$, $t \in [0, 1]$, where $B_i^n(t) = \binom{n}{i} t^i (1-t)^{n-i}$ are Bernstein polynomials – it can easily be constructed via De Casteljau's algorithm. From this definition in 1D, we can define a Bezier surface by instead using an array of control points $\{P_{i,j}\}_{i=0..n, j=0..m}$ as $P(u, v) = \sum_{i=0}^n \sum_{j=0}^m B_i^n(u) B_j^m(v) P_{i,j}$, hence defining a smooth, polynomial, parametric surface. Similarly, subdivision surfaces give recursive subdivision schemes
(e.g., Catmull-Clark, Doo Sabin, Loop...) to smoothly refine coarse triangular or quadrangular meshes
– these coarse meshes can be seen as “control points” as well. But surfaces need not be smooth, can
be defined by piecewise polynomial functions (e.g., splines – the simplest of them being piecewise
linear functions), ... and actually, a triangle mesh is a parametric surface (finding the parameters u
and v is a problem called mesh parameterization). More related to triangle meshes are quadrilateral
meshes (or quad meshes for short), or triangle soups (sets of triangles without any connectivity
stored between them).
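As an illustration, De Casteljau's algorithm evaluates P(t) by repeatedly interpolating consecutive control points until a single point remains; a minimal sketch (with a placeholder 2-D point type and illustrative names) is:

#include <vector>

// Evaluates a Bezier curve at parameter t in [0, 1] with De Casteljau's algorithm.
// Assumes at least one control point.
struct Point { double x, y; };

Point de_casteljau(std::vector<Point> P, double t) {
    for (size_t level = P.size() - 1; level > 0; level--)
        for (size_t i = 0; i < level; i++) {
            P[i].x = (1.0 - t) * P[i].x + t * P[i + 1].x;
            P[i].y = (1.0 - t) * P[i].y + t * P[i + 1].y;
        }
    return P[0];
}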

Also, mostly for rendering, surfaces can be stored as point clouds. For rendering, usually, a small disk or sphere is rendered at the location of each point. This is typically the representation produced by 3d scanners such as LiDAR, that scan an environment using lasers and place a point at the location
where light has been reflected.

The other class of representations is implicit representations, where shapes are defined as solutions of some equations. For the case of a sphere, this amounts to representing a centered unit sphere as the set of points P solutions to:

$$\|P\|^2 - 1 = 0$$

In practice, representing surfaces with implicit functions can be occasionally useful. This is done for shapes like meta-balls (or blobs) defined as the surface solution to $\sum_{i=1}^n \frac{1}{\|P - P_i\|^2} - c = 0$ where
{Pi } is a set of points defining the centers of these blobs, and c is the isosurface parameter. A
generalization of meta-balls are convolution surfaces, that are roughly equivalent to meta-balls but
using curves instead of points to define the center of each blob. It can also be useful to represent
surfaces using distance fields (for instance stored on voxel grids, it can be signed or unsigned), for
example when doing fluid simulation, to facilitate collision detection1 or boolean operations on surfaces
(e.g., computing the intersection of two geometries with implicit surfaces is way easier than using
meshes: the intersection of two implicit surfaces defined by equations f and g is simply max(f, g) and
one can build trees of boolean operators to represent complex objects, a process called Constructive
Solid Geometry).
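As a small illustration of this convention (f(P) ≤ 0 meaning "inside"), boolean operations reduce to pointwise min/max of the implicit functions; the sketch below uses illustrative names, and the difference operator is a standard identity added for completeness:

#include <algorithm>
#include <cmath>

// Implicit modeling sketch: f(P) <= 0 is the inside of the shape.
struct P3 { double x, y, z; };

double sphere(const P3& p, const P3& c, double r) {        // signed distance to a sphere
    double dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - r;
}
double csg_union(double f, double g)        { return std::min(f, g); }
double csg_intersection(double f, double g) { return std::max(f, g); }
double csg_difference(double f, double g)   { return std::max(f, -g); }  // f minus g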

To represent volumes, implicit representations are also easily used (e.g., instead of using the solution
f (P ) = 0, one can use f (P ) < 0 to define a volume – voxel grids can also be used to define which
voxels are inside or outside of the geometry, which is widely used for fluid simulations), as well as
parametric representations. Typically, triangle meshes are extended to 3d to form tetrahedral meshes, while quad meshes are extended to 3d to form hexahedral meshes (a voxel grid is a particular case of a hexahedral mesh). Cell complexes represent volumes (or surfaces, or geometries in any dimension) by cells of any shape partitioning the geometry – this is particularly the case of
Voronoı̈ diagrams that we will manipulate.

Finally, procedural shapes are represented via algorithms that produce them. This is for instance
widely used to generate complex terrains, cities, clouds etc. The final models obtained via these
procedures can be either parametric or implicit.

Figure 4.2: Iterations of the outer loop of the Sutherland-Hodgman algorithm for clipping the polygon
in blue by a quadrilateral. The algorithm iteratively removes half spaces.

4.2 Polygon clipping

An important tool that we will use later relates to the problem of clipping polygons (or polyhedra) by convex polygons (resp. convex polyhedra). This corresponds to restricting a given polygon called
the subject polygon to the inside of a convex polygon called the clip polygon. This is used for many
applications. For instance, it is used for rendering: all polygons in the scene can be clipped to the view
frustum (i.e., strictly restricting the scene to what is within the field of view for efficiency), polygons
can be cut when using acceleration structures such as regular grids or kd-trees, etc. We will use it
later to build Voronoı̈ diagrams (Sec. 4.3.3).

A well-known algorithm for this task is the Sutherland-Hodgman algorithm described next, in 2-D
first. The basic operation in this algorithm is the clipping of the subject polygon by an infinite line
(or rather, a half-space delimited by an infinite line). Then this basic operation is simply repeated for
all edges of the convex clip polygon (Fig. 4.2), thus considering all half spaces delimited by each edge
of the clipping polygon.

To clip a polygon by a line, the algorithm traverses all edges of the subject polygon and progres-
sively builds a new updated polygon. For a given edge E, either it is completely outside of the half-space of interest and this edge is simply ignored, or this edge is completely inside and it is added to the new
polygon, or this edge is partly inside and partly outside of the considered halfspace, and only part of
this edge is added.

1
See how it can be used for character skinning in Implicit Skinning: Real-Time Skin Deformation with Contact
Modeling: http://rodolphe-vaillant.fr/permalinks/implicit_skinning_project.php

The algorithm reads:

Algorithm 4: Clips the subjectPolygon by a convex clipPolygon. The corresponding C++


code has 32 lines, plus the basic operators and classes for Vector.
Input: subjectPolygon, and a convex clipPolygon
Output: outPolygon
1 for clipEdge in clipPolygon do // For each edge of the clip polygon
// Clip the subjectPolygon by a half-space
2 outPolygon = new Polygon();
3 for i ← 0 to subjectPolygon.vertices.size()-1 do // For each vertex of the subject polygon
// Test the subject polygon edge with vertices (i-1, i)
4 Vector curVertex = subjectPolygon.vertices[i];
5 Vector prevVertex =
subjectPolygon.vertices[(i > 0)?(i − 1):(subjectPolygon.vertices.size()-1)];
// Compute inter. between the infinite line supported by clipEdge and edge (i-1, i)
6 Vector intersection = intersect(prevVertex, curVertex, clipEdge);
7 if curVertex inside clipEdge then
8 if prevVertex not inside clipEdge then
// The subject polygon edge crosses the clip edge, and we leave the clipping area
9 outPolygon.vertices.add(intersection);
10 outPolygon.vertices.add(curVertex);
11 else if prevVertex inside clipEdge then
// The subject polygon edge crosses the clip edge, and we enter the clipping area
12 outPolygon.vertices.add(intersection);
13 subjectPolygon = outPolygon;
14 return outPolygon

The point of intersection between the (finite) edge [A, B] and the (infinite) line for which two points u and v are known can be computed similarly to the line-plane intersection in Sec. 2.2.1, that is $P = A + t(B - A)$ with $t = \frac{\langle u - A, N\rangle}{\langle B - A, N\rangle}$, where N is the normal to the line (u, v) and has coordinates $(v_y - u_y, u_x - v_x)$. If t < 0 or t > 1, no intersection exists with this segment (since we made sure both ends of the edge are on opposite sides of the line, this should not happen). Similarly, the test inside tells on which side of the clipEdge the vertex is. It returns true for a point P if $\langle P - u, N\rangle \le 0$ for u a point on the clipEdge (here, make sure N is the outward normal to the clipEdge!).
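These two tests translate almost directly into code; a possible sketch (using a small local 2-D vector type rather than the course's Vector class, with illustrative names) is:

// Sketch of the two helpers used by Algorithm 4, following the formulas above.
// The clip edge is given by two of its points u and v; N must be its outward normal.
struct Vec2 { double x, y; };
static Vec2 sub(Vec2 a, Vec2 b) { return { a.x - b.x, a.y - b.y }; }
static double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
static Vec2 edge_normal(Vec2 u, Vec2 v) { return { v.y - u.y, u.x - v.x }; }  // (v_y - u_y, u_x - v_x)

// Is P on the inner side of the clip edge?  <P - u, N> <= 0
bool inside(Vec2 P, Vec2 u, Vec2 v) {
    return dot(sub(P, u), edge_normal(u, v)) <= 0;
}

// Intersection of the segment [A, B] with the infinite line supporting (u, v):
// P = A + t (B - A), with t = <u - A, N> / <B - A, N>
Vec2 intersect(Vec2 A, Vec2 B, Vec2 u, Vec2 v) {
    Vec2 N = edge_normal(u, v);
    double t = dot(sub(u, A), N) / dot(sub(B, A), N);
    return { A.x + t * (B.x - A.x), A.y + t * (B.y - A.y) };
}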

This algorithm can be easily extended to clip polyhedral domains or triangle meshes (in 3D). One simply needs to apply the Sutherland-Hodgman algorithm to each facet of the domain (note that clipping a
triangle mesh will produce a mesh that does not only contain triangles), by iterating over all facets
of the clipping mesh, over all facets of the subject mesh, and finally over each edge of each facet of
the subject mesh. This will iteratively remove 3d half-spaces delimited by infinite planes, and the
intersection point between each edge of each facet and the cutting plane is found exactly like the
line-plane intersection formula of Sec. 2.2.1. However, after cutting half of the space, the hole needs to
be filled. This can be done by realizing that the new added polygons correspond (only) to consecutive
vertices that have been added while cutting each facet. These new vertices should be traversed in the
correct order so as to add the filled area (Fig. 4.3).

4.3 Voronoı̈ diagrams and Delaunay triangulations

Given a set of samples {Pi }, generally in 2-d or 3-d space, we would like to triangulate them. De-
launay triangulation is one possible triangulation that has interesting properties (Fig. 4.4). Delaunay
triangulations also happen to be the dual of Voronoı̈ diagrams, a decomposition of the domain into

Figure 4.3: Cutting Davy Jones with a plane. Applying Sutherland-Hodgman’s algorithm to each
facet allows for clipping in 3D, but an additional step fills the hole produced (here, the hole does not
form only closed loops).

(Panels of Fig. 4.4: Delaunay triangulation; with in-circle predicate; as dual of a Voronoï diagram; as convex hull in d+1 dimensions.)

Figure 4.4: A Delaunay triangulation of a set of samples (left) is a triangulation that respects the
in-circle property, is the dual of a Voronoı̈ diagram, and the convex hull of these samples lifted with a
parabola. Here, the Voronoı̈ diagram has 2 closed cells, and 6 infinite open cells (truncated for display
purpose).

convex cells called Voronoı̈ cells – i.e., the triangulation produced by connecting adjacent cells by a
triangle edge – where Voronoı̈ diagram vertices are Delaunay triangles circumcenters. It is uniquely
defined for samples that are in general positions, that is, when adjacent triangles do not have cocyclic
vertices (vertices on the same circle, e.g., two triangles forming a rectangle).

Properties that define Delaunay triangulations and Voronoı̈ diagrams are:

• The circumcircle of each Delaunay triangle does not encompass any other vertex. We usually implement this property using an in-circle predicate, that checks whether a given sample is within a given triangle's circumcircle (a small code sketch is given after this list). The center K of the circumcircle of triangle ABC, of radius r, lies on the perpendicular bisectors of all edges. Denoting u = B − A and v = C − A, and M = (A + B)/2 and N = (A + C)/2 the middles of AB and AC, then K, the intersection of both bisectors, is such that $\langle u, K - M\rangle = \langle v, K - N\rangle = 0$, so that $\langle u, K\rangle = \langle u, M\rangle$ and $\langle v, K\rangle = \langle v, N\rangle$. This defines two linear equations:

$$\begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}\begin{pmatrix} K_x \\ K_y \end{pmatrix} = \begin{pmatrix} \langle u, M\rangle \\ \langle v, N\rangle \end{pmatrix}$$

Using Cramer's rule, and denoting $u^\perp = (u_y, -u_x)$ and $v^\perp = (v_y, -v_x)$, this yields the circumcenter

$$K = \frac{\langle u, M\rangle\, v^\perp - \langle v, N\rangle\, u^\perp}{u_x v_y - u_y v_x}$$

The radius is then trivially obtained, and the in-circle predicate merely compares distances to the circumcenter.

Similarly in 3d, the circumsphere of a tetrahedron ABCD can be obtained the same way, by considering the intersection of 3 planes passing through the middles M, N and O of each edge u = B − A, v = C − A and w = D − A, leading to the solution of a 3 × 3 linear system:

$$K = \frac{\langle u, M\rangle (v \times w) - \langle v, N\rangle (u \times w) + \langle w, O\rangle (u \times v)}{\langle u, v \times w\rangle}$$

The consequence of that is that if 4 points A, B, C and D (in 2d, here ordered clockwise) are cocyclic (i.e., they belong to the same circle), the Delaunay triangulation is not unique, since 2 triangulations are equally valid: the first consists of the triangles ABC and ACD, the second consists of the triangles ABD and BCD.

• Each point within the Voronoï cell associated with sample $P_i$ is closer to $P_i$ (usually using the Euclidean norm2) than to any other sample $P_j$:

$$\|P - P_i\|^2 \le \|P - P_j\|^2, \quad \forall j \neq i$$

In this context, the $P_i$ are often called "sites". An intuition is that if your samples represent the locations of bakeries, you would belong to the Voronoï cell of your nearest bakery. As such, each vertex of a Voronoï diagram is the intersection of bisectors, and the Voronoï diagram of a set of points consists of Voronoï cells that are polygonal and convex. This can be implemented with the help of a predicate determining on which side of a bisector a sample resides.

• The Delaunay triangulation of $\{P_i\}$ is the convex hull of $\{(P_i, \|P_i\|^2)\}$ (or of any other isotropic parabolic lifting).
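Here is the small sketch announced in the first property above: the 2-D circumcenter obtained by Cramer's rule, and the corresponding in-circle predicate (illustrative local types and names). Note that such a floating-point predicate can be fragile for nearly cocyclic points.

// Sketch of the 2-D circumcenter and in-circle predicate.
struct Vec2 { double x, y; };

Vec2 circumcenter(Vec2 A, Vec2 B, Vec2 C) {
    Vec2 u = { B.x - A.x, B.y - A.y }, v = { C.x - A.x, C.y - A.y };
    Vec2 M = { 0.5 * (A.x + B.x), 0.5 * (A.y + B.y) };   // middle of AB
    Vec2 N = { 0.5 * (A.x + C.x), 0.5 * (A.y + C.y) };   // middle of AC
    double uM = u.x * M.x + u.y * M.y;                   // <u, M>
    double vN = v.x * N.x + v.y * N.y;                   // <v, N>
    double det = u.x * v.y - u.y * v.x;
    // K = (<u,M> v_perp - <v,N> u_perp) / (u_x v_y - u_y v_x), with v_perp = (v_y, -v_x)
    return { (uM * v.y - vN * u.y) / det, (-uM * v.x + vN * u.x) / det };
}

// In-circle predicate: is D strictly inside the circumcircle of triangle ABC?
bool in_circle(Vec2 A, Vec2 B, Vec2 C, Vec2 D) {
    Vec2 K = circumcenter(A, B, C);
    double r2 = (A.x - K.x) * (A.x - K.x) + (A.y - K.y) * (A.y - K.y);
    double d2 = (D.x - K.x) * (D.x - K.x) + (D.y - K.y) * (D.y - K.y);
    return d2 < r2;
}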

We will review an algorithm for Delaunay triangulation and two for Voronoı̈ diagrams (though
both could be used in the two contexts since it is trivial to go from the primal to the dual structure).
In general, Voronoı̈ diagrams are important in computer graphics, but also as a general data structure
as it is an acceleration structure to find closest points within a dataset.

4.3.1 Bowyer–Watson algorithm

The Bowyer–Watson algorithm is an algorithm to compute a Delaunay triangulation in arbitrary dimension. I will describe it in 2d, but it easily extends to higher dimensions.

In 2d, you would start with a gigantic triangle encompassing the entire point set to triangulate
(for instance, take an equilateral triangle whose base is slightly below the bottom of the bounding
box of all points but of length W + 3H with W and H the width and height of the bounding box –
this should create a triangle that largely encompasses the bounding box). Then we will progressively
add each point to the triangulation. For that, we iterate over each point Pi , and for each Pi we check
if any triangle from the current triangulation has its circumcircle encompassing Pi . We remove all
these triangles from the triangulation, thus creating a hole in the mesh. We then fill this hole by
creating new triangles connecting Pi to all edges forming the border of this hole. When the algorithm
terminates, we simply clean up the triangulation by removing triangles connected to the vertices of
the initial gigantic triangle (Fig. 4.5).
2
See how using Lp norms can help produce hexahedral meshes for example, in Lp Centroidal Voronoi Tessellation
and its applications https://hal.inria.fr/inria-00600251/PDF/LpCVT.pdf


Figure 4.5: Computing the Delaunay triangulation of these 6 samples requires first to add a gigantic
triangle (in green) encompassing all points. Then, assuming the triangulation of 5 of these points
was done previously (left), we want to insert the 6th point (in blue) in the triangulation. We first
determine which triangle it belongs to and then we progressively propagate to adjacent triangles testing
for the in-circle predicate. Here, 3 triangles have their circumcircle encompassing Pi . We remove these
triangles and instead create new triangles between Pi and the boundary of this hole (middle ; the hole
will not contain other points than Pi ). Ultimately, we remove the big green triangle to obtain the final
triangulation. My quick’n dirty monothreaded code is around 140 C++ lines, takes about 1.3 seconds
for 100K vertices, and around 1min30 for 1M vertices (most of the time is spent locating vertices on
the mesh – fast libraries using quad trees would do that 10-100x faster). Adding about 35 lines for a
regular grid acceleration structure (rasterizing triangles to store all triangles that fall inside each cell
of a 100x100 grid) brings timings down to 600ms for 100K triangles and less than 8 seconds for 1M
triangles.

Stated like that, the algorithm sounds quite simple, but this hides small details that make it fast.
In fact, a naive implementation in 2D would make it O(N²) (for each inserted point, you would look for all triangles whose circumcircle encompasses this point), although it is possible to bring that to O(N log(N)). The key is to realize that only the triangles around the newly added $P_i$ will need to
be altered. The solution is to first find the triangle of the existing triangulation in which $P_i$ lies.
And then, to only verify the incircle predicate of neighboring triangles, possibly propagating over a
larger neighborhood while the predicate is true. To allow for efficient propagation and navigation in
the triangulation, one needs to have quick access to the neighbors of a triangle. For that, the triangle
datastructure should now contain the 3 indices of triangles adjacent to each edge in addition to the 3
indices referencing the array of vertices.

To find the triangle in which Pi belongs, a simple solution is to randomly pick any triangle T in
the triangulation3 . We launch a ray from the barycenter Q of T towards Pi and check which of the
3 edges of T it intersects. If no intersection is found, it means Pi belongs to T (otherwise, since the
barycenter is inside T and Pi would be outside, there would be an intersection). If an intersection is
found, we go to the triangle on the other side of the intersected edge and repeat the operation. This efficiently navigates in the mesh triangles (though quad-tree based solutions are generally faster).

Here, "launching a ray" means testing whether the segment $[P_i Q]$ intersects an edge (e.g., [AB]). One solution could be to compute the point of intersection between the infinite lines and check whether the parameters lie within the [0, 1] range. A simpler (and more efficient) option is to consider that $P_i$ and Q should be on opposite sides of the infinite line (AB), and that A and B should be on opposite sides of the infinite line $(P_i Q)$. After having computed the normal $AB^\perp$, checking if $P_i$ and Q are on opposite sides means verifying that $\langle P_i - A, AB^\perp\rangle \langle Q - A, AB^\perp\rangle < 0$, and similarly for the other condition.

Locating the triangle t containing a point P is thus performed with the following snippet (assuming
an appropriate datastructure storing triangles):

3
In my implementation, I use a 100x100 regular grid to help locate a reasonably close starting triangle. I also do not
remove bad triangles but merely invalidate them, to simplify memory management.

int locateTriangle(const Vector& P) {
    int i, t = 0;
    do {
        Vector Q = barycenter(t);
        for (i = 0; i < 3; i++) {
            // segment-segment intersection between [Q, P] and edge (i, i+1) of triangle t?
            if (intersect(Q, P, vertices[triangles[t].vtx[i]], vertices[triangles[t].vtx[(i + 1) % 3]])) {
                t = triangles[t].neighbor[i];   // walk to the adjacent triangle
                break;
            }
        }
    } while (i != 3);   // stop when no edge of t is crossed: P is inside t
    return t;
}

4.3.2 Jump Flooding

The jump flooding algorithm4 is a simple and efficient algorithm to compute Voronoı̈ diagrams on
pixel grids, that is embarrassingly parallel, and that can also be used to propagate any information in
the grid (e.g., to compute distance maps).

The idea of this algorithm is to see the Voronoı̈ diagram as a fire spreading from the input sites.
After some time, the various fire fronts will meet at the bisectors. However, spreading fire iteratively
one pixel at a time would require at most as many iterations as the size of the input image. The idea
of Jump Flooding is to spread first at very large distances (at a distance of W/2 with W the width of
the image), and at each iteration to divide the step size by two, hence performing log2 (W ) iterations
(Fig. 4.6).

So, at the beginning the image is entirely black (i.e., consisting of invalid data) except at the locations
of the sites from which fire will start, where the site index is recorded. The algorithm then scans all
pixels pi,j and checks whether a site index has been stored at pi+k,j+l with k and l in {−s, 0, s}, where
s, the step length, is initially set to W/2. The shortest distance between the current pixel pi,j and
the 9 sites whose indices are stored at pixels pi+k,j+l is computed, and the corresponding closest site
is stored in a second image at p0i,j . The process is repeated after having halved the step length s
and swapped image p and p0 , until s = 1. In general, the approximation error is already extremely
small. But if higher accuracy is needed, a few additional iterations at s = 1 can be performed. The
approach is very fast and its speed is (almost) independent of the number of seeds (Fig. 4.7). Note
that this algorithm usually converges even faster in higher dimension as fire propagates faster. And
since we only read values from image p and write in image p0 , all pixels can be computed in parallel,
which makes this algorithm ideal for GPU. Finally, in addition to propagating the seed indices, it can
propagate other information such as the distance to the nearest seed, which can be used to compute
distance maps very efficiently.

// W, H, seeds and distance are assumed to be accessible (e.g., class members).
void JFA(int step, const int* prevIter, int* curIter) {
#pragma omp parallel for
    for (int i = 0; i < H; i++) {
        for (int j = 0; j < W; j++) {
            Vector2D p(j, i);
            double minDist2 = std::numeric_limits<double>::max();
            int bestSite = -1;
            for (int k = -1; k <= 1; k++) {
                for (int l = -1; l <= 1; l++) {
                    int i2 = i + k * step, j2 = j + l * step;
                    if (i2 < 0 || j2 < 0 || i2 >= H || j2 >= W || prevIter[i2 * W + j2] < 0) continue;
                    double dist2 = (seeds[prevIter[i2 * W + j2]] - p).getSquaredNorm();
                    if (dist2 < minDist2) {
                        minDist2 = dist2;
                        bestSite = prevIter[i2 * W + j2];
                    }
                }
            }
            curIter[i * W + j] = bestSite;
            distance[i * W + j] = sqrt(minDist2);   // optionally stores a distance map
        }
    }
}

void compute(std::vector<int>& curIter) {   // curIter will receive the result
    std::vector<int> prevIter(W * H, -1);
    curIter.resize(W * H);
    for (int i = 0; i < seeds.size(); i++) {   // seeds: an array of sample coordinates
        prevIter[((int)seeds[i][1]) * W + (int)(seeds[i][0])] = i;   // puts seed numbers in the image
    }
    for (int k = W / 2; k >= 1; k /= 2) {      // log2(W) iterations of JFA
        JFA(k, &prevIter[0], &curIter[0]);
        prevIter.swap(curIter);                // this just swaps pointers and is in O(1)
    }
    if (((int)(log2(W))) % 2 == 1)             // makes sure the result is in the correct std::vector
        prevIter.swap(curIter);
}

4
Jump Flooding in GPU with Applications to Voronoï Diagram and Distance Transform https://www.comp.nus.edu.sg/~tants/jfa/i3d06.pdf

Figure 4.6: Iterations of the Jump Flooding Algorithm. Seeds are displayed as red pixels. In this
64x64 pixel grid, 6 iterations are needed (log2(64)).

Fig. 4.7 also shows a result where seeds are placed within all pixels belonging to the cat shape of
Fig. 2.32 (all brown pixels in Fig. 2.32). You may notice that some interior edges form the medial
axis. The medial axis is the set of points P such that placing a sphere of maximal radius centered
at P that remains inside the shape will touch the shape in at least two points. Medial axes are
also important in computer graphics – they allow building skeletons of objects, allow for topological
analysis, or can be used as shape descriptors. It can be shown that in 2-d, if (the boundary of) a
shape is sampled with a set of points, the set of edges of the Voronoi Diagram that are completely
inside the shape form the medial axis5 .

5
Note that this is not the case in 3D – see Stability and Computation of Medial Axes: a State-of-the-Art Report,
https://hal.archives-ouvertes.fr/hal-00468690/document – nor on our 2D pixel grid since the shape boundary has
a certain width

Figure 4.7: The Jump Flooding algorithm computes a Voronoı̈ diagram of these 2048x2048 images in
80ms in parallel (it is almost independent of the number of seeds) and about 40 lines of code. At the
same time, it can compute a distance map or propagate any other information at no additional cost.
Seeds need not be isolated points: the bottom result shows the algorithm run on the cat of Fig. 2.32.
As you can see, the medial axis is a subset of the Voronoı̈ “edges”. Also note that sites can be grouped
if desired, in which case multiple sites would have the same ID.

4.3.3 Voronoı̈ Parallel Linear Enumeration

Clipping half-spaces. We have seen that Sutherland-Hodgman’s polygon clipping algorithm is an


efficient way to clip a polygon by iteratively removing half-spaces defined by infinite lines. The idea
of Voronoı̈ Parallel Linear Enumeration 6 is to treat each Voronoı̈ site independently and compute
their Voronoı̈ cell independently in parallel. To compute the Voronoı̈ cell of site Pi , we start with a
large shape largely enclosing the entire point set (e.g., an extremely large quadrilateral), and we use
Sutherland-Hodgman polygon clipping algorithm to remove all half-spaces defined by the infinite lines
that are bissectors between Pi and all Pj . Specifically, we cut our big quadrilateral removing the space
defined by the set of points X such that kX − Pi k2 > kX − Pj k2 (see Fig. 4.8). The only modification
to Sutherland-Hodgman algorithm is that the point of intersection P between the bissector of Pi Pj
(that passes through the middle M of Pi Pj ) and the current edge [AB] to be clipped is computed
using the fact that hP − M, Pi − Pj i = 0 and P = A + t(B − A), so that hA + t(B − A) − M, Pi − Pj i = 0
hM −A,P −P i
and so t = hB−A,Pii−Pjji , and a point X is inside the clip edge if hX − M, Pj − Pi i < 0. By performing
this operation for all Pj 6= Pi , we obtain the Voronoı̈ cell of Pi , and this operation can be performed
in parallel and independenly for all Pi .

However, proceeding that way would make the algorithm O(N 2 ) since for all Pi , the Voronoı̈ cell
of Pi necessitates cutting half spaces defined by all Pj 7 . This can quickly become prohibitive. To
6
Variational Anisotropic Surface Meshing with Voronoi Parallel Linear Enumeration, https://members.loria.fr/
Bruno.Levy/papers/vorpaline_IMR_2012.pdf
7
The cost of Sutherland-Hodgman also depends on the number of edges of the Voronoı̈ cell – however, the number of
vertices in the entire Voronoı̈ diagram in 2d is 2N − 5 and the number of edges 3N − 6 ; per Voronoı̈ cell, this number
is thus a constant (with an average of 6 edges per Voronoï cell).


Figure 4.8: To compute the Voronoï cell of site $P_i$, Voronoï Parallel Linear Enumeration clips a big quadrilateral by all half-spaces defined by the bisectors of $P_i$ and all other $P_j$.

alleviate this issue, one needs to realize that a site Pj that is very far from Pi has little chance to
contribute to the Voronoı̈ cell of Pi . In fact, if the distance from Pi to Pj is greater than twice the
distance from Pi to the farthest point of its current polygon estimate of the Voronoı̈ cell (built with a
subset of the samples $\{P_k\}$), the bisector of $[P_i P_j]$ will not clip anything of the polygon (Fig. 4.9). It is thus more interesting to start by clipping with bisectors of $[P_i P_j]$ for $P_j$ close to $P_i$ rather than far from it. To achieve that, we query the k-nearest neighboring sites of $P_i$ ordered by increasing distance,
and iteratively clip the current polygon estimate (starting with our gigantic quad) with these sites
until we find one site that is not contributing to the Voronoı̈ cell (in which case the next sites will
not contribute either). If k nearest neighbors are not sufficient (i.e., the k th nearest neighbor still
contributes to the Voronoı̈ cell), then we perform a new 2k-nearest neighbors query.

" Beware that the very first nearest neighbor in the point set is Pi itself ! Make sure to ignore the
very first nearest neighbor since it does not make sense to clip with the bisector of [Pi Pi ]

K-d trees. The k-nearest neighbor queries can be performed in O(k log(N )) using a kd-tree. A
kd-tree is an acceleration structure ideal for nearest neighbor queries which represents a partition of
the space (contrary to our previous BVH in our renderer!). A kd-tree is a binary tree and is built
by recursively splitting points into 2 subsets, ideally of equal sizes, alternately along each dimension
(Fig. 4.10). It is constructed by sorting points (or a subset of them) along one dimension, and using
the point with median value to split the set into two subsets, recursively until leaves have only one
point. Each node stores the index of the median point, as well as the extent of the domain (the root
contains the entire bounding box of the point set).

Searching for the closest point given a query point Q consists in first checking the distance between
Q and the root of the tree, and then taking the branch where Q is (in Fig. 4.10, the left branch) since
it will more likely contain the actual nearest neighbor. The process is repeated while keeping track of
the smallest distance R encountered while going down the tree. However, children are visited if and
only if their domain overlap with the disk D centered at Q and of radius R. Once we cannot go down
anymore, our depth-first traversal will go back up: we will also visit children branches whose domain overlaps D (while continuing to update R, possibly making it smaller during the traversal, hence pruning more branches).

Similarly, searching for the k nearest neighbors can be performed by keeping a max heap of points,


Figure 4.9: Clipping a polygon (in black) representing the current estimate of the Voronoï cell of $P_i$ with a bisector (magenta) of $[P_i P_j]$, for a $P_j$ that is more than twice as far as the furthest vertex of the polygon estimate, will not change the polygon estimate, and can thus be ignored.

visiting each branch that overlaps the disk whose radius is the k-th closest distance found so far.
In practice, efficient libraries exist in C++ such as Flann, NanoFlann (header-only) or ANN. Flann
and ANN support approximate nearest neighbor searches, by using a larger disk to prune branches.
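For illustration, a compact (unoptimized) 2-D kd-tree with a nearest-neighbor query can be sketched as follows; this is only an illustrative structure following the description above, not the API of the libraries just mentioned.

#include <algorithm>
#include <limits>
#include <vector>

// The point array is recursively split at the median, alternating axes; the
// nearest-neighbor query descends into the side containing the query first and
// visits the other side only if it may contain a closer point.
struct Pt { double c[2]; };

struct KdTree {
    std::vector<Pt> pts;   // reordered copy of the input points

    explicit KdTree(std::vector<Pt> p) : pts(std::move(p)) { build(0, pts.size(), 0); }

    void build(size_t lo, size_t hi, int axis) {
        if (hi - lo <= 1) return;
        size_t mid = (lo + hi) / 2;   // median along the current axis
        std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
            [axis](const Pt& a, const Pt& b) { return a.c[axis] < b.c[axis]; });
        build(lo, mid, 1 - axis);
        build(mid + 1, hi, 1 - axis);
    }

    void nearest(const Pt& q, size_t lo, size_t hi, int axis, size_t& best, double& bestD2) const {
        if (lo >= hi) return;
        size_t mid = (lo + hi) / 2;
        double dx = q.c[0] - pts[mid].c[0], dy = q.c[1] - pts[mid].c[1];
        double d2 = dx * dx + dy * dy;
        if (d2 < bestD2) { bestD2 = d2; best = mid; }
        double delta = q.c[axis] - pts[mid].c[axis];   // signed distance to the splitting line
        if (delta < 0) {
            nearest(q, lo, mid, 1 - axis, best, bestD2);
            if (delta * delta < bestD2) nearest(q, mid + 1, hi, 1 - axis, best, bestD2);
        } else {
            nearest(q, mid + 1, hi, 1 - axis, best, bestD2);
            if (delta * delta < bestD2) nearest(q, lo, mid, 1 - axis, best, bestD2);
        }
    }

    size_t nearest(const Pt& q) const {   // assumes a non-empty point set
        size_t best = 0;
        double bestD2 = std::numeric_limits<double>::max();
        nearest(q, 0, pts.size(), 0, best, bestD2);
        return best;
    }
};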


Figure 4.10: Left. Building the kd-tree of points {A, B, C, D, E, F, G} consists in splitting the set of
points alternately in the horizontal and vertical directions. Right. Searching for the closest point to
Q results in first going down the tree to search in which leaf Q is located while recording the closest
distance R from each node to Q. Then, we go back up and down the tree to traverse each branch
whose domain overlaps the disk centered at Q and of radius R (that gets updated).

The Voronoı̈ Parallel Linear Enumeration algorithm is fast8 (see Fig. 4.11), though in large dimen-
sion the number of nearest neighbors contributing to each Voronoı̈ cell increases exponentially.

4.4 More than Voronoı̈

This section describes applications and variants of Voronoı̈ diagrams used in computer graphics.

8
It has also been made faster in some edge cases, notably in the context of optimal transport in the paper Restrict-
ing Voronoi diagrams to meshes using corner validation: https://hal.archives-ouvertes.fr/hal-01626140/file/
corner-validated-rvd.pdf

Figure 4.11: Left. For this 30k point 2D example, the algorithm runs in 19s (in parallel) using the
naive O(N 2 ) algorithm coded in 56 lines of code, but runs in 30ms using the Nanoflann library and 35
additional lines of code. It then runs in 16 seconds to generate the Voronoı̈ diagram of 10M points. In
my implementation, it starts by searching for 20 neighbors, and doubles it each time it is not sufficient.
Right. Extending the code to 3D, I run this 30k point 3D example in 3min30s using the naive O(N 2 )
algorithm (using a quick’n dirty inefficient 3D Sutherland-Hodgman) and brings that down to 300ms
using Nanoflann. The 3d code is about 200 (dirty) lines.

4.4.1 Centroidal Voronoı̈ Tessellation

A Centroidal Voronoı̈ Tessellation is a Voronoı̈ diagram in which sites coincide with cell barycenters.
This produces Voronoı̈ diagrams that have useful properties. Specifically, the dual of a Centroidal
Voronoı̈ Tessellation is a Delaunay triangulation with triangles that are as close as possible to equi-
lateral. Such triangulations are useful for simulation: for instance, the speed of convergence of several
iterative schemes to solve the heat equation depends on the anisotropy of triangles and are fastest
with isoceles triangles. They also distribute the “geometric budget” more equally on the mesh, hence
representing smooth surfaces better given the same number of triangles (note that this is not the case
for non-smooth surfaces, where anisotropic triangles better represent sharp features). Finally, they
also produce uniformly spread point sets that can be used for dithering/stippling or for quasi-Monte
Carlo integration (recall, footnote in Sec. 2.2.1).

Denoting the sites $X = \{x_i\}$, these triangulations minimize the energy:

$$\min_X E(X) = \min_X \sum_i \int_{\mathrm{Vor}(x_i)} \|x - x_i\|^2 \, dx \qquad (4.4)$$

where V or(xi ) is the Voronoı̈ cell of xi . One can indeed see that this energy (called the Lloyd’s energy)
is minimal when the xi ’s are at the barycenter of their Voronoı̈ cell. In fact, Gersho’s conjecture (proved
in 2D by Gruber in 2001) states that after minimization, the resulting cells are hexagonal and each
cell will have the same contribution to the energy.

A simple strategy to compute a Centroidal Voronoı̈ Tessellation is to start with randomly placed
xi's, compute their Voronoï diagram, then move each xi to the centroid of its Voronoï cell, and
iterate. This process is called “Lloyd’s iterations”, and are similar to those used for clustering in the
k-means algorithm.

Figure 4.12: Lloyd's iterations move the sites of a Voronoï diagram of random sites (top) until each site coincides with the centroid of its cell (bottom). The Delaunay triangulation of a Centroidal Voronoï Tessellation is such that triangles are near-equilateral.

Remark: Computing a Delaunay triangulation from a Voronoı̈ diagram is relatively easy: one just
needs to check all vertices of the Voronoı̈ diagram shared by 3 sites, and form a triangle connecting
these 3 sites.

The centroid C of a non self-intersecting polygon in 2D whose N vertices are $\{(x_i, y_i)\}_{i=0..N-1}$ is given by9

$$C_x = \frac{1}{6A}\sum_{i=0}^{N-1}(x_i + x_{i+1})(x_i y_{i+1} - x_{i+1} y_i) \qquad (4.5)$$
$$C_y = \frac{1}{6A}\sum_{i=0}^{N-1}(y_i + y_{i+1})(x_i y_{i+1} - x_{i+1} y_i) \qquad (4.6)$$

where indices are taken modulo N, and where A, the area of the polygon, can be computed by

$$A = \frac{1}{2}\sum_{i=0}^{N-1}(x_i y_{i+1} - x_{i+1} y_i)$$
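These formulas translate directly into a few lines of code; a sketch (with an illustrative 2-D point type, implementing Eqs. (4.5)–(4.6)):

#include <vector>

// Area and centroid of a non self-intersecting 2-D polygon (vertices in order,
// indices taken modulo N).
struct Vec2 { double x, y; };

Vec2 polygon_centroid(const std::vector<Vec2>& P) {
    double A = 0, Cx = 0, Cy = 0;
    size_t N = P.size();
    for (size_t i = 0; i < N; i++) {
        const Vec2& p = P[i];
        const Vec2& q = P[(i + 1) % N];
        double cross = p.x * q.y - q.x * p.y;   // x_i y_{i+1} - x_{i+1} y_i
        A  += 0.5 * cross;
        Cx += (p.x + q.x) * cross;
        Cy += (p.y + q.y) * cross;
    }
    return { Cx / (6.0 * A), Cy / (6.0 * A) };
}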

In 3D, the centroid of a polyhedron is obtained10 by considering that polyhedron facets can (triv-
ially) be subdivided into triangles, such that the entire boundary of the polyhedron only consists of
triangles whose vertices are $\{(a_i, b_i, c_i)\}$. Denoting $n_i = (b_i - a_i) \times (c_i - a_i)$ the non-normalized normal,
9
see https://www.seas.upenn.edu/~sys502/extra_materials/Polygon%20Area%20and%20Centroid.pdf
10
see https://wwwf.imperial.ac.uk/~rn/centroid.pdf

we have

$$C_x = \frac{1}{48V}\sum_{i=0}^{N-1} n_{i,x}\left((a_{i,x} + b_{i,x})^2 + (b_{i,x} + c_{i,x})^2 + (c_{i,x} + a_{i,x})^2\right) \qquad (4.7)$$
$$C_y = \frac{1}{48V}\sum_{i=0}^{N-1} n_{i,y}\left((a_{i,y} + b_{i,y})^2 + (b_{i,y} + c_{i,y})^2 + (c_{i,y} + a_{i,y})^2\right) \qquad (4.8)$$
$$C_z = \frac{1}{48V}\sum_{i=0}^{N-1} n_{i,z}\left((a_{i,z} + b_{i,z})^2 + (b_{i,z} + c_{i,z})^2 + (c_{i,z} + a_{i,z})^2\right) \qquad (4.9)$$

where V, the volume of the polyhedron, is given by

$$V = \frac{1}{6}\sum_{i=0}^{N-1} a_i \cdot n_i$$

A faster strategy is to consider the minimization problem in Eq. 4.4 in order to use quasi-Newton approaches, such as L-BFGS (a type of algorithm that minimizes convex energies without explicitly computing a Hessian, but that tries to approximate a Hessian from the gradient). More generally, using an underlying density ρ that controls how packed samples should be:

$$\min_X E'(X) = \min_X \sum_i \int_{\mathrm{Vor}(x_i)} \rho(x)\|x - x_i\|^2 \, dx \qquad (4.10)$$

and denoting $m_i = \int_{\mathrm{Vor}(x_i)} \rho(x) \, dx$ (e.g., computed via numerical integration), the gradient of the energy E' is:

$$\frac{\partial E'}{\partial x_i} = 2 m_i (x_i - C_i)$$

with $C_i$ the (ρ-weighted) centroid of $\mathrm{Vor}(x_i)$11.

4.4.2 Restricted Voronoı̈ Diagrams

A Restricted Voronoı̈ Diagram (RVD) is, most commonly, the intersection of a 3D Voronoı̈ diagram
with a triangle mesh. This defines cells on the surface of the mesh (though possibly non-connected)
that can be used for remeshing, by computing the triangulation dual to these cells called the Restricted
Delaunay Triangulation (RDT). It has the advantage over geodesic Voronoï diagrams (which can be computed via front propagation on the surface of the mesh12) of being much cheaper to compute.

A way to compute the RVD is to clip the mesh triangles using Sutherland-Hodgman’s algorithm by
the bisector of each pair of sites. Again, a k-nearest neighbor search with the same criterion (maximum
distance between the seed and the RVD cell) can be used to discard sites that will not contribute to
the RVD cell.

Also, the RVD can benefit from Lloyd’s iterations to obtain a Centroidal RVD, that can be used
to remesh meshes with near-equilateral triangles (Fig. 4.13). By slightly modifying the minimized energy, this can be used to produce surfaces of constant mean curvature (of which minimal surfaces are a special case)13.

As a parenthesis: to randomly sample sites on the surface of a triangle mesh, it is not sufficient to
naively randomly select a triangle uniformly, and then generate a random point within this triangle.
11
See On Centroidal Voronoı̈ Tessellation – Energy Smoothness and Fast Computation, https://dl.acm.org/doi/
pdf/10.1145/1559755.1559758
12
See Gabriel Peyré’s Numerical Tours: https://www.numerical-tours.com/matlab/fastmarching_4_mesh/
13
Robust Modeling of Constant Mean Curvature Surfaces, https://haopan.github.io/papers/cmc_surface.pdf

Figure 4.13: The Restricted Voronoï Diagram (RVD) is the intersection between Voronoï cells and a mesh (left), whose dual, the Restricted Delaunay Triangulation (RDT), can be used to remesh shapes. Performing Lloyd's iterations (bottom) results in a more isotropic remeshing – here with 3000 sites (note that this one may not have fully converged yet).

Indeed, this would ignore the triangle areas and would favor places where there are many small
triangles. Instead, you should compute the area of all triangles and store them in some array, and
store the total area of the mesh. For each new site you want to generate, you generate a random
number between 0 and the total mesh area, then scan the area array and progressively accumulate
areas until you have reached your random number. Once you have reached the random number, you stop scanning the array: the current triangle has been chosen with probability proportional to its area, so that a point drawn uniformly inside it is uniformly distributed on the mesh. This strategy is an inverse CDF random sampling (or inverse transform sampling) method and is used in a much broader context than computer graphics.
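A sketch of this inverse-CDF sampling is given below; it replaces the linear scan by a binary search over the cumulative areas and uses the standard square-root trick to draw a uniform point inside the chosen triangle (types and names are illustrative, and the cumulative areas would normally be precomputed once rather than per sample).

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Samples a point uniformly on a triangle mesh given its vertices and a flat
// index buffer (3 indices per triangle).
struct V3 { double x, y, z; };

V3 sample_on_mesh(const std::vector<V3>& vtx, const std::vector<int>& tri,
                  std::default_random_engine& rng) {
    // cumulative triangle areas: the (unnormalized) discrete CDF
    size_t ntri = tri.size() / 3;
    std::vector<double> cum(ntri);
    double total = 0;
    for (size_t t = 0; t < ntri; t++) {
        const V3& A = vtx[tri[3 * t]];
        const V3& B = vtx[tri[3 * t + 1]];
        const V3& C = vtx[tri[3 * t + 2]];
        double ux = B.x - A.x, uy = B.y - A.y, uz = B.z - A.z;
        double vx = C.x - A.x, vy = C.y - A.y, vz = C.z - A.z;
        double cx = uy * vz - uz * vy, cy = uz * vx - ux * vz, cz = ux * vy - uy * vx;
        total += 0.5 * std::sqrt(cx * cx + cy * cy + cz * cz);
        cum[t] = total;
    }
    // pick a triangle with probability proportional to its area (binary search in the CDF)
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double target = uni(rng) * total;
    size_t t = std::lower_bound(cum.begin(), cum.end(), target) - cum.begin();
    // uniform point inside the chosen triangle (square-root trick)
    const V3& A = vtx[tri[3 * t]];
    const V3& B = vtx[tri[3 * t + 1]];
    const V3& C = vtx[tri[3 * t + 2]];
    double r1 = std::sqrt(uni(rng)), r2 = uni(rng);
    double a = 1 - r1, b = r1 * (1 - r2), c = r1 * r2;
    return { a * A.x + b * B.x + c * C.x, a * A.y + b * B.y + c * C.y, a * A.z + b * B.z + c * C.z };
}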

4.4.3 Power diagrams

A power diagram (or Laguerre diagram) is an extension of the Voronoı̈ diagram that allows for control-
ling the size of each cell via a set of weights. In fact, any partition of the space into convex polygonal
cells is the Power Diagram of some sites and some weights (Aurenhammer 1987). It is alternatively
defined as a Voronoı̈ diagram where instead of taking the classical distance kP − Pi k from a point P
to a site Pi , we take the distance between P and a point T tangent to a circle centered at Pi and of
radius ri .


This modified distance is thus $\|P - P_i\|^2 - r_i^2$ (by the Pythagorean theorem), and, denoting $w_i = r_i^2$, the power cell associated with sample $P_i$ is defined by

$$\|P - P_i\|^2 - w_i \le \|P - P_j\|^2 - w_j \quad \forall j \neq i \qquad (4.11)$$

More intuitively, as $w_i$ increases relative to the other weights, the area of the corresponding power cell increases. It is important to note that when all weights are equal, this power diagram coincides with the Voronoï diagram, and that the power diagram is invariant to adding the same constant to all weights (adding the same value to both sides of the inequality does not change the result).

More importantly, it can be easily seen that a power diagram in dimension d can be obtained from a Voronoï diagram in dimension d + 1. Indeed, denoting $P_i' = (P_i, \sqrt{m - w_i})$ the sites in dimension d + 1, where a coordinate $\sqrt{m - w_i}$ has been added, with m any sufficiently large value such that $m - w_i \ge 0$ (for instance, $m = \max_i w_i$), and denoting $P' = (P, 0)$, we see that Eq. 4.11 is equivalent to

$$\|P' - P_i'\| \le \|P' - P_j'\| \quad \forall j \neq i$$

This, in fact, precisely describes the Voronoï diagram of $\{P_i'\}$ restricted to the hyperplane defined by all (P, 0) in $\mathbb{R}^{d+1}$ – this is an RVD and can be obtained by the previous algorithm (Sec. 4.4.2). It is important to note that, contrary to Voronoï cells, power cells can be empty or may not encompass their associated site (Fig. 4.14).

Alternatively, it can be obtained by removing half-spaces as before in $\mathbb{R}^d$. Indeed, it can be easily seen that cutting planes are parallel to bisectors, and pass through the point $M' = M + \frac{w_i - w_j}{2\|P_i - P_j\|^2}(P_j - P_i)$ where $M = (P_i + P_j)/2$ is the middle point (and thus, when $w_i = w_j$, we see that cutting planes pass through M and are thus bisectors). The main change to the algorithm presented in Section 4.3.3 is thus merely a replacement of M by M' in the formulas. The second change is for the criterion to prune non-contributing hyperplanes in the k-nearest neighbor search. For this criterion, it is easier to perform the k-nearest neighbor search in the (d + 1)-dimensional space (recall that the constructed polygon has its last coordinate set to 0, but the sites have their last coordinates set to $\sqrt{m - w_i}$), and keep the previously used criterion (take the distance from the (d + 1)-dimensional site to the furthest point in the polygon, multiply it by 2, and you get the criterion for rejecting further sites that will not contribute to the power cell).

It is however not very practical to control the area of each power cell via its weight wi (or the
radius ri ). We will see next how semi-discrete optimal transport can alleviate this issue.


Figure 4.14: Simple example of a 1-d power diagram seen as a 2-d Voronoı̈ diagram. By adding a
second dimension to each site on the left (here, the second dimension has been chosen on the graph
of a function, but this is not necessary), one can compute a Voronoı̈ diagram (right) and consider its
restriction to the y = 0 line. Here, the resulting diagram consists of the first power cell in blue that
encompasses 3 sites ; the second power cell (in yellow) is empty, the third cell (in green) has a small
area and does not encompass any site, the fourth cell encompasses 2 sites, and the last cell (in gray)
does not encompass any site.

4.4.4 Semi-discrete Optimal Transport

We now go back to optimal transport, which we briefly introduced in Sec. 3.2.2. The optimal transport
problem is the problem of matching a probability distribution with another probability distribution
at minimal cost. Specifically, one considers that a probability distribution is a heap of sand, and
the second probability distribution is a hole in the ground, and one would like to find how to move
the sand into the hole using a little spoon, and minimizing the travelled distance with the spoon (or
alternatively, minimizing the sum of squared distance travelled with the spoon). The paths borrowed
by all these spoons define a transport plan, which tells what amount of sand from location x should
go to location y.

It turns out that when the hole consists of a sum of "Dirac holes", the resulting transport plan can be represented by a Power Diagram. A better analogy in this case is that of bakeries located in a city whose population density is described by a probability density function f, where each bakery located at position $y_i$ can serve $\lambda_i$ pieces of bread (e.g., per day). An additional assumption is that all bread will be sold (e.g., at the end of the day). Given that the cost for someone located at position x to travel to any bakery at position $y_i$ is $\|x - y_i\|^2$, what is the optimal global strategy to sell that bread?
It can be shown14 that the solution to this optimal transport problem is a partition of the space into
convex polyhedra, which can thus be precisely modeled by the power diagram of the {yi } for some set
of weights {wi } that need to be found.

Another way to see this is that instead of controlling the size of each cell via some weights $\{w_i\}$
that are hard to control, we want to directly control the mass of the Voronoï cells so that they are
equal to $\{\lambda_i\}$ (if the underlying “population density” is uniform, this mass exactly corresponds
to the cell area).

In order to find the optimal set of weights $\{w_i\}$, it can be shown15 that one needs to maximize the
following functional:
$$g(W) = \sum_i \int_{\mathrm{Pow}_W(y_i)} \left(\|x - y_i\|^2 - w_i\right) f(x)\,dx + \sum_i \lambda_i w_i$$

14
Minkowski-Type Theorems and Least-Squares Clustering: https://link.springer.com/content/pdf/10.1007/
PL00009187.pdf
15
For an intuitive explanation, see A numerical algorithm for L2 semi-discrete optimal transport in 3D: https://
arxiv.org/pdf/1409.1279.pdf

whose gradient can be expressed as:
$$\frac{\partial g}{\partial w_i}(W) = -\int_{\mathrm{Pow}_W(y_i)} f(x)\,dx + \lambda_i$$

and Hessian16 as:
$$\frac{\partial^2 g}{\partial w_i\,\partial w_j} = \int_{\mathrm{Pow}_W(y_i)\cap \mathrm{Pow}_W(y_j)} \frac{f(x)}{2\|y_j - y_i\|}\,dx \qquad \forall i \neq j$$
$$\frac{\partial^2 g}{\partial w_i^2} = -\sum_{j\neq i} \frac{\partial^2 g}{\partial w_i\,\partial w_j}$$

Using only the gradient, one can easily perform a gradient ascent (g should be maximized!) by
iterating:
$$w_i \leftarrow w_i + \varepsilon\,\frac{\partial g}{\partial w_i}(W)$$
for some step size ε (which can be adjusted via line search). Typically, a gradient descent starts from
an initial point and walks along the direction of steepest descent (i.e., the opposite of the gradient of
the function) to find a local minimum; a gradient ascent walks the other way around. However, this
can be very slow to converge. Alternatively, one can use a quasi-Newton solver such as L-BFGS17, in
which case you will need to provide −g and −∇g to the library. A Newton solver considers a Taylor
series expansion of g in the form $g(x + d) \approx g(x) + \langle\nabla g(x), d\rangle + \frac{1}{2} d^T H d$, where H is the Hessian of
g (its matrix of second derivatives). Each iteration tries to find the next point $x_{n+1}$ that extremizes the
Taylor expansion of g given the current $x_n$. This amounts to finding a good direction $d_n$, i.e., one that
extremizes $E(d) = g(x) + \langle\nabla g(x), d\rangle + \frac{1}{2} d^T H d$ among all d. One thus computes the gradient of
E(d): $\nabla E(d) = \nabla g(x) + H d$, and the best direction $d_n$ cancels this gradient, i.e., $\nabla g(x) + H d_n = 0$.
This amounts to taking $d_n = -H^{-1}\nabla g(x)$. To maximize g, one thus iterates $x_{n+1} = x_n + \varepsilon\, d_n$ with
$d_n = -H^{-1}\nabla g(x_n)$ and some well chosen ε. Unfortunately, in some cases, H is difficult to obtain and
quasi-Newton solvers merely approximate H.
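
To make this concrete, here is a minimal C++ sketch of the gradient ascent on the weights. The function cell_mass is a placeholder (an assumption, not a prescribed API) standing for whatever routine recomputes the power diagram for the current weights and returns the mass of the i-th cell (its area when f = 1):

#include <vector>
#include <functional>

// Sketch: gradient ascent on the Laguerre weights for semi-discrete optimal transport.
// cell_mass(i, weights) is assumed to return the integral of f over Pow_W(y_i)
// (i.e., the area of the i-th power cell when f = 1), for the current weights.
void optimize_weights_gradient_ascent(
        const std::vector<double>& lambdas,   // target masses, one per site
        std::vector<double>& weights,         // in/out, one per site
        const std::function<double(int, const std::vector<double>&)>& cell_mass,
        double epsilon = 0.1, int iterations = 1000) {
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> grad(weights.size());
        for (size_t i = 0; i < weights.size(); ++i)
            grad[i] = -cell_mass((int)i, weights) + lambdas[i];   // ∂g/∂w_i at the current weights
        for (size_t i = 0; i < weights.size(); ++i)
            weights[i] += epsilon * grad[i];                       // ascent step (g is maximized)
    }
}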

In general, faster solutions are obtained with a Newton solver. However, this is not entirely trivial:
the Hessian cannot be computed if there are empty power cells during the execution of the optimizer.
Starting with an initial guess with no empty cell (e.g., a Voronoï diagram, with constant weights), it
was shown that if you halve the step size as soon as you encounter an empty cell, then the Newton steps
will converge.

When the density f is constant, the expression of g makes use of $\int_{\mathrm{Pow}_W(y_i)} \|x - y_i\|^2\,dx$. In 2-d,
denoting $\{(X_i, Y_i)\}$ the vertices of the power cell, this can be computed analytically using18
$$\int_{Polygon(\{(X_i,Y_i)\})} \|P - P_i\|^2\,dP = \frac{1}{12}\sum_{k=1}^{N} (X_{k-1}Y_k - X_k Y_{k-1})\Big(X_{k-1}^2 + X_{k-1}X_k + X_k^2 + Y_{k-1}^2 + Y_{k-1}Y_k + Y_k^2 \qquad (4.12)$$
$$\qquad\qquad - 4\big(P_{i,x}(X_{k-1} + X_k) + P_{i,y}(Y_{k-1} + Y_k)\big) + 6\|P_i\|^2\Big) \qquad (4.13)$$

Alternatively, an equivalent expression19 considers that the power cell has been triangulated into
triangles, and each triangle $T = (c_1, c_2, c_3)$ contributes to the integral as:
$$\int_T \|P - P_i\|^2\,dP = \frac{|T|}{6} \sum_{1\le k \le l \le 3} \langle c_k - P_i, c_l - P_i\rangle$$

See Fig. 4.15 for results.
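
As an illustration, a small C++ helper evaluating the triangle-based formula could look as follows (a sketch; summing this contribution over a fan triangulation of the cell gives the full integral):

#include <cmath>

// Sketch: squared-distance integral over one triangle (c1, c2, c3) of a power cell,
// relative to the site Pi, using the triangle formula above.
struct Vec2 { double x, y; };
static double dot2(const Vec2& a, const Vec2& b) { return a.x * b.x + a.y * b.y; }
static Vec2 sub2(const Vec2& a, const Vec2& b) { return { a.x - b.x, a.y - b.y }; }

double triangle_squared_distance_integral(const Vec2& c1, const Vec2& c2, const Vec2& c3, const Vec2& Pi) {
    const Vec2 c[3] = { sub2(c1, Pi), sub2(c2, Pi), sub2(c3, Pi) };
    // |T| : triangle area via the cross product of two edge vectors.
    double area = 0.5 * std::fabs((c2.x - c1.x) * (c3.y - c1.y) - (c2.y - c1.y) * (c3.x - c1.x));
    double s = 0.0;
    for (int k = 0; k < 3; ++k)
        for (int l = k; l < 3; ++l)
            s += dot2(c[k], c[l]);          // sum over k <= l of <c_k - Pi, c_l - Pi>
    return area / 6.0 * s;
}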


16
Convergence of a Newton algorithm for semi-discrete optimal transport: https://arxiv.org/pdf/1603.05579.pdf
17
You can use the library available at https://github.com/chokkan/liblbfgs – it merely consists of two header files
(lbfgs.h and arithmetic ansi.h) and one .c file (lbfgs.c), so it is quite easy to integrate it into your project; also take a look at the sample file.

Figure 4.15: Power diagram of sites in red, optimized using semi-discrete optimal transport so that
the cell associated to a site at position $y_i$ has an area proportional to $\exp(-\|y_i - C\|/0.02)$ where C
is the center of this unit square (here f = 1). This was optimized using L-BFGS. The analogy with
bakeries would be a square city with a uniform population density and 2000 bakeries. The bakeries
close to the center are able to produce more bread than those far from the center, so they attract
people from a larger area.

Semi-discrete optimal transport can be used similarly to Lloyd's algorithm to produce well-distributed
point sets20 (their Fourier spectrum has a peculiar form, and we call this property of these
point sets “Blue Noise”). As we shall see later, it can also be used for fluid simulation.

4.5 The Marching Cubes algorithm

While it is possible to directly render implicit functions (e.g., via ray marching, which performs raytracing
by computing the intersection via trial and error along the ray), it is sometimes more convenient to
transform them into a triangular mesh. A simple algorithm called marching cubes allows that21.

The input of the algorithm is an implicit function that can be evaluated at any vertex of a voxel
grid, where negative values represent the inside of the volume and positive values the outside. The
algorithm considers a voxel grid, and for each voxel taken individually, tries to determine if part of
the surface traverses this voxel (in which case, triangles should be computed and added to the mesh).

To do that, the algorithm checks the value of the implicit function at each of the 8 vertices of the
current voxel. Of course, if all 8 vertices have negative values or if all 8 vertices have positive
values, the surface does not cross the voxel and the voxel can be ignored. Now, the sign of the implicit
function at each of the 8 vertices produces $2^8 = 256$ combinations (in fact, due to symmetries, only
15 are really different, but we will not use this fact). Given the following numbering of the vertices
and edges



18
Polygon Integrals – Arbitrary Moments of a Polygon: https://people.sc.fsu.edu/~jburkardt/cpp_src/polygon_
integrals/polygon_integrals.html
19
Fitting Polynomial Surfaces to Triangular Meshes with Voronoi Squared Distance Minimization: https://members.
loria.fr/Bruno.Levy/papers/VSDM_IMR_2011.pdf
20
Blue Noise Through Optimal Transport: https://graphics.stanford.edu/~kbreeden/pub/dGBOD12.pdf
21
Marching cubes: A high resolution 3D surface construction algorithm: https://dl.acm.org/doi/abs/10.1145/
37402.37422

[Figure: standard numbering of the 8 vertices (0–7, bottom face then top face) and of the 12 edges (0–11) of a voxel; edges 8–11 are the vertical edges connecting the bottom face to the top face.]

the first step is to convert our voxel into a single scalar value (a hash). This is performed by
considering the scalar
$$k = \sum_{i=0}^{7} S(i)\, 2^i$$
where S(i) = 1 if the implicit function is negative at vertex i, and S(i) = 0 otherwise. This simply is
a binary representation of our voxel.
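
In code, computing this hash for one voxel could look like the following sketch (corner_values is an assumption about how the grid samples are passed in, not a prescribed structure):

// Sketch: build the 8-bit marching cubes index of one voxel.
// corner_values[i] holds the implicit function value at vertex i (numbering of the figure above).
int voxel_index(const double corner_values[8]) {
    int k = 0;
    for (int i = 0; i < 8; ++i)
        if (corner_values[i] < 0.0)   // S(i) = 1 when the implicit function is negative (inside)
            k |= (1 << i);            // add S(i) * 2^i
    return k;                         // 0..255, used to index triTable
}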

The authors of the method have then built a large table that tells, given this scalar value, which
triangles should be added to the triangulation22 . The table looks like

int triTable[256][16] =
{ {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 0,  8,  3, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 0,  1,  9, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 1,  8,  3,  9,  8,  1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 1,  2, 10, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 0,  8,  3,  1,  2, 10, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 9,  2, 10,  0,  2,  9, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1},
  { 2,  8,  3,  2, 10,  8, 10,  9,  8, -1, -1, -1, -1, -1, -1, -1},
  // ......

" The original table published by the authors is incorrect: it (occasionally) produces surfaces that
are not watertight.

The most complex possible voxel contains 5 triangles – this table hence has 3×5 = 15 values at
most for each possible voxel (in fact the table contains 16 columns, but the last one is always −1). The
indices in this table represent triplets of edges that are crossed by the 0 level of the implicit surface.
For instance, the very first voxel has index 0, which, in binary, means that the implicit function is
positive at all of the 8 vertices, so no triangle should be formed (hence the first row contains only
−1, which represents null values). For another example, taking voxel 7 = 00000111 (in binary) means
vertices 0, 1 and 2 have negative values (so they are inside the volume of interest) and all the other vertices are
positive (i.e., are outside). For this voxel, three triangles should be created. The first of these three
triangles has its vertices somewhere on the edges 2, 8 and 3 of the current voxel. The second one
has its vertices on edges 2, 10 and 8. And the third triangle on edges 10, 9 and 8. The figure below
illustrates this example:
22
I’ve put it here: https://pastebin.com/Bbmt1u4Y

[Figure: the voxel of index 7, with vertices 0, 1 and 2 inside the surface and the remaining vertices outside; the three generated triangles have their vertices on edges {2, 8, 3}, {2, 10, 8} and {10, 9, 8}.]

The last detail is to decide where exactly to put the vertices on each crossed edge. A simple solution is to
put the vertex right in the middle of the edge, though this can produce jaggy results (Fig. 4.16). Instead, one can compute a
more accurate location by linearly interpolating the implicit function values at the edge endpoints to determine
approximately where the function crosses the 0 isovalue. Given the implicit function value f(A) at vertex
A of the voxel and f(B) at vertex B of the voxel, the mesh vertex P that should be added along edge
AB is thus $P = A + (B - A)\,\frac{f(A)}{f(A) - f(B)}$. Results can be seen in Fig. 4.17.
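
A corresponding helper might look like this (a sketch; Vec3 is a minimal stand-in rather than a prescribed vector class):

// Sketch: place the mesh vertex on edge AB by linear interpolation of the implicit function.
struct Vec3 { double x, y, z; };

Vec3 interpolate_edge_vertex(const Vec3& A, const Vec3& B, double fA, double fB) {
    double t = fA / (fA - fB);   // where the linear interpolation of f crosses 0 along AB
    return { A.x + t * (B.x - A.x),
             A.y + t * (B.y - A.y),
             A.z + t * (B.z - A.z) };
}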

Figure 4.16: Placing the triangle vertex at the middle of the voxel's edge (left) results in jaggy
reconstructions, while using a linear interpolation (right) produces smoother reconstructions.

4.6 Surface parameterization

Parameterizing a triangle mesh has many applications, and in particular, texture mapping as we have
seen in Sec. 2.2.1. Parameterizing a surface means that we want to uniquely assign each point of the
surface to a point of a reference domain – a 2d texture map in our case of interest. There are many
properties such a map could possess, and among them:

• Isometric: There are two definitions of isometries on manifolds. The first one is a map that preserves
(global) distances on the manifold. Bijective isometric mappings are affine (Mazur–Ulam
th.), so they are of little interest for texture mapping. Up to some distortions, there are
algorithms (e.g., SMACOF23) trying to find mappings that best preserve lengths. They are still
barely used in computer graphics, but are used for drawing graphs. The other definition is a
map that entirely preserves the Riemannian metric tensor (and hence local angles, distances and
areas). These two definitions are equivalent (Myers and Steenrod th.24).
23
Multidimensional Scaling Using Majorization: SMACOF in R: https://www.jstatsoft.org/article/view/
v031i03/v31i03.pdf
24
see Foundations Of Differential Geometry, vol. 1 (p. 169) for a proof: http://tomlr.free.fr/Math%E9matiques/
Math%20Complete/Differential%20Geometry/Foundations%20of%20Differential%20Geometry%20vol%201%20-%
20Kobayashi,%20Nomizu.pdf

Figure 4.17: Reconstructions using 32 × 32, 128 × 128, and 512 × 512 voxel grids. Bottom row: I
used the centroidal Voronoï tessellation of Davy Jones in Fig. 4.13 to place Gaussian kernels at each
site, and reconstructed some level set of this sum of Gaussians (we can see in the cape that Lloyd's
iterations have not yet converged, as the density of samples is obviously lower there). The blob example
in 128 × 128 runs in 18 ms (monothreaded) and the code is about 40 lines (excluding the 300-line
lookup table provided by the authors!).

Figure 4.18: Different parameterizations to a disk of Goethe's life mask (https://www.turbosquid.
com/fr/3d-models/free-obj-mode-scan-lifemask-johann-wolfgang/1035699). From left to right:
input mesh, Tutte's embedding, harmonic mapping, conformal mapping. The conformal
mapping has not fully converged yet, and is neither exactly a disk nor exactly bijective.

• Isoareal: These maps preserve areas. They are also not used alone for texture mapping, as even
a nice equilateral triangle on a mesh could be mapped to a very long but extremely thin triangle
of the same area in the texture map. However, area preservation is a property that can be
enforced on top of conformal maps (see next), notably using semi-discrete optimal transport25.
In terms of the metric, the area element is preserved if the determinant of the first fundamental form
is preserved (see below, Sec. 4.6.2).

25
Area-Preservation Mapping using Optimal Mass Transport: https://www.researchgate.net/publication/
256837514_Area-Preservation_Mapping_using_Optimal_Mass_Transport

• Conformal: A conformal map preserves angles, and these are the maps that are mainly of
interest in texture mapping. In terms of the metric, it means the metrics are proportional. As such,
an isometric map is necessarily conformal. More generally, an isometry is a map that is both
conformal and isoareal.

• Harmonic: This map is such that ∆φ = 0. Interestingly, a conformal map is necessarily
harmonic (but not the converse). More details in Sec. 4.6.2.

Examples of these are shown in Fig. 4.18. Before delving into maps that possess these properties,
we will see a simpler embedding, Tutte's mapping.

4.6.1 Tutte’s mapping

Tutte was interested in laying out graphs on a plane. Translated into the language of triangular meshes
(from the book Polygon Mesh Processing):

“Given a triangulated surface homeomorphic to a disk, if the (u, v) coordinates at the bound-
ary vertices lie on a convex polygon, and if the coordinates of the internal vertices are a convex
combination of their neighbors, then the (u, v) coordinates form a valid parameterization (without
self-intersections).”

This gives a pretty simple algorithm to produce such mappings.

Algorithm 5: Tutte's embedding algorithm.

Input: Triangular mesh M homeomorphic to a disk with vertices $\{v_i\}$.
Output: Tutte's embedding
1  $\partial M \leftarrow \{b_0, b_1, ..., b_{n-1}\}$ // identify ordered boundary vertices ($b_n = b_0$).
2  $s \leftarrow \sum_{i=0}^{n-1} \|b_{i+1} - b_i\|$ // boundary length.
3  $cs \leftarrow 0$
4  $v_i^0 \leftarrow v_i\ \forall i$
   // Layout boundary vertices on a circle.
5  for i = 0..n − 1 do
6      $\theta_i \leftarrow 2\pi\, \frac{cs}{s}$
7      $v_i^0 = (\cos\theta_i, \sin\theta_i)$
8      $cs \leftarrow cs + \|b_{i+1} - b_i\|$
   // Layout internal vertices.
9  for iter = 0..nbiter − 1 do
10     for all interior vertex indices i do
11         $v_i^{n+1} \leftarrow \frac{1}{K}\sum_{j\sim i} v_j^n$ // with K: number of adjacent vertices, i ∼ j if $v_j$ shares an edge with $v_i$
12     for all boundary vertex indices i do
13         $v_i^{n+1} \leftarrow v_i^n$
14 return $\{v_i^{nbiter}\}$

Note that this amounts to performing Jacobi iterations to solve a linear system that we will discuss
in Sec. 4.6.2. See Fig. 4.18 for an example result.
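
A direct C++ transcription of the inner smoothing loop could look as follows (a sketch; the adjacency lists and boundary flags are assumptions about how the mesh is stored, not a fixed API):

// Sketch: Tutte iterations (interior vertices averaged, boundary kept fixed on the circle).
// uv             : current 2-d embedding, one entry per vertex
// neighbors      : adjacency lists of the mesh
// is_boundary[i] : true if vertex i lies on the boundary (already placed on the unit circle)
#include <vector>
#include <array>

void tutte_iterations(std::vector<std::array<double, 2>>& uv,
                      const std::vector<std::vector<int>>& neighbors,
                      const std::vector<bool>& is_boundary,
                      int nbiter) {
    std::vector<std::array<double, 2>> next = uv;
    for (int iter = 0; iter < nbiter; ++iter) {
        for (size_t i = 0; i < uv.size(); ++i) {
            if (is_boundary[i]) { next[i] = uv[i]; continue; }   // boundary vertices stay put
            double sx = 0.0, sy = 0.0;
            for (int j : neighbors[i]) { sx += uv[j][0]; sy += uv[j][1]; }
            next[i] = { sx / neighbors[i].size(), sy / neighbors[i].size() }; // average of neighbors
        }
        std::swap(uv, next);
    }
}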

4.6.2 Conformal mapping

This section describes conformal mapping, but again starts with simpler premises.

Laplace-Beltrami and Cotan Laplacian

We want to define a notion of Laplacian over the surface of a triangular mesh. To do that26, let's
consider that we want to solve the following equation on the mesh:
$$\Delta f = g$$
The weak formulation reads:
$$\int \Delta f(x)\,\phi(x)\,dx = \int g(x)\,\phi(x)\,dx$$
for all φ belonging to some suitable function space. We consider a set of hat test functions $\{\phi_i\}_{i=1..N}$,
that are piecewise linear on the mesh (Fig. 4.19), such that $\phi_i(v_i) = 1$ and $\phi_i(v_j) = 0,\ j\neq i$, and hence
consider $\phi(x) = \sum_j \lambda_j \phi_j(x)$. While it would not make sense to directly take the Laplacian of φ (since
φ is piecewise linear, its second derivatives are 0!), we can nevertheless integrate by parts (Sec. 3.4.1)
to obtain:
$$-\int \nabla f\cdot\nabla\phi\,dx = \int g(x)\,\phi(x)\,dx$$
We also write f (and g) in the $\{\phi_i\}$ basis, such that $f = \sum_i f_i\,\phi_i(x)$, and so, by bilinearity:
$$-\sum_{i,j} f_i \lambda_j \int \nabla\phi_i\cdot\nabla\phi_j\,dx = \sum_{i,j} g_i \lambda_j \int \phi_i(x)\,\phi_j(x)\,dx$$

Figure 4.19: The piecewise linear basis function $\phi_i$ associated to vertex $v_i$.

Figure 4.20: Notations for our triangle ABC: angles α, β, γ at vertices A, B, C, and h the height from A onto BC.
We are now left with evaluating $L_{i,j} = \int \nabla\phi_i\cdot\nabla\phi_j\,dx$ for all (i, j). The matrix $L = [L_{i,j}]$ is the
discretization of the Laplacian operator.

Given a triangle ABC with $\phi_A = 1$ on A and 0 on B and C (Fig. 4.20), it is easy to see that
$\nabla\phi_A = \frac{BC^\perp}{2a}$ with a the area of the triangle, and $BC^\perp$ denotes a rotation of BC by 90 degrees counter-clockwise.
Indeed, the gradient should be constant over the triangle since the function is linear, the
function is constantly 0 on the edge BC so the gradient is necessarily orthogonal to BC, and the
function goes from 1 to 0 over the height h of the triangle. The area a of the triangle is $a = BC\cdot h/2$
(where I denote BC both the length and the vector – which one should be clear from the context), and since
$BC^\perp$ has length BC, $\frac{BC^\perp}{2a}$ has length 1/h as expected.

Then $\int_{ABC}\nabla\phi_A\cdot\nabla\phi_A\,dx = a\,\big\|\frac{BC^\perp}{2a}\big\|^2$ since we are integrating a constant over a triangle. And
using the fact that $\frac{BC^\perp}{2a}$ has length 1/h, we obtain $\int_{ABC}\nabla\phi_A\cdot\nabla\phi_A\,dx = \frac{BC}{2h}$. Given the angles
(A, α), (B, β) and (C, γ), and given the definition of the tangents tan β and tan γ, it is easy to see
26
more details at https://graphics.stanford.edu/courses/cs468-13-spring/assets/lecture12-lu.pdf

that $\int_{ABC}\nabla\phi_A\cdot\nabla\phi_A\,dx = \frac{1}{2}(1/\tan\beta + 1/\tan\gamma) = \frac{1}{2}(\mathrm{cotan}\,\beta + \mathrm{cotan}\,\gamma)$. We can similarly see that
$\int_{ABC}\nabla\phi_A\cdot\nabla\phi_B\,dx = -\frac{1}{2}\,\mathrm{cotan}\,\gamma$.

By summing over all triangles adjacent to vertex i (we denote i ∼ j if vertex i shares a triangle
edge with vertex j), we can now build our Laplacian matrix27 L:
$$L_{i,j} = \begin{cases} -\frac{1}{2}(\mathrm{cotan}\,\alpha_j + \mathrm{cotan}\,\beta_j) & i\sim j\\ \frac{1}{2}\sum_{i\sim j}(\mathrm{cotan}\,\alpha_j + \mathrm{cotan}\,\beta_j) & i = j\\ 0 & \text{otherwise}\end{cases} \qquad (4.14)$$
where $\alpha_j$ and $\beta_j$ are the two angles facing the edge (i, j).

Regarding the right hand side $\sum_{i,j} g_i\lambda_j \int \phi_i(x)\,\phi_j(x)\,dx$, we also need to evaluate $\int \phi_i(x)\,\phi_j(x)\,dx$.
An approximation by a diagonal matrix M, called the lumped mass matrix, can be obtained by taking
as $M_{i,i}$ one third of the summed areas of the triangles incident to i.
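
As an illustration, the per-triangle assembly of this cotan Laplacian could be sketched as follows (assuming a simple indexed triangle structure and dense storage; in practice a sparse matrix would be used):

// Sketch: accumulate cotangent weights into a (dense) Laplacian matrix L.
// vertices: 3-d positions; triangles: index triplets. L must be zero-initialized, size N x N.
#include <vector>
#include <array>
#include <cmath>

using V3 = std::array<double, 3>;
static V3 diff(const V3& a, const V3& b) { return { a[0]-b[0], a[1]-b[1], a[2]-b[2] }; }
static double dot3(const V3& a, const V3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static double cross_norm(const V3& a, const V3& b) {
    double cx = a[1]*b[2]-a[2]*b[1], cy = a[2]*b[0]-a[0]*b[2], cz = a[0]*b[1]-a[1]*b[0];
    return std::sqrt(cx*cx + cy*cy + cz*cz);
}

void assemble_cotan_laplacian(const std::vector<V3>& vertices,
                              const std::vector<std::array<int, 3>>& triangles,
                              std::vector<std::vector<double>>& L) {
    for (const auto& t : triangles) {
        for (int k = 0; k < 3; ++k) {                  // the angle at vertex a faces edge (b, c)
            int a = t[k], b = t[(k + 1) % 3], c = t[(k + 2) % 3];
            V3 u = diff(vertices[b], vertices[a]), v = diff(vertices[c], vertices[a]);
            double cot = dot3(u, v) / cross_norm(u, v); // cotangent of the angle at vertex a
            // The edge (b, c) receives -cot/2 off-diagonal, and +cot/2 on both diagonals.
            L[b][c] -= 0.5 * cot;  L[c][b] -= 0.5 * cot;
            L[b][b] += 0.5 * cot;  L[c][c] += 0.5 * cot;
        }
    }
}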

Cauchy-Riemann equations

The idea behind a conformal mapping is that angles are preserved by the transformation. Without
loss of generality, a straight angle remains straight. Given a point X(u, v) on the surface28, parameterized by its
conformal coordinates (u, v), and letting N(u, v) be the (unit) normal vector at X(u, v), we have
that $N\times\frac{\partial X}{\partial u} = \frac{\partial X}{\partial v}$, which enforces the orthogonality of isolines on the surface.

Writing this equality locally within a single triangle T in its local frame (aligning the triangle's
normal with the z axis), and using complex numbers to represent positions, X = x + iy, we obtain
that $i\,\frac{\partial X}{\partial u} = \frac{\partial X}{\partial v}$ (where we rotated $\frac{\partial X}{\partial u}$ by multiplying by i). Or, written equivalently without complex
numbers:
$$\begin{cases}\dfrac{\partial x}{\partial u} = \dfrac{\partial y}{\partial v}\\[2mm] \dfrac{\partial y}{\partial u} = -\dfrac{\partial x}{\partial v}\end{cases} \qquad (4.15)$$

These equations are called Cauchy-Riemann equations and form the basis of holomorphic/analytic
functions. If a (complex) function is analytic (i.e., obeys Cauchy-Riemann’s equations) with non-zero
(complex) derivative, then it defines a conformal map, and conversely.

Since we will often see conformal mapping results onto disks, it is interesting to note that a
Möbius transform conformally maps a disk to a disk. A Möbius transform is a map of the form
$\phi(z) = \frac{az+b}{\bar b z + \bar a} = e^{i\theta}\frac{z - c}{1 - \bar c z}$ (c is the point that gets mapped to 0 and θ is a rotation angle). Möbius transforms define the
PSL(2, R) group, called the Möbius group. As such, a conformal map to a disk is only unique up to
these 3 degrees of freedom. The Riemann Mapping theorem states that given a simply connected29
domain D (whose boundary has more than one point) of the complex plane and $z_0$ a point inside it,
there exists a unique conformal mapping φ from D to a unit disk such that $\phi(z_0) = 0$ and $\phi'(z_0) > 0$.

More generally, regarding the target space, the Riemann uniformization theorem states that any simply
connected surface can be embedded into the complex plane, the complex projective line or the hyperbolic
plane. Unfortunately, for computer graphics purposes, only genus-1 surfaces (with one hole, e.g., a
torus) or surfaces with boundaries can be mapped to the complex plane, i.e., the space that is most
relevant to store textures. To map more complex surfaces, it is required to cut the mesh either into
27
Note that the sign of the Laplacian may differ in the literature. The true Laplacian is semi-definite negative, but
to simplify notations, since algorithms usually work with semi-definite positive matrices, many people just change the sign
of the Laplacian operator (as in the presented formulas).
28
In fact, we will not deal with higher-dimensional objects. Liouville's theorem states that in dimension (strictly) greater
than 2, only Möbius transformations are conformal – those are extremely rigid and are thus pretty much uninteresting.
29
Simply connected = any loop can be contracted to a point

different charts (that are each topological disks), or to add seams to open the mesh. Adding seams can
also be used to reduce area distortions: the process involves adding cone singularities, i.e., identifying
a highly distorted point, and cutting to the nearest boundary (see Fig. 4.23).

Harmonic mapping

We can further differentiate both Cauchy-Riemann equations w.r.t. u and v to obtain:
$$\frac{\partial^2 x}{\partial u^2} = \frac{\partial^2 y}{\partial v\,\partial u} \qquad (4.16)$$
$$\qquad\quad = -\frac{\partial^2 x}{\partial v^2} \qquad (4.17)$$
$$\frac{\partial^2 y}{\partial v^2} = \frac{\partial^2 x}{\partial u\,\partial v} \qquad (4.18)$$
$$\qquad\quad = -\frac{\partial^2 y}{\partial u^2} \qquad (4.19)$$
$$\Rightarrow\ \Delta X(u, v) = 0 \qquad (4.20)$$

Now, ∆ is to be understood locally on a surface (it is not just the Euclidean Laplacian – unless the
surface is perfectly flat – since we worked on a local frame!). This is the Laplace-Beltrami operator we
have seen in Sec. 4.6.2 (using, for example, cotan weights30 ) which generalizes the Laplacian operator
to surfaces.

This implies that conformal maps are harmonic, i.e., they satisfy ∆X = 0. Also, a harmonic
mapping of a topological disk to a subset of R2 with a convex boundary is bijective (Rado-Kneser-
Choquet th.), i.e., there will not be flipped triangles in the mapping.

Perhaps the simplest algorithm to harmonically map a mesh that is a topological disk to an actual
disk in the complex plane merely consists in solving Poisson equations. Specifically, the boundary of
the mesh is mapped to the unit circle by simply converting edge lengths to angles. Then, the UV
coordinates of the interior points are obtained by solving ∆U = 0 and ∆V = 0 with the boundary
vertices as Dirichlet boundary condition. The resulting linear system is symmetric positive definite,
and efficient solvers can be used (such as the Conjugate Gradient, see Sec. 5.2). The process would
also work for other (convex) boundary configurations, such as squares, as often used for textures. It
can even be used without imposing values on the boundaries, which reduces distortions. See Fig. 4.18
for a result on a disk.

While conformal maps are harmonic, the converse is not true, and harmonic maps may not preserve
angles.

Algorithms for Conformal mapping

Conformal mapping can be obtained using a slightly more involved algorithm31. This algorithm is
based on another property of conformal maps, namely that a homeomorphism φ between Riemannian
manifolds $\phi : (S_1, g_1)\to(S_2, g_2)$ is conformal iff $\phi^* g_2 = e^{2u} g_1$. As a reminder, a Riemannian metric
g (here $g_1$ and $g_2$) gives a notion of dot product on a surface. For instance, $g(X, Y) = \langle X, Y\rangle$ for
the Euclidean space. You do not need to know g for all possible pairs of vectors: since it is bilinear
(and symmetric positive definite), it is sufficient to know it for all pairs of basis vectors, so, on our
2d surfaces, only g(X, X), g(X, Y) and g(Y, Y) are required for two independent vectors X and Y
30
You will notice how the algorithm introduced for Tutte embeddings resemble that of a graph Laplacian (a Laplacian
where each edge of the mesh graph is assigned a weight of 1) equals zero... however it is unfortunately not conformal.
31
The simplest I could describe is based on Conformal Equivalence of Triangle Meshes: https://dl.acm.org/doi/
pdf/10.1145/1399504.1360676

(basis vectors). In fact, the corresponding SPD matrix is called the first fundamental form. Like
the regular Euclidean dot product, the metric gives a notion of angle (using an acos), length (using
$\sqrt{g(u, u)}$) and area (using the square root of the determinant of the first fundamental form). The
notation $\phi^* g_2$ means that we pull back the metric by the function φ, which means that we will evaluate
the metric $g_2$ on the surface $S_1$ using the mapping φ. Specifically, we define this pullback metric as
$(\phi^* g_2)_p(X, Y) = (g_2)_{\phi(p)}(d\phi_p(X), d\phi_p(Y))$ where p is the point where we evaluate the metric. The
intuition behind $\phi^* g_2 = e^{2u} g_1$ is that, while we want to preserve angles, we allow lengths to be
uniformly stretched. This stretching corresponds to a scaling factor in front of the metric, and this
scaling factor may change at each point. To ensure positivity, we use the exponential. And to relate
to areas (which we also do not care about in the context of conformal mapping) rather than lengths,
we use the factor of 2. The factor $e^{2u}$ is called the conformal factor and tells how much areas are
stretched.

In the discrete setting, we will instead consider edge lengths, and write $\tilde l_{i,j} = e^{(u_i+u_j)/2}\, l_{i,j}$, where
$l_{i,j}$ is the edge length between vertices i and j, and $\{u_i\}$ are scaling factors stored per vertex. It can be
shown that in a triangle $t_{i,j,k}$, the angle at vertex i can be expressed via the edge lengths (Fig. 4.21)
as
$$\alpha^i_{jk} = 2\tan^{-1}\sqrt{\frac{(l_{ij} + l_{jk} - l_{ki})(l_{jk} + l_{ki} - l_{ij})}{(l_{ki} + l_{ij} - l_{jk})(l_{jk} + l_{ki} + l_{ij})}}$$
Further, for a planar surface, $\Theta_i = \sum_{t_{ijk}\sim v_i} \alpha^i_{jk} = 2\pi$. The goal of the method is to minimize a function
of the scaling factors $\{u_i\}$ such that the sums of the angles above are as close as possible to $\Theta_i = 2\pi$ for
interior vertices (for boundary vertices, we would either fix $u_i = 0$ or prescribe the sum of the angles
– for example, prescribing the sum of these angles to $\Theta_i = \pi/4$ for 4 of the boundary vertices and
$\Theta_i = \pi/2$ for the other boundary vertices produces a mapping to a square; prescribing the boundary
$\Theta_i$ to $\frac{N-2}{N}\pi$ with N the number of boundary vertices results in a disk).

Figure 4.21: Notations for the angles: in the triangle $(v_i, v_j, v_k)$ with edge lengths $l_{ij}$, $l_{jk}$, $l_{ki}$, the angle at vertex $v_i$ is denoted $\alpha^i_{jk}$.

The authors designed an energy E (see paper), whose gradient is given by
$$\partial_{u_i} E = \frac{1}{2}\Big(\Theta_i - \sum_{t_{ijk}\sim v_i} \alpha^i_{jk}\Big)$$
and whose Hessian applied to some δu is:
$$(\mathrm{Hess}E\,.\,\delta u)_i = \frac{1}{2}(\Delta\delta u)_i = \frac{1}{4}\sum_{e_{i,j}\ni v_i} L_{ij}\,(\delta u_i - \delta u_j)$$
with $L_{ij}$ the coefficients of our cotan Laplacian (Sec. 4.6.2). It is easy to see that when the gradient
is zero, the sum of angles is equal to the desired result $\Theta_i$, leading to a flat mapping.

With the gradient and Hessian, it becomes easy to develop a Newton solver (Sec. 4.4.4) to obtain
optimal $\{u_i\}$ and thus optimal edge lengths and triangle angles. Note that the optimization could
lead to edge lengths that do not respect the triangle inequality (i.e., an edge longer than the sum of
the two others), which messes up the angle formulas. When a triangle does not respect the triangle
inequality, the angles are merely artificially set to 0 (for the angles between the short edges and the
long edge) and π (for the angle between the two short edges).
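
A small helper computing this angle from the (possibly rescaled) edge lengths, with the degenerate case handled as described, could look like this (a sketch):

#include <cmath>

// Sketch: angle at vertex i in a triangle with edge lengths l_ij, l_jk, l_ki,
// returning 0 or pi when the triangle inequality is violated (degenerate triangle).
double triangle_angle_at_i(double lij, double ljk, double lki) {
    const double pi = 3.14159265358979323846;
    if (ljk >= lij + lki) return pi;                       // the edge opposite to vertex i is too long
    if (lij >= ljk + lki || lki >= lij + ljk) return 0.0;  // one of the adjacent edges is too long
    double num = (lij + ljk - lki) * (ljk + lki - lij);
    double den = (lki + lij - ljk) * (ljk + lki + lij);
    return 2.0 * std::atan(std::sqrt(num / den));
}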

To lay out the triangles on a plane, one starts by placing a first triangle, and progressively
propagates to neighboring triangles. Each time a new triangle (sharing an edge with an existing triangle)
is placed, the coordinate of the new vertex on the plane can be obtained either by computing the
intersection of two circles of known radii (the edge lengths) or by rotating the existing edge by the
known angle and scaling it according to the known length.

See Fig. 4.18 for a result on a disk, Fig. 4.22 for a result on a square and a result that instead
imposes edge lengths on the boundary, and Fig. 4.23 for the effect of introducing seams.

Figure 4.22: From left to right: input mesh, mapping to a square, mapping with no boundary angular constraint.
We can obtain a square by imposing 4 boundary angles to sum to π/4 and the rest to
π/2, or we can reduce distortion by not imposing angles on the boundary but instead edge lengths.

Figure 4.23: We can reduce distortions by introducing seams, and in particular to make
the mesh homeomorphic to a disk. Left, the input meshes are shown, without and
with seams that cut the mesh along highlighted edges. Right, the corresponding conformal
parameterizations. The 3d model can be found here: https://free3d.com/3d-model/
mask3d-facemask-wall-maskfor-decorative-or-face-character-973366.html (though I triangulated
it, and there is a tiny hole in the mouth that I stitched).

Aside from texturing, conformal maps have applications in remeshing (one can easily remesh in
the 2d UV map domain, for instance using 2-d Centroidal Voronoï Tessellations), shape interpolation
(which becomes easier once both meshes share the same 2d domain) and in fluids (an irrotational
+ incompressible flow is harmonic, and such a fluid simulation can thus be conformally mapped to
another domain).
Chapter 5

Fluid simulation

This chapter is dedicated to a few approaches used in computer graphics for solving the incompressible
Euler equations, a simplification of the Navier-Stokes equations. Note that other approaches exist, for
example solving for vorticity1. While fluids can refer to liquids, gases or smoke, we will be mostly
interested in liquids.

5.1 Principles

5.1.1 Helmholtz-Hodge decomposition

As a preliminary to fluid simulation, we will have a look at an important theorem in vector calculus,
the Helmholtz-Hodge decomposition (Fig. 5.1).

[Figure: four example vector fields – two curl-free fields with non-zero divergence ($u_d = \pm\nabla\phi$), one divergence-free field with non-zero curl ($u_c = \mathrm{curl}\,\Psi$), and one harmonic field ($u_h = \nabla\chi$, $\Delta u_h = 0$).]
Figure 5.1: A vector field can be expressed as a sum of vector fields that are divergence-free, curl-free
and harmonic.

This theorem can be stated in various ways, but it essentially says that any vector field can be
decomposed into a sum of divergence-free, curl-free and harmonic vector fields. Specifically, one can
write a vector field u:
$$u = u_d + u_c + u_h$$
1
For instance to simulate smoke, Simulation of Smoke based on Vortex Filament Primitives: http://www-evasion.
imag.fr/Publications/2005/AN05/paper0132.pdf , or Lagrangian Vortex Sheets for Animating Fluids: https://dl.
acm.org/doi/pdf/10.1145/2185520.2185608


with $u_d$ a curl-free vector field (curl $u_d$ = 0) whose divergence is in general non-zero (div $u_d \neq 0$); $u_c$ a divergence-free vector field
(div $u_c$ = 0) with curl $u_c \neq 0$; and $u_h$ a harmonic vector field, which is both curl-free and divergence-free
(curl $u_h$ = div $u_h$ = 0) and satisfies $\Delta u_h = 0$.

The other way to see this decomposition is to realize that a curl-free vector field and a harmonic
vector field are both the gradient of some potential function, and that a divergence-free vector field
should be the curl of another vector field. In short:

u = ∇φ + curl Ψ

Intuitions about the shape of these vector fields are shown in Fig. 5.1. The divergence of a velocity
field indicates whether there is a sink or a source pulling or pushing matter around. There typically
isn't any harmonic field when dealing with a subset of R3 (there could be if you consider boundary
conditions that allow mass to freely flow in and out, for instance modeling the flow inside an infinite
tube by studying a section of it; there could also be harmonic fields if you deal with flows on the
surface of a torus or higher genus surfaces).

5.1.2 Navier-Stokes and Incompressible Euler

The incompressible Navier-Stokes equations govern the motion of fluids and read2:
$$\frac{\partial u}{\partial t} + u\cdot\nabla u + \frac{1}{\rho}\nabla p = g + \nu\Delta u \qquad (5.1)$$
$$\mathrm{div}\, u = 0 \qquad (5.2)$$

There is a quite intuitive explanation of this equation in terms of Newton's second law: $\sum_i F_i = ma$,
the sum of the forces applied (locally) to the fluid is equal to its mass (or locally, its density, ρ) times
the acceleration. First, let's assume that you are looking at a fluid particle whose position is described
as x(t), but consider a static point x of the domain. The velocity of the particle located at x and at
time t is thus $u(x, t) = \frac{\partial x(t)}{\partial t}$. Now, its acceleration is the derivative of the velocity with respect to
time. By applying the chain rule, $\frac{Du(x(t), t)}{Dt} = \frac{\partial x}{\partial t}\cdot\frac{\partial u}{\partial x} + \frac{\partial u}{\partial t}$. Noting that $u = \frac{\partial x}{\partial t}$ and $\nabla u = \frac{\partial u}{\partial x}$, it reads
$\frac{Du}{Dt} = \frac{\partial u}{\partial t} + u\cdot\nabla u$. This is simply the expression of the acceleration of the fluid as seen from a fixed
domain (when following particles in time using the acceleration $\frac{Du}{Dt}$ we call the approach Lagrangian,
while we call approaches that see how velocities evolve on a static grid Eulerian).

So, with $\rho(\frac{\partial u}{\partial t} + u\cdot\nabla u)$ the mass times the acceleration, this should be equal to the sum of external
forces, that is, the force of gravity ρg (the mass, expressed locally, times the Earth's gravitational acceleration
g) and other forces. Among other forces is the pressure. If an object underwater is subject to pressure,
but the pressure is the same everywhere, it will not make the object move – in fact, only differences
in pressure matter: hence the term ∇p. In fact, the pressure p is often seen as a Lagrange multiplier
that makes the fluid incompressible – it should take any value that makes the fluid incompressible (one can
deduce p from u by taking the divergence on both sides and imposing incompressibility). And finally,
viscosity. The intuition is that the Laplacian ∆ measures how much a function at point x differs from
its value within a neighborhood (see the second derivative as a measure of curvature of the graph of a
function). Highly viscous fluids will tend to have more homogeneous velocity fields. The coefficient in
front of ∆u, which we will call η, is the dynamic viscosity, while $\nu = \eta/\rho$ is the kinematic viscosity. Divide
everything by ρ, and you obtain the Momentum Equation of the Navier-Stokes equations above.

As for div u = 0, this simply states the incompressibility (Fig. 5.1).


2
See the excellent book by Robert Bridson, Fluid Simulation for Computer Graphics. Robert’s course notes are
available at https://www.cs.ubc.ca/~rbridson/fluidsimulation/fluids_notes.pdf

Considering now an inviscid fluid (one for which the viscosity is zero), one obtains the Incompressible
Euler equations:
$$\frac{\partial u}{\partial t} + u\cdot\nabla u + \frac{1}{\rho}\nabla p = g \qquad (5.3)$$
$$\mathrm{div}\, u = 0 \qquad (5.4)$$

– the one we will be studying. Note that often, numerical schemes tend to introduce undesirable
viscosity (called numerical viscosity), so it will not matter much at the moment from a computer
graphics perspective.

Regarding boundary conditions, one has p = 0 on free surfaces (interfaces between the fluid and
the air – ignoring surface tension), and the velocity at fluid-object (or wall) interfaces is that of the
object (or 0 for walls... unless the walls move, you know!).

5.1.3 Chorin’s projection

Chorin proposed a splitting approach for Navier-Stokes in 1967. The idea is that given a PDE of
the form:
$$\frac{\partial u}{\partial t} = f(u) + g(u)$$
one could build a first order finite difference discretization in time and explicit Euler integration to
obtain
$$u^{n+1} - u^n = dt\,(f(u^n) + g(u^n))$$
This can be split into two substeps:
$$u^* - u^n = dt\, f(u^n)$$
$$u^{n+1} - u^* = dt\, g(u^*)$$
So the idea of splitting is to decouple the different terms in the Navier-Stokes (or incompressible
Euler) equation, and solve them separately (and not necessarily via explicit Euler schemes). The
incompressibility constraint is just one step in this splitting.

More precisely, many fluid solvers compute the next time step of the velocity field:

Algorithm 6: Classical fluid solver time stepping using splitting.

Input: Current velocity field $u^n$
Output: Next velocity field $u^{n+1}$
1  $u^*$ = advect($u^n$) by solving $\frac{Du^n}{Dt} = 0$
2  $u^{**}$ = addGravity($u^*$) = $u^* + dt\, g$
3  $u^{***}$ = addViscosity($u^{**}$) = $u^{**} + dt\,\nu\,\Delta u^{**}$
4  $u^{n+1}$ = project($u^{***}$) // Make the fluid incompressible
5  return $u^{n+1}$
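
The corresponding solver skeleton could be sketched in C++ as follows (a sketch only: Grid and the four sub-step routines are placeholders for whatever structure you implement; the individual steps are detailed in the next sections):

// Sketch of one splitting time step (Algorithm 6). Grid stands for your velocity-grid class;
// advect, add_gravity, add_viscosity and project are the routines described in this chapter.
template <typename Grid>
Grid time_step(const Grid& u_n, double dt, double nu) {
    Grid u1 = advect(u_n, dt);             // semi-Lagrangian advection
    Grid u2 = add_gravity(u1, dt);         // u2 = u1 + dt * g
    Grid u3 = add_viscosity(u2, dt, nu);   // optional when solving inviscid (Euler) flow
    return project(u3, dt);                // pressure solve enforcing div u = 0
}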

5.2 Marker-and-Cell Method

The historical (1965) and most classical approach to solving Navier-Stokes is via the Marker-and-Cell
(MAC) method and its variants. This approach is a semi-Lagrangian method: it tracks particles
for the advection (the advect step of the algorithm), since solving $\frac{Du}{Dt} = 0$ using particles is much
easier than dealing with the non-linear PDE $\frac{\partial u}{\partial t} + u\cdot\nabla u$ on a grid, but it also makes use of a grid to
deal with the other terms and the incompressibility step.

Figure 5.2: The MAC grid is a staggered grid where velocities are stored on the edges of the pressure
grid (pressures $p_{i,j}$ at cell centers, velocity components $u_{i,j\pm 1/2}$ and $u_{i\pm 1/2,j}$ at the cell faces).

The grid they used is a staggered grid: it is simply a grid structure which stores pressure and
velocities on different, offsetted grids (Fig. 5.2), to gain an order of approximation for free.

For instance, with the staggered grid, one gets $\frac{\partial p}{\partial x}$ at the grid location of $u_{i,j+1/2}$ (assuming j represents the x coordinate)
using a second order centered finite difference by computing
$\left(\frac{\partial p}{\partial x}\right)_{i,j+1/2} \approx \frac{p_{i,j+1} - p_{i,j}}{dx}$. And one would obtain it at the location (i, j) by averaging its values computed
at (i, j − 1/2) and (i, j + 1/2).

The second important thing is to advect particles. Solving $\frac{Dq}{Dt} = 0$ for some quantity q (q could be
the velocity u, but also the color, or any other quantity) amounts to transporting that quantity along
the flow, unchanged. That is, the variation (i.e., derivative) of that quantity is 0 when transporting it
on a particle. A robust way to solve $\frac{Du}{Dt} = 0$ at a grid position (i, j) is to check what the value of u was
by tracking back a particle that moved backward in time by the velocity $u_{i,j}$, and interpolating when
appropriate. So, assuming the grid position (i, j) is at spatial coordinate $(j/N_x, i/N_y)$, take the velocity
$u^n_{i,j}$ and set $u^*_{i,j} = \mathrm{interp}(u^n, (j/N_x, i/N_y) - dt\, u^n_{i,j})$. Higher accuracy can be obtained via a Runge-Kutta
method instead of forward Euler. Also note that while backtracking the particle, you may arrive
at a grid point where no velocity was computed (e.g., due to numerical errors, $(j/N_x, i/N_y) - dt\, u^n_{i,j}$
is outside of the fluid, in the air) – you may need to extrapolate your velocity field by a couple of
voxels first.
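
A minimal sketch of this backtracking for a scalar quantity is given below. For simplicity the sketch stores all quantities at cell centers rather than on the staggered grid, and uses a clamped bilinear interpolation; the Grid structure is an assumption, not a prescribed class:

#include <vector>
#include <algorithm>

// Sketch: a minimal cell-centered grid and semi-Lagrangian advection of a scalar quantity q
// by a velocity field (u, v).
struct SGrid {
    int nx, ny;
    std::vector<double> data;                      // row-major, ny rows of nx values
    SGrid(int nx_, int ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0) {}
    double& at(int i, int j) { return data[i * nx + j]; }
    double at(int i, int j) const { return data[i * nx + j]; }
};

double bilinear(const SGrid& g, double x, double y, double dx) {
    double fx = x / dx, fy = y / dx;
    int j = std::max(0, std::min(g.nx - 2, (int)fx));
    int i = std::max(0, std::min(g.ny - 2, (int)fy));
    double tx = std::max(0.0, std::min(1.0, fx - j)), ty = std::max(0.0, std::min(1.0, fy - i));
    return (1 - ty) * ((1 - tx) * g.at(i, j) + tx * g.at(i, j + 1))
         +      ty  * ((1 - tx) * g.at(i + 1, j) + tx * g.at(i + 1, j + 1));
}

SGrid advect_scalar(const SGrid& q, const SGrid& u, const SGrid& v, double dt, double dx) {
    SGrid result = q;
    for (int i = 0; i < q.ny; ++i)
        for (int j = 0; j < q.nx; ++j) {
            double xb = j * dx - dt * u.at(i, j);   // backtrack the particle that lands on (i, j)
            double yb = i * dx - dt * v.at(i, j);
            result.at(i, j) = bilinear(q, xb, yb, dx);
        }
    return result;
}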

The third important trick is the incompressibility step, called project for a reason. Recall the
Helmholtz-Hodge decomposition. Imposing div u = 0 corresponds to projecting the current velocity
field estimate onto one that is divergence-free, and the degree of freedom we have is the pressure
field. I.e., one solves for a pressure field p such that the resulting velocity field $u^{n+1}$ is incompressible.
To do that, we realize that the fourth step of the algorithm can be written $u^{n+1} = u^{***} - \frac{dt}{\rho}\nabla p$, and
one wants to find p such that div $u^{n+1} = 0$. So, taking the divergence (and using the fact that the
divergence of the gradient is the Laplacian), one obtains div $u^{***} - \frac{dt}{\rho}\Delta p = 0$. This simply amounts
to solving a Poisson equation of the form ∆p = F (with the appropriate boundary conditions given
previously)!

We have already seen a few approaches to solve linear systems. In practice, I use a Conjugate

Gradient algorithm which I briefly summarize for completeness but won’t explain any further3 :

Algorithm 7: Conjugate Gradient method to solve a linear system.

Input: A, b
Output: The solution x of Ax = b
1  $p_0 = r_0 = b - Ax_0$
2  for k ← 0 to K − 1 do // iterate
3      $\alpha_k = \frac{\langle r_k, r_k\rangle}{\langle p_k, A p_k\rangle}$
4      $x_{k+1} = x_k + \alpha_k p_k$
5      $r_{k+1} = r_k - \alpha_k A p_k$
6      $\beta_k = \frac{\langle r_{k+1}, r_{k+1}\rangle}{\langle r_k, r_k\rangle}$
7      $p_{k+1} = r_{k+1} + \beta_k p_k$
8  return $x_K$
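
For reference, a minimal C++ implementation of Algorithm 7 for a dense matrix might look like this (a sketch; in the fluid solver, A would of course be applied matrix-free on the grid, and a preconditioner helps a lot):

#include <vector>
#include <cmath>

using Vec = std::vector<double>;

static double dotv(const Vec& a, const Vec& b) {
    double s = 0; for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i]; return s;
}

// Sketch: (unpreconditioned) Conjugate Gradient for Ax = b, A symmetric positive definite.
Vec conjugate_gradient(const std::vector<Vec>& A, const Vec& b, int max_iter, double tol = 1e-10) {
    size_t n = b.size();
    Vec x(n, 0.0), r = b, p = r, Ap(n);
    for (int k = 0; k < max_iter && std::sqrt(dotv(r, r)) > tol; ++k) {
        for (size_t i = 0; i < n; ++i) Ap[i] = dotv(A[i], p);   // Ap = A * p
        double alpha = dotv(r, r) / dotv(p, Ap);
        Vec r_next = r;
        for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r_next[i] -= alpha * Ap[i]; }
        double beta = dotv(r_next, r_next) / dotv(r, r);
        for (size_t i = 0; i < n; ++i) p[i] = r_next[i] + beta * p[i];
        r = r_next;
    }
    return x;
}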

Finally, the computed velocity field can be used to advect particles (Fig. 5.3) or an implicit function
representing the air-fluid interface.

Figure 5.3: Our Davy Jones represented as particles moving according to the incompressible Euler
equations. It simulated (very) approximately one frame in 15 seconds using a Jacobi preconditioned
Conjugate Gradient, in parallel, on a 1283 grid.

5.3 Smoothed particle hydrodynamics

Smoothed particle hydrodynamics is a Lagrangian way to solve PDEs, i.e., based on particle advection
and without requiring a grid (though some people use a grid for the incompressibility step), introduced
by Lucy in 1977. The idea is to consider a set of particles moving in space, but instead of considering
them as Dirac distributions (like infinitesimally small point masses), we consider a small radial
function that describes these particles. Typically, a spline can be used, such as the cubic spline:
$$W(r, h) = \alpha_d \times \begin{cases} 2/3 - q^2 + q^3/2 & \text{if } 0 \le q < 1\\ (2 - q)^3/6 & \text{if } 1 \le q < 2\\ 0 & \text{if } q \ge 2\end{cases}, \qquad q = r/h \qquad (5.5)$$
where $\alpha_d$ is a normalizing constant, with $\alpha_d = 1/h$ in 1-d, $\alpha_d = 15/(7\pi h^2)$ in 2-d and $\alpha_d = 3/(2\pi h^3)$
in 3-d. Here, r is the distance to the particle's center, h is called the smoothing length, and the support
of the kernel is 2h.
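
In code, this kernel (here in 3-d) could be written as follows (a sketch consistent with the normalization above):

#include <cmath>

// Sketch: 3-d cubic spline kernel W(r, h), with q = r / h and support 2h.
double cubic_spline_W(double r, double h) {
    const double pi = 3.14159265358979323846;
    double alpha = 3.0 / (2.0 * pi * h * h * h);     // 3-d normalizing constant
    double q = r / h;
    if (q < 1.0) return alpha * (2.0 / 3.0 - q * q + 0.5 * q * q * q);
    if (q < 2.0) { double t = 2.0 - q; return alpha * (t * t * t / 6.0); }
    return 0.0;
}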

In general, any function f can be trivially written as $f(x) = \int_{\mathbb{R}^3} f(x')\,\delta(x - x')\,dx'$ where δ is the
Dirac distribution (and abusing notations). The goal of our cubic spline kernel W is to approximate a
3
I still have some slides at https://projet.liris.cnrs.fr/origami/math/presentations/matrices.pdf ; if you
read french, Bruno Levy et al. have written a nice introduction here https://ejcim2018.sciencesconf.org/data/
pages/ejcim2018.pdf

Dirac, for sufficiently small h, while being of compact support. The idea is thus to replace the trivial
identity above by $f(x) = \int_{\mathbb{R}^3} f(x')\,W(\|x - x'\|, h)\,dx' + O(h^2)$, or equivalently
$f(x) = \int_{\mathbb{R}^3} \frac{f(x')}{\rho(x')}\,W(\|x - x'\|, h)\,\rho(x')\,dx' + O(h^2)$ with ρ the density of mass.

Discretizing this identity on our set of particles located at $\{x_i\}_{i=1..N}$ leads to
$$f(x) \approx \sum_i \frac{m_i}{\rho_i}\, f(x_i)\, W(\|x - x_i\|)$$
where $m_i$ is the (prescribed) mass of particle i ($m_i = \int \rho(x')\,dx'$). And by linearity of differential
operators, we have:
$$\nabla f(x) \approx \sum_i \frac{m_i}{\rho_i}\, f(x_i)\, \nabla W(\|x - x_i\|)$$
$$\mathrm{div}\, f(x) \approx \sum_i \frac{m_i}{\rho_i}\, \langle f(x_i), \nabla W(\|x - x_i\|)\rangle$$
and several Laplacian estimators have been proposed, such as:
$$\Delta f(x) \approx \sum_i \frac{m_i}{\|x - x_i\|^2\,\rho_i}\,(f(x) - f(x_i))\,\langle x - x_i, \nabla W(x - x_i, h)\rangle$$

In practice, more accurate estimators can be used, for instance:
$$\nabla f(x) \approx \sum_i \frac{m_i}{\rho(x_i)}\,(f(x_i) - f(x))\,\nabla W(\|x - x_i\|)$$
or, a more commonly used symmetric approximation to estimate the pressure gradient:
$$\nabla f(x) \approx \frac{1}{\rho(x)}\sum_i m_i\left(\frac{f(x)}{\rho^2(x)} + \frac{f(x_i)}{\rho^2(x_i)}\right)\nabla W(\|x - x_i\|)$$

We can then compute the quantities we need using this formula. For instance, the density ρ(x)
used above can be estimated as:
$$\rho(x) = \sum_i m_i\, W(\|x - x_i\|, h)$$
which can be improved near free surfaces by further normalizing:
$$\rho(x) = \frac{\sum_i m_i\, W(\|x - x_i\|, h)}{\sum_i \frac{m_i}{\rho_i}\, W(\|x - x_i\|, h)}$$

In practice, all those quantities are only evaluated at (other) particle locations, i.e., $x = x_j$
for some j, and the summation is performed over a neighborhood of particle j – since
our kernel has compact support, of radius 2h. This also calls for fast neighborhood queries – you
may use a regular grid for that, or re-use your favorite kd-tree library if it supports fast radius
queries.

" NanoFlann’s radiusSearch method, although documented to require the maximum distance as
a parameter, actually requires the squared distance ! (and also returns squared distances as a result).
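
Putting the kernel and the neighbor query together, the density estimation could be sketched as follows (the neighbor search is left abstract – the neighbor list is assumed to come from your grid or kd-tree query of radius 2h, and should include particle j itself):

#include <vector>
#include <cmath>

double cubic_spline_W(double r, double h);   // kernel from the sketch in the previous section

struct Particle { double x, y, z, mass; };

// Sketch: SPH density estimation at particle j, summing over neighbors within radius 2h.
double estimate_density(const std::vector<Particle>& particles,
                        const std::vector<int>& neighbor_ids, // indices within radius 2h of particle j
                        int j, double h) {
    double rho = 0.0;
    const Particle& pj = particles[j];
    for (int i : neighbor_ids) {
        const Particle& pi = particles[i];
        double dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        double r = std::sqrt(dx * dx + dy * dy + dz * dz);
        rho += pi.mass * cubic_spline_W(r, h);
    }
    return rho;
}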

A typical solver will then simply add the various forces to the particle velocities, and advect
particles according to their velocity. But among those forces are again pressure forces that make the
fluid incompressible. There is again the option of discretizing the same Poisson equation as before
($\mathrm{div}\, u - \frac{dt}{\rho}\Delta p = 0$) either on a grid (the scheme would then be semi-Lagrangian!) or directly over the particles
using the above discretization of the Laplacian operator. This would make the fluid incompressible.

There is however a simpler option if one tolerates a small loss of incompressibility (about 1%): the
weakly compressible model4. In this model, one can merely apply a force that is repulsive when the
density of particles is higher than what it should be, or attractive otherwise. For instance,
if, for water, the density estimate is ρ(x) = 2000 kg/m3, it means there are twice as many particles as
required here, and one should strive to break them apart. This is done by computing the pressure
as:
$$p(x) = B\left(\left(\frac{\rho(x)}{\rho_0}\right)^\gamma - 1\right)$$
Here, γ = 7, $B = \frac{\rho_0 c_s^2}{\gamma}$, $c_s$ is the speed of sound in the fluid (assumed to be at least 100x larger than
the maximum velocity of the fluid), and $\rho_0$ the expected density of the fluid (i.e., $\rho_0$ = 1000 kg/m3 for
water). Since this does not require solving any linear system, this makes the approach extremely fast.
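
The corresponding pressure computation is essentially a one-liner (a sketch with the constants above):

#include <cmath>

// Sketch: weakly compressible (Tait-like) equation of state used above.
double weakly_compressible_pressure(double rho, double rho0, double c_s) {
    const double gamma = 7.0;
    const double B = rho0 * c_s * c_s / gamma;   // stiffness constant
    return B * (std::pow(rho / rho0, gamma) - 1.0);
}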

Finally, collisions with obstacles or boundaries can be handled in a similar way, by imposing a
repulsive force between the fluid particles and ghost particles sampling the obstacles. This force is
computed as a sum over neighboring ghost particles:
$$F(x) = \sum_i \frac{m_i}{m(x) + m_i}\, \Gamma(x, x_i)\, \frac{x - x_i}{\|x - x_i\|}$$
where
$$\Gamma(x, y) = 0.02\,\frac{c_s^2}{\|x - y\|}\times\begin{cases} 2/3 & \text{if } 0 < q < 2/3\\ 2q - 1.5q^2 & \text{if } 2/3 < q < 1\\ 0.5(2 - q)^2 & \text{if } 1 < q < 2\\ 0 & \text{if } q > 2\end{cases} \qquad\text{with } q = \frac{\|x - y\|}{h}.$$

Also, since one obtains a density of particles, one can easily reconstruct a surface using the marching
cubes algorithm we previously saw in Sec. 4.5 (Fig. 5.4).

Figure 5.4: Simple water drop falling using weakly compressible SPH. The simulation (not the ren-
dering!) is near realtime.

5.4 Using optimal transport

The last technique5 we will see to simulate fluids via incompressible Euler’s equations will make use
of the semi-discrete optimal transport method developed in Sec. 4.4.4.
4
Weakly compressible SPH for free surface flows https://cg.informatik.uni-freiburg.de/publications/2007_
SCA_SPH.pdf
5
inspired by A Lagrangian Scheme à la Brenier for the Incompressible Euler Equations: https://hal.
archives-ouvertes.fr/hal-01425826/document

A few intuitions help motivate the use of optimal transport. First, a theorem by Brenier6 relates
optimal transport to the Helmholtz-Hodge decomposition and the polar decomposition of vector fields.
A consequence of this theorem is that if you take a set of particles at positions X and advect them
with any velocity field, you will obtain particles at new positions Y. Now, if you compute an optimal
transport map between X and Y, the map will produce a velocity field that is the closest divergence-free
velocity field to the original velocity field in the least-squares sense. In short, optimal transport
can enforce the incompressibility constraint. A second intuition is brought by going back to our
repulsive/attractive forces in SPH to weakly enforce incompressibility. The goal was to force areas
of high density to be pushed towards areas of lower density such that the resulting distribution of
mass is uniform. And this can be optimally achieved by computing an optimal transport map between
a uniform density and the set of particles, exactly as we did in Sec. 4.4.4. A third intuition is that
the semi-discrete optimal transport approach we implemented merely allowed us to control the volume
of each cell of a power diagram. By imposing these cells to remain of constant volume across the
simulation, at least we are sure the fluid is incompressible!

The Lagrangian approach of Gallouët and Mérigot hence considers the semi-discrete optimal trans-
portation problem between a set of fluid particles and a uniform density, leading to a Laguerre’s cell
for each particle. It then considers a spring force pushing each particle to the centroid of its Laguerre’s
cell. Doing so, particles that are too tightly packed will get spread closer to a uniform density at each
time step.

Formally, the splitting scheme now reads:

Algorithm 8: One time-step of the Gallouët-Mérigot scheme.

Input: Positions X, velocities v and masses m of particles
Output: New positions X′ and velocities v′ of particles
1  $V_W$ = OptimalTransport(X, Uniform) // optimize the weights W of the Laguerre's cells of all particles
2  for i = 1..N do // For each particle
3      $F_{spring} = \frac{1}{\varepsilon^2}(\mathrm{Centroid}(V_W^i) - X_i)$
4      $F = F_{spring} + \vec g$
5      $v'_i = v_i + \frac{dt}{m_i} F$
6      $X'_i = X_i + dt\, v_i$
7  return X′, v′

Particles going outside of the domain can bounce back inside if needed.

Now, the interesting part is how to simulate free surface fluids with this scheme (as here, we
considered a uniform fluid density over the entire simulation domain).

The solution is to consider a set of particles for the fluid and a set of particles for the air, while only
moving the fluid particles. The air particles should cover the entire domain (in my example in Fig. 5.5,
I performed a few Lloyd iterations to have all the air particles uniformly spaced in the domain). And
then, instead of considering an optimal transport only between the fluid and a uniform density, we still
enforce each fluid Laguerre's cell to have a constant mass $m_i$ (or a constant volume since
all particles have the same mass) as before, by optimizing the weight $w_i$ of the Laguerre's cell of each
fluid particle, but we also consider a new, single additional weight $\tilde w$ that is shared by all air
particles7 and is optimized such that the sum of the volumes of all Laguerre's cells of all air particles
is a constant (equal to the initial air volume). This only slightly changes the semi-discrete optimal
transport formulation, as now there is an additional weight that counts for several Laguerre's cells –
6
Polar factorization and monotone rearrangement of vector-valued functions: http://www.math.toronto.edu/
~mccann/assignments/477/Brenier91.pdf
7
An even more efficient formulation considers the air as being a single big cell whose interface with the fluid consists
in portions of circles.

we are not optimizing N weights anymore but N + 1.

A simple potential acceleration is the warm-restart of Laguerre’s weights. At each frame of the
simulation, this consists in starting the optimization of the Laguerre’s weights (for the semi-discrete
optimal transport) by the last weights found at the previous time step. Results can be found in
Fig. 5.5.

Figure 5.5: Visualization of a free-surface liquid simulated with semi-discrete optimal transport. The
sum of all air Laguerre's cells has a prescribed mass, while each individual liquid Laguerre's cell has a
prescribed mass. Here the optimization takes about 1 second per frame while there isn't much fluid-boundary
interaction, and about 30s–1 minute per frame afterwards (default L-BFGS settings) with
700 fluid particles and 2500 air particles (this could be made much faster, notably using the Newton
optimizer instead of quasi-Newton, and using other variants of power diagram computations! Early
universe reconstructions are being performed with millions of particles with this approach on the GPU –
see https://twitter.com/BrunoLevy01/status/957552532661915649 and https://twitter.com/
BrunoLevy01/status/1242942393474670592 !). I used ε = 0.004, dt = 0.002 and each particle has a
mass of 200.
