[Fig. 1 panels: (a) initial guess; (b) real photograph; (c) camera gradient (per-pixel contribution); (d) table albedo gradient (per-pixel contribution); (e) light gradient (per-pixel contribution); (f) our fitted result]

Fig. 1. We develop a general-purpose differentiable renderer that is capable of handling general light transport phenomena. Our method generates gradients with respect to scene parameters, such as camera pose (c), material parameters (d), mesh vertex positions, and lighting parameters (e), from a scalar loss computed on the output image. (c) shows the per-pixel gradient contribution of the $L_1$ difference with respect to the camera moving into the screen. (d) shows the gradient with respect to the red channel of the table albedo. (e) shows the gradient with respect to the green channel of the intensity of one light source. As one of our applications, we use our gradients to perform an inverse rendering task: matching a real photograph (b), starting from an initial configuration (a) with a manual geometric recreation of the scene. The scene contains a fisheye camera with strong indirect illumination and non-Lambertian materials. We optimize for camera pose, material parameters, and light source intensity. Despite slight inaccuracies due to geometry mismatch and lens distortion, our method generates an image (f) that almost matches the photo reference.
Gradient-based methods are becoming increasingly important for computer graphics, machine learning, and computer vision. The ability to compute gradients is crucial to optimization, inverse problems, and deep learning. In rendering, the gradient is required with respect to variables such as camera parameters, light sources, scene geometry, or material appearance. However, computing the gradient of rendering is challenging because the rendering integral includes visibility terms that are not differentiable. Previous work on differentiable rendering has focused on approximate solutions. They often do not handle secondary effects such as shadows or global illumination, or they do not provide the gradient with respect to variables other than pixel coordinates.

We introduce a general-purpose differentiable ray tracer, which, to our knowledge, is the first comprehensive solution that is able to compute derivatives of scalar functions over a rendered image with respect to arbitrary scene parameters such as camera pose, scene geometry, materials, and lighting parameters. The key to our method is a novel edge sampling algorithm that directly samples the Dirac delta functions introduced by the derivatives of the discontinuous integrand. We also develop efficient importance sampling methods based on spatial hierarchies. Our method can generate gradients in times ranging from seconds to minutes depending on scene complexity and desired precision.

We interface our differentiable ray tracer with the deep learning library PyTorch and show prototype applications in inverse rendering and the generation of adversarial examples for neural networks.

CCS Concepts: • Computing methodologies → Ray tracing; Visibility; Reconstruction;

Additional Key Words and Phrases: ray tracing, inverse rendering, differentiable programming

ACM Reference Format:
Tzu-Mao Li, Miika Aittala, Frédo Durand, and Jaakko Lehtinen. 2018. Differentiable Monte Carlo Ray Tracing through Edge Sampling. ACM Trans. Graph. 37, 6, Article 222 (November 2018), 11 pages. https://doi.org/10.1145/3272127.3275109

Authors' addresses: Tzu-Mao Li, MIT CSAIL, [email protected]; Miika Aittala, MIT CSAIL, [email protected]; Frédo Durand, MIT CSAIL, [email protected]; Jaakko Lehtinen, Aalto University & NVIDIA, [email protected].

© 2018 Copyright held by the owner/author(s). This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Graphics, https://doi.org/10.1145/3272127.3275109.

1 INTRODUCTION
The computation of derivatives is increasingly central to many areas of computer graphics, computer vision, and machine learning. It is critical for the solution of optimization and inverse problems, and plays a major role in deep learning via backpropagation. This creates a need for rendering algorithms that can be differentiated with respect to arbitrary input parameters, such as camera location and direction, scene geometry, lights, material appearance, or texture values. Unfortunately, the rendering integral includes visibility terms that are not differentiable at object boundaries. Whereas the final image function is usually differentiable once radiance has been integrated over pixel prefilters, light source areas, etc., the integrand of rendering algorithms is not. In particular, the derivative of the integrand has Dirac delta terms at occlusion boundaries that cannot be handled by traditional sampling strategies.
Previous work in differentiable rendering [Kato et al. 2018; Loper and Black 2014] has focused on fast, approximate solutions using simpler rendering models that only handle primary visibility and ignore secondary effects such as shadows and indirect light. Analytical solutions exist for diffuse interreflection [Arvo 1994] but are difficult to generalize to arbitrary material models. The work by Ramamoorthi et al. [2007] is an exception, but it only differentiates with respect to image coordinates, whereas we want derivatives with respect to arbitrary scene parameters. Other previous work usually relies on finite differences, with the usual limitation of these methods when the function is complex: they work well for simple configurations but do not offer a comprehensive solution to the full light transport equation.

In this work, we propose an algorithm that is, to the best of our knowledge, the first to compute derivatives of scalar functions over a physically-based rendered image with respect to arbitrary input parameters (camera, lights, materials, geometry, etc.). Our solution is stochastic and builds on Monte Carlo ray tracing. For this, we introduce new techniques to explicitly sample edges of triangles in addition to the usual solid-angle sampling of traditional approaches. This requires new spatial acceleration strategies and importance sampling to efficiently sample edges. Our method is general and can sample derivatives for arbitrary bounces of light transport. The running times we observed range from a second to a minute depending on the required precision, for an overhead of roughly 10×-20× compared to rendering an image alone.

We integrate our differentiable ray tracer with the automatic differentiation library PyTorch [Paszke et al. 2017] for efficient integration with optimization and learning approaches. The scene geometry, lighting, camera, and materials are parameterized by PyTorch tensors, which enables complex combinations of 3D graphics, light transport, and neural networks. Backpropagation runs seamlessly across PyTorch and our renderer.

2 RELATED WORK

2.1 Inverse graphics
Inverse graphics techniques seek to find the scene parameters given observed images. Vision as inverse graphics has a long history in both computer graphics and vision (e.g. [Baumgart 1974; Patow and Pueyo 2003; Yu et al. 1999]). Many techniques in inverse graphics utilize derivatives of the rendering process for inference.

Blanz and Vetter [1999] optimized for the shape and texture of a face. Shacked and Lischinski [2001] and Bousseau et al. [2011] optimized a perceptual metric for lighting design. Gkioulekas et al. [2016; 2013] focused on scattering parameters. Aittala et al. [2016; 2013; 2015] are interested in spatially varying material properties. Barron et al. [2015] proposed a solution to jointly optimize shape, illumination, and reflectance. Khungurn et al. [2015] aimed at matching photographs of fabrics. All of the approaches above utilize gradients for solving the inverse problem, and each had to develop a specialized solver to compute the gradient of the specific light transport scenario it was interested in.

Loper and Black [2014] and Kato et al. [2018] proposed two general differentiable rendering pipelines. Both of them focus on performance and approximate the primary visibility gradients when there are multiple triangles inside a pixel, and both assume Lambertian materials and do not compute shadows and global illumination.

Recently, it has become increasingly popular for deep learning methods to incorporate a differentiable rendering layer in their architectures (e.g. [Liu et al. 2017; Richardson et al. 2017]). These rendering layers are usually special-purpose and do not handle geometric discontinuities such as primary visibility and shadows.

To our knowledge, our method is the first that is able to differentiate through a full path tracer while taking the geometric discontinuities into account.

2.2 Derivatives in rendering
Analytical derivatives have been used for computing the footprint of light paths [Igehy 1999; Shinya et al. 1987; Suykens and Willems 2001] and for predicting the changes of specular light paths [Chen and Arvo 2000; Jakob and Marschner 2012; Kaplanyan et al. 2014]. The derivatives are usually derived manually for the particular type of light paths each work focuses on, making them difficult to generalize to arbitrary material models or lighting effects. Unlike these methods, we compute the gradients using a hybrid approach that mixes automatic differentiation with manually derived derivatives that focus on the discontinuous integrand.

Arvo [1994] proposed an analytical method for computing the spatial gradients of irradiance. The method requires clipping the triangle meshes in order to correctly integrate the form factor, and does not scale well to scenes of large complexity. It is also difficult or impossible to obtain a closed-form integral for arbitrary materials.

Ramamoorthi et al.'s work on first-order analysis of light transport [2007] is highly related to our method; their method is a special case of ours. Our derivation generalizes their method to differentiate with respect to any scene parameters. Furthermore, we handle primary visibility, secondary visibility, and global illumination.

Irradiance or radiance caching [Jarosz et al. 2012; Krivanek et al. 2005; Ward and Heckbert 1992] numerically computes the gradient of interreflection with respect to the spatial position and orientation of the receiver. To take discontinuities into account, these methods resort to stratified sampling. Unlike these methods, we estimate the gradient integral directly by automatic differentiation and edge sampling.

Li et al. [2015] proposed a variant of the Metropolis light transport algorithm [Veach and Guibas 1997] that computes the Hessian of a light path contribution with respect to the path parameters using automatic differentiation [Griewank and Walther 2008]. Their method does not take geometric discontinuities into account.

3 METHOD
Our task is the following: given a 3D scene with a continuous parameter set Φ (including camera pose, scene geometry, material, and lighting parameters), we generate an image using the path tracing algorithm [Kajiya 1986]. Given a scalar function computed from the image (e.g. a loss function we want to optimize), our goal is to backpropagate the gradient of the scalar with respect to all scene parameters Φ.
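To make this pipeline concrete, the following is a minimal sketch of how a Monte Carlo renderer can be exposed to PyTorch's autograd. The functions `render_forward` and `render_gradient` are hypothetical stand-ins for renderer internals, not the renderer's actual interface; the point is only the division of labor between the forward rendering pass and the gradient estimator developed in the remainder of this section.

```python
import torch

class RenderFunction(torch.autograd.Function):
    """Sketch: expose a Monte Carlo renderer to PyTorch autograd."""

    @staticmethod
    def forward(ctx, *scene_params):
        ctx.save_for_backward(*scene_params)
        # Ordinary path tracing; returns the image as a PyTorch tensor.
        return render_forward(scene_params)           # hypothetical stand-in

    @staticmethod
    def backward(ctx, grad_image):
        scene_params = ctx.saved_tensors
        # Estimate d(loss)/d(params) from d(loss)/d(image): area sampling
        # for the smooth part of the integrand and edge sampling for the
        # discontinuities. Must return one gradient per forward input.
        return render_gradient(scene_params, grad_image)  # hypothetical

# Usage sketch:
#   image = RenderFunction.apply(vertices, albedo, light_intensity)
#   loss = (image - target).pow(2).mean(); loss.backward()
```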
[Fig. 2 panels: (a) area sampling ("zero contribution" marks the yellow interior samples); (b) edge sampling]

Fig. 2. (a) The figure shows a pixel overlapped with two triangles. We are interested in computing the derivative of pixel color with respect to the green triangle moving up. Since the area covered by the green triangle increases, the final pixel color will contain more green area and less white background. Traditional area sampling (yellow samples), even instrumented with automatic differentiation, does not account for the change in covered area. (b) In addition to traditional area sampling, we propose a novel edge sampling algorithm (blue samples) to sample the differential area on the edges. Our method computes unbiased gradients and correctly takes occlusion into account.

[Fig. 3 panels: (a) half-spaces; (b) occlusion]

Fig. 3. (a) An edge splits the space into two half-spaces $f_u$ and $f_l$. If the edge moves right, the green area increases while the white area decreases. We integrate over edges to compute gradients by taking into account the change in areas. To compute the integral, we sample a point (the blue point) on the edge and compute the difference between the half-spaces by computing the color on the two sides of the edge. (b) Our method handles occlusion correctly, since an occluded sample will land on the continuous part of the path contribution function, thus having the exact same contribution on the two sides (for example, the grey sample has zero contribution to the gradient).
$$\nabla \iint \theta(\alpha_i(x, y))\, f_i(x, y)\, dx\, dy = \iint \delta(\alpha_i(x, y))\, \nabla \alpha_i(x, y)\, f_i(x, y)\, dx\, dy + \iint \nabla f_i(x, y)\, \theta(\alpha_i(x, y))\, dx\, dy. \qquad (6)$$

Equation 6 shows that we can estimate the gradient using two Monte Carlo estimators. The first one estimates the integral over the edges of triangles containing the Dirac delta functions, and the second estimates the original pixel integral, except that the smooth function $f_i$ is replaced by its gradient, which can be computed through automatic differentiation.

To estimate the integral containing Dirac delta functions, we eliminate the Dirac delta by performing a variable substitution that rewrites the first term into an integral over the edge, that is, over the regions where $\alpha_i(x, y) = 0$:

$$\iint \delta(\alpha_i(x, y))\, \nabla \alpha_i(x, y)\, f_i(x, y)\, dx\, dy = \int_{\alpha_i(x, y) = 0} \frac{\nabla \alpha_i(x, y)}{\|\nabla_{x, y}\, \alpha_i(x, y)\|}\, f_i(x, y)\, d\sigma(x, y), \qquad (7)$$

where $\|\nabla_{x, y}\, \alpha_i(x, y)\|$ is the $L^2$ norm of the gradient of the edge equation $\alpha_i$ with respect to $x, y$, which takes the Jacobian of the variable substitution into account, and $\sigma(x, y)$ is the length measure on the edge [Hörmander 1983].

The gradients of the edge equation $\alpha_i$ for an edge with screen-space endpoints $a$ and $b$ are:

$$\|\nabla_{x, y}\, \alpha_i\| = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2}$$
$$\frac{\partial \alpha_i}{\partial a_x} = b_y - y, \quad \frac{\partial \alpha_i}{\partial a_y} = x - b_x, \quad \frac{\partial \alpha_i}{\partial b_x} = y - a_y, \quad \frac{\partial \alpha_i}{\partial b_y} = a_x - x \qquad (8)$$
$$\frac{\partial \alpha_i}{\partial x} = a_y - b_y, \quad \frac{\partial \alpha_i}{\partial y} = b_x - a_x.$$

As a byproduct of the derivation, we also obtain the screen-space gradients $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$, which can potentially facilitate adaptive sampling as shown in Ramamoorthi et al.'s first-order analysis [2007]. We can obtain the gradient with respect to other parameters, such as camera parameters, 3D vertex positions, or vertex normals, by propagating the derivatives of the screen-space vertex positions $a$ and $b$ using the chain rule.

We estimate the edge integral of Equation 7 with Monte Carlo sampling: we select an edge $E$ with probability $P(E)$, pick a point on it uniformly, and weight the sample by $\|E\| / P(E)$, where $\|E\|$ is the length of the edge and $P(E)$ is the probability of selecting edge $E$.

In practice, if we employ smooth shading, most of the triangle edges are in continuous regions, and the Dirac integral is zero for these edges since, by the definition of continuity, $f_u(x, y) = f_l(x, y)$. Only the silhouette edges (e.g. [Hertzmann 1999]) have non-zero contribution to the gradients. We select the edges by projecting all triangle meshes to screen space and clipping them against the camera frustum. We select one silhouette edge with probability proportional to its screen-space length, then uniformly pick a point on the selected edge.

Our method handles occlusion correctly: if the sample is blocked by another surface, $(x, y)$ will land on a continuous part of the contribution function $f(x, y)$, and such samples have zero contribution to the gradients. Figure 3b illustrates the process.

To recap, we use two sampling strategies to estimate the gradient integral of the pixel filter (Equation 2): one for the discontinuous regions of the integrand (first term of Equation 6), and one for the continuous regions (second term of Equation 6). To compute the gradient for discontinuous regions, we explicitly sample the edges and compute the difference between the two sides of each edge using Monte Carlo sampling (Equation 10).
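As an illustration of the estimator for the Dirac term of Equation 6, the sketch below samples a silhouette edge proportionally to its screen-space length, picks a point on it uniformly, and forms the importance-weighted two-sided difference of Equation 7. Here `edges`, `shade`, and `d_alpha_d_param` are assumed stand-ins for renderer internals; in particular, a real implementation traces a ray on each side of the edge rather than offsetting screen positions.

```python
import math
import random
from collections import namedtuple

Edge = namedtuple("Edge", ["a", "b"])  # screen-space endpoints (x, y)

def sample_edge_term(edges, shade, d_alpha_d_param):
    """One-sample estimate of the Dirac term of Eq. 6 for one scene
    parameter. `shade(x, y)` evaluates the smooth path contribution f at
    a screen position; `d_alpha_d_param(edge, x, y)` evaluates the
    derivative of the edge equation with respect to the parameter of
    interest (Eq. 8)."""
    # Select a silhouette edge with probability proportional to its
    # screen-space length, then pick a point on it uniformly.
    lengths = [math.dist(e.a, e.b) for e in edges]
    edge = random.choices(edges, weights=lengths)[0]
    p_edge = math.dist(edge.a, edge.b) / sum(lengths)
    t = random.random()
    x = edge.a[0] + t * (edge.b[0] - edge.a[0])
    y = edge.a[1] + t * (edge.b[1] - edge.a[1])

    # grad_{x,y} alpha = (a_y - b_y, b_x - a_x); its norm equals the edge
    # length (Eq. 8). Evaluate f on the two sides of the edge.
    nx, ny = edge.a[1] - edge.b[1], edge.b[0] - edge.a[0]
    norm = math.hypot(nx, ny)
    eps = 1e-4
    f_u = shade(x + eps * nx / norm, y + eps * ny / norm)
    f_l = shade(x - eps * nx / norm, y - eps * ny / norm)

    # Importance weight: divide by the Jacobian |grad_{x,y} alpha| of
    # Eq. 7 and by the sample pdf P(E) / ||E||.
    pdf = p_edge / math.dist(edge.a, edge.b)
    return (f_u - f_l) * d_alpha_d_param(edge, x, y) / (norm * pdf)
```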
[Fig. 4 panels: (a) secondary visibility — light source, edge, blocker, shading point, scene surface element; (b) width correction — edge surface element, scene surface element]

Fig. 4. (a) Our method can be easily generalized to handle shadow and global illumination. Similar to the primary visibility case (Figure 3), a geometry edge $(v_0, v_1)$ and the shading point $p$ split the 3D space into two half-spaces $f_u$ and $f_l$ and introduce a discontinuity. Assuming the blocker is moving right, we integrate over the edge to compute the difference. By doing so we take into account the increase in blocker area and the decrease in light source area as seen from the shading point. The integration over the edge is defined on the intersection between the scene manifold and the plane formed by the shading point and the edge (the semi-transparent triangle). (b) The orientation of the infinitesimal width of the edge differs from that of the scene surface element the edge intersects. During integration we need to project the scene surface element width onto the edge surface element. The ratio between the two widths is $\frac{1}{\sin\theta}$, which is one over the length of the cross product between the normal of the edge plane and the scene surface normal.

Similar to the primary visibility case, an edge $(v_0, v_1)$ in 3D introduces a step function into the scene function $h$:

$$\theta(\alpha(p, m))\, h_u(p, m) + \theta(-\alpha(p, m))\, h_l(p, m). \qquad (12)$$

We can derive the edge function $\alpha(p, m)$ by forming a plane using the shading point $p$ and the two points on the edge. The sign of the dot product between the vector $m - p$ and the plane normal determines the two half-spaces. The edge equation can therefore be defined as

$$\alpha(p, m) = (m - p) \cdot \big( (v_0 - p) \times (v_1 - p) \big). \qquad (13)$$

To compute the gradients, we analogously apply the derivation used for primary visibility, using the 3D version of Equation 6 and Equation 7 with $x, y$ replaced by $p, m$. The edge integral over the line on the scene surface, analogous to Equation 7, is:

$$\int_{\alpha(p, m) = 0} \frac{\nabla \alpha(p, m)}{\|\nabla_m\, \alpha(p, m)\|}\, h(p, m)\, \frac{1}{\|n_m \times n_h\|}\, d\sigma'(m), \qquad n_h = \frac{(v_0 - p) \times (v_1 - p)}{\|(v_0 - p) \times (v_1 - p)\|}, \qquad (14)$$

where $n_m$ is the surface normal at point $m$. There are two crucial differences between the 3D edge integral (Equation 14) and the previous screen-space edge integral (Equation 7). First, while the measure of the screen-space edge integral $\sigma(x, y)$ coincides with the unit length of the 2D edge, the measure of the 3D edge integral $\sigma'(m)$ is the length of the projection of a point on the edge, from the shading point $p$, to a point $m$ on the scene manifold (the semi-transparent triangle in Figure 4a illustrates the projection). Second, there is an extra area correction term $\|n_m \times n_h\|$, since we need to project the scene surface element onto the infinitesimal width of the edge (Figure 4b).

To integrate the 3D edge integral using Monte Carlo sampling, we substitute the variable again, from the point $m$ on the surface to the line parameter $t$ on the edge $v_0 + t(v_1 - v_0)$:

$$\int_0^1 \frac{\nabla \alpha(p, m(t))}{\|\nabla_m\, \alpha(p, m(t))\|}\, h(p, m(t))\, \frac{\|J_m(t)\|}{\|n_m \times n_h\|}\, dt, \qquad (15)$$

where the Jacobian $J_m(t)$ is a 3D vector describing the projection of the edge $(v_0, v_1)$ onto the scene manifold with respect to the line parameter. We derive the Jacobian in Appendix A.1.

The derivatives of $\alpha(p, m)$ needed to compute the edge integral are:

$$\nabla_m\, \alpha(p, m) = (v_0 - p) \times (v_1 - p)$$
$$\nabla_{v_0}\, \alpha(p, m) = (v_1 - p) \times (m - p), \quad \nabla_{v_1}\, \alpha(p, m) = (m - p) \times (v_0 - p) \qquad (16)$$
$$\nabla_p\, \alpha(p, m) = (v_1 - m) \times (v_0 - m).$$

Efficient Monte Carlo sampling of secondary edges is more involved. Unlike primary visibility, where the viewpoint does not change much, the shading point $p$ can be anywhere in the scene. The consequence is that we need a more sophisticated data structure to prune the edges with zero contribution. Section 4 describes the process for importance sampling edges.

4 IMPORTANCE SAMPLING THE EDGES
Our edge sampling method described in Section 3 requires us to sample an edge from the hundreds of thousands, or even millions, of triangles in the scene. The problem is two-fold: we need to sample an edge and then sample a point on the edge efficiently. Typically only a tiny fraction of these edges contribute to the gradients, since most of the edges are not silhouettes (e.g. [Hertzmann 1999]), and many of them subtend a small solid angle. Naive sampling methods fail to select the important edges. Even when the number of edges is small, it is often the case that only a small region on an edge has non-zero contribution, especially in the presence of highly specular materials.

As mentioned in Section 3.1, the case of primary visibility is easier since the viewpoint is the camera. We project all edges onto the screen in a preprocessing pass and test whether they are silhouettes with respect to the camera position. We sample an edge with probability based on the distance between its two screen-space endpoints and sample uniformly in screen space. For secondary visibility the problem is much more complicated: the viewpoint can be anywhere in the scene, and we need to take into account the material response between the viewpoint and the point on the edge.

In this section we describe a scalable implementation for sampling edges given an arbitrary viewpoint. Our solution is inspired by previous methods for sampling many light sources using hierarchical data structures (e.g. [Estevez and Kulla 2018; Paquette et al. 1998; Walter et al. 2005]), efficient data structures for selecting silhouette edges [Hertzmann and Zorin 2000; Sander et al. 2000], and the more recent closed-form solutions for linear light sources [Heitz et al. 2016; Heitz and Hill 2017].

4.1 Hierarchical edge sampling
Given a shading point, our first task is to importance sample one or more triangle edges.
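The following is a small NumPy sketch of Equation 13 and its gradients. Rather than quoting Equation 16 verbatim, it obtains $\nabla_p \alpha$ from the translation invariance of $\alpha$ (translating $p$, $v_0$, $v_1$, and $m$ together leaves $\alpha$ unchanged, so the four gradients sum to zero), and checks one component against finite differences.

```python
import numpy as np

def alpha(p, v0, v1, m):
    """Edge equation of Eq. 13: its sign tells which half-space m is in."""
    return np.dot(m - p, np.cross(v0 - p, v1 - p))

def grad_alpha(p, v0, v1, m):
    """Gradients of alpha with respect to each argument (cf. Eq. 16).
    Translation invariance gives grad_p = -(grad_m + grad_v0 + grad_v1)."""
    d_m = np.cross(v0 - p, v1 - p)
    d_v0 = np.cross(v1 - p, m - p)
    d_v1 = np.cross(m - p, v0 - p)
    d_p = -(d_m + d_v0 + d_v1)
    return d_p, d_v0, d_v1, d_m

# Sanity check of one component against finite differences.
rng = np.random.default_rng(0)
p, v0, v1, m = rng.standard_normal((4, 3))
d_p, _, _, _ = grad_alpha(p, v0, v1, m)
eps = 1e-6
fd = (alpha(p + np.array([eps, 0, 0]), v0, v1, m) - alpha(p, v0, v1, m)) / eps
assert abs(fd - d_p[0]) < 1e-3
```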
[Fig. 5 columns: scenes; 10s without importance sampling; 10s with importance sampling; 350s without importance sampling; 350s with importance sampling]

Fig. 5. Equal-time comparison between sampling with and without our importance sampling method. We tested our algorithm on scenes with soft shadows, global illumination, and specular reflections. We show the per-pixel derivatives of the average color with respect to the bunny moving up in the top row, and the derivatives with respect to the reflected plane with the SIGGRAPH logo moving right in the second row. For the second row we only show the gradients in the red inset. The texture derivatives are resolved better without importance sampling, since it has less overhead and therefore more samples. However, without importance sampling it is difficult to capture rare events such as shadows cast by a small area light or a very specular reflection of the edges, causing extremely high variance in the gradients.
There are several factors to take into account when selecting the edges: the geometric foreshortening factor, proportional to the inverse squared distance to the edge; the material response between the shading point and the point on the edge; and the radiance incoming from the edge direction (e.g. whether it hits a light source or not).

We build two hierarchies. The first contains the triangle edges that are associated with only one face, and the edges of meshes that do not use smooth shading normals. The second contains the remaining edges. For the first set of edges we build a 3D bounding volume hierarchy over the 3D positions of the two endpoints of each edge. For the second set of edges we build a 6D bounding volume hierarchy over the two endpoint positions and the two normals associated with the two faces of each edge. For quick rejection of non-silhouette edges, we store at each node in the hierarchy a cone direction and an angle covering all possible normal directions [Sander et al. 2000]. An alternative might be a 3D hierarchy similar to the ones used by Sander et al. [2000] or Hertzmann and Zorin [2000], but we opt for simplicity here. Similar to previous work [Walter et al. 2006], we scale the directional components by 1/8 of the diagonal of the scene's bounding box. During construction we split along the dimension with the longest extent.

We traverse the hierarchies to sample edges. The edges blocking light sources are usually the most significant source of contribution, so we traverse the hierarchy twice. The first traversal focuses on edges that overlap the cone subtended by the light source at the shading point, and the second traversal samples all edges. We combine the two sets of samples using multiple importance sampling [Veach and Guibas 1995]. We use a box-cone intersection test to quickly discard the edges that do not intersect the light sources.

During the traversal, for each node in the hierarchy we compute an importance value for selecting which child to traverse next, based on an upper-bound estimate of the contribution, similar to the lightcuts algorithm [Walter et al. 2005]. We estimate the bound as the total length of the edges, times the inverse squared distance, times a Blinn-Phong BRDF bound (using the method described in Walter's note [2005]). We set the importance to zero if the node does not contain any silhouette. We traverse into both children if the shading point is inside both of their bounding boxes, or when the BRDF bound is higher than a certain threshold (for all examples in the paper we set it to 1), or when the angle subtended by the light cone is smaller than a threshold (we set it to $\cos^{-1}(0.95)$).

4.2 Importance sampling a single edge
After we select a set of edges, we need to choose a point on each selected edge. Oftentimes, with a highly specular BRDF, only a small portion of the edge has significant contribution. We employ the recent technique for integrating linear light sources over the linearly transformed cosine distribution [Heitz et al. 2016; Heitz and Hill 2017]. Heitz and Hill's work provides a closed-form solution of the integral between a point and a linear light source, weighted by BRDF and geometric foreshortening. We numerically invert the integrated cumulative distribution function using Newton's method for importance sampling. We precompute a table of fitted linearly transformed cosines for our BRDFs.
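Returning to the traversal of Section 4.1, the sketch below illustrates the hierarchical selection of a single edge. The node fields (`total_edge_length`, `center`, `may_contain_silhouette`) and the `brdf_bound` callback are assumptions standing in for our BVH records and the Blinn-Phong bound; the sketch also omits the both-children refinement and the two-pass light-cone traversal described above, descending instead to a single leaf with a rescaled random number.

```python
def node_importance(node, shading_point, brdf_bound):
    """Importance of a hierarchy node as seen from the shading point:
    total edge length times inverse squared distance times a BRDF bound,
    or zero if the normal cone rules out any silhouette."""
    if not node.may_contain_silhouette(shading_point):
        return 0.0
    d2 = max(sum((c - s) ** 2 for c, s in zip(node.center, shading_point)),
             1e-6)  # guard against the shading point touching the node
    return node.total_edge_length / d2 * brdf_bound(node, shading_point)

def sample_edge(node, shading_point, brdf_bound, u):
    """Descend to a leaf, choosing children proportionally to their
    importance and rescaling the random number u for reuse. Returns the
    sampled edge and its selection probability P(E)."""
    prob = 1.0
    while not node.is_leaf:
        w_l = node_importance(node.left, shading_point, brdf_bound)
        w_r = node_importance(node.right, shading_point, brdf_bound)
        if w_l + w_r == 0.0:
            return None, 0.0  # no silhouette edge in this subtree
        p_l = w_l / (w_l + w_r)
        if u < p_l:
            node, prob, u = node.left, prob * p_l, u / p_l
        else:
            node, prob, u = node.right, prob * (1.0 - p_l), (u - p_l) / (1.0 - p_l)
    return node.edge, prob
```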
We evaluate our sampling method using an equal-time comparison and show the results in Figure 5. We compare against the baseline approach of uniformly sampling all edges by length. The baseline approach is not able to efficiently sample rare events such as shadows cast by a small light source or very specular reflections of edges, while our importance sampling generates images with much lower variance.

5 RESULTS
We implement our method in a stand-alone C++ renderer with an interface to the automatic differentiation library PyTorch [Paszke et al. 2017]. To use our system, the user constructs their scene using lists of PyTorch tensors; for example, triangle vertices and indices are represented by floating-point and integer tensors. In the forward pass our renderer outputs an image, which is also a PyTorch tensor. The user can then compute a scalar loss on the output image and obtain the gradient by backpropagating to the scene parameters. Inside our C++ renderer, we use an operator-overloading approach for automatic differentiation. We use the Embree library [Wald et al. 2014] for our ray casting operations. The renderer supports a pinhole camera with planar and equiangular spherical projections; Lambertian and Blinn-Phong BRDFs with the Schlick approximation for Fresnel reflection; bilinear reconstruction of textures for diffuse and specular reflectance and roughness; and area light sources with triangle meshes.

5.1 Verification of the method
We tested our method on several synthetic scenes covering a variety of effects, including occlusion, non-Lambertian materials, and global illumination. Figure 6 shows the scenes. We start from an initial parameter setting and try to optimize the parameters to minimize the L2 difference between the rendered image and the target image using gradients generated by our method (except for the living room scene in Figure 6 (f), where we optimize the L2 difference between the Gaussian pyramids of the rendered image and the target image). Our PyTorch interface allows us to apply its stock optimizers and backpropagate to all scene parameters easily. We use the Adam [Kingma and Ba 2015] algorithm for optimization. The number of parameters ranges from 6 to 30. The experiment shows that our renderer is able to generate correct gradients for the optimizer to infer the correct scenes. It also shows that we are able to handle many different light transport scenarios, including cases where a triangle vertex is blocked but we still need to optimize it into the correct position, and optimization of a blocker's position when we only see its shadow.

[Fig. 6 rows: initial guess; target; optimized result. Columns: (a) primary occlusion; (b) shadow; (c) camera & glossy; (d) glossy receiver; (e) near-specular; (f) global illumination]

Fig. 6. We verify our renderer by matching a range of synthetic scenes with different light transport configurations. For each scene, we start from an initial parameter setting (first row) and attempt to set the scene parameters so that the rendering matches the target (second row) using gradient-based optimization. Each scene is intended to test a different aspect of the renderer. (a) optimizes triangle positions in the presence of occlusion. (b) optimizes the blocker position for a shadow. (c) optimizes camera pose and material parameters over textured and glossy surfaces. (d) optimizes the blocker position where the shadow receiver is highly glossy. (e) optimizes an almost specular reflection of a plane behind the camera; the free parameter is the plane position. (f) optimizes the camera pose in the presence of global illumination and soft shadow. Our method is able to generate gradients for these scenes and to optimize the parameters correctly, resulting in minimal difference between the optimized result (final row) and the target (second row). All scenes are rendered with 4 samples per pixel during optimization. The final renderings are produced with 625 samples per pixel, except for (f) where we use 4096 samples. We encourage the reader to refer to the supplementary materials for videos and more scenes.
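A minimal sketch of this optimization loop is given below, assuming a differentiable `render` entry point such as the autograd wrapper sketched in Section 3 and a `load_target_image` helper; both names, and the particular parameters being optimized, are illustrative rather than our released API.

```python
import torch

# Illustrative parameters; `render` and `load_target_image` are stand-ins.
translation = torch.zeros(3, requires_grad=True)        # e.g. blocker position
albedo = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
optimizer = torch.optim.Adam([translation, albedo], lr=2e-2)
target = load_target_image()

for iteration in range(200):
    optimizer.zero_grad()
    image = render(translation, albedo, num_samples=4)  # 4 spp, as in Sec. 5.1
    loss = (image - target).pow(2).mean()               # L2 image difference
    loss.backward()  # gradients flow through the renderer via edge sampling
    optimizer.step()
```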
[Figure (comparison with finite differences): rows labeled "image" and "finite differences"]
Fig. 9. (a) A plane lit by a point light close to the plane. We are interested in the derivative of the image with respect to the plane moving right. Since the point light stays static, the derivatives should be zero except at the boundary. (b), (c) Previous work uses color buffer differences to approximate the derivatives, making them unable to take large variations between pixels into account; they output non-zero derivatives at the center. (d) Our method outputs the correct derivatives.

…boundary, they use the kernel $\frac{1}{2}[-1, 0, 1]$. The Neural 3D Mesh renderer performs an extra rasterization pass over the triangle edges and accumulates the derivatives by computing the color differences in the color buffer around each edge. The derivative responses are modified by applying a smooth falloff.

Both previous differentiable renderers output incorrect gradients in the case where there is brightness variation between pixels due to lighting. Figure 9 shows an example of a plane lit by a point light with inverse-squared-distance falloff. We ask the two renderers and ours to compute the derivatives of the pixel colors with respect to the plane moving right. Since the light source does not move, the illumination on the plane remains static and the derivatives should be zero except at the boundaries of the plane. Since both previous renderers use the differences between color buffer pixels to approximate derivatives, they incorrectly interpret the illumination variation as the change that would happen if the plane moved right, and output non-zero derivatives around the highlights. On the other hand, since we sample on the edges, our method correctly outputs zero derivatives in continuous regions.

OpenDR's point light does not have distance falloff and the Neural 3D Mesh renderer does not support point lights, so we modified their renderers. Our renderer does not support pure point lights, so we use a small planar area light to approximate one. We also tessellate the plane into a 256 × 256 grid, as both previous renderers use Gouraud shading.

5.3 Inverse rendering application
We apply our method to an inverse rendering task: fitting camera pose, material parameters, and light source intensity. Figure 1 shows the result. We take the scene photo and geometry data from the thesis work of Jones [2017], where the scene was used for validating daylight simulation. The scene contains strong indirect illumination and non-Lambertian materials. We assign most of the materials to white, except for plastic or metal-like objects, and choose an arbitrary camera pose as an initial guess. There are in total 177 parameters for this scene. We then use the gradient-based optimizer Adam, with the gradients generated by our method, to find the correct camera pose and material/lighting parameters. In order to avoid getting stuck in local minima, we perform the optimization in a multi-scale fashion, starting from 64 × 64 and linearly increasing to the final resolution of 512 × 512 through 8 stages. At each scale we use an L1 loss and perform 50 iterations. We exclude the light source from the loss function by setting the weights of pixels with radiance larger than 5 to 0.

5.4 3D adversarial example
Recently, it has been shown that gradient-based optimization can also be used for finding adversarial examples for neural networks (e.g. [Goodfellow et al. 2015; Szegedy et al. 2014]), for analysis or for mining training data. The idea is to take an image that was originally labelled correctly by a neural network classifier, and use backpropagation to find an image that minimizes the network's output with respect to the correct label. Our system can be used for mining adversarial examples of 3D scenes, since it provides the ability to backpropagate from the image to the scene parameters. A similar idea has been explored by Zeng et al. [2017], but we use a more general renderer.

[Fig. 10 panels: (a) input scene — 53% street sign, 14.5% traffic light, 6.7% handrail; (b) 5 iterations — 26.8% handrail, 20.2% street sign, 4.8% traffic light; (c) 25 iterations — 23.3% handrail, 3.39% street sign or traffic light; (d) plot of the combined class score of street sign and traffic light versus iteration]

Fig. 10. Our method can be used for finding 3D scenes that are adversarial examples for neural networks. We use the gradients generated by our method to optimize the geometry of the stop sign, the camera pose, and the light intensity and direction, to minimize the class scores of the street sign and traffic light classes. After 5 iterations the network classifies the stop sign as a handrail, and after 25 iterations both street sign and traffic light are out of the top-5 predictions. In (d) we plot the sum of the street sign and traffic light class scores as a function of iteration. As we optimize scene parameters such as the stop sign's shape, gradient descent finds geometries that minimize the class scores, so we see the score decrease.

We demonstrate this in Figure 10. We show a stop sign classified correctly as a street sign by the VGG16 classifier [Simonyan and Zisserman 2014]. We then optimize over 2256 parameters, including camera pose, light intensity, sun position, and the global translation, rotation, and vertex displacement of the stop sign.
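A sketch of this setup: we compose the (assumed) differentiable `render` call with a pretrained VGG16 from torchvision and descend on the sum of the two class probabilities. The scene parameter tensors are stand-ins, the classifier's ImageNet input normalization is omitted for brevity, and 919/920 are the street sign and traffic light indices in the usual ImageNet-1k ordering.

```python
import torch
import torchvision

vgg16 = torchvision.models.vgg16(pretrained=True).eval()
STREET_SIGN, TRAFFIC_LIGHT = 919, 920  # ImageNet class indices

# Stand-in scene parameters (leaf tensors with requires_grad=True).
params = [vertex_offsets, cam_pose, light_intensity]
optimizer = torch.optim.SGD(params, lr=1e-2)

for iteration in range(30):
    optimizer.zero_grad()
    image = render(*params, num_samples=256)      # 256 spp, as in Sec. 5.4
    batch = image.permute(2, 0, 1).unsqueeze(0)   # HWC float -> NCHW
    scores = torch.softmax(vgg16(batch), dim=1)   # (input normalization omitted)
    loss = scores[0, STREET_SIGN] + scores[0, TRAFFIC_LIGHT]
    loss.backward()  # backprop through VGG16 and then through the renderer
    optimizer.step()
```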
We perform stochastic gradient descent to minimize the network's output for the classes street sign and traffic light, using 256 samples per pixel. After 5 iterations the network starts to output handrail as the most probable class. After 23 iterations both the street sign class and the traffic light class are out of the top-5 predictions, and the sum of the two has less than 5% probability.

We do not claim this is a robust way to break or attack neural networks, since the CG scene we use has different statistics compared to real-world images. Nevertheless, this demonstrates that our gradients can be used for finding interesting scene configurations and can potentially be used for mining training data.

5.5 Limitations
Performance. Our current CPU implementation takes seconds to minutes to generate a small-resolution image (say 256 × 256) with a small number of samples (say 4). Note, though, that when using stochastic gradient descent it is usually not necessary to use high sample counts.

We have found that, depending on the type of scene, the bottleneck can be in the edge sampling phase or in the automatic differentiation of the light paths. Developing better sampling algorithms, such as incorporating bidirectional path tracing, could be an interesting avenue for future work. Developing better compiler techniques for optimizing automatic differentiation code and supporting GPU backends is also an important task.

Other light transport phenomena. We assume static scenes with no participating media. Differentiating motion blur requires sampling on 4D edges with an extra time dimension. Combining our method with Gkioulekas et al.'s work [2013] for handling participating media is left as future work.

Interpenetrating geometries and parallel edges. Dealing with the derivatives of interpenetrating triangles requires a mesh splitting process and its derivatives. Interpenetration can happen if the mesh is generated by some simulation process. Our method also does not handle the case where two edges are perfectly aligned as seen from the center of projection (the camera or the shadow ray origin). However, these are zero-measure sets in path space, and as long as the two edges are not perfectly aligned to the viewport, we will be able to converge to the correct solution.

Shader discontinuities. We assume our BRDF models and shaders are differentiable and do not handle discontinuities in the shaders. We handle textures correctly by differentiating through the smooth reconstruction, and many widely used reflection models such as GGX [Walter et al. 2007] (with Smith masking) or Disney's principled BRDF [Burley 2012] are differentiable. However, we do not handle the discontinuities at total internal reflection, nor some BRDFs relying on discrete operations, such as the discrete stochastic microfacet model of Jakob et al. [2014]. Compiler techniques for band-limiting BRDFs can be applied to mitigate the shader discontinuity issue [Yang and Barnes 2018].

6 CONCLUSION
We have introduced a differentiable Monte Carlo ray tracing algorithm that is capable of generating correct and unbiased gradients with respect to arbitrary input parameters such as scene geometry, camera, lights, and materials. For this, we have introduced a novel edge sampling algorithm to take the geometric discontinuities into consideration, and derived the appropriate measure conversions. For increased efficiency, we use a new discrete hierarchical sampling method to focus on relevant edges, as well as continuous edge importance sampling. We believe this method and the software that we will release will have an impact on inverse rendering and deep learning.

A APPENDIX

A.1 Derivation of the 3D edge Jacobian
We derive the Jacobian $J_m(t)$ in Equation 15. The goal is to compute the derivative of the point $m(t)$ with respect to the line parameter $t$. The relation between $m(t)$ and $t$ is described by a ray-plane intersection; that is, we intersect a plane through point $m$ with normal $n_m$ with a ray of origin $p$ and unnormalized direction $\omega(t)$:

$$\omega(t) = v_0 + (v_1 - v_0)\, t - p, \qquad \tau(t) = \frac{(m - p) \cdot n_m}{\omega(t) \cdot n_m}, \qquad m(t) = p + \tau(t)\, \omega(t). \qquad (17)$$

We can then derive the derivative $J_m(t) = \frac{\partial m(t)}{\partial t}$ as:

$$J_m(t) = \tau(t) \left( (v_1 - v_0) - \frac{(v_1 - v_0) \cdot n_m}{\omega(t) \cdot n_m}\, \omega(t) \right). \qquad (18)$$

ACKNOWLEDGMENTS
We thank the anonymous reviewers for their detailed comments (especially reviewer #1). The work started as an internship project at NVIDIA, where Marco Salvi and Aaron Lefohn provided immensely helpful advice. Luke Anderson and Prafull Sharma helped proofread the drafts. Nathaniel Jones modelled the conference scene in Figure 1 and gave helpful comments on the applications of inverse rendering for architectural design. The teapot model in Figure 5 was modelled by Martin Newell, and the bunny in the same figure was created by Brian Curless and Marc Levoy; both were downloaded from Morgan McGuire's website [2017]. The living room scene in Figure 6 (f) was modelled by Wig42 and ported to the Mitsuba scene format by Benedikt Bitterli [2016]. The stop sign in Figure 10 was modelled by Elijah Rai, and the street in the same figure was modelled by Pabong; both were downloaded from Free3D.com. The work is funded by the Toyota Research Institute.

REFERENCES
Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2016. Reflectance modeling by neural texture synthesis. ACM Trans. Graph. (Proc. SIGGRAPH) 35, 4 (2016), 65:1–65:13.
Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2013. Practical SVBRDF Capture in the Frequency Domain. ACM Trans. Graph. (Proc. SIGGRAPH) 32, 4 (2013), 110:1–110:12.
Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2015. Two-shot SVBRDF Capture for Stationary Materials. ACM Trans. Graph. (Proc. SIGGRAPH) 34, 4 (2015), 110:1–110:13.
James Arvo. 1994. The Irradiance Jacobian for Partially Occluded Polyhedral Sources. In SIGGRAPH. 343–350.
Jonathan T Barron and Jitendra Malik. 2015. Shape, Illumination, and Reflectance from Shading. Transactions on Pattern Analysis and Machine Intelligence 37, 8 (2015), 1670–1687.
Bruce Guenther Baumgart. 1974. Geometric modeling for computer vision. Technical Report. Stanford University Department of Computer Science.
Benedikt Bitterli. 2016. Rendering resources. https://benedikt-bitterli.me/resources/.
Volker Blanz and Thomas Vetter. 1999. A morphable model for the synthesis of 3D faces. In SIGGRAPH. 187–194.
Adrien Bousseau, Emmanuelle Chapoulie, Ravi Ramamoorthi, and Maneesh Agrawala. 2011. Optimizing environment maps for material depiction. Computer Graphics Forum (Proc. EGSR) 30, 4 (2011), 1171–1180.
Brent Burley. 2012. Physically-based shading at Disney. In SIGGRAPH Course Notes: Practical physically-based shading in film and game production, Vol. 2012. 1–7.
Min Chen and James Arvo. 2000. Theory and application of specular path perturbation. ACM Trans. Graph. 19, 4 (2000), 246–278.
Alejandro Conty Estevez and Christopher Kulla. 2018. Importance Sampling of Many Lights with Adaptive Tree Splitting. ACM Comput. Graph. Interact. Tech. (Proc. HPG) 1, 2 (2018), 25:1–25:17.
Ioannis Gkioulekas, Anat Levin, and Todd Zickler. 2016. An evaluation of computational imaging techniques for heterogeneous inverse scattering. In European Conference on Computer Vision. Springer, 685–701.
Ioannis Gkioulekas, Shuang Zhao, Kavita Bala, Todd Zickler, and Anat Levin. 2013. Inverse Volume Rendering with Material Dictionaries. ACM Trans. Graph. 32, 6 (2013), 162:1–162:13.
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations.
Andreas Griewank and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (second ed.). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
Eric Heitz, Jonathan Dupuy, Stephen Hill, and David Neubelt. 2016. Real-time polygonal-light shading with linearly transformed cosines. ACM Trans. Graph. (Proc. SIGGRAPH) 35, 4 (2016), 41:1–41:8.
Eric Heitz and Stephen Hill. 2017. Linear-Light Shading with Linearly Transformed Cosines. In GPU Zen.
Aaron Hertzmann. 1999. Introduction to 3D Non-Photorealistic Rendering: Silhouettes and Outlines. In SIGGRAPH Course Notes: Course on Non-Photorealistic Rendering, Stuart Green (Ed.). ACM Press/ACM SIGGRAPH, New York.
Aaron Hertzmann and Denis Zorin. 2000. Illustrating smooth surfaces. In SIGGRAPH. 517–526.
Lars Hörmander. 1983. The analysis of linear partial differential operators I: Distribution theory and Fourier analysis. Springer.
Homan Igehy. 1999. Tracing Ray Differentials. In SIGGRAPH. 179–186.
Wenzel Jakob, Miloš Hašan, Ling-Qi Yan, Jason Lawrence, Ravi Ramamoorthi, and Steve Marschner. 2014. Discrete stochastic microfacet models. ACM Trans. Graph. (Proc. SIGGRAPH) 33, 4 (2014), 115:1–115:10.
Wenzel Jakob and Steve Marschner. 2012. Manifold exploration: a Markov Chain Monte Carlo technique for rendering scenes with difficult specular transport. ACM Trans. Graph. (Proc. SIGGRAPH) 31, 4 (2012), 58:1–58:13.
Wojciech Jarosz, Volker Schönefeld, Leif Kobbelt, and Henrik Wann Jensen. 2012. Theory, analysis and applications of 2D global illumination. ACM Trans. Graph. 31, 5 (2012), 125:1–125:21.
Michael J. Jones and Tomaso Poggio. 1996. Model-Based Matching by Linear Combinations of Prototypes. Technical Report.
Nathaniel Louis Jones. 2017. Validated interactive daylighting analysis for architectural design. Ph.D. Dissertation. Massachusetts Institute of Technology.
James T. Kajiya. 1986. The Rendering Equation. Computer Graphics (Proc. SIGGRAPH) 20, 4 (1986), 143–150.
Anton S Kaplanyan, Johannes Hanika, and Carsten Dachsbacher. 2014. The natural-constraint representation of the path space for efficient light transport simulation. ACM Trans. Graph. (Proc. SIGGRAPH) 33, 4 (2014), 102:1–102:13.
Hiroharu Kato, Yoshitaka Ushiku, and Tatsuya Harada. 2018. Neural 3D Mesh Renderer. In Conference on Computer Vision and Pattern Recognition. 3907–3916.
Pramook Khungurn, Daniel Schroeder, Shuang Zhao, Kavita Bala, and Steve Marschner. 2015. Matching Real Fabrics with Micro-Appearance Models. ACM Trans. Graph. 35, 1 (2015), 1:1–1:26.
Diederick P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
Jaroslav Krivanek, Pascal Gautron, Sumanta Pattanaik, and Kadi Bouatouch. 2005. Radiance Caching for Efficient Global Illumination. (2005), 550–561.
Tzu-Mao Li, Jaakko Lehtinen, Ravi Ramamoorthi, Wenzel Jakob, and Frédo Durand. 2015. Anisotropic Gaussian Mutations for Metropolis Light Transport through Hessian-Hamiltonian Dynamics. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 34, 6 (2015), 209:1–209:13.
Guilin Liu, Duygu Ceylan, Ersin Yumer, Jimei Yang, and Jyh-Ming Lien. 2017. Material Editing Using a Physically Based Rendering Network. In International Conference on Computer Vision. 2280–2288.
Matthew M. Loper and Michael J. Black. 2014. OpenDR: An Approximate Differentiable Renderer. In European Conference on Computer Vision, Vol. 8695. 154–169.
Morgan McGuire. 2017. Computer Graphics Archive. https://casual-effects.com/data
Eric Paquette, Pierre Poulin, and George Drettakis. 1998. A Light Hierarchy for Fast Rendering of Scenes with Many Lights. Computer Graphics Forum (Proc. Eurographics) (1998), 63–74.
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. (2017).
Gustavo Patow and Xavier Pueyo. 2003. A survey of inverse rendering problems. Computer Graphics Forum 22, 4 (2003), 663–687.
Ravi Ramamoorthi, Dhruv Mahajan, and Peter Belhumeur. 2007. A First-order Analysis of Lighting, Shading, and Shadows. ACM Trans. Graph. 26, 1 (2007), 2:1–2:21.
Elad Richardson, Matan Sela, Roy Or-El, and Ron Kimmel. 2017. Learning detailed face reconstruction from a single image. In Conference on Computer Vision and Pattern Recognition. 5553–5562.
Pedro V. Sander, Xianfeng Gu, Steven J. Gortler, Hugues Hoppe, and John Snyder. 2000. Silhouette Clipping. In SIGGRAPH. 327–334.
Ram Shacked and Dani Lischinski. 2001. Automatic lighting design using a perceptual quality metric. Computer Graphics Forum 20, 3 (2001), 215–227.
Mikio Shinya, T. Takahashi, and Seiichiro Naito. 1987. Principles and Applications of Pencil Tracing. Comput. Graph. (Proc. SIGGRAPH) 21, 4 (1987), 45–54.
K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014).
Frank Suykens and Yves D. Willems. 2001. Path Differentials and Applications. In Eurographics Workshop on Rendering Techniques. 257–268.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
Eric Veach and Leonidas J. Guibas. 1995. Optimally Combining Sampling Techniques for Monte Carlo Rendering. In SIGGRAPH. 419–428.
Eric Veach and Leonidas J. Guibas. 1997. Metropolis Light Transport. In SIGGRAPH. 65–76.
Ingo Wald, Sven Woop, Carsten Benthin, Gregory S Johnson, and Manfred Ernst. 2014. Embree: a kernel framework for efficient CPU ray tracing. ACM Trans. Graph. (Proc. SIGGRAPH) 33, 4 (2014), 143.
Bruce Walter. 2005. Notes on the Ward BRDF. Program of Computer Graphics, Cornell University, Technical Report PCG-05-06 (2005).
Bruce Walter, Adam Arbree, Kavita Bala, and Donald P Greenberg. 2006. Multidimensional lightcuts. ACM Trans. Graph. (Proc. SIGGRAPH) 25, 3 (2006), 1081–1088.
Bruce Walter, Sebastian Fernandez, Adam Arbree, Kavita Bala, Michael Donikian, and Donald P Greenberg. 2005. Lightcuts: a scalable approach to illumination. ACM Trans. Graph. (Proc. SIGGRAPH) 24, 3 (2005), 1098–1107.
Bruce Walter, Stephen R Marschner, Hongsong Li, and Kenneth E Torrance. 2007. Microfacet models for refraction through rough surfaces. Rendering Techniques (Proc. EGSR) (2007), 195–206.
Greg Ward and Paul Heckbert. 1992. Irradiance Gradients. In Eurographics Rendering Workshop. 85–98.
Y. Yang and C. Barnes. 2018. Approximate Program Smoothing Using Mean-Variance Statistics, with Application to Procedural Shader Bandlimiting. Computer Graphics Forum (Proc. Eurographics) 37, 2 (2018), 443–454.
Yizhou Yu, Paul Debevec, Jitendra Malik, and Tim Hawkins. 1999. Inverse global illumination: Recovering reflectance models of real scenes from photographs. In SIGGRAPH. 215–224.
Xiaohui Zeng, Chenxi Liu, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, and Alan L Yuille. 2017. Adversarial Attacks Beyond the Image Space. arXiv preprint arXiv:1711.07183 (2017).