
Journal of Computational Physics 157, 234–255 (2000)
doi:10.1006/jcph.1999.6373, available online at http://www.idealibrary.com

Efficient Calculation of Jacobian and Adjoint Vector Products in the Wave Propagational Inverse Problem Using Automatic Differentiation¹
Thomas F. Coleman,∗ Fadil Santosa,† and Arun Verma‡
∗ Department of Computer Science and Center for Applied Mathematics, Cornell University, Ithaca, New York
14850; †Minnesota Center for Industrial Mathematics, School of Mathematics, University of Minnesota,
Minneapolis, Minnesota 55455; ‡Cornell Theory Center, Cornell University, Ithaca, New York 14850
E-mail: [email protected], [email protected], [email protected]

Received February 18, 1999; revised July 27, 1999

Wave propagational inverse problems arise in several applications including medical imaging and geophysical exploration. In these problems, one is interested in obtaining the parameters describing the medium from its response to excitations. The problems are characterized by their large size and by the hyperbolic equation which models the physical phenomena. The inverse problems are often posed as nonlinear data-fitting problems in which the unknown parameters are found by minimizing the misfit between the predicted data and the actual data. In order to solve the problem numerically using a gradient-type approach, one must calculate the action of the Jacobian and its adjoint on a given vector. In this paper, we explore the use of automatic differentiation (AD) to develop codes that perform these calculations. We show that by exploiting structure at two scales, we can arrive at a very efficient code whose main components are produced by AD. At the first scale we exploit the time-stepping nature of the hyperbolic solver by using the "Extended Jacobian" framework. At the second (finer) scale, we exploit the finite difference stencil in order to make explicit use of the sparsity in the dependence of the output variables on the input variables. The main ideas in this work are illustrated with a simpler, one-dimensional version of the problem. Numerical results are given for both one- and two-dimensional problems. We present computational templates that can be used in conjunction with optimization packages to solve the inverse problem. © 2000 Academic Press

¹ This research is sponsored in part by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Department of Energy under Grants DE-FG02-97ER25013 and DF-FG02-94ER25225, by the National Science Foundation under Grant DMS 9503114, and by the Air Force Office of Scientific Research under Grant F49620-95-I-0305. We acknowledge helpful discussions with William Symes, who has an ongoing effort on automatic differentiation similar to ours [15]. Some of the ideas in this work were inspired by his presentation at the Institute for Mathematics and its Applications, Minnesota, in July 1997.


Key Words: automatic differentiation; wave propagation; inverse problems; seismic inversion; hyperbolic PDEs; adjoints; finite difference computation; stencils; Jacobian and adjoint products.

1. INTRODUCTION

In the type of wave propagational inverse problems under consideration, the goal is to determine parameters, such as the sound speed distribution and density distribution, from measured data collected at a set of receivers. Figure 1 explains the situation. An incident disturbance is generated; as it travels through the unknown medium, it produces reflections and refractions. This information is collected at receivers placed at a set of locations. Several such experiments are carried out for a set of incident disturbances. The inverse problem is to determine properties of the unknown medium from the set of measured responses.
Problems of this type arise in several applications including geophysical exploration
and medical imaging. A common feature in these applications is that the problem is very
large. Typically, the number of unknowns and equations could be in the range of 10³ to 10⁶. Often, the most convenient way to solve this type of inverse problem is to pose it as
an optimization problem, either using nonlinear least-squares [16, 12] or another approach
specialized to take advantage of the properties afforded by the particular application [14, 1].
In any event, what one will need for computation is derivative information concerning the
relation between medium parameters and data. Because of the size of the problem, we
cannot compute and store the entire Jacobian of the function, but rather, we must find ways
of computing the action of the Jacobian and its transpose on a given vector, or the so-called
direct and adjoint products.

FIG. 1. In this figure, the problem is to identify the unknown medium. An incident wave is generated, and as
it travels into the medium being probed, reflected and refracted signals are generated. These are captured at the
receivers. Several such experiments are carried out for a set of incident disturbances. The inverse problem is to
find the properties of the unknown medium from the collected data.

The goal of this work is to show that efficient calculation of the direct and adjoint product
is possible. The approach we take is to use automatic differentiation (AD) while exploiting
structure to the extent possible. We emphasize that without taking advantage of structure, a
direct application of current AD technology to the codes simulating the wave phenomena
will lead to memory problems.
As we will show in the next section, the wave propagation can be modeled effectively
using time-stepping finite difference schemes. The time-stepping nature of the scheme can
be exploited using the general extended Jacobian framework [3, 4]. The spatial discretiza-
tion by finite differences reveals further structure. Each finite difference stencil encodes the
dependence of a computed intermediate variable on other variables. In particular, it shows
that there is an inherent sparsity in the Jacobian. A combination of these structure exploita-
tions allows us to overcome the problem posed by size, and its consequence on memory
requirements.
In our implementation, we apply AD to the finite difference stencils and use the resulting codes to assemble a procedure for computing the Jacobian and adjoint vector products. The resulting code is as efficient as code obtained by directly performing a summation-by-parts calculation on the simulation program. The advantage is that we have avoided that error-prone and tedious procedure [13]. Instead, we can view the code-writing process at a higher level, leaving the most difficult parts to AD.
The plan of this article is as follows. We begin with a short review of automatic differentiation in Section 2. The inverse problem for acoustic waves, the model for the physics, and its numerical discretization are described in Section 3. In Section 4, we review the extended Jacobian framework and show how it can be used for our problem; we also provide templates for calculating the adjoint-vector product. The stencil approach and its implementation are presented in Section 5, where we also show how the stencil can be described at a higher level as projections, and give templates for calculating Jacobian and adjoint vector products that use stencils. Section 6 summarizes our experience with this method of computation. A final section contains concluding remarks.

2. AUTOMATIC DIFFERENTIATION BACKGROUND

Automatic differentiation is based on the fact that all computer programs, no matter
how complicated, use a finite set of elementary functions as defined by the programming
language. The function computed by the program is simply a composition of these ele-
mentary functions. The partial derivatives of the elementary functions are known, and the
overall derivatives are computed using the chain rule; this process is known as automatic
differentiation [9].
Abstractly, a program to evaluate the output u (an m-vector) as a function of x (generally an n-vector) has the form

x ≡ (x_1, x_2, . . . , x_n)
z ≡ (z_1, z_2, . . . , z_p),  p ≫ m + n
u ≡ (u_1, u_2, . . . , u_m),

where the intermediate variables z are related through a series of elementary functions, which may be unary,

z_k = f_elem^k(z_i),  i < k,

consisting of operations such as (−, pow(·), sin(·), . . .), or binary,

z_k = f_elem^k(z_i, z_j),  i < k, j < k,

such as (+, /, . . .).


There are a number of cases when the elementary function is not differentiable (e.g., f_elem^k(z_i) = abs(z_i) or f_elem^k(z_i, z_j) = max(z_i, z_j)). Sophisticated heuristic techniques have been developed to treat these cases. For more details consult [9].
Automatic differentiation has two basic modes of operation, the forward mode and the reverse mode. In the forward mode the derivatives are propagated throughout the computation using the chain rule; e.g., for the elementary step z_k = f_elem^k(z_i, z_j) the intermediate derivative dz_k/dx can be propagated in the forward mode as

dz_k/dx = (∂f_elem^k/∂z_i) (dz_i/dx) + (∂f_elem^k/∂z_j) (dz_j/dx).

This chain-rule-based computation is done for all the intermediate variables z and for the output variables u, finally yielding the derivative du/dx.
The reverse mode computes the derivatives du/dz_k for all intermediate variables backwards (i.e., in the reverse order) through the computation. For example, for the elementary step z_k = f_elem^k(z_i, z_j), the derivatives are propagated as

du/dz_i = (∂f_elem^k/∂z_i) (du/dz_k)  and  du/dz_j = (∂f_elem^k/∂z_j) (du/dz_k).

At the end of the reverse mode computation, the derivative du/dx is obtained.
The forward and reverse modes can be used to compute the direct and the adjoint products, Jv and J^T v, given a vector v, where J is the Jacobian of a nonlinear mapping [9]. Both computations require time proportional to one function evaluation, with the adjoint product being approximately twice as costly as the direct product. The Hessian-vector product Hv can also be computed via AD in time proportional to one function evaluation.

3. INVERSE PROBLEMS AND NUMERICAL MODELING

3.1. One-Dimensional Problem


Consider a bar or string of length L whose sound speed is location dependent. Let u(x, t)
represent a measure of the disturbance at time t and location x. Then u satisfies the wave
equation

u_tt = c²(x) u_xx for 0 < x < L,   (1a)

where c(x) is the sound speed of the medium. We assume that the medium is quiescent at t = 0,

u(x, 0) = 0 and u_t(x, 0) = 0, 0 ≤ x ≤ L.   (1b)


A disturbance is introduced at the boundary x = 0 as a Neumann boundary condition

u_x(0, t) = f(t), for t > 0.   (1c)

We will assume that f(t) is compactly supported away from t = 0. On the right end, we assume a radiation boundary condition

[u_t + c(x)u_x]|_{x=L} = 0.   (1d)
We are given u(0, t) = g(t) for 0 < t < T. The problem is to find the unknown c(x). A convenient way to view the problem is to define the forward map as one that associates a given c(x) with the boundary data u(0, t). Let

A[c](t) := u(0, t), 0 < t ≤ T,

where it is understood that the evaluation of A[c](·) is through the initial-boundary value problem (IBVP) in (1). A least-squares formulation of this problem is to solve the minimization

min_{c(x)} ∫_0^T |A[c](t) − g(t)|² dt.   (2)

A typical solution of the above nonlinear least-squares problem requires knowledge of the gradient of the above functional. This translates to computing the adjoints of the function A[c](·) [12, 13]. The first step in the numerical computation of the adjoints is to discretize the problem.

A common discretization for this problem is to use finite difference methods. Let

u_i^k ≈ u(x_i, t_k), where x_i = iΔx, i = 0 : n, t_k = kΔt, k = 0 : m,

and Δx = L/n and Δt = λΔx for some λ > 0. A second-order finite difference scheme is chosen. The partial differential equation in (1a) is replaced by

u_i^{k+1} = 2(1 − λ²c_i²)u_i^k − u_i^{k−1} + λ²c_i²(u_{i+1}^k + u_{i−1}^k), for i = 1 : n − 1, k ≥ 0.   (3a)

We use the initial conditions

u_i^{−1} = u_i^0 = 0.   (3b)

We discretize the boundary conditions as

u_0^{k+1} = 2(1 − λ²c_0²)u_0^k − u_0^{k−1} + 2λ²c_0² u_1^k − 2 f_k λ²c_0² Δx,   (3c)

for the inhomogeneous Neumann condition on the left end, and

u_n^{k+1} = u_n^k − c_n (Δt/Δx)(u_n^k − u_{n−1}^k)   (3d)

for the radiation boundary condition on the right. The discrete version of the forward map is obtained by running the finite difference scheme forward in time and recording the left-end value of u_i^k; that is,

A[c]_k := u_0^k.
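To make the discretization concrete, here is a MATLAB sketch of the discrete forward map (a simplified stand-in for the paper's Fortran/mex implementation; the routine name and argument layout are ours). It also records the fields u^k, which the adjoint computations discussed later will need:

    function [h, U] = forward_map_1d(c, f, lambda, dx, m)
    % Sketch of the discrete forward map (3): explicit second-order scheme
    % for u_tt = c^2 u_xx with the Neumann source (3c) on the left and the
    % radiation condition (3d) on the right.  c is a column (n+1)-vector of
    % nodal sound speeds; f(k) holds the source sample f(t_{k-1}), k = 1:m.
    % Stability requires lambda*max(c) <= 1.
    n  = length(c) - 1;
    uo = zeros(n+1,1);  u = zeros(n+1,1);     % u^{k-1}, u^k
    h  = zeros(m,1);
    U  = zeros(n+1, m+2);                     % U(:,k+2) = u^k, k = -1:m
    for k = 1:m
        un = zeros(n+1,1);
        j  = (2:n)';  a = lambda^2*c(j).^2;
        un(j) = 2*(1-a).*u(j) - uo(j) + a.*(u(j+1) + u(j-1));      % (3a)
        a0 = lambda^2*c(1)^2;                                      % (3c)
        un(1) = 2*(1-a0)*u(1) - uo(1) + 2*a0*u(2) - 2*f(k)*a0*dx;
        un(n+1) = u(n+1) - lambda*c(n+1)*(u(n+1) - u(n));          % (3d)
        uo = u;  u = un;
        U(:,k+2) = u;
        h(k) = u(1);                          % record A[c]_k = u_0^k
    end
    end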

A convenient way to describe the function evaluation is through vector notation. Let us write the vectors u^k = [u_0^k, u_1^k, . . . , u_n^k]^T and c = [c_0, c_1, . . . , c_n]^T. Then the finite difference scheme amounts to

u^{k+1} = F(c, u^k, u^{k−1}), with u^{−1} = u^0 = 0.   (4)

Letting e_1 = [1, 0, . . . , 0]^T, the forward map from c to A[c] is given by

A[c]_k = e_1^T u^k, for k = 1 : m.   (5)

The inverse problem is to solve for c in

min_c ‖A[c] − g‖²,   (6)

where g is a data vector corresponding to a measurement.

3.2. Two-Dimensional Problem


The two-dimensional problem is motivated by a problem in acoustic imaging of human tissues. The geometry of the problem has been described in the Introduction, and elsewhere [11]. Here we give a mathematical model of the physics.
Because any computational domain is necessarily finite, we will consider a box Ω := [−a, a] × [−a, a]. Letting u(x, y, t) represent the excess pressure, a model for acoustics is given by the partial differential equation

u_tt = c(x, y)² Δu + f(x, y, t) in Ω, t > 0.   (7a)

Here c(x, y) represents the unknown sound speed distribution, while f(x, y, t) is a known acoustic source. Initially, the system is at rest; hence

u(x, y, 0) = u_t(x, y, 0) = 0.   (7b)

We need to simulate an unbounded medium with a bounded domain. In the unbounded medium, we would have a boundary condition for x² + y² large that amounts to saying that waves which are sufficiently far away from the origin and traveling outward will be radiated to infinity. To simulate the unbounded medium, we assume that c is constant near the boundary of Ω and apply the Engquist–Majda boundary conditions [6] along the flat parts of ∂Ω (and a modification of Engquist–Majda at the corners of ∂Ω). For points away from the corners, the boundary conditions are given by

c u_xt − u_tt + (c²/2) u_yy = 0 for {x = ±a; |y| < a},   (7c)
c u_yt − u_tt + (c²/2) u_xx = 0 for {y = ±a; |x| < a}.   (7d)
Let R represent the collection of coordinate points where receivers have been placed to
record u. Thus,

R = {(x_r, y_r) = (ρ cos θ_r, ρ sin θ_r), r = 1 : p}



for some ρ > 0. The forward map is given by

A[c; f]_r := u(x_r, y_r, t).

The source term f(x, y, t) is assumed to be null for t = 0. We will view the forward map A[·] as dependent on c and parameterized by f.

The nonlinear least-squares formulation is given by

min_c Σ_l Σ_{r=1}^p ∫_0^T |A[c; f_l]_r(t) − g_{rl}(t)|² dt,

where g_{rl}(t) is the measured response at location (x_r, y_r) for the source f_l(x, y, t).

Discretization of (7) is quite straightforward. The only tricky part comes in discretizing the Engquist–Majda boundary conditions. Letting (x_i, y_j) = (iΔ, jΔ), −n ≤ i ≤ n and −n ≤ j ≤ n, we discretize the domain Ω by a regular mesh of size Δ = a/n. Time is discretized as in the 1-D case: t_k = kΔt for k = 0 : m.

Let the (2n + 1)²-vector u^k represent the values of u(x, y, t) at the node points at time t_k. The finite difference scheme can be written in shorthand as

u^{k+1} = F(c, u^k, u^{k−1}), for k = 0 : m − 1,

with u^{−1} = u^0 = 0. The discrete forward map evaluates u at each receiver; thus

A[c; f]_k = T u^k,

where T is a matrix of size p-by-(2n + 1)² whose function is to "grab" the values of u at time step k at the receivers. In place of the integration in the nonlinear least-squares functional, we have

min_c Σ_l Σ_{r=1}^p Σ_{k=1}^m |A[c; f_l]_r^k − g_{rl}^k|².   (8)

Here g_{rl}^k is the measured response at receiver r at time step k when the excitation is f_l.

4. THE EXTENDED JACOBIAN FRAMEWORK

We restrict our discussion to the 1-D problem for clarity of presentation. The prescription for computing Jacobian-vector and adjoint-vector products for the more complex 2-D problem follows the same lines as for the 1-D problem. An algorithm for the forward map for the 1-D case is

u^{−1} = u^0 = 0
for k = 0 : m − 1
    u^{k+1} = F(c, u^k, u^{k−1})   (9)
    h_{k+1} = e_1^T u^{k+1}
end

We use the notation h_k = A[c]_k. Thus the function in question is the mapping from c to h = (h_1, h_2, . . . , h_m)^T.

We can give an alternate description of this mapping by enumerating through the loop:

u^1 = F(c, u^0, u^{−1})
u^2 = F(c, u^1, u^0)
  ⋮
u^m = F(c, u^{m−1}, u^{m−2})
h = e_1 e_1^T u^1 + e_2 e_1^T u^2 + · · · + e_m e_1^T u^m

We call this the extended function. The extended function allows for an easy way to compute the Jacobian and its transpose. Formally, the directional derivative of h in the direction dc, i.e., the Jacobian-vector product, is given by the calculation

du^{−1} = du^0 = 0
for k = 0 : m − 1
    du^{k+1} = ∂1F(c, u^k, u^{k−1}) dc + ∂2F(c, u^k, u^{k−1}) du^k + ∂3F(c, u^k, u^{k−1}) du^{k−1}   (10)
    dh_{k+1} = e_1^T du^{k+1}
end

The matrices ∂1F, ∂2F, and ∂3F are the Jacobians of the function F with respect to the first, second, and third variables; they are (n + 1)-by-(n + 1) matrices. In a computer program, we would simply define F(c, ·, ·) and use AD either to compute these matrices or to produce subprograms that calculate the action of these matrices on given vectors. The desired directional derivative (Jacobian times vector dc) is dh = (dh_1, dh_2, . . . , dh_m)^T.
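As an illustration, the following MATLAB sketch instantiates template (10) for the 1-D scheme (3). For concreteness, the actions of ∂1F, ∂2F, and ∂3F on vectors are written out by hand; in the approach described here, those lines are exactly what one would obtain from AD applied to F:

    function dh = jacvec_1d(c, f, lambda, dx, m, dc)
    % Sketch of template (10): dh = J*dc for the 1-D scheme.  All vectors
    % are columns; f(k) holds the source sample f(t_{k-1}).
    n  = length(c) - 1;
    uo = zeros(n+1,1);  u  = zeros(n+1,1);   % u^{k-1}, u^k
    duo = zeros(n+1,1); du = zeros(n+1,1);   % du^{k-1}, du^k
    dh = zeros(m,1);
    for k = 1:m
        un = zeros(n+1,1);  dun = zeros(n+1,1);
        j = (2:n)';  a = lambda^2*c(j).^2;
        un(j)  = 2*(1-a).*u(j) - uo(j) + a.*(u(j+1) + u(j-1));        % (3a)
        dun(j) = 2*(1-a).*du(j) - duo(j) + a.*(du(j+1) + du(j-1)) ... % d2F, d3F
               + 2*lambda^2*c(j).*(u(j+1) + u(j-1) - 2*u(j)).*dc(j);  % d1F
        a0 = lambda^2*c(1)^2;                                         % (3c)
        un(1)  = 2*(1-a0)*u(1) - uo(1) + 2*a0*u(2) - 2*f(k)*a0*dx;
        dun(1) = 2*(1-a0)*du(1) - duo(1) + 2*a0*du(2) ...
               + 2*lambda^2*c(1)*(2*u(2) - 2*u(1) - 2*f(k)*dx)*dc(1);
        un(n+1)  = u(n+1) - lambda*c(n+1)*(u(n+1) - u(n));            % (3d)
        dun(n+1) = du(n+1) - lambda*c(n+1)*(du(n+1) - du(n)) ...
                 - lambda*(u(n+1) - u(n))*dc(n+1);
        uo = u;  u = un;  duo = du;  du = dun;
        dh(k) = du(1);                        % dh_k = e_1' * du^k
    end
    end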

4.1. Adjoint Computation via Linear Algebra


The above calculation can be recast as a set of matrix equations through the use of the extended Jacobian framework [3, 4]. Let

dU = [du^1; du^2; . . . ; du^m].

Define the m(n + 1) × m(n + 1) matrix

M =
[ −I
  ∂2F(c, u^1, u^0)    −I
  ∂3F(c, u^2, u^1)    ∂2F(c, u^2, u^1)    −I
        ⋱                    ⋱                ⋱
        ∂3F(c, u^{m−1}, u^{m−2})    ∂2F(c, u^{m−1}, u^{m−2})    −I ]

(all blocks not shown are zero)

and the m(n + 1) × (n + 1) matrix

B = [ ∂1F(c, u^0, u^{−1}); ∂1F(c, u^1, u^0); . . . ; ∂1F(c, u^{m−1}, u^{m−2}) ]

and the m × m(n + 1) matrix

T = [ e_1 e_1^T   e_2 e_1^T   . . .   e_m e_1^T ].

Then

−M dU = B dc, and dh = T dU.

From the above, we can solve for dU and write

dh = −T M^{−1} B dc,   (11)

which encapsulates the Jacobian-vector product calculation in (10).


To obtain a formula for the adjoint-vector product calculation, we start by formally taking the adjoint of (11). Let p be the result of multiplying a vector q by the adjoint of the Jacobian. Then from (11),

p = −B^T M^{−T} T^T q.   (12)

The matrices B, M, and T should not be computed explicitly; rather, this formalism is used to generate an efficient algorithm for finding p given q.

Let q be an m-vector. Then Q = T^T q is an m(n + 1)-vector. From (12), if we let Y = −M^{−T} T^T q, then

−M^T Y = Q and p = B^T Y.   (13)

By exploiting the structures of M and B, we can come up with an efficient algorithm to find p for a given q. Because of the lower-triangular structure of M, we never need to invert any matrices. The algorithm starts by chopping up Q into m separate pieces,

Q = [q^1; q^2; . . . ; q^m],

and similarly for Y. Then, according to (13), we can calculate p by

y^m = q^m;  p = ∂1F(c, u^{m−1}, u^{m−2})^T y^m
y^{m−1} = q^{m−1} + ∂2F(c, u^{m−1}, u^{m−2})^T y^m
p = p + ∂1F(c, u^{m−2}, u^{m−3})^T y^{m−1}
for k = m − 2 : −1 : 1   (14)
    y^k = q^k + ∂2F(c, u^k, u^{k−1})^T y^{k+1} + ∂3F(c, u^{k+1}, u^k)^T y^{k+2}
    p = p + ∂1F(c, u^{k−1}, u^{k−2})^T y^k
end

Note that we need the adjoints/transposes of ∂1F(c, ·, ·), ∂2F(c, ·, ·), and ∂3F(c, ·, ·). These adjoints can be computed explicitly if we have the matrices ∂1F, ∂2F, and ∂3F, or we can resort to AD to produce subprograms that compute their action on given vectors.

Note that in the algorithm (14), we need to have available the values of the fields u^k for all indices k. Depending on the size of the problem, it may be more efficient to store only the values of u^k for some indices k ∈ K and use (9) to regenerate the field for the other indices k ∉ K. An efficient method for doing this is discussed in [8].
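A MATLAB sketch of the reverse sweep follows; it implements the recurrence in the equivalent running-sum form (15) derived in the next subsection. It assumes the fields have been stored columnwise in an array U (U(:,k+2) = u^k, as produced by the forward-map sketch in Section 3), and it assembles the sparse partial Jacobians with a routine stencil_jacobians that is sketched in Section 5.1; in the paper, the actions of the transposed Jacobians would instead come from AD-generated subprograms:

    function p = adjvec_1d(c, U, f, lambda, dx, m, q)
    % Sketch of the adjoint-vector product p = J'*q for the 1-D scheme.
    % U(:,k+2) = u^k, k = -1:m, holds the stored fields; q is an m-vector.
    n1 = length(c);
    p = zeros(n1,1);
    v = zeros(n1, m+2);                    % v(:,k+2) holds the adjoint field v^k
    for k = m-1:-1:0
        u = U(:,k+2);                      % u^k (u^{k-1} is not needed here:
                                           % the partials of (3) do not use it)
        [F1, F2, F3] = stencil_jacobians(c, u, f(k+1), lambda, dx);
        vk1 = v(:,k+3);                    % v^{k+1}
        vk1(1) = vk1(1) + q(k+1);          % v^{k+1} <- v^{k+1} + e_1 q_{k+1}
        v(:,k+2) = v(:,k+2) + F2.'*vk1;    % transposed partial Jacobians
        v(:,k+1) = v(:,k+1) + F3.'*vk1;
        p = p + F1.'*vk1;
    end
    end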

4.2. Adjoint Computation via Adjoint Variables


We give an alternate derivation of the algorithm in (14) based on adjoint variables. Consider a simple calculation involving the following three steps. The input variable is c and the output variable is u_3; u_0 is a parameter. The steps are

u_1 = f(c, u_0, 0)
u_2 = f(c, u_1, u_0)
u_3 = f(c, u_2, u_1).

We can view this as an extended function. The Jacobian calculation is

du_1 = ∂1f(c, u_0, 0) dc
du_2 = ∂1f(c, u_1, u_0) dc + ∂2f(c, u_1, u_0) du_1
du_3 = ∂1f(c, u_2, u_1) dc + ∂2f(c, u_2, u_1) du_2 + ∂3f(c, u_2, u_1) du_1.

Therefore, the Jacobian (in this case, a derivative) can be identified as the J in du_3 = J dc. This is a forward mode computation.
Let the adjoint variables be p and v_3, so that we formally have p = J^T v_3. If we view du_1 and du_2 as intermediate variables, then we can associate to them adjoint variables v_1 and v_2. From the third equation in the Jacobian calculation, we can formally write

[p; v_2; v_1] = [∂1f(c, u_2, u_1); ∂2f(c, u_2, u_1); ∂3f(c, u_2, u_1)] v_3

and from the second,

[p; v_1] = [∂1f(c, u_1, u_0); ∂2f(c, u_1, u_0)] v_2,

and from the first,

p = ∂1f(c, u_0, 0) v_1.

The contributions to each of the adjoint variables are summed over each operation; hence

v_2 = ∂2f(c, u_2, u_1) v_3
v_1 = ∂3f(c, u_2, u_1) v_3 + ∂2f(c, u_1, u_0) v_2
p = ∂1f(c, u_2, u_1) v_3 + ∂1f(c, u_1, u_0) v_2 + ∂1f(c, u_0, 0) v_1.

This is the reverse mode computation [9].
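As a quick numeric check of these formulas, the following MATLAB script (with the concrete choice f(c, a, b) = c·a + b², our example rather than anything from the paper) runs the forward and reverse computations and confirms that they produce the same scalar derivative du_3/dc:

    % f(c,a,b) = c*a + b^2, with partials d1f = a, d2f = c, d3f = 2b
    c = 1.3;  u0 = 0.7;
    u1 = c*u0;                 % f(c, u0, 0)
    u2 = c*u1 + u0^2;          % f(c, u1, u0)
    u3 = c*u2 + u1^2;          % f(c, u2, u1)

    % forward mode: J = du3/dc
    du1 = u0;
    du2 = u1 + c*du1;
    du3 = u2 + c*du2 + 2*u1*du1;

    % reverse mode, following the summed contributions above
    v3 = 1;
    v2 = c*v3;                          % d2f(c,u2,u1) * v3
    v1 = 2*u1*v3 + c*v2;                % d3f(c,u2,u1)*v3 + d2f(c,u1,u0)*v2
    p  = u2*v3 + u1*v2 + u0*v1;         % sum of the d1f contributions
    % p and du3 agree: both equal the scalar Jacobian du3/dc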


We can generalize this concept to the 1-D wave propagation problem. In (10), we identify the adjoint variable p with dc for the input, and q with dh for the output. To the intermediate variables du^k, we associate adjoint variables v^k. Performing the reverse mode calculation, we must start at index k = m − 1. Let q_k, for k = 1 : m, be the elements of q. The adjoint-times-vector algorithm is

set all v^k = 0 and p = 0
for k = m − 1 : −1 : 0
    v^{k+1} = v^{k+1} + e_1 q_{k+1}
    v^k = v^k + ∂2F(c, u^k, u^{k−1})^T v^{k+1}   (15)
    v^{k−1} = v^{k−1} + ∂3F(c, u^k, u^{k−1})^T v^{k+1}
    p = p + ∂1F(c, u^k, u^{k−1})^T v^{k+1}
end

At the end of the calculation, we can identify p = J^T q.

4.3. Discussion
The foregoing methodology, while limited to the 1-D problem, can be adapted to solve
the more complicated 2-D problem. What we wish to emphasize here is the conciseness of
the extended Jacobian framework and how to exploit the underlying problem structure. The
algorithms in (10), (14), and (15) can be viewed as code templates for Jacobian and adjoint
vector product calculations. AD is deployed in computing the Jacobian and adjoint of the
subproblem described by the time stepping process (4).
We recall that the adjoint (reverse) products ∂1F(c, ·, ·)^T y, etc., can be computed using the adjoint (reverse) mode of an AD tool. For large problems like this, computing the adjoint product of the time-step routine (4) can be very expensive, since c, u^k, and u^{k−1} can be large. An AD tool would by default assume that every element of F(c, u^k, u^{k−1}) depends on every element of c, u^k, and u^{k−1}. This assumed dependence generates a "table" which is used in computing intermediate values in the reverse product mode. For example, ADOL-C [10] implements this lookup by creating a tape, which it will write to disk if the problem size is large. When it does this, it becomes unacceptably inefficient.
This concern brings us to the main idea of this paper, i.e., that of AD applied to the
finite difference stencil. Our approach is to use AD on the smallest component of the
calculation—a kind of “microscopic” structure exploitation. We discuss how this is done
in the next section.
In principle, what we are exploiting is a specific sparsity structure that is inherent in the
finite difference scheme. A general approach for exploiting sparsity in AD is described in
[2].

5. EXPLOITING THE STENCIL STRUCTURE

The finite difference method that we used in the 1-D case can be written as indicated in (4), which we rewrite here:

u^{k+1} = F(c, u^k, u^{k−1}), with u^{−1} = u^0 = 0.

This shorthand notation does not reveal the stencil structure given by the explicit formulas in (3). For the jth component of u^{k+1}, j not equal to 0 or n, from (3a) we can write

u_j^{k+1} = f(c_j, u_{j−1}^k, u_j^k, u_{j+1}^k, u_j^{k−1}).   (16)

The above expression spells out clearly that the dependence of u^{k+1} on c, u^k, and u^{k−1} is very sparse. This is best visualized by studying Fig. 2. Thus, we need only deal with f, which is a function of only 5 variables. From (3c)–(3d), we have two more such functions, but they depend only on 4 and 3 variables, respectively, and are given by

u_0^{k+1} = f_L(c_0, u_0^k, u_1^k, u_0^{k−1}),
u_n^{k+1} = f_R(c_n, u_{n−1}^k, u_n^k).

FIG. 2. The stencil for the 1-D problem for j ≠ 0, n. Boundary nodes are slightly different and require separate treatment.

The function F(·, ·, ·), representing a time step, is now replaced with the pseudo-code

function u^{k+1} = F(c, u^k, u^{k−1})
    u_0^{k+1} = f_L(c_0, u_0^k, u_1^k, u_0^{k−1})
    u_n^{k+1} = f_R(c_n, u_{n−1}^k, u_n^k)   (17)
    for j = 1 : n − 1
        u_j^{k+1} = f(c_j, u_{j−1}^k, u_j^k, u_{j+1}^k, u_j^{k−1})
    end

It is to these "small" functions of a few variables that we want to apply automatic differentiation. The benefit is that we obtain efficient codes which explicitly exploit the structure of the problem. The cost is that the derivative and adjoint codes are slightly more complicated to assemble. We discuss this next.
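For the 1-D scheme the interior stencil is simple enough that we can display, in MATLAB, both f and the 5-component gradient that an AD tool would generate from it (here hand-derived from (3a); the routine name is ours):

    function [u_new, g] = stencil_f(X, lambda)
    % The interior stencil (16) and its gradient with respect to the five
    % arguments X = [c_j; u_{j-1}^k; u_j^k; u_{j+1}^k; u_j^{k-1}].
    cj = X(1); ul = X(2); uc = X(3); ur = X(4); uo = X(5);
    a  = lambda^2*cj^2;
    u_new = 2*(1 - a)*uc - uo + a*(ul + ur);    % the stencil formula (3a)
    g = [ 2*lambda^2*cj*(ul + ur - 2*uc);       % d/dc_j
          a;                                    % d/du_{j-1}^k
          2*(1 - a);                            % d/du_j^k
          a;                                    % d/du_{j+1}^k
         -1 ];                                  % d/du_j^{k-1}
    end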

5.1. Sparse Jacobian


Due to the sparsity afforded by the stencil structure, it is feasible to calculate the full Jacobian (rather than just the Jacobian-vector product). To see this we introduce projection matrices. Let e_j be the jth unit vector (we will let j run from 0 to n for convenience). Then (16) can be rewritten in terms of the vectors c, u^k, and u^{k−1} as

u_j^{k+1} = f(e_j^T c, e_{j−1}^T u^k, e_j^T u^k, e_{j+1}^T u^k, e_j^T u^{k−1}).   (18)

In computing the Jacobian, we need the derivatives of F(·, ·, ·) with respect to the 3 variables. We next derive procedures to do this using the stencils.

The rows of F1(c, u^k, u^{k−1}) are the gradients of the components of u^{k+1} with respect to c:

F1 = [ (∇_c u_0^{k+1})^T ; (∇_c u_1^{k+1})^T ; . . . ; (∇_c u_n^{k+1})^T ].

The gradients are easily obtained by differentiating (18) with respect to c. We obtain, for j ≠ 0, n,

∇_c u_j^{k+1} = ∂1f(e_j^T c, e_{j−1}^T u^k, e_j^T u^k, e_{j+1}^T u^k, e_j^T u^{k−1}) e_j,

which is an (n + 1)-vector with a single nonzero entry at position j. Thus, it can be seen that F1(c, u^k, u^{k−1}) is a diagonal matrix. This property is not apparent to state-of-the-art automatic differentiation programs.
The Jacobian F3(c, u^k, u^{k−1}) will also be diagonal for the same reason and will be computed by applying AD to (16). The Jacobian F2(c, u^k, u^{k−1}) will be slightly more complicated. The components of the Jacobian are similar to those of F1(·, ·, ·) except that the gradient will be with respect to u^k. Directly differentiating (16) with respect to u^k yields

∇_{u^k} u_j^{k+1} = ∂2f(e_j^T c, e_{j−1}^T u^k, e_j^T u^k, e_{j+1}^T u^k, e_j^T u^{k−1}) e_{j−1}
                  + ∂3f(e_j^T c, e_{j−1}^T u^k, e_j^T u^k, e_{j+1}^T u^k, e_j^T u^{k−1}) e_j
                  + ∂4f(e_j^T c, e_{j−1}^T u^k, e_j^T u^k, e_{j+1}^T u^k, e_j^T u^{k−1}) e_{j+1}.

Thus, the matrix F2(c, u^k, u^{k−1}) is a tridiagonal matrix.


We can summarize the steps in a MATLAB pseudo-code²:

F1 = [∂1 f_L e_0^T];
F2 = [∂2 f_L e_0^T + ∂3 f_L e_1^T];
F3 = [∂4 f_L e_0^T];
for j = 1 : n − 1
    F1 = [F1; ∂1 f e_j^T];
    F2 = [F2; ∂2 f e_{j−1}^T + ∂3 f e_j^T + ∂4 f e_{j+1}^T];
    F3 = [F3; ∂5 f e_j^T];
end
F1 = [F1; ∂1 f_R e_n^T];
F2 = [F2; ∂2 f_R e_{n−1}^T + ∂3 f_R e_n^T];
F3 = [F3; zeros(1, n + 1)];

Once the matrices F1, F2, and F3 are obtained, we can use the code in (10) to compute the forward derivatives and the code in (14) to compute the adjoint. The codes for the partial derivatives of f, f_L, and f_R are easily obtained using AD. These codes are expected to be very efficient because of the simplicity of the stencil formula and because of the small number of independent variables involved. We have gained efficiency in the AD computation by applying AD at the stencil level. The cost to the user is some detailed "hand" coding.

² One would use sparse utilities to implement this in MATLAB.
We can employ a similar approach for the more complicated 2-D example. We note that a typical stencil for interior nodes is given by

u_{ij}^{k+1} = f(c_{ij}, u_{i−1,j}^k, u_{i+1,j}^k, u_{i,j−1}^k, u_{i,j+1}^k, u_{ij}^k, u_{ij}^{k−1}).

The stencil is displayed in Fig. 3. Boundary and corner nodes, because of the absorbing boundary conditions described in (7c), result in slightly more complex stencils. The key observation is that the stencil embodies the sparsity structure of the Jacobian, and this is a feature that should be exploited.

FIG. 3. The stencil for the 2-D problem for an interior node. Boundary and corner nodes are slightly different and require separate treatment.

5.2. Stencil in Forward and Reverse Mode


We can also exploit the stencil structure without explicitly computing the Jacobian. This results in procedures to compute the Jacobian times a vector and the adjoint times a vector. Suppose we are given dc and we wish to calculate the vector dh as outlined in (10). The approach we take will make use of stencil formulas such as (16). Assume that we have used AD to generate an algorithm to compute the gradient of f(·, ·, ·, ·, ·) times a 5-vector; that is, given X and dX, we have a procedure to find

f(X) and ∇f(X) · dX.

Here X stands for the 5-vector with components X = [c_j, u_{j−1}^k, u_j^k, u_{j+1}^k, u_j^{k−1}]^T. Then it is easy to see from (16) that if dX = [dc_j, du_{j−1}^k, du_j^k, du_{j+1}^k, du_j^{k−1}]^T, then

du_j^{k+1} = ∇f(X) · dX.

We would have similar formulas for j = 0 and j = n, with the difference that the vector of independent variables would be 4- and 3-dimensional, respectively. We can therefore assemble the du_j^{k+1} within an outer loop which corresponds to the time steps. The pseudo-code would take the form

du^{−1} = du^0 = 0
for k = 0 : m − 1
    X = [c_0, u_0^k, u_1^k, u_0^{k−1}]^T;  dX = [dc_0, du_0^k, du_1^k, du_0^{k−1}]^T
    du_0^{k+1} = ∇f_L(X) · dX
    X = [c_n, u_{n−1}^k, u_n^k]^T;  dX = [dc_n, du_{n−1}^k, du_n^k]^T
    du_n^{k+1} = ∇f_R(X) · dX   (19)
    for j = 1 : n − 1
        X = [c_j, u_{j−1}^k, u_j^k, u_{j+1}^k, u_j^{k−1}]^T;  dX = [dc_j, du_{j−1}^k, du_j^k, du_{j+1}^k, du_j^{k−1}]^T
        du_j^{k+1} = ∇f(X) · dX
    end
    dh_{k+1} = du_0^{k+1}
end
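A runnable MATLAB sketch of template (19) is given below. It computes the same product J dc as the sketch following (10), but organized stencil by stencil, calling the routine stencil_f from above for the interior nodes and inlining the boundary stencils f_L and f_R:

    function dh = jacvec_stencil_1d(c, f, lambda, dx, m, dc)
    % Sketch of template (19).  All vectors are columns; f(k) holds the
    % source sample f(t_{k-1}), k = 1:m.
    n = length(c) - 1;
    uo = zeros(n+1,1);  u = uo;  duo = uo;  du = uo;
    dh = zeros(m,1);
    for k = 1:m
        un = zeros(n+1,1);  dun = un;
        for j = 2:n                           % interior stencil (16)
            X  = [c(j);  u(j-1);  u(j);  u(j+1);  uo(j)];
            dX = [dc(j); du(j-1); du(j); du(j+1); duo(j)];
            [un(j), g] = stencil_f(X, lambda);
            dun(j) = g.'*dX;                  % du_j^{k+1} = grad f(X) . dX
        end
        a0 = lambda^2*c(1)^2;                 % left stencil fL, from (3c)
        un(1)  = 2*(1-a0)*u(1) - uo(1) + 2*a0*u(2) - 2*f(k)*a0*dx;
        dun(1) = 2*lambda^2*c(1)*(2*u(2) - 2*u(1) - 2*f(k)*dx)*dc(1) ...
               + 2*(1-a0)*du(1) + 2*a0*du(2) - duo(1);
        un(n+1)  = u(n+1) - lambda*c(n+1)*(u(n+1) - u(n));  % right stencil fR
        dun(n+1) = -lambda*(u(n+1) - u(n))*dc(n+1) ...
                 + lambda*c(n+1)*du(n) + (1 - lambda*c(n+1))*du(n+1);
        uo = u;  u = un;  duo = du;  du = dun;
        dh(k) = du(1);
    end
    end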

The adjoint code generated by AD from the stencil formula (16) computes the following. Given a scalar v and a vector X, the adjoint code calculates the 5-vector

v ∇f(X).

We have similar procedures for f_L(·) and f_R(·). In reverse mode, we want to perform a calculation similar to (15). We start with a vector q = [q_1, q_2, . . . , q_m]^T, and we wish to compute p = J^T q. The pseudo-code for this is

set all v^k = 0 and p = 0
for k = m − 1 : −1 : 0
    v_0^{k+1} = v_0^{k+1} + q_{k+1}
    X = [c_0, u_0^k, u_1^k, u_0^{k−1}]^T
    Y = [p_0, v_0^k, v_1^k, v_0^{k−1}]^T
    Y = Y + v_0^{k+1} ∇f_L(X)
    for j = 1 : n − 1
        X = [c_j, u_{j−1}^k, u_j^k, u_{j+1}^k, u_j^{k−1}]^T   (20)
        Y = [p_j, v_{j−1}^k, v_j^k, v_{j+1}^k, v_j^{k−1}]^T
        Y = Y + v_j^{k+1} ∇f(X)
    end
    X = [c_n, u_{n−1}^k, u_n^k]^T
    Y = [p_n, v_{n−1}^k, v_n^k]^T
    Y = Y + v_n^{k+1} ∇f_R(X)
end

(Each update of Y is scattered back into the corresponding components of p and the adjoint fields v^k.)

Algorithm (19) for the forward product calculation and algorithm (20) for the reverse product calculation will be extremely efficient because the codes produced by AD for calculating the derivative of f and its adjoint will be nearly as short and simple as the function calculation itself. The number of independent variables is small, and there are no loops, as can be seen in (3a). Algorithm (20) can be easily understood by noticing that the input is the vector q. The algorithm goes backwards in time, computing the adjoints of each state by using the previously computed adjoints. At completion, the algorithm returns the adjoints of the independent variables.
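A matching MATLAB sketch of template (20) is given below; the 5-vector v∇f(X) is scattered back onto the adjoint fields, and the fields u^k are assumed stored in U from a forward sweep (as in the earlier sketches, the f_L and f_R adjoints are inlined):

    function p = adjvec_stencil_1d(c, U, f, lambda, dx, m, q)
    % Sketch of template (20): p = J'*q.  U(:,k+2) = u^k, k = -1:m, holds
    % the field record from a forward sweep; f(k) = f(t_{k-1}).
    n = length(c) - 1;
    p = zeros(n+1,1);
    v = zeros(n+1, m+2);                     % v(:,k+2) holds v^k
    for k = m-1:-1:0
        u = U(:,k+2);  uo = U(:,k+1);        % u^k, u^{k-1}
        vk1 = v(:,k+3);                      % v^{k+1}
        vk1(1) = vk1(1) + q(k+1);
        for j = 2:n                          % scatter v * grad f(X)
            X = [c(j); u(j-1); u(j); u(j+1); uo(j)];
            [~, g] = stencil_f(X, lambda);
            Y = vk1(j)*g;
            p(j)       = p(j)       + Y(1);
            v(j-1,k+2) = v(j-1,k+2) + Y(2);
            v(j,  k+2) = v(j,  k+2) + Y(3);
            v(j+1,k+2) = v(j+1,k+2) + Y(4);
            v(j,  k+1) = v(j,  k+1) + Y(5);
        end
        a0 = lambda^2*c(1)^2;                % left stencil fL adjoint
        p(1)     = p(1)     + vk1(1)*2*lambda^2*c(1)*(2*u(2) - 2*u(1) - 2*f(k+1)*dx);
        v(1,k+2) = v(1,k+2) + vk1(1)*2*(1 - a0);
        v(2,k+2) = v(2,k+2) + vk1(1)*2*a0;
        v(1,k+1) = v(1,k+1) - vk1(1);
        p(n+1)     = p(n+1)     - vk1(n+1)*lambda*(u(n+1) - u(n));   % right fR
        v(n+1,k+2) = v(n+1,k+2) + vk1(n+1)*(1 - lambda*c(n+1));
        v(n,  k+2) = v(n,  k+2) + vk1(n+1)*lambda*c(n+1);
    end
    end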
In 2-D, the stencil is a bit more complex as already pointed out, but the general principle
described here applies. Indeed, we have coded a version of algorithms (19) and (20) for the
2-D problem. We discuss the results of our numerical calculations next.

6. NUMERICAL RESULTS

We present some results from our numerical computations. In both examples, we use TAMC [7] to obtain derivative and adjoint codes from Fortran sources. All the Fortran codes were "wrapped" as MATLAB mex-files and used in conjunction with MATLAB codes.

Our goal in this paper is to demonstrate the use of the extended Jacobian framework together with exploitation of the stencil structure in Jacobian and adjoint calculations. In subsequent work, we apply our approach to solve a 2-D inverse problem arising in acoustic imaging.

6.1. The 1-D Problem


In our example, we choose Δx = 1 and Δt = 0.8. The domain is of length L = (N − 1)Δx. We will use several values of N in our calculations. The initial boundary value problem for the 1-D wave equation is discretized according to (3). The number of time steps is m, which will be varied as well. For the excitation f(t), we choose the derivative of a Gaussian. The graph of f is shown in Fig. 4a. We take two sound speeds c_1(x) and c_2(x), shown in Fig. 4b for n = 100. The resulting boundary data are h(t) = u(0, t). When the medium is c_1(x) the boundary data are h_1(t), and h_2(t) when the medium is c_2(x). Let q = h_2 − h_1; the graph of q(t) is displayed in Fig. 4c for m = 200.

We will first compute the Jacobian at c_1(x) times the difference c_2(x) − c_1(x). The resulting output vector should be very close to q(t). A comparison of q(t) with J(c_1)(c_2 − c_1) is shown in Fig. 4c. Next we calculate the adjoint times q(t); i.e.,

p = J(c_1)^T (h_2 − h_1).

The output of this calculation is the steepest descent direction corresponding to the nonlinear least-squares functional in (6). This direction should be similar to c_2(x) − c_1(x). The graph of p is shown in Fig. 4d. One can see that the two big signals, which are scaled versions of f, are reproduced near the places where c_2(x) − c_1(x) takes jumps. Unfortunately, the similarity ends there; the result shows that the inverse problem is not very well posed. However, we did check that the Jacobian and the adjoint are correctly computed by evaluating

q^T J dc and dc^T J^T q   (21)

and comparing their values for any choice of c, q, and dc. The agreement is usually 14 digits. Additionally, the result is also confirmed by the computation of q^T J dc using finite differences, where we usually get about 6 digits of agreement. We show a typical run in Fig. 5.

FIG. 4. (a) The excitation used in the examples. (b) The two sound speed profiles, c_1(x) in dots, c_2(x) in solid. (c) The graphs of (h_2 − h_1) and J(c_1)(c_2 − c_1). (d) The graph of p = J(c_1)^T(h_2 − h_1); shown for comparison is the graph of c_2 − c_1.
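The check in (21) can be wired to the MATLAB sketches given earlier as follows (the pulse shape and problem sizes here are illustrative only):

    % Consistency check q'*(J*dc) = dc'*(J'*q), using the sketches above
    n = 100;  m = 200;  lambda = 0.8;  dx = 1;
    t = (0:m-1)';
    f = -(t-30)/25.*exp(-((t-30).^2)/50);       % derivative-of-Gaussian pulse
    c  = 1 + 0.1*rand(n+1,1);                   % random background sound speed
    dc = randn(n+1,1);  q = randn(m,1);
    [h, U] = forward_map_1d(c, f, lambda, dx, m);
    dh = jacvec_stencil_1d(c, f, lambda, dx, m, dc);    % J*dc
    p  = adjvec_stencil_1d(c, U, f, lambda, dx, m, q);  % J'*q
    fprintf('q''*(J*dc) = %.14e\ndc''*(J''*q) = %.14e\n', q.'*dh, dc.'*p);
    % the two inner products should agree to roughly machine precision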

FIG. 5. A test of the correctness of Jacobian and adjoint calculations. In this example, N = 80 and m = 100. We choose at random two vectors dc and q, displayed in the first row. In the second row, we show J dc and J^T q. The inner products q^T J dc and dc^T J^T q are evaluated. They agree to 14 digits.

Computation time is linear in the number of x nodes for a fixed number of t nodes. There
is no difficulty with memory as the stencil codes are very simple with a small number of
independent variables.

6.2. The 2-D Problem


In the 2-D problem, we set up a grid of 161-by-161 node points. The computational domain is [−80, 80] × [−80, 80]; thus Δ = 1. For interior nodes, we use a second-order accurate discretization of the wave equation (7a). On the boundary nodes, we use a second-order discretization of the Engquist–Majda boundary conditions (7c)–(7d). The corner nodes, and the two nodes adjacent to each corner on the boundary, require special stencils. The stencils are obtained by requiring that the discrete wave equation be satisfied at the node while at the same time satisfying the discrete version of the absorbing boundary condition.

For the excitation, we choose a point source. To model this, if the source is at node (i_s, j_s), i.e., located at position (i_s Δ, j_s Δ), we assume that

f(x, y, t) = φ(t) at (i_s Δ, j_s Δ), and f(x, y, t) = 0 otherwise.

The time-dependent function φ(t) is chosen to be a Gaussian and is sampled at the time increments Δt = 0.55, which is the time step chosen for the finite difference scheme. Data are collected at 64 stations located at node points. These points are the nodes that lie closest to a set of points distributed evenly at 64 places on the circumference of a circle of radius 72. We take 381 time steps. A window of size [−70, 70] × [−70, 70] represents where c(x, y) is allowed to vary. Thus the mapping from sound speed c to data at the receivers is from ℝ^{141×141} to ℝ^{64×381}.
In Fig. 6a we display the sound speed distribution in the domain. The receivers are marked with circles; receiver 1 is at 0° from the positive x-axis. The source location is marked by a star. Next, in Fig. 6b, we display the receiver data when the medium has the two cylinders shown. The difference between these data and those when the domain is homogeneous is shown in Fig. 6c. In Fig. 6d, we show the result of applying the adjoint to the difference data in Fig. 6c. This process is often referred to as back-propagation, and corresponds to the steepest descent direction for the nonlinear least-squares functional in (8). The resulting vector should resemble the image of the two cylinders. Indeed this is the case, as one can see by comparing Figs. 6d and 6e, the latter displayed for comparison.
In numerous experiments with random vectors, we were able to get the inner products in (21) to agree to 14 digits. The adjoint calculation takes approximately 115 seconds on a 4-processor SGI Challenge L. The hand-coded gradient calculation takes approximately 90 seconds. The fully AD-generated gradient (using the reverse mode) takes almost 5 minutes to compute. In another run, the size of the sound speed c (the independent variables) was increased from 141 × 141 to 201 × 201. A window of size [−100, 100] × [−100, 100] represents where c(x, y) could vary. The receivers are placed on the circumference of a circle of radius 90. A grid of 241 × 241 node points is set up, corresponding to the domain [−120, 120] × [−120, 120]. The adjoint calculation scales very well and takes 225 seconds to compute. The hand-coded gradient calculation is marginally more efficient and takes about 200 seconds. However, the fully AD-generated gradient scheme becomes intractable because the volume of data to be stored for the reverse mode is much larger, and some extra time is spent in memory paging and accessing secondary storage; it takes nearly 20 minutes to finish. This illustrates that the adjoint scheme presented in this paper is automatic but performs much like traditional hand-coding, while naive use of AD does not provide a reasonably efficient solution.

FIG. 6. (a) The setup for the numerical experiment. The two cylinders represent sound speed anomalies. The darker cylinder is 2% faster while the lighter cylinder is 1% faster than the background medium. Shown in circles are the receiver locations. A star marks the location of the point source. (b) The receiver data when the two cylinders are present. (c) The difference between (b) and the receiver data when the medium is constant. (d) The adjoint applied to (c). (e) The two cylinders plotted on the same scale as in (d) for comparison.

7. CONCLUSIONS

We have described an inverse problem arising in wave propagation and how the need arises for efficient computation of Jacobian and adjoint products when the problem is posed as a least-squares problem. In this work, we describe the extended Jacobian framework, which gives a high-level description of the Jacobian and adjoint vector product calculations. The framework is particularly appropriate for functions whose evaluation involves some type of time stepping, such as those that arise in discretizing the wave equation.
We show further that the stencil structure of the finite difference scheme that provides
the underlying function evaluation can be exploited. Automatic differentiation is applied
at the stencil level, and the resulting subprograms fit nicely within the extended Jacobian
framework. The framework provides a guide for building highly efficient codes for Jacobian
and adjoint vector product evaluations. The one drawback of the approach is that we have
given up some “automation” for efficiency. A small amount of hand-coding is required to
assemble the programs. Nevertheless, our approach provides a way to overcome memory
problems associated with present AD technology.
One important extension of the 2-D problem we discussed is the case when the receivers do not lie on the grid points. The proposed methodology can be easily extended to handle the interpolation between the grid values required in this case. It poses no difficulty for the adjoint computation, as the interpolated values are a linear combination of the values at nearby grid points. This can be handled through the definition of general (linear) projection operators which, when applied to the values at all grid points, produce the values at the receivers. This linear operator is trivial to handle in the adjoint computation via AD.
Overall, the idea of exposing the stencil structure is very promising and can lead to an
order of magnitude improvement in the adjoint code, as our numerical results show.

REFERENCES

1. G. Chavent, F. Clement, and S. Gomez, Waveform inversion by MBTT formulation, in Mathematical and Numerical Aspects of Wave Propagation, edited by Cohen et al. (SIAM, Philadelphia, 1995), p. 713.
2. T. Coleman and A. Verma, The efficient computation of sparse Jacobian matrices using automatic differenti-
ation, SIAM J. Sci. Comput. 19, 1210 (1998).
3. T. Coleman and A. Verma, Structure and efficient Hessian calculation, in Advances in Nonlinear Programming,
edited by Yuan (Kluwer Academic, Boston, 1996).
4. T. Coleman and A. Verma, Structure and efficient Jacobian calculation, in Computational Differentiation:
Techniques, Applications, and Tools, edited by Berz et al. (SIAM, Philadelphia, 1996), p. 149.
5. T. Coleman, F. Santosa, and A. Verma, Semi-automatic differentiation, in Proceedings of Optimal Design and
Control Workshop, VPI, 1997.
6. B. Engquist and A. Majda, Absorbing boundary conditions for the numerical simulation of waves, Math. Comp. 31, 629 (1977).
7. R. Giering, Tangent Linear and Adjoint Model Compiler (User Manual, TAMC Version 4.7, 1997).
8. A. Griewank, Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differ-
entiation, Optim. Methods Software 1, 35 (1992).
9. A. Griewank, Some bounds on the complexity of gradients, Jacobians, and Hessians, in Complexity in Non-
linear Optimization, edited by Pardalos (World Scientific, Singapore, 1993).
10. A. Griewank, D. Juedes, and J. Utke, ADOL-C, a package for the automatic differentiation of algorithms
written in C/C++, ACM Trans. Math. Software 22, 131 (1996).
11. T. Mast, A. Nachman, and R. Waag, Focusing and imaging using eigenfunctions of the scattering operator, J. Acoust. Soc. Am., in press.
12. F. Santosa and W. Symes, An Analysis of Least-Squares Velocity Inversion (Society of Exploration Geophysi-
cists, Tulsa, 1989).
13. F. Santosa and W. Symes, Computation of the Hessian for least-squares solutions of inverse problems of reflection seismology, Inverse Problems 4, 211 (1988).
14. W. Symes, A differential semblance criterion for inversion of multioffset seismic reflection data, J. Geophys.
Res. 98, 2061 (1993).
15. W. Symes and C. Zhang, A Finite Difference Time Stepping Class, Rice University TRIP Report, 1997.
16. A. Tarantola, Inverse Problem Theory (Elsevier, Amsterdam, 1987).
