Efficient Calculation of Jacobian and Adjoint Vector Products in the Wave Propagational Inverse Problem Using Automatic Differentiation

T. Coleman, F. Santosa, and A. Verma
This research is sponsored in part by the Applied Mathematical Sciences Research Program (KC-04-02) of the
Office of Energy Research of the U.S. Department of Energy under Grants DE-FG02-97ER25013 and DF-FG02-
94ER25225, and also by the National Science Foundation Grant DMS 9503114, and by the Air Force Office of
Scientific Research Grant F49620-95-I-0305. We acknowledge helpful discussions with William Symes, who has an ongoing effort on automatic differentiation similar to ours [15]. Some of the ideas in this work were inspired by his presentation at the Institute for Mathematics and its Applications, Minnesota, in July 1997.
1. INTRODUCTION
In the type of wave propagational inverse problem under consideration, the goal is to determine parameters, such as the sound speed distribution and density distribution, from measured data collected at a set of receivers. Figure 1 illustrates the situation. An incident disturbance is generated; as it travels through the unknown medium, it produces reflections and refractions. This information is collected at receivers placed at a set of locations. Several such experiments are carried out for a set of incident disturbances. The inverse problem is to determine properties of the unknown medium from the set of measured responses.
Problems of this type arise in several applications including geophysical exploration
and medical imaging. A common feature in these applications is that the problem is very
large. Typically, the number of unknowns and equations could be in the range of $10^3$ to $10^6$. Often, the most convenient way to solve this type of inverse problem is to pose it as
an optimization problem, either using nonlinear least-squares [16, 12] or another approach
specialized to take advantage of the properties afforded by the particular application [14, 1].
In any event, what one will need for computation is derivative information concerning the
relation between medium parameters and data. Because of the size of the problem, we
cannot compute and store the entire Jacobian of the function, but rather, we must find ways
of computing the action of the Jacobian and its transpose on a given vector, or the so-called
direct and adjoint products.
FIG. 1. In this figure, the problem is to identify the unknown medium. An incident wave is generated, and as
it travels into the medium being probed, reflected and refracted signals are generated. These are captured at the
receivers. Several such experiments are carried out for a set of incident disturbances. The inverse problem is to
find the properties of the unknown medium from the collected data.
The goal of this work is to show that efficient calculation of the direct and adjoint product
is possible. The approach we take is to use automatic differentiation (AD) while exploiting
structure to the extent possible. We emphasize that without taking advantage of structure, a
direct application of current AD technology to the codes simulating the wave phenomena
will lead to memory problems.
As we will show in the next section, the wave propagation can be modeled effectively
using time-stepping finite difference schemes. The time-stepping nature of the scheme can
be exploited using the general extended Jacobian framework [3, 4]. The spatial discretiza-
tion by finite differences reveals further structure. Each finite difference stencil encodes the
dependence of a computed intermediate variable on other variables. In particular, it shows
that there is an inherent sparsity in the Jacobian. A combination of these structure exploitations allows us to overcome the problems posed by size and the consequent memory requirements.
In our implementation, we apply AD on the finite difference stencils and use the resulting
codes to assemble a procedure for computing the Jacobian and adjoint vector products. The
resulting code is as efficient as code obtained by directly performing summation-by-parts calculations on the simulation program. The advantage is that we have avoided this error-prone and tedious procedure [13]. Instead, we can view the code-writing process
at a higher level, leaving the most difficult parts to AD.
The plan of this article is as follows. We proceed with a short introduction to the inverse
problem for acoustic waves. The model for the physics and its numerical discretization are
described in Section 2. In Section 3, we review the extended Jacobian framework and show
how it can be used for our problem. We will provide templates for calculating the adjoint-vector product. The stencil approach and its implementation are presented in Section 4. We will also show how the stencil can be described at a higher level as projections. Templates for calculating Jacobian and adjoint vector products that use stencils are given. Section 5 summarizes our experience with this method of computation. A final section contains
concluding remarks.
Automatic differentiation is based on the fact that all computer programs, no matter
how complicated, use a finite set of elementary functions as defined by the programming
language. The function computed by the program is simply a composition of these ele-
mentary functions. The partial derivatives of the elementary functions are known, and the
overall derivatives are computed using the chain rule; this process is known as automatic
differentiation [9].
Abstractly, the program to evaluate the output y (an m-vector) as a function of x (generally an n-vector) has the form
$$x \equiv (x_1, x_2, \ldots, x_n) \;\longrightarrow\; z \equiv (z_1, z_2, \ldots, z_p),\ p \gg m+n \;\longrightarrow\; y \equiv (y_1, y_2, \ldots, y_m),$$
where the intermediate variables z are related through a series of these elementary functions, which may be unary,
$$z_k = f_{\mathrm{elem}}^k(z_i), \quad i < k,$$
or binary,
$$z_k = f_{\mathrm{elem}}^k(z_i, z_j), \quad i < k,\ j < k.$$
In the forward mode, the derivatives are propagated through each elementary step by the chain rule,
$$\frac{dz_k}{dx} = \frac{\partial f_{\mathrm{elem}}^k}{\partial z_i}\,\frac{dz_i}{dx} + \frac{\partial f_{\mathrm{elem}}^k}{\partial z_j}\,\frac{dz_j}{dx}.$$
This chain-rule-based computation is done for all the intermediate variables z and for the output variables y, finally yielding the derivative $dy/dx$.
The reverse mode computes the derivatives $dy/dz_k$ for all intermediate variables backwards (i.e., in the reverse order) through the computation. For example, for the elementary step $z_k = f_{\mathrm{elem}}^k(z_i, z_j)$, the derivatives are propagated as
$$\frac{dy}{dz_i} = \frac{\partial f_{\mathrm{elem}}^k}{\partial z_i}\,\frac{dy}{dz_k} \quad\text{and}\quad \frac{dy}{dz_j} = \frac{\partial f_{\mathrm{elem}}^k}{\partial z_j}\,\frac{dy}{dz_k}.$$
At the end of the reverse-mode computation the derivative $dy/dx$ is obtained.
The forward and reverse modes can be used to compute the direct and the adjoint products, $Jv$ and $J^T v$, given a vector v, where J is the Jacobian of a nonlinear mapping [9]. Both computations require time proportional to one function evaluation, with the adjoint product being approximately twice as costly as the direct mode. The Hessian-vector product $Hv$ can also be computed via AD in time proportional to one function evaluation.
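To make the two modes concrete, here is a toy illustration of our own (not from the original text), with hand-coded derivative propagation; all function names are ours.

import numpy as np

# Toy function y = sin(x1*x2) + x2, written as elementary steps.
def f_forward(x, dx):
    # Forward mode: propagate directional derivatives along with the values.
    z1, dz1 = x[0] * x[1], x[1] * dx[0] + x[0] * dx[1]
    y,  dy  = np.sin(z1) + x[1], np.cos(z1) * dz1 + dx[1]
    return y, dy                          # dy = J @ dx

def f_reverse(x, ybar):
    # Reverse mode: a forward sweep stores intermediates, then adjoints
    # dy/dz_k are propagated backwards through the same steps.
    z1 = x[0] * x[1]
    z1bar = np.cos(z1) * ybar             # adjoint of z1 from y = sin(z1) + x2
    return np.array([x[1] * z1bar,        # adjoint of x1 from z1 = x1*x2
                     x[0] * z1bar + ybar])  # x2 appears in both steps

x, dx = np.array([0.3, 1.7]), np.array([1.0, 0.0])
_, dy = f_forward(x, dx)                  # the direct product  J dx
xbar  = f_reverse(x, 1.0)                 # the adjoint product J^T (1)
assert np.isclose(dy, xbar @ dx)          # the two modes must agree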
The model for the 1-D problem is the wave equation
$$u_{tt} = c(x)^2 u_{xx}, \quad 0 < x < L,\ t > 0, \eqno(1a)$$
where c(x) is the sound speed of the medium. We assume that the medium is quiescent at t = 0,
$$u(x, 0) = u_t(x, 0) = 0. \eqno(1b)$$
The excitation is a source f(t) applied at the left end,
$$u_x(0, t) = f(t). \eqno(1c)$$
We will assume that f(t) is compactly supported away from t = 0. On the right end, we assume a radiation boundary condition
$$u_t(L, t) + c(L)\, u_x(L, t) = 0. \eqno(1d)$$
We are given u(0, t) = g(t) for 0 < t < T. The problem is to find the unknown c(x).
A convenient way to view the problem is to define the forward map as one that associates a given c(x) with the boundary data u(0, t). Let
$$A[c](t) := u(0, t),$$
where it is understood that the evaluation of A[c](·) is through the initial-boundary value problem (IBVP) in (1). A least-squares formulation of this problem is to solve the minimization
$$\min_{c(x)} \int_0^T |A[c](t) - g(t)|^2\, dt. \eqno(2)$$
A typical solution method for the above nonlinear least-squares problem requires knowledge of the gradient of the functional. This translates to computing the adjoints of the map A[c](·) [12, 13]. The first step in the numerical computation of the adjoints is to discretize the problem.
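This step is standard, though not spelled out above; for orientation (the derivation here is ours), write $\Phi(c)$ for the functional in (2). A perturbation $\delta c$ gives
$$\delta\Phi = 2\int_0^T \big(A[c](t) - g(t)\big)\,\big(A'[c]\,\delta c\big)(t)\, dt, \qquad\text{hence}\qquad \nabla\Phi(c) = 2\, A'[c]^{*}\big(A[c] - g\big),$$
so the gradient is the adjoint of the Jacobian of the forward map applied to the residual.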
A common discretization for this problem is to use finite difference methods. Let $x_i = i\,\Delta x$ for $i = 0:n$ and $t_k = k\,\Delta t$, with $\Delta x = L/n$ and $\Delta t = \lambda\,\Delta x$ for some $\lambda > 0$. A second-order finite difference scheme is chosen. The partial differential equation in (1a) is replaced by
$$u_i^{k+1} = 2\left(1 - \lambda^2 c_i^2\right) u_i^k - u_i^{k-1} + \lambda^2 c_i^2\left(u_{i+1}^k + u_{i-1}^k\right), \quad \text{for } i = 1:n-1,\ k \ge 0, \eqno(3a)$$
with the quiescent initial data
$$u_i^{-1} = u_i^0 = 0. \eqno(3b)$$
The boundary conditions (1c) and (1d) are discretized analogously; we refer to the resulting left- and right-end update formulas as (3c) and (3d). The discrete forward map samples the solution at the left end,
$$A[c]^k := u_0^k.$$
A way to describe the function evaluation is through vector notation. Let us write the vectors $\mathbf{u}^k = [u_0^k, u_1^k, \ldots, u_n^k]^T$ and $c = [c_0, c_1, \ldots, c_n]^T$. Then the finite difference scheme amounts to
$$\mathbf{u}^{k+1} = F(c, \mathbf{u}^k, \mathbf{u}^{k-1}). \eqno(4)$$
The forward map from c to A[c] is given by, letting $e_1 = [1, 0, \ldots, 0]^T$,
$$A[c]^k = e_1^T \mathbf{u}^k.$$
The 2-D problem is posed on the square domain $\Omega = \{|x| < a,\ |y| < a\}$, on which the field u(x, y, t) satisfies the wave equation
$$u_{tt} = c(x, y)^2\left(u_{xx} + u_{yy}\right) + f(x, y, t). \eqno(7a)$$
Here c(x, y) represents the unknown sound speed distribution, while f(x, y, t) is a known acoustic source. Initially, the system is at rest; hence
$$u(x, y, 0) = u_t(x, y, 0) = 0. \eqno(7b)$$
On the edges of the domain we impose absorbing boundary conditions:
$$c\, u_{xt} - u_{tt} + \frac{c^2}{2}\, u_{yy} = 0 \quad\text{for } \{x = \pm a;\ |y| < a\}, \eqno(7c)$$
$$c\, u_{yt} - u_{tt} + \frac{c^2}{2}\, u_{xx} = 0 \quad\text{for } \{y = \pm a;\ |x| < a\}. \eqno(7d)$$
Let R represent the collection of coordinate points where receivers have been placed to record u. Thus, the data are
$$A[c; f]_r(t) := u(x_r, y_r, t), \quad (x_r, y_r) \in R.$$
The source term f(x, y, t) is assumed to be null at t = 0. We will view the forward map A[·] as dependent on c and parameterized by f.
The nonlinear least-squares formulation is given by
$$\min_c \sum_l \sum_{r=1}^p \int_0^T \left| A[c; f_l]_r(t) - g_{rl}(t) \right|^2 dt,$$
where $g_{rl}(t)$ is the measured response at location $(x_r, y_r)$ for the source $f_l(x, y, t)$.
Discretization of (7) is quite straightforward. The only tricky part comes in discretizing the Engquist–Majda absorbing boundary conditions [6]. Letting $(x_i, y_j) = (i\Delta, j\Delta)$, $-n \le i \le n$ and $-n \le j \le n$, we discretize the domain $\Omega$ by a regular mesh of size $\Delta = a/n$. Time is discretized as in the 1-D case: $t_k = k\,\Delta t$ for $k = 0:m$.
Let the $(2n+1)^2$-vector $\mathbf{u}^k$ represent the values of u(x, y, t) at the node points at time $t_k$. The finite difference scheme can be written in shorthand as
$$\mathbf{u}^{k+1} = F(c, \mathbf{u}^k, \mathbf{u}^{k-1}),$$
with $\mathbf{u}^{-1} = \mathbf{u}^0 = 0$. The discrete forward map evaluates u at each receiver; thus
$$A[c; f]^k = T \mathbf{u}^k,$$
where T is a matrix of size p-by-$(2n+1)^2$ whose function is to “grab” the values of u at time step k at the receivers. In place of the integration in the nonlinear least squares, we have
$$\min_c \sum_l \sum_{r=1}^p \sum_{k=1}^m \left| A[c; f_l]_r^k - g_{rl}^k \right|^2. \eqno(8)$$
Here grlk is the measured response at receiver r at time step k when the excitation is fl .
We restrict our discussion to the 1-D problem for clarity of presentation. The prescription
for computing Jacobian vector and adjoint vector products for the more complex 2-D
problem follows the same lines as for the 1-D problem. An algorithm for the forward map
for the 1-D case is
$\mathbf{u}^{-1} = \mathbf{u}^0 = 0$
for k = 0 : m−1
    $\mathbf{u}^{k+1} = F(c, \mathbf{u}^k, \mathbf{u}^{k-1})$    (9)
    $h^{k+1} = e_1^T \mathbf{u}^{k+1}$
end

We use the notation $h^k = A[c]^k$. Thus the function in question is the mapping from c to $h = (h^1, h^2, \ldots, h^m)^T$.
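As a concrete rendering of (9), the following NumPy sketch time-steps the interior update (3a). Since the boundary formulas (3c)–(3d) are not reproduced above, the left-end source and right-end absorbing updates below are assumed first-order discretizations of (1c)–(1d), and all names are ours.

import numpy as np

def forward_map(c, lam, dx, f):
    # Sketch of algorithm (9): map c to h = (h^1, ..., h^m)^T.
    # c: (n+1,) sound speeds; lam = dt/dx; f: (m,) source samples f(t_k).
    n, m = c.size - 1, f.size
    u_prev = np.zeros(n + 1)          # u^{k-1}
    u_curr = np.zeros(n + 1)          # u^{k}   (u^{-1} = u^0 = 0)
    h = np.zeros(m)
    for k in range(m):
        u_next = np.zeros(n + 1)
        s = (lam * c[1:n]) ** 2
        # interior update, eq. (3a)
        u_next[1:n] = 2.0 * (1.0 - s) * u_curr[1:n] - u_prev[1:n] \
                      + s * (u_curr[2:] + u_curr[:n - 1])
        # assumed stand-ins for (3c)-(3d): one-sided Neumann source on the
        # left, first-order absorbing update on the right
        u_next[0] = u_next[1] - dx * f[k]
        u_next[n] = u_curr[n] - lam * c[n] * (u_curr[n] - u_curr[n - 1])
        h[k] = u_next[0]              # h^{k+1} = e_1^T u^{k+1}
        u_prev, u_curr = u_curr, u_next
    return h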
We can give an alternate description of this mapping by enumerating through the loop:
$$\mathbf{u}^1 = F(c, \mathbf{u}^0, \mathbf{u}^{-1}), \quad \mathbf{u}^2 = F(c, \mathbf{u}^1, \mathbf{u}^0), \quad \ldots, \quad \mathbf{u}^m = F(c, \mathbf{u}^{m-1}, \mathbf{u}^{m-2}),$$
$$h = e_1 e_1^T \mathbf{u}^1 + e_2 e_1^T \mathbf{u}^2 + \cdots + e_m e_1^T \mathbf{u}^m.$$
We call this the extended function. The extended function allows for an easy way to compute
the Jacobian and its transpose. Formally, the directional derivative of h in the direction dc,
i.e., the Jacobian-vector product, is given by the calculation
$d\mathbf{u}^{-1} = d\mathbf{u}^0 = 0$
for k = 0 : m−1
    $d\mathbf{u}^{k+1} = \partial_1 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})\, dc + \partial_2 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})\, d\mathbf{u}^k + \partial_3 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})\, d\mathbf{u}^{k-1}$    (10)
    $dh^{k+1} = e_1^T d\mathbf{u}^{k+1}$
end
The matrices $\partial_1 F$, $\partial_2 F$, and $\partial_3 F$ are the Jacobians of the function F with respect to its first, second, and third arguments; therefore, they are (n+1)-by-(n+1) matrices. In a computer program, we would simply define F(c, ·, ·) and use AD either to compute these matrices or to produce subprograms that calculate the action of these matrices on given vectors. The desired directional derivative (Jacobian times the vector dc) is $dh = (dh^1, dh^2, \ldots, dh^m)^T$.
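In code, template (10) only needs the actions of the three partial Jacobians. A minimal sketch, assuming a callable dF(i, uk, ukm1, v) that returns $\partial_i F(c, \mathbf{u}^k, \mathbf{u}^{k-1})\, v$ (produced, e.g., by forward-mode AD) and a list states with states[k] holding $\mathbf{u}^k$ from the forward sweep; the names are ours.

import numpy as np

def extended_jvp(c, states, dF, dc):
    # Template (10): compute dh = J dc, reusing the stored states u^k.
    m, n1 = len(states), c.size
    du_prev, du_curr = np.zeros(n1), np.zeros(n1)     # du^{-1} = du^0 = 0
    dh = np.zeros(m)
    for k in range(m):
        ukm1 = states[k - 1] if k > 0 else np.zeros(n1)
        du_next = (dF(1, states[k], ukm1, dc)         # d1F . dc
                   + dF(2, states[k], ukm1, du_curr)  # d2F . du^k
                   + dF(3, states[k], ukm1, du_prev)) # d3F . du^{k-1}
        dh[k] = du_next[0]            # dh^{k+1} = e_1^T du^{k+1}
        du_prev, du_curr = du_curr, du_next
    return dh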
Differentiating the extended function all at once gives the same result in matrix form. Define

$$M = \begin{bmatrix} -I & & & & \\ \partial_2 F(c, \mathbf{u}^1, \mathbf{u}^0) & -I & & & \\ \partial_3 F(c, \mathbf{u}^2, \mathbf{u}^1) & \partial_2 F(c, \mathbf{u}^2, \mathbf{u}^1) & -I & & \\ & \ddots & \ddots & \ddots & \\ & & \partial_3 F(c, \mathbf{u}^{m-1}, \mathbf{u}^{m-2}) & \partial_2 F(c, \mathbf{u}^{m-1}, \mathbf{u}^{m-2}) & -I \end{bmatrix},$$

$$B = \begin{bmatrix} \partial_1 F(c, \mathbf{u}^0, \mathbf{u}^{-1}) \\ \partial_1 F(c, \mathbf{u}^1, \mathbf{u}^0) \\ \vdots \\ \partial_1 F(c, \mathbf{u}^{m-1}, \mathbf{u}^{m-2}) \end{bmatrix}, \qquad T = \begin{bmatrix} e_1 e_1^T & e_2 e_1^T & \cdots & e_m e_1^T \end{bmatrix}.$$

Then
$$dh = -T M^{-1} B\, dc, \eqno(11)$$
so the Jacobian of the map from c to h is $J = -T M^{-1} B$, and the adjoint product $p = J^T q$ for a given m-vector q is
$$p = -B^T M^{-T} T^T q. \eqno(12)$$
The matrices B, M, and T should not be computed explicitly, but rather, this formalism is
used to generate an efficient algorithm for finding p given q.
Let q be an m-vector. Then $Q = T^T q$ is an m(n+1)-vector. From (12), if we let $Y = -M^{-T} T^T q$, then
$$-M^T Y = Q \quad\text{and}\quad p = B^T Y. \eqno(13)$$
By exploiting the structures of M and B, we can come up with an efficient algorithm to find
p for a given q. Because of the lower-triangular structure of M, we never need to invert any
matrices. The algorithm starts by chopping up Q into m separate pieces,
$$Q = \begin{bmatrix} q^1 \\ q^2 \\ \vdots \\ q^m \end{bmatrix},$$
each of length n + 1. Since $-M^T$ is block upper triangular, the system $-M^T Y = Q$ can be solved for $Y = [y^1; y^2; \ldots; y^m]$ by back-substitution, accumulating $p = B^T Y$ along the way:

set $y^{m+1} = y^{m+2} = 0$, $p = 0$
for k = m : −1 : 1
    $y^k = q^k + \partial_2 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})^T y^{k+1} + \partial_3 F(c, \mathbf{u}^{k+1}, \mathbf{u}^k)^T y^{k+2}$    (14)
    $p = p + \partial_1 F(c, \mathbf{u}^{k-1}, \mathbf{u}^{k-2})^T y^k$
end
Note that we need the adjoints/transposes of $\partial_1 F(c, \cdot, \cdot)$, $\partial_2 F(c, \cdot, \cdot)$, and $\partial_3 F(c, \cdot, \cdot)$. These adjoints can be computed explicitly if we have the matrices $\partial_1 F$, $\partial_2 F$, and $\partial_3 F$, or we can resort to AD to produce subprograms that compute their action on given vectors.
Note that in the algorithm (14), we need to have available the values of the fields $\mathbf{u}^k$ for all indices k. Depending on the size of the problem, it may be more efficient to store only the values of $\mathbf{u}^k$ for some indices $k \in K$ and use (9) to regenerate the field for the other indices $k \notin K$. An efficient method to do this is discussed in [8].
To illustrate the reverse-mode computation, consider a simple scalar example with three time steps,
$$u_1 = f(c, u_0, 0), \quad u_2 = f(c, u_1, u_0), \quad u_3 = f(c, u_2, u_1).$$
Differentiating each step,
$$du_1 = \partial_1 f(c, u_0, 0)\, dc,$$
$$du_2 = \partial_1 f(c, u_1, u_0)\, dc + \partial_2 f(c, u_1, u_0)\, du_1,$$
$$du_3 = \partial_1 f(c, u_2, u_1)\, dc + \partial_2 f(c, u_2, u_1)\, du_2 + \partial_3 f(c, u_2, u_1)\, du_1.$$
Therefore, the Jacobian (in this case, a scalar derivative) can be identified as the J in the output relation $du_3 = J\, dc$. This is a forward mode computation.
Let the adjoint variables be p and $v_3$ so that we formally have $p = J^T v_3$. If we view $du_1$ and $du_2$ as intermediate variables, then we can associate to them adjoint variables $v_1$ and $v_2$. From the third equation in the derivative calculation, we can formally write
$$\begin{bmatrix} p \\ v_2 \\ v_1 \end{bmatrix} = \begin{bmatrix} \partial_1 f(c, u_2, u_1) \\ \partial_2 f(c, u_2, u_1) \\ \partial_3 f(c, u_2, u_1) \end{bmatrix} v_3.$$
Similarly, the second equation contributes $\partial_2 f(c, u_1, u_0)\, v_2$ to $v_1$ and $\partial_1 f(c, u_1, u_0)\, v_2$ to p, while the first equation gives
$$p = \partial_1 f(c, u_0, 0)\, v_1.$$
The contributions to each of the adjoint variables are summed over each operation; hence
$$v_2 = \partial_2 f(c, u_2, u_1)\, v_3,$$
$$v_1 = \partial_3 f(c, u_2, u_1)\, v_3 + \partial_2 f(c, u_1, u_0)\, v_2,$$
$$p = \partial_1 f(c, u_2, u_1)\, v_3 + \partial_1 f(c, u_1, u_0)\, v_2 + \partial_1 f(c, u_0, 0)\, v_1.$$
Generalizing this pattern to the time-stepping map gives the adjoint-product template:

set $p = 0$ and $\mathbf{v}^k = 0$ for all k
for k = m−1 : 0
    $\mathbf{v}^{k+1} = \mathbf{v}^{k+1} + e_1 q_{k+1}$
    $\mathbf{v}^k = \mathbf{v}^k + \partial_2 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})^T \mathbf{v}^{k+1}$    (15)
    $\mathbf{v}^{k-1} = \mathbf{v}^{k-1} + \partial_3 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})^T \mathbf{v}^{k+1}$
    $p = p + \partial_1 F(c, \mathbf{u}^k, \mathbf{u}^{k-1})^T \mathbf{v}^{k+1}$
end
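A minimal sketch of template (15), assuming a callable dFT(i, uk, ukm1, w) that returns $\partial_i F(c, \mathbf{u}^k, \mathbf{u}^{k-1})^T w$ (e.g., produced by the reverse mode of an AD tool); the names are ours.

import numpy as np

def extended_adjoint(c, states, dFT, q):
    # Template (15): compute p = J^T q by sweeping backwards in time.
    m, n1 = len(states), c.size
    v = [np.zeros(n1) for _ in range(m + 2)]  # v^0..v^m; v[-1] is a dummy
    p = np.zeros(n1)
    for k in range(m - 1, -1, -1):
        ukm1 = states[k - 1] if k > 0 else np.zeros(n1)
        v[k + 1][0] += q[k]                   # v^{k+1} += e_1 q_{k+1}
        v[k]     += dFT(2, states[k], ukm1, v[k + 1])
        v[k - 1] += dFT(3, states[k], ukm1, v[k + 1])  # k = 0 hits the dummy
        p        += dFT(1, states[k], ukm1, v[k + 1])
    return p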
4.3. Discussion
The foregoing methodology, while limited to the 1-D problem, can be adapted to solve
the more complicated 2-D problem. What we wish to emphasize here is the conciseness of
the extended Jacobian framework and how to exploit the underlying problem structure. The
algorithms in (10), (14), and (15) can be viewed as code templates for Jacobian and adjoint
vector product calculations. AD is deployed in computing the Jacobian and adjoint of the
subproblem described by the time stepping process (4).
We recall that the adjoint (reverse) products $\partial_1 F(c, \cdot, \cdot)^T y$, etc., can be computed using the adjoint (reverse) mode of an AD tool. For large problems like this, computing the adjoint product of the time-step routine (4) can be very expensive, since c, $\mathbf{u}^k$, and $\mathbf{u}^{k-1}$ can be large. An AD tool would by default assume that every element of $F(c, \mathbf{u}^k, \mathbf{u}^{k-1})$ depends on every element of c, $\mathbf{u}^k$, and $\mathbf{u}^{k-1}$. This assumed dependence generates a “table” which is used in computing intermediate values in the reverse mode. For example, ADOL-C [10] implements this lookup by creating a tape, which it will write to disk if the problem size is large. When it does this, it becomes unacceptably inefficient.
This concern brings us to the main idea of this paper, i.e., that of AD applied to the
finite difference stencil. Our approach is to use AD on the smallest component of the
calculation—a kind of “microscopic” structure exploitation. We discuss how this is done
in the next section.
In principle, what we are exploiting is a specific sparsity structure that is inherent in the
finite difference scheme. A general approach for exploiting sparsity in AD is described in
[2].
The finite difference method that we used in the 1-D case can be written as indicated in (4), which we rewrite here:
$$\mathbf{u}^{k+1} = F(c, \mathbf{u}^k, \mathbf{u}^{k-1}).$$
This shorthand notation does not reveal the stencil structure given by the explicit formulas in (3). For the jth component of $\mathbf{u}^{k+1}$, with j not equal to 0 or n, from (3a) we can write
$$u_j^{k+1} = f\left(c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1}\right). \eqno(16)$$
The above expression spells out clearly that the dependence of $\mathbf{u}^{k+1}$ on c, $\mathbf{u}^k$, and $\mathbf{u}^{k-1}$ is very sparse. This is best visualized by studying Fig. 2. Thus, we need only deal with f, which is a function of only 5 variables. From (3c)–(3d), we have two more such functions, which depend on only 4 and 3 variables, respectively; they are given by
$$u_0^{k+1} = f_L\left(c_0, u_0^k, u_1^k, u_0^{k-1}\right), \qquad u_n^{k+1} = f_R\left(c_n, u_{n-1}^k, u_n^k\right).$$
FIG. 2. The stencil for the 1-D problem for j ≠ 0, n. Boundary nodes are slightly different and require separate treatment.
The function F(·, ·, ·), representing a time step, is now replaced with the pseudo-code

$u_0^{k+1} = f_L(c_0, u_0^k, u_1^k, u_0^{k-1})$
for j = 1 : n−1
    $u_j^{k+1} = f(c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1})$
end
$u_n^{k+1} = f_R(c_n, u_{n-1}^k, u_n^k)$

In computing the Jacobian, we needed the derivatives of F(·, ·, ·) with respect to its 3 arguments. We next derive procedures to do this using the stencils.
The components of $F_1(c, \mathbf{u}^k, \mathbf{u}^{k-1})$, the Jacobian of F with respect to c, are the rows
$$F_1 = \begin{bmatrix} \left(\nabla_c u_0^{k+1}\right)^T \\ \left(\nabla_c u_1^{k+1}\right)^T \\ \vdots \\ \left(\nabla_c u_{n-1}^{k+1}\right)^T \\ \left(\nabla_c u_n^{k+1}\right)^T \end{bmatrix}.$$
The gradients are easily obtained by differentiating the stencil formulas with respect to c. We obtain, for j ≠ 0, n,
$$\nabla_c u_j^{k+1} = \partial_1 f\left(e_j^T c,\ e_{j-1}^T \mathbf{u}^k,\ e_j^T \mathbf{u}^k,\ e_{j+1}^T \mathbf{u}^k,\ e_j^T \mathbf{u}^{k-1}\right) e_j,$$
which is an (n+1)-vector with a single nonzero entry at j. Thus, it can be seen that $F_1(c, \mathbf{u}^k, \mathbf{u}^{k-1})$ is a diagonal matrix (in MATLAB, one would use sparse utilities to implement this). This property is not apparent to state-of-the-art automatic differentiation programs.
The Jacobian $F_3(c, \mathbf{u}^k, \mathbf{u}^{k-1})$ will also be diagonal, for the same reason, and can be computed by applying AD to (16). The Jacobian $F_2(c, \mathbf{u}^k, \mathbf{u}^{k-1})$ is slightly more complicated. The components of this Jacobian are similar to those of $F_1(\cdot, \cdot, \cdot)$ except that the gradient is with respect to $\mathbf{u}^k$. Directly differentiating (16) with respect to $\mathbf{u}^k$ yields
$$\nabla_{\mathbf{u}^k} u_j^{k+1} = \partial_2 f(\cdot)\, e_{j-1} + \partial_3 f(\cdot)\, e_j + \partial_4 f(\cdot)\, e_{j+1},$$
where each $\partial_i f$ is evaluated at $\left(e_j^T c,\ e_{j-1}^T \mathbf{u}^k,\ e_j^T \mathbf{u}^k,\ e_{j+1}^T \mathbf{u}^k,\ e_j^T \mathbf{u}^{k-1}\right)$; hence $F_2$ is tridiagonal.
Once the matrices $F_1$, $F_2$, and $F_3$ are obtained, we can use the code in (10) to compute the forward derivatives and the code in (14) to compute the adjoint. The codes for the partial derivatives of f, $f_L$, and $f_R$ are easily obtained using AD. These codes are expected to be very efficient because of the simplicity of the stencil formula and the small number of independent variables involved. We have gained efficiency in the AD computation by applying AD at the stencil level. The cost to the user is performing some detailed “hand” coding.
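For illustration, the interior stencil (16) and its 5-component gradient can be written down explicitly from (3a). The short routine below is hand-written (λ is an assumed constant, and the names are ours); it is exactly the kind of code an AD tool would otherwise generate from f.

import numpy as np

LAM = 0.5   # Courant ratio lambda = dt/dx (an assumed value)

def stencil(cj, ul, uc, ur, up):
    # Eq. (16)/(3a): u_j^{k+1} = f(c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1})
    s = (LAM * cj) ** 2
    return 2.0 * (1.0 - s) * uc - up + s * (ur + ul)

def stencil_grad(cj, ul, uc, ur, up):
    # Gradient of f with respect to its 5 arguments, differentiated by hand.
    s = (LAM * cj) ** 2
    return np.array([2.0 * LAM**2 * cj * (ur - 2.0 * uc + ul),  # df/dc_j
                     s,                                         # df/du_{j-1}^k
                     2.0 * (1.0 - s),                           # df/du_j^k
                     s,                                         # df/du_{j+1}^k
                     -1.0])                                     # df/du_j^{k-1}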
We can employ a similar approach for the more complicated 2-D example. We note that a typical stencil for interior nodes is given by
$$u_{ij}^{k+1} = f\left(c_{ij}, u_{i-1,j}^k, u_{i+1,j}^k, u_{i,j-1}^k, u_{i,j+1}^k, u_{ij}^k, u_{ij}^{k-1}\right).$$
The stencil is displayed in Fig. 3. Boundary and corner nodes, because of the absorbing boundary conditions described in (7c)–(7d), result in slightly more complex stencils. The key observation is that the stencil embodies the sparsity structure of the Jacobian and is a feature that should be exploited.
FIG. 3. The stencil for the 2-D problem for an interior node. Boundary and corner nodes are slightly different
and require separate treatment.
The forward mode of AD applied to the stencil function f generates an algorithm that computes the gradient of f(·, ·, ·, ·, ·) times a 5-vector; that is, given X and dX, we have a procedure to find
$$f(X) \quad\text{and}\quad \nabla f(X) \cdot dX.$$
Identifying $X = \left[c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1}\right]^T$ with the arguments of the stencil (16), the derivative of its output is
$$du_j^{k+1} = \nabla f(X) \cdot dX.$$
We would have similar formulas for j = 0 and j = n, with the difference that the vector of independent variables would be 4- and 3-dimensional, respectively. We can therefore assemble the $du_j^{k+1}$ within an outer loop which corresponds to the time steps. The pseudo-code would take the form

$d\mathbf{u}^{-1} = d\mathbf{u}^0 = 0$
for k = 0 : m−1
    $X = [c_0, u_0^k, u_1^k, u_0^{k-1}]^T$;  $dX = [dc_0, du_0^k, du_1^k, du_0^{k-1}]^T$
    $du_0^{k+1} = \nabla f_L(X) \cdot dX$
    $X = [c_n, u_{n-1}^k, u_n^k]^T$;  $dX = [dc_n, du_{n-1}^k, du_n^k]^T$
    $du_n^{k+1} = \nabla f_R(X) \cdot dX$    (19)
    for j = 1 : n−1
        $X = [c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1}]^T$;  $dX = [dc_j, du_{j-1}^k, du_j^k, du_{j+1}^k, du_j^{k-1}]^T$
        $du_j^{k+1} = \nabla f(X) \cdot dX$
    end
    $dh^{k+1} = du_0^{k+1}$
end
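A NumPy rendering of (19), restricted to the interior stencil for brevity; it reuses stencil_grad from the sketch above, and the $f_L$, $f_R$ branches (and hence the recorded $du_0^{k+1}$) are elided.

import numpy as np

def stencil_jvp(c, states, dc):
    # Algorithm (19), interior nodes only: assemble du^{k+1} stencil by stencil.
    # states[k] holds u^k from the forward sweep, k = 0..m-1.
    n, m = c.size - 1, len(states)
    du_prev, du_curr = np.zeros(n + 1), np.zeros(n + 1)
    dh = np.zeros(m)
    for k in range(m):
        uk = states[k]
        ukm1 = states[k - 1] if k > 0 else np.zeros(n + 1)
        du_next = np.zeros(n + 1)
        for j in range(1, n):
            g  = stencil_grad(c[j], uk[j-1], uk[j], uk[j+1], ukm1[j])
            dX = np.array([dc[j], du_curr[j-1], du_curr[j],
                           du_curr[j+1], du_prev[j]])
            du_next[j] = g @ dX        # du_j^{k+1} = grad f(X) . dX
        # in the full algorithm du_next[0], du_next[n] come from f_L, f_R
        dh[k] = du_next[0]
        du_prev, du_curr = du_curr, du_next
    return dh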
The adjoint code generated by AD from the stencil formula (16) computes the following: given a scalar v and a vector X, the adjoint code calculates the 5-vector
$$v\, \nabla f(X).$$
We have similar procedures for $f_L(\cdot)$ and $f_R(\cdot)$. In the reverse mode, we want to perform a calculation similar to (15). We start with a vector $q = [q_1, q_2, \ldots, q_m]^T$, and we wish to compute $p = J^T q$. The pseudo-code for this is

set $p = 0$ and $\mathbf{v}^k = 0$ for all k
for k = m−1 : 0
    $v_0^{k+1} = v_0^{k+1} + q_{k+1}$
    $X = [c_0, u_0^k, u_1^k, u_0^{k-1}]^T$
    $Y = [p_0, v_0^k, v_1^k, v_0^{k-1}]^T$
    $Y = Y + v_0^{k+1}\, \nabla f_L(X)$
    for j = 1 : n−1
        $X = [c_j, u_{j-1}^k, u_j^k, u_{j+1}^k, u_j^{k-1}]^T$
        $Y = [p_j, v_{j-1}^k, v_j^k, v_{j+1}^k, v_j^{k-1}]^T$    (20)
        $Y = Y + v_j^{k+1}\, \nabla f(X)$
    end
    $X = [c_n, u_{n-1}^k, u_n^k]^T$
    $Y = [p_n, v_{n-1}^k, v_n^k]^T$
    $Y = Y + v_n^{k+1}\, \nabla f_R(X)$
end

Here each assignment to Y gathers references to the named adjoint variables, and the update $Y = Y + v\, \nabla f(X)$ scatters the contributions back into them.
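The corresponding sketch of (20), again for the interior stencil only and reusing stencil_grad from above: the adjoint of each stencil application scatters $v_j^{k+1} \nabla f(X)$ back onto $(p_j, v_{j-1}^k, v_j^k, v_{j+1}^k, v_j^{k-1})$.

import numpy as np

def stencil_vjp(c, states, q):
    # Algorithm (20), interior nodes only: accumulate p = J^T q.
    n, m = c.size - 1, len(states)
    v = np.zeros((m + 2, n + 1))   # adjoint states v^k; the last row is a dummy
    p = np.zeros(n + 1)
    for k in range(m - 1, -1, -1):
        uk = states[k]
        ukm1 = states[k - 1] if k > 0 else np.zeros(n + 1)
        v[k + 1, 0] += q[k]        # the measured-data adjoint enters at node 0
        for j in range(1, n):
            g = stencil_grad(c[j], uk[j-1], uk[j], uk[j+1], ukm1[j])
            w = v[k + 1, j]
            p[j]        += w * g[0]
            v[k, j - 1] += w * g[1]
            v[k, j]     += w * g[2]
            v[k, j + 1] += w * g[3]
            v[k - 1, j] += w * g[4]   # for k = 0 this lands in the dummy row
    return p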
Algorithm (19) for the forward product calculation and algorithm (20) for the reverse product calculation will be extremely efficient because the codes produced by AD for calculating the derivative of f and its adjoint are nearly as short and simple as the function calculation itself. The number of independent variables is small, and there are no loops, as can be seen in (3a). Algorithm (20) can be understood by noticing that the input is the vector q. The algorithm goes backwards in time, computing the adjoints of each state by using the previously computed adjoints. At completion, it returns the adjoints of the independent variables.
In 2-D, the stencil is a bit more complex as already pointed out, but the general principle
described here applies. Indeed, we have coded a version of algorithms (19) and (20) for the
2-D problem. We discuss the results of our numerical calculations next.
6. NUMERICAL RESULTS
We present some results from our numerical computations. In both examples, we use
TAMC [7] to obtain derivative and adjoint codes from Fortran sources. All the Fortran
codes were “wrapped” as MATLAB mex-files and used in conjunction with MATLAB
codes.
Our goal in this paper is to demonstrate the use of the extended Jacobian framework together with exploitation of the stencil in Jacobian and adjoint calculations. In a subsequent work, we apply our approach to solve a 2-D inverse problem arising in acoustic imaging.
FIG. 4. (a) The excitation used in the examples. (b) The two sound speed profiles, $c_1(x)$ in dots, $c_2(x)$ in solid. (c) The graphs of $(h_2 - h_1)$ and $J(c_1)(c_2 - c_1)$. (d) The graph of $p = J(c_1)^T(h_2 - h_1)$; shown for comparison is the graph of $c_2 - c_1$.
In the first example, for the 1-D problem, we compare the data produced by the two sound speed profiles $c_1(x)$ and $c_2(x)$ of Fig. 4b under the excitation of Fig. 4a. When the medium is $c_1(x)$ the boundary data are $h_1(t)$, and $h_2(t)$ when the medium is $c_2(x)$. Let $q = h_1 - h_2$; the graph of q(t) is displayed in Fig. 4c for m = 200.
We will first compute the Jacobian at $c_1(x)$ times the difference $c_2(x) - c_1(x)$. The resulting output vector should be very close to q(t). A comparison of q(t) with $J(c_1)(c_2 - c_1)$ is shown in Fig. 4c. Next we calculate the adjoint times q(t); i.e.,
$$p = J(c_1)^T(h_2 - h_1).$$
The output of this calculation is the steepest-descent direction corresponding to the nonlinear least-squares functional in (6). This direction should be similar to $c_2(x) - c_1(x)$. The graph of p is shown in Fig. 4d. One can see that the two big signals, which are scaled versions of f, are reproduced near the places where $c_2(x) - c_1(x)$ takes jumps. Unfortunately, the similarity ends there; the result shows that the inverse problem is not very well posed. However, we did check that the Jacobian and the adjoint are correctly computed by evaluating the inner products
$$q^T(J\, dc) \quad\text{and}\quad (J^T q)^T dc \eqno(21)$$
and comparing their values for any choice of c, q, and dc. The agreement is usually 14 digits. Additionally, the result is also confirmed by computing $q^T J\, dc$ using finite differences, where we usually get about 6 digits right. We show a typical run in Fig. 5.
FIG. 5. A test of the correctness of the Jacobian and adjoint calculations. In this example, N = 80 and m = 100. We choose at random two vectors dc and q, displayed in the first row. In the second row, we show $J\, dc$ and $J^T q$. The inner products $q^T J\, dc$ and $dc^T J^T q$ are evaluated; they agree to 14 digits.
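The check in Fig. 5 is the standard dot-product (adjoint-consistency) test; a minimal sketch of our own, with jvp and vjp standing for any pair of Jacobian- and adjoint-product routines under test:

import numpy as np

def dot_product_test(jvp, vjp, dc, q):
    # Eq. (21): q^T (J dc) must equal (J^T q)^T dc for any dc and q.
    lhs = q @ jvp(dc)
    rhs = vjp(q) @ dc
    # relative discrepancy; ~1e-14 indicates consistent J and J^T codes
    return abs(lhs - rhs) / max(abs(lhs), abs(rhs), np.finfo(float).tiny)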
Computation time is linear in the number of x nodes for a fixed number of t nodes. There
is no difficulty with memory as the stencil codes are very simple with a small number of
independent variables.
In the second example, for the 2-D problem, the excitation is a point source. The time-dependent function φ(t) of the source is chosen to be a Gaussian and is sampled at the time increments Δt = 0.55, which is the time step chosen for the finite difference scheme.
Data are collected at 64 stations located at node points. These points are nodes that lie close to a set of points distributed evenly at 64 places on the circumference of a circle of radius 72. We take 381 time steps. A window of size [−70, 70] × [−70, 70] represents where c(x, y) is allowed to vary. Thus the mapping from the sound speed c to the data at the receivers is from $\mathbb{R}^{141 \times 141}$ to $\mathbb{R}^{64 \times 381}$.
In Fig. 6a we display the sound speed distribution in the domain. The receivers are marked with circles; receiver 1 is at 0° from the positive x-axis. The source is marked by a star. Next, in Fig. 6b, we display the receiver data when the medium has the two cylinders shown. The difference between these data and those when the domain is homogeneous is shown in Fig. 6c. In Fig. 6d, we show the result of applying the adjoint to the difference data in Fig. 6c. This process is often referred to as back-propagation and corresponds to the steepest-descent direction for the nonlinear least-squares functional in (8). The resulting vector should resemble the image of the two cylinders. Indeed this is the case, as one can see by comparing Figs. 6d and 6e, the latter displayed for comparison.
In numerous experiments with random vectors, we were able to get the inner products of the form (21) to agree to 14 digits. The adjoint calculations take approximately 115 seconds on a 4-processor SGI Challenge L. The hand-coded gradient calculation takes approximately 90 seconds. The fully AD-generated gradient (using the reverse mode) takes almost 5 minutes to compute. In another run, the size of the sound speed c (the independent variables) was increased from 141 × 141 to 201 × 201. A window of size [−100, 100] × [−100, 100] represents where c(x, y) may vary. The receivers are placed on the circumference of a circle of radius 90. A grid of 241 × 241 node points is set up corresponding to the domain [−120, 120] × [−120, 120]. The adjoint calculations scale very well and take 225 seconds to compute. The hand-coded gradient calculation is marginally more efficient and takes about 200 seconds. However, the fully AD-generated gradient scheme becomes intractable because the volume of data to be stored for the reverse mode is much larger and extra time is spent in memory paging and accessing secondary storage; it takes nearly 20 minutes to finish. This illustrates that the adjoint scheme presented in this paper is automatic yet performs much like traditional hand coding, while the naive use of AD does not provide a reasonably efficient solution.

FIG. 6. (a) The setup for the numerical experiment. The two cylinders represent sound speed anomalies. The darker cylinder is 2% faster and the lighter cylinder 1% faster than the background medium. Shown in circles are the receiver locations. A star marks the location of the point source. (b) The receiver data when the two cylinders are present. (c) The difference between (b) and the receiver data when the medium is constant. (d) The adjoint applied to (c). (e) The two cylinders plotted on the same scale as in (d) for comparison.
7. CONCLUSIONS
We have described an inverse problem arising in wave propagation and how the need
arises for efficient computation of Jacobian and adjoint products when the problem is posed
as a least-squares problem. In this work, we describe the extended Jacobian framework, which gives a high-level description of Jacobian and adjoint vector product calculations. The framework is particularly appropriate for functions whose evaluation involves some type of time stepping, such as those that arise in discretizing the wave equation.
We show further that the stencil structure of the finite difference scheme that provides
the underlying function evaluation can be exploited. Automatic differentiation is applied
at the stencil level, and the resulting subprograms fit nicely within the extended Jacobian
framework. The framework provides a guide for building highly efficient codes for Jacobian
and adjoint vector product evaluations. The one drawback of the approach is that we have
given up some “automation” for efficiency. A small amount of hand-coding is required to
assemble the programs. Nevertheless, our approach provides a way to overcome memory
problems associated with present AD technology.
One important extension of the 2-D problem we discussed is the case when the receivers do not lie on the grid points. The proposed methodology can easily be extended to handle the interpolation between grid values required in this case. It poses no difficulty for the adjoint computation, as the interpolated values are a linear combination of the values at nearby grid points. This can be handled through the definition of general (linear) projection operators which, when applied to the values at all grid points, yield the values at the receivers. This linear operator is trivial to handle in the adjoint computation via AD.
Overall, the idea of exposing the stencil structure is very promising and can lead to an
order of magnitude improvement in the adjoint code, as our numerical results show.
REFERENCES
1. G. Chavent, F. Clement, and S. Gomez, Waveform inversion by MBTT formulation, in Mathematical and
Numerical Aspects of Wave Propagation, edited by Cohen et al. (SIAM, Philadelphia, 1995), p. 713.
2. T. Coleman and A. Verma, The efficient computation of sparse Jacobian matrices using automatic differenti-
ation, SIAM J. Sci. Comput. 19, 1210 (1998).
3. T. Coleman and A. Verma, Structure and efficient Hessian calculation, in Advances in Nonlinear Programming,
edited by Yuan (Kluwer Academic, Boston, 1996).
4. T. Coleman and A. Verma, Structure and efficient Jacobian calculation, in Computational Differentiation:
Techniques, Applications, and Tools, edited by Berz et al. (SIAM, Philadelphia, 1996), p. 149.
5. T. Coleman, F. Santosa, and A. Verma, Semi-automatic differentiation, in Proceedings of Optimal Design and
Control Workshop, VPI, 1997.
6. B. Engquist and A. Majda, Absorbing boundary conditions for the numerical simulation of waves, Math. Comp.
31, 629 (1977).
7. R. Giering, Tangent Linear and Adjoint Model Compiler (User Manual, TAMC Version 4.7, 1997).
8. A. Griewank, Achieving logarithmic growth of temporal and spatial complexity in reverse automatic differ-
entiation, Optim. Methods Software 1, 35 (1992).
9. A. Griewank, Some bounds on the complexity of gradients, Jacobians, and Hessians, in Complexity in Non-
linear Optimization, edited by Pardalos (World Scientific, Singapore, 1993).
10. A. Griewank, D. Juedes, and J. Utke, ADOL-C, a package for the automatic differentiation of algorithms
written in C/C++, ACM Trans. Math. Software 22, 131 (1996).
11. T. Mast, A. Nachman, and R. Waag, Focusing and imaging using eigenfunctions of the scattering operator,
J. Acoust. Soc. Am., in press.
12. F. Santosa and W. Symes, An Analysis of Least-Squares Velocity Inversion (Society of Exploration Geophysi-
cists, Tulsa, 1989).
13. F. Santosa and W. Symes, Computation of the Hessian for least-squares solutions of inverse problems of
reflection seismology, Inverse Problems 4, 211 (1988).
14. W. Symes, A differential semblance criterion for inversion of multioffset seismic reflection data, J. Geophys.
Res. 98, 2061 (1993).
15. W. Symes and C. Zhang, A Finite Difference Time Stepping Class, Rice University TRIP Report, 1997.
16. A. Tarantola, Inverse Problem Theory (Elsevier, Amsterdam, 1987).