
Geophysical Tutorial — Coordinated by Matt Hall

The conjugate gradient method


Karl Schleicher, University of Texas, Bureau of Economic Geology, Jackson School of Geosciences

https://doi.org/10.1190/tle37040296.1

The conjugate gradient method can be used to solve many large linear geophysical problems — for example, least-squares parabolic and hyperbolic Radon transform, traveltime tomography, least-squares migration, and full-waveform inversion (FWI) (e.g., Witte et al., 2018). This tutorial revisits the "Linear inversion tutorial" (Hall, 2016) that estimated reflectivity by deconvolving a known wavelet from a seismic trace using least squares. This tutorial solves the same problem using the conjugate gradient method. This problem is easy to understand, and the concepts apply to other applications. The conjugate gradient method is often used to solve large problems because the least-squares algorithm is much more expensive — that is, even a large computer may not be able to find a useful solution in a reasonable amount of time.

Introduction

The conjugate gradient method was originally proposed by Hestenes and Stiefel (1952) and extended to handle rectangular matrices by Paige and Saunders (1982). Claerbout (2012) demonstrates its application to geophysical problems. It is an iterative method. Each iteration applies the linear operator and its adjoint. The initial guess is often the zero vector, and computation may stop after very few iterations.

The adjoint of the operator A, denoted as Aᴴ, is defined as the operator that satisfies ⟨Ax, y⟩ = ⟨x, Aᴴy⟩ for all vectors x and y (where ⟨u, v⟩ represents the inner product between vectors u and v). For a given matrix, the adjoint is simply the complex conjugate of the transpose of the matrix; this is also sometimes known as the Hermitian transpose and is sometimes written as A* or A†. Just to muddy the notational waters even further, the complex conjugate transpose is denoted by A.H in NumPy and A' in Matlab or Octave. However, we will implement the adjoint operator without forming any matrices.
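To make the definition concrete, here is a small numerical check (my illustration, not part of the original tutorial): for a random complex matrix, the conjugate transpose satisfies the inner-product identity.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=4) + 1j * rng.normal(size=4)

# np.vdot conjugates its first argument, so np.vdot(u, v) is <u, v>.
lhs = np.vdot(A @ x, y)           # <Ax, y>
rhs = np.vdot(x, A.conj().T @ y)  # <x, A^H y>
assert np.isclose(lhs, rhs)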
Many linear operators can be programmed as functions that are more intuitive and efficient than matrix multiplication. The matrices for operators like migration and FWI would be huge, but we avoid this problem because once you have the program for the linear operator, you can write the adjoint operator without computing matrices. Implementing the conjugate gradient algorithm using functions to apply linear operators and their adjoints is practical and efficient. It is wonderful to see programs that implement linear algorithms without matrices, and the programming technique is a key theme in Claerbout's 2012 book.

This tutorial provides a quick start to the conjugate gradient method based on the pseudocode of Guo et al. (2002). Those interested in more depth can read Claerbout (2012) and Shewchuk (1994). A Jupyter Notebook with Python code to reproduce the figures in this tutorial is at https://github.com/seg/tutorials.

The forward and adjoint operators

Starting with a known reflectivity model m, we create synthetic seismic data d = Fm, where F is the linear operator that performs the function "convolve with a Ricker wavelet." Given such a trace and the operator, the conjugate gradient method can be used to estimate the original reflectivity.

Hall (2016) calls the operator G instead of F, and he creates a matrix by shifting the wavelet and padding with zeros. In contrast, I implement the operator using the NumPy function convolve(). This is advantageous because it allows us to solve the linear equation m = F⁻¹d without ever having to construct (or invert) this matrix, which can become very large. This matrix-free approach is faster and uses less memory than the matrix implementation.

We can add one more feature to the operator and implement it with its adjoint. A convenient way to combine the two operations is to use a so-called "object-oriented programming" approach and define a Python class. Then we can have two methods (i.e., functions) defined on the class: forward, implementing the forward operator, and adjoint for the adjoint operator, which in this case is correlation.

import numpy as np

class Operator(object):
    """A linear operator."""

    def __init__(self, wavelet):
        self.wavelet = wavelet

    def forward(self, v):
        """Defines the forward operator: convolution with the wavelet."""
        return np.convolve(v, self.wavelet, mode='same')

    def adjoint(self, v):
        """Defines the adjoint operator: correlation with the wavelet."""
        return np.correlate(v, self.wavelet, mode='same')

Claerbout (2012) teaches how to write this kind of symmetrical code and provides many examples of geophysical operators with adjoints (e.g., derivative versus negative derivative, causal integration versus anticausal integration, stretch versus squeeze, truncate versus zero pad). Writing functions to apply operators is more efficient than computing matrices.

Now that we have the operator, we can instantiate the class with a wavelet. This wavelet will be "built in" to the instance F.

F = Operator(wavelet)

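A standard sanity check on an operator/adjoint pair like this is the dot-product test: for random vectors x and y, ⟨Fx, y⟩ and ⟨x, Fᴴy⟩ should agree to machine precision. A quick check of the instance above (my addition, not from the original tutorial):

rng = np.random.default_rng(42)
x = rng.normal(size=50)
y = rng.normal(size=50)

# <Fx, y> == <x, F^H y> holds if adjoint() really implements the adjoint.
assert np.isclose(np.dot(F.forward(x), y), np.dot(x, F.adjoint(y)))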



Now we can compute d = Fm simply by passing the model m to the method F.forward(), which already has the wavelet:

d = F.forward(m)

This results in the synthetic seismogram shown in Figure 1.

[Figure 1. A plot of reflectivity model m (black) and the synthetic seismic data d (orange).]

The conjugate gradient method

Now that we have a synthetic, we wish to solve the linear inverse problem and estimate the model m from the data d. The model cannot be completely recovered because the Ricker wavelet is band limited, so some information is lost.

One way to solve linear problems is to start with an initial guess and iteratively improve the solution. The next few paragraphs derive an iterative method. You do not need to understand all the derivation, so you might want to read it lightly and move to the section about the pseudocode.
We start with an initial estimate for the model, m̂₀ = 0 (the zero vector), and compute the residual r₀ = d − Fm̂₀ (i.e., the difference between the data and the action of the forward operator on the model estimate). A good measure of the error in the initial solution is the inner product ⟨r₀, r₀⟩, or np.dot(r0, r0) in code. This is equivalent to the squared norm (length) of the residual vector r₀, and constitutes our cost function. If the cost is 0, or within some small tolerance, then we have a solution.

We can improve the estimate m̂₀ by selecting a direction s₀ and a scale α₀ to move m̂₀ to a new guess, m̂₁ = m̂₀ + α₀s₀. You can compute the direction in model space that most rapidly decreases the error. This is the gradient of ⟨d − Fm̂, d − Fm̂⟩ or ⟨r₀, r₀⟩. If you grind through the mathematics, the gradient g₀ turns out to be given by the action of the adjoint operator on the residual: g₀ = Fᴴr₀.

The scalar α₀ is computed to minimize the residual, r₁ = d − Fm̂₁ = d − F(m̂₀ + α₀s₀) = r₀ − α₀Fs₀. We can then compute α₀ by taking the derivative of ⟨r₁, r₁⟩ with respect to α₀, setting the derivative to 0, and solving for α₀. The result is:

α₀ = ⟨g₀, g₀⟩ / ⟨Fs₀, Fs₀⟩.    (1)

These equations define a reasonable approach to iteratively improve an estimated solution, and it is called "the steepest descent method." The conjugate gradient algorithm builds on this. It is not much harder to implement, has similar cost per iteration, and faster convergence.
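To make the derivation concrete, here is a single steepest descent step in code, taking the search direction to be the gradient, s₀ = g₀ (which is what makes it "steepest descent"). This sketch is my bridge to the pseudocode below, not code from the original tutorial:

m_est = np.zeros_like(d)    # initial estimate, the zero vector
r = d - F.forward(m_est)    # residual r0 = d - F m0

g = F.adjoint(r)            # gradient g0 = F^H r0
s = g                       # steepest descent direction s0 = g0
Fs = F.forward(s)
alpha = np.dot(g, g) / np.dot(Fs, Fs)   # equation (1)

m_est = m_est + alpha * s   # new guess m1 = m0 + alpha * s0
r = r - alpha * Fs          # new residual r1 = r0 - alpha * F s0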
The first iteration of the conjugate gradient method is the same as the steepest descent method. The second (and later) iterations compute the gradient g₁ and Fg₁. With s₀ and Fs₀ from the previous iteration, we can then compute the step direction. Scalars α and β are computed to minimize

r₂ = d − F(m̂₁ + αs₀ + βg₁)    (2)

Some mathematical manipulations determine that the best direction is s₁ = g₁ + βs₀, where β = ⟨g₁, g₁⟩ / ⟨g₀, g₀⟩. The conjugate gradient algorithm is guaranteed to converge when the number of iterations is equal to the dimension of m̂, but only a few iterations often give sufficient accuracy. For our implementation, we'll start with a simplified version of the pseudocode provided by Guo et al. (2002):

m̂ = 0
r = d − Fm̂
s = 0
β = 0

iterate n times:
    g = Fᴴr
    if not first iteration:
        β = ⟨g, g⟩ / γ
    γ = ⟨g, g⟩
    s = g + βs
    Δr = Fs
    α = γ / ⟨Δr, Δr⟩
    m̂ = m̂ + αs
    r = r − αΔr

And its implementation in Python:

n = 5  # number of iterations; Figure 2 shows five

m_est = np.zeros_like(d)
r = d - F.forward(m_est)
s = np.zeros_like(d)
beta = 0

for i in range(n):
    g = F.adjoint(r)
    if i != 0:
        beta = np.dot(g, g) / gamma
    gamma = np.dot(g, g)
    s = g + beta * s
    deltar = F.forward(s)
    alpha = gamma / np.dot(deltar, deltar)
    m_est = m_est + alpha * s
    r = r - alpha * deltar
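For reuse, the loop can be wrapped in a function that records the cost ⟨r, r⟩ at each iteration and stops early once it falls within a small tolerance, as discussed above. The function wrapper and the stopping test are my additions, not part of Guo et al.'s pseudocode; the loop body is unchanged:

def conjugate_gradient(F, d, n, tol=0.0):
    """Matrix-free conjugate gradient for d = Fm.

    Returns the model estimate and the cost <r, r> after each iteration.
    """
    m_est = np.zeros_like(d)
    r = d - F.forward(m_est)
    s = np.zeros_like(d)
    beta = 0.0
    costs = [np.dot(r, r)]
    for i in range(n):
        g = F.adjoint(r)
        if i != 0:
            beta = np.dot(g, g) / gamma
        gamma = np.dot(g, g)
        s = g + beta * s
        deltar = F.forward(s)
        alpha = gamma / np.dot(deltar, deltar)
        m_est = m_est + alpha * s
        r = r - alpha * deltar
        costs.append(np.dot(r, r))
        if costs[-1] <= tol:  # cost within tolerance: we have a solution
            break
    return m_est, costs

m_est, costs = conjugate_gradient(F, d, n=5)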
Results

The Python code in the previous section was used to invert for reflectivity. Figure 2 shows the five iterations of the conjugate gradient method. The conjugate gradient method converged in only four iterations; the results of the fourth and fifth iterations almost exactly overlay on the plot. Fast convergence is important for a practical algorithm. Convergence is guaranteed in 50 iterations (the dimension of the model).

[Figure 2. The initial estimate and the first 5 iterations of conjugate gradient. The fifth iteration almost exactly overlays the fourth.]

Figure 3 compares the original model and the model estimated using conjugate gradient inversion. Conjugate gradient inversion does not completely recover the model because the Ricker wavelet is band limited, but side lobes are reduced compared to the data.

[Figure 3. Comparison of the model (black) and the model estimated using conjugate gradient inversion (purple), along with the data (orange) for comparison.]



Finally, we compute the predicted data from the estimated model:

d_pred = F.forward(m_est)

Figure 4 compares the predicted data with the original data to show that we have done a good job of the estimation. This demonstrates that more than one reflectivity sequence, when convolved with the Ricker wavelet, fits the data: in particular, the original model and the model estimated by the conjugate gradient method. It may be interesting to explore preconditioning operators that promote a sparse solution.

[Figure 4. Comparison of the predicted data d_pred from the conjugate gradient inversion with the original data d. It overplots the data almost exactly.]

The Jupyter notebook provided with this tutorial further explores finding least-squares solutions using the conjugate gradient method. The notebook demonstrates how preconditioning can be used to promote a sparse solution. It also provides examples using the solver provided in the SciPy package.
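The SciPy route works because scipy.sparse.linalg wraps matrix-free operators in a LinearOperator, whose matvec and rmatvec arguments play the same roles as our forward and adjoint methods; its lsqr solver implements the algorithm of Paige and Saunders (1982). A minimal sketch (my example; the notebook's actual code may differ):

from scipy.sparse.linalg import LinearOperator, lsqr

n_samples = len(d)
A = LinearOperator((n_samples, n_samples),
                   matvec=F.forward,    # forward: convolve with the wavelet
                   rmatvec=F.adjoint)   # adjoint: correlate with the wavelet

# LSQR solves the least-squares problem without ever forming the
# matrix, just like our conjugate gradient loop.
m_lsqr = lsqr(A, d, iter_lim=5)[0]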
Conclusions

I described the conjugate gradient algorithm and presented an implementation. This is an iterative method that requires functions to apply the linear operator and its adjoint. Many linear operators that represent familiar geophysical operations, like convolution, are more efficiently implemented without matrices. The reflectivity estimation problem described in Hall (2016) was solved using the conjugate gradient method. Convergence took only four iterations. The conjugate gradient method is often used to solve large problems because well-known solvers like least squares are much more expensive.

Acknowledgments
The SEG Seismic Working Workshop on Reproducible Tutorials, held 9–13 August 2017 in Houston, inspired this tutorial. For more information, visit http://ahay.org/wiki/Houston_2017.

Corresponding author: [email protected]

References

Claerbout, J., and S. Fomel, 2012, Image estimation by example, http://sepwww.stanford.edu/sep/prof/gee1-2012.pdf, accessed 6 March 2018.
Guo, J., H. Zhou, J. Young, and S. Gray, 2002, Merits and challenges for accurate velocity model building by 3D gridded tomography: 72nd Annual International Meeting, SEG, Expanded Abstracts, 854–857, https://doi.org/10.1190/1.1817395.
Hall, M., 2016, Linear inversion: The Leading Edge, 35, no. 12, 1085–1087, https://doi.org/10.1190/tle35121085.1.
Hestenes, M. R., and E. Stiefel, 1952, Methods of conjugate gradients for solving linear systems: Journal of Research of the National Bureau of Standards, 49, no. 6, https://doi.org/10.6028/jres.049.044.
Paige, C. C., and M. A. Saunders, 1982, LSQR: An algorithm for sparse linear equations and sparse least squares: ACM Transactions on Mathematical Software, 8, no. 1, 43–71, https://doi.org/10.1145/355984.355989.
Shewchuk, J. R., 1994, An introduction to the conjugate gradient method without the agonizing pain, http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf, accessed 6 March 2018.
Witte, P., M. Louboutin, K. Lensink, M. Lange, N. Kukreja, F. Luporini, G. Gorman, and F. J. Herrmann, 2018, Full-waveform inversion, Part 3: Optimization: The Leading Edge, 37, no. 2, 142–145, https://doi.org/10.1190/tle37020142.1.

© The Author(s). Published by the Society of Exploration Geophysicists. All article content, except where otherwise noted (including republished material), is licensed under a Creative Commons Attribution 3.0 Unported License (CC BY-SA). See https://creativecommons.org/licenses/by-sa/3.0/. Distribution or reproduction of this work in whole or in part commercially or noncommercially requires full attribution of the original publication, including its digital object identifier (DOI). Derivatives of this work must carry the same license.

