Advances in Geophysics, Volume 55
© 2014 Elsevier Inc. All rights reserved. ISSN 0065-2687
http://dx.doi.org/10.1016/bs.agph.2014.08.001

CONTENTS

Contributors vii

1. Seismic Tomography and the Assessment of Uncertainty 1


Nicholas Rawlinson, Andreas Fichtner, Malcolm Sambridge and Mallory K. Young
1. Introduction 2
2. Nonuniqueness in Seismic Tomography 14
3. Practical Assessment Methods 28
4. Case Studies 46
5. Concluding Remarks 64
Acknowledgments 65
References 66

2. El Niño/Southern Oscillation and Selected Environmental Consequences 77
Tomasz Niedzielski
1. Introduction 77
2. Fundamentals of El Niño/Southern Oscillation 79
3. What Triggers El Niño/Southern Oscillation? 87
4. El Niño/Southern Oscillation in the Past 90
5. El Niño/Southern Oscillation versus Selected Geophysical Processes and Their Predictions 95
6. Concluding Remarks 114
Acknowledgments 115
References 115

Index 123

CHAPTER ONE

Seismic Tomography and the Assessment of Uncertainty
Nicholas Rawlinson*,1, Andreas Fichtner†, Malcolm Sambridge‡ and Mallory K. Young§
*School of Geosciences, University of Aberdeen, Aberdeen, Scotland, UK
†Institute of Geophysics, Department of Earth Sciences, ETH Zurich, Zurich, Switzerland
‡Research School of Earth Sciences, Australian National University, Canberra, ACT, Australia
§DownUnder GeoSolutions Pty Ltd, West Perth, WA, Australia
1Corresponding author: E-mail: [email protected]

Contents
1. Introduction 2
1.1 Motivation 2
1.2 Historical Perspective 5
1.3 Uncertainty in the Age of Big Data 12
2. Nonuniqueness in Seismic Tomography 14
2.1 Data Coverage 15
2.2 Data Noise 17
2.3 The Parameterization Problem 21
2.4 The Data Prediction Problem 23
2.5 The Inverse Problem 25
3. Practical Assessment Methods 28
3.1 Covariance and Resolution 28
3.2 Jackknife and Bootstrap 36
3.3 Synthetic Reconstruction Tests 38
3.4 Linear and Iterative Nonlinear Sampling 39
3.5 Fully Nonlinear Sampling 44
4. Case Studies 46
4.1 Synthetic Reconstruction Test: Teleseismic Tomography Example 46
4.2 Iterative Nonlinear Sampling: Surface Wave Tomography Example 52
4.3 Transdimensional Inversion: Surface Wave Tomography Example 55
4.4 Full Waveform Inversion: Resolution Analysis Based on Second-Order Adjoints 60
5. Concluding Remarks 64
Acknowledgments 65
References 66

Abstract
Seismic tomography is a powerful tool for illuminating Earth structure across a range of
scales, but the usefulness of any image that is generated by this method is dependent
on our ability to quantify its uncertainty. This uncertainty arises from the ill-posed nature
of the tomographic inverse problem, which means that multiple models are capable of
satisfying the data. The goal of this review is to provide an overview of the current state
of the art in the assessment of uncertainty in seismic tomography, and issue a timely
reminder that compared to the rapid advances made in many other areas of Earth
imaging, uncertainty assessment remains underdeveloped and is often ignored or given
minimal treatment in published studies. After providing a historical perspective that dates
back to the pioneering work of the early 1970s, the factors that control solution nonunique-
ness are discussed, which include data coverage, data noise, choice of parameterization,
method used for data prediction and formulation of the inverse problem. This is followed
by a description of common methods used to assess solution uncertainty and a commen-
tary on their strengths and weaknesses. The final section of the review presents four case
studies involving data sets from Australia and Europe that use different methods to assess
uncertainty. The descriptive nature of this review, which does not contain detailed math-
ematical derivations, means that it is suitable for the many nonspecialists who make use of
seismic tomography results but may not have a full appreciation of their reliability.

1. INTRODUCTION
1.1 Motivation
For over 40 years seismic tomography has been the primary tool for
revealing the heterogeneous nature of Earth’s internal structure across a large
range of scales. From its origins in early active source (Bois, La Porte,
Lavergne, & Thomas, 1971) and passive source (Aki, Christoffersson, &
Husebye, 1977; Aki & Lee, 1976; Dziewonski, Hager, & O’Connell,
1977) travel time studies, seismic tomography has become increasingly
sophisticated and powerful in response to advances in methodology, rapid
improvements in computing power, and growth in the availability of high-
quality digital data. Today, we have reached the point where massive inverse
problems involving millions of unknowns and tens of millions of data values
can be tackled (e.g., Burdick et al., 2014); where the entire waveform can be
inverted rather than a derivative component such as travel time (e.g., Chen,
Zhao, & Jordan, 2007; Fichtner, 2011; Tape, Liu, Maggi, & Tromp, 2010);
where multiscale structures can be recovered in a single inversion (Bodin,
Sambridge, Tkalcic, et al., 2012; Burdick et al., 2008; Fichtner et al.,
2013); where multiple data sets can be jointly inverted (Obrebski, Allen, Pol-
litz, & Hung, 2011; Rawlinson & Urvoy, 2006; West, Gao, & Grand, 2004),
including data sets of different classes such as surface wave dispersion, gravity,
and heat flow (Afonso, Fullea, Yang, Connolly, & Jones, 2013); and where
various seismic properties, including P- and S-wave velocity and attenuation
can be recovered, as well as, in some cases, other physical and material
properties such as temperature and composition (Khan, Boschi, & Connolly,
2011; Afonso, Fullea, Griffin, et al., 2013, Afonso, Fullea, et al., 2013). As a
consequence, our knowledge of the Earth’s internal structure, composition,
and dynamics is rapidly improving.
Despite the growing power of seismic tomography as a tool to image the
Earth’s interior, there remains one crucial facet of the technique that has
only seen limited improvement in recent times. This is the issue of solution
robustness, which arises from the ill-posed nature of the tomographic inverse
problem. According to the original definition of Hadamard (1902,
pp. 49–52), a well-posed problem in mathematics is characterized by having
a solution that exists, is unique, and changes continuously with respect to
initial conditions. In most practical seismic tomography applications, the in-
verse problem is under- or mixed-determined, so multiple data-satisfying
solutions exist, and solutions (e.g., maximum likelihood in a linearized least
squares formulation) tend to be unstable with respect to small perturbations
in prior information and data noise in the absence of regularization.
Although relatively simple, Figure 1 provides useful insight into the
ill-posed nature of seismic tomography. The synthetic model consists of
high- and low-wave speed perturbations relative to a background value of
3.0 km/s and is of variable scale length (Figure 1(a)). The source of data is

Figure 1 Simple ray tracing example that demonstrates why seismic tomography
problems tend to be ill-posed. (a) Velocity perturbations overlain by a set of sources
(white stars) and receivers (blue triangles); (b) two-point rays traced between all sources
and receivers. (For interpretation of the references to color in this figure legend, the
reader is referred to the online version of this book.)

provided by an irregular distribution of sources and receivers. If we assume
that geometric ray theory is valid and that only first arrivals are identifiable,
then the data coverage (Figure 1(b)) is uneven, not just because of the
source–receiver configuration, but also because the ray paths bend in
response to variations in wave speed. For first arrivals, the tendency is to
avoid low-velocity regions and preferentially sample high-velocity regions.
In this case, the tomographic inverse problem can be formulated as one of
finding a pattern of wave speeds that satisfy the two-point travel time
data. From the path coverage in Figure 1(b), it seems obvious that the solu-
tion to such a problem would be nonunique, for instance, the high velocity
anomaly in the top left corner could assume any value without influencing
the data. Similarly, a number of low-velocity regions within the array are
poorly sampled, and could likely assume a range of data-satisfying values.
Another consideration is that all seismic data contain noise, and as this noise
increases, so does the range of models that fit the data equally well.
In applications involving data recorded in the field, uncertainty arises not
only in the manner described above with regard to Figure 1, but also as a
result of simplifying assumptions in the physics of the forward problem,
limitations on the range of possible structures imposed by the choice of
parameterization, and assumptions about the distribution and magnitude
of the noise. All of these influences on the solution are extremely difficult
to quantify. To further compound the problem, when it actually comes
to interpreting the seismic results (e.g., P-wave velocity image) in terms of
temperature, composition, and other physical properties (e.g., grain size,
presence of melt) that provide direct insight into subsurface structure and
processes, there is an additional layer of nonuniqueness; for instance, a
decrease in P-wave velocity could be due to an increase in temperature,
an increase in melt, or a compositional change. As a consequence, even if
features appear to stand out clearly in a tomographic image, their meaning
is very often open to debate. For instance, while some authors cite images
of low wave speeds extending throughout much of the mantle as evidence
in support of mantle plume theory (e.g., Montelli, Nolet, Dahlen, &
Masters, 2006; Montelli et al., 2004; Wolfe et al., 2009), others have sug-
gested quite different convective mantle regimes (e.g., Foulger et al.,
2013). Thus, uncertainty in seismic tomography affects a wide range of Earth
scientists (e.g., geodynamicists, mantle geochemists) who utilize the singular
insights into deep Earth structure provided by this branch of geophysics, and
is not something that should be regarded as of interest to specialists only.

The pioneers of seismic tomography (e.g., Aki et al., 1977), and indeed
geophysical inverse problems (Backus & Gilbert, 1967, 1968, 1969, 1970),
were well aware of the issues surrounding solution robustness, and that simply
producing a model that satisfied the data was not meaningful, unless associated
error bounds could be determined. Yet even today, with vast computing and
intellectual resources at our fingertips, it is all too common to find examples of
seismic tomography in the literature where solution robustness is either
ignored or given minimal treatment. It therefore seems timely to provide
an overview of the various methods that have been devised to assess model
uncertainty and consider their strengths and weaknesses. We here restrict our-
selves to methods implemented in practical seismic tomography, noting that
inverse theory is a vast field with many applications throughout the physical
sciences, complete coverage of which is well beyond the scope of this paper.
That said, there are other geophysical inverse problems that share similar chal-
lenges, for example, magnetic and gravity data inversion (Li & Oldenburg,
1996, 1998), which can also involve thousands or millions of unknowns.
As such, much of what is covered here is also applicable to other fields.
In the following sections, a descriptive approach is favored over one
involving detailed derivations in order to appeal to nonspecialists and stu-
dents who may have limited background in this area. For those interested
in the more mathematical aspects of the theory, sufficient references are
included throughout the text. After providing a brief historical perspective,
we discuss the causes of solution nonuniqueness in seismic tomography and
then go on to describe a range of methods used to assess model robustness. A
series of case studies are then presented to showcase a number of methods,
ranging from the more traditional to the cutting edge. Note that some prior
knowledge of seismic tomography methodology is assumed. For further in-
formation in this regard, interested readers are referred to several books and
review articles on the subject (Iyer & Hirahara, 1993; Liu & Gu, 2012;
Nolet, 1987; Nolet, 2008; Rawlinson & Sambridge, 2003; Romanowicz,
2003; Rawlinson, Pozgay, & Fishwick, 2010; Trampert & Fichtner, 2013a).

1.2 Historical Perspective


Although much of the fundamental framework for geophysical data inver-
sion was laid down by Backus and Gilbert (1967, 1968, 1969, 1970) and
Wiggins (1972), the first examples of seismic tomography were published
half a decade later (Aki & Lee, 1976; Aki et al., 1977; Dziewonski et al.,
1977). In the seminal work of Aki et al. (1977), on what is now known
as teleseismic tomography, P-wave arrival time residuals from distant earth-
quakes are inverted for 3-D slowness variations beneath a seismic array. It is
assumed that the wave impinging on the model region from below is planar,
and that variations in wave speed can be described by a regular grid of con-
stant slowness blocks. A further assumption is that the geometry of ray paths
that penetrate the 3-D model region are only influenced by depth variations
in wave speed. As a result of these assumptions, the inverse problem,
although ill-posed, is linear. The authors use a damped least squares
approach to solve the linear inverse problem, and also produce formal esti-
mates of model resolution and covariance. Despite being published a year
earlier, the subsequent study of Aki and Lee (1976) represents the first
example of local earthquake tomography, in which hypocenter parameters
as well as slowness structure are simultaneously inverted for using arrival
times. In this case, posterior covariance and resolution estimates are made
for slowness structure, source location, and source origin time.
In the study of Dziewonski et al. (1977), 700,000 teleseismic P-wave
travel time residuals are inverted for the 3-D velocity structure of the mantle
described in terms of spherical harmonics. The authors use a similar approach
to Aki et al. (1977) to solve the linearized inverse problem and also produce
formal estimates of resolution. The earliest published example of seismic
tomography in an active source (cross-hole) context (Bois et al., 1971) is
2-D but accounts for the path dependence on velocity structure by using a
shooting method of ray tracing, in which the trajectory of rays are iteratively
adjusted until source–receiver paths are obtained. A damped least squares
approach similar to that used by Aki et al. (1977) is applied in an iterative
manner to solve the inverse problem. Although no estimates of model uncer-
tainty are provided in this case, the authors clearly recognize the issue of so-
lution nonuniqueness and perform several inversions using different input
parameters (such as cell size) to examine the sensitivity of the solution to these choices.
Of all methods for assessing robustness in seismic tomography, the
synthetic reconstruction test is by far the most ubiquitous in the published
literature; even today, most seismic tomography results are accompanied
by a test of this nature (e.g., Rawlinson, Pozgay, et al., 2010). Although
there are many variants, the basic commonality is that there is some
contrived, synthetic, or known structure through which the forward
problem is solved, using identical sources, receivers, and phase types as the
observational data set. This creates a synthetic data set, which is as accurate
as permitted by any approximations made in the forward solution. The next
step is to carry out an inversion of the synthetic data set in an attempt to
recover the known structure. Differences between the reconstruction and
the known structure provide insight into the resolution limits of the data
set. As discussed later, this approach to analyzing solution nonuniqueness
has its drawbacks, but its relative ease of implementation, even with very
large data sets, and the apparent simplicity of interpreting the output has
made it extremely popular.
The first use of synthetic reconstruction tests in seismic tomography was
actually made by Aki and Lee (1976) in their simultaneous inversion of local
earthquake travel times for 3-D P-wave velocity structure and hypocenter
location. In this study, they examine three synthetic models: the first is a
simple constant velocity half-space; the second a layered medium; and the
third a simple laterally heterogeneous model that simulates the presence of
a transform fault cross-cutting two media characterized by different veloc-
ities. The aim of these tests was to examine trade-offs between hypocenter
location and velocity variation. To simulate the effects of observational er-
ror, random noise was also added to the synthetic data sets. Future synthetic
reconstruction tests gradually introduced larger data sets, more sophisticated
forward solvers, and more complex synthetic models, but the underlying
approach used is essentially the same.
Today, the most commonly used model for synthetic tests is the so-called
checkerboard model, which consists of a regular alternating pattern of positive
and negative anomalies (e.g., positive and negative velocity perturbations
relative to some reference model) along each spatial dimension of the model.
This is an extension of the spike test (Walck & Clayton, 1987) in which the
synthetic model contains one or more short-wavelength anomalies; invert-
ing the associated synthetic data provides insight into smearing. The check-
erboard test was first introduced by Spakman and Nolet (1988) and rapidly
became very popular (Day, Peirce, & Sinha, 2001; Glahn & Granet, 1993;
Graeber, Houseman, & Greenhalgh, 2002; Granet & Trampert, 1989;
Rawlinson & Kennett, 2008; Rawlinson, Salmon, & Kennett, 2013;
Ritsema, Nyblade, Owens, Langston, & VanDecar, 1998) due largely to
its relative ease of interpretation. However, as discussed in more detail later,
the insight into solution nonuniqueness provided by a checkerboard test is
relatively limited (e.g., Lévêque, Rivera, & Wittlinger, 1993).
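
To make the recipe concrete, the following is a minimal sketch of a checkerboard-style recovery test, not the authors' code: the grid dimensions, anomaly amplitude, noise level, damping value, and in particular the random "ray path" matrix standing in for a real traced geometry are all illustrative assumptions.

```python
import numpy as np

# --- Checkerboard resolution test: a minimal, self-contained sketch ---
# The ray geometry here is a random stand-in for a real source-receiver
# path distribution; grid size, cell size and anomaly amplitude are
# arbitrary choices for illustration only.

rng = np.random.default_rng(0)
nx, ny = 16, 12                      # grid cells in x and y
cell = 10.0                          # cell size (km)
v0 = 3.0                             # background velocity (km/s)

# 1. Build the checkerboard: alternating +/- 5% velocity perturbations.
ix, iy = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
dv = 0.05 * v0 * (-1.0) ** (ix + iy)
m_true = (1.0 / (v0 + dv) - 1.0 / v0).ravel()    # slowness perturbation

# 2. Stand-in linear forward operator: each row holds the path length of
#    one "ray" in each cell (a real test would trace rays through the model).
n_rays = 400
G = rng.uniform(0.0, 1.0, size=(n_rays, nx * ny))
G *= (rng.uniform(size=G.shape) < 0.15) * cell   # sparse, ray-like rows

# 3. Synthetic travel-time residuals, with Gaussian picking noise added.
sigma = 0.05                                     # noise std (s)
d = G @ m_true + rng.normal(0.0, sigma, n_rays)

# 4. Damped least-squares reconstruction (Tikhonov regularization).
eps = 1.0
A = np.vstack([G, eps * np.eye(nx * ny)])
b = np.concatenate([d, np.zeros(nx * ny)])
m_rec = np.linalg.lstsq(A, b, rcond=None)[0]

# 5. Compare recovery with the input pattern: where the checkerboard is
#    smeared or absent, the data provide little constraint.
corr = np.corrcoef(m_true, m_rec)[0, 1]
print(f"correlation between input and recovered checkerboard: {corr:.2f}")
```

The same workflow applies to spike tests; only the synthetic input model changes.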
The drawbacks of synthetic testing and the difficulty of computing formal
estimates of resolution and covariance for large inverse problems motivated
researchers to look elsewhere for estimates of solution uncertainty. Statistics
provides a number of standard tests for measuring accuracy that can be readily
applied to potentially large inverse problems. These include bootstrapping
and jackknifing, both of which are based on carrying out repeat inversions us-
ing different subsets of the data and then making an assessment of uncertainty
from the ensemble of solutions that are produced. Both bootstrapping and
jackknifing have been used in seismic tomography (Gung & Romanowicz,
2004; Lees & Crosson, 1989; Su & Dziewonski, 1997; Zelt, 1999), but ex-
amples in the published literature are few and far between.
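
As a rough illustration of the bootstrap idea (not drawn from any of the cited studies), the sketch below repeatedly resamples a synthetic linear travel-time system with replacement, repeats a simple damped least-squares inversion for each resampled data set, and takes the standard deviation across the ensemble as an uncertainty estimate. The matrix G, the true model, and the noise level are random stand-ins; a delete-one jackknife would differ only in how the subsets are drawn.

```python
import numpy as np

# Bootstrap uncertainty estimate for a linear(ized) tomographic system
# d = G m. The system below is a small random stand-in; in practice G and
# d come from ray tracing and picked travel times.

rng = np.random.default_rng(1)
n_data, n_model = 200, 50
G = rng.uniform(0.0, 1.0, size=(n_data, n_model)) * (rng.uniform(size=(n_data, n_model)) < 0.2)
m_true = rng.normal(0.0, 0.02, n_model)
d = G @ m_true + rng.normal(0.0, 0.01, n_data)

def damped_lstsq(G, d, eps=0.5):
    """Simple Tikhonov-damped least-squares solve."""
    A = np.vstack([G, eps * np.eye(G.shape[1])])
    b = np.concatenate([d, np.zeros(G.shape[1])])
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Repeat the inversion on data sets resampled with replacement.
n_boot = 100
models = np.empty((n_boot, n_model))
for k in range(n_boot):
    idx = rng.integers(0, n_data, size=n_data)   # bootstrap resample
    models[k] = damped_lstsq(G[idx], d[idx])

m_mean = models.mean(axis=0)          # ensemble average model
m_std = models.std(axis=0)            # bootstrap uncertainty estimate
print("mean bootstrap standard deviation:", m_std.mean())
```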
As computing power increased during the 1990s and new methods were
developed to tackle very large linear inverse problems, the issue of trying to es-
timate covariance and resolution in the presence of many thousands of
unknowns was revisited. For example, Zhang and McMechan (1995) use an
extension of LSQR, a variant of the conjugate gradient method developed
by Paige and Saunders (1982), to approximate resolution and covariance
matrices for problems involving thousands of unknowns and tens of thousands
of observations. Yao, Roberts, and Tryggvason (1999) provide an alternative
approach to estimating resolution and covariance using LSQR and Zhang and
Thurber (2007) apply a method that also relies on Lanczos bidiagonalization
but yields the full resolution matrix and sidesteps the issue of whether subspace
methods, like LSQR, can produce useful estimates of uncertainty given that
they are restricted to exploring a small subspace of the full model space at each
iteration (Nolet, Montelli, & Virieux, 1999).
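
For orientation only, the snippet below shows how a damped tomographic system might be handed to a generic LSQR implementation (here SciPy's); the sparse matrix and data vector are random placeholders, and the damping value is arbitrary. Note that LSQR alone returns only the solution; the resolution and covariance estimates discussed above require the additional machinery developed in the cited papers, which is not reproduced here.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# Placeholder sparse sensitivity matrix and data vector; in a real problem
# these come from ray tracing (or sensitivity kernels) and observed residuals.
rng = np.random.default_rng(2)
n_data, n_model = 5000, 2000
G = sparse_random(n_data, n_model, density=0.01, random_state=2, format="csr")
d = rng.normal(0.0, 1.0, n_data)

# LSQR with Tikhonov damping: minimizes ||G m - d||^2 + damp^2 ||m||^2.
result = lsqr(G, d, damp=0.1, atol=1e-8, btol=1e-8)
m_est = result[0]                       # solution vector
istop, itn = result[1], result[2]
print(f"LSQR stopped with code {istop} after {itn} iterations")
```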
A variety of other, more peripheral techniques (in the sense that they
have not gained common usage) have been suggested in the last few de-
cades for assessing model robustness in the context of linear and iterative
nonlinear inversion methods. Several of these fall into the category of pro-
ducing multiple data-satisfying models from which summary information is
produced. For example, Vasco, Peterson, and Majer (1996) use multiple
starting models to generate a set of solutions to which cluster analysis is
applied to retrieve the more robust features. Deal and Nolet (1996), within
a strictly linear framework, identify model null-space vectors along which
the solution can change but the data fit is essentially invariant. This “null-
space shuttle” enables one to produce an ensemble of data fitting solutions
with high computational efficiency, as demonstrated by the recent paper of
de Wit, Trampert, and van der Hilst (2012), in which the uncertainty of
detailed global P-wave models is assessed. Rawlinson, Sambridge, and
Saygin (2008) develop a dynamic objective function approach to generating
multiple solution models in which the objective function is modified in
response to the generation of each new model so that future models are de-
terred from visiting previously sampled regions of model space. The inverse
problem therefore only needs to be solved a limited number of times before
the full range of features allowed by the data is revealed. For all of the above
sampling techniques, taking the average model and the standard deviation of
the ensemble as summary information is one way of interpreting the results.
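
A bare-bones numerical illustration of the null-space idea follows; it is a toy sketch of the underlying linear algebra, not the null-space shuttle of Deal and Nolet (1996) as implemented for large problems. Singular vectors of G associated with negligible singular values span directions in model space along which the model can be changed without altering the predicted data.

```python
import numpy as np

# Model null-space directions of a linear forward operator G: perturbing
# a solution along these directions leaves the predicted data (almost)
# unchanged. G is a small random stand-in for a real sensitivity matrix.

rng = np.random.default_rng(3)
G = rng.normal(size=(30, 60))          # underdetermined: 30 data, 60 unknowns
m0 = rng.normal(size=60)               # some data-satisfying reference model
d0 = G @ m0

# SVD of G; rows of Vt beyond the numerical rank span the model null space.
U, s, Vt = np.linalg.svd(G, full_matrices=True)
rank = int(np.sum(s > 1e-10 * s[0]))
null_basis = Vt[rank:].T               # shape (60, 60 - rank)

# Build an alternative model by adding an arbitrary null-space combination.
m_alt = m0 + null_basis @ rng.normal(size=null_basis.shape[1])
print("max change in predicted data:", np.abs(G @ m_alt - d0).max())
print("max change in model:", np.abs(m_alt - m0).max())
```

The predicted-data change is at machine precision while the model change can be large, which is precisely why a single data-fitting model can be a misleading summary.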
Traditionally, seismic tomography has relied on regular parameteriza-
tions to represent structure. Due to the well-known trade-off between res-
olution and variance (Backus & Gilbert, 1968), most data sets yield models in
which the uncertainty can vary significantly as a function of location while
spatial resolution is held constant. The other end-member approach is to
attempt to keep model variance constant and vary the spatial resolution of
the recovered model. Although this presents certain computational chal-
lenges, it has the potential advantage that solution robustness is relatively
uniform across the model. Early work by Chou and Booker (1979) and Tar-
antola and Nercessian (1984), in which “blockless” strategies are developed,
pioneered this approach, and were eventually followed by variable mesh
methods, which are becoming increasingly common (Abers & Roecker,
1991; Bijwaard, Spakman, & Engdahl, 1998; Burdick et al., 2014; Curtis
& Snieder, 1997; Fukao, Obayashi, Inoue, & Nebai, 1992; Michelini,
1995; Montelli et al., 2004; Nolet & Montelli, 2005; Sambridge & Gud-
mundsson, 1998; Sambridge, Braun, & McQueen, 1995; Spakman & Bij-
waard, 2001). However, the challenge of working out how to spatially
vary the resolution of recovery in response to the information content of
the data is nontrivial, and to date there is no method that can guarantee
that model variance is constant throughout the model, let alone what the
value of the variance might be. Recent advances in wavelet decomposition
methods for representing structure (Chiao & Kuo, 2001; Loris, Nolet, Dau-
bechies, & Dahlen, 2007; Simons et al., 2011; Tikhotsky & Achauer, 2008)
may help alleviate this limitation.
In the last decade, there has been an increased focus on nonlinear sam-
pling methods that produce an ensemble of data fitting models that can be
subsequently interrogated for robust features. In many cases, these methods
do not rely on the assumption of local linearization, which makes them
attractive for highly nonlinear problems. The down side is, of course, the
requirement for huge computational resources, but with rapid improvements
in computing power, such problems are gradually becoming more tractable.
Early attempts at fully nonlinear tomography, which were cast in the form of
global optimization problems, include Pullammanappallil and Louie (1993)
and Boschetti, Dentith, and List (1996) for 2-D reflection and
refraction travel time tomography, and in 3-D, Asad, Pullammanappallil,
Anooshehpoor, and Louie (1999) for local earthquake tomography. Apart
from the limited computing power available at the time, which necessitated
the use of relatively small data sets, these pioneering efforts also relied on reg-
ular static parameterizations which did not account for spatial variations in the
constraining power of the data.
Monte Carlo methods form the basis of most fully nonlinear inversion tech-
niques developed for seismic tomography. Sambridge and Mosegaard (2002)
define Monte Carlo methods as “experiments making use of random
numbers to solve problems that are either probabilistic or deterministic in na-
ture.” The origin of Monte Carlo methods can be traced back to the begin-
ning of the nineteenth century, if not before (Sambridge & Mosegaard, 2002),
but much of the pioneering work on modern techniques that are still used
today originated in the 1960s (Hammersley & Handscomb, 1964; Press,
1968). The first paper to introduce Monte Carlo inversion methods into
geophysics was by Keilis-Borok and Yanovskaya (1967), which is based on
earlier work in the Union of Soviet Socialist Republics where much of the
initial development took place. Simulated annealing, a nonuniform Monte
Carlo method for global optimization, was introduced into geophysics in
the work of Rothman (1985, 1986). Genetic algorithms were first used in
geophysics in the early 1990s (Sambridge & Drijkoningen, 1992; Stoffa &
Sen, 1991) for global optimization problems, and proved popular for solving
highly nonlinear inverse problems involving a relatively small number of un-
knowns. The works cited in the previous paragraph by Pullammanappallil and
Louie (1993) and Boschetti et al. (1996) used inversion methods based on
simulated annealing and genetic algorithms, respectively.
Instead of using Monte Carlo techniques to directly solve global optimi-
zation problems, which produces a best fitting model, an alternative is to
exploit the sampling they produce to assess uncertainty and trade-off issues,
which are inherent to most geophysical inverse problems (Sambridge, 1999).
It is in this context that Monte Carlo methods are seeing a resurgence in
seismic tomography today, with the development of several techniques
that promise to reshape the traditional linear optimization framework that
is still favored in most studies.
Of the various nonlinear sampling techniques available, it is the advent of
so-called Bayesian transdimensional tomography that has perhaps shown
the most promise for improving the way we do seismic imaging (Bodin &
Sambridge, 2009). A key feature of the approach is that the number and
distribution of model unknowns, in addition to their values (e.g., velocity),
are determined by the inversion. The advantage is that the level of detail
recovered is strongly data driven, and potential increases in compute time
caused by these additional degrees of freedom are offset by the exclusion of
redundant parameters. The term “Bayesian” refers to the formal statistical
framework for combining a priori model information (i.e., information
about model unknowns that are independent of the data) and data to
produce a result (cast in terms of a posterior model distribution) that is
more tightly constrained than the a priori model distribution (Scales &
Snieder, 1997). Monte Carlo search techniques do not require a Bayesian
setting of the inverse problem, and not all Bayesian inverse problems are
solved using Monte Carlo methods, but the two are often linked. This is
probably because Monte Carlo methods generally avoid implementing ad
hoc regularization (common with optimization methods) that is at odds
with the underlying philosophy of Bayes’ theorem.
The transdimensional inversion scheme of Bodin and Sambridge (2009)
is driven by a reversible jump Markov chain Monte Carlo (rj-McMC)
scheme, which produces a posterior probability density distribution of Earth
models. This ensemble of models can be interrogated for summary informa-
tion such as the average model and the standard deviation, which provides a
measure of uncertainty. Bodin et al. (2012b) apply the scheme to multiscale
ambient seismic noise data from the Australian region to produce group
velocity maps. Young, Rawlinson, and Bodin (2013) extend the method
to include inversion for shear wave speed and produce high-resolution
3-D images of the crust in southeast Australia using ambient noise data
from a large transportable array. In these applications, ray path trajectory is
not updated for every model generated by the rj-McMC scheme due to
computational cost, which means that the inversion is not fully nonlinear.
However, in practice, the frequency of update can be chosen to optimize
the trade-off between compute time and invariance of the posterior proba-
bility density distribution. Galetti, Curtis, Baptie, and Meles (2014) use the
scheme of Bodin et al. (2012b) with ray trajectory updates for each new
model.
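
The sketch below is a deliberately simplified, fixed-dimension Metropolis sampler for a toy two-parameter travel-time problem; it is not the rj-McMC algorithm itself, which additionally proposes birth and death moves that change the number of model cells, but it illustrates how an ensemble of data-fitting models yields the mean model and standard deviation used as summary information. The ray geometry, noise level, priors, and proposal width are all illustrative assumptions.

```python
import numpy as np

# Fixed-dimension Metropolis sampler for a toy two-cell travel-time problem,
# used only to show how an ensemble of posterior samples gives a mean model
# and a standard deviation (uncertainty) for each parameter.

rng = np.random.default_rng(4)
G = np.array([[10.0, 5.0],          # path lengths (km) of three rays
              [4.0, 12.0],
              [8.0, 8.0]])
m_true = np.array([1 / 3.0, 1 / 3.5])   # true slownesses (s/km)
sigma = 0.05                             # assumed Gaussian noise std (s)
d_obs = G @ m_true + rng.normal(0.0, sigma, 3)

def log_likelihood(m):
    r = d_obs - G @ m
    return -0.5 * np.sum(r**2) / sigma**2

m = np.array([1 / 3.0, 1 / 3.0])         # starting model
logL = log_likelihood(m)
samples = []
for it in range(20000):
    m_prop = m + rng.normal(0.0, 0.01, 2)          # random-walk proposal
    if np.all(m_prop > 0.0):                       # flat positivity prior
        logL_prop = log_likelihood(m_prop)
        if np.log(rng.uniform()) < logL_prop - logL:   # Metropolis rule
            m, logL = m_prop, logL_prop
    if it > 5000:                                  # discard burn-in
        samples.append(m.copy())

samples = np.asarray(samples)
print("posterior mean slowness:", samples.mean(axis=0))
print("posterior std (uncertainty):", samples.std(axis=0))
```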
With modern computing power, Bayesian transdimensional tomogra-
phy is becoming tractable even for relatively large, fully 3-D tomography
problems. For example, Piana Agostinetti, Giacomuzzi, and Malinverno
(submitted for publication) have developed a scheme that can be applied
to 3-D local earthquake tomography, which involves inverting not only
for Vp (P-wave velocity) and Vp/Vs (ratio of P-wave and S-wave velocity),
but also for hypocenter location. Computational requirements for a realistic
problem involving over 800 events, nearly 60 stations, and of the order of
5500–6500 unknowns are of the order of a week on a cluster of 250 central
processing units (CPUs). As computing power grows, this class of Bayesian
approach will no doubt become increasingly popular.
In the last decade, full waveform tomography has emerged as a viable
tool for imaging the subsurface across a range of scales (Fichtner, 2011;
Fichtner, Kennett, Igel, & Bunge, 2009; Fichtner et al., 2013; Operto, Vir-
ieux, Dessa, & Pascal, 2006; Ravaut et al., 2004; Smithyman, Pratt, Hayles,
& Wittebolle, 2009; Tape, Liu, Maggi, & Tromp, 2009). Numerical solu-
tion of the elastic wave equation in three dimensions means that the full
recorded wave train generated by a seismic event can be exploited, which
has the potential to yield more information than more traditional ap-
proaches, like travel time tomography, which rely on picking the onset of
identifiable phases. The main drawbacks are the computational cost of
solving the wave equation and the nonlinear nature of the inverse problem,
which until recently, have limited application to relatively small data sets.
A consequence of these challenges is that meaningful estimates of solution
robustness are difficult to make; for example, formal estimates of covariance
and resolution under the assumption of local linearity are difficult to recover
without full realization of the sensitivity matrix, which is generally not done
in order to limit the computational burden. As a result, resolution analysis in
full waveform tomography has been limited to synthetic recovery tests
(e.g., Chen and Jordan, 2007) and estimates of composite volumetric sensi-
tivity (Tape et al., 2010). However, in a recent paper, Fichtner and Trampert
(2011b) demonstrate that under the assumption of a quadratic approxima-
tion to the misfit function, it is possible to produce quantitative estimates
of resolution in full waveform tomography with a computational burden
that is less than a synthetic reconstruction test.

1.3 Uncertainty in the Age of Big Data


In the last few years, the term “Big data” has become popular to describe our
rapidly growing ability to generate vast quantities of digital data (Dobbs,
Manyika, Roxburgh, & Lund, 2011; Lohr, 2012; Marx, 2013). While there
is no precise definition for this term, it generally refers to data sets that are
too large to store, manage, or effectively analyze (Dobbs et al., 2011). Gantz
and Reinsel (2010) estimate that the global rate of data collection is
increasing at a rate of 58% per year, which in 2010 alone amounted to
1250 billion gigabytes, more bytes than the estimated number of stars in
the universe. Moreover, since 2007, we have been generating more bits
of data per year than can be stored in all of the world’s storage devices (Gantz
& Reinsel, 2010).

Seismology is not immune from this data explosion, with increasing
amounts of high-quality data being recorded, stored, and made available
over the Internet. A good example of the rapid growth in seismic data comes
from the Incorporated Research Institutions for Seismology Data Management
Center (IRIS DMC), which since the early 1990s has been archiving local,
regional, and global data sets. Figure 2 plots the cumulative size of the archive
since 1992, which suggests an exponential rate of growth. In the field of
seismic tomography, the challenge will be to try and make use of as much
of these data as possible. At regional (e.g., continent-wide) and global scales,
there will be additional pressure to update models more regularly to keep pace
with the deluge of new data that potentially may result in significant improve-
ments to the imaging results. As it is, current global models already utilize mil-
lions of data measurements and push the boundaries of current computational
resources (e.g., Burdick et al., 2014).
As the size of the inverse problem increases due to the addition of more
data, the challenge and importance of assessing model uncertainty becomes
arguably greater. For example, if one were to compute formal estimates of
covariance and resolution using linear theory, then the compute time of
the matrix inversion is of the order of O(n²) to O(n³) for an n × n matrix [it
could potentially be less if matrix sparsity is exploited, but the relationship

Figure 2 Cumulative volume of the IRIS DMC seismic data archive as of May 1, 2014. Source: http://www.iris.edu.

is still nonlinear]; clearly, then, as the number of unknowns is increased,
computational requirements will grow rapidly. Similarly, if a sampling
approach such as the aforementioned rj-McMC is used to generate a set
of data-satisfying models from which summary information such as model
standard deviation is extracted, compute time will not be linear. Yet, it is
crucial that we have quantitative information on uncertainty as models
become increasingly detailed and complex, and more inferences can poten-
tially be made about the physical state of the region that is imaged.
New methods will need to be brought to bear to properly deal with model
uncertainty in the age of Big Data.

2. NONUNIQUENESS IN SEISMIC TOMOGRAPHY


Nonuniqueness in seismic tomography refers to when more than one
model satisfies the observations, and is a consequence of the ill-posed na-
ture of the problem. The reason that this arises is succinctly explained by
Snieder (1991): “The inverse problem where one wants to estimate a
continuous model with infinitely many degrees of freedom from a finite
data set is necessarily ill-posed.” Although this appears to be indisputable,
it is nonetheless at odds with a statement made by Aki and Lee (1976) in
one of the first papers on seismic tomography when they evaluate the re-
sults of their inversion of local earthquake data: “Thus, we confirm our
earlier contention that we can obtain a unique solution of our problem
when sufficient data are available.” Ostensibly, this might seem contradic-
tory, but in reality it is merely a case of viewing the problem at different
stages of the solution process. From the outset, all seismic data sets are
finite, so it follows that any number of models, with no restrictions on their
flexibility, could be conceived that satisfy the observations to the same
level. However, if we impose a limit on the minimum scale length of
the model, for example, based on the dominant wavelength of the data
that is being exploited (with the argument that the observables are insen-
sitive to variations of smaller scale length), then the range of data-satisfying
models will be dramatically reduced. Taking it a step further, if we now
define an objective function to which the inverse problem is tied, then a
unique solution may be possible, particularly if the assumption of lineari-
zation is imposed. The above statement made by Aki and Lee (1976) is
essentially tied to an inverse problem that has been reduced to this state.
However, in most seismic tomography problems, the presence of (often
poorly constrained) data noise means that solution uniqueness is difficult to
achieve, even if a variety of limiting assumptions are imposed on the
permissible variations in structure.
Below, a brief description is provided of the various factors that play a role
in constraining the solution of an inverse problem in seismic tomography.

2.1 Data Coverage


Increasing the volume of available data by adding contributions from addi-
tional sources or receivers will in many cases produce a better outcome in
seismic tomography. However, it is well known that adding more data
does not necessarily result in a better constrained inverse problem. This is
illustrated by the simple case where a ray path traverses two blocks with
different velocities. When a second ray passes through the same blocks
with the same ratio of path lengths, then the two linear equations relating
travel time and slowness are linearly dependent and so the new ray adds
no new information to the inverse problem. Although there are a
variety of tomography problems where this issue arises, it is particularly
notable when earthquake sources are used (Fishwick & Rawlinson, 2012;
Rawlinson, Kennett, Vanacore, Glen, & Fishwick, 2011). In such cases,
earthquakes tend to cluster around seismogenic regions (e.g., subduction
zones, active faults), and after a period of time most subsequent earthquakes
occur within the neighborhood of previous earthquakes, such that they
contribute little new structural information in the recorded seismogram.
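
The two-block example can be written down directly; the toy check below (path lengths are arbitrary numbers chosen for illustration) shows that a second ray whose path lengths are a common multiple of the first contributes a linearly dependent row, so the rank of the system, and hence the constraint on the two slownesses, does not increase.

```python
import numpy as np

# Two rays crossing the same two blocks with the same ratio of path lengths:
# the second row of G is a scalar multiple of the first, so adding it does
# not increase the rank of the system (no new structural information).

G_one_ray = np.array([[10.0, 5.0]])            # path lengths of ray 1 (km)
G_two_rays = np.array([[10.0, 5.0],
                       [20.0, 10.0]])          # ray 2 = 2 x ray 1

print(np.linalg.matrix_rank(G_one_ray))        # 1
print(np.linalg.matrix_rank(G_two_rays))       # still 1: the two unknown
                                               # slownesses remain unresolved
```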
Figure 3 shows a simple synthetic example, based on Figure 1, which dem-
onstrates this concept. Figure 3(a) is a reconstruction of Figure 1(a), based on a
constant velocity starting model, which uses the source–receiver travel times of
the paths shown in Figure 1(b) (see Rawlinson et al., 2008; for an explanation
of the iterative nonlinear inversion scheme). Figure 3(b) is a repeat of this
experiment but with a travel time data set that is twice the size; this is simply
accomplished by repeating each source location with a 0.3° perturbation in
latitude and longitude. Despite the significant increase in the number of
data, the recovered model is virtually identical. Given that the new source
locations are perturbed by at least 2% of the model width, one might have
expected the reconstruction in Figure 3(b) to be slightly better. However,
given the minimum scale length of the anomalies, which is around 5% of
the model width, and the nonlinearity of the problem, the lack of improve-
ment is hardly surprising. In the latter case, since first-arrival rays are attracted
to higher velocity regions, rays from nearby sources tend to bunch together,
and do little to help constrain structure. In applications involving real

Figure 3 Two synthetic reconstruction tests based on the structure and path coverage shown in Figure 1. The initial model has a uniform velocity of 3.0 km/s. (a) Data set consists of 280 travel times between 20 sources and 14 receivers; (b) data set consists of 560 travel times between 40 sources and 14 receivers. For both (a) and (b), the top plot shows the reconstructed model and the bottom plot shows the ray paths superimposed on the input model.

observations, the presence of noise, provided that it is uncorrelated, should
mean that adding more data, even if it samples identical along-path structure,
will improve the result due to an “averaging out” of the noise. This is the same
philosophy behind data binning, which is often done prior to teleseismic,
regional, or global tomography (e.g., Rawlinson & Fishwick, 2012).
Ray coverage or density maps are often used in seismic tomography to
provide insight into the resolving power of seismic data and the quality of a
reconstruction (Bijwaard & Spakman, 2000; Nielsen, Thybo, & Solodilov,

Figure 4 Normalized cumulative sensitivity of the travel time data set with respect to the model parameterization for the synthetic travel time data set shown in Figure 1.

1999; Ramachandran, Hyndman, & Brocher, 2006; Walck & Clayton,
1987). However, at best they are an indirect tool, and have the potential to
be misleading. A variant of the ray coverage or density map is to instead
plot some measure of the sensitivity of the observables with respect to the
model parameters (Chen and Jordan, 2007; Tape et al., 2010). Figure 4 shows
the normalized cumulative sensitivity (obtained by summing the Fréchet
derivatives at each control node and dividing by the largest value) of the travel
time data set in Figure 1; here the underlying grid is interpolated using a
smooth mosaic of cubic B-spline functions, which is why the sensitivity
plot is smooth. However, just because the data are sensitive to a change in
the value of a parameter does not automatically mean that the parameter is
well resolved. For example, if a unidirectional bundle of rays traverses a
pair of cells, each ray travel time will vary if the velocity of either of the cells
is changed, but the data cannot discriminate between a change made to the
velocity of one or the other of the cells.
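
In matrix terms, the map in Figure 4 amounts to a column-wise sum of the absolute Fréchet derivatives, normalized by the largest value; a minimal sketch is given below, with a random matrix standing in for the real derivatives. As noted above, high cumulative sensitivity does not imply good resolution, because the summation discards the directionality of the sampling.

```python
import numpy as np

# Normalized cumulative sensitivity: sum the absolute Frechet derivatives
# over all data for each model parameter and scale by the largest value.
# G is a random stand-in for the matrix of derivatives d(t_i)/d(m_j).

rng = np.random.default_rng(5)
G = rng.uniform(0.0, 1.0, size=(300, 100)) * (rng.uniform(size=(300, 100)) < 0.1)

sensitivity = np.abs(G).sum(axis=0)        # cumulative sensitivity per node
sensitivity /= sensitivity.max()           # normalize to [0, 1]
print("fraction of nodes with sensitivity < 0.1:",
      np.mean(sensitivity < 0.1))
```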

2.2 Data Noise


Noise is ubiquitous to all seismic data, and is often very difficult to accu-
rately quantify. For example, with manual picking of phases, it is common
for even experienced analysts to disagree on onset times (Leonard, 2000),
let alone some measure of picking uncertainty. Automated picking algo-
rithms (Allen, 1982; Di Stefano et al., 2006; Vassallo, Satriano, & Lomax,
2012; Wang & Houseman, 1997) have the potential to offer more rigorous
and consistent estimates of uncertainty; for example, Di Stefano et al.
(2006) automatically compute arrival time uncertainty using a quality-
weighting algorithm that takes into account waveform sampling rate, spec-
tral density analysis, and signal to noise ratio. However, these estimates are
calibrated using a series of reference picks and error estimates provided by
the user. Picking methods that use some measure of waveform coherence
(Chevrot, 2002; Rawlinson & Kennett, 2004; VanDecar & Crosson,
1990) have the potential to produce accurate estimates of relative onset
times, and can yield estimates of uncertainty. In the case of Chevrot
(2002), picking error is determined by computing the correlation
coefficient between each seismic waveform and the optimal waveform
(determined by simulated annealing), and comparing the result to the auto-
correlation of the optimal waveform; the point where the two correlation
functions intersect gives the time delay error. While this may produce good
relative estimates of picking uncertainty, it is unclear whether the absolute
values are very meaningful. In full waveform tomography, it is the seismo-
gram itself that represents the data, so no explicit picking of phases is usually
required. However, particular care is required as to how waveform misfit is
defined if imaging artifacts caused by the presence of data noise are to be
minimized (Bozdag, Trampert, & Tromp, 2011). Since noise-induced
measurement uncertainties are almost impossible to assess quantitatively
for complete waveforms, full waveform inversion mostly operates with
data characterized by high signal to noise ratios (e.g., Chen et al., 2007;
Fichtner et al., 2009; Tape et al., 2010).
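
The sketch below gives one possible reading of the correlation-based error estimate described above; it is an illustration only, not Chevrot's (2002) implementation. The peak normalized correlation between a trace and the optimal waveform is compared with the autocorrelation of the optimal waveform, and the lag at which the autocorrelation falls to that level is taken as the delay uncertainty. The synthetic waveform, sampling interval, shift, and noise level are arbitrary.

```python
import numpy as np

# One reading of a correlation-based picking error estimate: compare the
# peak normalized cross-correlation of a noisy trace with the "optimal"
# waveform against the autocorrelation of the optimal waveform, and take
# the lag where the autocorrelation drops to that level as the delay error.

rng = np.random.default_rng(6)
dt = 0.01                                   # sampling interval (s)
t = np.arange(0.0, 4.0, dt)
optimal = np.exp(-((t - 2.0) / 0.2) ** 2) * np.sin(2 * np.pi * 2.0 * (t - 2.0))
trace = np.roll(optimal, 12) + rng.normal(0.0, 0.2, t.size)   # shifted + noisy

def ncc(a, b):
    """Normalized cross-correlation as a function of lag."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.correlate(a, b, mode="full") / a.size

cc = ncc(trace, optimal)                    # correlation with the noisy trace
ac = ncc(optimal, optimal)                  # autocorrelation of the reference

r_max = cc.max()                            # peak correlation with the trace
center = t.size - 1                         # zero-lag index of autocorrelation
# Smallest positive lag at which the autocorrelation falls below r_max:
below = np.where(ac[center:] <= r_max)[0]
delay_error = below[0] * dt if below.size else np.nan
print(f"peak correlation {r_max:.2f}, estimated picking error {delay_error:.3f} s")
```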
It is clear that the presence of data noise is unavoidable in seismic tomog-
raphy, so it remains to be seen how it influences the analysis of uncertainty in
seismic tomography. In general, as the level of data noise increases, the range
of data-fitting models increases. Thus, in practice, even with an ostensibly
overdetermined problem (more independent data than unknowns) any
hope of solution uniqueness cannot be realized because of the noise. Within
a linearized least squares inversion framework (Menke, 1989; Tarantola,
1987), where the data noise is assumed to have a Gaussian distribution, large
levels of noise can be handled, provided some prior knowledge of the stan-
dard deviation of the noise is known. Figure 5 shows the result of applying
the regularized (damped and smoothed) least squares inversion method of

Figure 5 The effect of varying levels of data noise on the inversion of the travel time data set shown in Figure 1. Gaussian noise with a standard deviation of (a) 100%, (b) 50%, (c) 25%, and (d) 0% is applied to the data, where 100% represents the data misfit of the initial (constant velocity) model (21 s).

Rawlinson et al. (2008) to the synthetic data set illustrated in Figure 1 with
varying amounts of Gaussian noise imposed, ranging up to a standard
deviation that equals the standard deviation of the data misfit of the constant
velocity initial model. In each case, the regularization is tuned so that inver-
sion converges to a point where the standard deviation of the data misfit
matches the standard deviation of the imposed data noise. As the noise level
increases, the solution model essentially grades toward a uniform velocity
model, with recovered model heterogeneity resembling the true structure,

Figure 6 The same as Figure 5(a) (i.e., 100% noise), but this time the damping and smoothing are set such that the standard deviation of the final model data misfit is 75% of the standard deviation of the imposed data noise (i.e., the data are overfit).

albeit with increasingly lower amplitude and fewer short wavelength
variations.
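
The tuning strategy used above, adjusting the regularization until the standard deviation of the data misfit matches the assumed noise level, can be sketched as a simple discrepancy-principle search over the damping parameter. The linear system, noise level, parameter bounds, and tolerance below are synthetic stand-ins, not the scheme of Rawlinson et al. (2008).

```python
import numpy as np

# Tune Tikhonov damping so that the std of the data residuals matches the
# assumed std of the data noise (a discrepancy-principle style search).
# G, m_true and the noise level are synthetic stand-ins.

rng = np.random.default_rng(7)
n_data, n_model = 300, 80
G = rng.uniform(0.0, 2.0, size=(n_data, n_model)) * (rng.uniform(size=(n_data, n_model)) < 0.2)
m_true = rng.normal(0.0, 0.02, n_model)
sigma_noise = 0.05
d = G @ m_true + rng.normal(0.0, sigma_noise, n_data)

def solve(eps):
    """Damped least-squares solve; returns model and residual std."""
    A = np.vstack([G, eps * np.eye(n_model)])
    b = np.concatenate([d, np.zeros(n_model)])
    m = np.linalg.lstsq(A, b, rcond=None)[0]
    return m, np.std(d - G @ m)

# Bisection on log(eps): the residual std grows monotonically with damping.
lo, hi = 1e-4, 1e4
for _ in range(60):
    eps = np.sqrt(lo * hi)
    _, misfit_std = solve(eps)
    if misfit_std < sigma_noise:
        lo = eps          # data overfit: increase damping
    else:
        hi = eps          # data underfit: decrease damping
m_final, misfit_std = solve(np.sqrt(lo * hi))
print(f"chosen damping {np.sqrt(lo * hi):.3g}, residual std {misfit_std:.3f} s "
      f"(target {sigma_noise} s)")
```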
The example shown in Figure 5 demonstrates the type of solution model
behavior that is desirable, with structure only being recovered where
required by the signal contained in the data. If one were to compute poste-
rior covariance for these models, they would show that the posterior uncer-
tainty would approach the prior uncertainty as the noise increases. However,
if the standard deviation of the noise is poorly known, then it is relatively
easy for spurious structure to be introduced. For example, Figure 6 shows
what happens when the standard deviation of data misfit is reduced to
75% of the standard deviation of the imposed data noise. Clearly, the results
bear little resemblance to the truth, and any computation of the posterior
covariance would likewise be extremely misleading. The rather disturbing
result of Figure 6 is in part the reason why the use of least squares misfit
measures in seismic tomography is not universally adopted (Djikpesse &
Tarantola, 1999; Pulliam, Vasco, & Johnson, 1993; Scales, Gersztenkorn,
& Treitel, 1988). Another drawback of least squares misfit measures is that
they are not robust to outliers, which could easily be introduced in a number
of ways including phase misidentification and Global Positioning System
(GPS) timing issues. In the latter case, with data repositories such as IRIS
DMC storing large data sets collected by different groups from various parts
of the world, it is not unusual for GPS timing failures to be improperly
flagged.
A novel approach to overcoming the issue of unknown or poorly under-
stood levels of data noise is to treat the standard deviation of the noise as an
unknown in the inversion (Malinverno & Briggs, 2004; Malinverno &
Parker, 2006); this has recently been implemented via a Hierarchical
Bayesian inversion scheme (Bodin, Sambridge, Tkalcic, et al., 2012), which
exhibits natural parsimony, and has shown to be effective in a number of
applications (Young, Cayley, et al., 2013; Young et al., 2013). Although a
notable advance in the field of seismic tomography, it still requires the noise
distribution to be assumed in advance.
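
A short sketch of why treating the noise standard deviation as an unknown yields this natural parsimony: for Gaussian errors the log-likelihood contains a -N log(σ) term, so enlarging σ is penalized unless it is genuinely needed to explain the residuals. The snippet below simply evaluates that hierarchical log-likelihood for a stand-in residual vector; in the cited schemes σ is sampled alongside the model parameters within the rj-McMC, which is not reproduced here.

```python
import numpy as np

# Hierarchical Gaussian log-likelihood with the noise standard deviation
# sigma treated as an unknown. The -N*log(sigma) term penalizes needlessly
# large sigma, while the residual term penalizes sigma that is too small.

def log_likelihood(residuals, sigma):
    n = residuals.size
    return (-0.5 * np.sum(residuals**2) / sigma**2
            - n * np.log(sigma)
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(8)
residuals = rng.normal(0.0, 0.05, 500)     # stand-in data residuals (s)

for sigma in (0.01, 0.05, 0.25):
    print(f"sigma = {sigma:4.2f}  log L = {log_likelihood(residuals, sigma):10.1f}")
# The likelihood peaks near the true noise level (0.05 here), which is what
# allows sigma to be constrained by the data in a hierarchical scheme.
```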

2.3 The Parameterization Problem


One of the biggest, and in many cases least justifiable, assumptions made in seismic tomography concerns the permissible range of
seismic structure that can be recovered. In the Earth, seismic properties
can vary smoothly or sharply in three dimensions over a great range of scales.
Yet it is common to use regular basis functions in the spatial or wave number
domain to represent structure. One of the simplest representations uses reg-
ular blocks with constant seismic properties (e.g., Achauer, 1994; Aki et al.,
1977; Hildebrand, Dorman, Hammer, Schreiner, & Cornuelle, 1989;
Nakanishi, 1985; Oncescu, Burlacu, Anghel, & Smalbergher, 1984; Vasco
& Johnson, 1998), which has certain advantages such as simple initial value
ray tracing. More sophisticated parameterizations use a grid of control
nodes tied to a function that produces a continuum, such as trilinear
(Eberhart-Phillips, 1986; Graeber & Asch, 1999; Zhao et al., 1994) or cubic
splines (Farra & Madariaga, 1988; McCaughey & Singh, 1997; Rawlinson,
Reading, & Kennett, 2006). In the spectral domain, truncated Fourier series
(Hammer, Dorman, Hildebrand, & Cornuelle, 1994; Hildebrand et al.,
1989; Wang & Houseman, 1997) have been used at local and regional scales,
and spherical harmonics have been used at the global scale (Dziewonski
et al., 1977; Dziewonski & Woodhouse, 1987; Romanowicz & Gung,
2002; Trampert & Woodhouse, 1995).
The drawback of all these parameterizations is that they impose severe
limits on the types of structures that can be recovered, and can potentially
result in the appearance of artifacts if the observed data are due to structure
that cannot be represented by the chosen parameterization. The imposition
of a parameterization dramatically reduces the range of data-satisfying
models, but the choice is often driven by convenience or computational
tractability rather than the underlying physics of the problem. The use of
a regular parameterization also does not take into account spatial variability
in the information content of the data, which commonly occurs in seismic
tomography due to irregular station and/or source distributions. Thus, some
parts of a model may be well resolved by the data and other parts poorly
resolved, but the spatial resolution of the model is unchanged. This means
that due to the trade-off between resolution and covariance, poorly con-
strained parts of the model have a high error if they are oversampled by
the parameterization, or well-constrained parts of the model have a low
error if they are undersampled by the parameterization. In either case, infor-
mation recovery is unlikely to be optimal.
From the point of view of solution nonuniqueness, the inverse problem
can be well constrained or poorly constrained, depending on the spacing of
the parameterization that is chosen. For instance, if the model was repre-
sented by a single parameter, then the inverse problem would be overdeter-
mined and the posterior error associated with the solution would be small.
Of course, it is unlikely that such a simple model would satisfy the data, but
the aim of this book-end example is to show how subjective the assessment
of model uncertainty can become as a result of its inextricable link to the
choice of model parameterization.
To overcome the limitations of a fixed regular parameterization, a variety
of studies have attempted to use either static or adaptive irregular parameter-
izations that are based on some measure of the information content of the
data set (Abers & Roecker, 1991; Bijwaard & Spakman, 2000; Bijwaard
et al., 1998; Burdick et al., 2008, 2014; Chou & Booker, 1979; Curtis &
Snieder, 1997; Fukao et al., 1992; Michelini, 1995; Montagner & Nataf,
1986; Montelli et al., 2004; Sambridge et al., 1995; Tarantola & Nercessian,
1984; Vesnaver, Böhm, Madrussani, Rossi, & Granser, 2000; Zhang,
Rector, & Hoversten, 2005). A common static approach is to match the
parameterization to the path density prior to inversion (Abers & Roecker,
1991). Adaptive approaches often use some kind of bottom-up splitting
strategy in which new parameters are added in regions where data con-
straints appear greater (Sambridge & Faletic, 2003). The principle of trying
to use the data itself to drive the spatial variability of recovered information
appears sensible, but the additional degrees of freedom that this requires can
quickly make the inverse problem intractable. Certainly one might imagine
that with the ability to vary the length scale of recovered structure, the goal
would be to end up with a model in which the uncertainty associated with
each parameter is identical; if not, then making inferences from the results
becomes even more difficult, because one would need to account for vari-
ations in both uncertainty and scale length. Yet, no study published to date
has produced a model with this property.
In the last decade or so, a number of advances have been made in the
development of data-driven parameterizations. These include wavelet
decomposition (Chiao & Kuo, 2001; Loris et al., 2007; Simons et al.,
2011; Tikhotsky & Achauer, 2008) and partition modeling (Bodin &
Sambridge, 2009; Bodin, Sambridge, Rawlinson, & Arroucau, 2012;
Young et al., 2013). In the latter case, the number of unknowns, the spatial
distribution of basis functions and the values of their coefficients are all un-
knowns in the inversion, which makes it an extremely data-driven process.
Within a Bayesian framework, whereby the data are combined with prior
model information to produce a posterior distribution of data-fitting
models, the partition approach can recover structure over a large range of
scale lengths and yield meaningful estimates of solution uncertainty.
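To make the idea of partition modeling more concrete, the following Python sketch evaluates a simple 2-D Voronoi-cell velocity model: each cell is defined by a nucleus position and a constant velocity, and the velocity at any point is that of the nearest nucleus. In a transdimensional inversion the number of nuclei, their positions, and their velocities would all be treated as unknowns; here the model is simply evaluated on a grid, and all numerical values are arbitrary stand-ins rather than quantities from any particular study.

```python
import numpy as np

def voronoi_velocity(x, y, nuclei, velocities):
    """Velocity at (x, y) is that of the nearest Voronoi nucleus."""
    d2 = (nuclei[:, 0] - x)**2 + (nuclei[:, 1] - y)**2
    return velocities[np.argmin(d2)]

# Hypothetical partition model: 5 nuclei (x, y in degrees) with constant cell velocities (km/s)
rng = np.random.default_rng(0)
nuclei = rng.uniform(0.0, 15.0, size=(5, 2))
velocities = rng.uniform(2.0, 4.0, size=5)

# Evaluate the model on a regular grid (e.g., for plotting or as input to a forward solver)
xg = np.linspace(0.0, 15.0, 151)
yg = np.linspace(-5.0, 10.0, 151)
vmodel = np.array([[voronoi_velocity(x, y, nuclei, velocities) for x in xg] for y in yg])
print(vmodel.shape, vmodel.min(), vmodel.max())
```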
2.4 The Data Prediction Problem
The way in which the forward problem is solved in seismic tomography can
play a role in the recovery of structure, and therefore should not be ignored
when assessing the robustness of the result. There are three basic ways in
which the forward solution method can have an influence: (1) accuracy of
the forward solver, (2) simplifying assumptions about the physics of wave
propagation, and (3) completeness of the solution.
The limited accuracy of forward problem solvers can be a significant
source of error because they numerically solve equations for which no
analytical solutions are available. Numerical approximations are made in
the interests of computational efficiency and if these approximations are
poor, the resulting error will impact on the solution and any quantitative
assessment of uncertainty. To illustrate the effect of forward problem errors
on solution accuracy, Figure 7 shows the result of inverting the Figure 1 data
set with an inaccurate forward solution. In this case, the eikonal equation, a
nonlinear partial differential equation, is solved using a grid-based finite dif-
ference scheme. The use of a very coarse grid means that the finite difference
approximations are poor and the travel time predictions inaccurate, resulting
in a poorer solution. Compared to other sources of uncertainty in the to-
mography problem, inaccuracies in the forward solver, such as illustrated
Figure 7 Inversion of the Figure 1 data set using (a) inaccurate estimates of travel time and (b) accurate estimates of travel time. In (a) grid spacing for numerical solution of the eikonal equation is 0.8°, while in (b) it is 0.1°. (Velocity scale in km/s.)
in Figure 7, are relatively straightforward to overcome. However, there are
more insidious sources of error that can be difficult to diagnose. For instance,
most methods of two-point ray tracing, which can be viewed as a potentially
highly nonlinear inverse problem, are nonrobust in that they use locally
linear approximations to achieve convergence (Cassell, 1982; Farra &
Madariaga, 1988; Julian & Gubbins, 1977; Koketsu & Sekine, 1998;
Pereyra, Lee, & Keller, 1980; Rawlinson, Houseman, & Collins, 2001;
Sambridge, 1990; Um & Thurber, 1987; VanDecar, James, & Assumpção,
1995; Zhao, Hasegawa, & Horiuchi, 1992). As a result, rays may not be
found that exist, or the ray that is found is not the one that matches the phase
identified on the seismogram due to multipathing, a phenomenon that in-
creases as a function of velocity heterogeneity. This will invariably lead to
unquantifiable uncertainties in the final model.
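The dependence of forward modeling accuracy on grid spacing can be illustrated with a short numerical experiment. The Python sketch below is a first-order fast-sweeping eikonal solver, used here purely as an illustrative stand-in for whatever grid-based scheme is actually employed (it is not the solver used to produce Figure 7), applied to a homogeneous medium so that the result can be checked against the analytic travel time (distance divided by velocity). The grid sizes and velocity are arbitrary choices for the demonstration.

```python
import numpy as np

def eikonal_fast_sweep(slowness, h, src, n_sweeps=8):
    """First-order fast-sweeping solution of |grad T| = s on a regular 2-D grid."""
    ny, nx = slowness.shape
    T = np.full((ny, nx), np.inf)
    T[src] = 0.0
    orders = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    for sweep in range(n_sweeps):
        sj, si = orders[sweep % 4]
        for j in range(ny)[::sj]:
            for i in range(nx)[::si]:
                if (j, i) == src:
                    continue
                a = min(T[j - 1, i] if j > 0 else np.inf,
                        T[j + 1, i] if j < ny - 1 else np.inf)
                b = min(T[j, i - 1] if i > 0 else np.inf,
                        T[j, i + 1] if i < nx - 1 else np.inf)
                if np.isinf(a) and np.isinf(b):
                    continue
                f = slowness[j, i] * h
                if abs(a - b) >= f:          # causal one-sided update
                    t_new = min(a, b) + f
                else:                        # two-sided (locally plane wave) update
                    t_new = 0.5 * (a + b + np.sqrt(2.0 * f**2 - (a - b)**2))
                T[j, i] = min(T[j, i], t_new)
    return T

# Homogeneous 3 km/s medium: compare a coarse and a fine grid against the analytic r/v
v = 3.0
for h in (20.0, 2.5):                        # grid spacing in km
    n = int(200.0 / h) + 1
    T = eikonal_fast_sweep(np.full((n, n), 1.0 / v), h, src=(0, 0))
    exact = np.hypot(*np.meshgrid(np.arange(n) * h, np.arange(n) * h)) / v
    print(f"h = {h:5.1f} km, max travel time error = {np.abs(T - exact).max():.3f} s")
```

The coarse grid produces markedly larger travel time errors, which is the behavior responsible for the degraded reconstruction in Figure 7(a).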
Errors caused by simplifications of wave propagation physics are an active
area of research, especially in the context of so-called finite-frequency to-
mography. The majority of seismic tomography undertaken today is based
on geometric ray theory, in which the underlying assumption is that the
wavelength of the seismic wave is much smaller than the minimum scale
length of heterogeneity. This high-frequency assumption ignores a variety
of seismic wave behaviors including diffraction, scattering, and wave front
healing. As a result, the sensitivity of the observable, such as travel time, is
dependent on off-path effects, and ignoring this relationship means that the
accuracy of the recovery may be diminished. Studies that apply this approach
are in general not able to provide a quantitative measure of how such simpli-
fying assumptions impact on the uncertainty of the tomography result.
Finite-frequency tomography (Chevrot & Zhao, 2007; Marquering,
Dahlen, & Nolet, 1999; Montelli et al., 2004; Yang et al., 2009) attempts
to overcome some of these limitations by using first-order perturbation the-
ory to account for the presence of single scatterers. A simplification of wave
propagation physics frequently made in local- to regional-scale full wave-
form inversion is the acoustic approximation (e.g., Bleibinhaus, Hole, &
Ryberg, 2007; Kamei, Pratt, & Tsuji, 2013; Pratt & Shipp, 1999). The Earth
is assumed to act as a fluid where wave propagation is governed by the
acoustic wave equation that can be solved with much less computational re-
sources than the complete elastic wave equation. While the acoustic approx-
imation produces kinematically correct first arrivals (the travel times of the
direct P and S waves are correct), later parts of the seismogram may not
be accurately represented, thus introducing forward modeling errors that
are difficult to quantify.
Both ray theory and the acoustic approximation illustrate that simplifica-
tions in the physics of seismic wave propagation can go hand in hand with
incomplete solutions of the forward problem in the sense that specific types
of waves cannot be modeled. Consequently, only specific aspects of the
seismic wave field, e.g., early-arriving waveforms in the case of the acoustic
approximation, can be exploited for tomography. This limitation, in turn,
contributes to the nonuniqueness of the solution. Ultimately, errors in the
forward problem can only be minimized by the robust solution of the full
elastic wave equation, using, for instance, finite-difference (e.g., Moczo,
Kristek, Vavrycuk, Archuleta, & Halada, 2002), spectral-element (e.g.,
Komatitsch & Vilotte, 1998), or other numerical techniques. The band-
width of the solutions is, however, still very much limited by the available
computational resources.
2.5 The Inverse Problem
The tomographic inverse problem involves adjusting model parameters in
order to satisfy the data to an acceptable level and any a priori constraints
that may be available. It is often formulated as a minimization problem in
which an objective function or penalty function is defined and a search al-
gorithm is applied to find regions of model space with a high level of fit. The
way in which the objective function is defined can potentially have a major
influence on the inversion result and its associated uncertainty. Gradient-
based inversion methods often use an objective function of the form
(Rawlinson et al., 2010a)
S(m) = (g(m) − dobs)^T Cd^{-1} (g(m) − dobs) + ε(m − m0)^T Cm^{-1} (m − m0) + η m^T D^T D m        (1)
where g(m) are the predicted data, dobs are the observed data, Cd is the a
priori data covariance matrix, m0 is the reference model, Cm is the a priori
model covariance matrix, and D is a second derivative smoothing operator.
Ideally, Cd represents the total covariance due to all sources of noise,
including those due to observation and assumptions (e.g., in the parame-
terization and forward problem). However, the reality is that Cd often only
contains some ad hoc estimate of picking uncertainty and therefore would
be better described as a weighting matrix rather than a true prior covariance
matrix. Yet Cd is crucial in controlling both the output model and its
associated uncertainties, so if it is poorly representative of prior uncertainty,
then the reliability of the solution will likewise be diminished. In recogni-
tion of this issue, there have been several recent studies that attempt to
recover Cd, or some component of it, during the inversion. For example,
Duputel, Agram, Simons, Minson, and Beck (2014) attempt to recover the
“prediction error” component of Cd during earthquake source inversion
and Bodin et al. (2012a) invert for the standard deviation of the diagonal
elements of Cd in surface wave tomography (a case study is provided in
Section 4.3).
The prefactors ε and η in Eqn (1) are referred to as the damping and smoothing parameters, respectively, and control the trade-off between data fit, model perturbation relative to a reference model, and model smoothness. These regularization terms have different origins; in the case of damping, if we set ε = 1 and Cd and Cm truly represent the prior data and model covariance, respectively, then we have a Bayesian style inversion in which prior information on a model is combined with data constraints to produce a posterior distribution. Smoothing, on the other hand, appeals to Occam’s razor, in which parsimony is favored over complexity. The inclusion of both damping and smoothing in which ε and η are real positive variables results in the definition of a somewhat ad hoc objective function for which meaningful estimates of covariance and resolution, even for a linear inverse problem, are difficult to obtain.
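As a concrete illustration, the sketch below evaluates an objective function of the form of Eqn (1) for a small toy problem. The forward operator g(m) is taken to be linear (g(m) = Gm) purely for simplicity, and the matrices, data, and the values of ε and η are arbitrary placeholders rather than quantities from any real inversion.

```python
import numpy as np

def objective(m, G, dobs, Cd, m0, Cm, D, eps, eta):
    """S(m) as in Eqn (1): data misfit + damping + smoothing terms."""
    r = G @ m - dobs                                 # g(m) - dobs for a linear forward problem
    dm = m - m0
    data_term = r @ np.linalg.solve(Cd, r)           # (g(m)-dobs)^T Cd^{-1} (g(m)-dobs)
    damp_term = eps * dm @ np.linalg.solve(Cm, dm)   # eps (m-m0)^T Cm^{-1} (m-m0)
    smooth_term = eta * (D @ m) @ (D @ m)            # eta m^T D^T D m
    return data_term + damp_term + smooth_term

# Toy setup: 8 model parameters, 20 data, second-difference smoothing operator D
rng = np.random.default_rng(1)
M, N = 8, 20
G = rng.normal(size=(N, M))
m_true = rng.normal(size=M)
dobs = G @ m_true + rng.normal(scale=0.1, size=N)
Cd = 0.1**2 * np.eye(N)
m0 = np.zeros(M)
Cm = 0.5**2 * np.eye(M)
D = np.diag(np.full(M, -2.0)) + np.diag(np.ones(M - 1), 1) + np.diag(np.ones(M - 1), -1)

print(objective(m_true, G, dobs, Cd, m0, Cm, D, eps=1.0, eta=0.1))
print(objective(m0, G, dobs, Cd, m0, Cm, D, eps=1.0, eta=0.1))
```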
Another potential drawback of Eqn (1) is that it assumes data noise has a
Gaussian distribution. However, there is no guarantee that this is the case,
and outliers (e.g., from picking the wrong phase, GPS timing errors) may
have an unjustifiably large influence on the inversion result. Removal of
outliers on the basis of some assumption about the spread of acceptable values
(e.g., greater than N standard deviations from the mean, where N ≥ 1) is one
approach for reducing their influence on the final result, but as pointed out by
Jeffreys (1932), “a process that completely rejects certain observations, while
retaining with full weight others with comparable deviations, possibly in the
opposite direction, is unsatisfactory in principle.” Jeffreys (1932) developed a
scheme, known as uniform reduction, which reduces the influence of outliers
without directly needing to identify the anomalous data. The effect of
uniform reduction is to assign outliers small weights so that they do not
have a strong effect on the solution. The implementation and potential
benefits of this approach in the context of 3-D local earthquake tomography
are demonstrated in the study of Sambridge (1990).
In seismic tomography, the use of an L2 measure of misfit such as Eqn (1)
is almost universal, but in many cases there is little evidence for errors having
a Gaussian distribution (e.g., phase arrival times in the International Seismo-
logical Centre (ISC) bulletin; Buland, 1986; Pulliam et al., 1993). As a result,
alternative misfit measures have been considered, most notably the L1 mea-
sure of misfit, which is known to be robust in the presence of outliers
(Claerbout & Muir, 1973; Jeong, Pyun, Son, & Min, 2013; Pulliam et al.,
1993; Scales et al., 1988). Claerbout and Muir (1973) advocate the use of
absolute error criteria, and find from studying numerous examples that it
rarely exceeds two to four times the computing requirements of its least
squares equivalent, and in many cases produces much better results. It is
interesting to note that despite these early efforts, it is still very common
to find studies that use an L2 norm and simply cull “outliers” that are defined
in a relatively arbitrary manner.
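The influence of a single outlier on L2 versus L1 misfit measures can be illustrated in a few lines of code. Here one gross outlier (e.g., a mispicked phase) is added to a set of otherwise well-behaved residuals; all numbers are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
residuals = rng.normal(scale=0.1, size=100)      # well-behaved residuals (s)
residuals_outlier = residuals.copy()
residuals_outlier[0] = 5.0                       # one gross outlier, e.g., a wrong phase pick

for name, r in [("clean", residuals), ("with outlier", residuals_outlier)]:
    l2 = np.sum(r**2)
    l1 = np.sum(np.abs(r))
    print(f"{name:13s}  L2 misfit = {l2:8.3f}   L1 misfit = {l1:8.3f}")

# The single outlier inflates the L2 misfit by more than an order of magnitude but the
# L1 misfit only modestly, so an update driven by the L2 norm is far more strongly
# biased by the bad datum.
```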
Deterministic inversion methods that produce a single solution, such as
linearized least squares, generally require some kind of regularization to sta-
bilize the inversion and produce a plausible result. When sampling methods
are used, it is possible to dispense with explicit regularization altogether. For
example, in the Bayesian transdimensional scheme of Bodin and Sambridge
(2009), the objective function that is used is Eqn (1) without any damping or
smoothing term; in other words, a simple least squares data misfit function.
Once an objective function or measure of misfit has been defined, there
are various ways in which the inverse problem can be solved. Most
techniques rely on linearization of the inverse problem, which ultimately re-
sults in solution of a large system of linear equations. Back projection tech-
niques like algebraic reconstruction technique and simultaneous iterative
reconstruction technique avoid direct solution of these equations but tend
to suffer from poor convergence properties (Blundell, 1993). Gradient-
based methods such as damped least squares and its many variants
(Aki et al., 1977; Graeber et al., 2002; Rawlinson et al., 2006; Thurber,
1983; Zhao et al., 1992) require the solution of a large and often sparse linear
system of equations. There are various direct and approximate ways of
solving such systems including Lower Upper (LU) decomposition, Cholesky
decomposition, singular value decomposition, conjugate gradient and its
LSQR variants, and more general subspace schemes (Hestenes & Stiefel,
1952; Kennett, Sambridge, & Williamson, 1988; Nolet, 1985; Scales,
1987). Ultimately, the aim is to move from one point in model space (the
initial model) to another point (the final model) that lies within the bounds
of all data-satisfying models. Assessing the uncertainty of this single solution
involves trying to quantify the limits of the data-satisfying region of model
space. However, the linearization assumption means that the method will
only be effective if the objective function exhibits a single minimum and
its surrounding architecture is approximately quadratic. The application of
regularization essentially helps to conform the objective function to this
shape. As such, any estimate of model uncertainty depends not only on a
good knowledge of data noise, but also on the imposed regularization being
consistent with prior information.
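For the large sparse systems produced by linearized tomography, iterative solvers such as LSQR are a common choice. The sketch below solves a small damped least squares problem with scipy.sparse.linalg.lsqr; the sensitivity matrix, data, and damping value are random placeholders rather than quantities from a real experiment.

```python
import numpy as np
from scipy.sparse import random as sprandom
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(3)
N, M = 500, 200                                                    # data, model parameters
G = sprandom(N, M, density=0.05, random_state=3, format="csr")     # sparse sensitivity matrix
m_true = rng.normal(size=M)
d = G @ m_true + rng.normal(scale=0.05, size=N)

# LSQR minimizes ||G m - d||^2 + damp^2 ||m||^2, i.e., a simple damped least squares problem
m_est = lsqr(G, d, damp=0.1)[0]
print("rms model error:", np.sqrt(np.mean((m_est - m_true)**2)))
```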
3. PRACTICAL ASSESSMENT METHODS
Below, a summary is given of the strengths and weaknesses of a variety
of methods that have been devised for assessing model uncertainty in seismic
tomography.
3.1 Covariance and Resolution
For inverse problems that are linear or linearizable, the calculation of formal
estimates of posterior covariance and resolution is computationally tractable,
although for larger problems, it is usually only a subset of the full information
that is extracted (Nolet et al., 1999; Yao et al., 1999; Zhang & McMechan,
1995; Zhang & Thurber, 2007). The pioneering work of Backus and Gilbert
(1968, 1970) and Wiggins (1972) established the foundation of general linear
inverse theory for solving ill-posed inverse problems, which includes a
quantitative assessment of solution reliability. If the inverse problem is line-
arized, it is common to use an objective function of the form (Rawlinson,
Pozgay, et al., 2010)
S(m) = (Gδm − δd)^T Cd^{-1} (Gδm − δd) + ε δm^T Cm^{-1} δm + η δm^T D^T D δm        (2)
where δd is the data residual and δm the model perturbation, the latter measured relative to the prior (reference) model. The local minimum of this function occurs where ∂S(m)/∂m = 0, which results in a solution of the form
δm = [G^T Cd^{-1} G + ε Cm^{-1} + η D^T D]^{-1} G^T Cd^{-1} δd        (3)

The term [G^T Cd^{-1} G + ε Cm^{-1} + η D^T D]^{-1} G^T Cd^{-1} is often referred to as the generalized inverse G^{-g}, the exact form of which is dependent on the choice of objective function. The resolution matrix can then be written as R = G^{-g}G, where δm = R δm_true, and estimates the averaging of the true model δm_true in its representation by δm. For Eqn (3), the resolution matrix can be written:

R = [G^T Cd^{-1} G + ε Cm^{-1} + η D^T D]^{-1} G^T Cd^{-1} G        (4)

The posterior covariance matrix is defined by CM = G^{-g}[G^{-g}]^T (e.g., Yao et al., 1999) and measures the degree to which two model unknowns, mi and mj, vary together (or covary), i.e., cov(mi, mj) = E[(mi − m̄i)(mj − m̄j)], where m̄i = E(mi) and m̄j = E(mj). CM can be related to the resolution matrix by R = I − CM Cm^{-1} (see Tarantola, 1987 for more details). In most applications, it is the diagonal elements of CM, which indicate the posterior uncertainty associated with each parameter, that are interpreted. Likewise, it is generally the diagonal elements of the resolution matrix that are considered, which have a value approaching unity for well-resolved parameters.
Another metric that can be useful in analyzing solution robustness is the
correlation matrix, which can be defined (Tarantola, 1987):
ρij = CM^{ij} / [(CM^{ii})^{1/2} (CM^{jj})^{1/2}]        (5)

where −1 ≤ ρij ≤ 1 and i,j = 1,…,M. A strong correlation between pa-
rameters indicates that they have not been independently resolved by the
data. Covariance and resolution are commonly used to assess solution quality
in seismic tomography (Aki et al., 1977; Riahi & Juhlin, 1994; Steck et al.,
1998; White, 1989; Yao et al., 1999; Zelt & Smith, 1992; Zhang & Thurber,
2007), although correlation is less frequently used (McCaughey & Singh,
1997; Zhang & Toksöz, 1998). The chief drawbacks of these measures of
uncertainty are that (1) their validity decreases as the nonlinearity of the
inverse problem increases, (2) the inversion of a large matrix is required, (3)
implicit regularization imposed by the ad hoc choice of model parameter-
ization is not accounted for, and (4) a priori model covariance and data errors
are usually poorly known, which at the very least make absolute values of
posterior uncertainty rather meaningless.
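For a small linearized problem the quantities in Eqns (3)–(5) can be formed explicitly. The following sketch uses random stand-ins for G, Cd, Cm, and D, and follows the definitions given above for the generalized inverse, the resolution matrix R, the posterior covariance CM = G^{-g}[G^{-g}]^T, and the correlation matrix; the matrix sizes and regularization values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 40, 12                           # data, model parameters
G = rng.normal(size=(N, M))
Cd_inv = np.eye(N) / 0.1**2             # Cd^{-1} for uncorrelated 0.1 s picking errors
Cm_inv = np.eye(M) / 0.5**2             # Cm^{-1} for a 0.5 km/s prior standard deviation
D = np.diag(np.full(M, -2.0)) + np.diag(np.ones(M - 1), 1) + np.diag(np.ones(M - 1), -1)
eps, eta = 1.0, 0.1

# Generalized inverse of Eqn (3) and resolution matrix of Eqn (4)
A = G.T @ Cd_inv @ G + eps * Cm_inv + eta * D.T @ D
G_g = np.linalg.solve(A, G.T @ Cd_inv)          # G^{-g}
R = G_g @ G                                     # resolution matrix
CM = G_g @ G_g.T                                # posterior covariance (as defined in the text)

# Correlation matrix of Eqn (5)
sd = np.sqrt(np.diag(CM))
rho = CM / np.outer(sd, sd)

print("diag(R):", np.round(np.diag(R), 2))
print("sigma  :", np.round(sd, 3))
print("max off-diagonal |rho|:", np.abs(rho - np.diag(np.diag(rho))).max().round(2))
```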
A major obstacle in the computation of resolution and covariance is, as
previously mentioned, the need to explicitly store and invert potentially
very large matrices. This difficulty sparked the development of matrix prob-
ing techniques where information about specific properties of a matrix, e.g.,
its largest eigenvalues or its trace, can be estimated through the application of
the matrix to random vectors. While very general matrix probing techniques
have been developed in applied mathematics (see Halko, Martinsson, and
Tropp (2011) for a comprehensive review), more specialized methods
have been developed recently in order to estimate resolution proxies,
such as the trace or the diagonal of the resolution matrix (e.g., An, 2012;
MacCarthy, Brochers, & Aster, 2011; Trampert & Fichtner, 2013b).
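The idea behind matrix probing can be sketched with a Hutchinson-type estimator: the trace (or diagonal) of the resolution matrix is estimated from products of R with random ±1 vectors, so R never has to be formed explicitly. In the toy example below, R is applied through a simplified definition with identity damping so that the probed values can be checked against exact ones; in a realistic application only the matrix–vector products would be available, and all sizes here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 60, 15
G = rng.normal(size=(N, M))
eps = 1.0
A = G.T @ G + eps * np.eye(M)

def apply_R(v):
    """Apply the resolution matrix R = (G^T G + eps I)^{-1} G^T G to a vector."""
    return np.linalg.solve(A, G.T @ (G @ v))

# Hutchinson estimators for trace(R) and diag(R) using K random +/-1 probe vectors
K = 2000
tr_est = 0.0
diag_est = np.zeros(M)
for _ in range(K):
    v = rng.choice([-1.0, 1.0], size=M)
    Rv = apply_R(v)
    tr_est += v @ Rv / K          # E[v^T R v] = trace(R)
    diag_est += v * Rv / K        # E[v * (R v)] = diag(R)

R_exact = np.linalg.solve(A, G.T @ G)
print("trace exact vs probed:", np.trace(R_exact).round(2), round(tr_est, 2))
print("max |diag error|     :", np.abs(diag_est - np.diag(R_exact)).max().round(3))
```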
Figure 8 shows the result of computing posterior covariance for an iter-
ative nonlinear damped least squares inversion based on the data set shown in
Figure 1. In this case, no explicit smoothing is applied, so the posterior
covariance is defined by
CM = μ[G^T Cd^{-1} G + ε Cm^{-1}]^{-1}        (6)

where μ = ε when ε ≥ 1 and μ = 1 when ε < 1. Strictly speaking, when the covariance matrix is estimated by CM = G^{-g}[G^{-g}]^T, μ = 1 and ε is absorbed into the definition of Cm. However, if ε is interpreted as a prefactor that allows one to tune prior uncertainty, as ε → ∞, CM → 0, which can be misleading. By using the alternative approach suggested in Eqn (6), as ε → ∞, CM → Cm and as ε → 0, CM → [G^T Cd^{-1} G]^{-1} (covariance completely controlled by data), which is more desirable.
The initial model chosen for the Figure 8 example has a constant velocity of 3.0 km/s, and the standard deviation of the uncertainty associated with the initial model is set at 0.5 km/s. Figure 8(a) shows the result for ε = 1 and Figure 8(b) shows the result for ε = 2000. The effect of increasing the damping is to decrease the amplitude of the recovered model and increase the
Figure 8 Estimate of posterior covariance for the iterative nonlinear inversion of the data set shown in Figure 1. (a) Damping factor of ε = 1 chosen; (b) damping factor of ε = 2000 chosen. Left-hand column shows the solution model (v, in km/s) and right-hand column shows the associated estimate of posterior covariance (σ, in km/s).
posterior covariance estimates (plotted as the standard deviation σ, which is the square root of the diagonal elements of CM). The importance of Cm in the result is clear, as regions of little or no path coverage have a σ → 0.5. The estimates of uncertainty provided by the posterior covariance matrix appear reasonable, at least in a relative sense; for instance, uncertainty is low near the center of the model where path density is high, and higher toward the margins where path coverage drops off. Also, at about 5° east and 2.5° south in Figure 8(a) (right), the local zone of high uncertainty
Figure 9 (a) Actual error associated with the solution model shown in Figure 8(a) (ε = 1); (b) actual error associated with the solution model shown in Figure 8(b) (ε = 2000); (c) same as (a) except using a starting model with a constant velocity of Vi = 3.5 km/s; (d) same as (b) except using a starting model with a constant velocity of Vi = 3.5 km/s. (Error scale in km/s.)
corresponds to a low-velocity zone in which path coverage is poor (see
Figure 1(b)).
However, if we illustrate the actual errors (Figure 9(a) and (b)) associated
with the two inversion results shown in Figure 8, it becomes clear that pos-
terior covariance estimates, certainly for a nonlinear inverse problem, are not
very meaningful. Although uncertainty estimates and actual error are not ex-
pected to be correlated in general, one would at least hope that where errors
are significant, the uncertainty estimate is able to accommodate the
Seismic Tomography and the Assessment of Uncertainty 33

difference, which is often not the case. It should be noted that in this
example, the initial model has a velocity (3.0 km/s) that is equal to the back-
ground velocity of the true model, which illustrates why the posterior un-
certainty need not match the actual error, given that it is reasonable to have
higher uncertainty where there is no path coverage. Figure 9(c) and (d)
shows the error when an initial model with a velocity of 3.5 km/s is used
instead.
Another instructive synthetic example is illustrated in Figures 10–12. In
this case, we have a 2-D wide-angle experiment in which refracted and re-
flected phases are generated and recorded at the surface, and sample a three-
layer model in which velocity varies linearly with depth in each layer, and
layer boundaries have variable geometry (Figure 10). Interfaces are described
by cubic B-spline functions and layer velocities by the linear equation
v = v0 + kz, where v0 is the velocity at the surface and k is the velocity
gradient. The inverse problem is to reconstruct the Moho geometry and
layer velocities using the synthetic travel times of both the refracted and re-
flected phases. Gaussian noise with a standard deviation of 70 ms is added to
the synthetic travel time data set to simulate picking uncertainties. A shoot-
ing method of ray tracing is used to compute the two-point travel times. An
iterative nonlinear damped least squares inversion scheme is used that does
not include smoothing. Figure 11(a) shows the inversion result, which
uses a laterally invariant starting model and velocities with around 10%
perturbation from the true model. In general, the reconstruction is quite ac-
curate, except for the concave-up zones of the interfaces, which are not well
sampled by first-arrival reflection phases. The covariance and resolution
(Figure 11(b) and (c)) plots appear to reflect these uncertainties quite well,
with the largest σ values tending to occur in the concave-up regions of
the interface (Figure 11(b)). Part of the reason for the results appearing to
be more reliable in comparison to the Figures 8 and 9 example is that there
are many more data than unknowns (overdetermined inverse problem) in
the wide-angle example, whereas the surface wave example is much more
mixed determined.
Figure 12 shows the correlation between three different interface nodes
and the remaining unknowns for the Figure 11(a) solution. Again, this plot
shows that the interface nodes tend to be well resolved, although it is inter-
esting to observe oscillatory behavior between the reference node and sur-
rounding interface nodes. This may be due to the use of cubic B-splines,
which use a weighted combination of neighboring nodes to define a value
at a single point. Increasing or decreasing the depth of a single node can be
Figure 10 Synthetic 2-D wide-angle data set consisting of refraction and reflection
phases. The associated travel time curves are shown beneath each phase type. Top:
refraction arrivals; bottom: reflection arrivals.
Figure 11 (a) Damped least squares inversion of data shown in Figure 10, using a laterally invariant starting model. Dashed lines show initial interfaces, solid lines show recovered interfaces, and dotted lines show true interfaces; (b) posterior covariance for the model shown in (a); (c) diagonal elements of the resolution matrix for the model shown in (a). For both (b) and (c), the two numbers in parentheses within each layer represent the error and resolution of the layer velocity parameters, respectively.
Figure 12 Examples of correlations computed between three separate interface nodes and all other model parameters for the solution shown in Figure 11(a). The two numbers in parentheses within each layer represent the correlation values of the two velocity parameters with respect to the reference interface node.
traded off to some extent by decreasing or increasing, respectively, the depth
of an adjacent node.
3.2 Jackknife and Bootstrap
Both the jackknife and bootstrap tests are standard statistical methods of error
assessment. The bootstrap test involves performing repeat inversions with a
resampled data set (i.e., a new data set formed by taking samples from an
original data set) and examining the characteristics of the model ensemble
Figure 13 Bootstrap test applied to the Figure 1 data set. The plot on the left shows the
average model, while the plot on the right shows the standard deviation of the model
ensemble; 50 models were generated for this test. Comparison with Figure 8 shows that
where data constraints are absent, the bootstrap uncertainty tends to zero, whereas the
covariance tends to the prior estimate of model uncertainty. The former effect is purely
due to the implicit regularization imposed by the subspace inversion scheme, which
does not alter the value of a parameter unless it is influenced by data.
that is produced. The resampling of the data set can be performed randomly
with replacement; thus, a single piece of data can be used more than once,
and the size of the new data set is set equal to that of the original (Efron &
Tibshirani, 1993). The jackknife test is similar, but instead of random sam-
pling of a data set, each separate inversion is carried out by omitting a
different set of observations. In the case where a single observation is omitted
per iteration, for N observations, N inversions are carried out and the result-
ing ensemble of solutions can be interrogated to produce summary informa-
tion. A number of tomography studies have used jackknifing to assess
solution robustness (Gung & Romanowicz, 2004; Lees & Crosson, 1989,
1990; Su & Dziewonski, 1997; Zelt, 1999); however, as pointed out by
Nolet et al. (1999), both bootstrapping and jackknifing rely on overdeter-
mined inverse problems, and these do not often arise in seismic tomography.
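A bootstrap of a linear(ized) tomographic problem can be sketched in a few lines: the rows of the system (one per datum) are resampled with replacement, the inversion is repeated, and the spread of the resulting ensemble is taken as an uncertainty proxy. The simple damped least squares solver and the problem sizes below are arbitrary stand-ins for whatever inversion scheme is actually being bootstrapped.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 200, 20
G = rng.normal(size=(N, M))
m_true = rng.normal(size=M)
d = G @ m_true + rng.normal(scale=0.1, size=N)

def invert(Gs, ds, damp=0.1):
    """Simple damped least squares solution for one (resampled) data set."""
    return np.linalg.solve(Gs.T @ Gs + damp * np.eye(M), Gs.T @ ds)

n_boot = 50
ensemble = []
for _ in range(n_boot):
    idx = rng.integers(0, N, size=N)        # resample the data with replacement
    ensemble.append(invert(G[idx], d[idx]))
ensemble = np.array(ensemble)

m_mean = ensemble.mean(axis=0)              # summary model
m_std = ensemble.std(axis=0)                # bootstrap uncertainty proxy
print("mean bootstrap std:", m_std.mean().round(3))
```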
Figure 13 shows an example of the bootstrap test applied to the Figure 1
data set. As before, the initial model is defined by a constant velocity of
3.0 km/s. Damping and smoothing are turned off in the inversion, but im-
plicit regularization is still in place via the choice of a cubic B-spline param-
eterization with a finite separation of grid nodes. The solution model is
defined in this case by the average model, while the uncertainty is
represented by the standard deviation of an ensemble of 50 models. Where
there is no path coverage, the uncertainty drops to zero despite the absence
of explicit regularization; this occurs because a subspace inversion technique
is used (Kennett et al., 1988), which will not adjust parameters that have a
zero Fréchet derivative. Consequently, the uncertainty estimate only has
meaning in regions of good path coverage, where the pattern of model vari-
ability bears some resemblance to path density. The amplitude of σ is signif-
icantly underestimated, however, which is in part due to the need to
regularize (implicitly in this case) mixed and underdetermined inverse
problems. As such, it appears that bootstrapping is not very useful for seismic
tomography, particularly when heterogeneous path coverage is present.
3.3 Synthetic Reconstruction Tests
The synthetic reconstruction test is the most common, and perhaps the most
criticized, method for assessing solution robustness in seismic tomography.
All it essentially requires is for a synthetic or test model to be defined and
an artificial data set to be generated in the presence of this model using an
identical source–receiver geometry and phase types as the observational
data set. The inversion method is then applied in an attempt to recover
the synthetic structure. Differences between the true model and the recon-
struction form a basis for assessing the reliability of the solution. A particular
variant of this approach known as the checkerboard test, in which the syn-
thetic model is defined by an alternating pattern of positive and negative
anomalies in each dimension, has been one of the mainstays of seismic to-
mography studies for the last quarter of a century (e.g., Achauer, 1994;
Aloisi, Cocina, Neri, Orecchio, & Privitera, 2002; Chen and Jordan,
2007; Fishwick, Kennett, & Reading, 2005; Glahn & Granet, 1993;
Gorbatov, Widiyantoro, Fukao, & Gordeev, 2000; Pilia, Rawlinson,
Direen, Cummins, & Balfour, 2013; Rawlinson, Tkalcic, & Reading,
2010; Spakman & Nolet, 1988; Zelt & Barton, 1998). Other types of syn-
thetic structures have also been used, including discrete spikes, volumes of
various geometric shapes, and structures designed to mimic particular
features such as subduction zones (Eberhart-Phillips & Reyners, 1997;
Graeber & Asch, 1999; Hole, 1992; Rawlinson et al., 2006; Walck &
Clayton, 1987; Wolfe, Solomon, Silver, VanDecar, & Russo, 2002).
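Generating the synthetic structure for a checkerboard test is straightforward; one common choice is a smoothly alternating pattern built from sinusoids, as sketched below. The background velocity, perturbation amplitude, and anomaly size are arbitrary illustrative values.

```python
import numpy as np

def checkerboard(xg, yg, v0=3.0, dv=0.3, cell=2.0):
    """Alternating +/- dv anomalies of width `cell` (degrees) about a background v0 (km/s)."""
    X, Y = np.meshgrid(xg, yg)
    return v0 + dv * np.sin(np.pi * X / cell) * np.sin(np.pi * Y / cell)

xg = np.linspace(0.0, 15.0, 151)
yg = np.linspace(-5.0, 10.0, 151)
v_synth = checkerboard(xg, yg)
print(v_synth.min().round(2), v_synth.max().round(2))   # roughly 2.7 to 3.3 km/s

# Synthetic data would then be computed through v_synth using the same sources, receivers
# and forward solver as the real experiment, noise added, and the inversion repeated.
```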
The synthetic reconstruction test has a number of weaknesses, including
(1) accounting for data noise, which is often poorly constrained, is difficult.
Simply adding Gaussian noise with a particular standard deviation to the
synthetic data may poorly represent the actual noise content of the
observational data set; (2) using identical parameterization for the synthetic
model and reconstructed model will yield a result biased in favor of a good
reconstruction; (3) similarly, using the same forward problem solver for the
computation and the inversion of the artificial data yields overly optimistic
results because errors in the forward problem solution are disregarded; (4)
results can vary according to the input structure used, particularly when
the inverse problem is nonlinear. In the latter case, Lévêque et al. (1993)
demonstrated with a simple test example that even for linear inverse prob-
lems the checkerboard test can be misleading, and in some circumstances can
reproduce small-scale structure more accurately than large-scale structure.
Partly as a result of such caveats, it is common to find examples of checker-
board tests carried out across a range of scales (e.g., Fishwick et al., 2005),
coupled with some other style of reconstruction test (e.g., Rawlinson
et al., 2006) or carried out together with some other measure of uncertainty
such as covariance and resolution (e.g., Graeber & Asch, 1999).
Figure 14 shows an example of the synthetic checkerboard test applied to
the data set in Figure 1. The original heterogeneous model (input model) is
shown in Figure 8(a). Although the reconstruction indicates that the basic
pattern of the checkerboard is recovered, this is not really the case when
one inspects Figure 8 or 9, and so the checkerboard could be construed as
somewhat misleading in this regard. In general, the amplitudes are underes-
timated, which is typical of a damped least squares solution. The amplitudes
are, unsurprisingly, most accurate in the region of dense path coverage near
the center of the model. Part of the reason for the relatively poor perfor-
mance of the checkerboard in this case as a proxy for the uncertainties in
the Figure 8(a) reconstruction can be attributed to the very different path
coverage between Figure 14(b) and (d), which is a function of the significant
wave speed anomalies that are present. This is particularly noticeable in re-
gions of low ray density when paths have some distance to travel, such as in
the southern region of the model. The general tendency with synthetic
reconstruction tests is to appraise them qualitatively, which in this case
may result in misleading inferences about the robustness of the actual model
recovery.
3.4 Linear and Iterative Nonlinear Sampling
As noted earlier, the underdetermined nature of the seismic tomography
problem means that a potentially wide range of models may satisfy the
data and a priori constraints. Yet most solution strategies end up yielding
a single data satisfying model from which inferences are made. Published
Figure 14 Example of a checkerboard reconstruction test for the Figure 1 data set and an iterative nonlinear damped least squares solution. (a) Synthetic checkerboard model; (b) ray path geometry through synthetic checkerboard model; (c) reconstructed model using inversion method; (d) actual path coverage through true model. (Velocity scale in km/s.)
studies tend to implement some kind of qualitative or quantitative assess-
ment of solution robustness but, as has been pointed out, these are often
of limited value. Inversion strategies that aim to produce an ensemble of
data-fitting models are not tied to a particular configuration of features in
the solution; instead, a range of potentially plausible structures are recovered
from which summary information can be extracted that highlight those fea-
tures most required by the data.
Within a linear framework, Deal and Nolet (1996) develop the so-called
null-space shuttle, which exploits the underdetermined nature of the linear
system of equations that define the inverse problem in order to yield more
than one data-satisfying solution. The null-space shuttle is the operator that
allows movement from one solution to another without corrupting data fit.
It does this by filtering a solution model a posteriori, where the filter is
restricted to operate only on components of the solution that do not affect
the data fit. The filter that is used can assume a variety of forms depending on
the a priori information; examples include a smoothing filter, or one
designed to emphasize sharp boundaries. Once the filter operates on the so-
lution model, the difference between the new model and the filtered model
is projected onto the null-space, which has the effect of removing any
changes that degrade the fit to data. In their study, Deal and Nolet (1996)
apply the technique to synthetic data to demonstrate that, where a filter
based on good a priori information is available, a more accurate model
can be obtained by applying the null-space shuttle method to the minimum
norm solution. In a subsequent application of the method (Deal, Nolet, &
van der Hilst, 1999) to image the Tonga subduction zone, the travel time
tomography model obtained from the inversion of teleseismic and local
P-wave travel times is enhanced by biasing it toward a theoretical slab tem-
perature model based on the diffusion equation. Projecting the difference
between the seismic tomography model and the temperature model
(assuming velocity is a function of temperature) onto the null-space of the
inversion removes components of the slab temperature model that violates
the travel time data fit.
de Wit et al. (2012) generalize the null-space shuttle method proposed
by Deal and Nolet (1996) in order to estimate quantitative bounds on the
tomographic model with the goal of producing a range of different but
acceptable models. The new technique is applied to a very large global
body wave travel time data set. They found that accurate estimates of data
uncertainty are crucial for obtaining a reliable ensemble of models. Further-
more, the solution range also depends on the choice of regularization that is
required by the inversion of the underdetermined system of equations; in
particular, the range of acceptable models becomes larger as the regulariza-
tion is decreased.
The scheme proposed by de Wit et al. (2012) is similar to the so-called
regularized extremal bounds analysis (REBA) of Meju (2009), which finds
a range of data-fitting models given a set tolerance on the objective func-
tion. Although it is designed for nonlinear geophysical inverse problems,
REBA is based on iterative updates using a local quadratic approximation
to the objective function and regularized system of linear equations, and
therefore is dependent on a number of assumptions, unrelated to the
data, in order to estimate the range of permissible models. Vasco (2007)
uses an alternative approach to exploiting the null-space of the tomo-
graphic inverse problem by applying Lie group methods that do not
require linearization about a reference model. As such, it can be viewed
as a generalization of the null-space shuttle method of Deal and Nolet
(1996) for nonlinear problems.
An alternative approach for searching model space for data fitting models
within an iterative nonlinear framework is the so-called dynamic objective
function scheme of Rawlinson et al. (2008). The basic principle behind the
method is to exploit information gained from previous solutions to help
drive the search for new models. Rather than attempt to minimize a fixed
objective function, a feedback or evolution term is included that modifies
the misfit landscape in accordance with the location of previous solutions.
The form of the objective function used in Rawlinson and Kennett
(2008) is
"
1
Sj ðmÞ ¼ ðgðmÞ  dobs ÞT C1
d ðgðmÞ  dobs Þ
2
# (7)
X j
1
þ h  ip j ¼ 1; .; N
T
i¼1 l ðm mi Þ m mi þz

which discards the usual damping and smoothing terms and instead introduces a function that creates a local peak in the objective function at values of m corresponding to all previous solutions j = 1,…,n (where n < N) that have been located. The aim is to discourage new solutions from converging on previous solutions unless the data are sufficiently persuasive. The terms p, λ, and ζ control the shape and amplitude of the local maximum as defined in Figure 15. With appropriate choices of these terms, it is possible to produce a relatively small ensemble of models that together contain the most robust features that can be inferred from the data. The main weakness of the method is that, although damping and smoothing regularization is discarded, appropriate choices of p, λ, and ζ need to be found and are problem dependent. An example of the dynamic objective function technique applied to observational data is given in the next section.
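A direct transcription of Eqn (7) (as reconstructed above) is sketched below; previous solutions are stored and each contributes a localized peak to the objective function. g(m) is again taken to be linear for brevity, and the values of η, λ, ζ, and p are arbitrary; as noted in the text, in practice they must be tuned to the problem at hand.

```python
import numpy as np

def dynamic_objective(m, G, dobs, Cd_inv, prev_models, eta=1.0, lam=10.0, zeta=0.01, p=1.0):
    """Eqn (7): data misfit plus a peak at each previously located solution m_i."""
    r = G @ m - dobs
    misfit = r @ Cd_inv @ r
    evolve = sum(1.0 / (lam * (m - mi) @ (m - mi) + zeta)**p for mi in prev_models)
    return 0.5 * (misfit + eta * evolve)

# Toy usage: the evolution term grows as m approaches a previously found solution
rng = np.random.default_rng(9)
N, M = 30, 5
G = rng.normal(size=(N, M))
m1 = rng.normal(size=M)                       # a previously located solution
dobs = G @ m1
Cd_inv = np.eye(N) / 0.1**2

print(dynamic_objective(m1 + 1.0, G, dobs, Cd_inv, prev_models=[m1]))   # far from m1: data misfit dominates
print(dynamic_objective(m1 + 0.01, G, dobs, Cd_inv, prev_models=[m1]))  # near m1: evolution term penalizes a good fit
```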
Figure 15 Graphical representation of how the variables p, λ, and ζ in Eqn (7) influence the shape of the evolution term. From Rawlinson et al. (2008). Copyright 2008 Royal Astronomical Society. Reproduced by permission of Oxford University Press.
3.5 Fully Nonlinear Sampling
Inversion methods that avoid the assumption of local linearization and pro-
vide a thorough interrogation of model space in order to produce an
ensemble of data-satisfying models are the most attractive for addressing
the nonlinear relationship between observables and model parameters. For
some tomography problems, such as global or teleseismic travel time tomog-
raphy, where ray paths do not strongly deviate from global reference model
predictions, linear and iterative nonlinear schemes can be relatively robust.
However, when wave speed heterogeneity is significant and prior informa-
tion is limited (e.g., crustal or near-surface studies), nonlinear sampling
methods are potentially of the greatest benefit. Full waveform tomography
also becomes increasingly nonlinear at higher frequencies due to cycle skip-
ping issues. However, nonlinear sampling methods are much more compu-
tationally expensive than methods based on linearization, and consequently
have only enjoyed limited exposure in realistic seismic tomography
problems.
Common nonlinear inversion methods used in the physical sciences,
including genetic algorithms, which use an analog to biological evolution
to drive the search for new models, and simulated annealing, which is based
on an analog with physical annealing in thermodynamic systems, have been
used to solve geophysical inverse problems (Mosegaard & Sambridge, 2002;
Sambridge & Mosegaard, 2002); however, application to 2-D and 3-D to-
mography problems is limited due to the large number of unknowns
involved (Asad et al., 1999; Boschetti et al., 1996; Pullammanappallil &
Louie, 1993).
Surface wave tomography is more amenable to nonlinear inversion
methods than body wave tomography, because the problem can be posed
as a composite 2-D and 1-D inverse problem rather than a fully 3-D inverse
problem. For instance, one can invert group or phase dispersion for 2-D
period-dependent group or phase velocity maps, and then carry out a
point-by-point inversion for 1-D shear wave velocity in order to build a
3-D model. Meier, Curtis, and Trampert (2007a, 2007b) use neural network
inversion to invert fundamental mode Love and Rayleigh phase and group
velocity maps for a global model of the crust and uppermost mantle. As well
as addressing the nonlinearity of the inverse problem, this approach has the
benefit of providing the posterior probability distribution of model param-
eters, thus allowing a quantitative assessment of uncertainty. However, the
number of unknowns in the 1-D inverse problem is limited (29 in this
case), and the reliance on 2-D group and phase velocity maps derived from a
linear inversion means that the full nonlinearity of the complete problem is
not addressed. Shapiro and Ritzwoller (2002) carry out a similar study using
a large data set of fundamental mode Rayleigh and Love wave group and
phase velocities, but instead use a Markov chain Monte Carlo (McMC)
method to yield an ensemble of data-satisfying models.
The McMC approach to solving the nonlinear inverse problem is grad-
ually growing in popularity in seismic tomography. Bodin and Sambridge
(2009) implement the reversible jump variant to solve a transdimensional
inverse problem in which the number and spatial distribution of model un-
knowns vary in addition to their values. The inverse problem is solved
within a Bayesian framework, which means that information is represented
by probability density functions. The goal of Bayesian inference, within a
linear or nonlinear setting, is to quantify the posterior probability distribu-
tion given a prior distribution and constraints provided by the data. The
posterior probability distribution is defined by an ensemble of data satis-
fying models generated by the Markov chain following an initial burn-in
phase. Information such as the mean and standard deviation can be
extracted from the ensemble. Bodin and Sambridge (2009) apply the
scheme to a 2-D surface wave test problem in which rays are only updated
after every Nth model is generated (where N is large) in order to minimize
computational resources. In this sense, the technique is ultimately iterative
nonlinear rather than fully nonlinear. However, since N is a variable and
linearization is not inherent to the inversion scheme, it is possible to
make the scheme fully nonlinear by setting N ¼ 1. This is done in the
transdimensional tomography study of Galetti et al. (submitted for publica-
tion), where surface wave group dispersion is inverted for period-
dependent group velocity maps.
Stochastic sampling methods provide a robust way of extracting mean-
ingful information from sparse data sets, but they still require an accurate
knowledge of data noise; in the absence of such information, the range of
data-fitting models becomes an unknown variable. In the context of seismic
tomography, Bodin, Sambridge, Tkalcic, et al. (2012) introduce the so-
called hierarchical Bayesian inversion scheme, an extension of the Bayesian
transdimensional scheme, which in addition to the number, value, and dis-
tribution of model parameters, allows the level of noise (e.g., represented by
the standard deviation) to be an unknown in the inversion. This is particu-
larly useful, as the absolute level of noise (including picking error and
modeling error, the latter being the inability of the forward model to explain
the data) is usually poorly known. Bodin, Sambridge, Rawlinson and
Arroucau (2012) apply the new scheme to surface wave dispersion data
from Australia that comprises three separate experiments carried out at
very different scales. The standard deviation of the data noise is treated as
a linear function of interstation distance in order to account for the large
range of interstation path lengths.
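The hierarchical treatment of noise amounts to making the data covariance part of the model. A sketch of the corresponding Gaussian log-likelihood is given below, with the noise standard deviation parameterized as a linear function of interstation distance, σ(Δ) = a + bΔ, following the description above; the functional form and all numbers are illustrative stand-ins, and in a hierarchical McMC scheme a and b would be sampled along with the velocity model.

```python
import numpy as np

def log_likelihood(residuals, distances, a, b):
    """Gaussian log-likelihood with distance-dependent noise sigma = a + b * distance."""
    sigma = a + b * distances
    return -0.5 * np.sum(residuals**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))

rng = np.random.default_rng(11)
distances = rng.uniform(50.0, 2000.0, size=300)     # interstation distances (km), hypothetical
true_sigma = 0.05 + 2e-4 * distances                # noise grows with path length (s)
residuals = rng.normal(scale=true_sigma)

# The likelihood is highest near the true hyperparameters (a ~ 0.05 s, b ~ 2e-4 s/km)
for a, b in [(0.05, 2e-4), (0.5, 0.0), (0.01, 1e-5)]:
    print(f"a={a:5.3f}, b={b:7.1e}:  logL = {log_likelihood(residuals, distances, a, b):9.1f}")
```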
4. CASE STUDIES
Four different case studies are presented below, which use different
means of assessing model robustness. The first example showcases the syn-
thetic reconstruction test that is commonly used in seismic tomography.
The remaining three examples apply more recently developed techniques
for assessing model robustness, including iterative nonlinear sampling, trans-
dimensional tomography, and resolution analysis for full waveform
tomography.
4.1 Synthetic Reconstruction Test: Teleseismic Tomography Example
Following the early work of Aki et al. (1977), teleseismic tomography has
become very popular for imaging the structure of the crust and lithosphere
in 3-D (Glahn & Granet, 1993; Graeber et al., 2002; Humphreys &
Clayton, 1990; Oncescu et al., 1984; Rawlinson & Fishwick, 2012;
Rawlinson & Kennett, 2008; Saltzer & Humphreys, 1997) despite its
well-known drawbacks. These include ignoring lateral variations in struc-
ture outside the model region that may contribute to the measured arrival
time residual and the subvertical incidence of the seismic energy at the
receiver. The latter effect results in relatively poor resolution in the vertical
direction, while the mapping of arrival time residuals as wave speed
variations within a limited model region beneath the array may introduce
unwanted artifacts.
Here we present an example of teleseismic tomography applied to
Tasmania, southeast Australia, with the main goal of assessing the results
of an associated synthetic reconstruction test. Full details of the methods, re-
sults, and interpretation can be found in Rawlinson et al. (2006). Data for
the study comes from an array of 72 recorders deployed across northern
Tasmania in 2001 and 2002 (see Figure 16(a)). A total of 6520 relative
Seismic Tomography and the Assessment of Uncertainty 47

(a) (b)

(c)

Figure 16 (a) A 72-station Tigger array (deployed between 2001 and 2002) with an
average station separation of 15 km; (b) plot of teleseismic arrival time residuals for
an event from the Marianas; (c) estimate of uncertainty associated with the extraction
of arrival time residuals for the Marianas event.
P-wave arrival time residuals from 101 teleseismic sources are extracted from
the seismic records using the adaptive stacking technique of Rawlinson and
Kennett (2004). Figure 16(b) shows a map of the P-wave residuals for an
event from the Mariana Islands. The adaptive stacking technique also pro-
duces an estimate of picking uncertainty (Figure 16(c)), which is used to
weight the contribution of residuals in the tomography. A minimum uncer-
tainty threshold of 37.5 ms (75% of the sample interval) is imposed in recog-
nition of noise and waveform incoherence across the array. An iterative
nonlinear inversion scheme is applied to map the arrival time residuals as ve-
locity variations; the objective function includes damping and smoothing
regularization to control the amplitude and wavelength of retrieved struc-
ture. Trade-off curves are used to decide the appropriate damping and
smoothing. The forward problem of travel time prediction is solved using a


grid-based eikonal scheme that robustly finds first arrivals (Rawlinson &
Sambridge, 2004). The inverse problem is solved iteratively using a subspace
inversion scheme (Kennett et al., 1988), with arrival times recomputed after
each model update.
Figure 17 shows a depth slice and an east–west slice through the
Tasmania solution model, obtained after six iterations using a 10-D subspace
scheme. The data variance is reduced by 74%, which corresponds to a root-mean-square (RMS) reduction from 193.7 ms to 98.7 ms. From the
adaptive stacking results, the estimated data noise is 77 ms, which indicates
that there is likely to be a component of “modeling noise” due to implicit
(from the parameterization) and explicit (from the damping and smoothing)
regularization and forward modeling assumptions. Most of the recovered
structures look plausible, although the edges of the vertical slices appear to
contain unrealistic streaking effects.
In order to investigate the robustness of the solution, a synthetic check-
erboard test is carried out using three different checkerboard sizes ranging
between approximately 25 and 50 km (see Figure 18). Gaussian noise
with a standard deviation of 77 ms is added to all three synthetic data sets
in order to simulate the effects of picking noise. Arguably, one could use
noise with a standard deviation of 98.7 ms in order to reproduce the same
fit to data experienced by the real model. In addition, rather than use iden-
tical parameterizations for the synthetic and recovered model, it would be
more realistic to use difference parameterizations. However, this approach
is the convention in seismic tomography, and the checkerboard recovery
that is achieved can be regarded as being on the optimistic side of the truth.
Figure 19 shows the output model, which in general shows a good recovery
of the pattern of anomalies across all three scales. The region of good recov-
ery is most extensive for the large checkerboard, and most restricted for the
small checkerboard. This is an expected result given the known trade-off be-
tween resolution and covariance. On the vertical sections (Figure 19(b) and
(c)), significant streaking can be observed toward the edge of the model
where crossing path coverage diminishes.
One of the limitations of a checkerboard test such as that illustrated in
Figures 18 and 19 is that the extent of near-vertical distortion of structure
is difficult to fully appreciate due to the structure of the checkerboard, where
the diagonal elements in the vertical plane are closely aligned with dominant
ray directions. To address this issue, spike tests in which discrete anomalies
placed some distance apart represent a more robust test. Figure 20 shows
Figure 17 Horizontal and vertical slice through the Tasmania solution model obtained
via iterative nonlinear inversion of teleseismic arrival time residuals. Modified from
Rawlinson et al. (2006). Copyright 2006 American Geophysical Union. Reproduced by
permission of American Geophysical Union.
the result of a synthetic spike recovery test, which, apart from the structure,
uses the same settings as the previous checkerboard test. The output shows
that even in regions with good path coverage, vertical smearing of structure
takes place. This characteristic of the recovery must be accounted for in the
interpretation of the results.
matter of construction with which the Admiralty did not at all agree. It,
therefore, decided on building an iron ship in reply to the Gloire, and
the Warrior was the first seagoing ironclad. In her external
appearance there was nothing to distinguish her from the average
wooden steam frigate of the time, except her extraordinary length.
She was a three-masted square-rigged ship, with a graceful
overhanging cutwater, her dimensions being as follows: length, 380
feet, and 420 feet over all; draught, 25½ feet; depth from spar deck
to keel, 41 feet 6 inches. Her engines of 1,250 h.p. nominal gave her
a speed of nearly 14½ knots. She carried twenty-eight 7-inch
muzzle-loading rifle guns, two other rifle guns, and two 20-pounder
breech-loading rifle guns. She was built at what is now the Thames
Ironworks, then the no less celebrated yard of Messrs. Ditchburn and
Mare.
In describing the vessel, the builders say: “It may be of interest to
note here that the Warrior’s armour plates were all fitted at edges
and butts with tongues and grooves, the tongues being formed solid
out of the plate 1¼ inch wide and ½ inch deep, the grooves being
formed slightly larger to facilitate entering. This plan, which was very
costly, and was suggested by the curving out of the plates tested at
Shoeburyness after being struck by the shot, was not repeated in
later vessels, in view of the great difficulty in replacing damaged
plates. It is not generally known that the Warrior, though a sea-going
warship, had a ram bow, the greatest projection being at about the
water-line, the head knee or cutwater being brought on
independently after the ram was completed, to maintain the then
usual appearance of the frigates of the English navy.”[36]
Besides the side armour, the fore and after ends of the main deck
carrying the battery were protected by armoured bulkheads. The
great length of the vessel rendered it impossible to armour her
entirely, as had she been armoured from end to end the protection
afforded to the vital parts of the ship would have been insufficient to
withstand the heaviest artillery of the time. Therefore, some 85 feet
at either end were left unprotected, and the weight of armour thus
saved was added to that covering the central portions of the ship, so
that she would be enabled to withstand the worst fire an enemy
could bring to bear upon her. It was contended that were her
unarmoured ends to be shot away or riddled and rendered useless,
her armoured portion would remain afloat, an invulnerable citadel.
The belt of armour on the broadside was 22 feet deep, and was
backed by 18 inches of teak.
In every respect, save, perhaps, that of manœuvring, she was an
improvement upon her French rival. Her ports were about 8 feet 6
inches from the water as compared with 5 feet 8 inches in the Gloire,
those of the latter, though comparing favourably with the distance
which prevailed in the earlier ships of the line, both sail and steam,
being considered much too near the water to permit of her main deck
guns being fought except in fine weather. Her gun carriages, too,
were a great improvement upon anything of the kind that had been
fitted in an English ship. A system of pivoting the carriages under the
trunnions of the guns was applied, so that the guns could be trained
through portholes only 2 feet wide, or half the size of those fitted in
other ships, and as the sides of the ports were plated with 7-inch
iron, an additional measure of protection was afforded the crew. Her
tonnage was 6,177 tons, builder’s measurement, but her total weight
with stores and guns was about 9,000 tons.
The Warrior was a combination of the longitudinal system of ship
construction designed by Scott Russell, and the ordinary method of
transverse framing, the plans being prepared by the Admiralty. The
sixth longitudinal was used to rest the backing and armour upon. The
unprotected ends of the vessel were built on the transverse system,
and were given a number of watertight compartments. An important
feature in the construction was that the transverse plates between
the longitudinals were solid but had three holes cut in them to lighten
them, and it was in dealing with these plates that some of the earliest
improvements were made in following ships. As a further means of
giving strength, a vertical watertight longitudinal bulkhead extended
from the third longitudinal on each side up to the main deck, to which
it was rigidly secured, thus forming an exceedingly strong wing
passage and box girder, which was further strengthened by
transverse bulkheads. She had not a complete double bottom.
Externally, she was fitted with two bilge keels to prevent rolling.
The Black Prince, which followed the Warrior, was 380 feet in length,
and exceeded the length of the Gloire by 130 feet; her beam was 58
feet 4 inches, and her displacement 9,210 tons. She also was a full-
rigged ship, and had an overhanging or schooner bow, the ram being
thought unnecessary, as ramming was no longer looked upon as an
important feature of naval tactics.
“These were the last, however, in which the essentials of pictorial
beauty were held of paramount importance.”[37]
The attitude of the Admiralty in regard to steam had hitherto been
that in many respects it must be auxiliary to sail. The Black Prince’s
armour, though only 4½ inches thick, was considered to offer an
adequate resistance to the 68-pounder gun’s projectile, and this, too,
after the experience gained in the Crimean War; besides which no
allowance whatever was made for the probability that more powerful
guns, firing heavier projectiles than any yet known, would shortly be
in existence, especially as they were already being designed.
Although called an ironclad, the Black Prince would be better
described as “armour-patched,” for only 213 feet on each side was
armour-protected. The rest of the hull, including even the steering
gear, was as unarmoured and unprotected as that of any sailer of a
century before. The ends of the armoured belts, however, were
united by iron plated bulkheads, so that the armoured portion of the
ship formed a central or box battery. In order to add to the safety of
the ship, in case of its penetration by a hostile shot, a number of
watertight compartments was built into her, thereby ensuring a
certain amount of buoyancy. This vessel, like the Warrior, was
“unhandy,” to use a sailor’s phrase, as were all her class, their length
making them difficult to steer, on account of the amount of room
required in which to turn. Indeed, they were so awkward that in
manœuvres it was necessary to keep them four cables’ lengths apart
instead of the two cables’ lengths customary with other vessels. The
Black Prince carried four 9-ton guns and twenty 6½-ton guns, all
muzzle-loaders. These ships were unquestionably most impressive
from the spectacular point of view, and, compared with the wooden
ships they superseded, their fighting value was great. They were
practically the forerunners of the class represented by the three iron
sisters, Agincourt, Minotaur, and Northumberland. The last named, a
ship-rigged, armoured, first-class cruiser, was begun in 1865, by the
Millwall Ironworks and Shipbuilding Company, and completed in
1868, the designs being prepared by the Admiralty. At first it was
proposed that she should have only three masts, and as many as
fifty-eight guns, but during the process of construction, it was
decided to increase the number of masts to five and to reduce the
number of guns to twenty-eight more powerful than those originally
intended. Her design, and that of her sisters, represented a curious
adherence to a belief in the necessity of sail, tempered by a desire to
compromise in the matter of more modern artillery. When
launched, she had four 12-ton muzzle-loading rifle guns and twenty-
two 9-ton 8-inch muzzle-loading rifles on the main deck, while on her
upper deck were two 6.5-ton 7-inch breech-loading rifle guns. Her
armour was 5½ inches thick, with 9 inches of teak backing, and was
extended throughout her entire length with the double purpose of
protecting the ends and steering gear, and of allowing her fore and
after guns to be fired from behind armour. This, of course, meant a
greater weight to be carried, and it could only be done, if speed were
not to be sacrificed, by increasing the length of the vessel. So far as
manœuvring was concerned, these ships were much worse than
their predecessors.
H.M.S. “BLACK PRINCE.” Photograph by Symonds & Co., Portsmouth.
THE “BANGOR,” FIRST IRON SEA-GOING PROPELLER STEAMER IN THE UNITED STATES. From a Print in the possession of, and reproduced by permission of, the Harlan & Hollingsworth Corporation, U.S.A.
Their engines were on Penn’s trunk system, with two cylinders of 112
inches diameter, and a stroke of 52 inches. Each had ten boilers with
four furnaces per boiler, the total grate area being 956 square feet,
and the steam was supplied up to a pressure of 25 lb. per square
inch. These ships each carried a four-bladed Mangin propeller of 24
feet diameter, which was adjustable so that the pitch could be altered
from 22½ feet to 28½ feet. The Northumberland was the first war
vessel on which Macfarlane Gray’s steam steering gear, originally
invented for the Great Eastern, was installed. These three vessels
were 400 feet 3 inches in length, and had a beam of a fraction over
59 feet, and drew 27 feet 3 inches, with a displacement of about
10,786 tons.
Before referring to the historic American ships of the third quarter of
the last century, some attention may be given to a remarkable vessel
which passed into the possession of the United States Government.
The steamer Bangor was built by the firm of Betts, Harlan and
Hollingsworth (now the Harlan and Hollingsworth Corporation), in
1843-4, for the Bangor Steam Navigation Company, of Maine, and
was the first iron sea-going propeller steamer constructed in the
United States. The hull was formed of bar iron ribs or frames
secured by numerous wrought-iron clamps, and her plating was put
on in the lapped or “clinker” style, instead of the modern inside and
outside method of arranging the sheets.
The Bangor measured 231 tons burthen; her length over all was
about 131 feet; length between perpendiculars, 120 feet; beam
moulded, 23 feet; and depth of hold from base line amidships, 9 feet.
She had three wooden masts, with bowsprit and jib-boom, and was
schooner-rigged, carrying a suit of eight sails. Passengers were
carried aft in a commodious deck-house fitted up in a style of
elegance unusual in those days, and considered particularly
handsome by her owners and builders. There were but two deck-
houses upon the vessel at the time she was built, the third or forward
house, as shown in the illustration, having been added afterwards.
Her machinery consisted of independent twin-screw propeller
engines, having cylinders 22 inches in diameter by 24 inches stroke
of piston. The propeller wheels were of the Loper type and 8½ feet in
diameter. Her boiler was placed in the hold and was of iron, 20 feet
in length, of the type known as the “drop flue” boiler. On her trial trip
she averaged 10.61 miles per hour at one time. The first five miles
were run with low steam, making forty-four revolutions. The pressure
of steam was under 46 lb. to the square inch during the whole trip.
Afterwards with full steam the speed per hour was 14.07 miles. From
this, however, there should be deducted 2½ miles for tide, giving an
actual speed of 11.57 miles per hour. On the second trip of the
Bangor from Boston, she caught fire, and was beached upon the
New England coast, near Nantucket, in order to save the crew and
freight. She was afterward adjudged a wreck, the insurance
settlement was effected, and she was towed to a New England
shipyard (probably at Bath, Me.), where she was repaired and
rebuilt. She afterwards continued to run on the same line until she
was, in 1846, purchased by the United States Government, and re-
named the Scourge at the time of the outbreak of the Mexican War.
During her employ as a war vessel she was equipped with three
guns. After two years of war service, she was, on October 7th, 1848,
finally sold by the Government to John F. Jeter, of Lafayette,
Louisiana. From the date of this transfer no trace of her can be
found. It is possible that she may have been either lost by fire or
storm, or have been dismantled and altered for other than her
natural purposes.
A visit was paid to England in October, 1856, on her trial cruise, by a
ship which was destined to have considerable influence in the not
distant future upon warship construction, and to help to revolutionise
completely all the hitherto accepted theories. This was the famous
Merrimac—the first of six steam frigates the United States had
constructed. She was considered by her designers to be a match for
any vessel afloat on the European side of the Atlantic, and as a
specimen of the American fondness for fast and heavily armed
frigates, a type of vessel in which they excelled, she left nothing to
be desired. Naturally, she attracted a great deal of attention.
The Merrimac—she came to England under that name, and not as
the Virginia, as sometimes stated—was 300 feet over all, and 250
feet on the keel, and 260 feet on the load water-line, and was 51 feet
4 inches beam, and drew 28 feet of water. She was of 3,987 tons
measurement, and 4,500 tons displacement. Her engines were of
600 h.p. and presented several peculiarities. The cylinders were of
72 inches diameter, with a stroke of three feet, and there were two
rods to each piston. Her screw propeller was on Griffith’s system,
and had means of varying the pitch. Normally the screw had a pitch
of 26 feet 2 inches; its diameter was 17 feet 4 inches. She had four
of Martin’s vertical tubular boilers. The frame of the ship was of live
oak, crossed internally with two sets of diagonal iron plates, inclined
in opposite directions, and similar plates on the outside strengthened
her bow and stern. Her model, or shape, is said to have been of
considerable beauty, while her internal arrangements for the comfort
and accommodation of the officers and crew were of a high order.
She could spread 56,629 feet of canvas, and nautical men here were
of opinion that she could easily have borne heavier masts and spars
and so have spread more canvas still. However, the weight of her
armament had to be considered, and this may have been one
reason why she was not more heavily equipped aloft. She was
pierced for sixty guns, but on account of the weight and size and
effectiveness of those she had, the number on board was only forty.
Nevertheless, she was claimed to be, and with good reason, as
powerful as anything Europe could show. Two large pivot guns, of 10
inches calibre, and each weighing nearly 5½ tons, were on the upper
deck, together with fourteen 8-inch guns, weighing more than three
tons each; while on the gun-deck were twenty-four 9-inch guns, each
weighing close upon 4½ tons. All these guns were strong enough to
fire solid shot, but they were intended to take hollow shot or shell, a
custom to which the Americans attached considerable importance.
The guns were built on the Dahlgren system, which gave them
throughout their length a thickness proportionate to the pressure
caused by the explosion of an ordinary service charge of powder.
The adaptation of these guns to the Paixhan system of shell-firing
was another novelty she presented. As solid shot were more
destructive against fortifications and heavy works than the shells or
hollow shot—uncharged shells that is—the naval experts of Europe
did not look favourably upon explosive shells, preferring to consider
them more suitable for large swivel guns, such as were sometimes
mounted on the sponsons of paddle boats. The Merrimac had not a
solid shot on board. Her guns were of unusual thickness at the
breech and thinner than the European guns in that part called the
chase, which lies between the trunnions and the muzzle. Their
mounting, also, presented some peculiarities. There was no hinder
truck, the force of the recoil being taken up by the friction of the
carriage against the deck, but the gun recoiled sufficiently on
discharge to permit of reloading; while, instead of the hinder truck, a
contrivance attached to the end of a handspike was thrust under the
gun carriage. There were, in addition, a number of smaller guns.
THE “MERRIMAC” BEFORE CONVERSION.
THE “MERRIMAC” AS CONVERTED INTO AN IRONCLAD.
From Photographs supplied by the U.S. Navy Department.

The next that was heard of the Merrimac was that when the Federals
found it necessary to burn certain stores and ships which could not
be removed beyond reach of the Confederates after the American
War began, she was one of those set on fire and then sunk. The
Confederates, being short of ships—indeed, they seem to have been
short of everything except enthusiasm and a belief in their cause—
raised her to see what could be done with her. All her upper works
had been destroyed, and her hull somewhat damaged, but she was
held to be sound enough to be worth fitting out afresh. Accordingly,
to meet Commander Brooke’s design, she was cut down to the
water-line, and given a superstructure in the shape of an ugly, squat
rectangular deck-house with sloping sides, and was referred to
afterwards by her northern opponents as a floating barn. The over-all
deck length of this casemate was about 170 feet. Its sloping walls
were framed of pine twenty inches thick, upon which oak planking
four inches thick was laid, and outside this two sets of iron plates,
formed by rolling out railway rails, were laid, the first horizontally and
the outermost vertically. Both sets of plates were fastened on by
bolts 1⅜ inches thick, passing through to the back of the timber. The
sides sloped considerably, according to some writers 35 degrees,
while others put the inclination at 45 degrees. The intention was that
any shot striking her should only inflict a glancing blow and ricochet
harmlessly. For the same reason the ends of the casemate were
given a similar angle, but instead of being straight like the sides,
were semi-circular, or almost so. The top of the structure was
covered by an iron grating, which served the double purpose of
permitting the ventilation of the interior and keeping out missiles.
This grating measured about 20 feet by 120 feet. Her armament
consisted of two 7-inch rifle guns mounted on pivots so that they
could be fired through any of the ports in the sides of the casemate,
a 6-inch rifled gun on either broadside, and three 9-inch smooth-bore
Dahlgren guns. Altogether she had fourteen gunports. To add to her
effectiveness, an iron ram was affixed to the bow. Her stern lay very
little above the water, but the highest point of the bow was about two
feet above the sea. Her conning tower, a cone three feet high and
protected by four inches of armour, was placed beyond the forward
end of the casemate. Her funnel was unprotected. Though supposed
to be renamed the Virginia, she never lost her old name of Merrimac.
Against the wooden ships in Hampton Roads she was invulnerable.
Even at point-blank range their broadsides did not suffice to stop her.
This was her trial trip, and her engines, patched up after their
experiences in the fire and at the bottom of the harbour, could only
get her along at about four miles an hour, and her crew had never
been afloat in her before. Nevertheless her commander, Franklin
Buchanan, combined the trial trip with active service, and attacked
the northern ships with a determination which carried consternation
to the North. The wooden Cumberland was blown up and the
Congress sunk, the latter as the result of an application of the ram,
which, however, injured the ramming vessel so much that the future
effectiveness of her ram was greatly reduced. Buchanan was so
badly wounded in this engagement that he was unable to command
the Merrimac in her duel the next day with the Monitor.
The Monitor, designed by Ericsson, was built under very arbitrary
conditions. When it became known that the Merrimac was under
construction, President Lincoln advertised for something to meet her
on equal terms, and Ericsson tendered. He pointed out that the
armour plates of the Gloire or Warrior would be useless against the
heavy 12-inch wrought-iron gun he had brought out in 1840, in
connection with Colonel Robert Stockton, and as he pledged himself
that he could complete in a hundred days a steam vessel carrying
two of such guns placed in a turret which should be armour-plated
and proof against the heaviest guns the Confederates could place in
the Merrimac, his tender was accepted. Ericsson was hampered in
his work by the interference of the government officials, hardly any of
whom understood his plans, but all of whom thought themselves
competent to improve upon them. Considering the limitations under
which his undertaking had to be accomplished, the Monitor was a
remarkable vessel in every respect. He had to draw out his plans to
scale, have all the parts designed, see that everything was made as
he designed it, and supervise the construction of the ship and
engines, and the whole of this work had to be done within a stated
time. The adventure, for such it unquestionably was, was hailed
throughout the length and breadth of America as the work of a
madman. Like all innovations destined to play an important part in
the world’s history, it was greeted with derision and abuse. There
were a few people on both sides of the Atlantic who recognised the
importance of the change in naval construction which Ericsson’s ship
inaugurated. These were they who had profited by the lessons of the
armoured gunboats or floating batteries employed by the French and
English in the Crimean War. They saw that if small but powerfully
armed ships could effectively attack powerful shore batteries, and by
reason of their shape could never receive a direct blow but only
glancing shots, a vessel carrying a circular fort which also could not
receive a direct blow must be superior to any vessel afloat,
especially if its fort or turret were so heavily armoured as to be proof
against the heaviest ordnance to whose fire it should be subjected.
Moreover, if the hull were made to offer the least possible mark to an
enemy, the difficulty of striking the vessel to sink it would be greatly
increased. The form of the vessel was such that if it were used as a
ram the weight behind the ram would be in a horizontal plane with
the ram at the point of contact, and greater injury would thereby be
inflicted upon the side of an opposing vessel than were there a
greater amount of weight above the horizontal plane.
These considerations were ably supported by Admiral Porter, of the
United States Navy, who was well aware of the value of such a
means of attack even if the propelling engines could not give the
ship a speed of more than four or five miles an hour. The gallant
admiral himself was the butt of no slight amount of ridicule by his
emphatic declaration that the Monitor “is the strongest floating vessel
in the world and can whip anything afloat.” The vessel was built of
iron, and can best be described as a shallow, oblong box, with
sloping sides, having upon it a pointed, flat, shallow box or raft with a
stumpy, circular tower or turret amidships. This box or upper part
projected a considerable distance all round above the lower part,
and especially so at the stern; and had not the whole vessel been
very strongly constructed, the fearful blows which the under-part of
the projection received from the sea as it rose and fell on the waves
on its passage from New York to Hampton Roads would have driven
the two parts asunder.
THE “MONITOR”-“MERRIMAC” DUEL.
From a Photograph of a Contemporary Drawing supplied by the U.S. Navy Department.
Up to the last Ericsson was bothered by the government officials.
Had he been left to himself the ship would not have had such a
narrow escape from going to the bottom. They interfered with the
turret-bearings, with the result that when the sea washed over the
low deck, the water poured into the hold from all round the turret and
put out the fires in the engine room, when the fumes drove the
engineers out of their quarters and nearly poisoned everybody in the
turret through which all the outgoing ventilation had to be made.
However, the tugs got the vessel safely into smoother water, the
furnace was set going again, and the pumps were restarted, and by
the time Hampton Roads was reached the vessel was labouring
along as best it could under its own steam and with the aid of a
couple of tugs. The narrow escape the Monitor had from foundering
on this voyage served to stimulate the chorus of disapproval, and
there were not wanting many on the northern side as well as on that
of the south to predict the failure of “Ericsson’s folly.”
Ericsson had confidence in his ship. He had never forgiven the
British Admiralty for its rejection of the screw propeller, nor for
ignoring his suggestions in regard to the Princeton, and one reason
why he chose the name of the Monitor, as he told the writer and
others more than once, was that it should be a perpetual reminder to
the British Admiralty of the chance it had lost.
In the turret were two 11-inch Dahlgren smooth-bores which fired
solid iron shots weighing 135 to 136 lb. each with charges of 15 lb. of
powder, and were even more powerful than his own gun. Solid iron
stoppers closed the ports when the guns were run in. The deck had
five projections besides the turret. Right forward was a small square
pilot-house measuring four feet, and constructed of bars of iron nine
inches thick, and provided with a flat iron roof two inches thick. In the
sides of the pilot-house were narrow slits as sight holes. The other
projections were two small chimneys six feet high, removable before
an engagement, and two intake ventilators.
Neither side on the morrow shirked the coming duel. From the outset
the Monitor was the better prepared. Her guns fired solid shot; the
Merrimac had only shell and grape, neither of which was calculated
to do much harm to the Monitor’s turret, whereas the blow of the
Monitor’s shot upon the sloping sides of the Merrimac’s battery was
bound to be delivered with terrific force, even though the blows were
slanting. For another thing, the southern vessel was built of wood
and had already suffered severely in the hard contest at short range
with the battleships the previous afternoon; her engines were shaky,
and her steering gear worked worse than before; and the
experiences of some of her crew, coupled with the wounding of her
commander, had not been such as to leave their confidence
unshaken. The Merrimac was now commanded by Commodore
Tatnall, the hero of the episode in the Anglo-American attack some
years before upon the Chinese forts at Peiho, when he justified the
participation of the Americans by the famous remark that “blood is
thicker than water.” Tatnall proved himself a worthy successor to
Buchanan.
When the Merrimac sallied forth the next morning intending to
complete the destruction of the northern warships, she found the
Monitor waiting for her. Notwithstanding the inferiority of his
ammunition, Tatnall never hesitated for a moment. The firing
between the two ships was mostly at short range, and by the time
the battle was over both vessels had had enough of it. Neither side
admitted defeat, but neither side had succeeded in destroying the
other. The Monitor was struck twenty-two times, and in return she
fired forty-one shots. Precisely how many of these were effective on
the southern ship is not known, but including the fight of the previous
day, she was found afterwards to have no fewer than ninety-seven
indentations on her armour. Her layers of plating were shattered, and
the heavy wooden backing was splintered, but not one of the heavy
shots of the Monitor succeeded in penetrating the Merrimac. The
backing only splintered where the heavy shot had struck direct
blows. Nine of the Confederate shells struck the turret, and the pilot-
house was struck twice, and the other projections and the deck also
showed marks of the enemy’s fire. The result of the battle was that
the Monitor was able to resume hostilities and the Merrimac was so
badly crippled that she could not do so.
The steering gear and anchor of the Monitor were protected by the
overhanging deck, and were out of reach of the Merrimac’s fire. This
arrangement was repeated with modifications in most of the northern
monitors afterwards built, and greatly puzzled the Confederates until
they discovered the method by which the vessels could be anchored
or lift anchor without anyone appearing on deck.
It should be remembered that the Merrimac had to contend not only
against the Monitor, but also against the gunboats of the northern
fleet, which fired upon her whenever they had a chance.
The subsequent fate of these two typical ironclads is interesting. The
Monitor was sent to sea in weather she could never hope to contend
against, and went to the bottom. When the fortunes of war drove the
Confederates away from the positions they had occupied at
Hampton Roads, the Merrimac was scuttled by her commander to
prevent her falling into the hands of the Federals. Both sides went on
building ironclads of the types they had introduced. The Federals
rapidly acquired a fleet of monitors, because they were convinced of
the superiority of that type of vessel, and had almost unlimited
resources. The South built a few more broadside ironclads because
it had no option in the matter. It was a case of taking wooden
steamers and plating them as best it could with rolled-out railway
metals, boiler plates, and, in fact, anything metallic that could be
bolted on.
The Atlanta, formerly the English steamer Fingal, was cut down
much as the Merrimac had been, and given a heavy wooden
casemate plated with iron. The two monitors, Nahant and
Weehawken, were waiting for her, and when she set out from
Savannah to look for them, they followed. So also did some
steamers carrying a large number of Southerners who went to see
their ship defeat the monitors. The Atlanta fired one shot at the
Weehawken and missed, and the monitor returned the compliment
by steaming to within 800 yards and firing her heavy 15-inch gun.
The projectile smashed the Atlanta’s armour and wooden backing,
and the flying splinters wounded sixteen of the crew. She returned
the fire two or three times without hitting once, but the Weehawken’s
second shot smashed the pilot-house and the third started the
casemate from the deck. The Atlanta surrendered in fifteen minutes
after the firing of the first shot. Her subsequent employment was as a
guardship in the northern fleet. The Nahant did not fire.
The Albemarle, another Confederate ram of the Merrimac type, had
a short but exciting career. She carried only two 100-pounder rifled
guns, pivoted to fire end-on or on the broadside. Her first exploit was
to ram the northern gunboat Southfield, in the Albemarle Sound; her
ram entered about 10 feet, and the Southfield began to sink so
rapidly that, before she rolled off the Albemarle’s ram, she nearly
took the latter down with her. The Albemarle afterwards fought a
pitched battle with four northern paddle-wheel gunboats, and
although she was rammed and damaged, she held her own. Her
destruction may be said to have heralded the introduction of the
torpedo boat, and for this reason is referred to in a subsequent
chapter.
