Geophysical Journal International

Geophys. J. Int. (2013) 194, 417–449 doi: 10.1093/gji/ggt095

Advance Access publication 2013 April 4

Global shear speed structure of the upper mantle and transition zone

A. J. Schaeffer1,2 and S. Lebedev1

1 Geophysics Section, School of Cosmic Physics, Dublin Institute for Advanced Studies, Dublin, Ireland. E-mail: [email protected]
2 School of Geological Sciences, University College Dublin, Dublin, Ireland

Accepted 2013 March 6. Received 2013 February 15; in original form 2012 November 16

The rapid expansion of broad-band seismic networks over the last decade has paved the way for
a new generation of global tomographic models. Significantly improved resolution of global
upper-mantle and crustal structure can now be achieved, provided that structural information
is extracted effectively from both surface and body waves and that the effects of errors in the
data are controlled and minimized. Here, we present a new global, vertically polarized shear
speed model that yields considerable improvements in resolution, compared to previous ones,
for a variety of features in the upper mantle and crust. The model, SL2013sv, is constrained by
an unprecedentedly large set of waveform fits (∼3/4 of a million broad-band seismograms),
computed in seismogram-dependent frequency bands, up to a maximum period range of 11–
450 s. Automated multimode inversion of surface and S-wave forms was used to extract a
set of linear equations with uncorrelated uncertainties from each seismogram. The equations
described perturbations in elastic structure within approximate sensitivity volumes between
sources and receivers. Going beyond ray theory, we calculated the phase of every mode at
every frequency and its derivative with respect to S- and P-velocity perturbations by integration
over a sensitivity area in a 3-D reference model; the (normally small) perturbations of the 3-D
model required to fit the waveforms were then linearized using these accurate derivatives.
The equations yielded by the waveform inversion of all the seismograms were simultaneously
inverted for a 3-D model of shear and compressional speeds and azimuthal anisotropy within
the crust and upper mantle. Elaborate outlier analysis was used to control the propagation of

errors in the data (source parameters, timing at the stations, etc.). The selection of only the
most mutually consistent equations exploited the data redundancy provided by our data set
and strongly reduced the effect of the errors, increasing the resolution of the imaging.
Our new shear speed model is parametrized on a triangular grid with a ∼280 km spacing. In
well-sampled continental domains, lateral resolution approaches or exceeds that of regional-
scale studies. The close match of known surface expressions of deep structure with the
distribution of anomalies in the model provides a useful benchmark. In oceanic regions,
spreading ridges are very well resolved, with narrow anomalies in the shallow mantle closely
confined near the ridge axis, and those deeper, down to 100–120 km, showing variability in
their width and location with respect to the ridge. Major subduction zones worldwide are
well captured, extending from shallow depths down to the transition zone. The large size
of our waveform fit data set also provides a strong statistical foundation to re-examine the
validity field of the JWKB approximation and surface wave ray theory. Our analysis shows
that the approximations are likely to be valid within certain time–frequency portions of most
seismograms with high signal-to-noise ratios, and these portions can be identified using a set
of consistent criteria that we apply in the course of waveform fitting.
Key words: Inverse theory; Surface waves and free oscillations; Seismic tomography;
Computational seismology; Dynamics of lithosphere and mantle.

1 INTRODUCTION
The resolving power of global seismic tomographic imaging has improved dramatically over the last
several decades, beginning from the early work of Dziewónski et al. (1977). These improvements
have been facilitated by the rapid growth of high-quality, broad-band, three-component seismic
data recorded by global and regional seismic networks. This growth has accelerated particularly


in the last few years, stemming from the emergence of large- and continental-scale,
high-resolution seismic arrays (i.e. EarthScope USArray, VEBSN, etc.). Also important has been
the development of computational infrastructure and advancements in semi- and fully automated
data-processing techniques and modelling methodologies.
Together, these developments have now paved the way for a new generation of global tomographic
models. They will provide substantially higher resolution of the structure of the lithosphere and
underlying upper mantle compared to ones of only a few years ago.
The long-wavelength structure of the Earth’s lithospheric mantle (down to 200 or 300 km depth)
has been well resolved for a number of years, with strong correlation for scale lengths of several
thousand kilometres between many global models (Becker & Boschi 2002). However, such wavelengths
are too long for consistent comparisons with geological and geochemical evidence on regional-scale
tectonics. In the deep upper mantle and in the mantle transition zone (410–660 km depths), existing
global models show substantially weaker agreement, even at the long wavelengths of a few thousand
kilometres.
Global models have been obtained using a variety of methods and data sets, including traveltimes
(e.g. Grand et al. 1997; Bijwaard & Spakman 2000; Karason & van der Hilst 2000; Grand 2002; Amaru
2006; Simmons et al. 2006; Li et al. 2008), surface waves (e.g. Zhang & Tanimoto 1993; Shapiro &
Ritzwoller 2002; Zhou et al. 2006; Nettles & Dziewónski 2008; Ekström 2011), surface waves and
body waves (or fundamental and higher modes, e.g. Woodhouse & Dziewónski 1984; Mégnin &
Romanowicz 2000; Debayle et al. 2005; Panning & Romanowicz 2006; Lebedev & van der Hilst 2008;
Ferreira et al. 2010; Lekić & Romanowicz 2011; Debayle & Ricard 2012) and surface waves with
traveltimes and normal modes often included as well (e.g. Su et al. 1994; Masters et al. 1996,
2000; Gu et al. 2001; Ritsema et al. 2004, 2011; Houser et al. 2008; Kustowski et al. 2008a).
Models at regional to subcontinental scales can take advantage of particularly dense data sampling
within regions and target higher resolutions. Continental-scale models provide coverage across yet
larger regions, but are still limited by their constrained dimensions. Some recent examples,
grouped by continent, include: North America (Bedle & van der Lee 2009; Burdick et al. 2010; Tian
et al. 2011; Yuan et al. 2011), South America (van der Lee et al. 2001; Schimmel et al. 2003; Feng
et al. 2004; Heintz et al. 2005), Eurasia (Amaru 2006; Priestley et al. 2006; Kustowski et al.
2008b; Legendre et al. 2012; Panning et al. 2012; Zhu et al. 2012), Australia (Simons et al. 1999;
Debayle & Kennett 2000; Yoshizawa 2004; Fishwick et al. 2008; Fichtner et al. 2010) and Africa
(Sebai et al. 2006; Pasyanos & Nyblade 2007; Priestley et al. 2008; Fishwick 2010). Direct
comparisons between different regional or continental models are not always straightforward (one
continent to another, for example), due to differences in regularization, parametrization and the
data sets themselves (Bijwaard et al. 1998; Nettles & Dziewónski 2008).
With the expansion of seismic networks, improvements in computational capabilities, and
advancements in methodologies, higher resolution (a few hundreds of kilometre length scales) global
models have now become a reality. These new models enable exploration of deep lithospheric
processes at the fine scale of tectonic units across entire continental domains. At these shorter
length scales, however, global models show greater variance than at longer wavelengths (Becker &
Boschi 2002). Such differences can arise from a number of factors, including data set selection
(i.e. earthquakes and stations), treatment of errors in the data, model parametrization and
regularization, data type (wave type, waveforms, phase delays, etc.), methodological and
theoretical limitations or treatment of the crust (Lekić & Romanowicz 2011).
A number of theoretical and computational approaches have been developed during the past
decade-and-a-half to more accurately relate the seismic wavefield to seismic velocity structure.
These techniques focused, in particular, on the frequency-dependent and 3-D nature of seismic wave
sensitivity regions. Modelling finite-frequency effects using the first-order Born approximation
was applied to body waves (Dahlen et al. 2000; Nolet & Dahlen 2000; Zhao et al. 2000; Montelli
et al. 2004; Sigloch et al. 2008; Zaroli et al. 2010) and surface waves and multimode waveforms
(Li & Romanowicz 1995, 1996; Marquering et al. 1996; Meier et al. 1997; Yoshizawa & Kennett 2002;
Zhou et al. 2006). Recently, fully numerical wavefield simulations have also been applied, more and
more, in waveform tomography (Chen et al. 2007; Fichtner et al. 2009, 2010; Tape et al. 2009;
Lekić & Romanowicz 2011; Zhu et al. 2012). However, the improved precision and accuracy in
modelling greater wavefield complexity trades off very steeply with increases in computational
cost. Commonly, such models utilize only tens to hundreds of events, and several hundreds of
stations.
Asymptotic and ray-based approaches (e.g. Debayle et al. 2005; Lebedev & van der Hilst 2008;
Kustowski et al. 2008a; Ferreira et al. 2010; Ritsema et al. 2011; Debayle & Ricard 2012) are
computationally inexpensive and can be used with significantly larger data sets, affording a much
higher degree of data redundancy. This can play a critical role in minimizing the impact of errors
common to different types of methodologies, the most significant being event mislocations and
incorrect source mechanisms and station timing errors. Simple algorithms which leverage the data
redundancy to identify and partition the affected seismograms can be implemented to reduce (and in
some cases eliminate) their effect on the final inversion product. The increased redundancy can
also enhance the validity of approximations themselves, thanks to a larger data set. In both these
regards, the utility and relevance of asymptotic techniques currently remains very clear.
The comparative advantages of different approaches of seismic tomography are now a subject of
scrutiny and debate, as the field is developing and improving methods to exploit the enormous (and
growing) volumes of available broad-band data. Different approaches may work best for different
targets. For example, to image a narrow plume in the deep mantle (a notoriously difficult target
because wave front healing nearly erases the signal of such a structure in teleseismic travel
times), accurate numerical modelling of seismic wave diffraction and scattering off the plume may
be the most suitable approach, even when applied to a small number of seismograms (Rickers et al.
2012). A very different problem is presented by the imaging of the wave speed distribution in the
lithosphere and upper mantle at a regional to global scale: asymptotic methods that are applicable
to very large data sets and capable of effective extraction of structural information from both
surface and body waves are likely to exploit the redundancy of the presently available data more
effectively and, thus, can provide models with higher resolution and greater robustness.
Automated multimode inversion (AMI) of surface and S-wave forms (Lebedev & Nolet 2003; Lebedev
et al. 2005; Lebedev & van der Hilst 2008) was developed on the basis of the partitioned waveform
inversion (Nolet 1990), which splits a large-scale tomography problem into more tractable
inversions of each seismogram individually, similar to the techniques of Cara & Lévêque (1987) and
Gee & Jordan (1992). AMI enables efficient, automated, accurate processing of very large numbers of
vertical- and horizontal-component seismograms. This is accomplished through an elaborate

window-selection procedure isolating signals least likely to contain scattered arrivals, combined
with appropriate weighting of windows containing waves of different amplitudes and types, while
enforcing strict misfit criteria. AMI assumes the JWKB approximation (Dahlen & Tromp 1998);
time–frequency portions of each seismogram are systematically selected to contain only signals
that can be accurately modelled. Instead of relying on ray theory and the path-average
approximation, the initial phase velocities and their derivatives with respect to S and P
velocities are computed as integrals over approximate sensitivity areas between sources and
stations, within a 3-D reference model.
In this study, we have used AMI to generate an unprecedentedly large data set of ∼3/4 million
vertical-component, multimode waveform fits, each yielding a set of linear equations constraining
perturbations in Earth structure. These equations were inverted together for our new global,
upper-mantle, shear velocity model. The model is constrained by substantially more waveforms
compared to any previous ones and an order of magnitude more than in the previous global
application of AMI (Lebedev & van der Hilst 2008). The improved data sampling and data redundancy
enable finer global parametrization and global resolution, which serves to further close the
resolution gap between global and regional mantle studies (Bijwaard et al. 1998).
In the sections below, we begin with an overview of the methods and the assembly and preparation
of the data set prior to inversion. We then present our new model and discuss its major features,
from the large-scale ones, already seen in past models, to the smaller scale ones that are now
imaged much more clearly than previously. Finally, we utilize our new data set of waveform fits to
provide a sound statistical sampling of the bulk fundamental- and higher mode Rayleigh wave
dispersive properties of the crust and upper mantle, and also take the opportunity to re-examine
the global validity of the JWKB approximation and surface wave ray theory. In the Appendices, we
provide a further analysis of the data set as well as how the total frequency band of waveform
fitting affects the resulting tomographic models.

2 INVERSION PROCEDURE
We have built a global, vertically polarized shear velocity model, extending from the crust to the
base of the transition zone, through the application of a three-step waveform fitting and inversion
procedure. We begin with the application of AMI to broad-band seismograms to generate sets of
linear equations which constrain the sensitivity-volume average velocity perturbations between each
source and receiver, with respect to a 3-D reference model. In the second step, we combine the
equations from AMI into one large linear system and solve it for the 3-D distribution of P, S and
azimuthal anisotropy perturbations, subject to regularization and smoothing, using the LSQR
algorithm (Paige & Saunders 1982) and following the procedure of Lebedev & van der Hilst (2008).
Finally, we perform an outlier analysis of the data set, and a posteriori select the most mutually
consistent equations to be reinverted so as to constrain the final model.

2.1 Waveform inversion
AMI’s numerical efficiency, the capacity to select signal for which theoretical approximations
hold and then weight and balance the information derived from different portions of the wave train,
enables accurate processing of very large numbers of waveforms, resulting in high-resolution models
of the Earth’s upper mantle and transition zone. The fully automated algorithm is built on the
basis of the partitioned waveform inversion of Nolet (1990), as described in detail by Lebedev
et al. (2005), with further advancements in Lebedev & van der Hilst (2008).
For each seismogram, AMI uses non-linear waveform fitting to derive a set of linear equations with
uncorrelated uncertainties that describe finite-width sensitivity-volume-average S- and P-velocity
perturbations [$\delta\beta(r)$ and $\delta\alpha(r)$], with respect to a 3-D reference model.
Synthetic seismograms are computed in the frequency domain using the JWKB mode summation:

$$ s(\omega) = \sum_m A_m(\omega) \exp\left[ \frac{i\omega\Delta}{C_m^0(\omega) + \delta C_m(\omega)} \right], \qquad (1) $$

by summing over the modes, $m$, for the given source–receiver distance $\Delta$. The initial phase
velocities, $C_m^0(\omega)$, and their Fréchet derivatives are pre-computed for our 3-D reference
model. For each source-station pair, they are averaged across approximate sensitivity kernels. The
average phase-velocity perturbations, $\delta C_m(\omega)$, are expressed as functions of the
sensitivity-volume average perturbations in P and S velocity:

$$ \delta C_m(\omega) = \int_0^R \frac{\delta C_m^0(\omega)}{\delta\beta(r)}\,\delta\beta(r)\,dr + \int_0^R \frac{\delta C_m^0(\omega)}{\delta\alpha(r)}\,\delta\alpha(r)\,dr, \qquad (2) $$

where $R$ is the radius of the Earth.
This scheme (Lebedev & van der Hilst 2008) goes significantly beyond ray theory and path-average
approximations. The phase of every mode at every frequency and its derivatives with respect to
seismic wave speeds are computed as integrals over the sensitivity area in the 3-D reference model.
Only the perturbations to the phase velocities (small, in most cases) are linearized, and this is
done using the accurate average derivatives. At no point do we use the cruder approximation of 1-D,
path-average models. Generally, a 1-D model with the same phase velocities and their derivatives as
the ones we compute by sensitivity-area integration is not likely to even exist (although, of
course, there may exist models with dispersion properties that are similar).
Initially, $\delta\beta(r)$ are parametrized using a set of 1-D basis functions $h_i(r)$ which span
depths from 7 km within the crust to ∼1600 km in the upper portion of the lower mantle (18
parameters for S and 10 parameters for P). Through diagonalization of the Hessian matrix, these are
transformed into independent linear equations with uncorrelated uncertainties (Nolet 1990), with
new parameters $\eta_i$ corresponding to basis functions $g_i(r)$, with each $g_i(r)$ a linear
combination of the original basis functions $h_i(r)$:

$$ \delta\beta(r) = \sum_{i=1} \eta_i\, g_i(r). \qquad (3) $$

The strength of AMI is rooted in the fully automated selection and weighting of time–frequency
windows, which enables reliable and accurate application to massive data sets. Computed normal-mode
synthetics are matched with real seismograms in time–frequency windows that isolate the
fundamental- and higher mode wave trains unaffected by scattered waves. Examples of this waveform
fitting procedure are shown in Figs 1 and 2 for two paths, one shorter and one longer. In the
following paragraphs, key elements of the technique are highlighted; for greater detail we refer
the interested reader to Lebedev et al. (2005).
Slip along a fault generates a non-uniform radiation pattern, with the initial phase and amplitude
of every mode varying as a function of azimuth and frequency. Source-station azimuths close to a node

Figure 1. Automated multimode waveform inversion example. (a) An earthquake on 1997 February 28 (43°N, 148°E, 37 km depth, moment magnitude MW =
5.8) recorded on the vertical component at the broad-band station Talaya, Russia (TLY) of the Global Seismograph Network (GSN), operated by IRIS/IDA; the
source-station distance is 3380 km. The approximate sensitivity area is shaded grey, with darker colours indicating greater sensitivity. (b) 11 closely spaced
Gaussian filters used in generating the different time–frequency windows. (c) The resulting waveforms (solid lines) are matched with synthetics (dashed lines)
in 18 different time–frequency windows simultaneously. The time windows are indicated by half-brackets, with the signal envelope shaded. The fundamental-
mode wave train is identified by vertical white bars at the maxima of the envelope. The initial fit is computed using our 3-D background model. (d) The misfit
is minimized through non-linear inversion for the sensitivity-volume average perturbations δβ(r ) and δα(r ). Energy in the synthetic is equalized with that of
the data in each window. All 18 selected time–frequency windows have final data-synthetic misfits less than 5 per cent. The average perturbations computed by
waveform inversion constrain the S- and P-velocity perturbations within the sensitivity area shown in (a), used in tomographic inversion. (e) Final data-synthetic
fit within a single, broad time–frequency window encompassing the entire frequency range of this waveform inversion. Arrival times of the S- and triplicated
multiple-S waves predicted by AK135 (Kennett et al. 1995) are indicated by grey shading. The same phases are also indicated above their frequency windows
in (d).
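The acceptance test described in panels (c) and (d) can be sketched as follows. This is a minimal illustration, not the AMI implementation: the exact misfit definition is not given in this section, so the sketch assumes a normalized squared misfit computed after scaling the synthetic to the energy of the data. The function names `window_misfit` and `fit_accepted` are hypothetical.

```python
import math


def window_misfit(data, synth):
    """Normalized squared misfit in one time-frequency window.

    Assumption: the synthetic is first rescaled so that its energy equals
    that of the data, then the residual energy is divided by the data energy.
    """
    e_data = sum(d * d for d in data)
    e_synth = sum(s * s for s in synth)
    if e_data == 0.0 or e_synth == 0.0:
        raise ValueError("zero-energy window")
    scale = math.sqrt(e_data / e_synth)  # energy equalization
    resid = sum((d - scale * s) ** 2 for d, s in zip(data, synth))
    return resid / e_data


def fit_accepted(windows, threshold=0.05):
    """Accept a waveform fit only if every window is below the threshold.

    `windows` is a list of (data, synthetic) sample sequences; the 5 per cent
    threshold follows the caption, the misfit norm is an assumption.
    """
    return all(window_misfit(d, s) < threshold for d, s in windows)
```

Under this definition a synthetic that differs from the data only in amplitude has zero misfit, whereas a quarter-period phase shift fails the 5 per cent test by a wide margin, which is the behaviour one would want from a phase-sensitive criterion.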
in the radiation pattern are more likely to contain relatively higher proportions of scattered
energy within that portion of the seismogram, and should therefore be avoided. Prior to waveform
fitting, the frequency- and azimuth-dependent nodal radiation patterns are computed for each
seismogram. For each frequency, azimuth bands in which the amplitude of the predicted pattern is
less than half the maximum across all azimuths at that frequency are determined and discarded. If
the given source-receiver geometry does not fall in any permitted azimuthal band at any frequency,
then that seismogram is discarded.
The Gaussian filter windows (B, Figs 1 and 2) are initially defined within the range selected by
the frequency-dependent azimuthal nodal radiation pattern, for each given seismogram. This may be
narrowed further through enforcement of the far-field and point-source approximations.
The far-field approximation ensures sufficient source–receiver distance to avoid complexities due
to near-field wave propagation effects (e.g. evanescent waves); based on extrapolation from the
work of Pollitz (2001), two fundamental-mode wavelengths are sufficient. The minimum frequency
filter is then constructed such that the leftmost tail frequency corresponding to an amplitude of
0.3× the filter central maximum contains exactly three fundamental-mode wavelengths between the
source and receiver. This ensures the filter centre frequency (dominant frequency) contains more
than three wavelengths. Effectively, it is the path length that controls the minimum filter
frequency: the longer the paths, the lower the minimum frequency (i.e. the longer the maximum
period) of the fundamental-mode waveforms.
The validity of the point-source approximation is ensured by setting a maximum frequency (minimum
period) limit at $1/(3\tau)$, where $\tau$ is the earthquake source duration time, taken from
Centroid Moment Tensor (CMT) catalogues. (If the period of the wave is comparable to the source
duration time, then both its amplitude and phase will be affected by unmodelled complexity of the
source.) As a result, the earthquake magnitude controls, in part, the maximum frequency of fitting:
the larger the magnitude, the longer the source

Figure 2. Automated multimode inversion example. Earthquake on 2003 June 7 (5.3°S, 152.6°E, 30 km depth, moment magnitude MW = 6.6) recorded on
the vertical component of station Black Hills, South Dakota (RSSD of the GSN, operated by IRIS/USGS); the source-station distance is 11 485 km. Plots are
the same as in Fig. 1, except with seven Gaussian filters, 11 time–frequency windows and arrivals of multiple S3–S7 as indicated.
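The two frequency limits set by the far-field and point-source approximations in Section 2.1 can be illustrated with a short calculation. This is a simplified sketch under stated assumptions: it uses a single, constant fundamental-mode phase velocity, whereas real phase velocities are dispersive, and it does not model the Gaussian filter shape that relates the 0.3-amplitude tail to the filter centre frequency. All function names are hypothetical.

```python
def far_field_tail_frequency(distance_km, phase_velocity_km_s, n_wavelengths=3.0):
    """Frequency (Hz) at which the path spans `n_wavelengths` wavelengths.

    Wavelength = c / f, so distance / wavelength = n  gives  f = n * c / distance.
    Assumption: constant phase velocity c (no dispersion).
    """
    return n_wavelengths * phase_velocity_km_s / distance_km


def point_source_max_frequency(source_duration_s):
    """Upper frequency limit 1/(3*tau) from the point-source approximation."""
    return 1.0 / (3.0 * source_duration_s)


def usable_band(distance_km, phase_velocity_km_s, source_duration_s):
    """Return (f_low, f_high) in Hz, or None if the two limits leave no band."""
    f_low = far_field_tail_frequency(distance_km, phase_velocity_km_s)
    f_high = point_source_max_frequency(source_duration_s)
    return (f_low, f_high) if f_low < f_high else None
```

For the 3380 km path of Fig. 1 and an assumed phase velocity of 4 km/s, the lower limit is 12/3380 Hz, roughly 3.55 mHz. A hypothetical source duration of 6 s would put the upper limit at 1/18 Hz, that is, a minimum period of 18 s.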

duration time, and the lower the filters’ maximum frequency (the longer the shortest period).
The time–frequency windows are generated through application of boxcar time windows after
bandpassing with the suite of Gaussian filters. The time windows contain individual wave trains or
series of wave trains, and are selected such that their boundaries do not cut the middle of a wave
packet: the signal at the nearest maximum of the envelope must be at least ∼3.5 times larger than
at the window boundary. Lebedev et al. (2005) initially used a more conservative threshold of 4–5;
however, further testing demonstrated this can safely be relaxed to a lower value, without
detriment to waveform fits. The rightmost time-window limit immediately follows the
fundamental-mode arrival at all filter frequencies, avoiding scattered waves in the coda. The
leftmost time-window limit varies as a function of epicentral distance, with cut-offs set to
eliminate S and multiple-S waves sampling the deep lower mantle. As a result, time–frequency
windows at a given frequency may contain only a single fundamental-mode window, fundamental and
higher modes or a group of windows containing fundamental and higher mode information (D, Figs 1
and 2).
Waveform fitting begins with the lowest frequency Gaussian filter (pre-determined such that the
far-field approximation is valid and nodes in the radiation pattern are avoided) and widest
possible time windows. The minimum centre frequency of the first (lowest frequency) filter is
increased until the signal-to-noise ratio is sufficiently high, such that a low-noise window is
found. By beginning at this lowest possible frequency, the likelihood of errors resulting from 2π
phase ambiguities (cycle skipping) is minimized. For the same reason, the signal envelope is fit
first in every time–frequency window, and the source-station distance is required to not exceed
20 wavelengths of the lowest frequency fundamental mode. Visual examination of thousands of
waveform fits confirmed that these measures, together with our use of a 3-D reference model with
realistic crust, are sufficient to rule out cycle skips. The only exceptions detected occurred due
to very large (tens of seconds) timing errors at the stations. Waveform fits affected by such
errors are removed at the later, outlier-removal stage.
After the minimum frequency filter is determined, waveform fitting proceeds with the iterative
addition of higher frequency time–frequency windows, where individual wave trains in all
time–frequency windows are inverted simultaneously, searching for $\delta\beta(r)$ which minimizes
the cumulative misfit across all windows (Lebedev et al. 2005). In addition, a fit is considered
successful only if the data-synthetic misfit in each time–frequency window is less than 5 per cent.
If the fit is acceptable only within portions of the window, it is iteratively narrowed or split to
attain the target data-synthetic misfit in each new subwindow, while enforcing the requirement that
each must contain a complete wave train. Fig. 2(d) at 16.8 and 21.1 mHz illustrates this window
refinement procedure: what began as single windows spanning 2150–3000 s were iteratively split into
two and four windows, respectively.

2.2 3-D reference model
To accurately relate the phase information in the waveform to perturbations in S and P velocity
within its sensitivity area, we required as accurate as possible reference phase velocities
$C_m^0(\omega)$ and their Fréchet derivatives $\delta C_m^0(\omega)/[\delta\beta(r), \delta\alpha(r)]$.
Less accurate derivatives would result in more inconsistent equations, which manifest as

Figure 3. Reference velocity models used for both AMI and 3-D tomography. Left-hand panel (a) illustrates four example crustal models, based on CRUST2
(Bassin et al. 2000). In each crustal model, velocities below their respective Moho converge to the same mantle reference velocity profile (b). Topography is
accounted for in each model, and is indicated by negative depth values. The oceanic model (blue) has a 5-km water layer and 100 m of sediments above a
‘normal’ oceanic crustal model (CRUST2 type A0); the platform margin model has 1 km of sediments with minimal topography (type DG); the Archean shield
model (based on Canadian Shield) has 500 m of sediments and an elevation of 700 m (type G2); the orogenic model (based on Tibet/Himalaya) has 4.5 km of
sediments and 4 km elevation (type RC). Right-hand panel (b) shows the different mantle reference models, including AK135 (Kennett et al. 1995). AMI uses
a modified AK135, recomputed at a reference period of 50 s (AK135_50). The reference model for tomography is generated by adjusting AK135_50 based on
an initial global inversion by Lebedev & van der Hilst (2008).
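The time-window boundary rule of Section 2.1 (the envelope at the nearest maximum must be at least ∼3.5 times larger than at the window boundary) can be sketched as a simple check on a sampled envelope. This is an illustrative sketch only: the peak-picking rule and the function names are assumptions, and the actual AMI window selection involves further criteria not reproduced here.

```python
def nearest_envelope_maximum(envelope, boundary):
    """Index of the interior local maximum of `envelope` closest to `boundary`."""
    peaks = [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]
    ]
    if not peaks:
        raise ValueError("envelope has no interior local maximum")
    return min(peaks, key=lambda i: abs(i - boundary))


def boundary_is_clean(envelope, boundary, min_ratio=3.5):
    """True if the window edge does not cut through a wave packet.

    The envelope (non-negative samples) at the nearest maximum must be at
    least `min_ratio` times its value at the boundary sample.
    """
    peak = nearest_envelope_maximum(envelope, boundary)
    edge = envelope[boundary]
    return edge == 0.0 or envelope[peak] / edge >= min_ratio
```

With the older, more conservative threshold of 4–5 mentioned in the text, the same check would simply be called with a larger `min_ratio`.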
noise in the tomographic inversion. Consequently, greater smoothing and norm damping are required,
and the model resolution is decreased.
The high lateral heterogeneity in the Earth’s crust gives rise to significant lateral variability
of the Fréchet derivatives. When not adequately accounted for, this gives rise to artefacts in both
the crustal and mantle parts of the tomographic model. Generally, crustal structure is often
accounted for using ‘crustal corrections’, which involves computing period-dependent corrections
based on an assumed crustal model, which are then applied to phase-velocity maps used in the
inversion (e.g. Boschi & Ekström 2002; Gu et al. 2003; Chevrot & Zhao 2007; Kustowski et al. 2007;
Marone & Romanowicz 2007; Bozdağ & Trampert 2008; Lekić et al. 2010). Instead of computing
corrections, we construct a realistic 3-D reference model which includes a priori crustal
structure, and then solve for velocity perturbations with respect to it. Thanks to this more
accurate approach and to our fundamental-mode waveform fits at the relatively short periods of
15–25 s, obtained for shorter source-station distances and sensitive to crustal structure, we were
able to resolve, typically, the average perturbations with respect to the 3-D reference model
within the normal-continent crustal depth range, while also resolving intracrustal layering for
thick continental crust. This increased the accuracy of our model in the upper mantle below as well.
We sampled the Earth’s surface with a dense triangular grid of knots (Wang & Dahlen 1995a), over
which the 3-D global crustal model CRUST2 (Bassin et al. 2000) was parametrized. The 360 type
models were smoothed at the boundaries of the 2° × 2° cells and augmented with topographic and
bathymetric databases to generate a larger suite of models encompassing greater variations in
water, sediment and crustal thickness. The derivatives
$\delta C_m(\omega)/[\delta\beta(r), \delta\alpha(r)]$ were compared and a subset of 664 exemplar
models was selected and weighted across the triangular grid (average interknot spacing of ∼28 km).
Fig. 3(a) illustrates four example crustal models drawn from this set.
The mantle reference beneath the Moho is based on AK135 (Kennett et al. 1995), but recomputed at a
reference period of 50 s to minimize errors due to lateral variations in attenuation, poorly
resolved within the waveband of interest (hereafter referred to as AK135_50). Fig. 3(b) illustrates
the mantle reference models utilized by AMI and the tomographic inversion. As indicated, the thick
grey line is the AK135_50 reference used by AMI. For the tomographic inversion, this model has been
modified slightly (solid black line) based on an initial global inversion (Lebedev & van der Hilst
2008), to bring the reference values closer to global averages.
Phase velocities and their Fréchet derivatives ($C_m(\omega)$ and
$\delta C_m(\omega)/[\delta\beta(r), \delta\alpha(r)]$) are pre-computed for every lateral knot in
the reference model, covering the broad frequency band 0.488–125 mHz (8–2048 s, extending beyond
that used in waveform fitting). This enables efficient computation of $C_m^0(\omega)$ and
$\delta C_m^0(\omega)/[\delta\beta(r), \delta\alpha(r)]$ for any source–receiver path simply by
summing together weighted phase velocities and derivatives; the weights are based on an approximate
sensitivity kernel $K(\theta, \phi)$, averaged over the frequency band. The same approximate
sensitivity areas are used in both waveform fitting and the tomographic inversion (see Section
2.3.1).

2.3 3-D inversion
The result from AMI for each seismogram consists of a number of equations on the orthogonal basis
$g_i(r)$ with parameters $\eta_i$ and
uncorrelated uncertainties Δηi, which describe the 1-D average S-
and P-velocity perturbations within the sensitivity volume between
the source and receiver, relative to the 3-D reference model
(critically, this is different from generating a 1-D path-average
model). By combining the equations obtained from all the
successfully fit seismograms, a large linear system is constructed,
from which the 3-D distributions of P, S and azimuthal anisotropy
perturbations, relative to the 3-D reference model, are solved for with
LSQR, subject to regularization and smoothing. The horizontal
sensitivity of each seismogram is given by the same kernel, K(θ, φ), as
in waveform fitting. The vertical structure of the kernels gi(r)
differs for each seismogram. Only linear equations with corresponding
eigenvalues exceeding a pre-determined threshold are incorporated
into the inversion. On average, this results in ∼3.5 equations per
path, more if S waves or broader band fundamental modes are
included. Key elements of the inversion procedure are outlined in the
following sections; for further detail, we refer to Lebedev & van der
Hilst (2008).
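
As a rough illustration of how such a system might be assembled, the
sketch below builds one sparse row per AMI equation by combining the
horizontal path weights on the model knots with the vertical kernel
values at the depth nodes. The function name, the knot-major column
ordering and the scaling of each row by 1/Δηi are assumptions made for
this example, not details taken from the text; the stacked rows and
right-hand sides could then be handed to any LSQR implementation.

```python
def equation_row(knot_weights, vertical_kernel, eta, d_eta):
    """Build one sparse row of the linear system (hypothetical layout).

    knot_weights    -- {model knot index: horizontal weight K} for this path
    vertical_kernel -- values of g_i(r) at the depth nodes
    eta, d_eta      -- parameter value and its uncertainty for this equation
    Columns are ordered knot-major: column = knot * n_depth + depth index.
    """
    n_depth = len(vertical_kernel)
    row = {}
    for knot, weight in knot_weights.items():
        for k, g in enumerate(vertical_kernel):
            if g != 0.0:
                # Assumed scaling: divide by the uncertainty so that every
                # equation carries unit variance.
                row[knot * n_depth + k] = weight * g / d_eta
    return row, eta / d_eta


# Toy path sampling two model knots, with three depth nodes.
row, rhs = equation_row({0: 0.25, 3: 0.75}, [1.0, 0.5, 0.0], eta=1.0, d_eta=0.5)
```

Storing each row as a dictionary keeps only the non-zero coefficients,
which matters because a single path touches a small fraction of the
model knots.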

2.3.1 Gridding
To build the linear system, we first generate two global, coregis-
tered triangular grids of knots, using the method of Wang & Dahlen
(1995a). The first is a dense integration grid with nominal interknot
spacing of ∼28 km (same as the reference model). The second grid,
with knot spacing of ∼280 km, is the model grid, on which per-
turbations of the isotropic-average shear and compressional speeds
and shear velocity anisotropy are expanded and solved for. By de-
sign, the knots of the model grid are co-located with knots of the
integration grid, enabling efficient transformations between the two.
Fig. 4(a) illustrates the locations of the integration (black dots and
yellow circles) and model (red and blue circles) grid nodes, and
their relative sensitivities (yellow and blue circles) for the path in
Fig. 1.

The same ‘shell’ of knots is used at all depths in the model.
Vertically, S-velocity perturbations are parametrized on 18 ‘stem’
nodes: 7, 20, 36, 56, 80, 110, 150, 200, 260, 330, 410-, 410+,
485, 585, 660-, 660+, 809 and 1007 km, whereas for P velocity
there are only 10 parameters: 7, 20, 36, 60, 90, 150, 240, 350,
485 and 585 km. Anomalies between the knots of this 3-D grid are
computed by trilinear interpolation. The ‘stem’ nodes are the same
as the vertices of the triangular basis functions hi(r) used in the
waveform inversion, prior to orthonormalization, as illustrated in
Fig. 4(b). The transition zone discontinuities at 410 and 660 km are
accommodated using pairs of half-triangles. The inclusion of the
shallowest nodes ensures that globally there is at least one model
node in the crust, and at times up to four. Therefore, perturbations
from CRUST2 are solved for directly in the inversion, which helps
to minimize the inaccuracies of CRUST2. As will be discussed in
Section 5, the resulting model contains strong deviations from the
crustal reference in many locations (e.g. across Tibet).

For a given seismogram fit by AMI, the sensitivity kernel K(θ, φ)
around the corresponding path is evaluated on the integration grid,
with the total weight for the ith knot being the product between
the sensitivity K(θi, φi) and the area Ai(θi, φi) [defined by the
hexagon (pentagon) that contains all points that are closer to this
grid knot than to any other]. The sensitivity areas K(θ, φ) are similar
to the ‘influence zone’ of Yoshizawa & Kennett (2002) and the
traveltime kernels of Zhou et al. (2005), essentially encompassing
the interior region bounded by the ‘π/2’ Fresnel zone, computed at a
single frequency in the middle of the fundamental-mode’s frequency
band. Weights are largest closest to the source and receiver;
cross-sections reveal that at any point along the path, weights decrease
with distance from the great-circle ray path, to a total width of ±δ,
where δ is the width of the ‘π/2’ Fresnel zone (yellow circles
in Fig. 4a). The sensitivity kernels for each seismogram are then
mapped onto the model grid through averaging of the integration
grid knots, with the resulting weights applied to the parameters for
that path in the inversion (blue circles in Fig. 4a).

Figure 4. (a) Model and integration grids used in tomography shown by
red circles and black dots, respectively. The source-station path illustrated
is that from Fig. 1. Superimposed is the sensitivity area (kernel) K(θ, φ)
from the source-station path illustrated in Fig. 1, represented by the yellow
(integration grid) and blue (model grid) circles. The circle sizes in the
integration grid (yellow) scale with the weight of the knots in the
sensitivity-area integral, whereas in the model (blue) grid their size
indicates the contribution of each knot in inversion, for that path.
(b) Triangular vertical basis functions used in parametrization for
S- (left) and P velocity (right). Note that the discontinuities at 410 and
660 km in the S parameters (absent in P) are generated using two half
triangles, with one above and the other below the discontinuity.

2.3.2 Path weighting
The global distribution of seismometers and seismicity is not even,
with station locations biased to continental regions and oceanic
islands, and events clustering along plate boundaries. As a result,
most large seismic data sets contain some sampling bias. It is clear
from the stations and events shown in Fig. 5 that this is the case in
our data set. To reduce the effect of common or ‘bundled-rays,’ a
reweighting of all paths is performed to produce a normalized path
coverage.

For each path in the inversion, weights measuring cumulative
similarity of a path to all other paths are computed and applied
to the equations corresponding to this path. This results in
relative downweighting of equations derived from commonly travelled
paths, with more even distribution of relative sampling across the
model space. The post-path reweighting coverage for several depths
in the final model is illustrated in Fig. 8.

Figure 5. Map illustrating the distribution of stations and events in the data set. More than 5000 stations from International, National, Regional and temporary
deployments are represented by red triangles. The ∼27 000 total events recorded at the different stations are shown as yellow circles.

2.3.3 Parametrization and regularization
The inversion solves directly for perturbations in P, S and 2ψ
S-velocity azimuthal anisotropy with respect to our 3-D reference
model. Lebedev & van der Hilst (2008) verified that the
perturbations in P velocity could not be resolved independently
using Rayleigh wave data. To avoid trade-offs, the difference
between isotropic P- and S-velocity perturbations is damped, in
the form |δVP (m s−1) − δVS (m s−1)|. This offers greater freedom
to the inversion, as opposed to forcing a rigid coupling;
nevertheless, the resulting P- and S-velocity images are still quite
similar.

Azimuthal anisotropy is described by Smith & Dahlen (1973) as
a harmonic function of the form:

C(T, ψ) = C0(T) + A1(T) cos(2ψ) + A2(T) sin(2ψ)
          + A3(T) cos(4ψ) + A4(T) sin(4ψ), (4)

where T is period, ψ the azimuth, C the observed phase velocity,
C0 the isotropic phase velocity and Ai the anisotropy parameters.
The strength of the 4ψ anisotropic terms is known from previous
global and regional studies to be weak for long-period surface waves
(Montagner & Tanimoto 1991; Trampert & Woodhouse 2003;
Deschamps et al. 2008b; Darbyshire & Lebedev 2009; Adam &
Lebedev 2012); therefore we disregard the 4ψ terms and focus only
on the 2ψ components for vertically polarized Rayleigh waves. In
this paper, we present the isotropic component of an azimuthally
anisotropic model, which included highly smoothed 2ψ terms
to reduce errors resulting from trade-offs between isotropic and
anisotropic heterogeneity. The azimuthally anisotropic structure
shall be the focus of subsequent work.

Regularization is carried out in the form of lateral and vertical
smoothing and slight norm damping, to stabilize the mixed-determined
inversion. Smoothing is the primary control, whereas
the weak damping plays a secondary role. We apply two kinds of
smoothing, with one penalizing the difference between the anomaly
at a node and the average anomaly over this and the six (or five)
nearest neighbour nodes and the other (gradient damping) penalizing
the differences between pairs of neighbouring model knots. In
both cases, the strength of the coefficients decreases as a function of
increasing depth, to prevent oversmoothing at greater depths due to
reduced sampling there (fundamental-mode sensitivity is lower at
greater depths). Vertical gradient damping penalizes rapid changes
with depth within the model.

Finally, norm damping penalizes the amplitude of the anomalies
with respect to the 3-D reference model, with the strength of the
damping, again, a function of depth. The data sampling (and hence
the relative strength of regularization) is quantified using the column
sums of the matrix A that relates the model vector m to the data
vector d (Am = d).

2.4 Outlier analysis
Outlier analysis and rejection (e.g. Lebedev & van der Hilst 2008) is
critical for improving equation consistency, inversion convergence
and the resolution of the imaging. Automated methods enable fast
data processing and production of very large waveform data sets;
it is important to assess the relative quality of the successfully fit
seismograms utilized in the tomographic model. Manual analysis
and examination of the waveform fits was carried out on a subset;
however, this process is time-consuming and negates efficiencies
gained by automation. To quantitatively assess the consistency
and relative quality of hundreds of thousands of waveform fits, we
perform an objective search for those equations which deviate
significantly, most commonly due to source mislocations, errors in
event origin times and source mechanisms, as well as station timing
errors.
The basic outlier analysis procedure utilizes an initial tomo-
graphic inversion for the model m i , from which synthetic coeffi-
cients (data, ds ) are generated through the matrix multiplication
ds = Am i . Then, the distribution of data-synthetic misfits is anal-
ysed, and those equations that lie in the tails of the distribution,
well beyond the 2–3σ level (95–99 per cent), can be identified,
examined and possibly rejected. In practice, eliminating just 1–
2 per cent of outliers greatly improves the inversion convergence
and the resulting model. However, the massive size of our new data
set enables us to be more selective regarding which equations are
retained.
We have undertaken a rigorous, conservative outlier analysis pro-
cedure, to select only the most mutually consistent equations for
use in our final tomographic model. A series of smaller a poste-
riori outlier analyses were carried out. To identify the most con-
sistent equations, we used as a benchmark the data set of global
waveform fits from Lebedev & van der Hilst (2008). Subsets of
10–15 000 randomly selected seismograms from our data set were
inverted together with the 51 004 waveform fits of the benchmark
data set.
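
The basic procedure described above (invert, predict the data with
ds = Am, then inspect the tails of the misfit distribution) can be
sketched in a few lines. This is a minimal illustration rather than the
authors' code: the function names, the dense list-of-lists matrix and
the default cut-off of 2.5σ (chosen from within the 2–3σ range quoted in
the text) are assumptions for the example.

```python
from statistics import mean, pstdev


def synthetic_data(A, m):
    """Predict the data from a model: ds = A m (dense rows, for clarity)."""
    return [sum(a * x for a, x in zip(row, m)) for row in A]


def flag_outliers(A, m, d, n_sigma=2.5):
    """Return the misfits d - ds and a keep/reject flag for each equation.

    An equation is rejected when its misfit lies more than n_sigma
    standard deviations from the mean misfit (n_sigma is an assumed value).
    """
    ds = synthetic_data(A, m)
    misfit = [di - si for di, si in zip(d, ds)]
    mu, sigma = mean(misfit), pstdev(misfit)
    keep = [abs(r - mu) <= n_sigma * sigma for r in misfit]
    return misfit, keep


# Toy case: ten equations on one parameter, the last one badly inconsistent.
A = [[1.0] for _ in range(10)]
d = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, 5.0]
misfit, keep = flag_outliers(A, [0.0], d)
```

In this toy case only the last equation falls outside the accepted
range. In practice the flagged equations would be examined before
rejection, and the inversion repeated on the reduced data set.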
We elected to use this benchmarking method for several reasons.
First, the model of Lebedev & van der Hilst (2008) accurately re-
covers the major SV structure of the upper mantle and transition
zone. Secondly, it has been shown by Becker et al. (2012) that the
anisotropic component of this model correlates well with global
SKS splitting measurements, indicating that its accuracy extends
beyond the isotropic shear speed originally presented. Thirdly,
several passes of outlier analysis were carried out on the benchmark
data set, therefore its equations are highly mutually consistent.
Finally, the use of such a benchmark inversion provides statistical
constraints on our new data set, steering convergence towards a
reasonable final model, while also leaving it the freedom to deviate
if required by the data. Most importantly, this eliminates the fits
affected by large errors in the data.

Fig. 6 provides an example of this procedure for one subset of
10 500 seismograms from stations of the Global Seismographic
Network (GSN). Data residuals (d − ds) are normalized by the
estimated uncertainty of each datum (Nolet 1990; Lebedev & van
der Hilst 2008). The misfits for the benchmark and new subset are
separated, and plotted in different colours. Only those seismograms
with corresponding misfits (blue points) inside the range of the
benchmark data set (green), indicated by the red dashed lines, are
retained. In this example, ∼2–3 per cent of blue equations have
residuals outside the accepted range; retaining only seismograms
whose equations are within the limits, we discard (in this case)
∼6–9 per cent of the seismograms. For a subset containing noisier data
than that pictured here (e.g. some temporary and regional arrays), a
larger per cent of seismograms may be removed (up to 15 per cent).

After this first set of outlier removals is carried out, a second
pass is performed, where each reduced data set is reinverted. After
this second pass, there is much less scatter in residuals, and fewer
equations lie outside the misfit range defined by the benchmark.
This second pass may result in a further reduction of 1–3 per cent
of seismograms.

Figure 6. Example outlier analysis for a single data subset. Green represents
the 51 004 seismograms (153 509 equations) that constrained the global
model of Lebedev & van der Hilst (2008). Blue represents the subset of
10 500 GSN seismograms (∼30 200 equations) randomly selected from
data set B (Table A1). The outlier inversion is carried out on these 61 504
seismograms (∼184 000 equations). The top panel illustrates the
‘log’-scaled histogram of the data-model residuals. The bottom panel
illustrates the raw residuals for each equation. It is clear the residuals
from the data set of Lebedev & van der Hilst (2008) are substantially
smaller than those in the subset, and restricted almost entirely to ±1
(range indicated by dashed red lines). The data subset from our new set of
waveform fits exhibits a much greater degree of scatter, and only those
seismograms with misfits within the dashed red lines are retained.

3.1 Seismogram selection and preparation
Using our large new data set, we expect to improve resolution in
the upper mantle and transition zone using the structural constraints
extracted by AMI from surface, S, and multiple-S (up to at least S7)
waves over a broad range in periods spanning 11–450 s (note that 11 s is
the global minimum; more significant contributions begin at 20 s).
We have assembled data from more than 120 international, national,
regional and temporary seismic networks available from Incorporated
Research Institutions for Seismology (IRIS), GFZ-Potsdam
(GEOFON), Observatories and Research Facilities for European
Seismology (ORFEUS) and Canadian National Seismic Network
(CNSN) Data Centres; in total this includes data from more than
5000 stations.
The events used are those in the Global Centroid Moment Tensor
(CMT) catalogue (e.g. Ekström et al. 2012), which contains more
than 36 000 events since 1977. While we would like to obtain as
much data as possible without prior discrimination, there are several
criteria enforced to reduce the quantity of noisy data.
The primary criterion used for selecting seismograms is based
on an empirical relationship between the epicentral distance, earth-
quake magnitude and the signal-to-noise ratio. For a seismogram
to be requested, the earthquake magnitude must exceed a computed
threshold, depending on the source–receiver distance. The mini-
mum magnitude increases linearly with distance until ∼12 000 km,
beyond which all earthquakes with MW ≥ 5.7 are requested. The
parameters defining this empirical cut-off were selected based
on examination of past waveform fitting results using AMI (e.g.
Lebedev & van der Hilst 2008).
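
A selection rule of this kind can be written as a simple piecewise-linear
threshold. The sketch below is illustrative only: the text gives the far
limit (MW ≥ 5.7 beyond ∼12 000 km) but not the magnitude required at
short distances, so the value `m_near = 4.5` and the function names are
hypothetical placeholders.

```python
def min_magnitude(distance_km, m_near=4.5, m_far=5.7, d_far_km=12000.0):
    """Minimum magnitude needed to request a seismogram at this distance.

    The threshold grows linearly from m_near at zero distance to m_far at
    d_far_km and stays at m_far beyond it. m_far and d_far_km follow the
    text; m_near is an assumed, hypothetical value.
    """
    if distance_km >= d_far_km:
        return m_far
    return m_near + (m_far - m_near) * distance_km / d_far_km


def should_request(mw, distance_km):
    """True if an event of magnitude mw passes the distance-dependent cut-off."""
    return mw >= min_magnitude(distance_km)
```

With these assumed parameters, `should_request(5.0, 3000.0)` is true,
whereas the same event at 9000 km would fall below the threshold.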
A number of pre-processing steps were undertaken to prepare
the raw seismograms for input to AMI. First, seismograms were
checked for segmentation due to clock drift, and subsequently
merged if timing gaps were small (less than Δt, where Δt = 1 s). Next,
clipped seismograms and those with missing data were identified
and removed. Then, instrument response was removed, and the
horizontal components were rotated into radial and transverse
orientations. Seismograms failing any of the checks were removed
from the data set for follow-up analysis. Finally, arrival times for the
first arriving P wave were computed to estimate the signal-to-noise
ratio prior to onset.
These rigorous checks resulted in a data set of more than 3.6
million vertical- and 2.9 million transverse-component broad-band
seismograms recorded for events between 1981 January and 2010
March. The distribution of sources and receivers is shown in Fig. 5.
The red triangles represent stations, the yellow circles indicate the
events.

3.2 Waveform fitting
A single instance of AMI runs with a memory footprint of no
more than 1 GB, used mostly in storing the 3-D reference model’s
phase velocities and derivatives. Therefore, with modern multicore
desktop computers, the serial nature of waveform fitting is readily
extended through parallelization using one of the available suites of
tools. AMI determines and discards most unsuccessful fits in less
than 1 s; successful fitting takes up to 2 min, depending on the
number of time–frequency windows and higher mode content.
Therefore, our data set of 3.6 million Rayleigh wave vertical-component
seismograms can be processed using 3000–5000 CPU hr on a single
12-core high-performance server.

Given the computational efficiency of AMI, we elected to reprocess
our full data set several times using different a priori settings,
so as to examine their impact on the waveform fitting procedure and
resulting tomographic models. We tested, first, the impact of
near-nodal propagation on waveform fitting, and, secondly, the effect of
the upper frequency cut-off (limiting the highest Gaussian filter)
imposed in waveform fitting. A detailed description of these different
tests is given in Appendix A, the results of which are summarized
in Table A1.

In this work, we focus our attention on the vertically polarized
shear speed structure, and therefore only utilize the
vertical-component (Rayleigh wave) seismogram fits. Within windows
selected for these 685 000–847 000 seismogram fits (depending on the
constraints applied during fitting, see Appendix A), there are
2.9–3.6 million fundamental-mode and 226 000–409 000 higher mode
successfully fit wave trains (time–frequency windows). We note
that although the ‘fundamental-mode windows’ are dominated by
the fundamental mode, they often include substantial higher mode
energy as well, which adds resolving power to the signal within
the windows. For simplicity during discussion, we will treat these
windows as containing the fundamental mode only.

Figure 7. Estimated arrival times plotted as a function of epicentral distance
for the 3.14 million fundamental- and 330 000 higher mode wave trains, all
successfully fit using JWKB synthetics (data set B, Table A1), indicated
by red and blue colours, respectively. The arrival times were measured at
the maxima of the signal envelope within all the time–frequency windows.
Darker shades indicate a greater density within a particular distance-time
bin. Travel time curves of the S and multiple-S waves for a surface source in
AK135 are plotted in solid grey lines, up to S6. The traveltime curve for an
S wave generated at a depth of 650 km is shown as a dashed line, to illustrate
the range in arrival times expected due to source-depth variations.

In Fig. 7, the distribution of arrival times for all wave trains is
plotted as a function of epicentral distance. Arrival times were computed
at the envelope maximum within each time–frequency window, and
were binned to illustrate their relative density. The fundamental-mode
arrivals, shown in red, plot along a straight line across the
range of epicentral distances. The higher mode arrivals plotted in
blue tend to cluster around the predicted S and multiple-S (up to S6
plotted) traveltime curves computed for AK135 (surface source).
Although not clearly evident in this figure (due to the binning),
there are a handful of seismograms with time–frequency windows
corresponding to the S7 branch. The sharp cut-offs seen for the
direct S wave and the first two multiples (SS and SSS) were purposely
enforced during fitting, to exclude body waves that bottom too deep
in the lower mantle and have smaller sensitivity to upper-mantle
structure, which is the focus of this study. The cut-offs are at 35◦
for S, 70◦ for SS, 105◦ for SSS and 140◦ for SSSS.
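
The distance cut-offs listed above can be kept in a small lookup table.
The sketch below is only an illustration: the table name and function
are hypothetical, and treating the limit as inclusive and leaving
higher multiples unrestricted are assumptions, since the text states
cut-offs only for S to SSSS.

```python
# Maximum epicentral distance (degrees) per number of S legs, from the text:
# S = 1 leg, SS = 2, SSS = 3, SSSS = 4.
CUTOFF_DEG = {1: 35.0, 2: 70.0, 3: 105.0, 4: 140.0}


def within_cutoff(n_legs, distance_deg):
    """True if a body-wave window for this phase is admitted at this distance.

    Phases with no stated cut-off (more than four legs) are always admitted
    here, which is an assumption of this sketch.
    """
    cutoff = CUTOFF_DEG.get(n_legs)
    return cutoff is None or distance_deg <= cutoff
```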
Based on the detailed analysis and discussion in Appendix A, the
data set selected for inversion in the final tomographic model is made
up of 521 705 vertical-component seismograms, whose waveform
fits were computed without enforcing any upper frequency limit.
These were selected from our master data set of more than 710 000
vertical-component waveform fits, using a rigorous process of out-
lier analysis (Section 2.4), including a final manual selection and
removal (Appendix A3).

4 RESOLUTION ANALYSIS
It is common practice in tomographic studies to perform a series
of resolution tests to assess the accuracy of the model. The tomo-
graphic resolution may depend on the data sampling, noise, regular-
ization and a priori information (e.g. background model). However,
conventional (checkerboard or spikes, for example) or resolution-
kernel tests are intrinsically limited, as they are carried out assuming
the same theoretical approximations as in the inversion of the data,
and therefore do not examine methodological inaccuracies (Qin
et al. 2008).
With the expansion of computing resources over the last decade,
more exact methods are available to examine the full resolving
power of models, including the methodological and theoretical
foundations. One such method is the generation of benchmark seis-
mic data sets. These consist of an arbitrarily complex synthetic
model, through which seismograms are computed between syn-
thetic sources and receivers. Recent techniques such as the spectral-
element method (SEM; Komatitsch & Vilotte 1998; Chaljub et al.
2003) and the coupled spectral-element method (CSEM; Capdev-
ille et al. 2003) are capable of simulating seismic wave propagation
in heterogeneous 3-D anisotropic media in period ranges compara-
ble to those used in waveform tomography (Qin et al. 2008). With
access to large modern clusters, synthetic data sets with several
thousands of seismograms can be generated and used to test and
benchmark tomographic methods.

The resolving capability and accuracy of AMI have previously
been examined through inversion of two benchmark data sets by
Qin et al. (2006, 2008). The first benchmark inverted ∼3000
successfully fit CSEM synthetic seismograms, computed through a
smooth global isotropic model (Lebedev & van der Hilst 2008).
The results from this procedure confirmed the validity of the
assumptions and approximations used by AMI and the subsequent
inversion. At lithospheric depths, both the shape and amplitude of
the anomalies were recovered, whereas at the base of the transition zone
amplitudes were underestimated by up to a factor of two. Lebedev
& van der Hilst (2008) suggest that such underestimation could be
resolved through explicit modelling of the sensitivity volumes (e.g.
Meier et al. 1997). Alternatively, improvements in amplitude
recovery may be achieved through the incorporation of more multiple-S
higher mode constraints to improve sampling within the transition
zone.

Several years later, Qin et al. (2008) constructed a more complex
model containing a suite of ‘quasi’-realistic heterogeneities in
velocity, radial and azimuthal anisotropy, attenuation and density,
spanning a range in spatial scales. The authors then compared the
results from two different global tomographic inversion techniques:
AMI (simplified, for that application, to using only a 1-D reference
model) and the more traditional three-step phase-velocity method.
From this, it was clear that the use of an accurate 3-D crustal model
was very important to prevent artefacts in the mantle due to smearing
of unmodelled crustal structure. The higher modes were indispensable
for retrieving deeper structures, at the base of the upper mantle
and in the transition zone. The greater heterogeneity present in this
new model further illustrated that the location of anomalies was
often better constrained than the shape of the boundaries. This would
be improved using a larger synthetic data set, with coverage more
comparable to that of a real global data set.

Having confirmed AMI’s ability to accurately invert for smooth
and complex benchmark models, we now assess the resolving power
of our current model and data set. We begin with the relative path
coverage using reweighted matrix column-sums for the isotropic SV
parameter, plotted for depths of 80, 260 and 585 km in Fig. 8. Across
all depths in the model, every shell node is sampled. Blue–green
colours indicate lower sampling density, while red colours indicate
the highest sampling density; the colour scale is normalized for each
depth.

Figure 8. Relative lateral sampling at three different depths within the
model. The colour scale is scaled to the min and max value for each depth.
In no case is the sampling of any model node zero. Sampling is estimated
using matrix column sums of the SV parameter.
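
A column-sum sampling estimate of the kind used for Fig. 8 can be
sketched as follows. This is an illustration under stated assumptions,
not the authors' implementation: rows are stored as dictionaries of
non-zero coefficients, absolute values are summed, and the result is
rescaled to the minimum and maximum of the columns passed in (for
example, those belonging to one depth). The function names are
hypothetical.

```python
def column_sums(rows, n_cols):
    """Sum |A_ij| down each column of a sparse matrix given as row dicts."""
    sums = [0.0] * n_cols
    for row in rows:
        for col, value in row.items():
            sums[col] += abs(value)
    return sums


def rescale(values):
    """Map values linearly onto [0, 1] using their own min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two toy equations touching three model parameters.
rows = [{0: 1.0, 1: -2.0}, {1: 1.0, 2: 0.5}]
sampling = column_sums(rows, 3)
relative = rescale(sampling)
```

A column whose sum is zero would correspond to a model parameter that no
equation constrains, which is the condition the text reports as absent.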

At shallow depths in the lithosphere (Fig. 8a), sampling is densest
(red) beneath North America, Europe and eastern Asia. At 260 km
depth (Fig. 8b), the relative sampling of Eurasia increases,
becoming almost equivalent to that of North America. Within the
transition zone (Fig. 8c), sampling remains strongest beneath the North
American and Eurasian plates. Despite the efforts of normalization
and re-weighting, several paths are evident in Fig. 8c, as bands of
elevated sensitivity in lesser sampled regions. Several examples
include paths to stations in the western Pacific and the South African
Seismic Experiment (SASE), and paths from events along the
Mid-Atlantic Ridge.

In the upper mantle, sampling is good across almost all the Northern
Hemisphere continental regions and northern Atlantic Ocean
(yellow colours), with weaker sampling in northern Africa. Oceanic
regions and the Southern Hemisphere are relatively less well sampled.
However, in all cases the sampling is non-zero and represents
an increase over past global modelling efforts.

Finally, we discuss the results from a series of four different
spike resolution tests illustrated in Fig. 9. The input models consist
of varying width columnar perturbations of ±300 m s−1. Model A
is simplest and consists of columns 6◦ in diameter spread around the
globe. Model B has columns 10◦ in diameter between latitudes ±60◦
and 6.6◦ outside. Model C has columns 18◦ in diameter centred at
latitudes of ±90◦, ±45◦ and 0◦; anomalies 12◦ in diameter are
centred at ±60◦ latitude. The last model D has the largest columns,
with diameters of: 30◦ at 0◦ latitude, 25◦ at ±30◦ latitude and 20◦
at ±60◦ and ±90◦ latitude.

Synthetic data were generated for each of the different models,
and each inversion was run until convergence, with the same
regularization as the real-data inversion. We plot the resulting models
at three different depths, to illustrate the model’s resolving
capabilities. At 80 km depth, both model A and B anomalies are reliably
recovered beneath the well-sampled continental regions, including
North and Central America, Europe and eastern and southeast Asia.
The strength of the anomalies is reduced in regions with lesser
sampling, for example, the Pacific Ocean. In the Pacific, some anomalies
also exhibit a degree of smearing, more apparent for model A than
B.

At 260 km depth, models B and C are illustrated. For model B,
again the strength and shape of the anomalies are well recovered
beneath regions of highest station density, as well as most
continental regions. Anomalies in the mid-Pacific are somewhat
underestimated. For model C, the amplitude and shape of anomalies are
recovered properly, including those in the Pacific Ocean.

At 485 km depth, we show the results for models C and D. As
with previous depths, the anomalies are accurately retrieved beneath
densely sampled continental regions, particularly North and South
America, most of Eurasia and southeast Asia. However, the
anomalies are less well resolved beneath the Pacific Ocean, Africa and
the Southern Hemisphere below 45◦ latitude, as would be expected
based on the path sampling estimates shown in Fig. 8.

Overall, the results from the resolution tests indicate that the
model is well resolved at a variety of length scales, in particular
beneath continental regions. Features with dimensions of 6◦ are
clearly recovered at lithospheric depths beneath North America,
Eurasia and southeast Asia; in these densely sampled areas, smaller
scale features would easily be recovered. Larger scale features
remain accurately retrieved at depths into the transition zone. In more
poorly sampled oceanic and some continental regions, although the
anomalies are still recovered, their strength is underestimated and
they suffer from a degree of distortion (due to oversmoothing relative
to the sparser data sampling).

Our new isotropic global upper mantle and transition zone SV
velocity model, SL2013sv, is computed on a ∼280 km (minimum
250 km, maximum 296 km) triangular grid using 521 705
vertical-component broad-band seismograms selected from our master data
set of almost 3/4 million. The resulting inverse problem consisted of
1.55 million data equations and 1.47 million smoothing and damping
constraints to solve for 501 888 unknown model parameters
[7842 shell nodes × (18 × 3 S parameters + 10 P parameters)]. The
final model has a variance reduction of 90 per cent with respect to
our 3-D reference model. The increased quantity of data provides
the ability to decrease grid spacing, targeting higher resolution
compared to past models.

In Figs 10–13, we plot horizontal slices globally at 12 different
depths through the model: 36, 56 and 80 km (Fig. 10); 110, 150
and 200 km (Fig. 11); 260, 330 and 410 km (Fig. 12) and 485, 585 and
660 km (Fig. 13). In Figs 14 and 15, 12 vertical cross-sections
slice through various parts of the model, with inset maps indicating
their locations. White circles overplotted indicate seismicity within
40 km laterally from the profile. Slices A–C cross Africa and western
Eurasia; D crosses eastern Eurasia and the western Pacific; E and
F slice the Mid and south Pacific from west to east; G–J focus on
the North American continent; and finally K and L cross South
America. For both the horizontal and vertical images, perturbations
in shear velocity are plotted with respect to our reference model. In
the horizontal slices, the indicated reference velocities are extracted
from the 1-D mantle reference model (solid black line, Fig. 3). At
depths greater than the Moho, perturbations are in per cent from the
reference. It is important to note that at depths shallower than the
Moho, model perturbations are in m s−1 relative to the 3-D crustal
model (depth slices at 36 and 56 km indicate the range in m s−1
in addition to percentage). Although this makes interpretation of
the magnitude of velocity perturbations in crustal regions more
complex, relative variations are still readily interpreted in terms of
structure. For all vertical cross-sections, perturbations are in m s−1,
with values indicated in the captions.

Long-wavelength, lithospheric-depth features in our new model
are in agreement with observations from past models that use
different methodological approaches and parametrizations, as well as
differing types and sizes of data sets (e.g. Debayle et al. 2005;
Panning & Romanowicz 2006; Simmons et al. 2006; Zhou et al. 2006;
Houser et al. 2008; Kustowski et al. 2008a; Lebedev & van der
Hilst 2008; Nettles & Dziewónski 2008; Ferreira et al. 2010; Lekić
& Romanowicz 2011; Ritsema et al. 2011; Debayle & Ricard 2012).
However, at greater depths (e.g. in the transition zone) variations
between models are large even at long wavelengths (Ritsema et al.
2011). In Section 6.1, we will examine the differences between our
new model and five other global tomographic models.

In our model, we observe improvements in the resolution of
fine-scale regional structures. The prominent features in our model
display deep expressions of regional tectonic structures and processes.
We observe sharp velocity contrasts across many tectonic
boundaries, for example, subduction systems and associated backarc
volcanics, actively deforming regions and continental orogens. The
strongest velocity anomalies in the model are associated with stable
continental cratons (positive), mid-ocean ridges (MORs) and rift
systems (negative) and backarcs and active orogens (negative).

In the continental crust, strong perturbations of more than
350 m s−1 (from the 3-D reference) are observed beneath the
Himalaya and Tibet, the Hangai Dome (western Mongolia), the
Afar Depression, the Pamirs, southern Alaska and the Yukon and

Figure 9. Synthetic resolution tests illustrating the sensitivity of the final model. The top panel shows the four different input models A, B, C and D. Each
consists of columnar perturbations of ±300 m s−1 with varying dimensions. Model A has columns 6° in diameter; model B has columns 10° in diameter
between ±60° and 6.6° outside ±60°. Model C has columns 18° in diameter centred at ±90°, ±45° and 0° latitude and 12° anomalies centred at ±60° latitude;
model D consists of 30° diameter columns at 0° latitude, 25° diameter at ±30° and 20° diameter at ±60° and ±90°. The three lower panels show the resulting
inversions at 80 km depth for models A and B, 260 km depth for models B and C and 485 km depth for models C and D.
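
The columnar input patterns described in this caption can be sketched in a few lines. The following is only an illustration, assuming circular columns defined by great-circle distance from a centre; the function names and the alternating-sign layout along the equator are hypothetical and are not the layout used for models A to D.

```python
import math


def angular_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def column_perturbation(lat, lon, centres, diameter_deg, amplitude=300.0):
    """Input perturbation (m/s) at (lat, lon) for a set of circular columns.

    `centres` is a list of (lat, lon, sign) tuples with sign = +1 or -1.
    Points outside every column carry zero perturbation.
    """
    radius = diameter_deg / 2.0
    for clat, clon, sign in centres:
        if angular_distance_deg(lat, lon, clat, clon) <= radius:
            return sign * amplitude
    return 0.0


# Hypothetical layout (an assumption for illustration only): columns every
# 20 degrees of longitude along the equator, with alternating sign.
centres = [(0.0, float(lon), 1 if i % 2 == 0 else -1)
           for i, lon in enumerate(range(-180, 180, 20))]

# A 6-degree column, as in model A, evaluated inside and between columns.
inside = column_perturbation(0.0, 20.0, centres, 6.0)
between = column_perturbation(0.0, 10.0, centres, 6.0)
```

Evaluating such a function on the model grid gives the input pattern; comparing it with the inverted output is what the lower panels of the figure show.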

Figure 10. Horizontal cross-sections through the tomographic model SL2013sv at three depths in the shallow upper mantle (and crust in some continental
regions). Approximate plate boundaries are indicated. The reference SV velocity values (at a reference period of 50 s) are indicated. Perturbations from the
reference are indicated in percentage, with the absolute minimum (maximum) indicated below (above) the colour bar. Note that at 36 and 56 km, some
continental regions are still in the crust, therefore perturbations are indicated in m s−1 (colour scale range and absolute minimum and maximum are labelled),
relative to the 3-D reference model. North and south polar views are labelled at right.

Figure 11. Horizontal cross-sections through SL2013sv at three depths in the lithospheric mantle. Plate boundaries and reference velocities follow as in the
previous figure.

western United States. The largest crustal and shallow-mantle anomalies beneath oceans are associated with backarcs of the western Pacific and spreading ridges (most notably in the east Pacific).

5.1 Oceanic regions

In the upper ∼120 km beneath oceans (Figs 10 and 11), the most apparent feature of the model is the clear signature of the low-velocity

Figure 12. Horizontal cross-sections through SL2013sv at three depths in the lower upper mantle and top of the transition zone.

anomalies associated with spreading at the MORs. Their width increases as a function of depth (South Atlantic Ridge, slices KK′ and LL′ in Fig. 15), as would be expected based on a simple triangular decompression melting model. The more rapidly spreading East Pacific Rise ridge system is wider than others (i.e. the Mid-Atlantic or southwest Indian ridges).

At 110 km depth, the strongest anomalies beneath the ridges begin to become more localized, and by 150 km depth, the signature of

Figure 13. Horizontal cross-sections through SL2013sv at three depths within the transition zone.

most MORs no longer stands out from the lower velocities observed across the rest of the ocean basins. Therefore, we conclude that in most cases, significant partial melting beneath MORs is confined to depths less than ∼120 km, with lower degree melting at greater depths no longer visible in vertically polarized shear velocity. This is in agreement with some past studies (e.g. Zhang & Tanimoto 1992; Forsyth et al. 1998), but does not confirm inferences from others regarding MOR anomalies and processes extending into the

Figure 14. Vertical cross-sections of six profiles through SL2013sv. The location of each section is indicated in the maps at the top. Model is plotted from
the shallowest model node (7 km) to a depth of 410 km. Elevation/bathymetry is indicated at right, and is smoothed from ETOPO1 (Amante & Eakins 2009).
Velocity perturbations for each section are: A ±240, B ±240, C ±240, D ±240, E ±180 and F ±240 m s−1.

deep upper mantle (e.g. Su et al. 1992). Where slow anomalies do remain below depths of 150 km, often they are coincident with oceanic islands.

Our new model has made significant improvements in the lateral definition of the MOR anomalies. The central low-velocity anomalies, due to decompression melting associated with hot upwelling mantle, are narrowly confined beneath the ridge spreading centre. The lateral boundaries between these anomalies and the smoothly varying shear speeds in the surrounding oceanic lithosphere and asthenosphere are sharp, marked contrasts.

Figure 15. Vertical cross-sections of six additional profiles through SL2013sv. As in the previous figure, the location of each profile is indicated in the maps at the
top. Velocity perturbations are: G ±180, H ±240, I ±240, J ±240, K ±180 and L ±180 m s−1. Labels are: B&R, Basin and Range; Cord., Cordillera; C.P.,
Colorado Plateau; C.R., Coast Ranges; C.S., Cape Smith Belt; G.F., Grenville Front; L.Pr., Labradonian Province; R.M.F., Rocky Mountain Front; S.F.T.B.,
Sevier Fold and Thrust Belt; S.N., Sierra Nevada; Sup. Pr., Superior Province; T.H.O., Trans-Hudson Orogen; T.R., Transverse Ranges; W.O., Wopmay Orogen.

Away from the MORs in the oceans, we observe relatively high velocities, with older regions remaining fast to greater depths, consistent with cooling-induced thickening of the oceanic lithosphere. This can be clearly seen across the Pacific Basin, and is qualitatively similar to the observations of Maggi et al. (2006). The leading eastward edge of oceanic lithosphere (transition from red to blue moving west from the Pacific MOR system) progresses westwards across the Pacific with increasing depth (from 36 to 150 km).
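
The cooling-induced thickening mentioned above is often illustrated with a half-space cooling model, in which thermal thickness grows with the square root of plate age. The sketch below is not part of the tomographic method. It assumes a textbook half-space solution, a typical thermal diffusivity of 10^-6 m^2 s^-1, an illustrative mantle temperature of 1300 °C and an arbitrary 90 per cent temperature criterion for the base of the thermal lithosphere; all function and constant names are hypothetical.

```python
import math

KAPPA = 1.0e-6                                  # thermal diffusivity, m^2/s (assumed)
SECONDS_PER_MYR = 1.0e6 * 365.25 * 24 * 3600    # seconds in one million years


def halfspace_temperature(depth_km, age_myr, t_surface=0.0, t_mantle=1300.0):
    """Temperature (deg C) at a given depth in a half-space cooled from the top."""
    if age_myr <= 0.0:
        return t_mantle
    z = depth_km * 1.0e3
    t = age_myr * SECONDS_PER_MYR
    return t_surface + (t_mantle - t_surface) * math.erf(z / (2.0 * math.sqrt(KAPPA * t)))


def thermal_thickness_km(age_myr, fraction=0.9):
    """Depth (km) at which the temperature reaches `fraction` of the mantle value.

    Solves erf(x) = fraction by bisection, then converts x to depth using
    z = 2 x sqrt(kappa t).
    """
    lo, hi = 0.0, 5.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < fraction:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return 2.0 * x * math.sqrt(KAPPA * age_myr * SECONDS_PER_MYR) / 1.0e3


# Square-root-of-age scaling: a plate four times older is twice as thick.
young = thermal_thickness_km(25.0)
old = thermal_thickness_km(100.0)
```

Under these assumptions the thickness at 100 Myr is exactly twice that at 25 Myr, which is the qualitative age progression seen in the Pacific; the absolute depths depend on the assumed parameters and should not be read as model results.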

At 150 km depth we observe the deepest fast anomaly associated with the ancient western Pacific, immediately east of the Marianas trench; by 200 km, this anomaly is gone. This age progression of the lithospheric thickness is clear in the vertical cross-section EE′ (Fig. 14) through the Pacific, with the lithosphere thinning eastwards from the trench.

The backarc regions near ocean–ocean and ocean–continent convergent boundaries are characterized by low-velocity anomalies, albeit relatively weaker than those beneath MORs. The most prominent backarc anomalies at shallower depths (∼80 km) include Tonga–Kermadec, New Hebrides and Indonesia–Sumatra–Java; they fade gradually with depth down to around 150 km and are weak or absent at greater depths (200 km and below). The anomalies beneath the Mariana, Izu-Bonin, Japan, Kuriles and Aleutians volcanic arcs are similar in strength at 110 km. A number of these anomalies are visible in the vertical cross-sections (DD′, EE′, GG′, KK′ and LL′, Figs 14 and 15). Due to the much higher saturations used here to display deeper structures (where perturbations are much smaller) clearly, the volcanic-arc low-velocity anomalies appear to extend deeper, and along the subducting plate interface.

At 150 km depth, a system of prominent high-velocity anomalies indicates subducting oceanic lithospheric slabs. This is seen in much of the western Pacific, from the Aleutians (GG′, Fig. 15) through the Kuriles, Japan (DD′, Fig. 14), Izu-Bonin, Mariana (EE′, Fig. 14), to Indonesia-Sumatra, as well as the Hikurangi (FF′, Fig. 14) and portions of the Andean (LL′ and KK′, Fig. 15) subduction zones. At 200 km depth and below, these subducting slabs are even clearer, with smaller anomalies associated with the subduction at the Cascadia, Lesser Antilles, Scotia and other arcs also apparent.

5.2 Continental regions

At depths less than 200–260 km beneath continents (Figs 10 and 11), the strongest low-velocity anomalies are associated with tectonically active regions undergoing deformation. One of the strongest of the anomalies is beneath the Himalayas and the Tibetan Plateau. The anomaly boundaries at crustal depths closely match the boundaries of the plateau at the surface; the very low velocities within the mid-lower crust are consistent with pervasive partial melting in it (Nelson et al. 1996). In the mantle beneath, high-velocity anomalies beneath much of the plateau probably indicate the underthrusting and subduction of the Indian lithosphere (80–200 km depth and below), the nature of which varies with position along the thrust.

The region of shallow low velocities underlying Tibet appears to be part of a much broader zone of convergence and deformation, which originates beneath Burma to the south, and extends westwards, almost continuously, through the Zagros Mountains, the Anatolian Plateau, into the Aegean Sea, and northwards towards the Alps and Pannonian Basin.

Other prominent low-velocity anomalies include the Cameroon Line volcanic belt, which bisects the African continent to depths of 200–260 km (centred at ∼4500 km along profile BB′, Fig. 14), as well as the signatures of the Red Sea and East African rifts. We image the East African Rift extending through the upper mantle approaching the transition zone (profile CC′, Fig. 14).

Finally, we also image the structure of a pervasive low-velocity anomaly underlying the western margin of North America, which is much younger than the rest of the continent and is undergoing active deformation. The high station density (USArray) means this portion of the model is well resolved, and the structures observed are robust. We clearly image this transition of low to high velocities (west to east), the Rocky Mountain Front, which separates the juvenile western margin of North America from the ancient continental backstop (Fig. 15, HH′, II′ and JJ′).

Notable low-velocity features in this region (Figs 10 and 11 at depths of 56–150 km) include the Snake River Plain volcanic belt (as imaged previously in, e.g. Tian et al. 2009) and the extensional Basin and Range province (additionally in HH′ and JJ′, Fig. 15). At depths greater than 150 km, neither feature stands out; however, the western margin does remain distinct (low velocity) from continental North America, through depths to the base of the continental lithosphere. The base of the main low-velocity anomalies appears to terminate more sharply and at shallower depths than in most past surface wave models (e.g. Lebedev & van der Hilst 2008; Kustowski et al. 2008a; Nettles & Dziewónski 2008; Lekić & Romanowicz 2011; Ritsema et al. 2011).

Large-scale high-velocity anomalies in the uppermost 250 km beneath continental regions have been recognized as the signatures of ancient continental cratons in global surface wave tomographic models for more than 25 yr (e.g. Woodhouse & Dziewónski 1984). These anomalies are the dominant high-velocity features in our new model. The difference between our model and other recent global tomography models is in the relative roughness of the craton margins and fine structure within the cratons, which we resolve particularly well in densely sampled regions such as North America, Europe and eastern Asia.

By 260 km depth, there are very few high-velocity seismic anomalies remaining beneath cratons. We can therefore conclude that the thickness of the high-velocity lithospheric roots beneath cratons is unlikely to exceed ∼200–220 km in most cases (Lebedev & van der Hilst 2008; Debayle & Ricard 2012). Several vertical cross-sections in Figs 14 and 15 bisect such ancient Archean cratons, including in eastern Europe (AA′ and CC′), central Australia (FF′), North America (HH′, II′ and JJ′) and South America (KK′ and LL′).

5.3 Sublithospheric mantle and transition zone

In the depth range from 260 km to 660 km (Figs 12 and 13), the most prominent high-velocity anomalies in the model are beneath areas of past and current subduction. The most conspicuous of these are the various subduction zones in the western Pacific (DD′ and EE′), which are seen from shallower depths (Section 5.1) to the transition zone. Also more clearly evident is subduction of the south–central Nazca Plate beneath South America (KK′ and LL′), where the highest velocities are located beneath Peru, Bolivia, Chile and Argentina. With increasing depth into the TZ, the signature of the plate spreads out laterally across much of central and southern South America; similar observations are noted in northern South America, where a clear signature of the Nazca Plate is imaged beneath Colombia, Venezuela and northern Brazil.

Beneath North America, the distribution of high-velocity anomalies reflects the complex history of subduction. At upper-mantle depths (200–410 km), the Juan de Fuca Plate is imaged subducting beneath the Cascades (British Columbia, Oregon and Washington). Within the transition zone itself, we image fragments of both the Juan de Fuca and Farallon plates subducted over the last 150 Myr (e.g. Sigloch et al. 2008; Tian et al. 2009). Towards the base of the transition zone, we image the signature of the Farallon Plate extending across much of the continental US, as far east as the Great Lakes (660 km depth, Fig. 13).

In addition to the Farallon Plate beneath the US, we also image a similar high-velocity feature beneath western and central Canada (mostly British Columbia, Alberta and Saskatchewan). This feature is imaged more clearly, and appears more detached from the slabs to the south, than in past models of North America (e.g. Frederiksen et al. 2001; van der Lee & Frederiksen 2005; Nettles & Dziewónski 2008; Bedle & van der Lee 2009). Previous high-resolution models of subduction beneath North America (e.g. Sigloch et al. 2008; Burdick et al. 2010) focused only on the western US, and did not extend northwards into Canada.

Although not as high velocity as the signals associated with the subducting plates, the central and south–central Atlantic Ocean maintains a small-to-moderate positive velocity anomaly from 260 to 585 km depth. Such a feature was documented in S20RTS (Ritsema et al. 2004), and was speculated by King & Ritsema (2000) to be the signature of edge-driven convection. A similar feature can be observed in a number of more recent models (e.g. Lekić & Romanowicz 2011; Ritsema et al. 2011; Debayle & Ricard 2012), with some variations in its depth and horizontal location. The presence of this anomaly in models using differing parametrizations and data sets suggests that it is a robust, though low-amplitude, structure.

A strong, continuous band of high velocities is observed in the transition zone, stretching from western and central Europe eastwards, through Anatolia and into the Tibetan Plateau. It is the strongest in the mid-transition zone, becoming more diffuse at the base. In comparison with other surface wave models, such fast velocities have been observed, though not as continuously (Kustowski et al. 2008a; Lekić & Romanowicz 2011; Debayle & Ricard 2012). We conclude that this high-velocity material likely represents the final fragments of ocean basins, continental lithospheres and portions of continental margins subducted after the closure of the Tethys Ocean. The subducted oceanic Tethyan lithosphere itself is already almost entirely well within the lower mantle, as has been previously imaged in teleseismic P-wave traveltime tomography (Bijwaard et al. 1998; Van der Voo et al. 1999; Amaru 2006; Hafkenscheid et al. 2006).

6 DISCUSSION

In the following sections, we examine our new data set and model from a number of different perspectives. First, we present a comparison of SL2013sv with five recent, published global models. In the next section, we examine the bulk dispersive properties of the Earth’s heterogeneous upper mantle and crust as sampled by our data set of more than 700 000 fundamental- and 475 000 higher mode group- and phase-velocity curves. Finally, we leverage the superior statistical sampling of this data set to re-examine the validity field of the JWKB approximation and the overall success rate of waveform fitting using AMI.

6.1 Comparison with other global models

We have compared our new model SL2013sv with five other global shear velocity models in Fig. 16: CUB (Shapiro & Ritzwoller 2002), DR2012 (Debayle & Ricard 2012), SEMum (Lekić & Romanowicz 2011), S362ANI (Kustowski et al. 2008a) and S40RTS (Ritsema et al. 2011). Each of these models is computed with different data sets and modelling methodologies. The mean was removed at each depth and model perturbations were plotted in per cent from this value. The limits of the (saturated) colour scales are indicated at the left of each row (e.g. −8 to +8 per cent at 100 and 150 km depth); beneath each map the total range is indicated. The models are ordered from left to right by decreasing peak-to-peak perturbations at 100 and 150 km depth.

Model DR2012 is an upper-mantle SV-wave model constrained by multimode Rayleigh wave seismograms, using an approach similar to that used in generating SL2013sv. The CUB model (specifically CU_SDT1.0) is a crust and uppermost mantle isotropic shear velocity and radial anisotropy model computed from fundamental-mode Rayleigh and Love group and phase measurements. SEMum is a global upper-mantle Voigt-average shear speed and radially anisotropic model derived from long-period seismic waveforms (multimode Rayleigh and Love waves and long-period body waves) and group-velocity dispersion maps. Model S362ANI is a whole-mantle Voigt-average isotropic shear velocity model generated using surface wave dispersion measurements, mantle and body wave waveforms and body wave traveltimes. Finally, S40RTS is an isotropic shear velocity model of the mantle constrained by three data sets: minor and major arc Rayleigh wave dispersion (fundamental and first four overtones), teleseismic body wave traveltimes and spheroidal mode splitting functions. For each model, we plot the SV component.

In the uppermost mantle, from 50 km (partially crust) to 150 km depth, the long-wavelength (several thousands of kilometres) features are consistent across the models. For example, all show low-velocity anomalies in the eastern Pacific and higher velocities in the ancient western Pacific. High-velocity anomalies representing the continental cratonic roots are clear in each, although the amplitude and clarity do vary (e.g. the Southern Hemisphere cratons in S362ANI and S40RTS).

At shorter length scales, there are much greater differences between the models. SL2013sv displays the highest resolution, particularly at lithospheric and crustal depths. One key difference between the models is how the crust is treated. In SL2013sv, crustal perturbations, with respect to our 3-D reference model, are solved for directly in the inversion. In continental regions, this often includes three vertical crustal knots (7, 20 and 36 km), whereas in the oceans there is commonly only one (7 km). As a result, not only are deeper mantle artefacts due to unaccounted for or assumed crustal structure prevented, but also a high-resolution crustal model is generated (depths from as shallow as 7 km). The CUB model is most similar to SL2013sv at these depths, due to the inclusion of crustal parameters in the inversion; in the other models, crustal structure tends to be much smoother, both laterally and vertically.

At 50 km depth (top row, Fig. 16), the dominant features are the signatures of spreading ridges, backarc basins and regions of continental deformation. Although these can be observed in each model, SL2013sv obtains the highest definition. For example, the spreading ridges are much narrower, with large perturbations (more continuous red and black colours) tightly confined near the ridge axes. Although DR2012, CUB and SEMum all show well-defined ridges, the highest anomalies are not as continuous along the spreading centres. A second feature in common between those models is the low-velocity anomaly associated with the partially molten Tibetan crust. Clearly, the structure resolved in SL2013sv is better correlated with surface tectonic boundaries, including the low velocities in the Hindu Kush and Pamirs to the west of the Himalaya, high velocities of the Tarim Basin north of the Altyn Tagh Fault, and a clear extension of the partially molten Tibetan crust southeastwards around the eastern syntaxis of the India–Eurasia collision.

At depths of 100 km, the spreading ridge anomalies are still clearly observed in each of the models, though still more narrowly

Figure 16. Comparison of SL2013sv with five recent global tomographic models: CUB (Shapiro & Ritzwoller 2002), DR2012 (Debayle & Ricard 2012), SEMum (Lekić & Romanowicz 2011), S362ANI (Kustowski
et al. 2008a) and S40RTS (Ritsema et al. 2011). At each of six depths in the upper mantle (top to bottom 50, 100, 150, 250, 350 and 500 km), perturbations are plotted in per cent with respect to the mean value for
that model. The minimum and maximum values are indicated underneath each map, and the same linear colour scale spans from negative to positive saturation values indicated for each depth (at left). Models are
ordered left-to-right by decreasing peak-to-peak variations at 100 and 150 km depths.
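
The plotting convention in this caption (remove the mean at each depth, express deviations in per cent, saturate the colour scale, and rank models by peak-to-peak variation) can be sketched as follows. This is an illustration only: the function names are hypothetical, the optional weights are an assumption standing in for whatever area weighting is applied to the model grid, and the three velocity values are made up.

```python
def percent_perturbation(velocities, weights=None):
    """Per cent deviation of each value from the (weighted) mean at one depth."""
    if weights is None:
        weights = [1.0] * len(velocities)
    mean = sum(v * w for v, w in zip(velocities, weights)) / sum(weights)
    return [100.0 * (v - mean) / mean for v in velocities]


def saturate(values, limit):
    """Clip values to [-limit, +limit], as a saturated colour scale does."""
    return [max(-limit, min(limit, x)) for x in values]


def peak_to_peak(values):
    """Total range of the perturbations (maximum minus minimum)."""
    return max(values) - min(values)


# Made-up shear speeds (km/s) at three grid nodes of one depth slice.
slice_velocities = [4.0, 4.4, 4.8]
perturbations = percent_perturbation(slice_velocities)
plotted = saturate(perturbations, 8.0)   # e.g. a -8 to +8 per cent colour scale
total_range = peak_to_peak(perturbations)
```

Note that the range reported beneath each map corresponds to the unsaturated perturbations, whereas the colours correspond to the clipped values.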

Figure 17. Empirical distribution of the group (left-hand panel) and phase velocities (right-hand panel) of the fundamental and higher modes. All multimode
dispersion curves measured from the model subset of waveform fits (E, Table A1) were binned together; blue colours indicate minimum density, reds through
black indicate maximum density. The fundamental mode and overtone dispersion curves calculated for AK135_50 are plotted as white/black lines. For group
velocity, the first 12 overtones are plotted, for phase velocity the first 14 are plotted.
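
The two steps behind this figure, deriving group velocity from a phase-velocity curve and binning many curves into a density image, can be sketched as below. The group speed follows the standard relation C = c / (1 − (ω/c) dc/dω), with ω = 2π/T. The finite-difference derivative, the half-open bin convention and all function names are assumptions made for illustration, not a description of how AMI or the plotting was implemented.

```python
import bisect
import math


def group_velocity(periods, phase_vel):
    """Group velocity from phase velocity via C = c / (1 - (w/c) dc/dw).

    `periods` (s) must be distinct and monotonic; `phase_vel` is in km/s.
    dc/dw uses central differences inside and one-sided ones at the ends.
    """
    omega = [2.0 * math.pi / T for T in periods]
    n = len(omega)
    out = []
    for i in range(n):
        j0, j1 = max(i - 1, 0), min(i + 1, n - 1)
        dc_dw = (phase_vel[j1] - phase_vel[j0]) / (omega[j1] - omega[j0])
        out.append(phase_vel[i] / (1.0 - omega[i] / phase_vel[i] * dc_dw))
    return out


def _find_bin(edges, x):
    """Index of the half-open bin [edges[i], edges[i+1]) containing x, or None."""
    idx = bisect.bisect_right(edges, x) - 1
    return idx if 0 <= idx < len(edges) - 1 else None


def bin_density(curves, period_edges, velocity_edges):
    """Count dispersion samples per (period, velocity) bin.

    `curves` is an iterable of (periods, velocities) pairs, one per measurement.
    """
    counts = [[0] * (len(velocity_edges) - 1) for _ in range(len(period_edges) - 1)]
    for periods, velocities in curves:
        for T, v in zip(periods, velocities):
            i, j = _find_bin(period_edges, T), _find_bin(velocity_edges, v)
            if i is not None and j is not None:
                counts[i][j] += 1
    return counts


# Toy check: for c = c0 + b*w the relation gives C = c**2 / c0 exactly.
periods = [10.0, 20.0, 40.0, 80.0]
c = [4.0 - 0.5 * (2.0 * math.pi / T) for T in periods]
C = group_velocity(periods, c)
```

In the toy case the phase velocity decreases with frequency (normal dispersion), so the computed group velocity is lower than the phase velocity at every period.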

confined and of higher amplitude in SL2013sv. By 150 km depth, the dominant ridge anomalies are gone in all models. At both these depths, high-velocity anomalies associated with continental lithosphere are evident in each of the models. As previously mentioned, over long wavelengths (thousands of kilometres), the cratons are quite similar; however, at shorter wavelengths (500–1000 km and shorter), there are strong differences across models. In SL2013sv, the structural boundaries within the high-velocity continental cores are more finely resolved, and individual cratons are more readily observed. Examples include the different cratons in South America, the cratonic blocks in southern Africa, the structural details along the boundaries of stable North America, and the clear linear signature of the Indian lithosphere deepening beneath the Himalaya and Tibet. In addition, very narrow high-velocity subducting oceanic lithosphere is imaged along most of the western Pacific subduction zones. Longer wavelength equivalents of these anomalies are imaged in the other models, but with reduced correspondence with the plate boundaries (green lines).

At depths corresponding to the base of the continental lithosphere and in the sublithospheric mantle (250 and 350 km), differences between models continue to increase, even at longer wavelengths. Subduction zones are evident in most models, particularly in the western Pacific and South America; however, the shapes of the subducting slabs are very different. They are imaged most clearly in SL2013sv, as finely localized near the plate boundaries, and of higher amplitudes. Across the rest of the Pacific Ocean, all models show a predominance of low velocities (yellow–orange colours), but at length scales less than ∼5000 km, their amplitudes and shapes vary strongly. In addition, each model has low-amplitude fast anomalies beneath most continents; CUB and SEMum show the highest amplitude high-velocity anomalies extending to greater depths. As we noted in the previous section, SL2013sv does not require continental roots to extend to depths much beyond 200 km.

At depths below 300–350 km, the sensitivity of the fundamental mode decreases rapidly (note that this is beyond the depth range of the CUB model). The inclusion of higher mode surface waves and teleseismic body waves becomes critical to resolve structures in the transition zone. Regardless of the methods used, the sampling of the sublithospheric mantle and transition zone is relatively poorer than in the lithospheric mantle for all the models, and, due to this reduced sampling, a wider range in structures is observed. This is clear from the 500 km depth maps. Each model images high-velocity anomalies in the western Pacific and beneath eastern Eurasia. However, there are large differences in the amplitude and location of these slabs even at long wavelengths (>3000 km). The large contribution of multiple-S body waves (higher modes) in SL2013sv has enabled relatively sharp images of the subducted slabs in the transition zone, particularly beneath North and South America, eastern Eurasia and through the Tethys suture towards the Mediterranean.

6.2 Multimode phase-velocity measurements

Following successful waveform fitting of a seismogram, AMI can measure phase velocities of the fundamental and higher modes, for those modes the velocities of which are constrained by the waveform fit within the set of time–frequency windows. The tomographic inversion in this study used only the linear equations yielded by the fitting, not phase-velocity measurements. We did, however, measure >700 000 fundamental-mode and >475 000 higher mode, Rayleigh-wave, phase-velocity curves. These are well-suited for incorporation into a variety of other imaging studies, for example, array-based, teleseismic interferometry (Meier et al. 2004; Lebedev et al. 2006; Deschamps et al. 2008a,b; Darbyshire & Lebedev 2009; Zhang et al. 2009; Endrun et al. 2011; Adam & Lebedev 2012). In this work, we restrict ourselves to simply examining their variability and, thus, the bulk dispersive properties of the Earth’s crust, upper mantle and transition zone.

Fig. 17 displays the binned multimode group- and phase-velocity curves measured from the ∼521 000 vertical-component seismogram fits used in the final tomographic model (E, Table A1). The group-velocity curves (left-hand panel) are not independently measured, but computed from phase velocities (right-hand panel) using:

$$ C(\omega) = \frac{c(\omega)}{1 - \dfrac{\omega}{c(\omega)}\,\dfrac{\mathrm{d}c(\omega)}{\mathrm{d}\omega}}, \qquad (5) $$


where ω = 2π/T is the angular frequency, T the period and C(ω) and c(ω) are the group and phase speeds, respectively. Blue colours in Fig. 17 indicate the lowest density bins, whereas red through black colours indicate increasing density. Group- and phase-velocity curves for AK135_50 are superimposed. For group velocity, the first 12 overtones were plotted, whereas for phase velocity the first 14 were plotted.

In the fundamental-mode group-velocity curves (left-hand panel), the greatest variability is seen at periods less than 45 s, whereas at longer periods of 100–450 s the range in group velocity at each period is much smaller. In the transitional band at 50–80 s, the spread in velocity at longer periods is a factor of 2 less than at shorter periods. The increasing group velocity at periods above 200 s is due to the higher S velocities in the lithospheric and sublithospheric mantle, to which long periods are more sensitive.

The variability in group velocity at shorter periods (≤40–50 s) results from sampling of more heterogeneous shallow structure. In continental regions, this period band is most sensitive to the crust. Where the Moho is deeper (mainly, beneath orogens), low velocities extend to greater periods, manifesting as the thick green band at 30–70 s. In oceanic regions, however, group velocity samples the uppermost mantle at periods of 15–40 s, and therefore plots faster than AK135_50.

Although the depth sensitivity functions of higher mode group velocities are more complex than for the fundamental mode, the same reduction in the spread of group velocities for each mode at increasing periods is observed. As with the fundamental mode, this results from sensitivity to a broader and deeper depth range. In addition, the ‘ray-mode duality’ is clearly seen in the overtone branches superimposing beginning at periods ≤60 s, and converging towards a group velocity of ∼4.3 km s−1 with decreasing period; this represents an S wave travelling in the upper mantle.

The phase velocities for the fundamental and first 14 higher modes are shown in the right-hand panel of Fig. 17. Given that the sensitivities of phase and group velocity differ substantially (Lebedev et al. 2013), it is no surprise that the phase-velocity curves are different in character. Unlike in group velocity, the ‘average’ fundamental-mode phase velocity monotonically increases as a function of period. In addition, beyond 50 s the spread in phase velocity varies minimally. At periods shorter than 50 s, however, the variation increases (in a manner similar to group velocity at periods ≤60 s) due to the increasing sensitivity to more heterogeneous shallow structure.

The higher mode phase-velocity curves can be distinctly identified, particularly up to modes seven or eight. Those higher than nine are more closely spaced and, especially at short periods, are more difficult to distinguish on the plot. At high phase velocities, ∼8.5 km s−1, sudden jumps in overtone branches related to core–mantle-boundary Stoneley modes are clearly recovered for at least six of the overtones (e.g. Dahlen & Tromp 1998).

The different character between fundamental-mode oceanic and continental dispersion curves is clearly observed in both the group

phase-velocity curves result from paths sampling dominantly continental regions and backarcs. From this, we can conclude that the source–receiver distribution is affecting the relative sampling density at shorter periods (which sample the heterogeneous lithospheric mantle and crust), resulting in a relative oversampling of low velocities (compared to AK135_50). The effects of the biased sampling are reduced at longer periods, where the range of path lengths is wider and phase velocities are sensitive to the more homogeneous deeper structure.

6.3 Validity of the JWKB approximation

In 2005, Lebedev et al. used a data set of 4038 vertical-component seismogram fits computed for the western Pacific and southeast Asia to examine the validity field of the JWKB approximation. Our new data set (B, Table A1) of almost 3/4 of a million waveform fits provides a useful opportunity to revisit the stability fields of the assumptions utilized by AMI, with a more substantial sampling. In the following sections, we first quantify AMI’s success rate of waveform selection and fitting and then expand on the work of Lebedev et al. (2005), examining the validity field of the JWKB approximation, as implemented in AMI. Although surface wave ray theory was not, strictly speaking, applied in the waveform fitting in this study (we integrated across approximate sensitivity areas), the frequency-dependent success rates of fitting would be similar if we used rays instead of Fresnel zones. The results in this section thus apply to the validity of surface wave ray theory as well.

6.3.1 Success rate of AMI

Using this new data set of waveform fits, we have further verified AMI’s ability to successfully process large volumes of seismic data. The accuracy of processing depends both on the approximations and on successfully discriminating between true signal and noise. In the top panel of Fig. 18, a black dot is plotted at the distance and magnitude of each successfully fit seismogram (one per seismogram, not each time–frequency window), mapping out the AMI fitting field. In this case, we have selected only seismograms from high-quality, long-term stations of the GSN to reduce the effects of in situ and instrument noise on the fitting statistics of AMI. As would be expected, the pattern obtained when including noisier stations is similar, albeit with a reduced overall success rate.

The shape of the successfully fit region results from both the source–station geometry, as well as the approximations and conditions enforced by AMI. The white regions around the perimeter of the plot represent source–station configurations for which AMI does not fit a seismogram. At the bottom (low magnitudes), the grey line indicates the distance versus magnitude threshold employed when selecting seismograms. The sharp vertical boundary at long path lengths (>16 500 km) is enforced to avoid source–station configurations nearing the antipode, where interference of major and
(<80 s) and phase (<50 s) velocity images; curves from faster, minor arc phases results in greater complexity and large amplitude
oceanic paths lie above AK135_50, while those from slower paths variations of the arrivals.
(across continents or backarcs) lie below. Interestingly, the bin den- The distribution of points in Fig. 18 (top panel) was binned
sity is lower for the former (i.e. oceanic paths), rather than the to generate a quantitative measure of AMI’s success rate (bottom
latter. This is contrary to expectation, as oceanic crust accounts panel). White indicates a success ratio of 0 per cent, blue through
for more than 50 per cent of the Earth’s surface. As has already red an increasing fit success rate, and black represents 100 per cent
been discussed, however, the sources and receivers are not evenly successful fitting. Three smoothed contours are superimposed, the
distributed, and therefore impart a ‘sampling filter’ on the results. interior of which indicates all seismograms are fit successfully at
Based on an analysis of the distribution of path lengths and least 20, 50 and 70 per cent of the time; below the grey line no fits
minimum-filter centre periods, we observe that the shortest period are attempted.
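
The binning step described for the bottom panel of Fig. 18 can be illustrated with a minimal sketch. It is not the procedure used in this study: the record layout `(distance_km, magnitude, fitted)`, the function names and the bin edges are all hypothetical, chosen only to show how a per-bin success percentage follows from counts of attempted and successfully fit seismograms.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index i of the bin [edges[i], edges[i+1]) containing value, or None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def success_rate_grid(records, dist_edges, mag_edges):
    """Per cent of successfully fit seismograms in each distance-magnitude bin.

    records: iterable of (distance_km, magnitude, fitted) tuples (hypothetical layout).
    Returns grid[i][j]; None marks bins that contain no seismograms at all.
    """
    n_d, n_m = len(dist_edges) - 1, len(mag_edges) - 1
    attempted = [[0] * n_m for _ in range(n_d)]
    success = [[0] * n_m for _ in range(n_d)]
    for dist_km, mag, fitted in records:
        i, j = bin_index(dist_km, dist_edges), bin_index(mag, mag_edges)
        if i is None or j is None:
            continue  # outside the plotted plane
        attempted[i][j] += 1
        if fitted:
            success[i][j] += 1
    return [
        [100.0 * success[i][j] / attempted[i][j] if attempted[i][j] else None
         for j in range(n_m)]
        for i in range(n_d)
    ]


# Toy input only; these are not data from the study.
toy = [(1000, 5.2, True), (1000, 5.3, False), (1000, 5.1, False), (1000, 5.4, False),
       (9000, 6.8, True), (9000, 6.9, True), (20000, 6.0, True)]
grid = success_rate_grid(toy, dist_edges=[0, 5000, 10000], mag_edges=[5, 6, 7])
```

In this sketch a bin with no seismograms is kept distinct (`None`) from a bin in which seismograms were attempted but none were fit (0 per cent); contouring and smoothing, as in the figure, would be a separate step.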
Figure 18. Success of AMI waveform fitting in the epicentral distance–earthquake magnitude
plane. In the top panel, we plot the raw data. Each black dot represents a successfully fit
seismogram. In the bottom panel, we have computed the success of AMI waveform fitting. Colour
indicates the percentage of successful fits in each bin. White regions indicate areas of the
distance–magnitude plane where no fits are obtained; this may be due to noise or scattering or
both. The grey line at the bottom is the empirical distance–magnitude threshold utilized in
selecting data.

At magnitudes greater than 6.5 MW, success rates of >70 per cent extend across the distance
axis, indicating that for long periods, the JWKB approximation is successful at fitting in most
cases. The different behaviour at lower magnitudes is due to lower signal-to-noise ratios. This
is clearly observed in Fig. 18 (bottom panel): for a given magnitude, for example, 5.75 MW,
fitting success is inversely proportional to epicentral distance. The 50 per cent contour
illustrates this decay of signal-to-noise ratio; with increasing distance, the minimum magnitude
required to achieve 50 per cent success increases.

It is clear that AMI is not only successfully fitting seismograms across a large area of the
distance–magnitude plane (red areas representing >70 per cent success rate), but is also
effectively identifying and discarding noisy seismograms. Although the minimum magnitude of
requested data is restricted, the highest concentration of seismograms lies at these lowest
magnitudes, even though most of them are too noisy to yield useful fits. In examining the
success rate of AMI, it is encouraging to observe that at these low magnitudes, very few
seismograms are fit. Although this may result in the impression of AMI underperforming
(e.g. ∼750 000 fits out of 3.6 million seismograms, an overall success rate of 20 per cent),
this is certainly not the case. As the large majority of the 2.85 million seismograms not fit
lie at these low magnitudes, AMI has effectively discarded noisy seismograms while
simultaneously obtaining high success rates across a large space of the distance–magnitude
plane (e.g. the >50 and >70 per cent successfully fit regions).

6.3.2 Empirical bounds for the JWKB approximation

The surface wave JWKB approximation (in this context, neglecting the effects of scattering
while incorporating finite-width sensitivity regions) is, in general, only warranted for waves
travelling through regions of smooth lateral heterogeneity (Kennett & Nolet 1990; Wang & Dahlen
1995b; Dahlen & Tromp 1998; Lebedev et al. 2005). In many regions of the Earth, particularly the
crust and upper mantle, heterogeneity sampled by surface waves is rough compared to Fresnel-zone
widths at the periods of interest (Wang & Dahlen 1995b); therefore, in many cases the validity
of JWKB theory is not guaranteed, meaning that it may or may not be valid for any given
time–frequency portion of a particular seismogram.

Lebedev et al. (2005) examined the validity field of surface wave ray theory using a data set of
4038 vertical-component seismograms fit by AMI. Using this data set, the authors concluded that
AMI’s case-by-case selection of the time–frequency portions of seismograms that can
successfully be modelled using eqs (1) and (2) is well suited to ensure the validity of the
approximations. Our new data set of more than 175× the number of seismograms (and, similarly,
the number of fundamental- and higher mode time–frequency windows) is well suited to further
explore the empirical validity field of the JWKB approximation.

In the top panels of Fig. 19, a single black dot is plotted for each successfully fit
time–frequency window for the fundamental mode (left-hand side) and higher modes (right-hand
side); each point is mapped based on its Gaussian filter centre period and epicentral distance.
The lower left corners are empty, reflecting the far-field approximation implemented in AMI.
The white areas in the top right corners are regions where the JWKB approximation is never
valid, as no matter how many attempts are made, no time–frequency windows are successfully fit;
this region represents the scattering regime.

The distributions shown in the top panels of Fig. 19 provide an empirical estimate for the
boundaries of the validity field of the JWKB approximation as implemented by AMI. By
cumulatively binning the point clouds for each distance, from minimum to maximum period, more
quantitative empirical validity field estimates are presented in the bottom panels of Fig. 19.
Colours going from blue towards red and then black indicate increasing density of successfully
fit time–frequency windows.

As is expected, we observe a decrease in the likelihood of the validity of the JWKB
approximation with increasing distance and decreasing period (increasing frequency), both for
the fundamental and higher modes. It is important to note that, by selection, only S and
multiple-S waves propagating primarily within the upper mantle and transition zone are included
in the higher mode parts of the waveforms that are fitted. Therefore, they sample greater
heterogeneity than teleseismic S waves travelling in the deep lower mantle.

Figure 19. Empirical validity field for the JWKB approximation for the fundamental and higher
modes, left-hand and right-hand panels, respectively. In the top panels, each successfully fit
time–frequency window is plotted as a black dot in the distance–period plane. These dot-plots
are converted into cumulatively binned point densities in the bottom panels. For each distance
(column spanning entire period range), colours indicate the cumulative successful fits starting
from 0 per cent (white) at the smallest periods and increasing to 100 per cent (black) at
greater periods. White regions indicate the scattering regime (where no fits were computed
successfully) whereas coloured regions indicate where the JWKB approximation can be valid (fits
were successfully computed). Note that all period axes are logarithmically scaled.

In the past, theoretical and empirical estimates were made on the bounds of validity of the
JWKB approximation. Kennett & Nolet (1990) modelled wave propagation in realistic upper-mantle
models to infer a validity threshold of 50 s (20 mHz), for propagation distances of 3350 km.
Lebedev et al. (2005) found that this threshold was conservative, as their hit-count only begins
to decrease at this point. Wang & Dahlen (1995b) point out that the misfit of the surface wave
ray approximation depends on diffraction and other finite-frequency effects ignored by JWKB
theory, and is dependent on the quantity $s/\sqrt{4\pi l}$ (where $s$ is the root-mean-square
degree of the phase-velocity perturbation $\delta c$ and $l$ is the degree of the equivalent
mode ${}_nS_l$ or ${}_nT_l$). From Fig. 19, it is clear the validity field can be extended to
much shorter periods (higher frequencies) than past conservative estimates, as long as portions
of the seismogram affected by unmodelled wave propagation effects (scattering) can be identified
and avoided, as is automatically carried out by AMI.

At an epicentral distance of 3350 km, the fundamental-mode validity field (in Fig. 19) extends
to periods as short as ∼20 s (50 mHz), almost half (double) the past estimates. The empirical
cut-off of ∼10 per cent (a bin density of ∼3), used by Lebedev et al. (2005, Fig. 10), gives a
transition frequency of ∼33 mHz at 3350 km, which is very similar to the results obtained in
this study when employing the same 10 per cent cut-off. The similarity verifies the accuracy of
the past estimate, but with a more robust statistical

7 CONCLUSIONS

We have generated an unprecedentedly large data set of ∼3/4 of a million vertical-component
waveform fits. We use this new data set to validate key aspects of the multimode waveform
inversion, to assess the bulk dispersive properties of the upper mantle, and to re-examine the
validity field of the JWKB approximation, the surface wave ray-theoretical foundation
underpinning most of the past global models, on which much of our current understanding of
large-scale mantle structure and dynamics is based.

Through recomputing our full data set of waveform fits four separate times (see Appendix A), we
examined and compared the frequency-dependent sensitivity of the derived tomographic models.
Whereas Lebedev & van der Hilst (2008) imposed an upper frequency cut-off of ∼60 mHz (∼44 mHz
filter centre frequency) to avoid breakdowns of the path-average approximation for shorter
period surface waves sampling the Earth’s heterogeneous crust, we show, using tests with our
much larger data set, that the negative effect of such breakdowns on the model is very limited,
smaller than the probable effects of the errors in the source locations and mechanisms, station
timing and of unmodelled diffraction. The net effect of expanding the frequency band is,
instead, positive, thanks to the extra structural information from now-included higher
frequency S and multiple-S wave (higher mode) data. Our preferred data set (B) was thus
generated without any imposed upper frequency limit.

The large number of waveform fits offers new insight into the validity of the basic
approximations that are used extensively in upper-mantle imaging. We were able to confirm the
consistency of AMI both in detecting noisy seismograms (which should not be fit) and in
correctly identifying and fitting large numbers of seismograms for which the approximations are
valid, across much of the earthquake magnitude–epicentral distance plane. We have re-examined
the empirical validity field for the JWKB approximation and demonstrated that it is valid for a
large proportion of the data, particularly at shorter periods than previously theorized.
Importantly, the time–frequency portions of the signal for which the approximation is valid can
be consistently identified on a case-by-case basis, for use in the imaging.

Our new global, upper-mantle, vertically polarized shear speed model, SL2013sv, is constrained
by more than half a million of the most mutually consistent waveform fits, selected using a
rigorous outlier analysis procedure. This new model is capable of resolving features smaller
than 6° laterally across the globe, and certainly much finer in well-sampled continental
regions. In oceanic regions, we have captured striking images of spreading ridge anomalies
which are more localized near the ridge axis in the uppermost mantle than in the past models.
In continental regions, we conclude that the high-velocity, cold cratonic roots are not
required to extend far beyond 200 km depth. Between 150 km and the base of the transition zone,
we obtain clear images of most of the major subduction zones, including many in the western
Pacific, Cascadia and the South American margin (Andean). Finally, in the transition zone we
see clear evidence for the lithosphere subducted during the closure of the Tethys Ocean and
subsequent continental collisions, stretching almost continuously from the Mediterranean to
southeast Asia. Observed agreement of the deep-crustal and upper-mantle structure
of shear velocity heterogeneity in the mantle, J. geophys. Res., 99(B4), 6945–6980.
Tape, C., Liu, Q., Maggi, A. & Tromp, J., 2009. Adjoint tomography of the southern California
crust, Science, 325, 988–992.
Tian, Y., Sigloch, K. & Nolet, G., 2009. Multiple-frequency SH-wave tomography of the western
US upper mantle, Geophys. J. Int., 178(3), 1384–1402.
Tian, Y., Zhou, Y., Sigloch, K., Nolet, G. & Laske, G., 2011. Structure of North American
mantle constrained by simultaneous inversion of multiple-frequency SH, SS, and Love waves,
J. geophys. Res., 116(B2), 1–18.
Trampert, J. & Woodhouse, J.H., 2003. Global anisotropic phase velocity maps for fundamental
mode surface waves between 40 and 150 s, Geophys. J. Int., 154, 154–165.
van der Lee, S. & Frederiksen, A.W., 2005. Surface wave tomography applied to the North
American upper mantle, in Seismic Earth: Array Analysis of Broadband Seismograms, Vol. 157,
pp. 67–80, eds Levander, A. & Nolet, G., AGU Geophysical Monograph Series, Washington, DC.
van der Lee, S., James, D.E. & Silver, P.G., 2001. Upper mantle S velocity structure of central
and western South America, J. geophys. Res., 106(12), 30 821–30 835.
Van der Voo, R., Spakman, W. & Bijwaard, H., 1999. Tethyan subducted slabs under India, Earth
planet. Sci. Lett., 171(1), 7–20.
Visser, K., Lebedev, S., Trampert, J. & Kennett, B.L.N., 2007. Global Love wave overtone
measurements, Geophys. Res. Lett., 34(3), 1–6.
Visser, K., Trampert, J. & Kennett, B.L.N., 2008. Global anisotropic phase velocity maps for
higher mode Love and Rayleigh waves, Geophys. J. Int., 172(3), 1016–1032.
Wang, Z. & Dahlen, F.A., 1995a. Spherical-spline parameterization of three-dimensional Earth
models, Geophys. Res. Lett., 22(22), 3099–3102.
Wang, Z. & Dahlen, F.A., 1995b. Validity of surface-wave ray theory on a laterally
heterogeneous earth, Geophys. J. Int., 123(3), 757–773.

APPENDIX A: SELECTING A WAVEFORM DATA SET

Outlined are the additional procedures used to select the data set used in generating the final
tomographic model. Tests were carried out to analyse the effects of near-nodal radiation and the
impact of the upper frequency limit used in waveform inversion on both the fits and the
tomographic models. Finally, the treatment of remaining errors is discussed. The four main data
sets examined are summarized in Table A1.

To start, waveform fits are generated with no enforced upper frequency limit and no
consideration of near-nodal radiation. The resulting data set A (NR, Table A1) consists of
846 360 vertical-component waveform fits. Data set B (RAD, Table A1) excludes time–frequency
portions of seismograms at near-nodal azimuths; this data set comprises 712 077 waveform fits.
The final two data sets, C (60 mHz) and D (43 mHz), were computed with progressively lower
upper frequency cut-offs imposed during waveform fitting and include 692 540 and 685 146 fits,
respectively. The values of 60 and 43 mHz indicate the frequencies at which the high-frequency
tail of the highest frequency Gaussian filter decreases to an amplitude 0.3× the filter’s
central amplitude. As a result, the centre frequencies of the respective highest frequency
filters are ∼48 (∼20 s) and ∼35 mHz (∼29 s).

A1 Effects of near-nodal radiation

Automated multimode inversion (AMI) is implemented with a frequency- and azimuth-dependent
threshold, set up to avoid fitting seismograms with source–receiver azimuths near a node in the
Table A1. Table summarizing the four different data sets of waveform fits generated using AMI (A–D). For each case, the
same set of ∼3.6 million seismograms was used. The final data set, E, is that used for generating the final tomographic model,
and is derived from data set B. Column one indicates the name of the data set. The second column indicates what conditions
were imposed during waveform fitting. The third and fourth columns indicate the ‘minimum’ and ‘maximum’ Gaussian filter
centre frequencies for each data set. The fifth column indicates the total number of fits for that data set. The sixth and seventh
columns list the number of fundamental and higher mode time–frequency windows.
                                                      Gaussian filters (mHz)               No. time–frequency windows
Data set name AMI conditions                                Min.        Max.    No. fits    Fund. mode   Higher modes
A) NR         Nodal radiation pattern disregarded;          2.93       100.7     846 360     3 611 349        408 566
              no frequency limits
B) RAD        Nodal radiation pattern accounted for;        2.93       100.7     712 077     3 137 154        330 322
              no frequency limits
C) 60 mHz     Nodal radiation pattern accounted for;        2.93        50.0     692 540     3 055 588        293 827
              upper frequency limit at 60 mHz
D) 43 mHz     Nodal radiation pattern accounted for;        2.93        34.6     685 146     2 897 027        226 194
              upper frequency limit at 43 mHz
E) Model      Same as B                                     2.93        87.1     521 705     2 302 157        171 260

seismic wave radiation pattern. There are several reasons to avoid fitting such seismograms.
First, there is a greater likelihood that these portions of the seismogram will be dominated by
scattered waves. Second, the synthetic seismograms used to model them will be the most affected
by phase errors due to source-mechanism uncertainties. Finally, waveform sensitivity kernels
increase in complexity near nodes in the radiation pattern, and as a result sample a larger
volume lying off the source–receiver great-circle path (Meier et al. 1997; Lebedev et al. 2005).

The initial data set of waveform fits (A) was generated without accounting for the radiation
pattern, and resulted in 846 360 successfully fit seismograms. Reprocessing the same
seismograms accounting for near-nodal radiation resulted in 712 077 waveform fits (data set B).
Avoidance of nodes in the radiation pattern thus cost ∼135 000 seismograms, ∼19 per cent of the
total in data set B. We computed the waveform fits both with and without considering the
radiation nodes for completeness, obtaining an estimate of how many waveform fits in data set A
were ‘near-nodal’. For tomography, we shall proceed using data set B, with near-nodal signals
excluded.

A2 High-frequency cut-offs

The validity of the path-average (sensitivity-area average) approximation (eqs 1 and 2) depends
upon the smallness of differences between the sensitivity-area averages of phase-velocity
derivatives [$\delta C_m^0(\omega)/\delta\beta(r)$] and the derivatives at every point
[$\delta C_m(\omega,\theta,\phi)/\delta\beta(r)$] within the sensitivity area (Lebedev & van der
Hilst 2008). Essentially, this means that for paths crossing strong lateral heterogeneities,
the approximation may no longer be valid; $\delta C_m(\omega,\theta,\phi)/\delta\beta(r)$ at
some points $(\theta_i, \phi_i)$ within the kernel may deviate substantially from
$\delta C_m^0(\omega)/\delta\beta(r)$.

Lebedev & van der Hilst (2008) performed a series of tests to quantitatively investigate the
effect of 3-D heterogeneity in the sensitivity areas, using the misfit of synthetic seismograms
as a measure. They observed that enforcement of tighter misfit limits resulted in most
fundamental-mode time–frequency windows at higher frequencies being rejected, due to the
lateral heterogeneity of the crust. Importantly, however, their upper-mantle images were not
affected significantly. Therefore, they imposed a uniform high-frequency limit of ∼44 mHz (23 s)
for the centre frequency of their Gaussian filters for their final processed data set used in
the tomographic model.

We explored the effect the upper frequency limit has on the waveform fitting procedure and
resulting tomographic models using our much larger data set of waveform fits. The starting data
set of 3.6 million seismograms was reprocessed using two different cut-offs (both accounting for
nodal radiation patterns); the results are referred to as data sets C and D. Data set C was
processed imposing a similar 60 mHz maximum (∼48 mHz average filter centre) frequency cut-off
as that used by Lebedev & van der Hilst (2008), whereas data set D uses a stronger cut-off of
43 mHz (∼33 mHz filter centre). As expected, a lower cut-off decreases the quantity of
successful waveform fits: only 692 540 fits for 60 mHz and 685 146 for 43 mHz.

The effects of the upper frequency limit on the resulting tomographic inversions were of
particular interest. To this end, three inversions were performed using a common set of
∼540 000 successful fits. Use of the same seismograms in each test enables a more consistent
appraisal of the effects that frequency band has on the inversion and resulting model.

Minor variations in the frequency content of each data set result in small differences in the
dimensions of the inverse problem: data set B uses 1 635 342 equations and data set D 1 586 951
equations. Data set C uses a cut-off intermediate to B and D and, as expected, its results lie
within the range given by B and D; attention is therefore paid solely to these end-members.
Each inversion was run for 3000 iterations, yielding variance reductions of ∼90 per cent, and
model norms within 5 per cent of each other.

Fig. A1 shows three maps at 20, 36 and 585 km depth (left to right) through three models
generated using different data sets. The top panel shows results for data set B and the middle
panel for D. The largest differences are expected in the crust and transition zone. Shallow
structure is constrained by the higher frequencies, so restricting the maximum frequency limits
resolution there. In the transition zone, higher modes are critical for resolving structure.
However, reducing the upper frequency limit reduces the higher mode content and therefore
decreases resolving power below 250–300 km depth.

Relatively little difference in apparent resolution is observed at the shallow depths (20 and
36 km) between models B and D (top and
Figure A1. Comparison of tomographic results from three different inversions using a common set of ∼540 000 seismograms. The top panel illustrates the
model generated using the fits drawn from data set B (Table A1, no upper frequency limit), whereas the middle panel shows the model generated drawing
the same set of seismograms from data set D (cut-off of ∼43 mHz). Three depths at 20 km (left), 36 km (centre) and 585 km (right) are illustrated for each.
Perturbations are saturated at ±360 m s−1 for both 20 and 36 km, and ±107 m s−1 at 585 km depth. Note that at depths 20 and 36 km the perturbations are
with respect to CRUST 2 when within the crust and to the mantle reference when in the mantle, whereas at 585 km depth the perturbations are with respect to the
mantle reference model (equivalent to variations of ±2 per cent). The bottom model (E, Table A1) is generated using the same initial data set as the top panel
(∼540 000 best fits from data set B) as a starting point; however, an additional ∼20 000 paths were manually selected and removed to reduce artefacts in the
transition zone.

middle panels). Amplitudes of the largest anomalies are reduced by several tens of metres per
second; however, the general structure remains largely unchanged.

At lithospheric mantle depths, both models are equally well constrained, and exhibit few
differences. The maximum amplitudes are reduced by at most 35 m s−1 (∼0.7 per cent), and the
shape of anomalies remains the same. A portion of the amplitude reduction results from slightly
higher effective damping for data set D, as ∼3 per cent fewer equations are incorporated (with
the same regularization coefficients).

The largest differences are at depths ≥250 km, where reduction in higher modes reduces
resolving power notably (e.g. Fig. A1, 585 km). Overall, the model norm is reduced
∼20–30 per cent in data set D, and therefore features appear smoother, and in many cases with a
lower amplitude.

The primary reason for reducing maximum frequency during waveform fitting was to enforce
stricter validity criteria, reducing errors due to assumptions of constant Fréchet derivatives
across the sensitivity areas. It is possible that such errors may propagate into the model and
result in artefacts. As is suggested by Lebedev & van der Hilst (2008) and observed in our tests
here, however, errors due to the assumption of constant phase-velocity derivatives are small.

An examination of the two models presented in Fig. A1 (top two panels) reveals that, although
some artefacts are reduced, they are not eliminated. Therefore, they are likely to be due to
other errors.

A3 Treatment of remaining errors

The main sources of remaining errors are event location errors, incorrect origin times and
source parameters, station timing errors and unmodelled diffraction of surface and body waves.
The impact of errors in event locations and origin times on tomography was examined by Lebedev
et al. (1997) and found to be limited. Using a much smaller data set in the Philippine Sea
region, two individual inversions were performed: the first used locations and origin times
derived from short-period body wave arrivals (NEIC catalogue), whereas the other used the
Harvard CMT catalogue. Anomalies in the resulting tomographic models did not differ
substantially, despite large systematic differences in source parameters. The effect of
unmodelled diffraction on AMI tomography was tested in ‘spectral-element’ resolution tests
(Lebedev & van der Hilst 2008; Qin et al. 2008), which showed that the sensitivity-area-average
approximation was adequate for the recovery of anomalies that were sufficiently well sampled by
crossing rays. Although these previous tests suggest that the effect of errors in the data is
overall limited if the data sampling is dense, isolated artefacts often remain in the
tomographic models. Based on the analysis described earlier, we have chosen to retain data set
B (no frequency limit), and perform a manual analysis to identify and remove additional
seismograms potentially contaminated by errors.

The final data set, E (Table A1), starts from the best ∼540 000 seismograms selected (using
outlier analysis) from data set B. This was further reduced, beginning by removing all
seismograms for events prior to 1994, as the moment-tensor solutions and source locations for
these older events commonly have larger errors and are less well constrained. Next, the
locations of suspected artefacts were compared with the locations of stations and events. As
the sensitivity kernels have the largest values near their endpoints, errors may concentrate in
these regions and result in a corresponding increase in anomaly amplitude. Seismograms recorded
at stations or originating from events in close proximity to apparent artefacts were
identified, examined and discarded if deemed suspicious.

This process of manual analysis identified ∼20 000 additional seismograms for removal. The
results from the inversion of this data set are plotted in the bottom panel of Fig. A1, for
comparison with the previous two inversions. In this case, the amplitudes in the crust and
shallow mantle obtained in the top panel are maintained, as are the amplitudes in the
transition zone. In addition, the reduction of transition zone artefacts previously achieved by
limiting the maximum frequency during fitting (middle panel) has been reproduced. Therefore,
this subset of 521 705 seismograms (data set E, Table A1) was selected for our final
tomographic model.
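
The data-set sizes quoted in this appendix can be cross-checked with a short bookkeeping sketch. All counts are taken from the text and Table A1; the variable names are illustrative only, and the 3.6 million and 540 000 figures are the rounded values given in the text, so the derived quantities are approximate.

```python
# Counts quoted in Appendix A and Table A1.
N_SEISMOGRAMS = 3_600_000          # approximate number of seismograms processed
FITS = {"A": 846_360, "B": 712_077, "C": 692_540, "D": 685_146, "E": 521_705}
OUTLIER_SELECTED = 540_000         # rounded size of the outlier-selected subset of B


def percent(part, whole):
    """part as a percentage of whole."""
    return 100.0 * part / whole


# Seismograms lost by excluding near-nodal azimuths (A -> B).
nodal_loss = FITS["A"] - FITS["B"]
nodal_loss_pct_of_b = percent(nodal_loss, FITS["B"])

# Additional fits lost by imposing the 60 and 43 mHz cut-offs (B -> C, B -> D).
cutoff_loss = {name: FITS["B"] - FITS[name] for name in ("C", "D")}

# Seismograms removed between the outlier-selected subset and data set E.
manual_removed = OUTLIER_SELECTED - FITS["E"]

# Overall success rate of data set B relative to all seismograms processed.
success_b_pct = percent(FITS["B"], N_SEISMOGRAMS)

print(f"near-nodal loss: {nodal_loss} ({nodal_loss_pct_of_b:.1f} per cent of B)")
print(f"cut-off losses relative to B: {cutoff_loss}")
print(f"removed before final model: {manual_removed}")
print(f"success rate of B: {success_b_pct:.1f} per cent")
```

The near-nodal loss reproduces the ∼135 000 seismograms (∼19 per cent of data set B) stated in Section A1, and the difference between the rounded 540 000 and the 521 705 fits of data set E is consistent with the ∼20 000 seismograms removed in Section A3.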

APPENDIX B: ANALYSIS OF
In this section, we explore the properties of the waveform fits gener-
ated by AMI as a function of the different constraints applied during
waveform fitting. As discussed previously in Appendix A, the result
of numerous frequency-limit tests and outlier removal was 521 705
of the most mutually consistent vertical-component waveform fits
used to generate the final tomographic model. In the following plots,
we examine differences in the properties of four different full data
sets and the final, ‘model’ data set (E, Table A1). This comparison
offers insight into the statistical nature of the effects of the frequency
limits during fitting, as well as what constitutes an ‘outlier’.
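
The comparisons in this appendix rest on histograms of a property (path length or filter centre period) computed separately for each data set. A minimal sketch of that kind of comparison is given below; the 500 km bin width, the function names and the toy path lengths are assumptions made for illustration, not values or code from this study.

```python
from collections import Counter


def path_length_histogram(path_lengths_km, bin_width_km=500.0):
    """Count paths per distance bin; keys are the lower bin edges in km."""
    counts = Counter(int(d // bin_width_km) for d in path_lengths_km)
    return {i * bin_width_km: counts[i] for i in sorted(counts)}


def fraction_removed(before, after):
    """Per-bin fraction of paths present in `before` but absent from `after`."""
    return {edge: 1.0 - after.get(edge, 0) / n for edge, n in before.items() if n}


# Toy path lengths (km) for a full data set and a reduced subset of it.
full = [1200, 1400, 1300, 1100, 10100, 10400, 10200, 10300]
subset = [1200, 1400, 10100, 10400, 10200, 10300]

hist_full = path_length_histogram(full)
hist_subset = path_length_histogram(subset)
removed = fraction_removed(hist_full, hist_subset)
```

Comparing `removed` across bins is one way to see whether a constraint thins the distribution evenly or preferentially removes short paths.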
Fig. B1 illustrates the path-length distribution of each full AMI
data set (A–D) and model subset (E, orange). The top panel rep-
resents seismograms, whereas the middle and bottom panels show
histograms for the fundamental- and higher mode time–frequency
windows, respectively. In all three panels, the distributions are bi-
modal. The secondary lobe centred at 10 000 km results from nu-
merous USArray stations sampling seismicity in the western Pacific.
Such a double-lobe distribution has also been observed in regional-scale modelling, where the
local seismicity dominated over several large-distance events, included to help constrain the
structure at the model domain boundaries (e.g. Legendre et al. 2012).

The different constraints applied during waveform fitting affect the distributions. Accounting
for the radiation pattern reduces the number of fits relatively evenly across all distance bins
(white compared to green, all panels). However, the restriction of the upper frequency limit
during waveform fitting reduces the number of the shorter paths more than the number of longer
paths (green compared to purple and blue); this effect is particularly clear in the higher mode
panel (bottom), where a significant proportion of paths lie in the range 1200–3500 km. The a
posteriori outlier removal appears to have a similar effect: shorter path lengths are
preferentially removed. This is expected, as the fits at short distances are affected more by
source mislocations and timing errors than those for longer paths.

Figure B1. Path-length distribution of successfully fit seismograms. The five different colours
represent the different data sets fit using AMI (Table A1): data set A (white) with no cut-offs
and ignoring the radiation pattern; data set B (green) accounts for the radiation pattern; data
set C (purple) additionally imposes a 60 mHz high-frequency cut-off; data set D (blue) imposes a
43 mHz high-frequency cut-off and E (orange) is the subset of 521 705 waveform fits selected
from data set B used to compute the final tomographic model. Top panel is the histogram of the
number of seismograms, the middle panel is the histogram for the number of fundamental-mode wave
trains (≥2 time–frequency windows for each seismogram) and the bottom panel shows the number of
distinct higher mode wave trains (one count per time–frequency window with ≥1 higher mode fit).
The largest contribution comes from paths between 2500 and 7500 km. The secondary lobe centred
at ∼10 000 km results from a large number of circum-Pacific paths between stations of the
USArray TA and western Pacific seismicity. Higher modes (bottom panel) are clearly dominated by
shorter path lengths (≤4000 km). Y-axis is linearly scaled and indicates the number of
seismograms or windows, in thousands (e.g. 55 k ≡ 55 000).

Fig. B2 plots the distribution of Gaussian filter centre periods for each data set. Both
fundamental and higher mode time–frequency windows sample the broad period range from 10 to
∼320 s. However, since the histogram represents only filter centre periods, the
full finite width of the Gaussian filters broadens the complete period range, in particular extending to longer periods. Therefore, the full range spans 10–455 s (observed in Fig. 17). In Fig. B2, the effect of the 60 mHz (C, 16 s) and 43 mHz (D, 23 s) upper frequency cut-offs (purple and blue, respectively) is clear. The 16 s (purple) cut-off results in a minimum filter centre period of ∼19 s, with no time–frequency windows at shorter periods. For the 23 s (blue) cut-off, the minimum filter centre period is ∼29 s. At long periods, the number of time–frequency windows is similar for all the data sets.

Figure B2. Histogram of the Gaussian filter centre periods for the fundamental (top panel) and higher (bottom panel) modes. As in the previous figure, colours represent the different data sets in Table A1. For the fundamental mode, one count indicates an arrival of this mode within a single time–frequency window for a successfully fit seismogram. For higher modes, more than one mode usually contributes to each wave train or time–frequency window. See Fig. B3 for the contribution of the individual modes. Note that in the axis labels, TF-win indicates ‘time–frequency windows’.

The fundamental-mode waveform fits in data set E (model) sample the range 35–200 s almost uniformly, with a drop-off (approximately half an order of magnitude) at periods from 200–350 s. For higher modes, sampling is strongest in the period range 20–100 s, and decreases at longer periods (100–350 s). It is important to note that the counting of higher mode time–frequency windows is incremented only once for each successfully fit window, not for each higher mode in the window. Since a higher mode wave train is generated through interference of a number of modes, this distribution does not reflect the number of individual higher modes at each period.
The distribution of fundamental and higher modes, measured by AMI after waveform fitting, is plotted in Fig. B3 for each data set. AMI employs conservative criteria for the selection of frequency ranges in which a given mode has a sufficiently strong contribution to the waveform fit and can thus be measured. Therefore, the higher mode content represents a conservative, lower limit estimate of the actual higher mode contributions.

Figure B3. Contributions of the fundamental (mode 0) and higher modes in each of the five data sets in Table A1, colour coded as in the previous two figures. The number of the fundamental-mode curves is an order of magnitude greater than that of the higher modes, in part because the fundamental mode was required to be included for a waveform fit to be accepted. There are fewer higher mode curves because S waves are not included in all waveform fits. The effect of the cut-off frequency used during waveform fitting is clearly visible, and results in a decrease of the number of higher mode phase-velocity curves with increasing minimum period (decreasing maximum frequency). Note that the Y-axis is logarithmically scaled.

The fundamental mode, indicated by mode number 0, has one phase-velocity curve for every successfully fit seismogram; this is one of AMI’s criteria for accepting a waveform fit. The different constraints imposed during waveform fitting are clear. The inclusion of nodal radiation patterns has a minimal effect across the higher modes (green compared to white). The frequency cut-offs, however, have a stronger influence, with greater reduction for the narrower frequency bands (D versus C). The 60 mHz (purple) and 43 mHz (blue) data sets show progressively fewer higher modes, which, as discussed in Section A2, reduces data redundancy and therefore resolution and recovered amplitude in the transition zone.
The first seven higher modes contribute most significantly in data set E (model), with ∼10 000–80 000 (modes three through five only) dispersion curves. For modes 8–10, thousands of phase-velocity curves are measured, and account for ∼5 per cent of the total overtones. At the highest mode numbers (11–18) fewer than 500 curves are measured, and contribute only ∼0.05 per cent. We can compare the number of higher modes we obtain with other studies, for example, Visser et al. (2008, their table 2). Although they obtain more first overtone measurements, our new data set contains many more measurements at higher modes, by a factor of two for modes 4–6. The selection criteria for our new data set are also stricter (compared to those outlined in Visser et al. 2007), with much closer data-synthetic fits required.
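As a rough arithmetic check on the cut-off values quoted in the discussion of Fig. B2, the short Python sketch below converts the 60 and 43 mHz upper frequency limits to periods and compares them with the approximate minimum filter centre periods (∼19 and ∼29 s) stated in the text. The function and variable names are illustrative assumptions and are not part of AMI. The exact reciprocals are about 16.7 and 23.3 s; the 16 and 23 s quoted in the text appear to be these values rounded down.

```python
# Upper frequency cut-offs (mHz) for data sets C and D, as quoted in the text.
cutoffs_mhz = {"C": 60.0, "D": 43.0}

# Approximate minimum Gaussian filter centre periods (s) quoted for Fig. B2.
min_centre_period_s = {"C": 19.0, "D": 29.0}


def cutoff_period_s(freq_mhz):
    """Return the period in seconds for a frequency given in millihertz."""
    return 1000.0 / freq_mhz


for name in sorted(cutoffs_mhz):
    period = cutoff_period_s(cutoffs_mhz[name])
    # Ratio > 1: the shortest centre period sits above the cut-off period,
    # consistent with the finite width of the Gaussian filters.
    ratio = min_centre_period_s[name] / period
    print(f"data set {name}: cut-off period {period:.1f} s, "
          f"min centre period / cut-off period = {ratio:.2f}")
```

In both cases the minimum centre period exceeds the cut-off period by roughly 15–25 per cent, which is what one would expect if the full passband of the shortest-period filter must lie within the allowed frequency range.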
