0% found this document useful (0 votes)

109 views224 pages

Fwi Book

This document provides lecture notes on full waveform inversion (FWI). It discusses various forward modeling and inversion methods for FWI, including finite-difference and finite-element approaches in both the frequency and time domains. It covers topics such as the acoustic wave equation, boundary conditions, source implementation, numerical examples, the inverse problem theory, practical considerations for FWI, and applications to synthetic datasets. The document is intended to provide an overview of FWI methodology for educational purposes.

Uploaded by

appudeselvan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

109 views224 pages

Fwi Book

Uploaded by

appudeselvan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 224

Lecture Notes on Full Waveform Inversion

Romain Brossier and Jean Virieux

[email protected] [email protected]

Université Joseph Fourier, Grenoble, France

SEISCOPE Consortium
http://seiscope.oca.eu

July 2011
Acknowledgments

This work has been done in the framework of SEISCOPE consortium partially funded by
BP, CGG-Veritas, ENI, EXXON-Mobil, SAUDI-ARAMCO, SHELL, STATOIL and TOTAL,
and by the French research agency (Agence Nationale de la Recherche) by project ANR-05-
NT05-2-42427. This work is also supported by Université Joseph Fourier (Grenoble), CNRS,
Université de Nice Sophia-Antipolis and IRD. R. Brossier has also been supported by Réseaux
de Transport d’Electricité under project 08CUF1954.
Access to the high performance computing facilities of the Mesocentre SIGAMM (Obser-
vatoire de la Côte d’Azur, France) (http://crimson.oca.eu/), the Géoazur lab (University
of Nice Sophia-Antipolis-CNRS, France), the LGIT lab (University Joseph Fourier, Greno-
ble, France)(https://ciment.ujf-grenoble.fr/) and the GENCI- [CINES/IDRIS] (http://
www.genci.fr/,http://www.idris.fr/,http://www.cines.fr/) provided the required com-
puter resources to develop the packages used in this work.
We are also gratefull to free packages available on the web: the linear systems are solved
with MUMPS package, available on http://graal.ens-lyon.fr/MUMPS/index.html. The
mesh generation is performed with help of Triangle, available on http://www.cs.cmu.edu/
$\sim$quake/triangle.html. Seismic Unix (http://www.cwp.mines.edu/cwpcodes/index.
html) is used to generate figures and partially for data preprocessing.
These lecture notes would not have been possible without the common work performed by
the different persons involved or connected with the SEISCOPE consortium. We may thank A.
Asnaashari, H. Ben Hadj Ali, C. Castellanos, B. Dupuy, V. Etienne, S. Garambois, Y. Gholami,
G. Hu, F. Lavoué, S. Operto, D. Pageot, V. Prieux, A. Ribodetti, A. Roques for various lifeful
discussions.
Jean Virieux thanks very much H. Calandra from TOTAL and R-E Plessix from SHELL
for the common overview of forward modelling.
Contents

Introduction 9

1 Forward modeling 13
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 PDE discretisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Frequency-domain acoustic wave equation . . . . . . . . . . . . . . . . . . . . . 18
1.3.1 Mixed-grid Finite-difference method . . . . . . . . . . . . . . . . . . . . 19
1.3.1.1 Discretisation of the differential operators . . . . . . . . . . . . 20
1.3.1.2 Anti-lumped mass . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Numerical dispersion and anisotropy . . . . . . . . . . . . . . . . . . . . 22
1.3.3 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3.3.1 PML absorbing boundary conditions . . . . . . . . . . . . . . . 24
1.3.3.2 Free surface boundary conditions . . . . . . . . . . . . . . . . . 26
1.3.4 Source and receiver implementation on coarse grids . . . . . . . . . . . . 26
1.3.5 Resolution with the sparse direct solver MUMPS . . . . . . . . . . . . . 27
1.4 Numerical examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4.1 3D EAGE/SEG overthrust model . . . . . . . . . . . . . . . . . . . . . . 29
1.4.1.1 3D EAGE/SEG salt model . . . . . . . . . . . . . . . . . . . . 31
1.5 Finite-element Discontinuous Galerkin Method . . . . . . . . . . . . . . . . . . 33
1.6 2D Finite-element Discontinuous Galerkin Method in the Frequency Domain . 34
1.6.1 hp-adaptive Discontinuous Galerkin discretisation . . . . . . . . . . . . 35
1.6.2 Which interpolation orders to choose? . . . . . . . . . . . . . . . . . . . 38
1.6.3 Boundary conditions and source implementation . . . . . . . . . . . . . 40
1.6.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.6.4.1 Acoustic wave modeling in presence of cavities . . . . . . . . . 41
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain . . . . 42
1.7.1 The 3D DG-FEM formulation . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.1.1 Elastodynamic system . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.1.2 Spatial discretisation . . . . . . . . . . . . . . . . . . . . . . . 44
1.7.1.3 Time discretisation . . . . . . . . . . . . . . . . . . . . . . . . 47
1.7.2 Computational aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.7.2.1 Source excitation and boundary conditions . . . . . . . . . . . 48
1.7.2.2 Source excitation . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.7.2.3 Free surface condition . . . . . . . . . . . . . . . . . . . . . . . 50
1.7.2.4 Absorbing boundary condition . . . . . . . . . . . . . . . . . . 50
1.7.3 Validation tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
CONTENTS

1.7.3.1 Saving computation time and memory . . . . . . . . . . . . . . 55

1.7.4 Accuracy of DG-FEM with tetrahedral meshes . . . . . . . . . . . . . . 57
1.7.4.1 Convergence study . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.7.4.2 Accurate modelling of surface waves . . . . . . . . . . . . . . . 59
1.7.5 hp-adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
1.7.5.1 Two-step refinement approach . . . . . . . . . . . . . . . . . . 60
1.7.5.2 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . 62
1.7.6 The EUROSEISTEST Benchmarch . . . . . . . . . . . . . . . . . . . . . 64
1.7.6.1 Description of EUROSEISTEST verification and validation project 65
1.7.7 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
1.8 Partial conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

2 Inverse problem : theory 73

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.2 FWI as a least-squares local optimization . . . . . . . . . . . . . . . . . . . . . 75
2.2.1 The Born approximation and the linearization of the inverse problem . 75
2.2.2 Normal equations: Newton, Gauss-Newton and steepest-descent methods 76
2.2.2.1 Basic equations . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.2.2.2 Numerical algorithms: Conjugate-gradient method . . . . . . . 78
2.2.2.3 Numerical algorithms: Quasi-Newton algorithms . . . . . . . . 78
2.2.2.4 Newton and Gauss-Newton algorithms . . . . . . . . . . . . . 79
2.2.3 The gradient and Hessian in FWI: interpretation and computation . . . 79
2.2.4 Regularization and preconditioning of inversion . . . . . . . . . . . . . . 84

3 FWI in practice 87
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2 Resolution power of FWI and relationship with the experimental setup . . . . . 87
3.3 Multiscale FWI: time-domain versus frequency-domain . . . . . . . . . . . . . . 93
3.4 Source wavelet estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5 Variants of Born-approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.6 Variants of classic least-squares . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.7 Building starting models for FWI . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.8 On the parallel implementation of FWI . . . . . . . . . . . . . . . . . . . . . . 101
3.8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.8.2 Exemple : the FWT2D ELAS code of SEISCOPE . . . . . . . . . . . . 102
3.8.2.1 Forward problem . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.8.2.2 Overview of the parallel inverse problem . . . . . . . . . . . . 102
3.8.2.3 Two parallelism levels . . . . . . . . . . . . . . . . . . . . . . . 103

4 Applications 105
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2 Onshore elastic FWI: the synthetic Overthrust application . . . . . . . . . . . . 106
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.2.2 Application to a canonical model . . . . . . . . . . . . . . . . . . . . . . 107
4.2.3 Onshore synthetic case study . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2.3.1 SEG/EAGE Overthrust model and experimental setup . . . . 109
4.2.3.2 Raw data inversion . . . . . . . . . . . . . . . . . . . . . . . . 113

6
CONTENTS

4.2.3.3 Successive single-frequency inversions of damped data . . . . . 113

4.2.3.4 Full-waveform inversion without free-surface effects . . . . . . 114
4.2.3.5 Simultaneous multi-frequency inversion of damped data . . . . 115
4.2.3.6 Computational aspect . . . . . . . . . . . . . . . . . . . . . . . 118
4.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2.5 Partial conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform
Inversion? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3.2 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3.2.1 The least-squares norm . . . . . . . . . . . . . . . . . . . . . . 126
4.3.2.2 The least-absolute-value norm . . . . . . . . . . . . . . . . . . 126
4.3.2.3 The Huber criterion . . . . . . . . . . . . . . . . . . . . . . . . 127
4.3.2.4 The Hybrid L1 /L2 criterion . . . . . . . . . . . . . . . . . . . . 127
4.3.2.5 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3.2.6 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3.3 Numerical tests: the offshore Valhall model . . . . . . . . . . . . . . . . 129
4.3.3.1 Inversion set-up . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3.3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.4 Numerical tests: The onshore SEG/EAGE overthrust model . . . . . . 133
4.3.4.1 Inversion set-up . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.3.4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.3.5.1 The offshore Valhall model . . . . . . . . . . . . . . . . . . . . 137
4.3.5.2 The onshore SEG/EAGE Overthrust Model . . . . . . . . . . 137
4.3.5.3 Implications for 3D FWI . . . . . . . . . . . . . . . . . . . . . 138
4.3.6 Partial conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure 140
4.4.1 Geological setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.4.2 Wide-aperture acquisition and data quality . . . . . . . . . . . . . . . . 140
4.4.3 Traveltime tomography results . . . . . . . . . . . . . . . . . . . . . . . 142
4.4.4 Full-waveform tomography results . . . . . . . . . . . . . . . . . . . . . 144
4.4.5 Asymptotic prestack depth migration results . . . . . . . . . . . . . . . 149
4.4.6 Geological discussion of the final model . . . . . . . . . . . . . . . . . . 160
4.4.7 Partial conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy
on FWI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.5.2 Review of the method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.5.3 Geological context and data . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.5.3.1 Geological context . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.5.3.2 Ocean-bottom-cable (OBC) acquisition geometry, seismic data
set and reference models . . . . . . . . . . . . . . . . . . . . . 167
4.5.4 Starting models appraisals . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.5.5 FWI preprocessing and experimental setup . . . . . . . . . . . . . . . . 170
4.5.5.1 FWI preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 170
4.5.5.2 Experimental set-up: seismic modelling . . . . . . . . . . . . . 171

7
CONTENTS

4.5.5.3 Experimental set-up: inversion . . . . . . . . . . . . . . . . . . 172

4.5.6 FWI results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.5.6.1 FATT+FWI model 1 . . . . . . . . . . . . . . . . . . . . . . . 174
4.5.6.2 FATT+FWI model 2 . . . . . . . . . . . . . . . . . . . . . . . 175
4.5.6.3 FATT+FWI model 3 . . . . . . . . . . . . . . . . . . . . . . . 175
4.5.6.4 RTT+FWI model . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.5.6.5 Comparison with results of 3D FWI . . . . . . . . . . . . . . . 177
4.5.7 Model appraisals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.5.7.1 Seismic modelling . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.5.7.2 Analysis of migration and common image gathers . . . . . . . 179
4.5.7.3 Source wavelet estimation as a tool for model appraisal . . . . 181
4.5.8 Summary of isotropic results and anisotropic FWI . . . . . . . . . . . . 181
4.5.9 Partial conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

Conclusion 191

Bibliography 215

A Lagrangian basis functions 217

B Matrices used in the DG-FEM formulation 219

C Full Waveform Tomography Pre-Processing 221

8
Introduction

Seismic waves bring to the surface information gathered on the physical properties of the Earth.
Since the discovery of modern seismology at the end of the nineteenth century, the main dis-
coveries have arisen from the use of traveltime information (Oldham, 1906; Gutenberg, 1914;
Lehmann, 1936). Then there was a wait until the nineteeneighties for amplitude interpretation,
when the global seismic networks could provide enough calibrated seismograms for the comput-
ing of accurate synthetic seismograms using normal-mode summation. Differential seismograms
estimated through the Born approximation have been used as perturbations for the fitting of
long-period seismograms, which can provide high-resolution upper mantle tomography (Gilbert
and Dziewonski, 1975; Woodhouse and Dziewonski, 1984). The sensitivity or Fréchet derivative
matrix, i.e., the partial derivative of the seismic data with respect to the model parameters, is
explicitly estimated before proceeding to the inversion of the linearized system. The normal
mode description allows a limited number of parameters to be inverted (around a few hundreds
of parameters), which makes the optimization procedure feasible through explicit sensitivity
matrix estimation in spite of the high number of seismograms.
Meanwhile, exploration seismology has taken up the challenge of high-resolution imaging
of the subsurface by designing dense, multifold acquisition systems. The construction of the
sensitivity matrix turns out to be too prohibitive as the number of parameters goes over ten
thousand. Instead, another road has been taken to perform high-resolution imaging. Using
the exploding reflector concept, and after some kinematic corrections, amplitude summation
has provided detailed images of the subsurface for reservoir determination and characterization
(Claerbout, 1971, 1976). The sum of the traveltimes from a specific point of the interface
towards the source and the receiver should coincide with the time of large amplitudes in the
seismogram. The reflectivity as an amplitude attribute of related seismic traces at the selected
point of the reflector provides the migrated image needed for seismic stratigraphic interpre-
tation. Although migration is more a concept for converting seismic data recorded in the
time-space domain into images of physical properties, we often refer to it as the geometrical
description of the short wavelengths of the subsurface. A velocity macromodel or background
model provides the kinematic information required to focus waves inside the medium.
The limited offsets recorded by seismic reflection surveys and the limited frequency band-
width of the seismic sources make seismic imaging poorly sensitive to intermediate wavelengths
(Jannane et al., 1989). This is the motivation behind the two-step workflow: first, the con-
struction of the macromodel using kinematic information, and then the amplitude projection
through different types of migrations (Claerbout and Doherty, 1972; Gazdag, 1978; Stolt, 1978;
Baysal et al., 1983; Yilmaz, 2001; Biondi and Symes, 2004). This procedure has turned out
to be efficient for relatively simple geological targets in shallow-water environments, although
more limited performances have been achieved for imaging structurally complex structures,
INTRODUCTION

such as salt domes, sub-basalt targets, thrust belts and foot-hills. In such complex geologi-
cal environments, building an accurate velocity background model for migration is challenging.
Various approaches for iterative updating of the macromodel reconstruction have been proposed
(Snieder et al., 1989; Docherty et al., 2003), but they remain limited by the poor sensitivity of
the reflection seismic data to the large and intermediate wavelengths of the subsurface.
Full waveform inversion (FWI) is a challenging data-fitting procedure for extracting quan-
titative information from seismograms. High-resolution imaging at half the propagated wave-
length is expected. The key ingredients of the optimization technique are an efficient forward
modeling engine and a local differential approach, where both the gradient and the Hessian
operators are efficiently estimated. Local optimization does not, however, prevent convergence
of the misfit function towards local minima because of the limited accuracy of the starting
model, the lack of low frequencies, the presence of noise and the approximate modeling of the
wave-physics complexity. Different hierarchical multiscale strategies are designed to mitigate
the nonlinearity and ill-posedness of FWI by incorporating progressively shorter wavelengths
in the parameter space. Both synthetic and real data case studies illustrate the promise of
this technique, which should address the reconstruction of various parameters, from VP and
VS velocities to density, anisotropy and attenuation.
Crucial jumps are, however, still necessary to make this technique as popular as migration
techniques. The challenges can be categorized as (1) the building of accurate starting models
with automatic procedures and/or recording of low frequencies; (2) the definition of new min-
imization criteria to mitigate the sensitivity of FWI to amplitude errors, and to increase the
robustness of FWI when multiple parameter classes are estimated, and (3) the improvement of
the computational efficiency by data compression techniques, to make three-dimensional elastic
FWI feasible.
We shall present two volumetric methods for solving efficiently wave equations either in
the frequency domain or in the time domain. For the acoustic case, one may consider the
frequency formulation even in 3D, while we need to consider explicit time integration for elas-
tic equations in 3D media. The frequency approach is quite affordable in 2D geometries and
has nice features as the simple introduction of visco-elasticity and a reduction of the source
complexity. We then perform a short summary of the basic steps of the inverse problem theory
where we highlight the importance of gradients and Hessian operators. We, then, consider
different features of the specific full waveform inversion as the resolution in connection with
the experimental setup, the problem of the non-linearity related to the cycle skipping artifact
and the estimation of the source time function. The definition of the starting model is still an
issue and mitigation through a multiscale strategy from low frequencies to high frequencies is
the proposed option. As these optimisation tools are quite demanding in computer resources,
we analyze the different strategies to speed up the computation.

Then, we consider applications from synthetic realistic examples where the high-quality
reconstruction of the full waveform inversion is recognized: the SEG/EAGE overthrust model.
We may question the adequatness of the misfit function and especially the L2 norm and. The
L1 norm is investigated on another synthetic example based on the Valhall oil field configura-
tion including effects of noise.

Considering real data is quite challenging and the Baragiano overthrust structure is con-

10
INTRODUCTION

sidered and illustrates the efficiency formulation of the acoustic equation for interpreting real
on-land data. The marine environment is considered with one 2D acquisition line of the moni-
toring system of the oil reservoir of Valhall. The imprint of the anisotropy which are embedded
in the seismograms is underlined.

We finally conclude on the potentialities of the full waveform inversion for accurate recon-
struction of the velocity structure using global seismic acquisition configuration.

11
Chapter 1

Forward modeling

This chapter is mainly based on the papers :

• Virieux, J., H. Calandra and R-E Plessix [2011], A partial review of modelling methods
for geophysical imaging. Geophysical Prospecting in press.
• Brossier, R., V. Etienne, S. Operto and J. Virieux [2010], Frequency-Domain Numerical
Modelling of Visco-Acoustic Waves with Finite-Difference and Finite-Element Discon-
tinuous Galerkin Methods, Acoustic waves, Editor D.W. Dissanayake, SCIYO, 125-158,
ISBN=978-953-307-111-4
• Etienne, V., E. Chaljub, J. Virieux and N. Glinsky [2010], An hp-adaptive discontinuous
Galerkin finite-element method for 3D elastic wave modelling, Geophys. J. Int., 183,
941-962.

1.1 Introduction
Modeling accurately and efficiently the seismic waves in a continuum medium requires solutions
of the partial differential equations (PDE) governing the physics of related field experiments.
In seismology and exploration geophysics, modelling in various realistic media, for various
purposes ranging from risk analysis to crustal imaging, had led scientists to study a wide range
of analytical, semi-analytical and numerical methods. The numerical methods can be based on
an approximation of the PDE, for instance the high-frequency approximation (see (Červený,
2001; Chapman, 2004; Virieux and Lambaré, 2007) for references) or the one-way propagation
approximation (Claerbout, 1976; Pai, 1985, 1988; Wu, 1994; Liu and Wu, 1994; Ristow and
Ruhl, 1994; Le Rousseau and de Hoop, 2001; Cheng and Liu, 2006; Cheng et al., 2007) among
other references.
However, these approximations very powerful for interpretation generally encounter diffi-
culties in complex geological terrains where earth parameters can vary at all scales leading
to complicated interactions between the waves and the medium. The need for solutions of
the full/complete differential equations (or corresponding integral equations) has been rapidly
recognized. Numerical methods with related discretisation for geophysical applications were
discussed as soon as computers have been powerful enough for numerical simulations in het-
erogeneous media; Alterman and Karal (1968); Lysmer and Drake (1972); Alford et al. (1974);
FORWARD MODELING

Bolt and Smith (1976); Ilan and Loewenthal (1976); Kelly et al. (1976); Marfurt (1984); Virieux
(1984); Dablain (1986); Levander (1988) in propagative elastodynamics. These methods have
their own limitations related to time and space discretisation for getting synthetic fields as
we shall discuss further with respect to imaging goals. Although these methods were scarcely
used on large-scale imaging problems because of their computational cost, applications of these
numerical methods have been intensively discussed in the context of the seismic reverse-time
migration and of the seismic full waveform inversion (Baysal et al., 1983; Lailly, 1983; Whit-
more, 1983; Claerbout, 1985; Gauthier et al., 1986; Tarantola, 1987). These works are at the
basis of the current developments on seismic modeling we shall focus on this section.
The diversity of the numerical methods studied and used in geophysics questions the rel-
evance and the pertinence of each approach. Some scientific disciplines seem to have a more
focus approach. For instance, in meteorology the pseudo-spectral method (often referenced as
a spectral method in the literature) represents the main approach to address the challenging
problem of weather prediction and climate changes (Haltiner and Williams, 1980; Jarraud and
Baede, 1985; Fornberg, 1998) where complex physical processes have been catched into sub-
grid phenomenological evolution such as the chemical interaction inside clouds. In structural
mechanics, finite-element method is the method of choice (Zienkiewicz and Taylor, 1967) with
various attempts for non-linear behaviors through distinct/discrete element methods (Toomey
and Beans, 2000; Mariotti, 2007). The diversity in solving geophysical modelling may, however,
reflect the different challenges in geophysics, and these challenges may require different practi-
cal solutions. For instance, to be economically valuable, the migration of hundreds of thousand
shots of a marine data set to obtain a structural image from compressional waves demands a
different implementation of the wave propagation problem that the precise modelling of surface
waves generated by a superficial earthquake. The methodological effort for years has conducted
to sophisticated tools well tuned for specific purposes. This intensive exploration of various
simulation techniques comes from our difficulties when trying to understand the earth interior
from propagation, diffusion, or even potential fields. Challenges come from

• the different types of data we handle: for instance, seismic compressional waves in explo-
ration geophysics for structural images, trapped and surface waves in seismology;

• the various types of media we have to consider: marine environments with a liquid/solid
interface, sedimentary basins with very low velocity shallow structures, foothill complex
zones with velocity inversion, complex topography, resistivity variations of several orders
of magnitude;

• the lack of a precise knowledge of the geological structures;

• the modeling scale: a wave can be recorded after having propagated over hundreds of
wavelengths. In exploration, the depth of investigation is several kilometers with a reso-
lution of ten to hundred meters, while, in global seismology, the investigation zone is in
hundreds of kilometers and the resolution in kilometers;

• the computational cost especially when the modelling represents just the kernel of a
parameter inversion.

The techniques used for the forward modeling vary and include volumetric methods such
as finite-element methods (Marfurt, 1984; Min et al., 2003), finite-difference methods (Virieux,

14
1.2 PDE discretisation

1986; Levander, 1988), finite-volume methods (Brossier et al., 2008), and pseudo-spectral meth-
ods (Danecek and Seriani, 2008). Other alternatives based on interface/boundary discretization
are quite useful both for efficiency and validation such as boundary integral methods promoted
in seismology as reflectivity methods (Kennett, 1983) or generalized screen methods (Wu, 2003),
discrete wavenumber methods (Bouchon et al., 1989). Exact theory such as the full wave the-
ory (de Hoop, 1960) makes the link with asymptotic methods such as generalized ray methods
with the computation of WKBJ & Maslov & Gaussian seismograms (Popov, 1982; Chapman,
1985) including the extension of geometrical diffraction theory (Keller, 1962; Klem-Musatov
and Aizenberg, 1985).
We shall introduce partial differential equations both in the frequency and time domains
as well as the related linear aglebra expressions related to the discretization of such equations.
The 3D acoustic wave equation in the frequency domain is considered as the first system to be
solved using a finite difference scheme with a compact support for the finite difference stencil.
Two synthetic examples will highlight performances of such numerical tool. Then, we consider
a specific class of finite element methods, known as the Discontinuous Galerkin method which
presents nice features for seismic modeling in heterogeneous media with a complex topography.
We shall develop the frequency formulation in 2D geometries as the suitable method for seismic
imaging while still affordable from the point of view of computer resources. Then, we consider
the 3D geometry in the time domain where simple explicit time stepping will be used for
computing transient or harmonic solutions. We shall expand the description of the finite
element formalism with a peculiar attention to absorbing boundary conditions.

1.2 PDE discretisation

In the regime of small deformations associated with seismic wave propagation at least away for
the earthquake zone, the subsurface can be represented by a linear elastic solid parameterized
by twenty-one elastic constants and the density in the framework of the constitutive Hooke’s
law. If the subsurface is assumed isotropic, the elastic constants reduced to two independent
parameters, the Lamé parameters, which depend on the compressional (P) and the shear (S)
wave speeds. In marine environment, the P wave speed has most of the time a dominant
footprint in the seismic wavefield, in particular, on the hydrophone component which records
the pressure wavefield. The dominant footprint of the P wave speed on the seismic wavefield
has prompted many authors to develop and apply seismic modeling and inversion under the
acoustic approximation, either in the time domain or in the frequency domain.
Let us first introduce the notations for the forward problem; namely, the modeling of the
full seismic wavefield. The reader is referred to Robertsson et al. (2007) for an up-to-date series
of publications on modern seismic modeling methods.
We use matrix notations to denote the partial differential operators of the wave equation
(Marfurt, 1984; Carcione et al., 2002). The most popular direct method to discretize the wave
equation in the time and frequency domains is the finite-difference method (Virieux, 1986;
Levander, 1988; Graves, 1996; Operto et al., 2007), although more sophisticated finite-element
or finite-volume approaches can be considered. This is especially so when accurate boundary
conditions through unstructured meshes must be implemented (e.g., Komatitsch and Vilotte,
1998; Dumbser et al., 2007a).

15
FORWARD MODELING

In the time domain, the elastodynamics equations are expressed as a system of second-order
hyperbolic of equations expressed in a compact form as

d2 u(x, t)
M(x) = S(x)u(x, t) + s(x, t), (1.1)
dt2
where M and S are the mass and the stiffness matrices (Marfurt, 1984). The source term
is denoted by s and the seismic wavefield by u. In the acoustic approximation, u generally
represents pressure, while in the elastic case, u generally represents horizontal and vertical
particle velocities. The time is denoted by t and the spatial coordinates by x. Equation (1.1) is
generally solved with an explicit time-marching algorithm: the value of the wavefield at a time
step (n + 1) at a spatial position is inferred from the value of the wavefields at previous time
steps (Dablain, 1986; Tal-Ezer et al., 1990). Implicit time-marching algorithms are avoided
as they require solving a linear system (Marfurt, 1984; Mufti, 1985). If both velocity and
stress wavefields are helpful, the system of second-order equations can be recast as a first-order
hyperbolic velocity-stress system by incorporating the necessary auxiliary variables (Virieux,
1986). The time-marching approach could gain in efficiency if one consider local time steps
related to the coarsity of the spatial grid (Titarev and Toro, 2002): this leads to a quite
challenging load balancing program between processors when doing parallel programming as
most processors are waiting for the one which is doing the maximum of number crunching.
Adapting the distribution of the number of nodes to each processor depending on the expected
complexity of mathematical operations is still an open problem as far as we know.
In the frequency domain, the wave equation reduces to a system of linear equations, the
right-hand side of which is the source, and the solution of which is the seismic wavefield. This
system can be written compactly as

B(x, ω)u(x, ω) = s(x, ω), (1.2)

where B is the so-called impedance matrix (Marfurt, 1984). The sparse complex-valued ma-
trix B has a symmetric pattern, although is not symmetric because of absorbing boundary
conditions (Hustedt et al., 2004; Operto et al., 2007).
Solving the system of equations (1.2) can be performed through a decomposition of the
matrix B, such as lower and upper (LU) triangular decomposition, which leads to the so-called
direct-solver techniques. The advantage of the direct-solver approach is that, once the decom-
position is performed, equation (1.2) is efficiently solved for multiple sources using forward and
backward substitutions (Marfurt, 1984). This approach has been shown to be efficient for 2D
forward problems (Jo et al., 1996; Stekl and Pratt, 1998; Hustedt et al., 2004). However, the
time and memory complexities of the LU factorization, and its limited scalability on large-scale
distributed memory platforms, prevents the use of the direct-solver approach for large-scale 3D
problems (i.e., problems involving more than ten millions of unknowns) (Operto et al., 2007).
Iterative solvers provide an alternative approach for solving the time-harmonic wave equa-
tion (Riyanti et al., 2006, 2007; Plessix, 2007; Erlangga and Herrmann, 2008). Iterative solvers
are currently implemented with Krylov-subspace methods (Saad, 2003) that are preconditioned
by the solution of the dampened time-harmonic wave equation. The solution of the dampened
wave equation is computed with one cycle of a multigrid. The main advantage of the iter-
ative approach is the low memory requirement, while the main drawback results from the
difficulty to design an efficient preconditioner, because the impedance matrix is indefinite. To

16
1.2 PDE discretisation

our knowledge, the extension to elastic wave equations still needs to be investigated. As for the
time-domain approach, the time complexity of the iterative approach increases linearly with
the number of sources, in contrast to the direct-solver approach.
An intermediate approach between the direct and the iterative methods consists of a hybrid
direct-iterative approach that is based on a domain decomposition method and the Schur
complement system (Saad, 2003; Sourbier et al., 2008): the iterative solver is used to solve the
reduced Schur complement system, the solution of which is the wavefield at interface nodes
between subdomains. The direct solver is used to factorize local impedance matrices that are
assembled on each subdomain. Briefly, the hybrid approach provides a compromise in terms
of memory saving and multi-source-simulation efficiency between the direct and the iterative
approaches.
The last possible approach to compute monochromatic wavefields is to perform the model-
ing in the time domain and extract the frequency-domain solution, either by discrete Fourier
transform in the loop over the time steps (Sirgue et al., 2008) or by phase-sensitivity detection
once the steady-state regime has been reached (Nihei and Li, 2007). One advantage of the
approach based on the discrete Fourier transform is that an arbitrary number of frequencies
can be extracted within the loop over time steps at a minimal extra cost. Secondly, time
windowing can easily be applied, which is not the case when the modeling is performed in the
frequency domain. Time windowing allows the extraction of specific arrivals for FWI (early
arrivals, reflections, PS converted waves), which is often useful to mitigate the nonlinearity of
the inversion by judicious data preconditioning (Sears et al., 2008; Brossier et al., 2009a).
Among all of these possible approaches, the iterative-solver approach has theoretically the
best time complexity (here, complexity denotes how the computational cost of an algorithm
grows with the size of the computational domain) if the number of iterations is independent of
the frequency (Erlangga and Herrmann, 2008). In practice, the number of iterations generally
increases linearly with frequency. In this case, the time complexity of the time-domain approach
and the iterative-solver approach are equivalent (Plessix, 2007).
The reader is referred to Plessix (2007), Virieux et al. (2009) and Plessix (2009) for more
detailed complexity analysis of seismic modeling based on different numerical approaches. A
discussion on the pros and cons of time-domain versus frequency-domain seismic modeling
relating to what it is required for full waveform inversion is also provided in Vigh and Starr
(2008) and Warner et al. (2008).
Source implementation is an important issue in FWI. The spatial reciprocity of Green’s
functions can be exploited in FWI to mitigate the number of forward problems if the number
of receivers is significantly smaller than the number of sources (Aki and Richards, 1980). The
reciprocity of Green’s functions also allows the matching of data emitted by explosions and
recorded by directional sensors, with pressure synthetics computed for directional forces (Op-
erto et al., 2006). Of note, the spatial reciprocity is theoretically satisfied for unidirectional
sensor and unidirectional impulse source. However, the spatial reciprocity of Green’s functions
can also be used for explosive sources by virtue of the superposition principle. Indeed, explo-
sions can be represented by double dipoles, or in other word, by four unidirectional impulse
sources. Therefore, it is crucial to implement correctly the source excitation.
A final comment here is that the discretisation required for solving the forward problem
has no direct relation with the discretisation used for reconstruction of the physical parameters

17
FORWARD MODELING

during FWI. Often, these two discretisations are identical, although it is recommended that
the fingerprint of the forward problem should be kept minimal in FWI.
The properties of the subsurface which we want to quantify are embedded in the coefficients
of matrices M, S or B of equations (1.1) and (1.2). The relationship between the seismic
wavefield and the parameters is nonlinear, and it can be written compactly through the operator
G, defined as
u = G (m) (1.3)
either in the time domain or in the frequency domain.
We shall now consider the 3D acoustic wave equation which is intensively used as an ef-
ficient forward modeling in many industrial applications for seismic imaging (migration and
full waveform inversion, especially using a finite difference approach. 2D and 3D finite element
approaches will be considered later as less straigthforward.

1.3 Frequency-domain acoustic wave equation

Following standard Fourier transformation convention, the 3D acoustic first-order velocity-

pressure system can be written in the frequency domain as

∂vx (x, y, z, ω) ∂vy (x, y, z, ω) ∂vz (x, y, z, ω)
− iωp(x, y, z, ω) = κ(x, y, z) + +
∂x ∂x ∂z
∂p(x, y, z, ω)
−iωvx (x, y, z, ω) = b(x, y, z) + fx (x, y, z, ω)
∂x
∂p(x, y, z, ω)
−iωvy (x, y, z, ω) = b(x, y, z) + fy (x, y, z, ω)
∂y
∂p(x, y, z, ω)
−iωvz (x, y, z, ω) = b(x, y, z) + fz (x, y, z, ω), (1.4)
∂z
where ω is the angular frequency, κ(x, y, z) is the bulk modulus, b(x, y, z) is the buoyancy,
p(x, y, z, ω) is the pressure, vx (x, y, z, ω), vy (x, y, z, ω), vz (x, y, z, ω) are the components of the
particle velocity vector. fx (x, y, z, ω), fy (x, y, z, ω), fz (x, y, z, ω) are the components of the
external forces. The first block row of equation (1.4) is the time derivative of the Hooke’s law,
while the three last block rows are the equation of motion in the frequency domain.
The first-order system can be recast as a second-order equation in pressure after elimination
of the particle velocities in equation (1.4), that leads to a generalization of the Helmholtz
equation given by the following expression,

ω2 ∂ ∂p(x, ω) ∂ ∂p(x, ω) ∂ ∂p(x, ω)

p(x, ω) + b(x) + b(x) + b(x) = s(x, ω), (1.5)
κ(x) ∂x ∂x ∂y ∂y ∂z ∂z

where x = (x, y, z) and s(x, ω) = ∇ · f denotes the pressure source. In exploration seismology,
the source is generally a local point source corresponding to an explosion or a vertical force.
Attenuation effects of arbitrary complexity can be easily implemented in equations (1.4)
and (1.5) using complex-valued wave speeds in the expression of the bulk modulus, thanks
to the correspondence theorem transforming time convolution into products in the frequency

18
1.3 Frequency-domain acoustic wave equation

domain. For example, according to the Kolsky-Futterman model (Kolsky, 1956; Futterman,
1962), the complex wave speed c̄ is given by

sgn(ω) −1

1
c̄ = c 1+ |log(ω/ωr )| + i , (1.6)
πQ 2Q

where the P wave speed is denoted by c, the attenuation factor by Q and a reference frequency
by ωr .
Since the relationship between the wavefields and the source terms is linear in the first-
order and second-order wave equations, one can explicitely expressed the matrix structure of
equations (1.4) and (1.5) through the compact expression,

[M + S] u = Bu = s, (1.7)

where M is the mass matrix, S is the complex stiffness/damping matrix. The dimension of
the square matrix B is the number of nodes in the computational domain multiplied by the
number of wavefield components. In this study, we shall solve equation (1.7) using a sparse
direct solver. A direct solver performs first a LU decomposition of B followed by forward and
backward substitutions for the solutions (Duff et al., 1986) as shown by the following equations:

Bu = (LU) u = s (1.8)
Ly = s; Uu = y (1.9)

Exploration seismology requires to perform seismic modeling for a large number of sources,
typically, up to few thousands for 3D acquisition. Therefore, our motivation behind the use
of direct solver is the efficient computation of the solutions of the equation (1.7) for multiple
sources. The LU decomposition of B is a time and memory demanding task but is independent
of the source, and, therefore is performed only once, while the substitution phase provide the
solution for multiple sources efficiently. One bottleneck of the direct-solver approach is the
memory requirement of the LU decomposition resulting from the fill-in, namely, the creation
of additional non-zero coefficients during the elimination process. This fill-in can be minimized
by designing compact numerical stencils that allow for the minimization of the numerical
bandwidth of the impedance matrix.

1.3.1 Mixed-grid Finite-difference method

In FD methods, high-order accurate stencils are generally designed to achieve the best trade-
off between accuracy and computational efficiency (Dablain, 1986). However, direct-solver
methods prevent the use of high-order accurate stencils because the large spatial support of
the stencil will lead to a prohibitive fill-in of the matrix during the LU decomposition (Hustedt
et al., 2004). Alternatively, the mixed-grid method was proposed by Jo et al. (1996) to design
both accurate and compact FD stencils. The governing idea is to discretize the differential
operators of the stiffness matrix with different second-order accurate stencils and to linearly
combine the resulting stiffness matrices with appropriate weighting coefficients. The different
stencils are built by discretizing the differential operators along different rotated coordinate
systems (x̄, ȳ, z̄) such that their axes spans as many directions as possible in the FD cell

19
FORWARD MODELING

to mitigate numerical anisotropy. In practice, this means that the partial derivatives with
respect to x, y and z in equations (1.4) or (1.5) are replaced by a linear combination of partial
derivatives with respect to x̄, ȳ and z̄ using the chain rule followed by the discretisation of the
differential operators along the axis x̄, ȳ and z̄

1.3.1.1 Discretisation of the differential operators

In 2D geometry, the coordinate systems are the classic Cartesian one and the 45o -rotated one
(Saenger et al., 2000) which lead to the 9-point stencil (Jo et al., 1996). In 3D, three coordinate
systems have been identified (Operto et al., 2007) (Figure 1.1): [1] the Cartesian one which
leads to the 7-point stencil, [2] three coordinate systems obtained by rotating the Cartesian
system around each Cartesian axis x, y, and z. Averaging of the three elementary stencils leads
to a 19-point stencil. [3] four coordinate systems defined by the four main diagonals of the
cubic cell. Averaging of the four elementary stencils leads to the 27-point stencil. The stiffness
matrix associated with the 7-point stencil, the 19-point stencil and the 27-point stencil will be
denoted by S1 , S2 , S3 , respectively.
The mixed-grid stiffness matrix Smg is a linear combination of the stiffness matrices we
have just mentioned previously expressed as

w2 w3
Smg = w1 S1 + S2 + S3 , (1.10)
3 4

where we have introduced the weighting coefficients w1 , w2 and w3 . These coefficients satisfy
the following identity,
w1 + w2 + w3 = 1. (1.11)

In the original mixed-grid approach (Jo et al., 1996), the discretisation on the different
coordinate systems was directly applied to the second-order wave equation, equation (1.5),
with the second-order accurate stencil of Boore (1972). Alternatively, Hustedt et al. (2004)
have proposed to discretize first the first-order velocity-pressure system, equation (1.4), with
second-order staggered-grid stencils (Yee, 1966; Virieux, 1986; Saenger et al., 2000) and, sec-
ond, to eliminate the auxiliary wavefields (i.e., the velocity wavefields) following a parsimonious
staggered-grid method originally developed in the time domain (Luo and Schuster, 1990). The
parsimonious staggered-grid strategy allows us to minimize the number of wavefield compo-
nents involved in the equation (1.7), and therefore to minimize the size of the system to be
solved while taking advantage of the flexibility of the staggered-grid method to discretize first-
order difference operators. The parsimonious mixed-grid approach originally proposed in the
frequency domain by Hustedt et al. (2004) for the 2D acoustic wave equation was extended to
the 3D wave equation by Operto et al. (2007) and to a 2D pseudo-acoustic wave equation for
transversely isotropic media with tilted symmetry axis by Operto et al. (2009). The staggered-
grid method requires interpolation of the buoyancy in the middle of the FD cell which should
be performed by volume harmonic averaging (Moczo et al., 2002).
The pattern of the impedance matrix inferred from the 3D mixed-grid stencil is shown in
Figure 1.2. The bandwidth of the matrix is of the order of N 2 (N denotes the dimension of a
3D cubic N 3 domain) and was kept minimal thanks to the use of low-order accurate stencils.

20
1.3 Frequency-domain acoustic wave equation

Figure 1.1: Elementary FD stencils of the 3D mixed-grid stencil. Circles are pressure grid
points. Squares are positions where buoyancy needs to be interpolated in virtue of the
staggered-grid geometry. Gray circles are pressure grid points involved in the stencil. a)
Stencil on the classic Cartesian coordinate system. This stencil incorporates 7 coefficients. b)
Stencil on the rotated Cartesian coordinate system. Rotation is applied around x on the figure.
This stencil incorporates 11 coefficients. Same strategy can be applied by rotation around y
and z. Averaging of the 3 resultant stencils defines a 19-coefficient stencil. c) Stencil obtained
from 4 coordinate systems, each of them being associated with 3 main diagonals of a cubic cell.
This stencil incorporates 27 coefficients (Operto et al., 2007).

Figure 1.2: Pattern of the square impedance matrix discretized with the 27-point mixed-grid
stencil (Operto et al., 2007). The matrix is band diagonal with fringes. The bandwidth is
O(2N1 N2 ) where N1 and N2 are the two smallest dimensions of the 3D grid. The number of
rows/columns in the matrix is N1 × N2 × N3 . In the figure, N1 = N2 = N3 = 8

21
FORWARD MODELING

1.3.1.2 Anti-lumped mass

The linear combination of the rotated stencils in the mixed-grid approach is complemented by
the distribution of the mass term ω 2 /κ in the equation (1.5) over the different nodes of the
mixed-grid stencil to mitigate the numerical dispersion, leading to the following expression for
the mass matrix,
ω2 hpi hpi hpi hpi
p000 =⇒ ω 2 wm1 + wm2 + wm3 + wm4 , (1.12)
κ000 κ 0 κ 1 κ 2 κ 3
where the following equality,
wm2 wm3 wm4
wm1 + + + = 1, (1.13)
6 12 8
should be verified. In equation (1.12), the different nodes of the 27-point stencils are labelled
by indices lmn where l, m, n ∈ {−1, 0, 1} and 000 denotes the grid point in the middle of the
stencil. We used the notations
p
= p000 ,
κp 0 κp100
000
= + κp010 + κp001 + κp−100 + κp0−10 + κp00−1 ,
κp 1 κp110
100 010 001 −100 0−10 00−1
= 110 + κ011 + κ101 + κ−110 + κ0−11 + κ−101 + κp1−10
p011 p101 p−110 p0−11 p−101
+ κp01−1 + κp10−1 + κp−1−10 + κp0−1−1 + p−10−1
κ−10−1 ,
κp 2 κp111 p−1−1−1 p−111 p1−11 p11−1
1−10
p−1−11
01−1
p1−1−1
10−1
p−11−1
−1−10 0−1−1

κ 3 = κ111 + κ−1−1−1 + κ−111 + κ1−11 + κ11−1 + κ−1−11 + κ1−1−1 + κ−11−1 .

This anti-lumped mass strategy is opposite to the mass lumping technique used in finite
element methods to make the mass matrix diagonal when one wants to make a explicit time
integration. The anti-lumped mass approach, combined with the averaging of the rotated
stencils, allows us to minimize efficiently the numerical dispersion and to achieve an accuracy
representative of 4th -order accurate stencil from a linear combination of second-order accurate
stencils in the frequency formulation. The anti-lumped mass strategy introduces four additional
weighting coefficients wm1 , wm2 , wm3 and wm4 , equations (1.12) and (1.13). The coefficients
w1 , w2 , w3 , wm1 , wm2 , wm3 and wm4 are determined by minimization of the phase-velocity
dispersion in infinite homogeneous medium. Alternatives FD methods for designing optimized
FD stencils can be found in Holberg (1987); Takeuchi and Geller (2000).

1.3.2 Numerical dispersion and anisotropy

The dispersion analysis of the 3D mixed-grid stencil has been already developed in details in
Operto et al. (2007). We focus here on the sensitivity of the accuracy of the mixed-grid stencil
to the choice of the weighting coefficients w1 , w2 , w3 , wm1 , wm2 , wm3 . We aim to design
an accurate stencil for a discretisation criterion of 4 grid points per minimum propagated
wavelength. This criterion is driven by the spatial resolution of full waveform inversion, which
is half a wavelength. To properly sample subsurface heterogeneities, the size of which is half a
wavelength, four grid points per wavelength should be used according to Shannon theorem.
Inserting the discrete expression of a plane wave propagating in a 3D infinite homogeneous
medium of wave speed c and density equal to 1 in the wave equation discretized with the
mixed-grid stencil gives for the normalized phase velocity (Operto et al., 2007),
r
G w2 2w3
ṽph = √ w1 (3 − C) + (6 − C − B) + (3 − 3A + B − C), (1.14)
2Jπ 3 4

22
1.3 Frequency-domain acoustic wave equation

Table 1.1: Coefficients of the mixed-grid stencil as a function of the discretisation criterion Gm
for the minimization of the phase velocity dispersion.

Gm 4,6,8,10 4 8 10 20 40
wm1 0.4966390 0.5915900 0.5750648 0.7489436 0.7948160 0.6244839
wm2 7.51233E-02 4.96534E-02 5.76759E-02 1.39044E-02 3.71392E-03 5.06646E-02
wm3 4.38464E-03 5.10851E-03 5.56914E-03 6.38921E-03 5.54043E-03 1.42369E-03
wm4 6.76140E-07 6.14837E-03 1.50627E-03 1.13699E-02 1.45519E-02 6.8055E-03
w1 5.02480E-05 8.8075E-02 0.133953 0.163825 0.546804 0.479173
w2 0.8900359 0.8266806 0.7772883 0.7665769 0.1784437 0.2779923
w3 0.1099138 8.524394E-02 8.87589E-02 6.95979E-02 0.2747527 0.2428351

where J = (wm1 + 2wm2 C + 4wm3 B + 8wm4 A) with following quantities,

A = cos a cos b cos c,
B = cos a cos b + cos a cos c + cos b cos c,
C = cos a + cos b + cos c.

and a = 2π 2π 2π
G cos φ cos θ; b = G cos φ sin θ; c = G sin φ. Here, the normalized phase velocity is
the ratio between the numerical phase velocity ω/k and the wave speed c. G = λh = kh 2π
is
the number of grid points per wavelength λ. φ and θ are the incidence angles of the plane
wave. We look for the 5 independent parameters wm1 , wm2 , wm3 , w1 , w2 which minimize the
least-squares norm of the misfit (1. − ṽph ). The two remaining weighting coefficients wm4 and
w3 are inferred from equations (1.11) and (1.13), respectively. We estimated these coefficients
by a global optimization procedure based on a Very Fast Simulating Annealing algorithm (Sen
and Stoffa, 1995). We minimize the cost function for 5 angles φ and θ spanning between 0 and
45o and for different values of G.
In the following, the number of grid points for which phase velocity dispersion is minimized
will be denoted by Gm . The values of the weighting coefficients as a function of Gm are given in
Table 1.1. For high values of Gm , the Cartesian stencil has a dominant contribution (highlighted
by the value of w1 ), while the first rotated stencil has the dominant contribution for low values
of Gm as shown by the value of w2 . The dominant contribution of the Cartesian stencil for large
values of Gm is consistent with the fact that it has a smaller spatial support (i.e., 2×h) than the
rotated stencils and a good accuracy for G greater than 10 (Virieux, 1986). The error on the
phase velocity is plotted in polar coordinates for four values of G (4, 6, 8, 10) and for Gm = 4
in Figure 1.3a. We first show that the phase velocity dispersion is negligible for G = 4, that
shows the efficiency of the optimization. However, more significant error (0.4 %) is obtained for
intermediate values of G (for example, G = 6 in Figure 1.3a). This highlights the fact that the
weighting coefficients were optimally designed to minimize the dispersion for one grid interval
in homogeneous media. We show also the good isotropy properties of the stencil, shown by
the rather constant phase-velocity error whatever the direction of propagation. The significant
phase-velocity error for values of G greater than Gm prompt us to simultaneously minimize the
phase-velocity dispersion for four values of G: Gm =4,6,8,10 (Figure 1.3b). We show that the
phase-velocity error is now more uniform over the values of G and that the maximum phase-
velocity-error was reduced (0.25 % against 0.4 %). However, the nice isotropic property of the

23
FORWARD MODELING

Figure 1.3: Phase-velocity dispersion shown in spherical coordinates for four values of G. (a)
The phase-velocity dispersion was minimized for G = 4. (b) the phase-velocity dispersion was
minimized for 4 values of G: 4, 6, 8 and 10.

mixed-grid stencil was degraded and the phase-velocity dispersion was significantly increased
for G = 4. We conclude that the range of wavelengths propagated in a given medium should
drive the discretisation criterion used to infer the weighting coefficient of the mixed grid stencil
and that a suitable trade-off should be found between the need to manage the heterogeneity
of the medium and the need to minimize the error for a particular wavelength. Of note, an
optimal strategy might consist of adapting locally the values of the weighting coefficients to
the local wave speed during the assembling of the impedance matrix. This strategy is not
investigated yet.
Comparison between numerical and analytical pressure monochromatic wavefields com-
puted in a homogeneous medium of wave speed 1.5 km/s and density 1000 kg/m3 confirm the
former theoretical analysis (Figure 1.3). The frequency is 3.75 Hz corresponding to a propa-
gated wavelength of 400 m. The grid interval for the simulation is 100 m corresponding to G
= 4. Simulations were performed when the weighting coefficients of the mixed-grid stencils are
computed for Gm = 4 and Gm = {4, 6, 8, 10}. The best agreement is obtained for the weighting
coefficients associated with Gm = 4 as expected from the dispersion analysis.

1.3.3 Boundary conditions

In seismic exploration, two boundary conditions are implemented for wave modeling: absorbing
boundary conditions to mimic an infinite medium and free surface conditions on the top side
of the computational domain to represent the air-solid or air-water interfaces.

1.3.3.1 PML absorbing boundary conditions

We use Perfectly-Matched Layers (PML) absorbing boundary conditions (Berenger, 1994) to

mimic an infinite medium. In the frequency domain, the implementation of PMLs consists of

24
1.3 Frequency-domain acoustic wave equation

Figure 1.4: (a) Real part of a 3.75-Hz monochromatic wavefield computed with the mixed-
grid stencil in a 3D infinite homogeneous medium. The explosive point source is at x=2 km,
y=1 km, z=2 km. (b-c) Comparison between the analytical (gray) and the numerical solution
(black) for a receiver line oriented in the Y direction across the source position. The thin
black line is the difference. The amplitudes were corrected for 3D geometrical spreading. (b)
Gm = 4, 6, 8, 10. (c) Gm = 4.

applying in the wave equation a new system of complex-valued coordinates x̃ defined by (e.g.,
Chew and Weedon, 1994):
∂ 1. ∂
= . (1.15)
∂ x̃ ξx (x) ∂x
In the PML layers, the damped wave equation writes as

ω2

1. ∂ b(x) ∂ 1. ∂ b(x) ∂ 1. ∂ b(x) ∂
+ + + p(x, ω) = −s(x, ω),
κ(x) ξx (x) ∂x ξx (x) ∂x ξy (y) ∂y ξy (y) ∂y ξz (z) ∂z ξz (z) ∂z
(1.16)
where ξx (x) = 1 + iγx (x)/ω and γx (x) is a 1D damping function which defines the PML
damping behavior in the PML layers. These functions differ from zero only inside the PML
layers. In the PML layers, we used γ(x) = cpml 1 − cos( π2 L−x L ) where L denotes the width
of the PML layer and x is a local coordinate in the PML layer whose origin is located at the
outer edges of the model. The scalar cpml is defined by trial and error depending on the width
of the PML layer. The procedure to derive the unsplitted second-order wave equation with
PML conditions, equation (1.16), from the first-order damped wave equation is given in Operto
et al. (2007).
The absorption of the PML layers at grazing incidence can be improved by using convolu-
tional PML (C-PML) (Kuzuoglu and Mittra, 1996; Roden and Gedney, 2000; Komatitsch and
Martin, 2007). In the C-PML layers, the damping function ξx (x) becomes

dx
ξx (x) = κx + i , (1.17)
αx + iω

where dx and αx are generally quadratic and linear functions, respectively. Suitable expression
for κx , dx and αx are discussed in Kuzuoglu and Mittra (1996); Collino and Monk (1998); Roden
and Gedney (2000); Collino and Tsogka (2001); Komatitsch and Martin (2007); Drossaert and
Giannopoulos (2007).

25
FORWARD MODELING

1.3.3.2 Free surface boundary conditions

Planar free surface boundary conditions can be simply implemented in the frequency domain
with two approaches. In the first approach, the free surface matches the top side of the FD
grid and the pressure is forced to zero on the free surface by using a diagonal impedance
matrix for rows associated with collocation grid points located on the top side of the FD grid.
Alternatively, the method of image can be used to implement the free surface along a virtual
plane located half a grid interval above the topside of the FD grid (Virieux, 1986). The pressure
is forced to vanish at the free surface by using a ficticious plane located half a grid interval
above the free surface where the pressure is forced to have opposite values to that located just
below the free surface.
From a computer implementation point of view, an impedance matrix is typically built row
per row. One row of the linear system can be written as
X X X
ai1 i2 i3 pi1 i2 i3 = s0 0 0 (1.18)
i3 =−1,1 i2 =−1,1 i1 =−1,1

where ai1 i2 i3 are the coefficients of the 27-point mixed grid stencil and 0 0 0 denote the indices
of the collocation coefficient located in the middle of the stencil in a local coordinate system.
The free surface boundary conditions writes as

p−1 i2 i3 = −p0 i2 i3 , (1.19)

for i2 = {−1, 0, 1} and i3 = {−1, 0, 1}. The indices i1 =-1 and i1 =0 denotes here the grid
points just above and below the free surface, respectively. For a grid point located on the top
side of the computational domain (i.e., half a grid interval below free surface), equation (1.18)
becomes
X X X X
a 1 i2 i3 p 1 i 2 i3 + (a0 i2 i3 − a−1 i2 i3 ) p0 i2 i3 = s0 0 0 , (1.20)
i3 =−1,1 i2 =−1,1 i3 =−1,1 i2 =−1,1

where p−1 i2 i3 has been replaced by the opposite value of p0 i2 i3 according to the equation
(1.18).
Our practical experience is that both implementation of free surface boundary conditions
give results of comparable accuracy. Of note, rigid boundary conditions (zero displacement
perpendicular to the boundary) or periodic boundary conditions (Ben Hadj Ali et al., 2008)
can be easily implemented with the method of image following the same principle than for the
free surface condition.

1.3.4 Source and receiver implementation on coarse grids

Seismic imaging by full waveform inversion is initiated at frequency as small as possible to

mitigate the non linearity of the inverse problem resulting from the use of local optimization
approach such as gradient methods. The starting frequency for modeling can be as small as 2 Hz
which can lead to grid intervals as large as 200 m. In this framework, accurate implementation
of point source at arbitrary position in a coarse grid is critical. One method has been proposed

26
1.3 Frequency-domain acoustic wave equation

by Hicks (2002) where the point source is approximated by a windowed Sinc function. The
Sinc function is defined by
sin(πx)
sinc(x) = , (1.21)
πx
where x = (xg − xs ), xg denotes the position of the grid nodes and xs denotes the position of
the source. The Sinc function is tapered with a Kaiser function to limit its spatial support. For
multidimensional simulations, the interpolation function is built by tensor product construction
of 1D windowed Sinc functions. If the source positions matches the position of one grid node, the
Sinc function reduces to a Dirac function at the source position and no approximation is used
for the source positioning. If the spatial support of the Sinc function intersects a free surface,
part of the Sinc function located above the free surface is mirrored into the computational
domain with a reverse sign following the method of image. Vertical force can be implemented
in a straightforward way by replacing the Sinc function by its vertical derivative. The same
interpolation function can be used for the extraction of the pressure wavefield at arbitrary
receiver positions. The accuracy of the method of Hicks (2002) is illustrated in Figure 1.5
which shows a 3.5-Hz monochromatic wavefield computed in a homogeneous half space. The
wave speed is 1.5 km/s and the density is 1000 kg/m3 . The grid interval is 100 m. The free
surface is half a grid interval above the top of the FD grid and the method of image is used to
implement the free surface boundary condition. The source is in the middle of the FD cell at
2 km depth. The receiver line is oriented in the Y direction. Receivers are in the middle of the
FD cell in the horizontal plane and at a depth of 6 m just below the free surface. This setting
is representative of a ocean bottom survey where the receiver is on the sea floor and the source
is just below the sea surface (in virtue of the spatial reciprocity of the Green functions, sources
are processed here as receivers and vice versa). Comparison between the numerical and the
analytical solutions at the receiver positions are first shown when the source is positioned at
the closest grid point and the numerical solutions are extracted at the closest grid point (Figure
1.5b). The amplitude of the numerical solution is strongly overestimated because the numerical
solution is extracted at a depth of 50 m below free surface (where the pressure vanishes) instead
of 6 m. Second, a significant phase shift between numerical and analytical solutions results
from the approximate positioning of the sources and receivers. In contrast, a good agreement
between the numerical and analytical solutions both in terms of amplitude and phase is shown
in Figure 1.5c where the source and receiver positioning was implemented with the windowed
Sinc interpolation.

1.3.5 Resolution with the sparse direct solver MUMPS

To solve the sparse system of linear equations, equation (1.7), we use the massively parallel
direct MUMPS solver designed for distributed memory platforms. The reader is referred to
Guermouche et al. (2003); Amestoy et al. (2006); MUMPS-team (2009) for an extensive de-
scription of the method and their underlying algorithmic aspects. The MUMPS solver is based
on a multifrontal method (Duff et al., 1986; Duff and Reid, 1983; Liu, 1992), where the res-
olution of the linear system is subdivided into three main tasks. The first one is an analysis
phase or symbolic factorization. Reordering of the matrix coefficients is first performed in or-
der to minimize fill-ins. We used the METIS algorithm which is based on a hybrid multilevel
nested-dissection and multiple minimum degree algorithm (Karypis and Kumar, 1999). Then,
the dependency graph which describes the order in which the matrix can be factored is esti-
mated as well as the memory required to perform the subsequent numerical factorization. The

27
FORWARD MODELING

Figure 1.5: a) Real part of a 3.75-Hz monochromatic wavefield in a homogeneous half space.
(b) Comparison between numerical (black) and analytical (gray) solutions at receiver positions.
The Sinc interpolation with 4 coefficients was used for both the source implementation and the
extraction of the solution at the receiver positions on a coarse FD grid.

second task is the numerical factorization. The third task is the solution phase performed by
forward and backward substitutions. During the solution phase, multiple-shot solutions can
be computed simultaneously from the LU factors taking advantage of threaded BLAS3 (Basic
Linear Algebra Subprograms) library and are either assembled on the host or kept distributed
on the processors for subsequent parallel computations.
We performed the factorization and the solutions phases in complex arithmetic single pre-
cision. To reduce the condition number of the matrix, a row and column scaling is applied in
MUMPS before factorization. The sparsity of the matrix and suitable equilibration have made
single precision factorization accurate enough so far for the 2D and 3D problems we tackled.
If single precision factorization would be considered not accurate enough for very large prob-
lems, an alternative approach to double precision factorization may be the postprocessing of
the solution by a simple and fast iterative refinement performed in double precision (Demmel
(1997), pages 60-61 and Langou et al. (2006); Kurzak and Dongarra (2006)).
The main two bottlenecks of sparse direct solver is the time and memory complexity and
the limited scalability of the LU decomposition. By complexity is meant the increase of the
computational cost (either in terms of elapsed time or memory) of an algorithm with the
size of the problem, while scalability describes the ability of a given algorithm to use an
increasing number of processors. The theoretical memory and time complexity of the LU
decomposition for a sparse matrix, the pattern of which is shown in Figure 1.2, is O(N 4 ) and
O(N 6 ), respectively, where N is the dimension of a 3D cubic grid N 3 .
We estimated the observed memory complexity and scalability of the LU factorization
by means of numerical experience. The simulations were performed on the SGI ALTIX ICE
supercomputer of the computer center CINES (France). Nodes are composed of two quad-
core INTEL processors E5472. Each node has 30 Gbytes of useful memory. We used two
MPI process per node and four threads per MPI process. In order to estimate the memory
complexity, we performed simulations on cubic models of increasing dimension with PML
absorbing boundary conditions along the 6 sides of the model. The medium is homogeneous
and the source is on the middle of the grid. Figure 1.6a shows the memory required to store
the complex-valued LU factors as a function of N . Normalization of this curve by the real
memory complexity will lead to a horizontal line. We found an observed memory complexity of
O(Log2 (N )N 3.9 ) (Figure 1.6b) which is consistent with the theoretical one. In order to assess

28
1.4 Numerical examples

Figure 1.6: (a-b) Memory complexity of LU factorization. (a) Memory in Gbytes required for
storage of LU factors. (b) Memory required for storage of LU factors divided by Log2 N.N 3.9 .
N denotes the dimension of a 3D N 3 grid. The largest simulation for N = 207 correspond to
8.87 millions of unknowns. (c-e) Scalability analysis of LU factorization. (c) Elapsed time for
LU factorization versus the number of MPI processes. (d) Speedup. (e) Efficiency.

the scalability of the LU factorization, we consider a computational FD grid of dimensions 177

x 107 x 62 corresponding to 1.17 millions of unknowns. The size of the grid corresponds to
a real subsurface target for oil exploration at low frequency (3.5 Hz). We computed a series
of LU factorization using an increasing number of processors Np , starting with Npref = 2.
The elapsed time of the LU factorization (TLU ) and the parallelism efficiency (TLU (Npref ) ×
Npref /TLU (Np ) × Np ) are shown in Figure 1.6(c-d). The efficiency drops rapidly as the number
of processors increased, down to a value of 0.5 for NP = 32 (Figure 1.6e). This clearly indicates
that the most suitable platform for sparse direct solver should be composed of a limited number
of nodes with a large amount of shared memory. The efficiency of the multi-r.h.s solution phase
can be significantly improved by using multithreaded BLAS3 library.

1.4 Numerical examples

We present acoustic wave modeling in two realistic 3D synthetic velocity models, the SEG/EAGE
overthrust and salt models, developed by the oil exploration community to assess seismic mod-
eling and imaging methods (Aminzadeh et al., 1997). The simulation was performed on the
SGI ALTIX ICE supercomputer just described.

1.4.1 3D EAGE/SEG overthrust model

The 3D SEG/EAGE Overthrust model is a constant density onshore acoustic model covering
an area of 20 km × 20 km × 4.65 km (Aminzadeh et al., 1997)(Figure 1.7a). From a geological
viewpoint, it represents a complex thrusted sedimentary succession constructed on top of a
structurally decoupled extensional and rift basement block. The overthrust model is discretized
with 25 m cubic cells, representing an uniform mesh of 801 × 801 × 187 nodes. The minimum
and maximum velocities in the Overthrust model are 2.2 km/s and 6.0 km/s respectively.

29
FORWARD MODELING

Figure 1.7: (a) Overthrust velocity model. (b-c) 7-Hz monochromatic wavefield (real part)
computed with the FDFD (b) and FDTD (c) methods.(d) Direct comparison between FDFD
(gray) and FDTD (black) solutions. The receiver line in the dip direction is: (top) at 0.15-km
depth and at 2.4 km in the cross direction. The amplitudes were corrected for 3D geometrical
spreading; (bottom) at 2.5-km depth and at 15 km in the cross direction.

We present the results of a simulation performed with the mixed-grid FD method (referred to
as FDFD in the following) for a frequency of 7 Hz and for a source located at x = 2.4 km,
y = 2.4 km and z = 0.15 km. The model was resampled with a grid interval of 75 m that
corresponds to four grid points per minimum wavelength. The size of the resampled FD grid
is 266 x 266 x 62. PML layers of 8 grid points were added along the 6 sides of the 3D FD
grid. This leads to 6.2 millions of pressure unknowns. For the simulation, we used the weights
of the mixed-grid stencil obtained for Gm = 4, 6, 8, 10. These weights provided slightly more
accurate results than the weights obtained for Gm = 4, in particular for waves recorded at
long source-receiver offsets. The 7-Hz monochromatic wavefield computed with the FDFD
method is compared with that computed with a classic O(∆t2 , ∆x4 ) staggered-grid FD time-
domain (FDTD) method where the monochromatic wavefield is integrated by discrete Fourier
transform within the loop over time steps (Sirgue et al., 2008) (Figure 6).

We used the same spatial FD grid for the FDTD and FDFD simulations. The simulation
length was 15 s in the FDTD modeling. We obtain a good agreement between the two solutions
(Figure 1.7d). The statistics of the FDFD and FDTD simulations are outlined in Table 1.2.
The FDFD simulation was performed on 32 MPI processes with 2 threads and 15 Gbytes
of memory per MPI process. The total memory required by the LU decomposition of the
impedance matrix was 260 Gbytes. The elapsed time for LU decomposition was 1822 s and the
elapsed time for one r.h.s was 0.97 s. Of note, we processed efficiently groups of 16 sources in
parallel during the solution step by taking advantage of the multi-rhs functionality of MUMPS
and the threaded BLAS3 library. The elapsed time for the FDTD simulation was 352 s on 4
processors. Of note, C-PML absorbing boundary conditions were implemented in the full model
during FDTD modeling to mimic attenuation effects implemented with memory variables. To
highlight the benefit of the direct-solver approach for multi-r.h.s simulation on a small number

30
1.4 Numerical examples

Table 1.2: Statistics of the simulation in the overthrust (top row) and in the salt (bottom row)
model. F (Hz): frequency; h(m): FD grid interval; nu : number of unknowns; MLU : memory
used for LU factorization in Gbytes; TLU : elapsed time for factorization; Ts : elapsed time for
one solution phase; Npf df d : number of MPI processors used for FDFD; NPf dtd : number of MPI
processors used for FDTD; Tf dtd : elapsed time for one FDTD simulation.

Model F (Hz) h(m) nu (106 ) MLU (Gb) TLU (s) Ts (s) Npf df d NPf dtd Tf dtd (s)
Over. 7 75 6.2 260 1822 0.97 32 4 352
Salt 7.34 50 8.18 402.5 2863 1.4 48 16 211

Table 1.3: Comparison between FDTD and FDFD modeling for 32 (left) and 2000 (right)
processors. The number of sources is 2000. P re. denotes the elapsed time for the source-
independent task during seismic modeling (i.e., the LU factorization in the FDFD approach).
Sol. denotes the elapsed time for multi-r.h.s solutions during seismic modeling (i.e., the sub-
stitutions in the FDFD approach).

Model Method Pre. (hr) Sol. (hr) Total (hr) Pre. (hr) Sol. (hr) Total (hr)
Over. FDTD 0 21.7 21.7 0 0.96 0.96
Over. FDFD 0.5 0.54 1.04 0.5 0.0134 0.51
Salt FDTD 0 39 39 0 0.94 0.94
Salt FDFD 0.8 0.78 1.58 0.80 0.016 0.816

of processors, we compare the performances of the FDFD and FDTD simulations for 2000
sources (Table 1.3). If the number of available processors is 32, the FDFD method is more
than one order of magnitude faster than the FDTD one thanks of the efficiency of the solution
step of the direct-solver approach. If the number of processors equals to the number of sources,
the most efficient parallelization of the FDTD method consists of assigning one source to one
processor and performing the FDTD simulation in sequential on each processor. For a large
number of processors, the cost of the FDFD method is dominated by the LU decomposition
(if the 2000 processors are splitted into groups of 32 processors, each group being assigned to
the processing of 2000/32 sources) and the computational cost of the two methods is of the
same order of magnitude. This schematic analysis highlights the benefit of the FDFD method
based on sparse direct solver to tackle efficiently problems involving few millions of unknowns
and few thousands of r.h.s on small distributed-memory platforms composed of nodes with a
large amount of shared memory.

1.4.1.1 3D EAGE/SEG salt model

The salt model is a constant density acoustic model covering an area of 13.5 km × 13.5 km ×
4.2 km (Aminzadeh et al., 1997)(Figure 1.8). The salt model is representative of a Gulf Coast
salt structure which contains salt sill, different faults, sand bodies and lenses. The salt model
is discretized with 20 m cubic cells, representing an uniform mesh of 676 x 676 x 210 nodes.
The minimum and maximum velocities in the Salt model are 1.5 and 4.482 km/s respectively.

31
FORWARD MODELING

Figure 1.8: (a) Salt velocity model. (b-c) 7.33-Hz monochromatic wavefield (real part) com-
puted with the FDFD (b) and the FDTD (c) methods. (d) Direct comparison between FDFD
(gray) and FDTD (black) solutions. The receiver line in the dip direction is: (top) at 0.1-km
depth and at 3.6 km in the cross direction. The amplitudes were corrected for 3D geometrical
spreading. (bottom) at 2.5-km depth and at 15 km in the cross direction.

We performed a simulation for a frequency of 7.34 Hz and for one source located at x =
3.6 km, y = 3.6 km and z = 0.1 km. The model was resampled with a grid interval of 50 m
corresponding to 4 grid points per minimum wavelength. The dimension of the resampled
grid is 270 x 270 x 84 which represents 8.18 millions of unknowns after addition of the PML
layers. We used the weights of the mixed-grid stencil inferred from Gm = 4, 6, 8, 10. Results of
simulations performed with the FDFD and FDTD methods are compared in Figure 1.8. The
length of the FDTD simulation is 15 s.

The statistics of the simulation are outlined in Table 1.2. We obtain a good agreement
between the two solutions (Figure 1.8d) although we show a small phase shift between the two
solutions at offsets greater than 5 km. This phase shift results from the propagation in the
high-velocity salt body. This phase shift is higher when the FDFD is performed with weights
inferred from Gm = 4. The direct-solver modeling was performed on 48 MPI process using 2
threads and 15 Gbytes of memory per MPI process. The memory and the elapsed time for
the LU decomposition were 402 Gbytes and 2863 s, respectively. The elapsed time for the
solution step for one r.h.s was 1.4 s when we process 16 rhs at a time during the solution step
in MUMPS. The elapsed time for one FDTD simulation on 16 processors was 211 s. As for the
overthrust model, the FDFD approach is more than one order of magnitude faster than the
FDTD one when a large number of r.h.s (2000) and a small number of processors (48) are used
(Table 1.3). For a number of processors equal to the number of r.h.s, the two approaches have
the same cost. Of note, in the latter configuration (NP =Nrhs ), the cost of the FDFD modeling
and of the FDTD modeling are almost equal in the case of the salt model (0.94 h versus 0.816
h) while the FDFD modeling was almost two times faster in the case of the smaller overthrust

32
1.5 Finite-element Discontinuous Galerkin Method

case study (0.96 h versus 0.51 h). This trend simply highlights the higher scalability of the
FDTD compared to the FDFD one.
Over the last decades, simulations of wave propagation in complex media have been effi-
ciently tackled with finite-difference methods (FDMs) and applied with success to numerous
physical problems (Graves, 1996; Moczo et al., 2007). Nevertheless, FDMs suffer from some
critical issues that are inherent to the underlying Cartesian grid, such as parasite diffractions
in cases where the boundaries have a complex topography. To reduce these artefacts, the
discretisation should be fine enough to reduce the ’stair-case’ effect at the free surface. For
instance, a second-order rotated FDM requires up to 60 grid points per wavelength to compute
an accurate seismic wavefield in elastic media with a complex topography (Bohlen and Saenger,
2006). Such constraints on the discretisation drastically restrict the possible field of realistic
applications. Some interesting combinations of FDMs and finite-element methods (FEMs)
might overcome these limitations (Galis et al., 2008). The idea is to use an unstructured FEM
scheme to represent both the topography and the shallow part of the medium, and to adopt for
the rest of the model a classical FDM regular grid. For the same reasons as the issues related
to the topography, uniform grids are not suitable for highly heterogeneous media, since the
grid size is determined by the shortest wavelength. Except in some circumstances, like mixing
grids (Aoi and Fujiwara, 1999) or using non uniform Cartesian grids (Pitarka, 1999) in the
case of a low velocity layer, it is almost impossible to locally adapt the grid size to the medium
properties in the general case. From this point of view, FEMs are appealing, since they can
use unstructured grids or meshes. Due to ever-increasing computational power, these kinds of
methods have been the focus of a lot of interest and have been used intensively in seismology
(Aagaard et al., 2001; Akcelik et al., 2003; Ichimura et al., 2007).

1.5 Finite-element Discontinuous Galerkin Method

Finite element methods, often more intensive in computer resources, introduce naturally bound-
ary conditions. Therefore, we expect improved accurate solutions with this numerical approach
at the expense of computer requirements. The system of equations (1.1) in time has now a
non-diagonal mass matrix while the system of equations (1.2) has a impedance matrix par-
ticularly ill-conditioned in 3D geometry taking into account its dimensionality. Therefore, for
2D geometries, the frequency formulation is still a quite feasable option while time domain
approaches are there appealing when considering 3D geometries. Due to ever-increasing com-
putational power, finite element methods using unstructured grids or meshes have been the
focus of a lot of interest and have been used intensively in seismology (Aagaard et al., 2001;
Akcelik et al., 2003; Ichimura et al., 2007).
Usually, the approximation order remains low, due to the prohibitive computational cost
related to a non-diagonal mass matrix. However, this high computational cost can be avoided by
mass lumping, a standard technique that replaces the large linear system by a diagonal matrix
(Marfurt, 1984; Chin-Joe-Kong et al., 1999) and leads to a explicit time integration. Another
class of FEMs that relies on the Gauss-Lobatto-Legendre quadrature points has removed these
limitations, and allows for spectral convergence with high approximation orders. This high-
order FEM, called the spectral element method (SEM), (Seriani and Priolo, 1994; Komatitsch
and Vilotte, 1998), has been applied to large-scale geological models, up to the global scale
(Chaljub et al., 2007; Komatitsch et al., 2008). The major limitation of SEM is the exclusive use

33
FORWARD MODELING

of hexahedral meshes, which makes the design of an optimal mesh cumbersome in contrast to
the flexibility offered by tetrahedral meshes. With tetrahedral meshes (Frey and George, 2008),
it is possible to fit almost perfectly complex topographies or geological discontinuities and the
mesh width can be adapted locally to the medium properties (h-adaptivity). The extension of
the SEM to tetrahedral elements represents ongoing work, while some studies have been done in
two dimensions on triangular meshes (Pasquetti and Rapetti, 2006; Mercerat et al., 2006). On
the other hand, another kind of FEM has been proven to give accurate results on tetrahedral
meshes: the discontinuous Galerkin finite-element method (DG-FEM) in combination with
the arbitrary high-order derivatives (ADER) time integration (Dumbser and Käser, 2006).
Originally, DG-FEM was developed for the neutron transport equation (Reed and Hill, 1973).
It has been applied to a wide range of applications such as electromagnetics (Cockburn et al.,
2004), aeroacoustics (Toulopoulos and Ekaterinaris, 2006) and plasma physics (Jacobs and
Hesthaven, 2006), just to cite a few examples. This method relies on the exchange of numerical
fluxes between adjacent elements. Contrary to classical FEMs, no continuity of the basis
functions is imposed between elements, and therefore the method supports discontinuities in
the seismic wavefield, as in the case of a fluid/solid interface. In such cases, the DG-FEM allows
the same equation to be used for both the elastic and the acoustic media, and it does not require
any explicit conditions on the interface (Käser and Dumbser, 2008), which is, on the contrary,
mandatory for continuous formulations, like the SEM (Chaljub et al., 2003). Moreover, the DG-
FEM is completely local, which means that elements do not share their nodal values, contrary
to conventional continuous FEM. Local operators make the method suitable for parallelisation
and allow for the mixing of different approximation orders (p-adaptivity).

We proceed from increasing order by considering first a finite-element discontinuous ap-

proach of the acoustic equation for 2D geometries in the frequency domain. The elastic 2D
case is quite similar as long as we modify the field unknowns we need to compute. Because the
elastic 3D case is still a challenging task in 3D geometries, we shall introduce the finite-element
discontinuous approach through the time integration.

1.6 2D Finite-element Discontinuous Galerkin Method in the

Frequency Domain

On land exploration seismology, there is a need to perform elastic wave modeling in area
of complex topography such as foothills and thrust belts (Figure 1.9). Moreover, onshore
targets often exhibit weathered layers with very low wave speeds in the near surface which
require a locally-refined discretisation for accurate modeling. In shallow water environment, a
mesh refinement is also often required near the sea floor for accurate modeling of guided and
interface waves near the sea floor. Accurate modeling of acoustic and elastic waves in presence
of complex boundaries of arbitrary shape and the local adaptation of the discretisation to local
features such as weathered near surface layers or sea floor were two of our motivations behind
the development of a discontinuous element method on unstructured meshes for acoustic and
elastic wave modeling.

34
1.6 2D Finite-element Discontinuous Galerkin Method in the Frequency Domain

Figure 1.9: Application of the DG method in seismic exploration. (a) Velocity model repre-
sentative of a foothill area affected by a hilly relief and a weathered layer in the near surface.
(b) Close-up of the unstructured triangular mesh locally refined near the surface. (c) Example
of monochromatic pressure wavefield.

1.6.1 hp-adaptive Discontinuous Galerkin discretisation

In the finite-element framework, the wavefields are approximated by means of local polynomial
basis functions defined in volume elements. In the following, we adopt the nodal form of the DG
formulation, assuming that the wavefield vector is approximated in triangular or tetrahedral
elements for 2D and 3D problems, respectively which leads to the following expression,
di
X
~ui (ω, x, y, z, t) = ~uij (ω, xj , yj , zj )ϕij (ω, x, y, z), (1.22)
j=1

where ~u is the wavefield vector of components ~u = (p, vx , vy , vz ). i is the index of the element
in an unstructured mesh. The expression ~ui (ω, x, y, z, t) denotes the wavefield vector in the
element i and (x, y, z) are the coordinates inside the element i.In the framework of the nodal
form of the DG method, ϕij denotes Lagrange polynomial and di is the number of nodes in
the element i. The position of the node j in the element i is denoted by the local coordinates
(xj , yj , zj ).
In the following, the first-order acoustic velocity-pressure system, equation (2.1), will be
written in a pseudo-conservative form as
X
M~u = ∂θ (Nθ ~u) + ~s, (1.23)
θ∈{x,y,z}

where these expressions,

 
−iω/κ 0 0 0
 0 −iωρ 0 0 
M =   (1.24)
 0 0 −iωρ 0 
0 0 0 −iωρ

35
FORWARD MODELING

     
0 1 0 0 0 0 1 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 0
Nx = 
0
 Ny =   Nz =  ,
0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0
(1.25)

Nθ ~u are linear fluxes and the source vector is denoted by ~s.

The first step in the finite-element formulation is obtaining the weak form of the first-
order acoustic velocity-stress system by multiplying equation (1.23) by a test function ϕir and
integration over the element volume Vi which gives
Z Z X Z
ϕir Mi ~ui dV = ϕir ∂θ (Nθ ~ui ) dV + ϕir ~si dV, (1.26)
Vi Vi θ∈{x,y,z} Vi

where the quantity r ∈ [1, di ]. In the framework of Galerkin methods, we used the same
function for the test function and the shape function, equation (1.22).
Integration by parts of the right hand side of equation (1.26) leads to the following equation,
 
Z Z X Z X Z
ϕir M~ui dV = − ∂θ ϕir (Nθ ~ui ) dV + ϕir  Nθ nθ  ~ui dS + ϕir ~si dV,
Vi Vi θ∈{x,y,z} Si θ∈{x,y,z} Vi

(1.27)
where the expression Si is the surface of the element i and ~n = (nx , ny , nz ) is the outward
pointing unit normal vector with respect to the surface i. We recognize in the second term of
the right-hand side of equation (1.27) the numerical flux fi defined by
 
X
n · fi =  Nθ nθ  ~ui . (1.28)
θ∈{x,y,z}

A suitable expression fi/k of the numerical flux fi should guarantee the consistency between
the values of the wavefield computed at a node shared by two neighbor elements i and k.
Among many ways to estimat numerically fluxes, We consider centered fluxes for their
energy conservation properties (Remaki, 2000) which gives the expression,

~vi + ~vk
fi/k = fi . (1.29)
2

Assuming constant physical properties per element and plugging the expression of the
centred flux, equation (1.29), in equation (1.27) gives the equation,
Z Z Z Z
X 1 X
Mi ϕir ~ui dV = − ∂θ ϕir (Nθ ~ui ) dV + ϕir Pik (~ui + ~uk ) dS + ϕir ~si dV,
Vi Vi 2 Sik Vi
θ∈{x,y,z} k∈Ni
(1.30)
where k ∈ Ni represents the elements k adjacent to the element i, Sik is the face between
elements i and k and P is defined as follows
X
Pik = nikθ Nθ , (1.31)
θ∈{x,y,z}

36
1.6 2D Finite-element Discontinuous Galerkin Method in the Frequency Domain

where nikθ is the component along the θ axis of the unit vector ~nik of the face Sik .
Equations (1.30) shows that the computation of the wavefield in one element requires only
information from the directly neighboring elements. This highlights clearly the local nature
of the DG scheme. If we replace the expression of ~ui and ~uk by their decomposition on the
polynomial basis, equation (1.22), we get the discrete expression,
X 1 Xh i
(Mi ⊗ Ki ) ~~ui = − (Nθ ⊗ Eiθ ) ~~ui + (Qik ⊗ Fik ) ~~ui + (Qik ⊗ Gik ) ~~uk + (I ⊗ Ki ) ~
~si
2
θ∈{x,y,z} k∈Ni
(1.32)
where the coefficient rj of the mass matrix Ki , of the stiffness matrix Ei and of the flux matrices
Fi and Gi are respectively given by:

Z
(Ki )rj = ϕir ϕij dV, j, r ∈ [1, di ]
Vi
Z
(Eiθ )ij = (∂θ ϕir ) ϕij dV, j, r ∈ [1, di ] θ ∈ {x, y, z}
Vi
Z
(Fik )rj = ϕir ϕij dS, j, r ∈ [1, di ]
Sik
Z
(Gik )rj = ϕir ϕkj dS, r ∈ [1, di ] , j ∈ [1, dk ] (1.33)
Sik

In equation (1.32), ~~ui and ~~si gather all nodal values for each component of the wavefield
and source. I is the identity matrix and ⊗ is the tensor product of two matrices A and B as
follows  
a11 B ... a1m B
 . . . 
 
A⊗B= .  . . ,
 (1.34)
 . . . 
an1 B ... anm B

where (n × m) denotes the dimensions of the matrix A. The four matrices Ki , Ei , Fik and Gik
are computed by numerical integration using Hammer quadrature.
It is worth noting that, in equation (1.33), arbitrary polynomial order of the shape functions
can be used in elements i and k indicating that the approximation orders are totally decoupled
from one element to another. Therefore, the DG allows for varying approximation orders in
the numerical scheme, leading to the p-adaptivity.
The equation (1.32) can be recast in matrix form as

B u = s. (1.35)

In contrast to the parsimonious FD formulation, we do not eliminate the auxiliary velocity

wavefields from the system because the elimination procedure is a cumbersome task in the DG
formulation.

37
FORWARD MODELING

Figure 1.10: Number of P0 , P1 , P2 nodes in a triangular (a) and tetrahedral (b) element.

1.6.2 Which interpolation orders to choose?

For the shape and test functions, we used low-order Lagrangian polynomials of orders 0, 1
and 2, referred to as Pk , k ∈ 0, 1, 2 in the following (Brossier, 2009; Etienne et al., 2009). Let
us remind that our motivation behind seismic modeling is to perform seismic imaging of the
subsurface by full waveform inversion, the spatial resolution of which is half the propagated
wavelength and that the physical properties of the medium are piecewise constant per element
in our implementation of the DG method. The spatial resolution of the FWI and the piecewise
constant representation of the medium direct us towards low-interpolation orders to achieve the
best compromise between computational efficiency, solution accuracy and suitable discretisation
of the computational domain. The P0 interpolation (or finite volume scheme) was shown to
provide sufficiently-accurate solution on 2D equilateral triangular mesh when ten cells per
minimum propagated wavelength are used (Brossier et al., 2008), while 10 cells and 3 cells
per propagated wavelengths provide sufficiently-accurate solutions on unstructured triangular
meshes with the P1 and the P2 interpolation orders, respectively (Brossier, 2009). Of note, the
P0 scheme is not convergent on unstructured meshes when centered fluxes are used (Brossier
et al., 2008). This prevents the use of the P0 scheme in 3D medium where uniform tetrahedral
meshes do not exist (Etienne et al., 2008). A second remark is that the finite volume scheme
on square cell is equivalent to second-order accurate FD stencil (Brossier et al., 2008) which
is consistent with a discretisation criterion of 10 grid points per wavelength (Virieux, 1986).
Use of interpolation orders greater than 2 would allow us to use coarser meshes for the same
accuracy but these coarser meshes would lead to an undersampling of the subsurface model
during imaging. On the other hand, use of high interpolation orders on mesh built using a
criterion of 4 cells par wavelength would provide an unnecessary accuracy level for seismic
imaging at the expense of the computational cost resulting from the dramatic increase of the
number of unknowns in the equation (1.35).
The computational cost of the LU decomposition depends on the numerical bandwidth of
the matrix, the dimension of the matrix (i.e., the number of rows/columns) and the number of
non-zero coefficients per row (nz ). The dimension of the matrix depends in turn of the number
of cell (ncell ), of the number of nodes per cell (nd ) and the number of wavefield components
(nwave ) (3 in 2D and 4 in 3D). The number of nodes in a 2D triangular and 3D tetrahedral
element is given by Hesthaven and Warburton (2008) and leads to the following expressions,

(k + 1)(k + 2) (k + 1)(k + 2)(k + 3)

2D mesh : nd = , 3D mesh : nd = , (1.36)
2 6
where k denotes the interpolation order (Figure 1.10). We shall comment further in the 3D
case later on.
The numerical bandwidth is not significantly impacted by the interpolation order. The

38
1.6 2D Finite-element Discontinuous Galerkin Method in the Frequency Domain

Table 1.4: Number of nodes per element (nd ) and number of non-zero coefficients per row of
the impedance matrix (nz ) for the FD and DG methods. Left: 2D case; Right: 3D case. nz
depends on the number of wavefield components involved in the r.h.s of the first-order wave
equation nder , unlike the parsimonious FD method applied to the second-order wave equation.

F D2D DG2DP0 DG2DP1 DG2DP2 F D3D DG3D P0 DG3DP1 DG3DP2

nd 1 1 3 6 1 1 4 10
nz 9 5-9 13-25 24-48 27 6-16 21-61 51-151

dimension of the matrix and the number of non zero elements per row of the impedance matrix
are respectively given by nwave × nd × ncell and (1 + nneigh ) × nd × nder + 1, where nneigh is the
number of neighbor cell (3 in 2D and 4 in 3D) and nder is the number of wavefield components
involved in the r.h.s of the velocity-pressure wave equation, equation (1.23). Table 1.4 outlines
the number of non zero coefficients per row for the mixed-grid FD and DG methods. Increasing
the interpolation order will lead to an increase of the number of non zero coefficients per row,
a decrease of the number of cells in the mesh and an increase of the number of nodes in each
element. The combined impact of the 3 parameters nz , ncell , nd on the computational cost of
the DG method makes difficult the definition of the optimal discretisation of the frequency-
domain DG method. The medium properties should rather drive us towards the choice of a
suitable discretisation. To illustrate this issue, we perform a numerical experiment with two
end-members models composed of an infinite homogeneous and a two-layer model with a sharp
velocity contrast at the base of a thin low-velocity layer. Both models have the same dimension
(4 km x 4 km). The top layer of the two-layer model has a thickness of 400 m and a wave speed
of 300 m/s, while the bottom layer has a wave speed of 1.5 km/s. During DG modeling, the
models were successively discretized with 10 cells per minimum wavelength on an equilateral
mesh for the P0 interpolation, 10 cells per local wavelength on unstructured triangular mesh
for the P1 interpolation and 3 cells per local wavelength on unstructured triangular mesh for
the P2 interpolation. A fourth simulation was performed where P1 interpolation is applied in
the top layer while P0 interpolation is used in the bottom layer. Table 1.5 outlines the time
and memory requirement of the LU factorization and multi-r.h.s solve for the FD and DG
methods. Among the different DG schemes, the P2 scheme is the most efficient one in terms of
computational time and memory for the two-layer model. This highlights the benefit provided
by the decreasing of the number of elements in the mesh resulting from the h adaptivity
coupled with a coarse discretisation criterion of 3 cells per local wavelength. The mixed P0 -
P1 scheme performs reasonably well in the two-layer model although it remains less efficient
than the P2 scheme. In contrast, the performances of the P0 and P2 schemes are of the same
order in the homogeneous model. This highlights that P2 scheme does not provide any benefit
if the h adaptivity is not required. The P1 scheme is the less efficient one in homogeneous
media because it relies on the same discretisation criterion than the P0 scheme but involves
an increasing number of nodes per element. As expected, the FD method is the most efficient
one in the homogeneous model thanks to the parsimonious formulation which involves only the
pressure wavefield and the optimized discretisation criterion of 4 grid points per wavelength.
The time and memory costs of the FD and P2 -DG methods are of the same order in the
two-layer model. However, the P2 -DG method will be the method of choice as soon as sharp
boundaries of arbitrary geometries will be present in the model due to the geometrical flexibility

39
FORWARD MODELING

Table 1.5: Computational ressources required for the forward problem solved with DGs P0 , P1 ,
P0 -P1 and P2 and optimized FD method in two simples cases, on 16 processors. Nomenclature:
Homog: homogeneous model. T wo − ley: two-layer model. TLU : time for LU factorization.
M emLU : memory required by LU factorization. Ts : time for 116 r.h.s solve.

Test Resource P0 P1 P0 -P1 P2 FD

Cell/point numbers 113 097 136 724 116 363 12 222 9 604
Degrees of freedom 339 291 1 230 516 417 477 219 996 9 604
Homog. TLU (s) 0.7 8.5 0.8 1.5 0.16
M emLU (Gb) 1.34 5.84 1.62 1.49 0.1
Ts (s) 11.6 40.9 13.6 7.2 0.5
Cell/point numbers 2 804 850 291 577 247 303 32 664 232 324
Degrees of freedom 8 414 550 2 624 193 1 416 243 587 952 232 324
Two-lay. TLU (s) 57.5 15.0 6.4 3.4 1.3
M emLU (Gb) 31.68 11.44 5.58 3.02 1.18
Ts (s) 274.3 83.3 46.8 18.9 2.7

provided by the unstructured triangular mesh.

One must underline that the LU factorization is quite demanding in computer memory and
has also some drawbacks for scalability, suggesting that nodes with high memory should be
preferred at the expense of the CPU numbers.

1.6.3 Boundary conditions and source implementation

Absorbing boundary conditions are implemented with unsplitted PML in the frequency-domain
DG method (Brossier, 2009) following the same approach than for the FD method (see section
PML absorbing boundary conditions).
Free surface boundary condition is implemented with the method of image. A ghost cell is
considered above the free surface with the same velocity and the opposite pressure components
to those below the free surface. This allows us to fulfill the zero pressure condition at the free
surface while keeping the correct numerical estimation of the particle velocity at the free sur-
face. Using these particle velocities and pressures in the ghost cell, the pressure flux across the
free surface interface vanishes, while the velocity flux is twice the value that would have been
obtained by neglecting the flux contribution above the free surface (see equation (1.29)). As in
the FD method, this boundary condition has been implemented by modifying the impedance
matrix accordingly without introducing explicitely the ghost element in the mesh. The rigid
boundary condition is implemented following the same principle except that the same pressure
and the opposite velocity are considered in the ghost cell.
Concerning the source excitation, the point source at arbitrary positions in the mesh is imple-
mented by means of the Lagrange interpolation polynomials for k ≥ 1. This means that the
source excitation is performed at the nodes of the cell containing the source with appropriate
weights corresponding to the projection of the physical position of the source on the polynomial
basis. When the source is located in the close vicinity of a node of a triangular cell, all the
weights are almost zero except that located near the source.

40
1.6 2D Finite-element Discontinuous Galerkin Method in the Frequency Domain

In the case of the P2 interpolation, a source close to the vertex of the triangular cell is problem-
atic because the integral of the P2 basis function over the volume of the cell is zero for nodes
located at the vertex of the triangle. In this case, no source excitation will be performed (see
equation (1.32)).
To overcome this problem specific to the P2 interpolation, one can use locally a P1 interpolation
in the element containing the source at the expense of the accuracy or distribute the source
excitation over several elements or express the solution in the form of local polynomials (i.e.,
the so-called modal form) rather than through nodes and interpolating Lagrange polynomials
(i.e., the so-called nodal form).
Another issue is the implementation of the source in P0 equilateral mesh. If the source is ex-
cited only within the element containing the source, a checker-board pattern is superimposed
on the wavefield solution. This pattern results from the fact that one cell out of two is excited
in the DG formulation because the DG stencil does not embed a staggered-grid structure (the
unexcited grid is not stored in staggered-grid FD methods; see Hustedt et al. (2004) for an
illustration). To overcome this problem, the source can be distributed over several elements of
the mesh or P1 interpolation can be used in the area containing the sources and the receivers,
while keeping P0 interpolation in the other parts of the model (Brossier et al., 2010).
Of note, use of unstructured meshes together with the source excitation at the different nodes
of the element contribute to mitigate the checker-board pattern in the in P1 and P2 schemes.
The same procedure as for the source is used to extract the wavefield solution at arbitrary
receiver positions.

1.6.4 Example

We present below one application involving highly-contrasted media where the DG method
should outperform the FD method thanks to the geometric flexibility provided by unstructured
triangular to implement boundary conditions along interfaces of arbitrary shape.

1.6.4.1 Acoustic wave modeling in presence of cavities

We design a model that mimics a perfect 2D oceanic waveguide of dimension 19 km x 2 km. A

free surface and a rigid surface explicit boundary conditions are implemented on the top and
on the bottom of the water column to mimic the sea surface and the sea floor, respectively.
A pressure source, located at position (x = 0 m; z = 1000 m), propagates the direct wave in
the homogeneous layer as well as waves which are multi-reflected from the top and the bottom
boundaries. Result of the simulation with the DG-P2 scheme at 10 Hz is shown in Figure
1.11a. In a second simulation, we added a circular cavity of diameter 400 m in the center of the
waveguide. A free surface boundary condition is implemented along the contour of the cavity.
The unstructured triangular meshing around the cavity allows for an accurate discretization
of the circular cavity (Figure 1.12). Simulation in the waveguide with the cavity is shown
in Figure 1.11b. Comparison with the simulation performed in the homogeneous waveguide
(Figure 1.11a) highlights the strong interaction between the multi-reflected wavefield with the
scattering source and the intrinsic non-linearity of whispering gallery imaging resulting from
complex wavepaths in the vertical column.

41
FORWARD MODELING

Distance (km)
a) 0 4 8 12 16 20
0
Depth (km) Depth (km)

1
2 Distance (km)
b) 0
1
2

Figure 1.11: Pressure wavefield in the waveguide without (a) and with (b) a circular cavity in
the vertical column. Note that two 500 m layers of PML absorbing conditions are implemented
at the two ends of the model.

Figure 1.12: Wave guide - Cavity model mesh: zoom on the cavity position.

1.7 3D Finite-element Discontinuous Galerkin Method in the

Time Domain

Considering finite-elements in frequency for 3D geometries is not yet feasible as the impedance
matrix is particularly ill-conditioned taking into account its dimensionality.
Time domain approaches are there appealing when considering explicit time integration.
However, in most studies, the DG-FEM is generally used with high approximation orders.
We present a low-order DG-FEM formulation with the convolutional perfectly matched layer
(CPML) absorbing boundary condition (Roden and Gedney, 2000; Komatitsch and Martin,
2007) that is suitable for large-scale three-dimensional (3D) seismic wave simulations. In
this context, the DG-FEM provides major benefits. Our approach relies intensively on the
p-adaptivity. This last feature is crucial for efficient simulations, in order to mitigate the ef-
fects of the very small elements that are generally encountered in refined tetrahedral meshes.
Indeed, the p-adaptivity allows an optimised time stepping to be achieved, by adapting the
approximation order according to the size of the elements and the properties of the medium.
The benefit of such a numerical scheme is particularly important with strongly heterogeneous
media. Due to the mathematical formulation we consider, the medium properties are assumed
to be constant per element. Therefore, meshes have to be designed in such a way that this
assumption is compatible with the expected accuracy. In particular, we address the issues of
seismic modelling and seismic imaging in complex media. In the first application, the discreti-
sation must be able to represent the geological structures fairly, without over-sampling, while

42
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

in the second, the spatial resolution of the imaging process puts constraints on the coarsest
parameterisation of the medium. If we consider full waveform inversion (FWI) applications, the
expected imaging resolution reaches half a wavelength, as shown by Sirgue and Pratt (2004).
Therefore, following the Shannon theorem, a minimum number of four points per wavelength is
required to obtain such accuracy. These reasons have motivated our development of DG-FEM
with low orders. In the present study, we focus on the quadratic interpolation, which yields a
good compromise between accuracy, discretisation and computational cost.
The DG-FEM formulation will be detailed for 3D geometries where the concept of p-adaptivity
will be underlined. Then, the implementation of the method on distributed memory machines
is discussed. The source excitation and two kinds of boundary conditions will be detailed: the
free surface, and the absorbing boundary conditions. Special attention is paid to the latter
with the detailed CPML formulation. The efficiency of the CPML is demonstrated with val-
idation tests that in some cases reveal instabilities inside the absorbing layers. The strategy
for saving CPU time and memory with low-order CPML is then presented. We study the con-
vergence of the method, and the ability to compute accurate surface waves when a free surface
is considered. The advantages of the hp-adaptivity in the context of tetrahedral meshes are
discussed. Finally, we illustrate the efficiency of our method, with a challenging seismological
model, where the computation of surface waves is critical for the prediction of site effects.

1.7.1 The 3D DG-FEM formulation

1.7.1.1 Elastodynamic system

The equations governing particle velocity and stress in an isotropic elastic medium, namely
the elastodynamic system (Virieux, 1986), is a first-order hyperbolic system. Following the
approach of BenJemaa et al. (2009), the elastodynamic system can be written in the following
pseudo-conservative form
X
ρ∂t~v = ∂θ (Mθ ~σ ) + f~
θ∈{x,y,z}
X
Λ∂t~σ = ∂θ (Nθ~v ) + Λ∂t σ~0 , (1.37)
θ∈{x,y,z}

with the definitions of the velocity and stress vectors as

~v = (vx vy vz )T
~σ = (τ τ 0 τ 00 σxy σxz σyz )T , (1.38)

and
1
τ = (σxx + σyy + σzz )
3
1
τ0 = (2σxx − σyy − σzz )
3
1
τ 00 = (−σxx + 2σyy − σzz ). (1.39)
3
Due to the change of variables defined in equation (1.39), the right-hand side of (1.37) does
not include any terms that relate to the physical properties. Mθ and Nθ are constant real

43
FORWARD MODELING

matrices (Appendix B). Λ is a diagonal matrix given by

3 3 3 1 1 1
Λ = diag , , , , , ,
3λ + 2µ 2µ 2µ µ µ µ

where λ and µ are the Lamé parameters. Moreover, the diagonality of Λ is an essential point
of our formulation since the inverse of this matrix is required for the computation of the
stress components (equation (1.37)). The extension of the pseudo-conservative form for the
anisotropic or visco-elastic cases should be further analysed since the change of variable may
depend on the physical parameters while the isotropic purely elastic case requires the simple
global change of variables as shown in this study. Finally, in the equation (1.37), ρ is the
medium density, while f~ and σ~0 are the external forces and the initial stresses, respectively.

1.7.1.2 Spatial discretisation

As is usual with FEMs (Zienkiewicz et al., 2005), we want to approximate the solution of the
equation (1.37) by means of polynomial basis functions defined in volume elements. The spatial
discretisation is carried out with non-overlapping and conforming tetrahedra. We adopt the
nodal form of the DG-FEM formulation (Hesthaven and Warburton, 2008), assuming that the
stress and velocity vectors are approximated in the tetrahedral elements as follows
di
X
~vbi (~x, t) = ~vij (~xj , t) ϕij (~x)
j=1
di
X
~σbi (~x, t) = ~σij (~xj , t) ϕij (~x), (1.40)
j=1

where i is the index of the element, ~x is the spatial coordinates inside the element, and t is the
time. di is the number of nodes or degrees of freedom (DOF) associated with the interpolating
Lagrangian polynomial basis function ϕij relative to the j-th node located at position ~xj . The
expressions of the Lagrangian basis functions are given in Appendix A. ~vij and ~σij are the
velocity and stress vectors, respectively, evaluated at the j-th node of the element. Although
it is not an intrinsic limitation, we have adopted here the same set of basis functions for the
interpolation of the velocity and the stress components. In the following, the notation Pk refers
to a spatial discretisation based on polynomial basis functions of degree k, and a Pk element is
a tetrahedron in which a Pk scheme is applied. The number of DOF in a tetrahedral element
is given by di = (k + 1)(k + 2)(k + 3)/6. For instance, in a P0 element (Figure 1.13.a), there
is only one DOF (the stress and velocity are constant per element), while in a P1 element
(Figure 1.13.b), there are four DOF located at the four vertices of the tetrahedron (the stress
and velocity are linearly interpolated). It is worth noting that the P0 scheme corresponds to
the case of the finite-volume method (BenJemaa et al., 2007, 2009; Brossier et al., 2008). For
the quadratic approximation order P2 , one node is added at the middle of each edge of the
tetrahedron, leading to a total of 10 DOF per element (Figure 1.13.c).
The first step in the finite-element formulation is to obtain the weak form of the elastodynamic
system. To do so, we multiply the equation (1.37) by a test function ϕir and integrate the
system over the volume of the element i. For the test function, we adopt the same kind of
function as used for the approximation of the solution. This case corresponds to the standard

44
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

Figure 1.13: (a) P0 element with one unique DOF. (b) P1 element with four DOF. (c) P2
element with 10 DOF.

Galerkin method and can be written as

Z Z X
ϕir ρ∂t~v dV = ϕir ∂θ (Mθ ~σ ) dV
Vi Vi θ∈{x,y,z}
Z Z X
ϕir Λ∂t~σ dV = ϕir ∂θ (Nθ~v ) dV ∀r ∈ [1, di ], (1.41)
Vi Vi θ∈{x,y,z}

where the volume of the tetrahedral element i is denoted by Vi . For the purpose of clarity, we
have omitted the external forces and stresses in the equation (1.41). Integration by parts of
the right side of the equation (1.41) leads to
Z Z X Z X
ϕir ρ∂t~v dV = − ∂θ ϕir (Mθ ~σ ) dV + ϕir Mθ nθ ~σ dS
Vi Vi θ∈{x,y,z} Si θ∈{x,y,z}
Z Z X Z X
ϕir Λ∂t~σ dV = − ∂θ ϕir (Nθ~v ) dV + ϕir Nθ nθ ~v dS, (1.42)
Vi Vi θ∈{x,y,z} Si θ∈{x,y,z}

with Si as the surface of the element i, and ~n = (nx , ny , nz )T as the outward pointing unit
normal vector with respect to the surface Si . In the second term of the right-hand side of
equation (1.42), the fluxes of the stress and velocity wavefields across the faces of the element i
appear. For evaluation of these fluxes, we adopt the centred flux scheme for its non-dissipative
property (Remaki, 2000; BenJemaa et al., 2009; Delcourte et al., 2009). Using equation (1.40)
and assuming constant physical properties per element, equation (1.42) can be approximated
with
Z Z Z
X 1 X
ρi ϕir ∂t~vbi dV = − ∂θ ϕir (Mθ ~σbi ) dV + ϕir Pik (~σbi + ~σ
ck ) dS
Vi Vi θ∈{x,y,z} 2 S ik
k∈Ni
Z Z XZ
X 1
Λi ϕir ∂t~σbi dV = − ∂θ ϕir (Nθ~vbi ) dV + ϕir Qik (~vbi + ~vbk ) dS,
(1.43)
Vi Vi 2 Sik
θ∈{x,y,z} k∈Ni

with k ∈ Ni representing the elements k adjacent to the element i, and Sik the face between
elements i and k. P and Q are defined as follows
X
Pik = nikθ Mθ
θ∈{x,y,z}

45
FORWARD MODELING

X
Qik = nikθ Nθ ,
θ∈{x,y,z}

where nikθ is the component along the θ axis of the unit vector ~nik of the face Sik that points
from element i to element k. Equation (1.43) indicates that the computations of the stress and
velocity wavefields in one element require information from the directly neighbouring elements.
This illustrates clearly the local nature of DG-FEM. Using the tensor product ⊗, we obtain
the expression
X 1 Xh i
ρi (I3 ⊗ Ki )∂t~vi = − (Mθ ⊗ Eiθ )~σi + (Pik ⊗ Fik )~σi + (Pik ⊗ Gik )~σk
2
θ∈{x,y,z} k∈Ni
X 1 X h i
(Λi ⊗ Ki )∂t~σi = − (Nθ ⊗ Eiθ )~vi + (Qik ⊗ Fik )~vi + (Qik ⊗ Gik )~vk , (1.44)
2
θ∈{x,y,z} k∈Ni

where I3 represents the identity matrix. In equation (1.44), the vectors ~vi and ~σi should be
read as the collection of all nodal values of the velocity and stress components in the element
i. We now introduce the mass matrix
Z
(Ki )rj = ϕir ϕij dV j, r ∈ [1, di ], (1.45)
Vi

the stiffness matrix

Z
(Eiθ )rj = (∂θ ϕir ) ϕij dV j, r ∈ [1, di ], (1.46)
Vi

with θ ∈ {x, y, z}, and the flux matrices

Z
(Fik )rj = ϕir ϕij dS j, r ∈ [1, di ] (1.47)
Sik

Z
(Gik )rj = ϕir ϕkj dS r ∈ [1, di ] j ∈ [1, dk ]. (1.48)
Sik

It is worth noting that in equation (1.48), the DOF of elements i and k appear (di and dk ,
respectively) indicating that the approximation orders are totally decoupled from one element
to another. Therefore, the DG-FEM allows for varying approximation orders in the numerical
scheme. This feature is referred to as p-adaptivity. Moreover, given an approximation order,
these matrices are unique for all elements (with a normalisation according to the volume or
surface of the elements) and they can be computed before hand with appropriate integration
quadrature rules. The memory requirement is therefore low, since only a collection of small ma-
trices is needed according to the possible combinations of approximation orders. The maximum
size of these matrices is (dmax ×dmax ) where dmax is the maximum number of DOF per element
and the number of matrices to store is given by the square of the number of approximation
orders mixed in the numerical domain.
Details regarding the computation of these matrices in 3D are given in Appendix B.
It should be mentioned that, in order to retrieve both the velocity and the stress compo-
nents, the equation (1.44) requires the computation of Ki−1 , which can also be performed before

46
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

hand.
Note that if we want to consider variations in the physical properties inside the elements, the
pseudo-conservative form makes the computation of flux much easier and computationally more
efficient than in the classical elastodynamic system. These properties come from the fact that
in the pseudo-conservative form, the physical properties are located in the left-hand side of
equation (1.37). Therefore, no modification of the stiffness and flux matrices nor additional
terms are needed in equation (1.44) to take into account the variation of properties. Only the
mass matrix needs to be evaluated for each element and for each physical property according
to the expression
Z
(Ki )rj = χi (~x) ϕir (~x) ϕij (~x) dV j, r ∈ [1, di ], (1.49)
Vi

where χi (~x) represents the physical property (ρi or one of the Λi components) varying inside
the element.

1.7.1.3 Time discretisation

For the time integration of equation (1.44), we adopt a second-order explicit leap-frog scheme
that allows to compute alternatively the velocity and the stress components between a half
time step. Equation (1.44) can be written as
1 1
v~i n+ 2 − v~i n− 2 X 1 Xh i
ρi (I3 ⊗ Ki ) = − (Mθ ⊗ Eiθ )~σin + (Pik ⊗ Fik )~σin + (Pik ⊗ Gik )~σkn
∆t 2
θ∈{x,y,z} k∈Ni

σ~i n+1 − σ~i n X n+ 12

(Λi ⊗ Ki ) = − (Nθ ⊗ Eiθ )~vi
∆t
θ∈{x,y,z}
1 Xh n+ 1 n+ 1
i
+ (Qik ⊗ Fik )~vi 2 + (Qik ⊗ Gik )~vk 2 , (1.50)
2
k∈Ni

where the superscript n indicates the time step. We chose to apply the definition of the time
step as given by Käser et al. (2008), which links the mesh width and time step as follows

1 2ri
∆t < min( · ), (1.51)
i 2ki + 1 VP i

where ri is the radius of the sphere inscribed in the element indexed by i, VP i is the P-wave
velocity in the element, and ki is the polynomial degree used in the element. Equation (1.51)
is a heuristic stability criterion that usually works well. However, there is no mathematical
proof for unstructured meshes that guarantees numerical stability.

1.7.2 Computational aspects

The DG-FEM is a local method, and therefore it is naturally suitable for parallel computing.
In our implementation, the parallelism relies on a domain-partitioning strategy, assigning one
subdomain to one CPU. This corresponds to the single program mutiple data (SPMD) archi-
tecture, which means that there is only one program and each CPU uses the same executable to

47
FORWARD MODELING

250

200

Speed−up
150

100

50 100 150 200 250

Number of MPI process

Figure 1.14: Speed-up observed when the number of MPI processes is increased from 1 to 256
for modelling with a mesh of 1.8 million P2 elements. The ideal speed-up is plotted with a
dashed line, the observed speed-up with a continuous line. These values were observed on a
computing platform with bi-processor quad core Opteron 2.3 GHz CPUs interconnected with
Infiniband at 20 Gb/s.

work on different parts of the 3D mesh. Communication between the subdomains is performed
with the message passing interface (MPI) parallel environment (Aoyama and Nakano, 1999),
which allows for applications to run on distributed memory machines. For efficient load balanc-
ing among the CPUs, the mesh is divided with the partitioner METIS (Karypis and Kumar,
1998), to balance the number of elements in the subdomains, and to minimise the number of
adjacent elements between the subdomains. These two criteria are crucial for the efficiency of
the parallelism on large-scale numerical simulations. Figure 1.14 shows the observed speed-up
(i.e. the ratio between the computation time with one CPU, and the computation time with
N CPUs) when the number of MPI processes is increased from 1 to 256, for strong scaling
calculations on a fixed mesh of 1.8 million P2 elements. This figure shows good efficiency of
the parallelism, of around 80%.
In our formulation, another key point is the time step, which is common for all of the subdo-
mains. The time step should satisfy the stability condition given in equation (1.51) for every
element. Consequently, the element with the smallest time step imposes its time step on all of
the subdomains. We should mention here a more elaborate approach with local time stepping
(Dumbser et al., 2007b) that allows for elements to have their own time step independent of
the others. Nevertheless, the p-adaptivity offered by DG-FEM allows mitigation of the compu-
tational burden resulting from the common time step as we shall see. From a technical point
of view, we implemented the method in the FORTRAN 90 langage without the use of spe-
cific mathematical librairies like Basic Linear Algebra Subroutines (BLAS). Indeed, the matrix
products in the DG-FEM formulation involve relatively small matrices (typically 10 × 10 in
P2 ). Therefore, we did not experience substantial gains when calling mathematical libraries,
as already observed by Komatitsch et al. (2008) for SEM.

1.7.2.1 Source excitation and boundary conditions

We consider here the implementation of a point source in the DG-FEM, and we detail two types
of boundary conditions that are generally encountered in seismic modelling: the free surface,

48
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

x x x
Pa −10 Pa −10 Pa −11
0 0.5 1 1.5 2 x 10 0 0.5 1 1.5 2 x 10 0 0.5 1 1.5 2 x 10
2 2 2

2 1.2 7

1.5 1.5 1 1.5 6

1.5 5
0.8
1 1 1 4
y

y
1 0.6
3
0.4
0.5 0.5 0.5 2
0.5
0.2 1

0 0 0 0 0 0
(a) (b) (c)

Figure 1.15: (a) Cross-section of the mesh near the source position, indicated with a yellow
star in the xy plane. This view represents the spatial support of the stress component in a P0
element containing the point source. (b) Same as (a) with a P1 element. (c) Same as (a) with
a P2 element.

and the absorbing boundary conditions. Special attention is given to the latter, based on the
CPML (Komatitsch and Martin, 2007; Drossaert and Giannopoulos, 2007). To our knowedge,
this point has not been studied intensely in a DG-FEM framework.

1.7.2.2 Source excitation

The excitation of a point source is projected onto the nodes of the element that contains the
source as follows

ϕ~i (~xs )
~sni = Pd R s(t), (1.52)
i
j=1 ϕij (~
x s ) Vi ϕij (~
x )dV

with ~sni the nodal values vector associated to the excited component, t = n∆t, ~xs the position
of the point source and s(t) the source function. Equation (1.52) gives the source term that
should be added to the right-hand side of equation (1.50) for the required components. It should
be noticed that this term is only applied to the element containing the source. Depending on
the approximation order, the spatial support of the source varies. Figure 1.15.a shows that
the support of a P0 element is actually the whole volume of the element (represented on the
cross-section with a homogeneous white area). In this case, no precise localisation of the source
inside the element is possible due to the constant piece-wise interpolation approximation. On
the other hand, in a P1 element (Figure 1.15.b), the spatial support of the source is linear and
allows for a rough localisation of the source. In a P2 element (Figure 1.15.c), the quadratic
spatial support tends to resemble the expected Dirac in space close to the source position. It
should be noted that the limitations concerning source localisation also apply to the solution
extraction at the receivers, according to the approximation order of the elements containing
the receivers. We do have the same features of the spatial discrete influence of the interpolation
order in the source excitation we have discussed in the 2D geometry.

49
FORWARD MODELING

1.7.2.3 Free surface condition

For the element faces located on the free surface, we use an explicit condition by changing
the flux expression locally. This is carried out with the concept of virtual elements, which are
exactly symmetric to the elements located on the free surface. Inside the virtual elements,
we impose a velocity wavefield that is identical to the wavefield of the corresponding inner
elements, and we impose an opposite stress wavefield. As a result, the velocity is seen as
continuous across the free surface, while the stress is equal to zero on the faces related to the
free surface.

1.7.2.4 Absorbing boundary condition

For simulations in an infinite medium, an absorbing boundary condition needs to be applied

at the edges of the numerical model. An efficient way to mimic such an infinite medium can be
achieved with PMLs, which was initially developed by Berenger (1994) for electromagnetics,
and adapted for elastodynamics by Chew and Liu (1996). PMLs are anisotropic absorbing
layers that are added at the periphery of the numerical model. The classical PML formulation
is based on splitting of the elastodynamic equations. In the following, we use a new kind of
PML, known as CPML, which does not require split terms. The CPML originated from Roden
and Gedney (2000) for electromagnetics and was applied by Komatitsch and Martin (2007)
and Drossaert and Giannopoulos (2007) to the elastodynamic system. CPML is based on an
idea of Kuzuoglu and Mittra (1996), who obtained a strictly causal form of PML by adding
some parameters in the standard damping function of Berenger (1994), which enhanced the
absorption of waves arriving at the boundaries of the model with grazing incidence angles.
Inside the CPML, a damping function is applied only onto the spatial derivative perpen-
dicular to the boundary. In the CPML formulation, the damping function is defined in the
frequency domain as follows
dθ
sθ = κθ + ∀θ ∈ {x, y, z}, (1.53)
αθ + iω
with angular frequency ω and coefficients κθ ≥ 1 and αθ ≥ 0. The damping profile dθ varies
from 0 at the entrance of the layer, up to a maximum real value dθmax at the end (Collino and
Tsogka, 2001) such that
δ 2
θ
dθ = dθmax , (1.54)
Lcpml
and
log(Rcoef f )
dθmax = −3VP ∀θ ∈ {x, y, z}, (1.55)
2Lcpml
with δθ as the depth of the element barycentre inside the CPML, Lcpml the thickness of the
absorbing layer, and Rcoef f the theoretical reflection coefficient. For all of the tests presented
in the following, we chose Rcoef f = 0.1%. αθ is a coefficient that varies from a maximum value
(αθmax = πf0 ) at the entrance of the CPML, to zero at its end. If κθ = 1 and αθ = 0, the
classical PML formulation is obtained. In the CPML, the spatial derivatives are replaced by
1
∂θ̃ → ∂θ + ζθ ∗ ∂θ ∀θ ∈ {x, y, z}, (1.56)
κθ

50
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

with
dθ
ζθ (t) = − H(t)e−(dθ κθ +αθ )t ∀θ ∈ {x, y, z}, (1.57)
κ2θ

where H(t) denotes the Heaviside distribution. Roden and Gedney (2000) have demonstrated
that the time convolution in equation (1.56) can be performed in a recursive way using memory
variables defined by

ψθ = ζθ ∗ ∂θ ∀θ ∈ {x, y, z}. (1.58)

ψθ represents a memory variable in the sense that it is updated at each time step. Komatitsch
and Martin (2007) showed that the term κθ has a negligible effect on the absorbing abilities,
and it can be set to 1. If we take κθ = 1 and derive the equation (1.58) using the equation
(1.57), we get

∂t ψθ = −dθ ∂θ − (dθ + αθ )ψθ ∀θ ∈ {x, y, z}. (1.59)

We can introduce the memory variables into the initial elastodynamic system (1.37) with the
definition of vectors
~θ (~v ) = (ψθ (vx ) ψθ (vy ) ψθ (vz ))T
ψ
~θ (~σ ) = (ψθ (τ ) ψθ (τ 0 ) ψθ (τ 00 ) ψθ (σxy ) ψθ (σxz ) ψθ (σyz ))T
ψ ∀θ ∈ {x, y, z}. (1.60)

If we apply the change of variables in equation (1.56), equation (1.37) becomes

X X
ρ∂t~v = ∂θ (Mθ ~σ ) + Mθ ψ~θ (~σ )
θ∈{x,y,z} θ∈{x,y,z}
X X
Λ∂t~σ = ∂θ (Nθ~v ) + ~θ (~v ).
Nθ ψ (1.61)
θ∈{x,y,z} θ∈{x,y,z}

Equation (1.61) is the initial elastodynamic system augmented by the memory variables on the
right-hand side. In combination, another extra system dealing with the memory variables is
~θ (~σ ) = −dθ ∂θ (~σ ) − (dθ + αθ )ψ
∂t ψ ~θ (~σ )
~θ (~v ) = −dθ ∂θ (~v ) − (dθ + αθ )ψ
∂t ψ ~θ (~v ) ∀θ ∈ {x, y, z}. (1.62)

The collection of memory variables associated with each element located in the CPMLs is
made of 22 memory variables per DOF. These variables correspond to the 22 spatial derivatives
involved in equation (1.37). If we apply the DG-FEM formulation as presented in the previous
section to the equation (1.61) and the equation (1.62), we get
1 1
v~i n+ 2 − v~i n− 2 X 1 Xh i
ρi (I3 ⊗ Ki ) = − (Mθ ⊗ Eiθ )~σin + (Pik ⊗ Fik )~σin + (Pik ⊗ Gik )~σkn
∆t 2
θ∈{x,y,z} k∈Ni
X
+ (I3 ⊗ Ki ) ~θ (~σ n )
Mθ ψ i
θ∈{x,y,z}
n+1 n
σ~i − σ~i X n+ 12 1 Xh n+ 1 n+ 1
i
(Λi ⊗ Ki ) = − (Nθ ⊗ Eiθ )~vi + (Qik ⊗ Fik )~vi 2 + (Qik ⊗ Gik )~vk 2
∆t 2
θ∈{x,y,z} k∈Ni

51
FORWARD MODELING

1
~θ (~v n+ 2 ),
X
+ (I3 ⊗ Ki ) Nθ ψ i (1.63)
θ∈{x,y,z}

in combination with the memory variable system

~θ (~σ n ) − ψ
ψ ~θ (~σ n−1 )
(I3 ⊗ Ki ) i i
= diθ (I6 ⊗ Eiθ )~σin−1
∆t
1 X h i
− diθ nikθ (I6 ⊗ Fik )~σin−1 + (I6 ⊗ Gik )~σkn−1
2
k∈Ni
~θ (~σ n−1 )
− (I3 ⊗ Ki )(diθ + αiθ )ψ i
1 1
ψ ~θ (~v n− 2 )
~θ (~v n+ 2 ) − ψ X n− 21
i i
(I3 ⊗ Ki ) = diθ (I3 ⊗ Eiθ )~vi
∆t
θ∈{x,y,z}
1 X h n− 1 n− 1
i
− diθ nikθ (I3 ⊗ Fik )~vi 2 + (I3 ⊗ Gik )~vk 2
2
k∈Ni
1
~θ (~v n− 2 )
− (I3 ⊗ Ki )(diθ + αiθ )ψ ∀θ ∈ {x, y, z}.(1.64)
i

Equations (1.63) and equations (1.64) indicate that p-adaptivity is also supported in the
CPMLs. At the end of the CPMLs, we apply a simple free surface condition as explained
in the previous section.

1.7.3 Validation tests

To validate the efficiency of the CPML, we present some simulations of wave propagation in
a homogeneous, isotropic and purely elastic medium. The model size is 8 km × 8 km × 8
km, and the medium properties are: VP = 4000 m/s, VS = 2310 m/s and ρ = 2000 kg/m3 .
An explosive source is placed at coordinates (xs = 2000 m, ys = 2000 m, zs = 4000 m) and a
line of receivers is located at coordinates (3000 m ≤ xr ≤ 6000 m, yr = 2000 m, zr = 4000
m) with 500 m between receivers. The conditions of the tests are particularly severe, since
the source and the receivers are located close to the CPMLs (at a distance of 250 m), thus
favouring grazing waves. The source signature is a Ricker wavelet with a dominant frequency
of 3 Hz and a maximum frequency of about 7.5 Hz. Due to the explosive source, only P-
wave is generated and the minimum wavelength is about 533 m. The mesh contains 945,477
tedrahedra with an average edge of 175 m, making a discretisation of about 3 elements per
λmin . Figures 1.16.c and 1.16.d show the results obtained with the P2 interpolation and CPMLs
of 10-elements width (Lcpml = 1750 m) at all edges of the model. With the standard scale,
no reflection can be seen from the CPMLs. When the amplitude is magnified by a factor of
100, some spurious reflections are visible. This observation is in agreement with the theoretical
reflection coefficient (Rcoef f = 0.1%) in equation (1.55). Figure 1.17.a allows to compare the
seismograms computed with CPMLs of 10-elements width and the seismograms computed in
a larger model without reflection in the time window.
As shown by Collino and Tsogka (2001), the thickness of the absorbing layer plays an important
role in the absorption efficiency. In Figures 1.16.a and 1.16.b, the same test was performed
with CPMLs of 5-elements width (Lcpml = 875 m) at all edges of the model. Compared to
Figures 1.16.c and 1.16.d, the amplitude of the reflections have the same order of magnitude.

52
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

x (km) x (km)
0 4 8 0 4 8
0 0

y (km)

y (km)
4 4

8 8
(a) (b)

x (km) x (km)
0 4 8 0 4 8
0 0
y (km)

y (km)
4 4

8 8
(c) (d)

Figure 1.16: Snapshots at 1.6 s of the velocity component vx in the plane xy that contains
the source location. CPMLs of 10-elements width are applied at all edges of the model. The
modelling was carried out with P2 interpolation. White lines, the limits of the CPMLs; black
cross, the position of the source. (a) Real amplitude. (b) Amplitude magnified by a factor of
100. (c) & (d) Same as (a) & (b) with CPMLs of 5-elements width.

Nevertheless, in the upper and left parts of the model, some areas with a strong amplitude
appear close to the edges. These numerical instabilities arise at the outer edges of the CPMLs,
and they expand over the complete model during the simulations. Instabilities of PML in
long time simulations have been studied in electromagnetics (Abarbanel et al., 2002; Bécache
et al., 2004). In the following, we present a numerical stability study of CPML combined with
DG-FEM for the elastodynamics. The results are shown in Figure 1.18, with snapshots at
long times for CPMLs of 5- and 10-elements widths. In these snapshots, the instabilities arise
at the four corners of the model (at 20 s for the 10-elements width CPML). Tests with larger
CPMLs (not shown) demonstrate that when the CPML width is 20 elements, these instabilities
do not appear. Such instabilities were experienced by Meza-Fajardo and Papageorgiou (2008)
with standard PML, for an isotropic medium. These authors proposed the application of an
additional damping in the PML, onto the directions parallel to the layer, leading to a multiaxial
PML (M-PML). Figure 1.19 is equivalent to Figure 1.18, instead that 10% of the damping
profile defined in equation (1.54) has been added onto the directions parallel to the CPMLs (in
the latter named M-CPMLs). As a result, instabilities do not appear when the CPML width
is at least 10 elements while the efficiency of the absorption is preserved as shown by Figure
1.17.b with similar residuals compared to Figure 1.17.a.

53
FORWARD MODELING

CPML M−CPML
(a) 6500 (b) 6500
6000 6000

5500 5500

5000 5000
Offset (m)

Offset (m)
4500 4500

4000 4000

3500 3500

3000 3000

2500 2500
0 1 2 3 4 0 1 2 3 4
Time (s) Time (s)

Figure 1.17: (a) Seismograms of the velocity component vx . The amplitude of each seismogram
is normalised. Black continuous line, numerical solution in large model without reflection in
the time window; dashed line, numerical solution with 10-elements width CPMLs; grey line,
residuals magnified by a factor of 10. (b) Same as (a) with 10-elements width M-CPMLs.

x (km) x (km) x (km)

0 4 8 0 4 8 0 4 8
0 0 0
y (km)

y (km)

4 4 4

8 8 8
(a) (b) (c)

x (km) x (km) x (km)

0 4 8 0 4 8 0 4 8
0 0 0
y (km)

y (km)

4 4 4

8 8 8
(d) (e) (f)

Figure 1.18: (a), (b) & (c) Snapshots of the velocity component vx in the plane xy that
contains the source location at 10, 20 and 30 s, respectively. The amplitude is plotted without
any magnification factor. The modelling was carried out with P2 interpolation. CPMLs with
5-elements width are applied at all edges of the model. White lines, the limits of the CPMLs;
black cross, the position of the source. (d), (e) & (f) Same as (a), (b) & (c), respectively, except
with CPMLs of 10-elements width.

54
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

x (km) x (km) x (km)

0 4 8 0 4 8 0 4 8
0 0 0

y (km)

y (km)
4 4 4

8 8 8
(a) (b) (c)

x (km) x (km) x (km)

0 4 8 0 4 8 0 4 8
0 0 0
y (km)

y (km)

y (km)
4 4 4

8 8 8
(d) (e) (f)

Figure 1.19: (a), (b) & (c) Snapshots of the velocity component vx in the plane xy that
contains the source location at 10, 20 and 30 s, respectively. The amplitude is plotted without
any magnification factor. The modelling was carried out with P2 interpolation. M-CPMLs
with 5-elements width and 10% of the damping profile added onto the directions parallel to the
layer were applied at all edges of the model. White lines, the limits of the M-CPMLs; black
cross, the position of the source. (d), (e) & (f) Same as (a), (b) & (c), respectively, except with
M-CPMLs of 10-elements width.

Table 1.6: Computation times for updating the velocity and stress wavefields in one element
for one time step. These values correspond to average computation times for a computing
platform with bi-processor quad core Opteron 2.3 GHz CPUs interconnected with Infiniband
20 at Gb/s.

Approximation order Element outside CPML Element inside CPML

P0 2.6 µs 3.6 µs
P1 5.0 µs 8.3 µs
P2 21.1 µs 29.9 µs

1.7.3.1 Saving computation time and memory

Table 1.6 gives the computation times for updating the velocity and stress wavefields in one
element for one time step, for different approximation orders, without or with the update of
the CPML memory variables (i.e. elements located outside or inside the CPMLs). These com-
putation times illustrate the significant increase with respect to the approximation order, and
they allow an evaluation of the additional costs of the CPML memory variables computation
from 40% to 60%. The effects of this additionnal cost have to be analysed in the context of a

55
FORWARD MODELING

x (km) x (km)
0 4 8 0 4 8
0 30 0 2

20
y (km)

y (km)
4 15 4

0
8 8 1
(a) (b)

Figure 1.20: (a) Layout of the subdomains obtained with the partitioner METIS (Karypis and
Kumar, 1998) along the xy plane that contains the source location. Grey lines, the limits of
the CPMLs. The mesh was divided into 32 partitions, although only a few of these are visible
on this cross-section. (b) View of the approximation order per element along the same plane.
Black, the P2 elements; white, the P1 elements.

domain-partitioning strategy. The mesh is divided into subdomains, using a partitioner. Fig-
ure 1.20.a shows the layout of the subdomains that were obtained with the partitioner METIS
(Karypis and Kumar, 1998) along the xy plane used in the previous validation tests. The mesh
was divided into 32 partitions, although only a few of these are visible on the cross-section
in Figure 1.20.a. We used an unweighted partitioning, meaning that each partition contains
approximately the same number of elements. The subdomains, partially located in the CPMLs,
contain different numbers of CPML elements. In large simulations, some subdomains are to-
tally located inside the CPMLs, and some others outside the CPMLs. In such a case, the
extra computation costs of the subdomains located in the absorbing layers penalise the whole
simulation. Indeed, most of the subdomains spend 40% to 60% of the time just waiting for
the subdomains located in the CPMLs to complete the computations at each time step. For
a better load balancing, we propose to benefit from the p-adaptivity of DG-FEM, using lower
approximation orders in the CPMLs. Indeed, inside the absorbing layers, we do not need a spe-
cific accuracy, and consequently the approximation order can be decreased. Table 1.6 indicates
that such a mixed numerical scheme is advantageous, since the computation time required for
a P0 or P1 element located in the CPML is shorter than the computation time of a standard
P2 element. Figure 1.20.b shows the approximation order per element when P1 is used in the
CPMLs and P2 in the rest of the medium. We should note here that the interface between
these two areas is not strictly aligned to a cartesian axis, and has some irregularities due to
the shape of the tetrahedra. Although it is possible to constrain the alignment of the element
faces parallel to the CPML limits, we did not observe significant differences in the absorption
efficiency whether the faces are aligned or not.
Figure 1.21.a shows the seismograms computed when the modelling was carried out with P2
inside the medium and P1 in the CPMLs. Absorbing layers of 10-elements width are applied
at all edges of the model. For comparison, Figure 1.21.b shows the results obtained with P0
in the CPMLs and P2 for the rest of the medium. In this case, the spurious reflections have
significant amplitudes. The snaphots (not presented here) reveal a large number of artefacts
both in the CPMLs and in the medium. These artefacts make it impossible to use these seis-

56
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

P2/P1 P2/P0
(a) 6500 (b) 6500
6000 6000

5500 5500

5000 5000
Offset (m)

Offset (m)
4500 4500

4000 4000

3500 3500

3000 3000

2500 2500
0 1 2 3 4 0 1 2 3 4
Time (s) Time (s)

Figure 1.21: (a) Seismograms of the velocity component vx . The amplitude of each seismogram
is normalised. The modelling is done with P1 in the CPMLs and P2 inside the medium. Black
continuous line, numerical solution in large model without reflection in the time window; dashed
line, numerical solution with 10-elements width CPMLs; grey line, residuals magnified by a
factor of 10. (b) Same as (a) except the modelling is done with P0 in the CPMLs and P2 inside
the medium.

mograms for practical applications. On the other hand, the seismograms computed with the
mixed scheme P2 /P1 show weak artefacts, and are reasonably comparable with the seismograms
obtained with complete P2 modelling (compare Figure 1.21.a and Figure 1.17.a). Therefore,
taking into account that the computation time and the memory consumption of the P2 /P1
simulation are nearly half of those required with the full P2 modelling, we can conclude that
this mixed numerical scheme is of interest. It should be noticed that it is possible to adopt
a weighted partitioning approach to overcome partly load balancing issues. Nevertheless, it
does not prevent from using our mixed scheme approach which allows a significant reduction
of the number of CPML memory variables. Actually, our strategy is totally compatible with
a weighted partitioning and the combination of both would be more efficient than using only
one of them. We should also stress that the saving in CPU time and memory provided with
this kind of low-cost absorbing boundary condition is crucial for large 3D simulations, and this
becomes a must in the context of 3D seismic imaging applications that require a lot of forward
problems, such as FWI.

1.7.4 Accuracy of DG-FEM with tetrahedral meshes

There are a variety of studies in the literature concerning the dispersive and dissipative prop-
erties of DG-FEM with reference to wave-propagation problems. To cite but a few examples:
Ainsworth et al. (2006) provided a theoretical study for the 1D case; Basabe et al. (2008) anal-
ysed the effects of basis functions on 2D periodic and regular quadrilateral meshes; and Käser
et al. (2008) discussed the convergence of the DG-FEM combined with ADER time integration
and 3D tetrahedral meshes. More related to our particular concern here, Delcourte et al. (2009)
provided a convergence analysis of the DG-FEM with a centred flux scheme and tetrahedral
meshes for elastodynamics. They demonstrated the sensitivity of the DG-FEM to the mesh
quality, and they proved that the convergence is limited by the second-order time integration

57
FORWARD MODELING

Table 1.7: Average edge length, minimum and maximum insphere radius and number of ele-
ments of the unstructured tetrahedral meshes used for the convergence study

Mesh 1 2 3 4 5 6
Average edge (m) 0.19 0.12 0.08 0.05 0.04 0.03
Min. insphere radius (m) 0.0203 0.0132 0.0078 0.0048 0.0030 0.0019 .
Max. insphere radius (m) 0.0486 0.0304 0.0211 0.0155 0.0117 0.0087
Number of elements 1561 5357 17932 49822 154297 388589

we have used in the present study, despite the order of the basis function.

1.7.4.1 Convergence study

We present a convergence analysis of the DG-FEM P2 , P1 and P0 schemes following the ap-
proach of Delcourte et al. (2009). The analysis is based on the propagation of an eigenmode
in a unit cube with a free surface condition applied at all faces. The properties of the cube are
VP = 1 m/s, VS = 0.5 m/s and ρ = 1 kg/m3 . According to these parameters, the solution of
the eigenmode (1,1,1) is given by

vx = cos(πx) (sin(πy) − sin(πz)) cos(Ωt)

vy = cos(πy) (sin(πz) − sin(πx)) cos(Ωt)
vz = cos(πz) (sin(πx) − sin(πy)) cos(Ωt)
σxx = −A sin(πx) (sin(πy) − sin(πz)) sin(Ωt)
σyy = −A sin(πy) (sin(πz) − sin(πx)) sin(Ωt)
σzz = −A sin(πz) (sin(πx) − sin(πy)) sin(Ωt)
σxy = σxz = σyz = 0, (1.65)
√ √
where A = 1/ 2 and Ω = π/ 2. In order to assess the convergence rate of the method,
we made several tests with different unstructured tetrahedral meshes with the characteristics
summarized in Table 1.7. The initial conditions are imposed at each node of the elements by
setting the velocities at t = 0 and the stresses at t = ∆t/2 following equation (1.65). We
place a bunch of receivers according to a cartesian grid that matches the size of the cube. The
spacing between receivers is 0.1 m, making a total number of√1331 receivers (11 × 11 × 11).
At each receiver, a sinusoidal signal with a period of T = 2 2 seconds should be observed.
This monochromatic signal corresponds to the propagation of P-waves across the cube that
are continuously reflected at the cube faces. Consequently, we can establish a relationship
between the simulation time and the propagated distance. In the Figure 1.22.a, we present
the normalised RMS error between the analytical and numerical solutions at t = 5 T and at
t = 50 T , corresponding to a propagation of 5 and 50 wavelengths respectively. We can observe
that no convergence is achieved with P0 while a second-order convergence is observed for both
P1 and P2 at t = 5 T . As expected, an increase of the error is seen at longer times, resulting
from the accumulation of errors with time iterations. At t = 50 T , a second-order convergence
is still observed for P1 while the convergence of P2 becomes more erratic. The seismograms
of Figure 1.23.a and Figure 1.23.a represent the vx component observed at short and long

58
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

1 1
(a) 10 (b) 10

0 0
10 10
RMS Error

RMS Error
−1 −1
10 10

−2 −2
10 10

−3 −3
10 1 2
10 −2 0 2 4 6
10 10 10 10 10 10 10
1 / r (m−1) CPU time (s)

Figure 1.22: (a) Root mean square error between the analytical and numerical solutions versus
the inverse of the maximum insphere radius r. Black dashed line, the error against the P0
solution at t = 5 T ; black continuous line, the error against P0 at t = 50 T ; red dashed line, the
error against the P1 solution at t = 5 T ; red continuous line, the error against P1 at t = 50 T ;
blue dashed line, the error against the P2 solution at t = 5 T ; blue continuous line, the error
against P2 at t = 50 T ; grey curve, second-order slope. (b) Same as (a) except the root mean
square error is plotted versus the elapsed computation time. The tests have been performed
with 32 CPUs on a computing platform with bi-processor quad core Opteron 2.3 GHz CPUs
interconnected with Infiniband at 20 Gb/s

times respectively. These seismograms have been recorded at the position (x = 0 m, y = 0

m, z = 0.5 m) with the mesh # 4. At short times, we can see a good match between the
numerical and analytical solutions for both P1 and P2 schemes. Concerning the P0 scheme, we
can notice a strong distorsion of the sinusoidal signal with a apparent period that is shorter
than the analytical one. We can conclude that the P0 scheme does not provide accurate results
with unstructured tetrahedral meshes. At long times, the agreement is still good for P2 (thus
explaining the slow convergence observed when using finer meshes) but we can observe a strong
delay for the P1 scheme. The delay is reduced when using finer meshes as indicated by the
convergence curve in Figure 1.22.a. In terms of precision and efficiency, the gain from the P2
scheme compared with the P1 scheme can be evaluated from Figure 1.22.b. For the same level
of precision, the computation time of the P2 modelling is nearly two orders of magnitude lower
than the computation time of the P1 modelling.

1.7.4.2 Accurate modelling of surface waves

Accurate modelling of surfaces wave is crucial for seismological studies, such as for the predic-
tion of site effects or FWI of land seismic data, where the receivers are usually located on the
free surface. For simple geometries, some analytical solutions exist. The propagation of waves
along the surface of an elastic half space was discussed by Lamb (1904) for a force located on
the surface, and an analytical solution was defined by Garvin (1956) for the buried line-source
problem. Nevertheless, in the case of complex topographies, a numerical method needs to be
used. For this, a method suitable for unstructured meshes has major advantages. In the follow-
ing, for validation purposes, we consider a homogeneous, isotropic and purely elastic medium

59
FORWARD MODELING

(a) (b)

P0 P0

P1 P1

P2 P2

0 1 2 3 4 5 6 7 8 9 134 136 138 140 142

Time (s) Time (s)

Figure 1.23: (a). Seismograms of velocity component vx computed with the P2 , P1 and P0
schemes for t ∈ [0, 3 T ]. Continuous line, the DG-FEM solution; dashed line, the analytical
solution. (b) Same as (a) for t ∈ [47 T, 50 T ].

with a planar free surface, and we adopt the experimental set-up defined in the WP1 HHS1
test case of the SPICE test code validation project (Moczo et al., 2005). The model dimensions
are 20 km × 20 km × 10 km in the directions x, y and z, respectively. The physical properties
are given by VP = 6000 m/s, VS = 3464 m/s and ρ = 2700 kg/m3 . The source is a point
dislocation with the only non-zero moment tensor component Mxy . The moment-rate time
history is given by
t t
Mxy (t) = M0 exp(− ),
T2 T
with M0 = 1018 Nm and T = 0.1 s. Considering a maximum frequency of 5 Hz, the minimum
wavelength is 693 m. The source and receiver locations are given in Table 1.8. The distance
between the source and the receivers varies from 1 to 16 λmin . We performed the computation
with the mixed scheme, with P2 elements in the medium and P1 elements in the CPMLs.
Absorbing layers were applied at all edges of the model, except at the top, where a free surface
condition was used. Figures 1.24.a and 1.24.b allow a comparison of the seismograms of the
components vx and vz , respectively, obtained with DG-FEM and with the reflectivity method
(Bouchon, 1981; Coutant, 1989). All of these seismograms were filtered between 0.13 and 5 Hz.
With an average mesh spacing of 3 elements per wavelength, a good match is seen between the
analytical and numerical solutions for all of the traces. Exceptions are found for the component
vz in traces #1, 2 and 3, where the DG-FEM fails to reproduce strictly null signals, but exhibits
weak residuals. These residuals might be due to the spatial support of the source, which does
not coincide with a pure Dirac in space, as depicted in Figure 1.15.c.

1.7.5 hp-adaptivity

1.7.5.1 Two-step refinement approach

One of the most interesting aspects of the DG-FEM is the possibility to mix approximation
orders without any special efforts. This feature relies on the local support of the basis func-
tions, which are discontinuous between the elements and is referred to as p-adaptivity. When

60
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

Table 1.8: Source and receiver locations for the planar free-surface modelling.

Type X (m) Y (m) Z (m)

source 0 0 -693
receiver #1 0 693 0
receiver #2 0 5543 0
receiver #3 0 10932 0
receiver #4 490 490 0
receiver #5 3919 3919 0
receiver #6 7348 7348 0
receiver #7 577 384 0
receiver #8 4612 3075 0
receiver #9 8647 5764 0

10 10

(a) 9 (b) 9

8 8

7 7

6 6
Receiver

Receiver

5 5

4 4

3 3

2 2

1 1

0 0
0 1 2 3 4 5 0 1 2 3 4 5
Time (s) Time (s)

Figure 1.24: (a) Seismograms of the velocity component vx computed for the planar free-surface
modelling of the SPICE test code validation project. Continuous line, the analytical solution
provided by the reflectivity method; dashed line, the DG-FEM solution. (b) Same as (a) with
the component vz .

combined with mesh refinement, this method becomes hp-adaptive. As in the initial study of
Babuska and Suri (1990), hp-adaptive FEMs associated with a-posteriori error estimates have
gained a lot of interest due to the exponential rates of convergence seen with the correct com-
bination of h- and p- refinements. In the present study, we propose to define a simple a-priori
error estimate to predict the required approximation order for each element. Our approach is
based on two major steps. The first refers to the mesh construction, with the intention to build
a tetrahedral mesh that is locally adapted to the media properties. Initially, a mesh is gener-
ated that roughly satisfies the discretisation required by the target approximation order. At the
very beginning of the procedure, the mesh can even be regular. Afterwards, the elements are
checked against the physical properties of the medium, and the list of elements that need to be
refined is used for the next iteration. The process is repeated until the list of elements to refine
is empty. We used to build and refine our meshes with the tool TETGEN (Si and Gärtner,
2005) which allows to specify for each element the maximum authorized volume. To compute

61
FORWARD MODELING

Table 1.9: Minimum and maximum insphere radius and number of elements of the unstructured
tetrahedral meshes with a refined area

Mesh 10 20 30
Min. insphere radius (m) 0.0017 0.0010 0.0007
Max. insphere radius (m) 0.0425 0.0292 0.0198
Number of elements 6952 26374 82668

the optimal volume for each element, we usually define a maximum ratio between the insphere
radius and the wavelength and then we evaluate the corresponding volume of an equilateral
tetrahedron. Given the complexity of the medium to be discretised, tetrahedral mesh genera-
tors can produce ill-shaped tetrahedra even if quality criteria are used. A common practice is to
limit the aspect ratio, which is defined by the ratio between the maximum side length and the
minimum height of the elements. Nevertheless, despite robust algorithms, like the Delaunay
refinement algorithm of Shewchuk (1998), some almost flat elements can be present at the end
of the refinement process, which are known as slivers. Besides these slivers, another critical
phenomenon can occur where there are abrupt contrasts in the physical properties. In these
situations, the refinement algorithm might not be able to perform the optimal discretisation.
This occurs when the size of the elements cannot vary as fast as the medium properties for
geometrical reasons. In that case, some elements are necessarily undersized. Consequently, the
construction of an ideal mesh is a difficult task, and a large range of element sizes is often seen
in constrained meshes. To mitigate the negative effects of the badly sized elements, we propose
to downgrade these elements with lower approximation orders. This is done in the second step
of our refinement approach, which is devoted to the p-adaptivity.

1.7.5.2 Numerical results

Our intention here, is to illustrate the benefits of the p-adaptivity. For that purpose, we consider
the case of the eigenmode propagation in the unit cube as we already did and introduce a refined
area in meshes #1, 2 and 3 in order to create artificially a large range of element sizes. We
obtain the new meshes #10 , 20 and 30 by defining a cubic zone of size 0.1 m × 0.1 m × 0.1 m in
the middle of the model where the average edge length is ten times smaller than h, the average
edge length in the surrounding mesh. The characteristics of the meshes can be found in Table
1.9. The ratio between the maximum and minimum insphere radius have been significantly
increased compared to the uniform meshes used previously (compare with Table 1.7). The
cross-section of the mesh #30 in Figure 1.25.a. allows to see the refined area in the center of
the model. For the p-adaptivity, we adopted the following criteria: if the insphere radius
is comprised between h/30 and h/10, the approximation order is downgraded to P1 , and if
the radius is smaller than h/30, the approximation order is downgraded to P0 . This strategy
is depicted in Figure 1.26, where for each approximation order, the time step evaluated with
equation (1.51) versus the insphere radius of one single equilateral tetrahedron is shown. When
applying these criteria, the time step does not decrease uniformly according to the size of the
element. Instead, two jumps (Figure 1.26, dashed line) allow the time step to increase despite
the reduction in the element size. These jumps are due to the decrease in the approximation
order from P2 to P1 , and from P1 to P0 . According to the adopted criteria, we obtain the

62
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

x (m) meter −3 x (m) order

0 0.2 0.4 0.6 0.8 1 x 10 0 0.2 0.4 0.6 0.8 1
1 1 2
18
0.9 0.9
16
0.8 0.8
14
0.7 0.7
12
0.6 0.6
y (m)

y (m)
0.5 10 0.5 1

0.4 8 0.4

0.3 6 0.3

0.2 4 0.2

0.1 0.1
2
0 0 0
(a) (b)

Figure 1.25: (a) View of the mesh in the xy plane at z = 0.5 m, showing the size of the elements
(insphere radius) in the mesh #30 . (b) Same as (a) with the approximation order associated
with each element. White, P2 elements; grey, P1 elements; black, P0 elements.

Table 1.10: Number of elements per approximation orders and time steps for the complete P2
and the p-adaptive modelling

Nb P0 elements Nb P1 elements Nb P2 elements Time step

Full P2 scheme with mesh 10 0 0 6952 0.0006745
p-adaptive scheme with mesh 10 2520 1606 2826 0.0033372
Full P2 scheme with mesh 20 0 0 26374 0.0004187
p-adaptive scheme with mesh 20 10883 5849 9642 0.0020934
Full P2 scheme with mesh 30 0 0 82668 0.0002737
p-adaptive scheme with mesh 30 34176 17483 31009 0.0013687

distribution of approximation orders indicated in Table 1.10. The number of downgraded

elements is quite important and represent for all meshes approximatively 60 %. Nevertheless,
the downgraded elements are mostly located in the vicinity of the refined area as shown in
Figure 1.25.b and represent in average only 3 % of the volume of the model. Moreover, despite
the fact that the P0 scheme does not provide accurate results, the introduction of such elements
allows a drastic increase of the time step by a factor of five. The impact of the downgraded
elements can be analysed with Figure 1.27.a showing the normalised RMS error between the
analytical, the P2 and the p-adaptive numerical solutions at t = 50 T . Actually, the p-adaptive
scheme exhibits an error which is comparable to the complete P2 modelling except for the mesh
#30 , where we observe a particular behavior of the P2 scheme with an increase of the error
despite the mesh spacing has been reduced. This indicates that a large distribution of element
sizes has an effect on the convergence on the P2 scheme. On the contrary, the p-adaptive scheme
seems less sensitive and preserves the second-order convergence. From a computational point
of view, the benefit of the approach appears in Figure 1.27.b where the error is represented
versus the computation time. For the same computation time, the p-adaptive approach shows a

63
FORWARD MODELING

0.04

0.035

0.03

0.025

time step (s)

0.02

0.015

0.01

0.005

0
0 0.005 0.01 0.015 0.02
insphere radius (m)

Figure 1.26: Time step versus the insphere radius of one single equilateral tetrahedron com-
puted with the equation (1.51), for different approximation orders. Grey curve, P0 ; blue curve,
P1 ; red curve, P2 ; dashed line, the p-adaptive approach used for mesh #30 .

0 0
(a) 10 (b) 10

−1 −1
10 10
RMS Error

RMS Error

−2 −2
10 10

−3 −3
10 1 2
10 1 2 3 4 5
10 10 10 10 10 10 10
−1 CPU time (s)
1 / r (m )

Figure 1.27: (a) Root mean square error between the analytical and numerical solutions versus
the inverse of the maximum insphere radius r at t = 50 T . Blue line, the error against the P2
solution; pink line, the error against the p-adaptive solution; grey curve, second-order slope.
(b) Same as (a) except the root mean square error is plotted versus the elapsed computation
time. The tests have been performed with 32 CPUs on a computing platform with bi-processor
quad core Opteron 2.3 GHz CPUs interconnected with Infiniband at 20 Gb/s

better misfit than the full P2 modelling, as indicated by the position of the p-adaptive curve at
the left of the P2 curve. The hp-refinement provided by DG-FEM is particularly interesting in
the case of complex refined meshes where small elements are generally produced by tetrahedral
mesh generators. The efficiency of our approach in such cases is illustrated in the next section.

1.7.6 The EUROSEISTEST Benchmarch

We demonstrate the potential of DG-FEM with hp-adaptivity in a challenging seismological

model, where the computation of the surface waves is critical for the prediction of site effects.

64
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

Table 1.11: The properties of the geological structures of the EUROSEISTEST model.

P-wave velocity S-wave velocity Density Ratio VP / VS Max. depth

Basin 1000 to 3027 m/s 200 to 848 m/s 2100 kg/m3 5.00 to 3.57 411 m
Bedrock 4500 to 6144 m/s 2600 to 3444 m/s 2600 to 2755 kg/m3 1.73 to 1.78 8 km

These phenomena arise when the ground motion caused by an earthquake is amplified by
geological structures. Site effects can be related to a sedimentary basin, like for the great
earthquake in Mexico in 1985 (Campillo et al., 1989; Kawase, 2003). The importance of site
effects and their study were the main motivation for setting-up worldwide test sites. Here,
we consider the EUROSEISTEST verification and validation project (Chaljub et al., 2009),
and address the issue of modelling the ground motion in a basin structure. We compare the
results obtained with our method against results computed with spectral element method SEM
(Komatitsch and Vilotte, 1998).

1.7.6.1 Description of EUROSEISTEST verification and validation project

The EUROSEISTEST verification and validation project refers to the geological structure of
the Mygdonian sedimentary basin about 30 km E-NE of the city of Thessaloniki (northern
Greece). It mainly consists of a sedimentary basin with extreme low velocities and a high
Poisson ratio, embedded in high velocity bedrock. The velocity structure of the area is well
known along the central section AB (Figure 1.28.b), following a large number of geophysical and
geotechnical measurements (Jongmans et al., 1998), surface and borehole seismic prospecting,
and electrical soundings and microtremor recordings. The 3D structure in the whole graben
was then extrapolated from this central profile, taking into account information from many
single-point microtremor measurements, some array microtremor recordings, one EW refraction
profile, and old deep boreholes drilled for water-exploration purposes (Raptakis et al., 2005;
Manakou et al., 2007). The sediment thickness indeed increases both to the West and the East
of the central profile, which corresponds to a buried pass between two thicker sub-basins. For
the verification part of the EUROSEISTEST project, a smooth vertical gradient without any
lateral variation was considered. Inside the basin, the velocities vary with the depth as follows
√
VP = 1000 + 100 d
√
VS = 200 + 32 d,

where VP and VS are expressed in m/s, and d is the depth in m. Table 1.11 summarises the
properties of the EUROSEISTEST model. The ratio between the maximum and minimum S-
wave velocities is 17.2. This high factor favours the use of unstructured meshes, as a large range
of different element sizes is expected. Indeed, small elements are required in the basin area
while larger ones can be used in the bedrock. The size of the model is 16 km × 15 km × 8 km
in the directions x,y and z, respectively. M-CPMLs of 2 km width are applied at all edges of
the model, except at the top, where a free surface condition is used. The model topography
is flat. Figures 1.28.a and 1.28.b show the P-wave and S-wave velocities, respectively, on the
free surface in the xy plane. In these figures, the complex shape of the basin and the abrupt

65
FORWARD MODELING

x (km) m/s x (km) m/s

0 5 10 15 0 5 10 15
15 4500 15
2500

4000

2000
3500
10 10
4
3000
y (km)

y (km)
5 A 1500
3

2&7 2500
6
5 5 1000
1 B
2000

1500 500

0 1000 0
(a) (b)

Figure 1.28: (a) View of the mesh in the xy plane at z = 0 m, showing the P-wave velocity
associated with each element in the EUROSEISTEST model. Numbered green triangles, the
receivers; yellow star, source epicentre. (b) Same with the S-wave velocity associated with each
element. The position of the cross-section AB is indicated by the white line.

−3
(a) 0.08 (b) x 10

4.5
0.06
4
Moment−rate (Nm/s)

3.5
0.04
3
Amplitude

0.02 2.5
2
0
1.5
1
−0.02
0.5
−0.04
0 1 2 3 4 5 0 2 4 6 8 10
time (s) Hertz (Hz)

Figure 1.29: (a) Moment-rate function of the source used for the EUROSEISTEST modelling.
(b) Amplitude spectrum of the source.

contrast of velocity at the basin border can be seen. The source is located 5 km below the
basin, and it acts as a double-couple mechanism that represents a small earthquake with a
corner frequency of 4 Hz (Figure 1.29). The epicentre is indicated with a yellow star in Figure
1.28.a. The minimum propagated wavelength is 50 m, and the largest dimension of the model
is 320 λ. We considered seven receivers, as marked with numbered green triangles in Figure
1.28.a, at strategic positions of the true EUROSEISTEST array. All of these receivers lie on
the free surface, except receiver #7, which is buried at 197 m depth just above the source.
Receivers #1 and # 4 are located on the bedrock, and the others are located within the basin
area.

66
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

1.7.7 Numerical results

For the SEM calculations, the size of the computational domain is 16.14 km × 29.31 km ×
7.86 km, and local absorbing boundary conditions were imposed at the lateral and bottom
boundaries, following Komatitsch and Vilotte (1998). The mesh is based on a conforming
layer-cake topology (Komatitsch et al., 2004) where the elements are deformed to follow the
sediment-bedrock interface, except for depths shallower than a threshold value, which was
set to 80 m for the basin. For the elements close to the valley edges, the sediment-bedrock
discontinuity is approximated by assigning different material values to the collocation points
inside the elements. Note that because of the large P-wave velocity in the shallow bedrock, the
choice of the threshold depth directly controls the time step authorised by the CFL stability
condition, and therefore the total CPU time of the simulation.
For the DG-FEM calculations, the size of the numerical model was 20.14 km × 19 km × 8
km in the directions x,y and z, respectively, including M-CPMLs of 2 km width at all edges of
the model, except at the top, where a free-surface condition was used. We adopted the two-step
refinement approach explained in the previous section. In the first step, we built an ad-hoc
tetrahedral mesh with TETGEN. A total of six mesh refinement iterations were required to
reach an adaptive discretisation of 3 elements per λS . Figures 1.30.a, 1.30.b and 1.30.c show
the distribution of the S-wave velocity in the cross-section AB for the first, second and last
iterations of the h-refinement process, respectively. Due to the extremely low velocities in the
basin, the automatic refinement process produced very small elements, which resulted in a fine
and regular discretisation of the basin shape. Figure 1.31.a shows the size of the elements
(insphere radius) on the free surface. As expected, smaller elements are found in the basin area
rather than in the bedrock. In this example, we have taken advantage of the tetrahedral mesh
refinement. Indeed, the volume of the basin represents 0.8% of the complete volume of the
model and it contains 72% of the total number of mesh elements. In the second step, we made
use of p-adaptivity to reduce the number of time steps. We adopted the following criteria: if
the insphere radius is between λS /120 and λS /40, the approximation order is downgraded to
P1 , and if the insphere radius is smaller than λS /120, the approximation is downgraded to
P0 . While most of the tetrahedral elements are adequate for P2 , the badly sized elements are
computed with lower approximation orders. We end up with a mesh that contains in total 16.3
million elements and 131.0 million DOF. The approximation orders are distributed as follows:
67.04% P2 elements, 32.67% P1 elements (with 28.66% elements in the M-CPMLs), and 0.29%
P0 elements. This strategy is shown in Figure 1.31.b, where the approximation order is shown
for each element located on the free surface. Almost all of the elements are P2 elements, except
for those with inappropriate sizes, which are downgraded to P1 or to P0 in the worst cases.
Indeed, the contact between the basin and the bedrock produces a high velocity contrast that
is not ideally accomodated by the tetrahedra. Therefore, some elements located in the bedrock
have smaller sizes than expected, and thus can be treated with lower approximation orders.
These latter are particularly visible in Figure 1.31.b. Some P1 elements also appear at the
border in Figure 1.31.b where the M-CPMLs start.
The seismograms of the components vx and vz computed with DG-FEM and with SEM are
shown in Figures 1.32.a and 1.32.b, respectively. The fit between the DG-FEM and SEM so-
lutions is almost perfect for the vertical component vz , whatever the position of the receivers,
and even at long times. On the other hand, for the horizontal component vx , good agree-
ment is seen for short times, of up to 6-7 s. At later times, some amplitude misfits are seen.
Nevertheless, for all of the traces, the overall fit of the waveforms between the two solutions

67
FORWARD MODELING

x (m)
0 500 1000 1500 2000 2500 3000 3500 4000 4500 m/s
0 600
z (m) −200 400
−400
200
(a)

x (m)
0 500 1000 1500 2000 2500 3000 3500 4000 4500 m/s
0 600
z (m)

−200 400
−400
200
(b)

x (m)
0 500 1000 1500 2000 2500 3000 3500 4000 4500 m/s
0 600
z (m)

−200 400
−400
200
(c)

Figure 1.30: (a) Cross-section AB of the mesh at the first iteration of the h-refinement showing
the S-wave velocity associated with each element in the EUROSEISTEST model. (b) Same as
(a) at the second iteration of the h-refinement. (c) Same as (a) at the sixth and last iteration
of the h-refinement.

x (km) meter x (km) order

0 5 10 15 0 5 10 15
15 15 2
35

10 25 10
y (km)

y (km)

20
1

15
5 5
10

0 0 0
(a) (b)

Figure 1.31: (a) View of the mesh in the xy plane at z = 0 m, showing the size of the elements
(insphere radius) in the EUROSEISTEST model. (b) Same as (a) with the approximation
order associated with each element. White, P2 elements; grey, P1 elements; black, P0 elements.

is remarkable, which indicates that the same and complex wave propagation phenomena are
represented. Contrary to the SEM, for the DG-FEM, constant physical properties per element
were assumed, given by the average of the properties at the four vertices of the elements.
Therefore, the amplitude misfits seen in the DG-FEM seismograms might be the consequence
of the approximations used in the model discretisation, rather than the accuracy of the nu-
merical method itself. The statistics related to the DG-FEM and SEM modelling are given in
Table 1.12. Compared to DG-FEM, the number of DOF used in the SEM modelling is 30%
lower, and the number of time steps is nearly two-fold lower. Both of the simulations were

68
1.7 3D Finite-element Discontinuous Galerkin Method in the Time Domain

Table 1.12: Mesh statistics, computation time and memory allocation relative to the EURO-
SEISTEST modelling. the DG-FEM and SEM computations were both performed with 18
bi-xeon Quadcore CPU IBM E5420 at 2.5 GHz (making a total of 144 cores).

Method Order min. edge max. edge Nb elem. Nb DOF Nb steps Nb CPUs Elapse ti
DG-FEM P2 /P1 /P0 2.5 m 399.8 m 16.3 × 106 131.0 × 106 122 565 144 52 hour
SEM P4 20.0 m 906.0 m 1.4 × 106 91.7 × 106 75 000 144 7 hours

8 8

(a) 7 (b) 7

6 6

5 5
Receiver

Receiver
4 4

3 3

2 2

1 1

0 0
0 5 10 15 20 25 0 5 10 15 20 25
Time (s) Time (s)

Figure 1.32: (a) Seismograms of the component vx computed in the EUROSEISTEST model.
Black line, with DG-FEM; red line, with SEM. (b) Same as (a) with the component vz .

performed on the same computing platform with 18 bi-xeon Quadcore CPU IBM E5420 at
2.5 GHz (giving a total of 144 cores). The methods required similar amounts of memory, and
to obtain 30 s of wave propagation, the computation time was 7 h with SEM and 52 h with
DG-FEM. The computation time per DOF and per step is on average 1.67 µs for DG-FEM,
and 0.52 µs for SEM. Taking into account that the number of unknowns per DOF is nine with
DG-FEM (with first-order velocity-stress formulation) and three with SEM (with second-order
velocity formulation), these two methods yield comparable computation times per unknown.
Therefore, the relative cost of the methods depends mainly on the mesh characteristics. How-
ever, a detailed analysis is required and goes beyond the scope of this study. We can expect,
that in more complicated cases (like a set of thin geological layers), the DG-FEM would be
more efficient, due to the flexibility of tetrahedral meshes. In the following, we present another
comparison tool that allows for a study of the misfits on the complete free surface of the model.
An objective in earthquake engineering is to predict the ground motion for a realistic scenario.
The map of peak ground velocity (PGV) provides a convenient representation that shows the
maximum value of the norm of the velocity vector for each position on the free surface. PGV
maps computed with 30 s of seismic signals are shown in Figure 1.33. The fit between the PGV
map computed with DG-FEM and the PGV map computed with SEM is almost perfect. On
these maps, the paths followed by energetic bundles of surface waves can be seen. When they
reach the basin borders, these bundles are reflected and diffracted. This behaviour can be be
seen in the PGV map in the south-east part of the basin.

69
FORWARD MODELING

x (km) m/s x (km) m/s

−5 −5
0 5 10 15 x 10 0 5 10 15 x 10
15 1 15 1

0.9 0.9

0.8 0.8

0.7 0.7
10 10
4 0.6 4 0.6
y (km)

y (km)
3 5 3 5
0.5 0.5
2&7 2&7
6 0.4 6 0.4
5 5
1 0.3 1 0.3

0.2 0.2

0.1 0.1

0 0 0 0
(a) (b)

Figure 1.33: (a) Peak ground velocity map computed for the EUROSEISTEST modelling with
DG-FEM. Numbered white triangles, the receivers; yallow star, the source epicentre. (b) Same
as (a) computed with SEM.

1.8 Partial conclusion

We have reviewed two numerical methods to perform wave modeling in the frequency domain
with sparse direct solvers and one numerical method in the time doman with explicit time
integration.
Two benefits of the frequency domain compared to the time domain are the straightfor-
ward and inexpensive implementation of attenuation effects by means of complex-valued wave
speeds and the computational efficiency of multi-source modeling when a sparse direct solver
is used to solve the linear system resulting from the discretisation of the wave equation in the
frequency domain. The first discretisation method relies on a parsimonious staggered-grid FD
method based on a compact and accurate stencil allowing for both the minimization of the
numerical bandwidth of the impedance matrix and the number of unknowns in the FD grid.
The discretisation criterion which can be used with this method is 4 grid points per minimum
wavelength. We have shown the efficiency of the method for tackling 3D problems involving
few millions of unknowns and few thousands of right-hand sides on computational platform
composed of a limited number of processors with a large amount of shared memory.
Since the FD lacks geometrical flexibility to discretize objects of complex geometries, we have
developed a 2D discontinuous finite element method on unstructured triangular mesh. The DG
method is fully local in the sense that each element is uncoupled from the next, thanks to the
duplication of variables at nodes shared by two neighboring elements. This uncoupling allows
for a flexible implementation of the so-called h − p adaptivity, where the size of the element can
be adapted to the local features of the model and the order of the interpolating polynomials
can be adapted within each element. The price to be paid for the geometrical flexibility of the
discretisation is the increase of the number of unknowns compared to continuous finite element
methods. We have illustrated the fields of application where the frequency-domain DG method
should perform well. A first perspective of this work concerns the investigation of other linear
algebra techniques to solve the linear system and overcome the limits of sparse direct solver
in terms of memory requirement and limited scalability. Use of domain decomposition meth-

70
1.8 Partial conclusion

ods based on hybrid direct-iterative solvers should allow us to tackle 3D problems of higher
dimensions. A second perspective is the improvement of the frequency-domain DG method
to make possible the extension to 3D. One possible improvement is the use of heterogeneous
medium properties in each element of the mesh to allow for higher-order interpolation orders.
Another field of investigation concerns the numerical flux, which is a central ingredient of the
DG method. Although we used centred fluxes for their energy conservation properties, other
fluxes such as upwind fluxes should be investigated for improved accuracy of the scheme.
The time domain is still appealing, especially in the 3D geometry for elastic wave propaga-
tion. We have proposed a DG-FEM with CPML absorbing boundary condition that benefits
most from hp-adaptivity combined with tetrahedral meshes. The gain obtained with this
method in the context of 3D seismic elastic modelling is important when complex geological
structures are considered, especially if the medium has highly contrasting physical properties.
In our approach, we favour the use of low approximation orders which allows fine discretisa-
tion of the medium with piece-wise constant properties per element. From this point of view,
an optimal compromise between precision, computational cost and adequate discretisation is
achieved with the P2 interpolation. For efficient reduction of the computation time, CPMLs
were designed with lower approximation orders and they allowed a saving of between 40% and
60% of CPU time on large clusters. Moreover, we mitigated the effects of ill-sized tetrahedral
elements by automatically choosing the appropriate approximation order for each element, and
hence we have kept the number of time steps as low as possible. In our case, the so-called
p-adaptivity technique can reduce the number of time steps by a factor of five. Consequently,
when combined with the low-cost CPMLs, computation times are generally reduced by nearly
one order of magnitude, compared with the times observed with standard DG-FEM modelling
using a unique approximation order.
The potential and the perspectives concerning this method are numerous. For the limitations
of our formulation, we note the possibility of attributing varying physical properties inside the
elements. This would release the discretisation constraint and would allow the use of higher
approximation orders, thus reducing the number of elements and the computational cost of
the simulations. For completeness, we note another possible means of releasing the discreti-
sation constraint, with non-conforming meshing, although the expected gain does not appear
as crucial in the case of tetrahedral meshes as it is with hexahedral meshes. Apart from
these possible evolutions, we intend to include visco-elastic rheologies (Käser et al., 2007) and
to apply the method to realistic problems requiring appropriate discretisations of geological
structures and/or large material contrasts. Due to the discontinuous nature of the method,
rupture mechanisms, like earthquake dynamic rupture, might be modelled (BenJemaa et al.,
2007, 2009; De la Puente et al., 2009). This method can also be applied to seismic modelling
in cases of complex topographies, or be used as a forward modelling tool for FWI techniques
(Tarantola, 1987; Pratt et al., 1998).

71
Chapter 2

Inverse problem : theory

This chapter is mainly based on the paper :

• Virieux, J. and S. Operto [2009] An overview of full-waveform inversion in exploration

geophysics. Geophysics 74(6), WCC1–WCC26.

2.1 Introduction

In the eighties, Lailly (1983) and Tarantola (1984) recasted the migration imaging principle of
Claerbout (1971, 1976) as a local optimization problem, the aim of which was the least-squares
minimization of the misfit between the recorded and the modeled data. They show that the
gradient of the misfit function, along which the perturbation model is searched, can be built
by a weighted cross-correlation between the incident wavefield emitted from the source and
the backpropagated residual wavefields from the receivers. The velocity perturbation obtained
after at the first iteration of the local optimization looks like a migrated image obtained by
reverse-time migration. One difference is that the seismic wavefield recorded at the receiver is
backpropagated in reverse time migration, whereas the data misfit is backpropagated in the
waveform inversion of Lailly (1983) and Tarantola (1984). When added to the initial velocity,
the velocity perturbations lead to an updated velocity model, which is used as a starting model
for the next iteration of the minimization of the misfit function. The impressive amount of data
included in seismograms (each sample of a time series has to be considered) is now involved in
gradient estimation. This estimation is performed by summation over sources, receivers and
time.
Waveform-fitting imaging was quite computer-demanding at that time, even for two-dimen-
sional (2D) geometries (Gauthier et al., 1986). It has, however, been applied successfully in
various studies using different forward-modeling techniques, such as reflectivity techniques in
layered media (Kormendi and Dietrich, 1991), finite-difference techniques (Kolb et al., 1986;
Ikelle et al., 1988; Pica et al., 1990; Crase et al., 1990; Djikpéssé and Tarantola, 1999), finite-
element methods (Choi et al., 2008), and extended ray theory (Cary and Chapman, 1988; Koren
et al., 1991; Sambridge and Drijkoningen, 1992). A less computationally intensive approach was
achieved by Jin et al. (1992) and Lambaré et al. (1992) who established the theoretical connec-
tion between ray-based generalized Radon reconstruction techniques (Beylkin, 1985; Bleistein,
INVERSE PROBLEM : THEORY

1987; Beylkin and Burridge, 1990) and least-squares optimization (Tarantola, 1987). By defin-
ing a specific norm in the data space, which varies from one focusing point to the next, they
were able to recast the asymptotic Radon transform as an iterative, least-squares optimization
after diagonalization of the Hessian operator. Applications on 2D synthetic data and real data
have been provided (Thierry et al., 1999b; Operto et al., 2000a) and 3D extension has been
possible (Thierry et al., 1999a; Operto et al., 2003), due to efficient asymptotic forward model-
ing (Lucio et al., 1996). As the Green’s functions are computed in smooth media with the ray
theory, the forward problem is linearized with the Born approximation, and the optimization
is iterated in a linear way, which means that the background model remains the same over the
iterations. These imaging methods are generally called migration/inversion or true-amplitude
prestack depth migration. As the smooth background model does not change over iterations
and only the single-scattered wavefield is modeled by means of the linearization of the forward
problem, this is the main difference with the waveform inversion methods described here.
Alternatively, the full information content of the seismogram can be considered in the
optimization. This leads us to full-waveform inversion (FWI), where at each iteration of the
optimization, the full-wave equation is solved in the final model of the previous iteration. All
types of waves are involved in the optimization, including diving waves, super-critical reflections
and multi-scattered waves, such as intern multiples and free-surface related multiples if the free
surface is modelled.
FWI has not been recognized as an efficient seismic imaging technique because pioneering
applications were restricted to seismic reflection data: for short-offset acquisition, the seismic
wavefield is rather insensitive to the intermediate wavelengths, and, therefore, the optimization
cannot adequately reconstruct the true velocity structure through iterative updates. Only when
a sufficiently accurate initial model is provided, can the fitting of waveforms converge to the
velocity structure through such updates. For sampling the initial model, sophisticated investi-
gations with global and semi-global techniques (Koren et al., 1991; Jin and Madariaga, 1993,
1994; Mosegaard and Tarantola, 1995; Sambridge and Mosegaard, 2002) have been performed.
The rather poor performance of these investigations, that arises from the above-mentioned
insensitivity has given many researchers the feeling that this optimization technique is not
particularly efficient.
Only with the benefit of long-offset and transmission data to reconstruct the large and in-
termediate wavelengths of the structure did FWI reached its maturity, as has been highlighted
by Mora (1987), Mora (1988), Pratt and Worthington (1990), Pratt et al. (1996) and Pratt
(1999). FWI attempts to characterize a broad and continuous wavenumber spectrum at each
point of the model, hence reunifying the macromodel building and the migration tasks into a
single procedure. There have been historical cross-hole and wide-angle surface data examples,
that have illustrated the capacity of the simultaneous reconstruction of the entire spatial spec-
trum (e.g., Pratt, 1999; Ravaut et al., 2004; Brenders and Pratt, 2007a). Robust application
of FWI to long-offset data remains challenging, however, because of increasing nonlinearities
introduced by wavefields propagated over several tens of wavelengths and various incidence
angles (Sirgue, 2006).
In the following, we review the main theoretical aspects of FWI based on a least-squares
local optimization approach. We follow the compact matrix formalism because of its simplicity
(Pratt et al., 1998; Pratt, 1999). This leads to a clear interpretation of the gradient and of the
Hessian of the objective function. We also briefly review different optimization algorithms in
order to compute the perturbation model, and introduce regularization in FWI.

74
2.2 FWI as a least-squares local optimization

2.2 FWI as a least-squares local optimization

We follow the simplest view of FWI based on the so-called length method (Menke, 1984).
The reader interested in the probabilistic maximum likelihood or on the generalized inverse
formulations is referred to Menke (1984), Tarantola (1987), Scales and Smith (1994) and Sen
and Stoffa (1995).
We define the misfit vector ∆d = dobs − dcal (m) by the difference at the receiver positions
between the recorded seismic data dobs and the modeled one dcal (m), for each source of the
seismic survey. dcal can be related to the modeled seismic wavefield u by an operator R,
which extracts the values of the wavefields computed in the full computational domain at
the receiver positions for each source: dcal = Ru. Note that depending on the acquisition,
the R operator allows to extract from the forward modelled wavefields, several components
of multi-component sensor for each receiver position. The model m represents some physical
parameters of the subsurface discretized over the computational domain. In the simplest case
corresponding to the mono-parameter acoustic approximation, the model parameters are the
P-wave velocities defined at each node of the numerical mesh used to discretize the inverse
problem. In the extreme case, the model parameters would correspond to the twenty-one elastic
moduli characterizing linear triclinic elastic media, the density, and some memory variables
characterizing the anelastic behavior of the subsurface (Toksöz and Johnston, 1981). The most
common discretization consists of the projection of the continuous model of the subsurface on
a multidimensional Dirac comb but more complex basis can be considered (see Appendix A in
Pratt et al. (1998) for a discussion on alternative parameterization). We define a norm C(m)
of this misfit vector ∆d, referred to as the misfit function or the objective function. We focus
below on the least-squares norm, which is easier to manipulate from a mathematical viewpoint
(Tarantola, 1987). Other norms will be discussed later.

2.2.1 The Born approximation and the linearization of the inverse problem

The least-squares norm is given by

1
C(m) = ∆d† ∆d, (2.1)
2
where † denotes the adjoint operator (complex conjugate). Equation 2.1 is considered for one
seismic source.
In the time domain, an implicit summation in equation (2.1) is performed over the number
of time samples in the seismograms. In the frequency domain, the summation over frequencies
replaces that over time. In the time domain, the misfit vector is real valued, whereas in the
frequency domain it is complex valued. As we shall show later, an advantage of the frequency
domain is that the number of frequencies can be significantly smaller than the number of time
samples without producing aliasing in the reconstructed models (e.g., Pratt and Worthington,
1990; Pratt, 1999; Sirgue and Pratt, 2004; Brenders and Pratt, 2007a).
The minimum of the misfit function C(m) is searched in the vicinity of the starting model
m0 : the FWI is essentially a local optimization. In the frame of the Born approximation, we
assume than the updated model m can be written as the sum of the starting model m0 plus
a perturbation model ∆m.
m = m0 + ∆m. (2.2)

75
INVERSE PROBLEM : THEORY

In the following, we shall assume that the model m is real valued.

A second-order Taylor-Lagrange development of the misfit function in the vicinity of the
model m0 gives the following expression

M M M
X ∂C(m0 ) 1 X X ∂ 2 C(m0 )
C(m0 + ∆m) = C(m0 ) + ∆mj + ∆mj ∆mk + O(m3 ), (2.3)
∂mj 2 ∂mj ∂mk
j=1 j=1 k=1

where the integer M denotes the number of elements in the vector m. Taking the derivative
with respect to the model parameter ml gives

M
∂C(m) ∂C(m0 ) X ∂ 2 C(m0 )
= + ∆mj , (2.4)
∂ml ∂ml ∂mj ∂ml
j=1

which gives in matrix form the following expression

∂C(m) ∂C(m0 ) ∂ 2 C(m0 )

= + ∆m. (2.5)
∂m ∂m ∂m2
The minimum of the misfit function in the vicinity of the point m0 is reached when the first
derivative of the misfit function vanishes, which gives an expression of the perturbation model
written as
2 −1
∂ C(m0 ) ∂C(m0 )
∆m = − . (2.6)
∂m2 ∂m
The perturbation model is searched in the opposite direction of steepest ascent (i.e., the gra-
dient) of the misfit function at the point m0 . The second derivative of the misfit function is
the Hessian and defines the curvature of the misfit function at the point m0 . Of note, the
error term O(m3 ) is zero when the misfit function is a quadratic function of the model m.
This is the case for linear forward problems such as u = Gm. In this case, the expression of
the perturbation model of equation (2.6) will give the minimum of the misfit function in one
iteration. In FWI, the relationship between the data and the model is nonlinear u = G (m)
and the inversion needs to be iterated several times to converge towards the minimum of the
misfit function.

2.2.2 Normal equations: Newton, Gauss-Newton and steepest-descent meth-

ods

2.2.2.1 Basic equations

The derivative of C(m) with respect to the model parameter ml gives

N
∂d∗cali

∂C(m) 1 X ∂dcali ∗
= − (dobsi − dcali ) + (dobsi − dcali )
∂ml 2 ∂ml ∂ml
i=1
N
∂dcali ∗
X
= − < (dobsi − dcali ) , (2.7)
∂ml
i=1

76
2.2 FWI as a least-squares local optimization

where the real part of a complex number and the number of elements of the data misfit vector
are denoted by the symbol < and by N respectively. The conjugate of a complex number is
denoted by the symbol ∗ . In matrix form, equation (2.7) translates to
" #
∂dcal (m) †

∂C(m) h i
∇Cm = = −< (dobs − dcal (m)) = −< J† ∆d , (2.8)
∂m ∂m

where J is the sensitivity or the Fréchet derivative matrix. In equation (2.8), ∇Cm is a vector
of dimension M . Taking m = m0 in equation (2.8) provides the descent direction along which
the perturbation model is searched, equation (2.6).
Differentiation of the gradient expression, equation (2.7), with respect to model parameters
gives in matrix form (see Pratt et al. (1998) for details) the following expression
∂ 2 C(m0 ) ∂J0 t
h i
† ∗ ∗
= < J 0 0J + < (∆d ...∆d ) . (2.9)
∂m2 ∂mt
Plugging the expression of the gradient and of the Hessian gives for the perturbation model
−1 h
∂J0 t
i
† ∗ ∗ †
∆m = − < J0 J0 + (∆d ...∆d ) < J 0 ∆d 0 . (2.10)
∂mt
The method providing the solution of the normal equations, equation (2.10), is generally re-
ferred to as the Newton method, which is locally quadratically convergent.
For linear problems (d = Gm), the second term in the Hessian is zero because the second-
order derivative of the data with respect to model parameters is zero. This second-order term
is most of the time neglected even for nonlinear inverse problems. The methodproviding the
solution of equation (2.10) when only the approximate Hessian Ha = < J0 † J0 is taken into
account, is generally referred to as the Gauss-Newton method.
Alternatively, the inverse of the Hessian can be replaced in equation (2.10) by a scalar α, the
so-called step length, leading to gradient or steepest-descent method. The steplength can be
estimated by a a line-search method, for which a linear approximation of the forward problem
is used (Gauthier et al., 1986; Tarantola, 1987). In the linear approximation framework, the
second-order Taylor-Lagrange development of the misfit function gives
C(m − α∇C(m0 )) = C(m) − α < ∇C(m)|∇C(m0 ) >
1 2
+ α Ha (m) < ∇C(m0 )|∇C(m0 ) >, (2.11)
2
where we assumed a model perturbation of the form ∆m = α∇C(m0 ). In equation (2.11), we
replaced the second-order derivative of the misfit function by the approximate Hessian in the
second term of the right-hand side term. Plugging the expression of the approximate Hessian
Ha in the previous expression, zeroing the partial derivative of the misfit function with respect
to α, and using m = m0 gives
< ∇C(m0 )|∇C(m0 ) >
α= . (2.12)
< Jt (m t
0 )∇C(m0 )|J (m0 )∇C(m0 )) >

The term Jt (m0 )∇C(m0 ) is conventionally computed using a first-order-accurate finite-difference

approximation of the partial derivative of G,
∂G(m0 ) 1
∇C(m0 ) = (G(m0 + ∇C(m0 )) − G(m0 )) , (2.13)
∂m

77
INVERSE PROBLEM : THEORY

with a small parameter . The estimation of α requires solving an extra forward problem per
shot for the perturbed model m0 + ∇C(m0 ). This line-search technique was extended to
multiple-parameter classes by Sambridge et al. (1991) using a subspace approach. In this case,
one forward problem must be solved per parameter class, which can reveal computationally
expensive. Alternatively, the steplength can be estimated by parabolic interpolation through
three different points (α, C(m0 +α∇C(m0 )): the minimum of the parabola provides the desired
α. In this case, two extra forward problems per shot must be solved as we have already a third
point corresponding to (0, C(m0 )) (see Figure 1 in Vigh et al. (2009) for an illustration).
Pratt et al. (1998) have illustrated how the quality and the rate of the convergence of
the inversion depend significantly on the Newton, Gauss-Newton or gradient methods to be
used. Importantly, they have shown how the gradient method can fail to converge towards
an acceptable model unlike the Gauss-newton and Newton methods, whatever the number of
iterations is. They interpret this failure as the result of the difficulty to estimate a reliable
steplength. Gradient methods can be however significantly improved by using some judicious
scaling provided by the diagonal terms of the approximate Hessian or of the pseudo-Hessian
(Shin et al., 2001). Computation of the diagonal terms of the Hessian requires to compute each
coefficient of the sensitivity matrix J0 unlike the gradient (see next section).
h i
∆m = −αDiag (Ha )−1 < J†0 ∆d0 . (2.14)

2.2.2.2 Numerical algorithms: Conjugate-gradient method

Over the last decade, the most popular local optimization algorithm to solve FWI problems was
based on the conjugate gradient method (Mora, 1987; Tarantola, 1987; Crase et al., 1990). In
the conjugate gradient method, the model is updated at the iteration n in the direction of p(n) ,
which is a linear combination of the gradient at iteration n, ∇C (n) , and the previous direction
p(n−1) . Therefore, only two gradients of the misfit function are stored for the implementation
of the conjugate-gradient algorithm,

p(n) = ∇C n + β (n) p(n−1) . (2.15)

The scalar β (n) is designed to guarantee that p(n) and p(n−1) are conjugate.
Among the different variants of the conjugate gradient method to derive the expression
of β (n) , the Polak-Ribière formulae (Polak and Ribière, 1969) is generally used for waveform
inversion of seismic data,

< ∇C (n) − ∇C (n−1) |∇C (n) >

β (n) = . (2.16)
||∇C (n) ||2

In FWI, the preconditioned gradient Wm −1 ∇C (k) is used for p(n) where W is a weighting
m
operator that will be introduced in the next section (Mora, 1987).

2.2.2.3 Numerical algorithms: Quasi-Newton algorithms

Finite approximations of the Hessian or its inverse can be computed by using quasi-Newton
method such as the BFGS (named for its discoverers, Broyden, Fletcher, Goldfarb and Shanno)

78
2.2 FWI as a least-squares local optimization

algorithm (Nocedal, 1980). The governing idea is to update the approximation of the Hessian
B (n) or of its inverse H(n) at each iteration of the inversion taking into account the additional
knowledge provided by ∇C (n) at the iteration n. In these approaches, the approximation of
the Hessian or its inverse is explicitly formed.
For large-scale problems such as FWI, where the cost of storing and working with the
approximation of the Hessian matrix is prohibitive, a limited-memory variant of the quasi-
Newton BFGS method, the so-called L-BFGS algorithm, allows to compute in a recursive
manner H(n) ∇C (n) without forming explicitly H(n) . Only few gradients of the previous non-
linear iterations (typically, between 3 and 20 iterations) need to be stored in L-BFGS, which
represents a negligible storage and computational cost compared to the conjugate-gradient al-
gorithm. The algorithm is described in Nocedal (1980, p. 177-180). The L-BFGS algorithm
requires an initial guess H(0) , which can be provided by the inverse of the diagonal Hessian
(Brossier et al., 2009a). For multiparameter FWI, the L-BFGS algorithm provides a suit-
able scaling of the gradients computed for each each parameter class and, hence, provides a
computationally-efficient alternative to the subspace method of Sambridge et al. (1991). A
comparison between the conjugate-gradient method and the L-BFGS method for a realistic
onshore application of multiparameter elastic FWI is shown in Brossier et al. (2009a).

2.2.2.4 Newton and Gauss-Newton algorithms

More accurates but more computationally-intensive, Gauss-Newton and Newton algorithms are
also described in Akcelik (2002), Askan et al. (2007), Askan and Bielak (2008) and Epanomer-
itakis et al. (2008), with an application to a 2D synthetic model of the San Fernando Valley
using the SH-wave equation. At each nonlinear iteration of the FWI, a matrix-free conjugate-
gradient is used to solve the reduced Karush-Kuhn-Tucker (KKT) optimal system, which turns
out to be similar to the normal-equation system, equation (2.10). Neither the full Hessian nor
the sensitivity matrix is explicitely formed: only the application of the Hessian to a vector
needs to be performed at each iteration of the conjugate-gradient algorithm. Application of
the Hessian to a vector requires to perform two forward problems per shot for the incident
and the adjoint wavefields (Akcelik, 2002). Since these two simulations are performed at each
iteration of the conjugate-gradient algorithm, an efficient preconditioner must be used to mit-
igate the number of iterations of the conjugate-gradient algorithm. For preconditioner of the
conjugate gradient, Epanomeritakis et al. (2008) used a variant of the L-BFGS method, where
the curvature of the objective function is updated at each iteration of the conjugate gradient
using the Hessian-vector products collected over the iterations.

2.2.3 The gradient and Hessian in FWI: interpretation and computation

A clear interpretation of the gradient and Hessian has been given by Pratt et al. (1998) using
the compact matrix formalism of frequency-domain FWI. A brief review is given here. Let
us consider the forward-problem equation given by equation (1.2) for one source and for one
frequency. In the following, we shall assume that the model is discretized in a finite-difference
sense using a uniform grid of nodes.
Differentiation of equation (1.2) with respect to model parameter ml gives the expression

79
INVERSE PROBLEM : THEORY

of the partial derivative wavefield, ∂u/∂ml , by solving the following system

∂u
B = f (l) , (2.17)
∂ml

where
∂B
f (l) = − u. (2.18)
∂ml
Analogy between the forward-problem equation, equation (1.2), and the equation (2.17) shows
that the partial derivative wavefield can be computed by solving one forward problem, the
source of which is given by f (l) . This so-called virtual secondary source is formed by the
product between the matrix ∂B/∂ml and the incident wavefield. The matrix ∂B/∂ml is built
by differentiating each coefficient of the forward-problem operator B with respect to ml .
The matrix ∂B/∂ml is extremely sparse. The spatial support of the virtual secondary
source is centered on the position of ml , while the temporal support of f (l) is centered around
the arrival time of the incident wavefield at the position of ml . Therefore, the partial derivative
wavefield with respect to model parameter ml can be viewed as the wavefield emitted by the
seismic source s and scattered by a point diffractor located at ml . The radiation pattern of
the virtual secondary source is controlled by the operator ∂B/∂ml . Analysis of this radiation
pattern for different parameter classes allow to assess to which extent parameters of different
natures are uncoupled in the tomographic reconstruction as a function of the diffraction angle,
and, hence can be reliably reconstructed during FWI. Radiation patterns for the isotropic
acoustic, elastic, and visco-elastic wave equations are shown in Wu and Aki (1985); Tarantola
(1986); Forgues and Lambaré (1997); Ribodetti and Virieux (1996).
Since the gradient is formed by the zero-lag correlation between the partial derivative wave-
field and the data residual, the data residuals and the partial derivative wavefields have the
same meaning: they represent wavefields scattered by the missing heterogeneities in the start-
ing model m0 (Tarantola, 1984; Pratt et al., 1998). The interpretation of the partial deriva-
tive wavefield in terms of scattered wavefield draws some clear connections between FWI and
diffraction tomography: the perturbation model can be represented by a series of closely-spaced
diffractors. In virtue of the Huygens’ principle, the image of the model perturbations is built by
superposition of the elementary image of each diffractor and the seismic wavefield perturbation
is built by superposition of the wavefields scattered by each point diffractor (McMechan and
Fuis, 1987). The interpretation of the partial derivative wavefields gives also clear insights on
the effects of the Hessian on the reconstruction. The approximate Hessian is formed by the
zero-lag correlation between the partial derivative wavefields. The diagonal terms of the ap-
proximate Hessian contains the zero-lag autocorrelation and therefore, represent the square of
the amplitude of the partial derivative wavefield. Scaling the gradient by these diagonal terms
remove from the gradient the geometrical amplitude effects associated with the partial deriva-
tive wavefields and the residuals. In the frame of surface seismic experiments, the effect of the
scaling performed by the diagonal Hessian is to provide a good balancing between the shallow
and the deep perturbations. A diagonal Hessian is shown in Ravaut et al. (2004, Figure 12).
The off-diagonal terms of the Hessian are computed by correlation between partial derivative
wavefields associated with different model parameters. For 1D media, the approximate Hessian
is a band-diagonal matrix, the numerical bandwidth decreasing as the frequency increases. The
off-diagonal elements of the approximate Hessian accounts for the limited-bandwidth effects
resulting from the experimental setup. Applying its inverse to the gradient can be interpreted

80
2.2 FWI as a least-squares local optimization

Distance (km)
a) 0 1 2 3
0

Depth (km)
1

b) 0 1 2 3
0

Depth (km)
1

3
c) 0 1 2 3
0
Depth (km)

Figure 2.1: Reconstruction of an inclusion by frequency-domain FWI. a) true model. (b-c)

FWI models built by a preconditioned gradient method (b) and by a Gauss-Newton method
(c). Four frequencies (4, 5, 7 and 10 Hz) were inverted. One iteration per frequency was
computed. Fourteen shots are deployed along the top and the left edges of the model. The
shots along the top edge are recorded by fourteen receivers along the bottom edge, while The
shots along the left edges are recorded by fourteen receivers along the right edge. The P-wave
velocities in the background and in the inclusion are 4.0 km/s and 4.2 km/s respectively. Of
note, the improved focusing of the inclusion resulting from the deconvolution of the gradient
by the approximate Hessian in the Gauss-Newton method.

as a deconvolution of the gradient from this limited bandwidth effects. An illustration of the
scaling and deconvolution effects performed by the diagonal Hessian on one hand and by the
approximate Hessian on the other hand is illustrated in Figure 2.1, where a single inclusion in
a homogeneous background model (Figure 2.1a) is reconstructed by one iteration of FWI using
a gradient method preconditioned by the diagonal terms of the approximate Hessian (Figure
2.1b) and by a Gauss-Newton method (Figure 2.1c). The model obtained with the Gauss-
Newton method is better focused than the one obtained with the scaled gradient method. The
corresponding approximate Hessian and its diagonal elements are shown in Figure 2.2.
An interpretation of the second term of the Hessian, equation (2.9), is given in Pratt et al.
(1998). This term accounts for multi-scattering events such as multiples in the reconstruction
procedure. Through iterations, we may correct effects due to this missing term as long as
convergence is achieved.
Although equation (2.17) gives some clear insight of the physical sense of the gradient of

81
INVERSE PROBLEM : THEORY

Column number
a) 0
0 200 400 600 800

200

400

Row number
600

800

Distance (km)
b) 0 5 10 15 20 25 30
0

10
Depth (km)

Figure 2.2: a) Approximate Hessian corresponding to the 31 x 31 model of Figure 2.1 for
a frequency of 4 Hz. A close-up of the area delineated by the yellow square highlights the
band-diagonal structure of the Hessian which describes the correlation between partial deriva-
tive wavefields associated with closely-spaced diffractors. b) Corresponding diagonal terms of
the Hessian plotted in the distance-depth domain. The high-amplitude coefficients indicate
the source and receiver positions. Scaling the gradient by this map removes the geometrical
amplitude effects from the wavefields.

the misfit function, it is unpractical from a computer-implementation point of view because

explicitly forming the sensitivity matrix with equation (2.17) would require to perform as many
forward problems as the number of model paremeter ml (l = 1, M ) for each source of the survey.
For the mitigation of this computational burden, spatial reciprocity of Green’s functions can
be exploited as shown below.
Plugging the expression of the partial derivative of the wavefield, equation (2.17), in the
expression of the gradient, equation (2.8), gives the following expression of the gradient
" t #
t ∂B −1t t ∗
∇Cl = < u B R ∆d , (2.19)
∂ml

The column of B−1 corresponds to the Green’s functios for unit impulse sources located at
each node of the model. In virtue of the spatial reciprocity of Green’s functions, the matrix
t
B−1 is symmetric, hence, B−1 can be substituted by B−1 in equation (2.19). This gives the

82
2.2 FWI as a least-squares local optimization

following expression
" t # " t #
∂B ∂B
∇Cl = < ut B−1 Rt ∆d∗ = < ut rb . (2.20)
∂ml ∂ml

Wavefield rb corresponds to the backpropagated residual wavefield. All the residuals associated
with one seismic source are assembled to form one residual source. The backpropagation in
time is indicated by the conjugate operator in the frequency domain. The number of forward
seismic problems for the computation of the gradient is reduced to two: one to compute the
incident wavefield u and one to backpropagate the corresponding residuals. The underlying
imaging principle is that of reverse time migration and relies on the correspondence of the
arrival times of the incident wavefield and of the backpropagated wavefield at the position of
the heterogeneity (Claerbout, 1971; Lailly, 1984; Tarantola, 1984).
The approach consisting of computing the gradient of the misfit function without explicitly
building the sensitivity matrix is often referred to as the adjoint-wavefield approach by the geo-
physical community. The mathematical theory allowing to compute the gradient of a functional
without forming the sensitivity matrix is the adjoint-state method (Lions, 1972). A detailed
review of the method with illustrations to several seismic problems is given in Tromp et al.
(2005), Plessix (2006), Askan (2006) and Epanomeritakis et al. (2008) for different seismological
problems. The expression of the gradient of the frequency-domain FWI misfit function, equa-
tion (2.20), is derived with the adjoint-state method and the method of Lagrange multiplier in
appendix A.
One element of the sensitivity matrix is given by
h ti
∂B
Jk(s,r),l = pts ∂m l
B−1 δr , (2.21)

where k(s, r) denotes a source-receiver couple of the acquisition geometry, with s and r a shot
and a receiver position, respectively. δr is an impulse source located at the receiver position
r. If the sensitivity matrix must be built, one forward problem for the incident wavefield and
one forward problem per receiver position must be computed. Therefore, explicit building of
the sensitivity matrix can be significantly more computationally expensive than that of the
gradient if the number of non redundant receiver positions significantly exceeds that of the non
redundant shots or vice versa. Explicit building of the sensitivity matrix can be required to
compute the diagonal terms of the Hessian, which provides a judicious scaling of the gradient
(Ravaut et al., 2004). To overcome the computational burden associated with the building of
the sensitivity matrix for coarse OBS surveys, (Operto et al., 2006) suggested to compute the
diagonal Hessian for a decimated shot acquisition. Alternatively, Shin et al. (2001) proposed to
use an approximation of the diagonal Hessian, which can be computed at the same cost than
the gradient.
Whereas the matrix-free adjoint approach is widely used in exploration seismology, the
earthquake-seismology community tends to favor method based on the explicit building of
the sensitivity matrix, the so-called scattering-integral method (Chen et al., 2007): the linear
system relating the model perturbation to the data perturbation is formed and solved with a
conjugate-gradient algorithm such as LSQR (Paige and Saunders, 1982a). Although only phase
of selected arrivals is involved in the inversion, Green’s functions in the sensitivity kernel are
computed with the two-way wave equation (Tromp et al., 2005; Chen et al., 2007). A compara-
tive complexity analysis of the adjoint approach and the scattering-integral approach approach

83
INVERSE PROBLEM : THEORY

is presented in Chen et al. (2007). They concluded that the scattering-integral approach out-
performed the adjoint one for a regional tomographic problem, although it requires more disk
storage. Indeed, the superiority of one approach compared to the other is highly dependent to
the acquisition geometry (the relative number of sources and receivers) as well as the number
of model parameters. We speculate that for large-scale problems involving thousands of shots
and receivers, the scattering-integral approach will become intractable.
For multiple sources and multiple frequencies, the gradient is formed by the summation
over sources and over frequencies
Nω XNs
" t #
t ∂Bi
X
−1 −1 t ∗
∇Cl = < [Bi sj ] [Bi R ∆di,j ] . (2.22)
∂ml
i=1 j=1

Let us remind that matrices B−1 i (i = 1, Nω ) do not depend on shots, therefore any speedup on
resolving systems involving these matrices with multiple sources should be considered (Marfurt,
1984; Jo et al., 1996; Stekl and Pratt, 1998).
The formalism in equation (2.22) has been kept as general as possible and could be either
for the acoustic or for elastic wave equation. In the acoustic case, the wavefield is the pressure
scalar wavefield whereas, in the elastic case, the wavefield is ideally formed by the components
of the particle velocity and the pressure if the sensors have four components. Equation (2.22)
can be translated in the time domain using the Parseval relation. The expression of the
gradient in equation (2.22) can be equivalently developed using a functional analysis (Tarantola,
1984). The partial derivatives of the wavefield with respect to the model parameters are
provided by the kernel of the Born integral relating the model perturbations to the wavefield
perturbations. Multiplication of the transpose of the resulting operator to the conjugate of
the data residuals provides the expression of the gradient. The two formalisms (the matricial
one and the functional one) give the same expression provided that the discretization of the
partial differential operators are performed consistently in the two approaches. The derivation
in the frequency domain of the gradient of the misfit function using the two formalisms has
been provided explicitly by Gelis et al. (2007).

2.2.4 Regularization and preconditioning of inversion

As widely underlined, FWI is an ill-posed problem, meaning that an infinity of models match
the data. Some regularizations are conventionally applied to the inversion to make it better
posed (Menke, 1984; Tarantola, 1987; Scales et al., 1990). The misfit function can be augmented
as follows
1 1
C(m) = ∆d† Wd ∆d + λ (m − mprior )† Wm (m − mprior ) (2.23)
2 2
where Wd = Std Sd and Wm = Stm Sm . Wd and Wm are weighting operators, the inverse of
the data and model covariance operators in the frame of the Bayesian formulation of FWI
(Tarantola, 1987).
Sd can be implemented as a diagonal weighting operator, which controls the respective
weight of each element of the data misfit vector. For example, Operto et al. (2006) have used
Sd as a power of the source-receiver offset to strengthen the contribution of large-offset data for
crustal-scale imaging. In geophysical applications where one often seeks the smoothest model
that fits the data, the aim of the least-squared regularization term in the augmented misfit

84
2.2 FWI as a least-squares local optimization

function, equation (2.23), is to minimize the roughness of the model m, hence, defining the
so-called Tikhonov regularization. The operator Sm is generally a roughness operator such as
the first-difference or the second-difference matrices (Press et al., 1986, p. 1007).
For linear problem (assuming the second term of the Hessian is neglected), the minimization
of the weighted misfit function gives for the perturbation model,

J0 † Wd J0 + Wm λ ∆m = −< J0 t Wd ∆d∗ + λWm (m − mprior )

−1
J0 † Wd J0 + λ ∆m = −< Wm −1
J0 t Wd ∆d∗ + λ(m − mprior )

Wm (2.24)

Let us note that, since Wm is a roughness operator, Wm −1 is a smoothing operator. It can

be implemented, for example, with a multidimensional adaptive Gaussian smoother (Ravaut

et al., 2004; Operto et al., 2006) or with a low-pass filter in the wavenumber domain (Sirgue,
2003).
For the steepest-descent algorithm, the regularized solution for the perturbation model is
given by
−1
< Jt0 Wd ∆d∗0 ,

∆m = −αWm (2.25)
considering m = mprior
As mentioned above, the gradient can be also efficiently scaled by the inverse of the diagonal
Hessian. This scaling can be embedded into the expression of Wm −1 = G/diag (H ) where G
a
denotes a smoothing operator. A more complete and rigorous mathematical derivation of these
equations is presented in Tarantola (1987).
Alternative regularizations based on the minimization of the total variation of the model,
have been developed, mainly by the image-processing and electromagnetic communities, the
aim of which is to preserve both the blocky and the smooth characteristics of the model. These
regularizations are called edge-preserving regularization or total variation regularization (Vogel
and Oman, 1996; Vogel, 2002). Total variation regularization is conventionally implemented by
minimization of the L1 norm of the model-misfit function, RT V = (∆mWm ∆m)1/2 . Alterna-
tively, van den Berg and Abubakar (2001) have implemented the total variation regularization
as a multiplicative constraint in the original misfit function. In this framework, the original
misfit function can be seen as the weighting factor of the regularization term, which is automat-
ically updated by the optimization process without the need of an heuristic tuning. Application
of the total variation regularization to FWI is shown in Askan and Bielak (2008). Application
of the weighted L2 -norm regularization to frequency-domain FWI is shown in Hu et al. (2009)
and Abubakar et al. (2009).

85
Chapter 3

FWI in practice

This chapter is mainly based on the papers :

• Virieux, J. and S. Operto [2009] An overview of full-waveform inversion in exploration

geophysics. Geophysics 74(6), WCC1–WCC26.

• Brossier [2011] Two-dimensional frequency-domain visco-elastic full waveform inversion:

Parallel algorithms, optimization and performance. Computers & Geosciences 37(4),
444-455

3.1 Introduction

In this chapter, we introduce several practical aspects of FWI, as a preliminary part for real
applications. We therefore review some key features of FWI : we first highlight the relationships
between the experimental set-up (source bandwidth, acquisition geometry) and the spatial
resolution of FWI. The resolution analysis provides the necessary guidelines to design the
multiscale FWI algorithms that are required to mitigate the nonlinearity of FWI. We discuss
the pros and cons of the time and frequency domains for efficient multiscale algorithms. We then
address the source wavelet estimation as an unknown of the inverse problem, and alternatives
of the classical Born linearization and least-square objective function. A key issue of FWI is the
initial model from which the local optimization is started, and we discuss several tomographic
approaches to build a starting model. The implementation of FWI is finally presented, as this
is a computer-demanding technology.

3.2 Resolution power of FWI and relationship with the exper-

imental setup

The interpretation of the partial derivative wavefield as the wavefield scattered by the miss-
ing heterogeneities draws some connection between FWI and diffraction tomography (Devaney,
1982; Wu and Toksöz, 1987): Pratt et al. (1998) defines FWI as the generalization of the diffrac-
tion tomography to arbitrary heterogeneous background models. The diffraction tomography
method recasts the the imaging as an inverse Fourier transform (Wu and Toksöz, 1987; Sirgue
FWI IN PRACTICE

a) Distance (km) Distance (km) Distance (km)

0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
0 0 0

^ ^
Depth (km)

1 1 1 s r
2 ^
s 2 2
^
r
3 3 3 k
4 4 4
b) Distance (km)
0 10 20 30 40 50 60 70 80 90 100
0

5
Depth (km)

c) 0 10 20 30 40 50 60 70 80 90 100
0

5
Depth (km)

X
10 S X R
15

20
X
θ

Figure 3.1: a) Incident monochromatic plane wave (real part). b) Scattered monochromatic
plane wave (real part). c) Gradient of FWI describing one wavenumber component (real part)
built from the plane waves shown in Figures 3a and 3b. d) Monochromatic Green’s function
for a point source. e) Wavepath for a receiver located at an horizontal distance of 70 km from
the source. Frequency is 5 Hz and the velocity in the homogeneous background is 6 km/s.
The dash red lines delineate the first Fresnel zone and an isochrone surface. The yellow line is
a vertical section across the wave path. The blue lines represents diffraction paths within the
first Fresnel zone and from the isochrone.

and Pratt, 2004). Let us consider an homogeneous background model of velocity c0 , an incident
monochromatic plane wave propagating in the direction of ŝ, and a scattered monochromatic
plane wave in the far-fied approximation propagating in the direction of r̂ (Figure 3.1). If
amplitude effects are not taken into account, the incident and scattered Green’s functions can
be compactly written as.

G0 (x, s) = exp(ik0 ŝ.x)

(3.1)
G0 (x, r) = exp(ik0 r̂.x),

88
3.2 Resolution power of FWI and relationship with the experimental setup

S R

λ=c/f
θ

ps
pr q
k =f q
q = ps + pr
ps = pr = 1/c

Figure 3.2: Illustration of the main parameters in diffraction tomography and their relationship.
λ: wavelength. θ diffraction or aperture angle. c: P-wave velocity. f : frequency. pS , pR , q:
slowness vectors. k: wavenumber vector. x: diffractor point. S and R: source and receiver
positions.

with the relation k0 = ω/c0 . Plugging the expression of the incident and scattered plane waves
in the gradient of the misfit function, equation (2.20), gives the following expression (Sirgue
and Pratt, 2004),
2
P P P
∇C(m) = −ω ω s r R {exp(−ik0 ŝ.x) × exp(−ik0 r̂.x)δd} (3.2)
−ω 2 ω s r R {exp(−ik0 (ŝ + r̂).x)∆d} .
P P P

The expression has the form of a truncated Fourier series whose integration variable is the
scattering wavenumber vector, the expression of which is k = k0 (ŝ + r̂). The coefficients of the
series are the data residuals. The summation is performed over source, receiver and frequencies,
that controls the truncation and the sampling of the Fourier series.
We can express the wavenumber vector k0 (ŝ + r̂) in the argument of the basis functions as
a function of frequency, diffraction angle or aperture to highlight the relationship between the
experimental setup and the spatial resolution of the reconstruction (Figure 3.2).
2f
k= cos (θ/2) n, (3.3)
c0

where n is a unit vector in the direction of (ŝ + r̂).

Equation (3.3) was also derived in the frame of the ray+Born migration/inversion recast
either as the inverse of a generalized Radon transform (Miller et al., 1987) or as a least-squares
inverse problem (Lambaré et al., 2003).

89
FWI IN PRACTICE

The modulus of the wavenumber vector depends on the frequency, the aperture angle and
the velocity. Several key conclusions can be driven from the equation (3.3).
First, one frequency and one aperture in the data space map one wavenumber in the model
space. Therefore, frequency and aperture have a redundant control on the wavenumber cover-
age. This redundancy increases with the aperture bandwidth. Pratt and Worthington (1990),
Sirgue and Pratt (2004) and Brenders and Pratt (2007a) have proposed to decimate this redun-
dancy in the wavenumber coverage by limiting the inversion to few discrete frequencies when
wide-aperture geometries are available, hence, leading to computationally-efficient FWI. The
guideline to select the frequencies to be involved in FWI is that the maximum wavenumber
imaged by a frequency corresponds to the minimum vertical wavenumber imaged by the next
frequency Sirgue and Pratt (2004, Figure 3).
Second, the low-frequency content and the wide apertures contribute to resolve the large
and intermediate wavelengths of the medium. The maximum resolution provided by θ = 0 is
half a wavelength if normal incidence reflections are recorded by the acquisition geometry.
Third, for surface acquisitions, long offsets are helpful to sample the small horizontal
wavenumbers of dipping structures, such as flanks of salt dome.
A frequency-domain sensitivity kernel for point sources named wavepath by Woodwards
(1992) is shown in Figure 3.3. The interference picture shows zones of equi-phase on which
the residuals are backprojected during FWI. The central zone of elliptical shape is the first
Fresnel zone, the width of which is the square root of the product between the source-receiver
offset and the wavelength. Residuals matching the first arrival with an error lower than half a
period will be backprojected constructively over the first Fresnel zone, hence, updating the large
wavelengths of the structure. The outer fringes are isochrones, on which residuals associated
with later-arriving reflection phases will be backprojected, hence, providing an update of the
shorter wavelengths of the medium just like prestack depth migration. The width of the
isochrones is given by the modulus of the wavenumber of equation (3.3).
To illustrate the relationship between the resolution power of FWI and the acquisition ge-
ometry, we show results of a simple synthetic experiment, where an inclusion in a homogeneous
background is reconstructed from three acquisition geometries (Figure 3.4). In the crosshole
experiment (Figure 3.4a), FWI have reconstructed a low-pass filtered (smoothed) version of the
vertical section of the inclusion and a band-pass filtered version of the horizontal section of the
inclusion. This anisotropy of the imaging results from the transmission-like reconstruction of
the vertical wavenumbers and the reflection-like reconstruction of the horizontal wavenumbers
of the inclusion. In the case of the double crosshole experiment (Figure 3.4b), the vertical
and horizontal wavenumber spectrum of the inclusion have been partly filled-up due to the
combined use of transmission and reflection wavepaths. Of note, the vertical section exhibits a
deficit of small wavenumbers, whereas the horizontal section exhibits a deficit of low wavenum-
bers because the model is twice longer than large. Therefore, the aperture illuminations of
the horizontal and vertical wavenumbers differ. The last favorable case corresponds to sur-
face acquisition (Figure 3.4c). In this case the vertical section exhibits a strong deficit of low
wavenumbers due to the deficit of large aperture illumination. Of note, the pick-to-pick am-
plitude of the perturbation is however fully recovered in Figure 3.4c. The horizontal section of
the inclusion is poorly recovered due to the poor illumination of the horizontal wavenumbers
from the surface.
The ability of the wide-apertures to resolve the large wavelengths of the medium has

90
3.2 Resolution power of FWI and relationship with the experimental setup

kz max
kz

min
kz

f1 f2 f3 f

Figure 3.3: Illustration of the criterion of Sirgue and Pratt (2004) to design efficient frequency-
domain FWI by choosing a limited subset of frequency in the inversion. The horizontal axis
denotes frequency and the vertical one denotes vertical wavenumbers. The frequency sam-
pling interval in FWI is defined such that the maximum wavenumber imaged by a frequency
f corresponds to the minimum wavenumber imaged by the next frequency. The criterion is
provided assuming a homogeneous background model and a flat reflector, which does not rep-
resent realistic configuration. However, it provides some useful guidelines to select frequencies
in FWI.

prompted some authors to consider long-offset acquisitions as a promising approach to de-

sign well-posed FWI problems (Pratt et al., 1996; Ravaut et al., 2004). For example, equation
(3.3) may suggest that the long wavelengths of the medium can be resolved, whatever the
source bandwidth is, provided that wide apertures data are recorded by the acquisition geom-
etry. However, it is worth reminding that all the conclusions derived so far rely on the Born
approximation, the domain of validity of which requires that the starting model allow to match
observed traveltimes with an error less than half the period (Beydoun and Tarantola, 1988).
If not, the so-called cycle-skipping artifacts will lead to convergence towards a local minima
(Figure 3.5).
Pratt et al. (2008) translates this condition in terms of relative time error ∆t/TL as a
function of the number of propagated wavelengths Nλ expressed as

∆t 1
< , (3.4)
TL Nλ

where TL denotes the duration of the simulation. Condition (3.4) shows that the traveltime
error must be less than 1 % for an offset involving 50 propagated wavelengths, a condition
that will be unlikely satisfied if FWI is applied without data preconditioning. Therefore, some

91
FWI IN PRACTICE

a) Distance (km)
VP(km/s) 0 1 2 3 4 5 6 7 8
0

1
4.1

Depth (km)
2

4.0 3

b) Distance (km)
VP(km/s) 0 1 2 3 4 5 6 7 8
0
4.2
1
Depth (km)

4.1 2

4.0
4

c) Distance (km)
VP(km/s) 0 1 2 3 4 5 6 7 8
0

1
Depth (km)

4.0
2

3.9
4

Figure 3.4: Imaging an inclusion by FWI. a) Cross-hole experiment. The source and the receiver
lines are in red and blue respectively. The contour of the inclusion, the diameter of which is
400 m, is delineated by the blue circle. The true velocity in the inclusion is 4.2 km/s, while the
velocity in the background is 4 km/s. Six frequencies (4, 7, 9, 11 and 15 Hz) were successively
inverted and 20 iterations per frequency were computed. The black and gray curves along the
right and bottom side of the reconstructed model are velocity profiles across the center of the
inclusion extracted from the exact model and the reconstructed one respectively. b) Same as
(a) for a vertical and horizontal cross-hole experiment (the shots along the red dash line are
recorded only by the receivers along the vertical blue dash line). c) Same as (a) for a surface
experiment.

92
3.3 Multiscale FWI: time-domain versus frequency-domain

T/2 T/2

n-1 n n+1

Time (s)

Figure 3.5: Schematic illustration of cycle skipping artifacts in FWI. The solid black line
represents a monochromatic seismograms of period T as a function of time. The upper dash
line represents the modeled monochromatic seismograms with a time delay greater than T /2.
In this case, FWI will update the model such that the n + 1th cycle of the modeled seismograms
will match the nth cycle of the observed seismogram, leading to erroneous model. In the bottom
example, FWI will update the model such that the modeled and recorded nth cycle are in phase
because the time delay is lower than T /2.

authors consider that recording of low frequencies (< 1 Hz) is the best strategy to design well-
posed FWI (Sirgue, 2006). Unfortunately, such low frequencies still cannot be recorded during
controlled-source experiments. Alternatively to low frequencies, layer-stripping approaches,
where longer offsets and longer recording times, are progressively introduced in FWI, should
be viewed to mitigate the nonlinearities introduced by long offsets.

3.3 Multiscale FWI: time-domain versus frequency-domain

FWI can be implemented either in the time domain or in the frequency domain. FWI has
been originally developed in the time domain (Tarantola, 1984; Gauthier et al., 1986; Mora,
1987; Crase et al., 1990), while the frequency domain approach has been mainly proposed in
the nineties by G. Pratt and collaborators, first with application to cross-hole data (Pratt
and Worthington, 1990; Pratt, 1990; Pratt and Goulty, 1991) and later on with application to
wide-aperture surface seismic data (Pratt et al., 1996). The nonlinearity of FWI prompts many
authors to develop some hierarchic multiscale strategies to mitigate this nonlinearity. Apart
computational efficiency, the flexibility offered by the time domain or the frequency domain
to implement efficient multiscale strategies is one of the main criteria to favor one domain
rather than the other one. The multiscale strategy will successively process data subsets of
increasing resolution power to incorporate smaller wavenumbers in the tomographic models.
In the time domain, Bunks et al. (1995) has proposed to successively invert data of increasing

93
FWI IN PRACTICE

high-frequency content, since the low frequencies are less sensitive to cycle-skipping artifacts.
The frequency domain provides a more natural framework for this multiscale approach by
performing successive inversions of increasing frequencies. In the frequency domain, either
single frequencies or multiple frequencies (i.e., frequency group) can be inverted at a time.
As mentioned in the previous section, only a limited number of discrete frequencies needs
theoretically to be inverted provided that the acquisition geometry samples a sufficiently-broad
spectrum of aperture angles. This allows to design computationally-efficient multiscale FWI
algorithm (Pratt and Worthington, 1990; Sirgue and Pratt, 2004; Brenders and Pratt, 2007a).
However, simultaneously inverting redundant multiple frequencies should help to improve the
signal-to-noise ratio or to improve the robustness of FWI when complex wave phenomena are
observed (i.e., guide waves, surface waves, dispersive waves). Therefore, a trade-off between
computational efficiency and quality of the imaging needs to be found. When simultaneous
multi-frequency inversion is performed, the bandwidth of the frequency group must be ideally
as large as possible to mitigate the nonlinearity of FWI in term of non-unicity of the solution,
whereas the maximum frequency of the group must be sufficiently low to prevent cycle-skipping
artifacts, which illustrates the nonlinearity of FWI with respect to the inaccuracy of the starting
model. An illustration of this tuning of FWI is illustrated in Brossier et al. (2009a) in the frame
of elastic seismic imaging of complex onshore models from the joint inversion of surface waves
and body waves. Some of these results are presented in section 4.2.
The regularization effect introduced by hierarchical inversion of data subsets of increasing
high-frequency content may not be sufficient to provide reliable FWI results for realistic fre-
quencies and realistic starting models in the case of complex structures. This has prompted
some authors to design additional regularization levels in FWI. One is the selection of a subset
of arrivals as a function of time. One aim of the time windowing may be to remove arrivals not
predicted by the physics of the wave equation implemented in FWI (for example, PS converted
waves in the frame of acoustic FWI). A second aim may be to perform a selection of aperture
angles or offset in the data to design multiscale and layer-stripping strategies (Shipp and Singh,
2002; Sears et al., 2008). Considering a narrow time-window centred on the first-arrival leads
to the so-called early-arrival waveform tomography (Sheng et al., 2006). Although the early-
arrival waveform tomography has been formally introduced by Sheng et al. (2006), the strategy
consisting of windowing in time the data around the first arrival was systematically applied
during the pioneering applications of FWI to real data (Pratt and Shipp, 1999; Ravaut et al.,
2004; Operto et al., 2006). Time windowing the data around the first arrivals is equivalent to
select heuristically the large aperture components of the data. Alternatively, time windowing
can be applied to isolate reflections or PS converted phases to focus on the imaging of a specific
reflector or to focus on the imaging of a specific parameter class such as the S-wave velocity.
(Shipp and Singh, 2002; Sears et al., 2008; Brossier et al., 2009a).
Whereas the frequency domain is the most appropriate one to select one or few frequencies
for FWI, the time domain one is the most appropriate to select one type of arrival for FWI.
Indeed, time windowing cannot be applied in frequency domain modeling, where a limited num-
ber of frequencies is modeled at a time. A last resort is the use of complex-valued frequencies,
which is equivalent to exponentially damp a signal p(t) in time from an arbitrary traveltime t0
(Sirgue, 2003; Brenders and Pratt, 2007b) expressed in the following expressions,
R +∞
p(t) = −∞ P (ω)eiωt dω,
(t−t )
− τ0
R +∞ t0 (3.5)
p(t)e = −∞ P (ω + i/τ )e τ eiωt dω,

94
3.4 Source wavelet estimation

where τ is the damping factor.

An other regularization level can be implemented by layer stripping where the imaging
proceeds hierarchically from the shallow part to the deep part. Layer stripping in FWI can be
applied by combined offset and temporal windowing (Shipp and Singh, 2002).
These three levels of regularization (frequency-dependent, time-dependent and offset-dependent)
can be combined in one integrated multi-loop FWI workflow. An example is provided Brossier
et al. (2009a) where the frequency-dependent and the time-dependent regularizations are im-
plemented into two nested loops over frequency groups and time damping factor (algorithm
3.1). In this approach, the frequencies increase in the outer loop, while the damping factors
decrease in the inner loop.
In summary, implementation of FWI in the frequency domain allows to easily implement
multiscale FWI based on hierarchic inversion of groups of frequencies of arbitrary bandwidth
and sampling intervals. Time-domain modeling provides the most flexible framework to apply
time windowing of arbitrary geometry. This makes frequency-domain FWI based on time-
domain modeling an attractive strategy to design robust FWI algorithms. This is especially
true for 3D problems, for which time-domain modeling exhibits several advantages with respect
to frequency ones as mentionned previously (Sirgue et al., 2008).

Algorithm 3.1 Two-level hierarchical frequency-domain FWI algorithm (from Brossier et al.,
2009a)
1: for ig = 1 to ng do
2: for id = 1 to nd do
3: while (NOT convergence AND n < nmax ) do
4: for if = 1 to nf do
5: Compute incident wavefields u from sources
6: Compute residual vectors ∆d and cost function C
7: Compute adjoint back-propagated wavefields r
8: Build gradient vector ∇C (n)
9: end for
10: Compute perturbation vector δm(n)
11: Define optimal step length α(n) by parabola fitting
12: Update model m(n+1) = m(n) + α(n) δm(n)
13: end while
14: end for
15: end for

3.4 Source wavelet estimation

The source excitation is generally unknown and must be considered as an unknown of the
problem (Pratt, 1999). The source wavelet can be estimated by solving a linear inverse problem
since the relationship between the seismic wavefield and the source is linear (equation (1.2))
The solution for the source s is given by the expression
< gcal (m0 )|dobs >
s= , (3.6)
< gcal (m0 )|gcal (m0 ) >

95
FWI IN PRACTICE

where gcal (m0 ) denote the Green’s functions computed in the starting model m0 . Estimation
of the source function can be estimated directly in the FWI algorithm, once the incident
wavefields have been modeled. Updating the source and the medium is performed alternatively
over iterations of the FWI. Note that one may take advantage of the source estimation to design
alternative misfit function based on the differential semblance optimization (Pratt and Symes,
2002) or to define more heuristic criterion to stop iteration of inversion (Jaiswal et al., 2009).

Alternatively, new misfit functions has been designed such that the inversion becomes in-
dependent of the source function (Zhou and Greenhalgh, 2003; Lee and Kim, 2003). The
governing idea of the method is the normalization of each seismogram of a shot gather by the
sum of all the seismograms. This removes the dependency of the normalized data with respect
to the source, but modifies the misfit function. The drawback is that this approach requires
the explicit estimation of the sensitivity matrix because the normalized residuals cannot be
backpropagated since they do not satisfy the wave equation.

3.5 Variants of Born-approximation

The sensitivity matrix is generally computed in the framework of the Born approximation,
which assumes a linear tangent relationship between the model perturbations and the wavefield
perturbations (Woodwards, 1992). This linear relationship between the model perturbation and
the wavefield perturbations can be inferred from the assumption that the wavefield computed
in the updated model is the wavefield computed in the starting model plus the perturbation
wavefield.

The Rytov approach considers the generalized phase as the wavefield (Woodwards, 1992).
The Rytov approximation provides a linear relationship between the complex-phase perturba-
tion and the model perturbation by assuming that the wavefield computed in the updated model
is related to the wavefield computed in the starting model through u(x) = u0 (x)exp (∆φ(x)),
where ∆φ(x) denotes the complex-phase perturbation. The sensitivity of the Rytov kernel
is zero on the Fermat raypath since the traveltime is stationary along this path. A lin-
ear relationship between the model perturbations and the logarithm of the amplitude ratio
Ln [A(ω)/A0 (ω)] is also provided by the Rytov approximation by taking the real part of the
sensitivity kernel of the Rytov integral, instead of the imaginary part that provides the phase
perturbation.

Whereas the Born approximation is valid in the case of weak and small perturbations,
the Rytov approximation is supposed to be valid for large aperture angles and small amount
of scattering per wavelength, i.e., smooth perturbations or smooth variation of the phase-
perturbation gradient (Beydoun and Tarantola, 1988). Although several analysis of the Rytov
approximation have been carried out, it remains unclear for us to which extent its domain of
validity significantly differs from that of the Born approximation. A comparison between the
Born approximation and the Rytov approximation in the frame of elastic frequency-domain
FWI is presented in Gelis et al. (2007). The main advantage of the Rytov approximation may
be to provide a natural separation between the phase and the amplitude (e.g., Woodwards,
1992). This splitting of phase and amplitude can also be related to the choice of the misfit
function, described in the next section.

96
3.6 Variants of classic least-squares

3.6 Variants of classic least-squares

While the most popular approach of FWI is based on the minimization of the least-squares
norm of the data misfit, several variants of FWI have been proposed over the last decade.
Variants may concern the definition of the norm or criterion used for the minimization or the
representation of the data (amplitude, phase, logarithm of the complex-valued data, envelope)
The least-squares-norm approach assumes a Gaussian distribution of the misfit (Tarantola,
1987). Poor results can be obtained when this assumption is not satisfied, for example, when
large-amplitude outliers affect the data. Therefore, a careful quality control of the data must
be carried out before least-squares inversion.
Crase (1989) have investigated several norms in the framework of time-domain FWI, such
as the least-squares norm L2 , the least-absolute-values norm L1 , the Cauchy criterion norm
and the hyperbolic secant (sech) criterion. The L2 criterion and the Cauchy criterion have
been also compared by Amundsen (1991) in the frame of frequency-wavenumber-domain FWI
domain for stratified media described by velocity, density, and layer thicknesses (Amundsen
and Ursin, 1991). They have considered random noise and weather noised and have concluded
in both cases that the Cauchy criterion has led to the most robust results. The Huber norm is
an other norm, which combines the L1 and the L2 norms and has been combined with quasi-
Newton L-BFGS by Guitton and Symes (2003) and Bube and Nemeth (2007). The Huber norm
has also been used in the frame of frequency-domain FWI by Ha et al. (2009) and has shown
an overall more robust behavior than the L2 norm.Brossier et al. (2010) have reinvestigated
the least-squares norm L2 , the least-absolute-values norm L1 and two hybrid L1 /L2 norms in
the framework of frequency-domain FWI that considers complex-values data.
The L1 is well designed for large amplitude error, by specifically ignoring the amplitude
of the residuals during the backpropagation of the residuals in the gradient building, making
this criterion less sensitive to large errors in the data. The Cauchy, the sech and other hybrid
criteria can be considered as a combination of the L1 and the L2 norms with different transitions
between the 2 norms. On the one hand, Crase (1989) have concluded that the most reliable
FWI results have been obtained with the Cauchy and the sech criteria. On the other hand,
the L1 appears to be more “automatic” since it does not requires an other parameter to design
the transition limit in the criterion behavior (Brossier et al., 2010). The section 4.3 illustrates
some results of Brossier et al. (2010).
Other criteria try to seperate the effect of phase and amplitude in frequency-domain FWI.
Bednar et al. (2007); Pyun et al. (2007) investigated the separation of the amplitude and the
phase from a frequency-domain FWI code using a logarithmic norm (Shin and Min, 2006;
Shin et al., 2007), the use of which leads to the Rytov approximation in the frame of local
optimization problems. They conclude that phase-only and phase-amplitude inversion were
the most robust, compared to amplitude-only criterion.
Finaly, other criteria try to mitigate the non-linearity by considering other observables in
the date. Fichtner et al. (2008) developed a time-frequency domain scheme for continental- and
global-scale FWI, where the misfit of the phase and the misfit of the envelopes are minimized in
a least-squares sense. The expected benefit from this approach is to mitigate the nonlinearity
of FWI by separating the phase and amplitude in the inversion, and by inverting the envelope
instead of the amplitudes, the former being more linearly related to the data. Tape et al. (2009)
use a cross-correlation criterion in time-domain FWI to remove amplitude and consider only

97
FWI IN PRACTICE

delayed time in the misfit.

3.7 Building starting models for FWI

The ultimate goal in seismic imaging is to be able to apply reliably FWI from scratch, i.e.,
without need of sophisticated a priori information. Unfortunately, since multidimensional FWI
can be presently attacked only through local optimization approaches, building an accurate
starting model for FWI remains one of the most topical issue because very low frequencies
(< 1 Hz) still cannot be recorded in the frame of controlled-source experiments.
A starting model for FWI can be built by reflection tomographic approaches and migration-
based velocity analysis such as those used in oil and gas exploration. A review of the tomo-
graphic workflow is given in Woodward et al. (2008).
Other possible approaches for building accurate starting models, which should tend towards
a more automated procedure and which may be more closely-related to FWI, are first-arrival
traveltime tomography (FATT), stereotomography and Laplace-domain inversion.
FATT performs nonlinear inversion of first-arrival traveltimes to produce smooth models of
the subsurface (e.g., Nolet, 1987; Hole, 1992; Zelt and Barton, 1998). Traveltime residuals are
backprojected along the raypaths to compute the sensitivity matrix. Once the tomographic sys-
tem linearly relating the traveltime residuals to the model pertutbations through the sensitivity
matrix is built, the linear system is generally solved with a conjugate gradient algorithm such
as LSQR (Paige and Saunders, 1982b). Smoothness regularization are conventionally added in
the tomographic system to penalize the roughness of the perturbation models. Alternatively,
the adjoint-state method can be applied to FATT and allows to avoid the explicit building
of the sensitivity matrix just like in FWI (Taillandier et al., 2009). The spatial
√ resolution of
FATT is estimated to be the width of the first Fresnel zone width, given by λ × osr where osr
denotes the source-receiver offset (Williamson, 1991) (Figure 3.1). Examples of applications
of FWI to real data using a starting model built by FATT are shown, for example, in Ravaut
et al. (2004), Operto et al. (2006), Jaiswal et al. (2008), Jaiswal et al. (2009),Malinowsky and
Operto (2008) for surface acquisitions; in Dessa and Pascal (2003) in the frame of ultrasonic
experimental data; in Pratt and Goulty (1991) for cross-hole data and in Gao et al. (2006) for
VSP data. Several blind tests corresponding to surface acquisitions have been tackled by joint
FATT and FWI. Results at the oil-exploration scale and at the lithospheric scale are presented
in Brenders and Pratt (2007b), Brenders and Pratt (2007a) and Brenders and Pratt (2007c).
Results suggest that very low frequencies and very large offsets are required to obtain reliable
results of FWI when the starting model is built by FATT. For example, only the upper part of
the BP benchmark model has been successfully imaged by Brenders and Pratt (2007c) using a
starting frequency as small as 0.5 Hz and a maximum offset of 16 km.
Apart the need of long-offset recordings, a drawback of FATT is that the method is not
suited when low-velocity zones exist because theses low velocity zones will create shadow zones.
Moreover, reliable picking of first-arrival times is a difficult issue when low-velocity zones exist.
In the frame of FWI, fitting first-arrival traveltimes does not guarantee that computed travel-
times of later-arriving phases such as reflections will match the true reflection traveltimes with
an error which does not exceed half a period (validity criterion of the Born approximation), es-
pecially if anisotropy affects the wavefield. However, progressive incorporation of shorter aper-
tures in FWI by time windowing or damping may help to mitigate the nonlinearity resulting

98
3.7 Building starting models for FWI

from the kinematic inconsistency between horizontal and vertical wavepaths. Let us underline
that FATT can be recast as a phase inversion of the first arrival using a frequency-domain
waveform inversion algorithm within which complex-valued frequencies are implemented (Min
and Shin, 2006; Effelsen, 2009). Compared to FATT based on the high-frequency approxima-
tion, this approach helps to take into account the limited bandwidth effect of the data in the
sensitivity kernel of the tomography. Judicious selection of the real and of the imaginary parts
of the frequency allow extraction of the phase of the first arrival. Principles and applications
of the method are presented in Min and Shin (2006) and in Effelsen (2009) for near-surface
applications. This is strongly related to finite-frequency tomography (Montelli et al., 2004).
Traveltime tomography methods which can manage both refraction and reflection trav-
eltimes should provide more consistent starting model for FWI. Among these methods, the
method called stereotomography is probably one of the most promising approach because it
exploits the slope of locally coherent events and reliable semi-automatic picking procedure has
been developed (see Lambaré (2008) for a review). Applications of stereotomography to syn-
thetic and real data sets were presented in Billette and Lambaré (1998), Alerini et al. (2002),
Billette et al. (2003), Lambaré and Alérini (2005) and Dummong et al. (2008).
To illustrate the sensitivity of FWI to the accuracy of different starting models, FWI
reconstructions of the synthetic Valhall model are shown in Figure 3.6 when the starting model
is built by FATT and reflection stereotomography (Prieux et al., 2009). In the case of the
stereotomography, the maximum offset was 9 km and only the reflection traveltimes were
used (Lambaré and Alérini, 2005), whereas the maximum offset was 32 km for FATT (Prieux
et al., 2009). The stereotomography successfully reconstructs the large wavelength within the
gas cloud down to a maximum depth of 2.5 km, whereas FATT fails to reconstruct the large
wavelengths of the LVZ associated with the gas cloud. However, the FWI model inferred
from the FATT starting model shows a very accurate reconstruction of the shallow part of the
model. These results suggest that joint inversion of refraction and reflection traveltimes by
stereotomography may provide a promising framework to build starting models for FWI.
A third approach to build a starting model for FWI may be provided by Laplace-domain
and Laplace-Fourier domain inversions (Shin and Cha, 2008; Shin and Ha, 2008, 2009). The
Laplace-domain inversion can be viewed as a frequency-domain waveform inversion using
complex-valued frequencies (see equation (3.5)), the real part of which is zero and the imagi-
nary part controls the time-damping of the seismic wavefield. In other words, the principle is
the inversion of the DC component of damped seismograms where the Laplace variable s cor-
responds to 1/τ in equation (3.5). Whereas the DC of the undamped data is nil, the DC of the
damped data is not and is exploited in Laplace-domain waveform inversion. The information
preserved in the data may be similar to the amplitude of the wavefield (Shin and Ha, 2009).
Laplace-domain waveform inversion provides a smooth reconstruction of the velocity model,
which can be used as a starting model for Laplace-Fourier and classical frequency-domain
waveform inversions. The Laplace-Fourier domain is equivalent to perform inversion of seismo-
grams damped in time (Shin and Ha, 2009). The results of Shin and Ha (2009) suggest that
the method can be applied to frequencies smaller than the minimum frequency of the source
bandwidth. The ability of the method to use frequencies smaller than the frequencies effectively
propagated by the seismic source is called the ’mirage-like’ resurrection of the low frequencies
by Shin and Ha (2009). Application to real data from the Gulf of Mexico is shown in Shin and
Ha (2009). For the real data application, frequencies between 0 and 2 Hz in combination with
nine Laplace damping constants were used for the Laplace-Fourier domain inversion, the final

99
FWI IN PRACTICE

a) Distance (km) b) Distance (km)

4 5 6 7 8 9 10 11 4 5 6 7 8 9 10 11
0 0

Depth (km)
1 1
2 2
3 3
4 4
c) 4 5 6 7 8 9 10 11 d) 4 5 6 7 8 9 10 11
0 0
Depth (km)

1 1

2 2

3 3

4 4
km/s
1.6 2.0 2.4 2.8 3.2
e) V (km/s) f) VP (km/s) g) VP (km/s)
P
1.2 2.2 3.2 1.2 2.2 3.2 1.2 2.2 3.2
0 0 0

1 1 1
Depth (km)

Depth (km)

2 2 2

3 3 3

4 4 4

Figure 3.6: a) Close-up of the synthetic Valhall velocity model centred on the gas layer. b)
FWI model built from a starting model obtained by smoothing the true model with a Gaussian
filter of horizontal and vertical correlation lengths of 500 m. c) FWI model from a starting
model built by FATT (Prieux et al., 2009). d) FWI model from a starting model built by
stereotomography (Lambaré and Alérini, 2005). e) Velocity profiles at a distance of 7.5 km
extracted from the true model (black line), from the starting model built by smoothing the true
model (blue) and from the FWI model of (b) (red). f) Same as (e) for the starting model built
by FATT and the corresponding FWI model (c). g) Same as (e) for the starting model built
by stereotomography and the corresponding FWI model (d). Frequencies used in the inversion
are between 4 Hz and 15 Hz.

model of which was used as the starting model for standard frequency-domain FWI.

The joint application of Laplace-domain, Laplace-Fourier-domain and Fourier-domain FWI

to the BP benchmark model is illustrated in Figure 3.7 (Shin and Ha, 2009). The starting
model is a simple velocity-gradient model (Figure 3.7b). A first velocity model of the large
wavelengths is obtained by Laplace-domain inversion (Figure 3.7c) that is subsequently used
as a starting model for inversion in the Laplace-Fourier domain inversion, the final model of

100
3.8 On the parallel implementation of FWI

Figure 3.7: Laplace-Fourier domain waveform inversion. a) BP benchmark model. b) Starting

model. c) Velocity model after Laplace-domain inversion. d) Velocity model after Laplace-
Fourier-domain inversion. e) Velocity model after frequency-domain full-waveform inversion
(image courtesy of C. Shin and Y. H. Cha).

which is shown in Figure 3.7d. During this stage, the starting frequency used in the inversion
of the damped data is as low as 0.01 Hz. The final model obtained after frequency-domain
FWI is shown in Figure 3.7e. All the structures have been successfully imaged, starting from
a very crude starting model.

3.8 On the parallel implementation of FWI

3.8.1 Overview

FWI algorithms must be implemented in parallel in order to address large-scale 3D problems.

Depending on the numerical technique for solving the forward problem, different parallel strate-
gies can be considered for FWI. If the forward problem is based on numerical methods such
as time-domain modeling or iterative solvers, which are not demanding in terms of memory, a
coarse grain parallelism, which consists of distributing sources over processors is generally used
and the forward problem is performed sequentially on each processor for each source (Plessix,
2009). If the number of processors significantly exceeds the number of shots, which can be the
case if source encoding techniques are used (Krebs et al., 2009), a second level of parallelism by
domain decomposition of the physical computational domain can be viewed. A comprehensive
review of different algorithms to compute efficiently the forward and the adjoint-wavefields in
time-domain FWI is presented in Akcelik (2002). Each algorithm has its pros and cons in terms
of storage and computational costs. If the forward problem is based on a method, which em-
beds a memory-expensive preprocessing step such as LU factorization in the frequency-domain

101
FWI IN PRACTICE

direct-solver approach, parallelism based on a domain decomposition of the computational do-

main is required. Each processor handles a subdomain of the model and the contribution of all
the sources are processed by each processor. Examples of such algorithms are described in Ben
Hadj Ali et al. (2008); Sourbier et al. (2009a,b); Brossier (2011). A contrast source inversion
(CSI) method is described by Abubakar et al. (2009) which allows the decrease of the number
of LU factorization in frequency-domain FWI at the expense of the number of iterations.

3.8.2 Exemple : the FWT2D ELAS code of SEISCOPE

The FWT2D ELAS code is a 2D frequency-domain FWI implementation for visco-acoustic and
visco-elastic media developed in the framework of SEISCOPE consortium (Brossier, 2011).

3.8.2.1 Forward problem

The forward problem of FWT2D ELAS relies on a finite-element (discontinuous Galerkin)

discretization of the first order elastodynamic equations in the frequency-domain, that leads
to the resolution of linear systems for multiple sources (right-hand-side (RHS)). This solving
is performed with the massively parallel direct solver MUMPS (Amestoy et al., 2006), which
performs a LU decomposition of the matrix B, based on a multi-frontal approach (Duff and
Reid, 1983). The parallelization allows for the speeding up of the factorization by more than
one order of magnitude, compared to sequential execution, and for the distribution of the LU
factors in the core memory over all of the processors, which makes this quite efficient for solving
large problems without intensive I/O resources. The MUMPS software is based on Message
Passing Interface (MPI) standards for applications with distributed memory architectures. The
workflow of the MUMPS algorithm can be summarized as:

1. analysis of the B matrix pattern (note that unknowns are treated independently in the
current version of the algorithm, without taking advantage of the bloc structure provided
by elements), followed by reordering of the matrix to limit the fill-in during factorization.
The analysis step is performed sequentially on the master processor with the current
MUMPS version (but does not modify the workflow if done in parallel)

2. parallel factorization of the B matrix with dynamic pivoting

3. parallel forward and backward substitutions to obtain solution vectors u from RHS vec-
tors b.

3.8.2.2 Overview of the parallel inverse problem

In the framework of iterative FWI, the physical properties of the model change at each itera-
tion, but the mesh geometry is not changed. Therefore, the B matrix keeps the same pattern,
allowing a single analysis phase in the whole inversion procedure. This feature is interesting
because the matrix analysis is the only time-demanding step performed sequentially on the
master process. After the substitution step, the solution vectors u are returned by MUMPS
in a distributed form over the in-core memory of the processors. However, the distribution
of the solutions performed by MUMPS does not always provide a well-balanced load for the

102
3.8 On the parallel implementation of FWI

processors, because the MUMPS strategy is to optimize LU decomposition with accurate dy-
namic pivoting and reduced fill in. To overcome this limitation, the solutions are re-ordered
following well-balanced mesh partitioning, performed with the METIS software (Karypis and
Kumar, 1999). The MUMPS-distributed solutions are therefore mapped to the METIS decom-
position with MPI point-to-point communications. This mapping/projection step represents
about 25 % of the MUMPS substitution time but allows to speed-up the gradient and hessian
computation. The forward and back-propagated wavefields are stored in the in-core memory
for all sources. The gradient and the diagonal of the Hessian are therefore efficiently computed
in parallel with local in-core wavefields for the local subdomain associated with each process,
before being centralized on the master processor for perturbation-model building. Of note, the
assemblage and storage of the sparse B matrix are performed by the master process, as the stor-
age of the mesh-related tables, the wavefield solutions at the receiver positions for computing
the objective function, and the composite residual sources for adjoint-wavefield computation.
To avoid prohibitive memory allocation on the master processor, the master process is not
involved in the factorization and substitution/projection steps, the wavefield storage, and the
gradient/Hessian computation.

3.8.2.3 Two parallelism levels

The scalability of parallel direct solvers is intrinsically limited because of the large amount of
communications between the MPI processes, and the memory overhead generated by the num-
ber of MPI process involved in the factorization (Sourbier et al., 2009b). Therefore, the single
parallelization level with MPI can be not adapted for large-scale applications. A second paral-
lelism level, based on multi-threading, can, however, be efficiently implemented on multi-core
architectures. The main computationally demanding parallel tasks of the FWI algorithm are
the forward problem resolutions (factorization and substitutions/projection) and the building
of the gradient and diagonal Hessian.
The MUMPS solver uses the Basic Linear Algebra Subroutines (BLAS) library mainly for
level 2 and 3 operations (matrix/vector, and matrix/matrix operations). Several BLAS libraries
are available in multi-thread distributions, which allows an efficient low-level of parallelization
for multi-core architectures with shared memory.
The gradient and diagonal Hessian computations have several nested loops embedded. In
addition to the MPI-based domain-decomposition parallelism, the outer loop over the local
model parameters can be easily parallelized with shared-memory thread technology (as with
standard OpenMP) to speed up the loop on the available computing-cores without duplicating
or exchanging data.
An alternative to classical parallelization using one MPI process per computing-core is,
therefore, to run one MPI process on n computing-cores that share the same memory (phys-
icaly on the same node), and to launch n shared-memory threads per MPI-process during
the BLAS operations, and the gradient and diagonal Hessian computations. These two levels
of parallelism can mitigate the memory overhead of the factorization, the network usage for
intensive communications, and the computational time for LU factors.
A third level of parallelism could also be implemented over frequencies and/or time-damping
factors (Shin and Min, 2006). This third level of parallelism has however not been investigated
in the current algorithm.

103
Chapter 4

Applications

This chapter is mainly based on the papers :

• S. Operto, C. Ravaut, L. Improta, J. Virieux, A. Herrero, A. and P. Dell’Aversana

[2004] Quantitative imaging of complex structures from dense wide-aperture seismic data
by multiscale traveltime and waveform inversions: a case study, Geophysical Prospecting,
(Special issue), 52, 625-651

• C. Ravaut, S. Operto, L. Improta, J. Virieux, A. Herrero and P. Dell’Aversana [2004]

Multiscale imaging of complex structures from multifold wide-aperture seismic data by
frequency-domain full-waveform-tomography: application to a thrust belt. Geophys. J.
Int., 159, 1032–1056

• V. Prieux, R. Brossier, Y. Gholami, S. Operto, J. Virieux, O. Barkved and J. Kommedal.

On the footprint of anisotropy on isotropic Full Waveform Inversion: the Valhall case
study. Submitted to Geophys. J. Int.

• R. Brossier, S. Operto, and J. Virieux [2009] Seismic imaging of complex onshore struc-
tures by 2D elastic frequency-domain full-waveform inversion. Geophysics, 74 (6), WCC105-
WCC118.

• R. Brossier, S. Operto, and J. Virieux [2010] Which data residual norm for robust elastic
frequency-domain full waveform inversion?. Geophysics, 75 (3), R37 - R46.

4.1 Introduction

This chapter presents four FWI applications in different frames. The objective is to show some
classical and potential issues related to FWI, for synthetic and real data, and to present how
this has been tackled in the SEISCOPE group :

• the first application deals with a synthetic onshore application for elastic parameters
inversion. In this application, we particularly focused on the combined inversion of body
and surface waves.
APPLICATIONS

• the second section is a reinvestigation of data-space norms for frequency-domain FWI, in

the framework of elastic FWI.

• the third application is the first real data application done by the SEISCOPE group in the
acoustic approximation, from an onshore dataset from the Southern Apennines (Italy)

• the last application is a shallow-water offshore application in the Valhall zone (Norway).
This application allows to highlight the difficulties associated to the application of FWI
where a significant anisotropy affect the data.

4.2 Onshore elastic FWI: the synthetic Overthrust application

4.2.1 Introduction

Most of the recent applications of FWI to real data have been performed under acoustic ap-
proximation (Pratt and Shipp, 1999; Hicks and Pratt, 2001; Ravaut et al., 2004; Gao et al.,
2006; Operto et al., 2006; Bleibinhaus et al., 2007). Although reliable results can be obtained
with acoustic approximation if judicious data pre-processing and inversion preconditioning is
applied (Brenders and Pratt, 2007b), elastic FWI is desirable for applications that involve the
detection of fluids and gas, and for CO2 sequestration. Moreover, acoustic FWI can lead to
erroneous models when applied to elastic data when the velocity models show high velocity
contrasts and when no specific FWI pre-processing and tuning is applied to the data (Barnes
and Charara, 2008).
Only a few applications of elastic FWI have been presented in the literature recently. In
the early applications, Crase et al. (1990, 1992) applied elastic FWI to real land and marine
reflection data using limited offsets. In this framework, the FWI was applied as quantitative
migration processing for imaging P and S impedances. With the benefit provided by wide-
aperture data to build the large and intermediate wavelengths of the subsurface recognized
by Mora (1987, 1988), acoustic and elastic FWI evolved as an attempt to build high reso-
lution velocity models. Shipp and Singh (2002) performed 2D time-domain FWI of a small
wide-aperture marine data subset recorded by a long streamer (12-km long). Although the
forward problem was solved using the elastic wave equation, only the VP parameter was recon-
structed during FWI, assuming an empirical relationship between VP and VS and between VP
and density. They designed a hierarchical multi-step approach based on layer-stripping, and
offset and time windowing, where the aim was to mitigate the non-linearity of the inverse prob-
lem. Sears et al. (2008) designed a similar multi-step strategy to perform elastic time-domain
FWI of multi-component ocean-bottom-cable data. Their strategy, was based on selection of
data component, parameter class and arrival type (by time windowing), and it revealed itself
especially useful when the amount of P-to-S conversion was small at the sea bottom, which
makes the reconstruction of the VS model particularly ill-posed. Choi and Shin (2008) and
Choi et al. (2008) applied elastic frequency-domain FWI to onshore and offshore versions of
the synthetic Marmousi2 model (Martin et al., 2006), respectively. They successfully imaged
the model using a velocity-gradient starting model and a very low starting frequency (0.16 Hz).
Shi et al. (2007) applied elastic time-domain FWI to marine data collected from a gas field
in western China. They successfully imaged a zone of Poisson-ratio anomalies associated with
gas layers. Accurate starting VP and VS models were built from the P-wave and P-SV-wave

106
4.2 Onshore elastic FWI: the synthetic Overthrust application

velocity analysis and from a priori information of several well logs along the profile. Gelis
et al. (2007) implemented a 2D elastic frequency-domain FWI using the Born and the Rytov
approximations for the linearization of the inverse problem. They highlighted the dramatic
footprint of the surface waves on the imaging of small inclusions in homogeneous background
models. To mitigate this footprint, they only involved body waves during the early stages of
the inversion, by only selecting short-offset traces.
The present study presents an application to a realistic synthetic onshore case study. The
aim of this application was to assess whether surface waves and body waves recorded by wide-
aperture acquisition geometries can be jointly inverted to build high resolution VP and VS
of complex onshore structures. We implement FWI in the frequency domain, which presents
some distinct advantages with respect to the time-domain formulation (Pratt, 1990, 1999; Sir-
gue and Pratt, 2004). The inverse problem can be solved with a local optimization approach
using either a conjugate gradient or a quasi-Newton method (Nocedal and Wright, 1999). The
gradient of the objective function is computed with the adjoint-state technique (Plessix, 2006).
Successive inversions of increasing frequency provide a natural framework for multiscale imag-
ing algorithms. Moreover, by sacrificing the data redundancy of multifold acquisitions, the
inversion of a limited number of frequencies can be enough to build reliable velocity models,
provided the acquisition geometry spans over sufficiently long offsets. This limited number of
frequencies can be efficiently modeled in the 2D case for multiple shots once the impedance
matrix that results from discretization of the frequency-domain wave equation has been factor-
ized through a LU decomposition (Marfurt, 1984; Nihei and Li, 2007). Finally, attenuation can
be implemented in the forward problem in a straightforward way, and without extra compu-
tational cost, by using complex velocities. The main drawback of the frequency-domain FWI
formulation arises from the difficulty of time windowing of the modeled data when inverting
one or a few sparsely sampled frequencies at a time. Time windowing allows a selection of
specific arrivals to be included in the various stages of the inversion. A last resort is the use
of complex-valued frequencies, which damp arrivals starting at a given traveltime (Shin et al.,
2002).
In the next section of the present study, we applied some multiscale strategies to mitigate
the non-linearity of the elastic inverse problem. These strategies involve two nested levels of
hierarchy over frequencies and aperture angles in the inversion algorithm as described in section
3.3-3.1. The effectiveness of these strategies is first illustrated with a simple two-parameter elas-
tic problem with a free surface. Then we have applied the elastic frequency-domain FWI algo-
rithm to a realistic synthetic example, for the reconstruction of a dip section of the SEG/EAGE
Overthrust model, assuming a constant Poisson ratio. The results of the different analyses show
that simultaneous inversions of multiple frequencies and data preconditioning by time damp-
ing are critical to obtain reliable results when surface waves propagating in a heterogeneous
near-surface are present in the elastic wavefield.

4.2.2 Application to a canonical model

FWI is an ill-posed, non-linear problem even for apparently simple models involving few pa-
rameters. Mulder and Plessix (2008) illustrated analytically the non-linearity of 3D acoustic
FWI with a 1D velocity gradient model defined by two parameters. The objective function
showed multiple local minima around the true global minimum. We consider here a similar
two-parameter problem for a 2D elastic model with a free surface on top of it.

107
APPLICATIONS

Figure 4.1: Objective function as a function of the two model parameters for a) full data set
and b) damped data set. c) Cross-sections of maps shown above for V0 =4 km/s. The solid
line and the dashed line correspond to the full data-set and the damped data-set, respectively.
d) Cross-sections of maps shown above for η=0.35 s−1 . The solid line and the dashed line
correspond to the full data-set and the damped data set respectively.

A 1D VP gradient model defined by VP (z) = V0 + ηz is considered, where V0 is the P-wave

velocity at the surface and η is the vertical velocity gradient. The S-wave velocity is inferred
from the P-wave velocity by considering a constant Poisson ratio of 0.24. A vertical point-force
is located just below the free surface, and 350 receivers record horizontal and vertical particle
velocities on the free surface along a 17-km-long profile.

The objective function is plotted as a function of V0 and η for the 5.8 Hz frequency in
Figure 4.1a. The global minimum is located at the coordinates of the true model (V0 =4 km/s,
η =0.35 s−1 ). Cross sections along the η axis for V0 =4 km/s and along the V0 axis for η =0.35
s−1 show the non-convex shape of the objective function (Figure 4.1). Multiple local minima
are present, particularly on the η section, even for this simple gradient model and the low
frequency content in the data.

We repeated the same simulations for damped data using γ = 1/τ = 3.33 (equation (3.5)).
Using this data preconditioning, the objective function is now convex (Figures 4.1b and 4.1c).
The convex shape of the objective function should ensure the convergence of the inversion
towards the global minimum of the objective function for all of the starting models sampled in
Figure 4.1.

108
4.2 Onshore elastic FWI: the synthetic Overthrust application

Figure 4.2: a) Dip section of the synthetic SEG/EAGE Overthrust model. P-wave velocity is
depicted. b) Starting model used for elastic FWI.

4.2.3 Onshore synthetic case study

4.2.3.1 SEG/EAGE Overthrust model and experimental setup

We considered a 20 km × 4.65 km dip section of the SEG/EAGE Overthrust model to assess

the potential of 2D elastic frequency-domain FWI for imaging complex onshore structures
(Aminzadeh et al., 1997)(Figure 4.2a). The Overthrust model was a 20 km × 20 km × 4.65
km 3D acoustic model that represents an onshore complex thrusted sedimentary succession
constructed on top of a basement block. Several faults and channels were present in the model,
as well as a complex weathering zone on the surface. For elastic FWI, a VS model was built
from the VP model using a constant Poisson ratio of 0.24. A uniform density of 1000 kg.m−3
was considered and assumed to be known during the inversion. A free surface was set on top
of the model.
The onshore, wide aperture survey consisted of 199 explosive sources spaced every 100
m and located 25 m below the free surface. All of the shots were recorded by 198 vertical
and horizontal geophones, which were spaced every 100 m on the surface. The vertical and
horizontal components of particle velocity were used as the dataset for the elastic FWI. The data
were computed with the same algorithm for both observed and computed data in inversion. The
source signature was assumed to be known in FWI. An elastic shot gather is shown in Figure
4.3(a-b) for the horizontal and vertical components of particle velocity. The corresponding shot
gather computed when an absorbing boundary condition is set on top of the model is shown
for comparison in Figure 4.3(c-d), to highlight the additional wave complexities introduced
by free-surface effects (i.e., surface waves and body-wave reflexion from the free surface). Of
note, the high amplitudes of the surface waves dominate the wavefield especially on the vertical
component (Figure 4.3(a-b)). Starting VP and VS models for FWI were computed by smoothing

109
APPLICATIONS

Figure 4.3: Seismograms computed in Overthrust model for (a) horizontal and (b) vertical
components of particle velocity. The shot is located at a horizontal distance of 3 km. A free-
surface was set on top of the model. (c-d) As for Figure 4.3(a-b), except that an absorbing
boundary condition was implemented on top of the model.

the true velocity models with a 2D Gaussian function of vertical and horizontal correlation
ranges of 500 m (Figure 4.2b). This starting model was proven to be accurate enough to
image the Overthrust model by 2D acoustic frequency-domain FWI using a realistic starting
frequency of 3.5 Hz (Sourbier et al., 2009b). For elastic inversions presented hereafter, we used
a lower starting frequency of 1.7 Hz. Using a starting frequency of 3.5 Hz for elastic FWI led
to a deficit of long wavelengths in the VS models, which made the inversion converge towards
a local minimum.

The different behavior of acoustic and elastic FWI for the Overthrust case study highlights
the increased sensitivity of elastic FWI with respect to the limited accuracy of the starting
models. Five discrete frequencies (1.7, 2.5, 3.5, 4.7 and 7.2 Hz) were used for the elastic FWI.
This frequency sampling should allow continuous sampling of the wavenumber spectrum ac-
cording to the criterion of Sirgue and Pratt (2004). In the following, we shall consider the
three different strategies to manage frequencies described in the section “Full-waveform inver-
sion data preconditioning and multiscale strategies”: successive inversion of single frequencies,
successive inversion of frequency groups of increasing bandwidth, and successive inversion of
slightly overlapping frequency groups. For each frequency group, the inversion is subdivided

110
4.2 Onshore elastic FWI: the synthetic Overthrust application

Table 4.1: Inversion parameters for the sequential, Bunks, and simultaneous approaches. FG:
frequency group number; F (Hz): frequencies within a frequency group; γ (1/s): damping
factors (imaginary part of frequency).

FG Sequential - F(Hz) Bunks - F(Hz) Simultaneous - F(Hz) γ (1/s)

1 1.7 1.7 1.7, 2.5, 3.5 1.5, 1, 0.5, 0.1, 0.033
2 2.5 1.7, 2.5 3.5, 4.7, 7.2 1.5, 1, 0.5, 0.1, 0.033
3 3.5 1.7, 2.5, 3.5 1.5, 1, 0.5, 0.1, 0.033
4 4.7 1.7, 2.5, 3.5, 4.7 1.5, 1, 0.5, 0.1, 0.033
5 7.2 1.7, 2.5, 3.5, 4.7, 7.2 1.5, 1, 0.5, 0.1, 0.033

into two steps. In the first step, no offset-dependent gain was applied to the data. Although
we scale the gradient by the diagonal terms of the Hessian, we observed some lack of recon-
struction in the deep part of the model, suggesting that the near-offset traces have a dominant
contribution in the objective function. This layer-stripping effect may provide additional reg-
ularization of the inversion, in addition to that provided by the frequency and aperture angle
selections. In the second step, we applied a quadratic gain with offset to the data, to strengthen
the contribution of long-offset data in the inversion, and hence to improve the imaging of the
deeper part of the model.
During the two-step inversion, the coefficients of the diagonal weighting operator Sd were
respectively given by:

Sd = expγt0
Sd = expγt0 |of f set|2 , (4.1)

with the reminder that the coefficients expγt0 account for the offset-dependent time damping;
equation (3.5). For all of the tests presented below, except for the first, we used five damping
factors per frequency to precondition the data (γ = 1.5, 1.0, 0.5, 0.1, 0.033). A shot gather
computed for the first four damping factors is shown in Figure 4.4, to illustrate the amount
of information preserved in the data. Note how the high damping limits the offset range over
which surface waves are seen. Inversion was also regularized by Gaussian smoothing of the
perturbation model, the aim of which was to cancel high frequency artifacts in the gradient.
The diagonal terms of the pseudo-Hessian matrix (Shin et al., 2001) provided an initial guess of
the Hessian for the L-BFGS algorithm without introducing extra computational costs during
gradient building. Five differences of gradients and models vectors were used for the L-BFGS
algorithm. The model parameters for inversion were VP and VS , which are suitable for wide
aperture acquisition geometries (Tarantola, 1986). The loop over the inversion iteration of one
complex-valued frequency group was stopped when a maximum iteration number of 45 was
reached or when the convergence criterion was reached (relative decrease of two successive cost
functions lower than 5.10−5 ). The schedule of the frequencies and damping terms used in the
sequential approach, the Bunks approach and the simultaneous approach are outlined in Table
4.1.
In the following, we shall quantify the data misfit for each test with the normalized misfit

111
APPLICATIONS

Figure 4.4: Seismograms for vertical component of particle velocity computed in the dip section
of the Overthrust model using four values of imaginary frequency. a) γ = 1.5 s−1 , b) γ = 1.0
s−1 , c) γ = 0.5 s−1 , d) γ = 0.1 s−1 . Time-damping was applied from the first-arrival traveltime
to preserve long-offset information.

¯ defined by:
C,
P5
¯ k∆di (mf )k2
C = P5i=1 (4.2)
i=1 k∆di (m0 )k2

where ∆di (mf ) denotes the data misfit vector for the ith frequency and for the final FWI model
mf , and m0 denotes the starting model shown in Figure 4.2b.

The FWI model quality will be quantified by:

1 mf − mtrue
mq = (4.3)
N mtrue 2

where mtrue denotes the exact model either for VP or VS , and N is the number of grid points
in the computational domain. The normalized misfit and the model quality for the different
tests presented hereafter are outlined in Table 4.2.

112
4.2 Onshore elastic FWI: the synthetic Overthrust application

Table 4.2: Final L2 misfit and model quality mq for the different reconstructed models. Seq:
Sequential approach; Bunks: Bunks approach; Sim: Simultaneous approach; without FS: Se-
quential approach without free-surface effects; PCG: Sequential approach computed with PCG
optimization.

Test Data misfit C¯ VP mq VS mq

Seq 4.12 10−1 5.54 10−2 6.47 10−2
Bunks 1.54 10−1 5.22 10−2 5.33 10−2
Sim 1.46 10−1 5.03 10−2 5.39 10−2
Without FS 9.03 10−2 4.09 10−2 3.89 10−2
PCG 6.71 10−1 5.56 10−2 6.89 10−2

Figure 4.5: Sequential inversion of raw data - (a) VP and (b) VS models after frequency of 7.2
Hz.

4.2.3.2 Raw data inversion

A first inversion test was performed without data damping (i.e., without using complex-valued
frequencies), which implies that all of the arrivals were involved in the inversion. The five
frequencies (Table 4.1) were successively inverted with the sequential approach. The FWI VP
and VS models after inversion are shown in Figure 4.5. The inversion clearly failed to converge
towards the true models for both of the VP and VS parameters, even at low frequencies.

4.2.3.3 Successive single-frequency inversions of damped data

We repeated the previous experiment, except that the five damping terms (γ = 1.5, 1.0, 0.5,
0.1, 0.033) were used to stabilize the inversion (Table 4.1). The final FWI VP and VS models
are shown in Figure 4.6. Contrary to the previous experiment, most of the layers were now

113
APPLICATIONS

Figure 4.6: Sequential inversion of damped data - (a) VP and (b) VS models after inversion.
The L-BFGS algorithm was used for optimization. Five frequency components were inverted
successively. Five damping coefficients were successively used for data preconditioning during
each mono-frequency inversion.

successfully reconstructed. Comparison between 1D vertical profiles extracted from the true
model, the starting model and the FWI models shows a reliable estimate of velocity amplitudes
despite a low maximum frequency of 7.2 Hz (Figure 4.7). To take into account the limited
bandwidth effect of the source in the FWI model appraisal, we also plotted the vertical profiles
of the true models after low-pass filtering at the theoretical resolution of FWI for a maximum
frequency of 7.2 Hz: the true models were converted from depth to time using the velocities
of the starting model, and low-pass filtered with a cut-off frequency of 7 Hz, before conversion
back to the depth domain. We noted that the VS model is more affected by spurious artifacts
than the VP model, especially in the deep part of the model. This may be due to a deficit
of small wavenumbers in the VS models, resulting from the shorter propagated wavelengths,
which makes the reconstruction of the VS parameter more non-linear. Secondly, we saw some
inaccuracies in the reconstruction of both the VP and VS parameters in the shallowest parts of
the models (Figure 4.7). The resulting residuals of the surface waves and reflections from the
free surface may have been erroneously back-projected in the deeper part of the model, leading
to the above-mentioned noise (Figure 4.8). The final normalized L2 misfit computed for the
five frequencies was 4.12 10−1 . The VP and VS model qualities are 5.54 10−2 and 6.47 10−2 ,
respectively.

4.2.3.4 Full-waveform inversion without free-surface effects

To assess the footprint of surface waves and free-surface reflections on elastic FWI, we inverted
the data computed in the Overthrust model with an absorbing boundary condition on top of
it, instead of a free surface. The same inversion process was used as in the previous section
(sequential approach with damped data). The final FWI VP and VS models were very close to
the low-pass filtered versions of the true models, and they were not affected by any spurious

114
4.2 Onshore elastic FWI: the synthetic Overthrust application

Figure 4.7: Sequential inversion of damped data - Vertical profiles for the VP (a-b) and VS
(c-d) parameters. Profiles (a)-(c) and (b)-(d) are at horizontal distances of 7.5 km and 14 km
respectively. Profiles of the starting and the true models are plotted with dashed gray lines and
solid black lines, respectively. A low-pass filtered version of the true model at the theoretical
resolution of FWI is plotted with a dashed black line for comparison with the FWI results.
The profiles of the FWI models of Figure 4.6 are plotted with solid gray lines.

artifacts (Figure 4.9). The VP and VS model qualities are 4.09 10−2 and 3.89 10−2 , respectively.
Comparison with the previous results (Figure 4.6) illustrates the substantial increase of non-
linearity introduced by surface waves and free-surface reflections in elastic FWI.

4.2.3.5 Simultaneous multi-frequency inversion of damped data

We now investigated the influence of simultaneous multiple-frequency inversion strategies for

FWI improvement. We first considered the Bunks approach for FWI. The different frequency
groups and damping terms are outlined in Table 4.1. Each mono-frequency dataset belonging
to a frequency group was computed with a unit Dirac source wavelet, which implies that each
frequency of a group has a similar weight in the inversion. This is equivalent to inversion of
deconvolved data (i.e., data with a flat amplitude spectrum).
The final FWI VP and VS models shown in Figure 4.10 are slightly improved compared
to those of the sequential approach (Figure 4.6). The improvements are more obvious on
the vertical profiles of the final FWI models (Figure 4.11). Near-surface instabilities in the
VP and VS models were mitigated, although not fully removed, and the estimations of the

115
APPLICATIONS

Figure 4.8: Sequential inversion of damped data - Seismograms computed in the FWI models
of Figure 4.6 for shot located at a horizontal distance of 3 km. a) Horizontal component.
b) Vertical component. (c-d) Residuals between seismograms computed in the true VP and
VS models (Figure 4.3a) and in the FWI models of Figure 4.6. c) Vertical component. d)
Horizontal component.

velocity amplitudes were improved in most parts of the model (compare Figures 4.7 and 4.11).
Mitigation of the near-surface instabilities translated into a significant misfit reduction for
the surface waves and free-surface reflections (Figure 4.12). The final normalized L2 misfit
decreased in this case to 1.54 10−1 . The VP and VS qualities are 5.22 10−2 and 5.33 10−2 ,
respectively.

In a second step, we considered the simultaneous approach implemented by successive

inversions of two overlapping frequency groups (Table 4.1). The frequency bandwidth of each
group was chosen such that cycle-skipping artifacts were avoided. For example, we tried to
simultaneously invert the five frequencies listed in Table 4.1. In this case, the inversion failed
to converge towards acceptable models. Note that the computational cost of the simultaneous
approach is similar to that of the sequential one if the same convergence rate for the two
approaches is assumed. The total number of iterations in the simultaneous approach is less
important, because the iterations are performed over fewer frequency groups at the expense
of more factorization and substitution phases per frequency group. The extra cost of the
simultaneous approach is only due to the overlap between the frequency groups.

116
4.2 Onshore elastic FWI: the synthetic Overthrust application

Figure 4.9: Sequential inversion without free-surface effects - (a) VP and (b) VS models. The
models can be compared with that of Figures 4.6 to assess the footprint of free-surface effects
on elastic FWI.

Figure 4.10: Bunks inversion - (a) VP and (b) VS models obtained with the frequency-domain
adaptation of the multiscale approach of Bunks et al. (1995).

The final FWI VP and VS velocity models shown in Figure 4.13 were improved with re-
spect to the velocity models produced by the sequential approach (Figure 4.6) and the Bunks
approach (Figure 4.10), especially for the VS velocity model in the thrust zone. The vertical
profiles extracted from the final FWI models do not show near-surface instabilities anymore
(Figure 4.14), which allowed a significant data misfit reduction for the surface waves and free-
surface reflections (Figure 4.15). A significant misfit reduction of the wide aperture arrivals

117
APPLICATIONS

Figure 4.11: Same as Figure 4.7, but for the profiles extracted from the models recovered by
the Bunks approach (Figure 4.10).

recorded at large offsets was also seen. The final L2 misfit is 1.46 10−1 . The VP and VS model
qualities are 5.03 10−2 and 5.39 10−2 , respectively. We note, however, slightly underestimated
velocity amplitudes in the deep part of the VP and VS models at the thrust location (see below
3 km depth in Figure 4.14a, c). We attribute this amplitude deficit to a slower convergence of
the simultaneous approach when compared to that of the sequential one, which results from
the fact that more information is simultaneously inverted in the simultaneous approach. The
imaging was further improved by decreasing the frequency interval by a factor of 2 within
each frequency group (five frequencies instead of three frequencies per group). This resampling
contributes to the strengthening of the spectral redundancy in the imaging. Close-ups of the
VP and VS models centered on the thrust zone show how the resolution and the signal-to-noise
ratio of the velocity models were still improved by involving more frequencies in one inversion
iteration (Figure 4.16). Note that using five frequencies instead of three in each group leads to
a factor of 5/3 in computational costs.

4.2.3.6 Computational aspect

All of the simulations were performed on the cluster of the SIGAMM computer center, which
is composed of a 48-node cluster, with each node comprising 2 dual-core 2.4-GHz Opteron
processor, providing 19.2 Gflops peak performance per node. This computer has a distributed
memory architecture, where each node has 8 GBytes of RAM. The interconnection network

118
4.2 Onshore elastic FWI: the synthetic Overthrust application

Figure 4.12: Same as Figure 4.8, but for seismograms computed in the FWI models recovered
by the Bunks approach (Figure 4.10).

between processors is Infiniband 4X. Twenty-four processors were used for each simulation,
leading to the best compromise between execution time and numerical resources used. A
single regular equilateral mesh composed of 265675 cells was designed for the simulations.
Although the mesh could have been adapted to the inverted frequency, we did not consider
this strategy here, and the mesh was kept constant whatever the inverted frequency. Table
4.3 outlines the memory requirements and computational time of the major tasks performed
by the parallel FWI algorithm. Of note, most of the memory and computational time were
dedicated to the LU factorization and substitution phases performed during the multi-source
forward problem. Computation of the gradient had a negligible computational cost, due to
the domain-decomposition parallelism. The L-BFGS algorithm required a negligible extra
amount of memory and computational time compared to a steepest-descent or PCG algorithm,
suggesting that this optimization scheme can be efficiently used for realistic 2D and 3D FWI
applications.

4.2.4 Discussion

Application of elastic FWI to the Overthrust model has highlighted the strong non-linearity
of the inversion resulting from free-surface effects. The impact of these effects on FWI can
be assessed by comparing the FWI results inferred from the data including or without the

119
APPLICATIONS

Figure 4.13: Simultaneous inversion - Final models obtained by successive inversion of two
overlapping frequency groups composed of three frequencies each. (a) VP model. (b) VS model.

Figure 4.14: Same as Figure 4.7, but for the profiles extracted from the models recovered by
the simultaneous approach (Figure 4.13).

120
4.2 Onshore elastic FWI: the synthetic Overthrust application

Figure 4.15: Same as Figure 4.8, but for seismograms computed in the FWI models recovered
by successive inversion of 2 overlapping frequency groups (Figure 4.13).

Table 4.3: Computational cost of the main tasks performed by FWI. Time estimations were
averaged over several iterations for 24 processors.

Time for factorization (s) 21.3

MUMPS total memory for factorization (Gbytes) 4.8
Time for substitutions (199 shots) (s) 22.9
MUMPS total memory for substitution (Gbytes) 20.2
Time for gradient build up (s) 13.8
Time for L-BFGS perturbation computation (s) 1.2
Memory for L-BFGS(5) history (Mbytes) 21.2

free-surface effects (compare Figures 4.6 and 4.9). As the best models were obtained when
free-surface effects are not considered, this shows that for this case study, inversion of the
surface waves was not useful towards an improvement of the reconstruction of the near-surface
structure.

121
APPLICATIONS

Figure 4.16: Comparison between FWI models obtained by successive inversion of two over-
lapping groups of frequencies (simultaneous approach) when three and five frequencies per
group are used in the inversion, respectively. The frequency bandwidth is the same for each
experiment but the frequency interval differs. a) Close-up of the true VP model after low-pass
filtering at the theoretical resolution of FWI. (c)-(e) Close-up of the FWI VP model when five
frequencies (c) and three frequencies (e) per group are used respectively. (b)-(d)-(f) Same as
(a)-(c)-(e) for the VS model.

We interpret the failure of the raw-data inversion as the footprint of surface waves, the
amplitudes of which dominate the wavefield and carry no information of the deep part of the
model (Figure 4.5). Similar effects of surface waves on elastic FWI were also seen on a smaller
scale by Gelis et al. (2007). Comparison between the sequential FWI results obtained with
the raw data and the preconditioned data illustrates how the time damping helps to mitigate
the non-linearities of FWI by injecting progressively more complex wave phenomena in the
inversion (compare Figures 4.5 and 4.6).

The further improvements obtained by simultaneous inversion of multiple frequencies com-

bined with hierarchical inversions of damped data show that preserving some wavenumber re-
dundancy in elastic FWI is critical to mitigate the non-linearity of the inversion associated with
the propagation of surface waves in weathered near-surface layers and free-surface reflections.
The more stable results obtained with the simultaneous approach compared to the Bunks ap-

122
4.2 Onshore elastic FWI: the synthetic Overthrust application

proach, especially in the near-surface, suggest that several frequencies must be simultaneously
inverted from the early stage of the inversion (compare Figures 4.10 and 4.13). Strengthen-
ing the wavenumber redundancy by decreasing the frequency interval in each frequency group
further improved the imaging (Figure 4.16).

Another factor that increased the non-linearity of elastic FWI was the short S wavelengths
that may require more accurate starting models or lower frequencies to converge towards an
acceptable model. The maximum frequency of the starting frequency group must be chosen
such that it prevents cycle-skipping artifacts that can result from the limited accuracy of the
starting S-wave velocity model. Laplace-domain waveform inversion, which has been recently
proposed as a reliable approach to build smooth initial elastic models of the subsurface (Pyun
et al., 2008; Shin and Cha, 2008), may represent an approach to tackle the issue of building
the starting model. An alternative approach is PP-PS stereotomography (Alerini et al., 2002),
including the joint inversion of refraction and reflection traveltimes of wide-aperture data.

As mentioned above, with this case study, we were not able to illustrate the usefulness of the
surface waves for the reconstruction of the near-surface structure, since the most accurate FWI
models were inferred without involving free-surface effects in the data (Figure 4.9). Instead,
we have shown how to manage the non-linearities introduced by the surface waves, by means
of judicious data preconditioning and FWI tuning. Alternatively, the surface waves in the
recorded and modeled data can be filtered out or muted. We did not investigate this approach
at this stage because efficient filtering of the modeled surface waves in the frequency domain is
not straightforward. Note that the surface waves must be filtered out not only at the receiver
positions, but also at each position in the computational domain where they have significant
amplitudes, in order to remove their footprint from the gradient of the objective function. This
investigation still requires further work.

More realistic applications of elastic FWI in more complex models still need to be investi-
gated. Areas of complex topography, such as foot-hills, will lead to conversions from surface
waves to body waves and vice versa, which may carry additional information on the near-
surface. The robustness of elastic FWI for imaging models with heterogeneous Poisson ratios
is a second field of investigation, especially in areas of soft seabed where the P-S-converted
wavefield may have a limited signature in the data (Sears et al., 2008). In the present study,
we inverted data computed with a constant density that was assumed to be known. For real
data inversion, estimation of the density is required for a more reliable amplitude match. Re-
liable estimation of the density by FWI is difficult, because the P-wave velocity and density
have similar radiation patterns at short apertures (Forgues and Lambaré, 1997). The benefit
provided by wide apertures to uncouple these two parameters needs to be investigated. Other
extensions of isotropic elastic FWI may relate to reconstruction of attenuation factors and
some anisotropic parameters. Vertical transversely isotropic elastic FWI should be easily im-
plemented from isotropic elastic FWI, since only the expression of the coefficients of the P-SV
elastodynamic system need to be modified compared to the isotropic case (e.g., Carcione et al.,
1988).

Application of 2D elastic FWI to real data will require additional data preprocessing that
was not addressed in this study, such as source estimation (Pratt, 1999) and 3D to 2D amplitude
corrections (Bleinstein, 1986; Williamson and Pratt, 1995). The sensitivity of the elastic FWI
to the approximations and errors underlying this processing will need to be determined.

123
APPLICATIONS

4.2.5 Partial conclusion

This study presents an application of 2D elastic frequency-domain FWI to a dip section of the
SEG/EAGE onshore Overthrust model. Strong non-linearities of elastic FWI arise both from
the presence of converted and surface waves, and from the limited accuracy of the VS starting
model. These two factors prevent convergence of FWI on the global minimum of the objective
function if no specific preconditioning is applied to the data and no low starting frequency is
available. Data preconditioning performed by time damping is necessary to converge towards
acceptable velocity models, whatever the frequency sampling strategy is. Secondly, successive
inversions of overlapping frequency groups out-perform successive inversions of single frequen-
cies for the removal of instabilities in the near-surface of the FWI models. The bandwidth
of the frequency groups must be chosen such that cycle-skipping artifacts are avoided, while
injecting a maximum amount of redundant information into the frequency groups.

124
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

4.3 Which data residual norm for robust elastic frequency-

domain Full Waveform Inversion?

4.3.1 Introduction

The noise footprint in seismic imaging is conventionally mitigated by stacking highly redundant
multifold data. However, improving our understanding of the inversion sensitivity to noise is a
key issue, and in particular when the data redundancy is decimated in the framework of efficient
frequency-domain FWI. The least-squares objective function remains the most commonly used
criterion in FWI, although it theoretically suffers from poor robustness in the presence of large
isolated and non-Gaussian errors. Other norms can therefore be considered. The least-absolute-
values norm (L1 ) is not based on Gaussian statistics in the data space, and it was introduced
into time-domain FWI by Tarantola (1987); Crase et al. (1990); it has been shown to be weakly
sensitive to noise. Djikpéssé and Tarantola (1999) used the L1 norm successfully to invert
field data from the Gulf of Mexico with time-domain FWI. Surprisingly, this norm has been
marginally used during recent applications of FWI. Pyun et al. (2009) used a L1 -like norm for
frequency-domain FWI: the L1 norm is applied independently to the real and imaginary parts of
the complex-valued wavefield. The resulting functional does not rigorously define a norm from
a mathematical viewpoint, because the functional does not satisfy the scalar multiplication
property of norms (||αx|| = |α| · ||x|| for complex-valued scalar α and vector x). The violation
of the norm property makes the value of the functional vary with the phase of the residuals,
when the residual amplitude is kept constant. Despite this mathematical approximation, quite
robust results were obtained. Alternative functionals, such as the Huber criterion (Huber,
1973; Guitton and Symes, 2003) and the Hybrid L1 /L2 criterion (Bube and Langan, 1997) can
also be considered. Ha et al. (2009) applied the Huber criterion for frequency-domain FWI and
illustrated its robust behavior compared to the L2 norm when considering a dense frequency
sampling in inversion. All of these criteria behave as the L2 norm for small residuals and as the
L1 norm for large residuals, and therefore allow to overcome the non-derivability issue of the L1
norm for null residuals. A threshold, which needs to be defined, controls where the transition
between these two different behaviors takes place, with a more or less smooth shape depending
on the criteria. These Hybrid criteria are efficient for dealing with outliers in data. However,
they assume Gaussian statistics as soon as the L2 norm is used, leading to the difficult issue
of the estimation of the threshold.
This study presents applications of 2D elastic frequency-domain FWI for imaging realistic
complex offshore and onshore structures in the presence of noisy synthetic data. A special
emphasis is put on the performance of different minimization criteria in the framework of
efficient, elastic frequency-domain FWI.
In the next section, we briefly review the theoretical aspects of different possible norms
and criteria that can be applied to frequency-domain FWI. Then we apply these objective
functions to two synthetic datasets that are contaminated by ambient, random, white noise
with and without outliers. We assess the sensitivity of the inversion to noise in the case of
decimated noisy data. We show that the L2 norm is highly sensitive to non-Gaussian errors
and requires the consideration of denser frequency sampling to improve the signal-to-noise ratio
of the model. The L1 norm shows very robust behavior, even in the case of highly decimated
data, and provides, therefore, an interesting alternative to the L2 norm for the design of
efficient, frequency-domain algorithms. Investigations of other functionals, i.e., the Huber and

125
APPLICATIONS

the Hybrid criteria, highlight the difficulties for finding the best threshold, which will require
some tedious trial-and-error investigations.

4.3.2 Theory

4.3.2.1 The least-squares norm

The least-squares formalism provides the most usual framework for frequency-domain FWI
(Pratt and Worthington, 1990; Pratt, 1990). The L2 functional is usually written in the
following form:
1
CL2 = ∆d† S†d Sd ∆d,
(k)
(4.4)
2
(k)
where ∆d = dobs −dcalc is the data misfit vector for one source and one frequency, the difference
(k)
between the observed data dobs and the modeled data dcalc computed in the model m(k) . k is
the iteration number of the non-linear iterative inversion. Superscript † indicates the adjoint
operator, and Sd is a diagonal weighting matrix that is applied to the misfit vector to scale the
relative contributions of each of its components.
(k)
Differentiation of CL2 with respect to the model parameters gives the following expression
of the gradient: n o
GL2 = R Jt S†d Sd ∆d∗ ,
(k)
(4.5)

where J is the Fréchet derivative matrix, t and ∗ denote the transpose and conjugate operators,
respectively, and R denotes the real part of a complex number. The gradient of the misfit
function, equation (4.5), can be efficiently computed without explicitly forming J, using the
adjoint-state method (Plessix, 2006). This gives the following expression:
n ∂At o
(k) t −1 † ∗
Gm i L2
= R v A S S
d d ∆d , (4.6)
∂mi

where A is the forward-problem operator, which linearly relates the source s to the wavefield,
v: Av = s. The modeled seismic data used in FWI, dcalc , is related to the seismic wavefield,
v, by a projection operator, which extracts the values of the seismic wavefield at the receiver
positions. The sparse matrix ∂A/∂mi represents the radiation pattern of the diffraction by the
model parameter mi , and, hence, gives some insight on the sensitivity of the data to a specific
class of parameter as a function of the aperture angle. The gradient of the misfit function can
be interpreted as a weighted zero-lag convolution between the incident wavefield v and the
adjoint residual wavefield back-propagated from the receiver positions, A−1 S†d Sd ∆d∗ .
The misfit function and its gradient, equations (4.4) and (4.6), are given for one source and
one frequency. For multiple sources and frequencies, the expressions are obtained by summing
the contribution of each source and frequency.

4.3.2.2 The least-absolute-value norm

We can extend the L1 norm developed by Tarantola (1987); Crase et al. (1990) for real arith-
metic numbers to the complex arithmetic required by frequency-domain data through the misfit

126
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

function:
(k)
X
CL1 = |sdi ∆di |, (4.7)
i=1,N

where |x| = (xx∗ )1/2 , N is the number of elements in the misfit vector for one source and one
frequency, and sdi are the elements of the diagonal Sd . The gradient of the misfit function is
given by:

(k)
n o ∆d∗i
GL1 = R Jt Std r with ri = for 1 ≤ i ≤ N, (4.8)
|∆di |

where we assume that |∆di | > 0, considering the machine precision used. For all of the tests
that we have performed, we never met any case where |∆di | = 0. In the case of real arithmetic
numbers, the term ∆d∗i /|∆di | of expression (4.8) corresponds to the sign function (Tarantola,
1987; Crase et al., 1990).

4.3.2.3 The Huber criterion

where is a threshold that controls the transition between the L1 and the L2 norms. In
equation (4.9), the Huber criterion is continuous for all ∆di , and particularly for the value that
satisfies |sdi ∆di | = .
The gradient of the Huber functional is given by:
∆d∗
 n o
 R Jt S† Sd r with ri = i for |sdi ∆di | ≤ and 1 ≤ i ≤ N,
(k) d
GLHuber = n o
∆d∗i (4.10)
 R J t St r with r i = for |sdi ∆di | > and 1 ≤ i ≤ N.
d |∆di |

4.3.2.4 The Hybrid L1 /L2 criterion

Bube and Langan (1997) introduced a Hybrid L1 /L2 criterion in order to overcome some
limitations of the Huber criterion that introduces artificial nonuniqueness in full-rank linear
problems (Bube and Nemeth, 2007).
The Hybrid L1 /L2 functional can be written for complex arithmetic numbers as:
1/2
|sd ∆di |2

(k)
X
CLHybrid = ci with ci = 1+ i 2 − 1, (4.11)

i=1,N

where is the threshold between the L1 and L2 behaviors. Note that we have the following
properties:
|sdi ∆di |2
(
22
for small ∆di ,
ci ≈ |sdi ∆di | (4.12)
for large ∆di ,

127
APPLICATIONS

Functional evolution Residual source amplitude

8 4

Residual source amplitude

L2 functional L2 functional
7 L1 functional 3 L1 functional
Huber functional Huber functional
6 Hybrid functional 2 Hybrid functional
Function

5 1

4 0

3 -1

2 -2

1 -3

0 -4
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
∆d ∆d

Figure 4.17: (a) The values of the criteria as functions of an unweighted real arithmetic misfit
∆d; and (b) the associated residual source amplitude in the gradient expression. L2 , L1 , and
the Huber and Hybrid functionals are shown in the red, green, blue and black lines, respectively.
The last two criteria are plotted for = 1.

which show that the Hybrid functional is asymptotically equivalent to the L2 and L1 norms
for small-amplitude and large-amplitude residuals, respectively.
The gradient of the Hybrid L1 /L2 criterion is given by:
n o ∆d∗i
GLHybrid = R Jt S†d Sd r
(k)
with ri = 1/2 for 1 ≤ i ≤ N. (4.13)
|sdi ∆di |2

2 1+ 2

4.3.2.5 Interpretation

Equations (4.5), (4.8), (4.10) and (4.13) clearly show that the gradients of the misfit functions
have similar forms, but with different source terms for the back-propagated adjoint wavefield.
This implies that the same FWI algorithm can be used to compute the gradients of the different
misfit functions with the same computational cost, provided that the source term of the adjoint
back-propagated wavefield and the misfit function can be computed for each functional.
Figure 4.17 shows the misfit function and the source term of the back-propagated wavefield
as functions of the real arithmetic unweighted misfit ∆d for the four minimization criteria. The
L2 norm naturally gives a high weight to large residual, which leads to a lack of robustness
for this approach in the case of incoherent large errors in data. For the L1 norm, the data
residuals are normalized according to their amplitudes, which gives clear insights into why this
is expected to be less sensitive to large residuals. The Huber and Hybrid criteria follow the L2
and L1 behaviors for small and large residuals, respectively, defined by the threshold , with a
different transition shape.

4.3.2.6 Algorithm

The 2D elastic, frequency-domain FWI algorithm used in this study is described in Brossier
et al. (2009a); Brossier (2011), and uses the hierarchical algorithm (algo 3.1)

128
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

The forward problem is performed with a finite-element discontinuous Galerkin method

for solving the elastodynamic equations in the frequency domain (Brossier et al., 2008). The
linear system resulting from this discretization is solved in parallel using the MUMPS LU solver
(Amestoy et al., 2006).
The optimization problem is solved with the quasi-Newton L-BFGS algorithm (Nocedal,
1980). Newton or quasi-Newton algorithms are generally applied to quadratic or locally-
quadratic misfit functions. In contrast, the L1 norm and the Huber criterion are not twice
continuously differentiable, a condition for the convergence of the L-BFGS algorithm. Nonethe-
less, the numerical examples performed in this study show that the violation of the convergence
conditions did not significantly affect, in pratice, the convergence of the L-BFGS algorithm.
Similar conclusions were derived by Guitton and Symes (2003), who applied the L-BFGS algo-
rithm with the Huber criterion for velocity analysis. The convergence of the L-BFGS algorithm,
for functionals which are not twice continuously differentiable, can be interpreted by the fact
that only a definite positive matrix approximation of the inverse of the Hessian, not the exact
one, is computed in the L-BFGS algorithm from only the first derivative (i.e., gradient) of the
misfit functions at previous iterations (Guitton and Symes, 2003). Moreover, the line search in
the perturbation direction is performed by parabolic fitting, to compute the optimal steplength
α(k) , and to guarantee the decrease of the misfit function, even for non-twice continuously dif-
ferentiable functionals. For all the applications presented in this study, the information of the
five previous iterations were used for L-BFGS. The diagonal of the L2 pseudo-Hessian matrix
(Shin et al., 2001) is used as an initial guess for the L-BFGS algorithm for each minimiza-
tion criterion investigated in this study. The aim of the diagonal pseudo-Hessian is to remove
the geometrical spreading signature of the incident and backpropagated wavefields from the
gradient amplitudes.
The perturbation model estimated at each iteration is regularized with an adaptive Gaussian
smoothing operator, to filter out high frequencies artifacts that are not constrained by the
current group of inverted frequencies. The local vertical correlation length of the Gaussian filter
is defined according to the inverted frequency and the local wavespeeds such that wavelengths
significantly smaller than half the propagated wavelength (i.e., the maximum vertical resolution
of FWI) are filtered out in the gradient (Sirgue and Pratt, 2004).
The source is estimated in the FWI algorithm by a linear inversion (equation (3.6)). Of
note, the source is estimated with the least-squares norm even in presence of outliers, whatever
the norm used for the model update is. We found out that the source estimation was quite
robust in presence of noise for the examples shown hereafter: the error does not not exceed 5
% in amplitude and 1% in phase. The robustness of the source estimation can be attributed
to the strong redundancy of the data for the mono-frequency scalar source estimation.
For all numerical tests of this study, the threshold value for the Huber and Hybrid criteria
was fixed to = 0.2 mean(|dobsi |). This value was shown practically to be less sensitive to
outliers in the data than the one indicated by Guitton and Symes (2003), based on max(|dobsi |).

4.3.3 Numerical tests: the offshore Valhall model

4.3.3.1 Inversion set-up

A first numerical example is based on the synthetic Valhall model (Figure 4.18), which is
representative of oil and gas fields in shallow water environments of the North Sea (Munns,

129
APPLICATIONS

1985). The main targets are a gas cloud in the large sediment layer, and in a deeper part of the
model, the trapped oil underneath the cap rock, which is formed of chalk. Gas clouds are easily
identified by the low P-wave velocities, whereas their signature is much weaker in the VS model.
The selected acquisition mimics a four-component ocean-bottom cable survey (Kommedal et al.,
2004), with a line of 315 explosive sources positioned 5 m below water surface, and 315 3C
sensors on the sea bed. This geological setting leads to a particularly ill-posed problem for
S-wave velocity reconstruction, due to the relatively small shear-wave velocity contrast at the
sea bed, which prevents recording of significant P-to-S converted waves. A successful inversion
requires a multi-step hierarchical strategy in the manner of Sears et al. (2008), as developed in
Brossier et al. (2009b) for noise-free data. In this study, we assess the same approach for noisy
data:

1. In the first step, the P-wave velocity is reconstructed from the hydrophone data. The
forward problem is performed with the elastic discontinuous Galerkin method, but the VS
model is left unchanged during the FWI. The aim of this first stage is to improve the VP
model so as to significantly decrease the P-wave residuals. During this first step, a coarse
mesh that is adapted to the VP wavelength is designed for computational efficiency. In
this case, S-wave modeling is affected by numerical dispersion that, however, does not
significantly impact the reconstruction of the VP model. This first stage is justified
by the fact that the P-to-S converted waves have a minor footprint in the hydrophone
component. This negligible sensitivity of the hydrophone data to the VS structure has
allowed us for the successfull acoustic inversion of the elastic data computed in the Valhall
model. (Brossier et al., 2009b).

2. In the second step, the VP and VS models are reconstructed simultaneously from the
horizontal and vertical components of the geophones. An amplification, with a gain
given by the power of 2 of the source/receiver offset, is applied to the data through the
matrix Sd . This weighting increases the weight of the intermediate-to-long-offset data at
which the converted P-to-S arrivals are recorded.

Five frequencies were inverted successively (2, 3, 4, 5 and 6 Hz). The starting frequency
(2 Hz) is lower than the one available in the real OBC hydrophone-data of Valhall (3.5 Hz)
(Sirgue et al., 2009). Note, however, that a starting frequency as small as 2 Hz was recently
used to perform acoustic FWI of OBS data (Plessix, 2009). The use of such low starting
frequency is required by the fact the VS has a higher resolution power than VP because of
shorter propagated wavelength, and, hence, requires lower starting frequency or more accurate
starting model. Our main concern in this study is to tune the elastic FWI with a reasonably-
realistic experimental setup such that differences in the behaviors of the different functionals
are highlighted. In the following, we shall not address anymore the impact of the starting
frequency and the starting model in FWI, and will focus on the comparative performances of
different data residual functionals for a given pair of starting frequency and model.
During each frequency inversion, we used 3 time-damping factors (γ=2, 0.33, 0.1 s−1 )
applied in cascade to the monochromatic data. For the smaller damping factor, the entire
wavefield, including converted waves and free-surface multiples, is involved in the inversion.
Starting models were built by smoothing the true models with a Gaussian filter, the vertical
correlation length of which increased linearly from 25 m to 1,000 m with depth, while the
horizontal one was fixed at 500 m (Figure 4.19). This smoothing should reasonably mimic the

130
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

Figure 4.18: The true synthetic Valhall model for (a) P-wave and (b) S-wave velocities.

Figure 4.19: (a) VP and (b)VS starting models for FWI, as inferred by Gaussian smoothing of
the true models (Figure 4.18).

spatial resolution of a velocity model developed by refraction traveltime tomography (Prieux

et al., 2009). The deep part of this starting model is clearly smoother than a velocity model
inferred from reflection traveltime tomography as the one shown in Sirgue et al. (2009).
Ten iterations were performed per damping factor, leading to 30 iterations per frequency
inversion. VP and VS are the reconstructed parameters. The density is constant and assumed
to be known in the inversion.
Two tests were performed, with and without outliers in the data. For both tests, a ran-
dom uniform white noise was introduced into the observed data, computed using the forward-
problem engine implemented in the inversion code (the so-called inverse crime). The observed
data were computed using a (Dirac) delta function for the source wavelet. Therefore, each
frequency component of the data has the same signal-to-noise ratio (SNR), since white noise

131
APPLICATIONS

Figure 4.20: Real part of the 4-Hz frequency-domain data in the source/receiver domain for the
Valhall model. (a) Noise-free hydrophone data; (b) added noise; and (c) resulting contaminated
data used for FWI.

was considered. Note however that the source wavelet spectrum has a negligible influence
in frequency-domain FWI, where single frequencies or groups of frequencies of narrow band-
with are generally inverted sequentially, namely, independently, in the framework of multiscale
approaches.
The signal-to-noise ratio was set to 10 dB, based on the power value of the signal defined
for one frequency by:
nshot
X nreceiver
Pdata X
SN R(dB) = 10 log10 with P = dishot,irec d∗ishot,irec . (4.14)
Pnoise
ishot=1 irec=1

Figure 4.20 shows the 4-Hz, noise-free and noisy data in the source/receiver domain for the
hydrophone data.

4.3.3.2 Results

During the first test, we considered only the ambient noise. The VP and VS models inferred from
the four minimization criteria (L2 , L1 , Huber, Hybrid) after the second step of inversion are
shown in Figure 4.21. These reveal very good results for VP models for all functionals, whereas
only the robust L1 norm, the Huber and Hybrid criteria provide acceptable VS models.
In a second test, we introduced outliers into the data: large errors (i.e., the noise was locally
multiplied by 20) were introduced randomly in one trace out of a hundred, to simulate a poorly
preprocessed dataset. The resulting noise is consequently no longer uniform for this test. The
VP models obtained after the first inversion step with the four functionals are shown in Figure
4.22. The L1 norm, and the Huber and Hybrid criteria provide accurate VP models, whereas
the inversion rapidly converged towards a local minimum when the L2 norm was used. For the
L2 norm, the inversion was stopped close to the first step because of the convergence towards
a local minimum. The results obtained after the second inversion step performed with the
L1 norm showed reliable reconstruction of both the VP and VS models (Figure 4.23), which
are close to those obtained from data without outliers (Figure 4.21(c-d)). This highlights the
limited sensitivity of the L1 norm to outliers even for the VS reconstruction.

132
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

Figure 4.21: Reconstructed VP (left panels) and VS (right panels) models for the first Valhall
test with the noisy data, after the two FWI steps. (a-b) L2 norm; (c-d) L1 norm; (e-f) Huber
criterion; and (g-h) Hybrid criterion.

4.3.4 Numerical tests: The onshore SEG/EAGE overthrust model

4.3.4.1 Inversion set-up

A second numerical example focuses on the SEG/EAGE overthrust model. High-amplitude

surface waves are present in the data of this onshore model, and need to be taken into account
during inversion. Brossier et al. (2009a) designed a hierarchical scheme to invert the body and
surface waves jointly, by using simultaneous frequency inversion. We followed this approach in
this study. The acquisition geometry was composed of 199 explosive sources 20 m below the
surface, and 198 vertical and horizontal geophones recording wavefields were located on the
free surface.
Five discrete frequencies, distributed among two slightly overlapping frequency groups, were
inverted: (1.7, 2.0, 3.5) and (3.5, 4.8, 7.2) Hz. The choice of these two groups of frequencies
were shown to be efficient for inversion while limiting the numerical cost (Brossier et al., 2009a).
Five time-damping factors were used in cascade for each frequency group (γ = 1.5, 1.0, 0.5, 0.1,

133
APPLICATIONS

Figure 4.22: Reconstructed VP models for the second Valhall test with the noisy data containing
outliers, after the first FWI step. (a) L2 norm; (b) L1 norm; (c) Huber criterion; and (d) Hybrid
criterion.

0.033). We used a higher number of damping factors and a smaller interval between damping
factors than for the Valhall example, because the overthrust case study is more non linear than
the Valhall one, and, hence, requires injecting more progressively increasing amount of data.
This increasing non linearity results from the presence of high-amplitude surface waves in the
onshore overthrust case study.
Forty-five non-linear iterations were performed per damping factor. Figure 4.24 shows the
true VP model of the overthrust and the 500 m Gaussian smoothed version used as the starting
model. A constant Poisson’s ratio of 0.24 was fixed to build the true and the starting VS
models. The density was constant and assumed to be known during the inversion. VP and VS
are the reconstructed parameters for the inversion of the horizontal and vertical components
of the particle velocity.
The inverted data were computed with a Dirac source wavelet. Random uniform white noise
was introduced into the observed data, with a signal-to-noise ratio of 7 dB for each frequency
component. Figure 4.25 shows the 3.5 Hz, noise-free and noisy data in the source/receiver
domain for the horizontal component of particle velocity.

134
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

Figure 4.23: Reconstructed (a) VP and (b) VS models for the second Valhall test with the noisy
data containing outliers, after the two FWI steps with the L1 norm. Note that the models are
very close to those of Figure 4.21(c)(d).

Figure 4.24: (a) True and (b) starting VP models for the synthetic SEG/EAGE overthrust
tests. The VS models are derived from the VP models using a constant Poisson ratio of 0.24.

4.3.4.2 Results

The VP and VS models obtained with the different minimization criteria are shown in Figure
4.26. The L1 norm again provides the most reliable results. The Huber and Hybrid criteria
show quite robust results, in particular in the shallow part of the models, even if the Hybrid
criterion suffers from high frequency artifacts despite the smoothing regularization operator
applied to the perturbation model at each iteration. The models obtained with the L2 norm
are polluted by strong artifacts, particularly in the thrust area and in the deep structure.

135
APPLICATIONS

Figure 4.25: Real part of the 3.5-Hz frequency-domain data in the source/receiver domain for
the overthrust model. (a) Noise-free horizontal component of particle velocity recorded by
geophones; (b) added noise; and (c) resulting contaminated data used for FWI.

Figure 4.26: Reconstructed VP (left panels) and VS (right panels) models for the overthrust
test, obtained by FWI. (a-b) L2 norm; (c-d) L1 norm; (e-f) Huber criterion; and (g-h) Hybrid
criterion.

136
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

4.3.5 Discussion

4.3.5.1 The offshore Valhall model

The results of the first test, where only ambient noise without outliers was considered, show
reliable reconstruction of the VP model for the four norms, whereas only the L1 , the Huber
and the Hybrid functionals provide reliable reconstruction of the VS model.
In this shallow-water environment with low velocity contrasts at the sea bed, the VP imaging
is more linear than the VS imaging for two main reasons. First, the larger P-wavelengths are
less resolving than their S counterparts, and are therefore less sensitive to the inaccuracies of
the starting model in the framework of a multi-scale reconstruction (Brossier et al., 2009a).
Secondly, the P-waves dominate the seismic wavefield, whereas the P-to-S waves have a weaker
footprint in the data. The limited signature of the S-waves in the data makes the inversion
poorly conditioned for the S-wave parameter class, even with noise-free data. Brossier et al.
(2009b) showed how the hierarchical two-step strategy allows us to increase the sensitivity of the
inversion to the VS parameter during the second step, and hence the successful reconstruction
of the VS model with the L2 norm in the case of noise-free data. However, adding noise to
the data still contributes to the weakening of the sensitivity of FWI to the P-to-S arrivals.
In this case, the two-step strategy implemented with the L2 norm failed to reconstruct the
VS model. In contrast, the L1 norm, the Huber and the Hybrid criteria successfully converge
towards acceptable VS models by mitigating the contribution of the residual amplitude in the
reconstruction.
The second test, where outliers were introduced into the data, illustrates first the expected
failure of the L2 norm in the presence of high-amplitude isolated noise. The L2 norm intrin-
sically amplifies the weight of the high-amplitude residuals during inversion, hence causing
divergence of the optimization if the residuals do not correspond to useful seismic arrivals. The
L1 norm, as well as the Huber and Hybrid criteria, shows stable behavior for VP imaging in
this unfavorable context, because the isolated, high-amplitude outliers have a negligible con-
tribution in these functionals. The strong robustness of the L1 norm with respect to noise
is illustrated by its ability to reconstruct the VS model from low-amplitude P-to-S converted
waves, even in the presence of outliers.

4.3.5.2 The onshore SEG/EAGE Overthrust Model

In an onshore context, where body waves and surface waves are jointly inverted, the data are
very sensitive to both VP and VS parameters. For example, the VS velocities on the near
surface have a significant impact on the high-amplitude surface waves. If the starting VP and
VS models for FWI are not sufficiently accurate, then high-amplitude, surface-wave residuals
can direct the inversion towards a local minimum of the misfit function. In this context,
data redundancy provided by the simultaneous inversion of multiple frequencies is essential to
converge towards acceptable models (Brossier et al., 2009a). In the case of noisy data, it might
however be necessary to strengthen this redundancy, by decreasing the frequency interval within
each frequency group when the L2 norm is used. Figure 4.27 shows the VP and VS models
obtained with the L2 norm, when the number of frequencies per group was increased from
three to nine: i.e., (1.7, 1.8, 2.0, 2.3, 2.5, 2.7, 3.0, 3.2, 3.5) and (3.5, 3.8, 4.1, 4.4, 4.8, 5.2, 6.0,

137
APPLICATIONS

Figure 4.27: Reconstructed (a) VP and (b) VS overthrust models after FWI using the L2 norm
applied to a dense dataset of two slightly overlapping groups of nine frequencies each. Note
the improvement in the signal-to-noise ratio compared to Figure 4.26(a-b).

6.5, 7.2) Hz. Improvements of both the VP and VS models, compared to those of Figure 4.26(a-
b), clearly show that increasing the data redundancy improves the signal-to-noise ratio of the
models, at the expense of the computational cost. On the contrary, the L1 norm shows more
stable behavior in the presence of noise, and therefore it provides more reliable results with
efficient frequency-domain FWI, where only a few frequencies are involved in the inversion. The
Huber and Hybrid criteria show less convincing results compared to the Valhall experiment,
highlighting the difficult tuning of the threshold .

4.3.5.3 Implications for 3D FWI

Some implications for 3D FWI can be derived from the results of the synthetic experiments.
Historically, the two main motivations behind the use of 2D frequency-domain modeling and
inversion were: (1) to design computationally efficient algorithms by limiting the modeling and
the inversion to a few discrete frequencies; and (2) to design multiscale strategies by proceeding
from low frequencies to higher ones (Pratt and Worthington, 1990). For 3D problems, com-
parative analyses of modeling methods has shown that time-domain, explicit-scheme modeling
provides a computationally efficient alternative to frequency-domain modeling methods based
either on direct or iterative solvers to perform frequency-domain FWI (Nihei and Li, 2007;
Plessix, 2007; Sirgue et al., 2008; Virieux et al., 2009).
Our results have highlighted the benefits provided by the data redundancy for performing
robust elastic FWI, where both the VP and the VS parameters are reconstructed and the L2
norm is used. Therefore, is it worth leaving the inversion in the frequency domain if the
time-domain modeling engine provides the opportunity to fully exploit the data redundancy
by mean of time-domain FWI? A more quantitative analysis of the computational burden that
results from implementation of the inversion in the time domain compared to the frequency
domain might be needed to answer this question. The computational burden resulting from
implementation of the inversion in the time domain might be due to disk access or extra forward-

138
4.3 Which data residual norm for robust elastic frequency-domain Full Waveform Inversion?

problem resolutions that depend on the implemented numerical strategy (Akcelik, 2002; Liu
and Tromp, 2006; Symes, 2007). Multiscale strategies in time-domain FWI can be implemented
using the approach of Bunks et al. (1995), where successive inversions of datasets of increasing
high-frequency content are performed. If efficient 3D elastic frequency-domain FWI must be
performed, the L1 norm definitively provides the more robust criterion for the noise levels
investigated in this study.

4.3.6 Partial conclusions

Application of elastic FWI to offshore and onshore data shows the strong sensitivity of the
L2 norm to non-Gaussian errors in the data when decimated discrete frequencies in FWI are
considered for computational efficiency. The marginally used L1 norm appears to be weakly
sensitive to noise even in the presence of outliers, and provides stable results for onshore and off-
shore FWI applications. In particular, the L1 norm allows successful inversion of low-amplitude
P-to-S arrivals for reliable VS model building in an offshore environment, even in the presence
of noise. Alternative functionals, such as the Huber and Hybrid criteria, which combine the
L1 and L2 norms, can provide stable results if the threshold controlling the transition between
the L1 and the L2 behaviors is well chosen. The judicious estimation of this threshold by trial-
and-error tests is a clear drawback of the Huber and Hybrid criteria, even if these functionals
should be as good as the L1 norm if the threshold is well chosen. More automatic functionals,
such as the L1 or L2 norms, should therefore be recommended for inversion of field data. The
L1 norm reveals an interesting alternative to the L2 norm, especially when decimated datasets
are processed by efficient frequency-domain FWI. The results obtained with the L2 norm can,
however, be improved in the presence of ambient noise, by increasing the data redundancy
through simultaneous inversion of multiple frequencies at the expense of the computational
cost. Some implications for 3D FWI can be derived from these conclusions. On the one hand,
the L1 norm might be an appealing alternative to the L2 norm for FWI based on frequency-
domain seismic modeling, which requires consideration of a limited number of frequencies for
computational efficiency. On the other hand, explicit time-marching algorithms are competi-
tive with frequency-domain seismic modeling methods for 3-D problems. If the wavefields are
computed fully in the time-domain for computational efficiency, the inversion of decimated
data in the frequency domain may not provide a significant computational saving compared to
a time-domain inversion. In this case, the inversion may be left in the time domain, to fully
take advantage of the data redundancy.

Acknowledgments

Many thanks go to J. Kommendal and L. Sirgue from BP, for providing the elastic synthetic
models of Valhall.

139
APPLICATIONS

4.4 Onshore acoustic FWI : the Southern Apennines Baragiano

overthrust structure

The first real example where the FWI procedure has been performed by the SEISCOPE group
has been devoted to a dataset over a quite complex crustal structure where standard workflow
has difficulties to be applied as velocity analysis does not catch any hyperbolic reflections,
leaving the migration with poor macromodel for focusing images.

4.4.1 Geological setting

The seismic data processed in this paper have been collected in the axial zone of the South-
ern Apennines thrust-and-fold belt (Italy) by the Enterprise Oil Italiana (Figure 4.30). The
investigated area is characterized by a strongly heterogeneous crustal structure.
The geological setting consists of a tectonic stack of NE-verging sheets involving Jurassic
rocks (cherty dolomites, cherts) and Cretaceous shales. These basinal units are overthrusted by
a regional nappe, which consists of a tectonic melange of Paleocene clays and marly-limestones.
The shallower units include Pliocene soft sediments representing the infill of small basins. Main
surface tectonic structures crossed by the seismic profile are a NW-trending synform, filled by
soft Pliocene sediments, and broad nappe anticline, the latter responsible for a tectonic window
where Mesozoic rocks outcrop.
Due to the presence of clayey strata alternated with Mesozoic hard-rock sheets, and to
a variable surface geology, strong lateral variations and velocity inversions are present at all
depths. The heterogeneous velocity structure, along with a rough topography (Figure 4.28b)
hamper the collection of good-quality near-vertical reflection data, which are otherwise affected
by strong diffractions, multiples, surface waves and static problems (Dell’Aversana et al., 2000).
In addition, in such a context, standard velocity analysis are inadequate to estimate accurate
velocity macro-models for prestack depth migration. As a consequence, multi-channel reflection
seismic usually yields only poor-quality structural images in the investigated area (see Figure
4.29 same Figure 7 in Dell’Aversana (2001)). In order to address this problem, alternative
geophysical exploration tools (multi-fold wide-angle seismic, well-logging, magnetotelluric and
gravity) have been jointly used in the region (Dell’Aversana, 2001).

4.4.2 Wide-aperture acquisition and data quality

The 2-D acquisition geometry consists of a SW-NE 14200m long line running above a synform
and wide antiform (Dell’Aversana et al., 2000). The profile strikes NE-SW and is almost
perpendicular to the main thrust front and fold axes in the area. The profile is tied to a
deepwell drilled in the core of the antiform (Fig. 4.30c). The topography along the profile
is rough. The maximum difference in altitude between sources reaches 700 m (Figure 4.30b).
The surface receiver array consists of 160 vertical geophones deployed along the 2-D line with
90 m interval. Two hundred thirty-three shots were fired with an average spacing of 60 m into
the array by housing explosive charges in 30 m deep boreholes (for a detailed description of the
experiment design see Dell’Aversana et al. (2000)). This acquisition geometry leads to a multi-
fold wide-aperture acquisition with densely sampled source and receiver spacings amenable to

140
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

SW Offset (km) NE
a) 0
-4 -2 0 2 4 6 8

0.5
Time - offset/5. (s)

1.0

1.0
1.5
Amplitude

2.0
0.5

0 10 20 30 40 50 60 Offset (km)
Frequency (Hz)
b) 0
-4 -2 0 2 4 6 8

Re
SZ

{
Time - offset/5. (s)

0.5

1.0

GR R1

1.5 R2

2.0

Offset (km)
c) 0
-4 -2 0 2 4 6 8

0.5
Time - offset/5. (s)

1.0

1.5

2.0

Figure 4.28: Example of CRG. The receiver is located at x = 5.75 km in Figure 4.30b. a) Raw
data. Note the sharp amplitude variations with offset. b) Data processed by minimum phase
whitening, Butterworth filtering and automatic gain control for qualitative interpretation. The
main arrivals are labeled. Re: refracted waves, GR: ground rolls, R1 and R2: wide-angle
reflections; SZ: shadow zone. A reduced time scale is applied with a reduction velocity of
5 km/s. c) Data processed for full waveform tomography.

wavefield propagation processings such as full waveform tomography and asymptotic prestack
depth migration.

A representative example of Common-Reiver-Gather (CRG) is shown in Figure 4.28. The

141
APPLICATIONS

Distance (m)
0 5000 10000
-500

500

1000

1500
TWT (ms)

2000

2500

3000

3500

4000

4500

Figure 4.29: Time migrated stack section obtained by conventional processing of seismic mul-
tichannel reflection data collected along the investigated wide-aperture profile. The horizontal
axis is labelled between 0 km and 13 km of distance with the same coordinate system that the
one used for the wide-aperture profile. Note the overall poor quality of the data and the low
reflectivity on the right side of the section beneath the antiform.

CRG after Butterworth bandpass filtering (cut-off frequencies: 4 − 50 Hz) is shown in Figure
4.28a. Note the sharp amplitude variations with offset which can be attributed to both shot
size variability, variable receiver-ground coupling and heterogeneity of the structure. Data has
been processed by minimum phase frequency-domain deconvolution (whitening), Butterworth
bandpass filtering and automatic gain control (AGC) to carry out a qualitative interpretation
of the main arrivals in Figure 4.28b. The cut off frequencies of the Butterworth filter were
between 5 Hz and 15 Hz to eliminate high-amplitude high-frequency noise. Refracted waves
(Re) and two wide-angle reflections, (R1 ) and (R2 ) are labeled in Figure 4.28b. Note also the
sharp attenuation of the refracted wave at offsets greater than 5 km indicating the presence of
a possible constant velocity or low velocity zone (SZ).
The amplitude spectra of the 4 − 50 Hz bandpass filtered data, the data after minimum phase
whitening and the data after minimum phase whitening and Butterworth band-pass filtering
are shown in the inset of Figure 4.28a.

4.4.3 Traveltime tomography results

Application of the non-linear traveltime inversion to the data set used in this paper have already
been presented in Improta et al. (2002). The resulting velocity model, that will be used as
a starting model for full waveform tomography later on is shown in Figure 4.31a. Over 6000

142
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

Aa) C)
Velocity (m/s)
N 4000 5000 6000
-500

500

Elevation (m)
1000

1500

0 2.5 5 km

1 2 3 2000
4 5 6 7

b) SW Distance (km) NE
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2500
Depth below reference level (km)

l l l l l l
l l l l l l
l l l l l l

1 2 3 4
0.5 l l l l l l 5
l l l l l l
l l l l l l

Figure 4.30: a) Geological setting in the target area. (1) Plio-Pleistocene soft sediments, (2)
Paleocene shales, clays and marly-calcareous deposits, (3) Mesozoic basinal rocks, (4) syncline,
(5) anticline, (6) seismic profile, (7) well for oil exploration (modified after Scarsella et al., 1967,
Servizio Geologico d’Italia, Foglio no 199). b) Topography along the seismic profile. Position
of the well along the profile is indicated by the symbol (circle+cross). c) Velocity-depth and
petrological profiles determined by VSP and well data. (1) shales (Lower Cretaceous); (2) cherts
(Jurassic-Upper Triassic); (3) Cherty limestones (Upper Triassic); (4) sandstones (Middle-
Lower Triassic); (5) main thrust planes. Elevation is with respect to the sea level (from P.
Dell’Aversana, unpublished).

first-arrival traveltimes, from 32 receivers, have been inverted. The velocity model of Figure
4.31a was obtained after four inversion runs and was parameterized during the last run (which
involves the finer parameterization) by 128 B-spline nodes (16 vertical and 8 horizontal nodes,
respectively). Checkboard resolution tests indicated that the model is well resolved down to
about 1.7 km depth below the surface in the 4 − 12 km distance range. The velocities range
between 2.0 km/s and 7.0 km/s. The traveltime tomographic velocity model was subsequently
interpolated on a 639 × 171 regular grid with a 25 m grid interval. This discretization was
used to solve the FD forward-modeling problem in the waveform tomography program. Since
the FD stencils that were implemented in the wave modeling program require 4 grid points per
minimum wavelength, the maximum frequency that is involved in the full waveform tomography
is 20 Hz.
The linearized waveform tomography requires that the first arrival times are matched to within

143
APPLICATIONS

a half-cycle of the frequency to be inverted to avoid cycle-skipping artefacts in the waveform

inversion. The superposition of the first arrival times on the CRG of Figure 4.28 suggests that
this condition is reasonably verified for the dominant frequency of the signal (and hence for
the lowest frequency component) when using the velocity model of Figure 4.31a (Figures 4.31b
and 4.31c).

4.4.4 Full-waveform tomography results

Successful application of full waveform tomography requires to estimate a source wavelet and
to design a specific preprocessing of the data (e.g. Pratt and Shipp, 1999). A source wavelet
was estimated by solving the linear inverse problem (equation (3.6)).
The main objectives of the data preprocessing (Figure 4.32) were:

• (1) to improve the signal-to-noise ratio

• (2) to apply to the data several modifications which reflect the approximations used in
the wavefield modelling and inversion algorithms (see Chapman and Orcutt (1985) for a
discussion on this issue)

• (3) to mitigate several amplitude effects resulting from the instrumentation (variability
of the source size from one shot to the next) and from the variable surface geology which
affects the receiver-ground coupling. Accounting for the surface geology is clearly beyond
the resolution limit of full waveform tomography.

These issues are discussed in the appendix C and in Ravaut et al. (2004).
The CRG of Figure 4.31a after waveform inversion preprocessing is shown in Figure 4.31c.
The best waveform inversion results that we have obtained are depicted in Figure 4.33.
We select 119 CRGs for full waveform tomography among the 160 available gathers. Their
positions along the profile are given in Figure 4.33. This dataset represents 20563 traces while
16 frequency components ranging from 5.4 Hz and 20 Hz are inverted sequentially. This gives
roughly a frequency interval of 1 Hz. This frequency interval was chosen heuristically to
reach a reasonable trade-off between the need to decimate data redundancy in the wavenumber
domain to limit the number of frequencies to be inverted (Sirgue and Pratt, 2001), and hence
saving CPU time and the need to stack redundant data to improve signal-to-noise ratio. Ten
iterations were computed per frequency component.
For all the inversions, the model parameters were discretized on a Cartesian grid with a
grid interval of 100 m and 25 m horizontally and vertically respectively.
The velocity models shown in Figure 4.33 were obtained close of the inversion of the fre-
quency components 5.38 Hz, 10.27 Hz, 15.16 Hz and 20 Hz respectively. Note how the full
waveform tomography incorporates details of the structure in the velocity models as the in-
version progresses towards high frequencies. The most noticeable structures are high-velocity
SW-dipping slices at around x = 10 km and z = 1 − 2 km in Figure 4.33(a-d) corresponding to
high-resistivity bodies previously identified by Dell’Aversana (2001). Note that the inversion of
a rather low frequency component such as 5.38 Hz incorporates already in the velocity models
these SW-dipping structures (Figure 4.33a).

144
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

a) 0 5000 10000 15000

1000
Elevation (m)

-1000

-2000

m/s

2000 4000 6000

Distance (m)
b)
Velocity
0 5000 10000 15000

1000
0
200
3000
Elevation (m)

0
0 400
4000
5000
-1000

50
00
-2000

Offset (km)
c) -4 -2 0 2 4 6 8
0

0.5
Time - X/5.0 (s)

1.0

Figure 4.31: a) Velocity model developed by traveltime tomography. This model was used
as a starting model by the full waveform tomography. The triangles label the position of the
receivers that were involved in the full waveform tomography. b) Ray tracing in the velocity
model. The rays are traced from the receiver position given by x = 5.75 km. c) First-arrival
traveltimes are superimposed on the shown CRG

145
APPLICATIONS

Raw data

Full-waveform tomography data preprocessing

- Trace editing
- minimum phase whitening
- spectral amplitude normalization
- Butterworth bandpass filtering
- Coherency filtering
- Time windowing

Bandpass filtered data

Loop 1 over frequency iw

Picking of first-arrival traveltimes Mono-frequency dataset

Traveltime tomography Loop 2 over iterations it

True
Smooth velocity model iw=1,it=1 Starting model

Full-waveform tomography
False

Updated velocity model

Final velocity model

Smoothing

Asymptotic prestack depth migration Asymptotic prestack depth migration

Depth migrated section Depth migrated section

Figure 4.32: Flowchart of the processing flow. The processings are delineated by rectangles.
The input/output of the processings are delineated by parallepipeds. The gray text denotes
intermediate and final outputs (velocity models and migrated sections) of the seismic imaging
processings. All these models can be integrated in a subsequent geo-structural interpretation.

Comparisons between the VSP log and the velocity logs extracted from the traveltime
(Figure 4.31a) and full waveform (Figure 4.33) tomography models are shown in Figure 4.34.
The waveform tomography logs of Figure 4.34a, 4.34b, 4.34c and 4.34d were extracted from
the models of Figs. 4.33a, 4.33b, 4.33c and 4.33d respectively. The VSP log in Figure 4.34(a-
d) have been low-pass filtered in the time domain in order to roughly reflect the expected

146
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

Figure 4.33: Velocity models from full waveform tomography close of inversion of the 5.4 Hz
(a), 10.27 Hz (b), 15.16 Hz (c) and 20.06 Hz (d) frequency components respectively. Note the
improvement of the spatial resolution compared to the one of the traveltime tomography model
(Figure 4.31a) (panels a and d)

resolution of the full waveform tomography.

A reasonably good fit between the filtered VSP log and the full waveform tomography log is
reached down to 1.9 km depth. This depth corresponds to the limit below which the traveltime
tomography model was considered to be unresolved by the acquisition device (Improta et al.,

147
APPLICATIONS

Velocity (m/s) Velocity (m/s)

a) 4000 5000 6000
b) 4000 5000 6000
-500
-500

0
0

500
500

Elevation (m)

Elevation (m)
1000 1000

1500 1500

2000 2000

2500 2500

c) Velocity (m/s) d) Velocity (m/s)

4000 5000 6000 4000 5000 6000
-500 -500

0 0

500 500
Elevation (m)
Elevation (m)

1000 1000

1500 1500

2000 2000

2500 2500

Figure 4.34: Comparison between VSP and traveltime and waveform tomography velocity logs.
VSP log (solid black line), low-pass filtered VSP log (dot black line), traveltime tomography
log (dot grey line), waveform tomography log (solid grey line). The waveform tomography logs
of panels (a-d) have been extracted from the velocity models of Figures 4.33(a-d) respectively.

2002).
The velocity value and thickness of the high-velocity layer centered on 0 km depth is pretty
well recovered (Figure 4.34d). The high-velocity layer drilled between 1.2 km and 1.6 km is
also pretty well recovered (Figures 4.34c and 4.34d). The arrival labeled R2 in Figure 4.28b is
probably a reflection from this carbonate layer. As documented by the VSP log, these two high-
velocity layers are separated from 0.3 km down to 1.2 km by a low-velocity zone in a large-scale
sense. This low velocity zone corresponds to a stack of thin layers of cherts and shales and is
likely responsible for the shadow zone observed in data and labeled SZ (Figure 4.28b). The full
waveform tomography successfully retrieves each of these thin layers although the amplitude of
the velocity is not always well recovered (Figure 4.34d). Note that a deep low-velocity cherty
layer drilled between 1.6 km and 1.9 km is also well marked in the full waveform tomography
log (Figure 4.34d), despite of its small thickness. Finally, a sharp velocity increase at 1.9 km
in the waveform inversion log matches quite well the top of the high-velocity dolomites layer
drilled between 1.9 km and 2.5 km.

148
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

At a maximum, only a reduction of the cost function of 10% was obtained over iterations
for each frequency component inversion (Ravaut et al., 2004). This was obtained for the
dominant frequency component of the source spectrum. This poor cost function reduction may
partly result from the approximations that we made in the waveform modeling and inversion
but also from inaccuracies in the subsurface velocity models. Thanks to an application to
the SEG/EAGE Overthrust model which is characterized by surface weathering zone, it was
shown that local subsurface velocity inaccuracies in the starting model used by full-waveform
tomography can lead to a poor cost function reduction while a good image of the deeper
structure can be obtained (Ravaut et al., 2004).
In order to assess qualitatively which part of the data contributed significantly in the
model reconstruction, we compute time-domain finite-difference (FDTD) seismograms in the
traveltime and full waveform tomography velocity models (Figure 4.35). The source that was
used for this simulation is located at the position of the receiver of the CRG of Figure 4.28. We
used a bandpass filtered delta function for the temporal source function which should provide a
good approximation of source excitation to model whitened and bandpass filtered data. Direct
comparison between the data of Figure 4.28c and seismograms computed in the traveltime
and full waveform tomography models of Figures 4.33b, 4.33c and 4.43d is shown on the right
panels of Figs. 4.35(b-e). Each seismogram has been normalized by its maximum amplitude
to facilitate the phase identifications.
In Figure 4.35b, one can note that the first-arrival (refracted arrival) is generally correctly
predicted by the traveltime tomography model although amplitudes are overestimated in the
5-8 km offset range where the shadow zone was noticed (compare Figures 4.35a and 4.35b). On
the contrary the wide-angle reflections R1 and R2 are poorly predicted by the smooth travel-
time tomography model. The large-scale high-velocity variation of the traveltime tomography
model centered at 1500 m depth (Figure 4.34) generates small-amplitude wavefield focusing at
large offsets (8-9.5 km) which approximately fits the traveltimes of the R2 wide-angle reflection
(Figure 4.35b). This wavefield focusing evolves towards a high-amplitude wide-angle reflection
spanning a broader range of offsets as full waveform tomography incorporates higher wavenum-
ber in the tomography images (Figures 4.35(c-e)). Note also appearance of the shadow zone
corresponding to attenuation of the first-arrival refracted wave while the wide-angle reflection
R2 is built (Figs. 4.35(c-e)). The wide angle reflection R1 is well predicted in Figures 4.35c
and 4.35d but its match has been degraded close of inversion of the 20 Hz frequency component
(Figure 4.35e). This suggests that waveform tomography was instable between 15 and 20 Hz
although a close comparison between the logs of Figs. 4.34c and 4.34d proves that at least
around the VSP log location the vertical resolution of the full waveform tomography images
has still been improved up to the 20 Hz frequency component inversion.

4.4.5 Asymptotic prestack depth migration results

The final step of our processing flow applies asymptotic prestack depth migration to the wide-
aperture data with three objectives. The first basic objective is to assess the overall consistency
between the images developed by full waveform tomography and asymptotic prestack depth mi-
gration in particular when initiating the two waveform processings from the same macro-model
developed by traveltime tomography. Second, we assess whether the velocity models developed
by full waveform tomography could provide, after smoothing, an improved macro-model for
asymptotic prestack depth migration. For that, we compare the migrated images obtained

149
APPLICATIONS

Offset (km)
a)0 -4 -2 0 2 4 6 8
Time - X/5.0 (s)

0.5

1.0

Offset (km)
b) 0 -4 -2 0 2 4 6 8 1 2 3 4 5 6 7 8 9
0

Time - X/5.0 (s)

0.5

1.0
1

c) 0 1 2 3 4 5 6 7 8 9
-4 -2 0 2 4 6 8
0
Time - X/5.0 (s)
Time - X/5.0 (s)

0.5

1.0 1

1 2 3 4 5 6 7 8 9
d) 0 -4 -2 0 2 4 6 8
0
Time - X/5.0 (s)
Time - X/5.0 (s)

0.5

1.0 1

e) -4 -2 0 2 4 6 8
1 2 3 4 5 6 7 8 9
0 0
Time - X/5.0 (s)
Time - X/5.0 (s)

0.5

1.0 1

Figure 4.35: Comparison in the time domain between observed and computed seismograms for
the CRG of Figure 4.28. a) Observed CRG, b) On the left, synthetic seismograms computed
in the velocity model inferred from traveltime tomography. On the right, direct comparisons
between seismograms of Figures 4.35a and 4.35b. Observed and computed seismograms are in
grey and black respectively. c) Same that Figure 4.35b for synthetic seismograms computed
in the 10.27 Hz full waveform tomography model. d) Same that Figure 4.35b for synthetic
seismograms computed in the 15.16 Hz full waveform tomography model. e) Same that Figure
4.35b for synthetic seismograms computed in the 20.06 Hz full waveform tomohraphy model.
The ellipses delineate the wide-angle reflections R1 and R2 and the shadow zone SZ (Figure
4.28).

150
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

from macro-models developed by first-arrival traveltime and full waveform tomographies re-
spectively. Third, since full waveform tomography is difficult to handle at high frequencies due
to the non linearity of this kind of processing, more robust asymptotic prestack depth migration
should provide a complementary tool to process the data over the full source bandwidth and
hence develop a sharp image of the short wavelengths of the structure.
The asymptotic prestack depth migration(/inversion) was applied to the full dataset (i.e.,
incorporating the full offset range) although the ray-Born approximation is know to be inac-
curate at wide angles (Lambaré, 1991)(pp.167-174). Due to this inaccuracy, we will not stress
on the ability of the quantitative migration to recover the true amplitude of the model param-
eters but we will content ourself with analyzing the accuracy of the reflector mapping (from a
structural viewpoint exclusively).
The same pre-processing flow (minimum phase whitening + Butterworth filtering + co-
herency filtering) as for full waveform tomography (see Appendix C) was applied to the data
except that the cut off frequencies of the Butterworth filtering were 5 and 25 Hz for the asymp-
totic prestack depth migration instead of 5 Hz and 15 Hz for the full waveform tomography.
As a consequence, high frequencies were better preserved for asymptotic prestack depth migra-
tion to get the migrated section with the highest possible resolution. No attempt was made to
mute the refracted waves since these arrivals are often very close in time to the super-critical
reflections. The strongest arrival was migrated instead of the first arrival since it is supposed
to provide superior imaging in the presence of multiple arrivals (Thierry et al., 1999b).
The macro-models
The three macro velocity models that were used are shown in Figure 4.36.The first (referred as
macro-model 1 in the following) is derived from the model developed by first-arrival traveltime
tomography (Figure 4.36a). The macro model is built by first replacing the air layer above
topography by a ficticious layer the velocity of which is close to that present just below topog-
raphy. This modified model is subsequently projected on a cardinal cubic B-spline basis with a
200 m spacing between B-spline nodes. The projection on the B-spline basis is complemented
by a 2-D Gaussian filtering with horizontal and vertical correlation lengths of 200 m (Figure
4.36a). The algorithm used is described in Operto et al. (2003). The air velocity was replaced
by surface velocity to avoid sharp velocity contrast during the smoothing and B-spline basis
projection steps. Note that the 2-D B-spline basis projection and Gaussian filtering do not
modify the wavelength content of the traveltime tomography model (compare Figures 4.31a
and 4.36a which exhibit essentially the same velocity distribution below topography).
The second velocity macro-model (referred as macro-model 2 in the following), shown in
Figure 4.36b, was derived from the final model developed by full waveform tomography (Figure
4.33d). The same procedure than for the traveltime tomography model (air layer replacement,
smoothing, B-spline description) was used. The correlation lengths of the Gaussian filter and
the spacing between B-spline nodes were 200 meters. The macro-model 2 incorporates more
structural details than the first one and hence should theoretically provide an improved macro-
model.
The third velocity macro-model (referred as macro-model 3 in the following), shown in
Figure 4.36c, was again derived from the final model developed by full waveform tomography
(Figure 4.33d) but the spacing between B-spline nodes and the correlation lengths of the
Gaussian filter were decreased to 100 meters. The spatial resolution of the macro-model 3 is
still superior to that of the macro-models 1 and 2.

151
APPLICATIONS

Distance (m)
A
0 5000 10000 15000

1000
Elevation (m)

-1000

-2000

Distance (m)
B
0 5000 10000 15000

1000
Elevation (m)

-1000

-2000

Distance (m)
0 5000 10000 15000
C)

1000
Elevation (m)

-1000

-2000

m/s

2000 4000 6000

Velocity

Figure 4.36: a) Velocity macro-model used by asymptotic prestack depth migration derived
from the traveltime tomography model. b) Velocity macro-model used by asymptotic prestack
depth migration derived from the final waveform tomography model. The final waveform
tomography model was smoothed with a 2D Gaussian filter of horizontal and vertical correlation
lengths of 200 meters. c) Velocity macro-model used by asymptotic prestack depth migration
derived from the final waveform tomography model. The final waveform tomography model
was smoothed with a 2D Gaussian filter of horizontal and vertical correlation lengths of 100
meters.

152
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

Figure 4.37: a) Example of ray tracing in the velocity model of Figure 4.36a. b) Example of
ray tracing in the velocity model of Figure 4.36b. c) Example of ray tracing in the velocity
model of Figure 4.36c. Note the occurrence of numerous caustics in the ray field.

The dynamic ray tracing used in the prestack depth algorithm requires the macro-model
to be smooth (Lambaré et al., 1996). This requirement justifies why we smoothed the full-
waveform tomography model to provide suited macro-models for migration. An optimal com-
promise must be heuristically found between the need of smooth macro-model to guarantee a
good ray coverage and the need of accurate macro-model to optimally stack the reflections.
An example of ray tracing in the three velocity macro-models is shown in Figure 4.37. The
ray field computed in the macro-model 1 is unfolded (i.e., no caustic are formed) (Figure 4.37a).
The ray field computed in macro-model 2 also remains mostly unfolded (Figure 4.37b) although
one caustic is observed in the thrusted zone at distance 12 km and depth 2 km. On contrary,
many caustics are generated in the macro-model 3 due to the increased heterogeneity of the
macro-model (Figure 4.37c). The ray fields computed in the macro-models 1 and 2 suggest

153
APPLICATIONS

Distance (m)
A) 0 5000 10000 15000

1000

0
Elevation (m)

-1000

-2000

Distance (m)
B) 0 5000 10000 15000

1000

0
Elevation (m)

-1000

-2000

Distance (m)
0 5000 10000 15000
C)
1000

0
Elevation (m)

-1000

-2000

-15 -10 -5 0 5 10 15
Velocity

Figure 4.38: a) Migrated section 1 obtained using the velocity macro-model 1 of Figure 4.36a.
b) Migrated section 2 obtained using the velocity macro-model 2 of Figure 4.36b. c) Migrated
section 3 obtained using the velocity macro-model 3 of Figure 4.36c.

that single-arrival migration can be used contrary to macro-model 3 which would require to
use multiple-arrival migration (Operto et al., 2000b).
The migrated images
The three migrated sections (referred as migrated images 1, 2 and 3 in the following) inferred
from the macro-models 1, 2 and 3 are shown without AGC in Figures 4.38a, 4.38b and 4.38c
respectively. Although the quantitative aspects of the migration can be questionable due to
the inaccuracy of the ray-Born approximation at wide apertures, the migrated sections roughly
quantify P-wave velocity perturbations in a relative sense (i.e., the source function is not

154
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

accounted for by the processing) associated with the imaged discontinuities. From a qualitative
viewpoint, it is rather difficult to assess which migrated sections is the most accurate. However,
we can note that the overall amplitude level of the migrated images decreases from the image
1 to the image 3 but that the spatial resolution of the migrated images is improved from the
image 1 to the image 3. A possible interpretation is that the macro-model 1 is not enough
accurate to provide an optimally focused migrated image. But the macro-model 1 is consistent
with a single-arrival migration algorithm since it guarantees unfolded ray fields. In that case,
amplitude of the reflected wavefields are more fully processed in the case of a very smooth
macro-model than in the case of a more accurate one when a single-arrival migration is used.
This reflects the compromise in migration based on the single-arrival hypothesis between the
need for an accurate macro-model for spatial resolution and positioning and the need for a very
smooth macro-model for quantitative imaging of complex media (Operto et al., 2000b).
Unlike macro-model 1, the macro-model 3 may contain sufficient structural details to op-
timally stack reflections but a multi-arrival migration would be required to migrate the full
information contained in the data as suggested by the numerous caustics present in the ray
fields. Macro-model 2 represents a compromise between macro-models 1 and 3.
Note also that the migrated image 3 is polluted by numerous spikes caused by the ray
theory amplitude singularity at caustics.
At present, no attempt has been made to migrate the multiple arrivals to prove that mi-
grated image 3 can be improved by taking into account multiple arrivals. However, this hy-
pothesis may be supported by the Marmousi synthetic test presented in Operto et al. (2000b)
where several macro-models of various smoothness were tested. The macro-model 3 used in this
study has a wavelength content (i.e., a level of complexity) comparable to the Marmousi macro-
model which provided the best quantitative migrated section by multiple-arrival prestack depth
migration. This Marmousi macro-model was built by smoothing the true Marmousi macro-
model with a 2D Gaussian filter of horizontal and vertical correlation lengths 76 m while it is
reminded that the macro-model 3 was built by smoothing the final full-waveform tomography
model with a 2D Gaussian filter of horizontal and vertical correlation lengths 100 meters.
This level of complexity of macro-model 3 was obtained thanks to the combination of trav-
eltime and full-waveform tomography applied to wide-aperture data. It is unlikely that such a
complex macro-model could have been developed by traveltime tomography or migration-based
velocity analysis techniques applied to near vertical reflection data. Although this statement
is only hypothetical, it renders the potential benefit provided by wide-aperture acquisition
geometry to address the problem of complex media imaging.
To assess in a more quantitative way the accuracy of the three tested velocity macro-models,
we computed common image gathers (CIGs) (the so-called iso-X panels) in the diffraction angle-
depth domain (e.g. Xu et al., 2001) (see Figure 4.39 for a definition of the diffraction angle θ)
and we locally compared the three migrated images with the VSP log.
CIGs computed from the three migrated sections are shown in Figure 4.40 for diffraction
angles spanning between 40 and 120 degrees. Events are generally better aligned on the CIGs
of migrated images 2 and 3 compared to that of migrated image 1 suggesting that the macro-
models inferred from full-waveform tomography may provide, at least in some regions of the
target, an improved macro-model compared to the traveltime tomography one. This conclusion
is also supported by the comparison between a bandpass filtered version of the VSP log with
the coincident log of the three migrated images (Figure 4.41). The VSP log was bandpass

155
APPLICATIONS

o
S R

z
c

pr θ ps
pr=ps =1/c k=υq

Figure 4.39: Schematic view of a wide-aperture reflection from a flat interface. The diffraction
angle θ is the angle between the two slowness vectors ps and pr associated with the rays con-
necting from the source and the receiver to the reflection point respectively. The wavenumber
is related to the frequency ν and the diffraction angle trough the relation k = ν q where
q = ps + pr .

filtered to match the wavelength content of the migrated image logs (Figure 4.41(b-d)). The
VSP log was filtered such that vertical wavelengths between roughly 100 and 800 meters are
preserved. The spectral amplitude of the spatial bandpass filter applied to the VSP log is
shown in Figure 4.41e.
An AGC with a 500 meters long window was applied to the migrated section logs since
we are essentially attached to verify the accuracy of the position and focusing of reflectors. In
overall, we obtained the best match of the VSP log when using the migrated image 2. The main
features of the VSP log are roughly matched (i.e., the high-velocity layer centered at elevation
0 meter, the high-velocity layer at 700 meter depth, the two high-velocity layers centered at
depths 1300 and 2000 meters). However, the migrated image 1 allows a good match of the
deep structures (below 1600 meters). This may illustrate the decrease of the sensitivity of
migration to the inaccuracies of the macro-model as depth increases (i.e., as the aperture range
decreases). The superior accuracy of the macro-model 3 is illustrated by the improvement of
spatial resolution and positioning in depth of the perturbations on the log of migrated image
3 (for example, note the positioning accuracy of the high velocity layer centered at depth 0
meter and of the layer centered at 2000 meters depth in Figure 4.41d). However, the log of
the migrated image 3 is polluted by several oscillations which likely result from the ray theory
artefacts in presence of caustics.
Resolution analysis
Since asymptotic prestack depth migration is more usually applied to near-vertical reflection
data, it is instructive to assess the range of trace apertures (i.e., diffraction angles) which has
a significant contribution in the imaging when migration is applied to wide-aperture reflection
data.
In terms of spatial resolution, wide-aperture seismic data should theoretically lead to mi-
grated images with an improved low wavenumbers content (if the wide-aperture components of
the data are successfully migrated) than that provided by more classic reflection experiments
spanning a narrower aperture range.

156
given by
a) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees)
40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120

-1 -1 -1 -1 -1

0 0 0 0 0

1 1 1 1 1

Elevation (m)

Elevation (m)
Elevation (m)
Elevation (m)
Elevation (m)

2 2 2 2 2

X=3.5 km X=5.5 km X=7 km X=10 km X=12 km

Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees)
b) Diffraction angle (degrees)

λv =
40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120
40 50 60 70 80 90 100110120

1
kv
-1 -1 -1 -1
-1

2. c) CIGs computed using macro-model 3.

=
0 0 0 0
0

p
1 1 1 1
Elevation (m)

1
Elevation (m)

Elevation (m)
Elevation (m)

Elevation (m)

2ν
2 2 2 2
2

X=5.5 km X=7 km X=10 km X=12 km

X=3.5 km

c 1 + (o/2z)2
Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees) Diffraction angle (degrees)

,
c) 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120 40 50 60 70 80 90 100110120

-1 -1 -1 -1 -1

0 0 0 0 0

1 1 1 1 1

Elevation (m)

Elevation (m)
Elevation (m)
Elevation (m)
Elevation (m)

2 2 2 2 2

X=3.5 km X=5.5 km X=7 km X=10 km X=12 km

157
frequency. The relation ((4.15)) can be derived by simple trigonometric computations from
where kv is the vertical wavenumber, c is velocity, o is offset, z is depth and ν is temporal
(4.15)
offset range is between 1km and 9km. The vertical wavelength λv which is locally imaged is
medium with a velocity of 4km/s. The source bandwidth is between 6 Hz and 25 Hz. The
target (Figure 4.39). We assume a reflector at a 2 km depth overlaid by a average homogeneous
data, let us consider the most illustrative parameters defining the acquisition device and the
To give a rough estimate of wavenumber range which may be imaged from the wide-aperture
Figure 4.40: a) CIGs computed using macro-model 1. b) CIGs computed using macro-model
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure
APPLICATIONS

a) d)
Velocity (m/s)
b) Velocity perturbations (m/s)
c) Velocity perturbations (m/s) Velocity perturbations (m/s)
4000 5000 6000 -1000 0 1000 -1000 0 1000
-1000 0 1000
-500 -500 -500
-500

0 0 0
0

500 500 500

500

Elevation (m)
Elevation (m)

Elevation (m)
Elevation (m) 1000
1000 1000 1000

1500 1500 1500

1500

2000 2000 2000

2000

2500 2500 2500

2500

e) 1.0
0.9
0.8
0.7
Amplitude

0.6
0.5
0.4
0.3
0.2
0.1

0 0.002 0.004 0.006 0.008 0.010 0.012 0.014 0.016 0.018

Wavenumber (m-1)

Figure 4.41: a) VSP log. b) Comparison between the bandpass filtered VSP log (black curve)
with the coincident log extracted from the migrated section of Figure 4.38a (grey curve).
c) Comparison between the bandpass filtered VSP log (black curve) with the coincident log
extracted from the migrated section of Figure 4.38b (grey curve). d) Comparison between the
bandpass filtered VSP log (black curve) with the coincident log extracted from the migrated
section of Figure 4.38c (grey curve). d) Spectral amplitude of the bandpass filter applied to the
VSP log (Figure 4.41a) to match the wavelength content of the migrated section logs (Figures
4.41(b-d)).

the acquisition geometry defined in Figure 4.39. Considering a maximum offset of 9 km, a
minimum frequency of 6 Hz, an average velocity of 4 km/s, we obtain a diffraction angle of
130o and an estimate of the largest vertical wavelength to be imaged of around 800 meters. In
the opposite way, considering a minimum offset of 1 km and a maximum frequency of 25 Hz,
the smallest wavelength to be imaged is 80 meters.
The asymptotic prestack depth migration roughly recovers the wavelengths predicted by
our schematic resolution analysis suggesting that the prestack depth migration exploited (at
least from a kinematic viewpoint) the wide aperture components of the data. Indeed, the
bandwidth of the spatial filter that we applied to the VSP log to match the spectral content of
the migrated image logs (Figure 4.41e) are in agreement with the wavelength range predicted
by the resolution analysis. This range of wavelengths (between around 80 and 800 meters)
can be compared with those estimated by Jannane et al. (1989) for a more classic reflection

158
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

Distance (m)
a) 0 2000 5000 8000 10000 12000

1000
Elevation (m)

-1000

-2000

b) Distance (m)
0 2000 5000 8000 10000 12000

1000
Elevation (m)

-1000

-2000

Diffraction angle=130 degrees

Figure 4.42: a) Common-angle migrated section obtained using the velocity macro-model 2.
The diffraction angle is equal to 130o . b) Same that Fig. 14a after high-pass filtering.

acquisition geometry (i.e., spanning a narrower range of apertures). Jannane et al. (1989)
have found that (classic reflection) data can recover long wavelengths (λ > 300m) and short
wavelengths (20 m < λ < 60 m) but that middle wavelengths (60 m < λ < 300 m) cannot
be recovered. We have shown that these middle wavelengths were recovered thanks to the
wide-aperture components of our data set.
To illustrate the contribution of the wide-aperture components in the migration, we plotted
in Figure 4.42 a common-angle migrated section for a diffraction angle of 130o using the macro-
model 2. Note the unusual large-scale pattern of the wide-aperture component migrated image
compared to more classic migrated images derived from near-vertical reflection data. This
large-scale pattern results from the migration of the low (temporal) frequency part of the
wide-aperture data component.
The common-angle migrated section of Figure 4.42a is shown in Figure 4.42b after vertical
high-pass filtering. Vertical wavelengths below 300 meters (the so-called middle wavelengths)
were filtered out to highlight the short wavelength content of the 130o common-angle migrated
image. Some reflectors are clearly observed suggesting that the high (temporal) frequency part
of the wide-aperture data component have also been successfully migrated.
For completeness, we presented in Figure 4.43 the 3 migrated images of Figs. 4.38 after
vertical high-pass filtering. The cut-off wavelength was again 300 meters to keep only the short

159
APPLICATIONS

Distance (m)
0 5000 10000 15000
A)
1000

0
Elevation (m)

-1000

-2000

Distance (m)
B) 0 5000 10000 15000

1000

0
Elevation (m)

-1000

-2000

Distance (m)
C) 0 5000 10000 15000

1000

0
Elevation (m)

-1000

-2000

-5 0 5
Velocity

Figure 4.43: a) High-pass filtered migrated section obtained using the velocity macro-model 1.
b) High-pass filtered migrated section obtained using the velocity macro-model 2. c) High-pass
filtered migrated section obtained using the velocity macro-model 3 .

wavelengths. Hence, the migrated images of Figure 4.43 mimic that which would have been
obtained from more classic reflection data. Removing the footprint of the middle wavelengths
in the migrated images may be helpful to delineate the main discontinuities during the struc-
tural interpretation stage. However, large and middle wavelengths are of invaluable help for
lithological interpretation of the layers delineated by the discontinuities.

4.4.6 Geological discussion of the final model

The small wavelength content of the velocity model resulting from the full waveform tomog-
raphy provides new insights into the structure of the upper crust and contributes to a better
understanding of the internal geometry of the investigated thrust-and-fold system, which is
instead only poorly imaged by conventional reflection seismic (Dell’Aversana, 2001) (Figure

160
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

A 1000

Elevation (m)
0

-1000

-2000

b 1000
Elevation (m)

-1000

-2000

m/s

2000 4000 6000

Velocity

c
Rho (ohm m)
4096
2048
1024
512
256
128
64
32
16
8
4
2

Figure 4.44: a) Superposition of the final full waveform tomography model and the migrated
section 2. b) Schematic line drawing of the main reflectors revealed by the joint analysis of the
full waveform tomography and prestack depth migrated images. c) Two-dimensional resistivity
model obtained by inverting magnetotelluric data collected along the profile (modified after
Dell’Aversana (2001)). The resistivity model well delineates the large-scale structures of the
waveform tomography velocity models. The high-velocity regions in the deep part of the
model unambiguously correspond to two evident high-resistivity bodies in the MT section,
while shallow low-velocity layer well match low-resistivity regions.

4.29). Asymptotic prestack depth migration (Figure 4.44(a-b)) provides additional insights
into the structure of the upper crust and contributes to a better understanding of the internal
geometry of the investigated thrust-and-fold system, which is instead only poorly imaged by
conventional reflection seismic (Dell’Aversana, 2001). The migrated image 2 has been super-
imposed on the final full waveform tomography model to verify the consistency between the
two images (Figure 4.44a). A schematic line drawing of the main discontinuities based on the
joint analysis of the two images is shown in Figure 4.44b.
The most noticeable features of the two images are SW-dipping slices clearly delineated by
sharp velocity contrasts between high-velocity bodies (5500-6500 m/s) and narrow low-velocity
regions (3500 − 4000 m/s), which are located beneath the anticline drilled by the borehole
(Figure 4.45a). On the western flank of the anticline, these structures are bounded by an
intermediate-velocity region (4500 − 5500 m/s), that merges upwards with an evident near-
surface high-velocity bump, the latter corresponding to the anticline where Mesozoic rocks are
exposed. In this region, the prestack depth migrated section shows between 5 km and 7 km
of distance antiformal structures, which merge eastward with strong discontinuities imbedded

161
APPLICATIONS

in the high-velocity shallow bump. On the western side of the profile, a noticeable feature of
the prestack depth migrated image is a clear west-dipping fault, delineated by truncated and
tilted events. Both flanks of the antiform are characterized by near-surface low-velocity layers
(2000 − 3000 m/s). The low-velocity layers reach a maximum thickness of about 1 km over
distances between 5 km and 7 km and overlay a region showing a quite chaotic succession
of low and intermediate-velocity layers, which hamper the identification of large-scale velocity
structures. Nevertheless, a low-velocity region (3500 − 4000 m/s) can be identified at about
4 − 5 km of distance in the 1 − 2.5 km depth range.
Reliability of the full waveform tomography velocity model and migrated image has been
further assessed by a comparison with a 2D resistivity model obtained by Dell’Aversana (2001)
inverting magnetotelluric data collected along the seismic profile (Figure 4.44c).
We found an overall good agreement between the velocity and resistivity images. The
SW-dipping high-velocity bodies imaged at 1 − 2 km depth are consistent with two high-
resistivity regions showing the same trend, while the high-velocity bump imaged beneath the
antiform matches quite well a high-resistivity shallow region. In addition, a weak arc-shaped
discontinuity showed by the migrated section between 7 km and 8 km of distance at about
1.5 km depth, matches quite well the top of the deeper high-resistivity body. The near-
surface low-velocity layers bounding the antiform correspond to strongly conductive bodies.
In particular, note the pronounced thickening of the conductive region imaged between 4 km
and 7 km of distance. This conductive region is cross-cut by a thin high-resistivity layer, that
matches well strong discontinuities showed by the prestack depth migrated image. Finally,
the low-velocity region identified in the deeper part of the model between 4 km and 5 km of
distance corresponds to a region of relatively low-resistivity values, that is sandwiched between
two high-resistivity bodies.
A schematic geo-structural interpretation of the velocity model is shown in figure 4.45c.
The geological interpretation, which is locally constrained by well data, is also based on the
combined analysis of the velocity and resistivity images. In addition, surface geological mapping
(Figure 4.30) and some shallow reflectors imaged by conventional reflection seismic (Figure
4.29), facilitate the interpretation of the near-surface velocity structure.
We interpret the anticline explored by the well as a stack of two main SW-dipping sheets,
which are in turn cross-cut by an out-of-sequence thrust. The out-of-sequence thrust, which
we relate to the main discontinuity drilled at about 0.3 km depth, is responsible for a tectonic
doubling, as well as for the formation of a wide nappe anticline in the shallow part of the
crust. The internal geometry of the anticline is well delineated by the high-velocity (5500 −
6500 m/s) and high-resistivity slices, which correspond to cherty dolomites (Figure 4.45c). As
inferred from the well, the bottom of the high-velocity slices delineates the main thrust planes.
Intermediate-velocity bodies are associated with cherts and/or strongly fractured dolomites,
while the regions showing a chaotic succession of low- and intermediate- velocity layers may be
indicative of sequences lithologically dominated by Cretaceous shales. This hypothesis is based
on surface and well data, and is further supported by the presence of low-resistivity regions
(< 100 ohm.m), which suggest the presence of clayey materials.
Finally, we relate the near-surface low-velocity and strongly conductive layers to the re-
gional nappe mainly composed of Cenozoic clays, which overthrusts the Mesozoic terrains. On
the western side of the model, geological interpretation of the shallow structure is aided by con-
ventional reflection data showing strong and quite continuous events at about 500 ms (Figure

162
4.4 Onshore acoustic FWI : the Southern Apennines Baragiano overthrust structure

SW NE
Distance (m)
a

m/s

2000 4000 6000

Velocity

b
Rho (ohm m)
4096
2048
1024
512
256
128
64
32
16
8
4
2

Distance (m)
c
overthrust
antiform

synform
Ps Cs-Jc Pc
Pc Jd-c
Jd Cs-Jc
Cs-Jc
Jc
Jc? Cs-Jc
Jc
Jd
Jd

Ps Pc Cs-Jc Jc Jd-c Jd

Figure 4.45: Geo-structural interpretation of the waveform tomography model. (a) Main dis-
continuities (sharp velocity changes are indicated by black solid lines) and very low velocity
contrasts (dot black lines) used for the geological interpretation of the velocity model. The
discontinuities are locally constrained by well logs. (b) Two-dimensional resistivity model ob-
tained by inverting magnetotelluric data collected along the seismic profile (modified after
Dell’Aversana (2001). Note the good agreement between the velocity and resistivity images.
(c) Schematic geo-structural interpretation of the velocity model. Ps: Pliocene soft sediments,
Pc: Paleocene clayey sediments, Cs-Jc: Cretaceous shales and Jurassic cherts, Jc: Jurassic
cherts, Jd-c: strongly fractured Triassic cherty dolomites and cherts, Jd: stiff cherty dolomites.
The main thrust planes (red dashed lines) and the overthrust of the Paleocene nappe (red
continuous line) are indicated. The geological discontinuities are constrained by the well data.

4.29). In particular, both the velocity and resistivity images contain hints of a thrust structure
at about 5 km, which may be related to SW-dipping events and truncated reflections in the
stack section.

4.4.7 Partial conclusion

We have presented an application of a seismic imaging flow integrating first-arrival travel-

time tomography, frequency-domain full waveform tomography and asymptotic prestack depth
migration to real onshore wide-aperture seismic data recorded in a complex geological envi-

163
APPLICATIONS

ronment (thrust belt). We have illustrated the input/output relationships between the three
processings of our multi-scale approach.
The first main conclusion is that a reliable macro-model for frequency-domain full-waveform
tomography and asymptotic prestack depth migration can be derived by first-arrival traveltime
tomography thanks to the wide-aperture acquisition geometry.
Frequency-domain full-waveform tomography allows to tremendously improved the spatial
resolution of the velocity model by incorporating shorter wavelengths as the inversion progresses
towards high frequencies. Full-waveform tomography was applied to (temporal) frequency com-
ponents between 5.4 Hz and 20 Hz. Full-waveform tomography is highly non-linear at higher
frequencies and hence is complemented by more robust asymptotic prestack depth migration to
account for the full frequency content of the data. The asymptotic prestack depth migration is
applied to all the aperture components of the data. Three macro-models of various smoothness
is used. The first is developed from traveltime tomography and the two others are smoothed
versions of the final full-waveform tomography models.
First, we have shown that asymptotic prestack depth migration applied to the wide-aperture
data allowed to image middle wavelengths which are out of reach when classic near vertical
reflection data are available. Hence, migrated images with an unusual broad wavelength content
can be developed when wide-aperture geometry are available. The broadening of the wavelength
content improves the spatial resolution of the migrated images and facilitates interpretation of
the lithological units.
Second, thanks to an analysis of CIGs and local comparison of the migrated images with the
VSP log, we have shown that the smoothed full-waveform tomography models provide improved
macro-model for asymptotic prestack depth migration. First of all, this contributed to verify
the relevance of the full-waveform tomography models in addition to the others verificationsthat
we presented (match of a VSP log, data fit, comparison with a co-incident resistivity section).
Moreover, combination of traveltime and full-waveform tomography applied to wide-aperture
seismic data provide a framework to develop macro-model for prestack depth migration with a
high level of complexity. Previous synthetic tests whose goal was to image complex media by
asymptotic prestack depth migration suggested that such complexity was required to develop
well-resolved and quantitative migrated images of complex structures (Operto et al., 2000b). It
is reasonable to think that the spatial resolution of the macro-models developed in this paper
is out of reach when only near-vertical reflection data are available.
The next step will be to apply multiple-arrival prestack depth migration to the wide-
aperture data to fully take advantage of the high spatial resolution of the macro-models devel-
oped by full-waveform tomography. Indeed, the heterogeneity of the full-waveform tomography
macro-model generates complex ray fields with caustics which requires such multiple-arrivals
prestack depth migration.
We have presented an integrated processing flow applied to dense wide-aperture data
recorded in a complex geological environment. We have shown that the combination of
wide-aperture acquisition configuration and a multi-scale seismic imaging flow could provide a
promising approach to image complex structures.

164
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

4.5 Shallow-water offshore acoustic FWI : on the footprint of

the Valhall anisotropy on FWI

4.5.1 Introduction

Most of the recent applications of FWI to real wide-aperture data have been performed in
the isotropic acoustic approximation, where one seeks to reconstruct the P-wave velocity only
(e.g., Ravaut et al., 2004; Operto et al., 2006; Bleibinhaus et al., 2007; Jaiswal et al., 2009).
In this framework, one may question the real meaning of the reconstructed velocity, and,
therefore, the validity of the isotropic approximation to invert wide-aperture data, which can
potentially contain the footprint of the subsurface anisotropy. An analysis of such footprint
was presented by Pratt and Sams (1996), who applied isotropic and anisotropic first-arrival
traveltime tomography to cross-hole data recorded in a fractured highly-layered medium. They
show the need to incorporate anisotropic effects in the tomography to reconcile cross-hole
seismic velocities with well information. In this study, we address the validity of the isotropic
approximation in the framework of FWI of surface wide-aperture data with a real data case
study from the Valhall field in the North Sea. This case study will clearly highlight the footprint
of vertical transverse isotropic (VTI) anisotropy on the velocity reconstruction performed by
the isotropic FWI of the wide-aperture data.
The Valhall oilfield in the North Sea is characterised by a gas cloud, which hampers the
imaging of reflectors at the oil reservoir level, and by a significant VTI anisotropy (Kommedal
et al., 2004). The zone has been successfully imaged with a three-dimensional isotropic acous-
tic frequency-domain FWI used for an OBC data-set in the low frequency range [3.5 - 7] Hz
(Sirgue et al., 2009, 2010). The starting model used for 3D isotropic FWI has been built by
anisotropic reflection traveltime tomography (RTT). The anisotropic model has been subse-
quently converted into normal move-out (NMO) velocity used initially by the isotropic FWI.
The recovered FWI velocity model clearly shows a series of complex channels at a depth of
150 m, and reached a resolution such that it allows to distinguish details like fractures filled of
gas at a depth of 1000 m (Fig. 4.46b). Using the FWI velocity model as the background model,
improved migrated images of the reservoir at around a depth of 2500 m, and of the overbur-
den have been computed. Although the 3D FWI procedure shows impressive results, authors
question the meaning of the isotropic velocities, as the anisotropy is well acknowledged in the
Valhall zone. The isotropic approximation should be acceptable if the medium is quasi-elliptic
(low values of the η parameter (Thomsen, 1986)) and the lateral velocity contrasts smooth
enough. If these conditions are satisfied, the main effects of the anisotropy in the isotropic
FWI models should be a vertical stretching of the velocity structure. The meaning of the
effective isotropic velocity reconstructed from anisotropic wide-aperture data needs, however,
to be clarified. This is the question we want to address in this work.
Consequently, our prime interest is the application of a complete 2D frequency-domain
FWI work flow including detailed model appraisals to real OBC wide-aperture data from
the Valhall field, with some emphasis on the footprint of anisotropy on FWI velocities. Full
waveform inversion requires an initial velocity model to start with. The first initial model that
we consider is a 2D version of the 3D NMO velocity model used by Sirgue et al. (2010). As
mentioned above, this velocity model has been developed by reflection traveltime tomography,
and therefore, is tightened to match short-aperture reflected arrivals. First-arrival traveltimes
computed in the NMO velocity model shows a significant delay with respect to the first-arrival

165
APPLICATIONS

traveltimes picked on the real data: an obvious evidence of the footprint of the anisotropy in
the wide-aperture data. To remove these kinematic inaccuracies, we update the NMO velocity
model by first-arrival traveltime tomography (FATT) to match the first arrival times, at the
expense of the match of the reflection traveltimes. The resulting model, referred to as the
FATT model, provides us a second initial model for FWI. Our motivation behind the building
of the FATT model is the accurate prediction of the wide-aperture components of the data
during the early stages of the multiscale approach in order to perform a reliable reconstruction
of the long and intermediate wavelengths. We shall investigate in this study how the isotropic
FWI manages the anisotropy information in the data depending on the initial model we use
and on the multiscale data preconditioning that we apply.
In the next section, we present the application of FWI on the Valhall data set. After the
presentation of the results, we discuss the meaning of the velocities inferred from isotropic
full waveform inversion based upon local comparison of these results with well log, seismic
modelling, prestack depth migration (PSDM) and source wavelet estimation. We also show
that using an acoustic VTI forward modelling in the inversion procedure for vertical velocity
reconstruction allows us to develop a velocity model that matches the well log velocities, unlike
the acoustic velocity model reconstructed by isotropic full waveform inversion.

4.5.2 Review of the method

In this study, acoustic full waveform inversion is performed in the frequency domain using the
acoustic-elastic FWI method described in Brossier (2011). The seismic modelling is performed
in the frequency domain with a velocity-stress discontinuous Galerkin (DG) method on un-
structured triangular mesh, allowing for an accurate positioning of sources and receivers, and
accurate parametrisation of bathymetry in the framework of the shallow-water environment of
Valhall (Brossier et al., 2008; Brossier, 2011).
The misfit function is defined by the weighted least-absolute-value (L1 ) norm of the data
residual vector. We choose the L1 norm in the data space because it has been shown to
be less sensitive to noise in the framework of efficient frequency-domain FWI (Brossier et al.,
2010). The optimisation relies on a Polak and Ribière (1969) preconditioned conjugate-gradient
scheme, where the gradient direction is preconditioned, on one hand, by the diagonal terms
of the so-called approximate Hessian (i.e., the linear part of the full Hessian) to correct for
geometrical spreading effects, and, on the other hand, by a 2D Gaussian filtering to filter out
the high wavenumber components of the gradient vector (e.g., Sirgue and Pratt, 2004).
In order to increase the quadratic-wellposedness of the inverse problem (Chavent, 2009, p.
162), the FWI algorithm is designed into a multiscale reconstruction of the targeted medium
(Brossier et al., 2009a; Brossier, 2011). The first level of multiscaling is controlled by the outer
loop over frequency groups, where a frequency group defines a subset of simultaneously-inverted
frequencies. The multiscale algorithm proceeds over frequency groups of higher-frequency con-
tent with possible overlap between frequency groups. A second level of multiscaling is imple-
mented within a second loop over exponential time-dampings applied from the first arrival times
t0 . Time damping is implemented in the frequency domain by means of complex-valued fre-
quencies, where the imaginary part of the frequency controls the amount of damping (Brenders
and Pratt, 2007b; Brossier et al., 2009a; Shin and Ha, 2009).
In real data application, the source wavelet signature is generally unknown and must be

166
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

estimated for each frequency. Since the source is linearly related to the wavefield, the source
wavelet signature can be estimated by solving a least-squares linear inverse problem assuming
that the medium is known. Following Pratt (1999), we reconstruct the source function as un
unknown of in the inverse problem. In the framework of the adjoint-state method, for consis-
tency with the model update performed with a L1 norm minimisation, the source signature can
also be estimated alternatively with a L1 norm minimization (personal communication of R.
-E. Plessix). Such optimisation has been implemented with a non-linear optimisation scheme
based on the Very Fast Simulated Annealing (VFSA). Our experience with source wavelet es-
timation shows that, for both synthetic and real data sets, both non-linear L1 and linear L2
optimisations give similar results. In the following, the source wavelets are estimated using
the L1 norm. The source signatures are updated for each source gather at each iteration,
once the incident Green functions are computed, and are subsequently used for the gradient
computation and model update.

4.5.3 Geological context and data

4.5.3.1 Geological context

The Valhall oilfield in the North Sea has been producing since 1982. This is a shallow water
environment (70 m water depth), located in the central zone of an old Triassic graben, entering
into compression during late Cretaceous (Munns, 1985). The subsequent inversion of stress
orientations has led to the formation of an anticlinal now lying at a depth of 2.5 km, delineating
a high velocity interface with overlying layers. Extension regime has occurred in tertiary age
allowing for a thick deposit of sediments in which gas are trapped in some layers. Rising from
the underlying Jurassic layers, oil is trapped underneath the cap rock of the anticlinal. The oil
migration reaches a peak nowadays, by means of numerous fractures induced by the different
tectonic phases. Of note, those fluid are the cause of the high porosity preservation of Valhall
reservoir: a distinctive feature even though this field is affected by subsidence likely due to
production.

4.5.3.2 Ocean-bottom-cable (OBC) acquisition geometry, seismic data set and

reference models

The layout of the 3D wide aperture/azimuth acquisition designed by BP is shown in Fig. 4.46a,
where the black points and lines represent respectively the location of the shots at 5 m depth
and of the permanent OBC-four-component arrays at around 70 m depth on the sea floor. In
this study, 2-D acoustic FWI is applied to the OBC line denoted by Cable 21 in Fig. 4.46a.
This cable is located outside the gas cloud holding at a depth of 1000 m, as revealed on the
horizontal cross-section extracted from the 3D-FWI model of Sirgue et al. (2010) (Fig. 4.46b).
It is worth noting that maximum offset for the 2D and 3D acquisition is of the same order of
magnitude, but the ratio of large over short-aperture components of the data is much more
important for 3D than for 2D acquisitions.
Data set recorded by cable 21 consists of 320 shots recorded by 220 4-C receivers. A receiver
gather for the hydrophone component is shown in Fig. 4.47a, with superimposed manually-
picked first-arrival times. A 3D model for the vertical velocity V0 and the Thomsen’s parameters
δ and developed by anisotropic reflection traveltime tomography (RTT) has been provided by

167
APPLICATIONS

Figure 4.46: The Valhall experiment - (a) Layout of the Valhall survey. Lines and points denote
the position of shots and of the 4C-OBC, respectively. Cable 21 is the 2D line considered in
our study. (b) Horizontal slice at a depth of 1000 m across the gas cloud extracted from the
3D-FWI model of Sirgue et al. (2010) (from Sirgue et al. (2009)).

the BP company. The anisotropic model has been converted into a normal move out√(NMO)
velocity model for isotropic imaging, where the NMO velocity is given by VN M O = V0 1 + 2δ.
A cross-section of the NMO velocity model is extracted from the 3D model along the cable
21 acquisition. This model is referred to as the RTT model (Fig. 4.48a). Since accurate
modelling of the diving waves is theoretically required by the full waveform inversion of wide-
aperture data to prevent cycle skipping, we consider another model referred to as the FATT
model obtained by fitting the observed first-arrival times with the RTT model as initial model
(Fig. 4.47b). Rays associated with the first-arrival paths turn down to a maximum depth of
1300 m, above the low-velocity gas cloud located between depths around 1500 m and 2500 m.
The traveltime curve of the first arrival in the offset-time domain (Fig. 4.47a) exhibits clearly
two distinct slopes with a crossover distance of around 4000 m. These two slopes suggest the
presence of a sharp interface at a depth of around 500 m as shown by the ray tracing in the
FATT velocity model and the VSP well profile (Fig. 4.47c). Reflections from top and bottom
of the gas layers are clearly seen in the receiver gather of Fig. 4.47a, where they can be followed
up to the super-critical incidences. In particular, we show that the reflection hyperbola from
the top of the gas cloud becomes tangent to the diving wave from above in the time-offset
domain. Of note, late arrivals are dominated by shingling dispersive guided waves in the near
surface (Robertson et al., 1996). These high-amplitude waves can have a harmful impact on

168
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

a) Offset (km) c)
-8 -4 0 4 Velocity (m/s)
0 1000 2000 3000
0
Time - Offset / 2.5 (s)

1 500

1000
2

1500

Depth (m)
3
2000
b) 4 8
Distance (km)
12 16
0
3400
2500
Depth (km)

Velocity (km/s)
1
2700
2

3 2000 3000

4
1300
3500
X = 9.5 km

Figure 4.47: OBC data set - (a) Example of preprocessed recorded receiver gather, at the
position x = 14100 m. The green curve corresponds to manually-picked first-arrival traveltimes
associated with diving waves from above the gas layers. The red and blue curves correspond
to the picking of waves reflecting from the top and the bottom of the gas cloud, respectively.
Seismograms are plotted with a reduction velocity of 2500 m/s. (b) RTT model updated by
FATT on which we superimposed rays associated with the first-arrival travel times, emitted
from the source located at the position x = 14100 m. The gas cloud is between depths around
1500 m and 2500 m. (c) VSP well log extracted at the position x = 9500 m (with the courtesy
of BP).

the inversion procedure, in particular, if the sea bottom is not accurately modelled.

4.5.4 Starting models appraisals

In the following of this study, we use the RTT and the FATT models, as initial models for
FWI. Comparison between recorded seismograms with synthetic seismograms computed in the
RTT model shows a strong delay of the modelled first-arrival travel times at large offsets (Fig.
4.48c). The phase mismatch reaches 0.285 s at an offset of 11000 m, corresponding to half a
period at a frequency of 3.5 Hz. In contrast, the traveltimes of the reflections from the top and
the base of the gas cloud are quite well predicted by the RTT model. In contrast, the FATT
model matches as expected first arrivals through travel-time tomography (Figs. 4.48b and d).
The velocities in the overburden of the gas cloud are higher in the FATT model than in the
RTT model (Figs. 4.48a and b). These increased shallow velocities degrade the match of the
reflected waves from the top and the bottom of the gas cloud. Since none of the RTT and
FATT models allows one to match both the short and wide-aperture components of the data,
we conclude that this kinematic inconsistency provides some seismic evidence of anisotropy in
the Valhall data set. For a further illustration of this statement, we migrate the short-aperture
reflected wavefield with a ray-Born prestack depth migration (PSDM) (Thierry et al., 1999b)

169
APPLICATIONS

c) 0
a) Distance (km )

Time − offset / 2.5 (s)

4 8 12 16
0 3400 1

Velocity m/s
1
Depth (Km)

2 2700
3 2000 2

4
1300
3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
Offset (m)
d) 0

Time − offset / 2.5 (s)

b) 4 8 12 16
0 3400

Velocity m/s
1
1
Depth (Km)

2 2700
3 2000 2
4
1300
3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
Offset (m)

Figure 4.48: Initial FWI velocity models - Initial FWI velocity model built by (a) RTT and (b)
updated by FATT. (c-d) Direct comparison between observed data (black seismograms) and
modelled data (gray seismograms) computed in the RTT (c) and in the FATT (d) models. True-
amplitude seismograms are plotted with a gain with offset corresponding to the offset power
0.6. The yellow curve denotes the first-arrival travel times computed in the considered starting
model by solving the eikonal equation. The green curve denotes the first-arrival traveltimes
picked in the recorded data. Note the significant delay between observed traveltimes and
computed traveltimes from the RTT model.

using the RTT and the FATT models as background models (Fig. 4.49). As expected from
normal move-out informations, the reflectors are better focused and more laterally continuous
in the migrated image inferred from the RTT model, with fairly flat reflectors in the common
image gathers (CIGs)(Figs. 4.49a and c). In contrast, the reflectors in the CIGs computed
in the FATT model (Fig. 4.49d) are clearly frowning, betraying the too high velocities in the
upper part of the FATT model. Indeed, the increase of the velocities in the upper part of the
FATT model also translates in the deepening of the reflectors from below the top of the gas
cloud.

4.5.5 FWI preprocessing and experimental setup

4.5.5.1 FWI preprocessing

Among the available 4-C data, only the hydrophone component is considered as we deal with
acoustic FWI. Acoustic inversion has been applied to the hydrophone component of the elastic
data computed in the synthetic elastic Valhall model (Brossier et al., 2009b): successful image
of the VP structure has been obtained because converted P-SV waves have a minor footprint
on the hydrophone component. Since receivers are around 2/3 times less numerous than shots,
data are sorted by receiver gathers in virtue of the source-receiver reciprocity holding between
an explosion source and a pressure component of the data, in order to reduce the computational
cost (one LU-substitution phase per shot).

170
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

Distance (km) Distance (km)

4 6 8 10 12 14 16 6 7 8 9 10 11 12 13 14 15
0 0

0.5 0.5

1.0 1.0

1.5 1.5
Depth (km)

Depth (km)
2.0 2.0

2.5 2.5

3.0 3.0

3.5 3.5

4.0 4.0

Distance (km) Distance (km)

4 6 8 10 12 14 16 6 7 8 9 10 11 12 13 14 15
0 0

0.5 0.5

1.0 1.0
1.5 1.5
Depth (km)

Depth (km)
2.0
2.0
2.5
2.5
3.0
3.0
3.5
3.5
4.0
4.0
4.5

Figure 4.49: PSDM in the initial FWI velocity models - (a-b) Ray+Born migrated images
computed in the RTT (a) and in the FATT (b) models. (c-d) CIGs sorted by angles between
-50o and 50o computed in the RTT (c) and in the FATT (d) models. Note how the reflectors
frown in the CIGs computed in the FATT model due to too high velocities in the upper
structure.

The FWI data preprocessing first consists of a minimum phase whitening followed by a
Butterworth filtering of bandwidth [4 − 20] Hz. The whitening was designed to preserve the
amplitude versus offset properties of the data, by normalizing the spectral amplitudes of the
deconvolution operator associated with each trace by its maximum amplitude. The spectral
whitening combined with the Butterworth filtering can provide a suitable data weighting for
simultaneous inversion of multiple frequencies. The bandwidth of the Butterworth filter was
chosen heuristically to provide the best trade-off between signal-to-noise ratio and flattening
of the amplitude spectrum. We then applied a FK filtering to remove as much as possible S-
wave energy, and a spectral matrix filtering to enhance the lateral coherency of events. We also
applied a mute to remove noise before the first-arrival time, and after a time of 4 s following the
√
first-arrival excluding late arrivals. Finally, the observed data are multiplied by the function t
to roughly transform the 3D geometrical spreading of real amplitude data into a 2D amplitude
behaviour. An example of a fully preprocessed receiver gather is shown in Fig. 4.47a.

4.5.5.2 Experimental set-up: seismic modelling

The 18000 m x 5000 m velocity, density and attenuation models are discretised on unstructured
triangular meshes for seismic modelling with the discontinuous Galerkin method, where the
medium properties are piecewise constant per element (Brossier et al., 2008; Brossier, 2011).

171
APPLICATIONS

Figure 4.50: Seismic modelling - Close-up of the hybrid P1-P0 triangular mesh on which seismic
modelling is performed using the Discontinuous Galerkin method.

Accurate positioning of seismic devices and of the bathymetry at a depth of 70 m is permitted

by the use of a fine mesh in the first 160 meters of the medium. Linear interpolation order
(P1) is used to describe the acoustic wavefield in this fine mesh (Fig. 4.50). Below, a regular
triangular mesh is used with piecewise-constant (P0) representation of wavefield in each cell to
reduce the cost of the modelling in terms of memory and computational cost. A discretisation
rule of 10 elements per wavelength is used in the regular mesh, leading to 20 m long triangle
edges. The hybrid P1-P0 mesh contains around 585.103 cells. The mesh includes 500 m thick
perfectly matched layers (PML) on the right, left and bottom sides of the model for absorbing
boundary conditions (Berenger, 1994). A free surface boundary condition is implemented on
top of the models, that implies that free surface multiples are involved during FWI. Although
real depth of receivers varies between 67 m and 73 m, we choose per convenience to design a
flat bathymetry at a depth of 70 m within the mesh and to position all the receivers at a depth
of 71 m, just below the sea bottom. This approximation has a minor impact on the modelling
accuracy given the shortest propagated wavelength of 215 m.

4.5.5.3 Experimental set-up: inversion

Only the P-wave velocity is reconstructed during the inversion procedure we perform. Atten-
uation model is set homogeneous to a realistic value of the attenuation factor Qp = 150. This
value of attenuation is chosen by trial-and-error such that the root-mean-squares amplitudes of
the early-arriving phases computed in the initial model roughly matches that of the recorded
data, following the approach of (Pratt, 1999, his figure 6). The density model is inferred from
the starting FWI velocity models using the Gardner law (Gardner et al., 1974) and is kept
constant over iterations of the inversion.
We sequentially invert five increasing frequency groups between 3.5 Hz and 7 Hz. We have
verified that a sufficiently high signal-to-noise ratio is inside traces at the lowest frequency of
3.5 Hz, already used in the 3D-FWI application of Sirgue et al. (2010). The spectral amplitude
of the 3.5-Hz frequency represents 45 % of that of the dominant 7-Hz frequency after whitening
and Butterworth filtering. The highest inverted frequency of 7 Hz appeared to be the latest
frequency leading to models with a sufficiently good S/N ratio of the imaging. We designed
our frequency groups with 3 frequencies per group, with one frequency overlapping frequency
between groups (Table 4.4).

172
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

Table 4.4: Schedule of frequency groups used in multiscale FWI.

Group Frequencies (Hz)

I [3.5, 3.78, 4]
II [4, 4.3, 4.76]
III [4.76, 5, 5.25]
IV [5.25, 5.6, 6]
V [6, 6.35, 6.7]

Offset (km) Offset (km)

-8 -4 0 4 -8 -4 0 4
0 0
Time - offset / 2.5 (s)

Time - offset / 2.5 (s)

4500 1 4500 1
2250 2250
0 0
-2250 2 -2250 2
-4500 -4500
3 3
Offset (km) Offset (km)
-8 -4 0 4 -8 -4 0 4
0 0
Time - offset / 2.5 (s)

Time - offset / 2.5 (s)

4500 1 4500 1
2250 2250
0 0
-2250 2 -2250 2
-4500 -4500
3 3

Figure 4.51: Data preconditioning by time damping - Recorded receiver gather plotted without
time damping (a), and with time damping using τ = 1 s (b), 3 s (c) and 5 s (d). The solid
white line denote the first-arrival picks. The dash lines denote the reflections from the top and
the bottom of the gas layers.

When used, time damping factors τ are chosen equal to 1 s, 3 s and 5 s, and were applied
from the first-arrival travel time t0 . A receiver gather on which we have applied those different
exponential time damping factors are compared in Fig. 4.51. The most aggressive damping
factor (τ = 1 s) applied during the early iteration of one frequency group inversion favours the
early-arriving phases associated with the wide-aperture components of the data, while the use
of higher values of τ introduce smoothly later-arriving phases associated with shorter-aperture
components.
For numerical optimisation, we use the preconditioned conjugate gradient algorithm (Mora,
1988). We allow for a maximum of 25 iterations per damping or per frequency depending
whether dampings are used or not. We noticed that 25 iterations were enough because PCG
often stopped before reaching this step as we stop when no significant model variations are
encountered.
Since the data are sorted by receiver gather for FWI, we estimate a source wavelet per
receiver gather at each non-linear FWI iteration by using the least-squares norm (see equation

173
APPLICATIONS

Table 4.5: FWI setup used for the 4 applications performed in this study. m0 : initial FWI
model. τ (s): time damping.

FWI model m0 Frequency groups (Hz) τ (s)

FATT+FWI model 1 FATT [3.5], [4], [5],[6.1],[7] 1s
FATT+FWI model 2 FATT [3.5, 3.78, 4], [4, 4.3, 4.76], [4.76, 5, 5.25],[ 5.25, 5.6, 6] 1s
FATT+FWI model 3 FATT [3.5, 3.78, 4], [4, 4.3, 4.76], [4.76, 5, 5.25],[ 5.25, 5.6, 6] 1, 3, 5 s
RTT+FWI model RTT [3.5, 3.78, 4], [4, 4.3, 4.76], [4.76, 5, 5.25],[ 5.25, 5.6, 6] -

(3.6)). The underlying assumption is that the shots are perfectly repetitive.
Application of a Gaussian smoothing to the model velocity perturbations is applied using
some frequency dependent correlation lengths. Of note, the horizontal correlation length is
set three times longer than the vertical one as the medium is fairly tabular. The model is
kept constant down to a depth of 77 m in order to keep constant the velocity in the water
layer (above 70 m depth), and to avoid instabilities in the vicinity of the sources and receivers
located at 71-m and 5-m depth, respectively. No data weighting is applied during FWI, and,
therefore, Wd = I, where I is the identity matrix.
As reported by Pratt and Shipp (1999), the potential improvement in resolution
√ provided
by FWI compared to FATT, can be estimated as being of the order of Nλ , where Nλ is
the number of wavelengths propagating between source and receiver. Although we consider
starting models originally built by RTT, exhibiting higher resolution than models built by
FATT only, it might be interesting to quantify this resolution improvement. The velocities in
the Valhall model range between 1500 m/s and 3500 m/s, and FWI is performed in the [3.5-7]
Hz frequency band. For a maximum offset of 13 km, it follows that FWI should lead to an
increase of resolution by a factor between 4 and 8 compared to the resolution of FATT.

4.5.6 FWI results

We perform four applications of FWI, for which we use different starting models, different
frequency groups and different time dampings. The inversion setup used for these 4 applications
are outlined in Table 4.5.

4.5.6.1 FATT+FWI model 1

In a first test, we use the FATT model as the initial FWI model and only one time damping
factor of 1 s. A time damping factor of 1 s allows to favour the aperture components of the
data which are well predicted by the FATT model from a kinematic viewpoint. We perform
successive mono-frequency inversions of frequencies 3.5, 4, 5, 6.1 and 7 Hz, leading to the
FATT+FWI model 1 (Fig. 4.52a). The FATT, the RTT and the final FATT+FWI velocity
models are compared with a smoothed version of the well log in Fig. 4.53a. The well log of
Fig. 4.47c is low-pass filtered in the time-domain after depth-to-time conversion with a cut-off
frequency of 14 Hz, inferred from the theoretical resolution of FWI (i.e., half a wavelength)
at the 7 Hz frequency. The inversion successfully images a low velocity reflector at a depth

174
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

of 500 m, the existence of which has been previously proposed based on the interpretation
of the receiver gather (Fig. 4.47a). Interestingly, the FWI profile remains centred on the
horizontal velocities inferred from FATT down to the top of the gas layers and does not match
the vertical velocities described by the well log, nor the NMO velocities of the RTT model. The
FATT+FWI model 1 exhibits low-velocity artifacts at the ends of the model in the near surface,
where the deficit of data coverage provides the necessary degrees of freedom in the inversion
to artificially accommodate anisotropic effects. These artifacts might be linked with artificial
undulations of the horizontal reflector on top of the reservoir at around a depth of 2500 m.
We interpret these artifacts as the footprint of anisotropy in the isotropic FWI reconstruction.
The deep part of the velocity model below 2500 m is weakly perturbed by the FWI procedure,
due to the aggressive time damping applied to the data. Indeed, the wide-aperture components
of the data favoured by this time damping poorly illuminate the deep part of the model, due
to insufficient long-offset coverage. We notice, however, that the FWI creates a high-velocity
interface at a depth of 2900 m depth (Fig. 4.53a), which can be interpreted as a stretching in
depth of the velocity law to balance the increase of velocities in the shallow part.

4.5.6.2 FATT+FWI model 2

In a second test, we use the same experimental setup than for the FATT+FWI model 1 test
except that we successively invert 5 frequency groups instead of single frequencies (Tables 4.4
and 4.5), leading to the FATT+FWI model 2 (Figs. 4.52b and 4.53b). Here, we attempt to
improve the FWI results by increasing the data redundancy. The FATT+FWI model 2 model
is rather close to the FATT+FWI model 1, but its FWI log shows a slightly better match of
the well log in the gas layers (Fig. 4.53b). The undulation of the high-velocity interface at
the depth of 2500 m and the high-velocity interface at a depth of 2900 m increases compared
to those of the FATT+FWI model 1. This shows that these artificial features are intrinsic to
the isotropic FWI of the anisotropic data, and, therefore, cannot be removed by increasing the
data redundancy.

4.5.6.3 FATT+FWI model 3

In a third test, we use the same experimental setup that for the FATT+FWI model 2 test but
we use in cascade 3 time dampings during the inversion of each frequency group (τ = 1, 3, 5s)
(Table 4.5). This allows us to progressively broaden the aperture illumination from the wide-
aperture transmission regime to the short-aperture reflection regime during the inversion. Note
that the wide-aperture components associated with strong time dampings are injected first
during one frequency group. On the one hand, this is consistent with the fact that these
aperture components are those that are accurately predicted by the starting FATT model. On
the other hand, this hierarchical strategy is consistent with the multiscale approach, where the
long wavelengths constrained by the wide apertures must be reconstructed first. The final FWI
model, referred to as the FATT+FWI model 3, is shown in Fig. 4.52c. Broadening the aperture
content of the data improves the resolution of the FWI, in particular in the deep part of the
model where features down to a depth of 4000 m can be interpreted. This improved resolution
is at the expense of the S/N of the imaging. For example, we notice increased artificial vertical
ringing in the upper part of the model, which might be interpreted as the footprint of extrinsic
layered-induced anisotropy (Pratt et al., 2001) (Figs. 4.52c and 4.53c). Moreover, the high

175
APPLICATIONS

FATT+FWI model 1

FATT+FWI model 2

FATT+FWI model 3

RTT+FWI model

3D-FWI model

Figure 4.52: Final FWI models - (a) FATT+FWI model 1. The ellipses delineate the velocity
artifacts mentioned in the text. (b) FATT+FWI model 2. (c) FATT+FWI model 3. (d)
RTT+FWI model. (e) Vertical section extracted from the 3D-FWI model of Sirgue et al.
(2010) at the position of cable21. Note we extended this model below 4 km depth to represent
it with the same vertical scale than the above-mentioned models. See text for details as well
as Table 4.5.

176
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

Figure 4.53: Logs of FWI models - (a-e) Profiles extracted at the position of the well log
from the FWI models shown in Figure 4.52 (solid gray curve), from the RTT model (black
dash curve) and from the FATT model (gray dash curve). (a) FATT+FWI model 1. (b)
FATT+FWI model 2. (c) FATT+FWI model 3. (d) RTT+FWI model. (e) 3D-FWI model of
Sirgue et al. (2010). The smoothed well log velocities are plotted with a solid black curve (see
text for details).

velocity at a depth of 2900 m in the FWI models has also increased compared to the previous
cases because the footprint of the anisotropy has been increased by the broadening of the
aperture content of the data.

4.5.6.4 RTT+FWI model

In a fourth test, we use the RTT model as an initial model and we inverted the full data
set without considering time damping. The resulting FWI model, referred to as RTT+FWI
model, is shown in Figs. 4.52d and 4.53d. Compared to the previous test, we did not consider
time dampings to give more weight during the early stages of the inversion to the arrivals
which are accurately predicted by the RTT model (i.e., the short-aperture reflections). It is
worth noting that, in this setting, artifacts on model sides are mostly cancelled, and that the
reflector at a depth of 2500 m remains almost flat as it should be. We note also an improved
lateral coherency of the gas cloud as well as the improved focusing of the gas layers in it. All
these observations strongly support that the artifacts shown when the FATT model is used as
a starting model mainly result from the match of the long offset data enhanced by the data
preconditioning performed with aggressive damping. The velocity profile of the RTT+FWI
model at the well log position shows that velocities are slightly closer to the NMO velocities
between depths of 1000 m and 1700 m than in previous models (Fig. 4.53d). The stretching
of the velocity structure at 2900 m has also been cancelled probably in relation with velocities
closer to NMO velocities in the upper part. Reflectors below a depth of 3000 m are more
focused than in the FATT+FWI model 3 and can be correlated with the deep reflector shown
in Sirgue et al. (2010, Their Figure 3b).

4.5.6.5 Comparison with results of 3D FWI

We now compare our 2D FWI models with a cross-section extracted from the 3D-FWI model
of Sirgue et al. (2010) at the position of cable 21 (Fig. 4.52e). It is reminded that the 3D FWI

177
APPLICATIONS

model has been obtained using the RTT NMO velocity model as initial FWI model and without
time damping (Sirgue et al., 2009, 2010). The 3D FWI model presents several similarities with
the 2D FATT+FWI models. The most striking observation is that velocities are as high as
in the FATT + FWI models between depths of 800 m and 1500 m, suggesting that the 3D
FWI converges towards the horizontal velocities at these depths, although the initial model
is inferred from reflection tomography (Fig. 4.53e). In order to balance these high velocities
in the shallow part, the 3D FWI has created a high velocity interface at a depth of 2900 m,
hence stretching the velocity structure in depth to match the deep short aperture reflections.
This velocity interface is also similar to that shown in the FATT+FWI models, while this
interface is missing in the 2D RTT+FWI model. The fact that the 3D FWI model shares
more similarities with the FATT+FWI models than with the RTT+FWI model is somehow
unexpected, because both the 3D FWI model and the 2D RTT+FWI model have been inferred
from an initial model built by reflection tomography. We interpret this result by the fact that
the ratio between the wide-aperture and the short-aperture components is higher in 3D wide-
aperture/wide-azimuth acquisition than in 2D acquisition. Therefore, we conclude that the 2D
RTT+FWI reconstruction is mostly driven by the short-aperture reflection wavefield, while the
3D FWI reconstruction is more dominated by the wide-aperture components of the data, at
least in the upper part of the structure.

4.5.7 Model appraisals

Model appraisal is a key issue in FWI since uncertainty analysis is quite challenging to perform
in a Bayesian framework (Gouveia and Scales, 1998). In this study, the FWI models will
be evaluated based upon four criteria: the local match with VSP log, synthetic seismogram
modelling, flatness of common image gathers and repeatability of source wavelet estimation.

4.5.7.1 Seismic modelling

For seismic modelling, we need a source wavelet, that is estimated by a VFSA method using
a suitable subdataset and velocity model. We use the RTT model and the first 2 kilometers
of offsets for the source estimation. In this setting, the source wavelet estimation is mainly
controlled by the short-aperture reflection wavefield, which should be well predicted by the
RTT model. The 220 sources wavelets associated with each receiver gather and the mean
wavelet are shown in Figs. 4.54a and 4.54b, respectively. The source wavelet is computed
within the 4-20-Hz frequency band. Assuming a uniform receiver-ground coupling all along the
profile (a reasonable assumption according to lithology of the sea bed composed of hard sand
(Kommedal et al., 1997)) and a sufficiently-accurate velocity model, we should end-up with
quite similar source wavelets, as shown in Fig. 4.54a. In the following, we shall use the mean
wavelet shown in Fig. 4.54b for seismic modelling. This source is also used to compute the
receiver gathers shown in Fig. 4.48(c-d).
Time-domain seismograms computed in the 2D FWI models with the same DG modelling
engine than for FWI are compared with the recorded seismograms for the receiver gather
previously shown in Fig. 4.47(a). The data match obtained with the final FWI models (Fig.
4.55) has been nicely improved compared to that obtained with the initial FATT and RTT
models (Fig. 4.48). As expected, the FATT+FWI model 2 provides a better match of the
diving waves and of the super-critical reflections than the RTT+FWI model, because both the

178
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
20

0 0
Time (s)

Time (s)
0

0.5
-20

1 1.0

Figure 4.54: Source wavelet estimation for seismic modelling - (a) Sources wavelets for each
receiver gather, estimated using the RTT model and a maximum offset of 2km in the data. (b)
Mean wavelet averaged over the wavelets shown in (a).

initial FATT model and the data preconditioning by time damping (τ = 1 s) used to build
the FATT+FWI model 2 favour the match of the wide apertures (Fig. 4.55(a)-(c)). On the
other hand, the RTT+FWI model provides a much better match of the reflections from top
and bottom of the gas layers than the FATT+FWI model 2. The match of the short-aperture
reflections is nearly equivalent in the FATT+FWI model 3 and in the RTT+FWI model, while
the FATT+FWI model 3 provides a better match of the diving waves than the RTT+FWI
model (Fig. 4.55(b)-(c)). For both models, all the aperture components of the models are
injected in the inversion, but using different initial models and data preconditioning. The
initial model (FATT model) and the data preconditioning (hierarchical use of increasing time
damping factors) used to build the FATT+FWI model 3 favour the match as complete as
possible of all the aperture components of the data. On the other hand, the attempt to
match all the aperture components of the data, in particular the wide-aperture ones, probably
contributes to the increase of the footprint of the anisotropy in the reconstructed isotropic
models, as suggested by the artifacts in the FATT+FWI models highlighted in Fig. 4.52a.
Beyond those differences of waveform agreement, it is noteworthy to say that both the
FATT+FWI model 3 and the RTT+FWI model allow for a satisfying match of the data, in
particular in terms of phase, disregarding the significant discrepancies between those models.
It thereby highlights the ill-posedness of FWI in terms of non-unicity of the models to match
anisotropic data.

4.5.7.2 Analysis of migration and common image gathers

The FWI models are assessed using migrated images and CIGs computed by ray+Born pre-
stack depth migration. For migration, we use a suitable preprocessed data set, where free-
surface multiples are removed. The data are filtered in the [10-60] Hz frequency band. We

179
APPLICATIONS

a) 0

Time − offset / 2.5 (s)

3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
b) Offset (m)
0
Time − offset / 2.5 (s)

3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
Offset (m)
c)
0
Time − offset / 2.5 (s)

3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
Offset (m)

Figure 4.55: Seismic modelling in the FWI models - Direct comparison between recorded
(black curves) and modelled (gray curves) seismograms computed in the final FWI models for a
receiver gather at the receiver position x = 14100 m. Seismograms are plotted with a reduction
velocity of 2500 m/s. True amplitudes are plotted with a gain with offset corresponding to the
offset power 0.6. (a-c) Seismograms computed in (a) FATT+FWI model 2, (b) FATT+FWI
model 3, (c) RTT+FWI model.

apply an automatic gain control (AGC) to the migrated image to increase the amplitude of the
deep reflectors. Such AGC is not applied to the CIGs to have a quantitative analysis of the
reconstructed velocity contrasts. The same data were previously used for the CIGs computed in
the initial models (Fig. 4.49). The most obvious result is that the reflectors are raised at more
correct depths in the migrated image computed within the FATT+FWI model 3 compared to

180
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

those of the migrated image computed in the FATT model (compare Figs. 4.56a and 4.49b).
This is also illustrated by the fact that the CIGs computed from the FATT+FWI model 3 are
much flatter than those inferred from the FATT model (compare Figs. 4.56e and 4.49d). The
migrated image and the CIGs computed in the RTT+FWI model are similar to that computed
in the RTT model, i.e., the FWI does not allow to improve significantly the migrated images
compared to reflection tomography (compare Figs. 4.56e and 4.49d). The CIGs computed in
the RTT+FWI model are slightly flatter and more laterally coherent below a depth of 2500 m
than the ones computed in the FATT+FWI model 3. This may result from artifacts already
mentioned at a depth of 2500 m in the FATT+FWI models. The footprint of these artifacts
are clearly visible in the migrated images computed in the FATT+FWI model 3, where the
reflectors at and below a depth of 2500 m exhibits artificial undulations, unlike those of the
migrated image computed in the RTT+FWI model (Fig. 4.56). The reflectors below a depth
of 2500 m are shallower in the RTT+FWI migrated image than in the FATT+FWI migrated
image: the RTT+FWI model is less affected by depth stretching than the FATT+FWI model.
As a reference, we show the migrated image computed in the 2D section of the 3D FWI
model of Sirgue et al. (2010). The migrated image is slightly superior to that computed in the
RTT+FWI model, in terms of focusing and flatness of the CIGs. On the other hand, the depth
of the deep reflectors below a depth of 2500 m matches more closely that of the FATT+FWI
migrated image than that of the RTT+FWI migrated image. This confirms that the 3D FWI
velocity model is kinematically closer to the FATT+FWI model than the RTT+FWI model,
as already mentioned, and is significantly affected by depth stretching.

4.5.7.3 Source wavelet estimation as a tool for model appraisal

We use the source wavelet estimation as a tool to appraise the relevance of the FWI models
(Jaiswal et al., 2009). We estimate source wavelets considering the full offset range, in order to
make the wavelet more sensitive to the model quality (Fig. 4.57). The sensitivity of the wavelet
estimation to the amount of data used in equation (3.6) can be assessed by comparing wavelets
estimated in the RTT model using data sets with a maximum offset of 2000 m (Fig. 4.54)
and 13000 m (Fig. 4.57a), respectively. Repeatability of the wavelets is strongly affected when
all the aperture components of the data are involved in the inversion process, if the velocity
model is not accurate enough. Collection of source wavelets and corresponding mean wavelet
inferred from the full offset range and computed in the FATT model, in the FATT+FWI model
3 and in the RTT+FWI model are shown in Fig. 4.57(b-d). Comparison between the wavelets
computed in the traveltime tomography models and in the FWI models shows how the source
wavelet estimation is improved when a FWI model is used. All the wavelets inferred from the
FWI models are closed in shape and amplitudes, although we notice that the wavelet inferred
from the RTT model is more symmetric in shape. The wavelets estimated from the section of
the 3D FWI model show strongest amplitudes and has a shape which closely matches that of
the wavelet inferred from the FATT+FWI models. This supports again than 3D FWI converges
to models closer to the FATT+FWI models than the RTT+FWI one.

4.5.8 Summary of isotropic results and anisotropic FWI

We have presented an application of 2D isotropic full waveform inversion (FWI) to wide-

aperture data from the Valhall field, that allows us to highlight the footprint of anisotropy

181
APPLICATIONS

Figure 4.56: PSDM and CIGs computed in the final FWI models. (a-c) PSDM images com-
puted in (a) FATT+FWI model 3, (b) RTT+FWI model, (c) the 2D section of the 3D-FWI
model of Sirgue et al. (2010). (d-f) Corresponding CIGs.

182
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

on the imaging. We have used two initial FWI velocity models, the RTT and the FATT
models, specifically designed to match the short-aperture reflection traveltimes and the first-
arrival traveltimes, respectively. When we use the FATT model as the initial model for FWI, we
apply some preconditioning to the data by successive time dampings to favour the wide-aperture
components of the data during the early stages of the inversion following a multiscale approach.
In this setting, the FWI model exhibits horizontal velocities in the upper part of the model,
which is illuminated by diving waves, and vertical stretching of the deep structure, which is
required to match deep short-aperture reflections. We also show several kinds of artifacts in the
isotropic velocity models (near surface velocity anomalies, undulation of horizontal reflector,
ringing of the velocity-depth function), that are interpreted as the footprint of anisotropy, when
the FWI is tuned to jointly match the wide-aperture and the short-aperture components of the
data. When we use the RTT velocity model as the initial model for FWI, these artifacts are
significantly reduced. This might be due to the fact that the starting model drives the inversion
mainly towards the match of the short-aperture reflections, at the expense of the match of the
wide-aperture arrivals. We compare our 2D velocity models with a dip section of the 3D FWI
model of Sirgue et al. (2010). Although the 3D FWI model has been constructed from an
initial velocity model developed by RTT, the 3D FWI model share more similarities with the
2D FWI models built from the FATT model than with the FWI model built from the RTT
model. This might result from the fact that, in 3D wide-azimuth acquisition, wide-aperture
components have a stronger weight in the data than the short-aperture components, compared
to 2D wide-aperture acquisitions.

In order to validate the interpretations above-mentioned, we perform an anisotropic (VTI)

acoustic FWI where only the so-called vertical velocity V0 is involved in the inversion, while
the Thomsen’s parameters δ and are kept constant over the FWI iterations. The initial
models for V0 , ρ, δ and are shown in Figs. 4.58(a-d), and are those obtained from anisotropic
reflection travel time tomography. The FWI is applied consecutively to five frequency groups
without considering time dampings. The final FWI model for V0 is shown in Fig. 4.58(e) and the
comparison with the well log is shown in Fig. 4.59a. As expected, the vertical velocity model do
not show artifacts. The same structures than for isotropic FWI are shown with a good S/N and
lateral coherency such as the interface at a depth of 500 m, the gas layers from the gas cloud,
and some deep reflectors below 3000 m. The final V0 FWI model shows a good agreement
with the well log velocities without stretching effects (Fig. 4.59a). We also notice that the
velocity perturbations in the gas layers have smaller amplitudes in the V0 model than in the
RTT+FWI velocity model (Fig. 4.59b), suggesting a more stable reconstruction of velocities
when a background level of anisotropy is involved in the modelling. This is consistent with
the results of Pratt et al. (2001), who have shown that isotropic FWI can match anisotropic
crosshole data as well as anisotropic FWI, by creating layer-induced anisotropy with sharp
velocity contrasts in the isotropic velocity model.

Synthetic seismograms computed in the anisotropic acoustic approximation with the dis-
continuous Galerkin method are compared with the real data in Fig. 4.60. The match with the
real data is quite good from both kinematic and dynamic viewpoints (Figs. 4.60a and b)), and
has been slightly improved compared to the results of the isotropic modeling (Figs. 4.55). The
results of the visco-acoustic anisotropic modeling shown in Fig. 4.60 supports that the acoustic
VTI wave equation is sufficiently accurate to perform the FWI of the P wavefield, in the case
of weak to moderate anisotropy (Operto et al., 2009). Of note, the S waves generated during
acoustic VTI modeling (Grechka et al., 2004) should have a minor footprint in the modeling,

183
APPLICATIONS

because the medium is isotropic or quasi-elliptic near the receiver positions (therefore, no S
waves are generated at the receivers processed as sources) and the FWI models are sufficiently
smooth to prevent significant P-to-S conversions.
For all the synthetic seismogram modellings, we show a better match of the amplitudes of
the reflection from the top of the gas layers than from the bottom. Moreover, the amplitudes at
short offset are often underestimated. This can be interpreted as follows: first, late reflections
are more difficult to match than early reflections because they are more sensitive to kinematic
inaccuracies and because they have potentially smaller amplitudes, hence, making the inversion
of these events poorly conditioned. Second, amplitude of short-aperture reflections are more
sensitive to density errors, according to the radiation pattern of the density parameter, which
shows the maximum of sensitivity at normal incidence when the model space is parametrised
by velocity and density (Virieux and Operto, 2009, their figure 13). Since deep reflectors are
illuminated by narrower aperture bands than shallow reflectors for a given receiver spread,
imaging of deep reflectors should be more sensitive to density errors than shallow reflectors.
We apply isotropic and anisotropic frequency-domain reverse time migration (RTM) to
the data already used for Ray-Born PSDM, using the VTI finite-difference frequency-domain
modelling engine of Operto et al. (2009) and the kernel of the FWI program of Sourbier et al.
(2009a,b). The RTM is applied in the [3.5 − 60] Hz frequency range. The RTM corresponds to
the stack over frequencies of the product of the incident wavefield and back-propagated data.
The source wavelet, used for the computation of synthetic seismograms, is also used in the
RTM to model the incident wavefields (Kim et al., 2011). The Fig. 4.61a shows the isotropic
migrated image inferred from the background NMO RTT velocity model, while the Figs. 4.61(b-
c) shows the two anisotropic migrated images inferred from the anisotropic reflection traveltime
tomography background models (Figs. 4.58(a-d)) and from the anisotropic FWI background
models (Figs. 4.58(b-e)). We show a better lateral coherency of the reflectors in the anisotropic
RTT migrated image than in the isotropic one. Moreover, the reflectors are positioned at their
correct depths in the anisotropic migrated image. In particular, the depth of the top of the
reservoir at a depth of 2500 m is consistent with the high-velocity interface in the well log, and
is much better focused than in the isotropic RTM, this latter being clearly affected by vertical
stretching.
The anisotropic RTM image inferred from the FWI background model (Fig. 4.61c) is not
as well focused as the anisotropic RTM image inferred from the V0 background model (Fig.
4.61b), except at the reservoir level at depths around 2500 − 3000 m and below where reflectors
are more continuous. This latter point suggests that, unlike RTT, FWI successfully accounts
for arrivals coming from below the reservoir level. The Fig. 4.61d shows the superimposition
of this last RTM image (Fig. 4.61c), with the model perturbation obtained by the difference
between the final and the starting FWI model, that we vertically derive as to enhance the
vertical resolution. This representation highlights the consistency existing between reflectors
of both images at all depths, and also allows to illustrate the gap of resolution existing between
FWI and migration.

4.5.9 Partial conclusion

The case study of isotropic FWI of anisotropic wide-aperture data presented in this study has
highlighted the footprint of anisotropy on isotropic FWI. Most of application of FWI at the
oil exploration scale are performed in the isotropic acoustic approximation. We have discussed

184
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

with a real data case study the validity of the isotropic approximation to perform FWI applied
to wide-aperture/wide-azimuth data. For such acquisitions, the difference between the hori-
zontal and the vertical velocities leads to kinematic inconsistencies between the short-aperture
and the wide-aperture components of the data, when the data are modelled in the isotropic
approximation. These kinematic inconsistencies can lead to artifacts in the FWI velocity model
like unrealistic velocities in part of the model where a deficit of data coverage exists and ar-
tificial vertical and horizontal undulations of reflectors, if all the aperture components of the
data are involved in the FWI. Moreover, the meaning of the isotropic velocity can be am-
biguous when isotropic FWI is applied to anisotropic wide-aperture data. Surface acquisition
geometries make the upper part of the target to be dominantly controlled by the wide-aperture
components of the data such as diving waves and super-critical reflections. This is even more
true for 3D wide-azimuth acquisitions. Therefore, FWI will tend to reconstruct the horizontal
velocities in the upper part of the medium, if the FWI initial model predicts sufficiently accu-
rately these wide-aperture arrivals. The reconstruction of the horizontal velocities in the upper
structure leads to kinematic inaccuracies during the inversion of the short-aperture reflections
which dominantly control the reconstruction of the NMO velocity in the deeper part of the
target. This kinematic errors are accommodated by stretching in depth the velocity structure
in the deep part of the target. Therefore, anisotropy should be involved in FWI to avoid
this bias in the velocity estimation and in the positioning in depth of the reflectors. Indeed,
this is not a trivial task. We have shown how significantly different velocity models allow for
a nearly equivalent match of the data. This highlights the ill-posedness of multiparameter
acoustic anisotropic FWI. Therefore, future work will require a careful sensitivity analysis of
anisotropic FWI in order to define the number and the type of parameter classes that can
be reliably reconstructed by anisotropic FWI of wide-aperture data, as well as the design of
efficient strategies to constrain the inversion with suitable a priori information coming from
well logs.

Acknowledgements

We thank BP for providing us the 2D raw Valhall data set as well as the data set preprocessed
by PGS for migration, the initial anisotropic models, the well log velocities and the 2D section
of the 3D FWI model of Sirgue et al. (2010). We would like to thank H. Chauris (Mines
ParisTech) for providing us their 2D ray+Born migration code.

185
APPLICATIONS

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0

Time (s)

Time (s)
0

0.5
-10

1 1.0

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0
Time (s)

Time (s)
0

0.5
-10

1 1.0

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0
Time (s)

Time (s)

0.5
-10

1 1.0

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0
Time (s)

Time (s)

0.5
-10

1 1.0

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0
Time (s)

Time (s)

0.5
-10

1 1.0

Receiver number Amplitude

0 50 100 150 200 -20 -10 0 10 20

-0.5
10

0 0
Time (s)

Time (s)

0.5
-10

1 1.0

Figure 4.57: Source wavelet estimation using the full offset range in the data. (a-e) Source
wavelets estimated from (a) the RTT model, (b) the FATT model, (c) the FATT+FWI model
186
3, (d) the RTT+FWI model, (e) the 2D section of the 3D-FWI model of Sirgue et al. (2010).
(f-j) Corresponding average wavelets.
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

Figure 4.58: Initial and final anisotropic FWI models - (a-d) Initial models for anisotropic
FWI. (a) Vertical velocity V0 , (b) density inferred from the Gardner law, (c) δ, and (d) . (e)
Final anisotropic FWI model for vertical velocity. Density, δ and are kept constant during
the inversion. 187
APPLICATIONS

Velocity (m/s) Velocity (m/s)

1000 2000 3000 1000 2000 3000
0 0

500 500

1000 1000

1500 1500
Depth (m)

Depth (m)

2000 2000

2500 2500

3000 3000

3500 3500
X = 9.5 km X = 9.5 km

Figure 4.59: Log of the anisotropic FWI model - (a) Log of the final (thick solid gray curve)
and of the initial (dash dark gray curve) FWI vertical velocity model are compared with the
smoothed well log (solid black line). (b) Vertical-velocity logs shown in (a) can be compared
with the log of the isotropic RTT+FWI model (thin dark gray line). The dash light gray line
represents the log of the horizontal velocity model inferred from the V0 model and the model of
Figure 4.58(a)-(d). The agreement between the RTT+FWI model and the horizontal-velocity
log confirms that isotropic FWI has built the horizontal velocity in the upper structure. Note
also the higher velocity contrasts in the gas layers of the RTT+FWI model (thin dark gray
line), compared to that of the anisotropic FWI model (thick solid gray curve).

188
4.5 Shallow-water offshore acoustic FWI : on the footprint of the Valhall anisotropy on FWI

a) Offset (km)
-8 -4 0 4
0
Time - offset / 2.5 (s)

b) Offset (km)
-8 -4 0 4
0
Time - offset / 2.5 (s)

3
c)
0
Time − offset / 2.5 (s)

3
−10000 −8000 −6000 −4000 −2000 0 2000 4000
Offset (m)

Figure 4.60: Seismic modelling in the anisotropic FWI model - (a-b) True amplitude seis-
mograms plotted without gain with offset. (a) Recorded data. (b) Modelled data computed
in the final anisotropic FWI model (Figure 4.57e)). (c) Direct comparison between recorded
(black) and modelled (gray) seismograms computed in the final anisotropic FWI model. True
amplitudes are plotted.

189
APPLICATIONS

Distance (km)
4 6 8 10 12 14 16
0

0.5

1.0

1.5
Depth (km)
2.0

2.5

3.0

3.5

4.0
Distance (km)
4 6 8 10 12 14 16
0

0.5

1.0

1.5
Depth (km)

2.0

2.5

3.0

3.5

4.0
Distance (km)
4 6 8 10 12 14 16
0

0.5

1.0

1.5
Depth (km)

2.0

2.5

3.0

3.5

4.0
Distance (km)
d) 4 6 8 10 12 14 16
0

0.5

1.0

1.5
Depth (km)

2.0

2.5

3.0

3.5

4.0

Figure 4.61: RTM migrated images - (a) isotropic RTM computed in the RTT background
model. (b-c) Anisotropic RTM computed in (b) the initial anisotropic FWI model and (c) in
the final anisotropic FWI model. (d) Same as (c), on which we superimposed the perturbation
model, i.e., the difference between the final and the starting vertical velocity FWI models.

190
Conclusion

In these lecture notes, two main forward modelling approaches have been presented. Firstly,
the discretisation of the strong formulation of the partial differential equations has been dis-
cussed through essentially the 3D acoustic equation in the frequency domain. This approach
corresponds to the pseudo-spectral, finite-difference, and finite-volume methods. On a struc-
tured meshing, notably a regular or stretched grid, these approaches are easy to implement and
quite flexible. They are currently the methods of choice for large-scale modelling and inversion
in exploration geophysics, especially in the marine environment. They may however demand a
very fine discretisation when the earth model contains large contrasts; and accurately modelling
the responses around a sharp interface is quite challenging.
Secondly, we discussed the weak formulation, namely the finite-element methods with a
discontinuous Galerkin approach. Test functions are equal to the interpolation functions and
only fluxes are exchanged at boundaries, giving us more freedom and the integral form provides
us flexibility in the meshing. However, they lead to numerical challenges: they are more difficult
to implement than the finite-difference method, they are often more expensive in computational
time and memory, and they are more complicated to use because the accuracy of the response
depends on the quality of the meshing. In seismology, where surface waves play a major job,
they have shown their relevance, especially the spectral finite-element method. Therefore, they
can be considered in the frequency domain in 2D and in the time domain in 3D for elastic
propagation as the frequency formulation is uptonow too expensive.
This classification helps to understand the advantages and limitations of each particular
method to model a specific physical phenomenon. It should be noticed that attempts exist to
combine the advantages of these methods in one approach, at least for specific applications.
When the modelling method serves as the kernel of an inversion algorithm, additional constrains
generally appear because the gradient of the misfit functional needs to be evaluated. The
choice of the modelling approach notably depends on the needed accuracy, the efficiency in
evaluation the solution and the gradient of the misfit functional in an inversion algorithm, and
the simplicity to use. Although this was not really discussed, the efficiency may considerably
depends on the hardware architecture. Some of the new types of hardware architecture require
a new modelling implementation to be used efficiently as, for example, graphical processor
units (GPU) which may require specific developments. Similarly, the practical implementation
shall probably be adapted to the data acquisition. Densely sampled acquisition in exploration
geophysics with or without blending, or in lithospheric investigation with the recent deployment
of sensors as for the US array experiment challenges our modelling choice. This seems to
indicate that development in modelling and associated inversion approaches remain crucial
to improve our subsurface knowledge, notably by extracting more information from the, ever
larger, data sets we record.
CONCLUSION AND PERSPECTIVES

This forward modeling could be embedded into a full waveform inversion which can be
performed either in the time-domain or in the frequency-domain. If FWI is applied in the
frequency-domain, monochromatic wavefields can be extracted from the DG-FEM time-domain
modelling engine by discrete Fourier transform, as shown by Sirgue et al. (2008) or by phase-
sensitive detection (Nihei and Li, 2007). The first scheme is particularly interesting for frequency-
domain FWI, since an arbitrary number of frequencies can be treated without a significant in-
crease in computation time. Moreover, due to the satisfying speed-up of DG-FEM, we suggest
that this method will be efficient for very large problems. The FWI is based on a gradient and
Hessian operator estimation using adjoint method formulation for minimizing a misfit criterion.
Full waveform inversion is the last mile procedure for the extraction of information from
seismograms. We have shown the conceptual effort in the last thirty years for making the
full waveform inversion a possible tool for high-resolution imaging. These efforts have focused
on the development of large-scale numerical optimization techniques, efficient resolution of
the two-way wave equation, judicious model parameterization for multi-parameter reconstruc-
tion, multiscale strategies to mitigate the ill-posedness of full-waveform inversion and specific
waveform-inversion data preprocessing.
The full waveform inversion has reached today enough maturity for the prototype applica-
tion to 3D real data sets. Whereas applications to 3D real data have shown promising results
at low frequencies (< 7 Hz), it is still unclear to which extent application of full waveform
inversion can be efficiently performed at higher frequencies. Answering this question requires a
more quantitative understanding of the sensitivity of full waveform inversion to the accuracy of
the starting model, to the noise and to the wavefield-amplitude mismatch is probably required.
If full waveform inversion remains limited to the low frequencies, it will remain a tool to
build background models for migration. In the opposite case, FWI will tend to a self-contained
processing workflow that can reunify macromodel building and migration tasks.
The applicability of full waveform inversion will depend also on technological advances
related to acquisition and computing.
New strategies have to be found for making a new jump in the interpretation of seismograms.
One could say that there is a need for another conceptual progress which is unlikely to come
up from the current application of optimization. Will this jump be achieved by speeding up
by few orders the forward problem (for example by using GPUs) or will it be performed by a
rather complex nonlinear transform related to new norms in the model and data spaces?
Time is exciting as realistic applications are right now possible while we still have intermedi-
ate progress as anisotropic reconstruction, 3D elastic imaging. An intellectual jump for better
reconstruction and extraction is necessary for making this technique attractive as scientific
issues are asking for.

192
Bibliography

Aagaard, B. T., Hall, J. F., and Heaton, T. H. (2001). Characterization of near-source ground
motion with earthquake simulations. Earthq. Spectra., 17:177–207.

Abarbanel, S., Gottlieb, D., and Hesthaven, J. S. (2002). Long-time behavior of the perfectly
matched layer equations in computational electromagnetics. Journal of scientific Computing,
17:405–422.

Abubakar, A., Hu, W., Habashy, T. M., and van den Berg, P. M. (2009). Application of
the finite-difference contrast-source inversion algorithm to seismic full-waveform data. Geo-
physics, 74(6):WCC47–WCC58.

Ainsworth, M., Monk, P., and Muniz, W. (2006). Dispersive and Dissipative Properties of Dis-
continuous Galerkin Finite Element Methods for the Second-Order Wave Equation. Journal
of Scientific Computing, 27(1-3):5–40.

Akcelik, V. (2002). Multiscale Newton-Krylov methods for inverse acoustic wave propagation.
PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Akcelik, V., Bielak, J., Biros, G., Epanomeritakis, I., Fernandez, A., Ghattas, O., Kim, E. J.,
Lopez, J., O’Hallaron, D., Tu, T., and Urbanic, J. (2003). High resolution forward and inverse
earthquake modeling on terascale computers. In SC ’03: Proceedings of the 2003 ACM/IEEE
conference on Supercomputing, page 52, Washington, DC, USA. IEEE Computer Society.

Aki, K. and Richards, P. (1980). Quantitative Seismology: Theory and Methods. W. H. Freeman
& Co, San Francisco.

Alerini, M., Bégat, S. L., Lambaré, G., and Baina, R. (2002). 2d pp- and ps-stereotomography
for a multicomponent dataset. SEG Technical Program Expanded Abstracts, 21(1):838–841.

Alford, R., Kelly, K., and Boore, D. (1974). Accuracy of finite-difference modeling of the
acoustic wave equation. Geophysics, 39:834–842.

Alterman, Z. and Karal, F. C. (1968). Propagation of elastic waves in layared media by finite-
difference methods. Bulletin of the Seismological Society of America, 58:367–398.

Amestoy, P. R., Guermouche, A., L’Excellent, J. Y., and Pralet, S. (2006). Hybrid scheduling
for the parallel solution of linear systems. Parallel Computing, 32:136–156.

Aminzadeh, F., Brac, J., and Kunz, T. (1997). 3-D Salt and Overthrust models. SEG/EAGE
3-D Modeling Series No.1.
BIBLIOGRAPHY

Amundsen, L. (1991). Comparison of the least-squares criterion and the Cauchy criterion in
frequency-wavenumber inversion. Geophysics, 56:2027–2035.

Amundsen, L. and Ursin, B. (1991). Frequency-wavenumber inversion of acoustic data. Geo-

physics, 56:1027–1039.

Aoi, S. and Fujiwara, H. (1999). 3D finite-difference method using discontinuous grids. Bulletin
of the Seismological Society of America, 89:918–930.

Aoyama, Y. and Nakano, J. (1999). RS/6000 SP: Practical MPI Programming. IBM Corpora-
tion, Texas, Red Book edition.

Askan, A. (2006). Full waveform inversion for seismic velocity and anelastic losses in hetero-
geneous structures. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Askan, A., Akcelik, V., Bielak, J., and Ghattas, O. (2007). Full waveform inversion for seismic
velocity and anelastic losses in heterogeneous structures. Bulletin of the Seismological Society
of America, 97(6):1990–2008.

Askan, A. and Bielak, J. (2008). Full anelastic waveform tomography including model uncer-
tainty. Bulletin of the Seismological Society of America, 98(6):2975–2989.

Babuska, I. and Suri, M. (1990). The p and the hp versions of the finite element method: An
overview. Comp. Meth. Appl. Mech. Engng., 80(1-3):5–26.

Barnes, C. and Charara, M. (2008). Full-waveform inversion results when using acoustic approx-
imation instead of elastic medium. SEG Technical Program Expanded Abstracts, 27(1):1895–
1899.

Basabe, J. D., Sen, M., and Wheeler, M. (2008). The interior penalty discontinuous galerkin
method for elastic wave propagation: grid dispersion. Geophysical Journal International,
175:83–93.

Baysal, E., Kosloff, D., and Sherwood, J. (1983). Reverse time migration. Geophysics, 48:1514–
1524.

Bécache, E., Petropoulos, P. G., and Gedney, S. G. (2004). On the long-time behavior of unsplit
perfectly matched layers. IEEE Transactions on Antennas and Propagation, 52:1335–1342.

Bednar, J. B., Shin, C., and Pyun, S. (2007). Comparison of waveform inversion, part 2: phase
approach. Geophysical Prospecting, 55(4):465–475.

Ben Hadj Ali, H., Operto, S., and Virieux, J. (2008). Velocity model building by 3D frequency-
domain, full-waveform inversion of wide-aperture seismic data. Geophysics, 73(5):VE101–
VE117.

BenJemaa, M., Glinsky-Olivier, N., Cruz-Atienza, V. M., and Virieux, J. (2009). 3D Dynamic
rupture simulations by a finite volume method. Geophys. J. Int., 178:541–560.

BenJemaa, M., Glinsky-Olivier, N., Cruz-Atienza, V. M., Virieux, J., and Piperno, S. (2007).
Dynamic non-planar crack rupture by a finite volume method. Geophys. J. Int., 171:271–285.

194
BIBLIOGRAPHY

Berenger, J.-P. (1994). A perfectly matched layer for absorption of electromagnetic waves. J.
Comput. Phys., 114:185–200.

Beydoun, W. B. and Tarantola, A. (1988). First Born and Rytov approximation: Modeling and
inversion conditions in a canonical example. Journal of the Acoustical Society of America,
83:1045–1055.

Beylkin, G. (1985). Imaging of discontinuities in the inverse scaterring problem by inversion

of a causal generalized Radon transform. Journal of Mathematical Physics, 26:99–108.

Beylkin, G. and Burridge, R. (1990). Linearized inverse scattering problems in acoustics and
elasticity. Wave motion, 12:15–52.

Billette, F. and Lambaré, G. (1998). Velocity macro-model estimation from seismic reflection
data by stereotomography. Geophys. J. Int., 135(2):671–680.

Billette, F., Le Bégat, S., Podvin, P., and Lambaré, G. (2003). Practical aspects and applica-
tions of 2D stereotomography. Geophysics, 68:1008–1021.

Biondi, B. and Symes, W. (2004). Angle-domain common-image gathers for migration velocity
analysis by wavefield-continuation imaging. Geophysics, 69(5):1283–1298.

Bleibinhaus, F., Hole, J. A., Ryberg, T., and Fuis, G. S. (2007). Structure of the California
Coast Ranges and San Andreas Fault at SAFOD from seismic waveform inversion and re-
flection imaging . Journal of Geophysical Research, 112(B06315):doi:10.1029/2006JB004611.

Bleinstein, N. (1986). Two-and-one-half dimensional in-plane wave-propagation. Geophysical

Prospecting, 34:686–703.

Bleistein, N. (1987). On the imaging of reflectors in the Earth. Geophysics, 52(7):931–942.

Bohlen, T. and Saenger, E. H. (2006). Accuracy of heterogeneous staggered-grid finite-difference

modeling of Rayleigh waves. Geophysics, 71:109–115.

Bolt, B. and Smith, W. (1976). Finite element computation of seismic anomalies from bodies
of arbitrary shape. Geophysics, 41:145–150.

Boore, D. M. (1972). Finite-difference methods for seismic wave propagation in heterogeneous

materials. In Ed., B. B. A., editor, Methods in computational physics, volume 11. Academic
Press, Inc.

Bouchon, M. (1981). A simple method to calculate Green’s functions for elastic layered media.
Bull. Seism. Soc. Am., 71(4):959–971.

Bouchon, M., Campillo, M., and Gaffet, S. (1989). A boundary integral equation - discrete
wavenumber representation method to study wave propagation in multilayered media having
irregualr interfaces. Geophysics, 54:1134–1140.

Brenders, A. J. and Pratt, R. G. (2007a). Efficient waveform tomography for lithospheric imag-
ing: implications for realistic 2D acquisition geometries and low frequency data. Geophysical
Journal International, 168:152–170.

195
BIBLIOGRAPHY

Brenders, A. J. and Pratt, R. G. (2007b). Full waveform tomography for lithospheric imaging:
results from a blind test in a realistic crustal model. Geophysical Journal International,
168:133–151.

Brenders, A. J. and Pratt, R. G. (2007c). Waveform tomography of marine seismic data: what
can limited offset offer? In Extended Abstracts, pages 3024–3029.

Brossier, R. (2009). Imagerie sismique à deux dimensions des milieux visco-élastiques par
inversion des formes d’onde: développements méthodologiques et applications. PhD thesis,
Université de Nice-Sophia-Antipolis.

Brossier, R. (2011). Two-dimensional frequency-domain visco-elastic full waveform inversion:

Parallel algorithms, optimization and performance. Computers & Geosciences, 37(4):444 –
455.

Brossier, R., Operto, S., and Virieux, J. (2009a). Seismic imaging of complex onshore structures
by 2D elastic frequency-domain full-waveform inversion. Geophysics, 74(6):WCC63–WCC76.

Brossier, R., Operto, S., and Virieux, J. (2009b). Two-dimensional seismic imaging of the
Valhall model from synthetic OBC data by frequency-domain elastic full-waveform inversion.
SEG Technical Program Expanded Abstracts, Houston, 28(1):2293–2297.

Brossier, R., Operto, S., and Virieux, J. (2010). Which data residual norm for robust elastic
frequency-domain full waveform inversion? Geophysics, 75(3):R37–R46.

Brossier, R., Virieux, J., and Operto, S. (2008). Parsimonious finite-volume frequency-domain
method for 2-D P-SV-wave modelling. Geophysical Journal International, 175(2):541–559.

Bube, K. P. and Langan, R. T. (1997). Hybrid l1 /l2 minimization with applications to tomog-
raphy. Geophysics, 62(4):1183–1195.

Bube, K. P. and Nemeth, T. (2007). Fast line searches for the robust solution of linear systems
in the hybrid l1 /l2 and huber norms. Geophysics, 72(2):A13–A17.

Bunks, C., Salek, F. M., Zaleski, S., and Chavent, G. (1995). Multiscale seismic waveform
inversion. Geophysics, 60(5):1457–1473.

Campillo, M., Gariel, J., Aki, K., and Sanchez-Sesma, F. (1989). Destructive strong ground
motion in Mexico city: source, path, and site effects during great 1985 Michoacan earthquake.
BSSA, 79(6):1718–1735.

Carcione, J. M., Herman, G. C., and ten Kroode, A. P. E. (2002). Seismic modeling. Geophysics,
67(4):1304–1325.

Carcione, J. M., Kosloff, D., and Kosloff, R. (1988). Wave-propagation simulation in an elastic
anisotropic (transversely isotropic) solid. Q. J. Mech. appl. Math., 41(3):319–345.

Cary, P. and Chapman, C. (1988). Automatic 1-D waveform inversion of marine seismic re-
fraction data. Geophysical Journal of the Royal Astronomical Society, 93:527–546.

Červený, V. (2001). Seismic Ray Theory. Cambridge University Press, Cambridge.

196
BIBLIOGRAPHY

Chaljub, E., Bard, P., Tsuno, S., Kristek, J., Moczo, P., Franek, P., Hollender, F., Manakou,
M., Raptakis, D., and Pitilakis, K. (2009). Assessing the capability of numerical methods to
predict earthquake ground motion: the Euroseistest verification and validation project. In
EOS Trans. AGU, abstract S43A-1968. American Geophysical Union, San Francisco, USA.
Chaljub, E., Capdeville, Y., and Vilotte, J.-P. (2003). Solving elastodynamics in a fluid-solid
heterogeneous sphere: a parallel spectral element approximation on non-conforming grids.
Journal of Computational Physics, 187:457–491.
Chaljub, E., Komatitsch, D., Vilotte, J.-P., Capdeville, Y., Valette, B., and Festa, G. (2007).
Spectral element analysis in seismology. In Wu, R.-S. and Maupin, V., editors, Advances
in Wave Propagation in Heterogeneous Earth, volume 48 of Advances in Geophysics, pages
365–419. Elsevier - Academic Press, London.
Chapman, C. (2004). Fundamentals of seismic waves propagation. Cambridge University Press,
Cambridge, England.
Chapman, C. H. (1985). Ray theory and its extensions: WKBJ and Maslov seismograms. J.
Geophys., 58:27–43.
Chapman, C. H. and Orcutt, J. A. (1985). Least-squares fitting of marine seismic refraction
data. Geophys. J. R. astr. Soc., 82:339–374.
Chavent, G. (2009). Nonlinear least squares for inverse problems. Springer Dordrecht Heidel-
berg London New York.
Chen, P., Jordan, T., and Zhao, L. (2007). Full three-dimensional tomography: a compar-
ison between the scattering-integral and adjoint-wavefield methods. Geophysical Journal
International, 170:175–181.
Cheng, J.-B. and Liu, H. (2006). Two kinds of separable approximations for the one-way
operator. Geophysics, 71:T1–T5.
Cheng, J.-B., Liu, H., and Zhang, Z. (2007). A separable-kernel decomposition method for
approximating the dsr continuation operator. Geophysics, 72:S25–S31.
Chew, W. C. and Liu, Q. H. (1996). Perfectly matched layers for elastodynamics: a new
absorbing boundary condition. J. Compu. Acous., 4:341–359.
Chew, W. C. and Weedon, W. H. (1994). A 3-D perfectly matched medium from modified
Maxwell’s equations with stretched coordinates. Microwave and Optical Technology Letters,
7:599–604.
Chin-Joe-Kong, M. J. S., Mulder, W. A., and Van Veldhuizen, M. (1999). Higher-order tri-
angular and tetrahedral finite elements with mass lumping for solving the wave equation.
Journal of Engineering Mathematics, 35:405–426.
Choi, Y., Min, D., and Shin, C. (2008). Two-dimensional waveform inversion of multi-
component data in acoustic-elastic coupled media. Geophysical Prospecting, 56(6):863–881.
Choi, Y. and Shin, C. (2008). Frequency-Domain Elastic Full Waveform Inversion Using the
New Pseudo-Hessian Matrix: Experience Of Elastic Marmousi 2 Synthetic Data. Bulletin of
the Seismological Society of America, 98(5):2402–2415.

197
BIBLIOGRAPHY

Claerbout, J. (1971). Towards a unified theory of reflector mapping. Geophysics, 36:467–481.

Claerbout, J. (1985). Imaging the Earth’s interior. Blackwell Scientific Publication.

Claerbout, J. and Doherty, S. (1972). Downward continuation of moveout corrected seismo-

grams. Geophysics, 37:741–768.

Claerbout, J. F. (1976). Fundamentals of Geophysical Data Processing. McGraw-Hill Book Co.

Cockburn, B., Li, F., and Shu, C. W. (2004). Locally divergence-free discontinuous Galerkin
methods for the Maxwell equations. Journal of Computational Physics, 194:588–610.

Collino, F. and Monk, P. (1998). Optimizing the perfectly matched layer. Computer methods
in Applied Mechanics and Engineering, 164:157–171.

Collino, F. and Tsogka, C. (2001). Application of the perfectly matched absorbing layer model
to the linear elastodynamic problem in anisotropic heterogeneous media. Geophysics, 66:294–
307.

Coutant, O. (1989). Program of Numerical Simulation AXITRA. Research report, LGIT,

Grenoble.

Crase, E. (1989). Robust elastic nonlinear inversion of seismic waveform data. PhD thesis,
University of Houston.

Crase, E., Pica, A., Noble, M., McDonald, J., and Tarantola, A. (1990). Robust elastic non-
linear waveform inversion: application to real data. Geophysics, 55:527–538.

Crase, E., Wideman, C., Noble, M., and Tarantola, A. (1992). Nonlinear elastic inversion of
land seismic reflection data. Journal of Geophysical Research, 97:4685–4705.

Dablain, M. (1986). The application of high order differencing for the scalar wave equation.
Geophysics, 51:54–66.

Danecek, P. and Seriani, G. (2008). An efficient parallel Chebyshev pseudo-spectral method

for large-scale 3D seismic forward modelling. In Expanded Abstracts, page P046.

de Hoop, A. (1960). A modification of Cagniard’s method for solving seismic pulse problems.
Applied Scientific Research, 8:349–356.

De la Puente, J., Ampuero, J.-P., and Käser, M. (2009). Dynamic Rupture Modeling on Un-
structured Meshes Using a Discontinuous Galerkin Method. Journal of Geophysical Research,
114:B10302.

Delcourte, S., Fezoui, L., and Glinsky-Olivier, N. (2009). A high-order discontinuous Galerkin
method for the seismic wave propagation. ESAIM: Proc., 27:70–89.

Dell’Aversana, P. (2001). Integration of seismic, MT and gravity data in a thrust belt inter-
pretation. First Break, 19:335–341.

Dell’Aversana, P., Ceragioli, E., Morandi, S., and Zollo, A. (2000). A simultaneous acquisi-
tion test of high-density ’global offset’ seismic in complex geological settings. First Break,
18(3):87–96.

198
BIBLIOGRAPHY

Demmel, J. W. (1997). Applied numerical linear algebra. SIAM, Philadelphia.

Dessa, J. X. and Pascal, G. (2003). Combined traveltime and frequency-domain seismic wave-
form inversion: a case study on multi-offset ultrasonic data. Geophys. J. Int., 154(1):117–133.

Devaney, A. J. (1982). A filtered backprojection algorithm for diffraction tomography. Ultra-

sonic Imaging, 4:336–350.

Djikpéssé, H. A. and Tarantola, A. (1999). Multiparameter l1 norm waveform fitting: Inter-

pretation of gulf of mexico reflection seismograms. Geophysics, 64(4):1023–1035.

Docherty, P., Silva, R., Singh, S., Song, Z.-M., and Wood, M. (2003). Migration velocity
analysis using a genetic algorithm. Geophysical Prospecting, 45:865–878.

Drossaert, F. H. and Giannopoulos, A. (2007). A nonsplit complex frequency-shifted PML

based on recursive integration for FDTD modeling of elastic waves. Geophysics, 72(2):T9–
T17.

Duff, I. S., Erisman, A. M., and Reid, J. K. (1986). Direct methods for sparse matrices.
Clarendon Press, Oxford, U. K.

Duff, I. S. and Reid, J. K. (1983). The multifrontal solution of indefinite sparse symmetric
linear systems. ACM Transactions on Mathematical Software, 9:302–325.

Dumbser, M. and Käser, M. (2006). An Arbitrary High Order Discontinuous Galerkin Method
for Elastic Waves on Unstructured Meshes II: The Three-Dimensional Isotropic Case. Geo-
physical Journal International, 167(1):319–336.

Dumbser, M., Käser, M., and de la Puente, J. (2007a). Arbitrary high-order finite volume
schemes for seismic wave propagation on unstructured meshes in 2D and 3D. Geophysical
Journal International, 171:665–694.

Dumbser, M., Käser, M., and Toro, E. (2007b). An Arbitrary High Order Discontinuous
Galerkin Method for Elastic Waves on Unstructured Meshes V: Local Time Stepping and
p-Adaptivity. Geophysical Journal International, 171(2):695–717.

Dummong, S., Meier, K., Gajewski, D., and Hubscher, C. (2008). Comparison of prestack
stereotomography and NIP wave tomography for velocity model building: Instances from
the Messinian evaporites. Geophysics, 73(5):VE291–VE302.

Dunavant, D. A. (1985). High degree efficient symmetrical gaussian quadrature rules for the
triangle. International Journal of Numerical Methods in Engineering, 21:1129–1148.

Effelsen, K. (2009). A comparison of phase inversion and traveltime tomography for processing
of near-surface refraction traveltimes. Geophysics, 74(6):WCB11–WCB24.

Epanomeritakis, I., Akçelik, V., Ghattas, O., and Bielak, J. (2008). A Newton-CG method
for large-scale three-dimensional elastic full waveform seismic inversion. Inverse Problems,
24:1–26.

Erlangga, Y. A. and Herrmann, F. J. (2008). An iterative multilevel method for computing

wavefields in frequency-domain seismic inversion. In Expanded Abstracts, pages 1956–1960.
Soc. Expl. Geophys.

199
BIBLIOGRAPHY

Etienne, V., Brossier, R., Operto, S., and Virieux, J. (2008). A 3D parsimonious Finite-Volume
Frequency-Domain method for elastic wave modelling. In Expanded Abstracts, 70th Annual
EAGE Conference & Exhibition, Rome. EAGE.

Etienne, V., Virieux, J., and Operto, S. (2009). A massively parallel time domain discontinuous
Galerkin method for 3D elastic wave modeling. In Expanded Abstracts, 79th Annual SEG
Conference & Exhibition, Houston. Society of Exploration Geophysics.

Fichtner, A., Kennett, B. L. N., Igel, H., and Bunge, H. P. (2008). Theoretical background
for continental- and global-scale full-waveform inversion in the time-frequency fomain. Geo-
physical Journal International, 175:665–685.

Forgues, E. and Lambaré, G. (1997). Parameterization study for acoustic and elastic ray+born
inversion. Journal of Seismic Exploration, 6:253–278.

Fornberg, B. (1998). A practical guide to pseudospectral methods. Cambridge monographs on

applied and computational mathematics.

Frey, P. and George, P. (2008). Mesh Generation. ISTE Ltd & John Wiley Sons Inc, London
(UK) & Hoboken (USA).

Futterman, W. (1962). Dispersive body waves. Journal of Geophysics Research, 67:5279–5291.

Galis, M., Moczo, P., and Kristek, J. (2008). A 3-D hybrid finite-difference - finite-element
viscoelastic modelling of seismic wave motion. Geophysical Journal International, 175:153–
184.

Gao, F., Levander, A. R., Pratt, R. G., Zelt, C. A., and Fradelizio, G. L. (2006). Wave-
form tomography at a groundwater contamination site: Vsp-surface data set. Geophysics,
71(1):H1–H11.

Gardner, G. H. F., Gardner, L. W., and Gregory, A. R. (1974). Formation velocity and
density–the diagnostic basics for stratigraphic traps. Geophysics, 39:770–780.

Garvin, W. W. (1956). Exact transient solution of the buried line source problem. Proc. Roy.
Soc. London, 234:528–541.

Gauthier, O., Virieux, J., and Tarantola, A. (1986). Two-dimensional nonlinear inversion of
seismic waveforms: numerical results. Geophysics, 51(7):1387–1403.

Gazdag, J. (1978). Wave equation migration with the phase-shift method. Geophysics, 43:1342–
1351.

Gelis, C., Virieux, J., and Grandjean, G. (2007). 2D elastic waveform inversion using Born
and Rytov approximations in the frequency domain. Geophysical Journal International,
168:605–633.

Gilbert, F. and Dziewonski, A. (1975). An application of normal mode theory to the retrieval
of structural parameters and source mechanisms from seismic spectra. Philosophical Trans-
actions of the Royal Society of London, 278:187–269.

Glangeaud, J. L. M. F. and Coppens, F. (1997). Traitement du signal pour géologues et

géophysiciens. éditions TECHNIP.

200
BIBLIOGRAPHY

Gouveia, W. P. and Scales, J. A. (1998). Bayesian seismic waveform inversion: parameter

estimation and uncertainty analysis. Journal of Geophysical Research, 103(B2):2579–2779.

Graves, R. (1996). Simulating seismic wave propagation in 3D elastic media using staggered-
grid finite differences. Bull. Seismol. Soc. Am., 86:1091–1106.

Grechka, V., Zhang, L., and III, J. W. R. (2004). Shear waves in acoustic anisotropic media.
Geophysics, 69:576–582.

Guermouche, A., L’Excellent, J. Y., and Utard, G. (2003). Impact of reordering on the memory
of a multifrontal solver. Parallel computing, 29:1191–1218.

Guitton, A. and Symes, W. W. (2003). Robust inversion of seismic data using the Huber norm.
Geophysics, 68(4):1310–1319.

Gutenberg, B. (1914). Über erdbenwellen viia. beobachtungen an registrierungen von fern-

beben in göttingen und folgerungen über die konstitution des erdkörpers. Nachrichten von
der Könglichen Gesellschaft der Wissenschaften zu Göttinge, Mathematisch-Physikalische
Klasse, pages 125–176.

Ha, T., Chung, W., and Shin, C. (2009). Waveform inversion using a back-propagation algo-
rithm and a Huber function norm. Geophysics, 74(3):R15–R24.

Haltiner, G. and Williams, R. (1980). Numerical prediction and dynamic meteorology. Wiley,
New York.

Hesthaven, J. and Warburton, T. (2008). Nodal discontinuous Galerkin methods: algorithms,

analysis, and applications. Springler.

Hicks, G. J. (2002). Arbitrary source and receiver positioning in finite-difference schemes using
kaiser windowed sinc functions. Geophysics, 67:156–166.

Hicks, G. J. and Pratt, R. G. (2001). Reflection waveform inversion using local descent methods:
estimating attenuation and velocity over a gas-sand deposit. Geophysics, 66(2):598–612.

Holberg, O. (1987). Computational aspects of the choice of operators and sampling interval for
numerical differentiation in large-scale simulation of wave phenomena. Geophys. Prospecting,
35:629–655.

Hole, J. A. (1992). Nonlinear high-resolution three-dimensional seismic travel time tomography.

Journal of Geophysical Research, 97:6553–6562.

Hu, W., Abubakar, A., and Habashy, T. M. (2009). Simultaneous multifrequency inversion of
full-waveform seismic data. Geophysics, 74(2):R1–R14.

Huber, P. J. (1973). Robust regression: Asymptotics, conjectures, and Monte Carlo. The
Annals of Statistic, 1(5):799–821.

Hustedt, B., Operto, S., and Virieux, J. (2004). Mixed-grid and staggered-grid finite difference
methods for frequency domain acoustic wave modelling. Geophysical Journal International,
157:1269–1296.

201
BIBLIOGRAPHY

Ichimura, T., Hori, M., and Kuwamoto, H. (2007). Earthquake motion simulation with multi-
scale finite element analysis on hybrid grid. Bull. seism. Soc. Am., 97(4):1133–1143.

Ikelle, L., Diet, J. P., and Tarantola, A. (1988). Linearized inversion of multioffset seismic
reflection data in the ω − k domain : depth-dependent reference medium. Geophysics,
53:50–64.

Ilan, A. and Loewenthal, D. (1976). Instability of finite difference schemes. Geophysical

Prospecting, 24:431–453.

Improta, L., Zollo, A., Herrero, A., Frattini, R., Virieux, J., and DellÁversana, P. (2002).
Seismic imaging of complex structures by non-linear traveltimes inversion of dense wide-
angle data: application to a thrust belt. Geophysical Journal International, 151:264–278.

Jacobs, G. and Hesthaven, J. S. (2006). High-order nodal discontinuous Galerkin particle-in-cell

methods on unstructured grids. Journal of Computational Physics, 214:96–121.

Jaiswal, P., Zelt, C., Bally, A. W., and Dasgupta, R. (2008). 2-D traveltime and waveform
inversion for improved seismic imaging: Naga Thrust and Fold Belt, India. Geophysical
Journal International, doi:10.1111/j.1365-246X.2007.03691.x.

Jaiswal, P., Zelt, C., Dasgupta, R., and Nath, K. (2009). Seismic imaging of the Naga Thrust
using multiscale waveform inversion. Geophysics, 74(6):WCC129–WCC140.

Jannane, M., Beydoun, W., Crase, E., Cao, D., Koren, Z., Landa, E., Mendes, M., Pica, A.,
Noble, M., Roeth, G., Singh, S., Snieder, R., Tarantola, A., and Trezeguet, D. (1989). Wave-
lengths of Earth structures that can be resolved from seismic reflection data. Geophysics,
54(7):906–910.

Jarraud, M. and Baede, A. (1985). The use of spectral techiques in numerical weather prediction,
pages 1–47. American Mathematical Society, Providence.

Jin, S. and Madariaga, R. (1993). Background velocity inversion with a genetic algorithm.
Geophysical Research Letters, 20:93–96.

Jin, S. and Madariaga, R. (1994). Nonlinear velocity inversion by a two-step Monte Carlo.
Geophysics, 59(4):577–590.

Jin, S., Madariaga, R., Virieux, J., and Lambaré, G. (1992). Two-dimensional asymptotic
iterative elastic inversion. Geophysical Journal International, 108:575–588.

Jo, C. H., Shin, C., and Suh, J. H. (1996). An optimal 9-point, finite-difference, frequency-space
2D scalar extrapolator. Geophysics, 61:529–537.

Jongmans, D., Pitilakis, K., Demanet, D., Raptakis, D., Riepl, J., Horrent, C., Lontzetidis,
K., and Bard, P. Y. (1998). Determination of the geological structure of the Volvi basin and
validation of the basin response. Bulletin of the Seismological Society of America, 88(2):473–
487.

Karypis, G. and Kumar, V. (1998). METIS - A software package for partitioning unstructured
graphs, partitioning meshes and computing fill-reducing orderings of sparse matrices - Version
4.0. University of Minnesota.

202
BIBLIOGRAPHY

Karypis, G. and Kumar, V. (1999). A fast and high quality multilevel scheme for partitioning
irregular graphs. SIAM Journal on Scientific Computing, 20(1):359 – 392.

Käser, M. and Dumbser, M. (2008). A highly accurate discontinuous Galerkin method for
complex interfaces between solids and moving fluids. Geophysics, 73(3):23–35.

Käser, M., Dumbser, M., de la Puente, J., and Igel, H. (2007). An Arbitrary High Order
Discontinuous Galerkin Method for Elastic Waves on Unstructured Meshes III: Viscoelastic
Attenuation. Geophysical Journal International, 168(1):224–242.

Käser, M., Hermann, V., and de la Puente, J. (2008). Quantitative accuracy analysis of the
discontinuous Galerkin method for seismic wave propagation. Geophysical Journal Interna-
tional, 173(2):990–999.

Kawase, H. (2003). Site effects on strong ground motions, International Handbook of Earthquake
and Engineering Seismology, Part B. W.H.K. Lee and H. Kanamori (eds), Academic Press,
London.

Keast, P. (1986). Moderate-degree tetrahedral quadrature formulae. Computer Methods in

Applied Mechanics and Engineering, 55:339–348.

Keller, J. B. (1962). A geometrical theory of diffraction. J. Opt. Soc. Am., 52:116–130.

Kelly, K., Ward, R., Treitel, S., and Alford, R. (1976). Synthetic seismograms - a finite-
difference approach. Geophysics, 41:2–27.

Kennett, B. L. N. (1983). Seismic wave propagation in stratified media. Cambridge University

Press, Cambridge.

Kim, Y., Min, D.-J., and Shin, C. (2011). Frequency-domain reverse-time migration with
source estimation. Geophysics, 76(2):S41–S49.

Klem-Musatov, K. D. and Aizenberg, A. M. (1985). Seismic modelling by methods of the

theory of edge waves. J. Geophys., 57:90–105.

Kolb, P., Collino, F., and Lailly, P. (1986). Prestack inversion of 1-D medium,. In Extended
Abstracts, volume 74, pages 498–508.

Kolsky, H. (1956). The propagation of stress pulses in viscoelastic solids. Philosophical Maga-
zine, 1:693–710.

Komatitsch, D., Labarta, J., and Michéa, D. (2008). A simulation of seismic wave propagation
at high resolution in the inner core of the Earth on 2166 processors of MareNostrum. Lecture
Notes in Computer Science, 5336:364–377.

Komatitsch, D., Liu, Q., Tromp, J., Suss, P., Stidham, C., and Shaw, J. H. (2004). Simulations
of ground motion in the Los Angeles basin based upon the spectral-element method. Bull.
Seismol. Soc. Am., 94:187–206.

Komatitsch, D. and Martin, R. (2007). An unsplit convolutional perfectly matched layer

improved at grazing incidence for the seismic wave equation. Geophysics, 72(5):SM155–
SM167.

203
BIBLIOGRAPHY

Komatitsch, D. and Vilotte, J. P. (1998). The spectral element method: an efficient tool to
simulate the seismic response of 2D and 3D geological structures. Bulletin of the Seismological
Society of America, 88:368–392.

Kommedal, J. H., Barkved, O. I., and Howe, D. J. (2004). Initial experience operating a
permanent 4C seabed array for reservoir monitoring at Valhall. SEG Technical Program
Expanded Abstracts, 23(1):2239–2242.

Kommedal, J. H., Barkved, O. I., and Thomsen, L. A. (1997). Acquisition of 4 component obs
data - a case study from the valhall field. Presented at the 59th EAGE conference and Teck
Exhibition.

Koren, Z., Mosegaard, K., Landa, E., Thore, P., and Tarantola, A. (1991). Monte carlo
estimation and resolution analysis of seismic background velocities. Journal of Geophysical
Research, 96:20289–20299.

Kormendi, F. and Dietrich, M. (1991). Non linear waveform inversion of plane-wave seismo-
grams in stratified elastic media. Geophysics, 56(5):664–674.

Krebs, J., Anderson, J., Hinkley, D., Neelamani, R., Lee, S., Baumstein, A., and Lacasse,
M. D. (2009). Fast full-wavefield seismic inversion using encoded sources. Geophysics,
74(6):WCC105–WCC116.

Kurzak, J. and Dongarra, J. (2006). Implementation of the mixed-precision high performance

LINPACK benchmark on the CELL processor. Technical report ut-cs-06-580, University of
Tennessee. http://icl.cs.utk.edu/iter-ref/.

Kuzuoglu, M. and Mittra, R. (1996). Frequency dependence of the constitutive parameters of

causal perfectly matched anisotropic absorbers. IEEE Microwave and Guided Wave Letters,
6:447–449.

Lailly, P. (1983). The seismic inverse problem as a sequence of before stack migrations. In
Bednar, R. and Weglein, editors, Conference on Inverse Scattering, Theory and application,
Society for Industrial and Applied Mathematics, Philadelphia, pages 206–220.

Lailly, P. (1984). The seismic inverse problem as a sequence of before stack migrations. In
Bednar, R. and Weglein, editors, Conference on Inverse Scattering, SIAM, Philadelphia,
pages 206–220. Soc. Ind. appl. Math.

Lamb, H. (1904). On the propagation of tremors over the surface of an elastic solid. Philos.
Tran. R. Soc. London Ser., A 203:1–42.

Lambaré, G. (1991). Inversion linearisée de données de sismique réflexion par une méthode
quasi-newtonienne. PhD thesis, Université de Paris VII.

Lambaré, G. (2008). Stereotomography. Geophysics, 73(5):VE25–VE34.

Lambaré, G. and Alérini, M. (2005). Semi-automatic picking PP-PS stereotomography: appli-

cation to the syntheticValhall dataset. In Extended Abstracts, pages 943–946.

Lambaré, G., Lucio, P. S., and Hanyga, A. (1996). Two-dimensional multivalued traveltime
and amplitude maps by uniform sampling of ray field. Geophys. J. Int., 125:584–598.

204
BIBLIOGRAPHY

Lambaré, G., Operto, S., Podvin, P., Thierry, P., and Noble, M. (2003). 3-D ray+Born migra-
tion/inversion - part 1: theory. Geophysics, 68:1348–1356.

Lambaré, G., Virieux, J., Madariaga, R., and Jin, S. (1992). Iterative asymptotic inversion in
the acoustic approximation. Geophysics, 57:1138–1154.

Langou, J., Luszczek, P., Kurzak, J., Buttari, A., , and Dongarra, J. (2006). LAPACK working
note 175: exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit
accuracy. Technical report, University of Tennessee. http://icl.cs.utk.edu/iter-ref/.

Le Rousseau, J. and de Hoop, M. (2001). Modelling and imaging with the scalar generalized-
screen algorithms in isotropic media. Geophysics, 66:1551–1568.

Lee, K. H. and Kim, H. J. (2003). Source-independent full-waveform inversion of seismic data.

Geophysics, 68:2010–2015.

Lehmann, I. (1936). P’. Publications du Bureau Central Séismologique international, A14:87–

115.

Levander, A. R. (1988). Fourth-order finite-difference P-SV seismograms. Geophysics,

53(11):1425–1436.

Lions, J. (1972). Nonhomogeneous boundary value problems and applications. Springer Verlag,
Berlin.

Liu, J. W. H. (1992). The multifrontal method for sparse matrix solution: theory and practice.
SIAM review, 34(1):82–109.

Liu, Q. and Tromp, J. (2006). Finite-frequency kernels based on adjoint methods. Bulletin of
the Seismological Society of America, 96(6):2383–2397.

Liu, Y.-B. and Wu, R.-S. (1994). A comparison between phase screen, finite difference, and
eigenfunction expansion calculations for scalar waves in inhomogeneous media. Bulletin of
the Seismological Society of America, 84:1154–1168.

Lucio, P. S., Lambaré, G., and Hanyga, A. (1996). 3D multivalued travel time and amplitude
maps. Pure Appl. Geophys., 148:113–136.

Luo, Y. and Schuster, G. T. (1990). Parsimonious staggered grid finite-differencing of the wave
equation. Geophysical Research Letters, 17(2):155–158.

Lysmer, J. and Drake, L. A. (1972). A finite element method for seismology. In Methods in
computational physics, volume 11. Academic Press, New Yort, USA.

Malinowsky, M. and Operto, S. (2008). Quantitative imaging of the Permo-Mesozoic complex

and its basement by frequency domain waveform tomography of wide-aperture seismic data
from the Polish basin. Geophysical Prospecting, 56:805–825.

Manakou, M., Raptakis, D., Apostolidis, P., Chavez-Garcia, F. J., and Pitilakis, K. (2007).
The 3D geological structure of the Mygdonian basin (Greece). In Proceedings of the 4th
International Conference on Earthquake Geotechnical Engineering, paper No.1686. ICEGE,
Thessaloniki, Greece.

205
BIBLIOGRAPHY

Marfurt, K. (1984). Accuracy of finite-difference and finite-elements modeling of the scalar and
elastic wave equation. Geophysics, 49:533–549.

Mariotti, C. (2007). Lamb’s problem with the lattice model Mka3D. Geophys. J. Int., 171:857–
864.

Martin, G. S., Wiley, R., and Marfurt, K. J. (2006). Marmousi2: An elastic upgrade for
marmousi. The Leading Edge, 25(2):156–166.

McMechan, G. A. and Fuis, G. S. (1987). Ray equation migration of wide-angle reflections

from southern alaska. Journal of Geophysical Research, 92(1):407–420.

Menke, W. (1984). Geophysical Data Analysis: Discrete Inverse Theory. Academic Press, Inc.,
Orlando, USA.

Mercerat, E. D., Vilotte, J. P., and Sanchez-Sesma, F. J. (2006). Triangular spectral element
simulation of two-dimensional elastic wave propagation using unstructured triangular grids.
Geophysical Journal International, 166:679–698.

Meza-Fajardo, K. and Papageorgiou, A. (2008). A nonconvolutional, split-field, perfectly

matched layer for wave propagation in isotropic and anisotropic elastic media: Stability
analysis. Bulletin of the Seismological Society of America, 98(4):1811–1836.

Miller, D., Oristaglio, M., and Beylkin, G. (1987). A new slant on seismic imaging: Migration
and integral geometry. Geophysics, 52(7):943–964.

Min, D. J. and Shin, C. (2006). Refraction tomography using a waveform-inversion back-

propagation technique. Geophysics, 71(3):R21–R30.

Min, D.-J., Shin, C., Pratt, R. G., and Yoo, H. S. (2003). Weighted-averaging finite-element
method for 2D elastic wave equations in the frequency domain. Bull. Seis. Soc. Am.,
93(2):904–921.

Moczo, P., Ampuero, J. P., Kristek, J., Galis, M., Day, S. M., and Igel, H. (2005). The
European Network SPICE Code Validation. In EOS Trans. AGU, abstract S13A-0180.
American Geophysical Union, San Francisco, USA.

Moczo, P., Kristek, J., Galis, M., Pazak, P., and Balazovjech, M. (2007). The finite-difference
and finite-element modeling of seismic wave propagation and earthquake motion. Acta Phys-
ica Slovaca, 52(2):177–406.

Moczo, P., Kristek, J., Vavrycuk, V., Archuleta, R., and Halada, L. (2002). 3D heteroge-
neous staggered-grid finite-difference modeling of seismic motion with volume harmonic and
arithmetic averaging of elastic moduli and densities. Bulletin of the Seismological Society of
America, 92:3042–3066.

Montelli, R., Nolet, G., Dahlen, F. A., Masters, G., Engdahl, E. R., and Hung, S. H. (2004).
Finite-frequency tomography reveals a variety of plumes in the mantle. Science, 303:338–343.

Mora, P. R. (1987). Nonlinear two-dimensional elastic inversion of multi-offset seismic data.

Geophysics, 52:1211–1228.

206
BIBLIOGRAPHY

Mora, P. R. (1988). Elastic wavefield inversion of reflection and transmission data. Geophysics,
53:750–759.

Mosegaard, K. and Tarantola, A. (1995). Monte carlo sampling of solutions to inverse problems.
Journal of Geophysical Research, 100 B7:12431–12447.

Mufti, I. R. (1985). Seismic modeling in the implicit mode. Geophy. Prosp., 33:619–656.

Mulder, W. and Plessix, R.-E. (2008). Exploring some issues in acoustic full waveform inversion.
Geophysical Prospecting, 56(6):827–841.

MUMPS-team (2009). MUMPS - MUltifrontal Massively Parallel Solver users’ guide - version
4.9.2 (November 2009). ENSEEIHT-ENS Lyon, http://www.enseeiht.fr/apo/MUMPS/ or
http://graal.ens-lyon.fr/MUMPS.

Munns, J. W. (1985). The Valhall field: a geological overview. Marine and Petroleum Geology,
2:23–43.

Nihei, K. T. and Li, X. (2007). Frequency response modelling of seismic waves using finite
difference time domain with phase sensitive detection (TD-PSD). Geophysical Journal In-
ternational, 169:1069–1078.

Nocedal, J. (1980). Updating Quasi-Newton Matrices With Limited Storage. Mathematics of

Computation, 35(151):773–782.

Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. New York, US : Springer.

Nolet, G. (1987). Seismic tomography with applications in global seismology and exploration
geophysics. D. Reidel publishing Company.

Oldham, R. (1906). The constitution of the earth. Quarterly Journal of the Geological Society
of London, 62:456–475.

Operto, S., Lambaré, G., Podvin, P., and Thierry, P. (2003). 3-D ray-Born migration/inversion.
part 2: application to the SEG/EAGE overthrust experiment. Geophysics, 68(4):1357–1370.

Operto, S., Virieux, J., Amestoy, P., L’Éxcellent, J.-Y., Giraud, L., and Ben Hadj Ali, H.
(2007). 3D finite-difference frequency-domain modeling of visco-acoustic wave propagation
using a massively parallel direct solver: A feasibility study. Geophysics, 72(5):SM195–SM211.

Operto, S., Virieux, J., Dessa, J. X., and Pascal, G. (2006). Crustal imaging from
multifold ocean bottom seismometers data by frequency-domain full-waveform tomog-
raphy: application to the eastern Nankai trough. Journal of Geophysical Research,
111(B09306):doi:10.1029/2005JB003835.

Operto, S., Virieux, J., Ribodetti, A., and Anderson, J. E. (2009). Finite-difference frequency-
domain modeling of visco-acoustic wave propagation in two-dimensional TTI media. Geo-
physics, 74 (5):T75–T95.

Operto, S., Xu, S., and Lambaré, G. (2000a). Can we image quantitatively complex models
with rays? Geophysics, 65(4):1223–1238.

207
BIBLIOGRAPHY

Operto, S., Xu, S., and Lambaré, G. (2000b). Can we image quantitatively complex models
with rays? Geophysics, 65(4):1223–1238.

Pai, D. (1985). A new solution method for the wave equation in inhomogeneous media. Geo-
physics, 50:1541–1547.

Pai, D. (1988). Generalized f-k (frequency-wavenumber) migration in arbitrarily varying media.

Geophysics, 53:1547–1555.

Paige, C. C. and Saunders, M. A. (1982a). ALGORITHM 583 LSQR : Sparse linear equations
and least squares problems. ACM Trans. math. Soft., 8(2):195–209.

Paige, C. C. and Saunders, M. A. (1982b). LSQR: an algorithm for sparse linear equations and
sparse least squares. ACM Transactions on Mathematical software, 8(1):43–71.

Pasquetti, R. and Rapetti, F. (2006). Spectral element methods on unstructured meshes:

Comparisons and recent advances. Journal of Scientific Computing, 27:377–387.

Pica, A., Diet, J. P., and Tarantola, A. (1990). Nonlinear inversion of seismic reflection data
in laterally invariant medium. Geophysics, 55(3):284–292.

Pitarka, A. (1999). 3D elastic finite-difference modeling of seismic motion using staggered grids
with nonuniform spacing. Bull. Seism. Soc. Am., 89(1):54–68.

Plessix, R.-E. (2006). A review of the adjoint-state method for computing the gradient of a
functional with geophysical applications. Geophysical Journal International, 167(2):495–503.

Plessix, R. E. (2007). A Helmholtz iterative solver for 3D seismic-imaging problems. Geophysics,

72(5):SM185–SM194.

Plessix, R. E. (2009). Three-dimensional frequency-domain full-waveform inversion with an

iterative solver. Geophysics, 74(6):WCC53–WCC61.

Polak, E. and Ribière, G. (1969). Note sur la convergence de méthodes de directions conjuguées.
Revue Française d’Informatique et de Recherche Opérationnelle, 16:35–43.

Popov, M. M. (1982). A new method of computation of wave fields using gaussian beams.
Wave Motion, 4:85–95.

Pratt, R. G. (1990). Inverse theory applied to multi-source cross-hole tomography. part II :

elastic wave-equation method. Geophysical Prospecting, 38:311–330.

Pratt, R. G. (1999). Seismic waveform inversion in the frequency domain, part I : theory and
verification in a physic scale model. Geophysics, 64:888–901.

Pratt, R. G. and Goulty, N. R. (1991). Combining wave-equation imaging with traveltime

tomography to form high-resolution images from crosshole data. Geophysics, 56(2):204–224.

Pratt, R. G., Plessix, R. E., and Mulder, W. A. (2001). Seismic waveform tomography: the
effect of layering and anisotropy. In Expanded Abstracts, page P092.

Pratt, R. G. and Sams, M. S. (1996). Reconciliation of crosshole seismic velocities with well
information in a layered sedimentary environment. Geophysics, 61:549–560.

208
BIBLIOGRAPHY

Pratt, R. G., Shin, C., and Hicks, G. J. (1998). Gauss-Newton and full Newton methods in
frequency-space seismic waveform inversion. Geophysical Journal International, 133:341–362.

Pratt, R. G. and Shipp, R. M. (1999). Seismic waveform inversion in the frequency domain,
part II: Fault delineation in sediments using crosshole data. Geophysics, 64:902–914.

Pratt, R. G., Sirgue, L., Hornby, B., and Wolfe, J. (2008). Cross-well waveform tomography
in fine-layered sediments - meeting the challenges of anisotropy. In 70th Annual EAGE
Conference & Exhibition, Roma, page F020.

Pratt, R. G., Song, Z. M., Williamson, P. R., and Warner, M. (1996). Two-dimensional
velocity model from wide-angle seismic data by wavefield inversion. Geophysical Journal
International, 124:323–340.

Pratt, R. G. and Symes, W. (2002). Semblance and differential semblance optimisation for
waveform tomography: a frequency domain implementation. In Journal of Conference Ab-
stracts, volume 7(2), pages 183–184. Cambridge publications.

Pratt, R. G. and Worthington, M. H. (1990). Inverse theory applied to multi-source cross-hole

tomography. Part I: acoustic wave-equation method. Geophysical Prospecting, 38:287–310.

Press, W. H., Flannery, B. P., Teukolsky, S. A., and Veltering, W. T. (1986). Numerical recipes
: the art of scientific computing. Cambridge University Press.

Prieux, V., Operto, S., Brossier, R., and Virieux, J. (2009). Application of acoustic full wave-
form inversion to the synthetic Valhall model. SEG Technical Program Expanded Abstracts,
28(1):2268–2272.

Pyun, S., Shin, C., and Bednar, J. B. (2007). Comparison of waveform inversion, part 3:
amplitude approach. Geophysical Prospecting, 55(4):477–485.

Pyun, S., Shin, C., Lee, H., and Yang, D. (2008). 3D elastic full waveform inversion in the
laplace domain. SEG Technical Program Expanded Abstracts, 27(1):1976–1980.

Pyun, S., Shin, C., and Son, W. (2009). Frequency-domain waveform inversion using an L1-
norm objective function. In Expanded Abstracts, page P005. EAGE.

Raptakis, D. G., Manakou, M. V., Chavez-Garcia, F. J., Makra, K. A., and Pitilaks, K. D.
(2005). 3D configuration of Mygdonian basin and preliminary estimate of its site response.
Soil Dynamics and Earthquake Engineering, 25:871–887.

Ravaut, C., Operto, S., Improta, L., Virieux, J., Herrero, A., and dell’Aversana, P. (2004).
Multi-scale imaging of complex structures from multi-fold wide-aperture seismic data by
frequency-domain full-wavefield inversions: application to a thrust belt. Geophysical Journal
International, 159:1032–1056.

Reed, W. and Hill, T. (1973). Triangular mesh methods for the neuron transport equation.
Technical Report LA-UR-73-479, Los Alamos Scientific Laboratory.

Remaki, M. (2000). A new finite volume scheme for solving Maxwell’s system. COMPEL,
19(3):913–931.

209
BIBLIOGRAPHY

Ribodetti, A. and Virieux, J. (1996). Asymptotic theory for imaging the attenuation factors
Qp and Qs . In Inverse Problems of Wave Propagation and Diffraction, Proceedings, Aix-les-
Bains, France 1996, pages 334–353. Springer-Verlag.

Ristow, D. and Ruhl, T. (1994). Fourier finite difference migration. Geophysics, 59:1882–1893.

Riyanti, C. D., Erlangga, Y. A., Plessix, R. E., Mulder, W. A., Vuik, C., and Oosterlee, C.
(2006). A new iterative solver for the time-harmonic wave equation. Geophysics, 71(E):57–63.

Riyanti, C. D., Kononov, A., Erlangga, Y. A., Vuik, C., Oosterlee, C., Plessix, R. E., and
Mulder, W. A. (2007). A parallel multigrid-based preconditioner for the 3D heterogeneous
high-frequency Helmholtz equation. Journal of Computational physics, 224:431–448.

Robertson, J. O. A., Holliger, K., Green, A. G., Pugin, A., and Iaco, R. D. (1996). Effects
of near-surface waveguides on shallow high-resolution seismic refraction and reflection data.
Geophysical Research Letters, 23(5):495–498.

Robertsson, J. O., Bednar, B., Blanch, J., Kostov, C., and van Manen, D. J. (2007). Intro-
duction to the supplement on seismic modeling with applications to acquisition, processing
and interpretation. Geophysics, Seismic modeling supplement to the september/october issue,
72(5):SM1–SM4.

Roden, J. A. and Gedney, S. D. (2000). Convolution PML (CPML): An efficient FDTD

implementation of the CFS-PML for arbitrary media. Microwave and Optical Technology
Letters, 27(5):334–339.

Saad, Y. (2003). Iterative methods for sparse linear systems. SIAM, Philadelphia.

Saenger, E. H., Gold, N., and Shapiro, S. A. (2000). Modeling the propagation of elastic waves
using a modified finite-difference grid. Wave motion, 31:77–92.

Sambridge, M. and Drijkoningen, G. (1992). Genetic algorithms in seismic waveform inversion.

Geophysical Journal International, 109:323–342.

Sambridge, M. and Mosegaard, K. (2002). Monte carlo methods in geophysical inverse prob-
lems. Reviews of Geophysics, 40(3):1–29.

Sambridge, M. S., Tarantola, A., and Kennett, B. L. (1991). An alternative strategy for non-
linear inversion of seismic waveforms. Geophysical Prospecting, 39:723–736.

Scales, J. A., Docherty, P., and Gersztenkorn, A. (1990). Regularization of nonlinear inverse
problems: imaging the near-surface weathering layer. Inverse Problems, 6:115–131.

Scales, J. A. and Smith, M. L. (1994). Introductory geophysical inverse theory. Samizdat press.

Sears, T., Singh, S., and Barton, P. (2008). Elastic full waveform inversion of multi-component
OBC seismic data. Geophysical Prospecting, 56(6):843–862.

Sen, M. K. and Stoffa, P. L. (1995). Global Optimization Methods in Geophysical Inversion.

Elsevier Science Publishing Co.

Seriani, G. and Priolo, E. (1994). Spectral element method for acoustic wave simulation in
heterogeneous media. Finite elements in analysis and design, 16:337–348.

210
BIBLIOGRAPHY

Sheng, J., Leeds, A., Buddensiek, M., and Schuster, G. T. (2006). Early arrival waveform
tomography on near-surface refraction data. Geophysics, 71(4):U47–U57.

Sheriff, R. E. and Geldart, L. P. (1995). Exploration seismology. Cambridge University Press.

Shewchuk, J. R. (1998). Tetrahedral Mesh Generation by Delaunay Refinement. In Proceedings

of the 14th Annual Symposium on Computational Geometry, pages 86–95. SCG, Minneapolis.

Shi, Y., Zhao, W., and Cao, H. (2007). Nonlinear process control of wave-equation inversion
and its application in the detection of gas. Geophysics, 72(1):R9–R18.

Shin, C. and Cha, Y. H. (2008). Waveform inversion in the Laplace domain. Geophysical
Journal International, 173(3):922–931.

Shin, C. and Ha, W. (2008). A comparison between the behavior of objective functions for
waveform inversion in the frequency and laplace domains. Geophysics, 73(5):VE119–VE133.

Shin, C. and Ha, Y. H. (2009). Waveform inversion in the Laplace-Fourier domain. Geophysical
Journal International, 177:1067–1079.

Shin, C., Jang, S., and Min, D. J. (2001). Improved amplitude preservation for prestack depth
migration by inverse scattering theory. Geophysical Prospecting, 49:592–606.

Shin, C. and Min, D.-J. (2006). Waveform inversion using a logarithmic wavefield. Geophysics,
71(3):R31–R42.

Shin, C., Min, D.-J., Marfurt, K. J., Lim, H. Y., Yang, D., Cha, Y., Ko, S., Yoon, K., Ha, T.,
and Hong, S. (2002). Traveltime and amplitude calculations using the damped wave solution.
Geophysics, 67:1637–1647.

Shin, C., Pyun, S., and Bednar, J. B. (2007). Comparison of waveform inversion, part 1:
conventional wavefield vs logarithmic wavefield. Geophysical Prospecting, 55(4):449–464.

Shipp, R. M. and Singh, S. C. (2002). Two-dimensional full wavefield inversion of wide-aperture

marine seismic streamer data. Geophysical Journal International, 151:325–344.

Si, H. and Gärtner, K. (2005). Meshing Piecewise Linear Complexes by Constrained Delaunay
Tetrahedralizations. In Proceedings of the 14th International Meshing Roundtable, San Diego,
pages 147–163. IMR, San Diego.

Sirgue, L. (2003). Inversion de la forme d’onde dans le domaine fréquentiel de données sis-
miques grand offset. PhD thesis, Université Paris 11, France - Queen’s University, Canada.

Sirgue, L. (2006). The importance of low frequency and large offset in waveform inversion. In
68th Annual EAGE Conference & Exhibition, London, page A037.

Sirgue, L., Barkved, O. I., Dellinger, J., Etgen, J., Albertin, U., and Kommedal, J. H. (2010).
Full waveform inversion: the next leap forward in imaging at Valhall. First Break, 28:65–70.

Sirgue, L., Barkved, O. I., Gestel, J. P. V., Askim, O. J., and Kommedal, J. H. . (2009). 3D
waveform inversion on Valhall wide-azimuth OBC. In 71th Annual International Meeting,
EAGE, Expanded Abstracts, page U038.

211
BIBLIOGRAPHY

Sirgue, L., Etgen, J. T., and Albertin, U. (2008). 3D Frequency Domain Waveform Inversion
using Time Domain Finite Difference Methods. In Proceedings 70th EAGE, Conference and
Exhibition, Roma, Italy, page F022.
Sirgue, L. and Pratt, R. G. (2001). Frequency-domain waveform inversion: a strategy for
choosing frequencies. In Abstracts Book, pages 631–634. Eur. Geophys. Soc.
Sirgue, L. and Pratt, R. G. (2004). Efficient waveform inversion and imaging : a strategy for
selecting temporal frequencies. Geophysics, 69(1):231–248.
Snieder, R., Xie, M., Pica, A., and Tarantola, A. (1989). Retrieving both the impedance
contrast and background velocity : a global strategy for the seismic reflection problem.
Geophysics, 54:991–1000.
Sourbier, F., Haidar, A., Giraud, L., Operto, S., and Virieux, J. (2008). Frequency-domain
full-waveform modeling using a hybrid direct-iterative solver based on a parallel domain
decomposition method: A tool for 3D full-waveform inversion? SEG Technical Program
Expanded Abstracts, 27(1):2147–2151.
Sourbier, F., Operto, S., Virieux, J., Amestoy, P., and L’Excellent, J.-Y. (2009a). Fwt2d: A
massively parallel program for frequency-domain full-waveform tomography of wide-aperture
seismic data–part 1: Algorithm. Computers & Geosciences, 35(3):487 – 495.
Sourbier, F., Operto, S., Virieux, J., Amestoy, P., and L’Excellent, J.-Y. (2009b). Fwt2d: A
massively parallel program for frequency-domain full-waveform tomography of wide-aperture
seismic data–part 2: Numerical examples and scalability analysis. Computers & Geosciences,
35(3):496 – 514.
Spudich, P. and Orcutt, J. (1980). Petrology and porosity of an oceanic crustal site: re-
sults from wave form modeling of seismic refraction data. Journal of Geophysical Research,
85(B3):1409–1434.
Stekl, I. and Pratt, R. G. (1998). Accurate viscoelastic modeling by frequency-domain finite
difference using rotated operators. Geophysics, 63:1779–1794.
Stolt, R. H. (1978). Migration by fourier transform. Geophysics, 43:23–48.
Symes, W. W. (2007). Reverse time migration with optimal checkpointing. Geophysics,
72(5):SM213–SM221.
Taillandier, C., Noble, M., Chauris, H., and Calandra, H. (2009). First-arrival travel time
tomography based on the adjoint state method. Geophysics, in-press.
Takeuchi, N. and Geller, R. J. (2000). Optimally accurate second-order time-domain finite-
difference scheme for computing synthetic seismograms in 2-D and 3-D media. Physics of
the Earth and Planetary Interiors, 119:99–131.
Tal-Ezer, H., Carcione, J., and Kosloff, D. (1990). An accurate and efficient scheme for wave
propagation in linear viscoelastic media. Geophysics, 55:1366–1379.
Tape, C., Liu, Q., Maggi, A., and Tromp, J. (2009). Seismic tomography of the southern califor-
nia crust based on spectral-element and adjoint methods. Geophysical Journal International,
180:433–462.

212
BIBLIOGRAPHY

Tarantola, A. (1984). Inversion of seismic reflection data in the acoustic approximation. Geo-
physics, 49(8):1259–1266.

Tarantola, A. (1986). A strategy for non linear inversion of seismic reflection data. Geophysics,
51(10):1893–1903.

Tarantola, A. (1987). Inverse problem theory: methods for data fitting and model parameter
estimation. Elsevier, New York.

Thierry, P., Lambaré, G., Podvin, P., and Noble, M. (1999a). 3-D preserved amplitude prestack
depth migration on a workstation. Geophysics, 64(1):222–229.

Thierry, P., Operto, S., and Lambaré, G. (1999b). Fast 2D ray-Born inversion/migration in
complex media. Geophysics, 64(1):162–181.

Thomsen, L. A. (1986). Weak elastic anisotropy. Geophysics, 51:1954–1966.

Titarev, V. and Toro, E. (2002). Ader: arbitrary high order godunov approach. SIAM Journal
Scientific Computing, 17:609–618.

Toksöz, M. N. and Johnston, D. H. (1981). Geophysics reprint series, No. 2: Seismic wave
attenuation. Society of exploration geophysicists, Tulsa, OK.

Toomey, A. and Beans, C. (2000). Numerical simulation of seismic waves using a discrete
particle scheme. Geophysical Journal International, 141:889–901.

Toulopoulos, I. and Ekaterinaris, J. A. (2006). High-order discontinuous Galerkin discretiza-

tions for computational aeroacoustics in complex domains. AIAA J., 44:502–511.

Tromp, J., Tape, C., and Liu, Q. (2005). Seismic tomography, adjoint methods, time reversal
and banana-doughnut kernels. Geophysical Journal International, 160:195–216.

van den Berg, P. M. and Abubakar, A. (2001). Contrast source inversion method: state of the
art. Progress in Electromagnetics research, 34:189–218.

Vigh, D. and Starr, E. W. (2008). Comparisons for Waveform Inversion, Time domain or
Frequency domain? In Extended Abstracts, pages 1890–1894.

Vigh, D., Starr, E. W., and Kapoor, J. (2009). Developing Earth model with full waveform
inversion. The Leading Edge, 28(4):432–435.

Virieux, J. (1984). SH wave propagation in heterogeneous media, velocity-stress finite difference

method. Geophysics, 49:1259–1266.

Virieux, J. (1986). P-SV wave propagation in heterogeneous media, velocity-stress finite dif-
ference method. Geophysics, 51:889–901.

Virieux, J. and Lambaré, G. (2007). Theory and observations - body waves: ray methods and
finite frequency effects. In Romanovitz, B. and Diewonski, A., editors, Treatise of Geophysics,
volume 1: Seismology and structure of the Earth, Treatise of Geophysics. Elsevier.

Virieux, J. and Operto, S. (2009). An overview of full waveform inversion in exploration

geophysics. Geophysics, 74(6)(6):WCC127–WCC152.

213
BIBLIOGRAPHY

Virieux, J., Operto, S., Ben Hadj Ali, H., Brossier, R., Etienne, V., Sourbier, F., Giraud,
L., and Haidar, A. (2009). Seismic wave modeling for seismic imaging. The Leading Edge,
28(5):538–544.

Vogel, C. (2002). Computational methods for inverse problems. Society of Industrial and
Applied Mathematics, Philadelphia.

Vogel, C. R. and Oman, M. E. (1996). Iterative methods for total variation denoising. Society
for Industrial and Applied Mathematics Journal on Scientific Computing, 17(1):227–238.

Warner, M., Stekl, I., and Umpleby, A. (2008). 3D wavefield tomography: synthetic and field
data examples. SEG Technical Program Expanded Abstracts, 27(1):3330–3334.

White, R. S. and Stephen, R. A. (1980). Compressional to shear wave conversion in oceanic

crust. Geophys. J. R. astr. Soc., 63:547–565.

Whitmore, N. (1983). Iterative depth migration by backward time propagation. In 53th Annual
Meeting. SEG.

Williamson, P. (1991). A guide to the limits of resolution imposed by scattering in ray tomog-
raphy. Geophysics, 56:202–207.

Williamson, P. and Pratt, G. (1995). A critical review of 2.5D acoustic wave modeling proce-
dures. Geophysics, 60:591–595.

Woodhouse, J. and Dziewonski, A. (1984). Mapping the upper mantle: Three dimensional
modelling of earth structure by inversion of seismic waveforms. Journal of Geophysical
Research, 89:5953–5986.

Woodward, M. J., Nichols, D., Zdraveva, O., Whitfield, P., and Johns, T. (2008). A decade of
tomography. Geophysics, 73(5):VE5–VE11.

Woodwards, M. J. (1992). Wave-equation tomography. Geophysics, 57:15–26.

Wu, R.-S. (1994). Wide-angle elastic one-way propagation in heterogeneous media and an
elastic wave complex-screen method. Journal of Geophysical Research, 99:751–766.

Wu, R. S. (2003). Wave propagation, scattering and imaging using dual-domain one-way and
one-return propagation. Pure Applied Geophysics, 160:509–539.

Wu, R. S. and Aki, K. (1985). Scattering characteristics of elastic waves by an elastic hetero-
geneity. Geophysics, 50(4):582–595.

Wu, R.-S. and Toksöz, M. N. (1987). Diffraction tomography and multisource holography
applied to seismic imaging. Geophysics, 52:11–25.

Xu, S., Chauris, H., Lambaré, G., and Noble, M. (2001). Common angle image gather : a
strategy for imaging complex media. Geophysics, 66(6):1877–1894.

Yee, K. S. (1966). Numerical solution of initial boundary value problems involving Maxwell’s
equations in isotropic media. IEEE Trans. Antennas and Propagation, 14:302–307.

214
BIBLIOGRAPHY

Yilmaz, O. (2001). Seismic data analysis, volume 1. Society of Exploration Geophysicists:

processing, inversion and interpretation of seismic data.

Zelt, C. and Barton, P. J. (1998). Three-dimensional seismic refraction tomography: a compar-

ison of two methods applied to data from the faeroe basin. Journal of Geophysical Research,
103(B4):7187–7210.

Zhou, B. and Greenhalgh, S. A. (2003). Crosshole seismic inversion with normalized full-
waveform amplitude data. Geophysics, 68:1320–1330.

Zienkiewicz, O. and Taylor, R. (1967). The theory of finite element methods. Springer.

Zienkiewicz, O. C., Taylor, R. L., and Zhu, J. Z. (2005). The Finite Element Method: Its Basis
and Fundamentals. Elsevier, London. 6th edition.

215
Appendix A

Lagrangian basis functions

For the definition of the Lagrangian basis functions, the barycentric or tetrahedral coordinates
(ζ1 , ζ2 , ζ3 , ζ4 ) that are linked to the cartesian coordinates (x, y, z) are defined inside an element
as follows
    
1 1 1 1 1 ζ1
x x1 x2 x3 x4  ζ2 
 =
y   y1 y2 y3 y4  ζ3  ,
 

z z 1 z2 z3 z4 ζ4

where (xj , yj , zj ) are the coordinates of the j-th node of the element. Then, the Lagrangian basis
functions can be defined with a linear combination of the tetrahedral coordinates depending
on the approximation order. Following the node numbering convention given in Figure 1.13,
these functions are given by
for the P0 interpolation:

ϕ1 = 1,

for the P1 interpolation:

ϕ1 = ζ1 ϕ2 = ζ2 ϕ3 = ζ3 ϕ4 = ζ4 ,

and for the P2 interpolation:

ϕ1 = (2ζ1 − 1)ζ1 ϕ2 = (2ζ2 − 1)ζ2 ϕ3 = (2ζ3 − 1)ζ3 ϕ4 = (2ζ4 − 1)ζ4

ϕ5 = 4ζ1 ζ2 ϕ6 = 4ζ1 ζ3 ϕ7 = 4ζ1 ζ4 ϕ8 = 4ζ3 ζ2 ϕ9 = 4ζ3 ζ4 ϕ10 = 4ζ2 ζ4 .

Appendix B

Matrices used in the DG-FEM

formulation

Mθ and Nθ are constant real matrices defined by

   T
1 1 0 0 0 0 1 2 −1 0 0 0
Mx =  0 0 0 1 0 0 Nx =  0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 1 0
   T
0 0 0 1 0 0 0 0 0 1 0 0
My =  1 0 1 0 0 0  Ny = 1 −1 2
 0 0 0
0 0 0 0 0 1 0 0 0 0 0 1
   T
0 0 0 0 1 0 0 0 0 0 1 0
Mz =  0 0 0 0 0 1  Nz = 0 0
 0 0 0 1 .
1 −1 −1 0 0 0 1 −1 −1 0 0 0
For Pk , with k ≤ 2, the volume integral in equations (1.45) and (1.46) can be computed with
the 11 Gauss points integration rule for tetrahedra (Keast, 1986) and the surface integral in
equations (1.47) and (1.48) can be computed with the six Gauss points integration rule for
triangles (Dunavant, 1985). Below, we give the expression of the matrices relevant for P1
elements following the node numbering convention given in Figure 1.13b.
 
2 1 1 1
voli 1 2 1 1 ,

Ki = (B.1)
20 1 1 2 1
1 1 1 2
with voli as the volume of element i.
 
Si1 ni1θ Si1 ni1θ Si1 ni1θ Si1 ni1θ
1 Si2 ni2θ Si2 ni2θ Si2 ni2θ Si2 ni2θ 
Eiθ =  ∀θ ∈ {x, y, z}, (B.2)
12 Si3 ni3θ Si3 ni3θ Si3 ni3θ Si3 ni3θ 
Si4 ni4θ Si4 ni4θ Si4 ni4θ Si4 ni4θ
with Sik the surface of the face opposite to the k-th node of element i and ~nik = (nikx , niky , nikz )T
as the outward pointing unit normal vector with respect to the surface Sik . For the compu-
tation of the flux matrices, we adopt a specific node numbering scheme. First, the neighbour
MATRICES USED IN THE DG-FEM FORMULATION

element k is given by the node number of element i which is not shared between elements i
and k. For instance, in Figure 1.13.b, the neighbour element k = 1 is the element sharing the
face (234) of element i. Second, the neighbour element nodes share the same node numbers of
element i on the common face. Therefore, the oppposite nodes of element i and k have also
the same number. With this node numbering scheme, Fik and Gik are identical when both
elements are P1 . We use this property to perform an efficient computation of the flux. In that
case, we get
   
0 0 0 0 2 0 1 1
Si1 
0 2 1 1
 Si2 
0 0 0 0

Fi1 = Fi2 =
12 0 1 2 1 12 1 0 2 1
0 1 1 2 1 0 1 2

   
2 1 0 1 2 1 1 0
Si3 
1 2 0 1 Si4 
1 2 1 0
Fi3 =  Fi4 = . (B.3)
12 0 0 0 0 12 1 1 2 0
1 1 0 2 0 0 0 0

220
Appendix C

Full Waveform Tomography

Pre-Processing

As mentioned in the text in section 4.4, the main objective of the data preprocessing is to
apply to the data some modifications which reflect the approximation made in the modelling
and inversion algorithms.
One of the main approximation we made is that we used the scalar acoustic wave equation
to model wave propagation. Density and the attenuation factor Q were assumed to be constant
and equal to 1 and 1000 respectively for this case study. Converted PS-waves are not modelled
by the acoustic wave equation and hence must be processed as noise. As a consequence, a
major objective of the preprocessing was to transform the data such that the processed data
mimic as well as possible the one which would have been recorded in a perfect acoustic earth.
At present, we didn’t use a visco-elastic full-waveform modelling/inversion scheme for this case
study but a simpler acoustic one for several reasons. First, multi-parameter full waveform
tomography is very sensitive to errors in the starting model. Hence, the relevance of the multi-
parameter estimation is very difficult to assess. Second, the PS-wave conversion phenomena is a
rather ephemeral and hence complex phenomena in the case of complex geological environment
involving rugged interfaces (Spudich and Orcutt, 1980; White and Stephen, 1980). It was
noted that rugged interfaces may prevent the downward propagating converted shear waves
from being phase coherent although the velocity properties at the interface may be appropriate
for considerable mode conversion (White and Stephen, 1980). This comment renders the non
linearity of the elastic full waveform inverse problem in the case of complex structure since it is
unlikely that the waveform inversion will be sufficiently robust to distinguish between the effect
of interface roughness and velocity contrasts when fitting uncoherent converted PS-waves.
Pre-processing must also mitigate several amplitude effects resulting from the instrumenta-
tion (variability of the source size from one shot to the next) and the extremely variable surface
geology which affects the receiver-ground coupling. Accounting for the surface geology is clearly
beyond the resolution limit of full waveform tomography. The sharp amplitude versus offset
variations in the CRG of Figure 4.28a suggests that the amplitude versus offset information
may be very difficult to account for in the full waveform tomography since it may be affect by
several factors extraneous to wave propagation phenomena (source variability, receiver-ground
coupling).
FULL WAVEFORM TOMOGRAPHY PRE-PROCESSING

To overcome these difficulties and account for the acoustic approximation, we designed a
heuristic pre-processing sequence which performs the following tasks (Figure 4.32):

• minimum-phase frequency-domain deconvolution (whitening) (Yilmaz, 2001), p. 253,

including a normalization of the spectral amplitude of each trace with respect to its
maximum.

• Butterworth band-pass filtering to improve the signal-to-noise ratio.

• Quality control and trace editing to remove noisy traces.

• Application of a reduced time scale to facilitate subsequent application of coherency

filtering.

• Coherency filtering using spectral matrix filtering (Glangeaud and Coppens, 1997) to
improve the signal-to-noise ratio and to strengthen the lateral trace coherency.

• Offset and time windowing. Traces within the 0-0.8km offset range were removed to
eliminate ground rolls. Time windowing was applied to eliminate late arrivals which
corresponds to deep reflections coming from outside the limits of our model and converted
PS-waves.

Whitening which flattens the amplitude spectrum was applied to give a similar weight to
each frequency component of the amplitude spectrum and to mitigate the effect of the source
directivity. The spectrum flattening was subsequently mitigated by applying a Butterworth
bandpass filtering to improve the signal-to-noise ratio. The spectral amplitude normalization
implies that the amplitude-versus-offset information is lost and that an equivalent weight is
given to each trace in the full waveform tomography algorithm. Only the amplitude-versus-time
information is fully preserved by the pre-processing. We claim that this information may be
sufficient to develop reliable mono-parameter P-wave velocity model by summation of each indi-
vidual trace contribution. But we acknowledge that this data weighting would not be acceptable
in the frame of a multi-parameter elastic inversion which may require amplitude-versus-offset
analysis to uncouple the different parameters. The spectral amplitude normalization was ap-
plied to remove many amplitude effects such as shot-to-shot variability and receiver-ground
coupling.
The acoustic wave equation provides the pressure wavefield. The data measured by vertical
geophone are vertical particle velocity. Rigorously, a conversion from pressure to vertical par-
ticle velocity would be required in the FD forward-modelling code. Moreover, the Jacobian of
the partial derivative of pressure wavefields with respect to model parameters must be replaced
by that of the partial derivatives of vertical particle velocity with respect to model parameters
in the full-waveform tomography algorithm. This latter task is not trivial at all.
Alternatively, one can apply some heuristic weighting of the observed vertical particle veloc-
ity data such that the weighted vertical particle velocity wavefields reflect pressure wavefields.
The waveform shape of the pressure and vertical velocity field are essentially related by a
derivative relation (Sheriff and Geldart, 1995),pp. 225. This relation can be accounted for by
the source term whose estimation is embedded in the waveform tomography algorithm. More-
over, the amplitude-versus-angle behavior of the pressure and vertical velocity fields are rather
different since the amplitude of the vertical velocity is sensitive to the incidence angle of the

222
arrival at the sensor. We used the spectral amplitude normalization applied by the whitening
to weight the amplitude-versus-angle behavior of vertical particle velocity seismograms such
that it reflects that of pressure seismograms. Although this approximation may appear crude,
this is the approach which, up to now, allowed us to develop waveform tomography images
which revealed geological features whose relevance was demonstrated.

223

Material Point Method For Geotechnical Engineering A Practical Guide
No ratings yet
Material Point Method For Geotechnical Engineering A Practical Guide
443 pages
Easy Drive GT20manualV1.1
No ratings yet
Easy Drive GT20manualV1.1
102 pages
(Investigations in Geophysics - 20) Gerard T. Schuster - Seismic Inversion (2017, Society of Exploration Geophysicists) - Libgen - Li
No ratings yet
(Investigations in Geophysics - 20) Gerard T. Schuster - Seismic Inversion (2017, Society of Exploration Geophysicists) - Libgen - Li
377 pages
Poulson Dissertation
No ratings yet
Poulson Dissertation
216 pages
Webb2013 - PHD Thesis University of Colorado Boulder
No ratings yet
Webb2013 - PHD Thesis University of Colorado Boulder
251 pages
Robinson - Moser - TJ - (2020) - Basic Wave Analysis
No ratings yet
Robinson - Moser - TJ - (2020) - Basic Wave Analysis
407 pages
Tiago Martins 89718 Tese de Mestrado MEAer
No ratings yet
Tiago Martins 89718 Tese de Mestrado MEAer
106 pages
FreeFEM Documentation
No ratings yet
FreeFEM Documentation
716 pages
Induction Study 150110 PDF
No ratings yet
Induction Study 150110 PDF
173 pages
PHD Alsayyari
No ratings yet
PHD Alsayyari
177 pages
The Use of Surface Integral Methods in Computational Jet Aeroacoustics
No ratings yet
The Use of Surface Integral Methods in Computational Jet Aeroacoustics
87 pages
MasterarbeitAmelieNuesse Final
No ratings yet
MasterarbeitAmelieNuesse Final
100 pages
Seismic Inversion Book PDF
0% (1)
Seismic Inversion Book PDF
221 pages
Farcas 2015 Master Thesis
No ratings yet
Farcas 2015 Master Thesis
93 pages
Thesis
No ratings yet
Thesis
89 pages
Finite Difference Methods For Wave Motion
No ratings yet
Finite Difference Methods For Wave Motion
51 pages
Witherden 2015 - On The Development and Implementation of High-Order Flux Reconstruction Schemes For Computational Fluid Dynamics
No ratings yet
Witherden 2015 - On The Development and Implementation of High-Order Flux Reconstruction Schemes For Computational Fluid Dynamics
131 pages
The Material Point Method in Geotechnical
0% (1)
The Material Point Method in Geotechnical
53 pages
Bachelor Thesis Sample - Partial Lumping
No ratings yet
Bachelor Thesis Sample - Partial Lumping
46 pages
Thesis
No ratings yet
Thesis
133 pages
Dissertation
100% (1)
Dissertation
285 pages
Dipl.-Phys. Hauke Gravenkamp. Numerical Methods For The Simulation of Ultrasonic Guided Waves
No ratings yet
Dipl.-Phys. Hauke Gravenkamp. Numerical Methods For The Simulation of Ultrasonic Guided Waves
212 pages
STPM Physics Experiment 7 Wheatstone Bridge (Second Term)
89% (36)
STPM Physics Experiment 7 Wheatstone Bridge (Second Term)
2 pages
Elementi Finiti e Matlab
No ratings yet
Elementi Finiti e Matlab
196 pages
2013-2D Full Waveform Inversion of Shallow Seismic Rayleigh Waves-Groos
No ratings yet
2013-2D Full Waveform Inversion of Shallow Seismic Rayleigh Waves-Groos
156 pages
FreeFEM Documentation
No ratings yet
FreeFEM Documentation
746 pages
Sound Thesis
No ratings yet
Sound Thesis
218 pages
CFD For Fem
No ratings yet
CFD For Fem
176 pages
Intro To FEM For Thermal and Stress Analysis With Matlab SOFEA
No ratings yet
Intro To FEM For Thermal and Stress Analysis With Matlab SOFEA
196 pages
ICSE Class 9 Physics Important Questions
No ratings yet
ICSE Class 9 Physics Important Questions
2 pages
Dynamic Analysis of Multi-Degree-Of-Freedom Systems Using A Pole
No ratings yet
Dynamic Analysis of Multi-Degree-Of-Freedom Systems Using A Pole
90 pages
Spiegelman MMM
No ratings yet
Spiegelman MMM
202 pages
Full Seismic Waveform Inversion For Structural and
No ratings yet
Full Seismic Waveform Inversion For Structural and
266 pages
What Every Engineer Should Know
No ratings yet
What Every Engineer Should Know
6 pages
Basic Earth Imaging Imp
No ratings yet
Basic Earth Imaging Imp
234 pages
Intro To FEA Notes Zurich
No ratings yet
Intro To FEA Notes Zurich
208 pages
(L) - 2015 - May - Introduction To Finite Element Modelling in Geosciences - ETH
No ratings yet
(L) - 2015 - May - Introduction To Finite Element Modelling in Geosciences - ETH
81 pages
Image Estimation by Example - Geophysical Sounding - J. Claerbout
No ratings yet
Image Estimation by Example - Geophysical Sounding - J. Claerbout
326 pages
Modeling and Simulation of Time-Harmonic Wave Propagation in Cylindrical Impedance Guides: Application To An Oil Well Stimulation Technology
No ratings yet
Modeling and Simulation of Time-Harmonic Wave Propagation in Cylindrical Impedance Guides: Application To An Oil Well Stimulation Technology
130 pages
Thesis Michaela and Tobias Hoefler PDF
No ratings yet
Thesis Michaela and Tobias Hoefler PDF
182 pages
Advanced Numerical Methods
No ratings yet
Advanced Numerical Methods
160 pages
Basic Earth Imaging - Claerbout PDF
No ratings yet
Basic Earth Imaging - Claerbout PDF
230 pages
AE4133 CFD II Part 1 Discretisations For Compressible Flows
No ratings yet
AE4133 CFD II Part 1 Discretisations For Compressible Flows
104 pages
Schneider2021 Article AReviewOfNonlinearFFT-basedCom
No ratings yet
Schneider2021 Article AReviewOfNonlinearFFT-basedCom
50 pages
Hysing PHD Thesis
No ratings yet
Hysing PHD Thesis
134 pages
ElmerModelsManual PDF
No ratings yet
ElmerModelsManual PDF
290 pages
Computational Methods in Acoustics
No ratings yet
Computational Methods in Acoustics
75 pages
Rapid Seismic Waveform Modeling and Inversion With Neural Operators
No ratings yet
Rapid Seismic Waveform Modeling and Inversion With Neural Operators
12 pages
LectureNotes ps2pdf 6mai
No ratings yet
LectureNotes ps2pdf 6mai
94 pages
Gee7 2011
No ratings yet
Gee7 2011
318 pages
SC Physics Formulas
86% (7)
SC Physics Formulas
2 pages
FEM Lecture Notes by Peter Hunter, Andrew Pullian
No ratings yet
FEM Lecture Notes by Peter Hunter, Andrew Pullian
153 pages
Structural Dynamics With Linear System Theories 16nov20
No ratings yet
Structural Dynamics With Linear System Theories 16nov20
301 pages
CT5123 LectureNotesThe Finite Element Method An Introduction PDF
No ratings yet
CT5123 LectureNotesThe Finite Element Method An Introduction PDF
125 pages
MMP-II 100 MCQs For MOCK Tests
83% (12)
MMP-II 100 MCQs For MOCK Tests
3 pages
The Finite Element Method-An Introduction
No ratings yet
The Finite Element Method-An Introduction
125 pages
IntroToFEA Red
No ratings yet
IntroToFEA Red
202 pages
Luminous Panel and Inverter Data Sheet - Removed
No ratings yet
Luminous Panel and Inverter Data Sheet - Removed
2 pages
API UT21 ThicknessProcedure 20190304
100% (1)
API UT21 ThicknessProcedure 20190304
7 pages
Introduction To Structural Geology
100% (3)
Introduction To Structural Geology
30 pages
CT5123 - LectureNotesThe Finite Element Method An Introduction
100% (2)
CT5123 - LectureNotesThe Finite Element Method An Introduction
125 pages
Structural Analysis Using Openfoam
No ratings yet
Structural Analysis Using Openfoam
86 pages
NCERT Exemplar For Class 9 Science Chapter 1 - Matter in Our Surroundings (Book Solutions)
No ratings yet
NCERT Exemplar For Class 9 Science Chapter 1 - Matter in Our Surroundings (Book Solutions)
15 pages
3D Finite-Difference Time-Domain Modeling of Acoustic Wave Propagation Based On Domain Decompostion
No ratings yet
3D Finite-Difference Time-Domain Modeling of Acoustic Wave Propagation Based On Domain Decompostion
32 pages
Excellent Achievers Global Integrated School, Inc. #36 Quirino ST., Zone 6, South Signal, Taguig City Tel. No. 838-6643/5539316
No ratings yet
Excellent Achievers Global Integrated School, Inc. #36 Quirino ST., Zone 6, South Signal, Taguig City Tel. No. 838-6643/5539316
21 pages
Ladder Problems LESSON
No ratings yet
Ladder Problems LESSON
3 pages
Nuclear Medicine: Equipment & Consumables
No ratings yet
Nuclear Medicine: Equipment & Consumables
14 pages
Midterm Reviewer
No ratings yet
Midterm Reviewer
21 pages
Technical Guide
No ratings yet
Technical Guide
16 pages
PWPS 001
No ratings yet
PWPS 001
2 pages
41.en BQ80 V1.0
No ratings yet
41.en BQ80 V1.0
6 pages
SPS - CYG PRS 789 H Series Auxiliary Relay Instruction Manual (SP408)
No ratings yet
SPS - CYG PRS 789 H Series Auxiliary Relay Instruction Manual (SP408)
31 pages
Chapter 11 Intermolecular Forces New
No ratings yet
Chapter 11 Intermolecular Forces New
81 pages
Potential Chipless RFID Sensors For Food Packaging Applications A Review
No ratings yet
Potential Chipless RFID Sensors For Food Packaging Applications A Review
19 pages
Aqa 83001H QP Nov21
No ratings yet
Aqa 83001H QP Nov21
35 pages
Learning Objectives: Florencio L. Vargas College, Inc College of Engineering Instructional Module
No ratings yet
Learning Objectives: Florencio L. Vargas College, Inc College of Engineering Instructional Module
8 pages
Histology and Embryology: Dr. Ajay Kr. Mishra Jilin Medical College
No ratings yet
Histology and Embryology: Dr. Ajay Kr. Mishra Jilin Medical College
34 pages
List of New Books Added To Iist Library On 06.05.2025
No ratings yet
List of New Books Added To Iist Library On 06.05.2025
21 pages
Analysis of Steel Slab Base and Gusseted Base For Economy
No ratings yet
Analysis of Steel Slab Base and Gusseted Base For Economy
3 pages
Simple Choke Calculator Tutorial
No ratings yet
Simple Choke Calculator Tutorial
4 pages
DD00005302 Avoiding EMI and ESD in Camera Installationss
No ratings yet
DD00005302 Avoiding EMI and ESD in Camera Installationss
30 pages
Displacement - Measurement - Lecture 5
No ratings yet
Displacement - Measurement - Lecture 5
20 pages
2023 Chiban
No ratings yet
2023 Chiban
12 pages
Aircraft Structures 1: Department of Aeronautical Engineering
No ratings yet
Aircraft Structures 1: Department of Aeronautical Engineering
8 pages
Acids, Bases and Salts 2
No ratings yet
Acids, Bases and Salts 2
4 pages