Discrete inverse and state estimation problems with geophysical fluid applications
Author: Carl Wunsch
ISBN: 9780521854245, 0521854245
Publisher: Cambridge University Press
Year: 2006
Language: English
DISCRETE INVERSE AND STATE
ESTIMATION PROBLEMS
With Geophysical Fluid Applications
The problems of making inferences about the natural world from noisy observations
and imperfect theories occur in almost all scientific disciplines. This book addresses
these problems using examples taken from geophysical fluid dynamics. It focuses
on discrete formulations, both static and time-varying, known variously as inverse,
state estimation or data assimilation problems. Starting with fundamental algebraic
and statistical ideas, the book guides the reader through a range of inference tools
including the singular value decomposition, Gauss–Markov and minimum variance
estimates, Kalman filters and related smoothers, and adjoint (Lagrange multiplier)
methods. The final chapters discuss a variety of practical applications to geophysical
flow problems.
Discrete Inverse and State Estimation Problems: With Geophysical Fluid Appli-
cations is an ideal introduction to the topic for graduate students and researchers
in oceanography, meteorology, climate dynamics, geophysical fluid dynamics, and
any field in which models are used to interpret observations. It is accessible to
a wide scientific audience, as the only prerequisite is an understanding of linear
algebra.
Carl Wunsch is Cecil and Ida Green Professor of Physical Oceanography at the
Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute
of Technology. After gaining his Ph.D. in geophysics in 1966 at MIT, he rose
through the department, serving as its head from 1977 to 1981. He
subsequently served as Secretary of the Navy Research Professor and has held senior
visiting positions at many prestigious universities and institutes across the world.
His previous books include Ocean Acoustic Tomography (Cambridge University
Press, 1995) with W. Munk and P. Worcester, and The Ocean Circulation Inverse
Problem (Cambridge University Press, 1996).
CARL WUNSCH
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
© C. Wunsch 2006
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Walter Munk for decades of friendship and exciting collaboration.
Contents
Preface
Acknowledgements
Part I Fundamental machinery
1 Introduction
1.1 Differential equations
1.2 Partial differential equations
1.3 More examples
1.4 Importance of the forward model
2 Basic machinery
2.1 Background
2.2 Matrix and vector algebra
2.3 Simple statistics: regression
2.4 Least-squares
2.5 The singular vector expansion
2.6 Combined least-squares and adjoints
2.7 Minimum variance estimation and simultaneous equations
2.8 Improving recursively
2.9 Summary
Appendix 1. Maximum likelihood
Appendix 2. Differential operators and Green functions
Appendix 3. Recursive least-squares and Gauss–Markov solutions
3 Extensions of methods
3.1 The general eigenvector/eigenvalue problem
3.2 Sampling
3.3 Inequality constraints: non-negative least-squares
3.4 Linear programming
3.5 Empirical orthogonal functions
3.6 Kriging and other variants of Gauss–Markov estimation
Preface
This book is to a large extent the second edition of The Ocean Circulation Inverse
Problem, but it differs from the original version in a number of ways. While teach-
ing the basic material at MIT and elsewhere over the past ten years, it became
clear that it was of interest to many students outside of physical oceanography –
the audience for whom the book had been written. The oceanographic material,
instead of being a motivating factor, was in practice an obstacle to understanding
for students with no oceanic background. In the revision, therefore, I have tried to
make the examples more generic and understandable, I hope, to anyone with even
rudimentary experience with simple fluid flows.
Also many of the oceanographic applications of the methods, which were still
novel and controversial at the time of writing, have become familiar and almost
commonplace. The oceanography, now confined to the two last chapters, is thus
focussed less on explaining why and how the calculations were done, and more on
summarizing what has been accomplished. Furthermore, the time-dependent prob-
lem (here called “state estimation” to distinguish it from meteorological practice)
has evolved rapidly in the oceanographic community from a hypothetical method-
ology to one that is clearly practical and in ever-growing use.
The focus is, however, on the basic concepts and not on the practical numerical
engineering required to use the ideas on the very large problems encountered with
real fluids. Anyone attempting to model the global ocean or atmosphere or equiv-
alent large scale system must confront issues of data storage, code parallelization,
truncation errors, grid refinement, and the like. Almost none of these important
problems are taken up here. Before constructive approaches to the practical prob-
lems can be found, one must understand the fundamental ideas. An analogy is the
need to understand the implications of Maxwell’s equations for electromagnetic
phenomena before one undertakes to build a high fidelity receiver. The effective
engineering of an electronic instrument can only be helped by good understanding
of how one works in principle, albeit the details of making one work in practice
can be quite different.
In the interests of keeping the book as short as possible, I have, however, omitted
some of the more interesting theoretical material of the original version, which
readers can find in the wider literature on control theory. It is assumed that the
reader has a familiarity at the introductory level with matrices and vectors, although
everything is ultimately defined in Chapter 2.
Finally, I have tried to correct the dismaying number of typographical and other
errors in the previous book, but have surely introduced others. Reports of errors of
any type will be gratefully received.
Acknowledgements
I thank the students and colleagues who over the years have suggested corrections,
modifications, and clarifications. My time and energies have been supported
financially by the National Aeronautics and Space Administration, and the National
Science Foundation through grants and contracts, as well as by the Massachusetts
Institute of Technology through the Cecil and Ida Green Professorship.
Part I
Fundamental machinery
1
Introduction
The most powerful insights into the behavior of the physical world are obtained
when observations are well described by a theoretical framework that is then avail-
able for predicting new phenomena or new observations. An example is the observed
behavior of radio signals and their extremely accurate description by the Maxwell
equations of electromagnetic radiation. Other such examples include planetary mo-
tions through Newtonian mechanics, or the movement of the atmosphere and ocean
as described by the equations of fluid mechanics, or the propagation of seismic
waves as described by the elastic wave equations. To the degree that the theoretical
framework supports, and is supported by, the observations one develops sufficient
confidence to calculate similar phenomena in previously unexplored domains or to
make predictions of future behavior (e.g., the position of the moon in 1000 years,
or the climate state of the earth in 100 years).
Developing a coherent view of the physical world requires some mastery, there-
fore, of both a framework, and of the meaning and interpretation of real data.
Conventional scientific education, at least in the physical sciences, puts a heavy
emphasis on learning how to solve appropriate differential and partial differential
equations (Maxwell, Schrödinger, Navier–Stokes, etc.). One learns which problems
are “well-posed,” how to construct solutions either exactly or approximately, and
how to interpret the results. Much less emphasis is placed on the problems of under-
standing the implications of data, which are inevitably imperfect – containing noise
of various types, often incomplete, and possibly inconsistent and thus considered
mathematically “ill-posed” or “ill-conditioned.” When working with observations,
ill-posedness is the norm, not the exception.
Many interesting problems arise in using observations in conjunction with theory.
In particular, one is driven to conclude that there are no well-posed problems outside
of textbooks, that stochastic elements are inevitably present and must be confronted,
and that more generally, one must make inferences about the world from data that
are necessarily always incomplete. The main purpose of this introductory chapter
is to introduce, through a series of simple examples, the issues that arise when
imperfect observations are combined with theory.
1.1 Differential equations
Consider a simple example: steady heat conduction along a rod occupying
r_A = 0 ≤ r ≤ r_B, in which the temperature T(r) satisfies
    κ d²T/dr² = 0,   (1.1)
where κ is the thermal conductivity, subject to the boundary conditions
    T(r_A) = T_A,   T(r_B) = T_B.   (1.2)
Equation (1.1) is so simple we can write its solution in a number of different ways.
One form is
T (r ) = a + br, (1.3)
which is a straight line. Such problems, or analogues for much more complicated
systems, are sometimes called “forward” or “direct” and they are “well-posed”:
exactly enough information is available to produce a unique solution insensitive to
perturbations in any element (easily proved here, not so easily in other cases). The
solution is both stable and differentiable. This sort of problem and its solution is
what is generally taught in elementary science courses.
On the other hand, the problems one encounters in actually doing science differ
significantly – both in the questions being asked, and in the information available.
For example:
1. One or both of the boundary values T_A, T_B is known from measurements; they are thus
given as T_A = T_A^(c) ± ΔT_A, T_B = T_B^(c) ± ΔT_B, where the ΔT_{A,B} are an estimate of the
possible inaccuracies in the theoretical values T_i^(c). (Exactly what that might mean is
taken up later.)
2. One or both of the positions, r_{A,B}, is also the result of measurement and is of the form
r_{A,B}^(c) ± Δr_{A,B}.
3. T_B is missing altogether, but is known to be positive, T_B > 0.
4. One of the boundary values, e.g., T_B, is unknown, but an interior value T_int = T_int^(c) ±
ΔT_int is provided instead. Perhaps many interior values are known, but none of them
perfectly.
Other possibilities exist. But even this short list raises a number of interesting,
practical problems. One of the themes of this book is that almost nothing in reality
is known perfectly. It is possible that ΔT_A, ΔT_B are very small; but as long as they
are not actually zero, there is no longer any possibility of finding a unique solution.
Many variations on this model and theme arise in practice. Suppose the problem
is made slightly more interesting by introducing a “source” ST (r ), so that the
temperature field is thought to satisfy the equation
    d²T(r)/dr² = S_T(r),   (1.5)
along with its boundary conditions, producing another conventional forward prob-
lem. One can convert (1.5) into a different problem by supposing that one knows
T (r ), and seeks ST (r ). Such a problem is even easier to solve than the conven-
tional one: differentiate T twice. Because convention dictates that the “forward”
or “direct” problem involves the determination of T (r ) from a known ST (r ) and
boundary data, this latter problem might be labeled as an “inverse” one – simply
because it contrasts with the conventional formulation.
In practice, a whole series of new problems can be raised: suppose ST (r )
is imperfectly known. How should one proceed? If one knows ST (r ) and T (r )
at a series of positions r_i ≠ r_A, r_B, could one nonetheless deduce the bound-
ary conditions? Could one deduce ST (r ) if it were not known at these interior
values?
T (r ) has been supposed to satisfy the differential equation (1.1). For many
purposes, it is helpful to reduce the problem to one that is intrinsically discrete.
One way to do this would be to expand the solution in a system of polynomials,
    T(r) = α_0 r^0 + α_1 r^1 + · · · + α_m r^m,   (1.6)
and
    S_T(r) = β_0 r^0 + β_1 r^1 + · · · + β_n r^n,   (1.7)
where the β_i would conventionally be known, and the problem has been reduced
from the need to find a function T(r) defined for all values of r, to one in which
only the finite number of parameters α_i, i = 0, 1, . . . , m, must be found.
An alternative discretization is obtained by using the coordinate r. Divide the in-
terval r_A = 0 ≤ r ≤ r_B into N − 1 intervals of length Δr, so that r_B = (N − 1)Δr.
Then, using a simple difference approximation to the second derivative:
    T(2Δr) − 2T(Δr) + T(0) = (Δr)² S_T(Δr),
    T(3Δr) − 2T(2Δr) + T(1Δr) = (Δr)² S_T(2Δr),   (1.8)
    . . .
    T((N − 1)Δr) − 2T((N − 2)Δr) + T((N − 3)Δr) = (Δr)² S_T((N − 2)Δr).
If one counts the number of equations in (1.8) it is readily found that there are N − 2,
but with a total of N unknown T(pΔr). The two missing pieces of information are
provided by the two boundary conditions T(0Δr) = T_0, T((N − 1)Δr) = T_{N−1}.
Thus the problem of solving the differential equation has been reduced to finding
the solution of a set of ordinary linear simultaneous algebraic equations, which we
will write, in the notation of Chapter 2, as
Ax = b, (1.9)
where A is a square matrix, x is the vector of unknowns T ( pr ), and b is the vector
of values q(pΔr), and of boundary values. The list above, of variations, e.g., where
a boundary condition is missing, or where interior values are provided instead of
boundary conditions, then becomes statements about having too few, or possibly
too many, equations for the number of unknowns. Uncertainties in the Ti or in
the q(pΔr) become statements about having to solve simultaneous equations with
uncertainties in some elements. That models, even non-linear ones, can be reduced
to sets of simultaneous equations, is the unifying theme of this book. One might
need truly vast numbers of grid points, pΔr, or polynomial terms, and ingenuity in
the formulation to obtain adequate accuracy, but as long as the number of parameters
N < ∞, one has achieved a great, unifying simplification.
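The reduction of Eq. (1.5) and its boundary conditions to the square system (1.9) can be sketched numerically. In the following, the grid size, source term, and boundary values are illustrative choices, not taken from the text:

```python
import numpy as np

# Discretize d^2 T / dr^2 = S_T(r) on N grid points with Dirichlet boundary values.
N = 11                        # illustrative number of grid points
rB = 1.0
dr = rB / (N - 1)
r = np.linspace(0.0, rB, N)

TA, TB = 0.0, 1.0             # boundary values, assumed known here
ST = np.zeros(N)              # source term; zero recovers Eq. (1.1)

# Assemble A x = b: N - 2 difference equations plus the 2 boundary identities.
A = np.zeros((N, N))
b = np.zeros(N)
A[0, 0] = 1.0;  b[0] = TA     # T(0) = T_A
A[-1, -1] = 1.0; b[-1] = TB   # T((N-1) dr) = T_B
for p in range(1, N - 1):
    A[p, p - 1], A[p, p], A[p, p + 1] = 1.0, -2.0, 1.0
    b[p] = dr**2 * ST[p]

T = np.linalg.solve(A, b)

# With S_T = 0 the solution is the straight line T(r) = a + b r of Eq. (1.3).
print(np.allclose(T, TA + (TB - TA) * r))  # True
```

With a non-zero `ST` the same square system produces the forced solution; dropping one of the two boundary rows makes `A` singular, which is the algebraic face of the ill-posedness discussed above.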
Consider a slightly more interesting ordinary differential equation, that for the
simple mass–spring oscillator:
    m d²ξ(t)/dt² + ε dξ(t)/dt + k₀ ξ(t) = S_ξ(t),   (1.10)
where m is mass, k0 is a spring constant, and ε is a dissipation parameter. Although
where m is mass, k0 is a spring constant, and ε is a dissipation parameter. Although
the equation is slightly more complicated than (1.5), and we have relabeled the
independent variable as t (to suggest time), rather than as r, there really is no
fundamental difference. This differential equation can also be solved in any number
of ways. As a second-order equation, it is well-known that one must provide two
extra conditions to have enough information to have a unique solution. Typically,
there are initial conditions, ξ (0), dξ (0)/dt – a position and velocity, but there is
nothing to prevent us from assigning two end conditions, ξ (0), ξ (t = t f ), or even
two velocity conditions dξ (0)/dt, dξ (t f )/dt, etc.
If we naively discretize (1.10) as we did the straight-line equation, we have
    ξ(pΔt) − (2 − εΔt/m − k₀(Δt)²/m) ξ((p − 1)Δt) − (εΔt/m − 1) ξ((p − 2)Δt)
        = (Δt)² S_ξ((p − 1)Δt)/m,   2 ≤ p ≤ N − 1,   (1.11)
which is another set of simultaneous equations as in (1.9) in the unknown ξ(pΔt);
an equation count again would show that there are two fewer equations than un-
knowns – corresponding to the two boundary or two initial conditions. In Chapter 2,
several methods will be developed for solving sets of simultaneous linear equations,
even when there are apparently too few or too many of them. In the present case,
if one were given ξ(0), ξ(1Δt), Eq. (1.11) could be stepped forward in time, gen-
erating ξ(2Δt), ξ(3Δt), . . . , ξ((N − 1)Δt). The result would be identical to the
solution of the simultaneous equations – but with far less computation.
But if one were given ξ((N − 1)Δt) instead of ξ(1Δt), such a simple time-
stepping rule could no longer be used. A similar difficulty would arise if q(jΔt) were
missing for some j, but instead one had knowledge of ξ(pΔt), for some p. Looked
at as a set of simultaneous equations, there is no conceptual problem: one simply
solves it, all at once, by Gaussian elimination or equivalent. There is a problem
only if one sought to time-step the equation forward, but without the required
second condition at the starting point – there would be inadequate information to
go forward in time. Many of the so-called inverse methods explored in this book
are ways to solve simultaneous equations while avoiding the need for all-at-once
brute-force solution. Nonetheless, one is urged to always recall that most of the
interesting algorithms are just clever ways of solving large sets of such equations.
1.2 Partial differential equations
The same considerations apply to partial differential equations. As an example, solve
    ∇²φ = ρ,   (1.12)
in a domain D, discretized on the N × N grid of Fig. 1.1 so that the Laplacian
becomes a set of linear equations in the grid values φ_{ij} (the set (1.13)), subject to
the boundary conditions
    φ_{ij} = φ⁰_{ij},   i, j ∈ ∂D;   (1.14)
there are precisely 4N − 4 of these conditions, and thus the combined set (1.13)
plus (1.14), written as (1.9), with
    x = vec{φ_{ij}} = [φ_{11}, φ_{12}, . . . , φ_{NN}]^T,
    b = vec{ρ_{ij}, φ⁰_{ij}} = [ρ_{22}, ρ_{23}, . . . , ρ_{N−1,N−1}, φ⁰_{11}, . . . , φ⁰_{N,N}]^T,
Figure 1.1 Square, homogeneous grid used for discretizing the Laplacian, thus
reducing the partial differential equation to a set of linear simultaneous equations.
The same equation can also be posed with derivative (Neumann) boundary conditions:
    ∇²φ = ρ,   (1.15)
with ∂φ/∂n specified, etc., where ∂D now represents the set of boundary indices
necessary to compute the local normal derivative, again producing a combined set
of simultaneous equations.
A reservoir problem
As another example, consider the reservoir sketched in Fig. 1.2, fed by N sources
flowing at rates J_{i0}, each carrying an identifiable property C_i, perhaps a chemical
concentration; the reservoir concentration is C_0. Conservation of the property C in
a steady state requires
    Σ_{i=1}^{N} J_{i0} C_i − J_{0∞} C_0 = 0,   (1.18)
which says that, for a steady state, the rate of transfer in must equal the rate of
transfer out (written J_{0∞}). To conserve mass,
    Σ_{i=1}^{N} J_{i0} − J_{0∞} = 0.   (1.19)

Figure 1.2 A simple reservoir problem in which there are multiple sources of
flow, at rates J_{i0}, each carrying an identifiable property C_i, perhaps a chemical
concentration. In the forward problem, given J_{i0}, C_i one could calculate C_0. One
form of inverse problem provides C_0 and the C_i and seeks the values of J_{i0}.
These equations have also been written as though everything were perfect. If, for example, the
tracer concentrations Ci were measured with finite precision and accuracy (they
always are), the resulting inaccuracy might be accommodated by rewriting the
tracer balance as
    Σ_{i=1}^{N} J_{i0} C_i − J_{0∞} C_0 = n,   (1.20)
where n represents the resulting error in the equation. Its introduction produces
another unknown. If the reservoir were capable of some degree of storage or fluc-
tuation in level, an error term could be introduced into (1.19) as well. One should
also notice that, as formulated, one of the apparently infinite number of solutions
to Eqs. (1.18, 1.19) includes J_{i0} = J_{0∞} = 0 – no flow at all. More information is
required if this null solution is to be excluded.
To make the problem slightly more interesting, suppose that the tracer C is
radioactive, and diminishes with a decay constant λ: Equation (1.18) then acquires
an additional term representing the decay of C within the reservoir.
A tomographic problem
So-called tomographic problems occur in many fields, most notably in medicine, but
also in materials testing, oceanography, meteorology, and geophysics. Generically,
they arise when one is faced with the problem of inferring the distribution of
properties inside an area or volume based upon a series of integrals through the
region. Consider Fig. 1.3, where, to be specific, suppose we are looking at the top
of the head of a patient lying supine in a so-called CAT-scanner. The two external
shell sectors represent a source of x-rays, and a set of x-ray detectors. X-rays are
emitted from the source and travel through the patient along the indicated lines
where the intensity of the received beam is measured. Let the absorptivity/unit
length within the patient be a function, c(r), where r is the vector position within
the patient’s head. Consider one source at rs and a receptor at re connected by the
straight-line path shown in Fig. 1.3; the intensity measured at the receptor is then
    I(r_s, r_e) = ∫_{r_s}^{r_e} c(r(s)) ds,   (1.22)
where s is the arc length along the path.

Figure 1.4 Simplified geometry for defining a tomographic problem. Some squares
may have no integrals passing through them; others may be multiply-covered.
Boxes outside the physical body can be handled in a number of ways, including
the addition of constraints setting the corresponding c_j = 0.

The basic tomographic problem is to
determine c(r) for all r in the patient, from measurements of I. c can be a function
of both position and the physical parameters of interest. In the medical problem, the
shell sectors rotate around the patient, and an enormous number of integrals along
(almost) all possible paths are obtained. An analytical solution to this problem,
as the number of paths becomes infinite, is produced by the Radon transform.3
Given that tumors and the like have a different absorptivity to normal tissue, the
reconstructed image of c(r) permits physicians to “see” inside the patient. In most
other situations, however, the number of paths tends to be much smaller than the
formal number of unknowns and other solution methods must be found.
Note first, however, that Eq. (1.22) should be modified to reflect the inability
of any system to produce a perfect measurement of the integral, and so, more
realistically,
    I(r_s, r_e) = ∫_{r_s}^{r_e} c(r(s)) ds + n(r_s, r_e),   (1.23)
Then Eq. (1.23) can be approximated with arbitrary accuracy (by letting the sub-
square dimensions become arbitrarily small) as
    I_i = Σ_{j=1}^{N} c_j Δr_{ij} + n_i.   (1.24)
Here Δr_{ij} is the arc length of path i within square j (most of them will vanish for
any particular path). These last equations are of the form
Ex + n = y, (1.25)
where E = {Δr_{ij}}, x = [c_j], y = [I_i], n = [n_i]. Quite commonly there are many
more unknown c j than there are integrals Ii . (In the present context, there is no
distinction made between using matrices A, E. E will generally be used where noise
elements are present, and A where none are intended.)
Tomographic measurements do not always consist of x-ray intensities. In seis-
mology or oceanography, for example, c j is commonly 1/v j , where v j is the speed
of sound or seismic waves within the area; I is then a travel time rather than an inten-
sity. The equations remain the same, however. This methodology also works in three
dimensions, the paths need not be straight lines, and there are many generalizations.4
A problem of great practical importance is determining what one can say about the
solutions to Eqs. (1.25) even where many more unknowns exist than formal pieces
of information yi .
As with all these problems, many other forms of discretization are possible. For
example, the continuous function c (r) can be expanded:
    c(r) = Σ_n Σ_m a_{nm} T_n(r_x) T_m(r_y),   (1.26)
where r = (r_x, r_y), and the T_n are any suitable expansion functions (sines and
cosines, Chebyshev polynomials, etc.). The linear equations (1.25) then represent
constraints leading to the determination of the a_{nm}.
A box balance
Consider now the volume of fluid shown in Fig. 1.5, bounded by four open vertical
sides of horizontal lengths l_i, across which fluid flows with horizontal velocities
v_i(z). Conservation of mass in a steady state requires
    Σ_{i=1}^{4} ρ l_i ∫_{−h}^{0} v_i(z) dz = 0,   (1.27)
where the convention is made that flows into the box are positive, and flows out
are negative; z = −h is the lower boundary of the volume and z = 0 is the top one.

Figure 1.5 Volume of fluid bounded on four open vertical and two horizontal sides
across which fluid is supposed to flow. Mass is conserved, giving one relationship
among the fluid transports v_i; conservation of one or more other tracers C_i leads
to additional useful relationships.

If the v_i are unknown, Eq. (1.27) represents one equation (constraint) in four
unknowns:
    ∫_{−h}^{0} v_i(z) dz,   1 ≤ i ≤ 4.   (1.28)
Suppose that the vertical derivative of the velocity, v_i′(z) = dv_i(z)/dz, is known
from measurements, so that
    v_i(z) = ∫_{z_0}^{z} v_i′(z′) dz′ + b_i(z_0),   (1.29)
where z_0 is a convenient place to start the integration (but can be any value).
The b_i are integration constants (b_i = v_i(z_0)) that remain unknown. Constraint (1.27)
becomes
    Σ_{i=1}^{4} l_i ρ ∫_{−h}^{0} ( ∫_{z_0}^{z} v_i′(z′) dz′ + b_i(z_0) ) dz = 0,   (1.30)
or
    Σ_{i=1}^{4} h l_i b_i(z_0) = − Σ_{i=1}^{4} l_i ∫_{−h}^{0} dz ∫_{z_0}^{z} v_i′(z′) dz′,   (1.31)
where the right-hand side is known. Equation (1.31) is still one equation in four
unknown bi , but the zero-solution is no longer possible, unless the right-hand side
vanishes. Equation (1.31) is a statement that the weighted average of the bi on the
left-hand side is known. If one seeks to obtain estimates of the bi separately, more
information is required.
Suppose that information pertains to a tracer, perhaps a red dye, known to be
conservative, and that the box concentration of red dye, C, is known to be in a
steady state. Then conservation of C becomes
4 0 4 0 z
hli Ci (z) dz bi = − li dz Ci (z )vi (z )dz , (1.32)
i=1 −h i=1 −h −z 0
where Ci (z) is the concentration of red dye on each boundary. Equation (1.32)
provides a second relationship for the four unknown bi . One might try to measure
another dye concentration, perhaps green dye, and write an equation for this second
tracer, exactly analogous to (1.32). With enough such dye measurements, there
might be more constraint equations than unknown bi . In any case, no matter how
many dyes are measured, the resulting equation set is of the form (1.9). The number
of boundaries is not limited to four, but can be either fewer, or many more.5
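The structure just described — a few weighted-sum constraints of the forms (1.31) and (1.32) on four unknown b_i — can be sketched numerically. All numbers below are invented for illustration:

```python
import numpy as np

# Box-balance constraints: each row is a weighted sum of the four unknown
# reference velocities b_i.  Side lengths, depth, dye averages, and the
# right-hand sides are all illustrative values.
h = 1.0
l = np.array([1.0, 1.0, 2.0, 1.5])        # horizontal side lengths l_i
Cbar = np.array([0.9, 1.1, 1.0, 1.2])     # depth-integrated dye on each side

A = np.vstack([h * l,                     # mass:  sum_i h l_i b_i = rhs[0]
               l * Cbar])                 # dye:   sum_i l_i Cbar_i b_i = rhs[1]
rhs = np.array([0.3, 0.5])                # known right-hand sides

# Two equations, four unknowns: lstsq picks the minimum-norm b.
b, *_ = np.linalg.lstsq(A, rhs, rcond=None)

print(np.allclose(A @ b, rhs))            # True: both constraints are satisfied
print(b.shape)                            # (4,)
```

Each additional measured dye appends one more row to `A`; with enough independent tracers the system can become square or even overdetermined, at which point the methods of Chapter 2 for noisy, inconsistent equations take over.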
Vibrating string
Consider a uniform vibrating string anchored at its ends r x = 0, r x = L . The free
motion of the string is governed by the wave equation
    ∂²η/∂r_x² − (1/c²) ∂²η/∂t² = 0,   c² = T/ρ,   (1.33)
where T is the tension and ρ the density. Free modes of vibration (eigen-frequencies)
are found to exist at discrete frequencies s_q,
    2πs_q = qπc/L,   q = 1, 2, 3, . . . ,   (1.34)
which is the solution to a classical forward problem. A number of interesting and
useful inverse problems can be formulated. For example: given s_q ± Δs_q, q =
1, 2, . . . , M, determine L or c. These are particularly simple problems, because
there is only one parameter, either c or L, to determine. More generally, it is obvious
from Eq. (1.34) that one has information only about the ratio c/L – they could not
be determined separately.
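A sketch of this one-parameter inverse problem: Eq. (1.34) gives s_q = q(c/L)/2, and noisy observed frequencies determine the ratio c/L by least squares. The true ratio, mode count, and noise level below are purely illustrative choices:

```python
import numpy as np

# Estimating c/L from observed string eigenfrequencies, s_q = q (c/L) / 2.
rng = np.random.default_rng(1)
ratio_true = 40.0                      # c/L, e.g. c = 40 m/s on a 1 m string
q = np.arange(1, 11)                   # mode numbers q = 1..10
s_obs = q * ratio_true / 2 + rng.normal(0.0, 0.05, q.size)  # s_q with noise

# One-parameter least squares: minimize sum_q (s_q - q r / 2)^2 over r = c/L.
r_hat = 2.0 * np.sum(q * s_obs) / np.sum(q**2)

print(abs(r_hat - ratio_true) < 0.1)   # True: the ratio is well determined
# c and L individually remain inseparable: only their ratio enters s_q.
```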
Suppose, however, that the density varies along the string, ρ = ρ(r x ), so that
c = c (r x ). Then, it may be confirmed that the observed frequencies are no longer
given by Eq. (1.34), but by expressions involving the integral of c over the length
of the string. An important problem is then to infer c(r x ), and hence ρ(r x ). One
might wonder whether, under these new circumstances, L can be determined independently of c.
A host of such problems exist, in which the observed frequencies of free modes
are used to infer properties of media in one to three dimensions. The most elaborate
applications are in geophysics and solar physics, where the normal mode frequen-
cies of the vibrating whole Earth or Sun are used to infer the interior properties
(density, elastic parameters, magnetic field strength, etc.).6 A good exercise is to
render the spatially variable string problem into discrete form.
1.4 Importance of the forward model
Consider an observer who measures the temperature of a room M times with a
thermometer, obtaining a set of values y_i, i = 1, 2, . . . , M, and who then estimates
the mean room temperature as
    m̃ = (1/M) Σ_{i=1}^{M} y_i.   (1.35)
In deciding to compute, and use, m̃ the observer has probably made a long list of very
sophisticated, but implicit, model assumptions. Among them we might suggest: (1)
Thermometers actually measure the length of a fluid column, or an oscillator frequency, or
a voltage and require knowledge of the relation to temperature as well as potentially
elaborate calibration methods. (2) That the temperature in the room is sufficiently
slowly changing that all of the ti can be regarded as effectively identical. A different
observer might suggest that the temperature in the room is governed by shock waves
bouncing between the walls at intervals of seconds or less. Should that be true, m̃
constructed from the available samples might prove completely meaningless. It
might be objected that such an hypothesis is far-fetched. But the assumption that
the room temperature is governed, e.g., by a slowly evolving diffusion process, is
a specific, and perhaps incorrect model. (3) That the errors in the thermometer are
such that the best estimate of the room mean temperature is obtained by the simple
sum in Eq. (1.35). There are many measurement devices for which this assumption
is a very poor one (perhaps the instrument is drifting, or has a calibration that varies
with temperature), and we will discuss how to determine averages in Chapter 2. But
the assumption that property m̃ is useful, is a strong model assumption concerning
both the instrument being used and the physical process it is measuring.
This list can be extended, but more generally, the inverse problems listed earlier
in this chapter only make sense to the degree that the underlying forward model
is likely to be an adequate physical description of the observations. For example,
if one is attempting to determine ρ in Eq. (1.15) by taking the Laplacian ∇ 2 φ,
(analytically or numerically), the solution to the inverse problem is only sensible if
this equation really represents the correct governing physics. If the correct equation
to use were, instead,
    ∂²φ/∂r_x² + (1/2) ∂φ/∂r_y = ρ,   (1.36)
where r y is another coordinate, the calculated value of ρ would be incorrect. One
might, however, have good reason to use Eq. (1.15) as the most likely hypothesis,
but nonetheless remain open to the possibility that it is not an adequate descrip-
tor of the required field, ρ. A good methodology, of the type to be developed in
subsequent chapters, permits posing the question: is my model consistent with the
data? If the answer to the question is “yes,” a careful investigator would never
claim that the resulting answer is the correct one and that the model has been “val-
idated” or “verified.” One claims only that the answer and the model are consistent
with the observations, and remains open to the possibility that some new piece
of information will be obtained that completely invalidates the model (e.g., some
direct measurements of ρ showing that the inferred value is simply wrong). One
can never validate or verify a model, one can only show consistency with existing
observations.7
Notes
1 Whittaker and Robinson (1944).
2 Lanczos (1961) has a much fuller discussion of this correspondence.
3 Herman (1980).
4 Herman (1980); Munk et al. (1995).
5 Oceanographers will recognize this apparently highly artificial problem as being a slightly
simplified version of the so-called geostrophic inverse problem, which is of great practical
importance. It is a central subject in Chapter 6.
6 Aki and Richards (1980). A famous two-dimensional version of the problem is described by Kač
(1966); see also Gordon and Webb (1996).
7 Oreskes et al. (1994).
2
Basic machinery
2.1 Background
The purpose of this chapter is to record a number of results that are useful in
finding and understanding the solutions to sets of usually noisy simultaneous linear
equations and in which formally there may be too much or too little information.
A lot of the material is elementary; good textbooks exist, to which the reader
will be referred. Some of what follows is discussed primarily so as to produce
a consistent notation for later use. But some topics are given what may be an
unfamiliar interpretation, and I urge everyone to at least skim the chapter.
Our basic tools are those of matrix and vector algebra as they relate to the solution
of linear simultaneous equations, and some elementary statistical ideas – mainly
concerning covariance, correlation, and dispersion. Least-squares is reviewed, with
an emphasis placed upon the arbitrariness of the distinction between knowns, un-
knowns, and noise. The singular-value decomposition is a central building block,
producing the clearest understanding of least-squares and related formulations.
Minimum variance estimation is introduced through the Gauss–Markov theorem
as an alternative method for obtaining solutions to simultaneous equations, and its
relation to and distinction from least-squares is discussed. The chapter ends with
a brief discussion of recursive least-squares and estimation; this part is essential
background for the study of time-dependent problems in Chapter 4.
then ei are said to be a “basis.” A necessary and sufficient condition for them to
have that property is that they should be “independent,” that is, no one of them
should be perfectly representable by the others:
$$\mathbf{e}_j - \sum_{i=1,\, i \neq j}^{N} \beta_i \mathbf{e}_i \neq 0, \quad j = 1, 2, \ldots, N, \qquad (2.2)$$
for any choice of the coefficients β i.
A subset of the e j are said to span a subspace (all vectors perfectly representable
by the subset). For example, [1, −1, 0]T , [1, 1, 0]T span the subspace of all vectors
[v1 , v2 , 0]T . A “spanning set” completely describes the subspace too, but might have
additional, redundant vectors. Thus the vectors [1, −1, 0]T , [1, 1, 0]T , [1, 1/2, 0]T
span the subspace but are not a basis for it.
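The distinction is easy to check numerically: the rank of the stacked set equals the dimension of the subspace it spans, so three vectors of rank two span a plane but cannot be a basis for it. A short sketch (Python/NumPy, added here as an illustration and not part of the text):

```python
import numpy as np

# The three vectors from the text, stacked as rows: they span the
# subspace of vectors [v1, v2, 0]^T, but any two of them already
# suffice, so the set is a spanning set and not a basis.
vectors = np.array([[1.0, -1.0, 0.0],
                    [1.0,  1.0, 0.0],
                    [1.0,  0.5, 0.0]])

# The rank of the stacked set is the dimension of the spanned subspace.
print(np.linalg.matrix_rank(vectors))  # 2: three vectors, rank two
```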
Figure 2.1 Schematic of expansion of an arbitrary vector f in two vectors e1 , e2 ,
which may nearly coincide in direction.
The expansion coefficients α i in (2.1) are obtained by taking the dot product
of (2.1) with each of the vectors in turn:
$$\sum_{i=1}^{N} \alpha_i \mathbf{e}_k^T \mathbf{e}_i = \mathbf{e}_k^T \mathbf{f}, \quad k = 1, 2, \ldots, N. \qquad (2.3)$$
The solution is particularly simple if the vectors are orthonormal,
$$\mathbf{e}_i^T \mathbf{e}_j = \delta_{ij},$$
but this requirement is not a necessary one. With a basis, the information contained
in the set of projections, eiT f = fT ei , is adequate then to determine the α i and thus
all the information required to reconstruct f is contained in the dot products.
The concept of “nearly dependent” vectors is helpful and can be understood
heuristically. Consider Fig. 2.1, in which the space is two-dimensional. Then the
two vectors e1 , e2 , as depicted there, are independent and can be used to expand
an arbitrary two-dimensional vector f in the plane, by solving a pair of simultaneous
equations for the expansion coefficients α 1 , α 2 .
The vectors become nearly parallel as the angle φ in Fig. 2.1 goes to zero; as long as
they are not identically parallel, they can still be used mathematically to represent f
perfectly. An important feature is that even if the lengths of e1, e2 , f are all order-one,
the expansion coefficients α 1 , α 2 can have unbounded magnitudes when the angle
φ becomes small and f is nearly orthogonal to both (measured by angle η).
The coefficient magnitudes become arbitrarily large as φ → 0. One can imagine a
situation in which α 1 e1 and α 2 e2 were separately measured and found to be very
large. One could then erroneously infer that the sum vector, f, was equally large.
This property of the expansion in non-orthogonal vectors potentially producing
large coefficients becomes important later (Chapter 5) as a way of gaining insight
into the behavior of so-called non-normal operators. The generalization to higher
dimensions is left to the reader’s intuition. One anticipates that as φ becomes very
small, numerical problems can arise in using these “almost parallel” vectors.
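The blow-up of the expansion coefficients is easy to demonstrate. In the following sketch (Python/NumPy, an illustration not taken from the text), two unit vectors separated by a small angle φ are used to expand a unit vector f nearly orthogonal to both:

```python
import numpy as np

phi = 1e-3                                   # small angle between e1, e2
e1 = np.array([1.0, 0.0])
e2 = np.array([np.cos(phi), np.sin(phi)])    # nearly parallel to e1
f = np.array([0.0, 1.0])                     # nearly orthogonal to both

# Solve E alpha = f for the expansion coefficients alpha.
E = np.column_stack([e1, e2])
alpha = np.linalg.solve(E, f)
print(alpha)  # coefficients of order 1/phi ~ 1000, although |f| = 1
```

The representation of f is still exact, but the two huge, nearly cancelling terms are exactly the numerical hazard the text describes.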
Gram–Schmidt process
One often has a set of p independent, but non-orthonormal vectors, hi , and it is con-
venient to find a new set gi , which are orthonormal. The “Gram–Schmidt process”
operates by induction. Suppose the first k of the hi have been orthonormalized to a
new set, gi . To generate vector k + 1, let
$$\mathbf{g}_{k+1} = \mathbf{h}_{k+1} - \sum_{j=1}^{k} \gamma_j \mathbf{g}_j. \qquad (2.9)$$
But a simple numerical perturbation usually suffices to render them so. In practice,
the algorithm is changed to what is usually called the “modified Gram–Schmidt
process” for purposes of numerical stability.2
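A minimal sketch of the modified process (Python/NumPy; the function name is mine, and normalization of each new vector is assumed):

```python
import numpy as np

def modified_gram_schmidt(H):
    """Orthonormalize the independent columns of H.

    The classical process subtracts all projections of the original
    h_{k+1}; the modified variant subtracts each projection from the
    partially reduced vector in turn, which is numerically more stable.
    """
    M, p = H.shape
    G = np.zeros((M, p))
    for k in range(p):
        v = H[:, k].astype(float)
        for j in range(k):
            v = v - (G[:, j] @ v) * G[:, j]  # remove the g_j component
        G[:, k] = v / np.linalg.norm(v)
    return G

H = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
G = modified_gram_schmidt(H)
print(np.allclose(G.T @ G, np.eye(2)))  # True: columns are orthonormal
```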
For the definition (2.10) to make sense, A must be an M × P matrix and B must
be P × N (including the special case of P × 1, a column vector). That is, the two
matrices must be “conformable.” If two matrices are multiplied, or a matrix and
a vector are multiplied, conformability is implied – otherwise one can be assured
that an error has been made. Note that AB ≠ BA, even where both products exist,
except under special circumstances. Define A2 = AA, etc. Other definitions of
matrix multiplication exist, and are useful, but are not needed here.
The mathematical operation in (2.10) may appear arbitrary, but a physical inter-
pretation is available: Matrix multiplication is the dot product of all of the rows of A
with all of the columns of B. Thus multiplication of a vector by a matrix represents
the projections of the rows of the matrix onto the vector.
Define a matrix, E, each of whose columns is the corresponding vector ei , and
a vector, α = {α i }, in the same order. Then the expansion (2.1) can be written
compactly as
f = Eα. (2.11)
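In this notation, recovering the expansion coefficients amounts to solving the linear system f = Eα; a small numerical sketch (Python/NumPy, illustrative and not from the text):

```python
import numpy as np

# Columns of E are the basis vectors e1 = [1, -1]^T, e2 = [1, 1]^T.
E = np.array([[ 1.0, 1.0],
              [-1.0, 1.0]])
f = np.array([3.0, 1.0])

alpha = np.linalg.solve(E, f)   # coefficients in f = E alpha
print(alpha)                    # [1. 2.]: f = 1*e1 + 2*e2 exactly
```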
Of the various possible vector norms, the l2 norm,
$$\|\mathbf{a}\|_2 = (\mathbf{a}^T \mathbf{a})^{1/2}, \qquad (2.12)$$
is most useful; often the subscript is omitted. Equation (2.12) leads in turn to the
measure of distance between two vectors, a, b, as
$$\|\mathbf{a} - \mathbf{b}\|_2 = \sqrt{(\mathbf{a} - \mathbf{b})^T (\mathbf{a} - \mathbf{b})}, \qquad (2.13)$$
which is the familiar Cartesian distance. Distances can also be measured in such a
way that deviations of certain elements of c = a − b count for more than others –
that is, a metric, or set of weights can be introduced with a definition,
$$\|\mathbf{c}\|_W = \sqrt{\sum_i c_i W_{ii} c_i}, \qquad (2.14)$$
depending upon the importance to be attached to magnitudes of different elements,
stretching and shrinking various coordinates. Finally, in the most general form,
distance can be measured in a coordinate system both stretched and rotated relative
to the original one
$$\|\mathbf{c}\|_W = \sqrt{\mathbf{c}^T \mathbf{W} \mathbf{c}}, \qquad (2.15)$$
where W is an arbitrary matrix (but usually, for physical reasons, symmetric and
positive definite,3 implying that cT Wc ≥ 0).
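The effect of the weights can be sketched numerically (Python/NumPy, illustrative): a diagonal W stretches individual coordinates as in (2.14), while a full symmetric positive-definite W also rotates the coordinate system as in (2.15).

```python
import numpy as np

c = np.array([1.0, 2.0])

W_diag = np.diag([4.0, 1.0])       # stretch coordinates: Eq. (2.14)
W_full = np.array([[2.0, 0.5],     # stretch and rotate: Eq. (2.15);
                   [0.5, 1.0]])    # symmetric, positive definite

def weighted_norm(c, W):
    return float(np.sqrt(c @ W @ c))

print(weighted_norm(c, np.eye(2)))   # plain Cartesian length, sqrt(5)
print(weighted_norm(c, W_diag))      # sqrt(1*4*1 + 2*1*2) = sqrt(8)
print(weighted_norm(c, W_full))
```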
$$\mathbf{A}\mathbf{e} = \lambda \mathbf{e}. \qquad (2.19)$$
In this set of linear simultaneous equations one seeks a special vector, e, such
that for some as yet unknown scalar eigenvalue, λ, there is a solution. An N × N
matrix will have up to N solutions (λi , ei ), but the nature of these elements and their
relations require considerable effort to deduce. We will look at this problem more
later; for the moment, it again suffices to say that numerical methods for solving
Eq. (2.19) are well-known.
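As an illustration of those numerical methods (a Python/NumPy sketch, not from the text), a library eigensolver returns all the pairs (λi , ei ) at once:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and the corresponding eigenvectors
# (the latter as columns of the second output).
lam, E = np.linalg.eig(A)

# Each pair satisfies A e = lambda e:
for i in range(len(lam)):
    assert np.allclose(A @ E[:, i], lam[i] * E[:, i])
print(sorted(lam))  # eigenvalues 1 and 3 for this symmetric matrix
```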
is usually adequate. Without difficulty, it may be seen that this definition is equivalent to
$$\|\mathbf{A}\|_2^2 = \max_{\mathbf{x}} \frac{\mathbf{x}^T \mathbf{A}^T \mathbf{A} \mathbf{x}}{\mathbf{x}^T \mathbf{x}} = \max_{\mathbf{x}} \frac{\|\mathbf{A}\mathbf{x}\|_2^2}{\|\mathbf{x}\|_2^2}, \qquad (2.21)$$
where the maximum is defined over all vectors x.5 Another useful measure is the
“Frobenius norm,”
$$\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} A_{ij}^2} = \sqrt{\mathrm{trace}(\mathbf{A}^T \mathbf{A})}. \qquad (2.22)$$
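Both norms are directly computable; a quick numerical check (Python/NumPy, illustrative) also confirms the trace form of (2.22):

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

spectral = np.linalg.norm(A, 2)        # ||A||_2: largest singular value
frobenius = np.linalg.norm(A, 'fro')   # ||A||_F: sum of squared entries

# The Frobenius norm equals sqrt(trace(A^T A)):
assert np.isclose(frobenius, np.sqrt(np.trace(A.T @ A)))
# and it bounds the spectral norm from above:
assert spectral <= frobenius
print(spectral, frobenius)             # sqrt(45) and sqrt(50)
```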
Neither norm requires A to be square. These norms permit one to derive various useful
results. Consider the following illustration. Suppose Q is square, and ‖Q‖ < 1; then
$$(\mathbf{I} + \mathbf{Q})^{-1} = \mathbf{I} - \mathbf{Q} + \mathbf{Q}^2 - \cdots, \qquad (2.23)$$
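The series can be verified numerically whenever ‖Q‖ < 1 (a Python/NumPy sketch, illustrative; the scaling factor 0.4 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 4))
Q *= 0.4 / np.linalg.norm(Q, 2)   # rescale so that ||Q||_2 = 0.4 < 1

# Accumulate the partial sums I - Q + Q^2 - Q^3 + ...
S = np.zeros((4, 4))
term = np.eye(4)
for _ in range(60):
    S += term
    term = -term @ Q              # next term of the alternating series

print(np.allclose(S, np.linalg.inv(np.eye(4) + Q)))  # True
```

The terms decay like 0.4ⁿ, so sixty of them reproduce the inverse to machine precision.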
If r, q are of the same dimension, the determinant of B = det (B) is the “Jacobian”
of r.7
The second derivative of a scalar,
$$\frac{\partial^2 s}{\partial \mathbf{q}^2} = \frac{\partial}{\partial \mathbf{q}} \left( \frac{\partial s}{\partial \mathbf{q}} \right) = \left\{ \frac{\partial^2 s}{\partial q_i \partial q_j} \right\} = \begin{bmatrix} \dfrac{\partial^2 s}{\partial q_1^2} & \dfrac{\partial^2 s}{\partial q_1 \partial q_2} & \cdots & \dfrac{\partial^2 s}{\partial q_1 \partial q_N} \\ \vdots & & & \vdots \\ \dfrac{\partial^2 s}{\partial q_N \partial q_1} & \cdots & \cdots & \dfrac{\partial^2 s}{\partial q_N^2} \end{bmatrix}, \qquad (2.26)$$
$$\int_{\text{all } X} p_x(X)\, dX = \int_{-\infty}^{\infty} p_x(X)\, dX = 1. \qquad (2.38)$$
The mean is the center of mass of the probability density. Knowledge of the true
mean value of a random variable is commonly all that we are willing to assume
known. If forced to “forecast” the numerical value of x under such circumstances,
often the best we can do is to employ ⟨x⟩. If the deviation from the true mean is
denoted x′, so that x = ⟨x⟩ + x′, such a forecast has the virtue that we are assured
the average forecast error, ⟨x′⟩, would be zero if many such forecasts are made. The
bracket operation is very important throughout this book; it has the property that if
a is a non-random quantity, ⟨ax⟩ = a⟨x⟩ and ⟨ax + y⟩ = a⟨x⟩ + ⟨y⟩.
Quantity ⟨x⟩ is the “first moment” of the probability density. Higher-order moments
are defined as
$$m_n = \langle x^n \rangle = \int_{-\infty}^{\infty} X^n p_x(X)\, dX,$$
where n are the non-negative integers. A useful theoretical result is that a knowledge
of all the moments is usually enough to completely define the probability density
itself. (There are troublesome situations with, e.g., non-existent moments,
as with the so-called Cauchy distribution, p x (X) = (2/π)(1/(1 + X²)), X ≥ 0,
whose mean is infinite.) For many important probability densities, including the
Gaussian, a knowledge of the first two moments, n = 1, 2, is sufficient to define
all the others, and hence the full probability density. It is common to define the
moments for n > 1 about the mean, so that one has
$$\mu_n = \langle (x - \langle x \rangle)^n \rangle = \int_{-\infty}^{\infty} (X - \langle x \rangle)^n p_x(X)\, dX.$$
The notation m̃₁ is used to distinguish the sample estimate from the true value, m₁.
On the other hand, if the experiment of computing m̃₁ from M samples could be
repeated many times, the mean of the sample estimates would be the true mean.
This conclusion is readily seen by considering the expected value of the difference
from the true mean:
$$\left\langle \bar{x}_M - \langle x \rangle \right\rangle = \left\langle \frac{1}{M} \sum_{i=1}^{M} X_i - \langle x \rangle \right\rangle = \frac{1}{M} \sum_{i=1}^{M} \langle X_i \rangle - \langle x \rangle = \langle x \rangle - \langle x \rangle = 0.$$
Such an estimate is said to be “unbiassed”: its expected value is the quantity one
seeks.
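Unbiassedness is a statement about averages of averages, which a simulation makes concrete (a Python/NumPy sketch; the Gaussian density and the particular numbers are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean, M, trials = 2.0, 10, 20000

# Each row is one "experiment": a sample mean from only M observations.
sample_means = rng.normal(true_mean, 1.0, size=(trials, M)).mean(axis=1)

# Any single sample mean scatters about the true mean with standard
# deviation ~ 1/sqrt(M) ~ 0.32, but the average over many repeated
# experiments fluctuates about the true mean itself: unbiassed.
print(sample_means.mean())   # close to 2.0
print(sample_means.std())    # ~ 0.32, the scatter of a single estimate
```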
The interpretation is that, for finite M, we do not expect that the sample mean
will equal the true mean, but that if we could produce sample averages from distinct
groups of observations, the sample averages would themselves have an average that
will fluctuate about the true mean, with equal probability of being higher or lower.
There are many sample estimates, however, some of which we encounter, where
the expected value of the sample estimate is not equal to the true value. Such
an estimator is said to be “biassed.” A simple example of a biassed estimator is the
“sample variance,” defined as
$$s^2 \equiv \frac{1}{M} \sum_{i=1}^{M} (X_i - \bar{x}_M)^2. \qquad (2.41)$$
For reasons explained later in this chapter (p. 42), one finds that
$$\langle s^2 \rangle = \frac{M-1}{M} \sigma^2 \neq \sigma^2,$$
and thus the expected value is not the true variance. (This particular estimate is
“asymptotically unbiassed,” as the bias vanishes as M → ∞.)
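The bias is easily seen in simulation (a Python/NumPy sketch; the Gaussian density and the particular numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, M, trials = 1.0, 5, 200000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, M))
xbar = x.mean(axis=1, keepdims=True)   # sample mean of each trial
s2 = ((x - xbar) ** 2).mean(axis=1)    # Eq. (2.41): divide by M

# The average of s^2 over many trials is (M-1)/M * sigma^2 = 0.8,
# not sigma^2 = 1: the estimator is biassed, though the bias
# disappears as M grows.
print(s2.mean())
```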
We are assured that the sample mean is unbiassed. But the probability that
x̄ M = ⟨x⟩, that is, that we obtain exactly the true value, is very small. It helps to
have a measure of the extent to which x̄ M is likely to be very far from ⟨x⟩. To do
so, we need the idea of dispersion – the expected or average squared value of some
quantity about some interesting value, like its mean. The most familiar measure of
dispersion is the variance, already used above, the expected fluctuation of a random
variable about its mean:
$$\sigma^2 = \langle (x - \langle x \rangle)^2 \rangle.$$