Data Assimilation: Mathematical Concepts and Instructive Examples
Rodolfo Guzzi
System Biology Group, University La Sapienza, Rome, Italy
SpringerBriefs in Earth Sciences
More information about this series at http://www.springer.com/series/8897
3 Sequential Interpolation
3.1 An Effective Introduction of a Kalman Filter
3.1.1 Linear System
3.1.2 Building up the Kalman Filter
3.2 More Kalman Filters
3.2.1 The Extended Kalman Filter
3.2.2 Sigma Point Kalman Filter (SPKF)
3.2.3 Unscented Kalman Filter (UKF)
References
5 Applications
5.1 Lorenz Model
5.1.1 Solution of Lorenz 63 Model
5.1.2 Lorenz Model and Data Assimilation
5.2 Biology and Medicine
5.2.1 Tumor Growth
5.2.2 Tumor Growth Data Assimilation with LETKF
5.2.3 LETKF Recipe Computation
5.3 Mars Data Assimilation: The General Circulation Model
5.3.1 Mars Data Assimilation: Methods and Solutions
5.4 Earthquake Forecast
5.4.1 Renewal Process as Forecast Model
5.4.2 Sequential Importance Sampling and Beyond
5.4.3 The Recipe of SIR
References
Appendix
Index
Chapter 1
Introduction Through Historical Perspective
Abstract The first approach to data assimilation methods started with the application of the Newtonian equations to the motion of the planets by Laplace and Gauss. However, only with the introduction of a probabilistic view did the dynamical equations become able to make a real forecast. The first numerical weather prediction (NWP) models, based on determinism, soon became probabilistic, mainly through the contribution of the group led by von Neumann at Princeton's Institute for Advanced Study. This chapter traces the history of the evolution of data assimilation methods, from the early deterministic algorithms to the most recent probabilistic methods such as ensemble forecasting. The fundamental work by Edward Lorenz and by the Los Alamos group, which developed the Monte Carlo method, is also reported.
Forecasts from models provide useful information to help predict the future state of a
system. However, predictions will inevitably diverge from reality as time progresses; this is an inescapable property of dynamical systems. Data assimilation (DA) is a
formal approach used to help correct forecasts by introducing information observed
from the environment.
The first dynamic data assimilation of the kind we describe in this book, one that requires the existence of data and of equations or dynamic models, was made to calculate the orbits of comets, the moving bodies par excellence. This is a classical two-body problem in which one seeks to determine the position and the velocity components of the body at a particular time (Principia, Book III, Prop. XLI).
The first method for finding the orbit of a comet moving in a parabola was tried by Newton [1], using three observations. He, however, wrote: "This being a problem of very great difficulty, I tried many methods of resolving it."
When the observations are taken from the surface of the Earth, the apparent position in space of the comet is not given and therefore the velocity components are not determined. This fact implies that the observer has to make more observations, at different times. During the interval of time, Earth and comet have moved, so the
where Ω is the longitude of the ascending node, i is the inclination to the plane of the ecliptic, ω is the longitude of the perihelion measured from the node, a is the semi-major axis which defines the size of the orbit, e is the eccentricity which describes the shape of the orbit, and T is the time of perihelion passage defining the position of the body at any time.
Since the functions involved are highly transcendental, the solution of Eq. (1.1) is not obtained by ordinary processes and is very complex, as mentioned by Newton. The fundamental elements described by Newton were explained by Laplace [2], in 1780, who also described the method of solution of Eq. (1.1) that has been the basis for the later works.
In 1809 Gauss [3], in Theoria Motus Corporum Coelestium, elaborated a technique, based on least squares, that led to the recovery of the orbit of Ceres, lost to sight when its light vanished in the rays of the Sun. Before them, Euler [4] in 1744, in Theoria Motuum Planetarum et Cometarum, had obtained an approximate solution, while in 1778 Lagrange [5] obtained an original solution which has never been used in practice.
Thus, from our point of view, the first dynamic data assimilation was made by Laplace and Gauss. Moulton's book [6] fully describes the methods they adopted to determine the orbit of a comet. Furthermore, Gauss's method lays the groundwork for the least squares method that became the foundation of data assimilation.
Previously, in 1805, Adrien-Marie Legendre [7] had also developed the method of least squares and introduced it in his book Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). The least squares method, established independently by Legendre and Gauss around 1800, also represents the passage from determinism to probability.
The basic assumption of physics from the time of Newton up to the Gauss period (late 1600s to 1800) was determinism. It establishes that the future state of a system is entirely determined by the present state of the system. The evolution of the system is governed by causal relationships such as those described, for example, by Newton's equations of motion.
Around the late 1600s the important concept of probability arose, as a measure of the likeliness that an event will occur. The first to demonstrate the efficacy of defining odds as the ratio of favorable to unfavorable outcomes was Gerolamo Cardano [8] in his book written around 1564, but the mathematics of probability was set by Jacob Bernoulli [9], one of the many prominent mathematicians in the Bernoulli family, in the book Ars Conjectandi (posthumous, 1713), and by Abraham de Moivre's Doctrine of Chances (1718).
In 1812, Laplace [10] published his Théorie analytique des probabilités, in which it became possible to compare the merits of different parameter values. For the first time the concept of estimation was introduced, in which acceptable values for the parameters of distributions, specified by hypotheses, are sought. The first half of this treatise was concerned with probability methods and problems, the second half with statistical methods and applications. The term inverse probability appeared in the 1837 paper of De Morgan [11], since Laplace's book did not use this term. The term inverse means that the probabilities of causes, or hypotheses, can be deduced from the frequency of events. It involves inferring backward from the data to the parameter, or from effects to causes. If no information is available when setting initial priors, a situation defined as a uniform prior, one sets all possible hypotheses to an equal initial prior probability. This concept was introduced first by Bayes [12] and later by Laplace [2], who stated the principle:
If an event can be produced by a number n of different causes, then the probabilities
of these causes, given the event, are to each other as the probabilities of the event
given the causes, and the probability of the existence of each of these is equal to the
probability of the event given that cause, divided by the sum of all of the probabilities
of the event given each of these causes.
Gauss combined Bernoulli's idea about probability with Laplace's principle of inverse probability and maximized the posterior density of the location parameter in the error distribution, assuming that the prior distribution is uniform. Requiring that the posterior mode be equal to the arithmetic mean, Gauss derived the normal distribution and thus gave a probabilistic justification for the method of least squares (Hald [13]).
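In modern notation the argument can be summarized as follows (a compact sketch added for clarity, not Gauss's original derivation): for independent observations y_i of a location parameter θ with error density f and a uniform prior,

\begin{align*}
p(\theta \mid y_1,\dots,y_n) &\propto \prod_{i=1}^{n} f(y_i - \theta), \\
\text{posterior mode} = \bar{y}\ \text{for every sample} &\;\Longrightarrow\; f(e) \propto \exp\!\big(-e^2/(2\sigma^2)\big), \\
\max_{\theta}\, p(\theta \mid y_1,\dots,y_n) &\;\Longleftrightarrow\; \min_{\theta} \sum_{i=1}^{n} (y_i - \theta)^2,
\end{align*}

that is, with normal errors the most probable value of the parameter is exactly the least squares estimate.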
By the mid-1700s, the problems of observational error had been mathematically described. Errors were supposed to be both positive and negative, and it was generally accepted that their frequency distribution followed a smooth symmetric curve.
Inverse probability, describing the probability distribution of an unobserved variable, is called Bayesian probability, and the problem of determining an unobserved variable (by whatever method) is called inferential statistics. The distribution of an unobserved variable given the data alone is defined by the likelihood function (which is not a probability distribution), while the distribution of an unobserved variable, given both data and a prior distribution, is the posterior distribution. The term likelihood was introduced by Fisher [14], who wrote:
This short paper to the Cambridge Philosophical Society was intended to introduce the notion of fiducial probability, and the type of inference which may be expressed in this measure. It opens with a discussion of the difficulties which had arisen from attempts to extend Bayes' theorem to problems in which the essential information on which Bayes' theorem is based is in reality absent, and passes on to relate the new measure to the likelihood function, previously introduced by the author, and to distinguish it from the Bayesian probability a posteriori.
Hald [13] has shown that the method of maximum likelihood was proposed by Daniel Bernoulli, whose paper on likelihood was reported by Kendall [19], but with no practical effect because the maximum likelihood equation for the error distribution was considered intractable. In English the word likelihood has been distinguished as being related to, but weaker than, probability since its earliest uses. The comparison of hypotheses by evaluating likelihoods has been used for centuries. Different authors have also used the term likelihood, from Christiaan Huygens to Charles Sanders Peirce [20], in whose work model-based inference (usually abduction but sometimes including induction) is distinguished from statistical procedures based on objective randomization.
Fisher [21] also used the term method of maximum likelihood estimation (MLE). In statistics, maximum likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, assuming that the heights of the individuals in a population are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while knowing only the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model).
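As a minimal illustration (a sketch added here, not taken from the book; the data are synthetic), the Gaussian MLE has a closed form given by the sample mean and the mean squared deviation:

import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=170.0, scale=7.0, size=500)   # synthetic sample of heights

# For a Gaussian model the likelihood is maximized analytically by the
# sample mean and the mean squared deviation (1/N, not 1/(N-1)).
mu_hat = heights.mean()
sigma2_hat = ((heights - mu_hat) ** 2).mean()

def neg_log_likelihood(mu, sigma2, x):
    """Negative log-likelihood of a Gaussian sample; minimal at (mu_hat, sigma2_hat)."""
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

print(mu_hat, sigma2_hat, neg_log_likelihood(mu_hat, sigma2_hat, heights))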
With the advent of statistical thermodynamics, due to James Maxwell (see Myrvold [22]) and Ludwig Boltzmann [23], the limitations of the deterministic view became clear. At the same time, however, Poincaré [24], Birkhoff (see Marston Morse [25]) and Lyapunov [26] continued their exploration along the lines of what came to be called dynamical systems, whose most recent approach was given by Edward Lorenz [27] with predictability: the butterfly effect and strange attractors.
Markov [28], Wiener (see Masani [29]) and Kolmogorov [30] developed stochastic dynamic prediction, related to processes that exhibit randomness in addition to the dynamics of the system. In probability theory, a stochastic process, or sometimes random process (widely used), is a collection of random variables representing the
evolution of some system of random values over time. This is the probabilistic coun-
terpart to a deterministic process (or deterministic system). Instead of describing a
process which can only evolve in one way (as in the case, for example, of solutions
of an ordinary differential equation), in a stochastic or random process there is some
indeterminacy: even if the initial condition (or starting point) is known, there are
several (often infinitely many) directions in which the process may evolve.
With the invention of the first meteorological instruments, the quantitative observation of meteorological phenomena was initiated in the 17th century. The first regular instrumental observation was planned in Paris by Blaise Pascal and carried out at the Puy de Dôme, where the first relation between the change of weather and the height of the mercury column in the barometer was observed. The first systematic observations were also made in London and Oxford. Previously, non-systematic observations had also been made in Florence by Galileo and his scholars, including Torricelli [39], in the first climate network, set up by Duke Ferdinand II of Tuscany (1610-1670). The first observations in
Fig. 1.1 I have used part of this figure to draw the structure of this book, changing it in the early part when probability studies arose. Source: Lewis [37]
Germany started in 1654, when the Jesuit college of Osnabrück received a Florentine thermometer. During the first part of the 18th century systematic observations were begun in Russia. Due to the heterogeneity of the instruments, and mainly because they were not standardized and calibrated, it was impossible to make any forecasts.
In 1763 the Palatine Academy of Sciences and Letters was founded in Mannheim; in 1780 it established a special meteorological class of the Academy, to be known as the Palatine Meteorological Society (Societas Meteorologica Palatina). Many observatories participated in the observations proposed by the Mannheim Society. The observations collected were published in Latin, which was the international language of science, in the journal of the Society, the Ephemerides. Although the Mannheim Society closed its activities in 1799, the concept had been initiated, and the first attempt to standardize observations could be extended to all ground-station observations everywhere.
The present meteorological network has evolved over 300 years with new instruments and new technologies. With the advent of satellites, the class 1 instruments, which make measurements at points, were joined by class 2 instruments, which sample an area or a volume remotely. Class 3 devices were subsequently introduced: instruments deriving wind velocities from Lagrangian trajectories.
While the instrumental error for traditional class 1 instruments can be defined within a certain accuracy, the instrumental errors for class 2 and 3 instruments are not so straightforward, because their results are derived from inversion methods and are therefore strongly dependent on the algorithms used.
Another type of error has also been reported: the error of representativeness. Petersen and Middleton [40] defined a representative observation as a datum pertaining to a particular point and time, which is the result of an optimum filtering operation on the continuous raw data field, under the criterion of minimum average mean square error of reconstruction from the subsequently sampled values on a given space-time lattice.
Starting from this definition, one can say that the error of representativeness is a measure of the error due to misrepresentation of the space-time scale of the observation. Such an error can arise from an incorrect spacing of the stations in the observation network, in the case of hazards or natural phenomena, or, in the case of micro-scale observations, from an insufficient density of observations to capture the essence of the phenomenon under investigation.
Petersen [41] defines the observation process as a set of measurements of the physical field made at various space-time points by means of devices having certain physical and statistical features. The main requirement is that the totality of this information be instantaneous in time and, when counted in units of real numbers, be finite. The criterion by which these estimates are to be evaluated is their mean square error, or deviation from the true value of the field variable at the given location and time. The optimal solution to this problem is then that the estimate should coincide with the mean value, or expectation, of the variable, conditional upon the available data.
Surface stations, radiosondes and pilot balloons are irregularly distributed over land and on some remote islands. They are therefore complemented by the meteorological satellites Meteosat (Europe), GOES (USA), Elektro-L1 (Russia), MTSAT-1R (Japan), INSAT (India), Feng-Yun (China) and by Earth Observation Satellite constellations such as the NASA A-Train or the forthcoming European (ESA) constellation named Sentinel.
These constellations provide data that can be used together to obtain comprehensive information about atmospheric components or processes that are happening at the same time. By combining the information collected simultaneously from several sources, and from the ground and airborne networks, one obtains a more complete answer to many questions than would be possible from any one satellite taken alone at different times.
Data assimilation methods used in Numerical Weather Prediction, where gridded measurements are present and model processes are available, have also been developed in other fields: earthquakes, chemical processes in the atmosphere, planetary circulation, robotics and biological growth. The applications refer to methods for updating the state vector (initial condition) of a complex space-time model by combining new observations with one or more prior forecasts; here the changing space-time observations require proper sampling, and the observation representativeness and related errors become crucial both to follow and to forecast the phenomenon under investigation.
Reviewing the concepts expressed above, it is possible to better define what assimilation is and what its cycle consists of. The assimilation cycle can be defined as:
1. data checking;
2. proper interpolation of data;
3. initialization of the assimilation model;
4. short forecast to prepare the background field.
Let us now reverse our approach and look first of all at the assimilation model instead of the observation data. In general, a dynamic system is described by a system of nonlinear partial differential equations (PDEs) whose numerical solution is discrete. If the system is well-posed in the sense of Hadamard, there is a forward discrete operator that yields the solution. Let us note that a dynamic system evolves with time, but that not all of the state variables, within the state vector, have equal information content, and not all state variables are known to the same precision. It is therefore desirable that the observations made both contain the maximum information content possible and allow the system's state to be characterized with a minimum uncertainty.
Furthermore, in order to pass from the observation location to the grid points of the
model some interpolations are performed. In fact the first-guess fields defined at the
grid points of the forecasting model are interpolated to the observation location, while
differences between the observation and the interpolated value are then interpolated
back onto the grid points to define a correction.
Although the continuum state of the observations is the most natural candidate to be assimilated into a model, the true state to be estimated is a projection of the
continuum state on a discrete space. This refers to the best possible state represented
by the model, which is what we are trying to approximate. Thus the production of an
accurate image of the true state of the system at a given time, represented in a model
as a collection of numbers, is called analysis. An analysis can be useful in itself as
a comprehensive and self-consistent diagnostic of the system. It can also be used as
input data to another operation, notably as the initial state for a numerical forecast,
or as a data retrieval to be used as a pseudo-observation. It can provide a reference
against which to check the quality of observations.
There are two basic approaches: sequential assimilation, which only considers observations made in the past up to the time of analysis, which is the case of real-time assimilation systems, and non-sequential, or retrospective, assimilation, where observations from the future can be used, for instance in a reanalysis exercise.
Since the model has a lower resolution than reality, even the best possible analysis will never be entirely realistic. Thus, even if the observations have no instrumental error and the analysis is equal to the true state, there will be some unavoidable discrepancies between the observed values and their equivalents in the analysis, because of representativeness errors. Although we will often treat these errors as part of the observation errors in the mathematical equations, one should keep in mind that they depend on the model discretization, not only on instrumental problems.
Summarizing, the necessary objective information that we can use to produce
an analysis is a collection of observed values provided by observations of the true
state. If the model state is overdetermined by the observations, then the analysis
reduces to an interpolation problem. In several cases, the analysis problem is under-
determined because data are sparse and only indirectly related to the model variables,
as happens in remote sensing measurements. In order to make it a well-posed problem
it is necessary to rely on some background information in the form of an a priori
estimate of the model state. Physical constraints on the analysis problem can also
help. The background information can be a trivial state; it can also be generated
from the output of a previous analysis, using some assumptions of consistency in
time of the model state, like stationarity (hypothesis of persistence) or the evolution
predicted by a forecast model.
An interesting paper by Lewis [37] outlines the history of the numerical weather
forecasting, from the first deterministic solutions up to the most recent approaches
based on Monte Carlo methods.
Around 1950, it was clear that the dynamics of weather could be evaluated using the same equations of dynamics developed in other fields. The theoretical treatment of the scales of motion in the atmosphere, initially based on a deterministic set of equations, proposed by Jule Charney [42, 43], can be considered the first successful step toward Numerical Weather Prediction (NWP). At Princeton's Institute for Advanced Study, under the guidance of John von Neumann, Jule Charney [43] and his team made two successful 24 h forecasts of the transient features of the large-scale flow, initialized on 30 January and 13 February 1949, even though other 24 h forecasts (e.g. that of 5 January 1949) were not particularly useful, as reported by Lewis [37] in his paper.
The reactions of the meteorological community were positive, even though the limits of deterministic prediction were clear: they are governed by the growth of errors, because the solution depends both on the initial state, which is generally erroneous, and on the models, which are by their nature imperfect. Furthermore, the features of the causal laws, characterized by unstable systems and non-periodicity, limited the predictability of the system.
Whereas Gauss and contemporaries had found that the two-body problem of
celestial mechanics tolerated, in the initial state, small errors, meteorological pre-
diction under non-periodic constraints would be found to be less forgiving of these
uncertainties [37].
In the first years, the major theme was the development of the models and, of course, the identification and correction of systematic errors. That the deterministic forecast was imperfect had already been pointed out by Eady [44] in the Compendium of Meteorology:
we never know what small perturbations may exist below a certain margin
of error. Since the perturbations may grow at an exponential rate, the margin of
the error in the forecast (final) state will grow exponentially as the period of the
forecast is increased, and this possible error is unavoidable whatever our method of
forecasting if we are to glean any information at all about developments beyond
the limited time interval, we must extend our analysis and consider the properties of
the set or ensemble (corresponding to the Gibbs-ensemble of statistical mechanics)
of all possible developments. Thus, long range forecasting is necessarily a branch
of statistical physics in its widest sense: both our questions and answers must be
expressed in terms of probabilities.
In 1962 Edward Lorenz [45], using a truncated version of the two-level quasi-geostrophic model described by Lorenz himself [46], found that the work suggested the existence of deterministic systems, governed by equations whose nonlinearity resembles the nonlinearity of the atmosphere, which are not perfectly nor almost perfectly predictable by simple and simply determined linear formulas, if the period between successive observations is greater than half of the shortest significant period of observation.
Lorenz speculated that, by linear regression, a one-day forecast should be good, but forecasts for successive days should be poor. An unexpected result occurred when Lorenz inadvertently introduced truncation errors in the model he was running. The small error affected the third decimal place instead of the sixth, but it amplified so much during the simulation that the signal was covered: "I found this very exciting because this implied that, if the atmosphere behaved this way, long range forecasting was impossible."
The contribution by Lorenz [27], using a system of three ordinary differential equations derived from a simplified system of Barry Saltzman to study finite amplitude convection, laid the foundation for the field of chaotic systems (see Chap. 5).
Assimilation is a word that covers different meanings, from the linguistic to the cultural, from the conversion of nutrients in biology to the incorporation of new concepts into existing schemes in psychology, up to technical assimilation in meteorology or climate, robotics and, recently, biological processes. For these technical cases we need to append the word data. This book covers this last aspect by analyzing dynamic data assimilation.
In this frame, data assimilation is a set of mathematical techniques allowing us to use all the information available to us within a time frame, including observational data and any prior information we may have in the form of a deterministic model describing our system and encapsulating our theoretical understanding of it. The mathematical basis is estimation theory, or the theory of inverse problems, that is, an organized set of mathematical techniques for obtaining useful information about the physical world on the basis of observations.
In a conventional problem, one would use a set of known prior parameters to predict the state of a physical system. This approach is usually called a forward problem, whereas in the inverse problem one attempts to use the available observations of the state of the system to estimate poorly known parameters of the state itself. In both cases, data assimilation can be treated as a Bayesian system.
Bayes' theorem, or the law of inverse probability, allows us to combine prior information about the parameters with the information contained in the observations, to guide the statistical inference process. The reason why data assimilation is so effective is that it seeks to produce an analysis that fits a set of observations taken over a time frame, not just the observations made at one instant in time, subject to the strong constraint that the evolution of the analyzed quantities is governed by a deterministic model describing the observed system.
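Written explicitly for a state x and a set of observations y (a standard statement of the theorem, added here for reference), the law reads

\[
p(\mathbf{x} \mid \mathbf{y}) \;=\; \frac{p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})}{p(\mathbf{y})} \;\propto\; p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}),
\]

where p(x) plays the role of the background (prior) information, p(y | x) is the likelihood of the observations given the state, and the posterior p(x | y) is the distribution from which the analysis is drawn.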
Data assimilation has been applied to several fields, from numerical weather prediction, where it began, to planetary climate analysis, up to the evolution of biological cells or robotic systems. One begins with a forecast model, often called the background. In order to make a useful prediction, the background must frequently be updated with noisy and sparse measurements. This procedure updates the background in light of the new observations to produce an analysis, which, under suitable assumptions, is the maximum likelihood estimate of the model state vector. Subsequently, the model is restarted from the analysis and provides a new background or forecast.
Data assimilation and the model forecast can thus be combined into an observing system that merges the observation data with the data produced by the model to produce an optimal estimate of the evolving state of the system. The model provides texture to the observed data, allowing one to interpolate or extrapolate them into regions of space and time in which they are missing. Moreover, the observed data adjust the trajectory of the model through the state of the model itself, keeping it on track in a loop of prediction, comparison and correction.
There are at least two basic approaches to data assimilation: a sequential one, where we consider observations made in the past up to the time of analysis, which is the case of real-time systems, and a non-sequential one, where later observations can also be used. Methods may be intermittent or continuous in time. Intermittent assimilation typically works on a cycle of 6 h, and observations are processed in batches. Continuous assimilation processes observations over a longer run; the state correction is smoothed over time, allowing one to obtain a more realistic analysis.
From a practical point of view, one uses a model, the so-called direct model, to connect the input parameters to the output parameters. Mathematically one writes:

y = H(x)   (1.2)

where x represents the collection of all the variables that describe the state of the model. In this way one can compare the values H(x) obtained from the model with the observations y, estimating the error of the model itself. The measured data can come from different sources, from in situ measurements or from satellites. The numerical core of assimilation then reduces to a minimization problem where the cost function J is:

J = ||y − H(x)||^2   (1.3)
References
24. Poincaré, H.: The Foundations of Science. Science Press, New York. Reprinted in 1921 (1902-1908); this book includes the English translations of Science and Hypothesis (1902), The Value of Science (1905), Science and Method (1908)
25. Marston, M.: Bull. Amer. Math. Soc. 52(5), Part 1, 357-391 (1946) (see Project Euclid)
26. Lyapunov, A.M.: The general problem of the stability of motion. Translated by Fuller, A.T. Taylor and Francis, London. ISBN: 978-0-7484-0062-1. Reviewed in detail by Smith, M.C.: Automatica 1995, 3(2), 353-356 (1992)
27. Lorenz, E.N.: Deterministic non-periodic flow. J. Atmos. Sci. 20, 130-141 (1963)
28. Markov, A.A.: Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Reprinted in Appendix B of: Howard, R.: Dynamic Probabilistic Systems, vol. 1, Markov Chains. Wiley (1971)
29. Masani, P. (ed.): The Mathematical Work of Norbert Wiener, vol. 4. MIT Press, Cambridge. This contains a complete collection of Wiener's mathematical papers with commentaries (1976)
30. Kolmogorov, A.: Grundbegriffe der Wahrscheinlichkeitsrechnung (in German). Julius Springer, Berlin. Translation: Kolmogorov, A. (1956). Foundations of the Theory of Probability, 2nd edn. Chelsea, New York (1933)
31. Kalman, R.E., Bucy, R.S.: New results in linear filtering and prediction theory. Trans. Am. Soc. Mech. Eng., J. Basic Eng. Ser. D 83, 95-108 (1961)
32. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44(247), 335-341 (1949)
33. Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Wea. Rev. 124, 85-96 (1996)
34. Burgers, G., van Leeuwen, P.J., Evensen, G.: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev. 126, 1719-1724 (1998)
35. Hamill, T., Mullen, S., Snyder, C., Toth, Z., Baumhefner, D.: Ensemble forecasting in the short to medium range: report from a workshop. Bull. Am. Meteor. Soc. 81, 2653-2664 (2000)
36. Evensen, G.: Data Assimilation: The Ensemble Kalman Filter, 2nd edn., p. 320. Springer (2009)
37. Lewis, J.M.: Roots of ensemble forecasting. Am. Meteorol. Soc. 133(7), 1865-1885 (2005)
38. Gillispie, C. (ed.): Dictionary of Scientific Biography, vol. 18. Scribner, New York (1981)
39. Torricelli, E.: http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Torricelli.html
40. Petersen, D.P., Middleton, D.: On representative observations. Tellus XV(4) (1963)
41. Petersen, D.P.: On the concept and implementation of sequential analysis for linear random fields. Tellus XX(4), 673-686 (1968)
42. Charney, J.: On the scale of the atmospheric motions. Geofys. Publ. 17(2), 17 (1948)
43. Charney, J.: The use of the primitive equations of motion in numerical weather prediction. Tellus 7, 22-26 (1955)
44. Eady, E.: The quantitative theory of cyclone development. In: Malone, T. (ed.) Compendium of Meteorology, pp. 464-469. American Meteorological Society (1951)
45. Lorenz, E.N.: The statistical prediction of solutions of dynamic equations. In: Proceedings of the International Symposium on Numerical Weather Prediction, pp. 629-634. Meteorological Society of Japan, Tokyo (1962)
46. Lorenz, E.N.: Energy and numerical weather prediction. Tellus XII(4), 364-373 (1960)
47. GARP: GARP topics. Bull. Am. Meteor. Soc. 50, 136-141 (1969)
48. Epstein, E.: Stochastic dynamic prediction. Tellus 21, 739-759 (1969)
Chapter 2
Representation of the Physical System
representation of the same, to which the dynamic equations of motion are applied in
order to extrapolate into future time.
Thus the first step in the mathematical formulation of the analytical problem is the definition of the work space. The goal is to find a true state vector that is the projection of the infinite dimensional space of the field vector onto the finite dimensional space of its numerical representation.
In dynamic meteorology and related disciplines, the forecasts of physical field variables and the mathematical models of field dynamics are expressed by a system of nonlinear partial differential equations (PDEs) whose prognostic state variables at time t_k are collected in the vector x_k.
Assuming the governing PDEs to be well-posed in the sense of Hadamard, there is a unique solution operator, or time dependent propagator, g that yields the solution x_k given the solution x_{k-1} at an earlier time, over the interval t_k − t_{k-1}. Omitting the time index t_k, from now on represented by the index k, we have:

x_k = g(x_{k-1})   (2.1)

x_k^d = f(x_{k-1}^d),   (2.2)

Using the previously defined operators, applying the projection operator to both sides of Eq. 2.1 and adding and subtracting f(x_{k-1}^t), we obtain the discrete evolution equation for x_k^t,

where f is the discrete propagator that resides in our numerical solution and the forcing term e_{k-1}^t is the model error from time t_{k-1} to t_k, that is:
Here h_k^c is the continuum forward observation operator and e_k^m is the measurement error, considered stochastic, whose mean ē_k^m ≡ E[e_k^m] is a measure of bias and whose covariance matrix is R_k ≡ E[(e_k^m − ē_k^m)(e_k^m − ē_k^m)^T]; E[·] is the expectation operator.¹
The observation operator can be considered linear when the state variables are directly observed, as in the case of devices located at ground measurement stations and of radiosondes, and nonlinear when the data come from remotely sensed devices that require proper integro-differential algorithms to be interpreted.
¹ The expectation of a random variable is defined as the sum of all values the random variable may take, each weighted by the probability with which that value is taken. In terms of a formula, the expectation E[x] is given by:

E[x] = ∫_{−∞}^{+∞} x f(x) dx   (2.7)

This is also called the mean value of x, or first moment. The second moment is given by the quantity E[x²] = ∫_{−∞}^{+∞} x² f(x) dx. The variance of a random variable is the mean squared deviation of the random variable from its mean; it is σ² = E[x²] − E[x]². Another important concept is the statistical correlation between random variables, which is given by the covariance, the expectation of the product of the deviations of two random variables from their means: cov(x, y) = E[(x − E[x])(y − E[y])].
Let us now formulate the stochastic dynamic model by adding and subtracting h_k(x_k^t) in relation (2.6), taking into account that the discrete true state x_k^t is the projection of the continuum state x_k. The discrete observation model is then obtained:

where h_k is the discrete forward operator acting on x_k^t and e^{obs} ≡ e_k^r + e_k^m is the total observation error. The measurement error is e_k^m, while the representativeness error (see Lorenc [2]) e_k^r is given by the difference between the continuum forward operator and its discrete formulation, that is:
The impact of this error on our system will become clearer when we address the initialization problem in the next section.
Since our goal is to study a physical system described by the state vector x_k^t, and since this vector is unknown, one assumes that the best estimate of the system is given by the state vector x_k^b, where b stands for background, denoting the a priori or background estimate of the true state before the analysis is carried out, valid at the same time. This vector is the result of a data assimilation or statistical analysis performed earlier. By Eq. 2.9 the observations performed on the system bring new information through the operator h_k. One assumes the statistics of the observation error are known up to the second order moments. When new observations are available we can improve the analysis, obtaining our estimate x_k^a with its error. The suffix a means analysis. In geophysics it is usual to define the a priori estimate as the forecast/background and the posterior estimate as the analysis.
Let us now explore the background and analysis errors. The background error is defined as:

e_k^b = x_k^b − x_k^t   (2.11)

It reflects the discrepancy between the a priori estimate and the unknown truth. It is considered stochastic, and its mean is ē^b = E[e_k^b]. The background error covariance is B = E[(e_k^b − ē_k^b)(e_k^b − ē_k^b)^T]. The analysis error is defined as the difference between the analysis and the truth, e_k^a = x_k^a − x_k^t; the related analysis error covariance is A = E[(e_k^a − ē_k^a)(e_k^a − ē_k^a)^T], with mean ē^a = E[e_k^a]. All these matrices are symmetric and positive definite.
In order to understand whether we can linearize our system, we need to analyze the role played by the representativeness error in the model. If we write the representativeness error by adding and subtracting h_k^c(x_k) in relation (2.10),

that operates on the unresolved portion (I − Π)x_k of the continuum state x_k. H_k^c is the tangent linear operator, or Jacobian matrix, of the nonlinear operator. If we define the corresponding linear operator as H, we can write H_k^c = H.
Thus the vector of observations y is related to the state through the observation operator H and the observation error e by:

y = Hx^t + e.   (2.17)
Given the observation Eq. 2.17, we obtain that the estimation error of the analysis is:

If we assume the errors and observations are unbiased, the related expectations are E[e] = 0 and E[e^b] = 0, and thus E[e^a] = (L + KH − I)E[x^t]. If, on the contrary, there is a bias, it is always possible to diagnose it and subtract its value from the total observation errors to make the corrected error unbiased. If we postulate that:

L = I − KH,   (2.20)

x^a = (I − KH)x^b + Ky
x^a = x^b + K(y − Hx^b),   (2.21)
As previously, we assume the errors and observations are unbiased, so that the analysis covariance matrix is P^a = E[e^a (e^a)^T]. Recalling the background error covariance matrix P^b, the observation error covariance matrix R and the relation (2.20), we have:

P^a = E[e^a (e^a)^T]
    = E[(e^b + K(e − He^b))(e^b + K(e − He^b))^T]
    = E[(Le^b + Ke)(Le^b + Ke)^T]
    = E[Le^b (e^b)^T L^T] + E[Ke e^T K^T]
    = L P^b L^T + K R K^T
    = (I − KH) P^b (I − KH)^T + K R K^T   (2.23)
Following Bouttier and Courtier [4], Trace[P^a] is a continuously differentiable scalar function of the coefficients of K; its first order variation in K, obtained from the difference Trace[P^a](K + L') − Trace[P^a](K), where L' is an arbitrary test matrix, is:

The last line shows that the derivative is zero for any choice of L' if (HP^bH^T + R)K^T − HP^b = 0, which is equivalent to:
Now we need to improve our knowledge of the state x^a by taking into account the two available sources of information: the model and the observations. There are two ways to combine observations with the model:
1. the observations y, which can be sparse in time and space, may be interpolated subject to constraints provided by the model;
2. the uncertainties of the model H on the input x may be reduced, under the constraints of the measurements.
We define a cost function J that is a measure of the distance between the observations and the model:

J(x) = ||y − H(x)||^2,   (2.27)

where || · || is the 2-norm. Since we need to balance each component through the confidence in the measurements, we can make use of the a priori estimate or background information. In this way we introduce a compromise between the information given by the observations and the information given by the background value. The cost function can then be defined as

where the two weighting coefficients express the confidence given to the observations and to the background. Those parameters can be defined empirically or analytically, knowing the background and observation errors.
A simple example drawn from Bouttier and Courtier [4], based on temperature and its error variance, shows that the cost function terms, which have a quadratic form, tend to pull the analysis x^a toward the background x^b and the observation y, respectively. In this case x^a makes J(x) as small as possible, given the computational constraints.
The quadratic form (2.28) can be written in matrix form as:

J(x) = ½ {(y − H(x))^T R^{-1} (y − H(x)) + (x − x^b)^T (P^b)^{-1} (x − x^b)},   (2.29)
This requires that the first derivative, the gradient, of the cost function with respect to its variable x be equal to zero at the analysis x^a:

∇J(x^a) = dJ/dx |_{x=x^a} = 0   (2.31)
Rearranging this equation and applying again the linear assumption, we have:

The analysis state x^a is called optimal because it is closest, in a root mean square sense, to the true state x^t. The equivalence of this relation with relation (2.21) can also be shown:
K = P^b H^T (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} ((P^b)^{-1} + H^T R^{-1} H) P^b H^T (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} (H^T + H^T R^{-1} H P^b H^T) (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (R + HP^bH^T) (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}.   (2.36)
The equivalence can be useful because sometimes the inversion of ((P^b)^{-1} + H^T R^{-1} H) is more costly than that of the matrix R + HP^bH^T.
Summarizing, we have:

x^a = x^b + K[y − H(x^b)]
K = P^b H^T [R + HP^bH^T]^{-1}.   (2.37)
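Relations (2.21), (2.23) and (2.37) translate directly into a few lines of linear algebra. The sketch below (an illustration with small random stand-in matrices, not an operational implementation) computes the gain, the analysis and the analysis error covariance for a linear observation operator:

import numpy as np

n, m = 4, 2                                  # state and observation dimensions (arbitrary)
rng = np.random.default_rng(1)

xb = rng.normal(size=n)                      # background state
Pb = np.diag([1.0, 1.0, 2.0, 2.0])           # background error covariance
H = rng.normal(size=(m, n))                  # linear observation operator
R = 0.5 * np.eye(m)                          # observation error covariance
y = H @ xb + rng.normal(scale=0.5, size=m)   # synthetic observations

S = H @ Pb @ H.T + R                         # innovation covariance
K = Pb @ H.T @ np.linalg.inv(S)              # gain, Eq. (2.37)

xa = xb + K @ (y - H @ xb)                   # analysis, Eq. (2.21)

I = np.eye(n)
Pa = (I - K @ H) @ Pb @ (I - K @ H).T + K @ R @ K.T   # analysis error covariance, Eq. (2.23)

print(xa)
print(np.diag(Pa))

In realistic systems the gain is never formed explicitly; this is one motivation for the optimal interpolation and variational methods discussed below.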
where e(r) = x^b(r) − x^t(r) is the forecast error, x^t(r) represents the true value of the state vector and x^b(r) is the forecast state vector. Therefore P^b(r_i, r_j) can be decomposed into a block matrix whose submatrices contain the error covariances between the individual field variables of the model.
Bouttier and Courtier [4] provide the fundamental assumptions and procedures to be followed.
The fundamental hypothesis of optimal interpolation (OI) is that, for each variable of the model, only a few observations are important in determining the analysis increment. From this it follows that one should (a schematic implementation is sketched below):
1. for each variable of the model x(i), choose a small number p_i of observations using an empirical selection policy;
2. form the corresponding list of the p_i background departures [y − H(x^b)]_i, the p_i background error covariances between the model variable x(i) and the model state interpolated at the p_i observation points (namely the p_i coefficients of the i-th row of P^bH^T), and the (p_i × p_i) covariance submatrix of observation and background errors formed by the restriction of HP^bH^T and R to the selected observations;
3. invert the positive-definite (p_i × p_i) matrix formed from [R + HP^bH^T] for the selected observations (e.g. using Cholesky factorization or LU methods);
4. multiply it by the i-th row of P^bH^T to get the required row of K.
In optimal interpolation it is necessary that P^b be specified in a form that can be easily applied to any pair of observed value and model variable, or to any pair of observed variables. The simplicity of OI comes with the disadvantage that there is no consistency between small and large scales and that H must be linear.
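A schematic implementation of the procedure listed above might look as follows (an illustrative sketch; the selection policy, taking the p nearest observations for each model point, is only one of the possible empirical choices):

import numpy as np

def oi_analysis(xb, Pb, H, R, y, pos_model, pos_obs, p=3):
    """Optimal interpolation: update each model variable using only its p nearest observations."""
    xa = xb.copy()
    d = y - H @ xb                      # background departures
    PbHT = Pb @ H.T                     # covariances between model variables and observations
    S_full = H @ Pb @ H.T + R           # observation-space covariance
    for i in range(xb.size):
        sel = np.argsort(np.abs(pos_obs - pos_model[i]))[:p]   # empirical selection policy
        S = S_full[np.ix_(sel, sel)]
        k_row = np.linalg.solve(S, PbHT[i, sel])               # row of the gain for variable i
        xa[i] = xb[i] + k_row @ d[sel]
    return xa

# tiny synthetic test: 8 grid points, 5 point observations
n, m = 8, 5
idx = np.linspace(0, n - 1, m).astype(int)
H = np.zeros((m, n))
H[np.arange(m), idx] = 1.0
Pb = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 2.0)
print(oi_analysis(np.zeros(n), Pb, H, 0.1 * np.eye(m), np.ones(m),
                  np.arange(n, dtype=float), idx.astype(float)))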
There are several methods to minimize the cost function; the most relevant are discussed below. If the cost function is quadratic and convex, its minimum is unique. In general, however, J exhibits several minima. In such a case the problem is more difficult to solve, even though some algorithms exist, among which we select the conjugate gradient and quasi-Newton methods.
The minimization algorithms start from an initial point x_0 and construct a sequence x_k which converges to a local minimum. At each step k one determines a direction d_k to define the next point of the sequence. The problem of minimizing a multivariable function is thus usually solved by determining a search direction vector d_k and solving a linear (one-dimensional) minimization problem along it. If x_k is the vector containing the variables to be determined and d_k is the vector of the search direction, at each iteration step the minimization problem for a function f is formulated so as to find the step size α that minimizes f(x_k + α d_k), where α is a positive real number. At the next iteration, x_k is replaced by x_k + α_k d_k and a new search direction is determined.
The conjugate gradient method is an algorithm for finding the nearest local minimum of a function using conjugate directions for the descent. Two vectors u and v are said to be conjugate, with respect to a matrix A, if

u^T A v = 0,   (2.39)
where A is the Hessian matrix of the cost function. In the book by Press et al. [5] there are two conjugate gradient methods, by Fletcher-Reeves and by Polak-Ribière.
These algorithms calculate the mutually conjugate directions of search with
respect to the Hessian matrix of the cost function directly from the function and
the gradient evaluations, but without the direct evaluation of the Hessian matrix. The
new search direction d_{k+1} is determined by using

d_{k+1} = −g_{k+1} + β_k d_k,   (2.40)

where d_k is the previous search direction and g_{k+1} is the local gradient at iteration step k + 1. The coefficient β_k is determined by the Fletcher-Reeves equation

β_k = (g_{k+1} · g_{k+1}) / (g_k · g_k)   (2.41)

or by the Polak-Ribière equation

β_k = ((g_{k+1} − g_k) · g_{k+1}) / (g_k · g_k).   (2.42)
If the vicinity of the minimum has the shape of a long, narrow valley, the minimum is reached in far fewer steps than would be the case using the steepest descent method, which uses the negative of the local gradient as the search direction. The line minimization that finds the step size α minimizing f(x_k + α d_k) at every iteration step can be done by using the golden section search algorithm [5].
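A compact nonlinear conjugate gradient routine following Eqs. (2.40)-(2.42) might be sketched as follows (illustrative only; a simple backtracking line search is used here in place of the golden section search mentioned above):

import numpy as np

def conjugate_gradient_min(f, grad, x0, max_iter=200, tol=1e-8):
    """Nonlinear conjugate gradient (Fletcher-Reeves) with a backtracking line search."""
    x = x0.copy()
    g = grad(x)
    d = -g                                     # first direction: steepest descent
    for _ in range(max_iter):
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):   # Armijo condition
            alpha *= 0.5
            if alpha < 1e-12:
                break
        x_new = x + alpha * d
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves, Eq. (2.41)
        # beta = ((g_new - g) @ g_new) / (g @ g)   # Polak-Ribiere alternative, Eq. (2.42)
        d = -g_new + beta * d                  # new search direction, Eq. (2.40)
        x, g = x_new, g_new
    return x

# quadratic test problem with minimum at (1, 2)
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = A @ np.array([1.0, 2.0])
print(conjugate_gradient_min(lambda x: 0.5 * x @ A @ x - b @ x,
                             lambda x: A @ x - b,
                             np.zeros(2)))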
For the problem of minimizing a multivariable function, quasi-Newton methods are also widely used. These methods involve an approximation of the Hessian matrix of the function, or of its inverse. The L-BFGS (Limited memory Broyden-Fletcher-Goldfarb-Shanno) method is basically a method to approximate the inverse Hessian matrix in the quasi-Newton method of optimization. It is a variation of the standard BFGS method, which is given by (Nocedal [6], Byrd et al. [7])
x_{k+1} = x_k − α_k H_k g_k,   k = 1, 2, 3, …   (2.43)

where α_k is a step length, g_k is the local gradient of the cost function, and H_k is the approximate inverse Hessian matrix, which is updated at every iteration by means of the formula

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T   (2.44)

where

ρ_k = 1 / (q_k^T s_k)   (2.45)

and

V_k = I − ρ_k q_k s_k^T   (2.46)

s_k = x_{k+1} − x_k   (2.47)

and

q_k = g_{k+1} − g_k   (2.48)
Using this method, instead of storing the matrices H_k, one stores a certain number of pairs {s_k, q_k} that define them implicitly. The product H_k g_k is obtained by performing a sequence of inner products involving g_k and the most recent vector pairs {s_k, q_k} that define the iteration matrix.
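The product H_k g_k can be computed with the classical two-loop recursion over the stored pairs; the sketch below assumes the usual scalar initial approximation H_0 = γ_k I (an assumption made for the example, not spelled out in the text above):

import numpy as np

def lbfgs_direction(g, s_pairs, q_pairs):
    """Return -H_k g from the stored pairs {s_k, q_k} via the two-loop recursion."""
    q = g.copy()
    alphas = []
    for s, qv in zip(reversed(s_pairs), reversed(q_pairs)):   # newest to oldest
        rho = 1.0 / (qv @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * qv
    if s_pairs:                                               # scalar initial inverse Hessian
        s, qv = s_pairs[-1], q_pairs[-1]
        gamma = (s @ qv) / (qv @ qv)
    else:
        gamma = 1.0
    r = gamma * q
    for (s, qv), a in zip(zip(s_pairs, q_pairs), reversed(alphas)):   # oldest to newest
        rho = 1.0 / (qv @ s)
        beta = rho * (qv @ r)
        r = r + s * (a - beta)
    return -r                                                 # search direction -H_k g_k

A full limited-memory optimizer keeps only the most recent pairs (typically a handful), updates them after every accepted step, and combines this direction with a line search as in Eq. (2.43).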
and we can conclude that E[y − Hx^a] = 0. Since the background error is unbiased, the innovation is also unbiased,

which means that the background and its error are uncorrelated. From Eq. 2.21 we have

Introducing the true state x^t of the model into (2.31) we have:

Thus

Multiplying the right side of this equation by its transpose, taking into account Eq. 2.23 and computing the expectation, we have:

Assuming that the background errors and those of the observations are uncorrelated, by simplifying we obtain:
P^a = [½ J''(x)]^{-1}   (2.62)

or inversely

½ J''(x) = (P^a)^{-1}   (2.63)

where J''(x) denotes the Hessian (second derivative) of the cost function.
The basic principle of 3D-Var is to avoid explicitly computing the gain matrix and its inverse by using a minimization procedure for the cost function J. In this case the solution of Eq. (2.29) is obtained iteratively, performing several evaluations of the cost function and of its gradient, and reaching the minimum with a suitable descent algorithm. The minimization is stopped by limiting the number of iterations and by requiring that the norm of the gradient ∇J(x^a) decrease by a predefined amount during the minimization; this is an intrinsic measure of how much closer the analysis is to the optimal value than the starting point of the minimization.
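Putting the pieces together, a toy 3D-Var analysis can be obtained by handing the cost function (2.29) and its gradient to any descent algorithm. The sketch below (an illustration with small random stand-in matrices) uses the L-BFGS implementation available in SciPy and checks the result against the explicit-gain solution (2.37):

import numpy as np
from scipy.optimize import minimize

n, m = 6, 3
rng = np.random.default_rng(2)
xb = rng.normal(size=n)                  # background
B = np.eye(n)                            # background error covariance (Pb)
H = rng.normal(size=(m, n))              # linear observation operator
R = 0.25 * np.eye(m)                     # observation error covariance
y = H @ rng.normal(size=n)               # synthetic observations

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):        # cost function, Eq. (2.29)
    dx, dy = x - xb, y - H @ x
    return 0.5 * (dy @ Rinv @ dy + dx @ Binv @ dx)

def gradJ(x):    # gradient of (2.29) for a linear observation operator
    return Binv @ (x - xb) - H.T @ Rinv @ (y - H @ x)

xa_var = minimize(J, xb, jac=gradJ, method="L-BFGS-B").x

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)      # explicit gain, Eq. (2.37)
xa_gain = xb + K @ (y - H @ xb)
print(np.allclose(xa_var, xa_gain, atol=1e-4))    # both routes give the same analysis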
When the observations are distributed over time, the 3D-Var approach is generalized to the 4D-Var approach. The equations are the same, provided the operators are generalized to include a forecasting model that allows comparing the state of the model with the observations at a given time k.
In a given time interval the cost function to be minimized is the same as in 3D-Var, with the difference that the operators H and R now refer to the successive times along a partial trajectory, i.e. the index k ranges over the assimilation window. Thus, according to the relation given by Lorenc [2]:

J = ½ Σ_{k=0}^{N} [y_k − H_k(x_k)]^T R_k^{-1} [y_k − H_k(x_k)] + ½ [x − x_0^b]^T (P^b)^{-1} [x − x_0^b],   (2.64)
where x is the forecast at time zero produced by the data assimilation for k < k_0; P^b is the error covariance matrix at k = 0; and x_0^b is the value of x before the first iteration of the descent algorithm. The second term on the right is meant to pull the forecast x_0^b towards the previous prediction x, so as to reduce the time discontinuity. In the classical recipe (see [8]) the 4D-Var assimilation problem is subject to a strong constraint, namely that the sequence of model states x_k must be a solution of the equation:

x_k = M_{0→k}(x),   (2.65)
where M_{0→k} is the forecast model from the starting time to time k. 4D-Var is a nonlinear optimization problem that is difficult to solve except under the following two hypotheses: causality and the tangent linear approximation.
1. Causality
The forecast model can be expressed as the product of intermediate forecasting steps that reflect the randomness of nature. The integration of a prognostic model starts with the initial condition x_0 = x, so that M_0 is the identity. Thus, indicating with M_k the forecasting step from time k − 1 to time k, we have x_k = M_k x_{k-1} and, by recurrence:

x_k = M_k M_{k-1} … M_1 x_0.   (2.66)
2. Tangent linear
The cost function can be made quadratic by assuming that the operator M can be linearized, that is:

³ The adjoint operators have been introduced to reduce the size and the number of matrix multiplications and to be able to calculate the cost function. Algebraically, this means replacing a set of matrices with their transposes, hence the name adjoint techniques.
∇J_o = Σ_{k=0}^{N} ∇J_{o_k}
     = Σ_{k=0}^{N} M_1^T … M_k^T H_k^T d_k
     = H_0^T d_0 + M_1^T [H_1^T d_1 + M_2^T [H_2^T d_2 + … + M_N^T H_N^T d_N] …].   (2.68)
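The nested bracket in (2.68) is exactly a backward (adjoint) recursion along the trajectory. The sketch below illustrates it for the simplest possible setting: linear model and observation operators given as plain matrices and d_k taken as the R^{-1}-weighted departure (these are assumptions made for the example, not the book's general operators):

import numpy as np

def fourdvar_obs_gradient(x0, Ms, Hs, Rinvs, ys):
    """Gradient (w.r.t. the initial state) of the observation term of 4D-Var,
    computed with the adjoint recursion of Eq. (2.68). Ms[k] maps time k to k+1."""
    xs, x = [x0], x0
    for M in Ms:                       # forward sweep: model trajectory
        x = M @ x
        xs.append(x)
    # weighted departures d_k = R_k^-1 (y_k - H_k x_k)
    ds = [Rinv @ (y - H @ xk) for xk, H, Rinv, y in zip(xs, Hs, Rinvs, ys)]
    adj = Hs[-1].T @ ds[-1]            # innermost bracket
    for k in range(len(Ms) - 1, -1, -1):   # backward (adjoint) sweep
        adj = Hs[k].T @ ds[k] + Ms[k].T @ adj
    # gradient of (1/2) * sum_k (y_k - H_k x_k)^T R_k^-1 (y_k - H_k x_k)
    return -adj

# tiny example: two steps, 2-variable state, the full state observed at every time
M = [np.array([[1.0, 0.1], [0.0, 1.0]])] * 2
H = [np.eye(2)] * 3
Rinv = [np.eye(2)] * 3
y = [np.array([1.0, 0.0])] * 3
print(fourdvar_obs_gradient(np.zeros(2), M, H, Rinv, y))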
Before showing why assimilation is an inverse problem, let us introduce the concept of well and ill posed problems in the sense of Hadamard. Given an operator A, we wish to solve the following system of equations:

g = Af.   (2.69)
Let us now apply the approach to a simplified, or toy, model representing the sea circulation in a well defined ocean basin (Bennett [11]):

∂u/∂t + c ∂u/∂x = F,   (2.70)

on the domain 0 ≤ x ≤ L and in the time interval 0 ≤ t ≤ T, where c is a known positive constant. Let F = F(x, t) be the forcing field. An initial condition is given by:

u(x, 0) = I(x),   (2.71)

and a boundary condition by:

u(0, t) = B(t),   (2.72)

where B(t) is given. In order to evaluate the uniqueness of the solution, we assume that for given F, I and B there exist two solutions u_1 and u_2. Defining the difference v = u_1 − u_2, we have:
∂v/∂t + c ∂v/∂x = 0,   (2.73)

with initial and boundary conditions, respectively, v(x, 0) = 0 and v(0, t) = 0. The solution can be obtained using the method of characteristics, by which partial differential equations (PDEs) are reduced to ordinary differential equations (ODEs). The characteristic equations are:

dx/ds = c
dt/ds = 1.   (2.74)

The PDE transformed into an ODE is

dv/ds = 0.   (2.75)

On the basis of the initial condition and the boundary conditions the solution is:

v(x, t) = 0,   (2.76)

so that u_1 = u_2 and the solution is unique.
Let us now verify the other two conditions for well-posedness. Using the Green function one defines G = G(x, t, ξ, τ) by

∂G/∂s − c ∂G/∂x = δ(x − ξ) δ(t − τ),   (2.77)

that is an explicit solution for the forward model. Relation (2.78) indicates that u depends on F, I and B with continuity: if they change by O(ε), u also changes accordingly. Furthermore, it is required that I(0) = B(0), otherwise u is discontinuous along x = ct for all t. On the basis of these evaluations one can deduce that the model is well posed.
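For concreteness, the forward problem (2.70)-(2.72) can be integrated numerically. The sketch below uses a first-order upwind scheme (one possible discretization, chosen only for illustration and assuming c > 0):

import numpy as np

def advect(I, B, F, c, L, T, nx=200, nt=400):
    """Integrate u_t + c u_x = F on [0, L] x [0, T], with u(x,0) = I(x), u(0,t) = B(t).
    First-order upwind scheme for c > 0; the CFL condition c*dt/dx <= 1 must hold."""
    dx, dt = L / nx, T / nt
    x = np.linspace(0.0, L, nx + 1)
    u = np.asarray(I(x), dtype=float)
    for n in range(nt):
        t = n * dt
        u[1:] = u[1:] - c * dt / dx * (u[1:] - u[:-1]) + dt * F(x[1:], t)
        u[0] = B(t + dt)                      # inflow boundary condition
    return x, u

# example: zero forcing and initial condition, sinusoidal inflow at x = 0
x, u = advect(I=lambda x: 0.0 * x,
              B=lambda t: np.sin(2.0 * np.pi * t),
              F=lambda x, t: 0.0 * x,
              c=1.0, L=1.0, T=1.0)
print(u[:5])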
Let us see what happens to the forward model when we introduce information on the field u(x, t) of the proposed circulation model. This information consists of imperfect observations at isolated points in space and time. The direct model then becomes indeterminate: it cannot be solved by a smooth function and must therefore be considered an ill-posed problem, to be resolved through a best fit weighted with all the information we hold.
Let us assume we collect M measurements (observations, data, etc.) of u in our basin, with 0 ≤ x ≤ L, during the cruise time 0 ≤ t ≤ T. The data are collected at (x_i, t_i), with i = 1, …, M, and will be indicated by the recorded value u_i with its error as:

u_i = u(x_i, t_i) + e_i,   (2.79)
where e_i is the measurement error and u(x_i, t_i) is the true value. Since the boundary conditions and the initial conditions are affected by errors, the equation should be written taking into account the error f on the forcing F:

∂u/∂t + c ∂u/∂x = F + f,   (2.80)

with

u(x, 0) = I(x) + i(x)   (2.81)

and

u(0, t) = B(t) + b(t).   (2.82)
The problem is now to obtain a unique solution for each choice of F + f , I + i and
B + b. This can be done by looking for the field u = u(xi , ti ) that minimizes errors.
One looks for the minimum of the cost (or penalty) function J, in which the error standard deviations of the a priori functions have been introduced as weights: W_f for the model (forcing), W_i for the initial condition, W_b for the boundary conditions and w for the observations:
J = J[u] = W_f \int_0^T dt \int_0^L f(x,t)^2\,dx + W_i \int_0^L i(x)^2\,dx + W_b \int_0^T b(t)^2\,dt + w \sum_{m=1}^{M} e_m^2,  (2.83)
where W_f, W_i, W_b and w are positive weights. The cost function J[u] is a number for each choice of the entire field u. Rewriting (2.83) and making explicit the dependence of f, i, b and e_m on F, I, B and u_i, we have:
J(u) = W_f \int_0^T dt \int_0^L \left\{\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} - F\right\}^2 dx + W_i \int_0^L \{u(x,0) - I(x)\}^2\,dx + W_b \int_0^T \{u(0,t) - B(t)\}^2\,dt + w \sum_{m=1}^{M} \{u(x_m, t_m) - u_m\}^2,  (2.84)
the solution of which can be obtained using the calculus of variations, as reported in the Appendix, both for the weak-constraint and the strong-constraint case. Since J is quadratic in u, it is non-negative and the local extremum must be the global minimum [11].
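A direct numerical transcription of the penalty function (2.84) on a grid might read as follows; the Riemann-sum quadrature and the discrete derivatives are illustrative choices only, and this sketch evaluates J for a candidate field, it is not the variational solution discussed in the Appendix.

import numpy as np

def cost_J(u, x, t, F, I, B, obs, c=1.0, Wf=1.0, Wi=1.0, Wb=1.0, w=1.0):
    """Discrete evaluation of the penalty function (2.84).
    u       : array (nt, nx) with the candidate field u(x, t)
    F       : array (nt, nx) with the forcing
    I, B    : arrays with the initial (nx) and boundary (nt) data
    obs     : list of (i_x, i_t, value) measurement triples."""
    dx, dt = x[1] - x[0], t[1] - t[0]
    du_dt = np.gradient(u, dt, axis=0)
    du_dx = np.gradient(u, dx, axis=1)
    model_residual = du_dt + c * du_dx - F
    Jf = Wf * np.sum(model_residual**2) * dx * dt     # model (forcing) term
    Ji = Wi * np.sum((u[0, :] - I)**2) * dx           # initial-condition term
    Jb = Wb * np.sum((u[:, 0] - B)**2) * dt           # boundary-condition term
    Jo = w * sum((u[it, ix] - val)**2 for ix, it, val in obs)  # observations
    return Jf + Ji + Jb + Jo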
References

Chapter 3
Sequential Interpolation
Abstract In this chapter, the stochastic dynamical system describing the evolution of a physical system is analyzed. The genesis of the Kalman filter and its subsequent evolutions are outlined to give a complete overview of this algorithm. The Extended Kalman Filter, the Sigma Point Kalman Filter and the Unscented Kalman Filter are also reported. The more recent and advanced filters, such as the Ensemble Kalman Filters, will be reported in the next chapter.
In the previous chapter, we focused our attention on the optimal estimator as well as its uncertainties, given some a priori information, the background, and the observation set, but we have not mentioned the temporal dimension of the problem. However, our interest lies in the evolution of the forward model for the system state defined between times t_k and t_{k+1}. In this chapter, the stochastic dynamical system describing the evolution of a physical system is analyzed.
The Kalman filter is an algorithm that, given a set of measurements, arrives at the estimate of a given quantity by a recursive process. To achieve this objective, the filter processes all available information, regardless of its accuracy, by using:
1. the knowledge of the system and its dynamics;
2. the statistical description of the noise associated with the system, the measurement error and the uncertainty of the dynamic models used;
3. all available information about the initial conditions of the variables of interest.
The concept of optimal estimate refers to the fact that the Kalman filter [1] provides the best estimate that can be made on the basis of all the information obtained through the three items mentioned above. The concept of recursive indicates that it is not necessary to store all previous information and measurements and reprocess them whenever a new measurement is taken. An example of how the Kalman filter works is given in Fig. 3.1. The Kalman filter combines all available measurement data plus the a priori knowledge about the system and measurement devices to produce an estimate of the desired variables, so that the error is statistically minimized.
Fig. 3.1 Kalman filter process. The Kalman filter keeps track of the estimated state of the system
and its variance or uncertainty. The updated estimate is obtained by a state transition model and
measurements
The original derivation of the Kalman filter was not, however, in terms of Bayesian rules and did not require the use of any probability density function. The only assumptions were that the random variables of the system should be estimated through the sequential updating of the first- and second-order moments (mean and covariance) and that the estimator should be linear.
However, since the best approach is the one defined by Bayesian inference, we will introduce this concept in the most effective way possible, deferring the most complete version to the next chapters. From a Bayesian point of view, then, the Kalman filter propagates the probability density of the given quantities conditioned by the knowledge of the actual data coming from the measurement devices. The term conditioned is associated with the probability density, indicating that its shape and its location on the x axis depend on the values of the measurements taken. The shape of the conditional probability density contains the level of uncertainty, given by the variance, of our knowledge of the value of x, together with the mean, the median and the mode. The Kalman filter propagates the probability
density for those problems where the system can be described by a linear model in
which the system error and the noise are white and Gaussian. Under these conditions
the mean, the mode and the median are the same.
Let us introduce the definition of linear and nonlinear systems, together with the concept of state space. Its representation is given by a mathematical model of a physical system with a set of input, output and state variables related by first-order differential equations. The state-space representation is a convenient and compact way to model and analyze a system with multiple inputs and outputs. It is built with the state variables of the system dynamics as coordinates. In engineering it is called the state space, whereas in physics it is called phase space.
The state of a system, at any time, is represented by a point in space. Starting
from an initial position the point, corresponding to the state, moves in space and this
movement is completely determined by the equations of state. The path of the point
is called the orbit or trajectory of the system. It starts from given conditions. These
trajectories are obtained from the solutions of the state equations.
A continuous-time linear system can be described by first-order differential equations with the associated output:
\dot{x}(t) = F(t)\,x(t) + G(t)\,u(t), \qquad y(t) = C(t)\,x(t),  (3.1)
where x(t) is the state vector; F is the system matrix; G is the input matrix and C is the output matrix. All these matrices have the appropriate sizes. The dot on x indicates the time derivative; u is the control vector and y is the output vector. Even though the matrices are time dependent, the system is still linear.
If F, G, C are constant, the solution of Eq. (3.1) is:
x(t) = \exp[F(t - t_0)]\,x(t_0) + \int_{t_0}^{t} \exp[F(t - \tau)]\,G\,u(\tau)\,d\tau,
y(t) = C\,x(t),  (3.2)
where t_0 is the initial time of the system, which is often zero. In the case of zero input, relation (3.2) reduces to
x(t) = \exp[F(t - t_0)]\,x(t_0), \qquad y(t) = C\,x(t),  (3.3)
where the matrix exponential is defined by the series
\exp[Ft] = \sum_{j=0}^{\infty} \frac{(Ft)^j}{j!}.  (3.4)
Let us now define \Delta t = t_k - t_{k-1} and \tau' = \tau - t_{k-1}; substituting into (3.5) we obtain:
x(t_k) = \exp[F\Delta t]\,x(t_{k-1}) + \int_0^{\Delta t} \exp[F(\Delta t - \tau)]\,G\,u(t_{k-1})\,d\tau
= \exp[F\Delta t]\,x(t_{k-1}) + \exp[F\Delta t]\int_0^{\Delta t} \exp[-F\tau]\,G\,u(t_{k-1})\,d\tau.  (3.6)
At this point, it is necessary to compute the integral of the matrix exponential. It can be simplified if F is invertible:
\int_0^{\Delta t} \exp[-F\tau]\,d\tau = \int_0^{\Delta t} \sum_{j=0}^{\infty}\frac{(-F\tau)^j}{j!}\,d\tau = [I - \exp(-F\Delta t)]\,F^{-1},  (3.7)
where we have replaced the time steps previously denoted by k and k - 1 with t_k and t_{k-1}.
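A small sketch of the discretization implied by (3.6)-(3.7), assuming F is invertible and the input is held constant over the step, can be written with SciPy's matrix exponential; the damped-oscillator system below is an arbitrary example, not taken from the text.

import numpy as np
from scipy.linalg import expm

def discretize(F, G, dt):
    """Discretize x' = Fx + Gu over a step dt with piecewise-constant input:
    A = exp(F dt),  B = (A - I) F^{-1} G  (requires F invertible, cf. (3.7))."""
    A = expm(F * dt)
    B = (A - np.eye(F.shape[0])) @ np.linalg.inv(F) @ G
    return A, B

# Damped oscillator driven by a scalar input
F = np.array([[0.0, 1.0], [-1.0, -0.1]])
G = np.array([[0.0], [1.0]])
A, B = discretize(F, G, dt=0.1)
print(A, B)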
Let us assume now that the discretized dynamical system is the one given in (3.8), to which we have added w_{k-1}, the white noise associated with the process. Thus we have:
x_k = A_{k-1}\,x_{k-1} + B_{k-1}\,u_{k-1} + w_{k-1},  (3.9)
where x, u and w are, respectively, the state vector, the input vector and the noise of the process at time k - 1, each of size n \times 1; A is the transition matrix of the process state from k - 1 to k, time stationary and of size n \times n; B is the input matrix; u_{k-1} is the input vector at time step k - 1; w_{k-1} is the uncertainty. The index k gives the state of the system at time k.
The observation vector is:
y_k = H_k\,x_k + v_k,  (3.10)
where y_k is the observation related to the measurement of x at time k, of size m \times 1; H_k is the matrix connecting the state vector to the measurement vector. It may change from one measurement to another, but usually it is taken to be stationary. It does not contain the noise and its size is m \times n; v_k is the measurement error, it has zero cross-correlation with the process noise, and its size is m \times 1; w_k \sim N(0, Q_k) and v_k \sim N(0, R_k).
In summary: the matrix A links the current state with the previous one; B is the process control; H_k is the measurement operator. Let us now define x^f as the a priori estimate or, in Bayesian terms, the forecast. It evolves as
x_k^f = A_{k-1}\,x_{k-1}^a + B_{k-1}\,u_{k-1}.  (3.12)
A second source of information is given by the data y_k themselves. In order to obtain the optimal filter we need to minimize the mean square error, provided the error of the system is Gaussian.
Let us assume the noise models are stationary in time, with covariances of the type
E[w_i\,w_j^T] = Q\,\delta_{ij}  (3.13)
and
E[v_i\,v_j^T] = R\,\delta_{ij}  (3.14)
and
E[v_i\,w_j^T] = 0.  (3.15)
The a priori (forecast) error covariance is then
P_k^f = E[e_k^f\,(e_k^f)^T] = E[(x_k - x_k^f)(x_k - x_k^f)^T].  (3.16)
Assuming the a priori estimate of x_k^a is x_k^f (see Fig. 3.2), it is possible to write an equation for the new estimate that depends on the a priori estimate:
x_k^a = x_k^f + K_k\,(y_k - H_k\,x_k^f),  (3.18)
where K_k is the Kalman gain, which we will derive shortly, and the term (y_k - H_k\,x_k^f) is known as the innovation. Substituting relation (3.10) into (3.18) we get:
x_k^a = x_k^f + K_k\,(H_k\,x_k + v_k - H_k\,x_k^f),  (3.19)
P_k^a = E\{[(I - K_k H_k)(x_k - x_k^f) - K_k v_k]\,[(I - K_k H_k)(x_k - x_k^f) - K_k v_k]^T\}.  (3.21)
Since the a priori error estimate (x_k - x_k^f) is not correlated with the measurement noise, the expectation can be written as
P_k^a = (I - K_k H_k)\,E[(x_k - x_k^f)(x_k - x_k^f)^T]\,(I - K_k H_k)^T + K_k\,E[v_k v_k^T]\,K_k^T.  (3.22)
Taking into account relation (3.14) we get the update covariance equation
P_k^a = (I - K_k H_k)\,P_k^f\,(I - K_k H_k)^T + K_k R K_k^T,  (3.23)
where P_k^f is the a priori estimate of P_k^a.
As is known, the diagonal of the covariance matrix contains the mean square errors, and since the trace of a matrix is the sum of its diagonal values, the trace of the covariance matrix is the sum of the mean square errors. The mean square error can therefore be minimized by minimizing the trace of the covariance matrix. First of all we take the derivative of P_k^a with respect to K_k and then set it equal to zero.
Expanding Eq. (3.23) we get:
P_k^a = P_k^f - K_k H_k P_k^f - P_k^f H_k^T K_k^T + K_k (H_k P_k^f H_k^T + R) K_k^T.  (3.24)
Noting that the trace of a matrix (Trace[\cdot]) is equal to the trace of its transpose, we can write:
Trace[P_k^a] = Trace[P_k^f] - 2\,Trace[K_k H_k P_k^f] + Trace[K_k (H_k P_k^f H_k^T + R) K_k^T],  (3.25)
\frac{d\,Trace[P_k^a]}{dK_k} = -2\,(H_k P_k^f)^T + 2\,K_k\,(H_k P_k^f H_k^T + R) = 0.  (3.26)
The update covariance is obtained from (3.23), in which we substitute the Kalman gain matrix given by (3.27):
P_k^a = (I - K_k H_k)\,P_k^f.  (3.29)
This result gives us the update relation of the covariance matrix with the optimal gain. Equations (3.20), (3.19) and (3.29) give an estimate of the variable x_k. The projection of the next state is obtained from:
x_{k+1}^f = A_k\,x_k^a + B_k\,u_k.  (3.30)
To complete the recursion, we need to map the error covariance matrix onto the next step k + 1. We get it by defining an expression for the a priori error:
e_{k+1}^f = x_{k+1} - x_{k+1}^f = (A_k x_k + B_k u_k + w_k) - (A_k x_k^a + B_k u_k) = A_k e_k^a + w_k.  (3.31)
Extending the definition of P_k^f to time k + 1 we get:
P_{k+1}^f = E[e_{k+1}^f (e_{k+1}^f)^T] = E[(A_k e_k^a)(A_k e_k^a)^T] + E[w_k w_k^T] = A_k P_k^a A_k^T + Q,  (3.32)
which completes the recursive filter. Let us note that e_k and w_k have zero cross-correlation, because the noise w_k accumulates between k - 1 and k, while the error e_k is the current error up to time k.
In summary, the Kalman filter is an algorithm that, given an initial estimate, produces a gain, the so-called Kalman gain, which generates an updated estimate and an updated variance that are projected onto the next step.
In practice the filter operates sequentially in k: at time k we make an estimate x_k^a of x_k, with an error covariance matrix S_k. Equation (3.28) defines the Kalman gain. The stochastic Eq. (3.9) is used to build the a priori estimate and its variance at time k + 1. This result is then combined with the measurement made at the same time, using the maximum-estimate equation, to produce an updated state. The Kalman gain matrix is functionally identical to the maximum a posteriori estimate (MAP).
The algorithm can be summarized (dropping the time index on A, B, H) as:
1. forecast the state:
x_k^f = A\,x_{k-1}^a + B\,u_{k-1};  (3.33)
2. forecast the covariance:
P_k^f = A\,P_{k-1}^a\,A^T + Q;  (3.34)
3. compute the gain:
K_k = P_k^f H^T (H P_k^f H^T + R)^{-1};  (3.35)
4. update the state:
x_k^a = x_k^f + K_k\,(y_k - H x_k^f);  (3.36)
5. update the covariance:
P_k^a = (I - K_k H)\,P_k^f.  (3.37)
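The five steps (3.33)-(3.37) translate almost line for line into code. The following minimal sketch assumes time-invariant A, B, H, Q, R, as in the summary above; the one-dimensional random-walk example at the end is invented purely to exercise the function.

import numpy as np

def kalman_step(xa, Pa, y, u, A, B, H, Q, R):
    """One cycle of the linear Kalman filter, Eqs. (3.33)-(3.37)."""
    # Forecast (3.33)-(3.34)
    xf = A @ xa + B @ u
    Pf = A @ Pa @ A.T + Q
    # Gain (3.35)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    # Update (3.36)-(3.37)
    xa_new = xf + K @ (y - H @ xf)
    Pa_new = (np.eye(len(xa)) - K @ H) @ Pf
    return xa_new, Pa_new

# One-dimensional random walk observed directly
A = np.eye(1); B = np.zeros((1, 1)); H = np.eye(1)
Q = 0.01 * np.eye(1); R = 0.1 * np.eye(1)
xa, Pa = np.zeros(1), np.eye(1)
for y in [0.9, 1.1, 1.0]:
    xa, Pa = kalman_step(xa, Pa, np.array([y]), np.zeros(1), A, B, H, Q, R)
print(xa, Pa)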
There are some cases in which the linearity hypothesis is not valid. For instance,
when the observation operator is nonlinear as it is in the case of satellite observation,
where the radiance operators are taken into account, or in the case of a forward
nonlinear model as it is in atmospheric chemistry.
This means we need to adapt the Kalman filter to handle the potential nonlinearity of these operators. In these cases we can use the Extended Kalman Filter (EKF). The evolution of the state of the system is given by (3.8), which in discrete nonlinear form is:
x_k = f(x_{k-1}, u_{k-1}, w_{k-1}),  (3.38)
where w_{k-1} is a random perturbation of the system whose distribution has zero mean and covariance given by the matrix Q_k.
The measurement is given by:
y_k = h(x_k, v_k).  (3.39)
The forecast step is like the one previously done for the Kalman filter:
x_k^f = f(x_{k-1}^a, u_{k-1}, 0),  (3.40)
y_k^f = h(x_k^f, 0).  (3.41)
Linearizing f around the current estimate gives:
f(x, u, w) \approx f(x_{k-1}^a, u_{k-1}, 0) + A_{k-1}\,(x - x_{k-1}^a) + W_{k-1}\,(w - 0).  (3.42)
Note that we assume u_k is known, so it does not need to be linearized. A_{k-1} and W_{k-1} are the Jacobian matrices
A_{i,j,k} = \frac{\partial f_i(x_{k-1}^a, u_{k-1}, 0)}{\partial x_j},  (3.43)
W_{i,j,k} = \frac{\partial f_i(x_{k-1}^a, u_{k-1}, 0)}{\partial w_j}.  (3.44)
With this approach we have the approximated covariance matrix for x_k^f:
P_k^f = A_{k-1}\,P_{k-1}^a\,A_{k-1}^T + W_{k-1}\,Q_{k-1}\,W_{k-1}^T.  (3.45)
Analogously, the observation operator is linearized through the Jacobians
H_{i,j,k} = \frac{\partial h_i(x_k^f, 0)}{\partial x_j}  (3.46)
and
V_{i,j,k} = \frac{\partial h_i(x_k^f, 0)}{\partial v_j}.  (3.47)
Thus, the forecast errors are
e_{x_k}^f = x_k - x_k^f  (3.48)
and
e_{z_k}^f = y_k - y_k^f.  (3.49)
We do not know x_k, but we expect that x_k - x_k^f is relatively small. We can now linearize the function f(\cdot) in order to obtain an approximation for e_{x_k}^f:
e_{x_k}^f = f(x_{k-1}, u_{k-1}, w_{k-1}) - f(x_{k-1}^a, u_{k-1}, 0)  (3.50)
e_{x_k}^f \approx A_{k-1}\,(x_{k-1} - x_{k-1}^a) + \epsilon_k,  (3.51)
where \epsilon_k is distributed as N(0, W_{k-1} Q_{k-1} W_{k-1}^T), taking into account the effect of the process noise. Similarly, for the observation error,
e_{z_k}^f \approx H_k\,e_{x_k}^f + \eta_k,  (3.53)
e_{x_k}^a = K_k\,(y_k - y_k^f),  (3.54)
and after setting x_k^a = x_k^f + e_{x_k}^a and using the same derivation as for the Kalman gain, we obtain the optimal gain for the EKF,
K_k = P_k^f H_k^T (H_k P_k^f H_k^T + V_k R_k V_k^T)^{-1},  (3.55)
from which we can obtain the covariance for the updated estimate x_k^a:
P_k^a = (I - K_k H_k)\,P_k^f.  (3.56)
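A compact sketch of the EKF cycle of Eqs. (3.40)-(3.56) is given below, using simple finite-difference Jacobians in place of analytic ones and treating the noise as additive (so the V Jacobian is not formed explicitly); the scalar logistic model and the square-root sensor are invented test functions, not part of the text.

import numpy as np

def jacobian(fun, x, eps=1e-6):
    """Finite-difference Jacobian of fun evaluated at x."""
    fx = fun(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        J[:, j] = (fun(x + dx) - fx) / eps
    return J

def ekf_step(xa, Pa, y, f, h, Q, R):
    """One EKF cycle: forecast (3.40), covariance propagation (3.45),
    gain and update in the spirit of (3.55)-(3.56), with additive noise."""
    A = jacobian(f, xa)
    xf = f(xa)
    Pf = A @ Pa @ A.T + Q
    H = jacobian(h, xf)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa_new = xf + K @ (y - h(xf))
    Pa_new = (np.eye(xa.size) - K @ H) @ Pf
    return xa_new, Pa_new

# Logistic growth observed through a square-root sensor (illustrative only)
f = lambda x: x + 0.1 * x * (1.0 - x)
h = lambda x: np.sqrt(np.abs(x))
xa, Pa = np.array([0.2]), np.eye(1)
xa, Pa = ekf_step(xa, Pa, np.array([0.55]), f, h, 0.01 * np.eye(1), 0.05 * np.eye(1))
print(xa, Pa)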
The central point of the Kalman filter is the propagation of a Gaussian random variable through the dynamic system. In the Extended Kalman filter the distribution of the state of the system and the related noise densities are approximated by Gaussian random variables, which are then propagated through a first-order linearization of the nonlinear system. This can introduce large errors in the true mean and in the covariance of the Gaussian random variable, which can lead to a divergence of the filter, compromising its operation.
To properly address the development of the new Kalman filters, particularly the Sigma Point Kalman Filter and the closely related Unscented Kalman Filter, we need to adopt the concepts of Bayesian inference. We use such an approach in the light of the applications derived from the robotics world, where the Sigma Point Kalman Filter was born (van der Merwe and Wan [3]).
Probabilistic inference consists in the problem of estimating the hidden variables (states or parameters) of a system in a consistent and optimal way, given incomplete and noisy information. We consider that the hidden state of the system x_k, with initial probability density p(x_0), evolves in time (we recall that the index k describes the discrete time evolution) as a first-order Markov process according to the probability density p(x_k|x_{k-1}). Once the state variable is given, the observations y_k are conditionally independent and are generated according to the conditional probability density p(y_k|x_k). The dynamic state-space model is then given by:
x_k = f[x_{k-1}, u_k, w_k]  (3.57)
y_k = h[x_k, v_k],  (3.58)
where w_k is the process noise that drives the dynamic system through the nonlinear state-transition function f, and v_k is the noise corrupting the measurement through the nonlinear observation function h. The state transition density p(x_k|x_{k-1}) is fully specified by f and the process-noise distribution p(w_k), while h and the noise distribution p(v_k) determine the observation probability p(y_k|x_k). We assume the external input u_k is known. The Dynamic State-Space Model (DSSM),
along with the statistics of random noise variables, as well as the a priori distribu-
tions of the state system, define a probabilistic model of how the system temporally
evolves and how we can observe the evolution of the hidden state. The problem is to
know, in a recursive way, how to obtain an optimal estimate of the hidden variables
of the system, when incomplete and noisy observations are taken.
From a Bayesian point of view, the a posteriori filtering density p(x_k|y_{1:k}) provides the complete solution of the probabilistic inference problem. The problem can then be reformulated: how do we compute the a posteriori density recursively when new observations arrive? The answer comes from the recursive Bayesian estimation algorithm.
Using the Bayesian rule and the DSSM of the system, the a posteriori density can be expanded and factorized in the following recursive update form:
p(x_k|y_{1:k}) = \frac{p(y_k|x_k)\,p(x_k|y_{1:k-1})}{p(y_k|y_{1:k-1})}.  (3.62)
Let us see how relation (3.62) is constructed: the a posteriori state at time k - 1, p(x_{k-1}|y_{1:k-1}), is first projected forward in order to compute the a priori state at time k, using the probabilistic process model:
p(x_k|y_{1:k-1}) = \int p(x_k|x_{k-1})\,p(x_{k-1}|y_{1:k-1})\,dx_{k-1}.  (3.63)
Then the most recent noisy measurement is incorporated, using the observation likelihood, to generate the updated a posteriori state, where \delta is the Dirac delta. These multi-dimensional integrals can only be treated exactly in the case of a Gaussian linear system.
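For a one-dimensional state the prediction integral (3.63) and the Bayes update (3.62) can be evaluated directly on a grid. The sketch below does exactly that for an invented Gaussian random-walk transition and Gaussian likelihood, only to make the recursion concrete; the grid, the noise levels and the observations are arbitrary.

import numpy as np

def bayes_filter_step(prior, grid, y, sigma_model=0.1, sigma_obs=0.2):
    """One step of the recursive Bayesian filter on a grid.
    prior : p(x_{k-1} | y_{1:k-1}) sampled on `grid`; returns p(x_k | y_{1:k})."""
    dx = grid[1] - grid[0]
    # Prediction (3.63): integrate the transition density against the prior
    trans = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / sigma_model) ** 2)
    trans /= trans.sum(axis=0, keepdims=True) * dx      # columns integrate to 1
    predicted = trans @ prior * dx
    # Update (3.62): multiply by the likelihood p(y_k | x_k) and normalize
    likelihood = np.exp(-0.5 * ((y - grid) / sigma_obs) ** 2)
    posterior = likelihood * predicted
    posterior /= posterior.sum() * dx
    return posterior

grid = np.linspace(-2, 2, 401)
p = np.exp(-0.5 * (grid / 0.5) ** 2); p /= p.sum() * (grid[1] - grid[0])
for y in [0.3, 0.4, 0.35]:
    p = bayes_filter_step(p, grid, y)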
The methodology proposed to solve the problem of probabilistic inference, the
Bayesian optimal recursive solution, requires the propagation of probability density
function of the a posteriori state. This solution is general enough to deal with all forms
of a posteriori density, including the multimodality, asymmetries and discontinuity.
However since the solution does not place any restrictions on the form of a posteriori
density, in general it cannot be described by a finite number of parameters. Then each
estimator must be approximated as a function of the form of a posteriori density and
of the form of recursive Bayesian structure as has been defined previously.
A common mistake is to think the Kalman filter requires that the space in which
it operates is linear and that the probability density is Gaussian. The Kalman filter
does not require these conditions, but only the following assumptions.
1. Minimum-variance estimates of the random variables, and hence the distribution of the subsequent state variables, can be computed from the first and second moments alone (i.e. the mean and the variance), by propagating and updating them recursively.
2. The estimate (the updated measurement) is a linear function of the a priori knowledge of the system, synthesized by p(x_k|y_{1:k-1}), and of the new information p(y_k|x_k). In other words, we assume that Eq. (3.67) of the optimal Bayesian recursion can be approximated by a linear function.
3. Accurate predictions of the state variable (using the process model) and of the observations (using the forecast model) can be calculated to approximate the first and second moments of p(x_k|y_{1:k-1}) and p(y_k|x_k).
For the first assumption to be consistent, it is necessary that the estimate of the mean and of the covariance of the a posteriori state density, \hat{x}_k and P_{x_k}, satisfy the inequality
P_{x_k} - E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] \ge 0.
The optimal conditions can then be written as
\hat{x}_k^- = E[f(x_{k-1}, w_{k-1}, u_k)]  (3.72)
\hat{z}_k^- = E[h(x_k^-, v_k)]  (3.73)
K_k = E[(x_k - \hat{x}_k^-)(y_k - \hat{z}_k^-)^T]\;E[(y_k - \hat{z}_k^-)(y_k - \hat{z}_k^-)^T]^{-1}  (3.74)
= P_{x_k z_k}\,P_{\tilde{z}_k}^{-1},  (3.75)
with \hat{x}_k^- the optimal forecast (the a priori mean at time k) of x_k, corresponding to the expectation (taken over the a posteriori distribution of the state variable at time k - 1) of a nonlinear function of the random variables x_{k-1} and w_{k-1}. Similarly for the optimal forecast \hat{z}_k^-, except that the expectation is taken over the a priori distribution of the state variable at time k. The term K_k is expressed as a function of the expectation of a cross-correlation matrix (covariance matrix) of the state-variable forecast error.
Since the problem is to carefully calculate the expected mean and the covariance of a random variable, let us see what uncertainties arise when estimates of the future state of a system, or of measurements, are performed. If x is a random variable with mean \bar{x} and covariance P_{xx}, we can always find a second random variable y that has a nonlinear functional relationship with x of the type
y = f[x].  (3.76)
Now we want to calculate the statistics of y, i.e. the mean \bar{y} and the covariance P_{yy}. We need to determine the transformed density function and evaluate the statistics from this distribution. In the case of a linear function there exists an exact solution, but in the case of a nonlinear function we must find an approximate solution that is statistically significant. Ideally it should be efficient and unbiased. For the transformed statistics to be consistent, it is necessary that the following inequality holds:
P_{yy} - E[\{y - \bar{y}\}\{y - \bar{y}\}^T] \ge 0.  (3.77)
This condition is extremely important for the validity of the method. If the statistics are inconsistent, the value of P_{yy} is underestimated, the Kalman filter assigns too high a weight to the information and underestimates the covariance, and the filter then tends to diverge. It is therefore appropriate that the transformation be efficient, that is, that the left side of Eq. (3.77) be minimized, and that the estimate be unbiased, i.e. \bar{y} \approx E[y].
We develop a consistent, efficient and unbiased transformation by expanding Eq. (3.76) in a Taylor series around \bar{x}:
f[x] = f[\bar{x} + \delta x] = f[\bar{x}] + \frac{\partial f}{\partial x}\,\delta x + \cdots + \frac{1}{n!}\frac{\partial^n f}{\partial x^n}\,\delta x^n,  (3.78)
where \delta x is a zero-mean Gaussian variable with covariance P_{xx}, and \frac{\partial^n f}{\partial x^n}\,\delta x^n is the nth-order term of the multidimensional Taylor series. Taking the expectation, the transformed mean is:
\bar{y} = f[\bar{x}] + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,P_{xx} + \frac{1}{2}\frac{\partial^4 f}{\partial x^4}\,E[\delta x^4] + \cdots.  (3.79)
In other words, the nth-order term in the series for \delta x is a function of the nth moment of x multiplied by the nth derivative of f[\cdot] evaluated at x = \bar{x}. If the moments and the derivatives can be properly evaluated up to the nth order, the mean is correct up to that order. The same applies to the covariance, although the structure of each term is more complicated. Because the terms in the series are scaled by progressively smaller factors, the lowest orders have the biggest impact, so our forecast procedure should concentrate on evaluating the lower-order terms.
The linearization process assumes that the second-order and higher-order terms in \delta x can be neglected. Under this assumption,
\bar{y} = f[\bar{x}]  (3.81)
P_{yy} = \frac{\partial f}{\partial x}\,P_{xx}\,\left(\frac{\partial f}{\partial x}\right)^T.  (3.82)
By comparing these estimates with the expansions (3.79) and (3.80) we see that, for the mean, the first-order approximation is only valid when the terms of order higher than the second are insignificant while, for the covariance matrix, it is only valid when the terms of order higher than the fourth are insignificant.
The Kalman filter calculates the optimal conditions through relations (3.75), while the Extended Kalman filter linearizes the system around the current state using a first-order truncation of the multidimensional Taylor series. Broadly, it approximates the optimal conditions as:
\hat{x}_k^- = E[f(\hat{x}_{k-1}, \bar{w}, u_k)]  (3.83)
\hat{z}_k^- = E[h(\hat{x}_k^-, \bar{v})]  (3.84)
K_k = P^{lin}_{x_k z_k}\,(P^{lin}_{\tilde{z}_k})^{-1}.  (3.85)
The Sigma Point Kalman Filter instead represents the distribution of x by a small set of deterministically chosen points, the sigma points, with associated weights. Once we have these points and their weights, it is possible to calculate the mean \bar{y} and the covariance P_{yy} through the following process:
1. each sigma point is transformed through the function f[\cdot];
2. the mean is computed as the weighted average of the transformed points,
\bar{y} = \sum_{i=0}^{n} W^{(i)}\,\mathcal{Y}^{(i)};  (3.88)
3. the covariance is computed as the weighted outer product of the transformed points,
P_{yy} = \sum_{i=0}^{n} W^{(i)}\,\{\mathcal{Y}^{(i)} - \bar{y}\}\{\mathcal{Y}^{(i)} - \bar{y}\}^T.  (3.89)
A symmetric set of sigma points and weights can be chosen as
\mathcal{X}^{(0)} = \bar{x}, \qquad W^{(0)} = W^{(0)}
\mathcal{X}^{(i)} = \bar{x} + \left(\sqrt{\frac{N_x}{1 - W^{(0)}}\,P_{xx}}\right)_i, \qquad W^{(i)} = \frac{1 - W^{(0)}}{2N_x}
\mathcal{X}^{(i+N_x)} = \bar{x} - \left(\sqrt{\frac{N_x}{1 - W^{(0)}}\,P_{xx}}\right)_i, \qquad W^{(i+N_x)} = \frac{1 - W^{(0)}}{2N_x}, \qquad i = 1, \ldots, N_x,  (3.90)
where (\sqrt{\cdot})_i is the ith column or row of the square root of the matrix \frac{N_x}{1 - W^{(0)}}\,P_{xx}, i.e. the covariance matrix scaled by the number of dimensions, and W^{(i)} is the weight associated with the ith point. By convention W^{(0)} is the weight of the mean point, which is indexed as the zeroth point. Note that the weights are not necessarily positive, because they depend on the approach used to generate the sigma points.
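The sigma-point selection (3.90) and the weighted reconstruction (3.88)-(3.89) can be written in a few lines. In the sketch below the choice W^(0) = 1/3, the use of a Cholesky factor as matrix square root and the polar-to-Cartesian test function are all arbitrary illustrative assumptions.

import numpy as np

def unscented_transform(f, xbar, Pxx, W0=1/3):
    """Propagate mean and covariance through f using the sigma points of (3.90)."""
    Nx = xbar.size
    S = np.linalg.cholesky(Nx / (1.0 - W0) * Pxx)        # matrix square root
    X = [xbar] + [xbar + S[:, i] for i in range(Nx)] \
               + [xbar - S[:, i] for i in range(Nx)]
    W = np.array([W0] + [(1.0 - W0) / (2 * Nx)] * (2 * Nx))
    Y = np.array([f(x) for x in X])                      # transformed points
    ybar = W @ Y                                         # weighted mean, (3.88)
    Pyy = sum(w * np.outer(y - ybar, y - ybar) for w, y in zip(W, Y))  # (3.89)
    return ybar, Pyy

# Nonlinear polar-to-Cartesian transformation as a test
f = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
ybar, Pyy = unscented_transform(f, np.array([1.0, 0.5]), np.diag([0.02, 0.1]))
print(ybar, Pyy)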
Because the SPKF does not use the Jacobian of the system equations, it becomes particularly attractive for "black box" systems or for dynamic systems that cannot easily be linearized.
Because the mean and the covariance of x are captured accurately up to second order, the calculated mean of y is also correct up to second order. This indicates that the method is more accurate than the EKF, with the additional benefit that it is the distribution that has been approximated rather than the function f[\cdot], since the series expansion has not been truncated at some order. In this way the algorithm is able to incorporate information from the higher orders, giving greater accuracy.
The sigma points capture the same mean and covariance regardless of the choice of the matrix square root used. We can, for example, use more stable and efficient matrix decomposition methods, such as the Cholesky decomposition.
The mean and covariance are calculated using standard vector operations, which means that the algorithm is adaptable to any choice made for the process; therefore, compared to the EKF, it is not necessary to evaluate the Jacobian matrix.
The EKF has two problems: linearization can produce highly unstable filters if the assumption of local linearity is violated, and the derivation of the Jacobian matrix is not trivial for many applications and often leads to difficulties. In addition, because the linearization of the errors in the EKF introduces an error which is about 1.5 times the standard deviation of the measurement interval, the transformation is inconsistent. In practice, this inconsistency may be resolved by introducing an additional noise that stabilizes the transformation, but this induces a growth of the transformed covariance. This is a possible reason why EKFs are difficult to tune: we must introduce enough noise to make the linearization work. However, the stabilizing noise is an undesirable solution, since the estimate remains biased and there is no guarantee that the transformed estimate is consistent.
To overcome these drawbacks the UKF transformation is introduced. This is
a method for calculating the statistic of a random variable which is subject to a
nonlinear transformation. It is based on the assertion that it is easier to approximate
a Gaussian distribution than approximate an arbitrary nonlinear function.
A UKF transformation is based on two fundamental points:
1. It is easier to perform a nonlinear transformation of a single point than of an entire probability density function.
2. It is not too hard to find a set of individual points in state space whose sample probability density function approximates the actual probability density function of the state vector.
The UKF is closely linked to the sigma-point transformation. The formulation of the Unscented Kalman filter is the following.
1. Given the nonlinear system at n discrete time states, initialize the filter with
\hat{x}_0^a = E[x_0]
P_0^a = E[(x_0 - \hat{x}_0^a)(x_0 - \hat{x}_0^a)^T].  (3.92)
The following equations are used to perform the time update, which propagates the estimated state and covariance from one measurement time to the next.
1. In order to propagate from time (k - 1) to k, first select the sigma points \hat{x}_{k-1}^{(i)} as specified previously in the SPKF transformation, with the appropriate change because the current best hypothesis for the mean and the covariance of x_k are \hat{x}_{k-1}^a and P_{k-1}^a:
\hat{x}_{k-1}^{(i)} = \hat{x}_{k-1}^a + \tilde{x}^{(i)}, \quad i = 1, \ldots, 2n
\tilde{x}^{(i)} = \left(\sqrt{n P_{k-1}^a}\right)_i^T, \quad i = 1, \ldots, n
\tilde{x}^{(n+i)} = -\left(\sqrt{n P_{k-1}^a}\right)_i^T, \quad i = 1, \ldots, n.  (3.93)
2. Use the known nonlinear system equation f(\cdot) to transform the sigma points into the vectors \hat{x}_k^{(i)}, taking into account that the transformation here is f(\cdot) rather than the h(\cdot) of the SPKF description, so the appropriate changes have to be made:
\hat{x}_k^{(i)} = f(\hat{x}_{k-1}^{(i)}, u_{k-1}, w_{k-1}).  (3.94)
3. Combine the vectors \hat{x}_k^{(i)} to get the a priori state estimate at time k:
\hat{x}_k^f = \frac{1}{2n}\sum_{i=1}^{2n} \hat{x}_k^{(i)}.  (3.95)
4. Estimate the a priori error covariance, adding Q_{k-1} to account for the process noise:
P_k^f = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{x}_k^{(i)} - \hat{x}_k^f)(\hat{x}_k^{(i)} - \hat{x}_k^f)^T + Q_{k-1}.  (3.96)
Once the time update is done, we implement the measurement update equations.
1. Choose the sigma points as specified by the unscented transform, with the appropriate change because the current best hypothesis for the mean and the covariance of x_k are now \hat{x}_k^f and P_k^f:
\hat{x}_k^{(i)} = \hat{x}_k^f + \tilde{x}^{(i)}, \quad i = 1, \ldots, 2n
\tilde{x}^{(i)} = \left(\sqrt{n P_k^f}\right)_i^T, \quad i = 1, \ldots, n
\tilde{x}^{(n+i)} = -\left(\sqrt{n P_k^f}\right)_i^T, \quad i = 1, \ldots, n.  (3.97)
This step can be skipped by re-using the sigma points obtained from the time update. Use the known nonlinear observation equation h(\cdot) to transform the sigma vectors into \hat{y}_k^{(i)} according to the relation:
\hat{y}_k^{(i)} = h(\hat{x}_k^{(i)}, u_k, v_k).  (3.98)
2. Combine the vectors \hat{y}_k^{(i)} to obtain the predicted measurement at time k:
\hat{y}_k^f = \frac{1}{2n}\sum_{i=1}^{2n} \hat{y}_k^{(i)}.  (3.99)
3. Estimate the covariance of the predicted measurement, adding R_k to account for the measurement noise:
P_z = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{y}_k^{(i)} - \hat{y}_k^f)(\hat{y}_k^{(i)} - \hat{y}_k^f)^T + R_k.  (3.100)
4. Estimate the cross-covariance between \hat{x}_k^f and \hat{y}_k^f:
P_{xz} = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{x}_k^{(i)} - \hat{x}_k^f)(\hat{y}_k^{(i)} - \hat{y}_k^f)^T.  (3.101)
5. Finally, the measurement update of the state estimate uses the Kalman filter equations:
K_k = P_{xz}\,P_z^{-1}
\hat{x}_k^a = \hat{x}_k^f + K_k\,(y_k - \hat{y}_k^f)
P_k^a = P_k^f - K_k P_z K_k^T.  (3.102)
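A minimal sketch of one UKF cycle following (3.93)-(3.102) is given below. It assumes additive noise handled only through Q and R, re-draws the sigma points for the measurement update (the optional step (3.97)), and uses a Cholesky factor as matrix square root; the tiny two-state test model at the end is invented for illustration.

import numpy as np

def ukf_step(xa, Pa, y, f, h, Q, R):
    """One UKF cycle with the symmetric 2n sigma-point set and weights 1/(2n)."""
    n = xa.size
    def sigma_points(m, P):
        S = np.linalg.cholesky(n * P)
        return np.vstack([m + S[:, i] for i in range(n)] +
                         [m - S[:, i] for i in range(n)])
    # Time update (3.93)-(3.96)
    X = sigma_points(xa, Pa)
    Xf = np.array([f(x) for x in X])
    xf = Xf.mean(axis=0)
    Pf = (Xf - xf).T @ (Xf - xf) / (2 * n) + Q
    # Measurement update (3.97)-(3.102)
    Xs = sigma_points(xf, Pf)
    Ys = np.array([h(x) for x in Xs])
    yf = Ys.mean(axis=0)
    Pz = (Ys - yf).T @ (Ys - yf) / (2 * n) + R
    Pxz = (Xs - xf).T @ (Ys - yf) / (2 * n)
    K = Pxz @ np.linalg.inv(Pz)
    return xf + K @ (y - yf), Pf - K @ Pz @ K.T

# Illustrative two-state model observed in its first component
f = lambda x: np.array([x[0] + 0.1 * x[1], 0.95 * x[1]])
h = lambda x: np.array([x[0]])
xa, Pa = ukf_step(np.zeros(2), np.eye(2), np.array([0.3]), f, h,
                  0.01 * np.eye(2), 0.1 * np.eye(1))
print(xa, Pa)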
We have assumed that the process and measurement equations are linear with respect to the noise, even though this is not generally true. In that case we must treat the state vector in a different way, using what Julier and Uhlmann [4] call the augmented state (+):
x_k^+ = \begin{bmatrix} x_k \\ w_k \\ v_k \end{bmatrix}.
In this case we need to estimate the augmented mean and the augmented covariance, as done by the previously defined algorithms, but with Q_{k-1} and R_k removed.
References
1. Kalman, R.E., Bucy, R.S.: New results in linear filtering and prediction theory. Trans. Am. Soc. Mech. Eng. J. Basic Eng. Ser. D 83, 95-108 (1961)
2. Moler, C., Van Loan, C.: Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 45, 3-49 (2003)
3. van der Merwe, R., Wan, E.A., Julier, S.J.: Sigma-point Kalman filters for nonlinear estimation and sensor-fusion: applications to integrated navigation. In: Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI (2004)
4. Julier, S., Uhlmann, J.: Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401-422 (2004)
Chapter 4
Advanced Data Assimilation Methods
Abstract The computation of the model and of its error requires an accurate selection of the methods to be used. When the algorithm adopted for a specific nonlinear system diverges, or the approximation order chosen to better handle the nonlinearity of the model fails, we need to improve the performance of the method. This chapter deals with how to overcome these problems by means of recursive Bayesian estimation and the most advanced filters, such as the stochastic ensemble Kalman filters.
As we have seen in the previous chapters, accurate numerical prediction requires accurate initial conditions. However, the computation of the model and of its error requires an accurate selection of the methods to be used. For example, the Extended Kalman Filter applied to specific nonlinear systems diverges, because it is not able to gather enough information on the trajectory of the system states and because the error propagation is approximated by the tangent linear model in the analysis steps. The problem can be partially overcome by using a higher-order approximation in order to better handle the nonlinearity of the model, as happens for instance in the Sigma Point Kalman filter (SPKF) and the filters derived from it. However, beyond the tangent linear model there is also the requirement of a Hessian, which is expensive in terms of computer time.
Then, in order to improve the performance of the method for specific nonlinear systems, we need to take into account all the statistical moments of the state distribution, because of their impact as they propagate within the system.
If we want to work with all the statistical moments, we must not limit ourselves to the expectations and covariances of the distributions of the variables. The problem then needs to be reformulated in terms of the probability density function, or pdf, of the state of the system.
This approach can be addressed using Bayesian statistics and stochastic filtering. Before demonstrating that the Kalman filter can be derived in the framework of Bayesian statistics, we must introduce recursive Bayesian estimation.
The stochastic filtering problem can be described as follows: given the initial density p(x_0), the transition density p(x_k|x_{k-1}) and the likelihood p(y_k|x_k), the objective of the filtering is to estimate the optimal current state at time k given the observations up to time k, that is, to estimate the a posteriori density p(x_k|y_{0:k}) or p(x_{0:k}|y_{0:k}).
Although the posterior density provides a complete solution of the filtering problem, the problem is still complex, since the density is a function rather than a finite-dimensional point estimate. Because several physical systems do not have finite dimension, the infinite-dimensional system can only be approximately modeled through a finite-dimensional filter, i.e. the filter can only be suboptimal.
In order to derive the recursive Bayesian filter, two assumptions are made:
1. the states follow a first-order Markov process, p(x_k|x_{0:k-1}) = p(x_k|x_{k-1});
2. the observations are conditionally independent given the states. For simplicity, we denote by Y_l = y_{0:l} := \{y_0, \ldots, y_l\} the set of observations available up to some time t_l, while Y_k = y_{0:k} := \{y_0, \ldots, y_k\} is the set used in the filtering problem for t_k, k = 1, 2, 3, \ldots
If the conditional pdf of x_k is denoted by p(x_k|Y_k), the Bayesian rule provides its recursive update. The aim of Bayesian filtering is to apply Bayesian statistics and the Bayes rule to probabilistic inference problems, and in particular to the stochastic filtering problem. Bayesian filtering is optimal in the sense that it seeks the posterior distribution which integrates and uses all the available information expressed by probabilities (assuming they are quantitatively correct). The optimality can be measured by one of the following criteria.
1. The minimum mean-squared error (MMSE), defined in terms of the prediction or filtering error (or, equivalently, the trace of the state-error covariance),
E[\|x_k - \hat{x}_k\|^2\,|\,y_{0:k}] = \int \|x_k - \hat{x}_k\|^2\,p(x_k|y_{0:k})\,dx_k,  (4.4)
where the conditional mean is \hat{x}_k = E[x_k|y_{0:k}] = \int x_k\,p(x_k|y_{0:k})\,dx_k.
2. The maximum a posteriori (MAP) criterion, which aims to find the mode of the posterior probability p(x_k|y_{0:k}); this is equivalent to minimizing a cost function.
3. The maximum likelihood (ML), which reduces to a special case of MAP in which the prior is neglected.
4. The minimax criterion, which is to find the median of the posterior p(x_k|y_{0:k}).
When the mode and the mean of the distribution coincide, the MAP estimate is correct. However, for multimodal distributions, the MAP estimate can be arbitrarily bad. MMSE requires full knowledge of the prior, likelihood and evidence, while MAP methods require the estimation of the posterior distribution (density) but do not require the calculation of the denominator, and are thereby computationally inexpensive. Note, however, that the MAP estimate has a drawback, especially in a high-dimensional space.
x_{k+1} = f(k, x_k, u_k, w_k),  (4.5)
y_k = h(k, x_k, u_k, v_k),  (4.6)
where Eqs. (4.5) and (4.6) are called the state equation and the measurement equation, respectively; x_k represents the state vector, y_k the measurement vector, and u_k the system input vector (the driving force) in a controlled environment. f and h are two vector-valued functions, which are potentially time-varying; w_k and v_k represent the process (dynamical) noise and the measurement noise, respectively, with appropriate dimensions. Since the extension to a driven system is straightforward, the driving force, associated with the stochastic control problem, is not considered in our approach.
In the linear case the system reduces to
x_k^t = F_{k-1,k}\,x_{k-1}^t + w_{k-1}^t,  (4.7)
y_k = H_k\,x_k^t + v_k,  (4.8)
where the true state x_k^t that we want to estimate has a probability density function p(x_k^t). Although this probability density function is not available, it is the complete solution of the prediction problem.
Given the set Y_l^o of realizations of all observations available up to the time t_l, the conditional probability density function p(x_k^t|Y_l^o) yields the solution of the filtering problem at time t_k, k = 1, 2, 3, \ldots. The probability density function p(x_k^t|Y_l^o), for fixed l and t_k, k = l+1, l+2, l+3, \ldots, gives the solution of the prediction problem beyond time t_l.
However, in several problems it is not possible to compute these densities because, unlike the unconditional densities, the conditional densities are random functions, since they depend on the observations. Thus, instead of computing the densities explicitly, we can use the conditional mean \hat{x}_{k|k-1} = E[p(x_k^t|Y_k^o)], which is a random n-vector since it depends on the realizations of the observations.
The filtering solution is given by the Kalman filter, in which the sufficient statistics of the mean and of the state-error correlation matrix are calculated and propagated. In Eqs. (4.7) and (4.8), F_{k-1,k} and H_k are called the transition matrix and the measurement matrix, respectively. The Kalman filter is also optimal in the sense that it is unbiased, E[\hat{x}_k] = E[x_k], and is a minimum variance estimate. In practice, however, we are mostly concerned with discrete-time filtering, because a continuous-time dynamic system can always be converted into a discrete-time system by sampling the outputs and using zero-order holds on the inputs. Hence the derivative is replaced by a difference and the operator becomes a matrix. w_k and v_k can be viewed as white noise random sequences with unknown statistics in the discrete-time domain. The state equation characterizes the state transition probability p(x_{k+1}|x_k), whereas the measurement equation describes the probability p(y_k|x_k), which is related to the measurement noise model. For simplicity we assume the dynamic and measurement noises are both Gaussian distributed with zero mean and constant covariance. The derivation of the Kalman filter in the linear Gaussian scenario is based on the following assumptions:
w_k^t is Gaussian with zero mean and white in time, i.e. w_k^t \sim N(0, Q_k);  (4.10)
v_k is Gaussian with zero mean and white in time, i.e. v_k \sim N(0, R_k);  (4.11)
Suppose we are given the conditional density p(x_{k-1}^t|Y_{k-1}^o); the objective is to calculate p(x_k^t|Y_k^o).
Denote the mean and covariance matrix of the density p(x_{k-1}^t|Y_{k-1}^o) respectively by
x_{k-1}^a \equiv E[p(x_{k-1}^t|Y_{k-1}^o)]  (4.16)
and
P_{k-1}^a \equiv E[(x_{k-1}^t - x_{k-1}^a)(x_{k-1}^t - x_{k-1}^a)^T\,|\,Y_{k-1}^o].  (4.17)
x_{k-1}^a is the analysis at time t_{k-1}, an n-vector, the expected value of the true state x_{k-1}^t conditioned on all observations available up to and including that time, while P_{k-1}^a is the analysis error covariance matrix, an n \times n matrix, at time t_{k-1}. At time t_0 there are no observations, so it follows that p(x_0^t) is a Gaussian density with mean x_0^a \equiv \bar{x}_0^t and covariance matrix P_0^a \equiv P_0. We will see that if p(x_{k-1}^t|Y_{k-1}^o) is Gaussian then so is p(x_k^t|Y_k^o), so by induction it will follow that all the densities remain Gaussian. The forecast mean and its error covariance are defined analogously by
x_k^f \equiv E[p(x_k^t|Y_{k-1}^o)]  (4.18)
and
P_k^f \equiv E[(x_k^t - x_k^f)(x_k^t - x_k^f)^T\,|\,Y_{k-1}^o].  (4.19)
x_k^f is the forecast at the new time t_k, an n-vector, the expected value of the true state x_k^t conditioned on all observations available up to and including time t_{k-1}, while P_k^f is the forecast error covariance matrix, an n \times n matrix, at time t_k.
Substituting (4.7) into (4.18) we obtain:
x_k^f = F_{k-1}\,E[x_{k-1}^t|Y_{k-1}^o] + E[w_{k-1}^t|Y_{k-1}^o],  (4.20)
where the second term vanishes under the assumption (4.10) that the model noise has zero mean. Thus we have
x_k^f = F_{k-1}\,x_{k-1}^a,  (4.21)
which is indeed a forecast to time t_k from the analysis at time t_{k-1} via the linear propagator F_{k-1}.
Substituting (4.7) and (4.21) into (4.19) we obtain:
P_k^f = E[\{F_{k-1}(x_{k-1}^t - x_{k-1}^a) + w_{k-1}^t\}\{F_{k-1}(x_{k-1}^t - x_{k-1}^a) + w_{k-1}^t\}^T\,|\,Y_{k-1}^o],  (4.22)
and, since the model noise and the analysis error are uncorrelated random vectors, the cross terms vanish and
P_k^f = F_{k-1}\,P_{k-1}^a\,F_{k-1}^T + Q_{k-1}.  (4.23)
More complex is the analysis step, where we need to make the Gaussian assumption. Let us, however, apply the recursive Bayesian estimation underlying the principle of sequential Bayesian filtering. Under the assumptions that the states follow a first-order Markov process, p(x_k^t|x_{0:k-1}) = p(x_k^t|x_{k-1}), and that the observations are conditionally independent given the states, we can write:
p(x_k^t|Y_k^o) = \frac{p(y_k, Y_{k-1}^o|x_k^t)\,p(x_k^t)}{p(y_k, Y_{k-1}^o)}
= \frac{p(y_k|Y_{k-1}^o, x_k^t)\,p(Y_{k-1}^o|x_k^t)\,p(x_k^t)}{p(y_k|Y_{k-1}^o)\,p(Y_{k-1}^o)}
= \frac{p(y_k|Y_{k-1}^o, x_k^t)\,p(x_k^t|Y_{k-1}^o)\,p(Y_{k-1}^o)\,p(x_k^t)}{p(y_k|Y_{k-1}^o)\,p(Y_{k-1}^o)\,p(x_k^t)}
= \frac{p(y_k|x_k^t)\,p(x_k^t|Y_{k-1}^o)}{p(y_k|Y_{k-1}^o)}.  (4.24)
We need to evaluate the a posteriori density from the right side of (4.24), which is described by:
1. the a priori pdf p(x_k^t|Y_{k-1}^o), which encodes the knowledge of the model,
p(x_k^t|Y_{k-1}^o) = \int p(x_k^t|x_{k-1})\,p(x_{k-1}|Y_{k-1}^o)\,dx_{k-1};  (4.25)
2. the likelihood p(y_k|x_k^t), determined by the observation model;
3. the evidence p(y_k|Y_{k-1}^o), which normalizes the posterior.
When the mode and the mean of the distribution coincide, the MAP estimation is correct; however, for multimodal distributions, the MAP estimate can be arbitrarily bad. Both MMSE and MAP methods require the estimation of the posterior distribution (density), but MAP does not require the calculation of the denominator (integration) and is thereby computationally less expensive, whereas the former requires full knowledge of the prior, likelihood and evidence. Note, however, that the MAP estimate has a drawback, especially in a high-dimensional space.
Let x_k^{MAP} denote the MAP estimate of x_k^t that maximizes p(x_k^t|Y_k^o), or equivalently \log p(x_k^t|Y_k^o). By using the Bayesian rule, we may express p(x_k^t|Y_k^o) as
p(x_k^t|Y_k^o) = \frac{p(x_k^t, Y_k^o)}{p(Y_k^o)} = \frac{p(x_k^t, y_k, Y_{k-1}^o)}{p(y_k, Y_{k-1}^o)},  (4.27)
with the numerator factorized as
p(x_k^t, y_k, Y_{k-1}^o) = p(y_k|x_k^t, Y_{k-1}^o)\,p(x_k^t, Y_{k-1}^o) = p(y_k|x_k^t, Y_{k-1}^o)\,p(x_k^t|Y_{k-1}^o)\,p(Y_{k-1}^o),
which shares the same form as (4.24). Under the Gaussian assumption on the process and measurement noise, the mean and covariance of p(y_k|x_k^t) are calculated as
E[y_k|x_k^t] = H_k\,x_k^t
and
E[(y_k - E[y_k|x_k^t])(y_k - E[y_k|x_k^t])^T\,|\,x_k^t] = E[v_k v_k^T] = R_k.  (4.31)
Accordingly,
p(y_k|x_k^t) = A_1 \exp\{-\tfrac{1}{2}(y_k - H_k x_k^t)^T R_k^{-1}(y_k - H_k x_k^t)\},  (4.32)
p(x_k^t|Y_{k-1}^o) = A_2 \exp\{-\tfrac{1}{2}(x_k^t - x_k^f)^T (P_k^f)^{-1}(x_k^t - x_k^f)\},  (4.33)
p(x_k^t|Y_k^o) \propto A \exp\{-\tfrac{1}{2}(y_k - H_k x_k^t)^T R_k^{-1}(y_k - H_k x_k^t) - \tfrac{1}{2}(x_k^t - x_k^f)^T (P_k^f)^{-1}(x_k^t - x_k^f)\}.  (4.34)
Maximizing (4.34) with respect to x_k^t gives the MAP estimate
x_k^{MAP} = x_k^f + K_k\,(y_k - H_k x_k^f),  (4.37)
whose error is
x_k^t - x_k^{MAP} = x_k^t - x_k^f - K_k\,(y_k - H_k x_k^f) = (I - K_k H_k)(x_k^t - x_k^f) - K_k v_k,  (4.40)
and, taking into account the assumptions (4.15), the covariance matrix (4.39) is
P_k^a = (I - K_k H_k)\,P_k^f\,(I - K_k H_k)^T + K_k R_k K_k^T,  (4.41)
which, with the optimal gain, reduces to
P_k^a = (I - K_k H_k)\,P_k^f.  (4.42)
Thus far, the Kalman filter has been completely derived from the MAP principle.
1 For (A - BD^{-1}C)^{-1}, the result follows from the matrix inversion lemma, or Sherman-Morrison-Woodbury formula.
The same procedure can be extended to maximum likelihood by maximizing the log-likelihood
\log p(x_k|Y_k) = \log p(x_k, Y_k) - \log p(Y_k),  (4.43)
from which we obtain the same solution as for the Kalman filter.
We have seen that the Kalman filter can be derived within the Bayesian framework. By admitting a state-space formulation, the Kalman filter assumes that the signal process (i.e. the state) can be regarded as a linear stochastic dynamical system driven by white noise; the optimal filter then has a stochastic differential structure which makes recursive estimation possible. The Kalman filter is an unbiased minimum-variance estimator under Linear Quadratic Gaussian control circumstances. When the Gaussian assumption on the noise is violated, the Kalman filter is still optimal in a mean-square sense, but the estimate no longer coincides with the conditional mean (i.e. it is biased), nor is it of minimum variance. The Kalman filter is not robust because of the underlying assumption on the noise density model.
In order to approximate the optimal estimation methods for linear problems with Gaussian statistics, Monte Carlo methods have been used to estimate the initial condition for forecasting (Evensen and van Leeuwen [1], Houtekamer and Mitchell [2]).
Consider the continuous stochastic model
dx_k = f(k, x_k)\,dk + \sigma(k, x_k)\,dw_k,  (4.46)
where f(k, x_k) is often called the nonlinear drift and \sigma(k, x_k) the diffusion coefficient. Again, the noise processes \{w_k, v_k, k \ge 0\} are two Brownian or Wiener processes. First, let us look at the state diffusion equation. For all t \ge 0, we define a backward diffusion operator L_k, a partial differential operator, as:
L_k = \sum_{i=1}^{N_x} f_k^i \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{N_x} a_k^{ij}\frac{\partial^2}{\partial x_i \partial x_j},  (4.48)
where a_k^{ij} = \sigma_i(k, x_k)\,\sigma_j(k, x_k). The operator L_k corresponds to the infinitesimal generator of the diffusion process \{x_k, k \ge 0\}. The goal now is to deduce conditions under which one can find a recursive and finite-dimensional (closed form) scheme to compute the conditional probability distribution p(x_k|Y_k), given the filtration Y_k produced by the observation process (4.6).
Let us define an innovations process, a white Gaussian noise process given by
e_k = y_k - \int_0^k E[h(s, x_s)|y_{0:s}]\,ds,  (4.49)
and the operator
L_t = \sum_{i=1}^{N_x} f_t^i \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{N_x} a_k^{ij}\frac{\partial^2}{\partial x_i \partial x_j}.  (4.51)
The time evolution of the pdf is governed by the Fokker-Planck-Kolmogorov (FPK) equation
\frac{\partial p(x_k)}{\partial k} = L_k\,p(x_k),  (4.52)
which is the counterpart of the SDE (4.46).
The FPK equation can be interpreted as follows: the first term is the equation of motion for a cloud of particles whose distribution is p(x_k), each point of which obeys the equation of motion \frac{dx_k}{dk} = f(x_k, k). The second term describes the disturbance due to Brownian motion or, as it is also called, Wiener noise.
Equation (4.52) can be solved exactly by Fourier transform. By inverting the Fourier transform, we obtain
p(x, k + \Delta k\,|\,x_0, k) = \frac{1}{\sqrt{2\pi\,\sigma_0^2\,\Delta k}}\exp\left\{-\frac{(x - x_0 - f(x_0)\Delta k)^2}{2\sigma_0^2\,\Delta k}\right\}.  (4.53)
Equation (4.52) can be considered the fundamental equation for the time evolution of the error statistics. It describes the change of the pdf in a local volume, which depends on the divergence term describing a probability flux into the local volume. The diffusion term tends to flatten the pdf because of the effect of stochastic model errors, that is, the probability decreases and the error increases. When, as in the case of (4.53), we are able to solve for the pdf, we are also able to calculate the mean and error covariance of the model forecast at different time levels. This is the case of the Kalman filter, where the pdf is fully characterized by its mean and covariance. However, when we have a nonlinear model, the time evolution of the mean and covariance alone is not able to characterize the pdf, and the solution is to solve the FPK equation by using Monte Carlo methods. If the pdf is represented by a large ensemble of model states, it is possible to integrate each member of the ensemble forward in time using the stochastic model (4.46), which is equivalent to solving the FPK equation with Monte Carlo.
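A minimal illustration of this Monte Carlo view of the FPK equation: each ensemble member is integrated with a stochastic model of the form (4.46) using an Euler-Maruyama step, and the forecast statistics are taken from the ensemble. The drift, the diffusion coefficient and the ensemble size below are invented for the example.

import numpy as np

def ensemble_forecast(x0_mean, x0_std, n_members=1000, n_steps=100, dt=0.01,
                      drift=lambda x: -x, diffusion=0.3, seed=0):
    """Integrate an ensemble through dx = f(x) dt + sigma dW (Euler-Maruyama)
    and return the time evolution of the ensemble mean and variance."""
    rng = np.random.default_rng(seed)
    x = x0_mean + x0_std * rng.standard_normal(n_members)
    means, variances = [x.mean()], [x.var(ddof=1)]
    for _ in range(n_steps):
        x = x + drift(x) * dt + diffusion * np.sqrt(dt) * rng.standard_normal(n_members)
        means.append(x.mean())
        variances.append(x.var(ddof=1))
    return np.array(means), np.array(variances)

m, v = ensemble_forecast(x0_mean=1.0, x0_std=0.2)
print(m[-1], v[-1])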
The standard approach is first to calculate the best-guess initial condition on the basis of the data and their statistics. The ensemble of initial states is then generated with mean equal to the best initial condition and variance specified from the known uncertainties of the first-guess initial state. The covariance should reflect the true scales of the system. In order to provide a realistic increase in the ensemble variance, an estimate of the model error variance should be included to give the time evolution of the external error in the estimates.
The Ensemble Kalman filter (EnKF) was introduced by Evensen [3, 4] as an alternative to the Extended Kalman filter (EKF) (see the Kalman chapter) and is a Monte Carlo approximation of the traditional Kalman Filter. The EnKF is a sequential data assimilation method in which the error statistics are predicted by solving the Fokker-Planck (forward Kolmogorov) equation with Monte Carlo or ensemble integration, by which it is possible to compute statistical moments such as the mean and the error covariance. With respect to the basic KF, the gain of the update equation is calculated from the error covariance provided by the ensemble of model states. The ensemble mean is the best estimate, and the spread of the ensemble around the mean measures its uncertainty. If we assume we have N model states in the ensemble, each of dimension n, each of them can be represented as a single point in an n-dimensional state space. In the state space the ensemble members constitute a cloud of points that is described by a probability density function of the type f(x) = dn/n, where dn is the number of points in a small unit volume and n is the total number of points. When we know the ensemble representing f(x), we can calculate the moments of the statistics.
Thus the Ensemble Kalman Filter seeks to mimic the Kalman Filter, but with an ensemble of limited size rather than with the full error covariance matrix. The aim of the EnKF is to perform an analysis for each member of the ensemble,
x_i^a = x_i^f + K_{EnKF}\,(y_i - H(x_i^f)),  (4.54)
where i = 1, 2, 3, \ldots, n is the index indicating the member of the ensemble and x_i^f is the state vector describing the ith member forecast at the analysis time. K_{EnKF} is the Kalman gain computed from the ensemble statistics,
K_{EnKF} = P^f H^T (H P^f H^T + R)^{-1},  (4.55)
where R is the given observational error covariance matrix. Following Whitaker and Hamill [5], the forecast error covariance matrix is estimated from the ensemble,
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T,  (4.56)
with
\bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f.  (4.57)
Note that wherever an overbar is used in the context of a covariance estimate, a factor of n - 1 instead of n appears in the denominator, which makes the estimate unbiased.
Thanks to (4.54) we obtain a posterior ensemble \{x_i^a\}, i = 1, 2, 3, \ldots, n, from which we can compute the posterior statistics. Thus the analysis is computed as the mean of the posterior ensemble:
\bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a.  (4.58)
If all members are updated with the same observations, y_i \equiv y, then for the deviations of the ensemble members from the mean, which represent the ensemble anomalies e_i^a = x_i^a - \bar{x}^a, relation (4.54) implies:
e_i^a = e_i^f + K_{EnKF}\,(0 - H e_i^f) = (I_n - K_{EnKF} H)\,e_i^f,  (4.59)
P^a = \frac{1}{n-1}\sum_{i=1}^{n} e_i^a (e_i^a)^T = (I_n - K_{EnKF} H)\,P^f\,(I_n - K_{EnKF} H)^T,  (4.60)
where K_{EnKF} \approx K. Comparing this equation with (4.41) shows that the second term is missing, and this underestimation can lead to the divergence of the EnKF.
A solution is to perturb the observation vector for each member: y_i = y + u_i, where u_i is drawn from a Gaussian distribution u_i \sim N(0, R). Let us now define an empirical error covariance matrix
R_u = \frac{1}{n-1}\sum_{i=1}^{n} u_i u_i^T,  (4.61)
where, in order to avoid biases, \sum_{i=1}^{n} u_i = 0; R_u \to R in the asymptotic limit n \to \infty.
With the introduction of this perturbation we modify the anomalies accordingly,
e_i^a = e_i^f + K_u^{EnKF}\,(e_i^o - H e_i^f),  (4.62)
P^a = (I_n - K_u^{EnKF} H)\,P^f\,(I_n - K_u^{EnKF} H)^T + K_u^{EnKF} R_u (K_u^{EnKF})^T = (I_n - K_u^{EnKF} H)\,P^f.  (4.63)
In the forecast step the updated ensemble obtained at the analysis evolves in time and is propagated according to a model,
x_i^f = M(x_i^a) \quad \text{for } i = 1, 2, \ldots, n,  (4.64)
where M is the model operator. The forecast estimate is the mean of the forecast ensemble, \bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f, while the forecast error covariance matrix is
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T.  (4.65)
The most important difference between ensemble Kalman filtering and the other methods is that it only quantifies the uncertainty in the space spanned by the ensemble. This can be a severe limitation if computational resources restrict the number of ensemble members n to be much smaller than the number of model variables m. On the other hand, it also means that the analysis can be performed in a much lower-dimensional space (n versus m). Thus, ensemble Kalman filtering has the potential to be more computationally efficient than the other methods.
1. Initialization
Initial system state x_0^f and initial covariance matrix P_0^f.
2. For t_k = 1, 2, \ldots
a. Observation
Draw a statistically consistent observation set for i = 1, 2, \ldots, n:
y_i = y + u_i, \qquad \sum_{i=1}^{n} u_i = 0,  (4.66)
R_u = \frac{1}{n-1}\sum_{i=1}^{n} u_i u_i^T.  (4.67)
b. Analysis
Compute the gain
K_u = P^f H^T (H P^f H^T + R_u)^{-1},  (4.68)
the analysis members
x_i^a = x_i^f + K_u\,(y_i - H(x_i^f)),  (4.69)
their mean \bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a, and the analysis covariance
P^a = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^a - \bar{x}^a)(x_i^a - \bar{x}^a)^T.  (4.71)
c. Forecast
Compute the ensemble forecast for i = 1, 2, \ldots, n, x_i^f = M(x_i^a), its mean
\bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f,  (4.72)
and the forecast covariance
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T.  (4.73)
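Putting the observation perturbation, analysis and forecast steps (4.66)-(4.69) together, a stochastic EnKF analysis can be sketched as follows; the toy two-variable ensemble, the observation operator H and the error statistics are placeholders invented for the example.

import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """Stochastic EnKF analysis with perturbed observations, Eqs. (4.66)-(4.69).
    Xf : forecast ensemble, shape (n_members, n_state)."""
    n, _ = Xf.shape
    xf = Xf.mean(axis=0)
    A = Xf - xf                                     # ensemble anomalies
    Pf = A.T @ A / (n - 1)                          # forecast covariance (4.65)
    U = rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    U -= U.mean(axis=0)                             # enforce zero-mean perturbations
    Ru = U.T @ U / (n - 1)                          # empirical obs covariance (4.67)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + Ru)  # gain (4.68)
    Ypert = y + U                                   # perturbed observations (4.66)
    return Xf + (Ypert - Xf @ H.T) @ K.T            # member-by-member update (4.69)

# Toy two-variable system observed in its first component
rng = np.random.default_rng(1)
Xf = rng.standard_normal((20, 2))
H = np.array([[1.0, 0.0]])
Xa = enkf_analysis(Xf, y=np.array([0.5]), H=H, R=0.1 * np.eye(1), rng=rng)
print(Xa.mean(axis=0))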
Let us now define a revised EnKF in which we eliminate the need to perturb the observations. There are several variants of the deterministic EnKF (Whitaker and Hamill [5], Bishop et al. [6], Hunt et al. [7]). We report here the one developed by Bishop et al. [6] and Hunt et al. [7], which can be defined as the Ensemble Transform Kalman Filter (ETKF), one of the variants of the Ensemble Square Root Kalman Filter (EnSRKF). The other approaches (EnSRKF and LETKF) are presented in the applications chapter, where the Particle filter (Monte Carlo) method applied to a highly non-Gaussian distribution and a nonlinear operator is also presented.
The entire approach works in the ensemble space rather than in the state or observation space. We start with an ensemble \{[X_i^a]_{k-1}, i = 1, 2, \ldots, n\} of model state vectors at time t_{k-1}. Rather than letting one of the ensemble members represent the best estimate of the system state, one assumes the ensemble to be chosen so that its average represents the analysis state estimate. We evolve each ensemble member according to the nonlinear model to obtain a forecast ensemble \{[X_i^f]_k, i = 1, 2, \ldots, n\} at time t_k:
[X_i^f]_k = M_{k-1,k}([X_i^a]_{k-1}).  (4.74)
Let l be the number of scalar observations used in the analysis. Starting from the sample mean \bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} X_i^f, the forecast error covariance matrix (4.65) can be written as
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (X_i^f - \bar{x}^f)(X_i^f - \bar{x}^f)^T = X^f (X^f)^T,  (4.75)
where the ith column of the anomaly matrix X^f is (X_i^f - \bar{x}^f)/\sqrt{n-1}.
The analysis determines not only an estimate and a covariance but also an ensemble \{(x_i^a)_k, i = 1, 2, \ldots, n\} with the appropriate sample mean and covariance,
\bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a  (4.77)
and
P^a = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^a - \bar{x}^a)(x_i^a - \bar{x}^a)^T = X^a (X^a)^T,  (4.78)
where
[X^a]_i = \frac{x_i^a - \bar{x}^a}{\sqrt{n-1}}.  (4.79)
In order to describe the transformation of a forecast ensemble \{[X_i^f]_k, i = 1, 2, \ldots, n\} into an appropriate analysis ensemble \{[X_i^a]_k, i = 1, 2, \ldots, n\}, we assume that the number of ensemble members n is smaller than both the number of model variables m and the number of observations l, even when localization has reduced the effective values of m and l considerably compared to a global analysis. We assume that the choice of the observations to use for the local analysis has already been made, and consider y, H and R to be truncated to these observations; as such, correlations between errors in the chosen observations and errors in the other observations are ignored. Most of the analysis will take place in an n-dimensional space, with as few operations as possible in the model and observation spaces.
Recall relation (4.45), where we want to minimize the cost function J and where the forecast covariance matrix has rank at most n - 1 and is therefore not invertible. Nevertheless, its inverse is well-defined on the space S spanned by the background ensemble perturbations, that is, by the columns of X^f. Thus J is also well-defined in S, and the minimization can be carried out in this subspace. The reduced dimensionality is an advantage from the point of view of efficiency, though the restriction of the analysis mean to S is sure to be detrimental if n is too small. A natural approach is to use the singular vectors of X^f (the eigenvectors of P^f) to form a basis for S. In order to perform the analysis in S, we must choose an appropriate coordinate system, even though a conceptual difficulty of this approach is that the sum of these columns is zero, so they are not linearly independent. We could assume the first n - 1 columns to be independent and use them as a basis, but this assumption is unnecessary and clutters the resulting equations. Instead, we regard X^f as a linear transformation from an n-dimensional space \tilde{S} onto S, and perform the analysis in \tilde{S}. Let d denote a vector in \tilde{S}; then X^f d belongs to the space S spanned by the background ensemble perturbations, and
x = \bar{x}^f + X^f d,  (4.80)
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T = X^f (X^f)^T.  (4.81)
If the observation operator is linear we can also use the notation Y^f = H X^f. If the operator is not linear, we consider Y^f to be the matrix of the observation anomalies:

[Y^f]_i = (H(x_i^f) − ȳ^f)/√(n−1),    (4.82)

with ȳ^f = (1/n) Σ_{i=1}^n H(x_i^f).
With respect to the stochastic EnKF, one performs a single analysis rather than an analysis for each member of the ensemble. Adapting the mean analysis of the stochastic EnKF we have:

x̄^a = x̄^f + K (y − H x̄^f),    (4.83)

x̄^a = x̄^f + X^f d^a.    (4.84)

The gain is then computed in the ensemble space rather than in the observation space, which means we are reformulating the stochastic EnKF.
In order to understand the difference between the deterministic EnKF and the stochastic EnKF we can generate the posterior ensemble. Remembering relation (4.63), we want to factorize P^a = X^a (X^a)^T; we are therefore looking for a square root matrix such that X^a (X^a)^T = P^a. One such matrix is

X^a = X^f (I_n − (Y^f)^T (Y^f (Y^f)^T + R)^{−1} Y^f)^{1/2}.    (4.89)

This is the a posteriori ensemble of anomalies for the deterministic EnKF. Defining T = (I_n + (Y^f)^T R^{−1} Y^f)^{−1/2}, we can build the posterior ensemble as

x_i^a = x̄^a + √(n−1) X^f [T]_i = x̄^f + X^f (d^a + √(n−1) [T]_i),    (4.91)

where 1 = [1, 1, . . . , 1]^T is the vector that yields the same mean state, since X^f 1 = 0.
1. Initialization
   Ensemble of state vectors E_0 = {x_0^f, . . . , x_n^f}
2. For t_k = 1, 2, . . .
   a. Analysis
      Compute the forecast mean, the ensemble anomalies and the observation anomalies:

      x̄_k^f = (1/n) E_k^f 1

      X_k^f = (1/√(n−1)) (E_k^f − x̄_k^f 1^T)

      Y_k^f = (1/√(n−1)) (H_k(E_k^f) − H_k(x̄_k^f) 1^T)

      and update the ensemble (with d_k^a and T_k as defined above) as

      E_k^a = x̄_k^f 1^T + X_k^f (d_k^a 1^T + √(n−1) T_k).    (4.95)

   b. Forecast
      Forecast ensemble

      E_{k+1}^f = M_{k+1}(E_k^a).    (4.96)
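The analysis step above can be sketched in a few lines of Python. The sketch assumes a linear observation operator H and a symmetric square root for T; the function name and the way the increment d^a is computed from T^2 = (I_n + (Y^f)^T R^{-1} Y^f)^{-1} are illustrative choices consistent with Eqs. (4.83)–(4.95), not a reference implementation.

```python
import numpy as np

def etkf_analysis(E_f, y, H, R):
    """One deterministic (ETKF-style) analysis step; E_f has one member per column."""
    m, n = E_f.shape
    x_f = E_f.mean(axis=1)
    X_f = (E_f - x_f[:, None]) / np.sqrt(n - 1)    # scaled forecast anomalies
    Y_f = H @ X_f                                  # observation anomalies (linear H assumed)
    Rinv = np.linalg.inv(R)

    # ensemble-space analysis covariance T^2 = (I + Y^T R^-1 Y)^-1 and increment d^a
    T2 = np.linalg.inv(np.eye(n) + Y_f.T @ Rinv @ Y_f)
    d_a = T2 @ Y_f.T @ Rinv @ (y - H @ x_f)
    x_a = x_f + X_f @ d_a                          # analysis mean

    # symmetric square root T of T2 via eigendecomposition
    vals, vecs = np.linalg.eigh(T2)
    T = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    X_a = X_f @ T                                  # analysis anomalies (scaled)
    return x_a[:, None] + np.sqrt(n - 1) * X_a     # analysis ensemble
```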
4.3.1 Inbreeding
Inbreeding was introduced by Houtekamer and Mitchell [2] to describe a phenomenon in ensemble filtering that arises due to undersampling. The term is also used to describe a situation where the analysis error covariances are systematically underestimated after each observation assimilation. Furrer and Bengtsson [8] indicate that the analysis error covariance P^a should always be smaller than the forecast error covariance since, as we have seen before, it is obtained as P_e^a = (I − K_e H) P_e^f, where the index e indicates the ensemble. The Kalman gain K_e uses the ratio of the error covariance of the forecast background state to the error covariance of the observations to calculate how much weight should be placed on the background state and how much weight should be given to the observations. The forecast state estimate is then adjusted by the observations in accordance with the ratio of background and observation covariance matrices in the Kalman gain. Note that if either the forecast background errors or the observational errors are incorrectly specified, then the adjustment of the forecast state will be incorrect.
Inbreeding in small ensembles that do not adequately span the model subspace can occur due to sampling errors in the covariance estimate (Lorenc [10]). The smaller the ensemble, the greater the degree of undersampling and the greater the chance of underestimated forecast error covariances (Ehrendorfer [11]). In ensemble filters, where each member of the ensemble is updated at the analysis stage by the same observations and the same Kalman gain matrix, there is a tendency for the filter to underestimate the analysis covariance (Whitaker and Hamill [5]). Ensemble Kalman filters that use perturbed observations, such as the EnKF of Evensen [4], have additional sampling errors in the estimation of the observation error covariances. This in turn makes it more likely that inbreeding will occur (Whitaker and Hamill [5]).
In square root filters, since observations are not perturbed, this source of inbreeding is avoided. Inbreeding is a potential source of filter divergence and of the development of spurious long-range correlations (Hamill et al. [12]). Undersampling produces a reduced-rank representation of the background error covariance matrix, and in cases where the undersampling is severe there is a tendency for the variances and covariances to be underestimated.
Spurious correlations happen in the forecast error covariance between state components that are not physically related and that are normally at a significant distance from each other. Where all the observations have an impact on each state variable, large long-range spurious correlations may develop (Anderson [14]). The consequence of these is that a state variable may be incorrectly impacted by an observation that is physically remote. As the size of the ensemble and the true correlation between state components decrease, the error in the covariance estimate, relative to the true correlation, greatly increases (Hamill et al. [12]). In the physical world it is expected that at a distance from a given observation point the true correlation will decrease. In an NWP model, the size of the error relative to the true correlation at grid points remote from the observation point in the forecast error covariance matrix will therefore be expected to increase (Hamill et al. [12]). It was demonstrated by Hamill et al. [12] that the analysis estimate provided by the EnKF is less accurate when the error in the covariance estimate, known as noise, is greater than the true correlation, known as the signal. Since the correlations at a distance are expected to be small and the relative error increases with distance, it is expected that state components distant from the observations have a greater noise-to-signal ratio. These are long-range spurious correlations and they degrade the quality of the analysis estimate. Further, Hamill et al. [12] show that the noise-to-signal ratio is a function of ensemble size: larger ensembles, which more accurately reflect the statistics, have less associated noise. Thus the problem of spurious correlations is associated with undersampling. Lorenc [10] shows that when an ensemble is generated by random sampling from a probability distribution function (pdf), the forecast error covariance will have a sampling error proportional to 1/√N, where N is the size of the ensemble.
4.4 Methods to Reduce Problems of Undersampling
If the ensemble has n members, then the forecast covariance matrix P^f given by (4.75) describes nonzero uncertainty only in the n-dimensional subspace spanned by the ensemble, and a global analysis will allow adjustments to the system state only in this subspace. If the system is high-dimensionally unstable, then forecast errors will grow in directions not accounted for by the ensemble, and these errors will not be corrected by the analysis. On the other hand, in a sufficiently small local region, the system may behave like a low-dimensionally unstable system driven by the dynamics in neighboring regions (see Hunt et al. [7]).
Localization is generally done either explicitly, considering only the observations
from a region surrounding the location of the analysis, or implicitly, by multiplying
the entries in P f by a distance-dependent function that decays to zero beyond a certain
distance, so that observations do not affect the model state beyond that distance.
Since the model is discretized, the choice of which observations to use for each grid
point depends also on the method adopted, and a good choice will depend both on
the particular system being modeled and the size of the ensemble (more ensemble
members will generally allow more distant observations to be used gainfully). It is
important however to have significant overlap between the observations used for
one grid point and the observations used for a neighboring grid point; otherwise the
analysis ensemble may change suddenly from one grid point to the next.
The global analysis should not be confined to the n-dimensional ensemble space and instead should explore a much higher dimensional space. The justification for spatial localization is that, for spatio-temporally chaotic systems, correlations between distant locations in the background covariance matrix P^f are weak, at least on short time scales, while the corresponding sample correlations obtained from a small ensemble are dominated by spurious values that randomly affect the use of the observations. If the system has a characteristic correlation distance, then the analysis should ignore ensemble correlations beyond that distance.
One method to correct the underestimation of the forecast error covariance matrix is covariance inflation. It was introduced by Anderson and Anderson [18] with the aim of increasing the forecast error covariances by inflating, for each ensemble member, the deviation of the background state from the ensemble mean by a certain percentage. Prior to a new observation being assimilated in any new cycle, the background forecast deviations from the mean are increased by a factor of inflation r:

x_i^f ← r (x_i^f − x̄^f) + x̄^f.    (4.98)
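A minimal sketch of this inflation step, assuming the ensemble is stored with one member per column; the inflation factor r = 1.05 is an arbitrary illustrative value.

```python
import numpy as np

def inflate(E_f, r=1.05):
    """Multiplicative inflation of ensemble deviations, Eq. (4.98): x_i <- r (x_i - xbar) + xbar."""
    x_mean = E_f.mean(axis=1, keepdims=True)
    return r * (E_f - x_mean) + x_mean
```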
Houtekamer and Mitchell [2], Hamill et al. [12] and Whitaker and Hamill [5] show that covariance localization is a process of cutting off longer-range correlations in the error covariances at a specified distance. It is a method of improving the estimate of the forecast error covariance. Although the empirical P^f is a good approximation, its insufficient rank introduces the long-range spurious correlations mentioned above. The idea is then to regularize P^f by filtering out these long-range correlations and increasing the rank of P^f by multiplying it with a short-range predefined correlation matrix ρ. The pointwise multiplication is ordinarily achieved by applying a Schur product (Schur [20]), also known as the Hadamard product (Horn [21]), denoted with ∘, to the forecast error covariance matrix.

A Schur product involves an element-wise product of matrices, written as ρ ∘ P^f, where P^f and ρ have the same dimensions. If i is the row index and j is the column index, the Schur product is calculated as [ρ ∘ P^f]_{ij} = ρ_{ij} [P^f]_{ij}.
In mathematics, the Hadamard product (also known as the Schur product or the entrywise product) is a binary operation that takes two matrices of the same dimensions and produces another matrix in which each element ij is the product of the elements ij of the original two matrices. It should not be confused with the more common matrix product. It is attributed to, and named after, either the French mathematician Jacques Hadamard or the German mathematician Issai Schur. The Hadamard product is associative and distributive, and unlike the matrix product it is also commutative (see Wikipedia).
To achieve covariance localization by a Schur product, a function ρ is normally defined to be a correlation function with local support. Local support means that the function is non-zero only in a small (local) region and is zero elsewhere. The correlation function is commonly taken to be the compactly supported 5th-order piecewise rational function defined in Gaspari and Cohn [22]:

ρ(z) = −(1/4)(|z|/c)^5 + (1/2)(|z|/c)^4 + (5/8)(|z|/c)^3 − (5/3)(|z|/c)^2 + 1,   for 0 ≤ |z| ≤ c,

ρ(z) = (1/12)(|z|/c)^5 − (1/2)(|z|/c)^4 + (5/8)(|z|/c)^3 + (5/3)(|z|/c)^2 − 5(|z|/c) + 4 − (2/3)(c/|z|),   for c ≤ |z| ≤ 2c,

ρ(z) = 0,   for 2c ≤ |z|,
where z is the Euclidean distance either between grid points in physical space or between a grid point and the observation location, depending on the implementation. A length scale c is defined such that beyond it the correlation reduces from 1, and at a distance of more than twice the correlation length scale the correlation reduces to zero. The length scale is generally set to c = √(10/3) l, where l is any chosen cutoff length scale. The factor √(10/3) is included to tune the correlation function to be optimal (Lorenc [10]), such that the final localised global average error variance is closest to that of the true probability distribution.
To achieve covariance localization, a Schur product is taken between the forecast background error covariance matrix P^f, calculated from the ensemble, and a correlation function with local support, ρ. Remembering the Kalman gain (4.55) we can write:

K_e = (ρ ∘ P_e^f) H^T (H (ρ ∘ P_e^f) H^T + R)^{−1}.    (4.100)
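The following sketch implements the Gaspari–Cohn correlation function above and builds a localization matrix ρ for a one-dimensional grid; the grid size and length scale are illustrative, and the element-wise product ρ ∘ P^f would then enter the gain of Eq. (4.100).

```python
import numpy as np

def gaspari_cohn(z, c):
    """Compactly supported 5th-order piecewise rational correlation (Gaspari and Cohn [22])."""
    z = np.abs(z) / c
    rho = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    rho[inner] = -0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3 - (5.0 / 3.0) * zi**2 + 1.0
    rho[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3 + (5.0 / 3.0) * zo**2
                  - 5.0 * zo + 4.0 - (2.0 / 3.0) / zo)
    return rho

# Schur (element-wise) product localization of an ensemble covariance on a 1-D grid
ngrid, c = 40, 5.0                               # illustrative grid size and length scale
dist = np.abs(np.subtract.outer(np.arange(ngrid), np.arange(ngrid))).astype(float)
rho = gaspari_cohn(dist, c)                      # localization matrix
# P_loc = rho * P_f   (element-wise product, to be used in the gain of Eq. (4.100))
```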
References
1. Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Weather Rev. 124, 85–96 (1996)
2. Houtekamer, P.L., Mitchell, H.L.: Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. 126, 796–811 (1998)
3. Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99, 10143–10162 (1994)
4. Evensen, G.: Data Assimilation: The Ensemble Kalman Filter, 2nd edn, p. 320. Springer, New York (2009)
5. Whitaker, J.S., Hamill, T.M.: Ensemble data assimilation without perturbed observations. Mon. Weather Rev. 130, 1913–1924 (2002)
6. Bishop, C.H., Etherton, B.J., Majumdar, S.J.: Adaptive sampling with the ensemble transform Kalman filter. Part I: theoretical aspects. Mon. Weather Rev. 129, 420–436 (2001)
7. Hunt, B.R., Kostelich, E.J., Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230, 112–126 (2007)
8. Furrer, R., Bengtsson, T.: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal. 98, 227–255 (2007)
9. UK Met Office: Observations. https://www.metoffice.gov.uk/research/nwp/observations/. Last accessed 2008
10. Lorenc, A.C.: The potential of the ensemble Kalman filter for NWP: a comparison with 4D-Var. Q. J. R. Meteorol. Soc. 129, 3183–3203 (2003)
11. Ehrendorfer, M.: A review of issues in ensemble-based Kalman filtering. Meteorol. Z. 16, 795–818 (2007)
12. Hamill, T., Mullen, S., Snyder, C., Toth, Z., Baumhefner, D.: Ensemble forecasting in the short to medium range: report from a workshop. Bull. Am. Meteorol. Soc. 81, 2653–2664 (2000)
13. Petrie, R.E.: Localization in the ensemble Kalman Filter. MSc thesis, Atmosphere, Ocean and Climate, University of Reading (2008)
14. Anderson, J.L.: An ensemble adjustment Kalman filter for data assimilation. Mon. Weather Rev. 129, 2884–2903 (2001)
15. Ott, E., et al.: A local ensemble Kalman filter for atmospheric data assimilation. Tellus 56A, 415–428 (2004)
16. Szunyogh, I., Kostelich, E.J., Gyarmati, G., Patil, D.J., Hunt, B.R., Kalnay, E., Ott, E., Yorke, J.A.: Assessing a local ensemble Kalman filter: perfect model experiments with the National Centers for Environmental Prediction global model. Tellus 57A, 528–545 (2005)
17. Kalnay, E.: Atmospheric Modelling, Data Assimilation and Predictability. Cambridge University Press, Cambridge (2003)
18. Anderson, J.L., Anderson, S.L.: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev. 126, 2741–2758 (1999)
19. Oke, P.R., Sakov, P., Corney, S.P.: Impacts of localisation in the EnKF and EnOI: experiments with a small model. Ocean Dyn. 57, 32–45 (2007)
20. Schur, I.: Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen. Journal für die reine und angewandte Mathematik 140, 1–28 (1911)
21. Horn, R.: The Hadamard product. In: Johnson, C.R. (ed.) Matrix Theory and Applications, American Mathematical Society, Proceedings of Symposia in Applied Mathematics, vol. 40, pp. 87–169 (1990)
22. Gaspari, G., Cohn, S.E.: Construction of correlation functions in two and three dimensions. Q. J. R. Meteorol. Soc. 125, 723–757 (1999)
Chapter 5
Applications
Abstract Sometimes the applications are neglected when, instead, they are the true benchmark of the theory. This chapter deals, by using simple pedagogical examples, with forecasting processes based on data assimilation methods. They are applied in different fields: the atmospheric study by the classical Lorenz model, the biology of cells by a model of tumor growth, the planetary general circulation by an application on Mars and the earthquake forecasts by renewal process models.
The Kalman Filter equations describe how information from forecasts and from observations should be combined in an optimal way that extracts the maximum information from each source. A forecast starts from initial conditions at time t0 and runs to time t1, accumulating errors over this period. In the language of DA, the combination of forecast and observations is called the analysis state. The analysis state at t1 is then used as the initial condition for another forecast to time t2, which is then combined with the next batch of observations at that time. When one uses the KF one describes not only the analysis state but also the uncertainties associated with the information provided by the observations.
The Kalman filter presents the optimal solution to the data assimilation problem
under the assumptions of linear models with Gaussian observation and model noise.
These assumptions are strongly violated in stochastic point process models for earth-
quake forecasting where one needs a more general approach based on propagating
the entire probability distribution, rather than solely mean and covariance.
This chapter treats, by using simple pedagogical examples, the forecasting
processes based on data assimilation methods in different fields: the atmospheric
study by the classical Lorenz model, the biology of cells by a model of tumor growth,
the planetary general circulation by an application on Mars and the earthquake fore-
casts by renewal processes models.
The first of the pedagogical models is the simplified one proposed by Lorenz. Lorenz [1] discusses three one-dimensional toy models that incorporate many features shown in real atmospheric dynamics and in global numerical weather prediction models.
The first model (Lorenz model 1) was originally introduced in Lorenz [2] and Lorenz
and Emanuel [3]. This model has become the standard model for the initial testing of
EnKF schemes. The popularity of the model is in part due to the similarity between
the propagation of uncertainties (forecast errors) in Lorenz model 1 and global cir-
culation models in the midlatitude storm-track regions. In particular, the errors are
propagated by dispersive waves whose behavior is similar to that of synoptic- scale
Rossby waves, and the magnitude of the errors has a doubling time of about 1.5
days (where the dimensionless model time has been converted to dimensional time
by assuming that the characteristic dissipation time scale in the real atmosphere is
5 days). Lorenz model 2 adds the feature of a smooth spatial variation of the model
variables that resembles the smooth variation of the geopotential height stream-
function at the synoptic and large scales in the atmosphere. Lorenz model 3, the
most refined and realistic of the three models in Lorenz [1], adds a rapidly varying
small-amplitude component to the smooth large-scale flow, mimicking the effects of
small-scale atmospheric processes.
Previously, Edward Lorenz [4] had described the convective motion of a fluid in a small, idealized Rayleigh-Bénard cell. The Lorenz equations were derived from the Oberbeck-Boussinesq approximation to the equations describing fluid circulation in a shallow layer of fluid, heated uniformly from below and cooled uniformly from above. This fluid circulation is known as Rayleigh-Bénard convection. The fluid is assumed to circulate in two dimensions (vertical and horizontal) with periodic rectangular boundary conditions.
The partial differential equations modeling the system's stream function and temperature are subjected to a spectral Galerkin approximation: the hydrodynamic fields are expanded in Fourier series, which are then severely truncated to a single term for the stream function and two terms for the temperature. This reduces the model equations to a set of three coupled, nonlinear ordinary differential equations. A detailed derivation may be found, for example, in nonlinear dynamics textbooks. The Lorenz system is a reduced version of a larger system studied earlier by Barry Saltzman [5]. In his idealized model the boundary conditions of the fluid at the upper and lower plates were stress free rather than the realistic no-slip conditions, while the lateral boundary conditions are taken to be periodic rather than corresponding to realistic side walls, and the motion is assumed to be two- rather than three-dimensional. These modifications greatly simplified the mathematical analysis because the governing equations reduce the complicated partial differential equations describing the fluid motion and heat flow to three ordinary differential equations in which the fluid and heat equations are coupled. Since these equations are non-linear, there are terms coupling the different modes, and also terms generating higher harmonics representing the thermal modes and the fluid velocity components. The major approximation is that the latter terms are ignored.
dx/dt = f_x(x, y, z, σ) = σ(y − x)

dy/dt = f_y(x, y, z, ρ) = ρx − y − xz

dz/dt = f_z(x, y, z, β) = xy − βz,    (5.1)

where σ depends on the properties of the fluid (the ratio of the viscous to thermal diffusivities) and β = 8/3 (this would be different for a different choice of horizontal wavelength or roll diameter). The temperature difference ρ is the important control parameter: for ρ < 1 the solution at long times is asymptotic to x = y = z = 0, i.e. no convection. For ρ > 1 convection sets in and, for sufficiently large ρ, chaotic solutions occur. Lorenz realized the importance of this aperiodic motion. This Lorenz model has been widely used for exploring many real world problems.
In the last few years an increasing number of physicists have directed their research
activities toward investigating the complex behavior of nonlinear model systems
described by seemingly simple deterministic differential or difference equations.
In fact there is a wealth of similar systems of nonlinear differential equations and
difference equations whose chaotic state is characterized by apparently random-
looking motion on attractors displaying a complicated structure. The trajectories in
the phase space of these systems and the bifurcation sequence leading to the erratic
motion have been studied intensively by computer calculations. Lyapunov characteristic exponents, Poincaré maps, symbolic transition dynamics, and other mathematical tools have been used. Lyapunov exponents [6] measure the growth rates of generic perturbations, in a regime where their evolution is ruled by linear equations. Possible universal properties of the bifurcation sequences have been studied using scaling approaches. In mathematics, the Lyapunov exponent or Lyapunov characteristic exponent of a dynamical system is a quantity that characterizes the rate of separation of infinitesimally close trajectories. Quantitatively, two trajectories in phase space with initial separation δZ_0 diverge (provided that the divergence can be treated within the linearized approximation) at a rate given by |δZ(t)| ≈ e^{λt} |δZ_0|, where λ is the Lyapunov exponent.
The rate of separation can be different for different orientations of initial sepa-
ration vector. Thus, there is a spectrum of Lyapunov exponents equal in number to
the dimensionality of the phase space. It is common to refer to the largest one as the
Maximal Lyapunov exponent (MLE), because it determines a notion of predictability
for a dynamical system. A positive MLE is usually taken as an indication that the
system is chaotic (provided some other conditions are met, e.g., phase space com-
pactness). Note that an arbitrary initial separation vector will typically contain some
component in the direction associated with the MLE, and because of the exponential
growth rate, the effect of the other exponents will be obliterated over time.
Since the equations for the dynamics of x, y, z are first order and autonomous, one may consider this set of variables as the phase space. The dynamics at each point, specified by the velocity vector in phase space, is unique. The evolution in time then traces out a path in the three dimensions. An immediate result is that phase space trajectories cannot cross.
The Lorenz model can be solved using a discretized fourth-order Runge-Kutta method [9]. Starting from Eq. (5.1) we have:

x_{k+1} = x_k + (Δt/6) (f_1 + 2f_2 + 2f_3 + f_4)

y_{k+1} = y_k + (Δt/6) (g_1 + 2g_2 + 2g_3 + g_4)

z_{k+1} = z_k + (Δt/6) (h_1 + 2h_2 + 2h_3 + h_4),    (5.2)
where

f_1 = f_x(x, y, z, σ)
g_1 = f_y(x, y, z, ρ)
h_1 = f_z(x, y, z, β)

f_2 = f_x(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, σ)
g_2 = f_y(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, ρ)
h_2 = f_z(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, β)

f_3 = f_x(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, σ)
g_3 = f_y(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, ρ)
h_3 = f_z(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, β)

f_4 = f_x(x + Δt f_3, y + Δt g_3, z + Δt h_3, σ)
g_4 = f_y(x + Δt f_3, y + Δt g_3, z + Δt h_3, ρ)
h_4 = f_z(x + Δt f_3, y + Δt g_3, z + Δt h_3, β),    (5.3)

where Δt is the model time step and k is the time step index.
Since the discretization introduces an error, it is usual to add to (5.2) a term of the form √Δt η, where η = (η_x, η_y, η_z)^T ∼ N(0, Q) is assumed to be a normally distributed random vector with zero mean and error covariance Q.
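A compact Python sketch of the integration scheme (5.2)–(5.3), with the optional √Δt η model-error term; the classical parameter values σ = 10, ρ = 28, β = 8/3 are assumed for illustration.

```python
import numpy as np

def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (5.1), with the classical parameter values as defaults."""
    x, y, z = state
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def rk4_step(state, dt, q_std=0.0, rng=None):
    """One fourth-order Runge-Kutta step, Eqs. (5.2)-(5.3), plus optional sqrt(dt)*eta model noise."""
    f1 = lorenz63_rhs(state)
    f2 = lorenz63_rhs(state + 0.5 * dt * f1)
    f3 = lorenz63_rhs(state + 0.5 * dt * f2)
    f4 = lorenz63_rhs(state + dt * f3)
    new = state + dt / 6.0 * (f1 + 2.0 * f2 + 2.0 * f3 + f4)
    if q_std > 0.0 and rng is not None:
        new = new + np.sqrt(dt) * rng.normal(scale=q_std, size=3)
    return new

# short trajectory on the attractor
rng = np.random.default_rng(1)
traj = [np.array([1.0, 1.0, 1.0])]
for _ in range(2000):
    traj.append(rk4_step(traj[-1], dt=0.01))
```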
On the web there are some implementations of the Lorenz experiment. They are written in MATLAB (Kuhl and Kostelich [10]) for the LETKF method, in C++ (Bannister [11]) or in IDL (Migliorini [12]) for the EnSRKF method. The reason is that the Lorenz 63 or 96 model, depending on the paper, is a test bench for understanding the behavior of a simplified atmosphere or for carrying out experiments on data assimilation or chaos.
In this paragraph we show the EnSRKF approach by Bannister. His approach is a little different from the general approach reported in the previous chapter, but it has the advantage of showing another solution in the framework of what we have called the Deterministic Ensemble Kalman Filter.
The EnSRKF process treats the whole ensemble as a single entity, so the value of x̄^a is in fact that of the ensemble mean. In matrix notation we have the forecast ensemble matrix X^f and the analysis matrix X^a. The mean of X^f is denoted by X̄^f, of size n × N, and is made up of N identical columns, each containing the ensemble mean state vector of the model forecast; it is the matrix form of x̄^f. X̄^a, of size n × N, is the ensemble mean matrix of the analyzed Kalman filter output and is the matrix form of x̄^a.

In order to use matrix notation we also define a new matrix Y, made up of N identical columns each containing the observation vector y. Thus
P^f ≈ (1/(N−1)) Σ_{k=1}^N (x_k^f − x̄^f)(x_k^f − x̄^f)^T

P^f ≈ (1/(N−1)) X′^f (X′^f)^T,    (5.6)

where X′^f, with columns x_k^f − x̄^f, k = 1, 2, 3, . . . , N, is the n × N matrix of the ensemble member perturbations. The same can be done for the analysis error covariance matrix:

P^a ≈ (1/(N−1)) Σ_{k=1}^N (x_k^a − x̄^a)(x_k^a − x̄^a)^T

P^a ≈ (1/(N−1)) X′^a (X′^a)^T.    (5.7)
X^a = X^f + [X′^f (X′^f)^T/(N−1)] H^T [H [X′^f (X′^f)^T/(N−1)] H^T + R]^{−1} (Y − H X^f).    (5.8)

Indicating B = H X′^f we have:

X^a = X^f + X′^f B^T C^{−1} (Y − H X^f).    (5.11)

Remembering that P^f and P^a are given by (5.6) and (5.7), and taking into account that H X′^f = B, we make a substitution in relation (5.12), obtaining

X′^a (X′^a)^T = X′^f [I − G] (X′^f)^T.    (5.15)
In the square root scheme the idea is to obtain an ensemble of analysis perturbations (contained in X′^a) that has the covariance given by relation (5.12). The matrix X′^a is thought of as the square root of the analysis error covariance matrix given by (5.7), which can be added to the mean X̄^a of relation (5.10). The last step is to find an X′^a that has the properties of (5.7), which is the same as finding the square root of (5.7). Thus we first decompose G into its eigenvectors Γ and eigenvalues Λ, G = Γ Λ Γ^T. Bringing the eigenvectors outside the brackets and using the property of the eigenvector matrix Γ Γ^T = I, we have

X′^a (X′^a)^T = X′^f Γ [I − Λ] Γ^T (X′^f)^T.

The square matrix [I − Λ] has an infinite number of possible roots. One is simply [I − Λ]^{1/2} and another is [I − Λ]^{1/2} Γ^T, since Γ Γ^T = I. Therefore:

[(I − Λ)^{1/2} Γ^T] [(I − Λ)^{1/2} Γ^T]^T = (I − Λ)^{1/2} Γ^T Γ (I − Λ)^{1/2} = (I − Λ)^{1/2} (I − Λ)^{1/2} = I − Λ.    (5.18)

Thus, using the second square root, we have

X′^a (X′^a)^T = X′^f Γ [I − Λ]^{1/2} Γ^T ([I − Λ]^{1/2} Γ^T)^T Γ^T (X′^f)^T.    (5.19)
The final step is to construct the full ensemble from the perturbations and then to propagate this ensemble to the next step:

X^a(t) = X̄^a + X′^a    (5.21)

X^f(t + Δt) = M(X^a(t)) + ΔX(t),    (5.22)

where M is the nonlinear forecast model and ΔX(t) is an n × N matrix of stochastic perturbations used to simulate the imperfect model with specified error covariance Q. In the absence of observations no data assimilation can be performed and only the forecast step is carried out.
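A possible sketch of such a deterministic square-root analysis step is given below. Since the intermediate relations (5.9)–(5.14) are not reproduced here, the explicit form of C and the use of a symmetric square root are assumptions; the sketch only follows the structure of Eqs. (5.6)–(5.22), with illustrative names.

```python
import numpy as np

def ensrkf_analysis(E_f, y, H, R):
    """Square-root analysis: mean update plus deterministic perturbation update (no perturbed obs)."""
    m, N = E_f.shape
    xbar = E_f.mean(axis=1, keepdims=True)
    Xp = E_f - xbar                                  # unscaled perturbations X'
    B = H @ Xp                                       # observation-space perturbations
    C = B @ B.T / (N - 1) + R                        # innovation covariance (assumed form)
    Cinv = np.linalg.inv(C)

    # mean update, cf. Eq. (5.8)
    xbar_a = xbar + Xp @ B.T @ Cinv @ (y[:, None] - H @ xbar) / (N - 1)

    # perturbation update via eigendecomposition of G, cf. Eqs. (5.15)-(5.19)
    G = B.T @ Cinv @ B / (N - 1)
    lam, gamma = np.linalg.eigh(G)
    sqrt_factor = gamma @ np.diag(np.sqrt(np.clip(1.0 - lam, 0.0, None))) @ gamma.T
    Xp_a = Xp @ sqrt_factor

    return xbar_a + Xp_a                             # analysis ensemble
```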
Systems biology considers that information flow depends on a mix of both molecular signals (DNA, proteins, lipids, etc.) and physical signals (system status parameters: strength, thermodynamic constraints), which interact with each other through nonlinear dynamics. To understand these interactions it is necessary, on one side, to provide data on which it is possible to conduct statistical evaluations, made possible through techniques of genomics, proteomics, metabolomics, etc. Such approaches provide large amounts of information and a way to understand how physical forces, the weight of tangential stress, stiffness, surface tension and gravitational interactions, determine the fate of complex biological systems.
At the same time, mathematical and numerical models have been progressively used as a tool for supporting medical research in biology and medicine. In silico experiments have provided remarkable insights into physio-pathological processes, complementing the traditional in vitro and in vivo investigations. Furthermore, numerical models have been used to give a dynamical representation of the biology of a specific patient and to support prognostic activity.
The need to obtain not only qualitative but also quantitative responses for diagnostic purposes has stimulated the design of new methods and instruments for measurement and imaging. The advent of high resolution 3D imaging instruments in biology and medicine is producing novel approaches to treat the huge amounts of data which can be used for numerical simulations. Moreover, beyond validation, it is possible to merge simulations and measurements by means of more sophisticated numerical techniques.
Recently, some data assimilation methods have been applied to biomathematics, merging observed (generally sparse and noisy) information into a numerical model. This approach improves the quality of the information because it allows effects that would otherwise be difficult to model to be included, by introducing a sophisticated filter able to balance the uncertainties of the measured data with basic principles. In summary, data assimilation methods, born in geophysics and meteorology, are now mature enough to be used in the fields of biology and medicine.
What is relevant is the transparency of the mathematical and statistical methodology and of the digital representation of a specific pathology in a given patient (electronic/virtual phenotype). This approach allows us to initiate a route towards a well founded selection of therapeutics for future integrated and personalized care, in agreement with current institutional guideline development.

The integration of missing information, such as that contained in case reports, will make it possible to approach the solution of a single difficult case by using a set of choices from different domains (social, ethical, medical, etc.) capable of improving the quality of life for the specific pathology and patient stage.
The data from specific research in the pathology sector will be the basis for the continuous future tuning of the database and for the periodic validation of a complex virtual engine. By reconstructing the electronic phenotype, the recognition of the individuality (personalization) of each single patient is actually linked to the overall information recruited on the disease.
Phenotype domains are crucial to represent the personalized patient in all their dimensions, even though social, economic and psychological-ethical conditions may orient the final selection of the integrated therapeutic actions.
Taking into account the process of creation and/or death of the cells due to treatment, one can write the partial differential equation in three dimensions

∂c/∂t = ∂/∂x(D ∂c/∂x) + ∂/∂y(D ∂c/∂y) + ∂/∂z(D ∂c/∂z) + ρc(1 − c/T_max) − G(t)c,    (5.24)

where G(t) accounts for the temporal profile of the treatment and as a first approximation may be considered constant, and ρ is the maximum glioma growth rate. Such a model is given by both Chaplain [13] and Swanson [14]. In order to avoid an exponential growth term, the cells grow logistically with some carrying capacity T_max at any given point of the model domain. The representative parameters for this model applied to the brain are given in Table 5.1, which also reports the values for white matter, gray matter, corpus callosum and cerebrospinal fluid (CSF). The initial conditions are

c(x, y, z, 0) = f(x, y, z),    (5.25)

while the boundary condition on the brain domain Ω on which the diffusion equation is to be solved is
n̂ · D(∂c/∂x î + ∂c/∂y ĵ + ∂c/∂z k̂) = 0 on ∂Ω.    (5.26)

In the Neumann boundary condition the normal derivative of the unknown function c (the solution of the PDE) is

∂c/∂n(x, y, z) = n̂ · ∇c = f(x, y, z),    (5.27)

where n̂ is the unit vector normal to the boundary ∂Ω of the domain Ω, î, ĵ, k̂ are the unit vectors along the coordinate directions, and f(x, y, z) is a known function that defines the initial spatial distribution of malignant cells [15].
In order to solve Eq. (5.24) one uses the Crank-Nicolson method, well known in the framework of many diffusion problems.
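As an illustration of the Crank–Nicolson idea, the following sketch advances a one-dimensional analogue of Eq. (5.24), treating diffusion implicitly and the logistic and treatment terms explicitly; the geometry, the parameter values and the explicit treatment of the reaction term are simplifying assumptions, not the scheme used by the authors.

```python
import numpy as np

def crank_nicolson_step(c, D, rho, Tmax, G, dt, dx):
    """One time step of a 1-D analogue of Eq. (5.24): implicit diffusion, explicit reaction terms."""
    n = c.size
    alpha = D * dt / (2.0 * dx**2)

    # second-difference operator with zero-flux (Neumann) boundaries, cf. Eq. (5.26)
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[0, 1] = 2.0
    A[-1, -2] = 2.0

    reaction = rho * c * (1.0 - c / Tmax) - G * c
    rhs = (np.eye(n) + alpha * A) @ c + dt * reaction
    return np.linalg.solve(np.eye(n) - alpha * A, rhs)

# illustrative run: Gaussian initial tumour-cell density on a 1-D domain
x = np.linspace(0.0, 10.0, 101)
c = np.exp(-((x - 5.0) ** 2))
for _ in range(200):
    c = crank_nicolson_step(c, D=0.05, rho=0.1, Tmax=1.0, G=0.0, dt=0.05, dx=x[1] - x[0])
```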
We report here the paradigmatic approach of Kostelich et al. [16], because the data assimilation procedure, despite being applied to a brain tumor (glioblastoma multiforme, GBM), does not depend on the details of a given cancer growth model and should be applicable to other models of cancer or biological phenomena. They report two types of models. The one reported here is simpler than the other, which involves more processes, from haptotaxis to degradation and repair.

We prefer to avoid discussing the second model in order to emphasize the data assimilation methods. The equation has been integrated on the brain geometry obtained from the BrainWeb database developed by the McConnell Brain Imaging Centre of the Montreal Neurological Institute at McGill University.

This is an example of the Local Ensemble Transform Kalman Filter (see Hunt et al. [17]), one of the general approaches of classical or advanced data assimilation used in weather forecasting. It can be applied to GBM growth even though the details of the growth of the tumor cells are poorly known due to the complexity of the mechanisms
involved. Data are obtained from high resolution episodic images acquired at intervals of weeks to months using chemical agents to enhance the contrast. The therapy induces further complications that affect the information we need to apply the data assimilation methods. However, the goal of the research is to obtain good quantitative predictions of GBM growth and spread as well as to estimate their uncertainties.

Data assimilation has been performed using the magnetic resonance images and two different models of growth of the tumor.

Even though the method is the one we have defined as the EnKF, we exploit and expand it here.
The problem is to estimate the solution trajectory that best fits the observations, given an imperfect forecast model that produces trajectories from time t_{n−1} to t_n. Assuming that at each time t_k, k = 1, . . . , n, the observation is related to the state through the operator H, i.e. y_k = H(x(t_k)) + v, where v ∼ N(0, R) is a Gaussian random vector, and that the system evolves according to a linear model x_n = M_n x_{n−1}, the problem is to maximize the likelihood function

L[x(t)] = exp{ −(1/2) Σ_{j=1}^n [y_j − H(x(t_j))]^T R^{−1} [y_j − H(x(t_j))] }    (5.28)

or to minimize the cost function obtained by taking the logarithm of relation (5.28):

J[x(t)] = Σ_{j=1}^n [y_j − H(x(t_j))]^T R^{−1} [y_j − H(x(t_j))].    (5.29)
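A direct transcription of the cost function (5.29), assuming the trajectory and the observations are available as sequences and that H is a callable; all names are illustrative.

```python
import numpy as np

def cost_J(trajectory, obs, H, Rinv):
    """Cost function of Eq. (5.29): sum of squared innovations weighted by R^-1."""
    J = 0.0
    for x_t, y_t in zip(trajectory, obs):
        innov = y_t - H(x_t)
        J += innov @ Rinv @ innov
    return J
```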
As we have seen, the extension of the Kalman Filter to a nonlinear scenario implies that the propagation of the analysis error covariance is no longer tractable through the background covariance matrix given by P_n^f = M_n P_{n−1}^a (M_n)^T. One of the possible solutions is to select an ensemble of k possible trajectories whose variance approximates P_{n−1}^a. Each ensemble member is updated by the model to time t_n to compute the updated ensemble background/forecast covariance P_n^f. An accurate analysis is obtained when the spread of the ensemble approximates P_n^f well; otherwise the analysis fails to correct errors in the forecast model. The LETKF uses localization to overcome problems arising from the small ensemble size. Since the size k is in general smaller than the model resolution m, the LETKF's strategy is to perform the analysis at each point individually by forming a local ensemble over a subset of the model domain (Hunt et al. [17]). In this way the dynamics at a given point are captured over the local region. The local ensemble will then estimate the background uncertainty and correct the forecast at each point, giving an updated global analysis over the entire model grid. The advantage of the LETKF is that it is model independent, i.e. the analysis is computed without the use of the model equations. This is a key feature when the system under study is poorly understood and the models are not fully mature.
The procedure starts at time t_{n−1} with the analysis ensemble consisting of m-dimensional model vectors

{x_{t_{n−1}}^{a_i}, i = 1, 2, . . . , k},    (5.30)

that represent the density of the cells (proliferating, migrating, chemo-repelling) and the extracellular matrix (ECM) at each grid point of the model geometry assumed for the cavity in which the tumor is growing.

The mean is considered the best estimate of the most likely state of the system. The update of each ensemble member is obtained from the model up to time t_n. In such a way one obtains the forecast ensemble

{x_{t_n}^{f_i}, i = 1, 2, . . . , k}.    (5.31)

The relation between the forecast state and the analysis state is given by

x_{t_n}^{f_i} = F(x_{t_{n−1}}^{a_i}, t_{n−1}),   i = 1, 2, . . . , k.    (5.32)
From now on the time subscripts will be omitted. The forecast mean is

x̄^f = k^{−1} Σ_{i=1}^k x^{f_i},    (5.33)

with forecast error covariance

P^f = (k−1)^{−1} Σ_{i=1}^k (x^{f_i} − x̄^f)(x^{f_i} − x̄^f)^T = (k−1)^{−1} X^f (X^f)^T,    (5.34)

and similarly for the analysis,

x̄^a = k^{−1} Σ_{i=1}^k x^{a_i}    (5.35)

and

P^a = (k−1)^{−1} Σ_{i=1}^k (x^{a_i} − x̄^a)(x^{a_i} − x̄^a)^T = (k−1)^{−1} X^a (X^a)^T.    (5.36)
However, the rank of the forecast covariance matrix (5.34) can be at most k − 1 and therefore it is not invertible. Nevertheless, its inverse is well defined on the space S spanned by the forecast ensemble perturbations, that is, by the columns of X^f. Then J is well defined for (x − x̄^f) in S and the minimization can be carried out in this subspace. Thus, in order to carry out the analysis on S, one needs to choose an appropriate coordinate system through an affine transformation. If one regards X^f as a linear transform from a k-dimensional space S̃ onto S, one may perform the analysis in S̃. Let w denote a vector in the space S̃. Then X^f w belongs to the space S spanned by the forecast perturbation ensemble and x = x̄^f + X^f w is the corresponding model state. Note that if w is a random Gaussian vector with mean 0 and covariance (k−1)^{−1} I, then x = x̄^f + X^f w is Gaussian with mean x̄^f and covariance P^f = (k−1)^{−1} X^f (X^f)^T. The cost function J̃ can then be written on S̃.

Combining (5.39) and (5.40) into the cost function (5.38), one finds that the first term on the right is the orthogonal projection of w onto the null space N of X^f, which depends only on the components of w in the null space, while the second term depends only on the components in the column space S of X^f. It then follows that w^a minimizes J̃ if and only if it is orthogonal to N, and that the corresponding x^a minimizes J.
In order to derive the updated analysis and error covariance matrix, one proceeds to compute a minimizer of J̃ based on the Kalman filter. First of all, one applies the observation operator H to each forecast trajectory x^{f_i}, producing the l-dimensional vectors (l ≤ m, where l is the spatial dimension of the observations) which comprise the forecast observation ensemble,

y^{f_i} = H(x^{f_i}).    (5.42)

As before, let us denote the mean forecast observation with ȳ^f and the l × k forecast observation ensemble perturbation matrix with Y^f, whose ith column is y^{f_i} − ȳ^f. Linearizing,

H(x̄^f + X^f w) ≈ ȳ^f + Y^f w,    (5.43)

and the update in ensemble space reads

w̄^a = K̃ [y − ȳ^f − Y^f w̄].    (5.44)

Remembering that

K̃ = P̃^a H^T R^{−1}    (5.45)

and

P̃^a = (I + P̃^f H^T R^{−1} H)^{−1} P̃^f,    (5.46)

with H replaced by Y^f in ensemble space, one obtains

w̄^a = P̃^a (Y^f)^T R^{−1} [y − ȳ^f]    (5.47)

and

P̃^a = [(k − 1) I + (Y^f)^T R^{−1} Y^f]^{−1}.    (5.48)

The analysis mean and error covariance matrix in model variables are

x̄^a = x̄^f + X^f w̄^a,    (5.49)

P^a = X^f P̃^a (X^f)^T.    (5.50)
Let us now determine an analysis ensemble whose mean and error covariance matrix are given by the last two equations, (5.49) and (5.50). The strategic approach is based on the following points.

1. Since the LETKF adopts localization, the analysis ensemble should depend continuously on the analysis covariance matrix, so that the ensembles obtained at neighboring grid points are consistent with each other. A possible choice is X^a = X^f W^a, where W^a is a square root of (k − 1) P̃^a, so that

   P^a = X^f P̃^a (X^f)^T = X^f (k − 1)^{−1} W^a (W^a)^T (X^f)^T = (k − 1)^{−1} X^a (X^a)^T.    (5.51)

2. Choose a matrix whose columns sum to zero and which has the desired error covariance matrix. Showing that the columns of X^a sum to zero is equivalent to showing that X^a v = 0, where v ≡ (1, 1, . . . , 1)^T. Then, because the columns of Y^f also sum to zero, we have

   (P̃^a)^{−1} v = (k − 1) v + (Y^f)^T R^{−1} Y^f v = (k − 1) v,    (5.52)

   so that v is an eigenvector of W^a with eigenvalue 1, and

   X^a v = X^f W^a v = X^f v = 0.    (5.53)

3. Shift the columns so that the mean (5.49) holds: add w̄^a to each column w^{a_i} of W^a. With these weight vectors one gets the ith analysis ensemble member in model space, x^{a_i} = x̄^f + X^f w^{a_i}.

With the previous positions, the updated analysis mean is

k^{−1} Σ_{i=1}^k x^{a_i} = k^{−1} Σ_{i=1}^k (x̄^f + X^f w^{a_i}) = x̄^f + X^f w̄^a + k^{−1} X^f W^a v = x̄^f + X^f w̄^a,    (5.54)

as required.
Following Hunt et al. [17], the procedure starts with several preliminary computations carried out over the entire model grid.
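A sketch of the local analysis step, following Eqs. (5.42)–(5.54) for a single local region; the function signature, the optional multiplicative inflation factor and the symmetric square root used for W^a are illustrative assumptions.

```python
import numpy as np

def letkf_local_analysis(E_f, Yobs_ens, y, Rinv, infl=1.0):
    """
    Local analysis step (sketch of Eqs. (5.42)-(5.54)).
    E_f      : m_local x k local forecast ensemble (one member per column)
    Yobs_ens : l_local x k ensemble of forecast observations H(x^fi) for the chosen local obs
    y        : l_local observed values
    Rinv     : l_local x l_local inverse observation-error covariance
    infl     : optional multiplicative covariance inflation factor
    """
    m, k = E_f.shape
    x_f = E_f.mean(axis=1)
    X_f = E_f - x_f[:, None]                        # forecast perturbations
    y_f = Yobs_ens.mean(axis=1)
    Y_f = Yobs_ens - y_f[:, None]                   # forecast observation perturbations

    Pa_tilde = np.linalg.inv((k - 1) / infl * np.eye(k) + Y_f.T @ Rinv @ Y_f)   # cf. Eq. (5.48)
    w_a = Pa_tilde @ Y_f.T @ Rinv @ (y - y_f)                                    # cf. Eq. (5.47)

    # W^a = [(k-1) Pa_tilde]^{1/2} via symmetric square root, cf. Eq. (5.51)
    vals, vecs = np.linalg.eigh((k - 1) * Pa_tilde)
    W_a = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    W = W_a + w_a[:, None]                          # add the mean weight to each column
    return x_f[:, None] + X_f @ W                   # local analysis ensemble, cf. Eq. (5.54)
```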
Because the data assimilation requires a reference model, this section is divided into two parts, one dealing with a Mars general circulation model and another that deals with Mars data assimilation.

Some planetary models derive from the Mintz-Arakawa model for the Earth's atmospheric circulation. There are several versions; from the first one up to the current UCLA model (version 6.4) lie 40 years of research work. Akio Arakawa has been a leader in the field of Earth atmospheric general circulation model (AGCM) development from its beginning. AGCMs are essential tools for studies of global warming and for projecting the consequences of anthropogenic climate change. His AGCMs contain contributions in several areas:

1. numerical schemes suitable for the long model integrations required by climate studies;
2. modeling of cloud processes including cumulus parameterization;
3. modeling of planetary boundary layer (PBL) processes.

The equations for large-scale atmospheric motion, the so-called primitive equations, are defined by:
where Ω is the planetary rotation rate, a is the planetary radius, v is the wind velocity with eastward component u and northward component v, φ is the latitude, λ is the longitude and t the time. F is the horizontal frictional force per unit mass, k is the vertical unit vector, and div_h and grad_h are respectively the divergence and gradient operators on surfaces of constant σ.
3. The thermodynamic equation

∂(πT)/∂t = −div_h(πvT) + C_p^{−1} b R T (dp/dt) + C_p^{−1} π H,    (5.57)

where H is the rate of heating per unit mass and C_p is the specific heat at constant pressure. Here R = 0.188 × 10^7 erg g^{−1} K^{−1} and C_p = 0.879 × 10^7 (1.0 + 0.634 × 10^{−3}(T − 250)) erg g^{−1} K^{−1} for T ranging from 120 to 300 K. The auxiliary relations, which follow from the equation of continuity, are

∂π/∂t = −∫_0^1 div_h(πv) dσ    (5.58)

and

dp/dt = σ [∂π/∂t + v · grad_h(π)] + π σ̇.    (5.60)
Mars general circulation modeling began in the 1960s, when Leovy and Mintz [18] modified the UCLA two-level atmospheric model developed by Mintz [19] and Arakawa [20] for conditions appropriate to Mars, and used it to study the planet's wind systems, thermal structure, and energetics. More information on Mars' geological structure and a description of the present atmosphere can be found in two books: Mars [21] and An Introduction to its Interior, Surface and Atmosphere [22].
The value of σ̇ = 0 at the lower boundary, where σ = 1, except when CO2 condenses or sublimes. It is assumed that dp/dt = 0 at p = p_T, so that σ̇ = 0 where σ = 0. The heating rate is the combined effect of solar heating, IR radiative transfer, convective exchange and the latent heat due to condensation and sublimation of CO2, which is the main gas on Mars.
Thus the rate of heating per unit mass is

H_i = g/(Δp)_i (ΔS_i + ΔW_i + ΔC_i),    (5.61)

where ΔS_i, ΔW_i, ΔC_i are the differences between the net fluxes at the top and bottom of the layer due to solar radiation, thermal radiation and sub-grid convection. Leovy and Mintz [18] report the equations to compute these quantities. The mass of the layer is represented by (Δp)_i/g, where (Δp)_i is the increase in pressure from the top to the bottom of the layer. In the region where the pressure is lower than p_T it is assumed that heat is absorbed.
Following the Mintz [19] and Arakawa [20] two-layer Earth atmosphere model, they subdivided the Mars atmosphere into an upper layer with i = 1 and a lower layer with i = 3, where (Δp)_1 = (Δp)_3 = (p_s − p_T)/2. The heating H_i is defined as follows.

1. Solar heating
   In the CO2 near-IR absorption bands, the flux differences can be formulated as

   ΔS_1 = (r_m/r)^2 (sin γ)^{1/2} {465 + [2397 + 531 ln(csc γ)] (sin γ)^{1/2}}    (5.62)

   and

   ΔS_3 = (r_m/r)^2 (sin γ)^{1/2} {378 + 657 (sin γ)^{1/2}},    (5.63)

   where r_m/r is the ratio of the Mars mean distance from the Sun to its actual distance and γ is the solar elevation angle.
2. Infrared heating
   The flux differences ΔW_1 and ΔW_3 in the 15 μm IR region, where the main infrared band of CO2 is located, involve the function

   Y(T_k) = [exp(964.1/T_k) − 1].    (5.66)

3. Convective exchange
   The net upward convective heat flux at the surface is

   C_4 = C_p ρ̄ C_H(X) u_* (T_G − T_4),    (5.67)

   where ρ̄ is the mean surface air density, u_* is the surface friction velocity and C_H(X) is a convective heat transfer coefficient which depends on X = [κ g (T_G − T_4)/T̄]^{1/2}/u_*, κ being the molecular thermal diffusivity at the surface and T̄ the global mean surface air temperature. For stable conditions one assumes C_H(X) = C_M, where C_M is the momentum drag coefficient for stable conditions. When the lapse rate is unstable (Γ > Γ_a), the upward convective heat flux at level 2 is

   C_2 = 4 × 10^{−2} T_2 (Γ − Γ_a),    (5.69)

   where Γ is the actual lapse rate between T_1 and T_3 and Γ_a = g/C_p is the adiabatic lapse rate, Γ_a = 4.23 (1.0 + 0.634 × 10^{−3}(T_2 − 250))^{−1} K km^{−1}. When the lapse rate is stable but exceeds the threshold Γ_m ≈ 2.5 K km^{−1} and when the surface heat flux C_4 is upward, a convective exchange arises. In such a case

   C_2 = (Γ − Γ_m)/(Γ_a − Γ_m) C_4.    (5.70)
4. Ground temperature
   From the surface energy balance

   (1 − A) S_4 − W_4 − C_4 − D − L = 0,    (5.71)

   one can obtain the ground temperature T_G. In it, A is the surface albedo, S_4 is the downward solar radiation, W_4 is the net upward IR radiation at the surface, C_4 is the net upward convective heat flux at the surface, D is the downward conductive heat flux into the soil and L is the latent heat release due to CO2 condensation on the surface.
5. Friction and lateral diffusion
   Some of the thermal energy introduced into the system is converted into kinetic energy of the wind field, which is dissipated by the frictional force F linked with the vertical eddy stress. For the two-layer model of Leovy and Mintz [18] one has

   F_1 = 2g τ_2 / (p_s − p_T)    (5.72)

   and

   F_2 = 2g (τ_2 − τ_s) / (p_s − p_T),    (5.73)

   where (p_s − p_T)/(2g) is the mass per unit area of each layer and τ_2 is the stress on the lower layer due to turbulent exchange,

   τ_2 = C (p_s − p_T)/(2g) (v_1 − v_2),    (5.74)

   where u_* = C_M |v_s|, with v_s the surface wind and C_M the momentum drag coefficient.

   Since in the Earth simulation model there is also a lateral diffusion, Leovy and Mintz introduced Mars lateral diffusion coefficients depending on the local grid distance.
Several sophisticated Mars global circulation models (GCMs) have been developed since. As an example, Richardson et al. [23] derived PlanetWRF from WRF, which had been designed for modelling the terrestrial mesoscale and microscale. WRF was modified to work as a planetary model (see http://planetwrf.com/model) and the main changes are:
Fig. 5.1 PlanetWRF global grid. Looking down on the pole (left); C-grid cylindrical projection (right). The grid points are equally spaced in longitude and latitude with horizontally staggered u, v and mass points. At the pole all v points correspond to one physical location. Source: Richardson et al. [23]
the meridional velocity constantly equal to zero. Flux and gradient calculations across the pole are not allowed. This does not preclude advection of material across the pole: the advection over the poles is instead accomplished by zonal transport within the most poleward zone.
   Since the physical distance for zonal advection of information decreases rapidly to zero at the pole, the model time step would need to be reduced to avoid instabilities in the horizontal direction. In numerical analysis, when certain partial differential equations (usually hyperbolic PDEs) are solved, the time step must be less than a certain threshold, otherwise the simulation will produce incorrect results. The Courant-Friedrichs-Lewy [25] or CFL criterion is a necessary condition for convergence of the method of finite differences. To be able to use longer time steps (those suitable for more tropical latitudes), PlanetWRF has implemented polar Fourier filtering of the higher-frequency components of the state variables. All grid points poleward of 60 degrees are filtered, with the cutoff frequency being a function of the cosine of latitude. To yield the greatest stability, the column mass, horizontal winds, temperature and tracers (moisture, aerosols, chemicals, etc.) are also filtered.
3. Generalized Planetary Parameters and Timing Conventions.
   All planet-specific parameters, such as orbital parameters, the relationship between MKS (SI) seconds and model seconds (defined as 1/86400 of the rotational period of the planet), reference pressure, gravity and gas constant, are set in one centralized module, where the set of consistent planetary parameters also resides.
   Since the model assumes that one day is made up of 24 h, each of which is composed of 60 min, in turn made up of 60 s, there is always an integer number of time steps per day. However, the dynamics and physics routines are still integrated in MKS (SI) units, with the conversion from model to SI time made before calculating tendencies and physical quantities. Other items are:
a. Since the WRF version uses the standard day-month-year calendar format, this convention is used to drive the solar radiation routines and to label model output.
b. Since WRF uses routines from the standardized Earth System Modeling Framework (ESMF), PlanetWRF converts these routines to drive the model with user-specified orbital elements using the planetocentric solar longitude (L_s) date system (with L_s = 0 corresponding to northern hemisphere spring equinox and L_s = 90 to northern hemisphere summer solstice, etc.).
4. Parameterizations of sub-grid scale physical processes for various planets.
   Depending on the planet, physical routines have been added to WRF to treat the radiative transfer in an atmosphere with high aerosol optical thickness and gaseous CO2, also taking into account the condensation/sublimation of CO2 from the polar caps (Mars). For Titan, an updated version of the radiative transfer scheme described by McKay et al. [26] has been developed. Simple surface drag and radiative relaxation schemes, similar to the Held and Suarez [27] forcing, have been used to validate the global dynamical core and to treat the Venusian atmosphere. The concept of a dynamical core was introduced to extend the ideas of plug-compatible parameterizations to the coding of the dynamics. PlanetWRF keeps the existing WRF horizontal and vertical diffusion parameterization schemes, as the physics of diffusion remains the same, with only the diffusivities varying.
To perform an assimilation, Lee et al. [28] adopted version 3.0.1.1 of the MarsWRF climate model (Richardson et al. [23]), with radiative forcing in the GCM computed using a two-stream, single-scattering radiative flux solver based on the Hadley Centre Unified Model algorithm (Edwards and Slingo [29]), modified by Mischna et al. [30]. This parameterization calculates fluxes in the visible and infrared spectra using a correlated-k method to couple the optical properties of the carbon dioxide atmosphere with the Mie scattering [31] parameters, in order to describe a radiative atmosphere in which dust and water aerosols are also present.
To create the ensemble of model states, the steady state atmosphere is perturbed
using additive Gaussian white noise perturbations on the temperature, surface pres-
sure, horizontal wind, emissivity, albedo, and column dust opacity. Each ensemble
member is then integrated for a certain number of sols to reach a new steady state.
The term sol is used by planetary astronomers to refer to the duration of a solar day
on Mars. A mean Martian solar day, or sol, is 24 h, 39 min, and 35.244 s. The magni-
tudes of the perturbations are small (e.g. standard deviations of 2 K in temperature,
5 m/s in horizontal wind, 0.02 in albedo and emissivity) and are constrained to keep
albedo and emissivity between zero and one.
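A schematic of this perturbation strategy is sketched below; the real system perturbs full three-dimensional MarsWRF state fields, while here the state is reduced to a few scalar quantities and the variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_ens = 16                                    # illustrative ensemble size
base = {"T": 210.0, "u": 10.0, "albedo": 0.25, "emissivity": 0.95}
pert_std = {"T": 2.0, "u": 5.0, "albedo": 0.02, "emissivity": 0.02}   # magnitudes quoted in the text

ensemble = []
for _ in range(n_ens):
    member = {k: base[k] + rng.normal(scale=pert_std[k]) for k in base}
    # keep albedo and emissivity physically bounded between zero and one
    member["albedo"] = float(np.clip(member["albedo"], 0.0, 1.0))
    member["emissivity"] = float(np.clip(member["emissivity"], 0.0, 1.0))
    ensemble.append(member)
```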
At each assimilation step, an observation forward model within the data assimilation system is used to simulate the observations from the state vectors of the ensemble members. The simulated observations are the calibrated radiances observed by the Thermal Emission Spectrometer (TES) [32] aboard the Mars Global Surveyor (MGS). TES is a nadir-sounding grating spectrometer, observing the spectrum from 200 to 1700 cm^{−1} with a resolution of 5 cm^{−1} or 10 cm^{−1} depending on the scan mode. These data can be obtained from the NASA Planetary Data System archive (PDS, http://pds.jpl.nasa.gov).
The integration step is performed by the MarsWRF GCM under the control of the DART software, which maintains the ensemble of model states and provides the initial conditions necessary to continue the simulation. The assimilation step is performed by the DART software.

DART is a facility for ensemble data assimilation developed and maintained by the Data Assimilation Research Section (DAReS) at the National Center for Atmospheric Research (NCAR), http://www.image.ucar.edu/DAReS/DART/. DART makes it possible to explore a variety of data assimilation methods and observations with different numerical models, and is designed to facilitate the combination of assimilation algorithms, models, and real (as well as synthetic) observations to allow increased understanding of all three.
Sequential Monte Carlo (SMC) methods do not rely on Gaussian approximations. They are a set of simulation-based methods that provide a flexible alternative for computing posterior distributions. They are applicable in very general settings and are often relatively easy to implement. SMC methods are also known under the names of particle filters, bootstrap filters, condensation, Monte Carlo filters, interacting particle approximations and survival of the fittest. Good introductions can be found, for instance, in Arulampalam et al. [37].
As we have seen in previous chapters, Bayesian filtering covers different scenarios: from the Kalman filter to Sequential Monte Carlo sampling, the so-called particle filters. Among the Monte Carlo methods, one of the simplest is Importance Sampling (IS), which introduces the idea of generating samples from a known, easy-to-sample probability density function (pdf) q(x), called the importance density or proposal density, and then correcting the weights of each sample so that the weighted samples approximate the desired density. However IS, in its simplest form, is not adequate for sequential estimation. Whenever new data become available, one needs to recompute the importance weights over the entire state sequence.
Sequential Importance Sampling (SIS) modifies IS so that it becomes possible to compute an estimate of the posterior without modifying the past simulated trajectories. The problem encountered by the SIS method is that, as time increases, the distribution of the importance weights becomes more and more skewed. For instance, if the support of the importance density is broader than that of the posterior density, some particles will have their weights set to zero in the update stage. But even if the supports coincide exactly, many particles will decrease in weight over time, so that after a few time steps only a few lucky survivors have significant weights, while a large computational effort is spent on propagating unimportant particles.
It has been shown that the variance of the weights can only increase over time, so the degeneracy problem cannot be avoided (Kong et al. [38]). Two solutions exist to mitigate it:
1. a good choice of the importance density,
2. resampling.
Sequential Importance Resampling (SIR) introduces, however, other problems. Since particles are sampled from discrete approximations to density functions, the particles with high weights are statistically selected many times. This leads to a loss of diversity among the particles, as the resulting sample will contain many repeated points. This is known as sample impoverishment (see Arulampalam et al. [37]) and is severe when the model forecast is very narrow. There are various methods to deal with this problem, including sophisticated methods that recalculate past states and weights via a recursion, and Markov Chain Monte Carlo (MCMC) methods. Because of the additional problems introduced by resampling, it makes sense to resample only when the weights have degenerated appreciably, that is, when their variance has grown large.
In the next section we introduce the earthquake model and the Sequential Monte Carlo and related methods applicable to earthquake events.
5.4.1 Renewal Process as Forecast Model
One of the models used in seismology is the renewal process. In general, the physical models used for seismic hazard maps are not well defined and are not unique, and even the renewal process models, which are motivated by the elastic rebound theory proposed by Reid [39], are under discussion. According to this theory, a large earthquake releases the elastic strain that has built up since the last large earthquake.
A renewal point process is characterized by intervals between successive events that are independently and identically distributed according to a probability density function that defines the process (Daley and Vere-Jones [40]).
The time of the next event depends only on the time of the last event, t_k = t_{k-1} + τ_k, where τ_k represents the interval between the two events. In this framework the time of the event t_k corresponds, in data assimilation terms, to the model state, and the renewal point process provides the prior information for the analysis.
Werner et al. [41] offer a pedagogical example of earthquake forecasting. Their model is based on a lognormal renewal process [42, 43].
The intervals τ of the lognormal process are distributed according to

f(\tau, \mu, \sigma) = \frac{1}{\tau\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log\tau - \mu)^2}{2\sigma^2}\right)   (5.77)

so that

p(t_k \mid t_{k-1}, \mu, \sigma) = \frac{1}{(t_k - t_{k-1})\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log(t_k - t_{k-1}) - \mu)^2}{2\sigma^2}\right)   (5.78)

with μ = 0.7 and σ = 0.245 given by Biasi et al. [44]. Since the observed occurrence time t_k^o is given by the true occurrence time t_k^t affected by the noise ε_k, we can write

t_k^o = t_k^t + \epsilon_k.   (5.79)
Werner et al. [41] ran their numerical experiments using two different noise distributions, a uniform and a Gaussian one.
1. Uniform distribution:

p_{\mathrm{uniform}}(\epsilon) = \frac{1}{2\Delta}\,H(\epsilon + \Delta)\,H(\Delta - \epsilon)   (5.80)

that is,

p_{\mathrm{uniform}}(\epsilon) = \begin{cases} \dfrac{1}{2\Delta} & -\Delta \le \epsilon \le +\Delta \\ 0 & \text{otherwise} \end{cases}

where H is the Heaviside step function and Δ is the half-width of the noise.
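As an illustration of Eqs. (5.77)–(5.80), the following minimal sketch simulates a sequence of true occurrence times from the lognormal renewal model and corrupts them with uniform observation noise; the noise half-width and the number of events are illustrative placeholders, not values taken from Werner et al. [41].

import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.7, 0.245      # lognormal parameters as quoted in the text
delta = 0.1                 # half-width of the uniform observation noise (illustrative)
n_events = 10

# True occurrence times: t_k = t_{k-1} + tau_k, tau_k lognormal(mu, sigma), Eqs. (5.77)-(5.78)
tau = rng.lognormal(mean=mu, sigma=sigma, size=n_events)
t_true = np.cumsum(tau)

# Observed times: t_k^o = t_k^t + eps_k with eps_k uniform on [-delta, +delta], Eqs. (5.79)-(5.80)
eps = rng.uniform(-delta, delta, size=n_events)
t_obs = t_true + eps

print(np.round(t_true, 2))
print(np.round(t_obs, 2))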
The forecast and observation models can be written in the general state-space form

x_k = f_k(x_{k-1}, w_{k-1}), \qquad y_k = h_k(x_k, v_k),

where f_k and h_k are possibly nonlinear functions and w_{k-1} and v_k are independent and identically distributed (i.i.d.) process and measurement noise sequences, respectively. The problem is to seek filtered estimates of x_k based on the set of all available measurements {y_i, i = 1, ..., k}, as we have already seen in the Bayesian chapter.
From a Bayesian point of view, the tracking problem is to construct the probability density function (pdf) p(x_{0:k} | y_{1:k}), which can be obtained recursively in two stages: forecast and update. The initial pdf is p(x_0 | y_0) ≡ p(x_0), as we have seen previously. When the true density is non-Gaussian, approximate grid-based filters and particle filters improve the performance with respect to other approaches that require a Gaussian approximation.
In Monte Carlo sampling an empirical posterior density at time k can be expressed as

\hat{p}(x_{0:k} \mid y_{1:k}) = \frac{1}{N_p}\sum_{i=1}^{N_p} \delta(x_{0:k} - x_{0:k}^i)   (5.85)

where δ is the Dirac delta mass located at x_{0:k}^i. Considering a function f(x_{0:k}) integrable in a measurable space, the Lebesgue–Stieltjes integral provides the estimate of such a function,

I_{N_p}(f) = \int f(x_{0:k})\,\hat{p}(x_{0:k} \mid y_{1:k})\,dx_{0:k} = \frac{1}{N_p}\sum_{i=1}^{N_p} f(x_{0:k}^i),   (5.86)

which is unbiased.
In importance sampling the objective is to sample the distribution in the region of importance, in order to achieve computational efficiency. This is especially important in high-dimensional spaces, where the data are usually sparse and the region of interest in which the target lies is relatively small with respect to the whole data space.
Thus let {x_{0:k}^i, w_k^i} denote a random measure that characterizes the posterior pdf p(x_{0:k} | y_{1:k}), where {x_{0:k}^i, i = 1, 2, ..., N_p} is a set of support points with associated weights {w_k^i ∝ p(x_{0:k}^i)/q(x_{0:k}^i), i = 1, 2, ..., N_p} and x_{0:k} = {x_j, j = 0, ..., k} is the set of all states up to time k.
One chooses a proposal distribution q(x_{0:k} | y_{1:k}) in place of the true probability distribution p(x_{0:k} | y_{1:k}), so that

p(x_{0:k} \mid y_{1:k}) = \frac{p(x_{0:k} \mid y_{1:k})}{q(x_{0:k} \mid y_{1:k})}\,q(x_{0:k} \mid y_{1:k}),   (5.87)

where the importance weight function is

w(x_{0:k}) = \frac{p(x_{0:k} \mid y_{1:k})}{q(x_{0:k} \mid y_{1:k})}.   (5.89)
When one generates N_p independent and identically distributed samples x_{0:k}^i from the proposal q(x_{0:k} | y_{1:k}), the normalized importance weights are

w_k^i = \frac{w(x_{0:k}^i)}{\sum_{j=1}^{N_p} w(x_{0:k}^j)},   (5.91)

and the posterior is approximated by

\hat{p}(x_{0:k} \mid y_{1:k}) = \sum_{i=1}^{N_p} w_k^i\,\delta(x_{0:k} - x_{0:k}^i).   (5.92)
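The following minimal sketch shows the importance sampling idea of Eqs. (5.87)–(5.92) in a static setting: samples drawn from an easy-to-sample proposal are reweighted so that a weighted average approximates an expectation under the target density. The particular target and proposal densities are illustrative choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_p = 5000

# Target p(x): a standard normal; proposal q(x): a broader Student-t (illustrative choices).
p = stats.norm(loc=0.0, scale=1.0)
q = stats.t(df=3, loc=0.0, scale=2.0)

x = q.rvs(size=n_p, random_state=rng)   # samples from the proposal
w = p.pdf(x) / q.pdf(x)                 # unnormalized weights, Eq. (5.89)
w /= w.sum()                            # normalized weights, Eq. (5.91)

# Weighted estimate of E_p[f(x)] with f(x) = x**2 (true value is 1).
estimate = np.sum(w * x**2)
print(estimate)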
Importance sampling can be useful to reduce the variance of an estimator and whenever it is difficult to sample from the true probability. In general, however, it is difficult to find a good proposal distribution, especially in a high-dimensional space. If the importance density q(x_{0:k} | y_{1:k}) at time k admits as a marginal the importance density at time k−1, the proposal can be constructed sequentially, which is the basis of Sequential Importance Sampling (SIS):

q(x_{0:k} \mid y_{1:k}) = q(x_0)\prod_{l=1}^{k} q(x_l \mid x_{0:l-1}, y_{1:l}).   (5.94)

For a Markov process with conditionally independent observations,

p(x_{0:k}) = p(x_0)\prod_{l=1}^{k} p(x_l \mid x_{l-1})   (5.95)

and

p(y_{1:k} \mid x_{0:k}) = \prod_{l=1}^{k} p(y_l \mid x_l).   (5.96)
According to the laws of probability, the importance weights w_k^i given by relation (5.91), using Eqs. (5.94), (5.95) and (5.96), can be written recursively as

w_k^i \propto w_{k-1}^i\,\frac{p(y_k \mid x_k^i)\,p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{0:k-1}^i, y_{1:k})}.   (5.97)

This equation provides the importance weights for the sequential updating. The advantage of SIS is that it does not rely on the underlying Markov chain. The disadvantage is that the importance weights may have large variances and there is a problem of degeneracy. In order to minimize this problem there are two possible solutions: the first is to choose a good importance density, while the second is resampling.
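A single SIS step can be sketched as follows for a toy scalar state-space model, using the transition density as the importance density (the bootstrap choice), in which case the weight update reduces to multiplying the previous weight by the likelihood of the new observation. The model coefficients and noise levels are illustrative assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_p = 1000

# Toy state-space model (illustrative): x_k = 0.9 x_{k-1} + w,  y_k = x_k + v
x = rng.normal(0.0, 1.0, size=n_p)       # particles at time k-1
w = np.full(n_p, 1.0 / n_p)              # weights at time k-1
y_k = 1.3                                # a new observation

# Proposal = transition density (bootstrap filter), so only the likelihood enters the update.
x = 0.9 * x + rng.normal(0.0, 0.5, size=n_p)      # propagate particles
w = w * stats.norm(loc=x, scale=0.3).pdf(y_k)     # w_k ~ w_{k-1} * p(y_k | x_k), Eq. (5.97)
w /= w.sum()                                      # renormalize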
The sampling-importance resampling (SIR) method is motivated by the bootstrap and jackknife techniques (Efron and Tibshirani [46]). The jackknife is a resampling technique, while bootstrapping evaluates the properties of an estimator through the empirical cumulative distribution function (cdf) of the samples instead of the true cdf.
In order to overcome the degeneracy, a suitable criterion is to compute the effective sample size N_eff (Liu and Chen [47]), defined as

N_{\mathrm{eff}} = \frac{N_p}{1 + \mathrm{Var}(w_k^{*i})}   (5.98)

where

w_k^{*i} = \frac{p(x_k^i \mid y_{1:k})}{q(x_k^i \mid x_{k-1}^i, y_k)}   (5.99)

is the so-called true weight. Since this cannot be evaluated exactly, an estimate \hat{N}_{\mathrm{eff}} can be obtained as the inverse of the so-called participation ratio (Mézard et al. [48]), computed from the normalized weights w_k^i:

\hat{N}_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N_p} (w_k^i)^2}.   (5.100)
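A minimal sketch of the resulting SIR recipe is the following: the normalized weights give the estimated effective sample size through Eq. (5.100), and when it drops below a chosen threshold the particles are resampled (here with systematic resampling) and the weights are reset. The threshold and the test data are illustrative assumptions.

import numpy as np

def effective_sample_size(w):
    # Estimate N_eff as the inverse participation ratio, Eq. (5.100).
    return 1.0 / np.sum(w**2)

def systematic_resample(x, w, rng):
    # Draw len(w) particles with probabilities w using a single uniform offset.
    n_p = len(w)
    positions = (rng.uniform() + np.arange(n_p)) / n_p
    idx = np.searchsorted(np.cumsum(w), positions)
    return x[idx]

rng = np.random.default_rng(4)
x = rng.normal(size=1000)                      # particles
w = rng.exponential(size=1000); w /= w.sum()   # some normalized weights

if effective_sample_size(w) < 0.5 * len(w):    # resample only when degeneracy sets in
    x = systematic_resample(x, w, rng)
    w = np.full(len(w), 1.0 / len(w))          # weights are reset after resampling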
References
1. Lorenz, E.N.: Designing chaotic models. J. Atmos. Sci. 62, 1574–1587 (2005)
2. Lorenz, E.N.: Atmospheric Predictability. Advances in Numerical Weather Prediction, 1965–66 Seminar Series, pp. 34–39. Travelers Research Center, Inc. (1966)
3. Lorenz, E.N., Emanuel, K.A.: Optimal sites for supplementary weather observations: simulation with a small model. J. Atmos. Sci. 55, 399–414 (1998)
4. Lorenz, E.N.: Deterministic non-periodic flow. J. Atmos. Sci. 20, 130–141 (1963)
5. Saltzman, B.: Finite amplitude free convection as an initial value problem I. J. Atmos. Sci. 19, 329–341 (1962)
6. Lyapunov, A.M.: The General Problem of the Stability of Motion. Translated by Fuller, A.T. Taylor and Francis, London (1992). ISBN: 978-0-7484-0062-1. Reviewed in detail by Smith, M.C.: Automatica 3(2), 353–356 (1995)
7. McLaughlin, J.B., Martin, P.C.: Transition to turbulence of a statically stressed fluid system. Phys. Rev. A 12, 186 (1975)
8. Glatzmaier, G.A., Roberts, P.H.: A three-dimensional self-consistent computer simulation of a geomagnetic field reversal. Nature 377, 203–209 (1995)
9. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes. Cambridge University Press, Cambridge (1986)
10. Kuhl, D., Kostelich, E.: Introduction to LETKF Data Assimilation (2008). https://www.atmos.umd.edu/~dkuhl/AOSC614/LETKF_lab.pdf
11. Bannister, R.: A square root ensemble Kalman filter demonstration with the Lorenz model (2012). https://www.met.reading.ac.uk/textdarc/training/lorenze_nsrkf/
12. Migliorini, S.: Ensemble data assimilation with the Lorenz equations (2010). https://www.met.reading.ac.uk/~hraa/projects/05/index.html
13. Chaplain, M.: Modelling Aspects of Cancer Growth: Insight from Mathematical and Numerical Analysis and Computational Simulation. Multiscale Problems in the Life Sciences, Lecture Notes in Mathematics 1940, 147–200 (2008)
14. Swanson, K.R., Alvord Jr., E.C., Murray, J.D.: A quantitative model of differential motility of gliomas in white and grey matter. Cell Prolif. 33, 317–329 (2000)
15. Giatili, S.G., Stamatakos, G.S.: A detailed numerical treatment of the boundary conditions imposed by the skull on a diffusion-reaction model of glioma tumor growth. Clinical validation aspects. Appl. Math. Comput. 218 (2012)
16. Kostelich, E.J., Kuang, Y., McDaniel, J.M., Moore, N.Z., Martirosyan, N.L., Preul, M.C.: Accurate state estimation from uncertain data and models: an application of data assimilation to mathematical models of human brain tumors. Biol. Direct 6, 64 (2011). http://www.biology-direct.com/content/6/1/64
17. Hunt, B.R., Kostelich, E.J., Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230, 112–126 (2007)
18. Leovy, C., Mintz, Y.: The numerical simulation of atmospheric circulation and climate of Mars. J. Atmos. Sci. 26(6), 1167–1190 (1969)
19. Mintz, Y.: Very long-term global integration of the primitive equations of atmospheric motion (an experiment in climate simulation). WMO Tech. Notes No. 66, 141–167; also Meteorol. Monogr. 8(30), 20–36 (1965)
20. Arakawa, A.: Numerical simulation of large-scale atmospheric motions. In: Birkhoff, G., Varga, R.S. (eds.) Numerical Solution of Field Problems in Continuum Physics, vol. 2, pp. 24–40. American Mathematical Society, Providence, R.I. (1970)
21. Matthews, M.S., Kieffer, H.H., Jakosky, B.M., Snyder, C.: Mars. The University of Arizona Press
22. Barlow, N.: Mars: An Introduction to its Interior, Surface and Atmosphere. Cambridge Planetary Science. Cambridge University Press, Cambridge (2014)
23. Richardson, M.I., Toigo, A.D., Newman, C.E.: PlanetWRF: a general purpose, local to global numerical model for planetary atmospheric and climate dynamics. J. Geophys. Res. 112 (2007)
24. Arakawa, A., Lamb, V.R.: Computational design of the basic dynamical processes of the UCLA General Circulation Model. Methods of Computational Physics 17, pp. 173–265. Academic Press, New York (1977)
25. Courant, R., Friedrichs, K., Lewy, H.: On the partial difference equations of mathematical physics. IBM J. (1967)
26. McKay, C.P., Pollack, J.B., Courtin, R.: The thermal structure of Titan's atmosphere. Icarus 80(1), 23–53 (1989)
27. Held, I.M., Suarez, M.J.: A proposal for the intercomparison of the dynamical cores of atmospheric general-circulation models. Bull. Am. Meteorol. Soc. 72(10), 1825–1830 (1994)
28. Lee, C., Lawson, W.G., Richardson, M.I., Anderson, J.L., Collins, N., Hoar, T., Mischna, M.: Demonstration of ensemble data assimilation for Mars using DART, MarsWRF, and radiance observations from MGS TES. J. Geophys. Res. 116 (2011)
29. Edwards, J.M., Slingo, A.: Studies with a flexible new radiation code: 1. Choosing a configuration for a large scale model. Q. J. R. Meteorol. Soc. 122(531), 689–719 (1996)
30. Mischna, M.A., Toigo, A.D., Newman, C.E., Richardson, M.I.: Development of a new global, scalable and generic general circulation model for studies of the Martian atmosphere. Paper presented at the Second Workshop on Mars Atmosphere Modelling and Observations, Granada, Spain, CNES, 27 February–3 March (2006)
31. Levoni, C., Cervino, M., Guzzi, R., Torricella, F.: Atmospheric aerosol optical properties: a database of radiation characteristics for different components and classes. Appl. Opt. 36 (1997)
32. Christensen, P.R., et al.: Mars Global Surveyor Thermal Emission Spectrometer experiment: investigation description and surface science results. J. Geophys. Res. 106(E10) (2001)
33. Montabone, L., Lewis, S.R., Read, P.L., Hinson, D.: Validation of Martian meteorological data assimilation for MGS/TES using radio occultation measurements. Icarus 185, 113–132 (2006)
34. Lewis, S.R., Read, P.L., Conrath, B.J., Pearl, J.C., Smith, M.D.: Assimilation of Thermal Emission Spectrometer atmospheric data during the Mars Global Surveyor aerobraking period. Icarus 192(2), 327–347 (2007)
35. Greybush, S.J., Wilson, R.J., Hoffman, R.N., Hoffman, M.J., Miyoshi, T., Ide, K., McConnochie, T., Kalnay, E.: Ensemble Kalman filter data assimilation of Thermal Emission Spectrometer temperature retrievals into a Mars GCM. J. Geophys. Res. 117 (2012)
36. Navarro, T., Forget, F., Millour, E., Greybush, S.J.: Detection of detached dust layers in the Martian atmosphere from their thermal signature using assimilation. Geophys. Res. Lett. 41(19), 6620–6626 (2014)
37. Arulampalam, M., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
38. Kong, A., Liu, J., Wong, W.: Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 89(425), 278–288 (1994)
39. Reid, H.: The Mechanics of the Earthquake, The California Earthquake of April 18, 1906, Report of the State Investigation Commission, vol. 2. Carnegie Institution of Washington, Washington (1910)
40. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, vol. I. Springer, New York (2003)
41. Werner, M.J., Ide, K., Sornette, D.: Earthquake forecasting based on data assimilation: sequential Monte Carlo methods for renewal point processes. Nonlinear Process. Geophys. 18, 49–70 (2011). www.nonlin-processes-geophys.net/18/49/2011/
42. Ogata, Y.: Seismicity analysis through point-process modeling: a review. Pure Appl. Geophys. 155, 471–507 (1999)
43. Field, E.H.: A summary of previous working groups on California earthquake probabilities. Bull. Seismol. Soc. Am. 97(4), 1033–1053 (2007)
44. Biasi, G., Weldon, R., Fumal, T., Seitz, G.: Paleoseismic event dating and the conditional probability of large earthquakes on the southern San Andreas fault, California. Bull. Seismol. Soc. Am. 92, 2761–2781 (2002)
45. Marshall, A.: The use of multi-stage sampling schemes in Monte Carlo computations. In: Meyer, M. (ed.) Symposium on Monte Carlo Methods, pp. 123–140. Wiley, New York (1956)
46. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman and Hall/CRC, Boca Raton (1993). ISBN 0-412-04231-2
47. Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)
48. Mézard, M., Parisi, G., Virasoro, M.: Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, vol. 9. World Scientific, Singapore (1987)
Appendix
Consider two real matrices A and B of the same dimensions and their Schur (Hadamard) product A ∘ B, whose elementwise definition is recalled below. The following properties hold:
1. if A and B are positive semidefinite, then so is A ∘ B;
2. if B is positive definite and A is positive semidefinite with all its main diagonal entries positive, then A ∘ B is positive definite (Horn [1]);
3. A ∘ (BC) ≠ (A ∘ B)C in general;
4. A(B ∘ C) ≠ (A ∘ B)C in general.
Points 3 and 4 will be needed for the approximations involved when the Schur product is included in the EnKF and ETKF. They can be verified by writing the relevant quantities in matrix index notation, where i is the row index, j is the column index and k is an index to be summed over:
[A ∘ B]_{ij} = A_{ij} B_{ij}

[(A ∘ B)C]_{ij} = \sum_{k=1}^{N} A_{ik} B_{ik} C_{kj}

[BC]_{ij} = \sum_{k=1}^{N} B_{ik} C_{kj}

[A ∘ (BC)]_{ij} = A_{ij} \sum_{k=1}^{N} B_{ik} C_{kj}   (A.1)

[A(B ∘ C)]_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj} C_{kj}   (A.2)
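Points 3 and 4 can also be checked numerically; the following minimal sketch compares the two sides of each relation for small random matrices.

import numpy as np

rng = np.random.default_rng(5)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

lhs_3 = A * (B @ C)   # A o (BC): elementwise product with the matrix product BC
rhs_3 = (A * B) @ C   # (A o B)C
lhs_4 = A @ (B * C)   # A(B o C)

print(np.allclose(lhs_3, rhs_3))   # False in general
print(np.allclose(lhs_4, rhs_3))   # False in general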
Now we cover the various calculation methods used in previous chapters, in order to provide a comprehensive and self-consistent framework of reference.
Consider a first-order linear partial differential equation of the form

a(x, y)\,\frac{\partial u}{\partial x} + b(x, y)\,\frac{\partial u}{\partial y} + c(x, y)\,u = 0   (A.3)

with the boundary condition u(x, 0) = f(x), where u = u(x, y) is the unknown function we need to find and the expressions a(x, y), b(x, y), c(x, y) and f(x) are given.
The objective of the method of characteristics, when it is applied to these equations, is to change coordinates from (x, y) to a new coordinate system (x_0, s) in which the partial differential equation (PDE) becomes an ordinary differential equation (ODE) along certain curves of the x–y plane. These curves, along which the solution of the PDE reduces to an ODE, are called characteristic curves, or simply characteristics. While the new variable s varies along a characteristic curve, the new variable x_0 remains constant.
Let us now transform the PDE into an ODE. Along a curve parameterized by s we have

\frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s}.   (A.4)

If we select

\frac{\partial x}{\partial s} = a(x, y)   (A.5)

and

\frac{\partial y}{\partial s} = b(x, y)   (A.6)

we have

\frac{\partial u}{\partial s} = a(x, y)\frac{\partial u}{\partial x} + b(x, y)\frac{\partial u}{\partial y}   (A.7)

so that, using (A.3),

\frac{du}{ds} + c(x, y)\,u = 0.   (A.8)
The relations (A.5) and (A.6) are the characteristic equations.
The strategy to be adopted to apply the method of characteristics is to:
1. solve the characteristic equations (A.5) and (A.6);
2. integrate them with the initial conditions x(0) = x_0 and y(0) = 0;
3. solve the ODE (A.8) with the initial condition u(x_0, 0) = f(x_0);
4. obtain the solution by expressing s and x_0 in terms of x and y (using the results of step 1) and substituting these values into u(x_0, s) to get the solution u(x, y) of the PDE (a numerical sketch of this recipe is given below).
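The recipe can be illustrated with a minimal numerical sketch for constant coefficients a, b and c, for which the characteristics are straight lines and the solution along a characteristic is u = f(x_0) e^{-cs}; the coefficients and the initial profile below are illustrative choices.

import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 2.0, 1.0, 0.5                 # constant coefficients (illustrative)
f = lambda x: np.exp(-x**2)             # initial data u(x, 0) = f(x)

def rhs(s, z):
    # z = (x, y, u); characteristic system given by (A.5), (A.6) and (A.8)
    x, y, u = z
    return [a, b, -c * u]

x0, s_end = 0.3, 2.0
sol = solve_ivp(rhs, [0.0, s_end], [x0, 0.0, f(x0)])
x_end, y_end, u_num = sol.y[:, -1]

# Analytic solution along the same characteristic: u = f(x0) * exp(-c * s)
u_exact = f(x0) * np.exp(-c * s_end)
print(x_end, y_end, u_num, u_exact)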
The calculus of variations was developed by Leonhard Euler in 1744 to find the functions for which given quantities attain their largest or smallest values. In 1755 Lagrange wrote a letter to Euler in which he described his method of variations; Euler immediately adopted this new method, which he called the calculus of variations. The great advantage of the calculus of variations is that the system is considered as a whole, and the individual components of the system are not explicitly considered. In this way one can deal with a system without knowing in detail all the interactions among its components. The variational calculus determines the stationary points (extrema) of integral expressions, known as functionals.
We can say that a function f(x, y) has a stationary value at a point (x_0, y_0) if, around this point, the rate of change of the function in any direction is zero. The concept of stationary value is well described by the variational operator δ introduced by Lagrange. This operator is similar to the differential operator d, but while d refers to a real infinitesimal displacement, δ refers to a virtual infinitesimal displacement. The virtual nature of the δ operator arises from the fact that it allows an exploratory displacement around a point (a change of position). The operator δ behaves like the differential operator and vanishes at the endpoints of the curve defined by the function. Let us see how it works with a simple example.
The variation of f is

\delta f = \frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y.   (A.9)

The virtual displacements in the two coordinates can be written as

\delta x = \epsilon_x, \qquad \delta y = \epsilon_y.   (A.10)

In this way the variation of the function, in a specific direction, is given by

\delta f = \frac{\partial f}{\partial x}\,\epsilon_x + \frac{\partial f}{\partial y}\,\epsilon_y.   (A.11)

By definition, since (x_0, y_0) is a stationary point, δf must vanish for any virtual displacement, regardless of the direction of the shift, that is, independently of ε_x and ε_y. Then the condition that all partial derivatives vanish is a necessary and sufficient condition for the function f to have a stationary value at that point:

\frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0.   (A.12)
The fact that f(x_0, y_0) is a stationary value of f is a necessary but not sufficient condition for it to be an extremum of f. When, in an infinitesimal neighbourhood of (x_0, y_0), f > f(x_0, y_0) everywhere, then f(x_0, y_0) is a local minimum; vice versa, if f < f(x_0, y_0), then f(x_0, y_0) is a local maximum. In these cases the stationary value is also an extremum; in the other cases f(x_0, y_0) is a stationary value but not an extremum. The second derivatives allow us to establish whether a stationary point is a maximum, a minimum or neither. Of course we need to specify the domain in which the stationary or extreme values are sought.
When the variational calculation is subject to constraints, for example of the form

g(x, y) = 0,   (A.13)

the variations of f and g must satisfy

\delta f = \frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y = 0, \qquad \delta g = \frac{\partial g}{\partial x}\,\delta x + \frac{\partial g}{\partial y}\,\delta y = 0.   (A.14)

Introducing a Lagrange multiplier λ and the modified function f_1 = f + λg, we have

\delta f_1 = \delta(f + \lambda g) = \delta f + \lambda\,\delta g + g\,\delta\lambda = \delta f.   (A.15)
Then the conditions for a stationary point of f_1, subject to the constraint (A.13), are

\frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} = 0.   (A.16)

The two previous equations, together with the constraint (A.13), are used to find the stationary value. The Lagrange multiplier λ can be regarded as a measure of the sensitivity of the value of f at the stationary point to variations of the constraint (A.13).
In the case of N dimensions x_1, ..., x_N, for a stationary point of the function f(x_1, ..., x_N) subject to the constraints g_1(x_1, ..., x_N) = 0, ..., g_M(x_1, ..., x_N) = 0, we have

\frac{\partial}{\partial x_n}\left(f + \sum_{m=1}^{M} \lambda_m g_m\right) = 0, \qquad 1 \le n \le N.   (A.17)
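A small symbolic example of conditions (A.16)–(A.17), with N = 2 and M = 1 and an illustrative function and constraint, is the following.

import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + y**2                  # function to make stationary (illustrative)
g = x + y - 1                    # constraint g(x, y) = 0

F = f + lam * g                  # augmented function f + lambda*g
equations = [sp.diff(F, x), sp.diff(F, y), g]   # conditions (A.16) plus the constraint
solution = sp.solve(equations, [x, y, lam], dict=True)
print(solution)                  # x = y = 1/2, lambda = -1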
Let us now look at how to solve Eq. (2.84), following Bannister [2]. The inclusion of the errors in the field f = f(x, t), through the term W_f \int_0^L dx \int_0^T dt\, f(x, t)^2 in the cost function, defines the inverse model as a weak constraint (Sasaki [3]).
Let us build the variations of J around the reference field u, so that we can see for which u(x, t) the functional J[u] is stationary:

J[u + \delta u] = J[u] + \delta J|_u.   (A.18)
Thus

\delta J|_u = \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta u}\right|_u \delta u + O(\delta u^2)
= 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0) + 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x,t)\,\delta(x - x_i)\,\delta(t - t_i)
+ 2W_f \int_0^L dx \int_0^T dt\,\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right) + O(\delta u^2).   (A.19)
Let us define

\lambda(x, t) = W_f\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right),   (A.20)
so that the last term in (A.19), after integration by parts and using the conditions λ(x, T) = 0 and λ(L, t) = 0 introduced below, is written as

2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
= -2\int_0^L \lambda(x,0)\,\delta u(x,0)\,dx - 2\int_0^L dx \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u(x,t)\,dt
- 2\int_0^T c\,\lambda(0,t)\,\delta u(0,t)\,dt - 2\int_0^T dt \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u(x,t)\,dx.   (A.24)
Thus we have

\delta J|_u = 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0)
+ 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x,t)\,\delta(x - x_i)\,\delta(t - t_i)
- 2\int_0^L \lambda(x,0)\,\delta u(x,0)\,dx - 2\int_0^L dx \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u(x,t)\,dt
- 2\int_0^T c\,\lambda(0,t)\,\delta u(0,t)\,dt - 2\int_0^T dt \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u(x,t)\,dx + O(\delta u^2).   (A.25)
Setting the linear part to zero, using Eq. (2.70) and the definition (A.20), we get the Euler–Lagrange equations for a weak constraint, where:
1. the forward equation is

\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = W_f^{-1}\lambda;   (A.26)
2. its initial and boundary conditions, together with the adjoint equation and its final and boundary conditions, are

\lambda(x, 0) = W_i\{u(x, 0) - I(x)\},   (A.27)

c\,\lambda(0, t) = W_b\{u(0, t) - B(t)\},   (A.28)

w \sum_{i=1}^{M} \{u(x_i, t_i) - u_i\}\,\delta(x - x_i)\,\delta(t - t_i) - \left(\frac{\partial \lambda}{\partial t} + c\frac{\partial \lambda}{\partial x}\right) = 0,   (A.29)

\lambda(x, T) = 0,   (A.30)

\lambda(L, t) = 0.   (A.31)
When we require that u = u(x, t) satisfies exactly Eq. (2.70), we need to find a minimum of the functional
f[u] = W_i \int_0^L \{u(x,0) - I(x)\}^2\,dx + W_b \int_0^T \{u(0,t) - B(t)\}^2\,dt + w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}^2.   (A.32)
The question is: which u(x, t) makes f[u] stationary, subject to the model constraint

g(x, t) = \frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = 0\,?   (A.33)

This question is answered by adding to the functional f[u] the strong constraint, obtained using a Lagrange multiplier λ = λ(x, t):

J[u, \lambda] = f[u] + 2\int_0^L dx \int_0^T \lambda(x,t)\,g(x,t)\,dt.   (A.34)
Note that u and λ can vary independently. The variation of J around the reference fields u and λ is given by

J[u + \delta u, \lambda + \delta\lambda] = J[u, \lambda] + \delta J|_{u,\lambda},   (A.35)

where

\delta J|_{u,\lambda} = \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta u}\right|_{u,\lambda} \delta u + \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta \lambda}\right|_{u,\lambda} \delta\lambda + O(\delta u^2, \delta\lambda^2, \delta u\,\delta\lambda).   (A.36)
Thus

\delta J|_{u,\lambda} = 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0)
+ 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x_i,t_i)
+ 2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
+ 2\int_0^L dx \int_0^T \delta\lambda(x,t)\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right) dt
+ O(\delta u^2, \delta\lambda^2, \delta u\,\delta\lambda),   (A.37)
with

2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
= 2\int_0^L dx\left[\delta u(x,T)\,\lambda(x,T) - \delta u(x,0)\,\lambda(x,0) - \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u\,dt\right]
+ 2\int_0^T dt\left[c\,\delta u(L,t)\,\lambda(L,t) - c\,\delta u(0,t)\,\lambda(0,t) - \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u\,dx\right].   (A.39)
Setting the linear part to zero and using the boundary conditions on λ, we get the Euler–Lagrange equations for a strong constraint:
1. the forward equation is

\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = 0;   (A.42)

2. the adjoint equation and the boundary conditions are

w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta(x - x_i)\,\delta(t - t_i) - \left(\frac{\partial \lambda}{\partial t} + c\frac{\partial \lambda}{\partial x}\right) = 0,   (A.45)

\lambda(x, T) = 0,   (A.46)

\lambda(L, t) = 0.   (A.47)
References
1. Horn, R.: The Hadamard product. In: Johnson, C.R. (ed.) Matrix Theory and Applications, Proceedings of Symposia in Applied Mathematics, vol. 40, pp. 87–169. American Mathematical Society (1990)
2. Bannister, R.: Various papers on data assimilation. http://www.met.rdg.ac.uk/~ross/DARC/
DataAssim.html (2012)
3. Sasaki, Y.: Some basic formalisms in numerical variational analysis. Mon. Weather Rev. 98, 875–883 (1970)
Index
A
Analysis residue, 30
Assimilation, 13

B
Bayes, 3
Bayesian optimal recursion, 51
Bayesian probability, 3
Bayesian rule, 63
Bayesian theorem, 13
Bernoulli D., 5
Bernoulli J., 3
Best estimate, 39
Birkhoff, 5
BLUE, 25
Boble EnKF, 83
Brain web data, 99

C
Calculus of variations, 125
Cerebro spinal fluid, 98
Ceres, 15
C-grid models, 110
Characteristics, method of, 35
Cholesky factorization, 28
Cholesky, decomposition of, 55
CLF criteria, 111
Conditioned, 40
Conjugate method, 28
Cost function, 31
Covariance, 40
Covariance analysis, 24
Covariance background, 24
Covariance observation, 24
Cross-correlation matrix, 52

D
DART, 112
Data assimilation, 13
DeMoivre, 3
DeMorgan, 3
Duke of Tuscany, 6
Dynamic data assimilation, 1
Dynamic space-state-model (DSSM), 49

E
EKF, optimal gain, 48
EMARS, 113
EnKF, 100
Ensemble, 12
EnSRKF, 93
Equation of Euler-Lagrange, 129
Euler, 2
Expectation, 21
Extended Kalman filter, 61
Extracellular matrix, 101

F
Fick's second law, 98
Fisher, 3
Fletcher-Reeves and Polak-Ribière methods, 29
Fokker-Planck operator, 71
Fokker-Planck-Kolmogorov (FPK), 71
Forecast model, 33
Forecast/background, 22

G
Galileo, 6
Gauss, 2
General circulation model (GCM), 12
J
Jacobian, 23, 47

K
Kalman, 31
Kalman filter, 39
Kalman gain, 44
Kalman SPKF, 49
Kalman, conditions, 51
Kolmogorov, 5
Kronecker, delta, 44

L
Laboratoire de météorologie dynamique, 113
Lagrange, 2
Laplace, 2
LBFGS method, 29
Least squares, 3
Legendre, 2
LETKF, 99, 103
Likelihood, 3
Localization, 103
Lorenz, 5, 11, 89
Lorenz's butterfly effect, 92
Lyapunov, 5
Lyapunov exponents, 91

M
Mannheim society, 8
Markov, 5

N
NASA archive, 112
National Center for Atmospheric Research (NCAR), 112
Newton, 1
Numerical weather prediction (NWP), 9

O
Operator, 20
Optimal estimate, 39
Optimal estimation, 24
Optimal interpolation (OI), 28

P
Pascal, 6
Phenotype, 98
PlanetWRF, 110
Poincaré maps, 91

Q
Quadratic form, 26

R
Random variable covariance, 52
Random variable, mean, 52
Rayleigh-Bénard cell, 90
Recursion, 45
Recursive, 39
Recursive Bayesian estimation, 50
S
Schur, 123
Schur product, 85
Sequential importance resampling (SIR), 114
Sequential importance sampling (SIS), 114
Sequential Monte Carlo, 113
Sherman-Morrison-Woodbury equation, 27
Sigma coordinate, 106
Sigma point Kalman filter (SPKF), 61
Sigma points, 54
Square root scheme, 95
State vector, 20
Stigler, 4
Stochastic differential equation (SDE), 70

T
Taylor expansion, 23
Thermal emission spectrometer (TES), 112
Torricelli, 6
True vector, 20

U
UCLA two level, 107
UK Meteorological Office, 113
Unscented Kalman filter (UKF), 56

W
Weak constraints, 127, 129
White noise, 42
Wiener, 5