Data Assimilation: Mathematical Concepts and Instructive Examples
Rodolfo Guzzi
System Biology Group, University La Sapienza, Rome, Italy
SpringerBriefs in Earth Sciences
More information about this series at http://www.springer.com/series/8897
3 Sequential Interpolation
3.1 An Effective Introduction of a Kalman Filter
3.1.1 Linear System
3.1.2 Building up the Kalman Filter
3.2 More Kalman Filters
3.2.1 The Extended Kalman Filter
3.2.2 Sigma Point Kalman Filter (SPKF)
3.2.3 Unscented Kalman Filter (UKF)
References
5 Applications
5.1 Lorenz Model
5.1.1 Solution of Lorenz 63 Model
5.1.2 Lorenz Model and Data Assimilation
5.2 Biology and Medicine
5.2.1 Tumor Growth
5.2.2 Tumor Growth Data Assimilation with LETKF
5.2.3 LETKF Recipe Computation
5.3 Mars Data Assimilation: The General Circulation Model
5.3.1 Mars Data Assimilation: Methods and Solutions
5.4 Earthquake Forecast
5.4.1 Renewal Process as Forecast Model
5.4.2 Sequential Importance Sampling and Beyond
5.4.3 The Recipe of SIR
References
Appendix
Index
Chapter 1
Introduction Through Historical Perspective
Abstract The first approach to data assimilation methods started with the application of the Newtonian equations to the motion of the planets by Laplace and Gauss. However, only with the introduction of a probabilistic view did the dynamical equations become able to make a real forecast. The first numerical weather prediction (NWP) models, based on determinism, soon became probabilistic, mainly through the contribution of the group led by von Neumann at Princeton's Institute for Advanced Study. This chapter traces the history of the evolution of data assimilation methods, from the early deterministic algorithms to the most recent probabilistic methods such as ensemble forecasting. The fundamental work by Edward Lorenz and by the Los Alamos group, which developed the Monte Carlo method, is also reported.
Forecasts from models provide useful information to help predict the future state of a
system. However, predictions will inevitably diverge from reality as time progresses; this is an inescapable property of dynamical systems. Data assimilation (DA) is a
formal approach used to help correct forecasts by introducing information observed
from the environment.
The first dynamic data assimilation of the kind we describe in this book, one that requires the existence of data and of equations or dynamic models, was made to calculate the orbits of comets, the moving bodies par excellence. This is a classical two-body problem in which one seeks to determine the position and the velocity components of the body at a particular time (Principia, Book III, Prop. XLI).
The first method for finding the orbit of a comet moving in a parabola was tried by Newton [1], using three observations. He, however, wrote: "This being a problem of very great difficulty, I tried many methods of resolving it."
When the observations are taken from the surface of the Earth, the apparent position in space of the comet is not given and therefore the velocity components are not determined. This fact implies that the observer has to make more observations, at different times. During the interval of time, Earth and comet have moved, so the
where Ω is the longitude of the ascending node, i is the inclination to the plane of the ecliptic, ω is the longitude of the perihelion measured from the node, a is the semi-major axis which defines the size of the orbit, e is the eccentricity which describes the shape of the orbit, and T is the time of perihelion passage defining the position of the body at any time.
Since the functions involved are highly transcendental, the solution of Eq. (1.1) is not obtained by ordinary processes and is very complex, as mentioned by Newton. The fundamental elements described by Newton were explained by Laplace [2], in 1780, who also described the method of solution of Eq. (1.1) that has been the basis for the later works.
In 1809 Gauss [3], in Theoria Motus Corporum Coelestium, elaborated a technique, based on least squares, that led to the recovery of the orbit of Ceres, lost to sight when its light vanished in the rays of the Sun. Before them, Euler [4] in 1744, in Theoria Motuum Planetarum et Cometarum, had obtained an approximate solution, while in 1778 Lagrange [5] obtained an original solution which has never been used in practice.
Thus, from our point of view, the first dynamic data assimilation was made by Laplace and Gauss. Moulton's book [6] fully describes the methods they adopted to determine the orbit of a comet. Furthermore, Gauss's method lays the groundwork for the least squares method that became the foundation of data assimilation.
Previously, in 1805, Adrien-Marie Legendre [7] had also developed the method of least squares and introduced it in his book Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). The least squares method, established independently by Legendre and Gauss around 1800, also represents the passage from determinism to probability.
The basic assumption of physics from the time of Newton up to the Gauss period (late 1600s to 1800) was determinism. It establishes that the future state of a system is entirely determined by the present state of the system. The evolution of the system is governed by causal relationships such as those described, for example, by Newton's equations of motion.
Around the late 1600s the important concept of probability arose, as a measure of the likeliness that an event will occur. The first to demonstrate the efficacy of defining odds as the ratio of favorable to unfavorable outcomes was Gerolamo Cardano [8] in his book written around 1564, but the mathematics of probability was set by Jacob Bernoulli [9], one of the many prominent mathematicians in the Bernoulli family, in the book Ars Conjectandi (posthumous, 1713), and by Abraham de Moivre's Doctrine of Chances (1718).
In 1812, Laplace [10] published his Théorie analytique des probabilités, in which it became possible to compare the merits of different parameter values. For the first time the concept of estimation was introduced, in which acceptable values for the parameters of distributions, specified by hypotheses, are sought. The first half of this treatise was concerned with probability methods and problems, the second half with statistical methods and applications. The term inverse probability appeared in the 1837 paper of De Morgan [11], since Laplace's book did not use this term. The term inverse means that the probabilities of causes, or hypotheses, can be deduced from the frequency of events. It involves inferring backward from the data to the parameter, or from effects to causes. If no information is available when setting initial priors, a situation defined as a uniform prior, one sets all possible hypotheses to an equal initial prior probability. This concept was introduced first by Bayes [12] and later by Laplace [2], who stated the principle:
If an event can be produced by a number n of different causes, then the probabilities
of these causes, given the event, are to each other as the probabilities of the event
given the causes, and the probability of the existence of each of these is equal to the
probability of the event given that cause, divided by the sum of all of the probabilities
of the event given each of these causes.
Gauss combined Bernoulli's idea about probability with Laplace's principle of inverse probability and maximized the posterior density of the location parameter in the error distribution, assuming that the prior distribution is uniform. Requiring that the posterior mode be equal to the arithmetic mean, Gauss derived the normal distribution and thus gave a probabilistic justification for the method of least squares (Hald [13]).
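In modern notation the argument can be summarized as follows (a compact sketch added for clarity, not Gauss's original derivation): for independent observations y_i of a location parameter θ with error density f and a uniform prior,

\begin{align*}
p(\theta \mid y_1,\dots,y_n) &\propto \prod_{i=1}^{n} f(y_i - \theta), \\
\text{posterior mode} = \bar{y}\ \text{for every sample} &\;\Longrightarrow\; f(e) \propto \exp\!\big(-e^2/(2\sigma^2)\big), \\
\max_{\theta}\, p(\theta \mid y_1,\dots,y_n) &\;\Longleftrightarrow\; \min_{\theta} \sum_{i=1}^{n} (y_i - \theta)^2,
\end{align*}

that is, with normal errors the most probable value of the parameter is exactly the least squares estimate.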
By the mid-1700s, the problems of observational error had been mathematically described. Errors were supposed to be both positive and negative, and it was generally accepted that their frequency distribution followed a smooth symmetric curve.
Inverse probability, describing the probability distribution of an unobserved variable, is called Bayesian probability, and the problem of determining an unobserved variable (by whatever method) is called inferential statistics. The distribution of an unobserved variable given the data alone is defined by the likelihood function (which is not a probability distribution), while the distribution of an unobserved variable, given both data and a prior distribution, is the posterior distribution. The term likelihood was introduced by Fisher [14], who wrote:
This short paper to the Cambridge Philosophical Society was intended to introduce the notion of fiducial probability, and the type of inference which may be expressed in this measure. It opens with a discussion of the difficulties which had arisen from attempts to extend Bayes' theorem to problems in which the essential information on which Bayes' theorem is based is in reality absent, and passes on to relate the new measure to the likelihood function, previously introduced by the author, and to distinguish it from the Bayesian probability a posteriori.
Hald [13] has shown that the method of maximum likelihood was proposed by Daniel Bernoulli, whose paper on likelihood was reported by Kendall [19], but with no practical effect because the maximum likelihood equation for the error distribution was considered intractable. In English the word likelihood has been distinguished as being related to, but weaker than, probability since its earliest uses. The comparison of hypotheses by evaluating likelihoods has been used for centuries. Different authors have also used the term likelihood, from Christiaan Huygens to Charles Sanders Peirce [20], in whose work model-based inference (usually abduction but sometimes including induction) is distinguished from statistical procedures based on objective randomization.
Fisher [21] also used the term method of maximum likelihood estimation (MLE). In statistics, maximum likelihood estimation is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, assuming that the heights of the individuals in a population are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while knowing only the heights of some sample of the overall population. MLE would accomplish this by taking the mean and variance as parameters and finding the particular parameter values that make the observed results the most probable (given the model).
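As a minimal illustration (a sketch added here, not taken from the book; the data are synthetic), the Gaussian MLE has a closed form given by the sample mean and the mean squared deviation:

import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=170.0, scale=7.0, size=500)   # synthetic sample of heights

# For a Gaussian model the likelihood is maximized analytically by the
# sample mean and the mean squared deviation (1/N, not 1/(N-1)).
mu_hat = heights.mean()
sigma2_hat = ((heights - mu_hat) ** 2).mean()

def neg_log_likelihood(mu, sigma2, x):
    """Negative log-likelihood of a Gaussian sample; minimal at (mu_hat, sigma2_hat)."""
    return 0.5 * np.sum(np.log(2.0 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

print(mu_hat, sigma2_hat, neg_log_likelihood(mu_hat, sigma2_hat, heights))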
With the advent of statistical thermodynamics, due to James Maxwell (see Myrvold [22]) and Ludwig Boltzmann [23], the limitations of the deterministic view became clear. At the same time, however, Poincaré [24], Birkhoff (see Marston Morse [25]) and Lyapunov [26] continued their exploration along the lines of what came to be called dynamical systems, whose most recent approach was given by Edward Lorenz [27] with predictability: the butterfly effect and strange attractors.
Markov [28], Wiener (see Masani [29]) and Kolmogorov [30] developed stochastic dynamic prediction, related to processes that exhibit randomness in addition to the dynamics of the system. In probability theory, a stochastic process, or sometimes random process (widely used), is a collection of random variables representing the
evolution of some system of random values over time. This is the probabilistic coun-
terpart to a deterministic process (or deterministic system). Instead of describing a
process which can only evolve in one way (as in the case, for example, of solutions
of an ordinary differential equation), in a stochastic or random process there is some
indeterminacy: even if the initial condition (or starting point) is known, there are
several (often infinitely many) directions in which the process may evolve.
With the invention of the first meteorological instruments, the quantitative observation of meteorological phenomena was initiated in the 17th century. The first regular instrumental observation was planned in Paris by Blaise Pascal and carried out at the Puy de Dôme, where the first relation between the change of weather and the height of the mercury column in the barometer was observed. The first systematic observations were also made in London and Oxford. Previously, non-systematic observations had also been made in Florence by Galileo and his scholars, including Torricelli [39], in the first climate network, set up by Duke Ferdinand II of Tuscany (1610-1670). The first observations in
Fig. 1.1 I have used part of this figure to draw the structure of this book, changing it in the early part when probability studies arose. Source: Lewis [37]
Germany started in 1654, when the Jesuit college of Osnabrück received a Florentine thermometer. During the first part of the 18th century systematic observations were begun in Russia. Due to the heterogeneity of the instruments, and mainly because they were not standardized and calibrated, it was impossible to make any forecasts.
In 1763 the Palatine Academy of Sciences and Letters was founded in Mannheim; in 1780 it established a special meteorological class of the Academy, to be known as the Palatine Meteorological Society (Societas Meteorologica Palatina). Many observatories participated in the observations proposed by the Mannheim Society. The observations collected were published in Latin, which was the international language of science, in the journal of the Society, the Ephemerides. Although the Mannheim Society closed its activities in 1799, the concept had been initiated, and the first attempt to standardize observations could be extended to all ground-station observations everywhere.
The present meteorological network has evolved over 300 years with new instruments and new technologies. With the advent of satellites, the class 1 instruments, which make measurements at points, were joined by class 2 instruments, which sample an area or a volume remotely. Class 3 devices were subsequently introduced: instruments deriving wind velocities from Lagrangian trajectories.
While the instrumental error for traditional class 1 instruments can be defined within a certain accuracy, the instrumental errors for class 2 and 3 instruments are not so straightforward, because their results are derived from inversion methods and are therefore strongly dependent on the algorithms used.
Another type of error has also been reported: the error of representativeness. Petersen and Middleton [40] defined a representative observation as a datum pertaining to a particular point and time, which is the result of an optimum filtering operation on the continuous raw data field, under the criterion of minimum average mean square error of reconstruction from the subsequently sampled values on a given space-time lattice.
Starting from this definition, one can say that the error of representativeness is a measure of the error due to misrepresentation of the space-time scale of the observation. Such an error can arise from an incorrect spacing of the stations in the observation network, in the case of hazards or natural phenomena, or, in the case of micro-scale observations, from an insufficient density of observations to capture the essence of the phenomenon under investigation.
Petersen [41] defines the observation process as a set of measurements of the physical field made at various space-time points by means of devices having certain physical and statistical features. The main requirement is that the totality of this information be instantaneous in time and, when counted in units of real numbers, be finite. The criterion by which these estimates are to be evaluated is their mean square error, or deviation from the true value of the field variable at the given location and time. The optimal solution to this problem is then that the estimate should coincide with the mean value, or expectation, of the variable, conditional upon the available data.
Surface stations, radiosondes and pilot balloons are irregularly distributed over land and on some remote islands. They are therefore complemented by the meteorological satellites Meteosat (Europe), GOES (USA), Elektro-L1 (Russia), MTSAT-1R (Japan), INSAT (India), Feng-Yun (China) and by Earth Observation Satellite constellations such as the NASA A-Train or the forthcoming European (ESA) constellation named Sentinel.
These constellations provide data that can be used together to obtain comprehensive information about atmospheric components or processes that are happening at the same time. By combining the information collected simultaneously from several sources, and from the ground and airborne networks, one obtains a more complete answer to many questions than would be possible from any one satellite taken alone at different times.
Data assimilation methods used in Numerical Weather Prediction, where gridded measurements are present and model processes are available, have also been developed in other fields: earthquakes, chemical processes in the atmosphere, planetary circulation, robotics and biological growth. The applications refer to methods for updating the state vector (initial condition) of a complex space-time model by combining new observations with one or more prior forecasts; here the changing space-time observations require proper sampling, and the observation representativeness and related errors become crucial both to follow and to forecast the phenomenon under investigation.
Reviewing the concepts expressed above, it is possible to better define what assimilation is and what its cycle consists of. The assimilation cycle can be defined as:
1. data checking;
2. proper interpolation of data;
3. initialization of the assimilation model;
4. short forecast to prepare the background field.
Let us now reverse our approach and look first of all at the assimilation model instead of the observation data. In general, a dynamic system is described by a system of nonlinear partial differential equations (PDEs) whose numerical solution is discrete. If the system is well-posed in the sense of Hadamard, there is a forward discrete operator that yields the solution. Let us note that a dynamic system evolves with time, but that not all of the state variables, within the state vector, have equal information content, and not all state variables are known to the same precision. It is therefore desirable that the observations made both contain the maximum information content possible and allow the system's state to be characterized with a minimum uncertainty.
Furthermore, in order to pass from the observation location to the grid points of the
model some interpolations are performed. In fact the first-guess fields defined at the
grid points of the forecasting model are interpolated to the observation location, while
differences between the observation and the interpolated value are then interpolated
back onto the grid points to define a correction.
Although the continuum state of the observations is the most natural candidate to be assimilated into a model, the true state to be estimated is a projection of the
continuum state on a discrete space. This refers to the best possible state represented
by the model, which is what we are trying to approximate. Thus the production of an
accurate image of the true state of the system at a given time, represented in a model
as a collection of numbers, is called analysis. An analysis can be useful in itself as
a comprehensive and self-consistent diagnostic of the system. It can also be used as
input data to another operation, notably as the initial state for a numerical forecast,
or as a data retrieval to be used as a pseudo-observation. It can provide a reference
against which to check the quality of observations.
There are two basic approaches: sequential assimilation, which only considers observations made in the past up to the time of analysis, which is the case of real-time assimilation systems, and non-sequential, or retrospective, assimilation, where observations from the future can be used, for instance in a reanalysis exercise.
Since the model has a lower resolution than reality, even the best possible analysis will never be entirely realistic. Thus, even if the observations have no instrumental error and the analysis is equal to the true state, there will be some unavoidable discrepancies between the observed values and their equivalents in the analysis, because of representativeness errors. Although we will often treat these errors as part of the observation errors in the mathematical equations, one should keep in mind that they depend on the model discretization, not only on instrumental problems.
Summarizing, the necessary objective information that we can use to produce
an analysis is a collection of observed values provided by observations of the true
state. If the model state is overdetermined by the observations, then the analysis
reduces to an interpolation problem. In several cases, the analysis problem is under-
determined because data are sparse and only indirectly related to the model variables,
as happens in remote sensing measurements. In order to make it a well-posed problem
it is necessary to rely on some background information in the form of an a priori
estimate of the model state. Physical constraints on the analysis problem can also
help. The background information can be a trivial state; it can also be generated
from the output of a previous analysis, using some assumptions of consistency in
time of the model state, like stationarity (hypothesis of persistence) or the evolution
predicted by a forecast model.
An interesting paper by Lewis [37] outlines the history of the numerical weather
forecasting, from the first deterministic solutions up to the most recent approaches
based on Monte Carlo methods.
Around 1950, it was clear that the dynamics of weather could be evaluated using the same equations of dynamics developed in other fields. The theoretical treatment of the scales of motion in the atmosphere, initially based on a deterministic set of equations, proposed by Jule Charney [42, 43], can be considered the first successful step toward Numerical Weather Prediction (NWP). At Princeton's Institute for Advanced Study, under the guidance of John von Neumann, Jule Charney [43] and his team made two successful 24 h forecasts of the transient features of the large-scale flow, initialized on 30 January and 13 February 1949, even though other 24 h forecasts (e.g. that of 5 January 1949) were not particularly useful, as reported by Lewis [37] in his paper.
The reactions of the meteorological community were positive, even though the limits of deterministic prediction were clear: they are governed by the growth of errors, because the solution depends both on the initial state, which is generally erroneous, and on the models, which are by their nature imperfect. Furthermore, the features of the causal laws, characterized by unstable systems and non-periodicity, limited the predictability of the system.
Whereas Gauss and contemporaries had found that the two-body problem of
celestial mechanics tolerated, in the initial state, small errors, meteorological pre-
diction under non-periodic constraints would be found to be less forgiving of these
uncertainties [37].
In the first years, the major theme was the development of the models and, of course, the identification and correction of systematic errors. That the deterministic forecast was imperfect had already been pointed out by Eady [44] in the Compendium of Meteorology:
we never know what small perturbations may exist below a certain margin
of error. Since the perturbations may grow at an exponential rate, the margin of
the error in the forecast (final) state will grow exponentially as the period of the
forecast is increased, and this possible error is unavoidable whatever our method of
forecasting if we are to glean any information at all about developments beyond
the limited time interval, we must extend our analysis and consider the properties of
the set or ensemble (corresponding to the Gibbs-ensemble of statistical mechanics)
of all possible developments. Thus, long range forecasting is necessarily a branch
of statistical physics in its widest sense: both our questions and answers must be
expressed in terms of probabilities.
In 1962 Edward Lorenz [45], using a truncated version of the two-level quasi-geostrophic model described by Lorenz himself [46], found that the work suggested the existence of deterministic systems, governed by equations whose nonlinearity resembles the nonlinearity of the atmosphere, which are not perfectly nor almost perfectly predictable by simple and simply determined linear formulas, if the period between successive observations is greater than half of the shortest significant period of observation.
Lorenz speculated that, by linear regression, a one-day forecast should be good, but forecasts for successive days should be poor. An unexpected result occurred when Lorenz inadvertently introduced truncation errors in the model he was running. The small error affected the third decimal place instead of the sixth, but it amplified so much during the simulation that the signal was covered: "I found this very exciting because this implied that, if the atmosphere behaved this way, long range forecasting was impossible."
The contribution by Lorenz [27], using a system of three ordinary differential equations derived from a simplified system of Barry Saltzman to study finite amplitude convection, laid the foundation for the field of chaotic systems (see Chap. 5).
Assimilation is a word that covers different meanings, from the linguistic to the cultural, from the conversion of nutrients in biology to the incorporation of new concepts into existing schemes in psychology, up to technical assimilation in meteorology or climate, robotics and, recently, biological processes. For these technical cases we need to append the word data. This book covers this last aspect by analyzing dynamic data assimilation.
In this frame, data assimilation is a set of mathematical techniques allowing us to use all the information available to us within a time frame, including observational data and any prior information we may have in the form of a deterministic model describing our system and encapsulating our theoretical understanding of it. The mathematical basis is estimation theory, or the theory of inverse problems, that is, an organized set of mathematical techniques for obtaining useful information about the physical world on the basis of observations.
In a conventional problem, one would use a set of known prior parameters to predict the state of a physical system. This approach is usually called a forward problem, whereas in the inverse problem one attempts to use the available observations of the state of the system to estimate poorly known parameters of the state itself. In both cases, data assimilation can be treated as a Bayesian system.
Bayes' theorem, or the law of inverse probability, allows us to combine prior information about the parameters with the information contained in the observations, to guide the statistical inference process. The reason why data assimilation is so effective is that it seeks to produce an analysis that fits a set of observations taken over a time frame, not just the observations made at one instant in time, subject to the strong constraint that the evolution of the analyzed quantities is governed by a deterministic model describing the observed system.
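Written explicitly for a state x and a set of observations y (a standard statement of the theorem, added here for reference), the law reads

\[
p(\mathbf{x} \mid \mathbf{y}) \;=\; \frac{p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})}{p(\mathbf{y})} \;\propto\; p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}),
\]

where p(x) plays the role of the background (prior) information, p(y | x) is the likelihood of the observations given the state, and the posterior p(x | y) is the distribution from which the analysis is drawn.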
Data assimilation has been applied to several fields, from numerical weather prediction, where it began, to planetary climate analysis, up to the evolution of biological cells or robotic systems. One begins with a forecast model, often called the background. In order to make a useful prediction, the background must frequently be updated with noisy and sparse measurements. This procedure updates the background in light of the new observations to produce an analysis, which, under suitable assumptions, is the maximum likelihood estimate of the model state vector. Subsequently, the model is restarted from the analysis and provides a new background or forecast.
Data assimilation and the model forecast can thus be combined into an observing system that merges the observation data with the data produced by the model to produce an optimal estimate of the evolving state of the system. The model provides texture to the observed data, allowing one to interpolate or extrapolate them into regions of space and time in which they are missing. Moreover, the observed data adjust the trajectory of the model through the state of the model itself, keeping it on track in a loop of prediction, comparison and correction.
There are at least two basic approaches to data assimilation: a sequential one, where we consider observations made in the past up to the time of analysis, which is the case of real-time systems, and a non-sequential one, where later observations can also be used. Methods may be intermittent or continuous in time. Intermittent assimilation typically works on a cycle of 6 h, and observations are processed in batches. Continuous assimilation processes observations over a longer run; the state correction is smoothed over time, allowing one to obtain a more realistic analysis.
From a practical point of view, one uses a model, the so-called direct model, to connect the input parameters to the output parameters. Mathematically one writes:

y = H(x)   (1.2)

where x represents the collection of all the variables that describe the state of the model. In this way one can compare the values H(x) obtained from the model with the observations y, estimating the error of the model itself. The measured data can come from different sources, from in situ measurements or from satellites. The numerical core of assimilation then reduces to a minimization problem where the cost function J is:

J = ||y − H(x)||^2   (1.3)
References
24. Poincaré, H.: The Foundations of Science. Science Press, New York. Reprinted in 1921 (1902-1908); this book includes the English translations of Science and Hypothesis (1902), The Value of Science (1905), Science and Method (1908)
25. Marston, M.: Bull. Amer. Math. Soc. 52(5), Part 1, 357-391 (1946) (see Project Euclid)
26. Lyapunov, A.M.: The general problem of the stability of motion. Translated by Fuller, A.T. Taylor and Francis, London. ISBN: 978-0-7484-0062-1. Reviewed in detail by Smith, M.C.: Automatica 1995, 3(2), 353-356 (1992)
27. Lorenz, E.N.: Deterministic non-periodic flow. J. Atmos. Sci. 20, 130-141 (1963)
28. Markov, A.A.: Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Reprinted in Appendix B of: Howard, R.: Dynamic Probabilistic Systems, vol. 1, Markov Chains. Wiley (1971)
29. Masani, P. (ed.): The Mathematical Work of Norbert Wiener, vol. 4. MIT Press, Cambridge. This contains a complete collection of Wiener's mathematical papers with commentaries (1976)
30. Kolmogorov, A.: Grundbegriffe der Wahrscheinlichkeitsrechnung (in German). Julius Springer, Berlin. Translation: Kolmogorov, A. (1956). Foundations of the Theory of Probability, 2nd edn. Chelsea, New York (1933)
31. Kalman, R.E., Bucy, R.S.: New results in linear filtering and prediction theory. Trans. Am. Soc. Mech. Eng., J. Basic Eng. Ser. D 83, 95-108 (1961)
32. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44(247), 335-341 (1949)
33. Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Wea. Rev. 124, 85-96 (1996)
34. Burgers, G., van Leeuwen, P.J., Evensen, G.: Analysis scheme in the ensemble Kalman filter. Mon. Wea. Rev. 126, 1719-1724 (1998)
35. Hamill, T., Mullen, S., Snyder, C., Toth, Z., Baumhefner, D.: Ensemble forecasting in the short to medium range: report from a workshop. Bull. Am. Meteor. Soc. 81, 2653-2664 (2000)
36. Evensen, G.: Data Assimilation: The Ensemble Kalman Filter, 2nd edn., p. 320. Springer (2009)
37. Lewis, J.M.: Roots of ensemble forecasting. Am. Meteorol. Soc. 133(7), 1865-1885 (2005)
38. Gillispie, C. (ed.): Dictionary of Scientific Biography, vol. 18. Scribner, New York (1981)
39. Torricelli, E.: http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Torricelli.html
40. Petersen, D.P., Middleton, D.: On representative observations. Tellus XV(4) (1963)
41. Petersen, D.P.: On the concept and implementation of sequential analysis for linear random fields. Tellus XX(4), 673-686 (1968)
42. Charney, J.: On the scale of the atmospheric motions. Geofys. Publ. 17(2), 17 (1948)
43. Charney, J.: The use of the primitive equations of motion in numerical weather prediction. Tellus 7, 22-26 (1955)
44. Eady, E.: The quantitative theory of cyclone development. In: Malone, T. (ed.) Compendium of Meteorology, pp. 464-469. American Meteorological Society (1951)
45. Lorenz, E.N.: The statistical prediction of solutions of dynamic equations. In: Proceedings of the International Symposium on Numerical Weather Prediction, pp. 629-634. Meteorological Society of Japan, Tokyo (1962)
46. Lorenz, E.N.: Energy and numerical weather prediction. Tellus XII(4), 364-373 (1960)
47. GARP: GARP topics. Bull. Am. Meteor. Soc. 50, 136-141 (1969)
48. Epstein, E.: Stochastic dynamic prediction. Tellus 21, 739-759 (1969)
Chapter 2
Representation of the Physical System
representation of the same, to which the dynamic equations of motion are applied in
order to extrapolate into future time.
Thus the first step in the mathematical formulation of the analytical problem is the definition of the work space. The goal is to find a true state vector that is the projection of the infinite dimensional space of the field vector onto the finite dimensional space of its numerical representation.
In dynamic meteorology and related disciplines, the forecasts of physical field variables and the mathematical models of field dynamics are expressed by a system of nonlinear partial differential equations (PDEs) whose prognostic state variables at time t_k are collected in the vector x_k.
Assuming the governing PDEs to be well-posed in the sense of Hadamard, there is a unique solution operator, or time dependent propagator, g that yields the solution x_k given the solution x_{k-1} at an earlier time, over the interval t_k − t_{k-1}. Omitting the time index t_k, from now on represented by the index k, we have:

x_k = g(x_{k-1})   (2.1)

x_k^d = f(x_{k-1}^d),   (2.2)

Using the previously defined operators, applying the projection operator to both sides of Eq. 2.1 and adding and subtracting f(x_{k-1}^t), we obtain the discrete evolution equation for x_k^t,

where f is the discrete propagator that resides in our numerical solution and the forcing term e_{k-1}^t is the model error from time t_{k-1} to t_k, that is:
Here h_k^c is the continuum forward observation operator and e_k^m is the measurement error, considered stochastic, whose mean ē_k^m ≡ E[e_k^m] is a measure of bias and whose covariance matrix is R_k ≡ E[(e_k^m − ē_k^m)(e_k^m − ē_k^m)^T]; E[·] is the expectation operator.¹
The observation operator can be considered linear when the state variables are directly observed, as in the case of devices located at ground measurement stations and of radiosondes, and nonlinear when the data come from remotely sensed devices that require proper integro-differential algorithms to be interpreted.
¹ The expectation of a random variable is defined as the sum of all values the random variable may take, each weighted by the probability with which that value is taken. In terms of a formula, the expectation E[x] is given by:

E[x] = ∫_{−∞}^{+∞} x f(x) dx   (2.7)

This is also called the mean value of x, or first moment. The second moment is given by the quantity E[x²] = ∫_{−∞}^{+∞} x² f(x) dx. The variance of a random variable is the mean squared deviation of the random variable from its mean; it is σ² = E[x²] − E[x]². Another important concept is the statistical correlation between random variables, which is given by the covariance, the expectation of the product of the deviations of two random variables from their means: cov(x, y) = E[(x − E[x])(y − E[y])].
Let us now formulate the stochastic dynamic model by adding and subtracting h_k(x_k^t) in relation (2.6), taking into account that the discrete true state x_k^t is the projection of the continuum state x_k. The discrete observation model is then obtained:

where h_k is the discrete forward operator acting on x_k^t and e^{obs} ≡ e_k^r + e_k^m is the total observation error. The measurement error is e_k^m, while the representativeness error (see Lorenc [2]) e_k^r is given by the difference between the continuum forward operator and its discrete formulation, that is:
The impact of this error on our system will become clearer when we address the initialization problem in the next section.
Since our goal is to study a physical system described by the state vector x_k^t, and since this vector is unknown, one assumes that the best estimate of the system is given by the state vector x_k^b, where b stands for background, denoting the a priori or background estimate of the true state before the analysis is carried out, valid at the same time. This vector is the result of a data assimilation or statistical analysis performed earlier. By Eq. 2.9 the observations performed on the system bring new information through the operator h_k. One assumes the statistics of the observation error are known up to the second order moments. When new observations are available we can improve the analysis, obtaining our estimate x_k^a with its error. The suffix a means analysis. In geophysics it is usual to define the a priori estimate as the forecast/background and the posterior estimate as the analysis.
Let us now explore the background and analysis errors. The background error is defined as:

e_k^b = x_k^b − x_k^t   (2.11)

It reflects the discrepancy between the a priori estimate and the unknown truth. It is considered stochastic, and its mean is ē^b = E[e_k^b]. The background error covariance is B = E[(e_k^b − ē_k^b)(e_k^b − ē_k^b)^T]. The analysis error is defined as the difference between the analysis and the truth, e_k^a = x_k^a − x_k^t; the related analysis error covariance is A = E[(e_k^a − ē_k^a)(e_k^a − ē_k^a)^T], with mean ē^a = E[e_k^a]. All these matrices are symmetric and positive definite.
In order to understand whether we can linearize our system, we need to analyze the role played by the representativeness error in the model. If we write the representativeness error by adding and subtracting h_k^c(x_k) in relation (2.10),

that operates on the unresolved portion (I − Π)x_k of the continuum state x_k. H_k^c is the tangent linear operator, or Jacobian matrix, of the nonlinear operator. If we define the corresponding linear operator as H, we can write H_k^c = H.
Thus the vector of observations y is related to the state through the observation operator H and the observation error e by:

y = Hx^t + e.   (2.17)
Given the observation Eq. 2.17, we obtain that the estimation error of the analysis is:

If we assume the errors and observations are unbiased, the related expectations are E[e] = 0 and E[e^b] = 0, and thus E[e^a] = (L + KH − I)E[x^t]. If, on the contrary, there is a bias, it is always possible to diagnose it and subtract its value from the total observation errors to make the corrected error unbiased. If we postulate that:

L = I − KH,   (2.20)

x^a = (I − KH)x^b + Ky
x^a = x^b + K(y − Hx^b),   (2.21)
As previously, we assume the errors and observations are unbiased, so that the analysis covariance matrix is P^a = E[e^a (e^a)^T]. Recalling the background error covariance matrix P^b, the observation error covariance matrix R and the relation (2.20), we have:

P^a = E[e^a (e^a)^T]
    = E[(e^b + K(e − He^b))(e^b + K(e − He^b))^T]
    = E[(Le^b + Ke)(Le^b + Ke)^T]
    = E[Le^b (e^b)^T L^T] + E[Ke e^T K^T]
    = L P^b L^T + K R K^T
    = (I − KH) P^b (I − KH)^T + K R K^T   (2.23)
Following Bouttier and Courtier [4], Trace[P^a] is a continuously differentiable scalar function of the coefficients of K; its first order variation in K, obtained from the difference Trace[P^a](K + L') − Trace[P^a](K), where L' is an arbitrary test matrix, is:

The last line shows that the derivative is zero for any choice of L' if (HP^bH^T + R)K^T − HP^b = 0, which is equivalent to:
Now we need to improve our knowledge of the state x^a by taking into account the two available sources of information: the model and the observations. There are two ways to combine observations with the model:
1. the observations y, which can be sparse in time and space, may be interpolated subject to constraints provided by the model;
2. the uncertainties of the model H on the input x may be reduced, under the constraints of the measurements.
We define a cost function J that is a measure of the distance between the observations and the model:

J(x) = ||y − H(x)||^2,   (2.27)

where || · || is the 2-norm. Since we need to balance each component through the confidence in the measurements, we can make use of the a priori estimate or background information. In this way we introduce a compromise between the information given by the observations and the information given by the background value. The cost function can then be defined as

where the two weighting coefficients express the confidence given to the observations and to the background. Those parameters can be defined empirically or analytically, knowing the background and observation errors.
A simple example drawn from Bouttier and Courtier [4], based on temperature and its error variance, shows that the cost function terms, which have a quadratic form, tend to pull the analysis x^a toward the background x^b and the observation y, respectively. In this case x^a makes J(x) as small as possible, given the computational constraints.
The quadratic form (2.28) can be written in matrix form as:

J(x) = ½ {(y − H(x))^T R^{-1} (y − H(x)) + (x − x^b)^T (P^b)^{-1} (x − x^b)},   (2.29)
This requires that the first derivative, the gradient, of the cost function with respect to its variable x be equal to zero at the analysis x^a:

∇J(x^a) = dJ/dx |_{x=x^a} = 0   (2.31)
Rearranging this equation and applying again the linear assumption, we have:

The analysis state x^a is called optimal because it is closest, in a root mean square sense, to the true state x^t. The equivalence of this relation with relation (2.21) can also be shown:
K = P^b H^T (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} ((P^b)^{-1} + H^T R^{-1} H) P^b H^T (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} (H^T + H^T R^{-1} H P^b H^T) (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (R + HP^bH^T) (HP^bH^T + R)^{-1}
  = ((P^b)^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}.   (2.36)
The equivalence can be useful because sometimes the inversion of ((P^b)^{-1} + H^T R^{-1} H) is more costly than that of the matrix R + HP^bH^T.
Summarizing, we have:

x^a = x^b + K[y − H(x^b)]
K = P^b H^T [R + HP^bH^T]^{-1}.   (2.37)
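Relations (2.21), (2.23) and (2.37) translate directly into a few lines of linear algebra. The sketch below (an illustration with small random stand-in matrices, not an operational implementation) computes the gain, the analysis and the analysis error covariance for a linear observation operator:

import numpy as np

n, m = 4, 2                                  # state and observation dimensions (arbitrary)
rng = np.random.default_rng(1)

xb = rng.normal(size=n)                      # background state
Pb = np.diag([1.0, 1.0, 2.0, 2.0])           # background error covariance
H = rng.normal(size=(m, n))                  # linear observation operator
R = 0.5 * np.eye(m)                          # observation error covariance
y = H @ xb + rng.normal(scale=0.5, size=m)   # synthetic observations

S = H @ Pb @ H.T + R                         # innovation covariance
K = Pb @ H.T @ np.linalg.inv(S)              # gain, Eq. (2.37)

xa = xb + K @ (y - H @ xb)                   # analysis, Eq. (2.21)

I = np.eye(n)
Pa = (I - K @ H) @ Pb @ (I - K @ H).T + K @ R @ K.T   # analysis error covariance, Eq. (2.23)

print(xa)
print(np.diag(Pa))

In realistic systems the gain is never formed explicitly; this is one motivation for the optimal interpolation and variational methods discussed below.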
where e(r) = x^b(r) − x^t(r) is the forecast error, x^t(r) represents the true value of the state vector and x^b(r) is the forecast state vector. Therefore P^b(r_i, r_j) can be decomposed into a block matrix whose submatrices contain the error covariances between the individual field variables of the model.
Bouttier and Courtier [4] provide the fundamental assumptions and procedures to be followed.
The fundamental hypothesis of optimal interpolation (OI) is that, for each variable of the model, only a few observations are important in determining the analysis increment. From this it follows that one should (a schematic implementation is sketched below):
1. for each variable of the model x(i), choose a small number p_i of observations using an empirical selection policy;
2. form the corresponding list of the p_i background departures [y − H(x^b)]_i, the p_i background error covariances between the model variable x(i) and the model state interpolated at the p_i observation points (namely the p_i coefficients of the i-th row of P^bH^T), and the (p_i × p_i) covariance submatrix of observation and background errors formed by the restriction of HP^bH^T and R to the selected observations;
3. invert the positive-definite (p_i × p_i) matrix formed from [R + HP^bH^T] for the selected observations (e.g. using Cholesky factorization or LU methods);
4. multiply it by the i-th row of P^bH^T to get the required row of K.
In optimal interpolation it is necessary that P^b be specified in a form that can be easily applied to any pair of observed value and model variable, or to any pair of observed variables. The simplicity of OI comes with the disadvantage that there is no consistency between small and large scales and that H must be linear.
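A schematic implementation of the procedure listed above might look as follows (an illustrative sketch; the selection policy, taking the p nearest observations for each model point, is only one of the possible empirical choices):

import numpy as np

def oi_analysis(xb, Pb, H, R, y, pos_model, pos_obs, p=3):
    """Optimal interpolation: update each model variable using only its p nearest observations."""
    xa = xb.copy()
    d = y - H @ xb                      # background departures
    PbHT = Pb @ H.T                     # covariances between model variables and observations
    S_full = H @ Pb @ H.T + R           # observation-space covariance
    for i in range(xb.size):
        sel = np.argsort(np.abs(pos_obs - pos_model[i]))[:p]   # empirical selection policy
        S = S_full[np.ix_(sel, sel)]
        k_row = np.linalg.solve(S, PbHT[i, sel])               # row of the gain for variable i
        xa[i] = xb[i] + k_row @ d[sel]
    return xa

# tiny synthetic test: 8 grid points, 5 point observations
n, m = 8, 5
idx = np.linspace(0, n - 1, m).astype(int)
H = np.zeros((m, n))
H[np.arange(m), idx] = 1.0
Pb = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 2.0)
print(oi_analysis(np.zeros(n), Pb, H, 0.1 * np.eye(m), np.ones(m),
                  np.arange(n, dtype=float), idx.astype(float)))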
There are several methods to minimize the cost function; the most relevant are discussed below. If the cost function is quadratic and convex, its minimum is unique. In general, however, J exhibits several minima. In such a case the problem is more difficult to solve, even though some algorithms exist, among which we select the conjugate gradient and quasi-Newton methods.
The minimization algorithms start from an initial point x_0 and construct a sequence x_k which converges to a local minimum. At each step k one determines a direction d_k to define the next point of the sequence. The problem of minimizing a multivariable function is thus usually solved by determining a search direction vector d_k and solving a linear (one-dimensional) minimization problem along it. If x_k is the vector containing the variables to be determined and d_k is the vector of the search direction, at each iteration step the minimization problem for a function f is formulated so as to find the step size α that minimizes f(x_k + α d_k), where α is a positive real number. At the next iteration, x_k is replaced by x_k + α_k d_k and a new search direction is determined.
The conjugate gradient method is an algorithm for finding the nearest local minimum of a function using conjugate directions for the descent. Two vectors u and v are said to be conjugate, with respect to a matrix A, if

u^T A v = 0,   (2.39)
where A is the Hessian matrix of the cost function. In the book by Press et al. [5] there are two conjugate gradient methods, by Fletcher-Reeves and by Polak-Ribière.
These algorithms calculate the mutually conjugate directions of search with
respect to the Hessian matrix of the cost function directly from the function and
the gradient evaluations, but without the direct evaluation of the Hessian matrix. The
new search direction d_{k+1} is determined by using

d_{k+1} = −g_{k+1} + β_k d_k,   (2.40)

where d_k is the previous search direction and g_{k+1} is the local gradient at iteration step k + 1. The coefficient β_k is determined by the Fletcher-Reeves equation

β_k = (g_{k+1} · g_{k+1}) / (g_k · g_k)   (2.41)

or by the Polak-Ribière equation

β_k = ((g_{k+1} − g_k) · g_{k+1}) / (g_k · g_k).   (2.42)
If the vicinity of the minimum has the shape of a long, narrow valley, the minimum is reached in far fewer steps than would be the case using the steepest descent method, which uses the negative of the local gradient as the search direction. The line minimization that finds the step size α minimizing f(x_k + α d_k) at every iteration step can be done by using the golden section search algorithm [5].
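A compact nonlinear conjugate gradient routine following Eqs. (2.40)-(2.42) might be sketched as follows (illustrative only; a simple backtracking line search is used here in place of the golden section search mentioned above):

import numpy as np

def conjugate_gradient_min(f, grad, x0, max_iter=200, tol=1e-8):
    """Nonlinear conjugate gradient (Fletcher-Reeves) with a backtracking line search."""
    x = x0.copy()
    g = grad(x)
    d = -g                                     # first direction: steepest descent
    for _ in range(max_iter):
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (g @ d):   # Armijo condition
            alpha *= 0.5
            if alpha < 1e-12:
                break
        x_new = x + alpha * d
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        beta = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves, Eq. (2.41)
        # beta = ((g_new - g) @ g_new) / (g @ g)   # Polak-Ribiere alternative, Eq. (2.42)
        d = -g_new + beta * d                  # new search direction, Eq. (2.40)
        x, g = x_new, g_new
    return x

# quadratic test problem with minimum at (1, 2)
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = A @ np.array([1.0, 2.0])
print(conjugate_gradient_min(lambda x: 0.5 * x @ A @ x - b @ x,
                             lambda x: A @ x - b,
                             np.zeros(2)))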
For the problem of minimizing a multivariable function, quasi-Newton methods are also widely used. These methods involve an approximation of the Hessian matrix of the function, or of its inverse. The L-BFGS (Limited memory Broyden-Fletcher-Goldfarb-Shanno) method is basically a method to approximate the inverse Hessian matrix in the quasi-Newton method of optimization. It is a variation of the standard BFGS method, which is given by (Nocedal [6], Byrd et al. [7])
x_{k+1} = x_k − α_k H_k g_k,   k = 1, 2, 3, …   (2.43)

where α_k is a step length, g_k is the local gradient of the cost function, and H_k is the approximate inverse Hessian matrix, which is updated at every iteration by means of the formula

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T   (2.44)

where

ρ_k = 1 / (q_k^T s_k)   (2.45)

and

V_k = I − ρ_k q_k s_k^T   (2.46)

s_k = x_{k+1} − x_k   (2.47)

and

q_k = g_{k+1} − g_k   (2.48)
Using this method, instead of storing the matrices H_k, one stores a certain number of pairs {s_k, q_k} that define them implicitly. The product H_k g_k is obtained by performing a sequence of inner products involving g_k and the most recent vector pairs {s_k, q_k} that define the iteration matrix.
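The product H_k g_k can be computed with the classical two-loop recursion over the stored pairs; the sketch below assumes the usual scalar initial approximation H_0 = γ_k I (an assumption made for the example, not spelled out in the text above):

import numpy as np

def lbfgs_direction(g, s_pairs, q_pairs):
    """Return -H_k g from the stored pairs {s_k, q_k} via the two-loop recursion."""
    q = g.copy()
    alphas = []
    for s, qv in zip(reversed(s_pairs), reversed(q_pairs)):   # newest to oldest
        rho = 1.0 / (qv @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q = q - a * qv
    if s_pairs:                                               # scalar initial inverse Hessian
        s, qv = s_pairs[-1], q_pairs[-1]
        gamma = (s @ qv) / (qv @ qv)
    else:
        gamma = 1.0
    r = gamma * q
    for (s, qv), a in zip(zip(s_pairs, q_pairs), reversed(alphas)):   # oldest to newest
        rho = 1.0 / (qv @ s)
        beta = rho * (qv @ r)
        r = r + s * (a - beta)
    return -r                                                 # search direction -H_k g_k

A full limited-memory optimizer keeps only the most recent pairs (typically a handful), updates them after every accepted step, and combines this direction with a line search as in Eq. (2.43).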
and we can conclude that E[y − Hx^a] = 0. Since the background error is unbiased, the innovation is also unbiased,

which means that the background and its error are uncorrelated. From Eq. 2.21 we have

Introducing the true state x^t of the model into (2.31) we have:

Thus

Multiplying the right side of this equation by its transpose, taking into account Eq. 2.23 and computing the expectation, we have:

Assuming that the background errors and those of the observations are uncorrelated, by simplifying we obtain:
P^a = [½ J''(x)]^{-1}   (2.62)

or inversely

½ J''(x) = (P^a)^{-1}   (2.63)

where J''(x) denotes the Hessian (second derivative) of the cost function.
The basic principle of 3D-Var is to avoid explicitly computing the gain matrix and its inverse by using a minimization procedure for the cost function J. In this case the solution of Eq. (2.29) is obtained iteratively, performing several evaluations of the cost function and of its gradient, and reaching the minimum with a suitable descent algorithm. The minimization is stopped by limiting the number of iterations and by requiring that the norm of the gradient ∇J(x^a) decrease by a predefined amount during the minimization; this is an intrinsic measure of how much closer the analysis is to the optimal value than the starting point of the minimization.
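Putting the pieces together, a toy 3D-Var analysis can be obtained by handing the cost function (2.29) and its gradient to any descent algorithm. The sketch below (an illustration with small random stand-in matrices) uses the L-BFGS implementation available in SciPy and checks the result against the explicit-gain solution (2.37):

import numpy as np
from scipy.optimize import minimize

n, m = 6, 3
rng = np.random.default_rng(2)
xb = rng.normal(size=n)                  # background
B = np.eye(n)                            # background error covariance (Pb)
H = rng.normal(size=(m, n))              # linear observation operator
R = 0.25 * np.eye(m)                     # observation error covariance
y = H @ rng.normal(size=n)               # synthetic observations

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

def J(x):        # cost function, Eq. (2.29)
    dx, dy = x - xb, y - H @ x
    return 0.5 * (dy @ Rinv @ dy + dx @ Binv @ dx)

def gradJ(x):    # gradient of (2.29) for a linear observation operator
    return Binv @ (x - xb) - H.T @ Rinv @ (y - H @ x)

xa_var = minimize(J, xb, jac=gradJ, method="L-BFGS-B").x

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)      # explicit gain, Eq. (2.37)
xa_gain = xb + K @ (y - H @ xb)
print(np.allclose(xa_var, xa_gain, atol=1e-4))    # both routes give the same analysis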
When the observations are distributed over time, the 3D-Var approach is generalized to the 4D-Var approach. The equations are the same, provided the operators are generalized to include a forecasting model that allows comparing the state of the model with the observations at a given time k.
In a given time interval the cost function to be minimized is the same as in 3D-Var, with the difference that the operators H and R now refer to the successive times along a partial trajectory, i.e. the index k ranges over the assimilation window. Thus, according to the relation given by Lorenc [2]:

J = ½ Σ_{k=0}^{N} [y_k − H_k(x_k)]^T R_k^{-1} [y_k − H_k(x_k)] + ½ [x − x_0^b]^T (P^b)^{-1} [x − x_0^b],   (2.64)
where x is the forecast at time zero produced by the data assimilation for k < k_0; P^b is the error covariance matrix at k = 0; and x_0^b is the value of x before the first iteration of the descent algorithm. The second term on the right is meant to pull the forecast x_0^b towards the previous prediction x, so as to reduce the time discontinuity. In the classical recipe (see [8]) the 4D-Var assimilation problem is subject to a strong constraint, namely that the sequence of model states x_k must be a solution of the equation:

x_k = M_{0→k}(x),   (2.65)
where M_{0→k} is the forecast model from the starting time to time k. 4D-Var is a nonlinear optimization problem that is difficult to solve except under the following two hypotheses: causality and the tangent linear approximation.
1. Causality
The forecast model can be expressed as the product of intermediate forecasting steps that reflect the randomness of nature. The integration of a prognostic model starts with the initial condition x_0 = x, so that M_0 is the identity. Thus, indicating with M_k the forecasting step from time k − 1 to time k, we have x_k = M_k x_{k-1} and, by recurrence:

x_k = M_k M_{k-1} … M_1 x_0.   (2.66)
2. Tangent linear
The cost function can be made quadratic by assuming that the operator M can be linearized, that is:

³ The adjoint operators have been introduced to reduce the size and the number of matrix multiplications and to be able to calculate the cost function. Algebraically, this means replacing a set of matrices with their transposes, hence the name adjoint techniques.
∇J_o = Σ_{k=0}^{N} ∇J_{o_k}
     = Σ_{k=0}^{N} M_1^T … M_k^T H_k^T d_k
     = H_0^T d_0 + M_1^T [H_1^T d_1 + M_2^T [H_2^T d_2 + … + M_N^T H_N^T d_N] …].   (2.68)
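The nested bracket in (2.68) is exactly a backward (adjoint) recursion along the trajectory. The sketch below illustrates it for the simplest possible setting: linear model and observation operators given as plain matrices and d_k taken as the R^{-1}-weighted departure (these are assumptions made for the example, not the book's general operators):

import numpy as np

def fourdvar_obs_gradient(x0, Ms, Hs, Rinvs, ys):
    """Gradient (w.r.t. the initial state) of the observation term of 4D-Var,
    computed with the adjoint recursion of Eq. (2.68). Ms[k] maps time k to k+1."""
    xs, x = [x0], x0
    for M in Ms:                       # forward sweep: model trajectory
        x = M @ x
        xs.append(x)
    # weighted departures d_k = R_k^-1 (y_k - H_k x_k)
    ds = [Rinv @ (y - H @ xk) for xk, H, Rinv, y in zip(xs, Hs, Rinvs, ys)]
    adj = Hs[-1].T @ ds[-1]            # innermost bracket
    for k in range(len(Ms) - 1, -1, -1):   # backward (adjoint) sweep
        adj = Hs[k].T @ ds[k] + Ms[k].T @ adj
    # gradient of (1/2) * sum_k (y_k - H_k x_k)^T R_k^-1 (y_k - H_k x_k)
    return -adj

# tiny example: two steps, 2-variable state, the full state observed at every time
M = [np.array([[1.0, 0.1], [0.0, 1.0]])] * 2
H = [np.eye(2)] * 3
Rinv = [np.eye(2)] * 3
y = [np.array([1.0, 0.0])] * 3
print(fourdvar_obs_gradient(np.zeros(2), M, H, Rinv, y))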
Before showing why assimilation is an inverse problem, let us introduce the concept of well and ill posed problems in the sense of Hadamard. Given an operator A, we wish to solve the following system of equations:

g = Af.   (2.69)
Let us now apply the approach to a simplified, or toy, model representing the sea circulation in a well defined ocean basin (Bennett [11]):

∂u/∂t + c ∂u/∂x = F,   (2.70)

on the domain 0 ≤ x ≤ L and in the time interval 0 ≤ t ≤ T, where c is a known positive constant. Let F = F(x, t) be the forcing field. An initial condition is given by:

u(x, 0) = I(x),   (2.71)

and a boundary condition by:

u(0, t) = B(t),   (2.72)

where B(t) is given. In order to evaluate the uniqueness of the solution, we assume that for given F, I and B there exist two solutions u_1 and u_2. Defining the difference v = u_1 − u_2, we have:
∂v/∂t + c ∂v/∂x = 0,   (2.73)

with initial and boundary conditions, respectively, v(x, 0) = 0 and v(0, t) = 0. The solution can be obtained using the method of characteristics, by which partial differential equations (PDEs) are reduced to ordinary differential equations (ODEs). The characteristic equations are:

dx/ds = c
dt/ds = 1.   (2.74)

The PDE transformed into an ODE is

dv/ds = 0.   (2.75)

On the basis of the initial condition and the boundary conditions the solution is:

v(x, t) = 0,   (2.76)

so that u_1 = u_2 and the solution is unique.
Let us now verify the other two conditions for well-posedness. Using the Green function one defines G = G(x, t, ξ, τ) by

∂G/∂s − c ∂G/∂x = δ(x − ξ) δ(t − τ),   (2.77)

that is an explicit solution for the forward model. Relation (2.78) indicates that u depends on F, I and B with continuity: if they change by O(ε), u also changes accordingly. Furthermore, it is required that I(0) = B(0), otherwise u is discontinuous along x = ct for all t. On the basis of these evaluations one can deduce that the model is well posed.
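For concreteness, the forward problem (2.70)-(2.72) can be integrated numerically. The sketch below uses a first-order upwind scheme (one possible discretization, chosen only for illustration and assuming c > 0):

import numpy as np

def advect(I, B, F, c, L, T, nx=200, nt=400):
    """Integrate u_t + c u_x = F on [0, L] x [0, T], with u(x,0) = I(x), u(0,t) = B(t).
    First-order upwind scheme for c > 0; the CFL condition c*dt/dx <= 1 must hold."""
    dx, dt = L / nx, T / nt
    x = np.linspace(0.0, L, nx + 1)
    u = np.asarray(I(x), dtype=float)
    for n in range(nt):
        t = n * dt
        u[1:] = u[1:] - c * dt / dx * (u[1:] - u[:-1]) + dt * F(x[1:], t)
        u[0] = B(t + dt)                      # inflow boundary condition
    return x, u

# example: zero forcing and initial condition, sinusoidal inflow at x = 0
x, u = advect(I=lambda x: 0.0 * x,
              B=lambda t: np.sin(2.0 * np.pi * t),
              F=lambda x, t: 0.0 * x,
              c=1.0, L=1.0, T=1.0)
print(u[:5])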
Let us see what happens to the forward model when we introduce information on the field u(x, t) of the proposed circulation model. This information consists of imperfect observations at isolated points in space and time. The direct model then becomes indeterminate: it cannot be solved by a smooth function and must therefore be considered an ill-posed problem, to be resolved through a best fit weighted with all the information we hold.
Let us assume we collect M measurements (observations, data, etc.) of u in our basin, with 0 ≤ x ≤ L, during the cruise time 0 ≤ t ≤ T. The data are collected at (x_i, t_i), with i = 1, …, M, and will be indicated by the recorded value u_i with its error as:

u_i = u(x_i, t_i) + e_i,   (2.79)
where e_i is the measurement error and u(x_i, t_i) is the true value. Since the boundary conditions and the initial conditions are affected by errors, the equation should be written taking into account the error f on the forcing F:

∂u/∂t + c ∂u/∂x = F + f,   (2.80)

with

u(x, 0) = I(x) + i(x)   (2.81)

and

u(0, t) = B(t) + b(t).   (2.82)
The problem is now to obtain a unique solution for each choice of F + f , I + i and
B + b. This can be done by looking for the field u = u(xi , ti ) that minimizes errors.
One looks for the minimum of the cost (or penalty) function J, in which the error standard deviations of the a priori functions have been introduced as weights: W_f for the model (forcing), W_i for the initial condition, W_b for the boundary conditions and w for the observations:
J = J[u] = W_f \int_0^T dt \int_0^L f(x,t)^2\,dx + W_i \int_0^L i(x)^2\,dx + W_b \int_0^T b(t)^2\,dt + w \sum_{m=1}^{M} e_m^2,  (2.83)
where W_f, W_i, W_b and w are positive weights. The cost function J[u] is a number for each choice of the entire field u. Rewriting (2.83) and making explicit the dependence of f, i, b and e_m on F, I, B and u_i, we have:
J(u) = W_f \int_0^T dt \int_0^L \left\{\frac{\partial u}{\partial t} + c\,\frac{\partial u}{\partial x} - F\right\}^2 dx + W_i \int_0^L \{u(x,0) - I(x)\}^2\,dx + W_b \int_0^T \{u(0,t) - B(t)\}^2\,dt + w \sum_{m=1}^{M} \{u(x_m, t_m) - u_m\}^2,  (2.84)
the solution of which can be obtained using the calculus of variations, as reported in the Appendix, both for the weak-constraint and the strong-constraint case. Since J is quadratic in u, it is non-negative and the local extremum must be the global minimum [11].
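A direct numerical transcription of the penalty function (2.84) on a grid might read as follows; the Riemann-sum quadrature and the discrete derivatives are illustrative choices only, and this sketch evaluates J for a candidate field, it is not the variational solution discussed in the Appendix.

import numpy as np

def cost_J(u, x, t, F, I, B, obs, c=1.0, Wf=1.0, Wi=1.0, Wb=1.0, w=1.0):
    """Discrete evaluation of the penalty function (2.84).
    u       : array (nt, nx) with the candidate field u(x, t)
    F       : array (nt, nx) with the forcing
    I, B    : arrays with the initial (nx) and boundary (nt) data
    obs     : list of (i_x, i_t, value) measurement triples."""
    dx, dt = x[1] - x[0], t[1] - t[0]
    du_dt = np.gradient(u, dt, axis=0)
    du_dx = np.gradient(u, dx, axis=1)
    model_residual = du_dt + c * du_dx - F
    Jf = Wf * np.sum(model_residual**2) * dx * dt     # model (forcing) term
    Ji = Wi * np.sum((u[0, :] - I)**2) * dx           # initial-condition term
    Jb = Wb * np.sum((u[:, 0] - B)**2) * dt           # boundary-condition term
    Jo = w * sum((u[it, ix] - val)**2 for ix, it, val in obs)  # observations
    return Jf + Ji + Jb + Jo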
References

Chapter 3
Sequential Interpolation
Abstract In this chapter, the stochastic dynamical system describing the evolution of a physical system is analyzed. The genesis of the Kalman filter and its subsequent evolutions are outlined to give a complete overview of this algorithm. The Extended Kalman Filter, the Sigma Point Kalman Filter and the Unscented Kalman Filter are also reported. The more recent and advanced filters, such as the Ensemble Kalman Filters, will be reported in the next chapter.
In the previous chapter, we focused our attention on the optimal estimator as well as its uncertainties, given some a priori information, the background, and the observation set, but we have not mentioned the temporal dimension of the problem. However, our interest lies in the evolution of the forward model for the system state defined between times t_k and t_{k+1}. In this chapter, the stochastic dynamical system describing the evolution of a physical system is analyzed.
The Kalman filter is an algorithm that, given a set of measurements, arrives at the estimate of a given quantity by a recursive process. To achieve this objective, the filter processes all available information, regardless of its accuracy, by using:
1. the knowledge of the system and its dynamics;
2. the statistical description of the noise associated with the system, the measurement error and the uncertainty of the dynamic models used;
3. all available information about the initial conditions of the variables of interest.
The concept of optimal estimate refers to the fact that the Kalman filter [1] provides the best estimate that can be made on the basis of all the information obtained through the three items mentioned above. The concept of recursive indicates that it is not necessary to store all previous information and measurements and reprocess them whenever a new measurement is taken. An example of how the Kalman filter works is given in Fig. 3.1. The Kalman filter combines all available measurement data plus the a priori knowledge about the system and measurement devices to produce an estimate of the desired variables, so that the error is statistically minimized.
Fig. 3.1 Kalman filter process. The Kalman filter keeps track of the estimated state of the system
and its variance or uncertainty. The updated estimate is obtained by a state transition model and
measurements
The original derivation of the Kalman filter was not, however, in terms of Bayesian rules and did not require the use of any probability density function. The only assumptions were that the random variables of the system should be estimated through the sequential updating of the first- and second-order moments (mean and covariance) and that the estimator should be linear.
However, since the best approach is the one defined by Bayesian inference, we will introduce this concept in the most effective way possible, deferring the most complete version to the next chapters. From a Bayesian point of view, then, the Kalman filter propagates the probability density of the given quantities conditioned by the knowledge of the actual data coming from the measurement devices. The term conditioned is associated with the probability density, indicating that its shape and its location on the x axis depend on the values of the measurements taken. The shape of the conditional probability density contains the level of uncertainty, given by the variance, of our knowledge of the value of x, together with the mean, the median and the mode. The Kalman filter propagates the probability
density for those problems where the system can be described by a linear model in
which the system error and the noise are white and Gaussian. Under these conditions
the mean, the mode and the median are the same.
Let us introduce the definition of linear and nonlinear systems, together with the concept of state space. Its representation is given by a mathematical model of a physical system with a set of input, output and state variables related by first-order differential equations. The state-space representation is a convenient and compact way to model and analyze a system with multiple inputs and outputs. It is built with the state variables of the system dynamics as coordinates. In engineering it is called the state space, whereas in physics it is called phase space.
The state of a system, at any time, is represented by a point in space. Starting
from an initial position the point, corresponding to the state, moves in space and this
movement is completely determined by the equations of state. The path of the point
is called the orbit or trajectory of the system. It starts from given conditions. These
trajectories are obtained from the solutions of the state equations.
A continuous-time linear system can be described by first-order differential equations with the associated output:
\dot{x}(t) = F(t)\,x(t) + G(t)\,u(t), \qquad y(t) = C(t)\,x(t),  (3.1)
where x(t) is the state vector; F is the system matrix; G is the input matrix and C is the output matrix. All these matrices have the appropriate sizes. The dot on x indicates the time derivative; u is the control vector and y is the output vector. Even though the matrices are time dependent, the system is still linear.
If F, G, C are constant, the solution of Eq. (3.1) is:
x(t) = \exp[F(t - t_0)]\,x(t_0) + \int_{t_0}^{t} \exp[F(t - \tau)]\,G\,u(\tau)\,d\tau,
y(t) = C\,x(t),  (3.2)
where t_0 is the initial time of the system, which is often zero. In the case of zero input, relation (3.2) reduces to
x(t) = \exp[F(t - t_0)]\,x(t_0), \qquad y(t) = C\,x(t),  (3.3)
where the matrix exponential is defined by the series
\exp[Ft] = \sum_{j=0}^{\infty} \frac{(Ft)^j}{j!}.  (3.4)
Let us now define \Delta t = t_k - t_{k-1} and \tau' = \tau - t_{k-1}; substituting into (3.5) we obtain:
x(t_k) = \exp[F\Delta t]\,x(t_{k-1}) + \int_0^{\Delta t} \exp[F(\Delta t - \tau)]\,G\,u(t_{k-1})\,d\tau
= \exp[F\Delta t]\,x(t_{k-1}) + \exp[F\Delta t]\int_0^{\Delta t} \exp[-F\tau]\,G\,u(t_{k-1})\,d\tau.  (3.6)
At this point, it is necessary to compute the integral of the matrix exponential. It can be simplified if F is invertible:
\int_0^{\Delta t} \exp[-F\tau]\,d\tau = \int_0^{\Delta t} \sum_{j=0}^{\infty}\frac{(-F\tau)^j}{j!}\,d\tau = [I - \exp(-F\Delta t)]\,F^{-1},  (3.7)
where we have replaced the time steps previously denoted by k and k - 1 with t_k and t_{k-1}.
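A small sketch of the discretization implied by (3.6)-(3.7), assuming F is invertible and the input is held constant over the step, can be written with SciPy's matrix exponential; the damped-oscillator system below is an arbitrary example, not taken from the text.

import numpy as np
from scipy.linalg import expm

def discretize(F, G, dt):
    """Discretize x' = Fx + Gu over a step dt with piecewise-constant input:
    A = exp(F dt),  B = (A - I) F^{-1} G  (requires F invertible, cf. (3.7))."""
    A = expm(F * dt)
    B = (A - np.eye(F.shape[0])) @ np.linalg.inv(F) @ G
    return A, B

# Damped oscillator driven by a scalar input
F = np.array([[0.0, 1.0], [-1.0, -0.1]])
G = np.array([[0.0], [1.0]])
A, B = discretize(F, G, dt=0.1)
print(A, B)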
Let us assume now that the discretized dynamical system is the one given in (3.8), to which we have added w_{k-1}, the white noise associated with the process. Thus we have:
x_k = A_{k-1}\,x_{k-1} + B_{k-1}\,u_{k-1} + w_{k-1},  (3.9)
where x, u and w are, respectively, the state vector, the input vector and the noise of the process at time k - 1, each of size n \times 1; A is the transition matrix of the process state from k - 1 to k, time stationary and of size n \times n; B is the input matrix; u_{k-1} is the input vector at time step k - 1; w_{k-1} is the uncertainty. The index k gives the state of the system at time k.
The observation vector is:
y_k = H_k\,x_k + v_k,  (3.10)
where y_k is the observation related to the measurement of x at time k, of size m \times 1; H_k is the matrix connecting the state vector to the measurement vector. It may change from one measurement to another, but usually it is taken to be stationary. It does not contain the noise and its size is m \times n; v_k is the measurement error, it has zero cross-correlation with the process noise, and its size is m \times 1; w_k \sim N(0, Q_k) and v_k \sim N(0, R_k).
In summary: the matrix A links the current state with the previous one; B is the process control; H_k is the measurement operator. Let us now define x^f as the a priori estimate or, in Bayesian terms, the forecast. It evolves as
x_k^f = A_{k-1}\,x_{k-1}^a + B_{k-1}\,u_{k-1}.  (3.12)
A second source of information is given by the data y_k themselves. In order to obtain the optimal filter we need to minimize the mean square error, provided the error of the system is Gaussian.
Let us assume the noise models are stationary in time, with covariances of the type
E[w_i\,w_j^T] = Q\,\delta_{ij}  (3.13)
and
E[v_i\,v_j^T] = R\,\delta_{ij}  (3.14)
and
E[v_i\,w_j^T] = 0.  (3.15)
The a priori (forecast) error covariance is then
P_k^f = E[e_k^f\,(e_k^f)^T] = E[(x_k - x_k^f)(x_k - x_k^f)^T].  (3.16)
Assuming the a priori estimate of x_k^a is x_k^f (see Fig. 3.2), it is possible to write an equation for the new estimate that depends on the a priori estimate:
x_k^a = x_k^f + K_k\,(y_k - H_k\,x_k^f),  (3.18)
where K_k is the Kalman gain, which we will derive shortly, and the term (y_k - H_k\,x_k^f) is known as the innovation. Substituting relation (3.10) into (3.18) we get:
x_k^a = x_k^f + K_k\,(H_k\,x_k + v_k - H_k\,x_k^f),  (3.19)
P_k^a = E\{[(I - K_k H_k)(x_k - x_k^f) - K_k v_k]\,[(I - K_k H_k)(x_k - x_k^f) - K_k v_k]^T\}.  (3.21)
Since the a priori error estimate (x_k - x_k^f) is not correlated with the measurement noise, the expectation can be written as
P_k^a = (I - K_k H_k)\,E[(x_k - x_k^f)(x_k - x_k^f)^T]\,(I - K_k H_k)^T + K_k\,E[v_k v_k^T]\,K_k^T.  (3.22)
Taking into account relation (3.14) we get the update covariance equation
P_k^a = (I - K_k H_k)\,P_k^f\,(I - K_k H_k)^T + K_k R K_k^T,  (3.23)
where P_k^f is the a priori estimate of P_k^a.
As is known, the diagonal of the covariance matrix contains the mean square errors, and since the trace of a matrix is the sum of its diagonal values, the trace of the covariance matrix is the sum of the mean square errors. The mean square error can therefore be minimized by minimizing the trace of the covariance matrix. First of all we take the derivative of P_k^a with respect to K_k and then set it equal to zero.
Expanding Eq. (3.23) we get:
P_k^a = P_k^f - K_k H_k P_k^f - P_k^f H_k^T K_k^T + K_k (H_k P_k^f H_k^T + R) K_k^T.  (3.24)
Noting that the trace of a matrix (Trace[\cdot]) is equal to the trace of its transpose, we can write:
Trace[P_k^a] = Trace[P_k^f] - 2\,Trace[K_k H_k P_k^f] + Trace[K_k (H_k P_k^f H_k^T + R) K_k^T],  (3.25)
\frac{d\,Trace[P_k^a]}{dK_k} = -2\,(H_k P_k^f)^T + 2\,K_k\,(H_k P_k^f H_k^T + R) = 0.  (3.26)
The update covariance is obtained from (3.23), in which we substitute the Kalman gain matrix given by (3.27):
P_k^a = (I - K_k H_k)\,P_k^f.  (3.29)
This result gives us the update relation of the covariance matrix with the optimal gain. Equations (3.20), (3.19) and (3.29) give an estimate of the variable x_k. The projection of the next state is obtained from:
x_{k+1}^f = A_k\,x_k^a + B_k\,u_k.  (3.30)
To complete the recursion, we need to map the error covariance matrix onto the next step k + 1. We get it by defining an expression for the a priori error:
e_{k+1}^f = x_{k+1} - x_{k+1}^f = (A_k x_k + B_k u_k + w_k) - (A_k x_k^a + B_k u_k) = A_k e_k^a + w_k.  (3.31)
Extending the definition of P_k^f to time k + 1 we get:
P_{k+1}^f = E[e_{k+1}^f (e_{k+1}^f)^T] = E[(A_k e_k^a)(A_k e_k^a)^T] + E[w_k w_k^T] = A_k P_k^a A_k^T + Q,  (3.32)
which completes the recursive filter. Let us note that e_k and w_k have zero cross-correlation, because the noise w_k accumulates between k - 1 and k, while the error e_k is the current error up to time k.
In summary, the Kalman filter is an algorithm that, given an initial estimate, produces a gain, the so-called Kalman gain, which generates an updated estimate and an updated variance that are projected onto the next step.
In practice the filter operates sequentially in k: at time k we make an estimate x_k^a of x_k, with an error covariance matrix S_k. Equation (3.28) defines the Kalman gain. The stochastic Eq. (3.9) is used to build the a priori estimate and its variance at time k + 1. This result is then combined with the measurement made at the same time, using the maximum-estimate equation, to produce an updated state. The Kalman gain matrix is functionally identical to the maximum a posteriori estimate (MAP).
The algorithm can be summarized (dropping the time index on A, B, H) as:
1. forecast the state:
x_k^f = A\,x_{k-1}^a + B\,u_{k-1};  (3.33)
2. forecast the covariance:
P_k^f = A\,P_{k-1}^a\,A^T + Q;  (3.34)
3. compute the gain:
K_k = P_k^f H^T (H P_k^f H^T + R)^{-1};  (3.35)
4. update the state:
x_k^a = x_k^f + K_k\,(y_k - H x_k^f);  (3.36)
5. update the covariance:
P_k^a = (I - K_k H)\,P_k^f.  (3.37)
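The five steps (3.33)-(3.37) translate almost line for line into code. The following minimal sketch assumes time-invariant A, B, H, Q, R, as in the summary above; the one-dimensional random-walk example at the end is invented purely to exercise the function.

import numpy as np

def kalman_step(xa, Pa, y, u, A, B, H, Q, R):
    """One cycle of the linear Kalman filter, Eqs. (3.33)-(3.37)."""
    # Forecast (3.33)-(3.34)
    xf = A @ xa + B @ u
    Pf = A @ Pa @ A.T + Q
    # Gain (3.35)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    # Update (3.36)-(3.37)
    xa_new = xf + K @ (y - H @ xf)
    Pa_new = (np.eye(len(xa)) - K @ H) @ Pf
    return xa_new, Pa_new

# One-dimensional random walk observed directly
A = np.eye(1); B = np.zeros((1, 1)); H = np.eye(1)
Q = 0.01 * np.eye(1); R = 0.1 * np.eye(1)
xa, Pa = np.zeros(1), np.eye(1)
for y in [0.9, 1.1, 1.0]:
    xa, Pa = kalman_step(xa, Pa, np.array([y]), np.zeros(1), A, B, H, Q, R)
print(xa, Pa)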
There are some cases in which the linearity hypothesis is not valid. For instance,
when the observation operator is nonlinear as it is in the case of satellite observation,
where the radiance operators are taken into account, or in the case of a forward
nonlinear model as it is in atmospheric chemistry.
This means we need to adapt the Kalman filter to handle the potential nonlinearity of these operators. In these cases we can use the Extended Kalman Filter (EKF). The evolution of the state of the system is given by (3.8), which in discrete nonlinear form is:
x_k = f(x_{k-1}, u_{k-1}, w_{k-1}),  (3.38)
where w_{k-1} is a random perturbation of the system whose distribution has zero mean and covariance given by the matrix Q_k.
The measurement is given by:
y_k = h(x_k, v_k).  (3.39)
The forecast step is like the one previously done for the Kalman filter:
x_k^f = f(x_{k-1}^a, u_{k-1}, 0),  (3.40)
y_k^f = h(x_k^f, 0).  (3.41)
Linearizing f around the current estimate gives:
f(x, u, w) \approx f(x_{k-1}^a, u_{k-1}, 0) + A_{k-1}\,(x - x_{k-1}^a) + W_{k-1}\,(w - 0).  (3.42)
Note that we assume u_k is known, so it does not need to be linearized. A_{k-1} and W_{k-1} are the Jacobian matrices
A_{i,j,k} = \frac{\partial f_i(x_{k-1}^a, u_{k-1}, 0)}{\partial x_j},  (3.43)
W_{i,j,k} = \frac{\partial f_i(x_{k-1}^a, u_{k-1}, 0)}{\partial w_j}.  (3.44)
With this approach we have the approximated covariance matrix for x_k^f:
P_k^f = A_{k-1}\,P_{k-1}^a\,A_{k-1}^T + W_{k-1}\,Q_{k-1}\,W_{k-1}^T.  (3.45)
Analogously, the observation operator is linearized through the Jacobians
H_{i,j,k} = \frac{\partial h_i(x_k^f, 0)}{\partial x_j}  (3.46)
and
V_{i,j,k} = \frac{\partial h_i(x_k^f, 0)}{\partial v_j}.  (3.47)
Thus, the forecast errors are
e_{x_k}^f = x_k - x_k^f  (3.48)
and
e_{z_k}^f = y_k - y_k^f.  (3.49)
We do not know x_k, but we expect that x_k - x_k^f is relatively small. We can now linearize the function f(\cdot) in order to obtain an approximation for e_{x_k}^f:
e_{x_k}^f = f(x_{k-1}, u_{k-1}, w_{k-1}) - f(x_{k-1}^a, u_{k-1}, 0)  (3.50)
e_{x_k}^f \approx A_{k-1}\,(x_{k-1} - x_{k-1}^a) + \epsilon_k,  (3.51)
where \epsilon_k is distributed as N(0, W_{k-1} Q_{k-1} W_{k-1}^T), taking into account the effect of the process noise. Similarly, for the observation error,
e_{z_k}^f \approx H_k\,e_{x_k}^f + \eta_k,  (3.53)
e_{x_k}^a = K_k\,(y_k - y_k^f),  (3.54)
and after setting x_k^a = x_k^f + e_{x_k}^a and using the same derivation as for the Kalman gain, we obtain the optimal gain for the EKF,
K_k = P_k^f H_k^T (H_k P_k^f H_k^T + V_k R_k V_k^T)^{-1},  (3.55)
from which we can obtain the covariance for the updated estimate x_k^a:
P_k^a = (I - K_k H_k)\,P_k^f.  (3.56)
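A compact sketch of the EKF cycle of Eqs. (3.40)-(3.56) is given below, using simple finite-difference Jacobians in place of analytic ones and treating the noise as additive (so the V Jacobian is not formed explicitly); the scalar logistic model and the square-root sensor are invented test functions, not part of the text.

import numpy as np

def jacobian(fun, x, eps=1e-6):
    """Finite-difference Jacobian of fun evaluated at x."""
    fx = fun(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x); dx[j] = eps
        J[:, j] = (fun(x + dx) - fx) / eps
    return J

def ekf_step(xa, Pa, y, f, h, Q, R):
    """One EKF cycle: forecast (3.40), covariance propagation (3.45),
    gain and update in the spirit of (3.55)-(3.56), with additive noise."""
    A = jacobian(f, xa)
    xf = f(xa)
    Pf = A @ Pa @ A.T + Q
    H = jacobian(h, xf)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)
    xa_new = xf + K @ (y - h(xf))
    Pa_new = (np.eye(xa.size) - K @ H) @ Pf
    return xa_new, Pa_new

# Logistic growth observed through a square-root sensor (illustrative only)
f = lambda x: x + 0.1 * x * (1.0 - x)
h = lambda x: np.sqrt(np.abs(x))
xa, Pa = np.array([0.2]), np.eye(1)
xa, Pa = ekf_step(xa, Pa, np.array([0.55]), f, h, 0.01 * np.eye(1), 0.05 * np.eye(1))
print(xa, Pa)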
The central point of the Kalman filter is the propagation of a Gaussian random variable through the dynamic system. In the Extended Kalman filter the distribution of the state of the system and the related noise densities are approximated by Gaussian random variables, which are then propagated through a first-order linearization of the nonlinear system. This can introduce large errors in the true mean and in the covariance of the Gaussian random variable, which can lead to a divergence of the filter, compromising its operation.
To properly address the development of the new Kalman filters, particularly the Sigma Point Kalman Filter and the closely related Unscented Kalman Filter, we need to adopt the concepts of Bayesian inference. We use such an approach in the light of the applications derived from the robotics world, where the Sigma Point Kalman Filter was born (van der Merwe and Wan [3]).
Probabilistic inference consists in the problem of estimating the hidden variables (states or parameters) of a system in a consistent and optimal way, given incomplete and noisy information. We consider that the hidden state of the system x_k, with initial probability density p(x_0), evolves in time (we recall that the index k describes the discrete time evolution) as a first-order Markov process according to the probability density p(x_k|x_{k-1}). Once the state variable is given, the observations y_k are conditionally independent and are generated according to the conditional probability density p(y_k|x_k). The dynamic state-space model is then given by:
x_k = f[x_{k-1}, u_k, w_k]  (3.57)
y_k = h[x_k, v_k],  (3.58)
where w_k is the process noise that drives the dynamic system through the nonlinear state-transition function f, and v_k is the noise corrupting the measurement through the nonlinear observation function h. The state transition density p(x_k|x_{k-1}) is fully specified by f and the process-noise distribution p(w_k), while h and the noise distribution p(v_k) determine the observation probability p(y_k|x_k). We assume the external input u_k is known. The Dynamic State-Space Model (DSSM),
along with the statistics of random noise variables, as well as the a priori distribu-
tions of the state system, define a probabilistic model of how the system temporally
evolves and how we can observe the evolution of the hidden state. The problem is to
know, in a recursive way, how to obtain an optimal estimate of the hidden variables
of the system, when incomplete and noisy observations are taken.
From a Bayesian point of view, the a posteriori filtering density p(x_k|y_{1:k}) provides the complete solution of the probabilistic inference problem. The problem can then be reformulated: how do we compute the a posteriori density recursively when new observations arrive? The answer comes from the recursive Bayesian estimation algorithm.
Using the Bayesian rule and the DSSM of the system, the a posteriori density can be expanded and factorized in the following recursive update form:
p(x_k|y_{1:k}) = \frac{p(y_k|x_k)\,p(x_k|y_{1:k-1})}{p(y_k|y_{1:k-1})}.  (3.62)
Let us see how relation (3.62) is constructed: the a posteriori state at time k - 1, p(x_{k-1}|y_{1:k-1}), is first projected forward in order to compute the a priori state at time k, using the probabilistic process model:
p(x_k|y_{1:k-1}) = \int p(x_k|x_{k-1})\,p(x_{k-1}|y_{1:k-1})\,dx_{k-1}.  (3.63)
Then the most recent noisy measurement is incorporated, using the observation likelihood, to generate the updated a posteriori state, where \delta is the Dirac delta. These multi-dimensional integrals can only be treated exactly in the case of a Gaussian linear system.
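For a one-dimensional state the prediction integral (3.63) and the Bayes update (3.62) can be evaluated directly on a grid. The sketch below does exactly that for an invented Gaussian random-walk transition and Gaussian likelihood, only to make the recursion concrete; the grid, the noise levels and the observations are arbitrary.

import numpy as np

def bayes_filter_step(prior, grid, y, sigma_model=0.1, sigma_obs=0.2):
    """One step of the recursive Bayesian filter on a grid.
    prior : p(x_{k-1} | y_{1:k-1}) sampled on `grid`; returns p(x_k | y_{1:k})."""
    dx = grid[1] - grid[0]
    # Prediction (3.63): integrate the transition density against the prior
    trans = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / sigma_model) ** 2)
    trans /= trans.sum(axis=0, keepdims=True) * dx      # columns integrate to 1
    predicted = trans @ prior * dx
    # Update (3.62): multiply by the likelihood p(y_k | x_k) and normalize
    likelihood = np.exp(-0.5 * ((y - grid) / sigma_obs) ** 2)
    posterior = likelihood * predicted
    posterior /= posterior.sum() * dx
    return posterior

grid = np.linspace(-2, 2, 401)
p = np.exp(-0.5 * (grid / 0.5) ** 2); p /= p.sum() * (grid[1] - grid[0])
for y in [0.3, 0.4, 0.35]:
    p = bayes_filter_step(p, grid, y)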
The methodology proposed to solve the problem of probabilistic inference, the
Bayesian optimal recursive solution, requires the propagation of probability density
function of the a posteriori state. This solution is general enough to deal with all forms
of a posteriori density, including the multimodality, asymmetries and discontinuity.
However since the solution does not place any restrictions on the form of a posteriori
density, in general it cannot be described by a finite number of parameters. Then each
estimator must be approximated as a function of the form of a posteriori density and
of the form of recursive Bayesian structure as has been defined previously.
A common mistake is to think the Kalman filter requires that the space in which
it operates is linear and that the probability density is Gaussian. The Kalman filter
does not require these conditions, but only the following assumptions.
1. Minimum-variance estimates of the random variables, and hence the distribution of the subsequent state variables, can be computed from the first and second moments alone (i.e. the mean and the variance), by propagating and updating them recursively.
2. The estimate (the updated measurement) is a linear function of the a priori knowledge of the system, synthesized by p(x_k|y_{1:k-1}), and of the new information p(y_k|x_k). In other words, we assume that Eq. (3.67) of the optimal Bayesian recursion can be approximated by a linear function.
3. Accurate predictions of the state variable (using the process model) and of the observations (using the forecast model) can be calculated to approximate the first and second moments of p(x_k|y_{1:k-1}) and p(y_k|x_k).
For the first assumption to be consistent, it is necessary that the estimate of the mean and of the covariance of the a posteriori state density, \hat{x}_k and P_{x_k}, satisfy the inequality
P_{x_k} - E[(x_k - \hat{x}_k)(x_k - \hat{x}_k)^T] \ge 0.
The optimal conditions can then be written as
\hat{x}_k^- = E[f(x_{k-1}, w_{k-1}, u_k)]  (3.72)
\hat{z}_k^- = E[h(x_k^-, v_k)]  (3.73)
K_k = E[(x_k - \hat{x}_k^-)(y_k - \hat{z}_k^-)^T]\;E[(y_k - \hat{z}_k^-)(y_k - \hat{z}_k^-)^T]^{-1}  (3.74)
= P_{x_k z_k}\,P_{\tilde{z}_k}^{-1},  (3.75)
with \hat{x}_k^- the optimal forecast (the a priori mean at time k) of x_k, corresponding to the expectation (taken over the a posteriori distribution of the state variable at time k - 1) of a nonlinear function of the random variables x_{k-1} and w_{k-1}. Similarly for the optimal forecast \hat{z}_k^-, except that the expectation is taken over the a priori distribution of the state variable at time k. The term K_k is expressed as a function of the expectation of a cross-correlation matrix (covariance matrix) of the state-variable forecast error.
Since the problem is to carefully calculate the expected mean and the covariance of a random variable, let us see what uncertainties arise when estimates of the future state of a system, or of measurements, are performed. If x is a random variable with mean \bar{x} and covariance P_{xx}, we can always find a second random variable y that has a nonlinear functional relationship with x of the type
y = f[x].  (3.76)
Now we want to calculate the statistics of y, i.e. the mean \bar{y} and the covariance P_{yy}. We need to determine the transformed density function and evaluate the statistics from this distribution. In the case of a linear function there exists an exact solution, but in the case of a nonlinear function we must find an approximate solution that is statistically significant. Ideally it should be efficient and unbiased. For the transformed statistics to be consistent, it is necessary that the following inequality holds:
P_{yy} - E[\{y - \bar{y}\}\{y - \bar{y}\}^T] \ge 0.  (3.77)
This condition is extremely important for the validity of the method. If the statistics are inconsistent, the value of P_{yy} is underestimated, the Kalman filter assigns too high a weight to the information and underestimates the covariance, and the filter then tends to diverge. It is therefore appropriate that the transformation be efficient, that is, that the left side of Eq. (3.77) be minimized, and that the estimate be unbiased, i.e. \bar{y} \approx E[y].
We develop a consistent, efficient and unbiased transformation by expanding Eq. (3.76) in a Taylor series around \bar{x}:
f[x] = f[\bar{x} + \delta x] = f[\bar{x}] + \frac{\partial f}{\partial x}\,\delta x + \cdots + \frac{1}{n!}\frac{\partial^n f}{\partial x^n}\,\delta x^n,  (3.78)
where \delta x is a zero-mean Gaussian variable with covariance P_{xx}, and \frac{\partial^n f}{\partial x^n}\,\delta x^n is the nth-order term of the multidimensional Taylor series. Taking the expectation, the transformed mean is:
\bar{y} = f[\bar{x}] + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,P_{xx} + \frac{1}{2}\frac{\partial^4 f}{\partial x^4}\,E[\delta x^4] + \cdots.  (3.79)
In other words, the nth-order term in the series for \delta x is a function of the nth moment of x multiplied by the nth derivative of f[\cdot] evaluated at x = \bar{x}. If the moments and the derivatives can be properly evaluated up to the nth order, the mean is correct up to that order. The same applies to the covariance, although the structure of each term is more complicated. Because the terms in the series are scaled by progressively smaller factors, the lowest orders have the biggest impact, so our forecast procedure should concentrate on evaluating the lower-order terms.
The linearization process assumes that the second-order and higher-order terms in \delta x can be neglected. Under this assumption,
\bar{y} = f[\bar{x}]  (3.81)
P_{yy} = \frac{\partial f}{\partial x}\,P_{xx}\,\left(\frac{\partial f}{\partial x}\right)^T.  (3.82)
By comparing these estimates with the expansions (3.79) and (3.80) we see that, for the mean, the first-order approximation is only valid when the terms of order higher than the second are insignificant while, for the covariance matrix, it is only valid when the terms of order higher than the fourth are insignificant.
The Kalman filter calculates the optimal conditions through relations (3.75), while the Extended Kalman filter linearizes the system around the current state using a first-order truncation of the multidimensional Taylor series. Broadly, it approximates the optimal conditions as:
\hat{x}_k^- = E[f(\hat{x}_{k-1}, \bar{w}, u_k)]  (3.83)
\hat{z}_k^- = E[h(\hat{x}_k^-, \bar{v})]  (3.84)
K_k = P^{lin}_{x_k z_k}\,(P^{lin}_{\tilde{z}_k})^{-1}.  (3.85)
The Sigma Point Kalman Filter instead represents the distribution of x by a small set of deterministically chosen points, the sigma points, with associated weights. Once we have these points and their weights, it is possible to calculate the mean \bar{y} and the covariance P_{yy} through the following process:
1. each sigma point is transformed through the function f[\cdot];
2. the mean is computed as the weighted average of the transformed points,
\bar{y} = \sum_{i=0}^{n} W^{(i)}\,\mathcal{Y}^{(i)};  (3.88)
3. the covariance is computed as the weighted outer product of the transformed points,
P_{yy} = \sum_{i=0}^{n} W^{(i)}\,\{\mathcal{Y}^{(i)} - \bar{y}\}\{\mathcal{Y}^{(i)} - \bar{y}\}^T.  (3.89)
A symmetric set of sigma points and weights can be chosen as
\mathcal{X}^{(0)} = \bar{x}, \qquad W^{(0)} = W^{(0)}
\mathcal{X}^{(i)} = \bar{x} + \left(\sqrt{\frac{N_x}{1 - W^{(0)}}\,P_{xx}}\right)_i, \qquad W^{(i)} = \frac{1 - W^{(0)}}{2N_x}
\mathcal{X}^{(i+N_x)} = \bar{x} - \left(\sqrt{\frac{N_x}{1 - W^{(0)}}\,P_{xx}}\right)_i, \qquad W^{(i+N_x)} = \frac{1 - W^{(0)}}{2N_x}, \qquad i = 1, \ldots, N_x,  (3.90)
where (\sqrt{\cdot})_i is the ith column or row of the square root of the matrix \frac{N_x}{1 - W^{(0)}}\,P_{xx}, i.e. the covariance matrix scaled by the number of dimensions, and W^{(i)} is the weight associated with the ith point. By convention W^{(0)} is the weight of the mean point, which is indexed as the zeroth point. Note that the weights are not necessarily positive, because they depend on the approach used to generate the sigma points.
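The sigma-point selection (3.90) and the weighted reconstruction (3.88)-(3.89) can be written in a few lines. In the sketch below the choice W^(0) = 1/3, the use of a Cholesky factor as matrix square root and the polar-to-Cartesian test function are all arbitrary illustrative assumptions.

import numpy as np

def unscented_transform(f, xbar, Pxx, W0=1/3):
    """Propagate mean and covariance through f using the sigma points of (3.90)."""
    Nx = xbar.size
    S = np.linalg.cholesky(Nx / (1.0 - W0) * Pxx)        # matrix square root
    X = [xbar] + [xbar + S[:, i] for i in range(Nx)] \
               + [xbar - S[:, i] for i in range(Nx)]
    W = np.array([W0] + [(1.0 - W0) / (2 * Nx)] * (2 * Nx))
    Y = np.array([f(x) for x in X])                      # transformed points
    ybar = W @ Y                                         # weighted mean, (3.88)
    Pyy = sum(w * np.outer(y - ybar, y - ybar) for w, y in zip(W, Y))  # (3.89)
    return ybar, Pyy

# Nonlinear polar-to-Cartesian transformation as a test
f = lambda x: np.array([x[0] * np.cos(x[1]), x[0] * np.sin(x[1])])
ybar, Pyy = unscented_transform(f, np.array([1.0, 0.5]), np.diag([0.02, 0.1]))
print(ybar, Pyy)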
Because the SPKF does not use the Jacobian of the system equations, it becomes particularly attractive for "black box" systems or for dynamic systems that cannot easily be linearized.
Because the mean and the covariance of x are captured accurately up to second order, the calculated mean of y is also correct up to second order. This indicates that the method is more accurate than the EKF, with the additional benefit that it is the distribution that has been approximated rather than the function f[\cdot], since the series expansion has not been truncated at some order. In this way the algorithm is able to incorporate information from the higher orders, giving greater accuracy.
The sigma points capture the same mean and covariance regardless of the choice of the matrix square root used. We can, for example, use more stable and efficient matrix decomposition methods, such as the Cholesky decomposition.
The mean and covariance are calculated using standard vector operations, which means that the algorithm is adaptable to any choice made for the process; therefore, compared to the EKF, it is not necessary to evaluate the Jacobian matrix.
The EKF has two problems: linearization can produce highly unstable filters if the assumption of local linearity is violated, and the derivation of the Jacobian matrix is not trivial for many applications and often leads to difficulties. In addition, because the linearization of the errors in the EKF introduces an error which is about 1.5 times the standard deviation of the measurement interval, the transformation is inconsistent. In practice, this inconsistency may be resolved by introducing an additional noise that stabilizes the transformation, but this induces a growth of the transformed covariance. This is a possible reason why EKFs are difficult to tune: we must introduce enough noise to make the linearization work. However, the stabilizing noise is an undesirable solution, since the estimate remains biased and there is no guarantee that the transformed estimate is consistent.
To overcome these drawbacks the UKF transformation is introduced. This is
a method for calculating the statistic of a random variable which is subject to a
nonlinear transformation. It is based on the assertion that it is easier to approximate
a Gaussian distribution than approximate an arbitrary nonlinear function.
A UKF transformation is based on two fundamental points:
1. It is easier to perform a nonlinear transformation of a single point than of an entire probability density function.
2. It is not too hard to find a set of individual points in state space whose sample probability density function approximates the actual probability density function of the state vector.
The UKF is closely linked to the sigma-point transformation. The formulation of the Unscented Kalman filter is the following.
1. Given the nonlinear system at n discrete time states, initialize the filter with
\hat{x}_0^a = E[x_0]
P_0^a = E[(x_0 - \hat{x}_0^a)(x_0 - \hat{x}_0^a)^T].  (3.92)
The following equations are used to perform the time update, which propagates the estimated state and covariance from one measurement time to the next.
1. In order to propagate from time (k - 1) to k, first select the sigma points \hat{x}_{k-1}^{(i)} as specified previously in the SPKF transformation, with the appropriate change because the current best hypothesis for the mean and the covariance of x_k are \hat{x}_{k-1}^a and P_{k-1}^a:
\hat{x}_{k-1}^{(i)} = \hat{x}_{k-1}^a + \tilde{x}^{(i)}, \quad i = 1, \ldots, 2n
\tilde{x}^{(i)} = \left(\sqrt{n P_{k-1}^a}\right)_i^T, \quad i = 1, \ldots, n
\tilde{x}^{(n+i)} = -\left(\sqrt{n P_{k-1}^a}\right)_i^T, \quad i = 1, \ldots, n.  (3.93)
2. Use the known nonlinear system equation f(\cdot) to transform the sigma points into the vectors \hat{x}_k^{(i)}, taking into account that the transformation here is f(\cdot) rather than the h(\cdot) of the SPKF description, so the appropriate changes have to be made:
\hat{x}_k^{(i)} = f(\hat{x}_{k-1}^{(i)}, u_{k-1}, w_{k-1}).  (3.94)
3. Combine the vectors \hat{x}_k^{(i)} to get the a priori state estimate at time k:
\hat{x}_k^f = \frac{1}{2n}\sum_{i=1}^{2n} \hat{x}_k^{(i)}.  (3.95)
4. Estimate the a priori error covariance, adding Q_{k-1} to account for the process noise:
P_k^f = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{x}_k^{(i)} - \hat{x}_k^f)(\hat{x}_k^{(i)} - \hat{x}_k^f)^T + Q_{k-1}.  (3.96)
Once the time update is done, we implement the measurement update equations.
1. Choose the sigma points as specified by the unscented transform, with the appropriate change because the current best hypothesis for the mean and the covariance of x_k are now \hat{x}_k^f and P_k^f:
\hat{x}_k^{(i)} = \hat{x}_k^f + \tilde{x}^{(i)}, \quad i = 1, \ldots, 2n
\tilde{x}^{(i)} = \left(\sqrt{n P_k^f}\right)_i^T, \quad i = 1, \ldots, n
\tilde{x}^{(n+i)} = -\left(\sqrt{n P_k^f}\right)_i^T, \quad i = 1, \ldots, n.  (3.97)
This step can be skipped by re-using the sigma points obtained from the time update. Use the known nonlinear observation equation h(\cdot) to transform the sigma vectors into \hat{y}_k^{(i)} according to the relation:
\hat{y}_k^{(i)} = h(\hat{x}_k^{(i)}, u_k, v_k).  (3.98)
2. Combine the vectors \hat{y}_k^{(i)} to obtain the predicted measurement at time k:
\hat{y}_k^f = \frac{1}{2n}\sum_{i=1}^{2n} \hat{y}_k^{(i)}.  (3.99)
3. Estimate the covariance of the predicted measurement, adding R_k to account for the measurement noise:
P_z = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{y}_k^{(i)} - \hat{y}_k^f)(\hat{y}_k^{(i)} - \hat{y}_k^f)^T + R_k.  (3.100)
4. Estimate the cross-covariance between \hat{x}_k^f and \hat{y}_k^f:
P_{xz} = \frac{1}{2n}\sum_{i=1}^{2n} (\hat{x}_k^{(i)} - \hat{x}_k^f)(\hat{y}_k^{(i)} - \hat{y}_k^f)^T.  (3.101)
5. Finally, the measurement update of the state estimate uses the Kalman filter equations:
K_k = P_{xz}\,P_z^{-1}
\hat{x}_k^a = \hat{x}_k^f + K_k\,(y_k - \hat{y}_k^f)
P_k^a = P_k^f - K_k P_z K_k^T.  (3.102)
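A minimal sketch of one UKF cycle following (3.93)-(3.102) is given below. It assumes additive noise handled only through Q and R, re-draws the sigma points for the measurement update (the optional step (3.97)), and uses a Cholesky factor as matrix square root; the tiny two-state test model at the end is invented for illustration.

import numpy as np

def ukf_step(xa, Pa, y, f, h, Q, R):
    """One UKF cycle with the symmetric 2n sigma-point set and weights 1/(2n)."""
    n = xa.size
    def sigma_points(m, P):
        S = np.linalg.cholesky(n * P)
        return np.vstack([m + S[:, i] for i in range(n)] +
                         [m - S[:, i] for i in range(n)])
    # Time update (3.93)-(3.96)
    X = sigma_points(xa, Pa)
    Xf = np.array([f(x) for x in X])
    xf = Xf.mean(axis=0)
    Pf = (Xf - xf).T @ (Xf - xf) / (2 * n) + Q
    # Measurement update (3.97)-(3.102)
    Xs = sigma_points(xf, Pf)
    Ys = np.array([h(x) for x in Xs])
    yf = Ys.mean(axis=0)
    Pz = (Ys - yf).T @ (Ys - yf) / (2 * n) + R
    Pxz = (Xs - xf).T @ (Ys - yf) / (2 * n)
    K = Pxz @ np.linalg.inv(Pz)
    return xf + K @ (y - yf), Pf - K @ Pz @ K.T

# Illustrative two-state model observed in its first component
f = lambda x: np.array([x[0] + 0.1 * x[1], 0.95 * x[1]])
h = lambda x: np.array([x[0]])
xa, Pa = ukf_step(np.zeros(2), np.eye(2), np.array([0.3]), f, h,
                  0.01 * np.eye(2), 0.1 * np.eye(1))
print(xa, Pa)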
We have assumed that the process and measurement equations are linear with respect to the noise, even though this is not generally true. In that case we must treat the state vector in a different way, using what Julier and Uhlmann [4] call the augmented state (+):
x_k^+ = \begin{bmatrix} x_k \\ w_k \\ v_k \end{bmatrix}.
In this case we need to estimate the augmented mean and the augmented covariance, as done by the previously defined algorithms, but with Q_{k-1} and R_k removed.
References
1. Kalman, R.E., Bucy, R.S.: New results in linear filtering and prediction theory. Trans. Am. Soc. Mech. Eng. J. Basic Eng. Ser. D 83, 95-108 (1961)
2. Moler, C., Van Loan, C.: Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 45, 3-49 (2003)
3. van der Merwe, R., Wan, E.A., Julier, S.J.: Sigma-point Kalman filters for nonlinear estimation and sensor-fusion: applications to integrated navigation. In: Proceedings of the AIAA Guidance, Navigation, and Control Conference, Providence, RI (2004)
4. Julier, S., Uhlmann, J.: Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401-422 (2004)
Chapter 4
Advanced Data Assimilation Methods
Abstract The computation of the model and of its error requires an accurate selection of the methods to be used. When the algorithm adopted for a specific nonlinear system diverges, or the approximation order chosen to better handle the nonlinearity of the model fails, we need to improve the performance of the method. This chapter deals with how to overcome these problems by means of recursive Bayesian estimation and the most advanced filters, such as the stochastic ensemble Kalman filters.
As we have seen in the previous chapters, accurate numerical prediction requires accurate initial conditions. However, the computation of the model and of its error requires an accurate selection of the methods to be used. For example, the Extended Kalman Filter applied to specific nonlinear systems diverges, because it is not able to gather enough information on the trajectory of the system states and because the error propagation is approximated by the tangent linear model in the analysis steps. The problem can be partially overcome by using a higher-order approximation in order to better handle the nonlinearity of the model, as happens for instance in the Sigma Point Kalman filter (SPKF) and the filters derived from it. However, beyond the tangent linear model there is also the requirement of a Hessian, which is expensive in terms of computer time.
Then, in order to improve the performance of the method for specific nonlinear systems, we need to take into account all the statistical moments of the state distribution, because of their impact as they propagate within the system.
If we want to work with all the statistical moments, we must not limit ourselves to the expectations and covariances of the distributions of the variables. The problem then needs to be reformulated in terms of the probability density function, or pdf, of the state of the system.
This approach can be addressed using Bayesian statistics and stochastic filtering. Before demonstrating that the Kalman filter can be derived in the framework of Bayesian statistics, we must introduce recursive Bayesian estimation.
The stochastic filtering problem can be described as follows: given the initial density p(x_0), the transition density p(x_k|x_{k-1}) and the likelihood p(y_k|x_k), the objective of the filtering is to estimate the optimal current state at time k given the observations up to time k, that is, to estimate the a posteriori density p(x_k|y_{0:k}) or p(x_{0:k}|y_{0:k}).
Although the posterior density provides a complete solution of the filtering problem, the problem is still complex, since the density is a function rather than a finite-dimensional point estimate. Because several physical systems do not have finite dimension, the infinite-dimensional system can only be approximately modeled through a finite-dimensional filter, i.e. the filter can only be suboptimal.
In order to derive the recursive Bayesian filter, two assumptions are made:
1. the states follow a first-order Markov process, p(x_k|x_{0:k-1}) = p(x_k|x_{k-1});
2. the observations are conditionally independent given the states. For simplicity, we denote by Y_l = y_{0:l} := \{y_0, \ldots, y_l\} the set of observations available up to some time t_l, while Y_k = y_{0:k} := \{y_0, \ldots, y_k\} is the set used in the filtering problem for t_k, k = 1, 2, 3, \ldots
If the conditional pdf of x_k is denoted by p(x_k|Y_k), the Bayesian rule provides its recursive update. The aim of Bayesian filtering is to apply Bayesian statistics and the Bayes rule to probabilistic inference problems, and in particular to the stochastic filtering problem. Bayesian filtering is optimal in the sense that it seeks the posterior distribution which integrates and uses all the available information expressed by probabilities (assuming they are quantitatively correct). The optimality can be measured by one of the following criteria.
1. The minimum mean-squared error (MMSE), defined in terms of the prediction or filtering error (or, equivalently, the trace of the state-error covariance),
E[\|x_k - \hat{x}_k\|^2\,|\,y_{0:k}] = \int \|x_k - \hat{x}_k\|^2\,p(x_k|y_{0:k})\,dx_k,  (4.4)
where the conditional mean is \hat{x}_k = E[x_k|y_{0:k}] = \int x_k\,p(x_k|y_{0:k})\,dx_k.
2. The maximum a posteriori (MAP) criterion, which aims to find the mode of the posterior probability p(x_k|y_{0:k}); this is equivalent to minimizing a cost function.
3. The maximum likelihood (ML), which reduces to a special case of MAP in which the prior is neglected.
4. The minimax criterion, which is to find the median of the posterior p(x_k|y_{0:k}).
When the mode and the mean of the distribution coincide, the MAP estimate is correct. However, for multimodal distributions, the MAP estimate can be arbitrarily bad. MMSE requires full knowledge of the prior, likelihood and evidence, while MAP methods require the estimation of the posterior distribution (density) but do not require the calculation of the denominator, and are thereby computationally inexpensive. Note, however, that the MAP estimate has a drawback, especially in a high-dimensional space.
x_{k+1} = f(k, x_k, u_k, w_k),  (4.5)
y_k = h(k, x_k, u_k, v_k),  (4.6)
where Eqs. (4.5) and (4.6) are called the state equation and the measurement equation, respectively; x_k represents the state vector, y_k the measurement vector, and u_k the system input vector (the driving force) in a controlled environment. f and h are two vector-valued functions, which are potentially time-varying; w_k and v_k represent the process (dynamical) noise and the measurement noise, respectively, with appropriate dimensions. Since the extension to a driven system is straightforward, the driving force, associated with the stochastic control problem, is not considered in our approach.
In the linear case the system reduces to
x_k^t = F_{k-1,k}\,x_{k-1}^t + w_{k-1}^t,  (4.7)
y_k = H_k\,x_k^t + v_k,  (4.8)
where the true state x_k^t that we want to estimate has a probability density function p(x_k^t). Although this probability density function is not available, it is the complete solution of the prediction problem.
Given the set Y_l^o of realizations of all observations available up to the time t_l, the conditional probability density function p(x_k^t|Y_l^o) yields the solution of the filtering problem at time t_k, k = 1, 2, 3, \ldots. The probability density function p(x_k^t|Y_l^o), for fixed l and t_k, k = l+1, l+2, l+3, \ldots, gives the solution of the prediction problem beyond time t_l.
However, in several problems it is not possible to compute these densities because, unlike the unconditional densities, the conditional densities are random functions, since they depend on the observations. Thus, instead of computing the densities explicitly, we can use the conditional mean \hat{x}_{k|k-1} = E[p(x_k^t|Y_k^o)], which is a random n-vector since it depends on the realizations of the observations.
The filtering solution is given by the Kalman filter, in which the sufficient statistics of the mean and of the state-error correlation matrix are calculated and propagated. In Eqs. (4.7) and (4.8), F_{k-1,k} and H_k are called the transition matrix and the measurement matrix, respectively. The Kalman filter is also optimal in the sense that it is unbiased, E[\hat{x}_k] = E[x_k], and is a minimum variance estimate. In practice, however, we are mostly concerned with discrete-time filtering, because a continuous-time dynamic system can always be converted into a discrete-time system by sampling the outputs and using zero-order holds on the inputs. Hence the derivative is replaced by a difference and the operator becomes a matrix. w_k and v_k can be viewed as white noise random sequences with unknown statistics in the discrete-time domain. The state equation characterizes the state transition probability p(x_{k+1}|x_k), whereas the measurement equation describes the probability p(y_k|x_k), which is related to the measurement noise model. For simplicity we assume the dynamic and measurement noises are both Gaussian distributed with zero mean and constant covariance. The derivation of the Kalman filter in the linear Gaussian scenario is based on the following assumptions:
w_k^t is Gaussian with zero mean and white in time, i.e. w_k^t \sim N(0, Q_k);  (4.10)
v_k is Gaussian with zero mean and white in time, i.e. v_k \sim N(0, R_k);  (4.11)
Suppose we are given the conditional density p(x_{k-1}^t|Y_{k-1}^o); the objective is to calculate p(x_k^t|Y_k^o).
Denote the mean and covariance matrix of the density p(x_{k-1}^t|Y_{k-1}^o) respectively by
x_{k-1}^a \equiv E[p(x_{k-1}^t|Y_{k-1}^o)]  (4.16)
and
P_{k-1}^a \equiv E[(x_{k-1}^t - x_{k-1}^a)(x_{k-1}^t - x_{k-1}^a)^T\,|\,Y_{k-1}^o].  (4.17)
x_{k-1}^a is the analysis at time t_{k-1}, an n-vector, the expected value of the true state x_{k-1}^t conditioned on all observations available up to and including that time, while P_{k-1}^a is the analysis error covariance matrix, an n \times n matrix, at time t_{k-1}. At time t_0 there are no observations, so it follows that p(x_0^t) is a Gaussian density with mean x_0^a \equiv \bar{x}_0^t and covariance matrix P_0^a \equiv P_0. We will see that if p(x_{k-1}^t|Y_{k-1}^o) is Gaussian then so is p(x_k^t|Y_k^o), so by induction it will follow that all the densities remain Gaussian. The forecast mean and its error covariance are defined analogously by
x_k^f \equiv E[p(x_k^t|Y_{k-1}^o)]  (4.18)
and
P_k^f \equiv E[(x_k^t - x_k^f)(x_k^t - x_k^f)^T\,|\,Y_{k-1}^o].  (4.19)
x_k^f is the forecast at the new time t_k, an n-vector, the expected value of the true state x_k^t conditioned on all observations available up to and including time t_{k-1}, while P_k^f is the forecast error covariance matrix, an n \times n matrix, at time t_k.
Substituting (4.7) into (4.18) we obtain:
x_k^f = F_{k-1}\,E[x_{k-1}^t|Y_{k-1}^o] + E[w_{k-1}^t|Y_{k-1}^o],  (4.20)
where the second term vanishes under the assumption (4.10) that the model noise has zero mean. Thus we have
x_k^f = F_{k-1}\,x_{k-1}^a,  (4.21)
which is indeed a forecast to time t_k from the analysis at time t_{k-1} via the linear propagator F_{k-1}.
Substituting (4.7) and (4.21) into (4.19) we obtain:
P_k^f = E[\{F_{k-1}(x_{k-1}^t - x_{k-1}^a) + w_{k-1}^t\}\{F_{k-1}(x_{k-1}^t - x_{k-1}^a) + w_{k-1}^t\}^T\,|\,Y_{k-1}^o],  (4.22)
and, since the model noise and the analysis error are uncorrelated random vectors, the cross terms vanish and
P_k^f = F_{k-1}\,P_{k-1}^a\,F_{k-1}^T + Q_{k-1}.  (4.23)
More complex is the analysis step, where we need to make the Gaussian assumption. Let us, however, apply the recursive Bayesian estimation underlying the principle of sequential Bayesian filtering. Under the assumptions that the states follow a first-order Markov process, p(x_k^t|x_{0:k-1}) = p(x_k^t|x_{k-1}), and that the observations are conditionally independent given the states, we can write:
p(x_k^t|Y_k^o) = \frac{p(y_k, Y_{k-1}^o|x_k^t)\,p(x_k^t)}{p(y_k, Y_{k-1}^o)}
= \frac{p(y_k|Y_{k-1}^o, x_k^t)\,p(Y_{k-1}^o|x_k^t)\,p(x_k^t)}{p(y_k|Y_{k-1}^o)\,p(Y_{k-1}^o)}
= \frac{p(y_k|Y_{k-1}^o, x_k^t)\,p(x_k^t|Y_{k-1}^o)\,p(Y_{k-1}^o)\,p(x_k^t)}{p(y_k|Y_{k-1}^o)\,p(Y_{k-1}^o)\,p(x_k^t)}
= \frac{p(y_k|x_k^t)\,p(x_k^t|Y_{k-1}^o)}{p(y_k|Y_{k-1}^o)}.  (4.24)
We need to evaluate the a posteriori density from the right side of (4.24), which is described by:
1. the a priori pdf p(x_k^t|Y_{k-1}^o), which encodes the knowledge of the model,
p(x_k^t|Y_{k-1}^o) = \int p(x_k^t|x_{k-1})\,p(x_{k-1}|Y_{k-1}^o)\,dx_{k-1};  (4.25)
2. the likelihood p(y_k|x_k^t), determined by the observation model;
3. the evidence p(y_k|Y_{k-1}^o), which normalizes the posterior.
When the mode and the mean of the distribution coincide, the MAP estimation is correct; however, for multimodal distributions, the MAP estimate can be arbitrarily bad. Both MMSE and MAP methods require the estimation of the posterior distribution (density), but MAP does not require the calculation of the denominator (integration) and is thereby computationally less expensive, whereas the former requires full knowledge of the prior, likelihood and evidence. Note, however, that the MAP estimate has a drawback, especially in a high-dimensional space.
Let x_k^{MAP} denote the MAP estimate of x_k^t that maximizes p(x_k^t|Y_k^o), or equivalently \log p(x_k^t|Y_k^o). By using the Bayesian rule, we may express p(x_k^t|Y_k^o) as
p(x_k^t|Y_k^o) = \frac{p(x_k^t, Y_k^o)}{p(Y_k^o)} = \frac{p(x_k^t, y_k, Y_{k-1}^o)}{p(y_k, Y_{k-1}^o)},  (4.27)
with the numerator factorized as
p(x_k^t, y_k, Y_{k-1}^o) = p(y_k|x_k^t, Y_{k-1}^o)\,p(x_k^t, Y_{k-1}^o) = p(y_k|x_k^t, Y_{k-1}^o)\,p(x_k^t|Y_{k-1}^o)\,p(Y_{k-1}^o),
which shares the same form as (4.24). Under the Gaussian assumption on the process and measurement noise, the mean and covariance of p(y_k|x_k^t) are calculated as
E[y_k|x_k^t] = H_k\,x_k^t
and
E[(y_k - E[y_k|x_k^t])(y_k - E[y_k|x_k^t])^T\,|\,x_k^t] = E[v_k v_k^T] = R_k.  (4.31)
Accordingly,
p(y_k|x_k^t) = A_1 \exp\{-\tfrac{1}{2}(y_k - H_k x_k^t)^T R_k^{-1}(y_k - H_k x_k^t)\},  (4.32)
p(x_k^t|Y_{k-1}^o) = A_2 \exp\{-\tfrac{1}{2}(x_k^t - x_k^f)^T (P_k^f)^{-1}(x_k^t - x_k^f)\},  (4.33)
p(x_k^t|Y_k^o) \propto A \exp\{-\tfrac{1}{2}(y_k - H_k x_k^t)^T R_k^{-1}(y_k - H_k x_k^t) - \tfrac{1}{2}(x_k^t - x_k^f)^T (P_k^f)^{-1}(x_k^t - x_k^f)\}.  (4.34)
Maximizing (4.34) with respect to x_k^t gives the MAP estimate
x_k^{MAP} = x_k^f + K_k\,(y_k - H_k x_k^f),  (4.37)
whose error is
x_k^t - x_k^{MAP} = x_k^t - x_k^f - K_k\,(y_k - H_k x_k^f) = (I - K_k H_k)(x_k^t - x_k^f) - K_k v_k,  (4.40)
and, taking into account the assumptions (4.15), the covariance matrix (4.39) is
P_k^a = (I - K_k H_k)\,P_k^f\,(I - K_k H_k)^T + K_k R_k K_k^T,  (4.41)
which, with the optimal gain, reduces to
P_k^a = (I - K_k H_k)\,P_k^f.  (4.42)
Thus far, the Kalman filter has been completely derived from the MAP principle.
1 For (A - BD^{-1}C)^{-1}, the result follows from the matrix inversion lemma, or Sherman-Morrison-Woodbury formula.
The same procedure can be extended to maximum likelihood by maximizing the log-likelihood
\log p(x_k|Y_k) = \log p(x_k, Y_k) - \log p(Y_k),  (4.43)
from which we obtain the same solution as for the Kalman filter.
We have seen that the Kalman filter can be derived within the Bayesian framework. By admitting a state-space formulation, the Kalman filter assumes that the signal process (i.e. the state) can be regarded as a linear stochastic dynamical system driven by white noise; the optimal filter then has a stochastic differential structure which makes recursive estimation possible. The Kalman filter is an unbiased minimum-variance estimator under Linear Quadratic Gaussian control circumstances. When the Gaussian assumption on the noise is violated, the Kalman filter is still optimal in a mean-square sense, but the estimate no longer coincides with the conditional mean (i.e. it is biased), nor is it of minimum variance. The Kalman filter is not robust because of the underlying assumption on the noise density model.
In order to approximate the optimal estimation methods for linear problems with Gaussian statistics, Monte Carlo methods have been used to estimate the initial condition for forecasting (Evensen and van Leeuwen [1], Houtekamer and Mitchell [2]).
Consider the continuous stochastic model
dx_k = f(k, x_k)\,dk + \sigma(k, x_k)\,dw_k,  (4.46)
where f(k, x_k) is often called the nonlinear drift and \sigma(k, x_k) the diffusion coefficient. Again, the noise processes \{w_k, v_k, k \ge 0\} are two Brownian or Wiener processes. First, let us look at the state diffusion equation. For all t \ge 0, we define a backward diffusion operator L_k, a partial differential operator, as:
L_k = \sum_{i=1}^{N_x} f_k^i \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{N_x} a_k^{ij}\frac{\partial^2}{\partial x_i \partial x_j},  (4.48)
where a_k^{ij} = \sigma_i(k, x_k)\,\sigma_j(k, x_k). The operator L_k corresponds to the infinitesimal generator of the diffusion process \{x_k, k \ge 0\}. The goal now is to deduce conditions under which one can find a recursive and finite-dimensional (closed form) scheme to compute the conditional probability distribution p(x_k|Y_k), given the filtration Y_k produced by the observation process (4.6).
Let us define an innovations process, a white Gaussian noise process given by
e_k = y_k - \int_0^k E[h(s, x_s)|y_{0:s}]\,ds,  (4.49)
and the operator
L_t = \sum_{i=1}^{N_x} f_t^i \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{N_x} a_k^{ij}\frac{\partial^2}{\partial x_i \partial x_j}.  (4.51)
The time evolution of the pdf is governed by the Fokker-Planck-Kolmogorov (FPK) equation
\frac{\partial p(x_k)}{\partial k} = L_k\,p(x_k),  (4.52)
which is the counterpart of the SDE (4.46).
The FPK equation can be interpreted as follows: the first term is the equation of motion for a cloud of particles whose distribution is p(x_k), each point of which obeys the equation of motion \frac{dx_k}{dk} = f(x_k, k). The second term describes the disturbance due to Brownian motion or, as it is also called, Wiener noise.
Equation (4.52) can be solved exactly by Fourier transform. By inverting the Fourier transform, we obtain
p(x, k + \Delta k\,|\,x_0, k) = \frac{1}{\sqrt{2\pi\,\sigma_0^2\,\Delta k}}\exp\left\{-\frac{(x - x_0 - f(x_0)\Delta k)^2}{2\sigma_0^2\,\Delta k}\right\}.  (4.53)
Equation (4.52) can be considered the fundamental equation for the time evolution of the error statistics. It describes the change of the pdf in a local volume, which depends on the divergence term describing a probability flux into the local volume. The diffusion term tends to flatten the pdf because of the effect of stochastic model errors, that is, the probability decreases and the error increases. When, as in the case of (4.53), we are able to solve for the pdf, we are also able to calculate the mean and error covariance of the model forecast at different time levels. This is the case of the Kalman filter, where the pdf is fully characterized by its mean and covariance. However, when we have a nonlinear model, the time evolution of the mean and covariance alone is not able to characterize the pdf, and the solution is to solve the FPK equation by using Monte Carlo methods. If the pdf is represented by a large ensemble of model states, it is possible to integrate each member of the ensemble forward in time using the stochastic model (4.46), which is equivalent to solving the FPK equation with Monte Carlo.
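A minimal illustration of this Monte Carlo view of the FPK equation: each ensemble member is integrated with a stochastic model of the form (4.46) using an Euler-Maruyama step, and the forecast statistics are taken from the ensemble. The drift, the diffusion coefficient and the ensemble size below are invented for the example.

import numpy as np

def ensemble_forecast(x0_mean, x0_std, n_members=1000, n_steps=100, dt=0.01,
                      drift=lambda x: -x, diffusion=0.3, seed=0):
    """Integrate an ensemble through dx = f(x) dt + sigma dW (Euler-Maruyama)
    and return the time evolution of the ensemble mean and variance."""
    rng = np.random.default_rng(seed)
    x = x0_mean + x0_std * rng.standard_normal(n_members)
    means, variances = [x.mean()], [x.var(ddof=1)]
    for _ in range(n_steps):
        x = x + drift(x) * dt + diffusion * np.sqrt(dt) * rng.standard_normal(n_members)
        means.append(x.mean())
        variances.append(x.var(ddof=1))
    return np.array(means), np.array(variances)

m, v = ensemble_forecast(x0_mean=1.0, x0_std=0.2)
print(m[-1], v[-1])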
The standard approach is first to calculate the best-guess initial condition on the basis of the data and their statistics. The ensemble of initial states is then generated with mean equal to the best initial condition and variance specified from the known uncertainties of the first-guess initial state. The covariance should reflect the true scales of the system. In order to provide a realistic increase in the ensemble variance, an estimate of the model error variance should be included to give the time evolution of the external error in the estimates.
The Ensemble Kalman filter (EnKF) was introduced by Evensen [3, 4] as an alternative to the Extended Kalman filter (EKF) (see the Kalman chapter) and is a Monte Carlo approximation of the traditional Kalman Filter. The EnKF is a sequential data assimilation method in which the error statistics are predicted by solving the Fokker-Planck (forward Kolmogorov) equation with Monte Carlo or ensemble integration, by which it is possible to compute statistical moments such as the mean and the error covariance. With respect to the basic KF, the gain of the update equation is calculated from the error covariance provided by the ensemble of model states. The ensemble mean is the best estimate, and the spread of the ensemble around the mean measures its uncertainty. If we assume we have N model states in the ensemble, each of dimension n, each of them can be represented as a single point in an n-dimensional state space. In the state space the ensemble members constitute a cloud of points that is described by a probability density function of the type f(x) = dn/n, where dn is the number of points in a small unit volume and n is the total number of points. When we know the ensemble representing f(x), we can calculate the moments of the statistics.
Thus the Ensemble Kalman Filter seeks to mimic the Kalman Filter, but with an ensemble of limited size rather than with the full error covariance matrix. The aim of the EnKF is to perform an analysis for each member of the ensemble,
x_i^a = x_i^f + K_{EnKF}\,(y_i - H(x_i^f)),  (4.54)
where i = 1, 2, 3, \ldots, n is the index indicating the member of the ensemble and x_i^f is the state vector describing the ith member forecast at the analysis time. K_{EnKF} is the Kalman gain computed from the ensemble statistics,
K_{EnKF} = P^f H^T (H P^f H^T + R)^{-1},  (4.55)
where R is the given observational error covariance matrix. Following Whitaker and Hamill [5], the forecast error covariance matrix is estimated from the ensemble,
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T,  (4.56)
with
\bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f.  (4.57)
Note that wherever an overbar is used in the context of a covariance estimate, a factor of n - 1 instead of n appears in the denominator, which makes the estimate unbiased.
Thanks to (4.54) we obtain a posterior ensemble \{x_i^a\}, i = 1, 2, 3, \ldots, n, from which we can compute the posterior statistics. Thus the analysis is computed as the mean of the posterior ensemble:
\bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a.  (4.58)
If all members are updated with the same observations, y_i \equiv y, then for the deviations of the ensemble members from the mean, which represent the ensemble anomalies e_i^a = x_i^a - \bar{x}^a, relation (4.54) implies:
e_i^a = e_i^f + K_{EnKF}\,(0 - H e_i^f) = (I_n - K_{EnKF} H)\,e_i^f,  (4.59)
P^a = \frac{1}{n-1}\sum_{i=1}^{n} e_i^a (e_i^a)^T = (I_n - K_{EnKF} H)\,P^f\,(I_n - K_{EnKF} H)^T,  (4.60)
where K_{EnKF} \approx K. Comparing this equation with (4.41) shows that the second term is missing, and this underestimation can lead to the divergence of the EnKF.
A solution is to perturb the observation vector for each member: y_i = y + u_i, where u_i is drawn from a Gaussian distribution u_i \sim N(0, R). Let us now define an empirical error covariance matrix
R_u = \frac{1}{n-1}\sum_{i=1}^{n} u_i u_i^T,  (4.61)
where, in order to avoid biases, \sum_{i=1}^{n} u_i = 0; R_u \to R in the asymptotic limit n \to \infty.
With the introduction of this perturbation we modify the anomalies accordingly,
e_i^a = e_i^f + K_u^{EnKF}\,(e_i^o - H e_i^f),  (4.62)
P^a = (I_n - K_u^{EnKF} H)\,P^f\,(I_n - K_u^{EnKF} H)^T + K_u^{EnKF} R_u (K_u^{EnKF})^T = (I_n - K_u^{EnKF} H)\,P^f.  (4.63)
In the forecast step the updated ensemble obtained at the analysis evolves in time and is propagated according to a model,
x_i^f = M(x_i^a) \quad \text{for } i = 1, 2, \ldots, n,  (4.64)
where M is the model operator. The forecast estimate is the mean of the forecast ensemble, \bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f, while the forecast error covariance matrix is
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T.  (4.65)
The most important difference between ensemble Kalman filtering and the other methods is that it only quantifies the uncertainty in the space spanned by the ensemble. This can be a severe limitation if computational resources restrict the number of ensemble members n to be much smaller than the number of model variables m. On the other hand, it also means that the analysis can be performed in a much lower-dimensional space (n versus m). Thus, ensemble Kalman filtering has the potential to be more computationally efficient than the other methods.
1. Initialization
Initial system state x_0^f and initial covariance matrix P_0^f.
2. For t_k = 1, 2, \ldots
a. Observation
Draw a statistically consistent observation set for i = 1, 2, \ldots, n:
y_i = y + u_i, \qquad \sum_{i=1}^{n} u_i = 0,  (4.66)
R_u = \frac{1}{n-1}\sum_{i=1}^{n} u_i u_i^T.  (4.67)
b. Analysis
Compute the gain
K_u = P^f H^T (H P^f H^T + R_u)^{-1},  (4.68)
the analysis members
x_i^a = x_i^f + K_u\,(y_i - H(x_i^f)),  (4.69)
their mean \bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a, and the analysis covariance
P^a = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^a - \bar{x}^a)(x_i^a - \bar{x}^a)^T.  (4.71)
c. Forecast
Compute the ensemble forecast for i = 1, 2, \ldots, n, x_i^f = M(x_i^a), its mean
\bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} x_i^f,  (4.72)
and the forecast covariance
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T.  (4.73)
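Putting the observation perturbation, analysis and forecast steps (4.66)-(4.69) together, a stochastic EnKF analysis can be sketched as follows; the toy two-variable ensemble, the observation operator H and the error statistics are placeholders invented for the example.

import numpy as np

def enkf_analysis(Xf, y, H, R, rng):
    """Stochastic EnKF analysis with perturbed observations, Eqs. (4.66)-(4.69).
    Xf : forecast ensemble, shape (n_members, n_state)."""
    n, _ = Xf.shape
    xf = Xf.mean(axis=0)
    A = Xf - xf                                     # ensemble anomalies
    Pf = A.T @ A / (n - 1)                          # forecast covariance (4.65)
    U = rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    U -= U.mean(axis=0)                             # enforce zero-mean perturbations
    Ru = U.T @ U / (n - 1)                          # empirical obs covariance (4.67)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + Ru)  # gain (4.68)
    Ypert = y + U                                   # perturbed observations (4.66)
    return Xf + (Ypert - Xf @ H.T) @ K.T            # member-by-member update (4.69)

# Toy two-variable system observed in its first component
rng = np.random.default_rng(1)
Xf = rng.standard_normal((20, 2))
H = np.array([[1.0, 0.0]])
Xa = enkf_analysis(Xf, y=np.array([0.5]), H=H, R=0.1 * np.eye(1), rng=rng)
print(Xa.mean(axis=0))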
Let us now define a revised EnKF in which we eliminate the need to perturb the observations. There are several variants of the deterministic EnKF (Whitaker and Hamill [5], Bishop et al. [6], Hunt et al. [7]). We report here the one developed by Bishop et al. [6] and Hunt et al. [7], which can be defined as the Ensemble Transform Kalman Filter (ETKF), one of the variants of the Ensemble Square Root Kalman Filter (EnSRKF). The other approaches (EnSRKF and LETKF) are presented in the applications chapter, where the Particle filter (Monte Carlo) method applied to a highly non-Gaussian distribution and a nonlinear operator is also presented.
The entire approach works in the ensemble space rather than in the state or observation space. We start with an ensemble \{[X_i^a]_{k-1}, i = 1, 2, \ldots, n\} of model state vectors at time t_{k-1}. Rather than letting one of the ensemble members represent the best estimate of the system state, one assumes the ensemble to be chosen so that its average represents the analysis state estimate. We evolve each ensemble member according to the nonlinear model to obtain a forecast ensemble \{[X_i^f]_k, i = 1, 2, \ldots, n\} at time t_k:
[X_i^f]_k = M_{k-1,k}([X_i^a]_{k-1}).  (4.74)
Let l be the number of scalar observations used in the analysis. Starting from the sample mean \bar{x}^f = \frac{1}{n}\sum_{i=1}^{n} X_i^f, the forecast error covariance matrix (4.65) can be written as
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (X_i^f - \bar{x}^f)(X_i^f - \bar{x}^f)^T = X^f (X^f)^T,  (4.75)
where the ith column of the anomaly matrix X^f is (X_i^f - \bar{x}^f)/\sqrt{n-1}.
The analysis determines not only an estimate and a covariance but also an ensemble \{(x_i^a)_k, i = 1, 2, \ldots, n\} with the appropriate sample mean and covariance,
\bar{x}^a = \frac{1}{n}\sum_{i=1}^{n} x_i^a  (4.77)
and
P^a = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^a - \bar{x}^a)(x_i^a - \bar{x}^a)^T = X^a (X^a)^T,  (4.78)
where
[X^a]_i = \frac{x_i^a - \bar{x}^a}{\sqrt{n-1}}.  (4.79)
In order to describe the transformation of a forecast ensemble \{[X_i^f]_k, i = 1, 2, \ldots, n\} into an appropriate analysis ensemble \{[X_i^a]_k, i = 1, 2, \ldots, n\}, we assume that the number of ensemble members n is smaller than both the number of model variables m and the number of observations l, even when localization has reduced the effective values of m and l considerably compared to a global analysis. We assume that the choice of the observations to use for the local analysis has already been made, and consider y, H and R to be truncated to these observations; as such, correlations between errors in the chosen observations and errors in the other observations are ignored. Most of the analysis will take place in an n-dimensional space, with as few operations as possible in the model and observation spaces.
Recall relation (4.45), where we want to minimize the cost function J and where the forecast covariance matrix has rank at most n - 1 and is therefore not invertible. Nevertheless, its inverse is well-defined on the space S spanned by the background ensemble perturbations, that is, by the columns of X^f. Thus J is also well-defined in S, and the minimization can be carried out in this subspace. The reduced dimensionality is an advantage from the point of view of efficiency, though the restriction of the analysis mean to S is sure to be detrimental if n is too small. A natural approach is to use the singular vectors of X^f (the eigenvectors of P^f) to form a basis for S. In order to perform the analysis in S, we must choose an appropriate coordinate system, even though a conceptual difficulty of this approach is that the sum of these columns is zero, so they are not linearly independent. We could assume the first n - 1 columns to be independent and use them as a basis, but this assumption is unnecessary and clutters the resulting equations. Instead, we regard X^f as a linear transformation from an n-dimensional space \tilde{S} onto S, and perform the analysis in \tilde{S}. Let d denote a vector in \tilde{S}; then X^f d belongs to the space S spanned by the background ensemble perturbations, and
x = \bar{x}^f + X^f d,  (4.80)
P^f = \frac{1}{n-1}\sum_{i=1}^{n} (x_i^f - \bar{x}^f)(x_i^f - \bar{x}^f)^T = X^f (X^f)^T.  (4.81)
If the observation operator is linear we can also use the notation Y^f = H X^f. If the operator is not linear, we consider Y^f to be the matrix of the observation anomalies:

[Y^f]_i = (H(x_i^f) − ȳ^f)/√(n−1),    (4.82)

with ȳ^f = (1/n) Σ_{i=1}^n H(x_i^f).
With respect to the stochastic EnKF, one performs a single analysis rather than an analysis for each member of the ensemble. Adapting the mean analysis of the stochastic EnKF we have:

x̄^a = x̄^f + K (y − H x̄^f),    (4.83)

x̄^a = x̄^f + X^f d^a.    (4.84)

The gain is then computed in the ensemble space rather than in the observation space, which means we are reformulating the stochastic EnKF.
In order to understand the difference between the deterministic EnKF and the stochastic EnKF we can generate the posterior ensemble. Remembering relation (4.63), we want to factorize P^a = X^a (X^a)^T; we are therefore looking for a square root matrix such that X^a (X^a)^T = P^a. One such matrix is

X^a = X^f (I_n − (Y^f)^T (Y^f (Y^f)^T + R)^{−1} Y^f)^{1/2}.    (4.89)

This is the a posteriori ensemble of anomalies for the deterministic EnKF. Defining T = (I_n + (Y^f)^T R^{−1} Y^f)^{−1/2}, we can build the posterior ensemble as

x_i^a = x̄^a + √(n−1) X^f [T]_i = x̄^f + X^f (d^a + √(n−1) [T]_i),    (4.91)

where 1 = [1, 1, . . . , 1]^T is the vector that yields the same mean state, since X^f 1 = 0.
1. Initialization
   Ensemble of state vectors E_0 = {x_0^f, . . . , x_n^f}
2. For t_k = 1, 2, . . .
   a. Analysis
      Compute the forecast mean, the ensemble anomalies and the observation anomalies:

      x̄_k^f = (1/n) E_k^f 1

      X_k^f = (1/√(n−1)) (E_k^f − x̄_k^f 1^T)

      Y_k^f = (1/√(n−1)) (H_k(E_k^f) − H_k(x̄_k^f) 1^T)

      and update the ensemble (with d_k^a and T_k as defined above) as

      E_k^a = x̄_k^f 1^T + X_k^f (d_k^a 1^T + √(n−1) T_k).    (4.95)

   b. Forecast
      Forecast ensemble

      E_{k+1}^f = M_{k+1}(E_k^a).    (4.96)
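The analysis step above can be sketched in a few lines of Python. The sketch assumes a linear observation operator H and a symmetric square root for T; the function name and the way the increment d^a is computed from T^2 = (I_n + (Y^f)^T R^{-1} Y^f)^{-1} are illustrative choices consistent with Eqs. (4.83)–(4.95), not a reference implementation.

```python
import numpy as np

def etkf_analysis(E_f, y, H, R):
    """One deterministic (ETKF-style) analysis step; E_f has one member per column."""
    m, n = E_f.shape
    x_f = E_f.mean(axis=1)
    X_f = (E_f - x_f[:, None]) / np.sqrt(n - 1)    # scaled forecast anomalies
    Y_f = H @ X_f                                  # observation anomalies (linear H assumed)
    Rinv = np.linalg.inv(R)

    # ensemble-space analysis covariance T^2 = (I + Y^T R^-1 Y)^-1 and increment d^a
    T2 = np.linalg.inv(np.eye(n) + Y_f.T @ Rinv @ Y_f)
    d_a = T2 @ Y_f.T @ Rinv @ (y - H @ x_f)
    x_a = x_f + X_f @ d_a                          # analysis mean

    # symmetric square root T of T2 via eigendecomposition
    vals, vecs = np.linalg.eigh(T2)
    T = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    X_a = X_f @ T                                  # analysis anomalies (scaled)
    return x_a[:, None] + np.sqrt(n - 1) * X_a     # analysis ensemble
```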
4.3.1 Inbreeding
Inbreeding was introduced by Houtekamer and Mitchell [2] to describe a phenomenon in ensemble filtering that arises due to undersampling. The term is also used to describe a situation where the analysis error covariances are systematically underestimated after each observation assimilation. Furrer and Bengtsson [8] indicate that the analysis error covariance P^a should always be smaller than the forecast error covariance since, as we have seen before, it is obtained as P_e^a = (I − K_e H) P_e^f, where the index e indicates the ensemble. The Kalman gain K_e uses the ratio of the error covariance of the forecast background state to the error covariance of the observations to calculate how much weight should be placed on the background state and how much weight should be given to the observations. The forecast state estimate is then adjusted by the observations in accordance with the ratio of background and observation covariance matrices in the Kalman gain. Note that if either the forecast background errors or the observational errors are incorrectly specified, then the adjustment of the forecast state will be incorrect.
Inbreeding in small ensembles that do not adequately span the model subspace can occur due to sampling errors in the covariance estimate (Lorenc [10]). The smaller the ensemble, the greater the degree of undersampling and the greater the chance of underestimated forecast error covariances (Ehrendorfer [11]). In ensemble filters, where each member of the ensemble is updated at the analysis stage by the same observations and the same Kalman gain matrix, there is a tendency for the filter to underestimate the analysis covariance (Whitaker and Hamill [5]). Ensemble Kalman filters that use perturbed observations, such as the EnKF of Evensen [4], have additional sampling errors in the estimation of the observation error covariances. This in turn makes it more likely that inbreeding will occur (Whitaker and Hamill [5]).
In square root filters, since observations are not perturbed, this source of inbreeding is avoided. Inbreeding is a potential source of filter divergence and of the development of spurious long-range correlations (Hamill et al. [12]). Undersampling produces a reduced-rank representation of the background error covariance matrix, and in cases where the undersampling is severe there is a tendency for the variances and covariances to be underestimated.
Spurious correlations happen in the forecast error covariance between state components that are not physically related and that are normally at a significant distance from each other. Where all the observations have an impact on each state variable, large long-range spurious correlations may develop (Anderson [14]). The consequence of these is that a state variable may be incorrectly impacted by an observation that is physically remote. As the size of the ensemble and the true correlation between state components decrease, the error in the covariance estimate, relative to the true correlation, greatly increases (Hamill et al. [12]). In the physical world it is expected that at a distance from a given observation point the true correlation will decrease. In an NWP model, the size of the error relative to the true correlation at grid points remote from the observation point in the forecast error covariance matrix will therefore be expected to increase (Hamill et al. [12]). It was demonstrated by Hamill et al. [12] that the analysis estimate provided by the EnKF is less accurate when the error in the covariance estimate, known as noise, is greater than the true correlation, known as the signal. Since the correlations at a distance are expected to be small and the relative error increases with distance, it is expected that state components distant from the observations have a greater noise-to-signal ratio. These are long-range spurious correlations and they degrade the quality of the analysis estimate. Further, Hamill et al. [12] show that the noise-to-signal ratio is a function of ensemble size: larger ensembles, which more accurately reflect the statistics, have less associated noise. Thus the problem of spurious correlations is associated with undersampling. Lorenc [10] shows that when an ensemble is generated by random sampling from a probability distribution function (pdf), the forecast error covariance will have a sampling error proportional to 1/√N, where N is the size of the ensemble.
4.4 Methods to Reduce Problems of Undersampling
If the ensemble has n members, then the forecast covariance matrix P^f given by (4.75) describes nonzero uncertainty only in the n-dimensional subspace spanned by the ensemble, and a global analysis will allow adjustments to the system state only in this subspace. If the system is high-dimensionally unstable, then forecast errors will grow in directions not accounted for by the ensemble, and these errors will not be corrected by the analysis. On the other hand, in a sufficiently small local region, the system may behave like a low-dimensionally unstable system driven by the dynamics in neighboring regions (see Hunt et al. [7]).
Localization is generally done either explicitly, considering only the observations
from a region surrounding the location of the analysis, or implicitly, by multiplying
the entries in P f by a distance-dependent function that decays to zero beyond a certain
distance, so that observations do not affect the model state beyond that distance.
Since the model is discretized, the choice of which observations to use for each grid
point depends also on the method adopted, and a good choice will depend both on
the particular system being modeled and the size of the ensemble (more ensemble
members will generally allow more distant observations to be used gainfully). It is
important however to have significant overlap between the observations used for
one grid point and the observations used for a neighboring grid point; otherwise the
analysis ensemble may change suddenly from one grid point to the next.
The global analysis should not be confined to the n-dimensional ensemble space and instead should explore a much higher dimensional space. The justification for spatial localization is that, for spatio-temporally chaotic systems, correlations between distant locations in the background covariance matrix P^f are weak, at least on short time scales, while the corresponding sample correlations obtained from a small ensemble are dominated by spurious values that randomly affect the use of the observations. If the system has a characteristic correlation distance, then the analysis should ignore ensemble correlations beyond that distance.
One method to correct the underestimation of the forecast error covariance matrix is covariance inflation. It was introduced by Anderson and Anderson [18] with the aim of increasing the forecast error covariances by inflating, for each ensemble member, the deviation of the background state from the ensemble mean by a certain percentage. Prior to a new observation being assimilated in any new cycle, the background forecast deviations from the mean are increased by a factor of inflation r:

x_i^f ← r (x_i^f − x̄^f) + x̄^f.    (4.98)
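A minimal sketch of this inflation step, assuming the ensemble is stored with one member per column; the inflation factor r = 1.05 is an arbitrary illustrative value.

```python
import numpy as np

def inflate(E_f, r=1.05):
    """Multiplicative inflation of ensemble deviations, Eq. (4.98): x_i <- r (x_i - xbar) + xbar."""
    x_mean = E_f.mean(axis=1, keepdims=True)
    return r * (E_f - x_mean) + x_mean
```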
Houtekamer and Mitchell [2], Hamill et al. [12] and Whitaker and Hamill [5] show that covariance localization is a process of cutting off longer-range correlations in the error covariances at a specified distance. It is a method of improving the estimate of the forecast error covariance. Although the empirical P^f is a good approximation, its insufficient rank introduces the long-range spurious correlations mentioned above. The idea is then to regularize P^f by filtering out these long-range correlations and increasing the rank of P^f by multiplying it with a short-range predefined correlation matrix ρ. The pointwise multiplication is ordinarily achieved by applying a Schur product (Schur [20]), also known as the Hadamard product (Horn [21]), denoted with ∘, to the forecast error covariance matrix.

A Schur product involves an element-wise product of matrices, written as ρ ∘ P^f, where P^f and ρ have the same dimensions. If i is the row index and j is the column index, the Schur product is calculated as [ρ ∘ P^f]_{ij} = ρ_{ij} [P^f]_{ij}.
In mathematics, the Hadamard product (also known as the Schur product or the entrywise product) is a binary operation that takes two matrices of the same dimensions and produces another matrix in which each element ij is the product of the elements ij of the original two matrices. It should not be confused with the more common matrix product. It is attributed to, and named after, either the French mathematician Jacques Hadamard or the German mathematician Issai Schur. The Hadamard product is associative and distributive, and unlike the matrix product it is also commutative (see Wikipedia).
To achieve covariance localization by a Schur product, a function ρ is normally defined to be a correlation function with local support. Local support means that the function is non-zero only in a small (local) region and is zero elsewhere. The correlation function is commonly taken to be the compactly supported 5th-order piecewise rational function defined in Gaspari and Cohn [22]:

ρ(z) = −(1/4)(|z|/c)^5 + (1/2)(|z|/c)^4 + (5/8)(|z|/c)^3 − (5/3)(|z|/c)^2 + 1,   for 0 ≤ |z| ≤ c,

ρ(z) = (1/12)(|z|/c)^5 − (1/2)(|z|/c)^4 + (5/8)(|z|/c)^3 + (5/3)(|z|/c)^2 − 5(|z|/c) + 4 − (2/3)(c/|z|),   for c ≤ |z| ≤ 2c,

ρ(z) = 0,   for 2c ≤ |z|,
where z is the Euclidean distance either between grid points in physical space or between a grid point and the observation location, depending on the implementation. A length scale c is defined such that beyond it the correlation reduces from 1, and at a distance of more than twice the correlation length scale the correlation reduces to zero. The length scale is generally set to c = √(10/3) l, where l is any chosen cutoff length scale. The factor √(10/3) is included to tune the correlation function to be optimal (Lorenc [10]), such that the final localised global average error variance is closest to that of the true probability distribution.
To achieve covariance localization, a Schur product is taken between the forecast background error covariance matrix P^f, calculated from the ensemble, and a correlation function with local support, ρ. Remembering the Kalman gain (4.55) we can write:

K_e = (ρ ∘ P_e^f) H^T (H (ρ ∘ P_e^f) H^T + R)^{−1}.    (4.100)
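The following sketch implements the Gaspari–Cohn correlation function above and builds a localization matrix ρ for a one-dimensional grid; the grid size and length scale are illustrative, and the element-wise product ρ ∘ P^f would then enter the gain of Eq. (4.100).

```python
import numpy as np

def gaspari_cohn(z, c):
    """Compactly supported 5th-order piecewise rational correlation (Gaspari and Cohn [22])."""
    z = np.abs(z) / c
    rho = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    rho[inner] = -0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3 - (5.0 / 3.0) * zi**2 + 1.0
    rho[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3 + (5.0 / 3.0) * zo**2
                  - 5.0 * zo + 4.0 - (2.0 / 3.0) / zo)
    return rho

# Schur (element-wise) product localization of an ensemble covariance on a 1-D grid
ngrid, c = 40, 5.0                               # illustrative grid size and length scale
dist = np.abs(np.subtract.outer(np.arange(ngrid), np.arange(ngrid))).astype(float)
rho = gaspari_cohn(dist, c)                      # localization matrix
# P_loc = rho * P_f   (element-wise product, to be used in the gain of Eq. (4.100))
```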
References
1. Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasi-geostrophic model. Mon. Weather Rev. 124, 85–96 (1996)
2. Houtekamer, P.L., Mitchell, H.L.: Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. 126, 796–811 (1998)
3. Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99, 10143–10162 (1994)
4. Evensen, G.: Data Assimilation: The Ensemble Kalman Filter, 2nd edn, p. 320. Springer, New York (2009)
5. Whitaker, J.S., Hamill, T.M.: Ensemble data assimilation without perturbed observations. Mon. Weather Rev. 130, 1913–1924 (2002)
6. Bishop, C.H., Etherton, B.J., Majumdar, S.J.: Adaptive sampling with the ensemble transform Kalman filter. Part I: theoretical aspects. Mon. Weather Rev. 129, 420–436 (2001)
7. Hunt, B.R., Kostelich, E.J., Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230, 112–126 (2007)
8. Furrer, R., Bengtsson, T.: Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J. Multivar. Anal. 98, 227–255 (2007)
9. UK Met Office: Observations. https://www.metoffice.gov.uk/research/nwp/observations/. Last accessed 2008
10. Lorenc, A.C.: The potential of the ensemble Kalman filter for NWP: a comparison with 4D-Var. Q. J. R. Meteorol. Soc. 129, 3183–3203 (2003)
11. Ehrendorfer, M.: A review of issues in ensemble-based Kalman filtering. Meteorol. Z. 16, 795–818 (2007)
12. Hamill, T., Mullen, S., Snyder, C., Toth, Z., Baumhefner, D.: Ensemble forecasting in the short to medium range: report from a workshop. Bull. Am. Meteorol. Soc. 81, 2653–2664 (2000)
13. Petrie, R.E.: Localization in the ensemble Kalman Filter. MSc thesis, Atmosphere, Ocean and Climate, University of Reading (2008)
14. Anderson, J.L.: An ensemble adjustment Kalman filter for data assimilation. Mon. Weather Rev. 129, 2884–2903 (2001)
15. Ott, E., et al.: A local ensemble Kalman filter for atmospheric data assimilation. Tellus 56A, 415–428 (2004)
16. Szunyogh, I., Kostelich, E.J., Gyarmati, G., Patil, D.J., Hunt, B.R., Kalnay, E., Ott, E., Yorke, J.A.: Assessing a local ensemble Kalman filter: perfect model experiments with the National Centers for Environmental Prediction global model. Tellus 57A, 528–545 (2005)
17. Kalnay, E.: Atmospheric Modelling, Data Assimilation and Predictability. Cambridge University Press, Cambridge (2003)
18. Anderson, J.L., Anderson, S.L.: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev. 126, 2741–2758 (1999)
19. Oke, P.R., Sakov, P., Corney, S.P.: Impacts of localisation in the EnKF and EnOI: experiments with a small model. Ocean Dyn. 57, 32–45 (2007)
20. Schur, I.: Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen. Journal für die reine und angewandte Mathematik 140, 1–28 (1911)
21. Horn, R.: The Hadamard product. In: Johnson, C.R. (ed.) Matrix Theory and Applications, American Mathematical Society, Proceedings of Symposia in Applied Mathematics, vol. 40, pp. 87–169 (1990)
22. Gaspari, G., Cohn, S.E.: Construction of correlation functions in two and three dimensions. Q. J. R. Meteorol. Soc. 125, 723–757 (1999)
Chapter 5
Applications
Abstract Sometimes the applications are neglected when, instead, they are the true benchmark of the theory. This chapter deals, by using simple pedagogical examples, with forecasting processes based on data assimilation methods. They are applied in different fields: the atmospheric study by the classical Lorenz model, the biology of cells by a model of tumor growth, the planetary general circulation by an application on Mars and the earthquake forecasts by renewal process models.
The Kalman Filter equations describe how information from forecasts and from observations should be combined in an optimal way that extracts the maximum information from each source. A forecast starts from initial conditions at time t0 and runs to time t1, accumulating errors over this period. In the language of DA, the combination of forecast and observations is called the analysis state. The analysis state at t1 is then used as the initial condition for another forecast to time t2, which is then combined with the next batch of observations at that time. When one uses the KF one describes not only the analysis state but also the uncertainties associated with the information provided by the observations.
The Kalman filter presents the optimal solution to the data assimilation problem
under the assumptions of linear models with Gaussian observation and model noise.
These assumptions are strongly violated in stochastic point process models for earth-
quake forecasting where one needs a more general approach based on propagating
the entire probability distribution, rather than solely mean and covariance.
This chapter treats, by using simple pedagogical examples, the forecasting
processes based on data assimilation methods in different fields: the atmospheric
study by the classical Lorenz model, the biology of cells by a model of tumor growth,
the planetary general circulation by an application on Mars and the earthquake fore-
casts by renewal processes models.
The first of the pedagogical models is the simplified one proposed by Lorenz. Lorenz [1] discusses three one-dimensional toy models that incorporate many features shown in real atmospheric dynamics and in global numerical weather prediction models.
The first model (Lorenz model 1) was originally introduced in Lorenz [2] and Lorenz
and Emanuel [3]. This model has become the standard model for the initial testing of
EnKF schemes. The popularity of the model is in part due to the similarity between
the propagation of uncertainties (forecast errors) in Lorenz model 1 and global cir-
culation models in the midlatitude storm-track regions. In particular, the errors are
propagated by dispersive waves whose behavior is similar to that of synoptic- scale
Rossby waves, and the magnitude of the errors has a doubling time of about 1.5
days (where the dimensionless model time has been converted to dimensional time
by assuming that the characteristic dissipation time scale in the real atmosphere is
5 days). Lorenz model 2 adds the feature of a smooth spatial variation of the model
variables that resembles the smooth variation of the geopotential height stream-
function at the synoptic and large scales in the atmosphere. Lorenz model 3, the
most refined and realistic of the three models in Lorenz [1], adds a rapidly varying
small-amplitude component to the smooth large-scale flow, mimicking the effects of
small-scale atmospheric processes.
Previously, Edward Lorenz [4] had described the convective motion of a fluid in a small, idealized Rayleigh-Bénard cell. The Lorenz equations were derived from the Oberbeck-Boussinesq approximation to the equations describing fluid circulation in a shallow layer of fluid, heated uniformly from below and cooled uniformly from above. This fluid circulation is known as Rayleigh-Bénard convection. The fluid is assumed to circulate in two dimensions (vertical and horizontal) with periodic rectangular boundary conditions.
The partial differential equations modeling the system's stream function and temperature are subjected to a spectral Galerkin approximation: the hydrodynamic fields are expanded in Fourier series, which are then severely truncated to a single term for the stream function and two terms for the temperature. This reduces the model equations to a set of three coupled, nonlinear ordinary differential equations. A detailed derivation may be found, for example, in nonlinear dynamics textbooks. The Lorenz system is a reduced version of a larger system studied earlier by Barry Saltzman [5]. In his idealized model the boundary conditions of the fluid at the upper and lower plates were stress free rather than the realistic no-slip conditions, while the lateral boundary conditions are taken to be periodic rather than corresponding to realistic side walls, and the motion is assumed to be two- rather than three-dimensional. These modifications greatly simplified the mathematical analysis because the governing equations reduce the complicated partial differential equations describing the fluid motion and heat flow to three ordinary differential equations in which the fluid and heat equations are coupled. Since these equations are non-linear, there are terms coupling the different modes, and also terms generating higher harmonics representing the thermal modes and the fluid velocity components. The major approximation is that the latter terms are ignored.
dx/dt = f_x(x, y, z, σ) = σ(y − x)

dy/dt = f_y(x, y, z, ρ) = ρx − y − xz

dz/dt = f_z(x, y, z, β) = xy − βz,    (5.1)

where σ depends on the properties of the fluid (the ratio of the viscous to thermal diffusivities) and β = 8/3 (this would be different for a different choice of horizontal wavelength or roll diameter). The temperature difference ρ is the important control parameter: for ρ < 1 the solution at long times is asymptotic to x = y = z = 0, i.e. no convection. For ρ > 1 convection sets in and, for sufficiently large ρ, chaotic solutions occur. Lorenz realized the importance of this aperiodic motion. This Lorenz model has been widely used for exploring many real world problems.
In the last few years an increasing number of physicists have directed their research
activities toward investigating the complex behavior of nonlinear model systems
described by seemingly simple deterministic differential or difference equations.
In fact there is a wealth of similar systems of nonlinear differential equations and
difference equations whose chaotic state is characterized by apparently random-
looking motion on attractors displaying a complicated structure. The trajectories in
the phase space of these systems and the bifurcation sequence leading to the erratic
motion have been studied intensively by computer calculations. Lyapunov characteristic exponents, Poincaré maps, symbolic transition dynamics, and other mathematical tools have been used. Lyapunov exponents [6] measure the growth rates of generic perturbations, in a regime where their evolution is ruled by linear equations. Possible universal properties of the bifurcation sequences have been studied using scaling approaches. In mathematics, the Lyapunov exponent or Lyapunov characteristic exponent of a dynamical system is a quantity that characterizes the rate of separation of infinitesimally close trajectories. Quantitatively, two trajectories in phase space with initial separation δZ_0 diverge (provided that the divergence can be treated within the linearized approximation) at a rate given by |δZ(t)| ≈ e^{λt} |δZ_0|, where λ is the Lyapunov exponent.
The rate of separation can be different for different orientations of initial sepa-
ration vector. Thus, there is a spectrum of Lyapunov exponents equal in number to
the dimensionality of the phase space. It is common to refer to the largest one as the
Maximal Lyapunov exponent (MLE), because it determines a notion of predictability
for a dynamical system. A positive MLE is usually taken as an indication that the
system is chaotic (provided some other conditions are met, e.g., phase space com-
pactness). Note that an arbitrary initial separation vector will typically contain some
component in the direction associated with the MLE, and because of the exponential
growth rate, the effect of the other exponents will be obliterated over time.
Since the equations for the dynamics of x, y, z are first order and autonomous, one may consider this set of variables as the phase space. The dynamics at each point, specified by the velocity vector in phase space, is unique. The evolution in time then traces out a path in the three dimensions. An immediate result is that phase space trajectories cannot cross.
The Lorenz model can be solved using a discretized fourth-order Runge-Kutta method [9]. Starting from Eq. (5.1) we have:

x_{k+1} = x_k + (Δt/6) (f_1 + 2f_2 + 2f_3 + f_4)

y_{k+1} = y_k + (Δt/6) (g_1 + 2g_2 + 2g_3 + g_4)

z_{k+1} = z_k + (Δt/6) (h_1 + 2h_2 + 2h_3 + h_4),    (5.2)
where

f_1 = f_x(x, y, z, σ)
g_1 = f_y(x, y, z, ρ)
h_1 = f_z(x, y, z, β)

f_2 = f_x(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, σ)
g_2 = f_y(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, ρ)
h_2 = f_z(x + Δt f_1/2, y + Δt g_1/2, z + Δt h_1/2, β)

f_3 = f_x(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, σ)
g_3 = f_y(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, ρ)
h_3 = f_z(x + Δt f_2/2, y + Δt g_2/2, z + Δt h_2/2, β)

f_4 = f_x(x + Δt f_3, y + Δt g_3, z + Δt h_3, σ)
g_4 = f_y(x + Δt f_3, y + Δt g_3, z + Δt h_3, ρ)
h_4 = f_z(x + Δt f_3, y + Δt g_3, z + Δt h_3, β),    (5.3)

where Δt is the model time step and k is the time step index.
Since the discretization introduces an error, it is usual to add to (5.2) a term of the form √Δt η, where η = (η_x, η_y, η_z)^T ∼ N(0, Q) is assumed to be a normally distributed random vector with zero mean and error covariance Q.
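A compact Python sketch of the integration scheme (5.2)–(5.3), with the optional √Δt η model-error term; the classical parameter values σ = 10, ρ = 28, β = 8/3 are assumed for illustration.

```python
import numpy as np

def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (5.1), with the classical parameter values as defaults."""
    x, y, z = state
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def rk4_step(state, dt, q_std=0.0, rng=None):
    """One fourth-order Runge-Kutta step, Eqs. (5.2)-(5.3), plus optional sqrt(dt)*eta model noise."""
    f1 = lorenz63_rhs(state)
    f2 = lorenz63_rhs(state + 0.5 * dt * f1)
    f3 = lorenz63_rhs(state + 0.5 * dt * f2)
    f4 = lorenz63_rhs(state + dt * f3)
    new = state + dt / 6.0 * (f1 + 2.0 * f2 + 2.0 * f3 + f4)
    if q_std > 0.0 and rng is not None:
        new = new + np.sqrt(dt) * rng.normal(scale=q_std, size=3)
    return new

# short trajectory on the attractor
rng = np.random.default_rng(1)
traj = [np.array([1.0, 1.0, 1.0])]
for _ in range(2000):
    traj.append(rk4_step(traj[-1], dt=0.01))
```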
On the web there are some implementations of the Lorenz experiment. They are written in MATLAB (Kuhl and Kostelich [10]) for the LETKF method, in C++ (Bannister [11]) or in IDL (Migliorini [12]) for the EnSRKF method. The reason is that the Lorenz 63 or 96 model, depending on the paper, is a test bench for understanding the behavior of a simplified atmosphere or for carrying out experiments on data assimilation or chaos.
In this paragraph we show the EnSRKF approach by Bannister. His approach is a little different from the general approach reported in the previous chapter, but it has the advantage of showing another solution in the framework of what we have called the Deterministic Ensemble Kalman Filter.
The EnSRKF process treats the whole ensemble as a single entity, so the value of x̄^a is in fact that of the ensemble mean. In matrix notation we have the forecast ensemble matrix X^f and the analysis matrix X^a. The mean of X^f is denoted by X̄^f, of size n × N, and is made up of N identical columns, each containing the ensemble mean state vector of the model forecast; it is the matrix form of x̄^f. X̄^a, of size n × N, is the ensemble mean matrix of the analyzed Kalman filter output and is the matrix form of x̄^a.

In order to use matrix notation we also define a new matrix Y, made up of N identical columns each containing the observation vector y. Thus
P^f ≈ (1/(N−1)) Σ_{k=1}^N (x_k^f − x̄^f)(x_k^f − x̄^f)^T

P^f ≈ (1/(N−1)) X′^f (X′^f)^T,    (5.6)

where X′^f, with columns x_k^f − x̄^f, k = 1, 2, 3, . . . , N, is the n × N matrix of the ensemble member perturbations. The same can be done for the analysis error covariance matrix:

P^a ≈ (1/(N−1)) Σ_{k=1}^N (x_k^a − x̄^a)(x_k^a − x̄^a)^T

P^a ≈ (1/(N−1)) X′^a (X′^a)^T.    (5.7)
X^a = X^f + [X′^f (X′^f)^T/(N−1)] H^T [H [X′^f (X′^f)^T/(N−1)] H^T + R]^{−1} (Y − H X^f).    (5.8)

Indicating B = H X′^f we have:

X^a = X^f + X′^f B^T C^{−1} (Y − H X^f).    (5.11)

Remembering that P^f and P^a are given by (5.6) and (5.7), and taking into account that H X′^f = B, we make a substitution in relation (5.12), obtaining

X′^a (X′^a)^T = X′^f [I − G] (X′^f)^T.    (5.15)
In the square root scheme the idea is to obtain an ensemble of analysis perturbations (contained in X′^a) that has the covariance given by relation (5.12). The matrix X′^a is thought of as the square root of the analysis error covariance matrix given by (5.7), which can be added to the mean X̄^a of relation (5.10). The last step is to find an X′^a that has the properties of (5.7), which is the same as finding the square root of (5.7). Thus we first decompose G into its eigenvectors Γ and eigenvalues Λ, G = Γ Λ Γ^T. Bringing the eigenvectors outside the brackets and using the property of the eigenvector matrix Γ Γ^T = I, we have

X′^a (X′^a)^T = X′^f Γ [I − Λ] Γ^T (X′^f)^T.

The square matrix [I − Λ] has an infinite number of possible roots. One is simply [I − Λ]^{1/2} and another is [I − Λ]^{1/2} Γ^T, since Γ Γ^T = I. Therefore:

[(I − Λ)^{1/2} Γ^T] [(I − Λ)^{1/2} Γ^T]^T = (I − Λ)^{1/2} Γ^T Γ (I − Λ)^{1/2} = (I − Λ)^{1/2} (I − Λ)^{1/2} = I − Λ.    (5.18)

Thus, using the second square root, we have

X′^a (X′^a)^T = X′^f Γ [I − Λ]^{1/2} Γ^T ([I − Λ]^{1/2} Γ^T)^T Γ^T (X′^f)^T.    (5.19)
The final step is to construct the full ensemble from the perturbations and then to propagate this ensemble to the next step:

X^a(t) = X̄^a + X′^a    (5.21)

X^f(t + Δt) = M(X^a(t)) + ΔX(t),    (5.22)

where M is the nonlinear forecast model and ΔX(t) is an n × N matrix of stochastic perturbations used to simulate the imperfect model with specified error covariance Q. In the absence of observations no data assimilation can be performed and only the forecast step is carried out.
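A possible sketch of such a deterministic square-root analysis step is given below. Since the intermediate relations (5.9)–(5.14) are not reproduced here, the explicit form of C and the use of a symmetric square root are assumptions; the sketch only follows the structure of Eqs. (5.6)–(5.22), with illustrative names.

```python
import numpy as np

def ensrkf_analysis(E_f, y, H, R):
    """Square-root analysis: mean update plus deterministic perturbation update (no perturbed obs)."""
    m, N = E_f.shape
    xbar = E_f.mean(axis=1, keepdims=True)
    Xp = E_f - xbar                                  # unscaled perturbations X'
    B = H @ Xp                                       # observation-space perturbations
    C = B @ B.T / (N - 1) + R                        # innovation covariance (assumed form)
    Cinv = np.linalg.inv(C)

    # mean update, cf. Eq. (5.8)
    xbar_a = xbar + Xp @ B.T @ Cinv @ (y[:, None] - H @ xbar) / (N - 1)

    # perturbation update via eigendecomposition of G, cf. Eqs. (5.15)-(5.19)
    G = B.T @ Cinv @ B / (N - 1)
    lam, gamma = np.linalg.eigh(G)
    sqrt_factor = gamma @ np.diag(np.sqrt(np.clip(1.0 - lam, 0.0, None))) @ gamma.T
    Xp_a = Xp @ sqrt_factor

    return xbar_a + Xp_a                             # analysis ensemble
```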
Systems biology considers that information flow depends on a mix of both molecular signals (DNA, proteins, lipids, etc.) and physical signals (system status parameters: strength, thermodynamic constraints), which interact with each other through nonlinear dynamics. To understand these interactions it is necessary, on one side, to provide data on which it is possible to conduct statistical evaluations, made possible through techniques of genomics, proteomics, metabolomics, etc. Such approaches provide large amounts of information and a way to understand how physical forces, the weight of tangential stress, stiffness, surface tension and gravitational interactions, determine the fate of complex biological systems.
At the same time, mathematical and numerical models have been progressively used as a tool for supporting medical research in biology and medicine. In silico experiments have provided remarkable insights into physio-pathological processes, complementing the traditional in vitro and in vivo investigations. Furthermore, numerical models have been used to give a dynamical representation of the biology of a specific patient and to support prognostic activity.
The need to obtain not only qualitative but also quantitative responses for diagnostic purposes has stimulated the design of new methods and instruments for measurement and imaging. The advent of high resolution 3D imaging instruments in biology and medicine is producing novel approaches to treat the huge amounts of data which can be used for numerical simulations. Moreover, beyond validation, it is possible to merge simulations and measurements by means of more sophisticated numerical techniques.
Recently, some data assimilation methods have been applied to biomathematics, merging observed (generally sparse and noisy) information into a numerical model. This approach improves the quality of the information because it allows effects that would otherwise be difficult to model to be included, by introducing a sophisticated filter able to balance the uncertainties of the measured data with basic principles. In summary, data assimilation methods, born in geophysics and meteorology, are now mature enough to be used in the fields of biology and medicine.
What is relevant is the transparency of the mathematical and statistical methodology and of the digital representation of a specific pathology in a given patient (electronic/virtual phenotype). This approach allows us to initiate a route towards a well founded selection of therapeutics for future integrated and personalized care, in agreement with current institutional guideline development.

The integration of missing information, such as that contained in case reports, will make it possible to approach the solution of a single difficult case by using a set of choices from different domains (social, ethical, medical, etc.) capable of improving the quality of life for the specific pathology and patient stage.
The data from specific research in the pathology sector will be the basis for the continuous future tuning of the database and for the periodic validation of a complex virtual engine. By reconstructing the electronic phenotype, the recognition of the individuality (personalization) of each single patient is actually linked to the overall information recruited on the disease.
Phenotype domains are crucial to represent the personalized patient in all their dimensions, even though social, economic and psychological-ethical conditions may orient the final selection of the integrated therapeutic actions.
Taking into account the process of creation and/or death of the cells due to treatment, one can write the partial differential equation in three dimensions

∂c/∂t = ∂/∂x(D ∂c/∂x) + ∂/∂y(D ∂c/∂y) + ∂/∂z(D ∂c/∂z) + ρc(1 − c/T_max) − G(t)c,    (5.24)

where G(t) accounts for the temporal profile of the treatment and as a first approximation may be considered constant, and ρ is the maximum glioma growth rate. Such a model is given by both Chaplain [13] and Swanson [14]. In order to avoid an exponential growth term, the cells grow logistically with some carrying capacity T_max at any given point of the model domain. The representative parameters for this model applied to the brain are given in Table 5.1, which also reports the values for white matter, gray matter, corpus callosum and cerebrospinal fluid (CSF). The initial conditions are

c(x, y, z, 0) = f(x, y, z),    (5.25)

while the boundary condition on the brain domain Ω on which the diffusion equation is to be solved is
n̂ · D(∂c/∂x î + ∂c/∂y ĵ + ∂c/∂z k̂) = 0 on ∂Ω.    (5.26)

In the Neumann boundary condition the normal derivative of the unknown function c (the solution of the PDE) is

∂c/∂n(x, y, z) = n̂ · ∇c = f(x, y, z),    (5.27)

where n̂ is the unit vector normal to the boundary ∂Ω of the domain Ω, î, ĵ, k̂ are the unit vectors along the coordinate directions, and f(x, y, z) is a known function that defines the initial spatial distribution of malignant cells [15].
In order to solve Eq. (5.24) one uses the Crank-Nicolson method, well known in the framework of many diffusion problems.
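As an illustration of the Crank–Nicolson idea, the following sketch advances a one-dimensional analogue of Eq. (5.24), treating diffusion implicitly and the logistic and treatment terms explicitly; the geometry, the parameter values and the explicit treatment of the reaction term are simplifying assumptions, not the scheme used by the authors.

```python
import numpy as np

def crank_nicolson_step(c, D, rho, Tmax, G, dt, dx):
    """One time step of a 1-D analogue of Eq. (5.24): implicit diffusion, explicit reaction terms."""
    n = c.size
    alpha = D * dt / (2.0 * dx**2)

    # second-difference operator with zero-flux (Neumann) boundaries, cf. Eq. (5.26)
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[0, 1] = 2.0
    A[-1, -2] = 2.0

    reaction = rho * c * (1.0 - c / Tmax) - G * c
    rhs = (np.eye(n) + alpha * A) @ c + dt * reaction
    return np.linalg.solve(np.eye(n) - alpha * A, rhs)

# illustrative run: Gaussian initial tumour-cell density on a 1-D domain
x = np.linspace(0.0, 10.0, 101)
c = np.exp(-((x - 5.0) ** 2))
for _ in range(200):
    c = crank_nicolson_step(c, D=0.05, rho=0.1, Tmax=1.0, G=0.0, dt=0.05, dx=x[1] - x[0])
```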
We report here the paradigmatic approach of Kostelich et al. [16], because the data assimilation procedure, despite being applied to a brain tumor (glioblastoma multiforme, GBM), does not depend on the details of a given cancer growth model and should be applicable to other models of cancer or biological phenomena. They report two types of models. The one reported here is simpler than the other, which involves more processes, from haptotaxis to degradation and repair.

We prefer to avoid discussing the second model in order to emphasize the data assimilation methods. The equation has been integrated on the brain geometry obtained from the BrainWeb database developed by the McConnell Brain Imaging Centre of the Montreal Neurological Institute at McGill University.

This is an example of the Local Ensemble Transform Kalman Filter (see Hunt et al. [17]), one of the general approaches of classical or advanced data assimilation used in weather forecasting. It can be applied to GBM growth even though the details of the growth of the tumor cells are poorly known due to the complexity of the mechanisms
involved. Data are obtained from high resolution episodic images acquired at intervals of weeks to months using chemical agents to enhance the contrast. The therapy induces further complications that affect the information we need to apply the data assimilation methods. However, the goal of the research is to obtain good quantitative predictions of GBM growth and spread as well as to estimate their uncertainties.

Data assimilation has been performed using the magnetic resonance images and two different models of growth of the tumor.

Even though the method is the one we have defined as the EnKF, we exploit and expand it here.
The problem is to estimate the solution trajectory that best fits the observations, given an imperfect forecast model that produces trajectories from time t_{n−1} to t_n. Assuming that at each time t_k, k = 1, . . . , n, the observation is related to the state through the operator H, i.e. y_k = H(x(t_k)) + v, where v ∼ N(0, R) is a Gaussian random vector, and that the system evolves according to a linear model x_n = M_n x_{n−1}, the problem is to maximize the likelihood function

L[x(t)] = exp{ −(1/2) Σ_{j=1}^n [y_j − H(x(t_j))]^T R^{−1} [y_j − H(x(t_j))] }    (5.28)

or to minimize the cost function obtained by taking the logarithm of relation (5.28):

J[x(t)] = Σ_{j=1}^n [y_j − H(x(t_j))]^T R^{−1} [y_j − H(x(t_j))].    (5.29)
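A direct transcription of the cost function (5.29), assuming the trajectory and the observations are available as sequences and that H is a callable; all names are illustrative.

```python
import numpy as np

def cost_J(trajectory, obs, H, Rinv):
    """Cost function of Eq. (5.29): sum of squared innovations weighted by R^-1."""
    J = 0.0
    for x_t, y_t in zip(trajectory, obs):
        innov = y_t - H(x_t)
        J += innov @ Rinv @ innov
    return J
```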
As we have seen, the extension of the Kalman Filter to a nonlinear scenario implies that the propagation of the analysis error covariance is no longer tractable through the background covariance matrix given by P_n^f = M_n P_{n−1}^a (M_n)^T. One of the possible solutions is to select an ensemble of k possible trajectories whose variance approximates P_{n−1}^a. Each ensemble member is updated by the model to time t_n to compute the updated ensemble background/forecast covariance P_n^f. An accurate analysis is obtained when the spread of the ensemble approximates P_n^f well; otherwise the analysis fails to correct errors in the forecast model. The LETKF uses localization to overcome problems arising from the small ensemble size. Since the size k is in general smaller than the model resolution m, the LETKF's strategy is to perform the analysis at each point individually by forming a local ensemble over a subset of the model domain (Hunt et al. [17]). In this way the dynamics at a given point are captured over the local region. The local ensemble will then estimate the background uncertainty and correct the forecast at each point, giving an updated global analysis over the entire model grid. The advantage of the LETKF is that it is model independent, i.e. the analysis is computed without the use of the model equations. This is a key feature when the system under study is poorly understood and the models are not fully mature.
The procedure starts at time t_{n−1} with the analysis ensemble consisting of m-dimensional model vectors

{x_{t_{n−1}}^{a_i}, i = 1, 2, . . . , k},    (5.30)

that represent the density of the cells (proliferating, migrating, chemo-repelling) and the extracellular matrix (ECM) at each grid point of the model geometry assumed for the cavity in which the tumor is growing.

The mean is considered the best estimate of the most likely state of the system. The update of each ensemble member is obtained from the model up to time t_n. In such a way one obtains the forecast ensemble

{x_{t_n}^{f_i}, i = 1, 2, . . . , k}.    (5.31)

The relation between the forecast state and the analysis state is given by

x_{t_n}^{f_i} = F(x_{t_{n−1}}^{a_i}, t_{n−1}),   i = 1, 2, . . . , k.    (5.32)
From now on the time subscripts will be omitted. The forecast mean is

x̄^f = k^{−1} Σ_{i=1}^k x^{f_i},    (5.33)

with forecast error covariance

P^f = (k−1)^{−1} Σ_{i=1}^k (x^{f_i} − x̄^f)(x^{f_i} − x̄^f)^T = (k−1)^{−1} X^f (X^f)^T,    (5.34)

and similarly for the analysis,

x̄^a = k^{−1} Σ_{i=1}^k x^{a_i}    (5.35)

and

P^a = (k−1)^{−1} Σ_{i=1}^k (x^{a_i} − x̄^a)(x^{a_i} − x̄^a)^T = (k−1)^{−1} X^a (X^a)^T.    (5.36)
However, the rank of the forecast covariance matrix (5.34) can be at most k − 1 and therefore it is not invertible. Nevertheless, its inverse is well defined on the space S spanned by the forecast ensemble perturbations, that is, by the columns of X^f. Then J is well defined for (x − x̄^f) in S and the minimization can be carried out in this subspace. Thus, in order to carry out the analysis on S, one needs to choose an appropriate coordinate system through an affine transformation. If one regards X^f as a linear transform from a k-dimensional space S̃ onto S, one may perform the analysis in S̃. Let w denote a vector in the space S̃. Then X^f w belongs to the space S spanned by the forecast perturbation ensemble and x = x̄^f + X^f w is the corresponding model state. Note that if w is a random Gaussian vector with mean 0 and covariance (k−1)^{−1} I, then x = x̄^f + X^f w is Gaussian with mean x̄^f and covariance P^f = (k−1)^{−1} X^f (X^f)^T. The cost function J̃ can then be written on S̃.

Combining (5.39) and (5.40) into the cost function (5.38), one finds that the first term on the right is the orthogonal projection of w onto the null space N of X^f, which depends only on the components of w in the null space, while the second term depends only on the components in the column space S of X^f. It then follows that w^a minimizes J̃ if and only if it is orthogonal to N, and that the corresponding x^a minimizes J.
In order to derive the updated analysis and error covariance matrix, one proceeds to compute a minimizer of J̃ based on the Kalman filter. First of all, one applies the observation operator H to each forecast trajectory x^{f_i}, producing the l-dimensional vectors (l ≤ m, where l is the spatial dimension of the observations) which comprise the forecast observation ensemble,

y^{f_i} = H(x^{f_i}).    (5.42)

As before, let us denote the mean forecast observation with ȳ^f and the l × k forecast observation ensemble perturbation matrix with Y^f, whose ith column is y^{f_i} − ȳ^f. Linearizing,

H(x̄^f + X^f w) ≈ ȳ^f + Y^f w,    (5.43)

and the update in ensemble space reads

w̄^a = K̃ [y − ȳ^f − Y^f w̄].    (5.44)

Remembering that

K̃ = P̃^a H^T R^{−1}    (5.45)

and

P̃^a = (I + P̃^f H^T R^{−1} H)^{−1} P̃^f,    (5.46)

with H replaced by Y^f in ensemble space, one obtains

w̄^a = P̃^a (Y^f)^T R^{−1} [y − ȳ^f]    (5.47)

and

P̃^a = [(k − 1) I + (Y^f)^T R^{−1} Y^f]^{−1}.    (5.48)

The analysis mean and error covariance matrix in model variables are

x̄^a = x̄^f + X^f w̄^a,    (5.49)

P^a = X^f P̃^a (X^f)^T.    (5.50)
Let us now determine an analysis ensemble whose mean and error covariance matrix are given by the last two equations, (5.49) and (5.50). The strategic approach is based on the following points.

1. Since the LETKF adopts localization, the analysis ensemble should depend continuously on the analysis covariance matrix, so that the ensembles obtained at neighboring grid points are consistent with each other. A possible choice is X^a = X^f W^a, where W^a is a square root of (k − 1) P̃^a, so that

   P^a = X^f P̃^a (X^f)^T = X^f (k − 1)^{−1} W^a (W^a)^T (X^f)^T = (k − 1)^{−1} X^a (X^a)^T.    (5.51)

2. Choose a matrix whose columns sum to zero and which has the desired error covariance matrix. Showing that the columns of X^a sum to zero is equivalent to showing that X^a v = 0, where v ≡ (1, 1, . . . , 1)^T. Then, because the columns of Y^f also sum to zero, we have

   (P̃^a)^{−1} v = (k − 1) v + (Y^f)^T R^{−1} Y^f v = (k − 1) v,    (5.52)

   so that v is an eigenvector of W^a with eigenvalue 1, and

   X^a v = X^f W^a v = X^f v = 0.    (5.53)

3. Shift the columns so that the mean (5.49) holds: add w̄^a to each column w^{a_i} of W^a. With these weight vectors one gets the ith analysis ensemble member in model space, x^{a_i} = x̄^f + X^f w^{a_i}.

With the previous positions, the updated analysis mean is

k^{−1} Σ_{i=1}^k x^{a_i} = k^{−1} Σ_{i=1}^k (x̄^f + X^f w^{a_i}) = x̄^f + X^f w̄^a + k^{−1} X^f W^a v = x̄^f + X^f w̄^a,    (5.54)

as required.
Following Hunt et al. [17], the procedure starts with several preliminary computations carried out over the entire model grid.
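A sketch of the local analysis step, following Eqs. (5.42)–(5.54) for a single local region; the function signature, the optional multiplicative inflation factor and the symmetric square root used for W^a are illustrative assumptions.

```python
import numpy as np

def letkf_local_analysis(E_f, Yobs_ens, y, Rinv, infl=1.0):
    """
    Local analysis step (sketch of Eqs. (5.42)-(5.54)).
    E_f      : m_local x k local forecast ensemble (one member per column)
    Yobs_ens : l_local x k ensemble of forecast observations H(x^fi) for the chosen local obs
    y        : l_local observed values
    Rinv     : l_local x l_local inverse observation-error covariance
    infl     : optional multiplicative covariance inflation factor
    """
    m, k = E_f.shape
    x_f = E_f.mean(axis=1)
    X_f = E_f - x_f[:, None]                        # forecast perturbations
    y_f = Yobs_ens.mean(axis=1)
    Y_f = Yobs_ens - y_f[:, None]                   # forecast observation perturbations

    Pa_tilde = np.linalg.inv((k - 1) / infl * np.eye(k) + Y_f.T @ Rinv @ Y_f)   # cf. Eq. (5.48)
    w_a = Pa_tilde @ Y_f.T @ Rinv @ (y - y_f)                                    # cf. Eq. (5.47)

    # W^a = [(k-1) Pa_tilde]^{1/2} via symmetric square root, cf. Eq. (5.51)
    vals, vecs = np.linalg.eigh((k - 1) * Pa_tilde)
    W_a = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

    W = W_a + w_a[:, None]                          # add the mean weight to each column
    return x_f[:, None] + X_f @ W                   # local analysis ensemble, cf. Eq. (5.54)
```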
Because the data assimilation requires a reference model, this section is divided into two parts, one dealing with a Mars general circulation model and another that deals with Mars data assimilation.

Some planetary models derive from the Mintz-Arakawa model for the Earth's atmospheric circulation. There are several versions; from the first one up to the current UCLA model (version 6.4) lie 40 years of research work. Akio Arakawa has been a leader in the field of Earth atmospheric general circulation model (AGCM) development from its beginning. AGCMs are essential tools for studies of global warming and for projecting the consequences of anthropogenic climate change. His AGCMs contain contributions in several areas:

1. numerical schemes suitable for the long model integrations required by climate studies;
2. modeling of cloud processes including cumulus parameterization;
3. modeling of planetary boundary layer (PBL) processes.

The equations for large-scale atmospheric motion, the so-called primitive equations, are defined by:
where Ω is the planetary rotation rate, a is the planetary radius, v is the wind velocity with eastward component u and northward component v, φ is the latitude, λ is the longitude and t the time. F is the horizontal frictional force per unit mass, k is the vertical unit vector, and div_h and grad_h are respectively the divergence and gradient operators on surfaces of constant σ.
3. The thermodynamic equation

∂(πT)/∂t = −div_h(πvT) + C_p^{−1} b R T (dp/dt) + C_p^{−1} π H,    (5.57)

where H is the rate of heating per unit mass and C_p is the specific heat at constant pressure. Here R = 0.188 × 10^7 erg g^{−1} K^{−1} and C_p = 0.879 × 10^7 (1.0 + 0.634 × 10^{−3}(T − 250)) erg g^{−1} K^{−1} for T ranging from 120 to 300 K. The auxiliary relations, which follow from the equation of continuity, are

∂π/∂t = −∫_0^1 div_h(πv) dσ    (5.58)

and

dp/dt = σ [∂π/∂t + v · grad_h(π)] + π σ̇.    (5.60)
Mars general circulation modeling began in the 1960s, when Leovy and Mintz [18] modified the UCLA two-level atmospheric model developed by Mintz [19] and Arakawa [20] for conditions appropriate to Mars, and used it to study the planet's wind systems, thermal structure, and energetics. More information on Mars' geological structure and a description of the present atmosphere can be found in two books: Mars [21] and An Introduction to its Interior, Surface and Atmosphere [22].
The value of σ̇ = 0 at the lower boundary, where σ = 1, except when CO2 condenses or sublimes. It is assumed that dp/dt = 0 at p = p_T, so that σ̇ = 0 where σ = 0. The heating rate is the combined effect of solar heating, IR radiative transfer, convective exchange and the latent heat due to condensation and sublimation of CO2, which is the main gas on Mars.
Thus the rate of heating per unit mass is

H_i = g/(Δp)_i (ΔS_i + ΔW_i + ΔC_i),    (5.61)

where ΔS_i, ΔW_i, ΔC_i are the differences between the net fluxes at the top and bottom of the layer due to solar radiation, thermal radiation and sub-grid convection. Leovy and Mintz [18] report the equations to compute these quantities. The mass of the layer is represented by (Δp)_i/g, where (Δp)_i is the increase in pressure from the top to the bottom of the layer. In the region where the pressure is lower than p_T it is assumed that heat is absorbed.
Following the Mintz [19] and Arakawa [20] two-layer Earth atmosphere model, they subdivided the Mars atmosphere into an upper layer with i = 1 and a lower layer with i = 3, where (Δp)_1 = (Δp)_3 = (p_s − p_T)/2. The heating H_i is defined as follows.

1. Solar heating
   In the CO2 near-IR absorption bands, the flux differences can be formulated as

   ΔS_1 = (r_m/r)^2 (sin γ)^{1/2} {465 + [2397 + 531 ln(csc γ)] (sin γ)^{1/2}}    (5.62)

   and

   ΔS_3 = (r_m/r)^2 (sin γ)^{1/2} {378 + 657 (sin γ)^{1/2}},    (5.63)

   where r_m/r is the ratio of the Mars mean distance from the Sun to its actual distance and γ is the solar elevation angle.
2. Infrared heating
   The flux differences ΔW_1 and ΔW_3 in the 15 μm IR region, where the main infrared band of CO2 is located, involve the function

   Y(T_k) = [exp(964.1/T_k) − 1].    (5.66)

3. Convective exchange
   The net upward convective heat flux at the surface is

   C_4 = C_p ρ̄ C_H(X) u_* (T_G − T_4),    (5.67)

   where ρ̄ is the mean surface air density, u_* is the surface friction velocity and C_H(X) is a convective heat transfer coefficient which depends on X = [κ g (T_G − T_4)/T̄]^{1/2}/u_*, κ being the molecular thermal diffusivity at the surface and T̄ the global mean surface air temperature. For stable conditions one assumes C_H(X) = C_M, where C_M is the momentum drag coefficient for stable conditions. When the lapse rate is unstable (Γ > Γ_a), the upward convective heat flux at level 2 is

   C_2 = 4 × 10^{−2} T_2 (Γ − Γ_a),    (5.69)

   where Γ is the actual lapse rate between T_1 and T_3 and Γ_a = g/C_p is the adiabatic lapse rate, Γ_a = 4.23 (1.0 + 0.634 × 10^{−3}(T_2 − 250))^{−1} K km^{−1}. When the lapse rate is stable but exceeds the threshold Γ_m ≈ 2.5 K km^{−1} and when the surface heat flux C_4 is upward, a convective exchange arises. In such a case

   C_2 = (Γ − Γ_m)/(Γ_a − Γ_m) C_4.    (5.70)
4. Ground temperature
   From the surface energy balance

   (1 − A) S_4 − W_4 − C_4 − D − L = 0,    (5.71)

   one can obtain the ground temperature T_G. In it, A is the surface albedo, S_4 is the downward solar radiation, W_4 is the net upward IR radiation at the surface, C_4 is the net upward convective heat flux at the surface, D is the downward conductive heat flux into the soil and L is the latent heat release due to CO2 condensation on the surface.
5. Friction and lateral diffusion
   Some of the thermal energy introduced into the system is converted into kinetic energy of the wind field, which is dissipated by the frictional force F linked with the vertical eddy stress. For the two-layer model of Leovy and Mintz [18] one has

   F_1 = 2g τ_2 / (p_s − p_T)    (5.72)

   and

   F_2 = 2g (τ_2 − τ_s) / (p_s − p_T),    (5.73)

   where (p_s − p_T)/(2g) is the mass per unit area of each layer and τ_2 is the stress on the lower layer due to turbulent exchange,

   τ_2 = C (p_s − p_T)/(2g) (v_1 − v_2),    (5.74)

   where u_* = C_M |v_s|, with v_s the surface wind and C_M the momentum drag coefficient.

   Since in the Earth simulation model there is also a lateral diffusion, Leovy and Mintz introduced Mars lateral diffusion coefficients depending on the local grid distance.
Several sophisticated Mars global circulation models (GCMs) have been developed since. As an example, Richardson et al. [23] derived PlanetWRF from WRF, which had been designed for modelling the terrestrial mesoscale and microscale. WRF was modified to work as a planetary model (see http://planetwrf.com/model) and the main changes are:
Fig. 5.1 PlanetWRF global grid. Looking down on the pole (left); C-grid cylindrical projection (right). The grid points are equally spaced in longitude and latitude with horizontally staggered u, v and mass points. At the pole all v points correspond to one physical location. Source: Richardson et al. [23]
the meridional velocity constantly equal to zero. Flux and gradient calculations across the pole are not allowed. This does not preclude advection of material across the pole: the advection over the poles is instead accomplished by zonal transport within the most poleward zone.
   Since the physical distance for zonal advection of information decreases rapidly to zero at the pole, the model time step would need to be reduced to avoid instabilities in the horizontal direction. In numerical analysis, when certain partial differential equations (usually hyperbolic PDEs) are solved, the time step must be less than a certain threshold, otherwise the simulation will produce incorrect results. The Courant-Friedrichs-Lewy [25] or CFL criterion is a necessary condition for convergence of the method of finite differences. To be able to use longer time steps (those suitable for more tropical latitudes), PlanetWRF has implemented polar Fourier filtering of the higher-frequency components of the state variables. All grid points poleward of 60 degrees are filtered, with the cutoff frequency being a function of the cosine of latitude. To yield the greatest stability, the column mass, horizontal winds, temperature and tracers (moisture, aerosols, chemicals, etc.) are also filtered.
3. Generalized Planetary Parameters and Timing Conventions.
   All planet-specific parameters, such as orbital parameters, the relationship between MKS (SI) seconds and model seconds (defined as 1/86400 of the rotational period of the planet), reference pressure, gravity and gas constant, are set in one centralized module, where the set of consistent planetary parameters also resides.
   Since the model assumes that one day is made up of 24 h, each of which is composed of 60 min, in turn made up of 60 s, there is always an integer number of time steps per day. However, the dynamics and physics routines are still integrated in MKS (SI) units, with the conversion from model to SI time made before calculating tendencies and physical quantities. Other items are:
a. Since the WRF version uses the standard day-month-year calendar format, this convention is used to drive the solar radiation routines and to label model output.
b. Since WRF uses routines from the standardized Earth System Modeling Framework (ESMF), PlanetWRF converts these routines to drive the model with user-specified orbital elements using the planetocentric solar longitude (L_s) date system (with L_s = 0 corresponding to northern hemisphere spring equinox and L_s = 90 to northern hemisphere summer solstice, etc.).
4. Parameterizations of sub-grid scale physical processes for various planets.
   Depending on the planet, physical routines have been added to WRF to treat the radiative transfer in an atmosphere with high aerosol optical thickness and gaseous CO2, also taking into account the condensation/sublimation of CO2 from the polar caps (Mars). For Titan, an updated version of the radiative transfer scheme described by McKay et al. [26] has been developed. Simple surface drag and radiative relaxation schemes, similar to the Held and Suarez [27] forcing, have been used to validate the global dynamical core and to treat the Venusian atmosphere. The concept of a dynamical core was introduced to extend the ideas of plug-compatible parameterizations to the coding of the dynamics. PlanetWRF keeps the existing WRF horizontal and vertical diffusion parameterization schemes, as the physics of diffusion remains the same, with only the diffusivities varying.
To perform an assimilation, Lee et al. [28] adopted version 3.0.1.1 of the MarsWRF climate model (Richardson et al. [23]), with radiative forcing in the GCM computed using a two-stream, single-scattering radiative flux solver based on the Hadley Centre Unified Model algorithm (Edwards and Slingo [29]), modified by Mischna et al. [30]. This parameterization calculates fluxes in the visible and infrared spectra using a correlated-k method to couple the optical properties of the carbon dioxide atmosphere with the Mie scattering [31] parameters, in order to describe a radiative atmosphere in which dust and water aerosols are also present.
To create the ensemble of model states, the steady state atmosphere is perturbed
using additive Gaussian white noise perturbations on the temperature, surface pres-
sure, horizontal wind, emissivity, albedo, and column dust opacity. Each ensemble
member is then integrated for a certain number of sols to reach a new steady state.
The term sol is used by planetary astronomers to refer to the duration of a solar day
on Mars. A mean Martian solar day, or sol, is 24 h, 39 min, and 35.244 s. The magni-
tudes of the perturbations are small (e.g. standard deviations of 2 K in temperature,
5 m/s in horizontal wind, 0.02 in albedo and emissivity) and are constrained to keep
albedo and emissivity between zero and one.
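A schematic of this perturbation strategy is sketched below; the real system perturbs full three-dimensional MarsWRF state fields, while here the state is reduced to a few scalar quantities and the variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_ens = 16                                    # illustrative ensemble size
base = {"T": 210.0, "u": 10.0, "albedo": 0.25, "emissivity": 0.95}
pert_std = {"T": 2.0, "u": 5.0, "albedo": 0.02, "emissivity": 0.02}   # magnitudes quoted in the text

ensemble = []
for _ in range(n_ens):
    member = {k: base[k] + rng.normal(scale=pert_std[k]) for k in base}
    # keep albedo and emissivity physically bounded between zero and one
    member["albedo"] = float(np.clip(member["albedo"], 0.0, 1.0))
    member["emissivity"] = float(np.clip(member["emissivity"], 0.0, 1.0))
    ensemble.append(member)
```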
At each assimilation step, an observation forward model within the data assimilation system is used to simulate the observations from the state vectors of the ensemble members. The simulated observations are the calibrated radiances observed by the Thermal Emission Spectrometer (TES) [32] aboard the Mars Global Surveyor (MGS). TES is a nadir-sounding grating spectrometer, observing the spectrum from 200 to 1700 cm^{−1} with a resolution of 5 cm^{−1} or 10 cm^{−1} depending on the scan mode. These data can be obtained from the NASA Planetary Data System archive (PDS, http://pds.jpl.nasa.gov).
The integration step is performed by the MarsWRF GCM under the control of the DART software, which maintains the ensemble of model states and provides the initial conditions necessary to continue the simulation. The assimilation step is performed by the DART software.

DART is a facility for ensemble data assimilation developed and maintained by the Data Assimilation Research Section (DAReS) at the National Center for Atmospheric Research (NCAR), http://www.image.ucar.edu/DAReS/DART/. DART makes it possible to explore a variety of data assimilation methods and observations with different numerical models, and is designed to facilitate the combination of assimilation algorithms, models, and real (as well as synthetic) observations to allow increased understanding of all three.
Sequential Monte Carlo (SMC) methods do not rely on Gaussian approximations. They are a set of simulation-based methods that provide a flexible alternative for computing posterior distributions. They are applicable in very general settings and are often relatively easy to implement. SMC methods are also known under the names of particle filters, bootstrap filters, condensation, Monte Carlo filters, interacting particle approximations and survival of the fittest. Good introductions can be found, for instance, in Arulampalam et al. [37].
As we have seen in previous chapters, Bayesian filtering covers different scenarios: from the Kalman filter to Sequential Monte Carlo sampling, the so-called particle filters. Among the Monte Carlo methods, one of the simplest is Importance Sampling (IS), which introduces the idea of generating samples from a known, easy-to-sample probability density function (pdf) q(x), called the importance density or proposal density, and then correcting the weights of each sample so that the weighted samples approximate the desired density. However IS, in its simplest form, is not adequate for sequential estimation. Whenever new data become available, one needs to recompute the importance weights over the entire state sequence.
Sequential Importance Sampling (SIS) modifies IS so that it becomes possible to compute an estimate of the posterior without modifying the past simulated trajectories. The problem encountered by the SIS method is that, as time increases, the distribution of the importance weights becomes more and more skewed. For instance, if the support of the importance density is broader than that of the posterior density, some particles will have their weights set to zero in the update stage. But even if the supports coincide exactly, many particles will decrease in weight over time, so that after a few time steps only a few lucky survivors have significant weights, while a large computational effort is spent on propagating unimportant particles.
It has been shown that the variance of the weights can only increase over time, so the degeneracy problem cannot be avoided (Kong et al. [38]). Two solutions exist to mitigate it:
1. a good choice of the importance density,
2. resampling.
Sequential Importance Resampling (SIR) introduces, however, other problems. Since particles are sampled from discrete approximations to density functions, the particles with high weights are statistically selected many times. This leads to a loss of diversity among the particles, as the resulting sample will contain many repeated points. This is known as sample impoverishment (see Arulampalam et al. [37]) and is severe when the model forecast is very narrow. There are various methods to deal with this problem, including sophisticated methods that recalculate past states and weights via a recursion, and Markov Chain Monte Carlo (MCMC) methods. Because of the additional problems introduced by resampling, it makes sense to resample only when the weights have degenerated appreciably, that is, when their variance has grown large.
In the next section we introduce the earthquake model and the Sequential Monte Carlo and related methods applicable to earthquake events.
5.4.1 Renewal Process as Forecast Model
One of the models used in seismology is the renewal process. In general, the physical models used for seismic hazard maps are not well defined and are not unique, and even the renewal process models, which are motivated by the elastic rebound theory proposed by Reid [39], are under discussion. According to this theory, a large earthquake releases the elastic strain that has built up since the last large earthquake.
A renewal point process is characterized by intervals between successive events that are independently and identically distributed according to a probability density function that defines the process (Daley and Vere-Jones [40]).
The time of the next event depends only on the time of the last event, t_k = t_{k-1} + τ_k, where τ_k represents the interval between the two events. In this framework the time of the event t_k corresponds, in data assimilation terms, to the model state, and the renewal point process provides the prior information for the analysis.
Werner et al. [41] offer a pedagogical example of earthquake forecasting. Their model is based on a lognormal renewal process [42, 43].
The intervals τ of the lognormal process are distributed according to

f(\tau, \mu, \sigma) = \frac{1}{\tau\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log\tau - \mu)^2}{2\sigma^2}\right)   (5.77)

so that

p(t_k \mid t_{k-1}, \mu, \sigma) = \frac{1}{(t_k - t_{k-1})\,\sigma\sqrt{2\pi}}\exp\left(-\frac{(\log(t_k - t_{k-1}) - \mu)^2}{2\sigma^2}\right)   (5.78)

with μ = 0.7 and σ = 0.245 given by Biasi et al. [44]. Since the observed occurrence time t_k^o is given by the true occurrence time t_k^t affected by the noise ε_k, we can write

t_k^o = t_k^t + \epsilon_k.   (5.79)
Werner et al. [41] ran their numerical experiments using two different noise distributions, a uniform and a Gaussian one.
1. Uniform distribution:

p_{\mathrm{uniform}}(\epsilon) = \frac{1}{2\Delta}\,H(\epsilon + \Delta)\,H(\Delta - \epsilon)   (5.80)

that is,

p_{\mathrm{uniform}}(\epsilon) = \begin{cases} \dfrac{1}{2\Delta} & -\Delta \le \epsilon \le +\Delta \\ 0 & \text{otherwise} \end{cases}

where H is the Heaviside step function and Δ is the half-width of the noise.
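As an illustration of Eqs. (5.77)–(5.80), the following minimal sketch simulates a sequence of true occurrence times from the lognormal renewal model and corrupts them with uniform observation noise; the noise half-width and the number of events are illustrative placeholders, not values taken from Werner et al. [41].

import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.7, 0.245      # lognormal parameters as quoted in the text
delta = 0.1                 # half-width of the uniform observation noise (illustrative)
n_events = 10

# True occurrence times: t_k = t_{k-1} + tau_k, tau_k lognormal(mu, sigma), Eqs. (5.77)-(5.78)
tau = rng.lognormal(mean=mu, sigma=sigma, size=n_events)
t_true = np.cumsum(tau)

# Observed times: t_k^o = t_k^t + eps_k with eps_k uniform on [-delta, +delta], Eqs. (5.79)-(5.80)
eps = rng.uniform(-delta, delta, size=n_events)
t_obs = t_true + eps

print(np.round(t_true, 2))
print(np.round(t_obs, 2))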
The forecast and observation models can be written in the general state-space form

x_k = f_k(x_{k-1}, w_{k-1}), \qquad y_k = h_k(x_k, v_k),

where f_k and h_k are possibly nonlinear functions and w_{k-1} and v_k are independent and identically distributed (i.i.d.) process and measurement noise sequences, respectively. The problem is to seek filtered estimates of x_k based on the set of all available measurements {y_i, i = 1, ..., k}, as we have already seen in the Bayesian chapter.
From a Bayesian point of view, the tracking problem is to construct the probability density function (pdf) p(x_{0:k} | y_{1:k}), which can be obtained recursively in two stages: forecast and update. The initial pdf is p(x_0 | y_0) ≡ p(x_0), as we have seen previously. When the true density is non-Gaussian, approximate grid-based filters and particle filters improve the performance with respect to other approaches that require a Gaussian approximation.
In Monte Carlo sampling an empirical posterior density at time k can be expressed as

\hat{p}(x_{0:k} \mid y_{1:k}) = \frac{1}{N_p}\sum_{i=1}^{N_p} \delta(x_{0:k} - x_{0:k}^i)   (5.85)

where δ is the Dirac delta mass located at x_{0:k}^i. Considering a function f(x_{0:k}) integrable in a measurable space, the Lebesgue–Stieltjes integral provides the estimate of such a function,

I_{N_p}(f) = \int f(x_{0:k})\,\hat{p}(x_{0:k} \mid y_{1:k})\,dx_{0:k} = \frac{1}{N_p}\sum_{i=1}^{N_p} f(x_{0:k}^i),   (5.86)

which is unbiased.
In importance sampling the objective is to sample the distribution in the region of importance, in order to achieve computational efficiency. This is especially important in high-dimensional spaces, where the data are usually sparse and the region of interest in which the target lies is relatively small with respect to the whole data space.
Thus let {x_{0:k}^i, w_k^i} denote a random measure that characterizes the posterior pdf p(x_{0:k} | y_{1:k}), where {x_{0:k}^i, i = 1, 2, ..., N_p} is a set of support points with associated weights {w_k^i ∝ p(x_{0:k}^i)/q(x_{0:k}^i), i = 1, 2, ..., N_p} and x_{0:k} = {x_j, j = 0, ..., k} is the set of all states up to time k.
One chooses a proposal distribution q(x_{0:k} | y_{1:k}) in place of the true probability distribution p(x_{0:k} | y_{1:k}), so that

p(x_{0:k} \mid y_{1:k}) = \frac{p(x_{0:k} \mid y_{1:k})}{q(x_{0:k} \mid y_{1:k})}\,q(x_{0:k} \mid y_{1:k}),   (5.87)

where the importance weight function is

w(x_{0:k}) = \frac{p(x_{0:k} \mid y_{1:k})}{q(x_{0:k} \mid y_{1:k})}.   (5.89)
When one generates N_p independent and identically distributed samples x_{0:k}^i from the proposal q(x_{0:k} | y_{1:k}), the normalized importance weights are

w_k^i = \frac{w(x_{0:k}^i)}{\sum_{j=1}^{N_p} w(x_{0:k}^j)},   (5.91)

and the posterior is approximated by

\hat{p}(x_{0:k} \mid y_{1:k}) = \sum_{i=1}^{N_p} w_k^i\,\delta(x_{0:k} - x_{0:k}^i).   (5.92)
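The following minimal sketch shows the importance sampling idea of Eqs. (5.87)–(5.92) in a static setting: samples drawn from an easy-to-sample proposal are reweighted so that a weighted average approximates an expectation under the target density. The particular target and proposal densities are illustrative choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_p = 5000

# Target p(x): a standard normal; proposal q(x): a broader Student-t (illustrative choices).
p = stats.norm(loc=0.0, scale=1.0)
q = stats.t(df=3, loc=0.0, scale=2.0)

x = q.rvs(size=n_p, random_state=rng)   # samples from the proposal
w = p.pdf(x) / q.pdf(x)                 # unnormalized weights, Eq. (5.89)
w /= w.sum()                            # normalized weights, Eq. (5.91)

# Weighted estimate of E_p[f(x)] with f(x) = x**2 (true value is 1).
estimate = np.sum(w * x**2)
print(estimate)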
Importance sampling can be useful to reduce the variance of an estimator and whenever it is difficult to sample from the true probability. In general, however, it is difficult to find a good proposal distribution, especially in a high-dimensional space. If the importance density q(x_{0:k} | y_{1:k}) at time k admits as a marginal the importance density at time k−1, the proposal can be constructed sequentially, which is the basis of Sequential Importance Sampling (SIS):

q(x_{0:k} \mid y_{1:k}) = q(x_0)\prod_{l=1}^{k} q(x_l \mid x_{0:l-1}, y_{1:l}).   (5.94)

For a Markov process with conditionally independent observations,

p(x_{0:k}) = p(x_0)\prod_{l=1}^{k} p(x_l \mid x_{l-1})   (5.95)

and

p(y_{1:k} \mid x_{0:k}) = \prod_{l=1}^{k} p(y_l \mid x_l).   (5.96)
According to the laws of probability, the importance weights w_k^i given by relation (5.91), using Eqs. (5.94), (5.95) and (5.96), can be written recursively as

w_k^i \propto w_{k-1}^i\,\frac{p(y_k \mid x_k^i)\,p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{0:k-1}^i, y_{1:k})}.   (5.97)

This equation provides the importance weights for the sequential updating. The advantage of SIS is that it does not rely on the underlying Markov chain. The disadvantage is that the importance weights may have large variances and there is a problem of degeneracy. In order to minimize this problem there are two possible solutions: the first is to choose a good importance density, while the second is resampling.
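A single SIS step can be sketched as follows for a toy scalar state-space model, using the transition density as the importance density (the bootstrap choice), in which case the weight update reduces to multiplying the previous weight by the likelihood of the new observation. The model coefficients and noise levels are illustrative assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_p = 1000

# Toy state-space model (illustrative): x_k = 0.9 x_{k-1} + w,  y_k = x_k + v
x = rng.normal(0.0, 1.0, size=n_p)       # particles at time k-1
w = np.full(n_p, 1.0 / n_p)              # weights at time k-1
y_k = 1.3                                # a new observation

# Proposal = transition density (bootstrap filter), so only the likelihood enters the update.
x = 0.9 * x + rng.normal(0.0, 0.5, size=n_p)      # propagate particles
w = w * stats.norm(loc=x, scale=0.3).pdf(y_k)     # w_k ~ w_{k-1} * p(y_k | x_k), Eq. (5.97)
w /= w.sum()                                      # renormalize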
The sampling-importance resampling (SIR) method is motivated by the bootstrap and jackknife techniques (Efron and Tibshirani [46]). The jackknife is a resampling technique, while bootstrapping evaluates the properties of an estimator through the empirical cumulative distribution function (cdf) of the samples instead of the true cdf.
In order to overcome the degeneracy, a suitable criterion is to compute the effective sample size N_eff (Liu and Chen [47]), defined as

N_{\mathrm{eff}} = \frac{N_p}{1 + \mathrm{Var}(w_k^{*i})}   (5.98)

where

w_k^{*i} = \frac{p(x_k^i \mid y_{1:k})}{q(x_k^i \mid x_{k-1}^i, y_k)}   (5.99)

is the so-called true weight. Since this cannot be evaluated exactly, an estimate \hat{N}_{\mathrm{eff}} can be obtained as the inverse of the so-called participation ratio (Mézard et al. [48]), computed from the normalized weights w_k^i:

\hat{N}_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N_p} (w_k^i)^2}.   (5.100)
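A minimal sketch of the resulting SIR recipe is the following: the normalized weights give the estimated effective sample size through Eq. (5.100), and when it drops below a chosen threshold the particles are resampled (here with systematic resampling) and the weights are reset. The threshold and the test data are illustrative assumptions.

import numpy as np

def effective_sample_size(w):
    # Estimate N_eff as the inverse participation ratio, Eq. (5.100).
    return 1.0 / np.sum(w**2)

def systematic_resample(x, w, rng):
    # Draw len(w) particles with probabilities w using a single uniform offset.
    n_p = len(w)
    positions = (rng.uniform() + np.arange(n_p)) / n_p
    idx = np.searchsorted(np.cumsum(w), positions)
    return x[idx]

rng = np.random.default_rng(4)
x = rng.normal(size=1000)                      # particles
w = rng.exponential(size=1000); w /= w.sum()   # some normalized weights

if effective_sample_size(w) < 0.5 * len(w):    # resample only when degeneracy sets in
    x = systematic_resample(x, w, rng)
    w = np.full(len(w), 1.0 / len(w))          # weights are reset after resampling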
References
1. Lorenz, E.N.: Designing chaotic models. J. Atmos. Sci. 62, 1574–1587 (2005)
2. Lorenz, E.N.: Atmospheric Predictability. Advances in Numerical Weather Prediction, 1965–66 Seminar Series, pp. 34–39. Travelers Research Center, Inc. (1966)
3. Lorenz, E.N., Emanuel, K.A.: Optimal sites for supplementary weather observations: simulation with a small model. J. Atmos. Sci. 55, 399–414 (1998)
4. Lorenz, E.N.: Deterministic non-periodic flow. J. Atmos. Sci. 20, 130–141 (1963)
5. Saltzman, B.: Finite amplitude free convection as an initial value problem I. J. Atmos. Sci. 19, 329–341 (1962)
6. Lyapunov, A.M.: The General Problem of the Stability of Motion. Translated by Fuller, A.T. Taylor and Francis, London (1992). ISBN: 978-0-7484-0062-1. Reviewed in detail by Smith, M.C.: Automatica 3(2), 353–356 (1995)
7. McLaughlin, J.B., Martin, P.C.: Transition to turbulence of a statically stressed fluid system. Phys. Rev. A 12, 186 (1975)
8. Glatzmaier, G.A., Roberts, P.H.: A three-dimensional self-consistent computer simulation of a geomagnetic field reversal. Nature 377, 203–209 (1995)
9. Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T.: Numerical Recipes. Cambridge University Press, Cambridge (1986)
10. Kuhl, D., Kostelich, E.: Introduction to LETKF Data Assimilation (2008). https://www.atmos.umd.edu/~dkuhl/AOSC614/LETKF_lab.pdf
11. Bannister, R.: A square root ensemble Kalman filter demonstration with the Lorenz model (2012). https://www.met.reading.ac.uk/textdarc/training/lorenze_nsrkf/
12. Migliorini, S.: Ensemble data assimilation with the Lorenz equations (2010). https://www.met.reading.ac.uk/~hraa/projects/05/index.html
13. Chaplain, M.: Modelling Aspects of Cancer Growth: Insight from Mathematical and Numerical Analysis and Computational Simulation. Multiscale Problems in the Life Sciences, Lecture Notes in Mathematics 1940, 147–200 (2008)
14. Swanson, K.R., Alvord Jr., E.C., Murray, J.D.: A quantitative model of differential motility of gliomas in white and grey matter. Cell Prolif. 33, 317–329 (2000)
15. Giatili, S.G., Stamatakos, G.S.: A detailed numerical treatment of the boundary conditions imposed by the skull on a diffusion-reaction model of glioma tumor growth. Clinical validation aspects. Appl. Math. Comput. 218 (2012)
16. Kostelich, E.J., Kuang, Y., McDaniel, J.M., Moore, N.Z., Martirosyan, N.L., Preul, M.C.: Accurate state estimation from uncertain data and models: an application of data assimilation to mathematical models of human brain tumors. Biol. Direct 6, 64 (2011). http://www.biology-direct.com/content/6/1/64
17. Hunt, B.R., Kostelich, E.J., Szunyogh, I.: Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230, 112–126 (2007)
18. Leovy, C., Mintz, Y.: The numerical simulation of atmospheric circulation and climate of Mars. J. Atmos. Sci. 26(6), 1167–1190 (1969)
19. Mintz, Y.: Very long-term global integration of the primitive equations of atmospheric motion (an experiment in climate simulation). WMO Tech. Notes No. 66, 141–167; also Meteorol. Monogr. 8(30), 20–36 (1965)
20. Arakawa, A.: Numerical simulation of large-scale atmospheric motions. In: Birkhoff, G., Varga, R.S. (eds.) Numerical Solution of Field Problems in Continuum Physics, vol. 2, pp. 24–40. American Mathematical Society, Providence, R.I. (1970)
21. Matthews, M.S., Kieffer, H.H., Jakosky, B.M., Snyder, C.: Mars. The University of Arizona Press
22. Barlow, N.: Mars: An Introduction to its Interior, Surface and Atmosphere. Cambridge Planetary Science. Cambridge University Press, Cambridge (2014)
23. Richardson, M.I., Toigo, A.D., Newman, C.E.: PlanetWRF: a general purpose, local to global numerical model for planetary atmospheric and climate dynamics. J. Geophys. Res. 112 (2007)
24. Arakawa, A., Lamb, V.R.: Computational design of the basic dynamical processes of the UCLA General Circulation Model. Methods of Computational Physics 17, pp. 173–265. Academic Press, New York (1977)
25. Courant, R., Friedrichs, K., Lewy, H.: On the partial difference equations of mathematical physics. IBM J. (1967)
26. McKay, C.P., Pollack, J.B., Courtin, R.: The thermal structure of Titan's atmosphere. Icarus 80(1), 23–53 (1989)
27. Held, I.M., Suarez, M.J.: A proposal for the intercomparison of the dynamical cores of atmospheric general-circulation models. Bull. Am. Meteorol. Soc. 72(10), 1825–1830 (1994)
28. Lee, C., Lawson, W.G., Richardson, M.I., Anderson, J.L., Collins, N., Hoar, T., Mischna, M.: Demonstration of ensemble data assimilation for Mars using DART, MarsWRF, and radiance observations from MGS TES. J. Geophys. Res. 116 (2011)
29. Edwards, J.M., Slingo, A.: Studies with a flexible new radiation code: 1. Choosing a configuration for a large scale model. Q. J. R. Meteorol. Soc. 122(531), 689–719 (1996)
30. Mischna, M.A., Toigo, A.D., Newman, C.E., Richardson, M.I.: Development of a new global, scalable and generic general circulation model for studies of the Martian atmosphere. Paper presented at the Second Workshop on Mars Atmosphere Modelling and Observations, Granada, Spain, CNES, 27 February–3 March (2006)
31. Levoni, C., Cervino, M., Guzzi, R., Torricella, F.: Atmospheric aerosol optical properties: a database of radiation characteristics for different components and classes. Appl. Opt. 36 (1997)
32. Christensen, P.R., et al.: Mars Global Surveyor Thermal Emission Spectrometer experiment: investigation description and surface science results. J. Geophys. Res. 106(E10) (2001)
33. Montabone, L., Lewis, S.R., Read, P.L., Hinson, D.: Validation of Martian meteorological data assimilation for MGS/TES using radio occultation measurements. Icarus 185, 113–132 (2006)
34. Lewis, S.R., Read, P.L., Conrath, B.J., Pearl, J.C., Smith, M.D.: Assimilation of Thermal Emission Spectrometer atmospheric data during the Mars Global Surveyor aerobraking period. Icarus 192(2), 327–347 (2007)
35. Greybush, S.J., Wilson, R.J., Hoffman, R.N., Hoffman, M.J., Miyoshi, T., Ide, K., McConnochie, T., Kalnay, E.: Ensemble Kalman filter data assimilation of Thermal Emission Spectrometer temperature retrievals into a Mars GCM. J. Geophys. Res. 117 (2012)
36. Navarro, T., Forget, F., Millour, E., Greybush, S.J.: Detection of detached dust layers in the Martian atmosphere from their thermal signature using assimilation. Geophys. Res. Lett. 41(19), 6620–6626 (2014)
37. Arulampalam, M., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
38. Kong, A., Liu, J., Wong, W.: Sequential imputations and Bayesian missing data problems. J. Am. Stat. Assoc. 89(425), 278–288 (1994)
39. Reid, H.: The Mechanics of the Earthquake, The California Earthquake of April 18, 1906, Report of the State Investigation Commission, vol. 2. Carnegie Institution of Washington, Washington (1910)
40. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes, vol. I. Springer, New York (2003)
41. Werner, M.J., Ide, K., Sornette, D.: Earthquake forecasting based on data assimilation: sequential Monte Carlo methods for renewal point processes. Nonlinear Process. Geophys. 18, 49–70 (2011). www.nonlin-processes-geophys.net/18/49/2011/
42. Ogata, Y.: Seismicity analysis through point-process modeling: a review. Pure Appl. Geophys. 155, 471–507 (1999)
43. Field, E.H.: A summary of previous working groups on California earthquake probabilities. Bull. Seismol. Soc. Am. 97(4), 1033–1053 (2007)
44. Biasi, G., Weldon, R., Fumal, T., Seitz, G.: Paleoseismic event dating and the conditional probability of large earthquakes on the southern San Andreas fault, California. Bull. Seismol. Soc. Am. 92, 2761–2781 (2002)
45. Marshall, A.: The use of multi-stage sampling schemes in Monte Carlo computations. In: Meyer, M. (ed.) Symposium on Monte Carlo Methods, pp. 123–140. Wiley, New York (1956)
46. Efron, B., Tibshirani, R.: An Introduction to the Bootstrap. Chapman and Hall/CRC, Boca Raton (1993). ISBN 0-412-04231-2
47. Liu, J.S., Chen, R.: Sequential Monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032–1044 (1998)
48. Mézard, M., Parisi, G., Virasoro, M.: Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, vol. 9. World Scientific, Singapore (1987)
Appendix
Consider two real matrices A and B of the same dimensions and their Schur (Hadamard) product A ∘ B, whose elementwise definition is recalled below. The following properties hold:
1. if A and B are positive semidefinite, then so is A ∘ B;
2. if B is positive definite and A is positive semidefinite with all its main diagonal entries positive, then A ∘ B is positive definite (Horn [1]);
3. A ∘ (BC) ≠ (A ∘ B)C in general;
4. A(B ∘ C) ≠ (A ∘ B)C in general.
Points 3 and 4 will be needed for the approximations involved when the Schur product is included in the EnKF and ETKF. They can be verified by writing the relevant quantities in matrix index notation, where i is the row index, j is the column index and k is an index to be summed over:
[A ∘ B]_{ij} = A_{ij} B_{ij}

[(A ∘ B)C]_{ij} = \sum_{k=1}^{N} A_{ik} B_{ik} C_{kj}

[BC]_{ij} = \sum_{k=1}^{N} B_{ik} C_{kj}

[A ∘ (BC)]_{ij} = A_{ij} \sum_{k=1}^{N} B_{ik} C_{kj}   (A.1)

[A(B ∘ C)]_{ij} = \sum_{k=1}^{N} A_{ik} B_{kj} C_{kj}   (A.2)
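Points 3 and 4 can also be checked numerically; the following minimal sketch compares the two sides of each relation for small random matrices.

import numpy as np

rng = np.random.default_rng(5)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

lhs_3 = A * (B @ C)   # A o (BC): elementwise product with the matrix product BC
rhs_3 = (A * B) @ C   # (A o B)C
lhs_4 = A @ (B * C)   # A(B o C)

print(np.allclose(lhs_3, rhs_3))   # False in general
print(np.allclose(lhs_4, rhs_3))   # False in general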
Now we cover the various calculation methods used in previous chapters, in order to provide a comprehensive and self-consistent framework of reference.
Consider a first-order linear partial differential equation of the form

a(x, y)\,\frac{\partial u}{\partial x} + b(x, y)\,\frac{\partial u}{\partial y} + c(x, y)\,u = 0   (A.3)

with the boundary condition u(x, 0) = f(x), where u = u(x, y) is the unknown function we need to find and the expressions a(x, y), b(x, y), c(x, y) and f(x) are given.
The objective of the method of characteristics, when it is applied to these equations, is to change coordinates from (x, y) to a new coordinate system (x_0, s) in which the partial differential equation (PDE) becomes an ordinary differential equation (ODE) along certain curves of the x–y plane. These curves, along which the solution of the PDE reduces to an ODE, are called characteristic curves, or simply characteristics. While the new variable s varies along a characteristic curve, the new variable x_0 remains constant.
Let us now transform the PDE into an ODE. Along a curve parameterized by s we have

\frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial s}.   (A.4)

If we select

\frac{\partial x}{\partial s} = a(x, y)   (A.5)

and

\frac{\partial y}{\partial s} = b(x, y)   (A.6)

we have

\frac{\partial u}{\partial s} = a(x, y)\frac{\partial u}{\partial x} + b(x, y)\frac{\partial u}{\partial y}   (A.7)

so that, using (A.3),

\frac{du}{ds} + c(x, y)\,u = 0.   (A.8)
The relations (A.5) and (A.6) are the characteristic equations.
The strategy to be adopted to apply the method of characteristics is to:
1. solve the characteristic equations (A.5) and (A.6);
2. integrate them with the initial conditions x(0) = x_0 and y(0) = 0;
3. solve the ODE (A.8) with the initial condition u(x_0, 0) = f(x_0);
4. obtain the solution by expressing s and x_0 in terms of x and y (using the results of step 1) and substituting these values into u(x_0, s) to get the solution u(x, y) of the PDE (a numerical sketch of this recipe is given below).
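The recipe can be illustrated with a minimal numerical sketch for constant coefficients a, b and c, for which the characteristics are straight lines and the solution along a characteristic is u = f(x_0) e^{-cs}; the coefficients and the initial profile below are illustrative choices.

import numpy as np
from scipy.integrate import solve_ivp

a, b, c = 2.0, 1.0, 0.5                 # constant coefficients (illustrative)
f = lambda x: np.exp(-x**2)             # initial data u(x, 0) = f(x)

def rhs(s, z):
    # z = (x, y, u); characteristic system given by (A.5), (A.6) and (A.8)
    x, y, u = z
    return [a, b, -c * u]

x0, s_end = 0.3, 2.0
sol = solve_ivp(rhs, [0.0, s_end], [x0, 0.0, f(x0)])
x_end, y_end, u_num = sol.y[:, -1]

# Analytic solution along the same characteristic: u = f(x0) * exp(-c * s)
u_exact = f(x0) * np.exp(-c * s_end)
print(x_end, y_end, u_num, u_exact)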
The calculus of variations was developed by Leonhard Euler in 1744 to find the functions for which given quantities attain their largest or smallest values. In 1755 Lagrange wrote a letter to Euler in which he described his method of variations; Euler immediately adopted this new method, which he called the calculus of variations. The great advantage of the calculus of variations is that the system is considered as a whole, and the individual components of the system are not explicitly considered. In this way one can deal with a system without knowing in detail all the interactions among its components. The variational calculus determines the stationary points (extrema) of integral expressions, known as functionals.
We can say that a function f(x, y) has a stationary value at a point (x_0, y_0) if, around this point, the rate of change of the function in any direction is zero. The concept of stationary value is well described by the variational operator δ introduced by Lagrange. This operator is similar to the differential operator d, but while d refers to a real infinitesimal displacement, δ refers to a virtual infinitesimal displacement. The virtual nature of the δ operator arises from the fact that it allows an exploratory displacement around a point (a change of position). The operator δ behaves like the differential operator and vanishes at the endpoints of the curve defined by the function. Let us see how it works with a simple example.
The variation of f is

\delta f = \frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y.   (A.9)

The virtual displacements in the two coordinates can be written as

\delta x = \epsilon_x, \qquad \delta y = \epsilon_y.   (A.10)

In this way the variation of the function, in a specific direction, is given by

\delta f = \frac{\partial f}{\partial x}\,\epsilon_x + \frac{\partial f}{\partial y}\,\epsilon_y.   (A.11)

By definition, since (x_0, y_0) is a stationary point, δf must vanish for any virtual displacement, regardless of the direction of the shift, that is, independently of ε_x and ε_y. Then the condition that all partial derivatives vanish is a necessary and sufficient condition for the function f to have a stationary value at that point:

\frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0.   (A.12)
The fact that f(x_0, y_0) is a stationary value of f is a necessary but not sufficient condition for it to be an extremum of f. When, in an infinitesimal neighbourhood of (x_0, y_0), f > f(x_0, y_0) everywhere, then f(x_0, y_0) is a local minimum; vice versa, if f < f(x_0, y_0), then f(x_0, y_0) is a local maximum. In these cases the stationary value is also an extremum; in the other cases f(x_0, y_0) is a stationary value but not an extremum. The second derivatives allow us to establish whether a stationary point is a maximum, a minimum or neither. Of course we need to specify the domain in which the stationary or extreme values are sought.
When the variational calculation is subject to constraints, for example of the form

g(x, y) = 0,   (A.13)

the variations of f and g must satisfy

\delta f = \frac{\partial f}{\partial x}\,\delta x + \frac{\partial f}{\partial y}\,\delta y = 0, \qquad \delta g = \frac{\partial g}{\partial x}\,\delta x + \frac{\partial g}{\partial y}\,\delta y = 0.   (A.14)

Introducing a Lagrange multiplier λ and the modified function f_1 = f + λg, we have

\delta f_1 = \delta(f + \lambda g) = \delta f + \lambda\,\delta g + g\,\delta\lambda = \delta f.   (A.15)
Then the conditions for a stationary point of f_1, subject to the constraint (A.13), are

\frac{\partial f}{\partial x} + \lambda\frac{\partial g}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} + \lambda\frac{\partial g}{\partial y} = 0.   (A.16)

The two previous equations, together with the constraint (A.13), are used to find the stationary value. The Lagrange multiplier λ can be regarded as a measure of the sensitivity of the value of f at the stationary point to variations of the constraint (A.13).
In the case of N dimensions x_1, ..., x_N, for a stationary point of the function f(x_1, ..., x_N) subject to the constraints g_1(x_1, ..., x_N) = 0, ..., g_M(x_1, ..., x_N) = 0, we have

\frac{\partial}{\partial x_n}\left(f + \sum_{m=1}^{M} \lambda_m g_m\right) = 0, \qquad 1 \le n \le N.   (A.17)
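A small symbolic example of conditions (A.16)–(A.17), with N = 2 and M = 1 and an illustrative function and constraint, is the following.

import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x**2 + y**2                  # function to make stationary (illustrative)
g = x + y - 1                    # constraint g(x, y) = 0

F = f + lam * g                  # augmented function f + lambda*g
equations = [sp.diff(F, x), sp.diff(F, y), g]   # conditions (A.16) plus the constraint
solution = sp.solve(equations, [x, y, lam], dict=True)
print(solution)                  # x = y = 1/2, lambda = -1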
Let us now look at how to solve Eq. (2.84), following Bannister [2]. The inclusion of the errors in the field f = f(x, t), through the term W_f \int_0^L dx \int_0^T dt\, f(x, t)^2 in the cost function, defines the inverse model as a weak constraint (Sasaki [3]).
Let us build the variations of J around the reference field u, so that we can see for which u(x, t) the functional J[u] is stationary:

J[u + \delta u] = J[u] + \delta J|_u.   (A.18)
Thus

\delta J|_u = \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta u}\right|_u \delta u + O(\delta u^2)
= 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0) + 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x,t)\,\delta(x - x_i)\,\delta(t - t_i)
+ 2W_f \int_0^L dx \int_0^T dt\,\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right) + O(\delta u^2).   (A.19)
Let us define

\lambda(x, t) = W_f\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right),   (A.20)
so that the last term in (A.19), after integration by parts and using the conditions λ(x, T) = 0 and λ(L, t) = 0 introduced below, is written as

2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
= -2\int_0^L \lambda(x,0)\,\delta u(x,0)\,dx - 2\int_0^L dx \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u(x,t)\,dt
- 2\int_0^T c\,\lambda(0,t)\,\delta u(0,t)\,dt - 2\int_0^T dt \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u(x,t)\,dx.   (A.24)
Thus we have

\delta J|_u = 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0)
+ 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x,t)\,\delta(x - x_i)\,\delta(t - t_i)
- 2\int_0^L \lambda(x,0)\,\delta u(x,0)\,dx - 2\int_0^L dx \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u(x,t)\,dt
- 2\int_0^T c\,\lambda(0,t)\,\delta u(0,t)\,dt - 2\int_0^T dt \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u(x,t)\,dx + O(\delta u^2).   (A.25)
Setting the linear part to zero, using Eq. (2.70) and the definition (A.20), we get the Euler–Lagrange equations for a weak constraint, where:
1. the forward equation is

\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = W_f^{-1}\lambda;   (A.26)
2. its initial and boundary conditions, together with the adjoint equation and its final and boundary conditions, are

\lambda(x, 0) = W_i\{u(x, 0) - I(x)\},   (A.27)

c\,\lambda(0, t) = W_b\{u(0, t) - B(t)\},   (A.28)

w \sum_{i=1}^{M} \{u(x_i, t_i) - u_i\}\,\delta(x - x_i)\,\delta(t - t_i) - \left(\frac{\partial \lambda}{\partial t} + c\frac{\partial \lambda}{\partial x}\right) = 0,   (A.29)

\lambda(x, T) = 0,   (A.30)

\lambda(L, t) = 0.   (A.31)
When we require that u = u(x, t) satisfies exactly Eq. (2.70), we need to find a minimum of the functional
f[u] = W_i \int_0^L \{u(x,0) - I(x)\}^2\,dx + W_b \int_0^T \{u(0,t) - B(t)\}^2\,dt + w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}^2.   (A.32)
The question is: which u(x, t) makes f[u] stationary, subject to the model constraint

g(x, t) = \frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = 0\,?   (A.33)

This question is answered by adding to the functional f[u] the strong constraint, obtained using a Lagrange multiplier λ = λ(x, t):

J[u, \lambda] = f[u] + 2\int_0^L dx \int_0^T \lambda(x,t)\,g(x,t)\,dt.   (A.34)
Note that u and λ can vary independently. The variation of J around the reference fields u and λ is given by

J[u + \delta u, \lambda + \delta\lambda] = J[u, \lambda] + \delta J|_{u,\lambda},   (A.35)

where

\delta J|_{u,\lambda} = \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta u}\right|_{u,\lambda} \delta u + \int_0^L dx \int_0^T dt\,\left.\frac{\delta J}{\delta \lambda}\right|_{u,\lambda} \delta\lambda + O(\delta u^2, \delta\lambda^2, \delta u\,\delta\lambda).   (A.36)
Thus

\delta J|_{u,\lambda} = 2W_i \int_0^L dx\,\{u(x,0) - I(x)\}\,\delta u(x,0)
+ 2W_b \int_0^T dt\,\{u(0,t) - B(t)\}\,\delta u(0,t)
+ 2w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta u(x_i,t_i)
+ 2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
+ 2\int_0^L dx \int_0^T \delta\lambda(x,t)\left(\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F\right) dt
+ O(\delta u^2, \delta\lambda^2, \delta u\,\delta\lambda),   (A.37)
with

2\int_0^L dx \int_0^T dt\,\lambda(x,t)\left(\frac{\partial\,\delta u}{\partial t} + c\frac{\partial\,\delta u}{\partial x}\right)
= 2\int_0^L dx\left[\delta u(x,T)\,\lambda(x,T) - \delta u(x,0)\,\lambda(x,0) - \int_0^T \frac{\partial \lambda}{\partial t}\,\delta u\,dt\right]
+ 2\int_0^T dt\left[c\,\delta u(L,t)\,\lambda(L,t) - c\,\delta u(0,t)\,\lambda(0,t) - \int_0^L c\frac{\partial \lambda}{\partial x}\,\delta u\,dx\right].   (A.39)
Setting the linear part to zero and using the boundary conditions on λ, we get the Euler–Lagrange equations for a strong constraint:
1. the forward equation is

\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} - F = 0;   (A.42)

2. the adjoint equation and the boundary conditions are

w \sum_{i=1}^{M} \{u(x_i,t_i) - u_i\}\,\delta(x - x_i)\,\delta(t - t_i) - \left(\frac{\partial \lambda}{\partial t} + c\frac{\partial \lambda}{\partial x}\right) = 0,   (A.45)

\lambda(x, T) = 0,   (A.46)

\lambda(L, t) = 0.   (A.47)
References
1. Horn, R.: The Hadamard product. In: Johnson, C.R. (ed.) Matrix Theory and Applications, Proceedings of Symposia in Applied Mathematics, vol. 40, pp. 87–169. American Mathematical Society (1990)
2. Bannister, R.: Various papers on data assimilation. http://www.met.rdg.ac.uk/~ross/DARC/
DataAssim.html (2012)
3. Sasaki, Y.: Some basic formalisms in numerical variational analysis. Mon. Weather Rev. 98, 875–883 (1970)
Index
A
Analysis residue, 30
Assimilation, 13

B
Bayes, 3
Bayesian optimal recursion, 51
Bayesian probability, 3
Bayesian rule, 63
Bayesian theorem, 13
Bernoulli D., 5
Bernoulli J., 3
Best estimate, 39
Birkhoff, 5
BLUE, 25
Boble EnKF, 83
Brain web data, 99

C
Calculus of variations, 125
Cerebro spinal fluid, 98
Ceres, 15
C-grid models, 110
Characteristics, method of, 35
Cholesky factorization, 28
Cholesky, decomposition of, 55
CLF criteria, 111
Conditioned, 40
Conjugate method, 28
Cost function, 31
Covariance, 40
Covariance analysis, 24
Covariance background, 24
Covariance observation, 24
Cross-correlation matrix, 52

D
DART, 112
Data assimilation, 13
DeMoivre, 3
DeMorgan, 3
Duke of Tuscany, 6
Dynamic data assimilation, 1
Dynamic space-state-model (DSSM), 49

E
EKF, optimal gain, 48
EMARS, 113
EnKF, 100
Ensemble, 12
EnSRKF, 93
Equation of Euler-Lagrange, 129
Euler, 2
Expectation, 21
Extended Kalman filter, 61
Extracellular matrix, 101

F
Fick's second law, 98
Fisher, 3
Fletcher-Reeves and Polak-Ribière methods, 29
Fokker-Planck operator, 71
Fokker-Planck-Kolmogorov (FPK), 71
Forecast model, 33
Forecast/background, 22

G
Galileo, 6
Gauss, 2
General circulation model (GCM), 12
J
Jacobian, 23, 47

K
Kalman, 31
Kalman filter, 39
Kalman gain, 44
Kalman SPKF, 49
Kalman, conditions, 51
Kolmogorov, 5
Kronecker, delta, 44

L
Laboratoire de météorologie dynamique, 113
Lagrange, 2
Laplace, 2
LBFGS method, 29
Least squares, 3
Legendre, 2
LETKF, 99, 103
Likelihood, 3
Localization, 103
Lorenz, 5, 11, 89
Lorenz's butterfly effect, 92
Lyapunov, 5
Lyapunov exponents, 91

M
Mannheim society, 8
Markov, 5

N
NASA archive, 112
National Center for Atmospheric Research (NCAR), 112
Newton, 1
Numerical weather prediction (NWP), 9

O
Operator, 20
Optimal estimate, 39
Optimal estimation, 24
Optimal interpolation (OI), 28

P
Pascal, 6
Phenotype, 98
PlanetWRF, 110
Poincaré maps, 91

Q
Quadratic form, 26

R
Random variable covariance, 52
Random variable, mean, 52
Rayleigh-Bénard cell, 90
Recursion, 45
Recursive, 39
Recursive Bayesian estimation, 50
S
Schur, 123
Schur product, 85
Sequential importance resampling (SIR), 114
Sequential importance sampling (SIS), 114
Sequential Monte Carlo, 113
Sherman-Morrison-Woodbury equation, 27
Sigma coordinate, 106
Sigma point Kalman filter (SPKF), 61
Sigma points, 54
Square root scheme, 95
State vector, 20
Stigler, 4
Stochastic differential equation (SDE), 70

T
Taylor expansion, 23
Thermal emission spectrometer (TES), 112
Torricelli, 6
True vector, 20

U
UCLA two level, 107
UK Meteorological Office, 113
Unscented Kalman filter (UKF), 56

W
Weak constraints, 127, 129
White noise, 42
Wiener, 5