Bayesian Filtering
Bayesian Filtering
The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Contents
ix
xiii
Preface
Symbols and abbreviations
1
1
1
7
8
12
14
15
Bayesian inference
2.1
Philosophy of Bayesian inference
2.2
Connection to maximum likelihood estimation
2.3
The building blocks of Bayesian models
2.4
Bayesian point estimates
2.5
Numerical methods
2.6
Exercises
17
17
17
19
20
22
24
27
27
30
31
34
36
39
46
51
51
54
56
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Contents
vi
4.4
Exercises
62
96
96
97
99
103
106
110
114
Particle filtering
7.1
Monte Carlo approximations in Bayesian inference
7.2
Importance sampling
7.3
Sequential importance sampling
7.4
Sequential importance resampling
7.5
RaoBlackwellized particle filter
7.6
Exercises
116
116
117
120
123
129
132
134
134
135
139
142
144
144
146
148
152
10
154
154
155
64
64
69
75
77
81
86
92
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Contents
10.3
10.4
10.5
10.6
vii
156
159
162
164
11
Particle smoothing
11.1 SIR particle smoother
11.2 Backward-simulation particle smoother
11.3 Reweighting particle smoother
11.4 RaoBlackwellized particle smoothers
11.5 Exercises
165
165
167
169
171
173
12
Parameter estimation
12.1 Bayesian estimation of parameters in state space models
12.2 Computational methods for parameter estimation
12.3 Practical parameter estimation in state space models
12.4 Exercises
174
174
177
185
202
13
Epilogue
13.1 Which method should I choose?
13.2 Further topics
204
204
206
Appendix
A.1
A.2
A.3
A.4
Additional material
Properties of Gaussian distribution
Cholesky factorization and its derivative
Parameter derivatives for the Kalman filter
Parameter derivatives for the Gaussian filter
References
Index
209
209
210
212
214
219
229
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Preface
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Preface
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Preface
xi
Simo Sarkka
Vantaa, Finland
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
a; b; c; x; t; ;
a; f ; s; x; y; ;
A; F ; S; X; Y
A; F ; S ; X ; Y
A; F; S; X; Y
A
A 1
A T
Ai
Aij
jaj
jAj
dx=dt
@gi .x/
@xj
.a1 ; : : : ; an /
.a1 an /
.a1 an /T
@g.x/
@x
@g.x/
@x
Covx
diag.a1 ; : : : ; an /
p
P
Ex
Ex j y
General notation
Scalars
Vectors
Matrices
Sets
Spaces
Notational conventions
Transpose of matrix
Inverse of matrix
Inverse of transpose of matrix
ith column of matrix A
Element at ith row and j th column of matrix A
Absolute value of scalar a
Determinant of matrix A
Time derivative of x.t /
Partial derivative of gi with respect to xj
Column vector with elements a1 ; : : : ; an
Row vector with elements a1 ; : : : ; an
Column vector with elements a1 ; : : : ; an
Gradient (column vector) of scalar function g
Jacobian matrix of vector valued function x ! g.x/
Covariance Covx D E.x Ex/ .x Ex/T of
the random variable x
Diagonal matrix with diagonal values a1 ; : : : ; an
p p T
Matrix such that P D P P
Expectation of x
Conditional expectation of x given y
xiii
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xiv
f .x/ dx
p.x/
p.x j y/
p.x/ / q.x/
tr A
Varx
xy
xi;k
x p.x/
x,y
xy
x'y
x0Wk
P
x
Symbols
i
N
./
x
t
tk
"k
"k
k
.n/
.i /
O MAP
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
i
'k ./
./
.i /
.i /
.i1 ;:::;in /
a
ao
a.t/
A
Ak
b
B
Bj jk
c
C./
xv
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xvi
C
Ck
CL
CM
CQ
CS
CU
d
di
dt
dx
D
Dk
ei
f ./
Fx ./
F
.i /
Fxx
./
F
g
g./
gi ./
g./
g 1 ./
Q
g./
Gk
Gx ./
.i /
Gxx
./
Hp ./
H
Hk
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
L./
m
m
Q
m
mk
m.ik /
m.i0WT/
Qk
m
mk
mk .i /
Qk
m
msk
/
ms;.i
0WT
mkjn
n
n0
n00
N
N./
xvii
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xviii
p
P
P
PQ
Pk
Pk.i /
.i /
P0WT
PQ k
Pk
PQ k
Pk .i /
Pks
s;.i /
P0WT
Pkjn
qc
qic
q./
q .n/
q
qk
Q
Q.; .n/ /
Q
Qk
rk
rk
R
R
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xix
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xx
Wi
Wi.m/
0
Wi.m/
Wi.c/
0
Wi.c/
Wi1 ;:::;in
x
x
x.i /
xk
Qk
x
x0Wk
x.i0Wk/
/
Q .j
x
0WT
X
Xk
X ./
XQ ./
Xk./
XQk./
XOk./
Xk ./
XQk ./
y
yk
y1Wk
Y ./
YQ ./
YO k./
Z
Zk
1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
ADF
AM
AMCMC
AR
ARMA
ASIR
BS-PS
CDKF
CKF
CLT
CPF
CRLB
DLM
DOT
DSP
EC
EEG
EKF
EM
EP
ERTSS
FHKF
FHRTSS
fMRI
GHKF
GHPF
GHRTSS
GPB
GPS
HMC
HMM
IMM
INS
IS
InI
KF
LMS
LQG
Abbreviations
Assumed density filter
Adaptive Metropolis (algorithm)
Adaptive Markov chain Monte Carlo
Autoregressive (model)
Autoregressive moving average (model)
Auxiliary sequential importance resampling
Backward-simulation particle smoother
Central differences Kalman filter
Cubature Kalman filter
Central limit theorem
Cubature particle filter
CramerRao lower bound
Dynamic linear model
Diffuse optical tomography
Digital signal processing
Expectation correction
Electroencephalography
Extended Kalman filter
Expectationmaximization
Expectation propagation
Extended RauchTungStriebel smoother
FourierHermite Kalman filter
FourierHermite RauchTungStriebel smoother
Functional magnetic resonance imaging
GaussHermite Kalman filter
GaussHermite particle filter
GaussHermite RauchTungStriebel smoother
Generalized pseudo-Bayesian
Global positioning system
Hamiltonian (or hybrid) Monte Carlo
Hidden Markov model
Interacting multiple model (algorithm)
Inertial navigation system
Importance sampling
Inverse imaging
Kalman filter
Least mean squares
Linear quadratic Gaussian (regulator)
xxi
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
xxii
LS
MA
MAP
MC
MCMC
MEG
MH
MKF
ML
MLP
MMSE
MNE
MSE
PF
PMCMC
PMMH
PS
QKF
RAM
RBPF
RBPS
RMSE
RTS
RTSS
SDE
SIR
SIR-PS
SIS
SLDS
SLF
SLRTSS
SMC
TVAR
UKF
UPF
URTSS
UT
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
1
What are Bayesian filtering and smoothing?
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
can be found, for example, in navigation, aerospace engineering, space engineering, remote surveillance, telecommunications, physics, audio signal
processing, control engineering, finance, and many other fields. Examples
of such applications are the following.
Global positioning system (GPS) (Kaplan, 1996) is a widely used satellite navigation system, where the GPS receiver unit measures arrival
times of signals from several GPS satellites and computes its position
based on these measurements (see Figure 1.1). The GPS receiver typically uses an extended Kalman filter (EKF) or some other optimal filtering algorithm1 for computing the current position and velocity such that
the measurements and the assumed dynamics (laws of physics) are taken
into account. Also the ephemeris information, which is the satellite reference information transmitted from the satellites to the GPS receivers,
is typically generated using optimal filters.
Figure 1.1 In the GPS system, the measurements are time delays
of satellite signals and the optimal filter (e.g., extended Kalman
filter, EKF) computes the position and the accurate time.
Strictly speaking, the EKF is only an approximate optimal filtering algorithm, because it
uses a Taylor series based Gaussian approximation to the non-Gaussian optimal filtering
solution.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
infrared sensors, and other types of sensors are used for determining
the position and velocity of a remote target (see Figure 1.2). When this
tracking is done continuously in time, the dynamics of the target and
measurements from the different sensors are most naturally combined
using an optimal filter or smoother. The target in this (single) target
tracking case can be, for example, a robot, a satellite, a car or an airplane.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
account the natural way of computing the estimates is by using an optimal filter or smoother. Also, in sensor calibration, which is typically
done in a time-varying environment, optimal filters and smoothers can
be applied.
Integrated inertial navigation (Grewal et al., 2001; Bar-Shalom et al.,
2001) combines the good sides of unbiased but inaccurate sensors, such
as altimeters and landmark trackers, and biased but locally accurate inertial sensors. A combination of these different sources of information
is most naturally performed using an optimal filter such as the extended
Kalman filter. This kind of approach was used, for example, in the guidance system of the Apollo 11 lunar module (Eagle), which landed on the
moon in 1969.
GPS/INS navigation (Grewal et al., 2001; Bar-Shalom et al., 2001) is a
form of integrated inertial navigation where the inertial navigation system (INS) is combined with a GPS receiver unit. In a GPS/INS navigation system the short term fluctuations of the GPS can be compensated
by the inertial sensors and the inertial sensor biases can be compensated
by the GPS receiver. An additional advantage of this approach is that
it is possible to temporarily switch to pure inertial navigation when the
GPS receiver is unable to compute its position (i.e., has no fix) for some
reason. This happens, for example, indoors, in tunnels and in other cases
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
when there is no direct line-of-sight between the GPS receiver and the
satellites.
Brain imaging methods such as electroencephalography (EEG), magnetoencephalography (MEG), parallel functional magnetic resonance
imaging (fMRI) and diffuse optical tomography (DOT) (see Figure 1.4)
are based on reconstruction of the source field in the brain from noisy
sensor data by using minimum norm estimates (MNE) and its variants
(Hauk, 2004; Tarantola, 2004; Kaipio and Somersalo, 2005; Lin et al.,
2006). The minimum norm solution can also be interpreted in the
Bayesian sense as a problem of estimating the field with certain prior
structure from Gaussian observations. With that interpretation the
estimation problem becomes equivalent to a statistical inversion or
generalized Gaussian process regression problem (Tarantola, 2004;
Kaipio and Somersalo, 2005; Rasmussen and Williams, 2006; Sarkka,
2011). Including dynamical priors then leads to a linear or non-linear
spatio-temporal estimation problem, which can be solved with Kalman
filters and smoothers (see Hiltunen et al., 2011; Sarkka et al., 2012b).
The same can be done in inversion based approaches to parallel fMRI
such as inverse imaging (InI, Lin et al., 2006).
Figure 1.4 Brain imaging methods such as EEG and MEG are
based on estimating the state of the brain from sensor readings. In
dynamic case the related inversion problem can be solved with an
optimal filter or smoother.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
in Sarkka et al. (2007a), Hartikainen and Sarkka (2010) and Sarkka and
Hartikainen (2012).
Physical systems which are time-varying and measured through nonideal sensors can sometimes be formulated as stochastic state space
models, and the time evolution of the system can be estimated using
optimal filters (Kaipio and Somersalo, 2005). These kinds of problem
are often called inverse problems (Tarantola, 2004), and optimal filters
and smoothers can be seen as the Bayesian solutions to time-varying
inverse problems.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
/ xk
observed:
yk
/ xk
yk
/ xkC1
ykC1
that while Kalman and Bucy were formulating the linear theory in the
United States, Stratonovich was doing the pioneering work on the probabilistic (Bayesian) approach in Russia (Stratonovich, 1968; Jazwinski,
1970).
As discussed in the book of West and Harrison (1997), in the 1960s,
Kalman filter like recursive estimators were also used in the Bayesian community and it is not clear whether the theory of Kalman filtering or the
theory of dynamic linear models (DLM) came first. Although these theories were originally derived from slightly different starting points, they are
equivalent. Because of the Kalman filters useful connection to the theory and history of stochastic optimal control, this book approaches the
Bayesian filtering problem from the Kalman filtering point of view.
Although the original derivation of the Kalman filter was based on the
least squares approach, the same equations can be derived from pure probabilistic Bayesian analysis. The Bayesian analysis of Kalman filtering is
well covered in the classical book of Jazwinski (1970) and more recently in
the book of Bar-Shalom et al. (2001). Kalman filtering, mostly because of
its least squares interpretation, has widely been used in stochastic optimal
control. A practical reason for this is that the inventor of the Kalman filter,
Rudolph E. Kalman, has also made several contributions (Kalman, 1960a)
to the theory of linear quadratic Gaussian (LQG) regulators, which are
fundamental tools of stochastic optimal control (Stengel, 1994; Maybeck,
1982a).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
0.25
Resonator position xk
0.2
0.15
0.1
0.05
0
0.05
0.1
0.15
0.2
Signal
Measurement
0.25
0
10
15
Time step k
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
10
(1.2)
1 /;
yk p.yk j xk /:
(1.3)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
11
Because computing the full joint distribution of the states at all time steps is
computationally very inefficient and unnecessary in real-time applications,
in Bayesian filtering and smoothing the following marginal distributions
are considered instead (see Figure 1.7).
Filtering distributions computed by the Bayesian filter are the marginal
distributions of the current state xk given the current and previous measurements y1Wk D fy1 ; : : : ; yk g:
p.xk j y1Wk /;
k D 1; : : : ; T:
(1.4)
The result of applying the Bayesian filter to the resonator time series in
Figure 1.6 is shown in Figure 1.8.
Prediction distributions which can be computed with the prediction step
of the Bayesian filter are the marginal distributions of the future state
xkCn , n steps after the current time step:
p.xkCn j y1Wk /;
k D 1; : : : ; T;
n D 1; 2; : : : :
(1.5)
k D 1; : : : ; T:
(1.6)
The result of applying the Bayesian smoother to the resonator time series
is shown in Figure 1.9.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
12
0.25
Resonator position xk
0.2
0.15
0.1
0.05
0
0.05
0.1
Signal
Measurement
Filter Estimate
95% Quantile
0.15
0.2
0.25
0
10
15
Time step k
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
13
0.25
Resonator position xk
0.2
0.15
0.1
0.05
0
0.05
0.1
Signal
Measurement
Smoother Estimate
95% Quantile
0.15
0.2
0.25
0
10
15
Time step k
But because the Bayesian optimal filtering and smoothing equations are
generally computationally intractable, many kinds of numerical approximation methods have been developed, for example:
The extended Kalman filter (EKF) approximates the non-linear and nonGaussian measurement and dynamic models by linearization, that is,
by forming a Taylor series expansion at the nominal (or maximum a
posteriori, MAP) solution. This results in a Gaussian approximation to
the filtering distribution.
The extended RauchTungStriebel smoother (ERTSS) is the approximate non-linear smoothing algorithm corresponding to EKF.
The unscented Kalman filter (UKF) approximates the propagation
of densities through the non-linearities of measurement and noise
processes using the unscented transform. This also results in a Gaussian
approximation.
The unscented RauchTungStriebel smoother (URTSS) is the approximate non-linear smoothing algorithm corresponding to UKF.
Sequential Monte Carlo methods or particle filters and smoothers represent the posterior distribution as a weighted set of Monte Carlo samples.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
14
The unscented particle filter (UPF) and local linearization based particle
filtering methods use UKFs and EKFs, respectively, for approximating
the optimal importance distributions in particle filters.
RaoBlackwellized particle filters and smoothers use closed form integration (e.g., Kalman filters and RTS smoothers) for some of the state
variables and Monte Carlo integration for others.
Grid based approximation methods approximate the filtering and
smoothing distributions as discrete distributions on a finite grid.
Other methods also exist, for example, based on Gaussian mixtures, series expansions, describing functions, basis function expansions, exponential family of distributions, variational Bayesian methods, and batch
Monte Carlo (e.g., Markov chain Monte Carlo, MCMC, methods).
1 ; /;
yk p.yk j xk ; /:
(1.7)
The full Bayesian solution to this problem would require the computation
of the full joint posterior distribution of states and parameters p.x0WT ; j
y1WT /. Unfortunately, computing this joint posterior of the states and parameters is even harder than computation of the joint distribution of states
alone, and thus this task is intractable.
Fortunately, when run with fixed parameters , the Bayesian filter algorithm produces the sequence of distributions p.yk j y1Wk 1 ; / for k D
1; : : : ; T as side products. Once we have these, we can form the marginal
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
1.6 Exercises
15
T
Y
p.yk j y1Wk
1 ; /;
(1.8)
kD1
where we have denoted p.y1 j y1W0 ; / , p.y1 j / for notational convenience. When combined with the smoothing distributions, we can form all
the marginal joint distributions of states and parameters as follows:
p.xk ; j y1WT / D p.xk j y1WT ; / p. j y1WT /;
(1.9)
1.6 Exercises
1.1
1.2
Find the seminal article of Kalman (1960b) from the Internet (or, e.g., from
a library) and investigate the orthogonal projections approach that is taken
in the article. How would you generalize the approach to the non-linear/nonGaussian case? Is it possible?
An alternative to Bayesian estimation would be to formulate the state estimation problem as maximum likelihood (ML) estimation. This would amount
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
16
1.3
(1.11)
Do you see any problem with this approach? Hint: where is the dynamic
model?
Assume that in an electronics shop the salesperson decides to give you a
chance to win a brand new GPS receiver. He lets you choose one of three
packages of which one contains the GPS receiver and two others are empty.
After you have chosen the package, the salesperson opens one of the packages that you have not chosen and that package turns out to be empty.
He gives you a chance to switch to the other yet unopened package. Is it
advantageous for you to do that?
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
2
Bayesian inference
This chapter provides a brief presentation of the philosophical and mathematical foundations of Bayesian inference. The connections to classical
statistical inference are also briefly discussed.
17
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
18
Bayesian inference
L./ D
T
Y
p.yk j /:
(2.1)
kD1
The maximum of the likelihood function with respect to gives the maximum likelihood estimate (ML-estimate)
O D arg max L./:
(2.2)
The difference between the Bayesian inference and the maximum likelihood method is that the starting point of Bayesian inference is to formally
consider the parameter as a random variable. Then the posterior distribution of the parameter can be computed by using Bayes rule
p. j y1WT / D
p.y1WT j / p./
;
p.y1WT /
(2.3)
where p./ is the prior distribution which models the prior beliefs on the
parameter before we have seen any data, and p.y1WT / is a normalization
term which is independent of the parameter . This normalization constant
is often left out and if the measurements y1WT are conditionally independent
given , the posterior distribution of the parameter can be written as
p. j y1WT / / p./
T
Y
p.yk j /:
(2.4)
kD1
Because we are dealing with a distribution, we might now choose the most
probable value of the random variable, the maximum a posteriori (MAP)
estimate, which is given by the maximum of the posterior distribution.
The optimal estimate in the mean squared sense is the posterior mean of
the parameter (MMSE-estimate). There are an infinite number of other
ways of choosing the point estimate from the distribution and the best way
depends on the assumed loss or cost function (or utility function). The MLestimate can be seen as a MAP-estimate with uniform prior p./ / 1 on
the parameter .
We can also interpret Bayesian inference as a convenient method for
including regularization terms into maximum likelihood estimation. The
basic ML-framework does not have a self-consistent method for including
regularization terms or prior information into statistical models. However,
this regularization interpretation of Bayesian inference is quite limited,
because Bayesian inference is much more than this.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
19
p.y j / p./
/ p.y j / p./;
p.y/
(2.7)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
20
Bayesian inference
(2.8)
In the case of multiple measurements y1WT , if the measurements are conditionally independent, the joint likelihood of all measurements is the
product of distributions of individual measurements and the posterior
distribution is
p. j y1WT / / p./
T
Y
p.yk j /;
(2.9)
kD1
where the normalization term can be computed by integrating the righthand side over . If the random variable is discrete the integration is
replaced by summation.
Predictive posterior distribution
The predictive posterior distribution is the distribution of new measurements yT C1 given the observed measurements
Z
p.yT C1 j y1WT / D p.yT C1 j / p. j y1WT / d:
(2.10)
Thus, after obtaining the measurements y1WT the predictive posterior distribution can be used for computing the probability distribution for measurement index T C 1 which has not yet been observed.
In the case of tracking, we could imagine that the parameter is the sequence
of dynamic states of a target, where the state contains the position and velocity. The measurements could be, for example, noisy distance and direction measurements produced by a radar. In this book we will divide the
parameters into two classes: the dynamic state of the system and the static
parameters of the model. But from the Bayesian estimation point of view
both the states and static parameters are unknown (random) parameters of
the system.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
21
C.; a/ D .
a/;
(2.12)
is called an absolute error loss and in this case the optimal choice is the
median of the distribution (the medians of the marginal distributions in
the multi-dimensional case).
01 loss. If the loss function is of the form
C.; a/ D
.a
/;
(2.15)
where ./ is the Diracs delta function, then the optimal choice is the
maximum (mode) of the posterior distribution, that is, the maximum a
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
22
Bayesian inference
(2.17)
N
X
iD1
Wi g. .i / /;
(2.18)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
23
i D 1; : : : ; N;
(2.19)
.i / /;
(2.21)
where ./ is the Dirac delta function. The convergence of Monte Carlo
approximation is guaranteed by the central limit theorem (CLT) (see,
e.g., Liu, 2001) and the error term is, at least in theory, under certain
ideal conditions, independent of the dimensionality of . The rule of
thumb is that the error should decrease like the square root of the number
of samples, regardless of the dimensions.
Efficient methods for generating Monte Carlo samples are the Markov
chain Monte Carlo (MCMC) methods (see, e.g., Gilks et al., 1996; Liu,
2001; Brooks et al., 2011). In MCMC methods, a Markov chain is constructed such that it has the target distribution as its stationary distribution. By simulating the Markov chain, samples from the target distribution can be generated.
Importance sampling (see, e.g., Liu, 2001) is a simple algorithm for
generating weighted samples from the target distribution. The difference
between this and direct Monte Carlo sampling and MCMC is that each of
the particles has an associated weight, which corrects for the difference
between the actual target distribution and the approximate importance
distribution ./ from which the sample was drawn.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Bayesian inference
24
An importance sampling estimate can be formed by drawing N samples from the importance distribution
.i / . j y1WT /;
i D 1; : : : ; N:
(2.22)
1 p. .i / j y1WT /
;
N . .i / j y1WT /
(2.23)
N
X
wQ .i / g. .i / /;
(2.24)
wQ .i / g. .i / /
:
PN
Q .i /
i D1 w
(2.25)
i D1
or alternatively as
PN
Eg./ j y1WT
iD1
2.6 Exercises
2.1
Prove that median of distribution p. / minimizes the expected value of the
absolute error loss function
Z
Ej aj D j aj p. / d:
(2.26)
2.2
Find the optimal point estimate a which minimizes the expected value of the
loss function
C.; a/ D .
2.3
a/T R .
a/;
(2.27)
k D 1; 2; : : : ; T:
(2.28)
The purpose is now to derive estimates of the parameters 1 and 2 such that
the following error is minimized (least squares estimate):
E.1 ; 2 / D
T
X
kD1
.yk
1 xk
2 /2 :
(2.29)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
25
Assume that in the linear regression model above (Equation (2.28)) we set
independent Gaussian priors for the parameters 1 and 2 as follows:
1 N.0; 2 /;
2 N.0; 2 /;
where the variance 2 is known. The measurements yk are modeled as
yk D 1 xk C 2 C "k ;
k D 1; 2; : : : ; T;
where the terms "k are independent Gaussian errors with mean 0 and variance 1, that is, "k N.0; 1/. The values xk are fixed and known. The posterior distribution can be now written as
p. j y1 ; : : : ; yT /
/ exp
T
1X
.yk
2
kD1
!
1 xk
2 /
exp
1 2
2 2 1
exp
1 2
:
2 2 2
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
26
2.5
Bayesian inference
Implement an importance sampling based approximation for the Bayesian
linear regression problem in the above Exercise 2.4. Use a suitable Gaussian
distribution as the importance distribution for the parameters . Check that
the posterior mean and covariance (approximately) coincide with the exact
values computed in Exercise 2.4.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
3
Batch and recursive Bayesian estimation
(3.1)
(3.2)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
28
2
Measurement
True Signal
1.5
0.5
0
0.2
0.4
0.6
0.8
Figure 3.1 The underlying truth and the measurement data in the
simple linear regression problem.
T
Y
p.yk j /
kD1
D N. j m0 ; P0 /
T
Y
N.yk j Hk ; 2 /:
kD1
(3.3)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
29
2
Measurement
True Signal
Estimate
1.5
0.5
0
0.2
0.4
0.6
0.8
1
t1
:: C ;
:A
1 tT
1
y1
B C
y D @ ::: A :
(3.5)
yT
Figure 3.2 shows the result of batch linear regression, where the posterior
mean parameter values are used as the linear regression parameters.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
30
1/
D N. j mk
1 ; Pk 1 /:
1/
/ N. j mk ; Pk /;
(3.6)
1
;
(3.7)
HTk C 2 ;
HTk Sk 1 ;
1
1
C Kk yk
Kk Sk KTk :
Hk mk
1 ;
(3.8)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
31
1.2
1
0.8
0.6
0.4
0.2
0
Recursive E[ 1 ]
0.2
Batch E[ 1 ]
0.4
Recursive E[ 2 ]
0.6
Batch E[ 2 ]
0.8
0
0.2
0.4
0.6
0.8
prediction and update) is required, because the estimated parameters are assumed to be constant, that is, there is no stochastic dynamics model for the
parameters . Figures 3.3 and 3.4 illustrate the convergence of the means
and variances of the parameters during the recursive estimation.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
32
0
10
Recursive Var[ 1 ]
Batch Var[ 1 ]
10
Recursive Var[ 2 ]
Batch Var[ 2 ]
2
10
10
10
0.2
0.4
0.6
0.8
1 Specify the likelihood model of measurements p.yk j / given the parameter . Typically the measurements yk are assumed to be conditionally independent such that
p.y1WT j / D
T
Y
p.yk j /:
kD1
2 The prior information about the parameter is encoded into the prior
distribution p./.
3 The observed data set is D D f.t1 ; y1 /; : : : ; .tT ; yT /g, or if we drop the
explicit conditioning on tk , the data is D D y1WT .
4 The batch Bayesian solution to the statistical estimation problem can be
computed by applying Bayes rule:
T
p. j y1WT / D
Y
1
p./
p.yk j /;
Z
kD1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
33
For example, the batch solution of the above kind to the linear regression
problem (3.2) was given by Equations (3.3) and (3.4).
The recursive Bayesian solution to the above statistical estimation problem can be formulated as follows.
1 The distribution of measurements is again modeled by the likelihood
function p.yk j / and the measurements are assumed to be conditionally independent.
2 In the beginning of estimation (i.e., at step 0), all the information about
the parameter we have is contained in the prior distribution p./.
3 The measurements are assumed to be obtained one at a time, first y1 ,
then y2 and so on. At each step we use the posterior distribution from
the previous time step as the current prior distribution:
1
p.y1 j /p./;
Z1
1
p. j y1W2 / D
p.y2 j /p. j y1 /;
Z2
1
p. j y1W3 / D
p.y3 j /p. j y1W2 /;
Z3
::
:
1
p. j y1WT / D
p.yT j /p. j y1WT 1 /:
ZT
p. j y1 / D
It is easy to show that the posterior distribution at the final step above
is exactly the posterior distribution obtained by the batch solution. Also,
reordering of measurements does not change the final solution.
For example, Equations (3.6) and (3.7) give the one step update rule for the
linear regression problem in Equation (3.2).
The recursive formulation of Bayesian estimation has many useful properties.
The recursive solution can be considered as the on-line learning solution
to the Bayesian learning problem. That is, the information on the parameters is updated in an on-line manner using new pieces of information as
they arrive.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
34
1/
D N.k j k
1 ; Q/;
p.0 / D N.0 j m0 ; P0 /;
(3.9)
where Q is the covariance of the random walk. Now, given the distribution
p.k
j y1Wk
1/
D N.k
j y1Wk
1/
j mk
1 ; Pk 1 /;
is1
D p.k j k
1 / p.k 1
j y1Wk
1 /:
1/
D N.k j mk ; Pk /;
where
mk D mk
Pk D Pk
1
1;
1
C Q:
Note that this formula is correct only for Markovian dynamic models, where
p.k j k 1 ; y1Wk 1 / D p.k j k 1 /.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
35
1.5
Measurements
True Signal
Estimate
0.5
0.5
1.5
0
0.5
1
t
1.5
and Pk
are replaced by
Sk D Hk Pk HTk C 2 ;
Kk D Pk HTk Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Hk mk ;
Kk Sk KTk :
(3.10)
This recursive computational algorithm for the time-varying linear regression weights is again a special case of the Kalman filter algorithm. Figure 3.5 shows the result of recursive estimation of a sine signal assuming a
small diagonal Gaussian drift model for the parameters.
At this point we change from the regression notation used so far into
state space model notation, which is commonly used in Kalman filtering and related dynamic estimation literature. Because this notation easily
causes confusion to people who have got used to regression notation, this
point is emphasized.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
36
In state space notation x means the unknown state of the system, that is,
the vector of unknown parameters in the system. It is not the regressor,
covariate or input variable of the system.
For example, the time-varying linear regression model with drift presented in this section can be transformed into the more standard state
space model notation by replacing the variable k D .1;k 2;k /T with
the variable xk D .x1;k x2;k /T :
p.yk j xk / D N.yk j Hk xk ; 2 /;
p.xk j xk
1/
D N.xk j xk
1 ; Q/;
p.x0 / D N.x0 j m0 ; P0 /:
(3.11)
From now on, the symbol is reserved for denoting the static parameters
of the state space model. Although there is no fundamental difference between states and static parameters of the model (we can always augment
the parameters as part of the state), it is useful to treat them separately.
xk
D xP tk
1;
(3.12)
where xP is the derivative, which is constant in the exactly linear case. The
divergence from the exactly linear function can be modeled by assuming
that the above equation does not hold exactly, but there is a small noise
term on the right hand side. The derivative can also be assumed to perform
a small random walk and thus not be exactly constant. This model can be
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
37
written as follows:
x1;k D x1;k
C tk
1 x2;k 1
x2;k D x2;k
C q2;k
1;
C q1;k
1;
yk D x1;k C rk ;
(3.13)
where the signal is the first components of the state, x1;k , xk , and the
derivative is the second, x2;k , xP k . The noises are rk N.0; 2 / and
.q2;k 1 ; q2;k 1 / N.0; Q/. The model can also be written in the form
p.yk j xk / D N.yk j H xk ; 2 /;
p.xk j xk
1/
D N.xk j Ak
xk
1 ; Q/;
(3.14)
where
Ak
1 tk
0
1
HD 1 0 :
1/
D N.xk j Ak
xk
1 ; Qk 1 /:
mk
Pk D Ak
Pk
1;
1
ATk
C Qk
1:
2 Update step:
Sk D Hk Pk HTk C Rk ;
Kk D Pk HTk Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Hk mk ;
Kk Sk KTk :
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
38
1.5
Measurements
True Signal
Estimate
0.5
0.5
1.5
0
0.5
1
t
1.5
The result of tracking the sine signal with Kalman filter is shown in Figure 3.6. All the mean and covariance calculation equations given in this
book so far have been special cases of the above equations, including the
batch solution to the scalar measurement case (which is a one-step solution). The Kalman filter recursively computes the mean and covariance of
the posterior distributions of the form
p.xk j y1Wk / D N.xk j mk ; Pk /:
Note that the estimates of xk derived from this distribution are nonanticipative in the sense that they are only conditional on the measurements obtained before and at the time step k. However, after we have
obtained the measurements y1 ; : : : ; yk , we could compute estimates of
xk 1 ; xk 2 ; : : :, which are also conditional to the measurements after the
corresponding state time steps. Because more measurements and more
information is available for the estimator, these estimates can be expected
to be more accurate than the non-anticipative measurements computed by
the filter.
The above mentioned problem of computing estimates of the state
by conditioning not only on previous measurements, but also on future
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
39
1.5
Measurements
True Signal
Estimate
0.5
0.5
1.5
0
0.5
1
t
1.5
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
40
1.5
0.5
0.5
Measurements
True Signal
Estimate
1.5
0
0.5
1.5
2.5
(3.15)
1;
yk D Hk xk C "k :
(3.16)
Because the model is a linear Gaussian state space model, we can use
linear Kalman filter for estimating the parameters.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
41
(3.17)
where the weights w0 ; : : : ; wd are to be estimated from a set of measurements f.s1 ; y1 /; : : : ; .sT ; yT /g. Analogously to the previous example, we
can convert the problem into a linear Gaussian state space model by defining Hk D .1 g1 .sk / gd .sk // and xk D x D .w0 w1 wd /T .
Linearity of the state space models in the above examples resulted from
the property that the models are linear in parameters. Generalized linear
models involving non-linear link functions will lead to non-linear state
space models, as is illustrated with the following example.
Example 3.3 (Generalized linear model)
linear model is
An example of a generalized
yk D g.w0 C w T sk / C "k ;
(3.18)
1;
yk D hk .xk / C "k :
(3.19)
Because the state space model is non-linear, instead of the linear Kalman
filter, we need to use non-linear Kalman filters such as the extended Kalman
filter (EKF) or the unscented Kalman filter (UKF) to cope with the nonlinearity.
One general class of non-linear regression models, which can be converted into state space models using an analogous approach to the above,
is the multi-layer perceptron (MLP) neural networks (see, e.g., Bishop,
2006). Using a non-linear Kalman filter is indeed one way to train (to estimate the parameters of) such models (Haykin, 2001). However, non-linear
regression models of this kind arise in various others contexts as well.
In digital signal processing (DSP), an important class of models is linear
signal models such as autoregressive (AR) models, moving average (MA)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
42
models, autoregressive moving average models (ARMA) and their generalizations (see, e.g., Hayes, 1996). In those models one is often interested
in performing adaptive filtering, which refers to the methodology where
the parameters of the signal model are estimated from data. These kinds
of adaptive filtering problem can often be formulated as Kalman filtering
problems, as is illustrated in the following example.
Example 3.4 (Autoregressive (AR) model) An autoregressive (AR) model
of order d has the form
yk D w1 yk
C C wd yk
C "k ;
(3.20)
where "k is a white Gaussian noise process. The problem of adaptive filtering is to estimate the weights w1 ; : : : ; wd given the observed signal
y1 ; y2 ; y3 ; : : :. If we let Hk D .yk 1 yk d / and define the state as
xk D .w1 wd /T , we get a linear Gaussian state space model
xk D xk
1;
yk D Hk xk C "k :
(3.21)
Thus the adaptive filtering problem can be solved with a linear Kalman
filter.
The classical algorithm for an adaptive filtering is called the least mean
squares (LMS) algorithm, and it can be interpreted as an approximate version of the above Kalman filter. However, in LMS algorithms it is common
to allow the model to change in time, which in the state space context corresponds to setting up a dynamic model for the model parameters. This
kind of model is illustrated in the next example.
Example 3.5 (Time-varying autoregressive (TVAR) model) In a timevarying autoregressive (TVAR) model the weights are assumed to depend
on the time step number k as
yk D w1;k yk
C C wd;k yk
C "k :
(3.22)
A typical model for the time dependence of weights is the random walk
model
wi;k D wi;k
C qi;k
1;
qi;k
N.0; Qi /;
i D 1; : : : ; d:
(3.23)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
43
.y1 ; y2 /
g1 .t/
g2 .t /
(a)
(b)
.q1;k
qd;k
1/
:
xk D xk
C qk
1;
yk D Hk xk C "k :
(3.24)
(3.25)
where a.t/ is the acceleration, m is the mass of the car, and g.t / is a vector
of (unknown) forces acting on the car. Lets now model g.t /=m as a twodimensional white random process
d2 x1
D w1 .t /;
dt 2
d2 x2
D w2 .t /:
dt 2
(3.26)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
44
C qk
1;
(3.29)
(3.30)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
45
w.t /
g
The dynamic and measurement models of the car form a linear Gaussian
state space model
xk D A xk
C qk
1;
yk D H xk C rk ;
where qk 1 N.0; Q/ and rk N.0; R/. The state of the car from the
noisy measurements can be estimated using the Kalman filter.
The above car model is a linear Gaussian state space model, because
the model was based on a linear differential equation, and the measured
quantities were linear functions of the state. However, if either the dynamic
or measurement model is non-linear, we get a non-linear state space model,
as is illustrated in the following example.
Example 3.7 (Noisy pendulum model) The differential equation for a
simple pendulum (see Figure 3.10) with unit length and mass can be written
as
d2
D g sin./ C w.t /;
(3.31)
dt 2
where is the angle, g is the gravitational acceleration and w.t / is a
random noise process. This model can be converted into the following state
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
46
space model:
d
dt
x2
0
x1
D
C
w.t /;
g sin.x1 /
1
x2
(3.32)
(3.34)
(3.35)
1 ; qk 1 /;
yk D h.xk ; rk /;
(3.36)
3.7 Exercises
3.1
In this exercise your task is to implement a simple Kalman filter using readymade codes and analyze the results.
R
(a) Download and install the EKF/UKF toolbox2 to some MATLAB
computer. Run the following demonstrations:
demos/kf_sine_demo/kf_sine_demo.m
demos/kf_cwpa_demo/kf_cwpa_demo.m
2
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
47
After running them read the contents of these files and try to
understand how they have been implemented. Also read the documentation for the functions kf_predict and kf_update (type, e.g.,
R
help kf_predict in MATLAB
).
(b) Consider the following state space model:
1 1
xk D
x
C qk 1 ;
0 1 k 1
yk D 1 0 xk C rk ;
(3.37)
where xk D .xk xP k /T is the state, yk is the measurement, and qk
N.0; diag.1=102 ; 12 // and rk N.0; 102 / are white Gaussian noise
processes.
Simulate a 100 step state sequence from the model and plot the signal
xk , signal derivative xP k , and the simulated measurements yk . Start from
an initial state drawn from the zero-mean 2d-Gaussian distribution with
identity covariance.
(c) Use the Kalman filter for computing the state estimates mk using
R
-code along the following lines:
MATLAB
m = [0;0]; % Initial mean
P = eye(2); % Initial covariance
for k = 1:100
[m,P] = kf_predict(m,P,A,Q);
[m,P] = kf_update(m,P,y(k),H,R);
% Store the estimate m of state x_k here
end
(d) Plot the state estimates mk , the true states xk and measurements yk .
Compute the root mean squared error (RMSE) of using the first components of vectors mk as the estimates of the first components of states
xk . Also compute the RMSE error that we would have if we used the
measurements as the estimates.
3.2
Note that the model in Exercise 2.4 can be rewritten as a linear state space
model
wk D wk
1;
yk D Hk wk C "k ;
where Hk D .xk 1/, w0 N.0; 2 I/ and "k N.0; 1/. The state in
the model is now wk D .1 2 /T and the measurements are yk for k D
1; : : : ; T . Assume that the Kalman filter is used for processing the measurements y1 ; : : : ; yT . Your task is to prove that at time step T , the mean and
covariance of wT computed by the Kalman filter are the same as the mean
and covariance of the posterior distribution computed in Exercise 2.4.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
48
The Kalman filter equations for the above model can be written as:
T
1 Hk
Sk D Hk Pk
Kk D
Pk 1 HTk
mk D mk
Pk D Pk
C 1;
Sk ;
C Kk .yk
1 /;
H k mk
Kk Sk KTk :
(a) Write formulas for the posterior mean mk 1 and covariance Pk 1 assuming that they are the same as those which would be obtained if the
pairs f.xi ; yi / W i D 1; : : : ; k 1g were (batch) processed as in Exercise 2.4. Write similar equations for the mean mk and covariance Pk .
Show that the posterior means can be expressed in the form
mk
T
1 Xk 1 yk 1 ;
Pk XTk yk ;
D Pk
mk D
Pk
T
1 Hk
.Hk Pk
T
1
Hk
1 Hk C1/
Pk
D .Pk 11 CHTk Hk /
k D 1; : : : ; T;
(3.38)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
3.4
49
where sk 2 R are the known regressor values, R; 12 ; 22 ; b2 are given positive constants, yk 2 R are the observed output variables, and "k are independent Gaussian measurement errors. The scalars a1 , a2 , and b are the
unknown parameters to be estimated. Formulate the estimation problem as a
linear Gaussian state space model.
Consider the model
xk D
d
X
ai xk
C qk
1;
i D1
yk D xk C "k ;
3.5
(3.39)
where the values fyk g are observed, qk 1 and "k are independent Gaussian
noises, and the weights a1 ; : : : ; ad are known. The aim is to estimate the
sequence fxk g. Rewrite the problem as an estimation problem in a linear
Gaussian state space model.
Recall that the Gaussian probability density is defined as
1
1
T
1
:
exp
.x
m/
P
.x
m/
N.x j m; P / D
2
.2 /n=2 jP j1=2
Derive the following Gaussian identities.
(a) Let x and y have the Gaussian densities
p.x/ D N.x j m; P /;
P HT
H P HT C R
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
50
.y
b/; A
CB
C T /:
Hints:
D11 D12
and expand the
T
D22
D12
quadratic form in the Gaussian exponent.
Compute the derivative with respect to x and set it to zero. Conclude
that due to symmetry the point where the derivative vanishes is the
mean.
Check from a linear algebra book that the inverse of D11 is given by
the Schur complement:
D111 D A
CB
CT
D11 C B
Find the simplified expression for the mean by applying the identities
above.
Find the second derivative of the negative Gaussian exponent with
respect to x. Conclude that it must be the inverse conditional covariance of x.
Use the Schur complement expression above for computing the conditional covariance.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
4
Bayesian filtering equations and
exact solutions
In this chapter, we derive the Bayesian filtering equations, which are the
general equations for computing Bayesian filtering solutions to both linear
Gaussian and non-linear/non-Gaussian state space models. We also derive
the Kalman filtering equations which give the closed form solution to the
linear Gaussian Bayesian filtering problem.
1 /;
yk p.yk j xk /;
(4.1)
for k D 1; 2; : : :, where
xk 2 Rn is the state of the system at time step k,
yk 2 Rm is the measurement at time step k,
p.xk j xk 1 / is the dynamic model which describes the stochastic dynamics of the system. The dynamic model can be a probability density,
a counting measure or a combination of them depending on whether the
state xk is continuous, discrete, or hybrid.
p.yk j xk / is the measurement model, which is the distribution of measurements given the state.
The model is assumed to be Markovian, which means that it has the
following two properties.
51
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
52
1 ; y1Wk 1 /
D p.xk j xk
1 /:
(4.2)
j xk /:
(4.3)
1/
D p.yk j xk /:
(4.4)
C qk
yk D xk C rk ;
1;
N.0; Q/;
rk N.0; R/;
(4.5)
D N.xk j xk 1 ; Q/
1
1
2
exp
.xk xk 1 / ;
Dp
2Q
2Q
p.yk j xk / D N.yk j xk ; R/
1
1
2
.yk xk / ;
Dp
exp
2R
2R
1/
(4.6)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
53
Signal
Measurement
4
2
xk
0
2
4
6
8
10
0
20
40
60
Time step k
80
100
With the Markovian assumption and the filtering model (4.1), the joint
prior distribution of the states x0WT D fx0 ; : : : ; xT g, and the joint likelihood of the measurements y1WT D fy1 ; : : : ; yT g are, respectively,
p.x0WT / D p.x0 /
T
Y
p.xk j xk
1 /;
(4.7)
kD1
T
Y
p.y1WT j x0WT / D
p.yk j xk /:
(4.8)
kD1
In principle, for a given T we could simply compute the posterior distribution of the states by Bayes rule:
p.y1WT j x0WT / p.x0WT /
p.y1WT /
/ p.y1WT j x0WT / p.x0WT /:
p.x0WT j y1WT / D
(4.9)
However, this kind of explicit usage of the full Bayes rule is not feasible in
real-time applications, because the number of computations per time step
increases as new observations arrive. Thus, this way we could only work
with small data sets, because if the amount of data is unbounded (as in realtime sensing applications), then at some point of time the computations
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
54
(4.10)
1
p.yk j xk / p.xk j y1Wk
Zk
1 /;
(4.12)
(4.13)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
55
If some of the components of the state are discrete, the corresponding integrals are replaced with summations.
(a)
(b)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
56
Proof
as
p.xk ; xk
j y1Wk
1/
given y1Wk
D p.xk j xk
1 ; y1Wk 1 / p.xk 1
D p.xk j xk
1 / p.xk 1
j y1Wk
can be computed
j y1Wk
1 /;
1/
(4.14)
p.xk j y1Wk / D
1/
(4.16)
where the normalization constant is given by Equation (4.13). The disappearance of the measurement history y1Wk 1 in Equation (4.16) is due to the
conditional independence of yk of the measurement history, given xk .
xk
C qk
1;
yk D Hk xk C rk ;
(4.17)
1/
D N.xk j Ak
xk
1 ; Qk 1 /;
p.yk j xk / D N.yk j Hk xk ; Rk /:
(4.18)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
57
Theorem 4.2 (Kalman filter) The Bayesian filtering equations for the linear filtering model (4.17) can be evaluated in closed form and the resulting
distributions are Gaussian:
p.xk j y1Wk
1/
D N.xk j mk ; Pk /;
1/
D N.yk j Hk mk ; Sk /:
(4.19)
The parameters of the distributions above can be computed with the following Kalman filter prediction and update steps.
The prediction step is
mk D Ak
mk
Pk D Ak
Pk
1;
1
ATk
C Qk
1:
(4.20)
Hk mk ;
Sk D Hk Pk HTk C Rk ;
Kk D Pk HTk Sk 1 ;
mk D mk C Kk vk ;
Pk D Pk
Kk Sk KTk :
(4.21)
1 ; xk
j y1Wk
D p.xk j xk
given
1/
1 / p.xk 1
j y1Wk
1/
D N.xk j Ak 1 xk 1 ; Qk 1 / N.xk
xk 1 0 0
DN
m ;P ;
xk
j mk
1 ; Pk 1 /
(4.22)
where
mk 1
m D
Ak 1 mk
Pk 1
0
P D
Ak 1 Pk
0
;
1
Pk 1 ATk 1
Ak 1 Pk 1 ATk 1 C Qk
;
1
(4.23)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
58
1/
D N.xk j mk ; Pk /;
(4.24)
where
mk D Ak
mk
1;
P k D Ak
Pk
ATk
C Qk
1:
(4.25)
1/
1/
D N.yk j Hk xk ; Rk / N.xk j mk ; Pk /
xk 00 00
m
;
P
;
DN
yk
(4.26)
where
m00 D
mk
Hk mk
;
P 00 D
Pk
Hk Pk
Pk HTk
: (4.27)
Hk Pk HTk C Rk
1/
D p.xk j y1Wk /
D N.xk j mk ; Pk /;
(4.28)
where
mk D mk C Pk HTk .Hk Pk HTk C Rk / 1 yk
Pk D Pk
Pk HTk
.Hk Pk HTk
C Rk /
Hk mk ;
Hk Pk ;
(4.29)
(4.30)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
59
Signal
Measurement
Filter Estimate
95% Quantiles
4
2
xk
0
2
4
6
8
10
0
20
40
60
Time step k
80
100
The Kalman filter prediction and update equations are now given as
mk D mk
1;
Pk D Pk
C Q;
Pk
.yk
mk D mk C
Pk C R
mk /;
.Pk /2
:
Pk C R
Pk D Pk
(4.31)
The result of applying this Kalman filter to the data in Figure 4.1 is shown
in Figure 4.4.
Example 4.3 (Kalman filter for car tracking) By discretizing the state
space model for the car in Example 3.6 we get the following linear state
space model:
xk D A xk
C qk
yk D H xk C rk ;
1;
qk
N.0; Q/;
(4.32)
rk N.0; R/;
(4.33)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
60
0
q2c t 3
3
0
q2c
t 2
2
q1c t 2
2
0
c
q1 t
0
q2c t 2 C
C
2 C;
C
0 A
q2c t
where q1c and q2c are the spectral densities (continuous time variances)
of the process noises in each direction. The matrices in the measurement
model are
2
1 0 0 0
1 0
;
HD
;
RD
0 1 0 0
0 22
where 12 and 22 are the measurement noise variances in each position
coordinate.
The Kalman filter prediction step now becomes the following:
0
1
B 0
mk D B
@ 0
0
0
1
B 0
Pk D B
@ 0
0
0 t
1 0
0 1
0 0
1
0
t C
Cm
0 A k
1
0 t
1 0
0 1
0 0
1
0
t C
CP
0 A k
1
q1c t 3
B 3
B 0
CB
B q1c t 2
@ 2
0
0
q2c t 3
3
0
q2c
t
2
1;
0
1
1
B 0
B
@ 0
0
q1c t 2
2
q1c
0
t
0
0 t
1 0
0 1
0 0
1
q2c t 2 C
C
2 C:
C
0 A
q2c t
1T
0
t C
C
0 A
1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
61
2
True Trajectory
Measurements
Filter Estimate
0
2
x2
4
6
8
10
12
4
4
x1
10
12
0
;
22
Kk Sk KTk :
The result of applying this filter to simulated data is shown in Figure 4.5.
The parameter values used in the simulation were 1 D 2 D 1=2, q1c D
q2c D 1, and t D 1=10.
Although the Kalman filter can be seen as the closed form Bayesian
solution to the linear Gaussian filtering problem, the original derivation of
Kalman (1960b) is based on a different principle. In his seminal article,
Kalman (1960b) derived the filter by considering orthogonal projections
on the linear manifold spanned by the measurements. A similar approach
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
62
4.4 Exercises
4.1
Derive the Kalman filter equations for the following linear-Gaussian filtering
model with non-zero-mean noises:
xk D A xk
C qk
yk D H xk C rk ;
4.2
4.3
4.4
4.5
4.6
1;
(4.34)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
63
q ! q cos.!/ sin.!/
2! 3
q c sin2 .!/
2! 2
q sin .!/
2! 2
q !Cq c cos.!/ sin.!/
2!
c
(4.35)
R
The MATLAB
files are available on this books web page
www.cambridge.org/sarkka/
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
5
Extended and unscented Kalman filtering
It often happens in practical applications that the dynamic and measurement models are not linear and the Kalman filter is not appropriate. However, often the filtering distributions of this kind of model can be approximated by Gaussian distributions. In this chapter, three types of method
for forming the Gaussian approximations are considered, the Taylor series
based extended Kalman filters (EKF), the statistical linearization based statistically linearized filters (SLF), and the unscented transform based unscented Kalman filters (UKF). Among these, the UKF differs from the
other filters in this section in the sense that it is not a series expansion
based method per se even though it was originally justified by considering a series expansion of the non-linear function. Actually, the relationship
is even closer than that, because the UKF can be considered as an approximation of the SLF and it converges to the EKF in a suitable parameter
limit.
(5.1)
This actually only applies to invertible g./, but it can easily be generalized to the
non-invertible case.
64
(5.2)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
65
where jJ .y/j is the determinant of the Jacobian matrix of the inverse transform g 1 .y/. However, it is not generally possible to handle this distribution directly, because it is non-Gaussian for all but linear g.
A first order Taylor series based Gaussian approximation to the distribution of y can now be formed as follows. If we let x D m C x, where
x N.0; P /, we can form the Taylor series expansion of the function g./
as follows (provided that the function is sufficiently differentiable):
X1
.i /
xT Gxx
.m/ x ei C ;
g.x/ D g.mCx/ g.m/CGx .m/ xC
2
i
(5.3)
where Gx .m/ is the Jacobian matrix of g with elements
@gj .x/
Gx .m/j;j 0 D
;
(5.4)
@xj 0
xDm
and
.i /
.m/
Gxx
h
i
@2 gi .x/
.i /
Gxx .m/ 0 D
:
j;j
@xj @xj 0
(5.5)
xDm
(5.6)
(5.7)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
66
g.m//T
(5.10)
(5.11)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
67
Algorithm 5.1 (Linear approximation of an additive transform) The linear approximation based Gaussian approximation to the joint distribution of x and the transformed random variable y D g.x/ C q, where
x N.m; P / and q N.0; Q/, is given as
P CL
x
m
;
(5.12)
;
N
CLT SL
y
L
where
L D g.m/;
SL D Gx .m/ P GxT .m/ C Q;
CL D P GxT .m/;
(5.13)
@gj .x/
Gx .m/j;j 0 D
:
(5.14)
@xj 0
xDm
In filtering models where the process noise is not additive, we often need
to approximate transformations of the form
x N.m; P /;
q N.0; Q/;
y D g.x; q/;
(5.15)
where x and q are independent random variables. The mean and covariance
can now be computed by substituting the augmented vector .x; q/ for the
vector x in Equation (5.10). The joint Jacobian matrix can then be written
as Gx;q D .Gx Gq /. Here Gq is the Jacobian matrix of g./ with respect
to q and both Jacobian matrices are evaluated at x D m; q D 0. The
approximations to the mean and covariance of the augmented transform as
in Equation (5.10) are then given as
Q
Eg.x;
q/ ' g.m; 0/;
T
I
0
P 0
I
0
Q
Covg.x;
q/ '
Gx .m/ Gq .m/
0 Q
Gx .m/ Gq .m/
T
P
P Gx .m/
D
:
Gx .m/ P Gx .m/ P GxT .m/ C Gq .m/ Q GqT .m/
(5.16)
The approximation above can be formulated as the following algorithm.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
68
(5.18)
@gj .x; q/
Gx .m/j;j 0 D
;
(5.19)
@xj 0
xDm; qD0
@gj .x; q/
Gq .m/ j;j 0 D
:
(5.20)
@qj 0
xDm; qD0
In quadratic approximations, in addition to the first order terms, the second order terms in the Taylor series expansion of the non-linear function
are also retained.
Algorithm 5.3 (Quadratic approximation of an additive non-linear transform) The second order approximation is of the form
P CQ
x
m
;
;
(5.21)
N
CQT SQ
y
Q
where the parameters are
n
o
1X
.i /
Q D g.m/ C
ei tr Gxx
.m/ P ;
2 i
o
n
1X
.i /
.i 0 /
SQ D Gx .m/ P GxT .m/ C
ei eTi 0 tr Gxx
.m/ P Gxx
.m/ P ;
2 i;i 0
CQ D P GxT .m/;
(5.22)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
69
.i /
Gx .m/ is the Jacobian matrix (5.14), and Gxx
.m/ is the Hessian matrix
of gi ./ evaluated at m:
h
i
2
@
g
.x/
i
.i /
;
(5.23)
Gxx
.m/ 0 D
j;j
@xj @xj 0
xDm
1/
C qk
1;
yk D h.xk / C rk ;
(5.24)
(5.25)
1 /;
Pk D Fx .mk
1 / Pk 1
FxT .mk
1/
C Qk
1;
(5.26)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
70
Update:
vk D yk
h.mk /;
Kk Sk KTk :
(5.27)
1/
C qk
1;
(5.28)
0 0
m
;
P
;
(5.29)
where
mk 1
m D
;
f .mk 1 /
Pk 1
Pk 1 FxT
P0 D
Fx Pk 1 Fx Pk 1 FxT C Qk
0
;
(5.30)
1 /;
Pk D Fx Pk
FxT C Qk
1:
(5.31)
(5.32)
1/ ' N
xk
yk
00 00
m
;
P
;
(5.33)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
71
where
m00 D
mk
;
h.mk /
P 00 D
Pk
Hx Pk
Pk HTx
; (5.34)
Hx Pk HTx C Rk
1/
' N.xk j mk ; Pk /;
(5.35)
where
1
Pk HTx
.Hx Pk HTx
C Rk /
yk
h.mk /;
Hx Pk :
(5.36)
1 ; qk 1 /;
yk D h.xk ; rk /;
(5.37)
1 ; 0/;
Pk D Fx .mk
1 / Pk 1
FxT .mk
1/
C Fq .mk
1 / Qk 1
FqT .mk
1 /;
(5.38)
Update:
vk D y k
h.mk ; 0/;
Kk Sk KTk ;
(5.39)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
72
where the matrices Fx .m/, Fq .m/, Hx .m/, and Hr .m/ are the Jacobian
matrices of f and h with respect to state and noise, with elements
@fj .x; q/
Fx .m/j;j 0 D
;
(5.40)
@xj 0
xDm; qD0
@fj .x; q/
Fq .m/ j;j 0 D
;
(5.41)
@qj 0
xDm; qD0
@hj .x; r/
Hx .m/j;j 0 D
;
(5.42)
@xj 0
xDm; rD0
@hj .x; r/
Hr .m/j;j 0 D
:
(5.43)
@rj 0
xDm; rD0
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
73
Algorithm 5.6 (Extended Kalman filter III) The prediction and update
steps of the second order extended Kalman filter (in the additive noise case)
are:
Prediction:
mk D f .mk
1/ C
n
1X
.i /
ei tr Fxx
.mk
2 i
1 / Pk
o
1
.i 0 /
1 Fxx .mk 1 /Pk 1
C Qk
1;
(5.44)
Update:
vk D y k
h.mk /
o
n
1X
ei tr H.ixx/ .mk / Pk ;
2 i
Kk Sk KTk ;
(5.45)
where the matrices Fx .m/ and Hx .m/ are given by Equations (5.40) and
.i /
(5.42). The matrices Fxx
.m/ and H.ixx/ .m/ are the Hessian matrices of fi
and hi respectively:
h
i
@2 fi .x/
.i /
Fxx .m/ 0 D
;
(5.46)
j;j
@xj @xj 0
xDm
h
i
2
@
h
.x/
i
H.ixx/ .m/ 0 D
:
(5.47)
j;j
@xj @xj 0
xDm
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
74
5
True Angle
Measurements
EKF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
xk
f .xk
1/
yk D sin.x1;k / Crk ;
(5.48)
h.xk /
where qk
q t
3
q c t 2
2
q t
2
c
q t
(5.49)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
75
in the angle is 0:12 which is much lower than the standard deviation of the
measurement noise which was 0:32.
(5.51)
MSE.b; A/ D E.g.x/
A x/T .g.x/
A x/:
(5.52)
(5.53)
In this approximation to the transform g.x/, b is now exactly the mean and
the approximate covariance is given as
E.g.x/
Eg.x// .g.x/
Eg.x//T
' A P AT
D Eg.x/ xT P
Eg.x/ xT T :
(5.54)
Q
We may now apply this approximation to the augmented function g.x/
D
.x; g.x// in Equation (5.9) of Section 5.1, where we get the approximations
m
Q
Eg.x/
'
;
Eg.x/
P
Eg.x/ xT T
Q
Covg.x/
'
: (5.55)
Eg.x/ xT Eg.x/ xT P 1 Eg.x/ xT T
Thus we get the following algorithm.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
76
Algorithm 5.7 (Statistically linearized approximation of an additive transform) The statistical linearization based Gaussian approximation to the
joint distribution of x and the transformed random variable y D g.x/ C q,
where x N.m; P / and q N.0; Q/, is given as
x
m
P CS
N
;
;
(5.56)
CST SS
y
S
where
S D Eg.x/;
SS D Eg.x/ xT P
Eg.x/ xT T C Q;
CS D Eg.x/ xT T :
(5.57)
SS D Eg.x; q/ xT P
C Eg.x; q/ qT Q
Eg.x; q/ xT T
Eg.x; q/ qT T ;
CS D Eg.x; q/ xT T :
The expectations are taken with respect to the variables x and q.
(5.60)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
77
If the function g.x/ is differentiable, it is possible to use the following well-known property of Gaussian random variables for simplifying the
expressions:
Eg.x/ .x
(5.61)
(5.63)
and Gx .x/ is the Jacobian matrix of g. The expectations are taken with
respect to the distribution of x.
Note that we actually only need to compute the expectation Eg.x/,
because if we know the function
S .m/ D Eg.x/;
(5.64)
(5.65)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
78
(5.24) or (5.37). The filter is similar to the EKF, except that statistical linearizations in Algorithms 5.7, 5.8, and 5.9 are used instead of the Taylor
series approximations.
Algorithm 5.10 (Statistically linearized filter I) The prediction and update steps of the additive noise statistically linearized (Kalman) filter are:
Prediction:
mk D Ef .xk
1 /;
Pk D Ef .xk
T
1
1 / xk 1 Pk 1
Ef .xk
T
T
1 / xk 1
C Qk
1;
(5.66)
Eh.xk /;
Q Tk .Pk /
Sk D Eh.xk / x
Q Tk T C Rk ;
Eh.xk / x
Q Tk T Sk 1 ;
Kk D Eh.xk / x
mk D mk C Kk vk ;
Kk Sk KTk ;
Pk D Pk
(5.67)
1 ; qk 1 /;
T
1
T
T
1 ; qk 1 / xk 1 Pk 1 Ef .xk 1 ; qk 1 / xk 1
Ef .xk 1 ; qk 1 / qTk 1 Qk 1 1 Ef .xk 1 ; qk 1 / qTk 1 T ;
Pk D Ef .xk
C
(5.68)
where xk 1 D xk
to the variables xk
1
1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
79
5
True Angle
Measurements
SLF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
Update:
vk D y k
Eh.xk ; rk /;
Q Tk .Pk /
Sk D Eh.xk ; rk / x
Q Tk T
Eh.xk ; rk / x
Kk Sk KTk ;
(5.69)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
80
(5.70)
c11 c12
;
m/ D
c21 c22
T
(5.71)
where
c11 D P11 C t P12 ;
c12 D P12 C t P22 ;
c21 D P12
c22 D P22
(5.72)
cos.m1 / P11 exp. P11 =2/
:
cos.m1 / P12 exp. P11 =2/
(5.73)
and
Eh.x/ .x
m/T D
The above computations reveal the main weakness of statistical linearization based filtering there is no hope of computing the above expectations
when the functions are complicated. In this case we were lucky, because
the only non-linearities in the dynamic and measurement models were sinusoidal, for which the closed form expectations can be computed.
The result of applying the SLF to the same simulated pendulum data as
was used with the EKF in Figure 5.1 is shown in Figure 5.2. The result of
the SLF is practically the same as that of the EKF. However, the RMSE of
the SLF is slightly lower than that of the EKF.
The advantage of the SLF over the EKF is that it is a more global approximation than the EKF, because the linearization is not only based on
the local region around the mean but on a whole range of function values. The non-linearities also do not have to be differentiable. However, if
the non-linearities are differentiable, then we can use the Gaussian random
variable property (5.61) for rewriting the equations in an EKF-like form.
The clear disadvantage of the SLF over the EKF is that the expected values
of the non-linear functions have to be computed in closed form. Naturally,
it is not possible for all functions. Fortunately, the expected values involved
are of such a type that one is likely to find many of them tabulated in older
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
81
physics and control engineering books (see, e.g., Gelb and Vander Velde,
1968).
The statistically linearized filter (SLF) is a special case of the Fourier
Hermite Kalman filter (FHKF), when the first order truncation of the series
is used (Sarmavuori and Sarkka, 2012a). Many of the sigma-point methods
can also be interpreted as approximations to the FourierHermite Kalman
filters and statistically linearized filters (see Van der Merwe and Wan, 2003;
Sarkka and Hartikainen, 2010b; Sarmavuori and Sarkka, 2012a).
Note that this Gaussianity assumption is one interpretation, but the unscented transform
can also be applied without the Gaussian assumption. However, because the assumption
makes Bayesian interpretation of the UT much easier, we shall use it here.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
82
4
4
5
2
0
(a) Original
(b) Transformed
4
5
2
0
(a) Original
(b) Transformed
X .0/ D m;
X .i / D m C
X .i Cn/ D m
hp i
p
nC
P ;
hp i i
p
nC
P ;
i
i D 1; : : : ; n;
(5.74)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
4
5
83
2
0
(a) Original
(b) Transformed
where i denotes the ith column of the matrix, and is a scaling parameter, which is defined in terms of algorithm parameters and as
follows:
D 2 .n C /
n:
(5.75)
The parameters and determine the spread of the sigma points around
the mean (Wan and Van der Merwe, 2001). The matrix square root dep p T
notes a matrix such that P P D P .
2 Propagate the sigma points through the non-linear function g./:
Y .i / D g.X .i / /;
i D 0; : : : ; 2n;
2n
X
Wi.m/ Y .i / ;
i D0
Covg.x/ ' SU D
2n
X
iD0
Wi.c/ .Y .i /
U / .Y .i /
U /T ;
(5.76)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
84
where the constant weights Wi.m/ and Wi.c/ are given as follows (Wan
and Van der Merwe, 2001):
;
nC
D
C .1 2 C /;
nC
1
D
; i D 1; : : : ; 2n;
2.n C /
1
D
; i D 1; : : : ; 2n;
2.n C /
W0.m/ D
W0.c/
Wi.m/
Wi.c/
(5.77)
and is an additional algorithm parameter that can be used for incorporating prior information on the (non-Gaussian) distribution of x (Wan
and Van der Merwe, 2001).
Q
If we apply the unscented transform to the augmented function g.x/
D
.x; g.x//, we simply get a set of sigma points, where the sigma points X .i /
and Y .i / have been concatenated to the same vector. Thus, also forming the
approximation to the joint distribution x and g.x/ C q is straightforward
and the result is the following algorithm.
Algorithm 5.12 (Unscented approximation of an additive transform) The
unscented transform based Gaussian approximation to the joint distribution of x and the transformed random variable y D g.x/ C q, where
x N.m; P / and q N.0; Q/, is given as
x
m
P CU
N
;
;
(5.78)
y
U
CUT SU
where the submatrices can be computed as follows.
1 Form the set of 2n C 1 sigma points as follows:
X .0/ D m;
X .i / D m C
X .i Cn/ D m
hp i
p
nC
P ;
hp i i
p
nC
P ;
i
i D 1; : : : ; n;
Y .i / D g.X .i / /;
i D 0; : : : ; 2n:
(5.79)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
85
2n
X
Wi.m/ Y .i / ;
iD0
SU D
2n
X
Wi.c/ .Y .i /
U / .Y .i /
U /T C Q;
Wi.c/ .X .i /
m/ .Y .i /
U /T ;
i D0
CU D
2n
X
(5.80)
i D0
where the constant weights Wi.m/ and Wi.c/ were defined in Equation (5.77).
The unscented transform approximation to a transformation of the form
y D g.x; q/ can be derived by considering the augmented random variable
Q D .x; q/ as the random variable in the transform. The resulting algorithm
x
is the following.
Algorithm 5.13 (Unscented approximation of a non-additive transform)
The (augmented) unscented transform based Gaussian approximation
to the joint distribution of x and the transformed random variable
y D g.x; q/ when x N.m; P / and q N.0; Q/ is given as
x
m
P CU
N
;
;
(5.81)
y
U
CUT SU
where the sub-matrices can be computed as follows. Let the dimensionalities of x and q be n and nq , respectively, and let n0 D n C nq .
Q D .x; q/:
1 Form the sigma points for the augmented random variable x
Q
XQ .0/ D m;
Q C
XQ .i / D m
0
Q
XQ .i Cn / D m
hp i
PQ ;
hp i i
p
0
0
n C
PQ ;
p
n0 C 0
i D 1; : : : ; n0 ;
(5.82)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
86
U D
2n
X
0
Wi.m/ YQ .i / ;
i D0
0
SU D
2n
X
0
Wi.c/ .YQ .i /
U / .YQ .i /
U /T ;
i D0
0
CU D
2n
X
0
Wi.c/ .XQ .i /;x
m/ .YQ .i /
U /T ;
i D0
0
where the definitions of the weights Wi.m/ and Wi.c/ are the same as in
Equation (5.77), but with n replaced by n0 and replaced by 0 .
The unscented transform is a third order method in the sense that the
estimate of the mean of g./ is exact for polynomials up to order three.
That is, if g./ is indeed a multi-variate polynomial of order three, the
mean is exact. However, the covariance approximation is exact only for
first order polynomials, because the square of a second order polynomial is
already a polynomial of order four, and the unscented transform (UT) does
not compute the exact result for fourth order polynomials. In this sense
the UT is only a first order method. With suitable selection of parameters
( D 3 n) it is possible to get some of the fourth order terms appearing in
the covariance computation right also for quadratic functions, but not all.
(5.83)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
87
where mk and Pk are the mean and covariance computed by the algorithm.
Algorithm 5.14 (Unscented Kalman filter I) In the additive form of the
unscented Kalman filter (UKF) algorithm, which can be applied to additive
models of the form (5.24), the following operations are performed at each
measurement step k D 1; 2; 3; : : :
Prediction:
1 Form the sigma points:
Xk.0/1 D mk 1 ;
Xk.i /1 D mk
Xk.iCn/
1 D mk
p
1
nC
hp
Pk
hp
nC
Pk
ii
i
;
;
i D 1; : : : ; n;
(5.84)
XOk.i / D f .Xk.i /1 /;
i D 0; : : : ; 2n:
(5.85)
2n
X
Wi.m/ XOk.i / ;
i D0
Pk D
2n
X
Wi.c/ .XOk.i /
mk / .XOk.i /
m k /T C Q k
1;
(5.86)
i D0
where the weights Wi.m/ and Wi.c/ were defined in Equation (5.77).
Update:
1 Form the sigma points:
Xk .0/ D mk ;
Xk .i / D mk C
Xk .i Cn/ D mk
hq i
p
nC
Pk ;
i
hq
i
p
nC
Pk ;
i
i D 1; : : : ; n:
(5.87)
YO k.i / D h.Xk .i / /;
i D 0; : : : ; 2n:
(5.88)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
88
3 Compute the predicted mean k , the predicted covariance of the measurement Sk , and the cross-covariance of the state and the measurement Ck :
k D
2n
X
Wi.m/ YO k.i / ;
iD0
Sk D
2n
X
k / .YO k.i /
k /T C Rk ;
i D0
Ck D
2n
X
Wi.c/ .Xk .i /
mk / .YO k.i /
k /T :
(5.89)
i D0
4 Compute the filter gain Kk , the filtered state mean mk and the covariance Pk , conditional on the measurement yk :
Kk D Ck Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Kk Sk KTk :
k ;
(5.90)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
89
1 C
n0 C 0
q
PQ k
;
1
i
0
/
Qk
XQk.iCn
Dm
1
q
p
0
0
n C
PQ k
;
i D 1; : : : ; n0 ; (5.91)
where
Qk
m
mk 1
;
D
0
PQ k
Pk
D
0
0
Qk
:
1
i D 0; : : : ; 2n0 ;
(5.92)
Q .i /
Q .i /;q
where XQk.i /;x
1 denotes the first n components in Xk 1 and Xk 1 denotes the last nq components.
3 Compute the predicted mean mk and the predicted covariance Pk :
mk D
2n
X
0
Wi.m/ XOk.i / ;
i D0
Pk D
2n
X
Wi.c/ .XOk.i /
mk / .XOk.i /
mk /T ;
(5.93)
i D0
0
where the weights Wi.m/ and Wi.c/ are the same as in Equation (5.77),
but with n replaced by n0 and by 0 .
Update:
1 Form the sigma points for the augmented random variable .xk ; rk /:
Q k;
XQk .0/ D m
Qk C
XQk .i / D m
n00 C 00
q
PQ k
;
i
00
XQk .i Cn / D mk
n00
00
q
PQ k ;
i
i D 1; : : : ; n00 ; (5.94)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
90
where
Qk D
m
mk
;
0
Pk
PQ k D
0
0
:
Rk
i D 0; : : : ; 2n00 ;
(5.95)
where XQk .i /;x denotes the first n components in XQk .i / and XQk .i /;r
denotes the last nr components.
3 Compute the predicted mean k , the predicted covariance of the measurement Sk , and the cross-covariance of the state and the measurement Ck :
00
k D
2n
X
00
Wi.m/ YO k.i / ;
iD0
00
Sk D
2n
X
00
k / .YO k.i /
k /T ;
iD0
00
Ck D
2n
X
00
mk / .YO k.i /
k /T ;
(5.96)
iD0
00
00
where the weights Wi.m/ and Wi.c/ are the same as in Equation (5.77), but with n replaced by n00 and by 00 .
4 Compute the filter gain Kk and the filtered state mean mk and covariance Pk , conditional to the measurement yk :
Kk D Ck Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Kk Sk KTk :
k ;
(5.97)
The advantage of the UKF over the EKF is that the UKF is not based on
a linear approximation at a single point, but uses further points in approximating the non-linearity. As discussed in Julier and Uhlmann (2004), the
unscented transform is able to capture the higher order moments caused by
the non-linear transform better than Taylor series based approximations.
However, as already pointed out in the previous section, although the mean
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
91
5
True Angle
Measurements
UKF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
92
5.7 Exercises
5.1
0:01 sin.xk
yk D 0:5 sin.2 xk / C rk ;
1/
C qk
1;
(5.98)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
5.2
5.3
93
For the above model, derive the required expected values for an SLF and
implement the SLF for the model. Hint: use the imaginary part of the inverse
Fourier transform of the Gaussian distribution. Compute the RMSE values,
plot the results, and compare the performance to the EKF above.
In this exercise your task is to derive the derivative form of the statistically
linearized filter (SLF).
(a) Prove using integration by parts the following identity for a Gaussian
random variable x, differentiable non-linear function g.x/, and its Jacobian matrix Gx .x/ D @g.x/=@x:
Eg.x/ .x
(5.99)
(5.100)
Implement a UKF for the model in Exercise 5.1. Plot the results and compare
the RMSE values to the EKF and the SLF.
In this exercise we consider a classical bearings only target tracking problem
which frequently arises in the context of passive sensor tracking. In this
problem there is single target in the scene and two angular sensors are used
for tracking it. The scenario is illustrated in Figure 5.7.
The state of the target at time step k consist of the position .xk ; yk / and the
velocity .xP k ; yPk /. The dynamics of the state vector xk D .xk yk xP k yPk /T
are modeled with the discretized Wiener velocity model:
0
1 0
10
1
1 0 t
0
xk 1
xk
B yk C B 0 1 0 t C B yk 1 C
B
C B
CB
C C qk 1 ;
@ xP k A D @ 0 0 1
0 A @ xP k 1 A
0 0 0
1
yPk 1
yPk
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
94
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
95
Implement a UKF for the bearings only target tracking problem in Exercise 5.5. Compare the performance to the EKF.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
6
General Gaussian filtering
Quite soon after the unscented Kalman filter (UKF) was published, Ito and
Xiong (2000) pointed out that the UKF can be considered as a special
case of so-called Gaussian filters, where the non-linear filtering problem is
solved using Gaussian assumed density approximations. The generalized
framework also enables the usage of various powerful Gaussian quadrature and cubature integration methods (Wu et al., 2006; Arasaratnam and
Haykin, 2009). The series expansion based filters presented in the previous sections can be seen as approximations to the general Gaussian filter.
In this section we present the Gaussian filtering framework and show how
the GaussHermite Kalman filter (GHKF) and the cubature Kalman filter
(CKF) can be derived as its approximations. We also show how the UKF
can be seen as a generalization of the CKF.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
97
where
Z
M D
SM D
.g.x/
M / .g.x/
Z
CM D
.x
m/ .g.x/
M /T N.x j m; P / dx C Q;
M /T N.x j m; P / dx:
(6.2)
SM D
.g.x; q/
M / .g.x; q/
M /T
.x
m/ .g.x; q/
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
98
2000; Wu et al., 2006). The key idea is to assume that the filtering distribution is indeed Gaussian,
p.xk j y1Wk / ' N.xk j mk ; Pk /;
(6.5)
f .xk
1/
N.xk
j mk
1 ; Pk 1 /
Z
Pk D
.f .xk
1/
N.xk
mk / .f .xk
j mk
1/
1 ; Pk 1 /
dxk
1;
mk /T
dxk
C Qk
1:
(6.6)
Update:
Z
k D
Sk D
.h.xk /
k / .h.xk /
Z
Ck D
.xk
mk / .h.xk /
k /T N.xk j mk ; Pk / dxk C Rk ;
k /T N.xk j mk ; Pk / dxk ;
Kk D Ck Sk 1 ;
mk D mk C Kk .yk
Pk D Pk
k /;
Kk Sk KTk :
(6.7)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
1 ; qk 1 /
N.xk
j mk
1 ; Pk 1 /
N.qk
j 0; Qk
Z
Pk D
99
.f .xk
1 ; qk 1 /
N.xk
j mk
mk / .f .xk
1 ; Pk 1 /
dxk
dqk
1;
mk /T
1 ; qk 1 /
N.qk
1/
j 0; Qk
1/
dxk
dqk 1 :
(6.8)
Update:
Z
k D
h.xk ; rk /
N.xk j mk ; Pk / N.rk j 0; Rk / dxk drk ;
Z
Sk D
.h.xk ; rk /
k / .h.xk ; rk /
k /T
.xk
mk / .h.xk ; rk /
k /T
k /;
Kk Sk KTk :
(6.9)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
100
dp
exp. x 2 =2/:
dx p
(6.12)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
101
1;
H3 .x/ D x
3x;
H4 .x/ D x
6 x 2 C 3;
(6.13)
p Hp 1 .x/:
(6.14)
Using the same weights and sigma points, integrals over non-unit Gaussian
weights functions N.x j m; P / can be evaluated using a simple change of
integration variable:
Z 1
Z 1
g.x/ N.x j m; P / dx D
g.P 1=2 C m/ N. j 0; 1/ d: (6.15)
1
p2
p
:
Hp 1 . .i / /2
i D1
(6.17)
(6.18)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
102
By generalizing the change of variables idea, we can form approximations to multi-dimensional integrals of the form (6.10). First let P D
p
p p T
P P , where P is the Cholesky factor of the covariance matrix P or
some other similar square root of the covariance matrix. If we define new
integration variables by
p
x D m C P ;
(6.19)
we get
Z
Z
p
g.x/ N.x j m; P / dx D g.m C P / N. j 0; I/ d:
(6.20)
(6.21)
The weights Wik ; k D 1; : : : ; n are simply the corresponding onedimensional GaussHermite weights and .i1 ;:::;in / is an n-dimensional
vector with one-dimensional unit sigma point .ik / at element k. The
algorithm can now be written as follows.
Algorithm 6.6 (GaussHermite cubature) The pth order GaussHermite
approximation to the multi-dimensional integral
Z
g.x/ N.x j m; P / dx
(6.22)
can be computed as follows.
1 Compute the one-dimensional weights Wi ; i D 1; : : : ; p and unit sigma
points .i / as in the one-dimensional GaussHermite quadrature Algorithm 6.5.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
103
(6.23)
(6.25)
where
p p T
P P .
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
104
4
4
5
2
0
(a) Original
(b) Transformed
.i1 ;:::;in / ;
i1 ; : : : ; in D 1; : : : ; p;
(6.26)
where the unit sigma points .i1 ;:::;in / were defined in Equation (6.24).
2 Propagate the sigma points through the dynamic model:
i1 ; : : : ; in D 1; : : : ; p:
(6.27)
Pk D
mk / .XOk.i1 ;:::;in /
mk /T C Qk
1;
i1 ;:::;in
(6.28)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
105
Xk .i1 ;:::;in / D mk C
q
Pk .i1 ;:::;in / ;
i1 ; : : : ; in D 1; : : : ; p;
(6.29)
where the unit sigma points .i1 ;:::;in / were defined in Equation (6.24).
2 Propagate sigma points through the measurement model:
i1 ; : : : ; in D 1; : : : ; p:
(6.30)
3 Compute the predicted mean k , the predicted covariance of the measurement Sk , and the cross-covariance of the state and the measurement Ck :
X
k D
Wi1 ;:::;in YO k.i1 ;:::;in / ;
i1 ;:::;in
Sk D
k /T C Rk ;
i1 ;:::;in
Ck D
k /T ;
i1 ;:::;in
(6.31)
where the weights Wi1 ;:::;in were defined in Equation (6.23).
4 Compute the filter gain Kk , the filtered state mean mk and the covariance Pk , conditional on the measurement yk :
Kk D Ck Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Kk Sk KTk :
k ;
(6.32)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
106
5
True Angle
Measurements
GHKF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
that in this particular case the higher order approximation to the Gaussian
integrals does not really help. It is possible that the consistency and stability properties of the filters are indeed different, but it is impossible to know
based on this single test.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
107
(6.34)
where the points u.i / belong to the symmetric set 1 with generator
.1; 0; : : : ; 0/ (see, e.g., Wu et al., 2006; Arasaratnam and Haykin, 2009):
9
80 1 0 1
0 1 0 1
0
1
0
1
>
>
B C B C
>
B 0 C B 1C
>
1
0
>
<B C B C
B C B C
=
B0C B0C
B0C B0C
1 D B C ; B C ; B C ; B C ; ;
(6.35)
B :: C B :: C
B :: C B :: C
>
>
>
A
@
A
@
A
@
A
@
:
:
:
:
>
>
:
;
0
0
0
0
and W is a weight and c is a parameter yet to be determined.
Because the point set is symmetric, the rule is exact for all monomials
of the form x1d1 x2d2 xndn , if at least one of the exponents di is odd. Thus
we can construct a rule which is exact up to third degree by determining
the coefficients W and c such that it is exact for selections gj ./ D 1 and
gj ./ D j2 . Because the true values of the integrals are
Z
N. j 0; I/ d D 1;
Z
j2 N. j 0; I/ d D 1;
(6.36)
we get the equations
W
1 D W 2n D 1;
X
W
c uj.i / 2 D W 2c 2 D 1;
(6.37)
W D
(6.38)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
108
That is, we get the following simple rule which is exact for monomials up
to third degree:
Z
1 X p .i /
g. n u /:
g./ N. j 0; I/ d
(6.39)
2n i
We can now easily extend the method to arbitrary mean and covariance by
using the change of variables in Equations (6.19) and (6.20) and the result
is the following algorithm.
Algorithm 6.8 (Spherical cubature integration) The third order spherical
cubature approximation to the multi-dimensional integral
Z
g.x/ N.x j m; P / dx
(6.40)
can be computed as follows.
1 Compute the unit sigma points as
(p
n ei ;
i D 1; : : : ; n;
.i /
D
p
n ei n ; i D n C 1; : : : ; 2n;
(6.41)
p p T
P P .
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
4
5
109
2
0
(a) Original
(b) Transformed
its weights are always positive, which is not always true for more general
methods (Wu et al., 2006).
We can generalize the above approach by using a 2n C 1 point approximation, where the origin is also included:
Z
X
g.c u.i / /:
(6.43)
g./ N. j 0; I/ d W0 g.0/ C W
i
We can now solve for the parameters W0 , W , and c such that we get the
exact result with selections gj ./ D 1 and gj ./ D j2 . The solution can
be written in the form
W0 D
;
nC
1
;
W D
2.n C /
p
c D n C ;
(6.44)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
110
X
p
1
g.m/ C
g.m C P .i / /;
nC
2.n C / i D1
where
.i /
(p
n C ei ;
D
p
n C ei
i D 1; : : : ; n;
i D n C 1; : : : ; 2n:
n;
(6.45)
(6.46)
The rule can be seen to coincide with the original UT (Julier and Uhlmann,
1995), which corresponds to the unscented transform presented in Section 5.5 with D 1, D 0, and where is left as a free parameter. With
the selection D 3 n, we can also match the fourth order moments of the
distribution (Julier and Uhlmann, 1995), but with the price that when the
dimensionality n > 3, we get negative weights and approximation rules
that can sometimes be unstable. But nothing prevents us from using other
values for the parameter.
Note that third order here means a different thing than in the Gauss
Hermite Kalman filter the pth order GaussHermite filter is exact for
monomials up to order 2p 1, which means that the third order GHKF is
exact for monomials up to fifth order. The third order spherical cubature
rule is exact only for monomials up to third order. It is also possible to
derive symmetric rules that are exact for higher than third order. However,
this is no longer possible with a number of sigma points which is linear
O.n/ in state dimension (Wu et al., 2006; Arasaratnam and Haykin, 2009).
For example, for a fifth order rule, the required number of sigma points is
proportional to n2 , the state dimension squared.
As in the case of the unscented transform, being exact up to order three
only ensures that the estimate of the mean of g./ is exact for polynomials
of order three. The covariance will be exact only for polynomials up to
order one (linear functions). In this sense the third order spherical cubature
rule is actually a first order spherical cubature rule for the covariance.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
111
Algorithm 6.9 (Cubature Kalman filter I) The additive form of the cubature Kalman filter (CKF) algorithm is the following.
Prediction:
1 Form the sigma points as:
p
Xk.i /1 D mk 1 C Pk
.i / ;
i D 1; : : : ; 2n;
(6.47)
(6.48)
XOk.i / D f .Xk.i /1 /;
i D 1; : : : ; 2n:
(6.49)
mk D
1 X O .i /
X ;
2n i D1 k
Pk D
1 X O .i /
.X
2n i D1 k
2n
mk / .XOk.i /
m k /T C Q k
1:
(6.50)
Update:
1 Form the sigma points:
Xk .i / D mk C
q
Pk .i / ;
i D 1; : : : ; 2n;
(6.51)
YO k.i / D h.Xk .i / /;
i D 1 : : : 2n:
(6.52)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
112
3 Compute the predicted mean k , the predicted covariance of the measurement Sk , and the cross-covariance of the state and the measurement Ck :
2n
k D
1 X O .i /
Y ;
2n i D1 k
Sk D
1 X O .i /
.Y
2n i D1 k
Ck D
1 X
.X .i /
2n i D1 k
2n
k / .YO k.i /
k /T C Rk ;
2n
mk / .YO k.i /
k /T :
(6.53)
4 Compute the filter gain Kk and the filtered state mean mk and covariance Pk , conditional on the measurement yk :
Kk D Ck Sk 1 ;
mk D mk C Kk yk
Pk D Pk
k ;
Kk Sk KTk :
(6.54)
By applying the cubature rule to the non-additive Gaussian filter in Algorithm 6.4 we get the following augmented form of the cubature Kalman
filter (CKF).
Algorithm 6.10 (Cubature Kalman filter II) The augmented non-additive
form of the cubature Kalman filter (CKF) algorithm is the following.
Prediction:
1 Form the matrix of sigma points for the augmented random variable
.xk 1 ; qk 1 /:
q
0
Q k 1 C PQ k 1 .i / ;
XQk.i /1 D m
i D 1; : : : ; 2n0 ;
(6.55)
where
Qk
m
mk 1
;
0
PQ k
Pk
0
0
Qk
:
1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
113
i D 1; : : : ; 2n0 ;
(6.57)
Q .i /
Q .i /;q
where XQk.i /;x
1 denotes the first n components in Xk 1 and Xk 1 denotes the last nq components.
3 Compute the predicted mean mk and the predicted covariance Pk :
0
2n
1 X O .i /
mk D 0
X ;
2n iD1 k
0
2n
1 X O .i /
.X
Pk D 0
2n iD1 k
mk / .XOk.i /
mk /T :
(6.58)
Update:
1 Let n00 D n C nr , where n is the dimensionality of the state and nr is
the dimensionality of the measurement noise. Form the sigma points
for the augmented vector .xk ; rk / as follows:
q
00
.i /
Q
Q k C PQ k .i / ;
Xk D m
i D 1; : : : ; 2n00 ;
(6.59)
where
Qk D
m
mk
;
0
Pk
PQ k D
0
0
:
Rk
00
The unit sigma points .i / are defined as in Equation (6.56), but with
n0 replaced by n00 .
2 Propagate the sigma points through the measurement model:
i D 1; : : : ; 2n00 ;
(6.60)
where XQk .i /;x denotes the first n components in XQk .i / and XQk .i /;r
denotes the last nr components.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
114
3 Compute the predicted mean k , the predicted covariance of the measurement Sk , and the cross-covariance of the state and the measurement Ck :
00
2n
1 X O .i /
Y ;
k D 00
2n iD1 k
00
2n
1 X O .i /
Sk D 00
.Y
2n iD1 k
k / .YO k.i /
k /T ;
00
2n
1 X
.Xk .i /;x
Ck D 00
2n iD1
mk / .YO k.i /
k /T :
(6.61)
4 Compute the filter gain Kk , the filtered state mean mk and the covariance Pk , conditional on the measurement yk :
Kk D Ck Sk 1 ;
mk D mk C Kk yk
Pk D Pk
Kk Sk KTk :
k ;
(6.62)
Although in the cubature Kalman filter (CKF) literature the third order characteristic of the cubature integration rule is often emphasized (see
Arasaratnam and Haykin, 2009), it is important to remember that in the
covariance computation, the rule is only exact for first order polynomials.
Thus in that sense CKF is a first order method.
Example 6.2 (Pendulum tracking with CKF) The result of the CKF in
the pendulum model (Example 5.1) is shown in Figure 6.4. The result is
practically the same as the result of the UKF, which was to be expected,
because the CKF is just a UKF with a specific parametrization.
6.7 Exercises
6.1
6.2
6.3
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
115
5
True Angle
Measurements
CKF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
6.5
6.6
6.7
(6.64)
(6.65)
Implement a GHKF for the model in Exercise 5.1. Plot the results and compare the RMSE values with the EKF, SLF, and UKF.
Implement a CKF for the model in Exercise 5.1. Plot the results and compare
the RMSE values with the other filters. Can you find such parameter values
for the methods which cause the UKF, GHKF, and CKF methods to become
identical?
Implement a CKF for the bearings only target tracking problem in Exercise 5.5. Compare the performance with the EKF and UKF.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
7
Particle filtering
In this section we formally treat x as a continuous random variable with a density, but
the analogous results apply to discrete random variables.
116
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
5
5
(a)
117
5
5
(b)
In a (perfect) Monte Carlo approximation, we draw N independent random samples x.i / p.x j y1WT /, i D 1; : : : ; N and estimate the expectation as
N
1 X
Eg.x/ j y1WT
g.x.i / /:
(7.2)
N i D1
Thus Monte Carlo methods approximate the target distribution by a set
of samples that are distributed according to the target density. Figure 7.1
represents a two-dimensional Gaussian distribution and its Monte Carlo
representation.
The convergence of the Monte Carlo approximation is guaranteed by
the central limit theorem (CLT, see, e.g., Liu, 2001) and the error term is
O.N 1=2 /, regardless of the dimensionality of x. This invariance with respect to dimensionality is unique to Monte Carlo methods and makes them
superior to practically all other numerical methods when the dimensionality of x is considerable. At least in theory, not necessarily in practice (see
Daum and Huang, 2003; Snyder et al., 2008).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
118
Particle filtering
i D 1; : : : ; N;
(7.4)
N
X
wQ .i / g.x.i / /;
(7.5)
iD1
1 p.x.i / j y1WT /
:
N .x.i / j y1WT /
(7.6)
Figure 7.2 illustrates the idea of importance sampling. We sample from the
importance distribution which is an approximation to the target distribution. Because the distribution of samples is not exact, we need to correct
the approximation by associating a weight with each of the samples.
The disadvantage of this direct importance sampling is that we should
be able to evaluate p.x.i / j y1WT / in order to use it directly. Recall that
by Bayes rule the evaluation of the posterior probability density can be
written as
p.y1WT j x.i / / p.x.i / /
p.x.i / j y1WT / D R
:
(7.7)
p.y1WT j x/ p.x/ dx
The likelihood p.y1WT j x.i / / and prior terms p.x.i / / are usually easy to
evaluate but often the integral in the denominator the normalization constant cannot be computed. To overcome this problem, we can form an
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
119
0.03
0.5
Target Distribution
Importance Distribution
0.025
0.02
Weight
p(x)
0.4
0.3
0.015
0.2
0.01
0.1
0.005
0
4
0
x
0
4
(a)
0
x
(b)
importance sampling approximation to the expectation integral by also approximating the normalization constant by importance sampling. For this
purpose we can decompose the expectation integral and form the approximation as follows.
Z
Eg.x/ j y1WT D g.x/ p.x j y1WT / dx
R
g.x/ p.y1WT j x/ p.x/ dx
D R
p.y1WT j x/ p.x/ dx
i
R h p.y1WT jx/ p.x/
g.x/
.x j y1WT / dx
.xjy1WT /
i
D Rh
p.y1WT jx/ p.x/
.x j y1WT / dx
.xjy1WT /
P
.i
/
N
p.y1WT jx / p.x.i / /
1
g.x.i / /
i D1
N
.x.i / jy1WT /
PN p.y1WT jx.j / / p.x.j / /
1
N
j D1
.x.j / jy1WT /
N
X
w .i /
3
5 g.x.i / /:
(7.8)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
120
i D 1; : : : ; N:
(7.9)
(7.10)
(7.11)
N
X
w .i / g.x.i / /:
(7.12)
i D1
N
X
w .i / .x
x.i / /;
(7.13)
iD1
1 /;
yk p.yk j xk /;
n
(7.14)
m
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
121
Eg.xk / j y1Wk
wk.i / g.x.ik / /:
(7.15)
iD1
N
X
wk.i / .xk
x.ik / /:
(7.16)
iD1
1 / p.x0Wk
j y1Wk
1/
1 ; y1Wk 1 / p.x0Wk 1
1 / p.x0Wk 1
j y1Wk
j y1Wk
1/
1 /:
(7.17)
Using a similar rationale as in the previous section, we can now construct
an importance sampling method which draws samples from a given importance distribution x.i0Wk/ .x0Wk j y1Wk / and computes the importance
weights by
wk.i / /
j y1Wk
1/
j y1Wk /
(7.18)
1 ; y1Wk / .x0Wk 1
j y1Wk
1 /;
(7.19)
x.i0Wk/ 1 ; y1Wk /
1
.x.i0Wk/ 1
j y1Wk
1/
j y1Wk
1/
(7.20)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
122
Lets now assume that we have already drawn the samples x.i0Wk/ 1 from the
importance distribution .x0Wk 1 j y1Wk 1 / and computed the corresponding importance weights wk.i / 1 . We can now draw samples x.i0Wk/ from the
importance distribution .x0Wk j y1Wk / by drawing the new state samples
for the step k as x.ik / .xk j x.i0Wk/ 1 ; y1Wk /. The importance weights from
the previous step are proportional to the last term in Equation (7.20):
wk.i / 1 /
p.x.i0Wk/
1
.x.i0Wk/ 1
j y1Wk
1/
j y1Wk
1/
(7.21)
x.i0Wk/ 1 ; y1Wk /
wk.i / 1 :
(7.22)
The generic sequential importance sampling algorithm can now be described as follows.
Algorithm 7.2 (Sequential importance sampling)
following:
i D 1; : : : ; N;
(7.23)
1 ; y1Wk /;
i D 1; : : : ; N:
(7.24)
p.yk
wk.i / 1
.x.ik / j x.i0Wk/
1 ; y1Wk /
(7.25)
1 ; y1Wk /
D .xk j xk
1 ; y1Wk /:
(7.26)
With this form of importance distribution we do not need to store the whole
histories x.i0Wk/ in the SIS algorithm, only the current states x.ik / . This form is
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
123
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
124
Particle filtering
is not usually performed at every time step, but only when it is actually
needed. One way of implementing this is to do resampling on every nth
step, where n is some predefined constant. This method has the advantage
that it is unbiased. Another way, which is used here, is adaptive resampling.
In this method, the effective number of particles, which is estimated from
the variance of the particle weights (Liu and Chen, 1995), is used for monitoring the need for resampling. The estimate for the effective number of
particles can be computed as:
neff
1
;
PN .i / 2
w
i D1
k
(7.27)
where wk.i / is the normalized weight of particle i at the time step k (Liu
and Chen, 1995). Resampling is performed when the effective number of
particles is significantly less than the total number of particles, for example,
neff < N=10, where N is the total number of particles.
Algorithm 7.4 (Sequential importance resampling) The sequential importance resampling (SIR) algorithm, which is also called the particle filter
(PF), is the following.
Draw N samples x.i0 / from the prior
x.i0 / p.x0 /;
i D 1; : : : ; N;
(7.28)
i D 1; : : : ; N:
(7.29)
(7.30)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
125
Performance of the SIR algorithm depends on the quality of the importance distribution ./. The importance distribution should be in such a
functional form that we can easily draw samples from it and that it is possible to evaluate the probability densities of the sample points. The optimal
importance distribution in terms of variance (see, e.g., Doucet et al., 2001;
Ristic et al., 2004) is
.xk j x0Wk
1 ; y1Wk /
D p.xk j xk
1 ; yk /:
(7.33)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
126
i D 1; : : : ; N:
(7.34)
i D 1; : : : ; N;
(7.35)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
127
5
True Angle
Measurements
PF Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
Because low noise in the dynamic model causes sample impoverishment, it also implies that pure recursive estimation with particle filters
is challenging. This is because in pure recursive estimation the process
noise is formally zero and thus a basic SIR based particle filter is likely
to perform very badly. Pure recursive estimation, such as recursive estimation of static parameters, can sometimes be done by applying a Rao
Blackwellized particle filter instead of the basic SIR particle filter (see
Section 12.3.5). However, the more common use of RaoBlackwellization
is in the conditionally linear Gaussian state space models which we will
discuss in the next section.
Example 7.1 (Pendulum tracking with a particle filter) The result of the
bootstrap filter with 10 000 particles in the pendulum model (Example 5.1)
is shown in Figure 7.3. The RMSE of 0:12 is slightly higher than with most
of the other filters with the EKF we got an RMSE of 0:12 and with
the SLF/UKF/GHKF/CKF we got an RMSE of 0:11. This implies that in
this case the filtering distribution is indeed quite well approximated with a
Gaussian distribution, and thus using a particle filter is not beneficial.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
128
5
True Angle
Measurements
PF Estimate
GHKF with True R
GHKF with Increased R
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
In the above example the model is of the type which is suitable for
Gaussian approximation based filters and thus the particle filter produces
much the same result as they do. But often the noises in the system are not
Gaussian or there might be clutter (outlier) measurements which do not
fit into the Gaussian non-linear state space modeling framework at all. In
these kinds of model, the particle filter still produces good results whereas
Gaussian filters do not work at all. The next example illustrates this kind
of situation.
Example 7.2 (Cluttered pendulum tracking with a particle filter) In this
scenario, the pendulum sensor is broken such that at each time instant it
produces clutter (a random number in the range 2; 2) with probability
50%. This kind of situation can be modeled by including an indicator value
(data association indicator) as part of the state, which indicates whether
the measurement is clutter or not. The result of the bootstrap filter in the
pendulum model (Example 5.1) is shown in Figure 7.4. The RMSE of 0:16
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
129
is slightly higher than in the clutter-free case. In Gaussian filters a clutter model cannot be included into the system as such, but one heuristic
way to cope with it is to set the measurement noise variance high enough.
In Figure 7.4 we also show the results of a GaussHermite Kalman filter
(GHKF) with no clutter model at all, and the result of a GHKF with artificially increased noise variance. The RMSEs of these filters are 3:60 and
0:82, respectively, which are much higher than the RMSE of the particle
filter. The estimate trajectories of the GHKFs also indicate that they are
having significant trouble in estimating the state.
1 ; uk 1 /
D N.xk j Ak
1/
(7.36)
(7.37)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
130
where the first term is Gaussian and computable with the Kalman filter
and RTS smoother. For the second term we get the following recursion
analogously to Equation (7.17):
p.u0Wk j y1Wk /
/ p.yk j u0Wk ; y1Wk
1 / p.u0Wk
j y1Wk
1 / p.uk
j u0Wk
1 / p.uk
j uk
1/
1 ; y1Wk 1 / p.u0Wk 1
1 / p.u0Wk 1
j y1Wk
j y1Wk
1 /;
1/
(7.38)
1 ; y1Wk / .u0Wk 1
j y1Wk
1 /;
(7.39)
then by following the same derivation as in Section 7.3, we get the following recursion for the weights:
wk.i /
p.yk j u.i0Wk/
.i /
1 ; y1Wk 1 / p.uk
.u.ik / j u.i0Wk/ 1 ; y1Wk /
j u.ik / 1 /
wk.i / 1 ;
(7.40)
which corresponds to Equation (7.22). Thus via the above recursion we can
form an importance sampling based approximation to the marginal distribution p.u0Wk j y1Wk /. But because, given u0Wk , the distribution p.x0Wk j
u0Wk ; y1Wk / is Gaussian, we can form the full posterior distribution by using
Equation (7.37). Computing the distribution jointly for the full history x0Wk
would require running both the Kalman filter and the RTS smoother over
the sequences u0Wk and y1Wk , but if we are only interested in the posterior of
the last time step xk , we only need to run the Kalman filter. The resulting
algorithm is the following.
Algorithm 7.6 (RaoBlackwellized particle filter) Given a sequence of
importance distributions .uk j u.i0Wk/ 1 ; y1Wk / and a set of weighted samples fwk.i / 1 ; u.ik / 1 ; m.ik / 1 ; Pk.i /1 W i D 1; : : : ; N g, the RaoBlackwellized
particle filter (RBPF) processes the measurement yk as follows (Doucet
et al., 2001).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
131
1 Perform Kalman filter predictions for each of the Kalman filter means
and covariances in the particles i D 1; : : : ; N conditional on the previously drawn latent variable values u.ik / 1 :
mk .i / D Ak
.i /
.i /
1 .uk 1 / mk 1 ;
Pk .i / D Ak
.i /
.i /
1 .uk 1 / Pk 1
ATk 1 .u.ik / 1 / C Qk
.i /
1 .uk 1 /:
(7.41)
1 ; y1Wk /:
(7.42)
1/
.u.ik / j u.i0Wk/
p.u.ik / j u.ik / 1 /
1 ; y1Wk /
(7.43)
Hk .u.ik / / mk ;
(7.45)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle filtering
132
(7.46)
Equivalently, the RBPF can be interpreted to form an approximation to the
filtering distribution as
p.xk ; uk j y1Wk /
N
X
wk.i / .uk
(7.47)
iD1
1/
1/
1 / p.uk
j u.i0Wk/
1 ; y1Wk 1 /:
(7.48)
In general, normalizing this distribution or drawing samples from this distribution directly is not possible. But, if the latent variables uk are discrete,
we can normalize this distribution and use this optimal importance distribution directly.
The class of models where RaoBlackwellization of some linear state
components can be carried out can be extended beyond the conditionally
linear Gaussian models presented here. We can, for example, include additional latent-variable dependent non-linear terms into the dynamic and
measurement models (Schon et al., 2005). In some cases, when the filtering model is not strictly Gaussian due to slight non-linearities in either the
dynamic or measurement models, it is possible to replace the exact Kalman
filter update and prediction steps in RBPF with the extended Kalman filter
(EKF), the unscented Kalman filter (UKF) prediction and update steps, or
with any other Gaussian approximation based filters (Sarkka et al., 2007b).
7.6 Exercises
7.1
(7.49)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
133
(a) Write down the Kalman filter equations for this model.
(b) Derive an expression for the optimal importance distribution for the
model:
.xk / D p.xk j xk
1 ; y1Wk /:
(7.50)
(c) Write pseudo-code for the corresponding particle filter algorithm (sequential importance resampling algorithm). Also write down the equations for the weight update.
(d) Compare the number of CPU steps (multiplications and additions)
needed by the particle filter and Kalman filter. Which implementation
would you choose for a real implementation?
7.2
7.3
7.4
7.5
Implement the bootstrap filter for the model in Exercise 5.1 and test its
performance against the non-linear Kalman filters.
Implement a sequential importance resampling filter with an EKF, UKF,
GHKF, or CKF based importance distribution for the model in Exercise 5.1.
Note that you might want to use a small non-zero covariance as the prior of
the previous step instead of plain zero to get the filters to work better.
Implement the bootstrap filter and SIR with the CKF importance distribution
for the bearings only target tracking model in Exercise 5.5. Plot the results
and compare the RMSE values to those of the non-linear Kalman filters.
Implement a RaoBlackwellized particle filter for the following clutter
model (outlier model) :
xk D xk 1 C qk 1 ;
(
xk C rk ; if uk D 0;
yk D c
if uk D 1;
rk ;
(7.51)
where qk 1 N.0; 1/, rk N.0; 1/ and rkc N.0; 102 /. The indicator
variables uk are modeled as independent random variables which take the
value uk D 0 with prior probability 0:9 and the value uk D 1 with prior
probability 0:1. Test the performance of the filter with simulated data and
compare the performance to a Kalman filter, where the clutter rkc is ignored.
What is the optimal importance distribution for this model?
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
8
Bayesian smoothing equations and
exact solutions
So far in this book we have only considered filtering algorithms which use
the measurements obtained before and at the current step for computing
the best possible estimate of the current state (and possibly future states).
However, sometimes it is also of interest to estimate states for each time
step conditional on all the measurements that we have obtained. This problem can be solved with Bayesian smoothing. In this chapter, we present
the Bayesian theory of smoothing. After that we derive the RauchTung
Striebel (RTS) smoother which is the closed form smoothing solution to
linear Gaussian models. We also briefly discuss two-filter smoothers.
(8.1)
The difference between filters and smoothers is that the Bayesian filter
computes its estimates using only the measurements obtained before and
at the time step k, but the Bayesian smoother uses also the future measurements for computing its estimates. After obtaining the filtering posterior
state distributions, the following theorem gives the equations for computing the marginal posterior distributions for each time step conditionally on
all the measurements up to the time step T .
Theorem 8.1 (Bayesian optimal smoothing equations) The backward recursive equations (the Bayesian smoother) for computing the smoothed
distributions p.xk j y1WT / for any k < T are given by the following
1
134
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
135
(8.3)
The joint distribution of xk and xkC1 given y1WT can be now computed as
p.xk ; xkC1 j y1WT / D p.xk j xkC1 ; y1WT / p.xkC1 j y1WT /
D p.xk j xkC1 ; y1Wk / p.xkC1 j y1WT /
p.xkC1 j xk / p.xk j y1Wk / p.xkC1 jy1WT /
D
;
p.xkC1 j y1Wk /
(8.4)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
136
(8.5)
to the linear filtering model (4.17). The difference from the solution computed by the Kalman filter is that the smoothed solution is conditional on
the whole measurement data y1WT , while the filtering solution is conditional
only on the measurements obtained before and at the time step k, that is,
on the measurements y1Wk .
Theorem 8.2 (RTS smoother) The backward recursion equations for the
(fixed interval) RauchTungStriebel smoother are given as
mkC1 D Ak mk ;
PkC1 D Ak Pk ATk C Qk ;
Gk D Pk ATk PkC1 1 ;
msk D mk C Gk mskC1
mkC1 ;
s
Pks D Pk C Gk PkC1
PkC1 GkT ;
(8.6)
where mk and Pk are the mean and covariance computed by the Kalman
filter. The recursion is started from the last time step T , with msT D mT and
PTs D PT . Note that the first two of the equations are simply the Kalman
filter prediction equations.
Proof Similarly to the Kalman filter case, by Lemma A.1, the joint distribution of xk and xkC1 given y1Wk is
p.xk ; xkC1 j y1Wk / D p.xkC1 j xk / p.xk j y1Wk /
D N.xkC1 j Ak xk ; Qk / N.xk j mk ; Pk /
xk
Q
Q 1 ; P1 ;
DN
m
xkC1
(8.7)
where
mk
Q1D
m
;
Ak mk
PQ 1 D
Pk
Ak Pk
Pk ATk
:
Ak Pk ATk C Qk
(8.8)
(8.9)
(8.10)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
137
where
Gk D Pk ATk .Ak Pk ATk C Qk /
Q 2 D mk C Gk .xkC1 Ak mk /
m
PQ 2 D Pk Gk .Ak Pk ATk C Qk / GkT :
(8.11)
(8.12)
where
mskC1
;
mk C Gk .mskC1 Ak mk /
s
s
PkC1
PkC1
GkT
Q
P3 D
:
s
s
Gk PkC1
Gk PkC1
GkT C PQ 2
Q3D
m
(8.13)
(8.14)
where
msk D mk C Gk .mskC1
Pks
D Pk C
s
Gk .PkC1
Ak mk /;
Ak Pk ATk
Qk / GkT :
(8.15)
Example 8.1 (RTS smoother for Gaussian random walk) The RTS
smoother for the random walk model given in Example 4.1 is given by the
equations
mkC1 D mk ;
PkC1 D Pk C Q;
Pk
msk D mk C
.ms
mkC1 /;
PkC1 kC1
!2
P
k
s
Pks D Pk C
PkC1
PkC1 ;
PkC1
(8.16)
where mk and Pk are the updated mean and covariance from the Kalman
filter in Example 4.2. The result of applying the smoother to simulated data
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
138
6
4
2
xk
0
2
4
6
Filter Estimate
Smoother Estimate
Filters 95% Quantiles
Smoothers 95% Quantiles
8
10
0
20
40
60
Time step k
80
100
is shown in Figure 8.1. The evolution of the filter and smoother variances
is illustrated in Figure 8.2.
Example 8.2 (RTS smoother for car tracking) The backward recursion
equations required for implementing the RTS smoother for the car tracking
problem in Example 4.3 are the following:
0
1
1 0 t 0
B 0 1 0 t C
Cm ;
mkC1 D B
@ 0 0 1
0 A k
0 0 0
1
0
1
0
1T
1 0 t 0
1 0 t 0
B 0 1 0 t C
B
C
C Pk B 0 1 0 t C
PkC1 D B
@ 0 0 1
A
@
0
0 0 1
0 A
0 0 0
1
0 0 0
1
1
0 c 3
q1c t 2
q1 t
0
0
2
B 3
q2c t 3
q2c t 2 C
C
B 0
0
3
2 C;
CB
C
B q1c t 2
0
q1c t
0 A
@ 2
c
2
q2 t
0
0
q2c t
2
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
139
1
Filter Variance
Smoother Variance
0.9
Variance
0.8
0.7
0.6
0.5
0.4
0
20
40
60
Time step k
80
100
1
B 0
Gk D Pk B
@ 0
0
0 t
1 0
0 1
0 0
msk D mk C Gk mskC1
Pks
D Pk C
s
Gk PkC1
1T
0
t C
C P 1 ;
kC1
0 A
1
mkC1 ;
PkC1 GkT :
The terms mk and Pk are the Kalman filter means and covariance computed with the equations given in Example 4.3. It would also be possible to
store the values mkC1 and PkC1 during Kalman filtering to avoid recomputation of them in the first two equations above. The gains Gk could be
computed already during the Kalman filtering as well. The result of applying the RTS smoother to simulated data is shown in Figure 8.3.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
140
0
2
x2
4
6
8
10
12
4
4
x1
10
12
1 / p.ykWn
j xk /:
(8.17)
The first term on the right is just the result of the Bayesian filter just after
prediction on the step k 1. The second term on the right can be evaluated
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
141
(8.19)
The classical linear two-filter smoother (Fraser and Potter, 1969) can be
derived from the general equations by assuming that we have
p.ykWn j xk / / N.xk j mbk ; Pkb /;
(8.20)
for some mean mbk and covariance Pkb . This results in recursions which
resemble a Kalman filter which runs backwards in time. However, it turns
out that at the initial steps of the backward recursions the distributions are
not normalizable, because the covariances Pkb are formally singular. In the
formulation of Fraser and Potter (1969) this problem is avoided by using
the so-called information filter, which is a formulation of the Kalman filter
in terms of inverses of covariance matrices instead of the plain covariances.
Unfortunately, when starting from Equations (8.18) and (8.19), it is difficult to go beyond the linear case because, in the more general case, a
simple information filter formulation does not work. For example, there
is no Monte Carlo version of an information filter. It is indeed possible
to form reasonable approximations by forming backward dynamic models
by using, for example, the unscented transform (Wan and Van der Merwe,
2001), but in general this might not lead to good or valid approximations
(Briers et al., 2010). The key problem is that p.yRkWn j xk / is not generally
normalizable with respect to xk , that is, we have p.ykWn j xk / dxk D 1.
One solution to the problem was presented by Briers et al. (2010), who
proposed a generalized version of the two-filter smoothing formulas of
Kitagawa (1994). The solution is based on the introduction of artificial
probability densities f
k .xk /g such that if p.ykWn j xk / > 0 then
k .xk / >
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
142
0. The backward recursions are then replaced with the following recursions:
Z
p.xkC1 j xk /
k .xk /
p.x
Q k j ykC1Wn / D p.x
Q kC1 j ykC1Wn /
dxkC1 ;
kC1 .xkC1 /
(8.21)
p.x
Q k j ykWn / D R
p.x
Q k j ykC1Wn / p.yk j xk /
;
p.x
Q k j ykC1Wn / p.yk j xk / dxk
(8.22)
where p.x
Q k j ykWn / is an auxiliary probability density such that
p.ykWn j xk / /
p.x
Q k j ykWn /
:
k .xk /
(8.23)
8.4 Exercises
8.1
8.2
8.3
8.4
8.5
8.6
Derive the linear RTS smoother for the non-zero-mean noise model in Exercise 4.1.
Write down the Bayesian smoothing equations for finite-state HMM
models described in Exercise 4.2 assuming that the filtering distributions
p.xk j y1Wk / have already been computed.
Implement the Gaussian random walk model smoother in Example 8.1 and
compare its performance to the corresponding Kalman filter. Plot the evolution of the smoothing distribution.
The Gaussian random walk model considered in Example 4.1 also defines
a joint Gaussian prior distribution p.x0 ; : : : ; xT /. The measurement model
p.y1 ; : : : ; yT j x0 ; : : : ; xT / is Gaussian as well. Construct these distributions and compute the posterior distribution p.x0 ; : : : ; xT j y1 ; : : : ; yT /.
Check numerically that the mean and the diagonal covariance entries of this
distribution exactly match the smoother means and variances.
Form a grid-based approximation to the Gaussian random walk model
smoother in the same way as was done for the filtering equations in
Exercise 4.4. Verify that the result is practically the same as in the RTS
smoother above.
Write down the smoother equations for the Gaussian random walk model,
when the stationary filter is used as the filter. Note that the smoother becomes a stationary backward filter. Compare the performance of this stationary smoother to that of the non-stationary smoother.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
8.7
143
Implement the RTS smoother for the resonator model in Exercise 4.6. Compare its RMSE performance to the filtering and baseline solutions and plot
the results.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
9
Extended and unscented smoothing
(9.1)
D Pk C
s
Gk PkC1
mkC1 ;
PkC1 GkT ;
(9.2)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
145
As in the derivation of the prediction step of the EKF in Section 5.2, the
approximate joint distribution of xk and xkC1 given y1Wk is
xk
Q
Q
p.xk ; xkC1 j y1Wk / D N
m
;
P
(9.3)
1 1 ;
xkC1
where
mk
;
f .mk /
Pk
Pk FxT
Q
;
P1 D
Fx Pk Fx Pk FxT C Qk
Q1D
m
(9.4)
and the Jacobian matrix Fx of f .x/ is evaluated at x D mk . By conditioning on xkC1 as in the RTS derivation in Section 8.2 we get
p.xk j xkC1 ; y1WT / D p.xk j xkC1 ; y1Wk /
Q 2 ; PQ 2 /;
D N.xk j m
(9.5)
where
Gk D Pk FxT .Fx Pk FxT C Qk / 1 ;
Q 2 D mk C Gk .xkC1 f .mk //;
m
PQ 2 D Pk Gk .Fx Pk FxT C Qk / GkT :
(9.6)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
146
The joint distribution of xk and xkC1 given all the data is now
p.xkC1 ; xk j y1WT / D p.xk j xkC1 ; y1WT / p.xkC1 j y1WT /
xkC1
Q
Q 3 ; P3 ;
DN
m
xk
(9.7)
where
mskC1
Q3D
m
;
mk C Gk .mskC1 f .mk //
s
s
PkC1
PkC1
GkT
Q
P3 D
:
s
s
Gk PkC1
Gk PkC1
GkT C PQ 2
(9.8)
(9.9)
where
msk D mk C Gk .mskC1
Pks
D Pk C
s
Gk .PkC1
f .mk //;
Fx Pk FxT
Qk / GkT :
(9.10)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
147
5
True Angle
Measurements
ERTSS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
mkC1 ;
s
Pks D Pk C Gk PkC1
PkC1 GkT ;
(9.11)
where the expectations are taken with respect to the filtering distribution
xk N.mk ; Pk /.
The generalization to the non-additive case is also straightforward and
just amounts to replacing the first two of Equations (9.11) with their nonadditive versions as in Algorithm 5.8. In the gain computation we then also
need to average over qk .
It is also possible to use Equation (5.61) for rewriting the cross-terms
above in terms of Jacobians of f , but this is left as an exercise to the
reader. The SLRTSS can also be considered as a first order truncation of the
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
148
5
True Angle
Measurements
SLRTSS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
Simandl
and Dunk, 2006; Sarkka, 2008) is a Gaussian approximation
based smoother where the non-linearity is approximated using the unscented transform. The smoother equations for the additive model (5.37)
are given as follows.
Algorithm 9.3 (Unscented RauchTungStriebel smoother I)
tive form unscented RTS smoother algorithm is the following.
The addi-
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
149
Xk.0/ D mk ;
Xk.i / D mk C
Xk.i Cn/ D mk
hp i
Pk ;
i
h
p
p i
nC
Pk ;
p
nC
i D 1; : : : ; n;
(9.12)
i D 0; : : : ; 2n:
3 Compute the predicted mean mkC1 , the predicted covariance PkC1 , and
the cross-covariance DkC1 :
mkC1 D
2n
X
.i /
;
Wi.m/ XOkC1
i D0
PkC1 D
2n
X
.i /
Wi.c/ .XOkC1
.i /
mkC1 / .XOkC1
mkC1 /T C Qk ;
i D0
DkC1 D
2n
X
Wi.c/ .Xk.i /
.i /
mk / .XOkC1
mkC1 /T ;
(9.13)
i D0
D Pk C
s
Gk .PkC1
mkC1 /;
PkC1 / GkT :
(9.14)
The above computations are started from the filtering result of the last
time step msT D mT , PTs D PT and the recursion runs backwards for
k D T 1; : : : ; 0.
Derivation Assume that the approximate means and covariances of the
filtering distributions are available:
p.xk j y1Wk / ' N.xk j mk ; Pk /;
and the smoothing distribution of time step k C 1 is known and approximately Gaussian:
s
p.xkC1 j y1WT / ' N.xkC1 j mskC1 ; PkC1
/:
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
150
T
DkC1
PkC1
xkC1 mkC1
(9.15)
This can be done by using the additive form of the unscented transformation in Algorithm 5.12 for the non-linearity xkC1 D f .xk /Cqk . This
is done in Equations (9.13).
2 Because the distribution (9.15) is Gaussian, by the computation rules of
Gaussian distributions the conditional distribution of xk is given as
Q 2 ; PQ 2 /;
p.xk j xkC1 ; y1WT / ' N.xk j m
where
Gk D DkC1 PkC1 1 ;
Q 2 D mk C Gk .xkC1 mkC1 /;
m
PQ 2 D Pk Gk PkC1 GkT :
3 The rest of the derivation is completely analogous to the derivation of
the ERTSS in Section 9.1.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
151
q
PQ k ;
q i
p
n0 C 0
PQ k ;
p
n0 C 0
i D 1; : : : ; n0 ;
(9.16)
where
mk
Qk D
m
;
0
Pk
Q
Pk D
0
0
:
Qk
i D 0; : : : ; 2n0 ;
where XQk.i /;x and XQk.i /;q denote the parts of the augmented sigma point
i which correspond to xk and qk , respectively.
3 Compute the predicted mean mkC1 , the predicted covariance PkC1 , and
the cross-covariance DkC1 :
0
mkC1 D
2n
X
0
.i /
Wi.m/ XOkC1
;
i D0
0
PkC1 D
2n
X
0
.i /
Wi.c/ .XOkC1
.i /
mkC1 / .XOkC1
mkC1 /T ;
i D0
0
DkC1 D
2n
X
.i /
mk / .XOkC1
mkC1 /T ;
(9.17)
i D0
0
where the definitions of the parameter 0 and the weights Wi.m/ and
0
Wi.c/ are the same as in Section 5.5.
4 Compute the smoother gain Gk , the smoothed mean msk , and the covariance Pks :
Gk D DkC1 PkC1 1 ;
msk D mk C Gk mskC1 mkC1 ;
s
Pks D Pk C Gk PkC1
PkC1 GkT :
(9.18)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
152
5
True Angle
Measurements
URTSS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
Example 9.3 (Pendulum tracking with URTSS) The result of applying the
URTSS to the pendulum model in Example 5.1 is shown in Figure 9.3. The
resulting RMSE is 0:028 which is the same as for the SLRTSS and lower
than that of the ETRSS which is 0:033. Thus we get the same error with the
URTSS as with statistical linearization but without needing to compute the
analytical expectations.
9.4 Exercises
9.1
9.2
9.3
9.4
Derive and implement the extended RTS smoother to the model in Exercise 5.1 and compare the errors of filters and smoothers.
Write down the detailed derivation of the (additive form) statistically linearized RTS smoother. You can follow the same steps as in the derivation of
the extended RTS smoother.
Derive and implement the statistically linearized RTS smoother to the model
in Exercise 5.1 and compare the errors of filters and smoothers.
In Exercise 5.3 you derived an alternative (derivative) form of the SLF. Write
down the corresponding alternative form of the SLRTS. Derive the smoothing equations for the model in Exercise 9.3 above and compare the equations
that you obtain with the equations obtained in Exercise 9.3.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
9.5
9.6
153
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
10
General Gaussian smoothing
mkC1 /;
s
Pks D Pk C Gk .PkC1
PkC1 / GkT :
(10.1)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
155
is, with GaussHermite cubatures (Ito and Xiong, 2000; Wu et al., 2006),
spherical cubature rules (Arasaratnam and Haykin, 2009), or with many
other numerical integration schemes. In the non-additive case the Gaussian
smoother becomes the following (Sarkka and Hartikainen, 2010a).
Algorithm 10.2 (Gaussian RTS smoother II) The equations of the nonadditive form Gaussian RTS smoother are the following:
Z
mkC1 D f .xk ; qk / N.xk j mk ; Pk / N.qk j 0; Qk / dxk dqk ;
Z
PkC1 D f .xk ; qk / mkC1 f .xk ; qk / mkC1 T
N.xk j mk ; Pk / N.qk j 0; Qk / dxk dqk ;
Z
DkC1 D
xk
mk f .xk ; qk /
mkC1 T
D Pk C
s
Gk .PkC1
mkC1 /;
PkC1 / GkT :
(10.2)
As in the Gaussian filtering case, the above algorithms are mainly theoretical, because the integrals can be solved in closed form only in special
cases.
i1 ; : : : ; in D 1; : : : ; p; (10.3)
where the unit sigma points .i1 ;:::;in / were defined in Equation (6.24).
2 Propagate the sigma points through the dynamic model:
.i1 ;:::;in /
XOkC1
D f .Xk.i1 ;:::;in / /;
i1 ; : : : ; in D 1; : : : ; p:
(10.4)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
156
3 Compute the predicted mean mkC1 , the predicted covariance PkC1 , and
the cross-covariance DkC1 :
X
.i1 ;:::;in /
mkC1 D
Wi1 ;:::;in XOkC1
;
i1 ;:::;in
PkC1 D
.i1 ;:::;in /
Wi1 ;:::;in .XOkC1
.i1 ;:::;in /
mkC1 / .XOkC1
.i1 ;:::;in /
mk / .XOkC1
mkC1 /T C Qk ;
i1 ;:::;in
DkC1 D
mkC1 /T ;
i1 ;:::;in
(10.5)
where the weights Wi1 ;:::;in were defined in Equation (6.23).
4 Compute the gain Gk , mean msk and covariance Pks as follows:
Gk D DkC1 PkC1 1 ;
msk D mk C Gk .mskC1
Pks
D Pk C
s
Gk .PkC1
mkC1 /;
PkC1 / GkT :
(10.6)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
157
5
True Angle
Measurements
GHRTSS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
The addi-
Xk.i / D mk C
Pk .i / ;
i D 1; : : : ; 2n;
(10.7)
(10.8)
i D 1; : : : ; 2n:
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
158
3 Compute the predicted mean mkC1 , the predicted covariance PkC1 , and
the cross-covariance DkC1 :
2n
mkC1 D
1 X O .i /
X ;
2n iD1 kC1
PkC1 D
1 X O .i /
.X
2n iD1 kC1
DkC1 D
1 X .i /
.X
2n iD1 k
2n
.i /
mkC1 / .XOkC1
mkC1 /T C Qk ;
2n
.i /
mk / .XOkC1
mkC1 /T :
(10.9)
mkC1 /;
s
Pks D Pk C Gk .PkC1
PkC1 / GkT :
(10.10)
By using the third order spherical cubature approximation to the nonadditive form Gaussian RTS smoother, we get the following algorithm.
Algorithm 10.5 (Cubature RauchTungStriebel smoother II) A single
step of the augmented form cubature RTS smoother is as follows.
1 Form the sigma points for the n0 D n C nq -dimensional augmented
random variable .xk ; qk /:
q
0
.i /
Q
Q k C PQ k .i / ;
Xk D m
i D 1; : : : ; 2n0 ;
(10.11)
where
Qk D
m
mk
;
0
Pk
PQ k D
0
0
:
Qk
i D 1; : : : ; 2n0 ;
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
159
3 Compute the predicted mean mkC1 , the predicted covariance PkC1 , and
the cross-covariance DkC1 :
0
mkC1
2n
1 X O .i /
X ;
D 0
2n i D1 kC1
PkC1
2n
1 X O .i /
D 0
.X
2n i D1 kC1
DkC1
2n
1 X Q .i /;x
.X
D 0
2n i D1 k
.i /
mkC1 / .XOkC1
mkC1 /T ;
.i /
mk / .XOkC1
mkC1 /T :
(10.12)
mkC1 ;
PkC1 GkT :
(10.13)
It is easy to see that the above algorithms are indeed special cases of the
URTSS methods with parameters D 1, D 0, D 0. However, this
particular selection of parameters tends to work well in practice and due
to the simplicity of sigma-point and weight selection rules, the method is
very simple to implement.
Example 10.2 (Pendulum tracking with CRTSS) The result of applying
the CRTSS for the pendulum model in Example 5.1 is shown in Figure 10.2.
As expected, the error 0:028 and the overall result are practically identical
to the result of the URTSS, as well as to the SLRTSS and GHRTSS.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
160
5
True Angle
Measurements
CRTSS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
161
are performed on the filtering results and thus we can compute the smoothing gain sequence Gk from the filtering results in a causal manner. Because
of these properties we may now derive a fixed-point smoother using similar methods as have been used for deriving the linear fixed-point smoother
from the linear RauchTungStriebel smoother in Meditch (1969).
We will now denote the smoothing means and covariances using notation of the type mkjn and Pkjn , which refer to the mean and covariance of
the state xk , which is conditioned to the measurements y1 ; : : : ; yn . With
this notation, the filter estimates are mkjk , Pkjk and the RTS smoother estimates, which are conditioned to the measurements y1 ; : : : ; yT , have the
form mkjT , PkjT . The RTS smoothers have the following common recursion equations:
Gk D DkC1 PkC1jk 1 ;
mkC1jk ;
PkC1jk GkT ;
(10.14)
which are indeed linear recursion equations for the smoother mean and covariance. Note that the gains Gk depend only on the filtering results, not on
the smoother mean and covariance. Because the gains Gk are independent
of T , from Equations (10.14) we get for i D j; : : : ; k the identity
mi ji D Gi miC1jk
mi jk
Similarly, for i D j; : : : ; k
mijk
miC1ji :
1 we have
mi ji D Gi miC1jk
(10.15)
mi C1ji :
(10.16)
1 :
(10.17)
mi jk
By varying i from j to k
mj jk
mj C1jk
D Gi miC1jk
mi C1jk
mj jk
D Gj mj C1jk
mj C1jk
D Gj C1 mj C2jk
mj C1jk
1 ;
mj C2jk
1 ;
::
:
mk
1jk
mk
1jk 1
D Gk
1 mkjk
mkjk
1 :
(10.18)
C Bj jk mkjk
mkjk
1 ;
(10.19)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
162
where
Bj jk D Gj Gk
1:
(10.20)
C Bj jk Pkjk
Pkjk
T
1 Bj jk :
(10.21)
D Dk Pkjk
(10.22)
2 Fixed-point smoothing:
(a) If k < j , just store the filtering result.
(b) If k D j , set Bj jj D I. The fixed-point smoothed mean and covariance on step j are equal to the filtered mean and covariance mj jj
and Pj jj .
(c) If k > j , compute the smoothing gain and the fixed-point smoother
mean and covariance:
Bj jk D Bj jk
1 Gk 1 ;
mj jk D mj jk
Pj jk D Pj jk
C Bj jk mkjk
C Bj jk Pkjk
mkjk
Pkjk
1 ;
T
1 Bj jk :
(10.23)
n 1jk
D mk
C Bk
n 1jk 1
n 1jk mkjk
mkjk
1 :
(10.24)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
163
n 1jk
D mk
n 1jk n 1
C Gk
1 n mk njk
mk
njk n 1 :
(10.25)
Equating the right-hand sides, and solving for mk njk while remembering
the identity Bk njk D Gk 1n 1 Bk n 1jk results in the smoothing equation
mk
njk
D mk
C Gk
C Bk
njk n 1
n 1 mk n 1jk 1
njk mkjk
mkjk
mk
n 1jk n 1
1 :
(10.26)
njk
D Pk
C Gk
C Bk
njk n 1
1
n 1 Pk n 1jk 1
njk Pkjk
Pkjk
Pk
T
n 1jk n 1 Gk n 1
T
1 Bk njk :
(10.27)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
164
only once, and the same gains can be used in different smoothers operating
on different intervals.
Algorithm 10.7 (General Gaussian fixed-lag smoother) Thus the general
Gaussian fixed-lag smoother can be implemented by performing the following on each time step k D 1; 2; 3; : : :
1 Gain computation: During the Gaussian filter prediction step compute
and store the predicted mean mkjk 1 , predicted covariance Pkjk 1 and
cross-covariance Dk . Also compute and store the smoothing gain
Gk
D Dk Pkjk
(10.28)
10.6 Exercises
10.1
10.2
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
11
Particle smoothing
165
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle smoothing
166
i D 1; : : : ; N;
(11.1)
and set w0.i / D 1=N , for all i D 1; : : : ; N . Initialize the state histories
to contain the prior samples x.i0 / .
For each k D 1; : : : ; T do the following:
1 Draw N new samples x.ik / from the importance distributions:
x.ik / .xk j x.ik / 1 ; y1Wk /;
i D 1; : : : ; N;
(11.2)
where x.ik / 1 is the k 1th (last) element in the sample history x.i0Wk/
2 Calculate the new weights according to
wk.i /
p.yk
wk.i / 1
1.
(11.3)
.i /
1 ; xk /:
(11.4)
4 If the effective number of particles (7.27) is too low, perform resampling on the state histories.
The approximation to the full posterior (smoothed) distribution is (Kitagawa, 1996; Doucet et al., 2000)
p.x0WT j y1WT /
N
X
wT.i / .x0WT
x.i0WT/ /:
(11.5)
i D1
N
X
wT.i / .xk
x.ik / /;
(11.6)
i D1
where x.ik / is the kth component in x.i0WT/ . However, if T k the direct SIR
smoother algorithm is known to produce very degenerate approximations
(Kitagawa, 1996; Doucet et al., 2000).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
167
(11.7)
.i /
Q k D x.ik / with probability wkjkC1
2 Choose x
.
Q kC1WT
Derivation Assume that we have already simulated a trajectory x
from the smoothing distribution. By using Equation (8.3) we get
p.QxkC1 j xk / p.xk j y1Wk /
p.QxkC1 j y1Wk /
D Z p.QxkC1 j xk / p.xk j y1Wk /;
Q kC1 ; y1WT / D
p.xk j x
(11.8)
where Z is a normalization constant. By substituting the SIR filter approximation in Equation (7.31) we get
X .i /
Q kC1 ; y1WT / Z
p.xk j x
wk p.QxkC1 j xk / .xk x.ik / /: (11.9)
i
We can now draw a sample from this distribution by sampling x.ik / with
probability / wk.i / p.QxkC1 j x.ik / /.
Given S iterations of Algorithm 11.2, resulting in sample trajectories
/
Q .j
x
0WT for j D 1; : : : ; S , the smoothing distribution can now be approximated as
1X
/
Q .j
p.x0WT j y1WT /
.x0WT x
(11.10)
0WT /:
S j
The marginal distribution samples for a step k can be obtained by extracting the kth components from the above trajectories. The computational
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle smoothing
168
5
True Angle
Measurements
PS Estimate
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
169
5
True Angle
Measurements
PS Estimate
GHRTSS with True R
GHRTSS with Increased R
3
2
1
0
1
2
3
0.5
1.5
2.5
3
Time t
3.5
4.5
bootstrap filter with 10 000 samples as the filter part) to the cluttered
pendulum model in Example 7.2 is shown in Figure 11.2. The RMSE error of the particle smoother was 0:028. The RMSE of a GHRTSS without any clutter model was 3:36 and the RMSE of a GHRTSS with artificially increased measurement noise was 0:74. Thus in this case the particle
smoother gives a significant improvement over the Gaussian approximation based smoothers.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Particle smoothing
170
/
Start by setting wT.ijT
D wT.i / for i D 1; : : : ; N .
For each k D T 1; : : : ; 0 compute new weights by
.i /
wkjT
D
X
j
/
.i /
w .i / p.x.j
.j /
kC1 j xk /
i:
hP k
wkC1jT
.l/
.j /
.l/
l wk p.xkC1 j xk /
(11.11)
Derivation Assume that we have already computed the weights for the
/
following approximation, where the particles x.ikC1
are from the SIR filter:
X .i /
/
p.xkC1 j y1WT /
wkC1jT .xkC1 x.ikC1
/:
(11.13)
i
X
i
.i /
wkC1jT
/
p.x.ikC1
j xk /
/
p.x.ikC1
j y1Wk /
(11.14)
which gives
Z
(11.16)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
171
wk.l/ .xk
p.xkC1 j xk / p.xkC1 j y1WT /
dxkC1
p.xkC1 j y1Wk /
/
X .i /
p.x.ikC1
j x.l/
k /
i;
h
/
w
x.l/
kC1jT P
k
.j /
.i /
.j /
w
p.x
j
x
/
i
j
k
kC1
k
(11.17)
X
i
/
w .l/ p.x.ikC1
j x.l/
.i /
k /
i:
hP k
wkC1jT
.j /
.i /
.j /
w
p.x
j
x
/
j
k
kC1
k
(11.18)
(11.19)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
172
Particle smoothing
(11.20)
3 Apply the RTS smoother to each of the mean and covariance histories
.i /
m.i0WT/ ; P0WT
for i D 1; : : : ; N to produce the smoothed mean and covari/
s;.i /
ance histories ms;.i
0WT ; P0WT .
The RaoBlackwellized particle smoother in this simple form also has
the same disadvantage as the SIR particle smoother, that is, the smoothed
estimate of uk can be quite degenerate if T k. Fortunately, the smoothed
estimates of the actual states xk can still be relatively good, because their
degeneracy is avoided by RaoBlackwellization.
To avoid the degeneracy in estimates of uk it is possible to use
better sampling procedures for generating samples from the smoothing
distributions analogously to the plain particle smoothing. The backwardsimulation has indeed been generalized to the RaoBlackwellized case,
but the implementation of the RaoBlackwellized reweighting smoother
seems to be quite problematic. The RaoBlackwellized backwardsimulation smoother (see Sarkka et al., 2012a) can be used for drawing
backward trajectories from the marginal posterior of the latent variables
uk and the posterior of the conditionally Gaussian part is obtained via
Kalman filtering and RTS smoothing. Another option is to simulate
backward trajectories from the joint distribution of .xk ; uk / (Fong et al.,
2002; Lindsten, 2011). However, this approach does not really lead to
RaoBlackwellized estimates of the smoothing distribution, because the
Gaussian part of the state is sampled as well.
It is also possible to construct approximate RaoBlackwellized
backward-simulation smoothers by using Kims approximation (see Kim,
1994; Barber, 2006; Sarkka et al., 2012a)
Z
p.uk j ukC1 ; xkC1 ; y1Wk / p.xkC1 j ukC1 ; y1WT / dxkC1
' p.uk j ukC1 ; y1Wk /:
(11.21)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
11.5 Exercises
173
part may be recovered with a Kalman filter and RTS smoother. However,
this procedure is only an approximation and does not lead to a true Rao
Blackwellized Monte Carlo representation of the smoothing distribution.
11.5 Exercises
11.1
11.2
11.3
Implement the backward-simulation smoother for the model in Exercise 5.1 and compare its performance to the Gaussian approximation based
smoothers.
Implement the reweighting smoother for the model in Exercise 5.1 and compare its performance to the other smoothers.
Show that the latent variable sequence in conditionally Gaussian models is
not Markovian in general in the sense that
p.uk j ukC1 ; y1WT / p.uk j ukC1 ; y1Wk /
11.4
11.5
(11.22)
when T > k, and thus simple backward smoothing in uk does not lead to
the correct result.
Implement the RaoBlackwellized SIR particle smoother for the clutter
model in Exercise 7.5.
Lets again consider the clutter model in Exercise 7.5. Assume that you have
implemented the filter as a RaoBlackwellized particle filter with resampling at every step (thus the weights are all equal). Write down the algorithm for the Kims approximation based backward simulation smoother for
the model. What peculiar property does the smoother have? Does this have
something to do with the property in Equation (11.22)?
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
12
Parameter estimation
In the previous chapters we have assumed that the parameters of the state
space model are known and only the state needs to be estimated. However,
in practical models, the parameters are unknown as well. In this chapter
we concentrate on three types of method for parameter estimation: optimization based methods for computing maximum a posteriori (MAP)
or maximum likelihood (ML) estimates, expectation-maximization (EM)
algorithms for computing the MAP or ML estimates, and Markov chain
Monte Carlo (MCMC) methods for generating Monte Carlo approximations of the posterior distributions. We also show how Kalman filters and
RTS smoothers, Gaussian filters and smoothers, as well as particle filters
and smoothers can be used for approximating the marginal likelihoods, parameter posteriors, and other quantities needed by the methods.
1 ; /;
yk p.yk j xk ; /:
(12.1)
174
(12.2)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
T
Y
p.xk j xk
1 ; /;
kD1
p.y1WT j x0WT ; / D
T
Y
p.yk j xk ; /:
kD1
(12.4)
without explicitly forming the joint posterior distribution of the states and
parameters as in Equation (12.2). Instead, we present recursive algorithms
for direct computation of the above distribution. For linear state space models, this can be done exactly, and in non-linear and non-Gaussian models
we can use Gaussian filtering or particle filtering based approximations.
Once we know how to evaluate the above distribution, we can estimate the
parameters, for example, by finding their maximum a posteriori (MAP) estimate or by sampling from the posterior by Markov chain Monte Carlo
(MCMC) methods. If the direct evaluation of the distribution is not feasible, we can use the expectation maximization (EM) algorithm for iteratively finding the ML or MAP estimate.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
176
T
Y
p.yk j y1Wk
1 ; /;
(12.5)
kD1
where we have denoted p.y1 j y1W0 ; / , p.y1 j / for notational convenience. Because each of the terms in the above product can be computed
recursively, the whole marginal likelihood can be computed recursively as
follows.
Theorem 12.1 (Recursion for marginal likelihood of parameters) The
marginal likelihood of parameters is given by Equation (12.5), where the
terms in the product can be computed recursively as
Z
p.yk j y1Wk 1 ; / D p.yk j xk ; / p.xk j y1Wk 1 ; / dxk ;
(12.6)
where p.yk j xk ; / is the measurement model and p.xk j y1Wk 1 ; / is
the predictive distribution of the state, which obeys the recursion
Z
p.xk j y1Wk 1 ; / D p.xk j xk 1 ; / p.xk 1 j y1Wk 1 ; / dxk 1
p.xk j y1Wk ; / D
1 ; /
:
(12.7)
Note that the latter equations are just the Bayesian filtering Equations (4.11) and (4.12), where we have explicitly written down the
parameter dependence.
Proof Due to the conditional independence of the measurements (Property 4.2) we have
p.yk ; xk j y1Wk
1 ; /
D p.yk j xk ; y1Wk
1 ; / p.xk
j y1Wk
1 ; /:
1 ; /
(12.8)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
177
The marginal likelihood obtained via Theorem 12.1 can then be substituted into Equation (12.4) to give the marginal posterior distribution of
the parameters. However, instead of working with marginal likelihood or
marginal posterior explicitly, in parameter estimation, it is often convenient
to define the unnormalized negative log-posterior or energy function as follows.
Definition 12.1 (Energy function)
'T ./ D
log p.y1WT j /
log p./:
(12.9)
(12.10)
The energy function can be seen to obey the following simple recursion.
Theorem 12.2 (Recursion for energy function) The energy function defined in Equation (12.9) can be evaluated recursively as follows.
Start from '0 ./ D log p./.
At each step k D 1; 2; : : : ; T compute the following:
'k ./ D 'k
where the terms p.yk j y1Wk
orem 12.1.
1 ./
1 ; /
1 ; /;
(12.11)
Proof The result follows from substituting Equation (12.5) into the definition of the energy function in Equation (12.9) and identifying the terms
corresponding to 'k 1 ./.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
178
Parameter estimation
(12.13)
which is usually numerically more stable and easier to compute. The maximum likelihood (ML) estimate of the parameter is a MAP estimate with a
formally uniform prior p./ / 1.
The minimum of the energy function can be computed by using various
gradient-free or gradient based general optimization algorithms (see, e.g.,
Luenberger and Ye, 2008). However, to be able to use gradient based optimization we will need to evaluate the derivatives of the energy function as
well. It is possible to find the derivatives in basically two ways (see, e.g.,
Cappe et al., 2005).
1 By formally differentiating the energy function recursion equations for
a particular method. This results in so-called sensitivity equations which
can be implemented as additional recursion equations computed along
with filtering.
2 Using Fishers identity which expresses the gradient of the energy function as an expectation of the derivative of the complete data log likelihood over the smoothing distribution. The advantage of this approach
over direct differentiation is that there is no need for an additional recursion.
The disadvantage of the MAP-estimate is that it essentially approximates
the posterior distribution with the Dirac delta function
p. j y1WT / ' .
O MAP /;
(12.14)
(12.15)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
179
where H.O MAP / is the Hessian matrix evaluated at the MAP estimate. However, to implement the Laplace approximation, we need to have a method
to compute (or approximate) the second order derivatives of the energy
function.
1/
/:
(12.16)
(12.18)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
180
Parameter estimation
1/
/ D N. .i / j .i
1/
; i
1 /;
(12.20)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
181
1/
'. //g :
(12.21)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
182
Parameter estimation
(12.23)
(12.24)
(12.25)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
183
(12.26)
T
X
kD1
log p.xk j xk
1 ; / C
T
X
kD1
(12.28)
The expression for Q in Equation (12.27) and thus the E-step in Algorithm 12.4 now reduces to computation of (see Schon et al., 2011)
(12.29)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
184
where
I1 .;
.n/
I2 .;
.n/
Z
/D
/D
T Z
X
p.xk ; xk
j y1WT ; .n/ /
kD1
log p.xk j xk
I3 .; .n/ / D
T Z
X
1 ; /
dxk dxk
1;
kD1
The above expectations are over the smoothing distribution and the key
thing is to observe that we do not need to compute expectations over the
full posterior, but only over the smoothing distributions p.xk j y1WT ; .n/ /
and pairwise smoothing distributions p.xk ; xk 1 j y1WT ; .n/ /. It turns out
that the required expectations can be easily (approximately) evaluated using smoother results. In the case of linear state space models we can find
a closed form expression for the above integrals in terms of RTS smoother
results. In the non-linear case we can approximate the integrals by using
non-linear smoothers such as Gaussian smoothers. In the more general
probabilistic state space model we can use particle smoothers to approximate them.
On the E-step of Algorithm 12.4 we need to maximize the expression for
Q in Equation (12.29) with respect to the parameters . In principle, we
can utilize various gradient-free and gradient based optimization methods
(see, e.g., Luenberger and Ye, 2008) for doing that, but the most useful
case occurs when we can do the maximization analytically via setting the
gradient to zero:
@Q.; .n/ /
D 0:
(12.31)
@
This happens, for example, when estimating the parameters of linear state
space models and in certain classes of non-linear state space models.
It turns out that we can calculate MAP estimates using the EM algorithm
by replacing p.x0WT ; y1WT j / in Equation (12.27) with p.x0WT ; y1WT ; /.
In practice, it can be implemented by maximizing Q.; .n/ / C log p./
at the M-step instead of the plain Q.
As a side product of the EM formulation above we also get a method
to compute the gradient of the energy function needed in gradient based
optimization for finding the MAP or ML estimates. Fishers identity (see,
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
185
@ log p./
@Q.; .n/ /
@'T ./
D
(12.32)
.n/ :
@
@
@
D
Note that here we refer to the above identity as Fishers identity although
the original identity is the relationship with the log marginal likelihood and
Q, not with the log posterior and Q. In any case this identity is useful in linear state space models, because it is often easier to compute and computationally lighter (Segal and Weinstein, 1989; Olsson et al., 2007). However,
in non-linear state space models it is not as useful, because the approximations involved in computation of the filtering and smoothing solutions
often cause the gradient to have different approximations from the energy
function approximation implied by the same method. That is, the gradient
approximation computed with Fishers identity and Gaussian smoothing
might not exactly match the gradient of the energy function approximation
computed with the corresponding Gaussian filter. However, in the case of
particle filters, Fishers identity provides a feasible way to approximate the
gradient of the energy function.
1 ; /
C qk
yk D h.xk ; / C rk :
1;
(12.33)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
186
Parameter estimation
1;
xk D f .xk
1 ; k 1 /
C qk
yk D h.xk ; k / C rk ;
1;
(12.34)
where the dynamic model for the parameter essentially says that it is conQ k D .xk ; k /, we get a state space
stant. If we now redefine the state as x
model of the form
Q k D fQ .Qxk 1 / C q
Q k 1;
x
yk D h.Qxk / C rk ;
(12.35)
which does not contain any unknown parameter anymore. The problem
in this state augmentation is the singularity of the dynamic model for the
parameter. It works well when the whole system is linear and we do not
have any approximation errors in the estimator. If the parameters appear
linearly in the system, it sometimes is a good idea to include the parameters
as part of the state. However, this might fail sometimes as well.
With approximate non-linear filters the singularity of the parameter dynamic model indeed causes problems. With non-linear Kalman filters the
Gaussian approximation tends to become singular which causes the filter
to diverge. As discussed in Section 7.4, particle filters have a problem with
small noises in the dynamic model because it causes sample impoverishment. As the noise in the dynamic model above is exactly zero, this case is
particularly problematic for particle filters.
A common way to circumvent the problem is to introduce an artificial
noise to the dynamic model of the parameter, that is, replace k D k 1
with
k D k
C "k
1;
(12.36)
where "k 1 is a small noise process. But the problem is that we are
no longer solving the original parameter estimation problem, but another
one with a time-varying parameter. Anyway, this approach is sometimes
applicable and should be considered before jumping into more complicated
parameter estimation methods.
There is also a form of RaoBlackwellization that sometimes helps. This
approach is discussed in Section 12.3.5 and the idea is to use a closed form
solution for the static parameter (RaoBlackwellize the parameter) and
sample only the original state part. This works if the parameter appears in
the model in a suitable conjugate form.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
187
C qk
1;
yk D H./ xk C rk ;
(12.37)
The recur-
1
1
log j2 Sk ./j C vkT ./ Sk 1 ./ vk ./; (12.38)
2
2
where the terms vk ./ and Sk ./ are given by the Kalman filter with the
parameters fixed to .
'k ./ D 'k
1 ./
Prediction:
mk ./ D A./ mk
Pk ./ D A./ Pk
1 ./;
T
1 ./ A ./
C Q./:
(12.39)
Update:
vk ./ D yk
H./ mk ./;
(12.40)
(12.41)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
188
3
Posterior Distribution
True Parameter Value
2.5
p(R | y1:T )
1.5
0.5
0
0
0.5
1
R
1.5
Thus if we fix and run the above algorithm from '0 ./ D log p./
at k D 0 to the step k D T , then the full energy function is 'T ./. That
is, the marginal posterior density at the point can be evaluated up to a
normalization constant by Equation (12.10) as
p. j y1WT / / exp. 'T .//:
Given the energy function it is now easy to implement, for example, a
MetropolisHastings based MCMC sampler for generating a Monte Carlo
approximation of the posterior distribution, or to use the energy function
in a gradient-free optimization for finding the ML or MAP estimates of the
parameters.
Example 12.1 (Parameter posterior for Gaussian random walk) The posterior distribution of the noise variance p.R j y1WT / for the Gaussian random walk model in Example 4.1 is shown in Figure 12.1. A formally uniform prior p.R/ / 1 was assumed. The true value used in the simulation is
indeed well within the high density area of the posterior distribution. However, it can also been seen that if we computed the MAP (or equivalently
ML) estimate of the parameter, we would get a smaller value than the true
one.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
189
s
xk 1
mk 1
Gk 1 Pks
p.xk ; xk
Pks GkT
Pks 1
;
(12.42)
where the means, covariances, and gains are computed with an RTS
smoother with the model parameters fixed to .n/ . Note that in the
EM algorithms appearing in the literature the cross term Pks GkT 1 is
sometimes computed with a separate recursion (see, e.g., Shumway and
Stoffer, 1982), but in fact it is unnecessary due to the above. The required
expectations for EM in Equations (12.30) can now be computed in closed
form and the result is the following (see Shumway and Stoffer, 1982).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
190
Theorem 12.4 (Evaluation of Q for linear Gaussian model) The expression for Q for the linear Gaussian model in Equation (12.37) can be written as
Q.; .n/ /
1
T
T
D
log j2 P0 ./j
log j2 Q./j
log j2 R./j
2(
2
2
)
i
h
1
1
s
s
s
T
tr P0 ./ P0 C .m0 m0 .// .m0 m0 .//
2
(
)
h
i
T
1
T
T
T
tr Q ./ C A ./ A./ C C A./ A ./
2
(
)
h
i
T
tr R 1 ./ D B HT ./ H./ BT C H./ HT ./ ;
2
(12.43)
where the following quantities are computed from the results of RTS
smoothers run with parameter values .n/ :
T
1 X s
D
Pk C msk msk T ;
T
kD1
T
1 X s
Pk
T
C msk
msk
T
1 ;
kD1
BD
T
1 X
yk msk T ;
T
kD1
T
1 X s T
CD
Pk Gk
T
C msk msk
kD1
DD
T
1 X
yk ykT :
T
(12.44)
kD1
The usefulness of the EM algorithm for linear state space models stems
from the fact that if the parameters are selected to be some of the full model
matrices (or initial mean) we can indeed perform the M-step of the EM
algorithm in closed form. The same thing happens if the parameters appear
linearly in one of the model matrices (e.g., are some subcomponents of the
matrices), but application of the EM algorithm to the estimation of the full
matrices is the classical result. By setting the gradients of @Q.; .n/ /=@
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
191
m0 / .ms0
m 0 /T :
(12.45)
When D A we get
A D C
(12.46)
When D Q we get
Q D
C AT
A C T C A AT :
(12.47)
When D H we get
H D B
(12.48)
When D R we get
R D D
H BT
B HT C H HT :
(12.49)
(12.50)
Obviously the above theorem can also be used for solving the maximum
of Q with respect to any subset of model matrices by solving the resulting
equations jointly. The EM algorithm for finding the maximum likelihood
estimates of the linear state space model parameters is now the following.
Algorithm 12.5 (EM algorithm for linear state space models) Let contain some subset of the model parameters fA; H; Q; R; P0 ; m0 g. We can
find maximum likelihood estimates of them via the following iteration.
Start from some initial guess .0/ .
For n D 0; 1; 2; : : : do the following steps.
1 E-step: Run the RTS smoother using the current parameter values
in .n/ and compute the quantities in Equation (12.44) from the
smoother results.
2 M-step: Find new parameters values by using Equations (12.45)
(12.50) and store them in .nC1/ .
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
192
The expression for Q.; .n/ / also provides an easy gradient recipe
(Olsson et al., 2007) for computation of the energy function gradient via
Fishers identity (Equation (12.32)), as it enables the computation of the
gradient without an additional recursion (sensitivity equations). The resulting expression is given in Theorem A.3 in Section A.3.
1 ; /
C qk
1;
yk D h.xk ; / C rk ;
(12.51)
1 ./
Prediction:
Z
mk ./ D
f .xk
1 ; /
N.xk
j mk
1 ./; Pk 1 .//
Z
Pk ./ D
.f .xk
1 ; /
N.xk
mk .// .f .xk
j mk
1 ; /
1 ./; Pk 1 .//
dxk
1;
dxk
mk .//T
1
C Qk
1 ./:
(12.53)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
193
Update:
Z
k ./ D
vk ./ D yk k ./;
Z
Sk ./ D .h.xk ; /
k .// .h.xk ; /
k .//T
.xk
mk .// .h.xk ; /
k .//T
(12.54)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
194
Algorithm 12.7 (Evaluation of Q via Gaussian smoothing) The expression for Q for the non-linear state space model in Equation (12.51) can be
written as
Q.; .n/ /
T
T
1
log j2 P0 ./j
log j2 Q./j
log j2 R./j
'
2(
2
2
)
i
h
1
1
s
s
s
T
tr P0 ./ P0 C .m0 m0 .// .m0 m0 .//
2
T
1X 1
tr Q ./ E .xk
2
f .xk
1 ; // .xk
f .xk
T
1 ; //
j y1WT
kD1
T
1X 1
tr R ./ E .yk
2
kD1
(12.55)
where the expectations are over the counterparts of the distributions in
Equations (12.42) obtained from the Gaussian smoother.
In practice, we can approximate the Gaussian smoother and Gaussian
integrals above with Taylor series approximations (EKF/ERTSS) or by
sigma-point methods such as GaussHermite or spherical cubature integration or the unscented transform. The M-step for the noise parameters can
indeed be implemented analogously to the linear case in Theorem 12.5, because the maxima of the above Q with respect to the noise covariance are
simply
Q D
T
1 X
E .xk
T
f .xk
1 ; // .xk
f .xk
1 ; //
j y1WT ;
kD1
T
1 X
R D
E .yk
T
h.xk ; //T j y1WT :
(12.56)
kD1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
195
1 ; /;
yk p.yk j xk ; /;
(12.57)
where 2 Rd is the unknown parameter to be estimated. The approximate evaluation of the marginal likelihood can be done with the following
modification of the SIR particle filter (see, e.g., Andrieu et al., 2004; Creal,
2012).
Algorithm 12.8 (SIR based energy function approximation) An approximation to the marginal likelihood of the parameters can be evaluated during the sequential importance resampling (SIR) algorithm (particle filter),
as follows.
Draw N samples x.i0 / from the prior:
x.i0 / p.x0 j /;
i D 1; : : : ; N;
(12.58)
i D 1; : : : ; N:
(12.59)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
196
(12.60)
(12.61)
(12.62)
log p./
T
X
log p.y
O k j y1Wk
1 ; /:
(12.64)
kD1
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
197
90
PMCMC Histogram
Gaussian Filter Estimate
True Parameter Value
80
70
p(R | y1:T )
60
50
40
30
20
10
0
0.05
0.1
R
0.15
method which thus should approach the true posterior of the parameter. As
can be seen, the posterior distribution estimate of the Gaussian filter (fifth
order GaussHermite filter) is a bit thinner than the true posterior distribution. That is, the uncertainty in the parameter value is underestimated.
In this case we are lucky, because the true parameter value still remains inside the high density area of the posterior distribution approximation, but
this might not always be the case.
In principle, it would also be possible to use the likelihood or energy
function approximation in gradient-free optimization methods for finding
MAP or ML estimates. However, this might turn out to be hard, because
even if we fixed the random number generator sequence in the particle filter, the likelihood function would not be continuous in (see, e.g., Kantas
et al., 2009). This also renders the use of gradient based optimization methods impossible.
By comparing to the RaoBlackwellized particle filter in Algorithm 7.6,
it is easy to see that the corresponding likelihood approximation can be
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
198
obtained by setting
vk.i / D
1 ; /
.u.ik / j u.i0Wk/
p.u.ik / j u.ik / 1 ; /
1 ; y1Wk ; /
(12.65)
.n/
S
1X
/
log p.Qx.i0 / j /;
S i D1
I2 .; .n/ /
T
X1
kD0
I3 .; .n/ /
S
1X
/
Q .ik / ; /;
log p.Qx.ikC1
jx
S i D1
T
S
X
1X
Q .ik / ; /:
log p.yk j x
S i D1
(12.66)
kD1
If we are using the reweighting (or marginal) particle smoother in Algorithm 11.3, the corresponding expectations can be approximated as follows
(Schon et al., 2011).
Algorithm 12.10 (Evaluation of Q via the reweighting smoother) Assume that we have the set of particles fx.ik / W k D 0; : : : ; T I i D 1; : : : ; N g
representing the filtering distribution and we have calculated the weights
.i /
fwkjT
W k D 0; : : : ; T I i D 1; : : : ; N g using Algorithm 11.3. Then we can
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
199
I2 .;
.n/
/
T
X1 X X
kD0
.j /
/
.i /
.n/
wkC1jT
wk.i / p.x.j
/
kC1 j xk ;
i
hP
.l/
.j /
.l/
.n/ /
l wk p.xkC1 j xk ;
.i /
/
log p.x.j
kC1 j xk ; /;
I3 .; .n/ /
T X
X
kD1
.i /
wkjT
log p.yk j x.ik / ; /:
(12.67)
1/
C qk
1;
yk D h.xk / C rk ;
rk N.0; R/;
R Inv-2 .0 ; R0 /;
(12.68)
where qk 1 N.0; Q/. This is thus the same kind of non-linear state
space model that we have already considered in this book, except that here
the measurement noise variance R is considered as unknown and given an
inverse-chi-squared prior distribution Inv-2 .0 ; R0 /.
It turns out that we can do sequential Monte Carlo sampling in this
model such that we do not need to sample the values of R. Instead, it is
enough to sample the state values and then carry the parameters of the distribution of R, conditional on the previous measurements and the histories
of samples. The idea is the following.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Parameter estimation
200
1 ; y1Wk 1 /
(12.70)
where k.i / 1 ; Rk.i / 1 have already been computed for each i. Then we have
the following approximation for the full distribution of states and parameters:
p.x0Wk
1; R
j y1Wk
1/
x.i0Wk/
1 /:
(12.71)
(12.72)
3 We can now evaluate the likelihood of the measurement given x.ik / and
the previous measurements as follows:
p.yk j x.ik / ; y1Wk 1 /
Z
D N.yk j h.x.ik / /; R/ Inv-2 .R j k.i / 1 ; Rk.i / 1 / dR
D t .i / .yk j h.x.ik / /; Rk.i / /;
(12.73)
h.x.ik / //2
(12.74)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
201
This allows us to compute the next step importance weights for the SIR
algorithm as follows:
wk.i /
1/
N.x.ik / j f .x.ik / 1 /; Q/
.x.ik / j x.ik / 1 ; yk /
(12.75)
4 Given the measurement and the state we can further compute the conditional distribution of R given y1Wk and x.i0Wk/ :
p.R j x.i0Wk/ ; y1Wk / D Inv-2 .R j k.i / ; Rk.i / /:
(12.76)
1 ; /;
yk p.yk j xk ; /;
p./;
(12.77)
where the vector contains the unknown static parameters. Now if the
posterior distribution of the parameters depends only on some sufficient
statistics
Tk D Tk .x1Wk ; y1Wk /;
(12.78)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
202
Parameter estimation
Fearnhead (2002), Djuric and Miguez (2002), and more recently it has been
applied to estimation of full noise covariances in state space models by
Saha et al. (2010).
A particularly useful special case, which includes the example above,
is obtained when the dynamic model is independent of the parameters .
In this case, if conditionally to the state xk the prior p./ belongs to the
conjugate family of the likelihood p.yk j xk ; /, the static parameters
can be marginalized out and only the states need to be sampled. This idea
can be extended to the time-varying case if the dynamic model has such
a form which keeps the predicted distribution of the parameter within the
conjugate family (see Sarkka and Nummenmaa, 2009).
When the static parameter appears linearly in the model we recover
a noise free version of the conditionally Gaussian RaoBlackwellization
considered in Section 7.5 (see Schon and Gustafsson, 2003). The Rao
Blackwellized particle filter can then be seen as a time-varying extension
of this method in the conditionally linear Gaussian case.
12.4 Exercises
12.1
12.2
12.3
12.4
12.5
12.6
12.7
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Exercises
12.8
203
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
13
Epilogue
1/
C qk
yk D h.xk / C rk ;
1;
(13.1)
where f and h are somewhat well-behaved functions, then the first choice
would be one of the Gaussian approximation based filters and smoothers
provided that we are working on an application and the theoretical exactness of the solution is not important per se, but we are interested in
getting good estimates of the state and parameters. If theoretical exactness
is needed, then the only option is to use particle filters and smoothers (or
grid based solutions).
Among the Gaussian approximation based filters and smoothers it is always a good idea to start with an EKF and an ERTSS. These are the only
algorithms that have been used for over half a century in practical applications and there are good reasons for that they simply work. Statistical
linearization can sometimes be used to enhance implementations of the
EKF afterwards by replacing some of the function evaluations by their expectations. Otherwise the SLF is a theoretical tool rather than a practical
filtering algorithm.
204
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
205
With some models the EKF and ERTSS do not work well or at all, and
in that case we can move to the sigma-point methods. The spherical cubature and unscented methods have the advantage of being computationally
quite light, but still they tend to produce very good results. However, these
methods have the problem that their error estimates might not always be
consistent with the actual errors, a problem which the EKF/ERTSS methods also tend to have. The unscented transform has more parameters to
tune for a particular problem than the spherical cubature method, which
can be an advantage or a disadvantage (recall that the cubature spherical
method is an unscented transform with a certain selection of parameters).
The GaussHermite based methods tend to be more consistent in errors and
are thus more robust approximations, but have the disadvantage of having
high computational complexity. One should always remember that there
is no guarantee that using more complicated filtering and smoothing algorithms would actually improve the results, therefore it is a good idea to
always test the EKF and ERTSS first. The bootstrap filter has the advantage that it is very easy to implement and thus it can sometimes be used as
a reference solution when testing the performance and debugging Gaussian
approximation based filters and smoothers.
If the problem has a more complicated form which cannot be fitted into
the non-linear Gaussian framework or when the Gaussian approximations
do not work for other reasons, we need to go to particle based solutions.
Because the bootstrap filter is very easy to implement, it (and probably one
of the particle smoothers) should be the first option to test with a sufficiently large number of particles. However, the clear disadvantage of particle methods is the high computational load and thus it is a good idea
to check at quite an early stage if any of the states or parameters can be
marginalized out (RaoBlackwellized) exactly or approximately. If this
is the case, then one should always prefer marginalization to sampling.1
The other thing to check is if it is possible to use the optimal or almost optimal importance distribution in the particle filter. In principle, non-linear
Gaussian approximation based filters can be used to form such importance
distributions, but this may also lead to overly heavy computational methods
as well as to convergence problems. If they are used, then it might be advisable to artificially increase the filter covariances a bit or to use Students
t distributions instead of using the Gaussian approximations as such.
1
The rule of thumb is: use Monte Carlo sampling only as a last resort when all the other
options have failed.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
206
Epilogue
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
207
There also exist various other kinds of Gaussian integration methods that
we have not presented here that could be used for constructing new kinds
of Gaussian filters and smoothers (see, e.g., OHagan, 1991; Nrgaard
et al., 2000; Lefebvre et al., 2002; Wu et al., 2006; Sarkka and Hartikainen,
2010b; Sandblom and Svensson, 2012). One particularly interesting approach is to approximate the non-linear functions with a Gaussian process
based non-parametric model which is fitted using a finite number of sample
points (Deisenroth et al., 2009, 2012).
One useful class of discrete-time methods is the multiple model approaches such as the generalized pseudo-Bayesian methods (GPB1 and
GPB2) as well as the interacting multiple model (IMM) algorithm (BarShalom et al., 2001). These methods can be used for approximating the
Bayesian solutions to problems with a fixed number of models or modes
of operation. The active mode of the system is described by a discrete latent variable which is modeled as a discrete-state Markov chain. Given the
value of the latent variable, the system is (approximately) Gaussian. The
GPB1, GPB2, and IMM algorithms are based on forming a mixture of
Gaussians approximation (a bank of Kalman or extended Kalman filters)
to the Bayesian filtering solutions by using moment matching.
The above-mentioned multiple model methods are also closely related to
so-called expectation correction (EC, Barber, 2006) and expectation propagation (EP, Zoeter and Heskes, 2011) methods, which can also be used
for Bayesian filtering and smoothing in switching linear dynamic systems
(SLDS), which is another term used for multiple mode/model problems.
These models can also be considered as special cases of the conditionally Gaussian models considered in the previous section and the history of
similar approximations dates back to the works of Alspach and Sorenson
(1972) and Akashi and Kumamoto (1977). The relationship between various methods for this type of model has recently been analyzed by Barber
(2011).
When the measurement model is non-Gaussian (e.g., Students t), it is
sometimes possible to use variational Bayes approximations (Agamennoni
et al., 2011; Piche et al., 2012) to yield to tractable inference. The expectation propagation (EP) algorithm (Ypma and Heskes, 2005) can also be
used for approximate inference in non-linear and non-Gaussian dynamic
systems. Both of these approaches are also closely related to the Gaussian filters and smoothers considered in this book. Variational Bayesian
approximations can also be used for estimation of unknown time-varying
parameters in state space models (Sarkka and Nummenmaa, 2009).
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
208
Epilogue
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Appendix
Additional material
(A.2)
(A.3)
Lemma A.2 (Conditional distribution of Gaussian variables) If the random variables x and y have the joint Gaussian probability distribution
x
a
A C
N
;
;
(A.4)
y
b
CT B
209
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
210
Additional material
y j x N.b C C T A
.y
1
.x
b/; A
a/; B
C B 1 C T /;
CT A
C/:
(A.5)
(A.6)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
211
Another way to compute the same derivative is via the following theorem.
Theorem A.1 (Partial derivative of Cholesky factorization) The partial
derivative @A=@ of the lower triangular Cholesky factor A such that P D
A AT with respect to a scalar parameter can be computed as
@A
1 @P
T
;
(A.7)
D A A
A
@
@
where ./ is a function returning the lower triangular part and half the
diagonal of the argument as follows:
8
if i > j;
<Mij ;
ij .M/ D 21 Mij ; if i D j;
(A.8)
:
0;
if i < j:
Proof We use a similar trick that was used in the derivation of the time
derivative of the Cholesky factor in Morf et al. (1978). We have
P D A AT :
(A.9)
(A.10)
T
gives
@P
@A
@AT
A TDA 1
C
A T:
(A.11)
@
@
@
Now the right-hand side is the sum of a lower triangular matrix and an
upper triangular matrix with identical diagonals. Thus we can recover
A
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Additional material
212
@A=@ via
A
@A
D A
@
@P
A
@
;
(A.12)
where the function ./ returns the (strictly) lower triangular part of the
argument and half of the diagonal. Multiplying from the left with A gives
the result.
C qk
1;
yk D H./ xk C rk ;
(A.13)
(A.15)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
213
(A.16)
The recursion should be started from the initial condition @'0 ./=@ D
@ log p./=@ .
Another way to compute the same derivative is by using Fishers identity
(Equation 12.32) together with the expression for Q in Theorem 12.4. The
result is the following.
Theorem A.3 (Energy function derivative for linear Gaussian model II)
The derivative of the energy function given in Theorem 12.3 can be computed as
@ log p./
@Q.; .n/ /
@'T ./
(A.17)
D
.n/ ;
@
@
@
D
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
214
Additional material
where
@Q.; .n/ /
.n/
@i
D
1
T
T
1 @P0
1 @Q
1 @R
D
tr P0
tr Q
tr R
2
@i
2
@i
2
@i
)
(
i
h
@P
1
0
C tr P0 1
P 1 P0s C .ms0 m0 / .ms0 m0 /T
2
@i 0
(
"
#)
T
1
@m
@m
0
0
C tr P0 1
.ms0 m0 /T C .ms0 m0 /
2
@i
@i
(
)
h
i
T
1 @Q
1
T
T
T
C tr Q
Q
CA
AC C AA
2
@i
(
"
#)
T
@AT @A T @A
@AT
1
T
tr Q
C
C C
A C A
2
@i
@i
@i
@i
)
(
i
h
T
@R
C tr R 1
R 1 D B HT H BT C H HT
2
@i
(
"
#)
T
@HT @H T @H
@HT
1
T
tr R
B
B C
H C H
;
2
@i
@i
@i
@i
(A.18)
where all the terms are evaluated at .
1 ; /
C qk
yk D h.xk ; / C rk :
1;
(A.19)
In order to compute the derivative, it is convenient to first rewrite the expectations as expectations over unit Gaussian distributions as follows:
Z
mk ./ D f .xk 1 ; / N.xk 1 j mk 1 ./; Pk 1 .// dxk 1
Z
p
D f .mk 1 ./ C Pk 1 ./ ; / N. j 0; I/ d:
(A.20)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
215
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Additional material
216
p
Pk
1 C
;
#
mk
p
@m
@ Pk 1
k 1
C
1 ;
@i
@i
#
@m T
p
@f
@Qk 1
k
C
mk 1 C Pk 1 ;
N. j 0; I/ d C
;
@i
@i
@i
(A.23)
"
Fx mk
p
Pk
1 C
@k
;
@i
"
Z
q
Hx mk C Pk ;
!
p
@ Pk
@mk
C
@i
@i
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
217
#
k
!
p
@ Pk
@mk
C
@i
@i
#
q
@ T
@Rk
@h
k
mk C Pk ;
N. j 0; I/ d C
;
C
@i
@i
@i
!T
Z ( p
q
@ Pk
@Ck
D
h mk C Pk ;
k
@i
@i
"
!
p
q
q
@m
P
@
k
k
C Pk Hx mk C Pk ;
C
@i
@i
# )
q
@ T
@h
k
C
mk C Pk ;
N. j 0; I/ d;
@i
@i
"
q
Hx mk C Pk ;
@Ck
@Kk
@Sk
D
Sk 1 Ck Sk 1
S 1;
@i
@i
@i k
@mk
@mk
@Kk
@vk
D
C
vk C Kk
;
@i
@i
@i
@i
@Pk
@Kk
@Sk T
@Pk
D
Sk KTk Kk
K
@i
@i
@i
@i k
Kk Sk
@KTk
;
@i
(A.24)
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Additional material
218
mean:
@mk
D
@i
Z "
@f .xk 1 ; /
Fx .xk 1 ; / g.xk 1 ; / C
@i
N.xk
j mk
1 ; Pk 1 /
dxk
1;
(A.25)
where
g.xk
1 ; /
@mk
D
@i
p
@ Pk
C
@i
p
Pk
1
.xk
mk
1/ :
(A.26)
The derivation of the full set of equations is left as an exercise to the reader.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
References
Agamennoni, G., Nieto, J., and Nebot, E. 2011. An outlier-robust Kalman filter. Pages
15511558 of: IEEE International Conference on Robotics and Automation (ICRA).
Akashi, H. and Kumamoto, H. 1977. Random sampling approach to state estimation in
switching environments. Automatica, 13(4), 429434.
Alspach, D. L. and Sorenson, H. W. 1972. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4).
Andrieu, C., De Freitas, N., and Doucet, A. 2002. Rao-Blackwellised particle filtering
via data augmentation. In: Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds.),
Advances in Neural Information Processing Systems 14. MIT Press.
Andrieu, C., Doucet, A., Singh, S., and Tadic, V. 2004. Particle methods for change
detection, system identification, and control. Proceedings of the IEEE, 92(3), 423
438.
Andrieu, C. and Thoms, J. 2008. A tutorial on adaptive MCMC. Statistics and Computing, 18(4), 343373.
Andrieu, C., Doucet, A., and Holenstein, R. 2010. Particle Markov chain Monte Carlo
methods. The Royal Statistical Society: Series B (Statistical Methodology), 72(3),
269342.
Arasaratnam, I. and Haykin, S. 2009. Cubature Kalman filters. IEEE Transactions on
Automatic Control, 54(6), 12541269.
Arasaratnam, I. and Haykin, S. 2011. Cubature Kalman smoothers. Automatica, 47(10),
22452250.
Arasaratnam, I., Haykin, S., and Elliott, R. J. 2007. Discrete-time nonlinear filtering
algorithms using GaussHermite quadrature. Proceedings of the IEEE, 95(5), 953
977.
Arasaratnam, I., Haykin, S., and Hurd, T. R. 2010. Cubature Kalman filtering for
continuous-discrete systems: theory and simulations. IEEE Transactions on Signal
Processing, 58(10), 49774993.
Bar-Shalom, Y. and Li, X.-R. 1995. Multitarget-Multisensor Tracking: Principles and
Techniques. YBS Publishing.
Bar-Shalom, Y., Li, X.-R., and Kirubarajan, T. 2001. Estimation with Applications to
Tracking and Navigation. Wiley.
Barber, D. 2006. Expectation correction for smoothed inference in switching linear
dynamical systems. The Journal of Machine Learning Research, 7, 25152540.
219
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
220
References
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
References
221
Doucet, A., Godsill, S. J., and Andrieu, C. 2000. On sequential Monte Carlo sampling
methods for Bayesian filtering. Statistics and Computing, 10(3), 197208.
Doucet, A., De Freitas, N., and Gordon, N. 2001. Sequential Monte Carlo Methods in
Practice. Springer.
Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. 1987. Hybrid Monte
Carlo. Physics Letters B, 195(2), 216222.
Fearnhead, P. 2002. Markov chain Monte Carlo, sufficient statistics, and particle filters.
Journal of Computational and Graphical Statistics, 11(4), 848862.
Fearnhead, P. and Clifford, P. 2003. On-line inference for Hidden Markov models
via particle filters. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 65(4), 887899.
Fong, W., Godsill, S. J., Doucet, A., and West, M. 2002. Monte Carlo smoothing with
application to audio signal enhancement. IEEE Transactions on Signal Processing,
50(2), 438449.
Fraser, D. and Potter, J. 1969. The optimum linear smoother as a combination of two
optimum linear filters. IEEE Transactions on Automatic Control, 14(4), 387390.
Gelb, A. 1974. Applied Optimal Estimation. MIT Press.
Gelb, A. and Vander Velde, W. 1968. Multiple-Input Describing Functions and Nonlinear System Design. McGraw-Hill.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. R. 2004. Bayesian Data Analysis.
Second edn. Chapman & Hall.
Gilks, W., Richardson, S., and Spiegelhalter, D. (eds.). 1996. Markov Chain Monte
Carlo in Practice. Chapman & Hall.
Godsill, S. J. and Rayner, P. J. 1998. Digital Audio Restoration: a Statistical Model
Based Approach. Springer-Verlag.
Godsill, S. J., Doucet, A., and West, M. 2004. Monte Carlo smoothing for nonlinear
time series. Journal of the American Statistical Association, 99(465), 156168.
Golub, G. H. and van Loan, C. F. 1996. Matrix Computations. Third edn. The Johns
Hopkins University Press.
Golub, G. H. and Welsch, J. H. 1969. Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106), 221230.
Gonzalez, R. C. and Woods, R. E. 2008. Digital Image Processing. Third edn. Prentice
Hall.
Gordon, N. J., Salmond, D. J., and Smith, A. F. M. 1993. Novel approach to
nonlinear/non-Gaussian Bayesian state estimation. Pages 107113 of: IEEE Proceedings on Radar and Signal Processing, vol. 140.
Grewal, M. S. and Andrews, A. P. 2001. Kalman Filtering, Theory and Practice Using
MATLAB. Wiley.
Grewal, M. S., Miyasako, R. S., and Smith, J. M. 1988. Application of fixed point
smoothing to the calibration, alignment and navigation data of inertial navigation
systems. Pages 476479 of: Position Location and Navigation Symposium.
Grewal, M. S., Weill, L. R., and Andrews, A. P. 2001. Global Positioning Systems,
Inertial Navigation and Integration. Wiley.
Gupta, N. and Mehra, R. 1974. Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations. IEEE Transactions on
Automatic Control, 19(6), 774783.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
222
References
Gustafsson, F. and Hendeby, G. 2012. Some relations between extended and unscented
Kalman filters. IEEE Transactions on Signal Processing, 60(2), 545555.
Haario, H., Saksman, E., and Tamminen, J. 1999. Adaptive proposal distribution for
random walk Metropolis algorithm. Computational Statistics, 14(3), 375395.
Haario, H., Saksman, E., and Tamminen, J. 2001. An adaptive Metropolis algorithm.
Bernoulli, 7(2), 223242.
Hartikainen, J. and Sarkka, S. 2010. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. Pages 379384 of: Proceedings of IEEE
International Workshop on Machine Learning for Signal Processing (MLSP).
Hauk, O. 2004. Keep it simple: a case for using classical minimum norm estimation in
the analysis of EEG and MEG data. NeuroImage, 21(4), 16121621.
Hayes, M. H. 1996. Statistical Digital Signal Processing and Modeling. John Wiley &
Sons, Inc.
Haykin, S. 2001. Kalman Filtering and Neural Networks. Wiley.
Hiltunen, P., Sarkka, S., Nissila, I., Lajunen, A., and Lampinen, J. 2011. State space
regularization in the nonstationary inverse problem for diffuse optical tomography.
Inverse Problems, 27, 025009.
Ho, Y. C. and Lee, R. C. K. 1964. A Bayesian approach to problems in stochastic
estimation and control. IEEE Transactions on Automatic Control, 9(4), 333339.
Hu, X., Schon, T., and Ljung, L. 2008. A basic convergence result for particle filtering.
IEEE Transactions on Signal Processing, 56(4), 13371348.
Hu, X., Schon, T., and Ljung, L. 2011. A general convergence result for particle
filtering. IEEE Transactions on Signal Processing, 59(7), 34243429.
Hurzeler, M. and Kunsch, H. R. 1998. Monte Carlo approximations for general statespace models. Journal of Computational and Graphical Statistics, 7(2), 175193.
Ito, K. and Xiong, K. 2000. Gaussian filters for nonlinear filtering problems. IEEE
Transactions on Automatic Control, 45(5), 910927.
Jazwinski, A. H. 1966. Filtering for nonlinear dynamical systems. IEEE Transactions
on Automatic Control, 11(4), 765766.
Jazwinski, A. H. 1970. Stochastic Processes and Filtering Theory. Academic Press.
Julier, S. J. and Uhlmann, J. K. 1995. A General Method of Approximating Nonlinear
Transformations of Probability Distributions. Tech. rept. Robotics Research Group,
Department of Engineering Science, University of Oxford.
Julier, S. J. and Uhlmann, J. K. 2004. Unscented filtering and nonlinear estimation.
Proceedings of the IEEE, 92(3), 401422.
Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F. 1995. A new approach for
filtering nonlinear systems. Pages 16281632 of: Proceedings of the 1995 American
Control, Conference, Seattle, Washington.
Julier, S. J., Uhlmann, J. K., and Durrant-Whyte, H. F. 2000. A new method for the
nonlinear transformation of means and covariances in filters and estimators. IEEE
Transactions on Automatic Control, 45(3), 477482.
Kailath, T., Sayed, A. H., and Hassibi, B. 2000. Linear Estimation. Prentice Hall.
Kaipio, J. and Somersalo, E. 2005. Statistical and Computational Inverse Problems.
Applied Mathematical Sciences, no. 160. Springer.
Kalman, R. E. 1960a. Contributions to the theory of optimal control. Boletin de la
Sociedad Matematica Mexicana, 5(1), 102119.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
References
223
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
224
References
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
References
225
Rauch, H. E., Tung, F., and Striebel, C. T. 1965. Maximum likelihood estimates of
linear dynamic systems. AIAA Journal, 3(8), 14451450.
Ristic, B., Arulampalam, S., and Gordon, N. 2004. Beyond the Kalman Filter. Artech
House.
Roberts, G. O. and Rosenthal, J. S. 2001. Optimal scaling for various Metropolis
Hastings algorithms. Statistical Science, 16(4), 351367.
Roweis, S. and Ghahramani, Z. 2001. Learning nonlinear dynamical systems using
the expectationmaximization algorithm. Chapter 6, pages 175220 of: Haykin, S.
(ed.), Kalman Filtering and Neural Networks. Wiley-Interscience.
Sage, A. P. and Melsa, J. L. 1971. Estimation Theory with Applications to Communications and Control. McGraw-Hill.
Saha, S., Ozkan, E., Gustafsson, F., and Smidl, V. 2010. Marginalized particle filters for
Bayesian estimation of Gaussian noise parameters. Pages 18 of: 13th Conference
on Information Fusion (FUSION).
Sandblom, F. and Svensson, L. 2012. Moment estimation using a marginalized transform. IEEE Transactions on Signal Processing, 60(12), 61386150.
Sarkka, S. 2006. Recursive Bayesian Inference on Stochastic Differential Equations.
Doctoral dissertation, Helsinki University of Technology.
Sarkka, S. 2007. On unscented Kalman filtering for state estimation of continuous-time
nonlinear systems. IEEE Transactions on Automatic Control, 52(9), 16311641.
Sarkka, S. 2008. Unscented Rauch-Tung-Striebel smoother. IEEE Transactions on
Automatic Control, 53(3), 845849.
Sarkka, S. 2010. Continuous-time and continuous-discrete-time unscented RauchTung-Striebel smoothers. Signal Processing, 90(1), 225235.
Sarkka, S. 2011. Linear operators and stochastic partial differential equations in Gaussian process regression. In: Proceedings of ICANN.
Sarkka, S. and Hartikainen, J. 2010a. On Gaussian optimal smoothing of non-linear
state space models. IEEE Transactions on Automatic Control, 55(8), 19381941.
Sarkka, S. and Hartikainen, J. 2010b. Sigma point methods in optimal smoothing of
non-linear stochastic state space models. Pages 184189 of: Proceedings of IEEE
International Workshop on Machine Learning for Signal Processing (MLSP).
Sarkka, S. and Hartikainen, J. 2012. Infinite-dimensional Kalman filtering approach to
spatio-temporal Gaussian process regression. In: Proceedings of AISTATS 2012.
Sarkka, S. and Nummenmaa, A. 2009. Recursive noise adaptive Kalman filtering by
variational Bayesian approximations. IEEE Transactions on Automatic Control,
54(3), 596600.
Sarkka, S. and Sarmavuori, J. 2013. Gaussian filtering and smoothing for continuousdiscrete dynamic systems. Signal Processing, 93(2), 500510.
Sarkka, S. and Solin, A. 2012. On continuous-discrete cubature Kalman filtering. Pages
12101215 of: Proceedings of SYSID 2012.
Sarkka, S. and Sottinen, T. 2008. Application of Girsanov theorem to particle filtering
of discretely observed continuous-time non-linear systems. Bayesian Analysis, 3(3),
555584.
Sarkka, S., Vehtari, A., and Lampinen, J. 2007a. CATS benchmark time series prediction by Kalman smoother with cross-validated noise density. Neurocomputing,
70(1315), 23312341.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
226
References
Sarkka, S., Vehtari, A., and Lampinen, J. 2007b. Rao-Blackwellized particle filter for
multiple target tracking. Information Fusion Journal, 8(1), 215.
Sarkka, S., Bunch, P., and Godsill, S. J. 2012a. A backward-simulation based RaoBlackwellized particle smoother for conditionally linear Gaussian models. Pages
506511 of: Proceedings of SYSID 2012.
Sarkka, S., Solin, A., Nummenmaa, A., Vehtari, A., Auranen, T., Vanni, S., and Lin,
F.-H. 2012b. Dynamic retrospective filtering of physiological noise in BOLD fMRI:
DRIFTER. NeuroImage, 60(2), 15171527.
Sarmavuori, J. and Sarkka, S. 2012a. Fourier-Hermite Kalman filter. IEEE Transactions
on Automatic Control, 57(6), 15111515.
Sarmavuori, J. and Sarkka, S. 2012b. Fourier-Hermite Rauch-Tung-Striebel Smoother.
In: Proceedings of EUSIPCO.
Schon, T. and Gustafsson, F. 2003. Particle filters for system identification of statespace models linear in either parameters or states. Pages 12871292 of: Proceedings
of the 13th IFAC Symposium on System Identification, Rotterdam, The Netherlands.
Schon, T., Gustafsson, F., and Nordlund, P.-J. 2005. Marginalized particle filters for
mixed linear/nonlinear state-space models. IEEE Transactions on Signal Processing,
53(7), 22792289.
Schon, T., Wills, A., and Ninness, B. 2011. System identification of nonlinear statespace models. Automatica, 47(1), 3949.
Segal, M. and Weinstein, E. 1989. A new method for evaluating the log-likelihood
gradient, the Hessian, and the Fisher information matrix for linear dynamic systems.
IEEE Transactions on Information Theory, 35(3), 682687.
Shiryaev, A. N. 1996. Probability. Springer.
Shumway, R. and Stoffer, D. 1982. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 3(4), 253264.
Simandl,
M. and Dunk, J. 2006. Design of derivative-free smoothers and predictors.
Pages 991996 of: Preprints of the 14th IFAC Symposium on System Identification.
Singer, H. 2008. Nonlinear continuous time modeling approaches in panel research.
Statistica Neerlandica, 62(1), 2957.
Singer, H. 2011. Continuous-discrete state-space modeling of panel data with nonlinear
filter algorithms. AStA Advances in Statistical Analysis, 95(4), 375413.
Snyder, C., Bengtsson, T., Bickel, P., and Anderson, J. 2008. Obstacles to highdimensional particle filtering. Monthly Weather Review, 136(12), 46294640.
Stengel, R. F. 1994. Optimal Control and Estimation. Dover.
Stone, L. D., Barlow, C. A., and Corwin, T. L. 1999. Bayesian Multiple Target Tracking.
Artech House.
Storvik, G. 2002. Particle filters in state space models with the presence of unknown
static parameters. IEEE Transactions on Signal Processing, 50(2), 281289.
Stratonovich, R. L. 1968. Conditional Markov Processes and Their Application to the
Theory of Optimal Control. Elsevier.
Striebel, C. T. 1965. Partial differential equations for the conditional distribution of a
Markov process given noisy observations. Journal of Mathematical Analysis and
Applications, 11, 151159.
Tam, P., Tam, D., and Moore, J. 1973. Fixed-lag demodulation of discrete noisy measurements of FM signals. Automatica, 9(6), 725729.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
References
227
Tarantola, A. 2004. Inverse Problem Theory and Methods for Model Parameter Estimation. SIAM.
Titterton, D. H. and Weston, J. L. 1997. Strapdown Inertial Navigation Technology.
Peter Peregrinus Ltd.
Vaa nanen, V. 2012. Gaussian Filtering and Smoothing Based Parameter Estimation in
Nonlinear Models for Sequential Data. Masters Thesis, Aalto University.
Van der Merwe, R. and Wan, E. 2003. Sigma-point Kalman filters for probabilistic
inference in dynamic state-space models. In: Proceedings of the Workshop on Advances in Machine Learning.
Van der Merwe, R. and Wan, E. A. 2001. The square-root unscented Kalman filter for
state and parameter estimation. Pages 34613464 of: International Conference on
Acoustics, Speech, and Signal Processing.
Van der Merwe, R., De Freitas, N., Doucet, A., and Wan, E. 2001. The unscented particle filter. Pages 584590 of: Advances in Neural Information Processing Systems
13.
Van Trees, H. L. 1968. Detection, Estimation, and Modulation Theory Part I. John
Wiley & Sons.
Van Trees, H. L. 1971. Detection, Estimation, and Modulation Theory Part II. John
Wiley & Sons.
Vihola, M. 2012. Robust adaptive Metropolis algorithm with coerced acceptance rate.
Statistics and Computing, 22(5), 9971008.
Viterbi, A. J. 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13(2).
Wan, E. A. and Van der Merwe, R. 2001. The unscented Kalman filter. Chapter 7 of:
Haykin, S. (ed.), Kalman Filtering and Neural Networks. Wiley.
West, M. and Harrison, J. 1997. Bayesian Forecasting and Dynamic Models. SpringerVerlag.
Wiener, N. 1950. Extrapolation, Interpolation and Smoothing of Stationary Time Series
with Engineering Applications. John Wiley & Sons.
Wills, A., Schon, T. B., Ljung, L., and Ninness, B. 2013.
Identification of
HammersteinWiener models. Automatica, 49(1), 7081.
Wu, Y., Hu, D., Wu, M., and Hu, X. 2005. Unscented Kalman filtering for additive noise
case: augmented versus nonaugmented. IEEE Signal Processing Letters, 12(5), 357
360.
Wu, Y., Hu, D., Wu, M., and Hu, X. 2006. A numerical-integration perspective on
Gaussian filters. IEEE Transactions on Signal Processing, 54(8), 29102921.
Ypma, A. and Heskes, T. 2005. Novel approximations for inference in nonlinear dynamical systems using expectation propagation. Neurocomputing, 69(1), 8599.
Zoeter, O. and Heskes, T. 2011. Expectation propagation and generalized EP methods
for inference in switching linear dynamical systems. Chapter 7, pages 141165 of:
Bayesian Time Series Models. Cambridge University Press.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
Index
229
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
230
INDEX
non-additive, 98
Gaussian fixed-lag smoother, 164
Gaussian fixed-point smoother, 162
Gaussian moment matching, 96, 97
Gaussian process regression, 6
Gaussian random walk, 52
for linear regression, 34
Gaussian RTS smoother
additive, 154
non-additive, 155
Hermite polynomial, 100
Hessian matrix, 65
importance sampling, 23, 119
information filter, 141
Jacobian matrix, 65
Kalman filter
basic, 57
cubature, 111, 112
extended, 69, 71, 73
for car tracking, 59
for Gaussian random walk, 58
for linear regression, 30
for linear regression with drift, 35
GaussHermite, 104
Gaussian, 98
statistically linearized, 78
unscented, 87, 88
Kalman smoother, see RTS smoother
Laplace approximation, 178
least squares estimate, 24
linear approximation, 66, 68
linear quadratic Gaussian regulator, 8
linear regression, 27
linearization, see linear approximation
local linearization, 125
loss function
01, 21
absolute error, 21
definition, 21
quadratic error, 21
MAP-estimate, 22, 178
marginal likelihood of parameters, 176
marginalized transform, 92
Markov chain Monte Carlo, 23, 179
matrix inversion lemma, 30
measurement model
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
INDEX
definition, 19
joint distribution of measurements, 53
of probabilistic state space model, 51
MetropolisHastings, 179
ML-estimate, 18, 178
MMSE-estimate, 21
Monte Carlo method, 23, 116
non-linear transform
additive Gaussian moment matching,
96
additive linear approximation, 66
additive quadratic approximation, 68
additive statistically linearized
approximation, 76, 77
additive unscented transform
approximation, 84
non-additive Gaussian moment
matching, 97
non-additive linear approximation, 68
non-additive statistically linearized
approximation, 76
non-additive unscented transform
approximation, 85
on-line learning, 33
optimal filtering, see Bayesian filtering
optimal importance distribution, 125
optimal smoothing, see Bayesian
smoothing
parameter estimation
definition, 174
Gaussian random walk, 188
linear Gaussian models, 187
pendulum model, 196
via Gaussian filtering and smoothing,
192
via particle filtering and smoothing,
195
via RaoBlackwellization, 199
via state augmentation, 185
particle filter
algorithm, 124
for cluttered pendulum tracking, 128
for pendulum tracking, 127
RaoBlackwellized, 130
particle marginal MetropolisHastings,
196
particle Markov chain Monte Carlo, 196
particle smoother
231
backward-simulation, 167
Kims approximation, 172
marginal, 169
RaoBlackwellized, 171
reweighting, 169
SIR, 165
posterior distribution
batch linear regression model, 29
definition, 19
joint distribution of states, 53
recursive linear regression model, 30
posterior mean, 21
prior distribution
definition, 19
joint distribution of states, 53
probabilistic notation, 27
quadratic approximation, 68
quadrature Kalman filter, see
GaussHermite Kalman filter
RaoBlackwellization of parameters, 199
RaoBlackwellized particle filter, 130
RaoBlackwellized particle smoother,
171
RauchTungStriebel smoother, see RTS
Smoother
recursive solution
general Bayesian, 33
to linear regression, 30
resampling, 123
reweighting particle smoother, 169
robust adaptive Metropolis, 180
RTS smoother
basic, 136
cubature, 157, 158
extended, 144
for car tracking, 138
for Gaussian random walk, 137
GaussHermite, 155
Gaussian, 154, 155
statistically linearized, 146
unscented, 148, 150
sample impoverishment, 126
second order approximation, see
quadratic approximation
sensitivity equations, 178, 189
linear Gaussian model, 212
non-linear Gaussian model, 215
sequential importance resampling, 124
This PDF version is made available for personal use. The copyright in all material rests with the author (Simo Sarkka). Commercial
reproduction is prohibited, except as authorised by the author and publisher.
232
INDEX