
Particle Filter Theory and Practice with Positioning Applications

FREDRIK GUSTAFSSON, Senior Member, IEEE
Linköping University, Sweden

The particle filter (PF) was introduced in 1993 as a numerical approximation to the nonlinear Bayesian filtering problem, and there is today a rather mature theory as well as a number of successful applications described in literature. This tutorial serves two purposes: to survey the part of the theory that is most important for applications and to survey a number of illustrative positioning applications from which conclusions relevant for the theory can be drawn.

The theory part first surveys the nonlinear filtering problem and then describes the general PF algorithm in relation to classical solutions based on the extended Kalman filter (EKF) and the point mass filter (PMF). Tuning options, design alternatives, and user guidelines are described, and potential computational bottlenecks are identified and remedies suggested. Finally, the marginalized (or Rao-Blackwellized) PF is overviewed as a general framework for applying the PF to complex systems.

The application part is more or less a stand-alone tutorial without equations that does not require any background knowledge in statistics or nonlinear filtering. It describes a number of related positioning applications where geographical information systems provide a nonlinear measurement and where it should be obvious that classical approaches based on Kalman filters (KFs) would have poor performance. All applications are based on real data and several of them come from real-time implementations. This part also provides complete code examples.

Manuscript received June 18, 2008; revised January 26 and June 17, 2009; released for publication September 8, 2009.

Refereeing of this contribution was handled by L. Kaplan.

Author's address: Dept. of Electrical Engineering, Linköping University, ISY, Linköping, SE-58183, Sweden, E-mail: ([email protected]).

I. INTRODUCTION

A dynamic system can in general terms be characterized by a state-space model with a hidden state from which partial information is obtained by observations. For the applications in mind, the state vector may include position, velocity, and acceleration of a moving platform, and the observations may come either from internal onboard sensors (the navigation problem) measuring inertial motion or absolute position relative to some landmark, or from external sensors (the tracking problem) measuring for instance range and bearing to the target.

The nonlinear filtering problem is to make inference on the state from the observations. In the Bayesian framework, this is done by computing or approximating the posterior distribution for the state vector given all available observations at that time. For the applications in mind, this means that the position of the platform is represented with a conditional probability density function (pdf) given the observations.

Classical approaches to Bayesian nonlinear filtering described in literature include the following algorithms:

1) The Kalman filter (KF) [1, 2] computes the posterior distribution exactly for linear Gaussian systems by updating finite-dimensional statistics recursively.

2) For nonlinear, non-Gaussian models, the KF algorithm can be applied to a linearized model with Gaussian noise with the same first- and second-order moments. This approach is commonly referred to as the extended Kalman filter (EKF) [3, 4]. It may work well, but without any guarantees, for mildly nonlinear systems where the true posterior is unimodal (just one peak) and essentially symmetric.

3) The unscented Kalman filter (UKF) [5, 6] propagates a number of points in the state space from which a Gaussian distribution is fit at each time step. The UKF is known to accommodate also the quadratic term in nonlinear models and is often more accurate than the EKF. The divided difference filter (DDF) [7] and the quadrature Kalman filter (QKF) [8] are two other variants of this principle. Again, the applicability of these filters is limited to unimodal posterior distributions.

4) Gaussian sum Kalman filters (GS-KFs) [9] represent the posterior with a Gaussian mixture distribution. Filters in this class can handle multimodal posteriors. The idea can be extended to KF approximations like the GS-QKF in [8].

5) The point mass filter (PMF) [10, 9] grids the state space and computes the posterior over this grid recursively. The PMF applies to any nonlinear and non-Gaussian model and is able to represent any posterior distribution. The main limiting factors are the curse of dimensionality of the grid size in higher state dimensions and the fact that the algorithm itself is of quadratic complexity in the grid size.

Fig. 1. Evolution over time of research on PFs. The graph shows the number of papers per year (1996-2008) in the Thomson/ISI database that match a search on "particle filter" OR "sequential Monte Carlo" (upper curve), "particle filter" OR "sequential Monte Carlo" AND "application" (middle curve), and the number of citations of [15] (lower curve).

It should be stressed that both the EKF and the UKF approximate the model and propagate Gaussian distributions representative of the posterior, while the PMF uses the original model and approximates the posterior over a grid. The particle filter (PF) also provides a numerical approximation to the nonlinear filtering problem similar to the PMF, but it uses an adaptive stochastic grid that automatically selects relevant grid points in the state space, and in contrast to the PMF, the standard PF has linear complexity in the number of grid points.

The first traces of the PF date back to the 1950s [11, 12], and the control community made some attempts in the 1970s [13, 14]. However, the PF era started with the seminal paper [15] and the independent developments in [16, 17]. Here, an important resampling step was introduced. The timing for proposing a general solution to the nonlinear filtering problem was perfect in that the computer development enabled the use of computationally complex algorithms on quite realistic problems. Since the paper [15] the research has steadily intensified; see the article collection [18], the surveys [19-22], and the monograph [23]. Fig. 1 illustrates how the number of papers increases exponentially each year, and the same appears to be true for applied papers. The PFs may be a serious alternative for real-time applications classically approached by the (E)KF. The more nonlinear the model, or the more non-Gaussian the noise, the more potential PFs have, especially in applications where computational power is rather cheap and the sampling rate is moderate.

Positioning of moving platforms has been a technical driver for real-time applications of the PF in both the signal processing and the robotics communities. For this reason, we spend some time explaining several such applications in detail and summarizing the experiences of using the PF in practice. The applications concern positioning of underwater (UW) vessels, surface ships, cars, and aircraft using geographical information systems (GIS) containing a database with features of the surrounding landscape. These applications provide conclusions supporting the theoretical survey part.

In the robotics community, the PF has been developed into one of the main algorithms (fastSLAM) [24] for solving the simultaneous localization and mapping (SLAM) problem [25-27]. This can be seen as an extension of the aforementioned applications, where the features in the GIS are dynamically detected and updated on the fly. Visual tracking has turned out to be another important application for the PF. Multiple targets are here tracked from a video stream alone [28-30] or by fusion with other information, for instance acoustic sensors [31].

The common denominator of these applications of the PF is the use of a low-dimensional state vector consisting of horizontal position and course (three-dimensional pose). The PF performs quite well in a three-dimensional state space. In higher dimensions the curse of dimensionality quite soon makes the particle representation too sparse to be a meaningful representation of the posterior distribution. That is, the PF is not practically useful when extending the models to more realistic cases with

1) motion in three dimensions (six-dimensional pose),
2) more dynamic states (accelerations, unmeasured velocities, etc.),
3) or sensor biases and drifts.

A technical enabler for such applications is the marginalized PF (MPF), also referred to as the Rao-Blackwellized PF (RBPF). It allows for the use of high-dimensional state-space models as long as the (severe) nonlinearities only affect a small subset of the states. In this way the structure of the model is utilized, so that the particle filter is used to solve the most difficult tasks, and the (E)KF is used for the (almost) linear Gaussian states. The fastSLAM algorithm is in fact a version of the MPF, where hundreds or thousands of feature points in the state vector are updated using the (E)KF. The need for the MPF in the list of applications will be motivated by examples and experience from practice.

This tutorial uses notation and terminology that should be familiar to the AES community, and it deliberately avoids excessive use of concepts from probability theory; the main tools here are Bayes' theorem and the marginalization formula (or law of total probability). There are explicit comparisons and references to the KF, and the applications are in the area of target tracking and navigation. For instance, a particle represents a (target) state trajectory; the (target) motion dynamics and sensor observation model are assumed to be in state-space form, and the PF algorithm is split into time and measurement updates.

The PF should be the nonlinear filtering algorithm that appeals to engineers the most, since it intimately addresses the system model. The filtering code is thus very similar to the simulation code that the engineer working with the application should already be quite familiar with. For that reason, one can have a code-first approach, starting with Section IX to get a complete simulation code for a concrete example. This section also provides some other examples using an object-oriented programming framework where models and signals are represented with objects, and it can be used to quickly compare different filters, tunings, and models. Section X provides an overview of a number of applications of the PF, which can also be read stand-alone. Section XI extends the applications to models of high state dimension where the MPF has been applied. The practical experiences are summarized in Section XII.

However, the natural structure is to start with an overview of the PF theory as found in Section II, and a summary of the MPF theory is provided in Section VIII, where the selection of topics is strongly influenced by the practical experiences in Section XII.

II. NONLINEAR FILTERING

A. Models and Notation

Applied nonlinear filtering is based on discrete-time nonlinear state-space models relating a hidden state x_k to the observations y_k:

x_{k+1} = f(x_k, v_k),   v_k ~ p_{v_k},   x_1 ~ p_{x_1}   (1a)
y_k = h(x_k) + e_k,   e_k ~ p_{e_k}.   (1b)

Here k denotes the sample number, and v_k is a stochastic noise process specified by its known pdf p_{v_k}, which is compactly expressed as v_k ~ p_{v_k}. Similarly, e_k is an additive measurement noise, also with known pdf p_{e_k}. The first observation is denoted y_1, and thus the first unknown state is x_1, where the pdf of the initial state is denoted p_{x_1}. The model can also depend on a known (control) input u_k, so f(x_k, u_k, v_k) and h(x_k, u_k), but this dependence is omitted to simplify notation. The notation s_{1:k} denotes the sequence s_1, s_2, ..., s_k (s is one of the signals x, v, y), and n_s denotes the dimension of that signal.

In the statistical literature, a general Markov model and observation model in terms of conditional pdfs are often used:

x_{k+1} ~ p(x_{k+1} | x_k)   (2a)
y_k ~ p(y_k | x_k).   (2b)

This is in a sense a more general model. For instance, (2) allows implicit measurement relations h(y_k, x_k, e_k) = 0 in (1b), and differential algebraic equations that add implicit state constraints to (1a).
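In code, a model on the form (1) boils down to two function handles and a few noise parameters. The following MATLAB sketch defines a hypothetical two-dimensional random-walk position model with a range measurement to the origin; all names and numerical values are illustrative assumptions made for this tutorial-style example and are not tied to the applications discussed later.

% Hypothetical instance of (1): 2D random-walk position, range observation.
nx = 2; ny = 1;
Q  = 0.1*eye(nx);              % cov(v_k), assumed
R  = 0.01;                     % cov(e_k), assumed
f  = @(x) x;                   % f(x_k, v_k) = x_k + v_k (noise added by the filter)
h  = @(x) sqrt(sum(x.^2,1));   % h(x_k): range to the origin, works on nx-by-N matrices
x0 = [1; 1]; P0 = eye(nx);     % p_{x_1} = N(x0, P0), assumed

Writing f and h so that they accept an nx-by-N matrix of particles column-wise makes the filter sketches given later in this tutorial trivially vectorized.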

The Bayesian approach to nonlinear filtering is to compute or approximate the posterior distribution for the state given the observations. The posterior is denoted p(x_k | y_{1:k}) for filtering, p(x_{k+m} | y_{1:k}) for prediction, and p(x_{k-m} | y_{1:k}) for smoothing, where m > 0 denotes the prediction or smoothing lag. The theoretical derivations are based on the general model (2), while algorithms and discussions are based on (1). Note that the Markov property of the model (2) implies the formulas p(y_k | x_{1:k}, y_{1:k-1}) = p(y_k | x_k) and p(x_{k+1} | x_{1:k}, y_{1:k}) = p(x_{k+1} | x_k), which are used frequently.

A linearized model will turn up on several occasions and is obtained by a first-order Taylor expansion of (1) around x_k = x̂_k and v_k = 0:

x_{k+1} = f(x̂_k, 0) + F(x̂_k)(x_k - x̂_k) + G(x̂_k) v_k   (3a)
y_k = h(x̂_k) + H(x̂_k)(x_k - x̂_k) + e_k   (3b)

where the Jacobians are

F(x̂_k) = ∂f(x_k, v_k)/∂x_k,  G(x̂_k) = ∂f(x_k, v_k)/∂v_k,  H(x̂_k) = ∂h(x_k)/∂x_k,  evaluated at x_k = x̂_k, v_k = 0,   (3c)

and the noise and the initial state are represented by their second-order moments,

cov(v_k) = Q_k,  cov(e_k) = R_k,  cov(x_1) = P_0.   (3d)

For instance, the EKF recursions are obtained by linearizing around the previous estimate and applying the KF equations, which gives

K_k = P_{k|k-1} H^T(x̂_{k|k-1}) (H(x̂_{k|k-1}) P_{k|k-1} H^T(x̂_{k|k-1}) + R_k)^{-1}   (4a)
x̂_{k|k} = x̂_{k|k-1} + K_k (y_k - h(x̂_{k|k-1}))   (4b)
P_{k|k} = P_{k|k-1} - K_k H(x̂_{k|k-1}) P_{k|k-1}   (4c)
x̂_{k+1|k} = f(x̂_{k|k}, 0)   (4d)
P_{k+1|k} = F(x̂_{k|k}) P_{k|k} F^T(x̂_{k|k}) + G(x̂_{k|k}) Q_k G^T(x̂_{k|k}).   (4e)

The recursion is initialized with x̂_{1|0} = x_0 and P_{1|0} = P_0, assuming the prior p(x_1) = N(x_0, P_0). The EKF approximation of the posterior filtering distribution is then

p(x_k | y_{1:k}) ≈ N(x̂_{k|k}, P_{k|k}),   (5)

where N(m, P) denotes the Gaussian density function with mean m and covariance P. The special case of a linear model is covered by (3), in which case F(x_k) = F_k, G(x_k) = G_k, H(x_k) = H_k; using these and the equalities f(x_k, 0) = F_k x_k and h(x_k) = H_k x_k in (4) gives the standard KF recursion.

The neglected higher order terms in the Taylor expansion imply that the EKF can be biased and that it tends to underestimate the covariance of the state estimate. There is a variant of the EKF that also takes the second-order term of the Taylor expansion into account [32]. This is done by adding the expected value of the second-order term to the state updates and its covariance to the state covariance updates. The UKF [5, 6] does a similar correction by propagating systematically chosen state points (called sigma points) through the model. Related approaches include the DDF [7], which uses Stirling's formula to find the sigma points, and the QKF [8], which uses the quadrature rule in numerical integration to select the sigma points. The common theme in EKF, UKF, DDF, and QKF is that the nonlinear model is evaluated in the current state estimate. The latter filters also use some extra points that depend on the current state covariance.

The UKF is closely related to the second-order EKF [33]. Both variants perform better than the EKF in certain problems and can work well as long as the posterior distribution is unimodal. The algorithms are prone to diverge, a problem that is hard to mitigate or foresee by analytical methods. The choice of state coordinates is therefore crucial in EKF and UKF (see [34, ch. 8.9.3] for one example), while this choice does not affect the performance of the PF (more than potential numerical problems).

B. Bayesian Filtering

The Bayesian solution to computing the posterior distribution p(x_k | y_{1:k}) of the state vector, given past observations, is given by the general Bayesian update recursion:

p(x_k | y_{1:k}) = p(y_k | x_k) p(x_k | y_{1:k-1}) / p(y_k | y_{1:k-1})   (6a)
p(y_k | y_{1:k-1}) = ∫ p(y_k | x_k) p(x_k | y_{1:k-1}) dx_k   (6b)
p(x_{k+1} | y_{1:k}) = ∫ p(x_{k+1} | x_k) p(x_k | y_{1:k}) dx_k.   (6c)

This classical result [35, 36] is the cornerstone in nonlinear Bayesian filtering. The first equation follows directly from Bayes' law, and the other two follow from the law of total probability, using the model (2). The first equation corresponds to a measurement update, the second is a normalization constant, and the third corresponds to a time update.

The posterior distribution is the primary output from a nonlinear filter, from which standard measures such as the minimum mean square (MMS) estimate x̂_k^{MMS} and its covariance P_{k|k}^{MMS} can be extracted and compared with EKF and UKF outputs:

x̂_k^{MMS} = ∫ x_k p(x_k | y_{1:k}) dx_k   (7a)
P_{k|k}^{MMS} = ∫ (x_k - x̂_k^{MMS})(x_k - x̂_k^{MMS})^T p(x_k | y_{1:k}) dx_k.   (7b)

For a linear Gaussian model, the KF recursions in (4) also provide the solution (7) to this Bayesian problem. However, for nonlinear or non-Gaussian models there is in general no finite-dimensional representation of the posterior distributions similar to (x̂_k^{MMS}, P_{k|k}^{MMS}). That is why numerical approximations are needed.

C. The Point Mass Filter

Suppose now we have a deterministic grid {x^i}_{i=1}^N of the state space R^{n_x} over N points, and that at time k, based on the observations y_{1:k-1}, we have computed the relative probabilities (assuming distinct grid points)

w^i_{k|k-1} ∝ p(x_k = x^i | y_{1:k-1}),   (8)

satisfying Σ_{i=1}^N w^i_{k|k-1} = 1 (note that this is a relative normalization with respect to the grid points).
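For reference, the moment approximations (7a)-(7b) take one line each for any weighted set {x^i, w^i}, be it the grid representation (8) or the particles of Section III. A minimal MATLAB sketch, assuming the points are stored as an nx-by-N matrix x and the normalized weights as a 1-by-N row vector w (implicit expansion is used; replace with bsxfun on older MATLAB versions):

% Weighted-sample approximation of the MMS estimate (7a) and covariance (7b).
xMMS = x*w';             % sum_i w^i x^i
d    = x - xMMS;         % deviations from the mean
PMMS = (d.*w)*d';        % sum_i w^i (x^i - xMMS)(x^i - xMMS)'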



The notation x^i_k is introduced here to unify notation with the PF; it means that the state x_k at time k visits the grid point x^i. The prediction density and the first two moments can then be approximated by

p(x_k | y_{1:k-1}) ≈ Σ_{i=1}^N w^i_{k|k-1} δ(x_k - x^i)   (9a)
x̂_{k|k-1} = E(x_k) ≈ Σ_{i=1}^N w^i_{k|k-1} x^i   (9b)
P_{k|k-1} = cov(x_k) ≈ Σ_{i=1}^N w^i_{k|k-1} (x^i - x̂_{k|k-1})(x^i - x̂_{k|k-1})^T.   (9c)

Here, δ(x) denotes the Dirac impulse function. The Bayesian recursion (6) now gives

p(x_k | y_{1:k}) = (1/c_k) Σ_{i=1}^N p(y_k | x^i) w^i_{k|k-1} δ(x_k - x^i)   (10a)
c_k = Σ_{i=1}^N p(y_k | x^i) w^i_{k|k-1}   (10b)
p(x_{k+1} | y_{1:k}) = Σ_{i=1}^N w^i_{k|k} p(x_{k+1} | x^i).   (10c)

Note that the recursion starts with a discrete approximation (9a) and ends in a continuous distribution (10c). Now, to close the recursion, the standard approach is to sample (10c) at the grid points x^i, which computationally can be seen as a multidimensional convolution,

w^i_{k+1|k} = p(x^i_{k+1} | y_{1:k}) = Σ_{j=1}^N w^j_{k|k} p(x^i_{k+1} | x^j_k),   i = 1, 2, ..., N.   (11)

This is the principle of the PMF [9, 10], whose advantage is its simple implementation and tuning (the engineer basically only has to consider the size and resolution of the grid). The curse of dimensionality limits the application of the PMF to small models (n_x less than two or three) for two reasons: the first one is that a grid is an inefficiently sparse representation in higher dimensions, and the second one is that the multidimensional convolution becomes a real bottleneck with quadratic complexity in N. Another practically important but difficult problem is to translate and change the resolution of the grid adaptively.
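To make the PMF recursion (9)-(11) concrete before turning to the PF, here is a minimal MATLAB sketch for a hypothetical scalar random-walk model x_{k+1} = x_k + v_k, y_k = x_k + e_k with Gaussian noise. The grid limits, the noise variances, and the measurement vector y are illustrative assumptions only, and the code uses implicit expansion (R2016b or later); note how the N-by-N transition kernel makes the time update quadratic in N.

% Minimal PMF sketch of (9)-(11) on a fixed scalar grid.
N  = 200; xg = linspace(-10,10,N)';      % deterministic grid {x^i}
Q  = 0.5; R = 0.1;                       % assumed noise variances
w  = exp(-0.5*xg.^2/4); w = w/sum(w);    % w^i_{1|0} from an assumed prior p(x_1) = N(0,4)
T  = exp(-0.5*(xg-xg').^2/Q);            % kernel T(i,j) ~ p(x^i | x^j), N-by-N
T  = T./sum(T,1);                        % normalize each column over the grid
xhat = zeros(1,length(y));
for k = 1:length(y)
    w = w.*exp(-0.5*(y(k)-xg).^2/R);     % measurement update (10a)
    w = w/sum(w);                        % normalization (10b)
    xhat(k) = xg'*w;                     % grid version of the mean (9b)
    w = T*w;                             % time update (10c)-(11), the "convolution"
end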

III. THE PARTICLE FILTER

A. Relation to the Point Mass Filter

The PF has much in common with the PMF. Both algorithms approximate the posterior distribution with a discrete density of the form (9a), and they are both based on a direct application of (6), leading to the numerical recursion in (10). However, there are some major differences:

1) The deterministic grid x^i in the PMF is replaced with a dynamic stochastic grid x^i_k in the PF that changes over time. The stochastic grid is a much more efficient representation of the state space than a fixed or adaptive deterministic grid in most cases.

2) The PF aims at estimating the whole trajectory x_{1:k} rather than the current state x_k. That is, the PF generates and evaluates a set {x^i_{1:k}}_{i=1}^N of N different trajectories. This affects (6c) as follows:

p(x^i_{1:k+1} | y_{1:k}) = p(x^i_{k+1} | x^i_{1:k}, y_{1:k}) p(x^i_{1:k} | y_{1:k})   (12)
= w^i_{k|k} p(x^i_{k+1} | x^i_k).   (13)

Comparing this to (10c) and (11), we note that the double sum leading to a quadratic complexity is avoided by this trick. However, this quadratic complexity reappears if one wants to recover the marginal distribution p(x_k | y_{1:k}) from p(x_{1:k} | y_{1:k}); more on this in Section IIIC.

3) The new grid in the PF is obtained by sampling from (10c) rather than reusing the old grid as done in the PMF. The original version of the PF [15] samples from (10c) as it stands, by drawing one sample each from p(x_{k+1} | x^i_k) for i = 1, 2, ..., N. More generally, the concept of importance sampling [37] can be used. The idea is to introduce a proposal density q(x_{k+1} | x_k, y_{k+1}), which is easy to sample from, and rewrite (6c) as

p(x_{k+1} | y_{1:k}) = ∫_{R^{n_x}} p(x_{k+1} | x_k) p(x_k | y_{1:k}) dx_k
= ∫_{R^{n_x}} [p(x_{k+1} | x_k) / q(x_{k+1} | x_k, y_{k+1})] q(x_{k+1} | x_k, y_{k+1}) p(x_k | y_{1:k}) dx_k.   (14)

The trick now is to generate a sample at random, x^i_{k+1} ~ q(x_{k+1} | x^i_k, y_{k+1}), for each particle and then adjust the posterior probability for each particle with the importance weight

w^i_{k+1|k} = w^i_{k|k} p(x^i_{k+1} | x^i_k) / q(x^i_{k+1} | x^i_k, y_{k+1}).   (15)

As indicated, the proposal distribution q(x_{k+1} | x^i_k, y_{k+1}) depends on the last state in the particle trajectory x^i_{1:k}, but also on the next measurement y_{k+1}. The simplest choice of proposal is to use the dynamic model itself, q(x^i_{k+1} | x^i_k, y_{k+1}) = p(x^i_{k+1} | x^i_k), leading to w^i_{k+1|k} = w^i_{k|k}. The choice of proposal and its actual form are discussed more thoroughly in Section V.

4) Resampling is a crucial step in the PF. Without resampling, the PF would break down to a set of independent simulations yielding trajectories x^i_{1:k} with relative probabilities w^i. Since there would then be no feedback mechanism from the observations to control the simulations, they would quite soon diverge. As a result, all relative weights would tend to zero except for one that tends to one. This is called sample depletion, sample degeneracy, or sample impoverishment. Note that a relative weight of one, w^i_{k|k} ≈ 1, is not at all an indicator of how close a trajectory is to the true trajectory, since this is only a relative weight. It merely says that one sequence in the set {x^i_{1:k}}_{i=1}^N is much more likely than all of the other ones. Resampling introduces the required information feedback from the observations, so trajectories that perform well will survive the resampling. There are some degrees of freedom in the choice of resampling strategy, discussed in Section IVA.

B. Algorithm

The PF algorithm is summarized in Algorithm 1. It can be seen as an algorithmic framework from which particular versions of the PF can be defined later on. It should be noted that the most common form of the algorithm combines the weight updates (16a, d) into one equation. Here, we want to stress the relations to the fundamental Bayesian recursion by keeping the structure of a measurement update (6a)-(10a)-(16a), normalization (6b)-(10b)-(16b), and time update (6c)-(10c)-(16c, d).

ALGORITHM 1 Particle Filter. Choose a proposal distribution q(x_{k+1} | x_{1:k}, y_{k+1}), a resampling strategy, and the number of particles N.
Initialization: Generate x^i_1 ~ p_{x_1}, i = 1, ..., N, and let w^i_{1|0} = 1/N.
Iteration: For k = 1, 2, ....
1) Measurement update: For i = 1, 2, ..., N,

w^i_{k|k} = (1/c_k) w^i_{k|k-1} p(y_k | x^i_k),   (16a)

where the normalization weight is given by

c_k = Σ_{i=1}^N w^i_{k|k-1} p(y_k | x^i_k).   (16b)

2) Estimation: The filtering density is approximated by p̂(x_{1:k} | y_{1:k}) = Σ_{i=1}^N w^i_{k|k} δ(x_{1:k} - x^i_{1:k}), and the mean (7a) is approximated by x̂_{1:k} ≈ Σ_{i=1}^N w^i_{k|k} x^i_{1:k}.
3) Resampling: Optionally at each time, take N samples with replacement from the set {x^i_{1:k}}_{i=1}^N, where the probability to take sample i is w^i_{k|k}, and let w^i_{k|k} = 1/N.
4) Time update: Generate predictions according to the proposal distribution

x^i_{k+1} ~ q(x_{k+1} | x^i_k, y_{k+1})   (16c)

and compensate for the importance weight

w^i_{k+1|k} = w^i_{k|k} p(x^i_{k+1} | x^i_k) / q(x^i_{k+1} | x^i_k, y_{k+1}).   (16d)
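For concreteness, below is a minimal MATLAB sketch of Algorithm 1 with the prior proposal (Section VB) and resampling at every time step, i.e., the SIR filter. It reuses the hypothetical f, h, Q, R, x0, P0 of the earlier model sketch, assumes an ny-by-T measurement matrix y, and assumes that f and h operate column-wise on the particle matrix; it is a sketch under those assumptions, not a reference implementation.

% Minimal SIR PF (Algorithm 1 with q = p(x_{k+1}|x_k^i) and resampling each step).
N = 1000;
x = x0 + chol(P0)'*randn(nx,N);                 % x_1^i ~ p_{x_1}
w = ones(1,N)/N;
xhat = zeros(nx,size(y,2));
for k = 1:size(y,2)
    % measurement update (16a)-(16b): weight by the likelihood and normalize
    e = y(:,k) - h(x);
    w = w.*exp(-0.5*sum(e.*(R\e),1));
    w = w/sum(w);
    xhat(:,k) = x*w';                           % point estimate, cf. (7a) and step 2
    % resampling (step 3): multinomial draw; O(N^2) memory here, see Section VII for faster code
    ind = sum(bsxfun(@gt, rand(1,N), cumsum(w)'), 1) + 1;
    x = x(:,ind); w = ones(1,N)/N;
    % time update (16c): sampling from the prior makes (16d) leave the weights equal
    x = f(x) + chol(Q)'*randn(nx,N);
end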



C. Prediction, Smoothing, and Marginals

Algorithm 1 outputs an approximation of the trajectory posterior density p(x_{1:k} | y_{1:k}). For a filtering problem, the simplest engineering solution is to just extract the last state x^i_k from the trajectory x^i_{1:k} and use the particle approximation

p̂(x_k | y_{1:k}) = Σ_{i=1}^N w^i_{k|k} δ(x_k - x^i_k).   (17)

Technically this is incorrect, and one may overlook the depletion problem by using this approximation. The problem is that in general all paths x^j_{1:k-1} can lead to the state x^i_k. Note that the marginal distribution is functionally of the same form as (6c). The correct solution, taking into account all paths leading to x^i_k, leads (similar to (11)) to an importance weight

w^i_{k+1|k} = [Σ_{j=1}^N w^j_{k|k} p(x^i_{k+1} | x^j_k)] / q(x^i_{k+1} | x^i_k, y_{k+1})   (18)

that replaces the one in (16d). That is, the marginal PF can be implemented just like Algorithm 1 by replacing the time update of the weights with (18). Note that the complexity increases from O(N) in the PF to O(N^2) in the marginal PF, due to the new importance weight. A method with O(N log(N)) complexity is suggested in [38].

The marginal PF has found very interesting applications in system identification, where a gradient search for unknown parameters in the model is applied [39, 40]. The same parametric approach has been suggested for SLAM in [41] and optimal trajectory planning in [42].

Though the PF appears to solve the smoothing problem for free, the inherent depletion problem of the history complicates the task, since the number of surviving trajectories with a time lag will quickly be depleted. For fixed-lag smoothing p(x_{k-m:k} | y_{1:k}), one can compute the same kind of marginal distributions as for the marginal PF, leading to another compensation factor of the importance weight. However, the complexity will then be O(N^{m+1}). Similar to the KF smoothing problem, the suggested solution [43] is based on first running the PF in the usual way and then applying a backward sweep of a modified PF.

The prediction to get p(x_{1:k+m} | y_{1:k}) can be implemented by repeating the time update in Algorithm 1 m times.

D. Reading Advice

The reader may at this stage continue to Section IX to see MATLAB™ code for some illustrative examples, or to Section X to read about the results and experiences using some other applications, or proceed to the subsequent sections that discuss the following issues:
1) The tuning possibilities and different versions of the basic PF are discussed in Section IV.
2) The choice of proposal distribution is crucial for performance, just as in any classical sampling algorithm [37], and this is discussed in Section V.
3) Performance in terms of convergence of the approximation p̂(x_{1:k} | y_{1:k}) → p(x_{1:k} | y_{1:k}) as N → ∞ and the relation to fundamental performance bounds are discussed in Section VI.
4) The PF is computationally quite complex, and some potential bottlenecks and possible remedies are discussed in Section VII.

IV. TUNING

The number of particles N is the most immediate design parameter in the PF. There are a few other degrees of freedom discussed below. The overall goal is to avoid sample depletion, which means that only a few particles, or even only one, contribute to the state estimate. The choice of proposal distribution is the most intricate one, and it is discussed separately in Section V. How the resampling strategy affects sample depletion is discussed in Section IVA. The effective number of samples in Section IVB is an indicator of sample depletion in that it measures how efficiently the PF is utilizing its particles. It can be used to design proposal distributions, depletion mitigation tricks, and resampling algorithms, and also to choose the number of particles. It can also be used as an online control variable for when to resample. Some dedicated tricks are discussed in Section IVC.

A. Resampling

Without the resampling step, the basic PF would suffer from sample depletion. This means that after a while all particles but a few will have negligible weights. Resampling solves this problem but creates another, in that resampling inevitably destroys information and thus increases uncertainty in the random sampling. It is therefore of interest to start the resampling process only when it is really needed. The following options for when to resample are possible:
1) The standard version of Algorithm 1 is termed sampling importance resampling (SIR), or the bootstrap PF, and is obtained by resampling each time.
2) The alternative is to use importance sampling, in which case resampling is performed only when needed. This is called sampling importance sampling (SIS). Usually, resampling is done when the effective number of samples, as will be defined in the next section, becomes too small.
As an alternative, the resampling step can be replaced with a sampling step from a distribution that is fitted to the particles after both the time and measurement update. The Gaussian PF (GPF) in [44] fits a Gaussian distribution to the particle cloud, after which a new set of particles is generated from this distribution. The Gaussian sum PF (GSPF) in [45] uses a Gaussian sum instead.

B. Effective Number of Samples

An indicator of the degree of depletion is the effective number of samples,¹ defined in terms of the coefficient of variation c_v [19, 46, 47] as

N_eff = N / (1 + c_v²(w^i_{k|k})),   c_v²(w^i_{k|k}) = Var(w^i_{k|k}) / (E(w^i_{k|k}))² = N² Var(w^i_{k|k}).   (19a)

The effective number of samples is thus at its maximum N_eff = N when all weights are equal, w^i_{k|k} = 1/N, and the lowest value it can attain is N_eff = 1, which occurs when w^i_{k|k} = 1 with probability 1/N and w^i_{k|k} = 0 with probability (N - 1)/N.

A logical computable approximation of N_eff is provided by

N̂_eff = 1 / Σ_{i=1}^N (w^i_{k|k})².   (19b)

This approximation shares the property 1 ≤ N̂_eff ≤ N with the definition (19a). The upper bound N̂_eff = N is attained when all particles have the same weight, and the lower bound N̂_eff = 1 when all the probability mass is devoted to a single particle.

The resampling condition in the PF can now be defined as N̂_eff < N_th. The threshold can for instance be chosen as N_th = 2N/3.

¹ Note that the literature often defines the effective number of samples as N/(1 + Var(w^i_{k|k})), which is incorrect.
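In code, the approximation (19b) and the resampling test take one line each. The sketch below assumes that w is the 1-by-N vector of normalized weights and that resample is one of the resampling functions given later in Section VII.

% Effective sample size (19b) used as an online trigger for resampling.
Neff = 1/sum(w.^2);
if Neff < 2*N/3                 % threshold N_th = 2N/3 as suggested above
    [x, w] = resample(x, w);    % e.g., Ripley's method or the sort-based code of Section VII
end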

C. Tricks to Mitigate Sample Depletion

The choice of proposal distribution and resampling strategy are the two available instruments to avoid sample depletion problems. There are also some simple and more practical ad hoc tricks that can be tried, as discussed below.

One important trick is to modify the noise models so that the state noise and/or the measurement noise appear larger in the filter than they really are in the data generating process. This technique is called "jittering" in [48], and a similar approach was introduced in [15] under the name "roughening." Increasing the noise level in the state model (1a) increases the support of the sampled particles, which partly mitigates the depletion problem. Further, increasing the noise level in the observation model (1b) implies that the likelihood decays more slowly for particles that do not fit the observation, and the chance to resample these increases. In [49], the depletion problem is handled by introducing an additional Markov chain Monte Carlo (MCMC) step to separate the samples.

In [15], the so-called prior editing method is discussed. The estimation problem is delayed one time step so that the likelihood can be evaluated at the next time step. The idea is to reject particles with sufficiently small likelihood values, since they are not likely to be resampled. The update step is repeated until a feasible likelihood value is received. The roughening method could also be applied before the update step is invoked. The auxiliary PF [50] is a more formal way to sample such that only particles associated with large predictive likelihoods are considered; see Section VF.

Another technique is regularization. The basic idea is to convolve each particle with a diffusion kernel with a certain bandwidth before resampling. This will prevent multiple copies of a few particles. One may for instance use a Gaussian kernel where the variance acts as the bandwidth. One problem in theory with this approach is that this kernel will increase the variance of the posterior distribution.

V. CHOICE OF PROPOSAL DISTRIBUTION

In this section we focus on the choice of proposal distribution, which influences the depletion problem significantly, and we outline the available options with some comments on when they are suitable.

First note that the most general proposal distribution has the form q(x_{1:k} | y_{1:k}). This means that the whole trajectory should be sampled at each iteration, which is clearly not attractive in real-time applications. Now, the general proposal can be factorized as

q(x_{1:k} | y_{1:k}) = q(x_k | x_{1:k-1}, y_{1:k}) q(x_{1:k-1} | y_{1:k}).   (20)

The most common approximation in applications is to reuse the path x_{1:k-1} and only sample the new state x_k, so the proposal q(x_{1:k} | y_{1:k}) is replaced by q(x_k | x_{1:k-1}, y_{1:k}). The approximate proposal suggests good values of x_k only, not of the trajectory x_{1:k}. For filtering problems this is not an issue, but for smoothing problems the second factor becomes important. Here, the idea of block sampling [51] is quite interesting.

Now, due to the Markov property of the model, the proposal q(x_k | x_{1:k-1}, y_{1:k}) can be written as

q(x_k | x_{1:k-1}, y_{1:k}) = q(x_k | x_{k-1}, y_k).   (21)

The following sections discuss various approximations of this proposal and in particular how the choice of proposal depends on the signal-to-noise ratio (SNR). For linear Gaussian models, the SNR is in loose terms defined as ||Q||/||R||; that is, the SNR is high if the measurement noise is small compared with the signal noise. Here, we define the SNR as the ratio of the maximal value of the likelihood to that of the prior,

SNR ∝ max_{x_k} p(y_k | x_k) / max_{x_k} p(x_k | x_{k-1}).   (22)

For a linear Gaussian model, this gives SNR ∝ sqrt(det(Q)/det(R)).

In this section we use the weight update

w^i_{k|k} ∝ w^i_{k-1|k-1} p(y_k | x^i_k) p(x^i_k | x^i_{k-1}) / q(x^i_k | x^i_{k-1}, y_k),   (23)

combining (16a) and (16b). The SNR thus indicates which factor in the numerator is most likely to change the weights the most.

Besides the options below, which all relate to (21), there are many more ad hoc-based options described in the literature.

A. Optimal Sampling

The conditional distribution includes all information from the previous state and the current observation and should thus be the best proposal to sample from. This conditional pdf can be written as

q(x_k | x^i_{k-1}, y_k) = p(x_k | x^i_{k-1}, y_k) = p(y_k | x_k) p(x_k | x^i_{k-1}) / p(y_k | x^i_{k-1}).   (24a)

This choice gives the proposal weight update

w^i_{k|k} ∝ w^i_{k-1|k-1} p(y_k | x^i_{k-1}).   (24b)

The point is that the weight will be the same whatever sample of x^i_k is generated. Put another way, the variance of the weights is unaffected by the sampling. All other alternatives will add variance to the weights and thus decrease the effective number of samples according to (19a). In the interpretation of keeping the effective number of samples as large as possible, (24a) is the optimal sampling. The drawbacks are as follows:
1) It is generally hard to sample from this proposal distribution.

2) It is generally hard to compute the weight update needed for this proposal distribution, since it would require integrating over the whole state space,

p(y_k | x^i_{k-1}) = ∫ p(y_k | x_k) p(x_k | x^i_{k-1}) dx_k.

One important special case when these steps actually become explicit is a linear and Gaussian measurement relation, which is the subject of Section VE.

B. Prior Sampling

The standard choice in Algorithm 1 is to use the conditional prior of the state vector as proposal distribution,

q(x_k | x^i_{k-1}, y_k) = p(x_k | x^i_{k-1}),   (25a)

where p(x_k | x^i_{k-1}) is referred to as the prior of x_k for each trajectory. This yields

w^i_{k|k} = w^i_{k|k-1} p(y_k | x^i_k) = w^i_{k-1|k-1} p(y_k | x^i_k).   (25b)

This leads to by far the most common version of the PF (SIR), which was originally proposed in [15]. It performs well when the SNR is small, which means that the state prediction provides more information about the next state value than the likelihood function. For medium or high SNR, it is more natural to sample from the likelihood.

C. Likelihood Sampling

Consider first the factorization

p(x_k | x^i_{k-1}, y_k) = p(y_k | x^i_{k-1}, x_k) p(x_k | x^i_{k-1}) / p(y_k | x^i_{k-1})
= p(y_k | x_k) p(x_k | x^i_{k-1}) / p(y_k | x^i_{k-1}).   (26a)

If the likelihood p(y_k | x_k) is much more peaky than the prior, and if it is integrable in x_k [52], then the prior factor is essentially constant over the support of the likelihood and

p(x_k | x^i_{k-1}, y_k) ≈ const · p(y_k | x_k).   (26b)

That is, a suitable proposal for the high SNR case is based on a scaled likelihood function,

q(x_k | x^i_{k-1}, y_k) ∝ p(y_k | x_k),   (26c)

which yields

w^i_{k|k} ∝ w^i_{k-1|k-1} p(x^i_k | x^i_{k-1}).   (26d)

Sampling from the likelihood requires that the likelihood function p(y_k | x_k) is integrable with respect to x_k [52]. This is not the case when n_x > n_y. The interpretation in this case is that for each value of y_k there is an infinite-dimensional manifold of possible x_k to sample from, each one equally likely.

Fig. 2. Illustration of (24a) for a scalar state and observation model. The state dynamics moves the particle to x_k = 1 and adds uncertainty with variance 1, after which the observation y_k = 0.7 = x_k + e_k is taken. The posterior in this high SNR example is essentially equal to the likelihood.

Fig. 3. Illustration of (24a) for a two-dimensional state and scalar observation model. The state dynamics moves the particle to x_k = (1,1)^T and adds correlated noise, after which the observation y_k = 0.7 = (1,0)x_k + e_k is taken. The posterior in this high SNR example corresponds roughly to the likelihood in one dimension (x_1) and the prior in the other dimension (x_2).

D. Illustrations

A simple linear Gaussian model is used to illustrate the choice of proposal as a function of the SNR. Fig. 2 illustrates a high SNR case for a scalar model, where the information in the prior is negligible compared with the peaky likelihood. This means that the optimal proposal essentially becomes a scaled version of the likelihood.
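The scalar example of Fig. 2 is easy to reproduce. The MATLAB sketch below assumes a measurement noise variance of 0.01, which is not stated in the figure caption and is chosen here only to make the likelihood clearly peakier than the prior.

% Recreate the flavor of Fig. 2: prior N(1,1) after the dynamics, observation
% y_k = 0.7 = x_k + e_k with assumed var(e_k) = 0.01, and their (unnormalized) product.
xg    = linspace(-0.5, 2.5, 500);
prior = exp(-0.5*(xg-1).^2);          % p(x_k | x_{k-1}^i), up to a constant
lik   = exp(-0.5*(0.7-xg).^2/0.01);   % p(y_k | x_k), up to a constant
post  = prior.*lik;
plot(xg, prior/max(prior), xg, lik/max(lik), xg, post/max(post));
legend('prior', 'likelihood', 'posterior');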

Fig. 3 illustrates a high SNR case for a two-dimensional state, where the observation dimension is smaller than the state dimension. The optimal proposal can here be interpreted as the intersection of the prior and the likelihood.

E. Optimal Sampling with Linearized Likelihood

The principles illustrated in Figs. 2 and 3 can be used for a linearized model [43], similar to the measurement update in the EKF (4). To simplify the notation somewhat, the process noise in (1a) is assumed additive, x_{k+1} = f(x_k) + v_k. Assuming that the measurement relation (1b) is linearized as (3b) when evaluating (24a), the optimal proposal can be approximated with

q(x_k | x^i_{k-1}, y_k) = N(f(x^i_{k-1}) + K^i_k (y_k - ŷ^i_k), (H^{i,T}_k R_k^{-1} H^i_k + Q_{k-1}^{-1})^†),   (27a)

where † denotes pseudoinverse. The Kalman gain, linearized measurement model, and measurement prediction, respectively, are given by

K^i_k = Q_{k-1} H^{i,T}_k (H^i_k Q_{k-1} H^{i,T}_k + R_k)^{-1}   (27b)
H^i_k = H(f(x^i_{k-1})) = ∂h(x)/∂x evaluated at x = f(x^i_{k-1})   (27c)
ŷ^i_k = h(f(x^i_{k-1})).   (27d)

The weights should thus be multiplied by the following likelihood in the measurement update:

w^i_{k|k} ∝ w^i_{k-1|k-1} N(y_k - ŷ^i_k, H^i_k Q_{k-1} H^{i,T}_k + R_k).   (27e)

The modifications of (27) can be motivated intuitively as follows. At time k-1, each particle corresponds to a state estimate with no uncertainty. The EKF recursions (4) using this initial value give

x_{k-1} ~ N(x^i_{k-1}, 0)  ⇒   (28a)
x̂_{k|k-1} = f(x^i_{k-1})   (28b)
P_{k|k-1} = Q_{k-1}   (28c)
K_k = Q_{k-1} H_k^T (H_k Q_{k-1} H_k^T + R_k)^{-1}   (28d)
x̂_{k|k} = x̂_{k|k-1} + K_k (y_k - h(x̂_{k|k-1}))   (28e)
P_{k|k} = Q_{k-1} - K_k H_k Q_{k-1}.   (28f)

We denote this sampling strategy OPT-EKF. To compare it with the standard SIR algorithm, one can interpret the difference in terms of the time update. The modification of Algorithm 1, assuming a Gaussian distribution for both process and measurement noise, is to make the following substitution in the time update

x^i_{k+1} = f(x^i_k) + v^i_k   (29a)
SIR: v^i_k ~ N(0, Q_k)   (29b)
OPT-EKF: v^i_k ~ N(K^i_{k+1}(y_{k+1} - h(f(x^i_k))), (H^{i,T}_{k+1} R_{k+1}^{-1} H^i_{k+1} + Q_k^{-1})^†)   (29c)

and measurement update

SIR: w^i_{k|k} = w^i_{k-1|k-1} N(y_k - h(x^i_k), R_k)   (29d)
OPT-EKF: w^i_{k|k} = w^i_{k-1|k-1} N(y_k - h(f(x^i_{k-1})), H^i_k Q_{k-1} H^{i,T}_k + R_k),   (29e)

respectively. For OPT-SIR, the SNR definition can be more precisely stated as

(30)

We make the following observations and interpretations on some limiting cases of these algebraic expressions:
1) For small SNR, K^i_k → 0 in (27b) and (H^{i,T}_k R_k^{-1} H^i_k + Q_{k-1}^{-1})^† → Q_{k-1} in (29c), which shows that the sampling (29c) in the OPT-EKF proposal approaches (29b) in SIR as the SNR goes to zero. That is, for low SNR the approximation approaches prior sampling in Section VB.
2) Conversely, for large SNR and assuming H^i_k invertible (implicitly implying n_y ≥ n_x), then (H^{i,T}_k R_k^{-1} H^i_k + Q_{k-1}^{-1})^† ≈ H^{i,-1}_k R_k H^{i,-T}_k in (29c). Here, all information about the state is taken from the measurement, and the model is not used; that is, for high SNR the approximation approaches likelihood sampling in Section VC.
3) The pseudoinverse is used consistently in the notation for the proposal covariance (H^{i,T}_k R_k^{-1} H^i_k + Q_{k-1}^{-1})^†, instead of the inverse, to accommodate the following cases:
a) singular process noise Q_{k-1}, which is the case in most dynamic models including integrated noise,
b) singular measurement noise R_k, to allow fictitious measurements that model state constraints. For instance, a known state constraint corresponds to infinite information in a subspace of the state space, and the corresponding eigenvector of the measurement information H^{i,T}_k R_k^{-1} H^i_k will overwrite the prior information Q_{k-1}.

F. Auxiliary Sampling

The auxiliary sampling proposal resampling filter [50] uses an auxiliary index in the proposal distribution q(x_k, i | y_{1:k}). This leads to an algorithm that first generates a large number M (typically M = 10N) of pairs {x^j_k, i^j}_{j=1}^M.

From Bayes' rule, we have

p(x_k, i | y_{1:k}) ∝ p(y_k | x_k) p(x_k, i | y_{1:k-1})   (31a)
= p(y_k | x_k) p(x_k | i, y_{1:k-1}) p(i | y_{1:k-1})   (31b)
= p(y_k | x_k) p(x_k | x^i_{k-1}) w^i_{k-1|k-1}.   (31c)

This density is implicit in x_k and thus not useful as a proposal density, since it requires x_k to be known. The general idea is to find an approximation of p(y_k | x^i_{k-1}) = ∫ p(y_k | x_k) p(x_k | x^i_{k-1}) dx_k. A simple though useful approximation is to replace x_k with its estimate and thus let p(y_k | x^i_{k-1}) = p(y_k | x̄^i_k) above. This leads to the proposal

q(x_k, i | y_{1:k}) = p(y_k | x̄^i_k) p(x_k | x^i_{k-1}) w^i_{k-1|k-1}.   (31d)

Here, x̄^i_k = E(x_k | x^i_{k-1}) can be the conditional mean, or x̄^i_k ~ p(x_k | x^i_{k-1}) a sample from the prior. The new samples are drawn from the marginalized density

x^j_k ~ p(x_k | y_{1:k}) = Σ_i p(x_k, i | y_{1:k}).   (31e)

To evaluate the proposal weight, we first apply Bayes' rule, which gives

q(x_k, i | y_{1:k}) = q(i | y_{1:k}) q(x_k | i, y_{1:k}).   (31f)

Here, another choice must be made. The latter proposal factor should be defined as

q(x_k | i, y_{1:k}) = p(x_k | x^i_{k-1}).   (31g)

Then, this factor cancels out when forming

q(i | y_{1:k}) ∝ p(y_k | x̄^i_k) w^i_{k-1|k-1}.   (31h)

The new weights are thus given by

w^j_{k|k} = w^{i^j}_{k-1|k-1} p(y_k | x^j_k) p(x^j_k | x^{i^j}_{k-1}) / q(x^j_k, i^j | y_{1:k}).   (31i)

Note that this proposal distribution is a product of the prior and the likelihood. The likelihood has the ability to punish samples x̄^i_k that give a poor match to the most current observation, unlike SIR and SIS, where such samples are drawn and then immediately rejected. There is a link between the auxiliary PF and the standard SIR, as pointed out in [53], which is useful for understanding its theoretical properties.

VI. THEORETICAL PERFORMANCE

The key questions here are how well the PF filtering density p̂(x_{1:k} | y_{1:k}) approximates the true posterior p(x_{1:k} | y_{1:k}), and what the fundamental mean square error (MSE) bounds for the true posterior are.

A. Convergence Issues

The convergence properties of the PF are well understood on a theoretical level; see the survey [54] and the book [55]. The key question is how well a function g(x_k) of the state can be approximated by the PF estimate ĝ(x_k) compared with the conditional expectation E(g(x_k)), where

E(g(x_k)) = ∫ g(x_k) p(x_{1:k} | y_{1:k}) dx_{1:k}   (32)
ĝ(x_k) = Σ_{i=1}^N w^i_{k|k} g(x^i_k).   (33)

In short, the following key results exist.
1) Almost sure weak convergence:

lim_{N→∞} p̂(x_{1:k} | y_{1:k}) = p(x_{1:k} | y_{1:k})   (34)

in the sense that lim_{N→∞} ĝ(x_k) = E(g(x_k)).
2) MSE asymptotic convergence:

E(ĝ(x_k) - E(g(x_k)))² ≤ p_k ||g(x_k)||_sup / N,   (35)

where the supremum norm of g(x_k) is used. As shown in [55] using the Feynman-Kac formula, under certain regularity and mixing conditions, the constant p_k < ∞ does not increase in time. The main condition [54, 55] for this result is that the unnormalized weight function is bounded. Further, most convergence results as surveyed in [56] are restricted to bounded functions of the state g(x), such that |g(x)| < C for some C. The convergence result presented in [57] extends this to unbounded functions, for instance estimation of the state itself, g(x) = x, where the proof requires the additional assumption that the likelihood function is bounded from below by a constant.

In general, the constant p_k grows polynomially in time but does not necessarily depend on the dimension of the state space, at least not explicitly. That is, in theory we can expect the same good performance for high-order state vectors. In practice, the performance degrades quickly with the state dimension due to the curse of dimensionality. However, the PF scales much better with state dimension than the PMF, which is one of the key reasons for the success of the PF.
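The convergence statement (34) is easy to probe empirically on a model where the exact answer is available. The MATLAB sketch below compares the SIR-PF mean with the exact KF mean on an assumed scalar linear Gaussian model (x_{k+1} = 0.9 x_k + v_k, y_k = x_k + e_k, Q = R = 1); all parameter values are illustrative assumptions, and the maximum deviation shrinks (stochastically) as N grows.

% Empirical convergence check of the PF against the exact KF posterior mean.
rng(0); T = 100; N = 1000; a = 0.9; Q = 1; R = 1;
xt = randn;                              % true x_1 ~ N(0,1)
x  = randn(1,N); w = ones(1,N)/N;        % particles drawn from the same prior
m  = 0; P = 1;                           % KF prior for x_1
xPF = zeros(1,T); xKF = zeros(1,T);
for k = 1:T
    yk = xt + sqrt(R)*randn;             % measurement of x_k
    % SIR PF: measurement update, estimate, resampling
    w = w.*exp(-0.5*(yk-x).^2/R); w = w/sum(w);
    xPF(k) = x*w';
    ind = sum(bsxfun(@gt, rand(1,N), cumsum(w)'), 1) + 1;
    % exact KF measurement update
    K = P/(P+R); m = m + K*(yk-m); P = (1-K)*P; xKF(k) = m;
    % time updates of the true state, the particles, and the KF
    xt = a*xt + sqrt(Q)*randn;
    x  = a*x(ind) + sqrt(Q)*randn(1,N); w = ones(1,N)/N;
    m  = a*m; P = a^2*P + Q;
end
max(abs(xPF-xKF))                        % small, and decreases with increasing N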

B. Nonlinear Filtering Performance Bound

Besides the performance bound of a specific algorithm as discussed in the previous section, there are more fundamental estimation bounds for nonlinear filtering that depend only on the model and not on the applied algorithm. The Cramer-Rao lower bound (CRLB) provides such a performance bound for any unbiased estimator x̂_{k|k},

cov(x̂_{k|k}) ≥ P^{CRLB}_{k|k}.   (36)

The most useful version of the CRLB is computed recursively by a Riccati equation, which has the same functional form as the KF in (4), evaluated at the true trajectory x^o_{1:k}:

P^{CRLB}_{k|k} = P^{CRLB}_{k|k-1} - P^{CRLB}_{k|k-1} H^T(x^o_k)(H(x^o_k) P^{CRLB}_{k|k-1} H^T(x^o_k) + R_k)^{-1} H(x^o_k) P^{CRLB}_{k|k-1}   (37a)
P^{CRLB}_{k+1|k} = F(x^o_k) P^{CRLB}_{k|k} F^T(x^o_k) + G(x^o_k) Q_k G^T(x^o_k).   (37b)

The following remarks summarize the CRLB theory with respect to the PF:
1) For a linear Gaussian model

x_{k+1} = F_k x_k + G_k v_k,   v_k ~ N(0, Q_k)   (38a)
y_k = H_k x_k + e_k,   e_k ~ N(0, R_k)   (38b)

the KF covariance P_{k|k} coincides with P^{CRLB}_{k|k}. That is, the CRLB bound is attainable in the linear Gaussian case.
2) In the linear non-Gaussian case, the covariances Q_k, R_k, and P_0 are replaced with the inverse intrinsic accuracies I^{-1}_{v_k}, I^{-1}_{e_k}, and I^{-1}_{x_1}, respectively. Intrinsic accuracy is defined as the Fisher information with respect to the location parameter, and the inverse intrinsic accuracy is always smaller than the covariance. As a consequence of this, the CRLB is always smaller for non-Gaussian noise than for Gaussian noise with the same covariance. See [58] for the details.
3) The parametric CRLB is a function of the true state trajectory x^o_{1:k} and can thus be computed only in simulations or when ground truth is available from a reference system.
4) The posterior CRLB is the parametric CRLB averaged over all possible trajectories, P^{postCRLB}_{k|k} = E(P^{parCRLB}_{k|k}). The expectation makes its computation quite complex in general.
5) In the linear Gaussian case, the parametric and posterior bounds coincide.
6) The covariance of the state estimate from the PF is bounded by the CRLB. The CRLB theory also says that the PF estimate attains the CRLB bound asymptotically in both the number of particles and the information in the model (basically the SNR).
Consult [59] for details on these issues.

VII. COMPLEXITY BOTTLENECKS

It is instructive and recommended to generate a profile report from an implementation of the PF. Quite often, unexpected bottlenecks are discovered that can be improved with a little extra work.

A. Resampling

One real bottleneck is the resampling step. This crucial step has to be performed at least regularly when N̂_eff becomes too small.

The resampling can be efficiently implemented using a classical algorithm for sampling N ordered independent identically distributed variables according to [60], commonly referred to as Ripley's method:

function [x,w]=resample(x,w)
% Multinomial sampling with Ripley's method
N=length(w);                            % number of particles (x is N-by-nx)
u=cumprod(rand(1,N).^(1./[N:-1:1]));    % ordered uniform numbers, descending
u=fliplr(u);                            % make them ascending
wc=cumsum(w);
k=1;
for i=1:N
  while(wc(k)<u(i))
    k=k+1;
  end
  ind(i)=k;
end
x=x(ind,:);
w=ones(1,N)/N;

The complexity of this algorithm is linear in the number of particles N, which cannot be beaten if the implementation is done at a sufficiently low level. For this reason this is the most frequently suggested algorithm also in the PF literature. However, in engineering programming languages such as MATLAB™, vectorized computations are often an order of magnitude faster than code based on "for" and "while" loops.

The following code also implements the resampling needed in the PF by completely avoiding loops:

function [x,w]=resample(x,w)
% Multinomial sampling with sort
N=length(w);                            % number of particles (x is N-by-nx)
u=rand(N,1);
wc=cumsum(w(:));                        % cumulative weights as a column
wc=wc/wc(N);
[dum,ind1]=sort([u;wc]);
ind2=find(ind1<=N);
ind=ind2-(0:N-1)';
x=x(ind,:);
w=ones(1,N)/N;

This implementation relies on the efficient implementation of sort. Note that sorting is of complexity N log2(N) for low-level implementations, so in theory it should not be an alternative to Ripley's method for sufficiently large N. However, as Fig. 4 illustrates, the sort algorithm is a factor of five faster for one instance of a vector-oriented programming language. Using interpreters with loop optimization reduces this difference, but the sort algorithm is still an alternative.

Fig. 4. Computational complexity in a vectorized language of two different resampling algorithms: Ripley's method and sort (run time versus the number of particles N; the legend also includes the stratified and systematic options).

Note that this code does not use the fact that wc is already ordered. The sorting also gets further simplified if the sequence of uniform numbers is ordered. This is one advantage of systematic or stratified sampling [16], where the random number generation is replaced with one of the following lines:

% Stratified sampling
u=([0:N-1]'+(rand(N,1)))/N;
% Systematic sampling
u=([0:N-1]'+rand(1))/N;

Both the code based on sort and the code based on for and while loops are possible. Another advantage of these options is that the state space is more systematically covered, so there will not be any large uncovered volumes existing at random.

B. Likelihood Evaluation and Iterated Measurement Updates

The likelihood evaluation can be a real bottleneck if not properly implemented. In the case that there are several independent sensors, an iterated measurement update can be performed. Denote the M sensor observations y^j_k, for j = 1, 2, ..., M. Then, independence directly gives

p(y_k | x_k) = Π_{j=1}^M p(y^j_k | x_k).   (39)

This trick is even simpler than the corresponding iterated measurement update in the KF.

However, this iterated update is not necessarily the most efficient implementation. One example is the multivariate Gaussian distribution for independent measurements

y_{k,j} = h_j(x^i_k) + e_{k,j}.   (40)

The likelihood is given by

p(y_k | x^i_k) ∝ exp(-0.5 Σ_{j=1}^M (y_{k,j} - h_j(x^i_k))^T R^{-1}_{k,j} (y_{k,j} - h_j(x^i_k)))   (41a)
= Π_{j=1}^M exp(-0.5 (y_{k,j} - h_j(x^i_k))^T R^{-1}_{k,j} (y_{k,j} - h_j(x^i_k))).   (41b)

The former equation with a sum should be used to avoid extensive calls to the exponential function. Even here, vectorizing the calculations in the sum for all particles in parallel is not trivial.
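As an illustration of this vectorization issue, the following MATLAB sketch evaluates (41a) for all particles at once in a hypothetical setup with M independent scalar sensors. The names hj, Rj, and yk are assumptions made for the sketch: hj is a cell array of function handles that each map the nx-by-N particle matrix to a 1-by-N prediction, Rj holds the corresponding noise variances, and yk(j) is the current measurement of sensor j.

% Vectorized evaluation of the likelihood (41a) for all N particles in parallel.
loglik = zeros(1, size(x,2));
for j = 1:M
    e = yk(j) - hj{j}(x);                 % 1-by-N innovations of sensor j
    loglik = loglik - 0.5*e.^2/Rj(j);     % accumulate the sum in the exponent of (41a)
end
w = w.*exp(loglik - max(loglik));         % a single exp per particle; the shift only
w = w/sum(w);                             % rescales and cancels in the normalization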

C. Time Update Sampling

Generating random numbers from nonstandard proposals may be time consuming. Then, remembering that dithering is often a necessary practical trick to tune the PF, one should investigate proposals, including dithering noise, that are as simple as possible to sample from.

D. Function Evaluations

When all issues above have been dealt with, the only thing that remains is to evaluate the functions f(x, v) and h(x). These functions are evaluated a large number of times, so it is worthwhile to spend time optimizing their implementation. An interesting idea is to implement these in dedicated hardware tailored to the application. This was done using analog hardware in [61] for an arctangent function, which is common in sensor models for bearing measurements.

E. PF versus EKF

The computational steps of the EKF (4) and the SIR-PF (16) are compared with the KF in Table I. The EKF requires only one function evaluation of f(x, v) and h(x) per time step, while the PF requires N evaluations. However, if the gradients are not available analytically in the EKF, then at least another n_x evaluations of both f(x, v) and h(x) are needed. These numbers increase when the step size of the numeric gradients is adaptive. Further, if the process noise is not additive, even more numerical derivatives are needed. However, the PF is still roughly a factor N/n_x more complex.

The most time consuming step in the KF is the Riccati recursion of the matrix P. Here, either the matrix multiplication FP in the time update or the matrix inversion in the measurement update dominates for large enough models. Neither of these is needed in the PF. The time update of the state is the same.

The complexity of a matrix inversion using state-of-the-art algorithms [62] is O(n^2.376). The matrix inversion in the measurement update can be avoided using the iterated measurement update. The condition is that the covariance matrix R_k is (block-)diagonal.

As a first-order approximation for large n_x, the KF is O(n_x^3) from the matrix multiplication FP, while the PF is O(N n_x^2) for a typical dynamic model where all elements of f(x, v) depend on all states, for instance the linear model f(x, v) = Fx + v. Also from this perspective, the PF is a factor N/n_x computationally more demanding than the EKF.

TABLE I
Comparison of EKF in (4) and SIR-PF in (16): Main Computational Steps

Algorithm            Extended Kalman filter                        Particle filter
Time update          F = df(x,v)/dx,  G = df(x,v)/dv               x^i := f(x^i, v^i),  v^i ~ p_v
                     x := f(x, 0)
                     P := F P F^T + G Q G^T
Measurement update   H = dh(x)/dx                                  w^i := w^i p(y | x^i)
                     K = P H^T (H P H^T + R)^{-1}
                     x := x + K(y - h(x))
                     P := P - K H P
Estimation           x̂ = x                                         x̂ = Σ_{i=1}^N w^i x^i
Resampling           -                                             x^i ~ Σ_{j=1}^N w^j δ(x - x^j)

VIII. MARGINALIZED PARTICLE FILTER THEORY

The main purpose of the marginalized PF (MPF) is to keep the state dimension small enough for the PF to be feasible. The resulting filter is called the MPF or the Rao-Blackwellized PF (RBPF), and it has been known for quite some time under different names; see, e.g., [49, 63-68].

The MPF utilizes possible linear Gaussian substructures in the model (1). The state vector is assumed partitioned as x_k = ((x^n_k)^T, (x^l_k)^T)^T, where x^l_k enters both the dynamic model and the observation model linearly. We refer a bit informally to x^l_k as the linear state and to x^n_k as the nonlinear state. The MPF essentially represents x^n_k with particles and applies one KF per particle. The KF provides the conditional distribution for x^l_k conditioned on the trajectory x^n_{1:k} of nonlinear states and the past observations.

A. Model Structure

A rather general model, containing a conditionally linear Gaussian substructure, is given by

x^n_{k+1} = f^n_k(x^n_k) + F^n_k(x^n_k) x^l_k + G^n_k(x^n_k) v^n_k   (42a)
x^l_{k+1} = f^l_k(x^n_k) + F^l_k(x^n_k) x^l_k + G^l_k(x^n_k) v^l_k   (42b)
y_k = h_k(x^n_k) + H_k(x^n_k) x^l_k + e_k.   (42c)

The state vector and Gaussian state noise are partitioned as

x_k = ((x^n_k)^T, (x^l_k)^T)^T,   v_k = ((v^n_k)^T, (v^l_k)^T)^T ~ N(0, Q_k),   Q_k = [Q^n_k, Q^{ln}_k; (Q^{ln}_k)^T, Q^l_k].   (42d)

Furthermore, x^l_0 is assumed Gaussian, x^l_0 ~ N(x_0, P_0). The density of x^n_0 can be arbitrary, but it is assumed known. The underlying purpose of this model structure is that, conditioned on the sequence x^n_{1:k}, (42) is linear in x^l_k with Gaussian prior, process noise, and measurement noise, respectively, so the KF theory applies.

B. Algorithm Overview

The MPF relies on the following key factorization:

p(x^l_k, x^n_{1:k} | y_{1:k}) = p(x^l_k | x^n_{1:k}, y_{1:k}) p(x^n_{1:k} | y_{1:k}).   (43)

These two factors decompose the nonlinear filtering task into two subproblems:
1) A KF operating on the conditionally linear, Gaussian model (42) provides the exact conditional posterior p(x^l_k | x^n_{1:k}, y_{1:k}) = N(x^l_k; x̂^l_{k|k}(x^n_{1:k}), P^l_{k|k}(x^n_{1:k})). Here, (42a) becomes an extra measurement for the KF, with x^n_{k+1} - f^n_k(x^n_k) acting as the observation.
2) A PF estimates the filtering density of the nonlinear states. This involves a nontrivial marginalization step, integrating over the state space of all x^l_k using the law of total probability:

p(x^n_{1:k+1} | y_{1:k}) = p(x^n_{1:k} | y_{1:k}) p(x^n_{k+1} | x^n_{1:k}, y_{1:k})
= p(x^n_{1:k} | y_{1:k}) ∫ p(x^n_{k+1} | x^l_k, x^n_{1:k}, y_{1:k}) p(x^l_k | x^n_{1:k}, y_{1:k}) dx^l_k
= p(x^n_{1:k} | y_{1:k}) ∫ p(x^n_{k+1} | x^l_k, x^n_{1:k}, y_{1:k}) N(x^l_k; x̂^l_{k|k}(x^n_{1:k}), P^l_{k|k}(x^n_{1:k})) dx^l_k.   (44)

The intuitive interpretation of this result is that the linear state estimate acts as an extra state noise in (42a) when performing the PF time update.

The time and measurement updates of the KF and the PF are interleaved, so the timing is important. The information structure in the recursion is described in Algorithm 2. Table II summarizes the information steps in Algorithm 2. Note that the time index appears

66 IEEE A&E SYSTEMS M AG AZINE VOL. 25, NO. 7 JULY 2010 PART 2: TUTORIALS
TABLE II
Summary of the Information Steps in Algorithm 2 for the Marginalized PF Utilizing a Linear Gaussian Substructure

Prior

PF TU P(xf:k I hk) '* P(xf:k+ l I Y l :k )


KF TU p(xi I xf:k,hk) '* p (xi+ 1 I xf,k 'Yl:k )
KF dyn MU P(xi+ l l xf:k , hk ) '* PCxi+ l l xf:k+ l 'Yl:k )
PF MU P(xf:k+ l I Yl:k) '* P(xf:k+ l I hk+ l )
KF obs MU P(xi+ l I xf:k+ l ' hk ) '* P(xi + l I xf,k+ l 'hk+ l )
Posterior P(xi + l ,xf,k + 1 I Yl :k+ I ) = P(xi + l I xf:k+ l 'Yl :k+ I )P(xf,k+ l I Y I :k+ l )

five times in the right hand side expansion of the It is easy to verify that the Ricatti equations in this
prior. The five steps increase each k one at the time case only involve matrices that are the same for all
to finally form the posterior at time k + 1 . trajectories x7 � � . This implies a significant complexity
reduction.
ALGORITHM 2 Marginalized Particle Filter With
One important special case of (42) in practice is
reference to the standard PF in Algorithm 1 and the
a model with linear state equations with a nonlinear
KF; iterate the following steps for each time step: observation which is a function of a (small) part of the
1) PF measurement update and resampling using
state vector
(42c) where xi is interpreted as measurement noise.
2) KF measurement update using (42c) for each (47a)
. ni
partlc Ie x l �k .
3) PF time update using (42a) where xi is (47b)
interpreted as process noise.
4) KF time update using (42b) for each particle (47c)
n,i
x 1:k• For instance, all applications in Section X fall into
5) KF extra measurement update using (42a) for
. ni this category. In this case, step 3 in Algorithm 2
each partlc Ie Xl� k .
disappears.
The posterior distribution for the nonlinear states The MPF appears to add quite a lot of overhead
is given by a discrete particle distribution as usual, computations. It turns out, however, that the MPF
while the posterior for the linear states is given by a is often more efficient. It may seem impossible
Gaussian mixture: to give any general conclusions, so application­
N
dependent simulation studies have to be performed.
I
p(x1:k Y\:k) L wkl k 8 (xl: k - x:�� )

i= 1
(45a) Nevertheless, quite realistic predictions of the
computational complexity can be done with rather
simple calculations, as pointed out in [70] . The result
is that for the case when (46) is satisfied, MPF should
always be more efficient, otherwise the complexities
i=1 are comparable.
(45b)
For a complete derivation, see [67]. As demonstrated
D. Variance Red u cti on
in [69] , standard KF and particle filtering code
can be reused when implementing the MPF. The The MPF reduces the variance of the linear
model (42) can be further generalized by introducing states which is demonstrated below. The law of total
an additional discrete mode parameter, giving a larger variance says that
family of marginalized filters; see [68].
cov(U) = cov(E(U I V» + E(cov(U I V» . (48)
C. Complexity I ssues Letting U = xi and V = X\: k gives the following
decomposition of the variance of the PF:
In general, each KF comes with its own Riccati
equation. However, the Riccati equation is the same if
the following three conditions are satisfied:
cov(xi > = cov(E(x I xl: k »
"-v-"
i + E(cov(xi Ix!:k »
PF

GI:(xf: ) = GI: or Ft (xl: ) = 0 (46a) (49a)


N
Gi (xk) = Gi (46b) = COV (Xi l k (x7��» + L WU 1I k (x7 �� ) . (49b)
Hk (xf:) = Hk• (46c) �
MPF
i=1 '-v--" KF

IEEE A&E SYSTEMS M AGAZINE VOL. 25, NO. 7 JULY 2010 PART 2: TUTORIALS-GUSTAFSSON 67
Covariance for linear slales
Here, we recognize (xi I x�;�) as the Gaussian 1 2r---------�--------�----r===�
distribution, delivered by the KF, conditioned on the - M PF
PF
trajectory �;� . Now, the MPF computes the mean 10 -+- KF
of each trajectory as xil k (x�;� ), and the unconditional
mean estimator is simply the mean of these,
N
Xi1 k =
L w�xil k (X�;�)
;=1
( 5 0)
and its covariance follows from the first term in (49b).
The first term in (49b) corresponds to the spread of
the mean contribution from the Gaussian mixture, and
this is the only uncertainty in the MPF. o �--------��--------�--------�
The variance decomposition shows that the 1� 1� 1� 1�
covariance for the MPF is strictly smaller than the
N

corresponding covariance for the PF. This can also be Fig. 5 . Schematic view of how covariance of linear part of state
seen as a result of Rao-Blackwell' s lemma, see, e.g., vector depends on number of particles for PF and MPF,
[37] , and the marginalization is commonly referred respectively. Gain in MPF is given by KF covariance.

to as Rao-Blackwellization. This result says that the


improvement in the quality of the estimate is given separable nonlinear least squares problem. In fact,
by the term E(cov(xi I x1: k )). Note that when (46) is the special case of a static problem where only (42c)
satisfied, then P�l k = ll ik and thus 2:;:' 1 WkP�l k = lli k ' exists falls into this class of problems. Here, the
That is, the KF covariance ll ik is a good indicator of weighted least squares estimate of xi is first computed
how much that has been gained in using the MPF as a function of x1: k ' which is then backsubstituted
instead of the PF. As a practical rule of thumb, the into the model with its estimation covariance to form
gain in MPF increases as the uncertainty in the linear
state increases in the model. Further discussions
a nonlinear least squares problem in only. x):k
regarding the variance reduction property of the MPF F. I llustrati ng Exa m ple
are provided for instance in [49] .
The variance reduction in the MPF can be used in The aim here is to illustrate how the MPF works
two different ways: using the following nonlinear stochastic system

1) With the same number of particles, the variance xk+l


n _
- xkI xkn +
n
Vk (5 I a)
in the estimates of the linear states can be decreased.
2) With the same performance in terms of variance xk+1
I
= xkI +
Vk I
(S I b)

for the linear states, the number of particles can be


decreased.
Yk = 0 . 2(xk)2 + ek (S I c)

where the noise is assumed white and Gaussian


This is schematically illustrated in Fig. 5, for the according to
case when (46) is satisfied, implying that the same
covariance matrix can be used for all particles . The
two alternatives above are illustrated for the case when
v k = ( VI:)k "V N (( 0O ) ' ( 0. 25 00 ))
v ° 1 -
4
(S I d)

a PF with 10,000particles is first applied and then


ek "V N(O, 1). (5 1 e)
replaced by the MPF.
The initial state Xo is given by
E. M P F Synonyms

The following names have been suggested for the


xo "V N (( ��� ) , ( �6 � )) I
-3 ' (5 1 f)

filter in this section:


This particular model was used i n [7 1 ] , where it
1) MPF as is motivated by the nontrivial illustrated grid-based (point-mass) filters. Obviously,
marginalization step (44). the states can be estimated by applying the standard
2) "Rao-Blackwellized particle filter," as motivated PF to the entire state vector. However, a better
by the variance reduction in (49). solution is to exploit the conditionally linear,
3) "Mixture Kalman filter," as motivated by the Gaussian substructure that is present in (5 1 ) . The
various mixture distributions that appear, for instance nonlinear process xl: is a first-order auto regressive
in (45b). (AR) process, where the linear process xi is the
4) Another logical name would be "separable time-varying parameter. The linear, Gaussian
particle filter" in parallel to the well-established substructure is used by the MPF and the resulting

68 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS
� so

Fig. 6. Estimated filter pdf for system (5 1 ) at time 1 0,


P(X IO I Yl : l o ) using MPF. It is instructive to see that linear state X; o
is estimated by Gaussian densities (from the KF), and position
along the nonlinear state xto is given by a particle (from the PF).

o �------�--�
o w w � � � ro ro 00 W �
filtering density function at time 1 0, P (x l O I Y l : l o) x

before the resampling step is shown in Fig. 6 (for a Pig. 7. Aircraft altitude z(xk) (upper dark line) as a function of
particular realization) . In this example 2000 particles position xk (dots on upper dark line) and nonlinear measurement
relation hex) (lower gray line) for the model in (52). Computed
were used, but only 1 00 of them are plotted in
terrain altitude h(x)) is also marked, and circle is put in all grid
Fig. 6 in order to obtain a clearer illustration of points that give best match to this altitude.
the result. The figure illustrates the fact that the
MPF is a combination of the KF and the PF. The
density functions for the linear states are provided where both the state and the measurement are scalar
by the KFs, which is evident from the fact that valued. This model mimics a navigation problem in
the marginals p(xti I Y l : k ) are given by Gaussian one-dimension, where U k is a measurable velocity,
densities . Furthermore, the nonlinear state estimates vk unmeasurable velocity disturbance, and the
are provided by the PF. Hence, the linear states are observation Yk measures the terrain altitude, which
given by a parametric estimator (the KF), whereas is known in the database h(x) . An illustration from
the nonlinear states are given by a nonparametric a real application is found in Fig. 6. Note that the
estimator (the PF) . In this context the MPF can terrain altitude as a measurement relation is not one
be viewed as a combination of a parametric and a to one, since a given terrain altitude is found at many
nonparametric estimator. different positions. However, the observed terrain
profile will after a short time be unique for the flown
trajectory.
IX. PA RTI C L E F I LT E R CO D E EXAM P L E S
Fig. 7 shows a trajectory, and one realization of the
This section gives concrete MATLABTM-like nonlinear function terrain profile h(x) , generated by
code for a general SIR-PF, and applies it to a fully the code below.
annotated simulation example. Further, object-oriented x=l:100; % Map grid
implementations of nonlinear filters are illustrated on h=20+filter(l,[1 -1.8 0.8 1 1 ,randn(l,10 0» ;
target tracking applications . The classes and examples % Terrain altitude
are available in the Signal and Systems Lab; URL: N=15;
www.control.isy.liu.se/..-Ifredriklsigsyslab. z=100+filter(l,[1 -1. 8 0.81 1 ,randn(N,1» ;
% Measurement input
A. Terrai n- B ased Position i ng u=2*ones(N,1); % State input
xO=20+cumsum(u); % True position
The following scalar state example suits three y=z-interpl(x,h,xO); % Noisefree measurement
purposes. First, it enables intuitive graphical yn=y+l*randn(N, 1); % Noisy measurement
illustrations. Second, it introduces the positioning plot(xO,y,'o-b',x,h,'g',xO, z-y,'go',
applications in the next section. Third, it should 'linewidth',3)
be easy to implement for interested readers for
reproducing the example and extending the code to The horizontal line indicates where the first
other applications. measurement is taken. There are ten different
Consider the model intersections between the terrain profile and this
observation, where the grid point just before each
(S2a) intersection is marked in the figure. This is clearly a
problem where the posterior is multimodal after the
(S2b) first measurement update.

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS-GUSTAFSSON 69
..
21 Time k = 1 ..
21 Time k = 15

�" I •
� 0.4
m 0
� 10

,
JL
20
• l�lITll!
30

,
40 50 60 70
�NWI! •
80 90
I
100
1:;1

'"
• • • '---�. • •
10 20 30 40 50 60 70 80 90 100

i"l n Ilfl1® • lim I


c
'5. 0 . 8
E

m 0.4 m 0. 4


0 0
10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100

fi!

t" 1 Ul D' II[I�II I


"8. 0.8
"

� 0.4 � 0.4
i=
0 i= 0
10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100
Position x Position x

Fig. 8 . First two subplots: approximations o f p(xk I Yl:k ) before and after resampling, respectively. Last subplot: approximations of
p(xk+! I Yl:k ) '

I --Truel
The following code lines define the model (52) as 70
an object structure: 65
, -+- PF
I
I
m.f=inline('x+u','x','u'); 60 I

m.h=inline('z -interp1(x,h,xp)','xp','h',
55 4t I
I
'x','z'); \ I
\ I
m.pv=ndist(0,5); m.pe=ndist(O,l); 2 50
';(
"
m.pO=udist(10,9 0); c
45

The pdf classes ndist and udist with the � 40


methods rand and pdf are assumed to be available. 35
A script that both implements a version of the PF and
also animates all the partial results is given below: 30

25
Np=100; w=ones (Np,1)/Np;
xp=rand(m.pO,Np); % Initialization
for k=l:N;
yp=m.h(xp,h,x, z(k» ; % Measurement pred.
2 4 6
Time
8 10 12 14 16

Fig. 9. True and estimated state as function of time.


w=w.*pdf(m.pe,repmat(y n(k,:),Np,l)-yp);
% L ikelihood Fig. 8(b) illustrates the same thing after the 1 5th
w=w/sum(w); % Normali z ation measurement. The posterior is now more clustered
subplot(3,1,1), stem(xp,Np*w/10) to a unimodal distribution. Fig. 9 shows the position
xhat(k,:)=w(:)'*xp; % Estimation error as a function of time. The break point in
[xp,w]=resample(xp,w); % Resampling performance indicates when the multimodal posterior
subplot(3,1,2), stem(xp,Np*w) distribution becomes unimodal.
v=rand(m.pv,Np); % Random process noise
B. Target Tracki ng
xp=m.f(xp,u(k,:)')+v; % State prediction
subplot(3,1,3 ), stem(xp,Np*w) In an object-oriented implementation, simulation
end studies can be performed quite efficiently. The
following example compares different filters for a
Code examples of the function resample are simple target tracking model:
given in Section VIlA. Fig. 8 shows the posterior
density approximation at two time instants. Fig. 8(a)
shows first the unnormalized weights after the
measurement update, which with this uniform
prior is just the likelihood function P (Y l I xo) =

P(Y l ) ' and then follows the particle distribution (53a)


after resampling (where wi 1 / N) and finally the
=

particles after time update (which is just a translation ek ,...., N(0, 0.01l2)·
with u ) 1 . (53b)

70 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS
The observation model is first linear to be comparable
to the KF that provides the optimal estimate. The
example makes use of two different objects:
1 ) S ignal object where the state x l : k and
observation Y l : k sequences are stored with their
associated uncertainty (covariances Ikx, PI or particle
representation). Plot methods in this class can then
automatically provide confidence bounds. >- 3

2) Model objects for linear and nonlinear models,


with methods implementing simulation and filtering
algorithms.
The purpose of the following example is to illustrate
how little coding is required with this object-oriented
-1
approach. First, the model is loaded from an extensive
example database as a linear state-space model. It � L-____�____�____________�____��____�
-1 -0.5 0.5 1.5
is then converted to the general nonlinear model
structure, which does not make use of the fact that Fig. 10_ Simulated trajectory using constant velocity
the underlying model is linear. two-dimensional motion model with position sensor, where plots
show CRLB (darkest) and estimates from KF, EKF, UKF, and PF,
mss=exlti ( 'cv2d') ;
respectively.
mnl=nl ( mss) ;

Now, the following state trajectories are compared:


1 ) the true state from the simulation.
2) the CRLB computed from the nonlinear model.
3) the KF estimate using the linear model.
4) the EKF using the nonlinear model.
S) the UKF using the nonlinear model.
6) the PF using the nonlinear model.
For all except the first one, a confidence ellipsoid
indicates the position estimation uncertainty.
y=simulate ( mss,lO) ;
xhatl=kalman ( mss,y) ;
xhat2=ekf ( mnl,y) ;
xhat3=ukf ( mnl,y) ; -1 jadar sensor
xhat4=pf ( mnl,y,'Np',lOOO) ; ----�--�--��-
_2 L-
xcrlb=crlb ( mnl,y) ; -I -0.5 0.5 1.5

xplot2 ( xcrlb,xhat4,xhat3,xhat2,xhatl,
Fig. 1 1 . Simulated trajectory using constant velocity
'conf',90)
two-dimensional motion model with radar sensor, where plots
Fig. 10 validates that all algorithms provide show CRLB (darkest) and estimates from EKF (small ellipsoids)
and PF, respectively_
comparable estimates in accordance with the CRLB .
Now, consider the case of a radar sensor that

(X��: -- ) )
However, the performance of all filters is comparable,
provides good angle resolution but poor range. The

(
measurement relation in model (S3b) is changed to and the nonlinear measurement relation does not in
itself motivate computer-intensive algorithms in this
e (2) case.
arctan
Yk =
xk e( l ) + ek
J(x�
l)
_ e( 1 » 2 + (x�2) _ e(2» 2
C. Growth Model

--
The following toy example was used in the
ek ", N(O, diag(O.OOO I , O.3)). (S4) original paper [ 1 S ] :
Fig. 1 1 compares EKF and PF with respect to the Xk xk
xk + l - + 2S 2 + 8 cos(k) + vk '
2
=
CRLB . The PF performs well, where the covariances 1 + xk
fitted to the particles are very similar to the CRLB .
vk ", N(O, 1 0), xo ", N(S , S) (SSa)
The EKF is slightly biased and too optimistic about
the uncertainty, which is a typical behavior when
neglecting higher order terms in the nonlinearities. (SSb)

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2 : TUTORIALS-GUSTAFSSON 71
xl TABLE III
� r-----�----�--' MSE Performance of the Estimates in Fig. 1 2 for the B enchmark
60 Problem in (55)

50 CRLB PF UKF EKF


I
I 8 18 54 132
40 I

model [72]
Xk = (Xk , lk , 'l/Jk ) T (56a)

Uk = (\'k, �k l (56b)
Xk+ ! = Xk + T\'k cos('l/Jk ) (56c)
lk + ! = Xk + T\'k sin( 'l/Jk ) (56d)

5 10 15 20 25 30
'l/Jk+ 1 = 'l/Jk + T'l/Jk (56e)
Time h (xk )
Yk = + ek • (56f)
Fig. 1 2 . Simulated trajectory using model (55), where plots show
CRLB (darkest) and estimates from EKF, PF, and UKF, Here, Xk , lk denote the Cartesian position, 'l/Jk the
respectively. Table III summarizes performance. course or heading, T is the sampling interval, \'k is the
.
sp eed, and 'l/Jk the yaw rate. The inertial signals \'k and
It has since then been used many times in the particle 'l/Jk are considered as inputs to the dynamic model, and
filter literature, and it is often claimed to be a growth are given by onboard sensors. These are different in
model. It is included here just because it has turned each of the four applications, and they are described
into a benchmark problem. The simulation code is in more detail in the subsequent sections. The
m=exnl('pfex'); measurement relation is based on a distance measuring
z =simulat e(m,30); equipment (DME) and a GIS . Both the DME and
zcrlb=crlb(m,z); the GIS are different in the four applications, but the
zekf=ekf(m, z) ; measurement principle is the same. By comparing the
zukf=ukf(m, z) ; measured distance to objects in the GIS , a likelihood
zpf=pf (m, z) ; for each particle can be computed. It should here
xplot(zcrlb,zpf,zekf,zukf,'conf',90,'view', be noted that neither an EKF, UKF, nor KF bank
'cont','conft ype',2) is suited for such problems. The reason is that it is
[mean(zcrlb.Px) norm(z.x-zpf.x) typically not possible to linearize the database other
norm(z.x-zekf.x) norm(z.x-zukf.x)] ; than in a very small neighborhood.
In common for the applications is that they
The last two lines produce the result in Fig. 1 2 and
do not rely on satellite navigation systems, which
Table III, respectively. The conclusion from this
are assumed unavailable or provide insufficient
example is that PF performs much better than the
navigation integrity. First, the inertial inputs, DME
UKF which in turn performs much better than the
and GIS, for the four applications are described.
EKF. Thus, this example illustrates quite nicely the
Conclusions concering the PF from these applications
ranking of the different filters.
are summarized in Section XII. Different ways to
augment the state vector are described for each
X. PART I C L E F I LT E R POS I TI O N I N G A P P L I CAT I O N S application in Section XI. The point is that the
This section is concerned with four positioning dimension of the state vector has to be increased
applications of underwater vessels, surface ships, in order to account for model errors and more
wheeled vehicles (cars), and aircraft, respectively. complicated dynamics. This implies that the PF is
Though these applications are at first glance quite simply not applicable, due to the high dimensional
different, almost the same PF can be used in all of state vector.
them. In fact, successful applications of the PF are The outline follows a bottom-up approach, starting
described in literature which are all based on the with underwater vessels below sea level and ending
same state-space model and similar measurement with fighter aircraft in the air.
equations.
B. Underwater Position i ng u si n g a Topograp h i c Map
A. Model Framework
The goal is to compute the position of a UW
The positioning applications, as well as existing vessel. A sonar is measuring the distance d! to the
applications of fastSLAM, are all based on the sea floor. The depth of the platform itself d2 can be

72 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS
Fig. 1 3 . Left plot is an illustration of UW vessel measuring distance dl to sea bottom, and absolute depth d2 . Sum d = dl + d2 is
compared with a bottom map as illustrated with contours in plot to right. Particle cloud illustrates snapshot of PF from known
validation trajectory in field trial, see [75 ] .

o•
• D•
• •

wi'
0
• -', (1 + 301 1)(

Fig . 1 4 . Rotating radar returns detections of range R at body angle B. Result of one radar revolution is conventionally displayed in
polar coordinates as illustrated. Comparing the (R, e) detections to sea chart as shown to (right), position and course are estimated by
PF. When correctly estimated, radar overlay principle can be used for visual validation as also illustrated in sea chart. PF has to
distinguish radar reflections from shore with clutter and other ships, see [76] . The latter can be used for conventional target tracking
algorithms, and collision avoidance algorithms, as also illustrated to (right), see [77 ] .

computed from pressure sensors or from a sonar C. S u rface Positi o n i n g u s i n g a Sea Chart
directed upwards. By adding these distances, the sea
The same principle as above can of course be
depth at the position Xk , lk is measured. This can be
used also for surface ships, which are constrained
compared to the depth in a dedicated sea chart with
to be on the sea level (d2 0). However, vectorized
=

detailed topographical information, and the likelihood


sea charts (for instance the S-57 standard) contain a
takes the combined effect of errors in the two sensors
commercially available worldwide map.
and the map into account, see [73] . Fig. 1 3 provides The idea is to use the radar as DME and compare
an illustration. the detections with the shore profile, which is known
The speed Vk and yaw rate 'lj;k in (56) are computed from the sea chart conditioned on the position Xk , lk
using simplified dynamic motion models based on the and course 'lj;k (most importantly, the ship orientation,
propeller speed and the rudder angle. It is important but more on this later); see [73] . The likelihood
to note that since the PF does not rely on pure dead function models the radar error, but must also take
reckoning, such models do not have to be very clutter (false detections) and other ships into account.
accurate, see [74] for one simple linear model. An The left hand part of Fig. 1 4 illustrates the
alternative is to use inertial measurement units (IMU) measurements provided by the radar, while the
for measuring and computing speed and yaw rate. right hand part of the same figure shows the radar
Detailed seabed charts are so far proprietary detections from one complete revolution overlayed on
military information, and most applications are the sea chart. The inertial data can be computed from
also military. As an example of civilian use, oil propeller speed and rudder angle using simplified
companies are starting to use unmanned UW vessels dynamical models as above.
for exploring the sea and oil platforms, and in this American and European maritime authorities have
way they are building up their own maps. recently published reports highlighting the need for a

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORlALS-GUSTAFSSON 73
Fig. 1 5 . Left: Example of multimodal posterior represented by number of distinct particle clouds from NIRA Dynamics navigation
system. This is caused by regular road pattern and will be resolved after sufficiently long sequence of turns . Right: PF in embedded
navigation solution runs in real time on pocket PC with serial interface to vehicle CAN data bus, see [80] .

Fig. 1 6. Left figure is an illustration of an aircraft measuring distance h I to ground. Onboard baro-altitude supported INS system
provides absolute altitude over sea level h, and difference h2 = h - h I is compared to a topographical map. Right plot shows a snapshot
of PF particle cloud j ust after aircraft has left sea in upper left comer. There are three distinct modes, where the one corresponding to
the correct position dominates.

backup and support system to satellite navigation to applications use vibrations in wheel speeds and
increase integrity. The reason for this need is accidents vehicle body as a DME. When a rough surface is
and incidents caused by technical problems with the detected, this DME can increase the likelihood for
satellite navigation system and the risk of accidental being outside the road. Likewise, if a forward-looking
or deliberate j amming. The LORAN standard offers camera is present in the vehicle, this can be used to
one such supporting technique based on triangulation compute the likelihood that the front view resembles
to radio beacons, see [78] . The PF solution here a road or if it is rather a nonmapped parking area or
is a promising candidate, since it is, in contrast to smaller private road.
LORAN, not sensitive to j amming nor does it require The system is suitable as a support to satellite
any infrastructure. navigation in urban environments, in parking garages
or tunnels or whenever satellite signals are likely
D. Veh icle Positi o n i ng u s i n g a Road Map to be obstructed. It is also a stand-alone solution to
the navigation problem. Road databases covering
The goal here is to position a car relative to a road complete continents are available from two main
map by comparing the driven trajecto�y to the road vendors (NavTech and TeleAtlas).
network. The speed � and yaw rate '¢k in (56) are
computed from the angular velocities of the nondriven E. A i rcraft Pos ition i ng u s i ng a Topograp h i c M a p
wheels on one axle using rather simple geometrical
relations. Dead reckoning (56) provides a profile that The principal approach here i s quite similar to
fits to the road network. the UW positioning application and extends the
The measurement relation is in its simplest form one-dimensional example in Section IX to two
a binary likelihood which is zero for all positions dimensions.
outside the roads and a non-zero constant otherwise. A high-end IMU is used in an inertial navigation
In this case, the DME is basically the prior that the system (INS) which dead �eckons the sensor data
vehicle is located on a road, and not a conventional to speed � and yaw rate '¢k in (56) with quite high
physical sensor. See [72] , [79] for more details accuracy. Still, absolute position support is needed to
and Fig. 1 5 for an illustration. More sophisticated prevent long-term drifts.

74 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2 : TUTORIALS
VEl. . 11 8 [ ka/b )
B!W(E 1
R�ElR . 0

Parking garage

Fig. 1 7 . Navigation of car in parking garage. Results for MPF when relative wheel radii and gyro offset are added to state vector. Two
trajectories correspond to map-aided system and EKF with same state vector, but where GPS is used as position sensor. Since GPS gets
several drop-outs before parking garage, dead-reckoning trajectory is incorrect; see [8 1 ] .

The DME i s a wide-lobe, downward looking the speed sensed by the log is the speed in water,
radar that measures the distance to the ground. The not the speed over ground. Hence, the local water
absolute altitude is computed using the INS and a current is a parameter to include in the state vector.
supporting barometric pressure sensor. Fig. 1 6 shows Second, the radar is strap down and measures relative
one example just before convergence to a unimodal to body orientation, which is not the same as
filtering density. the course 'ifJk• The difference is the so called crab
Commercial databases of topographic information angle, which depends on currents and wind. This can
are available on land (but not below sea level), with a also be included in the state vector. Further, there is
resolution of 50-200 m. in our demonstrator system [76] an unknown and
time-varying offset in the reported radar angle, which
XI. MARG I N A L I Z E D PART I C L E F I LT E R A P P L I CATI O N S has to be compensated for.

This section continues the applications in


C. Veh icle Positi on i ng
Section X with extended motion models where the
MPF has been applied. The bottleneck of the first generation of vehicle
positioning PF is the assumption that the vehicle must
A. U nderwater Position i ng be located on a road. As previously hinted one could
Navigating an unmanned or manned UW vessel use a small probability in the likelihood function
requires knowledge of the full three-dimensional for being off-road, but there is no real benefit for
position and orientation, not only the projection in this without an accurate dead-reckoning ability, so
a horizontal plane. That is, at least six states are reoccurrence on the road network can be predicted
needed. For control, also the velocity and angular with high reliability.
velocities are needed, which directly implies at least The speed and yaw rate computed from the
a twelve-dimensional state vector. The PF cannot be wheel angular velocity are limited by the insufficient
assumed to perform well in such cases, and MPF is a knowledge of wheel radii. However, the deviation
promising approach [73 ] . between actual and real wheel radii of the two wheels
on one axle can be included in the state vector.
B. Su rface Position i ng
Similarly, with a yaw rate sensor available (standard
component in electronic stability programs (ESP)
There are two bottlenecks in the surface and navigation systems), the yaw rate drift can be
positioning PF that can be mitigated using the MPF. included in the state vector. The point is that these
Both relate to the inertial measurements. First, parameters are accurately estimated when the vehicle

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS-GUSTAFSSON 75
is on the road, and in the off-road mode, improved 2) The aircraft positioning PF was implemented in
dead reckoning can be achieved. Tests in demonstrator ADA and shown to satisfy real-time performance on
vehicles have shown that the exit point from parking the onboard computer in the Swedish fighter Gripen
garages and parking areas are well estimated, and in the year 2000. Real-time performance was reached,
that shorter unmapped roads are not a problem; see despite the facts that a very large number of particles
Fig. 1 7 . were used on a rather old computer.

D. Ai rcraft Positi o n i n g
B. Sam p l i ng Rates
The primary role of the terrain based navigation The DME can in all cases deliver measurements
(TERNAV) module is to support the INS with much faster than the chosen sampling rate. However,
absolute position information. The INS consists faster sampling will introduce an unwanted correlation
of an EKF based on a state vector with over 20 in the observations. This is due to the fact that the
motion states and sensor bias parameters. The databases are quantized, so the platform should
current bottleneck is the interface between TERNAV make a significant move between two measurement
and INS . The reason is that TERNAV outputs a updates.
possibly multimodal position density, while the
INS EKF expects a Gaussian observation. The
C. I m plementatio n
natural idea is to integrate both TERNAV and INS
into one filter. This gives a high-dimensional state Implementing and debugging the PF has not
vector, where one measurement (radar altitude) is been a major issue. On the contrary, students and
very nonlinear. The MPF handles this elegantly, nonexperts have faced fewer problems with the PF
by essentially keeping the EKF from the existing than for similar projects involving the EKF. In many
INS and using the PF only for the radar altitude cases, they obtained deep intuition for including
measurement. nontrivial but ad hoc modifications. There are today
The altitude radar gives a measurement outlier several hardware solutions reported in literature,
when the radar pulse is reflected in trees. Tests where the parallel structure of the PF algorithms
have validated that a Gaussian mixture where can be utilized efficiently. For instance, an FPGA
one mode has a positive mean models the real implementation is reported in [82] , and on a general
measurement error quite well. This Gaussian mixture purpose graphics processing unit (GPGPU) in [83 ] .
distribution can be used in the likelihood computation, Analog hardware can further b e used to speed up
but such a distribution is in this case logically function evaluations [6 1 ] .
modeled by a binary Markov parameter, which is
one in positions over forest and zero otherwise.
D. Dithe ring
In this way, the positive correlation between
outliers is modeled, and a prior from ground-type Both the process noise and measurement
information in the GIS can be incorporated. This noise distributions need some dithering (increased
example motivates the inclusion of discrete states covariance). Dithering the process noise is a
in the model framework. See [67], [68] for the well-known method to mitigate the sample depletion
details. problem [ 1 5 ] . Dithering the measurement noise is
a good way to mitigate the effects of outliers and
XI I . S U MMARY to robustify the PF in general. One simple and still
very effective method to mitigate sample depletion
This section summarizes practical experience from is to introduce a lower bound on the likelihood. This
the applications in Sections X and XI with respect to lower bound was first introduced more or less ad hoc.
the theorectical survey in Sections II and VIII. However, recently this algorithm modification has
been justified more rigorously. In proving that the
A. Real -T i m e Issues PF converges for unbounded functions, like the state
xk itself, it is sufficient to have a lower bound on the
The PF has been applied to real data and
likelihood; see [57] for details.
implemented on hardware targeted for the application
platforms. The sampling rate has been chosen in the
order 1-2 Hz, and there is no problem in achieving E. N u m ber of Particles
real-time performance in any of the applications.
The number of particles is chosen to be the quite
Some remarkable cases follow.
large to achieve good transient behaviour in the
1 ) The vehicle positioning PF was already start-up phase and to increase robustness. However,
implemented on a PDA using 1 5 ,000 particles in it has been concluded that in the normal operational
200 1 ; see [79] . mode the number of particles can be decreased

76 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS
level of integrity. After divergence, the particles do
- 1 200 particles
- - - 2500 particles not reflect the true state distribution and there is no
0.8 . . . . . . . 5000 particles mechanism that automatically stabilizes the PF. Hence,
' - ' - , 1 0000 particles
divergence monitoring has to be performed in parallel
0.6 with the actual PF code, and when divergence is
detected, the PF is reinitialized.
w
� 0.4 One indicator of particle depletion is the effective
n:
number of samples Neff' used in the PF. This number
0.2 monitors the amount of particles that significantly
contribute to the posterior, and it is computed from
the normalized weights. However, the unnormalized
likelihoods are a more logical choice for monitoring.
Fig. 1 8 . RMSE performance for aircraft terrain navigation as Standard hypothesis tests can be applied for testing
function of number of particles. if the particle predictions represent the likelihood
distribution.
Another approach is to use parallel PFs interleaved
substantially (typically a factor of ten). Fig. 1 8 in time. The requirement is that the sensors are
shows experimental results for the terrain navigation faster than the chosen sampling rate in the PF. The
application. The transient improves when going from PFs then use different time delays in the sensor
N = 1 200 to N = 2500, but using more particles give observations.
no noticable improvement after convergence. The reinitialization procedure issued when
A real-time implementation should be designed divergence is detected is quite application dependent.
for the worst case. However, using an adaptive The general idea is to use a very diffuse prior, or to
sampling interval T and number of particles N is one infer external information. For the vehicle positioning
option. The idea is to use a longer sampling interval application in [79] , a cellular phone operator took
and more particles initially, and when the PF has part in the demonstrator, and cell information was
converged to a few distinct modes, T and N can be used as a new prior for the PF in case of occasional
decreased in such a way that the complexity N IT is divergence.
constant.

H. Performance Bou nds


E Choos i n g the Proposa l Dens ity

The standard SIR-PF works fine for an initial For all four GPS-free applications, the positioning
design. However, the maps contain rather detailed performance is in the order of 1 0 m root mean
information about position and can be considered square error (RMSE), which is comparable to GPS
as state constraints in the limit. In such high performance. Further, the performance of the PF has
signal-to-noise applications, the standard proposal been shown to be close to the CRLB for a variety
density used in the SIR-PF is not particularly of examined trajectories. In Fig. 1 9 two examples of
efficient. An alternative, that typically improves the performance evaluations in terms of the RMSE are
performance, is to use the information available in depicted. On the left hand side the position RMSE
the next measurement already in the state prediction and CRLB are shown for the UW application, and
step. Note that the proposal in its most general on the right hand side the horizontal position error is
form includes the next observation. Consider for provided for the aircraft application.
instance positioning based on road maps. In standard
SIR-PF, the next positions are randomized around I. Parti cle F i lter i n E m bedded System s
the predicted position according to the state noise,
which is required to obtain diversity. Almost all of The primary application is to output position
these new particles are outside the road network, and information to the operator. However, in all cases
will not survive the resampling step. Obviously this there have been decision and control applications built
is a waste of particles. By looking at how the roads on the position information, which indicates that the
are located locally around the predicted position, a PF is a powerful software component in embedded
much more clever process noise can be computed, systems as follows.
and the particles explore the road network much more
1) UW positioning: Here, the entire mission
efficiently.
relies on the position, so path planning and trajectory
control are based on the output from the PF. Note
G. Divergence Mon ito r i n g
that there is hardly any alternative below sea level,
Divergence monitoring is fundamental for where no satellites are reachable, and deploying
real-time implementations to achieve the required infrastructure (sonar buoys) is quite expensive.

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS-GUSTAFSSON 77
OO'---�----�--r=�P�F� 6O'-����--��-'==�P�F�
- _ . CRLB - - . CRLB
80
50
70

60 40

Fig. 1 9 . Position RMSE for UW (left) and surface (right) applications compared to CRLB .

2) Surface positioning: Differentiating radar completed a Ph.D. with a focus on particle filtering:
detections from shore, clutter, and other ships is an Niclas Bergman, Rickard Karlsson, Thomas Schon,
essential association task in the PF. It is a natural Gustaf Hendeby, David Tornqvist, and Per-Johan
extension to integrate a collision avoidance system Nordlund. There are also numerous current graduate
in such an application, as illustrated in a sea chart students and post-docs, and more than 50 master
snapshot in Fig. 14. students who have contributed indirectly. This survey
3) Vehicle positioning: The PF position was also is very much influenced by their work.
used in a complete voice-controlled navigation system
with dynamic route optimization; see Fig. 1 5 .
REFERENCES
4) Aircraft navigation: The position from the
PF is primarily used as a supporting sensor in the [I] Kalman, R.
INS, whose position is a refined version of the PF A new approach to linear filtering and prediction
output. problems.
Transactions of Journal Basic Engineering, ASME Series
D, 82 ( 1 960), 35-45 .
J. Margi nal ized Parti cle F i lteri n g
[2] Kailath, T. , Sayed, A., and Hassibi, B.
Linear Estimation (Information and System Sciences
Finally, the MPF offers a scalable extension of Series) .
the PF in all applications surveyed here and many Upper Saddle River, N J : Prentice-Hall , 2000.
others. MPF is applicable for instance in the following [3 ] Smith, L. A. M. G. L., and Schmidt, S. F.
localization, navigation, and tracking problems: Application of statistical filter theory to the optimal
estimation of position and velocity on board a
1) three-dimensional position spaces, circumlunar vehicle.
2) motion models with velocity and acceleration NASA, Technical Report TR R- 1 35 , 1 962.
states, [4] Schmidt, S.
3) augmenting the state vector with unknown Application of state-space methods to navigation
problems.
nuisance parameters as sensor offsets and drifts.
Advances in Control Systems, ( 1 966), 293-340.
The FastSLAM algorithm is state of the art; [5] Julier, S . J., Uhlmann, J. K., and Durrant-Whyte, H . F.
see [24] . This algorithm applies MPF to the A new approach for filtering nonlinear systems.
In Proceedings of the American Control Conference, vol. 3 ,
SLAM problem. FastSLAM has been applied to
1 995, 1 628- 1 632.
applications where thousands of two-dimensional
[6] Julier, S . J., and Uhlmann, J. K.
landmark features are marginalized out from a Unscented filtering and nonlinear estimation.
three dimensional motion state. Further, in [84] Proceedings of IEEE, 92, 3 (Mar. 2004), 40 1 -422.
a double marginalization process was employed [7] Norgaard, M., Poulsen, N., and Ravn, O.
to handle hundreds of landmark features and a New developments in state estimation of nonlinear
24-dimensional state vector for three-dimensional systems.
Automatica, 36 (2000), 1 627- 1 63 8 .
navigation of an unmanned aerial vehicle in an
[8] Arasaratnam, I . , Haykin, S., and Elliot, R .
unknown environment.
Discrete-time nonlinear filtering algorithms using
Gauss-Hermite quadrature.
Proceedings of IEEE, 95 (2007), 953.
ACKNOW L E DG M E N T
[9] Alspach, D . , and Sorenson, H.
Nonlinear Bayesian estimation using Gaussian sum
This survey is the result of various research
approximation.
projects over the last ten years, and the author is IEEE Transactions on Automatic Control, 17 ( 1 972),
greatly indebted to the following persons who have 439-448.

78 IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS
[ 1 0] Kramer, S., and Sorenson, H. [26] Bailey, T., and Durrant-Whyte, H.
Recursive Bayesian estimation using piece-wise constant Simultaneous localization and mapping (SLAM): Part II.
approximations. IEEE Robotics & Automation Magazine, 13, 3 (Sept.
Automatica, 24 ( 1 988), 789-80 1 . 2006), 1 08- 1 1 7 .
[11] Hammersley, J., and Morton, K. [27] Thrun, S., Burgard, w. , and Fox, D.
Poor man ' s Monte Carlo. Probabilistic Robotics.
Journal of the Royal Statistical Society, Series B, 16 Cambridge, MA: MIT Press, 2005 .
( 1 954), 23. [28] Rathi, Y., Vaswani, N., Tannenbaum, A., and Yezzi, A.
Tracking deforming objects using particle filtering for
[ 1 2] Rosenbluth, M., and Rosenbluth, A.
geometric active contours.
Monte Carlo calculation of the average extension of
IEEE Transactions on Pattern Analysis and Machine
molecular chains.
Intelligence, 29, 8 (2007), 1 470-- 1 475.
Journal of Chemical Physics, 23 ( 1 956), 590.
[29] Rathi, Y. , Vaswani, N., and Tannenbaum, A.
[ 1 3] Akashi, H., and Kumamoto, H. A generic framework for tracking using particle filter
Random sampling approach to state estimation in with dynamic shape prior.
switching environment. IEEE Transactions on Image Processing, 16, 5 (2007),
Automation, 1 3, 1 977, 429.
1 370-- 1 382.
[ 1 4] Handshin, J. Monte Carlo techniques for prediction and [30] Lu, W-L., Okuma, K., and Little, J.
filtering of nonlinear stochastic processes. Automatica, 6, Tracking and recognizing actions of multiple hockey
1 970, 555. players using the boosted particle filter.
[ 1 5] Gordon, N., Salmond, D., and Smith, A. Image and Vision Computing, 27 (2009), 1 89-205 .
A novel approach to nonlinear/non-Gaussian Bayesian [3 1 ] Cevher, V., Sankaranarayanan, A . , McClellan, J . , and
state estimation. Chellappa, R.
In lEE Proceedings on Radar and Signal Processing, vol. Target tracking using a joint acoustic video system.
1 40, 1 993, 1 07-1 1 3 . IEEE Transactions on Multimedia, 9, 4 (2007), 7 1 5-727.
[ 1 6] Kitagawa, G. [32] Bar-Shalom, Y., and Fortmann, T.
Monte Carlo filter and smoother for non-Gaussian Tracking and Data Association, vol. 179 (Mathematics in
nonlinear state space models. Science and Engineering Series).
Journal of Computational and Graphical Statistics, S, 1 New York: Academic Press, 1 98 8 .
( 1 996), 1 -25 . [33] Gustafsson, E , and Hendeby, G.
On nonlinear transformations of stochastic variables and
[ 1 7] Isard, M., and Blake, A.
its application to nonlinear filtering.
Condensation-Conditional density propagation for visual
Presented at the IEEE International Conference on
tracking.
Acoustics, Speech, and Signal Processing, Las Vegas, NV,
International Journal of Computer Vision, 29, 1 ( 1 998),
2008.
5-28.
[34] Gustafsson, E
[ 1 8] Doucet, A., de Freitas, N., and Gordon, N., (Eds.) Adaptive Filtering and Change Detection.
Sequential Monte Carlo Methods in Practice. New York: Wiley, 200 1 .
New York: Springer-Verlag, 200 1 . [35] Jazwinsky, A.
[ 1 9] Liu, J., and Chen, R. Stochastic Process and Filtering Theory, vol. 64
Sequential Monte Carlo methods for dynamic systems. (Mathematics in Science and Engineering Series).
Journal of the American Statistical Association, 93 ( 1 998). New York: Academic Press, 1 970.
[20] Arulampalam, S . , Maskell, S . , Gordon, N., and Clapp, T. [36] Van Trees, H.
A tutorial on particle filters for online Detection, Estimation and Modulation Theory.
nonlinear/non-Gaussian Bayesian tracking. New York: Wiley, 1 97 1 .
IEEE Transactions on Signal Processing, 50, 2 (2002), [37] Robert, C. P., and Casella, G.
1 74- 1 8 8 . Monte Carlo Statistical Methods, (Springer Texts in
[2 1 ] Djuric, P., Kotecha, J., Zhang, J., Huang, Y., Ghirmai, T., Statistics Series).
Bugallo, M., and Miguez, J. New York: Springer, 1 999.
Particle filtering. [38] Klaas, M.
IEEE Signal Processing Magazine, 20 (2003), 1 9 . Toward practical n2 Monte Carlo: The marginal particle
filter.
[22] Cappe, 0., Godsill, S., and Moulines, E.
Uncertainty in Artificial Intelligence, (2005).
An overview of existing methods and recent advances in
[39] Poyiadjis, G., Doucet, A., and Singh, S.
sequential Monte Carlo.
Maximum likelihood parameter estimation in general
IEEE Proceedings, 95 (2007), 899.
state-space models using particle methods.
[23] Ristic, B., Arulampalam, S., and Gordon, N. Presented at the Joint Statistical Meeting, Minneapolis,
Beyond the Kalman filter: Particle filters for tracking MN, 2005 .
applications. [40] Poyiadjis, G., Doucet, A., and Singh, S.
London: Artech House, 2004. Maximum likelihood parameter estimation using particle
[24] Montemerlo, M., Thrun, S., Koller, D., and Wegbreit, B . methods.
FastSLAM a factored solution t o the simultaneous Presented at the IEEE Conference on Acoustics, Speech
localization and mapping problem. and Signal Processing, 2006.
Presented at the AAAI National Conference on Artificial [4 1 ] Martinez-Cantin, R., de Freitas, N., and Castellanos, J.
Intelligence, Edmonton, Canada, 2002. Analysis of particle methods for simultaneous robot
[25] Durrant-Whyte, H., and Bailey, T. localization and mapping and a new algorithm:
Simultaneous localization and mapping (SLAM): Part I. Marginal-slam.
IEEE Robotics & Automation Magazine, 13, 2 (June 2006), Presented at the IEEE International Conference on
99- 1 10. Robotics and Automation, Rome, Italy, 2007 .

IEEE A&E SYSTEMS MAGAZINE VOL. 25, NO. 7 JULY 20 1 0 PART 2: TUTORIALS-GUSTAFSSON 79
[42] Sing, S., Kantas, N., Yo, B . , Doucet, A., and Evans, R [58] Hendeby, G.
Simulation-based optimal sensor scheduling with Performance and implementation aspects of nonlinear
application to observer trajectory planning. filtering.
Automatica, 43 (2007), 8 1 7-830. Dissertation No. 1 1 6 1 , Linkoping University, Sweden,
[43] Doucet, A., Godsill, S., and Andrieu, C. 2008.
On sequential simulation-based methods for Bayesian [59] Bergman, A D. N., and Gordon, N.
filtering. Optimal estimation and Cramer-Rao bounds for partial
Statistics and Computing, 10, 3 (2000), 1 97-208 . non-Gaussian state-space model.
[44] Kotecha, J., and Djuric, P. Annals of the Institute of Statistical Mathematics, 52 , 1
Gaussian particle filtering. (200 1 ), 97- 1 12.
IEEE Transactions on Signal Processing, 51 (2003), 2592. [60] Ripley, B .
     Stochastic Simulation.
     Hoboken, NJ: Wiley, 1988.
[45] Kotecha, J., and Djuric, P.
     Gaussian sum particle filtering.
     IEEE Transactions on Signal Processing, 51 (2003), 2602-2612.
[46] Kong, A., Liu, J. S., and Wong, W. H.
     Sequential imputations and Bayesian missing data problems.
     Journal of the American Statistical Association, 89, 425 (1994), 278-288.
[47] Liu, J.
     Metropolized independent sampling with comparison to rejection sampling and importance sampling.
     Statistics and Computing, 6 (1996), 113-119.
[48] Fearnhead, P.
     Sequential Monte Carlo methods in filter theory.
     Ph.D. dissertation, University of Oxford, UK, 1998.
[49] Doucet, A., Gordon, N., and Krishnamurthy, V.
     Particle filters for state estimation of jump Markov linear systems.
     IEEE Transactions on Signal Processing, 49, 3 (2001), 613-624.
[50] Pitt, M., and Shephard, N.
     Filtering via simulation: Auxiliary particle filters.
     Journal of the American Statistical Association, 94, 446 (June 1999), 590-599.
[51] Doucet, A., Briers, M., and Senecal, S.
     Efficient block sampling strategies for sequential Monte Carlo methods.
     Journal of Computational and Graphical Statistics, 15, 3 (2006), 1-19.
[52] Thrun, S., Fox, D., Dellaert, F., and Burgard, W.
     Particle filters for mobile robot localization.
     In A. Doucet, N. de Freitas, and N. Gordon (Eds.), Sequential Monte Carlo Methods in Practice, New York: Springer-Verlag, 2001.
[53] Johansen, A., and Doucet, A.
     A note on auxiliary particle filters.
     Statistics & Probability Letters, 78, 12 (2008), 1498-1504.
[54] Crisan, D., and Doucet, A.
     Convergence of sequential Monte Carlo methods.
     Signal Processing Group, Department of Engineering, University of Cambridge, Technical Report CUED/F-INFENG/TR381, 2000.
[55] Moral, P. D.
     Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications.
     New York: Springer, 2004.
[56] Crisan, D., and Doucet, A.
     A survey of convergence results on particle filtering methods for practitioners.
     IEEE Transactions on Signal Processing, 50, 3 (2002), 736-746.
[57] Hu, X., Schon, T., and Ljung, L.
     A basic convergence result for particle filtering.
     IEEE Transactions on Signal Processing, 56, 4 (Apr. 2008), 1337-1348.
[61] Velmurugan, R., Subramanian, S., Cevher, V., Abramson, D., Odame, K., Gray, J., Lo, H-J., McClellan, M., and Anderson, D.
     On low-power analog implementation of particle filters for target tracking.
     In Proceedings of the European Signal Processing Conference (EUSIPCO), 2006.
[62] Coppersmith, D., and Winograd, S.
     Matrix multiplication via arithmetic progressions.
     Journal of Symbolic Computation, 9 (1990), 251-280.
[63] Doucet, A., Godsill, S. J., and Andrieu, C.
     On sequential Monte Carlo sampling methods for Bayesian filtering.
     Statistics and Computing, 10, 3 (2000), 197-208.
[64] Casella, G., and Robert, C. P.
     Rao-Blackwellisation of sampling schemes.
     Biometrika, 83, 1 (1996), 81-94.
[65] Chen, R., and Liu, J. S.
     Mixture Kalman filters.
     Journal of the Royal Statistical Society, 62, 3 (2000), 493-508.
[66] Andrieu, C., and Doucet, A.
     Particle filtering for partially observed Gaussian state space models.
     Journal of the Royal Statistical Society, 64, 4 (2002), 827-836.
[67] Schon, T., Gustafsson, F., and Nordlund, P.
     Marginalized particle filters for nonlinear state-space models.
     IEEE Transactions on Signal Processing, 53 (2005), 2279-2289.
[68] Nordlund, P-J., and Gustafsson, F.
     Marginalized particle filter for accurate and reliable terrain-aided navigation.
     IEEE Transactions on Aerospace and Electronic Systems, 35, 3 (2008).
[69] Hendeby, G., Karlsson, R., and Gustafsson, F.
     A new formulation of the Rao-Blackwellized particle filter.
     In Proceedings of IEEE Workshop on Statistical Signal Processing, Madison, WI, Aug. 2007.
[70] Karlsson, R., Schon, T., and Gustafsson, F.
     Complexity analysis of the marginalized particle filter.
     IEEE Transactions on Signal Processing, 53 (2005), 4408-4411.
[71] Simandl, M., Kralovec, J., and Soderstrom, T.
     Advanced point-mass method for nonlinear state estimation.
     Automatica, 42, 7 (July 2006), 1133-1145.
[72] Gustafsson, F., Gunnarsson, F., Bergman, N., Forssell, U., Jansson, J., Karlsson, R., and Nordlund, P-J.
     Particle filters for positioning, navigation and tracking.
     IEEE Transactions on Signal Processing, 50, 2 (Feb. 2002), 425-437.

[73] Karlsson, R., and Gustafsson, F.
     Bayesian surface and underwater navigation.
     IEEE Transactions on Signal Processing, 54, 11 (2006), 4204-4213.
[74] Fauske, K., Gustafsson, F., and Herenaes, O.
     Estimation of AUV dynamics for sensor fusion.
     In Proceedings of Fusion 2007, Quebec, Canada, July 2007.
[75] Karlsson, T.
     Terrain aided underwater navigation using Bayesian statistics.
     Department of Electrical Engineering, Linkoping University, S-581 83 Linkoping, Sweden, Master's Thesis LiTH-ISY-EX-3292, 2002.
[76] Dahlin, M., and Mahl, S.
     Radar distance positioning system - with a particle filter approach.
     Department of Electrical Engineering, Linkoping University, Master's Thesis LiTH-ISY-EX-3998.
[77] Ronnebjerg, A.
     A tracking and collision warning system for maritime applications.
     Department of Electrical Engineering, Linkoping University, S-581 83 Linkoping, Sweden, Master's Thesis LiTH-ISY-EX-3709, 2005, in Swedish.
[78] Lo, S., Peterson, B., and Enge, P.
     Loran data modulation: A primer (AESS Tutorial IV).
     IEEE Aerospace and Electronic Systems Magazine, 22 (2007), 31-51.
[79] Forssell, U., Hall, P., Ahlqvist, S., and Gustafsson, F.
     Novel map-aided positioning system.
     In Proceedings of FISITA, no. F02-1131, Helsinki, 2002.
[80] Hall, P.
     A Bayesian approach to map-aided vehicle positioning.
     Department of Electrical Engineering, Linkoping University, S-581 83 Linkoping, Sweden, Master's Thesis LiTH-ISY-EX-3104, 2001, in Swedish.
[81] Kronander, J.
     Robust vehicle positioning: Integration of GPS and motion sensors.
     Department of Electrical Engineering, Linkoping University, S-581 83 Linkoping, Sweden, Master's Thesis LiTH-ISY-EX-3578, 2003.
[82] Athalye, A.
     Design and implementation of reconfigurable hardware for real-time particle filtering.
     Ph.D. dissertation, Stony Brook University, 2007.
[83] Hendeby, G., Hol, J. D., Karlsson, R., and Gustafsson, F.
     A graphics processing unit implementation of the particle filter.
     In Proceedings of the European Signal Processing Conference (EUSIPCO), Poznan, Poland, Sept. 2007.
[84] Karlsson, R., Schon, T., Tornqvist, D., Conte, G., and Gustafsson, F.
     Utilizing model structure for efficient simultaneous localization and mapping for a UAV application.
     In Proceedings of IEEE Aerospace Conference, Big Sky, MT, 2008.

Fredrik Gustafsson received the M.Sc. degree in electrical engineering in 1988 and
the Ph.D. degree in automatic control in 1992, both from Linkoping University.
During 1992-1999 he held various positions in automatic control, and during
1999-2005 he held a professorship in communication systems. He has been a
professor in Sensor Informatics at the Department of Electrical Engineering,
Linkoping University, since 2005. His research interests are in stochastic
signal processing, adaptive filtering, and change detection, with applications to
communication, vehicular, airborne, and audio systems. His work in the sensor
fusion area involves design and implementation of nonlinear filtering algorithms
for localization, navigation, and tracking of all kinds of platforms, including cars,
aircraft, spacecraft, UAVs, surface and underwater vessels, cell phones, and
film cameras for augmented reality. He is a cofounder of the companies NIRA
Dynamics and Softube, which develop signal processing software solutions for the
automotive and music industries, respectively.
He was an associate editor for IEEE Transactions on Signal Processing
2000-2006 and is currently an associate editor for the EURASIP Journal on Applied
Signal Processing and the International Journal of Navigation and Observation. In
2004 he was awarded the Arnberg prize by the Royal Swedish Academy of
Sciences (KVA), and in 2007 he was elected a member of the Royal Swedish
Academy of Engineering Sciences (IVA).
