The Kalman Filter and Related Algorithms: A Literature Review
Corey Montella
Lehigh University
Abstract
1. Introduction
The ultimate goal of algorithms research is to find an optimal solution for a given problem. However, there are few problems in computer science that can be considered completely solved. That is, it is rare that one can show an algorithm is
completely optimal for its problem domain. One of the few areas where a provably
optimal algorithm can be found, however, is in Bayesian state estimation. Here,
the Kalman Filter, an algorithm that propagates a system’s varying quantities over
time, can be shown to be the best algorithm possible for its domain. This paper is
an introduction to the Kalman Filter and several related Bayesian state estimators.
In the rest of this introduction we will introduce Bayesian Filters. In the next section we will describe the Kalman Filter in detail. Then, we will detail the Extended Kalman Filter, and finally the Particle Filter. For each filter, we will provide a sketch of the algorithm, analyze its computational time complexity, and finally sketch a proof of its correctness, or analyze how well it approximates the optimal solution.
The focus of this paper is the Kalman Filter and its related algorithms. These are examples of Bayesian Filters, named after their application of Bayes' law, expressed in Equation 1.
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}    Equation 1
Stated simply, Bayes' law says the probability of A given that B has occurred is equal to the normalized probability of B given that A has occurred, multiplied by the probability of A occurring. Bayes' law is useful when we cannot measure P(A|B) explicitly, but we can measure P(B|A) and P(A). A typical example used to illustrate Bayes' law is estimating the probability of having a disease given a positive test result. In this scenario, the test's sensitivity and false-positive rate, together with the prevalence of the disease, are the observable quantities used to calculate this.
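The arithmetic behind this example can be sketched in a few lines of Python (the prevalence, sensitivity, and false-positive rate below are illustrative numbers, not values for any real test):

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), where P(B) is obtained by
    the law of total probability over A and not-A."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(0.01, 0.99, 0.05)
print(round(posterior, 3))  # prints 0.167: only about a 1-in-6 chance of disease
```

Despite the accurate test, the low prior dominates: most positive results come from the much larger healthy population.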
Bayesian Filters use Bayes' law to estimate an unobservable state of a given system using observable data. They do this by propagating the posterior probability density function of the state using a transition model. For example, in robotics, the system's state is typically the pose of a robot in its environment, or the configuration of joints in an actuator. This is usually denoted s_t. Also, in robotics the observable data is typically a control that tells the robot how to move, and an observation about the environment made with its sensors. These are usually denoted u_t and z_t respectively. The control can be a velocity command given to the motors, or perhaps desired angles given to joints. The observation can be a range and bearing to an obstacle. Finally, using Bayes' law and the Markov assumption (that the current state depends only on the previous state and not any state before it) we can state the Bayesian Filter below, which is derived from Equation 1 [4].
p(s_t \mid z_{1:t}, u_{1:t}) = \eta \, p(z_t \mid s_t) \int p(s_t \mid s_{t-1}, u_t) \, p(s_{t-1} \mid z_{1:t-1}, u_{1:t-1}) \, ds_{t-1}    Equation 2

2. Kalman Filter

The Kalman Filter assumes a linear transition model and a linear observation model, each corrupted by zero mean Gaussian noise:

s_t = A s_{t-1} + B u_t + w_t    Equation 3

z_t = C s_t + v_t    Equation 4
While these assumptions restrict the applicability of the Kalman Filter, we will
show later they also ensure its optimality.
The algorithm is structured in a predictor-corrector format. The general idea is
to project the state forward, using a state transition function. Then this state is corrected by incorporating a measurement of the system's observable quantities. The
algorithm can be divided into two distinct phases: a time update phase and a
measurement update phase.
\bar{P}_t = A P_{t-1} A^T + Q    Equation 5
Here, A is the same matrix used to propagate the state mean, and Q is the covariance of the random Gaussian process noise. Equations 3 and 5 exploit a general property of Gaussians: adding two Gaussians results in a Gaussian, and applying a linear transformation to a Gaussian yields a Gaussian. These properties of Gaussians are crucial to the optimality of the filter. After the time update phase, the original Gaussian characterized by s_{t-1} and P_{t-1} becomes a new Gaussian, now characterized by \bar{s}_t and \bar{P}_t. This concludes the time update phase, and represents the prediction step of the algorithm.
The measurement update phase begins by computing the Kalman Gain:

K_t = \bar{P}_t C^T \big( C \bar{P}_t C^T + R \big)^{-1}    Equation 6

Here, C is the observation matrix and R is the covariance of the measurement noise.
In section 2.3 we will derive this gain, and show it is this that makes the Kalman Filter optimal.
Next, we calculate what is known as the innovation (Equation 7). This is the difference between the expected observation and the actual observation.
\tilde{z}_t = z_t - C \bar{s}_t    Equation 7

The innovation, scaled by the Kalman Gain, corrects the predicted mean, and the covariance is reduced accordingly:

s_t = \bar{s}_t + K_t \tilde{z}_t    Equation 8

P_t = (I - K_t C) \bar{P}_t    Equation 9
Figure 1: (a) The prior distribution in black. (b) The state is propagated forward in blue, adding uncertainty. (c) A measurement is made in red. (d) The Kalman Filter produces the posterior distribution in green. This distribution has less uncertainty than both the prediction and the measurement, as a result of Bayes' Law.
1. \bar{s}_t = A s_{t-1} + B u_t + w_t
2. \bar{P}_t = A P_{t-1} A^T + Q
3. K_t = \bar{P}_t C^T (C \bar{P}_t C^T + R)^{-1}
4. \tilde{z}_t = z_t - C \bar{s}_t
5. s_t = \bar{s}_t + K_t \tilde{z}_t
6. P_t = (I - K_t C) \bar{P}_t
7. return s_t, P_t
Line 1: This line consists of two matrix multiplications and two matrix additions. Here, s_t and u_t are n×1 vectors, where n is the size of the state, A and B are n×n matrices, and w_t is an n×1 vector. Multiplying an m×n and an n×p matrix with the standard algorithm runs in O(mnp), and adding two m×n matrices runs in O(mn). Therefore, the total time complexity of this line is O(2n^2 + 3n).

Line 2: This line consists of two matrix multiplications, a transpose operation, and a matrix addition. The best known algorithm for multiplying two n×n matrices runs in O(n^2.376). The transpose operation runs in O(n^2). So the total complexity of this line is O(2n^2.376 + 2n^2).

Line 3: This line consists of four matrix multiplications, two transpose operations, a matrix addition, and a matrix inversion. The inverse operation runs in O(n^2.376). So the total complexity of this line is O(5n^2.376 + 3n^2).

Line 4: This line consists of a matrix subtraction and a matrix multiplication. The total complexity is O(n^2 + n).

Line 5: This line consists of a matrix addition and a matrix multiplication, so again the complexity is O(n^2 + n).

Line 6: Our final line consists of two matrix multiplications and a matrix addition, so the total complexity is O(2n^2.376 + n^2).

Therefore the total time complexity of a single application of the Kalman Filter is O(9n^2.376 + 10n^2 + 5n) ∈ O(n^2.376).
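As a sketch of how the seven lines above translate into code, here is a minimal NumPy implementation (the function signature is our own choice; the process noise w_t enters through its covariance Q rather than as an explicit sample):

```python
import numpy as np

def kalman_filter(s, P, u, z, A, B, C, Q, R):
    """One predict/correct cycle of the Kalman Filter.
    s, P: previous posterior mean and covariance.
    u, z: current control and observation.
    Q, R: process and measurement noise covariances."""
    # Time update (prediction): lines 1-2
    s_bar = A @ s + B @ u
    P_bar = A @ P @ A.T + Q
    # Measurement update (correction): lines 3-6
    K = P_bar @ C.T @ np.linalg.inv(C @ P_bar @ C.T + R)  # Kalman Gain
    s_new = s_bar + K @ (z - C @ s_bar)                   # gain times innovation
    P_new = (np.eye(len(s)) - K @ C) @ P_bar
    return s_new, P_new
```

Note that the O(n^2.376) bounds above are theoretical; a library routine like np.linalg.inv uses conventional O(n^3) algorithms, which are faster in practice for the small n typical of state estimation.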
2.3. Correctness
Several times in this paper, we made the claim that the Kalman Filter propagates the state s_t in an optimal manner. This is due to the choice of the Kalman Gain, which we will now show minimizes the mean square error in the posterior state estimate. That is, we want our choice of the Kalman Gain to minimize
E\big[ \, \lVert s_t^{*} - s_t \rVert^2 \, \big]    Equation 10

where s_t^{*} denotes the true state and s_t our posterior estimate. This is the trace of the posterior error covariance:

P_t = E\big[ (s_t^{*} - s_t)(s_t^{*} - s_t)^T \big]    Equation 11

Substituting the measurement update (Equations 8 and 9) and expanding gives the posterior covariance as a function of an arbitrary gain K_t:

P_t = (I - K_t C) \bar{P}_t (I - K_t C)^T + K_t R K_t^T    Equation 12
We want to minimize the trace of this, so we take the derivative of the trace and
set it equal to zero.
\frac{\partial \, \mathrm{tr}(P_t)}{\partial K_t} = -2 \big( C \bar{P}_t \big)^T + 2 K_t \big( C \bar{P}_t C^T + R \big) = 0    Equation 13
Solving for K_t, we arrive at Equation 6, the optimal Kalman Gain that minimizes the sum of the square error in the posterior. Therefore we can be sure that if we satisfy all the assumptions of the Kalman Filter, the output will be optimal [4].
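This can also be checked numerically: for any valid \bar{P}_t, C, and R, perturbing the gain of Equation 6 away from its optimal value should only increase the trace of the posterior covariance of Equation 12. A small sketch (the matrices are randomly generated and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Random but valid ingredients: covariances must be symmetric positive definite.
M1 = rng.normal(size=(n, n)); P_bar = M1 @ M1.T + n * np.eye(n)
M2 = rng.normal(size=(n, n)); R = M2 @ M2.T + n * np.eye(n)
C = rng.normal(size=(n, n))

S = C @ P_bar @ C.T + R
K_opt = P_bar @ C.T @ np.linalg.inv(S)  # Equation 6

def posterior_trace(K):
    """tr(P_t) from Equation 12 for an arbitrary gain K."""
    I = np.eye(n)
    return np.trace((I - K @ C) @ P_bar @ (I - K @ C).T + K @ R @ K.T)
```

Because the trace is strictly convex in K (S is positive definite), every perturbation of K_opt yields a strictly larger trace.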
3. Extended Kalman Filter
s_t = g(s_{t-1}, u_t) + w_t    Equation 14

z_t = h(s_t) + v_t    Equation 15
Here, the process update and observation models are characterized by two potentially nonlinear functions g(s_{t-1}, u_t) and h(s_t). Since the Kalman Filter works on only linear inputs, we must linearize these functions to propagate the state covariance matrix forward. We do this by approximating each function as a line tangent to the actual function at the mean value. This line is found by expanding the nonlinear function in a Taylor Series around the mean, and taking the first order approximation. This expansion is expressed as y = f(x + ε) ≈ f(x) + Jε, where J is the Jacobian of f(x). Equation 16 demonstrates how we use the Jacobian to propagate our covariance matrices.
C_y = E\big[ (y - \bar{y})(y - \bar{y})^T \big] = J C_x J^T    Equation 16
Here, C_x is our current covariance; J is the Jacobian of our nonlinear state transition function; ε is zero mean Gaussian noise; and C_y is our transformed covariance matrix. However, this result is only approximate due to our use of the Jacobian. Thus, we need to linearize both Equations 14 and 15 by taking the gradient of each with respect to the state s_t, as in Equations 17 and 18.
G_t = \frac{\partial g(s_{t-1}, u_t)}{\partial s_{t-1}}    Equation 17

H_t = \frac{\partial h(\bar{s}_t)}{\partial \bar{s}_t}    Equation 18
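The quality of the first-order propagation in Equation 16 can be illustrated in one dimension by comparing C_y = J C_x J^T against a Monte Carlo estimate for a hypothetical nonlinear function, here f(x) = x^2:

```python
import numpy as np

mean, var = 3.0, 0.01      # a Gaussian tightly concentrated around x = 3
J = 2.0 * mean             # Jacobian (derivative) of f(x) = x**2 at the mean
cy_linear = J * var * J    # Equation 16 in one dimension: C_y = J C_x J^T

# Monte Carlo: push samples through the actual nonlinear function.
rng = np.random.default_rng(0)
samples = (mean + rng.normal(0.0, np.sqrt(var), 200_000)) ** 2
cy_monte_carlo = samples.var()
```

For this gently curved function the two estimates agree closely; the gap grows as the function becomes more nonlinear over the spread of the distribution, which is exactly the failure mode discussed in section 3.3.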
Thus we now follow the same predictor-corrector form of the original Kalman
Filter in two distinct phases.
\bar{s}_t = g(s_{t-1}, u_t), \qquad \bar{P}_t = G_t P_{t-1} G_t^T + Q    Equation 19
Here, Q is again the covariance of the zero mean Gaussian noise of the process model. These two equations provide us with a fully parameterized Gaussian prior. We now move to the measurement update phase.
\hat{z}_t = h\big( g(s_{t-1}, u_t) \big) = h(\bar{s}_t)    Equation 20
With this last equation, we can estimate the posterior in the standard Kalman
Filter way. The pseudocode for the Extended Kalman Filter is depicted in Figure
3.
1. \bar{s}_t = g(s_{t-1}, u_t)
2. \bar{P}_t = G_t P_{t-1} G_t^T + Q
3. K_t = \bar{P}_t H_t^T (H_t \bar{P}_t H_t^T + R)^{-1}
4. \tilde{z}_t = z_t - h(\bar{s}_t)
5. s_t = \bar{s}_t + K_t \tilde{z}_t
6. P_t = (I - K_t H_t) \bar{P}_t
7. return s_t, P_t
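A minimal NumPy sketch of the pseudocode above, with the transition and observation functions and their Jacobians supplied by the caller (the argument names are our own):

```python
import numpy as np

def extended_kalman_filter(s, P, u, z, g, G, h, H, Q, R):
    """One cycle of the Extended Kalman Filter.
    g(s, u), h(s): nonlinear transition and observation models.
    G(s, u), H(s): functions returning their Jacobians."""
    # Time update: nonlinear mean prediction, linearized covariance
    s_bar = g(s, u)
    G_t = G(s, u)
    P_bar = G_t @ P @ G_t.T + Q
    # Measurement update: gain and innovation use the Jacobian H_t
    H_t = H(s_bar)
    K = P_bar @ H_t.T @ np.linalg.inv(H_t @ P_bar @ H_t.T + R)
    s_new = s_bar + K @ (z - h(s_bar))
    P_new = (np.eye(len(s)) - K @ H_t) @ P_bar
    return s_new, P_new
```

When g and h happen to be linear, G and H return the constant matrices A and C, and the filter reduces exactly to the ordinary Kalman Filter.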
3.3. Accuracy
We showed earlier that the Kalman Filter is optimal, in that its choice of the Kalman Gain minimizes the sum of the square of the error in the posterior state estimate (that is, it minimizes Equation 10). This optimal Kalman Gain was only possible because the transition and observation models were linear in their arguments. With the Extended Kalman Filter, we relaxed this requirement by linearizing nonlinear models and calculating their Jacobians to propagate uncertainty. This means the Kalman Gain in the Extended Kalman Filter is no longer optimal, but an approximation. But exactly how far from optimal is the Extended Kalman Filter? The answer depends on two factors:

First, the accuracy of the filter largely depends on how linear the transition and observation models are around the mean. If these functions are relatively linear, the approximation will be good. However, it is possible that they will be highly nonlinear, and thus the performance of the Extended Kalman Filter will suffer [4].
Second, the modality of the transition and observation models will affect the
Extended Kalman Filter’s performance; if these functions are multimodal, then
this filter can diverge. For this reason, nonparametric filters like the Particle Filter,
which we are about to introduce, are better suited for these types of systems [4].
4. Particle Filter
Until now, we have focused exclusively on parametric Bayesian Filters that manipulate Gaussian distributions, propagating them with a transition model. We now turn to a nonparametric filter with a unique approach to calculating the posterior distribution: the Particle Filter. As its name suggests, it uses a set of hypotheses called particles as guesses for the true configuration of the state. In the following section we present the basic Particle Filter algorithm. This section derives from the seminal 1993 paper by N. Gordon, D. Salmond, and A. Smith titled Novel approach to nonlinear/non-Gaussian Bayesian state estimation [3], where they called the algorithm the "Bootstrap Filter".
Figure 4: The basic cycle of the Particle Filter involves three steps. The first step involves sampling from a proposal distribution. The second step involves applying an importance weight to those samples. The final step is the key step, which resamples the particle set based on the assigned weights.
4.1.1. Sample
The first step in the particle filter is to go through every particle and sample
from a proposal distribution.
s_t^{[m]} \sim p(s_t \mid s_{t-1}^{[m]}, u_t)    Equation 21
Just like the Bayesian Filters we looked at in previous sections, the Particle Filter is a recursive algorithm, so we sample the current state using the previous state. Here, the superscript [m] on both state variables indicates that the state s_t is derived from the same particle in the previous particle set. The same control input u_t as in the previous methods is used to propagate the state forward. Since the transition function is noisy, each particle goes through a different transition, which adds variety to the particle set.
Note that we do not explicitly state how to obtain this sample. This is dependent on the exact system, and it is one of the requirements of the Particle Filter that a proposal distribution be available to sample from. In many applications, this is readily available. If this is not the case, it sometimes suffices to sample from a Gaussian distribution with appropriate mean and covariance.
4.1.2. Weight
Next, each particle is weighted based on how well it matches the posterior distribution. This is expressed in Equation 22.

w_t^{[m]} = p(z_t \mid s_t^{[m]})    Equation 22
This is equivalent to the measurement update phase in the Kalman Filter, where the observation is incorporated into the belief. There are many different methods to weight particles, but most involve estimating the following quotient:

w_t^{[m]} = \frac{p(s_t^{[m]})}{q(s_t^{[m]})}    Equation 23

where q(\cdot) is our proposal distribution and p(\cdot) is our target distribution. When we have completed these two steps, we are finally free to add the new weighted particle to a temporary particle set.
4.1.3. Resample
Also known as importance resampling, the resample step uses the newly generated temporary particle set to generate the final posterior distribution. Just as with weighting the particles, there are many ways to accomplish resampling, but doing it incorrectly can lead to the wrong posterior distribution. Essentially, the resampling step reduces the variance (diversity) of the particle set, which can decrease the accuracy of the posterior approximation. In general, we sample particles with replacement with a frequency proportional to their weight. However, there are myriad variations on this general method. Below are several guidelines to follow when implementing a resampling routine [4].
Do not resample too frequently – resampling too often can lead to a situation where the particle variance is artificially low. A common example used to illustrate this is a stationary robot with no sensing. If this robot uses a particle filter to estimate its position and is allowed to resample the particle set, the particle set will eventually collapse to a single position, since we sample with replacement. Yet a robot that does not sense should never be able to determine its position. Therefore, the most you should resample is once per control and observation. Sometimes it is even beneficial to resample only when the variance in the particle set is too high [4].
Use a low-variance sampler – the simplest implementation draws each particle independently, with replacement, from the weighted particle set. This method will quickly lower the variance in particles; a particle with a very high weight will be sampled many times. A better implementation is dependent on the particles already chosen. One method that does this is known as stratified random sampling, which breaks the particle set into strata and chooses a representative number of particles from each tier. This has the desirable effect of maintaining particle variance in the posterior distribution [4].
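A sketch of stratified resampling along these lines: the unit interval is divided into M equal strata, one uniform draw is made per stratum, and each draw selects a particle through the cumulative weights (the fixed default seed is only for reproducibility):

```python
import numpy as np

def stratified_resample(particles, weights, rng=None):
    """Draw len(particles) new particles, with replacement, with
    probability proportional to weight, using one uniform draw per
    stratum. This keeps more particle diversity than independent draws."""
    rng = np.random.default_rng(0) if rng is None else rng
    M = len(particles)
    positions = (np.arange(M) + rng.random(M)) / M   # one draw in each stratum
    cumulative = np.cumsum(weights) / np.sum(weights)
    indices = np.searchsorted(cumulative, positions)
    return [particles[i] for i in indices]
```

A particle holding 70% of the weight is guaranteed at least two of four slots here, but it can never monopolize all strata the way independent draws occasionally do.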
After the resampling step is complete, the new particle set should match the target distribution. This particle set is then used as input to a subsequent iteration of the particle filter algorithm. Figure 5 presents the pseudocode for the algorithm, while Figure 6 depicts the result of each step graphically.
1. \bar{S}_t = S_t = \emptyset
2. for m = 1 to M do
3.     take particle s_{t-1}^{[m]} from S_{t-1}
4.     sample s_t^{[m]} \sim p(s_t \mid s_{t-1}^{[m]}, u_t)
5.     w_t^{[m]} = p(z_t \mid s_t^{[m]})
6.     add \langle s_t^{[m]}, w_t^{[m]} \rangle to \bar{S}_t
7. end for
8. for m = 1 to M do
9.     draw index i with probability \propto w_t^{[i]}
10.    add s_t^{[i]} to S_t
11. end for
12. return S_t
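One iteration of this cycle can be sketched in Python; the transition sampler and observation likelihood are supplied by the caller, and the simple weighted draw on the resampling line could be replaced by a stratified scheme (all names here are illustrative):

```python
import numpy as np

def bootstrap_filter(particles, u, z, transition_sample, likelihood, rng=None):
    """One sample/weight/resample cycle of the bootstrap filter.
    transition_sample(s, u, rng) draws from p(s_t | s_{t-1}, u_t);
    likelihood(z, s) evaluates p(z_t | s_t) up to a constant."""
    rng = np.random.default_rng(0) if rng is None else rng
    M = len(particles)
    # Sample: propagate each particle through the noisy transition model
    proposed = np.array([transition_sample(s, u, rng) for s in particles])
    # Weight: score each hypothesis against the observation
    weights = np.array([likelihood(z, s) for s in proposed])
    weights = weights / weights.sum()
    # Resample: draw M particles with replacement, proportional to weight
    return proposed[rng.choice(M, size=M, p=weights)]
```

In a toy one-dimensional run, 500 particles spread over [0, 10] together with an observation at 5.0 and a Gaussian likelihood concentrate the set near the true state after a single iteration.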
Figure 6: (a) The blue distribution to the left represents the proposal posterior. The red distribution to the right represents the target posterior. The black lines under the figure represent the particles, which are sampled from the blue distribution. (b) The particle set after weighting. The height of the lines is indicative of the weight of the particles. (c) The particle set after resampling. The particles with low weight were not added to the new set. Here, a single line may represent multiple particles, since we sample with replacement.
4.3. Accuracy
Because of its sample-based nature, the Particle Filter is an approximate algorithm; the posterior it generates is only an estimate of the true distribution. However, unlike the Extended Kalman Filter, this error can be mitigated and the accuracy of the Particle Filter can be arbitrarily improved at the expense of performance.

The most obvious source of inaccuracy in the Particle Filter is the number of particles. As the number of particles approaches infinity, the particle distribution approaches the true posterior distribution. Therefore, if we want to make the filter more accurate, one easy way is simply to increase the particle count. In practice, however, if the state space is small, around 100 particles will be sufficient. Of course, this is a floor, and it rises as the size of the state space increases [4].
Another source of inaccuracy stems from the difference between the proposal posterior distribution and the target posterior distribution. In Figure 6, this difference is exaggerated for illustrative purposes. If the actual distributions differed this much, the filter would very likely diverge, because only a few particles from the proposal distribution would be highly weighted. To compensate for this, we either increase the particle count or use a better proposal distribution.

The proposal comes from Equation 21, which we conditioned only on s_{t-1} and u_t. However, the target posterior we want to find is conditioned on z_t as well (this fact comes from Equation 2). Therefore, if we want a better proposal, we can incorporate the measurement into this distribution.
For the robot localization problem, the state, control, and observation are:

s_t = [\, x \;\; y \;\; \theta \,]^T    Equation 24

u_t = [\, v \;\; \omega \,]^T    Equation 25

z_t = [\, r \;\; \phi \,]^T    Equation 26
Here, x and y are the Cartesian coordinates of the robot, while θ is its orientation; v and ω are respectively the linear and angular velocity commands sent to the robot; and r and φ are respectively the range and bearing to obstacles. We also have a transition model and an observation model for this problem. These correspond to Equations 14 and 15.
g(s_{t-1}, u_t) = \begin{bmatrix} x \\ y \\ \theta \end{bmatrix} + \begin{bmatrix} -\frac{\hat{v}}{\hat{\omega}} \sin\theta + \frac{\hat{v}}{\hat{\omega}} \sin(\theta + \hat{\omega} \Delta t) \\ \frac{\hat{v}}{\hat{\omega}} \cos\theta - \frac{\hat{v}}{\hat{\omega}} \cos(\theta + \hat{\omega} \Delta t) \\ \hat{\omega} \Delta t \end{bmatrix}    Equation 27
h(s_t) = \begin{bmatrix} \sqrt{ (\mu_{j,x} - x)^2 + (\mu_{j,y} - y)^2 } \\ \mathrm{atan2}(\mu_{j,y} - y, \; \mu_{j,x} - x) - \theta \end{bmatrix}    Equation 28
Here, v and ω have been corrupted with noise (denoted \hat{v} and \hat{\omega}), and μ_j is a feature in the map that the robot observes. It is clear from Equations 27 and 28 that this system is nonlinear. Therefore, a Kalman Filter is not applicable to this problem, although it would be appropriate for robot localization in one dimension. Unfortunately, this is the case for many problems in robotics and other fields, but of course we can move forward with the Extended Kalman Filter. To do this, we must calculate Equations 17 and 18 by taking the gradient of Equations 27 and 28.
G_t = \begin{bmatrix} 1 & 0 & \frac{\hat{v}}{\hat{\omega}} \big( -\cos\theta + \cos(\theta + \hat{\omega} \Delta t) \big) \\ 0 & 1 & \frac{\hat{v}}{\hat{\omega}} \big( -\sin\theta + \sin(\theta + \hat{\omega} \Delta t) \big) \\ 0 & 0 & 1 \end{bmatrix}    Equation 29

H_t = \begin{bmatrix} -\frac{dx}{\sqrt{q}} & -\frac{dy}{\sqrt{q}} & 0 \\ \frac{dy}{q} & -\frac{dx}{q} & -1 \end{bmatrix}    Equation 30

where dx is \mu_{j,x} - x, dy is \mu_{j,y} - y, and q = dx^2 + dy^2.
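Equations 27 and 28 translate almost directly into code. A sketch (assuming \hat{\omega} ≠ 0 and a single known landmark; the variable names are ours):

```python
import numpy as np

def motion_model(s, u, dt):
    """Velocity motion model g of Equation 27. s = (x, y, theta),
    u = (v, omega) after noise has been added; assumes omega != 0."""
    x, y, theta = s
    v, w = u
    return np.array([
        x - (v / w) * np.sin(theta) + (v / w) * np.sin(theta + w * dt),
        y + (v / w) * np.cos(theta) - (v / w) * np.cos(theta + w * dt),
        theta + w * dt,
    ])

def observation_model(s, landmark):
    """Range-bearing model h of Equation 28 for a single map feature."""
    x, y, theta = s
    dx, dy = landmark[0] - x, landmark[1] - y
    return np.array([np.hypot(dx, dy),              # range
                     np.arctan2(dy, dx) - theta])   # bearing
```

A robot at the origin facing along the x-axis sees a landmark at (3, 4) at range 5 and bearing atan2(4, 3); differentiating these same functions with respect to the state yields the Jacobians of Equations 29 and 30.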
With Equations 24 through 30, we can now implement both the Extended Kalman Filter and the Particle Filter. Which one we choose depends largely on the environment we want to localize our robot in. For an indoor environment, the Particle Filter is preferable. The state space is smaller, and there are many corners to provide good landmarks. Further, the Particle Filter is able to solve what is known as the "Kidnapped Robot Problem": if a fully localized robot is moved between filter iterations, a Particle Filter augmented with random samples will re-localize the robot. The Extended Kalman Filter has no way of accomplishing this task. Outdoors, the state space is very large, so the complexity
of the Particle Filter is prohibitive. Here, we would rather use the Extended Kalman Filter. If we keep the number of tracked objects small, the computational
complexity of this algorithm should not be overbearing.
5.2. Conclusions
In this paper we introduced three very influential algorithms in the field of Bayesian State Estimation. The first algorithm, the Kalman Filter, is one of the few algorithms researchers can call solved; it solves state estimation for linear Gaussian systems in an optimal manner. For this reason, even though the algorithm is over half a century old, it is still used extensively today.
The second algorithm was an extension of the Kalman Filter. It attempts to relax the requirements of a linear Gaussian system, so that it is applicable to more systems. It does this by expanding the transition and observation models in a Taylor series approximation, and using their Jacobians to propagate covariances. The Extended Kalman Filter is no longer optimal, but the robustness of the original Kalman Filter allows for a great deal of latitude in the degree of nonlinearity of the model functions.
The final algorithm was the Particle Filter, a nonparametric Bayesian State Estimator. This algorithm differs from the previous two in that it places no requirements on the transition and observation models. It manages this because samples are taken from a proposal posterior and then weighted according to observations in order to approximate the true posterior. The beauty of the Particle Filter is found in the resampling stage, where particles are chosen for the posterior according to their respective weights. Several factors affect the performance of the Particle Filter, including variance in the particle set, frequency of resampling, and how well the proposed posterior matches the target posterior. This flexibility comes at the price of high computational complexity in the size of the state space.
While these algorithms are among the most important in the field, they are certainly not the only ones. In fact, there are countless variations of these algorithms, including the Information Filter, the Unscented Kalman Filter, and the Rao-Blackwellized Particle Filter. Each of these has its own take on the basic algorithms we presented here, in order to increase computational efficiency or accuracy. However, we expect to see the algorithms presented here in use for many years to come.
6. References