
A “quick” review of the Error-State Extended Kalman Filter

Recently in my job I had to work on implementing a Kalman Filter. My surprise was that there is an incredible lack of resources explaining in detail how the Kalman Filter (KF) works. Imagine then the lack of resources explaining a more complex KF such as the Error-State Extended Kalman Filter (ES-EKF). In this post I will focus on the ES-EKF and leave the UKF alone for now. One of the only blogs on the linear KF worth reading is kalman filter with images (https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/), which I recommend. Here I will cover in more detail the whole set of linear Kalman filter equations and how to derive them. After that, I will explain how to turn it into an Extended KF (EKF) and then into an Error-State Extended KF (ES-EKF).

Notation
We will use proper Euler angles to denote rotations, that is, α, β, γ. We are only interested in 2D rotations, therefore we will use the z-x’-z’’ representation, in which α represents the yaw (the representation does not matter as long as the first rotation happens about the z axis). The steering angle will be denoted by δ.

Explanation
The Kalman Filter is used to keep track of certain variables and to fuse information coming from sensors such as an Inertial Measurement Unit (IMU), wheel encoders, or any other sensor. It is very common in robotics because it fuses the information according to how certain each measurement is. Therefore we can have several sources of information, some more reliable than others, and a KF takes that into account to keep track of the variables we are interested in.

State
The state $s_t$ we are interested in tracking is composed of the x and y coordinates, the heading of the vehicle or yaw θ, the current velocity v, and the steering angle δ. The tracked orientation is only composed of the yaw θ, since we are only modelling a 2D world and therefore do not care about the roll β or pitch γ. Finally, we add the steering angle δ, which is important for predicting the movement of the car. The state at timestep t is therefore

$$s_t = \begin{bmatrix} x \\ y \\ \theta \\ v \\ \delta \end{bmatrix}$$
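Throughout the post I will sketch the pieces in small numpy snippets. These are my own illustrative examples, not code from any particular library, and every concrete number in them is made up. For instance, the state above can simply be a 5-vector with fixed index conventions:

```python
import numpy as np

# Index constants for the 5-dimensional state [x, y, theta, v, delta].
X, Y, THETA, V, DELTA = range(5)

s_t = np.array([0.0, 0.0, 0.3, 2.0, 0.0])   # example state: at the origin, yaw 0.3 rad, 2 m/s
print(s_t[THETA])                            # -> 0.3
```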

A KF can be divided into two steps: the predict step and the update step. In the predict step, using the tracked information, we predict where the object will move in the next step. In the update step, we update the belief we have about the variables using the external measurements coming from the sensors.

Sensor
Keep in mind that a KF can handle any number of sensors; for now we are going to use a localization measurement coming from a GPS plus a pseudo-gyro. This measurement contains the global coordinates (x, y), which prevent the system from drifting. A system without such global measurements is also called dead reckoning. Dead reckoning, or using a Kalman Filter without a global measurement, is prone to cumulative errors, which means that the state will slowly diverge from the true value.
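As a small illustration (my own sketch, not from the original post), if the GPS only observes the global x and y of the 5-dimensional state, the measurement matrix H that will appear later simply selects those two components:

```python
import numpy as np

# Hypothetical measurement matrix: z = H s picks out [x, y] from [x, y, theta, v, delta].
H = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],   # z[0] = x
    [0.0, 1.0, 0.0, 0.0, 0.0],   # z[1] = y
])

s_t = np.array([2.0, 3.0, 0.1, 5.0, 0.02])
print(H @ s_t)                   # -> [2. 3.], the part of the state the GPS can "see"
```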
Prediction Step
We will track the state as a multivariate Gaussian distribution with mean $\mu_t$ and covariance $P_t$. $\mu_t$ will be the expected value of the state given the information available (i.e. the mean of $s_t$), and $P_t$ is a covariance matrix that expresses how certain we are about our prediction. We will use $\mu_{t-1}$ and $u$ to predict $\mu_t$.
Here $u$ is a control column-vector of any extra information we can use, for example the steering angle if we have access to the steering of the car, or the acceleration if we have access to it. $u$ can be a vector of any size.
We will try to model everything using matrices, but for now we will use scalars. The new value of the state at time t will be
$$x_t = x_{t-1} + v\,\Delta t \cos\theta$$

$$y_t = y_{t-1} + v\,\Delta t \sin\theta$$

$$\theta_t = \theta_{t-1}$$

$$v_t = v_{t-1}$$

$$\delta_t = \delta_{t-1}$$

Here we are making simplifying assumptions about the world. First, that the velocity v and the steering δ of the next step will be the same as before, which is a weak assumption. The strong assumption is that the heading or yaw θ of the car stays the same. Notice that we are not using the steering yet, but we still track it; it will be useful later.
We could incorporate a kinematic model here to make the prediction more robust, but that would add non-linearities (and so far this is a linear KF). For now, let’s work with a simple environment; later on we can make things more interesting.
This prediction can be re-formulated in matrix form as

$$\mu_t = F \mu_{t-1} + B u$$

where $u$ is a zero vector and $B$ is a linear transformation from $u$ into the same form as the state $s$. $F$ would be (F has to be linear so far; in the EKF we will expand that to include non-linearities)
$$F = \begin{bmatrix}
1 & 0 & 0 & \Delta t \cos\theta & 0 \\
0 & 1 & 0 & \Delta t \sin\theta & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$$
This results in the same equations but using matrix notation. Remember that we are modelling $s$ as a multivariate Gaussian distribution and we are keeping track of the mean $\mu$ of the state $s$ and the covariance $P$. Using the equations above we update the mean of the state; now we have to update the covariance of the state.
Every time we predict, we make small errors which add noise and result in a slightly less accurate prediction. The covariance $P$ has to reflect this reduction in certainty. With Gaussian distributions this means the distribution gets slightly flatter (i.e. the covariance “increases”).
In a single-variable Gaussian distribution $y \sim N(\mu, \sigma^2)$, the variance has the property that $\mathrm{var}(ky) = k^2\,\mathrm{var}(y)$, where $k$ is a scalar. In matrix notation that is

$$P_t = F P_{t-1} F^T$$

Now we have to take into account that we are adding $Bu$, where $u$ is the control vector, a Gaussian variable with covariance $Q$. The good thing about Gaussians is that the covariance of a sum of Gaussians is the sum of the covariances (if both random variables are independent). Taking this into account we have

$$P_t = F P_{t-1} F^T + B Q B^T$$
And with this, we have finished predicting the state and updating its covariance.
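As a minimal sketch of how this prediction step could look in numpy (my own illustration; B, u, Q and all the numbers are assumptions, not values from the post):

```python
import numpy as np

def predict(mu, P, dt, B, u, Q):
    """Prediction step: mu_t = F mu_{t-1} + B u,  P_t = F P_{t-1} F^T + B Q B^T."""
    x, y, theta, v, delta = mu
    F = np.array([
        [1, 0, 0, dt * np.cos(theta), 0],
        [0, 1, 0, dt * np.sin(theta), 0],
        [0, 0, 1, 0,                  0],
        [0, 0, 0, 1,                  0],
        [0, 0, 0, 0,                  1],
    ])
    mu_pred = F @ mu + B @ u
    P_pred = F @ P @ F.T + B @ Q @ B.T
    return mu_pred, P_pred

# Made-up numbers just to exercise the function.
mu0 = np.array([0.0, 0.0, 0.3, 2.0, 0.0])       # x, y, yaw, v, steering
P0  = np.eye(5) * 0.1
B   = np.eye(5)                                  # control enters directly in state form
u   = np.zeros(5)                                # no control input, as in the post
Q   = np.eye(5) * 1e-3                           # assumed process-noise covariance
mu1, P1 = predict(mu0, P0, dt=0.1, B=B, u=u, Q=Q)
```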

Update step
In the update step we receive a measurement $z$ coming from a sensor. We use the sensor information to correct/update the belief we have about the state. The measurement is a random variable with covariance $R$. This is where things get interesting: we now have two Gaussian variables, the state best estimate $\mu_t$ and the measurement reading $z$.


The best way to combine two Gaussians is by multiplying them together. If a value has high certainty in both distributions, it will also have high certainty in the product; if it has low certainty in both, the product will be even lower; and if it is high in one and low in the other, the result will lie between the two. So the multiplication of Gaussians merges the information of both distributions, taking into account how certain the values are (their covariances).
The equations derived from multiplying two multivariate Gaussians are similar to the single-variable case. We will derive them here for scalars and then generalize to matrix form.
Let’s suppose we have $x_1 \sim N(\mu_1, \sigma_1^2)$ and $x_2 \sim N(\mu_2, \sigma_2^2)$ (they do not have anything to do with the state or measurement for now). Keep in mind that both $x_1$ and $x_2$ live in the same vector space $x$, therefore

$$p(x_1) = \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \qquad p(x_2) = \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$$

By multiplying them together we obtain

$$p(x_1)\,p(x_2) = \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \cdot \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$$

We also know a very useful property of Gaussians: the product of Gaussians is also a Gaussian distribution. Therefore, to know the result of fusing both Gaussians, we have to write the equation above in Gaussian form.

$$= \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \cdot \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}
= \frac{1}{2\pi\sigma_1\sigma_2} e^{-\left(\frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2}\right)}$$

Because we know the result will be a Gaussian distribution, we do not care about constant factors (e.g. $2\pi\sigma_1^2$); in fact, we only care about the exponent, which we have to transform into something of the form

$$\frac{(x - \text{something})^2}{2\,(\text{something else})}$$

where "something" will be the new mean and "something else" the new variance after the multiplication. Therefore we will ignore all the other terms and focus on the exponent.
$$\frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2} = \frac{\sigma_2^2 (x-\mu_1)^2 + \sigma_1^2 (x-\mu_2)^2}{2\sigma_1^2\sigma_2^2}$$

$$= \frac{\sigma_2^2 x^2 - 2\sigma_2^2\mu_1 x + \sigma_2^2\mu_1^2 + \sigma_1^2 x^2 - 2\sigma_1^2\mu_2 x + \sigma_1^2\mu_2^2}{2\sigma_1^2\sigma_2^2}$$

$$= \frac{x^2(\sigma_1^2+\sigma_2^2) - 2x(\sigma_2^2\mu_1 + \sigma_1^2\mu_2)}{2\sigma_1^2\sigma_2^2} + \frac{\sigma_2^2\mu_1^2 + \sigma_1^2\mu_2^2}{2\sigma_1^2\sigma_2^2}$$

$$= \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(x^2 - 2x\,\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right) + \frac{\sigma_2^2\mu_1^2 + \sigma_1^2\mu_2^2}{2\sigma_1^2\sigma_2^2}$$

The term on the right can be ignored because it is constant and comes out of the exponent. The term in parentheses resembles a perfect square trinomial lacking the last squared term.
$$\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(x^2 - 2x\,\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)$$

$$= \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(x^2 - 2x\,\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2} + \left(\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2 - \left(\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2\right)$$

$$= \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(\left(x - \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2 - \left(\frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2\right)$$

Ignoring the second term because it is also a constant, the final form of the exponent is

$$\frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2}\left(x - \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2 = \frac{\left(x - \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}\right)^2}{\frac{2\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}}$$
This final form does indeed resemble a Gaussian distribution: the new mean is what sits inside the parentheses with $x$, and the new variance is the denominator divided by 2. To simplify things further along the way, we will rewrite it as
$$\mu_{new} = \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2}$$

$$= \mu_1 + \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2}{\sigma_1^2+\sigma_2^2} - \mu_1$$

$$= \mu_1 + \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2 - \mu_1(\sigma_1^2+\sigma_2^2)}{\sigma_1^2+\sigma_2^2}$$

$$= \mu_1 + \frac{\sigma_2^2\mu_1 + \sigma_1^2\mu_2 - \mu_1\sigma_1^2 - \sigma_2^2\mu_1}{\sigma_1^2+\sigma_2^2}$$

$$= \mu_1 + \frac{\sigma_1^2(\mu_2 - \mu_1)}{\sigma_1^2+\sigma_2^2}$$

$$= \mu_1 + K(\mu_2 - \mu_1)$$

where $K = \sigma_1^2/(\sigma_1^2+\sigma_2^2)$. For the variance we have
$$\sigma_{new}^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}$$

$$= \sigma_1^2 + \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2} - \sigma_1^2$$

$$= \sigma_1^2 + \frac{\sigma_1^2\sigma_2^2 - \sigma_1^2(\sigma_1^2+\sigma_2^2)}{\sigma_1^2+\sigma_2^2}$$

$$= \sigma_1^2 + \frac{\sigma_1^2\sigma_2^2 - \sigma_1^2\sigma_2^2 - \sigma_1^4}{\sigma_1^2+\sigma_2^2}$$

$$= \sigma_1^2 - \frac{\sigma_1^4}{\sigma_1^2+\sigma_2^2}$$

$$= \sigma_1^2 - K\sigma_1^2$$
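As a tiny numeric sanity check of this 1D fusion rule (my own sketch; the numbers are arbitrary):

```python
# One-dimensional Gaussian fusion, following the formulas derived above.
def fuse_1d(mu1, var1, mu2, var2):
    K = var1 / (var1 + var2)
    mu_new = mu1 + K * (mu2 - mu1)
    var_new = var1 - K * var1          # equivalently var1 * var2 / (var1 + var2)
    return mu_new, var_new

# Two noisy estimates of the same quantity: the fused mean lands closer to the
# more certain one (smaller variance) and the fused variance shrinks below both.
print(fuse_1d(10.0, 4.0, 12.0, 1.0))   # -> (11.6, 0.8)
```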

Now we need to transform this into matrix notation and substitute the correct variables. $\mu$ and $z$ do not live in the same vector space, so to map the state into the measurement space we use the matrix $H$. Applied in the measurement space, the result reads

$$K = H P_{t-1} H^T (H P_{t-1} H^T + R)^{-1}$$

$$H\mu_t = H\mu_{t-1} + K(z - H\mu_{t-1})$$

$$H P_t H^T = H P_{t-1} H^T - K H P_{t-1} H^T$$

If we take one $H$ out from the left of $K$ we end up with

$$K = P_{t-1} H^T (H P_{t-1} H^T + R)^{-1}$$

$$H\mu_t = H\mu_{t-1} + H K(z - H\mu_{t-1})$$

$$H P_t H^T = H P_{t-1} H^T - H K H P_{t-1} H^T$$

We can pre-multiply the second and third equations by $H^{-1}$, and also post-multiply the third equation by $H^{-T}$. The final result then lives in the state vector space $\mu$ and not in the measurement vector space $H\mu$. The final result for the update step (which corresponds to the combination of two sources of information with different certainty levels) is

$$K = P_{t-1} H^T (H P_{t-1} H^T + R)^{-1}$$

$$\mu_t = \mu_{t-1} + K(z - H\mu_{t-1})$$

$$P_t = P_{t-1} - K H P_{t-1} = (I - KH)P_{t-1}$$

(Note that in these update equations $\mu_{t-1}$ and $P_{t-1}$ should be read as the mean and covariance coming out of the prediction step.)
And that is it! Those are all the equations for a linear Kalman Filter.
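In code, the update step could look like the following minimal numpy sketch (again my own illustration, not the author's implementation):

```python
import numpy as np

def update(mu, P, z, H, R):
    """Update step: fuse the predicted state (mu, P) with a measurement z of covariance R."""
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    mu_new = mu + K @ (z - H @ mu)
    P_new = (np.eye(len(mu)) - K @ H) @ P
    return mu_new, P_new
```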

Prediction step:

$$\mu_t = F \mu_{t-1} + B u$$

$$P_t = F P_{t-1} F^T + B Q B^T$$

Update step:

$$K = P H^T (H P H^T + R)^{-1}$$

$$\mu_t = \mu_{t-1} + K(z - H\mu_{t-1})$$

$$P_t = P_{t-1} - K H P_{t-1} = (I - KH)P_{t-1}$$
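Putting the two sketches together, an assumed filtering loop for the GPS measurement of (x, y) might look like this, reusing the predict() and update() functions defined above (all numbers are made up):

```python
H = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)   # GPS observes only (x, y)
R = np.eye(2) * 0.5                            # assumed GPS noise covariance

mu = np.array([0.0, 0.0, 0.3, 2.0, 0.0])       # initial state estimate
P  = np.eye(5) * 0.1
B, u, Q = np.eye(5), np.zeros(5), np.eye(5) * 1e-3

for z in [np.array([0.2, 0.1]), np.array([0.4, 0.2])]:   # fake GPS readings
    mu, P = predict(mu, P, dt=0.1, B=B, u=u, Q=Q)
    mu, P = update(mu, P, z, H, R)
```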

Extended Kalman Filter


In reality, the world does not behave linearly. The way the KF deals with non-linearities is by using the Jacobian to linearize the equations. We can expand this model into a proper non-linear KF by modifying the prediction step with a simple kinematic model, for example the bicycle kinematic model.
If we model everything from the centre of gravity of the vehicle, the equations for the bicycle kinematic model are

$$\dot{x} = v\cos(\theta + \beta)$$

$$\dot{y} = v\sin(\theta + \beta)$$

$$\dot{\theta} = \frac{v\cos(\beta)\tan(\delta)}{L}$$

$$\beta = \tan^{-1}\left(\frac{l_r\tan\delta}{L}\right)$$

where θ is the heading of the vehicle (yaw), β is the slip angle of the centre of gravity, L is the length of the vehicle, $l_r$ is the distance between the rearmost part and the centre of gravity, and δ is the steering angle. In discrete-time form we have

$$x_t = x_{t-1} + \Delta t \cdot v\cos(\theta + \beta)$$

$$y_t = y_{t-1} + \Delta t \cdot v\sin(\theta + \beta)$$

$$\theta_t = \theta_{t-1} + \Delta t \cdot \frac{v\cos(\beta)\tan(\delta)}{L}$$

$$\beta_t = \tan^{-1}\left(\frac{l_r\tan\delta_{t-1}}{L}\right)$$

$$v_t = v_{t-1}$$

$$\delta_t = \delta_{t-1}$$
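As a sketch, that discrete-time bicycle model could be implemented like this (the vehicle parameters L and l_r are placeholder values of my own choosing):

```python
import numpy as np

def bicycle_step(state, dt, L=2.5, lr=1.2):
    """One discrete-time step of the kinematic bicycle model, s_t = f(s_{t-1})."""
    x, y, theta, v, delta = state
    beta = np.arctan(lr * np.tan(delta) / L)               # slip angle at the centre of gravity
    x_new = x + dt * v * np.cos(theta + beta)
    y_new = y + dt * v * np.sin(theta + beta)
    theta_new = theta + dt * v * np.cos(beta) * np.tan(delta) / L
    return np.array([x_new, y_new, theta_new, v, delta])   # v and delta assumed constant
```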

If you define that system of equations as a function $f(x, y, \theta, v, \delta) \in \mathbb{R}^5$ (with β an intermediate quantity computed from δ), then we can model the whole system using $f$ and its Jacobian $F$ with entries $F_{ij} = \partial f_i / \partial x_j$. We can also use the same trick for the transformation from the state space $s$ into the measurement vector space $z$.
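If you do not want to derive F by hand, a finite-difference Jacobian is a quick (if approximate) way to linearize f around the current estimate. This is my own sketch, reusing bicycle_step from above, not something prescribed in the post:

```python
import numpy as np

def numerical_jacobian(f, s, eps=1e-6):
    """Finite-difference Jacobian F[i, j] = d f_i / d s_j evaluated at s."""
    s = np.asarray(s, dtype=float)
    f0 = f(s)
    F = np.zeros((len(f0), len(s)))
    for j in range(len(s)):
        ds = np.zeros_like(s)
        ds[j] = eps
        F[:, j] = (f(s + ds) - f0) / eps
    return F

# Linearize the bicycle model around a concrete operating point.
F = numerical_jacobian(lambda s: bicycle_step(s, dt=0.1), [0.0, 0.0, 0.3, 2.0, 0.05])
```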
We can also add non-linearities in the measurement. Where before we used the matrix $H$, now we can use a function $h(\cdot)$ and define $H$ as its Jacobian, $H_{ij} = \partial h_i / \partial x_j$. The final Extended Kalman Filter is

Prediction step:

$$\mu_t = f(\mu_{t-1}) + B u$$

$$P_t = F P_{t-1} F^T + B Q B^T$$

Update step:

$$K = P_{t-1} H^T (H P_{t-1} H^T + R)^{-1}$$

$$\mu_t = \mu_{t-1} + K(z - h(\mu_{t-1}))$$

$$P_t = (I - KH)P_{t-1}$$

Error-State Extended Kalman Filter


The EKF is not a perfect method to estimate and predict the state; it will always make mistakes when predicting. The larger the number of sequential predictions without updates, the bigger the accumulated error. One interesting and common property of the errors is that they behave less complexly than the state itself. This is easier to see in the image below: while the behaviour of the position is highly non-linear, the error (estimation minus ground truth) behaves much closer to linearly.

Left image taken from “Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video”.

Therefore the error of the state (i.e. the error-state) is more likely to be modelled correctly by a linear model, and we can avoid some of the noise that comes from trying to model highly non-linear behaviour by modelling the error-state instead. Let’s define the error-state as $e = \mu_t - \mu_{t-1}$. We can approximate $f(\mu_{t-1})$ using the Taylor series expansion, keeping only the first derivative: $f(\mu_{t-1}) \approx \mu_{t-1} + F e_{t-1}$. Replacing this and rearranging the equations, we end up with the final equations for the Error-State Extended Kalman Filter (ES-EKF).

Prediction step:

$$s_t = f(s_{t-1}, u)$$

$$P_t = F P_{t-1} F^T + B Q B^T$$

Update step:

$$K = P H^T (H P H^T + R)^{-1}$$

$$e_t = K(z - h(s_{t-1}))$$

$$s_t = s_{t-1} + e_t$$

$$P_t = (I - KH)P_{t-1}$$

Keep in mind that now we are tracking the error-state and the covariance of the error. Therefore we need to predict the state $s_t$ with $f(\cdot)$ in the prediction step and then correct it using the error-state during the update step: the nominal state is propagated directly by $f(\cdot)$ and the error estimated from the measurement is folded back into it.
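To make this concrete, here is a minimal sketch of one ES-EKF cycle (my own illustration, reusing bicycle_step and numerical_jacobian from the sketches above; H, R and Q are assumed, and the measurement function h is taken as linear here):

```python
def es_ekf_step(s, P, z, dt, H, R, Q):
    """One predict + update cycle of the ES-EKF as described above (a sketch)."""
    # Prediction: propagate the nominal state with the non-linear model f
    # and the error covariance with the linearized F.
    F = numerical_jacobian(lambda st: bicycle_step(st, dt), s)
    s_pred = bicycle_step(s, dt)
    P_pred = F @ P @ F.T + Q

    # Update: estimate the error-state from the measurement residual,
    # then inject it back into the nominal state.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    e = K @ (z - H @ s_pred)                 # error-state estimate
    s_new = s_pred + e
    P_new = (np.eye(len(s)) - K @ H) @ P_pred
    return s_new, P_new
```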
(if you see I have made a mistake, don’t hesitate to tell me).
Comments

Sharief Saleh: There is virtually no difference between the EKF and ES-EKF implementations that you have presented... Both estimate the state using a non-linear function. Then both propagate the covariance matrix P with the same linearized F matrix. After that, both compute the same error state "K(z - h(x))". Then you added that error to the navigation state. In the EKF implementation you did that step in the same line, while in the ES-EKF you did it in two separate lines. For me, both are the same... Would you please show how they are different?

YoK (reply to Sharief Saleh): I think that the F matrix is actually not the same. For the traditional EKF, F is computed as the first-order partial derivative with respect to the state itself. But for the ES-EKF, F is computed with respect to the "error state". As a result, the propagation of the covariance becomes more accurate and stable when using the error-state form.

Eric Huang: Great article, but could you explain more why the error-state EKF defines the error-state as e = μ(t) − μ(t−1)? Some articles define it as nominal state − true state.

Mike Woodcock (reply to Eric Huang): The error state is already subtracting some non-linearity thanks to subtracting the prediction, so in several cases the ES has been shown to lower the degree of non-linear behaviour.