
Notes on Kalman Filter

(KF, EKF, ESKF, IEKF, IESKF)

Gyubeom Edward Im∗


arXiv:2406.06427v1 [cs.RO] 10 Jun 2024

May 2, 2024

Contents

1 Recursive Bayes Filter
  1.1 Derivation of Recursive Bayes Filter
  1.2 Gaussian Belief Case

2 Kalman Filter (KF)
  2.1 Prediction step
  2.2 Correction step
  2.3 Summary

3 Extended Kalman Filter (EKF)
  3.1 Prediction step
  3.2 Correction step
  3.3 Summary

4 Error-state Kalman Filter (ESKF)
  4.1 Prediction step
  4.2 Correction step
    4.2.1 Reset
  4.3 Summary

5 Iterated Extended Kalman Filter (IEKF)
  5.1 Compare to EKF
    5.1.1 Commonality 1
    5.1.2 Commonality 2
    5.1.3 Commonality 3
    5.1.4 Difference 1
    5.1.5 Difference 2
  5.2 Prediction step
  5.3 Correction step
  5.4 Summary

6 Iterated Error-State Kalman Filter (IESKF)
  6.1 Compare to ESKF
    6.1.1 Commonality 1
    6.1.2 Commonality 2
    6.1.3 Commonality 3
    6.1.4 Difference (MAP-based derivation)
  6.2 Prediction step
  6.3 Correction step
  6.4 Summary

7 MAP, GN, and EKF relationship
  7.1 Traditional EKF derivation
  7.2 MAP-based EKF derivation
    7.2.1 Start from MAP estimator
    7.2.2 MLE of new observation function
    7.2.3 Gauss-Newton Optimization

8 Derivation of IESKF Update Step

9 Wrap-up
  9.1 Kalman Filter (KF)
  9.2 Extended Kalman Filter (EKF)
  9.3 Error-state Kalman Filter (ESKF)
  9.4 Iterated Extended Kalman Filter (IEKF)
  9.5 Iterated Error-state Kalman Filter (IESKF)

10 Reference

11 Revision log

∗ blog: alida.tistory.com, email: [email protected]

1 Recursive Bayes Filter


State estimation problems generally refer to estimating the current state (e.g., position, speed,
rotation) based on given control inputs and observations. Given control inputs and observations,
the degree of reliability of the current state xt , or the Belief bel(xt ) about xt , is defined as follows:

bel(xt ) = p(xt | z1:t , u1:t ) (1)

- xt : State variable at time t


- z1:t = {z1 , · · · , zt } : Observations from time 1 to t
- u1:t = {u1 , · · · , ut } : Control inputs from time 1 to t
- bel(xt ) : Belief of xt , which represents the probability (degree of belief) that the robot is at xt
based on sensor observations z1:t and control inputs u1:t from the start time to t seconds.

Because bel(·) is expressed and developed according to Bayes' rule, this estimator is also called the Bayes filter. Using the Markov assumption together with Bayes' rule, a recursive form of the filter can be derived, which is called the recursive Bayes filter.

bel(x_t) = η · p(z_t | x_t) · \overline{bel}(x_t)
\overline{bel}(x_t) = ∫ p(x_t | x_{t-1}, u_t) · bel(x_{t-1}) dx_{t-1}
(2)

- η = 1/p(z_t | z_{1:t-1}, u_{1:t}) : normalization constant that makes the posterior integrate to 1
- p(z_t | x_t) : observation model
- ∫ p(x_t | x_{t-1}, u_t) · bel(x_{t-1}) dx_{t-1} : motion model, i.e., the predicted belief \overline{bel}(x_t)

The Recursive Bayes filter is thus called a recursive filter because it calculates the current
step’s bel(xt ) from the previous step’s bel(xt−1 ).
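
To make the recursion in (2) concrete, here is a minimal sketch (not part of the original notes) of a discrete (histogram) Bayes filter over a finite set of states; `transition_prob` and `measurement_likelihood` are placeholder model functions that the user must supply.

```python
import numpy as np

def bayes_filter_step(belief, u, z, transition_prob, measurement_likelihood):
    """One recursion of eq. (2): prediction followed by correction.

    belief                     : (N,) array, bel(x_{t-1}) over N discrete states
    transition_prob(i, j, u)   : p(x_t = i | x_{t-1} = j, u)   (assumed model)
    measurement_likelihood(z,i): p(z | x_t = i)                (assumed model)
    """
    n = belief.size
    # Prediction: bel_bar(x_t) = sum_j p(x_t | x_{t-1}=j, u) * bel(x_{t-1}=j)
    bel_bar = np.array([
        sum(transition_prob(i, j, u) * belief[j] for j in range(n))
        for i in range(n)
    ])
    # Correction: bel(x_t) = eta * p(z | x_t) * bel_bar(x_t)
    posterior = np.array([measurement_likelihood(z, i) for i in range(n)]) * bel_bar
    return posterior / posterior.sum()  # eta normalizes the distribution to 1
```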

1.1 Derivation of Recursive Bayes Filter
The formulation of the recursive Bayes filter is derived as follows:

bel(x_t) = p(x_t | z_{1:t}, u_{1:t})
         = η · p(z_t | x_t, z_{1:t-1}, u_{1:t}) · p(x_t | z_{1:t-1}, u_{1:t})
         = η · p(z_t | x_t) · ∫_{x_{t-1}} p(x_t | x_{t-1}, z_{1:t-1}, u_{1:t}) · p(x_{t-1} | z_{1:t-1}, u_{1:t}) dx_{t-1}
         = η · p(z_t | x_t) · ∫_{x_{t-1}} p(x_t | x_{t-1}, u_t) · p(x_{t-1} | z_{1:t-1}, u_{1:t-1}) dx_{t-1}
         = η · p(z_t | x_t) · ∫_{x_{t-1}} p(x_t | x_{t-1}, u_t) · bel(x_{t-1}) dx_{t-1}
         = η · p(z_t | x_t) · \overline{bel}(x_t)
(3)

Bayes' rule, the Markov assumption, and marginalization are applied in turn to obtain the recursive filter. Each transformation is listed step by step below.

Step 1: Apply Bayes' rule, p(x|y) = p(y|x) p(x) / p(y), to the latest measurement z_t:

bel(x_t) = p(x_t | z_{1:t}, u_{1:t}) = η · p(z_t | x_t, z_{1:t-1}, u_{1:t}) · p(x_t | z_{1:t-1}, u_{1:t})    (4)

Step 2: Apply the Markov assumption: given the current state x_t, the measurement z_t is conditionally independent of past measurements and control inputs:

p(z_t | x_t, z_{1:t-1}, u_{1:t}) = p(z_t | x_t)    (5)

Step 3: Apply the law of total probability (marginalization), p(x) = ∫_y p(x|y) · p(y) dy, over the previous state x_{t-1}:

p(x_t | z_{1:t-1}, u_{1:t}) = ∫_{x_{t-1}} p(x_t | x_{t-1}, z_{1:t-1}, u_{1:t}) · p(x_{t-1} | z_{1:t-1}, u_{1:t}) dx_{t-1}    (6)

Step 4: Apply the Markov assumption again: given the previous state x_{t-1} and the control input u_t, the current state x_t is conditionally independent of past measurements and control inputs:

p(x_t | x_{t-1}, z_{1:t-1}, u_{1:t}) = p(x_t | x_{t-1}, u_t)    (7)

Step 5: The control input u_t applied at time t carries no information about the earlier state x_{t-1}, so it can be dropped from the conditioning:

p(x_{t-1} | z_{1:t-1}, u_{1:t}) = p(x_{t-1} | z_{1:t-1}, u_{1:t-1})    (8)

Step 6: By definition (1), p(x_{t-1} | z_{1:t-1}, u_{1:t-1}) = bel(x_{t-1}), so it is substituted:

∫_{x_{t-1}} p(x_t | x_{t-1}, u_t) · p(x_{t-1} | z_{1:t-1}, u_{1:t-1}) dx_{t-1} = ∫_{x_{t-1}} p(x_t | x_{t-1}, u_t) · bel(x_{t-1}) dx_{t-1}    (9)

Step 7: The remaining integral is, by the definition in (2), the predicted belief, so it is replaced by \overline{bel}(x_t):

bel(x_t) = η · p(z_t | x_t) · \overline{bel}(x_t)    (10)

1.2 Gaussian Belief Case


If bel(x_t) follows a Gaussian distribution, the resulting filter is specifically referred to as the Kalman filter.

\overline{bel}(x_t) ∼ N(x̂_{t|t-1}, P_{t|t-1})    (Kalman Filter Prediction)
bel(x_t) ∼ N(x̂_{t|t}, P_{t|t})    (Kalman Filter Correction)
(11)

Mean and covariance are also denoted as (x̂, P) or (µ̂, Σ), representing the same concept with
different notations.

2 Kalman Filter (KF)


NOMENCLATURE of Kalman Filter

• Scalars are denoted by lowercase letters, e.g., a


• Vectors are denoted by bold lowercase letters, e.g., a
• Matrices are denoted by bold uppercase letters, e.g., R
• prediction: \overline{bel}(x_t) ∼ N(x̂_{t|t-1}, P_{t|t-1})

  – x̂_{t|t-1} : mean at step t given the correction value at step t−1. Some literature also denotes this as x_t^-.
  – P_{t|t-1} : covariance at step t given the correction value at step t−1. Some literature also denotes this as P_t^-.

• correction: bel(x_t) ∼ N(x̂_{t|t}, P_{t|t})

  – x̂_{t|t} : mean at step t given the prediction value at step t. Some literature also denotes this as x_t^+.
  – P_{t|t} : covariance at step t given the prediction value at step t. Some literature also denotes this as P_t^+.

At time t, the robot's position is denoted by x_t, the measurement from the robot's sensor by z_t, and the robot's control input by u_t. Using these, we can define the motion model and the observation model. Both models are required to be linear.

Motion Model: x_t = F_t x_{t-1} + B_t u_t + w_t
Observation Model: z_t = H_t x_t + v_t
(12)

- x_t : model state variable
- u_t : model input
- z_t : model measurement
- F_t : model state transition matrix
- B_t : matrix that maps the input u_t into the state space
- H_t : model observation matrix
- w_t ∼ N(0, Q_t) : motion model noise. Q_t denotes the covariance matrix of w_t.
- v_t ∼ N(0, R_t) : observation model noise. R_t denotes the covariance matrix of v_t.

Assuming all random variables follow Gaussian distributions, p(x_t | x_{t-1}, u_t) and p(z_t | x_t) can be represented as follows.

p(x_t | x_{t-1}, u_t) ∼ N(F_t x_{t-1} + B_t u_t, Q_t)
  = \frac{1}{\sqrt{\det(2\pi Q_t)}} \exp\left(-\frac{1}{2}(x_t − F_t x_{t-1} − B_t u_t)^⊺ Q_t^{-1} (x_t − F_t x_{t-1} − B_t u_t)\right)
(13)

p(z_t | x_t) ∼ N(H_t x_t, R_t)
  = \frac{1}{\sqrt{\det(2\pi R_t)}} \exp\left(-\frac{1}{2}(z_t − H_t x_t)^⊺ R_t^{-1} (z_t − H_t x_t)\right)
(14)

Next, the beliefs \overline{bel}(x_t), bel(x_t) that need to be computed through the Kalman filter can be represented as follows.

\overline{bel}(x_t) = ∫ p(x_t | x_{t-1}, u_t) bel(x_{t-1}) dx_{t-1} ∼ N(x̂_{t|t-1}, P_{t|t-1})
bel(x_t) = η · p(z_t | x_t) \overline{bel}(x_t) ∼ N(x̂_{t|t}, P_{t|t})
(15)

As seen in (15), the Kalman filter operates by first computing the predicted value \overline{bel}(x_t) from the previous step's value and the motion model in the prediction phase, then obtaining the corrected value bel(x_t) from the measurement and the observation model in the correction phase. By substituting (13) and (14) into (15), we can derive the means and covariances for the prediction and correction steps, (x̂_{t|t-1}, P_{t|t-1}) and (x̂_{t|t}, P_{t|t}), respectively. For a detailed derivation, refer to Section 3.2.4 "Mathematical Derivation of the KF" in [13].

The initial value bel(x_0) is given as follows.

bel(x_0) ∼ N(x̂_0, P_0)    (16)

- x̂_0 : typically set to 0
- P_0 : typically set to a small value (<1e-2)

2.1 Prediction step


Prediction involves computing \overline{bel}(x_t). Since \overline{bel}(x_t) follows a Gaussian distribution, we can compute the mean x̂_{t|t-1} and covariance P_{t|t-1} as follows.

x̂_{t|t-1} = F_t x̂_{t-1|t-1} + B_t u_t
P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + Q_t
(17)

2.2 Correction step


Correction involves computing bel(x_t). Since bel(x_t) also follows a Gaussian distribution, we can compute the mean x̂_{t|t} and covariance P_{t|t} as follows. Here, K_t denotes the Kalman gain.

K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + R_t)^{-1}
x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − H_t x̂_{t|t-1})
P_{t|t} = (I − K_t H_t) P_{t|t-1}
(18)

2.3 Summary
The Kalman Filter can be expressed as a function as follows.

KalmanFilter(x̂_{t-1|t-1}, P_{t-1|t-1}, u_t, z_t) {
  (Prediction Step)
  x̂_{t|t-1} = F_t x̂_{t-1|t-1} + B_t u_t
  P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + Q_t

  (Correction Step)    (19)
  K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + R_t)^{-1}
  x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − H_t x̂_{t|t-1})
  P_{t|t} = (I − K_t H_t) P_{t|t-1}
  return x̂_{t|t}, P_{t|t}
}
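
As an illustration, the following is a minimal NumPy sketch of the KalmanFilter function in (19). It is not code from the original notes; the model matrices F, B, H, Q, R are assumed to be supplied by the caller and are held constant here for simplicity.

```python
import numpy as np

def kalman_filter(x, P, u, z, F, B, H, Q, R):
    """One KF cycle following (19): predict with the motion model, then correct."""
    # Prediction step
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q
    # Correction step
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```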

3 Extended Kalman Filter (EKF)

Kalman filters assume that both the motion model and the observation model are linear to
estimate the state. However, since most phenomena in the real world are modeled non-linearly,
applying the Kalman filter as previously defined would not work properly. The extended Kalman
filter (EKF) was proposed to use the Kalman filter even in non-linear motion and observation
models. EKF uses the first-order Taylor approximation to approximate the non-linear
model to a linear model before applying the Kalman filter. The motion model and
observation model of the EKF are as follows.
Motion Model: x_t = f(x_{t-1}, u_t, w_t)
Observation Model: z_t = h(x_t, v_t)
(20)
- xt : state variable of the model
- ut : input of the model
- zt : measurement of the model
- f (·): non-linear motion model function
- h(·): non-linear observation model function
- wt ∼ N (0, Qt ): noise in the motion model. Qt represents the covariance matrix of wt
- vt ∼ N (0, Rt ): noise in the observation model. Rt represents the covariance matrix of vt

In the above formula, f(·) denotes the non-linear motion model and h(·) denotes the non-linear observation model. Performing a first-order Taylor approximation of f(·), h(·) gives the following.

x_t ≈ f(x̂_{t-1|t-1}, u_t, 0) + F_t (x_{t-1} − x̂_{t-1|t-1}) + F_w w_t
z_t ≈ h(x̂_{t|t-1}, 0) + H_t (x_t − x̂_{t|t-1}) + H_v v_t
(21)
Here, F_t is the Jacobian of the motion model evaluated at x̂_{t-1|t-1}, and H_t is the Jacobian of the observation model evaluated at x̂_{t|t-1}. F_w, H_v are the Jacobians with respect to the noise, evaluated at w_t = 0 and v_t = 0, respectively. For more information on the Jacobian, refer to this post.

F_t = ∂f/∂x_{t-1} |_{x_{t-1}=x̂_{t-1|t-1}},    F_w = ∂f/∂w_t |_{x_{t-1}=x̂_{t-1|t-1}, w_t=0}    (22)

H_t = ∂h/∂x_t |_{x_t=x̂_{t|t-1}},    H_v = ∂h/∂v_t |_{x_t=x̂_{t|t-1}, v_t=0}    (23)
Expanding (21) gives the following.

x_t = F_t x_{t-1} + f(x̂_{t-1|t-1}, u_t, 0) − F_t x̂_{t-1|t-1} + F_w w_t
    = F_t x_{t-1} + ũ_t + w̃_t
(24)

- ũ_t = f(x̂_{t-1|t-1}, u_t, 0) − F_t x̂_{t-1|t-1}
- w̃_t = F_w w_t ∼ N(0, F_w Q_t F_w^⊺)

z_t = H_t x_t + h(x̂_{t|t-1}, 0) − H_t x̂_{t|t-1} + H_v v_t
    = H_t x_t + z̃_t + ṽ_t
(25)

- z̃_t = h(x̂_{t|t-1}, 0) − H_t x̂_{t|t-1}
- ṽ_t = H_v v_t ∼ N(0, H_v R_t H_v^⊺)

Assuming all random variables follow a Gaussian distribution, p(x_t | x_{t-1}, u_t), p(z_t | x_t) can be expressed as follows.

p(x_t | x_{t-1}, u_t) ∼ N(F_t x_{t-1} + ũ_t, F_w Q_t F_w^⊺)
  = \frac{1}{\sqrt{\det(2\pi F_w Q_t F_w^⊺)}} \exp\left(-\frac{1}{2}(x_t − F_t x_{t-1} − ũ_t)^⊺ (F_w Q_t F_w^⊺)^{-1} (x_t − F_t x_{t-1} − ũ_t)\right)
(26)

p(z_t | x_t) ∼ N(H_t x_t + z̃_t, H_v R_t H_v^⊺)
  = \frac{1}{\sqrt{\det(2\pi H_v R_t H_v^⊺)}} \exp\left(-\frac{1}{2}(z_t − H_t x_t − z̃_t)^⊺ (H_v R_t H_v^⊺)^{-1} (z_t − H_t x_t − z̃_t)\right)
(27)

F_w Q_t F_w^⊺ represents the noise covariance of the linearized motion model, and H_v R_t H_v^⊺ represents the noise covariance of the linearized observation model. Next, \overline{bel}(x_t), bel(x_t), which need to be found through the Kalman filter, can be expressed as follows.

\overline{bel}(x_t) = ∫ p(x_t | x_{t-1}, u_t) bel(x_{t-1}) dx_{t-1} ∼ N(x̂_{t|t-1}, P_{t|t-1})
bel(x_t) = η · p(z_t | x_t) \overline{bel}(x_t) ∼ N(x̂_{t|t}, P_{t|t})
(28)

The EKF operates in the same manner as the KF: the prediction step uses the previous step's values and the motion model to obtain the predicted value \overline{bel}(x_t), and the correction step uses the measurement and the observation model to obtain the corrected value bel(x_t). Substituting p(x_t | x_{t-1}, u_t), p(z_t | x_t) into the above equation yields the mean and covariance of the prediction and correction steps, (x̂_{t|t-1}, P_{t|t-1}) and (x̂_{t|t}, P_{t|t}), respectively. Refer to Section 3.2.4 "Mathematical Derivation of the KF" in [13] for a detailed derivation.

The initial value bel(x_0) is given as follows.

bel(x_0) ∼ N(x̂_0, P_0)    (29)

- x̂_0 : typically set to 0
- P_0 : typically set to a small value (<1e-2)

3.1 Prediction step
Prediction involves calculating \overline{bel}(x_t). The covariance matrix is obtained using the linearized Jacobian matrix F_t.

x̂_{t|t-1} = f(x̂_{t-1|t-1}, u_t, 0)
P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺
(30)

3.2 Correction step

Correction involves calculating bel(x_t). The Kalman gain and covariance matrix are obtained using the linearized Jacobian matrix H_t.

K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + H_v R_t H_v^⊺)^{-1}
x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t-1}, 0))
P_{t|t} = (I − K_t H_t) P_{t|t-1}
(31)

3.3 Summary
The Extended Kalman Filter can be expressed as a function as follows.

ExtendedKalmanFilter(x̂_{t-1|t-1}, P_{t-1|t-1}, u_t, z_t) {
  (Prediction Step)
  x̂_{t|t-1} = f(x̂_{t-1|t-1}, u_t, 0)
  P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺

  (Correction Step)    (32)
  K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + H_v R_t H_v^⊺)^{-1}
  x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t-1}, 0))
  P_{t|t} = (I − K_t H_t) P_{t|t-1}
  return x̂_{t|t}, P_{t|t}
}
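
The EKF function in (32) can be sketched in NumPy as follows. This is an illustrative sketch, not code from the original notes; it assumes additive noise (i.e., F_w = H_v = I) and user-supplied model functions and Jacobians.

```python
import numpy as np

def extended_kalman_filter(x, P, u, z, f, h, jac_f, jac_h, Q, R):
    """One EKF cycle following (32), assuming additive noise (F_w = H_v = I).

    f(x, u)    : nonlinear motion model
    h(x)       : nonlinear observation model
    jac_f(x,u) : Jacobian F_t of f evaluated at the previous posterior
    jac_h(x)   : Jacobian H_t of h evaluated at the prediction
    """
    # Prediction step: propagate the mean through f, the covariance through F_t
    F = jac_f(x, u)
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q
    # Correction step: linearize h at the prediction
    H = jac_h(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new
```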

4 Error-state Kalman Filter (ESKF)


NOMENCLATURE of Error-State Kalman Filter

• prediction: \overline{bel}(δx_t) ∼ N(δx̂_{t|t-1}, P_{t|t-1})

  – δx̂_{t|t-1} : the mean at step t given the correction value at step t−1. Some literature also denotes this as δx_t^-.
  – P_{t|t-1} : the covariance at step t given the correction value at step t−1. Some literature also denotes this as P_t^-.

• correction: bel(δx_t) ∼ N(δx̂_{t|t}, P_{t|t})

  – δx̂_{t|t} : the mean at step t given the prediction value at step t. Some literature also denotes this as δx_t^+.
  – P_{t|t} : the covariance at step t given the prediction value at step t. Some literature also denotes this as P_t^+.

The error-state Kalman filter (ESKF) is a Kalman filter algorithm that estimates the mean and covariance of the error state variable δx_t, unlike the EKF, which estimates the mean and covariance of the conventional state variable x_t. Since it estimates the state through the error state rather than estimating the state variable directly, it is also called an indirect Kalman filter. It is also known as the error-state extended Kalman filter (ES-EKF). In the ESKF, the conventional state variable is called the true state variable, and it is represented as the sum of the nominal state and the error state.

x_{true,t} = x_t + δx_t    (33)

- x_{true,t} : true state variable at step t, as updated in the conventional KF/EKF
- x_t : nominal state variable at step t, representing the state without error
- δx_t : error state variable at step t

In other words, the true state variable x_{true,t} we want to estimate can be represented as the sum of the nominal state x_t, which contains no error, and the error state δx_t arising from model and sensor noise. The nominal state has (relatively) large values and is nonlinear, whereas the error state has small values near zero and is approximately linear. While the conventional EKF linearizes the nonlinear true (nominal + error) state for filtering, which is slower and accumulates error over time, the ESKF linearizes only the error state, making it faster and more accurate.
The advantages of ESKF over conventional EKF are as follows (Madyastha et al., 2011):
• The representation of the orientation error state involves minimal parameters. That is, it has
the minimum number of parameters as degrees of freedom, so phenomena like singularities
due to over-parameterization do not occur.
• The error state system only operates near the origin, making it easy to linearize. Therefore,
parameter singularities like gimbal lock do not occur, and it is always possible to perform
linearization.
• Since error states typically have small values, terms of second order and higher can be
ignored. This helps perform Jacobian operations quickly and easily. Some Jacobians are
also used as constants.
However, while the ESKF prediction step is fast, the correction step is relatively slow, since that is where the nominal state, which is nonlinear and has large values, must be processed.

The motion model and observation model of the ESKF are as follows.

Error-state Motion Model: x_t + δx_t = f(x_{t-1}, δx_{t-1}, u_t, w_t)
Error-state Observation Model: z_t = h(x_t, δx_t, v_t)
(34)

- xt : Model’s nominal state variable


- δxt : Model’s error state variable
- ut : Model’s input
- zt : Model’s measurement

- f (·): Nonlinear motion model function
- h(·): Nonlinear observation model function
- wt ∼ N (0, Qt ): Noise of the error state model. Qt denotes the covariance matrix of wt
- vt ∼ N (0, Rt ): Noise of the observation model. Rt denotes the covariance matrix of vt

When a first-order Taylor approximation is performed on f(·), h(·), the following expansions result. This development follows [5].

x_t + δx_t ≈ f(x_{t-1|t-1}, δx̂_{t-1|t-1}, u_t, 0) + F_t (δx_{t-1} − δx̂_{t-1|t-1}) + F_w w_t
z_t ≈ h(x_{t|t-1}, δx̂_{t|t-1}, 0) + H_t (δx_t − δx̂_{t|t-1}) + H_v v_t
(35)

Note that both Jacobians F_t, H_t are taken with respect to the error state δx_t, not the true state x_{true,t}. This is the most significant difference from the EKF in the Jacobians.

F_t = ∂f/∂δx_{t-1} |_{δx_{t-1}=δx̂_{t-1|t-1}},    F_w = ∂f/∂w_t |_{δx_{t-1}=δx̂_{t-1|t-1}, w_t=0}    (36)

H_t = ∂h/∂δx_t |_{δx_t=δx̂_{t|t-1}},    H_v = ∂h/∂v_t |_{δx_t=δx̂_{t|t-1}, v_t=0}    (37)

H_t can be expressed using the chain rule as follows.

H_t = ∂h/∂δx_t = (∂h/∂x_{true,t}) (∂x_{true,t}/∂δx_t)    (38)

Here, the first factor ∂h/∂x_{true,t} is the same Jacobian obtained in the EKF, while the second factor ∂x_{true,t}/∂δx_t is an additional Jacobian with respect to the error state variable. For more details, see the post "Quaternion kinematics for the error-state Kalman filter summary".
Expanding (35) results in the following.

x_t + δx_t = F_t δx_{t-1} + f(x_{t-1|t-1}, δx̂_{t-1|t-1}, u_t, 0) − F_t δx̂_{t-1|t-1} + F_w w_t    (39)

Since δx̂_{t-1|t-1} = 0 is always enforced at the previous correction (reset) step, 0 is substituted for the related terms.

x_t + δx_t = F_t δx_{t-1} + f(x_{t-1|t-1}, 0, u_t, 0) + F_w w_t    (40)

The nominal state variable x_t is identical to f(x_{t-1}, 0, u_t, 0) by definition, so these terms cancel each other out.

δx_t = F_t δx_{t-1} + F_w w_t
     = F_t δx_{t-1} + w̃_t
     = 0 + w̃_t
(41)

- w̃_t = F_w w_t ∼ N(0, F_w Q_t F_w^⊺)

The observation model function is developed similarly by substituting 0 for the error state variable values.

z_t = H_t δx_t + h(x_{t|t-1}, δx̂_{t|t-1}, 0) − H_t δx̂_{t|t-1} + H_v v_t
    = h(x_{t|t-1}, 0, 0) + H_v v_t
    = z̃_t + ṽ_t
(42)

- z̃_t = h(x_{t|t-1}, 0, 0)
- ṽ_t = H_v v_t ∼ N(0, H_v R_t H_v^⊺)

Assuming all random variables follow a Gaussian distribution, p(δx_t | δx_{t-1}, u_t), p(z_t | δx_t) can be represented as follows.

p(δx_t | δx_{t-1}, u_t) ∼ N(0, F_w Q_t F_w^⊺)
  = \frac{1}{\sqrt{\det(2\pi F_w Q_t F_w^⊺)}} \exp\left(-\frac{1}{2} δx_t^⊺ (F_w Q_t F_w^⊺)^{-1} δx_t\right)
(43)

p(z_t | δx_t) ∼ N(z̃_t, H_v R_t H_v^⊺)
  = \frac{1}{\sqrt{\det(2\pi H_v R_t H_v^⊺)}} \exp\left(-\frac{1}{2}(z_t − z̃_t)^⊺ (H_v R_t H_v^⊺)^{-1} (z_t − z̃_t)\right)
(44)

F_w Q_t F_w^⊺ denotes the noise covariance of the linearized error-state motion model, and H_v R_t H_v^⊺ denotes the noise covariance of the linearized observation model. Next, the beliefs \overline{bel}(δx_t), bel(δx_t) that the Kalman filter needs to derive are as follows.

\overline{bel}(δx_t) = ∫ p(δx_t | δx_{t-1}, u_t) bel(δx_{t-1}) dδx_{t-1} ∼ N(δx̂_{t|t-1}, P_{t|t-1})
bel(δx_t) = η · p(z_t | δx_t) \overline{bel}(δx_t) ∼ N(δx̂_{t|t}, P_{t|t})
(45)

The ESKF operates in the same way as the EKF: the prediction step first derives the predicted value \overline{bel}(δx_t) using the previous step's value and the motion model, and the correction step then derives the corrected value bel(δx_t) using the measurement and the observation model. Substituting p(δx_t | δx_{t-1}, u_t), p(z_t | δx_t) into the above formula yields the mean and covariance for the prediction and correction steps, (δx̂_{t|t-1}, P_{t|t-1}) and (δx̂_{t|t}, P_{t|t}), respectively. The detailed derivation can be found in Section 3.2.4 "Mathematical Derivation of the KF" of [13].

The initial value bel(δx_0) is given as follows.

bel(δx_0) ∼ N(0, P_0)    (46)

- δx̂_0 = 0 : always has a value of 0
- P_0 : typically set to a small value (<1e-2)

4.1 Prediction step

Prediction is the process of deriving \overline{bel}(δx_t). The linearized Jacobian matrix F_t is used to derive the covariance matrix.

δx̂_{t|t-1} = F_t δx̂_{t-1|t-1} = 0  ← always 0
x̂_{t|t-1} = f(x̂_{t-1|t-1}, 0, u_t, 0)
P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺
(47)

Note that F_t is the Jacobian with respect to the error state. Since the error state δx̂ is reset to 0 at every correction step, multiplying it by the Jacobian F_t still results in 0, so δx̂_{t|t-1} always remains 0 in the prediction step. Thus, only the nominal state x̂_{t|t-1} and the error-state covariance P_{t|t-1} are actually updated in the prediction step.

4.2 Correction step
Correction is the process of deriving bel(δx_t). The Kalman gain and covariance matrix are derived using the linearized Jacobian matrix H_t.

K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + H_v R_t H_v^⊺)^{-1}
δx̂_{t|t} = K_t (z_t − h(x̂_{t|t-1}, 0, 0))
x̂_{t|t} = x̂_{t|t-1} + δx̂_{t|t}
P_{t|t} = (I − K_t H_t) P_{t|t-1}
reset δx̂_{t|t}:
δx̂_{t|t} ← 0
P_{t|t} ← G P_{t|t} G^⊺
(48)

Note that in the above formula, H_t is the Jacobian of the observation model with respect to the error state δx_t, not the true state x_{true,t}. The symbols P_{t|t-1}, K_t only look similar to those in the EKF; the actual values are different. Thus, while the overall formulas have the same form as in the EKF, the matrices F, H, P, K denote values for the error state δx_t, which is a key difference.

4.2.1 Reset
Once the nominal state is updated, the next step is to reset the error state to 0. The reset is necessary because the error must be expressed with respect to the new nominal state. The reset also updates the covariance of the error state P_{t|t}.

If the reset function is denoted as g(·), it can be represented as follows. For more details, see Chapter 6 of the post "Quaternion kinematics for the error-state Kalman filter summary".

δx ← g(δx) = δx − δx̂_{t|t}    (49)

The reset process in the ESKF is as follows.

δx̂_{t|t} ← 0
P_{t|t} ← G P_{t|t} G^⊺
(50)

G denotes the Jacobian of the reset function, defined as follows.

G = ∂g/∂δx |_{δx = δx̂_{t|t}}    (51)

4.3 Summary
ErrorStateKalmanFilter(x̂_{t-1|t-1}, δx̂_{t-1|t-1}, P_{t-1|t-1}, u_t, z_t) {
  (Prediction Step)
  δx̂_{t|t-1} = F_t δx̂_{t-1|t-1} = 0  ← always 0
  x̂_{t|t-1} = f(x̂_{t-1|t-1}, 0, u_t, 0)
  P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺

  (Correction Step)    (52)
  K_t = P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + H_v R_t H_v^⊺)^{-1}
  δx̂_{t|t} = K_t (z_t − h(x̂_{t|t-1}, 0, 0))
  x̂_{t|t} = x̂_{t|t-1} + δx̂_{t|t}
  P_{t|t} = (I − K_t H_t) P_{t|t-1}
  reset δx̂_{t|t}:
  δx̂_{t|t} ← 0
  P_{t|t} ← G P_{t|t} G^⊺
  return x̂_{t|t}, δx̂_{t|t}, P_{t|t}
}
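
For illustration, a minimal NumPy sketch of the ErrorStateKalmanFilter cycle in (52) is given below. It is not code from the original notes: it assumes a purely vector error state (so error injection is plain addition) and expects the caller to supply the nominal motion model f, the measurement model h, and the error-state Jacobians F, F_w, H, H_v already evaluated at the current linearization point.

```python
import numpy as np

def error_state_kalman_filter(x_nom, P, u, z, f, h, F, Fw, H, Hv, Q, R, G=None):
    """One ESKF cycle following (52).

    x_nom        : nominal state
    f(x, u)      : nonlinear nominal-state motion model
    h(x)         : nonlinear observation model evaluated at the nominal state
    F, Fw, H, Hv : error-state Jacobians (assumed evaluated by the caller)
    G            : reset Jacobian (identity if None)
    The error-state mean is kept at zero between calls, so only x_nom and P are carried.
    """
    # Prediction: the error mean stays 0, only the nominal state and covariance propagate
    x_pred = f(x_nom, u)
    P_pred = F @ P @ F.T + Fw @ Q @ Fw.T
    # Correction on the error state
    S = H @ P_pred @ H.T + Hv @ R @ Hv.T
    K = P_pred @ H.T @ np.linalg.inv(S)
    dx = K @ (z - h(x_pred))                      # estimated error state
    x_new = x_pred + dx                           # inject the error into the nominal state
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    # Reset: error mean back to 0, covariance mapped through G
    if G is not None:
        P_new = G @ P_new @ G.T
    return x_new, P_new
```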

5 Iterated Extended Kalman Filter (IEKF)

The Iterated Extended Kalman Filter (IEKF) is an algorithm that performs the correction step of the Extended Kalman Filter (EKF) repeatedly. Since the EKF linearizes non-linear functions to estimate the state, errors inevitably arise during linearization. The IEKF reduces these linearization errors by re-linearizing and repeating the correction step as long as the update change δx̂_{t|t,j} remains large, i.e., until it becomes sufficiently small.

In this context, δx̂ is interpreted as an update increment, not as an error state variable. That is, x̂_{t|t,j+1} ← x̂_{t|t,j} + δx̂_{t|t,j} is used solely to update the j-th posterior value to the (j+1)-th posterior value; δx̂ itself is not the target of state estimation.

5.1 Compare to EKF


5.1.1 Commonality 1
In the IEKF, the motion and observation models are as follows, exactly the same as in the EKF.

Motion Model: x_t = f(x_{t-1}, u_t, w_t)
Observation Model: z_t = h(x_t, v_t)
(53)

- x_t : state variable of the model
- u_t : input of the model
- z_t : measurement of the model
- f(·) : non-linear motion model function
- h(·) : non-linear observation model function
- w_t ∼ N(0, Q_t) : noise in the motion model. Q_t represents the covariance matrix of w_t
- v_t ∼ N(0, R_t) : noise in the observation model. R_t represents the covariance matrix of v_t

5.1.2 Commonality 2
Next, the linearization process is also exactly the same as the EKF's equations (24), (25).

x_t = F_t x_{t-1} + f(x̂_{t-1|t-1}, u_t, 0) − F_t x̂_{t-1|t-1} + F_w w_t
    = F_t x_{t-1} + ũ_t + w̃_t
(54)

- ũ_t = f(x̂_{t-1|t-1}, u_t, 0) − F_t x̂_{t-1|t-1}
- w̃_t = F_w w_t ∼ N(0, F_w Q_t F_w^⊺)

z_t = H_t x_t + h(x̂_{t|t-1}, 0) − H_t x̂_{t|t-1} + H_v v_t
    = H_t x_t + z̃_t + ṽ_t
(55)

- z̃_t = h(x̂_{t|t-1}, 0) − H_t x̂_{t|t-1}
- ṽ_t = H_v v_t ∼ N(0, H_v R_t H_v^⊺)

5.1.3 Commonality 3
Like the EKF, the IEKF uses matrices F, H, K defined with respect to the true state variable x_{true}.

5.1.4 Difference 1
EKF: The correction value is obtained from the prediction value in one step.
IEKF: The correction value becomes the prediction value again and the correction step is repeated
iteratively.

EKF Correction: x̂_{t|t-1} → x̂_{t|t}
IEKF Correction: x̂_{t|t,j} ⇆ x̂_{t|t,j+1}
(56)

- j: j-th iteration

5.1.5 Difference 2
Expanding equation (55) to form the innovation term r_t gives the following.

r_t = z_t − h(x̂_{t|t-1}, 0) − H_t (x_t − x̂_{t|t-1})    (57)

In the EKF, looking at the second line of the correction step (31), the Kalman gain K_t is multiplied by r_t to compute the posterior.

EKF correction: x̂_{t|t} = x̂_{t|t-1} + K_t r_t
                        = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t-1}, 0))
(58)

EKF: the H_t (x_t − x̂_{t|t-1}) part vanishes when x_t = x̂_{t|t-1} is substituted, and only the remaining parts are used.
IEKF: linearization about a new state (a new operating point) is performed at every iteration of the correction step, so this part does not vanish.

EKF innovation term: r_t = z_t − h(x̂_{t|t-1}, 0)
IEKF innovation term: r_{t,j} = z_t − h(x̂_{t|t,j}, 0) − H_{t,j} (x̂_{t|t-1} − x̂_{t|t,j})
(59)

At the first iteration j = 0, x̂_{t|t,0} = x̂_{t|t-1}, so the last term vanishes and the expression reduces to the EKF form; from j = 1 onward, the operating point differs, so it does not vanish.

5.2 Prediction step
Prediction is the process of determining \overline{bel}(x_t). The linearized Jacobian matrix F_t is used when computing the covariance matrix. This is identical to the prediction step of the EKF.

x̂_{t|t-1} = f(x̂_{t-1|t-1}, u_t, 0)
P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺
(60)

5.3 Correction step

Correction is the process of determining bel(x_t). The Kalman gain and covariance matrix are computed using the linearized Jacobian matrix H_t. The IEKF correction step is performed iteratively until the update change δx̂_{t|t,j} is sufficiently small.

set ϵ
start j-th loop
  K_{t,j} = P_{t|t-1} H_{t,j}^⊺ (H_{t,j} P_{t|t-1} H_{t,j}^⊺ + H_{v,j} R_t H_{v,j}^⊺)^{-1}
  δx̂_{t|t,j} = K_{t,j} (z_t − h(x̂_{t|t,j}, 0) − H_{t,j} (x̂_{t|t-1} − x̂_{t|t,j}))
  x̂_{t|t,j+1} = x̂_{t|t,j} + δx̂_{t|t,j}
  iterate until ∥δx̂_{t|t,j}∥ < ϵ
end loop
P_{t|t,n} = (I − K_{t,n} H_{t,n}) P_{t|t-1}
(61)

5.4 Summary
IteratedExtendedKalmanFilter(x̂_{t-1|t-1}, P_{t-1|t-1}, u_t, z_t) {
  (Prediction Step)
  x̂_{t|t-1} = f(x̂_{t-1|t-1}, u_t, 0)
  P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺

  (Correction Step)    (62)
  set ϵ
  start j-th loop
    K_{t,j} = P_{t|t-1} H_{t,j}^⊺ (H_{t,j} P_{t|t-1} H_{t,j}^⊺ + H_{v,j} R_t H_{v,j}^⊺)^{-1}
    δx̂_{t|t,j} = K_{t,j} (z_t − h(x̂_{t|t,j}, 0) − H_{t,j} (x̂_{t|t-1} − x̂_{t|t,j}))
    x̂_{t|t,j+1} = x̂_{t|t,j} + δx̂_{t|t,j}
    iterate until ∥δx̂_{t|t,j}∥ < ϵ
  end loop
  P_{t|t,n} = (I − K_{t,n} H_{t,n}) P_{t|t-1}
  return x̂_{t|t}, P_{t|t}
}
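
The iterated correction loop of (62) can be sketched as follows. This illustrative snippet is not from the original notes; it assumes additive measurement noise (H_v = I) and user-supplied h and its Jacobian.

```python
import numpy as np

def iekf_correction(x_pred, P_pred, z, h, jac_h, R, eps=1e-6, max_iter=20):
    """Iterated EKF correction step following (61), assuming additive measurement
    noise (H_v = I). Re-linearizes h at each iterate x_j until the update is small."""
    x_j = x_pred.copy()
    for _ in range(max_iter):
        H = jac_h(x_j)                                  # re-linearize at the current iterate
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # Innovation keeps the H (x_pred - x_j) term, cf. (59)
        dx = K @ (z - h(x_j) - H @ (x_pred - x_j))
        x_j = x_j + dx
        if np.linalg.norm(dx) < eps:
            break
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred  # covariance from the last iterate
    return x_j, P_new
```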

6 Iterated Error-State Kalman Filter (IESKF)

While IEKF estimates the true state variable xtrue iteratively in the correction step, IESKF
is an algorithm that iteratively estimates the error state variable δxt . The relationship between
these state variables is as follows.

xtrue,t = xt + δxt (63)


- xtrue,t : true state variable at step t updated in the conventional KF, EKF
- xt : nominal state variable at step t. Represents the state without error
- δxt : error state variable at step t

The iterative update process in IESKF involves understanding xtrue,t as the post-update state,
nominal xt as the pre-update state, and the error state variable δxt as the update amount.

6.1 Compare to ESKF


6.1.1 Commonality 1
The motion and observation models of the IESKF are as follows, identical to those of the ESKF.

Error-state Motion Model: x_t + δx_t = f(x_{t-1}, δx_{t-1}, u_t, w_t)
Error-state Observation Model: z_t = h(x_t, δx_t, v_t)
(64)

- xt : model’s nominal state variable


- δxt : model’s error state variable
- ut : model’s input
- zt : model’s observation value
- f (·): nonlinear motion model function
- h(·): nonlinear observation model function
- wt ∼ N (0, Qt ): noise in the error state model. Qt represents the covariance matrix of wt
- vt ∼ N (0, Rt ): noise in the observation model. Rt is the covariance matrix of vt

6.1.2 Commonality 2
Both Jacobians F_t, H_t of the IESKF are taken with respect to the error state δx_t, as in the ESKF.

F_t = ∂f/∂δx_{t-1} |_{δx_{t-1}=δx̂_{t-1|t-1}},    F_w = ∂f/∂w_t |_{δx_{t-1}=δx̂_{t-1|t-1}, w_t=0}    (65)

H_t = ∂h/∂δx_t |_{δx_t=δx̂_{t|t-1}},    H_v = ∂h/∂v_t |_{δx_t=δx̂_{t|t-1}, v_t=0}    (66)

H_t can be expressed through the chain rule as follows.

H_t = ∂h/∂δx_t = (∂h/∂x_{true,t}) (∂x_{true,t}/∂δx_t)    (67)

6.1.3 Commonality 3
The linearized error state variable δx_t can also be derived identically to the ESKF.

δx_t = F_t δx_{t-1} + F_w w_t
     = F_t δx_{t-1} + w̃_t
     = 0 + w̃_t
(68)

- w̃_t = F_w w_t ∼ N(0, F_w Q_t F_w^⊺)

6.1.4 Difference (MAP-based derivation)

ESKF derivation:
The ESKF correction step unfolds the following equation to derive the mean δx̂_{t|t} and covariance P_{t|t}.

bel(δx_t) = η · p(z_t | δx_t) \overline{bel}(δx_t) ∼ N(δx̂_{t|t}, P_{t|t})    (69)

- p(z_t | δx_t) ∼ N(z̃_t, H_v R_t H_v^⊺)  (see (44))
- \overline{bel}(δx_t) ∼ N(δx̂_{t|t-1}, P_{t|t-1})

δx̂_{t|t} = K_t (z_t − h(x̂_{t|t-1}, 0, 0))
P_{t|t} = (I − K_t H_t) P_{t|t-1}
(70)

IESKF derivation:
The IESKF uses maximum a posteriori (MAP) estimation to derive the correction step, and the MAP problem is solved with Gauss-Newton optimization. For more details, refer to Section 7. This derivation is based on references [6], [7], [8], [9], [10], and [11], among which [6] provides the most detailed explanation of the IESKF derivation.

If the EKF is derived using the MAP approach, the final equation to optimize is as follows:

arg min_{x_t} ∥z_t − h(x_{true,t}, 0)∥_{(H_v R_t H_v^⊺)^{-1}} + ∥x_{true,t} − x̂_{t|t-1}∥_{P_{t|t-1}^{-1}}    (71)

When expressed in terms of the nominal state x_t and the error state δx_t, it becomes the following:

arg min_{δx̂_{t|t}} ∥z_t − h(x_t, δx_t, 0)∥_{(H_v R_t H_v^⊺)^{-1}} + ∥δx̂_{t|t-1}∥_{P_{t|t-1}^{-1}}    (72)

Tip

The prediction (prior) error state variable δx̂_{t|t-1} is defined as follows:

δx̂_{t|t-1} = x_{true,t} − x̂_{t|t-1} ∼ N(0, P_{t|t-1})    (73)

The posterior (correction) error state variable δx̂_{t|t} is defined as follows:

δx̂_{t|t} = x_{true,t|t} − x̂_{t|t}    (74)

Rearranging this equation gives:

x_{true,t|t} = x̂_{t|t} + δx̂_{t|t}    (75)

The first part of (72) is linearized similarly to (42):

z_t − h(x_{t|t-1}, δx̂_{t|t-1}, 0) ≈ z_t − h(x_{t|t-1}, 0, 0) − H_t δx_t    (76)

In the ESKF, δx_t is always 0 and this term was therefore removed, but in the IESKF it retains a non-zero value, so it is kept. The second part of (72), with (75) substituted in, unfolds as follows:

δx̂_{t|t-1} = x_{true,t} − x̂_{t|t-1}
           = (x̂_{t|t} + δx̂_{t|t}) − x̂_{t|t-1}
           ≈ x̂_{t|t} − x̂_{t|t-1} + J_t δx̂_{t|t}
           ∼ N(0, P_{t|t-1})
(77)

Here, J_t is defined as follows:

J_t = ∂[(x̂_{t|t} + δx̂_{t|t}) − x̂_{t|t-1}] / ∂δx̂_{t|t} |_{δx̂_{t|t}=0}    (78)

Rearranging the third and fourth lines of (77) gives:

δx̂_{t|t} ∼ N(−J_t^{-1}(x̂_{t|t} − x̂_{t|t-1}), J_t^{-1} P_{t|t-1} J_t^{-⊺})    (79)

With (76) and (77) in hand, the MAP estimation problem becomes:

arg min_{δx̂_{t|t}} ∥z_t − h(x̂_{t|t-1}, 0, 0) − H_t δx̂_{t|t}∥_{(H_v R_t H_v^⊺)^{-1}} + ∥x̂_{t|t} − x̂_{t|t-1} + J_t δx̂_{t|t}∥_{P_{t|t-1}^{-1}}    (80)

The IESKF iteratively estimates the error state variable δx̂_{t|t} until the update converges below a threshold ϵ. The expression for the j-th iteration is as follows:

δx̂_{t|t,j} ∼ N(−J_{t,j}^{-1}(x̂_{t|t,j} − x̂_{t|t-1}), J_{t,j}^{-1} P_{t|t-1} J_{t,j}^{-⊺})    (81)

arg min_{δx̂_{t|t,j}} ∥z_t − h(x̂_{t|t-1}, 0, 0) − H_t δx̂_{t|t,j}∥_{(H_v R_t H_v^⊺)^{-1}} + ∥x̂_{t|t,j} − x̂_{t|t-1} + J_{t,j} δx̂_{t|t,j}∥_{P_{t|t-1}^{-1}}    (82)

Refer to Section 8 for the derivation of the update equation from this expression.

6.2 Prediction step

Prediction is the process of determining \overline{bel}(δx_t). The linearized Jacobian matrix F_t is used when calculating the covariance matrix. This is completely identical to the ESKF.

δx̂_{t|t-1} = F_t δx̂_{t-1|t-1} = 0  ← always 0
x̂_{t|t-1} = f(x̂_{t-1|t-1}, 0, u_t, 0)
P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺
(83)

6.3 Correction step
Correction is the process of determining bel(δx_t). The Kalman gain and covariance matrix are calculated using the linearized Jacobian matrix H_t. The correction step of the IESKF is performed iteratively until the update amount δx̂_{t|t,j} becomes sufficiently small. Differentiating the derived formula (82) and setting it to zero gives:

set ϵ
start j-th loop
  S_{t,j} = H_{t,j} J_{t,j}^{-1} P_{t|t-1} J_{t,j}^{-⊺} H_{t,j}^⊺ + H_{v,j} R_t H_{v,j}^⊺
  K_{t,j} = J_{t,j}^{-1} P_{t|t-1} J_{t,j}^{-⊺} H_{t,j}^⊺ S_{t,j}^{-1}
  δx̂_{t|t,j} = K_{t,j} (z_t − h(x̂_{t|t,j}, 0, 0) + H_{t,j} J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})) − J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})
  x̂_{t|t,j+1} = x̂_{t|t,j} ⊞ δx̂_{t|t,j}
  iterate until ∥δx̂_{t|t,j}∥ < ϵ
end loop
P_{t|t,n} = (I − K_{t,n} H_{t,n}) J_{t,n}^{-1} P_{t|t-1} J_{t,n}^{-⊺}
(84)

For a more detailed derivation, refer to Section 8.

6.4 Summary
The IESKF can be expressed as a function as follows.

IteratedErrorStateKalmanFilter(x̂_{t-1|t-1}, δx̂_{t-1|t-1}, P_{t-1|t-1}, u_t, z_t) {
  (Prediction Step)
  δx̂_{t|t-1} = F_t δx̂_{t-1|t-1} = 0  ← always 0
  x̂_{t|t-1} = f(x̂_{t-1|t-1}, 0, u_t, 0)
  P_{t|t-1} = F_t P_{t-1|t-1} F_t^⊺ + F_w Q_t F_w^⊺

  (Correction Step)    (85)
  set ϵ
  start j-th loop
    S_{t,j} = H_{t,j} J_{t,j}^{-1} P_{t|t-1} J_{t,j}^{-⊺} H_{t,j}^⊺ + H_{v,j} R_t H_{v,j}^⊺
    K_{t,j} = J_{t,j}^{-1} P_{t|t-1} J_{t,j}^{-⊺} H_{t,j}^⊺ S_{t,j}^{-1}
    δx̂_{t|t,j} = K_{t,j} (z_t − h(x̂_{t|t,j}, 0, 0) + H_{t,j} J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})) − J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})
    x̂_{t|t,j+1} = x̂_{t|t,j} ⊞ δx̂_{t|t,j}
    iterate until ∥δx̂_{t|t,j}∥ < ϵ
  end loop
  P_{t|t,n} = (I − K_{t,n} H_{t,n}) J_{t,n}^{-1} P_{t|t-1} J_{t,n}^{-⊺}
  return x̂_{t|t}, P_{t|t}
}
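
Below is an illustrative sketch (not from the original notes) of the IESKF correction loop in (85) for a purely vector state, where the ⊞ operator reduces to ordinary addition and J_{t,j} is taken as the identity; with these assumptions the loop reduces to the IEKF form, matching the discussion in Sections 7 and 8. Additive measurement noise (H_v = I) is also assumed.

```python
import numpy as np

def ieskf_correction(x_pred, P_pred, z, h, jac_h, R, eps=1e-6, max_iter=20):
    """Iterated ESKF correction sketch following (85) for a vector-valued state."""
    n = x_pred.shape[0]
    x_j = x_pred.copy()
    for _ in range(max_iter):
        H = jac_h(x_j)                       # error-state measurement Jacobian at the iterate
        J_inv = np.eye(n)                    # J = identity on a vector space (assumption)
        P_j = J_inv @ P_pred @ J_inv.T
        S = H @ P_j @ H.T + R
        K = P_j @ H.T @ np.linalg.inv(S)
        c = x_j - x_pred                     # x_{t|t,j} - x_{t|t-1}
        dx = K @ (z - h(x_j) + H @ (J_inv @ c)) - J_inv @ c
        x_j = x_j + dx                       # boxplus = addition for a vector state
        if np.linalg.norm(dx) < eps:
            break
    P_new = (np.eye(n) - K @ H) @ P_j
    return x_j, P_new
```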

7 MAP, GN, and EKF relationship
7.1 Traditional EKF derivation
Let us consider the observation model function of the EKF given as follows; for convenience of the development, the observation noise v_t is placed outside (additively).

Observation Model: z_t = h(x_t) + v_t    (86)

The correction step of the EKF unfolds the following formula to derive the mean x̂_{t|t} and the covariance P_{t|t}.

bel(x_t) = η · p(z_t | x_t) \overline{bel}(x_t) ∼ N(x̂_{t|t}, P_{t|t})    (87)

- p(z_t | x_t) ∼ N(h(x_t), R_t)
- \overline{bel}(x_t) ∼ N(x̂_{t|t-1}, P_{t|t-1})

x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t-1}))
P_{t|t} = (I − K_t H_t) P_{t|t-1}
(88)

7.2 MAP-based EKF derivation

7.2.1 Start from MAP estimator
As a method for deriving the correction step, one can use maximum a posteriori (MAP) estimation, which maximizes the posterior probability. The details follow [12].

x̂_{t|t} = arg max_{x_t} bel(x_t)
        = arg max_{x_t} p(x_t | z_{1:t}, u_{1:t})    · · · posterior
        ∝ arg max_{x_t} p(z_t | x_t) \overline{bel}(x_t)    · · · likelihood · prior
        ∝ arg max_{x_t} \exp\left(-\frac{1}{2}\left[(z_t − h(x_t))^⊺ R_t^{-1} (z_t − h(x_t)) + (x_t − x̂_{t|t-1})^⊺ P_{t|t-1}^{-1} (x_t − x̂_{t|t-1})\right]\right)
(89)

- p(z_t | x_t) ∼ N(h(x_t), R_t)
- \overline{bel}(x_t) ∼ N(x̂_{t|t-1}, P_{t|t-1})

Removing the negative sign turns the maximization problem into a minimization problem, which can be organized into the following optimization formula.

x̂_{t|t} ∝ arg min_{x_t} \exp\left((z_t − h(x_t))^⊺ R_t^{-1} (z_t − h(x_t)) + (x_t − x̂_{t|t-1})^⊺ P_{t|t-1}^{-1} (x_t − x̂_{t|t-1})\right)    (90)

x̂_{t|t} = arg min_{x_t} ∥z_t − h(x_t)∥_{R_t^{-1}} + ∥x_t − x̂_{t|t-1}∥_{P_{t|t-1}^{-1}}    (91)

- ∥a∥_B = a^⊺ B a : Mahalanobis norm

Expanding the expression in (91) and defining it as a cost function C_t gives the following. Note that switching the order of x_t − x̂_{t|t-1} does not affect the overall value.

C_t = (z_t − h(x_t))^⊺ R_t^{-1} (z_t − h(x_t)) + (x̂_{t|t-1} − x_t)^⊺ P_{t|t-1}^{-1} (x̂_{t|t-1} − x_t)    (92)

This can be expressed in matrix form as follows.

C_t = \begin{bmatrix} x̂_{t|t-1} − x_t \\ z_t − h(x_t) \end{bmatrix}^⊺ \begin{bmatrix} P_{t|t-1}^{-1} & 0 \\ 0 & R_t^{-1} \end{bmatrix} \begin{bmatrix} x̂_{t|t-1} − x_t \\ z_t − h(x_t) \end{bmatrix}    (93)

7.2.2 MLE of new observation function

A new observation function satisfying the above equation can be defined as follows.

y_t = g(x_t) + e_t ∼ N(g(x_t), P_e)    (94)

- y_t = \begin{bmatrix} x̂_{t|t-1} \\ z_t \end{bmatrix}
- g(x_t) = \begin{bmatrix} x_t \\ h(x_t) \end{bmatrix}
- e_t ∼ N(0, P_e)
- P_e = \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & R_t \end{bmatrix}

The nonlinear function g(x_t) can be linearized as follows.

g(x_t) ≈ g(x̂_{t|t-1}) + J_t (x_t − x̂_{t|t-1})
       = g(x̂_{t|t-1}) + J_t δx̂_{t|t-1}
(95)

The Jacobian J_t is as follows.

J_t = ∂g/∂x_t |_{x_t = x̂_{t|t-1}}
    = ∂/∂x_t \begin{bmatrix} x_t \\ h(x_t) \end{bmatrix} |_{x_t = x̂_{t|t-1}}
    = ∂/∂x_t \begin{bmatrix} x_t \\ h(x̂_{t|t-1}) + H_t (x_t − x̂_{t|t-1}) \end{bmatrix} |_{x_t = x̂_{t|t-1}}
    = \begin{bmatrix} I \\ H_t \end{bmatrix}
(96)

Therefore, the likelihood of y_t can be developed as follows.

p(y_t | x_t) ∼ N(g(x_t), P_e)
  = η · \exp\left(-\frac{1}{2}(y_t − g(x_t))^⊺ P_e^{-1} (y_t − g(x_t))\right)
(97)

Thus, the maximum a posteriori (MAP) problem for the original bel(x_t) reduces to a maximum likelihood estimation (MLE) problem for p(y_t | x_t). Solving (97) with MLE leads to the following.

x̂_{t|t} = arg max_{x_t} p(y_t | x_t)
        ∝ arg min_{x_t} −ln p(y_t | x_t)
        ∝ arg min_{x_t} \frac{1}{2}(y_t − g(x_t))^⊺ P_e^{-1} (y_t − g(x_t))
        ∝ arg min_{x_t} ∥y_t − g(x_t)∥_{P_e^{-1}}
(98)

7.2.3 Gauss-Newton Optimization

Equation (98) has the form of a least-squares problem; specifically, it is called weighted least squares (WLS) since the weight P_e^{-1} is inserted in the middle. After linearizing and rearranging, it becomes the following.

x̂_{t|t} = arg min_{x_t} ∥y_t − g(x_t)∥_{P_e^{-1}}
        = arg min_{x_t} ∥y_t − g(x̂_{t|t-1}) − J_t δx̂_{t|t-1}∥_{P_e^{-1}}
        = arg min_{x_t} ∥J_t δx̂_{t|t-1} − (y_t − g(x̂_{t|t-1}))∥_{P_e^{-1}}
        = arg min_{x_t} ∥J_t δx̂_{t|t-1} − r_t∥_{P_e^{-1}}
(99)

- δx̂_{t|t-1} = x_t − x̂_{t|t-1} : for ease of expression, x_t is treated as the true state

The linearized residual term r_t is as follows.

r_t = y_t − g(x̂_{t|t-1})
    = J_t δx̂_{t|t-1} + e
    ∼ N(0, P_e)
(100)

Solving the Gauss-Newton normal equations gives the solution:

(J_t^⊺ P_e^{-1} J_t) δx̂_{t|t-1} = J_t^⊺ P_e^{-1} r_t    (101)

∴ δx̂_{t|t-1} = (J_t^⊺ P_e^{-1} J_t)^{-1} J_t^⊺ P_e^{-1} r_t    (102)

In the above equation, the term (J_t^⊺ P_e^{-1} J_t) is generally called the approximate Hessian matrix H̃.

Posterior covariance matrix Pt|t :

Pt|t can be obtained as follows.

P_{t|t} = E[(x_t − x̂_{t|t-1})(x_t − x̂_{t|t-1})^⊺]
        = E[δx̂_{t|t-1} δx̂_{t|t-1}^⊺]
        = E[(J_t^⊺ P_e^{-1} J_t)^{-1} J_t^⊺ P_e^{-1} r_t r_t^⊺ P_e^{-⊺} J_t (J_t^⊺ P_e^{-1} J_t)^{-⊺}]
        = (J_t^⊺ P_e^{-1} J_t)^{-1} J_t^⊺ P_e^{-1} E[r_t r_t^⊺] P_e^{-⊺} J_t (J_t^⊺ P_e^{-1} J_t)^{-⊺}    ← E[r_t r_t^⊺] = P_e
        = (J_t^⊺ P_e^{-1} J_t)^{-1}
        = \left( \begin{bmatrix} I & H_t^⊺ \end{bmatrix} \begin{bmatrix} P_{t|t-1} & 0 \\ 0 & R_t \end{bmatrix}^{-1} \begin{bmatrix} I \\ H_t \end{bmatrix} \right)^{-1}
        = (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1}
        = P_{t|t-1} − P_{t|t-1} H_t^⊺ (H_t P_{t|t-1} H_t^⊺ + R_t)^{-1} H_t P_{t|t-1}    ← matrix inversion lemma
        = (I − K_t H_t) P_{t|t-1}
(103)

As can be seen from the fifth line of the above equation, the inverse of the approximate Hessian matrix H̃^{-1} obtained through GN and the EKF posterior covariance P_{t|t} are the same.

H̃^{-1} = (J_t^⊺ P_e^{-1} J_t)^{-1} = P_{t|t}    (104)
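
As a quick numerical sanity check of (103)-(104) (not part of the original notes), the snippet below verifies on random matrices that the information form (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} equals the EKF form (I − K_t H_t) P_{t|t-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                                                        # state and measurement dimensions (arbitrary)
A = rng.standard_normal((n, n)); P = A @ A.T + n * np.eye(n)       # random SPD prior covariance
B = rng.standard_normal((m, m)); R = B @ B.T + m * np.eye(m)       # random SPD measurement noise
H = rng.standard_normal((m, n))

# Information form: (P^-1 + H^T R^-1 H)^-1, seventh line of (103)
P_info = np.linalg.inv(np.linalg.inv(P) + H.T @ np.linalg.inv(R) @ H)

# EKF form: (I - K H) P with the usual Kalman gain, last line of (103)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
P_ekf = (np.eye(n) - K @ H) @ P

assert np.allclose(P_info, P_ekf)                                  # the two expressions agree
```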

Posterior mean x̂_{t|t}:

By performing GN iteratively, the j-th iterate x̂_{t|t,j} can be determined as follows.

x̂_{t|t,j+1} = x̂_{t|t,j} + δx̂_{t|t,j}
            = x̂_{t|t,j} + (J_t^⊺ P_e^{-1} J_t)^{-1} J_t^⊺ P_e^{-1} r_t
            = (J_t^⊺ P_e^{-1} J_t)^{-1} J_t^⊺ P_e^{-1} (y_t − g(x̂_{t|t,j}) + J_t x̂_{t|t,j})    ← r_t = y_t − g(x̂_{t|t,j})
            = (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} \begin{bmatrix} P_{t|t-1}^{-1} & H_t^⊺ R_t^{-1} \end{bmatrix} \begin{bmatrix} x̂_{t|t-1} \\ z_t − h(x̂_{t|t,j}) + H_t x̂_{t|t,j} \end{bmatrix}    ← expand J_t
            = (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} \left( H_t^⊺ R_t^{-1} (z_t − h(x̂_{t|t,j}) + H_t x̂_{t|t,j}) + P_{t|t-1}^{-1} x̂_{t|t-1} \right)
            = (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} \left( H_t^⊺ R_t^{-1} (z_t − h(x̂_{t|t,j}) − H_t (x̂_{t|t-1} − x̂_{t|t,j}) + H_t x̂_{t|t-1}) + P_{t|t-1}^{-1} x̂_{t|t-1} \right)
            = (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} \left( H_t^⊺ R_t^{-1} (z_t − h(x̂_{t|t,j}) − H_t (x̂_{t|t-1} − x̂_{t|t,j})) + (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t) x̂_{t|t-1} \right)
            = x̂_{t|t-1} + (P_{t|t-1}^{-1} + H_t^⊺ R_t^{-1} H_t)^{-1} H_t^⊺ R_t^{-1} (z_t − h(x̂_{t|t,j}) − H_t (x̂_{t|t-1} − x̂_{t|t,j}))
            = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t,j}) − H_t (x̂_{t|t-1} − x̂_{t|t,j}))
(105)

The above equation is identical to the IEKF correction (61). As the last line shows, estimating the solution through Gauss-Newton and estimating it through the IEKF have the same meaning. At the first iteration j = 0, x̂_{t|t,0} = x̂_{t|t-1}, so the expression simplifies as follows.

x̂_{t|t} = x̂_{t|t-1} + K_t (z_t − h(x̂_{t|t-1}))    (106)

This is identical to the EKF solution. That is, the EKF corresponds to a single GN iteration, and the IEKF performs the same operations as GN.

8 Derivation of IESKF Update Step


In this section, the intermediate derivation of obtaining δx̂ in the IESKF update step is explained. The derivation primarily references the contents of a linked blog post. First, let us revisit the update formula (82):

arg min_{δx̂_{t|t,j}} ∥z_t − h(x̂_{t|t-1}, 0, 0) − H_t δx̂_{t|t,j}∥_{(H_v R_t H_v^⊺)^{-1}} + ∥x̂_{t|t,j} − x̂_{t|t-1} + J_{t,j} δx̂_{t|t,j}∥_{P_{t|t-1}^{-1}}    (107)

The symbols in the above equation are simplified as follows (here ∥a∥_{Σ^{-1}} denotes a^⊺ Σ^{-1} a):

x = arg min_x ∥z − Hx∥_{R^{-1}} + ∥c + Jx∥_{P^{-1}}

- z = z_t − h(x̂_{t|t-1}, 0, 0)
- c = x̂_{t|t,j} − x̂_{t|t-1}
- R = H_v R_t H_v^⊺
- P = P_{t|t-1}
- x = δx̂_{t|t,j}

Expanding the norms in the above equation gives:

x = arg min_x [ (z − Hx)^⊺ R^{-1} (z − Hx) + (c + Jx)^⊺ P^{-1} (c + Jx) ] = arg min_x r

Expanding only the r part:

r = z^⊺ R^{-1} z − x^⊺ H^⊺ R^{-1} z − z^⊺ R^{-1} Hx + x^⊺ H^⊺ R^{-1} Hx + c^⊺ P^{-1} c + x^⊺ J^⊺ P^{-1} c + c^⊺ P^{-1} Jx + x^⊺ J^⊺ P^{-1} Jx
  = z^⊺ R^{-1} z − 2 x^⊺ H^⊺ R^{-1} z + x^⊺ H^⊺ R^{-1} Hx + c^⊺ P^{-1} c + 2 x^⊺ J^⊺ P^{-1} c + x^⊺ J^⊺ P^{-1} Jx
(108)

Differentiating r with respect to x and setting the result to zero yields:

∂r/∂x = −H^⊺ R^{-1} z + H^⊺ R^{-1} Hx + J^⊺ P^{-1} c + J^⊺ P^{-1} Jx = 0
(H^⊺ R^{-1} H + J^⊺ P^{-1} J) x = H^⊺ R^{-1} z − J^⊺ P^{-1} c
x = (H^⊺ R^{-1} H + J^⊺ P^{-1} J)^{-1} (H^⊺ R^{-1} z − J^⊺ P^{-1} c)
(109)

The term (H^⊺ R^{-1} H + J^⊺ P^{-1} J)^{-1} can be expanded using the matrix inversion lemma, writing J^⊺ P^{-1} J = P̄^{-1}:

(J^⊺ P^{-1} J + H^⊺ R^{-1} H)^{-1} = P̄ − P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄    (110)

- P̄ = J^{-1} P J^{-⊺}

Tip

Matrix Inversion Lemma


(A + UCV)−1 = A−1 − A−1 U(C−1 + VA−1 U)−1 VA−1

Substituting this into (109) yields:

x = (P̄ − P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄)(H^⊺ R^{-1} z − J^⊺ P^{-1} c)    (111)

Expanding the above equation, the P̄ H^⊺ R^{-1} z terms cancel:

x = (P̄ − P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄)(H^⊺ R^{-1} z − J^⊺ P^{-1} c)
  = P̄ H^⊺ R^{-1} z − P̄ J^⊺ P^{-1} c − P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄ H^⊺ R^{-1} z + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄ J^⊺ P^{-1} c
  = P̄ H^⊺ R^{-1} z − P̄ J^⊺ P^{-1} c − P̄ H^⊺ (R + H P̄ H^⊺)^{-1} ((H P̄ H^⊺ + R) R^{-1} − I) z + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄ J^⊺ P^{-1} c
  = P̄ H^⊺ R^{-1} z − P̄ J^⊺ P^{-1} c − P̄ H^⊺ R^{-1} z + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} z + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄ J^⊺ P^{-1} c
  = −P̄ J^⊺ P^{-1} c + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} z + P̄ H^⊺ (R + H P̄ H^⊺)^{-1} H P̄ J^⊺ P^{-1} c
  = −J^{-1} c + J^{-1} P J^{-⊺} H^⊺ S^{-1} z + J^{-1} P J^{-⊺} H^⊺ S^{-1} H J^{-1} c    ← P̄ J^⊺ P^{-1} = J^{-1},  S = R + H P̄ H^⊺
  = K z + K H J^{-1} c − J^{-1} c
(112)

- K = J^{-1} P J^{-⊺} H^⊺ S^{-1}

Restoring the substituted symbols in the last line yields the final update equation (84):

∴ δx̂_{t|t,j} = K_{t,j} (z_t − h(x̂_{t|t,j}, 0, 0) + H_{t,j} J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})) − J_{t,j}^{-1} (x̂_{t|t,j} − x̂_{t|t-1})    (113)

- z = z_t − h(x̂_{t|t-1}, 0, 0)
- c = x̂_{t|t,j} − x̂_{t|t-1}
- R = H_v R_t H_v^⊺
- x = δx̂_{t|t,j}
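
As a numerical sanity check of this derivation (not part of the original notes), the snippet below verifies on random matrices that the normal-equation solution (109) and the Kalman-gain form (112) give the same update x.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                                                         # error-state and measurement dimensions (arbitrary)
Pm = rng.standard_normal((n, n)); P = Pm @ Pm.T + n * np.eye(n)     # SPD prior covariance
Rm = rng.standard_normal((m, m)); R = Rm @ Rm.T + m * np.eye(m)     # SPD measurement covariance
H = rng.standard_normal((m, n))
J = np.eye(n) + 0.1 * rng.standard_normal((n, n))                   # invertible Jacobian J
z = rng.standard_normal(m)
c = rng.standard_normal(n)

Pi, Ri, Ji = np.linalg.inv(P), np.linalg.inv(R), np.linalg.inv(J)

# Normal-equation solution, eq. (109)
x_direct = np.linalg.solve(H.T @ Ri @ H + J.T @ Pi @ J, H.T @ Ri @ z - J.T @ Pi @ c)

# Kalman-gain form, eq. (112): x = K z + K H J^-1 c - J^-1 c
P_bar = Ji @ P @ Ji.T
K = P_bar @ H.T @ np.linalg.inv(R + H @ P_bar @ H.T)
x_gain = K @ z + K @ H @ (Ji @ c) - Ji @ c

assert np.allclose(x_direct, x_gain)                                # both forms of the update agree
```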

The derivation process up to this point can be summarized in a single diagram (figure omitted in this text version).

9 Wrap-up
The explanation of the KF, EKF, ESKF, IEKF, and IESKF given so far can be summarized on a single slide. (The summary figures for each filter in the following subsections are omitted here.)

9.1 Kalman Filter (KF)

9.2 Extended Kalman Filter (EKF)

9.3 Error-state Kalman Filter (ESKF)

9.4 Iterated Extended Kalman Filter (IEKF)
9.5 Iterated Error-state Kalman Filter (IESKF)

10 Reference
[1] (Wiki) Kalman Filter

[2] (Paper) Sola, Joan. "Quaternion kinematics for the error-state Kalman filter." arXiv preprint arXiv:1711.02508 (2017).

[3] (YouTube) Robot Mapping Course - Freiburg Univ.

[4] (Blog) [SLAM] Kalman filter and EKF (Extended Kalman Filter) - Jinyong Jeong

[5] (Blog) Error-State Kalman Filter understanding and formula derivation - CSDN

[6] (Paper) He, Dongjiao, Wei Xu, and Fu Zhang. "Kalman filters on differentiable manifolds." arXiv preprint arXiv:2102.03804 (2021).

[7] (Book) SLAM in Autonomous Driving book (SAD book)

[8] (Paper) Xu, Wei, and Fu Zhang. "FAST-LIO: A fast, robust LiDAR-inertial odometry package by tightly-coupled iterated Kalman filter." IEEE Robotics and Automation Letters 6.2 (2021): 3317-3324.

[9] (Paper) Huai, Jianzhu, and Xiang Gao. "A Quick Guide for the Iterated Extended Kalman Filter on Manifolds." arXiv preprint arXiv:2307.09237 (2023).

[10] (Paper) Bloesch, Michael, et al. "Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback." The International Journal of Robotics Research 36.10 (2017): 1053-1072.

[11] (Paper) Skoglund, Martin A., Gustaf Hendeby, and Daniel Axehill. "Extended Kalman filter modifications based on an optimization view point." 2015 18th International Conference on Information Fusion (Fusion). IEEE, 2015.

[12] (Blog) From MAP, MLE, OLS, GN to IEKF, EKF

[13] (Book) Thrun, Sebastian. "Probabilistic robotics." Communications of the ACM 45.3 (2002): 52-57.

11 Revision log
• 1st: 2020-06-23
• 2nd: 2020-06-24

• 3rd: 2020-06-26
• 4th: 2023-01-21
• 5th: 2023-01-31
• 6th: 2023-02-02

• 7th: 2023-02-04
• 8th: 2024-02-08
• 9th: 2024-02-09

• 10th: 2024-05-02
