Solution Manual and Notes For - Applied Optimal Estimation (Gelb)
Solution Manual and Notes For - Applied Optimal Estimation (Gelb)
John L. Weatherwax∗
Introduction
Here you’ll find various notes and derivations of the technical material I made as I worked
through this book. There is also quite a complete set of solutions to the various end of
chapter problems. I did much of this in hopes of improving my understanding of Kalman
filtering and thought it might be of interest to others. I have tried hard to eliminate any
mistakes but it is certain that some exit. I would appreciate constructive feedback (sent to
the email below) on any errors that are found in these notes. I will try to fix any corrections
that I receive. In addition, there were several problems that I was not able to solve or
that I am not fully confident in my solutions for. If anyone has any suggestions at solution
methods or alternative ways to solve given problems please contact me. Finally, some of the
derivations found here can be quite long (since I really desire to fully document exactly how
to do each derivation) many of these can be skipped if they are not of interest.
I hope you enjoy this book as much as I have and that these notes might help the further
development of your skills in Kalman filtering.
As a final comment, I’ve worked hard to make these notes as good as I can, but I have no
illusions that they are perfect. If you feel that that there is a better way to accomplish
or explain an exercise or derivation presented in these notes; or that one or more of the
explanations is unclear, incomplete, or misleading, please tell me. If you find an error of
any kind – technical, grammatical, typographical, whatever – please tell me that, too. I’ll
gladly add to the acknowledgments in later printings the name of the first person to bring
each problem to my attention.
∗
[email protected]
1
Acknowledgments
Special thanks to (most recent comments are listed first): David Herold and Ed Corbett for
help with these notes.
Chapter 1: Introduction
Some special cases of the above that validate its usefulness are when each measurement
contributes the same uncertainty then σ1 = σ2 and we see that x̂ = 21 z1 + 12 z2 , or the average
of the two measurements. As another special case if one measurement is exact i.e. σ1 = 0,
then we have x̂ = z1 (in the same way if σ2 = 0, then x̂ = z2 ).
Problem Solutions
For this problem we are now going to assume that E[v1 v2 ] = ρσ1 σ2 i.e. that the noise v1 and
v2 are correlated. Recall from above that the condition E[x̃] = 0 requires that our estimate
x̂ = k1 z1 + k2 z2 requires k2 = 1 − k1 . Next we compute the expected error or E[x̃2 ] and in
this case using Equation 1 for x̃ we find
To find a minimum variance estimator we will take the derivative of E[x̃2 ] with respect to
k1 , set the result equal to zero, and then solve for k1 . We have
dE[x̃2 ]
= 0 ⇒ 2k1 σ12 + 2ρ(1 − k1 )σ1 σ2 + 2ρk1 (−1)σ1 σ2 + 2(1 − k1 )(−1)σ22 = 0 .
dk1
or dividing by 2
k1 σ12 + ρ(1 − k1 )σ1 σ2 − ρk1 σ1 σ2 − (1 − k1 )σ22 = 0 .
On solving for k1 in this expression we find
σ22 − ρσ1 σ2
k1 = , (3)
σ22 − 2ρσ1 σ2 + σ12
We are told that our measurements z1 and z2 are given as noised measurements of a constant
as z1 = x + v1 and z2 = x + v2 , while our estimate of x or x̂ is to be constructed as a linear
combination of zi as x̂ = k1 z1 + k2 z2 . Now defining x̃ as before we have in this case that
x̃ = x̂ − x = k1 (x + v1 ) + k2 (x + v2 ) − x = (k1 + k2 − 1)x + k1 v1 + k2 v2 .
Taking the expectation of this expression and using the facts that the mean of the noise is
zero so E[vi ] = 0 and x is a constant gives
For simplicity lets assume that the two noise sources are uncorrelated i.e. E[v1 v2 ] = 0. Then
to find the minimum of this expression we take derivatives with respect to k1 and k2 set each
expression equal to zero and solve for k1 and k2 . We find the derivatives given by
∂E[x̃2 ]
= 2(k1 + k2 − 1)x2 + 2k1 σ12 = 0
∂k1
∂E[x̃2 ]
= 2(k1 + k2 − 1)x2 + 2k2 σ22 = 0 .
∂k2
When we group terms by the coefficients k1 and k2 we get the following system
To solve this system for k1 and k2 we can use Cramer’s rule. We find
2
x 2
2 2x 2
x x + σ2 x2 σ22
k1 = 2 =
x + σ12 x2 (σ12 + σ22 )x2 + σ12 σ22
x2 x2 + σ22
2
x + σ12 x2
x2 x2 x2 σ12
k2 = = ,
(σ12 + σ22 )x2 + σ12 σ22 (σ12 + σ22 )x2 + σ12 σ22
both of which are functions of the unknown variable x. An interesting idea would be to con-
sider the iterative algorithm where we initially estimate x above using an unbiased estimator
and then replace the x above with this estimate obtaining values for k1 and k2 . One could
then use these to estimate x again and put this value into the above expressions for k1 and
k2 . Doing this several times one gets an iterative algorithm as the estimation procedure.
Problem 1-3 (estimating a constant with three measurements)
For this problem our three measurements are related to the unknown value of x from as
z1 = x + v1 , z2 = x + v2 , and z3 = x + v3 , and our estimate will be a linear combination of
them as x̂ = k1 z1 + k2 z2 + k3 z3 . To have an unbiased estimate compute the expectation of
x̃ = x̂ − x which we find to be
x̃ = x̂ − x
= k1 z1 + k2 z2 + k3 z3 − x
= k1 (x + v1 ) + k2 (x + v2 ) + k3 (x + v3 ) − x
= (k1 + k2 + k3 − 1)x + k1 v1 + k1 v1 + k2 v2 + k3 v3 . (6)
x̂ = k1 z1 + k2 z2 + (1 − k1 − k2 )z3 .
We will now pick k1 and k2 such that the mean square error E[x̃2 ] is a minimum. With this
functional form for x̂ we have using Equation 6 that
x̃2 = (k1 v1 + k2 v2 + k3 v3 )2
= k12 v12 + k22 v22 + k32 v32 + 2k1 k2 v1 v2 + 2k1 k3 v1 v3 + 2k2 k3 v2 v3 .
Taking the expectation of the above expression, assuming uncorrelated measurements E[vi vj ] =
0 when i 6= j and recalling Equation 7 we have
to minimize this expression we take the partial derivatives with respect to k1 and k2 and set
the resulting expressions equal to zero. This gives
∂E[x̃2 ]
= 2k1 σ12 + 2(1 − k1 − k2 )(−1)σ32 = 0
∂k1
∂E[x̃2 ]
= 2k2 σ22 + 2(1 − k1 − k2 )(−1)σ32 = 0 .
∂k2
Now solving these two equations for k1 and k2 we find
σ22 σ32 1
k1 = = 2 2
σ12 σ22 + σ12 σ32 + σ22 σ32 σ1 σ1
σ3
+ σ2
+1
σ12 σ32 1
k2 = = 2 2 .
σ12 σ22 + σ12 σ32 + σ22 σ32 σ2 σ2
σ3
+1+ σ1
From these we can compute k3 = 1 − k1 − k2 to find
σ22 σ32 σ12 σ32
k3 = 1 − k1 − k2 = 1 − −
σ12 σ22 + σ12 σ32 + σ22 σ32 σ12 σ22 + σ12 σ32 + σ22 σ32
σ12 σ22 1
= 2 2 2 2 2 2
= 2 2 .
σ1 σ2 + σ1 σ3 + σ2 σ3
1 + σσ23 + σσ31
Then by defining D ≡ σ12 σ22 + σ12 σ32 + σ22 σ32 and using Equation 8 we see that
2 2 2
2 σ24 σ34 σ12 σ34 σ14 σ22 σ14 σ24 σ32 σ1 σ2 σ3 2 2 2 2 2 2
σ12 σ22 σ32
E[x̃ ] = + + = σ σ
2 3 + σ σ
1 3 + σ σ
1 2 =
D2 D2 D2 D2 D
2 2 3
σ1 σ2 σ3 1
= 2 2 2 2 2 2
= ,
σ1 σ2 + σ1 σ3 + σ3 σ2 1
2 + 1
2 + 1
2
σ3 σ2 σ1
as we were to show.
We are told that our estimate of the concentration, zi are noisy measurements of the time-
decayed initial concentration x0 and so have the form
zi = x0 e−ati + vi , (9)
for i = 1, 2. The book provides us with a functional form of an estimator x̂0 we could use to
estimate x0 , and asks us to show that it is unbiased. We could begin by attempting to esti-
mate the initial concentration x0 using a expression that is linear in the two measurements.
That is we might consider
x̂0 = k1 z1 + k2 z2 ,
as has been done else where in the book. From the given form of the measurements in
Equation 9 it might be better however to estimate x0 using the following
with k1 and k2 unknown. Since in that case the exponential parts eati , multiplied by zi will
“remove” the corresponding factor found in Equation 9 and provide a more direct estimate
of x0 . We next define our estimation error x̃0 as x̃0 = x̂0 − x0 . To have an unbiased estimator
requires that E[x̃0 ] = 0. Using this last form form x̂0 this later expectation is given by
(eat2 σ2 )2 σ22
k1 = = .
(eat1 σ1 )2 + (eat2 σ2 )2 σ22 + σ12 e−2a(t2 −t1 )
(eat1 σ1 )2 σ12
k2 = 1 − k1 = = .
(eat1 σ1 )2 + (eat2 σ2 )2 σ12 + σ22 e2a(t2 −t1 )
To simplify the notation of the algebra that follows we define A1 = e2at1 σ12 and A2 = e2at2 σ22
so that the variables ki in terms of Ai are given as k1 = A1A+A
2
2
and k2 = A1A+A 1
2
. Then we
have that Equation 10 becomes
A22 A21 A1 A2 A1 A2
E[(x̂0 − x0 )2 ] = 2
A1 + 2
A2 = 2
(A1 + A2 ) =
(A1 + A2 ) (A1 + A2 ) (A1 + A2 ) A1 + A2
−2t1 a −1
1 e e−2t2 a
= 1 = + ,
A2
+ A11 σ12 σ22
as we were to show.
Chapter 2: Underlying Mathematical Techniques
Least-Squares Techniques
J = z T z − 2z T Hx + xT H T Hx .
Taking the first derivative of this expression with respect to the unknown vector x using
Equations 311 and 312 gives
∂J
= −2H T z + (H T H + H T H)x = −2H T z + 2H T Hx .
∂x
The second derivative of J with respect to x is given by
∂2J
2
= 2H T H . (12)
∂x
This matrix is positive semi-definite since if we let ξ be a arbitrary non-zero vector and
2
compute the inner product ξ T ∂∂xJ2 ξ we see that this can can be written as a quadratic sum
as X
2(Hξ)T (Hξ) = 2 (Hξ)2i ≥ 0 ,
i
T
for all possible vectors ξ. Thus 2H H is positive semi-definite and the solution to the first
order optimality condition ∂J∂x
= 0 gives a minimum.
Problem Solutions
Since P (t)P (t)−1 = I, taking the derivative of both sides of this expression and using the
product rule gives
dP −1
Ṗ P −1 + P = 0.
dt
dP −1
Solving for dt
we find
dP −1
= −P −1 Ṗ P −1 , (13)
dt
as we were to show.
Problem 2-3 (eigenvalues of positive definite matrices)
We will prove this by showing the equivalence of between two quadratic forms. If we consider
the quadratic form xT Ax then as discussed in the book there exists an orthogonal matrix
Q such that A′ = QT AQ = Q−1 AQ, is a diagonal matrix. Since A and A′ are related by a
similarity transformation they have the same eigenvalues which are equal to the elements on
the diagonal of A′ . Thus if we define x′ = Qx then xT Ax can be written as
2 2 2
λ1 x′1 + λ2 x′2 + · · · + λn x′n ,
where λi is the eigenvalue of A (equivalently A′ ). Now if we are told that A is positive definite
then we know that xT Ax > 0 for all x’s. If we take x = qi , where qi is the ith column vector
of Q then in that case by the orthogonality of the matrix Q we have x′ = Qqi = ei , a vector
of all zeros with a single 1 in the ith spot. For that value of x then xT Ax = λi . Since
xT Ax > 0 for all x we see that λi > 0. On the other hand if we are told that the eigenvalues
of A are all positive
Pwe know that λi > 0 for all i then from the above decomposition we
n
have that x Ax = i=1 λi x′i 2 > 0 showing that A is positive definite.
T
dR(t) T
Problem 2-4 (S(t) = dt
R (t) is skew symmetric)
S(t)T = −S(t) ,
Part (a): The Cayley-Hamilton theorem requires that a matrix A satisfy its own charac-
teristic polynomial. The given matrix has a characteristic polynomial given by |A − λI| = 0
or
1−λ 2
= 0,
3 4−λ
or after expanding some
(1 − λ)(4 − λ) − 6 = 0 ,
or finally λ2 − 5λ − 2 = 0 as we were to show. The eigenvalues of this matrix are then given
by the quadratic formula √
5 ± 33
λ= . (14)
2
to evaluate this we need to compute powers of A. Powers of A can be computed using the
fact that A satisfies its own characteristic polynomial (the Cayley-Hamilton theorem). We
find
A2 = 2I + 5A
A3 = (5A + 2I)A = 2A + 5A2 = 2A + 5(5A + 2I) = 10I + 27A
A4 = A(A3 ) = 10A + 27A2 = 10A + 27(5A + 2I) = 54I + 145A .
Using these we can write eAt as
t2 t3 t4
eAt = I + tA + (2I + 5A) + (10I + 27A) + (54I + 145A) + · · ·
2 6 24
If we group terms that are multiples of I together and terms that are multiples of A together,
we find that the above expression for eAt is equal to
At 2 5 3 9 4 5 2 9 3 145 4
e = I 1+t + t + t +··· +A t + t + t + t +···
3 4 2 2 24
= a1 (t)I + a2 (t)A ,
with ai (t) defined by the respective terms in brackets above.
Problem 2-6 (evaluating an integral over the points r such that r T E −1 r < 1)
R
For this problem we want to evaluate the integral rT E −1 r<1 dr. To do this lets introduce a
change of coordinates that decouples the variables in r. Since E is a positive definite matrix
so is its inverse E −1 , and thus E −1 has a Cholesky factorization given by E −1 = GGT , where
G is an lower triangular matrix. Introduce the vector v = GT r then the set of possible r
values r T E −1 r < 1 becomes
p Z 4 p
|E| dv = π 3 |E| ,
vT v<1 3
R
since we recognized that vT v<1
dv represents the volume of a sphere with radius 1.
Problem 2-7 (weighted least squares)
J = z T W z − 2z T W Hx + xT H T W Hx = z T W z − 2(H T W z)T x + xT H T W Hx .
Taking the first derivative of this expression with respect to the unknown vector x using
Equations 311 and 312 gives
∂J
= −2H T W z + (H T W H + H T W H)x = −2H T W z + 2H T W Hx .
∂x
Setting this derivative equal to zero and solving for x (which we denote as x̂) gives
x̂ = (H T W H)−1H T W z , (17)
the result quoted in the book. The second derivative of J with respect to x is given by
∂2J
= 2H T W H . (18)
∂x2
This matrix is positive semi-definite if the elements on the diagonal of W are non-negative
and the solution given in Equation 17 to the first order optimality condition ∂J
∂x
= 0 gives a
minimum.
Problem 2-9 (the distribution of the sum of three uniform random variables)
If X is a uniform random variable over (−1, +1) then it has a p.d.f. given by
1
2
−1 ≤ x ≤ 1
pX (x) = ,
0 otherwise
X
while the random variable Y = 3
is another uniform random variable with a p.d.f. given by
3
2
− 13 ≤ x ≤ 13
pY (y) = .
0 otherwise
Since the three random variables X/3, Y /3, and Z/3 are independent the characteristic
function of the sum of them is the product of the characteristic function of each one of them.
For a uniform random variable over the domain (α, β) on can show that the characteristic
function ζ(t) is given by Equation 21 or
Z β
1 eitβ − eitα
ζ(t) = eitx dx = ,
α β−α it(β − α)
note this is a slightly different than the normal definition of the Fourier transform [9], which
has e−itx as the exponential argument. Thus for each of the random variables X/3, Y /3, and
Z/3 the characteristic function since β = 13 and α = − 13 looks like
3(eit(1/3) − e−it(1/3) )
ζ(t) = .
2it
Thus the sum of two uniform random variables like X/3 and Y /3 has a characteristic function
given by
9
ζ 2 (t) = − 2 (eit(2/3) − 2 + e−it(2/3) ) ,
4t
and adding in a third random variable say Z/3 to the sum of the previous two will give a
characteristic function that looks like
3 27 eit 3eit(1/3) 3e−it(1/3) e−it
ζ (t) = − − + − 3 .
8i t3 t3 t3 t
Given the characteristic function of a random variable to compute its probability density
function from it we need to evaluate the inverse Fourier transform of this function. That is
we need to evaluate Z ∞
1
pW (w) = ζ(t)3 e−itw dt .
2π −∞
1
R∞
Note that this later integral is equivalent to 2π −∞
ζ(t)3 e+itw dt (the standard definition of
the inverse Fourier transform) since ζ(t)3 is an even function. To evaluate this integral then
it will be helpful to convert the complex exponentials in ζ(t)3 into trigonometric functions
by writing ζ(t)3 as !
t
27 3 sin sin(t)
ζ(t)3 = 3
− 3 . (19)
4 t3 t
Thus to solve this problem we need to be able to compute the inverse Fourier transform of
two expressions like
sin(αt)
.
t3
To do that we will write it as a product with two factors as
sin(αt) sin(αt) 1
3
= · 2.
t t t
This is helpful since we (might) now recognize as the product of two functions each of which
we know the Fourier transform of. For example one can show [9] that if we define the step
function h1 (w) as 1
2
|w| < α
h1 (w) ≡ ,
0 |w| > α
then the Fourier transform of this step function h1 (w) is the first function in the product
above or sin(αt)
t
. Notationally, we can write this as
1
2
|w| < α sin(αt)
F = .
0 |w| > α t
initial ramp function flipped ramp function
2 2
1.5 1.5
1 1
0.5 0.5
0 0
−0.5 −0.5
−1 −1
−1.5 −1.5
−2 −2
−6 −4 −2 0 2 4 6 −6 −4 −2 0 2 4 6
Figure 1: Left: The initial function h2 (x) (a ramp function). Right: The ramp function
flipped or h2 (−x).
h2 (w) = −w u(w) ,
1
F [−wu(w)] = .
t2
Since the inverse of a function that is the product of two functions for which we know the
individual inverse Fourier transform of is the convolution integral of the two inverse Fourier
transforms we have that
Z ∞
−1 sin(αt)
F = h1 (x)h2 (w − x)dx ,
t3 −∞
1.5 1.5
1 1
0.5 0.5
0 0
−0.5 −0.5
−1 −1
−1.5 −1.5
−2 −2
−6 −4 −2 0 2 4 6 −6 −4 −2 0 2 4 6
Figure 2: Left: The function h2 (x), flipped and shifted by w = 3/4 to the right or
h2 (−(x − w)). Right: The flipped and shifted function plotted together with h1 (x) al-
lowing visualizations of function overlap as w is varied.
The distribution function for a Poisson random variable when the mean number of events
we expect to observe is µ is given by
x
X x
X
e−µ µi −µ µi
F (x) = =e .
i=0
i! i=0
i!
When are arrival rate is 0.4 arrivals per minute since in 10 minutes we would have a mean
number of arrivals given by µ = 10(0.4) = 4. Thus the probability of exactly four arrivals in
10 min is given by
44
f (x = 4|µ = 4) = e−4 = 0.1954 ,
4!
and the probability of no more than four arrivals in 10 minutes is given by
4
X
−4 4i
F (4) = e = 0.62884 .
i=0
i!
See the Matlab file chap 2 prob 10.m for calls to the poisspdf and poisscdf Matlab func-
tions used in evaluating these two probabilities.
√
For the first part of this problem lets define the random variable Z = X 2 + Y 2 and attempt
to compute the distribution function for the random variable Z. We have
n√ o
FZ (z) = Pr {Z ≤ z} = Pr 2
X +Y ≤z 2
Z
= √ p(x, y)dxdy
X 2 +Y 2 ≤z
Z
= √ p(x)p(y)dxdy
X 2 +Y 2 ≤z
Z
1 1 x2 1 1 y2
= √ √ exp − 2 √ exp − 2 dxdy
X 2 +Y 2 ≤z 2π 2σ 2π 2σ
Z 2 2
1 1 (x + y )
= √ exp − dxdy .
X 2 +Y 2 ≤z 2π 2 σ2
To evaluate this last integral we will change from Cartesian coordinates to polar coordinates.
Let r 2 = x2 + y 2 and the integral above becomes
Z z Z z
1 2
− 12 r 2 1 r2
FZ (z) = e σ 2πrdr = re− 2 σ2 dr
2π r=0 r=0
Z 1 z22
2σ 1 z2
= e−v dv = 1 − e− 2 σ2 .
0
We will take the derivative of Fz (z) to get the p.d.f for Z. We find
1 z − 21 z22
fZ (z) = FZ′ (z) = e σ ,
2 σ2
which is the desired expression.
Next we will compute the expectation of Z and Z 2 directly from the definition of the given
Rayleigh density function. We have that
Z ∞ 2
z − z22
E(Z) = 2
e 2σ dz .
z=0 σ
z2
√ √
To evaluate this integral let v = so that z = 2σ v and dz = √σ v −1/2 dv to get
2σ2 2
Z ∞
1 2 −v σ
E(Z) = (2σ v)e √ v −1/2 dv
σ 2 v=0 2
√ Z ∞ 3 −1 −v
= 2σ v 2 e dv
0
√ 3 √ 1 1
= 2σΓ( ) = 2σ Γ( )
2 2 2
r
π
= σ.
2
Next we calculate E(Z 2 ). We find
Z ∞
1 z2
E(Z ) = 22
z 3 e− 2σ2 dz .
σ v=0
2Pmax + P0 + b(2Amax ) = 1 ,
The uniform distribution has a characteristic function that can be computed directly
Z b
itX 1
ζ(t) = E(e ) = eitx dx (20)
a b−a
itb
1 e − eita
= . (21)
b−a it
We could compute E(X) using the characteristic function ζ(t) for a uniform random variable.
Beginning this calculation we have
1 ∂ζ(t)
E(X) =
i ∂t t=0
1 1 1 itb ita 1 itb
ita
= (ibe − iae ) − 2 (e − e )
i b − a it it t=0
1 t(ibe − iae ) − (eitb − eita )
itb ita
= − .
b−a t2 t=0
To evaluate this expression requires the use of L’Hopital’s rule, and seems a somewhat
complicated route to compute E(X). The evaluation of E(X 2 ) would probably be even
more work when computed from the characteristic function. For this distribution, it is much
easier to compute the expectations directly. We have
Z b
b
1 1 x2 1
E(X) = x dx = = (a + b) .
a b−a b−a 2 a 2
Problem 2-14 (the distribution of X1 +X2 when X1 and X2 are correlated normals)
It would be nice to be able to evaluate this expression directly but it might be simpler to
determine the functional form of fZ (l) by taking the derivative of the above with respect to
l and then evaluating the resulting integral. We find
Z ∞
′
FZ (l) = f2 (l − x2 , x2 )dx2
x2 =−∞
Z ∞
1 1 (l − x2 )2 (l − x2 ) x2 x22
= p exp − − − 2ρ + .
2πσ1 σ2 1 − ρ2 x2 =−∞ 2(1 − ρ2 ) σ12 σ1 σ2 σ22
In the argument in the exponent we can expand everything in terms of x2 , complete the
square and write it as
2
1 1 2ρ 1 lσ2 (ρσ1 + σ2 ) l2
− + + x2 − − .
2(1 − ρ2 ) σ12 σ1 σ2 σ22 σ12 + 2ρσ1 σ2 + σ22 2(σ12 + 2ρσ1 σ2 + σ22 )
Using this we see that the value of FZ′ (l) is the integral of the exponential of this expression
over the entire real line. Since x2 goes from −∞ to +∞ the “shift” amount of σ2lσ+2ρσ 2 (ρσ1 +σ2 )
1 σ2 +σ2
2
1
in the quadratic above can translated away and we get
l2
Z ∞
′ 1 − 1 1 2ρ 1
FZ (l) = p e 2(σ 2 +2ρσ1 σ2 +σ 2 )
1 2 exp − 2) 2
+ + 2 x22 dx2 .
2πσ1 σ2 1 − ρ2 x2 =−∞ 2(1 − ρ σ1 σ1 σ2 σ2
To evaluate this expression recall that because of the normalization of the Gaussian proba-
R∞ 1 x2 √
bility density that −∞ e− 2 σ2 dx = 2πσ and the above becomes
l2
1 −
FZ′ (l) =√ p e 2(σ 2 +2ρσ1 σ2 +σ 2 )
1 2 .
2π σ12 + 2ρσ1 σ2 + σ22
Note that this expression is the probability density function of a normal random variable
with a mean value of zero and a variance given by σ12 + 2ρσ1 σ2 + σ22 . In the Mathematical
file chap 2 prob 14.nb some of the algebra for this problem is worked.
Chapter 3 (Linear Dynamic Systems)
Working through the block diagram presented in the text in figure 3.1-4 for this example we
find that the various state variables must be related as follows
εa − φg = δ v̇
δ ṗ = δv
δv
+ εg = φ̇ .
R
φ
If we solve for the derivative variable and assume a state vector given by δv we find
δp
φ
φ̇ = 0 1/R 0 δv + εg
δp
φ
δ ṗ = 0 1 0 δv
δp
φ
δ v̇ = −g 0 0 δv + εa .
δp
which when written as a first order matrix system is given by the books equation 3.1-13.
We are told that a solution to the continuous linear system with a time dependent companion
matrix F (t) or
ẋ(t) = F (t)x(t) + L(t)u(t) , (23)
is given by Z t
x(t) = Φ(t, t0 )x(t0 ) + Φ(t, τ )L(τ )u(τ )dτ . (24)
t0
To verify this take the derivative of x(t) with respect to time. We find
Z t
′ ′
x (t) = Φ (t, t0 )x(t0 ) + Φ′ (t, τ )L(τ )u(τ )dτ + Φ(t, t)L(t)u(t)
t0
Z t
= F (t)Φ(t, t0 )x(t0 ) + F (t)Φ(t, τ )L(τ )u(τ )dτ + L(t)u(t)
t0
Z t
= F (t) Φ(t, t0 )x(t0 ) + Φ(t, τ )L(τ )u(τ )dτ + L(t)u(t)
t0
= F (t)x(t) + L(t)u(t) .
showing that the expression given in Equation 24 is indeed a solution. Note that in the above
we have used the fact that for a fundamental solution Φ(t, t0 ) we have Φ′ (t, t0 ) = F (t)Φ(t, t0 ).
dξ
= Φ(t, t0 )−1 L(t)u(t) = Φ(t0 , t)L(t)u(t) ,
dt
since
Φ(t, t0 )−1 = Φ(t0 , t) . (27)
When we integrate the above expression we find that ξ(t) is given by
Z t
ξ(t) = ξ(t0 ) + Φ(t0 , τ )L(τ )u(τ )dτ .
t0
Putting this expression into Equation 25 we get for x(t) the following
Z t
x(t) = Φ(t, t0 )ξ(t0 ) + Φ(t, t0 )Φ(t0 , τ )L(τ )u(τ )dτ .
t0
Since the product of the two Φ functions inside the integral simplifies as
ẋ1 = x2
ẋ2 = 0 .
From the equation for x2 (t) by integrating we have that x2 (t) = x2 (0) where x2 (0) is the
random constant initial condition. It is worth repeating the point about the randomness of
x2 (0). The value of x2 (0) is not known beforehand but is assumed to be generated from
a distribution. Once the random value is generated and observed, the value of x2 (t) is
specified for all later time. Then using the first equation we have that ẋ1 = x2 (0) so that
x1 (t) = x2 (0)t + x1 (0), where x1 (0) is another random initial condition. Thus if we consider
x1 (0) to be the “mean value” of x1 (t) then
showing the quadratic growth of the variance expected with a random ramp noise model.
For the exponentially correlated random variables the state differential equation is given
by
ẋ = −βx + w ,
then from this representation the system function F is −β and if we assume w(t) is uncor-
related white noise so that E[w(t)w(τ )] = q(t)δ(t − τ ) then the linear variance equation
we will use the results from the book that translate from the continuous time model to the
discrete time model. Recall that the continuous noise produces a discrete noise term Γk Qk ΓTk
that is given by
Z tk+1
T
Γk Qk Γk = Φ(tk+1 , τ )G(τ )Q(τ )G(τ )T Φ(tk+1 , τ )T dτ . (32)
tk
For the continuous problem where the fundamental solution is given by Φ(t, t0 ) = e−β(t−t0 )
and G = 1 so we can evaluate Equation 32 taking Q(t) = q a constant as
Z tk+1
T
Γk Qk Γk = e−β(tk+1 −τ ) qe−β(tk+1 −τ ) dτ
tk
q
= (1 − e−2β(tk+1 −tk ) ) ,
2β
or the books equation 3.8-20.
In the discussion on time series analysis given in the text the focus is on ARMA(p,q) models
for the output process zk given an input process rk . This means that we assume that
our output, zk , can be expressed as a sum of p values of its past realizations (termed the
autoregressive part) and q values of the innovative input process rk (called the moving average
part). Mathematically this is expressed as
p q
X X
zk = bi zk−i + rk − ci rk−i . (33)
i=1 i=1
for some coefficients bi and ci . We can cast this formulation into a state-space representation
in several ways. The book recommends the following
rk−q
rk−q+1
..
.
rk−2
rk−1
xk = zk−p . (34)
zk−p+1
..
.
zk−2
zk−1
z̃k (−)
The first block of x is the moving average MA(q) part, the second block of x is the AR(p) part
and the third block (the single element z̃(−)) is discussed below. This third element in the
book is written as zk (−) but with an ∞ symbol above it. Since we observe the system output
zk which is determined from the p previous values zk−i for i = 1, 2, · · · p and the observed
zero mean random q previous system inputs rk−i for i = 1, 2, · · · q the state representation
above uses those previously observed values. The last element z̃k (−) is the best estimate of
the prediction of zk given the information thus far. Since we have not observed rk at this
point our prediction is given by the sum of the terms we have observed
p q
X X
z̃k = bi zk−i − ci rk−i . (35)
i=1 i=1
Note that from Equation 33 this is also equal to zk − rk . To derive the discrete time
propagation equation xk+1 = Φk xk we note that since
rk−q+1
rk−q+2
..
.
r
k−1
rk
xk+1 = zk−p+1 ,
zk−p+2
..
.
z
k−1
zk
z̃k+1 (−)
most of the variables in xk+1 are “shifted up” and can be directly found in xk . The ones
that are not are rk , zk , and z̃k+1 (−). The first, rk , we treat as a source of process noise. The
second, zk , we can obtain from z̃k (−) + rk the sum of a term in the state xk and the process
noise rk . The third we express as follows
p q
X X
z̃k+1 (−) = b1 zk − c1 rk + bi zk+1−i − ci rk+1−i
i=2 i=2
p q
X X
= b1 (zk − rk ) + b1 rk − c1 rk + bi zk+1−i − ci rk+1−i
i=2 i=2
p q
X X
= b1 z̃k (−) + (b1 − c1 )rk + bi zk+1−i − ci rk+1−i .
i=2 i=2
Taken together all of these considerations given the books equation 3.9-16.
Problem Solutions
is a solution to the linear variance equation. We can do this by first taking the derivative of
the given expression for P (t) with respect to t. We find
dP dΦ(t, t0 ) dΦ(t, t0 )T
= P (t0 )Φ(t, t0 )T + Φ(t, t0 )P (t0 )
dt dt dt
T T
+ Φ(t, t)G(t)Q(t)G (t)Φ(t, t)
Z t Z t
dΦ(t, τ ) T T dΦ(t, τ )T
+ G(τ )Q(τ )G(τ ) Φ(t, τ ) dτ + Φ(t, τ )G(τ )Q(τ )G(τ )T dτ .
t0 dt t0 dt
Recall that the fundamental solution Φ(t, t0 ) satisfies the following dΦ(t,t
dt
0)
= F (t)Φ(t, t0 ) and
that Φ(t, t) = I with I the identity matrix. With these expressions the right-hand-side of
dP
dt
then becomes
dP
= F (t)Φ(t, t0 )P (t0 )Φ(t, t0 )T + Φ(t, t0 )P (t0 )Φ(t, t0 )T F T (t) + G(t)Q(t)G(t)T
dt Z t Z t
T T
+ F (t)Φ(t, τ )G(τ )Q(τ )G(τ ) Φ(t, τ ) dτ + Φ(t, τ )G(τ )Q(τ )G(τ )T Φ(t, τ )T F (t)T dτ
t0 t0
Z t
T T T
= F (t) Φ(t, t0 )P (t0 )Φ(t, t0 ) + Φ(t, τ )G(τ )Q(τ )G(τ ) Φ(t, τ ) dτ
t0
Z t
T
+ Φ(t, t0 )P (t0)Φ(t, t0 ) + Φ(t, τ )G(τ )Q(τ )G(τ ) Φ(t, τ ) dτ F (t)T + G(t)Q(t)G(t)
T T T
(37)
t0
= F (t)P (t) + P (t)F (t) + G(t)Q(t)G(t)T ,
T
Consider the linear variance equation Ṗ (t) = F P + P F T + Q, then the solution P (t), to this
equation is given in problem 3.1 above in Equation 36. Since our system is time-invariant
we have Φ(t, τ ) = eF (t−τ ) and the expression for P (t) in this case becomes
Z t
F (t−t0 ) F T (t−t0 ) T
P (t) = e P (t0 )e + eF (t−τ ) QeF (t−τ ) dτ
t
Z 0t−t0
T T
= eF (t−t0 ) P (t0 )eF (t−t0 ) + eF v QeF v dv .
0
Where in the second line above we make the substitution v = t − τ in the integral. To make
our above expression for P solve the desired equation from the problem or F P + P F T = −Q,
we will consider the steady-state solution to the linear variance equation by taking t → ∞
in the above expression. In that case P (t) is a constant so Ṗ (t) = 0 and the linear variance
equation reduces to the desired equation F P + P F T = −Q. If our initial state x(t0 ) has no
uncertainty (P (t0 ) = 0) or if our linear system is stable we can assume that
Warning: I’m not entirely sure that I’ve worked this problem correctly since the answer
I propose seems too simple to structure a problem around. If anyone sees an error in my
solution or can offer verification that this is a correct result please email me.
To get an autocorrelation function of the functional form specified in φ̂(τ ) we note that we
can view it as the sum of three parts: σ 2 α12 , σ 2 σ22 cos(ωτ ), and σ 2 α32 e−β|τ | . We next consider
what type of random process give rise to each of these three autocorrelation functional forms.
From figure 3.8-3 in the book the system of a random constant or ẋ1 = 0 has a constant
autocorrelation function and we can take φ̂1 (τ ) = σ 2 α12 .
x2 (t) = A sin(ωt + θ) ,
1
with θ a uniform random variable with density fΘ (θ) = 2π on 0 ≤ θ ≤ 2π has an autocorre-
A2 A2 2 2
√
lation given by 2 cos(ωτ ). If we take 2 = σ α2 or A = 2σα2 then the signal x2 (t) has a
autocorrelation function φ̂2 (τ ) = σ 2 α22 cos(ωτ )
Finally the system
ẋ3 = −βx3 + w ,
where w(t) is white noise signal with E[w(t)w(τ )] = σ 2 α32 δ(t − τ ) has an autocorrelation
function given by φ̂3 (τ ) = σ 2 α32 e−β|τ | .
Warning: For this problem I was unable to get the result quoted in the book and was unable
to find an error in my work or assumptions below. If anyone sees anything wrong with what
I have done please email me, I would be interested in determining what the problem is.
Perhaps it is a typo in the books expression for P (t)?
The diagram in Figure 3.1 gives the following system for the variables x1 (t) and x2 (t)
ẋ1 = x2
ẋ2 = −βx2 + w ,
with E[w(t)w(τ )] = σ 2 δ(t − τ ). We have been able to write down the differential equation
for x2 (t) from the given expression for its autocorrelation φx2 x2 (τ ) = σ 2 e−β|τ | using the
discussion in thebook on exponentially correlated random variables. If we introduce the
x1
state vector x = then from the above we have a linear system for x given by
x2
d x1 (t) 0 1 x1 0
= + .
dt x2 (t) 0 −β x2 w
0 1
From which we see that our system matrix F is given by F = . With F defined
0 −β
in this way the linear variance equation given by 30 for this problem them becomes
ṗ11 ṗ12 0 1 p11 p12 p11 p12 0 0 0 0
= + +
ṗ12 ṗ22 0 −β p12 p22 p12 p22 1 −β 0 σ2
2p12 p22 − βp12
= .
−βp12 + p22 −2βp22 + σ 2
This gives the following system for p11 (t), p12 (t), and p22 (t)
ṗ11 = 2p12
ṗ12 = p22 − βp12
ṗ22 = −2βp22 + σ 2 .
Here we take initial conditions of p11 (0) = 0, p12 (0) = 0, and p22 (0) = σ 2 , meaning that
initially we have uncertainty only in the component x2 . Then we find a solution to p22 (t)
given by
σ2
p22 (t) = (1 + (2β − 1)e−2βt ) .
2β
which is not the same as the expression for p22 (t) given in the book which is simply σ 2 . In
the Mathematical file chap 3 prob 4.nb this and the differential equations for p11 (t) and
p12 (t) are solved. I find it strange that the (2, 2) component of P (t) is constant independent
of t while the other elements are not.
Note that since in this problem the system matrix F is time invariant
the fundamental
Ft 0 1
solution is given by Φ(t) = e . Since the matrix F in this case is , we can
0 −β
compute powers of F directly. We find
2 0 1 0 1 0 −β
F = =
0 −β 0 −β 0 β2
3 0 1 0 −β 0 β2
F = =
0 −β 0 β2 0 −β 3
4 0 1 0 β2 0 −β 3
F = =
0 −β 0 −β 3 0 β4
..
.
2n 0 −β 2n−1
F =
0 β 2n
2n+1 0 β 2n
F = .
0 −β 2n+1
Using these we find that
X∞ ∞
Ft 1 22 1 33 F 2k t2k X F 2k+1 t2k+1
Φ(t) = e = I + Ft + F t + F t + ··· = +
2 6 k=0
(2k!) k=0
(2k + 1)!
X∞ ∞
t2k 0 −β 2k−1 X t2k+1 0 β 2k
= I+
2k! 0 β 2k (2k + 1)! 0 −β 2k+1
k=1 k=0
" P # " P∞ t2k+1 β 2k+1 #
t2k β 2k
0 − β1 ∞ 0 1
= I+ P∞ k=1t2k β2k!
2k + β
Pk=0 (2k+1)!
∞ t2k+1 β 2k+1
0 k=1 2k!
0 − k=0 (2k+1)!
1
1
0 − β (cosh(tβ) − 1) 0 β sinh(tβ)
= I+ +
0 cosh(tβ) − 1 0 − sinh(tβ)
1 − β1 cosh(βt) + β1 sinh(βt)
= .
0 cosh(βt) − sinh(βt)
Problem 3-5 (is this system observable)
To begin with we express the given diagram figure 3-2 in terms of mathematical equations.
We then study the observability of these equations. To begin with from the given diagram
we see that the gyro vertical defection (ξ) error eξ has two terms, a bias term eξb and a
random term eξr and can be expressed as the sum of these two as
eξ = eξb + eξr .
Following the flow diagram from left to right we next see that the variable δv is given by
Z
−g(eξ + δp) = δv ,
ėξb = 0
ėpb = 0 .
Finally the velocity and position measurements zv and zp are related to the state variables
as
zv = δv + ev
zp = epb + ep + δp .
Thus if we take our state to be xT = epb δp δv eξb then our dynamical system in
companion form is given by
epb 0 0 0 0 0 epb 0
d
δp
=
1
R
δv 0 0 1/R 0 δp
=
0
+
dt δv −geξb − geξr − gδp 0 −g 0 −g δv −geξr . (38)
eξb 0 0 0 0 0 eξb 0
Part (a): If our measurement is zp and expressed in terms of the state vector x as
epb
δp
zp = 1 1 0 0
δv + ep ,
eξb
so the measurement sensitivity matrix H in this case is 1 1 0 0 . Since our state
vector is four dimensional the requirement that the state be observable requires that the
block matrix T
H F T H T (F T )2 H T (F T )3 H T , (39)
have rank equal to four. When we compute the above matrix using the matrices F and H
for this problem we find this matrix is given by
1 0 0 0
1 0 −g 0
R .
0 1 0 − Rg2
R
0 0 − Rg 0
This matrix has rank 3 and thus our system with only a position measurement is not ob-
servable.
Part (b): If we have both position and velocity measurements then our measurement vector
z is given by
epb
zp 1 1 0 0 δp
+ e p
z= = .
zv 0 0 1 0 δv ev
eξb
1 1 0 0
So the measurement sensitivity matrix H in this case is . When we compute
0 0 1 0
the observability matrix in Equation 39 above we find that is is given by
1 0 0 0 0 0 0 0
1 0 0 −g − g g2
R
0 0 R .
0 1 1 0 0 − g − g2 0
R R R
g2
0 0 0 −g − Rg 0 0 R
For observability this system this matrix must have a rank of 4. Since the first and second
row can be combined to yield the fourth row it can have rank at most three. It in fact has a
rank of 3 indicating that even with two measurements the given state is still unobservable.
Part (c): Tor this part if we are told that epb = 0, that is the position measurement has
no bias, our state
is now of dimension three i.e. has the representation given by xT =
δp δv eξb and for observability of this state we need to consider the matrix
T
H F T H T (F T )2 H T . (40)
To be observable this matrix must be of rank 3. This matrix is easy to compute since it is
the same observability matrix as in Part (b) above but without the last two columns or
1 0 0 0 0 0
1 0 0 −g − g 0
R .
0 1 1
0 0 − Rg
R
0 0 0 −g − Rg 0
This later matrix does have a rank of three and the resulting system is observable. To prevent
error in algebraic manipulations the matrix multiplications required above are performed in
the Mathematical file chap 3 prob 5.nb.
Problem 3-6 (an approximate solution)
0 1 0
We see that the matrix F in this case is F = 0 0 1 . From this matrix we can
0 0 −α
compute powers of F . We find
0 1 0 0 1 0 0 0 1
F 2 = 0 0 1 0 0 1 = 0 0 −α
0 0 −α 0 0 −α 0 0 α2
0 1 0 0 0 1 0 0 −α
F 3 = 0 0 1 0 0 −α = 0 0 α2
0 0 −α 0 0 α2 0 0 −α2
0 1 0 0 0 −α 0 0 α2
F 4 = 0 0 1 0 0 α2 = 0 0 −α3
0 0 −α 0 0 −α2 0 0 α4
..
.
0 0 (−1)n−2 αn−2
F n = 0 0 (−1)n−1 αn−1 .
0 0 (−1)n αn
Recall that the fundamental solution Φ(t, t0 ) for a linear time invariant system is given by
Φ(t, t0 ) = eF (t−t0 ) , which when we use the definition of the matrix exponential to evaluate
this expression we find
Lets take T = t − t0 and sum the components of these matrices. We find that
T2 3
1 T 2
− α T6 + · · ·
Φ(T ) = 0 1
2 2
T − α T2 + α6 T 3 + · · ·
α2 2 α3 3
0 0 1 − αT + 2 T − 6 T + · · ·
Note that we could explicitly evaluate each of these sums directly in terms of the exponential
function e· , if needed. For example, the (1, 3) element of Φ(T ) above can be written as
T2 T3 e−αT − 1 + αT
−α +··· = .
2 6 α2
If we take only the most significant term in each sum above we find that Φ(T ) is approxi-
mately equal to 2
1 T T2
Φ(T ) = 0 1 T ,
0 0 1
as we were to show.
where λ is a constant vector. The books discussion on controllability, when specified explicitly
for this system gives exactly the requirement stated. That is, the matrix Θ given by
Θ = λ Φλ Φ2 λ · · · Φn−2 λ Φn−1 λ , (41)
must have rank n for this system to be controllable. As a direct way obtain this result, we
recall that the definition of controllability is that given an arbitrary input x0 we can specify
a set of controls ui such that the state xn after n stages takes any desired value. To build
up an intuition for Equation 41 we find that on the first stage after one control u0 , has been
specified that we arrive at the state x1 via
x1 = Φx0 + λu0 .
On the second stage after the two controls (u0 and u1 ) have been specified we have the state
x2 via
x2 = Φx1 + λu1 = Φ(Φx0 + λu0 ) + λu1 = Φ2 x0 + Φλu0 + λu1 .
In the same way, on the third stage after three controls u0 , u1 , and u2 we have the state x3
via
x3 = Φ3 x0 + Φ2 λu0 + Φλu1 + λu2 .
Generalizing the above, at the nth stage we have used n controls and have the state xn in
terms of these controls given by
is invertible, then we can specify the n control values ui to get any state xn and vice versa. As
another way of stating this result is the following. Given an arbitrary initial state x0 and a
T
target state xn we can compute a vector u of controls u = u0 u1 u2 · · · un−2 un−1
such that we arrive at the target state xn in n steps by solving the system
un−1
un−2
un−3
λ Φλ Φ2 λ · · · Φn−2 λ Φn−1 λ .. = xn − Φn x0 .
.
u1
u0
for the vector u. This requires the invertibility of Θ, or equivalently that Θ must have rank
n, which is what we wanted to show.
For this problem I have denoted the value of the signal on the first “loopback” line x2 (t) (since
this signal will get multiplied by T12 ) and the value of the signal on the second “loopback” line
as x1 (t) (since this signal will get multiplied by T11 ). Under that convention, the differential
equation for the system given by figure 3-3 is then given by
1 1
ẋ1 (t) = − x1 (t) + x2 (t)
T1 T1
1 1
ẋ2 (t) = − x2 (t) + w(t) .
T2 T2
x1 (t)
If our system state is then the system above can be written in terms of matrices
x2 (t)
as
d x1 (t) − T11 T11 x1 (t) 0
= + 1 .
dt x2 (t) 0 − T12 x2 (t) T2
w(t)
From this expression we see that the F matrix for this problem is given by
− T11 T11
F = .
0 − T12
Since this is independent of time the fundamental solution Φ(t, t0 ) = eF (t−t0 ) and thus to
determine Φ(t, t0 ) we need to evaluate eF (t−t0 ) . Since this problem is time invariant without
loss of generality we can take t0 = 0. To compute Φ(t) = eF t we will solve two initial values
problems. The first will have initial conditions given by
x1 (0) 1
= ,
x2 (0) 0
and the second will have initial conditions given by
x1 (0) 0
= .
x2 (0) 1
The solutions to the first initial value problem become the first column of the matrix eF t
and the solution to the second initial value problem will become the second column of eF t .
When we do this we find that
" −t t #
T2 −T − t
e T1
e 1 − e T2
eF t = T1 −T2
− t
0 e T2
From the above expression we see that Φ(∆t) = eF ∆t is the same as the expression we are
asked to derive in the book. In the Mathematical file chap 3 prob 8.nb some of the algebra
for this problem is done.
From figure 3-4 in the book we see that as a system of differential equations we obtain
ẋn = xn−1 (t) + wn (t)
ẋn−1 = xn−2 (t) + wn−1 (t)
ẋn−2 = xn−3 (t) + wn−2 (t)
..
.
ẋ3 = x2 (t) + w3 (t)
ẋ2 = x1 (t) + w2 (t)
ẋ1 = w1 (t) ,
with each white noise term w i (t) has a spectral density given by qi δ(t). If we define the
x1
x2
x3
system state vector as x(t) = ... then our system above in matrix notation is given
xn−2
xn−1
xn
by
x1 0 0 0 x1 w1
x2 1 0 0
x2
w2
x3 ..
0 1 0 . x3 w3
d . . .
.
. = . . .
. . .
. . . .. + .. .
dt
xn−2 x
. . . 0 0 0 n−2 wn−2
xn−1 x wn−1
1 0 0 n−1
xn 0 1 0 xn wn
Thus the system matrix F in this case is the zero matrix with ones on the first sub-diagonal.
With the above F the linear variance equation Ṗ = F P + P F T + Q has a somewhat special
form. The product F P is a block row matrix composed of an initial row of zeros followed
by the first n − 1 rows of P . The product P F T is a block column matrix with the first block
a column of zeros and the second block the first n − 1 columns of the matrix P . With these
observations when we write out the linear variance equation for this problem with Ṗ (t) given
by
ṗ11 ṗ12 ṗ13 · · · ṗ1n
ṗ21 ṗ22 ṗ23 · · · ṗ2n
Ṗ (t) = ṗ31 ṗ32 ṗ33 · · · ṗ3n ,
.. ..
. .
ṗn1 ṗn2 ṗn3 · · · ṗnn
we get the following system
0 0 0 ··· 0
p11 p p · · · p
12 13 1n
p21 p p · · · p
Ṗ (t) = 22 23 2n
.. ..
. .
pn−1,1 pn−1,2 pn−1,3 · · · pn−1,n
0 p11 p12 · · · p1,n−1 q1 0 0 ··· 0
0 p21 p22 · · · p2,n−1 0 q2 0 · · · 0
+ 0 p31 p32 · · · p3,n−1 + 0 0 q3 · · · 0
.. .. .. ..
. . . .
0 pn1 pn2 · · · pn,n−1 0 0 0 · · · qn
q1 p11 p12 ··· p1,n−1
p11 p12 + p21 + q2 p13 + p22 ··· p1n + p2,n−1
p21 p22 + p31 p23 + p32 + q3 · · · p2n + p3,n−1
= .
.. ..
. .
pn−1,1 pn−1,2 + pn,1 pn−1,3 + pn2 · · · pn−1,n + pn,n−1 + qn
Looking at the above expressions we see that in component form we have that the (i, j)th
component of the product F P is
(F P )ij = pi−1,j (t) .
for i ≥ 2 and that the (i, j)th component of the product P F T is given by
(P F T )ij = pi,j−1(t) ,
for j ≥ 2. The differential equation for the function pij (t) is thus given by
ṗij (t) = pi−1,j + pi,j−1 + qi δij , (42)
for 2 ≤ i ≤ n and 2 ≤ j ≤ n.
If we look at the first row of these equations we have for the p11 (t) the following
q1 t2
ṗ12 = p11 = q1 t ⇒ p12 = .
2
The equation for p13 next gives
q1 t2 q1 t3
ṗ13 = p12 = ⇒ p13 = .
2 6
In general for the first row we have
q1 tj
p1j = for 1 ≤ j ≤ n (43)
j!
When we recall that p12 (t) = p21 (t) by the symmetry of P (t) the equation for p22 (t) is
ṗ22 = 2p12 + q2 = q1 t2 + q2 .
We will now use this expression in Equation 36 to derive an expression for pii (t). In this
problem here we have P (t0 ) = 0, G(t) = I, and Q(t) = Q where Q is a diagonal matrix.
Then we have
Z t Z t−t0
T
P (t) = Φ(t − τ )QΦ (t − τ )dτ = Φ(τ )QΦ(τ )T dτ .
t0 0
Since Q is diagonal the product Φ(t)Q is easy to compute since it is a scalar multiplier of
each column of Φ(t). That is we have
q1 0 0 0 ··· 0
q1 t q2 0 0 ··· 0
q1 2
t q t q 0 ··· 0
2 2 3
Φ(t)Q = q1 3
t q2 2
t q 3t q4 ··· 0 .
3! 2
.. .. .. .. .. ..
. . . . . .
q1 n−1 q2 n−2 q3 n−3 q4
(n−1)!
t (n−2)!
t (n−3)!
t (n−4)!
tn−4 · · · qn
From this we see that the elements of the nth row of Φ(t)Q are given by
q1 q2 qn−2 2
tn−1 , tn−2 , · · · , t , qn−1 t , qn .
(n − 1)! (n − 2)! 2!
The nth column of Φ(t)T is given by the nth row of Φ(t) and has elements given by
1 1 1
tn−1 , tn−2 , · · · , t2 , t , 1 .
(n − 1)! (n − 2)! 2!
When we take the dot product of these two vector we see that the the (n, n)th component
of P (t) is given by when we take t0 = 0
Z tXn Xn
2 qn+1−i 2i−2 qn+1−i t2i−1
pnn (t) = E[xnn (t) ] = 2
τ dτ = ,
0 i=1 (i − 1)! i=1
(i − 1)!2 (2i − 1)
as we were to show.
Note: I think there is an error in this problem. The error has to do with the additive
noise function n(t). The book states that the autocorrelation of n(t) is proportional to
a delta function, specifically φnn (τ ) = Nδ(t). I think what they meant to say was that
E[n(t)n(τ )] = N 2 δ(t − τ ) (note the square on N). In this later case I can show the stated
claim: that k = 1.0 when β = σ 2 = 1.0 and N = 12 .
From the given diagram in figure 3-6 for the unknowns c(t) and r(t) we find the following
system of differential equations
ċ(t) = kr(t) − kc(t) − kn(t)
ṙ(t) = −βr(t) + w(t) .
Note in deriving the given differential equation for r(t) we have used the discussion on
exponentially correlated random variables, since we are told its autocorrelation function is
φrr (τ ) = σ 2 e−β|τ | . In matrix from we find this system is given by
d c(t) −k k c(t) −k 0 n
= + .
dt r(t) 0 −β r(t) 0 1 w
−k k
From this expression we see that our system matrix F is given by F = and
0 −β
using the linear variance equation 30 we have
ṗ11 ṗ12 −k k p11 p12 p11 p12 −k 0
= +
ṗ12 ṗ22 0 −β p12 p22 p12 p22 k −β
2
−k 0 N 0 −k 0
+ 2
0 1 0 σ 0 1
−2kp11 + 2kp12 + k 2 N 2 −(k + β)p12 + kp22
= .
−(k + β)p12 + kp22 −2βp22 + σ 2
If we next restrict to the steady-state version of this, where take all time derivatives equal
to zero and solve for pij (t) to find
σ2 kσ 2 kσ 2 kN 2
p22 = , and p12 = , and p11 = + .
2β 2β(k + β) 2β(k + β) 2
With these expressions as the steady-state values for a matrix PSS we can compute the value
of the error variance, where our error function e(t) is defined as e(t) = c(t) − r(t). Writing
this error e(t) as the vector inner product
c(t)
e(t) = 1 −1 ,
r(t)
so that the variance of e(t) as a function of k can be computed using the matrix PSS as
" #
kσ2 kN 2 kσ2
2
1 2β(k+β)
+ 2 2β(k+β) 1
σe (k) = 1 −1 PSS = 1 −1 kσ2 σ2
−1 2β(k+β) 2β
−1
kN 2 kσ 2 σ2
= − + .
2 2β(k + β) 2β
Since the above expression is a function of k, then to pick k such that this expression is a
minimum we take the derivative with respect to k, set the resulting expression equal to zero,
and solve for k. When we do this we find that k is given by
σ
k = −β ± √ . (44)
N2
1
If we take β = σ 2 = 1.0, and N = 2
then from the above we see that k is given by
−3
k = −1 ± 2 = .
1
When we put the value of k = −3 into the second derivative of σe2 (k) we see that the value of
the second derivative is − 81 , which is negative indicating that this value of k gives a maximum
of σe2 (k). When we put the value of k = +1 into the second derivative of σe (k)2 we get a
value of 81 which is positive indicating that this value of k is a minimum as we were asked
to show. In the Mathematical file chap 3 prob 11.nb some of the algebra for this problem
is done.
Chapter 4 (Optimal Linear Filtering)
Here we explain how to evaluate the books equation 4.0-3 if we have k measurements zi of
the same quantity x. As k scalar equations we have zi = x + vi for i = 1, 2, · · · , k. This
same situation can be viewed as a vector of measurements z by introducing the measurement
sensitivity matrix H for this problem as
1 v1
1 v2
z = ... x + ... .
1 vk−1
1 vk
Thus the matrix H in this case is in fact a column vector. The least squares estimate of x
given z is given by equation 4.0-3 or
x̂ = (H T H)−1 H T z .
Pk
For the H given above we have H T H = k and H T z = i=1 zi so that our least squares
estimate x̂ is given by
k
1X
x̂ = zi ,
k i=1
which is the books equation 4.1-1.
For this chapter we will consider a certain specific forms for the estimator of the unknown
state x at the k-th time step after the k measurement zk has been observed. We denote this
estimate of x as x̂k (+), and the previous estimate of the state x before the measurement as
x̂k (−). With this notation in this section we want to study estimators that linearly combine
these two pieces of information in the following form
We have yet to determine the optimal choice for the yet undetermined coefficients Kk′ and
Kk . Since our kth measurement zk in terms of the true state xk and measurement noise vk
is given by
zk = Hk xk + vk , (46)
the above expression for x̂k (+) can be written as
Thus we have replaced the measurement zk with an expression in terms of the state xk . To
replace the value of x̂k (−) with something in terms of the state xk we introduce the the error
in our a priori estimate x̂k (−) as x̃k (−) defined as
Introducing the a posteriori error x̃k (+) = xk + x̃k (+) into the left-hand-side of Equation 48
gives the following
which is the books equation 4.2-2. If we assume that the a priori estimate x̂k (−) is unbiased
meaning that E[x̂k (−)] = xk or equivalently E[x̃k (−)] = 0 then to have our a posteriori
estimate, x̂k (+), also be unbiased requires that we take
Kk′ = I − Kk Hk , (50)
which is the books equation 4.2-3. Using this expression in Equation 45 gives
We will now determine Kk by minimizing an appropriate measure of the error in our new
estimate x̂k (+). If we define the value of Pk (−) to be the prior covariance Pk (−) ≡
E[x̃k (−)x̃k (−)T ] and a posterior covariance error Pk (+) defined in a similar manner namely
then with the value of Kk′ given above by Kk′ = I − Kk Hk we can use Equation 52 to derive
our posterior state estimate as
By expanding the terms on the right hand side of this expression and remembering that
E[vk x̃Tk (−)] = 0 gives
To evaluate the trace of Pk (+) we will use the quadratic outer product trace derivative
∂
trace[ABAT ] = 2AB , (55)
∂A
and the sandwich product trace derivative identity
∂
trace[BAC] = B T C T . (56)
∂A
Then to use these two identities when we rotate Kk to be in the middle of the matrix
products2 so that we can use the sandwich product trace derivative we have that trace[Pk (+)]
is given by
∂trace[Pk (+)]
= −2Pk (−)HkT + 2Kk Hk Pk (−)HkT + 2Kk Rk
∂Kk
= −2Pk (−)HkT + Kk (2Hk Pk (−)HkT + 2Rk ) . (57)
Now that we have an expression for Kk , alternative forms for the error covariance extrapola-
tion Pk (+) can be obtained through algebraic manipulations. Because every matrix depends
explicitly on k, in the following derivations we can drop the k subscript index from the
given P (±), H and R matrices. The subscript will be added again to the equations that
are the most significant. To derive alternative form for P (+) we expand the product on the
right-hand-side of Equation 53 to get
which is the books equation 4.2-16 a. When we recall the expression we found for Kk in
Equation 58 or K = P (−)H T [HP (−)H T + R]−1 by using the last line above we get
which is the books equation 4.2-16 b. This later form is most often used in computation.
To begin this subsection we want to show that inverses of the state covariance matrices are
“easy” to update after obtaining a measurement zk . Namely we want to show that
is true. To do this consider the product Pk (+)Pk (+)−1 , where Pk (+) is given by Equation 60
and Kk is given by Equation 58. Dropping the subscripts k to ease algebraic manipulation
we find
as we were to show.
To derive another form for Kk we can introduce the product Pk (+)−1 Pk (+) = I into the
expression for Kk provided in Equation 58 as
Kk = P (−)H T [HP (−)H T + R]−1
= [P (+)P (+)−1]P (−)H T [HP (−)H T + R]−1
= P (+)[P (−)−1 + H T R−1 H]P (−)H T [HP (−)H T + R]−1
= P (+)[H T + H T R−1 HP (−)H T ][HP (−)H T + R]−1
= P (+)H T [I + R−1 HP (−)H T ][HP (−)H T + R]−1
= P (+)H T R−1 [R + HP (−)H T ][HP (−)H T + R]−1
= Pk (+)HkT Rk−1 , (62)
which is the books equation 4.2-20.
In this subsection we present the algebra and further discussion on the Kalman filtering
examples presented in the book. We begin with the estimation of a constant x from a
series of uncorrelated corrupted noisy measurements. For this example, because there is no
dynamics the variance propagation equation is simple pk (−) = pk−1(+) and with Hk = 1 the
error covariance update equation due to the measurement zk is
pk (+) = (1 − kk )pk (−) .
For this scalar problem we then have kk = pk (−)(pk (−) + r0 )−1 and with the above we find
pk (+) given by
r0 pk (−) pk (−)
pk (+) = pk (−) − pk (−)[pk (−) + r0 ]−1 pk (−) = = .
pk (−) + r0 1 + pkr(−)
0
The iterative equation for pk (+) is then given by replacing pk (−) with pk−1 (+) in the above
expression to get
pk−1(+)
pk (+) = p (+)
.
1 + k−1r0
The above expression can be iterated to find the general solution with p0 (+) = p0 . We have
p0
p1 (+) =
1 + pr00
p0
p
p1 (+) 1+ r0
0
p0
p2 (+) = = p0 =
1+ p1 (+)
r0 1+ r0
p
1 + 2p
r0
0
1+ r0
0
p0
2p
p2 (+) 1+ r 0
0
p0
p3 (+) = = p0 =
1+ p2 (+)
r0 1+ r0 1 + 3p
r0
0
2p
1+ r 0
0
..
.
p0
pk (+) = . (63)
1 + kp
r0
0
Given this analytic form for pk (+) we can write the Kalman gain Kk with Equation 62 as
0 p
pk (+)
Kk = Pk (+)HkT Rk−1 = = r
.
r0 1 + kpr0
0
There is no process dynamics in this problem so when we need to propagate the state to the
time tk+1 and before the next measurement we have x̂k+1 (−) = x̂k (+).
x1
From the given state vector and measurement sensitivity matrix H we seek to deter-
x2
mine how a single measurement z modifies our uncertainty in the state. To do this we will
use the a posterior covariance update equation
To evaluate the right-hand-side of the above from the problem description we see that
p11 (−) p12 (−) 2
HP (−) = 0 1 = p12 (−) p22 (−) = σ12 σ22 .
p12 (−) p22 (−)
Using P (−)H T just computed the inner product like term HP (−)H T is given by
T
p12 (−)
HP (−)H = 0 1 = p22 (−) = σ22 .
p22 (−)
We multiply the two matrices on the right-hand-side and introduce the correlation ρ with
4
2 σ12
σ12 = σ1 σ2 ρ so that 2
= σ22 ρ2 .
σ1
We then get that P (+) equals
σ22 (1−ρ2 )+r2 r2
σ12 2
2
σ12 σ22 +r2
P (+) = σ2 +r2 ,
2 r2 r2
σ12 σ22 +r2
σ22 σ22 +r2
which is the expression in the book. Some special cases of this result are worth considering.
When the measurement z is perfect meaning that there is no estimation error we have r2 = 0
and P (+) becomes 2
σ1 (1 − ρ2 ) 0
P (+) = .
0 0
Thus we have no uncertainty in the value of x2 and we have maximally reduced our uncer-
tainly in x1 . Next if the measurement z gives no information about x1 their correlation is
zero. When we take ρ = 0 in the above we have
r2
σ12 2
σ12 2
P (+) = σ2 +r2 ,
r2
2
σ12 σ2 +r2
σ22 σ2r+r2
2
2 2
Thus the measurement z provides no information about x1 and using it does not reduce the
initial uncertainty in x1 so we have p11 (+) = σ12 . If the unknowns x1 and x2 are perfectly
correlated ρ = ±1 we have
σ12 σ2r+r
2
σ 2 r2
2 2 12 σ2 +r2 ,
2
P (+) = r2
2
σ12 σ2 +r2 σ2 σ2r+r
2 2
2
2 2
thus the measurement z provides the same amount of information for both x1 and x2 and
reduces their initial uncertainty by the same amount (by the fraction σ2r+r
2
2
).
2
If our a priori estimate of the state is zero x̂(−) = 0 then from the posteriori state update
equation x̂(+) = x̂(−) + Kk (z − H x̂k (−)) we have x̂(+) = Kk z. We compute Kk it the
normal way
Kk = Pk (−)HkT (Hk Pk (−)HkT + Rk )−1 = P (0)H T (HP (0)H T )−1 ,
where we have assumed that the measurement noise is zero “very small”. Given the pieces
from this problem we will now compute Kk . Using the given expression for P (0) and H we
find
1 e−r12 /d e−r13 /d 0 0 −r23 /d
0 1 0 −r12 /d −r23 /d 1 e
T
HP (0)H = σφ 2
e 1 e 1 0 = σφ
2
.
0 0 1 −r13 /d −r23 /d e−r23 /d 1
e e 1 0 1
The inverse of this matrix is given by
T −1 1 1 −e−r23 /d
(HP (0)H ) = 2 .
σφ (1 − e−2r23 /d ) −e−r23 /d 1
Using this as a factor we next find that the product K = P (0)H T (HP (0)H T )−1 given by
−r /d
e 12 − e−r13 /d−r23 /d e−r13 /d − e−r12 /d−r23 /d
1 .
1 − e−2r23 /d 0
1 − e−2r23 /d
0 1 − e−2r23 /d
φ̂1
From this matrix we can compute x̂(+). We find since x̂(+) = φ̂2 = Kz that
φ̂3
e−r12 /d − e−r13 /d−r23 /d e−r13 /d − e−r12 /d−r23 /d
1 −2r /d φ 2
x̂(+) = 1 − e 23 0
1 − e−2r23 /d −2r23 /d φ3
0 1−e
1
1−e−2r23 /d
(e−r12 /d − e−(r13 +r23 )/d )φ2 + (e−r13 /d − e−(r12 +r23 )/d )φ3
= φ2 ,
φ3
which duplicates the results given in the book. In the Mathematical file chap 4 2 3.nb we
perform some of the algebra not displayed in the above derivation.
The propagation from t = 0 to t = T the time of the first fix is done using the state error
covariance extrapolation equation or P (T − ) = Φ(T, 0)P (0)Φ(T, 0)T . Using the given matrix
Φ(T, 0) for this problem we can compute P (T − ) to find
2
1 T T2 σp2 0 0 1 0 0
P (T − ) = 0 1 T 0 σv2 0 T 1 0
T2
0 0 1 0 0 σa2 T 1
2 2 T2 2
2
σp T σv 2 σa 1 0 0
= 2
0 σv T σa 2 T 1 0
T2
0 0 σa2 2
T 1
2 4 3 2
σp + T 2 σv2 + T4 σa2 T σv2 + T2 σa2 T2 σa2
= σv2 + T 2 σa2 T σa2 ,
3
T σv2 + T2 σa2 (65)
T2 2 2 2
σ
2 a
T σa σa
which is the expression in the book. Note I have used the notation σp2 = δp2 (0), σv2 = δv 2 (0),
and σa2 = δa2 (0) since it is easier to type. After the measurement the new uncertainty P (T + )
is reduced from P (T − ) with
H T P (T − )H + R = p11 (T − ) + σp2 .
and
HP (T − ) = −1 0 0 P (T − ) = − p11 (T − ) p12 (T − ) p13 (T − ) ,
p11 (T − )
so that P (T − )H T = − p12 (T − ) . With these we find the matrix product given about
p13 (T − )
Since the total uncertainty after the fix P (T + ) is given by P (T − ) − M, with M computed
above we see that the uncertainty of the (1, 1) component becomes
and thus the first fix reduces the error in the position measurement to that of the sensor.
In this section using the discrete results we derive how the continuous covariance matrix P (t)
propagates due to the process dynamics and the continuous measurement stream. When we
use the approximations Φk → I + F ∆t, and Qk → GQGT ∆t in
we get
Pk+1(−) = Pk (+) + [F Pk (+) + Pk (+)F T + GQGT ]∆t + O(∆t2 ) . (68)
Recalling that after a measurement our state uncertainty is updated with
we can put this expression for Pk (+) into the right-hand-side of Equation 68
Pk+1 (−) = (I−Kk Hk )Pk (−)+[F (I−Kk Hk )Pk (−)+(I−Kk Hk )Pk (−)F T +GQGT ]∆t+O(∆t2 ) .
We can manipulate this into a first order difference as
  (1/∆t) Kk = (1/∆t) Pk(−) Hk^T (Hk Pk(−) Hk^T + Rk)^{-1}
            = Pk(−) Hk^T (Hk Pk(−) Hk^T ∆t + Rk ∆t)^{-1} .
With the discrete covariance matrix Rk converging to the spectral density matrix R(t) when
∆t → 0 we have Rk ∆t → R as ∆t → 0 and the term Hk Pk (−)HkT ∆t → 0 as ∆t → 0 and so
this term then has the following limit
  (1/∆t) Kk → P H^T R^{-1} .    (70)
In the same way the Kalman gain Kk by itself limits to zero since
  (1/∆t) Kk → P H^T R^{-1}   so that   Kk → ∆t P H^T R^{-1} → 0 ,
as ∆t → 0. Because of this in Equation 69 the two terms −F Kk Hk Pk (−) and −Kk Hk Pk (−)F T
vanish when we take the limit of ∆t tending to zero. Collecting all of these results we are
finally left with
Ṗ (t) = F P + P F T + GQGT − P H T R−1 HP , (71)
which is known as the matrix Riccati equation, and is the books equation 4.3-8.
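As a quick numerical sanity check of this limiting argument, the sketch below (a simple scalar system with hypothetical values f, g, q, h, r, not from the book) propagates the discrete covariance recursion with a small ∆t and compares it against a direct Euler integration of Equation 71; the two should agree to O(∆t).

```python
import numpy as np

# Hypothetical scalar system: x_dot = f x + g w, z = h x + v (values chosen for illustration).
f, g, q, h, r = -0.5, 1.0, 0.3, 1.0, 0.2
dt, n_steps = 1e-3, 5000
p_discrete = p_continuous = 1.0  # common initial covariance P(0)

for _ in range(n_steps):
    # Discrete Kalman recursion with Phi ~ I + f*dt, Q_k ~ g*q*g*dt, R_k ~ r/dt.
    phi, qk, rk = 1.0 + f * dt, g * q * g * dt, r / dt
    p_minus = phi * p_discrete * phi + qk            # extrapolation
    k_gain = p_minus * h / (h * p_minus * h + rk)    # Kalman gain
    p_discrete = (1.0 - k_gain * h) * p_minus        # measurement update

    # Euler step of the (here scalar) matrix Riccati equation, Equation 71.
    p_dot = 2 * f * p_continuous + g * q * g - (p_continuous * h) ** 2 / r
    p_continuous += p_dot * dt

print(p_discrete, p_continuous)  # agree to within O(dt)
```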
Having just developed the matrix Riccati equation which governs how the state covariance
matrix P (t) evolves we now perform the same procedure to determine the equation that
governs how the continuous state x̂(t) evolves. As before we begin with the corresponding
discrete state update equation
where we put x̂k (−) = Φk−1 x̂k−1 (+) into the above to get
which is the books equation 4.3-10. Next we use the discrete to continuous approximations
Φk−1 = I + F ∆t
Kk = P H T R−1 ∆t ,
to get
or
x̂k (+) − x̂k−1 (+)
= F x̂k−1 (+) + P H T R−1 (zk − Hk x̂k−1 (+)) + O(∆t) .
∆t
In the limit ∆t → 0 this becomes
˙
x̂(t) = F x̂(t) + P H T R−1 (z − H x̂(t)) , (72)
which is the continuous Kalman filter equation. Note that in the above the expression
P(t) is given by solving the matrix Riccati Equation 71 for P(t).
Often it is helpful to have a dynamical expression for the error in the continuous state
estimate x̂(t). To derive the differential equation for this error x̃(t) ≡ x̂(t) − x(t) we subtract
the equation governing the true system dynamics
dx
= F x + Gw ,
dt
from the continuous Kalman filter Equation 72 to get
  dx̃(t)/dt = F x̃(t) − Gw + P H^T R^{-1} (z − H(x̃(t) + x(t)))
            = F x̃(t) − Gw − P H^T R^{-1} H x̃(t) + P H^T R^{-1} v(t) ,

where we have used z(t) − H x(t) = v(t). When we group terms we have

  dx̃(t)/dt = (F − P H^T R^{-1} H) x̃(t) − Gw + P H^T R^{-1} v .
Recalling that K(t) can be expressed as P (t)H(t)T R(t)−1 this later expression becomes
dx̃
= (F − KH)x̃ − Gw + Kv , (73)
dt
which is the books equation 4.3-13.
If our process w(t) and measurement v(t) noise are correlated, meaning that E[w(t)v T (τ )] =
C(t)δ(t − τ ), then we can transform this problem into one where the new process noise term
is uncorrelated with the measurement noise. The algebra to do this is discussed here. Since
our measurement z(t) is given in terms of our state via z = Hx + v we can add a multiple
(say D) of the expression z − Hx − v = 0 to the system dynamics equation giving
dx(t)
= F x + Gw + D(z − Hx − v) = (F − DH)x + Dz + Gw − Dv . (74)
dt
If we take D to be given by the special value of D = GCR−1 then we claim that this new
process noise term Gw − Dv will be uncorrelated with the measurement noise v and results
in a system of the type we have previously been studying. To prove this, we compute the
cross-correlation of the new process noise term Gw − Dv with the old measurement noise
term v as
E[(Gw − Dv)v T ] = GE[wv T ] − DE[vv T ]
= GC − DR
= GC − GCR−1 R = 0 ,
as we desired to show. We next derive the continuous Kalman filter and the matrix Riccati
equation for the system given by Equation 74. To derive the continuous Kalman filter in
this case we will use the form given by Equation 72 but with a few modifications. The first
modification is that with a deterministic forcing in the system dynamics (as we have here in
the form of the Dz term) this forcing must also show up as a term on the right-hand-side of
Equation 72. The second modification is that the “F ” matrix in Equation 72 is now given
by F − DH. We thus obtain
˙
x̂(t) = (F − GCR−1 H)x̂(t) + P H T R−1 (z − H x̂(t)) + GCR−1 z
= F x̂(t) − (GCR−1 H + P H T R−1 H)x̂(t) + (P H T R−1 + GCR−1 )z
= F x̂ − (P H T + GC)R−1 H x̂ + (P H T + GC)R−1 z
= F x̂ + (P H T + GC)R−1 (z − H x̂) . (75)
Next we consider the matrix Riccati Equation 71 for this system. As before we need to
modify this slightly for the given system. The first modification is again that “F ” matrix
in Equation 71 becomes F − DH = F − GCR−1 H. The second modification is that the Q
matrix (representing the process noise covariance matrix) needs to correspond to the form
of the process noise we have here which has a form given by
Gw − Dv = Gw − GCR−1 v = G(w − CR−1 v) .
A noise vector of this form will have a covariance matrix given by
  Cov(G(w − C R^{-1} v)) = G Cov(w − C R^{-1} v) G^T
                         = G (Cov(w) + Cov(C R^{-1} v) − 2 Cov(w v^T) R^{-1} C) G^T
                         = G (Q + C R^{-1} R R^{-1} C − 2 C R^{-1} C) G^T
                         = G (Q − C R^{-1} C) G^T .

This latter expression will replace the expression GQG^T in Equation 71. When we make
these two substitutions into the matrix Riccati equation and perform some manipulations we
find
Ṗ (t) = (F − DH)P + P (F − DH)T + G(Q − CR−1 C)GT − P H T R−1 HP
= F P + P F T + GQGT − GCR−1 HP − P H T R−1 CGT − GCR−1 CGT − P H T R−1 HP
= F P + P F T + GQGT − GCR−1 (HP + CGT ) − P H T R−1 (HP + CGT )
= F P + P F T + GQGT − (GCR−1 + P H T R−1 )(HP + CGT )
= F P + P F T + GQGT − (GC + P H T )R−1 (CG + P H T )T
= F P + P F T + GQGT − (GC + P H T )R−1 RR−1 (CG + P H T )T . (76)
If we define
K(t) ≡ (P H T + GC)R−1 , (77)
then we see that Equation 75 and 76 become
This result agrees with the ones presented in the book when given a system with correlated
process and measurement noises.
where P (0) ≈ +∞ can be taken to mean that we have no a priori information. We will
transform this expression into a differential equation for P(t)^{-1}. To do this recall that since
d/dt(P^{-1}) = −P^{-1} Ṗ P^{-1}, solving for Ṗ(t) gives Ṗ = −P (d/dt P^{-1}) P, and using this expression
in the left-hand-side of Equation 78 we get

  −P (d/dt P^{-1}) P = F P + P F^T + GQG^T − P H^T R^{-1} H P ,

or by multiplying by P^{-1} once on the left and once on the right and then negating we get

  d/dt P^{-1} = −P^{-1} F − F^T P^{-1} − P^{-1} GQG^T P^{-1} + H^T R^{-1} H
             = −F^T P^{-1} − P^{-1} F − P^{-1} GQG^T P^{-1} + H^T R^{-1} H ,    (79)

where the last equation simply changes the order of the terms in the equation above it.
The initial condition P (0) ≈ +∞ transforms into the initial condition that P −1(0) = 0. If
we assume our system has no process noise then the term GQGT vanishes and this is the
books equation 4.4-10. We can solve this equation as in Problem 3.1 on Page 28. Since the
fundamental solution for the system with transition matrix −F^T is given by e^{-F^T (t-τ)} we
see that the solution for P(t)^{-1} is given by

  P(t)^{-1} = ∫_0^t e^{-F^T (t-τ)} H(τ)^T R^{-1}(τ) H(τ) e^{-F (t-τ)} dτ
           = ∫_0^t e^{F^T (τ-t)} H(τ)^T R^{-1}(τ) H(τ) e^{F (τ-t)} dτ
           = ∫_0^t Φ(τ, t)^T H(τ)^T R^{-1}(τ) H(τ) Φ(τ, t) dτ ,
which is the books equation 4.4-11. In the above Φ(t, τ ) is the transition matrix correspond-
ing to F .
Notes on correlated measurement errors: continuous time when R is singular
z1 = ż − Ez , (80)
then we see that we can write z1 in terms of our original state x, the original process noise
w, and the unexplained measurement noise w1 as
z1 = ż − Ez
d
= (Hx + v) − E(Hx + v)
dt
= Ḣx + H ẋ + v̇ − EHx − Ev
= Ḣx + H(F x + Gw) + (Ev + w1 ) − EHx − Ev
= (Ḣ + HF − EH)x + HGw + w1
= H1 x + v1 .
For this measurement equation for z1 we can now calculate its measurement covariance
matrix R1 as E[v1 v1T ]. Since w and w1 are uncorrelated E[ww1T ] = 0 and we find
which is the books equation 4.5.7. The cross correlation matrix C1 is then computed as
which is the books equation 4.5.8. The equivalent problem which we have just formulated is
then expressed as
ẋ = F x + Gw with w ∼ N(0, Q)
z1 = H1 x + v1 ,
with the matrices R1 and C1 given by Equations 81 and 82 respectively. For this continuous
problem, since we have correlated process and measurement noise using Equation 77 we find
K1 given by
which is the books equation 4.5-9. Then the continuous Kalman filter is given by
Ṗ = F P + P F T + GQGT − K1 R1 K1T ,
which is the books equation 4.5-11. We can avoid having to differentiate our measurement z
which is seemingly required by the ż term on the right-hand-side in Equation 84 by instead
taking our state to be x(t) − K1(t)z(t). Using this expression, when we put Equation 84 into
the derivative d/dt (x̂(t) − K1(t)z(t)) we get

  d/dt (x̂(t) − K1(t)z(t)) = x̂˙ − K̇1 z − K1 ż
= (F x̂ + K1 (ż − Ez − H1 x̂)) − K̇1 z − K1 ż
= F x̂ − K1 Ez − K1 H1 x̂ − K̇1 z
= (F − K1 H1 )x̂ − K1 Ez − K̇1 z .
When we have Rk ≡ 0, the update equation for the error covariance matrix is given by
If we multiply this by Hk on the left and HkT on the right we find that
showing that a linear combination of elements from Pk (+) is zero so a linear combination of
states is known exactly.
In this section we will demonstrate an algebraic transformation that will allow the solution
of the Riccati equation in the case where it has constant coefficients
Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
with P (t0 ) given. To show the transformation we will use to solve the equation above we let
λ = Py , (85)
ẏ = −F T y + H T R−1 HP y . (86)
λ̇ = Ṗ y + P ẏ
= (F P + P F T + GQGT − P H T R−1 HP )y + P (−F T y + H T R−1 HP y)
= F P y + GQGT y
= F λ + GQGT y , (87)
which is the books equation 4.6-5. As a matrix system with the vector of unknowns given by
[ y; λ ], Equations 86 and 87 combine to give

  d/dt [ y ]   [ −F^T     H^T R^{-1} H ] [ y ]
       [ λ ] = [ GQG^T    F            ] [ λ ] ,    (88)

which is the books equation 4.6-6. Since this is a time-invariant linear dynamical system for
the vector of unknowns [ y; λ ], let Φ = Φ(t0 + τ, t0) be its transition matrix, such that when
written in block form

  [ y(t0 + τ) ]   [ Φyy(τ)  Φyλ(τ) ] [ y(t0) ]
  [ λ(t0 + τ) ] = [ Φλy(τ)  Φλλ(τ) ] [ λ(t0) ] .
If we compute components of the product above we find
y(t0 + τ ) = Φyy (τ )y(t0 ) + Φyλ (τ )λ(t0 ) = Φyy (τ )y(t0 ) + Φyλ (τ )P (t0 )y(t0) (89)
λ(t0 + τ ) = Φλy (τ )y(t0 ) + Φλλ (τ )λ(t0 ) = Φλy (τ )y(t0 ) + Φλλ (τ )P (t0 )y(t0 ) . (90)
We can replace the left-hand-side of Equation 90 with λ(t0 + τ ) = P (t0 + τ )y(t0 + τ ) and
then use Equation 89 to evaluate y(t0 + τ ) to get
[Φλy (τ ) + Φλλ (τ )P (t0 )]y(t0 ) = λ(t0 + τ ) = P (t0 + τ )y(t0 + τ )
= P (t0 + τ )[Φyy (τ ) + Φyλ (τ )P (t0 )]y(t0 ) .
If we “cancel” y(t0) from both sides of this expression and solve for P(t0 + τ) we get
P (t0 + τ ) = [Φλy (τ ) + Φλλ (τ )P (t0 )][Φyy (τ ) + Φyλ (τ )P (t0 )]−1 , (91)
which is the books equation 4.6-8.
As a special case we can use the above result to solve the linear variance equation
Ṗ = F P + P F T + GQGT with P (t0 ) given .
Since the linear variance equation has H^T R^{-1} H = 0 the system in Equation 88 is given by

  d/dt [ y ]   [ −F^T    0 ] [ y ]
       [ λ ] = [ GQG^T   F ] [ λ ] .    (92)
In the above the equation for y decouples from that of λ and we have ẏ = −F^T y so that
the fundamental solution for y is Φyy(τ) = e^{-F^T τ} and y(t) at any time is then given using
that as y(t) = Φyy(t) y0. The differential equation for λ now has the known function y(t) as
a forcing term and is given by

  λ̇ = GQG^T y + F λ = F λ + GQG^T Φyy(t) y0 .
As forcing functions like GQGT Φ(t)y0 are not important in determining fundamental solu-
tions, the fundamental solution for λ(t) is eF t . Next, to see that Φyλ (τ ) = 0 we can note that
for the matrix given in Equation 92 the block matrix fundamental solution Φ(τ) is given by

  Φ(τ) = exp( [ −F^T  0; GQG^T  F ] τ ) = Σ_{k=0}^∞ (τ^k / k!) [ −F^T  0; GQG^T  F ]^k .

Each term in the above sum is of the form [ −F^T  0; GQG^T  F ]^k, which is the k-th power of a
block lower triangular matrix and thus is also block lower triangular. Thus the block (1, 2)
term in each component of the sum is 0. Since each component in the sum has a zero (1, 2)
term the (1, 2) term for the block fundamental solution Φ(τ) will also be zero. Thus we
conclude that Φyλ(τ) = 0. Using this fact, Equation 91 then gives
so
P (t0 + τ ) = Φλy (τ )Φλλ (τ )T + Φλλ (τ )P (t0 )Φλλ (τ )T , (93)
which is the books equation 4.6-10 and represents a way to solve the linear variance equation.
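As a numerical illustration of Equation 93 (a sketch with arbitrary example matrices, not values taken from the book), the snippet below builds the block matrix of Equation 92, exponentiates it to obtain the blocks Φλy and Φλλ, and checks that Equation 93 reproduces the covariance obtained by directly integrating the linear variance equation.

```python
import numpy as np
from scipy.linalg import expm

# Example (hypothetical) matrices for the linear variance equation.
F = np.array([[0.0, 1.0], [-2.0, -0.7]])
GQGT = np.array([[0.0, 0.0], [0.0, 0.5]])   # G Q G^T
P0 = np.eye(2)
tau = 1.3

# Block matrix from Equation 92 and its matrix exponential (the block transition matrix).
n = F.shape[0]
A = np.block([[-F.T, np.zeros((n, n))], [GQGT, F]])
Phi = expm(A * tau)
Phi_ly, Phi_ll = Phi[n:, :n], Phi[n:, n:]   # the Φλy(τ) and Φλλ(τ) blocks

# Equation 93: P(t0 + τ) = Φλy Φλλ^T + Φλλ P(t0) Φλλ^T.
P_eq93 = Phi_ly @ Phi_ll.T + Phi_ll @ P0 @ Phi_ll.T

# Direct Euler integration of Pdot = F P + P F^T + G Q G^T for comparison.
P = P0.copy()
dt = 1e-4
for _ in range(int(tau / dt)):
    P = P + (F @ P + P @ F.T + GQGT) * dt

print(np.max(np.abs(P_eq93 - P)))   # small (of order dt)
```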
Problem Solutions
Part (a): If the two measurements are sequential we first observe z1 and then observe z2 .
Assuming no prior information is equivalent to the maximum likelihood estimation method
which for Gaussian densities is given by
when there is only one measurement z1 = x + v1 we see that H1 = 1 and R1 = σ12 so the
above gives x̂1 (+) = z1 . To update the new uncertainty we use
Next, since we are estimating a constant the system dynamics propagate x̂1 (+) to x̂2 (−) as
Part (b): When the two measurements are instead processed simultaneously as a single vector
measurement, each is of the form zi = x + vi for i = 1, 2 and our measurement vector z1 is given by
  z1 = [ z1 ]   [ 1 ]     [ v1 ]
       [ z2 ] = [ 1 ] x + [ v2 ] ,

so H1 = [ 1; 1 ] and the probability density for the measurement error vector v1 is given by
p(v1) = N(0, diag(σ1^2, σ2^2)). Since we have no a priori information we are required to use
weighted-least squares which has an update given by

  x̂1(+) = (H1^T R1^{-1} H1)^{-1} H1^T R1^{-1} z1 ,

with the matrix R1^{-1} = diag(1/σ1^2, 1/σ2^2). With the form of H1 above we compute

  H1^T R1^{-1} H1 = 1/σ1^2 + 1/σ2^2 .

Using this we can compute the new uncertainty matrix P1(+) as

  P1(+)^{-1} = P1^{-1}(−) + H1^T R1^{-1} H1 = 0 + 1/σ1^2 + 1/σ2^2 .

Thus P1(+) is given by

  P1(+) = (1/σ1^2 + 1/σ2^2)^{-1} ,

the same as the books equation 1.0-6. Finally we have x̂1(+) after this combined measurement
z1 given by

  x̂1(+) = (1/σ1^2 + 1/σ2^2)^{-1} (z1/σ1^2 + z2/σ2^2)
        = (σ2^2 z1 + σ1^2 z2) / (σ1^2 + σ2^2) ,

the same as the books equation 1.0-7.
the same as the books equation 1.0-7.
Problem 4-2 (additional Kalman filtering examples)
For this problem we want to rework Problems 1-1 and 1-3 using the Kalman filtering frame-
work developed in this chapter. Problem 1-1 has to do with two measurements zi of a
constant x that are correlated with a correlation coefficient ρ. Problem 1-3 has to with three
independent measurements.
Problem 1-1: If we assume that our measurements of the constant x of the form zi = x + vi
for i = 1, 2 are correlated, then the noise vector v takes the form v ∼ N(0, R) with

  R = [ σ1^2      ρ σ1 σ2 ]
      [ ρ σ1 σ2   σ2^2    ] .

Thus our measurement vector z1 is given by

  z1 = [ 1 ] x + v1 ,
       [ 1 ]

thus H1 = [ 1; 1 ] and R1 = R the matrix above. If we assume we have no a priori information
on the value of x then our estimate of our state x after the measurement z1 is given by the
weighted least squares estimate
x̂1 (+) = (H1T R1−1 H1 )−1 H1T R1−1 z1 , (94)
and the new uncertainty, P1 (+), can be computed as
P1 (+)−1 = P1 (−)−1 + H1T R1−1 H1 = H1T R1−1 H1 ,
since P1 (−)−1 = 0. From the given form for R1 we have that its inverse R−1 is given by
  R1^{-1} = 1/(σ1^2 σ2^2 (1 − ρ^2)) [ σ2^2       −ρ σ1 σ2 ]
                                    [ −ρ σ1 σ2   σ1^2     ] .

So that the product R1^{-1} H1 is given by

  R1^{-1} H1 = 1/(σ1^2 σ2^2 (1 − ρ^2)) [ σ2^2 − ρ σ1 σ2  ]
                                       [ σ1^2 − ρ σ1 σ2  ] ,

and the product H1^T R1^{-1} H1 is given by

  H1^T R1^{-1} H1 = (σ1^2 + σ2^2 − 2 ρ σ1 σ2) / (σ1^2 σ2^2 (1 − ρ^2)) .

Thus since this product H1^T R1^{-1} H1 equals P1(+)^{-1} we have that

  P1(+) = σ1^2 σ2^2 (1 − ρ^2) / (σ1^2 + σ2^2 − 2 ρ σ1 σ2) ,
which is the same result given in the book for the uncertainty of this system. Next using
these subresults in Equation 94 we compute x̂1(+) as

  x̂1(+) = [σ1^2 σ2^2 (1 − ρ^2) / (σ1^2 + σ2^2 − 2 ρ σ1 σ2)] · [1/(σ1^2 σ2^2 (1 − ρ^2))] [ (σ2^2 − ρ σ1 σ2) z1 + (σ1^2 − ρ σ1 σ2) z2 ]
        = (σ2^2 − ρ σ1 σ2)/(σ1^2 + σ2^2 − 2 ρ σ1 σ2) z1 + (σ1^2 − ρ σ1 σ2)/(σ1^2 + σ2^2 − 2 ρ σ1 σ2) z2 ,
which also agrees with the solution found in Problem 1.1.
Problem 1-3: In the case when we have three independent measurements, zi , of an unknown
scalar x, our measurement vector z1 is given by

  z1 = [ 1 ] x + v1 ,
       [ 1 ]
       [ 1 ]

with v1 ∼ N(0, R1) and R1 = diag(σ1^2, σ2^2, σ3^2). From this formulation we see that
R1^{-1} = diag(1/σ1^2, 1/σ2^2, 1/σ3^2) and the measurement sensitivity matrix H1 = [ 1; 1; 1 ].
Again assuming no a priori information
we have
P1 (+)−1 = P1 (−)−1 + H1T R1−1 H1 = H1T R1−1 H1 ,
and
x̂1 (+) = (H1T R1−1 H1 )−1 H1T R1−1 z1 .
With the above matrices we have H1^T R1^{-1} = [ 1/σ1^2  1/σ2^2  1/σ3^2 ], and thus
H1^T R1^{-1} H1 = 1/σ1^2 + 1/σ2^2 + 1/σ3^2, so that

  P1(+) = (1/σ1^2 + 1/σ2^2 + 1/σ3^2)^{-1} ,

and

  x̂1(+) = (1/σ1^2 + 1/σ2^2 + 1/σ3^2)^{-1} (z1/σ1^2 + z2/σ2^2 + z3/σ3^2) ,
which is the same as the results found in problem 1-3.
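A quick numerical check of these closed forms (a sketch with made-up numbers) is shown below: it forms the batch weighted-least-squares estimate with H1 = [1; 1; 1] and compares it to the expressions just derived for P1(+) and x̂1(+).

```python
import numpy as np

# Hypothetical measurement standard deviations and measurements of a constant x.
sig = np.array([1.0, 2.0, 0.5])
z = np.array([10.2, 9.5, 10.1])

H = np.ones((3, 1))
Rinv = np.diag(1.0 / sig**2)

# Batch weighted-least-squares estimate (no a priori information).
P_plus = np.linalg.inv(H.T @ Rinv @ H)          # P1(+)
x_hat = (P_plus @ H.T @ Rinv @ z).item()        # x̂1(+)

# Closed-form expressions derived above.
P_closed = 1.0 / np.sum(1.0 / sig**2)
x_closed = P_closed * np.sum(z / sig**2)

print(P_plus.item(), P_closed)   # identical
print(x_hat, x_closed)           # identical
```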
Part (a): For this part of the problem our measurements are zi = x0 e−ti + vi with vi ∼
N(0, σi2 ) to be taken simultaneously. Since we are told that a priori we have no prior
information on the initial concentration x0 we will take P (−)−1 = 0 and the initial estimate
x̂(+) is the maximum likelihood estimate, which in this case because the two measurements
have different uncertainties is given by the weighted-least-squares estimate
  H^T R^{-1} H = e^{-2 t1}/σ1^2 + e^{-2 t2}/σ2^2 ,

and

  H^T R^{-1} z = (e^{-t1}/σ1^2) z1 + (e^{-t2}/σ2^2) z2 .

Thus x̂(+) is given by

  x̂(+) = (e^{-2 t1}/σ1^2 + e^{-2 t2}/σ2^2)^{-1} [ (e^{-t1}/σ1^2) z1 + (e^{-t2}/σ2^2) z2 ]
       = (σ2^2 e^{-t1} z1 + σ1^2 e^{-t2} z2) / (σ2^2 e^{-2 t1} + σ1^2 e^{-2 t2}) ,

and

  P(+)^{-1} = P(−)^{-1} + H^T R^{-1} H = e^{-2 t1}/σ1^2 + e^{-2 t2}/σ2^2 ,

so P(+) is given by

  P(+) = σ1^2 σ2^2 / (σ2^2 e^{-2 t1} + σ1^2 e^{-2 t2}) = (e^{-2 t1}/σ1^2 + e^{-2 t2}/σ2^2)^{-1} .
Part (b): If the measurements are now assumed to be obtained sequentially then since
z1 = e−t1 x0 + v1 is the first one we have H1 = e−t1 and R1 = σ12 . Since we have no a priori
information on x0 the state update equation is still the maximum likelihood equation, and
applying the information from just this one measurement gives as our new estimate of x0
  x̂1(+) = (H1^T R1^{-1} H1)^{-1} H1^T R1^{-1} z1 = (e^{-2 t1}/σ1^2)^{-1} (e^{-t1}/σ1^2) z1 = e^{t1} z1 ,

and

  P1(+)^{-1} = P1(−)^{-1} + H1^T R1^{-1} H1 = e^{-2 t1}/σ1^2   ⇒   P1(+) = σ1^2 e^{2 t1} .
σ1
Now before we can incorporate the second equation we must perform state and covariance
extrapolation
x̂2 (−) = x̂1 (+) = et1 z1 and P2 (−) = P1 (+) = σ12 e2t1 .
K2 = P2 (−)H2T [H2 P2 (−)H2T + R2 ]−1 = σ12 e2t1 · e−t2 [e−2t2 σ12 e2t1 + σ22 ]−1
= σ12 e2t1 −t2 [σ12 e−2t2 +2t1 + σ22 ]−1 .
Then
and
  P2(+) = (I − K2 H2) P2(−) = ( 1 − σ1^2 e^{2 t1 − 2 t2}/(σ1^2 e^{-2 t2 + 2 t1} + σ2^2) ) σ1^2 e^{2 t1}
        = σ1^2 σ2^2 / (σ2^2 e^{-2 t1} + σ1^2 e^{-2 t2}) ,

which is the same a posteriori variance that we obtained in Part (a) when the two measurements were
processed together.
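The sketch below (with hypothetical values for t1, t2, σ1, σ2) processes the two measurements z_i = x0 e^{-t_i} + v_i both as a batch weighted-least-squares problem and sequentially as above, and confirms that the final variance P2(+) and estimate agree with the batch results.

```python
import numpy as np

# Hypothetical problem values.
t1, t2 = 0.5, 1.5
s1, s2 = 0.3, 0.4          # measurement standard deviations σ1, σ2
z1, z2 = 4.1, 1.4          # the two noisy measurements

# Batch (simultaneous) weighted least squares.
H = np.array([[np.exp(-t1)], [np.exp(-t2)]])
Rinv = np.diag([1 / s1**2, 1 / s2**2])
P_batch = 1.0 / (H.T @ Rinv @ H).item()
x_batch = P_batch * (H.T @ Rinv @ np.array([z1, z2])).item()

# Sequential processing: first measurement alone, then a Kalman update with the second.
P1 = s1**2 * np.exp(2 * t1)          # P1(+)
x1 = np.exp(t1) * z1                 # x̂1(+)
H2 = np.exp(-t2)
K2 = P1 * H2 / (H2 * P1 * H2 + s2**2)
x2 = x1 + K2 * (z2 - H2 * x1)        # x̂2(+)
P2 = (1 - K2 * H2) * P1              # P2(+)

print(P_batch, P2)   # equal
print(x_batch, x2)   # equal
```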
After having appended a second measurement the same weighted least squares solution for
x̂ will hold, but with the larger matrices H1 , R1 , and z1 . That is we have
Since the new measurement is uncorrelated with the others R1 is block diagonal so its inverse
is also block diagonal

  R1^{-1} = [ R0^{-1}  0      ]
            [ 0        R^{-1} ] ,

and the measurement sensitivity matrix H1 also has a block form given by

  H1^T = [ H0^T  H^T ] .
The problem states that we should define P (−)−1 as H0T R0−1 H0 so if we define P (+)−1 in
the same way as H1T R1−1 H1 then from Equation 97 we have shown that
Next lets compute x̂(+) using Equation 96. We first see that
  H1^T R1^{-1} z1 = [ H0^T  H^T ] [ R0^{-1}  0      ] [ z0 ]
                                  [ 0        R^{-1} ] [ z  ]  =  H0^T R0^{-1} z0 + H^T R^{-1} z ,
so that
x̂(+) = (H1T R1−1 H1 )−1 [H0T R0−1 z0 + H T R−1 z] = P (+)[H0T R0−1 z0 + H T R−1 z]
= P (+)H0T R0−1 z0 + P (+)H T R−1 z , (98)
using the definition that (H1T R1−1 H1 )−1 = P (+). Now P (+) is given in terms of P (−) as
with
which is the desired expression. In the above simplifications we have used the fact that
Then to find the value of x̂ that minimizes this expression we take the derivative of J with
respect to x̂, set the result equal to zero and then solve for x̂. This derivative is given by
  ∂J/∂x̂ = 2 P(−)^{-1} x̂ − P(−)^{-1} x(−) − P(−)^{-1} x(−)
          − H^T R^{-1} z − H^T R^{-1} z + 2 H^T R^{-1} H x̂
        = 2 [ P(−)^{-1} + H^T R^{-1} H ] x̂ − 2 P(−)^{-1} x(−) − 2 H^T R^{-1} z .
Where to take the derivative above we have used Equations 311 and 312
  ∂(a^T x)/∂x = a = ∂(x^T a)/∂x ,

and the quadratic derivative Equation 312,

  ∂(x^T A x)/∂x = (A + A^T) x .    (101)
Setting the expression ∂J/∂x̂ equal to zero and solving for x̂, which we denote x̂(+), we get
as the solution to the expressed minimization problem. Motivated by the expression above
if we define P (+) as
P (+) = (P (−)−1 + H T R−1 H)−1 ,
then the inverse of P (+) is given directly
and for the first term in the above we can use the matrix inversion lemma as in the previous
problem to write P (+) as given by Equation 100 to obtain
as we were to show.
As an alternative way to show the desired expressions for x̂(+) and P (+) that does not use
the matrix inversion lemma, we can take the expression for J and write everything in terms
of the estimated vs. prior difference or x̃ = x̂ − x(−). We find that
As before we will want to take the derivative of J with respect to x̂, set the result equal to
zero and solve for x̂. With the above expression since x(−) is a constant, the derivative with
respect to x̂ is equal to the derivative with respect to the expression x̂ − x(−). If we define
this expression as x̃, we see that J in terms of x̃ can be written as
Thus converting the minimum we just found for x̃ into the variable x̂ with x̃ = x̂ − x(−) we
have that
x̂ = x(−) + (P (−)−1 + H T R−1 H)−1 H T R−1 (z − Hx(−)) ,
the same expression as in Equation 102.
as a function of x. When we take the derivative of this expression and set the result equal
to zero we find that
  ∂J/∂x = −H^T R^{-1} z − H^T R^{-1} z + 2 H^T R^{-1} H x = 0 .
Solving for x we find that
x = (H T R−1 H)−1 (H T R−1 z) , (104)
for the maximal likelihood solution. This is the same expression we found in Problem 4.4
above and thus the analysis from that problem is valid here. Namely, if we receive another
measurement z2 , with a measurement sensitivity matrix H2 , and measurement covariance
matrix R2 the recursive update of our state estimate x̂ is given by
where x1 is the estimate of x before receiving the measurement z2 given by Equation 104
with H = H1 , R = R1 , z = z1 , and x = x1 .
where we have used the fact that p(z|x) = p(z − Hx|x) = p(v).
Part (c): Note that from the problem statement we have that x ∼ N(x̂(−), P (−)), from
Part (a) of this problem we have that z ∼ N(Hx(−), HP (−)H T + R), and from Problem 4-6
above that p(z|x) can be expressed using Equation 103. Thus we can compute p(x|z) using
each of these components and obtain the functional form presented in the book.
  p(x|z) = c exp{ −(1/2) [ (x − x̂(−))^T P(−)^{-1} (x − x̂(−)) + (z − Hx)^T R^{-1} (z − Hx)
                           − (z − H x̂(−))^T [H P(−) H^T + R]^{-1} (z − H x̂(−)) ] } .
In the above exponential one can see the three major terms that come from p(x), p(z|x),
and p(z) respectively.
Part (d): Since p(x|z) is another Gaussian density, but with an as yet undetermined mean
and covariance, lets denote this unknown mean and covariance by x̂(+) and P (+), and
emphasize this by setting the term in the exponential above equal to
  −(1/2) (x − x̂(+))^T P(+)^{-1} (x − x̂(+)) .
This gives the equation (after we multiply by −2 on both sides)
Equating the quadratic and linear terms in x above we see that P(+)^{-1} must be given by
Now we want to find the value of K such that our objective function J = trace(Ṗ ) is a
minimum. To find this value of K lets first compute the expression for trace(Ṗ ). Using
Equation 111 we find
J = trace(Ṗ )
= trace(F P ) + trace(P F T ) + trace(GQGT )
− trace(KHP ) − trace(P H T K T ) + trace(KRK T ) .
Next we need to evaluate ∂J/∂K. To do this we will recall the following matrix derivative facts
  ∂/∂A trace(B A C) = B^T C^T , so that    (112)
  ∂/∂A trace(A C) = I^T C^T = C^T ,
  ∂/∂A trace(C A^T) = ∂/∂A trace(A C^T) = I^T C = C , and
  ∂/∂A trace(A B A^T) = 2 A B .    (113)
Using these results we find that ∂J/∂K is given by

  ∂J/∂K = −P H^T − P H^T + 2 K R .
Setting this derivative equal to zero and solving for K gives
K = P H T R−1 , (114)
as we were to show.
Warning: I’m not sure exactly what this problem was asking or how to answer it. If anyone
has an idea of the type of solution requested please contact me.
That the estimator m̂k is unbiased can be seen by taking the expectation of its expression

  E[m̂k] = (1/k) Σ_{i=1}^k E[xi] = (1/k) Σ_{i=1}^k m = m ,

where we have used the fact that the expectation of any given sample is the same as the
population mean or E[xi] = m.
To show that the estimate of σ 2 is an unbiased estimator of the population variance we will
assume that the samples xi are drawn from a Gaussian distribution with a population mean
m and variance σ 2 . Then it can be shown that σ̂k2 as defined in this problem is related to a
chi-squared distribution in that the random variable

  (k − 1) σ̂k^2 / σ^2 ,

is distributed as a χ^2 random variable with k − 1 degrees of freedom [2, 3]. Recalling that if
the random variable, say X, is χ^2 with k − 1 degrees of freedom then the expectation of X
is

  E[X] = k − 1 ,    (115)

so that since (k − 1) σ̂k^2 / σ^2 is also χ^2 with k − 1 degrees of freedom

  E[ (k − 1) σ̂k^2 / σ^2 ] = k − 1 .

From this we conclude that E[σ̂k^2] = σ^2, showing that σ̂k^2 is an unbiased estimator of the
population variance.
To derive a recursive form for an estimator of the mean m, note that from the given expression
for m̂k we have

  m̂k = (1/k) Σ_{i=1}^k xi = ((k − 1)/k) ( (1/(k − 1)) Σ_{i=1}^{k−1} xi ) + (1/k) xk
      = ((k − 1)/k) m̂_{k−1} + (1/k) xk ,    (116)
showing how given m̂k−1 and xk we can obtain the estimate m̂k .
To derive a recursive form for an estimator for the standard deviation σ 2 we follow much of
the same manipulations we did for the mean. We find
  σ̂k^2 = (1/(k − 1)) Σ_{i=1}^k (xi − m̂k)^2
       = (1/(k − 1)) Σ_{i=1}^k (xi^2 − 2 xi m̂k + m̂k^2)
       = (1/(k − 1)) Σ_{i=1}^k xi^2 − (2/(k − 1)) m̂k Σ_{i=1}^k xi + (k/(k − 1)) m̂k^2
       = (1/(k − 1)) Σ_{i=1}^k xi^2 − (k/(k − 1)) m̂k^2    (117)
       = (1/(k − 1)) ( Σ_{i=1}^{k−1} xi^2 + xk^2 ) − (k/(k − 1)) m̂k^2 .    (118)
Lets now decrease the index k in Equation 117 so that we can derive an expression for
Σ_{i=1}^{k−1} xi^2 (note the upper limit on this summation of k − 1). We find

  σ̂_{k−1}^2 = (1/(k − 2)) Σ_{i=1}^{k−1} xi^2 − ((k − 1)/(k − 2)) m̂_{k−1}^2 ,

so that the sum Σ_{i=1}^{k−1} xi^2 is given by

  Σ_{i=1}^{k−1} xi^2 = (k − 2) σ̂_{k−1}^2 + (k − 1) m̂_{k−1}^2 .
The above expression, when substituted into Equation 118, gives a recursive representation for σ̂k^2 that
requires storing the old estimates σ̂_{k−1}^2, m̂_{k−1} and computing the most recent estimate of the
mean m̂k. Since we can express m̂k recursively in terms of m̂_{k−1} via Equation 116, if desired we
could put this expression into the above and derive an alternative recursive expression for σ̂k^2 that
only involves the "new" measurement xk and the old estimates σ̂_{k−1}^2, m̂_{k−1}; that is, it does not
depend on m̂k.
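A small numerical check of these recursions (a sketch, not from the book) is shown below; it updates m̂k and σ̂k^2 one sample at a time using Equations 116 and 117/118 and compares the result against the usual batch formulas.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=200)   # synthetic samples

m_hat = x[0]          # m̂1 = x1
sum_sq = x[0] ** 2    # running sum of x_i^2 needed by Equation 118

for k in range(2, len(x) + 1):
    xk = x[k - 1]
    m_hat = ((k - 1) / k) * m_hat + xk / k              # Equation 116
    sum_sq += xk ** 2
    s2_hat = (sum_sq - k * m_hat ** 2) / (k - 1)        # Equation 117/118

print(m_hat, np.mean(x))            # agree
print(s2_hat, np.var(x, ddof=1))    # agree
```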
  dz/dx = −(Q(x) + 2 y1 R(x)) z(x) − R(x) .

The latter is a first order equation for z(x) which we can solve by quadrature. For the specific
problem given here, the initial solution y1 needed to proceed will be the steady-state or a
constant solution. When we take ṗ = 0 and denote the solution by p∞ in Equation 120 we
have

  −(b^2/r) p∞^2 + 2 a p∞ + q = 0 .

When we solve for p∞ in the above quadratic we find

  p∞ = (a r / b^2) ( 1 ± sqrt(1 + b^2 q/(a^2 r)) ) .    (123)

Since p∞ > 0 we must take the positive sign in the above expression. Next we let
z(t) = 1/(p(t) − p∞) and since P(t) = q, Q(t) = 2a, and R(t) = −b^2/r in the general Riccati solution
formulation, we find the equation for z(t) given by

  z'(t) = −( 2a + 2 p∞ (−b^2/r) ) z − (−b^2/r)
        = −( 2a − (2 b^2/r) p∞ ) z + b^2/r
        = ( 2 sqrt(b^2 q + a^2 r) / sqrt(r) ) z + b^2/r ,

when we put in p∞ and simplify. Consider the coefficient of z(t) in the above equation

  2 sqrt(b^2 q/r + a^2) = 2 sqrt( a^2 (1 + b^2 q/(a^2 r)) ) = 2 |a| sqrt(1 + b^2 q/(a^2 r)) ≡ 2β ,

where we have defined β in the last equality. Thus for z(t) we need to solve

  z'(t) = 2 β z(t) + b^2/r .
When we do this for z(0) = z0 we find

  z(t) = ( z0 + b^2/(2 β r) ) e^{2βt} − b^2/(2 β r) .

From this latter expression we see that as t → ∞ we have p(t) → p∞ as it should. Since p(0) = p0,
when we let t = 0 we find that p0 = p∞ + 1/z0 or z0 = 1/(p0 − p∞). Thus

  p(t) = p∞ + 2 β r (p0 − p∞) / ( b^2 (p0 − p∞)(−1 + e^{2βt}) + 2 r β e^{2βt} ) .    (124)

Dividing by r on the top and the bottom of this expression and noting that

  (r/b^2)(a^2 − β^2) = (r/b^2)( a^2 − a^2 (1 + b^2 q/(a^2 r)) ) = (r/b^2)( −b^2 q/r ) = −q ,
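Before moving on, a quick sanity check of Equation 124 (a sketch with arbitrary positive values for a, b, q, r, p0, not from the book): integrate the scalar Riccati equation ṗ = −(b^2/r) p^2 + 2 a p + q numerically and compare it to the closed form.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Arbitrary example constants (all positive) for the scalar Riccati equation.
a, b, q, r, p0 = 0.7, 1.2, 0.5, 0.8, 3.0

p_inf = (a * r / b**2) * (1 + np.sqrt(1 + b**2 * q / (a**2 * r)))
beta = abs(a) * np.sqrt(1 + b**2 * q / (a**2 * r))

def p_closed(t):
    """Closed-form solution from Equation 124."""
    e = np.exp(2 * beta * t)
    return p_inf + 2 * beta * r * (p0 - p_inf) / (b**2 * (p0 - p_inf) * (e - 1) + 2 * r * beta * e)

sol = solve_ivp(lambda t, p: -(b**2 / r) * p**2 + 2 * a * p + q,
                (0.0, 5.0), [p0], t_eval=np.linspace(0, 5, 11), rtol=1e-10, atol=1e-12)

print(np.max(np.abs(sol.y[0] - p_closed(sol.t))))   # small
print(p_closed(5.0), p_inf)                          # p(t) approaches p_inf
```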
The given diagram from the book for this problem implies that ẋ1 = w and

  x2 = ∫ (x1 − β x2) dτ ,

or, written as a differential system,

  ẋ1 = w
  ẋ2 = x1 − β x2 ,
Ṗ = F P + P F T + GQGT − P H T R−1 HP .
Now using this expression in the (1, 2) component gives for p11

  p11 = β p12 + (α^2/r) p12 p22
      = ±β sqrt(q r)/α + (α^2/r) ( ±sqrt(q r)/α ) ( (r/α^2) ( −β + sqrt(β^2 + 2 α sqrt(q/r)) ) )
      = ± ( sqrt(q r)/α ) sqrt( β^2 + 2 α sqrt(q/r) ) .
As p11 > 0 we must take the positive sign in the above expression, which means that we
know the complete expression for p12 given by Equation 126. Now to compute K(∞) we
note that
as we were to show. In the Mathematica file chap 4 prob 12.nb we perform some of the
algebra not displayed in the above derivation.
Problem 4-13 (the optimal filter for detecting a sine wave in white noise)
Warning: I was not able to solve this problem. If anyone has an attempted solution I would
be interested in seeing it.
As a continuous system from the problem description the output x(t) of our integrator would
satisfy
ẋ = w ,
where w(t) is a white noise process. If we discretize this process we get the discrete system
of
xk+1 = xk + wk ,
where now we have that wk ∼ N(0, q∆). We are told that the observation equation is given
by
  zk = xk + vk .

With no a priori information we have P0(+) = +∞, and to compute the a posteriori
covariance matrix after each measurement in this problem we will use
From the equations above we can make the association to the standard problem that Φk = I,
Gk = I, Qk = q∆, Hk = 1, and Rk = r0 .
Part (a): In this case we are told to assume that q∆ ≫ r0. Now we have P0(+) = +∞, since
there is no a priori information, and we get P1(−) from

  P1(−) = P0(+) + q∆ = +∞ .

Then P1(+)^{-1} = P1(−)^{-1} + 1/r0 = 1/r0, so P1(+) = r0, and the next extrapolation gives

  P2(−) = P1(+) + q∆ = r0 + q∆ .
For the updated variance after the second measurement P2 (+) we get
  P2(+)^{-1} = P2(−)^{-1} + 1/r0 = 1/(r0 + q∆) + 1/r0 ≈ 1/r0   ⇒   P2(+) = r0 ,
since q∆ ≫ r0. Now P3(−) is given by

  P3(−) = P2(+) + q∆ = r0 + q∆ ,

and the pattern continues, so that for every k we have

  Pk(+) = r0 ,

and

  Pk+1(−) = r0 + q∆ ≈ q∆ ,
when q∆ ≫ r0 . This corresponds to the case where the object we are filtering has very
large process noise, so that at each timestep when we propagate between measurements we
effectively "lose" the object. The measurements are considerably more accurate so when
we take a measurement we have a much tighter uncertainty around the tracked object.
Part (b): For this part we assume that r0 ≫ q∆ and follow the outline as in the previous
part. Again we start with P0 (+) = +∞, since there is no a priori information. Then we get
P1 (−) from
P1 (−) = P0 (+) + q∆ = +∞ .
Then P1(+) is given by

  P1(+)^{-1} = P1(−)^{-1} + 1/r0 = 1/r0   ⇒   P1(+) = r0 .
Then for P2 (−) we get
P2 (−) = P1 (+) + q∆ = r0 + q∆ ≈ r0 .
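The sketch below (with made-up numbers for q∆ and r0) simply iterates this scalar covariance recursion in both limiting regimes, which makes the behaviour described in Parts (a) and (b) easy to see.

```python
def covariance_sequence(q_dt, r0, n_meas=6, p0=1e12):
    """Iterate P(+) -> P(-) for x_{k+1} = x_k + w_k, z_k = x_k + v_k (large p0 ~ no prior)."""
    p_plus, out = p0, []
    for _ in range(n_meas):
        p_minus = p_plus + q_dt                      # extrapolation: P_k(-) = P_{k-1}(+) + q*dt
        p_plus = 1.0 / (1.0 / p_minus + 1.0 / r0)    # measurement update
        out.append((p_minus, p_plus))
    return out

# Part (a) regime: q*dt >> r0, so P_k(+) stays near r0 and P_{k+1}(-) is near q*dt.
print(covariance_sequence(q_dt=100.0, r0=0.1)[-1])

# Part (b) regime: r0 >> q*dt, the posterior variance keeps shrinking with each measurement.
print([round(p, 4) for _, p in covariance_sequence(q_dt=0.01, r0=10.0)])
```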
For this problem we are told to take as our state the vector x = [ δp(0); δv(0); δa(0) ]. This is different
from the state vector specified in example 4.2-4 in that this state is a constant vector of
initial conditions, while example 4.2-4 in the book used the time dependent state given by
x(t) = [ δp(t); δv(t); δa(t) ], where each function in the state is the appropriate integral of the one
below it. The constant state for this problem then satisfies the null dynamics given by
dx/dt = 0, which has the fundamental solution Φ = I. We assume that our initial uncertainty
in these constants before the measurement at time T is given by

  P(0) = [ p11(0)  0        0       ]   [ E[δp^2(0)]  0            0           ]
         [ 0       p22(0)   0       ] = [ 0           E[δv^2(0)]   0           ]
         [ 0       0        p33(0)  ]   [ 0           0            E[δa^2(0)]  ] .
The discrete state and covariance extrapolation equations from the time 0 to T^-, the time
just before the first measurement fix, give

  x̂(T^-) = I x̂(0) = [ δp(0); δv(0); δa(0) ] ,

and P(T^-) = P(0). Because our state x is independent of time the given measurement z(t)
requires that the measurement sensitivity matrix H now be a function of time because

  z(t) = −δp(t) + ep = −[ 1  t  t^2/2 ] [ δp(0); δv(0); δa(0) ] + ep ,

so the measurement sensitivity matrix is given by

  H(t) = −[ 1  t  t^2/2 ] .
With this definition of H we next compute some of the factors needed in computing the a
posteriori state and covariance update equations. One expression we require is
  H(T) P(T^-) H(T)^T = p11(0) + T^2 p22(0) + (T^4/4) p33(0) .
From this point on to simplify the notation we will write p11 (0) as p11 dropping the argument
of zero (we follow the same convention for the other expressions). To evaluate P (T + ) we
could use Equation 66 with R = σp2 or we can use the inverse update formulation given by
Equation 61 which gives
The following algebra, required to derive the expression quoted in the text, is rather tedious
and can be skipped if desired. First we evaluate the factor I + V T A−1 U and find
  I + V^T A^{-1} U = I + V^T P(T^-) U
                   = 1 + (1/σp^2) [ 1  T  T^2/2 ] P(T^-) [ 1; T; T^2/2 ]
                   = 1 + (1/σp^2) ( p11 + p22 T^2 + (T^4/4) p33 ) .

Note that from the definition of ∆a(T) given we can simplify the denominator above as

  σp^2 + p11 + p22 T^2 + (T^4/4) p33 = p11 p22 p33 ∆a(T) .    (129)
4
When we use this in M we get

  M = 1/(p11 p22 p33 ∆a(T)) [ p11^2           p11 p22 T        p11 p33 T^2/2 ]
                            [ p22 p11 T       p22^2 T^2        p22 p33 T^3/2 ]
                            [ p33 p11 T^2/2   p33 p22 T^3/2    p33^2 T^4/4   ] .

Then the expression for P(T^+) looks like

  P(T^+) = P(T^-) − M
         = 1/∆a(T) [ ∆a(T) p11   0            0          ]
                   [ 0           ∆a(T) p22    0          ]
                   [ 0           0            ∆a(T) p33  ]

         − 1/∆a(T) [ p11/(p22 p33)   T/p33                  T^2/(2 p22)           ]
                   [ T/p33           p22 T^2/(p11 p33)      T^3/(2 p11)           ]
                   [ T^2/(2 p22)     T^3/(2 p11)            p33 T^4/(4 p11 p22)   ] .

So ∆a(T) P(T^+) then looks like

  [ ∆a(T) p11 − p11/(p22 p33)   −T/p33                              −T^2/(2 p22)                       ]
  [ −T/p33                      ∆a(T) p22 − p22 T^2/(p11 p33)       −T^3/(2 p11)                       ]
  [ −T^2/(2 p22)                −T^3/(2 p11)                        ∆a(T) p33 − p33 T^4/(4 p11 p22)    ] .
We have one more simplification (that we don't fully document) before we have shown the
requested result: we take each of the diagonal elements in the expression for ∆a(T) P(T^+) and
simplify using the definition of ∆a(T) given in Equation 129. For example the (1, 1) element
becomes

  (1/(p22 p33)) ( σp^2 + p11 + p22 T^2 + (T^4/4) p33 ) − p11/(p22 p33) = σp^2/(p22 p33) + T^2/p33 + T^4/(4 p22) ,
which is the quoted expression in the book. Simplifying the other diagonal terms gives rise
to the desired expression for P (T + ).
The single-star fix: We are told that our first measurement gives us an estimate of θ1 and
θ2 . Lets assume (for this part and the next) that there is no dynamics in this problem and we
just want to observe how the single star and double star fixes change our state uncertainty
estimates. For the single star fix the measurement vector z is related to the state by
  z = [ z1 ]   [ 1  0  0 ] [ θ1 ]   [ v1 ]
      [ z2 ] = [ 0  1  0 ] [ θ2 ] + [ v2 ] ,
                           [ θ3 ]

with the measurement noise vector [ v1; v2 ] ∼ N(0, diag(σ1^2, σ2^2)). Then we update the a
priori covariance to account for this measurement using the standard a posteriori update
equation
P (+) = P (−) − P (−)H T (HP (−)H T + R)−1 HP (−) . (130)
To evaluate this we find that the product H P(−) is given by

  H P(−) = [ 1  0  0 ] [ σ^2  0    0   ]   [ σ^2  0    0 ]
           [ 0  1  0 ] [ 0    σ^2  0   ] = [ 0    σ^2  0 ] .
                       [ 0    0    σ^2 ]

The matrix P(−) H^T is the transpose of this. Next we compute H P(−) H^T and find

  H P(−) H^T = [ σ^2  0   ]
               [ 0    σ^2 ] .
With this we have

  (H P(−) H^T + R)^{-1} = [ σ^2 + σ1^2   0          ]^{-1}   [ 1/(σ^2 + σ1^2)   0               ]
                          [ 0            σ^2 + σ2^2 ]      = [ 0                1/(σ^2 + σ2^2)  ] ,
The two-star fix: For the two-star fix we follow the one-star fix with another pair of
measurements of the angles θ1 and θ3 . In this case the second measurement vector has the
form

  z = [ 1  0  0 ] [ θ1 ]   [ v1 ]
      [ 0  0  1 ] [ θ2 ] + [ v3 ] ,
                  [ θ3 ]

with

  [ v1; v3 ] ∼ N(0, diag(σ1^2, σ3^2)) .

Thus in this case we have that H = [ 1  0  0; 0  0  1 ] and R = diag(σ1^2, σ3^2). Performing the
same manipulations as above but with these different H and R matrices and using the value
computed for P (+) in Equation 131 for the value of P (−) in Equation 130 (the second
measurement directly follows the first) we find that P (+) after both measurements is given
by σ2 σ2
1
0 0 σ2
2
2
2σ +σ1 σ2 σ2 2
1
0 0
P (+) =
0 2
σ2 +σ22
0 ≈
0 σ2 0 ,
2
0 0
2
σ σ32
2
0 0 σ32
2
σ +σ 3
2
when use σ ≫ σi2 to simplify terms like
σ 2 σi2 σ 2 σi2 σi2
≈ = .
nσ 2 + σi2 nσ 2 n
From the above we find trace(P (+)) to be given by
σ12
trace(P (+)) = + σ22 + σ32 ,
2
as we were to show.
In the Mathematica file chap 4 prob 16.nb we perform some of the algebra not displayed
in the above derivation.
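The sketch below (with arbitrary numbers satisfying σ^2 ≫ σi^2) applies the covariance update of Equation 130 twice, once for the single-star fix and once for the two-star fix, and checks that the resulting trace is close to σ1^2/2 + σ2^2 + σ3^2.

```python
import numpy as np

def covariance_update(P, H, R):
    """A posteriori covariance update, Equation 130."""
    S = H @ P @ H.T + R
    return P - P @ H.T @ np.linalg.solve(S, H @ P)

sigma2 = 1e6                         # large a priori variance σ^2
s1, s2, s3 = 0.1, 0.2, 0.3           # star-sensor variances σ1^2, σ2^2, σ3^2 (hypothetical)
P = sigma2 * np.eye(3)

# Single-star fix: measures θ1 and θ2.
P = covariance_update(P, np.array([[1.0, 0, 0], [0, 1.0, 0]]), np.diag([s1, s2]))
# Two-star fix: a second pair of measurements, of θ1 and θ3.
P = covariance_update(P, np.array([[1.0, 0, 0], [0, 0, 1.0]]), np.diag([s1, s3]))

print(np.trace(P), s1 / 2 + s2 + s3)   # nearly equal when σ^2 >> σi^2
```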
Problem 4-17 (a polynomial tracking filter)
The zero forcing dynamic equation ẍ = 0, when we introduce the state x = [ x1; x2 ] defined
by x1(t) = x(t) and x2(t) = ẋ(t), has components that satisfy
To derive the requested expression for Pk+1(+) we sequentially perform error covariance
extrapolation followed by error covariance updates until we get to the discrete time tk+1 =
(k + 1)τ . The error covariance extrapolation equation is explicitly given by
and is subsequently followed by an error covariance update step which can be written as
  Pk+1(+)^{-1} = Pk+1(−)^{-1} + Hk+1^T Rk+1^{-1} Hk+1
              = Pk+1(−)^{-1} + (1/r) [ 1  0 ]
                                     [ 0  0 ] .    (133)
Once we have computed the matrix Pk+1 (+) we can compute Kk+1 via Equation 62 which
in this case becomes
  Kk+1 = Pk+1(+) Hk+1^T Rk+1^{-1} = (1/r) Pk+1(+) [ 1 ]
                                                  [ 0 ] .    (134)
While we have not derived the quoted expression for Pk+1 (+) if we assume that it is correct
and compute Kk+1 with the above formula we get
  Kk+1 = (1/r) Pk+1(+) [ 1; 0 ] = (1/r) ( 2r / ((k + 1)(k + 2)) ) [ 2k + 1 ]
                                                                  [ 3/τ    ]
       = ( 2 / ((k + 1)(k + 2)) ) [ 2k + 1 ]
                                  [ 3/τ    ] ,
which is the expression given. Thus to finish this problem it remains to derive the expression
for Pk+1 (+). From Equations 132 and 133 we can combine these two expressions into one to
get
  Pk+1(+)^{-1} = ( Φ(τ, 0) Pk(+) Φ(τ, 0)^T )^{-1} + (1/r) [ 1  0 ]
                                                          [ 0  0 ]

              = ( [ 1  τ ] Pk(+) [ 1  0 ] )^{-1} + (1/r) [ 1  0 ]
                ( [ 0  1 ]       [ τ  1 ] )              [ 0  0 ] .    (135)
Following the hint in the book if we begin these iterations with P0 (+) = 1ǫ I we find that
  P1(+) = 1/(1 + ǫr + τ^2) [ r (1 + τ^2)   r τ      ]
                           [ r τ           r + 1/ǫ  ] .
We cannot take the limit of this as ǫ → 0 so we iterate Equation 135 another time to get an
expression for P2 (+). When we do this we find that we can set ǫ = 0 and get a well defined
expression. The resulting expression is
  P2(+) = [ r      r/τ      ]
          [ r/τ    2r/τ^2   ] .

Iterating Equation 135 a third time on the above matrix gives

  P3(+) = [ 5r/6      r/(2τ)     ]
          [ r/(2τ)    r/(2τ^2)   ] .
Both of these expressions agree with the stated result for Pk+1 (+) when we take k = 1 and
k = 2. If we hypothesize that

  Pk+1(+) = ( 2r / ((k + 1)(k + 2)) ) [ 2k + 1   3/τ        ]
                                      [ 3/τ      6/(k τ^2)  ] ,
we can then use Equation 135 to show by induction that the above expression for Pk+1(+)
is valid for all k.
Note that in the Mathematica file chap 4 prob 17.nb we perform some of the algebra not
displayed in the above derivation.
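The following sketch iterates Equation 135 numerically, starting from P0(+) = (1/ǫ) I with a small ǫ, and compares each Pk+1(+) and gain Kk+1 against the closed forms above (r and τ below are hypothetical values).

```python
import numpy as np

r, tau, eps = 2.0, 0.5, 1e-9            # hypothetical noise variance, sample period, prior parameter
Phi = np.array([[1.0, tau], [0.0, 1.0]])
Hterm = np.array([[1.0, 0.0], [0.0, 0.0]]) / r

P = np.eye(2) / eps                      # P0(+) = (1/eps) I
for n in range(1, 8):
    P = np.linalg.inv(np.linalg.inv(Phi @ P @ Phi.T) + Hterm)   # Equation 135, giving P_n(+)
    if n >= 2:
        k = n - 1                        # compare with the closed form written as P_{k+1}(+)
        c = 2.0 / ((k + 1) * (k + 2))
        P_closed = c * r * np.array([[2 * k + 1, 3 / tau], [3 / tau, 6 / (k * tau**2)]])
        K = P @ np.array([1.0, 0.0]) / r                        # Equation 134
        K_closed = c * np.array([2 * k + 1, 3 / tau])
        print(n, np.max(np.abs(P - P_closed)), np.max(np.abs(K - K_closed)))
```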
where Kk is the Kalman gain given by Kk = Pk (−)HkT (Hk Pk (−)HkT + Rk )−1 . To derive the
requested determinant first consider the following manipulations of the product Hk Kk . We
have
When we put in the expression just derived for Hk Kk into the above we get
the initial expression requested. Taking the determinant of both sides of this then gives
|Hk ||Pk (+)| = |Rk ||Hk Pk (−)HkT + Rk |−1 |Hk ||Pk (−)| .
We can divide both sides of this equation by |Hk | since Hk is invertible to get
Lets look for an optimal linear estimator of the following form for processing the kth mea-
surement zk
x̂k (+) = kk′ x̂k (−) + kk zk .
Introducing the a priori and a posteriori estimation errors x̃k (±) = x̂k (±) − xk , and the
measurement equation zk = xk + vk in the above equation we have an recursive update of
x̃k (+) given by
x̃k (+) = [kk′ + kk − 1]xk + kk′ x̃(−) + kk vk .
To be unbiased requires, since E[vk] = 0, that kk′ = 1 − kk, and we have an estimator
of
x̂k (+) = (1 − kk )x̂k (−) + kk zk .
To determine the value of kk consider
pk (+) = E{x̃k (+)x̃k (+)T }
= E{(1 − kk )x̃k (x̃k (1 − kk ) + kk vk ) + kk vk (x̃k (−)(1 − kk ) + kk vk )}
= (1 − kk )2 E{x̃k (−)2 } + 2(1 − kk )kk E{x̃k (−)vk } + kk2 E{vk2 }
q2
= (1 − kk )2 pk (−) + kk2 .
12
Where we have used

  E[vk^2] = (1/q) ∫_{−q/2}^{q/2} x^2 dx = (2/q) ∫_0^{q/2} x^2 dx = (2/q) [ x^3/3 ]_0^{q/2} = q^2/12 .
To find the value of kk that makes pk(+) a minimum we take the derivative, set the
result equal to zero and solve for kk. We find for the derivative

  2 (1 − kk)(−1) pk(−) + kk q^2/6 = 0 ,

or

  kk = pk(−) / ( pk(−) + q^2/12 ) ,    (137)

so

  1 − kk = (q^2/12) / ( pk(−) + q^2/12 ) .
With this value of kk the covariance pk(+) becomes

  pk(+) = ( (q^2/12)^2 / (pk(−) + q^2/12)^2 ) pk(−) + ( pk(−)^2 / (pk(−) + q^2/12)^2 ) (q^2/12)
        = (q^2/12) pk(−) / ( pk(−) + q^2/12 ) .
Since we are estimating a constant with no dynamics we have that x̂k(−) = x̂k−1(+) and
pk(−) = pk−1(+). In summary then the recursive form of our estimator for the unknown
constant starts with

  x̂0(+) = m   with   p0(+) = σ^2 ,

and then iterates for each measurement zk for k ≥ 1 the following

  x̂k(−) = x̂k−1(+)   and   pk(−) = pk−1(+)
  x̂k(+) = (1 − kk) x̂k(−) + kk zk
        = ( (q^2/12) / (pk(−) + q^2/12) ) x̂k−1(+) + ( pk(−) / (pk(−) + q^2/12) ) zk
  pk(+) = (q^2/12) pk(−) / ( pk(−) + q^2/12 ) .
It seems that we only needed an expression for E[v 2 ] but the explicit form of the distribution
did not seem to matter.
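The sketch below (with made-up values for the prior and the quantization step q) runs this recursion on simulated uniformly distributed measurement noise and checks that the empirical squared error tracks the predicted pk(+).

```python
import numpy as np

rng = np.random.default_rng(1)
q, m, sigma2, n_meas, n_trials = 2.0, 0.0, 4.0, 30, 20000
r_eff = q**2 / 12.0                         # effective measurement variance E[v^2]

errors = np.zeros(n_meas)
for _ in range(n_trials):
    x = rng.normal(m, np.sqrt(sigma2))      # the unknown constant, drawn from the prior
    x_hat, p = m, sigma2                    # x̂0(+), p0(+)
    for k in range(n_meas):
        z = x + rng.uniform(-q / 2, q / 2)  # uniform (quantization-like) measurement noise
        kk = p / (p + r_eff)                # Equation 137
        x_hat = (1 - kk) * x_hat + kk * z
        p = r_eff * p / (p + r_eff)         # predicted a posteriori variance
        errors[k] += (x_hat - x) ** 2

print(p)                                    # final predicted variance
print(errors[-1] / n_trials)                # empirical mean squared error (close to the above)
```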
Problem 4-21 (filtering with multiplicative noise)
Our estimator for this problem will be constructed as x̂ = kz for some as of yet unspecified
value for the multiplier k. The error using this estimator is computed as
x̃ = x̂ − x
= kz − x
= k(1 + η)x − x (138)
= (k(1 + η) − 1)x .
For x̂ to be an unbiased estimate of x means that E[x̃] = 0. From Equation 138 we see that
this requires
E[x̃] = kE[x] + kE[ηx] − E[x] = 0 ,
since all three expectations are zero. Thus the estimator as defined is unbiased. Next we
will pick the value of k so that the variance in the error is as small as possible. The variance
in the error is
In the above I have assumed that E[η 2 x2 ] = E[η 2 ]E[x2 ], which would be true if x and η are
independent random variables. Then we want to minimize the expression E[x̃2 ] when viewed
as a function of k. When we take the derivative, of this expression, set the result equal to
zero and solve for k we find
  k = 1 / (1 + ση^2) .
We can check that the value above is indeed a minimum by taking the second derivative

  d^2 E[x̃^2] / dk^2 = 2 ση^2 σx^2 + 2 σx^2 > 0 .
Now since

  k − 1 = 1/(1 + ση^2) − 1 = −ση^2 / (1 + ση^2) ,
the minimum variance E[x̃2 ] is given by
Warning: I’m not sure exactly what this problem was asking or how to answer it. If anyone
has an idea of the type of solution requested please contact me.
Problem 4-23 (filtering a constant angular rate)
If we define the state variables x1 and x2 for this problem to be x1 = θ and x2 = θ̇ then as
a differential system we have
  d/dt x = [ ẋ1 ]   [ x2 ]   [ 0  1 ] [ x1 ]
           [ ẋ2 ] = [ 0  ] = [ 0  0 ] [ x2 ] .
Then using the power series definition for the fundamental solution we have

  Φ(t + T, t) = e^{F T} = I + F T + (1/2) F^2 T^2 + · · · .

For the F given above F^2 = 0 and so the above sum explicitly stops after two terms.
Evaluating this two term sum we find that Φ(t + T, t) is given by

  Φ(t + T, t) = [ 1  T ]
                [ 0  1 ] .
The filtering equations that will produce the optimal estimates of position and velocity are
given by the Kalman equations. We will do the first of these updates “by hand” and then
one could write a simple program to generate the rest. We first need to propagate the initial
state and uncertainty to the first measurement time
  x̂1(−) = Φ0 x̂0(+) = [ 1  T ] [ 0 ]   [ 0 ]
                      [ 0  1 ] [ 0 ] = [ 0 ]

  P1(−) = Φ0 P0(+) Φ0^T = [ 1  T ] ( 20 I ) [ 1  0 ]  =  20 [ 1 + T^2   T ]
                          [ 0  1 ]          [ T  1 ]        [ T         1 ] .
Next we observe the first measurement z1 and update the state and covariance matrix with
with Equations 51, 58, and 59. We begin with Equation 58 or
Since Φ and H do not depend on the index k the steps in this process are summarized as
follows. Given an initial starting values of x̂(+) and P (+) as each measurement z comes in
compute
x̂(−) = Φx̂(+)
P (−) = ΦP (+)ΦT
K = P (−)H T (HP (−)H T + R)−1
x̂(+) = x̂(−) + K(z − H x̂(−))
P (+) = (I − KH)P (−) .
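As a minimal sketch of the loop just summarized (all matrices below are placeholders for illustration; the actual Φ, H, R, initial conditions, and the measurement stream come from the problem statement), one propagate-and-update cycle of the discrete Kalman filter per measurement can be written as follows.

```python
import numpy as np

def kalman_step(x_hat, P, z, Phi, H, Q, R):
    """One propagate-and-update cycle of the discrete Kalman filter."""
    # Propagate the state estimate and covariance to the measurement time.
    x_minus = Phi @ x_hat
    P_minus = Phi @ P @ Phi.T + Q
    # Measurement update.
    S = H @ P_minus @ H.T + R
    K = P_minus @ H.T @ np.linalg.inv(S)
    x_plus = x_minus + K @ (z - H @ x_minus)
    P_plus = (np.eye(len(x_hat)) - K @ H) @ P_minus
    return x_plus, P_plus

# Example usage with placeholder matrices (two-state position/velocity model, no process noise).
T = 1.0
Phi = np.array([[1.0, T], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = np.zeros((2, 2))
R = np.array([[0.5]])
x_hat, P = np.zeros(2), 20.0 * np.eye(2)
for z in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:
    x_hat, P = kalman_step(x_hat, P, z, Phi, H, Q, R)
print(x_hat, np.diag(P))
```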
For this problem we are told that E[x0 ] = 1 and E[x20 ] = 2. From this we can conclude that
the variance of the initial state x0 is given by
where since T = τ the value of the exponential above is actually e^{-1}. Our fundamental
solution matrix is then Φk = e−1 with a process noise variance of qk = 2. With measurements
of this process given by
zk = xk + vk ,
we have hk = 1. To derive statistics of the measurement noise process vk recall that the
density of the measurement noise vk is discrete and specifically given by
  P(vk = −2) = P(vk = +2) = 1/2 ,

so that E[vk] = 0. The variance of noise distributed like this is given by

  rk = E[vk^2] = (1/2)(4) + (1/2)(4) = 4 ,
With all of the above information we can apply the Kalman filtering framework to this
problem.
Part (a-b): The initial conditions for this problem are given by x̂0(+) = 1 with p0(+) = 1, so that
our estimate for x̂1 (−) and p1 (−) is given by
x̂1 (−) = Φ0 x̂0 (+) = e−1 ,
and
p1 (−) = Φ0 p0 (+)ΦT0 + Q0 = e−2 + 2 .
Then we observe the measurement z1 , which we can incorporate using the Kalman mea-
surement update Equations 51, 58, and 59. Rather than document these in detail again,
please see the python file chap 4 prob 24.py for some numerical code where we do these
calculations for the two measurements z1 and z2 . When we implement these equations and
execute the above script we find
  x̂1(+) = 0.7619   p1(+) = 1.3921   and
  x̂2(+) = 1.2420   p2(+) = 1.4145 .
Warning: I was not sure about this problem. If anyone has any ideas please contact me.
Warning: I was not sure how to deal with the derivative of the expression hC (t) in the
noise term on the right-hand-side of the differential equation for h(t). If anyone has any
ideas please contact me.
Denote by i1 (t) and i2 (t) the currents in the left most and right most cell in Figure 4-
4 respectively. We assume that the currents are running in a clockwise direction. Then
Kirchhoff’s voltage law (KVL) [5] around the left most cell gives
u(t) − R1 i1 − v1 = 0 , (139)
while Kirchhoff’s voltage law around the right most cell gives
v1 − R2 i2 − v2 = 0 , (140)
where vi is the voltage of the capacitor Ci . Also the current flowing from top down through
the capacitor C1 gives rise to a change in voltage as
dv1
i1 − i2 = C1 . (141)
dt
The same consideration for the current flowing from top down through the capacitor C2 gives
i2 = C2 dv2/dt so that with this we can write i1 in terms of vi. From Equation 141 we have

  i1 = i2 + C1 dv1/dt = C2 dv2/dt + C1 dv1/dt .
With these expressions for i1 and i2 , using Equations 139 and 140 our system differential
equation in terms of the variables v1 and v2 is
  u(t) − R1 ( C1 dv1/dt + C2 dv2/dt ) − v1 = 0    (142)

  v1 − R2 C2 dv2/dt − v2 = 0 .    (143)
Solving this second equation for dv2/dt gives

  dv2/dt = (1/(R2 C2)) (v1 − v2) .
When we put that expression into Equation 142 and solve for dv1/dt we find

  dv1/dt = −(1/C1) ( 1/R1 + 1/R2 ) v1 + (1/(R2 C1)) v2 + (1/(R1 C1)) u(t) .
When we view these two equations as a matrix system with a state x = [ v1; v2 ] we find

  d/dt [ v1 ]   [ −(1/C1)(1/R1 + 1/R2)   1/(R2 C1)  ] [ v1 ]   [ u(t)/(R1 C1) ]
       [ v2 ] = [ 1/(R2 C2)              −1/(R2 C2) ] [ v2 ] + [ 0            ] .
If we next simplify the system above to the case where R1 = R2 = 1 and C1 = C2 = 1 the
above system becomes

  d/dt [ v1 ]   [ −2   1  ] [ v1 ]   [ u(t) ]
       [ v2 ] = [ 1    −1 ] [ v2 ] + [ 0    ] .

Thus for this problem we see that our system matrix is F = [ −2  1; 1  −1 ]. We are told that
the measurement for this system is of v2(t) and is exact or

  z(t) = [ 0  1 ] [ v1 ]
                  [ v2 ] .
Since numerically having no measurement noise can be harder to work with, we will simulate this by
taking R to be a very small number, say 10^{-6}.
This problem, as specified, is continuous but we want to compute our estimates at discrete
times so we will discretize it and apply the discrete Kalman filtering equations. To do that
we need the discrete transition matrix Φk given by

  Φk = Φ((k + 1)∆t, k∆t) = e^{F ∆t} ≈ I + F ∆t + (1/2) F^2 ∆t^2 .
Figure 3: Plots of the a priori (in blue) and a posteriori (in red) covariance for the voltage
across the capacitor C1 as a function of the index in the discrete Kalman filtering algorithm.
The "index" 1 corresponds to the time 0.
since u ∼ N(0, 2). Then the optimal estimate of the voltage across C1 is given by the discrete
Kalman filter. For this problem statement we have ∆t = 0.5 seconds, and to reach the time
T = 2 seconds we need four iterations. We will take the initial conditions for this system as

  x̂0(+) = [ 0 ]
           [ 0 ]   and   P0(+) = 0 ,
since we assume that the initial conditions are known exactly. Then to finish this problem
we need to iterate the discrete Kalman filtering covariance equations
and then plot the (1, 1)th element of the matrices Pk (±) after each iteration. In the MAT-
LAB/Octave file chap 4 prob 27.m we perform the Kalman filtering iterations needed to
produce the plot above. We see that the value of the variance of v1 after the first measure-
ment goes to 1 and stays there for all further iterations.
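A Python version of that covariance iteration is sketched below. It is an independent sketch, not the referenced MATLAB/Octave file; in particular the discrete process noise Qk is my own assumption, built from u ∼ N(0, 2) entering only the v1 equation so that Qk ≈ diag(2∆t, 0).

```python
import numpy as np

dt = 0.5
F = np.array([[-2.0, 1.0], [1.0, -1.0]])
Phi = np.eye(2) + F * dt + 0.5 * F @ F * dt**2       # second order approximation of e^{F dt}
Qk = np.array([[2.0 * dt, 0.0], [0.0, 0.0]])         # assumed: u ~ N(0, 2) drives only v1
H = np.array([[0.0, 1.0]])                           # v2 is measured
R = np.array([[1e-6]])                               # "exact" measurement, small R for numerics

P_plus = np.zeros((2, 2))                            # initial conditions known exactly
for k in range(1, 5):                                # four iterations reach T = 2 seconds
    P_minus = Phi @ P_plus @ Phi.T + Qk
    K = P_minus @ H.T @ np.linalg.inv(H @ P_minus @ H.T + R)
    P_plus = (np.eye(2) - K @ H) @ P_minus
    print(k, P_minus[0, 0], P_plus[0, 0])            # var(v1(-)) and var(v1(+))
```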
Problem 4-28 (Kalman filtering the inverse square law)
To begin, first consider the given equations under the conditions that u1 = u2 = 0, which
are given by
  r̈ = r θ̇^2 − G0/r^2
  θ̈ = −2 θ̇ ṙ/r .
Then if r = R is a constant we see that ṙ = r̈ = 0 and the above becomes

  0 = R θ̇^2 − G0/R^2
  θ̈ = 0 .

The first equation above gives θ̇^2 = G0/R^3 or

  θ̇ = sqrt(G0) / R^{3/2} ,

so that as a function of t when we integrate we find

  θ(t) = ( sqrt(G0) / R^{3/2} ) t + θ0 ,

where θ0 is an arbitrary constant. Note that this solution also satisfies θ̈ = 0. To get the
circular orbit solution quoted in the book we take θ0 = 0 and then θ(t) = ωt with ω given
by

  ω = sqrt(G0) / R^{3/2} ,

or equivalently R^3 ω^2 = G0.
We will take this nonlinear system and split it into two parts to write it as

  ẋ = [ x2                                        ]   [ 0               ]
      [ (x1 + R)( x4/R + ω )^2 − G0/(x1 + R)^2     ] + [ u1              ]
      [ x4                                         ]   [ 0               ]    (150)
      [ −2 R ( x4/R + ω ) x2/(x1 + R)              ]   [ R u2/(x1 + R)   ] .
This writes the right-hand-side as the sum of two vectors, each of which is nonlinear in the state
x and the components of the noise vector u = [ u1; u2 ]. If we denote the first vector as f(x)
(since it does not depend on the noise vector u) then we will linearize it about the state
x0. We do this as

  f(x) = [ x2                                       ]
         [ (x1 + R)( x4/R + ω )^2 − G0/(x1 + R)^2    ]  ≈  f(x0) + (∂f/∂x)|_{x0} [ x1; x2; x3; x4 ] .    (151)
         [ x4                                        ]
         [ −2 R ( x4/R + ω ) x2/(x1 + R)             ]
The point x0 is the equilibrium point for circular orbits and corresponds to x0 = 0. Using the
fact that ω^2 = G0/R^3 we have that f(x0) = 0. To complete this derivation recall the definition
of ∂f/∂x, which is the matrix of partial derivatives

  ∂f/∂x = [ ∂f1/∂x1  ∂f1/∂x2  ∂f1/∂x3  ∂f1/∂x4 ]
          [ ∂f2/∂x1  ∂f2/∂x2  ∂f2/∂x3  ∂f2/∂x4 ]
          [ ∂f3/∂x1  ∂f3/∂x2  ∂f3/∂x3  ∂f3/∂x4 ]
          [ ∂f4/∂x1  ∂f4/∂x2  ∂f4/∂x3  ∂f4/∂x4 ]
        = [ 0                                   1                              0   0                            ]
          [ ( x4/R + ω )^2 + 2 G0/(x1 + R)^3    0                              0   2 (x1 + R)( x4/R + ω )/R      ]
          [ 0                                   0                              0   1                            ]
          [ 2 R ( x4/R + ω ) x2/(x1 + R)^2      −2 R ( x4/R + ω )/(x1 + R)     0   −2 x2/(x1 + R)                ] .
We now evaluate this at the point x0. We find that when we use the fact that ω^2 = G0/R^3 we
get

  (∂f/∂x)|_{x0} = [ 0                1     0   0        ]   [ 0       1    0   0  ]
                  [ ω^2 + 2 G0/R^3   0     0   2 R ω/R  ] = [ 3 ω^2   0    0   2ω ]
                  [ 0                0     0   1        ]   [ 0       0    0   1  ]    (152)
                  [ 0                −2 R ω/R   0   0   ]   [ 0       −2ω  0   0  ] .
The second term in the sum in Equation 150 is the non-linear forcing function given by
g(x, u) = [ 0; u1; 0; R u2/(x1 + R) ]. To expand this vector about the joint point (x0, u0) = (0, 0) = 0 we have

  g(x, u) ≈ g(0, 0) + (∂g/∂x1)|_0 x1 + (∂g/∂u)|_0 [ u1; u2 ]

          = [ 0; 0; 0; 0 ] + [ 0; 0; 0; −R u2/(x1 + R)^2 ]|_0 x1 + [ 0   0             ]      [ u1 ]
                                                                   [ 1   0             ]      [ u2 ]
                                                                   [ 0   0             ]
                                                                   [ 0   R/(x1 + R)    ]|_0

          = [ 0  0 ]
            [ 1  0 ] [ u1 ]
            [ 0  0 ] [ u2 ] .    (153)
            [ 0  1 ]
When we combine Equations 149, 151, 152, and 153 we have the equation we wanted to show.
In the two parts below it seemed strange that the measurement noise had a variance that
was the same symbol q as the process noise symbol. Thus I’ve changed the notation below
to use the notation ri for the variance of the measurement zi .
Part (a): In this case z(t) = x3(t) + v3(t) with v3 ∼ N(0, r3) so we have a measurement
sensitivity matrix H given by

  H = [ 0  0  1  0 ] ,

with a measurement noise variance given by R = r3.
Part (b): In this case z(t) = x1(t) + v1(t) with v1 ∼ N(0, r1) so we have a measurement
sensitivity matrix H given by

  H = [ 1  0  0  0 ] ,

with a measurement noise variance given by R = r1.
In comparing the prescriptions from Part (a) and Part (b) the better estimator will be the
one with the smaller value of trace(P∞), so we need to solve for the steady-state of the Riccati
equation
Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
when Ṗ = 0 and with F given by the above,

  Q = [ q1  0  ]          [ 0  0 ]
      [ 0   q2 ] ,   G =  [ 1  0 ]
                          [ 0  0 ]
                          [ 0  1 ] ,

and H and R given by the different parts as above.
In the Mathematica file chap 4 prob 28.nb we perform some of the algebra in attempting
to solve for the steady state error covariance matrix P(∞).
Warning: I ran into trouble in that Mathematica could not solve the above nonlinear system
for the components pij in the time I gave it. I then tried to solve the matrix Riccati equation
using the methods discussed on Page 57 above. Unfortunately the eigenvalues of the system
matrix F do not have negative real parts, since they are zero or purely imaginary, and this
method cannot be used. Thus, algebraically, in the time I had to work on this I was unable to
determine which of the two measurement prescriptions is better. If we specify numerical values for the above
variances one could easily do a numerical simulation and make some headway. If anyone has
any insight into this problem I would be interested in hearing your comments.
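Following that suggestion, here is a sketch of such a numerical experiment (every parameter value below is an arbitrary assumption, not from the book): it integrates the matrix Riccati equation forward in time for each measurement choice and compares trace(P) at the end of the run. Whether a true steady state exists depends on observability, so this is only an exploratory comparison, not a proof of which choice is better.

```python
import numpy as np
from scipy.integrate import solve_ivp

omega, q1, q2, r = 1.0, 0.1, 0.1, 0.05        # arbitrary assumed values
F = np.array([[0, 1, 0, 0],
              [3 * omega**2, 0, 0, 2 * omega],
              [0, 0, 0, 1],
              [0, -2 * omega, 0, 0]], dtype=float)
G = np.array([[0, 0], [1, 0], [0, 0], [0, 1]], dtype=float)
GQGT = G @ np.diag([q1, q2]) @ G.T
P0 = np.eye(4).flatten()

def riccati(t, p_flat, H):
    P = p_flat.reshape(4, 4)
    Pdot = F @ P + P @ F.T + GQGT - P @ H.T @ H @ P / r
    return Pdot.flatten()

for label, H in [("Part (a): measure x3", np.array([[0, 0, 1, 0.0]])),
                 ("Part (b): measure x1", np.array([[1, 0, 0, 0.0]]))]:
    sol = solve_ivp(riccati, (0.0, 50.0), P0, args=(H,), rtol=1e-8)
    P_end = sol.y[:, -1].reshape(4, 4)
    print(label, np.trace(P_end))
```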
Chapter 5 (Optimal Linear Smoothing)
Many of the results from the initial section use the following simple matrix inverse identity
which we now derive. Since we can write the sum P + Pb as
P + Pb = Pb (Pb−1 + P −1 )P ,
when we take the inverse of this sum P + Pb we find that this inverse is given by
  dPb(τ)/dτ = −F Pb − Pb F^T + GQG^T − Pb H^T R^{-1} H Pb ,

and multiply on the left by Pb^{-1} and on the right by Pb^{-1} (and then negate the entire
expression) we get

  −Pb^{-1} (dPb/dτ) Pb^{-1} = Pb^{-1} F + F^T Pb^{-1} − Pb^{-1} GQG^T Pb^{-1} + H^T R^{-1} H .
As the expression on the left-hand-side is d/dτ Pb(τ)^{-1} this is the books equation 5.2-12. Using
this we can now derive the differential equation for the variable s(t) = Pb^{-1}(t) x̂b(t). Taking
this derivative and using the product rule (and dropping the b subscript) we have

  ds/dτ = (dP^{-1}(τ)/dτ) x̂(τ) + P^{-1}(τ) (dx̂(τ)/dτ)
        = (P^{-1} F + F^T P^{-1} − P^{-1} GQG^T P^{-1} + H^T R^{-1} H) x̂ + P^{-1} ( −F x̂ + P H^T R^{-1} (z − H x̂) ) ,
which is the books equation 5.2-13 and the expression we wanted to show.
In this subsubsection we derive the expression for the optimal smoother expressed in Ta-
ble 5.2-1 and which is based on combining the forward filtering equations with the backwards
filtering equations. In that table the forward filter and the backwards filter are the same as
given in the text in many places. What is not directly obvious is the given expression for
the optimal fixed-interval smoother x̂(t|T ) and P (t|T ). To derive these equations we will
use the matrix identity
B −1 = A−1 − B −1 (B − A)A−1 ,
to evaluate [P −1 + Pb−1]−1 in the expression for P (t|T ). By taking B = P −1 + Pb−1 and
A = P −1 we have
which is the books equation for P (t|T ) found in table 5.2-1. Next we compute x̂(t|T ) using
the definition of s(t) as
Warning: This is different from the expression in the book for x̂(t|T ) found in table 5.2-1
in that the books expression does not have an inverse on the factor I + P Pb−1. If anyone
finds anything wrong with the above expression or derivation please contact me.
The derivation of the Rauch-Tung-Striebel smoother equations
which expresses the smoothed covariance P (t|T ) in terms of the forward and backwards
covariances. To do this we will applying the matrix inverse derivative identity
  d/dt A^{-1} = −A^{-1} (dA/dt) A^{-1} ,
to the left-hand-side of the above equation (but not to the right-hand-side) giving
  d/dt P(t|T)^{-1} = −P(t|T)^{-1} (dP(t|T)/dt) P^{-1}(t|T)    (158)
                   = d/dt P(t)^{-1} + d/dt Pb(t)^{-1}
                   = d/dt P(t)^{-1} − d/dτ Pb(τ)^{-1} ,    (159)
where we have converted the t derivative into a τ ≡ T − t derivative in the derivative of Pb−1
in the last term above. Now recall that from Equation 79 that the time derivative of P −1 is
given by
  d/dt P^{-1} = −F^T P^{-1} − P^{-1} F − P^{-1} GQG^T P^{-1} + H^T R^{-1} H ,

and using the books equation 5.2-12 that the τ derivative of Pb^{-1} is given by

  d/dτ Pb^{-1} = Pb^{-1} F + F^T Pb^{-1} − Pb^{-1} GQG^T Pb^{-1} + H^T R^{-1} H .    (160)
If we use these two expressions in Equation 159 we find

  d/dt P(t|T)^{-1} = −F^T P^{-1} − P^{-1} F − P^{-1} GQG^T P^{-1} + H^T R^{-1} H
                     − F^T Pb^{-1} − Pb^{-1} F + Pb^{-1} GQG^T Pb^{-1} − H^T R^{-1} H
                   = −F^T (P^{-1} + Pb^{-1}) − (P^{-1} + Pb^{-1}) F − P^{-1} GQG^T P^{-1} + Pb^{-1} GQG^T Pb^{-1}
                   = −F^T P(t|T)^{-1} − P(t|T)^{-1} F − P^{-1} GQG^T P^{-1} + Pb^{-1} GQG^T Pb^{-1} .
To solve for dP(t|T)/dt we use Equation 158 by premultiplying and postmultiplying by P(t|T)
and then negating the resulting expression. This procedure gives

  dP(t|T)/dt = P(t|T) F^T + F P(t|T)
               + P(t|T) P^{-1} GQG^T P^{-1} P(t|T) − P(t|T) Pb^{-1} GQG^T Pb^{-1} P(t|T) .    (161)
Lets now try to "remove" the terms with Pb from this expression. To do that recall that if we
premultiply Equation 157 by P(t|T) we get P(t|T) Pb^{-1} = I − P(t|T) P^{-1}, so that

  dP(t|T)/dt = P(t|T) F^T + F P(t|T) + P(t|T) P^{-1} GQG^T P^{-1} P(t|T)
               − (I − P(t|T) P^{-1}) GQG^T (I − P^{-1} P(t|T))
             = P(t|T) F^T + F P(t|T) − GQG^T + P(t|T) P^{-1} GQG^T + GQG^T P^{-1} P(t|T)
             = (F + GQG^T P^{-1}) P(t|T) + P(t|T) (F + GQG^T P^{-1})^T − GQG^T ,    (165)
We next derive the differential expression satisfied by the smoothed estimate x̂(t|T ). To
begin recall the books equation 5.1-12,
from which we see that the time derivative of this expression is given by
  dx̂(t|T)/dt = (dP(t|T)/dt) [ P^{-1} x̂ + Pb^{-1} x̂b ] + P(t|T) [ d/dt (P^{-1} x̂) + d/dt (Pb^{-1} x̂b) ]
             = [ (F + GQG^T P^{-1}) P(t|T) + P(t|T)(F + GQG^T P^{-1})^T − GQG^T ] P^{-1}(t|T) x̂(t|T)
               + P(t|T) [ (dP^{-1}/dt) x̂ + P^{-1} (dx̂/dt) + (dPb^{-1}/dt) x̂b + Pb^{-1} (dx̂b/dt) ] .
  dx̂(t|T)/dt = (F + GQG^T P^{-1}) x̂(t|T)
               + [ P(t|T)(F + GQG^T P^{-1})^T − GQG^T ] P^{-1}(t|T) x̂(t|T)
               + P(t|T) [ −F^T P^{-1} x̂ − P^{-1} F x̂ − P^{-1} GQG^T P^{-1} x̂ + H^T R^{-1} H x̂ ]
               + P(t|T) [ P^{-1} F x̂ + H^T R^{-1} (z − H x̂) ]
               + P(t|T) [ −Pb^{-1} F x̂b − F^T Pb^{-1} x̂b + Pb^{-1} GQG^T Pb^{-1} x̂b − H^T R^{-1} H x̂b ]
               + P(t|T) [ Pb^{-1} F x̂b − H^T R^{-1} (z − H x̂b) ] .
Many terms cancel in this expression and we are left with

  dx̂(t|T)/dt = (F + GQG^T P^{-1}) x̂(t|T)    (167)
               + [ P(t|T)(F + GQG^T P^{-1})^T − GQG^T ] P^{-1}(t|T) x̂(t|T)    (168)
               + P(t|T) [ −F^T P^{-1} x̂ − P^{-1} GQG^T P^{-1} x̂ ]    (169)
               + P(t|T) [ −F^T Pb^{-1} x̂b + Pb^{-1} GQG^T Pb^{-1} x̂b ] .    (170)
Notice that the terms −P(t|T) F^T P^{-1} x̂ and −P(t|T) F^T Pb^{-1} x̂b on lines 169 and 170
combine using Equation 166 to give −P(t|T) F^T P^{-1}(t|T) x̂(t|T), which cancels the first
term on line 168 above to give
  dx̂(t|T)/dt = (F + GQG^T P^{-1}) x̂(t|T)
               + P(t|T) P^{-1} GQG^T P^{-1}(t|T) x̂(t|T) − GQG^T P^{-1}(t|T) x̂(t|T)
               − P(t|T) P^{-1} GQG^T P^{-1} x̂ + P(t|T) Pb^{-1} GQG^T Pb^{-1} x̂b .    (171)
Again trying to “remove” the terms that contain x̂b or Pb we note that from Equation 166
we get
Pb−1 x̂b = P −1 (t|T )x̂(t|T ) − P −1 x̂(t) ,
and from Equation 157 we have Pb−1 = P (t|T )−1 − P −1 so when we use these two expression
in the last term in line 171 we find it is equal to
P (t|T )Pb−1GQGT Pb−1 x̂b = P (t|T )(P (t|T )−1 − P −1 )GQGT (P −1(t|T )x̂(t|T ) − P −1x̂)
= GQGT P −1 (t|T )x̂(t|T ) − GQGT P −1 x̂
− P (t|T )P −1GQGT P −1 (t|T )x̂(t|T ) + P (t|T )P −1GQGT P −1 x̂ .
After this expansion when we use it in Equation 171 many terms cancel to give
dx̂(t|T)/dt = (F + G Q G^T P^{-1}) x̂(t|T) − G Q G^T P^{-1} x̂
           = F x̂(t|T) + G Q G^T P^{-1} (x̂(t|T) − x̂),   (172)
the equation we were to show. Recall that x̂ is the forward filtering solution and thus is a function of time even though we don't explicitly denote it as such in the above expression.
Let's prove that the claimed expression for P(t|T) or Φ(t, T)P(T)Φ(t, T)^T is indeed a solution
to this equation. From P (t|T ) = Φ(t, T )P (T )Φ(t, T )T using the product rule to take the
time derivative we have that
Ṗ = (d/dt Φ(t, T)) P(T) Φ^T(t, T) + Φ(t, T) P(T) (d/dt Φ(t, T)^T).
Since Φ is a fundamental solution we have d/dt Φ(t, T) = F(t)Φ(t, T) and we can conclude that
d/dt Φ(t, T)^T = (F Φ)^T = Φ^T F^T,
so the above first derivative of P(t|T) becomes
Ṗ = F Φ(t, T) P(T) Φ^T(t, T) + Φ(t, T) P(T) Φ^T(t, T) F^T = F P(t|T) + P(t|T) F^T,
as we were to show.
In part one of this example we perform fixed-interval smoothing using the forward-backwards
optimal filters. Thus to begin with we need to solve the continuous forward filtering Riccati
equation. To do that note that for this problem we have f = 0, g = h = 1 so that Equation 71
in this case becomes
ṗ = q − p²/r.
In steady-state ṗ = 0 so p² = rq or p = +√(rq) ≡ α. The backwards error covariance from Equation 156 is given by
dp_b/dτ = q − p_b²/r.
In steady-state dp_b/dτ = 0 so p_b² = rq or p_b = +√(rq) = α. Thus in steady-state the smoothed
state has the following error covariance
p^{-1}(t|T) = p^{-1}(t) + p_b^{-1}(t) = 1/α + 1/α = 2/α,
and so
p(t|T) = α/2.
Next the smoothed state estimate is given by
x̂(t|T) = p(t|T)[ x̂(t)/p(t) + x̂_b(t)/p_b(t) ] = (α/2)[ x̂/α + x̂_b/α ] = (1/2)(x̂ + x̂_b).   (173)
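As a quick numerical sanity check of this result, here is a small Python sketch (my own illustration, not from the book) that integrates the forward and backward scalar Riccati equations with Euler steps and verifies that both settle to α = √(rq), so the combined smoothed variance settles to α/2. The values q = 2.0 and r = 0.5 are arbitrary choices for the check.

import numpy as np

# Scalar system: x_dot = w (spectral density q), z = x + v (density r).
# Both the forward and backward Riccati equations are  dp/dt = q - p^2/r.
q, r = 2.0, 0.5
alpha = np.sqrt(q * r)

def integrate_riccati(p0, dt=1e-3, T=20.0):
    p = p0
    for _ in range(int(T / dt)):
        p += dt * (q - p**2 / r)
    return p

p_fwd = integrate_riccati(p0=10.0)   # forward filter covariance p(t)
p_bwd = integrate_riccati(p0=10.0)   # backward filter covariance p_b(tau), same ODE in tau
p_smooth = 1.0 / (1.0 / p_fwd + 1.0 / p_bwd)

print(p_fwd, alpha)          # both settle to sqrt(q r)
print(p_smooth, alpha / 2)   # the smoothed variance is half the filtered one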
For part 2 of this example we want to perform fixed-interval smoothing using the Rauch-
Tung-Striebel equations, which in general are given by Equations 165 and 172. Specifying
these to the problem at hand we find Equation 165 becomes
ṗ(t|T) = (q/α) p(t|T) + p(t|T)(q/α) − q = (2q/α) p(t|T) − q,
as our differential equation to solve for p(t|T). This equation has the final condition p(T|T) = p(T), where p(T) is the forward filter's error covariance value at the time t = T. Define β ≡ q/α; then solving this differential equation is done as follows
ṗ(t|T) − 2β p(t|T) = −q,   or
d/dt [ e^{−2βt} p(t|T) ] = −q e^{−2βt};   integrating both sides gives
e^{−2βt} p(t|T) = (q/(2β)) e^{−2βt} + C_0   for some constant C_0, thus
p(t|T) = q/(2β) + C_0 e^{2βt}.
Note that p(T) = α since we assume that T is large enough so that the forward filtering equation is in steady-state. With this, satisfying the final condition p(T|T) = p(T) = α requires that C_0 satisfy
q/(2β) + C_0 e^{2βT} = α  ⇒  C_0 = (α − q/(2β)) e^{−2βT}.
Next, Equation 172 specialized to this problem becomes
dx̂(t|T)/dt = (q/α)(x̂(t|T) − x̂(t)) = β (x̂(t|T) − x̂(t)).
This can be shown to be equivalent to Equation 173 by taking the time derivative of that equation, which gives
dx̂(t|T)/dt = (1/2)(dx̂/dt + dx̂_b/dt).
Using the differential equations for x̂ and x̂b which in this case are given by
dx̂/dt = (p/r)(z − x̂) = √(q/r)(z − x̂)
dx̂_b/dt = −(p_b/r)(z − x̂_b) = −√(q/r)(z − x̂_b).
When we sum these two expressions (as required by dx̂(t|T)/dt) we find
dx̂(t|T)/dt = (1/2)√(q/r)(x̂_b − x̂) = (1/2)√(q/r)(2 x̂(t|T) − x̂ − x̂)
           = √(q/r)(x̂(t|T) − x̂),
where we have expressed x̂b in terms of x̂ and x̂(t|T ) using Equation 173.
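To check the RTS form of this example numerically, the following Python sketch (again my own, with the arbitrary values q = 2.0, r = 0.5, T = 5) integrates the covariance equation ṗ(t|T) = 2βp(t|T) − q backwards from the final condition p(T|T) = α and compares the result at t = 0 against the closed form p(t|T) = q/(2β) + C_0 e^{2βt} derived above.

import numpy as np

q, r, T = 2.0, 0.5, 5.0
alpha = np.sqrt(q * r)
beta = q / alpha

# Closed form:  p(t|T) = q/(2 beta) + C0 exp(2 beta t), with p(T|T) = alpha.
C0 = (alpha - q / (2 * beta)) * np.exp(-2 * beta * T)
closed_form = lambda t: q / (2 * beta) + C0 * np.exp(2 * beta * t)

dt = 1e-4
p, t = alpha, T                      # final condition p(T|T) = alpha
while t > 0:                         # Euler integration backwards in time
    p -= dt * (2 * beta * p - q)
    t -= dt

print(p, closed_form(0.0))           # agree to the integration accuracy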
Notes on a steady-state, fixed-interval smoother solution
In this subsection we show an alternative method to solve for the fixed-interval linear
smoother covariance equation for P (t|T ) governed by the differential Equation 165. We
start by defining an unknown λ in terms of the variable y as
λ = P (t|T )y , (174)
where y is chosen to satisfy the following differential equation
dy/dt = −[F + G Q G^T P^{-1}]^T y.   (175)
With such a definition taking the time derivative of λ above and using the product rule
followed by replacing Ṗ (t|T ) with the right-hand-side of Equation 165 we find
λ̇ = Ṗ (t|T )y − P (t|T )(F + GQGT P −1 )T y
= (F + GQGT P −1 )P (t|T )y + P (t|T )(F + GQGT P −1 )T y − GQGT y
− P (t|T )(F + GQGT P −1 )T y
= (F + GQGT P −1 )P (t|T )y − GQGT y
= (F + GQGT P −1 )λ − GQGT y . (176)
Then as a system in terms of the stacked vector (y, λ) we have
d/dt [ y ; λ ] = [ −(F + G Q G^T P^{-1})^T , 0 ; −G Q G^T , F + G Q G^T P^{-1} ] [ y ; λ ],
which is the book's equation 5.2-14.
In this subsection we provide somewhat more complete derivations of many of the stated fixed-point smoother equations. While the algebra for some of these can be tedious and I include most of it, the hope is that someone could simply "read" these derivations and observe their correctness. In other words, I don't want any of the steps that lead up to a result to be mysterious. By cataloging these derivations and results in one place I won't have to revisit this work again in the future.
The first statement of this section is that we can write the explicit solution to the fixed-
interval smoother differential Equation 172 in terms of a smoothing fundamental solution
Φs (t, T ). The claimed functional form for x̂(t|T ) is given by
x̂(t|T) = Φ_s(t, T) x̂(T) − ∫_T^t Φ_s(t, τ) G Q G^T P^{-1}(τ) x̂(τ) dτ,   (177)
where Φs (t, T ) is the fundamental solution for Equation 172 and thus satisfies
Φ̇s (t, T ) = (F + GQGT P −1 (t))Φs (t, T ) with Φs (t, t) = I . (178)
As a note on our notation, when dealing with multiple matrix products as in GQGT P −1
if all factors in the product are to be evaluated at the same argument we will present that
argument only on the last factor. Thus the expression GQGT P −1 (τ ) is really a short-hand
for G(τ )Q(τ )G(τ )T P −1(τ ). In the same way, the addition of another matrix to a product
expression will be evaluated at the same argument as the product expression. Thus the
expression F + GQGT P −1 (τ ) is really a short-hand for F (τ ) + G(τ )Q(τ )GT (τ )P −1 (τ ).
Now we will show that Equation 177 is a solution to Equation 172 by explicitly evaluating
its time derivative. Using Leibniz’s rule and Equation 177 itself to replace any resulting
integrals with simpler expressions, it then follows that
dx̂(t|T)/dt = (F + G Q G^T P^{-1}(t)) Φ_s(t, T) x̂(T) − Φ_s(t, t) G Q G^T P^{-1}(t) x̂(t)
           − ∫_T^t (dΦ_s(t, τ)/dt) G Q G^T P^{-1}(τ) x̂(τ) dτ
           = (F + G Q G^T P^{-1}(t)) Φ_s(t, T) x̂(T) − G Q G^T P^{-1}(t) x̂(t)
           − (F + G Q G^T P^{-1}(t)) ∫_T^t Φ_s(t, τ) G Q G^T P^{-1}(τ) x̂(τ) dτ
           = (F + G Q G^T P^{-1}(t)) Φ_s(t, T) x̂(T) − G Q G^T P^{-1}(t) x̂(t)
           − (F + G Q G^T P^{-1}(t)) [ −x̂(t|T) + Φ_s(t, T) x̂(T) ]
           = (F + G Q G^T P^{-1}(t)) x̂(t|T) − G Q G^T P^{-1}(t) x̂(t),
or an expression equivalent to Equation 172 proving that Equation 177 is a representation
of its solution.
The next steps in the derivation are to derive expressions for the T evolution of x̂(t|T) and P(t|T), or explicit equations for dx̂(t|T)/dT and dP(t|T)/dT. To derive an expression for dx̂(t|T)/dT we will need to be able to evaluate the expression dΦ_s(t, T)/dT, which the book claims is given by
dΦ_s(t, T)/dT = −Φ_s(t, T)(F + G Q G^T P^{-1}(T)),   (179)
where the expression F + GQGT P −1 (T ) means that every matrix has its argument evaluated
at T . To show this is true, consider the t derivative of the identity Φs (t, T )Φs (T, t) = I, which
by the product rule is given by
dΦs (t, T ) dΦs (T, t)
Φs (T, t) + Φs (t, T ) = 0.
dt dt
Solving for dΦ_s(T, t)/dt and using the expression for dΦ_s(t, T)/dt given by Equation 178 we get
dΦ_s(T, t)/dt = −Φ_s(t, T)^{-1} (dΦ_s(t, T)/dt) Φ_s(T, t)
             = −Φ_s(T, t) (dΦ_s(t, T)/dt) Φ_s(T, t)
             = −Φ_s(T, t)(F + G Q G^T P^{-1}(t)) Φ_s(t, T) Φ_s(T, t)
             = −Φ_s(T, t)(F + G Q G^T P^{-1}(t)).   (180)
Then to get the desired expression for dΦ_s(t, T)/dT we exchange T and t in Equation 180 to get Equation 179, or the book's equation 5.3-5. Once the expression for dΦ_s(t, T)/dT has been established, the equation for dx̂(t|T)/dT is given by using Leibniz's rule on Equation 177 in a straightforward manner.
To verify this expression is indeed a solution we can take its t derivative. From the claimed solution for P(t|T) given by Equation 181 we have
∫_T^t Φ_s(t, τ) G Q G^T(τ) Φ_s^T(t, τ) dτ = Φ_s(t, T) P(T) Φ_s^T(t, T) − P(t|T),
when we simplify. This is the book's equation 5.2-15, showing that Equation 181 is indeed a solution to Equation 165 as claimed.
With the explicit representation for P (t|T ) given by Equation 181 we next take the T
derivative of this expression. The product rule and Leibniz’ rule gives
dP(t|T)/dT = (dΦ_s(t, T)/dT) P(T) Φ_s^T(t, T) + Φ_s(t, T) (dP(T)/dT) Φ_s^T(t, T)
           + Φ_s(t, T) P(T) (dΦ_s^T(t, T)/dT) + Φ_s(t, T) G Q G^T(T) Φ_s(t, T)^T.
Now using Equations 71 and 179 in the above we have
dP(t|T)/dT = −Φ_s(t, T)(F + G Q G^T P^{-1}(T)) P(T) Φ_s^T(t, T)
           + Φ_s(t, T)[ F P + P F^T + G Q G^T − P H^T R^{-1} H P(T) ] Φ_s^T(t, T)
           − Φ_s(t, T) P(T)(F + G Q G^T P^{-1}(T))^T Φ_s^T(t, T) + Φ_s(t, T) G Q G^T(T) Φ_s(t, T)^T
           = −Φ_s(t, T) P H^T R^{-1} H P(T) Φ_s^T(t, T),   (182)
which is the book’s equation 5.3-8.
In this subsection we present notes and derivations on the equations fixed-lag smoothers must satisfy. Starting with Equation 177, but taking t = T − ∆, gives the equation
x̂(T − ∆|T) = Φ_s(T − ∆, T) x̂(T) − ∫_T^{T−∆} Φ_s(T − ∆, τ) G Q G^T P^{-1}(τ) x̂(τ) dτ.   (183)
To derive the ordinary differential equation that the optimal fixed-lag state estimate or
x̂(T − ∆|T ) must satisfy we will take the T derivative of the above expression. To take
the T derivative of the above requires us to evaluate dΦ_s(T − ∆, T)/dT. This derivative can be
evaluated by writing Φs (T − ∆, T ) = Φs (T − ∆, t)Φs (t, T ), using the product rule followed
by Equations 178 and 179. We find
dΦ_s(T − ∆, T)/dT = (dΦ_s(T − ∆, t)/dT) Φ_s(t, T) + Φ_s(T − ∆, t) (dΦ_s(t, T)/dT)
                  = (F + G Q G^T P^{-1}(T − ∆)) Φ_s(T − ∆, t) Φ_s(t, T)
                  − Φ_s(T − ∆, t) Φ_s(t, T)(F + G Q G^T P^{-1}(T))
                  = (F + G Q G^T P^{-1}(T − ∆)) Φ_s(T − ∆, T)
                  − Φ_s(T − ∆, T)(F + G Q G^T P^{-1}(T)),   (184)
which is the book's equation 5.4-3.
With this result we are ready to evaluate dx̂(T − ∆|T)/dT using Equation 183. We find
dx̂(T − ∆|T)/dT = (dΦ_s(T − ∆, T)/dT) x̂(T) + Φ_s(T − ∆, T) (dx̂(T)/dT)
               − Φ_s(T − ∆, T − ∆) G Q G^T P^{-1}(T − ∆) x̂(T − ∆)
               + Φ_s(T − ∆, T) G Q G^T P^{-1}(T) x̂(T)
               − ∫_T^{T−∆} (dΦ_s(T − ∆, τ)/dT) G Q G^T P^{-1}(τ) x̂(τ) dτ.
Using Equation 178 to evaluate dΦ_s(T − ∆, τ)/dT, the integral term above becomes
(F + G Q G^T P^{-1}(T − ∆)) ∫_T^{T−∆} Φ_s(T − ∆, τ) G Q G^T P^{-1}(τ) x̂(τ) dτ.
Putting these pieces together we find
dx̂(T − ∆|T)/dT = (F + G Q G^T P^{-1}(T − ∆)) Φ_s(T − ∆, T) x̂(T)
               − Φ_s(T − ∆, T)(F + G Q G^T P^{-1}(T)) x̂(T)
               + Φ_s(T − ∆, T)[ F(T) x̂(T) + K(T)(z(T) − H(T) x̂(T)) ]
               − Φ_s(T − ∆, T − ∆) G Q G^T P^{-1}(T − ∆) x̂(T − ∆)
               + Φ_s(T − ∆, T) G Q G^T P^{-1}(T) x̂(T)
               − (F + G Q G^T P^{-1}(T − ∆))[ Φ_s(T − ∆, T) x̂(T) − x̂(T − ∆|T) ]
               = (F + G Q G^T P^{-1}(T − ∆)) x̂(T − ∆|T)
               − G Q G^T P^{-1}(T − ∆) x̂(T − ∆)
               + Φ_s(T − ∆, T) K(T)(z(T) − H(T) x̂(T)),   (185)
which is the book's equation 5.4-3 and is the desired differential equation for x̂(T − ∆|T).
Next we derive the differential equation for P (T − ∆|T ) under optimal fixed-lag smoothing.
To do this we set t = T − ∆ in Equation 181 and get
P(T − ∆|T) = Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T) − ∫_T^{T−∆} Φ_s(T − ∆, τ) G Q G^T(τ) Φ_s^T(T − ∆, τ) dτ.
We follow the same procedure we have been using above to derive the corresponding differential equation. The algebra for this is quite involved and can be skipped at a first reading. Taking the T derivative of this expression we find
dP(T − ∆|T)/dT = (dΦ_s(T − ∆, T)/dT) P(T) Φ_s^T(T − ∆, T) + Φ_s(T − ∆, T) (dP(T)/dT) Φ_s^T(T − ∆, T)
               + Φ_s(T − ∆, T) P(T) (dΦ_s^T(T − ∆, T)/dT) − G Q G^T(T − ∆)
               + Φ_s(T − ∆, T) G Q G^T(T) Φ_s^T(T − ∆, T)
               − ∫_T^{T−∆} (dΦ_s(T − ∆, τ)/dT) G Q G^T(τ) Φ_s^T(T − ∆, τ) dτ
               − ∫_T^{T−∆} Φ_s(T − ∆, τ) G Q G^T(τ) (dΦ_s^T(T − ∆, τ)/dT) dτ.
Again we will use Equation 178 to evaluate dΦ_s(T − ∆, τ)/dT in the above integrals and then write them in terms of P(T − ∆|T) and Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T) using the proposed integral solution for P(T − ∆|T). When we do this, along with other simplifications of the derivatives that appear, we obtain
dP(T − ∆|T)/dT = (F + G Q G^T P^{-1}(T − ∆)) Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T)
               − Φ_s(T − ∆, T)(F + G Q G^T P^{-1}(T)) P(T) Φ_s^T(T − ∆, T)
               + Φ_s(T − ∆, T)[ F P + P F^T + G Q G^T − P H^T R^{-1} H P(T) ] Φ_s^T(T − ∆, T)
               + Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T)(F + G Q G^T P^{-1}(T − ∆))^T
               − Φ_s(T − ∆, T) P(T)(F + G Q G^T P^{-1}(T))^T Φ_s^T(T − ∆, T)
               − G Q G^T(T − ∆)
               + Φ_s(T − ∆, T) G Q G^T(T) Φ_s^T(T − ∆, T)
               − (F + G Q G^T P^{-1}(T − ∆))[ Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T) − P(T − ∆|T) ]
               − [ Φ_s(T − ∆, T) P(T) Φ_s^T(T − ∆, T) − P(T − ∆|T) ](F + G Q G^T P^{-1}(T − ∆))^T.
As expected, many terms cancel in the above expression and when the smoke clears we find
we are left with
dP(T − ∆|T)/dT = (F + G Q G^T P^{-1}(T − ∆)) P(T − ∆|T)
               + P(T − ∆|T)(F + G Q G^T P^{-1}(T − ∆))^T
               − Φ_s(T − ∆, T) P H^T R^{-1} H P Φ_s^T(T − ∆, T)
               − G Q G^T(T − ∆).   (186)
Problem Solutions
To solve this problem let's begin by expanding the given objective function as
Then using Equations 311 and 312 we can compute the derivative of J with respect to x.
We find
∂J/∂x = 2 P^{-1} x + 2 P_b^{-1} x − 2 (P^{-1} x̂) − 2 (P_b^{-1} x̂_b).
Setting this result equal to zero we have
Warning: I was unable to derive the given expression for Λ̇(t) or to show the identity
as requested in this problem. Below I present the algebraic steps I took and where I got
stuck. If anyone sees what to do next or an alternative solution please contact me.
then taking the derivative of x̂(t|T ) using the product rule gives
dλ/dt = −F^T λ + H^T R^{-1} H P λ + H^T R^{-1} (z − H x̂)
      = −[F − P H^T R^{-1} H]^T λ + H^T R^{-1} (z − H x̂),
as we were to show. Note that since x̂(t|T ) when t = T is given by x̂(T |T ) = x̂(T ), we see
that this translates into an initial condition on λ(t) of the following
Using the definition of Λ(t) as E[λ(t)λ(t)T ] we have that the first derivative of this expression
(when we use the results from above) is
d/dt Λ(t) = E[(dλ/dt) λ^T] + E[λ (dλ/dt)^T]
          = −(F − P H^T R^{-1} H)^T E[λ λ^T] − E[λ λ^T](F − P H^T R^{-1} H)
          + H^T R^{-1} E[(z − H x̂) λ^T] + E[λ (z − H x̂)^T] R^{-1} H.   (191)
This result is similar to the expression we are attempting to derive for Λ̇. To make the two
expressions the same we need to evaluate the last two terms above. Since the two terms on
line 191 are transposes of each other we will evaluate only the first one and get the second
one by transposition. From the definition of λ we have
λ = P −1 (x̂ − x̂(t|T )) ,
Next lets consider the second factor in the product above. From the definition of x̂(t|T ) in
Equation 166 and using Equation 162 we see that
With this result we can now compute the inner product needed in Equation 192. We find
Part (a): See the problem 4-11 on Page 73 where we do this calculation in detail.
Part (b): We will consider the Rauch-Tung-Striebel (RTS) covariance Equation 165 in
steady-state where Ṗ (t|T ) = 0 but specified for this problem where all system matrices are
scalars and constant. Specifically we have F = a, G = 1, Q = q, H = b, and R = r so the
RTS equation becomes
0 = 2 (a + q/p_∞) p_∞(t|T) − q.
When we solve this for p_∞(t|T) we get
p_∞(t|T) = q / (2 (a + q/p_∞)) = p_∞ / (2 (1 + (a/q) p_∞)).
To solve this problem another way one could consider the backwards covariance filtering
equation given by
d/dτ P_b^{-1}(T − τ) = P_b^{-1}(T − τ) F(T − τ) + F^T(T − τ) P_b^{-1}(T − τ)
                     − P_b^{-1}(T − τ) G(T − τ) Q(T − τ) G^T(T − τ) P_b^{-1}(T − τ)
                     + H^T(T − τ) R^{-1}(T − τ) H(T − τ).
Set dP_b^{-1}/dτ = 0 and solve for P_b(∞). For this problem the above becomes
0 = 2a/p_b(∞) − q/p_b(∞)² + b²/r,
which we would need to solve for pb (∞). Given this value we can compute the desired
expression P∞ (t|T ) using P∞ (t|T )−1 = P −1 (∞) + Pb−1 (∞).
Using the expression for p_∞ from Part (a) in the ratio above we find
p_∞(t|T)/p_∞ = 1 / (2 (1 + (a/q) p_∞)) = 1 / (2 (1 + (a² r/(b² q)) (1 + √(1 + b² q/(a² r))))).
Defining γ² as γ² = b² q/(a² r) the above becomes
1 / (2 (1 + (1/γ²)(1 + √(1 + γ²)))),
which if we multiply by γ² on the top and bottom of this expression gives the desired result.
For this problem we desire to apply fixed interval smoothing to a discrete system which looks
like
Thus we have that Φk = 1, Qk = q∆, Hk = 1, and Rk = r0 . Note that the forward filtering
part of this problem is the same as that of Problem 4-14 on page 77.
Part (a): For this part we want to use fixed-interval smoothing to compute p_{0|2} and p_{1|2}, so N = 2, and to solve this problem using the Rauch-Tung-Striebel algorithm we first need to compute the forward filtered solution p_k(±).
Since we are told to assume no a-priori information on the knowledge of the state we must
take p0 (+) ≈ +∞. If we do this directly it seems that we run into problems when we perform
backwards filtering (in that we obtain the indefinite ratio of ∞/∞) with the above forward
filtered results. Thus I’ll take our initial condition on p0 (+) to be
p_0(+) = 1/ε,
where ε is a small number. Just as in Problem 4-14 we iterate the discrete Kalman filter equations for k = 0, 1, 2 to find, when we take ε = 0,
p_0(+) = +∞
p_1(−) = +∞
p_1(+) = r_0
p_2(−) = r_0 (1 + γ)
p_2(+) = r_0 (1 + γ)/(2 + γ).
When we keep ε ≠ 0 we can then perform the discrete RTS smoothing equations backwards. Starting with p_{N|N} = p_{2|2} = p_2(+) we compute for k = 1 and then k = 0 the following
A_k = P_k(+) Φ_k^T P_{k+1}^{-1}(−)
P_{k|N} = P_k(+) + A_k [ P_{k+1|N} − P_{k+1}(−) ] A_k^T.
The calculations when p_0(+) = 1/ε and the subsequent limit as ε → 0 are rather tedious and are done in the Mathematica file chap 5 prob 6.nb. Performing the above iterations we obtain
p_{2|2} = p_2(+) = r_0 (1 + γ)/(2 + γ)
a_1 = p_1(+) Φ_1^T p_2^{-1}(−) = 1/(1 + γ)
p_{1|2} = p_1(+) + a_1 [ p_{2|2} − p_2(−) ] a_1^T = p_1(+) + a_1² [ p_2(+) − p_2(−) ] = r_0 (1 + γ)/(2 + γ)
a_0 = p_0(+) Φ_0^T p_1^{-1}(−) = 1
p_{0|2} = p_0(+) + a_0 [ p_{1|2} − p_1(−) ] a_0^T = r_0 (1 + 3γ + γ²)/(2 + γ).
Warning: Note that these expressions are somewhat different than the ones presented for
this problem. If anyone sees an error in what I’ve done or can verify that these are correct
please contact me.
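The following Python sketch (my own check, not the book's) runs the forward filter with p_0(+) = 1/ε for a small ε and then the discrete RTS backward pass, and confirms numerically the limiting expressions derived above (which, as warned, may differ from the book's). Here γ = q∆/r_0 and the numerical values are arbitrary.

import numpy as np

r0, q_dt = 1.0, 0.4                 # so gamma = q*Delta / r0 = 0.4
gamma = q_dt / r0
eps = 1e-9

# Forward filter: p_k(-) = p_{k-1}(+) + q_dt,  p_k(+) = p_k(-) r0 / (p_k(-) + r0).
p_plus, p_minus = [1.0 / eps], [None]
for k in range(1, 3):
    pm = p_plus[k - 1] + q_dt
    p_minus.append(pm)
    p_plus.append(pm * r0 / (pm + r0))

# Backward RTS pass: a_k = p_k(+)/p_{k+1}(-),  p_{k|2} = p_k(+) + a_k^2 (p_{k+1|2} - p_{k+1}(-)).
p_smooth = {2: p_plus[2]}
for k in (1, 0):
    a = p_plus[k] / p_minus[k + 1]
    p_smooth[k] = p_plus[k] + a**2 * (p_smooth[k + 1] - p_minus[k + 1])

print(p_smooth[2], r0 * (1 + gamma) / (2 + gamma))
print(p_smooth[1], r0 * (1 + gamma) / (2 + gamma))
print(p_smooth[0], r0 * (1 + 3 * gamma + gamma**2) / (2 + gamma))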
Part (c): In fixed-point smoothing we desire a smoothed estimate of the state at a particular
point of interest while the “end point” of the interval grows. Specifically, in fixed-point
optimal smoothing we will fix the index k and then let the index N increase. For this
problem since we want to compute p0|1 and p0|2 that means we take k = 0 and let N = 1 and
N = 2. Once k is fixed and using the a priori and a posteriori covariance estimate Pi (±)
for i ≥ k computed from forward filtering we will compute the desired fixed-point smoothed
solutions Pk|N for N = k + 1, k + 2, · · · by using
B_N = ∏_{i=k}^{N−1} P_i(+) Φ_i^T P_{i+1}^{-1}(−)
P_{k|N} = P_{k|N−1} + B_N [ P_k(+) − P_k(−) ] B_N^T,
with P_{k|k} = P_k(+).
Warning: I don’t see how to evaluate the term P0 (−) since our initial a posteriori uncer-
tainty was to be infinite P0 (+) = ∞. This might mean that P0 (−) = ∞. In any case these
results don’t agree with what the book claims this expression should be.
Chapter 6 (Nonlinear Estimation)
If we perform a power series expansion of our nonlinear function f (x, t) in terms of the
current estimate (the conditional mean x̂(t)) then we have
f(x, t) ≈ f(x̂, t) + (∂f/∂x)|_{x=x̂} (x − x̂) + · · · = f(x̂, t) + F (x − x̂) + · · ·,
where F is a function of the state x̂ we linearize about and the time t, i.e. F = F(x̂, t). Then the state estimate x̂ satisfies
dx̂(t)/dt = f̂(x(t), t).   (194)
Next using the book's equation 6.1-5, or
Ṗ(t) = E[x f^T] − x̂ f̂^T + E[f x^T] − f̂ x̂^T + Q,   (195)
we will evaluate the right-hand-side using the above power series expansion for f(x, t). For the term E[x f^T] we find
E[x f^T] = E[x f(x̂, t)^T] + E[x (x − x̂)^T F^T]
         = E[x] f(x̂, t)^T + E[(x − x̂)(x − x̂)^T] F^T + E[x̂ (x − x̂)^T] F^T
         = x̂ f(x̂, t)^T + P F^T.
To evaluate E[f x^T] we simply take the transpose of the above result. To evaluate the expression f̂ we have
f̂ = E[f(x, t)] ≈ f(x̂, t) + E[F (x − x̂)] = f(x̂, t).
Using these two expressions in Equation 195 we have for Ṗ
Ṗ(t) = x̂ f(x̂, t)^T + P F^T − x̂ f(x̂, t)^T
     + f(x̂, t) x̂^T + F P − f(x̂, t) x̂^T + Q
     = P F^T + F P + Q,   (196)
which is the book’s equation 6.1-8.
We will estimate the state at time tk or xk after the measurement zk using a formula like
x̂k (+) = ak + Kk zk . (197)
Then introducing the definition of the a priori and a posteriori state errors x̃_k(±) ≡ x̂_k(±) − x_k, and first using x̃_k(+) on the left-hand-side of the proposed estimator Equation 197 above, we get
x̃_k(+) + x_k = a_k + K_k (h_k(x_k) + v_k).
Next using x̃_k(−) = x̂_k(−) − x_k to replace x_k on the left-hand-side of this expression we get
x̃_k(+) = x̃_k(−) − x̂_k(−) + a_k + K_k (h_k(x_k) + v_k),   (199)
which is the book's equation 6.1-11. Now taking the expectation of both sides of this expression, and assuming that our earlier estimate of x_k was unbiased, that is E[x̃_k(−)] = 0, then to make our a posteriori estimate of x_k unbiased we require
a_k = x̂_k(−) − K_k E[h_k(x_k)],
and the a posteriori estimate x̂k (+) in Equation 197 then takes the form
x̂k (+) = ak + Kk zk
= x̂k (−) + Kk (zk − E[hk (xk )]) , (200)
which is the book's equation 6.1-13. Using this expression for ak we can go back to the
expression above for the a posteriori estimate error x̃k (+) or Equation 199 where we find
x̃k (+) = x̂k (−) − Kk E[hk (xk )] + Kk hk (xk ) + Kk vk + x̃k (−) − x̂k (−)
= x̃k (−) + Kk (hk (xk ) − E[hk (xk )]) + Kk vk , (201)
or the book's equation 6.1-14. This expression makes it easy to compute Pk (+) since it is the
expectation of the above expression “squared”. Specifically Pk (+) = E[x̃k (+)x̃k (+)T ] and
this quadratic product is given by
x̃k (+)x̃k (+)T = x̃k (−)x̃k (−)T + x̃k (−)(hk (xk ) − E[hk (xk )])T KkT + x̃k (−)vkT KkT
+ Kk (hk (xk ) − E[hk (xk )])x̃k (−)T
+ Kk (hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T KkT
+ Kk (hk (xk ) − E[hk (xk )])vkT KkT
+ Kk vk x̃k (−)T + Kk vk (hk (xk ) − E[hk (xk )])T KkT + Kk vk vkT KkT .
When we take the expectation of the above many terms simplify. Specifically using
As we have done before we will select Kk so that Pk (+) has a minimum trace. Defining Jk =
trace(Pk (+)), we then seek to minimize Jk as a function of Kk by taking the Kk derivative
of Jk, setting the result equal to zero and then solving for Kk. From Equation 202 we
have several types of derivatives to take. Using Equation 112 with either B or C equal to the
identity matrix we can take the derivative of the second and third terms in Equation 202,
while using Equation 113 we can take the derivative of the fourth and fifth terms. When we
use these expressions we find we need to solve
∂J_k/∂K_k = E[x̃_k(−)(h_k(x_k) − E[h_k(x_k)])^T]
          + E[x̃_k(−)(h_k(x_k) − E[h_k(x_k)])^T]
          + 2 K_k E[(h_k(x_k) − E[h_k(x_k)])(h_k(x_k) − E[h_k(x_k)])^T] + 2 K_k R_k = 0,
or the book's equation 6.1-17. We next want to put this expression into Equation 202 to evaluate what the minimum value of the objective function J_k is. To do this we will briefly introduce some short-hand notation so that the manipulations are more manageable. We define the symbols "xh^T" and "hh^T" as
xh^T = E[x̃_k(−)(h_k(x_k) − E[h_k(x_k)])^T]   and
hh^T = E[(h_k(x_k) − E[h_k(x_k)])(h_k(x_k) − E[h_k(x_k)])^T].
With this short-hand we have K_k = −xh^T (hh^T + R_k)^{-1}, and we find that the term K_k (hh^T + R_k) K_k^T in P_k(+) becomes
xh^T (hh^T + R_k)^{-1} (hh^T + R_k)(hh^T + R_k)^{-1} hx^T = xh^T (hh^T + R_k)^{-1} hx^T,
which cancels with the third term. Thus we get (expressed in terms of the expressions with
expectations and not the short-hand notation)
If, as the book suggests, we Taylor expand the nonlinear function h_k(x_k) about the a priori state estimate x̂_k(−) as
h_k(x_k) ≈ h_k(x̂_k(−)) + H_k (x_k − x̂_k(−))   with   H_k ≡ (∂h_k/∂x)|_{x = x̂_k(−)},
then using this we observe that the expectation of h_k(x_k), denoted by E[h_k(x_k)], is equal to h_k(x̂_k(−)) and thus
h_k(x_k) − E[h_k(x_k)] ≈ H_k (x_k − x̂_k(−)) = −H_k x̃_k(−).
Thus some of the expectations in the formulas for Kk and Pk (+) simplify as
E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ] = Hk Pk (−)HkT ,
and
E[x̃k (−)(hk (xk ) − E[hk (xk )])T ] = −Pk (−)HkT .
Using both of these observations we see that Equation 204 becomes
P_k(+) = P_k(−) − P_k(−) H_k^T (H_k P_k(−) H_k^T + R_k)^{-1} H_k P_k(−) = (I − K_k H_k) P_k(−),
for the a posteriori covariance update equation for the extended Kalman filter and the book's equation 6.1-21.
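As a small illustration of how these update equations are used in practice, here is a minimal Python sketch of a single extended Kalman filter measurement update. The range-measurement example h(x) = √(x_1² + x_2²) and all numerical values are hypothetical; only the structure (linearize h about x̂_k(−), then apply the standard gain and the covariance update of equation 6.1-21) comes from the text.

import numpy as np

def ekf_update(x_minus, P_minus, z, h, H_jac, R):
    H = H_jac(x_minus)                                   # Jacobian evaluated at the prior mean
    S = H @ P_minus @ H.T + R
    K = P_minus @ H.T @ np.linalg.inv(S)                 # EKF gain
    x_plus = x_minus + K @ (z - h(x_minus))              # state update with E[h] ~ h(x_hat(-))
    P_plus = (np.eye(len(x_minus)) - K @ H) @ P_minus    # covariance update (6.1-21)
    return x_plus, P_plus

# Hypothetical example: range measurement of a 2-D position, h(x) = sqrt(x1^2 + x2^2).
h = lambda x: np.array([np.hypot(x[0], x[1])])
H_jac = lambda x: np.array([[x[0], x[1]]]) / np.hypot(x[0], x[1])
x_minus = np.array([3.0, 4.0])
P_minus = np.diag([0.5, 0.5])
R = np.array([[0.1]])
z = np.array([5.2])
print(ekf_update(x_minus, P_minus, z, h, H_jac, R))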
In this section we will attempt to derive many of the expressions for a second-order filter
presented in the book. To begin we will perform a second-order Taylor expansion of f (x(t), t)
and hk (xk ) about x̂(t) and x̂k (−) respectively as follows
f(x(t), t) = f(x̂(t), t) − F(x̂(t), t) x̃(t) + (1/2) ∂²(f, x̃(t) x̃(t)^T) + · · ·   (206)
h_k(x_k) = h_k(x̂_k(−)) − H(x̂_k(−)) x̃_k(−) + (1/2) ∂²(h_k, x̃_k(−) x̃_k(−)^T) + · · ·,   (207)
where for any matrix B the expression ∂²(f, B) is a vector with ith component defined as
∂²_i(f, B) ≡ trace[ (∂²f_i/∂x_p ∂x_q) B ].   (208)
When these expressions are put into the state dynamic Equation 194 or dx̂(t)/dt = f̂(x(t), t) we get
dx̂(t)/dt = f̂(x(t), t) = E[f(x(t), t)]
          = E[ f(x̂(t), t) − F(x̂, t) x̃(t) + (1/2) ∂²(f, x̃ x̃^T) ]
          = f(x̂(t), t) + (1/2) ∂²(f, P(t)),
2
since E[F (x̂, t)x̃(t)] = F (x̂, t)E[x̃(t)] = 0.
Next we want to put the second-order Taylor expansions above into Equation 195 or
Ṗ(t) = E[x f^T] − x̂ f̂^T + E[f x^T] − f̂ x̂^T + Q.
Since we know how to evaluate f̂, the expectation of f, let's first consider the term E[x f^T].
Before we take the expectation, under the second order Taylor expansion of f (x, t) we find
xf T is given by
x f^T = x [ f(x̂)^T − x̃^T F(x̂)^T + (1/2) ∂²(f, x̃ x̃^T)^T ].
When we take expectations of this using the fact that x = x̂ − x̃ we get
E[x f^T] = x̂ f(x̂)^T − E[(x̂ − x̃) x̃^T] F(x̂)^T + (1/2) E[(x̂ − x̃) ∂²(f, x̃ x̃^T)^T]
         = x̂ f(x̂)^T + P(t) F(x̂)^T + (1/2) x̂ ∂²(f, P(t))^T − (1/2) E[x̃ ∂²(f, x̃ x̃^T)^T].
From this we see that we now need to evaluate the expectation of the matrix x̃ ∂²(f, x̃ x̃^T)^T, which has ijth component given by
( x̃ ∂²(f, x̃ x̃^T)^T )_{ij} = x̃_i trace[ (∂²f_j/∂x_p ∂x_q) x̃ x̃^T ].
When we take the expectation of this we get zero, assuming that the x̃_i are jointly Gaussian random variables with zero mean, because then E[x̃_i x̃_q x̃_p] = 0. After all of this we finally arrive at
E[x f^T] = x̂ f(x̂)^T + P(t) F(x̂)^T + (1/2) x̂ ∂²(f, P(t))^T.
Now the expectation of f is given by f̂ = f(x̂) + (1/2) ∂²(f, P(t)), so we can now evaluate Ṗ(t) using Equation 195. We find
Ṗ(t) = x̂ f(x̂)^T + P(t) F(x̂)^T + (1/2) x̂ ∂²(f, P(t))^T − x̂ f(x̂)^T − (1/2) x̂ ∂²(f, P(t))^T
     + f(x̂) x̂^T + F(x̂) P(t) + (1/2) ∂²(f, P(t)) x̂^T − f(x̂) x̂^T − (1/2) ∂²(f, P(t)) x̂^T + Q
     = P(t) F(x̂)^T + F(x̂) P(t) + Q.
From the given second-order Taylor series expansion for hk (xk ) we have the expectation of
hk (xk ) denoted by ĥk (xk ) given by
ĥ_k(x_k) = E[h_k(x_k)] = h_k(x̂_k(−)) + (1/2) ∂²(h_k, P_k(−)).
Thus we see that Equation 200 becomes
x̂_k(+) = x̂_k(−) + K_k [ z_k − h_k(x̂_k(−)) − (1/2) ∂²(h_k, P_k(−)) ],
the desired equation in 6.1-26.
Next we simplify Equation 203 to derive the equation for Kk under the second-order Taylor
series approximation. To do this we first evaluate
h_k(x_k) − ĥ_k(x_k) = h_k(x̂_k(−)) − H(x̂_k(−)) x̃_k(−) + (1/2) ∂²(h_k, x̃_k(−) x̃_k(−)^T)
                    − h_k(x̂_k(−)) − (1/2) ∂²(h_k, P_k(−))
                    = −H(x̂_k(−)) x̃_k(−) + (1/2) ∂²(h_k, x̃_k(−) x̃_k(−)^T) − (1/2) ∂²(h_k, P_k(−)).
Using this expression we see that the product x̃k (−)(hk (xk ) − ĥk (xk ))T is then
−x̃_k(−) x̃_k(−)^T H(x̂_k(−))^T + (1/2) x̃_k(−) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T − (1/2) x̃_k(−) ∂²(h_k, P_k(−))^T.
Taking the expectation of this, the third term vanishes and by using Equation 210 the second term also vanishes. Thus we are left with
E[ x̃_k(−)(h_k(x_k) − E[h_k(x_k)])^T ] = −P_k(−) H(x̂_k(−))^T.   (211)
Next we can now compute the inner product required in the expression for the matrix inverse
portion of Kk or
[hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T .
To do this let's define this product as T, and use the shorthand H ≡ H(x̂_k(−)). Then this product has nine terms and is given by
T = H x̃_k(−) x̃_k(−)^T H^T − (1/2) H x̃_k(−) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T + (1/2) H x̃_k(−) ∂²(h_k, P_k(−))^T
  − (1/2) ∂²(h_k, x̃_k(−) x̃_k(−)^T) x̃_k(−)^T H^T
  + (1/4) ∂²(h_k, x̃_k(−) x̃_k(−)^T) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T
  − (1/4) ∂²(h_k, x̃_k(−) x̃_k(−)^T) ∂²(h_k, P_k(−))^T
  + (1/2) ∂²(h_k, P_k(−)) x̃_k(−)^T H^T
  − (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T
  + (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, P_k(−))^T.
Taking the required expectation of this expression and recalling Equation 210 we see that
the second, third, fourth, and seventh terms vanish and we get
E[T] = H P_k(−) H^T + (1/4) E[∂²(h_k, x̃_k(−) x̃_k(−)^T) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T]
     − (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, P_k(−))^T − (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, P_k(−))^T
     + (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, P_k(−))^T,
or, canceling terms,
E[T] = H P_k(−) H^T + (1/4) E[∂²(h_k, x̃_k(−) x̃_k(−)^T) ∂²(h_k, x̃_k(−) x̃_k(−)^T)^T]
     − (1/4) ∂²(h_k, P_k(−)) ∂²(h_k, P_k(−))^T.   (212)
In the above expression notice that the last two terms are exactly the book's definition of the matrix A_k. Next let's evaluate the above expression for A_k. To begin with, for notational simplicity, we will drop the k subscripts and the (−) notation and consider the second term in the above expression, or
∂²(h, x̃ x̃^T) ∂²(h, x̃ x̃^T)^T.
This matrix, since it is an outer product, has ijth element given by
∂²(h, x̃ x̃^T)_i ∂²(h, x̃ x̃^T)_j = ( Σ_p Σ_q (∂²h_i/∂x_p ∂x_q) x̃_q x̃_p ) ( Σ_m Σ_n (∂²h_j/∂x_m ∂x_n) x̃_n x̃_m )
                                = Σ_{p,q,m,n} (∂²h_i/∂x_p ∂x_q)(∂²h_j/∂x_m ∂x_n) x̃_p x̃_q x̃_m x̃_n.
Taking the expectation of this expression and using the fact that for Gaussian random
variables we have
E[x̃p x̃q x̃m x̃n ] = ppq pmn + ppm pqn + ppn pqm , (213)
we can write the above as
Σ_{p,q,m,n} (∂²h_i/∂x_p ∂x_q)(∂²h_j/∂x_m ∂x_n) [ p_{pq} p_{mn} + p_{pm} p_{qn} + p_{pn} p_{qm} ].
At the same time the ijth element of the other term in the definition of Ak is
∂²(h, P)_i ∂²(h, P)_j = Σ_{p,q,m,n} (∂²h_i/∂x_p ∂x_q)(∂²h_j/∂x_m ∂x_n) p_{pq} p_{mn}.
This combined with the other term in Equation 212 gives the book's equation 6.1-28. Combining all of the expressions obtained thus far we finally end with
K_k = P_k(−) H_k(x̂_k(−))^T [ H_k(x̂_k(−)) P_k(−) H_k(x̂_k(−))^T + R_k + A_k ]^{-1},
as we were to show.
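For concreteness, the following Python sketch (my own, under the Gaussian assumption used above) evaluates the second-order terms for a measurement function given its Hessians: the bias correction ∂²(h, P) of Equation 208, and the matrix A_k, whose ij element reduces to 2 trace(Hess_i P Hess_j P) after the cancellation leading to the book's 6.1-28. The example measurement z = x_1² + x_2 is hypothetical.

import numpy as np

def second_order_terms(hessians, P):
    # hessians[i] is the Hessian matrix of the ith measurement component h_i.
    m = len(hessians)
    d2 = np.array([np.trace(H @ P) for H in hessians])              # d2(h, P), Equation 208
    A = np.array([[2.0 * np.trace(hessians[i] @ P @ hessians[j] @ P)
                   for j in range(m)] for i in range(m)])           # Gaussian fourth-moment result
    return d2, A

P = np.array([[0.5, 0.1], [0.1, 0.3]])
hessians = [np.array([[2.0, 0.0], [0.0, 0.0]])]   # Hessian of h(x) = x1^2 + x2
d2, A = second_order_terms(hessians, P)
print(d2)   # bias correction term used in the state update
print(A)    # extra term added to H P H^T + R in the gain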
In this section we have computed all of the needed expectations required to evaluate Equa-
tion 204. Using everything from earlier we find that
In this section we seek to approximate the nonlinear vector function f(x) with the linear form
f(x) ≈ a + N_f x,   (215)
where the vector a and the matrix N_f are determined by statistical linearization. To determine the specific form for a and N_f, introduce the approximation error e as
e = f(x) − a − N_f x,
and form the objective function
J = E[e^T A e] = E[(f(x) − a − N_f x)^T A (f(x) − a − N_f x)],   (216)
where A is some symmetric positive semidefinite matrix. To find the minimum of J with respect to a, we take the derivative with respect to a, set the resulting expression equal to zero and then solve for a. Using Equation 312 to take the derivative we find
∂J/∂a = E[−2 A (f(x) − a − N_f x)] = 0.
When we solve for a we get
a = E[f(x)] − N_f E[x] = f̂ − N_f x̂,
or the book's equation 6.2-7. When we put this expression for a back into our approximate expression for f(x) given by Equation 215 we get
f(x) ≈ f̂ + N_f (x − x̂),
and we need to find the minimum of the above expression as a function of Nf . Taking the
Nf derivative of the above expression is made easier if we write J as
Then using the product rule for the fourth term and Equations 319 and 320 to evaluate the
matrix derivatives we see that
∂J/∂N_f = −E[A (f − f̂)(x − x̂)^T] − E[A (f − f̂)(x − x̂)^T]
        + E[A N_f (x − x̂)(x − x̂)^T] + E[A N_f (x − x̂)(x − x̂)^T]
        = −2 A E[(f − f̂)(x − x̂)^T] + 2 A N_f E[(x − x̂)(x − x̂)^T].
When we set this last expression equal to zero and solve for N_f we find
N_f E[(x − x̂)(x − x̂)^T] = E[(f − f̂)(x − x̂)^T],
so
N_f = (E[f x^T] − f̂ x̂^T) P^{-1},   (220)
or the book's equation 6.2-9.
Note that pjs is a column vector and contains the elements in the jth column/row of P =
E[x̃x̃T ] excluding the jth diagonal element, while Σss is a matrix. When we put these two
into Equation 222 we get
ξ_{ij} = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f_i(x_s) p(x_s) [ x̂_j + p_{js}^T Σ_{ss}^{-1} (x_s − x̂_s) ] dx_s
       = E[ (x̂_j + p_{js}^T Σ_{ss}^{-1} (x_s − x̂_s)) f_i(x_s) ],
since x̂j , pjs and Σss are all constants with respect to the expectation over xs . Now since
the expression pTjs Σ−1
ss E[(xs − x̂s )fi (xs )] is a scalar we can take its transpose and not change
its value. Doing this gives
ξ_{ij} = f̂_i x̂_j + n_{si}^T p_{js},   (223)
where we have defined n_{si} as
n_{si}^T = E[f_i(x_s)(x_s − x̂_s)^T] E[(x_s − x̂_s)(x_s − x̂_s)^T]^{-1},   (224)
which is the book's equation 6.2-36. Note that I think the book is missing a transpose on its definition of n_{si}.
Notes on Direct Statistical Analysis of Nonlinear Systems (CADET)
Taking the expectation of this and using the fact that E[rmT ] = E[mr T ] = 0 and that m is
a constant gives
We next want to take the trace of this expression and use it to evaluate the Nm and Nr
derivatives needed to find a minimum of the objective function J = trace(E[eeT ]). The
derivative expressions we need are
∂ ∂
trace(E[eeT ]) = 0 and trace(E[eeT ]) = 0 .
∂Nm ∂Nr
To evaluate these derivatives we will use Equations 313, 314, 315, 316, 317, and 318. For
the derivative of Nm we find
∂/∂N_m trace(E[e e^T]) = −E[f] m^T − E[f] m^T + 2 N_m m m^T = 0,
or that N_m must satisfy
N_m m m^T = E[f] m^T,   (226)
which is the book's equation 6.4-4. For the derivative of Nr we find
∂/∂N_r trace(E[e e^T]) = −E[f r^T] − E[f r^T] + N_r (2 E[r r^T]) = 0,
or that N_r must satisfy
N_r E[r r^T] = E[f r^T],   (227)
which is the book's equation 6.4-5.
Now our dynamic equation is given by ẋ = f (x, t) + w which under the assumption that
x = m + r and Equation 225 becomes
ṁ + ṙ = Nm m + Nr r + w .
When w ∼ N(b, Q) we can introduce the variable u as u = w − b and get
ṁ + ṙ = Nm m + Nr r + b + u .
If we assume that we can decouple into two equations the expressions for the mean from the
residual expressions we get the following
ṁ = Nm m + b (228)
ṙ = Nr r + u , (229)
which are the book’s equations 6.4-9. From Equation 229 we can derive the differential
equation for S ≡ E[rr T ] to find
Ṡ = Nr (m, S)S + SNrT (m, S) + Q , (230)
since w ∼ N(b, Q) so u ≡ w − b ∼ N(0, Q).
If our system is linear f (x) = F x = F m + F r we can evaluate the Equations 226 and 227.
For Equation 227 we find that E[f r T ] is given by
E[f r T ] = E[F mr T + F rr T ] = F E[rr T ] = F S(t) .
So Nr becomes
Nr = E[f r T ]S −1 = F SS −1 = F .
In the same way we find that Equation 226 becomes
Nm m = E[f (x)] = F m so Nm = F ,
also.
When we then consider this expression for all values of i we see that
∂E[f(x)]/∂m = E[f(x) r^T] S^{-1} = N_r(m, S),
as we were to show.
In the special case where f is a scalar function and we assume that the random perturbation r is Gaussian then, taking n_r ≡ N_r(m, S), we find
n_r = E[f(x) r^T] S^{-1} = (1/σ²) (1/(√(2π) σ)) ∫_{−∞}^{∞} f(m + r) r e^{−r²/2σ²} dr
    = (1/(√(2π) σ³)) ∫_{−∞}^{∞} f(m + r) r e^{−r²/2σ²} dr.   (234)
At the same time we find the scalar version of the equation for Nm or Nm (m, S)m = E[f ]
becomes
n_m = (1/(√(2π) σ m)) ∫_{−∞}^{∞} f(m + r) e^{−r²/2σ²} dr.   (235)
Problem Solutions
The given expression for the probability density function (p.d.f.) for x is a special case of a distribution known as the gamma distribution. If X is given by a gamma distribution then it has a p.d.f. given by
f(x|α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx}.   (236)
From this we see that the book's expression can be obtained by taking α = 2 and β = λ. We now derive several properties of the gamma distribution and then answer the requested questions by making the substitution α = 2 and β = λ in the resulting expressions.
Part (a): If we take α = 2 and β = λ in the expression from Equation 239 we get
E(X) = 2/λ.
Part (b): For this part to find the maximum value of f (x|α, β) when X is a gamma random
variable we take the x derivative of f , set the result equal to zero, and then solve for x. We
find
df(x|α, β)/dx = (β^α / Γ(α)) [ (α − 1) x^{α−2} e^{−βx} − β x^{α−1} e^{−βx} ] = 0.
When we solve for x we find
x = (α − 1)/β.
If we take α = 2 and β = λ in the above expression we get
x = 1/λ.
Part (c): The expectation of y is given by E[Y] = ∫ y p(y) dy, while the value of y that maximizes p(y) is given by the solution to p′(y) = 0. If these two points are the same then we must have
p′(E[y]) = 0.
For this problem we use the "E" notation for expectation rather than the book's "hat" notation. In symbols E[X] ≡ X̂. The book's equation 6.1-5 is
If our function f is in fact a linear function f (x) = F x then E[f ] = F E[x] where we are
assuming that F is not state dependent. Next xf T = xxT F T under this linear assumption,
so taking expectations we have
E[xf T ] = E[xxT ]F T .
Since P(t) = E[(x − E[x])(x − E[x])^T] we have that
E[x x^T] = P(t) + E[x] E[x]^T,
and we see that E[x f^T] is given by
Part (a): We will estimate x(t_k) after observing the measurement z_k using an expression quadratic in z_k, or
x̂_k(+) = a_k + b_k z_k + c_k z_k².
Since zk = h(xk ) + vk in terms of h(·) the above becomes
To have the above expression for x̂k (+) be an unbiased estimator of xk we require that
E[x̂k (+)] = E[xk ] = x̂k (−). Using this with E[vk ] = 0 and E[vk2 ] = r when we take the
expectation of Equation 243 we get
For this problem we derive the expressions for a linearized Kalman filter that are summarized
in the book. To begin we consider a first-order Taylor expansion of f (x(t), t) and hk (xk )
about a known trajectory x̄(t) as follows
f(x(t), t) = f(x̄(t), t) + (∂f/∂x)|_{x=x̄(t)} (x − x̄) + · · ·   (245)
h_k(x_k) = h_k(x̄(t_k)) + (∂h_k/∂x)|_{x=x̄(t_k)} (x_k − x̄(t_k)) + · · ·.   (246)
To simplify notation we will define the matrices F and Hk to be
F = F(x̄(t), t) = (∂f/∂x)|_{x=x̄(t)}
H_k = H_k(x̄(t_k), t_k) = (∂h_k/∂x)|_{x=x̄(t_k)}.
When the expression for f(x(t), t) above is put into the state dynamic Equation 194 or dx̂(t)/dt = f̂(x(t), t) we get
dx̂(t)/dt = f̂(x(t), t) = E[f(x(t), t)] = f(x̄(t), t) + F(x̄(t), t)(x̂ − x̄).
Next we want to put our Taylor expansions above into Equation 195 or
Ṗ(t) = E[x f^T] − x̂ f̂^T + E[f x^T] − f̂ x̂^T + Q.
Since we know how to evaluate f̂, the expectation of f, let's first consider the term E[x f^T].
Before we take the expectation, under the Taylor expansion of f(x, t) above we find x f^T is given by
xf T = xf (x̄(t), t)T + x(x − x̄)T F (x̄(t), t)T .
When we use the fact that x = x̂ − x̃ we get xf T equal to
From the given Taylor series expansion for h_k(x_k) we have the expectation of h_k(x_k), denoted by ĥ_k(x_k), given by
ĥ_k(x_k) = E[h_k(x_k)] = h_k(x̄(t_k)) + H(x̄(t_k), t_k)(x̂_k(−) − x̄(t_k)).
Thus we see that Equation 200 becomes
x̂_k(+) = x̂_k(−) + K_k [ z_k − h_k(x̄(t_k)) − H(x̄(t_k), t_k)(x̂_k(−) − x̄(t_k)) ],
Next we simplify Equation 203 to derive the equation for Kk under the linearization above.
To do this we first evaluate
Using this expression we see that the product x̃k (−)(hk (xk ) − ĥk (xk ))T is then
Next we can now compute the inner product required in the expression for the matrix
inverse portion of Kk or [hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T . From the above expression for
hk (xk ) − ĥk (xk ) we see that this is given by
E[[hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T ] = H(x̄(tk ))Pk (−)H(x̄(tk ))T .
Combining all of the expressions obtained thus far we finally end with
K_k = P_k(−) H(x̄(t_k))^T [ H(x̄(t_k)) P_k(−) H(x̄(t_k))^T + R_k ]^{-1},
as we were to show.
In this section we have computed all of the needed expectations required to evaluate Equa-
tion 204. Using everything from earlier we find that
To begin we will square the given expression to get several terms. We find
To find the values of a, b, and c such that the above expression is a minimum we take the
derivative of E[(f (x) − a − bx − cx2 )2 ] with respect to each of these values, set the resulting
expressions equal to zero and solve for them. We find
a = f̂ − b x̂ − c E[x²]
b = ( −f̂ E[x²] E[x³] + E[f x²](−x̂ E[x²] + E[x³]) + E[f x](E[x²]² − E[x⁴]) + f̂ x̂ E[x⁴] ) / ( E[x²]³ + E[x³]² + x̂² E[x⁴] − E[x²](2 x̂ E[x³] + E[x⁴]) )
c = ( E[f x²](x̂² − E[x²]) + E[f x](−x̂ E[x²] + E[x³]) + f̂ (E[x²]² − x̂ E[x³]) ) / ( E[x²]³ + E[x³]² + x̂² E[x⁴] − E[x²](2 x̂ E[x³] + E[x⁴]) ).
These calculations are done in the Mathematica file chap 6 prob 6.nb. Now to try to make these expressions look more like the ones in the book we could transform these "raw moments", i.e. the expressions E[x^i], into central moments m_i defined by m_i = E[(x − x̂)^i]. This can be done with the "inverse binomial transform" (see [10]) or
E[x^n] = Σ_{k=0}^{n} (n choose k) m_k x̂^{n−k}.
E[x] = x̂
E[x2 ] = m2 + x̂2
E[x3 ] = m3 + 3m2 x̂ + x̂3
E[x4 ] = m4 + 4m3 x̂ + 6m2 x̂2 + x̂4 .
Warning: When we do this however the results for a, b, and c don’t seem to match the
book’s results. If anyone sees an error with what I’ve done please contact me.
Warning: While this seems like a simple problem, I was unable to show the desired result
for n even. If anyone sees anything wrong with what I’ve done or has an alternative way to
solve this problem please contact me.
Recall (see [3]) that if a given random variable X has a characteristic function ζ(t) and the expectation E[X^n] exists for some positive integer n then it can be evaluated from
E[X^n] = i^{−n} ζ^{(n)}(0).   (247)
For a Gaussian random variable with mean µ and variance σ² the characteristic function is ζ(t) = exp(iµt − σ²t²/2). If our Gaussian has zero mean µ = 0 and unit variance σ² = 1 this expression simplifies to
ζ(t) = exp(−t²/2).
We can use this result to compute the expectation of X^n when X has unit variance. If X does not have unit variance then the derivation below changes slightly but is effectively the same. Thus we will evaluate E[X^n] in the case where X has unit variance. Determining these expectations requires that we evaluate derivatives of ζ(t). We find
ζ^{(0)}(t) = e^{−t²/2}
ζ^{(1)}(t) = e^{−t²/2}(−t) = −t e^{−t²/2}
ζ^{(2)}(t) = −e^{−t²/2} + t² e^{−t²/2} = (−1 + t²) e^{−t²/2}
ζ^{(3)}(t) = 2t e^{−t²/2} + (−1 + t²)(−t) e^{−t²/2} = (3t − t³) e^{−t²/2}
ζ^{(4)}(t) = (3 − 6t² + t⁴) e^{−t²/2}
ζ^{(5)}(t) = (−15t + 10t³ − t⁵) e^{−t²/2}
ζ^{(6)}(t) = (−15 + 45t² − 15t⁴ + t⁶) e^{−t²/2}.
Some of these calculations are done in the Mathematica file chap 6 prob 7.nb. By performing these derivatives we see that ζ^{(n)}(t) looks like it takes the form
ζ^{(n)}(t) = φ_n(t) e^{−t²/2},   (249)
where φ_n(t) is an nth degree polynomial. In fact for n odd the polynomial φ_n(t) has only odd powers of t (with no intercept term) and for n even it looks like φ_n(t) has only even powers of t. Thus with the above expression for ζ^{(n)}(t) we see that to evaluate expectations of powers of X we have
E[X^n] = i^{−n} ζ^{(n)}(0) = i^{−n} φ_n(0),
thus we need to be able to evaluate the polynomial φ_n(t) at t = 0.
From the above expression for ζ^{(n)}(t) in Equation 249 we see that, using the product rule, ζ^{(n+1)}(t) is given by
ζ^{(n+1)}(t) = φ′_n(t) e^{−t²/2} − φ_n(t) t e^{−t²/2} = (φ′_n(t) − t φ_n(t)) e^{−t²/2}.
Thus the recursive relationship between the coefficient polynomial φ_{n+2}(t) and the one two orders below it, φ_n(t), is
φ_{n+2}(t) = φ″_n(t) − 2t φ′_n(t) + (−1 + t²) φ_n(t).   (250)
Given the examples of φ_1(t), φ_3(t), and φ_5(t) presented at the beginning of this problem let's form the induction hypothesis that when n is odd, φ_n(t) is an odd polynomial, that is
φ_{2n+1}(t) = Σ_{k=0}^{n} a_{2k+1} t^{2k+1}.   (251)
This statement is true for the polynomials φ_1(t), φ_3(t), and φ_5(t) above. If we assume that φ_{2n+1}(t) has the form given by Equation 251 then we see from Equation 250 that φ_{2n+3}(t) must also have the form given by Equation 251, because each term generated by Equation 250 from an odd polynomial is again an odd polynomial, and so the sum is another odd polynomial. In this case we see that φ_{2n+1}(0) = 0 and by Equation 247 all odd powers of X have zero expectation.
Given the examples of φ_2(t), φ_4(t), and φ_6(t) presented at the beginning of this problem let's form the induction hypothesis that when n is even, φ_n(t) is an even polynomial, that is
φ_{2n}(t) = Σ_{k=0}^{n} a_{2k} t^{2k}.   (252)
Again using Equation 250 we see that if φ_{2n}(t) has this form then φ_{2n+2}(t) will also have this form.
At this point I would like to derive a recursive expression for φ2n (0) since that would enable
me to evaluate the desired expectations. I was unable to do this however. If anyone sees a
method to do this please let me know.
Problem 6-9 (deriving the expressions for Nm (m, S) and Nr (m, S))
Using Equations 226 and 227 we see that all terms cancel and we end with E[exT ] = 0 as
we were to show.
Problem 6-11 (evaluating some multiple-input describing function gains)
For this problem f (x) = x(1 + x2 ) so that using Equation 234 to compute nr we get
n_r(m, σ_r²) = (1/(√(2π) σ_r³)) ∫_{−∞}^{∞} (m + r)(1 + (m + r)²) r e^{−r²/2σ_r²} dr
             = 1 + 3m² + 3σ_r².
Where we have evaluated the above integrals in the Mathematica file chap 6 prob 11.nb.
Note that to evaluate these integrals “by hand” we would expand the polynomial argument,
for example
(m + r)(1 + (m + r)2 ) ,
in the case of nm into a polynomial in r and then use the results on the expectation of powers
of a zero mean Gaussian random variable with variance σ². This latter result is that if X is such a random variable then E[X^n] = 0 when n is odd and
E[X n ] = 1 · 3 · 5 · · · (n − 1)σ n ,
when n is even.
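As a check on this closed form, the following Python sketch (mine, with arbitrary values of m and σ) evaluates the integral in Equation 234 by Gauss-Hermite quadrature for f(x) = x(1 + x²) and compares it against n_r = 1 + 3m² + 3σ_r².

import numpy as np

def n_r_numeric(f, m, sigma, order=40):
    # Gauss-Hermite nodes/weights integrate against exp(-t^2); substitute r = sqrt(2) sigma t.
    t, w = np.polynomial.hermite.hermgauss(order)
    r = np.sqrt(2.0) * sigma * t
    integral = np.sqrt(2.0) * sigma * np.sum(w * f(m + r) * r)
    return integral / (np.sqrt(2.0 * np.pi) * sigma**3)

f = lambda x: x * (1.0 + x**2)
m, sigma = 0.7, 1.3
print(n_r_numeric(f, m, sigma))              # quadrature value of Equation 234
print(1.0 + 3.0 * m**2 + 3.0 * sigma**2)     # closed form derived above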
Problem 6-12 (describing function gains for some simple probability densities)
We want to evaluate n_r under several different assumptions on the probability density for r (the residual). Since f(·) is a scalar function we have
n_r = (1/σ²) E[f(x) r] = (1/σ²) ∫ r f(m + r) p(r) dr = (1/σ²) ∫ r f(r) p(r) dr,
when we assume that m is zero.
For a Gaussian density recall that p(r) = (1/(√(2π) σ)) e^{−r²/2σ²} and we compute
n_r = (1/σ²) ∫ r f(r) p(r) dr
    = −(D/σ²) ∫_{−∞}^{0} r (1/(√(2π) σ)) e^{−r²/2σ²} dr + (D/σ²) ∫_{0}^{∞} r (1/(√(2π) σ)) e^{−r²/2σ²} dr
    = −(D/σ²) (1/(√(2π) σ)) (−σ²) [ e^{−r²/2σ²} ]_{−∞}^{0} + (D/σ²) (1/(√(2π) σ)) (−σ²) [ e^{−r²/2σ²} ]_{0}^{∞}
    = (D/(σ√(2π)))(1 − 0) − (D/(σ√(2π)))(0 − 1) = √(2/π) (D/σ).
Next recall that a triangular density between −b and +b has an analytic representation of its density of
p(r) = 0 for r < −b;   (1/b²)(r + b) for −b < r < 0;   −(1/b²)(r − b) for 0 < r < b;   0 for r > b.
To use this, we first compute the expression for the variance σ 2 of this density in terms of
the parameter b. We find
σ² = ∫ r² p(r) dr
   = ∫_{−b}^{0} r² (1/b²)(r + b) dr − ∫_{0}^{b} r² (1/b²)(r − b) dr
   = (1/b²) [ ∫_{−b}^{0} r³ dr + b ∫_{−b}^{0} r² dr ] − (1/b²) [ ∫_{0}^{b} r³ dr − b ∫_{0}^{b} r² dr ]
   = (1/b²) [ r⁴/4 + b r³/3 ]_{−b}^{0} − (1/b²) [ r⁴/4 − b r³/3 ]_{0}^{b}
   = (1/b²) ( −b⁴/4 + b⁴/3 ) − (1/b²) ( b⁴/4 − b⁴/3 ) = b²/6.
Next we calculate nr . We find
n_r = (1/σ²) ∫ r f(r) p(r) dr
    = (1/σ²) ∫_{−b}^{0} r (−D)(1/b²)(r + b) dr + (1/σ²) ∫_{0}^{b} r D ( −(1/b²)(r − b) ) dr
    = −(D/(σ² b²)) ∫_{−b}^{0} (r² + b r) dr − (D/(σ² b²)) ∫_{0}^{b} (r² − b r) dr
    = −(D/(σ² b²)) [ r³/3 + b r²/2 ]_{−b}^{0} − (D/(σ² b²)) [ r³/3 − b r²/2 ]_{0}^{b}
    = −(D/(σ² b²)) ( b³/3 − b³/2 ) − (D/(σ² b²)) ( b³/3 − b³/2 )
    = D b / (3 σ²).
When we use the fact that b = √6 σ we get
n_r = √(2/3) (D/σ).
For a uniform density between −a/2 and a/2 we start by recalling that the variance is related to the end points of the density by
σ² = ∫ r² p(r) dr = a²/12.
Next we calculate nr as
n_r = (1/σ²) ∫_{−a/2}^{a/2} r f(r) (1/a) dr = (1/(a σ²)) [ ∫_{−a/2}^{0} (−D r) dr + ∫_{0}^{a/2} D r dr ]
    = (D/(a σ²)) [ −r²/2 |_{−a/2}^{0} + r²/2 |_{0}^{a/2} ] = (D/(a σ²)) [ a²/8 + a²/8 ] = a D/(4 σ²).
Since σ² = a²/12, or a = √12 σ, in terms of σ only n_r is given by
n_r = (√3/2)(D/σ).
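The three gains above can also be checked by simple Monte Carlo, since n_r = E[f(r) r]/σ² for the relay nonlinearity f(r) = D sign(r). The following Python sketch (my own; D, σ and the sample size are arbitrary) draws samples from each of the three densities and compares the sample estimate with the closed forms √(2/π) D/σ, √(2/3) D/σ and (√3/2) D/σ.

import numpy as np

rng = np.random.default_rng(0)
D, sigma, N = 2.0, 1.5, 2_000_000

samples = {
    "gaussian":   rng.normal(0.0, sigma, N),
    # triangular density with variance sigma^2 has support [-b, b] with b = sqrt(6) sigma
    "triangular": rng.triangular(-np.sqrt(6) * sigma, 0.0, np.sqrt(6) * sigma, N),
    # uniform density with variance sigma^2 has support [-a/2, a/2] with a = sqrt(12) sigma
    "uniform":    rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma, N),
}
closed_form = {
    "gaussian":   np.sqrt(2 / np.pi) * D / sigma,
    "triangular": np.sqrt(2 / 3) * D / sigma,
    "uniform":    np.sqrt(3) / 2 * D / sigma,
}
for name, r in samples.items():
    n_r = np.mean(D * np.sign(r) * r) / sigma**2    # sample estimate of E[f(r) r] / sigma^2
    print(name, n_r, closed_form[name])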
We can derive the state covariance update equations by noting that figure 7.1-5 is the same
system as that given in Example 7.1-1 but with the value of γ taken to be 1, with q22 = 0,
and with the matrix P H T R−1 HP taken to be zero. This last fact is because we are not
getting the reduction in state uncertainty from any measurements. Using these facts and
the results from Exercise 7-3 on Page 155 the state covariance equation
Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
becomes the set of scalar equations
ṗ11 = −2α1 p11 + q1
ṗ12 = −(α1 + α2 )p12 + p11
ṗ22 = −2α2 p22 + 2p12 ,
which are the same ones given in the book. If we want to get the steady-state values for the
covariance errors under the system above we set Ṗ = 0 and then solve for the elements of
P . When we do this and by solving these equations from top to bottom we find that the
steady-state values for each element of P are
p_11 = q/(2α_1)
p_12 = p_11/(α_1 + α_2) = q/(2α_1(α_1 + α_2))
p_22 = p_12/α_2 = q/(2α_1 α_2 (α_1 + α_2)),
which is the book’s equation 7.1-14.
For this example we assume that we are filtering the given system using
dx̂/dt = −x̂ + K(z − x̂),
with K a constant as of yet unspecified. Now K is not totally unconstrained, since we note that the above is equivalent to
dx̂/dt = −(1 + K) x̂ + K z,
and the condition for stability of this differential equation is that the coefficient of x̂ be negative, or that 1 + K > 0, i.e. K > −1.
We will take our filtering performance measure given by J = p∞ , where p∞ is the steady-
state state error covariance for this problem. Since we assume that we will operate the filter
with a constant gain (rather than the optimal time varying Kalman gain) and the correct
system dynamics the covariance propagation for this filter will follow the Wiener filtering
equations
Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T , (255)
where K is a constant. In this example, we have F = −1, G = 1, Q = q, H = 1, and R = r,
and so the expression for Ṗ becomes
We next proceed to evaluate the S1, S2, and S3 criterion expressions for this example. Here we have used the fact that since 0 < q < 1 and 0 < r < 1 the maximum of J(k; q, r) over (q, r) is obtained when r = 1 and q = 1. To perform the next minimization we take the derivative with respect to k, set the result equal to zero, and solve for k. We find the derivative given by
d/dk [ (1 + k²)/(2(1 + k)) ] = 2k/(2(1 + k)) − (k² + 1)/(2(1 + k)²) = (k² + 2k − 1)/(2(1 + k)²).
When we set that equal to zero and solve the resulting quadratic equation for k we find
k = (−2 ± √(4 + 4))/2 = −1 ± √2.
To have stability requires that k > −1, so we must take the positive solution found above, giving k = √2 − 1 in agreement with the book. Thus using this criterion function we should filter our signal with k = √2 − 1.
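The same min-max answer can be confirmed numerically. Under the steady-state covariance p_∞(k; q, r) = (q + k² r)/(2(1 + k)) implied by Equation 255 for this example (an expression I am using here as an assumption consistent with the derivative above), the worst case over 0 < q ≤ 1, 0 < r ≤ 1 occurs at q = r = 1, and minimizing that worst case over a grid of k should return k ≈ √2 − 1. A short Python sketch of this (mine) is below.

import numpy as np

ks = np.linspace(-0.9, 3.0, 4000)                 # candidate constant gains, k > -1 for stability
worst = (1.0 + ks**2) / (2.0 * (1.0 + ks))        # J(k; q=1, r=1), the worst case over (q, r)
k_star = ks[np.argmin(worst)]
print(k_star, np.sqrt(2) - 1)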
For the S2 criterion we consider the three functions plotted in Figure 4.
[Figure 4: A plot of the three functions 1/(2(1 + k)) (in blue), k²/(2(1 + k)) (in red), and 1 − √2 + (1 + k²)/(2(1 + k)) (in brown). For each value of k the maximum of these three functions is the result of the maximization over α of J(α, β) − J_0(α) and is a function of β = k.]
For this example the true physical system has parameters given by F = −β, G = 1, Q = q,
H = 1, and R = r, while we choose to filter our system with possibly non-optimal parameters F* = −β_f, K* = k, and H* = 1. Because of this we are filtering with an incorrect
implementation of dynamics and measurements and need to use the results derived below to
obtain the steady-state error covariance expression p∞ .
To derive an equation for p∞ we use the book's equations 7.2-14, 7.2-15, and 7.2-16 or
Equations 263, 264, and 265 below with the values for F , F ∗ etc given above. In this case
we first note that ∆F = F ∗ − F = −βf + β and ∆H = 0, so that in steady-state when
Ṗ = V̇ = U̇ = 0 the given system becomes
The first expression for p∞ is equation 7 in the original reference for this section see [1], while
the last equation is the result presented in the book.
Given the above expression for p∞ the vector “α” in this case or the actual physical param-
eters is given by the three unknown scalar values (β, q, r) and the design parameter vector
“β”, is given by the two parameters (βf , k).
Next, as the design criteria S2 and S3 require, we need to compute the expression for J_0(·), the optimal performance for this problem, obtained from
Ṗ = F P + P F^T + G Q G^T − P H^T R^{-1} H P.
With this background discussion we now proceed to determine the performance of optimal,
S1 , S2 , S3 and β = 0.5 filters when q = 10, r = 1 and 0.1 ≤ β ≤ 1 as documented in this
example.
• The Kalman Optimal Filter:
To design and plot the optimal filtering performance result for this problem note that
this system is a special case of that in example 7.1-4, with a = −β and b = +1. In
that example we found that p∞ is given by
p_∞ = (a r/b²)(1 − β̃/a) = −β r (1 + β̃/β)   with
β̃ = a √(1 + b² q/(a² r)) = −β √(1 + q/(β² r)).
• The S1 Filter:
The design of the S1 filter is defined by
There are probably many ways to implement such a filter. For this example we will do
this in a brute force way. What this means is that we will create a grid that samples
from the possible values for β_f and k. Then for each candidate pair (β_f, k) we need to compute the inner maximization over
(β, q, r). Again we do this by simply sampling the provided function on a discretized
grid of points. Having evaluated the above function at each of these points we return
the maximum. We then move to the next candidate pair for (βf , k) and repeat this
procedure. The filter designer would then pick the values of βf and k that gave the
minimum over all tested pairs.
Since we assume that we know that the value of β is 1/2, the value of k that we will filter with is given by taking β = 1/2 in the above expression.
[Figure 5 appears here: estimated error covariance p_infinity versus the true system beta for the optimal filter, the S1 filter (grid based), the S1 filter (analytic), the S2 filter, the S3 filter, and the beta = 0.5 filter.]
Figure 5: Attempted duplication of the results found in figure 7.1-11 from the book. This
is qualitatively very similar to the corresponding figure from the text. See the main text for
more discussion on this plot.
As discussed in [1] the S1 criterion can be determined exactly given the functional form for J(α, β), where J_0 is given by Equation 258. This means that we can exactly analyze the S1 criterion. Unfortunately I was not able to numerically duplicate these expected analytic results. In Figure 5 one will see a numerical duplication of the filters discussed above. For the S1 filter I present both the analytic and the grid based numeric result. These plots are produced in the Matlab file example 7 1 6 plots brute force optimization.m, and if anyone sees anything wrong with what I have done please contact me. These results, as they stand, are very similar to the ones presented in the book's figure 7.1-11. In addition, qualitatively Figure 5 shows the statements given in the text on min/max filters, namely that they enable a filter design that is very close to the optimal result (the green line). From the plot it looks like the S2 filter is the closest to the optimal result. Finally, we mention that the algebra to derive some of these expressions can be found in the Mathematica file example 7 1 6.nb.
Notes on incorrect implementation of dynamics and measurement
When we filter with incorrect Kalman gain K ∗ , measurement sensitivity H ∗ , and dynamics
F ∗ , the differential equation for the error x̃ ≡ x̂ − x in the continuous case can be derived
using the implemented equation for x̂ and true state dynamic equation for x as follows
d/dt x̃ = d/dt (x̂ − x)
= F ∗ x̂ + K ∗ (z − H ∗x̂) − F x − Gw
= (F ∗ − K ∗ H ∗ )x̂ − F x + K ∗ z − Gw
= (F ∗ − K ∗ H ∗ )x̂ − F x + K ∗ (Hx + v) − Gw
= (F ∗ − K ∗ H ∗ )x̂ − (F − K ∗ H)x + K ∗ v − Gw . (259)
or the book's equation 7.2-8. Let ∆F = F ∗ − F and ∆H = H ∗ − H so that F = F ∗ − ∆F
and H = H ∗ − ∆H and then Equation 259 in terms of ∆F and ∆H becomes
d
x̃ = (F ∗ − K ∗ H ∗ )x̂ − (F ∗ − ∆F − K ∗ (H ∗ − ∆H))x + K ∗ v − Gw
dt
= (F ∗ − K ∗ H ∗ )x̂ − (F ∗ − K ∗ H ∗ )x + (∆F − K ∗ ∆H)x + K ∗ v − Gw
= (F ∗ − K ∗ H ∗ )x̃ + (∆F − K ∗ ∆H)x + K ∗ v − Gw , (260)
or the book's equation 7.2-9. Since this equation involves x̃ and x on the right-hand-side, let x′ denote the stacked vector of x̃ and x, x′ = [ x̃ ; x ]; then since the dynamics of x are governed by dx/dt = F x + G w, the system for x′ is given by
dx′/dt = [ F* − K* H* , ∆F − K* ∆H ; 0 , F ] [ x̃ ; x ] + [ K* v − G w ; G w ] ≡ F′ x′ + w′,   (261)
which is the book's equation 7.2-10 and in which we have implicitly defined the variables F′ and w′. Now using the system theory from earlier we have that the covariance matrix for the variable x′ satisfies the following differential equation
d E[x′ x′^T]/dt = F′ E[x′ x′^T] + E[x′ x′^T] F′^T + E[w′ w′^T].   (262)
What we really want to study, however, is the behavior of the covariance matrix for x̃ only, since this represents the difference between the true state x and our estimate x̂. To obtain this let's block partition the covariance of the vector x′ by introducing the matrices P, V, and U as
E[x′ x′^T] ≡ [ P , V^T ; V , U ].
Now from the definition of w ′ we can compute E[w ′ w ′T ] as
E[w′ w′^T] = E[ [ K* v − G w ; G w ] [ v^T K*^T − w^T G^T , w^T G^T ] ]
           = E[ K* v v^T K*^T − K* v w^T G^T − G w v^T K*^T + G w w^T G^T , K* v w^T G^T − G w w^T G^T ;
                G w v^T K*^T − G w w^T G^T , G w w^T G^T ]
           = [ K* R K*^T + G Q G^T , −G Q G^T ; −G Q G^T , G Q G^T ],
which is the expression for E[w′ w′^T] presented in the book's equation 7.2-13. Using this expression and the definition of F′ we can construct the right-hand-side of Equation 262. Setting this equal to the block partitioned form for d E[x′ x′^T]/dt, we obtain a dynamical system for the components. When we do this we obtain the following system
for the components. When we do this we obtain the following system
Ṗ = (F ∗ − K ∗ H ∗ )P + P (F ∗ − K ∗ H ∗ )T + (∆F − K ∗ ∆H)V
+ V T (∆F − K ∗ ∆H)T + GQGT + K ∗ RK ∗T (263)
V̇ = F V + V (F ∗ − K ∗ H ∗ )T + U(∆F − K ∗ ∆H)T − GQGT (264)
U̇ = F U + UF T + GQGT , (265)
If we are deleting states from the true state in order to derive the filter we will process
measurements with, then the filter equations for this case can be obtained from the ones
above if we make the substitutions
F ∗ → W T F ∗W
H ∗ → H ∗W
K∗ → W T K∗ .
Ṗ = (W T F ∗ W − W T K ∗ H ∗ W )P + P (W T F ∗ W − W T K ∗ H ∗ W )T + (∆F − W T K ∗ ∆H)V
+ V T (∆F − W T K ∗ ∆H)T + GQGT + W T K ∗ RK ∗ T W
= W T (F ∗ − K ∗ H ∗ )W P + P W T (F ∗ − K ∗ H ∗ )T W + (∆F − W T K ∗ ∆H)V
+ V T (∆F − W T K ∗ ∆H)T + GQGT + W T K ∗ RK ∗T W , (266)
which duplicates the book's equation 7.2-19. The other equations would be done in a similar
manner.
Problem Solutions
Problem 7-1 (the fixed gain k∞ gives the same steady-state error covariance)
and is the equation that our covariance satisfies if we don’t use the optimal Kalman gain but
instead filter with another value say k. In example 7.1-3 we have F = 0, G = 1, Q = q,
H = 1, and R = r so this equation becomes
ṗ = −kp − kp + q + k 2 r = −2kp + q + k 2 r .
If in particular we filter with the value k = √(q/r), the above equation becomes
ṗ = −2 √(q/r) p + q + (q/r) r = −2 √(q/r) p + 2q.
To find the steady-state solution to this equation we could solve it for p(t) and then take the
limit as t → ∞ or simply recall that in steady-state ṗ = 0 and then solve for p = p∞ in the
above equation. When we do that we find
p∞ = √(rq) ,

the same steady-state value we would have obtained had we in fact done optimal Kalman filtering.
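A quick numerical check of this steady-state value (a sketch with hypothetical q and r, not
part of the original problem): integrating ṗ = −2kp + q + k²r with k = √(q/r) drives p
toward √(qr).

\begin{verbatim}
import numpy as np

q, r = 2.0, 0.5                  # hypothetical noise intensities
k = np.sqrt(q / r)               # the fixed steady-state gain
p, dt = 0.0, 1e-4
for _ in range(200000):          # integrate out to t = 20
    p += (-2.0 * k * p + q + k ** 2 * r) * dt
print(p, np.sqrt(q * r))         # both values approach sqrt(q r)
\end{verbatim}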
Recalling our definition of the a priori state error x̃k (−) = x̂k (−) − xk , when we increment
k by one we get

x̃k+1 (−) = x̂k+1 (−) − xk+1 = Φ∗k x̂k (+) − Φk xk − wk .

Now introduce the notation ∆Φk ≡ Φ∗k − Φk so that Φk = Φ∗k − ∆Φk ; the above then becomes
x̃k+1 (−) = Φ∗k x̂k (+) − (Φ∗k xk − ∆Φk xk ) − wk = Φ∗k x̃k (+) + ∆Φk xk − wk .
This last result expresses x̃k+1 (−) in terms of x̃k (+) and xk . Next introduce the stacked
vector x′k (−) defined as
\[
x'_k(-) = \begin{bmatrix} \tilde{x}_k(-) \\ x_k \end{bmatrix} .
\]
Then the recursion for x′ is
\[
x'_{k+1}(-) = \begin{bmatrix} \tilde{x}_{k+1}(-) \\ x_{k+1} \end{bmatrix}
= \begin{bmatrix} \Phi^*_k \tilde{x}_k(+) + \Delta\Phi_k x_k \\ \Phi_k x_k \end{bmatrix}
+ \begin{bmatrix} -w_k \\ w_k \end{bmatrix}
= \begin{bmatrix} \Phi^*_k & \Delta\Phi_k \\ 0 & \Phi_k \end{bmatrix}
\begin{bmatrix} \tilde{x}_k(+) \\ x_k \end{bmatrix}
+ \begin{bmatrix} -w_k \\ w_k \end{bmatrix} . \tag{267}
\]
By using Equation 267 we find the block matrix expression for E[x′k+1 (−)x′k+1 (−)T ] as
\[
E[x'_{k+1}(-)\, x'_{k+1}(-)^T] =
\begin{bmatrix} \Phi^*_k & \Delta\Phi_k \\ 0 & \Phi_k \end{bmatrix}
\begin{bmatrix} P_k(+) & V_k(+)^T \\ V_k(+) & U_k(+) \end{bmatrix}
\begin{bmatrix} \Phi^{*T}_k & 0 \\ \Delta\Phi_k^T & \Phi_k^T \end{bmatrix}
+ \begin{bmatrix} Q_k & -Q_k \\ -Q_k & Q_k \end{bmatrix} .
\]
Equating blocks of this expression gives
Pk+1(−) = Φ∗k Pk (+)Φ∗k T + Φ∗k Vk (+)T ∆ΦTk + ∆Φk Vk (+)Φ∗k T + ∆Φk Uk (+)∆ΦTk + Qk
Vk+1(−) = Φk Vk (+)Φ∗k T + Φk Uk (+)∆ΦTk − Qk
Uk+1 (−) = Φk Uk (+)ΦTk + Qk .
To derive a recursive relationship across a measurement note that we can write x̃k (+) as
x̃k (+) = xk − x̂k (+) = xk − (x̂k (−) + Kk∗ (zk − Hk∗ x̂k (−)))
= x̃k (−) − Kk∗ (Hk xk + vk − Hk∗x̂k (−)) .
Using the above block definitions we find the matrix expression for E[x′k (+)x′k (+)T ] as
\[
E[x'_k(+)\, x'_k(+)^T] =
\begin{bmatrix} I - K^*_k H^*_k & -K^*_k \Delta H_k \\ 0 & I \end{bmatrix}
\begin{bmatrix} P_k(-) & V_k(-)^T \\ V_k(-) & U_k(-) \end{bmatrix}
\begin{bmatrix} (I - K^*_k H^*_k)^T & 0 \\ -\Delta H_k^T K^{*T}_k & I \end{bmatrix}
+ \begin{bmatrix} K^*_k R_k K^{*T}_k & 0 \\ 0 & 0 \end{bmatrix} .
\]
Performing the matrix products and equating the result to
\[
\begin{bmatrix} P_k(+) & V_k(+)^T \\ V_k(+) & U_k(+) \end{bmatrix}
\]
gives the following system
Pk (+) = (I − Kk∗ Hk∗ )Pk (−)(I − Kk∗ Hk∗ )T − (I − Kk∗ Hk∗ )Vk (−)T ∆HkT Kk∗ T
− Kk∗ ∆Hk Vk (−)(I − Kk∗ Hk∗ )T + Kk∗ ∆Hk Uk (−)∆HkT Kk∗ T + Kk∗ Rk Kk∗ T
Vk (+) = Vk (−)(I − Kk∗ Hk∗ )T − Uk (−)∆HkT Kk∗ T
Uk (+) = Uk (−) .
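These discrete recursions can be transcribed directly into code. The sketch below is my own
transcription (the function name and the split into a time update followed by a measurement
update are just organizational choices); it performs one cycle of the time-update equations
that follow Equation 267 and the measurement-update equations above, with Q and R the
discrete covariances Qk and Rk.

\begin{verbatim}
import numpy as np

def discrete_sensitivity(P, V, U, Phis, dPhi, Hs, dH, Ks, Phi, Q, R):
    """One cycle of the discrete sensitivity recursions derived above.
    Starred quantities are the filter design values; dPhi = Phi* - Phi
    and dH = H* - H are the model errors."""
    # time update (recursions following Eq. 267)
    Pm = Phis @ P @ Phis.T + Phis @ V.T @ dPhi.T + dPhi @ V @ Phis.T \
         + dPhi @ U @ dPhi.T + Q
    Vm = Phi @ V @ Phis.T + Phi @ U @ dPhi.T - Q
    Um = Phi @ U @ Phi.T + Q
    # measurement update (recursions above)
    IKH = np.eye(P.shape[0]) - Ks @ Hs
    Pp = IKH @ Pm @ IKH.T - IKH @ Vm.T @ dH.T @ Ks.T \
         - Ks @ dH @ Vm @ IKH.T + Ks @ dH @ Um @ dH.T @ Ks.T \
         + Ks @ R @ Ks.T
    Vp = Vm @ IKH.T - Um @ dH.T @ Ks.T
    Up = Um
    return Pp, Vp, Up
\end{verbatim}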
For the given system we have
\[
F = \begin{bmatrix} -\alpha_1 & 0 \\ \gamma & -\alpha_2 \end{bmatrix}
\quad \left( \text{so that } F^T = \begin{bmatrix} -\alpha_1 & \gamma \\ 0 & -\alpha_2 \end{bmatrix} \right), \quad
Q = \begin{bmatrix} q_{11} & 0 \\ 0 & q_{22} \end{bmatrix}, \quad
H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
R = \begin{bmatrix} r_{11} & 0 \\ 0 & r_{22} \end{bmatrix} .
\]
Then the matrix Riccati Equation 71 for this problem becomes
\[
\begin{bmatrix} \dot p_{11} & \dot p_{12} \\ \dot p_{12} & \dot p_{22} \end{bmatrix}
= \begin{bmatrix} -\alpha_1 p_{11} & -\alpha_1 p_{12} \\ \gamma p_{11} - \alpha_2 p_{12} & \gamma p_{12} - \alpha_2 p_{22} \end{bmatrix}
+ \begin{bmatrix} -\alpha_1 p_{11} & \gamma p_{11} - \alpha_2 p_{12} \\ -\alpha_1 p_{12} & \gamma p_{12} - \alpha_2 p_{22} \end{bmatrix}
+ \begin{bmatrix} q_{11} & 0 \\ 0 & q_{22} \end{bmatrix}
- \begin{bmatrix} \frac{1}{r_{11}} p_{11}^2 + \frac{1}{r_{22}} p_{12}^2 & \frac{1}{r_{11}} p_{11} p_{12} + \frac{1}{r_{22}} p_{12} p_{22} \\ \frac{1}{r_{11}} p_{11} p_{12} + \frac{1}{r_{22}} p_{12} p_{22} & \frac{1}{r_{11}} p_{12}^2 + \frac{1}{r_{22}} p_{22}^2 \end{bmatrix} .
\]
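For a concrete feel, the matrix Riccati equation above can be integrated numerically. The
following Python sketch uses made-up values for α1, α2, γ, the noise intensities, and the
step size; it simply marches Ṗ = F P + P F T + Q − P R⁻¹P (H = I here) to its steady state.

\begin{verbatim}
import numpy as np

# Hypothetical parameter values, just to exercise the equation.
a1, a2, g = 1.0, 2.0, 0.5
F = np.array([[-a1, 0.0], [g, -a2]])
Q = np.diag([0.3, 0.4])
R = np.diag([1.0, 2.0])
Rinv = np.linalg.inv(R)

P, dt = np.zeros((2, 2)), 1e-3
for _ in range(20000):
    Pdot = F @ P + P @ F.T + Q - P @ Rinv @ P     # H = I here
    P += Pdot * dt
print(P)    # approaches the steady-state solution of the Riccati equation
\end{verbatim}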
The requested expression for Pk (+) is a specialization of the discussion given in the section
on filtering with incorrect dynamics and measurement found on Page 151, in that the result
we desire to show here can be obtained if we take ∆H = 0 and ∆F = 0. This means that
we are filtering with the correct dynamics and measurement sensitivity matrix but with a
potentially incorrect Kalman gain K ∗ . In this case the discrete measurement-update equation
for Pk (+) derived above becomes

Pk (+) = (I − Kk∗ Hk )Pk (−)(I − Kk∗ Hk )T + Kk∗ Rk Kk∗ T ,
which is the result requested.
Problem 7-7 (the error covariance differential equation for a Kalman like filter)
Part (a): From Table 4.3-1 a Kalman like filter means that we should derive an estimate
of our state x̂(t) by integrating
dx̂/dt = F x̂(t) + K(t)(z(t) − H(t)x̂(t)) .
For this problem we assume that K(t) is general and not necessarily given by the optimal
expression P H T R−1 . Since the true state x follows the dynamics given by
dx/dt = F x + Gw ,
the error vector x̃ = x̂ − x has a differential equation given by
dx̃/dt = dx̂/dt − dx/dt
= F x̂ + K(z − H x̂) − F x − Gw .
Now the measurement z in terms of the true state x is given by z = Hx + v so we can write
the above as
dx̃/dt = F (x̂ − x) − Gw + K(H(x − x̂) + v)
= (F − KH)x̃ − Gw + Kv ,
or the desired error differential equation.
Part (b): Given this differential equation for x̃ and following by example the results from
Chapter 4 we have that
Ṗ = dE[x̃x̃T ]/dt
= (F − KH)E[x̃x̃T ] + E[x̃x̃T ](F − KH)T + E[(Gw − Kv)(Gw − Kv)T ]
= (F − KH)P + P (F − KH)T + GQGT + KRK T .
As we were to show. Note that if K equals the Kalman optimal value of P H T R−1 then we
see that the equation for Ṗ becomes
Ṗ = (F − P H T R−1 H)P + P (F − P H T R−1 H)T + GQGT + P H T R−1 RR−1 HP
= F P + P F T + GQGT − P H T R−1 HP − P H T R−1 HP + P H T R−1 HP
= F P + P F T + GQGT − P H T R−1 HP ,
or the matrix Riccati equation as it should.
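A small scalar sanity check of this reduction (hypothetical numbers, not from the book): if at
every instant we set K = P H T R−1 , integrating the suboptimal-covariance equation and
integrating the Riccati equation give the same trajectory.

\begin{verbatim}
import numpy as np

F, G, H, Q, R = -0.5, 1.0, 1.0, 0.2, 0.4
p_sub, p_ric, dt = 1.0, 1.0, 1e-3
for _ in range(5000):
    K = p_sub * H / R                         # the optimal gain at this p
    p_sub += ((F - K * H) * p_sub + p_sub * (F - K * H)
              + G * Q * G + K * R * K) * dt   # suboptimal-covariance form
    p_ric += (2 * F * p_ric + G * Q * G
              - p_ric * H * H * p_ric / R) * dt   # Riccati form
print(p_sub, p_ric)    # the two integrations agree
\end{verbatim}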
For the system given for this problem with no process noise we have
\[
\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]
z = x2 + v with v ∼ N(0, r) ,
With that representation we have
\[
F = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad G = 1, \quad Q = 0, \quad
H = \begin{bmatrix} 0 & 1 \end{bmatrix}, \quad R = r .
\]
The matrix Riccati equation of
Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
in component form is given by
\[
\begin{aligned}
\begin{bmatrix} \dot p_{11} & \dot p_{12} \\ \dot p_{12} & \dot p_{22} \end{bmatrix}
&= \begin{bmatrix} 0 & 0 \\ p_{11} & p_{12} \end{bmatrix}
+ \begin{bmatrix} 0 & p_{11} \\ 0 & p_{12} \end{bmatrix}
+ 0
- \frac{1}{r}\begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix}
  \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
  \begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix} \\
&= \begin{bmatrix} 0 & p_{11} \\ p_{11} & 2 p_{12} \end{bmatrix}
- \frac{1}{r}\begin{bmatrix} p_{12}^2 & p_{12} p_{22} \\ p_{12} p_{22} & p_{22}^2 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{r} p_{12}^2 & p_{11} - \frac{1}{r} p_{12} p_{22} \\ p_{11} - \frac{1}{r} p_{12} p_{22} & 2 p_{12} - \frac{1}{r} p_{22}^2 \end{bmatrix} .
\end{aligned}
\]
From this we find the following system of scalar equations
ṗ11 = −(1/r) p12²
ṗ12 = p11 − (1/r) p12 p22
ṗ22 = 2p12 − (1/r) p22² ,
as we were to show. If we seek the steady-state solution where ṗ11 = ṗ12 = ṗ22 = 0, from the
above we see that p12 = 0, p11 = 0, and p22 = 0. Then since K = P∞ H T R−1 we see that
K = 0 also.
To consider the case where the true system has process noise, but we in fact performed a
filter design without it, recall that this is an example where we are using the correct dynamics
and measurement matrices in the implementation of the filter, but an incorrect process noise
vector q ∗ = [0, 0]T rather than the true value q = [w, 0]T . To determine the effect that
this error has on our filtering equations we recall the section entitled “Exact Implementation
of Dynamics and Measurements” since in this case we are correctly modeling the F and H
matrices. In that section a procedure is outlined for assessing the true filter's performance
under modeling errors. The procedure to follow is
• Assume all filter design values are correct, i.e. take q = [0, 0]T , and calculate the optimal
  Kalman gain K in that situation.
• Use the value of K found above to solve for P in
Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T , (268)
where in the above expression all variables (except K) are their true values.
The first part of the above procedure, where we take q = [0, 0]T , was done earlier, where we
have shown that in steady-state we get P = 0 and thus K = 0. When we put this value into
Equation 268 we get
\[
\dot P = F P + P F^T + Q
= \begin{bmatrix} 0 & p_{11} \\ p_{11} & 2 p_{12} \end{bmatrix}
+ \begin{bmatrix} q & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} q & p_{11} \\ p_{11} & 2 p_{12} \end{bmatrix} ,
\]
which when written as a system of scalar equations gives the desired expression.
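Integrating these scalar equations makes the divergence explicit. In the sketch below
(hypothetical q and step size) the gain is zero, so the filter ignores the data and the true
error covariance simply grows like qt, qt²/2 and qt³/3.

\begin{verbatim}
import numpy as np

q, dt = 1.0, 1e-3
p11 = p12 = p22 = 0.0
for _ in range(10000):                    # integrate to t = 10
    p11, p12, p22 = (p11 + q * dt,        # dp11/dt = q
                     p12 + p11 * dt,      # dp12/dt = p11
                     p22 + 2 * p12 * dt)  # dp22/dt = 2 p12
print(p11, p12, p22)                      # ~ q*t, q*t^2/2, q*t^3/3
\end{verbatim}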
Chapter 8 (Implementation Considerations)
In this section of the notes we derive the expression for x̂k+1 (+) expressed via the books
equation 8.1-10 when we use the ǫ technique. From equation 8.1-5, the introduced expression
for ǫ′ , and the expression for ∆x̂k+1 (+) we have
\[
\begin{aligned}
\hat{x}_{k+1}(+) - \Phi_k \hat{x}_k(+)
&= K_{k+1}\left[ z_{k+1} - H_{k+1}\Phi_k \hat{x}_k(+) \right]
 + \frac{\epsilon\, r_{k+1}}{H_{k+1} P_{k+1}(-) H_{k+1}^T + r_{k+1}}\,
   H_{k+1}^T (H_{k+1} H_{k+1}^T)^{-1}\left[ z_{k+1} - H_{k+1}\Phi_k \hat{x}_k(+) \right] \\
&= \left( K_{k+1} + \frac{\epsilon\, r_{k+1} H_{k+1}^T (H_{k+1} H_{k+1}^T)^{-1}}{H_{k+1} P_{k+1}(-) H_{k+1}^T + r_{k+1}} \right)
   \left( z_{k+1} - H_{k+1}\Phi_k \hat{x}_k(+) \right) .
\end{aligned}
\]
When we take Kk+1 to be the optimal Kalman gain given by
\[
K_{k+1} = P_{k+1}(-) H_{k+1}^T \left( H_{k+1} P_{k+1}(-) H_{k+1}^T + r_{k+1} \right)^{-1} ,
\]
in the above we get for the leading coefficient of (zk+1 − Hk+1 Φk x̂k (+)) the following
\[
K = \frac{P_{k+1}(-) H_{k+1}^T + \epsilon\, \dfrac{r_{k+1} H_{k+1}^T}{H_{k+1} H_{k+1}^T}}
         {H_{k+1} P_{k+1}(-) H_{k+1}^T + r_{k+1}} .
\]
This is the book’s equation 8.1-10. Recall that Hk+1 is a row matrix and rk is a scalar, so
K is a column matrix.
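As a concrete illustration, this ǫ-modified gain is straightforward to compute. The helper
below is only a sketch: the function name, the example numbers, and the assumption that H
is a single row are mine.

\begin{verbatim}
import numpy as np

def epsilon_gain(P_minus, H, r, eps):
    """Gain of the epsilon technique: the ordinary Kalman gain plus an
    overweighting term controlled by eps (H a single row, r a scalar)."""
    H = np.atleast_2d(H)                          # 1 x n row matrix
    s = (H @ P_minus @ H.T).item() + r            # innovation variance
    K_reg = P_minus @ H.T / s                     # ordinary Kalman gain
    K_ow = eps * r * H.T / (H @ H.T).item() / s   # overweighting correction
    return K_reg + K_ow

# Hypothetical numbers, just to exercise the formula.
P = np.array([[2.0, 0.3], [0.3, 1.0]])
print(epsilon_gain(P, np.array([1.0, 0.0]), r=0.5, eps=0.2))
\end{verbatim}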
From the above expression we see that in this case the Kalman gain K we are using to filter
with is composed of two parts K = Kreg + Kow . To determine how this non-optimal gain
performs, i.e. what the error covariance matrix Pk (+) will be for such a filter, we need to use
the results from Chapter 7, namely the results under the section “Exact Implementation of
Dynamics and Measurements”. There the true a posteriori error covariance matrix Pk (+)
when filtering with a Kalman gain Kk is given by
Pk (+) = (I − Kk Hk )Pk (−)(I − Kk Hk )T + Kk Rk KkT .
Using the expression in this section for K we find that Pk (+) is given by (dropping the k
subscript for notational simplicity)
P (+) = (I − KH)P (−)(I − KH)T + KRK T
= (I − Kreg H − Kow H)P (−)(I − Kreg H − Kow H)T + (Kreg + Kow )R(Kreg + Kow )T
= (I − Kreg H)P (−)(I − Kreg H)T + Kreg RKreg T
− (I − Kreg H)P (−)H T Kow T − Kow HP (−)(I − Kreg H)T + Kow HP (−)H T Kow T
+ Kreg RKow T + Kow RKreg T + Kow RKow T
= [P (+)]reg
− P (−)H T Kow T + Kreg HP (−)H T Kow T − Kow HP (−) + Kow HP (−)H T Kreg T
+ Kow HP (−)H T Kow T + rKreg Kow T + rKow Kreg T + rKow Kow T .
To further evaluate this expression we will need to simplify the terms after [P (+)]reg . To do
that we will first replace Kreg with P (−)H T (HP (−)H T + r)−1 to get
\[
\begin{aligned}
P(+) = [P(+)]_{reg}
&- P(-)H^T K_{ow}^T + \frac{P(-)H^T\, H P(-) H^T K_{ow}^T}{H P(-) H^T + r} - K_{ow} H P(-) \\
&+ \frac{K_{ow} H P(-) H^T\, H P(-)}{H P(-) H^T + r} + K_{ow} H P(-) H^T K_{ow}^T \\
&+ \frac{r\, P(-) H^T K_{ow}^T}{H P(-) H^T + r} + \frac{r\, K_{ow} H P(-)}{H P(-) H^T + r} + r K_{ow} K_{ow}^T .
\end{aligned}
\]
Counting terms after the [P (+)]reg expression starting at one, let us combine the first and
sixth terms, the third and seventh terms, and the fifth and eighth terms to get
\[
\begin{aligned}
P(+) = [P(+)]_{reg}
&+ \frac{(-H P(-) H^T)\, P(-) H^T K_{ow}^T}{H P(-) H^T + r}
 + \frac{P(-) H^T\, H P(-) H^T K_{ow}^T}{H P(-) H^T + r} \\
&+ \frac{(-H P(-) H^T)\, K_{ow} H P(-)}{H P(-) H^T + r}
 + \frac{K_{ow} H P(-) H^T\, H P(-)}{H P(-) H^T + r} \\
&+ (H P(-) H^T + r)\, K_{ow} K_{ow}^T .
\end{aligned}
\]
Now to simplify this we note that since H is a row vector the expression HP (−)H T is a
scalar and can be factored out where needed. This cancels the first four terms in pairs and we get
\[
P(+) = [P(+)]_{reg} + \left( H P(-) H^T + r \right) K_{ow} K_{ow}^T ,
\]
as we were to show.
ẋ = 0 with x(0) = x0
z = x + v with v ∼ N(0, σ 2 ) .
xk = xk−1
zk = xk + vk ,
with x0 = x0 and vk ∼ N(0, σ 2 ). From this the variables defined in the discrete Kalman
filtering case are φ = 1, q = 0, h = 1 and r = σ 2 . Using these the recursive Kalman filter
given by equation 8.1-16 is
\[
\begin{aligned}
p'_k(-) &= s\, p'_{k-1}(+) \qquad\qquad (269) \\
p'_k(+) &= p'_k(-) - \frac{p'_k(-)^2}{p'_k(-) + \sigma^2}
         = s\, p'_{k-1}(+) - \frac{s^2 p'_{k-1}(+)^2}{s\, p'_{k-1}(+) + \sigma^2}
         = \frac{\sigma^2 s\, p'_{k-1}(+)}{\sigma^2 + s\, p'_{k-1}(+)} .
\end{aligned}
\]
In the steady state we have p′k (+) = p′k−1 (+) = p′∞ and using the above expression we see
that p′∞ is given by
p′∞ = σ² s p′∞ / (σ² + s p′∞) ,

or

s p′∞² + σ² p′∞ − σ² s p′∞ = 0 .

Solving for p′∞ in the above gives

p′∞ = σ²(s − 1)/s = σ²(1 − 1/s) .
To determine k∞ note that it is given by
\[
k_\infty = p'_\infty(-) H_\infty^T \left( H_\infty p'_\infty(-) H_\infty^T + \sigma^2 \right)^{-1}
= \frac{p'_\infty(-)}{p'_\infty(-) + \sigma^2} ,
\]
with p′∞ (−) related to p′∞ (+) (which we know) by Equation 269 or
p′∞ (−) = sp′∞ (+) = σ 2 (s − 1) .
Thus we get
k∞ = σ²(s − 1)(σ²(s − 1) + σ²)⁻¹ = 1 − 1/s .
To calculate the true error covariances pk (+) (note no prime) see Problem 8-2.
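As a check, iterating the p′k (+) recursion above converges quickly to these limits; the values
of s and σ² below are made up.

\begin{verbatim}
import numpy as np

s, sigma2 = 1.5, 2.0
p = 1.0                                        # p'_0(+), any positive start
for _ in range(200):
    p = sigma2 * s * p / (sigma2 + s * p)      # p'_k(+) recursion
p_minus = s * p                                # p'_inf(-)
k_inf = p_minus / (p_minus + sigma2)
print(p, sigma2 * (1 - 1 / s))                 # p'_inf(+) vs sigma^2 (1 - 1/s)
print(k_inf, 1 - 1 / s)                        # k_inf vs 1 - 1/s
\end{verbatim}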
Notes on prefiltering
For the simple example given we find that we can evaluate the expectation of the additional
noise due to smoothing the signal as
\[
E\left[ \left( \frac{1}{2}\sum_{i=1}^{2} x_i - x_2 \right)^2 \right]
= \frac{1}{4} E\left[ \left( \sum_{i=1}^{2} x_i - 2 x_2 \right)^2 \right]
= \frac{1}{4} E\left[ (x_1 - x_2)^2 \right]
= \frac{1}{4} E[x_1^2 - 2 x_1 x_2 + x_2^2]
= \frac{1}{4}\left( \sigma_x^2 - 2\sigma_x^2 e^{-\Delta t} + \sigma_x^2 \right)
= \frac{\sigma_x^2}{2}\left( 1 - e^{-\Delta t} \right) ,
\]
which is the book's equation 8.2-8.
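A quick Monte Carlo check of this result (my own sketch; the variance, time step, and sample
count are arbitrary): draw (x1 , x2 ) jointly Gaussian with correlation e−∆t and compare the
empirical mean-square smoothing error with σx²(1 − e−∆t )/2.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
sx2, dt = 1.5, 0.3
rho = np.exp(-dt)
cov = sx2 * np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
err = 0.5 * (x[:, 0] + x[:, 1]) - x[:, 1]       # smoothed value minus x2
print(np.mean(err**2), sx2 / 2 * (1 - rho))     # the two should agree
\end{verbatim}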
Notes on algorithms and integration rules
From the definition of Qk in terms of the continuous system recall that we have
\[
Q_k = \int_{t_k}^{t} \Phi(t, \tau)\, Q(\tau)\, \Phi^T(t, \tau)\, d\tau , \tag{270}
\]
In this subsection of these notes we argue that different forms for the a priori to a posteriori
equation (i.e. computing P (+) from P (−)) have different computational properties and that
the so-called Joseph form for computing P (+) from P (−) is to be preferred, all other things
being equal. Dropping subscripts for notational simplicity, to begin we consider computing
P (+) via P (+) = (I − KH)P (−) under a perturbation in the Kalman gain K. To do this
we take K → K + δK and see that the new P (+) then becomes

P (+) = (I − (K + δK)H)P (−) = (I − KH)P (−) − δK HP (−) ,

which has an error that is first order in δK.
When we compute P (+) using the Joseph form and the same perturbation in K we have
To simplify this, consider the expression RK T − HP (−)(I − KH)T when we put in the
optimal Kalman gain K = P (−)H T (HP (−)H T + R)−1 . We see that we get

RK T − HP (−)(I − KH)T = R(HP (−)H T + R)−1 HP (−) − R(HP (−)H T + R)−1 HP (−) = 0 ,

so to first order in δK the Joseph form is unaffected by errors in the gain.
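The practical consequence is easy to see numerically. The sketch below (hypothetical P (−),
H and R, not from the book) updates the same prior covariance with a slightly perturbed gain
using the short form (I − KH)P (−) and using the Joseph form; the Joseph form changes only
at second order in the gain error.

\begin{verbatim}
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.1]])
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # optimal gain
dK = 1e-3 * np.ones_like(K)                    # gain perturbation
I = np.eye(2)

def short_form(K):
    return (I - K @ H) @ P

def joseph_form(K):
    return (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

print(np.linalg.norm(short_form(K + dK) - short_form(K)))    # O(dK)
print(np.linalg.norm(joseph_form(K + dK) - joseph_form(K)))  # O(dK^2)
\end{verbatim}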
Problem Solutions
Problem 8-1 (the covariance matrix Pk (+) when using the ǫ technique)
This result is verified in these notes in the section on the ǫ technique. See Page 158 where
it is derived.
Problem 8-2 (the expression for p∞ for Example 8.1-3)
If we assume when working this example that we will be filtering with the correct dynamic
and measurement model, i.e. with correct F and H matrices, but with the non-optimal
Kalman gain k∞ given by

k∞ = 1 − 1/s ,
then from Chapter 7 in the section entitled “Exact Implementation of Dynamics and Mea-
surements” the true error covariance is given by Pk (+) obtained by solving the following
Pk (+) = (I − Kk Hk )Pk (−)(I − Kk Hk )T + Kk Rk KkT
Pk+1 (−) = Φk Pk (+)ΦTk + Qk .
In the case considered here these become

pk (+) = (1 − (1 − 1/s)) pk (−) (1 − (1 − 1/s)) + (1 − 1/s)² σ²
pk+1 (−) = pk (+) .

So we have

pk (+) = (1 − 1/s)² σ² + (1/s²) pk−1 (+) .

When we let k → ∞ we get

(1 − 1/s²) p∞ (+) = (1 − 1/s)² σ² ,

or when we solve for p∞ (+) and simplify we get

p∞ (+) = ((s − 1)²/(s² − 1)) σ² = ((s − 1)/(s + 1)) σ² ,
the result we were to show.
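Iterating the true-covariance recursion with the fixed gain k∞ = 1 − 1/s confirms this limit;
the values of s and σ² below are arbitrary.

\begin{verbatim}
import numpy as np

s, sigma2 = 2.0, 1.0
k = 1.0 - 1.0 / s
p = 0.0
for _ in range(500):
    p_minus = p                            # Phi = 1, Q = 0
    p = (1 - k) ** 2 * p_minus + k ** 2 * sigma2
print(p, sigma2 * (s - 1) / (s + 1))       # both ~ sigma2/3 for s = 2
\end{verbatim}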
For the second measurement we have H = 0 1 and R = [1] so this measurement updates
the Pi (+) covariance matrix (i for intermediate) as follows
P (+) = Pi (+) − Pi (+)H T [HPi (+)H T + R]−1 HPi (+)
\[
\begin{aligned}
P(+) &= \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{7}{8} \end{bmatrix}
- \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{7}{8} \end{bmatrix}
  \left[ \tfrac{7}{8} + 1 \right]^{-1}
  \begin{bmatrix} \tfrac{1}{4} & \tfrac{7}{8} \end{bmatrix} \\
&= \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{7}{8} \end{bmatrix}
- \frac{8}{15}\begin{bmatrix} \tfrac{1}{16} & \tfrac{7}{32} \\ \tfrac{7}{32} & \tfrac{49}{64} \end{bmatrix}
= \begin{bmatrix} \tfrac{7}{15} & \tfrac{2}{15} \\ \tfrac{2}{15} & \tfrac{7}{15} \end{bmatrix} ,
\end{aligned}
\]
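A short numerical verification of this update (nothing here beyond the numbers already used
above):

\begin{verbatim}
import numpy as np

Pi = np.array([[0.5, 0.25], [0.25, 0.875]])     # Pi(+) = [[1/2,1/4],[1/4,7/8]]
H = np.array([[0.0, 1.0]])
R = np.array([[1.0]])
S = H @ Pi @ H.T + R                            # innovation covariance, 15/8
P_plus = Pi - Pi @ H.T @ np.linalg.inv(S) @ H @ Pi
print(P_plus * 15)                              # [[7, 2], [2, 7]] / 15
\end{verbatim}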
where ∆t = t2 − t1 . Since the function Φ has translational invariance with respect to time,
that is Φ(t2 , t1 ) = Φ(t2 − t1 , 0), we can simplify the problem by considering only a single
variable by taking t1 = 0 and t2 = t. In addition, since we are interested only in small times
from the time tk we can consider methods for approximating Φ(∆t, 0) = Φ(tk+1 , tk ), where
∆t = tk+1 − tk .
To show the equivalence of this expression with various integration methods, we first recall
that Φ(t, 0) is the solution to dΦ(t, 0)/dt = F (t)Φ(t, 0) with initial condition given by Φ(0, 0) = I.
Then note that if we approximate the solution to this differential equation at ∆t using Euler’s
method
xk+1 = xk + f (xk , tk )∆tk , (273)
so that the state x(t) is Φ(t, 0), the initial time tk is 0, the final time tk+1 = ∆t, and
f (x, t) = F (t)x so that f (xk , tk ) = F (0)Φ(0, 0), we get

Φ(∆t, 0) = Φ(0, 0) + F (0)Φ(0, 0)∆t = I + F (0)∆t ,

or the first two terms in Equation 272. Alternatively if we approximate this differential
equation with the modified Euler method given by Equation 271 we get
Φ(∆t, 0) = Φ(0, 0) + (∆t/2)[F (0)(Φ(0, 0) + F (0)∆t) + F (0)]
         = I + (∆t/2)[2F (0) + ∆t F (0)²] = I + ∆t F (0) + (∆t²/2) F (0)² ,
or the first three terms in Equation 272.
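The accuracy claims are easy to confirm numerically by comparing these truncated series
against the exact matrix exponential (the sketch below assumes SciPy is available; the example
F and ∆t are arbitrary).

\begin{verbatim}
import numpy as np
from scipy.linalg import expm

F = np.array([[0.0, 1.0], [-2.0, -0.5]])
dt = 0.01
I = np.eye(2)
exact = expm(F * dt)
euler = I + F * dt
modified = I + F * dt + (F @ F) * dt**2 / 2
print(np.linalg.norm(exact - euler))      # O(dt^2) error
print(np.linalg.norm(exact - modified))   # O(dt^3) error
\end{verbatim}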
We let the matrix W be
\[
W = \begin{bmatrix} a & b \\ c & d \end{bmatrix} , \quad\text{so that}\quad
W W^T = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} a & c \\ b & d \end{bmatrix}
= \begin{bmatrix} a^2 + b^2 & ac + bd \\ ac + bd & c^2 + d^2 \end{bmatrix} .
\]
Setting this expression equal to
\[
P = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\]
gives the scalar equations

a² + b² = 2
ac + bd = 1
c² + d² = 2 .
From this we see that we have three equations and four unknowns and therefore no unique
solution. If we take W to be lower triangular then b = 0 and the equations above simplify
to
a² = 2
ac = 1
c² + d² = 2 .
One solution to these is to take a = √2, and then c = 1/√2 and d² = 2 − 1/2 = 3/2, so d = √(3/2) .
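The lower-triangular choice above is exactly the Cholesky factor of P , which a numerical
library computes directly; a one-line check:

\begin{verbatim}
import numpy as np

P = np.array([[2.0, 1.0], [1.0, 2.0]])
W = np.linalg.cholesky(P)        # lower triangular, W @ W.T = P
print(W)                         # [[sqrt(2), 0], [1/sqrt(2), sqrt(3/2)]]
print(W @ W.T)                   # recovers P
\end{verbatim}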
Chapter 9 (Additional Topics)
Recall that the state error x̃ is defined as x̃ = x̂ − x and using that we can write the
innovation ν as ν = −H x̃ + v. Consider two times t1 and t2 where t2 > t1 and let's compute
E[ν(t2 )ν(t1 )T ]. In terms of x̃ and v this is given by
Now to evaluate this expression we note that the measurement errors observed at the time
t1 i.e. v(t1 ) can and will affect our estimate error at the later time t2 i.e. x̃(t2 ), thus we
can’t conclude that E[x̃(t2 )v(t1 )T ] = 0 since x̃(t2 ) depends on what v(t1 ) was. On the
other hand the measurement errors observed at the later time t2 i.e. v(t2 ) will not affect
or modify our estimation error made earlier i.e. x̃(t1 ), thus E[v(t2 )x̃(t1 )T ] = 0. Using the
known correlation of v i.e. E[v(t2 )v(t1 )T ] = R(t1 )δ(t1 − t2 ) we have
E[ν(t2 )ν(t1 )T ] = H(t2 )E[x̃(t2 )x̃(t1 )T ]H(t1 )T − H(t2 )E[x̃(t2 )v(t1 )T ] + R(t1 )δ(t2 − t1 ) , (274)
Recall that we have derived the differential equation that x̃ satisfies in Equation 73. From
this equation we see that the solution for x̃(t2 ) is given by
\[
\tilde{x}(t_2) = \Phi(t_2, t_1)\tilde{x}(t_1) - \int_{t_1}^{t_2} \Phi(t_2, \tau)\left[ G(\tau)w(\tau) - K(\tau)v(\tau) \right] d\tau , \tag{275}
\]
where Φ(t2 , t1 ) is the transition matrix corresponding to F − KH. Using this expression for
x̃(t2 ) we can now compute terms needed to evaluate Equation 274. To begin we compute
E[x̃(t2 )x̃T (t1 )] as
\[
E[\tilde{x}(t_2)\tilde{x}^T(t_1)]
= E\left[ \Phi(t_2, t_1)\tilde{x}(t_1)\tilde{x}^T(t_1)
  - \int_{t_1}^{t_2} \Phi(t_2, \tau)\left[ G(\tau)w(\tau)\tilde{x}^T(t_1) - K(\tau)v(\tau)\tilde{x}^T(t_1) \right] d\tau \right]
= \Phi(t_2, t_1) P(t_1) ,
\]
where we have used the facts that E[w(τ )x̃T (t1 )] = 0 and E[v(τ )x̃T (t1 )] = 0 when τ > t1 .
Next we evaluate E[x̃(t2 )v T (t1 )] and find
\[
E[\tilde{x}(t_2)v^T(t_1)]
= E\left[ \Phi(t_2, t_1)\tilde{x}(t_1)v^T(t_1)
  - \int_{t_1}^{t_2} \Phi(t_2, \tau)\left[ G(\tau)w(\tau)v^T(t_1) - K(\tau)v(\tau)v^T(t_1) \right] d\tau \right]
= 0 + \int_{t_1}^{t_2} \Phi(t_2, \tau) K(\tau) E[v(\tau)v^T(t_1)]\, d\tau
= \Phi(t_2, t_1) K(t_1) R(t_1) .
\]
Thus using these two expressions in Equation 274 we find that
E[ν(t2 )ν(t1 )T ] = H(t2 )Φ(t2 , t1 )P (t1 )H T (t1 ) − H(t2 )Φ(t2 , t1 )K(t1 )R(t1 ) + R(t1 )δ(t2 − t1 )
= H(t2 )Φ(t2 , t1 )[P (t1 )H T (t1 ) − K(t1 )R(t1 )] + R(t1 )δ(t2 − t1 ) , (276)
this is the book's equation 9.1-12. If our filter is optimal, the optimal expression for K is
given by K(t1 ) = P (t1 )H T (t1 )R−1 (t1 ) so that the bracketed term vanishes and the above becomes

E[ν(t2 )ν(t1 )T ] = R(t1 )δ(t2 − t1 ) ,

i.e. the innovations of an optimal filter are white.
If our dynamics ẋ = F x+Gw is linear and time-invariant then the transition matrix Φ(t2 , t1 )
is a function of only τ = t2 − t1 as
or the book's equation 9.1-14. In the above all matrices F , K, etc. are evaluated at t = t1 .
For this example our system and measurement equations are given by
ẋ = w with w ∼ N(0, q)
z = x + v with v ∼ N(0, r) ,
So the system and measurement matrices are scalars with f = 0 and h = 1. We will filter
our signal z using x̂˙ = k(z − x̂) where k is non optimal i.e. derived from erroneous values of
q and r. As we filter z with this value of k, we will be observing the innovations ν at each
time defined as ν = z − hx̂ = z − x̂. Using Equation 277 for this system we find that it
becomes
E[ν(t)ν(t − τ )] = e−k|τ | (p∞ − kr) + rδ(τ ) . (278)
Note that in the above expression we can compute the left-hand side based on the realized
observations of the innovation process ν(t); call this empirically computed function
φνν (τ ). By performing a least squares fit (or using some other method) we fit the empirically
obtained φνν (τ ) function to an autocorrelation model of the form Ae−k|τ | + B, for some
unknown coefficients A and B. Once we have empirical estimates of the coefficients A and B,
comparing with Equation 278 shows that these are estimates of the expressions
p∞ − kr and r. Since we know the value of k used in filtering this means we have an estimate
of p∞ . Here p∞ is the steady-state solution of the linear variance equation (not of the Riccati
equation), since we are not filtering with the optimal gain but with the fixed value k.
Thus we need to find the steady-state solution p∞ of

ṗ = −2kp + q + k²r .
In this subsection and the next we introduce and discuss the notion of an observer. Basi-
cally an observer is another transformation of the state x(t) (in addition to the measurement
z(t) = H(t)x(t)) that we will estimate and that will allow us to determine a complete specifi-
cation of our state x(t). We begin by requiring that the relationship between our observer
ξ(t) and state x(t) should be
ξ(t) = T (t)x(t) .
In addition we would like our observer to have the property that if we know ξ(t) and z(t)
then we can construct an estimate of x(t) by inverting the combined measurement observer
system
\[
\begin{bmatrix} \xi(t) \\ z(t) \end{bmatrix} = \begin{bmatrix} T(t) \\ H(t) \end{bmatrix} x(t) ,
\]
as
\[
x(t) = \begin{bmatrix} T(t) \\ H(t) \end{bmatrix}^{-1} \begin{bmatrix} \xi(t) \\ z(t) \end{bmatrix} .
\]
Once we have specified the expression we will use for T (t) we can actually compute the
inverse above. Since this inverse then multiplies the stacked vector [ξ(t); z(t)], we
will write it in terms of two more unknowns A(t) and B(t) as the block matrix [A(t)  B(t)].
These unknowns make the expression for the state x(t) in terms of the observer ξ(t) and
measurement z(t) simple
x(t) = A(t)ξ(t) + B(t)z(t) . (279)
Thus one way to state what we are doing is to observe that if we can obtain an expression for
T (t) then we can form the stacked matrix [T (t); H(t)], invert it, and obtain the block matrices
A(t) and B(t). With these we can construct x(t) using Equation 279.
which, by evaluating the matrix product on the left-hand side, gives the block identity
\[
\begin{bmatrix} TA & TB \\ HA & HB \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} . \tag{282}
\]
Differentiating these identities allows us to move the time derivative from one factor in a
product to the other factor while introducing a negative sign. For example,
the (1, 1) and (1, 2) components imply the relationships Ṫ A = −T Ȧ and Ṫ B = −T Ḃ.
From how we have defined the observer ξ(t) its differential equation can be computed using
the relationships introduced above and the true state dynamics of x(t) as
ξ˙ = Ṫ x + T ẋ
= Ṫ (Aξ + Bz) + T (F (Aξ + Bz) + Lu)
= (Ṫ A + T F A)ξ + (Ṫ B + T F B)z + T Lu . (284)
which is the book's equation 9.2-11. Using Ṫ A = −T Ȧ and Ṫ B = −T Ḃ this becomes

dξ/dt = (T F A − T Ȧ)ξ + (T F B − T Ḃ)z + T Lu .    (285)

Then assuming we had a T matrix (and thus the A and
B matrices) we would use Equation 285 to propagate an estimate of ξ(t), namely ξ̂(t), and
then use this estimate in Equation 279 to derive an estimate of x. As a next step we must
make sure that whatever choice we make for T any initial error in our estimate of ξ and x
will decay exponentially to zero. Thus we need to study the properties of the error in
our estimates of ξ and x.
To do this we begin with the error in ξ as ξ˜ defined in the normal way as ξ˜ = ξˆ − ξ with
ξ satisfying Equation 285 and our estimate ξˆ satisfying the same functional form as the
differential equation that ξ satisfies. That is we propagate ξˆ using
dξ̂/dt = (T F A − T Ȧ)ξ̂ + (T F B − T Ḃ)z + T Lu .
Now to study the error in x or x̃ = x̂ − x. Using the facts that x = Aξ + Bz and x̂ = Aξˆ+ Bz
we see that x̃ can be written as
but ξ˜ = T x̃ so we
Note that we can further simplify this by noting that if we premultiply Equation 288 by A
to get Aξ˜ = AT x̃ and then use Equation 287 to replace Aξ˜ with x̃ we end with
x̃ = AT x̃ . (290)
Thus replacing AT in the second term on the right-hand-side of Equation 289 we have
which is the books equation 9.2-18. From Equation 280 or AT + BH = I we can write AT
as AT = I − BH and then get for x̃˙ the following
We next replace the H Ȧ in the third term in the above with −ḢA from Equation 283 to
get a third term that looks like
BH ȦT x̃ = −B ḢAT x̃ = −B Ḣ x̃ ,
where we used AT x̃ = x̃, to simplify. Using this for the third term for x̃˙ we finally get
and thus b2 is currently unspecified. The differential equation for x̃ or x̃˙ = (F − BHF )x̃ for
this problem has the matrix F − BHF given by
\[
\begin{aligned}
F - BHF &= \begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix}
- \begin{bmatrix} 1 \\ b_2 \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix} \\
&= \left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ b_2 & 0 \end{bmatrix} \right)\begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ -b_2 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & -b_2 - \beta \end{bmatrix} .
\end{aligned}
\]
A nice property would be to have x̃ converge to zero faster than the system response, which is
governed by the pole at −β. To achieve this we would like to make the nonzero eigenvalue of
F − BHF , which is λ = −(β + b2 ), “significantly” more negative than −β. One way to do this
is to take λ = −5β, so that b2 = 4β, and we now have that B = [1, 4β]T .
T A = t1 a1 + t2 a2 = 1
T B = [t1  t2][1, 4β]T = t1 + 4βt2 = 0
HA = [1  0][a1 , a2 ]T = a1 = 0 .
Since a1 = 0 the one requirement from Equation 280 is
\[
AT + BH = \begin{bmatrix} 0 \\ a_2 \end{bmatrix}\begin{bmatrix} t_1 & t_2 \end{bmatrix}
+ \begin{bmatrix} 1 \\ 4\beta \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ t_1 a_2 + 4\beta & t_2 a_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} .
\]
t2 a2 = 1
t1 + 4βt2 = 0
t1 a2 + 4β = 0 .
Since the last equation can be obtained by multiplying the second equation by a2 and using
the first equation we have two equations and three unknowns. One solution can be found
by taking a2 = t2 = 1, and then t1 = −4β.
To finish this example we would solve Equation 285 (with ξ replaced by ξ̂) and then
estimate x using x̂ = A(t)ξ̂(t) + B(t)z(t). Equation 285 for ξ̂ in this case is
\[
\dot{\hat{\xi}}
= \begin{bmatrix} -4\beta & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix}\hat{\xi}
+ \begin{bmatrix} -4\beta & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & -\beta \end{bmatrix}\begin{bmatrix} 1 \\ 4\beta \end{bmatrix} z
+ \begin{bmatrix} -4\beta & 1 \end{bmatrix}\begin{bmatrix} 0 \\ l \end{bmatrix} u
= -5\beta\, \hat{\xi} - (16\beta^2 + \beta) z + l u = -5\hat{\xi} - 17 z - 1 .
\]
with initial condition x1 (0) = 1, x2 (0) = 1, and we solve the above differential equation for
0 ≤ t ≤ ∞. Then since our measurement z = x1 solving these three equations is equivalent
to solving the coupled system
\[
\begin{bmatrix} \dot x_1 \\ \dot x_2 \\ \dot x_3 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & -1 & 0 \\ 17 & 0 & -5 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+ \begin{bmatrix} 0 \\ -1 \\ -1 \end{bmatrix} ,
\]
with initial condition [1, 1, −4]T . Once we have ξ̂ as a function of time, x is reconstructed
via
\[
\hat{x} = A\hat{\xi} + Bz
= \begin{bmatrix} 0 \\ 1 \end{bmatrix}\hat{\xi} + \begin{bmatrix} 1 \\ 4\beta \end{bmatrix} x_1(t)
= \begin{bmatrix} x_1(t) \\ \hat{\xi} + 4\beta\, x_1(t) \end{bmatrix} .
\]
Notes on observers for stochastic systems
In this section of these notes we provide further details on observers, but in this case we
consider the situation where in addition to exact measurements (considered above) we have
noisy measurements. In this case the measurements are a combination of noisy and noise-free
as
\[
z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} H_1 \\ H_2 \end{bmatrix} x + \begin{bmatrix} v_1 \\ 0 \end{bmatrix} .
\]
Here z is a vector of dimension m and we consider the case where there are m1 noisy
measurements and m2 noise-free measurements, where m2 = m − m1 .
Using the standard definition of the error in ξ as ξ̃ = ξ̂ − ξ we can derive the differential equa-
tion for ξ̃ by taking the time derivative of this difference, using the postulated expressions
for dξ/dt and dξ̂/dt. When we do this we find
We next would like to derive the expression for the differential equation for the error in our
state x̃. To do this we need to derive a few auxiliary results. The first is to note that
ξ = T x and ξ̂ = T x̂, so that ξ̃ = T x̃. The second is to note that we can write the error
correction term above as
z1 − H1 x̂ = H1 x + v1 − H1 x̂ = −H1 x̃ + v1 .
x̃ = x̂ − x
= Aξˆ + B2 z2 − (Aξ + B2 z2 )
= Aξ˜ . (294)
Starting with this last expression, x̃ = Aξ̃, taking the time derivative gives dx̃/dt = Ȧξ̃ + A dξ̃/dt;
when we use the expression for dξ̃/dt given by Equation 293 we get
which is the book's equation 9.2-32. To further simplify this recall that from 9.2-18 we derived
Equation 292, an equivalent expression for the first three terms in the above, or
Notice that if we replace B1 in the above with AT B1 we see that the expression AT B1
becomes AT (AT B1 ) = AT AT B1 = AT B1 , since T A = I. Thus the transformation given by
B1 → AT B1 leaves the right-hand side of the above unmodified. The book argues that this
means that we can also perform the transformation AT B1 → B1 .
Warning: I don’t really see the logic in the books argument. If anyone knows of a better
argument for making this substitution please let me know.
We now verify that in special cases these results duplicate known results. If we consider the
case where there are no noisy measurements (v1 = 0 and B1 = 0) and no process noise (G = 0)
we then get
x̃˙ = (F − B2 H2 F − B2 Ḣ2 )x̃ ,
or Equation 291, which is the expected result for observers of deterministic systems. In the
case where there are no noise free measurements B2 = H2 = 0 (and only noisy measurements)
we get
x̃˙ = (F − B1 H1 )x̃ + B1 v1 − Gw ,
which is the standard Kalman filter error dynamics when B1 is the Kalman gain.
Using Equation 297 we can write down the differential equation satisfied by P = E[x̃x̃T ],
where we find
Ṗ = (F − B2 H2 F − B2 Ḣ2 − B1 H1 )P + P (F − B2 H2 F − B2 Ḣ2 − B1 H1 )T
+ B1 R1 B1T + (I − B2 H2 )GQGT (I − B2 H2 )T .
As in other parts of this text we seek expressions for B1 and B2 that make trace(Ṗ ) as
small as possible. This requires taking the B1 and B2 derivatives, setting the results equal
to zero and solving for B1 and B2 . To take these derivatives we will use Equations 313, 315,
and 317. Performing this procedure to determine the optimal value for B1 first, to evaluate
∂ trace(Ṗ )/∂B1 we find that the three derivatives we need are given by
∂/∂B1 trace((F − B2 H2 F − B2 Ḣ2 − B1 H1 )P ) = −∂/∂B1 trace(B1 H1 P ) = −(H1 P )T = −P H1T

∂/∂B1 trace(P (F − B2 H2 F − B2 Ḣ2 − B1 H1 )T ) = −∂/∂B1 trace(P (B1 H1 )T )
                                                = −∂/∂B1 trace(P H1T B1T )
                                                = −∂/∂B1 trace(B1 H1 P ) = −P H1T

∂/∂B1 trace(B1 R1 B1T ) = 2B1 R1 .
Thus ∂ trace(Ṗ )/∂B1 = 0 becomes

−2P H1T + 2B1 R1 = 0 , so that B1opt = P H1T R1−1 .
When we use the optimal value for B1 found above we find that Ṗ is given by
since several terms cancel. This is the book's equation 9.2-37. Now to minimize the trace of
Ṗ in Equation 299 with respect to B2 we need to take the derivative of the above expression
with respect to B2 . The various derivatives we need in this calculation are given by
∂/∂B2 trace((F − B2 H2 F − B2 Ḣ2 )P ) = −∂/∂B2 trace(B2 H2 F P ) − ∂/∂B2 trace(B2 Ḣ2 P )
                                      = −(H2 F P )T − (Ḣ2 P )T
                                      = −P F T H2T − P Ḣ2T .
The trace of the second term on the right-hand-side of Equation 299 has the same derivative
since it is the transpose of the first. Next we evaluate
∂/∂B2 trace((I − B2 H2 )GQGT (I − B2 H2 )T ) = −∂/∂B2 trace(GQGT H2T B2T )
                                             − ∂/∂B2 trace(B2 H2 GQGT )
                                             + ∂/∂B2 trace(B2 H2 GQGT H2T B2T ) .
Note that the first term and second term are equal since the arguments of the traces are
transposes of each other. Thus we get for this part of the total derivative

−2GQGT H2T + 2B2 H2 GQGT H2T .
The total derivative of trace(Ṗ ) is then given by adding up all of the parts seen thus far to
get
∂ trace(Ṗ )/∂B2 = −2P F T H2T − 2P Ḣ2T − 2GQGT H2T + 2B2 H2 GQGT H2T = 0 .
Thus solving for B2 we see that B2 is given by
B2opt = (P F T H2T + GQGT H2T + P Ḣ2T )(H2 GQGT H2T )−1 , (300)
We will solve the problem of correlated measurement errors by incorporating the correlated
dynamics of the measurement noise v
v̇ = Ev + w1 ,
into the state by forming an (n + m)th order augmented “prime” system, where the new state
x′ is the old state x plus the measurement noise v, defined as x′T = [xT v T ]. Such an
augmented system has new system matrices F ′ , G′ , H2′ , and Q′ as given in the book. We
now show that the state estimation error x̃′ is orthogonal to the noise-free measurements
represented by H2′ or
\[
H_2' \tilde{x}' = \begin{bmatrix} H & I \end{bmatrix}\begin{bmatrix} \tilde{x} \\ \tilde{v} \end{bmatrix}
= H\tilde{x} + \tilde{v} = 0 . \tag{301}
\]
To show this recall that x̃′ = Aξ˜ and premultiply this relationship by H2′ to get
or H2′ A = 0 meaning that H2′ x̃′ = 0 showing the claimed orthogonalization in Equation 301.
Using this expression we can derive expressions for the augmented state error covariance
matrix P ′ = E[x̃′ x̃′T ] as
\[
P' = E[\tilde{x}'\tilde{x}'^T]
= E\left[ \begin{bmatrix} \tilde{x} \\ \tilde{v} \end{bmatrix}\begin{bmatrix} \tilde{x}^T & \tilde{v}^T \end{bmatrix} \right]
= \begin{bmatrix} P & E[\tilde{x}\tilde{v}^T] \\ E[\tilde{v}\tilde{x}^T] & E[\tilde{v}\tilde{v}^T] \end{bmatrix} .
\]
With this augmented system we are now in a situation where we can apply the results
of the previous section. That is we will put the primed system, and Equation 302 into
Equation 300.
To do this we first need to evaluate various products. To begin we find
\[
G' Q' G'^T = \begin{bmatrix} GQG^T & 0 \\ 0 & Q_1 \end{bmatrix} \quad\text{so that}\quad
G' Q' G'^T H_2'^T = \begin{bmatrix} GQG^T & 0 \\ 0 & Q_1 \end{bmatrix}\begin{bmatrix} H^T \\ I \end{bmatrix}
= \begin{bmatrix} GQG^T H^T \\ Q_1 \end{bmatrix} ,
\]
and
H2′ G′ Q′ G′T H2′T = HGQGT H T + Q1 .
Next we find
\[
P' F'^T H_2'^T
= \begin{bmatrix} P & -P H^T \\ -H P & H P H^T \end{bmatrix}
\begin{bmatrix} F^T & 0 \\ 0 & E^T \end{bmatrix}
\begin{bmatrix} H^T \\ I \end{bmatrix}
= \begin{bmatrix} P & -P H^T \\ -H P & H P H^T \end{bmatrix}
\begin{bmatrix} F^T H^T \\ E^T \end{bmatrix}
= \begin{bmatrix} P F^T H^T - P H^T E^T \\ -H P F^T H^T + H P H^T E^T \end{bmatrix} ,
\]
and
\[
P' \dot{H}_2'^T
= \begin{bmatrix} P & -P H^T \\ -H P & H P H^T \end{bmatrix}\begin{bmatrix} \dot{H}^T \\ 0 \end{bmatrix}
= \begin{bmatrix} P \dot{H}^T \\ -H P \dot{H}^T \end{bmatrix} .
\]
Thus the sum of the three needed terms in B2opt is given by
\[
P' F'^T H_2'^T + G' Q' G'^T H_2'^T + P' \dot{H}_2'^T
= \begin{bmatrix} P F^T H^T - P H^T E^T + GQG^T H^T + P \dot{H}^T \\ -H P F^T H^T + H P H^T E^T + Q_1 - H P \dot{H}^T \end{bmatrix} .
\]
If our measurements are noised versions of the constant x0 or zk = x0 +vk then our stochastic
estimation algorithm is
x̂k+1 = x̂k + kk (zk − x̂k ) .
In this case g(x) = x0 − x, and so g ′ (x) = −1. Thus the required condition on the sign of kk
for convergence is sgn(kk ) = −sgn(g ′(x)) = −(−1) = +1, so we must have kk > 0 for
convergence.
This is the book's equation 9.3-29. We denote this estimate x̂k since it is the best predictor
“going into” the kth measurement. In other words it is the prior estimate of the value of
x0 before we obtain the kth measurement. From the above expression for x̂k a recursive
estimate of x̂k+1 can be derived as follows
\[
\begin{aligned}
\hat{x}_{k+1}
&= \frac{z_k h_k + \sum_{j=1}^{k-1} z_j h_j}{\sum_{j=1}^{k} h_j^2}
 = \frac{z_k h_k + \hat{x}_k \sum_{j=1}^{k-1} h_j^2}{\sum_{j=1}^{k} h_j^2}
 = \frac{z_k h_k + \hat{x}_k \left( \sum_{j=1}^{k} h_j^2 - h_k^2 \right)}{\sum_{j=1}^{k} h_j^2} \\
&= \hat{x}_k + \frac{1}{\sum_{j=1}^{k} h_j^2}\left( z_k h_k - h_k^2 \hat{x}_k \right)
 = \hat{x}_k + \frac{h_k}{\sum_{j=1}^{k} h_j^2}\left( z_k - h_k \hat{x}_k \right) ,
\end{aligned}
\]
where we have used the fact that the sequence of measurement noise vj are independent i.e.
E[vi vj ] = δij σ 2 .
Now in the present case, where x0 ∼ N(µ0 , σ02 ) and when taking measurements zj = x0 + vj
with vj ∼ N(0, σ 2 ) in terms of a Kalman filter framework by taking our initial guess at the
state, x0 , and its uncertainty as x̂0 = µ0 and p0 (−) = σ02 , we see that this example is exactly
like Example 4.2-1 discussed on Page 47. To make the notation from that example match
this example we need to take r0 → σ 2 and p0 → σ02 . Under this similarity using Equation 63
we have that our state uncertainty changes with measurements as
\[
p_k(+) = \frac{p_0}{1 + \frac{p_0}{r_0} k} = \frac{r_0}{\frac{r_0}{p_0} + k}
\;\rightarrow\; \frac{\sigma^2}{k + \frac{\sigma^2}{\sigma_0^2}} ,
\]
which is the book's equation 9.3-39. The state update Equation 64 from that example, using
the above transformations, gives the book's equation 9.3-38.
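The closed form pk (+) = σ²/(k + σ²/σ0²) can be confirmed by running the scalar covariance
recursion directly (hypothetical σ² and σ0² below):

\begin{verbatim}
import numpy as np

sigma2, sigma02 = 0.5, 4.0
p = sigma02                           # p_0(-) = sigma_0^2
for k in range(1, 11):
    K = p / (p + sigma2)              # Kalman gain
    p = (1 - K) * p                   # measurement update (phi = 1, q = 0)
    print(k, p, sigma2 / (k + sigma2 / sigma02))
\end{verbatim}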
Notes on deterministic optimal linear systems – duality
In this section of these notes we will simply derive and verify many of the book’s equations.
Given the quadratic performance index J specified in the book we seek to transform it using
a time-varying symmetric matrix S(t) with certain properties. Since S(t) is a function of
time we have
d(xT Sx)/dt = ẋT Sx + xT Ṡx + xT S ẋ .
Using the fact that our system state satisfies ẋ = F (t)x(t) + L(t)u(t) this becomes
d(xT Sx)/dt = uT LT Sx + xT F T Sx + xT Ṡx + xT SF x + xT SLu
            = xT (F T S + SF + Ṡ)x + uT LT Sx + xT SLu .
We next add and subtract xT V x + uT Uu to this expression to get that d(xT Sx)/dt equals

xT (F T S + SF + Ṡ + V )x + uT LT Sx + xT SLu + uT Uu − xT V x − uT Uu .    (304)
or the book's equation 9.5-8. We claim that we can write this as

d(xT Sx)/dt = (xT SL + uT U)U −1 (LT Sx + Uu) − xT V x − uT Uu ,
if we impose some restrictions on S. To show this expand out the first term to get
xT SLU −1 LT Sx + xT SLu + uT LT Sx + uT Uu .
This will be equal to Equation 304 if
F T S + SF + Ṡ + V = SLU −1 LT S , (305)
or the book's equation 9.5-10. Thus since we have just argued that
xT V x + uT Uu = (xT SL + uT U)U −1 (LT Sx + Uu) − d(xT Sx)/dt ,
and requiring that at tf the matrix S equals Vf or
x(tf )T S(tf )x(tf ) = x(tf )T Vf x(tf ) ,
we can write our quadratic performance index J as
\[
\begin{aligned}
J &= x(t_f)^T V_f x(t_f) + \int_{t_0}^{t_f} \left( x^T V x + u^T U u \right) dt \\
  &= x(t_f)^T S(t_f) x(t_f) + \int_{t_0}^{t_f} (x^T S L + u^T U) U^{-1} (L^T S x + U u)\, dt
     - \left( x(t_f)^T S(t_f) x(t_f) - x(t_0)^T S(t_0) x(t_0) \right) \\
  &= x(t_0)^T S(t_0) x(t_0) + \int_{t_0}^{t_f} (x^T S L + u^T U) U^{-1} (L^T S x + U u)\, dt ,
\end{aligned} \tag{306}
\]
which is the book’s equation 9.5-12. From this we see that we can minimize J if we require
LT Sx + Uu = 0 , (307)
or that the control u should be given by
u(t) = −U −1 (t)L(t)T S(t)x(t) . (308)
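In practice S(t) is found by integrating Equation 305 backward from S(tf ) = Vf , and the
control is then the state feedback of Equation 308. The sketch below is only an illustration:
the system matrices, weights, horizon, and the crude fixed-step backward Euler integration
are all hypothetical choices.

\begin{verbatim}
import numpy as np

F = np.array([[0.0, 1.0], [0.0, -0.5]])
L = np.array([[0.0], [1.0]])
V = np.diag([1.0, 0.1])          # state weighting
U = np.array([[0.5]])            # control weighting
Vf = np.diag([2.0, 2.0])         # terminal weighting
Uinv = np.linalg.inv(U)

tf, dt = 5.0, 1e-3
S = Vf.copy()
gains = []
for _ in range(int(tf / dt)):    # march backward in time from tf
    Sdot = -(F.T @ S + S @ F + V - S @ L @ Uinv @ L.T @ S)   # from Eq. (305)
    S = S - Sdot * dt            # step from t to t - dt
    gains.append(-Uinv @ L.T @ S)    # u = -U^{-1} L^T S x, Eq. (308)
print(gains[-1])                 # feedback gain at the initial time
\end{verbatim}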
Notes on optimal linear stochastic control systems – separation principles
From the discussion in the book we arrive at a minimization problem for u of the form
\[
\bar{J}_u = \int_{t_0}^{t_f} E\left[ (x^T S L + u^T U) U^{-1} (L^T S x + U u) \right] dt ,
\]
2LT S x̂ + 2Uu = 0 .
Problem Solutions
Note this is a linear time-invariant system and so the innovations are generated by Equa-
tion 277, which in this case becomes
As discussed in Example 9.1-1 on Page 167, we empirically compute the left-hand side of
the above (we call this φνν (τ )) and then fit the empirical values to a function of the form
Ae−(β+k)|τ | + Bδ(τ ). Once we have done this we have estimates of p∞ − kr and r.
Next we look for the steady-state solution to
0 = 2(−β − k)p∞ + q + k²r ,
or
p∞ = (q + k²r) / (2(β + k)) .
Thus the adaptive filtering procedure for this problem then is as follows
1. Measure the autocorrelation of the innovations ν(t) and denote this φνν (τ ).
2. Fit a model of the form Ae−(β+k)|τ | + Bδ(τ ) to the measured function φνν (τ ), obtaining
estimates of A and B.
3. From the earlier discussion these two values of A and B should satisfy
A = p∞ − kr = (q + k²r)/(2(β + k)) − kr and B = r .
Thus we can use these estimates to solve for q and r with k fixed. These two values
of q and r should be better estimates of q and r than we previously had and could be
used to modify the value of k used in filtering, as sketched below.
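Step 3 is a simple algebraic inversion once A and B are in hand. A tiny sketch (the numerical
values of β, k, A, and B below are made up):

\begin{verbatim}
# Given the fitted autocorrelation coefficients A and B (hypothetical
# numbers), invert step 3 above for q and r, holding the filter gain k
# and the known plant constant beta fixed.
beta, k = 1.0, 0.8          # plant pole and the gain currently in use
A, B = 0.05, 0.3            # pretend these came from the least-squares fit
r = B
q = 2.0 * (beta + k) * (A + k * r) - k**2 * r
print(q, r)                 # refined noise-intensity estimates
\end{verbatim}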
Since x̃ and ξ̃ are related via ξ̃ = T x̃ (see Equation 287), and since AT x̃ = x̃ (Equation 290),
premultiplying by A gives Aξ̃ = x̃. From these two expressions we see that the error covariances
for x̃ and ξ̃ are related via

Π = E[ξ̃ ξ̃T ] = T E[x̃x̃T ]T T = T P T T ,    (309)

and
P = E[x̃x̃T ] = AE[ξ˜ξ˜T ]AT = AΠAT , (310)
as we were to show.
When we keep only the highest order terms in ek on the top and the bottom we obtain
ek+1 = ek − k0 ek = (1 − k0 )ek .
ek = (1 − k0 )^(k−1) e1 for k ≥ 2 .
Thus we see that if |1 − k0 | < 1 then this method converges since ek → 0 in that case. This
means that convergence is guaranteed when −1 < 1 − k0 < 1 or 0 < k0 < 2. We are told
that g(x) satisfies 0 ≤ a ≤ |g(x)| ≤ b < ∞, from which we conclude that a/b ≤ 1, so when
we impose the requirement that k0 be such that 0 < k0 < a/b this requires that 0 < k0 < 1,
which is stricter than what is truly required for convergence (which is k0 < 2).
All the examples given for the gain sequence kk are ones that can be shown to behave like
the classic divergent harmonic series Σ_{k=1}^{∞} 1/k .
In this section of the appendix we enumerate several matrix and vector derivatives that are
used in the body of these notes. We begin with some derivatives of scalar forms

∂(xT a)/∂x = ∂(aT x)/∂x = a    (311)
∂(xT Bx)/∂x = (B + BT )x .    (312)
Next we present some derivatives involving traces. We have
∂/∂X trace(AX) = AT    (313)
∂/∂X trace(XA) = AT    (314)
∂/∂X trace(AXT ) = A    (315)
∂/∂X trace(XT A) = A    (316)
∂/∂X trace(XT AX) = (A + AT )X    (317)
∂/∂X trace(XAXT ) = X(A + AT ) .    (318)
Note that we can derive Equations 317 and 318 given the previous trace derivative identities
using the “product rule”. To do this we assume that one of the terms X (or XT ) is constant
when we take the derivative with respect to the other X term. For example to derive
Equation 318 we have
∂/∂X trace(XAXT ) = ∂/∂X trace(XAV)|_{V=XT} + ∂/∂X trace(VAXT )|_{V=X}
                  = (AV)T |_{V=XT} + (VA)|_{V=X} = (AXT )T + XA
                  = X(A + AT ) .
Next we present some matrix derivatives that are helpful to know. We have
∂(aT Xb)/∂X = abT    (319)
∂(aT XT b)/∂X = baT ,    (320)
where as before X is a matrix. Derivations of expressions of this form are derived in [4, 6].
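These identities are easy to spot-check by finite differences; the short sketch below verifies
(313) and (318) for random 3 × 3 matrices.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
eps = 1e-6

def num_grad(f, X):
    """Central-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X); E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

print(np.allclose(num_grad(lambda X: np.trace(A @ X), X), A.T))           # (313)
print(np.allclose(num_grad(lambda X: np.trace(X @ A @ X.T), X),
                  X @ (A + A.T)))                                         # (318)
\end{verbatim}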
References
[1] J. D’Appolito and C. Hutchinson. Low sensitivity filters for state estimation in the
presence of large parameter uncertainties. Automatic Control, IEEE Transactions on,
14(3):310–312, 1969.
[4] P. A. Devijver and J. Kittler. Pattern recognition: A statistical approach. Prentice Hall,
1982.
[5] R. C. Dorf. Introduction to Electric Circuits. John Wiley & Sons, Inc., New York, NY,
USA, 2007.
[7] M. S. Grewal and A. P. Andrews. Kalman Filtering : Theory and Practice Using
MATLAB. Wiley-Interscience, January 2001.
[8] E. L. Ince. Ordinary Differential Equations. Dover Publications, Inc., New York, NY,
1956.
[9] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab. Signals & systems (2nd ed.).
Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.
[10] A. Papoulis. Probability, Random Variables, and Stochastic Processes. 3rd edition, 1991.