
A Solution Manual and Notes for:

Applied Optimal Estimation


by Arthur Gelb.

John L. Weatherwax∗

August 21, 2015

Introduction

Here you'll find various notes and derivations of the technical material that I made as I worked
through this book, together with a fairly complete set of solutions to the end of chapter
problems. I did much of this in the hope of improving my understanding of Kalman filtering
and thought it might be of interest to others. I have tried hard to eliminate mistakes, but it
is certain that some exist. I would appreciate constructive feedback (sent to the email below)
on any errors that are found in these notes, and I will try to incorporate any corrections that
I receive. In addition, there were several problems that I was not able to solve or whose
solutions I am not fully confident in. If anyone has suggestions for solution methods or
alternative ways to solve a given problem, please contact me. Finally, some of the derivations
found here can be quite long (since I really desire to fully document exactly how to do each
derivation); many of these can be skipped if they are not of interest.

I hope you enjoy this book as much as I have and that these notes might help the further
development of your skills in Kalman filtering.

As a final comment, I've worked hard to make these notes as good as I can, but I have no
illusions that they are perfect. If you feel that there is a better way to accomplish or explain
an exercise or derivation presented in these notes, or that one or more of the explanations is
unclear, incomplete, or misleading, please tell me. If you find an error of any kind -- technical,
grammatical, typographical, whatever -- please tell me that, too. I'll gladly add to the
acknowledgments in later printings the name of the first person to bring each problem to my
attention.

[email protected]

Acknowledgments

Special thanks to (most recent comments are listed first): David Herold and Ed Corbett for
help with these notes.
Chapter 1: Introduction

Notes On The Text

optimal estimation with two measurements of a constant value

We desire our estimate x̂ of x to be a linear combination of the two measurements zi for
i = 1, 2. Thus we take x̂ = k1 z1 + k2 z2, and define x̃ to be our estimation error, given by
x̃ = x̂ − x. To make our estimate x̂ unbiased requires that we set E[x̃] = 0, or

    E[x̃] = E[k1(x + v1) + k2(x + v2) − x]
         = E[(k1 + k2)x + k1 v1 + k2 v2 − x]
         = E[(k1 + k2 − 1)x + k1 v1 + k2 v2]
         = (k1 + k2 − 1)x = 0 ,

thus this requirement becomes k2 = 1 − k1, which is the same as the book's Equation 1.0-4.
Next let's pick k1 and k2 (subject to the above constraint) such that the error is as small as
possible. When we take k2 = 1 − k1 we find that x̂ is given by

    x̂ = k1 z1 + (1 − k1) z2 ,

so x̃ is given by

    x̃ = x̂ − x = k1 z1 + (1 − k1) z2 − x
       = k1(x + v1) + (1 − k1)(x + v2) − x
       = k1 v1 + (1 − k1) v2 .                                              (1)
Next we compute the mean square error E[x̃²] and find

    E[x̃²] = E[k1² v1² + 2k1(1 − k1) v1 v2 + (1 − k1)² v2²]
          = k1² σ1² + 2k1(1 − k1) E[v1 v2] + (1 − k1)² σ2²
          = k1² σ1² + (1 − k1)² σ2² ,

since E[v1 v2] = 0 as v1 and v2 are assumed to be uncorrelated. This is the book's equation
1.0-5. We desire to minimize this expression with respect to the variable k1. Taking its
derivative with respect to k1, setting the result equal to zero, and solving for k1 gives

    2k1 σ1² + 2(1 − k1)(−1) σ2² = 0   ⇒   k1 = σ2² / (σ1² + σ2²) .

Putting this value into our expression for E[x̃²] to see what our minimum error is, we find

    E[x̃²] = (σ2²/(σ1² + σ2²))² σ1² + (σ1²/(σ1² + σ2²))² σ2²
          = σ1² σ2² (σ2² + σ1²) / (σ1² + σ2²)²
          = σ1² σ2² / (σ1² + σ2²)
          = (1/σ1² + 1/σ2²)^{−1} ,

which is the book's equation 1.0-6. Then our optimal estimate x̂ takes the following form

    x̂ = (σ2²/(σ1² + σ2²)) z1 + (σ1²/(σ1² + σ2²)) z2 .

Some special cases of the above that validate its usefulness: when each measurement
contributes the same uncertainty, σ1 = σ2, we see that x̂ = (1/2) z1 + (1/2) z2, the average
of the two measurements. As another special case, if one measurement is exact, i.e. σ1 = 0,
then we have x̂ = z1 (in the same way if σ2 = 0, then x̂ = z2).
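
As a quick numerical illustration of the result above, the following short Python sketch (my
own check, with made-up values for x, σ1, and σ2; it is not part of the book) draws many
measurement pairs and confirms that the optimal weight k1 = σ2²/(σ1² + σ2²) attains the
predicted mean square error (1/σ1² + 1/σ2²)^{−1} and beats a simple 50/50 average when
σ1 ≠ σ2.

    import numpy as np

    rng = np.random.default_rng(0)
    x, sigma1, sigma2, n = 3.0, 1.0, 2.0, 200_000
    z1 = x + sigma1 * rng.standard_normal(n)
    z2 = x + sigma2 * rng.standard_normal(n)

    k1 = sigma2**2 / (sigma1**2 + sigma2**2)      # the optimal weight derived above
    mse_opt = np.mean((k1 * z1 + (1 - k1) * z2 - x) ** 2)
    mse_avg = np.mean((0.5 * z1 + 0.5 * z2 - x) ** 2)

    print(mse_opt, 1.0 / (1.0 / sigma1**2 + 1.0 / sigma2**2))   # these two agree closely
    print(mse_avg)                                              # larger than mse_opt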

Problem Solutions

Problem 1-1 (correlated measurements)

For this problem we are now going to assume that E[v1 v2] = ρ σ1 σ2, i.e. that the noises v1
and v2 are correlated. Recall from above that the condition E[x̃] = 0 requires that our
estimate x̂ = k1 z1 + k2 z2 have k2 = 1 − k1. Next we compute the mean square error E[x̃²],
in this case using Equation 1 for x̃, and find

    E[x̃²] = E[k1² v1² + 2k1(1 − k1) v1 v2 + (1 − k1)² v2²]
          = k1² σ1² + 2k1(1 − k1) E[v1 v2] + (1 − k1)² σ2²
          = k1² σ1² + 2k1(1 − k1) ρ σ1 σ2 + (1 − k1)² σ2² .                 (2)

To find a minimum variance estimator we will take the derivative of E[x̃²] with respect to
k1, set the result equal to zero, and then solve for k1. We have

    dE[x̃²]/dk1 = 0   ⇒   2k1 σ1² + 2ρ(1 − k1) σ1 σ2 + 2ρ k1 (−1) σ1 σ2 + 2(1 − k1)(−1) σ2² = 0 ,

or, dividing by 2,

    k1 σ1² + ρ(1 − k1) σ1 σ2 − ρ k1 σ1 σ2 − (1 − k1) σ2² = 0 .

On solving for k1 in this expression we find

    k1 = (σ2² − ρ σ1 σ2) / (σ2² − 2ρ σ1 σ2 + σ1²) ,                        (3)

as claimed. From symmetry k2 = 1 − k1 is given by

    k2 = 1 − k1 = (σ1² − 2ρ σ1 σ2 + σ2² − σ2² + ρ σ1 σ2) / (σ1² − 2ρ σ1 σ2 + σ2²)
                = (σ1² − ρ σ1 σ2) / (σ1² − 2ρ σ1 σ2 + σ2²) .               (4)

With these values for k1 and k2, and introducing

    D ≡ σ1² − 2ρ σ1 σ2 + σ2²

to simplify notation, the minimum mean square error given by Equation 2 becomes

    E[x̃²] = (1/D²) [ (σ2² − ρ σ1 σ2)² σ1² + 2ρ(σ2² − ρ σ1 σ2)(σ1² − ρ σ1 σ2) σ1 σ2 + (σ1² − ρ σ1 σ2)² σ2² ]
          = (1/D²) [ σ1²(σ2⁴ − 2ρ σ1 σ2³ + ρ² σ1² σ2²)
                     + 2ρ σ1 σ2 (σ1² σ2² − ρ σ1³ σ2 − ρ σ1 σ2³ + ρ² σ1² σ2²)
                     + σ2²(σ1⁴ − 2ρ σ1³ σ2 + ρ² σ1² σ2²) ]
          = (1/D²) [ σ1² σ2⁴ − 2ρ σ1³ σ2³ + ρ² σ1⁴ σ2²
                     + 2ρ σ1³ σ2³ − 2ρ² σ1⁴ σ2² − 2ρ² σ1² σ2⁴ + 2ρ³ σ1³ σ2³
                     + σ1⁴ σ2² − 2ρ σ1³ σ2³ + ρ² σ1² σ2⁴ ]
          = (1/D²) [ σ1² σ2⁴ (1 − 2ρ² + ρ²) + σ1⁴ σ2² (ρ² − 2ρ² + 1) + σ1³ σ2³ (2ρ³ − 2ρ) ]
          = (σ1² σ2²/D²) [ σ2²(1 − ρ²) + σ1²(1 − ρ²) − 2ρ σ1 σ2 (1 − ρ²) ]
          = (σ1² σ2² (1 − ρ²)/D²) [ σ1² + σ2² − 2ρ σ1 σ2 ]
          = σ1² σ2² (1 − ρ²) / (σ1² − 2ρ σ1 σ2 + σ2²) .
Note that this last expression is zero when ρ = ±1. Our estimate x̂ is then given by

    x̂ = [(σ2² − ρ σ1 σ2)/(σ2² − 2ρ σ1 σ2 + σ1²)] z1 + [(σ1² − ρ σ1 σ2)/(σ1² − 2ρ σ1 σ2 + σ2²)] z2 .      (5)
As before, we now consider some special cases. If ρ = +1 then the errors are totally positively
correlated and we see that

    k1 = (σ2² − σ1 σ2)/(σ1² − 2σ1 σ2 + σ2²) = σ2(σ2 − σ1)/(σ1 − σ2)² = σ2/(σ2 − σ1) ,

with k2 given by

    k2 = 1 − k1 = −σ1/(σ2 − σ1) ,

so that x̂ is given by

    x̂ = [σ2/(σ2 − σ1)] z1 + [−σ1/(σ2 − σ1)] z2 = (σ2 z1 − σ1 z2)/(σ2 − σ1) .

If ρ = −1 the errors are totally negatively correlated and we have

    k1 = (σ2² + σ1 σ2)/(σ1² + 2σ1 σ2 + σ2²) = σ2/(σ2 + σ1) ,

with k2 given by

    k2 = 1 − k1 = σ1/(σ2 + σ1) ,

so that x̂ is given by

    x̂ = [σ2/(σ2 + σ1)] z1 + [σ1/(σ2 + σ1)] z2 = (σ2 z1 + σ1 z2)/(σ2 + σ1) .
Problem 1-2 (E[x̃2 ] without the requirement that E[x̃] = 0)

We are told that our measurements z1 and z2 are noisy measurements of a constant, z1 =
x + v1 and z2 = x + v2, while our estimate x̂ of x is to be constructed as a linear combination
of the zi as x̂ = k1 z1 + k2 z2. Now defining x̃ as before, we have in this case that

    x̃ = x̂ − x = k1(x + v1) + k2(x + v2) − x = (k1 + k2 − 1)x + k1 v1 + k2 v2 .

So x̃² is given by

    x̃² = (k1 + k2 − 1)² x² + 2x(k1 + k2 − 1)(k1 v1 + k2 v2) + (k1 v1 + k2 v2)²
       = (k1 + k2 − 1)² x² + 2x k1(k1 + k2 − 1) v1 + 2x k2(k1 + k2 − 1) v2 + (k1² v1² + 2k1 k2 v1 v2 + k2² v2²) .

Taking the expectation of this expression and using the facts that the mean of the noise is
zero so E[vi ] = 0 and x is a constant gives

    E[x̃²] = (k1 + k2 − 1)² x² + k1² σ1² + 2k1 k2 E[v1 v2] + k2² σ2² .

For simplicity let's assume that the two noise sources are uncorrelated, i.e. E[v1 v2] = 0. Then
to find the minimum of this expression we take derivatives with respect to k1 and k2, set each
expression equal to zero, and solve for k1 and k2. We find the derivatives given by

    ∂E[x̃²]/∂k1 = 2(k1 + k2 − 1) x² + 2k1 σ1² = 0
    ∂E[x̃²]/∂k2 = 2(k1 + k2 − 1) x² + 2k2 σ2² = 0 .

When we group terms by the coefficients k1 and k2 we get the following system

    (x² + σ1²) k1 + x² k2 = x²
    x² k1 + (x² + σ2²) k2 = x² .

To solve this system for k1 and k2 we can use Cramer's rule. We find

    k1 = x² σ2² / ((σ1² + σ2²) x² + σ1² σ2²)
    k2 = x² σ1² / ((σ1² + σ2²) x² + σ1² σ2²) ,

both of which are functions of the unknown variable x. An interesting idea would be to
consider an iterative algorithm where we initially estimate x using an unbiased estimator,
substitute this estimate for x in the expressions above to obtain values for k1 and k2, use
these to estimate x again, and put that value back into the expressions for k1 and k2.
Repeating this several times gives an iterative estimation procedure (a small sketch of this
idea follows).
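
Here is a minimal Python sketch of that iterative idea (my own illustration, not from the
book): start from the unbiased inverse-variance combination and then iterate the biased-gain
formulas above. The numeric inputs are purely illustrative.

    def iterative_estimate(z1, z2, s1, s2, n_iter=10):
        # start from the unbiased minimum-variance combination of the two measurements
        x_hat = (s2**2 * z1 + s1**2 * z2) / (s1**2 + s2**2)
        for _ in range(n_iter):
            denom = (s1**2 + s2**2) * x_hat**2 + s1**2 * s2**2
            k1 = x_hat**2 * s2**2 / denom
            k2 = x_hat**2 * s1**2 / denom
            x_hat = k1 * z1 + k2 * z2   # note: this estimator is pulled toward zero (it is biased)
        return x_hat

    print(iterative_estimate(z1=4.1, z2=3.7, s1=1.0, s2=2.0))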
Problem 1-3 (estimating a constant with three measurements)

For this problem our three measurements are related to the unknown value of x as z1 = x + v1,
z2 = x + v2, and z3 = x + v3, and our estimate will be a linear combination of them:
x̂ = k1 z1 + k2 z2 + k3 z3. To have an unbiased estimate we compute the expectation of
x̃ = x̂ − x, where

    x̃ = x̂ − x
       = k1 z1 + k2 z2 + k3 z3 − x
       = k1(x + v1) + k2(x + v2) + k3(x + v3) − x
       = (k1 + k2 + k3 − 1)x + k1 v1 + k2 v2 + k3 v3 .                      (6)

To make x̂ an unbiased estimate of x we require that E[x̃] = 0. This in turn requires

    k1 + k2 + k3 − 1 = 0   or   k3 = 1 − k1 − k2 .                          (7)

Thus our unbiased estimate of x now takes the form

    x̂ = k1 z1 + k2 z2 + (1 − k1 − k2) z3 .

We will now pick k1 and k2 such that the mean square error E[x̃²] is a minimum. With this
functional form for x̂ we have, using Equation 6, that

    x̃² = (k1 v1 + k2 v2 + k3 v3)²
       = k1² v1² + k2² v2² + k3² v3² + 2k1 k2 v1 v2 + 2k1 k3 v1 v3 + 2k2 k3 v2 v3 .

Taking the expectation of the above expression, assuming uncorrelated measurement noises
E[vi vj] = 0 when i ≠ j, and recalling Equation 7, we have

    E[x̃²] = k1² σ1² + k2² σ2² + (1 − k1 − k2)² σ3² .                       (8)

To minimize this expression we take the partial derivatives with respect to k1 and k2 and set
the resulting expressions equal to zero. This gives

    ∂E[x̃²]/∂k1 = 2k1 σ1² + 2(1 − k1 − k2)(−1) σ3² = 0
    ∂E[x̃²]/∂k2 = 2k2 σ2² + 2(1 − k1 − k2)(−1) σ3² = 0 .

Now solving these two equations for k1 and k2 we find

    k1 = σ2² σ3² / (σ1² σ2² + σ1² σ3² + σ2² σ3²) = 1 / ((σ1/σ3)² + (σ1/σ2)² + 1)
    k2 = σ1² σ3² / (σ1² σ2² + σ1² σ3² + σ2² σ3²) = 1 / ((σ2/σ3)² + 1 + (σ2/σ1)²) .

From these we can compute k3 = 1 − k1 − k2 to find

    k3 = 1 − σ2² σ3²/(σ1² σ2² + σ1² σ3² + σ2² σ3²) − σ1² σ3²/(σ1² σ2² + σ1² σ3² + σ2² σ3²)
       = σ1² σ2² / (σ1² σ2² + σ1² σ3² + σ2² σ3²) = 1 / (1 + (σ3/σ2)² + (σ3/σ1)²) .

Then by defining D ≡ σ1² σ2² + σ1² σ3² + σ2² σ3² and using Equation 8 we see that

    E[x̃²] = σ2⁴ σ3⁴ σ1²/D² + σ1⁴ σ3⁴ σ2²/D² + σ1⁴ σ2⁴ σ3²/D²
          = (σ1² σ2² σ3²/D²)(σ2² σ3² + σ1² σ3² + σ1² σ2²) = σ1² σ2² σ3²/D
          = 1 / (1/σ1² + 1/σ2² + 1/σ3²) ,
as we were to show.

Problem 1-4 (estimating the initial concentration)

We are told that our measurements of the concentration, zi, are noisy measurements of the
time-decayed initial concentration x0 and so have the form

    zi = x0 e^(−a ti) + vi ,                                                (9)

for i = 1, 2. The book provides us with a functional form of an estimator x̂0 we could use to
estimate x0, and asks us to show that it is unbiased. We could begin by attempting to
estimate the initial concentration x0 using an expression that is linear in the two measurements.
That is, we might consider

    x̂0 = k1 z1 + k2 z2 ,
as has been done elsewhere in the book. From the given form of the measurements in
Equation 9 it might be better, however, to estimate x0 using

    x̂0 = k1 e^(a t1) z1 + k2 e^(a t2) z2 ,

with k1 and k2 unknown, since in that case the exponential factors e^(a ti) multiplying zi
will "remove" the corresponding decay factors found in Equation 9 and provide a more direct
estimate of x0. We next define our estimation error x̃0 as x̃0 = x̂0 − x0. To have an unbiased
estimator requires that E[x̃0] = 0. Using this last form for x̂0, this latter expectation is
given by

    E[x̃0] = E[k1 e^(a t1)(x0 e^(−a t1) + v1) + k2 e^(a t2)(x0 e^(−a t2) + v2) − x0] = 0 .

Since E[vi] = 0, the above gives k1 x0 + k2 x0 − x0 = 0, so that k2 = 1 − k1. Thus our
estimator x̂0 looks like

    x̂0 = k1 e^(a t1) z1 + (1 − k1) e^(a t2) z2 ,

and is in the form suggested in the book. To have the optimal estimator we next select k1
such that our expected square error is the smallest. To do this we compute the expected
square error E[x̃0²] and find

    E[x̃0²] = E[(k1 e^(a t1)(e^(−a t1) x0 + v1) + k2 e^(a t2)(e^(−a t2) x0 + v2) − x0)²]
            = E[(k1 x0 + k1 e^(a t1) v1 + k2 x0 + k2 e^(a t2) v2 − x0)²]
            = E[(k1 e^(a t1) v1 + k2 e^(a t2) v2)²]
            = E[k1² e^(2a t1) v1² + 2k1 k2 e^(a t1) e^(a t2) v1 v2 + k2² e^(2a t2) v2²]
            = k1² e^(2a t1) σ1² + k2² e^(2a t2) σ2² ,                       (10)

assuming uncorrelated measurements E[v1 v2] = 0. Taking the derivative of this expression
with respect to k1 (while recalling that k2 = 1 − k1) and setting this derivative equal to zero
we get

    2k1 e^(2a t1) σ1² + 2(1 − k1)(−1) e^(2a t2) σ2² = 0 .

Solving for k1 we find

    k1 = (e^(a t2) σ2)² / ((e^(a t1) σ1)² + (e^(a t2) σ2)²) = σ2² / (σ2² + σ1² e^(−2a(t2 − t1))) .

Using this, k2 becomes

    k2 = 1 − k1 = (e^(a t1) σ1)² / ((e^(a t1) σ1)² + (e^(a t2) σ2)²) = σ1² / (σ1² + σ2² e^(2a(t2 − t1))) .

To simplify the notation of the algebra that follows we define A1 = e^(2a t1) σ1² and
A2 = e^(2a t2) σ2², so that in terms of the Ai the weights are k1 = A2/(A1 + A2) and
k2 = A1/(A1 + A2). Then Equation 10 becomes

    E[(x̂0 − x0)²] = (A2²/(A1 + A2)²) A1 + (A1²/(A1 + A2)²) A2
                  = (A1 A2/(A1 + A2)²)(A1 + A2) = A1 A2/(A1 + A2)
                  = 1/(1/A2 + 1/A1) = ( e^(−2 t1 a)/σ1² + e^(−2 t2 a)/σ2² )^(−1) ,
as we were to show.
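
A short Python sketch of this estimator (my own check, with made-up values for x0, a, t1,
t2, σ1, and σ2): it forms x̂0 from two noisy samples of a decaying concentration and checks
the predicted variance against simulation.

    import numpy as np

    rng = np.random.default_rng(2)
    x0, a, t1, t2, s1, s2, n = 10.0, 0.3, 1.0, 4.0, 0.5, 0.8, 300_000

    z1 = x0 * np.exp(-a * t1) + s1 * rng.standard_normal(n)
    z2 = x0 * np.exp(-a * t2) + s2 * rng.standard_normal(n)

    A1, A2 = np.exp(2 * a * t1) * s1**2, np.exp(2 * a * t2) * s2**2
    k1 = A2 / (A1 + A2)
    x0_hat = k1 * np.exp(a * t1) * z1 + (1 - k1) * np.exp(a * t2) * z2

    print(np.mean(x0_hat))                         # close to x0 (unbiased)
    print(np.var(x0_hat), A1 * A2 / (A1 + A2))     # empirical vs derived variance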
Chapter 2: Underlying Mathematical Techniques

Notes On The Text

Least-Squares Techniques

The objective function, J, for least squares is given by

    J = (z − Hx)^T (z − Hx) ,                                               (11)

which we can expand to write as follows

    J = z^T z − 2 z^T H x + x^T H^T H x .

Taking the first derivative of this expression with respect to the unknown vector x, using
Equations 311 and 312, gives

    ∂J/∂x = −2 H^T z + (H^T H + H^T H) x = −2 H^T z + 2 H^T H x .

The second derivative of J with respect to x is given by

    ∂²J/∂x² = 2 H^T H .                                                     (12)

This matrix is positive semi-definite, since if we let ξ be an arbitrary non-zero vector and
compute the inner product ξ^T (∂²J/∂x²) ξ, we see that it can be written as the quadratic sum

    2 (Hξ)^T (Hξ) = 2 Σ_i (Hξ)_i² ≥ 0 ,

for all possible vectors ξ. Thus 2 H^T H is positive semi-definite and the solution to the first
order optimality condition ∂J/∂x = 0 gives a minimum.
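
As a small illustration (not from the book), the normal equations x̂ = (H^T H)^{−1} H^T z
that follow from setting ∂J/∂x = 0 can be checked against a library least-squares routine;
the H, x, and noise values below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)
    H = rng.standard_normal((20, 3))               # 20 measurements of a 3-vector
    x_true = np.array([1.0, -2.0, 0.5])
    z = H @ x_true + 0.1 * rng.standard_normal(20)

    x_normal = np.linalg.solve(H.T @ H, H.T @ z)       # solve (H^T H) x = H^T z
    x_lstsq, *_ = np.linalg.lstsq(H, z, rcond=None)    # library least-squares solution

    print(x_normal)
    print(x_lstsq)    # the two agree to numerical precision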

Problem Solutions

Problem 2-1 (the derivative of the matrix inverse)

Since P(t) P(t)^{−1} = I, taking the derivative of both sides of this expression and using the
product rule gives

    Ṗ P^{−1} + P dP^{−1}/dt = 0 .

Solving for dP^{−1}/dt we find

    dP^{−1}/dt = −P^{−1} Ṗ P^{−1} ,                                        (13)

as we were to show.
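
A quick finite-difference sketch (illustrative only, with an arbitrary smooth P(t)) of
Equation 13: the numerical derivative of P(t)^{−1} should match −P^{−1} Ṗ P^{−1}.

    import numpy as np

    def P(t):
        return np.array([[2.0 + np.sin(t), 0.3 * t],
                         [0.3 * t,         1.5 + t**2]])

    t, h = 0.7, 1e-6
    Pdot = (P(t + h) - P(t - h)) / (2 * h)                     # numerical dP/dt
    dPinv_numeric = (np.linalg.inv(P(t + h)) - np.linalg.inv(P(t - h))) / (2 * h)
    dPinv_formula = -np.linalg.inv(P(t)) @ Pdot @ np.linalg.inv(P(t))

    print(np.max(np.abs(dPinv_numeric - dPinv_formula)))       # small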
Problem 2-3 (eigenvalues of positive definite matrices)

We will prove this by showing the equivalence between two quadratic forms. If we consider
the quadratic form x^T A x, then as discussed in the book there exists an orthogonal matrix
Q such that A′ = Q^T A Q = Q^{−1} A Q is a diagonal matrix. Since A and A′ are related by
a similarity transformation they have the same eigenvalues, which are equal to the elements
on the diagonal of A′. Thus if we define x′ = Q^T x then x^T A x can be written as

    λ1 x′1² + λ2 x′2² + · · · + λn x′n² ,

where λi is the i-th eigenvalue of A (equivalently of A′). Now if we are told that A is positive
definite then we know that x^T A x > 0 for all x. If we take x = qi, where qi is the i-th column
vector of Q, then by the orthogonality of the matrix Q we have x′ = Q^T qi = ei, a vector of
all zeros with a single 1 in the i-th spot. For that value of x then x^T A x = λi. Since
x^T A x > 0 for all x we see that λi > 0. On the other hand, if we are told that the eigenvalues
of A are all positive, so that λi > 0 for all i, then from the above decomposition we have that
x^T A x = Σ_{i=1}^{n} λi x′i² > 0, showing that A is positive definite.

Problem 2-4 (S(t) = (dR(t)/dt) R^T(t) is skew symmetric)

Taking the transpose of the expression for S(t) we find

    S(t)^T = [ (dR(t)/dt) R(t)^T ]^T = R(t) (d/dt) R(t)^T
           = (d/dt)[ R(t) R^T(t) ] − (d/dt)[ R(t) ] R^T(t) .

Since R(t) is orthogonal, R(t) R^T(t) = I, which has a zero derivative. Since the right-hand
side of the above then equals −S(t), we have shown

    S(t)^T = −S(t) ,

or that S(t) is skew-symmetric.

Problem 2-5 (uses for the Cayley-Hamilton theorem)

Part (a): The Cayley-Hamilton theorem requires that a matrix A satisfy its own characteristic
polynomial. The given matrix has a characteristic polynomial given by |A − λI| = 0, or

    det [ 1 − λ   2 ; 3   4 − λ ] = 0 ,

or after expanding some

    (1 − λ)(4 − λ) − 6 = 0 ,

or finally λ² − 5λ − 2 = 0, as we were to show. The eigenvalues of this matrix are then given
by the quadratic formula

    λ = (5 ± √33)/2 .                                                       (14)

Part (b): Since one definition of e^{At} is

    e^{At} = Σ_{k=0}^{∞} (At)^k / k! = I + tA + (t²/2) A² + (t³/6) A³ + (t⁴/24) A⁴ + · · · ,      (15)

to evaluate this we need to compute powers of A. Powers of A can be computed using the
fact that A satisfies its own characteristic polynomial (the Cayley-Hamilton theorem). We find

    A² = 2I + 5A
    A³ = (5A + 2I)A = 2A + 5A² = 2A + 5(5A + 2I) = 10I + 27A
    A⁴ = A(A³) = 10A + 27A² = 10A + 27(5A + 2I) = 54I + 145A .

Using these we can write e^{At} as

    e^{At} = I + tA + (t²/2)(2I + 5A) + (t³/6)(10I + 27A) + (t⁴/24)(54I + 145A) + · · ·

If we group terms that are multiples of I together and terms that are multiples of A together,
we find that the above expression for e^{At} is equal to

    e^{At} = I (1 + t² + (5/3)t³ + (9/4)t⁴ + · · ·) + A (t + (5/2)t² + (9/2)t³ + (145/24)t⁴ + · · ·)
           = a1(t) I + a2(t) A ,

with ai(t) defined by the respective terms in brackets above.

Part (c): Using the expression derived above,

    e^{At} = a1(t) I + a2(t) A ,

if we evaluate this relation at the scalars A = λ1 and A = λ2 we get the following system

    e^{λ1 t} = a1(t) + λ1 a2(t)
    e^{λ2 t} = a1(t) + λ2 a2(t) .

Solving for the functions a1(t) and a2(t) we get

    a1(t) = −(λ2 e^{λ1 t} − λ1 e^{λ2 t})/(λ1 − λ2)
    a2(t) = (e^{λ1 t} − e^{λ2 t})/(λ1 − λ2) .

Note that the first expression is the negative of the book's expression (I think there is a sign
mistake in the expression for a1(t) given in the book). To verify that these exponential
functions are equivalent to the expressions for a1(t) and a2(t) given in the book (and above)
we can Taylor expand each of the ai(t) expressions about t = 0 with the λi given by
Equation 14. We do this using Mathematica in the file chap 2 prob 5.nb, where we find

    −(λ2 e^{λ1 t} − λ1 e^{λ2 t})/(λ1 − λ2) = 1 + t² + (5/3)t³ + (9/4)t⁴ + (29/12)t⁵ + (779/360)t⁶ + · · ·
    (e^{λ1 t} − e^{λ2 t})/(λ1 − λ2) = t + (5/2)t² + (9/2)t³ + (145/24)t⁴ + (779/120)t⁵ + (93/16)t⁶ + · · · ,

which are the same as the above expressions in brackets, proving the equivalence.
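
The following Python sketch (my own check) compares the closed form e^{At} = a1(t) I + a2(t) A
derived above against a direct matrix exponential; the value of t is arbitrary.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    l1, l2 = (5 + np.sqrt(33)) / 2, (5 - np.sqrt(33)) / 2   # eigenvalues from Equation 14
    t = 0.3

    a1 = -(l2 * np.exp(l1 * t) - l1 * np.exp(l2 * t)) / (l1 - l2)
    a2 = (np.exp(l1 * t) - np.exp(l2 * t)) / (l1 - l2)

    print(a1 * np.eye(2) + a2 * A)
    print(expm(A * t))    # the two matrices agree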

Problem 2-6 (evaluating an integral over the points r such that r^T E^{−1} r < 1)

For this problem we want to evaluate the integral ∫_{r^T E^{−1} r < 1} dr. To do this let's
introduce a change of coordinates that decouples the variables in r. Since E is a positive
definite matrix so is its inverse E^{−1}, and thus E^{−1} has a Cholesky factorization given by
E^{−1} = G G^T, where G is a lower triangular matrix. Introduce the vector v = G^T r; then
the set of possible r values r^T E^{−1} r < 1 becomes

    r^T G G^T r < 1   or   v^T v < 1 .

Our integral under this change of coordinates then becomes

    ∫_{v^T v < 1} |∂r/∂v| dv ,

where |∂r/∂v| is the determinant of the Jacobian of the transformation from the v coordinates
to the r coordinates. Since r = G^{−T} v we see that

    ∂r/∂v = G^{−T} ,

and so

    |∂r/∂v| = |G^{−T}| = |G^{−1}| = 1/|G| .

Note that since G is related to E we can express |∂r/∂v| in terms of E by noting that

    |E^{−1}| = |G| · |G^T| = |G|² .

Thus we can replace 1/|G| with √|E| to find that |∂r/∂v| = √|E|, and our integral becomes

    √|E| ∫_{v^T v < 1} dv = (4/3) π √|E| ,

since we recognize that ∫_{v^T v < 1} dv represents the volume of a sphere with radius 1.
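
A Monte Carlo sketch (illustrative only, with an arbitrary 3 × 3 positive definite E): the
fraction of points of a bounding box falling inside r^T E^{−1} r < 1, times the box volume,
should approach (4/3)π√|E|.

    import numpy as np

    rng = np.random.default_rng(4)
    E = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
    E_inv = np.linalg.inv(E)

    half = np.sqrt(np.diag(E))        # the ellipsoid fits in the box |r_i| <= sqrt(E_ii)
    n = 1_000_000
    pts = (2 * rng.random((n, 3)) - 1) * half
    inside = np.einsum('ij,jk,ik->i', pts, E_inv, pts) < 1.0   # r^T E^{-1} r per sample

    vol_mc = inside.mean() * np.prod(2 * half)
    print(vol_mc, 4.0 / 3.0 * np.pi * np.sqrt(np.linalg.det(E)))   # the two agree closely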
Problem 2-7 (weighted least squares)

The objective function, J, for weighted least squares is given by

    J = (z − Hx)^T W (z − Hx) ,                                             (16)

which we can expand to write as follows

    J = z^T W z − 2 z^T W H x + x^T H^T W H x = z^T W z − 2 (H^T W z)^T x + x^T H^T W H x .

Taking the first derivative of this expression with respect to the unknown vector x, using
Equations 311 and 312, gives

    ∂J/∂x = −2 H^T W z + (H^T W H + H^T W H) x = −2 H^T W z + 2 H^T W H x .

Setting this derivative equal to zero and solving for x (which we denote x̂) gives

    x̂ = (H^T W H)^{−1} H^T W z ,                                           (17)

the result quoted in the book. The second derivative of J with respect to x is given by

    ∂²J/∂x² = 2 H^T W H .                                                   (18)

This matrix is positive semi-definite if the elements on the diagonal of W are non-negative,
and then the solution given in Equation 17 to the first order optimality condition ∂J/∂x = 0
gives a minimum.
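
A minimal Python sketch of Equation 17 with a diagonal weight matrix (illustrative values,
not from the book), showing it reduces to ordinary least squares when W = I:

    import numpy as np

    rng = np.random.default_rng(5)
    H = rng.standard_normal((10, 2))
    z = H @ np.array([2.0, -1.0]) + 0.05 * rng.standard_normal(10)
    W = np.diag(rng.uniform(0.5, 2.0, size=10))    # per-measurement weights

    x_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
    x_ols = np.linalg.solve(H.T @ H, H.T @ z)      # the W = I special case
    print(x_wls, x_ols)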

Problem 2-9 (the distribution of the sum of three uniform random variables)

If X is a uniform random variable over (−1, +1) then it has a p.d.f. given by

    pX(x) = 1/2 for −1 ≤ x ≤ 1, and 0 otherwise,

while the random variable Y = X/3 is another uniform random variable, with a p.d.f. given by

    pY(y) = 3/2 for −1/3 ≤ y ≤ 1/3, and 0 otherwise.

Since the three random variables X/3, Y/3, and Z/3 (with X, Y, Z independent uniforms over
(−1, +1)) are independent, the characteristic function of their sum is the product of the
characteristic functions of each one of them. For a uniform random variable over the domain
(α, β) one can show that the characteristic function ζ(t) is given by Equation 21, or

    ζ(t) = ∫_α^β e^{itx}/(β − α) dx = (e^{itβ} − e^{itα})/(it(β − α)) ;

note this is slightly different from the usual definition of the Fourier transform [9], which
has e^{−itx} as the exponential argument. Thus for each of the random variables X/3, Y/3,
and Z/3 the characteristic function, since β = 1/3 and α = −1/3, looks like

    ζ(t) = 3(e^{it/3} − e^{−it/3})/(2it) .

Thus the sum of two uniform random variables like X/3 and Y/3 has a characteristic function
given by

    ζ²(t) = −(9/(4t²))(e^{it(2/3)} − 2 + e^{−it(2/3)}) ,

and adding a third random variable, say Z/3, to the sum of the previous two gives a
characteristic function that looks like

    ζ³(t) = −(27/(8i)) ( e^{it}/t³ − 3e^{it/3}/t³ + 3e^{−it/3}/t³ − e^{−it}/t³ ) .

Given the characteristic function of a random variable, to compute its probability density
function we need to evaluate the inverse Fourier transform of this function. That is, we need
to evaluate

    pW(w) = (1/2π) ∫_{−∞}^{∞} ζ(t)³ e^{−itw} dt .

Note that this latter integral is equivalent to (1/2π) ∫_{−∞}^{∞} ζ(t)³ e^{+itw} dt (the standard
definition of the inverse Fourier transform) since ζ(t)³ is an even function. To evaluate this
integral it will be helpful to convert the complex exponentials in ζ(t)³ into trigonometric
functions by writing ζ(t)³ as

    ζ(t)³ = (27/4) ( 3 sin(t/3)/t³ − sin(t)/t³ ) .                          (19)

Thus to solve this problem we need to be able to compute the inverse Fourier transform of
expressions like

    sin(αt)/t³ .

To do that we will write it as a product of two factors

    sin(αt)/t³ = (sin(αt)/t) · (1/t²) .

This is helpful since we (might) now recognize this as the product of two functions each of
which we know the Fourier transform of. For example, one can show [9] that if we define the
step function h1(w) as

    h1(w) ≡ 1/2 for |w| < α, and 0 for |w| > α ,

then the Fourier transform of this step function h1(w) is the first function in the product
above, sin(αt)/t. Notationally, we can write this as

    F[h1(w)] = sin(αt)/t .
[Figure 1: Left: the initial function h2(x) (a ramp function). Right: the ramp function
flipped, h2(−x).]

In the same way, if we define the ramp function h2(w) as

    h2(w) = −w u(w) ,

where u(w) is the unit step function

    u(w) = 0 for w < 0, and 1 for w > 0 ,

then the Fourier transform of h2(w) is given by 1/t². Notationally, in this case we then have

    F[−w u(w)] = 1/t² .
Since the inverse transform of a product of two functions, each of whose inverse Fourier
transform we know, is the convolution of the two inverse Fourier transforms, we have that

    F^{−1}[sin(αt)/t³] = ∫_{−∞}^{∞} h1(x) h2(w − x) dx ;

the other ordering of the integrands,

    ∫_{−∞}^{∞} h1(w − x) h2(x) dx ,

can be shown to be an equivalent representation. To evaluate the above convolution integral
and finally obtain the p.d.f. for the sum of three uniform random variables we might as well
select the formulation that is simplest to evaluate. I'll pick the first formulation, since it is
easy to flip and shift the ramp function h2(·) to produce h2(w − x). Now since h2(x) looks
like the plot given in Figure 1 (left), we see that h2(−x) looks like Figure 1 (right). Inserting
a right shift by the value w we have h2(−(x − w)) = h2(w − x), and this function looks like
that shown in Figure 2 (left). The shifted factor h2(w − x) and our step function h1(x) are
plotted together in Figure 2 (right). These considerations give a functional form
[Figure 2: Left: the function h2(x), flipped and shifted by w = 3/4 to the right, h2(−(x − w)).
Right: the flipped and shifted function plotted together with h1(x), allowing visualization of
the function overlap as w is varied.]

for the convolution gα(w), given by

    gα(w) = 0                                                  for w < −α
          = ∫_{−α}^{w} (1/2)(x − w) dx = −(1/4)(α + w)²        for −α < w < +α
          = ∫_{−α}^{+α} (1/2)(x − w) dx = −α w                 for w > α ,

when we evaluate each of the integrals. Using this and Equation 19 we see that

    F^{−1}[ζ³(t)] = (27/4)(3 g_{1/3}(w) − g_1(w))
                  = 0                            for w < −1
                  = (27/16)(1 + w)²              for −1 < w < −1/3
                  = −(9/8)(−1 + 3w²)             for −1/3 < w < +1/3
                  = (27/16)(−1 + w)²             for 1/3 < w < 1
                  = 0                            for w > 1 ,

which is equivalent to what we were to show. In the Mathematica file chap 2 prob 9.nb
some of the algebra for this problem is worked.
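
A small simulation sketch (my own check) comparing a histogram of (X + Y + Z)/3, for X,
Y, Z uniform on (−1, 1), against the piecewise density derived above:

    import numpy as np

    rng = np.random.default_rng(6)
    w = rng.uniform(-1, 1, size=(1_000_000, 3)).sum(axis=1) / 3.0

    def pdf(w):
        w = np.abs(w)                               # the density is symmetric in w
        out = np.where(w < 1.0 / 3.0, (9.0 / 8.0) * (1 - 3 * w**2), 0.0)
        out = np.where((w >= 1.0 / 3.0) & (w < 1.0), (27.0 / 16.0) * (1 - w)**2, out)
        return out

    hist, edges = np.histogram(w, bins=50, range=(-1, 1), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - pdf(centers))))      # small (histogram noise only)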

Problem 2-10 (the Poisson probability density)

The distribution function for a Poisson random variable, when the mean number of events
we expect to observe is µ, is given by

    F(x) = Σ_{i=0}^{x} e^{−µ} µ^i / i! = e^{−µ} Σ_{i=0}^{x} µ^i / i! .

When our arrival rate is 0.4 arrivals per minute, then in 10 minutes we would have a mean
number of arrivals given by µ = 10(0.4) = 4. Thus the probability of exactly four arrivals in
10 minutes is given by

    f(x = 4 | µ = 4) = e^{−4} 4⁴/4! = 0.1954 ,

and the probability of no more than four arrivals in 10 minutes is given by

    F(4) = e^{−4} Σ_{i=0}^{4} 4^i / i! = 0.62884 .

See the Matlab file chap 2 prob 10.m for calls to the poisspdf and poisscdf Matlab
functions used in evaluating these two probabilities.
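
For readers without Matlab, an equivalent check can be done in Python (a small sketch using
scipy, not part of the original notes):

    from scipy.stats import poisson

    mu = 0.4 * 10
    print(poisson.pmf(4, mu))    # approximately 0.1954
    print(poisson.cdf(4, mu))    # approximately 0.6288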

Problem 2-11 (a Rayleigh process)


For the first part of this problem let's define the random variable Z = √(X² + Y²) and
attempt to compute the distribution function of Z. We have

    FZ(z) = Pr{Z ≤ z} = Pr{ √(X² + Y²) ≤ z }
          = ∫∫_{x² + y² ≤ z²} p(x, y) dx dy
          = ∫∫_{x² + y² ≤ z²} p(x) p(y) dx dy
          = ∫∫_{x² + y² ≤ z²} (1/(σ√(2π))) exp(−x²/(2σ²)) · (1/(σ√(2π))) exp(−y²/(2σ²)) dx dy
          = ∫∫_{x² + y² ≤ z²} (1/(2πσ²)) exp(−(x² + y²)/(2σ²)) dx dy .

To evaluate this last integral we change from Cartesian coordinates to polar coordinates. Let
r² = x² + y² and the integral above becomes

    FZ(z) = (1/(2πσ²)) ∫_{r=0}^{z} e^{−r²/(2σ²)} 2πr dr = (1/σ²) ∫_{r=0}^{z} r e^{−r²/(2σ²)} dr
          = ∫_{0}^{z²/(2σ²)} e^{−v} dv = 1 − e^{−z²/(2σ²)} .

We take the derivative of FZ(z) to get the p.d.f. of Z. We find

    fZ(z) = FZ′(z) = (z/σ²) e^{−z²/(2σ²)} ,

which is the desired expression.

Next we compute the expectations of Z and Z² directly from the definition of the given
Rayleigh density function. We have that

    E(Z) = ∫_{z=0}^{∞} (z²/σ²) e^{−z²/(2σ²)} dz .

To evaluate this integral let v = z²/(2σ²), so that z = √2 σ √v and dz = (σ/√2) v^{−1/2} dv,
to get

    E(Z) = (1/σ²) ∫_{v=0}^{∞} (2σ² v) e^{−v} (σ/√2) v^{−1/2} dv
         = √2 σ ∫_{0}^{∞} v^{3/2 − 1} e^{−v} dv
         = √2 σ Γ(3/2) = √2 σ (1/2) Γ(1/2)
         = √(π/2) σ .

Next we calculate E(Z²). We find

    E(Z²) = (1/σ²) ∫_{z=0}^{∞} z³ e^{−z²/(2σ²)} dz .

Using the same transformation as was used to evaluate E(Z) we get

    E(Z²) = (1/σ²) ∫_{v=0}^{∞} 2^{3/2} σ³ v^{3/2} e^{−v} (σ/√2) v^{−1/2} dv
          = 2σ² ∫_{v=0}^{∞} v^{2−1} e^{−v} dv = 2σ² Γ(2) = 2σ² .

Thus the variance of Z is given by

    Var(Z) = E(Z²) − E(Z)² = 2σ² − (π/2) σ² = σ² (2 − π/2) .

Problem 2-12 (a maneuvering vehicle)

From the given description, the probability law for the acceleration a is given by

    Pr(a = −Amax) = Pmax ,   Pr(a = 0) = P0 ,   Pr(a = +Amax) = Pmax ,

with a uniform density of height b on −Amax < a < +Amax otherwise. To be a normalized
probability density we must have the value of b satisfy

    2Pmax + P0 + b(2Amax) = 1 ,

or, solving for b, we find

    b = (1 − P0 − 2Pmax) / (2Amax) .

Using this density the expectation of a is then given by

    E(a) = −Amax Pmax + Amax Pmax + 0 · P0 + ∫_{−Amax}^{Amax} a b da = 0 ,

and the expectation of a² is given by

    E(a²) = A²max Pmax + A²max Pmax + 0² · P0 + ∫_{−Amax}^{Amax} a² b da
          = 2 A²max Pmax + b [a³/3]_{−Amax}^{Amax}
          = (A²max/3) [1 + 4 Pmax − P0] ,

when we evaluate. Since E(a) = 0 the value of the variance is given by E(a²).

Problem 2-13 (statistics for the uniform distribution)

The uniform distribution has a characteristic function that can be computed directly:

    ζ(t) = E(e^{itX}) = ∫_a^b e^{itx}/(b − a) dx                            (20)
         = (e^{itb} − e^{ita}) / (it(b − a)) .                              (21)

We could compute E(X) using the characteristic function ζ(t) for a uniform random variable.
Beginning this calculation we have

    E(X) = (1/i) ∂ζ(t)/∂t |_{t=0}
         = (1/i)(1/(b − a)) [ (1/(it))(i b e^{itb} − i a e^{ita}) − (1/(it²))(e^{itb} − e^{ita}) ]_{t=0}
         = −(1/(b − a)) [ ( t(i b e^{itb} − i a e^{ita}) − (e^{itb} − e^{ita}) ) / t² ]_{t=0} .

To evaluate this expression requires the use of L'Hopital's rule, and seems a somewhat
complicated route to compute E(X). The evaluation of E(X²) would probably be even more
work when computed from the characteristic function. For this distribution, it is much easier
to compute the expectations directly. We have

    E(X) = ∫_a^b x/(b − a) dx = (1/(b − a)) [x²/2]_a^b = (1/2)(a + b) .

In the same way we find E(X²) to be given by

    E(X²) = ∫_a^b x²/(b − a) dx = (1/(b − a)) (b³ − a³)/3
          = (b − a)(b² + ab + a²) / (3(b − a)) = (1/3)(b² + ab + a²) .

Using these two results we thus have that the variance of a uniform random variable is

    Var(X) = E(X²) − E(X)²
           = (1/3)(b² + ab + a²) − (1/4)(a² + b² + 2ab)
           = (b − a)²/12 .

Problem 2-14 (the distribution of X1 +X2 when X1 and X2 are correlated normals)

The joint p.d.f. of X1 and X2 is given by

    f2(x1, x2) = (1/(2π σ1 σ2 √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ x1²/σ1² − 2ρ x1 x2/(σ1 σ2) + x2²/σ2² ] } ,   (22)

and we want to determine the probability density function of Z = X1 + X2. To do that,
consider the distribution function of the random variable Z. From the definition of the
distribution function we have

    FZ(l) = Pr{Z ≤ l} = Pr{X1 + X2 ≤ l}
          = ∫_{x2=−∞}^{∞} ∫_{x1=−∞}^{l−x2} f2(x1, x2) dx1 dx2 .

It would be nice to be able to evaluate this expression directly, but it is simpler to determine
the functional form of fZ(l) by taking the derivative of the above with respect to l and then
evaluating the resulting integral. We find

    FZ′(l) = ∫_{x2=−∞}^{∞} f2(l − x2, x2) dx2
           = (1/(2π σ1 σ2 √(1 − ρ²))) ∫_{x2=−∞}^{∞} exp{ −(1/(2(1 − ρ²))) [ (l − x2)²/σ1² − 2ρ(l − x2) x2/(σ1 σ2) + x2²/σ2² ] } dx2 .

In the argument of the exponent we can expand everything in terms of x2, complete the
square, and write it as

    −(1/(2(1 − ρ²))) (1/σ1² + 2ρ/(σ1 σ2) + 1/σ2²) [ x2 − l σ2 (ρ σ1 + σ2)/(σ1² + 2ρ σ1 σ2 + σ2²) ]² − l²/(2(σ1² + 2ρ σ1 σ2 + σ2²)) .

Using this we see that the value of FZ′(l) is the integral of the exponential of this expression
over the entire real line. Since x2 goes from −∞ to +∞ the "shift" amount of
l σ2 (ρ σ1 + σ2)/(σ1² + 2ρ σ1 σ2 + σ2²) in the quadratic above can be translated away and we get

    FZ′(l) = (1/(2π σ1 σ2 √(1 − ρ²))) e^{−l²/(2(σ1² + 2ρ σ1 σ2 + σ2²))} ∫_{x2=−∞}^{∞} exp{ −(1/(2(1 − ρ²))) (1/σ1² + 2ρ/(σ1 σ2) + 1/σ2²) x2² } dx2 .

To evaluate this expression recall that, because of the normalization of the Gaussian
probability density, ∫_{−∞}^{∞} e^{−x²/(2σ²)} dx = √(2π) σ, and the above becomes

    FZ′(l) = (1/(√(2π) √(σ1² + 2ρ σ1 σ2 + σ2²))) e^{−l²/(2(σ1² + 2ρ σ1 σ2 + σ2²))} .

Note that this expression is the probability density function of a normal random variable
with a mean value of zero and a variance given by σ1² + 2ρ σ1 σ2 + σ2². In the Mathematica
file chap 2 prob 14.nb some of the algebra for this problem is worked.
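
A short simulation check (illustrative values only) that the sum of two zero-mean correlated
normals has variance σ1² + 2ρσ1σ2 + σ2²:

    import numpy as np

    rng = np.random.default_rng(7)
    s1, s2, rho = 1.0, 2.0, -0.4
    cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
    x = rng.multivariate_normal([0, 0], cov, size=1_000_000)

    print(np.var(x.sum(axis=1)), s1**2 + 2 * rho * s1 * s2 + s2**2)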
Chapter 3 (Linear Dynamic Systems)

Notes on the text

Notes on Example 3.1.2 (verification of the derivation of the differential system)

Working through the block diagram presented in the text in figure 3.1-4 for this example, we
find that the various state variables must be related as follows

    εa − φ g = δv̇
    δṗ = δv
    δv/R + εg = φ̇ .

If we solve for the derivative variables and assume a state vector given by (φ, δv, δp)^T we find

    φ̇  = [ 0   1/R   0 ] (φ, δv, δp)^T + εg
    δṗ = [ 0    1    0 ] (φ, δv, δp)^T
    δv̇ = [ −g   0    0 ] (φ, δv, δp)^T + εa ,

which when written as a first order matrix system is given by the book's equation 3.1-13.

Verification of the analytic solution to the continuous linear system

We are told that a solution to the continuous linear system with a time dependent companion
matrix F(t), or

    ẋ(t) = F(t) x(t) + L(t) u(t) ,                                          (23)

is given by

    x(t) = Φ(t, t0) x(t0) + ∫_{t0}^{t} Φ(t, τ) L(τ) u(τ) dτ .               (24)

To verify this, take the derivative of x(t) with respect to time. We find

    x′(t) = Φ′(t, t0) x(t0) + ∫_{t0}^{t} Φ′(t, τ) L(τ) u(τ) dτ + Φ(t, t) L(t) u(t)
          = F(t) Φ(t, t0) x(t0) + ∫_{t0}^{t} F(t) Φ(t, τ) L(τ) u(τ) dτ + L(t) u(t)
          = F(t) [ Φ(t, t0) x(t0) + ∫_{t0}^{t} Φ(t, τ) L(τ) u(τ) dτ ] + L(t) u(t)
          = F(t) x(t) + L(t) u(t) ,

showing that the expression given in Equation 24 is indeed a solution. Note that in the above
we have used the fact that for a fundamental solution Φ(t, t0) we have Φ′(t, t0) = F(t) Φ(t, t0).

Notes on the derivation of the matrix superposition integral (Example 3.3-1)

We will seek a solution x(t) of the form

    x(t) = Φ(t, t0) ξ(t)                                                    (25)

to our differential equation given by

    dx(t)/dt = F(t) x(t) + L(t) u(t) .

When we put our hypothesized expression for x(t) given by Equation 25 into the above
equation we get

    (d/dt)[Φ(t, t0) ξ(t)] = F(t) Φ(t, t0) ξ(t) + L(t) u(t) ,

or, expanding the time derivative on the left-hand side, we get

    F(t) Φ(t, t0) ξ(t) + Φ(t, t0) dξ(t)/dt = F(t) Φ(t, t0) ξ(t) + L(t) u(t) ,

where we have used the fact that

    (d/dt) Φ(t, t0) = F(t) Φ(t, t0) .                                       (26)

Canceling the common terms on both sides of this expression we get

    Φ(t, t0) dξ(t)/dt = L(t) u(t) .

When we solve this for dξ(t)/dt we find

    dξ(t)/dt = Φ(t, t0)^{−1} L(t) u(t) = Φ(t0, t) L(t) u(t) ,

since

    Φ(t, t0)^{−1} = Φ(t0, t) .                                              (27)

When we integrate the above expression we find that ξ(t) is given by

    ξ(t) = ξ(t0) + ∫_{t0}^{t} Φ(t0, τ) L(τ) u(τ) dτ .

Putting this expression into Equation 25 we get for x(t) the following

    x(t) = Φ(t, t0) ξ(t0) + ∫_{t0}^{t} Φ(t, t0) Φ(t0, τ) L(τ) u(τ) dτ .

Since the product of the two Φ functions inside the integral simplifies as

    Φ(t, t0) Φ(t0, τ) = Φ(t, τ) ,                                           (28)

and ξ(t0) = x(t0), the above expression for x(t) becomes

    x(t) = Φ(t, t0) x(t0) + ∫_{t0}^{t} Φ(t, τ) L(τ) u(τ) dτ ,               (29)

or the matrix superposition integral, as we were trying to show.

Notes on state vector augmentation: some common correlated noise models

The random ramp disturbance can be modeled with the system

ẋ1 = x2
ẋ2 = 0 .

From the equation for x2 (t) by integrating we have that x2 (t) = x2 (0) where x2 (0) is the
random constant initial condition. It is worth repeating the point about the randomness of
x2 (0). The value of x2 (0) is not known beforehand but is assumed to be generated from
a distribution. Once the random value is generated and observed, the value of x2 (t) is
specified for all later time. Then using the first equation we have that ẋ1 = x2 (0) so that
x1 (t) = x2 (0)t + x1 (0), where x1 (0) is another random initial condition. Thus if we consider
x1 (0) to be the “mean value” of x1 (t) then

    E[(x1(t) − x1(0))²] = E[x2(0)²] t² ,

showing the quadratic growth of the variance expected with a random ramp noise model.

For the exponentially correlated random variables the state differential equation is given
by
ẋ = −βx + w ,
then from this representation the system function F is −β and if we assume w(t) is uncor-
related white noise so that E[w(t)w(τ )] = q(t)δ(t − τ ) then the linear variance equation

    Ṗ(t) = F(t) P(t) + P(t) F(t)^T + G(t) Q(t) G(t)^T ,                    (30)

in this scalar case becomes

    ṗ(t) = −2β p(t) + q(t) .

In steady state ṗ(t) = 0, and taking q(t) = q (a constant) and then solving for p, the
steady-state error variance with exponentially correlated random variables is

    p = E[x²] = q/(2β) ,

since the definition of p is p = E[x²]. If we want to consider the case of exponentially
correlated random variables in the discrete setting, the discrete system equation in that case
is given by

    x_{k+1} = e^{−β(t_{k+1} − t_k)} x_k + w_k .
To evaluate the various terms in the discrete error covariance extrapolation equation

    P_{k+1} = Φ_k P_k Φ_k^T + Γ_k Q_k Γ_k^T ,                               (31)

we will use the results from the book that translate from the continuous time model to the
discrete time model. Recall that the continuous noise produces a discrete noise term
Γ_k Q_k Γ_k^T that is given by

    Γ_k Q_k Γ_k^T = ∫_{t_k}^{t_{k+1}} Φ(t_{k+1}, τ) G(τ) Q(τ) G(τ)^T Φ(t_{k+1}, τ)^T dτ .        (32)

For the continuous problem the fundamental solution is given by Φ(t, t0) = e^{−β(t−t0)} and
G = 1, so we can evaluate Equation 32, taking Q(t) = q a constant, as

    Γ_k Q_k Γ_k^T = ∫_{t_k}^{t_{k+1}} e^{−β(t_{k+1} − τ)} q e^{−β(t_{k+1} − τ)} dτ
                  = (q/(2β)) (1 − e^{−2β(t_{k+1} − t_k)}) ,

or the book's equation 3.8-20.
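
A small numerical sketch (with illustrative values of β, q, and the step Δt = t_{k+1} − t_k)
checking the discrete noise variance q(1 − e^{−2βΔt})/(2β) by direct quadrature of Equation 32
for this scalar model:

    import numpy as np

    beta, q, dt = 0.7, 2.0, 0.5
    tau = np.linspace(0.0, dt, 200_001)
    integrand = np.exp(-beta * (dt - tau)) * q * np.exp(-beta * (dt - tau))
    dtau = tau[1] - tau[0]
    integral = 0.5 * np.sum(integrand[:-1] + integrand[1:]) * dtau     # trapezoidal rule
    print(integral)
    print(q / (2 * beta) * (1 - np.exp(-2 * beta * dt)))               # closed form, eq. 3.8-20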

Notes on time series analysis

In the discussion on time series analysis given in the text the focus is on ARMA(p,q) models
for the output process zk given an input process rk . This means that we assume that
our output, zk , can be expressed as a sum of p values of its past realizations (termed the
autoregressive part) and q values of the innovative input process rk (called the moving average
part). Mathematically this is expressed as
    z_k = Σ_{i=1}^{p} b_i z_{k−i} + r_k − Σ_{i=1}^{q} c_i r_{k−i} ,         (33)
for some coefficients bi and ci . We can cast this formulation into a state-space representation
in several ways. The book recommends the following
 
rk−q
 rk−q+1 
 .. 
 . 
 
 rk−2 
 
 
 rk−1 
 
xk =  zk−p . (34)
 
 zk−p+1 
 .. 
 . 
 
 zk−2 
 
 zk−1 
z̃k (−)

The first block of x is the moving average MA(q) part, the second block of x is the AR(p) part
and the third block (the single element z̃(−)) is discussed below. This third element in the
book is written as zk (−) but with an ∞ symbol above it. Since we observe the system output
zk which is determined from the p previous values zk−i for i = 1, 2, · · · p and the observed
zero mean random q previous system inputs rk−i for i = 1, 2, · · · q the state representation
above uses those previously observed values. The last element z̃k (−) is the best estimate of
the prediction of zk given the information thus far. Since we have not observed rk at this
point our prediction is given by the sum of the terms we have observed
    z̃_k = Σ_{i=1}^{p} b_i z_{k−i} − Σ_{i=1}^{q} c_i r_{k−i} .               (35)

Note that from Equation 33 this is also equal to zk − rk . To derive the discrete time
propagation equation x_{k+1} = Φ_k x_k, we note that since

    x_{k+1} = [ r_{k−q+1}, r_{k−q+2}, ..., r_{k−1}, r_k, z_{k−p+1}, z_{k−p+2}, ..., z_{k−1}, z_k, z̃_{k+1}(−) ]^T ,

most of the variables in xk+1 are “shifted up” and can be directly found in xk . The ones
that are not are rk , zk , and z̃k+1 (−). The first, rk , we treat as a source of process noise. The
second, zk , we can obtain from z̃k (−) + rk the sum of a term in the state xk and the process
noise rk . The third we express as follows
noise r_k. The third we express as follows

    z̃_{k+1}(−) = b_1 z_k − c_1 r_k + Σ_{i=2}^{p} b_i z_{k+1−i} − Σ_{i=2}^{q} c_i r_{k+1−i}
               = b_1 (z_k − r_k) + b_1 r_k − c_1 r_k + Σ_{i=2}^{p} b_i z_{k+1−i} − Σ_{i=2}^{q} c_i r_{k+1−i}
               = b_1 z̃_k(−) + (b_1 − c_1) r_k + Σ_{i=2}^{p} b_i z_{k+1−i} − Σ_{i=2}^{q} c_i r_{k+1−i} .

Taken together, all of these considerations give the book's equation 3.9-16.

Problem Solutions

Problem 3-1 (proving the solution to the linear variance equation)

For this problem we want to show that P(t), given by

    P(t) = Φ(t, t0) P(t0) Φ(t, t0)^T + ∫_{t0}^{t} Φ(t, τ) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T dτ ,        (36)

is a solution to the linear variance equation. We can do this by first taking the derivative of
the given expression for P(t) with respect to t. We find

    dP/dt = (dΦ(t, t0)/dt) P(t0) Φ(t, t0)^T + Φ(t, t0) P(t0) (dΦ(t, t0)^T/dt)
            + Φ(t, t) G(t) Q(t) G(t)^T Φ(t, t)^T
            + ∫_{t0}^{t} (dΦ(t, τ)/dt) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T dτ
            + ∫_{t0}^{t} Φ(t, τ) G(τ) Q(τ) G(τ)^T (dΦ(t, τ)^T/dt) dτ .

Recall that the fundamental solution Φ(t, t0) satisfies dΦ(t, t0)/dt = F(t) Φ(t, t0) and that
Φ(t, t) = I, with I the identity matrix. With these expressions the right-hand side of dP/dt
then becomes

    dP/dt = F(t) Φ(t, t0) P(t0) Φ(t, t0)^T + Φ(t, t0) P(t0) Φ(t, t0)^T F(t)^T + G(t) Q(t) G(t)^T
            + ∫_{t0}^{t} F(t) Φ(t, τ) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T dτ
            + ∫_{t0}^{t} Φ(t, τ) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T F(t)^T dτ
          = F(t) [ Φ(t, t0) P(t0) Φ(t, t0)^T + ∫_{t0}^{t} Φ(t, τ) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T dτ ]
            + [ Φ(t, t0) P(t0) Φ(t, t0)^T + ∫_{t0}^{t} Φ(t, τ) G(τ) Q(τ) G(τ)^T Φ(t, τ)^T dτ ] F(t)^T
            + G(t) Q(t) G(t)^T                                                                    (37)
          = F(t) P(t) + P(t) F(t)^T + G(t) Q(t) G(t)^T ,

as a differential equation for P(t).


Problem 3-2 (the steady-state solution to the linear variance equation)

Consider the linear variance equation Ṗ(t) = F P + P F^T + Q; the solution P(t) to this
equation is given in Problem 3-1 above in Equation 36. Since our system is time-invariant we
have Φ(t, τ) = e^{F(t−τ)} and the expression for P(t) in this case becomes

    P(t) = e^{F(t−t0)} P(t0) e^{F^T(t−t0)} + ∫_{t0}^{t} e^{F(t−τ)} Q e^{F^T(t−τ)} dτ
         = e^{F(t−t0)} P(t0) e^{F^T(t−t0)} + ∫_{0}^{t−t0} e^{Fv} Q e^{F^T v} dv ,

where in the second line above we make the substitution v = t − τ in the integral. To make
our above expression for P solve the desired equation from the problem, F P + P F^T = −Q,
we consider the steady-state solution to the linear variance equation by taking t → ∞ in the
above expression. In that case P(t) is a constant, so Ṗ(t) = 0 and the linear variance
equation reduces to the desired equation F P + P F^T = −Q. If our initial state x(t0) has no
uncertainty (P(t0) = 0), or if our linear system is stable, we can assume that

    Φ(t, t0) P(t0) Φ^T(t, t0) → 0 ,

as t → ∞, and the expression for P(t) becomes

    P = lim_{t→+∞} P(t) = ∫_{0}^{∞} e^{Fv} Q e^{F^T v} dv ,

the desired expression.
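
A quick sketch (my own check, with an arbitrary stable F and positive semi-definite Q)
verifying the integral expression for P numerically against a library Lyapunov solver:

    import numpy as np
    from scipy.linalg import expm, solve_continuous_lyapunov

    F = np.array([[-1.0, 0.5], [0.0, -2.0]])       # a stable example system matrix
    Q = np.array([[1.0, 0.2], [0.2, 0.5]])

    vs = np.linspace(0.0, 30.0, 3001)              # the integrand decays, so truncation is fine
    terms = np.array([expm(F * v) @ Q @ expm(F.T * v) for v in vs])
    dv = vs[1] - vs[0]
    P_int = 0.5 * (terms[:-1] + terms[1:]).sum(axis=0) * dv    # trapezoidal quadrature

    P_lyap = solve_continuous_lyapunov(F, -Q)      # solves F P + P F^T = -Q directly
    print(np.max(np.abs(P_int - P_lyap)))                       # small
    print(np.max(np.abs(F @ P_int + P_int @ F.T + Q)))          # also small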

Problem 3-3 (analysis of an autocorrelation function)

Warning: I’m not entirely sure that I’ve worked this problem correctly since the answer
I propose seems too simple to structure a problem around. If anyone sees an error in my
solution or can offer verification that this is a correct result please email me.

To get an autocorrelation function of the functional form specified in φ̂(τ) we note that we
can view it as the sum of three parts: σ²α1², σ²α2² cos(ωτ), and σ²α3² e^{−β|τ|}. We next
consider what type of random process gives rise to each of these three autocorrelation
functional forms.

From figure 3.8-3 in the book, the system of a random constant, ẋ1 = 0, has a constant
autocorrelation function and we can take φ̂1(τ) = σ²α1².

From example 2.2-2 in the book the periodic signal

    x2(t) = A sin(ωt + θ) ,

with θ a uniform random variable with density fΘ(θ) = 1/(2π) on 0 ≤ θ ≤ 2π, has an
autocorrelation given by (A²/2) cos(ωτ). If we take A²/2 = σ²α2², or A = √2 σ α2, then the
signal x2(t) has an autocorrelation function φ̂2(τ) = σ²α2² cos(ωτ).
Finally, the system

    ẋ3 = −β x3 + w ,

where w(t) is a white noise signal with E[w(t)w(τ)] = σ²α3² δ(t − τ), has an autocorrelation
function given by φ̂3(τ) = σ²α3² e^{−β|τ|}.

Thus if we consider the total system

    ẋ1 = 0
    ẍ2 = −ω² x2
    ẋ3 = −β x3 + w(t) ,

with E[w²] = σ²α3², then x(t), defined as the sum of the three terms

    x(t) = x1(t) + x2(t) + x3(t) ,

will have the given autocorrelation function. Note that the differential equation for x2(t) is
of second order. From the above decomposition we see that x(t) is the sum of three parts: a
constant term, an oscillatory term, and an exponentially decaying term.

Problem 3-4 (a simple integrator)

Warning: For this problem I was unable to get the result quoted in the book and was unable
to find an error in my work or assumptions below. If anyone sees anything wrong with what
I have done please email me, I would be interested in determining what the problem is.
Perhaps it is a typo in the books expression for P (t)?

The diagram in Figure 3.1 gives the following system for the variables x1(t) and x2(t):

    ẋ1 = x2
    ẋ2 = −β x2 + w ,

with E[w(t)w(τ)] = σ² δ(t − τ). We have been able to write down the differential equation
for x2(t) from the given expression for its autocorrelation, φ_{x2 x2}(τ) = σ² e^{−β|τ|}, using
the discussion in the book on exponentially correlated random variables. If we introduce the
state vector x = (x1, x2)^T then from the above we have a linear system for x given by

    d/dt [ x1(t) ; x2(t) ] = [ 0  1 ; 0  −β ] [ x1 ; x2 ] + [ 0 ; w ] .

From this we see that our system matrix F is given by F = [ 0  1 ; 0  −β ]. With F defined
in this way, the linear variance equation given by Equation 30 for this problem becomes

    [ ṗ11  ṗ12 ; ṗ12  ṗ22 ] = [ 0  1 ; 0  −β ] [ p11  p12 ; p12  p22 ] + [ p11  p12 ; p12  p22 ] [ 0  0 ; 1  −β ] + [ 0  0 ; 0  σ² ]
                             = [ 2p12 ,  p22 − β p12 ; −β p12 + p22 ,  −2β p22 + σ² ] .
This gives the following system for p11 (t), p12 (t), and p22 (t)
ṗ11 = 2p12
ṗ12 = p22 − βp12
ṗ22 = −2βp22 + σ 2 .
Here we take initial conditions of p11 (0) = 0, p12 (0) = 0, and p22 (0) = σ 2 , meaning that
initially we have uncertainty only in the component x2 . Then we find a solution to p22 (t)
given by
    p22(t) = (σ²/(2β)) (1 + (2β − 1) e^{−2βt}) ,
which is not the same as the expression for p22(t) given in the book, which is simply σ². In
the Mathematica file chap 3 prob 4.nb this and the differential equations for p11(t) and
p12(t) are solved. I find it strange that the book's (2, 2) component of P(t) would be constant,
independent of t, while the other elements are not.

Note that since in this problem the system matrix F is time invariant, the fundamental
solution is given by Φ(t) = e^{Ft}. Since the matrix F in this case is [ 0  1 ; 0  −β ], we can
compute powers of F directly. We find

    F² = F F  = [ 0  −β ; 0  β² ]
    F³ = F F² = [ 0  β² ; 0  −β³ ]
    F⁴ = F F³ = [ 0  −β³ ; 0  β⁴ ]
    ...
    F^{2n}   = [ 0  −β^{2n−1} ; 0  β^{2n} ]
    F^{2n+1} = [ 0  β^{2n} ; 0  −β^{2n+1} ] .

Using these we find that

    Φ(t) = e^{Ft} = I + Ft + (1/2) F² t² + (1/6) F³ t³ + · · ·
         = I + Σ_{k=1}^{∞} (t^{2k}/(2k)!) [ 0  −β^{2k−1} ; 0  β^{2k} ] + Σ_{k=0}^{∞} (t^{2k+1}/(2k+1)!) [ 0  β^{2k} ; 0  −β^{2k+1} ]
         = I + [ 0  −(1/β)(cosh(βt) − 1) ; 0  cosh(βt) − 1 ] + [ 0  (1/β) sinh(βt) ; 0  −sinh(βt) ]
         = [ 1  (1/β)(1 − cosh(βt) + sinh(βt)) ; 0  cosh(βt) − sinh(βt) ]
         = [ 1  (1 − e^{−βt})/β ; 0  e^{−βt} ] .
Problem 3-5 (is this system observable)

To begin, we express the given diagram (figure 3-2) in terms of mathematical equations and
then study the observability of those equations. From the given diagram we see that the
gyro vertical deflection (ξ) error eξ has two terms, a bias term eξb and a random term eξr,
and can be expressed as the sum of these two:

    eξ = eξb + eξr .

Following the flow diagram from left to right we next see that the variable δv is given by

    δv = ∫ −g(eξ + δp) dt ,

and that δp in terms of δv is given by

    δp = ∫ (1/R) δv dt .
We expect that the gyro vertical deflection and position bias are driven by random initial
constants (which we don’t know) and thus have differential equations given by

ėξb = 0
ėpb = 0 .

Finally the velocity and position measurements zv and zp are related to the state variables
as

zv = δv + ev
zp = epb + ep + δp .
 
Thus if we take our state to be x^T = [ epb  δp  δv  eξb ] then our dynamical system in
companion form is given by

    d/dt [ epb ; δp ; δv ; eξb ] = [ 0 ; (1/R) δv ; −g eξb − g eξr − g δp ; 0 ]
                                 = [ 0 0 0 0 ; 0 0 1/R 0 ; 0 −g 0 −g ; 0 0 0 0 ] [ epb ; δp ; δv ; eξb ] + [ 0 ; 0 ; −g eξr ; 0 ] .    (38)

Part (a): If our measurement is zp, expressed in terms of the state vector x as

    zp = [ 1  1  0  0 ] x + ep ,

the measurement sensitivity matrix H in this case is [ 1  1  0  0 ]. Since our state vector is
four dimensional, the requirement that the state be observable is that the block matrix

    [ H^T   F^T H^T   (F^T)² H^T   (F^T)³ H^T ]                             (39)

have rank equal to four. When we compute the above matrix using the matrices F and H for
this problem we find it is given by

    [ 1    0      0       0     ]
    [ 1    0    −g/R      0     ]
    [ 0   1/R     0     −g/R²   ]
    [ 0    0    −g/R      0     ] .

This matrix has rank 3 and thus our system with only a position measurement is not
observable.

Part (b): If we have both position and velocity measurements then our measurement vector
z is given by

    z = [ zp ; zv ] = [ 1 1 0 0 ; 0 0 1 0 ] x + [ ep ; ev ] ,

so the measurement sensitivity matrix H in this case is [ 1 1 0 0 ; 0 0 1 0 ]. When we compute
the observability matrix in Equation 39 above we find that it is given by

    [ 1  0   0    0    0     0     0     0    ]
    [ 1  0   0   −g  −g/R    0     0    g²/R  ]
    [ 0  1  1/R   0    0   −g/R  −g/R²   0    ]
    [ 0  0   0   −g  −g/R    0     0    g²/R  ] .

For this system to be observable this matrix must have a rank of 4. Since the first and second
rows can be combined to yield the fourth row, it can have rank at most three. It in fact has
a rank of 3, indicating that even with two measurements the given state is still unobservable.

Part (c): For this part, if we are told that epb = 0, that is, the position measurement has no
bias, our state is now of dimension three, i.e. it has the representation x^T = [ δp  δv  eξb ],
and for observability of this state we need to consider the matrix

    [ H^T   F^T H^T   (F^T)² H^T ] .                                        (40)

To be observable this matrix must be of rank 3. This matrix is easy to compute since it is
the same observability matrix as in Part (b) above but without the last two columns, or

    [ 1  0   0    0    0     0   ]
    [ 1  0   0   −g  −g/R    0   ]
    [ 0  1  1/R   0    0   −g/R  ]
    [ 0  0   0   −g  −g/R    0   ] .

This latter matrix does have a rank of three and the resulting system is observable. To
prevent errors in the algebraic manipulations, the matrix multiplications required above are
performed in the Mathematica file chap 3 prob 5.nb.
Problem 3-6 (an approximate solution)

We see that the matrix F in this case is

    F = [ 0  1  0 ; 0  0  1 ; 0  0  −α ] .

From this matrix we can compute powers of F. We find

    F² = F F  = [ 0  0  1 ; 0  0  −α ; 0  0  α² ]
    F³ = F F² = [ 0  0  −α ; 0  0  α² ; 0  0  −α³ ]
    F⁴ = F F³ = [ 0  0  α² ; 0  0  −α³ ; 0  0  α⁴ ]
    ...
    Fⁿ = [ 0  0  (−1)^{n−2} α^{n−2} ; 0  0  (−1)^{n−1} α^{n−1} ; 0  0  (−1)^n αⁿ ]   for n ≥ 2 .

Recall that the fundamental solution Φ(t, t0) for a linear time invariant system is given by
Φ(t, t0) = e^{F(t−t0)}, which, when we use the definition of the matrix exponential to evaluate
this expression, gives

    Φ(t, t0) = Φ(t − t0) = e^{F(t−t0)} = I + F(t − t0) + (1/2) F²(t − t0)² + (1/6) F³(t − t0)³ + · · · .

Let us take T = t − t0 and sum the components of these matrices. We find that

    Φ(T) = [ 1   T   T²/2 − αT³/6 + · · ·
             0   1   T − αT²/2 + α²T³/6 + · · ·
             0   0   1 − αT + α²T²/2 − α³T³/6 + · · · ] .

Note that we could explicitly evaluate each of these sums in terms of the exponential function
if needed. For example, the (1, 3) element of Φ(T) above can be written as

    T²/2 − αT³/6 + · · · = (e^{−αT} − 1 + αT)/α² .

If we take only the most significant term in each sum above we find that Φ(T) is
approximately equal to

    Φ(T) ≈ [ 1  T  T²/2 ; 0  1  T ; 0  0  1 ] ,
as we were to show.
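
A quick Python sketch (my own check, with illustrative values of α and T) comparing the
exact Φ(T) = e^{FT} against the approximation above:

    import numpy as np
    from scipy.linalg import expm

    alpha, T = 0.5, 0.1
    F = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, -alpha]])

    Phi_exact = expm(F * T)
    Phi_approx = np.array([[1.0, T, T**2 / 2],
                           [0.0, 1.0, T],
                           [0.0, 0.0, 1.0]])
    print(np.max(np.abs(Phi_exact - Phi_approx)))   # small for small alpha*T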

Problem 3-7 (deriving the controllability criterion)

The discrete system given in this problem is

xk+1 = Φxk + λuk ,

where λ is a constant vector. The book's discussion on controllability, when specialized to
this system, gives exactly the requirement stated. That is, the matrix Θ given by

    Θ = [ λ   Φλ   Φ²λ   · · ·   Φ^{n−2}λ   Φ^{n−1}λ ] ,                    (41)

must have rank n for this system to be controllable. As a direct way to obtain this result,
we recall that the definition of controllability is that given an arbitrary initial state x0 we can
specify a set of controls ui such that the state xn after n stages takes any desired value. To
build up an intuition for Equation 41, note that on the first stage, after one control u0 has
been specified, we arrive at the state x1 via

x1 = Φx0 + λu0 .

On the second stage after the two controls (u0 and u1 ) have been specified we have the state
x2 via
x2 = Φx1 + λu1 = Φ(Φx0 + λu0 ) + λu1 = Φ2 x0 + Φλu0 + λu1 .
In the same way, on the third stage after three controls u0 , u1 , and u2 we have the state x3
via
x3 = Φ3 x0 + Φ2 λu0 + Φλu1 + λu2 .
Generalizing the above, at the nth stage we have used n controls and have the state xn in
terms of these controls given by

xn = Φn x0 + Φn−1 λu0 + Φn−2 λu1 + · · · + Φ2 λun−3 + Φλun−2 + λun−1 .

We can write the above equation as a vector equation:

    xn = [ λ   Φλ   Φ²λ   · · ·   Φ^{n−2}λ   Φ^{n−1}λ ] [ u_{n−1}, u_{n−2}, u_{n−3}, ..., u_1, u_0 ]^T + Φⁿ x0 .
From the above we see that if the matrix

    Θ = [ λ   Φλ   Φ²λ   · · ·   Φ^{n−2}λ   Φ^{n−1}λ ]

is invertible, then we can specify the n control values ui to get any state xn, and vice versa.
Another way of stating this result is the following. Given an arbitrary initial state x0 and a
target state xn, we can compute a vector of controls u = [ u0  u1  u2  · · ·  u_{n−2}  u_{n−1} ]^T
such that we arrive at the target state xn in n steps by solving the system

    [ λ   Φλ   Φ²λ   · · ·   Φ^{n−2}λ   Φ^{n−1}λ ] [ u_{n−1}, u_{n−2}, u_{n−3}, ..., u_1, u_0 ]^T = xn − Φⁿ x0

for the vector u. This requires the invertibility of Θ, or equivalently that Θ must have rank
n, which is what we wanted to show.

Problem 3-8 (the discrete state transition matrix)

For this problem I have denoted the value of the signal on the first "loopback" line x2(t)
(since this signal gets multiplied by 1/T2) and the value of the signal on the second "loopback"
line x1(t) (since this signal gets multiplied by 1/T1). Under that convention, the differential
equations for the system given by figure 3-3 are

    ẋ1(t) = −(1/T1) x1(t) + (1/T1) x2(t)
    ẋ2(t) = −(1/T2) x2(t) + (1/T2) w(t) .

If our system state is (x1(t), x2(t))^T then the system above can be written in terms of
matrices as

    d/dt [ x1(t) ; x2(t) ] = [ −1/T1   1/T1 ; 0   −1/T2 ] [ x1(t) ; x2(t) ] + [ 0 ; (1/T2) w(t) ] .

From this expression we see that the F matrix for this problem is given by

    F = [ −1/T1   1/T1 ; 0   −1/T2 ] .

Since this is independent of time the fundamental solution is Φ(t, t0) = e^{F(t−t0)} and thus to determine Φ(t, t0) we need to evaluate e^{F(t−t0)}. Since this problem is time invariant, without loss of generality we can take t0 = 0. To compute Φ(t) = e^{Ft} we will solve two initial value problems. The first will have initial conditions given by

$$\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$

and the second will have initial conditions given by

$$\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

The solution to the first initial value problem becomes the first column of the matrix e^{Ft} and the solution to the second initial value problem becomes the second column of e^{Ft}. When we do this we find that

$$e^{Ft} = \begin{bmatrix} e^{-t/T_1} & \frac{T_2}{T_1 - T_2}\left(e^{-t/T_1} - e^{-t/T_2}\right) \\ 0 & e^{-t/T_2} \end{bmatrix}.$$

From the above expression we see that Φ(∆t) = e^{F∆t} is the same as the expression we are asked to derive in the book. In the Mathematica file chap 3 prob 8.nb some of the algebra for this problem is done.
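A small numerical check (my addition; the values of T1, T2 and t are arbitrary) that the closed form above matches the matrix exponential:

    import numpy as np
    from scipy.linalg import expm

    T1, T2, t = 2.0, 0.5, 1.3  # assumed illustrative constants
    F = np.array([[-1.0 / T1, 1.0 / T1],
                  [0.0, -1.0 / T2]])

    Phi_numeric = expm(F * t)
    Phi_closed = np.array([
        [np.exp(-t / T1), T2 / (T1 - T2) * (np.exp(-t / T1) - np.exp(-t / T2))],
        [0.0, np.exp(-t / T2)]])

    print(np.allclose(Phi_numeric, Phi_closed))  # expect True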

Problem 3-9 (a cascading additive noise integration system)

From figure 3-4 in the book we see that as a system of differential equations we obtain
ẋn = xn−1 (t) + wn (t)
ẋn−1 = xn−2 (t) + wn−1 (t)
ẋn−2 = xn−3 (t) + wn−2 (t)
..
.
ẋ3 = x2 (t) + w3 (t)
ẋ2 = x1 (t) + w2 (t)
ẋ1 = w1 (t) ,
with each white noise term wi(t) having a spectral density given by qi δ(t). If we define the system state vector as

$$x(t) = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-2} \\ x_{n-1} \\ x_n \end{bmatrix},$$

then our system above in matrix notation is given
by

$$\frac{d}{dt}x(t) = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 & 0 \end{bmatrix} x(t) + \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \vdots \\ w_{n-1} \\ w_n \end{bmatrix}.$$
Thus the system matrix F in this case is the zero matrix with ones on the first sub-diagonal.
With the above F the linear variance equation Ṗ = F P + P F T + Q has a somewhat special
form. The product F P is a block row matrix composed of an initial row of zeros followed
by the first n − 1 rows of P . The product P F T is a block column matrix with the first block
a column of zeros and the second block the first n − 1 columns of the matrix P . With these
observations when we write out the linear variance equation for this problem with Ṗ (t) given
by

$$\dot{P}(t) = \begin{bmatrix} \dot{p}_{11} & \dot{p}_{12} & \dot{p}_{13} & \cdots & \dot{p}_{1n} \\ \dot{p}_{21} & \dot{p}_{22} & \dot{p}_{23} & \cdots & \dot{p}_{2n} \\ \dot{p}_{31} & \dot{p}_{32} & \dot{p}_{33} & \cdots & \dot{p}_{3n} \\ \vdots & & & & \vdots \\ \dot{p}_{n1} & \dot{p}_{n2} & \dot{p}_{n3} & \cdots & \dot{p}_{nn} \end{bmatrix},$$

we get the following system

$$\dot{P}(t) = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ p_{11} & p_{12} & p_{13} & \cdots & p_{1n} \\ p_{21} & p_{22} & p_{23} & \cdots & p_{2n} \\ \vdots & & & & \vdots \\ p_{n-1,1} & p_{n-1,2} & p_{n-1,3} & \cdots & p_{n-1,n} \end{bmatrix}
+ \begin{bmatrix} 0 & p_{11} & p_{12} & \cdots & p_{1,n-1} \\ 0 & p_{21} & p_{22} & \cdots & p_{2,n-1} \\ 0 & p_{31} & p_{32} & \cdots & p_{3,n-1} \\ \vdots & & & & \vdots \\ 0 & p_{n1} & p_{n2} & \cdots & p_{n,n-1} \end{bmatrix}
+ \begin{bmatrix} q_1 & 0 & 0 & \cdots & 0 \\ 0 & q_2 & 0 & \cdots & 0 \\ 0 & 0 & q_3 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & q_n \end{bmatrix}$$

$$= \begin{bmatrix} q_1 & p_{11} & p_{12} & \cdots & p_{1,n-1} \\ p_{11} & p_{12}+p_{21}+q_2 & p_{13}+p_{22} & \cdots & p_{1n}+p_{2,n-1} \\ p_{21} & p_{22}+p_{31} & p_{23}+p_{32}+q_3 & \cdots & p_{2n}+p_{3,n-1} \\ \vdots & & & & \vdots \\ p_{n-1,1} & p_{n-1,2}+p_{n1} & p_{n-1,3}+p_{n2} & \cdots & p_{n-1,n}+p_{n,n-1}+q_n \end{bmatrix}.$$

Looking at the above expressions we see that in component form we have that the (i, j)th
component of the product F P is
(F P )ij = pi−1,j (t) .
for i ≥ 2 and that the (i, j)th component of the product P F T is given by

(P F T )ij = pi,j−1(t) ,
for j ≥ 2. The differential equation for the function pij (t) is thus given by
ṗij (t) = pi−1,j + pi,j−1 + qi δij , (42)
for 2 ≤ i ≤ n and 2 ≤ j ≤ n.

Can we solve for pii (t)? This equation would be


ṗii = pi−1,i + pi,i−1 + qi ,
since P is a symmetric matrix pi−1,i = pi,i−1 so the above differential equation becomes

ṗii = 2pi−1,i + qi for 2 ≤ i ≤ n ,

Thus we need to compute pi−1,i (t) to evaluate pii (t).

If we look at the first row of these equations we have for p11(t) the following

$$\dot{p}_{11} = q_1 \quad\Rightarrow\quad p_{11}(t) = q_1 t.$$

The equation for p12 is given by

$$\dot{p}_{12} = p_{11} = q_1 t \quad\Rightarrow\quad p_{12} = \frac{q_1 t^2}{2}.$$

The equation for p13 next gives

$$\dot{p}_{13} = p_{12} = \frac{q_1 t^2}{2} \quad\Rightarrow\quad p_{13} = \frac{q_1 t^3}{6}.$$

In general for the first row we have

$$p_{1j} = \frac{q_1 t^j}{j!} \quad\text{for } 1 \le j \le n. \qquad (43)$$

When we recall that p12(t) = p21(t) by the symmetry of P(t) the equation for p22(t) is

$$\dot{p}_{22} = 2p_{12} + q_2 = q_1 t^2 + q_2.$$

When we integrate this gives

$$p_{22}(t) = \frac{q_1 t^3}{3} + q_2 t.$$
Thus we have been able to verify the first two expectations given. This would be a good starting point for an inductive proof of the general case pnn = E[xn(t)²]. Instead of providing an inductive proof, we note that it is shown in [7] that the fundamental solution Φ(t) of the given system can be written as
$$\Phi(t) = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
t & 1 & 0 & 0 & \cdots & 0 \\
\frac{1}{2}t^2 & t & 1 & 0 & \cdots & 0 \\
\frac{1}{3!}t^3 & \frac{1}{2}t^2 & t & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{t^{n-1}}{(n-1)!} & \frac{t^{n-2}}{(n-2)!} & \frac{t^{n-3}}{(n-3)!} & \frac{t^{n-4}}{(n-4)!} & \cdots & 1
\end{bmatrix}.$$

We will now use this expression in Equation 36 to derive an expression for pii(t). In this problem we have P(t0) = 0, G(t) = I, and Q(t) = Q where Q is a diagonal matrix. Then we have

$$P(t) = \int_{t_0}^{t} \Phi(t-\tau)\,Q\,\Phi^T(t-\tau)\, d\tau = \int_{0}^{t-t_0} \Phi(\tau)\,Q\,\Phi(\tau)^T\, d\tau.$$
Since Q is diagonal the product Φ(t)Q is easy to compute since it is a scalar multiplier of each column of Φ(t). That is we have

$$\Phi(t)Q = \begin{bmatrix}
q_1 & 0 & 0 & 0 & \cdots & 0 \\
q_1 t & q_2 & 0 & 0 & \cdots & 0 \\
\frac{q_1}{2}t^2 & q_2 t & q_3 & 0 & \cdots & 0 \\
\frac{q_1}{3!}t^3 & \frac{q_2}{2}t^2 & q_3 t & q_4 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{q_1}{(n-1)!}t^{n-1} & \frac{q_2}{(n-2)!}t^{n-2} & \frac{q_3}{(n-3)!}t^{n-3} & \frac{q_4}{(n-4)!}t^{n-4} & \cdots & q_n
\end{bmatrix}.$$

From this we see that the elements of the nth row of Φ(t)Q are given by

$$\frac{q_1}{(n-1)!}t^{n-1},\; \frac{q_2}{(n-2)!}t^{n-2},\; \cdots,\; \frac{q_{n-2}}{2!}t^2,\; q_{n-1}t,\; q_n.$$

The nth column of Φ(t)^T is given by the nth row of Φ(t) and has elements given by

$$\frac{1}{(n-1)!}t^{n-1},\; \frac{1}{(n-2)!}t^{n-2},\; \cdots,\; \frac{1}{2!}t^2,\; t,\; 1.$$

When we take the dot product of these two vectors we see that the (n, n)th component of P(t), taking t0 = 0, is given by

$$p_{nn}(t) = E[x_n(t)^2] = \int_0^t \sum_{i=1}^{n} \frac{q_{n+1-i}}{(i-1)!^2}\,\tau^{2i-2}\, d\tau = \sum_{i=1}^{n} \frac{q_{n+1-i}\, t^{2i-1}}{(i-1)!^2\,(2i-1)},$$

as we were to show.
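To double check this formula we can integrate the linear variance equation numerically and compare against the closed form; the sketch below (my own addition, with arbitrary qi and n = 3) uses a simple Euler step:

    import numpy as np
    from math import factorial

    n, q = 3, np.array([0.3, 0.7, 1.1])  # assumed spectral densities q1..qn
    F = np.diag(np.ones(n - 1), k=-1)    # ones on the first sub-diagonal
    Q = np.diag(q)

    t_final, dt = 2.0, 1e-4
    P = np.zeros((n, n))
    for _ in range(int(t_final / dt)):   # Euler integration of Pdot = F P + P F^T + Q
        P = P + dt * (F @ P + P @ F.T + Q)

    # Closed form: p_nn(t) = sum_i q_{n+1-i} t^{2i-1} / ((i-1)!^2 (2i-1))
    t = t_final
    p_nn = sum(q[n - i] * t**(2 * i - 1) / (factorial(i - 1)**2 * (2 * i - 1))
               for i in range(1, n + 1))
    print(P[n - 1, n - 1], p_nn)         # the two values should nearly agree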

Problem 3-10 (steady-state error for a given system)

The system associated with the given diagram figure 3-5 is

$$\dot{x}_1 = x_2$$
$$\dot{x}_2 = -\omega^2 x_1 - 2\xi\omega x_2 + w,$$

where w(t) is a white noise input with spectral density qδ(t). If we define the state of this system to be $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ then our system in terms of x becomes

$$\frac{d}{dt}x(t) = \begin{bmatrix} 0 & 1 \\ -\omega^2 & -2\xi\omega \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ w(t) \end{bmatrix}.$$

Thus we see that our system F matrix is given by $F = \begin{bmatrix} 0 & 1 \\ -\omega^2 & -2\xi\omega \end{bmatrix}$. With this the linear variance Equation 30 becomes

$$\begin{bmatrix} \dot{p}_{11} & \dot{p}_{12} \\ \dot{p}_{12} & \dot{p}_{22} \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ -\omega^2 & -2\xi\omega \end{bmatrix}\begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix}
+ \begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix}\begin{bmatrix} 0 & -\omega^2 \\ 1 & -2\xi\omega \end{bmatrix}
+ \begin{bmatrix} 0 & 0 \\ 0 & q \end{bmatrix}$$
$$= \begin{bmatrix} 2p_{12} & p_{22} - \omega^2 p_{11} - 2\xi\omega p_{12} \\ p_{22} - \omega^2 p_{11} - 2\xi\omega p_{12} & -2\omega^2 p_{12} - 4\xi\omega p_{22} + q \end{bmatrix}.$$
As a system for the functions pij(t) this is given by

$$\dot{p}_{11} = 2p_{12}$$
$$\dot{p}_{12} = -\omega^2 p_{11} - 2\xi\omega p_{12} + p_{22}$$
$$\dot{p}_{22} = -2\omega^2 p_{12} - 4\xi\omega p_{22} + q.$$

In steady-state all time derivatives above are zero. In this case we see that p12(t) = 0, and the other functions must satisfy

$$0 = -\omega^2 p_{11} + p_{22}$$
$$0 = -4\xi\omega p_{22} + q.$$

When we solve for p11 and p22 using the above system we find

$$p_{22} = \frac{q}{4\xi\omega} \quad\text{and}\quad p_{11} = \frac{q}{4\xi\omega^3},$$
as we were to show.
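Note that the steady-state equations above are just the continuous Lyapunov equation F P + P F^T + Q = 0, so the result can be checked numerically; the sketch below is my own addition with arbitrary ω, ξ and q:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    omega, xi, q = 2.0, 0.3, 0.5  # assumed illustrative constants
    F = np.array([[0.0, 1.0],
                  [-omega**2, -2.0 * xi * omega]])
    Q = np.array([[0.0, 0.0],
                  [0.0, q]])

    # Solve F P + P F^T = -Q for the steady-state covariance P.
    P = solve_continuous_lyapunov(F, -Q)

    print(P[0, 0], q / (4 * xi * omega**3))  # these should match
    print(P[1, 1], q / (4 * xi * omega))     # these should match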

Problem 3-11 (the optimal first-order system)

Note: I think there is an error in this problem. The error has to do with the additive noise function n(t). The book states that the autocorrelation of n(t) is proportional to a delta function, specifically φnn(τ) = N δ(τ). I think what they meant to say was that E[n(t)n(τ)] = N²δ(t − τ) (note the square on N). In this latter case I can show the stated claim: that k = 1.0 when β = σ² = 1.0 and N = 1/2.

From the given diagram in figure 3-6 for the unknowns c(t) and r(t) we find the following system of differential equations

$$\dot{c}(t) = kr(t) - kc(t) - kn(t)$$
$$\dot{r}(t) = -\beta r(t) + w(t).$$

Note in deriving the given differential equation for r(t) we have used the discussion on exponentially correlated random variables, since we are told its autocorrelation function is φrr(τ) = σ²e^{−β|τ|}. In matrix form we find this system is given by

$$\frac{d}{dt}\begin{bmatrix} c(t) \\ r(t) \end{bmatrix}
= \begin{bmatrix} -k & k \\ 0 & -\beta \end{bmatrix}\begin{bmatrix} c(t) \\ r(t) \end{bmatrix}
+ \begin{bmatrix} -k & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} n \\ w \end{bmatrix}.$$

From this expression we see that our system matrix F is given by $F = \begin{bmatrix} -k & k \\ 0 & -\beta \end{bmatrix}$ and using the linear variance equation 30 we have

$$\begin{bmatrix} \dot{p}_{11} & \dot{p}_{12} \\ \dot{p}_{12} & \dot{p}_{22} \end{bmatrix}
= \begin{bmatrix} -k & k \\ 0 & -\beta \end{bmatrix}\begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix}
+ \begin{bmatrix} p_{11} & p_{12} \\ p_{12} & p_{22} \end{bmatrix}\begin{bmatrix} -k & 0 \\ k & -\beta \end{bmatrix}
+ \begin{bmatrix} -k & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} N^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}\begin{bmatrix} -k & 0 \\ 0 & 1 \end{bmatrix}$$
$$= \begin{bmatrix} -2kp_{11} + 2kp_{12} + k^2 N^2 & -(k+\beta)p_{12} + kp_{22} \\ -(k+\beta)p_{12} + kp_{22} & -2\beta p_{22} + \sigma^2 \end{bmatrix}.$$
If we next restrict to the steady-state version of this, we take all time derivatives equal to zero and solve for the pij to find

$$p_{22} = \frac{\sigma^2}{2\beta}, \quad p_{12} = \frac{k\sigma^2}{2\beta(k+\beta)}, \quad\text{and}\quad p_{11} = \frac{k\sigma^2}{2\beta(k+\beta)} + \frac{kN^2}{2}.$$

With these expressions as the steady-state values for a matrix PSS we can compute the value of the error variance, where our error function e(t) is defined as e(t) = c(t) − r(t). Writing this error e(t) as the vector inner product

$$e(t) = \begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} c(t) \\ r(t) \end{bmatrix},$$

the variance of e(t) as a function of k can be computed using the matrix PSS as

$$\sigma_e^2(k) = \begin{bmatrix} 1 & -1 \end{bmatrix} P_{SS} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 1 & -1 \end{bmatrix}
\begin{bmatrix} \frac{k\sigma^2}{2\beta(k+\beta)} + \frac{kN^2}{2} & \frac{k\sigma^2}{2\beta(k+\beta)} \\ \frac{k\sigma^2}{2\beta(k+\beta)} & \frac{\sigma^2}{2\beta} \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \frac{kN^2}{2} - \frac{k\sigma^2}{2\beta(k+\beta)} + \frac{\sigma^2}{2\beta}.$$

Since the above expression is a function of k, to pick k such that this expression is a minimum we take the derivative with respect to k, set the resulting expression equal to zero, and solve for k. When we do this we find that k is given by

$$k = -\beta \pm \frac{\sigma}{\sqrt{N^2}}. \qquad (44)$$

If we take β = σ² = 1.0 and N = 1/2 then from the above we see that k is given by

$$k = -1 \pm 2 = \begin{cases} -3 \\ +1 \end{cases}.$$

When we put the value of k = −3 into the second derivative of σe²(k) we see that the value of the second derivative is −1/8, which is negative, indicating that this value of k gives a maximum of σe²(k). When we put the value of k = +1 into the second derivative of σe²(k) we get a value of 1/8, which is positive, indicating that this value of k is a minimum, as we were asked to show. In the Mathematica file chap 3 prob 11.nb some of the algebra for this problem is done.
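A quick numeric check of the claimed optimum (my own addition; it simply evaluates σe²(k) on a grid of k ≥ 0 for β = σ² = 1 and N = 1/2):

    import numpy as np

    beta, sigma2, N = 1.0, 1.0, 0.5

    def err_var(k):
        # sigma_e^2(k) = k N^2 / 2 - k sigma^2 / (2 beta (k + beta)) + sigma^2 / (2 beta)
        return k * N**2 / 2 - k * sigma2 / (2 * beta * (k + beta)) + sigma2 / (2 * beta)

    k_grid = np.linspace(0.0, 5.0, 50001)
    k_best = k_grid[np.argmin(err_var(k_grid))]
    print(k_best)  # should be very close to 1.0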
Chapter 4 (Optimal Linear Filtering)

Notes on the text

Recursive filters: estimating a scalar x

Here we explain how to evaluate the books equation 4.0-3 if we have k measurements zi of the same quantity x. As k scalar equations we have zi = x + vi for i = 1, 2, · · · , k. This same situation can be viewed as a vector of measurements z by introducing the measurement sensitivity matrix H for this problem as

$$z = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \\ 1 \end{bmatrix} x + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{k-1} \\ v_k \end{bmatrix}.$$

Thus the matrix H in this case is in fact a column vector. The least squares estimate of x
given z is given by equation 4.0-3 or

x̂ = (H T H)−1 H T z .
For the H given above we have $H^T H = k$ and $H^T z = \sum_{i=1}^{k} z_i$ so that our least squares estimate x̂ is given by

$$\hat{x} = \frac{1}{k}\sum_{i=1}^{k} z_i,$$

which is the books equation 4.1-1.

State estimators in Linear Form: the discrete Kalman filter

For this chapter we will consider a certain specific form for the estimator of the unknown state x at the k-th time step after the k-th measurement zk has been observed. We denote this
estimate of x as x̂k (+), and the previous estimate of the state x before the measurement as
x̂k (−). With this notation in this section we want to study estimators that linearly combine
these two pieces of information in the following form

x̂k (+) = Kk′ x̂k (−) + Kk zk . (45)

We have yet to determine the optimal choice for the yet undetermined coefficients Kk′ and
Kk . Since our kth measurement zk in terms of the true state xk and measurement noise vk
is given by
zk = Hk xk + vk , (46)
the above expression for x̂k (+) can be written as

x̂k (+) = Kk′ x̂k (−) + Kk Hk xk − Kk vk .

Thus we have replaced the measurement zk with an expression in terms of the state xk . To replace the value of x̂k (−) with something in terms of the state xk we introduce the error in our a priori estimate x̂k (−) as x̃k (−) defined as

x̃k (−) = x̂k (−) − xk . (47)

Using this we get for x̂k (+)

x̂k (+) = Kk′ (xk + x̃k (−)) + Kk Hk xk + Kk vk


= [Kk′ + Kk Hk ]xk + Kk′ x̃k (−) + Kk vk (48)

Introducing the a posteriori error x̃k (+) = x̂k (+) − xk and subtracting xk from both sides of Equation 48 gives the following

x̃k (+) = [Kk′ + Kk Hk − I]xk + Kk′ x̃k (−) + Kk vk , (49)

which is the books equation 4.2-2. If we assume that the a priori estimate x̂k (−) is unbiased
meaning that E[x̂k (−)] = xk or equivalently E[x̃k (−)] = 0 then to have our a posteriori
estimate, x̂k (+), also be unbiased requires that we take

Kk′ = I − Kk Hk , (50)

which is the books equation 4.2-3. Using this expression in Equation 45 gives

x̂k (+) = (I − Kk Hk )x̂k (−) + Kk zk
        = x̂k (−) + Kk (zk − Hk x̂k (−)) . (51)

In addition, using this expression with Equation 48 gives

x̃k (+) = [I − Kk Hk ]x̃k (−) + Kk vk , (52)

which is the books equation 4.2-6.

We will now determine Kk by minimizing an appropriate measure of the error in our new
estimate x̂k (+). If we define the value of Pk (−) to be the prior covariance Pk (−) ≡
E[x̃k (−)x̃k (−)T ] and a posterior covariance error Pk (+) defined in a similar manner namely

Pk (+) = E[x̃k (+)x̃k (+)T ] ,

then with the value of Kk′ given above by Kk′ = I − Kk Hk we can use Equation 52 to derive our posterior error covariance as

Pk (+) = E[x̃k (+)x̃k (+)T ]


= E[((I − Kk Hk )x̃k (−) + Kk vk )(x̃Tk (−)(I − Kk Hk )T + vkT KkT )] .

By expanding the terms on the right hand side of this expression and remembering that
E[vk x̃Tk (−)] = 0 gives

Pk (+) = (I − Kk Hk )Pk (−)(I − Kk Hk )T + Kk Rk KkT , (53)


or the so called Joseph form of the covariance update equation and is also the books
equation 4.2-12. We now introduce the post measurement quadratic objective function
Jk = trace[Pk (+)], for which we want to select the matrix Kk such that Jk is a minimum.
Then to find the value of Kk that minimizes this expression we take the derivative of our
objective function with respect to our unknown matrix Kk , set the result equal to zero, and
solve for Kk . To do this we will expand the quadratic in Equation 53, the Joseph form of the covariance update equation, to write Pk (+) as

Pk (+) = Pk (−) − Kk Hk Pk (−) − Pk (−)HkT KkT + Kk Hk Pk (−)HkT KkT + Kk Rk KkT . (54)

To evaluate the trace of Pk (+) we will use the quadratic outer product trace derivative

$$\frac{\partial}{\partial A}\mathrm{trace}[ABA^T] = 2AB, \qquad (55)$$

and the sandwich product trace derivative identity

$$\frac{\partial}{\partial A}\mathrm{trace}[BAC] = B^T C^T. \qquad (56)$$
Then, to use these two identities, we rotate Kk to be in the middle of the matrix products² so that we can use the sandwich product trace derivative. We then have that trace[Pk (+)] is given by

trace[Pk (+)] = trace[Pk (−)] − trace[Pk (−)Kk Hk ] − trace[Pk (−)Kk Hk ]


+ trace[Kk Hk Pk (−)HkT KkT ] + trace[Kk Rk KkT ] .

With this our derivative is given by

$$\frac{\partial\,\mathrm{trace}[P_k(+)]}{\partial K_k} = -2P_k(-)H_k^T + 2K_k H_k P_k(-)H_k^T + 2K_k R_k
= -2P_k(-)H_k^T + K_k\left(2H_k P_k(-)H_k^T + 2R_k\right). \qquad (57)$$

Setting this expression equal to zero and solving for Kk we get

Kk = Pk (−)HkT (Hk Pk (−)HkT + Rk )−1 . (58)

which is the books equation 4.2-15.

Now that we have an expression for Kk , alternative forms for the error covariance update Pk (+) can be obtained through algebraic manipulations. Because every matrix depends explicitly on k, in the following derivations we can drop the k subscript index from the given P (±), H and R matrices. The subscript will be added again to the equations that are the most significant. To derive an alternative form for P (+) we expand the product on the right-hand-side of Equation 53 to get

P (+) = P (−) − KHP (−) − P (−)H T K T + KHP (−)H T K T + KRK T .


² We are using two additional facts about traces here, namely cyclic permutability trace(ABC) = trace(BCA), and transpose invariance trace(A) = trace(AT ).
Putting in K = P (−)H T M −1 with M = HP (−)H T + R we find

P (+) = P (−) − P (−)H T M −1 HP (−) − P (−)H T M −1 HP (−)


+ P (−)H T M −1 HP (−)H T M −1 HP (−) + P (−)H T M −1 RM −1 HP (−)
= P (−) − 2P (−)H T M −1 HP (−) + P (−)H T M −1 [HP (−)H T + R]M −1 HP (−)
= P (−) − 2P (−)H T M −1 HP (−) + P (−)H T M −1 HP (−)
= P (−) − P (−)H T M −1 HP (−)
= Pk (−) − Pk (−)HkT [Hk Pk (−)HkT + Rk ]−1 Hk Pk (−) , (59)

which is the books equation 4.2-16 a. When we recall the expression we found for Kk in
Equation 58 or K = P (−)H T [HP (−)H T + R]−1 by using the last line above we get

Pk (+) = Pk (−) − Kk Hk Pk (−)


= (I − Kk Hk )Pk (−) , (60)

which is the books equation 4.2-16 b. This later form is most often used in computation.

Some simpler forms for Pk (+)−1 and Kk

To begin this subsection we want to show that inverses of the state covariance matrices are
“easy” to update after obtaining a measurement zk . Namely we want to show that

Pk (+)−1 = Pk (−)−1 + HkT Rk−1 Hk , (61)

is true. To do this consider the product Pk (+)Pk (+)−1 , where Pk (+) is given by Equation 60
and Kk is given by Equation 58. Dropping the subscripts k to ease algebraic manipulation
we find

P (+)P (+)−1 = (P (−) − KHP (−))(P (−)−1 + H T R−1 H)


= I + P (−)H T R−1 H − KH − KHP (−)H T R−1 H
= I + P (−)H T R−1 H − P (−)H T [HP (−)H T + R]−1 H
− P (−)H T [HP (−)H T + R]−1 [HP (−)H T + R − R]R−1 H
= I + P (−)H T R−1 H − P (−)H T [HP (−)H T + R]−1 H
− P (−)H T (I − [HP (−)H T + R]−1 R)R−1 H
= I + P (−)H T R−1 H − P (−)H T [HP (−)H T + R]−1 H
− P (−)H T R−1 H + P (−)H T [HP (−)H T + R]−1 H
= I,

as we were to show.

To derive another form for Kk we can introduce the product Pk (+)−1 Pk (+) = I into the
expression for Kk provided in Equation 58 as
Kk = P (−)H T [HP (−)H T + R]−1
= [P (+)P (+)−1]P (−)H T [HP (−)H T + R]−1
= P (+)[P (−)−1 + H T R−1 H]P (−)H T [HP (−)H T + R]−1
= P (+)[H T + H T R−1 HP (−)H T ][HP (−)H T + R]−1
= P (+)H T [I + R−1 HP (−)H T ][HP (−)H T + R]−1
= P (+)H T R−1 [R + HP (−)H T ][HP (−)H T + R]−1
= Pk (+)HkT Rk−1 , (62)
which is the books equation 4.2-20.
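As a concrete illustration of the update equations just derived (my own sketch, not from the book; all of the numbers below are arbitrary), the following performs one discrete measurement update using Equation 58 for the gain, and verifies that the Joseph form and the short form of the covariance update agree when the optimal gain is used:

    import numpy as np

    # Assumed prior state estimate, prior covariance, measurement model and noise.
    x_prior = np.array([0.0, 1.0])
    P_prior = np.array([[2.0, 0.3],
                        [0.3, 1.0]])
    H = np.array([[1.0, 0.0]])
    R = np.array([[0.5]])
    z = np.array([0.7])

    # Kalman gain: K = P(-) H^T (H P(-) H^T + R)^{-1}
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)

    x_post = x_prior + K @ (z - H @ x_prior)

    I = np.eye(2)
    P_short = (I - K @ H) @ P_prior                                  # equation 4.2-16b
    P_joseph = (I - K @ H) @ P_prior @ (I - K @ H).T + K @ R @ K.T   # Joseph form

    print(x_post)
    print(np.allclose(P_short, P_joseph))  # True for the optimal gain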

Kalman filtering the constant dynamics xk+1 = xk with measurements zk = xk + vk

In this subsection we present the algebra and further discussion on the Kalman filtering
examples presented in the book. We begin with the estimation of a constant x from a
series of uncorrelated corrupted noisy measurements. For this example, because there is no
dynamics the variance propagation equation is simply pk (−) = pk−1 (+) and with Hk = 1 the
error covariance update equation due to the measurement zk is
pk (+) = (1 − kk )pk (−) .
For this scalar problem we then have kk = pk (−)(pk (−) + r0 )−1 and with the above we find
pk (+) given by

$$p_k(+) = p_k(-) - p_k(-)\left[p_k(-) + r_0\right]^{-1} p_k(-) = \frac{r_0\, p_k(-)}{p_k(-) + r_0} = \frac{p_k(-)}{1 + \frac{p_k(-)}{r_0}}.$$

The iterative equation for pk (+) is then given by replacing pk (−) with pk−1 (+) in the above expression to get

$$p_k(+) = \frac{p_{k-1}(+)}{1 + \frac{p_{k-1}(+)}{r_0}}.$$

The above expression can be iterated to find the general solution with p0 (+) = p0 . We have

$$p_1(+) = \frac{p_0}{1 + \frac{p_0}{r_0}}$$

$$p_2(+) = \frac{p_1(+)}{1 + \frac{p_1(+)}{r_0}} = \frac{\frac{p_0}{1+\frac{p_0}{r_0}}}{1 + \frac{p_0}{r_0\left(1+\frac{p_0}{r_0}\right)}} = \frac{p_0}{1 + \frac{2p_0}{r_0}}$$

$$p_3(+) = \frac{p_2(+)}{1 + \frac{p_2(+)}{r_0}} = \frac{\frac{p_0}{1+\frac{2p_0}{r_0}}}{1 + \frac{p_0}{r_0\left(1+\frac{2p_0}{r_0}\right)}} = \frac{p_0}{1 + \frac{3p_0}{r_0}}$$

$$\vdots$$

$$p_k(+) = \frac{p_0}{1 + \frac{kp_0}{r_0}}. \qquad (63)$$
Given this analytic form for pk (+) we can write the Kalman gain Kk with Equation 62 as

$$K_k = P_k(+) H_k^T R_k^{-1} = \frac{p_k(+)}{r_0} = \frac{\frac{p_0}{r_0}}{1 + \frac{kp_0}{r_0}}.$$

Thus our optimal state estimate x̂k (+) is given by

$$\hat{x}_k(+) = \hat{x}_k(-) + K_k\left[z_k - \hat{x}_k(-)\right]
= \hat{x}_k(-) + \left(\frac{\frac{p_0}{r_0}}{1 + \frac{kp_0}{r_0}}\right)\left(z_k - \hat{x}_k(-)\right). \qquad (64)$$

There is no process dynamics in this problem so when we need to propagate the state to the
time tk+1 and before the next measurement we have x̂k+1 (−) = x̂k (+).
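A short simulation (my own addition; the true constant, p0 and r0 are arbitrary) confirming that the scalar recursion reproduces the closed form of Equation 63:

    import numpy as np

    rng = np.random.default_rng(0)
    x_true, p0, r0, n_meas = 3.0, 4.0, 1.0, 20

    x_hat, p = 0.0, p0
    for k in range(1, n_meas + 1):
        z = x_true + rng.normal(scale=np.sqrt(r0))  # z_k = x + v_k
        K = p / (p + r0)                            # scalar Kalman gain
        x_hat = x_hat + K * (z - x_hat)
        p = (1.0 - K) * p                           # p_k(+)
        assert np.isclose(p, p0 / (1.0 + k * p0 / r0))  # Equation 63

    print(x_hat, p)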

Kalman filtering correlated measurements (Example 4-2.2)

From the given state vector $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and measurement sensitivity matrix H we seek to determine how a single measurement z modifies our uncertainty in the state. To do this we will
use the a posterior covariance update equation

P (+) = P (−) − P (−)H T [HP (−)H T + R]−1 HP (−) .

To evaluate the right-hand-side of the above from the problem description we see that

$$HP(-) = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} p_{11}(-) & p_{12}(-) \\ p_{12}(-) & p_{22}(-) \end{bmatrix} = \begin{bmatrix} p_{12}(-) & p_{22}(-) \end{bmatrix} = \begin{bmatrix} \sigma_{12}^2 & \sigma_2^2 \end{bmatrix}.$$

Then the matrix P(−)H^T is the transpose of this or

$$P(-)H^T = \begin{bmatrix} p_{12}(-) \\ p_{22}(-) \end{bmatrix} = \begin{bmatrix} \sigma_{12}^2 \\ \sigma_2^2 \end{bmatrix}.$$

Using P(−)H^T just computed, the inner-product-like term HP(−)H^T is given by

$$HP(-)H^T = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} p_{12}(-) \\ p_{22}(-) \end{bmatrix} = p_{22}(-) = \sigma_2^2.$$

Thus using all of these components we find that P(+) is given by

$$P(+) = \begin{bmatrix} \sigma_1^2 & \sigma_{12}^2 \\ \sigma_{12}^2 & \sigma_2^2 \end{bmatrix}
- \frac{1}{\sigma_2^2 + r^2}\begin{bmatrix} \sigma_{12}^2 \\ \sigma_2^2 \end{bmatrix}\begin{bmatrix} \sigma_{12}^2 & \sigma_2^2 \end{bmatrix}.$$

We multiply the two matrices on the right-hand-side and introduce the correlation ρ with

$$\sigma_{12}^2 = \sigma_1\sigma_2\rho \quad\text{so that}\quad \frac{\sigma_{12}^4}{\sigma_1^2} = \sigma_2^2\rho^2.$$

We then get that P(+) equals

$$P(+) = \begin{bmatrix} \sigma_1^2\left(\frac{\sigma_2^2(1-\rho^2)+r^2}{\sigma_2^2+r^2}\right) & \sigma_{12}^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) \\ \sigma_{12}^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) & \sigma_2^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) \end{bmatrix},$$

which is the expression in the book. Some special cases of this result are worth considering.
When the measurement z is perfect, meaning that there is no measurement error, we have r² = 0
and P(+) becomes

$$P(+) = \begin{bmatrix} \sigma_1^2(1-\rho^2) & 0 \\ 0 & 0 \end{bmatrix}.$$
Thus we have no uncertainty in the value of x2 and we have maximally reduced our uncertainty in x1 . Next, if the measurement z gives no information about x1 their correlation is zero. When we take ρ = 0 in the above (so that σ12² = σ1σ2ρ = 0) we have

$$P(+) = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) \end{bmatrix}.$$

Thus the measurement z provides no information about x1 and using it does not reduce the
initial uncertainty in x1 so we have p11 (+) = σ12 . If the unknowns x1 and x2 are perfectly
correlated ρ = ±1 we have

$$P(+) = \begin{bmatrix} \sigma_1^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) & \sigma_{12}^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) \\ \sigma_{12}^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) & \sigma_2^2\left(\frac{r^2}{\sigma_2^2+r^2}\right) \end{bmatrix},$$

thus the measurement z provides the same amount of information for both x1 and x2 and reduces their initial uncertainty by the same amount (by the fraction r²/(σ2² + r²)).

Kalman filtering the navigation system Omega (Example 4.2-3)

If our a priori estimate of the state is zero, x̂(−) = 0, then from the a posteriori state update equation x̂(+) = x̂(−) + Kk (z − H x̂k (−)) we have x̂(+) = Kk z. We compute Kk in the normal way

Kk = Pk (−)HkT (Hk Pk (−)HkT + Rk )−1 = P (0)H T (HP (0)H T )−1 ,
where we have assumed that the measurement noise is zero (or at least “very small”). Given the pieces from this problem we will now compute Kk . Using the given expression for P (0) and H we find

$$HP(0)H^T = \sigma_\phi^2\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & e^{-r_{12}/d} & e^{-r_{13}/d} \\ e^{-r_{12}/d} & 1 & e^{-r_{23}/d} \\ e^{-r_{13}/d} & e^{-r_{23}/d} & 1 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
= \sigma_\phi^2\begin{bmatrix} 1 & e^{-r_{23}/d} \\ e^{-r_{23}/d} & 1 \end{bmatrix}.$$

The inverse of this matrix is given by

$$(HP(0)H^T)^{-1} = \frac{1}{\sigma_\phi^2(1 - e^{-2r_{23}/d})}\begin{bmatrix} 1 & -e^{-r_{23}/d} \\ -e^{-r_{23}/d} & 1 \end{bmatrix}.$$

Using this as a factor we next find that the product K = P(0)H^T(HP(0)H^T)^{-1} is given by

$$K = \frac{1}{1 - e^{-2r_{23}/d}}\begin{bmatrix} e^{-r_{12}/d} - e^{-(r_{13}+r_{23})/d} & e^{-r_{13}/d} - e^{-(r_{12}+r_{23})/d} \\ 1 - e^{-2r_{23}/d} & 0 \\ 0 & 1 - e^{-2r_{23}/d} \end{bmatrix}.$$

From this matrix we can compute x̂(+). Since $\hat{x}(+) = \begin{bmatrix} \hat{\phi}_1 \\ \hat{\phi}_2 \\ \hat{\phi}_3 \end{bmatrix} = Kz$ we find that

$$\hat{x}(+) = \frac{1}{1 - e^{-2r_{23}/d}}\begin{bmatrix} e^{-r_{12}/d} - e^{-(r_{13}+r_{23})/d} & e^{-r_{13}/d} - e^{-(r_{12}+r_{23})/d} \\ 1 - e^{-2r_{23}/d} & 0 \\ 0 & 1 - e^{-2r_{23}/d} \end{bmatrix}\begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix}
= \begin{bmatrix} \frac{1}{1-e^{-2r_{23}/d}}\left[(e^{-r_{12}/d} - e^{-(r_{13}+r_{23})/d})\phi_2 + (e^{-r_{13}/d} - e^{-(r_{12}+r_{23})/d})\phi_3\right] \\ \phi_2 \\ \phi_3 \end{bmatrix},$$

which duplicates the results given in the book. In the Mathematica file chap 4 2 3.nb we perform some of the algebra not displayed in the above derivation.

Kalman filtering an inertial navigation system (Example 4.2-4)

The propagation from t = 0 to t = T, the time of the first fix, is done using the state error covariance extrapolation equation P(T−) = Φ(T, 0)P(0)Φ(T, 0)^T. Using the given matrix Φ(T, 0) for this problem we can compute P(T−) to find

$$P(T^-) = \begin{bmatrix} 1 & T & \frac{T^2}{2} \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \sigma_p^2 & 0 & 0 \\ 0 & \sigma_v^2 & 0 \\ 0 & 0 & \sigma_a^2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ T & 1 & 0 \\ \frac{T^2}{2} & T & 1 \end{bmatrix}
= \begin{bmatrix} \sigma_p^2 & T\sigma_v^2 & \frac{T^2}{2}\sigma_a^2 \\ 0 & \sigma_v^2 & T\sigma_a^2 \\ 0 & 0 & \sigma_a^2 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ T & 1 & 0 \\ \frac{T^2}{2} & T & 1 \end{bmatrix}$$

$$= \begin{bmatrix} \sigma_p^2 + T^2\sigma_v^2 + \frac{T^4}{4}\sigma_a^2 & T\sigma_v^2 + \frac{T^3}{2}\sigma_a^2 & \frac{T^2}{2}\sigma_a^2 \\ T\sigma_v^2 + \frac{T^3}{2}\sigma_a^2 & \sigma_v^2 + T^2\sigma_a^2 & T\sigma_a^2 \\ \frac{T^2}{2}\sigma_a^2 & T\sigma_a^2 & \sigma_a^2 \end{bmatrix}, \qquad (65)$$

which is the expression in the book. Note I have used the notation σp² = δp²(0), σv² = δv²(0), and σa² = δa²(0) since it is easier to type. After the measurement the new uncertainty P(T+) is reduced from P(T−) with

$$P(T^+) = P(T^-) - P(T^-)H^T\left(HP(T^-)H^T + R\right)^{-1}HP(T^-). \qquad (66)$$

With a measurement sensitivity matrix H of H = [−1 0 0] we find

$$HP(T^-)H^T + R = p_{11}(T^-) + \sigma_p^2,$$

and

$$HP(T^-) = \begin{bmatrix} -1 & 0 & 0 \end{bmatrix}P(T^-) = -\begin{bmatrix} p_{11}(T^-) & p_{12}(T^-) & p_{13}(T^-) \end{bmatrix},$$

so that $P(T^-)H^T = -\begin{bmatrix} p_{11}(T^-) \\ p_{12}(T^-) \\ p_{13}(T^-) \end{bmatrix}$. With these we find the matrix product given above

$$M = P(T^-)H^T\left(HP(T^-)H^T + R\right)^{-1}HP(T^-)
= \frac{1}{p_{11}(T^-) + \sigma_p^2}\begin{bmatrix} p_{11}(T^-) \\ p_{12}(T^-) \\ p_{13}(T^-) \end{bmatrix}\begin{bmatrix} p_{11}(T^-) & p_{12}(T^-) & p_{13}(T^-) \end{bmatrix}$$

$$= \frac{1}{p_{11}(T^-) + \sigma_p^2}\begin{bmatrix} p_{11}(T^-)^2 & p_{11}(T^-)p_{12}(T^-) & p_{11}(T^-)p_{13}(T^-) \\ p_{11}(T^-)p_{12}(T^-) & p_{12}(T^-)^2 & p_{12}(T^-)p_{13}(T^-) \\ p_{11}(T^-)p_{13}(T^-) & p_{12}(T^-)p_{13}(T^-) & p_{13}(T^-)^2 \end{bmatrix}. \qquad (67)$$

Since the total uncertainty after the fix P(T+) is given by P(T−) − M, with M computed above we see that the (1, 1) component becomes

$$p_{11}(T^+) = p_{11}(T^-) - \frac{p_{11}(T^-)^2}{p_{11}(T^-) + \sigma_p^2} = \frac{p_{11}(T^-)\,\sigma_p^2}{p_{11}(T^-) + \sigma_p^2}.$$

If p11(T−) ≫ σp² then the above becomes

$$\frac{p_{11}(T^-)\,\sigma_p^2}{p_{11}(T^-) + \sigma_p^2} \approx \frac{p_{11}(T^-)\,\sigma_p^2}{p_{11}(T^-)} = \sigma_p^2,$$

and thus the first fix reduces the error in the position estimate to that of the sensor.
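For concreteness (this is my own sketch with made-up values of σp², σv², σa², T and the fix accuracy), the propagation and position-fix update of this example can be reproduced numerically:

    import numpy as np

    sp2, sv2, sa2, T = 100.0, 4.0, 0.1, 10.0  # assumed variances and fix time
    P0 = np.diag([sp2, sv2, sa2])
    Phi = np.array([[1.0, T, T**2 / 2],
                    [0.0, 1.0, T],
                    [0.0, 0.0, 1.0]])

    P_minus = Phi @ P0 @ Phi.T                # covariance just before the fix
    H = np.array([[-1.0, 0.0, 0.0]])
    R = np.array([[sp2]])                     # fix accuracy, taken equal to sigma_p^2

    S = H @ P_minus @ H.T + R
    P_plus = P_minus - P_minus @ H.T @ np.linalg.inv(S) @ H @ P_minus

    # When p11(T-) >> sigma_p^2 the updated position variance is roughly sigma_p^2.
    print(P_minus[0, 0], P_plus[0, 0], sp2)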

Notes on continuous propagation of covariance

In this section using the discrete results we derive how the continuous covariance matrix P (t)
propagates due to the process dynamics and the continuous measurement stream. When we
use the approximations Φk → I + F ∆t, and Qk → GQGT ∆t in

Pk+1 (−) = Φk Pk (+)ΦTk + Qk ,

we get
Pk+1(−) = Pk (+) + [F Pk (+) + Pk (+)F T + GQGT ]∆t + O(∆t2 ) . (68)
Recalling that after a measurement our state uncertainty is updated with

Pk (+) = (I − Kk Hk )Pk (−) ,

we can put this expression for Pk (+) into the right-hand-side of Equation 68

Pk+1 (−) = (I−Kk Hk )Pk (−)+[F (I−Kk Hk )Pk (−)+(I−Kk Hk )Pk (−)F T +GQGT ]∆t+O(∆t2 ) .
We can manipulate this into a first order difference as

$$\frac{P_{k+1}(-) - P_k(-)}{\Delta t} = F P_k(-) + P_k(-)F^T + GQG^T - \frac{1}{\Delta t}K_k H_k P_k(-) - F K_k H_k P_k(-) - K_k H_k P_k(-)F^T + O(\Delta t). \qquad (69)$$

To further evaluate this we need to consider the expression $\frac{1}{\Delta t}K_k$. We have

$$\frac{1}{\Delta t}K_k = \frac{1}{\Delta t}P_k(-)H_k^T\left(H_k P_k(-)H_k^T + R_k\right)^{-1}
= P_k(-)H_k^T\left(H_k P_k(-)H_k^T \Delta t + R_k \Delta t\right)^{-1}.$$

With the discrete covariance matrix Rk related to the spectral density matrix R(t) through Rk ∆t → R as ∆t → 0, and with the term Hk Pk(−)HkT ∆t → 0 as ∆t → 0, this term then has the following limit

$$\frac{1}{\Delta t}K_k \to P H^T R^{-1}. \qquad (70)$$

In the same way the Kalman gain Kk by itself limits to zero since

$$\frac{1}{\Delta t}K_k \to P H^T R^{-1} \quad\text{implies}\quad K_k \to \Delta t\, P H^T R^{-1} \to 0,$$
as ∆t → 0. Because of this in Equation 69 the two terms −F Kk Hk Pk (−) and −Kk Hk Pk (−)F T
vanish when we take the limit of ∆t tending to zero. Collecting all of these results we are
finally left with
Ṗ (t) = F P + P F T + GQGT − P H T R−1 HP , (71)
which is known as the matrix Riccati equation, and is the books equation 4.3-8.
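A minimal numerical sketch (my own addition, with arbitrary F, G, Q, H and R) of propagating Equation 71 with a simple Euler step:

    import numpy as np

    F = np.array([[0.0, 1.0],
                  [0.0, 0.0]])   # assumed dynamics
    G = np.array([[0.0],
                  [1.0]])
    Q = np.array([[0.2]])        # assumed process noise spectral density
    H = np.array([[1.0, 0.0]])
    R = np.array([[0.5]])        # assumed measurement noise spectral density

    P = np.eye(2)                # assumed initial covariance
    dt, t_final = 1e-3, 5.0
    for _ in range(int(t_final / dt)):
        # Matrix Riccati equation: Pdot = F P + P F^T + G Q G^T - P H^T R^{-1} H P
        Pdot = F @ P + P @ F.T + G @ Q @ G.T - P @ H.T @ np.linalg.inv(R) @ H @ P
        P = P + dt * Pdot

    print(P)  # approaches the steady-state filter covariance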

Notes on the continuous Kalman filter

Having just developed the matrix Riccati equation which governs how the state covariance
matrix P (t) evolves we now perform the same procedure to determine the equation that
governs how the continuous state x̂(t) evolves. As before we begin with the corresponding
discrete state update equation

x̂k (+) = x̂k (−) + Kk [zk − Hk x̂k (−)] ,

where we put x̂k (−) = Φk−1 x̂k−1 (+) into the above to get

x̂k (+) = Φk−1 x̂k−1 (+) + Kk (zk − Hk Φk−1 x̂k−1 (+)) ,

which is the books equation 4.3-10. Next we use the discrete to continuous approximations

Φk−1 = I + F ∆t
Kk = P H T R−1 ∆t ,
to get

x̂k (+) = x̂k−1 (+) + F x̂k−1(+)∆t + P H T R−1 (zk − Hk (I + F ∆t)x̂k−1 (+))∆t ,

or

$$\frac{\hat{x}_k(+) - \hat{x}_{k-1}(+)}{\Delta t} = F\hat{x}_{k-1}(+) + PH^TR^{-1}\left(z_k - H_k\hat{x}_{k-1}(+)\right) + O(\Delta t).$$
In the limit ∆t → 0 this becomes

$$\dot{\hat{x}}(t) = F\hat{x}(t) + PH^TR^{-1}\left(z - H\hat{x}(t)\right), \qquad (72)$$

which is the continuous Kalman filter equation. Note that in the above the expression P(t) is given by solving the matrix Riccati Equation 71 for P(t).

Often it is helpful to have a dynamical expression for the error in the continuous state estimate x̂(t). To derive the differential equation for this error x̃(t) ≡ x̂(t) − x(t) we subtract the equation governing the true system dynamics

$$\frac{dx}{dt} = Fx + Gw,$$

from the continuous Kalman filter Equation 72 to get

$$\frac{d\tilde{x}(t)}{dt} = F\tilde{x}(t) - Gw + PH^TR^{-1}\left(z - H(\tilde{x}(t) + x(t))\right)
= F\tilde{x}(t) - Gw - PH^TR^{-1}H\tilde{x}(t) + PH^TR^{-1}v(t),$$

where we have used z(t) − Hx(t) = v(t). When we group terms we have

$$\frac{d\tilde{x}(t)}{dt} = \left(F - PH^TR^{-1}H\right)\tilde{x}(t) - Gw + PH^TR^{-1}v.$$
Recalling that K(t) can be expressed as P(t)H(t)^T R(t)^{−1} this latter expression becomes

$$\frac{d\tilde{x}}{dt} = (F - KH)\tilde{x} - Gw + Kv, \qquad (73)$$
which is the books equation 4.3-13.

Notes on correlated process and measurement noise: Y. C. Ho’s method

If our process w(t) and measurement v(t) noise are correlated, meaning that E[w(t)v T (τ )] =
C(t)δ(t − τ ), then we can transform this problem into one where the new process noise term
is uncorrelated with the measurement noise. The algebra to do this is discussed here. Since
our measurement z(t) is given in terms of our state via z = Hx + v we can add a multiple
(say D) of the expression z − Hx − v = 0 to the system dynamics equation giving

dx(t)
= F x + Gw + D(z − Hx − v) = (F − DH)x + Dz + Gw − Dv . (74)
dt
If we take D to be given by the special value of D = GCR−1 then we claim that this new
process noise term Gw − Dv will be uncorrelated with the measurement noise v and results
in a system of the type we have previously been studying. To prove this, we compute the
cross-correlation of the new process noise term Gw − Dv with the old measurement noise
term v as
E[(Gw − Dv)v T ] = GE[wv T ] − DE[vv T ]
= GC − DR
= GC − GCR−1 R = 0 ,
as we desired to show. We next derive the continuous Kalman filter and the matrix Riccati
equation for the system given by Equation 74. To derive the continuous Kalman filter in
this case we will use the form given by Equation 72 but with a few modifications. The first
modification is that with a deterministic forcing in the system dynamics (as we have here in
the form of the Dz term) this forcing must also show up as a term on the right-hand-side of
Equation 72. The second modification is that the “F ” matrix in Equation 72 is now given
by F − DH. We thus obtain
˙
x̂(t) = (F − GCR−1 H)x̂(t) + P H T R−1 (z − H x̂(t)) + GCR−1 z
= F x̂(t) − (GCR−1 H + P H T R−1 H)x̂(t) + (P H T R−1 + GCR−1 )z
= F x̂ − (P H T + GC)R−1 H x̂ + (P H T + GC)R−1 z
= F x̂ + (P H T + GC)R−1 (z − H x̂) . (75)
Next we consider the matrix Riccati Equation 71 for this system. As before we need to
modify this slightly for the given system. The first modification is again that “F ” matrix
in Equation 71 becomes F − DH = F − GCR−1 H. The second modification is that the Q
matrix (representing the process noise covariance matrix) needs to correspond to the form
of the process noise we have here which has a form given by
Gw − Dv = Gw − GCR−1 v = G(w − CR−1 v) .
A noise vector of this form will have a covariance matrix given by
Cov(G(w − CR−1 v)) = GCov((w − CR−1 v))GT
= G(Cov(w) + Cov(CR−1 v) − 2Cov(wv T )R−1 C)GT
= G(Q + CR−1 RR−1 C − 2CR−1 C)GT
= G(Q − CR−1 C)GT .
This later expression will replace the expression GQGT in Equation 71. When we make
these two substitutions into the matrix Riccati equation and perform some manipulations.
We find
Ṗ (t) = (F − DH)P + P (F − DH)T + G(Q − CR−1 C)GT − P H T R−1 HP
= F P + P F T + GQGT − GCR−1 HP − P H T R−1 CGT − GCR−1 CGT − P H T R−1 HP
= F P + P F T + GQGT − GCR−1 (HP + CGT ) − P H T R−1 (HP + CGT )
= F P + P F T + GQGT − (GCR−1 + P H T R−1 )(HP + CGT )
= F P + P F T + GQGT − (GC + P H T )R−1 (CG + P H T )T
= F P + P F T + GQGT − (GC + P H T )R−1 RR−1 (CG + P H T )T . (76)
If we define
K(t) ≡ (P H T + GC)R−1 , (77)
then we see that Equation 75 and 76 become

x̂˙ = F x̂ + K(z − H x̂)


Ṗ = F P + P F T + GQGT − KRK T .

This result agrees with the ones presented in the book when given a system with correlated
process and measurement noises.

A system model that contains deterministic inputs: stochastic observability

Given the continuous system matrix Riccati equation

Ṗ = F P + P F T + GQGT − P H T R−1 HP with P (0) ≈ +∞ , (78)

where P (0) ≈ +∞ can be taken to mean that we have no a priori information. We will
transform this expression into a differential equation for P (t)−1 . To do this recall that since
Ṗ −1 = −P −1 Ṗ P −1 by solving for Ṗ (t) we get that Ṗ = −P Ṗ −1P and using this expression
in the left-hand-side of Equation 78 we get

−P Ṗ −1 P = F P + P F T + GQGT − P H T R−1 HP .

or by multiplying by P −1 once on the left and once on the right and then negating we get

Ṗ −1 = −P −1 F − F T P −1 − P −1 GQGT P −1 + H T R−1 H
= −F T P −1 − P −1 F − P −1 GQGT P −1 + H T R−1 H , (79)

where the last equation simply changes the order of the terms in the equation above it.
The initial condition P (0) ≈ +∞ transforms into the initial condition that P −1(0) = 0. If
we assume our system has no process noise then the term GQGT vanishes and this is the
books equation 4.4-10. We can solve this equation as in Problem 3.1 on Page 28. Since the fundamental solution to the system with a transition matrix −F^T is given by e^{−F^T(t−τ)} we see that the solution for P(t)^{−1} is given by

$$P(t)^{-1} = \int_0^t e^{-F^T(t-\tau)} H(\tau)^T R^{-1}(\tau) H(\tau)\, e^{-F(t-\tau)}\, d\tau
= \int_0^t e^{F^T(\tau-t)} H(\tau)^T R^{-1}(\tau) H(\tau)\, e^{F(\tau-t)}\, d\tau
= \int_0^t \Phi(\tau, t)^T H(\tau)^T R^{-1}(\tau) H(\tau)\, \Phi(\tau, t)\, d\tau,$$

which is the books equation 4.4-11. In the above Φ(t, τ ) is the transition matrix correspond-
ing to F .
Notes on correlated measurement errors: continuous time when R is singular

If we define the derived measurement z1 as

z1 = ż − Ez , (80)

then we see that we can write z1 in terms of our original state x, the original process noise
w, and the unexplained measurement noise w1 as

z1 = ż − Ez
d
= (Hx + v) − E(Hx + v)
dt
= Ḣx + H ẋ + v̇ − EHx − Ev
= Ḣx + H(F x + Gw) + (Ev + w1 ) − EHx − Ev
= (Ḣ + HF − EH)x + HGw + w1
= H1 x + v1 .

For this measurement equation for z1 we can now calculate its measurement covariance
matrix R1 as E[v1 v1T ]. Since w and w1 are uncorrelated E[ww1T ] = 0 and we find

R1 = E[(HGw + w1 )(HGw + w1 )T ] = HGQGT H T + Q1 , (81)

which is the books equation 4.5.7. The cross correlation matrix C1 is then computed as

C1 = E[w(t)v1T (τ )] = E[w(t)(HGw(τ ) + w1 (τ ))T ]


= QGT H T , (82)

which is the books equation 4.5.8. The equivalent problem which we have just formulated is
then expressed as

ẋ = F x + Gw with w ∼ N(0, Q)
z1 = H1 x + v1 ,

with the matrices R1 and C1 given by Equations 81 and 82 respectively. For this continuous
problem, since we have correlated process and measurement noise using Equation 77 we find
K1 given by

K1 = [P H1T + GC1 ]R1−1


= [P (Ḣ + HF − EH)T + GQGT H T ](HGQGT H T + Q1 )−1 , (83)

which is the books equation 4.5-9. Then the continuous Kalman filter is given by

x̂˙ = F x̂ + K1 (z1 − H1 x̂)


= F x̂ + K1 (ż − Ez − H1 x̂) , (84)

which is the books equation 4.5-10, and the covariance equation is

Ṗ = F P + P F T + GQGT − K1 R1 K1T ,
which is the books equation 4.5-11. We can avoid having to differentiate our measurement z
which is seemingly required by the ż term on the right-hand-side in Equation 84 by instead
taking our state to be x(t) − K1 (t)z(t). Using this expression when we put Equation 84 into
the derivative dtd (x̂(t) − K1 (t)z(t)) we get

$$\frac{d}{dt}\left(\hat{x}(t) - K_1(t)z(t)\right) = \dot{\hat{x}} - \dot{K}_1 z - K_1\dot{z}
= \left(F\hat{x} + K_1(\dot{z} - Ez - H_1\hat{x})\right) - \dot{K}_1 z - K_1\dot{z}
= F\hat{x} - K_1 Ez - K_1 H_1\hat{x} - \dot{K}_1 z
= (F - K_1 H_1)\hat{x} - K_1 Ez - \dot{K}_1 z.$$

Notes on correlated measurement errors: discrete time when Rk is singular

When we have Rk ≡ 0, the update equation for the error covariance matrix is given by

Pk (+) = Pk (−) − Pk (−)HkT [Hk Pk (−)HkT ]−1 Hk Pk (−) .

If we multiply this by Hk on the left and HkT on the right we find that

Hk Pk (+)HkT = Hk Pk (−)HkT − Hk Pk (−)HkT [Hk Pk (−)HkT ]−1 Hk Pk (−)HkT = 0 ,

showing that a linear combination of elements from Pk (+) is zero so a linear combination of
states is known exactly.

Notes on the solution of the Riccati equation

In this section we will demonstrate an algebraic transformation that will allow the solution
of the Riccati equation in the case where it has constant coefficients

Ṗ = F P + P F T + GQGT − P H T R−1 HP ,

with P (t0 ) given. To show the transformation we will use to solve the equation above we let

λ = Py , (85)

and let y satisfy the following differential equation

ẏ = −F T y + H T R−1 HP y . (86)

Then the first derivative of λ is given by

λ̇ = Ṗ y + P ẏ
= (F P + P F T + GQGT − P H T R−1 HP )y + P (−F T y + H T R−1 HP y)
= F P y + GQGT y
= F λ + GQGT y , (87)
which is the books equation 4.6-5. As a matrix system with a vector of unknowns given by $\begin{bmatrix} y \\ \lambda \end{bmatrix}$, Equations 86 and 87 combine to give

$$\begin{bmatrix} \dot{y} \\ \dot{\lambda} \end{bmatrix} = \begin{bmatrix} -F^T & H^TR^{-1}H \\ GQG^T & F \end{bmatrix}\begin{bmatrix} y \\ \lambda \end{bmatrix}, \qquad (88)$$

which is the books equation 4.6-6. Since this is a time-invariant linear dynamical system for the vector of unknowns $\begin{bmatrix} y \\ \lambda \end{bmatrix}$, let Φ = Φ(t0 + τ, t0) be its transition matrix, such that when written in block form

$$\begin{bmatrix} y(t_0+\tau) \\ \lambda(t_0+\tau) \end{bmatrix} = \begin{bmatrix} \Phi_{yy}(\tau) & \Phi_{y\lambda}(\tau) \\ \Phi_{\lambda y}(\tau) & \Phi_{\lambda\lambda}(\tau) \end{bmatrix}\begin{bmatrix} y(t_0) \\ \lambda(t_0) \end{bmatrix}.$$
If we compute components of the product above we find
y(t0 + τ ) = Φyy (τ )y(t0 ) + Φyλ (τ )λ(t0 ) = Φyy (τ )y(t0 ) + Φyλ (τ )P (t0 )y(t0) (89)
λ(t0 + τ ) = Φλy (τ )y(t0 ) + Φλλ (τ )λ(t0 ) = Φλy (τ )y(t0 ) + Φλλ (τ )P (t0 )y(t0 ) . (90)
We can replace the left-hand-side of Equation 90 with λ(t0 + τ ) = P (t0 + τ )y(t0 + τ ) and
then use Equation 89 to evaluate y(t0 + τ ) to get
[Φλy (τ ) + Φλλ (τ )P (t0 )]y(t0 ) = λ(t0 + τ ) = P (t0 + τ )y(t0 + τ )
= P (t0 + τ )[Φyy (τ ) + Φyλ (τ )P (t0 )]y(t0 ) .
If we “cancel” y(t0 ) from both sides of this expression and solve for P (t0 + τ ) we get
P (t0 + τ ) = [Φλy (τ ) + Φλλ (τ )P (t0 )][Φyy (τ ) + Φyλ (τ )P (t0 )]−1 , (91)
which is the books equation 4.6-8.

As a special case we can use the above result to solve the linear variance equation
Ṗ = F P + P F T + GQGT with P (t0 ) given .
Since the linear variance equation has H T R−1 H = 0 the system in Equation 88 is given by

$$\begin{bmatrix} \dot{y} \\ \dot{\lambda} \end{bmatrix} = \begin{bmatrix} -F^T & 0 \\ GQG^T & F \end{bmatrix}\begin{bmatrix} y \\ \lambda \end{bmatrix}. \qquad (92)$$
In the above the equation for y decouples from that of λ and we have ẏ = −F^T y so that the fundamental solution for y is Φyy(τ) = e^{−F^T τ} and y(t) at any time is then given using that as y(t) = Φyy(t)y0. The differential equation for λ now has the known function y(t) as a forcing term and is given by

$$\dot{\lambda} = GQG^T y + F\lambda = F\lambda + GQG^T\,\Phi_{yy}(t)\,y_0.$$

As forcing functions like GQG^T Φyy(t)y0 are not important in determining fundamental solutions, the fundamental solution for λ(t) is e^{Ft}. Next, to see that Φyλ(τ) = 0 we can note that for the matrix given in Equation 92 the block matrix fundamental solution Φ(τ) is given by

$$\Phi(\tau) = \exp\left(\begin{bmatrix} -F^T & 0 \\ GQG^T & F \end{bmatrix}\tau\right) = \sum_{k=0}^{\infty}\frac{\tau^k}{k!}\begin{bmatrix} -F^T & 0 \\ GQG^T & F \end{bmatrix}^k.$$

Each term in the above sum is of the form $\begin{bmatrix} -F^T & 0 \\ GQG^T & F \end{bmatrix}^k$, which is the k-th power of a
block lower triangular matrix and thus is also block lower triangular. Thus the block (1, 2)
term in each component of the sum is 0. Since each component in the sum has a zero (1, 2)
term the (1, 2) term for the block fundamental solution Φ(τ ) will also be zero. Thus we
conclude that Φyλ (τ ) = 0. Using this fact, Equation 91 then gives

P (t0 + τ ) = (Φλy (τ ) + Φλλ (τ )P (t0 ))Φyy (τ )−1 .

This can further be evaluated by noting that

$$\Phi_{yy}(\tau)^{-1} = \left(e^{-F^T\tau}\right)^{-1} = e^{F^T\tau} = \Phi_{\lambda\lambda}(\tau)^T,$$

so
P (t0 + τ ) = Φλy (τ )Φλλ (τ )T + Φλλ (τ )P (t0 )Φλλ (τ )T , (93)
which is the books equation 4.6-10 and represents a way to solve the linear variance equation.
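As a numerical sketch of Equation 91 (my own addition, with arbitrary constant F, G, Q, H, R and P(t0)), we can build the block matrix of Equation 88, take its matrix exponential, and form P(t0 + τ) from the blocks:

    import numpy as np
    from scipy.linalg import expm

    n = 2
    F = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    G = np.array([[0.0],
                  [1.0]])
    Q = np.array([[0.2]])
    H = np.array([[1.0, 0.0]])
    R = np.array([[0.5]])
    P0 = np.eye(n)   # P(t0), assumed
    tau = 0.7

    # Block system matrix from Equation 88.
    A = np.block([[-F.T, H.T @ np.linalg.inv(R) @ H],
                  [G @ Q @ G.T, F]])
    Phi = expm(A * tau)
    Phi_yy, Phi_yl = Phi[:n, :n], Phi[:n, n:]
    Phi_ly, Phi_ll = Phi[n:, :n], Phi[n:, n:]

    # Equation 91: P(t0 + tau) = (Phi_ly + Phi_ll P0)(Phi_yy + Phi_yl P0)^{-1}
    P_tau = (Phi_ly + Phi_ll @ P0) @ np.linalg.inv(Phi_yy + Phi_yl @ P0)
    print(P_tau)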

Problem Solutions

Problem 4-1 (two measurements treated sequentially/simultaneously)

Part (a): If the two measurements are sequential we first observe z1 and then observe z2 .
Assuming no prior information is equivalent to the maximum likelihood estimation method
which for Gaussian densities is given by

x̂1 (+) = (H T H)−1 H T z1 .

When there is only one measurement z1 = x + v1 we see that H1 = 1 and R1 = σ12 so the
above gives x̂1 (+) = z1 . To update the new uncertainty we use

P1−1(+) = P1−1 (−) + H1T R1−1 H1 .

If we have no a priori information P1−1 (−) = 0 and the above gives


1
P1−1(+) = ⇒ P1 (+) = σ12 .
σ12

Next, since we are estimating a constant the system dynamics propagate x̂1 (+) to x̂2 (−) as

x̂2 (−) = 1x̂1 (+) = z1


P2 (−) = P1 (+) = σ12 .

The second measurement z2 is again of the form z2 = x + v2 so we have H2 = 1, and R2 = σ22


so that

$$K_2 = P_2(-)H_2^T\left[H_2 P_2(-)H_2^T + R_2\right]^{-1} = \sigma_1^2\left(\sigma_1^2 + \sigma_2^2\right)^{-1} = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}.$$
Using this Kalman gain K2 we have

$$\hat{x}_2(+) = \hat{x}_2(-) + K_2\left(z_2 - H_2\hat{x}_2(-)\right) = z_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}(z_2 - z_1) = \frac{\sigma_1^2 z_2 + \sigma_2^2 z_1}{\sigma_1^2 + \sigma_2^2},$$

the same as the books equation 1.0-7. Next we have for P2 (+) the following

$$P_2(+) = (1 - K_2 H_2)P_2(-) = \left(1 - \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\right)\sigma_1^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2} = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1},$$

the books equation 1.0-6.

Part (b): When the two measurements are taken simultaneously, each is of the form zi = x + vi for i = 1, 2 and our measurement vector z1 is given by

$$z_1 = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} x + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$

so $H_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and the probability density for the measurement error vector v1 is given by $p(v_1) = N\left(0, \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}\right)$. Since we have no a priori information we are required to use weighted least squares which has an update given by

$$\hat{x}_1(+) = (H_1^T R_1^{-1} H_1)^{-1} H_1^T R_1^{-1} z_1,$$

with the matrix $R_1^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}$. With the form of H1 above we compute

$$H_1^T R_1^{-1} H_1 = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}.$$
Using this we can compute the new uncertainty matrix P1 (+) as

$$P_1(+)^{-1} = P_1(-)^{-1} + H_1^T R_1^{-1} H_1 = 0 + \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}.$$

Thus P1 (+) is given by

$$P_1(+) = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1},$$

the same as the books equation 1.0-6. Finally we have x̂1 (+) after this combined measurement z1 given by

$$\hat{x}_1(+) = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1}\left(\frac{z_1}{\sigma_1^2} + \frac{z_2}{\sigma_2^2}\right) = \frac{1}{\sigma_1^2 + \sigma_2^2}\left(\sigma_2^2 z_1 + \sigma_1^2 z_2\right),$$
the same as the books equation 1.0-7.
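A tiny numerical check (my own addition, with arbitrary numbers) that the sequential processing of part (a) and the batch weighted least squares estimate of part (b) coincide:

    import numpy as np

    z1, z2 = 1.2, 0.8
    s1, s2 = 0.5, 2.0  # sigma_1^2 and sigma_2^2 (assumed)

    # Part (a): sequential processing.
    x_hat, p = z1, s1
    K2 = p / (p + s2)
    x_seq = x_hat + K2 * (z2 - x_hat)
    p_seq = (1 - K2) * p

    # Part (b): both measurements processed at once (weighted least squares).
    x_batch = (s2 * z1 + s1 * z2) / (s1 + s2)
    p_batch = 1.0 / (1.0 / s1 + 1.0 / s2)

    print(np.isclose(x_seq, x_batch), np.isclose(p_seq, p_batch))  # True True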
Problem 4-2 (additional Kalman filtering examples)

For this problem we want to rework Problems 1-1 and 1-3 using the Kalman filtering frame-
work developed in this chapter. Problem 1-1 has to do with two measurements zi of a
constant x that are correlated with a correlation coefficient ρ. Problem 1-3 has to with three
independent measurements.

Problem 1-1: If we assume that our measurements of the constant x, of the form zi = x + vi for i = 1, 2, are correlated, then the noise vector v takes the form v ∼ N(0, R) with $R = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$. Thus our measurement vector z1 is given by

$$z_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} x + v_1,$$

thus $H_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and R1 = R, the matrix above. If we assume we have no a priori information
on the value of x then our estimate of our state x after the measurement z1 is given by the
weighted least squares estimate
x̂1 (+) = (H1T R1−1 H1 )−1 H1T R1−1 z1 , (94)
and the new uncertainty, P1 (+), can be computed as
P1 (+)−1 = P1 (−)−1 + H1T R1−1 H1 = H1T R1−1 H1 ,
since P1 (−)−1 = 0. From the given form for R1 we have that its inverse R1−1 is given by

$$R_1^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}.$$

So that the product R1−1 H1 is given by

$$R_1^{-1}H_1 = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix} \sigma_2^2 - \rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 + \sigma_1^2 \end{bmatrix},$$

and the product H1T R1−1 H1 is given by

$$H_1^T R_1^{-1} H_1 = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{\sigma_1^2\sigma_2^2(1-\rho^2)}.$$

Thus since this product H1T R1−1 H1 equals P1 (+)−1 we have that

$$P_1(+) = \frac{\sigma_1^2\sigma_2^2(1-\rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2},$$
which is the same result given in the book for the uncertainty of this system. Next using these subresults in Equation 94 we compute x̂1 (+) as

$$\hat{x}_1(+) = \frac{\sigma_1^2\sigma_2^2(1-\rho^2)}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}\cdot\frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\left[(\sigma_2^2 - \rho\sigma_1\sigma_2)z_1 + (-\rho\sigma_1\sigma_2 + \sigma_1^2)z_2\right]
= \left(\frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}\right)z_1 + \left(\frac{\sigma_1^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}\right)z_2,$$
which also agrees with the solution found in Problem 1.1.

Problem 1-3: In the case when we have three independent measurements, zi, of an unknown scalar x, our measurement vector z1 is given by $z_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} x + v_1$ with v1 ∼ N(0, R1) and R1 = diag(σ1², σ2², σ3²). From this formulation we see that $R_1^{-1} = \mathrm{diag}(1/\sigma_1^2, 1/\sigma_2^2, 1/\sigma_3^2)$ and the measurement sensitivity matrix $H_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Again assuming no a priori information we have

$$P_1(+)^{-1} = P_1(-)^{-1} + H_1^T R_1^{-1} H_1 = H_1^T R_1^{-1} H_1,$$

and

$$\hat{x}_1(+) = (H_1^T R_1^{-1} H_1)^{-1} H_1^T R_1^{-1} z_1.$$

With the above matrices we have $H_1^T R_1^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & \frac{1}{\sigma_2^2} & \frac{1}{\sigma_3^2} \end{bmatrix}$, and thus $H_1^T R_1^{-1} H_1 = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\sigma_3^2}$, so that

$$P_1(+) = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\sigma_3^2}\right)^{-1},$$

and

$$\hat{x}_1(+) = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} + \frac{1}{\sigma_3^2}\right)^{-1}\left(\frac{z_1}{\sigma_1^2} + \frac{z_2}{\sigma_2^2} + \frac{z_3}{\sigma_3^2}\right),$$

which is the same as the results found in problem 1-3.

Problem 4-3 (Kalman filtering a decaying concentration)

Part (a): For this part of the problem our measurements are zi = x0 e−ti + vi with vi ∼
N(0, σi2 ) to be taken simultaneously. Since we are told that a priori we have no prior
information on the initial concentration x0 we will take P (−)−1 = 0 and the initial estimate
x̂(+) is the maximum likelihood estimate, which in this case because the two measurements
have different uncertainties is given by the weighted-least-squares estimate

x̂(+) = (H T R−1 H)−1 H T R−1 z . (95)

In this problem we map our measurement z to our state via

$$z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} e^{-t_1} \\ e^{-t_2} \end{bmatrix} x_0 + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$

so we see that the matrices H and R are explicitly given by $H = \begin{bmatrix} e^{-t_1} \\ e^{-t_2} \end{bmatrix}$ and $R = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$. Using these we can compute the matrix products needed to evaluate Equation 95 as

$$H^T R^{-1} H = \frac{e^{-2t_1}}{\sigma_1^2} + \frac{e^{-2t_2}}{\sigma_2^2},$$

and

$$H^T R^{-1} z = \frac{e^{-t_1}}{\sigma_1^2}z_1 + \frac{e^{-t_2}}{\sigma_2^2}z_2.$$

Thus x̂(+) is given by

$$\hat{x}(+) = \left(\frac{e^{-2t_1}}{\sigma_1^2} + \frac{e^{-2t_2}}{\sigma_2^2}\right)^{-1}\left(\frac{e^{-t_1}}{\sigma_1^2}z_1 + \frac{e^{-t_2}}{\sigma_2^2}z_2\right)
= \frac{\sigma_1^2\sigma_2^2}{\sigma_2^2 e^{-2t_1} + \sigma_1^2 e^{-2t_2}}\left(\frac{e^{-t_1}}{\sigma_1^2}z_1 + \frac{e^{-t_2}}{\sigma_2^2}z_2\right)
= \frac{\sigma_2^2 e^{-t_1} z_1 + \sigma_1^2 e^{-t_2} z_2}{\sigma_2^2 e^{-2t_1} + \sigma_1^2 e^{-2t_2}},$$

and the covariance update equation gives

$$P(+)^{-1} = P(-)^{-1} + H^T R^{-1} H = \frac{e^{-2t_1}}{\sigma_1^2} + \frac{e^{-2t_2}}{\sigma_2^2},$$

so P(+) is given by

$$P(+) = \frac{\sigma_1^2\sigma_2^2}{\sigma_2^2 e^{-2t_1} + \sigma_1^2 e^{-2t_2}} = \left(\frac{1}{\sigma_1^2}e^{-2t_1} + \frac{1}{\sigma_2^2}e^{-2t_2}\right)^{-1},$$

the same results found in problem 1.3 earlier.

Part (b): If the measurements are now assumed to be obtained sequentially then, since z1 = e^{−t1}x0 + v1 is the first one, we have H1 = e^{−t1} and R1 = σ1². Since we have no a priori information on x0 the state update equation is still the maximum likelihood equation, and applying the information from just this one measurement gives as our new estimate of x0

$$\hat{x}_1(+) = (H_1^T R_1^{-1} H_1)^{-1} H_1^T R_1^{-1} z_1 = \left(\frac{e^{-2t_1}}{\sigma_1^2}\right)^{-1}\frac{e^{-t_1}}{\sigma_1^2}z_1 = e^{t_1}z_1,$$

and

$$P_1(+)^{-1} = P_1(-)^{-1} + H_1^T R_1^{-1} H_1 = \frac{e^{-2t_1}}{\sigma_1^2} \quad\Rightarrow\quad P_1(+) = \sigma_1^2 e^{2t_1}.$$
Now before we can incorporate the second measurement we must perform state and covariance extrapolation

x̂2 (−) = Φ1 x̂1 (+) and P2 (−) = Φ1 P1 (+)ΦT1 + Q1 = Φ1 P1 (+)ΦT1 ,

since Q1 = 0. As the underlying initial state, x0 , we are trying to estimate is a constant we


have Φ = 1 (here Φ denotes how the state changes with time, not the measurement). Thus

x̂2 (−) = x̂1 (+) = et1 z1 and P2 (−) = P1 (+) = σ12 e2t1 .

The second measurement z2 is related to the initial concentration as z2 = e−t2 x0 + v2 so we have H2 = e−t2 and R2 = σ22 . Next we use the Kalman update equations to obtain the a posteriori state and
covariance x̂2 (+) and P2 (+) after the second measurement. We find

K2 = P2 (−)H2T [H2 P2 (−)H2T + R2 ]−1 = σ12 e2t1 · e−t2 [e−2t2 σ12 e2t1 + σ22 ]−1
= σ12 e2t1 −t2 [σ12 e−2t2 +2t1 + σ22 ]−1 .
Then

$$\hat{x}_2(+) = \hat{x}_2(-) + K_2\left(z_2 - H_2\hat{x}_2(-)\right)
= e^{t_1}z_1 + \sigma_1^2 e^{2t_1-t_2}\left(\sigma_1^2 e^{2t_1-2t_2} + \sigma_2^2\right)^{-1}\left(z_2 - e^{-t_2}e^{t_1}z_1\right)
= \frac{\sigma_2^2 e^{-t_1}z_1 + \sigma_1^2 e^{-t_2}z_2}{\sigma_2^2 e^{-2t_1} + \sigma_1^2 e^{-2t_2}},$$

and

$$P_2(+) = (I - K_2 H_2)P_2(-) = \left(1 - \frac{\sigma_1^2 e^{2t_1-2t_2}}{\sigma_1^2 e^{2t_1-2t_2} + \sigma_2^2}\right)\sigma_1^2 e^{2t_1} = \frac{\sigma_1^2\sigma_2^2}{\sigma_2^2 e^{-2t_1} + \sigma_1^2 e^{-2t_2}},$$

both of which agree with what we computed earlier.

Problem 4-4 (weighted least squares and adding an additional measurement)

After having appended a second measurement the same weighted least squares solution for
x̂ will hold, but with the larger matrices H1 , R1 , and z1 . That is we have

x̂(+) = (H1T R1−1 H1 )−1 H1T R1−1 z1 . (96)

Since the new measurement is uncorrelated with the others R1 is block diagonal so its inverse
is also block diagonal

$$R_1^{-1} = \begin{bmatrix} R_0^{-1} & 0 \\ 0 & R^{-1} \end{bmatrix},$$

and the measurement sensitivity matrix H1 also has a block form given by

$$H_1^T = \begin{bmatrix} H_0^T & H^T \end{bmatrix}.$$

Using these two we see that

$$H_1^T R_1^{-1} H_1 = \begin{bmatrix} H_0^T & H^T \end{bmatrix}\begin{bmatrix} R_0^{-1} & 0 \\ 0 & R^{-1} \end{bmatrix}\begin{bmatrix} H_0 \\ H \end{bmatrix}
= \begin{bmatrix} H_0^T & H^T \end{bmatrix}\begin{bmatrix} R_0^{-1}H_0 \\ R^{-1}H \end{bmatrix}
= H_0^T R_0^{-1} H_0 + H^T R^{-1} H. \qquad (97)$$

The problem states that we should define P (−)−1 as H0T R0−1 H0 so if we define P (+)−1 in
the same way as H1T R1−1 H1 then from Equation 97 we have shown that

P (+)−1 = P (−)−1 + H T R−1 H .

Next lets compute x̂(+) using Equation 96. We first see that

$$H_1^T R_1^{-1} z_1 = \begin{bmatrix} H_0^T & H^T \end{bmatrix}\begin{bmatrix} R_0^{-1} & 0 \\ 0 & R^{-1} \end{bmatrix}\begin{bmatrix} z_0 \\ z \end{bmatrix} = H_0^T R_0^{-1} z_0 + H^T R^{-1} z,$$
so that

x̂(+) = (H1T R1−1 H1 )−1 [H0T R0−1 z0 + H T R−1 z] = P (+)[H0T R0−1 z0 + H T R−1 z]
= P (+)H0T R0−1 z0 + P (+)H T R−1 z , (98)

using the definition that (H1T R1−1 H1 )−1 = P (+). Now P (+) is given in terms of P (−) as

P (+) = [P (−)−1 + H T R−1 H]−1 .

To evaluate this we will use the matrix inversion identity

B −1 = A−1 − B −1 (B − A)A−1 . (99)

with

B = P (−)−1 + H T R−1 H = P (+)−1 and


A = P (−)−1 .

For which we find


P (+) = P (−) − P (+)H T R−1 HP (−) . (100)
When we put this expression for P (+) into the first term in Equation 98 we find

x̂(+) = P (−)H0T R0−1 z0 − P (+)H T R−1 HP (−)H0T R0−1 z0 + P (+)H T R−1 z


= x̂(−) − P (+)H T R−1 H x̂(−) + P (+)H T R−1 z
= x̂(−) + P (+)H T R−1 (z − H x̂(−)) ,

which is the desired expression. In the above simplifications we have used the fact that

x̂(−) = (H0T R0−1 H0 )−1 H0T R0−1 z0 = P (−)H0T R0−1 z0 .

Problem 4-5 (minimizing the scalar loss functional J(x̂))

The given objective function J(x̂) can be expanded and written as

J(x̂) = [x̂ − x(−)]T P (−)−1 [x̂ − x(−)] + (z − H x̂)T R−1 (z − H x̂)


= x̂T P (−)−1 x̂ − x̂T P (−)−1 x(−) − x(−)T P (−)−1 x̂ + x(−)T P (−)−1 x(−)
+ z T R−1 z − z T R−1 H x̂ − x̂T H T R−1 z + x̂T H T R−1 H x̂ .

Then to find the value of x̂ that minimizes this expression we take the derivative of J with
respect to x̂, set the result equal to zero and then solve for x̂. This derivative is given by

$$\frac{\partial J}{\partial\hat{x}} = 2P(-)^{-1}\hat{x} - P(-)^{-1}x(-) - P(-)^{-1}x(-) - H^TR^{-1}z - H^TR^{-1}z + 2H^TR^{-1}H\hat{x}
= 2\left[P(-)^{-1} + H^TR^{-1}H\right]\hat{x} - 2P(-)^{-1}x(-) - 2H^TR^{-1}z.$$
To take the derivative above we have used Equations 311 and 312,

$$\frac{\partial a^T x}{\partial x} = a = \frac{\partial x^T a}{\partial x},$$

and the quadratic derivative Equation 312,

$$\frac{\partial x^T A x}{\partial x} = (A + A^T)x. \qquad (101)$$

Setting the expression ∂J/∂x̂ equal to zero and solving for x̂, which we denote x̂(+), we get

x̂(+) = (P (−)−1 + H T R−1 H)−1(P (−)−1 x(−) + H T R−1 z) ,

as the solution to the expressed minimization problem. Motivated by the expression above
if we define P (+) as
P (+) = (P (−)−1 + H T R−1 H)−1 ,
then the inverse of P (+) is given directly

P (+)−1 = P (−)−1 + H T R−1 H .

Using this definition the above expression for x̂(+) is given as

x̂(+) = P (+)P (−)−1 x(−) + P (+)H T R−1 z ,

and for the first term in the above we can use the matrix inversion lemma as in the previous
problem to write P (+) as given by Equation 100 to obtain

x̂(+) = [P (−) − P (+)H T R−1 HP (−)]P (−)−1 x̂(−) + P (+)H T R−1 z


= x̂(−) − P (+)H T R−1 H x̂(−) + P (+)H T R−1 z
= x̂(−) + P (+)H T R−1 (z − H x̂(−)) , (102)

as we were to show.

As an alternative way to show the desired expressions for x̂(+) and P (+) that does not use
the matrix inversion lemma, we can take the expression for J and write everything in terms
of the estimated vs. prior difference or x̃ = x̂ − x(−). We find that

J = (x̂ − x(−))T P (−)−1 (x̂ − x(−))


+ (z − H(x̂ − x(−) + x(−)))T R−1 (z − H(x̂ − x(−) + x(−)))
= (x̂ − x(−))T P (−)−1 (x̂ − x(−))
+ (z − H(x̂ − x(−)))T R−1 (z − H(x̂ − x(−)))
− (z − H(x̂ − x(−)))T R−1 Hx(−) − x(−)T H T R−1 (z − H(x̂ − x(−)))
+ x(−)T H T R−1 Hx(−) .

As before we will want to take the derivative of J with respect to x̂, set the result equal to
zero and solve for x̂. With the above expression since x(−) is a constant, the derivative with
respect to x̂ is equal to the derivative with respect to the expression x̂ − x(−). If we define
this expression as x̃, we see that J in terms of x̃ can be written as

J = x̃T P (−)−1 x̃ + (z − H x̃)T R−1 (z − H x̃)


− (z − H x̃)T R−1 Hx(−) − x(−)T H T R−1 (z − H x̃)
+ x(−)T H T R−1 Hx(−)
= x̃T P (−)−1 x̃
+ z T R−1 z − z T R−1 H x̃ − x̃T H T R−1 z + x̃T H T R−1 H x̃
− z T R−1 Hx(−) + x̃T H T R−1 Hx(−) − x(−)T H T R−1 z + x(−)T H T R−1 H x̃
+ x(−)T H T R−1 Hx(−) .

Taking the x̃ derivative of this expression gives


∂J
= 2P (−)−1 x̃
∂ x̃
− H T R−1 z − H T R−1 z + 2H T R−1 H x̃
+ H T R−1 Hx(−) + H T R−1 Hx(−)
= 2(P (−)−1 + H T R−1 H)x̃ − 2H T R−1 z + 2H T R−1 Hx(−) .

Setting this derivative equal to zero and solving for x̃ we find

x̃ = (P (−)−1 + H T R−1 H)−1 [−H T R−1 Hx(−) + H T R−1 z]


= (P (−)−1 + H T R−1 H)−1 H T R−1 (z − Hx(−)) .

Thus converting the minimum we just found for x̃ into the variable x̂ with x̃ = x̂ − x(−) we
have that
x̂ = x(−) + (P (−)−1 + H T R−1 H)−1 H T R−1 (z − Hx(−)) ,
the same expression as in Equation 102.

Problem 4-6 (the derivation of the maximum likelihood expression)

Using the definition of conditional probability we have

$$p(z|x) = \frac{p(x,z)}{p(x)} = \frac{p(x)p(v)}{p(x)} = p(v),$$

since the variables x and v are independent. Lets pick the estimate x̂ so that it maximizes
p(z|x), this is known as the maximum likelihood estimate. The probability density function
of the random variable v is said to be a multidimensional normal and is given by

$$p(v) = \frac{1}{(2\pi)^{l/2}|R|^{1/2}}\exp\left(-\frac{1}{2}v^T R^{-1} v\right),$$

where l is the dimension of the measurement noise. Then, since v = z − Hx, as a function of x this is given by

$$p(z|x) = \frac{1}{(2\pi)^{l/2}|R|^{1/2}}\exp\left(-\frac{1}{2}(z - Hx)^T R^{-1}(z - Hx)\right), \qquad (103)$$
so to maximize p(z|x) is equivalent to minimize the product

(z − Hx)T R−1 (z − Hx) = z T R−1 z − z T R−1 Hx − xT H T R−1 z + xT H T R−1 Hx ,

as a function of x. When we take the derivative of this expression and set the result equal to zero we find that

$$\frac{\partial J}{\partial x} = -H^TR^{-1}z - H^TR^{-1}z + 2H^TR^{-1}Hx = 0.$$
Solving for x we find that
x = (H T R−1 H)−1 (H T R−1 z) , (104)
for the maximal likelihood solution. This is the same expression we found in Problem 4.4
above and thus the analysis from that problem is valid here. Namely, if we receive another
measurement z2 , with a measurement sensitivity matrix H2 , and measurement covariance
matrix R2 the recursive update of our state estimate x̂ is given by

x2 = x1 + P (+)H2T R2−1 (z2 − H2 x1 )


P (+)−1 = H1T R1−1 H1 + H2T R2−1 H2 ,

where x1 is the estimate of x before receiving the measurement z2 given by Equation 104
with H = H1 , R = R1 , z = z1 , and x = x1 .

Problem 4-7 (the recursive maximum a posteriori estimate)

Part (a): As x is a Gaussian random variable and a linear transformation of Gaussian


random variables produces another Gaussian random variable, we see that Hx is another
Gaussian random variable. Since v is independent of x and Gaussian and since sums of
independent Gaussian random variables are also Gaussian the random variable Hx + v is
Gaussian. To determine the full distribution of Hx + v, it is sufficient to compute the mean and covariance of z = Hx + v. For the mean of z we have

E[z] = HE[x] + E[v] = H x̂(−) + 0 = H x̂(−) .

For the Cov(z) using independence we find

Cov(z) = Cov(Hx) + Cov(v)


= HCov(x)H T + R = HP (−)H T + R .

Thus z ∼ N(Hx(−), HP (−)H T + R) as we were to show.

Part (b): Using the definition of conditional probability we find

$$p(x|z) = \frac{p(x,z)}{p(z)} = \frac{p(z|x)p(x)}{p(z)} = \frac{p(v)p(x)}{p(z)},$$

where we have used the fact that p(z|x) = p(z − Hx|x) = p(v).
Part (c): Note that from the problem statement we have that x ∼ N(x̂(−), P (−)), from
Part (a) of this problem we have that z ∼ N(Hx(−), HP (−)H T + R), and from Problem 4-6
above that p(z|x) can be expressed using Equation 103. Thus we can compute p(x|z) using each of these components and obtain the functional form presented in the book.

$$p(x|z) = c\,\exp\Big\{-\frac{1}{2}\Big[(x - \hat{x}(-))^T P(-)^{-1}(x - \hat{x}(-)) + (z - Hx)^T R^{-1}(z - Hx) - (z - H\hat{x}(-))^T\left[HP(-)H^T + R\right]^{-1}(z - H\hat{x}(-))\Big]\Big\}.$$

In the above exponential one can see the three major terms that come from p(x), p(z|x),
and p(z) respectively.

Part (d): Since p(x|z) is another Gaussian density, but with an as yet undetermined mean
and covariance, lets denote this unknown mean and covariance by x̂(+) and P (+), and
emphasize this by setting the term in the exponential above equal to

$$-\frac{1}{2}(x - \hat{x}(+))^T P(+)^{-1}(x - \hat{x}(+)).$$
This gives the equation (after we multiply by −2 on both sides)

(x − x̂(+))T P (+)−1 (x − x̂(+)) = (x − x̂(−))T P (−)−1 (x − x̂(−)) + (z − Hx)T R−1 (z − Hx)


− (z − H x̂(−))[HP (−)H T + R]−1 (z − H x̂(−)) .

Expanding the quadratics on both sides of the above expression gives

xT P (+)−1 x − 2x̂(+)T P (+)−1x + x̂(+)T P (+)−1x̂(+)


= xT P (−)−1 x − 2x̂(−)T P (−)−1 x + x̂(−)T P (−)−1 x̂(−)
+ z T R−1 z − 2z T R−1 Hx + xT H T R−1 Hx
− (z − H x̂(−))[HP (−)H T + R]−1 (z − H x̂(−)) .

Equating quadratic and terms in x above we see that P (+)−1 must be given by

P (+)−1 = P (−)−1 + H T R−1 H . (105)

Equating the linear terms in x above we get that

−2x̂(+)T P (+)−1x = −2(x̂(−)T P (−)−1 + z T R−1 H)x ,

or “canceling x” from both sides and taking the transpose we have

P (+)−1x̂(+) = P (−)−1 x̂(−) + H T R−1 z .

Now if we multiply the above by P (+) on the left we end with

x̂(+) = P (+)P (−)−1 x̂(−) + P (+)H T R−1 z , (106)

From Equation 105 we see that

P (+) = (P (−)−1 + H T R−1 H)−1


= P (−) − P (+)H T R−1 HP (−) ,
when we use the matrix inversion lemma given in Equation 99. With this expression we can
write the product of P (+)P (−)−1 as
P (+)P (−)−1 = I − P (+)H T R−1 H , (107)
from which we can conclude that x̂(+) is given by
x̂(+) = (I − P (+)H T R−1 H)x̂(−) + P (+)H T R−1 z
= x̂(−) + P (+)H T R−1 (z − H x̂(−)) . (108)
Proving the results summarized in Equations 105 and 108.

Problem 4-8 (the uncertainty in an estimator of Kalman like form)

The given linear filter we seek is of the form

  dx̂/dt = K' x̂ + K z ,

where K' and K are chosen such that x̂ is unbiased and has the smallest variance among all
estimators of this form. Lets consider the error x̃ defined as x̃ = x̂ − x. This function has a
differential equation given by
  dx̃/dt = dx̂/dt − dx/dt
        = K' x̂ + K z − F x − G w
        = K' x̂ + K (H x + v) − F x − G w
        = K' x̂ + K H x + K v − F x − G w .
Since x̂ = x̃ + x we have that dx̃/dt in terms of x̃ and x is given by

  dx̃/dt = K' x̃ + (K' + K H − F) x + K v − G w .
Then to be unbiased for all x we must pick K ′ and K such that
K ′ + KH − F = 0 or K ′ = F − KH . (109)
With this expression for K ′ our estimator is then given by solving the following
x̂˙ = (F − KH)x̂ + Kz
= F x̂ + K(z − H x̂) , (110)
for x̂. With this choice for K' the expression for dx̃/dt has no terms involving the unknown x
and is given by

  dx̃/dt = K' x̃ + K v − G w .
If we define P (t) to be P (t) = E[x̃x̃T ] from the above we see that
Ṗ = K ′ P + P K ′T + Cov(Kv − Gw)
= K ′ P + P K ′T + KCov(v)K T + GCov(w)GT
= K ′ P + P K ′T + KRK T + GQGT .
When we put in the expression for K ′ found above we obtain

Ṗ = (F − KH)P + P (F − KH)T + KRK T + GQGT . (111)

Now we want to find the value of K such that our objective function J = trace(Ṗ ) is a
minimum. To find this value of K lets first compute the expression for trace(Ṗ ). Using
Equation 111 we find

J = trace(Ṗ )
= trace(F P ) + trace(P F T ) + trace(GQGT )
− trace(KHP ) − trace(P H T K T ) + trace(KRK T ) .
Next we need to evaluate ∂J/∂K. To do this we will recall the following matrix derivative facts


  ∂/∂A trace(BAC) = B^T C^T   so that    (112)
  ∂/∂A trace(AC) = I^T C^T = C^T
  ∂/∂A trace(C A^T) = ∂/∂A trace(A C^T) = I^T C = C   and
  ∂/∂A trace(A B A^T) = 2 A B .    (113)
Using these results we find that ∂J/∂K is given by

  ∂J/∂K = −P H^T − P H^T + 2 K R .
Setting this derivative equal to zero and solving for K gives

K = P H T R−1 , (114)

as we were to show.

Problem 4-9 (questions about Kalman filters)

Warning: I’m not sure exactly what this problem was asking or how to answer it. If anyone
has an idea of the type of solution requested please contact me.

Problem 4-10 (recursive scalar estimation)

That the estimator m̂k is unbiased can be seen by taking the expectation of its expression
  E[m̂_k] = (1/k) Σ_{i=1}^{k} E[x_i] = (1/k) Σ_{i=1}^{k} m = m ,
where we have used the fact that the expectation of any given sample is the same as the
population mean or E[xi ] = m.

To show that the estimate of σ 2 is an unbiased estimator of the population variance we will
assume that the samples xi are drawn from a Gaussian distribution with a population mean
m and variance σ 2 . Then it can be shown that σ̂k2 as defined in this problem is related to a
chi-squared distribution in that the random variable

  (k − 1) σ̂_k^2 / σ^2 ,
is distributed as a χ2 random variable with k − 1 degrees of freedom [2, 3]. Recalling that if
the random variable, say X, is χ2 with k − 1 degrees of freedom then the expectation of X
is
E[X] = k − 1 , (115)
so that since (k − 1) σ̂_k^2 / σ^2 is also χ^2 with k − 1 degrees of freedom

  E[ (k − 1) σ̂_k^2 / σ^2 ] = k − 1 ,

but at the same time

  E[ (k − 1) σ̂_k^2 / σ^2 ] = ((k − 1)/σ^2) E[σ̂_k^2] .
Setting these two expressions equal to each other and solving for E[σ̂k2 ] gives

E[σ̂k2 ] = σ 2 ,

showing that the estimator σ̂k2 is unbiased.

To derive a recursive form for an estimator for the mean m note that from the given expression
for m̂_k we have

  m̂_k = (1/k) Σ_{i=1}^{k} x_i = ((k − 1)/k) ( (1/(k − 1)) Σ_{i=1}^{k−1} x_i ) + (1/k) x_k
       = ((k − 1)/k) m̂_{k−1} + (1/k) x_k ,    (116)
showing how given m̂k−1 and xk we can obtain the estimate m̂k .

To derive a recursive form for an estimator for the variance σ^2 we follow much the same
manipulations we did for the mean. We find

  σ̂_k^2 = (1/(k − 1)) Σ_{i=1}^{k} (x_i − m̂_k)^2
        = (1/(k − 1)) Σ_{i=1}^{k} (x_i^2 − 2 x_i m̂_k + m̂_k^2)
        = (1/(k − 1)) Σ_{i=1}^{k} x_i^2 − (2/(k − 1)) m̂_k Σ_{i=1}^{k} x_i + (k/(k − 1)) m̂_k^2
        = (1/(k − 1)) Σ_{i=1}^{k} x_i^2 − (k/(k − 1)) m̂_k^2    (117)
        = (1/(k − 1)) ( Σ_{i=1}^{k−1} x_i^2 + x_k^2 ) − (k/(k − 1)) m̂_k^2 .    (118)

Lets now decrease the index k in Equation 117 so that we can derive an expression for
Σ_{i=1}^{k−1} x_i^2 (note the upper limit on this summation of k − 1). We find

  σ̂_{k−1}^2 = (1/(k − 2)) Σ_{i=1}^{k−1} x_i^2 − ((k − 1)/(k − 2)) m̂_{k−1}^2 ,

so that the sum Σ_{i=1}^{k−1} x_i^2 is given by

  Σ_{i=1}^{k−1} x_i^2 = (k − 2) σ̂_{k−1}^2 + (k − 1) m̂_{k−1}^2 .

When we put this expression into Equation 118 we get


  σ̂_k^2 = (1/(k − 1)) [ (k − 2) σ̂_{k−1}^2 + (k − 1) m̂_{k−1}^2 + x_k^2 ] − (k/(k − 1)) m̂_k^2
        = ((k − 2)/(k − 1)) σ̂_{k−1}^2 + m̂_{k−1}^2 − (k/(k − 1)) m̂_k^2 + (1/(k − 1)) x_k^2 .    (119)

The above expression is a recursive representation for σ̂_k^2 that requires storing and computing
the last and most recent estimate of the mean m̂_k. Since we can express m̂_k recursively in
terms of m̂_{k−1} via Equation 116, if desired we could put this expression into the above and
derive an alternative recursive expression for σ̂_k^2 that only involves the “new” measurement
x_k and the old estimates σ̂_{k−1}^2 and m̂_{k−1}, that is, it does not depend on m̂_k.

Problem 4-11 (the system ẋ = ax + w with measurements z = bx + v)

For this problem everything is a scalar and we have F = a, H = b, G = 1, Q = q, and
R = r. Since the process and measurement noise are uncorrelated the Kalman gain is given
by K = P H^T R^{-1} = p(t) b / r. The error covariance propagation equation is thus given by

  ṗ(t) = 2 a p(t) + q − r (p(t) b/r)(p(t) b/r)
       = 2 a p(t) − (b^2/r) p(t)^2 + q ,    (120)
with an initial condition p(0) = p0 . Thus to determine p(t) as a function of t we need to
solve the above differential equation. This type of equation is known as a Riccati equation
and can be transformed into a second order linear equation which can possibly be solved
more easily. Note if q = 0 this non-linear equation is known as a Bernoulli equation. Next
we outline the solution to this equation. See [8] for more specific details. The general Riccati
equation is given by
  dy/dx = P(x) + Q(x) y + R(x) y^2 ,    (121)

for arbitrary functions P(x), Q(x), and R(x). To solve this equation we begin by finding an
initial solution y_1 to this equation. Once we have an initial solution, if we define z(x) as

  z(x) = 1/(y(x) − y_1) ,    (122)

or

  y(x) = y_1 + 1/z(x) ,
then when we put the above expression for y(x) into Equation 121 we get the following
differential equation for z(x)

  dz/dx = −( Q(x) + 2 y_1 R(x) ) z(x) − R(x) .
The latter is a first order equation for z(x) which we can solve by quadrature. For the specific
problem given here, the initial solution y_1 needed to proceed will be the steady-state or
constant solution. When we take ṗ = 0 and denote the solution by p_∞ in Equation 120 we
have

  −(b^2/r) p_∞^2 + 2 a p_∞ + q = 0 .

When we solve for p_∞ in the above quadratic we find

  p_∞ = (a r / b^2) ( 1 ± sqrt( 1 + b^2 q/(a^2 r) ) ) .    (123)

Since p_∞ > 0 we must take the positive sign in the above expression. Next we let
z(t) = 1/(p(t) − p_∞) and, since P(t) = q, Q(t) = 2a, and R(t) = −b^2/r in the general Riccati
solution formulation, find the equation for z(t) given by

  z'(t) = −( 2a + 2 p_∞ (−b^2/r) ) z − ( −b^2/r )
        = −( 2a − (2 b^2/r) p_∞ ) z + b^2/r = ( 2 sqrt(b^2 q + a^2 r)/sqrt(r) ) z + b^2/r ,

when we put in p_∞ and simplify. Consider the coefficient of z(t) in the above equation

  2 sqrt( b^2 q/r + a^2 ) = 2 sqrt( a^2 ( 1 + b^2 q/(a^2 r) ) ) = 2 |a| sqrt( 1 + b^2 q/(a^2 r) ) = 2β ,

where we have defined β in the last equality. Thus for z(t) we need to solve

  z'(t) = 2β z(t) + b^2/r .

When we do this for z(0) = z_0 we find

  z(t) = ( −b^2 + b^2 e^{2βt} + 2 r β z_0 e^{2βt} ) / (2 β r) .

Thus

  p(t) = p_∞ + 1/z(t) = p_∞ + 2 β r / ( b^2 (−1 + e^{2βt}) + 2 r β z_0 e^{2βt} ) .

From this latter expression we see that as t → ∞ we have p(t) → p_∞ as it should. Since
p(0) = p_0, when we let t = 0 we find that p_0 = p_∞ + 1/z_0 or z_0 = 1/(p_0 − p_∞). Thus

  p(t) = p_∞ + 2 β r (p_0 − p_∞) / ( b^2 (p_0 − p_∞)(−1 + e^{2βt}) + 2 r β e^{2βt} ) .    (124)

Now note that from the definition of β we have

  β = a sqrt( 1 + b^2 q/(a^2 r) ) = (b^2/r) p_∞ − a ,

so p_∞ in terms of β is given by

  p_∞ = (r/b^2) (β + a) .

When we convert the exponentials above into the hyperbolic functions sinh(·) and cosh(·)
and replace p_∞ with the above expression in terms of β we find that we can represent p(t) as

  p(t) = r [ ( a p_0 − (r/b^2)(a^2 − β^2) ) sinh(βt) + β p_0 cosh(βt) ] / [ ( b^2 p_0 − a r ) sinh(βt) + β r cosh(βt) ] .

Dividing by r on the top and the bottom of this expression and noting that

  (r/b^2)(a^2 − β^2) = (r/b^2)( a^2 − a^2 ( 1 + b^2 q/(a^2 r) ) ) = (r/b^2)( −b^2 q/r ) = −q ,

the above becomes

  p(t) = [ (a p_0 + q) sinh(βt) + β p_0 cosh(βt) ] / [ ( (b^2 p_0/r) − a ) sinh(βt) + β cosh(βt) ] ,    (125)
as we were to show. In the Mathematica file chap 4 prob 11.nb we perform much of the
algebra not displayed in the above derivation.
Problem 4-12 (Kalman filtering a second order system)

The given diagram from the book for this problem implies that ẋ_1 = w and

  x_2 = ∫ (x_1 − β x_2) dτ .

Thus as a system of differential equations our system is given by

  ẋ_1 = w
  ẋ_2 = x_1 − β x_2 ,

or in matrix form the above is

  d/dt [ x_1 ; x_2 ] = [ 0  0 ; 1  −β ] [ x_1 ; x_2 ] + [ w ; 0 ] ,

from which we recognize that F = [ 0  0 ; 1  −β ] and G Q G^T = [ q  0 ; 0  0 ].

The measurement we observe z(t) is related to the state as z = α x_2 + v, and so the
measurement sensitivity matrix H is given by H = [ 0  α ] and R = r. Using these pieces the
matrix Riccati differential equation given by

Ṗ = F P + P F T + GQGT − P H T R−1 HP .

then becomes in steady-state (Ṗ = 0) the following system

  0 = [ q − α^2 p12^2/r                       p11 − β p12 − α^2 p12 p22/r ;
        p11 − β p12 − α^2 p12 p22/r           2 p12 − 2 β p22 − α^2 p22^2/r ] .

Solving for p12 using the (1, 1) component above gives

  p12 = ± sqrt(r q)/α .    (126)
When we put that value into the (2, 2) component of the above expression we get

  ±2 sqrt(q r)/α − 2 β p22 − (α^2/r) p22^2 = 0 ,

or as a quadratic equation in standard form

  (α^2/r) p22^2 + 2 β p22 ∓ 2 sqrt(q r)/α = 0 .
Since we have two signs in the above expression and we have two solutions for each individual
quadratic we have four possible solutions for p22 . Solving these using the quadratic equation
gives

  p22 = [ −2β ± sqrt( 4β^2 − 4 (α^2/r)( ∓ 2 sqrt(q r)/α ) ) ] / ( 2 α^2/r )
      = (r/α^2) ( −β ± sqrt( β^2 ± 2 α sqrt(q/r) ) ) .

Since p22 > 0 we must take signs such that the resulting expression is positive. Since we are
not explicitly told the signs of the variables β and α, lets assume that β > 0. In that case
to guarantee that p22 > 0 we must take both signs above to be positive. Thus we have

  p22 = (r/α^2) ( −β + sqrt( β^2 + 2 α sqrt(q/r) ) ) .

Now using this expression in the (1, 2) component gives for p11

  p11 = β p12 + (α^2/r) p12 p22
      = ±β sqrt(q r)/α + (α^2/r)( ± sqrt(q r)/α )( r/α^2 )( −β + sqrt( β^2 + 2 α sqrt(q/r) ) )
      = ± ( sqrt(q r)/α ) sqrt( β^2 + 2 α sqrt(q/r) ) .

As p11 > 0 we must take the positive sign in the above expression, which means that we
know the complete expression for p12 given by Equation 126. Now to compute K(∞) we
note that

  K(∞) = P(∞) H^T R^{-1}
        = (1/r) [ p11  p12 ; p12  p22 ] [ 0 ; α ] = (α/r) [ p12 ; p22 ]
        = (α/r) [ sqrt(q r)/α ; (r/α^2)( −β + sqrt( β^2 + 2 α sqrt(q/r) ) ) ]
        = [ sqrt(q/r) ; (β/α)( −1 + sqrt( 1 + (2α/β^2) sqrt(q/r) ) ) ] ,

as we were to show. In the Mathematica file chap 4 prob 12.nb we perform some of the
algebra not displayed in the above derivation.

Problem 4-13 (the optimal filter for detecting a sine wave in white noise)

Warning: I was not able to solve this problem. If anyone has an attempted solution I would
be interested in seeing it.

Problem 4-14 (an integrator driven by white noise)

As a continuous system from the problem description the output x(t) of our integrator would
satisfy
ẋ = w ,
where w(t) is a white noise process. If we discretize this process we get the discrete system
of
xk+1 = xk + wk ,
where now we have that wk ∼ N(0, q∆). We are told that the observation equation is given
by
z_k = x_k + v_k .
With no a priori information we have P_0(+) = +∞, and to compute the a posteriori
covariance matrix after each measurement in this problem we will use

Pk (+)−1 = Pk (−)−1 + HkT Rk−1 Hk .

From the equations above we can make the association to the standard problem that Φk = I,
Gk = I, Qk = q∆, Hk = 1, and Rk = r0 .

Part (a): In this case we are told to assume that q∆ ≫ r_0. Now we have P_0(+) = +∞, since
there is no a priori information and we get P1 (−) from

P1 (−) = P0 (+) + q∆ = +∞ .

Then P1 (+) after the first measurement is given by


  P_1(+)^{-1} = P_1(−)^{-1} + 1/r_0 = 1/r_0  ⇒  P_1(+) = r_0 .
For the variance before the second measurement or P2 (−) we get

P2 (−) = P1 (+) + q∆ = r0 + q∆ .

For the updated variance after the second measurement P2 (+) we get
  P_2(+)^{-1} = P_2(−)^{-1} + 1/r_0 = 1/(r_0 + q∆) + 1/r_0 ≈ 1/r_0  ⇒  P_2(+) = r_0 ,
since q∆ ≫ r0 . Now P3 (−) is given by

P3 (−) = P2 (+) + q∆ = r0 + q∆ ,

and P3 (+) is given by


  P_3(+)^{-1} = P_3(−)^{-1} + 1/r_0 = 1/(r_0 + q∆) + 1/r_0 ≈ 1/r_0  ⇒  P_3(+) = r_0 .
Continuing the pattern above we conclude that

Pk (+) = r0 ,

and
Pk+1(−) = r0 + q∆ ≈ q∆ ,
when q∆ ≫ r0 . This corresponds to the case where the object we are filtering has very
large process noise, so that at each timestep when we propagate between measurements we
effectively “lose” the object. The measurements are considerably more accurate so when
we take a measurement we have a much tighter uncertainty around the tracked object.

Part (b): For this part we assume that r0 ≫ q∆ and follow the outline as in the previous
part. Again we start with P0 (+) = +∞, since there is no a priori information. Then we get
P1 (−) from
P1 (−) = P0 (+) + q∆ = +∞ .
Then P1 (+) is given by
  P_1(+)^{-1} = P_1(−)^{-1} + 1/r_0 = 1/r_0  ⇒  P_1(+) = r_0 .
Then for P2 (−) we get

P2 (−) = P1 (+) + q∆ = r0 + q∆ ≈ r0 .

Then P2 (+) is given by


  P_2(+)^{-1} = P_2(−)^{-1} + 1/r_0 = 1/(r_0 + q∆) + 1/r_0 = (2 r_0 + q∆)/( r_0 (r_0 + q∆) ) ,

so

  P_2(+) = r_0 (r_0 + q∆)/(2 r_0 + q∆) = (r_0/2) ( 1 + q∆/r_0 )/( 1 + q∆/(2 r_0) ) ≈ r_0/2 ,

since r0 ≫ q∆. Now P3 (−) is given by


  P_3(−) = P_2(+) + q∆ = r_0/2 + q∆ ≈ r_0/2 ,

and P_3(+) is given by

  P_3(+)^{-1} = P_3(−)^{-1} + 1/r_0 = 2/r_0 + 1/r_0 = 3/r_0  ⇒  P_3(+) = r_0/3 .
Doing one more iteration for completeness we find P4 (−) given by
  P_4(−) = P_3(+) + q∆ = r_0/3 + q∆ ≈ r_0/3 ,

and P_4(+) given by

  P_4(+)^{-1} = P_4(−)^{-1} + 1/r_0 = 3/r_0 + 1/r_0 = 4/r_0  ⇒  P_4(+) = r_0/4 .
Continuing the pattern above we conclude that
  P_k(+) = r_0/k = P_{k+1}(−) ,    (127)

for k > 0 when r_0 ≫ q∆. This case corresponds to the situation where the dynamics has very
little process noise so once we have “found” the object we are able to keep hold of it relatively
easily. As the initial uncertainty is infinite, each measurement reduces the uncertainty in an
algebraic (1/k) manner while propagation introduces essentially no additional uncertainty; see Equation 127.
Problem 4-15 (an expression for Pa (T + ))

 
For this problem we are told to take as our state the vector x = [ δp(0) ; δv(0) ; δa(0) ]. This is
different from the state vector specified in example 4.2-4 in that this state is a constant vector
of initial conditions, while example 4.2-4 in the book used the time dependent state given by
x(t) = [ δp(t) ; δv(t) ; δa(t) ], where each function in the state is the appropriate integral of the
one below it. The constant state for this problem then satisfies the null dynamics given by
dx/dt = 0, which has the fundamental solution Φ = I. We assume that our initial uncertainty
in these constants before the measurement at time T is given by
   
  P(0) = [ p11(0)  0  0 ; 0  p22(0)  0 ; 0  0  p33(0) ] = [ E[δp^2(0)]  0  0 ; 0  E[δv^2(0)]  0 ; 0  0  E[δa^2(0)] ] .

The discrete state and covariance extrapolation equations from the time 0 to T − the time
just before the first measurement fix gives
 
  x̂(T^−) = I x̂(0) = [ δp(0) ; δv(0) ; δa(0) ] ,

and P (T − ) = P (0). Because our state x is independent of time the given measurement z(t)
requires that the measurement sensitivity matrix H now be a function of time because
 
  z(t) = −δp(t) + e_p = −[ 1  t  t^2/2 ] [ δp(0) ; δv(0) ; δa(0) ] + e_p ,

so the measurement sensitivity matrix is given by

  H(t) = −[ 1  t  t^2/2 ] .

With this definition of H we next compute some of the factors needed in computing the a
posteriori state and covariance update equations. One expression we require is

  H(T) P(T^−) H(T)^T = p11(0) + T^2 p22(0) + (T^4/4) p33(0) .
From this point on to simplify the notation we will write p11 (0) as p11 dropping the argument
of zero (we follow the same convention for the other expressions). To evaluate P (T + ) we
could use Equation 66 with R = σp2 or we can use the inverse update formulation given by
Equation 61 which gives

  P(T^+)^{-1} = P(T^−)^{-1} + H(T)^T R(T)^{-1} H(T)
             = P(T^−)^{-1} + (1/σ_p^2) [ 1 ; T ; T^2/2 ] [ 1  T  T^2/2 ] .
The book suggests that we invert the right-hand-side of this using the Sherman-Morrison-
Woodbury formula

  (A + U V^T)^{-1} = A^{-1} − A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1} .    (128)

Using this expression we can compute P(T^+), with the associations A = P(T^−)^{-1} and

  U = V = (1/σ_p) [ 1 ; T ; T^2/2 ] .

The following algebra, required to derive the expression quoted in the text, is rather tedious
and can be skipped if desired. First we evaluate the factor I + V T A−1 U and find
  I + V^T A^{-1} U = I + V^T P(T^−) U
                   = 1 + (1/σ_p^2) [ 1  T  T^2/2 ] P(T^−) [ 1 ; T ; T^2/2 ]
                   = 1 + (1/σ_p^2) ( p11 + p22 T^2 + p33 T^4/4 ) .

Next the expression M = A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1} is given by

  M = (1/σ_p^2) P(T^−) [ 1 ; T ; T^2/2 ] ( 1 + (1/σ_p^2)( p11 + p22 T^2 + p33 T^4/4 ) )^{-1} [ 1  T  T^2/2 ] P(T^−)
    = ( σ_p^2 + p11 + p22 T^2 + p33 T^4/4 )^{-1} P(T^−) [ 1  T  T^2/2 ; T  T^2  T^3/2 ; T^2/2  T^3/2  T^4/4 ] P(T^−) .

Note that from the definition of ∆a (T ) given we can simplify the denominator above as
  σ_p^2 + p11 + p22 T^2 + p33 T^4/4 = p11 p22 p33 ∆_a(T) .    (129)

When we use this in M we get

  M = ( 1/( p11 p22 p33 ∆_a(T) ) ) [ p11^2          p11 p22 T        p11 p33 T^2/2 ;
                                     p22 p11 T      p22^2 T^2        p22 p33 T^3/2 ;
                                     p33 p11 T^2/2  p33 p22 T^3/2    p33^2 T^4/4 ] .
Then the expression for P(T^+) looks like

  P(T^+) = P(T^−) − M
         = (1/∆_a(T)) [ ∆_a(T) p11  0  0 ; 0  ∆_a(T) p22  0 ; 0  0  ∆_a(T) p33 ]
           − (1/∆_a(T)) [ p11/(p22 p33)   T/p33             T^2/(2 p22) ;
                          T/p33           p22 T^2/(p11 p33) T^3/(2 p11) ;
                          T^2/(2 p22)     T^3/(2 p11)       p33 T^4/(4 p11 p22) ] .
So ∆_a(T) P(T^+) then looks like

  [ ∆_a(T) p11 − p11/(p22 p33)   −T/p33                          −T^2/(2 p22) ;
    −T/p33                        ∆_a(T) p22 − p22 T^2/(p11 p33)  −T^3/(2 p11) ;
    −T^2/(2 p22)                  −T^3/(2 p11)                    ∆_a(T) p33 − p33 T^4/(4 p11 p22) ] .

One more simplification (that we don’t fully document) gives the requested result: we take each of
the diagonal elements in the expression for P(T^+) and simplify using the definition of ∆_a(T)
given in Equation 129. For example the (1, 1) element becomes

  ( 1/(p22 p33) )( σ_p^2 + p11 + p22 T^2 + p33 T^4/4 ) − p11/(p22 p33) = σ_p^2/(p22 p33) + T^2/p33 + T^4/(4 p22) ,

which is the quoted expression in the book. Simplifying the other diagonal terms gives rise
to the desired expression for P (T + ).

Problem 4-16 (single-star vs. two-star fixes)

The single-star fix: We are told that our first measurement gives us an estimate of θ1 and
θ2 . Lets assume (for this part and the next) that there is no dynamics in this problem and we
just want to observe how the single star and double star fixes change our state uncertainty
estimates. For the single star fix the measurement vector z is related to the state by

  z = [ z_1 ; z_2 ] = [ 1  0  0 ; 0  1  0 ] [ θ_1 ; θ_2 ; θ_3 ] + [ v_1 ; v_2 ] ,

with the measurement noise vector [ v_1 ; v_2 ] ∼ N( 0, [ σ_1^2  0 ; 0  σ_2^2 ] ). Then we update the a
v2 0 σ22
priori covariance to account for this measurement using the standard a posteriori update
equation
P (+) = P (−) − P (−)H T (HP (−)H T + R)−1 HP (−) . (130)
To evaluate this we find that the product H P(−) is given by

  H P(−) = [ 1  0  0 ; 0  1  0 ] [ σ^2  0  0 ; 0  σ^2  0 ; 0  0  σ^2 ] = [ σ^2  0  0 ; 0  σ^2  0 ] .

The matrix P(−) H^T is the transpose of this or [ σ^2  0 ; 0  σ^2 ; 0  0 ]. Next we compute H P(−) H^T
and find

  H P(−) H^T = [ 1  0  0 ; 0  1  0 ] [ σ^2  0 ; 0  σ^2 ; 0  0 ] = [ σ^2  0 ; 0  σ^2 ] .
With this we have

  ( H P(−) H^T + R )^{-1} = [ σ^2 + σ_1^2  0 ; 0  σ^2 + σ_2^2 ]^{-1} = [ 1/(σ^2 + σ_1^2)  0 ; 0  1/(σ^2 + σ_2^2) ] ,

and P(+) is then given by

  P(−) − P(−) H^T ( H P(−) H^T + R )^{-1} H P(−)
    = [ σ^2 − σ^4/(σ^2 + σ_1^2)   0                          0 ;
        0                         σ^2 − σ^4/(σ^2 + σ_2^2)    0 ;
        0                         0                          σ^2 ] .    (131)

From this we see that the trace of this expression is

  trace(P(+)) = 3 σ^2 − σ^4/(σ^2 + σ_1^2) − σ^4/(σ^2 + σ_2^2)
             ≈ 3 σ^2 − σ^2 − σ^2 = σ^2 ,
when we assume that σ 2 ≫ σi2 .

The two-star fix: For the two-star fix we follow the one-star fix with another pair of
measurements of the angles θ1 and θ3 . In this case the second measurement vector has the
form

  z = [ 1  0  0 ; 0  0  1 ] [ θ_1 ; θ_2 ; θ_3 ] + [ v_1 ; v_3 ] ,

with

  [ v_1 ; v_3 ] ∼ N( 0, [ σ_1^2  0 ; 0  σ_3^2 ] ) .

Thus in this case we have that H = [ 1  0  0 ; 0  0  1 ] and R = [ σ_1^2  0 ; 0  σ_3^2 ]. Performing the
same manipulations as above but with these different H and R matrices and using the value
computed for P (+) in Equation 131 for the value of P (−) in Equation 130 (the second
measurement directly follows the first) we find that P (+) after both measurements is given
by

  P(+) = [ σ^2 σ_1^2/(2σ^2 + σ_1^2)   0                          0 ;
           0                          σ^2 σ_2^2/(σ^2 + σ_2^2)    0 ;
           0                          0                          σ^2 σ_3^2/(σ^2 + σ_3^2) ]
       ≈ [ σ_1^2/2  0  0 ; 0  σ_2^2  0 ; 0  0  σ_3^2 ] ,

when we use σ^2 ≫ σ_i^2 to simplify terms like

  σ^2 σ_i^2/( n σ^2 + σ_i^2 ) ≈ σ^2 σ_i^2/( n σ^2 ) = σ_i^2/n .
From the above we find trace(P (+)) to be given by
  trace(P(+)) = σ_1^2/2 + σ_2^2 + σ_3^2 ,
as we were to show.

In the Mathematica file chap 4 prob 16.nb we perform some of the algebra not displayed
in the above derivation.
Problem 4-17 (a polynomial tracking filter)

 
The zero forcing dynamic equation ẍ = 0, when we introduce the state x = [ x_1 ; x_2 ] defined
by x_1(t) = x(t) and x_2(t) = ẋ(t), has components that satisfy

  ẋ_1(t) = ẋ(t) = x_2(t)
  ẋ_2(t) = ẍ(t) = 0 ,

so that our equation ẍ = 0 has the following companion form

  d/dt [ x_1 ; x_2 ] = [ 0  1 ; 0  0 ] [ x_1 ; x_2 ] .
 
The measurements for this problem are given by z = x_1 + v = [ 1  0 ] [ x_1 ; x_2 ] + v, so the
matrices H and R are given by H = [ 1  0 ] and R = r. The fundamental solution, Φ, to
the above companion form representation can be computed as

  Φ(t, 0) = e^{Ft} = I + [ 0  1 ; 0  0 ] t = [ 1  t ; 0  1 ] .

To derive the requested expression for Pk+1(+) we sequentially perform error covariance
extrapolation followed by error covariance updates until we get to the discrete time tk+1 =
(k + 1)τ . The error covariance extrapolation equation is explicitly given by

Pk+1 (−) = Φ(τ, 0)Pk (+)Φ(τ, 0)T , (132)

and is subsequently followed by an error covariance update step which can be written as
  P_{k+1}(+)^{-1} = P_{k+1}(−)^{-1} + H_{k+1}^T R_{k+1}^{-1} H_{k+1}
                 = P_{k+1}(−)^{-1} + (1/r) [ 1  0 ; 0  0 ] .    (133)

Once we have computed the matrix Pk+1 (+) we can compute Kk+1 via Equation 62 which
in this case becomes
 
  K_{k+1} = P_{k+1}(+) H_{k+1}^T R_{k+1}^{-1} = (1/r) P_{k+1}(+) [ 1 ; 0 ] .    (134)

While we have not yet derived the quoted expression for P_{k+1}(+), if we assume that it is correct
and compute K_{k+1} with the above formula we get

  K_{k+1} = (1/r) P_{k+1}(+) [ 1 ; 0 ] = (1/r) ( 2r/((k + 1)(k + 2)) ) [ 2k + 1 ; 3/τ ]
          = ( 2/((k + 1)(k + 2)) ) [ 2k + 1 ; 3/τ ] ,
which is the expression given. Thus to finish this problem it remains to derive the expression
for Pk+1 (+). From Equations 132 and 133 we can combine these two expressions into one to
get
 
  P_{k+1}(+)^{-1} = ( Φ(τ, 0) P_k(+) Φ(τ, 0)^T )^{-1} + (1/r) [ 1  0 ; 0  0 ]
                 = ( [ 1  τ ; 0  1 ] P_k(+) [ 1  0 ; τ  1 ] )^{-1} + (1/r) [ 1  0 ; 0  0 ] .    (135)
Following the hint in the book, if we begin these iterations with P_0(+) = (1/ε) I we find that

  P_1(+) = ( 1/(1 + ε r + τ^2) ) [ r(1 + τ^2)  rτ ; rτ  r + 1/ε ] .
We cannot take the limit of this as ǫ → 0 so we iterate Equation 135 another time to get an
expression for P2 (+). When we do this we find that we can set ǫ = 0 and get a well defined
expression. The resulting expression is
 
  P_2(+) = [ r  r/τ ; r/τ  2r/τ^2 ] .
Iterating Equation 135 a third time on the above matrix gives

  P_3(+) = [ 5r/6  r/(2τ) ; r/(2τ)  r/(2τ^2) ] .

Both of these expressions agree with the stated result for P_{k+1}(+) when we take k = 1 and
k = 2. If we hypothesize that

  P_{k+1}(+) = ( 2r/((k + 1)(k + 2)) ) [ 2k + 1  3/τ ; 3/τ  6/(k τ^2) ] ,

we can then use Equation 135 to show by induction that the above expression for Pk+1(+)
is valid for all k.

Note that in the Mathematica file chap 4 prob 17.nb we perform some of the algebra not
displayed in the above derivation.

Problem 4-18 (the optimal differentiator)

If we define y(t) by y(t) = M(t)x(t) then y(t) satisfies the system


ẏ = Ṁx + M ẋ
= Ṁx + M(F x + Gw)
= (Ṁ + MF )x + MGw .
As this is a linear transformation of x(t) which is itself a Gaussian random process y(t)
will also be a Gaussian random process and the estimate of its mean will be the optimal a
posteriori estimate. Since E[w] = 0 we have that the mean of y(t) has dynamics given by
ŷ˙ = E[ẏ] = (Ṁ + MF )E[x] = (Ṁ + MF )x̂ ,
the claimed equation.
Problem 4-19 (the determinant of the posteriori covariance matrix Pk (+))

The discrete covariance matrix update equation is given by

Pk (+) = (I − Kk Hk )Pk (−) , (136)

where Kk is the Kalman gain given by Kk = Pk (−)HkT (Hk Pk (−)HkT + Rk )−1 . To derive the
requested determinant first consider the following manipulations of the product Hk Kk . We
have

Hk Kk = Hk Pk (−)HkT (Hk Pk (−)HkT + Rk )−1


= (Hk Pk (−)HkT + Rk − Rk )(Hk Pk (−)HkT + Rk )−1
= I − Rk (Hk Pk (−)HkT + Rk )−1 .

Thus if we multiply Equation 136 on the left by Hk we get

Hk Pk (+) = Hk Pk (−) − Hk Kk Hk Pk (−) .

When we put in the expression just derived for Hk Kk into the above we get

Hk Pk (+) = Hk Pk (−) − (I − Rk (Hk Pk (−)HkT + Rk )−1 )Hk Pk (−)


= Rk (Hk Pk (−)HkT + Rk )−1 Hk Pk (−) ,

the initial expression requested. Taking the determinant of both sides of this then gives

|Hk ||Pk (+)| = |Rk ||Hk Pk (−)HkT + Rk |−1 |Hk ||Pk (−)| .

We can divide both sides of this equation by |Hk | since Hk is invertible to get

|Rk ||Pk (−)|


|Pk (+)| = ,
|Hk Pk (−)HkT + Rk |

the expression we desired.

Problem 4-20 (filtering with a uniform distribution)

Lets look for an optimal linear estimator of the following form for processing the kth mea-
surement zk
x̂k (+) = kk′ x̂k (−) + kk zk .
Introducing the a priori and a posteriori estimation errors x̃k (±) = x̂k (±) − xk , and the
measurement equation z_k = x_k + v_k in the above equation we have a recursive update of
x̃_k(+) given by

  x̃_k(+) = [ k_k' + k_k − 1 ] x_k + k_k' x̃_k(−) + k_k v_k .

To be unbiased requires, since E[v_k] = 0, that k_k' = 1 − k_k and we have an estimator of
x̂k (+) = (1 − kk )x̂k (−) + kk zk .
To determine the value of kk consider
pk (+) = E{x̃k (+)x̃k (+)T }
= E{(1 − kk )x̃k (x̃k (1 − kk ) + kk vk ) + kk vk (x̃k (−)(1 − kk ) + kk vk )}
= (1 − kk )2 E{x̃k (−)2 } + 2(1 − kk )kk E{x̃k (−)vk } + kk2 E{vk2 }
       = (1 − k_k)^2 p_k(−) + k_k^2 q^2/12 .
Where we have used
  E[v_k^2] = (1/q) ∫_{−q/2}^{q/2} x^2 dx = (2/q) ∫_0^{q/2} x^2 dx
           = (2/q) [ x^3/3 ]_0^{q/2} = q^2/12 .
To find the value of kk that makes pk (+) a minimum we take the derivative and set the
results equal to zero and solve for kk . We find for the derivative
  2 (1 − k_k)(−1) p_k(−) + k_k q^2/6 = 0 ,

or

  k_k = p_k(−) / ( p_k(−) + q^2/12 ) ,    (137)

so

  1 − k_k = (q^2/12) / ( p_k(−) + q^2/12 ) .
With this value of kk the covariance pk (+) becomes
  p_k(+) = ( (q^2/12)^2/( p_k(−) + q^2/12 )^2 ) p_k(−) + ( p_k(−)^2/( p_k(−) + q^2/12 )^2 ) (q^2/12)
         = (q^2/12) p_k(−) / ( p_k(−) + q^2/12 ) .
Since we are estimating a constant with no dynamics we have that x̂k (−) = x̂k−1 (+) and
pk (−) = pk−1 (+). In summary then the recursive form of our estimator for the unknown
constant starts with
x̂0 (+) = m with p0 (+) = σ 2 ,
and then iterates for each measurement zk for k ≥ 1 the following
  x̂_k(−) = x̂_{k−1}(+)  and  p_k(−) = p_{k−1}(+)
  x̂_k(+) = (1 − k_k) x̂_k(−) + k_k z_k
          = ( (q^2/12)/( p_k(−) + q^2/12 ) ) x̂_{k−1}(+) + ( p_k(−)/( p_k(−) + q^2/12 ) ) z_k
  p_k(+) = (q^2/12) p_k(−) / ( p_k(−) + q^2/12 ) .

It seems that we only needed an expression for E[v_k^2]; the explicit form of the distribution
did not seem to matter.
Problem 4-21 (filtering with multiplicative noise)

Our estimator for this problem will be constructed as x̂ = kz for some as of yet unspecified
value for the multiplier k. The error using this estimator is computed as

x̃ = x̂ − x
= kz − x
= k(1 + η)x − x (138)
= (k(1 + η) − 1)x .

For x̂ to be an unbiased estimate of x means that E[x̃] = 0. From Equation 138 we see that
this requires
E[x̃] = kE[x] + kE[ηx] − E[x] = 0 ,
since all three expectations are zero. Thus the estimator as defined is unbiased. Next we
will pick the value of k so that the variance in the error is as small as possible. The variance
in the error is

E[x̃2 ] = E[(ηk + k − 1)2 x2 ]


= E[(η 2 k 2 + 2ηk(k − 1) + (k − 1)2 )x2 ]
= k 2 E[η 2 ]E[x2 ] + 0 + (k − 1)2 E[x2 ]
= k 2 ση2 σx2 + (k − 1)2 σx2 .

In the above I have assumed that E[η 2 x2 ] = E[η 2 ]E[x2 ], which would be true if x and η are
independent random variables. Then we want to minimize the expression E[x̃2 ] when viewed
as a function of k. When we take the derivative, of this expression, set the result equal to
zero and solve for k we find
  k = 1/(1 + σ_η^2) .
We can check that the value above is indeed a minimum by taking the second derivative
  d^2 E[x̃^2]/dk^2 = 2 σ_η^2 σ_x^2 + 2 σ_x^2 > 0 .
Now since
  k − 1 = 1/(1 + σ_η^2) − 1 = −σ_η^2/(1 + σ_η^2) ,
the minimum variance E[x̃2 ] is given by

  E[x̃^2] = σ_η^2 σ_x^2/(1 + σ_η^2)^2 + σ_η^4 σ_x^2/(1 + σ_η^2)^2 = σ_η^2 σ_x^2/(1 + σ_η^2) .

Problem 4-22 (filtering with spectral densities)

Warning: I’m not sure exactly what this problem was asking or how to answer it. If anyone
has an idea of the type of solution requested please contact me.
Problem 4-23 (filtering a constant angular rate)

If we define the state variables x1 and x2 for this problem to be x1 = θ and x2 = θ̇ then as
a differential system we have
      
  d/dt x = [ ẋ_1 ; ẋ_2 ] = [ x_2 ; 0 ] = [ 0  1 ; 0  0 ] [ x_1 ; x_2 ] .

Then using the power series definition for the fundamental solution we have
  Φ(t + T, t) = e^{FT} = I + F T + (1/2) F^2 T^2 + · · · .

For the F given above F^2 = 0 and so the above sum explicitly stops after two terms.
Evaluating this two term sum we find that Φ(t + T, t) is given by

  Φ(t + T, t) = [ 1  T ; 0  1 ] .

Also since z_k = θ_k + v_k = x_1(kT) + v_k the measurement sensitivity matrix H is independent
of time and given by H = [ 1  0 ], and R = 5^2. We are told to take the initial state estimate
and uncertainty for this problem given by

  x̂_0(+) = [ 0 ; 0 ]  and  P_0(+) = [ 20^2  0 ; 0  20^2 ] = 20^2 I .

The filtering equations that will produce the optimal estimates of position and velocity are
given by the Kalman equations. We will do the first of these updates “by hand” and then
one could write a simple program to generate the rest. We first need to propagate the initial
state and uncertainty to the first measurement time
    
  x̂_1(−) = Φ_0 x̂_0(+) = [ 1  T ; 0  1 ] [ 0 ; 0 ] = [ 0 ; 0 ]
  P_1(−) = Φ_0 P_0(+) Φ_0^T = 20^2 [ 1  T ; 0  1 ] I [ 1  0 ; T  1 ] = 20^2 [ 1 + T^2  T ; T  1 ] .

Next we observe the first measurement z1 and update the state and covariance matrix with
with Equations 51, 58, and 59. We begin with Equation 58 or

K1 = P1 (−)H1T [H1 P1 (−)H1T + R1 ]−1 .

Now to compute this we need to add


  
  H_1 P_1(−) H_1^T = 20^2 [ 1  0 ] [ 1 + T^2  T ; T  1 ] [ 1 ; 0 ] = 20^2 (1 + T^2) ,

to R_1 = 5^2, giving H_1 P_1(−) H_1^T + R_1 = 20^2 (1 + T^2) + 5^2. Next we compute

  P_1(−) H_1^T = 20^2 [ 1 + T^2  T ; T  1 ] [ 1 ; 0 ] = 20^2 [ 1 + T^2 ; T ] ,

so K_1 is explicitly given by

  K_1 = ( 20^2/( 20^2 (1 + T^2) + 5^2 ) ) [ 1 + T^2 ; T ] .

Then the application of Equations 51 and 59 give

  x̂_1(+) = x̂_1(−) + K_1 ( z_1 − H x̂_1(−) ) = K_1 z_1
          = ( 20^2/( 20^2 (1 + T^2) + 5^2 ) ) [ 1 + T^2 ; T ] z_1
  P_1(+) = ( I − K_1 H_1 ) P_1(−) .

Since Φ and H do not depend on the index k the steps in this process are summarized as
follows. Given an initial starting values of x̂(+) and P (+) as each measurement z comes in
compute

x̂(−) = Φx̂(+)
P (−) = ΦP (+)ΦT
K = P (−)H T (HP (−)H T + R)−1
x̂(+) = x̂(−) + K(z − H x̂(−))
P (+) = (I − KH)P (−) .

Problem 4-24 (Kalman filtering with discrete measurement noise)

For this problem we are told that E[x0 ] = 1 and E[x20 ] = 2. From this we can conclude that
the variance of the initial state x0 is given by

p0 (+) = Var(x0 ) = E[x20 ] − E[x0 ]2 = 2 − 12 = 1 .

Our dynamic system model for this problem is

xk+1 = e−T /τ xk + wk for k ≥ 0 ,

where since T = τ the value of the exponential is above is actually e−1 . Our fundamental
solution matrix is then Φk = e−1 with a process noise variance of qk = 2. With measurements
of this process given by
zk = xk + vk ,
we have hk = 1. To derive statistics of the measurement noise process vk recall that the
density of the measurement noise vk is discrete and specifically given by
  P(v_k = −2) = P(v_k = +2) = 1/2 ,

so that E[v_k] = 0. The variance of noise distributed like this is given by

  r_k = E[v_k^2] = 4 (1/2) + 4 (1/2) = 4 ,
With all of the above information we can apply the Kalman filtering framework to this
problem.

Part (a-b): The initial conditions for this problem are given by x̂_0(+) = 1 with p_0(+) = 1,
so our estimate for x̂_1(−) and p_1(−) is given by
x̂1 (−) = Φ0 x̂0 (+) = e−1 ,
and
p1 (−) = Φ0 p0 (+)ΦT0 + Q0 = e−2 + 2 .
Then we observe the measurement z1 , which we can incorporate using the Kalman mea-
surement update Equations 51, 58, and 59. Rather than document these in detail again,
please see the python file chap 4 prob 24.py for some numerical code where we do these
calculations for the two measurements z1 and z2 . When we implement these equations and
execute the above script we find
  x̂_1(+) = 0.7619, p_1(+) = 1.3921  and
  x̂_2(+) = 1.2420, p_2(+) = 1.4145 .

Problem 4-25 (Kalman filtering the motion of a one-dimensional ship)

Warning: I was not sure about this problem. If anyone has any ideas please contact me.

Problem 4-26 (an airplane autopilot)

Warning: I was not sure how to deal with the derivative of the expression hC (t) in the
noise term on the right-hand-side of the differential equation for h(t). If anyone has any
ideas please contact me.

Problem 4-27 (measuring the voltage in the black box)

Denote by i1 (t) and i2 (t) the currents in the left most and right most cell in Figure 4-
4 respectively. We assume that the currents are running in a clockwise direction. Then
Kirchhoff’s voltage law (KVL) [5] around the left most cell gives
u(t) − R1 i1 − v1 = 0 , (139)
while Kirchhoff’s voltage law around the right most cell gives
v1 − R2 i2 − v2 = 0 , (140)
where vi is the voltage of the capacitor Ci . Also the current flowing from top down through
the capacitor C1 gives rise to a change in voltage as
  i_1 − i_2 = C_1 dv_1/dt .    (141)
The same consideration for the current flowing from top down through the capacitor C2 gives
i_2 = C_2 dv_2/dt so that with this we can write i_1 in terms of v_i. From Equation 141 we have

  i_1 = i_2 + C_1 dv_1/dt = C_2 dv_2/dt + C_1 dv_1/dt .
With these expressions for i1 and i2 , using Equations 139 and 140 our system differential
equation in terms of the variables v1 and v2 is
 
  u(t) − R_1 ( C_1 dv_1/dt + C_2 dv_2/dt ) − v_1 = 0    (142)
  v_1 − R_2 C_2 dv_2/dt − v_2 = 0 .    (143)
Solving this second equation for dv_2/dt gives

  dv_2/dt = ( 1/(R_2 C_2) ) (v_1 − v_2) .

When we put that expression into Equation 142 and solve for dv_1/dt we find

  dv_1/dt = −(1/C_1)( 1/R_1 + 1/R_2 ) v_1 + ( 1/(R_2 C_1) ) v_2 + ( 1/(R_1 C_1) ) u(t) .
 
When we view these two equations as a matrix system with a state x = [ v_1 ; v_2 ] we find

  d/dt [ v_1 ; v_2 ] = [ −(1/C_1)(1/R_1 + 1/R_2)   1/(R_2 C_1) ;  1/(R_2 C_2)   −1/(R_2 C_2) ] [ v_1 ; v_2 ] + [ u(t)/(R_1 C_1) ; 0 ] .

If we next simplify the system above to the case where R1 = R2 = 1 and C1 = C2 = 1 the
above system becomes
      
  d/dt [ v_1 ; v_2 ] = [ −2  1 ; 1  −1 ] [ v_1 ; v_2 ] + [ u(t) ; 0 ] .

Thus for this problem we see that our system matrix is F = [ −2  1 ; 1  −1 ]. We are told that
the measurement for this system is of v_2(t) and is exact or

  z(t) = [ 0  1 ] [ v_1 ; v_2 ] .

Since having no measurement noise can be harder to handle numerically, we will simulate this by
taking R to be a very small number, say 10^{-6}.

This problem, as specified, is continuous but we want to compute our estimates at discrete
times, so we will discretize it and apply the discrete Kalman filtering equations. To do that
we need the discrete transition matrix Φ_k given by

  Φ_k = Φ((k + 1)∆t, k∆t) = e^{F ∆t} ≈ I + F ∆t + (1/2) F^2 ∆t^2 .

Figure 3: Plots of the a priori (in blue) and a posteriori (in red) covariance for the voltage
across the capacitor C1 as a function of the index in the discrete Kalman filtering algorithm.
The “index” 1 corresponds to the time 0.

The discrete process noise Qk is given by


 
  Q_k = ∆t Q = ∆t [ 2  0 ; 0  0 ] ,

since u ∼ N(0, 2). Then the optimal estimate of the voltage across C1 is given by the discrete
Kalman filter. For this problem statement we have ∆t = 0.5 seconds, and to reach the time
T = 2 seconds we need four iterations. We will take the initial conditions for this system as
 
  x̂_0(+) = [ 0 ; 0 ]  and  P_0(+) = 0 ,

since we assume that the initial conditions are known exactly. Then to finish this problem
we need to iterate the discrete Kalman filtering covariance equations

Pk (−) = Φk−1 Pk−1(+)ΦTk−1 + Qk−1


Kk = Pk (−)HkT [Hk Pk (−)HkT + Rk ]−1 = Pk (−)HkT [Hk PK (−)HkT ]−1
Pk (+) = [I − Kk Hk ]Pk (−) ,

and then plot the (1, 1)th element of the matrices Pk (±) after each iteration. In the MAT-
LAB/Octave file chap 4 prob 27.m we perform the Kalman filtering iterations needed to
produce the plot above. We see that the value of the variance of v1 after the first measure-
ment goes to 1 and stays there for all further iterations.
Problem 4-28 (Kalman filtering the inverse square law)

To begin, first consider the given equations under the conditions that u1 = u2 = 0, which
are given by
  r̈ = r θ̇^2 − G_0/r^2
  θ̈ = −2 θ̇ ṙ/r .

Then if r = R is a constant we see that ṙ = r̈ = 0 and the above becomes

  0 = R θ̇^2 − G_0/R^2
  θ̈ = 0 .

The first equation above gives θ̇^2 = G_0/R^3 or

  θ̇ = sqrt(G_0)/R^{3/2} ,
so that as a function of t when we integrate we find

  θ(t) = ( sqrt(G_0)/R^{3/2} ) t + θ_0 ,

where θ_0 is an arbitrary constant. Note that this solution also satisfies θ̈ = 0. To get the
circular orbit solution quoted in the book we take θ_0 = 0 and then θ(t) = ωt with ω given by

  ω = sqrt(G_0)/R^{3/2} ,

or equivalently R^3 ω^2 = G_0 .

We are told to introduce state variables x1 , x2 , x3 , and x4 to be given by


x1 = r−R (144)
x2 = ṙ (145)
x3 = R(θ − ωt) (146)
x4 = R(θ̇ − ω) . (147)
Note that with the above definitions of x_i, when we evaluate the state vector x at the
equilibrium solution r(t) = R and θ(t) = ωt we have x_i = 0 for i = 1, 2, 3, 4. Our next
step will be to derive the differential equations satisfied by the variables x_1, x_2, x_3, and x_4. To
begin note that

  ẋ_1 = ṙ = x_2 .    (148)

Then

  ẋ_2 = r̈ = r θ̇^2 − G_0/r^2 + u_1
       = (x_1 + R)( x_4/R + ω )^2 − G_0/(x_1 + R)^2 + u_1 .

Next

  ẋ_3 = R(θ̇ − ω) = R( x_4/R + ω − ω ) = x_4 .

Finally

  ẋ_4 = R θ̈ = R( −2 θ̇ ṙ/r + u_2/r )
       = −2R( x_4/R + ω )( x_2/(x_1 + R) ) + R u_2/(x_1 + R) .
Each of the expressions above shows how the first derivative of xi can be expressed purely
in terms of the function values x_i. Thus as a matrix system we have

  d/dt [ x_1 ; x_2 ; x_3 ; x_4 ] = [ x_2 ;
                                     (x_1 + R)( x_4/R + ω )^2 − G_0/(x_1 + R)^2 + u_1 ;
                                     x_4 ;
                                     −2R( x_4/R + ω ) x_2/(x_1 + R) + R u_2/(x_1 + R) ] .    (149)

We will take this nonlinear system and split it into two parts to write it as
     
  [ x_2 ;
    (x_1 + R)( x_4/R + ω )^2 − G_0/(x_1 + R)^2 + u_1 ;
    x_4 ;
    −2R( x_4/R + ω ) x_2/(x_1 + R) + R u_2/(x_1 + R) ]
  = [ x_2 ;
      (x_1 + R)( x_4/R + ω )^2 − G_0/(x_1 + R)^2 ;
      x_4 ;
      −2R( x_4/R + ω ) x_2/(x_1 + R) ]
  + [ 0 ; u_1 ; 0 ; R u_2/(x_1 + R) ] .    (150)
This writes the right-hand-side as the sum of two vectors, each of which is nonlinear in the state
x and the components of the noise vector u = [ u_1 ; u_2 ]. If we denote the first vector as f(x)
(since it does not depend on the noise vector u) then we will linearize it about the state
x_0. We do this as
   
  [ x_2 ;
    (x_1 + R)( x_4/R + ω )^2 − G_0/(x_1 + R)^2 ;
    x_4 ;
    −2R( x_4/R + ω ) x_2/(x_1 + R) ]
  ≈ f(x_0) + (∂f/∂x)|_{x_0} [ x_1 ; x_2 ; x_3 ; x_4 ] .    (151)

The point x_0 is the equilibrium point for circular orbits and corresponds to x_0 = 0. Using the
fact that ω^2 = G_0/R^3 we have that f(x_0) = 0. To complete this derivation recall the definition
of ∂f/∂x which is given by

  ∂f/∂x = [ ∂f_1/∂x_1  ∂f_1/∂x_2  ∂f_1/∂x_3  ∂f_1/∂x_4 ;
            ∂f_2/∂x_1  ∂f_2/∂x_2  ∂f_2/∂x_3  ∂f_2/∂x_4 ;
            ∂f_3/∂x_1  ∂f_3/∂x_2  ∂f_3/∂x_3  ∂f_3/∂x_4 ;
            ∂f_4/∂x_1  ∂f_4/∂x_2  ∂f_4/∂x_3  ∂f_4/∂x_4 ]

        = [ 0  1  0  0 ;
            ( x_4/R + ω )^2 + 2 G_0/(x_1 + R)^3    0    0    (2/R)(x_1 + R)( x_4/R + ω ) ;
            0  0  0  1 ;
            2R( x_4/R + ω ) x_2/(x_1 + R)^2    −2R( x_4/R + ω )/(x_1 + R)    0    −2 x_2/(x_1 + R) ] .
We now evaluate this at the point x_0. We find that when we use the fact that ω^2 = G_0/R^3 we get

  (∂f/∂x)|_{x_0} = [ 0  1  0  0 ; ω^2 + 2 G_0/R^3  0  0  2ω ; 0  0  0  1 ; 0  −2ω  0  0 ]
                 = [ 0  1  0  0 ; 3ω^2  0  0  2ω ; 0  0  0  1 ; 0  −2ω  0  0 ] .    (152)
The second term in the sum in Equation 150 is the non-linear forcing function given by
[ 0 ; u_1 ; 0 ; R u_2/(x_1 + R) ]. To expand this vector about the joint point (x_0, u_0) = (0, 0) = 0 we have

  [ 0 ; u_1 ; 0 ; R u_2/(x_1 + R) ]
    ≈ g(0) + (∂/∂x_1)[ 0 ; u_1 ; 0 ; R u_2/(x_1 + R) ] x_1 + (∂/∂u)[ 0 ; u_1 ; 0 ; R u_2/(x_1 + R) ] [ u_1 ; u_2 ]
    = [ 0 ; 0 ; 0 ; −R u_2/(x_1 + R)^2 ]|_0 x_1 + [ 0  0 ; 1  0 ; 0  0 ; 0  R/(x_1 + R) ]|_0 [ u_1 ; u_2 ]
    = [ 0  0 ; 1  0 ; 0  0 ; 0  1 ] [ u_1 ; u_2 ] .    (153)

When we combine Equations 149, 151, 152, and 153 we have the equation we wanted to show.

In the two parts below it seemed strange that the measurement noise had a variance that
was the same symbol q as the process noise symbol. Thus I’ve changed the notation below
to use the notation ri for the variance of the measurement zi .

Part (a): In this case z(t) = x3 (t) + v3 (t) with v3 ∼ N(0, r3 ) so we have a measurement
sensitivity matrix H given by  
H= 0 0 1 0 ,
with a measurement noise variance given by R = r3 .

Part (b): In this case z(t) = x1 (t) + v1 (t) with v1 ∼ N(0, r1 ) so we have a measurement
sensitivity matrix H given by  
H= 1 0 0 0 ,
with a measurement noise variance given by R = r1 .

In comparing the prescriptions from Part (a) and Part (b) the better estimator will be the
one with the smaller value of trace(P_∞), so we need to solve the steady-state form of the Riccati
equation

  Ṗ = F P + P F^T + G Q G^T − P H^T R^{-1} H P ,

when Ṗ = 0 and with F given by the above, Q = [ q_1  0 ; 0  q_2 ], G = [ 0  0 ; 1  0 ; 0  0 ; 0  1 ], and H and R
given by the different parts as above.

In the Mathematica file chap 4 prob 28.nb we perform some of the algebra in attempting
to solve for the steady state error covariance matrix P(∞).

Warning: I ran into trouble in that Mathematica could not solve the above nonlinear system
for the components pij in the time I gave it. I then tried to solve the matrix Riccati equation
using the methods discussed on Page 57 above. Unfortunately the eigenvalues of the system
matrix F do not have a negative real parts since they are zero or entirely imaginary and this
method cannot be used. Thus algebraically in the time I had to work on this I was unable to
determine which of the two methods is better. If we specify numerical values for the above
variances one could easy do a numerical simulation and make some headway. If anyone has
any insight into this problem I would be interested in hearing your comments.
Chapter 5 (Optimal Linear Smoothing)

Notes on the text

Notes on the matrix inverse of a sum

Many of the results from the initial section use the following simple matrix inverse identity
which we now derive. Since we can write the sum P + Pb as

P + Pb = Pb (Pb−1 + P −1 )P ,

when we take the inverse of this sum P + Pb we find that this inverse is given by

(P + Pb )−1 = P −1 (Pb−1 + P −1 )−1 Pb−1 = Pb−1 (P −1 + Pb−1 )−1 P −1 . (154)

Notes on the derivation of the backward filter covariance matrix

To fully specify the backwards smoothing equations


  dx̂_b/dτ = −F x̂_b + P_b H^T R^{-1} [ z − H x̂_b ]    (155)
  dP_b/dτ = −F P_b − P_b F^T + G Q G^T − P_b H^T R^{-1} H P_b ,    (156)

we must specify the initial condition on x̂b (τ ). Now we don’t know the value of x̂b (t = T )
but we know that it must be finite. Since we know that Pb−1 (t = T ) = 0 we can try to derive
an alternative differential equation for the product Pb−1 (τ )x̂b (τ ), since we know the value of
this expression when τ = 0 (t = T ) is 0. We start by recalling the matrix derivative of an
inverse given by  
  (d/dτ) P(τ)^{-1} = −P(τ)^{-1} ( dP(τ)/dτ ) P(τ)^{-1} .
If we take the backwards covariance propagation equation

  dP_b(τ)/dτ = −F P_b − P_b F^T + G Q G^T − P_b H^T R^{-1} H P_b ,

and multiply on the left by P_b^{-1} and on the right by P_b^{-1} (and then negate the entire
expression) we get

  −P_b^{-1} ( dP_b/dτ ) P_b^{-1} = P_b^{-1} F + F^T P_b^{-1} − P_b^{-1} G Q G^T P_b^{-1} + H^T R^{-1} H .

As the expression on the left-hand-side is (d/dτ) P_b(τ)^{-1} this is the books equation 5.2-12. Using
this we can now derive the differential equation for the variable s(t) = Pb−1(t)x̂b (t). Taking
this derivative and using the product rule (and dropping the b subscript) we have

  ds/dτ = ( dP^{-1}(τ)/dτ ) x̂(τ) + P^{-1}(τ) ( dx̂(τ)/dτ )
        = ( P^{-1} F + F^T P^{-1} − P^{-1} G Q G^T P^{-1} + H^T R^{-1} H ) x̂ + P^{-1}( −F x̂ + P H^T R^{-1}(z − H x̂) )
        = ( F^T − P^{-1} G Q G^T + H^T R^{-1} H P ) P^{-1} x̂ + H^T R^{-1}( z − H P P^{-1} x̂ )
        = ( F^T − P^{-1} G Q G^T ) s + H^T R^{-1} z ,

which is the books equation 5.2-13 and the expression we wanted to show.

Notes on the forward-backwards filter formulation of the smoother Table 5.2-1

In this subsubsection we derive the expression for the optimal smoother expressed in Ta-
ble 5.2-1 and which is based on combining the forward filtering equations with the backwards
filtering equations. In that table the forward filter and the backwards filter are the same as
given in the text in many places. What is not directly obvious is the given expression for
the optimal fixed-interval smoother x̂(t|T ) and P (t|T ). To derive these equations we will
use the matrix identity
B −1 = A−1 − B −1 (B − A)A−1 ,
to evaluate [P −1 + Pb−1]−1 in the expression for P (t|T ). By taking B = P −1 + Pb−1 and
A = P −1 we have

(P −1 + Pb−1 )−1 = P − (P −1 + Pb−1 )−1 Pb−1 P


= P − (Pb (P −1 + Pb−1 ))−1 P
= P − (I + Pb P −1 )−1 P
= P − (P P −1 + Pb P −1)−1 P
= P − P (P + Pb )−1 P
= P − P Pb−1(I + P Pb−1)−1 P ,

which is the books equation for P (t|T ) found in table 5.2-1. Next we compute x̂(t|T ) using
the definition of s(t) as

x̂(t|T ) = P (t|T )[P −1(t)x̂(t) + Pb−1 x̂b (t)]


= P (t|T )[P −1(t)x̂(t) + s(t)] = P (t|T )P −1(t)x̂(t) + P (t|T )s(t) .

We next write this expression as

x̂(t|T ) = (P −1 + Pb−1 )−1 P −1 x̂(t) + P (t|T )s(t)


= (I + P Pb−1)−1 x̂(t) + P (t|T )s(t) .

Warning: This is different from the expression in the book for x̂(t|T ) found in table 5.2-1
in that the books expression does not have an inverse on the factor I + P Pb−1. If anyone
finds anything wrong with the above expression or derivation please contact me.
The derivation of the Rauch-Tung-Striebel smoother equations

To derive the Rauch-Tung-Striebel smoother equations we begin by taking the t derivative


of the books equation 5.1-11

P −1 (t|T ) = P −1 (t) + Pb−1 (t) , (157)

which expresses the smoothed covariance P (t|T ) in terms of the forward and backwards
covariances. To do this we will apply the matrix inverse derivative identity

  (d/dt) A^{-1} = −A^{-1} ( dA/dt ) A^{-1} ,
to the left-hand-side of the above equation (but not to the right-hand-side) giving
 
  (d/dt) P^{-1}(t|T) = −P^{-1}(t|T) ( dP(t|T)/dt ) P^{-1}(t|T)    (158)
                     = (d/dt) P(t)^{-1} + (d/dt) P_b(t)^{-1}
                     = (d/dt) P(t)^{-1} − (d/dτ) P_b(τ)^{-1} ,    (159)
where we have converted the t derivative into a τ ≡ T − t derivative in the derivative of Pb−1
in the last term above. Now recall that from Equation 79 that the time derivative of P −1 is
given by
  (d/dt) P^{-1} = −F^T P^{-1} − P^{-1} F − P^{-1} G Q G^T P^{-1} + H^T R^{-1} H ,

and using the books equation 5.2-12 that the τ derivative of P_b^{-1} is given by

  (d/dτ) P_b^{-1} = P_b^{-1} F + F^T P_b^{-1} − P_b^{-1} G Q G^T P_b^{-1} + H^T R^{-1} H .    (160)
If we use these two expression in Equation 159 we find
d
P (t|T )−1 = −F T P −1 − P −1F − P −1GQGT P −1 + H T R−1 H
dt
− F T Pb−1 − Pb−1 F + Pb−1GQGT Pb−1 − H T R−1 H
= −F T (P −1 + Pb−1 ) − (P −1 + Pb−1 )F − P −1 GQGT P −1 + Pb−1 GQGT Pb−1
= −F T P (t|T )−1 − P (t|T )−1F − P −1GQGT P −1 + Pb−1GQGT Pb−1 .

To solve for dP(t|T)/dt we use Equation 158 by premultiplying and postmultiplying by P(t|T)
and then negating the resulting expression. This procedure gives

  dP(t|T)/dt = P(t|T) F^T + F P(t|T)
             + P(t|T) P^{-1} G Q G^T P^{-1} P(t|T) − P(t|T) P_b^{-1} G Q G^T P_b^{-1} P(t|T) .    (161)

Lets now try to “remove” the terms with Pb from this expression. To do that recall if we
premultiply by P (t|T ) in Equation 157, we get

I = P (t|T )P −1 + P (t|T )Pb−1 , (162)


or solving for P(t|T) P_b^{-1}

  P(t|T) P_b^{-1} = I − P(t|T) P^{-1} .    (163)
Next we postmultiply by P (t|T ) in Equation 157, to get

I = P −1 P (t|T ) + Pb−1 P (t|T ) ,

or solving for Pb−1P (t|T )


Pb−1P (t|T ) = I − P −1 P (t|T ) . (164)
Then using these two expressions 163 and 164 in Equation 161 we obtain

dP (t|T )
= P (t|T )F T + F P (t|T ) + P (t|T )P −1GQGT P (t|T )
dt
− (I − P (t|T )P −1)GQGT (I − P −1P (t|T ))
= P (t|T )F T + F P (t|T ) − GQGT + P (t|T )P −1GQGT + GQGT P −1P (t|T )
= (F + GQGT P −1 )P (t|T ) + P (t|T )(F + GQGT P −1 )T − GQGT , (165)

or the books equation 5.2-15.

We next derive the differential expression satisfied by the smoothed estimate x̂(t|T ). To
begin recall the books equation 5.1-12,

x̂(t|T ) = P (t|T )[P −1x̂(t) + Pb−1x̂b ] , (166)

from which we see that the time derivative of this expression is given by

  dx̂(t|T)/dt = ( dP(t|T)/dt ) [ P^{-1} x̂ + P_b^{-1} x̂_b ] + P(t|T) [ (d/dt)(P^{-1} x̂) + (d/dt)(P_b^{-1} x̂_b) ]
             = [ (F + G Q G^T P^{-1}) P(t|T) + P(t|T)(F + G Q G^T P^{-1})^T − G Q G^T ] P^{-1}(t|T) x̂(t|T)
             + P(t|T) [ ( dP^{-1}/dt ) x̂ + P^{-1} ( dx̂/dt ) + ( dP_b^{-1}/dt ) x̂_b + P_b^{-1} ( dx̂_b/dt ) ] .

Since the forward and backward state estimates must satisfy


  dx̂/dt = F x̂ + P H^T R^{-1}( z − H x̂ )
  dx̂_b/dt = −( −F x̂_b + P_b H^T R^{-1}( z − H x̂_b ) ) = F x̂_b − P_b H^T R^{-1}( z − H x̂_b ) ,
when we put these into the above expression we find that

  dx̂(t|T)/dt = ( F + G Q G^T P^{-1} ) x̂(t|T)
+ [P (t|T )(F + GQGT P −1 )T − GQGT ]P −1(t|T )x̂(t|T )
 
+ P (t|T ) −F T P −1 x̂ − P −1 F x̂ − P −1 GQGT P −1 x̂ + H T R−1 H x̂
 
+ P (t|T ) P −1 F x̂ + H T R−1 (z − H x̂)
 
+ P (t|T ) −Pb−1 F x̂b − F T Pb−1 x̂b + Pb−1 GQGT Pb−1 x̂b − H T R−1 H x̂b
 
+ P (t|T ) Pb−1 F x̂b − H T R−1 (z − H x̂b ) .
Many terms cancel in this expression and we are left with
dx̂(t|T )
= (F + GQGT P −1 )x̂(t|T ) (167)
dt  
+ P (t|T )(F + GQGT P −1 )T − GQGT P −1 (t|T )x̂(t|T ) (168)
 
+ P (t|T ) −F T P −1 x̂ − P −1 GQGT P −1 x̂ (169)
 
+ P (t|T ) −F T Pb−1 x̂b + Pb−1 GQGT Pb−1 x̂b . (170)

Notice that the terms −P(t|T) F^T P^{-1} x̂ and −P(t|T) F^T P_b^{-1} x̂_b on lines 169 and 170
combine using Equation 166 to give −P(t|T) F^T P^{-1}(t|T) x̂(t|T), which cancels the first
term on line 168 above to give

  dx̂(t|T)/dt = ( F + G Q G^T P^{-1} ) x̂(t|T)
             + P(t|T) P^{-1} G Q G^T P^{-1}(t|T) x̂(t|T) − G Q G^T P^{-1}(t|T) x̂(t|T)
             − P(t|T) P^{-1} G Q G^T P^{-1} x̂ + P(t|T) P_b^{-1} G Q G^T P_b^{-1} x̂_b .    (171)

Again trying to “remove” the terms that contain x̂b or Pb we note that from Equation 166
we get
Pb−1 x̂b = P −1 (t|T )x̂(t|T ) − P −1 x̂(t) ,
and from Equation 157 we have Pb−1 = P (t|T )−1 − P −1 so when we use these two expression
in the last term in line 171 we find it is equal to

P (t|T )Pb−1GQGT Pb−1 x̂b = P (t|T )(P (t|T )−1 − P −1 )GQGT (P −1(t|T )x̂(t|T ) − P −1x̂)
= GQGT P −1 (t|T )x̂(t|T ) − GQGT P −1 x̂
− P (t|T )P −1GQGT P −1 (t|T )x̂(t|T ) + P (t|T )P −1GQGT P −1 x̂ .

After this expansion when we use it in Equation 171 many terms cancel to give
  dx̂(t|T)/dt = ( F + G Q G^T P^{-1} ) x̂(t|T) − G Q G^T P^{-1} x̂
             = F x̂(t|T) + G Q G^T P^{-1} ( x̂(t|T) − x̂ ) ,    (172)

the equation we were to show. Recall that x̂ is the forward filtering solution and thus is a
function of time even though we don’t explicitly denote it as such in the above expression.

Notes on the smoothability

When Q = 0 the Rauch-Tung-Striebel equations for the smoothed covariance P (t|T ) is

Ṗ (t|T ) = F P (t|T ) + P (t|T )F T .

Lets prove that the claimed expression for P (t|T ) or Φ(t, T )P (T )Φ(t, T )T is indeed a solution
to this equation. From P (t|T ) = Φ(t, T )P (T )Φ(t, T )T using the product rule to take the
time derivative we have that
  Ṗ = ( (d/dt)Φ(t, T) ) P(T) Φ^T(t, T) + Φ(t, T) P(T) ( (d/dt)Φ(t, T)^T ) .

Since Φ is a fundamental solution we have (d/dt)Φ(t, T) = F(t)Φ(t, T) and we can conclude that

  (d/dt)Φ(t, T)^T = ( F Φ )^T = Φ^T F^T ,
so the above first derivative of P (t|T ) becomes

Ṗ = F Φ(t, T )P (T )ΦT (t, T ) + Φ(t, T )P (T )ΦT F T


= FP + PFT ,

as we were to show.

Notes on the Books Example 5.2-1

In part one of this example we perform fixed-interval smoothing using the forward-backwards
optimal filters. Thus to begin with we need to solve the continuous forward filtering Riccati
equation. To do that note that for this problem we have f = 0, g = h = 1 so that Equation 71
in this case becomes
  ṗ = q − p^2/r .

In steady-state ṗ = 0 so p^2 = rq or p = +sqrt(rq) ≡ α. The backwards error covariance from
Equation 156 is given by

  dp_b/dτ = q − p_b^2/r .

In steady-state dp_b/dτ = 0 so p_b^2 = rq or p_b = +sqrt(rq) = α. Thus in steady-state the smoothed
state has the following error covariance
  p^{-1}(t|T) = p^{-1}(t) + p_b^{-1}(t) = 1/α + 1/α = 2/α ,

and so

  p(t|T) = α/2 .
Next the smoothed state estimate is given by
   
  x̂(t|T) = p(t|T) [ x̂(t)/p(t) + x̂_b(t)/p_b(t) ] = (α/2) [ x̂/α + x̂_b/α ] = (1/2)( x̂ + x̂_b ) .    (173)

For part 2 of this example we want to perform fixed-interval smoothing using the Rauch-
Tung-Striebel equations, which in general are given by Equations 165 and 172. Specifying
these to the problem at hand we find Equation 165 becomes
  ṗ(t|T) = (q/α) p(t|T) + p(t|T) (q/α) − q
         = (2q/α) p(t|T) − q ,
as our differential equation to solve for p(t|T). This equation has the final condition given
by p(T|T) = p(T), where p(T) is the forward filter's error covariance value at the time
t = T. Define β to be β = q/α; then solving this differential equation is done as follows

  ṗ(t|T) − 2β p(t|T) = −q ,   or
  (d/dt)( e^{−2βt} p(t|T) ) = −q e^{−2βt} ;   integrating both sides gives
  e^{−2βt} p(t|T) = ( q/(2β) ) e^{−2βt} + C_0   for some constant C_0, thus
  p(t|T) = q/(2β) + C_0 e^{2βt} .

Note that p(T ) = α since we assume that T is large enough so that the forward filtering
equation is in steady-state. With this to satisfy the final condition on p(t|T ) of p(T |T ) =
p(T ) = α requires C0 satisfy
 
  q/(2β) + C_0 e^{2βT} = α  ⇒  C_0 = ( α − q/(2β) ) e^{−2βT} .

Thus we have for p(t|T) the following

  p(t|T) = q/(2β) + ( α − q/(2β) ) e^{−2βT} e^{2βt}
         = (α/2) ( 1 + e^{−2β(T−t)} )   for t < T .
The differential equation for the smoothed state derived from Equation 172 is

  (d/dt) x̂(t|T) = (q/α)( x̂(t|T) − x̂(t) ) = β ( x̂(t|T) − x̂(t) ) .

This can be shown to be equivalent to Equation 173 by taking the time derivative of that
equation which gives us

  (d/dt) x̂(t|T) = (1/2)( dx̂/dt + dx̂_b/dt ) .
Using the differential equations for x̂ and x̂b which in this case are given by
  dx̂/dt = (p/r)( z − x̂ ) = sqrt(q/r) ( z − x̂ )
  dx̂_b/dt = −(p_b/r)( z − x̂_b ) = −sqrt(q/r) ( z − x̂_b ) .
When we sum these two expressions (as required by (d/dt) x̂(t|T)) we find

  (d/dt) x̂(t|T) = (1/2) sqrt(q/r) ( x̂_b − x̂ ) = (1/2) sqrt(q/r) ( 2 x̂(t|T) − x̂ − x̂ )
                = sqrt(q/r) ( x̂(t|T) − x̂ ) ,

where we have expressed x̂b in terms of x̂ and x̂(t|T ) using Equation 173.
Notes on a steady-state, fixed-interval smoother solution

In this subsection we show an alternative method to solve for the fixed-interval linear
smoother covariance equation for P (t|T ) governed by the differential Equation 165. We
start by defining an unknown λ in terms of the variable y as
λ = P (t|T )y , (174)
where y is chosen to satisfy the following differential equation
  dy/dt = −[ F + G Q G^T P^{-1} ]^T y .    (175)
With such a definition taking the time derivative of λ above and using the product rule
followed by replacing Ṗ (t|T ) with the right-hand-side of Equation 165 we find
λ̇ = Ṗ (t|T )y − P (t|T )(F + GQGT P −1 )T y
= (F + GQGT P −1 )P (t|T )y + P (t|T )(F + GQGT P −1 )T y − GQGT y
− P (t|T )(F + GQGT P −1 )T y
= (F + GQGT P −1 )P (t|T )y − GQGT y
= (F + GQGT P −1 )λ − GQGT y . (176)
 
Then as a system in terms of the vector unknown [ y ; λ ] we have

  d/dt [ y ; λ ] = [ −( F + G Q G^T P^{-1} )^T   0 ;  −G Q G^T   F + G Q G^T P^{-1} ] [ y ; λ ] ,
which is the books equation 5.2-14.

Derivations of the equations for optimal fixed-point smoothers

In this subsection we provide somewhat more complete derivations of many of the stated
fixed-point smoother equations. While the algebra for some of these can be tedious and
I include most of it, the hope is that someone could simple “read” these derivations and
observe their correctness. In other-words I don’t want to have any of the steps that lead up
to a result be mysterious. By cataloging these derivations and results in one place I won’t
have to revisit this work again in the future.

The first statement of this section is that we can write the explicit solution to the fixed-
interval smoother differential Equation 172 in terms of a smoothing fundamental solution
Φs (t, T ). The claimed functional form for x̂(t|T ) is given by
  x̂(t|T) = Φ_s(t, T) x̂(T) − ∫_T^t Φ_s(t, τ) G Q G^T P^{-1}(τ) x̂(τ) dτ ,    (177)

where Φs (t, T ) is the fundamental solution for Equation 172 and thus satisfies
Φ̇s (t, T ) = (F + GQGT P −1 (t))Φs (t, T ) with Φs (t, t) = I . (178)
As a note on our notation, when dealing with multiple matrix products as in GQGT P −1
if all factors in the product are to evaluated at the same argument we will present that
argument only on the last factor. Thus the expression GQGT P −1 (τ ) is really a short-hand
for G(τ )Q(τ )G(τ )T P −1(τ ). In the same way, the addition of another matrix to a product
expression will be evaluated at the same argument as the product expression. Thus the
expression F + GQGT P −1 (τ ) is really a short-hand for F (τ ) + G(τ )Q(τ )GT (τ )P −1 (τ ).

Now we will show that Equation 177 is a solution to Equation 172 by explicitly evaluating
its time derivative. Using Leibniz’s rule and Equation 177 itself to replace any resulting
integrals with simpler expressions it then follows that
\[
\begin{aligned}
\dot{\hat{x}}(t|T) &= (F + GQG^T P^{-1}(t))\Phi_s(t,T)\hat{x}(T) - \Phi_s(t,t)GQG^T P^{-1}(t)\hat{x}(t) \\
&\quad - \int_T^t \dot{\Phi}_s(t,\tau)\,GQG^T P^{-1}(\tau)\,\hat{x}(\tau)\,d\tau \\
&= (F + GQG^T P^{-1}(t))\Phi_s(t,T)\hat{x}(T) - GQG^T P^{-1}(t)\hat{x}(t) \\
&\quad - (F + GQG^T P^{-1}(t))\int_T^t \Phi_s(t,\tau)\,GQG^T P^{-1}(\tau)\,\hat{x}(\tau)\,d\tau \\
&= (F + GQG^T P^{-1}(t))\Phi_s(t,T)\hat{x}(T) - GQG^T P^{-1}(t)\hat{x}(t) - (F + GQG^T P^{-1}(t))\left[-\hat{x}(t|T) + \Phi_s(t,T)\hat{x}(T)\right] \\
&= (F + GQG^T P^{-1}(t))\hat{x}(t|T) - GQG^T P^{-1}(t)\hat{x}(t)\,,
\end{aligned}
\]
or an expression equivalent to Equation 172 proving that Equation 177 is a representation
of its solution.

The next steps in the derivation are to derive expressions for the $T$ evolution of $\hat{x}(t|T)$ and
$P(t|T)$, or explicit equations for $\frac{d\hat{x}(t|T)}{dT}$ and $\frac{dP(t|T)}{dT}$. To derive an expression for $\frac{d\hat{x}(t|T)}{dT}$ we will
need to be able to evaluate the expression $\frac{d\Phi_s(t,T)}{dT}$, which the book claims is given by
\[
\frac{d\Phi_s(t,T)}{dT} = -\Phi_s(t,T)\,(F + GQG^T P^{-1}(T))\,, \qquad (179)
\]
where the expression F + GQGT P −1 (T ) means that every matrix has its argument evaluated
at T . To show this is true, consider the t derivative of the identity Φs (t, T )Φs (T, t) = I, which
by the product rule is given by
\[
\frac{d\Phi_s(t,T)}{dt}\,\Phi_s(T,t) + \Phi_s(t,T)\,\frac{d\Phi_s(T,t)}{dt} = 0\,.
\]
Solving for $\frac{d\Phi_s(T,t)}{dt}$ and using the expression for $\frac{d\Phi_s(t,T)}{dt}$ given by Equation 178 we get
\[
\begin{aligned}
\frac{d\Phi_s(T,t)}{dt} &= -\Phi_s(t,T)^{-1}\,\frac{d\Phi_s(t,T)}{dt}\,\Phi_s(T,t) \\
&= -\Phi_s(T,t)\,\frac{d\Phi_s(t,T)}{dt}\,\Phi_s(T,t) \\
&= -\Phi_s(T,t)\,(F + GQG^T P^{-1}(t))\,\Phi_s(t,T)\,\Phi_s(T,t) \\
&= -\Phi_s(T,t)\,(F + GQG^T P^{-1}(t))\,. \qquad (180)
\end{aligned}
\]
Then to get the desired expression for $\frac{d}{dT}\Phi_s(t,T)$ we exchange $T$ and $t$ in Equation 180
to get Equation 179, or the book's equation 5.3-5. Once the expression for $\frac{d\Phi_s(t,T)}{dT}$ has been
established, the equation for $\frac{d\hat{x}(t|T)}{dT}$ is given by using Leibniz's rule on Equation 177 in a
straightforward manner.
forward manner.

Having just derived an expression for $\frac{d\hat{x}(t|T)}{dT}$, we proceed to do the same thing for $\frac{dP(t|T)}{dT}$. To
do this we start with an explicit solution for $P(t|T)$ in terms of the fundamental smoothing
matrix $\Phi_s(t,T)$, and proceed to take the $T$ derivative of that solution using Equation 179 to
simplify the resulting expressions. Now we claim that a solution to the differential equation
for $\frac{dP(t|T)}{dt}$ given by Equation 165 can be expressed as
\[
P(t|T) = \Phi_s(t,T)P(T)\Phi_s^T(t,T) - \int_T^t \Phi_s(t,\tau)\,GQG^T(\tau)\,\Phi_s(t,\tau)^T\,d\tau\,. \qquad (181)
\]

To verify this expression is indeed a solution we can take its t derivative to get

\[
\begin{aligned}
\dot{P}(t|T) &= (F + GQG^T P^{-1}(t))\Phi_s(t,T)P(T)\Phi_s^T(t,T) + \Phi_s(t,T)P(T)\Phi_s(t,T)^T(F + GQG^T P^{-1}(t))^T \\
&\quad - \Phi_s(t,t)GQG^T(t)\Phi_s^T(t,t) \\
&\quad - \int_T^t \left[\dot{\Phi}_s(t,\tau)\,GQG^T(\tau)\,\Phi_s(t,\tau)^T + \Phi_s(t,\tau)\,GQG^T(\tau)\,\dot{\Phi}_s(t,\tau)^T\right] d\tau \\
&= (F + GQG^T P^{-1}(t))\Phi_s(t,T)P(T)\Phi_s^T(t,T) + \Phi_s(t,T)P(T)\Phi_s^T(t,T)(F + GQG^T P^{-1}(t))^T \\
&\quad - GQG^T(t) \\
&\quad - (F + GQG^T P^{-1}(t))\int_T^t \Phi_s(t,\tau)\,GQG^T(\tau)\,\Phi_s^T(t,\tau)\,d\tau \\
&\quad - \int_T^t \Phi_s(t,\tau)\,GQG^T(\tau)\,\Phi_s^T(t,\tau)\,d\tau\,(F + GQG^T P^{-1}(t))^T\,.
\end{aligned}
\]

From the claimed solution for P (t|T ) given by Equation 181 we have
\[
\int_T^t \Phi_s(t,\tau)\,GQG^T(\tau)\,\Phi_s^T(t,\tau)\,d\tau = \Phi_s(t,T)P(T)\Phi_s^T(t,T) - P(t|T)\,,
\]

so using this the above expression for Ṗ (t|T ) becomes

\[
\begin{aligned}
\dot{P}(t|T) &= (F + GQG^T P^{-1}(t))\Phi_s(t,T)P(T)\Phi_s^T(t,T) + \Phi_s(t,T)P(T)\Phi_s(t,T)^T(F + GQG^T P^{-1}(t))^T \\
&\quad - GQG^T(t) \\
&\quad - (F + GQG^T P^{-1}(t))\left[\Phi_s(t,T)P(T)\Phi_s^T(t,T) - P(t|T)\right] \\
&\quad - \left[\Phi_s(t,T)P(T)\Phi_s^T(t,T) - P(t|T)\right](F + GQG^T P^{-1}(t))^T \\
&= \left[F + GQG^T P^{-1}(t)\right]P(t|T) + P(t|T)\left[F + GQG^T P^{-1}(t)\right]^T - GQG^T(t)\,,
\end{aligned}
\]

when we simplify. This is the book's equation 5.2-15, showing that Equation 181 is indeed a
solution to Equation 165 as claimed.

With the explicit representation for P (t|T ) given by Equation 181 we next take the T
derivative of this expression. The product rule and Leibniz’ rule gives
\[
\begin{aligned}
\frac{dP(t|T)}{dT} &= \frac{d\Phi_s(t,T)}{dT}\,P(T)\,\Phi_s^T(t,T) + \Phi_s(t,T)\,\frac{dP(T)}{dT}\,\Phi_s^T(t,T) \\
&\quad + \Phi_s(t,T)\,P(T)\,\frac{d\Phi_s^T(t,T)}{dT} + \Phi_s(t,T)\,GQG^T(T)\,\Phi_s(t,T)^T\,.
\end{aligned}
\]
Now using Equations 71 and 179 in the above we have
\[
\begin{aligned}
\frac{dP(t|T)}{dT} &= -\Phi_s(t,T)(F + GQG^T P^{-1}(T))P(T)\Phi_s^T(t,T) \\
&\quad + \Phi_s(t,T)\left[FP + PF^T + GQG^T - PH^T R^{-1}HP\,(T)\right]\Phi_s^T(t,T) \\
&\quad - \Phi_s(t,T)P(T)(F + GQG^T P^{-1}(T))^T\Phi_s^T(t,T) + \Phi_s(t,T)GQG^T(T)\Phi_s(t,T)^T \\
&= -\Phi_s(t,T)\,PH^T R^{-1}HP\,(T)\,\Phi_s^T(t,T)\,, \qquad (182)
\end{aligned}
\]
which is the book’s equation 5.3-8.

Derivations of the equations for optimal fixed-lag smoothing

In this subsection we present notes and derivations of the equations that fixed-lag smoothers
must satisfy. Starting with Equation 177 and taking $t = T - \Delta$ gives the equation
\[
\hat{x}(T-\Delta|T) = \Phi_s(T-\Delta,T)\hat{x}(T) - \int_T^{T-\Delta} \Phi_s(T-\Delta,\tau)\,GQG^T P^{-1}(\tau)\,\hat{x}(\tau)\,d\tau\,. \qquad (183)
\]

To derive the ordinary differential equation that the optimal fixed-lag state estimate, or
$\hat{x}(T-\Delta|T)$, must satisfy we will take the $T$ derivative of the above expression. To take
the $T$ derivative of the above requires us to evaluate $\frac{d\Phi_s(T-\Delta,T)}{dT}$. This derivative can be
evaluated by writing $\Phi_s(T-\Delta,T) = \Phi_s(T-\Delta,t)\Phi_s(t,T)$, using the product rule followed
by Equations 178 and 179. We find
\[
\begin{aligned}
\frac{d\Phi_s(T-\Delta,T)}{dT} &= \frac{d\Phi_s(T-\Delta,t)}{dT}\,\Phi_s(t,T) + \Phi_s(T-\Delta,t)\,\frac{d\Phi_s(t,T)}{dT} \\
&= (F + GQG^T P^{-1}(T-\Delta))\,\Phi_s(T-\Delta,t)\,\Phi_s(t,T) - \Phi_s(T-\Delta,t)\,\Phi_s(t,T)\,(F + GQG^T P^{-1}(T)) \\
&= (F + GQG^T P^{-1}(T-\Delta))\,\Phi_s(T-\Delta,T) - \Phi_s(T-\Delta,T)\,(F + GQG^T P^{-1}(T))\,, \qquad (184)
\end{aligned}
\]
which is the book's equation 5.4-3.

With this result we are ready to evaluate $\frac{d\hat{x}(T-\Delta|T)}{dT}$ using Equation 183. We find
\[
\begin{aligned}
\frac{d\hat{x}(T-\Delta|T)}{dT} &= \frac{d\Phi_s(T-\Delta,T)}{dT}\,\hat{x}(T) + \Phi_s(T-\Delta,T)\,\frac{d\hat{x}(T)}{dT} \\
&\quad - \Phi_s(T-\Delta,T-\Delta)\,GQG^T P^{-1}(T-\Delta)\,\hat{x}(T-\Delta) \\
&\quad + \Phi_s(T-\Delta,T)\,GQG^T P^{-1}(T)\,\hat{x}(T) \\
&\quad - \int_T^{T-\Delta} \frac{d\Phi_s(T-\Delta,\tau)}{dT}\,GQG^T P^{-1}(\tau)\,\hat{x}(\tau)\,d\tau\,.
\end{aligned}
\]
Using Equation 178 to evaluate $\frac{d\Phi_s(T-\Delta,\tau)}{dT}$, the integral term above becomes
\[
(F + GQG^T P^{-1}(T-\Delta)) \int_T^{T-\Delta} \Phi_s(T-\Delta,\tau)\,GQG^T P^{-1}(\tau)\,\hat{x}(\tau)\,d\tau\,.
\]

In terms of x̂(T − ∆|T ) from Equation 183 this is

(F + GQGT P −1 (T − ∆)) [Φs (T − ∆, T )x̂(T ) − x̂(T − ∆|T )] .

Thus we find our derivative of x̂(T − ∆|T ) given by

dx̂(T − ∆|T )
= (F + GQGT P −1 (T − ∆))Φs (T − ∆, T )x̂(T )
dT
− Φs (T − ∆, T )(F + GQGT P −1 (T ))x̂(T )
+ Φs (T − ∆, T ) [F (T )x̂(T ) + K(T )(z(T ) − H(T )x̂(T ))]
− Φs (T − ∆, T − ∆)GQGT P −1(T − ∆)x̂(T − ∆)
+ Φs (T − ∆, T )GQGT P −1(T )x̂(T )
− (F + GQGT P −1 (T − ∆)) [Φs (T − ∆, T )x̂(T ) − x̂(T − ∆|T )]
= (F + GQGT P −1 (T − ∆))x̂(T − ∆|T )
− GQGT P −1 (T − ∆)x̂(T − ∆)
+ Φs (T − ∆, T )K(T )(z(T ) − H(T )x̂(T )) , (185)

which is the book's equation 5.4-3 and is the desired differential equation for $\hat{x}(T-\Delta|T)$.

Next we derive the differential equation for P (T − ∆|T ) under optimal fixed-lag smoothing.
To do this we set t = T − ∆ in Equation 181 and get
\[
P(T-\Delta|T) = \Phi_s(T-\Delta,T)P(T)\Phi_s^T(T-\Delta,T) - \int_T^{T-\Delta} \Phi_s(T-\Delta,\tau)\,GQG^T(\tau)\,\Phi_s^T(T-\Delta,\tau)\,d\tau\,.
\]

We follow the same procedure to derive the corresponding differential equation we have been
performing above. The algebra for this seems quite involved and can be skipped at first
reading. Taking the T derivative of this expression we find

\[
\begin{aligned}
\frac{dP(T-\Delta|T)}{dT} &= \frac{d\Phi_s(T-\Delta,T)}{dT}\,P(T)\,\Phi_s^T(T-\Delta,T) + \Phi_s(T-\Delta,T)\,\frac{dP(T)}{dT}\,\Phi_s^T(T-\Delta,T) \\
&\quad + \Phi_s(T-\Delta,T)\,P(T)\,\frac{d\Phi_s^T(T-\Delta,T)}{dT} - GQG^T(T-\Delta) \\
&\quad + \Phi_s(T-\Delta,T)\,GQG^T(T)\,\Phi_s^T(T-\Delta,T) \\
&\quad - \int_T^{T-\Delta} \frac{d\Phi_s(T-\Delta,\tau)}{dT}\,GQG^T(\tau)\,\Phi_s^T(T-\Delta,\tau)\,d\tau \\
&\quad - \int_T^{T-\Delta} \Phi_s(T-\Delta,\tau)\,GQG^T(\tau)\,\frac{d\Phi_s^T(T-\Delta,\tau)}{dT}\,d\tau\,.
\end{aligned}
\]

Again we will use Equation 178 to evaluate $\frac{d\Phi_s(T-\Delta,\tau)}{dT}$ in the above integrals and then write
them in terms of $P(T-\Delta|T)$ and $\Phi_s(T-\Delta,T)P(T)\Phi_s^T(T-\Delta,T)$ using the
proposed integral solution for P (T − ∆|T ). When we do this along with other simplifications
of derivatives that appear we obtain
dP (T − ∆|T )
= (F + GQGT P −1(T − ∆))Φs (T − ∆, T )P (T )ΦTs (T − ∆, T )
dT
− Φs (T − ∆, T )(F + GQGT P −1 (T ))P (T )ΦTs (T − ∆, T )
+ Φs (T − ∆, T )[F P + P F T + GQGT − P H T R−1 HP (T )]ΦTs (T − ∆, T )
+ Φs (T − ∆, T )P (T )ΦTs (T − ∆, T )(F + GQGT P −1 (T − ∆))T
− Φs (T − ∆, T )P (T )(F + GQGT P −1 (T ))T ΦTs (T − ∆, T )
− GQGT (T − ∆)
+ Φs (T − ∆, T )GQGT (T )ΦTs (T − ∆, T )
− (F + GQGT P −1(T − ∆))[Φs (T − ∆, T )P (T )ΦTs (T − ∆, T ) − P (T − ∆|T )]
− [Φs (T − ∆, T )P (T )ΦTs (T − ∆, T ) − P (T − ∆|T )](F + GQGT P −1 (T − ∆))T .

As expected, many terms cancel in the above expression and when the smoke clears we find
we are left with
\[
\begin{aligned}
\frac{dP(T-\Delta|T)}{dT} &= (F + GQG^T P^{-1}(T-\Delta))P(T-\Delta|T) \\
&\quad + P(T-\Delta|T)(F + GQG^T P^{-1}(T-\Delta))^T \\
&\quad - \Phi_s(T-\Delta,T)\,PH^T R^{-1}HP\,\Phi_s^T(T-\Delta,T) \\
&\quad - GQG^T(T-\Delta)\,, \qquad (186)
\end{aligned}
\]

which is the book's equation 5.4-4.

Problem Solutions

Problem 5-1 (the smoothing equation via minimization)

To solve this problem lets begin by expanding the given objective function as

J = (x − x̂)T P −1 (x − x̂) + (x − x̂b )T Pb−1 (x − x̂b )


= xT P −1 x − 2x̂T P −1 x + x̂T P −1 x̂
+ xT Pb−1 x − 2x̂Tb Pb−1 x + x̂Tb Pb−1 x̂b .

Then using Equations 311 and 312 we can compute the derivative of J with respect to x.
We find
\[
\frac{\partial J}{\partial x} = 2P^{-1}x + 2P_b^{-1}x - 2(P^{-1}\hat{x}) - 2(P_b^{-1}\hat{x}_b)\,.
\]
Setting this result equal to zero we have

(P −1 + Pb−1 )x = P −1 x̂ + Pb−1 x̂b ,

or solving for x and calling the result x̂(t|T ) we have

x̂(t|T ) = (P −1 + Pb−1 )−1 (P −1x̂ + Pb−1 x̂b ) . (187)


If we define P (t|T ) to be
P (t|T ) = (P −1 + Pb−1)−1 ,
then the above is
x̂(t|T ) = P (t|T )(P −1x̂ + Pb−1 x̂b ) ,
as we were to show.
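As a small numerical illustration of this combination rule (the two input estimates and covariances below are made-up values, not from the book), the following Python sketch forms $P(t|T)$ and $\hat{x}(t|T)$ from an assumed forward and backward estimate:

```python
import numpy as np

# Illustrative (assumed) forward and backward estimates and covariances.
x_f = np.array([1.0, 2.0]);  P_f = np.array([[0.5, 0.1], [0.1, 0.4]])
x_b = np.array([1.4, 1.7]);  P_b = np.array([[0.8, 0.0], [0.0, 0.6]])

# P(t|T) = (P^-1 + Pb^-1)^-1 and xhat(t|T) = P(t|T) (P^-1 xhat + Pb^-1 xhat_b).
P_s = np.linalg.inv(np.linalg.inv(P_f) + np.linalg.inv(P_b))
x_s = P_s @ (np.linalg.inv(P_f) @ x_f + np.linalg.inv(P_b) @ x_b)

print(x_s)   # the smoothed estimate lies between the forward and backward estimates
print(P_s)   # the smoothed covariance is smaller than either input covariance
```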

Problem 5-2 (deriving the Rauch-Tung-Striebel smoother equations)

This exercise is worked beginning on Page 100 in this text.

Problem 5-3 (deriving the Bryson-Frazier fixed-interval smoother equations)

Warning: I was unable to derive the given expression for Λ̇(t) or to show the identity

P (t|T ) = P (t) − P (t)Λ(t)P (t) ,

as requested in this problem. Below I present the algebraic steps I took and where I got
stuck. If anyone sees what to do next or an alternative solution please contact me.

If we consider the estimate x̂(t|T ) decomposed as

x̂(t|T ) = x̂(t) − P (t)λ(t) , (188)

then taking the derivative of x̂(t|T ) using the product rule gives

\[
\frac{d\hat{x}(t|T)}{dt} = \frac{d\hat{x}}{dt} - \frac{dP}{dt}\lambda(t) - P(t)\frac{d\lambda(t)}{dt}\,. \qquad (189)
\]
Now using equation 5.2-14 from the book to evaluate $\frac{d\hat{x}(t|T)}{dt}$ on the left-hand-side, followed by
the forward filtering equations given by
\[
\frac{d\hat{x}}{dt} = F\hat{x} + PH^T R^{-1}(z - H(t)\hat{x})\,,
\]
and a similar equation for $\frac{dP}{dt}$ in the right-hand-side of Equation 189, we find

\[
F\hat{x}(t|T) + GQG^T P^{-1}(\hat{x}(t|T) - \hat{x}) = F\hat{x} + PH^T R^{-1}(z - H\hat{x}) - (FP + PF^T + GQG^T - PH^T R^{-1}HP)\lambda - P\frac{d\lambda}{dt}\,.
\]
Putting the expression for $\hat{x}(t|T)$ given by Equation 188 into the left-hand-side of the above
expression and then canceling the common terms we obtain
\[
0 = PH^T R^{-1}(z - H(t)\hat{x}) - PF^T\lambda + PH^T R^{-1}HP\lambda - P\frac{d\lambda}{dt}\,.
\]
Solving for $\frac{d\lambda}{dt}$ we obtain
\[
\begin{aligned}
\frac{d\lambda}{dt} &= -F^T\lambda + H^T R^{-1}HP\lambda + H^T R^{-1}(z - H\hat{x}) \\
&= -\left[F - PH^T R^{-1}H\right]^T\lambda + H^T R^{-1}(z - H\hat{x})\,,
\end{aligned}
\]

as we were to show. Note that since x̂(t|T ) when t = T is given by x̂(T |T ) = x̂(T ), we see
that this translates into an initial condition on λ(t) of the following

x̂(T |T ) = x̂(T ) − P (T )λ(T ) so λ(T ) = 0 .

Using the definition of $\Lambda(t)$ as $E[\lambda(t)\lambda(t)^T]$ we have that the first derivative of this expression
(when we use the results from above) is
\[
\begin{aligned}
\frac{d}{dt}\Lambda(t) &= E\left[\frac{d\lambda}{dt}\lambda^T\right] + E\left[\lambda\frac{d\lambda^T}{dt}\right] \\
&= -(F - PH^T R^{-1}H)^T E[\lambda\lambda^T] + H^T R^{-1}E[(z - H\hat{x})\lambda^T] \\
&\quad - E[\lambda\lambda^T](F - PH^T R^{-1}H) + E[\lambda(z - H\hat{x})^T]R^{-1}H \\
&= -(F - PH^T R^{-1}H)^T\Lambda(t) - \Lambda(t)(F - PH^T R^{-1}H) \qquad (190) \\
&\quad + H^T R^{-1}E[(z - H\hat{x})\lambda^T] + E[\lambda(z - H\hat{x})^T]R^{-1}H\,. \qquad (191)
\end{aligned}
\]

This result is similar to the expression we are attempting to derive for Λ̇. To make the two
expressions the same we need to evaluate the last two terms above. Since the two terms on
line 191 are transposes of each other we will evaluate only the first one and get the second
one by transposition. From the definition of λ we have

λ = P −1 (x̂ − x̂(t|T )) ,

thus we see that


E[(z − H x̂)λT ] = E[(z − H x̂)(x̂ − x̂(t|T ))T ]P −1 . (192)
Now since by assumption our measurement z is related to the state via z = Hx+v where x is
our true system state and our estimate x̂ is the true system state plus an error as x̂ = x + x̃.
Using these two relationships we can write the first factor in the product above as

z − H x̂ = Hx + v − H(x + x̃) = v − H x̃ . (193)

Next lets consider the second factor in the product above. From the definition of x̂(t|T ) in
Equation 166 and using Equation 162 we see that

x̂ − x̂(t|T ) = x̂ − P (t|T )P −1x̂ − P (t|T )Pb−1x̂b


= (I − P (t|T )P −1)x̂ − P (t|T )Pb−1x̂b
= P (t|T )Pb−1x̂ − P (t|T )Pb−1x̂b
= P (t|T )Pb−1(x̂ − x̂b ) = P (t|T )Pb−1(x̃ − x̃b ) .

With this result we can now compute the inner product needed in Equation 192. We find

(z − H x̂)(x̂ − x̂(t|T ))T = (v − H x̃)(x̃ − x̃b )T Pb−1P (t|T ) .


Now taking expectations and using the facts that
E[vx̃T ] = 0
E[vx̃Tb ] = 0 ,
and the fact that the backwards filter is independent of the forward filter so that
E[x̃x̃Tb ] = 0 ,
we find the needed expectation given by
E[(z − H x̂)(x̂ − x̂(t|T ))T ] = −HE[x̃x̃T ]Pb−1 P (t|T )
= −HP Pb−1P (t|T ) .
Putting everything back together we find the term H T R−1 E[(z − H x̂)λT ] given by
H T R−1 E[(z − H x̂)λT ] = H T R−1 E[(z − H x̂)(x̂ − x̂(t|T ))T ]P −1
= −H T R−1 HP Pb−1P (t|T )P −1 .
Now we have two terms like this to add together on line 191 where the second is the transpose
of the first we need to simplify
−H T R−1 HP Pb−1P (t|T )P −1 − P −1 P (t|T )Pb−1P H T R−1 H .
Warning: I don’t see how to turn this remaining expression into H T R−1 H. If anyone sees
how to proceed with this derivation please contact me.

Problem 5-4 (an example of the reduction in uncertainty with smoothing)

Part (a): See the problem 4-11 on Page 73 where we do this calculation in detail.

Part (b): We will consider the Rauch-Tung-Striebel (RTS) covariance Equation 165 in
steady-state where Ṗ (t|T ) = 0 but specified for this problem where all system matrices are
scalars and constant. Specifically we have F = a, G = 1, Q = q, H = b, and R = r so the
RTS equation becomes
\[
0 = 2\left(a + \frac{q}{p_\infty}\right)p_\infty(t|T) - q\,.
\]
When we solve this for $p_\infty(t|T)$ we get
\[
p_\infty(t|T) = \frac{q}{2\left(a + \frac{q}{p_\infty}\right)} = \frac{p_\infty}{2\left(1 + \frac{a}{q}\,p_\infty\right)}\,.
\]

To solve this problem another way one could consider the backwards covariance filtering
equation given by
\[
\begin{aligned}
\frac{d}{d\tau}P_b^{-1}(T-\tau) &= P_b^{-1}(T-\tau)F(T-\tau) + F^T(T-\tau)P_b^{-1}(T-\tau) \\
&\quad - P_b^{-1}(T-\tau)G(T-\tau)Q(T-\tau)G^T(T-\tau)P_b^{-1}(T-\tau) \\
&\quad + H^T(T-\tau)R^{-1}(T-\tau)H(T-\tau)\,.
\end{aligned}
\]
Set $\frac{dP_b^{-1}}{d\tau} = 0$ and solve for $P_b(\infty)$. For this problem the above becomes
\[
0 = \frac{2a}{p_b(\infty)} - \frac{q}{p_b(\infty)^2} + \frac{b^2}{r}\,,
\]

which we would need to solve for pb (∞). Given this value we can compute the desired
expression P∞ (t|T ) using P∞ (t|T )−1 = P −1 (∞) + Pb−1 (∞).

Part (c): Using the above two results we find that
\[
\frac{p_\infty(t|T)}{p_\infty} = \frac{1}{2\left(1 + \frac{a}{q}\,p_\infty\right)}
= \frac{1}{2\left(1 + \frac{a}{q}\cdot\frac{ar}{b^2}\left(1 + \sqrt{1 + \frac{b^2 q}{a^2 r}}\right)\right)}\,.
\]
Defining $\gamma^2$ as $\gamma^2 = \frac{b^2 q}{a^2 r}$, the above becomes
\[
\frac{1}{2\left(1 + \frac{1}{\gamma^2}\left(1 + \sqrt{1 + \gamma^2}\right)\right)}\,,
\]
which, if we multiply the top and bottom of this expression by $\gamma^2$, gives the desired result.
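A short Python sketch evaluating this ratio for a few values of $\gamma^2$ (the values chosen are illustrative) makes the limiting behaviour easy to see: the smoother can reduce the steady-state error variance by at most a factor of two over the filter.

```python
import numpy as np

# Smoothing improvement ratio p_inf(t|T)/p_inf as a function of gamma^2 = b^2 q / (a^2 r).
def smoothing_ratio(gamma2):
    return gamma2 / (2.0 * (gamma2 + 1.0 + np.sqrt(1.0 + gamma2)))

for g2 in [0.1, 1.0, 10.0, 100.0]:        # illustrative values of gamma^2
    print(g2, smoothing_ratio(g2))         # the ratio approaches 1/2 as gamma^2 grows
```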

Problem 5-6 (smoothing an integrator)

For this problem we desire to apply fixed interval smoothing to a discrete system which looks
like

xk = xk−1 + wk−1 for wk−1 ∼ N(0, q∆)


zk = xk + vk for vk ∼ N(0, r0 ) .

Thus we have that Φk = 1, Qk = q∆, Hk = 1, and Rk = r0 . Note that the forward filtering
part of this problem is the same as that of Problem 4-14 on page 77.

Part (a): For this part we want to use fixed-interval smoothing to compute p0|2 and p1|2 ,
so N = 2 and to solve this problem using the Rauch-Tung-Striebel algorithm we first need
to compute the forward smoothed solution pk (±).

Since we are told to assume no a-priori information on the knowledge of the state we must
take p0 (+) ≈ +∞. If we do this directly it seems that we run into problems when we perform
backwards filtering (in that we obtain the indefinite ratio of ∞/∞) with the above forward
filtered results. Thus I’ll take our initial condition on p0 (+) to be
\[
p_0(+) = \frac{1}{\epsilon}\,,
\]
where ǫ is a small number. Just as in Problem 4-14 we iterate the discrete Kalman filter
equations for $k = 0, 1, 2$ to find that, when we take $\epsilon = 0$, we get
\[
\begin{aligned}
p_0(+) &= +\infty \\
p_1(-) &= +\infty \\
p_1(+) &= r_0 \\
p_2(-) &= r_0(1+\gamma) \\
p_2(+) &= \frac{1+\gamma}{2+\gamma}\,r_0\,.
\end{aligned}
\]
When we keep $\epsilon \neq 0$ we can then run the discrete RTS smoothing equations backwards.
Starting with $p_{N|N} = p_{2|2} = p_2(+)$ we compute for $k = 1$ and then $k = 0$ the following

\[
\begin{aligned}
A_k &= P_k(+)\Phi_k^T P_{k+1}^{-1}(-) \\
P_{k|N} &= P_k(+) + A_k\left[P_{k+1|N} - P_{k+1}(-)\right]A_k^T\,.
\end{aligned}
\]

The calculations when $p_0(+) = \frac{1}{\epsilon}$ and the subsequent limit as $\epsilon \to 0$ are rather tedious and
are done in the Mathematica file chap 5 prob 6.nb. Performing the above iterations we
obtain
\[
\begin{aligned}
p_{2|2} &= p_2(+) = \frac{1+\gamma}{2+\gamma}\,r_0 \\
a_1 &= p_1(+)\Phi_1^T p_2^{-1}(-) = \frac{1}{1+\gamma} \\
p_{1|2} &= p_1(+) + a_1\left[p_{2|2} - p_2(-)\right]a_1^T
        = p_1(+) + a_1^2\left[p_2(+) - p_2(-)\right] = \left(\frac{1+\gamma}{2+\gamma}\right)r_0 \\
a_0 &= p_0(+)\Phi_0^T p_1^{-1}(-) = 1 \\
p_{0|2} &= p_0(+) + a_0\left[p_{1|2} - p_1(-)\right]a_0^T
        = \left(\frac{1 + 3\gamma + \gamma^2}{2+\gamma}\right)r_0\,.
\end{aligned}
\]

Warning: Note that these expressions are somewhat different than the ones presented for
this problem. If anyone sees an error in what I’ve done or can verify that these are correct
please contact me.
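One way to cross-check the limiting values above is to run the forward filter and the RTS backward sweep numerically with a large but finite initial variance $1/\epsilon$. The sketch below does this in Python for illustrative values $r_0 = 1$ and $\gamma = 0.5$ (these numbers are my assumptions); the printed values can be compared against the expressions derived above.

```python
import numpy as np

# Numerical cross-check of the limit above with illustrative values r0 = 1, gamma = 0.5.
r0, gamma, eps = 1.0, 0.5, 1e-9
q_dt = gamma * r0

# Forward filter for k = 0, 1, 2 (phi_k = 1, h_k = 1, R_k = r0).
p_plus, p_minus = [1.0 / eps], [None]
for k in (1, 2):
    pm = p_plus[-1] + q_dt                  # time update
    p_minus.append(pm)
    p_plus.append(pm * r0 / (pm + r0))      # measurement update

# Backward RTS sweep: p_{k|2} = p_k(+) + a_k^2 (p_{k+1|2} - p_{k+1}(-)).
p_smooth = {2: p_plus[2]}
for k in (1, 0):
    a = p_plus[k] / p_minus[k + 1]
    p_smooth[k] = p_plus[k] + a**2 * (p_smooth[k + 1] - p_minus[k + 1])

print(p_smooth[1], (1 + gamma) / (2 + gamma) * r0)
print(p_smooth[0], (1 + 3*gamma + gamma**2) / (2 + gamma) * r0)
```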

Part (c): In fixed-point smoothing we desire a smoothed estimate of the state at a particular
point of interest while the “end point” of the interval grows. Specifically, in fixed-point
optimal smoothing we will fix the index k and then let the index N increase. For this
problem since we want to compute p0|1 and p0|2 that means we take k = 0 and let N = 1 and
N = 2. Once k is fixed and using the a priori and a posteriori covariance estimate Pi (±)
for i ≥ k computed from forward filtering we will compute the desired fixed-point smoothed
solutions $P_{k|N}$ for $N = k+1, k+2, \cdots$ by using
\[
\begin{aligned}
B_N &= \prod_{i=k}^{N-1} P_i(+)\Phi_i^T P_{i+1}^{-1}(-) \\
P_{k|N} &= P_{k|N-1} + B_N\left[P_k(+) - P_k(-)\right]B_N^T\,,
\end{aligned}
\]
with $P_{k|k} = P_k(+)$.

To iterate these equations when $N = 1$ we have
\[
B_1 = \prod_{i=0}^{0} P_i(+)\Phi_i^T P_{i+1}^{-1}(-) = P_0(+)P_1^{-1}(-) = \frac{P_0(+)}{P_1(-)} = 1
\]
\[
P_{0|1} = P_{0|0} + B_1\left(P_0(+) - P_0(-)\right)B_1^T = 2P_0(+) - P_0(-)\,.
\]

Warning: I don’t see how to evaluate the term P0 (−) since our initial a posteriori uncer-
tainty was to be infinite P0 (+) = ∞. This might mean that P0 (−) = ∞. In any case these
results don’t agree with what the book claims this expression should be.
Chapter 6 (Nonlinear Estimation)

Notes on the text

Notes on the extended Kalman filter

If we perform a power series expansion of our nonlinear function f (x, t) in terms of the
current estimate (the conditional mean x̂(t)) then we have

\[
f(x,t) \approx f(\hat{x},t) + \left.\frac{\partial f}{\partial x}\right|_{x=\hat{x}}(x - \hat{x}) + \cdots = f(\hat{x},t) + F(x - \hat{x}) + \cdots\,,
\]
where F is a function of the state, x̂, we linearize about and time t i.e. F = F (x̂, t). Then
the state estimate x̂ satisfies
\[
\dot{\hat{x}}(t) = \hat{f}(x(t),t)\,. \qquad (194)
\]
Next using the book's equation 6.1-5, or
\[
\dot{P}(t) = \widehat{xf^T} - \hat{x}\hat{f}^T + \widehat{fx^T} - \hat{f}\hat{x}^T + Q\,, \qquad (195)
\]
we will evaluate the right-hand-side using the above power series expansion for f (x, t). For
the term $\widehat{xf^T} = E[xf^T]$ we find
E[xf T ] = E[xf (x̂, t)T ] + E[x(x − x̂)T F T ]
= E[x]f (x̂, t)T + E[(x − x̂)(x − x̂)T ]F T + E[x̂(x − x̂)T ]F T
= x̂f (x̂, t)T + P F T .

To evaluate $\widehat{fx^T} = E[fx^T]$ we simply take the transpose of the above result. To evaluate
the expression $\hat{f}$ we have
\[
\hat{f} = E[f(x,t)] \approx f(\hat{x},t) + E[F(x - \hat{x})] = f(\hat{x},t)\,.
\]
Using these two expressions in Equation 195 we have for Ṗ
Ṗ (t) = x̂f (x̂, t)T + P F T
− x̂f (x̂, t)T
+ f (x̂, t)x̂T + F P
− f (x̂, t)x̂T + Q
= PFT + FP + Q, (196)
which is the book’s equation 6.1-8.
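To make the propagation step concrete, here is a minimal Python sketch that integrates Equations 194 and 196 for a scalar example with $f(x) = -x^3$; the dynamics, noise level, initial values, and step size are illustrative assumptions of mine rather than anything from the book.

```python
import numpy as np

# Extended Kalman filter propagation for a scalar example with f(x) = -x**3,
# so that F(x) = df/dx = -3*x**2 (both choices are illustrative).
def f(x): return -x**3
def F(x): return -3.0 * x**2

q = 0.1                        # assumed process noise spectral density
x_hat, P = 1.0, 0.5            # assumed initial estimate and covariance
dt, steps = 1e-3, 2000

for _ in range(steps):
    Fx = F(x_hat)                        # linearize about the current estimate
    x_hat += dt * f(x_hat)               # Equation 194: xhat_dot = f(xhat, t)
    P += dt * (2.0 * Fx * P + q)         # Equation 196 (scalar): P_dot = 2 F P + Q

print(x_hat, P)
```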

Notes on the extended Kalman filter: incorporating measurements

We will estimate the state at time tk or xk after the measurement zk using a formula like
x̂k (+) = ak + Kk zk . (197)
Then introducing the definition of the a priori and a posteriori state error x̃k (±) as

x̃k (±) = x̂k (±) − xk , (198)

and first using x̃k (+) on the left-hand-side of the proposed estimator Equation 197 above we
get
x̃k (+) + xk = ak + Kk (hk (xk ) + vk ) .
Next using x̃k (−) to replace xk on the left-hand-side of this expression we get

x̃k (+) = ak + Kk hk (xk ) + Kk vk + x̃k (−) − x̂k (−) , (199)

which is the books equation 6.1-11. Now taking the expectation of both sides of this expres-
sion and assuming that our earlier estimate of xk was unbiased that is E[x̃k (−)] = 0 then to
make our a posteriori estimate of xk unbiased we require the following

E[x̃k (+)] = ak + Kk E[hk (xk )] − E[x̂k (−)] = 0 .

Since E[x̂k (−)] = x̂k (−) when we solve for ak we find

ak = x̂k (−) − Kk E[hk (xk )] ,

and the a posteriori estimate x̂k (+) in Equation 197 then takes the form

x̂k (+) = ak + Kk zk
= x̂k (−) + Kk (zk − E[hk (xk )]) , (200)

which is the books equation 6.1-13. Using this expression for ak we can go back to the
expression above for the a posteriori estimate error x̃k (+) or Equation 199 where we find

x̃k (+) = x̂k (−) − Kk E[hk (xk )] + Kk hk (xk ) + Kk vk + x̃k (−) − x̂k (−)
= x̃k (−) + Kk (hk (xk ) − E[hk (xk )]) + Kk vk , (201)

or the books equation 6.1-14. This expression makes it easy to compute Pk (+) since it is the
expectation of the above expression “squared”. Specifically Pk (+) = E[x̃k (+)x̃k (+)T ] and
this quadratic product is given by

x̃k (+)x̃k (+)T = x̃k (−)x̃k (−)T + x̃k (−)(hk (xk ) − E[hk (xk )])T KkT + x̃k (−)vkT KkT
+ Kk (hk (xk ) − E[hk (xk )])x̃k (−)T
+ Kk (hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T KkT
+ Kk (hk (xk ) − E[hk (xk )])vkT KkT
+ Kk vk x̃k (−)T + Kk vk (hk (xk ) − E[hk (xk )])T KkT + Kk vk vkT KkT .

When we take the expectation of the above many terms simplify. Specifically using

E[x̃k (±)x̃k (±)T ] = Pk (±)


E[x̃k (−)vkT ] = 0
E[(hk (xk ) − E[hk (xk )])vkT ] = 0
E[vk vkT ] = Rk ,
the above simplifies to

Pk (+) = Pk (−) + E[x̃k (−)(hk (xk ) − E[hk (xk )])T ]KkT


+ Kk E[(hk (xk ) − E[hk (xk )])x̃k (−)T ]
+ Kk E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ]KkT + Kk Rk KkT . (202)

which is the books equation 6.1-15.

As we have done before we will select Kk so that Pk (+) has a minimum trace. Defining Jk =
trace(Pk (+)), we then seek to minimize Jk as a function of Kk by taking the Kk derivative
of Jk , setting the result equal to zero and then solving for Kk . From the Equation 202 we
have several types of derivatives to take. Using Equation 112 with either B or C equal to the
identity matrix we can take the derivative of the second and third terms in Equation 202,
while using Equation 113 we can take the derivative of the fourth and fifth terms. When we
use these expressions we find we need to solve
∂Jk
= E[x̃k (−)(hk (xk ) − E[hk (xk )])T ]
∂Kk
+ E[x̃k (−)(hk (xk ) − E[hk (xk )])T ]
+ 2Kk E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ] + 2Kk Rk = 0 ,

for Kk . Doing this gives

Kk = −E[x̃k (−)(hk (xk ) − E[hk (xk )])T ]


 −1
× E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ] + Rk , (203)

or the books equation 6.1-17. We next want to put this expression into Equation 202 to
evaluate what the minimum value of the objective function Jk is. To do this we will briefly
introduce some short-hand notation so that the manipulations are more manageable. We
define the symbol “xhT ” as

xhT = E[x̃k (−)(hk (xk ) − E[hk (xk )])T ] .

and the symbol “hhT ” as

hhT = E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ] .

With this short-hand we have Kk = −xhT (hhT + Rk )−1 and find that Pk (+) becomes

Pk (+) = Pk (−) + xhT (hhT + Rk )−1 (hhT )(hhT + Rk )−1 hxT


− xhT (hhT + Rk )−1 hxT − xhT (hhT + Rk )−1 hxT
+ xhT (hhT + Rk )−1 Rk (hhT + Rk )−1 hxT .

Combining the second and fifth terms gives

xhT (hhT + Rk )−1 (hhT + Rk )(hhT + Rk )−1 hxT = xhT (hhT + Rk )−1 hxT ,

which cancels with the third term. Thus we get (expressed in terms of the expressions with
expectations and not the short-hand notation)

Pk (+) = Pk (−) + Kk E[(hk (xk ) − E[hk (xk )])x̃k (−)T ] , (204)


or the books equation 6.1-18.

If, as the book suggests, we Taylor expand the nonlinear function hk (xk ) about the a priori
state estimate x̂k (−) as

hk (xk ) = hk (x̂k (−)) + Hk (x̂k (−))(xk − x̂k (−)) , (205)

then using this we observe that the expectation of hk (xk ) denoted by E[hk (xk )] is equal to
hk (x̂k (−)) and thus

hk (xk ) − E[hk (xk )] = Hk (xk − x̂k (−)) = −Hk x̃k (−) .

Thus some of the expectations in the formulas for Kk and Pk (+) simplify as

E[(hk (xk ) − E[hk (xk )])(hk (xk ) − E[hk (xk )])T ] = Hk Pk (−)HkT ,

and
E[x̃k (−)(hk (xk ) − E[hk (xk )])T ] = −Pk (−)HkT .
Using both of these observations we see that Equation 204 becomes

Pk (+) = Pk (−) − Kk Hk (x̂k (−))Pk (−) ,

for the a posteriori covariance update equation for the extended Kalman filter and the books
equation 6.1-21.
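A matching sketch of the measurement update, again for a scalar state and an illustrative nonlinear measurement $h(x) = x^2$ (all numbers below are assumptions, not from the book), looks as follows.

```python
# Extended Kalman filter measurement update for a scalar state with h(x) = x**2
# (an illustrative nonlinearity), so H(x) = dh/dx = 2*x evaluated at xhat(-).
def h(x): return x**2
def H(x): return 2.0 * x

r = 0.2                          # assumed measurement noise variance R_k
x_minus, P_minus = 1.5, 0.4      # assumed a priori estimate and covariance
z = 2.6                          # an assumed measurement value

Hk = H(x_minus)
K = P_minus * Hk / (Hk * P_minus * Hk + r)    # K_k = P(-) H^T (H P(-) H^T + R)^(-1)
x_plus = x_minus + K * (z - h(x_minus))       # xhat(+) = xhat(-) + K (z - h(xhat(-)))
P_plus = P_minus - K * Hk * P_minus           # P(+) = P(-) - K H P(-)

print(x_plus, P_plus)
```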

Notes on Higher-Order Filters

In this section we will attempt to derive many of the expressions for a second-order filter
presented in the book. To begin we will perform a second-order Taylor expansion of f (x(t), t)
and hk (xk ) about x̂(t) and x̂k (−) respectively as follows
\[
f(x(t),t) = f(\hat{x}(t),t) - F(\hat{x}(t),t)\tilde{x}(t) + \tfrac{1}{2}\partial^2(f, \tilde{x}(t)\tilde{x}(t)^T) + \cdots \qquad (206)
\]
\[
h_k(x_k) = h_k(\hat{x}_k(-)) - H(\hat{x}_k(-))\tilde{x}_k(-) + \tfrac{1}{2}\partial^2(h_k, \tilde{x}_k(-)\tilde{x}_k(-)^T) + \cdots\,, \qquad (207)
\]
where for any matrix B the expression ∂ 2 (f, B) is a vector with an ith component defined
as
\[
\partial_i^2(f, B) \equiv \operatorname{trace}\left(\left[\frac{\partial^2 f_i}{\partial x_p\,\partial x_q}\right] B\right)\,. \qquad (208)
\]
˙
When these expressions are put into the state dynamic Equation 194 or x̂(t) = fˆ(x(t), t) we
get
˙
x̂(t) = fˆ(x(t), t) = E[f (x(t), t)]
1
= E[f (x̂(t), t) − F (x̂, t)x̃(t) + ∂ 2 (f, x̃x̃T )]
2
1 2
= f (x̂(t), t) + ∂ (f, P (t)) ,
2
since E[F (x̂, t)x̃(t)] = F (x̂, t)E[x̃(t)] = 0.

Next we want to put the second-order Taylor expansions above into Equation 195 or

dT − x̂fˆT + fd
Ṗ (t) = xf xT − fˆx̂T + Q .

dT .
Since we know how to evaluate fˆ, the expectation of f , lets first consider the term xf
Before we take the expectation, under the second order Taylor expansion of f (x, t) we find
xf T is given by  
T T T T 1 2 T T
xf = x f (x̂) − x̃ F (x̂) + ∂ (f, x̃x̃ ) .
2
When we take expectations of this using the fact that x = x̂ − x̃ we get

dT = E[xf T ] = x̂f (x̂)T − E[(x̂ − x̃)x̃T ]F (x̂)T + 1 E[(x̂ − x̃)∂ 2 (f, x̃x̃T )T ]
xf
2
1 2 1
= x̂f (x̂) + P (t)F (x̂) + x̂∂ (f, P (t)) − E[x̃∂ 2 (f, x̃x̃T )T ] .
T T T
2 2
T
From which we see that we now need to evaluate the expectation of the matrix x̃∂ 2 f, x̃x̃T
which has an ijth component given by
 2  
2

T T ∂ fj T
(x̃∂ f, x̃x̃ )ij = x̃i trace x̃x̃ .
∂xp ∂xq

Now consider the matrix product  


∂ 2 fj
x̃x̃T ,
∂xp ∂xq
which has a pnth component given by
Xn
∂ 2 fj
x̃q x̃n ,
q=1
∂xp ∂xq

thus the trace in the above expression becomes


   X n X n
∂ 2 fj T ∂ 2 fj
trace x̃x̃ = x̃q x̃p . (209)
∂xp ∂xq p=1 q=1
∂xp ∂xq

When we multiply this by by x̃i we finally find


Xn X n
2

T T ∂ 2 fj
(x̃∂ f, x̃x̃ )ij = x̃i x̃q x̃p . (210)
p=1 q=1
∂xp ∂xq

When we take the expectation of this we get zero, assuming that x̃i are independent Gaussian
random variables with zero expectation because then E[x̃i x̃q x̃p ] = 0. After all of this we
finally arrive at
dT = x̂f (x̂)T + P (t)F (x̂)T + 1 x̂∂ 2 (f, P (t))T .
xf
2
Now the expectation of f is given by fˆ = f (x̂) + 12 ∂ 2 (f, P (t)) so we can now evaluate Ṗ (t)
using Equation 195. We find
1 1
Ṗ (t) = x̂f (x̂)T + P (t)F (x̂)T + x̂∂ 2 (f, P (t))T − x̂f (x̂)T − x̂∂ 2 (f, P (t))T
2 2
1 1
+ f (x̂)x̂T + F (x̂)P (t) + ∂ 2 (f, P (t))x̂T − f (x̂)x̂T − ∂ 2 (f, P (t))x̂T + Q
2 2
T
= P (t)F (x̂) + F (x̂)P (t) + Q ,

the desired expression in 6.1-26.

Next we evaluate Equation 200 or

x̂k (+) = x̂k (−) + Kk [zk − ĥk (xk )] .

From the given second-order Taylor series expansion for hk (xk ) we have the expectation of
hk (xk ) denoted by ĥk (xk ) given by
1
ĥk (xk ) = E[hk (xk )] = hk (x̂k (−)) + ∂ 2 (hk , Pk (−)) .
2
Thus we see that Equation 200 becomes
1
x̂k (+) = x̂k (−) + Kk [zk − ĥk (x̂k (−)) − ∂ 2 (hk , Pk (−))] ,
2
the desired equation in 6.1-26.

Next we simplify Equation 203 to derive the equation for Kk under the second-order Taylor
series approximation. To do this we first evaluate
1
hk (xk ) − ĥk (xk ) = hk (x̂k (−)) − H(x̂k (−))x̃k (−) + ∂ 2 (hk , x̃k (−)x̃k (−)T )
2
1 2
− hk (x̂k (−)) − ∂ (hk , Pk (−))
2
1 1
= −H(x̂k (−))x̃k (−) + ∂ 2 (hk , x̃k (−)x̃k (−)T ) − ∂ 2 (hk , Pk (−)) .
2 2

Using this expression we see that the product x̃k (−)(hk (xk ) − ĥk (xk ))T is then
1 1
−x̃k (−)x̃k (−)T H(x̂k (−))T + x̃k (−)∂ 2 (hk , x̃k (−)x̃k (−)T )T − x̃k (−)∂ 2 (hk , Pk (−))T .
2 2
Taking expectation of this the third term vanishes and by using using Equation 210 the
second term also vanishes. Thus we are left with

E x̃k (−)(hk (xk ) − E[hk (xk )])T = −Pk (−)H(x̂k (−))T . (211)

Next we can now compute the inner product required in the expression for the matrix inverse
portion of Kk or
[hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T .
To do this lets define this product as T , and use the shorthand that H ≡ H(x̂k (−)). Then
this product has nine terms and is given by
1 1
T = H x̃k (−)x̃k (−)T H T − H x̃k (−)∂ 2 (hk , x̃k (−)x̃k (−)T ) + H x̃k (−)∂ 2 (hk , Pk (−))
2 2
1 2 T T T
− ∂ (hk , x̃k (−)x̃k (−) )xk (−) H
2
1 2
+ ∂ (hk , x̃k (−)x̃k (−)T )∂ 2 (hk , x̃k (−)x̃k (−)T )T
4
1 2
− ∂ (hk , x̃k (−)x̃k (−)T )∂ 2 (hk , Pk (−))T
4
1 2
− ∂ (hk , Pk (−))x̃k (−)T H T
2
1 2
− ∂ (hk , Pk (−))∂ 2 (hk , x̃k (−)x̃k (−)T )T
4
1 2
+ ∂ (hk , Pk (−))∂ 2 (hk , Pk (−))T .
4
Taking the required expectation of this expression and recalling Equation 210 we see that
the second, third, fourth, and seventh terms vanish and we get
1
E[T ] = HPk (−)H T + E[∂ 2 (hk , x̃k (−)x̃k (−)T )∂ 2 (hk , x̃k (−)x̃k (−)T )T ]
4
1 2 1
− ∂ (hk , Pk (−))∂ 2 (hk , Pk (−))T − ∂ 2 (hk , Pk (−))∂ 2 (hk , Pk (−))T
4 4
1 2
+ ∂ (hk , Pk (−))∂ 2 (hk , Pk (−))T ,
4
or canceling terms that
1
E[T ] = HPk (−)H T + E[∂ 2 (hk , x̃k (−)x̃k (−)T )∂ 2 (hk , x̃k (−)x̃k (−)T )T ]
4
1 2
− ∂ (hk , Pk (−))∂ 2 (hk , Pk (−))T . (212)
4
In the above expression notice that the last two terms are equal to the definition of the
matrix Ak in the book. Next lets evaluate the above expression for Ak . To begin with, for
notational simplicity, we will drop the k subscripts and the (−) notation by considering the
second term in the above expression or
∂ 2 (h, x̃x̃T )∂ 2 (h, x̃x̃T )T .
This matrix since it is an outer product has an ijth element given by
! !
X X ∂ 2 hi X X ∂ 2 hj
∂ 2 (h, x̃x̃T )i ∂ 2 (h, x̃x̃T )j = x̃q x̃p x̃n x̃m
p q
∂xp ∂x q m n
∂xm ∂x n

X ∂ 2 hi ∂ 2 hj
= x̃p x̃q x̃m x̃n .
p,q,m,n
∂xp ∂xq ∂xm ∂xn

Taking the expectation of this expression and using the fact that for Gaussian random
variables we have
E[x̃p x̃q x̃m x̃n ] = ppq pmn + ppm pqn + ppn pqm , (213)
we can write the above as
X ∂ 2 hi ∂ 2 hj
[ppq pmn + ppm pqn + ppn pqm ] .
p,q,m,n
∂xp ∂xq ∂xm ∂xn

At the same time the ijth element of the other term in the definition of Ak is
X ∂ 2 hi ∂ 2 hj
2 2
∂ (h, P )i ∂ (h, P )j = ppq pmn .
p,q,m,n
∂xp ∂xq ∂xm ∂xn

Thus these terms cancel and we are left with


X ∂ 2 hi ∂ 2 hj
Aij = [ppm pqn + ppn pqm ] . (214)
p,q,m,n
∂xp ∂xq ∂xm ∂x n

This combined with the other term in Equation 212 gives the book's equation 6.1-28. Com-
bining all of the expressions obtained thus far we finally end with
 −1
Kk = Pk (−)Hk (x̂k (−))T Hk (x̂k (−))Pk (−)Hk (x̂k (−))T + Rk + Ak ,

as we were to show.

In this section we have computed all of the needed expectations required to evaluate Equa-
tion 204. Using everything from earlier we find that

Pk (+) = Pk (−) − Kk Hk (x̂k (−))Pk (−) ,

the same as in the book.

Notes on Statistical Linearization

In this section we seek to approximate the nonlinear vector function f (x) with the linear form

f (x) ≈ a + Nf x , (215)

where the vector a and the matrix Nf are determined by statistical linearization. To deter-
mine the specific form for a and Nf introduce the approximation error e as

e = f (x) − a − Nf x ,

and seek to minimized an objective function of a and Nf defined by

J = E[eT Ae]
= E[(f (x) − a − Nf x)T A(f (x) − a − Nf x)] , (216)

where $A$ is some symmetric positive semidefinite matrix. To find the minimum of $J$ with
respect to $a$, we take the derivative with respect to $a$, set the resulting expression equal to
zero, and then solve for $a$. Using Equation 312 to take the derivative we find
\[
\frac{\partial J}{\partial a} = E[-2A(f(x) - a - N_f x)] = 0\,.
\]
When we solve for a we get

a = E[f (x)] − Nf E[x] = fˆ − Nf x̂ , (217)

or the book's equation 6.2-7. When we put this expression for a back into our approximate
expression for f (x) given by Equation 215 we get

f (x) ≈ fˆ + Nf (x − x̂) , (218)

and for J given by Equation 216 we find that

J = E[(f − fˆ − Nf (x − x̂))T A(f − fˆ − Nf (x − x̂))] ,

and we need to find the minimum of the above expression as a function of Nf . Taking the
Nf derivative of the above expression is made easier if we write J as

\[
\begin{aligned}
J &= E[(f - \hat{f})^T A(f - \hat{f})] - E[(A(f - \hat{f}))^T N_f(x - \hat{x})] \\
&\quad - E[(x - \hat{x})^T N_f^T A(f - \hat{f})] + E[(x - \hat{x})^T N_f^T A N_f (x - \hat{x})]\,.
\end{aligned}
\]

Then using the product rule for the fourth term and Equations 319 and 320 to evaluate the
matrix derivatives we see that
∂J
= −E[A(f − fˆ)(x − x̂)T ] − E[A(f − fˆ)(x − x̂)T ]
∂Nf
+ E[ANf (x − x̂)(x − x̂)T ] + E[ANf (x − x̂)(x − x̂)T ]
= −2AE[(f − fˆ)(x − x̂)T ] + 2ANf E[(x − x̂)(x − x̂)T ] .

When we set this last expression equal to zero and solve for Nf we find

Nf = E[(f − fˆ)(x − x̂)T ]E[(x − x̂)(x − x̂)T ]−1 . (219)

Introducing x̃ as x̃ = x̂ − x we have E[(x − x̂)(x − x̂)T ] = P and

\[
\begin{aligned}
E[(f - \hat{f})(x - \hat{x})^T] &= E[fx^T] - E[f\hat{x}^T] - E[\hat{f}x^T] + E[\hat{f}\hat{x}^T] \\
&= E[fx^T] - \hat{f}\hat{x}^T - \hat{f}\hat{x}^T + \hat{f}\hat{x}^T \\
&= \widehat{fx^T} - \hat{f}\hat{x}^T\,,
\end{aligned}
\]

so
\[
N_f = (\widehat{fx^T} - \hat{f}\hat{x}^T)P^{-1}\,, \qquad (220)
\]
or the book’s equation 6.2-9.
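A quick Monte-Carlo check of this result is easy to set up. The Python sketch below estimates $N_f$ for the scalar nonlinearity $f(x) = x^3$ with $x \sim N(\hat{x}, P)$ (the values of $\hat{x}$ and $P$ are assumptions of mine), and compares it with the value $3(\hat{x}^2 + P)$ that follows from evaluating the expectations in closed form for this particular $f$.

```python
import numpy as np

# Monte-Carlo check of N_f = (E[f x] - fhat * xhat) / P for f(x) = x**3.
rng = np.random.default_rng(0)
x_hat, P = 0.7, 0.3                        # assumed mean and variance of x
x = x_hat + np.sqrt(P) * rng.standard_normal(200_000)
f = x**3

Nf_mc = (np.mean(f * x) - np.mean(f) * x_hat) / P
Nf_exact = 3.0 * (x_hat**2 + P)            # closed form for this particular f(x)
print(Nf_mc, Nf_exact)                     # should agree to Monte-Carlo accuracy
```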

Notes on Computational Considerations

We now consider the evaluation of ξij or


\[
\xi_{ij} = \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} x_j\,f_i(x)\,p(x)\,dx\,, \qquad (221)
\]
when $f_i(x)$ only depends on a limited number, say $x_s$, of state elements. To envision this case
think of it this way. Given all of the $f_i(\cdot)$ functions, if they are functions of only a limited
number of state variables then we can find an index (say j) such that our ith nonlinearity,
fi (x), does not depend on the state xj . Then if we let xs be the state variables without the
variable xj then we can write the joint density p(x) as xj conditional on xs as

p(x) = p(xs , xj ) = p(xj |xs )p(xs ) .

With this we can then write the expression for ξij as


\[
\begin{aligned}
\xi_{ij} &= \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} x_j\,f_i(x_s)\,p(x_j|x_s)\,p(x_s)\,dx_s\,dx_j \\
&= \int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} f_i(x_s)\,p(x_s)\left[\int_{-\infty}^{\infty} x_j\,p(x_j|x_s)\,dx_j\right] dx_s\,. \qquad (222)
\end{aligned}
\]
R∞
We now need to evaluate −∞ xj p(xj |xs )dxj . Since xs and xj are jointly normal this integral
is in fact the conditional mean of xj given the vector xs . Since xj and xs are jointly Gaussian
this expression has the form given by
Z ∞
E[xj |xs ] = xj p(xj |xs )dxj = x̂j + pTjs Σ−1
ss (xs − x̂s ) .
−∞

see [3]. Where we have defined

pjs = E[(xj − x̂j )(xs − x̂s )]


Σss = E[(xs − x̂s )(xs − x̂s )T ] .

Note that pjs is a column vector and contains the elements in the jth column/row of P =
E[x̃x̃T ] excluding the jth diagonal element, while Σss is a matrix. When we put these two
into Equation 222 we get
Z ∞ Z ∞

ξij = ··· fi (xs )p(xs ) x̂j + pTjs Σ−1
ss (xs − x̂s ) dxs
−∞ −∞
 
= E x̂j + pTjs Σ−1ss (xs − x̂s ) fi (xs )

= x̂j E [fi (xs )] + pTjs Σ−1


ss E [(xs − x̂s )fi (xs )] ,

since x̂j , pjs and Σss are all constants with respect to the expectation over xs . Now since
the expression pTjs Σ−1
ss E[(xs − x̂s )fi (xs )] is a scalar we can take its transpose and not change
its value. Doing this gives
ξij = fˆi x̂j + nTsi pjs , (223)
where we have defined nsi as

\[
\begin{aligned}
n_{si}^T &= E[f_i(x_s)(x_s - \hat{x}_s)^T]\left[\Sigma_{ss}^{-1}\right]^T \\
&= E[f_i(x_s)(x_s - \hat{x}_s)^T]\,E[(x_s - \hat{x}_s)(x_s - \hat{x}_s)^T]^{-1}\,, \qquad (224)
\end{aligned}
\]

which is the book's equation 6.2-36. Note that I think the book is missing a transpose on its
definition of nsi .
Notes on Direct Statistical Analysis of Nonlinear Systems (CADET)

We approximate f (x) with

f (x) ≈ fa (x) = Nm m + Nr r , (225)

thus the error e is given by

e = f (x) − fa (x) = f (x) − Nm m − Nr r .

Thus eeT is given by

eeT = (f (x) − Nm m − Nr r)(f (x) − Nm m − Nr r)T


= f f T − f mT Nm
T
− f r T NrT
− Nm mf T + Nm mmT Nm T
+ Nm mr T NrT
− Nr rf T + Nr rmT NmT
+ Nr rr T NrT .

Taking the expectation of this and using the fact that E[rmT ] = E[mr T ] = 0 and that m is
a constant gives

E[eeT ] = E[f f T ] − E[f ]mT NmT


− E[f r T ]NrT
− Nm mE[f T ] + Nm mmT Nm T

− Nr E[rf T ] + Nr E[rr T ]NrT .

We next want to take the trace of this expression and use it to evaluate the Nm and Nr
derivatives needed to find a minimum of the objective function J = trace(E[eeT ]). The
derivative expressions we need are
\[
\frac{\partial}{\partial N_m}\operatorname{trace}(E[ee^T]) = 0 \quad\text{and}\quad \frac{\partial}{\partial N_r}\operatorname{trace}(E[ee^T]) = 0\,.
\]
To evaluate these derivatives we will use Equations 313, 314, 315, 316, 317, and 318. For
the derivative of Nm we find
\[
\frac{\partial}{\partial N_m}\operatorname{trace}(E[ee^T]) = -E[f]m^T - E[f]m^T + 2N_m mm^T = 0\,,
\]
or that Nm must satisfy
Nm mmT = E[f ]mT , (226)
which is the book's equation 6.4-4. For the derivative of Nr we find
\[
\frac{\partial}{\partial N_r}\operatorname{trace}(E[ee^T]) = -E[fr^T] - E[fr^T] + N_r\left(2E[rr^T]\right) = 0\,,
\]
or that Nr must satisfy
Nr E[rr T ] = E[f r T ] , (227)
which is the book's equation 6.4-5.
Now our dynamic equation is given by ẋ = f (x, t) + w which under the assumption that
x = m + r and Equation 225 becomes
ṁ + ṙ = Nm m + Nr r + w .
When w ∼ N(b, Q) we can introduce the variable u as u = w − b and get
ṁ + ṙ = Nm m + Nr r + b + u .
If we assume that we can decouple into two equations the expressions for the mean from the
residual expressions we get the following
ṁ = Nm m + b (228)
ṙ = Nr r + u , (229)
which are the book’s equations 6.4-9. From Equation 229 we can derive the differential
equation for S ≡ E[rr T ] to find
Ṡ = Nr (m, S)S + SNrT (m, S) + Q , (230)
since w ∼ N(b, Q) so u ≡ w − b ∼ N(0, Q).

If our system is linear f (x) = F x = F m + F r we can evaluate the Equations 226 and 227.
For Equation 227 we find that E[f r T ] is given by
E[f r T ] = E[F mr T + F rr T ] = F E[rr T ] = F S(t) .
So Nr becomes
Nr = E[f r T ]S −1 = F SS −1 = F .
In the same way we find that Equation 226 becomes
Nm m = E[f (x)] = F m so Nm = F ,
also.

Next we want to prove that when $r$ is Gaussian we have the identity
\[
N_r(m, S) = \frac{d}{dm}E[f(x)]\,. \qquad (231)
\]
To do this we note that if r is Gaussian with a covariance matrix S then E[f (x)] can be
written Z ∞ Z ∞
E[f (x)] = ··· f (m + r)N(r; 0, S)dr ,
−∞ −∞
where to simplify notation we have introduced the notation
 
\[
N(r;\mu,\Sigma) \equiv \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(r-\mu)^T\Sigma^{-1}(r-\mu)\right)\,, \qquad (232)
\]
to represent the probability density function of a n dimensional Gaussian random variable.
Using the above expression for E[f (x)] we see that the m derivative of this is given by
Z ∞ Z ∞
∂E[f (x)] ∂f (m + r)
= ··· N(r; 0, S)dr .
∂m −∞ −∞ ∂m
Note that this m derivative is really also an r derivative as
∂f (m + r) ∂f (m + r)
= ,
∂m ∂r
and thus we need to evaluate
Z ∞ Z ∞
∂E[f (x)] ∂f (m + r)
= ··· N(r; 0, S)dr .
∂m −∞ −∞ ∂r
In the above integration the expression ∂f (m+r)
∂r
is a matrix with an ijth component given by
∂fi (m+r)
∂rj
. If we then consider just the integral over rj (denoted by Ij ) in the above expression
then by integration by parts we have
Z ∞ Z ∞
∂fi (m + r) ∂
Ij = N(r; 0, S)drj = 0 − fi (m + r) N(r; 0, S)drj .
rj =−∞ ∂rj rj =−∞ ∂rj
Evaluating the rj derivative above we see that
  
∂ ∂ 1 1 T −1
N(r; 0, S) = exp − r S r
∂rj ∂rj (2π)n/2 |S|1/2 2
  
1 1 T −1 1 T −1 T −1
= exp − r S r − (ej S r + r S ej )
(2π)n/2 |S|1/2 2 2
= −(r T S −1 ej )N(r; 0, S) .
Thus the vector derivative of N(r; 0, S) is given by

N(r; 0, S) = −N(r; 0, S)S −1 r (233)
∂r
Using these results we see that
Z ∞ Z ∞
∂E[fi (x)] ∂fi (m + r)
= ··· N(r; 0, S)dr
∂m −∞ −∞ ∂r
Z ∞ Z ∞
= ··· fi (m + r)rN(r; 0, S)drS −1 .
−∞ −∞

When we then consider this expression for all values of i we see that
∂E[f (x)]
= E[f (x)r T ]S −1 = Nr (m, S) ,
∂m
as we were to show.

In the special case where $f$ is a scalar function and we assume that the random perturbation
$r$ is Gaussian, then taking $n_r \equiv N_r(m,S)$ we find
\[
n_r = E[f(x)r^T]S^{-1} = \frac{1}{\sigma^2}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} f(m+r)\,r\,e^{-r^2/2\sigma^2}\,dr\right)
    = \frac{1}{\sqrt{2\pi}\,\sigma^3}\int_{-\infty}^{\infty} f(m+r)\,r\,e^{-r^2/2\sigma^2}\,dr\,. \qquad (234)
\]
At the same time we find the scalar version of the equation for $N_m$, or $N_m(m,S)m = E[f]$, becomes
\[
n_m = \frac{1}{\sqrt{2\pi}\,\sigma\,m}\int_{-\infty}^{\infty} f(m+r)\,e^{-r^2/2\sigma^2}\,dr\,. \qquad (235)
\]
Problem Solutions

Problem 6-1 (a density for x)

The given expression for the probability density function (p.d.f) for x is a special case of
distribution known as the gamma distribution. If X is given by a gamma distribution then
it has a p.d.f given by
β α α−1 −βx
f (x|α, β) = x e . (236)
Γ(α)
From which we see that the books expression can be obtained by taking α = 2 and β = λ.
We now derive several properties of the gamma distribution and then answer the requested
questions by making the substitution α = 2 and β = λ in the resulting expressions.

The characteristic function for a gamma random variable is given by


Z ∞
itX β α α−1 itx −βx
ζ(t) = E(e ) = x e e dx
x=0 Γ(α)
Z ∞
βα
= xα−1 e−(β−it)x dx .
Γ(α) x=0
v
To evaluate this integral let v = (β − it)x so that x = β−it and dv = (β − it)dx and we get
Z ∞
βα 1 dv
ζ(t) = α−1
v α−1 e−v .
Γ(α) (β − it) v=0 β − it
If we recall the definition of the Gamma function
Z ∞
Γ(α) ≡ v α−1 e−v dv , (237)
v=0

we see that the above integral becomes


 α
βα Γ(α) β
ζ(t) = =
Γ(α) (β − it)α β − it
 −α
it
= 1− . (238)
β
Using this expression we could compute E(X) and E(X 2 ) via derivatives. Alternatively we
could compute these expectations directly as follows
Z ∞ Z ∞ α
β α α −βx βα v −v dv
E(X) = x e dx = e
x=0 Γ(α) Γ(α) v=0 β α β
Z ∞
1 Γ(α + 1) α
= v α e−v dv = = , (239)
βΓ(α) v=0 βΓ(α) β
when we make the substitution v = βx. Next we find E(X 2 ) given by
Z ∞ Z
2 β α α+1 −βx β α 1 1 ∞ α+1 −v
E(X ) = x e dx = v e dv
x=0 Γ(α) Γ(α) β α+1 β v=0
1 (α + 1)α
= 2
Γ(α + 2) = . (240)
β Γ(α) β2
Thus the variance of a gamma random variable is given by
(α + 1)α α2 α
Var(X) = 2
− 2 = 2. (241)
β β β

Part (a): If we take α = 2 and β = λ in the expression from Equation 239 we get
2
E(X) = .
λ

Part (b): For this part to find the maximum value of f (x|α, β) when X is a gamma random
variable we take the x derivative of f , set the result equal to zero, and then solve for x. We
find
df (x|α, β) βα 
= (α − 1)xα−2 e−βx − βxα−1 e−βx = 0 .
dx Γ(α)
When we solve for x we find
α−1
x= .
β
If we take α = 2 and β = λ in the above expression we get
1
x= .
λ

R
Part (c): The expectation of y is given by E[Y ] = yp(y)dy, while the value of y that
maximizes p(y) is given by the solution to p′ (y) = 0. If these two points are are the same
then we must have
p′ (E[y]) = 0 .

Problem 6-2 (the non-linear expectation reduces to the linear)

For this problem we use the “E” notation for expectation rather than the book's “hat”
notation; in symbols $E[X] \equiv \hat{X}$. The book's equation 6.1-5 is

Ṗ (t) = E[xf T ] − E[x]E[f ]T + E[f xT ] − E[f ]E[x]T + Q . (242)

If our function f is in fact a linear function f (x) = F x then E[f ] = F E[x] where we are
assuming that F is not state dependent. Next xf T = xxT F T under this linear assumption,
so taking expectations we have

E[xf T ] = E[xxT ]F T .

Since P (t) by definition can be written as

P (t) = E[(x − E[x])(x − E[x])T ] = E[xxT ] − E[x]E[x]T ,

we have that
E[xxT ] = P (t) + E[x]E[x]T .
and we see that E[xf T ] is given by

E[xf T ] = F P (t) + E[x]E[x]T F T .

In the same way since f xT is just the transpose of xf T we see that

E[f xT ] = P (t)F T + F E[x]E[x]T .

Thus the differential equation for P (t) becomes

Ṗ (t) = P F T + E[x]E[x]T F T − E[x]E[x]T F T


+ F P + F E[x]E[x]T − F E[x]E[x]T + Q
= PFT + FP + Q,

the expression we were to show.

Problem 6-3 (filtering xk using a quadratic expression for the measurements)

Part (a): We will estimate $x(t_k)$ after observing the measurement $z_k$ using an expression
quadratic in $z_k$, or
x̂k (+) = ak + bk zk + ck zk2 .
Since zk = h(xk ) + vk in terms of h(·) the above becomes

x̂k (+) = ak + bk h(xk ) + bk vk + ck h(xk )2 + 2ck h(xk )vk + ck vk2 . (243)

To have the above expression for x̂k (+) be an unbiased estimator of xk we require that
E[x̂k (+)] = E[xk ] = x̂k (−). Using this with E[vk ] = 0 and E[vk2 ] = r when we take the
expectation of Equation 243 we get

ak + bk E[h(xk )] + ck E[h(xk )2 ] + ck r = x̂k (−) . (244)

This is the same expression we were asked to derive.

Problem 6-4 (deriving the linearized Kalman filter)

For this problem we derive the expressions for a linearized Kalman filter that are summarized
in the book. To begin we consider a first-order Taylor expansion of f (x(t), t) and hk (xk )
about a known trajectory x̄(t) as follows

∂f
f (x(t), t) = f (x̄(t), t) + (x − x̄) + · · · (245)
∂x x=x̄(t)

∂hk
hk (xk ) = hk (x̄(tk )) + (xk − x̄(tk )) + . (246)
∂x x=x̄(tk )
To simplify notation we will define the matrices F and Hk to be

∂f
F = F (x̄(t), t) =
∂x x=x̄(t)

∂h
Hk = Hk (x̄(tk ), tk ) = .
∂x x=x̄(tk )

When the expression for f (x(t), t) above it put into the state dynamic Equation 194 or
˙
x̂(t) = fˆ(x(t), t) we get
˙
x̂(t) = fˆ(x(t), t) = E[f (x(t), t)] = f (x̄(t), t) + F (x̄(t), t)(x̂ − x̄) .

Next we want to put our Taylor expansions above into Equation 195 or
dT − x̂fˆT + fd
Ṗ (t) = xf xT − fˆx̂T + Q .
dT .
Since we know how to evaluate fˆ, the expectation of f , lets first consider the term xf
Before we take the expectation, under the Taylor expansion above f (x, t) we find xf T is
given by
xf T = xf (x̄(t), t)T + x(x − x̄)T F (x̄(t), t)T .
When we use the fact that x = x̂ − x̃ we get xf T equal to

xf T = (x̂ + x̃)f (x̄(t), t)T + (x̂ + x̃)(x̂ + x̃ − x̄)T F (x̄(t), t)T


= (x̂ + x̃)f (x̄(t), t)T + x̂(x̂ + x̃ − x̄)T F (x̄(t), t)T + x̃(x̂ + x̃ − x̄)T F (x̄(t), t)T

From which we can now take the expectation to find that


dT = x̂f (x̄(t), t)T + x̂(x̂ − x̄)T F (x̄(t), t)T + P (t)F (x̄(t), t)T .
xf

We can now evaluate Ṗ (t) using Equation 195. We find

Ṗ (t) = x̂f (x̄, t)T + x̂(x̂ − x̄)T + P (t)F (x̄, t)T


− x̄f (x̄, t)T − x̂(x̂ − x̄)T F (x̄, t)T
+ f (x̄, t)x̄T + F (x̄, t)(x̂ − x̄)x̂T + F (x̄, t)P (t)
− f (x̄, t)x̄T − F (x̄, t)(x̂ − x̄)x̂T + Q
= P (t)F (x̄, t)T + F (x̄, t)P (t) + Q ,

the desired expression.

Next we evaluate Equation 200 or

x̂k (+) = x̂k (−) + Kk [zk − ĥk (xk )] .

From the given Taylor series expansion for hk (xk ) we have the expectation of hk (xk ) denoted
by ĥk (xk ) given by

ĥk (xk ) = E[hk (xk )] = hk (x̄(tk )) + H(x̄(tk ), tk )(x̂k (−) − x̄(tk )) .


Thus we see that Equation 200 becomes

x̂k (+) = x̂k (−) + Kk [zk − hk (x̄(tk )) − H(x̄(tk ), tk )(x̂k (−) − x̄(tk ))] ,

the desired equation.

Next we simplify Equation 203 to derive the equation for Kk under the linearization above.
To do this we first evaluate

hk (xk ) − ĥk (xk ) = hk (x̄(tk )) + H(x̄(tk ))(xk − x̄k )


− hk (x̄(tk )) − H(x̄(tk ))(x̂k − x̄k )
= H(x̄(tk ))(xk − x̂k )
= −H(x̄(tk ))x̃k (−) .

Using this expression we see that the product x̃k (−)(hk (xk ) − ĥk (xk ))T is then

−x̃k (−)x̃k (−)T H(x̄(tk ))T .

Taking expectation of this we are left with



E x̃k (−)(hk (xk ) − E[hk (xk )])T = −Pk (−)H(x̄(tk ))T .

Next we can now compute the inner product required in the expression for the matrix
inverse portion of Kk or [hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T . From the above expression for
hk (xk ) − ĥk (xk ) we see that this is given by

E[[hk (xk ) − ĥk (xk )][hk (xk ) − ĥk (xk )]T ] = H(x̄(tk ))Pk (−)H(x̄(tk ))T .

Combining all of the expressions obtained thus far we finally end with
 −1
Kk = Pk (−)H(x̄(tk ))T H(x̄(tk ))Pk (−)H(x̄(tk ))T + Rk ,

as we were to show.

In this section we have computed all of the needed expectations required to evaluate Equa-
tion 204. Using everything from earlier we find that

Pk (+) = Pk (−) − Kk H(x̄(tk ))Pk (−) ,

the same as in the book.

Problem 6-6 (a density for x)

To begin we will square the given expression to get several terms. We find

(f (x) − a − bx − cx2 )2 = a2 − 2af (x) + f (x)2


+ 2abx − 2bf (x)x + b2 x2
+ 2acx2 − 2cf (x)x2 + 2bcx3 + c2 x4 .
We now take the expectation of the above expression. We find

E[(f (x) − a − bx − cx2 )2 ] = a2 − 2aE[f (x)] + E[f (x)2 ]


+ 2abE[x] − 2bE[f (x)x] + b2 E[x2 ]
+ 2acE[x2 ] − 2cE[f (x)x2 ] + 2bcE[x3 ] + c2 E[x4 ] .

To find the values of a, b, and c such that the above expression is a minimum we take the
derivative of E[(f (x) − a − bx − cx2 )2 ] with respect to each of these values, set the resulting
expressions equal to zero and solve for them. We find

a = fˆ − bx̂ − cxb2
2
(−fˆxb2 xb3 + fd x2 (−x̂xb2 + xb3 ) + fcx(xb2 − xb4 ) + fˆx̂xb4 )
b = 3 2
(xb2 + xb3 + x̂2 xb4 − xb2 (2x̂xb3 + xb4 ))
2
fdx2 (x̂2 − xb2 ) + fcx(−x̂xb2 + xb3 ) + fˆ(xb2 − x̂xb3 )
c = 3 2 .
(xb2 + xb3 + x̂2 xb4 − xb2 (2x̂xb3 + xb4 ))

These calculations are done in the Mathematica file chap 6 prob 6.nb. Now to try to make
these expressions look more like the ones in the book we could transform these “raw moments”
i.e. the expressions E[xi ] into central moments mi defined by mi = E[(x − x̂)i ]. This can be
done with the “inverse binomial transform” (see [10]) or
n 
X 
n n
E[x ] = mk x̂n−k .
k
k=0

Using this expression we have

E[x] = x̂
E[x2 ] = m2 + x̂2
E[x3 ] = m3 + 3m2 x̂ + x̂3
E[x4 ] = m4 + 4m3 x̂ + 6m2 x̂2 + x̂4 .

Warning: When we do this however the results for a, b, and c don’t seem to match the
book’s results. If anyone sees an error with what I’ve done please contact me.

Problem 6-7 (evaluating E[xn ])

Warning: While this seems like a simple problem, I was unable to show the desired result
for n even. If anyone sees anything wrong with what I’ve done or has an alternative way to
solve this problem please contact me.

Recall (see [3]) that if a given random variable X has a characteristic function ζ(t) and the
expectation E[xn ] exists for some positive integer n then it can be evaluated from

E[X n ] = i−n ζ (n) (0) . (247)


When $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$ then it has a p.d.f.
given by $f(x|\mu,\sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$. In what follows we will try to evaluate $\zeta(t)$
directly. We have
\[
\zeta(t) = E(e^{itX}) = \frac{1}{(2\pi)^{1/2}\sigma}\int_{-\infty}^{\infty} e^{itx}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx\,.
\]
The argument of the exponential in the above expression is given by
1  2 
− 2
x − 2µx + µ2 − 2iσ 2 tx

1  
= − 2 x2 − 2(µ + iσ 2 t)x + µ2

1  
= − 2 x2 − 2(µ + iσ 2 t)x + (µ + iσ 2 t)2 − (µ + iσ 2 t)2 + µ2

1 (µ + iσ 2 t)2 µ2
= − 2 (x − (µ + iσ 2 t))2 + −
2σ 2σ 2 2σ 2
1 µ + 2µσ ti − σ 4 t2 − µ2
2 2
= − 2 (x − (µ + iσ 2 t))2 +
2σ 2σ 2
1 2µσ ti − σ 4 t2
2
= − 2 (x − (µ + iσ 2 t))2 + .
2σ 2σ 2
Thus the integral expression we seek to evaluate looks like
 Z ∞
1 σ 2 t2 1 2 2
ζ(t) = 1/2
exp itµ − e− 2σ2 (x−(µ+iσ t)) dx .
(2π) σ 2 −∞

To evaluate this let v = x − (µ + iσ 2 t) so that dx = dv and the integral above becomes


Z ∞
1 2
e− 2σ2 v dv = (2π)1/2 σ .
−∞

Thus the characteristic function for a Gaussian random variable is given by


 
σ 2 t2
ζ(t) = exp itµ − , (248)
2

as we were to show. If our Gaussian has zero mean µ = 0 and unit variance then σ 2 = 1 and
the above expression simplifies to
 2
t
ζ(t) = exp − ,
2

We can use this result to compute the expectation of X n when X has a unit variance. If X
does not have a unit variance then derivation below changes slightly but is effectively the
same. Thus we will evaluate E[X n ] in the case where X has unit variance. To determine
this expectations requires that we evaluate derivatives of ζ(t). We find
t2
ζ (0) (t) = e− 2
t2 t2
ζ (1) (t) = e− 2 (−t) = −te− 2
t2 t2 t2
ζ (2) (t) = −e− 2 + t2 e− 2 = (−1 + t2 )e− 2
t2 t2 t2
ζ (3) (t) = 2te− 2 + (−1 + t2 )(−t)e− 2 = (3t − t3 )e− 2
t2
ζ (4) (t) = (3 − 6t2 + t4 )e− 2
t2
ζ (5) (t) = (−15t + 10t3 − t5 )e− 2
t2
ζ (6) (t) = (−15 + 45t2 − 15t4 + t6 )e− 2 .

Some of these calculations are done in the Mathematica file chap 6 prob 7.nb. By perform-
ing these derivatives we see that the form of ζ (n) (t) looks like it takes the form
t2
ζ (n) (t) = φn (t)e− 2 , (249)

where φn (t) is a nth degree polynomial. In fact for n odd the polynomial φn (t) has only
odd powers of t (with no intercept term) and for n even it looks like φn (t) has only even
powers of t. Thus with the above expression for ζ (n) (t) we see that to evaluate expectations
of powers of X we have
E[X n ] = i−n ζ (n) (0) = i−n φn (0) ,
thus we need to be able to evaluate the polynomial φn (t) at t = 0.

From the above expression for ζ (n) (t) in Equation 249 we see that using the product rule
ζ (n+1) (t) is given by
t2 t2 t2
ζ (n+1) (t) = φ′n (t)e− 2 − φn (t)te− 2 = (φ′n (t) − tφn (t))e− 2 ,

and that ζ (n+2) (t) is given by


t2
ζ (n+2) (t) = [φ′′n (t) − φn (t) − tφ′n (t) − t(φ′n (t) − tφn (t))] e− 2
  t2
= φ′′n (t) − 2tφ′n (t) + (−1 + t2 )φn (t) e− 2 .

Thus the recursive relationship between the coefficient polynomials φn+2 (t) and the one two
previous φn (t) is
φn+2 (t) = φ′′n (t) − 2tφ′n (t) + (−1 + t2 )φn (t) . (250)
Given the examples of φ1 (t), φ3 (t), and φ5 (t) presented at the beginning of this problem lets
form the induction hypothesis that when n is odd then φn (t) is an odd polynomial that is
n
X
φ2n+1 (t) = a2k+1 t2k+1 . (251)
k=0

This statement is true for the polynomials φ1 (t), φ3 (t), and φ5 (t) above. If we assume
that φ2n+1 (t) has the form given by Equation 251 then we see from Equation 250 that
φ2n+3 (t) must also have a form given by Equation 251. This is because each of the terms in
Equation 251 is odd polynomial and so the sum is another odd polynomial. In this case we
see that φ2n+1 (0) = 0 and by Equation 247 all odd powers of X have zero expectation.

Given the examples of φ2 (t), φ4 (t), and φ6 (t) presented at the beginning of this problem lets
form the induction hypothesis that when n is even then φn (t) is an even polynomial that is
n
X
φ2n (t) = a2k t2k . (252)
k=0

Again using Equation 250 we see that if φ2n (t) has this form then φ2n+2 (t) will also have this
form.

At this point I would like to derive a recursive expression for φ2n (0) since that would enable
me to evaluate the desired expectations. I was unable to do this however. If anyone sees a
method to do this please let me know.

Problem 6-8 (deriving the expression ξij )

See the notes on Page 125 for this derivation.

Problem 6-9 (deriving the expressions for Nm (m, S) and Nr (m, S))

See the notes on Page 127 for the requested derivation.

Problem 6-10 (show that E[exT ] = 0)

Since e = f (x) − fa (x) = f (x) − Nm m − Nr r we find

E[exT ] = E[(f − Nm m − Nr r)xT ]


= E[f xT ] − Nm mE[xT ] − Nr E[rxT ] .

Since x = m + r and m is a constant this becomes

E[f ]mT + E[f r T ] − Nm mmT − Nr E[rr T ] .

Using Equations 226 and 227 we see that all terms cancel and we end with E[exT ] = 0 as
we were to show.
Problem 6-11 (evaluating some multiple-input describing function gains)

For this problem f (x) = x(1 + x2 ) so that using Equation 234 to compute nr we get
\[
n_r(m, \sigma_r^2) = \frac{1}{\sqrt{2\pi}\,\sigma_r^3}\int_{-\infty}^{\infty} (m+r)\left(1 + (m+r)^2\right) r\,e^{-r^2/2\sigma_r^2}\,dr = 1 + 3m^2 + 3\sigma_r^2\,.
\]

Using Equation 235 to compute $n_m$ we get
\[
n_m(m, \sigma_r^2) = \frac{1}{\sqrt{2\pi}\,m\,\sigma_r}\int_{-\infty}^{\infty} (m+r)\left(1 + (m+r)^2\right) e^{-r^2/2\sigma_r^2}\,dr = 1 + m^2 + 3\sigma_r^2\,.
\]

Where we have evaluated the above integrals in the Mathematica file chap 6 prob 11.nb.
Note that to evaluate these integrals “by hand” we would expand the polynomial argument,
for example
(m + r)(1 + (m + r)2 ) ,
in the case of $n_m$, into a polynomial in $r$ and then use the results on the expectation of powers
of a zero mean Gaussian random variable with variance $\sigma^2$. This latter result is that if $X$ is
such a random variable then $E[X^n] = 0$ when $n$ is odd and

E[X n ] = 1 · 3 · 5 · · · (n − 1)σ n ,

when n is even.
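As a sanity check of these two gains, the Python sketch below estimates them by Monte Carlo for illustrative values of $m$ and $\sigma_r$ (my choices, not the book's) and compares them against the closed-form expressions above.

```python
import numpy as np

# Monte-Carlo check of the describing function gains for f(x) = x (1 + x^2).
rng = np.random.default_rng(2)
m, sigma = 0.5, 0.8                           # illustrative mean and residual std-dev
r = sigma * rng.standard_normal(500_000)
f = (m + r) * (1.0 + (m + r)**2)

n_r_mc = np.mean(f * r) / sigma**2            # n_r = E[f r] / sigma^2
n_m_mc = np.mean(f) / m                       # n_m = E[f] / m
print(n_r_mc, 1 + 3*m**2 + 3*sigma**2)
print(n_m_mc, 1 + m**2 + 3*sigma**2)
```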

Problem 6-12 (describing function gains for some simple probability densities)

For the ideal relay nonlinearity f (x) defined by



 D x>0
f (x) = 0 x=0 ,

−D x < 0

we want to evaluate nr under several different assumptions on the probability density for r
(the residual). Since f (·) is a scalar function we have
\[
n_r = \frac{1}{\sigma^2}E[f(x)r] = \frac{1}{\sigma^2}\int r\,f(m+r)\,p(r)\,dr = \frac{1}{\sigma^2}\int r\,f(r)\,p(r)\,dr\,,
\]
when we assume that m is zero.
For a Gaussian density recall that $p(r) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{r^2}{2\sigma^2}}$ and we compute
\[
\begin{aligned}
n_r &= \frac{1}{\sigma^2}\int r\,f(r)\,p(r)\,dr \\
&= -\frac{D}{\sigma^2}\int_{-\infty}^{0} r\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{r^2}{2\sigma^2}}\,dr + \frac{D}{\sigma^2}\int_{0}^{\infty} r\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{r^2}{2\sigma^2}}\,dr \\
&= -\frac{D}{\sqrt{2\pi}\,\sigma^3}\left[(-\sigma^2)\,e^{-\frac{r^2}{2\sigma^2}}\right]_{-\infty}^{0} + \frac{D}{\sqrt{2\pi}\,\sigma^3}\left[(-\sigma^2)\,e^{-\frac{r^2}{2\sigma^2}}\right]_{0}^{\infty} \\
&= \frac{D}{\sigma\sqrt{2\pi}}\,(1 - 0) - \frac{D}{\sigma\sqrt{2\pi}}\,(0 - 1) = \sqrt{\frac{2}{\pi}}\,\frac{D}{\sigma}\,.
\end{aligned}
\]

Next recall that a Triangular density between −b and +b had an analytic representation
of its density of 

 0 r < −b
 1
b2
(r + b) −b <r<0
p(r) = 1 .

 − b2 (r − b) 0 < r < b

0 r>b
To use this, we first compute the expression for the variance σ 2 of this density in terms of
the parameter b. We find
Z
2
σ = r 2 p(r)dr
Z 0 Z b
2 1 1
= r 2 (r + b)dr − r 2 2 (r − b)dr
−b b 0 b
Z 0 Z 0  Z b Z b 
1 3 2 1 3 2
= 2 r dr + b r dr − 2 r dr − b r dr
b −b −b b 0 0
" 0 # " b #
0 b
1 r 4 r 3 1 r 4 r 3
= 2 +b − 2 −b
b 4 −b 3 −b b 4 0 3 0
 4   
1 b b4 1 b4 b4 1
= 2 − + − 2 − = b2 .
b 4 3 b 4 3 6
Next we calculate nr . We find
Z
1
nr = rf (r)p(r)dr
σ2
Z 0 Z b  
1 1 1 1
= r(−D) 2 (r + b)dr + 2 r(D) − 2 (r − b) dr
σ 2 −b b σ 0 b
Z 0 Z b
D D
= − 2 2 (r 2 + br)dr − 2 2 (r 2 − br)dr
σ b −b σ b 0
 0  b
D r 3 br 2 D r 3 br 2
= − 2 2 + − −
σ b 3 2 −b σ 2 b2 3 2 0
 3   
D b b3 D b3 b3
= − 2 2 − − 2 2 −
σ b 3 2 σ b 3 2
Db
= .
3σ 2

When we use the fact that b = 6σ we get
r
2D
nr = .

For a uniform density between − a2 and a2 we start by recalling that the variance is related
to the end points of the density by
Z
a2
σ = r 2 p(r)dr =
2
.
12
Next we calculate nr as
Z a "Z Z a #
0
1 2 1 1 2
nr = rf (r) drn = 2 −Drdr + Drdr
σ 2 − a2 a aσ − a2 0
" 0 a #     
D r 2 r 2 2 D 1 a2 1 a2 aD
= 2
− + rdr = 2 + = 2.
aσ 2 −a 2 0 aσ 2 4 2 4 4σ
2

2 √
Since σ 2 = a12 or a = 12σ, so in terms of σ only nr is given by
r
3D
nr = .

Problem 6-13 (CADET applied to a nonlinear differential equation)

We want to approximate f (x) = a1 x + a2 x2 under the CADET philosophy. That is we take


f (x) ≈ fa (x) = nm m + nr r for two functions nm and nr . For nr assuming a Gaussian density
for r i.e. r ∼ N(0, σ 2 ) we find
Z ∞
1 2 2
nr = √ f (r + m)re−r /2σ dr
2πσ 3 −∞
Z ∞
1 2 2
= √ (a1 (r + m) + a2 (r + m)2 )re−r /2σ dr
2πσ 3 −∞
= a1 + 2a2 m .
While for nm we find
Z ∞
1 2 2
nm = √ f (r + m)e−r /2σ dr
2πmσ −∞
Z ∞
1 2 2
= √ (a1 (r + m) + a2 (r + m)2 )e−r /2σ dr
2πmσ −∞
1 
= a1 m + a2 (m2 + σ 2 ) .
m
Where we have evaluated the above integrals in the Mathematica file chap 6 prob 13.nb.
Then using Equations 228 and 229 with σ 2 = p we have
ṁ = mm m + b = a1 m + a2 (m2 + σ 2 ) + b = a1 m + a2 (m2 + p) + b
ṙ = nr r + u = (a1 + 2a2 m)r + u .
Then using Equation 230 we get for the evolution of p

ṗ = 2nr p + q = 2(a1 + 2a2 m)p + q .


Chapter 7 (Suboptimal Filter Design and Sensitivity
Analysis)

Notes on the text

Notes on Example 7.1-2

We can derive the state covariance update equations by noting that figure 7.1-5 is the same
system as that given in Example 7.1-1 but with the value of γ taken to be 1, with q22 = 0,
and with the matrix P H T R−1 HP taken to be zero. This last fact is because we are not
getting the reduction in state uncertainty from any measurements. Using these facts and
the results from Exercise 7-3 on Page 155 the state covariance equation

Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
becomes the set of scalar equations
ṗ11 = −2α1 p11 + q1
ṗ12 = −(α1 + α2 )p12 + p11
ṗ22 = −2α2 p22 + 2p12 ,
which are the same ones given in the book. If we want to get the steady-state values for the
covariance errors under the system above we set Ṗ = 0 and then solve for the elements of
P . When we do this and by solving these equations from top to bottom we find that the
steady-state values for each element of P are
q
p11 =
2α1
p11 q
p12 = =
α1 + α2 2α1 (α1 + α2 )
p12 q
p22 = = ,
α2 2α1 α2 (α1 + α2 )
which is the book’s equation 7.1-14.

Notes on Example 7.1-4

From the given system


ẋ = ax + w with w ∼ N(0, q) (253)
z = bx + v with v ∼ N(0, r) , (254)
bp(t)
we have F = a, G = 1, Q = q, H = b, and R = r, and thus k = r
and the Kalman filter
error covariance is
b2 p2
ṗ = 2ap + q − ,
r
β

with p(0) = p0 . As shown in Chapter 4 Problem 4-11 that p∞ = ar
b2
1+ a
and so we find
k∞ given by  
T −1 p∞ b a β a+β
k ∞ = p∞ H R = = 1+ = .
r b a b
When we define the estimation error between the Wiener and Kalman filters as δp(t) =
pw (t) − pk (t), we can use the bound presented in the book to show
 
2 b2
2 T −1
||p0 − p∞ || ||H R H|| (p 0 − p ∞ ) r
||δp(t)|| ≤ = ,
8|αmax | 8|αmax |
where αmax is the maximum real part of the eigenvalues of the matrix F − K∞ H. In this
case everything is a scalar and we find
 
a+β
F − K∞ H = a − b=β.
b
So the above error discrepancy between the Kalman and Wiener filters becomes
(p0 − p∞ )2 b2
||δp(t)|| ≤ ,
8rβ
the same result given in the book.

Notes on Example 7.1-5

For this example we assume that we are filtering the given system using

x̂˙ = −x̂ + K(z − x̂) ,

with K a constant as of yet unspecified. Now K is not totally unconstrained, since we note
that the above is equivalent to

x̂˙ = −(1 + K)x̂ + Kz ,

and the condition of stability of this differential equation is that the coefficient of x̂ be
negative or that 1 + K > 0 or K > −1.

We will take our filtering performance measure given by J = p∞ , where p∞ is the steady-
state state error covariance for this problem. Since we assume that we will operate the filter
with a constant gain (rather than the optimal time varying Kalman gain) and the correct
system dynamics the covariance propagation for this filter will follow the Wiener filtering
equations
Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T , (255)
where K is a constant. In this example, we have F = −1, G = 1, Q = q, H = 1, and R = r,
and so the expression for Ṗ becomes

ṗ = (−1 − k)p + p(−1 − k) + q + k 2 r


= −2(k + 1)p + q + k 2 r .
To compute the steady-state solution p∞ we can solve this differential equation for p(0) = p0
and take the limit t → ∞ or by taking ṗ = 0 and solving for p. When we do the later we
find
q + k2 r
p∞ = .
2(k + 1)
The actual real world parameters α that are unknown to us are the two values of q and r or
the variances of the process noise and measurement noise. Because of this, in the minimum
sensitivity design formulation we take α = (q, r) and now need to compute the function
J0 (α) = J0 (q, r). For given values of α = (q, r) the actual parameters, the minimum value
of J over k is denoted J0 (α). To find this function J0 (α) we can take the derivative with
respect to k of the function J = p∞ , computed above, set the result equal to zero, solve for
k, and then put this value of k back into the given expression for J. Taking the k derivative
of J we find
dJ 2kr k2r + q 2kr + k 2 r − q
= − = .
dk 2(1 + k) 2(1 + k)2 2(k + 1)2
Setting this expression equal to zero, then solving for k with the quadratic formula we find
p r
−r ± r 2 + rq q
k= = −1 ± 1 + ,
r r
p
If we put this expression into J(·) we find p∞ = −r ± r(q + r). Since p∞ must be positive
we need to take the plus sign. If we do this and denote the resulting expression by J0 (q, r)
we find √
J0 (q, r) = −r + r q + r ,
which is the books equation 7.1-17.

We next proceed to evaluate the S1 , S2 , and S3 criterion expressions for this example.

Given the above expressions for the objective S1 we have

S1 = min max J(α, β) = min max J(k; q, r)


β∈B α∈A k (q,r)
2
q+k r k2 + 1
= min max = min .
k (q,r) 2(k + 1) k 2(1 + k)

Where have used the fact that since 0 < q < 1 and 0 < r < 1 the maximum of J(k; q, r)
over (q, r) is obtained when r = 1 and q = 1. To perform the next minimization we take
the derivative with respect to k, set the result equal to zero, and solve for k. We find the
derivative given by
2k k2 + 1 k 2 + 2k − 1
+ = .
2(1 + k) 2(1 + k)2 2(1 + k)2
When we set that equal to zero and solve the resulting quadratic equation for k we find

−2 ± 4 + 4 √
k= = −1 ± 2 .
2
To have stability
√ requires that k > −1 and we must take the positive solution found above
giving k = 2 − 1 in agreement
√ with the book. Thus using this criterion function we should
filter our signal with k = 2 − 1.
For S2 criterion we have

S2 = min max (J(α, β) − J0 (α))


β∈B α∈A
 2 
k r+q 2 1/2
= min max − (r + rq) + r
k q,r 2(1 + k)
To get an understanding of the the inner maximization in the above min-max problem we
simply plot the above expression as a function of 0 ≤ q ≤ 1 and 0 ≤ r ≤ 1 for several values
of k > −1. This is done in the Mathematical file example 7 1 5.nb where it is found that
for all k the maximization above happen at the end values of r namely either when r = 0
from which we see
 2 
k r+q q
− (r + rq) + r
2 1/2
= ,
2(1 + k) r=0 2(1 + k)
when r = 0 and when r = 1 we find
 2 
k r+q p k2 + q
− (r + rq) + r
2 1/2
=1− 1+q+ .
2(1 + k) r=1 2(1 + k)
For all k the first expression is maximized as a function of q when 0 ≤ q ≤ 1 when q = 1
1
and gives 2(1+k) , for its maximum value. To find the minimum or maximum of the second
expression as a function of q we take the q derivative, set the result equal to zero, and solve
for q. Before performing all of this work we note that the second derivative with respect to
q of the expression under discussion is given by
1
,
4(1 + q)3/2
which is always positive indicating that the value of q that makes first derivative zero is a
minimum and not a maximum. Thus the maximum in the case when r = 1 must occur at
the end points of the domain i.e. q = 0 or q = 1. The expression above at these points has
the values of
k2 √ 1 + k2
and 1 − 2 + ,
2(1 + k) 2(1 + k)
respectively. In summary then, when we fix the value of k the value we could get from the
expression maxα∈A (J(α, β) − J0 (α)) is given by
 
1 k2 √ 1 + k2
max , ,1 − 2 + .
2(1 + k) 2(1 + k) 2(1 + k)
We plot the three functions that make up this maximization in Figure 4. To evaluate the S2
criterion and design our filter we have to pick k such that S2 is minimized since
 
1 k2 √ 1 + k2
S2 = min max , ,1 − 2 + .
k 2(1 + k) 2(1 + k) 2(1 + k)
From Figure 4 we see that this minimum occurs at k = 1 and this would correspond to
our filtering design parameter to use. This result agrees with that discussed in the book
but no detail as to how the book got its results was provided. In the Mathematical file
example 7 1 5.nb much of the algebra for this problem is worked.
1.0

0.8

Out[29]= 0.6

0.4

0.2

k
-0.5 0.5 1.0 1.5 2.0 2.5 3.0

1 k2
√ 1+k 2
Figure 4: A plot of the three functions 2(1+k) (in blue) 2(1+k) (in red) and 1 − 2 + 2(1+k)
(in brown). For each value of k the maximum of these three functions is the result of the
maximization over α of J(α, β) − J0 (α) and is a function of β = k.

Notes on Example 7.1-6

For this example the true physical system has parameters given by F = −β, G = 1, Q = q,
H = 1, and R = r, while we choose to filter our system with possibly non optimal parameters
F ∗ = −βf , K ∗ = k, and H ∗ = 1. Because of this we are filtering with an incorrect
implementation of dynamics and measurements and need to use the results derived below to
obtain the steady-state error covariance expression p∞ .

To derive an equation for p∞ we use the books equations 7.2-14, 7.2-15, and 7.2-16 or
Equations 263, 264, and 265 below with the values for F , F ∗ etc given above. In this case
we first note that ∆F = F ∗ − F = −βf + β and ∆H = 0, so that in steady-state when
Ṗ = V̇ = U̇ = 0 the given system becomes

0 = 2(−βf − k)p∞ + 2(−βf + β)v∞ + q + k 2 r


0 = −βv∞ + (−βf − k)v∞ + (−βf + β)u∞ − q
0 = −2βu∞ + q .

When we solve this system for p∞ , v∞ and u∞ we find

βf2 q + β 2 k 2 r + β(βf + k)(q + k 2 r)


p∞ = (256)
2β(βf + k)(β + βf + k)
q(β + βf )
v∞ = −
2β(β + βf + k)
q
u∞ = .

A couple equivalent expressions for p∞ are given by simple manipulations of the above
expression. We find

k2r (βf2 + β(βf + k))q


p∞ = +
2(βf + k) 2β(βf + k)(β + βf + k)
k2r (βf2 + β(β + βf + k − β))q
= +
2(βf + k) 2β(βf + k)(β + βf + k)
2
k r+q (βf2 − β 2 )q
= + .
2(βf + k) 2β(βf + k)(β + βf + k)

The first expression for p∞ is equation 7 in the original reference for this section see [1], while
the last equation is the result presented in the book.

Given the above expression for p∞ the vector “α” in this case or the actual physical param-
eters is given by the three unknown scalar values (β, q, r) and the design parameter vector
“β”, is given by the two parameters (βf , k).

Next as the design criterion S2 and S3 requires we need to compute the expression for J0 (·)
defined as a minimum in terms of the unknowns for this problem by

J0 (β, q, r) ≡ min p∞ (β, q, r; βf , k)


βf ,k
 2 
k r+q (βf2 − β 2 )q
= min + .
βf ,k 2(βf + k) 2β(βf + k)(β + βf + k)
We could find this minimum analytically, numerically, or more easily by recognizing that its
value is equal to the optimal Kalman steady-state value of p∞ when we filter with the true
system value. This in turn is given by the steady-state value of

Ṗ = F P + P F T + GQGT − P H T R−1 HP .

With F = −β, G = 1, Q = q, H = 1, and R = r When we specify the above equation to the


given expression we get
p2∞
0 = −2βp∞ + q − , (257)
r
or
p2∞ + 2βrp∞ − qr = 0 .
Solving this with the quadratic equation gives
p
−2βr ± 4β 2r 2 − 4(−qr) p
p∞ = = −β ± β 2 r 2 + qr .
2
Since p∞ > 0 we need to take the positive sign in the above and we have
p
J0 (β, q, r) ≡ p∞ = −βr + β 2 r 2 + qr . (258)

With this background discussion we now proceed to determine the performance of optimal,
S1 , S2 , S3 and β = 0.5 filters when q = 10, r = 1 and 0.1 ≤ β ≤ 1 as documented in this
example.
• The Kalman Optimal Filter:
To design and plot the optimal filtering performance result for this problem note that
this system is a special case of that in example 7.1-4, with a = −β and b = +1. In
that example we found that p∞ is given by
! !
ar β̃ β̃
p∞ = 2 1 + = −βr 1 + with
b a β
r r
b2 q q
β̃ = a 1 + 2 = −β 1 + 2
ar β r

Thus we let q = 10, r = 1 and 0.1 ≤ β ≤ 1 and plot p∞ as a function of β.

• The S1 Filter:
The design of the S1 filter is defined by

S1 = min max J(α, β)


β∈B α∈A
 2 
k r+q (βf2 − β 2 )q
= min max + .
(βf ,k) (β,q,r) 2(βf + k) 2β(βf + k)(β + βf + k)

There are probably many ways to implement such a filter. For this example we will do
this in a brute force way. What this means is that we will create a grid that samples
from the possible values for βf and k. Then for each value of βf and k we have as
a candidate pair to filter with we then need to compute the inner maximization over
(β, q, r). Again we do this by simply sampling the provided function on a discretized
grid of points. Having evaluated the above function at each of these points we return
the maximum. We then move to the next candidate pair for (βf , k) and repeat this
procedure. The filter designer would then pick the values of βf and k that gave the
minimum over all tested pairs.

• The S2 and S3 Filter:


We design these two filters in the same way we do the S1 filter except that the objective
function is slightly different than in the S1 case.
The β = 0.5 case:
In this case we assume that we think know the dynamics exactly β = 0.5 and that we
design a optimal Kalman filter under that assumption. Thus we have βf = 0.5 and k
would be given by the optimal Kalman gain under the assumption that β = 0.5.
We can find this value of k by using the value of p∞
and then take k = K∞ = P∞ H T R−1 .
From this we have that k is given by
r
q
k = −β + β 2 + .
r

Since we assume that we know that the value of β is 1/2 the value of k that we will
filter is given by taking β = 1/2 in the above expression.
4
optimal filter
S1 filter (grid based)
3.8 S1 filter (analytic)
S2 filter
S3 filter
3.6 beta 0.5
estimated error covariance p_infinity

3.4

3.2

2.8

2.6

2.4

2.2

2
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
true system beta

Figure 5: Attempted duplication of the results found in figure 7.1-11 from the book. This
is qualitatively very similar to the corresponding figure from the text. See the main text for
more discussion on this plot.

As discussed in [1] the S1 criterion can be determine exactly given the functional form for
J(α, β). We have

S1 = min max p(α, β)


β α

= min max p(β, q, r; βf , k)


(βf ,k) (β,q,r)

= min max p(β, qmax , rmax ; βf , k)


(βf ,k) β

= min p(βmin, qmax , rmax ; βf , k)


(βf ,k)

= J0 (βmin , qmax , rmax ) ,

where J0 is given by Equation 258. This means that we can exactly analyze the S1 criterion.
Unfortunately I was not able to numerically duplicate these expected analytic results. In
the Figure 5 one will see a numerical duplication of the filters discussed above. For the S1
filter I present both the analytic and the grid based numeric result. These plots are produced
in the Matlab file example 7 1 6 plots brute force optimization.m, and if anyone sees
anything wrong with what I have done please contact me. These results, as they stand,
are very similar to the ones presented in the books figure 7.1-11. In addition, qualitatively
Figure 5 shows the statements given in the text on min/max filters. Namely that they enable
a filter design that is very close to the optimal result (the green line). From the plot it looks
like the S2 filter is the closest to the optimal result. Finally, we mention that the algebra to
derive some of these expressions is can be found in the Mathematical file example 7 1 6.nb.
Notes on incorrect implementation of dynamics and measurement

When we filter with incorrect Kalman gain K ∗ , measurement sensitivity H ∗ , and dynamics
F ∗ , the differential equation for the error x̃ ≡ x̂ − x in the continuous case can be derived
using the implemented equation for x̂ and true state dynamic equation for x as follows
d d
x̃ = (x̂ − x)
dt dt
= F ∗ x̂ + K ∗ (z − H ∗x̂) − F x − Gw
= (F ∗ − K ∗ H ∗ )x̂ − F x + K ∗ z − Gw
= (F ∗ − K ∗ H ∗ )x̂ − F x + K ∗ (Hx + v) − Gw
= (F ∗ − K ∗ H ∗ )x̂ − (F − K ∗ H)x + K ∗ v − Gw . (259)
or the books equation 7.2-8. Let ∆F = F ∗ − F and ∆H = H ∗ − H so that F = F ∗ − ∆F
and H = H ∗ − ∆H and then Equation 259 in terms of ∆F and ∆H becomes
d
x̃ = (F ∗ − K ∗ H ∗ )x̂ − (F ∗ − ∆F − K ∗ (H ∗ − ∆H))x + K ∗ v − Gw
dt
= (F ∗ − K ∗ H ∗ )x̂ − (F ∗ − K ∗ H ∗ )x + (∆F − K ∗ ∆H)x + K ∗ v − Gw
= (F ∗ − K ∗ H ∗ )x̃ + (∆F − K ∗ ∆H)x + K ∗ v − Gw , (260)
or the books equation 7.2-9. Since this equation involves
  x̃ and x on the right-hand-side,

let x′ be denoted as the vector of x̃ and x as x′ = , then since the dynamics of x is
x
governed by dxdt
= F x + Gw the system for x′ is given by
 ∗    ∗ 
dx′ F − K ∗ H ∗ ∆F − k ∗ ∆H x̃ K v − Gw
= + ≡ F ′ x′ + w ′ , (261)
dt 0 F x Gw
which is the books equation 7.2-10 and in which we have implicitly defined the variables F ′
and w ′ . Now using the system theory from earlier we have that the covariance matrix for
the variable x′ satisfies the following differential equation
dE[x′ x′T ]
= F ′ E[x′ x′T ] + E[x′ x′T ]F ′T + E[w ′ w ′T ] . (262)
dt
What we really want to study however is the behavior of the covariance matrix for x̃ only
since this represents the difference between the true state x and our estimate x̃. To obtain
this lets block partition the covariance of the vector x′ by introducing the matrices P , V ,
and U as  
′ ′T P VT
E[x x ] ≡ .
V U
Now from the definition of w ′ we can compute E[w ′ w ′T ] as
 ∗  
′ ′T K v − Gw  T ∗ T T T T T

E[w w ] = E v K −w G w G
Gw
 ∗ T ∗ T 
K vv K − K ∗ vw T GT − Gwv T K ∗ T + Gww T GT K ∗ vw T GT − Gww T GT
= E
Gwv T K ∗ T − Gww T GT Gww T GT
 ∗ 
K RK ∗ T + GQGT −GQGT
= ,
−GQGT GQGT
which is the expression for E[w ′ w ′T ] presented in the books equation 7.2-13. Using this
expression and the definition of F ′ we can construct the right-hand-side of Equation 262.
′ ′T
Setting this equal to the block partitioned form for dE[xdtx ] , we obtain a dynamical system
for the components. When we do this we obtain the following system

Ṗ = (F ∗ − K ∗ H ∗ )P + P (F ∗ − K ∗ H ∗ )T + (∆F − K ∗ ∆H)V
+ V T (∆F − K ∗ ∆H)T + GQGT + K ∗ RK ∗T (263)
V̇ = F V + V (F ∗ − K ∗ H ∗ )T + U(∆F − K ∗ ∆H)T − GQGT (264)
U̇ = F U + UF T + GQGT , (265)

with as before ∆F ≡ F ∗ − F and ∆H ≡ H ∗ − H.

If we are deleting states from the true state in order to derive the filter we will process
measurements with, then the filter equations for this case can be obtained from the ones
above if we make the substitutions

F ∗ → W T F ∗W
H ∗ → H ∗W
K∗ → W T K∗ .

In addition, we now define ∆F and ∆H as ∆F = W T F ∗ W − F and ∆H = H ∗ W − H.


Using this, we can derive the following for the equation for Ṗ from Equation 263 as follows

Ṗ = (W T F ∗ W − W T K ∗ H ∗ W )P + P (W T F ∗ W − W T K ∗ H ∗ W )T + (∆F − W T K ∗ ∆H)V
+ V T (∆F − W T K ∗ ∆H)T + GQGT + W T K ∗ RK ∗ T W
= W T (F ∗ − K ∗ H ∗ )W P + P W T (F ∗ − K ∗ H ∗ )T W + (∆F − W T K ∗ ∆H)V
+ V T (∆F − W T K ∗ ∆H)T + GQGT + W T K ∗ RK ∗T W , (266)

which duplicates the books equation 7.2-19. The other equations would be done in a similar
manner.

Problem Solutions

Problem 7-1 (the fixed gain k∞ gives the same steady-state error covariance)

Equation 7.2-3 from the book is

Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T ,

and is the equation that our covariance satisfies if we don’t use the optimal Kalman gain but
instead filter with another value say k. In example 7.1-3 we have F = 0, G = 1, Q = q,
H = 1, and R = r so this equation becomes

ṗ = −kp − kp + q + k 2 r = −2kp + q + k 2 r .
pq
If in particular we filter with the value k = r
, the above equation becomes
r r
q q q
ṗ = −2 p + q + · r = −2 p + 2q .
r r r

To find the steady-state solution to this equation we could solve it for p(t) and then take the
limit as t → ∞ or simply recall that in steady-state ṗ = 0 and then solve for p = p∞ in the
above equation. When we do that we find

p∞ = rq ,

the same steady-state value result we obtain if we had in fact done optimal Kalman filtering.

Problem 7-2 (discrete equations under errors in measurements and dynamics)

Recalling our definition of the a priori state error of x̃k (−) = x̂k (−) − xk , when we increment
k by one we get

x̃k+1 (−) = xk+1 − x̂k+1 (−)


= Φ∗k x̂k (+) − Φk xk − wk .

Now Introduce the notation ∆Φk ≡ Φ∗k − Φk so that Φk = Φ∗k − ∆Φk so the above becomes

x̃k+1 (−) = Φ∗k x̂k (+) − (Φ∗k xk − ∆Φk xk ) − wk = Φ∗k x̃k (+) + ∆Φk xk − wk .

This last result expresses x̃k+1 (−) in  terms of  x̃k (+) and xk . Next introduce the stacked
x̃k (−)
vector x′k (−) defined as as x′k (−) = , then we have that x′k (−) satisfies
xk
   ∗   
′ x̃k+1 (−) Φk x̃k (+) + ∆Φk xk −wk
xk+1 (−) = = +
xk Φk xk wk
 ∗    
Φk ∆Φk x̃k (+) −wk
= + . (267)
0 Φk xk wk

Then defining the blocks of the covariance of x′k+1 (−) as


 
′ ′ T Pk+1 (−) Vk+1(−)T
E[xk+1 (−)xk+1 (−) ] ≡ ,
Vk+1 (−) Uk+1 (−)

by using Equation 267 we find the block matrix expression for P ≡ E[x′k+1 (−)x′k+1 (−)T ] as
 ∗    ∗T   
Φk ∆Φk Pk (+) Vk (+)T Φk 0 Qk −Qk
P= + .
0 Φk Vk (+) Uk (+) ∆ΦTk ΦTk −Qk Qk

To evaluate this first we multiply the rightmost matrices together as


   ∗T   
Pk (+) Vk (+)T Φk 0 Pk (+)Φ∗k T + Vk (+)T ∆ΦTk Vk (+)T Φk T
= ,
Vk (+) Uk (+) ∆ΦTk ΦTk Vk (+)Φ∗k T + Uk (+)∆ΦTk Uk (+)ΦTk
 
Φ∗k ∆Φk
multiply the resulting matrix on the left by the matrix and then add the
  0 Φk
 
Qk −Qk Pk+1(−) Vk+1 (−)T
matrix . When we do this and equate the result to
−Qk Qk Vk+1(−) Uk+1 (−)
as components we get a system of equations given by

Pk+1(−) = Φ∗k Pk (+)Φ∗k T + Φ∗k Vk (+)T ∆ΦTk + ∆Φk Vk (+)Φ∗k T + ∆Φk Uk (+)∆ΦTk + Qk
Vk+1(−) = Φk Vk (+)Φ∗k T + Φk Uk (+)∆ΦTk − Qk
Uk+1 (−) = Φk Uk (+)ΦTk + Qk .

These correspond to the book’s equations 7.2-20.

To derive a recursive relationship across a measurement note that we can write x̃k (+) as

x̃k (+) = xk − x̂k (+) = xk − (x̂k (−) + Kk∗ (zk − Hk∗ x̂k (−)))
= x̃k (−) − Kk∗ (Hk xk + vk − Hk∗x̂k (−)) .

Again using that ∆H ≡ H ∗ − H or putting H = ∆H + H ∗ in the above we get

x̃k (+) = x̃k (−) − Kk∗ (Hk∗ x̃k (−) + ∆Hk xk + vk )


= (I − Kk∗ Hk∗ )x̃k (−) − Kk∗ ∆Hk xk − Kk∗ vk .
 
′ x̃k (+)
If we introduce the vector xk (+) as then we see that in terms of x̃k (−) and xk by
xk
using the above we have
 
′ (I − Kk∗ Hk∗ )x̃k (−) − Kk∗ ∆Hk xk − Kk∗ vk
xk (+) =
xk
    
I − Kk∗ Hk∗ −Kk∗ ∆Hk x̃k (−) −Kk∗ vk
= + .
0 I xk 0

Using the above block definitions we find the matrix expression for P ≡ E[x′k (+)x′k (+)T ] as
     ∗ 
I − Kk∗ Hk∗ −Kk∗ ∆Hk Pk (−) Vk (−)T (I − Kk∗ Hk∗ )T 0 Kk Rk Kk∗ T 0
P= + .
0 I Vk (−) Uk (−) −∆Hk T Kk∗ T I 0 0
 
Kk∗ Rk Kk∗ T 0
Performing the matrix products, adding the matrix and equating the result
  0 0
Pk (+) Vk (+)T
to gives the following system
Vk (+) Uk (+)

Pk (+) = (I − Kk∗ Hk∗ )Pk (−)(I − Kk∗ Hk∗ )T − (I − Kk∗ Hk∗ )Vk (−)T ∆HkT Kk∗ T
− Kk∗ ∆Hk Vk (−)(I − Kk∗ Hk∗ )T + Kk∗ ∆Hk Uk (−)∆HkT Kk∗ T + Kk∗ Rk Kk∗ T
Vk (+) = Vk (−)(I − Kk∗ Hk∗ )T − Uk (−)∆Hk Kk∗ T
Uk (+) = Uk (−) ,

these results agree with the book’s equations 7.2-21.


Problem 7-3 (The fully coupled system γ 6= 0)

   
−α1 0 T −α1 γ
For the given system we have F = , ( so that F = ), Q =
     γ −α2 0 −α2
q11 0 1 0 r11 0
,H= , and R = . Then the matrix Riccati Equation 71 for
0 q22 0 1 0 r22
this problem becomes
     
ṗ11 ṗ12 −α1 p11 α1 p12 −α1 p11 γp11 − α2 p12
= +
ṗ12 ṗ22 γp11 − α2 p12 γp12 − α2 p22 −α1 p12 γp12 − α2 p22
   1 2 
q11 0 p + r122 p212
r11 11
1
p p + r122 p12 p22
r11 11 12
+ − 1 .
0 q22 p p + r122 p12 p22
r11 11 12
1 2
p + r122 p222
r11 12

As a system of scalar equations this becomes


1 2 1 2
ṗ11 = −2α1 p11 + q11 − p11 − p
r11 r22 12
1 1
ṗ12 = −α1 p11 + γp11 − α2 p12 − p11 p12 − p12 p22
r11 r22
 
p11 p12
= − α1 + α2 + + p12 + γp11
r11 r22
1 2 1 2
ṗ22 = 2γp12 − 2α2 p22 + q22 − p12 − p
r11 r22 22
p222 p212
= −2α2 p22 + q22 − − + 2γp12 ,
r22 r11
the same equations given by the book in Example 7.1-1.

Problem 7-6 (the expression for Pk (+) under non-optimal filtering)

The requested expression for Pk (+) is a specialization of the discussion given in the section
on filtering with incorrect dynamics and measurement found on Page 151, in that the result
we desire to show here can be obtained if we take ∆H = 0 and ∆F = 0. This means that
we are filtering with the correct dynamics and measurement sensitivity matrix but with a
potentially incorrect Kalman gain K ∗ . In this case Equation 263 becomes
Pk (+) = (I − Kk∗ Hk )Pk (−)(I − Kk∗ Hk ) + Kk∗ Rk Kk∗ T ,
which is the result requested.

Problem 7-7 (the error covariance differential equation for a Kalman like filter)

Part (a): From Table 4.3-1 a Kalman like filter means that we should derive an estimate
of our state x̂(t) by integrating
dx̂
= F x̂(t) + K(t)(z(t) − H(t)x̂(t)) .
dt
For this problem we assume that K(t) is general and not necessarily given by the optimal
expression P H T R−1 . Since the true state x follows the dynamics given by
dx
= F x + Gw ,
dt
the error vector x̃ = x̂ − x has a differential equation given by
dx̃ dx̂ dx
= −
dt dt dt
= F x̂ + K(z − H x̂) − F x − Gw .
Now the measurement z in terms of the true state x is given by z = Hx + v so we can write
the above as
dx̃
= F (x̂ − x) − Gw + K(H(x − x̂) + v)
dt
= (F − KH)x̃ − Gw + Kv ,
or the desired error differential equation.

Part (b): Given this differential equation for x̃ and following by example the results from
Chapter 4 we have that
dE[x̃x̃T ]
Ṗ =
dt
= (F − KH)E[x̃x̃T ] + E[x̃x̃T ](F − KH)T + E[(Gw − Kv)(Gw − Kv)T ]
= (F − KH)P + P (F − KH)T + GQGT + KRK T .
As we were to show. Note that if K equals the Kalman optimal value of P H T R−1 then we
see that the equation for Ṗ becomes
Ṗ = (F − P H T R−1 H)P + P (F − P H T R−1 H)T + GQGT + P H T R−1 RR−1 HP
= F P + P F T + GQGT − P H T R−1 HP − P H T R−1 HP + P H T R−1 HP
= F P + P F T + GQGT − P H T R−1 HP ,
or the matrix Riccati equation as it should.

Problem 7-8 (estimating a random ramp function)

For the system given for this problem with no process noise we have
    
d x1 0 0 x1
=
dt x2 1 0 x2
z = x2 + v with v ∼ N(0, r) ,
 
0 0  
With that representation we have F = , G = 1, Q = 0, H = 0 1 , and R = r.
1 0
The matrix Riccati equation of
Ṗ = F P + P F T + GQGT − P H T R−1 HP ,
in component form is given by
         
ṗ11 ṗ12 0 0 0 p11 1 p11 p12 0 0 p11 p12
= + +0−
ṗ12 ṗ22 p11 p12 0 p12 r p12 p22 0 1 p12 p22
    
0 p11 1 p11 p12 0 0
= −
p11 2p12 r p12 p22 p12 p22
     
0 p11 1 p212 p12 p22 − 1r p212 p11 − 1r p12 p22
= − = .
p11 2p12 r p12 p22 p222 p11 − 1r p12 p22 2p12 − 1r p222
From this we find the following system of scalar equations
1
ṗ11 = − p212
r
1
ṗ12 = p11 − p12 p22
r
1
ṗ22 = 2p12 − p222 ,
r
as we were to show. If we seek the steady state solution were ṗ11 = ṗ12 = ṗ22 = 0, from the
above we see that p12 = 0, p11 = 0, and p22 = 0. Then since K = P∞ H T R−1 we see that
K = 0 also.

To consider the case where the true system has process noise, but we in fact performed a
filter design without it recall that this is an example where we are using the correct dynamics
and measurement
 matrices in the implementation of the filter, but an incorrect process noise
0 w
vector q ∗ = rather than the true value of q = . To determine the effect that
0 0
this error has on our filtering equations we recall the section entitled “Exact Implementation
of Dynamics and Measurements” since in this case we are correctly modeling the F and H
matrices. In that section a procedure is outlined for assessing the true filters performance
under modeling errors. The procedure to follow is

 
0
• Assume all filter design values are correct i.e. take q = and calculate the optimal
0
Kalman gain K in that situation.
• Use the value of K found above to solve for P in
Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T , (268)
where in the above expression all variables (except K) are their true values.

 
0
The first part of the above procedure where we take q = done earlier and where we
0
have shown that in steady-state we get P = 0 and thus K = 0. When we put this value into
Equation 268 we get
     
T 0 p11 q 0 q p11
Ṗ = F P + P F + Q = + = ,
p11 2p12 0 0 p11 2p12
which when written as a system of scalar equations gives the desired expression.
Chapter 8 (Implementation Considerations)

Notes on the text

Notes on the ǫ technique

In this section of the notes we derive the expression for x̂k+1 (+) expressed via the books
equation 8.1-10 when we use the ǫ technique. From equation 8.1-5, the introduced expression
for ǫ′ , and the expression for ∆x̂k+1 (+) we have
x̂k+1 (+) − Φk x̂k (+) = Kk+1 [zk+1 − Hk+1 Φk x̂k (+)]
 
ǫrk+1 T T
+ T
Hk+1 (Hk+1Hk+1 )−1 [zk+1 − Hk+1 Φk x̂k (+)]
Hk+1 Pk+1(−)Hk+1 + rk+1
 T T 
ǫrk+1 Hk+1 (Hk+1 Hk+1 )−1
= Kk+1 + T
(zk+1 − Hk+1Φk x̂k (+)) .
Hk+1 Pk+1 (−)Hk+1 + rk+1
When we take Kk+1 to be the optimal Kalman gain given by
T T
Kk+1 = Pk+1(−)Hk+1 (Hk+1 Pk+1 (−)Hk+1 + rk+1 )−1 ,
in the above we get for the leading coefficient of (zk+1 − Hk+1Φk x̂k (+)) the following
r HT
T
Pk+1(−)Hk+1 + ǫ Hk+1 Hk+1
T
k+1 k+1
K= T
.
Hk+1Pk+1 (−)Hk+1 + rk+1
This is the book’s equation 8.1-10. Recall that Hk+1 is a row matrix and rk is a scalar, so
K is a column matrix.

From the above expression we see that in this case the Kalman gain K we are using to filter
with is composed of two parts K = Kreg + Kow . To determine how this non optimal gain
performs i.e. what the error covariance matrix Pk (+) will be for such a filter we need to use
the results from chapter 7. Namely the results under the section “Exact Implementation of
Dynamics and Measurements”. There the true a posterior error covariance matrix Pk (+)
when filtering with a Kalman gain Kk is given by
Pk (+) = (I − Kk Hk )Pk (−)(I − Kk Hk )T + Kk Rk KkT .
Using the expression in this section for K we find that Pk (+) is given by (dropping the k
subscript for notational simplicity)
P (+) = (I − KH)P (−)(I − KH)T + KRK T
= (I − Kreg H − Kow H)P (−)(I − Kreg H − Kow H)T + (Kreg + Kow )R(Kreg + Kow )T
= (I − Kreg H)P (−)(I − Kreg H)T + Kreg RKreg T
− (I − Kreg H)P (−)H T Kow T − Kow HP (−)(I − Kreg H)T + Kow HP (−)H T Kow T
+ Kreg RKow T + Kow RKreg T + Kow RKow T
= [P (+)]reg
− P (−)H T Kow T + Kreg HP (−)H T Kow T − Kow HP (−) + Kow HP (−)H T Kreg T
+ Kow HP (−)H T Kow T + rKreg Kow T + rKow Kreg T + rKow Kow T .
To further evaluate this expression we will need to simplify the terms after [P (+)]reg . To do
that we will first replace Kreg with P (−)H T (HP (−)H T + r)−1 to get

P (+) = [P (+)]reg
T TP (−)H T HP (−)H T Kow T
− P (−)H Kow + − Kow HP (−)
HP (−)H T + r
Kow HP (−)H T HP (−)
+ + Kow HP (−)H T Kow T
HP (−)H T + r
rP (−)H T Kow T rKow HP (−)
+ T
+ + rKow Kow T .
HP (−)H + r HP (−)H T + r

Counting terms after the [P (+)]reg expression starting at one let us combine the first and
sixth term, the third and sevenths terms, and the fifth and eighth terms to get

P (+) = [P (+)]reg
(−HP (−)H T )P (−)H T Kow T P (−)H T HP (−)H T Kow T
+ +
HP (−)H T + r HP (−)H T + r
(−HP (−)H T )P (−)Kow HP (−) Kow HP (−)H T HP (−)
+ +
HP (−)H T + r HP (−)H T + r
+ (HP (−)H T + r)Kow Kow T .

Now to simplify this we note that since H is a row vector the expression HP (−)H T is a
scalar and can be factored out if needed. This cancels four terms and we get

P (+) = [P (+)]reg + (HP (−)H T + r)Kow Kow T


ǫ2 rH T Hr
= [P (+)]reg + (HP (−)H T + r)
(HP (−)H T + r)2
ǫ2 rH T Hr
= [P (+)]reg + .
HP (−)H T + r

as we were to show.

Notes on fading memory filters and age-weighting: Example 8.1-3

In Example 8.1-3 The continuous system specified by Figure 8.1-8 is given by

ẋ = 0 with x(0) = x0
z = x + v with v ∼ N(0, σ 2 ) .

As a discrete system this is then given by

xk = xk−1
zk = xk + vk ,
with x0 = x0 and vk ∼ N(0, σ 2 ). From this the variables defined in the discrete Kalman
filtering case are φ = 1, q = 0, h = 1 and r = σ 2 . Using these the recursive Kalman filter
given by equation 8.1-16 is
p′k (−) = sp′k−1 (+) (269)
p′k (−)2
p′k (+) = p′k (−) −
p′k (−) + σ 2
′ s2 p′k−1 (+)2 σ 2 sp′k−1 (+)
= spk−1 (+) − ′ = 2 .
spk−1(+) + σ 2 σ + sp′k−1 (+)
In the steady state we have p′k (+) = p′k−1 (+) = p′∞ and using the above expression we see
that p′∞ is given by
σ 2 sp′∞
p′∞ = 2 ,
σ + sp′∞
or
2
sp′∞ + σ 2 p′∞ − σ 2 sp′∞ = 0 .
Solving for p′∞ in the above gives
 
σ 2 (s − 1) 1
p′∞ = 2
=σ 1− .
s s
To determine k∞ note that it is given by
p′∞ (−)
k∞ = p′∞ (−)H∞
T
(H∞ p′∞ (−)H∞
T
+ σ 2 )−1 = ,
p′∞ (−) + σ 2
with p′∞ (−) related to p′∞ (+) (which we know) by Equation 269 or
p′∞ (−) = sp′∞ (+) = σ 2 (s − 1) .
Thus we get
1
k∞ = σ 2 (s − 1)(σ 2 (s − 1) + σ 2 )−1 = 1 −.
s
To calculate the true error covariances pk (+) (note no prime) see Problem 8-2.

Notes on prefiltering

For the simple example given we find that we can evaluate the expectation of the additional
noise due to smoothing the signal as
 !2   !2 
X2 X2
1 1 
E xi − x2  = E xi − 2x2 
2 i=1 4 i=1

1   1
= E (x1 − x2 )2 = E[x21 − 2x1 x2 + x22 ]
4 4
1 2 σ2
= (σx − 2σx2 e−∆t + σx2 ) = x (1 − e−∆t ) ,
4 2
which is the books equation 8.2-8.
Notes on algorithms and integration rules

From the definition of Qk in terms of the continuous system recall that we have
Z t
Qk = Φ(t, τ )Q(τ )ΦT (t, τ )dτ , (270)
tk

so that we can take the t derivative of Qk using Leibniz’ rule as


Z t Z t
dQk T
= Q(t) + F (t)Φ(t, τ )Q(τ )Φ (t, τ )dτ + Φ(t, τ )Q(τ )ΦT (t, τ )F T (t)dτ
dt tk tk
= Q(t) + F (t)Qk + Qk F T (t) ,
dΦ(t,τ )
gives equation 8.3-6. We have used the facts that dt
= F (t)Φ(t, τ ) and Φ(t, t) = I.

Notes on integration algorithms

Take a second order approximation of the Taylor expansion of x(t) about tk as


1
xk+1 = xk + ẋk ∆tk + ẍk ∆tk 2 ,
2
where the approximation we will use for ẍk is
ẋk+1 − ẋk
ẍk = .
∆tk
When we put this approximation for ẍk in the above we get
 
1 2 ẋk+1 − ẋk ∆tk
xk+1 = xk + ẋk ∆tk + ∆tk = xk + (ẋk+1 + ẋk ) .
2 ∆tk 2
To finish this evaluation we need to find an expression for ẋk+1 . From our differential equation
ẋ = f (x, t), if we take ẋk+1 ≈ f (xk + ẋk ∆tk , tk ) then the above becomes
∆tk
xk+1 = xk + (f (xk + ẋk ∆tk , tk ) + ẋk ) . (271)
2
or the books equation 8.3-16. This method is known as the modified Euler method. Note
that since our differential equation is given by ẋ = f (x, t) one might also evaluate ẋk+1
as f (xk + ẋk ∆tk , tk+1 ) where we have evaluated f at the time tk+1 rather than tk . Since
we are assuming that ∆tk = tk+1 − tk ≪ 1 there is not much difference between either
approximation.

Notes on the mathematical form of equations

In this subsection of these notes we argue that different forms for the a priori to a posteriori
equation (i.e. computing P (+) from P (−)) have different computational properties and that
the so called Joseph’s form for computing P (+) from P (−) is to be preferred all other things
being equal. Dropping subscripts for notational simplicity, to begin we consider computing
P (+) via P (+) = (I − KH)P (−) under a perturbation in the Kalman gain K. To do this
we take K → K + δK and see that the new P (+) then becomes

P (+) = (I − KH)P (−) → (I − KH − δKH)P (−) = (I − KH)P (−) − δKHP (−) ,

thus the change in P (+) or δP (+) is given by

δP (+) = −δKHP (−) .

When we compute P (+) using the Joseph form and the same perturbation in K we have

P (+) = (I − KH)P (−)(I − KH)T + KRK T


→ (I − KH − δKH)P (−)(I − KH − δKH)T + (K + δK)R(K + δK)T
= (I − KH)P (−)(I − KH)T − (I − KH)P (−)(δKH)T − δKHP (−)(I − KH)T
+ δKHP (−)(δKH)T + KRK T + δKRK T + KRδK T + δKRδK T .

Therefore the change in P (+) is given by

δP (+) = −(I − KH)P (−)H T δK T − δKHP (−)(I − KH)T + δKHP (−)H T δK T


+ δKRK T + KRδK T + δKRδK T
= δK[RK T − HP (−)(I − KH)T ] + [KR − (I − KH)P (−)H T ]δK T
+ δK[HP (−)H T + R]δK T .

To simplify this, consider the expression for RK T − HP (−)(I − KH)T when we put in the
optimal Kalman gain K = P (−)H T (HP (−)H T + R)−1 . We see that we get

RK T − HP (−)(I − KH)T = RK T − HP (−) + HP (−)H T K T


= (R + HP (−)H T )K T − HP (−) = 0 .

Thus in this case we see that

δP (+) = δK[HP (−)H T + R]δK T ,

or is zero to first order.

Problem Solutions

Problem 8-1 (the covariance matrix Pk (+) when using the ǫ technique)

This result is verified in these notes in the section on the ǫ technique. See Page 158 where
it is derived.
Problem 8-2 (the expression for p∞ for Example 8.1-3)

If we assume when working this example that we will be filtering with the correct dynamic
and measurement model i.e. with correct F and H matrices but with the the non-optimal
Kalman gain k∞ given by
1
k∞ = 1 − ,
s
then from Chapter 7 in the section entitled “Exact Implementation of Dynamics and Mea-
surements” the true error covariance is given by Pk (+) obtained by solving the following
Pk (+) = (I − Kk Hk )Pk (−)(I − Kk Hk )T + Kk Rk KkT
Pk+1 (−) = Φk Pk (+)ΦTk + Qk .
In the case considered here these become
       2
1 1 1
pk (+) = 1− 1− pk (−) 1 − 1 − + 1− σ2
s s s
pk+1 (−) = pk (+) .
So we have 2
1 1 1
pk (+) = 1 − σ 2 + 2 pk (+) + 2 pk (+) .
s s s
When we let k → ∞ where we get
   2
1 1
1 − 2 p∞ (+) = 1 − σ2 ,
s s
or when we solve for p∞ (+) and simplify some we get
(s − 1)2 2 s − 1 2
p∞ (+) = σ = σ ,
s2 − 1 s+1
the result we were to show.

Problem 8-3 (verification of sequential observations)

 
For the second measurement we have H = 0 1 and R = [1] so this measurement updates
the Pi (+) covariance matrix (i for intermediate) as follows
P (+) = Pi (+) − Pi (+)H T [HPi (+)H T + R]−1 HPi (+)
 1 1   1 1   −1  
0 7   1 1
= 2 4 − 1 7 2 4 +1 0 1 2 4
1 7 1 7
4 8 4 8
1 8 4 8
 1 1   1   1 1   
8  1 7  8 1 7
= 2 4 − 4 2
= 1 7 − 4 16 32
1 7
4 8 15 78 4 8
4 8 15 7
32
49
64
 7 2 
= 15 15 ,
2 7
15 15

which provides the result we want to show.


Problem 8-6 (the exponential series)

The books equation 8.3-18 is given by



X ∆tn F (t1 )n ∆t2
Φ(t2 , t1 ) = = I + ∆tF (t1 ) + F (t1 )2 + · · · , (272)
n=0
n! 2

where ∆t = t2 − t1 . Since the function Φ has translational invariance with respect to time,
that is Φ(t2 , t1 ) = Φ(t2 − t1 , 0), we can simplify the problem by considering only a single
variable by taking t1 = 0 and t2 = t. In addition, since we are interested only in small times
from the time tk we can consider methods for approximating Φ(∆t, 0) = Φ(tk+1 , tk ), where
∆t = tk+1 − tk .

To show the equivalence of this expression with various integration methods, we first recall
that Φ(t, 0) is the solution to dΦ(t,0)
dt
= F (t)Φ(t, 0) with initial condition given by Φ(0, 0) = I.
Then note that if we approximate the solution to this differential equation at ∆t using Euler’s
method
xk+1 = xk + f (xk , tk )∆tk , (273)
so that the state x(t) is Φ(t, 0), the initial time tk is 0, the final time tk+1 = ∆t, and
f (x, t) = F (t)x so that f (xk , tk ) = F (0)Φ(0, 0) to get

Φ(∆t, 0) = Φ(0, 0) + ∆tF (0)Φ(0, 0) = I + ∆tF (0) ,

or the first two terms in Equation 272. Alternatively if we approximate this differential
equation with the modified Euler method given by Equation 271 we get
∆t
Φ(∆t, 0) = Φ(0, 0) + [F (0)(Φ(0, 0) + F (0)∆t) + F (0)]
2
∆t 2 ∆t2
= I+ [2F (0) + ∆tF (0) ] = I + ∆tF (0) + F (0)2 ,
2 2
or the first three terms in Equation 272.

Problem 8-7 (a W W T decomposition)



a b
We let the matrix W be W = we find
c d
    2 
T a b a c a + b2 ac + bd
WW = = .
c d b d ac + bd c2 + d2
 
2 1
Setting this expression equal to P = gives the scalar equations
1 2

a2 + b2 = 2
ac + bd = 1
c2 + d 2 = 2 .
From this we see that we have three equations and four unknowns and therefore no unique
solution. If we take W to be lower triangular then b = 0 and the equations above simplify
to

a2 = 2
ac = 1
c + d2 = 2 .
2

√ q
One solution to these is to take a = 2, and then c = √1 2
and d = 2 − 1
= 3
so d = 3
.
2 2 2 2
Chapter 9 (Additional Topics)

Notes on the text

Notes on adaptive Kalman filtering

Recall that the state error x̃ is defined as x̃ = x̂ − x and using that we can write the
innovation ν as ν = −H x̃ + v. Consider two times t1 and t2 where t2 > t1 and lets compute
E[ν(t2 )ν(t1 )T ]. In terms of x̃ and v this is given by

E[ν(t2 )ν(t1 )T ] = E[(H(t2 )x̃(t2 ) − v(t2 ))(H(t1 )x̃(t1 ) − v(t1 ))T ]


= H(t2 )E[x̃(t2 )x̃(t1 )T ]H(t1 )T − H(t2 )E[x̃(t2 )v(t1 )T ]
− E[v(t2 )x̃(t1 )T ]H(t1 )T + E[v(t2 )v(t1 )T ] .

Now to evaluate this expression we note that the measurement errors observed at the time
t1 i.e. v(t1 ) can and will affect our estimate error at the later time t2 i.e. x̃(t2 ), thus we
can’t conclude that E[x̃(t2 )v(t1 )T ] = 0 since x̃(t2 ) depends in on what v(t1 ) was. On the
other hand the measurement errors observed at the later time t2 i.e. v(t2 ) will not affect
or modify our estimation error made earlier i.e. x̃(t1 ), thus E[v(t2 )x̃(t1 )T ] = 0. Using the
known correlation of v i.e. E[v(t2 )v(t1 )T ] = R(t1 )δ(t1 − t2 ) we have

E[ν(t2 )ν(t1 )T ] = H(t2 )E[x̃(t2 )x̃(t1 )T ]H(t1 )T − H(t2 )E[x̃(t2 )v(t1 )T ] + R(t1 )δ(t2 − t1 ) , (274)

or the book’s equation 9.1-7.

Recall that we have derived the differential equation that x̃ satisfies in Equation 73. From
this equation we see that the solution for x̃(t2 ) given by
Z t2
x̃(t2 ) = Φ(t2 , t1 )x̃(t1 ) − Φ(t2 , τ )[G(τ )w(τ ) − K(τ )v(τ )]dτ , (275)
t1

where Φ(t2 , t1 ) is the transition matrix corresponding to F − KH. Using this expression for
x̃(t2 ) we can now compute terms needed to evaluate Equation 274. To begin we compute
E[x̃(t2 )x̃T (t1 )] as
 Z t2 
T T T
= E Φ(t2 , t1 )x̃(t1 )x̃ (t1 ) − Φ(t2 , τ )[G(τ )w(τ )x̃ (t1 ) − K(τ )v(τ )x̃ (t1 )]dτ
t1
= Φ(t2 , t1 )P (t2 ) ,

where we have used the facts that E[w(τ )x̃T (t1 )] = 0 and E[v(τ )x̃T (t1 )] = 0 when τ > t1 .
Next we evaluate E[x̃(t2 )v T (t1 )] and find
 Z t2 
T T T
= E Φ(t2 , t1 )x̃(t1 )v (t1 ) − Φ(t2 , τ )[G(τ )w(τ )v (t1 ) − K(τ )v(τ )v (t1 )]dτ
t1
Z t2
= 0+ Φ(t2 , τ )K(τ )E[v(τ )v T (t1 )]dτ = Φ(t2 , t1 )K(t1 )R(t1 ) .
t1
Thus using these two expressions in Equation 274 we find that

E[ν(t2 )ν(t1 )T ] = H(t2 )Φ(t2 , t1 )P (t1 )H T (t1 ) − H(t2 )Φ(t2 , t1 )K(t1 )R(t1 ) + R(t1 )δ(t2 − t1 )
= H(t2 )Φ(t2 , t1 )[P (t1 )H T (t1 ) − K(t1 )R(t1 )] + R(t1 )δ(t2 − t1 ) , (276)

this is the books equation 9.1-12. If our filter is optimal the the optimal expression for K is
given by K(t1 ) = P (t1 )H T (t1 )R−1 (t1 ) so that the above then becomes

E[ν(t2 )ν(t1 )T ] = R(t1 )δ(t2 − t1 ) .

If our dynamics ẋ = F x+Gw is linear and time-invariant then the transition matrix Φ(t2 , t1 )
is a function of only τ = t2 − t1 as

Φ(t2 , t1 ) = Φ(t1 + τ, t1 ) = e(F −KH)|τ | ,

and Equation 276 becomes

E[ν(t1 + τ )ν(t1 )T ] = He(F −KH)|τ | [P H T − KR] + Rδ(τ ) , (277)

or the books equation 9.1-14. In the above all matrices F , K, etc are evaluated at t = t1 .

Notes on Example 9.1-1

For this example our system and measurement equations are given by

ẋ = w with w ∼ N(0, q)
z = x + v with v ∼ N(0, r) ,

So the system and measurement matrices are scalars with f = 0 and h = 1. We will filter
our signal z using x̂˙ = k(z − x̂) where k is non optimal i.e. derived from erroneous values of
q and r. As we filter z with this value of k, we will be observing the innovations ν at each
time defined as ν = z − hx̂ = z − x̂. Using Equation 277 for this system we find that it
becomes
E[ν(t)ν(t − τ )] = e−k|τ | (p∞ − kr) + rδ(τ ) . (278)
Note that in the above expression we can compute the left-hand-side based on the realized
observations of the innovation function ν(t) and call this empirically computed function
φνν (τ ). By performing a least squares fit or (using some other method) we fit the empirically
obtained φνν (τ ) function to a autocorrelation model of the form Ae−k|τ | + B, for some
unknown coefficients A and B. Once we have empirical estimates of the coefficients A and B
using Equation 278 from the above model we see that these are estimates of the expressions
p∞ −kr and r. Since we know the value of k using in filtering this means we have an estimate
of p∞ . The expression p∞ is the steady state solution to the linear variance equation for
Wiener filtering, since we are not filtering with the optimal gain value k (but are instead
using its steady-state value). Thus we need to find the steady state solution p∞ to

Ṗ = (F − KH)P + P (F − KH)T + GQGT + KRK T ,


or
0 = −2kp∞ + q + k 2 r ,
which means that
q + k2 r
p∞ = .
2k
Since we have estimates of r and p∞ and we know the value of k we now have an estimate
of q. Showing that for this simple example we can identify the initially unknown values of q
and r.

Notes on observers for deterministic systems

In this subsection and the next we introduce and discuss the notation of an observer. Basi-
cally an observer is another transformation of the state x(t) (in addition to the measurement
z(t) = H(t)x(t)) that will estimate and that will allow us to determine a complete specifi-
cation of our state x(t). We begin by requiring that the relationship between our observer
ξ(t) and state x(t) should be
ξ(t) = T (t)x(t) .
In addition we would like our observe to have the property that if we know ξ(t) and z(t)
then we can construct an estimate of x(t) by inverting the combined measurement observer
system    
ξ(t) T (t)
= x(t) ,
z(t) H(t)
as  −1  
T (t) ξ(t)
x(t) = .
H(t) z(t)
Once we have specified the expression we will use for T (t) we can actually compute the
 −1  
T (t) ξ(t)
inverse above. Since this inverse then multiplies the stacked vector , we
H(t)  z(t) 
will define it in terms of two more unknowns A(t) and B(t) as the matrix A(t) B(t) .
These unknowns makes the state x(t) from the observer ξ(t) and measurement z(t) equation
simple
x(t) = A(t)ξ(t) + B(t)z(t) . (279)
Thus one way to state what we are doing isto observe
 that if we can obtain an expression for
T (t)
T (t) then we can form the stacked matrix , invert it, and obtain the block matrices
H(t)
A(t) and B(t). With these we can construct x(t) using Equation 279.

We next derive some relationships between the block


 matrices introduced thus far. From the
  T
definition that A B is the inverse of we have
H
 
  T
A B = AT + BH = I . (280)
H
On taking the product in the other order we have
 
T  
A B =I, (281)
H

which by evaluating the matrix product on the left-hand-side we have the block identity
   
TA TB I 0
= . (282)
HA HB 0 I

This in turn gives the four equations T A = I, T B = 0, HA = 0, and HB = I. Taking the


time derivative of the above block matrix identity while using the product rule gives another
set of four constraints
   
Ṫ A + T Ȧ Ṫ B + T Ḃ 0 0
= . (283)
ḢA + H Ȧ ḢB + H Ḃ 0 0

The result of this expression is that they allow us to move the time derivative on one factor in
a product to the other factor in the product while we introduce a negative sign. For example,
the (1, 1) and (1, 2) components imply the relationships Ṫ A = −T Ȧ and Ṫ B = −T Ḃ.

From how we have defined the observer ξ(t) its differential equation can be computed using
the relationships introduced above and the true state dynamics of x(t) as

ξ˙ = Ṫ x + T ẋ
= Ṫ (Aξ + Bz) + T (F (Aξ + Bz) + Lu)
= (Ṫ A + T F A)ξ + (Ṫ B + T F B)z + T Lu . (284)

which is the books equation 9.2-10. If we use two expressions Ṫ A = −T Ȧ and Ṫ B = −T Ḃ


in Equation 284 we get

ξ˙ = (T F A − T Ȧ)ξ + (T F B − T Ḃ)z + T Lu , (285)

which is the books equation 9.2-11. Then assuming we had a T matrix (and thus the A and
B matrices) we would use Equation 285 to propagate an estimate of ξ(t) namely ξ(t)ˆ and
then use this estimate in Equation 279 to derive an estimate of x. As a next step we must
make sure that whatever choice we make for T any initial error in our estimate of ξ and x
will exponentially propagate to zero. Thus we need to study the properties of the error in
our estimates of ξ and x.

To do this we begin with the error in ξ as ξ˜ defined in the normal way as ξ˜ = ξˆ − ξ with
ξ satisfying Equation 285 and our estimate ξˆ satisfying the same functional form as the
differential equation that ξ satisfies. That is we propagate ξˆ using
˙
ξˆ = (T F A − T Ȧ)ξˆ + (T F B − T Ḃ)z + T Lu .

From these two equation we see that our error ξ˜ satisfies


˙
ξ˜ = (T F A − T Ȧ)ξ˜ . (286)
Thus how the error in ξ behaves is determined by the eigenvalues of the matrix T F A − T Ȧ.
This observation guides the specification of the T matrix in that we would like this matrix
to have small eigenvalues and thus convergence of ξˆ to ξ to be “fast”.

Now to study the error in x or x̃ = x̂ − x. Using the facts that x = Aξ + Bz and x̂ = Aξˆ+ Bz
we see that x̃ can be written as

x̃ = x̂ − x = Aξˆ + Bz − Aξ − Bz = A(ξˆ − ξ) = Aξ˜ ,

or the simple relationship


x̃ = Aξ˜ , (287)
which is the book’s equation 9.2-16. If we premultiply this by T and use the fact that T A = I
we get
ξ˜ = T x̃ , (288)
which is the book’s equation 9.2-17. Now we can get the differential equation for the error
in x from the corresponding differential equation for the error in ξ as
d ˜
x̃˙ = (Aξ) = Ȧξ˜ + Aξ˜˙ ,
dt
using Equation 286 we have

x̃˙ = (Ȧ + A(T F A − T Ȧ))ξ˜ ,

but ξ˜ = T x̃ so we

x̃˙ = (Ȧ + AT F A − AT Ȧ)T x̃


= (ȦT + AT F AT − AT ȦT )x̃ . (289)

Note that we can further simplify this by noting that if we premultiply Equation 288 by A
to get Aξ˜ = AT x̃ and then use Equation 287 to replace Aξ˜ with x̃ we end with

x̃ = AT x̃ . (290)

Thus replacing AT in the second term on the right-hand-side of Equation 289 we have

x̃˙ = (ȦT + AT F − AT ȦT )x̃ , (291)

which is the books equation 9.2-18. From Equation 280 or AT + BH = I we can write AT
as AT = I − BH and then get for x̃˙ the following

x̃˙ = (ȦT + (I − BH)F − (I − BH)ȦT )x̃ = (F − BHF + BH ȦT )x̃ .

We next replace the H Ȧ in the third term in the above with −ḢA from Equation 283 to
get a third term that looks like

BH ȦT x̃ = −B ḢAT x̃ = −B Ḣ x̃ ,

where we used AT x̃ = x̃, to simplify. Using this for the third term for x̃˙ we finally get

x̃˙ = (F − BHF − B Ḣ)x̃ , (292)

which is the books equation 9.2-19.


Notes on Example 9.2-1

For this given example we have the noiseless measurement


 
  x1
z= 1 0 ,
x2
 
so that H = 1 0 . It is helpful to consider the dimensions of the matrices involved in this
problem. Now our state dimension n is 2 and H ∈ R1×2 gives us one noiseless measurement
(m = 1) thus to derive n − m = 1 more with an observer we have T ∈ R1×2 to give a second
observation via our observer ξ ∈ R. Then given the measurement z and an estimate of our
ˆ we use matrices A(t) and B(t) as Kalman like gains to construct an estimate of
observer, ξ,
x from
ˆ + B(t)z(t) .
x̂ = A(t)ξ(t)
From which we see that the dimensions of A and B are A ∈ R2×1 and B ∈ R2×1 . From the
(2, 2) component of Equation 282 in terms of these vectors gives
 
  b1
HB = I = 1 0 = b1 = 1 ,
b2

and thus b2 is currently unspecified. The differential equation for x̃ or x̃˙ = (F − BHF )x̃ for
this problem has the matrix F − BHF given by
     
0 1 1   0 1
F − BHF = − 1 0
0 −β b2 0 −β
     
1 0 1 0 0 1
= −
0 1 b2 0 0 −β
    
0 0 0 1 0 0
= = .
−b2 1 0 −β 0 −b2 − β

A nice property would be to have x̃ converge to zero faster than the system response time
which is β. To achieve this we would like to make the m = 1 eigenvalue of F − BHF which
is λ = −(β + b2 ) “significantly” smaller than
 β.
 One way to do this is to take λ = −5β so
1
that b2 = 4β and we now have that B = .

From the matrix


 dimensions discussed above, for the most general A and T , we can take
a1  
A = and T = t1 t2 . Using these general expressions, the three additional
a2
requirements from Equation 282 become

T A = t1 a1 + t2 a2 = 1
 
  1
TB = t1 t2 = t1 + 4βt2 = 0

 
  a1
HA = 1 0 = a1 = 0 .
a2
Since a1 = 0 the one requirement from Equation 280 is
       
0   1   1 0 1 0
AT + BH = t1 t2 + 1 0 = = .
a2 4β t1 a2 + 4β t2 a2 0 1

Thus we end with the set of equations

t2 a2 = 1
t1 + 4βt2 = 0
t1 a2 + 4β = 0 .

Since the last equation can be obtained by multiplying the second equation by a2 and using
the first equation we have two equations and three unknowns. One solution can be found
by taking a2 = t2 = 1, and then t1 = −4β.

To finish this example we would solve Equation 285 (with ξ replaced with ξ) ˆ and then
ˆ + B(t)z(t). Equation 285 for ξˆ in this case is
estimate x using x̂ = A(t)ξ(t)
        
˙   0 1 0   0 1 1
ˆ
ξ = −4β 1 ˆ
ξ+ −4β 1 z
0 −β 1 0 −β 4β
 
  0
+ −4β 1 u
l
= −5β ξˆ − (16β 2 + β)z + lu = −5ξˆ − 17z − 1 .

With an initial condition on ξˆ given by


 
  x1 (0)
ˆ = T (0)x̂(0) = −4β 1
ξ(0) = −4βx1 (0) + x2 (0) = −4(1) + 0 = −4 .
x2 (0)

The true system evolves as


      
ẋ1 0 1 x1 0
= + ,
ẋ2 0 −1 x2 −1

with initial condition x1 (0) = 1, x2 (0) = 1, and we solve the above differential equation for
0 ≤ t ≤ ∞. Then since our measurement z = x1 solving these three equations is equivalent
to solving the coupled set system
      
ẋ1 0 1 0 x1 0
 ẋ2  =  0 −1 0   x2  +  −1  ,
ẋ3 17 0 −5 x3 −1
 
1
with initial condition of  1 . Once we have ξˆ as a function of time, x is reconstructed
−4
via      
ˆ 0 ˆ 1 x1 (t)
x̂ = Aξ + Bz = ξ+ x1 (t) = ˆ .
1 4β ξ + 4x1 (t)
Notes on observers for stochastic systems

In this section of these notes we provide further details on observers, but in this case we
consider the situation where in addition to exact measurements (considered above) we have
noisy measurements. In this case the measurements are a combination of noisy and noise-free
as      
z1 H1 v1
z= = x+ .
z2 H2 0
Here z is a vector of dimension m and we consider the case where there are m1 noise
measurements and m2 noise-free measurements where m2 must equal m − m1 .

Using the standard definition of the error in ξ as ξ˜ = ξˆ−ξ we can derive the differential equa-
tion for ξ˜ by take the time derivative of this difference by using the postulated expressions
ˆ˙ When we do this we find
for ξ˙ and ξ.

ξ˜˙ = (T F A − T Ȧ)ξ˜ + T B1 (z1 − H x̂) − T Gw . (293)

We next would like to derive the expression for the differential equation for the error in our
state x̃. To do this we need to derive a few axillary results. The first is to note that that
ξ = T x and ξˆ = T x̂, so that ξ˜ = T x̃. The second is to note that that we can write the error
correction term above as

z1 − H1 x̂ = H1 x + v1 − H1 x̂ = −H1 x̃ + v1 .

Next we show that x̃ = Aξ˜ which can be done as follows

x̃ = x̂ − x
= Aξˆ + B2 z2 − (Aξ + B2 z2 )
= Aξ˜ . (294)

˜˙
˜ by taking the time derivative as x̃˙ = Ȧξ˜ + Aξ,
Starting with this last expression, x̃ = Aξ,
when we use ξ˜˙ given by Equation 293 we get

x̃˙ = Ȧξ˜ + A(T F A − T Ȧ)ξ˜ + AT B1 (z1 − H1 x̂) − AT Gw .

Since ξ˜ = T x̃ and z1 − H1 x̂ = −H1 x̃ + v1 the above becomes

x̃˙ = ȦT x̃ + AT F AT x̃ − AT ȦT x̃ − AT B1 H1 x̃ + AT B1 v1 − AT Gw


= (ȦT + AT F AT − AT ȦT − AT B1 H1 )x̃ + AT B1 v1 − AT Gw .

Now we will simplify this by showing that AT x̃ = x̃. By premultipling ξ˜ = T x̃ by A we have


Aξ˜ = AT x̃ and since Aξ˜ = x̃ by Equation 294 we have shown that

x̃˙ = (ȦT + AT F − AT ȦT − AT B1 H1 )x̃ + AT B1 v1 − AT Gw , (295)

which is the books equation 9.2-32. To further simplify this recall that from 9.2-18 we derived
Equation 292 an equivalent express for the first three terms in the above or

ȦT + AT F − AT ȦT = F − BHF − B Ḣ .


To modify this expression for the case of noiseless and noisy measurements considered here
we take B → B2 and H → H2 since the subscript 2 represents the noiseless measurements.
Using this expression in the first three terms and AT = I − B2 H2 in the last term the
differential equation for x̃ becomes

x̃˙ = (F − B2 H2 F − B2 Ḣ2 − AT B1 H1 )x̃ + AT B1 v1 + (I − B2 H2 )Gw , (296)

which is the books equation 9.2-33.

Notice that if we replace B1 in the above with AT B1 we see that the expression AT B1
becomes AT (AT B1 ) = AT AT B1 = AT B1 , since T A = I. Thus the transformation given by
B1 → AT B1 leave the right-hand-side of the above unmodified. The book argues that this
means that we can also perform the transformation AT B1 → B1 .

Warning: I don’t really see the logic in the books argument. If anyone knows of a better
argument for making this substitution please let me know.

If we can do this transformation however we get for x̃ the following

x̃˙ = (F − B2 H2 F − B2 Ḣ2 − B1 H1 )x̃ + B1 v1 + (I − B2 H2 )Gw , (297)

or the books equation 9.2-34.

We now verify that in special cases these results duplicate known results. If we consider the
case where there is no noisy measurements (v1 = 0 and B1 = 0) and no process noise G = 0
we then get
x̃˙ = (F − B2 H2 F − B2 Ḣ2 )x̃ ,
or Equation 291, which is the expected result for observers of deterministic systems. In the
case where there are no noise free measurements B2 = H2 = 0 (and only noisy measurements)
we get
x̃˙ = (F − B1 H1 )x̃ + B1 v1 − Gw .
which is the standard Kalman filter error dynamics when B1 is the Kalman gain.

Notes on the optimal choice for B1 and B2

Using Equation 297 we can write down the differential equation satisfied by P = E[x̃x̃T ],
where we find

Ṗ = (F − B2 H2 F − B2 Ḣ2 − B1 H1 )P + P (F − B2 H2 F − B2 Ḣ2 − B1 H1 )T
+ B1 R1 B1T + (I − B2 H2 )GQGT (I − B2 H2 )T .

As in other parts of this text we seek expressions for B1 and B2 that make trace(Ṗ ) as
small as possible. This requires taking the B1 and B2 derivatives, setting the results equal
to zero and solving for B1 and B2 . To take these derivatives we will use Equations 313, 315,
and 317. Performing this procedure to determine the optimal value for B1 first, to evaluate
∂ trace(Ṗ)/∂B1 we find the three derivatives we need to evaluate given by

∂/∂B1 trace((F − B2 H2 F − B2 Ḣ2 − B1 H1 )P ) = −∂/∂B1 trace(B1 H1 P ) = −(H1 P )^T = −P H1^T ,

∂/∂B1 trace(P (F − B2 H2 F − B2 Ḣ2 − B1 H1 )^T ) = −∂/∂B1 trace(P (B1 H1 )^T ) = −∂/∂B1 trace(P H1^T B1^T )
    = −∂/∂B1 trace(B1 H1 P ) = −P H1^T ,

∂/∂B1 trace(B1 R1 B1^T ) = 2B1 R1 .

Thus ∂ trace(Ṗ)/∂B1 = 0 becomes

−2P H1T + 2B1 R1 = 0 ,

or when we solve for B1 we find


B1opt = P H1T R1−1 . (298)

When we use the optimal value for B1 found above we find that Ṗ is given by

Ṗ = (F − B2 H2 F − B2 Ḣ2 )P + P (F − B2 H2 F − B2 Ḣ2 )T + (I − B2 H2 )GQGT (I − B2 H2 )T


− P H1T R1−1 H1 P − P H1T R1−1 H1 P + P H1T R1−1 R1 R1−1 H1 P
= (F − B2 H2 F − B2 Ḣ2 )P + P (F − B2 H2 F − B2 Ḣ2 )T + (I − B2 H2 )GQGT (I − B2 H2 )T
− P H1T R1−1 H1 P , (299)

since several terms cancel. This is the book's equation 9.2-37. Now to minimize the trace of
Ṗ in Equation 299 with respect to B2 we need to take the derivative of the above expression
with respect to B2 . The various derivatives we need in this calculation are given by
∂/∂B2 trace((F − B2 H2 F − B2 Ḣ2 )P ) = −∂/∂B2 trace(B2 H2 F P ) − ∂/∂B2 trace(B2 Ḣ2 P )
    = −(H2 F P )^T − (Ḣ2 P )^T
    = −P F^T H2^T − P Ḣ2^T .

The trace of the second term on the right-hand-side of Equation 299 has the same derivative
since it is the transpose of the first. Next we evaluate
∂/∂B2 trace((I − B2 H2 )GQG^T (I − B2 H2 )^T ) = −∂/∂B2 trace(GQG^T H2^T B2^T )
    − ∂/∂B2 trace(B2 H2 GQG^T )
    + ∂/∂B2 trace(B2 H2 GQG^T H2^T B2^T ) .
Note that the first term and second term are equal since the arguments of the traces are
transposes of each other. Thus we get for this part of the total derivative

−2(H2 GQGT )T + 2B2 H2 GQGT H2T .

The total derivative of trace(Ṗ ) is then given by adding up all of the parts seen thus far to
get

∂/∂B2 trace(Ṗ ) = −2P F^T H2^T − 2P Ḣ2^T − 2GQG^T H2^T + 2B2 H2 GQG^T H2^T = 0 .
Thus solving for B2 we see that B2 is given by

B2opt = (P F T H2T + GQGT H2T + P Ḣ2T )(H2 GQGT H2T )−1 , (300)

or the book's equation 9.2-38.
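As a quick numerical check of Equations 298 and 300, the following is a minimal Python sketch (the dimensions and random matrices are arbitrary, assumed choices, not from the book) that evaluates trace(Ṗ) as a function of B1 and B2 and verifies by finite differences that its gradient vanishes at the derived optimal gains.

import numpy as np

rng = np.random.default_rng(0)
n, m1, m2, p = 4, 2, 2, 3          # assumed state, noisy-meas, noiseless-meas, noise dims

def rand_spd(k):
    # random symmetric positive definite matrix
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)

F, H1, H2 = rng.standard_normal((n, n)), rng.standard_normal((m1, n)), rng.standard_normal((m2, n))
H2dot = rng.standard_normal((m2, n))   # stands in for the time derivative of H2
G = rng.standard_normal((n, p))
P, R1, Q = rand_spd(n), rand_spd(m1), rand_spd(p)

def trace_Pdot(B1, B2):
    # trace of the covariance differential equation written above
    Fc = F - B2 @ H2 @ F - B2 @ H2dot - B1 @ H1
    Pdot = (Fc @ P + P @ Fc.T + B1 @ R1 @ B1.T
            + (np.eye(n) - B2 @ H2) @ G @ Q @ G.T @ (np.eye(n) - B2 @ H2).T)
    return np.trace(Pdot)

# the derived optima, Equations 298 and 300
B1opt = P @ H1.T @ np.linalg.inv(R1)
GQGt = G @ Q @ G.T
B2opt = (P @ F.T @ H2.T + GQGt @ H2.T + P @ H2dot.T) @ np.linalg.inv(H2 @ GQGt @ H2.T)

def fd_grad(f, X, eps=1e-6):
    # entry-by-entry central-difference gradient of a scalar function of a matrix
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return grad

g1 = fd_grad(lambda B: trace_Pdot(B, B2opt), B1opt)
g2 = fd_grad(lambda B: trace_Pdot(B1opt, B), B2opt)
print(np.abs(g1).max(), np.abs(g2).max())   # both should be negligibly small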

Notes on specialization to correlated measurement errors

We will solve the problem of correlated measurement errors by incorporating the correlated
dynamics of the measurement noise v

v̇ = Ev + w1 ,

into the state by forming an (n + m)th order augmented “prime” system, where the new state
x′ is the old state x plus the measurement noise v, defined as x′^T = [x^T  v^T]. Such an
augmented system has new system matrices F′, G′, H2′, and Q′ as given in the book. We
now show that the state estimation error x̃′ is orthogonal to the noise-free measurements
represented by H2′ or

H2′ x̃′ = [H  I] [x̃^T  ṽ^T]^T = H x̃ + ṽ = 0 . (301)

To show this recall that x̃′ = Aξ˜ and premultiply this relationship by H2′ to get

H2′ x̃′ = H2′ Aξ˜ ,

and by Equation 281 for the augmented system we have that

\begin{bmatrix} T \\ H2′ \end{bmatrix} \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} ,

or H2′ A = 0 meaning that H2′ x̃′ = 0 showing the claimed orthogonalization in Equation 301.
Using this expression we can derive expressions for the augmented state error covariance
matrix P′ = E[x̃′ x̃′^T ] as

P′ = E\left[ \begin{bmatrix} x̃ \\ ṽ \end{bmatrix} \begin{bmatrix} x̃^T & ṽ^T \end{bmatrix} \right] = \begin{bmatrix} P & E[x̃ ṽ^T ] \\ E[ṽ x̃^T ] & E[ṽ ṽ^T ] \end{bmatrix} .

By post-multiplying the relationship H x̃ + ṽ = 0 by x̃T we have H x̃x̃T + ṽx̃T = 0 so taking


expectations we get
HP + E[ṽx̃T ] = 0 ,
or
E[ṽx̃T ] = −HP .
The transpose of this is E[x̃ ṽ^T ] = −P H^T and E[ṽ ṽ^T ] is computed as

E[ṽṽ T ] = E[(−H x̃)(−H x̃)T ] = HP H T .

Thus using all of these parts we find

P′ = \begin{bmatrix} P & −P H^T \\ −HP & HP H^T \end{bmatrix} (302)

which is the book's equation 9.2-47.

With this augmented system we are now in a situation where we can apply the results
of the previous section. That is we will put the primed system, and Equation 302, into
Equation 300. To do this we first need to evaluate various products. To begin we find

G′Q′G′^T = \begin{bmatrix} GQG^T & 0 \\ 0 & Q1 \end{bmatrix} so that

G′Q′G′^T H2′^T = \begin{bmatrix} GQG^T & 0 \\ 0 & Q1 \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} = \begin{bmatrix} GQG^T H^T \\ Q1 \end{bmatrix} ,

and
H2′ G′ Q′ G′T H2′T = HGQGT H T + Q1 .
Next we find

P′F′^T H2′^T = \begin{bmatrix} P & −P H^T \\ −HP & HP H^T \end{bmatrix} \begin{bmatrix} F^T & 0 \\ 0 & E^T \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} = \begin{bmatrix} P F^T H^T − P H^T E^T \\ −HP F^T H^T + HP H^T E^T \end{bmatrix} ,

and

P′Ḣ2′^T = \begin{bmatrix} P & −P H^T \\ −HP & HP H^T \end{bmatrix} \begin{bmatrix} Ḣ^T \\ 0 \end{bmatrix} = \begin{bmatrix} P Ḣ^T \\ −HP Ḣ^T \end{bmatrix} .
Thus the sum of the three needed terms in B2opt is given by

P′F′^T H2′^T + G′Q′G′^T H2′^T + P′Ḣ2′^T = \begin{bmatrix} P F^T H^T − P H^T E^T + GQG^T H^T + P Ḣ^T \\ −HP F^T H^T + HP H^T E^T + Q1 − HP Ḣ^T \end{bmatrix} .

When we group terms then for the matrix B2opt we have

\begin{bmatrix} B21^{opt} \\ B22^{opt} \end{bmatrix} = \begin{bmatrix} P (Ḣ + HF − EH)^T + GQG^T H^T \\ −HP (Ḣ + HF − EH)^T + Q1 \end{bmatrix} (HGQG^T H^T + Q1 )^{−1} ,

or the book's equation 9.2-51.


Notes on stochastic approximation: estimating x0 from zk = x0 + vk

If our measurements are noisy versions of the constant x0, or zk = x0 + vk, then our stochastic
estimation algorithm is
x̂k+1 = x̂k + kk (zk − x̂k ) .
In this case g(x) = x0 − x, and so g′(x) = −1. Thus the required convergence condition
on the sign of kk is sgn(kk ) = −sgn(g′(x)) = −(−1) = +1, so we must have kk > 0 for
convergence.

Notes on stochastic approximation: estimating x0 from zj = hj x0 + vj

When zj = hj x0 + vj for j = 1, 2, · · · , k − 1, then tabulating these equations for each value
of j gives

\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{k−1} \end{bmatrix} = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_{k−1} \end{bmatrix} x_0 + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{k−1} \end{bmatrix} .

This is an overdetermined system, and to solve for x0 using the least-squares methodology
we multiply both sides by the transpose of the coefficient vector in front of x0, namely
[h_1  h_2  · · ·  h_{k−1}], and solve to get

x̂_k = \left( [h_1  h_2  · · ·  h_{k−1}] \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_{k−1} \end{bmatrix} \right)^{−1} [h_1  h_2  · · ·  h_{k−1}] \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{k−1} \end{bmatrix} = \frac{\sum_{j=1}^{k−1} h_j z_j}{\sum_{j=1}^{k−1} h_j^2} . (303)

This is the book's equation 9.3-29. We denote this estimate x̂k since it is the best predictor
“going into” the kth measurement. In other words it is the prior estimate of the value of
x0 before we obtain the kth measurement. From the above expression for x̂k a recursive
estimate of x̂k+1 can be derived as follows
x̂_{k+1} = \frac{z_k h_k + \sum_{j=1}^{k−1} z_j h_j}{\sum_{j=1}^{k} h_j^2} = \frac{z_k h_k + x̂_k \sum_{j=1}^{k−1} h_j^2}{\sum_{j=1}^{k} h_j^2} = \frac{z_k h_k + x̂_k \left(\sum_{j=1}^{k} h_j^2 − h_k^2\right)}{\sum_{j=1}^{k} h_j^2}
    = x̂_k + \frac{1}{\sum_{j=1}^{k} h_j^2} (z_k h_k − h_k^2 x̂_k) = x̂_k + \frac{h_k}{\sum_{j=1}^{k} h_j^2} (z_k − h_k x̂_k) ,

which is the book's equation 9.3-30.
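As a quick check that this recursive form reproduces the batch least-squares estimate of Equation 303, here is a minimal Python sketch; the particular value of x0, the coefficients hj, and the noise level are arbitrary assumed values.

import numpy as np

rng = np.random.default_rng(1)
x0, N = 2.5, 50                                  # assumed constant and number of measurements
h = rng.uniform(0.5, 2.0, size=N)                # measurement coefficients h_j
z = h * x0 + 0.3 * rng.standard_normal(N)        # z_j = h_j x0 + v_j

def batch(k):
    # batch least-squares estimate x̂_k built from measurements j = 1, ..., k-1 (Equation 303)
    return np.sum(h[:k - 1] * z[:k - 1]) / np.sum(h[:k - 1] ** 2)

xhat = batch(2)                                  # estimate after the first measurement
for k in range(2, N):
    # recursive update of the book's equation 9.3-30, taking in the k-th measurement
    xhat = xhat + (h[k - 1] / np.sum(h[:k] ** 2)) * (z[k - 1] - h[k - 1] * xhat)
    assert np.isclose(xhat, batch(k + 1))        # matches the batch estimate x̂_{k+1}
print(xhat)                                      # should be close to x0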


Notes on Example 9.3-1

As an example of these techniques we will use stochastic approximation methods to estimate


the value of a Gaussian random variable x0 (with mean µ0 and variance σ0^2) from noisy
measurements zk = x0 + vk , where vk ∼ N(0, σ^2). If we take kk = 1/k then since
g(x) = x0 − x we find our stochastic approximation algorithm given by

x̂_{k+1} = x̂_k + k_k m_k = x̂_k + \frac{1}{k} (z_k − x̂_k) .
Note that if we start with an initial guess at x0 denoted by x̂1 (since it is our guess before
the measurement z1 is obtained) taken to be µ0 and we receive the measurements zk we see
our estimates of x0 become

x̂_2 = x̂_1 + \frac{1}{1}(z_1 − x̂_1) = z_1
x̂_3 = x̂_2 + \frac{1}{2}(z_2 − x̂_2) = z_1 + \frac{1}{2}(z_2 − z_1) = \frac{1}{2}(z_1 + z_2)
x̂_4 = x̂_3 + \frac{1}{3}(z_3 − x̂_3) = \frac{1}{2}(z_1 + z_2) + \frac{1}{3}\left(z_3 − \frac{1}{2}(z_1 + z_2)\right) = \frac{1}{3}(z_1 + z_2 + z_3) .
From this sequence it looks like in general we have that

x̂_{k+1} = \frac{1}{k} \sum_{j=1}^{k} z_j ,

or the average of the k data points. Then the statement E[(x̂_{k+1} − x_0)^2] = \frac{σ^2}{k} is the well
known result on the variance of the estimate of the mean. We can prove its correctness
simply as

E\left[\left(\frac{1}{k}\sum_{j=1}^{k} z_j − x_0\right)^2\right] = E\left[\left(\frac{1}{k}\sum_{j=1}^{k}(x_0 + v_j) − x_0\right)^2\right]
    = E\left[\left(\frac{1}{k}\sum_{j=1}^{k} v_j\right)^2\right] = \frac{1}{k^2}\sum_{j=1}^{k} E[v_j^2] = \frac{1}{k^2} σ^2 k = \frac{σ^2}{k} ,

where we have used the fact that the sequence of measurement noise vj are independent i.e.
E[vi vj ] = δij σ 2 .
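A quick Monte Carlo check of this σ²/k result is easy to do; the following minimal Python sketch uses assumed values for x0, σ, and k.

import numpy as np

rng = np.random.default_rng(2)
x0, sigma, k, trials = 1.7, 0.5, 20, 200_000       # assumed values for this check

z = x0 + sigma * rng.standard_normal((trials, k))  # rows of measurements z_j = x0 + v_j
xhat = z.mean(axis=1)                              # x̂_{k+1} = (1/k) sum_j z_j, one per trial
print(np.mean((xhat - x0) ** 2), sigma ** 2 / k)   # empirical MSE vs the predicted sigma^2/k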

Now consider the present case, where x0 ∼ N(µ0 , σ0^2) and we take measurements zj = x0 + vj
with vj ∼ N(0, σ^2). In terms of a Kalman filter framework, taking our initial guess at the
state x0 and its uncertainty as x̂0 = µ0 and p0 (−) = σ0^2, we see that this example is exactly
like Example 4.2-1 discussed on Page 47. To make the notation from that example match
this example we need to take r0 → σ 2 and p0 → σ02 . Under this similarity using Equation 63
we have that our state uncertainty changes with measurements as

p_k(+) = \frac{p_0}{1 + \frac{p_0}{r_0} k} = \frac{r_0}{\frac{r_0}{p_0} + k} → \frac{σ^2}{k + \frac{σ^2}{σ_0^2}} ,

which is the book's equation 9.3-39. The state update Equation 64 from that example, together
with the above transformations, gives the book's equation 9.3-38.
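The closed-form covariance above can also be checked directly against the scalar Kalman measurement-update recursion; here is a minimal Python sketch with assumed values of σ0 and σ.

# check p_k(+) = sigma^2 / (k + sigma^2/sigma0^2) against the scalar update
sigma0, sigma, N = 2.0, 0.5, 30        # assumed prior and measurement standard deviations
p = sigma0 ** 2                        # initial uncertainty p_0(-) = sigma0^2
for k in range(1, N + 1):
    p = p * sigma ** 2 / (p + sigma ** 2)                    # scalar measurement update
    closed_form = sigma ** 2 / (k + sigma ** 2 / sigma0 ** 2)
    assert abs(p - closed_form) < 1e-12
print(p)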
Notes on deterministic optimal linear systems – duality

In this section of these notes we will simply derive and verify many of the book’s equations.
Given the quadratic performance index J specified in the book we seek to transform it using
a time-varying symmetric matrix S(t) with certain properties. Since S(t) is a function of
time we have
d/dt (x^T S x) = ẋ^T S x + x^T Ṡ x + x^T S ẋ .
Using the fact that our system state satisfies ẋ = F (t)x(t) + L(t)u(t) this becomes
d/dt (x^T S x) = u^T L^T S x + x^T F^T S x + x^T Ṡ x + x^T S F x + x^T S L u
    = x^T (F^T S + SF + Ṡ)x + u^T L^T S x + x^T S L u .
We next add and subtract x^T V x + u^T U u to this expression to get that d/dt (x^T S x) equals
x^T (F^T S + SF + Ṡ + V )x + u^T L^T S x + x^T S L u + u^T U u − x^T V x − u^T U u . (304)
or the book's equation 9.5-8. We claim that we can write this as
d/dt (x^T S x) = (x^T S L + u^T U)U^{−1} (L^T S x + U u) − x^T V x − u^T U u ,
if we impose some restrictions on S. To show this expand out the first term to get
xT SLU −1 LT Sx + xT SLu + uT LT Sx + uT Uu .
This will be equal to Equation 304 if
F T S + SF + Ṡ + V = SLU −1 LT S , (305)
or the book's equation 9.5-10. Thus since we have just argued that
x^T V x + u^T U u = (x^T S L + u^T U)U^{−1} (L^T S x + U u) − d/dt (x^T S x) ,
and requiring that at tf the matrix S equals Vf or
x(tf )T S(tf )x(tf ) = x(tf )T Vf x(tf ) ,
we can write our quadratic performance index J as

J = x(t_f)^T V_f x(t_f) + \int_{t_0}^{t_f} (x^T V x + u^T U u)\,dt
  = x(t_f)^T S(t_f) x(t_f) + \int_{t_0}^{t_f} (x^T S L + u^T U)U^{−1}(L^T S x + U u)\,dt
      − (x(t_f)^T S(t_f) x(t_f) − x(t_0)^T S(t_0) x(t_0))
  = x(t_0)^T S(t_0) x(t_0) + \int_{t_0}^{t_f} (x^T S L + u^T U)U^{−1}(L^T S x + U u)\,dt , (306)

which is the book’s equation 9.5-12. From this we see that we can minimize J if we require
LT Sx + Uu = 0 , (307)
or that the control u should be given by
u(t) = −U −1 (t)L(t)T S(t)x(t) . (308)
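To make the use of Equations 305 and 308 concrete, here is a minimal Python sketch; the double-integrator plant, the weights, and the horizon are assumed, illustrative choices rather than an example from the book. It integrates the Riccati equation backward from S(tf) = Vf with a simple Euler scheme and then forms the feedback control.

import numpy as np

# assumed, illustrative problem data: a double integrator with quadratic weights
F = np.array([[0.0, 1.0], [0.0, 0.0]])
L = np.array([[0.0], [1.0]])
V = np.eye(2)               # state weighting
U = np.array([[1.0]])       # control weighting
Vf = np.eye(2)              # terminal weighting, so S(tf) = Vf
tf, dt = 5.0, 1e-3

Ui = np.linalg.inv(U)
S = Vf.copy()
# integrate Equation 305 backward in time: S(t - dt) ≈ S(t) - dt * Ṡ(t),
# with Ṡ = S L U^{-1} L^T S - F^T S - S F - V
for _ in range(int(tf / dt)):
    Sdot = S @ L @ Ui @ L.T @ S - F.T @ S - S @ F - V
    S = S - dt * Sdot
print(S)                    # for a long horizon this approaches the steady-state solution

# the optimal feedback of Equation 308 applied to an assumed initial state
x = np.array([[1.0], [0.0]])
u = -Ui @ L.T @ S @ x
print(u)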
Notes on optimal linear stochastic control systems – separation principles

From the discussion in the book we arrive at a minimization problem for u of the form

J̄_u = \int_{t_0}^{t_f} E[(x^T S L + u^T U)U^{−1}(L^T S x + U u)]\,dt ,

which we desire to minimize as a function of u. The u derivative of this expression is

∂/∂u E[(x^T S L + u^T U)U^{−1}(L^T S x + U u)] = ∂/∂u E[x^T S L U^{−1} L^T S x + x^T S L u + u^T L^T S x + u^T U u]
    = ∂/∂u (x̂^T S L U^{−1} L^T S x̂ + x̂^T S L u + u^T L^T S x̂ + u^T U u)
    = (x̂^T S L)^T + L^T S x̂ + (U + U^T )u .

When we simplify and set this equal to zero we get

2LT S x̂ + 2Uu = 0 .

Solving for u we find


u = −U −1 LT S x̂ ,
the same solution as in Equation 308 but evaluated at the mean state vector x̂.

Problem Solutions

Problem 1 (an adaptive filtering example)

Note this is a linear time-invariant system and so the innovations are generated by Equa-
tion 277, which in this case becomes

E[ν(t1 + τ )ν(t1 )] = e^{(−β−k)|τ|} (p∞ − kr) + rδ(τ ) .

As discussed in Example 9.1-1 on Page 167 we empirically compute the left-hand-side of
the above (we call this φνν (τ )) and then fit the empirical values to a function of the form
Ae^{−(β+k)|τ|} + Bδ(τ ). Once we have done this we have estimates of p∞ − kr and r.
Next we look for the steady-state solution to

Ṗ = (F − KH)P + P (F − KH)^T + GQG^T + KRK^T ,

which for this system is given by

0 = 2(−β − k)p∞ + q + k^2 r ,

or
p∞ = \frac{q + k^2 r}{2(β + k)} .
Thus the adaptive filtering procedure for this problem then is as follows
1. Measure the autocorrelation of the innovations ν(t) and denote this φνν (τ ).
2. Fit a model of the form A e^{−(β+k)|τ|} + Bδ(τ ) to the measured function φνν (τ ), obtaining
estimates of A and B.
3. From the earlier discussion these two values of A and B should satisfy

A = p∞ − kr = \frac{q + k^2 r}{2(β + k)} − kr and B = r .

Thus we can use these estimates to solve for q and r with k fixed. These two values
of q and r should be better estimates of q and r than we previously had and could be
used to modify the value of k used in filtering. A small numerical sketch of steps 2
and 3 is given below.
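Here is a minimal Python sketch of steps 2 and 3 above; the numbers (β, k, the lag grid, the stand-in autocorrelation samples, and B) are assumed, illustrative values rather than data from a real innovations record.

import numpy as np

beta, k = 0.8, 0.5                       # assumed plant pole and current filter gain
tau = 0.1 * np.arange(1, 30)             # positive lags at which phi_nu_nu was "measured"
phi = 0.12 * np.exp(-(beta + k) * tau)   # stand-in for the measured autocorrelation, tau > 0

# step 2: least-squares fit of A * exp(-(beta + k)|tau|) to the positive-lag samples
basis = np.exp(-(beta + k) * tau)
A_fit = (basis @ phi) / (basis @ basis)
B_fit = 0.3                              # delta-function weight from the lag-0 spike, assumed here

# step 3: invert A = (q + k^2 r)/(2(beta + k)) - k r and B = r for q and r
r_est = B_fit
q_est = 2.0 * (beta + k) * (A_fit + k * r_est) - k ** 2 * r_est
print(A_fit, r_est, q_est)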

Problem 3 (relationships between the covariance of ξ and x)

Since x̃ and ξ̃ are related via ξ̃ = T x̃ (see Equation 287) and since AT = I, when we premultiply
by A this means that Aξ̃ = x̃. From these two expressions we see that the error covariances
for x̃ and ξ˜ are related via

Π = E[ξ˜ξ˜T ] = T E[x̃x̃T ]T T = T P T T , (309)

and
P = E[x̃x̃T ] = AE[ξ˜ξ˜T ]AT = AΠAT , (310)
as we were to show.

Problem 5 (convergence of the modified Newton’s algorithm)

For the iterations of the modified Newton's algorithm

x̂_{k+1} = x̂_k − k_0 \frac{g(x̂_k)}{g′(x̂_k)} ,
introduce the error, ek, defined to be ek = x̂k − x0 . Then

e_{k+1} = x̂_{k+1} − x_0 = x̂_k − x_0 − k_0 \frac{g(x̂_k)}{g′(x̂_k)} = e_k − k_0 \frac{g(x_0 + e_k)}{g′(x_0 + e_k)} .
Taylor expand g(x0 + ek) about x0 to get

g(x_0 + e_k) = g′(x_0) e_k + \frac{1}{2} g′′(x_0) e_k^2 + · · · ,

since x0 is a root of g(·) so that g(x0) = 0. Next Taylor expand g′(x0 + ek) about x0 to get

g ′ (x0 + ek ) = g ′(x0 ) + g ′′ (x0 )ek + · · · .


With these two expressions the iterations of ek satisfy

e_{k+1} = e_k − k_0 \left( \frac{g′(x_0) e_k + \frac{1}{2} g′′(x_0) e_k^2 + · · ·}{g′(x_0) + g′′(x_0) e_k + · · ·} \right) .

When we keep only the leading-order terms in ek in the numerator and the denominator we obtain

ek+1 = ek − k0 ek = (1 − k0 )ek .

When we iterate this over k we get

e_k = (1 − k_0)^{k−1} e_1 for k ≥ 2 .

Thus we see that if |1 − k0| < 1 then this method converges since ek → 0 in that case. This
means that convergence is guaranteed when −1 < 1 − k0 < 1 or 0 < k0 < 2. We are told
that g(x) satisfies 0 ≤ a ≤ |g(x)| ≤ b < ∞, from which we conclude that 0 < a/b < 1, so when
we impose the requirement that k0 be such that 0 < k0 < a/b this requires that 0 < k0 < 1,
which is stricter than what is truly required for convergence (which is k0 < 2).
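A small numerical experiment makes the geometric error decay easy to see; the following Python sketch uses an assumed function g with root x0 = 2 and an assumed gain k0 = 0.5.

import numpy as np

g = lambda x: np.tanh(x - 2.0)                 # assumed g with root x0 = 2
gp = lambda x: 1.0 / np.cosh(x - 2.0) ** 2     # its derivative g'
k0, x0, xhat = 0.5, 2.0, 2.3

errors = []
for _ in range(10):
    errors.append(xhat - x0)
    xhat = xhat - k0 * g(xhat) / gp(xhat)      # modified Newton iteration
ratios = np.array(errors[1:]) / np.array(errors[:-1])
print(ratios)                                  # ratios approach 1 - k0 = 0.5 as predicted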

Problem 7 (requirements for stochastic convergence)

All the examples given for the gain sequence, kk, can be shown to satisfy the required
conditions in a manner similar to that of the classic divergent series \sum_{k=1}^{∞} 1/k .

Problem 8 (some derivations)

See the notes that accompany Example 9.3-1 on Page 179.


A Appendix

A.1 Matrix and Vector Derivatives

In this section of the appendix we enumerate several matrix and vector derivatives that are
used earlier in this document. We begin with some derivatives of scalar forms

\frac{∂(x^T a)}{∂x} = \frac{∂(a^T x)}{∂x} = a (311)

\frac{∂(x^T B x)}{∂x} = (B + B^T )x . (312)
Next we present some derivatives involving traces. We have

∂/∂X trace(AX) = A^T (313)
∂/∂X trace(XA) = A^T (314)
∂/∂X trace(AX^T) = A (315)
∂/∂X trace(X^T A) = A (316)
∂/∂X trace(X^T AX) = (A + A^T )X (317)
∂/∂X trace(XAX^T) = X(A + A^T ) . (318)
Note that we can derive Equations 317 and 318 given the previous trace derivative identities
using the “product rule”. To do this we assume that one of the terms X (or XT ) is constant
when we take the derivative with respect to the other X term. For example to derive
Equation 318 we have

∂/∂X trace(XAX^T) = ∂/∂X trace(XAV)|_{V=X^T} + ∂/∂X trace(VAX^T)|_{V=X}
    = (AV)^T |_{V=X^T} + (VA)|_{V=X} = (AX^T)^T + XA
    = X(A + A^T ) .

Next we present some matrix derivatives that are helpful to know. We have

∂/∂X (a^T X b) = a b^T (319)

∂/∂X (a^T X^T b) = b a^T , (320)
where as before X is a matrix. Derivations of expressions of this form can be found in [4, 6].
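The identities 313–320 are easy to check numerically with finite differences; the following is a minimal Python sketch using random matrices of an assumed size.

import numpy as np

rng = np.random.default_rng(3)
n = 4
A, X = rng.standard_normal((n, n)), rng.standard_normal((n, n))
a, b = rng.standard_normal(n), rng.standard_normal(n)

def fd_grad(f, X, eps=1e-6):
    # entry-by-entry central-difference gradient of a scalar function of a matrix
    grad = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(X)
            E[i, j] = eps
            grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return grad

checks = [
    (lambda X: np.trace(A @ X), A.T),                   # Equation 313
    (lambda X: np.trace(A @ X.T), A),                   # Equation 315
    (lambda X: np.trace(X.T @ A @ X), (A + A.T) @ X),   # Equation 317
    (lambda X: np.trace(X @ A @ X.T), X @ (A + A.T)),   # Equation 318
    (lambda X: a @ X @ b, np.outer(a, b)),              # Equation 319
    (lambda X: a @ X.T @ b, np.outer(b, a)),            # Equation 320
]
for f, analytic in checks:
    print(np.abs(fd_grad(f, X) - analytic).max())       # each difference should be tiny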
References
[1] J. D’Appolito and C. Hutchinson. Low sensitivity filters for state estimation in the
presence of large parameter uncertainties. Automatic Control, IEEE Transactions on,
14(3):310–312, 1969.

[2] J. P. M. de S. Applied Statistics Using SPSS, STATISTICA, MATLAB and R, 2nd Edition. 1997.

[3] M. H. DeGroot. Optimal Statistical Decisions. 2004.

[4] P. A. Devijver and J. Kittler. Pattern recognition: A statistical approach. Prentice Hall,
1982.

[5] R. C. Dorf. Introduction to Electric Circuits. John Wiley & Sons, Inc., New York, NY,
USA, 2007.

[6] P. S. Dwyer and M. S. Macphail. Symbolic matrix derivatives. Annals of Mathematical Statistics, 19(4):517–534, 1948.

[7] M. S. Grewal and A. P. Andrews. Kalman Filtering : Theory and Practice Using
MATLAB. Wiley-Interscience, January 2001.

[8] E. L. Ince. Ordinary Differential Equations. Dover Publications, Inc., New York, NY,
1956.

[9] A. V. Oppenheim, A. S. Willsky, and S. H. Nawab. Signals & systems (2nd ed.).
Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996.

[10] A. Papoulis. Probability, Random Variables, and Stochastic Processes. 3rd edition, 1991.
