Linear Systems

This document summarizes a talk on smoothing and filtering of linear and nonlinear dynamic systems. It discusses using observers and Kalman filters for state estimation of linear systems, and using coordinate transformations and input-output injections to construct observers for nonlinear systems. It provides references for further reading on linear and nonlinear estimation techniques.

Uploaded by

Cesar Apodaca

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/229058207

Linear Systems

Chapter · January 1980


1 author:

Thomas Kailath
Stanford University

All content following this page was uploaded by Thomas Kailath on 28 January 2015.


Model Based Smoothing of


Linear and Nonlinear Processes

Arthur J. Krener
Department of Mathematics
University of California
Davis, CA, 95616, USA

[email protected]

1
Dynamic Models

We assume that there are three processes:

the observed "input" process u(t) ∈ ℝ^m,
the observed "output" process y(t) ∈ ℝ^p,
and the hidden "state" process x(t) ∈ ℝ^n.
We restrict attention to finite m, n, p and to continuous time.

We also assume that the processes are related by some model, which in the absence of noise is of the form
ẋ = f(x, u)
y = h(x, u)
There may also be additional information such as an initial condition or a boundary condition,
x(0) = x0 or b(x(0), x(T)) = 0

2
In the controls community there are two important problems.

Filtering: From the knowledge of the past inputs u(0 : t), the past outputs y(0 : t), the model f, h and the initial condition x0, estimate the current state x(t).

Typically filtering needs to be done in real time as t evolves because the state estimate x̂(t) will be used to determine the current control u(t) = κ(x̂(t)) to achieve some goal such as stabilization.

Smoothing: From the knowledge of the inputs u(0 : T), the outputs y(0 : T), the model f, h and the initial condition or boundary conditions, estimate the state x(0 : T).

Typically smoothing is done off-line after the data has been collected. It is done on a fixed interval t ∈ [0, T]. Since we are using more data, we expect the smoothed estimate to be more accurate than the filtered estimate.
3
This talk will focus on smoothing but I will make a few general
remarks about filtering.

There are both deterministic and stochastic approaches to filtering.

If the model is linear
ẋ = F x + Gu
y = Hx + J u
then the filters are relatively straightforward, but if the model is nonlinear then there can be significant difficulties.

4
Observer

This is another dynamical system driven by the input and output

ż = g(z, u, y)
x̂ = k(z, u, y)
such that the error x̃(t) = x(t) − x̂(t) goes to zero as t → ∞.
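For a linear system the construction of an observer is particularly simple. Take x̂ = z and
\[ \dot{\hat x} = F\hat x + Gu + L(y - \hat y), \qquad \hat y = H\hat x + Ju. \]
Then the error satisfies
\[ \dot{\tilde x} = (F - LH)\tilde x \]
so if the spectrum of F − LH lies in the open left half plane,
\[ \operatorname{Re}\,\sigma(F - LH) < 0, \]
the error goes to zero. When can we find such an L?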
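
The linear observer above can be tried numerically. The following is a minimal sketch, not part of the talk: the double-integrator matrices, the hand-picked gain L, the sinusoidal input and the Euler step size are all illustrative assumptions.

```python
import numpy as np

# Hypothetical example system (an assumption, not from the slides):
# a double integrator whose position is measured.
F = np.array([[0.0, 1.0], [0.0, 0.0]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
J = np.array([[0.0]])

# Gain picked by hand so that F - L H has characteristic polynomial
# s^2 + 5 s + 6, i.e. eigenvalues -2 and -3.
L = np.array([[5.0], [6.0]])


def simulate_observer(x0, xhat0, t_final=5.0, dt=1e-3):
    """Euler-integrate plant and observer; return the final error x - xhat."""
    x = np.array(x0, dtype=float).reshape(2, 1)
    xhat = np.array(xhat0, dtype=float).reshape(2, 1)
    for k in range(int(round(t_final / dt))):
        u = np.array([[np.sin(k * dt)]])      # arbitrary known input
        y = H @ x + J @ u                      # noise-free output
        yhat = H @ xhat + J @ u
        x_dot = F @ x + G @ u
        xhat_dot = F @ xhat + G @ u + L @ (y - yhat)
        x = x + dt * x_dot
        xhat = xhat + dt * xhat_dot
    return (x - xhat).ravel()


closed_loop_eigs = np.linalg.eigvals(F - L @ H)
final_error = simulate_observer(x0=[1.0, -1.0], xhat0=[0.0, 0.0])
```

The input u cancels in the error dynamics, so the decay of `final_error` depends only on the eigenvalues of F − LH, as the slide states.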
5
The linear system is observable if the largest F-invariant subspace V in the kernel of H is zero.

In other words the initial state is uniquely determined by the input and the time derivatives of the output. Of course we do not want to differentiate because it greatly accentuates any noise.

The linear system is observable iff the eigenvalues of F − LH can be set arbitrarily, up to complex conjugation, by choice of L.

If the dimension of V is r > 0 then r of the eigenvalues do not depend on L. If these r eigenvalues are already in the open left half plane then the linear system is detectable and an asymptotically convergent observer can be constructed.

If not then it is impossible to construct an asymptotically convergent observer.
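
One standard way to compute the subspace V is as the null space of the observability matrix [H; HF; …; HF^(n−1)]. The sketch below assumes that characterization; the function names and the two test systems are hypothetical and chosen only for illustration.

```python
import numpy as np


def observability_matrix(F, H):
    """Stack H, HF, ..., HF^(n-1)."""
    n = F.shape[0]
    blocks = [H]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ F)
    return np.vstack(blocks)


def unobservable_subspace(F, H, tol=1e-10):
    """Orthonormal basis (as columns) of the largest F-invariant subspace in ker H."""
    O = observability_matrix(F, H)
    _, s, vt = np.linalg.svd(O)
    rank = int(np.sum(s > tol))
    return vt[rank:].T   # rows of vt beyond the rank span the null space of O


# Observable example: double integrator with position measured.
V_obs = unobservable_subspace(np.array([[0.0, 1.0], [0.0, 0.0]]),
                              np.array([[1.0, 0.0]]))

# Unobservable example: the second mode (eigenvalue +2) never reaches the output,
# so this system is not even detectable.
V_unobs = unobservable_subspace(np.array([[-1.0, 0.0], [0.0, 2.0]]),
                                np.array([[1.0, 0.0]]))
```

In the second example V is spanned by the second coordinate axis and the hidden eigenvalue is in the right half plane, which is the case where no asymptotically convergent observer exists.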

6
I have added some references to my slides. They are hardly complete.

A basic reference on linear control and estimation is


Author Kailath, Thomas.
Title Linear systems / Thomas Kailath.
Publisher Englewood Cliffs, N.J. : Prentice-Hall, c1980.
but it is almost too complete.

Simpler introductions are


Author Anderson, Brian D. O. and J. Moore
Title Optimal control–linear quadratic methods / Brian D.O. Anderson, John B. Moore.
Publisher Englewood Cliffs, N.J. : Prentice Hall, c1990.

7
Author Anderson, Brian D. O.
Title Optimal filtering / Brian D. O. Anderson, John B. Moore.
Publisher Englewood Cliffs, N.J. : Prentice-Hall, c1979.

Author Rugh, Wilson J


Title Linear system theory / Wilson J. Rugh
Edition 2nd ed
Publisher Englewood Cliffs, N.J. : Prentice Hall, 1996

A good reference on nonlinear systems is


Author Khalil, Hassan K., 1950-
Title Nonlinear systems / Hassan K. Khalil
Edition 3rd ed
Publisher Upper Saddle River, N.J. : Prentice Hall, c2002

Observers for linear systems were initiated in


D. G. Luenberger, Observing the state of a linear system, IEEE
Trans. on Military Electronics, 8 (1964), 74-80.

8
Kalman Filtering

Assume that the linear system is affected by independent standard white Gaussian driving and observation noises w, v and that the initial condition is Gaussian,
ẋ = F x + Gu + Bw
y = Hx + J u + Dv
x(0) = x0
where D is invertible and the mean and covariance of x0 are x̂0, P0.

The goal is to compute the conditional expectation x̂(t) of x(t), conditioned on the past inputs and outputs u(0 : t), y(0 : t).

Note that the system could be time varying: F = F(t), G = G(t), B = B(t), H = H(t), J = J(t), D = D(t).

9
Kalman Filter

\[
\begin{aligned}
\dot{\hat x} &= F\hat x + Gu + L(y - \hat y) \\
\hat y &= H\hat x + Ju \\
\hat x(0) &= \hat x^0 \\
L &= P H'(DD')^{-1} \\
\dot P &= FP + PF' + BB' - PH'(DD')^{-1}HP \\
P(0) &= P^0
\end{aligned}
\]

The equation for P is called a Riccati differential equation.

For autonomous systems there is also an asymptotic version (large t) of the Kalman filter which is also autonomous; it is obtained by setting Ṗ = 0.
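
The passage from the Riccati differential equation to the asymptotic filter can be seen numerically. This is a rough sketch under assumed values: a scalar system with F = −1, B = H = D = 1 and P(0) = 1, integrated with a plain Euler scheme. For these numbers the stationary solution of Ṗ = 0 is P = √2 − 1.

```python
import numpy as np


def riccati_rhs(P, F, B, H, D):
    """Right side of P' = FP + PF' + BB' - PH'(DD')^{-1}HP."""
    R_inv = np.linalg.inv(D @ D.T)
    return F @ P + P @ F.T + B @ B.T - P @ H.T @ R_inv @ H @ P


def integrate_riccati(P0, F, B, H, D, t_final, dt):
    """Forward Euler integration of the Riccati equation (illustrative only)."""
    P = P0.copy()
    for _ in range(int(round(t_final / dt))):
        P = P + dt * riccati_rhs(P, F, B, H, D)
    return P


# Hypothetical scalar example written with 1x1 matrices.
F = np.array([[-1.0]])
B = np.array([[1.0]])
H = np.array([[1.0]])
D = np.array([[1.0]])
P0 = np.array([[1.0]])

P_final = integrate_riccati(P0, F, B, H, D, t_final=10.0, dt=1e-3)
steady_gain = P_final @ H.T @ np.linalg.inv(D @ D.T)   # L = P H'(DD')^{-1}
```

A stiff or high dimensional problem would call for a better integrator than Euler; the scheme here is only meant to show the convergence of P(t).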

10
A readable introduction to linear and nonlinear estimation is

Author Analytic Sciences Corporation. Technical Staff.


Title Applied optimal estimation. Written by: Technical Staff, Analytic Sciences Corporation.
Edited by Arthur Gelb. Principal authors: Arthur Gelb [and others]
Publisher Cambridge, Mass., M.I.T. Press [1974]

11
Least Squares or Minimum Energy Filtering

For fixed u(0 : t) the model describes a mapping
(x0, w(0 : t), v(0 : t)) ↦ y(0 : t)
where w(0 : t), v(0 : t) are in L2[0, t].

Given y(0 : t) we seek the noise triple (x0, w(0 : t), v(0 : t)) of minimum "energy" that generates it. The "energy" is defined to be
\[ \frac12 (x^0)'(P^0)^{-1}x^0 + \frac12\int_0^t |w(s)|^2 + |v(s)|^2\,ds \]
The least squares estimate x̂(t) is the endpoint of the trajectory generated by the minimizing (x̂0, ŵ(0 : t)). This estimate is the same as that of the Kalman filter.

12
Minimax Estimation or H ∞ Estimation

For fixed u(0 : t) consider the mapping from noise to estimation error
(x0, w(0 : t), v(0 : t)) ↦ x̃(0 : t)
One seeks the estimator that minimizes the L2 induced norm of this mapping.

It is difficult to solve this problem directly so usually one sets an upper bound γ for the induced norm and seeks an estimator that achieves it. As with the Kalman filter this reduces to finding a nonnegative definite solution to a Riccati differential equation. The form of the minimax estimator is similar to that of a Kalman filter.

If such an estimator can be found then γ can be lowered; if not, γ is increased.

13
There are generalizations of the above to nonlinear systems of the form
ẋ = f(x, u)
y = h(x, u)
A popular way of finding a nonlinear observer is to seek a local change of state coordinates z = θ(x) and an input-output injection β(u, y) such that the system becomes
ż = Az + β(u, y)
where the spectrum of A lies in the open left half plane, Re σ(A) < 0, for then the observer
\[ \dot{\hat z} = A\hat z + \beta(u, y), \qquad \hat x = \theta^{-1}(\hat z) \]
has linear and exponentially stable error dynamics
\[ \dot{\tilde z} = A\tilde z \]
in the transformed coordinates. This is relatively easy to accomplish if there is no input (m = 0) but generally impossible if m > 0.
14
Using change of state coordinates and input/output injection was
initiated in
A. J. Krener and A. Isidori,
Linearization by output injection and nonlinear observers,
Systems Control Lett., 3 (1983), pp. 47–52.

The method was improved in

A. J. Krener and W. Respondek,
Nonlinear observers with linearizable error dynamics,
SIAM J. Control Optim., 23 (1985), pp. 197–216.

and further improved in

N. Kazantzis and C. Kravaris, Nonlinear observer design using Lyapunov’s auxiliary theorem, Systems Control Lett., 34 (1998), pp. 241–247.

15
Other methods of constructing observers can be found in
E. A. Misawa, J. K. Hedrick, Nonlinear observers: a state of the art survey, Trans. of ASME, J. of Dynamic Systems, Measurement and Control, 111 (1989), 344-352.

See also
Title New directions in nonlinear observer design
H. Nijmeijer and T.I. Fossen (eds.)
Publisher London ; New York : Springer, c1999

16
Nonlinear Stochastic Filtering

We add driving and observation noises to the nonlinear model, which now must be written as a pair of Ito stochastic differential equations
dx = f(x, u) dt + B dw
dy = h(x, u) dt + dv
x(0) = x0
Let p0(x) be the known density of x0.
The unnormalized conditional density q(x, t) of x(t) given the model and past inputs and outputs satisfies the Zakai stochastic partial differential equation (summation convention)
\[ dq = -\frac{\partial (q f_i)}{\partial x_i}\,dt + (BB')_{ij}\frac{\partial^2 q}{\partial x_i\,\partial x_j}\,dt + q\,h_i\,dy_i \]
\[ q(x, 0) = p^0(x) \]

17
The Zakai equation is a stochastic PDE in the Ito sense driven by the observations. It has to be solved in its Stratonovich form.

It is very difficult to solve if n > 1. It can’t be solved implicitly and it is parabolic, so if the spatial step is small, the temporal step must be extremely small.

The function x ↦ q(x, t) can be thought of as the state at time t of an infinite dimensional observer with inputs u, y and output
\[ \hat x(t) = \frac{\int x\,q(x,t)\,dx}{\int q(x,t)\,dx} \]
or
\[ \hat x(t) = \operatorname{argmax}_x\, q(x,t) \]

Many nonlinear estimators are infinite dimensional and one is forced to seek a finite dimensional approximation.
18
A relatively simple derivation of the Zakai equation can be found
in
M. H. A. Davis and S. I. Marcus,
An introduction to nonlinear filtering,
in Stochastic Systems: The Mathematics of Filtering and Identification and Applications,
M. Hazewinkel and J. C. Willems, (eds.),
D. Reidel Publishing, Dordrecht, (1981), 53-76.

19
Monte Carlo Filtering, Particle Filtering

These are discrete time filters based on the approximation of p(x, t) by point masses,
\[ p(x,t) \approx \sum_k \alpha_k(t)\,\delta(x - x_k(t)) \]

Sample p(x, t), then use the noisy system and the Bayes formula to compute α_k(t + 1), x_k(t + 1).

There are several different implementations of this basic philosophy, including replacing the point masses with Gaussians.
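
One common implementation of this idea is the bootstrap particle filter with resampling at every step. The sketch below is only an illustration: the scalar model x(t+1) = a x(t) + w, y = x + v, the noise levels, the particle count and the random seed are all assumptions made for the example.

```python
import numpy as np


def bootstrap_particle_filter(ys, n_particles, a, q_std, r_std, rng):
    """Bootstrap particle filter for x(t+1) = a x(t) + w, y(t) = x(t) + v."""
    particles = rng.normal(0.0, 1.0, size=n_particles)   # sample the prior density
    estimates = np.empty(len(ys))
    for k, y in enumerate(ys):
        # Propagate each point mass through the noisy dynamics.
        particles = a * particles + rng.normal(0.0, q_std, size=n_particles)
        # Bayes formula: reweight by the Gaussian measurement likelihood.
        log_w = -0.5 * ((y - particles) / r_std) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        estimates[k] = np.sum(weights * particles)        # conditional mean estimate
        # Resample so that all point masses carry equal weight again.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return estimates


# Simulated data from the assumed model.
rng = np.random.default_rng(0)
a, q_std, r_std, n_steps = 0.9, 0.5, 0.5, 200
x = rng.normal()
xs, ys = np.empty(n_steps), np.empty(n_steps)
for k in range(n_steps):
    x = a * x + q_std * rng.normal()
    xs[k] = x
    ys[k] = x + r_std * rng.normal()

estimates = bootstrap_particle_filter(ys, 2000, a, q_std, r_std, rng)
rmse = np.sqrt(np.mean((estimates - xs) ** 2))
```

Because the assumed model is linear and Gaussian, a Kalman filter would give the exact answer here; the linear model is used only so that the particle estimate is easy to sanity check.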

20
Nonlinear Minimum Energy Filtering

Some of the technical difficulties associated with stochastic nonlinear models can be avoided by assuming the noises are unknown L2 functions and the initial condition is also unknown
ẋ = f(x, u) + Bw
y = h(x, u) + v
x(0) = x0.

Consistent with u(0 : t), y(0 : t) we seek to minimize
\[ \frac12 |x^0|^2 + \frac12\int_0^t |w(s)|^2 + |v(s)|^2\,ds \]
The least squares estimate x̂(t) is the endpoint of the trajectory generated by the minimizing (x̂0, ŵ(0 : t)). This estimate is generally not the same as that of the stochastic nonlinear filter.

21
To find this estimate we must solve in real time a partial differential equation of Hamilton-Jacobi-Bellman type driven by u(t), y(t) (summation convention),
\[ \frac{\partial Q}{\partial t} = -\frac{\partial Q}{\partial x_i}f_i - \frac12\frac{\partial Q}{\partial x_i}\frac{\partial Q}{\partial x_i} + \frac12 (y_i - h_i)(y_i - h_i) \]
\[ \hat x(t) = \operatorname{argmin}_x\, Q(x,t) \]
This is nearly impossible if n > 1.

There is no guarantee that a smooth solution exists so we must allow solutions in the viscosity sense. We have replaced the technical difficulties of stochastic nonlinear filtering with a different set of technical difficulties.

Similar remarks hold for nonlinear minimax filtering except now the PDE is of Hamilton-Jacobi-Isaacs type so it is even harder.

For these reasons most nonlinear filtering is done by approximation. The workhorse is the extended Kalman filter.
22
Nonlinear Minimum Energy Estimation was initiated in

Mortensen RE (1968) J. Optimization Theory and Applications, 2:386–394

See also

Hijab O (1980) Minimum Energy Estimation. PhD Thesis, University of California, Berkeley, California

Hijab O (1984), Annals of Probability, 12:890–902

23
Extended Kalman Filtering (EKF)

Add white Gaussian noises (WGNs) to the nonlinear model
ẋ = f(x, u) + Bw
y = h(x, u) + Dv
and linearize around the estimated trajectory,
\[ F(t) = \frac{\partial f}{\partial x}(\hat x(t), u(t)), \qquad H(t) = \frac{\partial h}{\partial x}(\hat x(t), u(t)) \]
Then build a Kalman filter for the linear model and implement it on the nonlinear model.
\[
\begin{aligned}
\dot{\hat x} &= f(\hat x, u) - L(t)\,(y - h(\hat x, u)) \\
L(t) &= -P(t)H(t)'(DD')^{-1} \\
\dot P &= F(t)P(t) + P(t)F(t)' + BB' - P(t)H(t)'(DD')^{-1}H(t)P(t)
\end{aligned}
\]
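
A small numerical experiment shows the EKF acting as an observer. This is a sketch under assumptions that are not in the talk: the scalar model f(x, u) = −x³ + u, h(x, u) = x, B = D = 1, the input u = cos t, noise-free measurements, and Euler integration. The two minus signs in the filter equations cancel, so the code applies the net gain P H′(DD′)⁻¹ with a plus sign.

```python
import numpy as np


# Hypothetical scalar model, chosen only for illustration.
def f(x, u):
    return -x**3 + u


def h(x, u):
    return x


def df_dx(x, u):
    return -3.0 * x**2


def dh_dx(x, u):
    return 1.0


B, D = 1.0, 1.0


def run_ekf(x0, xhat0, P0, t_final=15.0, dt=1e-3):
    """Euler-integrate the true state, the EKF estimate and the Riccati equation."""
    x, xhat, P = x0, xhat0, P0
    for k in range(int(round(t_final / dt))):
        u = np.cos(k * dt)
        y = h(x, u)                                  # noise-free measurement
        F = df_dx(xhat, u)                           # linearize about the estimate
        H = dh_dx(xhat, u)
        K = P * H / (D * D)                          # net gain P H' (DD')^{-1}
        xhat_dot = f(xhat, u) + K * (y - h(xhat, u))
        P_dot = 2.0 * F * P + B * B - P * H / (D * D) * H * P
        x = x + dt * f(x, u)
        xhat = xhat + dt * xhat_dot
        P = P + dt * P_dot
    return x, xhat, P


x_end, xhat_end, P_end = run_ekf(x0=1.0, xhat0=0.0, P0=1.0)
```

With noisy data the estimate would fluctuate about the true state instead of converging to it; the noise-free run only isolates the observer behaviour.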

24
Interpretation
x(t) ≈ N(x̂(t), P(t))

The EKF is the most widely used nonlinear filter.

Like the Kalman filter it can also be derived nonstochastically from the minimum energy filter.

It generally performs well but can diverge.

Recently it has been shown that when viewed as an observer it is locally convergent for a broad class of nonlinear systems in both continuous and discrete time.

25
The local convergence of the EKF in discrete time is demonstrated in

Song and Grizzle, Proceedings of the American Control Conference, 1992, pp. 3365-3369.

and in continuous time in

Krener AJ (2002) The convergence of the extended Kalman filter.
In: Rantzer A, Byrnes CI (eds)
Directions in Mathematical Systems Theory and Optimization, 173–182.
Springer, Berlin Heidelberg New York,
also at http://arxiv.org/abs/math.OC/0212255

26
Unscented Kalman Filter

Like the particle filters this is a discrete time filter.

At each time it assumes that the conditional density is approximately N(x̂(t), P(t)).

Loosely speaking one computes an orthonormal frame that is P^{1/2}(t) and propagates the noisy dynamics from x̂(t) and from x̂(t) plus and minus a positive multiple of the vectors of the frame. This deterministic sample of 2n + 1 points is updated by the current observation using Bayes rule and a mean and covariance x̂(t + 1), P(t + 1) is computed.

The computational burden is about the same as that of the EKF but the UKF is second order accurate rather than first order accurate. It has the same drawback as the EKF, namely that it is a local method in that it approximates the conditional density by a Gaussian.
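
The deterministic sample of 2n + 1 points can be sketched as follows. The Cholesky factor as the choice of square root, the scaling parameter `kappa` and the weights are assumptions taken from one common UKF variant, not from the talk.

```python
import numpy as np


def sigma_points(xhat, P, kappa=1.0):
    """Return 2n+1 sigma points (rows) and weights matching mean xhat, covariance P."""
    n = xhat.size
    S = np.linalg.cholesky(P)              # one square root of P: S @ S.T == P
    scale = np.sqrt(n + kappa)             # the "positive multiple" of the frame
    points = [xhat]
    for i in range(n):
        points.append(xhat + scale * S[:, i])
        points.append(xhat - scale * S[:, i])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return np.array(points), weights


# Hypothetical mean and covariance.
xhat = np.array([1.0, -2.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])

points, weights = sigma_points(xhat, P)
mean = weights @ points
centered = points - mean
cov = (weights[:, None] * centered).T @ centered
```

The weighted sample reproduces x̂ and P exactly before propagation; a full UKF step would push each point through the dynamics and then recompute the mean and covariance in the same way.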
27
The UKF was introduced in

Julier et al. Proceedings of the American Control Conference,


1995, pp. 1628-1632

28
Smoothing of Linear Initial Value Models

ẋ = F x + Gu + Bw
y = Hx + J u + Dv
x(0) ∼ N(x̂0, P0)
t ∈ [0, T]

Estimate x(0 : T) from the model and u(0 : T), y(0 : T).

29
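Two Kalman Filters

Forward filter:
\[
\begin{aligned}
\dot{\hat x}_f &= F\hat x_f + Gu + L_f(y - \hat y_f) \\
\hat y_f &= H\hat x_f + Ju \\
\hat x_f(0) &= \hat x^0 \\
L_f &= P_f H'(DD')^{-1} \\
\dot P_f &= FP_f + P_fF' + BB' - P_fH'(DD')^{-1}HP_f \\
P_f(0) &= P^0
\end{aligned}
\]

Backward filter:
\[
\begin{aligned}
\dot{\hat x}_b &= F\hat x_b + Gu - L_b(y - \hat y_b) \\
\hat y_b &= H\hat x_b + Ju \\
\hat x_b(T) &= \ ? \\
L_b &= P_b H'(DD')^{-1} \\
\dot P_b &= FP_b + P_bF' - BB' + P_bH'(DD')^{-1}HP_b \\
P_b(T) &= \ ?
\end{aligned}
\]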

30
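The two technical problems (the unknown terminal values of the backward filter) can be solved by letting
\[ Q(t) = P_b^{-1}(t), \qquad z(t) = Q(t)\hat x_b(t) \]
then
\[
\begin{aligned}
\dot z &= (-F' + QBB')z + (QG + H'(DD')^{-1}J)u - H'(DD')^{-1}y \\
z(T) &= 0 \\
\dot Q &= -QF - F'Q + QBB'Q - H'(DD')^{-1}H \\
Q(T) &= 0
\end{aligned}
\]
Then the smoothed estimate x̂(t) and its error covariance P(t) are given by combining the independent estimates
\[
\begin{aligned}
\hat x(t) &= P(t)\left(P_f^{-1}(t)\hat x_f(t) + P_b^{-1}(t)\hat x_b(t)\right) \\
&= P(t)\left(P_f^{-1}(t)\hat x_f(t) + z(t)\right) \\
P^{-1}(t) &= P_f^{-1}(t) + P_b^{-1}(t) \\
&= P_f^{-1}(t) + Q(t)
\end{aligned}
\]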
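
The covariance part of the two-filter smoother is easy to check numerically. The following sketch makes several assumptions for illustration: a scalar system with F = −1 and B = H = D = 1, P(0) = 1, T = 10 and Euler integration. It propagates only the covariances, not the estimates x̂_f and z.

```python
import numpy as np

# Hypothetical scalar model and horizon.
F, B, H, D = -1.0, 1.0, 1.0, 1.0
P0, T, dt = 1.0, 10.0, 1e-3
n = int(round(T / dt))

# Forward Riccati equation for P_f, integrated from t = 0.
Pf = np.empty(n + 1)
Pf[0] = P0
for k in range(n):
    Pf[k + 1] = Pf[k] + dt * (2.0 * F * Pf[k] + B * B - (Pf[k] * H) ** 2 / (D * D))

# Backward information equation for Q = P_b^{-1}, integrated from Q(T) = 0.
Q = np.empty(n + 1)
Q[n] = 0.0
for k in range(n, 0, -1):
    Q_dot = -2.0 * F * Q[k] + (Q[k] * B) ** 2 - H * H / (D * D)
    Q[k - 1] = Q[k] - dt * Q_dot

# Smoothed error covariance: P^{-1} = P_f^{-1} + Q.
P_smooth = 1.0 / (1.0 / Pf + Q)
```

At t = T the backward information Q vanishes, so the smoothed and filtered covariances agree there; in the interior the smoothed covariance is strictly smaller, which matches the remark that smoothing uses more data.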

31
The same smoothed estimate can be obtained by a least squares argument. Suppose w(0 : T), v(0 : T) are unknown L2[0, T] functions and x0 is an unknown initial condition. We seek to minimize
\[ (x^0)'(P^0)^{-1}x^0 + \int_0^T |w|^2 + |v|^2\,dt \]
consistent with u(0 : T), y(0 : T) and the model
ẋ = F x + Gu + Bw
y = Hx + J u + Dv
x(0) = x0.

Let ŵ(0 : T), v̂(0 : T), x̂0 be the minimizer; then the smoothed estimate x̂(t) is the solution of
\[ \dot{\hat x} = F\hat x + Gu + B\hat w, \qquad \hat x(0) = \hat x^0. \]

32
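For fixed u(0 : T), y(0 : T) define the Hamiltonian
\[ \mathcal{H}(\lambda, x, w) = \lambda'(Fx + Gu + Bw) + \frac12\left(|w|^2 + |v|^2\right), \qquad v = D^{-1}(y - Hx - Ju) \]
If x̂(t), ŵ(0 : T), v̂(0 : T), x̂0 are minimizing then there exists λ̂(t) such that the Pontryagin Minimum Principle holds.
\[
\begin{aligned}
\dot{\hat x} &= \left(\frac{\partial \mathcal{H}}{\partial \lambda}\right)'(\hat\lambda, \hat x, \hat w) \\
\dot{\hat\lambda} &= -\left(\frac{\partial \mathcal{H}}{\partial x}\right)'(\hat\lambda, \hat x, \hat w) \\
\hat w &= \operatorname{argmin}_w\, \mathcal{H}(\hat\lambda, \hat x, w) \\
\hat x(0) &= P^0\hat\lambda(0) \\
0 &= \hat\lambda(T)
\end{aligned}
\]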

33
This yields the two point boundary value problem
\[
\begin{bmatrix} \dot{\hat x} \\ \dot{\hat\lambda} \end{bmatrix}
=
\begin{bmatrix} F & -BB' \\ -H'(DD')^{-1}H & -F' \end{bmatrix}
\begin{bmatrix} \hat x \\ \hat\lambda \end{bmatrix}
+
\begin{bmatrix} G & 0 \\ -H'(DD')^{-1}J & H'(DD')^{-1} \end{bmatrix}
\begin{bmatrix} u \\ y \end{bmatrix}
\]
\[ \hat x(0) = P^0\hat\lambda(0), \qquad 0 = \hat\lambda(T) \]
There are numerous ways of solving this problem; perhaps the simplest is to define
\[ \hat\mu(t) = \hat\lambda(t) - M(t)\hat x(t) \]
where
\[ \dot M = -MF - F'M - H'(DD')^{-1}H + MBB'M, \qquad M(T) = 0 \]

34
This transformation triangularizes the dynamics
\[
\begin{bmatrix} \dot{\hat x} \\ \dot{\hat\mu} \end{bmatrix}
=
\begin{bmatrix} F - BB'M & -BB' \\ 0 & -F' + MBB' \end{bmatrix}
\begin{bmatrix} \hat x \\ \hat\mu \end{bmatrix}
+
\begin{bmatrix} G & 0 \\ -MG - H'(DD')^{-1}J & H'(DD')^{-1} \end{bmatrix}
\begin{bmatrix} u \\ y \end{bmatrix}
\]
\[ \left(I + P^0M(0)\right)\hat x(0) = P^0\hat\mu(0), \qquad 0 = \hat\mu(T) \]

The triangularized dynamics can be solved by integrating µ̂(t) backward from µ̂(T) and then integrating x̂(t) forward from x̂(0).

35
Stochastic Interpretation of the Variational Equations
(For simplicity G = 0, J = 0)

The model
ẋ = F x + Bw
y = Hx + Dv
x(0) = x0.
defines a linear map
\[ T_1 : \left(x^0, w(0:T), v(0:T)\right) \mapsto y(0:T) \]

36
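The complementary model
\[
\begin{aligned}
\dot\lambda &= -F'\lambda + H'(D')^{-1}v \\
\psi &= -B'\lambda + w \\
\lambda(T) &= 0 \\
\xi &= x^0 - P^0\lambda(0)
\end{aligned}
\]
defines a map
\[ T_2 : \left(x^0, w(0:T), v(0:T)\right) \mapsto \left(\xi, \psi(0:T)\right) \]
such that the combined map
\[ T = T_1 \times T_2 : \left(x^0, w(0:T), v(0:T)\right) \mapsto \left(y(0:T), \xi, \psi(0:T)\right) \]
is invertible and the ranges of T1, T2 are independent
\[ E\left(y(t)\xi'\right) = 0, \qquad E\left(y(t)\psi'(s)\right) = 0 \]
Hence
\[ \left(\hat x^0, \hat w(0:T), \hat v(0:T)\right) = T^{-1}\left(y(0:T), 0, 0\right) \]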

37
For more on linear smoothing, boundary value models and complementary models see

Weinert H L (2001) Fixed interval smoothing for state space models.
Kluwer Academic Publishers, Norwell MA

and its extensive references.

38
In filtering it is reasonable to assume that there is a priori information about the initial condition but no a priori information about the terminal condition. But this is not reasonable in smoothing.

A better model for the smoothing problem is the two point boundary value problem
\[
\begin{aligned}
\dot x &= Ax + Bw \\
y &= Cx + Dv \\
b &= V^0x(0) + V^Tx(T)
\end{aligned}
\]
where as before w, v are standard white Gaussian noises and b is an independent Gaussian vector.

What is the meaning of such a model? What kind of process is x(t)?

More generally we might consider multipoint boundary value processes,
\[ b = V^0x(t_0) + V^1x(t_1) + \cdots + V^kx(t_k) \]

39
The two point boundary process is well-posed if
\[ W = V^0 + V^T\Phi(T, 0) \]
is nonsingular where
\[ \frac{\partial\Phi}{\partial t}(t, s) = A(t)\Phi(t, s), \qquad \Phi(s, s) = I \]
Then there is a Green’s matrix
\[
\Gamma(t, s) =
\begin{cases}
\Phi(t, 0)W^{-1}V^0\Phi(0, s) & t > s \\
-\Phi(t, 0)W^{-1}V^T\Phi(T, s) & t < s
\end{cases}
\]
and
\[ x(t) = \Phi(t, 0)W^{-1}b + \int_0^T \Gamma(t, s)B(s)w(s)\,ds \]
where the integral is in the Wiener sense (integration by parts).

This in general is not a Markov process. It is a reciprocal process in the sense of S. Bernstein.
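
The well-posedness condition and the Green's matrix can be checked on a small example. The sketch below is an illustration under assumptions: a double integrator (A nilpotent, so Φ is known in closed form) with the position pinned at both ends, and Γ written with the factors V⁰ and Vᵀ that follow from eliminating x(0) in the boundary condition.

```python
import numpy as np

T = 2.0


def Phi(t, s):
    """Transition matrix of the double integrator A = [[0, 1], [0, 0]]."""
    return np.array([[1.0, t - s], [0.0, 1.0]])


# Hypothetical boundary data: b = (x1(0), x1(T)), i.e. position known at both ends.
V0 = np.array([[1.0, 0.0], [0.0, 0.0]])
VT = np.array([[0.0, 0.0], [1.0, 0.0]])

W = V0 + VT @ Phi(T, 0.0)
W_inv = np.linalg.inv(W)           # exists because det W = T is nonzero


def Gamma(t, s):
    """Green's matrix of the two point boundary value problem."""
    if t > s:
        return Phi(t, 0.0) @ W_inv @ V0 @ Phi(0.0, s)
    return -Phi(t, 0.0) @ W_inv @ VT @ Phi(T, s)


s = 0.7
eps = 1e-9
jump = Gamma(s + eps, s) - Gamma(s - eps, s)           # should be the identity
boundary = V0 @ Gamma(0.0, s) + VT @ Gamma(T, s)       # should vanish
```

The unit jump across t = s and the homogeneous boundary condition are the two defining properties of a Green's matrix, so both checks together confirm the formula for this example.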

40
A process on [0, T] is reciprocal if, conditioned on x(t1), x(t2) where 0 ≤ t1 < t2 ≤ T, what happens on (t1, t2) is independent of what happens on [0, t1) ∪ (t2, T].

In other words a reciprocal process is a Markov random field on the interval [0, T].

Every Markov process is reciprocal but not vice versa.

The covariance R(t, s) of a reciprocal process with full noise
\[ Q(t) = \frac{\partial R}{\partial t}(t, t^+) - \frac{\partial R}{\partial t}(t^+, t) > 0 \]
satisfies a second order, self-adjoint differential equation
\[ \frac{\partial^2 R}{\partial t^2}(t, s) = F(t)R(t, s) + G(t)\frac{\partial R}{\partial t}(t, s) - Q(t)\delta(t - s) \]
If Q(t) is not positive definite then R(t, s) satisfies a higher order, self-adjoint differential equation.

41
For x defined by the linear boundary value model above
\[
\begin{aligned}
Q &= BB' \\
GQ &= AQ - QA' + \frac{dQ}{dt} \\
F &= \frac{dA}{dt} + A^2 - GA
\end{aligned}
\]

Any reciprocal process with full noise is the solution of a second order, stochastic boundary value problem
\[ -d^2x(t) + F(t)x(t)\,dt^2 + G(t)\,dx(t)\,dt = Q(t)\xi(t)\,dt^2 \]
\[ x(0) = x^0, \qquad x(T) = x^T \]

To leading order ξ(t)dt² = d²w(t) + … where w(t) is a standard Wiener process.

This is not an Ito stochastic differential equation!
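
As a quick worked instance of the coefficient formulas, consider the scalar case with constant coefficients. The symbols a and b below are introduced only for this illustration, with A = a and B = b ≠ 0.

```latex
\begin{aligned}
Q &= BB' = b^2, \\
GQ &= AQ - QA' + \frac{dQ}{dt} = ab^2 - b^2a + 0 = 0
   \quad\Longrightarrow\quad G = 0, \\
F &= \frac{dA}{dt} + A^2 - GA = 0 + a^2 - 0 = a^2, \\
&\text{so the second order model reads}\quad
-d^2x(t) + a^2x(t)\,dt^2 = b^2\,\xi(t)\,dt^2 .
\end{aligned}
```

In this case the first order term drops out and only the squared drift coefficient survives in the second order equation.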

42
The quick and dirty way to derive this equation is to note that if
\[ d^+x = Ax\,dt + B\,d^+w \]
where d⁺ denotes the forward differential in the Ito sense then
\[ B^{-1}\left(d^+x - Ax\,dt\right) = d^+w \]
We apply the adjoint of the operator on the left to obtain
\[ \left(-d^- + A'\,dt\right)(B')^{-1}B^{-1}\left(d^+x - Ax\,dt\right) = \left(-d^- + A'\,dt\right)(B')^{-1}d^+w \]
which yields
\[ -d^2x + Fx\,dt^2 + G\,dx\,dt = Q\,d^2w + \dots \]

43
For more on reciprocal processes see

Krener A J, Frezza R, Levy B C (1991)
Gaussian reciprocal processes and self-adjoint stochastic differential equations of second order.
Stochastics 34:29-56.

Krener, A. J., Reciprocal diffusions in flat space,
Probability Theory and Related Fields (1997) pp. 243-281

44
Suppose that we have observations of the reciprocal process
dy = Cx dt + D dv
where v is a standard Wiener process and D is invertible.

If the boundary conditions are known to be zero, x0 = xT = 0, then the optimal smoothed estimate x̂(t) satisfies
\[ -d^2\hat x(t) + F(t)\hat x(t)\,dt^2 + G(t)\,d\hat x(t)\,dt = C'(t)(DD')^{-1}(t)\left(dy\,dt - C(t)\hat x(t)\,dt^2\right) \]
\[ \hat x(0) = 0, \qquad \hat x(T) = 0 \]

The formula is a little more complicated if x0 ≠ 0, xT ≠ 0.

The error process x̃(t) = x(t) − x̂(t) is also reciprocal.

45
The reciprocal smoother can be found in

Frezza R (1990)
Models of higher order and mixed order Gaussian reciprocal processes with applications to the smoothing problem.
PhD Thesis, University of California, Davis, CA

46
Deterministic Smoothing of Linear Boundary Processes

\[
\begin{aligned}
\dot x &= Fx + Gu + Bw \\
y &= Hx + Ju + Dv \\
b &= V^0x(0) + V^Tx(T)
\end{aligned}
\]

As before we view w(0 : T), v(0 : T) as unknown L2 functions and b ∈ ℝ^k as an unknown boundary condition.

For given u(0 : T), y(0 : T) we seek to minimize
\[ \frac12\left(|b|^2 + \int_0^T |w|^2 + |v|^2\,dt\right) \]
Define the Hamiltonian
\[ \mathcal{H} = \lambda'(Fx + Gu + Bw) + \frac12\left(|w|^2 + |v|^2\right), \qquad v = D^{-1}(y - Hx - Ju) \]

47
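If x̂(t), ŵ(0 : T), v̂(0 : T), b̂ are minimizing then there exists λ̂(t) such that the Pontryagin Minimum Principle holds.
\[
\begin{aligned}
\dot{\hat x} &= \left(\frac{\partial \mathcal{H}}{\partial \lambda}\right)'(\hat\lambda, \hat x, u, y, \hat w) \\
\dot{\hat\lambda} &= -\left(\frac{\partial \mathcal{H}}{\partial x}\right)'(\hat\lambda, \hat x, u, y, \hat w) \\
\hat w &= \operatorname{argmin}_w\, \mathcal{H}(\hat\lambda, \hat x, u, y, w) \\
\hat b &= V^0\hat x(0) + V^T\hat x(T) \\
\hat\lambda(0) &= (V^0)'\hat b \\
\hat\lambda(T) &= (V^T)'\hat b
\end{aligned}
\]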

48
This yields the two point boundary value problem
\[
\begin{bmatrix} \dot{\hat x} \\ \dot{\hat\lambda} \end{bmatrix}
=
\begin{bmatrix} F & -BB' \\ -H'(DD')^{-1}H & -F' \end{bmatrix}
\begin{bmatrix} \hat x \\ \hat\lambda \end{bmatrix}
+
\begin{bmatrix} G & 0 \\ -H'(DD')^{-1}J & H'(DD')^{-1} \end{bmatrix}
\begin{bmatrix} u \\ y \end{bmatrix}
\]
\[
\begin{aligned}
\hat\lambda(0) &= (V^0)'\left(V^0\hat x(0) + V^T\hat x(T)\right) \\
\hat\lambda(T) &= (V^T)'\left(V^0\hat x(0) + V^T\hat x(T)\right)
\end{aligned}
\]

In contrast to the stochastic approach, we do not have to assume that the model is well-posed to pursue the deterministic approach.

Under reasonable assumptions (controllability and observability) the variational equations will be well-posed even if the model is not.

49
In the initial value formulation, one of the boundary conditions of the variational equations was particularly simple
λ(T) = 0
This allowed us to triangularize the dynamics and solve by a backward sweep of λ̂(t) and a forward sweep of x̂(t).

In the general boundary value formulation, the boundary conditions of the variational equations are fully coupled so we cannot solve by two simple sweeps. This is a big problem if the dimension n of x is large.

50
Another advantage of the deterministic approach is that it readily
generalizes to nonlinear boundary value models of the form
ẋ = f (t, x, u) + g(t, x)w
y = h(t, x, u) + v
b = β(x(0), x(T ))

If we try to formulate this stochastically, e.g. w, v independent white Gaussian noises and b an independent random k vector, we cannot give meaning to the equations even in an Ito sense.

But for fixed u(0 : T), y(0 : T) we can seek to minimize
\[ \frac12\left(|b|^2 + \int_0^T |w|^2 + |v|^2\,dt\right) \]
2

51
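The Pontryagin minimum principle yields the first order necessary conditions that must be satisfied by an optimal solution. The Hamiltonian is
\[ \mathcal{H} = \lambda'\left(f(t, x, u(t)) + g(t, x)w\right) + \frac12|w|^2 + \frac12\left|y(t) - h(t, x, u(t))\right|^2 \]
and
\[
\begin{aligned}
\dot{\hat x}(t) &= \left(\frac{\partial \mathcal{H}}{\partial \lambda}\right)'(t, \hat\lambda(t), \hat x(t), u(t), y(t), \hat w(t)) \\
\dot{\hat\lambda}(t) &= -\left(\frac{\partial \mathcal{H}}{\partial x}\right)'(t, \hat\lambda(t), \hat x(t), u(t), y(t), \hat w(t)) \\
\hat w(t) &= \operatorname{argmin}_w\, \mathcal{H}(t, \hat\lambda(t), \hat x(t), u(t), y(t), w) \\
\hat\lambda(0) &= \left(\frac{\partial b}{\partial x^0}(\hat x(0), \hat x(T))\right)' b(\hat x(0), \hat x(T)) \\
\hat\lambda(T) &= \left(\frac{\partial b}{\partial x^T}(\hat x(0), \hat x(T))\right)' b(\hat x(0), \hat x(T))
\end{aligned}
\]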
52
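Then
\[ \hat w(t) = -g'(t, \hat x(t))\hat\lambda(t) \]
and so
\[
\begin{aligned}
\dot{\hat x}(t) &= f(t, \hat x(t), u(t)) - g(t, \hat x(t))g'(t, \hat x(t))\hat\lambda(t) \\
\dot{\hat\lambda}(t) &= -\left(\frac{\partial f}{\partial x}(t, \hat x(t), u(t)) - \frac{\partial g}{\partial x}(t, \hat x(t))g'(t, \hat x(t))\hat\lambda(t)\right)'\hat\lambda(t) \\
&\quad + \left(\frac{\partial h}{\partial x}(t, \hat x(t), u(t))\right)'\left(y(t) - h(t, \hat x(t), u(t))\right) \\
\hat\lambda(0) &= \left(\frac{\partial b}{\partial x^0}(\hat x(0), \hat x(T))\right)' b(\hat x(0), \hat x(T)) \\
\hat\lambda(T) &= \left(\frac{\partial b}{\partial x^T}(\hat x(0), \hat x(T))\right)' b(\hat x(0), \hat x(T)).
\end{aligned}
\]
This is a nonlinear two point boundary value problem in 2n variables that we can try to solve by direct methods, shooting methods or iterative methods. We shall use an iterative method that takes advantage of the variational nature of the problem.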
53
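We use gradient descent. First solve the initial value problem,
\[ \dot x = f(t, x, u) + g(t, x(t))w, \qquad x(0) = x^0 \]
and compute the cost
\[ \pi(x^0, w(0:T)) = \frac12\left(|b|^2 + \int_0^T |w|^2 + |v|^2\,dt\right) \]
Then solve the final value problem for µ(s),
\[
\begin{aligned}
\frac{d}{ds}\mu(s) &= -F'(s)\mu(s) + H'(s)\left(y(s) - h(s, x(s), u(s))\right) \\
\mu(T) &= \left(\frac{\partial b}{\partial x^T}(x(0), x(T))\right)' b(x(0), x(T))
\end{aligned}
\]
where
\[
F(t) = \frac{\partial f}{\partial x}(t, x(t), u(t)), \qquad
G(t) = g(t, x(t)), \qquad
H(t) = \frac{\partial h}{\partial x}(t, x(t), u(t))
\]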
54
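The first variation of the cost due to the changes δx0, δw(0 : T) is given by
\[
\begin{aligned}
\pi(x^0 + \delta x^0, w + \delta w) &= \pi(x^0, w) + \left(\beta'\frac{\partial b}{\partial x^0}(x(0), x(T)) + \mu'(0)\right)\delta x^0 \\
&\quad + \int_0^T \left(w'(s) + \mu'(s)G(s)\right)\delta w(s)\,ds + O(\delta x^0, \delta w)^2
\end{aligned}
\]
where β = b(x(0), x(T)).
Choose a step size ε and define
\[
\begin{aligned}
\delta x^0 &= -\epsilon\left(\left(\frac{\partial b}{\partial x^0}(x(0), x(T))\right)'\beta + \mu(0)\right) \\
\delta w(s) &= -\epsilon\left(w(s) + G'(s)\mu(s)\right)
\end{aligned}
\]

Replace x0, w(0 : T) by x0 + δx0, w(0 : T) + δw(0 : T) and repeat.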
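
The iteration can be sketched on a small linear problem. Everything specific below is an assumption made for illustration: the scalar model ẋ = −x + w, y = x + v, the boundary function b = x(0), the synthetic data y = cos t, an Euler discretization, and a discrete adjoint recursion standing in for the final value problem for µ.

```python
import numpy as np


def cost_and_gradient(x0, w, y, f, dt):
    """Cost pi(x0, w) and its gradient for x' = f x + w, y = x + v, b = x(0)."""
    n = w.size
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):                       # forward sweep: initial value problem
        x[k + 1] = x[k] + dt * (f * x[k] + w[k])
    v = y - x[:n]                            # residual y - h(x) with h(x) = x
    b = x[0]                                 # boundary term
    cost = 0.5 * (b**2 + dt * np.sum(w**2 + v**2))
    mu = np.zeros(n + 1)                     # mu(T) = 0: b does not depend on x(T)
    for k in range(n - 1, -1, -1):           # backward sweep: discrete adjoint
        mu[k] = (1.0 + dt * f) * mu[k + 1] - dt * v[k]
    grad_x0 = b + mu[0]                      # (db/dx0)' b + mu(0)
    grad_w = w + mu[1:]                      # w(s) + G'(s) mu(s) with G = 1
    return cost, grad_x0, grad_w


T, dt, f = 5.0, 0.01, -1.0
n = int(round(T / dt))
y = np.cos(dt * np.arange(n))                # synthetic observations
x0, w, eps = 0.0, np.zeros(n), 0.2           # initial guess and step size

costs = []
for _ in range(300):
    cost, g0, gw = cost_and_gradient(x0, w, y, f, dt)
    costs.append(cost)
    x0 -= eps * g0                           # x0 <- x0 + delta x0
    w = w - eps * gw                         # w  <- w  + delta w

final_cost, g0, gw = cost_and_gradient(x0, w, y, f, dt)
grad_norm = np.sqrt(g0**2 + dt * np.sum(gw**2))
```

For this linear example the cost is a convex quadratic, so a small enough fixed step decreases it at every iteration; a nonlinear model would need a line search or another safeguard on the step size.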

55
The nonlinear smoother can be found in

Krener, A. J., Least Squares Smoothing of Nonlinear Systems,


preprint, [email protected]

56
Conclusions

We have briefly surveyed methods for filtering and smoothing data from dynamic models. For many applications the relevant problem is smoothing a nonlinear dynamic model that involves two point boundary data (or multipoint data).

Well-posed boundary value linear models driven by white Gaussian noise lead to reciprocal processes.

A combination of boundary information and nonlinearities rules out a stochastic formulation of the smoothing problem. A deterministic (least squares) formulation leads to a two point boundary problem that is well-posed for controllable and observable linear models.

We have presented a gradient descent algorithm for solving linear and nonlinear problems. It will converge for linear models. Its convergence for nonlinear models is an open question.
57
