Arthur J. Krener
Department of Mathematics
University of California
Davis, CA, 95616, USA
1
Dynamic Models
2
In the controls community there are two important problems.
Typically smoothing is done off-line after the data has been collected. It is done on a fixed interval t ∈ [0, T ]. Since we are using more data, we expect the smoothed estimate to be more accurate than the filtered estimate.
3
This talk will focus on smoothing but I will make a few general
remarks about filtering.
4
Observer
An observer is a dynamical system driven by the input u and the measured output y,
ż = g(z, u, y)
x̂ = k(z, u, y)
such that the error x̃(t) = x(t) − x̂(t) goes to zero as t → ∞.
For a linear system the construction of an observer is particularly simple, x̂ = z,
x̂˙ = F x̂ + Gu + L(y − ŷ)
ŷ = H x̂ + J u
Then
x̃˙ = (F − LH)x̃
so if
σ(F − LH) < 0
(the spectrum lies in the open left half plane) the error goes to zero. When can we find such an L?
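A minimal numerical sketch of this construction, for an assumed double integrator example (not from the slides), using scipy.signal.place_poles to choose L so that σ(F − LH) lies in the open left half plane:

```python
import numpy as np
from scipy.signal import place_poles

# Example system (assumed for illustration): double integrator with position measurement
F = np.array([[0.0, 1.0],
              [0.0, 0.0]])
H = np.array([[1.0, 0.0]])

# Choose L so that the eigenvalues of F - L H are at -2 and -3.
# place_poles works on the pair (A, B); for the observer we use the dual pair (F', H').
L = place_poles(F.T, H.T, [-2.0, -3.0]).gain_matrix.T

# Error dynamics x_tilde_dot = (F - L H) x_tilde are now exponentially stable.
print(np.linalg.eigvals(F - L @ H))   # approximately [-2, -3]
```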
5
The linear system is observable if the largest F-invariant subspace V contained in the kernel of H is zero.
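Equivalently, in finite dimensions this can be checked by the rank of the observability matrix; a small sketch with the same assumed pair (H, F):

```python
import numpy as np

def observable(F, H):
    # (H, F) is observable iff [H; HF; ...; HF^(n-1)] has full column rank n,
    # i.e. the largest F-invariant subspace inside ker H is {0}.
    n = F.shape[0]
    O = np.vstack([H @ np.linalg.matrix_power(F, k) for k in range(n)])
    return np.linalg.matrix_rank(O) == n

F = np.array([[0.0, 1.0], [0.0, 0.0]])
H = np.array([[1.0, 0.0]])
print(observable(F, H))   # True: the position measurement observes the double integrator
```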
6
I have added some references to my slides. They are hardly complete.
7
B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.
8
Kalman Filtering
9
Kalman Filter
10
A readable introduction to linear and nonlinear estimation is
11
Least Squares or Minimum Energy Filtering
Given y(0 : t) we seek the noise triple (x⁰, w(0 : t), v(0 : t)) of minimum “energy” that generates it. The “energy” is defined to be
$$\frac{1}{2}(x^0)'(P^0)^{-1}x^0 + \frac{1}{2}\int_0^t |w(s)|^2 + |v(s)|^2\, ds$$
The least squares estimate x̂(t) is the endpoint of the trajectory generated by the minimizing (x̂⁰, ŵ(0 : t)). This estimate is the same as that of the Kalman filter.
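A small numerical check of this equivalence in a discrete-time analog (the scalar model, horizon and noise variances below are assumptions for illustration, not from the slides): the trajectory minimizing the discretized energy is found by weighted linear least squares, and its endpoint matches the Kalman filter estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar discrete-time analog: x_{k+1} = a x_k + w_k,  y_k = c x_k + v_k
a, c = 0.9, 1.0
p0, q, r = 1.0, 0.5, 0.2        # prior, process and measurement noise variances
x0_bar, T = 0.0, 20

# simulate a measurement record
x = x0_bar + np.sqrt(p0)*rng.standard_normal()
ys = np.empty(T + 1)
for k in range(T + 1):
    ys[k] = c*x + np.sqrt(r)*rng.standard_normal()
    x = a*x + np.sqrt(q)*rng.standard_normal()

# minimum "energy" trajectory x_0 .. x_T by weighted linear least squares
A_rows, b_rhs = [], []
def add_row(idx_coef, rhs, scale):
    row_vec = np.zeros(T + 1)
    for i, coef in idx_coef:
        row_vec[i] = coef
    A_rows.append(row_vec/scale)
    b_rhs.append(rhs/scale)

add_row([(0, 1.0)], x0_bar, np.sqrt(p0))                   # (x_0 - x0_bar)^2 / p0
for k in range(T):
    add_row([(k + 1, 1.0), (k, -a)], 0.0, np.sqrt(q))      # w_k^2 / q
for k in range(T + 1):
    add_row([(k, c)], ys[k], np.sqrt(r))                   # v_k^2 / r
x_min_energy = np.linalg.lstsq(np.array(A_rows), np.array(b_rhs), rcond=None)[0]

# Kalman filter over the same data
xf, pf = x0_bar, p0
for k in range(T + 1):
    gain = pf*c/(c*pf*c + r)
    xf, pf = xf + gain*(ys[k] - c*xf), (1.0 - gain*c)*pf
    if k < T:
        xf, pf = a*xf, a*pf*a + q

print(x_min_energy[-1], xf)   # the endpoint of the minimizing trajectory matches the filter
```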
12
Minimax Estimation or H ∞ Estimation
13
There are generalizations of the above to nonlinear systems of the form
ẋ = f (x, u)
y = h(x, u)
A popular way of finding a nonlinear observer is to seek a local change of state coordinates z = θ(x) and an input-output injection β(u, y) such that the system becomes
ż = Az + β(u, y)
where σ(A) < 0, for then the observer
ẑ˙ = Aẑ + β(u, y)
x̂ = θ⁻¹(ẑ)
has linear and exponentially stable error dynamics
z̃˙ = Az̃
in the transformed coordinates. This is relatively easy to accomplish if there is no input (m = 0) but generally impossible if m > 0.
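For example, for the pendulum ẋ₁ = x₂, ẋ₂ = −sin x₁ with measured angle y = x₁, the identity change of coordinates z = θ(x) = x already works and the injection absorbs the nonlinearity: for any gains ℓ₁, ℓ₂ > 0,
$$\dot{z} = \underbrace{\begin{pmatrix} -\ell_1 & 1 \\ -\ell_2 & 0 \end{pmatrix}}_{A} z + \underbrace{\begin{pmatrix} \ell_1 y \\ \ell_2 y - \sin y \end{pmatrix}}_{\beta(y)}$$
and σ(A) < 0 since the characteristic polynomial is s² + ℓ₁s + ℓ₂, so the observer ẑ˙ = Aẑ + β(y), x̂ = ẑ has linear error dynamics.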
14
The use of a change of state coordinates and input/output injection was initiated in
A. J. Krener and A. Isidori,
Linearization by output injection and nonlinear observers,
Systems Control Lett., 3 (1983), pp. 47–52.
15
Other methods of constructing observers can be found in
E. A. Misawa and J. K. Hedrick, Nonlinear observers: a state-of-the-art survey, Trans. of ASME, J. of Dynamic Systems, Measurement and Control, 111 (1989), 344-352.
See also
H. Nijmeijer and T. I. Fossen (eds.), New Directions in Nonlinear Observer Design, Springer, London/New York, 1999.
16
Nonlinear Stochastic Filtering
17
The Zakai equation is a stochastic PDE in the Itô sense driven by the observations. It has to be solved in its Stratonovich form.
The estimate can then be taken to be the conditional mean or the conditional mode of the (unnormalized) conditional density q(x, t),
$$\hat{x}(t) = \frac{\int x\, q(x, t)\, dx}{\int q(x, t)\, dx} \qquad\text{or}\qquad \hat{x}(t) = \operatorname*{argmax}_x\, q(x, t)$$
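Once q(x, t) is available on a grid either estimate is cheap to extract; a small sketch (the grid and the density values are placeholders):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)                                  # state grid
q = np.exp(-0.5*(x - 1.2)**2) + 0.3*np.exp(-2.0*(x + 2.0)**2)     # placeholder unnormalized density

x_mean = np.sum(x*q)/np.sum(q)   # conditional mean: ratio of the two integrals (dx cancels)
x_map  = x[np.argmax(q)]         # conditional mode: argmax of q(., t)
print(x_mean, x_map)
```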
19
Monte Carlo Filtering, Particle Filtering
Sample p(x, t), then use the noisy system and the Bayes formula to compute the weights and samples αᵏ(t + 1), xᵏ(t + 1).
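A minimal bootstrap particle filter step for an assumed scalar model (the maps and noise levels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 0.9*x + np.sin(x)          # assumed state transition map
h = lambda x: x**2/5.0                   # assumed measurement map
q_std, r_std, n_part = 0.3, 0.2, 500

def particle_filter_step(parts, weights, y):
    # propagate the samples through the noisy system
    parts = f(parts) + q_std*rng.standard_normal(parts.shape)
    # Bayes formula: reweight by the measurement likelihood, then normalize
    weights = weights*np.exp(-0.5*((y - h(parts))/r_std)**2)
    weights = weights/np.sum(weights)
    # resample to avoid weight degeneracy
    idx = rng.choice(len(parts), size=len(parts), p=weights)
    return parts[idx], np.full(len(parts), 1.0/len(parts))

parts = rng.standard_normal(n_part)                 # samples of p(x, 0)
weights = np.full(n_part, 1.0/n_part)
for y in [0.1, 0.4, 0.9]:                           # a few synthetic measurements
    parts, weights = particle_filter_step(parts, weights, y)
    print(np.sum(weights*parts))                    # filtered mean estimate
```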
20
Nonlinear Minimum Energy Filtering
21
To find this estimate we must solve in real time a partial differential
equation of Hamilton-Jacobi-Bellman type driven by u(t), y(t).
$$\frac{\partial Q}{\partial t} = -\frac{\partial Q}{\partial x_i} f_i - \frac{1}{2}\,\frac{\partial Q}{\partial x_i}\frac{\partial Q}{\partial x_i} + \frac{1}{2}\,(y_i - h_i)(y_i - h_i)$$
(summation over repeated indices)
$$\hat{x}(t) = \operatorname*{argmin}_x\, Q(x, t)$$
This is nearly impossible if n > 1 .
See also
23
Extended Kalman Filtering (EKF)
24
Interpretation
x(t) ≈ N (x̂(t), P (t))
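A minimal sketch of one EKF predict/update cycle in discrete time, maintaining exactly this Gaussian summary (the scalar model and Jacobians below are illustrative assumptions):

```python
import numpy as np

# Assumed scalar discrete-time model: x+ = f(x) + w, y = h(x) + v
f, h = lambda x: 0.9*x + 0.1*np.sin(x), lambda x: x**3/10.0
dfdx, dhdx = lambda x: 0.9 + 0.1*np.cos(x), lambda x: 3.0*x**2/10.0
q, r = 0.1, 0.05                       # process / measurement noise variances

def ekf_step(xhat, p, y):
    # predict: propagate the estimate and the linearized covariance
    xpred = f(xhat)
    F = dfdx(xhat)
    ppred = F*p*F + q
    # update: Kalman gain from the measurement linearization at the prediction
    H = dhdx(xpred)
    k = ppred*H/(H*ppred*H + r)
    return xpred + k*(y - h(xpred)), (1.0 - k*H)*ppred

xhat, p = 0.0, 1.0                     # x(t) ≈ N(xhat, p)
xhat, p = ekf_step(xhat, p, y=0.2)
print(xhat, p)
```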
25
The local convergence of the EKF in discrete time is demonstrated in
26
Unscented Kalman Filter
The computational burden is about the same as that of the EKF but the UKF is second order accurate rather than first order accurate. It has the same drawback as the EKF, namely that it is a local method in that it approximates the conditional density by a Gaussian.
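The key ingredient of the UKF is the unscented transform: propagate a small set of sigma points through the nonlinearity and refit a Gaussian. A minimal sketch (the weighting parameter and example map are assumptions):

```python
import numpy as np

def unscented_transform(mean, cov, func, kappa=1.0):
    # propagate N(mean, cov) through func using 2n+1 sigma points
    n = len(mean)
    S = np.linalg.cholesky((n + kappa)*cov)           # columns are the sigma point offsets
    sigma = np.vstack([mean, mean + S.T, mean - S.T])
    w = np.full(2*n + 1, 0.5/(n + kappa)); w[0] = kappa/(n + kappa)
    ys = np.array([func(s) for s in sigma])
    y_mean = w @ ys
    y_cov = (w[:, None]*(ys - y_mean)).T @ (ys - y_mean)
    return y_mean, y_cov

# push a Gaussian through a mildly nonlinear map
m, P = np.array([1.0, 0.0]), np.diag([0.1, 0.2])
func = lambda x: np.array([x[0] + 0.1*x[1]**2, np.sin(x[0]) + x[1]])
print(unscented_transform(m, P, func))
```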
27
The UKF was introduced in
28
Smoothing of Linear Initial Value Models
ẋ = F x + Gu + Bw
y = Hx + J u + Dv
x(0) ≈ N(x̂⁰, P⁰)
t ∈ [0, T ]
29
Two Kalman Filters
x̂˙ f = F x̂f + Gu + Lf (y − ŷf )
ŷf = H x̂f + J u
x̂f (0) = x̂⁰
Lf = Pf H′(DD′)⁻¹
Ṗf = F Pf + Pf F′ + BB′ − Pf H′(DD′)⁻¹H Pf
Pf (0) = P⁰
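A sketch of integrating this filter Riccati equation numerically (the matrices are illustrative assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp

F = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
D = np.array([[0.5]])
P0 = np.eye(2)
Rinv = np.linalg.inv(D @ D.T)

def riccati(t, p_flat):
    P = p_flat.reshape(2, 2)
    dP = F @ P + P @ F.T + B @ B.T - P @ H.T @ Rinv @ H @ P
    return dP.ravel()

sol = solve_ivp(riccati, (0.0, 5.0), P0.ravel(), rtol=1e-8)
Pf_end = sol.y[:, -1].reshape(2, 2)
Lf_end = Pf_end @ H.T @ Rinv            # filter gain Lf = Pf H'(DD')^{-1} at the final time
print(Pf_end, Lf_end)
```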
30
The two technical problems can be solved by letting
Q(t) = Pb⁻¹(t), z(t) = Q(t) x̂b(t)
then
ż = (−F′ + QBB′)z + (QG + H′(DD′)⁻¹J)u − H′(DD′)⁻¹y
z(T) = 0
Q̇ = −QF − F′Q + QBB′Q − H′(DD′)⁻¹H
Q(T) = 0
Then the smoothed estimate x̂(t) and its error covariance P (t)
are given by combining the independent estimates
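In the notation above, the standard two-filter (Mayne-Fraser) combination is
$$P(t) = \bigl(P_f^{-1}(t) + Q(t)\bigr)^{-1}, \qquad \hat{x}(t) = P(t)\bigl(P_f^{-1}(t)\,\hat{x}_f(t) + z(t)\bigr).$$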
31
The same smoothed estimate can be obtained by a least squares argument. Suppose w(0 : T), v(0 : T) are unknown L²[0, T] functions and x⁰ is an unknown initial condition. We seek to minimize
$$\frac{1}{2}(x^0)'(P^0)^{-1}x^0 + \frac{1}{2}\int_0^T |w|^2 + |v|^2\, dt$$
Let ŵ(0 : T), v̂(0 : T), x̂⁰ be the minimizer; then the smoothed estimate x̂(t) is the solution of
x̂˙ = F x̂ + Gu + B ŵ
x̂(0) = x̂⁰.
32
For fixed u(0 : T), y(0 : T) define the Hamiltonian
$$H(\lambda, x, u, y, w) = \lambda'(Fx + Gu + Bw) + \frac{1}{2}|w|^2 + \frac{1}{2}(y - Hx - Ju)'(DD')^{-1}(y - Hx - Ju)$$
The minimizing trajectory satisfies the stationarity conditions of this Hamiltonian together with the boundary conditions
x̂(0) = P⁰λ̂(0)
0 = λ̂(T)
33
This yields the two point boundary value problem
$$\begin{pmatrix}\dot{\hat{x}}\\[2pt] \dot{\hat{\lambda}}\end{pmatrix}
= \begin{pmatrix} F & -BB' \\ -H'(DD')^{-1}H & -F' \end{pmatrix}
\begin{pmatrix}\hat{x}\\ \hat{\lambda}\end{pmatrix}
+ \begin{pmatrix} G & 0 \\ -H'(DD')^{-1}J & H'(DD')^{-1}\end{pmatrix}
\begin{pmatrix}u\\ y\end{pmatrix}$$
$$\hat{x}(0) = P^0\hat{\lambda}(0), \qquad 0 = \hat{\lambda}(T)$$
There are numerous ways of solving this problem; perhaps the simplest is to define
µ̂(t) = λ̂(t) − M(t)x̂(t)
where
Ṁ = −MF − F′M − H′(DD′)⁻¹H + M BB′M
M(T) = 0
34
This transformation triangularizes the dynamics,
$$\begin{pmatrix}\dot{\hat{x}}\\[2pt] \dot{\hat{\mu}}\end{pmatrix}
= \begin{pmatrix} F - BB'M & -BB' \\ 0 & -F' + M BB' \end{pmatrix}
\begin{pmatrix}\hat{x}\\ \hat{\mu}\end{pmatrix}
+ \begin{pmatrix} G & 0 \\ -MG - H'(DD')^{-1}J & H'(DD')^{-1}\end{pmatrix}
\begin{pmatrix}u\\ y\end{pmatrix}$$
$$\bigl(I - P^0 M(0)\bigr)\hat{x}(0) = P^0\hat{\mu}(0), \qquad 0 = \hat{\mu}(T)$$
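Alternatively, for modest state dimension the two point boundary value problem can be handed to a generic BVP solver; a scalar sketch with assumed coefficients and a synthetic measurement record (G = 0, J = 0):

```python
import numpy as np
from scipy.integrate import solve_bvp

# Assumed scalar data: F = -1, B = 1, H = 1, D = 0.5, P0 = 1, and a placeholder y(t)
Fc, Bc, Hc, Dc, P0 = -1.0, 1.0, 1.0, 0.5, 1.0
T = 2.0
ts = np.linspace(0.0, T, 41)
y_data = np.sin(2.0*ts)
y_of_t = lambda t: np.interp(t, ts, y_data)

def rhs(t, s):                     # s = [xhat, lambdahat]
    xhat, lam = s
    dx = Fc*xhat - Bc*Bc*lam
    dl = -(Hc*Hc/Dc**2)*xhat - Fc*lam + (Hc/Dc**2)*y_of_t(t)
    return np.vstack([dx, dl])

def bc(sa, sb):                    # xhat(0) = P0*lambda(0),  lambda(T) = 0
    return np.array([sa[0] - P0*sa[1], sb[1]])

sol = solve_bvp(rhs, bc, ts, np.zeros((2, ts.size)))
xhat_smoothed = sol.sol(ts)[0]     # smoothed state estimate on the grid
print(sol.status, xhat_smoothed[:5])
```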
35
Stochastic Interpretation of the Variational Equations
(For simplicity G = 0, J = 0 )
The model
ẋ = F x + Bw
y = Hx + Dv
x(0) = x0.
defines a linear map
36
The complementary model
λ̇ = −F′λ + H′(D′)⁻¹v
ψ = −B′λ + w
λ(0) = 0
ξ = x⁰ − P⁰λ(0)
defines a map
37
For more on linear smoothing, boundary value models and complementary models see
38
In filtering it is reasonable to assume that there is a priori information about the initial condition but no a priori information about the terminal condition. But this is not reasonable in smoothing. A better model for the smoothing problem is the two point boundary value problem
ẋ = Ax + Bw
y = Cx + Dv
b = V_0 x(0) + V_T x(T)
where, as before, w, v are standard white Gaussian noises and b is an independent Gaussian vector.
39
The two point boundary process is well-posed if
W = V_0 + V_T Φ(T, 0)
is nonsingular, where
$$\frac{\partial \Phi}{\partial t}(t, s) = A(t)\Phi(t, s), \qquad \Phi(s, s) = I$$
Then there is a Green's matrix
$$\Gamma(t, s) = \begin{cases} \Phi(t, 0)\,W^{-1}V_0\,\Phi(0, s) & t > s \\ -\Phi(t, 0)\,W^{-1}V_T\,\Phi(T, s) & t < s \end{cases}$$
and
$$x(t) = \Phi(t, 0)W^{-1}b + \int_0^T \Gamma(t, s)B(s)w(s)\, ds$$
where the integral is in the Wiener sense (integration by parts).
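For a time-invariant A the transition matrix is a matrix exponential, so well-posedness is a one-line check; a small sketch with assumed A, V_0, V_T:

```python
import numpy as np
from scipy.linalg import expm

A  = np.array([[0.0, 1.0], [-1.0, 0.0]])   # assumed constant dynamics
T  = 2.0
V0 = np.eye(2)
VT = np.array([[0.0, 0.0], [1.0, 0.0]])

Phi_T0 = expm(A*T)                          # Phi(T, 0)
W = V0 + VT @ Phi_T0
print(np.linalg.cond(W))                    # finite, moderate => W nonsingular, model well-posed
```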
40
A process on [0, T ] is reciprocal if, conditioned on x(t₁), x(t₂) where 0 ≤ t₁ < t₂ ≤ T, what happens on (t₁, t₂) is independent of what happens on [0, t₁) ∪ (t₂, T ].
41
For x defined by the linear boundary value model above, define Q = BB′ and G, F by
$$GQ = AQ - QA' + \frac{dQ}{dt}, \qquad F = \frac{dA}{dt} + A^2 - GA$$
42
The quick and dirty way to derive this equation is to note that if
d⁺x = Ax dt + B d⁺w
where d⁺ denotes the forward differential in the Itô sense, then
43
For more on reciprocal processes see
44
Suppose that we have observations of the reciprocal process
dy = Cxdt + Ddv
where v is a standard Wiener process and D is invertible.
x̂(0) = 0, x̂(T ) = 0
45
The reciprocal smoother can be found in
Frezza R (1990)
Models of higher order and mixed order Gaussian reciprocal pro-
cesses with applications to the smoothing problem.
PhD Thesis, University of California, Davis, CA
46
Deterministic Smoothing of Linear Boundary Processes
ẋ = F x + Gu + Bw
y = Hx + J u + Dv
b = V_0 x(0) + V_T x(T)
47
If x̂(t), ŵ(0 : T), v̂(0 : T), b̂ are minimizing then there exists λ̂(t) such that the Pontryagin Minimum Principle holds:
$$\dot{\hat{x}} = \frac{\partial H}{\partial \lambda}'(\hat{\lambda}, \hat{x}, u, y, \hat{w})$$
$$\dot{\hat{\lambda}} = -\frac{\partial H}{\partial x}'(\hat{\lambda}, \hat{x}, u, y, \hat{w})$$
$$\hat{w} = \operatorname*{argmin}_w H(\hat{\lambda}, \hat{x}, u, y, w)$$
$$\hat{b} = V_0\hat{x}(0) + V_T\hat{x}(T), \qquad \hat{\lambda}(0) = V_0'\,\hat{b}, \qquad \hat{\lambda}(T) = V_T'\,\hat{b}$$
48
This yields the two point boundary value problem
$$\begin{pmatrix}\dot{\hat{x}}\\[2pt] \dot{\hat{\lambda}}\end{pmatrix}
= \begin{pmatrix} F & -BB' \\ -H'(DD')^{-1}H & -F' \end{pmatrix}
\begin{pmatrix}\hat{x}\\ \hat{\lambda}\end{pmatrix}
+ \begin{pmatrix} G & 0 \\ -H'(DD')^{-1}J & H'(DD')^{-1}\end{pmatrix}
\begin{pmatrix}u\\ y\end{pmatrix}$$
49
In the initial value formulation, one of the boundary conditions of
the variational equations was particularly simple
λ(T ) = 0
This allowed us to triangularize the dynamics and solve by a backward sweep of λ̂(t) and a forward sweep of x̂(t).
50
Another advantage of the deterministic approach is that it readily generalizes to nonlinear boundary value models of the form
ẋ = f (t, x, u) + g(t, x)w
y = h(t, x, u) + v
b = β(x(0), x(T))
where we minimize
$$\frac{1}{2}|b|^2 + \frac{1}{2}\int_0^T |w|^2 + |v|^2\, dt$$
51
The Pontryagin minimum principle yields the first order necessary conditions that must be satisfied by an optimal solution. The Hamiltonian is
$$H = \lambda'\bigl(f(t, x, u(t)) + g(t, x)w\bigr) + \frac{1}{2}|w|^2 + \frac{1}{2}\,|y(t) - h(t, x, u(t))|^2 .$$
and
$$\dot{\hat{x}}(t) = \frac{\partial H}{\partial \lambda}'(t, \hat{\lambda}(t), \hat{x}(t), u(t), y(t), \hat{w}(t))$$
$$\dot{\hat{\lambda}}(t) = -\frac{\partial H}{\partial x}'(t, \hat{\lambda}(t), \hat{x}(t), u(t), y(t), \hat{w}(t))$$
$$\hat{w}(t) = \operatorname*{argmin}_w H(t, \hat{\lambda}(t), \hat{x}(t), u(t), y(t), w)$$
$$\hat{\lambda}(0) = \frac{\partial \beta}{\partial x^0}'(\hat{x}(0), \hat{x}(T))\, \beta(\hat{x}(0), \hat{x}(T))$$
$$\hat{\lambda}(T) = \frac{\partial \beta}{\partial x^T}'(\hat{x}(0), \hat{x}(T))\, \beta(\hat{x}(0), \hat{x}(T))$$
52
Then
$$\hat{w}(t) = -g'(t, \hat{x}(t))\,\hat{\lambda}(t)$$
and so
$$\dot{\hat{x}}(t) = f(t, \hat{x}(t), u(t)) - g(t, \hat{x}(t))\,g'(t, \hat{x}(t))\,\hat{\lambda}(t)$$
$$\dot{\hat{\lambda}}(t) = -\Bigl(\frac{\partial f}{\partial x}(t, \hat{x}(t), u(t)) - \frac{\partial g}{\partial x}(t, \hat{x}(t))\,g'(t, \hat{x}(t))\,\hat{\lambda}(t)\Bigr)'\hat{\lambda}(t) + \frac{\partial h}{\partial x}'(t, \hat{x}(t), u(t))\,\bigl(y(t) - h(t, \hat{x}(t), u(t))\bigr)$$
$$\hat{\lambda}(0) = \frac{\partial \beta}{\partial x^0}'(\hat{x}(0), \hat{x}(T))\, \beta(\hat{x}(0), \hat{x}(T))$$
$$\hat{\lambda}(T) = \frac{\partial \beta}{\partial x^T}'(\hat{x}(0), \hat{x}(T))\, \beta(\hat{x}(0), \hat{x}(T)).$$
This is a nonlinear two point boundary value problem in 2n variables that we can try to solve by direct methods, shooting methods or iterative methods. We shall use an iterative method that takes advantage of the variational nature of the problem.
53
We use gradient descent. First solve the initial value problem
ẋ = f (t, x, u) + g(t, x)w
x(0) = x⁰
and compute the cost
$$\pi(x^0, w(0:T)) = \frac{1}{2}|b|^2 + \frac{1}{2}\int_0^T |w|^2 + |v|^2\, dt$$
where v(t) = y(t) − h(t, x(t), u(t)). Then solve the final value problem for µ(s),
$$\frac{d\mu}{ds}(s) = -F'(s)\,\mu(s) + H'(s)\,\bigl(y(s) - h(s, x(s), u(s))\bigr)$$
$$\mu(T) = \frac{\partial \beta}{\partial x^T}'(x(0), x(T))\, \beta(x(0), x(T))$$
where
$$F(t) = \frac{\partial f}{\partial x}(t, x(t), u(t)), \qquad G(t) = g(t, x(t)), \qquad H(t) = \frac{\partial h}{\partial x}(t, x(t), u(t))$$
54
The first variation of the cost due to the changes δx⁰, δw(0 : T) is
$$\pi(x^0 + \delta x^0,\, w + \delta w) = \pi(x^0, w)
+ \Bigl(\beta'\,\frac{\partial \beta}{\partial x^0}(x(0), x(T)) + \mu'(0)\Bigr)\delta x^0
+ \int_0^T \bigl(w'(s) + \mu'(s)G(s)\bigr)\,\delta w(s)\, ds
+ O(\delta x^0, \delta w)^2 .$$
Choose a step size ε and define
$$\delta x^0 = -\varepsilon\,\Bigl(\frac{\partial \beta}{\partial x^0}'(x(0), x(T))\,\beta + \mu(0)\Bigr)$$
$$\delta w(s) = -\varepsilon\,\bigl(w(s) + G'(s)\,\mu(s)\bigr).$$
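A sketch of this gradient descent smoother for an assumed scalar model (Euler discretization; the model, data and step-size logic are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed scalar model: x_dot = f(x) + g w, y = h(x) + v,
# boundary term beta = (x(0) - x0_bar)/sqrt(p0), i.e. an initial-value type prior.
f,  dfdx = lambda x: x - x**3, lambda x: 1.0 - 3.0*x**2
h,  dhdx = lambda x: x,        lambda x: 1.0
g = 0.5                                   # constant noise gain, so F = df/dx is exact
x0_bar, p0 = 0.0, 1.0
T, N = 4.0, 200
dt = T/N

# synthetic measurement record y on the grid
x_true = np.empty(N + 1); x_true[0] = 1.0
for k in range(N):
    x_true[k + 1] = x_true[k] + dt*(f(x_true[k]) + g*0.3*rng.standard_normal())
y = h(x_true) + 0.1*rng.standard_normal(N + 1)

def forward(x0, w):
    x = np.empty(N + 1); x[0] = x0
    for k in range(N):
        x[k + 1] = x[k] + dt*(f(x[k]) + g*w[k])
    return x

def cost(x0, w):
    x = forward(x0, w)
    beta = (x0 - x0_bar)/np.sqrt(p0)
    v = y - h(x)
    return 0.5*beta**2 + 0.5*dt*np.sum(w**2) + 0.5*dt*np.sum(v**2), x

x0, w, eps = 0.0, np.zeros(N), 0.1
J, x = cost(x0, w)
for it in range(300):
    # adjoint (final value) problem: mu_dot = -F' mu + H'(y - h(x)), mu(T) = dbeta/dx(T)' beta = 0
    mu = np.empty(N + 1); mu[N] = 0.0
    for k in range(N, 0, -1):
        mu[k - 1] = mu[k] - dt*(-dfdx(x[k])*mu[k] + dhdx(x[k])*(y[k] - h(x[k])))
    beta = (x0 - x0_bar)/np.sqrt(p0)
    dx0 = -eps*(beta/np.sqrt(p0) + mu[0])          # delta x0 = -eps (dbeta/dx0' beta + mu(0))
    dw  = -eps*(w + g*mu[:-1])                     # delta w(s) = -eps (w(s) + G'(s) mu(s))
    J_new, x_new = cost(x0 + dx0, w + dw)
    if J_new < J:                                  # accept the step, grow the step size a bit
        x0, w, x, J, eps = x0 + dx0, w + dw, x_new, J_new, 1.1*eps
    else:                                          # reject and shrink the step size
        eps = 0.5*eps

x_smooth = forward(x0, w)                          # smoothed trajectory xhat(t)
print(J, float(np.mean((x_smooth - x_true)**2)))
```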
55
The nonlinear smoother can be found in
56
Conclusions