Optimization-Based Control
Richard M. Murray
Control and Dynamical Systems
California Institute of Technology
This manuscript is for personal use only and may not be reproduced,
in whole or in part, without written consent from the author.
Preface
short description of the prerequisites for the chapter and citations to the rele-
vant literature. Advanced sections, marked by the “dangerous bend” symbol
shown in the margin, contain material that requires a slightly more tech-
nical background, of the sort that would be expected of graduate students
in engineering. Additional information is available on the Feedback Systems
web site:
http://www.cds.caltech.edu/~murray/amwiki/OBC
Chapter One
Trajectory Generation and Tracking
Figure 1.1: Two degree of freedom controller design for a process P with uncer-
tainty ∆. The controller consists of a trajectory generator and feedback controller.
The trajectory generation subsystem computes a feedforward command ud along
with the desired state xd . The state feedback controller uses the measured (or es-
timated) state and desired state to compute a corrective input ufb . Uncertainty is
represented by the block ∆, which captures unmodeled dynamics as well as
disturbances and noise.
We use the notation r(·) to indicate that the control law can depend not
only on the reference signal r(t) but also on derivatives of the reference signal.
A feasible trajectory for the system (1.1) is a pair (xd (t), ud (t)) that sat-
isfies the differential equation and generates the desired trajectory:
ẋd(t) = f(xd(t), ud(t)),    r(t) = h(xd(t), ud(t)).
The problem of finding a feasible trajectory for a system is called the tra-
jectory generation problem, with xd representing the desired state for the
(nominal) system and ud the corresponding desired input.
Figure 1.2: Vehicle steering using gain scheduling.
where K(x, µ) depends on the current system state (or some portion of
it) and an external parameter µ. The dependence on the current state x (as
opposed to the desired state xd ) allows us to modify the closed loop dynamics
differently depending on our location in the state space. This is particularly
useful when the dynamics of the process vary depending on some subset of
the states (such as the altitude for an aircraft or the internal temperature
for a chemical reaction). The dependence on µ can be used to capture the
dependence on the reference trajectory, or it can reflect changes in the
environment or performance specifications that are not modeled in the state
of the controller.
ẋ = cos θ v,    ẏ = sin θ v,    θ̇ = (v/l) tan φ,        (1.2)
where (x, y, θ) is the position and orientation of the vehicle, v is the veloc-
ity and φ is the steering angle, both considered to be inputs, and l is the
wheelbase.
A simple feasible trajectory for the system is to follow a straight line in
the x direction at lateral position yr and fixed velocity vr . This corresponds
to a desired state xd = (vr t, yr , 0) and nominal input ud = (vr , 0). Note that
(xd , ud ) is not an equilibrium point for the system, but it does satisfy the
equations of motion.
The form of the controller shows that at low speeds the gains in the steering
angle will be high, meaning that we must turn the wheel harder to achieve
the same effect. As the speed increases, the gains become smaller. This
matches the usual experience that at high speed a very small amount of
actuation is required to control the lateral position of a car. Note that the
gains go to infinity when the vehicle is stopped (vr = 0), corresponding to
the fact that the system is not reachable at this point.
Figure 1.2b shows the response of the controller to a step change in
lateral position at three different reference speeds. Notice that the rate of
the response is constant, independent of the reference speed, reflecting the
fact that the gain scheduled controllers each set the closed loop poles to the
same values. ∇
One limitation of gain scheduling as we have described it is that a separate
set of gains must be designed for each operating condition xd . In practice,
Figure 1.3: Gain scheduling. A general gain scheduling design involves finding a
gain K at each desired operating point. This can be thought of as a gain surface,
as shown on the left (for the case of a scalar gain). An approximation to this gain
can be obtained by computing the gains at a fixed number of operating points
and then interpolating between those gains. This gives an approximation of the
continuous gain surface, as shown on the right.
where Kj is a set of gains designed around the operating point xd,j and αj (x)
is a weighting factor. For example, we might choose the weights αj (x) such
that we take the gains corresponding to the nearest two operating points
and weight them according to the Euclidean distance of the current state
from that operating point; if the distance is small then we use a weight very
near to 1 and if the distance is far then we use a weight very near to 0.
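As a concrete illustration of this weighting scheme, the following sketch interpolates between the gains designed at the two operating points nearest to the current state, weighting them by distance. The operating points, gain values, and the particular weighting rule are placeholders chosen for illustration, not values prescribed by the text.

```python
import numpy as np

def scheduled_gain(x, operating_points, gains):
    """Interpolate between the gains K_j designed at the two operating points
    nearest to the current state x, weighting by distance (a sketch of the
    scheme described above; points, gains, and weights are placeholders)."""
    d = np.array([np.linalg.norm(x - xd) for xd in operating_points])
    j1, j2 = np.argsort(d)[:2]               # two nearest operating points
    w = d[j2] / (d[j1] + d[j2] + 1e-12)      # weight near 1 when x is close to j1
    return w * gains[j1] + (1 - w) * gains[j2]

# Example: scalar state with gains designed at x_d = 0, 1, 2
ops = [np.array([0.0]), np.array([1.0]), np.array([2.0])]
Ks = [np.array([[2.0]]), np.array([[1.5]]), np.array([[1.0]])]
print(scheduled_gain(np.array([0.25]), ops, Ks))   # between K_1 and K_2
```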
While the intuition behind gain scheduled controllers is fairly clear, some
caution is required in using them. In particular, a gain scheduled controller
is not guaranteed to be stable even if K(x, µ) locally stabilizes the system
around a given equilibrium point. Gain scheduling can be proven to work in
the case when the gain varies sufficiently slowly (Exercise 1.3).
For a differentially flat system, all of the feasible trajectories for the
system can be written as functions of a flat output z(·) and its derivatives.
The number of flat outputs is always equal to the number of system inputs.
The kinematic car is differentially flat with the position of the rear wheels as
the flat output. Differentially flat systems were originally studied by Fliess
et al. [FLMR92].
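To make the flat output concrete for the kinematic car, the sketch below maps a flat output (the rear wheel position) and its first two time derivatives back to the state and inputs of equation (1.2). The formulas follow by differentiating the flat output; this is only a sketch, assuming forward motion (v > 0) and an illustrative wheelbase value.

```python
import numpy as np

def car_from_flat(z, dz, ddz, wheelbase=1.0):
    """Map the flat output z = (x, y) (rear wheel position) and its first two
    derivatives to the state (x, y, theta) and inputs (v, phi) of the
    kinematic car (1.2).  A sketch only; assumes forward motion (v > 0) and
    an illustrative wheelbase."""
    x, y = z
    theta = np.arctan2(dz[1], dz[0])
    v = np.hypot(dz[0], dz[1])
    # thetadot = (xdot*yddot - ydot*xddot)/v^2  and  thetadot = (v/l) tan(phi)
    phi = np.arctan(wheelbase * (dz[0] * ddz[1] - dz[1] * ddz[0]) / v**3)
    return (x, y, theta), (v, phi)

# Straight-line trajectory at speed vr = 10 and lateral position yr = 1
state, inputs = car_from_flat(z=(50.0, 1.0), dz=(10.0, 0.0), ddz=(0.0, 0.0))
print(state, inputs)   # theta = 0, v = 10, phi = 0
```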
Differentially flat systems are useful in situations where explicit trajec-
tory generation is required. Since the behavior of a flat system is determined
by the flat outputs, we can plan trajectories in output space, and then map
these to appropriate inputs. Suppose we wish to generate a feasible trajec-
tory for the nonlinear system
ẋ = f (x, u), x(0) = x0 , x(T ) = xf .
If the system is differentially flat then
x(0) = β(z(0), ż(0), . . . , z^(q)(0)) = x0,
x(T) = β(z(T), ż(T), . . . , z^(q)(T)) = xf,        (1.5)
and we see that the initial and final conditions in the full state space depend
on just the output z and its derivatives at the initial and final times. Thus
any trajectory for z that satisfies these boundary conditions will be a feasible
trajectory for the system, using equation (1.4) to determine the full state
space and input trajectories.
In particular, given initial and final conditions on z and its derivatives
that satisfy equation (1.5), any curve z(·) satisfying those conditions will
correspond to a feasible trajectory of the system. We can parameterize the
flat output trajectory using a set of smooth basis functions ψi (t):
$$\begin{aligned}
z(t) &= \sum_{i=1}^{N} \alpha_i \psi_i(t), \qquad \alpha_i \in \mathbb{R}, \\
\dot z(t) &= \sum_{i=1}^{N} \alpha_i \dot\psi_i(t), \\
&\;\;\vdots \\
z^{(q)}(t) &= \sum_{i=1}^{N} \alpha_i \psi_i^{(q)}(t).
\end{aligned}$$
We can thus write the conditions on the flat outputs and their derivatives
as
$$\begin{bmatrix}
\psi_1(0) & \psi_2(0) & \cdots & \psi_N(0) \\
\dot\psi_1(0) & \dot\psi_2(0) & \cdots & \dot\psi_N(0) \\
\vdots & \vdots & & \vdots \\
\psi_1^{(q)}(0) & \psi_2^{(q)}(0) & \cdots & \psi_N^{(q)}(0) \\
\psi_1(T) & \psi_2(T) & \cdots & \psi_N(T) \\
\dot\psi_1(T) & \dot\psi_2(T) & \cdots & \dot\psi_N(T) \\
\vdots & \vdots & & \vdots \\
\psi_1^{(q)}(T) & \psi_2^{(q)}(T) & \cdots & \psi_N^{(q)}(T)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
=
\begin{bmatrix} z(0) \\ \dot z(0) \\ \vdots \\ z^{(q)}(0) \\ z(T) \\ \dot z(T) \\ \vdots \\ z^{(q)}(T) \end{bmatrix}$$
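A minimal numerical sketch of this computation is shown below for a single flat output with a monomial basis ψi(t) = t^{i−1} and boundary conditions on z and ż (so q = 1). The basis choice, horizon, and boundary values are illustrative only.

```python
import numpy as np

def basis_row(t, N, deriv):
    """Row [psi_1^{(deriv)}(t), ..., psi_N^{(deriv)}(t)] for the monomial
    basis psi_i(t) = t^{i-1}."""
    row = np.zeros(N)
    for i in range(N):
        if i - deriv >= 0:
            coef = np.prod(np.arange(i, i - deriv, -1)) if deriv > 0 else 1.0
            row[i] = coef * t ** (i - deriv)
    return row

# Boundary conditions on z and zdot (q = 1) at t = 0 and t = T
N, T = 4, 2.0
M = np.array([basis_row(0.0, N, 0), basis_row(0.0, N, 1),
              basis_row(T,   N, 0), basis_row(T,   N, 1)])
rhs = np.array([0.0, 0.0, 1.0, 0.0])     # z(0), zdot(0), z(T), zdot(T)
alpha = np.linalg.solve(M, rhs)          # coefficients alpha_i
print(alpha)                             # cubic rest-to-rest trajectory
```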
This system is differentially flat with flat output z = (x1 , x3 ). The relation-
ship between the flat variables and the states is given by
$$\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & T & T^2 & T^3 & 0 & 0 & 0 & 0 \\
0 & 1 & 2T & 3T^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & T & T^2 & T^3 \\
0 & 0 & 0 & 0 & 0 & 1 & 2T & 3T^2
\end{bmatrix}
\begin{bmatrix} \alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \\ \alpha_{21} \\ \alpha_{22} \\ \alpha_{23} \\ \alpha_{24} \end{bmatrix}
=
\begin{bmatrix} x_{1,0} \\ 1 \\ x_{3,0} \\ x_{2,0} \\ x_{1,f} \\ 1 \\ x_{3,f} \\ x_{2,f} \end{bmatrix}.$$
(c) N trailers
(d) Towed cable
Figure 1.5: Examples of flat systems.
form”:
ẋ1 = f1 (x1 , x2 )
ẋ2 = f2 (x1 , x2 , x3 )
..
.
ẋn = fn (x1 , . . . , xn , u).
Under certain regularity conditions these systems are differentially flat with
output y = x1 . These systems have been used for so-called “integrator back-
stepping” approaches to nonlinear control by Kokotovic et al. [KKM91] and
constructive controllability techniques for nonholonomic systems in chained
form [vNRM98]. Figure 1.5 shows some additional systems that are differ-
entially flat.
Example 1.3 Vectored thrust aircraft
Consider the dynamics of a planar, vectored thrust flight control system as
shown in Figure 1.6. This system consists of a rigid body with body fixed
forces and is a simplified model for a vertical take-off and landing aircraft
(see Example 2.9 in ÅM08). Let (x, y, θ) denote the position and orientation
of the center of mass of the aircraft. We assume that the forces acting on the
vehicle consist of a force F1 perpendicular to the axis of the vehicle acting
at a distance r from the center of mass, and a force F2 parallel to the axis of
the vehicle. Let m be the mass of the vehicle, J the moment of inertia, g
the gravitational constant, and c the damping coefficient.
Figure 1.6: Vectored thrust aircraft (from ÅM08). The net thrust on the aircraft
can be decomposed into a horizontal force F1 and a vertical force F2 acting at a
distance r from the center of mass.
description can be found in the survey article by Rugh [Rug90] and the work
of Shamma [Sha90]. Differential flatness was originally developed by Fliess,
Lévine, Martin and Rouchon [FLMR92]. See [Mur97] for a description of the
role of flatness in control of mechanical systems and [vNM98, MFHM05] for
more information on flatness applied to flight control systems.
Exercises
1.1 (Feasible trajectory for constant reference) Consider a linear input/output
system of the form
ẋ = Ax + Bu, y = Cx (1.10)
in which we wish to track a constant reference r. A feasible (steady state)
trajectory for the system is given by solving the equation
$$\begin{bmatrix} 0 \\ r \end{bmatrix} = \begin{bmatrix} A & B \\ C & 0 \end{bmatrix} \begin{bmatrix} x_d \\ u_d \end{bmatrix}$$
for xd and ud .
(a) Show that these equations always have a solution as long as the linear
system (1.10) is reachable.
(b) In Section 6.2 of ÅM08 we showed that the reference tracking problem
could be solved using a control law of the form u = −Kx + kr r. Show
that this is equivalent to a two degree of freedom control design using
xd and ud and give a formula for kr in terms of xd and ud . Show that
this formula matches that given in ÅM08.
1.2 A simplified model of the steering control problem is described in
Åström and Murray, Example 2.8. The lateral dynamics can be approxi-
mated by the linearized dynamics
$$\dot z = \begin{bmatrix} 0 & v \\ 0 & 0 \end{bmatrix} z + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, \qquad y = z_1,$$
where z = (y, θ) ∈ R2 is the state of the system and v is the speed of
the vehicle. Suppose that we wish to track a piecewise constant reference
trajectory
r = square(2πt/20),
where square is the square wave function in MATLAB. Suppose further
that the speed of the vehicle varies according to the formula
v = 5 + 3 sin(2πt/50).
Design and implement a gain-scheduled controller for this system by first
designing a state space controller that places the closed loop poles of the
system at the roots of s² + 2ζω₀s + ω₀², where ζ = 0.7 and ω₀ = 1.
1.3 (Stability of gain scheduled controllers for slowly varying systems) Con-
sider a nonlinear control system with gain scheduled feedback
ė = f (e, v) v = k(µ)e,
where µ(t) ∈ R is an externally specified parameter (e.g., the desired tra-
jectory) and k(µ) is chosen such that the linearization of the closed loop
system around the origin is stable for each fixed µ.
Show that if |µ̇| is sufficiently small then the equilibrium point is locally
asymptotically stable for the full nonlinear, time-varying system. (Hint: find
a Lyapunov function of the form V = xT P (µ)x based on the linearization of
the system dynamics for fixed µ and then show this is a Lyapunov function
for the full system.)
(a) Compute the state space trajectory x(t) and input u(t) corresponding
to equation (1.12) and satisfying the differential equation (1.11). Your
answer should be an equation similar to equation (1.6) for each state
xi and the input u.
(b) Find an explicit input that steers a double integrator system between
any two equilibrium points x0 ∈ R2 and xf ∈ R2 .
(c) Show that all reachable systems are differentially flat and give a for-
mula for finding the flat output in terms of the dynamics matrix A
and control matrix B.
1.5 Consider the lateral control problem for an autonomous ground vehicle
as described in Example 1.1 and Section 1.3. Using the fact that the system is
differentially flat, find an explicit trajectory that solves the following parallel
parking maneuver:
x0 = (0, 4)
xi = (6, 2)
xf = (0, 0)
(c) Write a simulation of the system that stabilizes the desired trajectory and
demonstrate your two degree of freedom control system maneuvering
from several different initial conditions into the parking space, with
either disturbances or modeling errors included in the simulation.
Chapter Two
Optimal Control
(a) Constrained optimization (b) Constraint normal vectors
Figure 2.2: Optimization with constraints. (a) We seek a point x∗ that minimizes
F (x) while lying on the surface G(x) = 0 (a line in the x1 x2 plane). (b) We can
parameterize the constrained directions by computing the gradient of the constraint
G. Note that x ∈ R2 in (a), with the third dimension showing F (x), while x ∈ R3
in (b).
and x2 and x∗ in the figure all satisfy the necessary condition but only one
is the (global) minimum.
The situation is more complicated if constraints are present. Let Gi :
Rn → R, i = 1, . . . , k be a set of smooth functions with Gi (x) = 0 repre-
senting the constraints. Suppose that we wish to find x∗ ∈ Rn such that
Gi (x∗ ) = 0 and F (x∗ ) ≤ F (x) for all x ∈ {x ∈ Rn : Gi (x) = 0, i = 1, . . . , k}.
This situation can be visualized as constraining the point to a surface (de-
fined by the constraints) and searching for the minimum of the cost function
along this surface, as illustrated in Figure 2.2a.
A necessary condition for being at a minimum is that there are no di-
rections tangent to the constraints that also decrease the cost. Given a con-
straint function G(x) = (G1(x), . . . , Gk(x)), x ∈ Rn, we can represent the
constraint as an n − k dimensional surface in Rn, as shown in Figure 2.2b.
The tangent directions to the surface can be computed by considering small
The variables λ can be regarded as free variables, which implies that we need
to choose x such that G(x) = 0 in order to ensure the cost is minimized.
Otherwise, we could choose λ to generate a large cost.
We see that the conditions that we have derived are independent of the sign
of F since they only depend on the gradient being zero in appropriate di-
rections. Thus finding x∗ that satisfies the conditions corresponds to finding
an extremum for the function.
Very good software is available for numerically solving optimization prob-
lems of this sort. The NPSOL and SNOPT libraries are available in FOR-
TRAN (and C). In MATLAB, the fmincon function can be used to solve a
constrained optimization problem.
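As an illustration of this kind of numerical solution (using Python's scipy.optimize rather than the packages named above), the sketch below minimizes F(x) = x1² + x2² subject to the single equality constraint G(x) = x1 + x2 − 1 = 0. The problem data are chosen purely for illustration; the analytic solution is x* = (0.5, 0.5).

```python
import numpy as np
from scipy.optimize import minimize

# Minimize F(x) = x1^2 + x2^2 subject to the single constraint
# G(x) = x1 + x2 - 1 = 0.  The analytic solution is x* = (0.5, 0.5),
# with Lagrange multiplier lambda = -1 for the augmented cost F + lambda*G.
F = lambda x: x[0]**2 + x[1]**2
G = {'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1.0}

res = minimize(F, x0=np.zeros(2), method='SLSQP', constraints=[G])
print(res.x)   # approximately [0.5, 0.5]
```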
Sketch of proof. We follow the proof given by Lewis and Syrmos [LS95],
omitting some of the details required for a fully rigorous proof. We use
the method of Lagrange multipliers, augmenting our cost function by the
dynamical constraints and the terminal constraints:
$$\begin{aligned}
\tilde J(x(\cdot), u(\cdot), \lambda(\cdot), \nu) &= J(x, u) + \int_0^T \Big( -\lambda^T(t)\big(\dot x(t) - f(x, u)\big) \Big)\, dt + \nu^T \psi(x(T)) \\
&= \int_0^T \Big( L(x, u) - \lambda^T(t)\big(\dot x(t) - f(x, u)\big) \Big)\, dt + V(x(T)) + \nu^T \psi(x(T)).
\end{aligned}$$
Note that λ is a function of time, with each λ(t) corresponding to the instan-
taneous constraint imposed by the dynamics. The integral over the interval
[0, T ] plays the role of the sum of the finite constraints in the regular opti-
mization.
Making use of the definition of the Hamiltonian, the augmented cost
becomes
$$\tilde J(x(\cdot), u(\cdot), \lambda(\cdot), \nu) = \int_0^T \Big( H(x, u) - \lambda^T(t)\dot x \Big)\, dt + V(x(T)) + \nu^T \psi(x(T)).$$
We can now “linearize” the cost function around the optimal solution x(t) =
x∗ (t) + δx(t), u(t) = u∗ (t) + δu(t), λ(t) = λ∗ (t) + δλ(t) and ν = ν ∗ + δν.
Taking T as fixed for simplicity (see [LS95] for the more general case), the
incremental cost can be written as
$$\begin{aligned}
\delta \tilde J &= \tilde J(x^* + \delta x,\, u^* + \delta u,\, \lambda^* + \delta\lambda,\, \nu^* + \delta\nu) - \tilde J(x^*, u^*, \lambda^*, \nu^*) \\
&\approx \int_0^T \left( \frac{\partial H}{\partial x}\delta x + \frac{\partial H}{\partial u}\delta u - \lambda^T \delta\dot x + \Big( \frac{\partial H}{\partial\lambda} - \dot x^T \Big)\delta\lambda \right) dt \\
&\qquad + \frac{\partial V}{\partial x}\delta x(T) + \nu^T \frac{\partial\psi}{\partial x}\delta x(T) + \delta\nu^T \psi\big(x(T)\big),
\end{aligned}$$
where we have omitted the time argument inside the integral and all deriva-
tives are evaluated along the optimal solution.
Since we are requiring x(0) = x0 , the δx(0) term vanishes and substituting
this into δ J˜ yields
$$\begin{aligned}
\delta \tilde J &\approx \int_0^T \left( \Big( \frac{\partial H}{\partial x} + \dot\lambda^T \Big)\delta x + \frac{\partial H}{\partial u}\delta u + \Big( \frac{\partial H}{\partial\lambda} - \dot x^T \Big)\delta\lambda \right) dt \\
&\qquad + \Big( \frac{\partial V}{\partial x} + \nu^T \frac{\partial\psi}{\partial x} - \lambda^T(T) \Big)\delta x(T) + \delta\nu^T \psi\big(x(T)\big).
\end{aligned}$$
To be optimal, we require δ J˜ = 0 for all δx, δu, δλ and δν, and we obtain
the (local) conditions in the theorem.
2.3 Examples
To illustrate the use of the maximum principle, we consider a number of
analytical examples. Additional examples are given in the exercises.
where the terminal time tf is given and c > 0 is a constant. This cost
function balances the final value of the state with the input required to get
to that state.
To solve the problem, we define the various elements used in the maxi-
mum principle. Our integral and terminal costs are given by
L = ½u²(t),    V = ½cx²(tf).
We write the Hamiltonian of this system and derive the following expressions
for the costate λ:
H = L + λf = ½u² + λ(ax + bu),
λ̇ = −∂H/∂x = −aλ,    λ(tf) = ∂V/∂x = cx(tf).
This is a final value problem for a linear differential equation in λ and the
u = −Q_u^{-1} B^T λ,
which can be substituted into the dynamic equation (2.6). To solve for the
optimal control we must solve a two point boundary value problem using the
initial condition x(0) and the final condition λ(T ). Unfortunately, it is very
hard to solve such problems in general.
Given the linear nature of the dynamics, we attempt to find a solution
by setting λ(t) = P (t)x(t) where P (t) ∈ Rn×n . Substituting this into the
necessary condition, we obtain
λ̇ = Ṗx + Pẋ = Ṗx + P(Ax − BQ_u^{-1}B^T Px),
⟹ −Ṗx − PAx + PBQ_u^{-1}B^T Px = Q_x x + A^T Px.
Since this must hold for all x, it follows that
−Ṗ = PA + A^T P − PBQ_u^{-1}B^T P + Q_x,    P(T) = P_1.        (2.7)
This is a matrix differential equation that defines the elements of P (t) from
a final value P (T ). Solving it is conceptually no different than solving the
initial value problem for vector-valued ordinary differential equations, except
that we must solve for the individual elements of the matrix P (t) backwards
in time. Equation (2.7) is called the Riccati ODE.
An important property of the solution to the optimal control problem
when written in this form is that P (t) can be solved without knowing either
x(t) or u(t). This allows the two point boundary value problem to be sepa-
rated into first solving a final-value problem and then solving a time-varying
initial value problem. More specifically, given P (t) satisfying equation (2.7),
we can apply the optimal input
u(t) = −Q_u^{-1} B^T P(t) x
and then solve the original dynamics of the system forward in time from
the initial condition x(0) = x0 . Note that this is a (time-varying) feedback
control that describes how to move from any state to the origin in time T .
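A sketch of this two-step procedure is shown below for a double integrator: equation (2.7) is integrated backward in time from P(T) = P1, and the resulting time-varying feedback is then applied forward from x(0). The system matrices, weights, and horizon are illustrative placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Finite-horizon LQR for a double integrator (illustrative system and weights).
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Qx, Qu, P1 = np.diag([1., 0.]), np.array([[1.]]), np.eye(2)
T = 5.0

def riccati_rhs(t, p):
    """Right-hand side of Pdot = -(P A + A^T P - P B Qu^{-1} B^T P + Qx)."""
    P = p.reshape(2, 2)
    dP = -(P @ A + A.T @ P - P @ B @ np.linalg.inv(Qu) @ B.T @ P + Qx)
    return dP.ravel()

# Step 1: integrate the Riccati ODE (2.7) backward in time from P(T) = P1.
backward = solve_ivp(riccati_rhs, [T, 0.0], P1.ravel(), dense_output=True)

# Step 2: simulate the dynamics forward using u(t) = -Qu^{-1} B^T P(t) x.
def closed_loop(t, x):
    P = backward.sol(t).reshape(2, 2)
    u = -np.linalg.inv(Qu) @ B.T @ P @ x
    return A @ x + B @ u

forward = solve_ivp(closed_loop, [0.0, T], [1.0, 0.0])
print(forward.y[:, -1])   # state driven toward the origin by t = T
```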
An important variation of this problem is the case when we choose T = ∞
and eliminate the terminal cost (set P1 = 0). This gives us the cost function
$$J = \int_0^\infty \big( x^T Q_x x + u^T Q_u u \big)\, dt. \qquad (2.8)$$
Since we do not have a terminal cost, there is no constraint on the final value
of λ or, equivalently, P (t). We can thus seek to find a constant P satisfying
equation (2.7). In other words, we seek to find P such that
$$PA + A^T P - PBQ_u^{-1}B^T P + Q_x = 0. \qquad (2.9)$$
This equation is called the algebraic Riccati equation. Given a solution, we
can choose our input as
u = −Q_u^{-1} B^T P x
and the minimum cost from initial condition x(0) is given by J ∗ = xT (0)P x(0).
The basic form of the solution follows from the necessary conditions, with
the theorem asserting that a constant solution exists for T = ∞ when the
additional conditions are satisfied. The full proof can be found in standard
texts on optimal control, such as Lewis and Syrmos [LS95] or Athans and
Falb [AF06]. A simplified version, in which we first assume the optimal
control is linear, is left as an exercise.
Example 2.4 Optimal control of a double integrator
Consider a double integrator system
$$\frac{dx}{dt} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u$$
with quadratic cost given by
$$Q_x = \begin{bmatrix} q^2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad Q_u = 1.$$
The optimal control is given by the solution of matrix Riccati equation (2.9).
Let P be a symmetric positive definite matrix of the form
$$P = \begin{bmatrix} a & b \\ b & c \end{bmatrix}.$$
Then the Riccati equation becomes
$$\begin{bmatrix} -b^2 + q^2 & a - bc \\ a - bc & 2b - c^2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$
which has solution
$$P = \begin{bmatrix} \sqrt{2q^3} & q \\ q & \sqrt{2q} \end{bmatrix}.$$
The controller is given by
$$K = Q_u^{-1} B^T P = \begin{bmatrix} q & \sqrt{2q} \end{bmatrix}.$$
The feedback law minimizing the given cost function is then u = −Kx.
To better understand the structure of the optimal solution, we exam-
ine the eigenstructure of the closed loop system. The closed-loop dynamics
matrix is given by
$$A_{cl} = A - BK = \begin{bmatrix} 0 & 1 \\ -q & -\sqrt{2q} \end{bmatrix}.$$
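The closed-form solution above can be checked numerically. The sketch below solves the algebraic Riccati equation (2.9) for the double integrator with q = 2 and compares the result against the expressions for P and K derived above.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

q = 2.0
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Qx = np.diag([q**2, 0.])
Qu = np.array([[1.]])

P = solve_continuous_are(A, B, Qx, Qu)    # algebraic Riccati equation (2.9)
K = np.linalg.inv(Qu) @ B.T @ P           # K = Qu^{-1} B^T P

print(P)   # expect [[sqrt(2 q^3), q], [q, sqrt(2 q)]] = [[4, 2], [2, 2]]
print(K)   # expect [[q, sqrt(2 q)]] = [[2, 2]]
```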
(a) Harrier “jump jet” (b) Simplified model
Figure 2.3: Vectored thrust aircraft. The Harrier AV-8B military aircraft (a)
redirects its engine thrust downward so that it can “hover” above the ground.
Some air from the engine is diverted to the wing tips to be used for maneuvering.
As shown in (b), the net thrust on the aircraft can be decomposed into a horizontal
force F1 and a vertical force F2 acting at a distance r from the center of mass.
(a) Step response in x and y (b) Effect of control weight ρ
Figure 2.4: Step response for a vectored thrust aircraft. The plot in (a) shows
the x and y positions of the aircraft when it is commanded to move 1 m in each
direction. In (b) the x motion is shown for control weights ρ = 1, 102 , 104 . A higher
weight of the input term in the cost function causes a more sluggish response.
(a) Step response in x and y (b) Inputs for the step response
Figure 2.5: Step response for a vector thrust aircraft using physically motivated
LQR weights (a). The rise time for x is much faster than in Figure 2.4a, but there
is a small oscillation and the inputs required are quite large (b).
loss in efficiency. This can be accounted for in the LQR weights by choosing
Q_x = diag{100, 1, 2π/9, 0, 0, 0},    Q_u = diag{10, 1}.
It follows from these equations that λ1 and λ3 are constant. To find the
input u corresponding to the extremal curves, we see from the Hamiltonian
that
u1 = −sgn(λ1 + λ3 x2),    u2 = −sgn(λ2).
These equations are well-defined as long as the arguments of sgn(·) are non-
zero and we get switching of the inputs when the arguments pass through
0.
An example of an abnormal extremal is the optimal trajectory from
x0 = (0, 0, 0) to xf = (ρ, 0, 0), where ρ > 0. The minimum time trajectory
is clearly given by moving on a straight line with u1 = 1 and u2 = 0. This
extremal satisfies the necessary conditions but with λ2 ≡ 0, so that the
“constraint” that ẋ2 = u2 is not strictly enforced through the Lagrange
multipliers. ∇
Exercises
2.1 (a) Let G1, G2, . . . , Gk be a set of row vectors on Rn. Let F be
another row vector on Rn such that for every x ∈ Rn satisfying
Gi x = 0, i = 1, . . . , k, we have F x = 0. Show that there are con-
stants λ1, λ2, . . . , λk such that
$$F = \sum_{i=1}^{k} \lambda_i G_i.$$
(c) (Optional) Find the input u to steer the system from (0, 0) to (0, Ỹ ) ∈
Rm × Rm×m where Ỹ T = −Ỹ .
(Hint: if you get stuck, there is a paper by Brockett on this problem.)
2.3 In this problem, you will use the maximum principle to show that the
shortest path between two points is a straight line. We model the problem
by constructing a control system
ẋ = u,
where x ∈ R2 is the position in the plane and u ∈ R2 is the velocity vector
along the curve. Suppose we wish to find a curve of minimal length con-
necting x(0) = x0 and x(1) = xf . To minimize the length, we minimize the
integral of the velocity along the curve,
$$J = \int_0^1 \|\dot x\|\, dt = \int_0^1 \sqrt{\dot x^T \dot x}\, dt,$$
subject to the initial and final state constraints. Use the maximum prin-
ciple to show that the minimal length path is indeed a straight line at max-
imum velocity. (Hint: try minimizing using the integral cost ẋT ẋ first and
then show this also optimizes the optimal control problem with integral cost
kẋk.)
2.4 Consider the optimal control problem for the system
ẋ = −ax + bu,
(a) Solve explicitly for the optimal control u∗ (t) and the corresponding
state x∗ (t) in terms of t0 , tf , x(t0 ) and t and describe what happens
to the terminal state x∗ (tf ) as c → ∞.
(b) Show that the system is differentially flat with appropriate choice of
output(s) and compute the state and input as a function of the flat
output(s).
(c) Using the polynomial basis {tk , k = 0, . . . , M −1} with an appropriate
choice of M , solve for the (non-optimal) trajectory between x(t0 ) and
x(tf ). Your answer should specify the explicit input ud (t) and state
xd (t) in terms of t0 , tf , x(t0 ), x(tf ) and t.
(d) Let a = 1 and c = 1. Use your solution to the optimal control prob-
lem and the flatness-based trajectory generation to find a trajectory
between x(0) = 0 and x(1) = 1. Plot the state and input trajectories
for each solution and compare the costs of the two approaches.
(e) (Optional) Suppose that we choose more than the minimal number of
basis functions for the differentially flat output. Show how to use the
additional degrees of freedom to minimize the cost of the flat trajec-
tory and demonstrate that you can obtain a cost that is closer to the
optimal.
Use the maximum principle to show that any optimal trajectory consists
of segments in which the robot is traveling at maximum velocity in either the
forward or reverse direction, and going either straight, hard left (ω = −M )
or hard right (ω = +M ).
Note: one of the cases is a bit tricky and cannot be completely proven
with the tools we have learned so far. However, you should be able to show
the other cases and verify that the tricky case is possible.
2.7 Consider a linear system with input u and output y and suppose we
wish to minimize the quadratic cost function
$$J = \int_0^\infty \big( y^T y + \rho\, u^T u \big)\, dt.$$
Show that if the corresponding linear system is observable, then the closed
loop system obtained by using the optimal feedback u = −Kx is guaranteed
to be stable.
2.8 Consider the system transfer function
$$H(s) = \frac{s+b}{s(s+a)}, \qquad a, b > 0,$$
with state space representation
$$\dot x = \begin{bmatrix} 0 & 1 \\ 0 & -a \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u, \qquad y = \begin{bmatrix} b & 1 \end{bmatrix} x,$$
and performance criterion
$$V = \int_0^\infty (x_1^2 + u^2)\, dt.$$
(a) Let
$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix},$$
with p12 = p21 and P > 0 (positive definite). Write the steady state
Riccati equation as a system of four explicit equations in terms of the
elements of P and the constants a and b.
(b) Find the gains for the optimal controller assuming the full state is
available for feedback.
(c) Find the closed loop natural frequency and damping ratio.
This set of notes builds on the previous two chapters and explores the use of
online optimization as a tool for control of nonlinear systems. We begin with
a high-level discussion of optimization-based control, refining some of the
concepts initially introduced in Chapter 1. We then describe the technique
of receding horizon control (RHC), including a proof of stability for a partic-
ular form of receding horizon control that makes use of a control Lyapunov
function as a terminal cost. We conclude the chapter with a detailed design
example, in which we can explore some of the computational tradeoffs in
optimization-based control.
The material in this chapter is based in part on joint work with John Hauser
and Ali Jadbabaie [MHJ+ 03].
Design approach
The basic philosophy that we propose is illustrated in Figure 3.1. We begin
with a nonlinear system, including a description of the constraint set. We
linearize this system about a representative equilibrium point and perform
a linear control design using standard control design tools. Such a design
can provide provably robust performance around the equilibrium point and,
more importantly, allows the designer to meet a wide variety of formal and
informal performance specifications through experience and the use of so-
phisticated linear design tools.
The resulting linear control law then serves as a specification of the de-
sired control performance for the entire nonlinear system. We convert the
control law specification into a receding horizon control formulation, chosen
such that for the linearized system, the receding horizon controller gives com-
parable performance. However, because of its use of optimization tools that
can handle nonlinearities and constraints, the receding horizon controller
is able to provide the desired performance over a much larger operating
envelope than the controller design based just on the linearization. Further-
more, by choosing cost formulations that have certain properties, we can
provide proofs of stability for the full nonlinear system and, in some cases,
the constrained system.
The advantage of the proposed approach is that it combines the ability
of humans to design sophisticated control laws in the absence of con-
straints with the power of computers to rapidly compute trajectories that
optimize a given cost function in the presence of constraints. New advances
in online trajectory generation serve as an enabler for this approach and
their demonstration on representative flight control experiments shows their
viability [MFHM05]. This approach can be extended to existing nonlinear
paradigms as well, as we describe in more detail below.
An advantage of optimization-based approaches is that they allow the
potential for online customization of the controller. By updating the model
that the optimization uses to reflect the current knowledge of the system
characteristics, the controller can take into account changes in parameters
values or damage to sensors or actuators. In addition, environmental models
that include dynamic constraints can be included, allowing the controller to
generate trajectories that satisfy complex operating conditions. These mod-
ifications allow for many state- and environment-dependent uncertainties to
be incorporated directly in the receding horizon feedback loop, providing
potential robustness with respect to those uncertainties.
A number of approaches in receding horizon control employ the use of
terminal state equality or inequality constraints, often together with a ter-
minal cost, to ensure closed loop stability. In Primbs et al. [PND99], aspects
of a stability-guaranteeing, global control Lyapunov function (CLF) were
used, via state and control constraints, to develop a stabilizing receding
horizon scheme. Many of the nice characteristics of the CLF controller to-
gether with better cost performance were realized. Unfortunately, a global
control Lyapunov function is rarely available and often does not exist.
Motivated by the difficulties in solving constrained optimal control prob-
lems, researchers have developed an alternative receding horizon control
strategy for the stabilization of nonlinear systems [JYH01]. In this approach,
closed loop stability is ensured through the use of a terminal cost consisting
of a control Lyapunov function (defined later) that is an incremental upper
bound on the optimal cost to go. This terminal cost eliminates the need
for terminal constraints in the optimization and gives a dramatic speed-up
in computation. Also, questions of existence and regularity of optimal solu-
tions (very important for online optimization) can be dealt with in a rather
straightforward manner.
Inverse Optimality
The philosophy presented here relies on the synthesis of an optimal control
problem from specifications that are embedded in an externally generated
controller design. This controller is typically designed by standard classical
control techniques for a nominal process, absent constraints. In this frame-
work, the controller’s performance, stability and robustness specifications
are translated into an equivalent optimal control problem and implemented
in a receding horizon fashion.
One central question that must be addressed when considering the use-
fulness of this philosophy is: Given a control law, how does one find an
equivalent optimal control formulation? The paper by Kalman [Kal64] lays
a solid foundation for this class of problems, known as inverse optimality.
In this paper, Kalman considers the class of linear time-invariant (LTI) pro-
cesses with full-state feedback and a single input variable, with an associated
cost function that is quadratic in the input and state variables. These as-
sumptions set up the well-known linear quadratic regulator (LQR) problem,
by now a staple of optimal control theory.
We note that the first equation is simply the normal algebraic Riccati equa-
tion of optimal control, but with PT , Q, and R yet to be chosen. The second
equation places additional constraints on R and PT .
Equation (3.4) is exactly the same equation that one would obtain if we
had considered an infinite time horizon problem, since the given control was
constant and hence P (t) was forced to be constant. This infinite horizon
problem is precisely the one that Kalman considered in 1964, and hence
his results apply directly. Namely, in the single-input single-output case, we
can always find a solution to the coupled equations (3.4) under standard
conditions on reachability and observability [Kal64]. The equations can be
simplified by substituting the second relation into the first to obtain
AT PT + PT A − K T RK + Q = 0.
This equation is linear in the unknowns and can be solved directly (remem-
bering that PT , Qx and Qu are required to be positive definite).
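Because the simplified equation is linear in PT, it can be solved as a Lyapunov equation once K, Q, and R are fixed. The sketch below does this for an illustrative second-order system using scipy; the particular A, K, Q, and R values are placeholders, not data from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative (placeholder) data: a given state feedback gain K for a
# second-order system, with Q and R chosen a priori.
A = np.array([[0., 1.], [-2., -3.]])
K = np.array([[1., 1.]])
Q = np.eye(2)
R = np.array([[1.]])

# A^T P_T + P_T A - K^T R K + Q = 0 is linear in P_T, so rewrite it as the
# Lyapunov equation  A^T P_T + P_T A = K^T R K - Q  and solve directly.
PT = solve_continuous_lyapunov(A.T, K.T @ R @ K - Q)

residual = A.T @ PT + PT @ A - K.T @ R @ K + Q
print(PT)
print(np.max(np.abs(residual)))   # should be near zero
```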
The implication of these results is that any state feedback control law
satisfying these assumptions can be realized as the solution to an appro-
priately defined receding horizon control law. Thus, we can implement the
design framework summarized in Figure 3.1 for the case where our (linear)
control design results in a state feedback controller.
The above results can be generalized to nonlinear systems, in which one
takes a nonlinear control system and attempts to find a cost function such
that the given controller is the optimal control with respect to that cost.
The history of inverse optimal control for nonlinear systems goes back to
the early work of Moylan and Anderson [MA73]. More recently, Sepulchre
et al. [SJK97] showed that a nonlinear state feedback obtained by Sontag’s
formula from a control Lyapunov function (CLF) is inverse optimal. The con-
nections of this inverse optimality result to passivity and robustness prop-
erties of the optimal state feedback are discussed in Jankovic et al. [JSK99].
Most results on inverse optimality do not consider the constraints on control
or state. However, the results on the unconstrained inverse optimality justify
the use of a more general nonlinear loss function in the integrated cost of
a finite horizon performance index combined with a real-time optimization-
based control approach that takes the constraints into account.
of this type.
The performance of the system will be measured by an integral cost
L : Rn × Rm → R. We require that L be twice differentiable (C 2 ) and fully
penalize both state and control according to
L(x, u) ≥ c_q(‖x‖² + ‖u‖²),    x ∈ Rn, u ∈ Rm
for some cq > 0 and L(0, 0) = 0. It follows that the quadratic approximation
of L at the origin is positive definite,
$$\frac{\partial^2 L}{\partial (x,u)^2}\bigg|_{(0,0)} \ge c_q I > 0.$$
where the control function u(·) belongs to some reasonable class of admissible
controls (e.g., piecewise continuous). The function J∞*(x) is often called the
optimal value function for the infinite horizon optimal control problem. For
the class of f and L considered, it can be verified that J∞*(·) is a positive
definite C² function in a neighborhood of the origin [HO01].
For practical purposes, we are interested in finite horizon approximations
of the infinite horizon optimization problem. In particular, let V (·) be a
nonnegative C 2 function with V (0) = 0 and define the finite horizon cost
(from x using u(·)) to be
$$J_T(x, u(\cdot)) = \int_0^T L(x^u(\tau; x), u(\tau))\, d\tau + V(x^u(T; x)) \qquad (3.8)$$
and denote the optimal cost (from x) as
$$J_T^*(x) = \inf_{u(\cdot)} J_T(x, u(\cdot)).$$
As in the infinite horizon case, one can show, by geometric means, that JT∗ (·)
is locally smooth (C 2 ). Other properties will depend on the choice of V and
T.
Let Γ∞ denote the domain of J∞*(·) (the subset of Rn on which J∞*
is finite). It is not too difficult to show that the sub-level sets
$$\Gamma^\infty_r := \{x \in \Gamma^\infty : J_\infty^*(x) \le r^2\}$$
are compact and path connected and moreover $\Gamma^\infty = \bigcup_{r \ge 0} \Gamma^\infty_r$. Note also
that Γ∞ may be a proper subset of Rn since there may be states that cannot
be driven to the origin. We use r2 (rather than r) here to reflect the fact that
our integral cost is quadratically bounded from below. We refer to sub-level
sets of JT∗ (·) and V (·) using
ΓTr := path connected component of {x ∈ Γ∞ : JT∗ (x) ≤ r2 } containing 0,
and
Ωr := path connected component of {x ∈ Rn : V (x) ≤ r2 } containing 0.
These results provide the technical framework needed for receding hori-
zon control.
two issues. The first is that the finite horizon optimizations must be solved
in a relatively short period of time. Second, it can be demonstrated using
linear examples that a naive application of the receding horizon strategy
can have undesirable effects, often rendering a system unstable. Various ap-
proaches have been proposed to tackle this second problem; see [MRRS00]
for a comprehensive review of this literature. The theoretical framework pre-
sented here also addresses the stability issue directly, but is motivated by
the need to relax the computational demands of existing stabilizing RHC
formulations.
Receding horizon control provides a practical strategy for the use of in-
formation from a model through on-line optimization. Every δ seconds, an
optimal control problem is solved over a T second horizon, starting from the
current state. The first δ seconds of the optimal control u∗T (·; x(t)) is then
applied to the system, driving the system from x(t) at current time t to
x∗T (δ, x(t)) at the next sample time t + δ (assuming no model uncertainty).
We denote this receding horizon scheme as RH(T, δ).
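The structure of the RH(T, δ) strategy can be sketched for an unconstrained, discrete-time linear system, for which each finite-horizon problem reduces to a backward Riccati recursion. In that special case the re-solve at every step is redundant (the first-step gain is the same each time), but the loop below mirrors the general scheme: solve from the current state, apply the first portion of the optimal input, and repeat. All numerical values are illustrative.

```python
import numpy as np

# Discrete-time double integrator (sampled at dt), used as a stand-in plant.
dt = 0.1
A = np.array([[1., dt], [0., 1.]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R, V = np.eye(2), np.array([[1.]]), 10 * np.eye(2)   # stage and terminal costs

def first_step_gain(A, B, Q, R, V, N):
    """Backward Riccati recursion for the N-step problem; returns the gain
    for the first step, which is all a receding horizon controller applies."""
    P = V
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A - A.T @ P @ B @ K
    return K

# Receding horizon loop: re-solve from the current state, apply the first input.
x = np.array([[1.0], [0.0]])
for _ in range(50):
    K0 = first_step_gain(A, B, Q, R, V, N=20)   # horizon T = N*dt
    u = -K0 @ x                                 # first delta seconds of u*_T
    x = A @ x + B @ u                           # plant update (no model error)
print(x.ravel())    # state moved toward the origin
```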
In defining (unconstrained) finite horizon approximations to the infinite
horizon problem, the key design parameters are the terminal cost function
V (·) and the horizon length T (and, perhaps also, the increment δ). We wish
to characterize the sets of choices that provide successful controllers.
It is well known (and easily demonstrated with linear examples), that
simple truncation of the integral (i.e., V (x) ≡ 0) may have disastrous effects
if T > 0 is too small. Indeed, although the resulting value function may be
nicely behaved, the “optimal” receding horizon closed loop system can be
unstable.
A more sophisticated approach is to make good use of a suitable terminal
cost V (·). Evidently, the best choice for the terminal cost is V (x) = J∞ ∗ (x)
since then the optimal finite and infinite horizon costs are the same. Of
course, if the optimal value function were available there would be no need
to solve a trajectory optimization problem. What properties of the optimal
value function should be retained in the terminal cost? To be effective, the
terminal cost should account for the discarded tail by ensuring that the
origin can be reached from the terminal state xu (T ; x) in an efficient manner
(as measured by L). One way to do this is to use an appropriate control
Lyapunov function, which is also an upper bound on the cost-to-go.
The following theorem shows that the use of a particular type of CLF is
in fact effective, providing rather strong and specific guarantees.
Theorem 3.1. [JYH01] Suppose that the terminal cost V (·) is a control
Lyapunov function such that
min_{u∈Rm} (V̇ + L)(x, u) ≤ 0        (3.9)
for each x ∈ Ωrv for some rv > 0. Then, for every T > 0 and δ ∈ (0, T ], the
resulting receding horizon trajectories go to zero exponentially fast. For each
Theorem 3.1 shows that for any horizon length T > 0 and any sampling
time δ ∈ (0, T ], the receding horizon scheme is exponentially stabilizing
over the set ΓTrv . For a given T , the region of attraction estimate is en-
larged by increasing r beyond rv to r̄(T ) according to the requirement that
V(x*T(T; x)) ≤ rv² on that set. An important feature of the above result is
that, for operation within the set ΓT_{r̄(T)}, there is no need to impose stability-
ensuring constraints, which would likely make the online optimizations more
difficult and time consuming to solve.
This expression says that the solution to the finite-horizon optimal control prob-
lem starting at time t = δ has a cost that is less than the cost of the solution
from time t = 0, with the initial portion of the cost subtracted off. In other
words, we are closer to our solution by a finite amount at each iteration
of the algorithm. It follows using Lyapunov analysis that we must converge
to the zero cost solution and hence our trajectory converges to the desired
terminal state (given by the minimum of the cost function).
To show equation (3.10) holds, consider a trajectory in which we apply
the optimal control for the first T seconds and then apply a closed loop
controller using a stabilizing feedback u = −k(x) for another T seconds. (The
stabilizing compensator is guaranteed to exist since V is a control Lyapunov
function.) Let (x∗T , u∗T )(t; x), t ∈ [0, T ] represent the optimal control and
(xk , uk )(t − T ; x∗T (T ; x)), t ∈ [T, 2T ] represent the control with u = −k(x)
applied where k satisfies (V̇ + L)(x, −k(x)) ≤ 0. Finally, let (x̃(t), ũ(t)),
t ∈ [0, 2T ] represent the trajectory obtained by concatenating the optimal
trajectory (x∗T , u∗T ) with the CLF trajectory (xk , uk ).
We now proceed to show that the inequality (3.10) holds. The cost of
using ũ(·) for the first T seconds starting from the initial state x*T(δ; x),
δ ∈ [0, T], is given by
$$\begin{aligned}
J_T(x_T^*(\delta; x), \tilde u(\cdot)) &= \int_\delta^{T+\delta} L(\tilde x(\tau), \tilde u(\tau))\, d\tau + V(\tilde x(T+\delta)) \\
&= J_T^*(x) - \int_0^\delta L(x_T^*(\tau; x), u_T^*(\tau; x))\, d\tau - V(x_T^*(T; x)) \\
&\qquad + \int_T^{T+\delta} L(\tilde x(\tau), \tilde u(\tau))\, d\tau + V(\tilde x(T+\delta)).
\end{aligned}$$
Note that the second line is simply a rewriting of the integral in terms of
the optimal cost JT* with the necessary additions and subtractions of the
additional portions of the cost for the interval [δ, T + δ]. We can now use the
bound
$$L(\tilde x(\tau), \tilde u(\tau)) \le -\dot V(\tilde x(\tau), \tilde u(\tau)), \qquad \tau \in [T, 2T],$$
which follows from the definition of the CLF V and stabilizing controller
k(x). This allows us to write
$$\begin{aligned}
J_T(x_T^*(\delta; x), \tilde u(\cdot)) &\le J_T^*(x) - \int_0^\delta L(x_T^*(\tau; x), u_T^*(\tau; x))\, d\tau - V(x_T^*(T; x)) \\
&\qquad - \int_T^{T+\delta} \dot V(\tilde x(\tau), \tilde u(\tau))\, d\tau + V(\tilde x(T+\delta)) \\
&= J_T^*(x) - \int_0^\delta L(x_T^*(\tau; x), u_T^*(\tau; x))\, d\tau - V(x_T^*(T; x)) \\
&\qquad - V(\tilde x(\tau))\Big|_T^{T+\delta} + V(\tilde x(T+\delta)) \\
&= J_T^*(x) - \int_0^\delta L(x_T^*(\tau; x), u_T^*(\tau; x))\, d\tau.
\end{aligned}$$
Finally, using the optimality of u∗T we have that JT∗ (x∗T (δ; x)) ≤ JT (x∗T (δ; x), ũ(·))
and we obtain equation (3.10).
where Bi,kj(t) is the B-spline basis function defined in [dB78] for the output
zj with order kj, Cij are the coefficients of the B-spline, lj is the number
of knot intervals, and mj is the number of smoothness conditions at the knots.
The set (z1, z2, . . . , zn−r) is thus represented by M = ∑_{j∈{1,r+1,...,n}} pj
coefficients.
In general, w collocation points are chosen uniformly over the time in-
terval [to , tf ] (though optimal knots placements or Gaussian points may
also be considered). Both dynamics and constraints will be enforced at the
collocation points. The problem can be stated as the following nonlinear
programming form:
$$\min_{y \in \mathbb{R}^M} F(y) \quad \text{subject to} \quad
\begin{cases} \Phi(z(y), \dot z(y), \ldots, z^{(n-r)}(y)) = 0 \\ lb \le c(y) \le ub \end{cases} \qquad (3.18)$$
where
$$y = (C_1^1, \ldots, C_{p_1}^1,\; C_1^{r+1}, \ldots, C_{p_{r+1}}^{r+1},\; \ldots,\; C_1^n, \ldots, C_{p_n}^n).$$
The coefficients of the B-spline basis functions can be found using nonlinear
programming.
A software package called Nonlinear Trajectory Generation (NTG) has
been written to solve optimal control problems in the manner described
above (see [MMM00] for details). The sequential quadratic programming
package NPSOL by [GMSW] is used as the nonlinear programming solver in
NTG. When specifying a problem to NTG, the user is required to state the
problem in terms of some choice of outputs and its derivatives. The user is
also required to specify the regularity of the variables, the placement of the
knot points, the order and regularity of the B-splines, and the collocation
points for each output.
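A minimal sketch of the kind of output parameterization NTG uses, written with scipy.interpolate.BSpline for a single output: the spline coefficients play the role of (part of) the decision variables y in (3.18), and the spline and its derivative are evaluated at uniformly spaced collocation points. The knot sequence, order, and coefficient values below are purely illustrative.

```python
import numpy as np
from scipy.interpolate import BSpline

# One flat output z(t) on [t0, tf], represented by a clamped B-spline.  The
# coefficients C play the role of (part of) the decision variable y in (3.18).
t0, tf, degree = 0.0, 1.0, 3
breakpoints = np.linspace(t0, tf, 5)                    # defines the knot intervals
knots = np.concatenate(([t0] * degree, breakpoints, [tf] * degree))
C = np.array([0., 0.1, 0.4, 0.8, 1.0, 1.0, 1.0])        # illustrative coefficients

z = BSpline(knots, C, degree)
zdot = z.derivative(1)            # derivatives are needed to evaluate Phi and c(y)

tau = np.linspace(t0, tf, 11)     # collocation points, uniform over [t0, tf]
print(z(tau))                     # output values at the collocation points
print(zdot(tau))                  # first derivative at the collocation points
```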
where FXa = D cos γ + L sin γ and FZa = −D sin γ + L cos γ are the aerody-
namic forces and FXb and FZb are thrust vectoring body forces in terms of
the lift (L), drag (D), and flight path angle (γ). Ip and Ω are the moment
of inertia and angular velocity of the ducted fan propeller, respectively. J is
the moment of inertia of the ducted fan and rf is the distance from the center of mass along
the Xb axis to the effective application point of the thrust vectoring force.
The angle of attack α can be derived from the pitch angle θ and the flight
path angle γ by
α = θ − γ.
The flight path angle can be derived from the spatial velocities by
γ = arctan(−ż/ẋ).
The lift (L), drag (D), and moment (M) are given by
L = qSCL(α),    D = qSCD(α),    M = c̄SCM(α),
respectively. The dynamic pressure is given by q = ½ρV². The norm of the
velocity is denoted by V, S is the surface area of the wings, and ρ is the
atmospheric density. The coefficients of lift (CL(α)), drag (CD(α)) and the
moment coefficient (CM (α)) are determined from a combination of wind
tunnel and flight testing and are described in more detail in [MM99], along
with the values of the other parameters.
Forward Flight
To obtain the forward flight test data, an operator commanded a desired
forward velocity and vertical position with joysticks. We set the trajectory
update time δ to 2 seconds. By rapidly changing the joysticks, NTG produces
high angle of attack maneuvers. Figure 3.4a depicts the reference trajec-
tories and the actual θ and ẋ over 60 s. Figure 3.4b shows the commanded
forces for the same time interval. The sequence of maneuvers corresponds to
the ducted fan transitioning from near hover to forward flight, then follow-
ing a command from a large forward velocity to a large negative velocity,
and finally returning to hover.
Figure 3.5 is an illustration of the ducted fan altitude and x position
for these maneuvers. The airfoil in the figure depicts the pitch angle (θ).
It is apparent from this figure that the stabilizing controller is not tracking
well in the z direction. This is due to the fact that unmodeled frictional
effects are significant in the vertical direction. This could be corrected with
an integrator in the stabilizing controller.
[Figure 3.4: Forward flight test case. (a) Desired and actual ẋ and θ versus time; (b) commanded forces fx and fz, shown together with the desired values and the input constraints.]
Figure 3.5: Forward flight test case: altitude and x position (actual (solid) and
desired (dashed)). Airfoil represents actual pitch angle (θ) of the ducted fan.
An analysis of the run times was performed for 30 trajectories; the aver-
age computation time was less than one second. Each of the 30 trajectories
converged to an optimal solution and each was between 4 and
12 seconds in length. A random initial guess was used for the first NTG
trajectory computation. Subsequent NTG computations used the previous
solution as an initial guess. Much improvement can be made in determining
a "good" initial guess, which would improve both convergence and computation
times.
controller described in Section 3.2. We also limit the operation of the system
to near hover, so that we can use the local linearization to find the terminal
CLF.
We have implemented the receding horizon controller on the ducted fan
experiment where the control objective is to stabilize the hover equilibrium
point. The quadratic cost is given by
L(x, u) = ½ x̂ᵀ Q x̂ + ½ ûᵀ R û,    V(x) = γ x̂ᵀ P x̂,        (3.20)
where
x̂ = x − xeq = (x, z, θ − π/2, ẋ, ż, θ̇),
û = u − ueq = (FXb − mg, FZb),
Q = diag{4, 15, 4, 1, 3, 0.3},
R = diag{0.5, 0.5}.
For the terminal cost, we choose γ = 0.075 and P is the unique stable
solution to the algebraic Riccati equation corresponding to the linearized
dynamics of equation (3.19) at hover and the weights Q and R. Note that
if γ = 1/2, then V (·) is the CLF for the system corresponding to the LQR
problem. Instead V is a relaxed (in magnitude) CLF, which achieved better
performance in the experiment. In either case, V is valid as a CLF only in a
neighborhood around hover since it is based on the linearized dynamics. We
do not try to compute off-line a region of attraction for this CLF. Experi-
mental tests omitting the terminal cost and/or the input constraints lead
to instability. The results in this section show the success of this choice for
V for stabilization. An inner-loop PD controller on θ, θ̇ is implemented to
stabilize to the receding horizon states θT∗ , θ̇T∗ . The θ dynamics are the fastest
for this system and although most receding horizon controllers were found
to be nominally stable without this inner-loop controller, small disturbances
could lead to instability.
The optimal control problem is set-up in NTG code by parameterizing
the three position states (x, z, θ), each with 8 B-spline coefficients. Over the
receding horizon time intervals, 11 and 16 breakpoints were used with hori-
zon lengths of 1, 1.5, 2, 3, 4 and 6 seconds. Breakpoints specify the locations
in time where the differential equations and any constraints must be satis-
fied, up to some tolerance. The value of FXb^max for the input constraints is
made conservative to avoid prolonged input saturation on the real hardware.
The logic for this is that if the inputs are saturated on the real hardware,
no actuation is left for the inner-loop θ controller and the system can go
unstable. The value used in the optimization is FXb^max = 9 N.
Computation time is non-negligible and must be considered when imple-
menting the optimal trajectories. The computation time varies with each
optimization as the current state of the ducted fan changes. The follow-
[Figure 3.6: Timing of receding horizon computations, showing the computed, applied, and unused portions of the optimal input trajectories u*T(i−1), u*T(i), and u*T(i+1): computation i occurs over [ti, ti+1] and the resulting trajectory is applied over [ti+1, ti+2].]
ing notational definitions will facilitate the description of how the timing is
set-up:
i Integer counter of RHC computations
ti Value of current time when RHC computation i started
δc (i) Computation time for computation i
u∗T (i)(t) Optimal output trajectory corresponding to computation
i, with time interval t ∈ [ti , ti + T ]
A natural choice for updating the optimal trajectories for stabilization is to
do so as fast as possible. This is achieved here by constantly resolving the
optimization. When computation i is done, computation i + 1 is immedi-
ately started, so ti+1 = ti + δc (i). Figure 3.6 gives a graphical picture of the
timing set-up as the optimal input trajectories u∗T (·) are updated. As shown
in the figure, any computation i for u∗T (i)(·) occurs for t ∈ [ti , ti+1 ] and the
resulting trajectory is applied for t ∈ [ti+1 , ti+2 ]. At t = ti+1 computation
i + 1 is started for trajectory u∗T (i + 1)(·), which is applied as soon as it is
available (t = ti+2 ). For the experimental runs detailed in the results, δc (i)
is typically in the range of [0.05, 0.25] seconds, meaning 4 to 20 optimal
control computations per second. Each optimization i requires the current
measured state of the ducted fan and the value of the previous optimal in-
put trajectories u∗T (i − 1) at time t = ti . This corresponds to, respectively,
6 initial conditions for state vector x and 2 initial constraints on the in-
put vector u. Figure 3.6 shows that the optimal trajectories are advanced
by their computation time prior to application to the system. A dashed line
corresponds to the initial portion of an optimal trajectory and is not applied
since it is not available until that computation is complete. The figure also
reveals the possible discontinuity between successive applied optimal input
trajectories, with a larger discontinuity more likely for longer computation
times.
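The timing bookkeeping described above can be sketched as follows: computation i runs on [ti, ti+1] with ti+1 = ti + δc(i), and the trajectory it produces is applied on [ti+1, ti+2]. The computation times below are random draws from the 0.05-0.25 second range reported above, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bookkeeping for the timing scheme: computation i runs on [t_i, t_{i+1}]
# with t_{i+1} = t_i + delta_c(i), and its trajectory u*_T(i) is applied on
# [t_{i+1}, t_{i+2}].  The computation times are illustrative random draws.
t = [0.0]
schedule = []
for i in range(8):
    delta_c = rng.uniform(0.05, 0.25)       # computation time delta_c(i)
    t.append(t[-1] + delta_c)               # t_{i+1} = t_i + delta_c(i)
    if i >= 1:
        # while computation i runs, the previously computed trajectory is applied
        schedule.append((t[i], t[i + 1], i - 1))

for start, end, idx in schedule:
    print(f"apply u*_T({idx}) on [{start:.2f}, {end:.2f}]")
```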
(a) Average run time (b) Step responses
Figure 3.7: Receding horizon control: (a) moving one second average of compu-
tation time for RHC implementation with varying horizon time, (b) response of
RHC controllers to 6 meter offset in x for different horizon lengths.
seconds, the fan is still far from the desired hover position and the terminal
cost CLF is large, likely far from its region of attraction. Figure 3.7b shows
the measured x response for these different controllers, exhibiting a rise time
of 8–9 seconds independent of the controller. So a horizon time closer to the
rise time results in a more feasible optimization in this case.
Exercises
3.1
3.2 Consider a nonlinear control system
ẋ = f (x, u)
with linearization
ẋ = Ax + Bu.
Show that if the linearized system is reachable, then there exists a (local)
control Lyapunov function for the nonlinear system. (Hint: start by proving
the result for a stable system.)
3.3 In this problem we will explore the effect of constraints on control of
the linear unstable system given by
ẋ1 = 0.8x1 − 0.5x2 + 0.5u
ẋ2 = x1 + 0.5u
subject to the constraint that |u| ≤ a where a is a positive constant.
(a) Ignore the constraint (a = ∞) and design an LQR controller to stabi-
lize the system. Plot the response of the closed loop system from the initial
condition given by x = (1, 0).
(b) Use SIMULINK or ode45 to simulate the system for some finite value
of a with an initial condition x(0) = (1, 0). Numerically (trial and
error) determine the smallest value of a for which the system goes
unstable.
(c) Let amin (ρ) be the smallest value of a for which the system is unstable
from x(0) = (ρ, 0). Plot amin (ρ) for ρ = 1, 4, 16, 64, 256.
(d) Optional: Given a > 0, design and implement a receding horizon con-
trol law for this system. Show that this controller has larger region
of attraction than the controller designed in part (b). (Hint: solve the
finite horizon LQ problem analytically, using the bang-bang example
as a guide to handle the input constraint.)
3.4
i. Solve for u∗ (t) = −bP x∗ (t) where P is the positive solution corre-
sponding to the algebraic Riccati equation. Note that this gives an
explicit feedback law (u = −bP x).
ii. Plot the state solution of the finite time optimal controller for the
following parameter values
a=2 b = 0.5 x(t0 ) = 4
c = 0.1, 10 tf = 0.5, 1, 10
(This should give you a total of 6 curves.) Compare these to the infinite
time optimal control solution. Which finite time solution is closest to
the infinite time solution? Why?
Using the solution given in equation (2.5), implement the finite-time op-
timal controller in a receding horizon fashion with an update time of δ = 0.5.
Using the parameter values in part (b), compare the responses of the reced-
ing horizon controllers to the LQR controller you designed for problem 1,
from the same initial condition. What do you observe as c and tf increase?
(Hint: you can write a MATLAB script to do this by performing the
following steps:
(i) set t0 = 0
(ii) using the closed form solution for x∗ from problem 1, plot x(t), t ∈
[t0 , tf ] and save xδ = x(t0 + δ)
(iii) set x(t0 ) = xδ and repeat step (ii) until x is small.)
3.6
3.7 In this problem we will explore the effect of constraints on control of
the linear unstable system given by
ẋ1 = 0.8x1 − 0.5x2 + 0.5u, ẋ2 = x1 + 0.5u,
Ω can be any set, either with a finite, countable or infinite number of ele-
ments. The event space F consists of subsets of Ω. There are some mathemat-
ical limits on the properties of the sets in F, but these are not critical for our
purposes here. The probability measure P is a mapping P : F → [0, 1]
that assigns a probability to each event. It must satisfy the property that
given any two disjoint sets A, B ∈ F, P(A ∪ B) = P(A) + P(B).
With these definitions, we can model many different stochastic phenom-
ena. Given a probability space, we can choose samples ω ∈ Ω and identify
each sample with a collection of events chosen from F. These events should
correspond to phenomena of interest and the probability measure P should
capture the likelihood of that event occurring in the system that we are
modeling. This definition of a probability space is very general and allows
us to consider a number of situations as special cases.
A random variable X is a function X : Ω → S that gives a value in S, called the state space, for any sample ω ∈ Ω. Given a subset A ⊂ S, we can write the probability that X ∈ A as
P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}).
When S is a discrete set, we write p_X(s) = P(X = s) for the probability of the individual states. The sum of the probabilities over the entire set of states must be unity, and so we have that
\sum_{s ∈ S} p_X(s) = 1.
For example, for a binary random variable with S = {0, 1} we might have
P(X = 1) = p,    P(X = 0) = 1 − p.
Figure 4.2: Probability density function (pdf) for uniform, Gaussian, and exponential distributions.
for X and Y :
P(A ∩ B) = \int_{x_l}^{x_u} \Big( \int_{z_l − x}^{z_u − x} p_Y(y)\, dy \Big)\, p_X(x)\, dx
 = \int_{x_l}^{x_u} \int_{z_l}^{z_u} p_Y(z − x)\, p_X(x)\, dz\, dx =: \int_{z_l}^{z_u} \int_{x_l}^{x_u} p_{Z,X}(z, x)\, dx\, dz.
Using Gaussians for X and Y we have
p_{Z,X}(z, x) = \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}(z − x − µ_Y)^2} \cdot \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}(x − µ_X)^2}
 = \frac{1}{2π} e^{−\frac{1}{2}\big[(z − x − µ_Y)^2 + (x − µ_X)^2\big]}.
A similar expression holds for pZ,Y . ∇
Given a random variable X, we can define various standard measures of
the distribution. The expectation or mean of a random variable is defined as
E(X) = ⟨X⟩ = \int_{−∞}^{∞} x\, p(x)\, dx,
and the mean square of a random variable is
E(X^2) = ⟨X^2⟩ = \int_{−∞}^{∞} x^2\, p(x)\, dx.
Proof. The first property follows from the definition of mean and variance:
E(αX) = \int_{−∞}^{∞} αx\, p(x)\, dx = α \int_{−∞}^{∞} x\, p(x)\, dx = α E(X),
E((αX)^2) = \int_{−∞}^{∞} (αx)^2\, p(x)\, dx = α^2 \int_{−∞}^{∞} x^2\, p(x)\, dx = α^2 E(X^2).
The second property follows similarly, remembering that we must take the
expectation using the joint distribution (since we are evaluating a function
of two random variables):
E(αX + βY) = \int_{−∞}^{∞} \int_{−∞}^{∞} (αx + βy)\, p_{X,Y}(x, y)\, dx\, dy
 = α \int_{−∞}^{∞} \int_{−∞}^{∞} x\, p_{X,Y}(x, y)\, dx\, dy + β \int_{−∞}^{∞} \int_{−∞}^{∞} y\, p_{X,Y}(x, y)\, dx\, dy
 = α E(X) + β E(Y).
In the case that W [k] is a Gaussian with mean zero and (stationary) standard
deviation σ, then E(W [k]W [l]) = σ 2 δ(k − l).
We next wish to describe the evolution of the state x in equation (4.11)
in the case when W is a random variable. In order to do this, we describe
the state x as a sequence of random variables X[k], k = 1, · · · , N . Looking
back at equation (4.11), we see that even if W [k] is an uncorrelated sequence
of random variables, then the states X[k] are not uncorrelated since
To capture the relationship between the current state and the future state,
we define the correlation function for a random process as
ρ(k_1, k_2) := E(X[k_1] X[k_2]) = \int_{−∞}^{∞} \int_{−∞}^{∞} x_1 x_2\, p(x_1, x_2; k_1, k_2)\, dx_1\, dx_2.
E(X[k_1] X[k_2]) = E\Big\{ \sum_{i=0}^{k_1−1} A^{k_1−i} B W[i] \, \sum_{j=0}^{k_2−1} A^{k_2−j} B W[j] \Big\}
 = E\Big\{ \sum_{i=0}^{k_1−1} \sum_{j=0}^{k_2−1} A^{k_1−i} B\, W[i] W[j]\, B A^{k_2−j} \Big\}.
We can now use the linearity of the expectation operator to pull it inside the summations:
E(X[k_1] X[k_2]) = \sum_{i=0}^{k_1−1} \sum_{j=0}^{k_2−1} A^{k_1−i} B\, E(W[i] W[j])\, B A^{k_2−j}.
If W is uncorrelated in time, only the terms with i = j survive, and the remaining sum is proportional to A^d, where d = k_2 − k_1 is the separation between the two sample times.
In particular, if the discrete time system is stable then |A| < 1 and the
correlation function decays as we take points that are farther apart in
time (d large). Furthermore, if we let k → ∞ (i.e., look at the steady state
solution) then the correlation function only depends on d (assuming the sum
converges) and hence the steady state random process is stationary.
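As a quick numerical check (with placeholder values for A, B, and the noise variance, chosen only for illustration), one can simulate such a scalar system in MATLAB and compare the empirical correlation with the predicted geometric decay in d:

% Simulate x[k+1] = A x[k] + B w[k] with white noise w and estimate the
% steady-state correlation; it should decay approximately as A^d.
A = 0.8;  B = 1;  Rw = 1;  N = 1e5;
w = sqrt(Rw)*randn(1, N);
x = zeros(1, N);
for k = 1:N-1
  x(k+1) = A*x(k) + B*w(k);
end
d = 0:10;
rho = arrayfun(@(dd) mean( x(1:end-dd) .* x(1+dd:end) ), d);
plot(d, rho, 'o', d, rho(1)*A.^d, '-');       % empirical versus A^d scaling
xlabel('separation d');  ylabel('correlation');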
In our derivation so far, we have assumed that X[k + 1] only depends on
the value of the state at time k (this was implicit in our use of equation (4.11)
and the assumption that W [k] is independent of X). This particular assump-
tion is known as the Markov property for a random process: a Markovian
process is one in which the distribution of possible values of the state at time
k depends only on the values of the state at the prior time and not earlier.
Written more formally, we say that a discrete random process is Markovian
if
pX,k (x | X[k − 1], X[k − 2], . . . , X[0]) = pX,k (x | X[k − 1]).
Note that the state of a random process is not enough to determine the exact
next state, but only the distribution of next states (otherwise it would be a
deterministic process). We typically omit indexing of the individual states
unless the meaning is not clear from context.
We can characterize the dynamics of a random process by its statistical characteristics, written in terms of joint probability density functions of the form
P\big( X(t_i) ∈ [x_i, x_i + dx_i],\; i = 1, \dots, n \big) = p(x_n, x_{n−1}, \dots, x_0;\, t_n, t_{n−1}, \dots, t_0)\, dx_n\, dx_{n−1} \cdots dx_0,
where the dx_i are taken as infinitesimal quantities.
Just as in the case of discrete time processes, we define a continuous time
random process to be a Markov process if the probability of being in a given
state at time tn depends only on the state that we were in at the previous
time instant tn−1 and not the entire history of states prior to tn−1 :
P\big( X(t_n) ∈ [x_n, x_n + dx_n] \mid X(t_i) ∈ [x_i, x_i + dx_i],\; i = 1, \dots, n − 1 \big)
 = P\big( X(t_n) ∈ [x_n, x_n + dx_n] \mid X(t_{n−1}) ∈ [x_{n−1}, x_{n−1} + dx_{n−1}] \big).   (4.12)
Figure 4.3: Correlation function ρ(t_1 − t_2) for a first-order Markov process, plotted as a function of τ = t_1 − t_2.
Note on terminology. The terminology and notation for covariance and cor-
relation varies between disciplines. The term covariance is often used to
refer to both the relationship between different variables X and Y and the
relationship between a single variable at different times, X(t) and X(s).
The term “cross-covariance” is used to refer to the covariance between two
random vectors X and Y , to distinguish this from the covariance of the
elements of X with each other. The term “cross-correlation” is sometimes
also used. Finally, the term “correlation coefficient” refers to the normalized
correlation r̄(t, s) = E(X(t)X(s))/E(X(t)X(t)).
Note here that we have relied on the linearity of the convolution integral to
pull the expectation inside the integral.
We can compute the covariance of the output by computing the correla-
tion r(τ ) and setting σ 2 = r(0). The correlation function for y is
r_Y(t, s) = E(Y(t) Y(s)) = E\Big( \int_0^t h(t − η) W(η)\, dη \cdot \int_0^s h(s − ξ) W(ξ)\, dξ \Big)
 = E\Big( \int_0^t \int_0^s h(t − η)\, W(η) W(ξ)\, h(s − ξ)\, dη\, dξ \Big).
If this integral exists, then we can compute the second order statistics for
the output Y .
We can provide a more explicit formula for the correlation function r in
terms of the matrices A, F and C by expanding equation (4.14). We will
consider the general case where W ∈ Rm and Y ∈ Rp and use the correlation
matrix R(t, s) instead of the correlation function r(t, s). Define the state
Figure 4.4: Spectral power density for a first-order Markov process (log S(ω) versus log ω, with the corner at ω_0).
The power spectral density provides an indication of how quickly the values
of a random process can change through the frequency content: if there
is high frequency content in the power spectral density, the values of the
random variable can change quickly in time.
Example 4.4 First-order Markov process
To illustrate the use of these measures, consider a first-order Markov process
as defined in Example 4.7. The correlation function is
ρ(τ) = \frac{Q}{2ω_0} e^{−ω_0 |τ|}.
The power spectral density becomes
S(ω) = \int_{−∞}^{∞} \frac{Q}{2ω_0} e^{−ω_0 |τ|} e^{−jωτ}\, dτ
 = \int_{−∞}^{0} \frac{Q}{2ω_0} e^{(ω_0 − jω)τ}\, dτ + \int_{0}^{∞} \frac{Q}{2ω_0} e^{(−ω_0 − jω)τ}\, dτ = \frac{Q}{ω^2 + ω_0^2}.
We see that the power spectral density is similar to a transfer function and
we can plot S(ω) as a function of ω in a manner similar to a Bode plot,
as shown in Figure 4.4. Note that although S(ω) has a form similar to a
transfer function, it is a real-valued function and is not defined for complex
s. ∇
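For instance, the following MATLAB fragment plots S(ω) = Q/(ω² + ω₀²) on logarithmic axes in the spirit of a Bode magnitude plot; Q and ω₀ are placeholder values chosen only to make the example concrete.

% Plot the spectral density of the first-order Markov process on log-log axes.
Q = 1;  w0 = 1;                      % placeholder parameter values
w = logspace(-2, 2, 200);
S = Q ./ (w.^2 + w0^2);
loglog(w, S);  grid on;
xlabel('\omega');  ylabel('S(\omega)');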
Using the power spectral density, we can more formally define “white
noise”: a white noise process is a zero-mean random process with power
spectral density S(ω) = W = constant for all ω. If X(t) ∈ Rn (a random
vector), then W ∈ Rn×n . We see that a random process is white if all
frequencies are equally represented in its power spectral density; this spectral
property is the reason for the terminology “white”. The following proposition
verifies that this formal definition agrees with our previous (time domain)
definition.
Proposition 4.4. For a white noise process,
ρ(τ) = \frac{1}{2π} \int_{−∞}^{∞} S(ω) e^{jωτ}\, dω = W δ(τ).
Figure 4.5 summarizes the relationship between the time and frequency domains: for the linear system Ẋ = AX + F V, Y = CX driven by white noise with ρ_V(τ) = R_V δ(τ), the output correlation is ρ_Y(τ) = R_Y(τ) = C P e^{−A|τ|} C^T, where P satisfies AP + PA^T + F R_V F^T = 0.
Exercises
4.1 Let Z be a random variable that is the sum of two independent
normally (Gaussian) distributed random variables X1 and X2 having means
m1 , m2 and variances σ12 , σ22 respectively. Show that the probability density
function for Z is
p(z) = \frac{1}{2πσ_1σ_2} \int_{−∞}^{∞} \exp\Big\{ −\frac{(z − x − m_1)^2}{2σ_1^2} − \frac{(x − m_2)^2}{2σ_2^2} \Big\}\, dx
and confirm that this is normal (Gaussian) with mean m1 +m2 and variance
σ12 + σ22 . (Hint: Use the fact that p(z|x2 ) = pX1 (x1 ) = pX1 (z − x2 ).)
4.2 (ÅM08, Exercise 7.13) Consider the motion of a particle that is under-
going a random walk in one dimension (i.e., along a line). We model the
position of the particle as
x[k + 1] = x[k] + u[k],
where x is the position of the particle and u is a white noise process with
E(u[i]) = 0 and E(u[i] u[j]) = R_u δ(i − j). We assume that we can measure
x subject to additive, zero-mean, Gaussian white noise with covariance 1.
Show that the expected value of the particle as a function of k is given by
E(x[k]) = E(x[0]) + \sum_{i=0}^{k−1} E(u[i]) = E(x[0]) =: µ_x
(a) Compute the correlation function ρ(τ ) for the output of the system.
Your answer should be an explicit formula in terms of a, b and σ.
(b) Assuming that the input transients have died out, compute the mean
and variance of the output.
4.4 Find a constant matrix A and vectors F and C such that for
Ẋ = AX + F W, Y = CX
the power spectrum of Y is given by
S(ω) = \frac{1 + ω^2}{(1 − 7ω^2)^2 + 1}.
Describe the sense in which your answer is unique.
Chapter Five
Kalman Filtering
In this chapter we derive the optimal estimator for a linear system in continuous time (also referred to as the Kalman-Bucy filter). This estimator minimizes the estimation error covariance and can be implemented as a recursive filter.
Prerequisites. Readers should have basic familiarity with continuous-time
stochastic systems at the level presented in Chapter 4.
Theorem 5.1 (Kalman-Bucy, 1961). The optimal estimator has the form
of a linear observer
dX̂/dt = AX̂ + BU + L(Y − C X̂)
where L(t) = P (t)C T Rv−1 and P (t) = E{(X(t) − X̂(t))(X(t) − X̂(t))T }
satisfies
Ṗ = AP + P AT − P C T Rv−1 (t)CP + F RW (t)F T ,
P (0) = E{X(0)X T (0)}.
Sketch of proof. The error dynamics are given by
Ė = (A − LC)E + ξ, ξ = F W − LV, Rξ = F RW F T + LRv LT
The covariance matrix PE = P for this process satisfies
Ṗ = (A − LC)P + P(A − LC)^T + F R_W F^T + L R_v L^T
 = AP + PA^T + F R_W F^T − LCP − PC^T L^T + L R_v L^T
 = AP + PA^T + F R_W F^T + (L R_v − PC^T) R_v^{−1} (L R_v − PC^T)^T − PC^T R_v^{−1} CP,
where the last line follows by completing the square. We need to find L such
that P (t) is as small as possible, which can be done by choosing L so that
Ṗ decreases by the maximum amount possible at each instant in time. This
is accomplished by setting
LRv = P C T =⇒ L = P C T Rv−1 .
Note that the Kalman filter has the form of a recursive filter: given P(t) = E{E(t)E^T(t)} at time t, we can compute how the estimate and covariance change. Thus we do not need to keep track of old values of the output. Furthermore, the Kalman filter gives the estimate X̂(t) and the covariance P_E(t), so we can see how well the error is converging.
If the noise is stationary (RW , RV constant) and if the dynamics for P (t)
are stable, then the observer gain converges to a constant and satisfies the
algebraic Riccati equation:
L = P C^T R_v^{−1},    AP + PA^T − P C^T R_v^{−1} C P + F R_W F^T = 0.
This is the most commonly used form of the estimator since it gives an explicit formula for the estimator gains that minimize the error covariance. The gain matrix for this case can be computed using the lqe command in MATLAB.
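As an illustration of the MATLAB interface, the fragment below computes the steady-state gain for a hypothetical second-order system; the matrices and noise intensities are placeholder values, not an example from the text.

% Steady-state Kalman-Bucy gain via lqe for a hypothetical system.
A = [0 1; -2 -3];  F = [0; 1];  C = [1 0];
RW = 0.1;          % process noise intensity
RV = 0.01;         % measurement noise intensity
[L, P] = lqe(A, F, C, RW, RV);   % L and P satisfy the Riccati equation above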
Another property of the Kalman filter is that it extracts the maximum
possible information about output data. To see this, consider the residual
random process
R = Y − C X̂
(this process is also called the innovations process). It can be shown for the Kalman filter that the correlation matrix of R is given by
R_R(t_1, t_2) = W δ(t_1 − t_2)
for some constant matrix W. This implies that the residuals are a white noise process and so the output
error has no remaining dynamic information content.
Ẋ = f (X, U, W ), X ∈ Rn , U ∈ Rm ,
Y = CX + V, Y ∈ Rp ,
where W and V are Gaussian white noise processes with covariance matrices
RW and RV . A nonlinear observer for the system can be constructed by using
the process
dX̂/dt = f(X̂, U, 0) + L(Y − C X̂).
The estimation error E = X − X̂ then satisfies Ė = F(E, X̂, U, W) − L(CE + V), where
F (E, X̂, U, W ) = f (E + X̂, U, W ) − f (X̂, U, 0)
Ė = \frac{∂F}{∂E} E + \underbrace{F(0, X̂, U, 0)}_{=0} + \underbrace{\frac{∂F}{∂W} W}_{\text{noise}} − \underbrace{LCE}_{\text{observer gain}} + \text{h.o.t.}
 ≈ Ã E + F̃ W − LCE,
where the linearizations Ã = ∂F/∂E and F̃ = ∂F/∂W, evaluated at E = 0 and W = 0, depend on the current estimate X̂. We can now design an observer for the linearized system around the current estimate:
dX̂/dt = f(X̂, U, 0) + L(Y − C X̂),    L = P C^T R_v^{−1},
Ṗ = (Ã − LC)P + P(Ã − LC)^T + F̃ R_W F̃^T + L R_v L^T,
P(t_0) = E{X(t_0) X^T(t_0)}.
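A minimal simulation sketch of this extended Kalman filter, using Euler integration on a hypothetical damped pendulum; the system, step size, and covariances below are illustrative assumptions and not taken from the text.

% Extended Kalman filter sketch: xdot = [x2; -sin(x1) - d*x2 + w], y = x1 + v.
d = 0.1;  C = [1 0];  RW = 0.01;  Rv = 0.01;  dt = 0.01;  Tf = 10;
f    = @(x, w) [x(2); -sin(x(1)) - d*x(2) + w];
dfdx = @(x) [0 1; -cos(x(1)) -d];      % A-tilde, evaluated at the current estimate
Ft   = [0; 1];                         % F-tilde (noise enters through the second state)

x = [1; 0];  xhat = [0; 0];  P = eye(2);
for k = 1:round(Tf/dt)
  w = sqrt(RW/dt)*randn;  v = sqrt(Rv/dt)*randn;   % approximate continuous white noise
  x = x + dt*f(x, w);  y = C*x + v;    % simulate the true system and measurement
  At = dfdx(xhat);  L = P*C'/Rv;       % observer gain L = P C^T Rv^(-1)
  xhat = xhat + dt*( f(xhat, 0) + L*(y - C*xhat) );
  P = P + dt*( (At - L*C)*P + P*(At - L*C)' + Ft*RW*Ft' + L*Rv*L' );
end
disp([x xhat]);                        % compare final true state and estimate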
Ẋ = A(ξ)X + B(ξ)U + F W, ξ ∈ Rp ,
Y = C(ξ)X + V.
We wish to solve the parameter identification problem: given U (t) and Y (t),
estimate the value of the parameters ξ.
One approach to this online parameter estimation problem is to treat ξ
as an unknown state that has zero derivative:
Ẋ = A(ξ)X + B(ξ)U + F W, ξ˙ = 0.
We can now write the dynamics in terms of the extended state Z = (X, ξ):
\frac{d}{dt} \begin{bmatrix} X \\ ξ \end{bmatrix} = \underbrace{\begin{bmatrix} A(ξ) & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} X \\ ξ \end{bmatrix} + \begin{bmatrix} B(ξ) \\ 0 \end{bmatrix} U + \begin{bmatrix} F \\ 0 \end{bmatrix} W}_{f((X,\,ξ),\, U,\, W)},
Y = \underbrace{C(ξ) X + V}_{h((X,\,ξ),\, W)}.
This system is nonlinear in the extended state Z, but we can use the ex-
tended Kalman filter to estimate Z. If this filter converges, then we obtain
both an estimate of the original state X and an estimate of the unknown
parameter ξ ∈ Rp .
Remark: various observability conditions are needed on the augmented system in order for this approach to work. ∇
Ẋ = AX + BU + F W,
Y = CX + V,
where W and V are Gaussian white noise processes with covariances R_W and R_V. Stochastic control problem: find a controller C(s) to minimize
J = E\Big\{ \int_0^∞ \big( (Y − r)^T R_W (Y − r) + U^T R_v U \big)\, dt \Big\}.
Assume for simplicity that the reference input r = 0 (otherwise, translate
the state accordingly).
Theorem 5.2 (Separation principle). The optimal controller has the form
dX̂/dt = AX̂ + BU + L(Y − C X̂)
U = K(X̂ − Xd )
where L is the optimal observer gain ignoring the controller and K is the
optimal controller gain ignoring the noise.
This is called the separation principle (for H2 control).
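A minimal MATLAB sketch of this separation structure, on a hypothetical double integrator with placeholder weights and noise covariances (and r = 0); it is an illustration of the idea, not the design used in the example that follows.

% LQG design via the separation principle: LQR gain K plus Kalman gain L.
A = [0 1; 0 0];  B = [0; 1];  F = [0; 1];  C = [1 0];
Qx = eye(2);  Qu = 1;               % LQR weights (placeholders)
RW = 0.1;  RV = 0.01;               % noise covariances (placeholders)

K = lqr(A, B, Qx, Qu);              % optimal state feedback, ignoring the noise (u = -K*x)
L = lqe(A, F, C, RW, RV);           % optimal observer gain, ignoring the controller

% Output feedback compensator: xhatdot = (A - B*K - L*C) xhat + L y, u = -K xhat
compensator = ss(A - B*K - L*C, L, -K, 0);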
% System parameters
J = 0.0475; % inertia around pitch axis
m = 1.5; % mass of fan
r = 0.25; % distance to flaps
g = 10; % gravitational constant
d = 0.2; % damping factor (estimated)
0 0 0 0 1 0;
0 0 0 0 0 1;
0, 0, (-F(1)*sin(xhat(3)) - F(2)*cos(xhat(3)))/m, -d, 0, 0;
0, 0, (F(1)*cos(xhat(3)) - F(2)*sin(xhat(3)))/m, 0, -d, 0;
0 0 0 0 0 0 ];
% PVTOL dynamics
deriv(1) = x(4); deriv(2) = x(5); deriv(3) = x(6);
deriv(4) = (F(1)*cos(x(3)) - F(2)*sin(x(3)) - d*x(4) + fd(1)) / m;
deriv(5) = (F(1)*sin(x(3)) + F(2)*cos(x(3)) - m*g - d*x(5) + fd(2)) / m;
deriv(6) = (F(1) * r + fd(3)) / J;
% All done
return;
To show how this estimator can be used, consider the problem of stabiliz-
ing the system to the origin with an LQR controller that uses the estimated
state. The following MATLAB code implements the controller, using the
previous simulation:
% kf_dfan.m - Kalman filter for the ducted fan
% RMM, 5 Feb 06
%%
%% Ducted fan dynamics
%%
%% These are the dynamics for the ducted fan, written in state space
%% form.
%%
% System parameters
J = 0.0475; % inertia around pitch axis
m = 1.5; % mass of fan
r = 0.25; % distance to flaps
g = 10; % gravitational constant
d = 0.2; % damping factor (estimated)
B = [ 0 0;
0 0;
0 0;
1/m 0;
0 1/m;
r/J 0 ];
C = [ 1 0 0 0 0 0;
0 1 0 0 0 0 ];
D = [ 0 0; 0 0];
%%
%% State space control design
%%
%% We use an LQR design to choose the state feedback gains
%%
%%
%% Estimator #1
%%
subplot(321);
plot(t1, y1(:,1), 'b-', t1, y1(:,2), 'g--');
legend x y;
xlabel('time');
ylabel('States (no \theta)');
axis([0 15 -0.3 0.3]);
subplot(323);
plot(t1, y1(:,7) - y1(:,1), 'b-', ...
t1, y1(:,8) - y1(:,2), 'g--', ...
t1, y1(:,9) - y1(:,3), 'r-');
legend xerr yerr terr;
xlabel('time');
ylabel('Errors (no \theta)');
axis([0 15 -0.2 0.2]);
subplot(325);
plot(t1, y1(:,13), 'b-', t1, y1(:,19), 'g--', t1, y1(:,25), 'r-');
legend P11 P22 P33
xlabel('time');
ylabel('Covariance (no \theta)');
axis([0 15 -0.2 0.2]);
%%
%% Estimator #2
%%
% Now change the output and see what happens (L computed but not used)
pvtol_C = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0];
pvtol_Rw = diag([R11 R22 R33]);
pvtol_L = lqe(A, pvtol_F, pvtol_C, pvtol_Rv, pvtol_Rw);
subplot(324);
plot(t2, y2(:,7) - y2(:,1), 'b-', ...
t2, y2(:,8) - y2(:,2), 'g--', ...
subplot(326);
plot(t2, y2(:,13), 'b-', t2, y2(:,19), 'g--', t2, y2(:,25), 'r-');
legend P11 P22 P33
xlabel('time');
ylabel('Covariance (w/ \theta)');
axis([0 15 -0.2 0.2]);
Exercises
5.1 Consider the problem of estimating the position of an autonomous
mobile vehicle using a GPS receiver and an IMU (inertial measurement
unit). The dynamics of the vehicle are given by
ẋ = cos θ v,    ẏ = sin θ v,    θ̇ = (v/ℓ) tan φ.
We assume that the vehicle is disturbance free, but that we have noisy
measurements from the GPS receiver and IMU and an initial condition error.
In this problem we will utilize the full form of the Kalman filter (including
the Ṗ equation).
(a) Suppose first that we only have the GPS measurements for the xy
position of the vehicle. These measurements give the position of the
vehicle with approximately 1 meter accuracy. Model the GPS error
as Gaussian white noise with σ = 1.2 meter in each direction and
design an optimal estimator for the system. Plot the estimated states
and the covariances for each state starting with an initial condition
of 5 degree heading error at 10 meters/sec forward speed (i.e., choose
x(0) = (0, 0, 5π/180) and x̂ = (0, 0, 0)).
(b) An IMU can be used to measure angular rates and linear acceleration.
Assume that we use a Northrop Grumman LN200 to measure the
angular rate θ̇. Use the datasheet on the course web page to determine
a model for the noise process and design a Kalman filter that fuses
the GPS and IMU to determine the position of the vehicle. Plot the
estimated states and the covariances for each state starting with an
initial condition of 5 degree heading error at 10 meters/sec forward
speed.
Note: be careful with units on this problem!
Chapter Six
Sensor Fusion
In this chapter we consider the problem of combining the data from differ-
ent sensors to obtain an estimate of a (common) dynamical system. Unlike
the previous chapters, we focus here on discrete-time processes, leaving the
continuous-time case to the exercises. We begin with a summary of the in-
put/output properties of discrete-time systems with stochastic inputs, then
present the discrete-time Kalman filter, and use that formalism to formulate
and present solutions for the sensor fusion problem. Some advanced methods
of estimation and fusion are also summarized at the end of the chapter that
demonstrate how to move beyond the linear, Gaussian process assumptions.
Prerequisites. The material in this chapter is designed to be reasonably self-
contained, so that it can be used without covering Sections ??–4.4 or Chap-
ter 5 of this supplement. We assume rudimentary familiarity with discrete-
time linear systems, at the level of the brief descriptions in Chapters 2 and 5
of ÅM08, and discrete-time random processes as described in Section 4.2 of
these notes.
linear, we can always add it back in by superposition). Note first that the
state at time k + l can be written as
X[k + l] = A X[k + l − 1] + F W[k + l − 1]
 = A\big( A X[k + l − 2] + F W[k + l − 2] \big) + F W[k + l − 1]
 = A^l X[k] + \sum_{j=1}^{l} A^{j−1} F W[k + l − j].
where W [k] and V [k] are Gaussian, white noise processes satisfying
E{W[k]} = 0,    E{V[k]} = 0,
E{W[k] W^T[j]} = \begin{cases} 0 & k ≠ j \\ R_W & k = j \end{cases},    E{V[k] V^T[j]} = \begin{cases} 0 & k ≠ j \\ R_V & k = j \end{cases},
E{W[k] V^T[j]} = 0.
(6.6)
We assume that the initial condition is also modeled as a Gaussian random
variable with
E{X[0]} = x0 E{X[0]X T [0]} = P [0]. (6.7)
We wish to find an estimate X̂[k] that gives the minimum mean square error (MMSE) for E{(X[k] − X̂[k])(X[k] − X̂[k])^T} given the measurements {Y[l] : 0 ≤ l ≤ k}. We consider an observer of the form
X̂[k + 1] = A X̂[k] + B U[k] + L[k]\big( Y[k] − C X̂[k] \big).
Theorem 6.2. Consider a random process X[k] with dynamics (6.5) and
noise processes and initial conditions described by equations (6.6) and (6.7).
The observer gain L that minimizes the mean square error is given by
L[k] = AP [k]C T (RV + CP [k]C T )−1 ,
where
P [k + 1] = (A − LC)P [k](A − LC)T + F RW F T + LRV LT
(6.9)
P [0] = E{X[0]X T [0]}.
Note that the Kalman filter has the form of a recursive filter: given P [k] =
E{E[k]E[k]^T} at time k, we can compute how the estimate and covariance
change. Thus we do not need to keep track of old values of the output.
Furthermore, the Kalman filter gives the estimate X̂[k] and the covariance
P [k], so we can see how reliable the estimate is. It can also be shown that
the Kalman filter extracts the maximum possible information from the output data: for the Kalman filter, the correlation matrix of the error is
R_E[j, k] = R δ_{jk}.
In other words, the error is a white noise process, so there is no remaining
dynamic information content in the error.
In the special case when the noise is stationary (RW , RV constant) and
if P [k] converges, then the observer gain is constant:
L = A P C^T (R_V + C P C^T)^{−1},
where P satisfies
P = A P A^T + F R_W F^T − A P C^T \big( R_V + C P C^T \big)^{−1} C P A^T.
We see that the optimal gain depends on both the process noise and the
measurement noise, but in a nontrivial way. Like the use of LQR to choose
state feedback gains, the Kalman filter permits a systematic derivation of
the observer gains given a description of the noise processes. The solution
for the constant gain case can be computed using the dlqe command in MATLAB.
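For example, with placeholder matrices for a hypothetical sampled-data system (dlqe returns the measurement-update gain M; the one-step observer gain used above is then A·M):

% Steady-state discrete-time Kalman gains via dlqe for a hypothetical system.
A = [1 0.1; 0 1];  F = [0; 0.1];  C = [1 0];
RW = 0.01;  RV = 0.1;
[M, P] = dlqe(A, F, C, RW, RV);  % M: measurement-update (innovation) gain
L = A*M;                         % gain in the one-step observer form used above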
Step 0: Initialization.
k = 1,    X̂[0|0] = E{X[0]},    P[0|0] = E{X[0] X^T[0]}.
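The remaining steps are the standard prediction and correction updates of the discrete-time Kalman filter. A minimal, self-contained MATLAB sketch of one possible implementation, on an illustrative scalar random walk (hypothetical values, not the listing from the text), is:

% Predictor-corrector Kalman filter sketch for x[k+1] = x[k] + w[k], y = x + v.
A = 1;  C = 1;  F = 1;  RW = 0.1;  RV = 1;  N = 100;
x = 0;  xhat = 0;  P = 1;
for k = 1:N
  x = A*x + F*sqrt(RW)*randn;                % simulate the process
  y = C*x + sqrt(RV)*randn;                  % noisy measurement
  xpred = A*xhat;  Ppred = A*P*A' + F*RW*F'; % prediction step: X[k|k-1], P[k|k-1]
  M = Ppred*C' / (C*Ppred*C' + RV);          % correction step: measurement update
  xhat = xpred + M*(y - C*xpred);
  P = (1 - M*C)*Ppred;
end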
R_W(l) = E\{W[i] W[i + l]\} = \frac{1}{N − l} \sum_{j=1}^{N−l} W[j] W[j + l].
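A short MATLAB sketch of this empirical estimate, applied to a synthetic white noise record (for illustration only):

% Estimate the autocorrelation of a noise record W(1:N) using the formula above.
N = 1000;  W = randn(1, N);              % synthetic white noise record
maxlag = 20;  Rhat = zeros(1, maxlag+1);
for l = 0:maxlag
  Rhat(l+1) = sum( W(1:N-l) .* W(1+l:N) ) / (N - l);
end
stem(0:maxlag, Rhat);                    % approximately a delta at l = 0
xlabel('lag l');  ylabel('estimate of R_W(l)');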
Exercises
6.1 Consider the problem of estimating the position of an autonomous
mobile vehicle using a GPS receiver and an IMU (inertial measurement
unit). The continuous time dynamics of the vehicle are given by
ẋ = cos θ v,    ẏ = sin θ v,    θ̇ = (v/ℓ) tan φ.
We assume that the vehicle is disturbance free, but that we have noisy
measurements from the GPS receiver and IMU and an initial condition error.
6.2
6.3 Consider a continuous time dynamical system with multiple measure-
ments,
Ẋ = AX + Bu + F V,    Y_i = C_i X + W_i,    i = 1, . . . , q.
Assume that the measurement noises W_i are independent for each sensor and have zero mean and variance σ_i^2. Show that the optimal estimator for
X weights the measurements by the inverse of their covariances.
6.4 Show that if we formulate the optimal estimate using an estimator of
the form
X̂[k + 1] = AX̂[k] + L[k](Y [k + 1] − CAX̂[k])
then we recover the update law in the predictor-corrector form.
Bibliography
Note: Under construction! The indexing for the text has not yet been
done and so this index contains an incomplete and unedited set of terms.