Optimization-Based Control

Richard M. Murray
Control and Dynamical Systems
California Institute of Technology

Version v2.1b (20 Oct 2018)

© California Institute of Technology



All rights reserved.

This manuscript is for personal use only and may not be reproduced,
in whole or in part, without written consent from the author.
Preface

These notes serve as a supplement to Feedback Systems by Åström and Murray and expand on some of the topics introduced there. They are motivated
by the increasing role of online optimization in feedback systems. This is a
change from the traditional use of optimization in control theory for offline
design of control laws and state estimators. Fueled by Moore’s law and im-
provements in real-time algorithms, it is now possible to perform estimation
and control design algorithms online, allowing the system to better account
for nonlinearities and to adapt to changes in the underlying dynamics of the
controlled process. This changes the way that we think about estimation
and control since it allows much greater flexibility in the design process and
more modularity and flexibility in the overall system.
Our goal in this supplement is to introduce the key formalisms and tools
required to design optimization-based controllers. Key topics include real-
time trajectory generation using differential flatness, the maximum princi-
ple, dynamic programming, receding horizon optimal control, stochastic pro-
cesses, Kalman filtering, moving horizon estimation and (distributed) sensor
fusion. While these topics might normally constitute separate textbooks, in
this set of notes we attempt to present them in a compact way that allows
them to be used in engineering design. We also briefly survey additional
advanced topics through the text, with pointers to further information for
interested readers.
This supplement has been used in a second quarter controls course at Cal-
tech, taken by a mixture of advanced undergraduates and beginning grad-
uate students with interest in a variety of application areas. The first half
of the 10 week course focuses on trajectory generation and optimal control,
ending with receding horizon control. In the second half of the course, we
introduce stochastic processes and derive the Kalman filter and its various
extensions, including the information filter and sensor fusion. The prerequi-
sites for the course are based on the material covered in Feedback Systems,
including basic knowledge in Lyapunov stability theory and observers. If
needed, these topics can be inserted at the appropriate point in covering the
material in this supplement.
The notation and conventions in the book follow those used in the main
text. Because the notes may not be used for a standalone class, we have
attempted to write each chapter as a standalone reference for advanced topics that
are introduced in Feedback Systems. To this end, each chapter starts with a

short description of the prerequisites for the chapter and citations to the relevant literature. Advanced sections, marked by the “dangerous bend” symbol
shown in the margin, contain material that requires a slightly more tech-
nical background, of the sort that would be expected of graduate students
in engineering. Additional information is available on the Feedback Systems
web site:
http://www.cds.caltech.edu/~murray/amwiki/OBC
Contents

Chapter 1. Trajectory Generation and Tracking 1-1


1.1 Two Degree of Freedom Design 1-1
1.2 Trajectory Tracking and Gain Scheduling 1-3
1.3 Trajectory Generation and Differential Flatness 1-7
1.4 Further Reading 1-13

Chapter 2. Optimal Control 2-1


2.1 Review: Optimization 2-1
2.2 Optimal Control of Systems 2-5
2.3 Examples 2-8
2.4 Linear Quadratic Regulators 2-10
2.5 Choosing LQR weights 2-14
2.6 Advanced Topics 2-16
2.7 Further Reading 2-17

Chapter 3. Receding Horizon Control 3-1


3.1 Optimization-Based Control 3-1
3.2 Receding Horizon Control with CLF Terminal Cost 3-8
3.3 Receding Horizon Control Using Differential Flatness 3-12
3.4 Implementation on the Caltech Ducted Fan 3-15
3.5 Further Reading 3-23

Chapter 4. Stochastic Systems 4-1


4.1 Brief Review of Random Variables 4-1
4.2 Introduction to Random Processes 4-8
4.3 Continuous-Time, Vector-Valued Random Processes 4-11
4.4 Linear Stochastic Systems with Gaussian Noise 4-15
4.5 Random Processes in the Frequency Domain 4-18
4.6 Further Reading 4-21

Chapter 5. Kalman Filtering 5-1


5.1 Linear Quadratic Estimators 5-1
5.2 Extensions of the Kalman Filter 5-3
5.3 LQG Control 5-5

5.4 Application to a Thrust Vectored Aircraft 5-5


5.5 Further Reading 5-10

Chapter 6. Sensor Fusion 6-1


6.1 Discrete-Time Stochastic Systems 6-1
6.2 Kalman Filters in Discrete Time (AM08) 6-2
6.3 Predictor-Corrector Form 6-4
6.4 Sensor Fusion 6-6
6.5 Information Filters 6-7
6.6 Additional topics 6-7
6.7 Further Reading 6-8

Bibliography B-1

Index I-1
Chapter One
Trajectory Generation and Tracking

This chapter expands on Section 7.5 of Feedback Systems by Åström and


Murray (ÅM08), which introduces the use of feedforward compensation in
control system design. We begin with a review of the two degree of freedom
design approach and then focus on the problem of generating feasible tra-
jectories for a (nonlinear) control system. We make use of the concept of
differential flatness as a tool for generating feasible trajectories.
Prerequisites. Readers should be familiar with modeling of input/output
control systems using differential equations, linearization of a system around
an equilibrium point and state space control of linear systems, including
reachability and eigenvalue assignment. Although this material supplements
concepts introduced in the context of output feedback and state estimation,
no knowledge of observers is required.

1.1 Two Degree of Freedom Design


A large class of control problems consist of planning and following a trajec-
tory in the presence of noise and uncertainty. Examples include autonomous
vehicles maneuvering in city streets, mobile robots performing tasks on fac-
tor floors (or other planets), manufacturing systems that regulate the flow
of parts and materials through a plant or factory, and supply chain manage-
ment systems that balance orders and inventories across an enterprise. All
of these systems are highly nonlinear and demand accurate performance.
To control such systems, we make use of the notion of two degree of free-
dom controller design. This is a standard technique in linear control theory
that separates a controller into a feedforward compensator and a feedback
compensator. The feedforward compensator generates the nominal input re-
quired to track a given reference trajectory. The feedback compensator cor-
rects for errors between the desired and actual trajectories. This is shown
schematically in Figure 1.1.
In a nonlinear setting, two degree of freedom controller design decouples
the trajectory generation and asymptotic tracking problems. Given a de-
sired output trajectory, we first construct a state space trajectory xd and
a nominal input ud that satisfy the equations of motion. The error system
can then be written as a time-varying control system in terms of the error,
e = x − xd . Under the assumption that the tracking error remains small, we

Figure 1.1: Two degree of freedom controller design for a process P with uncer-
tainty ∆. The controller consists of a trajectory generator and feedback controller.
The trajectory generation subsystem computes a feedforward command ud along
with the desired state xd . The state feedback controller uses the measured (or es-
timated) state and desired state to compute a corrective input ufb . Uncertainty is
represented by the block ∆, representing unmodeled dynamics, as well as distur-
bances and noise.

can linearize this time-varying system about e = 0 and stabilize the e = 0


state. (Note: in ÅM08 the notation uff was used for the desired [feedforward]
input. We use ud here to match the desired state xd .)
More formally, we assume that our process dynamics can be described
by a nonlinear differential equation of the form
ẋ = f (x, u), x ∈ Rn , u ∈ Rm ,
(1.1)
y = h(x, u), y ∈ Rp ,
where x is the system state, u is a vector of inputs and f is a smooth function
describing the dynamics of the process. The smooth function h describes
the output y that we wish to control. We are particularly interested in the
class of control problems in which we wish to track a time-varying reference
trajectory r(t), called the trajectory tracking problem. In particular, we wish
to find a control law u = α(x, r(·)) such that

lim y(t) − r(t) = 0.
t→∞

We use the notation r(·) to indicate that the control law can depend not
only on the reference signal r(t) but also derivatives of the reference signal.
A feasible trajectory for the system (1.1) is a pair (xd (t), ud (t)) that sat-
isfies the differential equation and generates the desired trajectory:
 
ẋd (t) = f (xd (t), ud (t)),     r(t) = h(xd (t), ud (t)).
The problem of finding a feasible trajectory for a system is called the tra-
jectory generation problem, with xd representing the desired state for the

(nominal) system and ud representing the desired input or the feedforward


control. If we can find a feasible trajectory for the system, we can search
for controllers of the form u = α(x, xd , ud ) that track the desired reference
trajectory.
In many applications, it is possible to attach a cost function to trajec-
tories that describe how well they balance trajectory tracking with other
factors, such as the magnitude of the inputs required. In such applications,
it is natural to ask that we find the optimal controller with respect to some
cost function. We can again use the two degree of freedom paradigm with
an optimal control computation for generating the feasible trajectory. This
subject is examined in more detail in Chapter 2. In addition, we can take
the extra step of updating the generated trajectory based on the current
state of the system. This additional feedback path is denoted by a dashed
line in Figure 1.1 and allows the use of so-called receding horizon control
techniques: a (optimal) feasible trajectory is computed from the current po-
sition to the desired position over a finite time T horizon, used for a short
period of time δ < T , and then recomputed based on the new system state.
Receding horizon control is described in more detail in Chapter 3.
A key advantage of optimization-based approaches is that they allow the
potential for customization of the controller based on changes in mission,
condition and environment. Because the controller is solving the optimiza-
tion problem online, updates can be made to the cost function, to change
the desired operation of the system; to the model, to reflect changes in pa-
rameter values or damage to sensors and actuators; and to the constraints,
to reflect new regions of the state space that must be avoided due to ex-
ternal influences. Thus, many of the challenges of designing controllers that
are robust to a large set of possible uncertainties become embedded in the
online optimization.

1.2 Trajectory Tracking and Gain Scheduling


We begin by considering the problem of tracking a feasible trajectory. As-
sume that a trajectory generator is able to generate a trajectory (xd , ud ) that
satisfies the dynamics (1.1) and satisfies r(t) = h(xd (t), ud (t)). To design the
controller, we construct the error system. Let e = x − xd and v = u − ud
and compute the dynamics for the error:
ė = ẋ − ẋd = f (x, u) − f (xd , ud )
= f (e + xd , v + ud ) − f (xd , ud ) =: F (e, v, xd (t), ud (t)).
The function F represents the dynamics of the error, with control input
v and external inputs xd and ud . In general, this system is time-varying
through the desired state and input.
For trajectory tracking, we can assume that e is small (if our controller

is doing a good job), and so we can linearize around e = 0:



de/dt ≈ A(t)e + B(t)v,     A(t) = ∂F/∂e |(xd (t),ud (t)) ,     B(t) = ∂F/∂v |(xd (t),ud (t)) .
It is often the case that A(t) and B(t) depend only on xd , in which case it
is convenient to write A(t) = A(xd ) and B(t) = B(xd ).
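As a rough numerical illustration (not from the text; it assumes NumPy and a user-supplied function f(x, u) returning the state derivative), the Jacobians along the trajectory can be approximated by finite differences about a point (xd (t), ud (t)):

import numpy as np

def tracking_linearization(f, xd, ud, eps=1e-6):
    """Approximate A = dF/de and B = dF/dv at e = 0, v = 0 along (xd, ud)."""
    n, m = len(xd), len(ud)
    A, B = np.zeros((n, n)), np.zeros((n, m))
    f0 = f(xd, ud)
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(xd + dx, ud) - f0) / eps   # column i of df/dx at (xd, ud)
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(xd, ud + du) - f0) / eps   # column j of df/du at (xd, ud)
    return A, B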
We start by reviewing the case where A(t) and B(t) are constant, in
which case our error dynamics become
ė = Ae + Bv.
This occurs, for example, if the original nonlinear system is linear. We can
then search for a control system of the form
v = −Ke + kr r.
In the case where r is constant, we can apply the results of Chapter 6 of
ÅM08 and solve the problem by finding a gain matrix K that gives the
desired closed loop dynamics (e.g., by eigenvalue assignment) and choosing
kr to give the desired output value at equilibrium. The equilibrium point is
given by
xe = −(A − BK)−1 Bkr r =⇒ ye = −C(A − BK)−1 Bkr r
and if we wish the output to be y = r it follows that

kr = −1/(C(A − BK)−1 B).
It can be shown that this formulation is equivalent to a two degree of freedom
design where xd and ud are chosen to give the desired reference output
(Exercise 1.1).
Returning to the full nonlinear system, assume now that xd and ud are
either constant or slowly varying (with respect to the performance criterion).
This allows us to consider just the (constant) linearized system given by
(A(xd ), B(xd )). If we design a state feedback controller K(xd ) for each xd ,
then we can regulate the system using the feedback
v = −K(xd )e.
Substituting back the definitions of e and v, our controller becomes
u = −K(xd )(x − xd ) + ud .
Note that the controller u = α(x, xd , ud ) depends on (xd , ud ), which them-
selves depend on the desired reference trajectory. This form of controller is
called a gain scheduled linear controller with feedforward ud .
More generally, the term gain scheduling is used to describe any controller
that depends on a set of measured parameters in the system. So, for example,
we might write
u = −K(x, µ) · (x − xd ) + ud ,

Figure 1.2: Vehicle steering using gain scheduling. (a) Vehicle model, showing the position (x, y), orientation θ, velocity v, and steering angle φ. (b) Step response in lateral position: x velocity (m/s) and y position (m) as a function of time (s).

where K(x, µ) depends on the current system state (or some portion of
it) and an external parameter µ. The dependence on the current state x (as
opposed to the desired state xd ) allows us to modify the closed loop dynamics
differently depending on our location in the state space. This is particularly
useful when the dynamics of the process vary depending on some subset of
the states (such as the altitude for an aircraft or the internal temperature
for a chemical reaction). The dependence on µ can be used to capture the
dependence on the reference trajectory, or it can reflect changes in the
environment or performance specifications that are not modeled in the state
of the controller.

Example 1.1 Steering control with velocity scheduling


Consider the problem of controlling the motion of an automobile so that it
follows a given trajectory on the ground, as shown in Figure 1.2a. We use
the model derived in ÅM08, choosing the reference point to be the center of
the rear wheels. This gives dynamics of the form

ẋ = cos θ v,     ẏ = sin θ v,     θ̇ = (v/l) tan φ,     (1.2)

where (x, y, θ) is the position and orientation of the vehicle, v is the veloc-
ity and φ is the steering angle, both considered to be inputs, and l is the
wheelbase.
A simple feasible trajectory for the system is to follow a straight line in
the x direction at lateral position yr and fixed velocity vr . This corresponds
to a desired state xd = (vr t, yr , 0) and nominal input ud = (vr , 0). Note that
(xd , ud ) is not an equilibrium point for the system, but it does satisfy the
equations of motion.

Linearizing the system about the desired trajectory, we obtain


   
\[
A_d = \left.\frac{\partial f}{\partial x}\right|_{(x_d,u_d)} =
\left.\begin{bmatrix} 0 & 0 & -\sin\theta \\ 0 & 0 & \cos\theta \\ 0 & 0 & 0 \end{bmatrix}\right|_{(x_d,u_d)} =
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},
\qquad
B_d = \left.\frac{\partial f}{\partial u}\right|_{(x_d,u_d)} =
\begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & v_r/l \end{bmatrix}.
\]
We form the error dynamics by setting e = x − xd and w = u − ud :
ėx = w1 ,     ėy = eθ ,     ėθ = (vr /l) w2 .
We see that the first state is decoupled from the second two states and
hence we can design a controller by treating these two subsystems separately.
Suppose that we wish to place the closed loop eigenvalues of the longitudinal
dynamics (ex ) at λ1 and place the closed loop eigenvalues of the lateral
dynamics (ey , eθ ) at the roots of the polynomial equation s² + a1 s + a2 = 0.
This can be accomplished by setting
w1 = −λ1 ex ,     w2 = −(l/vr )(a1 ey + a2 eθ ).
Note that the gains depend on the velocity vr (or equivalently on the nominal
input ud ), giving us a gain scheduled controller.
In the original inputs and state coordinates, the controller has the form
  
\[
\begin{bmatrix} v \\ \phi \end{bmatrix} =
- \underbrace{\begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & a_1 l / v_r & a_2 l / v_r \end{bmatrix}}_{K_d}
\underbrace{\begin{bmatrix} x - v_r t \\ y - y_r \\ \theta \end{bmatrix}}_{e}
+ \underbrace{\begin{bmatrix} v_r \\ 0 \end{bmatrix}}_{u_d}.
\]

The form of the controller shows that at low speeds the gains in the steering
angle will be high, meaning that we must turn the wheel harder to achieve
the same effect. As the speed increases, the gains become smaller. This
matches the usual experience that at high speed a very small amount of
actuation is required to control the lateral position of a car. Note that the
gains go to infinity when the vehicle is stopped (vr = 0), corresponding to
the fact that the system is not reachable at this point.
Figure 1.2b shows the response of the controller to a step change in
lateral position at three different reference speeds. Notice that the rate of
the response is constant, independent of the reference speed, reflecting the
fact that the gain scheduled controllers each set the closed loop poles to the
same values. ∇
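A minimal simulation sketch of this gain scheduled controller is given below (not from the text; it assumes NumPy/SciPy, and the wheelbase, lateral offset, and gain values are purely illustrative). It applies v = vr − λ1 (x − vr t) and φ = −(a1 l/vr )(y − yr ) − (a2 l/vr )θ to the nonlinear dynamics (1.2) for several reference speeds.

import numpy as np
from scipy.integrate import solve_ivp

l, yr, lam1, a1, a2 = 3.0, 1.0, 1.0, 1.0, 2.0     # illustrative values

def closed_loop(t, q, vr):
    x, y, th = q
    # gain scheduled controller from Example 1.1 (steering gains scale with l/vr)
    v = vr - lam1 * (x - vr * t)
    phi = -(a1 * l / vr) * (y - yr) - (a2 * l / vr) * th
    return [v * np.cos(th), v * np.sin(th), (v / l) * np.tan(phi)]

for vr in (2.0, 5.0, 10.0):                        # three reference speeds
    sol = solve_ivp(closed_loop, (0.0, 5.0), [0.0, 0.0, 0.0],
                    args=(vr,), max_step=0.01)
    print(f"vr = {vr:4.1f}: lateral error at t = 5 is {sol.y[1, -1] - yr:+.3f}")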
One limitation of gain scheduling as we have described it is that a separate
set of gains must be designed for each operating condition xd . In practice,

Figure 1.3: Gain scheduling. A general gain scheduling design involves finding a
gain K at each desired operating point. This can be thought of as a gain surface,
as shown on the left (for the case of a scalar gain). An approximation to this gain
can be obtained by computing the gains at a fixed number of operating points
and then interpolating between those gains. This gives an approximation of the
continuous gain surface, as shown on the right.

gain scheduled controllers are often implemented by designing controllers at


a fixed number of operating points and then interpolating the gains between
these points, as illustrated in Figure 1.3. Suppose that we have a set of
operating points xd,j , j = 1, . . . , N . Then we can write our controller as
u = ud − K(x)e,     K(x) = ∑_{j=1}^{N} αj (x)Kj ,

where Kj is a set of gains designed around the operating point xd,j and αj (x)
is a weighting factor. For example, we might choose the weights αj (x) such
that we take the gains corresponding to the nearest two operating points
and weight them according to the Euclidean distance of the current state
from that operating point; if the distance is small then we use a weight very
near to 1 and if the distance is far then we use a weight very near to 0.
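The interpolation step can be implemented in a few lines; the sketch below is illustrative only (it assumes NumPy, uses one scalar scheduling variable, and the gain values are made up) and performs linear interpolation between the two nearest operating points.

import numpy as np

mu_grid = np.array([2.0, 5.0, 10.0])          # operating points (illustrative)
K_table = [np.array([[1.0, 0.0, 0.0], [0.0, 1.5, 3.0]]),
           np.array([[1.0, 0.0, 0.0], [0.0, 0.6, 1.2]]),
           np.array([[1.0, 0.0, 0.0], [0.0, 0.3, 0.6]])]   # gains K_j

def scheduled_gain(mu):
    """Linearly interpolate between the gains at the two nearest points."""
    mu = np.clip(mu, mu_grid[0], mu_grid[-1])
    j = np.searchsorted(mu_grid, mu)
    if j == 0:
        return K_table[0]
    w = (mu_grid[j] - mu) / (mu_grid[j] - mu_grid[j - 1])
    return w * K_table[j - 1] + (1.0 - w) * K_table[j]

def control(x, xd, ud, mu):
    """Gain scheduled control law u = ud - K(mu) (x - xd)."""
    return ud - scheduled_gain(mu) @ (x - xd)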
While the intuition behind gain scheduled controllers is fairly clear, some
caution is required in using them. In particular, a gain scheduled controller
is not guaranteed to be stable even if K(x, µ) locally stabilizes the system
around a given equilibrium point. Gain scheduling can be proven to work in
the case when the gain varies sufficiently slowly (Exercise 1.3).

1.3 Trajectory Generation and Differential Flatness


We now return to the problem of generating a trajectory for a nonlinear
system. Consider first the case of finding a trajectory xd (t) that steers the
system from an initial condition x0 to a final condition xf . We seek a feasible
solution (xd (t), ud (t)) that satisfies the dynamics of the process:
ẋd = f (xd , ud ), xd (0) = x0 , xd (T ) = xf . (1.3)

Figure 1.4: Simple model for an automobile. We wish to find a trajectory from an
initial state to a final state that satisfies the dynamics of the system and constraints
on the curvature (imposed by the limited travel of the front wheels).

Formally, this problem corresponds to a two-point boundary value problem


and can be quite difficult to solve in general.
In addition, we may wish to satisfy additional constraints on the dy-
namics. These can include input saturation constraints |u(t)| < M , state
constraints g(x) ≤ 0 and tracking constraints h(x) = r(t), each of which
gives an algebraic constraint on the states or inputs at each instant in time.
We can also attempt to optimize a function by choosing (xd (t), ud (t)) to
minimize
∫_0^T L(x, u) dt + V (x(T ), u(T )).
As an example of the type of problem we would like to study, consider
the problem of steering a car from an initial condition to a final condition,
as shown in Figure 1.4. To solve this problem, we must find a solution to
the differential equations (1.2) that satisfies the endpoint conditions. Given
the nonlinear nature of the dynamics, it seems unlikely that one could find
explicit solutions that satisfy the dynamics except in very special cases (such
as driving in a straight line).
A closer inspection of this system shows that it is possible to understand
the trajectories of the system by exploiting the particular structure of the
dynamics. Suppose that we are given a trajectory for the rear wheels of the
system, xd (t) and yd (t). From equation (1.2), we see that we can use this
solution to solve for the angle of the car by writing
ẏ/ẋ = sin θ/ cos θ     =⇒     θd = tan−1 (ẏd /ẋd ).
Furthermore, given θ we can solve for the velocity using the equation
ẋ = v cos θ =⇒ vd = ẋd / cos θd ,
assuming cos θd ≠ 0 (if it is, use v = ẏ/ sin θ). And given θ, we can solve for
φ using the relationship
θ̇ = (v/l) tan φ     =⇒     φd = tan−1 (lθ̇d /vd ).
Hence all of the state variables and the inputs can be determined by the
trajectory of the rear wheels and its derivatives. This property of a system
1.3. TRAJECTORY GENERATION AND DIFFERENTIAL FLATNESS 1-9

is known as differential flatness.

Definition 1.1 (Differential flatness). A nonlinear system (1.1) is differen-


tially flat if there exists a function α such that
z = α(x, u, u̇, . . . , u(p) )
and we can write the solutions of the nonlinear system as functions of z and
a finite number of derivatives
x = β(z, ż, . . . , z (q) ),
(1.4)
u = γ(z, ż, . . . , z (q) ).

For a differentially flat system, all of the feasible trajectories for the
system can be written as functions of a flat output z(·) and its derivatives.
The number of flat outputs is always equal to the number of system inputs.
The kinematic car is differentially flat with the position of the rear wheels as
the flat output. Differentially flat systems were originally studied by Fliess
et al. [FLMR92].
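In code, the flat output maps derived above for the kinematic car amount to a few lines; the sketch below is a minimal illustration (not from the text), assuming NumPy, forward motion with ẋd > 0 so the expressions are well defined, and an illustrative wheelbase value.

import numpy as np

def car_from_flat(xdot, ydot, xddot, yddot, l=3.0):
    """Recover (theta_d, v_d, phi_d) from the rear wheel trajectory derivatives."""
    theta = np.arctan2(ydot, xdot)                     # theta_d = atan(yd'/xd')
    v = xdot / np.cos(theta)                           # v_d = xd'/cos(theta_d)
    theta_dot = (xdot * yddot - ydot * xddot) / (xdot**2 + ydot**2)
    phi = np.arctan2(l * theta_dot, v)                 # phi_d = atan(l theta_d'/v_d)
    return theta, v, phi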
Differentially flat systems are useful in situations where explicit trajec-
tory generation is required. Since the behavior of a flat system is determined
by the flat outputs, we can plan trajectories in output space, and then map
these to appropriate inputs. Suppose we wish to generate a feasible trajec-
tory for the nonlinear system
ẋ = f (x, u), x(0) = x0 , x(T ) = xf .
If the system is differentially flat then

x(0) = β (z(0), ż(0), . . . , z (q) (0)) = x0 ,
x(T ) = β (z(T ), ż(T ), . . . , z (q) (T )) = xf ,     (1.5)
and we see that the initial and final condition in the full state space depends
on just the output z and its derivatives at the initial and final times. Thus
any trajectory for z that satisfies these boundary conditions will be a feasible
trajectory for the system, using equation (1.4) to determine the full state
space and input trajectories.
In particular, given initial and final conditions on z and its derivatives
that satisfy equation (1.5), any curve z(·) satisfying those conditions will
correspond to a feasible trajectory of the system. We can parameterize the
flat output trajectory using a set of smooth basis functions ψi (t):
z(t) = ∑_{i=1}^{N} αi ψi (t),     αi ∈ R.

We seek a set of coefficients αi , i = 1, . . . , N such that z(t) satisfies the


boundary conditions (1.5). The derivatives of the flat output can be com-

puted in terms of the derivatives of the basis functions:

ż(t) = ∑_{i=1}^{N} αi ψ̇i (t)
⋮
z^{(q)}(t) = ∑_{i=1}^{N} αi ψi^{(q)}(t).

We can thus write the conditions on the flat outputs and their derivatives
as
   
\[
\begin{bmatrix}
\psi_1(0) & \psi_2(0) & \dots & \psi_N(0) \\
\dot\psi_1(0) & \dot\psi_2(0) & \dots & \dot\psi_N(0) \\
\vdots & \vdots & & \vdots \\
\psi_1^{(q)}(0) & \psi_2^{(q)}(0) & \dots & \psi_N^{(q)}(0) \\
\psi_1(T) & \psi_2(T) & \dots & \psi_N(T) \\
\dot\psi_1(T) & \dot\psi_2(T) & \dots & \dot\psi_N(T) \\
\vdots & \vdots & & \vdots \\
\psi_1^{(q)}(T) & \psi_2^{(q)}(T) & \dots & \psi_N^{(q)}(T)
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
=
\begin{bmatrix}
z(0) \\ \dot z(0) \\ \vdots \\ z^{(q)}(0) \\ z(T) \\ \dot z(T) \\ \vdots \\ z^{(q)}(T)
\end{bmatrix}.
\]

This equation is a linear equation of the form M α = z̄. Assuming that M


has a sufficient number of columns and that it is full row rank, we can
solve for a (possibly non-unique) α that solves the trajectory generation
problem.
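A compact sketch of this computation for a single flat output, using monomial basis functions ψi (t) = t^{i−1}, is shown below; it is illustrative only (helper names are hypothetical, NumPy assumed).

import numpy as np
from math import factorial

def basis_row(t, N, k):
    """Row of k-th derivatives of the monomial basis psi_i(t) = t**(i-1)."""
    row = np.zeros(N)
    for i in range(k, N):
        row[i] = factorial(i) / factorial(i - k) * t**(i - k)
    return row

def flat_point_to_point(z0, zT, T, q=1, N=None):
    """Solve M alpha = zbar for boundary values z0 = (z(0), ..., z^(q)(0))."""
    N = N or 2 * (q + 1)
    M = np.vstack([basis_row(0.0, N, k) for k in range(q + 1)] +
                  [basis_row(T, N, k) for k in range(q + 1)])
    zbar = np.hstack([z0, zT])
    alpha, *_ = np.linalg.lstsq(M, zbar, rcond=None)
    return alpha

# steer the flat output from z(0) = 0, zdot(0) = 0 to z(T) = 1, zdot(T) = 0
alpha = flat_point_to_point([0.0, 0.0], [1.0, 0.0], T=2.0)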

Example 1.2 Nonholonomic integrator


A simple nonlinear system called a nonholonomic integrator [Bro81] is given
by the differential equations

ẋ1 = u1 , ẋ2 = u2 , ẋ3 = x2 u1 .

This system is differentially flat with flat output z = (x1 , x3 ). The relation-
ship between the flat variables and the states is given by

x1 = z1 ,     x2 = ẋ3 /ẋ1 = ż2 /ż1 ,     x3 = z2 .     (1.6)

Using simple polynomials as our basis functions,

ψ1,1 (t) = 1, ψ1,2 (t) = t, ψ1,3 (t) = t2 , ψ1,4 (t) = t3 ,


ψ2,1 (t) = 1, ψ2,2 (t) = t, ψ2,3 (t) = t2 , ψ2,4 (t) = t3 ,

the equations for the feasible (flat) trajectory become

    
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & T & T^2 & T^3 & 0 & 0 & 0 & 0 \\
0 & 1 & 2T & 3T^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & T & T^2 & T^3 \\
0 & 0 & 0 & 0 & 0 & 1 & 2T & 3T^2
\end{bmatrix}
\begin{bmatrix}
\alpha_{11} \\ \alpha_{12} \\ \alpha_{13} \\ \alpha_{14} \\ \alpha_{21} \\ \alpha_{22} \\ \alpha_{23} \\ \alpha_{24}
\end{bmatrix}
=
\begin{bmatrix}
x_{1,0} \\ 1 \\ x_{3,0} \\ x_{2,0} \\ x_{1,f} \\ 1 \\ x_{3,f} \\ x_{2,f}
\end{bmatrix}.
\]

This is a set of 8 linear equations in 8 variables. It can be shown that the


matrix M is full rank when T ≠ 0 and the system can be solved numerically.
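A numerical version of this computation is sketched below (not from the text; NumPy assumed). It builds the 8 × 8 matrix above, solves for the coefficients, and recovers the full state using x2 = ż2 /ż1 from equation (1.6).

import numpy as np

def nonholonomic_flat_traj(x0, xf, T):
    """Feasible trajectory for the nonholonomic integrator from x0 to xf."""
    x10, x20, x30 = x0
    x1f, x2f, x3f = xf
    row = lambda t: [1.0, t, t**2, t**3]         # basis functions psi_{j,i}(t)
    drow = lambda t: [0.0, 1.0, 2*t, 3*t**2]     # their time derivatives
    Z = [0.0] * 4
    M = np.array([row(0.0) + Z, drow(0.0) + Z, Z + row(0.0), Z + drow(0.0),
                  row(T) + Z, drow(T) + Z, Z + row(T), Z + drow(T)])
    zbar = np.array([x10, 1.0, x30, x20, x1f, 1.0, x3f, x2f])
    a = np.linalg.solve(M, zbar)
    a1, a2 = a[:4], a[4:]                        # coefficients for z1 and z2

    def state(t):
        z1, z2 = np.polyval(a1[::-1], t), np.polyval(a2[::-1], t)
        dz1 = np.polyval((a1[1:] * [1, 2, 3])[::-1], t)
        dz2 = np.polyval((a2[1:] * [1, 2, 3])[::-1], t)
        return np.array([z1, dz2 / dz1, z2])     # (x1, x2, x3), equation (1.6)
    return state

x_of_t = nonholonomic_flat_traj([0, 0, 0], [1, 0.5, 1], T=1.0)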

Note that no ODEs need to be integrated in order to compute the feasible


trajectories for a differentially flat system (unlike optimal control methods
that we will consider in the next chapter, which involve parameterizing the
input and then solving the ODEs). This is the defining feature of differ-
entially flat systems. The practical implication is that nominal trajectories
and inputs that satisfy the equations of motion for a differentially flat sys-
tem can be computed in a computationally efficient way (solving a set of
algebraic equations). Since the flat output functions do not have to obey a
set of differential equations, the only constraints that must be satisfied are
the initial and final conditions on the endpoints, their tangents, and higher
order derivatives. Any other constraints on the system, such as bounds on
the inputs, can be transformed into the flat output space and (typically)
become limits on the curvature or higher order derivative properties of the
curve.
If there is a performance index for the system, this index can be trans-
formed and becomes a functional depending on the flat outputs and their
derivatives up to some order. By approximating the performance index we
can achieve paths for the system that are suboptimal but still feasible. This
approach is often much more appealing than the traditional method of ap-
proximating the system (for example by its linearization) and then using
the exact performance index, which yields optimal paths but for the wrong
system.
In light of the techniques that are available for differentially flat systems,
the characterization of flat systems becomes particularly important. Unfor-
tunately, general conditions for flatness are not known, but many important
classes of nonlinear systems, including feedback linearizable systems, are dif-
ferentially flat. One large class of flat systems are those in “pure feedback

Figure 1.5: Examples of flat systems: (a) kinematic car, (b) ducted fan, (c) N trailers, (d) towed cable.

form”:
ẋ1 = f1 (x1 , x2 )
ẋ2 = f2 (x1 , x2 , x3 )
⋮
ẋn = fn (x1 , . . . , xn , u).
Under certain regularity conditions these systems are differentially flat with
output y = x1 . These systems have been used for so-called “integrator back-
stepping” approaches to nonlinear control by Kokotovic et al. [KKM91] and
constructive controllability techniques for nonholonomic systems in chained
form [vNRM98]. Figure 1.5 shows some additional systems that are differ-
entially flat.
Example 1.3 Vectored thrust aircraft
Consider the dynamics of a planar, vectored thrust flight control system as
shown in Figure 1.6. This system consists of a rigid body with body fixed
forces and is a simplified model for a vertical take-off and landing aircraft
(see Example 2.9 in ÅM08). Let (x, y, θ) denote the position and orientation
of the center of mass of the aircraft. We assume that the forces acting on the
vehicle consist of a force F1 perpendicular to the axis of the vehicle acting
at a distance r from the center of mass, and a force F2 parallel to the axis of
the vehicle. Let m be the mass of the vehicle, J the moment of inertia, and

Figure 1.6: Vectored thrust aircraft (from ÅM08). The net thrust on the aircraft
can be decomposed into a horizontal force F1 and a vertical force F2 acting at a
distance r from the center of mass.

g the gravitational constant. We ignore aerodynamic forces for the purpose


of this example.
The dynamics for the system are
mẍ = F1 cos θ − F2 sin θ,
mÿ = F1 sin θ + F2 cos θ − mg, (1.7)
J θ̈ = rF1 .
Martin et al. [MDP94] showed that this system is differentially flat and that
one set of flat outputs is given by
z1 = x − (J/mr) sin θ,
(1.8)
z2 = y + (J/mr) cos θ.
Using the system dynamics, it can be shown that
z̈1 cos θ + (z̈2 + g) sin θ = 0 (1.9)
and thus given z1 (t) and z2 (t) we can find θ(t) except for an ambiguity of π
and away from the singularity z̈1 = z̈2 + g = 0. The remaining states and the
forces F1 (t) and F2 (t) can then be obtained from the dynamic equations, all
in terms of z1 , z2 , and their higher order derivatives. ∇
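A numerical sketch of this construction (not from the text) is given below; it assumes NumPy, approximates the needed derivatives of sampled flat outputs with np.gradient, and the mass, inertia, and moment-arm values are illustrative. The branch of θ is chosen with arctan2, consistent with equation (1.9) up to the ambiguity of π noted above.

import numpy as np

def pvtol_from_flat(t, z1, z2, m=4.0, J=0.0475, r=0.25, g=9.8):
    """Recover (x, y, theta) and forces (F1, F2) from sampled flat outputs."""
    d = lambda f: np.gradient(f, t)              # numerical time derivative
    z1dd, z2dd = d(d(z1)), d(d(z2))
    theta = np.arctan2(-z1dd, z2dd + g)          # from equation (1.9)
    x = z1 + (J / (m * r)) * np.sin(theta)       # invert equation (1.8)
    y = z2 - (J / (m * r)) * np.cos(theta)
    xdd, ydd = d(d(x)), d(d(y))
    F1 = m * xdd * np.cos(theta) + m * (ydd + g) * np.sin(theta)
    F2 = -m * xdd * np.sin(theta) + m * (ydd + g) * np.cos(theta)
    return x, y, theta, F1, F2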

1.4 Further Reading


The two degree of freedom controller structure introduced in this chapter is
described in a bit more detail in ÅM08 (in the context of output feedback
control) and a description of some of the origins of this structure are provided
in the “Further Reading” section of Chapter 8. Gain scheduling is a classical
technique that is often omitted from introductory control texts, but a good

description can be found in the survey article by Rugh [Rug90] and the work
of Shamma [Sha90]. Differential flatness was originally developed by Fliess,
Lévine, Martin and Rouchon [FLMR92]. See [Mur97] for a description of the
role of flatness in control of mechanical systems and [vNM98, MFHM05] for
more information on flatness applied to flight control systems.

Exercises
1.1 (Feasible trajectory for constant reference) Consider a linear input/output
system of the form
ẋ = Ax + Bu, y = Cx (1.10)
in which we wish to track a constant reference r. A feasible (steady state)
trajectory for the system is given by solving the equation
    
\[
\begin{bmatrix} 0 \\ r \end{bmatrix} =
\begin{bmatrix} A & B \\ C & 0 \end{bmatrix}
\begin{bmatrix} x_d \\ u_d \end{bmatrix}
\]
for xd and ud .
(a) Show that these equations always have a solution as long as the linear
system (1.10) is reachable.
(b) In Section 6.2 of ÅM08 we showed that the reference tracking problem
could be solved using a control law of the form u = −Kx + kr r. Show
that this is equivalent to a two degree of freedom control design using
xd and ud and give a formula for kr in terms of xd and ud . Show that
this formula matches that given in ÅM08.
1.2 A simplified model of the steering control problem is described in
Åström and Murray, Example 2.8. The lateral dynamics can be approxi-
mated by the linearized dynamics
   
\[
\dot z = \begin{bmatrix} 0 & v \\ 0 & 0 \end{bmatrix} z +
\begin{bmatrix} 0 \\ 1 \end{bmatrix} u,
\qquad y = z_1 ,
\]
where z = (y, θ) ∈ R2 is the state of the system and v is the speed of
the vehicle. Suppose that we wish to track a piecewise constant reference
trajectory
r = square(2πt/20),
where square is the square wave function in MATLAB. Suppose further
that the speed of the vehicle varies according to the formula
v = 5 + 3 sin(2πt/50).
Design and implement a gain-scheduled controller for this system by first
designing a state space controller that places the closed loop poles of the
system at the roots of s² + 2ζω0 s + ω0², where ζ = 0.7 and ω0 = 1. You

should design controllers for three different parameter values: v = 2, 5, 10.


Then use linear interpolation to compute the gain for values of v between
these fixed values. Compare the performance of the gain scheduled controller
to a simple controller that assumes v = 5 for the purpose of the control
design (but leaving v time-varying in your simulation).

1.3 (Stability of gain scheduled controllers for slowly varying systems) Con-
sider a nonlinear control system with gain scheduled feedback
ė = f (e, v),     v = k(µ)e,
where µ(t) ∈ R is an externally specified parameter (e.g., the desired tra-
jectory) and k(µ) is chosen such that the linearization of the closed loop
system around the origin is stable for each fixed µ.
Show that if |µ̇| is sufficiently small then the equilibrium point is locally
asymptotically stable for the full nonlinear, time-varying system. (Hint: find
a Lyapunov function of the form V = xT P (µ)x based on the linearization of
the system dynamics for fixed µ and then show this is a Lyapunov function
for the full system.)

1.4 (Flatness of systems in reachable canonical form) Consider a single input


system in reachable canonical form [ÅM08, Sec. 6.1]:
   
\[
\frac{dx}{dt} =
\begin{bmatrix}
-a_1 & -a_2 & -a_3 & \dots & -a_n \\
1 & 0 & 0 & \dots & 0 \\
0 & 1 & 0 & \dots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & & & 1 & 0
\end{bmatrix} x +
\begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} u,
\qquad
y = \begin{bmatrix} b_1 & b_2 & b_3 & \dots & b_n \end{bmatrix} x + d\,u.
\tag{1.11}
\]
Suppose that we wish to find an input u that moves the system from x0 to
xf . This system is differentially flat with flat output given by z = xn and
hence we can parameterize the solutions by a curve of the form
xn (t) = ∑_{k=0}^{N} αk t^k ,     (1.12)

where N is a sufficiently large integer.

(a) Compute the state space trajectory x(t) and input u(t) corresponding
to equation (1.12) and satisfying the differential equation (1.11). Your
answer should be an equation similar to equation (1.6) for each state
xi and the input u.

(b) Find an explicit input that steers a double integrator system between
any two equilibrium points x0 ∈ R2 and xf ∈ R2 .

(c) Show that all reachable systems are differentially flat and give a for-
mula for finding the flat output in terms of the dynamics matrix A
and control matrix B.
1.5 Consider the lateral control problem for an autonomous ground vehicle
as described in Example 1.1 and Section 1.3. Using the fact that the system is
differentially flat, find an explicit trajectory that solves the following parallel
parking maneuver:

x0 = (0, 4)

xi = (6, 2)

xf = (0, 0)

Your solution should consist of two segments: a curve from x0 to xi with


v > 0 and a curve from xi to xf with v < 0. For the trajectory that you
determine, plot the trajectory in the plane (x versus y) and also the inputs
v and φ as a function of time.
1.6 Consider first the problem of controlling a truck with trailer, as shown
in the figure below:
ẋ = cos θ u1
ẏ = sin θ u1
φ̇ = u2
θ̇ = (1/l) tan φ u1
θ̇1 = (1/d) cos(θ − θ1 ) sin(θ − θ1 ) u1 ,
The dynamics are given above, where (x, y, θ) is the position and orientation
of the truck, φ is the angle of the steering wheels, θ1 is the angle of the trailer,
and l and d are the length of the truck and trailer. We want to generate
a trajectory for the truck to move it from a given initial position to the
loading dock. We ignore the role of obstacles and concentrate on generation
of feasible trajectories.
(a) Show that the system is differentially flat using the center of the rear
wheels of the trailer as the flat output.
(b) Generate a trajectory for the system that steers the vehicle from an
initial condition with the truck and trailer perpendicular to the loading
dock into the loading dock.

(c) Write a simulation of the system that stabilizes the desired trajectory and
demonstrate your two-degree of freedom control system maneuvering
from several different initial conditions into the parking space, with
either disturbances or modeling errors included in the simulation.
Chapter Two
Optimal Control

This set of notes expands on Chapter 6 of Feedback Systems by Åström and


Murray (ÅM08), which introduces the concepts of reachability and state
feedback. We also expand on topics in Section 7.5 of ÅM08 in the area
of feedforward compensation. Beginning with a review of optimization, we
introduce the notion of Lagrange multipliers and provide a summary of
Pontryagin’s maximum principle. Using these tools we derive the linear
quadratic regulator for linear systems and describe its usage.
Prerequisites. Readers should be familiar with modeling of input/output
control systems using differential equations, linearization of a system around
an equilibrium point and state space control of linear systems, including
reachability and eigenvalue assignment. Some familiarity with optimization
of nonlinear functions is also assumed.

2.1 Review: Optimization


Optimization refers to the problem of choosing a set of parameters that
maximize or minimize a given function. In control systems, we are often
faced with having to choose a set of parameters for a control law so that
some performance condition is satisfied. In this chapter we will seek to
optimize a given specification, choosing the parameters that maximize the
performance (or minimize the cost). In this section we review the conditions
for optimization of a static function, and then extend this to optimization
of trajectories and control laws in the remainder of the chapter. More infor-
mation on basic techniques in optimization can be found in [Lue97] or the
introductory chapter of [LS95].
Consider first the problem of finding the minimum of a smooth function
F : Rn → R. That is, we wish to find a point x∗ ∈ Rn such that F (x∗ ) ≤
F (x) for all x ∈ Rn . A necessary condition for x∗ to be a minimum is that
the gradient of the function be zero at x∗ :
∂F/∂x (x∗ ) = 0.
The function F (x) is often called a cost function and x∗ is the optimal value
for x. Figure 2.1 gives a graphical interpretation of the necessary condition
for a minimum. Note that these are not sufficient conditions; the points x1

Figure 2.1: Optimization of functions. The minimum of a function occurs at a


point where the gradient is zero.

Figure 2.2: Optimization with constraints. (a) We seek a point x∗ that minimizes
F (x) while lying on the surface G(x) = 0 (a line in the x1 x2 plane). (b) We can
parameterize the constrained directions by computing the gradient of the constraint
G. Note that x ∈ R2 in (a), with the third dimension showing F (x), while x ∈ R3
in (b).

and x2 and x∗ in the figure all satisfy the necessary condition but only one
is the (global) minimum.
The situation is more complicated if constraints are present. Let Gi :
Rn → R, i = 1, . . . , k be a set of smooth functions with Gi (x) = 0 repre-
senting the constraints. Suppose that we wish to find x∗ ∈ Rn such that
Gi (x∗ ) = 0 and F (x∗ ) ≤ F (x) for all x ∈ {x ∈ Rn : Gi (x) = 0, i = 1, . . . , k}.
This situation can be visualized as constraining the point to a surface (de-
fined by the constraints) and searching for the minimum of the cost function
along this surface, as illustrated in Figure 2.2a.
A necessary condition for being at a minimum is that there are no di-
rections tangent to the constraints that also decrease the cost. Given a con-
straint function G(x) = (G1 (x), . . . , Gk (x)), x ∈ Rn we can represent the
constraint as a n − k dimensional surface in Rn , as shown in Figure 2.2b.
The tangent directions to the surface can be computed by considering small

perturbations of the constraint that remain on the surface:


Gi (x + δx) ≈ Gi (x) + (∂Gi /∂x)(x) δx = 0     =⇒     (∂Gi /∂x)(x) δx = 0,
where δx ∈ Rn is a vanishingly small perturbation. It follows that the normal
directions to the surface are spanned by ∂Gi /∂x, since these are precisely
the vectors that annihilate an admissible tangent vector δx.
Using this characterization of the tangent and normal vectors to the
constraint, a necessary condition for optimization is that the gradient of F
is spanned by vectors that are normal to the constraints, so that the only
directions that increase the cost violate the constraints. We thus require that
there exist scalars λi , i = 1, . . . , k such that
∂F/∂x (x∗ ) + ∑_{i=1}^{k} λi ∂Gi /∂x (x∗ ) = 0.
If we let G = (G1 , G2 , . . . , Gk )T , then we can write this condition as
∂F/∂x + λT ∂G/∂x = 0.     (2.1)
The term ∂F /∂x is the usual (gradient) optimality condition while the term
∂G/∂x is used to “cancel” the gradient in the directions normal to the
constraint.
An alternative condition can be derived by modifying the cost function to incorporate the constraints. Defining F̃ = F + ∑ λi Gi , the necessary condition becomes
∂ F̃ /∂x (x∗ ) = 0.
The scalars λi are called Lagrange multipliers. Minimizing F̃ is equivalent to the optimization given by
min_x ( F (x) + λT G(x) ).     (2.2)

The variables λ can be regarded as free variables, which implies that we need
to choose x such that G(x) = 0 in order to ensure the cost is minimized.
Otherwise, we could choose λ to generate a large cost.

Example 2.1 Two free variables with a constraint


Consider the cost function given by
F (x) = F0 + (x1 − a)2 + (x2 − b)2 ,
which has an unconstrained minimum at x = (a, b). Suppose that we add a
constraint G(x) = 0 given by
G(x) = x1 − x2 .

With this constraint, we seek to optimize F subject to x1 = x2 . Although in


this case we could do this by simple substitution, we instead carry out the
more general procedure using Lagrange multipliers.
The augmented cost function is given by
F̃ (x) = F0 + (x1 − a)2 + (x2 − b)2 + λ(x1 − x2 ),
where λ is the Lagrange multiplier for the constraint. Taking the derivative
of F̃ , we have
∂ F̃ /∂x = ( 2x1 − 2a + λ     2x2 − 2b − λ ).
Setting each of these equations equal to zero, we have that at the minimum
x∗1 = a − λ/2, x∗2 = b + λ/2.
The remaining equation that we need is the constraint, which requires that
x∗1 = x∗2 . Using these three equations, we see that λ∗ = a − b and we have
x∗1 = (a + b)/2 ,     x∗2 = (a + b)/2 .
To verify the geometric view described above, note that the gradients of
F and G are given by
∂F/∂x = ( 2x1 − 2a     2x2 − 2b ),     ∂G/∂x = ( 1     −1 ).
At the optimal value of the (constrained) optimization, we have
∂F/∂x = ( b − a     a − b ),     ∂G/∂x = ( 1     −1 ).
Although the derivative of F is not zero, it is pointed in a direction that
is normal to the constraint, and hence we cannot decrease the cost while
staying on the constraint surface. ∇

We have focused on finding the minimum of a function. We can switch


back and forth between maximum and minimum by simply negating the
cost function:
max_x F (x) = min_x (−F (x)).
We see that the conditions that we have derived are independent of the sign of F since they only depend on the gradient being zero in appropriate directions. Thus finding x∗ that satisfies the conditions corresponds to finding an extremum for the function.
Very good software is available for numerically solving optimization prob-
lems of this sort. The NPSOL and SNOPT libraries are available in FOR-
TRAN (and C). In MATLAB, the fmincon function can be used to solve a
constrained optimization problem.
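For readers working in Python, the same constrained problem from Example 2.1 can be solved with scipy.optimize.minimize; the short sketch below is illustrative (a = 2, b = 0 chosen arbitrarily) and uses the SLSQP solver, which handles equality constraints.

import numpy as np
from scipy.optimize import minimize

a, b = 2.0, 0.0                                      # illustrative values
F = lambda x: (x[0] - a)**2 + (x[1] - b)**2          # cost (constant F0 dropped)
G = {'type': 'eq', 'fun': lambda x: x[0] - x[1]}     # constraint G(x) = 0

res = minimize(F, x0=np.array([0.0, 0.0]), constraints=[G], method='SLSQP')
print(res.x)   # approximately [(a+b)/2, (a+b)/2] = [1, 1], matching Example 2.1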

2.2 Optimal Control of Systems


Consider now the optimal control problem:
min_{u(·)} ∫_0^T L(x, u) dt + V (x(T ))
subject to the constraint
ẋ = f (x, u), x ∈ Rn , u ∈ Rm .
Abstractly, this is a constrained optimization problem where we seek a fea-
sible trajectory (x(t), u(t)) that minimizes the cost function
J(x, u) = ∫_0^T L(x, u) dt + V (x(T )).
More formally, this problem is equivalent to the “standard” problem of min-
imizing a cost function J(x, u) where (x, u) ∈ L2 [0, T ] (the set of square
integrable functions) and h(z) = ẋ(t) − f (x(t), u(t)) = 0 models the dynam-
ics. The term L(x, u) is referred to as the integral cost and V (x(T )) is the
final (or terminal) cost.
There are many variations and special cases of the optimal control prob-
lem. We mention a few here:
Infinite horizon optimal control. If we let T = ∞ and set V = 0, then we seek
to optimize a cost function over all time. This is called the infinite horizon
optimal control problem, versus the finite horizon problem with T < ∞.
Note that if an infinite horizon problem has a solution with finite cost, then
the integral cost term L(x, u) must approach zero as t → ∞.
Linear quadratic (LQ) optimal control. If the dynamical system is linear and
the cost function is quadratic, we obtain the linear quadratic optimal control
problem:
ẋ = Ax + Bu,     J = ∫_0^T (xT Qx + uT Ru) dt + xT (T )P1 x(T ).
In this formulation, Q ≥ 0 penalizes state error, R > 0 penalizes the input
and P1 > 0 penalizes terminal state. This problem can be modified to track a
desired trajectory (xd , ud ) by rewriting the cost function in terms of (x − xd )
and (u − ud ).
Terminal constraints. It is often convenient to ask that the final value of
the trajectory, denoted xf , be specified. We can do this by requiring that
x(T ) = xf or by using a more general form of constraint:
ψi (x(T )) = 0, i = 1, . . . , q.
The fully constrained case is obtained by setting q = n and defining ψi (x(T )) =
xi (T ) − xi,f . For a control problem with a full set of terminal constraints,
V (x(T )) can be omitted (since its value is fixed).

Time optimal control. If we constrain the terminal condition to x(T ) = xf ,


let the terminal time T be free (so that we can optimize over it) and choose
L(x, u) = 1, we can find the time-optimal trajectory between an initial and
final condition. This problem is usually only well-posed if we additionally
constrain the inputs u to be bounded.
A very general set of conditions are available for the optimal control problem
that captures most of these special cases in a unifying framework. Consider
a nonlinear system
ẋ = f (x, u),     x ∈ Rn ,  x(0) given,  u ∈ Ω ⊂ Rm ,
where f (x, u) = (f1 (x, u), . . . , fn (x, u)) : Rn × Rm → Rn . We wish to minimize
a cost function J with terminal constraints:
J = ∫_0^T L(x, u) dt + V (x(T )),     ψ(x(T )) = 0.
The function ψ : Rn → Rq gives a set of q terminal constraints. Analogous
to the case of optimizing a function subject to constraints, we construct the
Hamiltonian:
H = L + λT f = L + ∑ λi fi .
The variables λ are functions of time and are often referred to as the costate
variables. A set of necessary conditions for a solution to be optimal was
derived by Pontryagin [PBGM62].
Theorem 2.1 (Maximum Principle). If (x∗ , u∗ ) is optimal, then there exist
λ∗ (t) ∈ Rn and ν ∗ ∈ Rq such that
ẋi = ∂H/∂λi ,     −λ̇i = ∂H/∂xi ,     x(0) given,  ψ(x(T )) = 0,
λ(T ) = ∂V/∂x (x(T )) + ν T ∂ψ/∂x ,
and
H(x∗ (t), u∗ (t), λ∗ (t)) ≤ H(x∗ (t), u, λ∗ (t))     for all  u ∈ Ω.
The form of the optimal solution is given by the solution of a differential
equation with boundary conditions. If u = arg min H(x, u, λ) exists, we can
use this to choose the control law u and solve for the resulting feasible
trajectory that minimizes the cost. The boundary conditions are given by
the n initial states x(0), the q terminal constraints on the state ψ(x(T )) = 0
and the n − q final values for the Lagrange multipliers
λ(T ) = ∂V/∂x (x(T )) + ν T ∂ψ/∂x .
In this last equation, ν is a free variable and so there are n equations in n+q
free variables, leaving n − q constraints on λ(T ). In total, we thus have 2n
boundary values.

The maximum principle is a very general (and elegant) theorem. It allows


the dynamics to be nonlinear and the input to be constrained to lie in a set
Ω, allowing the possibility of bounded inputs. If Ω = Rm (unconstrained
input) and H is differentiable, then a necessary condition for the optimal
input is
∂H/∂u = 0.
We note that even though we are minimizing the cost, this is still usually
called the maximum principle (an artifact of history).

Sketch of proof. We follow the proof given by Lewis and Syrmos [LS95],
omitting some of the details required for a fully rigorous proof. We use
the method of Lagrange multipliers, augmenting our cost function by the
dynamical constraints and the terminal constraints:
J̃(x(·), u(·), λ(·), ν) = J(x, u) + ∫_0^T −λT (t)(ẋ(t) − f (x, u)) dt + ν T ψ(x(T ))
= ∫_0^T ( L(x, u) − λT (t)(ẋ(t) − f (x, u)) ) dt + V (x(T )) + ν T ψ(x(T )).

Note that λ is a function of time, with each λ(t) corresponding to the instan-
taneous constraint imposed by the dynamics. The integral over the interval
[0, T ] plays the role of the sum of the finite constraints in the regular opti-
mization.
Making use of the definition of the Hamiltonian, the augmented cost
becomes
J̃(x(·), u(·), λ(·), ν) = ∫_0^T ( H(x, u) − λT (t)ẋ ) dt + V (x(T )) + ν T ψ(x(T )).
0

We can now “linearize” the cost function around the optimal solution x(t) =
x∗ (t) + δx(t), u(t) = u∗ (t) + δu(t), λ(t) = λ∗ (t) + δλ(t) and ν = ν ∗ + δν.
Taking T as fixed for simplicity (see [LS95] for the more general case), the
incremental cost can be written as
δ J̃ = J̃(x∗ + δx, u∗ + δu, λ∗ + δλ, ν ∗ + δν) − J̃(x∗ , u∗ , λ∗ , ν ∗ )
≈ ∫_0^T ( ∂H/∂x δx + ∂H/∂u δu − λT δ ẋ + (∂H/∂λ − ẋT ) δλ ) dt
+ ∂V/∂x δx(T ) + ν T ∂ψ/∂x δx(T ) + δν T ψ(x(T )),
where we have omitted the time argument inside the integral and all deriva-
tives are evaluated along the optimal solution.

We can eliminate the dependence on δ ẋ using integration by parts:


− ∫_0^T λT δ ẋ dt = −λT (T )δx(T ) + λT (0)δx(0) + ∫_0^T λ̇T δx dt.

Since we are requiring x(0) = x0 , the δx(0) term vanishes and substituting
this into δ J˜ yields
δ J̃ ≈ ∫_0^T ( (∂H/∂x + λ̇T ) δx + ∂H/∂u δu + (∂H/∂λ − ẋT ) δλ ) dt
+ ( ∂V/∂x + ν T ∂ψ/∂x − λT (T ) ) δx(T ) + δν T ψ(x(T )).
To be optimal, we require δ J˜ = 0 for all δx, δu, δλ and δν, and we obtain
the (local) conditions in the theorem.

2.3 Examples
To illustrate the use of the maximum principle, we consider a number of
analytical examples. Additional examples are given in the exercises.

Example 2.2 Scalar linear system


Consider the optimal control problem for the system
ẋ = ax + bu, (2.3)
where x ∈ R is a scalar state, u ∈ R is the input, the initial state x(t0 )
is given, and a, b ∈ R are positive constants. We wish to find a trajectory
(x(t), u(t)) that minimizes the cost function
J = ½ ∫_{t0}^{tf} u²(t) dt + ½ c x²(tf ),

where the terminal time tf is given and c > 0 is a constant. This cost
function balances the final value of the state with the input required to get
to that state.
To solve the problem, we define the various elements used in the maxi-
mum principle. Our integral and terminal costs are given by
L = ½ u²(t),     V = ½ c x²(tf ).
We write the Hamiltonian of this system and derive the following expressions
for the costate λ:
H = L + λf = ½ u² + λ(ax + bu),
λ̇ = −∂H/∂x = −aλ,     λ(tf ) = ∂V/∂x = c x(tf ).
This is a final value problem for a linear differential equation in λ and the

solution can be shown to be


λ(t) = c x(tf ) e^{a(tf − t)} .
The optimal control is given by
∂H/∂u = u + bλ = 0     ⇒     u∗ (t) = −bλ(t) = −b c x(tf ) e^{a(tf − t)} .
Substituting this control into the dynamics given by equation (2.3) yields a
first-order ODE in x:
ẋ = ax − b²c x(tf ) e^{a(tf − t)} .
This can be solved explicitly as
x∗ (t) = x(t0 ) e^{a(t − t0 )} + (b²c/2a) x∗ (tf ) [ e^{a(tf − t)} − e^{a(t + tf − 2t0 )} ].
Setting t = tf and solving for x(tf ) gives
\[
x^*(t_f) = \frac{2a\,e^{a(t_f - t_0)}\,x(t_0)}{2a - b^2 c\left(1 - e^{2a(t_f - t_0)}\right)}
\]
and finally we can write
\[
u^*(t) = -\frac{2abc\,e^{a(2t_f - t_0 - t)}\,x(t_0)}{2a - b^2 c\left(1 - e^{2a(t_f - t_0)}\right)},
\tag{2.4}
\]
\[
x^*(t) = x(t_0)\,e^{a(t - t_0)} +
\frac{b^2 c\,e^{a(t_f - t_0)}\,x(t_0)}{2a - b^2 c\left(1 - e^{2a(t_f - t_0)}\right)}
\left[\,e^{a(t_f - t)} - e^{a(t + t_f - 2t_0)}\right].
\tag{2.5}
\]
We can use the form of this expression to explore how our cost function
affects the optimal trajectory. For example, we can ask what happens to
the terminal state x∗ (tf ) as c → ∞. Setting t = tf in equation (2.5) and
taking the limit we find that
lim_{c→∞} x∗ (tf ) = 0.
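The closed-form expressions (2.4) and (2.5) are easy to evaluate numerically; the sketch below (not from the text, with illustrative parameter values) checks that x∗ (t) reproduces the terminal state and that the terminal state shrinks as the weight c grows.

import numpy as np

a, b, t0, tf, x0 = 1.0, 1.0, 0.0, 1.0, 2.0           # illustrative values

def terminal_state(c):
    den = 2*a - b**2 * c * (1 - np.exp(2*a*(tf - t0)))
    return 2*a * np.exp(a*(tf - t0)) * x0 / den      # x*(tf)

def x_opt(t, c):                                     # equation (2.5)
    den = 2*a - b**2 * c * (1 - np.exp(2*a*(tf - t0)))
    return (x0 * np.exp(a*(t - t0))
            + b**2 * c * np.exp(a*(tf - t0)) * x0 / den
              * (np.exp(a*(tf - t)) - np.exp(a*(t + tf - 2*t0))))

for c in (1.0, 10.0, 100.0):
    print(c, x_opt(tf, c), terminal_state(c))        # equal; both -> 0 as c grows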

Example 2.3 Bang-bang control
The time-optimal control problem for a linear system has a particularly
simple solution. Consider a linear system with bounded input
ẋ = Ax + Bu, |u| ≤ 1,
and suppose we wish to minimize the time required to move from an initial
state x0 to a final state xf . Without loss of generality we can take xf = 0.
We choose the cost functions and terminal constraints to satisfy
J = ∫_0^T 1 dt,     ψ(x(T )) = x(T ).

To find the optimal control, we form the Hamiltonian


H = 1 + λT (Ax + Bu) = 1 + (λT A)x + (λT B)u.
Now apply the conditions in the maximum principle:
ẋ = ∂H/∂λ = Ax + Bu
−λ̇ = ∂H/∂x = AT λ
u = arg min H = −sgn(λT B)
The optimal solution always satisfies this equation (since the maximum prin-
ciple gives a necessary condition) with x(0) = x0 and x(T ) = 0. It follows
that the input is always either +1 or −1, depending on λT B. This type of
control is called “bang-bang” control since the input is always on one of its
limits. If λT (t)B = 0 then the control is not well defined, but if this is only
true for a specific time instant (e.g., λT (t)B crosses zero) then the analysis
still holds. ∇

2.4 Linear Quadratic Regulators


In addition to its use for computing optimal, feasible trajectories for a
system, we can also use optimal control theory to design a feedback law
u = α(x) that stabilizes a given equilibrium point. Roughly speaking, we do
this by continuously re-solving the optimal control problem from our current
state x(t) and applying the resulting input u(t). Of course, this approach is
impractical unless we can solve explicitly for the optimal control and some-
how rewrite the optimal control as a function of the current state in a simple
way. In this section we explore exactly this approach for the linear quadratic
optimal control problem.
We begin with the finite horizon, linear quadratic regulator (LQR)
problem, given by
ẋ = Ax + Bu,        x ∈ R^n, u ∈ R^m,        x0 given,

J̃ = ½ ∫_0^T ( x^T Qx x + u^T Qu u ) dt + ½ x^T(T) P1 x(T),
where Qx ≥ 0, Qu > 0, P1 ≥ 0 are symmetric, positive (semi-) definite
matrices. Note the factor of ½ is usually left out, but we included it here
to simplify the derivation. (The optimal control will be unchanged if we
multiply the entire cost function by 2.)
To find the optimal control, we apply the maximum principle. We begin
by computing the Hamiltonian H:
H = ½ x^T Qx x + ½ u^T Qu u + λ^T (Ax + Bu).
Applying the results of Theorem 2.1, we obtain the necessary conditions

ẋ = (∂H/∂λ)^T = Ax + Bu,            x(0) = x0,
−λ̇ = (∂H/∂x)^T = Qx x + A^T λ,      λ(T) = P1 x(T),        (2.6)
0 = ∂H/∂u = Qu u + B^T λ.
The last condition can be solved to obtain the optimal controller

u = −Qu^{-1} B^T λ,

which can be substituted into the dynamic equation (2.6). To solve for the optimal control we must solve a two point boundary value problem using the initial condition x(0) and the final condition λ(T). Unfortunately, it is very hard to solve such problems in general.
Given the linear nature of the dynamics, we attempt to find a solution
by setting λ(t) = P (t)x(t) where P (t) ∈ Rn×n . Substituting this into the
necessary condition, we obtain

λ̇ = Ṗx + Pẋ = Ṗx + P(Ax − B Qu^{-1} B^T P)x,
⟹  −Ṗx − PAx + P B Qu^{-1} B^T P x = Qx x + A^T P x.

This equation is satisfied if we can find P(t) such that

−Ṗ = PA + A^T P − P B Qu^{-1} B^T P + Qx,        P(T) = P1.        (2.7)

This is a matrix differential equation that defines the elements of P (t) from
a final value P (T ). Solving it is conceptually no different than solving the
initial value problem for vector-valued ordinary differential equations, except
that we must solve for the individual elements of the matrix P (t) backwards
in time. Equation (2.7) is called the Riccati ODE.
An important property of the solution to the optimal control problem
when written in this form is that P (t) can be solved without knowing either
x(t) or u(t). This allows the two point boundary value problem to be sepa-
rated into first solving a final-value problem and then solving a time-varying
initial value problem. More specifically, given P (t) satisfying equation (2.7),
we can apply the optimal input

u(t) = −Qu^{-1} B^T P(t) x

and then solve the original dynamics of the system forward in time from
the initial condition x(0) = x0 . Note that this is a (time-varying) feedback
control that describes how to move from any state to the origin in time T .
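A sketch of solving the Riccati ODE (2.7) backwards in time (our addition; A, B, Qx, Qu, P1, and the horizon T are assumed to be supplied by the user):

import numpy as np
from scipy.integrate import solve_ivp

def finite_horizon_lqr(A, B, Qx, Qu, P1, T):
    n = A.shape[0]
    Qu_inv = np.linalg.inv(Qu)

    def Pdot(t, p_flat):
        P = p_flat.reshape(n, n)
        # Riccati ODE (2.7): -Pdot = P A + A^T P - P B Qu^{-1} B^T P + Qx
        return -(P @ A + A.T @ P - P @ B @ Qu_inv @ B.T @ P + Qx).ravel()

    # integrate backwards from t = T, where P(T) = P1, down to t = 0
    sol = solve_ivp(Pdot, (T, 0.0), P1.ravel(), dense_output=True)
    return lambda t: sol.sol(t).reshape(n, n)   # P(t); the input is u = -Qu^{-1} B^T P(t) x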
An important variation of this problem is the case when we choose T = ∞
and eliminate the terminal cost (set P1 = 0). This gives us the cost function

J = ∫_0^∞ ( x^T Qx x + u^T Qu u ) dt.        (2.8)
Since we do not have a terminal cost, there is no constraint on the final value
of λ or, equivalently, P (t). We can thus seek to find a constant P satisfying
equation (2.7). In other words, we seek to find P such that
P A + A^T P − P B Qu^{-1} B^T P + Qx = 0.        (2.9)
This equation is called the algebraic Riccati equation. Given a solution, we
can choose our input as

u = −Qu^{-1} B^T P x.

This represents a constant gain K = Qu^{-1} B^T P where P is the solution of the algebraic Riccati equation.
The implications of this result are interesting and important. First, we
notice that if Qx > 0 and the control law corresponds to a finite minimum
of the cost, then we must have that limt→∞ x(t) = 0, otherwise the cost will
be unbounded. This means that the optimal control for moving from any
state x to the origin can be achieved by applying a feedback u = −Kx for
K chosen as described above and letting the system evolve in closed loop.
More amazingly, the gain matrix K can be written in terms of the solution to
a (matrix) quadratic equation (2.9). This quadratic equation can be solved
numerically: in MATLAB the command K = lqr(A, B, Qx, Qu) provides
the optimal feedback compensator.
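For readers working in Python rather than MATLAB, a minimal equivalent (our addition) uses SciPy's algebraic Riccati solver; the small helper below is referenced again in later examples.

import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Qx, Qu):
    # solve P A + A^T P - P B Qu^{-1} B^T P + Qx = 0, then K = Qu^{-1} B^T P
    P = solve_continuous_are(A, B, Qx, Qu)
    K = np.linalg.solve(Qu, B.T @ P)
    return K, P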
In deriving the optimal quadratic regulator, we have glossed over a num-
ber of important details. It is clear from the form of the solution that we
must have Qu > 0 since its inverse appears in the solution. We would typ-
ically also have Qx > 0 so that the integral cost is only zero when x = 0,
but in some instances we might only care about certain states, which would
imply that Qx ≥ 0. For this case, if we let Qx = H T H (always possible),
our cost function becomes
J = ∫_0^∞ ( x^T H^T H x + u^T Qu u ) dt = ∫_0^∞ ( ‖Hx‖^2 + u^T Qu u ) dt.
A technical condition for the optimal solution to exist is that the pair (A, H)
be detectable (implied by observability). This makes sense intuitively by
considering y = Hx as an output. If y is not observable then there may be
non-zero initial conditions that produce no output and so the cost would
be zero. This would lead to an ill-conditioned problem and hence we will
require that Qx ≥ 0 satisfy an appropriate observability condition.
We summarize the main results as a theorem.
Theorem 2.2. Consider a linear control system with quadratic cost:
ẋ = Ax + Bu,        J = ∫_0^∞ ( x^T Qx x + u^T Qu u ) dt.

Assume that the system defined by (A, B) is reachable, Qx = Qx^T ≥ 0 and Qu = Qu^T > 0. Further assume that Qx = H^T H and that the linear system with dynamics matrix A and output matrix H is observable. Then the optimal controller satisfies

u = −Qu^{-1} B^T P x,        P A + A^T P − P B Qu^{-1} B^T P = −Qx,

and the minimum cost from initial condition x(0) is given by J ∗ = xT (0)P x(0).
The basic form of the solution follows from the necessary conditions, with
the theorem asserting that a constant solution exists for T = ∞ when the
additional conditions are satisfied. The full proof can be found in standard
texts on optimal control, such as Lewis and Syrmos [LS95] or Athans and
Falb [AF06]. A simplified version, in which we first assume the optimal
control is linear, is left as an exercise.
Example 2.4 Optimal control of a double integrator
Consider a double integrator system
   
dx/dt = [ 0  1 ; 0  0 ] x + [ 0 ; 1 ] u

with quadratic cost given by

Qx = [ q^2  0 ; 0  0 ],        Qu = 1.

The optimal control is given by the solution of the matrix Riccati equation (2.9). Let P be a symmetric positive definite matrix of the form

P = [ a  b ; b  c ].

Then the Riccati equation becomes

[ −b^2 + q^2    a − bc ; a − bc    2b − c^2 ] = [ 0  0 ; 0  0 ],

which has solution

P = [ √(2q^3)   q ; q   √(2q) ].

The controller is given by

K = Qu^{-1} B^T P = [ q   √(2q) ].
The feedback law minimizing the given cost function is then u = −Kx.
To better understand the structure of the optimal solution, we exam-
ine the eigenstructure of the closed loop system. The closed-loop dynamics
matrix is given by
 
Acl = A − BK = [ 0  1 ; −q  −√(2q) ].

The characteristic polynomial of this matrix is

λ^2 + √(2q) λ + q.

Comparing this to λ^2 + 2ζω0 λ + ω0^2, we see that

ω0 = √q,        ζ = 1/√2.
Thus the optimal controller gives a closed loop system with damping ratio
ζ = 0.707, giving a good tradeoff between rise time and overshoot (see
ÅM08). ∇
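As a numerical check of this example (our addition), the same gain can be computed with SciPy's Riccati solver, as in the sketch above; the value q = 4 is an arbitrary illustration.

import numpy as np
from scipy.linalg import solve_continuous_are

q = 4.0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
P = solve_continuous_are(A, B, np.diag([q**2, 0.0]), np.array([[1.0]]))
K = B.T @ P                              # Qu = 1, so K = B^T P
print(K)                                 # expect [q, sqrt(2 q)] = [4.0, 2.828...]
print(np.linalg.eigvals(A - B @ K))      # closed-loop poles with damping ratio 1/sqrt(2)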

2.5 Choosing LQR weights


One of the key questions in LQR design is how to choose the weights Qx and
Qu . To choose specific values for the cost function weights Qx and Qu , we
must use our knowledge of the system we are trying to control. A particularly
simple choice is to use diagonal weights
   
Qx = diag(q1, q2, . . . , qn),        Qu = diag(ρ1, ρ2, . . . , ρm).
For this choice of Qx and Qu , the individual diagonal elements describe how
much each state and input (squared) should contribute to the overall cost.
Hence, we can take states that should remain small and attach higher weight
values to them. Similarly, we can penalize an input versus the states and
other inputs through choice of the corresponding input weight ρj .
Choosing the individual weights for the (diagonal) elements of the Qx and
Qu matrix can be done by deciding on a weighting of the errors from the in-
dividual terms. Bryson and Ho [BH75] have suggested the following method
for choosing the matrices Qx and Qu in equation (2.8): (1) choose qi and ρj
as the inverse of the square of the maximum value for the corresponding xi
or uj ; (2) modify the elements to obtain a compromise among response time,
damping and control effort. This second step can be performed by trial and
error.
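Step (1) of this rule is easy to automate; the sketch below (our addition) uses assumed maximum excursions for the states and inputs, which would normally come from knowledge of the physical system.

import numpy as np

x_max = np.array([0.1, 1.0, 0.5])      # assumed largest acceptable value of each state
u_max = np.array([2.0])                # assumed largest acceptable value of each input

# Bryson's rule: weight each term by the inverse square of its maximum value
Qx = np.diag(1.0 / x_max**2)
Qu = np.diag(1.0 / u_max**2)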
It is also possible to choose the weights such that only a given subset of
variables are considered in the cost function. Let z = Hx be the output we
want to keep small and verify that (A, H) is observable. Then we can use a
cost function of the form
Qx = H^T H,        Qu = ρ I.
The constant ρ allows us to trade off ‖z‖^2 versus ρ‖u‖^2.
We illustrate the various choices through an example application.
(a) Harrier “jump jet” (b) Simplified model
Figure 2.3: Vectored thrust aircraft. The Harrier AV-8B military aircraft (a)
redirects its engine thrust downward so that it can “hover” above the ground.
Some air from the engine is diverted to the wing tips to be used for maneuvering.
As shown in (b), the net thrust on the aircraft can be decomposed into a horizontal
force F1 and a vertical force F2 acting at a distance r from the center of mass.

Example 2.5 Thrust vectored aircraft


Consider the thrust vectored aircraft example introduced in ÅM08, Exam-
ple 2.9. The system is shown in Figure 2.3, reproduced from ÅM08. The
linear quadratic regulator problem was illustrated in Example 6.8, where
the weights were chosen as Qx = I and Qu = ρI. Figure 2.4 reproduces the
step response for this case.
A more physically motivated weighting can be computed by specifying
the comparable errors in each of the states and adjusting the weights ac-
cordingly. Suppose, for example, that we consider a 1 cm error in x, a 10 cm error in y and a 5° error in θ to be equivalently bad. In addition, we wish to penalize the forces in the sidewards direction (F1) since these result in a

(a) Step response in x and y (b) Effect of control weight ρ
Figure 2.4: Step response for a vectored thrust aircraft. The plot in (a) shows
the x and y positions of the aircraft when it is commanded to move 1 m in each
direction. In (b) the x motion is shown for control weights ρ = 1, 10^2, 10^4. A higher
weight of the input term in the cost function causes a more sluggish response.
(a) Step response in x and y (b) Inputs for the step response
Figure 2.5: Step response for a vector thrust aircraft using physically motivated
LQR weights (a). The rise time for x is much faster than in Figure 2.4a, but there
is a small oscillation and the inputs required are quite large (b).

loss in efficiency. This can be accounted for in the LQR weights by choosing

Qx = diag(100, 1, 2π/9, 0, 0, 0),        Qu = [ 10  0 ; 0  1 ].

The results of this choice of weights are shown in Figure 2.5. ∇
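In code, these weights are simply diagonal matrices handed to an LQR solver. A minimal sketch (our addition), assuming A_pvtol and B_pvtol hold a linearization of the vectored thrust dynamics (not reproduced here) and reusing the lqr_gain helper defined earlier:

import numpy as np

Qx = np.diag([100.0, 1.0, 2*np.pi/9, 0.0, 0.0, 0.0])   # x, y, theta, xdot, ydot, thetadot
Qu = np.diag([10.0, 1.0])                               # penalize F1 more heavily than F2
# K, P = lqr_gain(A_pvtol, B_pvtol, Qx, Qu)             # A_pvtol, B_pvtol assumed available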

2.6 Advanced Topics


In this section we briefly touch on some related topics in optimal control,
with reference to more detailed treatments where appropriate.
Singular extremals. The necessary conditions in the maximum principle en-
force the constraints through the use of the Lagrange multipliers λ(t). In some
instances, we can get an extremal curve that has one or more of the λ’s
identically equal to zero. This corresponds to a situation in which the con-
straint is satisfied strictly through the minimization of the cost function and
does not need to be explicitly enforced. We illustrate this case through an
example.
Example 2.6 Nonholonomic integrator
Consider the minimum time optimization problem for the nonholonomic
integrator introduced in Example 1.2 with input constraints |ui | ≤ 1. The
Hamiltonian for the system is given by
H = 1 + λ1 u1 + λ2 u2 + λ3 x2 u1
and the resulting equations for the Lagrange multipliers are
λ̇1 = 0,        λ̇2 = −λ3 u1,        λ̇3 = 0.        (2.10)

It follows from these equations that λ1 and λ3 are constant. To find the
input u corresponding to the extremal curves, we see from the Hamiltonian
that
u1 = −sgn(λ1 + λ3 x2),        u2 = −sgn(λ2).
These equations are well-defined as long as the arguments of sgn(·) are non-
zero and we get switching of the inputs when the arguments pass through
0.
An example of an abnormal extremal is the optimal trajectory between
x0 = (0, 0, 0) to xf = (ρ, 0, 0) where ρ > 0. The minimum time trajectory
is clearly given by moving on a straight line with u1 = 1 and u2 = 0. This
extremal satisfies the necessary conditions but with λ2 ≡ 0, so that the
“constraint” that ẋ2 = u2 is not strictly enforced through the Lagrange
multipliers. ∇

2.7 Further Reading


There are a number of excellent books on optimal control. One of the first
(and best) is the book by Pontryagin et al. [PBGM62]. During the 1960s and
1970s a number of additional books were written that provided many ex-
amples and served as standard textbooks in optimal control classes. Athans
and Falb [AF06] and Bryson and Ho [BH75] are two such texts. A very el-
egant treatment of optimal control from the point of view of optimization
over general linear spaces is given by Luenberger [Lue97]. Finally, a modern
engineering textbook that contains a very compact and concise derivation of
the key results in optimal control is the book by Lewis and Syrmos [LS95].

Exercises
2.1 (a) Let G1 , G2 , . . . , Gk be a set of row vectors on R^n. Let F be
another row vector on Rn such that for every x ∈ Rn satisfying
Gi x = 0, i = 1, . . . , k, we have F x = 0. Show that there are con-
stants λ1 , λ2 , . . . , λk such that
F = Σ_{i=1}^{k} λi Gi.

(b) Let x* ∈ R^n be an extremal point (maximum or minimum) of a function f subject to the constraints gi(x) = 0, i = 1, . . . , k. Assuming that the gradients ∂gi(x*)/∂x are linearly independent, show that there are k scalars λi, i = 1, . . . , k such that the function

f̃(x) = f(x) + Σ_{i=1}^{k} λi gi(x)
has an extremal point at x*.
2.2 Consider the following control system
q̇ = u
Ẏ = quT − uq T
where u ∈ R^m and Y ∈ R^{m×m} is a skew-symmetric matrix, Y^T = −Y.
(a) For the fixed end point problem, derive the form of the optimal con-
troller minimizing the following integral
(1/2) ∫_0^1 u^T u dt.

(b) For the boundary conditions q(0) = q(1) = 0, Y(0) = 0 and

Y(1) = [ 0  −y3  y2 ; y3  0  −y1 ; −y2  y1  0 ]
for some y ∈ R3 , give an explicit formula for the optimal inputs u.

(c) (Optional) Find the input u to steer the system from (0, 0) to (0, Ỹ ) ∈
Rm × Rm×m where Ỹ T = −Ỹ .
(Hint: if you get stuck, there is a paper by Brockett on this problem.)
2.3 In this problem, you will use the maximum principle to show that the
shortest path between two points is a straight line. We model the problem
by constructing a control system
ẋ = u,
where x ∈ R2 is the position in the plane and u ∈ R2 is the velocity vector
along the curve. Suppose we wish to find a curve of minimal length con-
necting x(0) = x0 and x(1) = xf . To minimize the length, we minimize the
integral of the velocity along the curve,
J = ∫_0^1 ‖ẋ‖ dt = ∫_0^1 √(ẋ^T ẋ) dt,

subject to the initial and final state constraints. Use the maximum prin-
ciple to show that the minimal length path is indeed a straight line at max-
imum velocity. (Hint: try minimizing using the integral cost ẋT ẋ first and
then show this also optimizes the optimal control problem with integral cost
kẋk.)
2.4 Consider the optimal control problem for the system
ẋ = −ax + bu,
where x ∈ R is a scalar state, u ∈ R is the input, the initial state x(t0) is given, and a, b ∈ R are positive constants. (Note that this system is not quite the same as the one in Example 2.2.) The cost function is given by

J = ∫_{t0}^{tf} ½ u^2(t) dt + ½ c x^2(tf),
where the terminal time tf is given and c is a constant.

(a) Solve explicitly for the optimal control u∗ (t) and the corresponding
state x∗ (t) in terms of t0 , tf , x(t0 ) and t and describe what happens
to the terminal state x∗ (tf ) as c → ∞.
(b) Show that the system is differentially flat with appropriate choice of
output(s) and compute the state and input as a function of the flat
output(s).
(c) Using the polynomial basis {tk , k = 0, . . . , M −1} with an appropriate
choice of M , solve for the (non-optimal) trajectory between x(t0 ) and
x(tf ). Your answer should specify the explicit input ud (t) and state
xd (t) in terms of t0 , tf , x(t0 ), x(tf ) and t.
(d) Let a = 1 and c = 1. Use your solution to the optimal control prob-
lem and the flatness-based trajectory generation to find a trajectory
between x(0) = 0 and x(1) = 1. Plot the state and input trajectories
for each solution and compare the costs of the two approaches.
(e) (Optional) Suppose that we choose more than the minimal number of
basis functions for the differentially flat output. Show how to use the
additional degrees of freedom to minimize the cost of the flat trajec-
tory and demonstrate that you can obtain a cost that is closer to the
optimal.

2.5 Repeat Exercise 2.4 using the system


ẋ = −a x^3 + b u.
For part (a) you need only write the conditions for the optimal cost.
2.6 Consider the problem of moving a two-wheeled mobile robot (e.g., a
Segway) from one position and orientation to another. The dynamics for the
system is given by the nonlinear differential equation
ẋ = cos θ v, ẏ = sin θ v, θ̇ = ω,
where (x, y) is the position of the rear wheels, θ is the angle of the robot
with respect to the x axis, v is the forward velocity of the robot and ω is
the spinning rate. We wish to choose an input (v, ω) that minimizes the time
that it takes to move between two configurations (x0 , y0 , θ0 ) and (xf , yf , θf ),
subject to input constraints |v| ≤ L and |ω| ≤ M .
Use the maximum principle to show that any optimal trajectory consists
of segments in which the robot is traveling at maximum velocity in either the
forward or reverse direction, and going either straight, hard left (ω = −M )
or hard right (ω = +M ).
Note: one of the cases is a bit tricky and cannot be completely proven
with the tools we have learned so far. However, you should be able to show
the other cases and verify that the tricky case is possible.
2.7 Consider a linear system with input u and output y and suppose we
wish to minimize the quadratic cost function
J = ∫_0^∞ ( y^T y + ρ u^T u ) dt.
Show that if the corresponding linear system is observable, then the closed
loop system obtained by using the optimal feedback u = −Kx is guaranteed
to be stable.
2.8 Consider the system transfer function
H(s) = (s + b) / ( s(s + a) ),        a, b > 0
with state space representation
   
ẋ = [ 0  1 ; 0  −a ] x + [ 0 ; 1 ] u,
y = [ b  1 ] x
and performance criterion
V = ∫_0^∞ ( x1^2 + u^2 ) dt.

(a) Let

P = [ p11  p12 ; p21  p22 ],
with p12 = p21 and P > 0 (positive definite). Write the steady state
Riccati equation as a system of four explicit equations in terms of the
elements of P and the constants a and b.
(b) Find the gains for the optimal controller assuming the full state is
available for feedback.
(c) Find the closed loop natural frequency and damping ratio.

2.9 Consider the optimal control problem for the system


ẋ = ax + bu,        J = ∫_{t0}^{tf} ½ u^2(t) dt + ½ c x^2(tf),
where x ∈ R is a scalar state, u ∈ R is the input, the initial state x(t0) is
given, and a, b ∈ R are positive constants. We take the terminal time tf as
given and let c > 0 be a constant that balances the final value of the state
with the input required to get to that position. The optimal trajectory is
derived in Example 2.2.
Now consider the infinite horizon cost
J = ∫_{t0}^{∞} ½ u^2(t) dt

with x(t) at t = ∞ constrained to be zero.


(a) Solve for u∗ (t) = −bP x∗ (t) where P is the positive solution corre-
sponding to the algebraic Riccati equation. Note that this gives an
explicit feedback law (u = −bP x).
(b) Plot the state solution of the finite time optimal controller for the
following parameter values
a = 2, b = 0.5, x(t0 ) = 4,
c = 0.1, 10, tf = 0.5, 1, 10.
(This should give you a total of 6 curves.) Compare these to the infinite
time optimal control solution. Which finite time solution is closest to
the infinite time solution? Why?
2.10 Consider the lateral control problem for an autonomous ground vehicle
from Example 1.1. We assume that we are given a reference trajectory r =
(xd , yd ) corresponding to the desired trajectory of the vehicle. For simplicity,
we will assume that we wish to follow a straight line in the x direction at a
constant velocity vd > 0 and hence we focus on the y and θ dynamics:
ẏ = sin θ · vd,        θ̇ = (vd / l) tan φ.
We let vd = 10 m/s and l = 2 m.
(a) Design an LQR controller that stabilizes the position y to yd = 0. Plot
the step and frequency response for your controller and determine the
overshoot, rise time, bandwidth and phase margin for your design.
(Hint: for the frequency domain specifications, break the loop just
before the process dynamics and use the resulting SISO loop transfer
function.)
(b) Suppose now that yd (t) is not identically zero, but is instead given
by yd (t) = r(t). Modify your control law so that you track r(t) and
demonstrate the performance of your controller on a “slalom course”
given by a sinusoidal trajectory with magnitude 1 meter and frequency
1 Hz.
Chapter Three
Receding Horizon Control
(with J. E. Hauser and A. Jadbabaie)

This set of notes builds on the previous two chapters and explores the use of
online optimization as a tool for control of nonlinear systems. We begin with
a high-level discussion of optimization-based control, refining some of the
concepts initially introduced in Chapter 1. We then describe the technique
of receding horizon control (RHC), including a proof of stability for a partic-
ular form of receding horizon control that makes use of a control Lyapunov
function as a terminal cost. We conclude the chapter with a detailed design
example, in which we can explore some of the computational tradeoffs in
optimization-based control.

Prerequisites. Readers should be familiar with the concepts of trajectory


generation and optimal control as described in Chapters 1 and 2. For the
proof of stability for the receding horizon controller that we present, famil-
iarity with Lyapunov stability analysis at the level given in ÅM08, Chapter
4 (Dynamic Behavior) is assumed.

The material in this chapter is based on part on joint work with John Hauser
and Ali Jadbabaie [MHJ+ 03].

3.1 Optimization-Based Control


Optimization-based control refers to the use of online, optimal trajectory
generation as a part of the feedback stabilization of a (typically nonlinear)
system. The basic idea is to use a receding horizon control technique: an (optimal) feasible trajectory is computed from the current position to the desired position over a finite time horizon T, used for a short period of time
δ < T , and then recomputed based on the new system state starting at
time t + δ until time t + T + δ. Development and application of receding
horizon control (also called model predictive control, or MPC) originated
in process control industries where the processes being controlled are often
sufficiently slow to permit its implementation. An overview of the evolution
of commercially available MPC technology is given in [QB97] and a survey
of the state of stability theory of MPC is given in [MRRS00].
Figure 3.1: Optimization-based control approach.

Design approach
The basic philosophy that we propose is illustrated in Figure 3.1. We begin
with a nonlinear system, including a description of the constraint set. We
linearize this system about a representative equilibrium point and perform
a linear control design using standard control design tools. Such a design
can provide provably robust performance around the equilibrium point and,
more importantly, allows the designer to meet a wide variety of formal and
informal performance specifications through experience and the use of so-
phisticated linear design tools.
The resulting linear control law then serves as a specification of the de-
sired control performance for the entire nonlinear system. We convert the
control law specification into a receding horizon control formulation, chosen
such that for the linearized system, the receding horizon controller gives com-
parable performance. However, because of its use of optimization tools that
can handle nonlinearities and constraints, the receding horizon controller
is able to provide the desired performance over a much larger operating
envelope than the controller design based just on the linearization. Further-
more, by choosing cost formulations that have certain properties, we can
provide proofs of stability for the full nonlinear system and, in some cases,
the constrained system.
The advantage of the proposed approach is that it exploits the power
of humans in designing sophisticated control laws in the absence of con-
straints with the power of computers to rapidly compute trajectories that
optimize a given cost function in the presence of constraints. New advances
in online trajectory generation serve as an enabler for this approach and
their demonstration on representative flight control experiments shows their
viability [MFHM05]. This approach can be extended to existing nonlinear
paradigms as well, as we describe in more detail below.
An advantage of optimization-based approaches is that they allow the
potential for online customization of the controller. By updating the model
that the optimization uses to reflect the current knowledge of the system
characteristics, the controller can take into account changes in parameters
values or damage to sensors or actuators. In addition, environmental models
that include dynamic constraints can be included, allowing the controller to
generate trajectories that satisfy complex operating conditions. These mod-
ifications allow for many state- and environment-dependent uncertainties to
be incorporated in the receding horizon feedback loop, providing potential robustness
with respect to those uncertainties.
A number of approaches in receding horizon control employ the use of
terminal state equality or inequality constraints, often together with a ter-
minal cost, to ensure closed loop stability. In Primbs et al. [PND99], aspects
of a stability-guaranteeing, global control Lyapunov function (CLF) were
used, via state and control constraints, to develop a stabilizing receding
horizon scheme. Many of the nice characteristics of the CLF controller to-
gether with better cost performance were realized. Unfortunately, a global
control Lyapunov function is rarely available and often does not exist.
Motivated by the difficulties in solving constrained optimal control prob-
lems, researchers have developed an alternative receding horizon control
strategy for the stabilization of nonlinear systems [JYH01]. In this approach,
closed loop stability is ensured through the use of a terminal cost consisting
of a control Lyapunov function (defined later) that is an incremental upper
bound on the optimal cost to go. This terminal cost eliminates the need
for terminal constraints in the optimization and gives a dramatic speed-up
in computation. Also, questions of existence and regularity of optimal solu-
tions (very important for online optimization) can be dealt with in a rather
straightforward manner.

Inverse Optimality
The philosophy presented here relies on the synthesis of an optimal control
problem from specifications that are embedded in an externally generated
controller design. This controller is typically designed by standard classical
control techniques for a nominal process, absent constraints. In this frame-
work, the controller’s performance, stability and robustness specifications
are translated into an equivalent optimal control problem and implemented
in a receding horizon fashion.
One central question that must be addressed when considering the use-
fulness of this philosophy is: Given a control law, how does one find an
equivalent optimal control formulation? The paper by Kalman [Kal64] lays
a solid foundation for this class of problems, known as inverse optimality.
In this paper, Kalman considers the class of linear time-invariant (LTI) pro-
cesses with full-state feedback and a single input variable, with an associated
cost function that is quadratic in the input and state variables. These as-
sumptions set up the well-known linear quadratic regulator (LQR) problem,
by now a staple of optimal control theory.
In Kalman's paper, the mathematical framework behind the LQR prob-
lem is laid out, and necessary and sufficient algebraic criteria for optimality
are presented in terms of the algebraic Riccati equation, as well as in terms
of a condition on the return difference of the feedback loop. In terms of the
LQR problem, the task of synthesizing the optimal control problem comes
down to finding the integrated cost weights Qx and Qu given only the dynam-
ical description of the process represented by matrices A and B and of the
feedback controller represented by K. Kalman delivers a particularly elegant
frequency characterization of this map [Kal64], which we briefly summarize
here.
We consider a linear system
ẋ = Ax + Bu x ∈ Rn , u ∈ Rm (3.1)
with state x and input u. We consider only the single input, single output
case for now (m = 1). Given a control law
u = Kx
we wish to find a cost functional of the form
J = ∫_0^T ( x^T Qx x + u^T Qu u ) dt + x^T(T) PT x(T)        (3.2)

where Qx ∈ Rn×n and Qu ∈ Rm×m define the integrated cost, PT ∈ Rn×n


is the terminal cost, and T is the time horizon. Our goal is to find PT > 0,
Qx > 0, Qu > 0, and T > 0 such that the resulting optimal control law is
equivalent to u = Kx.
The optimal control law for the quadratic cost function (3.2) is given by
u = −R^{-1} B^T P(t) x,
where P (t) is the solution to the Riccati ordinary differential equation
− Ṗ = AT P + P A − P BR−1 B T P + Q (3.3)
with terminal condition P (T ) = PT . In order for this to give a control law
of the form u = Kx for a constant matrix K, we must find PT , Qx , and
Qu that give a constant solution to the Riccati equation (3.3) and satisfy
−R−1 B T P = K. It follows that PT , Qx and Qu should satisfy
AT PT + PT A − PT BQ−1 T
u B PT + Q = 0
(3.4)
−Q−1 T
u B PT = K.

We note that the first equation is simply the normal algebraic Riccati equa-
tion of optimal control, but with PT , Q, and R yet to be chosen. The second
equation places additional constraints on R and PT .
Equation (3.4) is exactly the same equation that one would obtain if we
had considered an infinite time horizon problem, since the given control was
constant and hence P (t) was forced to be constant. This infinite horizon
3.1. OPTIMIZATION-BASED CONTROL 3-5

problem is precisely the one that Kalman considered in 1964, and hence
his results apply directly. Namely, in the single-input single-output case, we
can always find a solution to the coupled equations (3.4) under standard
conditions on reachability and observability [Kal64]. The equations can be
simplified by substituting the second relation into the first to obtain
AT PT + PT A − K T RK + Q = 0.
This equation is linear in the unknowns and can be solved directly (remem-
bering that PT , Qx and Qu are required to be positive definite).
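A sketch of this inverse-optimality computation (our addition, with A, B, K, Q, and R assumed given and compatible) uses a standard Lyapunov solver for the linear equation and then checks the gain condition from equation (3.4):

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def inverse_optimal_terminal_cost(A, B, K, Q, R):
    # solve A^T PT + PT A - K^T R K + Q = 0 for PT (linear in PT)
    PT = solve_continuous_lyapunov(A.T, K.T @ R @ K - Q)
    # equation (3.4) also requires K = -R^{-1} B^T PT; report the recovered gain
    K_recovered = -np.linalg.solve(R, B.T @ PT)
    return PT, K_recovered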
The implication of these results is that any state feedback control law
satisfying these assumptions can be realized as the solution to an appro-
priately defined receding horizon control law. Thus, we can implement the
design framework summarized in Figure 3.1 for the case where our (linear)
control design results in a state feedback controller.
The above results can be generalized to nonlinear systems, in which one
takes a nonlinear control system and attempts to find a cost function such
that the given controller is the optimal control with respect to that cost.
The history of inverse optimal control for nonlinear systems goes back to
the early work of Moylan and Anderson [MA73]. More recently, Sepulchre
et al. [SJK97] showed that a nonlinear state feedback obtained by Sontag’s
formula from a control Lyapunov function (CLF) is inverse optimal. The con-
nections of this inverse optimality result to passivity and robustness prop-
erties of the optimal state feedback are discussed in Jankovic et al. [JSK99].
Most results on inverse optimality do not consider the constraints on control
or state. However, the results on the unconstrained inverse optimality justify
the use of a more general nonlinear loss function in the integrated cost of
a finite horizon performance index combined with a real-time optimization-
based control approach that takes the constraints into account.

Control Lyapunov Functions


For the optimal control problems that we introduce in the next section, we
will make use of a terminal cost that is also a control Lyapunov function
for the system. Control Lyapunov functions are an extension of standard
Lyapunov functions and were originally introduced by Sontag [Son83]. They
allow constructive design of nonlinear controllers and the Lyapunov function
that proves their stability. A more complete treatment is given in [KKK95].
Consider a nonlinear control system
ẋ = f (x, u), x ∈ Rn , u ∈ Rm . (3.5)
Definition 3.1 (Control Lyapunov Function). A locally positive function
V : Rn → R+ is called a control Lyapunov function (CLF) for a control
system (3.5) if
 
inf_{u∈R^m} ∂V/∂x f(x, u) < 0        for all x ≠ 0.
In general, it is difficult to find a CLF for a given system. However, for
many classes of systems, there are specialized methods that can be used. One
of the simplest is to use the Jacobian linearization of the system around the
desired equilibrium point and generate a CLF by solving an LQR problem.
As described in Chapter 2, the problem of minimizing the quadratic
performance index,

J = ∫_0^∞ ( x^T(t) Q x(t) + u^T(t) R u(t) ) dt    subject to    ẋ = Ax + Bu,  x(0) = x0,        (3.6)
results in finding the positive definite solution of the following Riccati equa-
tion:
AT P + P A − P BR−1 B T P + Q = 0 (3.7)
The optimal control action is given by
u = −R−1 B T P x
and V = xT P x is a CLF for the system.
In the case of the nonlinear system ẋ = f (x, u), A and B are taken as
A = ∂f(x, u)/∂x |_{(0,0)},        B = ∂f(x, u)/∂u |_{(0,0)},

where the pairs (A, B) and (Q^{1/2}, A) are assumed to be stabilizable and de-
tectable respectively. The CLF V (x) = xT P x is valid in a region around the
equilibrium (0, 0), as shown in Exercise 3.1.
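A minimal sketch of this CLF construction (our addition; the nonlinear dynamics f and the weights Q, R are assumed to be supplied by the user, and the Jacobians are taken by finite differences):

import numpy as np
from scipy.linalg import solve_continuous_are

def clf_from_linearization(f, n, m, Q, R, eps=1e-6):
    # Jacobian linearization of xdot = f(x, u) about the equilibrium (0, 0)
    A = np.column_stack([(f(eps*e, np.zeros(m)) - f(-eps*e, np.zeros(m))) / (2*eps)
                         for e in np.eye(n)])
    B = np.column_stack([(f(np.zeros(n), eps*e) - f(np.zeros(n), -eps*e)) / (2*eps)
                         for e in np.eye(m)])
    # solve the Riccati equation (3.7); V(x) = x^T P x is then a CLF near the origin
    P = solve_continuous_are(A, B, Q, R)
    return P, lambda x: float(x @ P @ x)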
More complicated methods for finding control Lyapunov functions are
often required and many techniques have been developed. An overview of
some of these methods can be found in [Jad01].

Finite Horizon Optimal Control


We briefly review the problem of optimal control over a finite time horizon
as presented in Chapter 2 to establish the notation for the chapter and set
some more specific conditions required for receding horizon control. This
material is based on [MHJ+ 03].
Given an initial state x0 and a control trajectory u(·) for a nonlinear
control system ẋ = f (x, u), let xu (·; x0 ) represent the state trajectory. We
can write this solution as a continuous curve
x^u(t; x0) = x0 + ∫_0^t f( x^u(τ; x0), u(τ) ) dτ
for t ≥ 0. We require that the trajectories of the system satisfy an a priori
bound
‖x(t)‖ ≤ β(x, T, ‖u(·)‖_1) < ∞,        t ∈ [0, T],

where β is continuous in all variables and monotone increasing in T and ‖u(·)‖_1 = ‖u(·)‖_{L1(0,T)}. Most models of physical systems will satisfy a bound
of this type.
The performance of the system will be measured by an integral cost
L : Rn × Rm → R. We require that L be twice differentiable (C 2 ) and fully
penalize both state and control according to
L(x, u) ≥ cq ( ‖x‖^2 + ‖u‖^2 ),        x ∈ R^n, u ∈ R^m
for some cq > 0 and L(0, 0) = 0. It follows that the quadratic approximation
of L at the origin is positive definite,

∂^2 L/∂(x,u)^2 |_{(0,0)} ≥ cq I > 0.

To ensure that the solutions of the optimization problems of interest


are well behaved, we impose some convexity conditions. We require the set
f (x, Rm ) ⊂ Rn to be convex for each x ∈ Rn . Letting λ ∈ Rn represent
the co-state, we also require that the pre-Hamiltonian function λT f (x, u) +
L(x, u) =: K(x, u, λ) be strictly convex for each (x, λ) ∈ Rn × Rn and that
there is a C 2 function ū∗ : Rn × Rn → Rm providing the global minimum of
K(x, u, λ). The Hamiltonian H(x, λ) := K(x, ū∗ (x, λ), λ) is then C 2 , ensur-
ing that extremal state, co-state, and control trajectories will all be suffi-
ciently smooth (C 1 or better). Note that these conditions are automatically
satisfied for control affine f and quadratic L.
The cost of applying a control u(·) from an initial state x over the infinite
time interval [0, ∞) is given by
J∞(x, u(·)) = ∫_0^∞ L( x^u(τ; x), u(τ) ) dτ.
The optimal cost (from x) is given by

J∞*(x) = inf_{u(·)} J∞(x, u(·)),

where the control function u(·) belongs to some reasonable class of admissible
controls (e.g., piecewise continuous). The function J∞ ∗ (x) is often called the

optimal value function for the infinite horizon optimal control problem. For
the class of f and L considered, it can be verified that J∞*(·) is a positive definite C^2 function in a neighborhood of the origin [HO01].
For practical purposes, we are interested in finite horizon approximations
of the infinite horizon optimization problem. In particular, let V (·) be a
nonnegative C 2 function with V (0) = 0 and define the finite horizon cost
(from x using u(·)) to be
JT(x, u(·)) = ∫_0^T L( x^u(τ; x), u(τ) ) dτ + V( x^u(T; x) )        (3.8)
and denote the optimal cost (from x) as
JT*(x) = inf_{u(·)} JT(x, u(·)).
As in the infinite horizon case, one can show, by geometric means, that JT∗ (·)
is locally smooth (C 2 ). Other properties will depend on the choice of V and
T.
Let Γ∞ denote the domain of J∞*(·) (the subset of R^n on which J∞* is finite). It is not too difficult to show that the cost functions J∞*(·) and JT*(·), T ≥ 0, are continuous functions on Γ∞ [Jad01]. For simplicity, we will allow J∞*(·) to take values in the extended real line so that, for instance, J∞*(x) = +∞ means that there is no control taking x to the origin.
We will assume that f and L are such that the minimum value of the cost functions J∞*(x), JT*(x), T ≥ 0, is attained for each (suitable) x. That is, given x and T > 0 (including T = ∞ when x ∈ Γ∞), there is a (C^1 in t) optimal trajectory (x*T(t; x), u*T(t; x)), t ∈ [0, T], such that JT(x, u*T(·; x)) = JT*(x). For instance, if f is such that its trajectories can be bounded on
finite intervals as a function of its input size, e.g., there is a continuous
function β such that ‖x^u(t; x0)‖ ≤ β(‖x0‖, ‖u(·)‖_{L1[0,t]}), then (together with
the conditions above) there will be a minimizing control (cf. [LM67]). Many
such conditions may be used to good effect; see [Jad01] for a more complete
discussion.
It is easy to see that J∞*(·) is proper on its domain so that the sub-level sets

Γ∞_r := { x ∈ Γ∞ : J∞*(x) ≤ r^2 }

are compact and path connected and moreover Γ∞ = ∪_{r≥0} Γ∞_r. Note also
that Γ∞ may be a proper subset of Rn since there may be states that cannot
be driven to the origin. We use r2 (rather than r) here to reflect the fact that
our integral cost is quadratically bounded from below. We refer to sub-level
sets of JT∗ (·) and V (·) using
Γ^T_r := path connected component of { x ∈ Γ∞ : JT*(x) ≤ r^2 } containing 0,
and
Ω_r := path connected component of { x ∈ R^n : V(x) ≤ r^2 } containing 0.
These results provide the technical framework needed for receding hori-
zon control.

3.2 Receding Horizon Control with CLF Terminal Cost


In receding horizon control, a finite horizon optimal control problem is
solved, generating open-loop state and control trajectories. The resulting
control trajectory is applied to the system for a fraction of the horizon
length. This process is then repeated, resulting in a sampled data feedback
law. Although receding horizon control has been successfully used in the
process control industry for many years, its application to fast, stability-
critical nonlinear systems has been more difficult. This is mainly due to
two issues. The first is that the finite horizon optimizations must be solved
in a relatively short period of time. Second, it can be demonstrated using
linear examples that a naive application of the receding horizon strategy
can have undesirable effects, often rendering a system unstable. Various ap-
proaches have been proposed to tackle this second problem; see [MRRS00]
for a comprehensive review of this literature. The theoretical framework pre-
sented here also addresses the stability issue directly, but is motivated by
the need to relax the computational demands of existing stabilizing RHC
formulations.
Receding horizon control provides a practical strategy for the use of in-
formation from a model through on-line optimization. Every δ seconds, an
optimal control problem is solved over a T second horizon, starting from the
current state. The first δ seconds of the optimal control u∗T (·; x(t)) is then
applied to the system, driving the system from x(t) at current time t to
x∗T (δ, x(t)) at the next sample time t + δ (assuming no model uncertainty).
We denote this receding horizon scheme as RH(T, δ).
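In pseudocode, the RH(T, δ) scheme is a simple loop around a trajectory optimizer. The sketch below is our illustration only: solve_finite_horizon (returning the optimal input on [0, T]) and apply_input (running the plant for δ seconds) are assumed to be provided elsewhere.

def receding_horizon(x0, T, delta, n_steps, solve_finite_horizon, apply_input):
    # RH(T, delta): re-solve a T-second optimal control problem every delta seconds,
    # applying only the first delta seconds of each solution
    x = x0
    for _ in range(n_steps):
        u_opt = solve_finite_horizon(x, T)   # open-loop optimal input from the current state
        x = apply_input(x, u_opt, delta)     # advance the (real or simulated) plant by delta
    return x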
In defining (unconstrained) finite horizon approximations to the infinite
horizon problem, the key design parameters are the terminal cost function
V (·) and the horizon length T (and, perhaps also, the increment δ). We wish
to characterize the sets of choices that provide successful controllers.
It is well known (and easily demonstrated with linear examples), that
simple truncation of the integral (i.e., V (x) ≡ 0) may have disastrous effects
if T > 0 is too small. Indeed, although the resulting value function may be
nicely behaved, the “optimal” receding horizon closed loop system can be
unstable.
A more sophisticated approach is to make good use of a suitable terminal
cost V (·). Evidently, the best choice for the terminal cost is V (x) = J∞ ∗ (x)

since then the optimal finite and infinite horizon costs are the same. Of
course, if the optimal value function were available there would be no need
to solve a trajectory optimization problem. What properties of the optimal
value function should be retained in the terminal cost? To be effective, the
terminal cost should account for the discarded tail by ensuring that the
origin can be reached from the terminal state xu (T ; x) in an efficient manner
(as measured by L). One way to do this is to use an appropriate control
Lyapunov function, which is also an upper bound on the cost-to-go.
The following theorem shows that the use of a particular type of CLF is
in fact effective, providing rather strong and specific guarantees.

Theorem 3.1. [JYH01] Suppose that the terminal cost V (·) is a control
Lyapunov function such that
min_{u∈R^m} (V̇ + L)(x, u) ≤ 0        (3.9)

for each x ∈ Ωrv for some rv > 0. Then, for every T > 0 and δ ∈ (0, T ], the
resulting receding horizon trajectories go to zero exponentially fast. For each
T > 0, there is an r̄(T) ≥ rv such that Γ^T_{r̄(T)} is contained in the region of attraction of RH(T, δ). Moreover, given any compact subset Λ of Γ∞, there is a T* such that Λ ⊂ Γ^T_{r̄(T)} for all T ≥ T*.

Theorem 3.1 shows that for any horizon length T > 0 and any sampling
time δ ∈ (0, T ], the receding horizon scheme is exponentially stabilizing
over the set Γ^T_{rv}. For a given T, the region of attraction estimate is enlarged by increasing r beyond rv to r̄(T) according to the requirement that V(x*T(T; x)) ≤ rv^2 on that set. An important feature of the above result is that, for operation within the set Γ^T_{r̄(T)}, there is no need to impose stability
ensuring constraints which would likely make the online optimizations more
difficult and time consuming to solve.

Sketch of proof. Let xu (τ ; x) represent the state trajectory at time τ start-


ing from initial state x and applying a control trajectory u(·), and let
(x∗T , u∗T )(·, x) represent the optimal trajectory of the finite horizon, opti-
mal control problem with horizon T . Assume that x∗T (T ; x) ∈ Ωr for some
r > 0. Then for any δ ∈ [0, T] we want to show that the optimal cost from x*T(δ; x) satisfies

JT*( x*T(δ; x) ) ≤ JT*(x) − ∫_0^δ L( x*T(τ; x), u*T(τ; x) ) dτ.        (3.10)

This expression says that the solution to the finite-horizon, optimal control problem starting at time t = δ has cost that is less than the cost of the solution from time t = 0, with the initial portion of the cost subtracted off. In other
words, we are closer to our solution by a finite amount at each iteration
of the algorithm. It follows using Lyapunov analysis that we must converge
to the zero cost solution and hence our trajectory converges to the desired
terminal state (given by the minimum of the cost function).
To show equation (3.10) holds, consider a trajectory in which we apply
the optimal control for the first T seconds and then apply a closed loop
controller using a stabilizing feedback u = −k(x) for another T seconds. (The
stabilizing compensator is guaranteed to exist since V is a control Lyapunov
function.) Let (x∗T , u∗T )(t; x), t ∈ [0, T ] represent the optimal control and
(xk , uk )(t − T ; x∗T (T ; x)), t ∈ [T, 2T ] represent the control with u = −k(x)
applied where k satisfies (V̇ + L)(x, −k(x)) ≤ 0. Finally, let (x̃(t), ũ(t)),
t ∈ [0, 2T ] represent the trajectory obtained by concatenating the optimal
trajectory (x∗T , u∗T ) with the CLF trajectory (xk , uk ).
We now proceed to show that the inequality (3.10) holds. The cost of
using ũ(·) for the first T seconds starting from the initial state x*T(δ; x), δ ∈ [0, T], is given by

JT( x*T(δ; x), ũ(·) ) = ∫_δ^{T+δ} L(x̃(τ), ũ(τ)) dτ + V(x̃(T + δ))
                      = JT*(x) − ∫_0^δ L( x*T(τ; x), u*T(τ; x) ) dτ − V( x*T(T; x) )
                        + ∫_T^{T+δ} L(x̃(τ), ũ(τ)) dτ + V(x̃(T + δ)).

Note that the second line is simply a rewriting of the integral in terms of the optimal cost JT* with the necessary additions and subtractions of the additional portions of the cost for the interval [δ, T + δ]. We can now use the bound

L(x̃(τ), ũ(τ)) ≤ −V̇(x̃(τ), ũ(τ)),        τ ∈ [T, 2T],

which follows from the definition of the CLF V and the stabilizing controller k(x). This allows us to write

JT( x*T(δ; x), ũ(·) ) ≤ JT*(x) − ∫_0^δ L( x*T(τ; x), u*T(τ; x) ) dτ − V( x*T(T; x) )
                        − ∫_T^{T+δ} V̇(x̃(τ), ũ(τ)) dτ + V(x̃(T + δ))
                      = JT*(x) − ∫_0^δ L( x*T(τ; x), u*T(τ; x) ) dτ − V( x*T(T; x) )
                        − V(x̃(τ)) |_T^{T+δ} + V(x̃(T + δ))
                      = JT*(x) − ∫_0^δ L( x*T(τ; x), u*T(τ; x) ) dτ.

Finally, using the optimality of u∗T we have that JT∗ (x∗T (δ; x)) ≤ JT (x∗T (δ; x), ũ(·))
and we obtain equation (3.10).

An important benefit of receding horizon control is its ability to handle


state and control constraints. While the above theorem provides stability
guarantees when there are no constraints present, it can be modified to
include constraints on states and controls as well. In order to ensure stability
when state and control constraints are present, the terminal cost V (·) should
be a local CLF satisfying minu∈U V̇ + L(x, u) ≤ 0 where U is the set of
controls where the control constraints are satisfied. Moreover, one should
also require that the resulting state trajectory xCLF (·) ∈ X , where X is
the set of states where the constraints are satisfied. (Both X and U are
assumed to be compact with origin in their interior). Of course, the set Ωrv
will end up being smaller than before, resulting in a decrease in the size of
the guaranteed region of operation (see [MRRS00] for more details).
3.3 Receding Horizon Control Using Differential Flatness


In this section we demonstrate how to use differential flatness to find fast
numerical algorithms for solving the optimal control problems required for
the receding horizon control results of the previous section. We consider the
affine nonlinear control system
ẋ = f (x) + g(x)u, (3.11)
where all vector fields and functions are smooth. For simplicity, we focus on
the single input case, u ∈ R. We wish to find a trajectory of equation (3.11)
that minimizes the performance index (3.8), subject to a vector of initial,
final, and trajectory constraints
lb0 ≤ ψ0 (x(t0 ), u(t0 )) ≤ ub0 ,
lbf ≤ ψf (x(tf ), u(tf )) ≤ ubf , (3.12)
lbt ≤ S(x, u) ≤ ubt ,
respectively. For conciseness, we will refer to this optimal control problem
as

min_{(x,u)} J(x, u)    subject to    ẋ = f(x) + g(x)u,    lb ≤ c(x, u) ≤ ub.        (3.13)

Numerical Solution Using Collocation


A numerical approach to solving this optimal control problem is to use the
direct collocation method outlined in Hargraves and Paris [HP87]. The idea
behind this approach is to transform the optimal control problem into a
nonlinear programming problem. This is accomplished by discretizing time
into a grid of N − 1 intervals
t0 = t1 < t2 < . . . < tN = tf (3.14)
and approximating the state x and the control input u as piecewise polyno-
mials x̃ and ũ, respectively. Typically a cubic polynomial is chosen for the
states and a linear polynomial for the control on each interval. Collocation
is then used at the midpoint of each interval to satisfy equation (3.11). Let
x̃(x(t1 ), ..., x(tN )) and ũ(u(t1 ), ..., u(tN )) denote the approximations to x and
u, respectively, depending on (x(t1 ), ..., x(tN )) ∈ RnN and (u(t1 ), ..., u(tN )) ∈
RN corresponding to the value of x and u at the grid points. Then one
solves the following finite dimension approximation of the original control
problem (3.13):

min_{y∈R^M} F(y) = J(x̃(y), ũ(y))    subject to
    x̃˙ − f(x̃(y)) − g(x̃(y)) ũ(y) = 0,
    lb ≤ c(x̃(y), ũ(y)) ≤ ub,
    for all t = (tj + tj+1)/2,    j = 1, . . . , N − 1,        (3.15)

where y = (x(t1), u(t1), . . . , x(tN), u(tN)), and M = dim y = (n + 1)N.
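A bare-bones sketch of this transcription (our illustration) replaces the cubic/linear interpolation described above with a simple midpoint defect and uses SciPy's generic NLP solver in place of a dedicated one; f, L, the boundary states, and the grid size are assumed inputs.

import numpy as np
from scipy.optimize import minimize

def transcribe_and_solve(f, L, x0, xf, n, N, tf):
    ts = np.linspace(0.0, tf, N)
    h = ts[1] - ts[0]

    def unpack(y):
        return y[:n*N].reshape(N, n), y[n*N:]    # states X (N x n) and scalar inputs U (N,)

    def cost(y):
        X, U = unpack(y)
        return h * sum(L(X[j], U[j]) for j in range(N))

    def defects(y):
        X, U = unpack(y)
        d = [X[0] - x0, X[-1] - xf]              # boundary conditions
        for j in range(N - 1):
            xm, um = 0.5*(X[j] + X[j+1]), 0.5*(U[j] + U[j+1])
            d.append(X[j+1] - X[j] - h*f(xm, um))   # dynamics enforced at interval midpoints
        return np.concatenate([np.atleast_1d(di) for di in d])

    res = minimize(cost, np.zeros(n*N + N), method='SLSQP',
                   constraints={'type': 'eq', 'fun': defects})
    return unpack(res.x)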


Seywald [Sey94] suggested an improvement to the previous method (see
also [Bry99, p. 362]). Following this work, one first solves a subset of system
dynamics in equation (3.13) for the control in terms of combinations of
the state and its time derivative. Then one substitutes for the control in the
remaining system dynamics and constraints. Next all the time derivatives
ẋi are approximated by the finite difference approximations
x̄˙(ti) = ( x(ti+1) − x(ti) ) / ( ti+1 − ti )

to get

p(x̄˙(ti), x(ti)) = 0,
q(x̄˙(ti), x(ti)) ≤ 0,        i = 0, . . . , N − 1.

The optimal control problem is turned into

min_{y∈R^M} F(y)    subject to    p(x̄˙(ti), x(ti)) = 0,    q(x̄˙(ti), x(ti)) ≤ 0,        (3.16)
where y = (x(t1 ), . . . , x(tN )), and M = dim y = nN . As with the Har-
graves and Paris method, this parameterization of the optimal control prob-
lem (3.13) can be solved using nonlinear programming.
The dimensionality of this discretized problem is lower than the dimen-
sionality of the Hargraves and Paris method, where both the states and the
input are the unknowns. This induces substantial improvement in numerical
implementation.

Differential Flatness Based Approach


The results of Seywald give a constrained optimization problem in which
we wish to minimize a cost functional subject to n − 1 equality constraints,
corresponding to the system dynamics, at each time instant. In fact, it is
usually possible to reduce the dimension of the problem further. Given an
output, it is generally possible to parameterize the control and a part of
the state in terms of this output and its time derivatives. In contrast to the
previous approach, one must use more than one derivative of this output for
this purpose.
When the whole state and the input can be parameterized with one
output, the system is differentially flat, as described in Section 1.3. When
the parameterization is only partial, the dimension of the subspace spanned
by the output and its derivatives is given by r, the relative degree of this
output [Isi89]. In this case, it is possible to write the system dynamics as
x = α(z, ż, . . . , z^{(q)}),
u = β(z, ż, . . . , z^{(q)}),        (3.17)
Φ(z, ż, . . . , z^{(n−r)}) = 0,
Figure 3.2: Spline representation of a variable.

where z ∈ R^p, p > m, represents a set of outputs that parameterize the trajectory and Φ : R^n × R^m represents n − r remaining differential constraints
on the output. In the case that the system is flat, r = n and we eliminate
these differential constraints.
Unlike the approach of Seywald, it is not realistic to use finite difference
approximations as soon as r > 2. In this context, it is convenient to represent
z using B-splines. B-splines are chosen as basis functions because of their
ease of enforcing continuity across knot points and ease of computing their
derivatives. A pictorial representation of such an approximation is given in
Figure 3.2. Doing so we get
zj = Σ_{i=1}^{pj} B_{i,kj}(t) C_i^j,        pj = lj (kj − mj) + mj,

where B_{i,kj}(t) is the B-spline basis function defined in [dB78] for the output zj with order kj, C_i^j are the coefficients of the B-spline, lj is the number of knot intervals, and mj is the number of smoothness conditions at the knots. The set (z1, z2, . . . , z_{n−r}) is thus represented by M = Σ_{j∈{1,r+1,...,n}} pj coefficients.
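To make the parameterization concrete, the sketch below (our addition) builds one output zj from B-spline coefficients and evaluates it and its first derivative at a set of collocation points; the knot vector, spline degree, and coefficients are arbitrary placeholders.

import numpy as np
from scipy.interpolate import BSpline

deg = 3                                                   # cubic pieces between knotpoints
breakpts = np.linspace(0.0, 1.0, 5)                       # knotpoints over the time interval
knots = np.concatenate(([0.0]*deg, breakpts, [1.0]*deg))  # clamped knot vector
C = np.random.randn(len(knots) - deg - 1)                 # B-spline coefficients (decision variables)

z = BSpline(knots, C, deg)
t_col = np.linspace(0.0, 1.0, 21)                         # collocation points
z_vals = z(t_col)                                         # output values
zdot_vals = z.derivative(1)(t_col)                        # derivative values used by the constraints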
In general, w collocation points are chosen uniformly over the time in-
terval [t0, tf] (though optimal knot placements or Gaussian points may
also be considered). Both dynamics and constraints will be enforced at the
collocation points. The problem can be stated as the following nonlinear
programming form:
min_{y∈R^M} F(y)    subject to    Φ(z(y), ż(y), . . . , z^{(n−r)}(y)) = 0,    lb ≤ c(y) ≤ ub,        (3.18)

where

y = (C_1^1, . . . , C_{p1}^1, C_1^{r+1}, . . . , C_{p_{r+1}}^{r+1}, . . . , C_1^n, . . . , C_{pn}^n).
Figure 3.3: Caltech ducted fan.

The coefficients of the B-spline basis functions can be found using nonlinear
programming.
A software package called Nonlinear Trajectory Generation (NTG) has
been written to solve optimal control problems in the manner described
above (see [MMM00] for details). The sequential quadratic programming
package NPSOL by [GMSW] is used as the nonlinear programming solver in
NTG. When specifying a problem to NTG, the user is required to state the
problem in terms of some choice of outputs and its derivatives. The user is
also required to specify the regularity of the variables, the placement of the
knot points, the order and regularity of the B-splines, and the collocation
points for each output.

3.4 Implementation on the Caltech Ducted Fan


To demonstrate the use of the techniques described in the previous section,
we present an implementation of optimization-based control on the Caltech
Ducted Fan, a real-time, flight control experiment that mimics the longitu-
dinal dynamics of an aircraft. The experiment is shown in Figure 3.3.
Description of the Caltech Ducted Fan Experiment


The Caltech ducted fan is an experimental testbed designed for research
and development of nonlinear flight guidance and control techniques for
Uninhabited Combat Aerial Vehicles (UCAVs). The fan is a scaled model of
the longitudinal axis of a flight vehicle and flight test results validate that
the dynamics replicate qualities of actual flight vehicles [MM99].
The ducted fan has three degrees of freedom: the boom holding the ducted
fan is allowed to operate on a cylinder, 2 m high and 4.7 m in diameter, per-
mitting horizontal and vertical displacements. A counterweight is connected
to the vertical axis of the stand, allowing the effective mass of the fan to be
adjusted. Also, the wing/fan assembly at the end of the boom is allowed to
rotate about its center of mass. Optical encoders mounted on the ducted fan,
counterweight pulley, and the base of the stand measure the three degrees
of freedom. The fan is controlled by commanding a current to the electric
motor for fan thrust and by commanding RC servos to control the thrust
vectoring mechanism.
The sensors are read and the commands sent by a DSP-based multi-
processor system, comprised of a D/A card, a digital I/O card, two Texas
Instruments C40 signal processors, two Compaq Alpha processors, and a
high-speed host PC interface. A real-time interface provides access to the
processors and I/O hardware. The NTG software resides on both of the
Alpha processors, each capable of running real-time optimization.
The ducted fan is modeled in terms of the position and orientation of the
fan, and their velocities. Letting x represent the horizontal translation, z the
vertical translation and θ the rotation about the boom axis, the equations
of motion are given by

m ẍ + F_{X_a} − F_{X_b} cos θ − F_{Z_b} sin θ = 0,
m z̈ + F_{Z_a} + F_{X_b} sin θ − F_{Z_b} cos θ = m g_eff,        (3.19)
J θ̈ − M_a + (1/r_s) I_p Ω ẋ cos θ − F_{Z_b} r_f = 0,

where FXa = D cos γ + L sin γ and FZa = −D sin γ + L cos γ are the aerody-
namic forces and FXb and FZb are thrust vectoring body forces in terms of
the lift (L), drag (D), and flight path angle (γ). Ip and Ω are the moment
of inertia and angular velocity of the ducted fan propeller, respectively. J is the moment of inertia of the ducted fan and r_f is the distance from the center of mass along the X_b axis to the effective application point of the thrust vectoring force.
The angle of attack α can be derived from the pitch angle θ and the flight
path angle γ by

α = θ − γ.

The flight path angle can be derived from the spatial velocities by

γ = arctan(−ż/ẋ).

The lift (L), drag (D), and moment (M) are given by

L = q S C_L(α),    D = q S C_D(α),    M = c̄ S C_M(α),

respectively. The dynamic pressure is given by q = ½ ρV². Here V denotes the norm of the velocity, S the surface area of the wings, and ρ the
atmospheric density. The coefficients of lift (CL (α)), drag (CD (α)) and the
moment coefficient (CM (α)) are determined from a combination of wind
tunnel and flight testing and are described in more detail in [MM99], along
with the values of the other parameters.

Real-Time Trajectory Generation


In this section we describe the implementation of the trajectory generation
algorithms by using NTG to generate minimum time trajectories in real
time. An LQR-based regulator is used to stabilize the system. We focus
in this section on aggressive, forward flight trajectories. The next section
extends the controller to use a receding horizon controller, but on a simpler
class of trajectories.

Stabilization Around Reference Trajectory


The results in this section rely on the traditional two degree of freedom
design paradigm described in Chapter 1. In this approach, a local control law
(inner loop) is used to stabilize the system around the trajectory computed
based on a nominal model. This compensates for uncertainties in the model,
which are predominantly due to aerodynamics and friction. Elements such as
the ducted fan flying through its own wake, ground effects and velocity- and
angle-of-attack dependent thrust contribute to the aerodynamic uncertainty.
Actuation models are not used when generating the reference trajectory,
resulting in another source of uncertainty.
Since only the position of the fan is measured, we must estimate the
velocities. We use an extended Kalman filter (described in later chapters), with the optimal gain matrix gain scheduled on the (estimated) forward velocity.
The stabilizing LQR controllers were gain scheduled on pitch angle, θ,
and the forward velocity, ẋ. The pitch angle was allowed to vary from −π/2
to π/2 and the velocity ranged from 0 to 6 m/s. The weights were chosen
differently for the hover-to-hover and forward flight modes. For the forward
flight mode, a smaller weight was placed on the horizontal (x) position of
the fan compared to the hover-to-hover mode. Furthermore, the z weight
was scheduled as a function of forward velocity in the forward flight mode.

There was no scheduling on the weights for hover-to-hover. The elements of the gain matrices for both the controller and the observer are linearly interpolated over 51 operating points.
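A minimal sketch of this kind of gain scheduling is shown below: LQR gains are precomputed on a grid of operating points and the gain matrix entries are linearly interpolated at run time. The grid, weights, and the linearize function are placeholders for illustration only, not the models or values used on the experiment.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    def lqr(A, B, Q, R):
        """Continuous-time LQR gain K such that u = -K x."""
        P = solve_continuous_are(A, B, Q, R)
        return np.linalg.solve(R, B.T @ P)

    velocities = np.linspace(0.0, 6.0, 7)   # scheduling variable (forward velocity)

    def linearize(v):
        # Placeholder linearization; the real models come from equation (3.19).
        A = np.array([[0.0, 1.0], [0.1 * v, -0.5]])
        B = np.array([[0.0], [1.0]])
        return A, B

    Q, R = np.diag([4.0, 1.0]), np.array([[0.5]])
    gains = [lqr(*linearize(v), Q, R) for v in velocities]

    def scheduled_gain(v):
        """Linearly interpolate the gain matrix entries over the grid."""
        v = np.clip(v, velocities[0], velocities[-1])
        i = int(np.clip(np.searchsorted(velocities, v) - 1, 0, len(velocities) - 2))
        lam = (v - velocities[i]) / (velocities[i + 1] - velocities[i])
        return (1 - lam) * gains[i] + lam * gains[i + 1]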

Nonlinear Trajectory Generation Parameters


We solve a minimum time optimal control problem to generate a feasible trajectory for the system. The system is modeled using the nonlinear equations described above, and we compute the open loop forces and state trajectories
for the nominal system. This system is not known to be differentially flat
(due to the aerodynamic forces) and hence we cannot completely eliminate
the differential constraints.
We choose three outputs, z1 = x, z2 = z, and z3 = θ, which results
in a system with one remaining differential constraint. Each output is parameterized with four sixth-order, C⁴ piecewise polynomials over the time interval scaled by the minimum time. A fourth output, z4 = T, is used to
represent the time horizon to be minimized and is parameterized by a scalar.
There are a total of 37 variables in this optimization problem. The trajec-
tory constraints are enforced at 21 equidistant breakpoints over the scaled
time interval.
There are many considerations in the choice of the parameterization of
the outputs. Clearly there is a trade-off between the parameters (variables, initial values of the variables, and breakpoints) and measures of performance (convergence, run time, and conservativeness of the constraints). Extensive simulations
were run to determine the right combination of parameters to meet the
performance goals of our system.

Forward Flight
To obtain the forward flight test data, an operator commanded a desired
forward velocity and vertical position with joysticks. We set the trajectory
update time δ to 2 seconds. By rapidly changing the joysticks, NTG produces
high angle of attack maneuvers. Figure 3.4a depicts the reference trajec-
tories and the actual θ and ẋ over 60 s. Figure 3.4b shows the commanded
forces for the same time interval. The sequence of maneuvers corresponds to
the ducted fan transitioning from near hover to forward flight, then follow-
ing a command from a large forward velocity to a large negative velocity,
and finally returning to hover.
Figure 3.5 is an illustration of the ducted fan altitude and x position
for these maneuvers. The air-foil in the figure depicts the pitch angle (θ).
It is apparent from this figure that the stabilizing controller is not tracking
well in the z direction. This is due to the fact that unmodeled frictional
effects are significant in the vertical direction. This could be corrected with
an integrator in the stabilizing controller.

Figure 3.4: Forward flight test case: (a) θ and ẋ desired and actual, (b) desired
FXb and FZb with bounds.


Figure 3.5: Forward flight test case: altitude and x position (actual (solid) and
desired (dashed)). Airfoil represents actual pitch angle (θ) of the ducted fan.

An analysis of the run times was performed for 30 trajectories; the average computation time was less than one second. Each of the 30 trajectories converged to an optimal solution and was between approximately 4 and 12 seconds in length. A random initial guess was used for the first NTG trajectory computation; subsequent NTG computations used the previous solution as an initial guess. There is considerable room for improvement in determining a “good” initial guess, which would improve both convergence and computation times.

Receding Horizon Control


The results of the previous section demonstrate the ability to compute opti-
mal trajectories in real time, although the computation time was not suffi-
ciently fast for closing the loop around the optimization. In this section, we
make use of a shorter update time δ, a fixed horizon time T with a quadratic
integral cost, and a CLF terminal cost to implement the receding horizon

controller described in Section 3.2. We also limit the operation of the system
to near hover, so that we can use the local linearization to find the terminal
CLF.
We have implemented the receding horizon controller on the ducted fan
experiment where the control objective is to stabilize the hover equilibrium
point. The quadratic cost is given by

L(x, u) = ½ x̂ᵀ Q x̂ + ½ ûᵀ R û,
V(x) = γ x̂ᵀ P x̂,                                   (3.20)

where

x̂ = x − x_eq = (x, z, θ − π/2, ẋ, ż, θ̇),
û = u − u_eq = (F_{X_b} − mg, F_{Z_b}),
Q = diag{4, 15, 4, 1, 3, 0.3},
R = diag{0.5, 0.5}.
For the terminal cost, we choose γ = 0.075 and P is the unique stable
solution to the algebraic Riccati equation corresponding to the linearized
dynamics of equation (3.19) at hover and the weights Q and R. Note that
if γ = 1/2, then V (·) is the CLF for the system corresponding to the LQR
problem. Instead V is a relaxed (in magnitude) CLF, which achieved better
performance in the experiment. In either case, V is valid as a CLF only in a
neighborhood around hover since it is based on the linearized dynamics. We
do not try to compute off-line a region of attraction for this CLF. Experi-
mental tests omitting the terminal cost and/or the input constraints leads
to instability. The results in this section show the success of this choice for
V for stabilization. An inner-loop PD controller on θ, θ̇ is implemented to
stabilize to the receding horizon states θT∗ , θ̇T∗ . The θ dynamics are the fastest
for this system and although most receding horizon controllers were found
to be nominally stable without this inner-loop controller, small disturbances
could lead to instability.
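As a concrete illustration of how such a terminal cost can be constructed, the snippet below solves the algebraic Riccati equation for a linearized model and scales the resulting quadratic form by γ. The matrices A and B here are a simplified, PVTOL-style placeholder for the linearization of equation (3.19) at hover (which is not reproduced in the text), and the parameter values are illustrative only.

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Placeholder PVTOL-style linearization at hover; the true A, B come from
    # linearizing equation (3.19) and are not reproduced here.
    m, J, r, g = 1.5, 0.05, 0.25, 9.8           # illustrative parameter values
    A = np.zeros((6, 6))
    A[0:3, 3:6] = np.eye(3)                     # d/dt (x, z, theta) = velocities
    A[3, 2] = -g                                # pitch couples into x acceleration
    B = np.zeros((6, 2))
    B[3, 0] = 1 / m                             # vectored thrust -> x acceleration
    B[4, 1] = 1 / m                             # thrust deviation -> z acceleration
    B[5, 0] = r / J                             # vectored thrust -> pitch torque

    Q = np.diag([4, 15, 4, 1, 3, 0.3])
    R = np.diag([0.5, 0.5])
    gamma = 0.075

    # P solves the ARE  A'P + P A - P B R^{-1} B' P + Q = 0.
    P = solve_continuous_are(A, B, Q, R)

    def terminal_cost(xhat):
        """Relaxed CLF terminal cost V(x) = gamma * xhat' P xhat."""
        return gamma * float(xhat @ P @ xhat)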
The optimal control problem is set up in NTG code by parameterizing the three position states (x, z, θ), each with 8 B-spline coefficients. Over the receding horizon time intervals, 11 and 16 breakpoints were used with horizon lengths of 1, 1.5, 2, 3, 4 and 6 seconds. Breakpoints specify the locations in time where the differential equations and any constraints must be satisfied, up to some tolerance. The value of F_{X_b}^{max} for the input constraints is made conservative to avoid prolonged input saturation on the real hardware. The logic for this is that if the inputs are saturated on the real hardware, no actuation is left for the inner-loop θ controller and the system can go unstable. The value used in the optimization is F_{X_b}^{max} = 9 N.
Computation time is non-negligible and must be considered when imple-
menting the optimal trajectories. The computation time varies with each
optimization as the current state of the ducted fan changes. The following notational definitions will facilitate the description of how the timing is set up:

  i            Integer counter of RHC computations
  t_i          Value of current time when RHC computation i started
  δ_c(i)       Computation time for computation i
  u*_T(i)(t)   Optimal output trajectory corresponding to computation i, with time interval t ∈ [t_i, t_i + T]

Figure 3.6: Receding horizon input trajectories.
A natural choice for updating the optimal trajectories for stabilization is to
do so as fast as possible. This is achieved here by constantly resolving the
optimization. When computation i is done, computation i + 1 is immedi-
ately started, so ti+1 = ti + δc (i). Figure 3.6 gives a graphical picture of the
timing set-up as the optimal input trajectories u∗T (·) are updated. As shown
in the figure, any computation i for u∗T (i)(·) occurs for t ∈ [ti , ti+1 ] and the
resulting trajectory is applied for t ∈ [ti+1 , ti+2 ]. At t = ti+1 computation
i + 1 is started for trajectory u∗T (i + 1)(·), which is applied as soon as it is
available (t = ti+2 ). For the experimental runs detailed in the results, δc (i)
is typically in the range of [0.05, 0.25] seconds, meaning 4 to 20 optimal
control computations per second. Each optimization i requires the current
measured state of the ducted fan and the value of the previous optimal in-
put trajectories u∗T (i − 1) at time t = ti . This corresponds to, respectively,
6 initial conditions for state vector x and 2 initial constraints on the in-
put vector u. Figure 3.6 shows that the optimal trajectories are advanced
by their computation time prior to application to the system. A dashed line
corresponds to the initial portion of an optimal trajectory and is not applied
since it is not available until that computation is complete. The figure also
reveals the possible discontinuity between successive applied optimal input
trajectories, with a larger discontinuity more likely for longer computation times.
Figure 3.7: Receding horizon control: (a) moving one second average of compu-
tation time for RHC implementation with varying horizon time, (b) response of
RHC controllers to 6 meter offset in x for different horizon lengths.

The initial input constraint is an effort to reduce such discontinuities,


although some discontinuity is unavoidable by this method. Also note that
the same discontinuity is present for the 6 open-loop optimal state trajec-
tories generated, again with a likelihood for greater discontinuity for longer
computation times. In this description, initialization is not an issue because
we assume the receding horizon computations are already running prior to
any test runs. This is true of the experimental runs detailed in the results.
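The timing logic can be summarized in a short sketch. The functions solve_ocp, apply_trajectory, and get_state below are hypothetical stand-ins for the optimization, the trajectory interpolation/inner loop, and the state estimator; this is only an illustration of the bookkeeping, not the experiment's software.

    import time

    def receding_horizon_loop(solve_ocp, apply_trajectory, get_state, t_end):
        """Computation i runs over [t_i, t_{i+1}]; its trajectory u*_T(i) is
        applied over [t_{i+1}, t_{i+2}], when u*_T(i+1) becomes available."""
        i, t_i, u_prev = 0, time.time(), None
        while t_i < t_end:
            x_i = get_state()                          # measured state at start of computation i
            u_i = solve_ocp(x_i, u_prev, t_start=t_i)  # compute u*_T(i) on [t_i, t_i + T]
            t_next = time.time()                       # t_{i+1} = t_i + delta_c(i)
            apply_trajectory(u_i, from_time=t_next)    # applied until the next solution is ready
            u_prev, t_i, i = u_i, t_next, i + 1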
The experimental results show the response of the fan with each controller to a 6 meter horizontal offset, which effectively amounts to a step change in the initial condition for x. The following details the effects of
different receding horizon control parameterizations, namely as the horizon
changes, and the responses with the different controllers to the induced
offset.
The first comparison is between different receding horizon controllers,
where time horizon is varied to be 1.5, 2.0, 3.0, 4.0 or 6.0 seconds. Each
controller uses 16 breakpoints. Figure 3.7a shows a comparison of the average
computation time as time proceeds. For each second after the offset was
initiated, the data correspond to the average run time over the previous
second of computation. Note that these computation times are substantially
smaller than those reported for real-time trajectory generation, due to the
use of the CLF terminal cost versus the terminal constraints in the minimum-
time, real-time trajectory generation experiments.
There is a clear trend toward shorter average computation times as the
time horizon is made longer. There is also an initial transient increase in
average computation time that is greater for shorter horizon times. In fact,
the 6 second horizon controller exhibits a relatively constant average com-
putation time. One explanation for this trend is that, for this particular test,
a 6 second horizon is closer to what the system can actually do. After 1.5

seconds, the fan is still far from the desired hover position and the terminal
cost CLF is large, likely far from its region of attraction. Figure 3.7b shows
the measured x response for these different controllers, exhibiting a rise time
of 8–9 seconds independent of the controller. So a horizon time closer to the
rise time results in a more feasible optimization in this case.

3.5 Further Reading

Exercises
3.1
3.2 Consider a nonlinear control system
ẋ = f (x, u)
with linearization
ẋ = Ax + Bu.
Show that if the linearized system is reachable, then there exists a (local)
control Lyapunov function for the nonlinear system. (Hint: start by proving
the result for a stable system.)
3.3 In this problem we will explore the effect of constraints on control of
the linear unstable system given by
ẋ1 = 0.8x1 − 0.5x2 + 0.5u
ẋ2 = x1 + 0.5u
subject to the constraint that |u| ≤ a where a is a positive constant.
(a) Ignore the constraint (a = ∞) and design an LQR controller to stabi-
lize the system. Plot the response of the closed system from the initial
condition given by x = (1, 0).
(b) Use SIMULINK or ode45 to simulate the system for some finite value
of a with an initial condition x(0) = (1, 0). Numerically (trial and
error) determine the smallest value of a for which the system goes
unstable.
(c) Let amin (ρ) be the smallest value of a for which the system is unstable
from x(0) = (ρ, 0). Plot amin (ρ) for ρ = 1, 4, 16, 64, 256.
(d) Optional: Given a > 0, design and implement a receding horizon con-
trol law for this system. Show that this controller has larger region
of attraction than the controller designed in part (b). (Hint: solve the
finite horizon LQ problem analytically, using the bang-bang example
as a guide to handle the input constraint.)
3.4

3.5 Consider the optimal control problem given in Example 2.2:


ẋ = ax + bu, \qquad J = \tfrac{1}{2} \int_{t_0}^{t_f} u^2(t)\, dt + \tfrac{1}{2} c x^2(t_f),

where x ∈ R is a scalar state, u ∈ R is the input, the initial state x(t0 )


is given, and a, b ∈ R are positive constants. We take the terminal time tf
as given and let c > 0 be a constant that balances the final value of the
state with the input required to get to that position. The optimal control
for a finite time T > 0 is derived in Example 2.2. Now consider the infinite
horizon cost

J = \tfrac{1}{2} \int_{t_0}^{\infty} u^2(t)\, dt

with x(t) at t = ∞ constrained to be zero.

i. Solve for u∗ (t) = −bP x∗ (t) where P is the positive solution corre-
sponding to the algebraic Riccati equation. Note that this gives an
explicit feedback law (u = −bP x).
ii. Plot the state solution of the finite time optimal controller for the
following parameter values
a = 2,   b = 0.5,   x(t_0) = 4,
c = 0.1, 10,   t_f = 0.5, 1, 10
(This should give you a total of 6 curves.) Compare these to the infinite
time optimal control solution. Which finite time solution is closest to
the infinite time solution? Why?

Using the solution given in equation (2.5), implement the finite-time op-
timal controller in a receding horizon fashion with an update time of δ = 0.5.
Using the parameter values in part (b), compare the responses of the reced-
ing horizon controllers to the LQR controller you designed for problem 1,
from the same initial condition. What do you observe as c and tf increase?
(Hint: you can write a MATLAB script to do this by performing the
following steps:
(i) set t0 = 0
(ii) using the closed form solution for x∗ from problem 1, plot x(t), t ∈
[t0 , tf ] and save xδ = x(t0 + δ)
(iii) set x(t0 ) = xδ and repeat step (ii) until x is small.)
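A minimal sketch of this procedure, assuming the closed-form finite-horizon solution has been wrapped in a hypothetical function x_opt(t, x0, t0, tfinal, a, b, c) (for example, built from equation (2.5)), might look as follows; the function itself is not reproduced here.

    import numpy as np

    def receding_horizon_response(x_opt, x0, a, b, c, horizon, delta=0.5,
                                  tol=1e-3, max_iter=100):
        """Apply the finite-horizon optimal solution in receding horizon fashion."""
        ts, xs = [0.0], [x0]
        t0, x = 0.0, x0
        for _ in range(max_iter):
            # Follow the optimal trajectory for one update interval delta.
            tgrid = np.linspace(t0, t0 + delta, 20)
            xgrid = [x_opt(t, x, t0, t0 + horizon, a, b, c) for t in tgrid]
            ts.extend(tgrid[1:]); xs.extend(xgrid[1:])
            t0, x = t0 + delta, xgrid[-1]
            if abs(x) < tol:
                break
        return np.array(ts), np.array(xs)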
3.6
3.7 In this problem we will explore the effect of constraints on control of
the linear unstable system given by
ẋ1 = 0.8x1 − 0.5x2 + 0.5u, ẋ2 = x1 + 0.5u,

subject to the constraint that |u| ≤ a where a is a positive constant.


i. Ignore the constraint (a = ∞) and design an LQR controller to stabi-
lize the system. Plot the response of the closed system from the initial
condition given by x = (1, 0).
ii. Use SIMULINK or ode45 to simulate the system for some finite value
of a with an initial condition x(0) = (1, 0). Numerically (trial and
error) determine the smallest value of a for which the system goes
unstable.
iii. Let amin (ρ) be the smallest value of a for which the system is unstable
from x(0) = (ρ, 0). Plot amin (ρ) for ρ = 1, 4, 16, 64, 256.
iv. Optional: Given a > 0, design and implement a receding horizon con-
trol law for this system. Show that this controller has larger region
of attraction than the controller designed in part (b). (Hint: solve the
finite horizon LQ problem analytically, using the bang-bang example
as a guide to handle the input constraint.)
Chapter Four
Stochastic Systems

In this chapter we present a focused overview of stochastic systems, suit-


able for use in either estimation theory or biomolecular modeling. After a
brief review of random variables, we define discrete-time and continuous-
time random processes, including the expectation, (co-)variance and corre-
lation functions for a random process. These definitions are used to describe
linear stochastic systems (in continuous time) and the stochastic response
of a linear system to a random process (e.g., noise). We initially derive
the relevant quantities in the state space, followed by a presentation of the
equivalent frequency domain concepts.
Prerequisites. Readers should be familiar with basic concepts in probability,
including random variables and standard distributions. We do not assume
any prior familiarity with random processes.
Caveats. This chapter is written to provide a brief introduction to stochas-
tic processes that can be used to derive the results in other subject areas. In
order to keep the presentation compact, we gloss over several mathematical
details that are required for rigorous presentation of the results. A more de-
tailed (and mathematically precise) derivation of this material is available
in the book by Åström [Åst06].

4.1 Brief Review of Random Variables


To help fix the notation that we will use, we briefly review the key concepts
of random variables. A more complete exposition is available in standard
books on probability, such as Grimmett and Stirzaker [GS01].
Random variables and processes are defined in terms of an underlying
probability space that captures the nature of the stochastic system we wish
to study. A probability space (Ω, F, P) consists of:

• a sample space Ω that represents the set of all possible outcomes;

• a set of events F that captures combinations of elementary outcomes


that are of interest; and

• a probability measure P that describes the likelihood of a given event


occurring.

Ω can be any set, either with a finite, countable or infinite number of ele-
ments. The event space F consists of subsets of Ω. There are some mathemat-
ical limits on the properties of the sets in F, but these are not critical for our
purposes here. The probability measure P is a mapping from P : F → [0, 1]
that assigns a probability to each event. It must satisfy the property that
given any two disjoint sets A, B ∈ F, P(A ∪ B) = P(A) + P(B).
With these definitions, we can model many different stochastic phenom-
ena. Given a probability space, we can choose samples ω ∈ Ω and identify
each sample with a collection of events chosen from F. These events should
correspond to phenomena of interest and the probability measure P should
capture the likelihood of that event occurring in the system that we are
modeling. This definition of a probability space is very general and allows
us to consider a number of situations as special cases.
A random variable X is a function X : Ω → S that gives a value in S,
called the state space, for any sample ω ∈ Ω. Given a subset A ⊂ S, we can
write the probability that X ∈ A as

P(X ∈ A) = P({ω ∈ Ω : X(ω) ∈ A}).


We will often find it convenient to omit ω when working with random variables
and hence we write X ∈ S rather than the more correct X(ω) ∈ S. The term
probability distribution is used to describe the set of possible values that X
can take.
A discrete random variable X is a variable that can take on any value
from a discrete set S with some probability for each element of the set. We
model a discrete random variable by its probability mass function pX (s),
which gives the probability that the random variable X takes on the specific
value s ∈ S:

pX (s) = probability that X takes on the value s ∈ S.

The sum of the probabilities over the entire set of states must be unity, and
so we have that X
pX (s) = 1.
s∈S

If A is a subset of S, then we can write P(X ∈ A) for the probability that


X will take on some value in the set A. It follows from our definition that
X
P(X ∈ A) = pX (s).
s∈A

Definition 4.1 (Bernoulli distribution). The Bernoulli distribution is used


to model a random variable that takes the value 1 with probability p and 0
with probability 1 − p:

P(X = 1) = p, P(X = 0) = 1 − p.

Figure 4.1: Probability mass functions for common discrete distributions: (a) binomial distribution, (b) Poisson distribution.

Alternatively, it can be written in terms of its probability mass function

p(s) = \begin{cases} p & s = 1 \\ 1 - p & s = 0 \\ 0 & \text{otherwise.} \end{cases}
Bernoulli distributions are used to model independent experiments with bi-
nary outcomes, such as flipping a coin.

Definition 4.2 (Binomial distribution). The binomial distribution models the probability of k successful trials in n experiments, given that a single experiment has probability of success p. If we let Xn be a random variable that indicates the number of successes in n trials, then the binomial distribution is given by

p_{X_n}(k) = P(X_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}

for k = 1, . . . , n. The probability mass function is shown in Figure 4.1a.
Definition 4.3 (Poisson distribution). The Poisson distribution is used to
describe the probability that a given number of events will occur in a fixed
interval of time t. The Poisson distribution is defined as
p_{N_t}(k) = P(N_t = k) = \frac{(λt)^k}{k!}\, e^{-λt},        (4.1)
where Nt is the number of events that occur in a period t and λ is a real
number parameterizing the distribution. This distribution can be considered
as a model for a counting process, where we assume that the average rate
of occurrences in a period t is given by λt and λ represents the rate of the
counting process. Figure 4.1b shows the form of the distribution for different
values of k and λt.
A continuous (real-valued) random variable X is a variable that can take
on any value in the set of real numbers R. We can model the random variable
X according to its probability distribution function F : R → [0, 1]:
F (x) = P(X ≤ x) = probability that X takes on a value in the range (−∞, x].

Figure 4.2: Probability density functions (pdfs) for (a) uniform, (b) Gaussian and (c) exponential distributions.

It follows from the definition that if X is a random variable in the range


[L, U ] then P(L ≤ X ≤ U ) = 1. Similarly, if y ∈ [L, U ] then P(L ≤ X <
y) = 1 − P(y ≤ X ≤ U ).
We characterize a random variable in terms of the probability density
function (pdf) p(x). The density function is defined so that its integral over
an interval gives the probability that the random variable takes its value in
that interval:

P(x_l ≤ X ≤ x_u) = \int_{x_l}^{x_u} p(x)\, dx.        (4.2)

It is also possible to compute p(x) given the distribution P as long as the distribution function is suitably smooth:

p(x) = \frac{\partial F}{\partial x}(x).
We will sometimes write pX (x) when we wish to make explicit that the pdf
is associated with the random variable X. Note that we use capital letters to
refer to a random variable and lower case letters to refer to a specific value.

Definition 4.4 (Uniform distribution). The uniform distribution on an in-


terval [L, U] assigns equal probability to any number in the interval. Its pdf is given by

p(x) = \frac{1}{U - L}.        (4.3)
The uniform distribution is illustrated in Figure 4.2a.

Definition 4.5 (Gaussian distribution). The Gaussian distribution (also


called a normal distribution) has a pdf of the form

p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}.        (4.4)
The parameter µ is called the mean of the distribution and σ is called the

standard deviation of the distribution. Figure 4.2b shows a graphical representation of a Gaussian pdf.
Definition 4.6 (Exponential distribution). The exponential distribution is
defined for positive numbers and has a pdf of the form
p(x) = λe−λx , x>0
where λ is a parameter defining the distribution. A plot of the pdf for an
exponential distribution is shown in Figure 4.2c.
We now define a number of properties of collections of random variables.
We focus on the continuous random variable case, but unless noted other-
wise these concepts can all be defined similarly for discrete random variables
(using the probability mass function in place of the probability density func-
tion).
If two random variables are related, we can talk about their joint prob-
ability distribution: PX,Y (A, B) is the probability that both event A occurs
for X and B occurs for Y . This is sometimes written as P (A ∩ B), where we
abuse notation by implicitly assuming that A is associated with X and B
with Y . For continuous random variables, the joint probability distribution
can be characterized in terms of a joint probability density function
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = \int_{-\infty}^{y} \int_{-\infty}^{x} p(u, v)\, du\, dv.        (4.5)
The joint pdf thus describes the relationship between X and Y , and for
sufficiently smooth distributions we have
p(x, y) = \frac{\partial^2 F}{\partial x\, \partial y}.
We say that X and Y are independent if p(x, y) = p(x) p(y), which im-
plies that FX,Y (x, y) = FX (x) FY (y) for all x, y. Equivalently, P(A ∩ B) =
P(A)P(B) if A and B are independent events.
The conditional probability for an event A given that an event B has
occurred, written as P(A | B), is given by
P(A | B) = \frac{P(A ∩ B)}{P(B)}.        (4.6)
If the events A and B are independent, then P(A | B) = P(A). Note that the
individual, joint and conditional probability distributions are all different, so
if we are talking about random variables we can write PX,Y (A, B), PX|Y (A |
B) and PY (B), where A and B are appropriate subsets of R.
If X is dependent on Y then Y is also dependent on X. Bayes’ theorem
relates the conditional and individual probabilities:
P(A | B) = \frac{P(B | A)\, P(A)}{P(B)}, \qquad P(B) ≠ 0.        (4.7)

Bayes’ theorem gives the conditional probability of event A on event B given


the inverse relationship (B given A). It can be used in situations in which
we wish to evaluate a hypothesis H given data D when we have some model
for how likely the data is given the hypothesis, along with the unconditioned
probabilities for both the hypothesis and the data.
The analog of the probability density function for conditional probability
is the conditional probability density function p(x | y):

p(x | y) = \begin{cases} \dfrac{p(x, y)}{p(y)} & 0 < p(y) < ∞ \\ 0 & \text{otherwise.} \end{cases}        (4.8)
It follows that
p(x, y) = p(x | y)p(y) (4.9)
and

P(X ≤ x | y) := P(X ≤ x | Y = y) = \int_{-\infty}^{x} p(u | y)\, du = \frac{\int_{-\infty}^{x} p(u, y)\, du}{p(y)}.        (4.10)
If X and Y are independent then p(x | y) = p(x) and p(y | x) = p(y). Note
that p(x, y) and p(x | y) are different density functions, though they are
related through equation (4.9). If X and Y are related with joint probability
density function p(x, y) and conditional probability density function p(x | y)
then

p(x) = \int_{-\infty}^{\infty} p(x, y)\, dy = \int_{-\infty}^{\infty} p(x | y)\, p(y)\, dy.

Example 4.1 Conditional probability for sum


Consider three random variables X, Y and Z related by the expression
Z = X + Y.
In other words, the value of the random variable Z is given by choosing
values from two random variables X and Y and adding them. We assume
that X and Y are independent Gaussian random variables with mean µ1
and µ2 and standard deviation σ = 1 (the same for both variables).
Clearly the random variable Z is not independent of X (or Y ) since if
we know the values of X then it provides information about the likely value
of Z. To see this, we compute the joint probability between Z and X. Let
A = {xl ≤ x ≤ xu }, B = {zl ≤ z ≤ zu }.
The joint probability of both events A and B occurring is given by
PX,Z (A ∩ B) = P(xl ≤ x ≤ xu , zl ≤ x + y ≤ zu )
= P(xl ≤ x ≤ xu , zl − x ≤ y ≤ zu − x).
We can compute this probability by using the probability density functions for X and Y:

P(A ∩ B) = \int_{x_l}^{x_u} \Bigl( \int_{z_l - x}^{z_u - x} p_Y(y)\, dy \Bigr) p_X(x)\, dx
         = \int_{x_l}^{x_u} \int_{z_l}^{z_u} p_Y(z - x)\, p_X(x)\, dz\, dx
         =: \int_{z_l}^{z_u} \int_{x_l}^{x_u} p_{Z,X}(z, x)\, dx\, dz.
Using Gaussians for X and Y we have

p_{Z,X}(z, x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(z - x - \mu_Y)^2} \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \mu_X)^2}
             = \frac{1}{2\pi} e^{-\frac{1}{2}\bigl((z - x - \mu_Y)^2 + (x - \mu_X)^2\bigr)}.
A similar expression holds for pZ,Y . ∇
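As a quick numerical sanity check of this example (and of Proposition 4.1 below), the sketch below draws samples of X and Y, confirms that Z = X + Y has mean µ_X + µ_Y and variance 2 when both standard deviations are 1, and shows that Z is correlated with X. The particular means are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    mu_X, mu_Y, n = 1.0, -2.0, 200_000      # arbitrary means, unit variances

    X = rng.normal(mu_X, 1.0, n)
    Y = rng.normal(mu_Y, 1.0, n)
    Z = X + Y

    print(Z.mean(), mu_X + mu_Y)            # sample mean vs mu_X + mu_Y
    print(Z.var(), 2.0)                     # sample variance vs sigma_X^2 + sigma_Y^2
    print(np.corrcoef(X, Z)[0, 1])          # nonzero: Z is not independent of X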
Given a random variable X, we can define various standard measures of the distribution. The expectation or mean of a random variable is defined as

E(X) = ⟨X⟩ = \int_{-\infty}^{\infty} x\, p(x)\, dx,

and the mean square of a random variable is

E(X²) = ⟨X²⟩ = \int_{-\infty}^{\infty} x^2\, p(x)\, dx.

If we let µ represent the expectation (or mean) of X then we define the variance of X as

E((X − µ)²) = ⟨(X − ⟨X⟩)²⟩ = \int_{-\infty}^{\infty} (x − µ)^2\, p(x)\, dx.

We will often write the variance as σ². As the notation indicates, if we have a Gaussian random variable with mean µ and standard deviation σ, then the expectation and variance as computed above return µ and σ².
Example 4.2 Exponential distribution
The exponential distribution has mean and variance given by

µ = \frac{1}{λ}, \qquad σ^2 = \frac{1}{λ^2}.

Several useful properties follow from the definitions.
Proposition 4.1 (Properties of random variables).
1. If X is a random variable with mean µ and variance σ², then αX is a random variable with mean αµ and variance α²σ².
2. If X and Y are two random variables, then E(αX + βY ) = αE(X) +
β E(Y ).

3. If X and Y are Gaussian random variables with means µ_X, µ_Y and variances σ_X², σ_Y²,

p(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2}, \qquad
p(y) = \frac{1}{\sqrt{2\pi\sigma_Y^2}}\, e^{-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2},

then X + Y is a Gaussian random variable with mean µ_Z = µ_X + µ_Y and variance σ_Z² = σ_X² + σ_Y²,

p(x + y) = \frac{1}{\sqrt{2\pi\sigma_Z^2}}\, e^{-\frac{1}{2}\left(\frac{x+y-\mu_Z}{\sigma_Z}\right)^2}.

Proof. The first property follows from the definition of mean and variance:

E(αX) = \int_{-\infty}^{\infty} \alpha x\, p(x)\, dx = \alpha \int_{-\infty}^{\infty} x\, p(x)\, dx = \alpha E(X),

E((αX)^2) = \int_{-\infty}^{\infty} (\alpha x)^2\, p(x)\, dx = \alpha^2 \int_{-\infty}^{\infty} x^2\, p(x)\, dx = \alpha^2 E(X^2).

The second property follows similarly, remembering that we must take the expectation using the joint distribution (since we are evaluating a function of two random variables):

E(αX + βY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\alpha x + \beta y)\, p_{X,Y}(x, y)\, dx\, dy
           = \alpha \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, p_{X,Y}(x, y)\, dx\, dy + \beta \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, p_{X,Y}(x, y)\, dx\, dy
           = \alpha \int_{-\infty}^{\infty} x\, p_X(x)\, dx + \beta \int_{-\infty}^{\infty} y\, p_Y(y)\, dy = \alpha E(X) + \beta E(Y).
The third item is left as an exercise.

4.2 Introduction to Random Processes


A random process is a collection of time-indexed random variables. Formally,
we consider a random process X to be a joint mapping of a sample and a time
to a state: X : Ω × T → S, where T is an appropriate time set. We view
this mapping as a generalized random variable: a sample corresponds to
choosing an entire function of time. Of course, we can always fix the time and
interpret X(ω, t) as a regular random variable, with X(ω, t′ ) representing a
different random variable if t 6= t′ . Our description of random processes will
consist of describing how the random variable at a time t relates to the
value of the random variable at an earlier time s. To build up some intuition
about random processes, we will begin with the discrete time case, where
the calculations are a bit more straightforward, and then proceed to the
continuous time case.

A discrete-time random process is a stochastic system characterized by


the evolution of a sequence of random variables X[k], where k is an integer.
As an example, consider a discrete-time linear system with dynamics

X[k + 1] = AX[k] + BU [k] + F W [k], Y [k] = CX[k] + V [k]. (4.11)

As in ÅM08, X ∈ Rn represents the state of the system, U ∈ Rm is the


vector of inputs and Y ∈ Rp is the vector of outputs. The (possibly vector-
valued) signal W represents disturbances to the process dynamics and V
represents noise in the measurements. To try to fix the basic ideas, we will
take u = 0, n = 1 (single state) and F = 1 for now.
We wish to describe the evolution of the dynamics when the disturbances
and noise are not given as deterministic signals, but rather are chosen from
some probability distribution. Thus we will let W [k] be a collection of ran-
dom variables where the values at each instant k are chosen from a probabil-
ity distribution with pdf pW,k (x). As the notation indicates, the distributions
might depend on the time instant k, although the most common case is to
have a stationary distribution in which the distributions are independent of
k (defined more formally below).
In addition to stationarity, we will often also assume that the distribution of values of W at time k is independent of the values of W at time l if k ≠ l.
In other words, W [k] and W [l] are two separate random variables that are
independent of each other. We say that the corresponding random process
is uncorrelated (also defined more formally below). As a consequence of our
independence assumption, we have that

E(W[k]W[l]) = E(W^2[k])\, δ(k − l) = \begin{cases} E(W^2[k]) & k = l \\ 0 & k ≠ l. \end{cases}

In the case that W[k] is a Gaussian with mean zero and (stationary) standard deviation σ, then E(W[k]W[l]) = σ² δ(k − l).
We next wish to describe the evolution of the state x in equation (4.11)
in the case when W is a random variable. In order to do this, we describe
the state x as a sequence of random variables X[k], k = 1, · · · , N . Looking
back at equation (4.11), we see that even if W [k] is an uncorrelated sequence
of random variables, then the states X[k] are not uncorrelated since

X[k + 1] = AX[k] + F W [k],

and hence the probability distribution for X at time k + 1 depends on the


value of X at time k (as well as the value of W at time k), similar to the
situation in Example 4.1.
Since each X[k] is a random variable, we can define the mean and variance as µ[k] and σ²[k] using the previous definitions at each time k:

µ[k] := E(X[k]) = \int_{-\infty}^{\infty} x\, p(x, k)\, dx,

σ²[k] := E((X[k] − µ[k])²) = \int_{-\infty}^{\infty} (x − µ[k])^2\, p(x, k)\, dx.

To capture the relationship between the current state and the future state,
we define the correlation function for a random process as
ρ(k_1, k_2) := E(X[k_1]X[k_2]) = \int_{-\infty}^{\infty} x_1 x_2\, p(x_1, x_2; k_1, k_2)\, dx_1\, dx_2.

The function p(xi , xj ; k1 , k2 ) is the joint probability density function, which


depends on the times k1 and k2. A process is stationary if p(x, k + d) = p(x, k) for all d, p(x_i, x_j; k_1 + d, k_2 + d) = p(x_i, x_j; k_1, k_2), etc. In this case we can
write p(xi , xj ; d) for the joint probability distribution. We will almost always
restrict to this case. Similarly, we will write p(k1 , k2 ) as p(d) = p(k, k + d).
We can compute the correlation function by explicitly computing the
joint pdf (see Example 4.1) or by directly computing the expectation. Sup-
pose that we take a random process of the form (4.11) with X[0] = 0 and
W having zero mean and standard deviation σ. The correlation function is
given by

E(X[k_1]X[k_2]) = E\Bigl\{ \Bigl( \sum_{i=0}^{k_1-1} A^{k_1-i} B W[i] \Bigr) \Bigl( \sum_{j=0}^{k_2-1} A^{k_2-j} B W[j] \Bigr) \Bigr\}
               = E\Bigl\{ \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1} A^{k_1-i} B\, W[i] W[j]\, B A^{k_2-j} \Bigr\}.

We can now use the linearity of the expectation operator to pull this inside the summations:

E(X[k_1]X[k_2]) = \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1} A^{k_1-i} B\, E(W[i]W[j])\, B A^{k_2-j}
               = \sum_{i=0}^{k_1-1} \sum_{j=0}^{k_2-1} A^{k_1-i} B\, σ^2 δ(i-j)\, B A^{k_2-j}
               = \sum_{i=0}^{k_1-1} A^{k_1-i} B\, σ^2\, B A^{k_2-i}.

Note that the correlation function depends on k1 and k2 .


We can see the dependence of the correlation function on the time more clearly by letting d = k_2 − k_1 and writing

ρ(k, k + d) = E(X[k]X[k + d]) = \sum_{i=0}^{k-1} A^{k-i} B\, σ^2\, B A^{d+k-i}
            = \sum_{j=1}^{k} A^{j} B\, σ^2\, B A^{j+d} = \Bigl( \sum_{j=1}^{k} A^{j} B\, σ^2\, B A^{j} \Bigr) A^d.

In particular, if the discrete time system is stable then |A| < 1 and the
correlation function decays as we take points that are further departed in
time (d large). Furthermore, if we let k → ∞ (i.e., look at the steady state
solution) then the correlation function only depends on d (assuming the sum
converges) and hence the steady state random process is stationary.
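These properties are easy to check numerically. The sketch below simulates many realizations of the scalar process X[k+1] = A X[k] + W[k] (with B = F = 1, and A and σ chosen arbitrarily), estimates ρ(k, k + d) by averaging over realizations at a late time k, and compares it with the steady-state prediction for this recursion, σ²/(1 − A²) · A^d, which decays geometrically in d as described above.

    import numpy as np

    rng = np.random.default_rng(1)
    A, sigma = 0.8, 1.0            # illustrative values; |A| < 1 so the process is stable
    N, K = 10_000, 60              # number of realizations and time steps

    X = np.zeros((N, K))
    for k in range(K - 1):
        X[:, k + 1] = A * X[:, k] + sigma * rng.standard_normal(N)

    k0 = 40                        # late time, approximately steady state
    d = np.arange(10)
    rho_hat = np.array([np.mean(X[:, k0] * X[:, k0 + di]) for di in d])
    rho_pred = sigma**2 / (1 - A**2) * A**d       # geometric decay in d

    print(np.round(rho_hat, 3))
    print(np.round(rho_pred, 3))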
In our derivation so far, we have assumed that X[k + 1] only depends on
the value of the state at time k (this was implicit in our use of equation (4.11)
and the assumption that W [k] is independent of X). This particular assump-
tion is known as the Markov property for a random process: a Markovian
process is one in which the distribution of possible values of the state at time
k depends only on the values of the state at the prior time and not earlier.
Written more formally, we say that a discrete random process is Markovian
if
pX,k (x | X[k − 1], X[k − 2], . . . , X[0]) = pX,k (x | X[k − 1]).

Markov processes are roughly equivalent to state space dynamical systems,


where the future evolution of the system can be completely characterized in
terms of the current value of the state (and not its history of values prior to
that).

4.3 Continuous-Time, Vector-Valued Random Processes


We now consider the case where our time index is no longer discrete, but
instead varies continuously. A fully rigorous derivation requires careful use
of measure theory and is beyond the scope of this text, so we focus here
on the concepts that will be useful for modeling and analysis of important
physical properties.
A continuous-time random process is a stochastic system characterized
by the evolution of a random variable X(t), t ∈ [0, T ]. We are interested in
understanding how the (random) state of the system is related at separate
times. The process is defined in terms of the “correlation” of X(t1 ) with
X(t2 ). We assume, as above, that the process is described by continuous
random variables, but the discrete state case (with time still modeled as a
real variable) can be handled in a similar fashion.
We call X(t) ∈ ℝⁿ the state of the random process at time t. For the case n > 1, we have a vector of random processes:

X(t) = \begin{pmatrix} X_1(t) \\ \vdots \\ X_n(t) \end{pmatrix}.

We can characterize the state in terms of a (joint) time-varying pdf,

P(\{x_{i,l} ≤ X_i(t) ≤ x_{i,u}\}) = \int_{x_{1,l}}^{x_{1,u}} \cdots \int_{x_{n,l}}^{x_{n,u}} p_{X_1,\dots,X_n}(x; t)\, dx_n \dots dx_1.

Note that the state of a random process is not enough to determine the exact
next state, but only the distribution of next states (otherwise it would be a
deterministic process). We typically omit indexing of the individual states
unless the meaning is not clear from context.
We can characterize the dynamics of a random process by its statistical
characteristics, written in terms of joint probability density functions:

P(x1l ≤ Xi (t1 ) ≤ x1u , x2l ≤ Xj (t2 ) ≤ x2u )


Z x2u Z x1u
= pXi ,Yi (x1 , x2 ; t1 , t2 ) dx1 dx2
x2l x1l

The function p(xi , xj ; t1 , t2 ) is called a joint probability density function and


depends both on the individual states that are being compared and the
time instants over which they are compared. Note that if i = j, then pXi ,Xi
describes how Xi at time t1 is related to Xi at time t2 .
In general, the distributions used to describe a random process depend on
the specific time or times that we evaluate the random variables. However,
in some cases the relationship only depends on the difference in time and not
the absolute times (similar to the notion of time invariance in deterministic
systems, as described in ÅM08). A process is stationary if p(x, t+τ ) = p(x, t)
for all τ , p(xi , xj ; t1 +τ, t2 +τ ) = p(xi , xj ; t1 , t2 ), etc. In this case we can write
p(xi , xj ; τ ) for the joint probability distribution. Stationary distributions
roughly correspond to the steady state properties of a random process and
we will often restrict our attention to this case.
We are often interested in random processes in which changes in the state
occur when a random event occurs (such as a molecular reaction or binding
event). In this case, it is natural to describe the state of the system in terms
of a set of times t0 < t1 < t2 < · · · < tn and X(ti ) is the random variable
that corresponds to the possible states of the system at time ti. Note that the time instants do not have to be uniformly spaced and most often (for
physical systems) they will not be. All of the definitions above carry through,
and the process can now be described by a probability distribution of the

form

P\bigl( X(t_i) ∈ [x_i, x_i + dx_i],\; i = 1, \dots, n \bigr) = p(x_n, x_{n-1}, \dots, x_0;\, t_n, t_{n-1}, \dots, t_0)\, dx_n\, dx_{n-1} \cdots dx_1,

where dx_i are taken as infinitesimal quantities.
Just as in the case of discrete time processes, we define a continuous time
random process to be a Markov process if the probability of being in a given
state at time tn depends only on the state that we were in at the previous
time instant tn−1 and not the entire history of states prior to tn−1 :
 
P\bigl( X(t_n) ∈ [x_n, x_n + dx_n] \mid X(t_i) ∈ [x_i, x_i + dx_i],\; i = 1, \dots, n-1 \bigr)
  = P\bigl( X(t_n) ∈ [x_n, x_n + dx_n] \mid X(t_{n-1}) ∈ [x_{n-1}, x_{n-1} + dx_{n-1}] \bigr).        (4.12)

In practice we do not usually specify random processes via the joint probability distribution p(x_i, x_j; t_1, t_2) but instead describe them in terms of a propagator function. Let X(t) be a Markov process and define the Markov propagator as

Ξ(dt; x, t) = X(t + dt) − X(t),  given X(t) = x.

The propagator function describes how the random variable at time t is related to the random variable at time t + dt. Since both X(t + dt) and X(t) are random variables, Ξ(dt; x, t) is also a random variable and hence it can be described by its density function, which we denote as Π(ξ, x; dt, t):

P\bigl( x ≤ X(t + dt) ≤ x + ξ \bigr) = \int_{x}^{x+ξ} Π(dx, x;\, dt, t)\, dx.
The previous definitions for mean, variance and correlation can be extended to the continuous time, vector-valued case by indexing the individual states:

E\{X(t)\} = \begin{pmatrix} E\{X_1(t)\} \\ \vdots \\ E\{X_n(t)\} \end{pmatrix} =: µ(t),

E\{(X(t) − µ(t))(X(t) − µ(t))^T\} = \begin{pmatrix} E\{X_1(t)X_1(t)\} & \dots & E\{X_1(t)X_n(t)\} \\ \vdots & \ddots & \vdots \\ & & E\{X_n(t)X_n(t)\} \end{pmatrix} =: Σ(t),

E\{X(t)X^T(s)\} = \begin{pmatrix} E\{X_1(t)X_1(s)\} & \dots & E\{X_1(t)X_n(s)\} \\ \vdots & \ddots & \vdots \\ & & E\{X_n(t)X_n(s)\} \end{pmatrix} =: R(t, s).
Note that the random variables and their statistical properties are all in-
dexed by the time t (and s).

Figure 4.3: Correlation function ρ(t_1 − t_2) for a first-order Markov process, plotted as a function of τ = t_1 − t_2.

The matrix R(t, s) is called the correlation matrix for X(t) ∈ ℝⁿ. If t = s then R(t, t) describes how the elements of x


are correlated at time t (with each other) and in the case that the processes
have zero mean, R(t, t) = Σ(t). The elements on the diagonal of Σ(t) are
the variances of the corresponding scalar variables. A random process is un-
correlated if R(t, s) = 0 for all t 6= s. This implies that X(t) and X(s) are
independent random events and is equivalent to pX,Y (x, y) = pX (x)pY (y).
If a random process is stationary, then it can be shown that R(t + τ, s +
τ ) = R(t, s) and it follows that the correlation matrix depends only on t − s.
In this case we will often write R(t, s) = R(s − t) or simply R(τ ) where τ is
the correlation time. The covariance matrix in this case is simply R(0).
In the case where X is also scalar random process, the correlation matrix
is also a scalar and we will write r(τ ), which we refer to as the (scalar)
correlation function. Furthermore, for stationary scalar random processes,
the correlation function depends only on the absolute value of the correlation
function, so r(τ ) = r(−τ ) = r(|τ |). This property also holds for the diagonal
entries of the correlation matrix since Rii (s, t) = Rii (t, s) from the definition.
Definition 4.7 (Ornstein-Uhlenbeck process). Consider a scalar random
process defined by a Gaussian pdf with µ = 0,

p(x, t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\frac{x^2}{\sigma^2}},

and a correlation function given by

r(t_1, t_2) = \frac{Q}{2ω_0}\, e^{-ω_0 |t_2 - t_1|}.
The correlation function is illustrated in Figure 4.3. This process is known
as an Ornstein-Uhlenbeck process and it is a stationary process.

Note on terminology. The terminology and notation for covariance and cor-
relation varies between disciplines. The term covariance is often used to
refer to both the relationship between different variables X and Y and the
relationship between a single variable at different times, X(t) and X(s).
The term “cross-covariance” is used to refer to the covariance between two
random vectors X and Y , to distinguish this from the covariance of the
elements of X with each other. The term “cross-correlation” is sometimes
also used. Finally, the term “correlation coefficient” refers to the normalized
correlation r̄(t, s) = E(X(t)X(s))/E(X(t)X(t)).

MATLAB has a number of functions to implement covariance and correla-


tion, which mostly match the terminology here:
• cov(X) - this returns the variance of the vector X that represents sam-
ples of a given random variable or the covariance of the columns of a
matrix X where the rows represent observations.
• cov(X, Y) - equivalent to cov([X(:), Y(:)]). Computes the covari-
ance between the columns of X and Y , where the rows are observa-
tions.
• xcorr(X, Y) - the “cross-correlation” between two random sequences.
If these sequences came from a random process, this is the correlation
function r(t).
• xcov(X, Y) - this returns the “cross-covariance”, which MATLAB defines
as the “mean-removed cross-correlation”.
The MATLAB help pages give the exact formulas used for each, so the main
point here is to be careful to make sure you know what you really want.
We will also make use of a special type of random process referred to
as “white noise”. A white noise process X(t) satisfies E{X(t)} = 0 and
R(t, s) = W δ(s − t), where δ(τ ) is the impulse function and W is called the
noise intensity. White noise is an idealized process, similar to the impulse
function or Heaviside (step) function in deterministic systems. In particu-
lar, we note that r(0) = E{X 2 (t)} = ∞, so the covariance is infinite and we
never see this signal in practice. However, like the step and impulse func-
tions, it is very useful for characterizing the response of a linear system, as
described in the following proposition. It can be shown that the integral of a
white noise process is a Wiener process, and so often white noise is described
as the derivative of a Wiener process.

4.4 Linear Stochastic Systems with Gaussian Noise


We now consider the problem of how to compute the response of a linear
system to a random process. We assume we have a linear system described
in state space as
Ẋ = AX + F W, Y = CX (4.13)
Given an “input” W , which is itself a random process with mean µ(t),
variance σ 2 (t) and correlation r(t, t + τ ), what is the description of the
random process Y ?
Let W be a white noise process, with zero mean and noise intensity Q:
r(τ ) = Qδ(τ ).
We can write the output of the system in terms of the convolution integral
Y(t) = \int_0^t h(t − τ)\, W(τ)\, dτ,

where h(t − τ) is the impulse response for the system

h(t − τ) = C e^{A(t−τ)} B + D δ(t − τ).

We now compute the statistics of the output, starting with the mean:

E(Y(t)) = E\Bigl( \int_0^t h(t − η) W(η)\, dη \Bigr) = \int_0^t h(t − η)\, E(W(η))\, dη = 0.

Note here that we have relied on the linearity of the convolution integral to
pull the expectation inside the integral.
We can compute the covariance of the output by computing the correla-
tion r(τ ) and setting σ 2 = r(0). The correlation function for y is
r_Y(t, s) = E(Y(t)Y(s)) = E\Bigl( \int_0^t h(t − η) W(η)\, dη \cdot \int_0^s h(s − ξ) W(ξ)\, dξ \Bigr)
          = E\Bigl( \int_0^t \int_0^s h(t − η)\, W(η) W(ξ)\, h(s − ξ)\, dη\, dξ \Bigr).

Once again linearity allows us to exchange expectation and integration:

r_Y(t, s) = \int_0^t \int_0^s h(t − η)\, E(W(η)W(ξ))\, h(s − ξ)\, dη\, dξ
          = \int_0^t \int_0^s h(t − η)\, Q δ(η − ξ)\, h(s − ξ)\, dη\, dξ
          = \int_0^t h(t − η)\, Q\, h(s − η)\, dη.

Now let τ = s − t and write

r_Y(τ) = r_Y(t, t + τ) = \int_0^t h(t − η)\, Q\, h(t + τ − η)\, dη
       = \int_0^t h(ξ)\, Q\, h(ξ + τ)\, dξ \qquad (\text{setting } ξ = t − η).

Finally, we let t → ∞ (steady state):

\lim_{t→∞} r_Y(t, t + τ) = \bar r_Y(τ) = \int_0^{∞} h(ξ)\, Q\, h(ξ + τ)\, dξ.        (4.14)

If this integral exists, then we can compute the second order statistics for
the output Y .
We can provide a more explicit formula for the correlation function r in
terms of the matrices A, F and C by expanding equation (4.14). We will
consider the general case where W ∈ Rm and Y ∈ Rp and use the correlation
matrix R(t, s) instead of the correlation function r(t, s). Define the state

transition matrix Φ(t, t_0) = e^{A(t−t_0)} so that the solution of system (4.13) is given by

x(t) = Φ(t, t_0) x(t_0) + \int_{t_0}^{t} Φ(t, λ) F w(λ)\, dλ.

Proposition 4.2 (Stochastic response to white noise). Let E(X(t_0)X^T(t_0)) = P(t_0) and let W be white noise with E(W(λ)W^T(ξ)) = R_W δ(λ − ξ). Then the correlation matrix for X is given by

R_X(t, s) = P(t)\, Φ^T(s, t)

where P(t) satisfies the linear matrix differential equation

Ṗ(t) = A P + P A^T + F R_W F^T,  \qquad P(0) = P_0.

Proof. Using the definition of the correlation matrix, we have

E(X(t)X^T(s)) = E\Bigl( Φ(t, 0) X(0) X^T(0) Φ^T(s, 0) + \text{cross terms}
                + \int_0^t Φ(t, ξ) F W(ξ)\, dξ \int_0^s W^T(λ) F^T Φ^T(s, λ)\, dλ \Bigr)
             = Φ(t, 0)\, E(X(0)X^T(0))\, Φ^T(s, 0)
                + \int_0^t \int_0^s Φ(t, ξ) F\, E(W(ξ)W^T(λ))\, F^T Φ^T(s, λ)\, dξ\, dλ
             = Φ(t, 0) P(0) Φ^T(s, 0) + \int_0^t Φ(t, λ) F R_W(λ) F^T Φ^T(s, λ)\, dλ.

Now use the fact that Φ(s, 0) = Φ(s, t)Φ(t, 0) (and similar relations) to obtain

R_X(t, s) = P(t)\, Φ^T(s, t),

where

P(t) = Φ(t, 0) P(0) Φ^T(t, 0) + \int_0^t Φ(t, λ) F R_W F^T(λ) Φ^T(t, λ)\, dλ.

Finally, differentiate to obtain

Ṗ(t) = A P + P A^T + F R_W F^T,  \qquad P(0) = P_0

(see Friedland for details).
The correlation matrix for the output Y can be computed using the fact that Y = CX and hence R_Y = C R_X C^T. We will often be interested in the steady state properties of the output, which are given by the following proposition.

Proposition 4.3 (Steady state response to white noise). For a time-invariant linear system driven by white noise, the correlation matrices for the state and output converge in steady state to

R_X(τ) = R_X(t, t + τ) = P\, e^{A^T τ}, \qquad R_Y(τ) = C R_X(τ) C^T,

where P satisfies the algebraic equation


A P + P A^T + F R_W F^T = 0, \qquad P > 0.        (4.15)
Equation (4.15) is called the Lyapunov equation and can be solved in
MATLAB using the function lyap.
Example 4.3 First-order system
Consider a scalar linear process

Ẋ = −aX + W,  Y = cX,

where W is a white, Gaussian random process with noise intensity σ². Using the results of Proposition 4.2, the correlation function for X is given by

R_X(t, t + τ) = p(t)\, e^{−aτ},

where p(t) > 0 satisfies

ṗ(t) = −2a p + σ².

We can solve explicitly for p(t) since it is a (non-homogeneous) linear differential equation:

p(t) = e^{−2at} p(0) + (1 − e^{−2at}) \frac{σ^2}{2a}.

Finally, making use of the fact that Y = cX we have

r(t, t + τ) = c^2 \Bigl( e^{−2at} p(0) + (1 − e^{−2at}) \frac{σ^2}{2a} \Bigr) e^{−aτ}.

In steady state, the correlation function for the output becomes

r(τ) = \frac{c^2 σ^2}{2a}\, e^{−aτ}.

Note that the correlation function has the same form as the Ornstein-Uhlenbeck process in Definition 4.7 (with Q = c²σ²). ∇
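The steady-state quantities in this example can also be computed numerically from the Lyapunov equation (4.15) (solved here with scipy; the MATLAB equivalent is lyap, as noted above). The parameter values below are arbitrary illustrative choices.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    a, c, sigma = 2.0, 1.5, 0.7            # illustrative parameter values

    A = np.array([[-a]])                   # scalar system in the form of (4.13)
    F = np.array([[1.0]])
    RW = np.array([[sigma**2]])

    # Solve A P + P A^T + F R_W F^T = 0.
    P = solve_continuous_lyapunov(A, -F @ RW @ F.T)

    print(P[0, 0], sigma**2 / (2 * a))            # steady-state variance of X
    tau = 1.0
    print(c**2 * P[0, 0] * np.exp(-a * tau))      # r(tau) = (c^2 sigma^2 / 2a) e^{-a tau}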

4.5 Random Processes in the Frequency Domain


As in the case of deterministic linear systems, we can analyze a stochastic
linear system either in the state space or the frequency domain. The fre-
quency domain approach provides a very rich set of tools for modeling and
analysis of interconnected systems, relying on the frequency response and
transfer functions to represent the flow of signals around the system.
Given a random process X(t), we can look at the frequency content of
the properties of the response. In particular, if we let ρ(τ ) be the correlation
function for a (scalar) random process, then we define the power spectral
density function as the Fourier transform of ρ:
S(ω) = \int_{-\infty}^{\infty} ρ(τ)\, e^{-jωτ}\, dτ, \qquad ρ(τ) = \frac{1}{2π} \int_{-\infty}^{\infty} S(ω)\, e^{jωτ}\, dω.

Figure 4.4: Spectral power density for a first-order Markov process (log S(ω) versus log ω, with corner frequency at ω_0).

The power spectral density provides an indication of how quickly the values
of a random process can change through the frequency content: if there
is high frequency content in the power spectral density, the values of the
random variable can change quickly in time.
Example 4.4 First-order Markov process
To illustrate the use of these measures, consider a first-order Markov process as defined in Definition 4.7. The correlation function is

ρ(τ) = \frac{Q}{2ω_0}\, e^{-ω_0 |τ|}.

The power spectral density becomes

S(ω) = \int_{-\infty}^{\infty} \frac{Q}{2ω_0}\, e^{-ω_0|τ|}\, e^{-jωτ}\, dτ
     = \int_{-\infty}^{0} \frac{Q}{2ω_0}\, e^{(ω_0 - jω)τ}\, dτ + \int_{0}^{\infty} \frac{Q}{2ω_0}\, e^{(-ω_0 - jω)τ}\, dτ
     = \frac{Q}{ω^2 + ω_0^2}.
We see that the power spectral density is similar to a transfer function and
we can plot S(ω) as a function of ω in a manner similar to a Bode plot,
as shown in Figure 4.4. Note that although S(ω) has a form similar to a
transfer function, it is a real-valued function and is not defined for complex
s. ∇
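As a quick numerical check (values chosen only for illustration), this spectral density can be evaluated and plotted on logarithmic axes:

Q = 1; w0 = 10;                  % noise intensity and break frequency
omega = logspace(-1, 3, 200);    % frequency range, rad/sec
S = Q ./ (omega.^2 + w0^2);      % power spectral density
loglog(omega, S);
xlabel('\omega'); ylabel('S(\omega)');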
Using the power spectral density, we can more formally define “white
noise”: a white noise process is a zero-mean, random process with power
spectral density S(ω) = W = constant for all ω. If X(t) ∈ Rn (a random
vector), then W ∈ Rn×n . We see that a random process is white if all
frequencies are equally represented in its power spectral density; this spectral
property is the reason for the terminology “white”. The following proposition
verifies that this formal definition agrees with our previous (time domain)
definition.
Proposition 4.4. For a white noise process,
ρ(τ) = (1/2π) ∫_{−∞}^{∞} S(ω) e^{jωτ} dω = W δ(τ),
where δ(τ ) is the unit impulse function.


Proof. If τ ≠ 0 then
ρ(τ) = (1/2π) ∫_{−∞}^{∞} W (cos(ωτ) + j sin(ωτ)) dω = 0.
If τ = 0 then ρ(τ) = ∞. One can further show that
ρ(0) = lim_{ε→0} ∫_{−ε}^{ε} ∫_{−∞}^{∞} (· · ·) dω dτ = W δ(0).

Given a linear system


Ẋ = AX + F W, Y = CX,
with W given by white noise, we can compute the spectral density function
corresponding to the output Y . We start by computing the Fourier transform
of the steady state correlation function (4.14):
Z ∞ Z ∞ 
SY (ω) = h(ξ)Qh(ξ + τ )dξ e−jωτ dτ
−∞ 0
Z ∞ Z ∞ 
−jωτ
= h(ξ)Q h(ξ + τ )e dτ dξ
0 −∞
Z ∞ Z ∞ 
= h(ξ)Q h(λ)e−jω(λ−ξ) dλ dξ
Z0 ∞ 0

= h(ξ)ejωξ dξ · QH(jω) = H(−jω)QH(jω)


0
This is then the (steady state) response of a linear system to white noise.
As with transfer functions, one of the advantages of computations in
the frequency domain is that the composition of two linear systems can be
represented by multiplication. In the case of the power spectral density, if we
pass white noise through a system with transfer function H1 (s) followed by
transfer function H2 (s), the resulting power spectral density of the output
is given by
SY (ω) = H1 (−jω)H2 (−jω)Qu H2 (jω)H1 (jω).
As stated earlier, white noise is an idealized signal that is not seen in
practice. One of the ways to produce more realistic models of noise and
disturbances is to apply a filter to white noise that matches a measured
power spectral density function. Thus, we wish to find a covariance W and
filter H(s) such that we match the statistics S(ω) of a measured noise or dis-
turbance signal. In other words, given S(ω), find W > 0 and H(s) such that
S(ω) = H(−jω)W H(jω). This problem is known as the spectral factorization
problem.
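As a simple illustration of this idea (using the first-order Markov process of Example 4.4), the spectrum S(ω) = Q/(ω^2 + ω_0^2) is factored by taking H(s) = 1/(s + ω_0) and W = Q, since then
H(−jω) W H(jω) = Q / ((ω_0 − jω)(ω_0 + jω)) = Q/(ω^2 + ω_0^2).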
Figure 4.5: Summary of steady state stochastic response. A Gaussian white noise
input V, with ρ_V(τ) = R_V δ(τ) and S_V(ω) = R_V, drives the linear system
Ẋ = AX + F V, Y = CX; the output Y is a Gaussian process with spectral density
S_Y(ω) = H(−jω) R_V H(jω), with the steady state covariance P given by
A P + P A^T + F R_V F^T = 0.
Figure 4.5 summarizes the relationship between the time and frequency
domains.

4.6 Further Reading


There are several excellent books on stochastic systems that cover the re-
sults in this chapter in much more detail. For discrete-time systems, the
textbook by Kumar and Varaiya [KV86] provides a derivation of the key
results. Results for continuous-time systems can be found in the textbook by
Friedland [Fri04]. Åström [Åst06] gives a very elegant derivation in a unified
framework that integrates discrete-time and continuous-time systems.

Exercises
4.1 Let Z be a random variable that is the sum of two independent
normally (Gaussian) distributed random variables X1 and X2 having means
m1, m2 and variances σ1^2, σ2^2 respectively. Show that the probability density
function for Z is
p(z) = 1/(2πσ1σ2) ∫_{−∞}^{∞} exp(−(z − x − m1)^2/(2σ1^2) − (x − m2)^2/(2σ2^2)) dx
and confirm that this is normal (Gaussian) with mean m1 + m2 and variance
σ1^2 + σ2^2. (Hint: Use the fact that p(z|x2) = pX1(x1) = pX1(z − x2).)
4.2 (ÅM08, Exercise 7.13) Consider the motion of a particle that is under-
going a random walk in one dimension (i.e., along a line). We model the
position of the particle as
x[k + 1] = x[k] + u[k],
where x is the position of the particle and u is a white noise process with
E(u[i]) = 0 and E(u[i] u[j]) = Ru δ(i − j). We assume that we can measure
x subject to additive, zero-mean, Gaussian white noise with covariance 1.
Show that the expected value of the particle as a function of k is given by
E(x[k]) = E(x[0]) + Σ_{i=0}^{k−1} E(u[i]) = E(x[0]) =: µ_x

and the covariance E((x[k] − µ_x)^2) is given by
E((x[k] − µ_x)^2) = Σ_{i=0}^{k−1} E(u^2[i]) = k Ru.
4.3 Consider a second order system with dynamics
[Ẋ1; Ẋ2] = [−a, 0; 0, −b][X1; X2] + [1; 1] v,    Y = [1, 1][X1; X2]
that is forced by Gaussian white noise with zero mean and variance σ 2 .
Assume a, b > 0.

(a) Compute the correlation function ρ(τ ) for the output of the system.
Your answer should be an explicit formula in terms of a, b and σ.
(b) Assuming that the input transients have died out, compute the mean
and variance of the output.

4.4 Find a constant matrix A and vectors F and C such that for
Ẋ = AX + F W, Y = CX
the power spectrum of Y is given by
S(ω) = (1 + ω^2) / ((1 − 7ω^2)^2 + 1)
Describe the sense in which your answer is unique.
Chapter Five
Kalman Filtering

In this chapter we derive the optimal estimator for a linear system in con-
tinuous time (also referred to as the Kalman-Bucy filter). This estimator
minimizes the covariance and can be implemented as a recursive filter.
Prerequisites. Readers should have basic familiarity with continuous-time
stochastic systems at the level presented in Chapter 4.

5.1 Linear Quadratic Estimators


Consider a stochastic system
Ẋ = AX + Bu + F W, Y = CX + V,
where X represents the state, u is the (deterministic) input, W represents
disturbances that affect the dynamics of the system and V represents mea-
surement noise. Assume that the disturbance W and noise V are zero-mean,
Gaussian white noise (but not necessarily stationary):
p(w) = 1/√((2π)^n det R_W) e^{−(1/2) w^T R_W^{−1} w},    E{W(s)W^T(t)} = R_W(t) δ(t − s),
p(v) = 1/√((2π)^n det R_V) e^{−(1/2) v^T R_V^{−1} v},    E{V(s)V^T(t)} = R_V(t) δ(t − s).
We also assume that the cross correlation between W and V is zero, so that
the disturbances are not correlated with the noise. Note that we use multi-
variable Gaussians here, with noise intensities RW ∈ Rm×m and RV ∈ Rp×p .
In the scalar case, R_W = σ_W^2 and R_V = σ_V^2.
We formulate the optimal estimation problem as finding the estimate
X̂(t) that minimizes the mean square error E{(X(t) − X̂(t))(X(t) − X̂(t))T }
given {Y (τ ) : 0 ≤ τ ≤ t}. It can be shown that this is equivalent to find-
ing the expected value of X subject to the “constraint” given by all of the
previous measurements, so that X̂(t) = E{X(t)|Y (τ ), τ ≤ t}. This was the
way that Kalman originally formulated the problem and it can be viewed
as solving a least squares problem: given all previous Y (t), find the estimate
X̂ that satisfies the dynamics and minimizes the square error with the mea-
sured data. We omit the proof since we will work directly with the error
formulation.
Theorem 5.1 (Kalman-Bucy, 1961). The optimal estimator has the form
of a linear observer
dX̂/dt = AX̂ + BU + L(Y − CX̂),
where L(t) = P (t)C T Rv−1 and P (t) = E{(X(t) − X̂(t))(X(t) − X̂(t))T }
satisfies
Ṗ = AP + P AT − P C T Rv−1 (t)CP + F RW (t)F T ,
P (0) = E{X(0)X T (0)}.
Sketch of proof. The error dynamics are given by
Ė = (A − LC)E + ξ, ξ = F W − LV, Rξ = F RW F T + LRv LT
The covariance matrix PE = P for this process satisfies
Ṗ = (A − LC)P + P (A − LC)T + F RW F T + LRv LT
= AP + P AT + F RW F T − LCP − P C T LT + LRv LT
= AP + P A^T + F R_W F^T + (LR_v − PC^T) R_v^{−1} (LR_v − PC^T)^T
− PC^T R_v^{−1} CP,
where the last line follows by completing the square. We need to find L such
that P (t) is as small as possible, which can be done by choosing L so that
Ṗ decreases by the maximum amount possible at each instant in time. This
is accomplished by setting
LRv = P C T =⇒ L = P C T Rv−1 .

Note that the Kalman filter has the form of a recursive filter: given P (t) =
E{E(t)E T (t)} at time t, can compute how the estimate and covariance
change. Thus we do not need to keep track of old values of the output.
Furthermore, the Kalman filter gives the estimate X̂(t) and the covariance
PE (t), so you can see how well the error is converging.
If the noise is stationary (RW , RV constant) and if the dynamics for P (t)
are stable, then the observer gain converges to a constant and satisfies the
algebraic Riccati equation:
L = PC^T R_v^{−1},    AP + P A^T − PC^T R_v^{−1} CP + F R_W F^T = 0.
This is the most commonly used form of the estimator since it gives an
explicit formula for the estimator gains that minimize the error covariance.
The gain matrix for this case can be solved for using the lqe command in MATLAB.
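As an illustration (the numerical values below are placeholders, not taken from the text), the constant gain and steady state error covariance can be computed as

A = [0 1; -1 -0.5]; F = [0; 1]; C = [1 0];
Rw = 0.1;                        % disturbance intensity
Rv = 0.01;                       % measurement noise intensity
[L, P] = lqe(A, F, C, Rw, Rv);   % L = P*C'/Rv, with P solving the Riccati equation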
Another property of the Kalman filter is that it extracts the maximum
possible information about output data. To see this, consider the residual
random process
R = Y − C X̂
(this process is also called the innovations process). It can be shown for the
Kalman filter that the correlation matrix of R is given by

RR (t, s) = W (t)δ(t − s).

This implies that the residuals are a white noise process and so the output
error has no remaining dynamic information content.

5.2 Extensions of the Kalman Filter


Correlated disturbances and noise
The derivation of the Kalman filter assumes that the disturbances and
noise are independent and white. Removing the assumption of indepen-
dence is straightforward and simply results in a cross term (E{W (t)V (s)} =
RW V δ(s − t)) being carried through all calculations.
To remove the assumption of white noise disturbances, we can construct
a filter that takes white noise as an input and produces a disturbance source
with the appropriate correlation function (or equivalently, spectral power
density function). The intuition behind this approach is that we must have
an internal model of the noise and/or disturbances in order to capture the
correlation between different times.
Eliminating correlated sensor noise is more difficult.

Extended Kalman filters


Consider a nonlinear system

Ẋ = f (X, U, W ), X ∈ Rn , u ∈ Rm ,
Y = CX + V, Y ∈ Rp ,

where W and V are Gaussian white noise processes with covariance matrices
RW and RV . A nonlinear observer for the system can be constructed by using
the process
dX̂/dt = f(X̂, U, 0) + L(Y − CX̂).

If we define the error as E = X − X̂, the error dynamics are given by

Ė = f (X, U, W ) − f (X̂, U, 0) − LC(X − X̂)


= F(E, X̂, U, W) − LCE

where
F (E, X̂, U, W ) = f (E + X̂, U, W ) − f (X̂, U, 0)
We can now linearize around the current estimate X̂:
Ė = (∂F/∂E) E + F(0, X̂, U, 0) + (∂F/∂W) W − LCE + h.o.t.,
where F(0, X̂, U, 0) = 0, the term (∂F/∂W)W captures the noise and LCE is the
observer correction, so that
Ė ≈ ÃE + F̃ W − LCE,

where the matrices
Ã = ∂F/∂E |_(0, X̂, U, 0) = ∂f/∂X |_(X̂, U, 0),
F̃ = ∂F/∂W |_(0, X̂, U, 0) = ∂f/∂W |_(X̂, U, 0)

depend on the current estimate X̂. We can now design an observer for the
linearized system around the current estimate:
dX̂/dt = f(X̂, U, 0) + L(Y − CX̂),    L = PC^T R_v^{−1},
Ṗ = (Ã − LC)P + P (Ã − LC)T + F̃ RW F̃ T + LRv LT ,
P (t0 ) = E{X(t0 )X T (t0 )}

This is called the (Schmidt) extended Kalman filter (EKF).


The intuition in the Kalman filter is that we replace the prediction por-
tion of the filter with the nonlinear modeling while using the instantaneous
linearization to compute the observer gain. Although we lose optimality,
in applications the extended Kalman filter works very well and it is very
versatile, as illustrated in the following example.

Example 5.1 Online parameter estimation


Consider a linear system with unknown parameters ξ

Ẋ = A(ξ)X + B(ξ)U + F W, ξ ∈ Rp ,
Y = C(ξ)X + V.

We wish to solve the parameter identification problem: given U (t) and Y (t),
estimate the value of the parameters ξ.
One approach to this online parameter estimation problem is to treat ξ
as an unknown state that has zero derivative:

Ẋ = A(ξ)X + B(ξ)U + F W, ξ˙ = 0.
We can now write the dynamics in terms of the extended state Z = (X, ξ):
d/dt [X; ξ] = [A(ξ), 0; 0, 0][X; ξ] + [B(ξ); 0] U + [F; 0] W =: f(Z, U, W),
Y = C(ξ)X + V =: h(Z, V).

This system is nonlinear in the extended state Z, but we can use the ex-
tended Kalman filter to estimate Z. If this filter converges, then we obtain
both an estimate of the original state X and an estimate of the unknown
parameter ξ ∈ Rp .
Remark: various observability conditions on the augmented system are needed
in order for this approach to work. ∇

5.3 LQG Control


Return to the original “H2” control problem
Ẋ = AX + BU + F W,    Y = CX + V,
where W, V are Gaussian white noise processes with covariances R_W, R_V.
Stochastic control problem: find C(s) to minimize
J = E{ ∫_0^∞ ((Y − r)^T R_W (Y − r) + U^T R_v U) dt }.
Assume for simplicity that the reference input r = 0 (otherwise, translate
the state accordingly).
Theorem 5.2 (Separation principle). The optimal controller has the form
dX̂/dt = AX̂ + BU + L(Y − CX̂),
U = K(X̂ − X_d),
where L is the optimal observer gain ignoring the controller and K is the
optimal controller gain ignoring the noise.
This is called the separation principle (for H2 control).

5.4 Application to a Thrust Vectored Aircraft


To illustrate the use of the Kalman filter, we consider the problem of esti-
mating the state for the Caltech ducted fan, described already in Section 3.4.
The following code implements an extended Kalman filter in MATLAB,
by constructing a state vector that consists of the actual state, the estimated
state and the elements of the covariance matrix P (t):
% pvtol.m - nonlinear PVTOL model, with LQR and EKF
% RMM, 5 Feb 06
%
% This function has the dynamics for a nonlinear planar vertical takeoff
% and landing aircraft, with an LQR compensator and EKF estimator.
%
% state(1) x position, in meters
% state(2) y position, in meters
% state(3) theta angle, in radians
% state(4-6) velocities
% state(7-12) estimated states
% state(13-48) covariance matrix (ordered rowwise)

function deriv = pvtol(t, state, flags)


global pvtol_K; % LQR gain
global pvtol_L; % LQE gain (temporary)
global pvtol_Rv; % Disturbance covariance
global pvtol_Rw; % Noise covariance
global pvtol_C; % outputs to use
global pvtol_F; % disturbance input

% System parameters
J = 0.0475; % inertia around pitch axis
m = 1.5; % mass of fan
r = 0.25; % distance to flaps
g = 10; % gravitational constant
d = 0.2; % damping factor (estimated)

% Initialize the derivative so that it is correct size and shape


deriv = zeros(size(state));

% Extract the current state estimate


x = state(1:6);
xhat = state(7:12);

% Get the current output, with noise


y = pvtol_C*x + pvtol_C * ...
[0.1*sin(2.1*t); 0.1*sin(3.2*t); 0; 0; 0; 0];

% Compute the disturbance forces


fd = [
0.01*sin(0.1*t); 0.01*cos(0.027*t); 0
];

% Compute the control law


F = -pvtol_K * xhat + [0; m*g];

% A matrix at current estimated state


A = [ 0 0 0 1 0 0;
0 0 0 0 1 0;
0 0 0 0 0 1;
0, 0, (-F(1)*sin(xhat(3)) - F(2)*cos(xhat(3)))/m, -d, 0, 0;
0, 0, (F(1)*cos(xhat(3)) - F(2)*sin(xhat(3)))/m, 0, -d, 0;
0 0 0 0 0 0 ];

% Estimator dynamics (prediction)


deriv(7) = xhat(4); deriv(8) = xhat(5); deriv(9) = xhat(6);
deriv(10) = (F(1) * cos(xhat(3)) - F(2) * sin(xhat(3)) - d*xhat(4)) / m;
deriv(11) = (F(1) * sin(xhat(3)) + F(2) * cos(xhat(3)) - m*g - d*xhat(5)) / m;
deriv(12) = (F(1) * r) / J;

% Compute the covariance


P = reshape(state(13:48), 6, 6);
dP = A * P + P * A' - P * pvtol_C' * inv(pvtol_Rw) * pvtol_C * P + ...
  pvtol_F * pvtol_Rv * pvtol_F';
L = P * pvtol_C' * inv(pvtol_Rw);

% Now compute correction


xcor = L * (y - pvtol_C*xhat);
for i = 1:6, deriv(6+i) = deriv(6+i) + xcor(i); end;

% PVTOL dynamics
deriv(1) = x(4); deriv(2) = x(5); deriv(3) = x(6);
deriv(4) = (F(1)*cos(x(3)) - F(2)*sin(x(3)) - d*x(4) + fd(1)) / m;
deriv(5) = (F(1)*sin(x(3)) + F(2)*cos(x(3)) - m*g - d*x(5) + fd(2)) / m;
deriv(6) = (F(1) * r + fd(3)) / J;

% Copy in the covariance updates


for i = 1:6,
for j = 1:6,
deriv(6+6*i+j) = dP(i, j);
end;
end;

% All done
return;

To show how this estimator can be used, consider the problem of stabiliz-
ing the system to the origin with an LQR controller that uses the estimated
state. The following MATLAB code implements the controller, using the
previous simulation:
% kf_dfan.m - Kalman filter for the ducted fan
% RMM, 5 Feb 06

% Global variables to talk to simulation model


global pvtol_K pvtol_L pvtol_C pvtol_Rv pvtol_Rw pvtol_F;

%%
%% Ducted fan dynamics
%%
%% These are the dynamics for the ducted fan, written in state space
%% form.
%%

% System parameters
J = 0.0475; % inertia around pitch axis
m = 1.5; % mass of fan
r = 0.25; % distance to flaps
g = 10; % gravitational constant
d = 0.2; % damping factor (estimated)

% System matrices (entire plant: 2 input, 2 output)


A = [ 0 0 0 1 0 0;
0 0 0 0 1 0;
0 0 0 0 0 1;
0 0 -g -d/m 0 0;
0 0 0 0 -d/m 0;
0 0 0 0 0 0 ];

B = [ 0 0;
0 0;
0 0;
1/m 0;
0 1/m;
r/J 0 ];

C = [ 1 0 0 0 0 0;
0 1 0 0 0 0 ];

D = [ 0 0; 0 0];

dfsys = ss(A, B, C, D);

%%
%% State space control design
%%
%% We use an LQR design to choose the state feedback gains
%%

K = lqr(A, B, eye(size(A)), 0.01*eye(size(B’*B)));


pvtol_K = K;

%%
%% Estimator #1
%%

% Set the disturbances and initial condition


pvtol_F = eye(6);
pvtol_Rv = diag([0.0001, 0.0001, 0.0001, 0.01, 0.04, 0.0001]);

x0 = [0.1 0.2 0 0 0 0];


R11 = 0.1; R22 = 0.1; R33 = 0.001;
% Set the weighting matrices (L is computed but not used)


pvtol_C = [1 0 0 0 0 0; 0 1 0 0 0 0];
pvtol_Rw = diag([R11 R22]);
pvtol_L = lqe(A, pvtol_F, pvtol_C, pvtol_Rv, pvtol_Rw);

[t1, y1] = ode45(@pvtol, [0, 15], ...
  [x0 0*x0 reshape(x0'*x0, 1, 36)]);

subplot(321);
plot(t1, y1(:,1), ’b-’, t1, y1(:,2), ’g--’);
legend x y;
xlabel(’time’);
ylabel(’States (no \theta)’);
axis([0 15 -0.3 0.3]);

subplot(323);
plot(t1, y1(:,7) - y1(:,1), ’b-’, ...
t1, y1(:,8) - y1(:,2), ’g--’, ...
t1, y1(:,9) - y1(:,3), ’r-’);
legend xerr yerr terr;
xlabel(’time’);
ylabel(’Errors (no \theta)’);
axis([0 15 -0.2 0.2]);

subplot(325);
plot(t1, y1(:,13), ’b-’, t1, y1(:,19), ’g--’, t1, y1(:,25), ’r-’);
legend P11 P22 P33
xlabel(’time’);
ylabel(’Covariance (w/ \theta)’);
axis([0 15 -0.2 0.2]);

%%
%% Estimator #2
%%

% Now change the output and see what happens (L computed but not used)
pvtol_C = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 1 0 0 0];
pvtol_Rw = diag([R11 R22 R33]);
pvtol_L = lqe(A, pvtol_F, pvtol_C, pvtol_Rv, pvtol_Rw);

[t2, y2] = ode45(@pvtol, [0, 15], ...
  [x0 0*x0 reshape(x0'*x0, 1, 36)]);
subplot(322);
plot(t2, y2(:,1), ’b-’, t2, y2(:,2), ’g--’);
legend x y;
xlabel(’time’);
ylabel(’States (w/ \theta)’);
axis([0 15 -0.3 0.3]);

subplot(324);
plot(t2, y2(:,7) - y2(:,1), ’b-’, ...
t2, y2(:,8) - y2(:,2), ’g--’, ...
t2, y2(:,9) - y2(:,3), ’r-’);


legend xerr yerr terr;
xlabel(’time’);
ylabel(’Errors (w/ \theta)’);
axis([0 15 -0.2 0.2]);

subplot(326);
plot(t2, y2(:,13), ’b-’, t2, y2(:,19), ’g--’, t2, y2(:,25), ’r-’);
legend P11 P22 P33
xlabel(’time’);
ylabel(’Covariance (w/ \theta)’);
axis([0 15 -0.2 0.2]);

print -dpdf dfan_kf.pdf

5.5 Further Reading

Exercises
5.1 Consider the problem of estimating the position of an autonomous
mobile vehicle using a GPS receiver and an IMU (inertial measurement
unit). The dynamics of the vehicle are given by

ẋ = cos θ v,    ẏ = sin θ v,    θ̇ = (1/l) tan φ v,
where (x, y) is the position of the vehicle, θ its heading angle and l its wheelbase.
We assume that the vehicle is disturbance free, but that we have noisy
measurements from the GPS receiver and IMU and an initial condition error.
In this problem we will utilize the full form of the Kalman filter (including
the Ṗ equation).
(a) Suppose first that we only have the GPS measurements for the xy
position of the vehicle. These measurements give the position of the
vehicle with approximately 1 meter accuracy. Model the GPS error
as Gaussian white noise with σ = 1.2 meter in each direction and
design an optimal estimator for the system. Plot the estimated states
and the covariances for each state starting with an initial condition
of 5 degree heading error at 10 meters/sec forward speed (i.e., choose
x(0) = (0, 0, 5π/180) and x̂ = (0, 0, 0)).
(b) An IMU can be used to measure angular rates and linear acceleration.
Assume that we use a Northrop Grumman LN200 to measure the
angular rate θ̇. Use the datasheet on the course web page to determine
a model for the noise process and design a Kalman filter that fuses
the GPS and IMU to determine the position of the vehicle. Plot the
estimated states and the covariances for each state starting with an
initial condition of 5 degree heading error at 10 meters/sec forward
speed.
Note: be careful with units on this problem!
Chapter Six
Sensor Fusion

In this chapter we consider the problem of combining the data from differ-
ent sensors to obtain an estimate of a (common) dynamical system. Unlike
the previous chapters, we focus here on discrete-time processes, leaving the
continuous-time case to the exercises. We begin with a summary of the in-
put/output properties of discrete-time systems with stochastic inputs, then
present the discrete-time Kalman filter, and use that formalism to formulate
and present solutions for the sensor fusion problem. Some advanced methods
of estimation and fusion are also summarized at the end of the chapter that
demonstrate how to move beyond the linear, Gaussian process assumptions.
Prerequisites. The material in this chapter is designed to be reasonably self-
contained, so that it can be used without covering Sections ??–4.4 or Chap-
ter 5 of this supplement. We assume rudimentary familiarity with discrete-
time linear systems, at the level of the brief descriptions in Chapters 2 and 5
of ÅM08, and discrete-time random processes as described in Section 4.2 of
these notes.

6.1 Discrete-Time Stochastic Systems


We begin with a concise overview of stochastic systems in discrete time,
echoing our development of continuous-time random systems described in
Chapter 4. We consider systems of the form
X[k + 1] = AX[k] + Bu[k] + F W [k], Y [k] = CX[k] + V [k], (6.1)
where X ∈ Rn represents the state, u ∈ Rm represents the (deterministic)
input, W ∈ Rq represents process disturbances, Y ∈ Rp represents the
system output and V ∈ Rp represents measurement noise.
As in the case of continuous-time systems, we are interested in the re-
sponse of the system to the random input W [k]. We will assume that W is a
Gaussian process with zero mean and correlation function ρW (k, k + d) (or
correlation matrix RW (k, k + d) if W is vector valued). As in the continuous
case, we say that a random process is white noise if RW (k, k + d) = RW δ(d)
with δ(d) = 1 if d = 0 and 0 otherwise. (Note that in the discrete-time case,
white noise has finite covariance.)
To compute the response Y [k] of the system, we look at the properties
of the state vector X[k]. For simplicity, we take u = 0 (since the system is
6-2 CHAPTER 6. SENSOR FUSION

linear, we can always add it back in by superposition). Note first that the
state at time k + l can be written as
X[k + l] = AX[k + l − 1] + F W[k + l − 1]
= A(AX[k + l − 2] + F W[k + l − 2]) + F W[k + l − 1]
= A^l X[k] + Σ_{j=1}^{l} A^{j−1} F W[k + l − j].

The mean of the state at time k is given by
E(X[k]) = A^k E(X[0]) + Σ_{j=1}^{k} A^{j−1} F E(W[k − j]) = A^k E(X[0]).

To compute the covariance R_X(k, k+d), we start by computing R_X(k, k+1):
R_X(k, k + 1) = E{X[k]X^T[k + 1]}
= E{(A^k X[0] + A^{k−1} F W[0] + · · · + A F W[k − 2] + F W[k − 1]) ·
(A^{k+1} X[0] + A^k F W[0] + · · · + F W[k])^T}.
Performing a similar calculation for R_X(k, k + l), it can be shown that
R_X(k, k + l) = (A^k P[0](A^T)^k + A^{k−1} F R_W[0] F^T (A^T)^{k−1} + . . .
+ F R_W[k − 1] F^T)(A^T)^l =: P[k](A^T)^l,    (6.2)
where
P [k + 1] = AP [k]AT + F RW [k]F T . (6.3)
The matrix P [k] is the covariance of the state matrix and we see that its
value can be computed recursively starting with P [0] = E(X[0]X T [0]) and
then applying equation (6.3). Equations (6.2) and (6.3) are the equivalent
of Proposition 4.2 for continuous-time processes. If we additionally assume
that W is stationary and focus on the steady state response, we obtain the
following.
Proposition 6.1 (Steady state response to white noise). For a discrete-
time, time-invariant, linear system driven by white noise, the correlation
matrices for the state and output converge in steady state to
R_X(d) = R_X(k, k + d) = P (A^T)^d,    R_Y(d) = C R_X(d) C^T,
where P satisfies the algebraic equation
P = A P A^T + F R_W F^T,    P > 0.    (6.4)

6.2 Kalman Filters in Discrete Time (AM08)


We now consider the optimal estimator in discrete time. This material is
presented in ÅM08 in slightly simplified (but consistent) form.
Consider a discrete time, linear system with input, having dynamics
X[k + 1] = AX[k] + Bu[k] + F W[k],    Y[k] = CX[k] + V[k],    (6.5)
where W[k] and V[k] are Gaussian, white noise processes satisfying
E{W[k]} = 0,    E{W[k]W^T[j]} = 0 (k ≠ j),  R_W (k = j),
E{V[k]} = 0,    E{V[k]V^T[j]} = 0 (k ≠ j),  R_V (k = j),
E{W[k]V^T[j]} = 0.    (6.6)
We assume that the initial condition is also modeled as a Gaussian random
variable with
E{X[0]} = x0 E{X[0]X T [0]} = P [0]. (6.7)

We wish to find an estimate X̂[k] that gives the minimum mean square
error (MMSE) for E{(X[k] − X̂[k])(X[k] − X̂[k])T } given the measurements
{Y [l] : 0 ≤ l ≤ k}. We consider an observer of the form

X̂[k + 1] = AX̂[k] + Bu[k] + L[k](Y [k] − C X̂[k]). (6.8)


The following theorem summarizes the main result.

Theorem 6.2. Consider a random process X[k] with dynamics (6.5) and
noise processes and initial conditions described by equations (6.6) and (6.7).
The observer gain L that minimizes the mean square error is given by
L[k] = AP [k]C T (RV + CP [k]C T )−1 ,
where
P [k + 1] = (A − LC)P [k](A − LC)T + F RW F T + LRV LT
(6.9)
P [0] = E{X[0]X T [0]}.

Proof. We wish to minimize the mean square of the error, E{(X[k]−X̂[k])(X[k]−
X̂[k])^T}. We will define this quantity as P[k] and then show that it satisfies
the recursion given in equation (6.9). Let E[k] = X[k] − X̂[k] be the estimation
error. By definition,

P [k + 1] = E{E[k + 1]E T [k + 1]}


= (A − LC)P [k](A − LC)T + F RW F T + LRV LT
= AP [k]AT − AP [k]C T LT − LCP [k]AT +
L(RV + CP [k]C T )LT + F RW F T .
Letting R_ε = (R_V + CP[k]C^T), we have
P[k + 1] = AP[k]A^T − AP[k]C^T L^T − LCP[k]A^T + LR_ε L^T + F R_W F^T
= AP[k]A^T + (L − AP[k]C^T R_ε^{−1}) R_ε (L − AP[k]C^T R_ε^{−1})^T
− AP[k]C^T R_ε^{−1} CP[k]A^T + F R_W F^T.
In order to minimize this expression, we choose L = AP[k]C^T R_ε^{−1} and the
theorem is proven.

Note that the Kalman filter has the form of a recursive filter: given P [k] =
E{E[k]E[k]T } at time k, can compute how the estimate and covariance
change. Thus we do not need to keep track of old values of the output.
Furthermore, the Kalman filter gives the estimate X̂[k] and the covariance
P [k], so we can see how reliable the estimate is. It can also be shown that
the Kalman filter extracts the maximum possible information about output
data. It can be shown that for the Kalman filter the correlation matrix for
the error is
RE [j, k] = Rδjk .
In other words, the error is a white noise process, so there is no remaining
dynamic information content in the error.
In the special case when the noise is stationary (RW , RV constant) and
if P [k] converges, then the observer gain is constant:
L = A P C^T (R_V + C P C^T)^{−1},
where P satisfies
P = A P A^T + F R_W F^T − A P C^T (R_V + C P C^T)^{−1} C P A^T.
We see that the optimal gain depends on both the process noise and the
measurement noise, but in a nontrivial way. Like the use of LQR to choose
state feedback gains, the Kalman filter permits a systematic derivation of
the observer gains given a description of the noise processes. The solution
for the constant gain case is solved by the dlqe command in MATLAB.
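As an illustration (placeholder values, not taken from the text), the steady state gain can be computed as

A = [1 0.1; 0 1]; F = [0; 1]; C = [1 0];
Rw = 0.01;                       % process disturbance covariance
Rv = 0.1;                        % measurement noise covariance
[M, P] = dlqe(A, F, C, Rw, Rv);  % M is the measurement-update (correction) gain
L = A*M;                         % gain for the one-step observer form used above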

6.3 Predictor-Corrector Form


The Kalman filter can be written in a two step form by separating the
correction step (where we make use of new measurements of the output) and
the prediction step (where we compute the expected state and covariance at
the next time instant).
We make use of the notation X̂[k|j] to represent the estimated state at
time instant k given the information up to time j (where typically j = k−1).
Using this notation, the filter can be solved using the following algorithm:
Step 0: Initialization.
k=1
X̂[0|0] = E{X[0]}
P [0|0] = E{X[0]X T [0]}

Step 1: Prediction. Update the estimates and covariance matrix to account
for all data taken up to time k − 1:
X̂[k|k−1] = AX̂[k−1|k−1] + Bu[k − 1]
P [k|k−1] = AP [k−1|k−1]AT + F RW [k − 1]F T

Step 2: Correction. Correct the estimates and covariance matrix to account
for the data taken at time step k:
L[k] = P [k|k−1]C T (RV + CP [k|k−1]C T )−1 ,
X̂[k|k] = X̂[k|k−1] + L[k](Y [k] − C X̂[k|k−1]),
P [k|k] = P [k|k−1] − L[k]CP [k|k−1].

Step 3: Iterate. Set k to k + 1 and repeat steps 1 and 2.


Note that the correction step reduces the covariance by an amount related to
the relative accuracy of the measurement, while the prediction step increases
the covariance by an amount related to the process disturbance.
This form of the discrete-time Kalman filter is convenient because we can
reason about the estimate in the case when we do not obtain a measure-
ment on every iteration of the algorithm. In this case, we simply update the
prediction step (increasing the covariance) until we receive new sensor data,
at which point we call the correction step (decreasing the covariance).
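A minimal simulation of this algorithm is sketched below (all numerical values are placeholders chosen only for illustration):

% Discrete-time Kalman filter in predictor-corrector form (illustrative values)
A = [1 0.1; 0 1]; B = [0; 0.1]; C = [1 0]; F = [0; 1];
Rw = 0.01; Rv = 0.1; N = 100;
xhat = [0; 0]; P = eye(2);              % initial estimate and covariance
x = [1; 0]; u = zeros(1, N);            % true state and (zero) input
for k = 1:N
  % Prediction: propagate the estimate and covariance using the model
  xhat = A*xhat + B*u(k);
  P = A*P*A' + F*Rw*F';
  % Simulate the true system and a noisy measurement
  x = A*x + B*u(k) + F*sqrt(Rw)*randn;
  y = C*x + sqrt(Rv)*randn;
  % Correction: incorporate the measurement taken at step k
  L = P*C'/(Rv + C*P*C');
  xhat = xhat + L*(y - C*xhat);
  P = P - L*C*P;
end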
The following lemma will be useful in the sequel:
Lemma 6.3. The optimal gain L[k] satisfies
L[k] = P [k|k]C T RV−1
Proof. L[k] is defined as
L[k] = P [k|k−1]C T (RV + CP [k|k−1]C T )−1 .
Multiplying through by the inverse term on the right and expanding, we
have
L[k](RV + CP [k|k−1]C T ) = P [k|k−1]C T ,
L[k]RV + L[k]CP [k|k−1]C T = P [k|k−1]C T ,
and hence
L[k]RV = P [k|k−1]C T − L[k]CP [k|k−1]C T ,
= (I − L[k]C)P [k|k−1]C T = P [k|k]C T .
The desired result follows by multiplying on the right by R_V^{−1}.
Figure 6.1: Sensor fusion

6.4 Sensor Fusion


We now return to the main topic of the chapter: sensor fusion. Consider the
situation described in Figure 6.1, where we have an input/output dynamical
system with multiple sensors capable of taking measurements. The problem
of sensor fusion involves deciding how to best combine the measurements
from the individual sensors in order to accurately estimate the process state
X. Since different sensors may have different noise characteristics, evidently
we should combine the sensors in a way that places more weight on sensors
with lower noise. In addition, in some situations we may have different sen-
sors available at different times, so that not all information is available on
each measurement update.
To gain more insight into how the sensor data are combined, we investi-
gate the functional form of L[k]. Suppose that each sensor takes a measure-
ment of the form
Y i = C iX + V i, i = 1, . . . , p,
where the superscript i corresponds to the specific sensor. Let V i be a zero
mean, white noise process with covariance σi2 = RV i (0). It follows from
Lemma 6.3 that
L[k] = P[k|k] C^T R_V^{−1}.
First note that if P [k|k] is small, indicating that our estimate of X is close
to the actual value (in the MMSE sense), then L[k] will be small due to
the leading P[k|k] term. Furthermore, the characteristics of the individual
sensors are contained in the different σ_i^2 terms, which appear only in R_V.
Expanding the gain matrix, we have
L[k] = P[k|k] C^T R_V^{−1},    R_V^{−1} = diag(1/σ_1^2, . . . , 1/σ_p^2).
We see from the form of R_V^{−1} that each sensor is inversely weighted by
its covariance. Thus noisy sensors (σi2 ≫ 1) will have a small weight and
require averaging over many iterations before their data can affect the state
estimate. Conversely, if σi2 ≪ 1, the data is “trusted” and is used with higher
weight in each iteration.
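As a simple static illustration (not taken from the text), suppose two sensors measure the same scalar quantity x, with Y^1 = x + V^1 and Y^2 = x + V^2 and noise variances σ_1^2 and σ_2^2. The minimum variance combination weights each measurement by its inverse variance,
x̂ = (Y^1/σ_1^2 + Y^2/σ_2^2) / (1/σ_1^2 + 1/σ_2^2),
so that a sensor that is ten times noisier contributes one hundred times less to the estimate.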

6.5 Information Filters


An alternative formulation of the Kalman filter is to make use of the in-
verse of the covariance matrix, called the information matrix, to represent
the error of the estimate. It turns out that writing the state estimator in
this form has several advantages both conceptually and when implementing
distributed computations. This form of the Kalman filter is known as the
information filter.
We begin by defining the information matrix I and the weighted state
estimate Ẑ:
I[k|k] = P −1 [k|k], Ẑ[k|k] = P −1 [k|k]X̂[k|k].
We also make use of the following quantities, which appear in the Kalman
filter equations:
Ω_i[k|k] = (C^i)^T R_{V^i}^{−1}[k|k] C^i,    Ψ_i[k|k] = (C^i)^T R_{V^i}^{−1}[k|k] C^i X̂[k|k].

Using these quantities, we can rewrite the Kalman filter equations as
Prediction:
I[k|k−1] = (A I^{−1}[k−1|k−1] A^T + R_W)^{−1},
Ẑ[k|k−1] = I[k|k−1] A I^{−1}[k−1|k−1] Ẑ[k−1|k−1] + Bu[k−1],
Correction:
I[k|k] = I[k|k−1] + Σ_{i=1}^{p} Ω_i[k|k],
Ẑ[k|k] = Ẑ[k|k−1] + Σ_{i=1}^{p} Ψ_i[k|k].
Note that these equations are in a particularly simple form, with the information
matrix being updated by each sensor's Ω_i and similarly the state
estimate being updated by each sensor's Ψ_i.
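A sketch of the correction step is given below, written in the common form in which each sensor's measurement Y^i enters weighted by its inverse covariance (all variable names and values are illustrative, not from the text):

% Information-filter correction across two sensors (illustrative values)
Ipred = eye(2); Zpred = [0; 0];        % predicted information matrix and state
Ci  = {[1 0], [0 1]};                  % individual sensor output matrices
Rvi = {0.1, 0.5};                      % individual sensor noise covariances
yi  = {0.9, 0.1};                      % measurements from each sensor
Icorr = Ipred; Zcorr = Zpred;
for i = 1:length(Ci)
  Omega = Ci{i}' * (Rvi{i} \ Ci{i});   % information contributed by sensor i
  Psi   = Ci{i}' * (Rvi{i} \ yi{i});   % inverse-covariance weighted measurement
  Icorr = Icorr + Omega;
  Zcorr = Zcorr + Psi;
end
xhat = Icorr \ Zcorr;                  % recover the state estimate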
Remarks:
1. Information form allows simple addition for correction step. Intuition:
add information through additional data.
2. Sensor fusion: information content = inverse covariance (for each sen-
sor)
3. Variable rate: incorporate new information whenever it arrives. No
data =⇒ no information =⇒ prediction update only.

6.6 Additional topics


Converting continuous time stochastic systems to discrete time
Ẋ = AX + Bu + F w
Figure 6.2: Sensor fusion with correlated measurement noise

x(t + h) ≈ x(t) + hẋ(t)
= x(t) + hAx(t) + hBu(t) + hF W(t)
= (I + hA)x(t) + (hB)u(t) + (hF)W(t),
so that
X[k + 1] = (I + hA) X[k] + (hB) u[k] + (hF) W[k] =: Ã X[k] + B̃ u[k] + F̃ W[k].

Correlated disturbances and noise


As in the case of continuous-time Kalman filters, in the discrete time we can
include noise or disturbances that are non-white by using a filter to generate
noise with the appropriate correlation function.
One practical method to do this is to collect samples W[1], W[2], . . . , W[N]
and then numerically compute the correlation function

R_W(l) = E{W[i]W[i + l]} = 1/(N − l) Σ_{j=1}^{N−l} W[j] W[j + l].
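A sketch of this computation (with a synthetic noise record generated only for illustration):

N = 1000; W = randn(1, N);          % samples of a scalar noise process
lmax = 20; Rw = zeros(1, lmax+1);
for l = 0:lmax
  Rw(l+1) = sum(W(1:N-l) .* W(1+l:N)) / (N - l);
end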

6.7 Further Reading

Exercises
6.1 Consider the problem of estimating the position of an autonomous
mobile vehicle using a GPS receiver and an IMU (inertial measurement
unit). The continuous time dynamics of the vehicle are given by
ẋ = cos θ v,    ẏ = sin θ v,    θ̇ = (1/l) tan φ v,
where (x, y) is the position of the vehicle, θ its heading angle and l its wheelbase.
We assume that the vehicle is disturbance free, but that we have noisy
measurements from the GPS receiver and IMU and an initial condition error.

(a) Rewrite the equations of motion in discrete time, assuming that we
update the dynamics at a sample time of h = 0.005 sec and that we
can take ẋ to be roughly constant over that period. Run a simulation of
your discrete time model from initial condition (0, 0, 0) with constant
input φ = π/8, v = 5 and compare your results with the continuous
time model.
(b) Suppose that we have a GPS measurement that is taken every 0.1 sec-
onds and an IMU measurement that is taken every 0.01 seconds. Write
a MATLAB program that computes the discrete time Kalman
filter for this system, using the same disturbance, noise and initial
conditions as Exercise 5.1.

6.2
6.3 Consider a continuous time dynamical system with multiple measure-
ments,
Ẋ = AX + Bu + F V,    Y^i = C^i X + W^i,    i = 1, . . . , q.
Assume that the measurement noises W^i are independent for each sensor
and have zero mean and variance σ_i^2. Show that the optimal estimator for
X weights the measurements by the inverse of their covariances.
6.4 Show that if we formulate the optimal estimate using an estimator of
the form
X̂[k + 1] = AX̂[k] + L[k](Y [k + 1] − CAX̂[k])
that we recover the update law in the predictor-corrector form.
Bibliography

[AF06] M. Athans and P. L. Falb. Optimal Control: An Introduction to the Theory
and Its Applications. Dover, 2006. Originally published in 1963.
[ÅM08] K. J. Åström and R. M. Murray. Feedback Systems: An Introduction for
Scientists and Engineers. Princeton University Press, 2008. Available at
http://fbsbook.org.
[Åst06] K. J. Åström. Introduction to Stochastic Control Theory. Dover, New York,
2006. Originally published by Academic Press, New York, 1970.
[BH75] A. E. Bryson, Jr. and Y.-C. Ho. Applied Optimal Control: Optimization,
Estimation, and Control. Wiley, New York, 1975.
[Bro81] R. W. Brockett. Control theory and singular Riemannian geometry. In New
Directions in Applied Mathematics, pages 11–27. Springer-Verlag, New York,
1981.
[Bry99] A. E. Bryson. Dynamic optimization. Addison Wesley, 1999.
[dB78] C. de Boor. A Practical Guide to Splines. Springer-Verlag, 1978.
[FLMR92] M. Fliess, J. Levine, P. Martin, and P. Rouchon. On differentially flat non-
linear systems. Comptes Rendus des Séances de l’Académie des Sciences,
315:619–624, 1992. Serie I.
[Fri04] B. Friedland. Control System Design: An Introduction to State Space Methods.
Dover, New York, 2004.
[GMSW] P. E. Gill, W. Murray, M. A. Saunders, and M. Wright. User’s Guide for
NPSOL 5.0: A Fortran Package for Nonlinear Programming. Systems Opti-
mization Laboratory, Stanford University, Stanford, CA 94305.
[GS01] G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes.
Oxford University Press, third edition, 2001.
[HO01] J. Hauser and H. Osinga. On the geometry of optimal control: The inverted
pendulum example. In American Control Conference, 2001.
[HP87] C. Hargraves and S. Paris. Direct trajectory optimization using nonlinear
programming and collocation. AIAA J. Guidance and Control, 10:338–342,
1987.
[Isi89] A. Isidori. Nonlinear Control Systems. Springer-Verlag, 2nd edition, 1989.
[Jad01] A. Jadbabaie. Nonlinear Receding Horizon Control: A Control Lyapunov
Function Approach. PhD thesis, California Institute of Technology, Control
and Dynamical Systems, 2001.
[JSK99] M. Jankovic, R. Sepulchre, and P. V. Kokotović. CLF based designs with
robustness to dynamic input uncertainties. Systems Control Letters, 37:45–
54, 1999.
[JYH01] A. Jadbabaie, J. Yu, and J. Hauser. Unconstrained receding horizon control of
nonlinear systems. IEEE Transactions on Automatic Control, 46(5):776–783,
2001.
[Kal64] R. E. Kalman. When is a linear control system optimal? J. Basic Engrg.
Trans. ASME Ser. D, 86:51–60, 1964.
[KKK95] M. Krstić, I. Kanellakopoulos, and P. Kokotović. Nonlinear and Adaptive
Control Design. Wiley, 1995.
[KKM91] I. Kanellakopoulos, P. V. Kokotovic, and A. S. Morse. Systematic design of
adaptive controllers for feedback linearizable systems. IEEE Transactions on
Automatic Control, 36(11):1241–1253, 1991.
[KV86] P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification,
and Adaptive Control. Prentice Hall, Inc., 1986.
[LM67] E. B. Lee and L. Markus. Foundations of Optimal Control Theory. Robert
E. Krieger Publishing Company, 1967.
[LS95] F. L. Lewis and V. L. Syrmos. Optimal Control. Wiley, second edition, 1995.
[Lue97] David G. Luenberger. Optimization by Vector Space Methods. Wiley, New
York, 1997.
[MA73] P. J. Moylan and B. D. O. Anderson. Nonlinear regulator theory and an in-
verse optimal control problem. IEEE Trans. on Automatic Control, 18(5):460–
454, 1973.
[MDP94] P. Martin, S. Devasia, and B. Paden. A different look at output tracking—
Control of a VTOL aircraft. Automatica, 32(1):101–107, 1994.
[MFHM05] M. B. Milam, R. Franz, J. E. Hauser, and R. M. Murray. Receding horizon
control of a vectored thrust flight experiment. IEE Proceedings on Control
Theory and Applications, 152(3):340–348, 2005.
[MHJ+ 03] R. M. Murray, J. Hauser, A. Jadbabaie, M. B. Milam, N. Petit, W. B. Dunbar,
and R. Franz. Online control customization via optimization-based control.
In T. Samad and G. Balas, editors, Software-Enabled Control: Information
Technology for Dynamical Systems. IEEE Press, 2003.
[MM99] M. B. Milam and R. M. Murray. A testbed for nonlinear flight control tech-
niques: The Caltech ducted fan. In Proc. IEEE International Conference on
Control and Applications, 1999.
[MMM00] M. B. Milam, K. Mushambi, and R. M. Murray. A computational approach to
real-time trajectory generation for constrained mechanical systems. In Proc.
IEEE Control and Decision Conference, 2000.
[MRRS00] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert. Constrained
model predictive control: Stability and optimality. Automatica, 36(6):789–814,
2000.
[Mur97] R. M. Murray. Nonlinear control of mechanical systems: A Lagrangian per-
spective. Annual Reviews in Control, 21:31–45, 1997.
[PBGM62] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko.
The Mathematical Theory of Optimal Processes. Wiley-Interscience, 1962.
(translated from Russian).
[PND99] J. A. Primbs, V. Nevistić, and J. C. Doyle. Nonlinear optimal control: A
control Lyapunov function and receding horizon perspective. Asian Journal
of Control, 1(1):1–11, 1999.
[QB97] S. J. Qin and T. A. Badgwell. An overview of industrial model predictive
control technology. In J.C. Kantor, C.E. Garcia, and B. Carnahan, editors,
Fifth International Conference on Chemical Process Control, pages 232–256,
1997.
[Rug90] W. J. Rugh. Analytical framework for gain scheduling. In Proc. American
Control Conference, pages 1688–1694, 1990.
[Sey94] H. Seywald. Trajectory optimization based on differential inclusion. J. Guid-
ance, Control and Dynamics, 17(3):480–487, 1994.
[Sha90] J. S. Shamma. Analysis of gain scheduled control for nonlinear plants. IEEE
Transactions on Automatic Control, 35(12):898–907, 1990.
[SJK97] R. Sepulchre, M. Jankovic, and P. V. Kokotović. Constructive Nonlinear
Control. Springer, London, 1997.
[Son83] E. D. Sontag. A Lyapunov-like characterization of asymptotic controllability.
SIAM Journal of Control and Optimization, 21:462–471, 1983.
[vNM98] M. J. van Nieuwstadt and R. M. Murray. Rapid hover to forward flight
transitions for a thrust vectored aircraft. Journal of Guidance, Control, and
Dynamics, 21(1):93–100, 1998.
[vNRM98] M. van Nieuwstadt, M. Rathinam, and R. M. Murray. Differential flatness and
absolute equivalence. SIAM Journal of Control and Optimization, 36(4):1225–
1239, 1998.
Index

algebraic Riccati equation, 2-12
bang-bang control, 2-10
binomial distribution, 4-3
cost function, 2-1
costate variables, 2-6
differential flatness, 1-9
error system, 1-3
events, 4-1
expectation, 4-7
extended Kalman filter, 5-4
extremum, 2-4
feasible trajectory, 2-5
feedforward, 1-4
final cost, 2-5
finite horizon, 2-5
gain scheduling, 1-4
Gaussian distribution, 4-4
Hamiltonian, 2-6
Harrier AV-8B aircraft, 2-15
infinite horizon, 2-5
information filter, 6-7
information matrix, 6-7
innovations process, 5-3
integral cost, 2-5
Kalman filter, recursive form, 5-2
Lagrange multipliers, 2-3
linear quadratic, 2-5
linearization, 1-4
matrix differential equation, 2-11
mean, 4-4, 4-7
noise intensity, 4-15
normal distribution, 4-4
optimal control problem, 2-5
optimal value, 2-1
optimization, 2-1
Ornstein-Uhlenbeck process, 4-14
Poisson distribution, 4-3
probability mass function, 4-2
probability measure, 4-1
probability space, 4-1
random process, 4-8
random variable, 4-2
receding horizon control, 1-3, 3-1
residual random process, 5-2
Riccati ODE, 2-11
sample space, 4-1
standard deviation, 4-5
terminal cost, 2-5
two point boundary value problem, 2-11
uniform distribution, 4-4

Note: Under construction! The indexing for the text has not yet been
done and so this index contains an incomplete and unedited set of terms.
