IMPROVING THE CLOSED-LOOP PERFORMANCE
OF NONLINEAR SYSTEMS
By
Randal W. Beard
A Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
Major Subject: Electrical Engineering
Approved by the
Examining Committee:
CONTENTS

LIST OF FIGURES ...................................................... v
ABSTRACT ............................................................ vi
ACKNOWLEDGMENT ..................................................... vii
1. INTRODUCTION ...................................................... 1
   1.1 Motivation .................................................... 1
   1.2 The Key Ideas ................................................. 4
   1.3 Organization of the Thesis .................................... 6
2. LITERATURE REVIEW ................................................. 7
   2.1 Methods Dependent on the Initial State ........................ 7
   2.2 Linearization about a Nominal Path ............................ 8
   2.3 Perturbation Methods .......................................... 8
   2.4 Regularization of the Cost Function .......................... 10
   2.5 Feedback Linearization ....................................... 11
   2.6 Gain Scheduling .............................................. 11
   2.7 Other Methods ................................................ 12
   2.8 The Successive Approximation Method .......................... 13
   2.9 Conclusion ................................................... 14
3. PROBLEM FORMULATION .............................................. 16
   3.1 Infinite-Time Horizon Problem ................................ 16
   3.2 Finite-Time Horizon Problem .................................. 22
   3.3 The Generalized-Hamilton-Jacobi-Bellman Equation ............. 25
   3.4 Summary ...................................................... 29
4. A NEW ALGORITHM TO IMPROVE CLOSED-LOOP PERFORMANCE ............... 31
   4.1 The Basic Idea of Galerkin's Method .......................... 31
   4.2 Galerkin Projections of the GHJB Equation .................... 32
   4.3 The Combined Algorithm ....................................... 33
   4.4 Implementation Issues ........................................ 38
   4.5 Summary of the Method ........................................ 40
5. CONVERGENCE AND STABILITY ........................................ 42
   5.1 The Convergence Problem ...................................... 42
   5.2 Preliminary Tools ............................................ 43
   5.3 Convergence and Stability Proofs ............................. 61
       5.3.1 Convergence of Successive Approximations ............... 61
       5.3.2 Convergence of Galerkin Approximations ................. 63
       5.3.3 Convergence of the Main Algorithm ...................... 69
   5.4 Summary of the Main Result ................................... 69
6. EXAMPLES AND COMPARISONS ......................................... 70
   6.1 Illustrative Examples ........................................ 70
       6.1.1 Linear System with Non-Quadratic Cost .................. 71
       6.1.2 Nonlinear System with Non-Quadratic Cost ............... 73
       6.1.3 Bilinear System: Non-Smooth Control .................... 73
       6.1.4 Inverted Pendulum ...................................... 75
       6.1.5 Finite-Time Chemical Reactor ........................... 77
       6.1.6 Finite-Time Nonholonomic Example ....................... 81
   6.2 Comparative Examples ......................................... 85
       6.2.1 Comparison with Perturbation Methods ................... 85
       6.2.2 Comparison with Regularization Methods ................. 87
       6.2.3 Comparison with Exact Linearization Method ............. 89
   6.3 Design Example ............................................... 91
       6.3.1 Voltage Regulation of a Power System ................... 91
7. CONCLUSION AND FUTURE WORK ....................................... 97
   7.1 Overview of the Main Results ................................. 97
   7.2 Contributions ................................................ 98
   7.3 Future Work .................................................. 99
   7.4 Conclusion .................................................. 100
REFERENCES ......................................................... 101
APPENDICES ......................................................... 107
A. AUXILIARY RESULTS ............................................... 108
   A.1 Proof of Lemma 5.2.1 ........................................ 108
   A.2 Proof of Lemma 5.2.2 ........................................ 109
   A.3 Proof of Lemma 5.2.3 ........................................ 110
   A.4 Proof of Lemma 5.2.4 ........................................ 110
   A.5 Proof of Corollary 5.2.5 .................................... 112
   A.6 Proof of Lemma 5.2.12 ....................................... 112
   A.7 Proof of Theorem 5.3.1 ...................................... 114
B. GALERKIN'S METHOD ............................................... 115
C. LIST OF SYMBOLS ................................................. 118
LIST OF FIGURES

3.1  Phase flow plotted against lines of constant cost ............. 27
3.2  Successive Approximation Algorithm ............................ 28
4.1  Closed Loop System ............................................ 38
4.2  Algorithm for Improving Feedback Control Laws ................. 41
5.1  Convergence Diagram ........................................... 43
5.2  Gain margins for successive approximations u^(i) .............. 60
6.1  Cost and control for linear system with non-quadratic cost .... 72
6.2  Control vs. state for bilinear system ......................... 74
6.3  Cost vs. position for inverted pendulum ....................... 76
6.4  Time-varying control gains for chemical reactor ............... 78
6.5  Time histories for chemical reactor ........................... 80
6.6  Unicycle: Control Gains ....................................... 82
6.7  Unicycle: Time history for infinite-time gains ................ 83
6.8  Unicycle: Time Histories ...................................... 84
6.9  Comparison with perturbation method ........................... 86
6.10 Comparison with regularization method ......................... 88
6.11 Comparison with feedback linearization ........................ 90
6.12 Generator connected through transmission lines to infinite bus  92
6.13 Power generator, time histories of the states ................. 95
6.14 Power generator, time history of the control .................. 96
ABSTRACT
There are a variety of tools for computing stabilizing feedback control laws for nonlinear systems. The difficulty is that these tools usually do not take into account the performance of the closed-loop system. On the other hand, optimal control theory gives guaranteed closed-loop performance, but the resulting problem is difficult to solve for general nonlinear systems. While there may be many feedback control laws that provide adequate performance, optimal control theory insists on the one control that provides peak performance. In this thesis we bypass the difficulties inherent in the optimal control problem by developing a design algorithm to improve the closed-loop performance of arbitrary stabilizing feedback control laws.

The problem of improving the closed-loop performance of a stabilizing control reduces to solving a first-order, linear partial differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. An interesting fact is that when the process is iterated, the solution to the GHJB equation converges uniformly to the solution of the Hamilton-Jacobi-Bellman equation, which solves the optimal control problem. The main contribution of the thesis is to show that Galerkin's method can be used to find a solution to the GHJB equation, that the resulting control laws are stable, and that when the process is iterated, it still converges to the optimal control. The thesis therefore solves an important problem that has remained open for over thirty years: it shows how to find a uniform approximation to the Hamilton-Jacobi-Bellman equation such that the approximate controls are still stable on a specified set.

The method developed in the thesis is a practical, off-line algorithm that computes closed-form feedback control laws with guaranteed performance. Our algorithm is the first practical method that computes arbitrarily close uniform approximations to the optimal control, while simultaneously guaranteeing closed-loop asymptotic stability.
ACKNOWLEDGMENT
I would first like to acknowledge the assistance and guidance, over the past four years, of my thesis advisor Professor George Saridis. In particular I am grateful that George allowed me the flexibility to search for a problem in which I was really interested, and then insisted that I stick with the problem when it momentarily appeared to be a dead end. I am also indebted to George for his advice to pursue a curriculum loaded with mathematics and theory. I feel that the quality of my graduate education has been greatly enriched by this emphasis.

I would also like to acknowledge the technical assistance and support of Professor John Wen, who co-advised me while George was on sabbatical in Greece. It was John who first suggested that I try to use Galerkin's method to approximate the Generalized-Hamilton-Jacobi-Bellman equation. I am also grateful for the enthusiasm which John brings to his work, and attribute the beginning of my excitement about nonlinear control to him.

I would also like to thank the other members of my thesis committee, Professor Howard Kaufman and Professor Mark Levi, for their technical assistance and encouragement. In addition I would like to thank Professor Joe Chow for his encouragement with regard to applying the methods of the thesis to voltage regulation of power generators.

As a graduate student at RPI I have been fortunate to have friends who are technically competent and intellectually alive. Specifically I would like to acknowledge Fernando Lizarralde, Brian Tibbetts, Pedro Lima, Joe Musto, Lee Wilfinger, Adam Divelbiss, and Sanjeev Seereeram. In particular I would like to thank Fernando for many helpful discussions, questions and critiques concerning my thesis work.

When we moved to Troy four years ago, three other families who attend the same church arrived at the same time for the same purpose. I am deeply grateful for our friendship with James Maxwell, Dan Moore, and John Palmer and their families. We share common values, beliefs, ambitions and dreams. Their association and friendship has made our stay here particularly enjoyable.

I gratefully acknowledge the constant support and encouragement of my wife Andrea. Over the past four years she has willingly lived the lifestyle of a graduate student and worked hard to support our family. She is a wonderful wife and mother, and anything that is good in this thesis would not have been possible without her. I would like to thank my children Laurann and Kaitlyn for (somewhat unwillingly) allowing me to spend so much time at school.

I am fortunate to come from a great family and would like to acknowledge their encouragement and support. My father instilled in me a love and thirst for learning which I hope never dies, while my mother taught me the great truth that to succeed you only need to act as if you already have. It was my grandfather's example and love of science that instilled in me, very early in life, the desire to obtain a Ph.D. I am also fortunate to have seven brothers and sisters who are some of the best friends I have.

Finally I acknowledge the hand of God in my life. Without His help, this thesis and everything else that I have accomplished or hope to accomplish would not be possible.

Last but not least, I gratefully acknowledge the hand by which I've been fed: financial support for this research came from the Center for Intelligent Robotic Systems for Space Exploration (CIRSSE), which was funded by NASA grants NGT 10000 and NAGW-1333.
CHAPTER 1
INTRODUCTION
1.1 Motivation
Control engineering is the study of physical systems that produce a desired response to inputs. The objective of any control design is to shape the input such that the output of the system has certain characteristics. For example, in an automotive cruise control system, the throttle is automatically adjusted to maintain a speed prescribed by the driver. Dynamical systems are composed of inputs, states and outputs. The states correspond to internal dynamics and are usually associated with energy storage devices in the system. The states are dynamically influenced by the inputs to the system. The outputs, which depend statically on the states, are variables in the system that can be measured. When the control, or input to the system, depends on the state (resp. output) of the system, then the control is called a state (resp. output) feedback control. Feedback control laws are desirable because, among other things, they provide robustness with respect to external disturbances, unmodeled dynamics and variations in the physical parameters of the system that is being controlled. In this thesis we concentrate on state feedback control laws, i.e., "control" will always mean "state feedback control."

Physical systems are inherently nonlinear in nature. However, nonlinear systems are difficult to analyze mathematically. The typical approach is to linearize the system around some operating point and to analyze the resulting linear system. If the motion of the system is large, then the linear model of the system becomes invalid. Therefore it is desirable to consider the full nonlinear model of the system. To deal with the inherent mathematical difficulty of nonlinear systems, one of two approaches is typically adopted. The first approach is to utilize specific properties of the system to develop specific control laws that perform well for that system. The drawback is that the results may not be applicable to any other system. The second approach is to develop tools for general classes of nonlinear systems. The drawback is that these tools will usually result in conservative designs, since they do not exploit specific characteristics of the system under design. To attack any particular problem, however, it is necessary to have a number of design tools from which to draw. Since there are relatively few design tools for nonlinear systems, our objective is to develop a feedback synthesis method for a general class of nonlinear systems.

We will address the nonlinear regulator problem, where the control objective is to move the system states to the origin. For the regulator problem, a closed-loop system is stable if the states converge asymptotically to the origin. There is a large literature devoted to the problem of designing stable regulators for nonlinear systems. The most popular tool is Lyapunov's second method, which is important to our arguments and will be reviewed in chapter 3. To use Lyapunov's method, a designer first proposes a control and then tries to find a Lyapunov function for
not provide a necessary and sufficient condition for optimality of a feedback control, whereas the HJB equation still provides a sufficient condition for optimality of a feedback control. This paper is a non-technical presentation of some other papers by Vinter and Lewis, namely (Vinter and Lewis, 1978a; Vinter and Lewis, 1978b).

Standard approaches to finding the open-loop optimal control via a two-point boundary value problem can be found in (Sage and White III, 1977).
2.2 Linearization about a Nominal Path
Feedback synthesis around a nominal path is a method that is halfway between open-loop control and full feedback control. The basic idea is to assume that the trajectory of the system is always contained in a small region about a nominal trajectory. The method is not completely dependent on the initial state, since the initial state is only required to lie in some small ball around the nominal x_0. A typical approach is to linearize the system equations around the nominal path. Examples can be found in (Merriam III, 1964).

Another approach is to partition the state space into regions where the system equations are approximately linear. Knowledge of the nominal trajectory enables the number of partitions to be kept reasonably small. An example of this approach is (White and Cook, 1973).

In (Jamshidi, 1976) the control is separated into a linear and a nonlinear part. The linear part can be realized by direct feedback, while the nonlinear portion depends on the nominal initial state x_0. If the trajectory does not deviate too far from the nominal path, then the linear portion of the control will tend to push the trajectory back to the nominal. The difficulty is that it is hard, if not impossible, to estimate the region of attraction of the linear portion of the control around the nominal path.

Open-loop control and feedback around a nominal path result in control systems that are not robust with respect to disturbances and modeling errors. They are generally viewed as inferior to feedback control laws, but a review has been included here for completeness. In the remainder of this chapter we will focus on methods for synthesizing feedback control laws.
2.3 Perturbation Methods
Feedback synthesis by the use of perturbation methods addresses the problem of synthesizing a suboptimal feedback control for the class of nonlinear systems that are perturbations of a linear system, i.e.,

    \dot{x} = Ax + Bu + \epsilon f(x),    (2.1)

where \epsilon is assumed to be small, and the performance index is assumed to be quadratic. The first work to investigate this class of systems is (Al'brekht, 1961), where it is assumed that f(x) can be expanded in a power series around x = 0, and that the system (A, B) is stabilizable. Al'brekht shows that, under these conditions, the optimal cost and control can be expanded as power series around x = 0, and that a first-order stabilizing control is found by standard LQR theory. Higher order terms are obtained by solving certain linear partial differential equations. These higher order terms are then added to the control to improve the performance of the system. It is shown that these higher order terms do not destroy the stability of the system. The paper (Lukes, 1969) presents a definitive study of analytic, infinite-time regulators. A similar treatment of the finite-time problem is given in (Willemstein, 1977). In (Werner and Cruz, 1968), it is shown that an nth order Taylor series expansion of the optimal control gives a (2n+1)th order approximation of the performance index.

In (Garrard, 1969; Garrard, 1977; Garrard et al., 1967; Garrard and Jordan, 1977) the class of problems described by equation (2.1) is again considered. In these papers it is shown that the associated Hamilton-Jacobi-Bellman equation can be expanded as a power series around \epsilon = 0. The cost function is again assumed to be quadratic. The result is a series of linear partial differential equations that can be solved successively. The first equation in this series reduces to the standard Riccati equation, and it is shown that the second equation reduces to an equation with an analytic solution. The higher order equations, however, cannot be readily solved and so they are ignored. The method, therefore, synthesizes controls that include linear and cubic functions of the state. To obtain higher order terms in the control, one would have to use a technique such as the Ritz-Galerkin method used in this thesis.

An approach similar to Garrard's is presented in (Nishikawa et al., 1971). The difference is that Pontryagin's maximum principle is used to expand the co-state equations in a power series around \epsilon = 0. The problem is again reduced to solving a series of linear PDEs. Again the linear and cubic terms in the control are readily computed, but higher order terms become difficult to obtain.

The papers (Garrard, 1969) and (Nishikawa et al., 1971) consider a similar example, so we can compare these methods to the synthesis method obtained in this thesis. The results are given in section 6.2.1. In that section it is shown that for the particular example considered, if our method is used to compute a control with linear and cubic terms, the control performs slightly better than the control obtained in (Garrard, 1969) and slightly worse than the control derived in (Nishikawa et al., 1971). The advantage of our method is that it is relatively easy to compute control terms of higher order, whereas it is not possible to do so with the methods of (Garrard, 1969) and (Nishikawa et al., 1971).

The problem with these methods is that they are inherently tied to the convergence of a power series for which it is difficult, if not impossible, to estimate the region of convergence. Consequently, it is equally difficult to estimate the stability region of a control calculated from a truncated power series. For bilinear systems, however, it appears that the region of attraction can be estimated, as reported in (Cebuhar and Costanza, 1984).
In (Halme and Hamalainen, 1975), the authors present a method that is similar to perturbation methods. The basic idea is to represent the integral curve of the solution (via Green's functions) as a basic linear operator and then invert the operator. The method has several advantages over perturbation methods. Namely, it is possible to estimate the region of convergence of the power series that makes up the final control; therefore it is possible to estimate the stability region of a truncated control law. The method is also extended to finite-time problems.

They also address the problem of constructing nonlinear observers and then using them for output feedback.
Another approach that is similar to gain scheduling is presented in (Cloutier et al., 1996). The nonlinear system is first parameterized by putting it into the form

    \dot{x} = A(x)x + g(x)u.

A state-dependent Riccati equation is then constructed at each x:

    A^T(x)P(x) + P(x)A(x) - P(x)g(x)R^{-1}(x)g^T(x)P(x) + Q(x) = 0.    (2.3)

For certain systems this equation can be solved explicitly, in which case the method produces a feedback control law. There are two difficulties. First, there are an infinite number of parameterizations such that

    f(x) = A(x)x.

However, for each x there is exactly one parameterization corresponding to the optimal control, and that parameterization depends on x. The second difficulty is that once a parameterization is chosen, equation (2.3) must be solved for each x. When this cannot be done explicitly, the equation must be solved at a discrete number of points and interpolated in between, i.e., gain scheduling. The disadvantage is that it is hard to judge the sub-optimality resulting from any particular parameterization.
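To make the procedure concrete, the following minimal sketch solves equation (2.3) pointwise for an assumed two-state system. The particular factorization A(x), the weights Q and R, and the use of scipy are illustrative assumptions, not details from (Cloutier et al., 1996).

import numpy as np
from scipy.linalg import solve_continuous_are

def sdre_control(x):
    # One (of infinitely many) parameterizations f(x) = A(x)x for the
    # assumed drift f(x) = (-x1 + x1*x2, -x2); A(x) depends on the state.
    A = np.array([[-1.0 + x[1], 0.0],
                  [0.0,        -1.0]])
    g = np.array([[0.0], [1.0]])     # constant input map g(x)
    Q = np.eye(2)                    # state penalty Q(x)
    R = np.array([[1.0]])            # control penalty R(x)
    # Solve A^T P + P A - P g R^{-1} g^T P + Q = 0 at this x (equation (2.3)).
    P = solve_continuous_are(A, g, Q, R)
    return -np.linalg.solve(R, g.T @ P) @ x   # u = -R^{-1} g^T P(x) x

print(sdre_control(np.array([0.5, -0.2])))

In practice the Riccati solve would be repeated over a grid of states and interpolated, which is exactly the gain-scheduling step criticized above.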
2.7 Other Methods
Before discussing the method of successive approximation, which is the category to which the synthesis method derived in this thesis belongs, we discuss several other techniques that do not seem to fall into any of the categories above.

In (Goh, 1993) a feedback control is synthesized by training a neural network to approximate the solution of the Hamilton-Jacobi-Bellman equation. The difficulty is that stability cannot be guaranteed. The neural network is trained by computing open-loop controls for various points in the state space.

Another synthesis technique is given in (Van Trees, 1962). In this treatise the optimal control is realized as an infinite Volterra series. The method is similar to feedback linearization and is limited to systems that can be "inverted," in that the nonlinearities can be eliminated by feedback. A method similar to Van Trees's was introduced by Wiener in (Wiener, 1949; Wiener, 1958), where the system equations and input/output sequences are represented by polynomial functions.

In (Chow and Kokotovic, 1981) the authors consider the class of nonlinear dynamics that can be divided into subsystems of slow and fast dynamics, where the fast dynamics are governed by linear equations. A two-stage Lyapunov-Bellman design is proposed, where the control is shown to be stable by Lyapunov techniques and "near-optimal" by solving the Bellman equation.
2.9 Conclusion
In regard to finding approximate solutions to the Hamilton-Jacobi-Bellman (HJB) equation, an interesting quote is found in (Merriam III, 1964):

    Pertinent methods of approximation must satisfy two properties. First, the approximation must converge uniformly to the optimum control system with increasing complexity of the approximation. Second, when the approximation is truncated at any degree of complexity, the resulting control system must be stable without unwanted limit cycles. At this time (1964), no series or other known form of approximation possesses both these required properties.

As the literature review has shown, a similar statement could also be made prior to this thesis, but not after. There is a recognized need to find methods of approximating the HJB equation such that the resulting controls are guaranteed to be stable. This need is enhanced by the recent interest in the nonlinear H_\infty control problem (cf. (van der Schaft, 1992; Wise and Sedwick, 1994; Ball et al., 1993)). It is shown in (van der Schaft, 1992) that the nonlinear H_\infty control problem reduces to the solution of the Hamilton-Jacobi-Isaacs (HJI) equation. The HJI equation is identical in form to the HJB equation, and the method presented in this thesis is equally applicable to the HJI equation.
CHAPTER 3
PROBLEM FORMULATION
In this chapter we give a mathematical formulation of the problem addressed in this thesis. We will address two related control problems, namely:

- The infinite-time horizon problem, where the system equations are assumed to be autonomous and the optimization index is over an infinite time interval, and

- The finite-time horizon problem, where the system equations can be time-varying and the optimization index is over a finite time interval.

The first two sections of this chapter introduce these problems and develop the framework in which we address them. In section 3.3, the optimal control problem is generalized to the problem of evaluating and improving the performance of stabilizing controls. We show that the performance of an arbitrary stabilizing control is given by the solution to a linear partial differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. We also show how the solution to this equation can be used to construct a control law that improves the closed-loop performance. We show that the Hamilton-Jacobi-Bellman (HJB) equation is a special case of the GHJB equation, and that the GHJB equation forms a contraction on the set of admissible controls, with the fixed point corresponding to the optimal control.

3.1 Infinite-Time Horizon Problem

We first describe the infinite-time horizon problem. The infinite-time problem considers the class of nonlinear systems described by ordinary differential equations that are affine in the control:

    \dot{x} = f(x) + g(x)u(x),    (3.1)

where x \in \mathbb{R}^n, f : \mathbb{R}^n \to \mathbb{R}^n, g : \mathbb{R}^n \to \mathbb{R}^{n \times m} and u : \mathbb{R}^n \to \mathbb{R}^m.

To ensure that the control problem is well posed, we assume that f and g are Lipschitz continuous on a set \Omega \supseteq B(0), where B(x) is a ball around x. We also assume that f(0) = 0.

Remark 3.1.1 In most of the derivations in this thesis it is not necessary to assume that f and g are autonomous for the infinite-time horizon problem. However, if f and g are not autonomous, the main algorithm derived in chapter 4 will require the solution of a time-varying ordinary differential equation over the interval [0, \infty). Moreover, this equation must be solved backward in time. Since this is not computationally feasible, we restrict our attention to autonomous f and g, in which case the algorithm becomes particularly simple.
Define \varphi(t; x_0, u) to be the solution at time t of equation (3.1) with initial condition x_0 and control u. To simplify the notation we write \varphi(t) \equiv \varphi(t; x_0, u) when x_0 and u are understood.

The first requirement for any closed-loop system is that the control stabilize the system. Consequently we define the concept of a stabilizing control. In this thesis, stabilizing control will always mean that the system is asymptotically stable in the sense of Lyapunov (cf. (Khalil, 1992, p. 98)).

Definition 3.1.2 Stabilizing Controls.
For the infinite-time horizon problem, the control u : \mathbb{R}^n \to \mathbb{R}^m is said to stabilize system (3.1) around 0 on \Omega \subseteq \mathbb{R}^n if

- f(0) + g(0)u(0) = 0,
- for each \epsilon > 0, there exists \delta > 0 such that \|x_0\| < \delta \implies \|\varphi(t; x_0, u)\| < \epsilon, \forall t \geq 0, \forall x_0 \in \Omega, and
- \|x_0\| < \delta \implies \lim_{t \to \infty} \|\varphi(t; x_0, u)\| = 0, \forall x_0 \in \Omega.
Throughout the thesis we will frequently use concepts from Lyapunov theory, so we briefly review the main results (cf. (Khalil, 1992; Vidyasagar, 1993)). A function V : \mathbb{R}^n \to \mathbb{R} is positive (negative) definite on \Omega if V(x) > 0 (< 0) for all x \in \Omega \setminus \{0\} and V(0) = 0. A function V : \mathbb{R}^n \to \mathbb{R} is a Lyapunov function on \Omega, for the system (3.1), if V is continuously differentiable on \Omega, and if both V and

    -\dot{V} \triangleq -\frac{\partial V^T}{\partial x}(f + gu)

are positive definite on \Omega. The main result of Lyapunov theory is that the existence of such a V implies that the system is asymptotically stable on \Omega.
A function l : \mathbb{R}^n \to \mathbb{R} is monotonically increasing on \Omega if \|x\| < \|y\| \implies l(x) < l(y). In the optimal control literature, the standard performance measure is an integral of a function of the state and control trajectories:

    J(x_0; u) = \int_0^\infty l(\varphi(t)) + \|u(\varphi(t))\|_R^2 \, dt,    (3.2)

where l : \mathbb{R}^n \to \mathbb{R} is a positive definite, monotonically increasing function on \Omega, R \in \mathbb{R}^{m \times m} is a symmetric, positive definite matrix, \|u\|_R^2 = u^T R u, and x_0 \in \Omega \subseteq \mathbb{R}^n. l is called the state penalty function and \|u\|_R^2 the control penalty function. Typically l is a quadratic weighting of the states, i.e., l = x^T Q x, where Q is a positive definite matrix.
Remark 3.1.4 In linear quadratic optimal control, Q is only required to be positive semi-definite, with (Q^{1/2}, A) detectable. For simplicity we require that l be positive definite, which, together with the monotonically increasing property, implies that the system is observable through l. This assumption could obviously be relaxed. However, we see this as an unnecessary complication, since l is a design parameter that can always be chosen to satisfy the observability condition.
For equation (3.2) to give any indication of the performance of the system, the integral must converge. Unfortunately, stability of f + gu is not sufficient for the integral to be finite.

Simple Example. The solution to the system

    \dot{x} = xu, \quad u = -|x|,

is

    \varphi(t) = \frac{x_0}{1 + |x_0| t}.

The control u asymptotically stabilizes the system, but if l(x) = |x| then

    \int_0^\infty l(\varphi(t)) \, dt = \int_0^\infty \frac{|x_0|}{1 + |x_0| t} \, dt = \int_1^\infty \frac{dv}{v} = \infty.

However, if l(x) = |x|^p with p > 1, then the integral is finite.

This necessitates restricting the stabilizing controls to those controls that render the cost function (3.2) finite with respect to a certain penalty on the states.
Definition 3.1.5 Admissible Controls.
Given the system (f, g). For the infinite-time horizon problem, a control u : \mathbb{R}^n \to \mathbb{R}^m is admissible with respect to the state penalty function l on \Omega, written u \in A_l(\Omega), if

- u is continuously differentiable on \Omega,
- u(0) = 0,
- u stabilizes (f, g) on \Omega, and
- \int_0^\infty l(\varphi(t)) + \|u(\varphi(t))\|_R^2 \, dt < \infty, \forall x_0 \in \Omega.
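The last condition is easy to test numerically. The following hedged sketch integrates the cost along the closed-loop trajectory of the simple example above (\dot{x} = xu, u = -|x|); the horizon values and tolerances are arbitrary experimental choices, not part of the definition.

import numpy as np
from scipy.integrate import solve_ivp

def cost(l, x0, R=1.0, T=1e4):
    # Integrate the state and the running cost l(phi) + R u(phi)^2 together.
    def rhs(t, z):
        x = z[0]
        u = -abs(x)
        return [x * u, l(x) + R * u**2]
    sol = solve_ivp(rhs, [0.0, T], [x0, 0.0], rtol=1e-8, atol=1e-10)
    return sol.y[1, -1]

# With l(x) = |x| the cost grows like log T (not admissible), while with
# l(x) = x^2 it converges, matching the simple example above.
for T in (1e2, 1e4, 1e6):
    print(T, cost(abs, 1.0, T=T), cost(lambda x: x**2, 1.0, T=T))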
We can show a simple result that links the idea of an admissible control with Lyapunov theory.

Lemma 3.1.6 Given the system (f, g) and a continuously differentiable, positive definite state penalty function l, if u \in A_l(\Omega) then there exists a Lyapunov function for the system on \Omega.

Proof: We will show that the function

    V(x) = \int_0^\infty \left[ l(\varphi(t; x, u)) + \|u(\varphi(t; x, u))\|_R^2 \right] dt

is a Lyapunov function on \Omega. Since l is continuously differentiable and u is continuously differentiable on \Omega, the Lipschitz assumption on (f, g) guarantees that

    \frac{\partial V}{\partial x} = \int_0^\infty \frac{\partial \varphi^T}{\partial x} \left[ \frac{\partial l}{\partial \varphi}(\varphi(t; x, u)) + 2 \frac{\partial u^T}{\partial \varphi} R u(\varphi(t; x, u)) \right] dt

is continuous in x, i.e., V is continuously differentiable. Since \dot{V} = -l - \|u\|_R^2 along the trajectories of the system, the positive definiteness of l and \|u\|_R^2 guarantees that V and -\dot{V} are also positive definite.
We would like to make the converse statement: if u stabilizes (f, g) on \Omega, then there exists a continuously differentiable, positive definite state penalty function l : \Omega \to \mathbb{R} such that u \in A_l(\Omega). Unfortunately, this is not true. The reason is that the control penalty function is restricted to be a quadratic function of u, so there are limitations on the type of asymptotically stabilizing controls that can be admissible. For example, if the control is a linear function of x, say u(x) = -x, and the system decays more slowly than 1/\sqrt{t}, then \|u(\varphi(t))\|^2 \geq 1/t and the integral \int_0^\infty \|u\|^2 \, dt is not finite. Hence we are restricted to systems that decay sufficiently fast. We can, however, state the following result.
Lemma 3.1.7 If u stabilizes the system (f, g) on \Omega, then there exists a continuously differentiable, positive definite state penalty function l : \Omega \to \mathbb{R} such that u \in A_l(\Omega) if and only if u has finite energy, i.e.,

    \int_0^\infty \|u(\varphi(t))\|_R^2 \, dt < \infty, \quad \forall x_0 \in \Omega.

Proof: The necessity is obvious. For the sufficiency, the assumption guarantees that \int_0^\infty \|u(\varphi(t))\|_R^2 \, dt < \infty, so it is sufficient to construct l such that \int_0^\infty l(\varphi(t)) \, dt < \infty. Define the set

    \Omega(t) \triangleq \{ x \in \mathbb{R}^n : x = \varphi(t; x_0, u), \; x_0 \in \Omega \}.

From the sets \Omega(t) a (possibly discontinuous) penalty \hat{l} can be constructed whose integral along every trajectory is bounded by a finite constant M < \infty. We employ the technique of mollification (cf. (Jones, 1993)) to obtain a continuously differentiable l from \hat{l}. Since the effect of mollification under the integral can be made arbitrarily small, the proof is complete.
Remark 3.1.8 In general it is difficult to derive specific conditions under which the control can be made to have finite energy. However, an important case when this is always true is when the linearization of (f, g) at x = 0, i.e.,

    \left( \frac{\partial f}{\partial x}(0), \; g(0) \right),

is stabilizable. In this case the origin can be made exponentially stable by an appropriate linear state feedback. Therefore there exists a nonlinear state feedback u such that the real parts of the eigenvalues of \frac{\partial (f + gu)}{\partial x}(0) are all negative, i.e., the origin is exponentially stable. Therefore the integral satisfies

    \int_0^\infty \|\varphi(t)\|_Q^2 + \|u(\varphi(t))\|_R^2 \, dt
      \leq \int_0^{\hat{t}} \|\varphi(t)\|_Q^2 + \|u(\varphi(t))\|_R^2 \, dt
         + \int_{\hat{t}}^\infty C_1 e^{-\gamma_1 t} + C_2 e^{-\gamma_2 t} \, dt
      < \infty,

for all Q \geq 0 and for all R > 0, where \hat{t} is the last time that \varphi(t) enters the region of state space where the linearization of the closed-loop system is valid. This example shows why it is not necessary to be concerned about the admissibility of linear systems with quadratic penalty functions: if the linear system is stabilizable, then any stabilizing control is admissible with respect to any quadratic state penalty function.
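The test suggested by this remark is easy to carry out numerically. The sketch below checks stabilizability of an assumed linearization by solving a Riccati equation and inspecting the closed-loop eigenvalues; the system matrices are illustrative, not taken from the text.

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [1.0, -0.1]])   # df/dx at x = 0 (assumed)
B = np.array([[0.0], [1.0]])              # g(0) (assumed)
P = solve_continuous_are(A, B, np.eye(2), np.eye(1))
K = np.linalg.solve(np.eye(1), B.T @ P)   # K = R^{-1} B^T P
print(np.linalg.eigvals(A - B @ K))       # all real parts negative =>
                                          # exponentially stable origin, so the
                                          # integral in remark 3.1.8 is finite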
In the above discussion, the specification of the set \Omega has been somewhat arbitrary. However, \Omega can be made as large as the stability region of u.

Lemma 3.1.9 Given a system (f, g). If u \in A_l(\Omega) and the region of stability of the system \dot{x} = f + gu is \Lambda \subseteq \mathbb{R}^n, where \Omega \subset \Lambda, then u \in A_l(\Lambda).

Proof: Since u is asymptotically stabilizing on \Lambda, there exists a t_0 < \infty such that

    t > t_0 \implies \{ y : y = \varphi(t; x, u), \; x \in \Lambda \} \subset \Omega.

Then for all x \in \Lambda,

    \int_0^\infty l(\varphi(\tau; x)) + \|u(\varphi(\tau; x))\|_R^2 \, d\tau
      = \int_0^{t_0} l(\varphi(\tau; x)) + \|u(\varphi(\tau; x))\|_R^2 \, d\tau
      + \int_{t_0}^\infty l(\varphi(\tau; \varphi(t_0; x))) + \|u(\varphi(\tau; \varphi(t_0; x)))\|_R^2 \, d\tau.

The first integral is finite since it is over a finite time period, and the second integral is finite since \varphi(t_0; x) \in \Omega.

In general it is difficult to find the largest stability region \Lambda of u. However, it is usually possible to find a region \Omega, or to verify (by Lyapunov methods) that a subset \Omega is contained in \Lambda. Since our method will require some set \Omega over which u is stabilizing, but does not require the entire stability region \Lambda, we will retain the notation u \in A_l(\Omega).

We will assume throughout the thesis that system (3.1) is controllable on \Omega, in the sense that for an appropriate choice of l there exists at least one admissible control u \in A_l(\Omega).
The performance of an arbitrary admissible control u^{(0)} \in A_l(\Omega) can be expressed in infinitesimal form by a linear partial differential equation called the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation. An interesting fact is that the solution to the GHJB equation can be used to find a feedback control law that improves the closed-loop performance of u^{(0)}. The GHJB equation is central to the results in the thesis and will be the subject of section 3.3. It will also be shown that if the process is iterated, then the solution to the GHJB equation converges to the solution of the Hamilton-Jacobi-Bellman (HJB) equation.
\varphi(t; \hat{\Omega}) is a compact set in \mathbb{R}^n.

A system that does not exhibit finite escape in the interval [t_0, t_f] for any initial condition has a bounded response. The converse, however, is not true.

Example. Consider the system

    \dot{x} = x^2.

The integral curves of this system are given by

    \varphi(t; 0, x_0) = \frac{x_0}{1 - x_0 t},

which diverges to infinity at t = 1/x_0 if x_0 > 0, but is bounded if x_0 \leq 0. Therefore this system has a bounded response for any compact set contained in the negative half-line.

... l : \mathbb{R}^n \to \mathbb{R} is a positive definite, monotonically increasing function on \Omega,
We can interpret the GHJB equation geometrically. Figure 3.1(a) shows the phase portrait of a two-dimensional, infinite-time system. The dotted lines represent the trajectories of the system. The cost at any point x is computed by integrating equation (3.2) along the unique trajectory of the system passing through x. The solid lines in figure 3.1(a) are constant contours of the cost function. This geometrical interpretation suggests an intuitive idea for improving the cost of the system. If we fix the constant cost contours and minimize the action of the system with respect to these contours, then the cost of the system will be reduced. For example, the system in figure 3.1(b) will have lower cost than the system in figure 3.1(a). In both the infinite-time and finite-time problems, the action of the system is given by the Hamiltonian (cf. (Arnold, 1989, p. 248))

    H\left(t, x, u, \frac{\partial V}{\partial x}\right) \triangleq \frac{\partial V^T}{\partial x}(f + gu) + l + \|u\|_R^2.

To improve the performance of an arbitrary control u^{(0)}, we minimize the Hamiltonian, i.e.,

    u^{(1)} = \arg\min_{u \in A_{l,s}(D)} \left\{ \frac{\partial V^{(0)T}}{\partial x}(f + gu) + l + \|u\|_R^2 \right\}
            = -\frac{1}{2} R^{-1} g^T \frac{\partial V^{(0)}}{\partial x}.    (3.8)
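Since the Hamiltonian is quadratic in u, the minimizer in (3.8) can be verified directly; in the notation above,

    \frac{\partial H}{\partial u} = g^T \frac{\partial V^{(0)}}{\partial x} + 2Ru = 0
    \quad \Longrightarrow \quad
    u^{(1)} = -\frac{1}{2} R^{-1} g^T \frac{\partial V^{(0)}}{\partial x},
    \qquad
    \frac{\partial^2 H}{\partial u^2} = 2R > 0,

so the stationary point is indeed the unique minimum.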
The cost of u^{(1)} is given by the solution of the equation GHJB(V^{(1)}, u^{(1)}) = 0. In (Saridis and Lee, 1979) it is shown that V^{(1)}(t, x) \leq V^{(0)}(t, x) for each (t, x) \in D. It is also easy to show that the iteration does not get stuck in local minima, i.e., if for a fixed i, V^{(i+1)}(t, x) = V^{(i)}(t, x), then V^{(i)}(t, x) = V^*(t, x). This fact allows us to give a simple derivation of the Hamilton-Jacobi-Bellman (HJB) equation. Assume that a unique optimal control u^* exists and is an admissible control. Then the optimal cost is given by the solution to the GHJB equation

    \frac{\partial V^*}{\partial t} + \frac{\partial V^{*T}}{\partial x}(f + gu^*) + l + \|u^*\|_R^2 = 0.
(Figure 3.1: Phase flow plotted against lines of constant cost; (a) the original closed-loop system, (b) the improved system.)

Minimizing the Hamiltonian associated with V^* yields \hat{u} = -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x}, which by the fixed-point property above must equal the
optimal control. Plugging \hat{u} into the GHJB equation gives the Hamilton-Jacobi-Bellman equation

    HJB(V^*) = GHJB\left( V^*, \; -\frac{1}{2} R^{-1} g^T \frac{\partial V^*}{\partial x} \right) = 0,

i.e.,

    HJB(V^*) = \frac{\partial V^*}{\partial t} + \frac{\partial V^{*T}}{\partial x} f - \frac{1}{4} \frac{\partial V^{*T}}{\partial x} g R^{-1} g^T \frac{\partial V^*}{\partial x} + l = 0.    (3.9)
The derivation shows the complexity inherent in the HJB equation. Referring to figure 3.1, instead of fixing the system or the cost we allow both to depend on each other; hence the nonlinearity in the equation. The derivation also shows that the GHJB equation is a generalization of the HJB equation. Interestingly, the GHJB equation defines a contraction on the set of admissible controls. This fact will be proved in chapter 5, but for now we give the successive approximation algorithm implied by this fact.

Algorithm 3.3.2 Successive Approximation.

Initial Step. Given an initial control law u^{(0)}(t, x) that is admissible on D, the performance of u^{(0)} on D is given by the unique solution V^{(0)}(t, x) to

    GHJB(V^{(0)}, u^{(0)})(t, x) = 0.

Set i = 1.
Iterative Step. Compute the improved control

    u^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T \frac{\partial V^{(i-1)}}{\partial x}(t, x),

and let V^{(i)}(t, x) be the unique solution to GHJB(V^{(i)}, u^{(i)})(t, x) = 0. Set i = i + 1 and repeat.

(Figure 3.2: Successive Approximation Algorithm; flowchart of the loop above: given u^{(0)}, solve \nabla^T V^{(i)} (f(x) + g(x) u^{(i)}) + l(x) + \|u^{(i)}\|_R^2 = 0, update the control, and increment i.)
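The structure of the iteration can be summarized in a few lines of code. The sketch below is schematic only: solve_ghjb stands in for any method of solving the GHJB equation (chapter 4 constructs one via Galerkin's method), and the stopping tolerance is an assumption.

import numpy as np

def successive_approximation(u0, solve_ghjb, improve, tol=1e-8, max_iter=50):
    """Iterate: solve GHJB(V_i, u_i) = 0, then set u_{i+1} = improve(V_i).

    solve_ghjb maps a control to a finite representation of its cost V
    (e.g. Galerkin coefficients, chapter 4); improve implements eq. (3.8).
    """
    u, v_prev = u0, None
    for i in range(max_iter):
        v = solve_ghjb(u)          # coefficients representing V^{(i)}
        u = improve(v)             # u^{(i+1)} = -1/2 R^{-1} g^T dV^{(i)}/dx
        if v_prev is not None and np.linalg.norm(v - v_prev) < tol:
            break                  # fixed point <=> V^{(i)} solves the HJB equation
        v_prev = v
    return u, v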
In (Glad, 1985; Glad, 1987; Tsitsiklis and Athans, 1984) it was shown that the optimal control u^* is robust in the sense that it has infinite gain margin and a 50% gain reduction margin. In (Saridis and Balaram, 1986) it was shown that u^{(i)} has similar robustness properties. Extended versions of these results appear in chapter 5 and Appendix A.

In this section we have derived the GHJB equation and illustrated its importance. The GHJB equation answers three fundamental questions. First, its solution gives a compact representation of the performance of any admissible control. Second, its solution allows us to improve the performance of the original control. Finally, by iterating the process we converge uniformly to the solution of the HJB equation. The GHJB equation was also shown to be a generalization of the HJB equation.
3.4 Summary
For convenience, we conclude this chapter by succinctly summarizing the problem and our main assumptions. Given the system

    \dot{x} = f(t, x) + g(t, x)u(t, x),

with cost functional

    J(t_0, x_0) = s(\varphi(T; t_0, x_0, u)) + \int_{t_0}^{T} l(t, \varphi(t; t_0, x_0, u)) + \|u(t, \varphi(t; t_0, x_0, u))\|_R^2 \, dt,

we make the following assumptions.

A3.1 f and g are Lipschitz continuous on \Omega, and f(t, 0) = 0, \forall t \in [t_0, T].

A3.2 l and s are continuous, positive definite, monotonically increasing functions on D and \Omega respectively. R is a symmetric positive definite matrix.

A3.3 System (3.3) is controllable on D, in that there exists an admissible control u^{(0)} \in A(D) on D.

A3.4 In the infinite-time horizon case, the system equations f, g and l, as well as the initial control u^{(0)}, are independent of time.

Given an initial stabilizing control u^{(0)} \in A, the performance of this control is given by the solution to the GHJB equation

    \frac{\partial V^{(0)}}{\partial t} + \frac{\partial V^{(0)T}}{\partial x}\left[ f(t, x) + g(t, x)u^{(0)}(t, x) \right] + l(t, x) + \|u^{(0)}(t, x)\|_R^2 = 0,
    V^{(0)}(T, x) = s(x).

The solution V^{(0)} to the GHJB equation can be used to construct a feedback control law that improves the closed-loop performance of the system. The improved control
    b_n = \left\langle l + \|u^{(i)}\|_R^2, \; \phi_n \right\rangle_\Omega,    (4.8)
    P_n = \langle s, \phi_n \rangle_\Omega.    (4.9)

For the infinite-time horizon, the equation reduces to

    A c^{(i)} + b = 0.

We will now substitute this approximation into algorithm 3.3.2 to obtain a new algorithm for improving the closed-loop performance of admissible controls for nonlinear systems.
4.3 The Combined Algorithm
To simplify the notation throughout the remainder of the thesis, we define

    \Phi_N(x) \triangleq (\phi_1(x), \ldots, \phi_N(x))^T,    (4.10)

and let \nabla\Phi_N be the Jacobian of \Phi_N. If \Lambda : \mathbb{R}^N \to \mathbb{R} is a real-valued function, then we define the notation

    \langle \Lambda, \Phi_N \rangle_\Omega \triangleq \left( \langle \Lambda, \phi_1 \rangle_\Omega, \ldots, \langle \Lambda, \phi_N \rangle_\Omega \right)^T.

If \Lambda : \mathbb{R}^N \to \mathbb{R}^N is a vector-valued function, then we define the notation

    \langle \Lambda, \Phi_N \rangle_\Omega \triangleq
    \begin{pmatrix}
      \langle \Lambda_1, \phi_1 \rangle_\Omega & \cdots & \langle \Lambda_N, \phi_1 \rangle_\Omega \\
      \vdots & & \vdots \\
      \langle \Lambda_1, \phi_N \rangle_\Omega & \cdots & \langle \Lambda_N, \phi_N \rangle_\Omega
    \end{pmatrix}.

The key to the notation is that the jth row corresponds to integration weighted by \phi_j.

Using this notation we can write the Galerkin projection of the GHJB equation in the compact form

    \langle GHJB(V, u), \Phi_N \rangle_\Omega = 0,
    \langle V(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

We will also use boldface letters to denote the coefficients in the Galerkin approximation method, i.e.,

    c_N^{(i)} \triangleq \left( c_1^{(i)}, \ldots, c_N^{(i)} \right)^T.    (4.11)
We now trace the steps of algorithm 3.3.2, substituting the approximation of the previous section for the GHJB equation. Given an initial control u^{(0)}, we can compute an approximation to its cost V_N^{(0)} = c_N^{(0)T} \Phi_N, where c_N^{(0)} is the solution to

    \dot{c}_N^{(0)}(t) + M^{-1} A(t) c_N^{(0)}(t) + M^{-1} b(t) = 0,
    c_N^{(0)}(t_f) = M^{-1} P,

where M \triangleq \langle \Phi_N, \Phi_N \rangle_\Omega and

    A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N g u^{(0)}, \Phi_N \rangle_\Omega,
    b = \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega,
    P = \langle s, \Phi_N \rangle_\Omega.

Setting i = 1, we compute an updated control based on the approximation V_N^{(i-1)} rather than the actual cost V^{(i-1)}:

    u_N^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \frac{\partial V_N^{(i-1)}}{\partial x}(t, x)
                    = -\frac{1}{2} R^{-1} g^T(t, x) \nabla\Phi_N^T(x) \, c_N^{(i-1)}(t).    (4.12)

When (4.12) is substituted into (4.7) and (4.8) we obtain the approximation

    V_N^{(i)} = c_N^{(i)T} \Phi_N,

where c_N^{(i)} is the solution to

    \dot{c}_N^{(i)}(t) + M^{-1} A(t) c_N^{(i)}(t) + M^{-1} b(t) = 0,
    c_N^{(i)}(t_f) = M^{-1} P,

and

    A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} \sum_{k=1}^N c_k^{(i-1)} \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega,
    b = \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} \left( \sum_{k=1}^N c_k^{(i-1)} \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \right) c_N^{(i-1)},
    P = \langle s, \Phi_N \rangle_\Omega.

For a finite-time horizon, we obtain the following algorithm.
Algorithm 4.3.1 Dual Approximation for Finite-Time Horizon.

Initial Step. Compute the integrals

    \langle l, \Phi_N \rangle_\Omega, \quad \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega, \quad
    \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \; (k = 1, \ldots, N),
    \langle \nabla\Phi_N g u^{(0)}, \Phi_N \rangle_\Omega, \quad \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega, \quad
    \langle \Phi_N, \Phi_N \rangle_\Omega, \quad \langle s, \Phi_N \rangle_\Omega.

Let

    A(t) = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N g u^{(0)}, \Phi_N \rangle_\Omega \right),
    b(t) = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \left( \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega \right),
    P = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \langle s, \Phi_N \rangle_\Omega.

Find c_N^{(0)}(t) satisfying the following linear differential equation:

    \dot{c}_N^{(0)}(t) + A(t) c_N^{(0)}(t) + b(t) = 0,
    c_N^{(0)}(t_f) = P.

Let i = 1.

Iterative Step. An improved controller is given by

    u_N^{(i)}(t, x) = -\frac{1}{2} R^{-1} g^T(t, x) \nabla\Phi_N^T(x) \, c_N^{(i-1)}(t).

Let

    K(t) = \sum_{j=1}^N c_j^{(i-1)}(t) \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_j}{\partial x}, \Phi_N \right\rangle_\Omega,
    A(t) = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} K(t) \right),
    b(t) = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \left( \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} K(t) c_N^{(i-1)}(t) \right),
    P = (\langle \Phi_N, \Phi_N \rangle_\Omega)^{-1} \langle s, \Phi_N \rangle_\Omega.

Find c_N^{(i)}(t) satisfying the following linear differential equation:

    \dot{c}_N^{(i)}(t) + A(t) c_N^{(i)}(t) + b(t) = 0,
    c_N^{(i)}(t_f) = P.

Let i = i + 1.
For infinite-time horizon problems, the time dependence disappears and these equations become particularly easy to compute. The following algorithm summarizes the infinite-time horizon case.

Algorithm 4.3.2 Dual Approximation for Infinite-Time Horizon.

Initial Step. Compute the integrals

    \langle l, \Phi_N \rangle_\Omega, \quad \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega, \quad
    \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega \; (k = 1, \ldots, N),
    \langle \nabla\Phi_N g u^{(0)}, \Phi_N \rangle_\Omega, \quad \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega.

Let

    A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega + \langle \nabla\Phi_N g u^{(0)}, \Phi_N \rangle_\Omega,
    b = \langle l, \Phi_N \rangle_\Omega + \langle \|u^{(0)}\|_R^2, \Phi_N \rangle_\Omega.

Find c_N^{(0)} satisfying the following linear equation:

    A c_N^{(0)} + b = 0.

Let i = 1.

Iterative Step. An improved controller is given by

    u_N^{(i)}(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla\Phi_N^T(x) \, c_N^{(i-1)}.

Let

    K = \sum_{k=1}^N c_k^{(i-1)} \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega,
    A = \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega - \frac{1}{2} K,
    b = \langle l, \Phi_N \rangle_\Omega + \frac{1}{4} K c_N^{(i-1)}.

Find c_N^{(i)} satisfying the linear equation

    A c_N^{(i)} + b = 0.

Let i = i + 1.
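As an illustration, the following minimal sketch implements algorithm 4.3.2 for an assumed scalar system \dot{x} = x^3 + u with l = x^2, R = 1, \Omega = [-1, 1], and basis \{x^2, x^4, x^6\}. Gauss-Legendre quadrature stands in for the symbolic integration discussed in section 4.4; the initial control, iteration count, and test point are arbitrary choices for the experiment.

import numpy as np
from numpy.polynomial.legendre import leggauss

xs, ws = leggauss(64)                    # quadrature nodes/weights on [-1, 1]

def inner(f1, f2):
    # <f1, f2> over Omega = [-1, 1] by Gauss-Legendre quadrature
    return np.sum(ws * f1(xs) * f2(xs))

f = lambda x: x**3                       # assumed system: x' = x^3 + u
g = lambda x: np.ones_like(x)
l = lambda x: x**2
R = 1.0
u = lambda x: -2.0*x - x**3              # initial admissible control: f + g u = -2x

N = 3                                    # basis phi_k = x^{2k}, k = 1..N
phi  = [lambda x, k=k: x**(2*k)          for k in range(1, N + 1)]
dphi = [lambda x, k=k: 2*k*x**(2*k - 1)  for k in range(1, N + 1)]

for it in range(6):                      # successive approximation loop
    # A_{jk} = <dphi_k/dx (f + g u), phi_j>,  b_j = <l + R u^2, phi_j>
    A = np.array([[inner(lambda x: dphi[k](x)*(f(x) + g(x)*u(x)), phi[j])
                   for k in range(N)] for j in range(N)])
    b = np.array([inner(lambda x: l(x) + R*u(x)**2, phi[j]) for j in range(N)])
    c = np.linalg.solve(A, -b)           # solve A c + b = 0
    # improved control u = -1/2 R^{-1} g^T (dPhi/dx)^T c
    u = lambda x, c=c: -0.5/R * g(x) * sum(c[k]*dphi[k](x) for k in range(N))

# For this scalar example the HJB equation has the closed-form solution
# u*(x) = -x^3 - |x| sqrt(x^4 + 1), which gives a reference point.
x0 = np.array([0.5])
print("u_N(0.5) =", u(x0)[0], "  u*(0.5) =", -(0.5**3 + 0.5*np.sqrt(0.5**4 + 1)))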
Remark 4.3.3 It is, of course, possible to apply Galerkin's method to the HJB equation directly. Assuming that the optimal cost function exists in a function space spanned by \{\phi_j\}_1^\infty, we substitute the approximation V_N \triangleq c_N^T \Phi_N into the HJB equation and set the projection of the error to zero:

    \langle HJB(V_N), \Phi_N \rangle_\Omega = 0,
    \langle V_N(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

After some algebra we obtain the nonlinear ordinary differential equation

    \dot{c}_N + \langle \Phi_N, \Phi_N \rangle_\Omega^{-1} \left( \langle \nabla\Phi_N f, \Phi_N \rangle_\Omega c_N
      - \frac{1}{4} \sum_{k=1}^N c_{N,k} \left\langle \nabla\Phi_N g R^{-1} g^T \frac{\partial \phi_k}{\partial x}, \Phi_N \right\rangle_\Omega c_N
      + \langle l, \Phi_N \rangle_\Omega \right) = 0.

In contrast to the linear equations of algorithms 4.3.1 and 4.3.2, this equation is quadratic in the unknown coefficients c_N.
+
+
- -
f( . )
c1 1 g( . ) R-1gT(. ) ∆ φ (.)
2 1
. . .
. . .
. . .
φ (.)
cN 1 g(. ) R-1 gT( . ) ∆
2 N
4.4 Implementation Issues

The main computational burden of the algorithm is the evaluation of the inner-product integrals, whose number grows rapidly when the dimension n of the state space is large. We have circumvented this problem by using a symbolic computational engine, which results in significant savings when it can be used. To compute the integrals symbolically it is necessary that closed-form solutions of all of the integrals exist. For example, this is true when the system equations and the basis functions are polynomials. We have found that when the dimension of the state space is greater than three, the necessary computations, executed on a Sun/Sparc 10 running Matlab 4.2, become prohibitive unless symbolic software is used.
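As a small illustration of the symbolic route, the sketch below computes one of the Galerkin integrals in closed form with sympy; the scalar system, basis function, initial control, and domain are assumptions carried over from the earlier scalar example, not from the text.

import sympy as sp

x = sp.symbols('x')
f, gfun, u0 = x**3, sp.Integer(1), -2*x - x**3
phi = x**2
# <dphi/dx * (f + g u0), phi> over Omega = [-1, 1], in closed form
integrand = sp.diff(phi, x) * (f + gfun * u0) * phi
print(sp.integrate(integrand, (x, -1, 1)))   # exact rational result: -8/5

Because the result is exact, the linear systems in algorithms 4.3.1 and 4.3.2 are assembled without quadrature error.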
The most important practical issue regarding our algorithm is the choice of the basis functions \{\phi_j\}. It is shown in chapter 5 that it is not necessary to require that \{\phi_j\} be either orthogonal or even linearly independent. In fact, if \{\phi_j\}_1^N are orthogonal (resp. linearly independent) and \{\phi_j\}_1^{N+1} are not orthogonal (resp. are linearly dependent), then it is shown that V_N^{(i)} \equiv V_{N+1}^{(i)}. However, from a practical point of view it is desirable that \{\phi_j\} be at least linearly independent, since in that case we will be able to invert the matrix

    \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega

directly. Since orthogonalization is computationally expensive, we use functions which are simply linearly independent.
In the examples in chapter 6 we use polynomials as basis functions. We have found that polynomials work very well in this algorithm. From equation (3.4) and the fact that l and R are positive definite, we can conclude that V^{(i)} will also be a positive definite function. Therefore the completeness assumption A4.1 will be satisfied if we choose basis functions that span the positive definite functions, i.e., the terms obtained from the expansion of the polynomial

    \sum_{j=1}^\infty \left( \sum_{k=1}^n x_k \right)^{2j}.

Therefore if n = 2 we can take the set

    \{\phi_j\}_1^\infty = \left\{ x_1^2, \; x_1 x_2, \; x_2^2, \; x_1^4, \; x_1^3 x_2, \; x_1^2 x_2^2, \; x_1 x_2^3, \; x_2^4, \; \ldots \right\}.

For many systems this set of basis functions works very well. Theoretically, the algorithm converges for any complete basis as N \to \infty. For finite values of N, however, the algorithm will be sensitive to the chosen basis. If the system has modes that are not spanned by the functions \{\phi_j\}_1^N, then the control will not be able to compensate for these modes. To keep N as small as possible we want to choose those \phi_j's that capture the significant dynamics of the system. The choice of a basis, therefore, must receive careful consideration, and it is here that engineering ingenuity and insight are extremely important. A small generator for this basis is sketched below.
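The following generator enumerates the even-degree monomial basis described above for arbitrary n; it is a plain-Python convenience for illustration, not part of the algorithm itself.

from itertools import combinations_with_replacement

def even_monomial_basis(n, J):
    # All monomials x1^a1 ... xn^an of total degree 2, 4, ..., 2J: exactly
    # the terms in the expansion of sum_j (x1 + ... + xn)^{2j}.
    basis = []
    for j in range(1, J + 1):
        # each multiset of 2j variable indices is one monomial of degree 2j
        for combo in combinations_with_replacement(range(1, n + 1), 2 * j):
            basis.append(tuple(combo.count(i) for i in range(1, n + 1)))
    return basis

# n = 2, J = 2 reproduces {x1^2, x1 x2, x2^2, x1^4, x1^3 x2, ...} as
# exponent tuples: [(2, 0), (1, 1), (0, 2), (4, 0), (3, 1), ...]
print(even_monomial_basis(2, 2))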
One of the major advantages of this method is that the control laws obtained by algorithm 4.3.1 and algorithm 4.3.2 are tunable via the state penalty functions l and s and the control weighting matrix R. It is important to note that there are only 2N integrals involving l and s, and that R can be pulled outside of the integrals in which it is involved. Therefore, the computations that must be performed to tune the controller are significantly fewer than the original computations. In fact, if l and s are appropriately chosen (e.g., quadratic penalty functions), then the control can be tuned without re-computing any integrals.
4.5 Summary of the Method
In this chapter we used the Galerkin spectral method to approximate the solution to the GHJB equation. To apply Galerkin's method it is first necessary to place the solution to the differential equation in a Hilbert space. To do so we restrict attention to a compact subset \Omega of the stability region of a known stabilizing control. When the solutions to the GHJB equation are restricted to this set, they exist in the Hilbert space L^2(D).

When Galerkin's method is used to approximate the GHJB equation and the result is substituted into the successive approximation algorithm, we obtain algorithm 4.3.1 and algorithm 4.3.2, which are shown pictorially in figure 4.2. In the next chapter we will derive conditions under which these algorithms converge uniformly to the solution of the HJB equation. While there are many methods of approximating solutions of the HJB equation, they are either open-loop, impractical, or fail to guarantee that the resulting approximate controls will be stabilizing. The advantages of our algorithm are that

- all of the computations are performed off-line, and
- the resulting controls are in feedback form.

Furthermore, in chapter 5 we show that

- the algorithm converges uniformly to the optimal control, and
- when the approximation is truncated for finite (but large enough) N and I, the approximate controls are guaranteed to be stabilizing on a pre-specified set \Omega.
(Figure 4.2: Algorithm for Improving Feedback Control Laws. Flowchart: from the data f, g, l, R, u^{(0)}, \{\phi_j\}_1^N, \Omega, compute the inner-product integrals, form A and b, solve for c_N^{(i)}, update the control, and repeat with i = i + 1.)
CHAPTER 5
CONVERGENCE AND STABILITY

(Figure 5.1: Convergence Diagram, relating V^{(i)} \to V^* and V_N^{(i)} \to V_N^*.)
is an admissible control law for the system (f, g) on D. If \hat{V} is the unique positive definite function satisfying GHJB(\hat{V}, \hat{u}) = 0 with boundary conditions \hat{V}(t_f, x) = s(x) and \hat{V}(t, 0) = 0, \forall t \in [t_0, t_f], then \hat{V}(t, x) \leq V(t, x) for all (t, x) \in D; in particular \hat{V}(t_0, x_0; \hat{u}) \leq V(t_0, x_0; u). Furthermore, the improvement in the performance is given by

    V(t, x) - \hat{V}(t, x) = \int_t^{t_f} \|u(\tau, \varphi(\tau; t, x, \hat{u})) - \hat{u}(\tau, \varphi(\tau; t, x, \hat{u}))\|_R^2 \, d\tau.

Proof: See section A.4.
It has been shown in (Glad, 1985; Glad, 1987; Tsitsiklis and Athans, 1984) that the optimal control u^* is robust in the sense that it has infinite gain margin and a 50% gain reduction margin. A similar result has been shown for the GHJB equation in (Saridis and Balaram, 1986). Since this result will be important in the next section, we restate it here.

Corollary 5.2.5 Robustness of \hat{u}.
Consider the infinite-time problem with the system (f, g). Let \hat{u} \in A_l(\Omega) be a control obtained from lemma 5.2.4, and let the gain perturbation D : \mathbb{R}^m \to \mathbb{R}^m satisfy

    z^T R D(z) \geq \frac{1 + \epsilon}{2} \|z\|_R^2, \quad \epsilon > 0;

then \dot{x} = f + g D(\hat{u}) is asymptotically stable on \Omega.

Proof: See section A.5.

The situation is depicted geometrically in figure 5.2 on page 60; the system remains stable as long as the control remains in the sector bounded above by infinity and below by \frac{1}{2}\hat{u}.
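For a pure gain change D(z) = \kappa z the sector condition reduces to a bound on \kappa:

    D(z) = \kappa z \;\Longrightarrow\; z^T R D(z) = \kappa \|z\|_R^2 \geq \frac{1 + \epsilon}{2} \|z\|_R^2
    \quad \Longleftrightarrow \quad \kappa \geq \frac{1 + \epsilon}{2} > \frac{1}{2},

so any gain \kappa > 1/2, however large, preserves asymptotic stability, which is exactly the infinite gain margin and 50% gain-reduction margin stated above.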
We show that if \{\phi_j\}_1^N are linearly independent, then so are the functions \{\frac{\partial \phi_j^T}{\partial x}(f + gu)\}_1^N.

Infinite-time: Suppose not. Then there exists a nonzero c \in \mathbb{R}^N such that c^T \nabla\Phi_N (f + gu) \equiv 0 on \Omega. Integrating along trajectories (and using \varphi(t) \to 0, \Phi_N(0) = 0),

    c^T \int_0^\infty \nabla\Phi_N (f + gu) \, dt = 0, \quad \text{for all } x(0) \in \Omega
    \implies c^T \Phi_N(x) \equiv 0,

which contradicts the linear independence of \{\phi_j\}_1^N.

Finite-time: If the vector field f + gu has a bounded response, then along the trajectories \varphi(t; t_0, x, u), (t, x) \in D, we have that

    \Phi_N(\varphi(t; t_0, x, u)) - \Phi_N(x) = \int_{t_0}^t \frac{d}{d\tau} \Phi_N(\varphi(\tau; t_0, x, u)) \, d\tau
      = \int_{t_0}^t \nabla\Phi_N (f + gu)(\varphi(\tau; t_0, x, u)) \, d\tau.

Now suppose that the lemma is not true. Then there exists a nonzero c \in \mathbb{R}^N such that

    c^T \nabla\Phi_N (f + gu) \equiv 0
    \implies c^T \int_{t_0}^t \nabla\Phi_N (f + gu) \, d\tau = 0, \quad \forall (t, x) \in D.
Suppose now that \phi_{N+1} is linearly dependent on \Phi_N, i.e., \phi_{N+1} = \lambda_N^T \Phi_N for some \lambda_N \in \mathbb{R}^N, and write the coefficient vector computed with the augmented basis \Phi_{N+1} = (\Phi_N^T, \phi_{N+1})^T as (b_N^T, b_{N+1})^T. We know that c_N satisfies

    \langle \Phi_N, \Phi_N \rangle_\Omega \dot{c}_N + \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega c_N + \langle l + \|u\|_R^2, \Phi_N \rangle_\Omega = 0,

with the boundary condition

    \langle \Phi_N, \Phi_N \rangle_\Omega c_N(t_f) = \langle s, \Phi_N \rangle_\Omega.

We also know that (b_N^T, b_{N+1})^T satisfies

    \langle \Phi_{N+1}, \Phi_{N+1} \rangle_\Omega \begin{pmatrix} \dot{b}_N \\ \dot{b}_{N+1} \end{pmatrix}
    + \langle \nabla\Phi_{N+1} (f + gu), \Phi_{N+1} \rangle_\Omega \begin{pmatrix} b_N \\ b_{N+1} \end{pmatrix}
    + \langle l + \|u\|_R^2, \Phi_{N+1} \rangle_\Omega = 0,

with the boundary condition

    \langle \Phi_{N+1}, \Phi_{N+1} \rangle_\Omega \begin{pmatrix} b_N(t_f) \\ b_{N+1}(t_f) \end{pmatrix} = \langle s, \Phi_{N+1} \rangle_\Omega.

Written in block form, the first N rows read

    \langle \Phi_N, \Phi_N \rangle_\Omega \dot{b}_N + \langle \phi_{N+1}, \Phi_N \rangle_\Omega \dot{b}_{N+1}
    + \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega b_N
    + \left\langle \frac{\partial \phi_{N+1}^T}{\partial x}(f + gu), \Phi_N \right\rangle_\Omega b_{N+1}
    + \langle l + \|u\|_R^2, \Phi_N \rangle_\Omega = 0.

Since \phi_{N+1} = \lambda_N^T \Phi_N, these equations imply that b_N + \lambda_N b_{N+1} satisfies the same linear differential equation and boundary condition as c_N, therefore

    b_N + \lambda_N b_{N+1} = c_N, \quad \forall t \in [t_0, t_f],

which proves the result in the finite-time case. For the infinite-time problem, similar reasoning shows that c_N and b_N + \lambda_N b_{N+1} both satisfy the linear equation

    \langle \nabla\Phi_N (f + gu), \Phi_N \rangle_\Omega \, c = -\langle l + \|u\|_R^2, \Phi_N \rangle_\Omega.

A similar argument applies when the basis is transformed by an invertible matrix B_N, i.e., \bar{\Phi}_N = B_N \Phi_N with coefficients b_N and W_N = b_N^T \bar{\Phi}_N. So c_N = B_N^T b_N \implies V_N \equiv W_N. But

    \langle GHJB(W_N, u), \bar{\Phi}_N \rangle_\Omega = 0, \quad \langle W_N(t_f), \bar{\Phi}_N \rangle_\Omega = \langle s, \bar{\Phi}_N \rangle_\Omega
    \iff B_N \langle GHJB(W_N, u), \Phi_N \rangle_\Omega = 0, \quad B_N \langle W_N(t_f), \Phi_N \rangle_\Omega = B_N \langle s, \Phi_N \rangle_\Omega
    \iff \langle GHJB(W_N, u), \Phi_N \rangle_\Omega = 0, \quad \langle W_N(t_f), \Phi_N \rangle_\Omega = \langle s, \Phi_N \rangle_\Omega.

So B_N^T b_N satisfies the same linear differential equation and boundary conditions as c_N, which implies that B_N^T b_N \equiv c_N in the finite-time case. In the infinite-time case, \{\phi_j\}_1^N are linearly independent, so the hypothesis implies that the solution to this equation is unique; therefore from (5.2) we have that B_N^T b_N = c_N.
Throughout the rest of this chapter we will assume that \{\phi_j\} have been orthonormalized, since this does not affect the convergence result. The orthonormality of \{\phi_j\}_1^\infty implies that if a function \psi(x) \in \text{span}\{\phi_j\}_1^\infty, then

    \psi(x) = \sum_{j=1}^\infty \langle \psi, \phi_j \rangle_\Omega \, \phi_j(x),

and that for any \epsilon > 0 we can choose N sufficiently large to guarantee that

    \left| \sum_{j=N+1}^\infty \langle \psi, \phi_j \rangle_\Omega \, \phi_j \right| < \epsilon.
We will state necessary and sufficient conditions for pointwise convergence of a series to imply uniform convergence on a compact set.

Definition 5.2.10 We say that a series \sum_{j=1}^\infty c_j \phi_j(x) is pointwise decreasing on \Omega, written

    \sum_{j=1}^\infty c_j \phi_j(x) \in PD(\Omega),

if \forall k = 1, 2, \ldots and \forall \epsilon > 0, \exists \delta > 0 and m > 0 such that \forall x \in \Omega and \forall n > m,

    \left| \sum_{j=k+1}^\infty c_j \phi_j(x) \right| < \delta \implies \left| \sum_{j=k+n+1}^\infty c_j \phi_j(x) \right| < \epsilon.

Remark 5.2.11 This condition implies that if the tail of a series at some point x \in \Omega is small, then after removing n > m terms it is still small, where m is a uniform number for all x \in \Omega. In particular, this implies that if a series is monotonically decreasing on \Omega, i.e.,

    \left| \sum_{j=k}^\infty c_j \phi_j(x) \right| > \left| \sum_{j=k+1}^\infty c_j \phi_j(x) \right|,
If, in addition,

    \sum_{j=1}^\infty \langle s, \phi_j \rangle_\Omega \, \phi_j(x) \in PD(\Omega),    (5.3)

then convergence is uniform on \Omega.

Proof: Expanding V_N and \hat{V} in the basis and collecting terms, the error beyond the Nth term can be bounded by

    A B(x) + \left| \sum_{j=N+1}^\infty \langle l + \|u\|_R^2, \phi_j \rangle_\Omega \, \phi_j \right|,    (5.5)

where

    A \triangleq \max_{1 \leq k \leq N, \; t \in [t_0, t_f]} \left| c_k(t) \right|,
    B(x) \triangleq \sup_{(t,x) \in D} \left| \sum_{j=N+1}^\infty \left\langle \frac{\partial \phi_j^T}{\partial x}(f + gu), \phi_j \right\rangle_\Omega \phi_j \right|.

For the boundary condition we obtain

    V_N(t_f) - \hat{V}(t_f) = \sum_{j=N+1}^\infty \left[ \sum_{k=1}^N c_k(t_f) \langle \phi_k, \phi_j \rangle_\Omega - \langle s, \phi_j \rangle_\Omega \right] \phi_j
      = -\sum_{j=N+1}^\infty \langle s, \phi_j \rangle_\Omega \, \phi_j.    (5.6)

The lemma follows by applying the hypothesis and lemma 5.2.12.
We now show that the GHJB equation is bounded below, so that the previous lemma implies convergence of the approximation to the solution. The proofs for the infinite-time and finite-time cases are somewhat different, so we give separate lemmas for each case.

Lemma 5.2.14 \|c_N - \hat{c}_N\| \to 0: Finite-time.
The proof rests on the fact that a continuous perturbation in the system equations and the initial state implies a continuous perturbation of the solution (cf. (Arnold, 1973)). This implies that for all \epsilon > 0 there exists a \delta(t) > 0 such that \forall t \in [t_0, t_f],

    \left| \sum_{j=N+1}^\infty \hat{c}_j(t) \left\langle \frac{\partial \phi_j^T}{\partial x}(f + gu)(t), \Phi_N \right\rangle_\Omega \right|_2 < \delta(t)
    \implies \|c_N(t) - \hat{c}_N(t)\|_2 < \epsilon.

But

    \sum_{j=1}^\infty \hat{c}_j(t) \left\langle \frac{\partial \phi_j^T}{\partial x}(f + gu)(t), \Phi_N \right\rangle_\Omega = -\left\langle l(t) + \|u(t)\|_R^2, \Phi_N \right\rangle_\Omega

implies that the series on the left converges pointwise at each t \in [t_0, t_f]. So \forall \delta(t) > 0, \exists k(t) such that

    N > k(t) \implies \left| \sum_{j=N+1}^\infty \hat{c}_j(t) \left\langle \frac{\partial \phi_j^T}{\partial x}(f + gu)(t), \Phi_N \right\rangle_\Omega \right|_2 < \delta(t),

which proves the claim on pointwise convergence. Uniform convergence follows from the hypothesis and lemma 5.2.12.
Lemma 5.2.15 ($\|c_N - \hat c_N\|_2 \to 0$: infinite-time)
Given $u \in \mathcal{A}_l(\Omega)$, let $V_N(x) = \sum_{j=1}^{N} c_j\phi_j(x)$ satisfy
\[
\big\langle \mathrm{GHJB}(V_N, u),\,\Phi_N\big\rangle = 0
\]
and $\hat V(x) \triangleq \sum_{j=1}^{\infty} \hat c_j\phi_j(x)$ satisfy
\[
\mathrm{GHJB}(\hat V, u) = 0,
\]
where $\phi_j(0) = 0$. If $\Omega$ is compact, the functions $\frac{\partial \phi_j^T}{\partial x}(f+gu)$, $\|u\|_R^2$, $l$, $s$ are continuous on $\Omega$ and are in the space $\mathrm{span}\{\phi_j\}_1^\infty$, and the coefficients $|c_j(t)|$ are uniformly bounded for all $N$, then
\[
\|c_N - \hat c_N\|_2 \to 0.
\]
Proof: Define
\[
\epsilon_N(x) \triangleq \mathrm{GHJB}(V_N, u)(x);
\]
then from the hypothesis we have that for all $x \in \Omega$,
\[
\mathrm{GHJB}(V_N, u)(x) - \mathrm{GHJB}(\hat V, u)(x) = \epsilon_N(x).
\]
Substituting the series expansions for $V_N$ and $\hat V$, and moving the terms in the series
\[
\int \big|\epsilon_N(x)\big|^2\,dx \;\ge\; \lambda_{\min}(W)\,\|c_N - \hat c_N\|_2^2 \;>\; 0,
\]
where
\[
W \triangleq \int \big[\nabla\Phi_N(f+gu)\big]\big[\nabla\Phi_N(f+gu)\big]^T dx,
\]
$\lambda_{\min}(W)$ is the minimum eigenvalue of $W$, and the last inequality follows from equation (5.7). Therefore
\[
\int \big|\epsilon_N(x)\big|^2\,dx \to 0 \implies \|c_N - \hat c_N\|_2^2 \to 0.
\]
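The quantities in this bound are directly computable by quadrature. The sketch below assembles $W$ for a scalar system and checks $\lambda_{\min}(W) > 0$; the functions $f$, $g$, $u$ and the basis are illustrative stand-ins, not the thesis's examples.

```python
import numpy as np

# Illustrative 1-D data: f(x) = -x, g(x) = 1, u(x) = -0.5 x, and the basis
# phi_j(x) = x^(j+1) on Omega = [-1, 1]; these only exercise the definition.
x = np.linspace(-1.0, 1.0, 2001)
N = 4
dphi = np.array([(j + 1) * x ** j for j in range(N)])   # rows: dphi_j/dx
drift = (-x) + 1.0 * (-0.5 * x)                          # (f + g u)(x)

# W = int [grad(Phi)(f+gu)][grad(Phi)(f+gu)]^T dx, assembled entrywise.
v = dphi * drift                                         # shape (N, len(x))
W = np.trapz(v[:, None, :] * v[None, :, :], x, axis=-1)

lam_min = np.linalg.eigvalsh(W).min()
print("lambda_min(W) =", lam_min)                        # positive here
```

A positive $\lambda_{\min}(W)$ is exactly what turns smallness of the residual $\epsilon_N$ in $L_2$ into smallness of the coefficient error $c_N - \hat c_N$.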
where $\mu(\Omega)$ is the Lebesgue measure of $\Omega$. Lemma 5.2.13 implies the pointwise
Proof:
\[
\big\|V_N - \hat V\big\|^2_{L_2(\Omega)} = \int_\Omega \big|V_N - \hat V\big|^2 dx
\]
\[
\le \int \Big|(c_N - \hat c_N)^T \Phi_N(x)\Big|^2 dx + \int \Big|\sum_{j=N+1}^{\infty} \hat c_j\phi_j(x)\Big|^2 dx
\]
\[
= (c_N - \hat c_N)^T \big\langle \Phi_N, \Phi_N^T\big\rangle (c_N - \hat c_N) + \int \Big|\sum_{j=N+1}^{\infty} \hat c_j\phi_j(x)\Big|^2 dx.
\]
By the mean value theorem, $\exists \xi \in \Omega$ such that
\[
\big\|V_N - \hat V\big\|^2_{L_2(\Omega)} = \|c_N - \hat c_N\|_2^2 + \mu(\Omega)\Big|\sum_{j=N+1}^{\infty} \hat c_j\phi_j(\xi)\Big|^2 \to 0.
\]
The following lemma states additional conditions that ensure that the approximate control $u_N$ converges to the updated control $\hat u$, where $\hat u$ is obtained by applying lemma 5.2.4 to $\hat V$.

Lemma 5.2.17 ($\|u_N - \hat u\| \to 0$)
then $\|u_N(t,x) - \hat u(t,x)\|_R \to 0$ pointwise on $D$. If in addition the conditions for uniform convergence in lemma 5.2.13 are satisfied and
\[
\sum_{j=1}^{\infty} \hat c_j R^{-1} g^T \frac{\partial \phi_j}{\partial x} \in PD(\Omega), \tag{5.8}
\]
then $\|u_N(t,x) - \hat u(t,x)\|_R \to 0$ uniformly on $D$.
Proof:
\[
\|u_N - \hat u\|_R \le \Big\|\tfrac{1}{2} R^{-1} g^T \nabla\Phi_N^T (c_N - \hat c_N)\Big\|_R
+ \Big\|\tfrac{1}{2}\sum_{j=N+1}^{\infty} \hat c_j R^{-1} g^T \frac{\partial \phi_j}{\partial x}\Big\|_R,
\]
so $\hat u = -\frac{1}{2}\sum_{j=1}^{\infty} \hat c_j R^{-1} g^T \frac{\partial \phi_j}{\partial x}$ implies that the second term on the right-hand side converges pointwise to 0, and uniformly if condition (5.8) is satisfied. By lemma 5.2.13 we know that
\[
(c_N(t) - \hat c_N(t))^T \nabla\Phi_N (f+gu)(t,x) = \epsilon_N(t,x) + \sum_{j=N+1}^{\infty} \hat c_j(t) \frac{\partial \phi_j^T}{\partial x}(f+gu)(t,x)
\]
converges pointwise to 0, and uniformly to 0 if conditions (5.3) are satisfied. For each $(t,x) \in D$ we have, by the definition of the inner product in $\mathbb{R}^N$, that
\[
(c_N - \hat c_N)^T \nabla\Phi_N (f+gu) \to 0 \iff \nabla\Phi_N^T (c_N - \hat c_N) \to \zeta(t,x),
\]
where $\zeta$ is perpendicular to $(f+gu)$ at each $(t,x)$. Since $\{\phi_j\}_1^\infty$ are linearly independent and $(f+gu)$ is admissible, we have from lemma 5.2.6 that
\[
(c_N - \hat c_N)^T \nabla\Phi_N (f+gu) \to 0 \iff (c_N - \hat c_N) \to 0.
\]
From corollary 5.2.7 we have that
\[
(c_N - \hat c_N) \to 0 \iff \big\|\nabla\Phi_N^T (c_N - \hat c_N)\big\| \to 0.
\]
Therefore $\big\|\nabla\Phi_N^T (c_N - \hat c_N)\big\|$ converges in the same sense as $(c_N - \hat c_N)^T \nabla\Phi_N (f+gu)$.
Since $R^{-1}g^T(t,x)$ is continuous on $D$ and hence uniformly bounded, we have that
\[
\big\|R^{-1} g^T(t,x)\,\nabla\Phi_N^T(x)\,(c_N(t) - \hat c_N(t))\big\|_R \to 0
\]
in the same sense as $(c_N - \hat c_N)^T \nabla\Phi_N (f+gu)$.
The next two lemmas show that for $N$ sufficiently large, $u_N$ is admissible.

Lemma 5.2.18 (Admissibility of $u_N$: finite-time)
If the conditions of lemma 5.2.17 are satisfied, then for $N$ sufficiently large, $u_N(t,x) \in \mathcal{A}_{l,s}(D)$.

Proof: Define
\[
J(x, w) \triangleq s(\varphi(t_f; t_0, x, w)) + \int_{t_0}^{t_f} l(\varphi(t; t_0, x, w)) + \|w(\varphi(t; t_0, x, w))\|_R^2\,dt.
\]
We must show that for $N$ sufficiently large, $J(x, u_N) < \infty$ when $J(x, \hat u) < \infty$. But $\varphi(t; t_0, x, w)$ depends continuously on $w$, i.e., small variations in $w$ result in small variations in $\varphi$. Also, since $\|u_N(\cdot)\|_R^2$ can be made arbitrarily close to $\|\hat u(\cdot)\|_R^2$, $J(x, u_N)$ can be made arbitrarily close to $J(x, \hat u)$. Therefore for $N$ sufficiently large, $J(x, u_N) < \infty$, and hence $u_N(t,x)$ is admissible.
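The cost functional $J(x, w)$ used in this argument can be approximated numerically by integrating along the simulated closed-loop trajectory. A minimal sketch for a scalar system follows; the choices of $f$, $g$, $l$, $s$, $R$ and the feedback $w$ are illustrative only, not taken from the thesis.

```python
import numpy as np

# Illustrative problem data (assumptions, not the thesis's examples).
f = lambda x: -x + x ** 3          # drift
g = lambda x: 1.0                  # input map
l = lambda x: x ** 2               # state penalty
s = lambda x: 10.0 * x ** 2        # terminal penalty
R = 1.0
w = lambda t, x: -2.0 * x          # a candidate admissible feedback

def J(x0, t0=0.0, tf=5.0, steps=5000):
    """Approximate J(x0, w) = s(phi(tf)) + int l + ||w||_R^2 dt (forward Euler)."""
    dt = (tf - t0) / steps
    x, cost = x0, 0.0
    for k in range(steps):
        u = w(t0 + k * dt, x)
        cost += (l(x) + R * u ** 2) * dt     # running cost
        x += (f(x) + g(x) * u) * dt          # state update
    return cost + s(x)

print(J(0.5))   # a finite value indicates admissibility from x0 on [t0, tf]
```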
Lemma 5.2.19 (Admissibility of $u_N$: infinite-time)
Under the conditions of lemma 5.2.17, if the set
\[
\Big\{\Big\|g(0) R^{-1} g^T(0) \frac{\partial^2 \phi_j}{\partial x^2}(0)\Big\|_2\Big\}_1^\infty
\]
is uniformly bounded for all $N$, then for $N$ sufficiently large, $u_N \in \mathcal{A}_l(\Omega)$.

Proof: From lemma 5.2.4 we know that $\hat u \in \mathcal{A}_l(\Omega)$. Therefore from corollary 5.2.5, $u_N$ is stabilizing on $\Omega$ if
\[
\hat u^T(x) R\,u_N(x) > \tfrac{1}{2}\hat u^T(x) R\,\hat u(x)
\iff \hat u^T(x) R\,\big(2u_N(x) - \hat u(x)\big) > 0
\]
for all $x \in \Omega$. The situation is intuitive in the one-dimensional case: if $\hat u(x) > 0$ then $u_N(x) > \frac{1}{2}\hat u(x)$, and if $\hat u(x) < 0$ then $u_N(x) < \frac{1}{2}\hat u(x)$. So in figure 5.2 we see that $u_N$ must lie in the sector bounded by $\frac{1}{2}\hat u$. But we know from lemma 5.2.17 that $u_N$ is uniformly within an $\epsilon$-ball of $\hat u$, where $\epsilon$ can be made arbitrarily small by making $N$ large enough. Therefore $u_N$ is guaranteed to be stabilizing everywhere but some ball $B(0, \rho_N)$ centered at the origin, where $\rho_N \to 0$ as $N \to \infty$ (see figure 5.2). By Lyapunov's first theorem, $u_N$ will be stabilizing on a small region $\hat\Omega \supseteq B(0, \rho_N)$ (for $N$ sufficiently large) if and only if the real parts of
[Figure 5.2: the curves $\hat u$ and $\frac{1}{2}\hat u$, the $\epsilon_N$-tube about $\hat u$, and the ball of radius $\rho_N$ about the origin.]
the eigenvalues of $F + \sum_{j=1}^{N} c_j G_j$ are negative, where
\[
F \triangleq \frac{\partial f}{\partial x}(0), \qquad
G_j \triangleq -\frac{1}{2}\,g(0)\,R^{-1} g^T(0)\,\frac{\partial^2 \phi_j}{\partial x^2}(0),
\]
and $\lambda(M)$ denotes the eigenvalues of $M$. So for $N$ sufficiently large, $u_N$ will be stabilizing if
\[
\mathrm{Re}\Big\{\lambda\Big(F + \sum_{j=1}^{N} c_j G_j\Big)\Big\} < 0.
\]
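This stabilization criterion is a finite eigenvalue test. A minimal numerical sketch follows; the matrices $F$, $G_j$ and the coefficients $c_j$ are illustrative numbers, not values from the thesis.

```python
import numpy as np

# Local stability test Re(lambda(F + sum_j c_j G_j)) < 0 with toy data.
F = np.array([[0.0, 1.0],
              [-1.0, -0.5]])                       # F = df/dx(0)
G = [np.array([[0.0, 0.0], [-0.3, 0.0]]),          # G_j = -1/2 g(0) R^{-1} g(0)^T
     np.array([[0.0, 0.0], [0.0, -0.2]])]          #       * d^2 phi_j/dx^2 (0)
c = [1.2, 0.7]                                     # Galerkin coefficients

A = F + sum(cj * Gj for cj, Gj in zip(c, G))
stable = np.all(np.linalg.eigvals(A).real < 0)
print("u_N locally stabilizing:", stable)
```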
Note that
\[
\Big\|F + \sum_{j=1}^{\infty} \hat c_j G_j - F - \sum_{j=1}^{N} c_j G_j\Big\|_2
\le \Big\|\sum_{j=1}^{N} (\hat c_j - c_j) G_j\Big\|_2 + \Big\|\sum_{j=N+1}^{\infty} \hat c_j G_j\Big\|_2.
\]
Since we know that the series $\sum_{j=1}^{\infty} \hat c_j G_j$ converges, $\forall \epsilon > 0$, $\exists K_1$ such that
\[
N > K_1 \implies \Big\|\sum_{j=N+1}^{\infty} \hat c_j G_j\Big\|_2 < \epsilon/2.
\]
Also, since $\|G_j\|_2$ are uniformly bounded, lemma 5.2.15 implies that $\exists K_2$ such that $N > K_2$ implies that
\[
\Big\|\sum_{j=1}^{N} (\hat c_j - c_j) G_j\Big\|_2 \le \sum_{j=1}^{N} |\hat c_j - c_j|\,\|G_j\|_2
\]
is less than $\epsilon/2$, which proves that as $N \to \infty$,
\[
\lambda\Big(F + \sum_{j=1}^{N} c_j G_j\Big) \to \lambda\Big(F + \sum_{j=1}^{\infty} \hat c_j G_j\Big).
\]
Since all of the eigenvalues of $F + \sum_{j=1}^{\infty} \hat c_j G_j$ have real parts strictly less than zero, there exists some $K$ after which all of the eigenvalues of $F + \sum_{j=1}^{N} c_j G_j$ have real parts strictly less than zero. So for some finite $K$, $N > K$ implies that $u_N$ is stabilizing on $\Omega$.
To show that $u_N \in \mathcal{A}_l(\Omega)$ we must show that
\[
J(x, u_N) \triangleq \int_0^\infty l(\varphi(t; x, u_N)) + \|u_N(\varphi(t; x, u_N))\|_R^2\,dt < \infty \qquad \forall x \in \Omega.
\]
But since the eigenvalues of $\frac{\partial}{\partial x}(f + gu_N)(0)$ can be made arbitrarily close to the eigenvalues of $\frac{\partial}{\partial x}(f + g\hat u)(0)$, the decay rates of $f + gu_N$ and $f + g\hat u$ are of the same order in a region close to zero. So (see remark 3.1.8)
\[
J(x, \hat u) < \infty \implies J(x, u_N) < \infty \implies u_N \in \mathcal{A}_l(\Omega).
\]
\[
u_N^{(i)} = -\frac{1}{2} R^{-1} g^T \frac{\partial V_N^{(i-1)}}{\partial x}.
\]
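For reference, one pass of this update can be sketched as follows for a scalar system. The function `solve_galerkin_ghjb` is a hypothetical placeholder for the Galerkin solve of chapter 4, not an implementation from the thesis.

```python
# Skeleton of the update u_N^(i) = -1/2 R^{-1} g^T dV_N^(i-1)/dx (scalar case).
def update_control(c, dphi, g, R):
    """Given coefficients c of V_N = sum_j c_j phi_j, return the feedback u_N."""
    def u(x):
        dV = sum(cj * dp(x) for cj, dp in zip(c, dphi))   # dV_N/dx at x
        return -0.5 * (1.0 / R) * g(x) * dV
    return u

# One iteration of the combined algorithm, conceptually:
#   c_i = solve_galerkin_ghjb(u_i)            # placeholder: linear algebraic solve
#   u_next = update_control(c_i, dphi, g, R)  # the update displayed above
```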
$V^{(i)}$ is the value function from the successive approximation algorithm, $\hat V^{(i)}$ is the actual solution of the GHJB equation when the approximate control $u_N$ is used (note that $\hat u^{(i)}$ is only used in the analysis), and $V_N^{(i)}$ is the approximate solution of the GHJB equation when the approximate control is used.
We will show that the following assumptions are sufficient for convergence and stability.
A5.1 $V^{(i)} \in \mathrm{span}\{\phi_j\}_{j=1}^\infty \subseteq L_2(\Omega)$,
A5.2 $u^{(0)} \in \mathcal{A}_{l,s}(D)$,
A5.3 $\Omega$ is compact,
A5.4 $\frac{\partial \phi_j^T}{\partial x}(f + gu^{(0)})$, $\|u^{(0)}\|_R^2$, $l$, $s$, $\frac{\partial \phi_j^T}{\partial x}\, g R^{-1} g^T \frac{\partial \phi_k}{\partial x}$ are continuous and in $\mathrm{span}\{\phi_j\}_1^\infty$,
A5.5 $|c_j^{(i)}(t)|$ is uniformly bounded as $N \to \infty$ ($c_j^{(i)}$ depends on $N$),
A5.6 $\sum_{j=1}^{\infty} \big\langle l + \|u_N^{(i)}\|_R^2,\,\phi_j\big\rangle\,\phi_j(x) \in PD(\Omega)$,
A5.7 for $N$ sufficiently large, $\sum_{j=1}^{\infty} \big\langle \frac{\partial V_N^{(i)T}}{\partial x}(f + gu_N^{(i)}),\,\phi_j\big\rangle\,\phi_j(x) \in PD(\Omega)$,
A5.8 $\sum_{j=1}^{\infty} \langle s,\,\phi_j\rangle\,\phi_j(x) \in PD(\Omega)$,
A5.9 $\sum_{j=1}^{\infty} \hat c_j(t)\,\big\langle \frac{\partial \phi_j^T}{\partial x}(f + gu_N^{(i)})(t),\,\Phi_N\big\rangle \in PD([t_0, t_f])$,
A5.10 $\sum_{j=1}^{\infty} \hat c_j R^{-1} g^T \frac{\partial \phi_j}{\partial x} \in PD(\Omega)$,
A5.11 $\big\{\|g(0) R^{-1} g^T(0) \frac{\partial^2 \phi_j}{\partial x^2}(0)\|_2\big\}_{j=1}^\infty$ is uniformly bounded.
If $\{\phi_j\}_1^\infty$ are uniformly bounded, then $|\phi_j(x)| \le M$ for all $x \in \Omega$ and $j \ge 0$. Therefore
\[
\big\|V^{(0)} - V_N^{(0)}\big\|^2
\le \big\|c_N^{(0)} - \hat c_N^{(0)}\big\|_2^2 \sum_{j=1}^{N} |\phi_j(x)|^2 + \sum_{j=N+1}^{\infty} \big|\hat c_j^{(0)}\big|^2 |\phi_j(x)|^2
\]
\[
\le M^2 \big\|c_N^{(0)} - \hat c_N^{(0)}\big\|_2^2 + M^2 \sum_{j=N+1}^{\infty} \big|\hat c_j^{(0)}\big|^2
\;\to\; 0 \ \text{uniformly}.
\]
Lemma 5.2.18 and lemma 5.2.19 imply that $u_N^{(1)} \in \mathcal{A}_{l,s}(D)$. Lemma 5.2.17 implies that
\[
\sup_D \big\|u_N^{(1)} - u^{(1)}\big\|_R \to 0.
\]
Induction Step: Assume that
1. $\big\|V_N^{(i-1)} - V^{(i-1)}\big\|_{L_2(\Omega)} \to 0$,
2. $\sup_D \big\|u_N^{(i)} - u^{(i)}\big\| \to 0$.
By the same arguments used to establish the basis step we have that
\[
\big\|V_N^{(i)} - \hat V^{(i)}\big\|_{L_2(\Omega)} \to 0,
\]
where the convergence is uniform if $\{\phi_j\}$ are uniformly bounded.
By the induction step we know that $\|u_N^{(i)} - u^{(i)}\| \to 0$ uniformly on $D$. We will show that this implies that $\hat V^{(i)} \to V^{(i)}$ uniformly on $D$. For the finite-time problem we have that $\forall t \in [t_0, t_f)$
\[
\hat V^{(i)}(t,x) = \int_{t_0}^{t} l(\varphi(\tau; t_0, x, u_N^{(i)})) + \big\|u_N^{(i)}(\varphi(\tau; t_0, x, u_N^{(i)}))\big\|_R^2\,d\tau,
\]
\[
V^{(i)}(t,x) = \int_{t_0}^{t} l(\varphi(\tau; t_0, x, u^{(i)})) + \big\|u^{(i)}(\varphi(\tau; t_0, x, u^{(i)}))\big\|_R^2\,d\tau.
\]
Since $\varphi(t; t_0, x, w)$ depends continuously on $w$, $\hat V^{(i)}$ can be made uniformly close to $V^{(i)}$ by making $u_N^{(i)}$ uniformly close to $u^{(i)}$. For infinite time we note from the proof of lemma 5.2.19 that close to the origin, the decay rates of $u_N^{(i)}$ and $u^{(i)}$ can be made arbitrarily close. Therefore, there is some $t$ after which the integral
\[
\int_t^\infty l(\varphi(\tau; x, u_N^{(i)})) + \big\|u_N^{(i)}(\varphi(\tau; x, u_N^{(i)}))\big\|_R^2\,d\tau
\]
is uniformly close to
\[
\int_t^\infty l(\varphi(\tau; x, u^{(i)})) + \big\|u^{(i)}(\varphi(\tau; x, u^{(i)}))\big\|_R^2\,d\tau.
\]
From the continuity of $\varphi(t; x, u)$ with respect to $u$, we can also make the integrals from $0$ to $t$ arbitrarily close, which proves the result.
The admissibility of $u_N^{(i+1)}$ follows from the admissibility of $u^{(i)}$, via theorem 5.3.1, lemma 5.2.18 and lemma 5.2.19.
To complete the proof we must show that
\[
\sup_D \big\|u^{(i+1)} - u_N^{(i+1)}\big\| \to 0.
\]
By the triangle inequality we have that
\[
\sup_D \big\|u^{(i+1)} - u_N^{(i+1)}\big\|
\le \sup_D \big\|u^{(i+1)} - \hat u^{(i+1)}\big\| + \sup_D \big\|\hat u^{(i+1)} - u_N^{(i+1)}\big\|.
\]
Lemma 5.2.17 implies that $\sup_D \|\hat u^{(i+1)} - u_N^{(i+1)}\| \to 0$, so the proof reduces to showing that
\[
\sup_D \big\|u^{(i+1)} - \hat u^{(i+1)}\big\|
= \sup_D \Big\|{-\frac{1}{2}} R^{-1} g^T \frac{\partial (V^{(i)} - \hat V^{(i)})}{\partial x}\Big\| \to 0,
\]
given that $\sup_D \|u_N^{(i)} - u^{(i)}\| \to 0$, where $V^{(i)} = \sum_{j=1}^{\infty} b_j^{(i)}\phi_j$ satisfies
uniformly bounded on $D$. Taking the supremum over $D$ of both sides and applying the induction hypothesis, we obtain
\[
\sup_D \Big|\frac{\partial (V^{(i)} - \hat V^{(i)})^T}{\partial x}(f + gu^{(i)})\Big| \to 0
\implies \sup_D \Big|\sum_{j=1}^{\infty} (b_j^{(i)} - \hat c_j^{(i)})\,\frac{\partial \phi_j^T}{\partial x}(f + gu^{(i)})\Big| \to 0.
\]
($\hat c_j^{(i)}$ depends on $N$ through $u_N$.) By the definition of the inner product we see that
\[
\sup_D \Big|\sum_{j=1}^{\infty} (b_j^{(i)} - \hat c_j^{(i)})\,\frac{\partial \phi_j^T}{\partial x}(f + gu^{(i)})\Big| \to 0
\iff \sum_{j=1}^{\infty} (b_j^{(i)} - \hat c_j^{(i)})\,\frac{\partial \phi_j^T}{\partial x} \to
\]
method is shown to converge to the infimum rather than the optimal. In section 6.1.4 the method is used to find a feedback control for an inverted pendulum. Finally, in section 6.1.5 a finite-time problem is solved.
In the first three examples we will compare our control with the optimal. For one-dimensional systems the optimal control can be found directly by solving the HJB equation for $\frac{\partial V}{\partial x}$:
\[
\frac{\partial V}{\partial x} f(x) + l(x) - \frac{b^2}{4r}\Big(\frac{\partial V}{\partial x}\Big)^2 = 0,
\]
which gives
\[
\frac{\partial V}{\partial x} = \frac{2r}{b^2} f(x) + \sqrt{\frac{4r^2 f^2(x)}{b^4} + \frac{4r\,l(x)}{b^2}}.
\]
The optimal control law is therefore given by
\[
u^*(x) = -\frac{f(x)}{b} - \mathrm{sign}(bx)\sqrt{\frac{f^2(x)}{b^2} + \frac{l(x)}{r}}. \tag{6.2}
\]
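Equation (6.2) can be evaluated directly for any scalar problem data. A minimal sketch, with illustrative choices of $f$, $l$, $b$, $r$ (not one of the thesis's examples):

```python
import numpy as np

# Direct evaluation of the one-dimensional optimal control (6.2).
b, r = 1.0, 1.0
f = lambda x: x - x ** 3           # illustrative drift
l = lambda x: x ** 2               # illustrative state penalty

def u_star(x):
    """u*(x) = -f(x)/b - sign(bx) * sqrt(f(x)^2/b^2 + l(x)/r)."""
    return -f(x) / b - np.sign(b * x) * np.sqrt(f(x) ** 2 / b ** 2 + l(x) / r)

xs = np.linspace(-1.0, 1.0, 5)
print([round(u_star(x), 3) for x in xs])
```

Note that the closed-loop drift becomes $f + bu^* = -\mathrm{sign}(bx)\sqrt{f^2 + (b^2/r)\,l}$ (for $b > 0$), which is negative for $x > 0$ and positive for $x < 0$, so the sign convention in (6.2) is the stabilizing root.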
[Figure: left panel, cost vs. iteration for N = 1, 2, 3, 4 compared with the optimal cost; right panel, control vs. state for j = 1, 2, 3 compared with the optimal control.]
Figure 6.1: Cost and control for linear system with non-quadratic cost.
are arbitrarily close to the infimum. Also note that $u^*$ does not have a Taylor series expansion at $x = 0$, and so the method in (Garrard, 1969; Garrard, 1977; Garrard and Jordan, 1977) fails to be valid for this example. When the order of approximation is 4, we obtain the following control:
\[
u_4^{(1)} = -11.397x^2 + 26.316x^4 - 31.395x^6 + 13.4053x^8.
\]
In figure 6.2 we show the calculated control for iterations $i = 1, 2, 10$ versus the
[Figure 6.2: control vs. state for (a) N = 3 and (b) N = 14, showing the iterates $u^{(0)}$, $u^{(1)}$, $u^{(2)}$, $u^{(14)}$ and the optimal control $u^*$.]
Figure 6.3 shows the results of applying our algorithm to this system. The cost in these figures is computed from an initial position of $x_1(0) \in [-\frac{3\pi}{4}, \frac{3\pi}{4}]$ and an initial velocity of $x_2(0) = 0$. Figure 6.3 (a) shows the improvement in cost as the iteration $i$ is increased. Figure 6.3 (b) shows the improvement in cost as the approximation order $M$ is increased. Clearly, an approximation order of $M = 4$ (i.e., cubic terms in the control) is sufficient for this example.
[Figure 6.3: (a) cost vs. position $x_1$ comparing $V^{(0)}$ and $V_6^{(\infty)}$; (b) cost vs. position $x_1$ at $i = \infty$ for approximation orders $M = 4, 6, 8$, compared with $V^{(0)}$.]
[Figure: control gains vs. time for $\rho = 100$ and $\rho = 350$.]
[Figure: state trajectories vs. time for $(t_f = 2,\, P = 5)$ and $(t_f = 1,\, P = 10)$.]
where $u$ and $v$ are control variables. A famous theorem by Brockett states that this system cannot be stabilized by a continuous, constant state feedback (Brockett, 1983). To control the system, various methods have been proposed using discontinuous and/or time-varying feedback. A survey of the field of stabilization of nonholonomic systems can be found in (Samson, 1995; Sørdalen and Egeland, 1995).
We will use the finite-time version of our algorithm to find a (continuous) time-varying feedback control law for the system. Repeated application of the control law results in a piecewise continuous periodic controller. The cost function for the system will be
\[
J = \rho_1 \|\varphi(t_f; 0, x, u)\|^2 + \int_0^{t_f} \rho_2 \|\varphi(\tau; 0, x, u)\|^2 + \|u(\tau, \varphi(\tau; 0, x, u))\|^2\,d\tau,
\]
where $\rho_1$ and $\rho_2$ are parameters. The set $\Omega$ was chosen somewhat arbitrarily to be $\Omega = [-2,2] \times [-2,2] \times [-2,2]$. For basis functions we use even polynomials in $x$, $y$ and $z$ up to order six, deleting terms which integrated to zero due to symmetry. We had $N = 19$ basis functions. For certain values of $\rho_1$ and $\rho_2$ the projected HJB equation (4.13) escapes in finite time. Various values of $\rho_1$ and $\rho_2$, along with their admissibility or inadmissibility, are shown in the table below.
  $\rho_1$   $\rho_2$   control status   Plot
  0          1          admissible       figure 6.6 (a)
  1          0          finite-escape
  1          1          admissible       figure 6.6 (b)
  5          1          finite-escape
  5          5          admissible       figure 6.6 (c)
  10         10         finite-escape
  1          10         admissible       figure 6.6 (d)
  1          100        admissible       figure 6.6 (e)
  1          1000       admissible       figure 6.6 (f)
  10         100        admissible
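For reference, the cost $J$ above can be approximated by forward simulation. The sketch below assumes the standard nonholonomic-integrator dynamics $\dot x = u$, $\dot y = v$, $\dot z = xv - yu$ (the form of the system addressed by Brockett's theorem; the thesis states the dynamics earlier), and uses a placeholder linear feedback rather than the computed time-varying control:

```python
import numpy as np

# J = rho1*||phi(tf)||^2 + int_0^tf rho2*||phi||^2 + ||u||^2 dt, forward Euler.
def J(x0, controller, rho1, rho2, tf=30.0, steps=30000):
    dt = tf / steps
    s = np.array(x0, dtype=float)
    cost = 0.0
    for k in range(steps):
        u, v = controller(k * dt, s)
        cost += (rho2 * (s @ s) + u * u + v * v) * dt   # running cost
        x, y, z = s
        s += np.array([u, v, x * v - y * u]) * dt       # integrator dynamics
    return rho1 * (s @ s) + cost

# Example call with a (non-stabilizing) placeholder feedback.
lin = lambda t, s: (-s[0], -s[1])
print(J([1.0, 1.0, 1.0], lin, rho1=1.0, rho2=1.0))
```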
The admissible gains associated with this table are shown in figure 6.6 for a final time of $t_f = 30$ seconds. We can see that the final-state weighting $\rho_1$ directly affects the admissibility of the control. When $\rho_1$ is large then the control gains must
[Figure 6.6: control gains vs. time for (a) $\rho = (0,1)^T$, (b) $\rho = (1,1)^T$, (c) $\rho = (5,5)^T$, (d) $\rho = (1,10)^T$, (e) $\rho = (1,100)^T$, (f) $\rho = (1,1000)^T$.]
[Figure: closed-loop state trajectories vs. time.]
[Figure: (c) $V_g - V_4^{(\infty)}$ vs. state $x_1$.]
[Figure: cost vs. position, comparing $V_r$ with $V_6^{(\infty)}$ (left panel: position $x_1$; right panel: position $x_1 = x_2$).]
[Figure: cost vs. $x_1$, comparing $V_{fl}$ with $V_6^{(\infty)}$.]
where
$\delta(t) \triangleq$ the power angle,
$\omega(t) \triangleq$ the relative speed,
$k_c \triangleq$ the gain of the excitation amplifier,
$u_f(t) \triangleq$ the input of the SCR amplifier of the generator,
$D \triangleq$ per unit damping constant,
$H \triangleq$ per unit inertia constant,
$\omega_0 \triangleq$ synchronous machine speed,
$P_m \triangleq$ the mechanical input power,
$P_e \triangleq$ the active electrical power delivered by the generator,
$T_{d0} \triangleq$ direct axis transient short circuit time constant,
$V_s \triangleq$ infinite bus voltage,
$x_d \triangleq$ direct axis reactance of the generator,
$x'_d \triangleq$ direct axis transient reactance of the generator,
$T'_{d0} \triangleq \frac{x'_{ds}}{x_{ds}}\,T_{d0}$.
We first assume that $P_m(t)$ is constant and that we wish to drive $P_e(t) \to P_m$. Define the following constants:
\[
a_1 = -\frac{D}{2H}, \qquad
a_2 = \frac{\omega_0}{2H}, \qquad
a_3 = -\frac{1}{T'_{d0}}, \qquad
a_4 = \frac{V_s^2 (x_d - x'_d)}{x_{ds}\, x'_{ds}}, \qquad
a_5 = \frac{k_c V_s}{T'_{d0}\, x'_{ds}}.
\]
[Figure: single-machine infinite-bus system; generator, transformer, parallel transmission lines $X_L$, breaker, and fault location.]
Rearranging the equations gives
\[
\dot\delta = \omega(t),
\]
\[
\dot\omega = a_1 \omega(t) + a_2 \big(P_m - P_e(t)\big),
\]
\[
\dot P_e = a_3 P_e(t) + \omega(t) P_e(t)\cot(\delta(t)) + a_4 \omega(t)\sin^2(\delta(t)) + a_5 \sin(\delta(t))\,u_f.
\]
The objective of the control is to shape $u_f$ to drive
\[
\begin{pmatrix} \delta(t) \\ \omega(t) \\ P_e(t) \end{pmatrix}
\;\to\;
\begin{pmatrix} \delta_0 \\ 0 \\ P_m \end{pmatrix}.
\]
To put the equations in regulator form, we make the following change of variables:
\[
x = \delta - \delta_0, \qquad y = \omega, \qquad z = P_e - P_m.
\]
Letting
\[
u_f = \frac{u - a_3 P_m}{a_5 \sin(x + \delta_0)},
\]
the system equations become
\[
\dot x = y,
\]
\[
\dot y = a_1 y - a_2 z,
\]
\[
\dot z = a_3 z + y(z + P_m)\cot(x + \delta_0) + a_4 y \sin^2(x + \delta_0) + u.
\]
The equations are now in the form required by algorithm 4.3.2. For the cost function we arbitrarily selected quadratic weightings on the states and control:
\[
J = \int_0^\infty (x^2 + y^2 + z^2 + u^2)\,dt.
\]
Using feedback linearization, an initial stabilizing control is (Wang et al., 1993)
\[
u_{fl} = -y(z + P_m)\cot(x + \delta_0) - a_4 y \sin^2(x + \delta_0) + k_1 x + k_2 y + k_3 z,
\]
where $k_1$, $k_2$, $k_3$ are the Kalman gains associated with the linear system
\[
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & a_1 & -a_2 \\ 0 & 0 & a_3 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad
Q = I(3), \qquad R = 1.
\]
Therefore the initial control is
\[
u_{fl} = -y(z + P_m)\cot(x + \delta_0) - a_4 y \sin^2(x + \delta_0) + x + 1.0725y - 9.6993z.
\]
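Gains of this kind can be computed by solving the algebraic Riccati equation for the $(A, B, Q, R)$ above. In the sketch below the numerical values of $a_1$, $a_2$, $a_3$ are placeholders, since the actual machine constants are given earlier in the chapter:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder machine constants (assumptions for illustration only).
a1, a2, a3 = -0.5, 10.0, -2.0
A = np.array([[0.0, 1.0,  0.0],
              [0.0, a1,  -a2],
              [0.0, 0.0,  a3]])
B = np.array([[0.0], [0.0], [1.0]])
Q = np.eye(3)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)     # solve the Riccati equation
K = np.linalg.solve(R, B.T @ P)          # LQR gains K = R^{-1} B^T P
print("gains k1, k2, k3 =", K.ravel())
```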
For the basis functions we use polynomials up to degree 4:
\[
\{\phi_j\} = \{z^2,\, yz,\, y^2,\, xz,\, xy,\, x^2,\, z^4,\, yz^3,\, y^2z^2,\, y^3z,\, y^4,\, xz^3,\, xyz^2,\, xy^2z,\, xy^3,\, x^2z^2,\, x^2yz,\, x^2y^2,\, x^3z,\, x^3y,\, x^4\}.
\]
The set $\Omega$ is selected by observing the uncontrolled dynamics (which are stable) and bounding the state response. The following bounds were selected:
\[
40^\circ \le \delta \le 140^\circ, \qquad
0.98 \le \omega \le 1.02, \qquad
0 \le P_e \le 1.5.
\]
When algorithm 4.3.2 is applied to the system, the following control is calculated for $i = 10$:
\[
u = -5.2205z + 0.2407y + 1.1936x - 6.8881z^3 + 3.6315z^2y - 0.9524zy^2 + 0.0987y^3 - 2.1929z^2x + 0.7961zxy - 0.01972xy^2 - 2.18995zx^2 + 0.4273x^2y - 0.05338x^3.
\]
The physical limits of the plant saturate the control variable between
\[
-5 \le u \le 5.
\]
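For implementation purposes, the reported control is a fixed polynomial in $(x, y, z)$ followed by the actuator saturation. The following is a direct transcription of the printed coefficients:

```python
import numpy as np

# The i = 10 control reported above, with the +/-5 actuator saturation.
def u(x, y, z):
    val = (-5.2205 * z + 0.2407 * y + 1.1936 * x
           - 6.8881 * z ** 3 + 3.6315 * z ** 2 * y - 0.9524 * z * y ** 2
           + 0.0987 * y ** 3 - 2.1929 * z ** 2 * x + 0.7961 * z * x * y
           - 0.01972 * x * y ** 2 - 2.18995 * z * x ** 2
           + 0.4273 * x ** 2 * y - 0.05338 * x ** 3)
    return np.clip(val, -5.0, 5.0)       # physical limits of the plant

print(u(0.1, 0.01, 0.2))
```

Because the control is a short polynomial, it can be evaluated in a few multiply-adds, which is what makes hardware implementation at real-time rates straightforward.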
In figure 6.13 we show the response of the system to a fault occurring between $0.55 \le t \le 0.7$ seconds. In the figure the dotted line represents the uncontrolled system, the dashed line represents the system controlled by feedback linearization, and the solid line corresponds to the system under the control derived by our method.
The key variable is the angle $\delta$. Reduction in the maximum angle swing and good transient behavior in $\delta$ correspond to money saved by the power utility. It can be seen from figure 6.13 that the control derived by our method corresponds to a 10% reduction in the maximum power swing as compared to the control obtained by feedback linearization.
The time history of the control variable is shown in figure 6.14. This plot shows that the feedback linearizing control has several unwanted swings before the system
[Figure 6.13: fault response; panels show the angle $\delta$, speed $\omega$, $E_q$, $E'_q$, electrical power $P_e$, and terminal voltage $V_t$ vs. time.]
[Figure 6.14: time history of the control variable.]
- The controls computed by the method are robust in the same sense as the optimal control.
- All computations are performed off-line.
- Once a solution is found, the control can be implemented in hardware and run in real time.
- Coefficients for the state and control weighting functions can be taken outside the integral, so tuning the control through the penalty function becomes computationally fast.

Disadvantages:

- "The curse of dimensionality": $n$-dimensional integrals must be computed. This limits the method to systems with small state spaces.
- The control is given as a series of basis functions. Therefore the control is inherently complex.
7.2 Contributions
The main contributions can be summarized as follows.
1. In chapter 3 we formalized the notion of an admissible control for both the infinite-time horizon problem and the finite-time horizon problem. While admissibility is implicitly assumed in the literature, we have never seen it explicitly defined.
2. Attention was restricted to a compact set contained within the stability region
of an initial control. This places the performance index and control in an
inner product space allowing Galerkin's spectral method to be applied to the
problem.
3. The successive approximation algorithm was used to show that the optimal
control has the largest possible stability region of any admissible control.
4. Galerkin's spectral method was used to reduce the GHJB equation to a linear algebraic equation. The resulting approximation was then used in the successive approximation algorithm to develop a new algorithm that improves the performance of arbitrary stabilizing controls and converges to the solution of the HJB equation.
5. Sufficient conditions were developed to ensure that the dual approximation converges uniformly to the optimal solution.
6. For a high enough (but finite) order of approximation, the approximate control was shown to be admissible and to be robust in the same sense as the optimal control.
7. It was shown that the region of stability of the approximate control can be made equal to $\Omega$, the estimate of the stability region of the initial control. Therefore, in contrast to other methods that approximate the HJB equation, the approximate controls are (state) feedback controls with a well-defined, guaranteed region of attraction.
8. The algorithm was applied to a variety of systems (both infinite- and finite-time) and was shown to produce results that are comparable to other well-known approximation schemes, while being applicable to a much broader class of systems.
7.3 Future Work
Suggestions for future research are summarized below.
1. The convergence result in chapter 5 shows that for a fixed $i$, there is a $K$ after which the approximate control is admissible and close to the optimal. However, when we implement the algorithm we fix $N$ and increase $i$. Therefore we would like to know that as $i$ increases, $N$ is not required to increase. In other words, if $N_1$ works for $i = 1$, then we would like to know that $N_1$ works as $i \to \infty$. Simulations indicate that this result is true, and we leave it as a conjecture to be shown.
2. We have found that for finite-time problems it is much easier to compute the Galerkin approximation of the HJB equation directly, since it is as easy to numerically solve a nonlinear ordinary differential equation as a linear one. To justify this approach we need to show that $V_N \to V$ as $N \to \infty$. If we could place a bound on $N$ as $i$ increases, the proof should follow as a limiting case of the proof in chapter 5, but this needs to be investigated. We also need to investigate the existence and uniqueness of the ordinary differential equation generated by the Galerkin approximation of the HJB equation.
3. We would like to have error bounds on the solution when $i$ and $N$ are fixed.
4. Selection of basis functions is a major consideration in the approach. The quality of the control will be determined by the basis functions used. Polynomials seem to work well, but for a given system it would be nice to have some guidance in selecting a good basis.
5. The method uniformly approximates the Hamilton-Jacobi-Bellman equation. Since other Hamilton-Jacobi equations show up in various branches of nonlinear system theory, the results of this thesis should extend naturally to these problems, which include the following:
   - Nonlinear optimal control of stochastic systems,
   - Nonlinear $H_\infty$ optimal control,
   - Nonlinear estimation,
   - Control of nonlinear systems using output feedback,
   - Nonlinear differential games.
6. In its current framework, our algorithm cannot handle explicit constraints on the state and control variables. It may be possible to extend the method to the case where explicit constraints are placed on the control (e.g., $\|u\| \le 1$): rather than solving an explicit linear equation, we would solve a linear programming problem.
7.4 Conclusion
Given a nonlinear system, it is usually possible to find a feedback control law that renders the closed-loop system stable. This thesis provides a practical tool that enables a control engineer to advance the design by systematically incorporating system performance. It is hoped that the algorithm will assist practicing engineers in designing enhanced controls for a variety of nonlinear systems.
REFERENCES
Aganovic, Z. and Gajic, Z. (1994). The successive approximation procedure for finite-time optimal control of bilinear systems. IEEE Transactions on Automatic Control, AC-39(9):1932–1935.
Al'brekht, E. G. (1961). On the optimal stabilization of nonlinear systems. Journal of Applied Mathematics and Mechanics, 25(5):836–844.
Anderson, B. D. O. and Moore, J. B. (1971). Linear Optimal Control. Prentice-Hall, Englewood Cliffs, New Jersey.
Apostol, T. M. (1974). Mathematical Analysis. Addison Wesley.
Arnold, V. I. (1973). Ordinary Differential Equations. MIT Press.
Arnold, V. I. (1989). Mathematical Methods of Classical Mechanics. Springer Verlag.
Balaram, J. (1985). Suboptimal Control of Nonlinear Systems. PhD thesis, Rensselaer Polytechnic Institute, Troy, New York 12180.
Ball, J. A., Helton, J. W., and Walker, M. L. (1993). H-infinity control for nonlinear systems with output feedback. IEEE Transactions on Automatic Control, 38(4):548–559.
Baumann, W. T. and Rugh, W. J. (1986). Feedback control of nonlinear systems by extended linearization. IEEE Transactions on Automatic Control, 31(1):40–46.
Bellman, R. E. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.
Bertsekas, D. P. (1976). On error bounds for successive approximation methods. IEEE Transactions on Automatic Control, 21:394–396.
Bosarge, W. E., Johnson, O. G., McKnight, R. S., and Timlake, W. P. (1973). The Ritz-Galerkin procedure for nonlinear control problems. SIAM Journal of Numerical Analysis, 10(1):94–110.
Brockett, R. W. (1983). Asymptotic stability and feedback stabilization. In Millman, R. S. and Sussmann, H. J., editors, Differential Geometric Control Theory, pages 181–191. Birkhauser.
Bryson, A. E. and Ho, Y. C. (1975). Applied Optimal Control. Hemisphere, New York.
\[
\dot{\hat V}(t,x) = \frac{\partial \hat V}{\partial t} + \frac{\partial \hat V^T}{\partial x}(f + gu).
\]
minimum at the origin, hence $\frac{\partial \hat V}{\partial x}$ must vanish there; therefore $\hat u(t, 0) = 0$.
We now show that $\hat V(t,x)$ is a Lyapunov function for the system $(f, g, \hat u)$. $\hat V(t,x)$ is positive definite by lemma 5.2.1. Taking the derivative of $\hat V(t,x)$ along the system $(f, g, \hat u)$ and using the fact that $g^T \frac{\partial \hat V}{\partial x} = -2R\hat u$, we obtain
But
\[
\frac{\partial \hat V}{\partial t} + \frac{\partial \hat V^T}{\partial x} f(t,x)
= -\frac{\partial \hat V^T}{\partial x} g(t,x)\,u(t,x) - l(t,x) - \|u\|_R^2
= 2\hat u^T R u - l - \|u\|_R^2,
\]
i.e.,
\[
\frac{\partial \hat V}{\partial t} = -\frac{\partial \hat V^T}{\partial x}[f + gu] - l - \|u\|_R^2.
\]
Therefore
\[
\frac{\partial \hat V^T}{\partial x}\big(f + gD(\hat u)\big)
= -l - \|u - \hat u\|_R^2 - \|\hat u\|_R^2 + \frac{\partial \hat V^T}{\partial x}\, g D(\hat u) - \frac{\partial \hat V^T}{\partial x}\, g \hat u.
\]
Using the fact that $\frac{\partial \hat V^T}{\partial x}\, g = -2\hat u^T R$, the hypothesis gives that
\[
\frac{\partial \hat V^T}{\partial x}\big(f + gD(\hat u)\big) = -l - \|u - \hat u\|_R^2 - \|\hat u\|_R^2 < 0.
\]
\[
\Big|\sum_{j=K+1}^{\infty} c_j\phi_j(x)\Big|
+ \Big|\sum_{j=1}^{K} c_j\phi_j(x) - \sum_{j=1}^{K} c_j\phi_j(y)\Big|
+ \Big|\sum_{j=K+1}^{\infty} c_j\phi_j(y)\Big| < \epsilon.
\]
Therefore $W(x)$ is continuous.
(ii): $\sum_{j=K+1}^{\infty} c_j\phi_j(x) \to 0$ uniformly on $\Omega$ implies that $\forall x \in \Omega$ and $\forall \epsilon > 0$, $\exists m$ such that
\[
n > m \implies \Big|\sum_{j=k+n+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon.
\]
So for any $\delta > 0$,
\[
n > m \ \text{ and } \ \Big|\sum_{j=k+1}^{\infty} c_j\phi_j(x)\Big| < \delta \implies \Big|\sum_{j=k+n+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon.
\]
($\Leftarrow$): $\sum_{j=N+1}^{\infty} c_j\phi_j(x) \to 0$ pointwise on $\Omega$ implies that $\forall \epsilon > 0$, $\exists k(x)$ such that
\[
N \ge k(x) \implies \Big|\sum_{j=N+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon/3.
\]
Since $\sum_{j=1}^{N} c_j\phi_j(x)$ is uniformly continuous on $\Omega$, $\forall \epsilon > 0$, $\exists \delta''(N)$ such that
\[
|x - y| < \delta''(N) \implies \Big|\sum_{j=1}^{N} c_j\phi_j(x) - \sum_{j=1}^{N} c_j\phi_j(y)\Big| < \epsilon/3.
\]
\[
k(x_0) \ge k(x) \implies \Big|\sum_{j=k(x_0)+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon/3,
\]
\[
|x - x_0| < \delta(x_0) < \delta' \implies |W(x) - W(x_0)| < \epsilon/3.
\]
So $\forall x \in \Omega$ and $\forall \epsilon > 0$, $\exists k \in \{k(x_j)\}_{j=1}^{p}$ such that $\big|\sum_{j=k+1}^{\infty} c_j\phi_j(x)\big| < \delta$. From the hypothesis, $\exists m$ such that
\[
n > m \ \text{ and } \ \Big|\sum_{j=k+1}^{\infty} c_j\phi_j(x)\Big| < \delta \implies \Big|\sum_{j=n+k+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon.
\]
Let $K = \max_{1\le j\le p}\{m + k(x_j)\}$; then
\[
N \ge K \implies \Big|\sum_{j=N+1}^{\infty} c_j\phi_j(x)\Big| < \epsilon, \qquad \forall x \in \Omega.
\]
A.7 Proof of Theorem 5.3.1.
By lemma 5.2.1 and lemma 5.2.2 we know that there exists a unique, positive definite solution, $V^{(0)}$, to the equation $\mathrm{GHJB}(V^{(0)}, u^{(0)}) = 0$ with appropriate boundary conditions. From lemma 5.2.4 we have that $u^{(1)} \in \mathcal{A}_{l,s}(D)$ and that $V^{(1)}(t,x) \le V^{(0)}(t,x)$. By induction we have that $V^{(i)}(t,x) \le V^{(i-1)}(t,x) \le V^{(0)}(t,x)$ and $u^{(i)} \in \mathcal{A}_{l,s}(D)$. We can repeat the argument used in the proof of lemma 5.2.4 to show that $\forall i \ge 0$, $V^*(t,x) \le V^{(i)}(t,x)$. Hence for each $(t,x) \in D$, $V^{(i)}(t,x)$ is a monotonically decreasing sequence that is bounded below. Hence $V^{(i)}(t,x)$ converges to some $V^{(\infty)}(t,x)$. However, it is easy to verify that
\[
\mathrm{GHJB}\big(V^{(\infty)}, u^{(\infty)}\big) \equiv \mathrm{HJB}\big(V^{(\infty)}\big) \equiv 0
\]
with identical boundary conditions, so $V^{(\infty)}(t,x) \equiv V^*(t,x)$.
If $\Omega$ is a compact set, then uniform convergence follows from Dini's theorem of analysis (cf. (Apostol, 1974)).
APPENDIX B
GALERKIN'S METHOD
There are many results in the literature concerning Galerkin approximations, and sufficient conditions for general classes of equations are known. The difficulty is that the Hamilton-Jacobi-Bellman equation does not satisfy the appropriate conditions. In this section we review some of the results on Galerkin approximation and demonstrate the inadequacy of the current methods when applied to the GHJB equation.
Classical references on Galerkin's method can be found in (Kantorovich and Krylov, 1958; Mikhlin, 1964; Mikhlin and Smolitskiy, 1967). These sources show that Galerkin's method applied to the equation
\[
AV = b,
\]
where $A$ is a linear operator, converges if the operator is symmetric, positive definite, and positive bounded below. An operator is symmetric if for all $V, W \in D(A)$
\[
\langle AV, W\rangle = \langle V, AW\rangle;
\]
positive if for all $V \in D(A) \setminus \{0\}$
\[
\langle AV, V\rangle > 0;
\]
and positive bounded below if $\exists \gamma > 0$ (independent of $V$) such that
\[
\langle AV, V\rangle \ge \gamma \|V\|^2
\]
for all $V \in D(A)$.
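To make the setting concrete, the following sketch applies Galerkin's method to a linear operator that does satisfy these conditions: $A = -d^2/dx^2$ on $(0, \pi)$ with a sine basis, for which $A$ is symmetric and positive bounded below (unlike the GHJB operator). The right-hand side $b$ is an arbitrary illustrative choice.

```python
import numpy as np

# Galerkin's method for A V = b with A = -d^2/dx^2, phi_j(x) = sin(j x).
x = np.linspace(0.0, np.pi, 4001)
N = 6
phi = [np.sin((j + 1) * x) for j in range(N)]
Aphi = [(j + 1) ** 2 * np.sin((j + 1) * x) for j in range(N)]   # A phi_j
b = x * (np.pi - x)                                             # illustrative RHS

inner = lambda f, g: np.trapz(f * g, x)
K = np.array([[inner(Ap, p) for Ap in Aphi] for p in phi])      # <A phi_j, phi_i>
rhs = np.array([inner(b, p) for p in phi])
c = np.linalg.solve(K, rhs)                                     # Galerkin coefficients

V_N = sum(cj * pj for cj, pj in zip(c, phi))                    # approximate solution
```

For this operator the Galerkin matrix is diagonal and positive definite, which is exactly the structure the classical convergence theorems exploit and the GHJB operator lacks.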
These results are extended in (Petryshyn, 1965) so that $A$ only needs to be $K$-symmetric and $K$-positive bounded below, where $K$ is an operator that multiplies $W$ and $V$ in the equations above. In (Schultz, 1969a), the Galerkin approximation method is placed in a Hilbert space setting. The differential operators are required to be positive bounded below and symmetric. In these cases, it is shown that the Galerkin method converges and that for any finite subspace, the Galerkin method yields the best approximation from that subspace. Both linear and nonlinear operators are considered. A particular non-self-adjoint nonlinear operator is considered in (Schultz, 1969b); however, results do not exist for general non-self-adjoint and nonlinear operators. A general survey of results concerning the convergence of the Galerkin method, and error bounds, presented in the literature prior to 1972 is given in (Finlayson, 1972). A modern treatment is given in (Zeidler, 1990a) for linear operators and (Zeidler, 1990b) for nonlinear operators.
The following definition and result are given in (Zeidler, 1990a, p. 279).
Definition B.0.1 (Uniquely Approximation-Solvable) Let $X$ be a Hilbert
\[
\cdots = \sup_{\|V\| \neq 0} 4m^2 \cdots = \infty.
\]
\[
\cdots \implies 1 \le \lambda_j + \cdots \implies 2 \le \lambda_j + \cdots, \qquad j = 1, 2, \ldots
\]
$\langle \cdot\,, \Phi_N\rangle \triangleq$ vector or matrix of inner products (p. 33)
$u \triangleq$ an arbitrary admissible control (p. 43)
$V_N \triangleq$ solution to $\langle \mathrm{GHJB}(V_N, u),\, \Phi_N\rangle = 0$ (p. 43)
$\hat V \triangleq$ solution to $\mathrm{GHJB}(\hat V, u) = 0$ (p. 43)
$u_N \triangleq -\frac{1}{2} R^{-1} g^T \frac{\partial V_N}{\partial x}$ (p. 43)
$\hat u \triangleq -\frac{1}{2} R^{-1} g^T \frac{\partial \hat V}{\partial x}$ (p. 43)
$c_N \triangleq (c_1, \ldots, c_N)^T$ (p. 43)
$\hat c_N \triangleq (\hat c_1, \ldots, \hat c_N)^T$ (p. 43)
$PD(\Omega) \triangleq$ pointwise decreasing (p. 51)
$\hat V^{(i)} \triangleq$ solution of $\mathrm{GHJB}(\hat V^{(i)}, u_N^{(i)}) = 0$ (p. 63)