
OPTIMAL DESIGN OF CONTROL SYSTEMS
Stochastic and Deterministic Problems

PURE AND APPLIED MATHEMATICS

A Program of Monographs, Textbooks, and Lecture Notes

EXECUTIVE EDITORS

Earl J. Taft, Rutgers University, New Brunswick, New Jersey
Zuhair Nashed, University of Delaware, Newark, Delaware

EDITORIAL BOARD

M. S. Baouendi, University of California, San Diego
Jane Cronin, Rutgers University
Jack K. Hale, Georgia Institute of Technology
S. Kobayashi, University of California, Berkeley
Marvin Marcus, University of California, Santa Barbara
W. S. Massey, Yale University
Anil Nerode, Cornell University
Donald Passman, University of Wisconsin, Madison
Fred S. Roberts, Rutgers University
Gian-Carlo Rota, Massachusetts Institute of Technology
David L. Russell, Virginia Polytechnic Institute and State University
Walter Schempp, Universität Siegen
Mark Teply, University of Wisconsin, Milwaukee

MONOGRAPHS AND TEXTBOOKS IN
PURE AND APPLIED MATHEMATICS

K. Yano, Integral Formulas in Riemannian Geometry (1970)


S. Kobayashi, Hyperbolic Manifolds and Holomorphic Mappings (1970)
V. S. Vladimirov, Equations of Mathematical Physics (A. Jeffrey, ed.; A. Littlewood,
trans.) (1970)
B. N. Pshenichnyi, Necessary Conditions for an Extremum (L. Neustadt, translation
ed.; K. Makowski, trans.) (1971)
L. Narici et al., Functional Analysis and Valuation Theory (1971)
D. S. Passman, Infinite Group Rings (1971)
L. Dornhoff, Group Representation Theory. Part A: Ordinary Representation Theory.
Part B: Modular Representation Theory (1971, 1972)
W. Boothby and G. L. Weiss, eds., Symmetric Spaces (1972)
Y. Matsushima, Differentiable Manifolds (E. T. Kobayashi, trans.) (1972)
L. E. Ward, Jr., Topology (1972)
A. Babakhanian, Cohomological Methods in Group Theory (1972)
R. Gilmer, Multiplicative Ideal Theory (1972)
J. Yeh, Stochastic Processes and the Wiener Integral (1973)
J. Barros-Neto, Introduction to the Theory of Distributions (1973)
R. Larsen, Functional Analysis (1973)
K. Yano and S. Ishihara, Tangent and Cotangent Bundles (1973)
C. Procesi, Rings with Polynomial Identities (1973)
R. Hermann, Geometry, Physics, and Systems (1973)
N. R. Wallach, Harmonic Analysis on Homogeneous Spaces (1973)
J. Dieudonne, Introduction to the Theory of Formal Groups (1973)
I. Vaisman, Cohomology and Differential Forms (1973)
B.-Y. Chen, Geometry of Submanifolds (1973)
M. Marcus, Finite Dimensional Multilinear Algebra (in two parts) (1973, 1975)
R. Larsen, Banach Algebras (1973)
R. O. Kujala and A. L. Vitter, eds., Value Distribution Theory: Part A; Part B: Deficit
and Bezout Estimates by Wilhelm Stoll (1973)
K. B. Stolarsky, Algebraic Numbers and Diophantine Approximation (1974)
A. R. Magid, The Separable Galois Theory of Commutative Rings (1974)
B. R. McDonald, Finite Rings with Identity (1974)
I. Satake, Linear Algebra (S. Koh et al., trans.) (1975)
J. S. Golan, Localization of Noncommutative Rings (1975)
G. Klambauer, MathematicalAnalysis (1975)
M. K. Agoston, Algebraic Topology (1976)
K. R. Goodearl, Ring Theory (1976)
L. E. Mansfield, Linear Algebra with Geometric Applications (1976)
N. J. Pullman, Matrix Theory and Its Applications (1976)
B. R. McDonald, Geometric Algebra Over Local Rings (1976)
C. W. Groetsch, Generalized Inverses of Linear Operators (1977)
J. E. Kuczkowski and J. L. Gersting, Abstract Algebra (1977)
C. O. Christenson and W. L. Voxman, Aspects of Topology (1977)
M. Nagata, Field Theory (1977)
R. L. Long, Algebraic Number Theory (1977)
W. F. Pfeffer, Integrals and Measures (1977)
R. L. Wheeden and A. Zygmund, Measure and Integral (1977)
J. H. Curtiss, Introduction to Functions of a Complex Variable (1978)
K. Hrbacek and T. Jech, Introduction to Set Theory (1978)
W. S. Massey, Homology and Cohomology Theory (1978)
M. Marcus, Introduction to Modern Algebra (1978)
E. C. Young, Vector and Tensor Analysis (1978)
S. B. Nadler, Jr., Hyperspaces of Sets (1978)
S. K. Sehgal, Topics in Group Rings (1978)
A. C. M. van Rooij, Non-Archimedean Functional Analysis (1978)
L. Corwin and R. Szczarba, Calculus in Vector Spaces (1979)
C. Sadosky, Interpolation of Operators and Singular Integrals (1979)
J. Cronin, Differential Equations (1980)
C. W. Groetsch, Elements of Applicable Functional Analysis (1980)
56. I. Vaisman, Foundations of Three-Dimensional Euclidean Geometry (1980)
57. H. I. Freedman, Deterministic Mathematical Models in Population Ecology (1980)
58. S. B. Chae, Lebesgue Integration (1980)
59. C. S. Rees et al., Theory and Applications of Fourier Analysis (1981)


60. L. Nachbin, Introduction to Functional Analysis (R. M. Aron, trans.) (1981)
61. G. Orzech and M. Orzech, Plane Algebraic Curves (1981)
62. R. Johnsonbaugh and W. E. Pfaffenberger, Foundations of Mathematical Analysis
(1981)
63. W. L. Voxman and R. H. Goetschel, Advanced Calculus (1981)
64. L. J. Corwin and R. H. Szczarba, Multivariable Calculus (1982)
65. V. I. Istratescu, Introduction to Linear Operator Theory (1981)
66. R. D. Jarvinen, Finite and Infinite Dimensional Linear Spaces (1981)
67. J. K. Beem and P. E. Ehrlich, Global Lorentzian Geometry (1981)
68. D. L. Armacost, The Structure of Locally Compact Abelian Groups (1981)
69. J. W. Brewer and M. K. Smith, eds., Emmy Noether: A Tribute (1981)
70. K. H. Kim, Boolean Matrix Theory and Applications (1982)
71. T. W. Wieting, The Mathematical Theory of Chromatic Plane Ornaments (1982)
72. D. B. Gauld, Differential Topology (1982)
73. R. L. Faber, Foundations of Euclidean and Non-Euclidean Geometry (1983)
74. M. Carmeli, Statistical Theory and Random Matrices (1983)
75. J. H. Carruth et al., The Theory of Topological Semigroups (1983)
76. R. L. Faber, Differential Geometry and Relativity Theory (1983)
77. S. Barnett, Polynomials and Linear Control Systems (1983)
78. G. Karpilovsky, Commutative Group Algebras (1983)
79. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1983)
80. I. Vaisman, A First Course in Differential Geometry (1984)
81. G. W. Swan, Applications of Optimal Control Theory in Biomedicine (1984)
82. T. Petrie and J. D. Randall, Transformation Groups on Manifolds (1984)
83. K. Goebel and S. Reich, Uniform Convexity, Hyperbolic Geometry, and Nonexpansive
Mappings (1984)
84. T. Albu and C. Nastasescu, Relative Finiteness in Module Theory (1984)
85. K. Hrbacek and T. Jech, Introduction to Set Theory: Second Edition (1984)
86. F. Van Oystaeyen and A. Verschoren, Relative Invariants of Rings (1984)
87. B. R. McDonald, Linear Algebra Over Commutative Rings (1984)
88. M. Namba, Geometry of Projective Algebraic Curves (1984)
89. G. F. Webb, Theory of Nonlinear Age-Dependent Population Dynamics (1985)
90. M. R. Bremner et al., Tables of Dominant Weight Multiplicities for Representations of
Simple Lie Algebras (1985)
91. A. E. Fekete, Real Linear Algebra (1985)
92. S. B. Chae, Holomorphy and Calculus in Normed Spaces (1985)
93. A. J. Jerri, Introduction to Integral Equations with Applications (1985)
94. G. Karpilovsky, Projective Representations of Finite Groups (1985)
95. L. Narici and E. Beckenstein, Topological Vector Spaces (1985)
96. J. Weeks, The Shape of Space (1985)
97. P. R. Gribik and K. O. Kortanek, Extremal Methods of Operations Research (1985)
98. J.-A. Chao and W. A. Woyczynski, eds., Probability Theory and Harmonic Analysis
(1986)
99. G. D. Crown et al., Abstract Algebra (1986)
100. J. H. Carruth et al., The Theory of Topological Semigroups, Volume 2 (1986)
101. R. S. Doran and V. A. Belfi, Characterizations of C*-Algebras (1986)
102. M. W. Jeter, Mathematical Programming (1986)
103. M. Altman, A Unified Theory of Nonlinear Operator and Evolution Equations with
Applications (1986)
104. A. Verschoren, Relative Invariants of Sheaves (1987)
105. R. A. Usmani, Applied Linear Algebra (1987)
106. P. Blass and J. Lang, Zariski Surfaces and Differential Equations in Characteristic p >
0 (1987)
107. J. A. Reneke et al., Structured Hereditary Systems (1987)
108. H. Busemann and B. B. Phadke, Spaces with Distinguished Geodesics (1987)
109. R. Harte, Invertibility and Singularity for Bounded Linear Operators (1988)
110. G. S. Ladde et al., Oscillation Theory of Differential Equations with Deviating Argu-
ments (1987)
111. L. Dudkin et al., Iterative Aggregation Theory (1987)
112. T. Okubo, Differential Geometry (1987)
113. D. L. Stancl and M. L. Stancl, Real Analysis with Point-Set Topology (1987)
T. C. Gard, Introduction to Stochastic Differential Equations (1988)
S. S. Abhyankar, Enumerative Combinatorics of Young Tableaux (1988)
H. Strade and R. Farnsteiner, Modular Lie Algebras and Their Representations (1988)
J. A. Huckaba, Commutative Rings with Zero Divisors (1988)
W. D. Wallis, Combinatorial Designs (1988)
W. Wieslaw, Topological Fields (1988)
G. Karpilovsky, Field Theory (1988)
S. Caenepeel and F. Van Oystaeyen, Brauer Groups and the Cohomology of Graded
Rings (1989)
W. Kozlowski, Modular Function Spaces (1988)
E. Lowen-Colebunders, Function Classes of Cauchy Continuous Maps (1989)
M. Pavel, Fundamentals of Pattern Recognition (1989)
V. Lakshmikantham et al., Stability Analysis of Nonlinear Systems (1989)
R. Sivaramakrishnan, The Classical Theory of Arithmetic Functions (1989)
N. A. Watson, Parabolic Equations on an Infinite Strip (1989)
K. J. Hastings, Introduction to the Mathematics of Operations Research (1989)
B. Fine, Algebraic Theory of the Bianchi Groups (1989)
D. N. Dikranjan et al., Topological Groups (1989)
J. C. Morgan II, Point Set Theory (1990)
P. Biler and A. Witkowski, Problems in Mathematical Analysis (1990)
H. J. Sussmann, Nonlinear Controllability and Optimal Control (1990)
J.-P. Florens et al., Elements of Bayesian Statistics (1990)
N. Shell, Topological Fields and Near Valuations (1990)
B. F. Doolin and C. F. Martin, Introduction to Differential Geometry for Engineers
(1990)
S. S. Holland, Jr., Applied Analysis by the Hilbert Space Method (1990)
J. Okninski, Semigroup Algebras (1990)
K. Zhu, Operator Theory in Function Spaces (1990)
G. B. Price, An Introduction to Multicomplex Spaces and Functions (1991)
R. B. Darst, Introduction to Linear Programming (1991)
P. L. Sachdev, Nonlinear Ordinary Differential Equations and Their Applications (1991)
T. Husain, Orthogonal Schauder Bases (1991)
J. Foran, Fundamentals of Real Analysis (1991)
W. C. Brown, Matrices and Vector Spaces (1991)
M. M. Rao and Z. D. Ren, Theory of Orlicz Spaces (1991)
J. S. Golan and T. Head, Modules and the Structures of Rings (1991)
C. Small, Arithmetic of Finite Fields (1991)
K. Yang, Complex Algebraic Geometry (1991)
D. G. Hoffman et al., Coding Theory (1991)
M. O. Gonzalez, Classical Complex Analysis (1992)
M. O. Gonzalez, Complex Analysis (1992)
L. W. Baggett, Functional Analysis (1992)
M. Sniedovich, Dynamic Programming (1992)
R. P. Agarwal, Difference Equations and Inequalities (1992)
C. Brezinski, Biorthogonality and Its Applications to Numerical Analysis (1992)
C. Swartz, An Introduction to Functional Analysis (1992)
S. B. Nadler, Jr., Continuum Theory (1992)
M. A. Al-Gwaiz, Theory of Distributions (1992)
E. Perry, Geometry: Axiomatic Developments with Problem Solving (1992)
E. Castillo and M. R. Ruiz-Cobo, Functional Equations and Modelling in Science and
Engineering (1992)
A. J. Jerri, Integral and Discrete Transforms with Applications and Error Analysis
(1992)
A. Charlier et al., Tensors and the Clifford Algebra (1992)
P. Biler and T. Nadzieja, Problems and Examples in Differential Equations (1992)
E. Hansen, Global Optimization Using Interval Analysis (1992)
S. Guerre-Delabriere, Classical Sequences in Banach Spaces (1992)
Y. C. Wong, Introductory Theory of Topological Vector Spaces (1992)
S. H. Kulkarni and B. V. Limaye, Real Function Algebras (1992)
W. C. Brown, Matrices Over Commutative Rings (1993)
J. Loustau and M. Dillon, Linear Geometry with Computer Graphics (1993)
W. V. Petryshyn, Approximation-Solvability of Nonlinear Functional and Differential
Equations (1993)
E. C. Young, Vector and Tensor Analysis: Second Edition (1993)
T. A. Bick, Elementary Boundary Value Problems (1993)
174. M. Pavel, Fundamentals of Pattern Recognition: Second Edition (1993)
175. S. A. Albeverio et al., Noncommutative Distributions (1993)
176. W. Fulks, Complex Variables (1993)
177. M. M. Rao, Conditional Measures and Applications (1993)
178. A. Janicki and A. Weron, Simulation and Chaotic Behavior of α-Stable Stochastic
Processes (1994)
179. P. Neittaanmaki and D. Tiba, Optimal Control of Nonlinear Parabolic Systems (1994)
180. J. Cronin, Differential Equations: Introduction and Qualitative Theory, Second Edition
(1994)
181. S. Heikkila and V. Lakshmikantham, Monotone Iterative Techniques for Discontinuous
Nonlinear Differential Equations (1994)
182. X. Mao, Exponential Stability of Stochastic Differential Equations (1994)
183. B. S. Thomson, Symmetric Properties of Real Functions (1994)
184. J. E. Rubio, Optimization and Nonstandard Analysis (1994)
185. J. L. Bueso et al., Compatibility, Stability, and Sheaves (1995)
186. A. N. Michel and K. Wang, Qualitative Theory of Dynamical Systems (1995)
187. M. R. Darnel, Theory of Lattice-Ordered Groups (1995)
188. Z. Naniewicz and P. D. Panagiotopoulos, Mathematical Theory of Hemivariational
Inequalities and Applications (1995)
189. L. J. Corwin and R. H. Szczarba, Calculus in Vector Spaces: Second Edition (1995)
190. L. H. Erbe et al., Oscillation Theory for Functional Differential Equations (1995)
191. S. Agaian et al., Binary Polynomial Transforms and Nonlinear Digital Filters (1995)
192. M. I. Gil', Norm Estimations for Operator-Valued Functions and Applications (1995)
193. P. A. Grillet, Semigroups: An Introduction to the Structure Theory (1995)
194. S. Kichenassamy, Nonlinear Wave Equations (1996)
195. V. F. Krotov, Global Methods in Optimal Control Theory (1996)
196. K. I. Beidar et al., Rings with Generalized Identities (1996)
197. V. I. Arnautov et al., Introduction to the Theory of Topological Rings and Modules
(1996)
G. Sierksma, Linear and Integer Programming (1996)
R. Lasser, Introduction to Fourier Series (1996)
V. Sima, Algorithms for Linear-Quadratic Optimization (1996)
D. Redmond, Number Theory (1996)
J. K. Beem et al., Global Lorentzian Geometry: Second Edition (1996)
M. Fontana et al., Prüfer Domains (1997)
H. Tanabe, Functional Analytic Methods for Partial Differential Equations (1997)
C. Q. Zhang, Integer Flows and Cycle Covers of Graphs (1997)
E. Spiegel and C. J. O'Donnell, Incidence Algebras (1997)
B. Jakubczyk and W. Respondek, Geometry of Feedback and Optimal Control (1998)
T. W. Haynes et al., Fundamentals of Domination in Graphs (1998)
T. W. Haynes et al., Domination in Graphs: Advanced Topics (1998)
L. A. D'Alotto et al., A Unified Signal Algebra Approach to Two-Dimensional Parallel
Digital Signal Processing (1998)
F. Halter-Koch, Ideal Systems (1998)
N. K. Govil et al., Approximation Theory (1998)
R. Cross, Multivalued Linear Operators (1998)
A. A. Martynyuk, Stability by Liapunov's Matrix Function Method with Applications
(1998)
A. Favini and A. Yagi, Degenerate Differential Equations in Banach Spaces (1999)
A. Illanes and S. Nadler, Jr., Hyperspaces: Fundamentals and Recent Advances
(1999)
G. Kato and D. Struppa, Fundamentals of Algebraic Microlocal Analysis (1999)
G. X.-Z. Yuan, KKM Theory and Applications in Nonlinear Analysis (1999)
D. Motreanu and N. H. Pavel, Tangency, Flow Invariance for Differential Equations,
and Optimization Problems (1999)
K. Hrbacek and T. Jech, Introduction to Set Theory, Third Edition (1999)
G. E. Kolosov, Optimal Design of Control Systems (1999)
A. I. Prilepko et al., Methods for Solving Inverse Problems in Mathematical Physics
(1999)

Additional Volumes in Preparation


OPTIMAL DESIGN OF
CONTROL SYSTEMS
Stochastic and
Deterministic Problems

G. E. Kolosov
Moscow University of
Electronics and Mathematics
Moscow, Russia

MARCEL DEKKER, INC.
Library of Congress Cataloging-in-Publication Data

Kolosov, G. E. (Gennadii Evgen'evich)
Optimal design of control systems: stochastic and deterministic problems / G. E.
Kolosov.
p. cm.- (Monographs and textbooks in pure and applied mathematics; 221)
Includes bibliographical references and index.
ISBN 0-8247-7537-6 (alk. paper)
1. Control theory. 2. Mathematical optimization. I. Title. II. Series.
QA402.3.K577 1999
629.8'312--dc21 99-30940
CIP

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution


Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web


http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more infor-
mation, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 1999 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording, or
by any information storage and retrieval system, without permission in writing from the
publisher.

Current printing (last digit):


10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA


PREFACE

The rise of optimal control theory is a remarkable example of interaction
between practical needs and mathematical theories.
Indeed, in the middle of this century the development of various auto-
matic control systems in technology and of systems for control of motion
of mechanical objects (in particular, of flying objects such as airplanes and
rockets) gave rise to specific mathematical problems concerned with finding
the conditional extremum of functions or functionals, which could not be
solved by means of the methods of classical mathematical analysis and the
calculus of variations.
Extreme urgency of these problems for practical needs stimulated the
efforts of mathematicians to develop methods for solving these new prob-
lems. At the end of the fifties and at the beginning of the sixties, these
efforts were crowned with success when new mathematical approaches such
as Pontryagin's maximum principle, Bellman's dynamic programming, and
linear and convex programming (developed somewhat earlier by L. Kan-
torovich, G. Dantzig, and others) were established. These new approaches
greatly affected the research carried out in control theory at that time. It
should be noted that these approaches have played a very important role
in the process of formation of optimal control theory as an independent
branch of science. One can say that the role of the maximum principle and
dynamic programming in the theory of optimal control is as significant as
that of Maxwell's equations in electromagnetic theory in physics.
Optimal control theory evolved most intensively at the end of the sixties
and during the seventies. This period showed a very high degree of coop-
eration and interaction between mathematicians and all those dealing with
applications of control theory in technology, mechanics, physics, chemistry,
biology, etc.
Later on, a gap between the purely mathematical and the practical ap-
proach to solving applied problems of optimal control began to emerge and
is now apparent. Although the appearance of this gap can be explained by
quite natural reasons, nevertheless, the further growth of this trend seems
to be undesirable. The author hopes that this book will to some extent
reduce the gap between these two branches of research.

This book is primarily intended for specialists dealing with applications
of control theory. It is well known that the use of such approaches as, say,
the maximum principle or dynamic programming often leads to optimal
control algorithms whose implementation for actual real-time plants en-
counters great (sometimes insurmountable) difficulties. This is the reason
that for solving control problems in practice one often employs methods
based on various simplifications and heuristic concepts. Naturally, this
results in losses in optimality but makes it possible to obtain control al-
gorithms that allow simple technological implementations. In some cases
the use of simplifications and heuristic concepts can also result in signif-
icant deviations of the system performance index from its optimal value
(Chapter VI).
In this book we describe ways for constructing simply realizable algo-
rithms of optimal (suboptimal) control, which are based on the dynamic
programming approach. These algorithms are derived on the basis of exact,
approximate analytical, or numerical solutions of differential and functional
Bellman equations corresponding to the control problems considered.
The book contains an introduction and seven chapters. Chapter I deals
with some general concepts of control theory and the description of math-
ematical approaches to solving problems of optimal control. We consider
both deterministic and stochastic models of controlled systems and discuss
the distinguishing features of stochastic models, which arise due to possible
ambiguous interpretation of solutions to stochastic differential equations
describing controlled systems with white noise disturbances.
We define the synthesis problem as the principal problem of optimal
control theory and give a general scheme of the dynamic programming ap-
proach. The Bellman equations for deterministic and stochastic control
problems (for Markov models and stochastic models with indirect obser-
vations) are studied. For problems with infinite horizon we introduce the
concept of stationary operating conditions, which is widely used in further
chapters of the book.
Exact methods of synthesis are considered in Chapter 11. We describe the
exceptional cases in which the Bellman equations have exact solutions, and
hence the optimal control algorithms can be obtained in explicit analytical
forms.
First (in §2.1), we briefly discuss some well-known results concerned with
solution of the so-called LQ-problems. Next, in §§2.2-2.4, we write exact so-
lutions for three specific problems of optimal control with bounded control
actions. We consider deterministic and stochastic problems of control of
the population size and the problem of constructing an optimal servomech-
anism. In these systems, the optimal controllers are of the "bang-bang"
form, and the switch point coordinates are given by finite formulas.

The following four chapters are devoted to the description of approxi-
mate methods for synthesis. In this case, the design of suboptimal control
systems is based, as a rule, on using the approximate solutions of the cor-
responding Bellman equations. To obtain these approximate solutions, we
mainly use various versions of small parameter methods or successive ap-
proximation procedures.
In Chapter III we study weakly controlled systems. We consider control
problems with bounded controls and assume that the values of admissible
control actions are small. This stipulates the appearance of a small param-
eter in the nonlinear term in the Bellman equation. This, in turn, makes it
possible to propose a natural successive approximation procedure for solv-
ing the Bellman equation, and thus the synthesis problem, approximately.
This procedure is a modification of the well-known Picard and Bellman
procedures which provide a way for obtaining approximate solutions of
nonlinear differential equations by solving a sequence of linear equations.
Chapter III is organized as follows. First (in §3.1), we describe the
general scheme of approximate synthesis for controlled systems under sta-
tionary operating conditions. Next (in §3.2), by using this general scheme,
we calculate a suboptimal controller for an oscillatory system with one de-
gree of freedom. Later (in §3.3 and §3.4), we generalize our approach to
nonstationary problems and to the case of correlated disturbances; then we
estimate the error obtained. In §3.5 we prove that the successive approx-
imation procedure in question converges asymptotically. Finally (in §3.6),
we apply this approach to an approximate design of a stochastic system
with distributed parameters.
Chapter IV is about stochastic controlled systems with noises of small
intensities. In this case, the diffusion terms in the Bellman equation con-
tain small coefficients. Under certain assumptions this allows us to replace
the initial stochastic problem by a sequence of auxiliary deterministic prob-
lems of optimal control whose solutions (i) can be calculated more easily
and (ii) give a way for designing suboptimal control systems (with respect
to the initial stochastic problem). This approach is used for calculating
suboptimal controllers for two specific servomechanisms.
In Chapter V we consider a class of controlled systems whose dynamics
are quasiharmonic. The trajectories of such systems are close to harmonic
oscillations, and this is the reason that the well-developed techniques of the
theory of nonlinear oscillations can be effectively applied for studying these
systems. By using polar coordinates as the phase variables, we describe
the system state in terms of slowly changing amplitude and phase. The
presence of a small parameter on the right-hand sides of the differential
equations for these variables allows us to elaborate different versions of
approximate solutions for the various problems of optimal control. These

solutions are based on the use of appropriate asymptotic expansions of the
performance index, the optimal control algorithm, etc., in powers of the
small parameter.
We illustrate these techniques by solving four specific problems of op-
timal damping of deterministic and stochastic oscillations in a biological
predator-prey system and in a mechanical system with oscillatory dynam-
ics.
In Chapter VI we discuss some special asymptotic methods of synthesis
which do not belong to the classes of control problems studied in Chap-
ters III-V. We consider the problems of control of plants with unknown
parameters (the adaptive control problems), in which the a priori uncer-
tainty of their values is small. In addition, we study stochastic control
problems with bounded phase variables and a problem of optimal control
of the population size whose behavior is governed by a stochastic logistic
equation with a large value of the medium capacity. We use small parameter
approaches for solving the problems mentioned above. For the construction
of suboptimal controls, we employ the asymptotic series expansions for the
loss functions and the optimal control algorithms. The error obtained is
estimated.
Numerical methods of synthesis are covered in the final Chapter VII.
We discuss the problem of the assignment of boundary conditions to grid
functions and propose some different schemes for solving specific problems
of optimal control. The numerical methods proposed are used for solving
specific synthesis problems.
The presentation of all the approaches studied in the book is accompa-
nied by numerous examples of actual control problems. All calculations
are carried out up to the accuracy level sufficient for comparatively simple
implementation of the optimal (suboptimal) algorithms obtained in actual
devices. In many cases, the algorithms are presented in the form of analog
circuits or flow charts.
The book can be helpful to students, postgraduate students, and special-
ists working in the field of automatic control and applied mathematics. The
book may be of interest to mechanical and electrical engineers, physicists
and biologists. Only knowledge of the foundations of probability theory is
required for assimilating the subject matter of the book.
The reader should be acquainted with basic notions of probability theory
such as random events and random variables, the probability distribution
function and the probability density of random variables, the mean value
of a random variable, mutually exclusive and independent random events and
variables, etc. It is not compulsory to know the foundations of the theory
of random processes, since Chapter I provides all necessary facts about the
methods for describing random processes that are encountered further in

the book. This makes the book accessible to a wide circle of students and
specialists who are interested in applications of optimal control theory.

The author's intention to write this book was supported by R. L. Stra-
tonovich, who was the supervisor of the author's Ph.D. thesis and for many
years, till his sudden death in 1997, remained the author's friend.
The author wishes to express his deep gratitude to V. B. Kolmanovskii,
R. S. Liptser, and all participants of the seminar "Stability and Control" at
the Moscow University of Electronics and Mathematics for useful remarks
and advice concerning the contents of this book.
The author's special thanks go to M. A. Shishkova for translating the
manuscript into English and keyboarding.

G. E. Kolosov
CONTENTS

Preface
Introduction

Chapter I. Synthesis Problems for Control Systems
and the Dynamic Programming Approach
1.1. Statement of synthesis problems for optimal control
systems
1.2. Differential equations for controlled systems with
random functions
1.3. Deterministic control problems. Formal scheme of
the dynamic programming approach
1.4. The Bellman equations for Markov controlled pro-
cesses
1.5. Sufficient coordinates in control problems with indi-
rect observations

Chapter II. Exact Methods for Synthesis Problems
2.1. Linear-quadratic problems of optimal control (LQ-
problems)
2.2. Problem of optimal tracking a wandering coordinate
2.3. Optimal control of the population size
2.4. Stochastic problem of optimal fisheries management

Chapter III. Approximate Synthesis of Stochastic Control Systems
with Small Control Actions
3.1. Approximate solution of stationary synthesis prob-
lems
3.2. Calculation of a quasioptimal regulator for the os-
cillatory plant

3.3. Synthesis of quasioptimal controls in the case of cor-


related noises
3.4. Nonstationary problems. Estimates of the quality
of approximate synthesis
3.5. Analysis of the asymptotic convergence of successive
approximations (3.0.6)-(3.0.8) as k → ∞
3.6. Approximate synthesis of some stochastic systems
with distributed parameters

Chapter IV. Synthesis of Quasioptimal Systems in the Case of
Small Diffusion Terms in the Bellman Equation
4.1. Approximate synthesis of a servomechanism with
small-intensity noise
4.2. Calculation of a quasioptimal system for tracking a
discrete Markov process

Chapter V. Control of Oscillatory Systems


5.1. Optimal control of a quasiharmonic oscillator. An
asymptotic synthesis method
5.2. Control of the "predator-prey" system. The case of
a poorly adapted predator
5.3. Optimal damping of random oscillations
5.4. Optimal control of quasiharmonic systems with noise
in the feedback circuit

Chapter VI. Some Special Applications of Asymptotic Synthesis Methods
6.1. Adaptive problems of optimal control
6.2. Some stochastic control problems with constrained
phase coordinates
6.3. Optimal control of the population size governed by
the stochastic logistic model

Chapter VII. Numerical Synthesis Methods


7.1. Numerical solution of the problem of optimal damp-
ing of random oscillations
7.2. Optimal control for the "predator-prey" system (the
general case)

Conclusion
References
Index
INTRODUCTION

The main problem of control theory can be formulated as follows.

In the design of control systems it is assumed that each control sys-
tem (see Fig. 1) consists of the following two principal parts (blocks or
subsystems): the subsystem P to be controlled (the plant) and the con-
trolling subsystem C (the controller). The plant P is a dynamical system
(mechanical, electrical, biological, etc.) whose behavior is described by a
well-known operator mapping the input (controlling) actions u(t) into the
output trajectories x(t). This operator can be defined by a system of ordi-
nary differential, functional, functional-differential, or integral equations or
by partial differential equations. It is important that the operator (or, in
technical terms, the structure or the construction) of the plant P is assumed
to be given and fixed from the outset.

As for the controller C, no preliminary restrictions are imposed on its
structure. This block must be constructed in such a way that the output
trajectories {x(t): 0 ≤ t ≤ T} (the case T = +∞ is not excluded) possess,
in a sense, sufficiently "good" properties.
Whether the trajectories are "good" or not depends on the specifications
imposed on the control system in question. These assumptions are often
stated by using the concept of a support (or standard) trajectory x̄(t), and
the control system itself is constructed so that the deviation |x(t) - x̄(t)|
on the time interval 0 ≤ t ≤ T does not exceed a value given in advance.
If the "quality" of an individual trajectory {x(t): 0 ≤ t ≤ T} can be es-
timated by the value of some functional I[x(t)] of this trajectory, then there
is a possibility to find an optimal trajectory x*(t) on which the functional

I[x(t)] attains its extremum value (in this case, the extremum type (mini-
mum or maximum) is determined by the character of the control problem).
The functional I[x(t)] used for estimating the control quality is often called
the optimality criterion or the performance index of the control system
designed.
If there are no random actions on the system, the problem of finding the
optimal trajectory x*(t) amounts to finding the optimal control program
{u*(t): 0 ≤ t ≤ T} that ensures the plant motion along the extremum tra-
jectory {x*(t): 0 ≤ t ≤ T}. The optimal control u*(t) can be calculated
by using methods of classical calculus of variations [64], or, in more general
situations, Pontryagin's maximum principle [156], or various approximate
methods [138] based on these two fundamental approaches. Different meth-
ods for calculating the optimal control programs are discussed in [137].
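To make the notion of an optimal control program concrete, here is a minimal numerical sketch in Python: for an assumed scalar plant ẋ = u, command signal y(t) = sin t, and a discretized quadratic criterion, the program {u(t_k)} is found by crude finite-difference gradient descent. The plant, signal, and all step sizes are illustrative assumptions, not taken from the book.

import numpy as np

# Assumed toy problem: plant x' = u, command y(t) = sin t, criterion
# I[u] = sum((x - y)^2 + a*u^2) * dt over a fixed interval [0, T].
T, N = 2.0, 40
dt = T / N
t = np.arange(N) * dt
y = np.sin(t)
a = 0.1

def cost(u, x0=0.0):
    x, J = x0, 0.0
    for k in range(N):
        J += ((x - y[k]) ** 2 + a * u[k] ** 2) * dt
        x += u[k] * dt                    # Euler step of the plant
    return J

u = np.zeros(N)                           # initial guess: zero control
for _ in range(200):                      # crude finite-difference descent
    J0, g, eps = cost(u), np.zeros(N), 1e-6
    for k in range(N):
        du = u.copy(); du[k] += eps
        g[k] = (cost(du) - J0) / eps
    u -= 0.5 * g
print("cost of the computed control program:", cost(u))

The result is an open-loop program u*(t_k); the synthesis methods discussed below produce feedback laws instead.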
If an optimal control system is constructed without considering stochas-
tic effects, then the system can be open (as in Fig. 1), since the plant tra-
jectory {x(t): 0 ≤ t ≤ T} and hence the value of the optimality criterion
I[x(t)] are determined uniquely for a chosen realization {u(t): 0 ≤ t ≤ T} of
control actions. (Needless to say, the equation of the plant is assumed
to have a unique solution for a given initial state x(0) = x_0 and a given
input function u(t).)

The situation is different if the system is subject to noncontrolled ran-
dom actions. In this case, to obtain an effective control, one needs some
information about the actual current state x(t) of the plant, that is, the
optimal system must be a closed-loop (or feedback) system. For example,
all servomechanisms are designed according to this principle (see Fig. 2).
In this case, in addition to the operator of the plant P, it is necessary to
take into account the properties of a source of information, which deter-
mines the required value y(t) of the output parameter vector x(t) at each
instant t (examples of specific servomechanisms can be found in [2, 20,
38, 50]). The block C measures the current values of the input y(t) and
output x(t) variables and forms controlling actions in the form of the func-
tional u(t) = φ(y_0^t, x_0^t) of the observed trajectories y_0^t = {y(s): 0 ≤ s ≤ t},
x_0^t = {x(s): 0 ≤ s ≤ t} so that the equality x(t) ≡ y(t) holds, if possible,
for 0 ≤ t ≤ T. However, the stochastic nature of the assigning action (com-
mand signal) y(t) on one side, and the inertial properties of the plant P on
the other side do not allow one to ensure the required identity between the in-
put and output parameters. Therefore, a problem of optimal control arises
in a natural way.
Hence, just as in the deterministic case, the optimality criterion I[|y(t) -
x(t)|] is introduced, which is a measure of the "distance" between the func-
tions y(t) and x(t) on the time interval 0 ≤ t ≤ T. The final statement
of the problem depends on the type of assumptions on the properties of
the assigning action y(t). Throughout this book, we use the probability
description for all random actions on the system. This means that all as-
signing actions are treated as random functions with known (completely
or partially) probability characteristics. In this approach, the optimal con-
trol law that determines the structure of the block C can be found from
the condition that the mean value of the criterion I[|y(t) - x(t)|] attains
its minimum. Another approach in which the regions of admissible values
of perturbations rather than their probability characteristics are specified
and the optimal system is constructed by methods of the game theory is
described in [23, 114, 115, 145, 195].
If the servomechanism shown in Fig. 2 is significantly affected by noises
arising due to measurement errors, instability of voltage sources in electri-
cal circuits, varying properties of the medium surrounding the automatic
system, then the block diagram in Fig. 2 becomes more complicated and
can be of the form shown in Fig. 3.

Here ζ(t) and η(t) denote random perturbations distorting information
on the command signal y(t) and the state x(t) of the plant to be controlled;
the random function ξ(t) describes the perturbing actions on the plant P.
By '1' and '2' we denote the blocks in which useful signals and noises are
combined. It is usually assumed that the structure of such blocks is known.

In this book we do not consider control systems whose block diagrams are
more complicated than that shown in Fig. 3. All control systems studied
in the sequel are special cases of the system shown in Fig. 3.
The main emphasis of this book is on the methods for calculating the
optimal control algorithms

u(t) = φ*(t, x̃_0^t, ỹ_0^t), (*)

which determine the structure of the controller C and guarantee the optimal
behavior of the feedback control system shown in Fig. 3. Since the methods
studied in this book are oriented to solving applied control problems in
mechanics, engineering, and biology, much attention is paid to obtaining
(*) in a form such that it can easily be used in practice. This means that all
optimal control algorithms described in the book for specific problems are
such that the functional (mapping) φ* in (*) has either a finite analytic form
or can be implemented by sufficiently simple standard modeling methods.
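As a simple example of a control algorithm in finite analytic form, one can take the classical time-optimal feedback for the double integrator with bounded control; this is a standard textbook result, quoted here only to show how directly such a mapping can be implemented.

import numpy as np

# Classical bang-bang synthesis for x1' = x2, x2' = u, |u| <= u0:
# the time-optimal feedback switches on the curve x1 = -x2*|x2|/(2*u0).
u0 = 1.0

def phi(x1, x2):
    s = x1 + x2 * abs(x2) / (2.0 * u0)    # switching function
    if s != 0.0:
        return -u0 if s > 0.0 else u0
    return -u0 * np.sign(x2)              # on the curve: ride it to the origin

# One trajectory driven by this feedback law:
x1, x2, dt = 2.0, 0.0, 1e-3
for _ in range(5000):
    u = phi(x1, x2)
    x1, x2 = x1 + x2 * dt, x2 + u * dt
print(x1, x2)   # both close to zero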
From the mathematical viewpoint, all problems of optimal control are
related to finding a conditional extremum of a functional (the optimal-
ity criterion), i.e., are problems of calculus of variations [28, 58, 64, 137].
However, a distinguishing feature of many optimal control problems is that
they are "nonclassical" due restrictions imposed on the admissible values
of controlling actions u(t). For instance, this often leads to discontinuous
extremals inadmissible in the classical theory [64]. Therefore, problems of
optimal control are usually solved by contemporary mathematical methods,
the most important being the Pontryagin maximum principle [156] and the
Bellman dynamic programming approach [14]. These methods develop and
generalize two different approaches to variational problems in the classical
theory: the Euler method and the Weierstrass variational principle used for
constructing a separate extremal and the Hamilton-Jacobi method based
on the consideration of the entire field of extremals, which leads to partial
differential equations for controlled systems with lumped parameters or to
equations with functional derivatives for controlled systems with distributed
parameters.
The maximum principle, which is a rigorously justified mathematical
method, can be used in general for solving both deterministic and stochastic
problems of optimal control [58, 116, 156]. However, this method, based on
the consideration of individual trajectories of the control process, leads
to certain technical difficulties when one needs to find the structure of
the controller C in feedback stochastic systems (see Figs. 2 and 3). In
this situation, the dynamic programming approach looks more attractive.
This method, however, has some flaws from the accuracy viewpoint (for
example, it is well known that the Bellman differential equations cannot be

used in some cases of deterministic time-optimal control problems [50, 137,
156]).
In systems with lumped parameters where the behavior of the plant P
is governed by ordinary differential equations, the dynamic programming
approach allows the reduction of the optimal control problem to solving a non-
linear partial differential equation (the Bellman equation). In this case, the
structure of the controller C (and hence the form of the function (map-
ping) cp, in (*)) is determined simultaneously with solving this equation.
Thus this method provides a straightforward solution of the main problem
in control theory, namely, the synthesis of a closed-loop automatic control
system. As for the possibility to use this method, so far it has been rig-
orously proved that the Bellman differential equations are valid and form
the basis for solving the synthesis problems for a wide class of stochastic
and deterministic control systems [113, 175]. Therefore, the dynamic pro-
gramming approach is widely used in this book and underlies practically
all methods developed for calculating optimal (or quasioptimal) controls.
As noted above, these methods constitute the dominant bulk of the subject
matter of this book.
As is known, the functional and differential Bellman equations can be
used effectively only if the controlled process (or, in more general cases,
the system phase trajectory in some state space) is a process without af-
tereffects, that is, a Markov type process. In deterministic problems, this
Markov property of trajectories readily follows from the corresponding exis-
tence and uniqueness theorems for the solutions of the Cauchy problem. To
ensure the Markov property of trajectories in stochastic control problems,
it is necessary to impose some restrictions on the class of random functions
used as mathematical models of random disturbances on the system. To
this end, throughout this book, it is assumed that all random actions on
the system are either "white noise" type processes or Markov stochastic
processes.
When the perturbations are of white noise type, the controlled process
x(t) itself can be Markov. If the noises are of Markov type, then the process
x(t) is, generally speaking, a component of a partially observable Markov
process of larger dimension. Therefore, to solve the synthesis problem ef-
fectively in this case, one needs to use a special state space formed by
sufficient statistics, so that the time evolution of these statistics possesses
the Markov property. In this case, the controller C consists of two parts: a
block that forms sufficient statistics (coordinates) and an actual controller
whose structure can be found by solving the Bellman equation.
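A minimal sketch of this two-part structure for a scalar linear plant: one block updates the sufficient coordinates (the conditional mean m and variance P of a Kalman-type filter), and the control acts on those statistics. The plant model, noise levels, and the fixed linear gain below are illustrative assumptions standing in for a law obtained from the Bellman equation.

import numpy as np

rng = np.random.default_rng(1)
dt, steps = 0.01, 500
a, b, sw, sv = -0.5, 1.0, 0.3, 0.1    # drift, input gain, noise intensities
x, m, P = 1.0, 0.0, 1.0               # true state and filter statistics
for _ in range(steps):
    u = -2.0 * m                              # control acts on statistics only
    x += (a * x + b * u) * dt + sw * np.sqrt(dt) * rng.normal()
    z = x + sv * rng.normal()                 # noisy measurement of x
    m += (a * m + b * u) * dt                 # predict the sufficient coords
    P += (2.0 * a * P + sw ** 2) * dt
    K = P / (P + sv ** 2)                     # measurement update
    m += K * (z - m)
    P *= (1.0 - K)
print(x, m)   # the statistic m tracks the unobserved state x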
These topics are studied in more detail in Chapter I.
CHAPTER I

SYNTHESIS PROBLEMS FOR CONTROL SYSTEMS
AND THE DYNAMIC PROGRAMMING APPROACH

§1.1. Statement of synthesis problems for optimal control systems
In synthesis problems it is required to find the structure of the control
block (controller) C in a feedback control system (see Figs. 2 and 3). From
the mathematical viewpoint, this problem is solved if we know the form of
the mapping
u = φ(x̃, ỹ) (1.1.1)

that determines a single-valued correspondence between the input func-
tions¹ x̃ = {x̃(t): 0 ≤ t ≤ T} and ỹ = {ỹ(t): 0 ≤ t ≤ T} and the control
vector-function u = {u(t): 0 ≤ t ≤ T} (the system is considered on the time
interval [0, T]). The conditions under which algorithm (1.1.1) can physi-
cally be implemented impose some restrictions on the form of the mapping
φ in (1.1.1). Usually, it is assumed that the current values of the control
vector u(t) = (u_1(t), ..., u_r(t)) at time t are independent of the future val-
ues x̃(t') and ỹ(t'), t' > t. Therefore, the mapping (1.1.1) can be written
as follows (see (*) in Introduction):

u(t) = φ(t, x̃_0^t, ỹ_0^t), (1.1.2)

where x̃_0^t = {x̃(s): 0 ≤ s ≤ t} and ỹ_0^t = {ỹ(s): 0 ≤ s ≤ t} denote the
functions x̃ and ỹ realized up to time t.
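The restriction (1.1.2) is a causality (nonanticipativity) requirement: at each moment the controller may use only the histories it has observed so far. A minimal Python sketch of such a history-based mapping; the particular feedback law inside is an arbitrary illustration.

class CausalController:
    def __init__(self, gain=2.0):
        self.gain = gain
        self.x_hist = []          # realized observations of x~
        self.y_hist = []          # realized observations of y~

    def u(self, t, x_obs, y_obs):
        # store the past; the law below may be any functional of it,
        # but never of observations with timestamps greater than t
        self.x_hist.append((t, x_obs))
        self.y_hist.append((t, y_obs))
        return self.gain * (self.y_hist[-1][1] - self.x_hist[-1][1])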
In simpler situations (say, in the case of the servomechanism shown in
Fig. 2), the synthesis function φ may depend only on the current values of
the input processes

u(t) = φ(t, y(t), x(t)) (1.1.3)

or even may be of the form

u(t) = φ(t, x(t)) (1.1.4)

¹The functions x̃ and ỹ are input functions for the controller C.



if the command signal y(t) is either absent or a known deterministic function


of time.
The explicit form of the synthesis function cp is determined by the char-
acter of the optimal control problem.
To state the synthesis problem for an optimal control system mathemat-
ically, we need to know:

(1) the dynamic equations of the controlled plant;


(2) the goal of control;
(3) the restrictions (if any) on the domain of admissible values of control
actions u, on the domain of the phase variables x, etc.;
(4) the probability characteristics of the stochastic processes that affect
the system.

Obviously, in problems of deterministic optimal control we need only the
first three objects.
1.1.1. Dynamic equations of the controlled plant. The present
monograph, except for §3.6, deals with control systems in which the plant P
can be described by a system of ordinary differential equations in the normal
form
ẋ = g(t, x, u), (1.1.5)

where x = x(t) ∈ R_n and u = u(t) ∈ R_r are the current values of an
n-dimensional vector of output parameters (the phase variables) and of
an r-dimensional control vector, g(t, x, u): R × R_n × R_r → R_n is a given
vector-function, and the dot over a letter denotes the derivative with respect
to time (that is, ẋ is an n-vector with components dx_i/dt, i = 1, ..., n).
Here and in the sequel, R_k denotes the Euclidean space of k-dimensional
vectors.
If, in addition to the control u, the controlled plant experiences uncon-
trolled random perturbations (see Fig. 3), then its behavior is described by
the equation
ẋ = g(t, x, u, ξ(t)), (1.1.6)

where ξ(t) is an m-vector of random functions (ξ_1(t), ..., ξ_m(t)). Differ-


ential equations of the form (1.1.6) with random functions on the right-
hand sides are called stochastic dijfe~entialequations. In contrast with the
"usual" differential equations of the form (1.1.5), they have some special
properties, which we consider in detail in the next section.
The form of the vector-functions g(t, x, u) and g(t, x, u, ξ(t)) on the right
in (1.1.5) and (1.1.6) is determined by the physical nature of the plant. In
the subsequent chapters, we consider various special cases of Eqs. (1.1.5)

and (1.1.6) and solve some specific control problems for mechanical, techni-
cal, and biological objects. In the present chapter, we only discuss general
restrictions that we need to impose on the function g(.) in (1.1.5) and (1.1.6)
to obtain a well-posed mathematical statement of the problem of optimal
control synthesis.
The most important and, in fact, the only restriction on the function g(·)
is the existence of a unique solution to the Cauchy problem for Eqs. (1.1.5)
and (1.1.6) with any given control function u(t) chosen from a function
class that is called the class of admissible controls. This means that the
trajectory x(t) of system (1.1.5) or (1.1.6) is uniquely determined² on the
time interval t_0 ≤ t ≤ t_0 + T by the initial state x(t_0) = x_0 and a chosen
function {u(t): t_0 ≤ t ≤ t_0 + T}.
The uniqueness of the solution x(t) of system (1.1.5) with the initial
condition x(t_0) = x_0 is guaranteed by well-known existence and uniqueness
theorems for systems of ordinary differential equations [137]. The following
theorem [156] presents very general sufficient conditions for the existence
and uniqueness of the solution of system (1.1.5) with the initial condition
x(t_0) = x_0 (the Cauchy problem).
THEOREM. Let a vector-function g(t, x, u) be continuous with respect to
all variables (t, x, u) and continuously differentiable with respect to the com-
ponents of the vector x = (x_1, ..., x_n), and let the vector-function u = u(t)
be continuous with respect to time. Then there exists a number T > 0 such
that a unique continuous vector-function x(t) satisfies system (1.1.5) with
the initial condition x(t_0) = x_0 on the interval t_0 ≤ t ≤ t_0 + T.
If T → ∞, that is, if the domain of existence of the unique solution
is arbitrarily large, then the solution of the Cauchy problem is said to be
infinitely continuable to the right.
It should be noted that the functions g(·) and u need not be continuous
with respect to t. The theorem remains valid for piecewise continuous and
even for bounded functions g(·) and u that are measurable with respect
to t. In the last case, the solution x(t): t_0 ≤ t ≤ t_0 + T of system (1.1.5) is an
absolutely continuous function [91].
The assumption that the function g(·) is smooth with respect to the
components of the vector x is much more essential. If this condition is not
satisfied, then we can encounter situations in which system (1.1.5) does not
have any solutions in the "common" classical sense (for example, for some
initial vectors x(t_0) = x_0, it may be impossible to construct a function

²The solution of the stochastic differential equation (1.1.6) is a stochastic process
x(t). The uniqueness of the solution to (1.1.6) is understood in the sense that the initial
condition x(t_0) = x_0 and the control function u(t): t_0 ≤ t ≤ t_0 + T uniquely determine
the probability characteristics of the random variables x(t) for all t ∈ (t_0, t_0 + T].

x(t) that identically satisfies (1.1.5) on an arbitrarily small finite interval
t_0 ≤ t ≤ t_0 + T).
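A closely related classical pathology is nonuniqueness, illustrated by the following sketch: the problem ẋ = 3x^(2/3), x(0) = 0 has a continuous but non-Lipschitz right-hand side, and both x(t) ≡ 0 and x(t) = t³ satisfy it.

import numpy as np

# Verify numerically that two different functions solve the same
# Cauchy problem x' = 3*x**(2/3), x(0) = 0.
t = np.linspace(0.0, 1.0, 1001)
for x in (np.zeros_like(t), t ** 3):
    residual = np.gradient(x, t) - 3.0 * np.abs(x) ** (2.0 / 3.0)
    print(np.max(np.abs(residual)))   # small for both candidate solutions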
It is significant that we cannot exclude such seemingly "exotic" cases
from our consideration. As was already noted, the control function u on
the right-hand side of (1.1.5) can be defined either as a controlling program
(that is, as a function of time) or in the synthesis form, for example, in the
form u = φ(t, x(t)) as in (1.1.4). It is well known (this will be illustrated
by numerous special examples considered later) that many problems of
optimal control with control constraints often result in control algorithms
u* = φ(t, x(t)) in which the synthesis function φ is discontinuous with
respect to the phase variables x. In this case the assumptions of the above-
cited theorem may be violated even if the vector-function g(t, x, u) in (1.1.5)
is continuously differentiable with respect to x.
Now let us generalize the notion of the solution to the case of discontin-
uous (with respect to x) right-hand sides of Eqs. (1.1.5). Here we discuss
only the basic ideas for constructing generalized solutions. The detailed
and rigorous theory of generalized solutions of equations with discontinu-
ous right-hand sides can be found in Filippov's monograph [54].
We assume that in (1.1.5) the control function u has the synthesis form
(1.1.4). Then, by setting g̃(t, x) = g(t, x, φ(t, x)), we can rewrite (1.1.5) as
follows:

ẋ = g̃(t, x). (1.1.7)
In the space of variables (t, x), we choose a domain D on which we need to
construct the solution of system (1.1.7). Suppose that a twice continuously
differentiable surface S divides the domain D into two domains D+ and D-
and some vector-functions g̃+ and g̃- continuous in t and continuously
differentiable in x_1, x_2, ..., x_n are defined on D+ + S and on D- + S so
that g̃ = g̃+ in D+ and g̃ = g̃- in D-. In this case, the solution of (1.1.7) on
the domain D- can uniquely be continued till the surface S. If the vector g̃
is directed towards the surface S in D- and away from the surface S in D+,
then the solution goes from D- to D+, intersecting the surface S only once
(Fig. 4). But if the vector g̃ is directed towards the surface S in D- and
in D+, then the solution, once coming to S, can leave it neither to D-
nor to D+. Therefore, there is a problem of continuation of this solution.
In [54] it is assumed that after the solution x(t) comes to the surface S, the
subsequent motion of system (1.1.7) is realized along the surface S with
velocity

ẋ = g̃_0(t, x) ≡ α g̃+(t, x) + (1 - α) g̃-(t, x), (1.1.8)

where x ∈ S and the number α (0 ≤ α ≤ 1) are chosen so that the vector
g̃_0(t, x) is tangent to the surface S at the point x.
The vector g̃_0(t, x) in (1.1.8) can be constructed in the following way.
At the point x ∈ S we construct the vectors g̃+(t, x) and g̃-(t, x) and
connect their endpoints with a straight line. The point of intersection of
this straight line with the plane tangent to S at the point x is the endpoint
of the desired vector g̃_0(t, x) (Fig. 5).

A function x(t) satisfying Eq. (1.1.7) in D+ and in D- and satisfying
Eq. (1.1.8) on the surface S is called the generalized solution of Eq. (1.1.7)
or a solution in the sense of Filippov.
This definition makes sense, since a solution in the sense of Filippov is
the limit of a sequence of classical solutions to Eq. (1.1.7) with smoothed (in
x) right-hand sides g̃_k(t, x) if g̃_k(t, x) → g̃(t, x) as k → ∞. Moreover, the
sequence x_k(t) of classical solutions of equations with retarded argument
uniquely converges to the same limit if the delay τ_k → 0 as k → ∞ (see [54]).


We also note that, in practice, solutions in the sense of Filippov can
be realized in some technical, mechanical, and other systems of automatic
control, which are sometimes called systems with variable structure [46]. In
such systems, the plant is described by Eq. (1.1.5), and the control vector u
makes a jump when the phase vector x(t) intersects a given switching sur-
face S. In such systems, if the motion is along the switching surface, the
critical segments of the trajectory can be realized by infinitely fast switch-
ing of control. In the theory of automatic control such regimes are called
"sliding modes" [2, 461.
Generalized solutions in the sense of Filippov allow us to construct
the unique solution of the Cauchy problem for Eq. (1.1.5) with function
g(t, x, u) piecewise continuous in x.
Now let us consider the stochastic differential equations (1.1.6). We
have already pointed out that these equations substantially differ from or-
dinary differential equations of the form (1.1.5); the special properties of
Eqs. (1.1.6) are studied in §1.2. Here we only briefly dwell on the nature of
special properties of these equations.
The stochastic differential equations (1.1.6) have the following funda-
mental characteristic property. If the random function ξ(t) on the right-
hand side of (1.1.6) is a stochastic process of the "white noise" type, then
the Cauchy problem for (1.1.6) can have an infinite (larger than a count-
able) set of different solutions. Everything depends on how we understand
the solution of (1.1.6) or, in other words, on how we construct the random
function x(t) that satisfies the corresponding Cauchy problem for (1.1.6).
It turns out that in this case we can propose infinitely many well-defined
solutions of equation (1.1.6).
This situation gives an impression that the differential equations (1.1.6)
do not make any sense. However, since control systems perturbed by a white
noise play an important role, it is necessary to specify how the dynamics
of a system is described in this case and in which sense Eq. (1.1.6) must be
understood if it is still used.
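The following sketch illustrates this nonuniqueness for the formal scalar equation dx = x dw: the Euler-Maruyama scheme converges to the Itô solution (whose mean stays constant), while a Heun-type scheme converges to the Stratonovich solution (whose mean grows like e^(t/2)). This correspondence between schemes and interpretations is a standard fact, quoted here only to show that the choice of the sense of (1.1.6) is a genuine modeling decision.

import numpy as np

rng = np.random.default_rng(0)
T, N, M = 1.0, 1000, 20000          # horizon, time steps, sample paths
dt = T / N
x_ito = np.ones(M)
x_str = np.ones(M)
for _ in range(N):
    dw = rng.normal(0.0, np.sqrt(dt), M)
    x_ito = x_ito + x_ito * dw                   # Euler-Maruyama: Ito sense
    pred = x_str + x_str * dw                    # Heun: Stratonovich sense
    x_str = x_str + 0.5 * (x_str + pred) * dw
print(x_ito.mean())                 # ~ 1.0
print(x_str.mean())                 # ~ exp(0.5) ~ 1.65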
On the other hand, the existence and uniqueness of the solution to the
Cauchy problem for equations of the forms (1.1.5) and (1.1.6) is the basic
assumption that allows us to use the dynamic programming approach for
solving problems of optimal control synthesis.
In §1.2 we discuss these and some other topics.

1.1.2. Goal of control. The requirements imposed on a designed
control system determine the form of the functional (the optimality crite-
rion), which is a numerical estimate of the control process. Let us consider
some typical problems of optimal control and write out the cost functionals
needed to state these problems.
We begin with deterministic problems in which the plant is described by the system of differential equations (1.1.5). First, we assume that the time interval 0 ≤ t ≤ T (on which we consider the control process) is fixed and the initial position of the plant is given, that is, x(0) = x₀, where x₀ is a vector of given numbers. Such problems are called control problems with variable right endpoint of the trajectory. Suppose that it is required to construct an optimal servomechanism (see Fig. 2) such that the input command signal y(t), 0 ≤ t ≤ T, is a known function of time. If the goal of the servomechanism shown in Fig. 2 is to reproduce the input function y(t) via the output function x(t), 0 ≤ t ≤ T, most closely, then one of the possible criteria for estimating the performance of this servomechanism is the integral

$$\int_0^T |x(t) - y(t)|^p \, dt, \qquad (1.1.9)$$

where p is a given positive number and |a| denotes the Euclidean norm of a vector a, that is, |a| = (\sum_{j=1}^n a_j^2)^{1/2}. In an "ideal" servomechanism, the controlled output process is identically equal to the command signal, that is, x(t) ≡ y(t), 0 ≤ t ≤ T, and the functional (1.1.9) is equal to zero, which is the least possible value. In other cases, the value of (1.1.9) is a numerical estimate of the proximity between the input and output processes.
It may happen that much "effort" is required to ensure a sufficient proximity between the processes x(t) and y(t), that is, a large control action u(t) is needed at the input of the plant P. However, in many actual devices it is undesirable to use too "large" controls, both for reasons of energy and economy and from reliability considerations. In these cases, instead of (1.1.9), it is better to use, for example, the cost functional

$$\int_0^T \bigl[\,|x(t) - y(t)|^p + a|u(t)|^q\,\bigr]\, dt, \qquad (1.1.10)$$

where a, q > 0 are given numbers. This functional takes into account both the proximity between the output process x(t) and a given input process y(t) and the total "cost" of control on the time interval [0, T].
Of course, the functionals (1.1.9) and (1.1.10) do not exhaust all methods for stating integral optimality criteria that are used in problems of synthesis of optimal servomechanisms (Fig. 2). The most general form of integral criteria can be obtained by using the penalty functions introduced by Wald [188]. Suppose that each current state of the system shown in Fig. 2, characterized by the set of vectors (x(t), y(t), u(t)), is "penalized" by a given nonnegative scalar function c(x, y, u) of its arguments. If c(x, y, u) has the meaning of specific penalties per unit time, then the functional

$$I_1[u] = \int_0^T c\bigl(x(t), y(t), u(t)\bigr)\, dt \qquad (1.1.11)$$

is a natural performance criterion on the time interval [0, T]. Obviously, the functionals (1.1.9) and (1.1.10) are special cases of (1.1.11) in which the penalty function c is defined as c(x, y, u) = |x − y|^p or c(x, y, u) = |x − y|^p + a|u|^q, respectively.
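For a fixed control and trajectory, functionals of the type (1.1.10), (1.1.11) are straightforward to estimate numerically. The sketch below (Python; the signals, exponents, and weight a are hypothetical choices, not taken from the text) evaluates the penalty c(x, y, u) = |x − y|^p + a|u|^q along scalar discretized trajectories by the trapezoidal rule.

```python
# Illustrative sketch (assumption: scalar trajectories on a uniform grid):
# numerical evaluation of the integral criterion (1.1.11) with the penalty
# c(x, y, u) = |x - y|^p + a*|u|^q of (1.1.10).
import numpy as np

def criterion(x, y, u, dt, p=2, q=2, a=0.1):
    """Trapezoidal approximation of I[u] = int_0^T c(x(t), y(t), u(t)) dt."""
    c = np.abs(x - y) ** p + a * np.abs(u) ** q
    return np.trapz(c, dx=dt)

t = np.linspace(0.0, 1.0, 1001)
y = np.sin(2 * np.pi * t)            # hypothetical command signal y(t)
x = y * (1 - np.exp(-5 * t))         # hypothetical servo output with a lag
u = 5 * (y - x)                      # hypothetical proportional control
print(criterion(x, y, u, dt=t[1] - t[0]))
```

For vector trajectories, the absolute values would be replaced by the Euclidean norms used in (1.1.9).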
Another class of optimal control problems is formed by problems of terminal control. Such problems appear when the character of transition processes in the system is not essential for 0 ≤ t < T and we are interested only in the state of the system at the terminal moment of time T. In this case, using a corresponding penalty function ψ(x, y), we obtain the terminal optimality criterion

$$I_2[u] = \psi\bigl(x(T), y(T)\bigr). \qquad (1.1.12)$$
It should be noted that, by an appropriate extension of the phase vector x, we can rewrite the integral criterion (1.1.11) in the form (1.1.12). Thus, from the mathematical viewpoint, the integral criterion (1.1.11) is a special case of the terminal criterion (1.1.12) (see [1, 34, 137]). Nevertheless, we distinguish these criteria in the sequel, since they have different meanings in applications.
In addition to (1.1.11) and (1.1.12), we often use their combination

$$I_3[u] = \int_0^T c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(T), y(T)\bigr); \qquad (1.1.13)$$

this criterion depends both on the transition process and on the terminal state of the system.
If the worst (with respect to a chosen penalty function) state of the controlled system on a fixed time interval [0, T] is a crucial factor, then, instead of (1.1.11), we must use the criterion

$$I_4[u] = \max_{0 \le t \le T} c\bigl(x(t), y(t), u(t)\bigr). \qquad (1.1.14)$$

An optimal system constructed by minimization of the criterion (1.1.14) provides the best result (in comparison with any other system) only in the worst operating mode. Criteria of the form (1.1.14) were studied in [16, 40, 92, 148].
If a dynamic system described by Eqs. (1.1.5) is totally controllable,³ then optimal control problems with fixed endpoints or with some fixed terminal set are often considered together with control problems with variable right endpoint of the trajectory. In these problems, the control time T is not fixed in advance and, as admissible controls, we take the control functions u(t), 0 ≤ t ≤ T, that transfer system (1.1.5) from a given initial state x(0) = x₀ to a fixed terminal state x(T) = x₁ or to a fixed terminal set. An admissible control u*(t), 0 ≤ t ≤ T, is optimal if the integral functional (1.1.11) attains its minimum at u*(t).
The control problems with fixed endpoints contain time-optimal problems as an important special case. In these problems, the penalty function in (1.1.11) is c ≡ 1, and the minimized functional (1.1.11) is equal to the transition time I₁[u] = T from the state x₀ to the state x₁ (or to a given terminal set). Time-optimal problems find wide application in mechanics, physics, technology, etc. (see [1, 24, 85, 90, 123, 137, 156]). It should be noted that, historically, it was precisely the time-optimal problems that strongly influenced the formation of the theory of optimal control as a subject of independent study.
Most of the optimal control problems cited above can readily be generalized to the stochastic case in which the plant is described by the stochastic differential equations (1.1.6). It only remains to note that in this case each of the functionals (1.1.11)-(1.1.14) is a random variable for any fixed control algorithm u(t) (given, say, in the form (1.1.3)). These variables are characterized to a large extent by their mean values, which determine the mean "losses" or "costs" of control if the control algorithm (1.1.3) is repeated many times. The mean values

$$I_5[u] = \mathrm{E} I_1[u], \qquad (1.1.15)$$
$$I_6[u] = \mathrm{E} I_2[u], \qquad (1.1.16)$$
$$I_7[u] = \mathrm{E} I_3[u], \qquad (1.1.17)$$
$$I_8[u] = \mathrm{E} I_4[u] = \mathrm{E} \max_{0 \le t \le T} c\bigl(x(t), y(t), u(t)\bigr) \qquad (1.1.18)$$

of the functionals (1.1.11)-(1.1.14) usually serve as optimality criteria in stochastic problems of optimal control (in (1.1.15)-(1.1.18) and in what follows, EA denotes the mean value (the mathematical expectation) of A).

³According to [78, 111], system (1.1.5) is called totally controllable if for any two given vectors x₀ and x₁ there always exist a finite number T > 0 and an admissible control function u(t), 0 ≤ t ≤ T, such that system (1.1.5) is transferred from the initial state x(0) = x₀ to the terminal state x(T) = x₁ during time T.
In controlled stochastic systems, we often encounter control problems in which the terminal moment is a random variable, for example, the optimal halting problems [113, 132]. In these problems, we have an additional optimization parameter, namely, the random terminal time τ of the process. Therefore, the optimality criterion depends both on the control actions u = u₀^τ = {u(t): 0 ≤ t ≤ τ} and on τ, for example, as follows:

$$I_9[u, \tau] = \mathrm{E}\Bigl[\int_0^{\tau} c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(\tau), y(\tau)\bigr)\Bigr]. \qquad (1.1.19)$$
There is another type of problems with a random terminal time of the process. Suppose that D ⊂ R_{n+m} is a subset of the Cartesian product R_{n+m} = R_n × R_m of the Euclidean spaces of vectors x and y. Suppose that τ_D is the time instant at which the point (x(t), y(t)) comes to the boundary ∂D of the set D for the first time.
Then we can state the problem [34, 113] of finding the control law (1.1.3) for which the functional

$$I_{10}[u] = \mathrm{E}\Bigl[\int_0^{\tau_D} c\bigl(x(t), y(t), u(t)\bigr)\, dt + \psi\bigl(x(\tau_D), y(\tau_D)\bigr)\Bigr] \qquad (1.1.20)$$

attains an extremum. The functional

$$I_{11}[u] = \mathrm{E}\,\tau_D \qquad (1.1.21)$$

is a special case of (1.1.20) with c(·) ≡ 1 and ψ(·) ≡ 0; the value of this functional is equal to the mean time at which the point (x, y) first comes to the boundary ∂D of the set D. If the criterion I₁₁ is used, then the goal of control depends on whether the initial state (x(0), y(0)) of the system is an interior point of D or, vice versa, (x(0), y(0)) ∈ R_{n+m} \ D. If (x(0), y(0)) ∈ D, then, as a rule, we need to maximize (1.1.21) (see §2.2); otherwise ((x(0), y(0)) ∉ D), the goal of control is to minimize (1.1.21). The latter problem is a stochastic version of the time-optimal problem [1, 85].
The criteria I₁, …, I₁₁ considered above do not exhaust all possible statements of optimal control problems. Other statements can be found in the literature on control theory [1, 3, 5, 24, 34, 43, 58, 111, 112, 128, 156]. The choice of the criterion depends on practical needs, that is, on the special technical problem that arises at the design stage. It should be noted that in the mathematical approach to optimal systems more attention is paid to general problems of the qualitative theory (the existence and uniqueness of the solution, justification of the optimality principle,
estimates and asymptotics of solutions, etc.), while the choice of a criterion is not very essential. Moreover, by introducing some additional variables, one can transform different optimality criteria to some standard form, for example, to the integral form I₁ (I₅) or to the terminal form I₂ (I₆). The situation is different if some quantitative calculations of the optimal control block (controller) C are required, that is, if we need to write the algorithm (1.1.3) explicitly. The complexity and sometimes the method of calculations significantly depend on the optimality criterion. On the other hand, criteria of different form may lead to constructions of optimal systems close to each other (see §2.2). In such cases, it may be useful to replace the original criterion by a new one that simplifies the calculation of the optimal algorithm (1.1.3) but does not change it essentially. Problems of the rational choice of optimality criteria are considered in [50, 155].
In the present monograph, we assume that an optimality criterion is
already chosen from some considerations and is a known functional of tra-
jectories of the system.
1.1.3. Constraints in control problems. In the design of actual control systems (both open- and closed-loop systems, see Figs. 1-3), it is often required to take into account some restrictions on the parameters of the control process. For example, in many problems constraints are imposed on the control actions u. Suppose that only the instantaneous value of the control action is important. Then the control function u is admissible if for all t it takes values from some bounded set U ⊂ R_r, that is,

$$u(t) \in U, \qquad 0 \le t \le T. \qquad (1.1.22)$$

In particular, (1.1.22) can be of the form

where u⁰, pᵢ, and k are given positive numbers.


Constraints of the form (1.1.22)-(1.1.24) reflect the fact that control actions of any physical nature (force, torque, electric voltage, potential, heat quantity, concentration, etc.) always vary in a bounded range. The control is not allowed to take large values, since this may result in mechanical breakdowns, damage to electric circuits, etc.
Constraints of integral character are also possible. Sometimes they are called constraints on control resources. In this case, the admissible control functions u(t) must satisfy the condition

where Q > 0 is a given number. Problems with constraints (1.1.25) were considered in [22, 34].
The same technical considerations often show that it is necessary to impose some restrictions on the domain of admissible values of the phase variables x. If X ⊂ R_n is the set of possible values of x, then the related constraint on the phase trajectories x(t) can be of the form

$$x(t) \in X, \qquad 0 \le t \le T, \qquad (1.1.26)$$

which is similar to (1.1.22).
Constraints of the form (1.1.22)-(1.1.26) are of considerable and sometimes of decisive importance in problems of optimal control. Thus, control problems often make sense only under constraints of the form (1.1.22)-(1.1.25). Indeed, let us consider a control problem in which the plant is described by system (1.1.5) and the control performance is estimated by the integral optimality criterion (1.1.11) with a penalty function independent of y and u:

$$I[u] = \int_0^T c\bigl(x(t)\bigr)\, dt. \qquad (1.1.27)$$

Suppose that the penalty function c(x) attains its minimum value at x = x₀ (we can always assume that this minimum value is zero). Then, by using arbitrarily large controls u (admissible in the absence of the constraints (1.1.22)-(1.1.25)), we can obtain a trajectory of motion x(t) that is arbitrarily close to x(t) ≡ x₀ = const (it is assumed that system (1.1.5) is controllable [78, 111] and that the current state of system (1.1.5) can be measured exactly). Thus, if the control function is unbounded, the functional (1.1.27) can be arbitrarily close to zero, its absolute minimum. But if the control u(t) is bounded by one of the conditions (1.1.22)-(1.1.25), then the functional (1.1.27) never takes the zero value for x(0) ≠ x₀, and the minimization problem for (1.1.27) is nondegenerate.
In some cases, restrictions on the phase variables (1.1.26) allow us to improve the mathematical model of the control process and to describe the actual situation more precisely. Let us consider an illustrative example. Suppose that the plant is a servomotor with bounded speed. The equation of motion has the form

$$\dot{x} = u, \qquad |u| \le u_0, \qquad (1.1.28)$$

where x and u are scalars. Suppose that, by solving the synthesis problem, we obtain an optimal control algorithm of the relay type:

$$u_*(t, x) = u_0\, \mathrm{sign}\,(x - x_0(t)), \qquad \mathrm{sign}\, y = \begin{cases} +1, & y > 0, \\ 0, & y = 0, \\ -1, & y < 0, \end{cases} \qquad (1.1.29)$$
where x₀(t) is a given function of time. In this algorithm the control action varies instantaneously by a finite value when the difference (x − x₀(t)) changes sign. If an actual control device implementing (1.1.29) has some inertial properties (for example, if the absolute rate v of change of the control action is bounded by v₀), then it is more convenient to model such a system by a plant whose behavior is described by two phase coordinates x₁ = x and x₂ = u such that

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = v, \qquad |v| \le v_0, \qquad |x_2| \le u_0. \qquad (1.1.30)$$

In this case, v (the rate of change of x₂ = u) is a control parameter, and the control constraint in (1.1.28) becomes a constraint imposed on the phase coordinate x₂ in (1.1.30).
1.1.4. Probability characteristics of stochastic processes. As was already pointed out in the Introduction, in the present monograph we consider stochastic processes under the assumption that all random actions on the system (the variables ξ(t), η(t), and ζ(t) in Fig. 3) are either white noises or processes of Markov type. We restrict our consideration of methods for the mathematical description of such processes to a rather elementary presentation of the related notions and problems. The rigorous theory of Markov processes based on measure theory can be found in the monographs [44, 45].
A stationary scalar stochastic process ξ(t) is called the standard white noise if it is Gaussian and has zero mean and a delta-type correlation function,

$$\mathrm{E}\xi(t) = 0, \qquad \mathrm{E}\xi(t)\xi(t - \tau) = \delta(\tau). \qquad (1.1.31)$$

In (1.1.31), δ(τ) denotes the Dirac delta-function, which is zero for τ ≠ 0 and becomes infinite for τ = 0 (see [65, 91]). Besides, any continuous function f(t) satisfies the relation

$$\int_a^b f(t)\, \delta(t - t_0)\, dt = \begin{cases} f(t_0), & a < t_0 < b, \\ f(b)/2, & t_0 = b, \\ f(a)/2, & t_0 = a, \\ 0, & t_0 \notin [a, b]. \end{cases} \qquad (1.1.32)$$

Various nonstationary generalizations of the notion of white noise are


combinations (obtained by multiplication and addition) of the standard
process (1.1.31) and some deterministic functions of time.
Obviously, a Gaussian stochastic process with characteristics (1.1.31) cannot be physically realized, since, as we can see from (1.1.31), this process has the infinite variance

$$\mathrm{D}\xi(t) = \mathrm{E}\xi^2(t) = \delta(0) = \infty,$$

and hence, to realize this process, we would need a source of infinite power. Therefore, a process of the white noise type can be considered on a time interval [0, T] as the limit (as Δ → 0) of a model given by a time sequence of independent identically distributed random variables ξᵢ = ξ(tᵢ = iΔ) (i = 0, 1, …, N, N = T/Δ) with probability density

$$p(\xi_i) = \sqrt{\frac{\Delta}{2\pi}}\, \exp\Bigl\{-\frac{\Delta \xi_i^2}{2}\Bigr\}. \qquad (1.1.33)$$
From (1.1.33) we can see that Dξᵢ = 1/Δ → ∞ as Δ → 0. This means that, on any arbitrarily small finite time interval, a realization of the white noise takes, with probability 1, values both larger and smaller than any fixed number. Thus the white noise is a stochastic process that oscillates extremely fast and with infinite amplitude about its mean value. If we try to draw a realization of the white noise on the time interval [t₀, t₀ + T], then this realization completely fills the infinite band parallel to the x-axis, as shown in Fig. 6.

The white noise is a convenient abstraction of actual stochastic processes. This model is of great advantage when performing mathematical calculations with expressions that contain white noise type processes (in particular, one can readily calculate the mean values of integrals of such processes); this is related to the properties (1.1.32) of the delta-function. In mathematical investigations, actual stochastic processes ξ̃(t) with finite correlation time τ_cor can be replaced by white noise type processes if τ_cor ≪ T_c, where T_c is either the characteristic time constant or the transient time of the system under the action of ξ̃(t) (for more details, see [173, 181]). The rigorous description of generalized stochastic processes of the white noise type can be found in [63, 157].
A multidimensional generalization of the standard white noise is an n-dimensional vector-column of random functions ξ(t) whose components ξᵢ(t) (i = 1, …, n) are independent Gaussian stochastic processes with characteristics (1.1.31). Thus, instead of (1.1.31), the n-dimensional standard white noise is characterized by

$$\mathrm{E}\xi(t) = 0, \qquad \mathrm{E}\xi(t)\xi^{\mathsf T}(t - \tau) = E\, \delta(\tau), \qquad (1.1.34)$$

where E is the n × n identity matrix and the superscript ᵀ indicates the transpose of a matrix.
Now let us consider methods for defining Markov processes. A stochastic process ξ(t), 0 ≤ t ≤ T, with values from some set X (the phase space) is called a Markov process or a process without aftereffect if the conditional probabilities satisfy the relation

$$\mathrm{P}\{\xi(t_n) \in G \mid \xi(t_1), \ldots, \xi(t_{n-1})\} = \mathrm{P}\{\xi(t_n) \in G \mid \xi(t_{n-1})\} \qquad (1.1.35)$$

for any instants of time t₁ < t₂ < ⋯ < tₙ from [0, T] and any subset G ⊂ X.
Formula (1.1.35) means that the probabilities of future values of a Markov process are completely determined by the last measured state of the process and are independent of the preceding states (the absence of aftereffect). Depending on whether the sets [0, T] and X are discrete or continuous, we distinguish discrete Markov sequences or Markov chains (the sets [0, T] and X are discrete), continuous sequences (the set [0, T] is discrete and the set X is continuous), and discrete Markov processes (the set [0, T] is continuous and the set X is discrete). But if the phase space X is continuous and the argument t of the stochastic process ξ(t) may take any value t ∈ [0, T], then we have the following two types of Markov processes: continuous (all sample paths of the process ξ(t), 0 ≤ t ≤ T, are continuous functions of time with probability 1) and strictly discontinuous (all sample paths of the process ξ(t), 0 ≤ t ≤ T, are step-functions, while the moments and amplitudes of jumps are continuous random variables).

There exist more complicated Markov processes that are combinations of the processes listed above [181]. Numerous papers and monographs deal with various types of Markov processes (see, for example, [11, 36, 38, 67, 160]). In the present monograph, we consider the following three types of stochastic processes in most detail: discrete, continuous, and strictly discontinuous Markov processes.

Discrete Markov processes. As was already noted, in this case the time is continuous, but the phase space X is discrete. We assume that the set X consists of finitely many elements x₁, …, x_α, …, x_m. At each instant of time t ∈ [0, T] (possibly, T = ∞), the process ξ(t) takes, with probability P_α(t), one of the m possible values x_α, α = 1, …, m. The transitions to new states are instantaneous and take place at random moments. Thus a sample path of the process ξ(t) is a step-function of time, as shown in Fig. 7. Suppose that the process is in the state ξ(t) = x_α at time t. Then it follows from (1.1.35) that the probability of the event that the process comes to the state ξ(τ) = x_β at time τ > t depends only on t, τ, x_α, and x_β. The corresponding conditional probability

$$P_{\alpha\beta}(t, \tau) = \mathrm{P}\{\xi(\tau) = x_\beta \mid \xi(t) = x_\alpha\}, \qquad (1.1.36)$$

which is usually called the transition probability, is an important characteristic of the Markov process ξ(t).
The unconditional probabilities P_α(t) = P{ξ(t) = x_α}, α = 1, …, m, and the functions (1.1.36) describe the process ξ(t) completely.⁴ Actually, the probability multiplication theorem [52, 67] and the Markov property of the process ξ(t) imply that for any t₁ < t₂ < ⋯ < tₙ and α, β, …, ω = 1, …, m the probability of the event {ξ(t₁) = x_α, ξ(t₂) = x_β, …, ξ(tₙ) = x_ω} can be expressed in terms of the functions P_α(t) and P_αβ(t, τ) as follows:

$$\mathrm{P}\{\xi(t_1) = x_\alpha, \xi(t_2) = x_\beta, \ldots, \xi(t_n) = x_\omega\} = P_\alpha(t_1)\, P_{\alpha\beta}(t_1, t_2) \cdots P_{\chi\omega}(t_{n-1}, t_n). \qquad (1.1.37)$$

⁴If the probabilities P{ξ(t₁) = x_α, ξ(t₂) = x_β, …, ξ(tₙ) = x_ω} are known for any (t₁, t₂, …, tₙ) ∈ [0, T] and for any set of indices (α, β, …, ω), then the stochastic process is said to be well-defined.
On the other hand, the functions P_α(t) and P_αβ(t, τ) can be obtained as solutions of certain systems of ordinary differential equations.
Let us derive the corresponding equations for P_α(t) and P_αβ(t, τ). To this end, we first obtain the Chapman-Kolmogorov equation for the transition probabilities:

$$P_{\alpha\beta}(t, \tau) = \sum_{\gamma=1}^m P_{\alpha\gamma}(t, \sigma)\, P_{\gamma\beta}(\sigma, \tau), \qquad t < \sigma < \tau. \qquad (1.1.38)$$

We write formula (1.1.37) for three instants of time t, σ, and τ as follows:

$$\mathrm{P}\{\xi(t) = x_\alpha, \xi(\sigma) = x_\gamma, \xi(\tau) = x_\beta\} = P_\alpha(t)\, P_{\alpha\gamma}(t, \sigma)\, P_{\gamma\beta}(\sigma, \tau). \qquad (1.1.39)$$

Since

$$\sum_{\gamma=1}^m \mathrm{P}\{\xi(t) = x_\alpha, \xi(\sigma) = x_\gamma, \xi(\tau) = x_\beta\} = \mathrm{P}\{\xi(t) = x_\alpha, \xi(\tau) = x_\beta\}, \qquad (1.1.40)$$

we write the right-hand side of (1.1.40) in the form

$$\mathrm{P}\{\xi(t) = x_\alpha, \xi(\tau) = x_\beta\} = P_\alpha(t)\, P_{\alpha\beta}(t, \tau) \qquad (1.1.41)$$

and, substituting (1.1.39) and (1.1.41) into (1.1.40), obtain Eq. (1.1.38) after P_α(t) is canceled out.
To derive differential equations for P_αβ(t, τ) we need some local time characteristics of the Markov process ξ(t). If we assume that there is at most one change of the state of the process ξ(t) on a small time interval Δ,⁵ then for small τ − t we can write the transition probabilities P_αβ(t, τ) in the form

$$P_{\alpha\beta}(t, \tau) = \lambda_{\alpha\beta}(t)(\tau - t) + o(\tau - t), \quad \beta \ne \alpha; \qquad P_{\alpha\alpha}(t, \tau) = 1 - \lambda_\alpha(t)(\tau - t) + o(\tau - t) \qquad (1.1.42)$$

(in (1.1.42), as everywhere in what follows, o(Δ) denotes an expression of higher order than the infinitesimal Δ, that is, o(Δ) is a scalar function such that lim_{Δ→0} o(Δ)/Δ = 0).
The normalization condition \(\sum_{\beta=1}^m P_{\alpha\beta}(t, \tau) = 1\) for the transition probability and formula (1.1.42) imply that

$$\lambda_\alpha(t) = \sum_{\beta \ne \alpha} \lambda_{\alpha\beta}(t). \qquad (1.1.43)$$

⁵This is the well-known condition that the process ξ(t) is ordinary [157, 160], which means that the probability of two or more jumps of ξ(t) on a small time interval Δ is equal to o(Δ).
As is known [160, 181], the parameters λ_αβ(t) determine the intensity of jumps of the process ξ(t). The variable λ_α(t) defined by (1.1.43) is often called the exit intensity or the exit density of the state x_α. It determines the time interval on which the process ξ(t) remains in the state x_α, in the sense that the probability P_α of changing the state on the time interval [t, t + T] under the condition ξ(t) = x_α is

$$P_\alpha = 1 - \exp\Bigl\{-\int_t^{t+T} \lambda_\alpha(s)\, ds\Bigr\}.$$

By setting σ = t + Δ in (1.1.38) and using (1.1.42), we obtain

$$P_{\alpha\beta}(t, \tau) = [1 - \lambda_\alpha(t)\Delta]\, P_{\alpha\beta}(t + \Delta, \tau) + \sum_{\gamma \ne \alpha} \lambda_{\alpha\gamma}(t)\Delta\, P_{\gamma\beta}(t + \Delta, \tau) + o(\Delta). \qquad (1.1.44)$$
Dividing (1.1.44) by Δ and passing to the limit as Δ → 0, we arrive at the system of differential equations (α, β = 1, …, m)

$$-\frac{\partial P_{\alpha\beta}(t, \tau)}{\partial t} = -\lambda_\alpha(t)\, P_{\alpha\beta}(t, \tau) + \sum_{\gamma \ne \alpha} \lambda_{\alpha\gamma}(t)\, P_{\gamma\beta}(t, \tau) \qquad (1.1.45)$$

for the transition probabilities P_αβ(t, τ), considered as functions of the initial time t. Equations (1.1.45) hold for t ≤ τ. The unique solution of system (1.1.45) is determined by the additional conditions on the functions P_αβ(t, τ) for t = τ:

$$P_{\alpha\beta}(\tau, \tau) = \delta_{\alpha\beta} = \begin{cases} 1, & \alpha = \beta, \\ 0, & \alpha \ne \beta. \end{cases} \qquad (1.1.46)$$

With respect to the variable τ, the transition probabilities P_αβ(t, τ) satisfy the other system of equations (α, β = 1, …, m)

$$\frac{\partial P_{\alpha\beta}(t, \tau)}{\partial \tau} = -\lambda_\beta(\tau)\, P_{\alpha\beta}(t, \tau) + \sum_{\gamma \ne \beta} \lambda_{\gamma\beta}(\tau)\, P_{\alpha\gamma}(t, \tau), \qquad (1.1.47)$$

which, by analogy with (1.1.45), can be obtained from (1.1.38) with σ = τ − Δ by passing to the limit as Δ → 0. The initial conditions

$$P_{\alpha\beta}(t, t) = \delta_{\alpha\beta}, \qquad (1.1.48)$$

which are similar to (1.1.46), provide the uniqueness of the solution of (1.1.47) defined for τ ≥ t.
Equations (1.1.47) and (1.1.45) are the forward and backward systems of Kolmogorov equations for the transition probabilities. From equations (1.1.47) one can also readily derive equations for the unconditional probabilities P_α(t), α = 1, …, m. It suffices to multiply (1.1.47) by P_α(t) and sum over α, taking into account the fact that

$$\sum_{\alpha=1}^m P_\alpha(t)\, P_{\alpha\beta}(t, \tau) = P_\beta(\tau).$$

As a result, after we rename β → α and τ → t, we obtain the system of equations (α = 1, …, m)

$$\frac{dP_\alpha(t)}{dt} = -\lambda_\alpha(t)\, P_\alpha(t) + \sum_{\gamma \ne \alpha} \lambda_{\gamma\alpha}(t)\, P_\gamma(t). \qquad (1.1.49)$$

The initial probabilities P_α(0), α = 1, …, m, ensure that the solution of system (1.1.49) is unique for t ≥ 0.
Thus an ordinary discrete Markov process is completely determined by the probabilities P_α(0), α = 1, …, m, of the initial states and by the intensities λ_αβ(t), α, β = 1, …, m, α ≠ β, of the jumps. Indeed, if we know these characteristics, then we can find the probabilities P_α(t) and P_αβ(t, τ) by solving the systems of linear differential equations (1.1.49) and (1.1.47) (or (1.1.45)). Conversely, if we know the probabilities P_α(t) and P_αβ(t, τ), then we can calculate all possible probabilities of the form (1.1.37).
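For a concrete illustration, the Python sketch below (a hypothetical two-state process with constant intensities λ₁₂ and λ₂₁; all numbers are assumptions of the example) integrates the system (1.1.49) by the Euler method, writing the right-hand side in matrix form with a generator Λ built from (1.1.43).

```python
# Sketch for a two-state process (m = 2) with constant jump intensities,
# integrating (1.1.49) as dP/dt = P @ L, where L[a, b] = lambda_ab for
# a != b and L[a, a] = -lambda_a by the normalization (1.1.43).
import numpy as np

lam12, lam21 = 2.0, 1.0                      # hypothetical jump intensities
L = np.array([[-lam12, lam12],
              [lam21, -lam21]])              # generator matrix
P = np.array([1.0, 0.0])                     # initial probabilities P_a(0)
dt, T = 1e-4, 5.0
for _ in range(int(T / dt)):
    P = P + dt * (P @ L)                     # Euler step of (1.1.49)
print(P)                                     # -> approximately [1/3, 2/3]
```

The computed probabilities approach the stationary values (1/3, 2/3) dictated by the balance λ₁₂P₁ = λ₂₁P₂.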
Continuous Markov processes. These processes are continuous both in the phase space X and with respect to time. On each time interval t₀ ≤ t ≤ t₀ + T, the sample paths ξ(t) of such processes are continuous functions of time with probability 1.
First, let us consider a one-dimensional (scalar) continuous stochastic process. In this case, the phase space X = R₁ is the set of points on the real axis. Since the instant value ξ(t) = x of the process is a continuous random variable, its probability properties can be determined via the probability density function p(x, t). In a similar way, one can use the multidimensional density function p(x₁, x₂, …, xₙ; t₁, t₂, …, tₙ) to describe the set of instant values ξ(t₁) = x₁, ξ(t₂) = x₂, …, ξ(tₙ) = xₙ. A stochastic process ξ(t), 0 ≤ t ≤ T, is considered to be determined if we know all possible joint density functions p(x₁, …, xₙ; t₁, …, tₙ) for any finite set of time instants t₁, t₂, …, tₙ on the interval [0, T].
The multidimensional density functions p(x₁, …, xₙ; t₁, …, tₙ) are nonnegative functions that satisfy the normalization condition

$$\int p(x_1, \ldots, x_n; t_1, \ldots, t_n)\, dx_1 \ldots dx_n = 1$$

with respect to the variables x₁, …, xₙ. With respect to the variables t₁, …, tₙ, these functions satisfy the symmetry conditions [66, 173]

$$p(x_1, \ldots, x_n; t_1, \ldots, t_n) = p(x_{\alpha_1}, \ldots, x_{\alpha_n}; t_{\alpha_1}, \ldots, t_{\alpha_n})$$

(here (α₁, α₂, …, αₙ) is a permutation of the indices 1, 2, …, n) and the compatibility conditions, which allow us to obtain the marginal distribution [39] by integrating the initial density function:

$$p(x_\alpha, x_\beta; t_\alpha, t_\beta) = \int \cdots \int p(x_1, \ldots, x_n; t_1, \ldots, t_n)\, dx_1 \ldots dx_{\alpha-1}\, dx_{\alpha+1} \ldots dx_{\beta-1}\, dx_{\beta+1} \ldots dx_n. \qquad (1.1.50)$$

It follows from the probability multiplication theorem for joint distributions (for any, not necessarily Markov, process) that

$$p(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = p(x_1, t_1)\, p(x_2, t_2 \mid x_1, t_1) \cdots p(x_n, t_n \mid x_1, t_1; \ldots; x_{n-1}, t_{n-1}), \qquad (1.1.51)$$

where p(xᵢ, tᵢ | x₁, t₁; …; xᵢ₋₁, tᵢ₋₁), i = 2, …, n, are the densities of the conditional distributions of the process ξ(tᵢ) under the condition that the instant values ξ(t₁) = x₁, ξ(t₂) = x₂, …, ξ(tᵢ₋₁) = xᵢ₋₁ are fixed. However, if in (1.1.51) the sequence of times t₁ < t₂ < ⋯ < tₙ increases and ξ(t) is a Markov process, then, by (1.1.35), we can write (1.1.51) in the form

$$p(x_1, \ldots, x_n; t_1, \ldots, t_n) = p(x_1, t_1)\, p(x_2, t_2 \mid x_1, t_1)\, p(x_3, t_3 \mid x_2, t_2) \cdots p(x_n, t_n \mid x_{n-1}, t_{n-1}). \qquad (1.1.52)$$

Relation (1.1.52) is an analog of (1.1.37) in the case of continuous Markov processes.
It follows from (1.1.52) that, to write any multidimensional density function, one needs to know the unconditional density p(x, t) and the conditional density p(y, τ | x, t) for any t and τ > t. The function p(y, τ | x, t), just as P_αβ(t, τ) in the discrete case, is called the transition probability. One can obtain differential equations for the functions p(x, t) and p(y, τ | x, t) that are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49). However, in contrast with (1.1.45), (1.1.47), and (1.1.49), in this case these are partial differential equations.
We write p(y, τ | x, t) = p(x, t; y, τ) for the transition probability and consider this probability as a function of the four variables x, t, y, and τ. Then, using (1.1.50) and (1.1.52), we readily obtain the relation

$$p(x, t; y, \tau) = \int p(x, t; z, \sigma)\, p(z, \sigma; y, \tau)\, dz, \qquad t < \sigma < \tau, \qquad (1.1.53)$$

which is a continuous analog of the Chapman-Kolmogorov equation; this equation is often also called the Markov [67] or the Smoluchowski equation [173].
We define the local characteristics of the stochastic process ξ(t) by the relations

$$\lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{[\xi(t + \Delta) - \xi(t)] \mid \xi(t) = x\bigr\} = A(t, x), \qquad \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{[\xi(t + \Delta) - \xi(t)]^2 \mid \xi(t) = x\bigr\} = B(t, x). \qquad (1.1.54)$$

A Markov process possessing these local characteristics is called a diffusion process. The values A(t, x) and B(t, x) that determine ξ(t) are called the drift and the diffusion coefficients.
Figure 8 illustrates these parameters by showing some sample paths of the diffusion process ξ(t) issued from the point x at time t₀. The straight line AB shows the direction along which the "centroid" of the fan of sample paths drifts for t close to t₀. The angle α between the line AB and the x-axis is determined by the drift coefficient: tan α = A(t₀, x). For small (t − t₀), the diffusion coefficient B(t₀, x) determines the rate of increase of the variance of the instant values of ξ(t) about the line AB; that is, B(t₀, x) determines the expansion rate of the fan of sample paths issued from the point (t₀, x).
Note that the conditional expectations in (1.1.54) can be calculated by integrating with respect to the transition probability. For example,

$$\mathrm{E}\bigl\{[\xi(t + \Delta) - \xi(t)] \mid \xi(t) = x\bigr\} = \int (z - x)\, p(x, t; z, t + \Delta)\, dz. \qquad (1.1.55)$$


In (1.1.53) we set σ = t + Δ, assume that the transition probability p(z, t + Δ; y, τ) is a sufficiently smooth function, and write its Taylor expansion

$$p(z, t + \Delta; y, \tau) = p(x, t + \Delta; y, \tau) + \sum_{k=1}^{\infty} \frac{(z - x)^k}{k!}\, \frac{\partial^k p}{\partial x^k}(x, t + \Delta; y, \tau). \qquad (1.1.56)$$

Substituting (1.1.56) into (1.1.53) and taking into account (1.1.54) and (1.1.55), we obtain

$$p(x, t; y, \tau) = p(x, t + \Delta; y, \tau) + A(t, x)\Delta\, \frac{\partial p}{\partial x}(x, t + \Delta; y, \tau) + \frac{B(t, x)\Delta}{2}\, \frac{\partial^2 p}{\partial x^2}(x, t + \Delta; y, \tau) + o(\Delta). \qquad (1.1.57)$$

Dividing (1.1.57) by Δ and passing to the limit as Δ → 0, we arrive at the backward Kolmogorov equation

$$-\frac{\partial p(x, t; y, \tau)}{\partial t} = A(t, x)\, \frac{\partial p(x, t; y, \tau)}{\partial x} + \frac{B(t, x)}{2}\, \frac{\partial^2 p(x, t; y, \tau)}{\partial x^2}. \qquad (1.1.58)$$

To obtain the forward equation, which the transition probability satisfies as a function of y and τ, we note that if ξ(σ) = z, then the probability properties of the increment ζ(z, σ) = ξ(τ) − ξ(σ) of the stochastic process ξ(τ) are completely determined by the function p(z, σ; y, τ). As is
known [67, 173], the characteristic function Θ(u; z, σ) related to p(z, σ; y, τ) by the Fourier transform

$$\Theta(u; z, \sigma) = \mathrm{E}\bigl\{\exp[ju(\xi(\tau) - \xi(\sigma))] \mid \xi(\sigma) = z\bigr\} = \int e^{ju(y - z)}\, p(z, \sigma; y, \tau)\, dy$$

and by the inverse Fourier transform

$$p(z, \sigma; y, \tau) = \frac{1}{2\pi} \int e^{-ju(y - z)}\, \Theta(u; z, \sigma)\, du$$

is also a universal characteristic of the random variable ζ(z, σ). Considering Θ(u; z, σ) as a function of u and writing its Maclaurin series

$$\Theta(u; z, \sigma) = 1 + \sum_{k=1}^{\infty} \frac{(ju)^k}{k!}\, \nu_k(z, \sigma) \qquad (1.1.61)$$

(here ν_k(z, σ) = E{[ξ(τ) − ξ(σ)]^k | ξ(σ) = z} is the k-order moment of the increment ζ(z, σ)), we see that (1.1.61) and (1.1.54) imply

$$\Theta(u; z, \sigma) = 1 + ju\, A(\sigma, z)\Delta + \frac{(ju)^2}{2}\, B(\sigma, z)\Delta + o(\Delta) \qquad (1.1.62)$$

for σ = τ − Δ. Applying the inverse Fourier transform to the left- and right-hand sides of this relation, we obtain

$$p(z, \tau - \Delta; y, \tau) = \delta(y - z) - \Delta\, \frac{\partial}{\partial y}\bigl[A(\tau - \Delta, z)\, \delta(y - z)\bigr] + \frac{\Delta}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(\tau - \Delta, z)\, \delta(y - z)\bigr] + o(\Delta). \qquad (1.1.63)$$

In (1.1.63) we have used the well-known formal relation [41] for the delta-function

$$\frac{1}{2\pi} \int (ju)^k e^{-ju(y - z)}\, du = (-1)^k\, \frac{\partial^k}{\partial y^k}\, \delta(y - z),$$

which acquires an exact meaning after being multiplied by an arbitrary continuous function φ(z) and integrated with respect to z, with (1.1.32) taken into account.
We set σ = τ − Δ in (1.1.53), use (1.1.63), integrate with respect to z, and obtain

$$p(x, t; y, \tau) = p(x, t; y, \tau - \Delta) - \Delta\, \frac{\partial}{\partial y}\bigl[A(\tau - \Delta, y)\, p(x, t; y, \tau - \Delta)\bigr] + \frac{\Delta}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(\tau - \Delta, y)\, p(x, t; y, \tau - \Delta)\bigr] + o(\Delta).$$
Passing to the limit as Δ → 0, we obtain the forward Kolmogorov equation

$$\frac{\partial p(x, t; y, \tau)}{\partial \tau} = -\frac{\partial}{\partial y}\bigl[A(\tau, y)\, p(x, t; y, \tau)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial y^2}\bigl[B(\tau, y)\, p(x, t; y, \tau)\bigr], \qquad (1.1.64)$$

which is also called the Fokker-Planck equation.


Equations (1.1.58) and (1.1.64) are linear partial differential equations of parabolic type. It is well known [166, 179] that the unique solution of such equations is not determined solely by the initial condition (for (1.1.64)) or the endpoint condition (for (1.1.58))

$$p(x, t; y, t) = \delta(y - x), \qquad (1.1.65)$$

which the transition probability satisfies for τ = t. It is also necessary to take into account the boundary conditions, which, in the case of an infinite phase space X, consist of restrictions imposed on the asymptotic behavior of the function p(x, t; y, τ) as |x|, |y| → ∞. To obtain unique solutions of (1.1.58) and (1.1.64), it suffices to require that p(x, t; y, τ) be bounded, though it follows from the normalization condition that we always have the sharper condition p(x, t; y, τ) → 0 as |x|, |y| → ∞. If the phase space X is bounded, then the additional conditions that the function p(x, t; y, τ) must satisfy at the boundary points of X are determined by the behavior of the phase trajectories ξ(t) near these boundary points (see §6.2).
Multiplying (1.1.64) by p(x, t), integrating with respect to x, and taking into account the fact that

$$\int p(x, t)\, p(x, t; y, \tau)\, dx = p(y, \tau),$$

we obtain (after the change τ → t and y → x) the following equation for the unconditional density p(x, t):

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x}\bigl[A(t, x)\, p(x, t)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\bigl[B(t, x)\, p(x, t)\bigr], \qquad (1.1.66)$$

which is similar to (1.1.64). To solve (1.1.66) for t ≥ t₀, it is necessary to prescribe the initial density p(x, t₀) = p₀(x) and to take into account the boundary conditions. If p₀(x) = δ(x − x₀), then the solution of (1.1.66) is the transition probability p(x₀, t₀; x, t), t ≥ t₀.
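As a numerical illustration of (1.1.66), the following Python sketch (assuming, for the example only, the linear drift A(t, x) = −x and constant diffusion B(t, x) = 1, i.e., an Ornstein-Uhlenbeck process) integrates the Fokker-Planck equation by explicit finite differences on a truncated axis, with the boundary condition p → 0 discussed above.

```python
# Minimal sketch (assumed test case A(x) = -x, B(x) = 1): explicit finite
# differences for the Fokker-Planck equation (1.1.66) on a truncated axis.
import numpy as np

xs = np.linspace(-5, 5, 201)
dx = xs[1] - xs[0]
dt = 0.2 * dx**2                       # crude stability bound for the scheme
p = np.exp(-(xs - 2.0) ** 2 / 0.1)     # narrow initial density near x0 = 2
p /= np.trapz(p, xs)
A, B = -xs, np.ones_like(xs)
for _ in range(int(3.0 / dt)):         # integrate to t = 3
    dflux = np.gradient(A * p, dx)     # d/dx [A(x) p(x, t)]
    d2Bp = np.gradient(np.gradient(B * p, dx), dx)
    p = p + dt * (-dflux + 0.5 * d2Bp)
    p[0] = p[-1] = 0.0                 # boundary condition p -> 0 as |x| -> inf
print(np.trapz(p, xs), np.trapz(xs**2 * p, xs))  # mass ~ 1, 2nd moment ~ 1/2
```

The density spreads from its initial peak and settles near the Gaussian stationary solution of this test case, with variance 1/2.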
If the process ξ(t) is an n-dimensional vector-function of time (the phase space is X = Rₙ), then the local characteristics (1.1.54) of the process are determined by the vector A(x, t) of drift coefficients with components Aᵢ(x, t), i = 1, …, n, and by the matrix of diffusion coefficients Bᵢⱼ(x, t), i, j = 1, …, n. In the multidimensional case, the Fokker-Planck equation (1.1.66) has the form

$$\frac{\partial p(x, t)}{\partial t} = -\frac{\partial}{\partial x_i}\bigl[A_i(x, t)\, p(x, t)\bigr] + \frac{1}{2}\, \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij}(x, t)\, p(x, t)\bigr]. \qquad (1.1.67)$$

The sums on the right-hand side of (1.1.67) are taken over twice repeated indices. This means that these expressions are understood as

$$\frac{\partial}{\partial x_i}\bigl[A_i p\bigr] = \sum_{i=1}^n \frac{\partial}{\partial x_i}\bigl[A_i p\bigr], \qquad \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij} p\bigr] = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2}{\partial x_i \partial x_j}\bigl[B_{ij} p\bigr].$$

In what follows we shall also use this short notation for sums.


It should be noted that the Fokker-Planck equation is not the only method for describing the properties of continuous Markov processes. Another method for defining diffusion processes is based on stochastic differential equations. We consider this method in §1.2.
Strictly discontinuous Markov processes. Suppose that the state x of the process ξ(t) varies by jumps at random instants of time. By analogy with the case of discrete processes, we assume that the moments of jumps form an ordinary sequence of events; we denote the intensity of jumps by λ(x, t) provided that ξ(t) = x. A jump at time t transfers the process ξ(t) to a random state y with probability density π(x, y, t) = p(ξ(t + 0) = y | ξ(t) = x). Then for small τ − t > 0 the transition probability p(x, t; y, τ) of this process can be written in the form

$$p(x, t; y, \tau) = [1 - (\tau - t)\lambda(x, t)]\, \delta(y - x) + (\tau - t)\lambda(x, t)\, \pi(x, y, t) + o(\tau - t). \qquad (1.1.68)$$

Just as previously, in (1.1.53) we first set σ = t + Δ and then σ = τ − Δ, and apply (1.1.68). Passing to the limit as Δ → 0 in (1.1.53), we obtain the following pair of integro-differential Feller equations for the transition probability:

$$-\frac{\partial p(x, t; y, \tau)}{\partial t} = -\lambda(x, t)\, p(x, t; y, \tau) + \lambda(x, t) \int \pi(x, z, t)\, p(z, t; y, \tau)\, dz, \qquad (1.1.69)$$

$$\frac{\partial p(x, t; y, \tau)}{\partial \tau} = -\lambda(y, \tau)\, p(x, t; y, \tau) + \int \lambda(z, \tau)\, \pi(z, y, \tau)\, p(x, t; z, \tau)\, dz. \qquad (1.1.70)$$

It follows from (1.1.70) that the one-dimensional unconditional density p(x, t) satisfies the equation

$$\frac{\partial p(x, t)}{\partial t} = -\lambda(x, t)\, p(x, t) + \int \lambda(z, t)\, \pi(z, x, t)\, p(z, t)\, dz. \qquad (1.1.71)$$

Equations (1.1.69)-(1.1.71) describe the probability properties of a strictly discontinuous Markov process. They are analogs of Eqs. (1.1.45), (1.1.47), and (1.1.49) for discrete processes and of Eqs. (1.1.58), (1.1.64), and (1.1.66) for diffusion processes.
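Sample paths of such a process are easy to generate. The Python sketch below assumes, purely for illustration, a constant jump intensity λ and a Gaussian jump density π(x, y, t) centered at the pre-jump state; for a state-dependent λ(x, t), the waiting times would instead be drawn by the standard thinning technique.

```python
# Sketch of a strictly discontinuous Markov process: exponential waiting
# times with constant intensity lam, and jumps to y drawn from a Gaussian
# pi(x, y) centered at the old state (both are assumptions of the example).
import numpy as np

rng = np.random.default_rng(1)
lam, T = 3.0, 10.0                       # assumed jump intensity and horizon
t, x = 0.0, 0.0
times, states = [0.0], [0.0]
while True:
    t += rng.exponential(1.0 / lam)      # waiting time ~ Exp(lam)
    if t > T:
        break
    x = rng.normal(x, 0.5)               # new state y ~ pi(x, y)
    times.append(t)
    states.append(x)
print(len(times) - 1, "jumps; last state", states[-1])
```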

§1.2. Differential equations for controlled systems with random functions

Let us consider system (1.1.6) of stochastic differential equations describing the dynamics of the controlled plant P shown in Fig. 3, namely,

$$\dot{x} = g\bigl(t, x, u, \xi(t)\bigr). \qquad (1.1.6)$$

Recall that x = x(t) is an n-dimensional vector-column with components xᵢ = xᵢ(t), i = 1, …, n, and g(t, x, u, ξ) is a given vector-function of the time t and the vectors x, u, and ξ.
We begin the study of stochastic differential equations with the case in which the vector-function g is independent of the control parameters u (the motion without control):

$$\dot{x} = g\bigl(t, x, \xi(t)\bigr). \qquad (1.2.1)$$

Here we mainly consider a special case of Eq. (1.2.1), namely, the scalar equation

$$\dot{x} = a(t, x) + \sigma(t, x)\, \xi(t). \qquad (1.2.2)$$

The results obtained for (1.2.2) in what follows can readily be generalized to the general case (1.2.1), as well as to the case of controlled motion (1.1.6). We do this at the end of the section.
If the stochastic process ξ(t) is the time derivative ξ(t) = η̇(t) = dη(t)/dt of some random function η(t), then, multiplying (1.2.2) by dt, we can write Eq. (1.2.2) in terms of differentials:

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d\eta(t). \qquad (1.2.3)$$

The stochastic process x(t), t ≥ t₀, is called the solution of the stochastic differential equations (1.2.2), (1.2.3) with initial condition x(t₀) = x₀, and the expression on the right-hand side of (1.2.3) is called the stochastic differential of this process, if for any t ≥ t₀ we have the integral representation

$$x(t) = x_0 + \int_{t_0}^t a\bigl(\tau, x(\tau)\bigr)\, d\tau + \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau). \qquad (1.2.4)$$
Suppose that ξ(t) in (1.2.2) is the standard white noise with characteristics (1.1.31). Then the stochastic process

$$\eta(t) = \int_0^t \xi(s)\, ds \qquad (1.2.5)$$

is Gaussian, and it follows from (1.1.31) that the increment η(t) − η(0) of this process has the following mean and variance over time t:

$$\mathrm{E}[\eta(t) - \eta(0)] = 0, \qquad (1.2.6)$$
$$\mathrm{D}[\eta(t) - \eta(0)] = \mathrm{E}[\eta(t) - \eta(0)]^2 = t. \qquad (1.2.7)$$

It also follows from (1.1.31) that the increments of η(t) on nonintersecting time intervals are independent random variables, since we have

$$\mathrm{E}\bigl[\eta(t_2) - \eta(t_1)\bigr]\bigl[\eta(t_4) - \eta(t_3)\bigr] = 0$$

if the intervals [t₁, t₂] and [t₃, t₄] have no common interior points. The process η(t) is called Brownian motion or a Wiener random process. One can show [66] that, with probability 1, the realizations of this process are continuous but (in view of (1.1.31) and (1.2.5)) nowhere differentiable functions of time. Formula (1.2.7) illustrates these properties of the Wiener process. Indeed, the order of the increment Δη = η(t + Δ) − η(t) is given by the mean square deviation

$$\bigl[\mathrm{E}(\Delta\eta)^2\bigr]^{1/2} = \sqrt{\Delta}.$$
Thus, as Δ → 0, we have |Δη| ∼ √Δ → 0 (the continuity), while the rate |Δη|/Δ ∼ 1/√Δ → ∞ (the nondifferentiability). The important formula [175]

$$\lim_{\Delta \to 0} \sum_{i=0}^{N-1} \bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2 = t - t_0 \qquad (1.2.8)$$

$$\bigl(t_0 < t_1 < t_2 < \cdots < t_N = t; \quad \Delta = \max_i (t_{i+1} - t_i)\bigr)$$

is an immediate consequence of these specific properties of the Wiener process. In (1.2.8) the convergence of the sequence of random sums on the left-hand side to the nonrandom variable on the right-hand side is understood in the sense of convergence in probability (that is, (1.2.8) is satisfied almost surely).
Let us prove (1.2.8). To this end, we note that, according to (1.2.6) and (1.2.7), the increment Δηᵢ = η(t_{i+1}) − η(tᵢ) is a Gaussian random variable with zero mean and variance

$$\mathrm{E}(\Delta\eta_i)^2 = t_{i+1} - t_i. \qquad (1.2.9)$$

We calculate the initial 4-order moment of this increment:

$$\mathrm{E}(\Delta\eta_i)^4 = \frac{1}{\sqrt{2\pi(t_{i+1} - t_i)}} \int_{-\infty}^{\infty} x^4 \exp\Bigl\{-\frac{x^2}{2(t_{i+1} - t_i)}\Bigr\}\, dx = 3(t_{i+1} - t_i)^2. \qquad (1.2.10)$$
Let us consider the random sum

$$\zeta_N = \sum_{i=0}^{N-1} (\Delta\eta_i)^2. \qquad (1.2.11)$$

It follows from (1.2.9) that

$$\mathrm{E}\zeta_N = \sum_{i=0}^{N-1} (t_{i+1} - t_i) = t - t_0. \qquad (1.2.12)$$

Since the Δηᵢ are independent, we have the variance

$$\mathrm{D}\zeta_N = \sum_{i=0}^{N-1} \mathrm{D}(\Delta\eta_i)^2.$$

Taking into account the inequality D(Δηᵢ)² ≤ E(Δηᵢ)⁴ = 3(t_{i+1} − tᵢ)², we obtain

$$\mathrm{D}\zeta_N \le 3 \sum_{i=0}^{N-1} (t_{i+1} - t_i)^2 \le 3 \max_i (t_{i+1} - t_i) \sum_{i=0}^{N-1} (t_{i+1} - t_i) = 3\Delta(t - t_0),$$
that is, the variance of (1.2.11) tends to zero as Δ → 0. Thus Chebyshev's inequality and formula (1.2.12) imply (1.2.8).
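The statement (1.2.8) is also easy to check by simulation. The Python sketch below (uniform partitions of [0, 1] are an arbitrary choice of the example) draws independent Gaussian increments and shows the sum of their squares concentrating at t − t₀.

```python
# Numerical illustration of (1.2.8): the sum of squared Wiener increments
# over [t0, t] concentrates at t - t0 as the partition is refined.
import numpy as np

rng = np.random.default_rng(2)
t0, t = 0.0, 1.0
for N in (10, 100, 10_000, 1_000_000):
    d_eta = rng.normal(0.0, np.sqrt((t - t0) / N), size=N)  # increments
    print(N, np.sum(d_eta ** 2))         # -> t - t0 = 1 in probability
```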


Now let us return to formula (1.2.4) for the solution of the stochastic differential equations (1.2.2), (1.2.3). The right-hand side of (1.2.4) contains integrals of random functions, that is, stochastic integrals. The rigorous theory of such integrals, first presented in [75], can be found in [66]. Let us consider some special properties of these integrals that distinguish them from ordinary integrals of sufficiently smooth deterministic functions. From the course of mathematical analysis it is known that the (nonstochastic) Riemann and Stieltjes integrals are defined as the limits, as Δ → 0, of the integral sums

$$\int_{t_0}^t a\bigl(\tau, x(\tau)\bigr)\, d\tau = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} a\bigl(\tau_i, x(\tau_i)\bigr)(t_{i+1} - t_i), \qquad (1.2.13)$$

$$\int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \sigma\bigl(\tau_i, x(\tau_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr], \qquad (1.2.14)$$

obtained by a Δ-decomposition of the integration interval [t₀, t]: t₀ < t₁ < t₂ < ⋯ < t_N = t, Δ = maxᵢ(t_{i+1} − tᵢ). In (1.2.13) and (1.2.14), τᵢ denotes a point τᵢ ∈ [tᵢ, t_{i+1}] of the ith subinterval of the Δ-decomposition. Note that for any piecewise continuous function a(τ, x) and any continuous function σ(τ, x), the limits of the integral sums in (1.2.13) and (1.2.14) are independent of the positions of the points τᵢ on the intervals [tᵢ, t_{i+1}], i = 0, …, N − 1.
In the stochastic case, in which a(τ, x(τ)) = a(τ), σ(τ, x(τ)) = σ(τ), and η(τ) are random functions (η(τ) is a Wiener process), the integrals (1.2.13) and (1.2.14) can also be defined by the same formulas provided that [160]:

(1) the random functions a(τ) and σ(τ) are uniformly mean square continuous on the interval [t₀, t], that is, E[a(τ + Δ) − a(τ)]² → 0 as Δ → 0 uniformly in τ ∈ [t₀, t] (the same holds for σ(τ));
(2) the functions a(τ) and σ(τ) are square integrable, more precisely,

$$\int_{t_0}^t \mathrm{E}a^2(\tau)\, d\tau < \infty, \qquad \int_{t_0}^t \mathrm{E}\sigma^2(\tau)\, d\tau < \infty;$$

(3) the limits in (1.2.13), (1.2.14) are understood as mean square limits (recall that a random variable ξ is called the mean square limit of a sequence of random variables ξₙ if E(ξₙ − ξ)² → 0 as n → ∞).
If assumptions (1)-(3) are satisfied, then the limits in (1.2.13) and (1.2.14) exist and the stochastic integrals are well defined by formulas (1.2.13) and (1.2.14).
The fact that the value of the stochastic integral (1.2.14) depends on the choice of the points τᵢ is essentially new in contrast with the deterministic case. This follows from the special properties of the Wiener process and formula (1.2.8) (integral (1.2.13) is independent of the choice of τᵢ). Let us consider this fact in more detail. It follows from the method for constructing the integral sum (1.2.14) that we can replace the continuous function σ(τ) = σ(τ, x(τ)) on the integration interval [t₀, t] by a piecewise constant function σ_Δ(τ) whose constant values on the segments [tᵢ, t_{i+1}] of the Δ-decomposition coincide with the values of the continuous function σ(τ) calculated at arbitrary points τᵢ ∈ [tᵢ, t_{i+1}] (see Fig. 9). Let us define the rule for the choice of τᵢ by the formula

$$\tau_i = t_i + \nu(t_{i+1} - t_i), \qquad 0 \le \nu \le 1. \qquad (1.2.15)$$

We show that in this case the value of the integral (1.2.14) depends on ν.

Since the random function x(τ) is continuous, we can approximate its realization on the interval [tᵢ, t_{i+1}] (i = 0, …, N − 1) by the segment of the straight line

$$x(\tau) = x(t_i) + \frac{x(t_{i+1}) - x(t_i)}{t_{i+1} - t_i}(\tau - t_i) + o(\Delta), \qquad t_i \le \tau \le t_{i+1} \qquad (1.2.16)$$

(relations (1.2.16), (1.2.8), (1.2.13), and (1.2.14) are satisfied almost surely). From (1.2.16) and (1.2.15), we have

$$x(\tau_i) = x(t_i) + \nu\bigl[x(t_{i+1}) - x(t_i)\bigr] + o(\Delta). \qquad (1.2.17)$$

We assume that the function σ(τ, x) is continuously differentiable with respect to both arguments. Then it follows from (1.2.15) and (1.2.17) that

$$\sigma\bigl(\tau_i, x(\tau_i)\bigr) = \sigma\bigl(t_i, x(t_i)\bigr) + \nu(t_{i+1} - t_i)\, \frac{\partial \sigma}{\partial \tau}\bigl(t_i, x(t_i)\bigr) + \nu\bigl[x(t_{i+1}) - x(t_i)\bigr]\, \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr) + o(\Delta). \qquad (1.2.18)$$

Using (1.2.18), we transform (1.2.14), which defines the stochastic integral, as follows:

$$I_\nu = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_\nu \eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \Bigl\{\sigma\bigl(t_i, x(t_i)\bigr) + \nu\bigl[x(t_{i+1}) - x(t_i)\bigr]\, \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\Bigr\}\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]. \qquad (1.2.19)$$

In the integral I_ν and in the differential d_νη(τ) of the Wiener process in (1.2.19), the subscript ν indicates the method by which the integral sum was constructed.
For ν = 0, formula (1.2.19) defines the Ito stochastic integral

$$I_0 = \int_{t_0}^t \sigma\bigl(\tau, x(\tau)\bigr)\, d_0 \eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \sigma\bigl(t_i, x(t_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr], \qquad (1.2.20)$$

which is widely used in the theory of stochastic processes of diffusion type [45, 66, 113, 117, 132]. Let us calculate the difference I_ν − I₀ for an arbitrary ν. Since we assume that the function σ(τ, x) is differentiable, it can be expanded into a Taylor series up to the first two terms in a neighborhood of an arbitrary point (tᵢ, x(tᵢ)) as follows:

$$\sigma\bigl(\tau_i, x(\tau_i)\bigr) = \sigma\bigl(t_i, x(t_i)\bigr) + \frac{\partial \sigma}{\partial \tau}\bigl(t_i, x(t_i)\bigr)(\tau_i - t_i) + \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\bigl[x(\tau_i) - x(t_i)\bigr] + \cdots. \qquad (1.2.21)$$
Passing in (1.2.3) from differentials to finite increments and taking into account the fact that t_{i+1} − tᵢ is small, we obtain

$$x(t_{i+1}) - x(t_i) = a\bigl(\tau_i', x(\tau_i')\bigr)(t_{i+1} - t_i) + \sigma\bigl(\tau_i', x(\tau_i')\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr] + o(t_{i+1} - t_i) = \sigma\bigl(t_i, x(t_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr] + o(\Delta). \qquad (1.2.22)$$

We substitute (1.2.22) into (1.2.21), then substitute the result into (1.2.19), and obtain the desired difference of the integrals

$$I_\nu - I_0 = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \nu\, \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\, \sigma\bigl(t_i, x(t_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2. \qquad (1.2.23)$$

Let us calculate the limit on the right-hand side of (1.2.23). To this end, following [175], we consider both the Δ-decomposition tᵢ, 0 ≤ i ≤ N, of the integration domain [t₀, t] and a coarser ε-decomposition tⱼ', 0 ≤ j ≤ M < N, such that maxⱼ(t'_{j+1} − tⱼ') = ε > Δ. For each fixed ε-decomposition we define piecewise constant functions γ_ε(t) and γ̄_ε(t) whose constant values on the jth part of the ε-decomposition are given by the formulas

$$\underline{\gamma}_\varepsilon(t) = \min_{t_j' \le \tau \le t_{j+1}'} \nu\, \frac{\partial \sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr), \qquad \bar{\gamma}_\varepsilon(t) = \max_{t_j' \le \tau \le t_{j+1}'} \nu\, \frac{\partial \sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr), \qquad t_j' \le t < t_{j+1}'.$$

Obviously, we have

$$\sum_{i=0}^{N-1} \underline{\gamma}_\varepsilon(t_i)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2 \le \sum_{i=0}^{N-1} \nu\, \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\, \sigma\bigl(t_i, x(t_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2 \le \sum_{i=0}^{N-1} \bar{\gamma}_\varepsilon(t_i)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2. \qquad (1.2.24)$$

It follows from (1.2.8) that

$$\lim_{\Delta \to 0} \sum_{t_i \in [t_j', t_{j+1}']} \bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2 = t_{j+1}' - t_j';$$

therefore, the inequality (1.2.24) implies

$$\sum_{j=0}^{M-1} \underline{\gamma}_\varepsilon(t_j')(t_{j+1}' - t_j') \le \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \nu\, \frac{\partial \sigma}{\partial x}\bigl(t_i, x(t_i)\bigr)\, \sigma\bigl(t_i, x(t_i)\bigr)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2 \le \sum_{j=0}^{M-1} \bar{\gamma}_\varepsilon(t_j')(t_{j+1}' - t_j'). \qquad (1.2.25)$$
The last inequality holds for any fixed ε-decomposition. Since the function σ(t, x) is continuously differentiable, we have γ̄_ε(t) − γ_ε(t) → 0 as ε → 0. Thus the first and the last sums in (1.2.25) have the same limit as ε → 0:

$$\lim_{\varepsilon \to 0} \sum_{j=0}^{M-1} \underline{\gamma}_\varepsilon(t_j')(t_{j+1}' - t_j') = \lim_{\varepsilon \to 0} \sum_{j=0}^{M-1} \bar{\gamma}_\varepsilon(t_j')(t_{j+1}' - t_j') = \nu \int_{t_0}^t \frac{\partial \sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr)\, d\tau.$$

This relation and (1.2.25) determine the limit in (1.2.23). Now we return to (1.2.23) and obtain the following relation between the stochastic integral I_ν and the Ito integral I₀:

$$I_\nu = I_0 + \nu \int_{t_0}^t \frac{\partial \sigma}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.26)$$
Thus we see that the following similar formula holds for any square integrable function Φ(τ, x(τ)) continuously differentiable with respect to both arguments (provided that the stochastic process x(t) satisfies Eq. (1.2.3)):

$$\int_{t_0}^t \Phi\bigl(\tau, x(\tau)\bigr)\, d_\nu \eta(\tau) = \int_{t_0}^t \Phi\bigl(\tau, x(\tau)\bigr)\, d_0 \eta(\tau) + \nu \int_{t_0}^t \frac{\partial \Phi}{\partial x}\bigl(\tau, x(\tau)\bigr)\, \sigma\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.27)$$

We also note that if the function σ in (1.2.26) or Φ in (1.2.27) is independent of x, then there is no difference between the integrals I_ν and I₀.
The stochastic integral I_ν with parameter ν = 1/2 plays an important role. Such integrals were introduced by R. L. Stratonovich [174] and are called symmetrized. The advantage of such integrals is that they can be calculated by the rules of integration for deterministic smooth functions. The calculation of the stochastic integral

$$\int_{t_0}^t \eta(\tau)\, d_\nu \eta(\tau) \qquad (1.2.28)$$

of the Wiener process is usually presented as an illustrative example [167, 174, 175]. Indeed, in this case the Ito integral can readily be calculated from (1.2.20) and (1.2.8). By (1.2.20) we have

$$I_0 = \int_{t_0}^t \eta(\tau)\, d_0 \eta(\tau) = \lim_{\Delta \to 0} \sum_{i=0}^{N-1} \eta(t_i)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr].$$

We write the ith summand in the form

$$\eta(t_i)\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr] = \tfrac{1}{2}\bigl[\eta^2(t_{i+1}) - \eta^2(t_i)\bigr] - \tfrac{1}{2}\bigl[\eta(t_{i+1}) - \eta(t_i)\bigr]^2,$$

take into account the equality η(t_N) = η(t) and (1.2.8), sum over i, and thus obtain

$$I_0 = \tfrac{1}{2}\bigl[\eta^2(t) - \eta^2(t_0)\bigr] - \tfrac{1}{2}(t - t_0).$$
A similar calculation for the symmetrized integral follows from (1.2.28) and (1.2.27) with ν = 1/2, σ(τ, x) ≡ 1, and Φ(τ, x(τ)) = x(τ) = η(τ). Since in this case the second summand on the right-hand side of (1.2.27) is equal to (t − t₀)/2, we have

$$\int_{t_0}^t \eta(\tau)\, d_{1/2} \eta(\tau) = \tfrac{1}{2}\bigl[\eta^2(t) - \eta^2(t_0)\bigr],$$

that is, the usual formula of integration of deterministic functions is valid for symmetrized stochastic integrals.
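Both results can be verified numerically. In the Python sketch below (one sample path on a uniform partition; an assumption of the example is that averaging the two endpoint values of η reproduces the ν = 1/2 prescription in the limit, which telescopes exactly for this integrand), the left-endpoint sums approximate the Ito value and the symmetrized sums reproduce the deterministic formula.

```python
# Sketch comparing the Ito (nu = 0) and symmetrized (nu = 1/2) sums for
# the integral of eta d_nu eta against the closed forms derived above.
import numpy as np

rng = np.random.default_rng(3)
N, T = 1_000_000, 1.0
d_eta = rng.normal(0.0, np.sqrt(T / N), size=N)
eta = np.concatenate(([0.0], np.cumsum(d_eta)))     # eta(t_i), eta(0) = 0

ito = np.sum(eta[:-1] * d_eta)                      # left endpoints, nu = 0
strat = np.sum(0.5 * (eta[:-1] + eta[1:]) * d_eta)  # endpoint averages
print(ito, 0.5 * eta[-1] ** 2 - 0.5 * T)   # (1/2)eta^2(t) - (t - t0)/2
print(strat, 0.5 * eta[-1] ** 2)           # (1/2)eta^2(t): deterministic rule
```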
It follows from the preceding that if the solution of stochastic differen-
tial equations (1.2.2), (1.2.3) is defined as a random function x(t) satisfy-
ing (1.2.4), then this definition must be supplemented with a method for
calculating the last integral in (1.2.4), since different stochastic integrals
determine different solutions of Eqs. (1.2.2) and (1.2.3).
In fact, it follows from (1.2.4) that for all interpretations of the stochastic integral the solution x(t) is a Markov stochastic process, since the initial value x(t₀) = x₀ and the future increments of the Wiener process η(τ), τ > t₀, which are independent of x₀, uniquely determine the future (with respect to t₀) behavior of the stochastic process x(t), t > t₀. The Markov process x(t) is a continuous stochastic process of diffusion type; therefore, according to §1.1, its probability properties are completely determined by the drift and diffusion coefficients (1.1.54):

$$A(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{[x(t + \Delta) - x(t)] \mid x(t) = x\bigr\}, \qquad (1.2.29)$$

$$B(t, x) = \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathrm{E}\bigl\{[x(t + \Delta) - x(t)]^2 \mid x(t) = x\bigr\}. \qquad (1.2.30)$$

It follows from (1.2.4) that

$$x(t + \Delta) - x(t) = \int_t^{t+\Delta} a\bigl(\tau, x(\tau)\bigr)\, d\tau + \int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d\eta(\tau). \qquad (1.2.31)$$

Since the process x(τ) and the function a(τ, x) are continuous, the mean value of the first integral in (1.2.31) can be written for small Δ as follows:

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} a\bigl(\tau, x(\tau)\bigr)\, d\tau \Bigm| x(t) = x\Bigr\} = a(t, x)\Delta + o(\Delta). \qquad (1.2.32)$$

The result of averaging the second integral in (1.2.31) depends on the definition of the stochastic integral. If we have an Ito integral, then by definition (1.2.20) we have

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau) \Bigm| x(t) = x\Bigr\} = 0 \qquad (1.2.33)$$

(for any Δ, not necessarily small). This important property of Ito integrals follows from the fact that the increments of η(τ) are independent of the integrand σ(τ, x(τ)) (here σ is not an anticipating function [132]). For the same reason, for any Δ, formulas (1.2.8) and (1.2.20) imply

$$\mathrm{E}\Bigl\{\Bigl[\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_0\eta(\tau)\Bigr]^2 \Bigm| x(t) = x\Bigr\} = \mathrm{E}\int_t^{t+\Delta} \sigma^2\bigl(\tau, x(\tau)\bigr)\, d\tau. \qquad (1.2.34)$$

From (1.2.29)-(1.2.34) we obtain the local characteristics

$$A(t, x) = a(t, x), \qquad B(t, x) = \sigma^2(t, x) \qquad (1.2.35)$$
of the Markov process defined by (1.2.4) on the basis of the Ito integral.
If the second integral in (1.2.31) is understood in the sense of (1.2.19), then formula (1.2.34) remains valid after the change d₀η(τ) → d_νη(τ), but, instead of (1.2.33), we obtain the following formula from (1.2.26) and (1.2.33):

$$\mathrm{E}\Bigl\{\int_t^{t+\Delta} \sigma\bigl(\tau, x(\tau)\bigr)\, d_\nu\eta(\tau) \Bigm| x(t) = x\Bigr\} = \nu\, \frac{\partial \sigma}{\partial x}(t, x)\, \sigma(t, x)\Delta + o(\Delta). \qquad (1.2.36)$$

In this case, the diffusion process x(t) has the other characteristics

$$A(t, x) = a(t, x) + \nu\, \frac{\partial \sigma}{\partial x}(t, x)\, \sigma(t, x), \qquad B(t, x) = \sigma^2(t, x). \qquad (1.2.37)$$

Thus, to avoid misunderstanding, it would be more correct from the very beginning to write the stochastic differential equation (1.2.3), say, in the form

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t), \qquad (1.2.38)$$

where the subscript ν in the differential d_νη(t) shows in which sense we understand the stochastic integrals in the integral equation (1.2.4) equivalent to (1.2.3). If ν = 0 in (1.2.38), then Eq. (1.2.38) is called the Ito differential equation. If ν assumes different values, then, as was shown previously, we have different solutions of Eq. (1.2.38). Conversely, the same diffusion process x(t) with drift coefficient A(t, x) and diffusion coefficient B(t, x) can be described by infinitely many stochastic differential equations

$$dx(t) = \Bigl[A\bigl(t, x(t)\bigr) - \nu\, \frac{\partial \sigma}{\partial x}\bigl(t, x(t)\bigr)\, \sigma\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t), \qquad \sigma = \sqrt{B}, \qquad (1.2.39)$$

corresponding to different values of the parameter ν.
If the Markov process x(t) is defined by the Ito equation

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d_0\eta(t), \qquad (1.2.40)$$

then Eq. (1.2.38), equivalent to (1.2.40), has the form

$$dx(t) = \Bigl[a\bigl(t, x(t)\bigr) - \nu\, \frac{\partial \sigma}{\partial x}\bigl(t, x(t)\bigr)\, \sigma\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_\nu\eta(t). \qquad (1.2.41)$$
Finally, if x(t) is defined by (1.2.38), then the Ito equation corresponding to this process has the form

$$dx(t) = \Bigl[a\bigl(t, x(t)\bigr) + \nu\, \sigma\bigl(t, x(t)\bigr)\, \frac{\partial \sigma}{\partial x}\bigl(t, x(t)\bigr)\Bigr] dt + \sigma\bigl(t, x(t)\bigr)\, d_0\eta(t) \qquad (1.2.42)$$

(formulas (1.2.39), (1.2.41), and (1.2.42) readily follow from (1.2.35) and (1.2.37)).
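In computations it is convenient to keep these conversions as a small utility. The Python sketch below (the linear noise σ(t, x) = x is a hypothetical test case, and the function names are ours) implements the drift corrections of (1.2.41) and (1.2.42) for a user-supplied σ and its x-derivative.

```python
# Sketch of the drift conversions (1.2.41)-(1.2.42): the drifts in the
# d_nu and Ito conventions differ by the term nu * sigma * dsigma/dx.
def ito_drift_from_nu(a_nu, sigma, dsigma_dx, nu):
    """Drift of the Ito equation (1.2.42) equivalent to a d_nu equation."""
    return lambda t, x: a_nu(t, x) + nu * sigma(t, x) * dsigma_dx(t, x)

def nu_drift_from_ito(a_ito, sigma, dsigma_dx, nu):
    """Drift of the d_nu equation (1.2.41) equivalent to an Ito equation."""
    return lambda t, x: a_ito(t, x) - nu * sigma(t, x) * dsigma_dx(t, x)

sigma = lambda t, x: x                 # hypothetical linear noise intensity
dsigma = lambda t, x: 1.0
a_sym = lambda t, x: 0.0               # zero drift in the symmetrized form
a_ito = ito_drift_from_nu(a_sym, sigma, dsigma, nu=0.5)
print(a_ito(0.0, 2.0))                 # -> 0.5 * x = 1.0, the Ito correction
```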
From (1.2.38) and (1.2.40)-(1.2.42) we can see that the different forms of the differential equations can readily be transformed into each other.
In this connection, the following two questions arise immediately: (1) Do we need different forms of Eq. (1.2.38) at all? (2) Is it possible to use only one definite form of stochastic equations, say, the Ito form, and to consider all differential equations of the form (1.2.3) as Ito equations? The latter question has an affirmative answer. Namely, in the majority of mathematical papers [44, 45, 56, 66, 113, 132] it is postulated from the beginning that the motion of a controlled system is described by Ito differential equations, and the theory is constructed via Ito stochastic differentials and integrals. The answer to the first question depends on the advantages and disadvantages of the different forms of stochastic equations and integrals, on whether an actual process is adequately described by its mathematical model, and on the character of the problem considered.
The main advantage of Ito stochastic differential equations is the fact that their solutions x(t) are martingales; this follows from (1.2.4), the definition of the Ito integral (1.2.20), and formula (1.2.33). This fact allows us to study the processes x(t) by rather general methods from the theory of martingales [132]. Moreover, if we use the Ito equation, then many formulas, for example, in the theory of filtration (§1.5), take their most compact form.
However, it is sometimes inconvenient to use Ito differentials and integrals, because very often we cannot apply the formulas of ordinary analysis to operations with Ito processes. This was already pointed out when the integral (1.2.28) was calculated. A similar situation arises when we differentiate a function of the stochastic process x(t) that is a solution of the Ito equation (1.2.40).
Suppose that φ(t, x) is a continuous function with continuous partial derivatives ∂φ/∂t, ∂φ/∂x, and ∂²φ/∂x². Then the stochastic process φ(t, x(t)) (where x(t) is a solution of Eq. (1.2.40)) has the following Ito stochastic differential [66, 131, 167]:

$$d\varphi\bigl(t, x(t)\bigr) = \Bigl[\frac{\partial \varphi}{\partial t} + a\, \frac{\partial \varphi}{\partial x} + \frac{\sigma^2}{2}\, \frac{\partial^2 \varphi}{\partial x^2}\Bigr] dt + \sigma\, \frac{\partial \varphi}{\partial x}\, d_0\eta(t). \qquad (1.2.43)$$

But if we use the usual differentiation rule, then we have

$$d\varphi\bigl(t, x(t)\bigr) = \frac{\partial \varphi}{\partial t}\, dt + \frac{\partial \varphi}{\partial x}\, dx(t) = \Bigl[\frac{\partial \varphi}{\partial t} + a\, \frac{\partial \varphi}{\partial x}\Bigr] dt + \sigma\, \frac{\partial \varphi}{\partial x}\, d\eta(t) \qquad (1.2.44)$$

for the differential of the composite function φ (under the condition that x(t) satisfies Eq. (1.2.40) with the usual differential dη(t)).
The outlined computational difficulties disappear if we use the symmetrized form of stochastic integrals and equations. This has already been shown for integration when we calculated the integral (1.2.28). Let us show that the usual formula (1.2.44) of composite function differentiation holds for the stochastic process x(t) defined by the symmetrized differential equation (that is, by Eq. (1.2.38) with ν = 1/2). The proof of this statement is indirect. Namely, we show that formula (1.2.44) for x(t) defined by the symmetrized stochastic equation

$$dx(t) = a\bigl(t, x(t)\bigr)\, dt + \sigma\bigl(t, x(t)\bigr)\, d_{1/2}\eta(t) \qquad (1.2.45)$$

implies formula (1.2.43) for x(t) defined by the Ito equation (1.2.40).
Indeed, it follows from (1.2.41) that the symmetrized equation equivalent to (1.2.40) has the form

$$dx(t) = \Bigl(a - \frac{\sigma}{2}\, \frac{\partial \sigma}{\partial x}\Bigr) dt + \sigma\, d_{1/2}\eta(t) \qquad (1.2.46)$$

(the arguments of a and σ are omitted). From this relation and (1.2.44) we obtain the symmetrized stochastic differential

$$d\varphi\bigl(t, x(t)\bigr) = \Bigl[\frac{\partial \varphi}{\partial t} + \Bigl(a - \frac{\sigma}{2}\, \frac{\partial \sigma}{\partial x}\Bigr) \frac{\partial \varphi}{\partial x}\Bigr] dt + \sigma\, \frac{\partial \varphi}{\partial x}\, d_{1/2}\eta(t).$$

Now we note that (1.2.27) implies, in differential form,

$$\Phi\bigl(t, x(t)\bigr)\, d_{1/2}\eta(t) = \Phi\bigl(t, x(t)\bigr)\, d_0\eta(t) + \frac{1}{2}\, \frac{\partial \Phi}{\partial x}\, \sigma\, dt. \qquad (1.2.47)$$

By setting Φ = σ ∂φ/∂x in (1.2.47), we obtain the Ito stochastic differential (1.2.43) from (1.2.46) and (1.2.47).
Now let us consider another problem, which is extremely important from the viewpoint of applications, namely, the question of whether our mathematical model is adequate to the actual process in a dynamic system with random perturbations. One of the starting points in the theory of optimal control is the assumption that the equation of motion of a dynamic system is given a priori (§1.1). Suppose that the corresponding equation has the form (1.2.2) or (1.2.3). We have already shown that one can construct infinitely many solutions of such equations by choosing one or another form of stochastic integrals and differentials. Which solution among these infinitely many ones corresponds to the actual stochastic process in the system? Does this solution exist at all? The answers can be obtained only by analyzing the specific physical premises that lead to Eqs. (1.2.2), (1.2.3). Such investigations were performed in [167, 173, 175, 181]; we state their basic results relative to Eqs. (1.2.2), (1.2.3) without details.
If we consider the solution x(t) of Eq. (1.2.3) as a continuous model for a stochastic discrete-time process xₖ = x(kΔ), k = 0, 1, 2, …, which is computer simulated according to the formula

$$x_{k+1} = x_k + a(k\Delta, x_k)\Delta + \sigma(k\Delta, x_k)\, \xi_{k+1} \qquad (1.2.48)$$

(ξₖ, k = 1, 2, …, is a sequence of independent identically distributed Gaussian random variables with zero mean and variance Dξₖ = Δ), then, as Δ → 0, the sequence xₖ (under linear interpolation with respect to t between the points tₖ = kΔ) converges in probability to the solution x(t) of (1.2.3), provided that the latter is the Ito equation.
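Formula (1.2.48) is exactly the Euler (Euler-Maruyama) scheme. The following Python sketch (the Ornstein-Uhlenbeck test case a(t, x) = −x, σ ≡ 1, and all numerical parameters are assumptions of the example) simulates a batch of paths and checks the mean and variance against the known Ito solution.

```python
# Sketch of the simulation formula (1.2.48): the Euler scheme with Gaussian
# increments of variance Delta converges to the Ito solution of (1.2.3).
import numpy as np

rng = np.random.default_rng(4)
a = lambda t, x: -x                      # assumed test drift
sigma = lambda t, x: np.ones_like(x)     # assumed constant noise intensity
Delta, T, n_paths = 1e-3, 2.0, 5_000
x = np.full(n_paths, 2.0)                # initial condition x(0) = 2
t = 0.0
for _ in range(int(T / Delta)):
    xi = rng.normal(0.0, np.sqrt(Delta), size=n_paths)   # D xi_k = Delta
    x = x + a(t, x) * Delta + sigma(t, x) * xi           # formula (1.2.48)
    t += Delta
# For this test case the exact Ito solution has mean 2*exp(-T) ~ 0.27 and
# variance (1 - exp(-2T))/2 ~ 0.49.
print(x.mean(), x.var())
```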
If the motion of a dynamic system is given by (1.2.2) (stochastic equations of the form (1.2.2) are called Langevin equations [127]), where ξ(t) is a sufficiently wide-band stationary stochastic process (for example, the Gaussian Ornstein-Uhlenbeck process with the autocorrelation function R_ξ(τ) = (α/2) exp{−α|τ|} for large values of α), then the solution of (1.2.2) coincides with the solution of the symmetrized equation (1.2.3), that is, of (1.2.38) with ν = 1/2. In particular, any simulation of the Langevin equation (1.2.2) with an "actual" white noise on an analog computer gives the symmetrized solution of Eq. (1.2.3) (see [37]).
In the present monograph, all stochastic equations given in the Langevin form (1.2.2) with white noise ξ(t) are understood in the symmetrized sense. In what follows, the symmetrized form of stochastic equations is used rather often, since it is the most convenient form for calculations related to transformations of random functions, changes of variables, etc. In this connection, we omit the index ν = 1/2 in the stochastic differential. The subscript 0 in the differential d₀η(t) in Ito equations is used if and only if the Ito equation and the corresponding symmetrized equation have different solutions. In other cases, just as in symmetrized equations, we write stochastic differentials without subscripts. Stochastic integrals and differentials that correspond to other values of ν [191, 192] are mainly of theoretical interest. We shall not consider them in what follows.
In conclusion, let us consider some possible generalizations of the results obtained. First we note that all above-mentioned facts for the scalar equations (1.2.2), (1.2.3) can readily be generalized to the multidimensional case, so that the form of (1.2.2), (1.2.3) is preserved, provided that x ∈ R_n, ξ(t) (η(t)) are n-dimensional column vectors of phase coordinates and random functions, and the functions a and σ are an n-column and an n × n matrix. If necessary, the corresponding systems of equations can be written in more detail; for example, instead of (1.2.2), we can write
$$ \dot x_i = a_i(t, x) + \sigma_{ij}(t, x)\,\xi_j(t), \qquad i = 1, \dots, n \qquad (1.2.49) $$
(as usual, the summation is taken over repeated indices if any, that is, in (1.2.49) we have σ_ij ξ_j = Σ_{j=1}^n σ_ij ξ_j). Systems (1.2.2) and (1.2.3) (or (1.2.49)) determine an n-dimensional Markov process x(t) with the vector of drift coefficients
$$ A_i(t, x) = a_i(t, x) + \frac{1}{2}\,\frac{\partial \sigma_{ij}(t, x)}{\partial x_k}\,\sigma_{kj}(t, x), \qquad i = 1, \dots, n, \qquad (1.2.50) $$

and the matrix of diffusion coefficients
$$ B(t, x) = \sigma(t, x)\,\sigma^T(t, x). \qquad (1.2.51) $$
If the process x(t) is defined by the Ito equation (1.2.40), then, instead of (1.2.50) and (1.2.51), we have
$$ A(t, x) = a(t, x), \qquad B(t, x) = \sigma(t, x)\,\sigma^T(t, x). \qquad (1.2.52) $$
According to [173], stochastic equations of the more general form (1.2.1) can always be represented in the form (1.2.2). Indeed, as shown in [173], if the random functions ξ(t) in (1.2.1) have a small correlation time (for example, one can assume that ξ(t) is an n-vector of independent stochastic processes of Ornstein-Uhlenbeck type with a large parameter α), then Eq. (1.2.1) determines a Markov process with the following drift and diffusion coefficients:

(here K[α, β] = E(α − Eα)(β − Eβ) denotes the covariance of the random variables α and β; moreover, the mean Eg_i and the correlation functions in (1.2.53) and (1.2.54) are calculated under the assumption that the argument x is a nonrandom fixed vector).
Since similar characteristics of the Markov process defined by (1.2.2) (or by (1.2.49)) have the form (1.2.50), (1.2.51), we can obtain the differential equation (1.2.2), which is stochastically equivalent to (1.2.1), by solving system (1.2.50), (1.2.51) with respect to the unknown variables a_i and σ_ij, i, j = 1, …, n. Note that the system of Eqs. (1.2.50), (1.2.51) can always be solved with respect to a_i and σ_ij. This follows from the fact that the diffusion matrix B is positive definite (semidefinite) and symmetric. As is known [62], to such matrices there corresponds a real symmetric positive (semidefinite) matrix σ that is the matrix square root σ = √B (here we do not consider methods for calculating σ). Since σ is symmetric, we have B = σ² = σσᵀ, that is, the matrix equation (1.2.51) is solvable, and hence (1.2.50) implies
$$ a_i(t, x) = A_i(t, x) - \frac{1}{2}\,\frac{\partial \sigma_{ij}(t, x)}{\partial x_k}\,\sigma_{kj}(t, x), \qquad i = 1, \dots, n. $$
It follows from the preceding that to study Markov processes of diffusion type, without loss of generality, we can consider stochastic equations only in the form (1.2.2), (1.2.3), or (1.2.40).
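A brief numerical sketch of the matrix square root construction may be helpful (the matrix B below is invented for the test; if σ depends on x, the drift correction in (1.2.50) additionally requires the derivatives ∂σ_ij/∂x_k):

    import numpy as np

    # Sketch: recover sigma = sqrt(B) from a given symmetric positive
    # (semi)definite diffusion matrix B via an eigendecomposition
    # B = V diag(w) V^T, as described above.
    def matrix_sqrt(B):
        w, V = np.linalg.eigh(B)
        w = np.clip(w, 0.0, None)          # guard against negative round-off
        return (V * np.sqrt(w)) @ V.T      # real, symmetric, positive square root

    B = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
    sigma = matrix_sqrt(B)
    print(np.allclose(sigma @ sigma.T, B))  # True: Eq. (1.2.51) is satisfied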
Therefore, the most general form of differential equations of motion of a controlled system with random perturbations ξ(t) of the white noise type is given by the equation
$$ \dot x(t) = a(t, x, u) + \sigma(t, x)\,\xi(t), \qquad (1.2.55) $$
or by the equivalent equation
$$ dx(t) = a(t, x, u)\,dt + \sigma(t, x)\,d\eta(t) \qquad (1.2.56) $$
(in (1.2.55), ξ(t) is the standard white noise with the characteristics (1.1.34); in (1.2.56),
$$ \eta(t) = \int_0^t \xi(\tau)\,d\tau $$
is the standard Wiener process). In (1.2.55) and (1.2.56), u = u(t) is understood as the control algorithm (1.1.2). The form of this algorithm can be found by solving the Bellman equation.

§1.3. Deterministic control problems. Formal scheme of the dynamic programming approach
The dynamic programming approach [14] was proposed by R. Bellman in the fifties as a method for solving a wide range of problems related to processes of multistage choice. In this section we briefly discuss the main idea of this method applied to synthesis problems for optimal feedback control systems [16, 17]. We begin with deterministic problems of optimal control and pay the main attention to the algorithm of the method, that is, to the method for constructing the optimal control in the synthesis form.
Let us consider the control problem with free right endpoint of the trajectory, in which the plant is given by system (1.1.5)
$$ \dot x = g(t, x, u), \qquad x(0) = x_0, \qquad (1.3.1) $$
the performance criterion is a functional of the form (1.1.11)
$$ I[u] = \int_0^T c(t, x(t), u(t))\,dt + \psi(x(T)), \qquad (1.3.2) $$
and the control vector u may take values at each moment of time in a given bounded set U ⊂ R_r,
$$ u(t) \in U. \qquad (1.3.3) $$
In problem (1.3.1)-(1.3.3) the time interval [0, T] and the initial vector of phase variables x₀ are known; it is required to find the control function u_*(t): 0 ≤ t ≤ T satisfying (1.3.3) and minimizing the functional (1.3.2) on the trajectory x_*(t): 0 ≤ t ≤ T, which is a solution of the Cauchy problem (1.3.1) with u(t) = u_*(t). If the Cauchy problem (1.3.1) with u(t) = u_*(t) has a single solution, then the optimal control u_*(t): 0 ≤ t ≤ T may be represented in the form
$$ u_*(t) = \varphi_*(t, x(t)), \qquad (1.3.4) $$
where the current values of the control vector are expressed in terms of the current values of the phase variables of system (1.3.1). The optimal control of the form (1.3.4) is called the optimal control in the synthesis form, and formula (1.3.4) itself is often called the algorithm of optimal control.
The dynamic programming approach allows us to obtain the optimal control in the synthesis form (1.3.4) for problem (1.3.1)-(1.3.3) as follows. We write
$$ F(t, x_t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} \Big[ \int_t^T c(\tau, x(\tau), u(\tau))\,d\tau + \psi(x(T)) \Big]. \qquad (1.3.5) $$

The function F(t, x_t), called later the loss function,* plays an important role in the method of dynamic programming. This function is equal to the minimum value of the functional (1.3.2) provided that the control process is considered on the time interval [t, T], 0 ≤ t ≤ T, and the vector of phase variables is equal to x(t) = x_t at the beginning of this interval (that is, at time t). In (1.3.5) the minimum is calculated over all possible strategies u(τ) = φ(τ, x(τ)), that is, over all possible vector-functions φ(τ, x): [t, T] × R_n → R_r, provided that:
(a) these functions take values in the admissible set U;
(b) for any t ∈ [0, T] the Cauchy problem for system (1.3.1),
$$ \dot x(\tau) = g(\tau, x(\tau), \varphi(\tau, x(\tau))), \qquad x(t) = x_t, $$
has a unique solution x(τ): t ≤ τ ≤ T.


The dynamic programming method is based on the Bellman optimality
principle [14, 171, which implies that the loss function (1.3.5) satisfies the
basic functional equation

F ( t , z t ) = rnin
U(~)€U
c (a,%(a),
u(a)) d a + F (3, x ~]), (1.3.6)

for all 5 E [t,T]. For different statements of the optimality principle and
comments see [I, 16, 501. However, here we do not discuss these statements,
since to derive Eq. (1.3.6) it suffices to have the definition of the loss function
(1.3.5) and to understand that this is a function of time and of the state
x(t) = xt of the controlled system (1.3.1) a t time t (recall that the control
process is terminated a t a fixed time T).
To derive Eq. (1.3.6), we write the integral in (1.3.5) as the sum ∫_t^T = ∫_t^{t̄} + ∫_{t̄}^T of two integrals and write the minimum as the succession of minima
$$ \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \; \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}}. $$
Then we can write (1.3.5) as follows:
$$ F(t, x_t) = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \; \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}} \Big[ \int_t^{\bar t} c(\sigma, x(\sigma), u(\sigma))\,d\sigma + \int_{\bar t}^T c(\rho, x(\rho), u(\rho))\,d\rho + \psi(x(T)) \Big]. \qquad (1.3.7) $$

*The function (1.3.5) is also called a value function, a cost function, or the Bellman function.

Since, by (1.3.1), the control u(ρ) on the interval [t̄, T] does not affect the solution x(σ) of (1.3.1) on the preceding interval [t, t̄], formula (1.3.7) takes the form
$$ F(t, x_t) = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \Big\{ \int_t^{\bar t} c(\sigma, x(\sigma), u(\sigma))\,d\sigma + \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}} \Big[ \int_{\bar t}^T c(\rho, x(\rho), u(\rho))\,d\rho + \psi(x(T)) \Big] \Big\}. \qquad (1.3.8) $$
Now, since by (1.3.5) the second term in the braces in (1.3.8) is the loss function F(t̄, x_{t̄}), we finally obtain Eq. (1.3.6) from (1.3.8).
The basic functional equation (1.3.6) of the dynamic programming approach naturally allows us to derive the differential equation for the loss function F(t, x). To this end, in (1.3.6) we set t̄ = t + Δ, where Δ > 0 is small, and obtain
$$ F(t, x_t) = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le t+\Delta}} \Big[ \int_t^{t+\Delta} c(\sigma, x(\sigma), u(\sigma))\,d\sigma + F(t+\Delta, x_{t+\Delta}) \Big]. \qquad (1.3.9) $$

Since the solutions x(t) of system (1.3.1) are continuous, the increments (x_{t+Δ} − x_t) of the phase vector are small for admissible controls u(t) = φ(t, x(t)). Taking this fact into account and assuming that the loss function F(t, x) is continuously differentiable with respect to all its arguments, we can expand the function F(t + Δ, x_{t+Δ}) into its Taylor series about the point (t, x_t) as follows:
$$ F(t+\Delta, x_{t+\Delta}) = F(t, x_t) + \frac{\partial F}{\partial t}(t, x_t)\,\Delta + (x_{t+\Delta} - x_t)^T \frac{\partial F}{\partial x}(t, x_t) + o(\Delta). \qquad (1.3.10) $$
In (1.3.10), ∂F/∂x denotes an n-column-vector with components ∂F/∂x_i, i = 1, 2, …, n; therefore, the third term on the right-hand side of (1.3.10) is the scalar product of the vector of increments (x_{t+Δ} − x_t) and the gradient of the loss function,
$$ (x_{t+\Delta} - x_t)^T \frac{\partial F}{\partial x} = \sum_{i=1}^n (x_{i,t+\Delta} - x_{i,t})\,\frac{\partial F}{\partial x_i}; \qquad (1.3.11) $$
the function o(Δ) denotes the terms whose order is higher than that of the infinitesimal Δ. It follows from (1.3.1) that for small Δ the increment of the phase vector x can be written in the form
$$ x_{t+\Delta} - x_t = g(t, x_t, u_t)\,\Delta + o(\Delta). \qquad (1.3.12) $$
Writing the first term in the square brackets in (1.3.9) as
$$ \int_t^{t+\Delta} c(\sigma, x(\sigma), u(\sigma))\,d\sigma = c(t, x_t, u_t)\,\Delta + o(\Delta), $$
substituting (1.3.10) and (1.3.12) into (1.3.9), and taking into account (1.3.11), we arrive at
$$ F(t, x_t) = \min_{u_t \in U} \Big[ c(t, x_t, u_t)\Delta + F(t, x_t) + \frac{\partial F}{\partial t}\Delta + g^T(t, x_t, u_t)\,\frac{\partial F}{\partial x}\,\Delta + o(\Delta) \Big]. \qquad (1.3.13) $$
Note that only the first and the fourth terms on the right-hand side of (1.3.13) depend on the control u_t. Therefore, the minimum is calculated only over these terms; the other terms in the brackets can be ignored. Dividing (1.3.13) by Δ, passing to the limit as Δ → 0, and taking into account the fact that lim_{Δ→0} o(Δ)/Δ = 0, we obtain the following differential equation for the loss function F(t, x):
$$ \frac{\partial F}{\partial t} + \min_{u \in U} \Big[ c(t, x, u) + g^T(t, x, u)\,\frac{\partial F}{\partial x} \Big] = 0 \qquad (1.3.14) $$
(here we omit the subscript t of the phase vector x_t and the control u_t). Note that the loss function F(t, x) satisfies Eq. (1.3.14) on the entire interval of control 0 ≤ t < T except at the endpoint t = T, where, in view of (1.3.5), the loss function satisfies the condition
$$ F(T, x) = \psi(x). \qquad (1.3.15) $$
The differential equation (1.3.14), called the Bellman equation, plays the central role in applications of the dynamic programming approach to the synthesis of feedback optimal control. The solution of the synthesis problem, that is, the optimal strategy or the control algorithm u_*(t) = φ_*(t, x) = φ_*(t, x(t)), can be found simultaneously with the solution of Eq. (1.3.14). Namely, suppose that we have somehow found the function F(t, x) that satisfies (1.3.14) and (1.3.15). Then the expression in the square brackets in (1.3.14) is a known function of t, x, and u. Calculating the minimum of this function with respect to u, we obtain the optimal control u_* = φ_*(t, x) (u_* is the minimum point of this function in U ⊂ R_r).
If the functions c(t, x, u) and g(t, x, u) and the set of admissible controls U allow us to minimize the function in the square brackets explicitly, then the optimal control can be written in the form
$$ u_* = \varphi_*\Big(t, x, \frac{\partial F}{\partial x}\Big), \qquad (1.3.16) $$

where ∂F/∂x is a vector of partial derivatives yet unknown; when we minimize the function in the square brackets in (1.3.14), we assume that this vector is given. Using (1.3.16) and denoting
$$ \varphi_0\Big(t, x, \frac{\partial F}{\partial x}\Big) = c(t, x, u_*) + g^T(t, x, u_*)\,\frac{\partial F}{\partial x}, \qquad (1.3.17) $$
we write (1.3.14) without the symbol "min" as follows:
$$ \frac{\partial F}{\partial t} + \varphi_0\Big(t, x, \frac{\partial F}{\partial x}\Big) = 0. \qquad (1.3.18) $$
To complete the synthesis problem, it is necessary to solve (1.3.18) with regard to (1.3.15), that is, to find the function F(t, x) that satisfies (1.3.18) for 0 ≤ t < T and continuously tends to a given function ψ(x) as t → T, and to substitute the function F(t, x) obtained into (1.3.16).
In practice, the main difficulty in this synthesis procedure is related to solving the Bellman equation (1.3.14) or (1.3.18), which is a first-order partial differential equation. The main distinguishing feature of the Bellman equation is that it is nonlinear because of the symbol "min" in (1.3.14), which shows that the function φ₀ in (1.3.18) nonlinearly depends on the components of the vector of the partial derivatives ∂F/∂x. The character of this nonlinearity is determined by the form of the functions c(t, x, u) and g(t, x, u), as well as by the set of admissible controls U.
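Since Eq. (1.3.14) rarely admits closed-form solutions, it is useful to see how the recursion (1.3.9) behind it can be stepped numerically. The sketch below is a grid-based illustration with an assumed scalar plant g(x, u) = u, penalty c(x, u) = x² + u², and ψ ≡ 0; it propagates the boundary condition (1.3.15) backward from t = T.

    import numpy as np

    # Grid-based backward recursion illustrating (1.3.9): assumed scalar plant
    # xdot = g(x, u) = u, penalty c(x, u) = x^2 + u^2, terminal psi(x) = 0.
    xs = np.linspace(-2.0, 2.0, 81)        # state grid
    us = np.linspace(-1.0, 1.0, 21)        # admissible control set U
    T, n_steps = 1.0, 50
    dt = T / n_steps
    F = np.zeros_like(xs)                  # boundary condition (1.3.15): F(T, x) = psi(x)
    for _ in range(n_steps):               # step from t + dt back to t
        F_next = F.copy()
        for i, x in enumerate(xs):
            # minimize c * dt + F(t + dt, x + g * dt) over u in U, as in (1.3.9)
            F[i] = min((x**2 + u**2) * dt + np.interp(x + u * dt, xs, F_next)
                       for u in us)
    # F now approximates the loss function (1.3.5) at t = 0 on the grid.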
Let us consider some typical illustrative examples.
1°. Suppose that c(t, x, u) = c₁(t, x) + uᵀP(t, x)u, where P is a symmetric r × r matrix positive definite for all x ∈ R_n and t ∈ [0, T], g(t, x, u) = a(t, x) + Q(t, x)u (a is an n-vector and Q is an n × r matrix), and the control u is unbounded (that is, U = R_r). Then the expression in the square brackets in (1.3.14) takes the form
$$ c_1(t, x) + u^T P(t, x)\,u + a^T(t, x)\,\frac{\partial F}{\partial x} + u^T Q^T(t, x)\,\frac{\partial F}{\partial x}. \qquad (1.3.19) $$
By differentiating this function with respect to u and solving the system ∂[·]/∂u = 0, we obtain
$$ u_* = -\frac{1}{2}\,P^{-1}(t, x)\,Q^T(t, x)\,\frac{\partial F}{\partial x} \qquad (1.3.20) $$
(the matrix P⁻¹ is the inverse of P). Substituting (1.3.20) into (1.3.19) instead of u, we obtain
$$ \varphi_0 = c_1(t, x) + a^T(t, x)\,\frac{\partial F}{\partial x} - \frac{1}{4}\,\frac{\partial F}{\partial x^T}\,Q(t, x)\,P^{-1}(t, x)\,Q^T(t, x)\,\frac{\partial F}{\partial x}. \qquad (1.3.21) $$
2°. Suppose that c(t, x, u) = c₁(t, x), g(t, x, u) = a(t, x) + Q(t, x)u, and the domain U is an r-dimensional parallelepiped, that is, |u_i| ≤ u₀ᵢ, i = 1, …, r, where the numbers u₀ᵢ > 0 are given. One can readily see that
$$ u_* = -\{u_{01}, \dots, u_{0r}\}\,\mathrm{sign}\Big(Q^T\frac{\partial F}{\partial x}\Big), \qquad \varphi_0 = c_1 + a^T\frac{\partial F}{\partial x} - \sum_{i=1}^r u_{0i}\,\Big|\Big(Q^T\frac{\partial F}{\partial x}\Big)_i\Big|, \qquad (1.3.22) $$
where sign A and |A| are matrices obtained from A by replacing each element a_ij by sign a_ij and |a_ij|, respectively; {u₀₁, …, u₀ᵣ} denotes the diagonal r × r matrix with u₀₁, …, u₀ᵣ on its principal diagonal.
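The sign rule in (1.3.22) is easy to verify numerically. The following quick check (with randomly generated Q, gradient ∂F/∂x, and box bounds, all invented for the test) confirms that no admissible u gives a smaller value of the control-dependent term uᵀQᵀ(∂F/∂x).

    import numpy as np

    # Numerical check: u_i = -u0_i * sign((Q^T dF/dx)_i) minimizes
    # u^T Q^T (dF/dx) over the box |u_i| <= u0_i.
    rng = np.random.default_rng(2)
    Q = rng.normal(size=(4, 3))            # n = 4 states, r = 3 controls
    grad_F = rng.normal(size=4)            # a fixed, assumed-known gradient dF/dx
    u0 = np.array([1.0, 2.0, 0.5])
    v = Q.T @ grad_F
    u_star = -u0 * np.sign(v)              # the claimed minimizer
    trial = rng.uniform(-1.0, 1.0, size=(10000, 3)) * u0
    assert (trial @ v).min() >= u_star @ v - 1e-12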
3°. Let the functions c(·) and g(·) be the same as in 2°; for the domain U, instead of a parallelepiped, we take an r-dimensional ball of radius R₀ centered at the origin. Then, instead of (1.3.22), we obtain the following expressions for the functions φ₀ and u_*:
$$ u_* = -R_0\,\frac{Q^T(\partial F/\partial x)}{\big[(\partial F/\partial x^T)\,QQ^T\,(\partial F/\partial x)\big]^{1/2}}, \qquad \varphi_0 = c_1 + a^T\frac{\partial F}{\partial x} - R_0\Big[\frac{\partial F}{\partial x^T}\,QQ^T\,\frac{\partial F}{\partial x}\Big]^{1/2}. \qquad (1.3.23) $$
Note that in (1.3.23) and in the following, ∂F/∂xᵀ denotes an n-row-vector with components ∂F/∂x_i, i = 1, …, n. Therefore, the function (∂F/∂xᵀ)QQᵀ(∂F/∂x) is a quadratic form in the components of the gradient vector of the loss function, and the matrix QQᵀ is its kernel.
As a rule, the nonlinear character of Bellman equations does not allow one to solve these equations (and the synthesis problem) explicitly. There is only one exception, namely, the so-called linear-quadratic problems of optimal control (LQ-problems). In this case the differential equations (1.3.1) of the plant are linear,
$$ \dot x = A(t)x + B(t)u $$
(here A(t) and B(t) are given n × n and n × r matrices), the penalty functions c(t, x, u) and ψ(x) in the optimality criterion (1.3.2) are linear-quadratic forms of the phase variables x and controls u, and there are no restrictions on the domain of admissible controls (that is, U = R_r in (1.3.3)).
Let us solve the synthesis problem for the simplest one-dimensional LQ-problem with constant coefficients; in this case, the solution of the Bellman equation and the optimal control can be obtained as finite analytic formulas. Suppose that the plant is described by the scalar differential equation
$$ \dot x = ax + bu, \qquad (1.3.24) $$
and the optimality criterion has the form
$$ I[u] = c_1 x^2(T) + \int_0^T \big[c\,x^2(t) + h\,u^2(t)\big]\,dt \qquad (1.3.25) $$
(c₁ > 0, c > 0, T > 0, h > 0, and a and b in (1.3.24) and (1.3.25) are given constant numbers). The Bellman equation (1.3.14) and the boundary condition (1.3.15) for problem (1.3.24), (1.3.25) have the form
$$ \frac{\partial F}{\partial t} + \min_u \Big[ cx^2 + hu^2 + (ax + bu)\frac{\partial F}{\partial x} \Big] = 0, \qquad F(T, x) = c_1 x^2. \qquad (1.3.26),\ (1.3.27) $$
The expression in the square brackets in (1.3.26), considered as a function of u, is a quadratic trinomial. Since h > 0, this trinomial has the single minimum point
$$ u_* = -\frac{b}{2h}\,\frac{\partial F}{\partial x}, \qquad (1.3.28) $$
which can readily be obtained from the relation ∂[·]/∂u = 0; this is a necessary condition for an extremum. Substituting u_* into (1.3.26) instead of u and omitting the symbol "min", we rewrite the Bellman equation in the form
$$ \frac{\partial F}{\partial t} + cx^2 + ax\,\frac{\partial F}{\partial x} - \frac{b^2}{4h}\Big(\frac{\partial F}{\partial x}\Big)^2 = 0. \qquad (1.3.29) $$
We shall seek the loss function F(t, x) satisfying Eq. (1.3.29) and the boundary condition (1.3.27) in the form
$$ F(t, x) = p(t)\,x^2, \qquad (1.3.30) $$
where p(t) is the desired function of time. If we substitute (1.3.30) into (1.3.29), then we see that p(t) must satisfy the ordinary differential equation
$$ \dot p = \frac{b^2}{h}\,p^2 - 2ap - c \qquad (1.3.31) $$
for 0 ≤ t < T. Moreover, it follows from (1.3.27) and (1.3.30) that the function p(t) assumes a given value at the right endpoint of the control interval:
$$ p(T) = c_1. \qquad (1.3.32) $$
Equation (1.3.31) can readily be integrated by separation of variables. The boundary condition (1.3.32) determines the unique solution of (1.3.31). Performing the necessary calculations, we obtain the function p(t) that satisfies Eq. (1.3.31) and the boundary condition (1.3.32):

Thus it follows from (1.3.28) and (1.3.30) that the optimal control in the synthesis form for problem (1.3.24), (1.3.25) has the form
$$ u_*(t, x) = -\frac{b}{h}\,p(t)\,x, \qquad (1.3.34) $$
where p(t) is determined by (1.3.33).
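When the closed-form expression (1.3.33) is inconvenient, p(t) can also be obtained numerically. The hedged sketch below (with assumed values of the constants a, b, c, h, c₁, T) integrates (1.3.31) backward from the boundary condition (1.3.32) and forms the feedback (1.3.34).

    import numpy as np

    # Backward integration of the Riccati equation (1.3.31) from p(T) = c1;
    # the feedback (1.3.34) is then u = -(b/h) p(t) x. Constants are assumed.
    a, b, c, h, c1, T = -1.0, 1.0, 1.0, 0.5, 2.0, 5.0
    n = 5000
    dt = T / n
    p = np.empty(n + 1)
    p[n] = c1                               # boundary condition (1.3.32)
    for k in range(n, 0, -1):               # backward Euler sweep in time
        dp = (b**2 / h) * p[k]**2 - 2 * a * p[k] - c   # right-hand side of (1.3.31)
        p[k - 1] = p[k] - dp * dt
    def u_opt(k, x):
        return -(b / h) * p[k] * x          # optimal control at time t = k * dt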


Note that problem (1.3.24), (1.3.25) is one of few optimal control problems for which the Bellman equation can be solved exactly. In Chapter II we consider some other examples of exact solutions to synthesis problems of optimal control (for deterministic and stochastic control systems). However, the majority of optimal control problems cannot be solved exactly. In these cases, one usually employs the approximate and numerical synthesis methods considered in Chapters III-VII.
We complete this section with some remarks.
First we note that we have considered only a formal scheme or, as is sometimes said, the algorithmic essence of the dynamic programming approach. The described method for constructing an optimal control in the synthesis form (1.3.4) is justified by some assumptions, which are sometimes violated. We need to take into account the following.
(1) The loss function F(t, x) determined by (1.3.5) is not always differentiable even if the penalty functions c(t, x, u) and ψ(x) are sufficiently smooth (or even analytic) functions. It is well known that, for this reason, the dynamic programming approach cannot be used for solving many time-optimal control problems [50, 156].
(2) Even in the case where the loss function F(t, x) satisfies the Bellman equation (1.3.14), the control u_*(t, x) minimizing the function in the square brackets in (1.3.14) may not be admissible. In particular, this control can violate the existence and uniqueness conditions for the solution of the Cauchy problem for system (1.3.1).
(3) The Bellman equation (1.3.14) (or (1.3.18)) with the boundary condition (1.3.15) can have nonunique solutions.
Nevertheless, we have the following theorem [1].
THEOREM. Suppose that there exists a unique continuously differentiable solution F₀(t, x) of Eq. (1.3.14) with boundary condition (1.3.15) and there exists an admissible control u_*(t, x) such that
$$ \min_{u \in U}\Big[ c(t, x, u) + g^T(t, x, u)\,\frac{\partial F_0}{\partial x}(t, x) \Big] = c(t, x, u_*) + g^T(t, x, u_*)\,\frac{\partial F_0}{\partial x}(t, x). $$
Then the control u_*(t, x) in the synthesis form is optimal, and the function F₀(t, x) coincides with the loss function (1.3.5).
In conclusion, we point out another fact concerning the dynamic programming approach. The matter is that this method can be used for solving problems of optimal control for which the optimal control u_*(t, x) does not exist. For example, such situations appear when the domain of admissible controls U in (1.3.3) is an open set.
The absence of an optimal control does not prevent us from deriving the basic equations of the dynamic programming approach. It suffices only to modify the definition of the loss function (1.3.5). So, if we define the function F(t, x_t) as the greatest lower bound of the functional in the square brackets in (1.3.5),
$$ F(t, x_t) = \inf_{\substack{u(\tau)\in U \\ t \le \tau \le T}} \Big[ \int_t^T c(\tau, x(\tau), u(\tau))\,d\tau + \psi(x(T)) \Big], \qquad (1.3.35) $$
then one can readily see that the function (1.3.35) satisfies the equations
$$ F(t, x_t) = \inf_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \Big[ \int_t^{\bar t} c(\sigma, x(\sigma), u(\sigma))\,d\sigma + F(\bar t, x_{\bar t}) \Big], \qquad (1.3.36) $$
$$ \frac{\partial F}{\partial t}(t, x) + \inf_{u \in U}\Big[ c(t, x, u) + g^T(t, x, u)\,\frac{\partial F}{\partial x}(t, x) \Big] = 0, \qquad (1.3.37) $$

which are similar to Eqs. (1.3.6) and (1.3.14). However, in this case the
functions u,(t, x) realizing the infimum of the function in the square brack-
ets in (1.3.37) may not exist.
Nevertheless, the absence of a n optimal control u,(t, x) is of no funda-
mental importance in applications of the dynamic programming approach,
since if the lower bound in (1.3.37) is not attainable, one can always con-
struct the so-called &-optimal strategy u, (t, x). If this strategy is used in
system (1.3.1), then the performance functional (1.3.2) attains the value
I(u,) = F ( 0 , xo) + E , where E is a given positive number. Obviously, to
construct a n actual control system, it suffices to know the &-optimalstrat-
egy u, ( t ,x) for a small E.
Here we do not describe methods for constructing &-optimal strategies.
First, these methods are considered in detail in the literature (see, for ex-
ample, 1113, 1371). Second (and this is the main point), the optimal control
always exists in all special problems studied in Chapters 11-VII. This is the
reason that, from the very beginning, in the definition of the loss function
(1.3.5) we use the symbol "min" instead of a more general symbol "inf."

§1.4. The Bellman equations for Markov controlled processes


The dynamic programming approach is widely used for solving stochas-
tic problems of optimal control. In this section we consider the control
problems in which the controlled process is a Markov stochastic process.
It follows from the definition of the Markov processes given in §1.1 that the probabilities of future states of a controlled system are completely determined by the current states of the vector of phase variables, which are assumed to be known at any time t.

One can readily see that the servomechanism shown in Fig. 10 possesses the listed Markov properties if the following conditions are satisfied:
(1) the joint vector (y(t), x(t)) of instant values that define the input actions and output variables is treated as the phase vector of the system;
(2) the input action y(t) is a Markov stochastic process;
(3) the random perturbation ξ(t) is a white noise type process;
(4) the controller C is a noninertial device that forms the current values of the control actions u(t) according to the rule
$$ u(t) = \varphi(t, x(t), y(t)). \qquad (1.4.1) $$
Actually, if the plant P is described by equations of the form (1.2.55) and y(t) is a Markov process with known probability characteristics, then it follows from (1.2.55) and (1.4.1) that the joint vector z̄(t) = (x(t), y(t)) is a Markov process. In particular, if y(t) is a diffusion process with drift coefficient A(t, y) and diffusion coefficient B(t, y), then it follows from (1.2.39), (1.2.55), and (1.4.1) that the vector z̄(t) satisfies a system of stochastic differential equations of the form (1.2.2), that is, z̄(t) is a diffusion Markov process.
In this section we deal only with systems of the type shown in Fig. 10. In §1.5 we consider the possibilities of applying the dynamic programming approach in a more general situation with a non-Markov controlled process (Fig. 3).
Later we shall derive the Bellman equations for various stochastic problems of optimal control that are studied in Chapters II-VII. These problems were stated in §1.1.
1.4.1. Base problem. Optimal tracking of a diffusion process. As the basic problem we consider the synthesis of the optimal servomechanism shown in Fig. 10 under the following conditions:
(i) the controlled plant P is described by a system of stochastic differential equations of the form
$$ \dot x = a(t, x, u) + \sigma(t, x)\,\xi(t), \qquad (1.4.2) $$
where x ∈ R_n is a vector of controlled output variables, u ∈ R_r is a vector of control actions, ξ(t) is the n-dimensional standard white noise with characteristics (1.1.34), a and σ are a given vector-function and matrix function, and the initial vector x(0) = x₀ and the time interval [0, T] are specified;
(ii) the optimal control is sought in the form (1.4.1), and the goal of control is to minimize the functional
$$ I[u] = \mathrm{E}\Big[ \int_0^T c(x(t), y(t), u(t))\,dt + \psi(x(T), y(T)) \Big]; \qquad (1.4.3) $$
(iii) the restrictions on admissible controls have the form
$$ u(t) \in U, \qquad (1.4.4) $$

where U is a given bounded closed subset of the space R_r;
(iv) the input stochastic process y(t) is independent of ξ(t) and is an m-dimensional diffusion Markov process with a known vector A^y(t, y) of drift coefficients and with matrix B^y(t, y) of diffusion coefficients;⁷
(v) there are no restrictions on the phase variables, that is, on the components of the vector z̄ = (x, y) ∈ R_{n+m}; the current values of the components of the joint vector z̄ can be measured precisely at any instant of time t ∈ [0, T].
By analogy with (1.3.5) we determine the loss function F(t, x_t, y_t) for problem (i)-(v) as follows:
$$ F(t, x_t, y_t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} \mathrm{E}\Big[ \int_t^T c(x(\tau), y(\tau), u(\tau))\,d\tau + \psi(x(T), y(T)) \;\Big|\; x_t, y_t \Big]. \qquad (1.4.5) $$
The loss function (1.4.5) for the stochastic problem (i)-(v) differs from the loss function (1.3.5) in the deterministic case by the additional operation of averaging the functional in the square brackets in (1.4.5). The averaging in (1.4.5) is performed over the set of sample paths x_t^T = [x(τ): t ≤ τ ≤ T], y_t^T = [y(τ): t ≤ τ ≤ T] that on the interval [t, T] satisfy the stochastic differential equations (1.4.2) and (*) (see the footnote) with the initial conditions x(t) = x_t, y(t) = y_t and the control function u(τ) = φ(τ, x(τ), y(τ)), t ≤ τ ≤ T.
Since the process z̄(τ) = (x(τ), y(τ)) is Markov, the result of averaging F_φ(t, z̄_t) = F_φ(t, x_t, y_t) = E[·] in (1.4.5) is uniquely determined by the time moment t, by the state vector z̄_t = (x_t, y_t) of the system at this moment, and by a chosen algorithm of control, that is, by the vector-function φ(·) in (1.4.1). Therefore, it turns out that the loss function (1.4.5) obtained by minimizing F_φ(t, z̄_t) = F_φ(t, x_t, y_t) over all admissible controls⁸ (that is, over all admissible vector-functions φ(·)) depends only on time t and the state (x_t, y_t) of the servomechanism (Fig. 10) at this time moment.

⁷As was shown in §1.2, the coefficients A^y(t, y) and B^y(t, y) uniquely determine the system of stochastic differential equations
$$ \dot y(t) = a_y(t, y(t)) + \sigma_y(t, y(t))\,\nu(t), \qquad (*) $$
whose solutions are sample paths of the Markov process y(t); in (*) ν(t) denotes the standard white noise (1.1.34) independent of ξ(t).
⁸Just as in the deterministic case (§1.3), the control in the form (1.4.1) is called admissible if (i) for all t ∈ [0, T), x ∈ R_n, and y ∈ R_m, the vector-function φ(t, x, y) takes values from an admissible set U and (ii) the Cauchy problem for the stochastic differential equation (1.4.2) with control u(t) in the form (1.4.1) has a unique solution.

One can readily see that, for any t̄ ∈ [t, T], the loss function (1.4.5) satisfies the equation
$$ F(t, x_t, y_t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le \bar t}} \mathrm{E}\Big[ \int_t^{\bar t} c(x(\tau), y(\tau), u(\tau))\,d\tau + F(\bar t, x_{\bar t}, y_{\bar t}) \Big] \qquad (1.4.6) $$
that is a stochastic generalization of the functional equation (1.3.6). The averaging in (1.4.6) is performed over the sample paths x_t^{t̄} and y_t^{t̄}, and the symbol E[·] in (1.4.6) indicates the conditional expectation E_{x_t^{t̄}, y_t^{t̄} | x_t, y_t}[·].
To prove (1.4.6), we write E_{x_{t̄}^T, y_{t̄}^T | x_t^{t̄}, y_t^{t̄}}(·) for the conditional expectation of a functional of phase trajectories denoted by (·). Here we average over all possible sample paths x_{t̄}^T = [x(τ): t̄ ≤ τ ≤ T], y_{t̄}^T = [y(τ): t̄ ≤ τ ≤ T] provided that the sample paths x(τ), y(τ) are known and fixed on the time interval [t, t̄]. Then, using the formula for the repeated expectations
$$ \mathrm{E}_{x_t^T, y_t^T \mid x_t, y_t}[\,\cdot\,] = \mathrm{E}_{x_t^{\bar t}, y_t^{\bar t} \mid x_t, y_t}\,\mathrm{E}_{x_{\bar t}^T, y_{\bar t}^T \mid x_t^{\bar t}, y_t^{\bar t}}[\,\cdot\,], $$
writing the integral in (1.4.5) as the sum ∫_t^T = ∫_t^{t̄} + ∫_{t̄}^T of two integrals, and writing the minimum as the succession of minima
$$ \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \; \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}}, $$
we can rewrite (1.4.5) as
$$ F(t, x_t, y_t) = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \; \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}} \mathrm{E}_{x_t^{\bar t}, y_t^{\bar t} \mid x_t, y_t}\,\mathrm{E}_{x_{\bar t}^T, y_{\bar t}^T \mid x_t^{\bar t}, y_t^{\bar t}} \Big[ \int_t^{\bar t} c(x(\sigma), y(\sigma), u(\sigma))\,d\sigma + \int_{\bar t}^T c(x(\rho), y(\rho), u(\rho))\,d\rho + \psi(x(T), y(T)) \Big]. \qquad (1.4.8) $$
It follows from (1.4.1) and (1.4.2) that the controls u(ρ) on the time interval t̄ ≤ ρ ≤ T do not affect the stochastic process x(σ) on the preceding time interval t ≤ σ ≤ t̄, and the input stochastic process y(τ) is independent of controls at all. Therefore, taking into account the obvious relation

we can rewrite (1.4.8) in the form

$$ F(t, x_t, y_t) = \min_{\substack{u(\sigma)\in U \\ t \le \sigma \le \bar t}} \mathrm{E}_{x_t^{\bar t}, y_t^{\bar t} \mid x_t, y_t}\Big\{ \int_t^{\bar t} c(x(\sigma), y(\sigma), u(\sigma))\,d\sigma + \min_{\substack{u(\rho)\in U \\ \bar t \le \rho \le T}} \mathrm{E}_{x_{\bar t}^T, y_{\bar t}^T \mid x_t^{\bar t}, y_t^{\bar t}}\Big[ \int_{\bar t}^T c(x(\rho), y(\rho), u(\rho))\,d\rho + \psi(x(T), y(T)) \Big] \Big\}. \qquad (1.4.9) $$
Since the process (x(t), y(t)) is Markov, the result of averaging in the second term in the braces in (1.4.9) depends only on the terminal state (x_{t̄}, y_{t̄}) of a fixed sample path (x_t^{t̄}, y_t^{t̄}). Thus, replacing E_{x_{t̄}^T, y_{t̄}^T | x_t^{t̄}, y_t^{t̄}} by E_{x_{t̄}^T, y_{t̄}^T | x_{t̄}, y_{t̄}} and taking into account the fact that, by (1.4.5), the second term in (1.4.9) is the loss function F(t̄, x_{t̄}, y_{t̄}), we finally obtain the functional equation (1.4.6) from (1.4.9).
Just as in the deterministic case, the functional equation (1.4.6) allows us to obtain a differential equation for the loss function F(t, x, y). By setting t̄ = t + Δ, we rewrite (1.4.6) in the form
$$ F(t, x_t, y_t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le t+\Delta}} \mathrm{E}\Big[ \int_t^{t+\Delta} c(x(\tau), y(\tau), u(\tau))\,d\tau + F(t+\Delta, x_{t+\Delta}, y_{t+\Delta}) \Big]. \qquad (1.4.10) $$
Assuming that Δ > 0 is small and the penalty function c(x, y, u) is continuous in its arguments, and having in mind that the diffusion processes x(τ) and y(τ) are continuous, we can represent the first term in the square brackets in (1.4.10) as
$$ \int_t^{t+\Delta} c(x(\tau), y(\tau), u(\tau))\,d\tau = c(x_t, y_t, u_t)\,\Delta + o(\Delta), \qquad (1.4.11) $$
where, as usual, the function o(Δ) denotes infinitesimals of higher order than that of Δ.
Now we assume that the loss function F(t, x, y) has continuous derivatives with respect to t and continuous second-order derivatives with respect to the phase variables x and y. Then for small Δ we can expand the function F(t + Δ, x_{t+Δ}, y_{t+Δ}) in the Taylor series
$$ F(t+\Delta, x_{t+\Delta}, y_{t+\Delta}) = F + \frac{\partial F}{\partial t}\Delta + (x_{t+\Delta} - x_t)^T\frac{\partial F}{\partial x} + (y_{t+\Delta} - y_t)^T\frac{\partial F}{\partial y} + \frac{1}{2}(x_{t+\Delta} - x_t)^T\frac{\partial^2 F}{\partial x\,\partial x^T}(x_{t+\Delta} - x_t) + (x_{t+\Delta} - x_t)^T\frac{\partial^2 F}{\partial x\,\partial y^T}(y_{t+\Delta} - y_t) + \frac{1}{2}(y_{t+\Delta} - y_t)^T\frac{\partial^2 F}{\partial y\,\partial y^T}(y_{t+\Delta} - y_t) + o(\Delta). \qquad (1.4.12) $$
Here all derivatives of the loss function are calculated at the point (t, x_t, y_t); as usual, ∂F/∂x and ∂F/∂y denote the n- and m-column-vectors of partial derivatives of the loss function with respect to the components of the vectors x and y, respectively; ∂²F/∂x∂xᵀ, ∂²F/∂x∂yᵀ, and ∂²F/∂y∂yᵀ denote the n × n, n × m, and m × m matrices of second derivatives.
To obtain the desired differential equation for F(t, x, y), we substitute (1.4.11) and (1.4.12) into (1.4.10), average, and pass to the limit as Δ → 0. Note that if we average expressions containing the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), then all derivatives of F in (1.4.12) are considered as constants, since they depend on (t, x_t, y_t) and the mathematical expectation in (1.4.10) is calculated under the assumption that the values of x_t and y_t are known and fixed.
The mean values of the increments (x_{t+Δ} − x_t) can be calculated by integrating Eqs. (1.4.2). However, we can avoid this calculation if we use the results discussed in §1.2. Indeed, if just as in (1.4.11) we assume that the control u(τ) is fixed and constant, u(τ) ≡ u_t, then we see that for t ≤ τ ≤ t + Δ, Eq. (1.4.2) determines a Markov process x(τ) such that we can write (see (1.1.54))
$$ \mathrm{E}(x_{t+\Delta} - x_t) = A^x(t, x_t, u_t)\,\Delta + o(\Delta), \qquad (1.4.13) $$
where A^x(t, x_t, u_t) is the vector of drift coefficients of this process. But since (for a fixed u(t) = u_t) Eq. (1.4.2) is similar to (1.2.2), it follows from (1.2.50) that the components of this vector have the form⁹
$$ A_i^x(t, x_t, u_t) = a_i(t, x_t, u_t) + \frac{1}{2}\,\frac{\partial \sigma_{ij}(t, x_t)}{\partial x_k}\,\sigma_{kj}(t, x_t). \qquad (1.4.14) $$
⁹Recall that formula (1.4.14) holds for the symmetrized stochastic differential equation (1.4.2). But if (1.4.2) is an Ito equation, then we have A^x(t, x_t, u_t) = a(t, x_t, u_t) instead of (1.4.14).

In a similar way, (1.4.2), (1.1.50), and (1.2.52) imply
$$ \mathrm{E}(x_{t+\Delta} - x_t)(x_{t+\Delta} - x_t)^T = B^x(t, x_t)\,\Delta + o(\Delta), \qquad (1.4.15) $$
where
$$ B^x(t, x_t) = \sigma(t, x_t)\,\sigma^T(t, x_t). \qquad (1.4.16) $$
The other mean values in (1.4.12) can be expressed in terms of the input Markov process y(t) as follows:
$$ \mathrm{E}(y_{t+\Delta} - y_t) = A^y(t, y_t)\,\Delta + o(\Delta), \qquad (1.4.17) $$
$$ \mathrm{E}(y_{t+\Delta} - y_t)(y_{t+\Delta} - y_t)^T = B^y(t, y_t)\,\Delta + o(\Delta). \qquad (1.4.18) $$
Finally, since the stochastic processes y(t) and ξ(t) are independent, we have
$$ \mathrm{E}(x_{t+\Delta} - x_t)(y_{t+\Delta} - y_t)^T = o(\Delta). \qquad (1.4.19) $$
Taking into account (1.4.13)-(1.4.19), we substitute (1.4.11) and (1.4.12) into (1.4.10) and rewrite the resulting expression as follows:
$$ F = \min_{u_t \in U}\Big\{ c(x_t, y_t, u_t)\Delta + F + \Big[\frac{\partial F}{\partial t} + (A^x(t, x_t, u_t))^T\frac{\partial F}{\partial x} + (A^y(t, y_t))^T\frac{\partial F}{\partial y} + \frac{1}{2}\,\mathrm{Sp}\,B^x(t, x_t)\frac{\partial^2 F}{\partial x\,\partial x^T} + \frac{1}{2}\,\mathrm{Sp}\,B^y(t, y_t)\frac{\partial^2 F}{\partial y\,\partial y^T}\Big]\Delta + o(\Delta)\Big\}. \qquad (1.4.20) $$
For brevity, in (1.4.20) we omit the arguments (t, x_t, y_t) of all partial derivatives of F and denote the trace of a matrix A = ‖a_ij‖ by Sp A = a₁₁ + a₂₂ + ⋯ + a_nn.
By analogy with Eq. (1.3.14), we divide (1.4.20) by Δ, pass to the limit as Δ → 0, and obtain the following Bellman differential equation for the loss function F = F(t, x, y):
$$ \frac{\partial F}{\partial t} + (A^y(t, y))^T\frac{\partial F}{\partial y} + \frac{1}{2}\,\mathrm{Sp}\,B^x(t, x)\frac{\partial^2 F}{\partial x\,\partial x^T} + \frac{1}{2}\,\mathrm{Sp}\,B^y(t, y)\frac{\partial^2 F}{\partial y\,\partial y^T} + \min_{u \in U}\Big[ c(x, y, u) + (A^x(t, x, u))^T\frac{\partial F}{\partial x} \Big] = 0. \qquad (1.4.21) $$
By analogy with (1.3.14), we omit the subscripts of x_t, y_t, and u_t, assuming that the phase variables x, y and the control vector u in (1.4.21) are taken at the current time t. We also note that the loss function F = F(t, x, y) must satisfy Eq. (1.4.21) for 0 ≤ t < T. At the right endpoint of the control interval, this function must satisfy the condition
$$ F(T, x, y) = \psi(x, y), \qquad (1.4.22) $$
which readily follows from its definition (1.4.5).


By using the operator
$$ L^u_{t,x,y} = \frac{\partial}{\partial t} + (A^x(t, x, u))^T\frac{\partial}{\partial x} + (A^y(t, y))^T\frac{\partial}{\partial y} + \frac{1}{2}\,\mathrm{Sp}\,B^x(t, x)\frac{\partial^2}{\partial x\,\partial x^T} + \frac{1}{2}\,\mathrm{Sp}\,B^y(t, y)\frac{\partial^2}{\partial y\,\partial y^T}, \qquad (1.4.23) $$
we can rewrite (1.4.21) in the compact form
$$ \min_{u \in U}\big[ c(x, y, u) + L^u_{t,x,y} F(t, x, y) \big] = 0. \qquad (1.4.24) $$
In the theory of Markov processes [45, 157, 175], the operator (1.4.23) is called an infinitesimal operator of the diffusion Markov process z̄(t) = (x(t), y(t)).
To obtain the optimal control in the synthesis form u_* = φ_*(t, x, y) for problem (i)-(v), we need to solve the Bellman equation (1.4.21) with the additional condition (1.4.22).
If it is possible to calculate the minimum of the function in the square brackets in (1.4.21) explicitly, then the optimal control can be written as follows (see §1.3, (1.3.16)-(1.3.18)):
$$ u_* = \varphi_*\Big(t, x, y, \frac{\partial F}{\partial x}\Big), \qquad (1.4.25) $$
and the Bellman equation (1.4.21) can be written without the symbol "min":
$$ \frac{\partial F}{\partial t} + (A^y(t, y))^T\frac{\partial F}{\partial y} + \frac{1}{2}\,\mathrm{Sp}\,B^x(t, x)\frac{\partial^2 F}{\partial x\,\partial x^T} + \frac{1}{2}\,\mathrm{Sp}\,B^y(t, y)\frac{\partial^2 F}{\partial y\,\partial y^T} + \Phi\Big(t, x, y, \frac{\partial F}{\partial x}\Big) = 0, \qquad (1.4.26) $$
where Φ denotes a nonlinear function of the components of the vector ∂F/∂x.

In this case, solving the synthesis problem is equivalent to solving (1.4.26) with the additional condition (1.4.22). After the loss function F(t, x, y) satisfying (1.4.26) and (1.4.22) is found, we can calculate the gradient ∂F(t, x, y)/∂x = w(t, x, y) and obtain the desired optimal control
$$ u_* = \varphi_*(t, x, y, w(t, x, y)). \qquad (1.4.27) $$
Obviously, the main difficulty in this approach to the synthesis problem is to solve Eq. (1.4.26). Comparing this equation with the similar equation (1.3.18) for the deterministic problem (1.3.1)-(1.3.3), we see that, in contrast with (1.3.18), Eq. (1.4.26) is a second-order partial differential equation of parabolic type. By analogy with (1.3.18), Eq. (1.4.26) is nonlinear, but, in contrast with the deterministic case, the nonlinearity of Eq. (1.4.26) is weak, since (1.4.26) is linear with respect to the higher-order derivatives of the loss function. This is why, in the general theory of parabolic equations [61, 124], equations of type (1.4.26) are usually called quasilinear or semilinear.
In the general theory [124] of quasilinear parabolic equations of type (1.4.26), the existence and uniqueness theorems for their solutions are proved for some classes of nonlinear functions Φ. The unique solution of (1.4.26) is selected by initial and boundary conditions on the function F(t, x, y). In our case, condition (1.4.22), which determines the loss function for t = T, plays the role of the "initial" condition. The boundary conditions are determined by the restrictions imposed on the phase variables x and y in the original statement of the synthesis problem. If, as in problem (i)-(v) considered here, there are no restrictions on the phase variables, then it is necessary to solve the Cauchy problem for (1.4.26). In this case, the uniqueness of the solution is ensured by some requirements on the rate of growth of the function F(t, x, y) as |x|, |y| → ∞ (for details see Chapter III).
However, there are no general methods for solving equations of type (1.4.26) explicitly. Nevertheless, in some specific cases, Eq. (1.4.26) can be solved approximately or numerically, and sometimes exactly. We describe such special cases in detail in Chapters II-VII.
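To indicate what a numerical treatment can look like, the sketch below integrates a scalar instance of (1.4.26) backward in time by an explicit finite-difference scheme; the model (plant dx = u dt + s dη with |u| ≤ u₀, penalty c = x², ψ ≡ 0) is assumed for illustration, and the minimization over u is done explicitly as in example 2° of §1.3.

    import numpy as np

    # Explicit backward integration of a scalar semilinear Bellman equation:
    # min over |u| <= u0 of [x^2 + u*F_x] equals x^2 - u0*|F_x|, so we step
    #   F_t + x^2 - u0*|F_x| + (s^2/2)*F_xx = 0
    # backward from the "initial" condition (1.4.22): F(T, x) = 0.
    s, u0, T = 1.0, 1.0, 1.0
    xs = np.linspace(-3.0, 3.0, 121)
    h = xs[1] - xs[0]
    dt = 0.4 * h**2 / s**2                  # explicit-scheme stability bound
    F = np.zeros_like(xs)
    t = T
    while t > 0:
        Fx = np.gradient(F, h)              # central differences for dF/dx
        Fxx = np.zeros_like(F)
        Fxx[1:-1] = (F[2:] - 2 * F[1:-1] + F[:-2]) / h**2   # crude at the edges
        F = F + dt * (xs**2 - u0 * np.abs(Fx) + 0.5 * s**2 * Fxx)
        t -= dt
    u_star = -u0 * np.sign(np.gradient(F, h))   # synthesis: minimizing control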
Now let us consider some modifications of problem (i)-(v) that we shall study later. First of all, we trace how the form of the Bellman equation (1.4.21) varies if, in the initial problem (i)-(v), we use optimality criteria that differ from (1.4.3).
1.4.2. Stationary tracking. We begin by modifying the criterion (1.4.3), which allows us to examine stationary operating conditions of the servomechanism shown in Fig. 10.
We assume that criterion (1.4.3) does not penalize the terminal state of the controlled system, that is, the penalty function ψ(x, y) ≡ 0 in the functional (1.4.3). Then the servomechanism shown in Fig. 10 can operate in the time-invariant (stationary) tracking mode if the following conditions are satisfied:
(1) the input Markov process y(t) is homogeneous in time, namely, its drift and diffusion coefficients are independent of time: A^y(t, y) = A^y(y) and B^y(t, y) = B^y(y);
(2) the plant is autonomous, that is, the right-hand sides of Eqs. (1.4.2) do not depend on time explicitly: a(t, x, u) = a(x, u) and σ(t, x) = σ(x);
(3) the system works sufficiently long (the upper integration limit T → ∞ in (1.4.3)).

A process of relaxation to the stationary operating conditions is schematically shown in Fig. 11, where the error z(t) = y(t) − x(t) between the input action (the command signal) and the controlled value (x and y are scalar variables) is plotted on the ordinate axis. One can see that for large T the operation interval [0, T] can be conventionally divided into two intervals: the time-varying operation interval [0, t₁] and the time-invariant operation interval [t₁, T]. The first is characterized by a correlation relation between the values of random sample paths z(t), t ∈ [0, t₁], and the initial state z(0). On this interval the probability characteristics of the stochastic process z(t) depend on t. For t > t₁, this correlation disappears, and we can assume that z(t), t ∈ [t₁, T], is a stationary process. Hence, the characteristics of the process related to time t > t₁ are independent of t. In particular, the instant values of the processes x(t) and y(t) on the interval [t₁, T] have a constant probability density p_st(x, y). Conditions for the existence of time-invariant operating conditions for linear controlled systems are discussed in [194].

The performance on the time-invariant interval is characterized by the value γ of mean losses per unit time (the stationary tracking error). If the operation time T increases to T + ΔT (see Fig. 11), then the loss function (1.4.5) increases by γΔT. Therefore, to study the stationary tracking, it is expedient, instead of the loss function (1.4.5), to use the loss function f(x, y) that is independent of time and can be written as
$$ f(x, y) = \lim_{T \to \infty}\,\big[F(t, x, y) - \gamma(T - t)\big]. \qquad (1.4.29) $$
It follows from (1.4.23) and (1.4.24) that function (1.4.29) satisfies the stationary Bellman equation
$$ \min_{u \in U}\big[ c(x, y, u) + L^u_{x,y} f(x, y) \big] = \gamma, \qquad (1.4.30) $$
where L^u_{x,y} denotes the elliptic operator
$$ L^u_{x,y} = (A^x(x, u))^T\frac{\partial}{\partial x} + (A^y(y))^T\frac{\partial}{\partial y} + \frac{1}{2}\,\mathrm{Sp}\,B^x(x)\frac{\partial^2}{\partial x\,\partial x^T} + \frac{1}{2}\,\mathrm{Sp}\,B^y(y)\frac{\partial^2}{\partial y\,\partial y^T}. \qquad (1.4.31) $$
Obviously, for the optimal control u_* = φ_*(x, y), the error γ of stationary tracking has the form
$$ \gamma = \lim_{T \to \infty} \frac{1}{T}\,\mathrm{E}\int_0^T c(x(t), y(t), u_*(t))\,dt \qquad (1.4.32) $$
and, together with the functions f(x, y) and u_* = φ_*(x, y), can be found by solving the time-invariant equation (1.4.30). Some methods for solving the stationary Bellman equations are considered in Chapters III-VI.
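After the state space is discretized, one standard route to the pair (γ, f) in (1.4.30) is relative value iteration. The sketch below applies it to a randomly generated finite-state, finite-action Markov chain (all data invented for illustration), producing the mean cost per step γ and the relative loss function f.

    import numpy as np

    # Relative value iteration for an average-cost (ergodic) control problem:
    # a finite-chain analogue of the stationary Bellman equation (1.4.30).
    rng = np.random.default_rng(3)
    n_states, n_actions = 20, 3
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[u, i, :]
    c = rng.uniform(0.0, 1.0, size=(n_actions, n_states))             # c[u, i]
    f = np.zeros(n_states)
    for _ in range(2000):
        Q = c + np.einsum('uij,j->ui', P, f)   # one-step cost-to-go per action
        f_new = Q.min(axis=0)
        gamma = f_new[0]                       # normalize at a reference state
        f = f_new - gamma
    # gamma approximates the stationary error per step; the argmin over u
    # at each state gives the stationary optimal strategy u_*(x).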
1.4.3. Maximization of the mean time of the first passage to the boundary. As previously, we assume that in the servomechanism shown in Fig. 10 the stochastic process y(t) is homogeneous in time and the plant P is autonomous. We also assume that a simply connected closed domain D ⊂ R_{n+m} is chosen in the (n + m)-dimensional Euclidean space R_{n+m} of vectors (x, y). It is required to find a control that, for any initial state (x(0), y(0)) ∈ D of the system, maximizes the mean time Eτ during which the representative point (x(t), y(t)) achieves the boundary ∂D of the domain D (see the criterion (1.1.21) in §1.1).
By W^u(t − t₀, x₀, y₀) we denote the probability of the event that the representative point (x, y) does not reach ∂D during time t − t₀ if x(t₀) = x₀ and

y(t₀) = y₀, (x₀, y₀) ∈ D, and a control algorithm u(t) = φ(x(t), y(t)) is chosen. This definition implies the following properties of the function W^u:
$$ W^u(0, x_0, y_0) = 1, \quad W^u(+\infty, x_0, y_0) = 0, \quad \text{if } (x_0, y_0) \text{ is an interior point of } D; \qquad (1.4.33) $$
$$ W^u(t - t_0, x_0, y_0) \equiv 0, \quad \forall t > t_0, \quad \text{if } (x_0, y_0) \in \partial D. $$
If t_* denotes a random instant of time at which the phase vector z̄(t) = (x(t), y(t)) comes to the boundary ∂D for the first time, then the time τ = t_* − t₀ of coming to the boundary is a random variable and the function W^u(·) can be expressed via the conditional probability
$$ W^u(t - t_0, x_0, y_0) = \mathrm{P}^u\{\tau \ge t - t_0 \mid x(t_0) = x_0,\ y(t_0) = y_0\} = \mathrm{P}^u\{\tau \ge t - t_0 \mid x_0, y_0\}. \qquad (1.4.34) $$

For the mutually disjoint events {τ < t − t₀} and {τ ≥ t − t₀}, the probability addition theorem implies
$$ \mathrm{P}^u\{\tau < t - t_0 \mid x_0, y_0\} + \mathrm{P}^u\{\tau \ge t - t_0 \mid x_0, y_0\} = 1. \qquad (1.4.35) $$
Expressing the distribution function of the probabilities P^u{τ < t − t₀ | x₀, y₀} via the probability density w_τ(σ) of the continuous random variable τ, we obtain
$$ \mathrm{P}^u\{\tau < t - t_0 \mid x_0, y_0\} = \int_0^{t - t_0} w_\tau(\sigma)\,d\sigma = 1 - W^u(t - t_0, x_0, y_0) \qquad (1.4.36) $$
from (1.4.34) and (1.4.35). Hence, after the differentiation with respect to t, we find the probability density w_τ(t − t₀) = −∂W^u(t − t₀, x₀, y₀)/∂t. Using the same notation for the argument of the density and for the random value, that is, writing w_τ(t − t₀) = w(τ), from (1.4.33) and (1.4.36) we obtain the mean time Eτ of achieving the boundary
$$ \mathrm{E}\tau = \int_0^\infty \tau\,w(\tau)\,d\tau = \int_0^\infty W^u(\tau, x_0, y_0)\,d\tau = \int_{t_0}^\infty W^u(t - t_0, x_0, y_0)\,dt. \qquad (1.4.37) $$
This formula holds if lim_{t→∞} (t − t₀)W^u(t − t₀, x₀, y₀) = 0.



The mean time Eτ depends both on the initial state (x₀, y₀) of the controlled system shown in Fig. 10 and on a chosen control algorithm u = φ(x, y). Therefore, the Bellman function for the problem considered is determined by the relation
$$ F_1(x_0, y_0) = \max_{u \in U} \mathrm{E}\tau = \max_{u \in U} \int_{t_0}^\infty W^u(t - t_0, x_0, y_0)\,dt. \qquad (1.4.38) $$
By analogy with (1.4.10), for the function (1.4.38), the basic functional equation of the dynamic programming approach has the form
$$ F_1(x_t, y_t) = \max_{\substack{u(\tau)\in U \\ t \le \tau \le t+\Delta}} \Big[ \int_t^{t+\Delta} W^u(\tau - t, x_t, y_t)\,d\tau + \mathrm{E}\,F_1(x_{t+\Delta}, y_{t+\Delta}) \Big]. \qquad (1.4.39) $$
The Bellman differential equation for the function F₁(x, y) can be derived from (1.4.39) by passing to the limit as Δ → 0. In this case, the procedure is almost the same as that used for the derivation of Eq. (1.4.21) for the basic problem (i)-(v). Expanding F₁(x_{t+Δ}, y_{t+Δ}) in the Taylor series around the point (x_t, y_t), averaging the expansion with respect to the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), taking into account the relation lim_{Δ→0} W^u(Δ, x_t, y_t) = 1 for all (x_t, y_t) lying in the interior of D, and passing to the limit as Δ → 0, from (1.4.39) with regard to (1.4.13)-(1.4.19), we obtain the Bellman differential equation for the function F₁(x, y):
$$ \max_{u \in U} L^u_{x,y} F_1(x, y) = -1, \qquad (1.4.40) $$
where the elliptic operator L^u_{x,y} is given by (1.4.31).
We also note that the function F₁(x, y) satisfies Eq. (1.4.40) in the interior of the domain D. It follows from (1.4.33) and (1.4.38) that at the points of the boundary ∂D the function F₁ vanishes,
$$ F_1(x, y) = 0, \qquad (x, y) \in \partial D. \qquad (1.4.41) $$
In the theory of differential equations of elliptic type, the problem of


solving Eq. (1.4.40) with the boundary condition (1.4.41) is called the first
interior boundary-value problem or the Dirichlet problem. Thus, solving the
synthesis problem for the optimal control that maximizes the mean time
of the first passage to the boundary is equivalent to solving the Dirichlet
problem for the semilinear elliptic equation (1.4.40).
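For the simplest scalar, uncontrolled case this Dirichlet problem is easy to solve by finite differences. The sketch below (with the assumed dynamics dx = −x dt + s dη on D = [−1, 1]) solves (s²/2)F₁″ − xF₁′ = −1 with F₁(±1) = 0.

    import numpy as np

    # Finite-difference solution of the mean-exit-time Dirichlet problem
    # (1.4.40), (1.4.41) for assumed scalar dynamics dx = -x dt + s d(eta):
    #   (s^2/2) F1'' - x F1' = -1,   F1(-1) = F1(1) = 0.
    s, n = 0.5, 201
    x = np.linspace(-1.0, 1.0, n)
    hgrid = x[1] - x[0]
    M = np.zeros((n, n))
    rhs = np.full(n, -1.0)
    M[0, 0] = M[-1, -1] = 1.0
    rhs[0] = rhs[-1] = 0.0                     # boundary condition (1.4.41)
    for i in range(1, n - 1):
        M[i, i - 1] = s**2 / (2 * hgrid**2) + x[i] / (2 * hgrid)
        M[i, i]     = -s**2 / hgrid**2
        M[i, i + 1] = s**2 / (2 * hgrid**2) - x[i] / (2 * hgrid)
    F1 = np.linalg.solve(M, rhs)               # mean time to reach the boundary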

1.4.4. Minimization of the maximum penalty. Now let us consider the synthesis problem with the optimality criterion (1.1.18) for the optimal control system shown in Fig. 10. In this case, it is reasonable to introduce the loss function
$$ F_2(t, x_t, y_t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} \mathrm{E}\Big[ \max_{t \le \tau \le T} c(x(\tau), y(\tau), u(\tau)) \Big]. \qquad (1.4.42) $$
In (1.4.42) the averaging has the meaning of the conditional mathematical expectation E[·] = E{[·] | x(t) = x_t, y(t) = y_t}.
For small Δ we have the following basic functional relation for the function F₂:

Let us introduce the notation c⁰(x, y) = min_{u∈U} c(x, y, u). Then it follows from (1.4.43) that either

provided that the function F₂(t, x_t, y_t) > c⁰(x_t, y_t) has been obtained from (1.4.45).
Acting by analogy with Section 1.4.1, that is, expanding the function F₂(t + Δ, x_{t+Δ}, y_{t+Δ}) in the series (1.4.12), averaging, and passing to the limit as Δ → 0, from (1.4.44) and (1.4.45) we obtain (with regard to (1.4.13)-(1.4.19)) the Bellman equation in the differential form:
$$ \min_{u \in U} L^u_{t,x,y} F_2(t, x, y) = 0 \quad \text{if } F_2(t, x, y) > c^0(x, y); \qquad F_2(t, x, y) = c^0(x, y) \quad \text{otherwise}, \qquad (1.4.46) $$
where L^u_{t,x,y} denotes the operator (1.4.23).


The unique solvability of (1.4.46) implies the condition

as well as the matching conditions for the function F₂(t, x, y) on the interface between the domains on which the equations in (1.4.46) are defined. These conditions of "smooth matching" [113] require the continuity of the function F₂(t, x, y) and of its first-order derivatives with respect to the phase variables x and y on the interface mentioned above.
If, by analogy with Sections 1.4.2 and 1.4.3, the characteristics of the input process y(t) and of the controlled plant P are time-independent, then it is often expedient to use a somewhat different statement of the problem considered, which allows us to assume that the loss function is independent of time. In this case, we do not fix the observation time but assume that the optimal system minimizes the functional
$$ \mathrm{E}\Big[ \max_{\tau \ge 0} c(x(\tau), y(\tau), u(\tau))\,e^{-\beta\tau} \Big], \qquad (1.4.48) $$
where β > 0 is a given number. This change of the mathematical statement preserves all the characteristic features of the problem. Indeed, it follows from (1.4.48) that the time of observation of the function c(x, y, u) is bounded and determined by β. Namely, this time is large for small β and small for large β.
For the criterion (1.4.48) the loss function is determined by the formula¹⁰
$$ f_2(x, y) = \min_{\substack{u(\tau)\in U \\ \tau \ge t}} \mathrm{E}\Big[ \max_{\tau \ge t} c(x(\tau), y(\tau), u(\tau))\,e^{-\beta(\tau - t)} \Big]. \qquad (1.4.49) $$
Taking into account the relations
$$ \min_{u(\tau)\in U} \mathrm{E}\Big[ \max_{\tau \ge t+\Delta} c(x(\tau), y(\tau), u(\tau))\,e^{-\beta(\tau - t)} \Big] = e^{-\beta\Delta} \min_{u(\tau)\in U} \mathrm{E}\Big[ \max_{\tau \ge t+\Delta} c(x(\tau), y(\tau), u(\tau))\,e^{-\beta(\tau - t - \Delta)} \Big], $$
we can rewrite Eq. (1.4.43) for the function f₂(x, y) in the form

¹⁰As usual, E[·] in (1.4.49) is treated as the conditional mathematical expectation E[·] = E{[·] | x(t) = x, y(t) = y}.

By analogy with the previous reasoning, from (1.4.50) we obtain the Bellman equation for the function f₂(x, y):
$$ \min_{u \in U} L^u_{x,y} f_2(x, y) = \beta f_2(x, y) \quad \text{if } f_2(x, y) > c^0(x, y); \qquad f_2(x, y) = c^0(x, y) \quad \text{otherwise}, \qquad (1.4.51) $$
where L^u_{x,y} is the elliptic operator (1.4.31) that does not contain the derivative with respect to time t. In §2.2 of Chapter II, we solve Eq. (1.4.51) for a special problem of optimal control.
1.4.5. Optimal tracking of a strictly discontinuous Markov process. Let us consider a version of the synthesis problem for the optimal tracking system that differs from the basic problem (i)-(v) by conditions (i) and (iv). Namely, we assume that (i) the input process y(t) in the servomechanism (see Fig. 10) is given by a strictly discontinuous Markov process (see §1.1) with known characteristics λ(t, y) and π(t, y, z) determining the intensity of jumps and the density of the transition probability at the state (t, y) and that (ii) there are no random perturbations ξ(t) acting on the plant P. In this case, the plant P is described by the system of ordinary (nonstochastic) differential equations
$$ \dot x = a(t, x, u). \qquad (1.4.52) $$
It follows from (1.1.68) that for small Δ the transition probability p(t, y_t; t + Δ, y_{t+Δ}) = p(y(t + Δ) = y_{t+Δ} | y(t) = y_t) for the input process y(t) is determined by the formula
$$ p(t, y_t;\, t+\Delta, y_{t+\Delta}) = \big[1 - \lambda(t, y_t)\Delta\big]\,\delta(y_{t+\Delta} - y_t) + \lambda(t, y_t)\,\pi(t, y_t, y_{t+\Delta})\,\Delta + o(\Delta). \qquad (1.4.53) $$
By analogy with the solution of the basic problem, in the case considered, the loss function F₃(t, x, y) is determined by (1.4.5) if E[·] in (1.4.5) is understood as the averaging of the functional [·] over the set of sample paths y_t^T = [y(τ): t ≤ τ ≤ T] issued from a given initial point y(t) = y_t. Obviously, F₃(t, x, y) satisfies the functional equations (1.4.6) and (1.4.10). We rewrite Eq. (1.4.10) for F₃ as follows:

Note that for small Δ we can average in (1.4.54) explicitly by integrating the function in the square brackets multiplied by the transition probability (1.4.53).
Since the sample paths of the input process y(t) are discontinuous, the random increments (y_{t+Δ} − y_t) are, generally speaking, not small. Therefore, in our case, instead of (1.4.12), we use the following representation of F₃(t + Δ, x_{t+Δ}, y_{t+Δ}) as Δ → 0:
$$ F_3(t+\Delta, x_{t+\Delta}, y_{t+\Delta}) = F_3(t, x_t, y_{t+\Delta}) + \frac{\partial F_3}{\partial t}(t, x_t, y_{t+\Delta})\,\Delta + (x_{t+\Delta} - x_t)^T\frac{\partial F_3}{\partial x}(t, x_t, y_{t+\Delta}) + o(\Delta) \qquad (1.4.55) $$
(in (1.4.55) it is assumed that F₃(t, x, y) is a continuously differentiable function with respect to t and x).
The Bellman equation for F₃(t, x, y) can be derived from (1.4.54) in the standard way. To this end, we substitute expansion (1.4.55) into (1.4.54), average with the probability density (1.4.53), and pass to the limit as Δ → 0 in (1.4.54). Using (1.4.53), we obtain
$$ \mathrm{E}\,F_3(t, x_t, y_{t+\Delta}) = \big[1 - \lambda(t, y_t)\Delta\big]\,F_3(t, x_t, y_t) + \lambda(t, y_t)\,\Delta \int \pi(t, y_t, z)\,F_3(t, x_t, z)\,dz + o(\Delta). \qquad (1.4.56) $$
In a similar way, it follows from (1.4.52) and (1.4.53) that
$$ x_{t+\Delta} - x_t = a(t, x_t, u_t)\,\Delta + o(\Delta), \qquad (1.4.57) $$
$$ \mathrm{E}\,\frac{\partial F_3}{\partial t}(t, x_t, y_{t+\Delta}) = \frac{\partial F_3}{\partial t}(t, x_t, y_t) + O(\Delta), \qquad (1.4.58) $$
$$ \mathrm{E}\,\frac{\partial F_3}{\partial x}(t, x_t, y_{t+\Delta}) = \frac{\partial F_3}{\partial x}(t, x_t, y_t) + O(\Delta) \qquad (1.4.59) $$
(in (1.4.58) and (1.4.59) the functions O(Δ) denote terms of the order of Δ, such that lim_{Δ→0} O(Δ)/Δ = N, where N is a finite number).

Using (1.4.55)-(1.4.60) and passing to the limit as Δ → 0 in (1.4.54), we obtain the following Bellman integro-differential equation for the function F₃:

$$ F_3(T, x, y) = \psi(x, y). \qquad (1.4.62) $$
If λ(t, y) = λ(y), π(t, y, z) = π(y, z), a(t, x, u) = a(x, u), ψ(x, y) ≡ 0, and T → ∞, then the system shown in Fig. 10 may operate in the stationary tracking mode (see Section 1.4.2). In this case, instead of (1.4.61), we have the stationary Bellman equation

where the stationary loss function f₃(x, y) is determined by analogy with (1.4.29) as
$$ f_3(x, y) = \lim_{T \to \infty}\,\big[F_3(t, x, y) - \gamma(T - t)\big] $$
and the number γ > 0 determines the mean losses per unit time in the stationary tracking mode under the optimal control. The solution of the time-invariant equation (1.4.63) for a special synthesis problem is given in §2.2.
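Sample paths of such a strictly discontinuous input are easy to generate. The sketch below (with an assumed constant intensity λ(y) ≡ λ and a transition density π(y, ·) equal to the standard normal density, both invented for illustration) produces one path of y(t) consistent with the short-time transition probability (1.4.53).

    import numpy as np

    # One sample path of a pure jump Markov process: exponential waiting times
    # with intensity lam, new states drawn from the transition density pi(y, .).
    rng = np.random.default_rng(4)
    lam, T = 2.0, 10.0
    t, y = 0.0, 0.0
    times, states = [0.0], [y]
    while True:
        t += rng.exponential(1.0 / lam)     # waiting time between jumps
        if t > T:
            break
        y = rng.normal()                    # new state from pi(y, .) (assumed N(0,1))
        times.append(t)
        states.append(y)
    # Between jumps y(t) is constant; (times, states) define the sample path.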
In conclusion, we make some remarks.
First, we note that in this section we have considered only the synthesis problems (and the corresponding Bellman equations) that are studied in the present monograph. The Bellman equations for other stochastic control problems can be found in [1, 3, 5, 18, 34, 50, 57, 58, 113, 122]. Moreover, the ideas and methods of the dynamic programming approach are widely used for solving problems of optimal control for Markov sequences and processes with finitely or countably many states [151, 152], which we do not consider in this book.
We also point out that many arguments and computations in this section are of rather formal character and sometimes correspond to the "physical level of rigor." To justify the optimality principle, the sufficiency of Markov optimal strategies, the validity of the Bellman differential equations, and the solvability of synthesis problems rigorously, one needs rather complicated and refined mathematical constructions that are beyond the framework of this book. The reader interested in a closer examination of these problems is referred to the monographs [58, 59, 175], and especially to [113].

§1.5. Sufficient coordinates in control problems with indirect observations
problems with indirect observations
We have already noted that the dynamic programming method in, so to say, its "pure" form can be used only for Markov controlled processes. Let X_t be a current phase state of the system. The probabilities of future states X_{t+Δ} (Δ > 0) of the process X(t) must be completely determined by the last measured value X_t. However, since the time evolution of X(t) depends on random perturbations and control actions, the process X(t) satisfies the Markov property only if the values u_t of the current control are determined by the instant values of the phase variables and time as follows:
$$ u_t = \varphi(t, X_t). \qquad (1.5.1) $$
The Markov property of the process X(t) allows us to write the basic functional equation of the optimality principle, then to obtain the Bellman equation, etc., that is, to follow the procedure described in §1.4.
To implement the control algorithm in the form (1.5.1), it is necessary to measure the phase variables X_t exactly at each instant of time. This possibility is provided by the servomechanism shown in Fig. 10. In this case, the phase variables X_t = z̄_t = (x_t, y_t) are the components of the (n + m)-dimensional vector of instant input (assigning) actions and output (controlled) variables.
Now let us consider the more general case of the system shown in Fig. 3. At each instant of time, instead of the true values of the vectors x_t and y_t, we have only the results of measurements x̃₀ᵗ and ỹ₀ᵗ, which are sample paths of the stochastic processes {x̃(s): 0 ≤ s ≤ t} and {ỹ(s): 0 ≤ s ≤ t}. These processes are mixtures of "useful signals" x₀ᵗ, y₀ᵗ and "random noises" η₀ᵗ, ζ₀ᵗ. Only these results of measurements can be used for calculating the current values of the control actions u_t; therefore, the desired control algorithm for the system shown in Fig. 3 has the form of the functional
$$ u_t = \varphi(t, \tilde x_0^t, \tilde y_0^t). \qquad (1.5.2) $$
To illustrate the computation of the optimal functional φ(t, x̃₀ᵗ, ỹ₀ᵗ), we consider, as an example, the basic synthesis problem (see §1.4, Section 1.4.1) in the case of indirect observations.
Assume that the equation of the controlled plant, the restrictions on the control, and the optimality criterion have the form
$$ \dot x = a(t, x, u) + \sigma(t, x)\,\xi(t), \qquad (1.5.3) $$
$$ u(t) \in U, \qquad (1.5.4) $$
$$ I[u] = \mathrm{E}\Big[ \int_0^T c(x(t), y(t), u(t))\,dt + \psi(x(T), y(T)) \Big] \qquad (1.5.5) $$
(here we use the notation from (1.4.2), (1.4.3), and (1.4.4) in §1.4).
observed processes Z(t) and g(t) are determined by the relations

Here P, Q, H, and G are given matrices whose dimensions agree with the
dimensions of the vectors Z, x, q, y, v,
and <. We also assume that the
c
vectors Z and q (as well as the vectors and C) are of the same dimension,
and the square matrices Q(t) and G(t) are nondegenerate for all t E [0, TI.''
We assume that the stochastic process [(t) in (1.5.3) is the standard white
noise (1.1.34) and the other stochastic functions y(t), C(t), and q(t) are
Markov diffusion processes with known characteristics (that is, with given
drift and diffusion coefficients). The stochastic processes [(t), y(t), ((t),
and q(t) are assumed to be independent. We also note that the stochastic
process x(t), which is a solution of the stochastic equation (1.5.3), is not
Markov, since in this case the control functions u(t) = ut on the right-hand
side of (1.5.3) have the form of functionals (1.5.2) and depend on the history
of the process.
Following the formal scheme of the dynamic programming approach, by analogy with (1.4.5), we can define the loss function for the problem considered as follows:
$$ F(t, \tilde x_0^t, \tilde y_0^t) = \min_{\substack{u(\tau)\in U \\ t \le \tau \le T}} \mathrm{E}\Big[ \int_t^T c(x(\tau), y(\tau), u(\tau))\,d\tau + \psi(x(T), y(T)) \;\Big|\; \tilde x_0^t, \tilde y_0^t \Big]. \qquad (1.5.7) $$
Since the functions x̃₀ᵗ and ỹ₀ᵗ are arguments of F in (1.5.7), it would be more correct if expression (1.5.7) were called a loss functional; however, both (1.5.7) and (1.4.5) are called loss functions.
In contrast with §1.4, it is essentially new that we cannot write the optimality principle equation of type (1.4.6) or (1.4.10) for the function (1.5.7), since this function depends on the stochastic processes x̃(t) and ỹ(t), which are not Markov. Formula (1.5.6) immediately shows that x̃(t) and ỹ(t) have no Markov properties, since the sum of Markov processes is not a Markov process. Moreover, it was pointed out that the process x(t) itself is not Markov. Therefore, we can solve the synthesis problem

¹¹For simplicity, we assume that Q(t) and G(t) are nondegenerate, but this condition is not necessary [132, 175].

by using the dynamic programming approach only if we can choose new "phase" variables X(t) = X_t for the loss function (1.5.7) so that, on the one hand, they are sufficient for the computation of minimum future losses in the sense of
$$ F(t, \tilde x_0^t, \tilde y_0^t) = F(t, X_t) $$
and, on the other hand, the stochastic process X(t) is Markov. Such phase variables X_t are called sufficient coordinates [171] by analogy with the sufficient statistics used in mathematical statistics [185].
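A familiar special case may make this notion concrete: for a linear plant with additive Gaussian noises (an assumed scalar model dx = ax dt + s dW with observation dz̃ = x dt + g dV, not the general system (1.5.3), (1.5.6)), the a posteriori density is Gaussian, so the sufficient coordinates collapse to the conditional mean and variance propagated by the Kalman-Bucy filter.

    import numpy as np

    # One Euler step of the scalar Kalman-Bucy filter: the pair (m, Pv) of
    # posterior mean and variance plays the role of sufficient coordinates X_t
    # for the assumed linear-Gaussian model dx = a*x dt + s dW, dz = x dt + g dV.
    def kalman_bucy_step(m, Pv, dz, a, s, g, dt):
        K = Pv / g**2                                       # filter gain
        m_new = m + a * m * dt + K * (dz - m * dt)          # innovation update
        P_new = Pv + (2 * a * Pv + s**2 - Pv**2 / g**2) * dt  # Riccati equation
        return m_new, P_new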
It turns out that there exist sufficient coordinates for the problem considered; X_t is the collection of instant values of the observable processes x̃(t) = x̃_t and ỹ(t) = ỹ_t and of the a posteriori probability density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃₀ᵗ, ỹ₀ᵗ) of the unobserved vectors x_t and y_t,
$$ X_t = \{\tilde x_t,\ \tilde y_t,\ p(t, x_t, y_t)\}. \qquad (1.5.8) $$
In what follows, it will be shown that the coordinates (1.5.8) are sufficient to compute the loss function (1.5.7). In the case of an uncontrolled process x(t), the Markov property of (1.5.8) follows from Theorem 5.9 in [175].
To derive the Bellman differential equation, it is necessary to know equations that determine the time evolution of the sufficient coordinates. For the first two components of (1.5.8), that is, for the processes x̃(t) and ỹ(t), these equations can be assumed to be known, because one can readily obtain them from the a priori characteristics of the processes y(t), x(t), ζ(t), η(t) and formulas (1.5.6). Later we derive the equation for the a posteriori probability density p(t, x_t, y_t).
First, we do not pay attention to the fact that the control u_t has the form of a functional (1.5.2). In other words, we assume that u(t) in (1.5.3) is a known deterministic function of time. Then the stochastic process x(t) that satisfies the stochastic equation (1.5.3) is a diffusion Markov process whose characteristics (the drift and diffusion coefficients) are uniquely determined by the vector a(t, x, u) and the matrix σ(t, x) (see §1.2). Thus, in our case, x(t), y(t), ζ(t), and η(t) are independent Markov diffusion processes with given drift coefficients and matrices of diffusion coefficients. In view of formulas (1.5.6) and the fact that the matrices Q(t) and G(t) are nondegenerate, it follows that the collection (x̃(t), ỹ(t), x(t), y(t)) is a Markov diffusion process whose characteristics can be expressed via the given characteristics of the processes x(t), y(t), ζ(t), and η(t). Indeed, if we denote the vectors of drift coefficients by A_x(t, x), A_y(t, y), A_ζ(t, ζ), A_η(t, η) and the diffusion matrices of the independent Markov processes x(t), y(t), ζ(t), and η(t) by B_x(t, x), B_y(t, y), B_ζ(t, ζ), B_η(t, η), then it follows from (1.5.6) that the drift coefficients A_x̃ and A_ỹ for the components x̃(t) and ỹ(t) are
78 Chapter I

determined by the relations

and the matrix B of the diffusion coefficients of the joint process ( S ( t ) ,c(t),
x ( t ) ,y ( t ) ) has the block form

where Bs(t,2, x ) and B G ( t ,y, y ) are square matrices12 determined by the


relations

Now we point out that in the Markov collection of random functions


( Z ( t ) ,y(t),x ( t ) ,y ( t ) )the components S ( t ) and G ( t ) are observable, but the
components x ( y ) and y ( t ) are not observable. Partially observable Markov
processes are often called conditional Markov processes. The rigorous the-
ory of such processes can be found in [132, 1751.
Let us consider the conditional ( a posteriori) density p(t, x t , y t ) = ( x ( t ) =
Yi)
x t , y ( t ) = yt /t I:, of the probability distribution for unobservable com-
ponents of the partially observable Markov process ( S ( t ) ,c(t),x ( t ) ,y ( t ) ) .
It turns out that the a posteriori density p ( t , x t , yt) satisfies a stochastic
partial differential equation, first obtained in [175].This is a generalization
of the Fokker-Plank equation (1.1.67) to the case of observation. In what
follows, we briefly derive this equation.

121f P , Q , H , and G in (1.5.6) are row matrices, then B ; ( t , f , z ) and BG(t,G,y)are


scalar functions.
Synthesis Problems for Control Systems 79

According to [175], we introduce the following notation. We denote the


collection of random functions (Z(t), c ( t ) , x(t), y(t)) that forms a Markov
process by a single letter z(t) and assume that the dimension of the vector z
is equal to n. We assume that the unobservable components of the vector z
are numbered from 1 to m and the observable components are numbered
from m + 1 to n. For convenience, we write x, (1 5 a 5 m) for unobservable
components and y,, ( m + 1 < - p < - n ) for observable ones. We also use
three groups of indices: the indices i , k,l , .. . vary from 1 to n; the indices
a,p,r,... f r o m 1 t o m ; a n d p , a , r,... f r o m m + l t o n .
By assumption, the local characteristics of the diffusion process z(t) are
given a priori:
1
lim -E[Azk
A-0 A
I ~ ( t =) z] = A k ( t , z ) ; Azk = zk(t + A ) - zk(t);

It is required to obtain an equation for the a posteriori probability p(t, xt) =


p(xt I y;), provided that (1.5.14) and the results of observation of y; are
known.
Using the transition probability pA(zt+a I zt) = p a ( x t + a , yt+a I xt, yt)
and the probability multiplication theorem, we obtain

Integrating (1.5.15) with respect to xt and taking into account (1.1.50), we


obtain

If we write the left-hand side of (1.5.16) in the form

then we can write (1.5.16) as follows:

Integrating (1.5.16) with respect to x t + a , we obtain


80 Chapter I

Substituting (1.5.18) into (1.5.17) and taking into account the fact that the
equality

is valid, since the arguments are continuous, we obtain

~ ~

(1.5.19)
Equation (1.5.19) for partially observable Markov processes plays the
same role as the Markov (Smoluchovski) equation (1.1.53) for the complete
observation. To derive the differential equation for the a posteriori density
p(t, z t ) from (1.5.19), we use the same method as for the derivation of the
Fokker-Planck equation in 5 1.1 (see (1.1.59)-(1.1.64)).
Let us introduce two characteristic functions of random increments Az,,
a = 1,..., m, and Azk, k = 1,...,n,13

The transition probability can be expressed in terms of inverse Fourier


transforms as follows:

131n (1.5.20) and (1.5.21), as usual, j = &andithe sum is taken over the repeated
indices:
Synthesis Problems for Control Systems 81

Using the expansion of In &(u, zt) in the Maclaurin series, we can write

-
k , e, . . .,T = l
S
K k A , . . . , T ] ( j u ) ( j u ). . . ( u ) }

where K s [Azk,. . . , Az,] denotes the s-order correlation between the com-
(1.5.24)

+
ponents of the vector of increments A z = z(t A) - z ( t ) of the Markov
process z(t). Using well-known relations between correlations and initial
moments [173, 1811, we see that (1.5.14) gives the following representation
for (1.5.24):
A
@2(u1,.. . ,u,, i t ) = exp - k- B ~ , u ~ u ,
[ ~ j ~ k U
2
+ o(A)] (1.5.25)

(for brevity, in (1.5.25) and in the following we do not write arguments of


Ak and B k l , namely, Ak = & ( t , zt) and Bke = Bke(t,zt)).
Comparing (1.5.22) and (1.5.23), we see that

- ..,
@ l ( ~ l , Urn, Z t , Y ~ + A=) @Irn-" eXP [ - juu(yut+n - yet )]

After the substitution of (1.5.25), we can calculate the integral (1.5.26)


explicitly. As the result of integration, for the characteristic function we
obtain the formula14

where

1 4 ~ obtain
o (1.5.27) and (1.5.28), it sufficesto use the well-known formula [67]

which holds for real symmetric positive definite matrices B = [IBkell;and any constants
mk and me.
82 Chapter I

and K is a constant that does not influence the final result of calculations.
Note that we calculated (1.5.26) under the assumption that the matrix
IIBaPIIL+lis nondegenerate and we used the notation ]lF,,l = IIB,,ll-l.
Since the exponent in (1.5.27) is small (- A), we replace the exponential
+ +
by the first three terms of the Maclaurin series ex = 1 x x 2 / 2 , truncate
the terms whose order with respect to A is larger than 1, and obtain

exp [ ~ ( u l.,.,. urn,~ ty t,+ a ) A + o ( A ) ] = 1 + L ( u l , . . ., urn,Ztr Y~+A)A

In (1.5.29) we used the relation FupBpr = B a p F p r = dur, where d,, is the


Kronecker delta, and the formula

which follows from the properties of Wiener processes and is a multidi-


mensional generalization of formula (1.2.8) (for details, see Lemma 2.2 in
[175]).Substituting (1.5.28) into (1.5.29) and collecting the similar terms,
we obtain

exp [ L ( u l , .. ., urn,~ ty t,+ a ) A + o ( A ) ] = 1 + ( j u c y ) ( A Q A+ BaoFopAYp)

Using (1.5.23), (1.5.27), and (1.5.31), we calculate the numerator of the


fraction on the right-hand side in (1.5.19):

-
-
[1 + j u , ( A , A + B,, FupAyp)

+ -21( 3. u , ) ( j ~ ~ ) B +, ~~ , ~ , , ~ y , ] p ( tx, t ) d u l .. .durndx1t.. .dzmt

Taking into account the formulas (see (1.1.60)-(1.1.64))


Synthesis Problems for Control Systems

we obtain the numerator in (1.5.19) (we omit the constant K , since K and
a similar constant in the denominator of (1.5.19) are canceled) :

(in (1.5.33) ( t , x t + a , yt) are the arguments of the coefficients Ak, Bke,
and Fop).
The denominator of the right-hand expression in (1.5.19) differs from
the numerator by integration with respect to xt+a. We perform this inte-
gration, take into account the fact that the normalization condition for the
probability density p(t, x t + a ) implies

and from (1.5.33) obtain the following expression (without K ) for the de-
nominator in (1.5.19):

where Eps (.) denotes the a posteriori averaging s ( ~ ) ~ x)


( t dx.
,
We assume that the elements of the matrix BUp (and of F,,, respec-
tively) are independent of unobservable components x and take into account
(1.5.30). Then we can write
84 Chapter I

Multiplying (1.5.33) by (1.5.35) and substituting the result into (1.5.19),


we obtain

As A -+ 0, the terms denoted by o(A) in (1.5.36) disappear, and the


finite increments become differentials. In this case, according to $1.2, it
is necessary to point out in which sense stochastic differentials are under-
stood, since the differential equation obtained is stochastic (it contains the
differential of the Markov process dyp(t)).
Comparing Eq. (1.5.36) (as A + 0) with the stochastic equation (1.2.3),
we see that now the a posteriori probability density p(t, x) in (1.5.36) plays
the role of the random function x(t) in (1.2.3), and the vector-function

plays the role of the function a ( t , x) in (1.2.3). In this case, as it follows


from the derivation of Eq. (1.5.36), the functions (1.5.37) relative to time t
+
are multiplied by the increments Ay, = ~ , ( t A) - ~ , ( t )(see (1.5.14)).
Therefore, as A -+ 0, (1.5.36) implies the following differential equation in
the Ito form:

Equation (1.5.38) is the desired equation for the a posteriori probability


density of the unobservable components x(t) of the partially observable
Markov process z(t).
For the a posteriori density p(t, x), we obtain the symmetrized equation
(equivalent to (1.5.38))
Synthesis Problems for Control Systems 85

- Eps (B,, a(FpuAu)


ax,
+ B,, a(FpuAu)
ayT
+ F,,A, A,)] p
(1.5.39)

by using coupling formulas between stochastic differentials and integrals


(see $1.2);'~ in (1.5.39) y,, = y,,(t) = dy,(t)/dt denotes the formal time-
derivative of the Markov process y, (t).
Equation (1.5.39) is a generalization of the Fokker-Planck equation to
the observation case. It should be noted that if some transformations of ran-
dom functions p(t, x) are necessary (see formulas (1.5.41)-(1.5.44) below),
then it is more convenient to use Eq. (1.5.39), although it is more cumber-
some than a similar Ito equation (1.5.38), since (see $1.2) the symmetrized
form allows us to treat random functions (even such singular functions as
white noises) according to the same formal rules as for deterministic and
sufficiently smooth functions.
We can show [I321 that fiup[dOyp(t) - EpsApdt]16is the differential of
the standard Wiener process drl(t) studied in $1.2. Therefore, in view of
Eq. (1.5.38), the already cited Markov property of the set ( y t , ~ ( tx)) , can be
obtained by not completely rigorous but sufficiently illustrative arguments.
+
Indeed, since the increments [ ~ , ( t A) - ~ , ( t ) ]of the stochastic pro-
cesses ~ ( t =) &[y,(t) - EpsAp(7,Y(T)) dt] in (1.5.38) are mutually
+
independent, the future values of the a posteriori probability p(t A , z ) are
completely determined by (zt, yt, p(t, 2 ) ) . Since the vector xt is unobserv-
able, then the probabilities ~ (+tA , x) of future values are determined by
(yt,p(t, x)) and the probability of the current value xt, that is, by the a pos-
teriori density p(t, z ) contained in (yt,p(t, x)). On the other hand, since
the process z(t) is of Markov character, the probabilities of future values of
the observable process yt+a are completely determined by its current state
zt = (xt, yt), that is, by the same set (yt, p(t, x ) ) , since xt is unobservable.
This implies that ( y ( t ) , ~ ( tz,) ) is a Markov process.
Now let us recall that Eqs. (1.5.38) and (1.5.39) were derived under
the assumption that the control u(t) in (1.5.3) is a known deterministic
function of time. However, if the control u(t) is given by the functional
(1.5.2) (in the new notation, following (1.5.14), this functional has the form
~ ( t= ) ut = y ( t , yh)), then this fact does not affect the Markov properties
of (yt,p(t, x)), since it is assumed that ( y t , ~ ( tx)) , is determined by the

''Here we do not show in detail how to transform the Ito equation (1.5.38) to the
symmetrized form (1.5.39); the reader is strongly recommended to do this useful exercise
on his own.
16fl,,p denotes an element of the matrix @ which is the square root of the rna-
trix F; since the matrix llBpoll is nondegenerate, the inverse F = IIBPall-', as well
as IIBPall,is p s i t i v e definite and symmetric; therefore, the matrix square root fi
exists [62], and the matrix 0, as well as F, is positive definite and symmetric.
86 Chapter I

entire past history of the observations y; = {y(s): 0 5 s 5 t). Thus, for


a given state of (yt,p(t,x)) and any chosen functional cp in (1.5.2), the
control ut is a known vector on which the functions a ( t , x, u) in (1.5.3) and
the functions A, and A, in (1.5.38) depend as on a parameter. Hence it
follows that Eqs. (1.5.38) and (1.5.39) are also valid for controlled processes
(provided that the control is given in the form (1.5.2)).
Now let us return to the synthesis problem and the dynamic program-
ming approach. Describing the state of a controlled system a t time t by
(1.5.8) or, briefly, by (yt, p(t, x)) (recall that after (1.5.14) we introduced
ct
the new notation: Zt, + yt and xt, yt -+xt), we can write the loss func-
tion (1.5.7) as F ( t , y;) = ~ ( tyt,p(t,x)).
, Using the Markov property of
(yt,p(t, x)), we can write the basic equation of the optimality principle for
the function F (t, yt, p(t, 2)) as follows:

Generally speaking, by passing to the limit as A + 0 in (1.5.40) and us-


ing (1.5.14) and (1.5.38) (or (1.5.39)), we can obtain the Bellman differen-
tial equations by analogy with s1.4. However, the equation obtained in this
way contains the functional derivatives GF/Sp(t, x), S2F/Sp(t, x)Sp(t, T ),
etc.; usually it is difficult to solve this equation (as pointed out in $1.4
and 81.5, even the solution of "usual" Bellman partial differential equa-
tions is a rather complicated problem). Therefore, it is more convenient in
practice, instead of the a posteriori density p(t, x), to use some equivalent
set of parameters as arguments of the function F. We show how to do this.
Assume that the a posteriori probability density p(t, x) is a unimodal
function of a vector variable x for all t E 10, TI. By the vector mt = m ( t ) we
denote the maximum of the a posteriori density p(t, x) a t time t. Expanding
lnp(t, x) in the Taylor series with respect to x around the point m(t), we
obtain the following representation of the a posteriori density p(t, x):

x
2
ff,p,...,C=1
amp...c(t)(x, - ma(t)) . . . (xc - mc (t))} (1.5.41)
Synthesis Problems for Control Systems 87

(the scalar function a(t) in (1.5.41) is determined by the normalization


condition Jp(t, x) dx = 1).
Using (1.5.41), we readily obtain the system of equations for the pa-
rameters (m, (t), aap(t), aapy (t), . . .) instead of the symmetrized equation
(1.5.39). To this end, we rewrite (1.5.39) in the more compact form

Next we replace the functions 7f, and @(x,y) by their Taylor series,17
substitute (1.5.41) into (1.5.42), and successively set the coefficients of equal
powers of (2, - m,), (x, - m,)(xp - m p ) , . . . on the left- and right-hand
sides of (1.5.42) equal to each other; thus we obtain the following system
of ordinary differential equations for m, (t) ,asp (t) , aapy(t) , . . . :

In (1.5.43) the dot over a variable indicates, as usual, the time derivative
(mp = dmp(t)/dt). Moreover, in (1.5.43) we assume that Bffpis inde-
pendent of x and omit the arguments of the functions x,
@, and of their
derivatives; we assume - that-the values of these functions are taken a t the
point x = m, that is, Ap = Ap(t,m, y), d@/dx, = d@(t,m,y)/dx,, etc.
It follows from (1.5.41) that the set of the parameters m,(t), aap(t), . . .
uniquely determines the a posteriori probability density p(t, x) a t time t.

1 7 ~ h functions
e A, and 6(x, y) are expanded with respect to a: in a neighborhood
of the point m ( t ) .
88 Chapter I

Thus we can use these parameters as new arguments of the loss function,
since F (t, yt ,p(t, 2 ) ) = F (t, yt ,mat, a,pt, . . .). However, in the general case,
system (1.5.43) is of infinite order, and therefore, if we use the new sufficient
coordinates (yt, mat, a,pt, . . .) instead of the old coordinates (yt,p(t, x)),
then we do not gain considerable advantage in solving special problems.
Nevertheless, there is an important class of problems in which the a poste-
riori probability density (1.5.41) is Gaussian (conditional Gaussian Markov
processes are studied in detail in [131, 1321). We have such processes if [175]
(1) the elements of the matrix are constant numbers; (2) the functions
-
A, linearly depend on x; (3) the function @(x,y) depends on x linearly
and quadratically; (4) the initial probability density (the a pn'ori proba-
bility density of unobservable components before the observation) p(O, x)
is Gaussian. Under these conditions, we have a,py = a,pya = - . = 0 in a

(1.5.41) and (1.5.43), and system (1.5.43) is closed and of finite dimension:

Now let us consider the synthesis problem corresponding to this case. To


avoid cumbersome formulas, we deal with a simplified version of problem
(1.5.3)-(1.5.6). Namely, we assume that the input y(t) is absent and the
system shown in Fig. 3 does not contain Block 1. Suppose that the plant
P is described by the system linear with respect to the output (controlled)
variables
+ +
2 = G(t, u)x b(t, u) u(t)[(t), (1.5.45)
where x = x(t) is an m-vector of output variables, G(t, u) and u(t) are given
m x m matrices, b(t, u) is a given m-vector-function, and [(t) is an m-vector
of random perturbations of the standard white noise type (1.1.34). More
explicitly, the vector-matrix equation (1.5.45) has the form

We observe the stochastic process

where Z and 7 are k-vectors, P and Q are k x n and k x k matrices, the


matrix Q(t) is nondegenerate for all 0 t < <
T, and rl(t) is the standard
white noise (1.1.34) independent of [(t).
Synthesis Problems for Control Systems 89

Under the assumption that the admissible control satisfies condition


(1.5.4), it is required to find the optimal control u , ( t ) = cp(t,Zi) such
that the cost functional

attains its minimum value.


We write

Then ( x ( t ) ,y ( t ) )is a Markov stochastic process, and it follows from rela-


tions (1.5.45), (1.5.46), and (1.5.48) that the characteristics (1.5.14) of this
process have the form

In (1.5.49) the indices a ,P, y take values from 1 to m and the indices p, u, r
from m + 1 to m+ k .
In this case, it follows from (1.5.49) and (1.5.39) that in (1.5.42) we have

( F o pis an element of the matrix IIB,PII-l = [ Q ( t ) Q T( t ) ] - l ) It


. follows from
(1.5.49) and (1.5.50) that in this case system (1.5.43) has the form (1.5.44).
Substituting (1.5.49) and (1.5.50) into (1.5.44), we obtain the following
system of equations for the parameters m,(t), a a p ( t ) ,a l p = 1 , . . . , m, of
the a posteriori density:

System (1.5.51) can be written in a more compact vector-matrix form. So,


introducing the matrix A = IlaaPIIy and taking into account the fact that
yp = Z p according to (1.5.48), we see that (1.5.51) implies

A h =A[G(~ ,
u)m+ +
b(t, u ) ] ~ ~ ( t ) [ Q ( t ) & ~ ( t ) ] -~l (( tZ) ,- ) ,
A = - A C ( ~ ) C ~ ( ~G) ~A (- ~ , u ) AAG- ( ~ , u )
+pT(t)[~(t)~T(t)l-lp(t).
90 Chapter I

Now we note that the right-hand sides of (1.5.52) do not explicitly depend
on y(t), and moreover, the cost functional (1.5.47) is independent of the
observable process Z(t). Therefore, in this case, the current values of the
vector yt do not belong to the sufficient coordinates of problem (1.5.45)-
(1.5.47), which are the current values of the components of the vector mt
and of the elements of the matrix At.
If instead of the matrix A we consider the matrix D = A-l of a posteriori
covariances, then, multiplying the first equation in (1.5.52) by the matrix
D from the left and the second equation in (1.5.52) from the left and from
the right and taking into account the formulas

we obtain, instead of (1.5.52), the relations

Equations (1.5.53) are well-known equations of the K a l m a n filter [I, 5,


58, 79, 1321. As is known, the Kalman filter is a device for optimal filtering
of the "useful signal" x ( t ) that is observed on the background of a random
noise. In this case, the vector m(t) is an optimal18 estimate of current
values of the components of the unobservable stochastic process x(t) that
is the result of observation of ZJ = {Z(s): 0 5 s 5 t}, provided that the
observation process is given by (1.5.46). The matrix D(t) that satisfies
the second (matrix) equation in (1.5.53) characterizes the accuracy of the
estimation of unobservable components of the process x(t) by the vector
m(t) (see [I, 5, 791).
Equations (1.5.53) play the role of "equations of motion" for the con-
trolled system in the space of sufficient coordinates. Since the process
Q-'(t)(Z(t) - ~ ( t ) m is
) a white noise, the first equation in (1.5.53) is a
stochastic equation of type (1.5.45), and the second equation is a usual
differential (matrix) equation. Therefore, the Bellman differential equation
for the loss function F(t,mt, D t ) can be derived by a technique similar to
that used in $1.4 to derive Eq. (1.4.21) for the function (1.4.5).

''The optimality of the estimate m ( t ) is understood as the minimum of the mean


square deviation Elx(t) - m(t)I2;as is known [167, 175, 1811, in the Gaussian case,
m ( t )coincides with the maximum point of the a posteriori probability density p ( t , x) =
p ( x ( t ) = x 1 lot).
Synthesis Problems for Control Systems 91

After similar calculations, we obtain the Bellman equation of the follow-


ing form (see also [34, 1751) for the function F ( t , m, D) in problem (1.5.45)-
(1.5.47):

+ da DF (aaT- DPT(QQT)- 'PD)]


-

where a F / a m is an m-vector with components dF/dm,, a = 1,...,m;


d 2 ~ / d m d m Tis the m x m matrix of the derivatives d2F/dm,dmp, a, /3 =
1 , . . .,m; dF/dD is the m x m matrix of the partial derivatives dF/dDap,
a,/3 = 1,.. .,m; and c(m, D, u) denotes the a posteriori mean of the penalty
function c(x, u) in the functional (1.5.47), that is,

= [ ( 2 ~det
) ~D ] - ' / ~ / C(X, u ) e-(~-m)~D-'(x-m)/2
dx.
(1.5.55)

The loss function F ( t , m, D) satisfies (1.5.54) for 0 5 t < T. At the


terminal instant of time t = T, this function is determined by the relation

where, by analogy with (1.5.55), Eps(.) denotes integration of with (a)

Gaussian density. We see that (1.5.56) is a generalization of condition


(1.4.22) to the case of indirect observations.
As usual, by solving Eq. (1.5.54) with the additional condition (1.5.56),
we simultaneously obtain the optimal control u,(t) = cpl(t, m(t), D(t)) (see
81.3 and 81.4). Thus the desired algorithm of optimal control in the func-
tional form u,(t) = cp(t, Z i ) for problem (1.5.45)-(1.5.47) is the superpo-
sition of the two operations: the optimal filtering of the observed process
< <
(Z(t) : 0 t T) by means of the Kalman filter (1.5.53) and the formation
of the current control u, (t) = cpl (t, m(t), D(t)) .
This situation is typical of other problems with indirect observations.
Therefore, in the general case of the servomechanism shown in Fig. 3, the
Chapter I

controller C actually consists of two blocks that are functionally different


(see Fig. 12): the sufficient coordinate block SC that models the corre-
sponding filter and the decision block D whose structure is determined by
the solution of the Bellman equation.
Some examples of other Bellman equations obtained by using sufficient
coordinates, as well as solutions of these equations, will be considered later
in 33.3, $4.2, $5.4, and $6.1.
CHAPTER I1

EXACT METHODS FOR SYNTHESIS PROBLEMS

Exact solutions to synthesis problems of optimal control are of deep theo-


retical and practical interest. However, exact solutions can be obtained only
in some special cases. The point is that exact methods are characterized by
rather strict restrictions on the assumptions of the synthesis problem, but
these assumptions are seldom satisfied in actual practice. It is well known
that, for instance, the Bellman equation can be solved exactly under the
following assumptions: (1) the dynamic equations of the plant are linear,
(2) the optimality criterion of the form (1.1.11) or (1.4.3) contains only
quadratic penalty functions, (3) no restrictions are imposed on the control
and on the phase coordinates, (4) random actions (if any) on the system
are Gaussian Markov processes or processes of the white noise type. The
synthesis problems satisfying (1)-(4) are called linear-quadratic problems of
optimal control. An extensive literature is devoted to these problems [3, 5,
18, 24, 72, 112, 122, 128, 132, 1681. In the present chapter we restrict our
consideration to an outline of methods for solving such problems ($2.1) and
consider in more detail less known results concerning the solution of some
special synthesis problems with bounded controls (332.2-2.4).

3 2.1. Linear-quadratic problems


of optimal control (LQ-problems)
2.1.1. First, let us consider an elementary optimalstabilization problem
of a first-order system perturbed by a Gaussian white noise (see Fig. 13).
Suppose that the plant P is described by a linear scalar equation of the
form
+
& = ax bu$ &[(t), (2.1.1)

where a , b, and Y are given constants ( Y > 0) and [(t) is the standard
white noise (1.1.31). The performance of this system is estimated by the
following functional of the form (1.4.3) with quadratic penalty functions:
Chapter I1

(here c, e l , and h are given positive constants). We do not impose any


restrictions on the control u and the phase variable z.
Problem (2.1.1), (2.1.2) is a stochastic generalization of the linear-quad-
ratic problem (1.3.24), (1.3.25) considered in $1.3 and a special case of a
more general problem (1.4.2)-(1.4.4). Since the stabilization system shown
in Fig. 13 is a specific case of the servomechanism shown in Fig. 8, the
Bellman equation for problem (2.1. I ) , (2.1.2),

can be obtained from (1.4.21) by setting

A9 = BY = 0, Bx = v, c(z, y, u) = cx2 + hu , Ax = a x + bu.

In (2.1.3) the loss function F = F ( t , x), determined, as usual, by

F ( t , x) = min E +
[cx2(r) hu2(r)]d r + c 1 z 2 ( ~I)x(t) = x
.(TI
t<s<T

satisfies Eq. (2.1.3) in the strip IIT = (0 5 t < T , -co < x < co} and
becomes a given quadratic function,

for t = T. Condition (2.1.5) readily follows from the definition of the loss
function (2.1.4) or from formula (1.4.22) with +(x, y) = c1x2.
The optimal control u, in the form (1.4.25), which minimizes the ex-
pression in the square brackets in (2.1.3), is determined by the condition
Exact Methods for Synthesis Problems 95

Substituting, instead of u , the control u, into the expression in the square


brackets in (2.1.3) and omitting the symbol "min", we rewrite Eq. (2.1.3)
in the form

(Eq. (2.1.7) is just Eq. (1.4.26) for problem (2.1.1), (2.1.2)).


Now to solve the synthesis problem, it remains to find the solution F ( t , x )
that satisfies Eq. (2.1.7) in the strip IIT and is a continuous continuation
of (2.1.5) as t + T . We shall seek such a solution in the form

where p ( t ) and r ( t ) are some functions of time. We choose these functions so


that the solution of the form (2.1.8) satisfy (2.1.5) and (2.1.7). Substituting
(2.1.8) into (2.1.7) and setting the coefficient of x 2 equal to zero, as well
as the terms independent of x , we obtain the following equations for the
unknown functions p ( t ) and r ( t ) :

It follows from (2.1.5) that the solutions p ( t ) and r ( t )of (2.1.9) and (2.1.10)
attain the values
p(T) =cl, r(T)=0 (2.1.11)
a t the terminal time t = T .
The system of ordinary differential equations (2.1.9), (2.1.10) with addi-
tional conditions (2.1.11) can readily be integrated. As a result, we obtain
the following expressions for the functions p ( t ) and r ( t ) :

where the constants p, D l , D 2 , D 3 , and D 4 are related to the parameters


of problem (2.1.1), (2.1.2) as follows:
96 Chapter I1

From (2.1.6), (2.1.8), and (2.1.12), we obtain the optimal control law

which is the solution of the synthesis problem for the optimal stabilization
system in Fig. 13. It follows from (2.1.14) that in this case the controller C
in Fig. 13 is a linear amplifier in the variable x with variable amplifica-
tion factor j?(t). In the sequel, we indicate such amplifiers by a special
mark ">." Therefore, the optimal system for problem (2.1.1), (2.1.2) can
be represented as the block diagram shown in Fig. 14.

Obviously, the minimum value I[u,] of the optimality criterion (2.1.2)


with control (2.1.14) and the initial state x(0) = x is equal to F ( 0 , x). From
(2.1.8), (2.1.12), and (2.1.13), we have

To complete the study of problem (2.1.1), (2.1.2), it remains to prove


that the solution (2.1.12)-(2.1.15) of the synthesis problem is unique. It
follows from our discussion that the problem of uniqueness of (2.1.12)-
(2.1.15) is equivalent to the uniqueness of the solution (2.1.8) of Eq. (2.1.7).
The general theory of quasilinear parabolic equations [I241 implies that
Eq. (2.1.7) with additional condition (2.1.5) has a unique solution in the
class of functions F (t, x) whose growth as lx 1 -+ m does not exceed that of
any finite power of 1x1. On the other hand, an analysis of properties of the
loss function (2.1.4) performed in [I131 showed that, for each t E [0, T] and
x E R1, the function (2.1.4) satisfies the estimate
Exact Methods for Synthesis Problems 97

where N ( T ) is bounded for any finite T . Therefore, the function (2.1.8) is


a unique solution of Eq. (2.1.7), corresponding to the problem considered,
and the synthesis problem has no solutions other than (2.1.12)-(2.1.15).
REMARK.The optimal control (2.1.14) is independent of the param-
eter v , that is, of the intensity of random actions on the plant P, and
coincides with the optimal control algorithm (1.3.33), (1.3.34) for the de-
terministic problem (1.3.24), (1.3.25). Such a situation is typical of many
other linear-quadratic problems of optimal control with perturbations in
the form of a Gaussian white noise.
The exact formulas (2.1.12)-(2.1.15) allow us to examine the process of
relaxation of stationary operating conditions (see $1.4, Section 1.4.2) for
the stabilization system in question. To this end, let us consider a special
case of problem (2.1.1) in which the terminal state x ( T ) is not penalized
(cl = 0 ) . In this case, formulas (2.1.12) and (2.1.13) read

If the operating time is equal to T > tl = 3 / 2 0 , then the functions p ( t ) and


r ( t ) determined by (2.1.16) and (2.1.17) have the form shown in Fig. 15.

The functions ~ ( tand ) r ( t ) are characterized by the existence of two time


intervals [O, T - t l ] and [ T - t l , T ]on which p ( t ) and r ( t ) behave in different
98 Chapter I1

ways. The first interval [0, T - tl] corresponds to the stationary operating
mode, that is, p(t) Y c / ( P - a ) = const for t E [0, T - tl], the function
r ( t ) linearly decreases as t grows, and on this interval the rate of decrease
in r ( t ) is constant and equal to vc/(,L?- a). The terminal interval [T- t l , T]
is essentially nonstationary. It follows from (2.1.16) and (2.1.17) that the
length of this nonstationary interval is of the order of 3/20. Obviously,
in the case where this nonstationary interval is a small part of the entire
operating time [0, TI, the control performance is little affected if, instead of
the exact optimal control (2.1.14), we use the control

that corresponds to the stationary operating mode. It follows from (2.1.18)


that for large T the controller C in Fig. 13 is a linear amplifier with constant
amplification factor, whose technical realization is much simpler than that
of the nonstationary control block described by (2.1.14) and (2.1.12).
Formulas (2.1.16) and (2.1.17) show that, for large values of T - t , the

F ( t ,x ) -
loss function (2.1.8) satisfies the approximate relation
C
-x2
P-a
+- 1/C

0-a
(T - t ) .

Comparing (2.1.19) and (1.4.29), we see that in this case the value y of
stationary mean losses per unit time, introduced in $1.4, is equal to

that is, coincides with the rate of decrease in the function r ( t ) on the
stationary interval [0, T - tl] (Fig. 15). In this case, the stationary loss
function defined by (1.4.29) is equal to

It should be noted that to calculate y and the function f ( x ) , we need


not have exact formulas for p(t) and r ( t ) in (2.1.8). It suffices to use the
corresponding stationary Bellman equation (1.4.30), which in this cases has
the form

and to substitute the desired solution in the form f (x) = px2 into (2.1.22).
We obtain the numbers p and y, just as in the nonstationary case, by setting
Exact Methods for Synthesis Problems 99

the coefficients of x2 and the free terms on the left- and right-hand sides in
(2.1.22) equal to each other.
We also note that if a t least one of the parameters a , 6, v , c , and h of
problem (2.1.1), (2.1.2) depends on time, then, in general, there does not
exist any stationary operating mode. In this case, one cannot obtain finite
formulas for the functions p(t) and r ( t ) in (2.1.8), since Eq. (2.1.9) is a
Riccati equation and, in general, cannot be integrated exactly. Therefore,
if the problem has variable parameters, the solution is constructed, as a
rule, by using numerical integration methods.
2.1.2. All of the preceding can readily be generalized to multidimen-
sional problems of optimal stabilization. Let us consider the system shown
in Fig. 13 whose plant P is described by a linear vector-matrix equation of
the form
+ +
k = A(t)x B ( t ) u u(t)J(t), (2.1.23)
where x = x(t) E R, is a n n-vector-column of phase variables, u E R, is
an r-vector of controlling actions, and J ( t ) E R
, is an m-vector of random
perturbations of a Gaussian white noise type with characteristics (1.1.34).
The dimensions of the matrices A, B , and u are related to the dimensions
of the corresponding vectors and are equal to n x n , n x r, and n x m ,
respectively. The elements of these matrices are continuous functions of
time1 defined for all t from the interval [0, TI on which the controlled system
is considered.
For the optimality criterion, we take a quadratic functional of the form

Here Q and G ( t ) are symmetric nonnegative definite n x n matrices and


the symmetric r x r matrix H ( t ) is positive definite for each t E [0, TI.
Just as (2.1.3), the Bellman equation for problem (2.1.23), (2.1.24) fol-
lows from (1.4.21) if we set AY = BY = 0, Bx = u(t)uT(t), Ax = A(t)x +
+
B ( t ) u , and c(x, y, u) = xTGx U ~ H U . Thus we obtain

aF
+ xT AT (t) aF 1 a2F
-
at ax + -2 SP u ( t ) a T (t)-
-
axaxT

'As was shown in [156],it suffices to assume that the elements of the matrices A ( t ) ,
B ( t ) ,and a ( t ) are measurable and bounded.
100 Chapter I1

In this case, the additional condition on the loss function (1.4.22) has
the form
F ( T , x) = xT&x.
The further considerations leading to the solution of the synthesis problem
are similar to those in the one-dimensional case. Calculating the minimum
value of the expression in the square brackets in (2.1.25), we obtain the

which is a vector analog of formula (2.1.6). Substituting the expression

dF
-
dt
+x ..
obtained for u, into (2.1.25), we arrive a t the equation

dF
A (t) -
ax
+ 21 spa ( t ) a T(t)-
-
d2F
dxdxT

We seek the solution of (2.1.27) as the following quadratic form with respect
to the phase variables:

Substituting (2.1.28) into (2.1.27) and setting the coefficients of the qua-
dratic (with respect to x) terms and the free terms on the left-hand side in
(2.1.27) equal to zero, we obtain the following system of differential equa-
tions for the unknown matrix P ( t ) and the scalar function r(t):

If system (2.1.29) is solved, then the optimal solution of the synthesis


problem has the form

which follows from (2.1.26) and (2.1.28). Formula (2.1.30) shows that the
controller C in the optimal system in Fig. 13 is a linear amplifier with n
inputs and r outputs and variable amplification factors.
Let us briefly discuss the possibilities of solving system (2.1.29). The
existence and uniqueness of the nonnegative definite matrix P ( t ) satisfying
the matrix-valued Riccati equation (2.1.29) are proved in [72] under the
above assumptions on the properties of the matrices A(t), B(t), G(t), H ( t ) ,
Exact Methods for Synthesis Problems 101

and Q. One can obtain explicit formulas for elements of the matrix P ( t )
only by numerical r n e t h o d ~ ,which
~ is a rather complicated problem for
large dimensions of the phase vector x.
In the special case of the zero matrix G(t) 0, the solution of the matrix
equation (2.1.29) has the form [I, 1321

P (t) = xT(T, t ) E
[
>
Here X ( t , s ) , t s, denotes the fundamental matrix of system (2.1.23);
sometimes this matrix is also called the Cauchy matrix. The properties of
the fundamental matrix are described by the relations

One can construct the matrix X ( t , s ) if the so-called integral matrix Z ( t )


of system (2.1.23) is known. According to [ I l l ] , a square n x n matrix Z ( t )
is called the integral matrix of system (2.1.23) if its columns consist of any
n linearly independent solutions of the homogeneous system j: = A(t)x.
If the matrix Z ( t ) is known, then the fundamental matrix X ( t , s ) has the
form
X(t, s) = z(~)z-l(s). (2.1.33)
One can readily see that the matrix (2.1.33) satisfies conditions (2.1.32).
The fundamental matrix can readily be calculated if and only if the
elements of the matrix A(t) in (2.1.23) are time-independent, that is, if
A(t) A = const. In this case, we have

and the exponential matrix can be expressed in the standard way [62] either
via the Lagrange-Silvester interpolation polynomial (in the case of simple
eigenvalues of the matrix A) or via the generalized interpolation polynomial
(in the case of multiple eigenvalues and not simple elementary divisors of
the matrix A).
If the matrix A is time-varying, the construction of the fundamental
matrix (2.1.33) becomes more complicated and requires, as a rule, the use
of numerical integration methods.
'There also exist approximate analytic methods for calculating the matrices P ( t )
[I, 721. However, for matrices P ( t ) of larger dimensions, these methods meet serious
computational difficulties.
102 Chapter I1

2.1.3. The results obtained by solving the basic linear-quadratic prob-


lem (2.1.23), (2.1.24) can readily be generalized to more general statements
of the optimal control problem. Here we only list the basic lines of these
generalizations; for a detailed discussion of this subject see [ I , 5, 34, 58, 72,
122, 1321.
First of all, note that the synthesis problem (2.1.23), (2.1.24) admits a n
exact solution even if there are noises in the feedback circuit, that is, if
instead of exact values of the phase variables x(t), the controller C (see
Fig. 13) receives distorted information of the form

where N ( t ) and uo(t) are given matrices and ~ ( t is) either a stochastic
process of the white noise type (1.1.34) or a Gaussian Markov process. In
this case, the optimal control algorithm coincides with (2.1.30) in which,
instead of the true values of the current phase vector x = x(t), we use the
vector of current estimates m = m ( t ) of the phase vector. These estimates
are formed with the help of Eqs. (1.5.53) for the Kalman filter, which with
regard to the notation in (2.1.23) and (2.1.34) have the form3

rh, = [A(t) - B ( t ) ~ - ' ( t ) B ~ ( t ) ~ ( t ) ] m

+ DNT (t)[uo(t)u: (t)]-' (Z(t) - ~ ( t ) m ) ,


(2.1.35)

Thus in the case of indirect observation (2.1.34), as shown schematically


in Fig. 16, the optimal controller C consists of the following two functionally
different blocks connected in series: the block K F modeling Eqs. (2.1.35),
(2.1.36) for the Kalman filter and a linear amplifier with matrix amplifi-
cation factor P ( t ) = - ~ - l ( t ) ~ ~ ( t ) ~ This
( t ) .statement follows from the
well-known separation theorem 158, 1931.
The next generalization of the linear-quadratic problem (2.1.23), (2.1.24)
is related to a more general model of the plant. Suppose that, in addition
to additive noises [(t), the plant P is subject to perturbations depending
on the state x and control u and to pulsed random actions with Poisson
distribution of the pulse moments. It is assumed that the behavior of the
plant P is described by the special equation

3 ~ q u a t i o n (2.1.35)
s and (2.1.36) correspond to the case in which ~ ( tin) (2.1.34) is a
white noise.
Exact Methods for Synthesis Problems

where &(t) and &(t) are scalar Gaussian white noises (1.1.31), O ( t ) is an
e-vector of independent Poisson processes with intensity coefficients Xi (i =
1,. . .,l),ul, ~ 2 and
, u3 are given n x n, n x r , and n x & matrices, and
the other variables have the same meaning as in (2.1.23). For the exact
solution of problem (2.1.37), (2.1.24), see 1341.
We also note that sufficiently effective methods have been developed for
infinitely dimensional linear-quadratic problems of optimal control if the
plant P is either a linear dynamic system with distributed parameters or
a quantum-mechanical system. Results corresponding to control of dis-
tributed parameter systems can be found in 1118, 130, 164, 1821 and to
control of quantum systems in [12, 131.
All linear-quadratic problems of optimal control, as well as the above-
treated examples, are characterized by the fact that the loss function sat-
isfying the Bellman equation is of quadratic form (a quadratic functional)
and the optimal control law is a linear function ( a linear operator) with
respect to the phase variables (the state function).
To solve the Bellman equation becomes much more difficult if it is nec-
essary to take into account some restrictions on the domain of admissible
control values in the design of an optimal system. In this case, exact an-
alytical results can be obtained, as a rule, for one-dimensional synthesis
problems (or for problems reducible to one-dimensional problems). Some
of such problems are considered in the following sections of this chapter.

32.2. Problem of optimal tracking a wandering coordinate


Let the input (command) signal y(t) in the servomechanism shown in
Fig. 2 be a scalar Markov process with known characteristics, and let the
plant P be a servomotor whose speed is bounded and whose behavior is
104 Chapter I1

described by the scalar deterministic equation

(here u, determines the admissible range of the motor speed, -u, i < <
u,). Equation (2.2.1) adequately describes the dynamics of a constant
current motor controlled by the voltage on the motor armature under the
assumption that the moment of inertia and the inductance of the armature
winding are small [2, 501. We shall show that various synthesis problems
stated in $1.4 can be solved for such servomechanisms.
2.2.1. Let y(t) be a diffusion Markov process with constant drift a and
diffusion B coefficients. We need to calculate the controller C (see Fig. 2)
that minimizes the integral optimality criterion

where c(x, y) is a given penalty function.


By setting AY = a , BY= B, ax = U, and Bx = 0 in (1.4.21), we readily
obtain the following Bellman equation for problem (2.2. I), (2.2.2):

d~ d~ B ~ ~ F dF
- +a-
at ay
+ --
2ay2
+ c ( x , y) + min
lullurn
(2.2.3)

We shall consider the penalty functions c(x, y) depending only on the error
signal, that is, on the difference z = y - x between the command input y
and the controlled variable x. Obviously, in this case, the loss function
F ( t , x , y) = F ( t , y - x) = F ( t , z ) in (2.2.3) also depends only on z . Instead
of (2.2.3), we have

aF
-+a-
OF
+ Ba2F
--+c(z) + min (2.2.4)
at az 2 az2 IaIIur"

The minimum value of the function in the square brackets in (2.2.4) can be
obtained by using the control4

41n (2.2.5) signa indicates the following scalar function of a scalar variable a :
Exact Methods for Synthesis Problems 105

which requires to switch the servomotor speed instantly from one admissible
limit value to the opposite value when the derivative a F ( t , z ) / a z of the loss
function changes its sign. Control of the form (2.2.5) is naturally called
control of relay type (sometimes, this control is called "bang-bang" control).
Substituting (2.2.5), instead of u, into (2.2.4) and omitting the symbol
"min", we reduce Eq. (2.2.4) to the form

In [113, 1241 it was shown that in the strip ITT = (0 5 t 5 T, -m <


z < m } Eq. (2.2.6) has a unique solution F ( t , z) satisfying the additional
condition F(T,z) = 0 if the penalty function c(z) is continuous and does
not grow too rapidly as [ z [+ m.5 In this case, F ( t , z) is a function twice
continuously differentiable with respect to z and once with respect to t. In
particular, since a F / d z is continuous, the condition

must be satisfied a t the moment of switching the controlling action.


If c(z) > 0 attains its single minimum a t the point z = 0 and does
not decrease monotonically as jzj t oo, then Eq. (2.2.7) has a single root
z,(t) for each t. This root determines the switch point of control. On
different sides of the switch point the derivative a F / d z has opposite signs.
If a F / a z > 0 for z > z,(t) and a F / a z < 0 for z < z r ( t ) , then we can write
the optimal control (2.2.5) in the form

U* (t, Z) = U , sign (z - zr ( t ).) (2.2.8)


Thus, the synthesis problem is reduced to finding the switch point z,(t).
To this end, we need to solve Eq. (2.2.6).
Equation (2.2.6) has a n exact solution if we consider the stationary track-
ing. In this case, the terminal time (the upper limit of integration in (2.2.2))
T t m , and Eq. (2.2.6) for the time-invariant loss function (see (1.4.29))
f ( z ) = lim [ F ( t , z ) -
T+oo

becomes the ordinary differential equation

5More precisely, the condition that there exist positive constants A l , A2, and a
such that 0 < c(r) 5 A1 +Azlrla for all r implies constraints on the growth of the
function c(r).
106 Chapter I1

which can be solved by the matching method [113,171, 1721.


Let us show how to do this. Obviously, the nonlinear equation (2.2.10)
is equivalent t o two linear equations

for the functions fl(z)and f2(z)that determine the function f(z) on each
side of the switch point zr.The unique solutions to linear equations (2.2.11)
are determined by the behavior of fl and f2 as lz[-+ co. It follows from
the statement of the problem that if we take into account the diffusion
"divergence" of the trajectories z(t) for large lzl,then we only obtain small
corrections to the value of the optimality criterion and, in the limit as
I
jz -+ co, the loss functions fl(z)and f2(z)must behave just as the solutions
to Eqs. (2.2.11)with B = 0. The corresponding solutions of Eqs. (2.2.11)
have the form

dfl
Z = E ~
2 O"
- yl exp [ - 2(U\- a)(T- z)] dz,
(2.2.12)
df2 -
- --?/z [.(a) - 71exp [2(u\+ a)(Z- z)] d ~ .
dz B -00

According to (2.2.7),we have the following relations a t the switch point zr:

Substituting (2.2.12)into (2.2.13),considering (2.2.13)as a system of equa-


tions with respect to two unknown variables zr and y, and performing some
simple transformations, we obtain the equation for the switch point:

and the expression for the stationary tracking error


Exact Methods for Synthesis Problems 107

To obtain explicit formulas for the switch points and stationary errors, it
is necessary to choose some special penalty functions c(z). For example, for
the quadratic penalty function c(z) = z2 from (2.2.14), (2.2.15), we have

If c(z) = lzl, then we have

It should be noted that formulas (2.2.16)-(2.2.19) make sense only under


the condition u, > a. This is due to the fact that the stationary operating
mode in the problem considered may exist only for urn > a. Otherwise, (for
a > urn), the mean rate of increase in the command signal ~ ( t is) larger
than the limit admissible rate of change in the output variable x(t), and
the error signal z ( t ) = y(t) - x ( t ) is infinitely growing in time.
If the switch point zr is found, then we know how to control the servo-
motor P under the stationary operating conditions. In this case, according
to (2.2.8), the optimal control has the form

and hence, the block diagram of the optimal servomechanism has the form
shown in Fig. 17.
The optimal system shown in Fig. 17 differs from the optimal systems
considered in the preceding section by the presence of a n essentially nonlin-
ear ideal-relay-type element in the feedback circuit. The other distinction
between the system in Fig. 17 and the optimal linear systems considered
in 32.1 is that the control method depends on the diffusion coefficient B
of the input stochastic process (in $2.1, the optimal control is independent
of the diffusion coefficient^,^ and therefore, the block diagrams of optimal
deterministic and stochastic systems coincide).
If B = 0 (the deterministic case), then it follows from (2.2.16)-(2.2.19)
that the switch point zr = 0 and the stationary tracking error y = 0. These

'This takes place if the current values of the state vector z ( t ) are measured exactly.
Chapter I1

results readily follow from the statement of the problem; to obtain these
results it is not necessary to use the dynamic programming method. Indeed,
if a t some instant of time we have y(t) > x(t) (z(t) > 0), then, obviously, it
is necessary to increase x a t the maximum rate (that is, a t u = +urn) till
the equality y = x (z = 0) is attained. Then the motor can be stopped. In
a similar way, for y < x (z < O), the control u = -urn is switched on and
operates till y becomes equal to x. After y = x is attained and the motor
is stopped, the zero error z remains constant, since there are no random
actions to take the system out of the state z = 0. Therefore, the stationary
tracking "error" is zero.'
If the diffusion is taken into account, then the optimal deterministic
control u p t = urn signz is not optimal. This fact can be explained as
follows. Let u = urn signz, and let B # 0. Then the following two factors
affect the trajectories z(t): they regularly move downwards with velocity
(urn - a ) for z > O and upwards with velocity (urn+ a ) for z < 0 due to the
drift a and control u (see Fig. IS), and they "spread" due to the diffusion B
that is the same for all z. As a result, the stochastic process z(t) becomes
stationary (since the regular displacement towards the t-axis is proportional
to t and the diffusion spreading away from the t-axis is proportional to &)
and all sample paths of z ( t ) are localized in a strip of finite width containing
the t-axis.' However, since the "returning" velocities in the upper and lower
half-planes are different, the stationary trajectories of z(t) are arranged not

' ~ tis assumed that the penalty function c ( z ) attains its minimum value a t r = 0
and c ( 0 ) = 0.
'More precisely: if z ( 0 ) = 0 , then with probability 1 the values of r ( t ) lie in a strip
of finite width for all t 2 0.
Exact Methods for Synthesis Problems

symmetrically with respect to the line z = 0, as is conventionally shown in


Fig. 19. If the penalty function c(z) is a n even function (c(z) = c(-z)),
then, obviously, the stationary tracking error y = Ec(z) (see (1.4.32)) can
be decreased by placing the strip AB (where the trajectories are localized)
symmetrically with respect to the axis z = 0. This effect can be reached
by switching the control u a t some negative value zr rather than a t z = 0.
The exact position of the switch point zr is determined by formulas (2.2.14),
(2.2.16), and (22.2.18).

In conclusion, we note that all results obtained in this section can readily
be generalized to the case where the plant P is subject to additive noncon-
trolled perturbations of the white noise type (see Fig. 10). In this case,
110 Chapter I1

instead of Eq. (2.2.1), we have

where [(t) is the standard white noise (1.1.31) independent of the input
process y(t) and N > 0 is a given number.
In this case, the Bellman equation (2.2.3) acquires the form

a~
-+ a -
a~ + B a 2 +~N a 2 +~c(x, y) +
- - min [az]
a~ = 0,
at ay 2 a y 2 2 a x 2 I U I S U ~

and instead of (2.2.4), we obtain

This equation differs from (2.2.4) only by a coefficient of the diffusion term.
Therefore, all results obtained for systems whose block diagram is shown
in Fig. 2 and whose plant is described by Eq. (2.2.1) are automatically
valid for systems in Fig. 10 with Eq. (2.2.21) if in the original problem the
+
diffusion coefficient B is replaced by B N. In particular, if noises in the
plant are taken into account, then formulas (2.2.16) and (2.2.17) for the
stationary switch point and the stationary tracking error take the form

Note also that the problem studied in this section is equivalent to the
synthesis problem for servomechanism tracking a Wiener process of inten-
sity B with nonsymmetric constraints on admissible controls -urn a 5 +
u 5 u, + a , since both these problems have the same Bellman equation
(2.2.4).
2.2.2. Now let us consider the synthesis problem that differs from the
problem considered in the preceding section only by the optimality crite-
rion. We assume that there is an admissible domain Ill, 12] for the error
) y(t) - x(t) (el and l 2 are given numbers such that el < e 2 ) . We
~ ( t=
assume that if z(t) leaves this domain, then serious undesirable effects may
occur. For example, the system considered or a part of any other more
complicated system containing our system may be destroyed. In this case,
Exact Methods for Synthesis Problems 111

it is natural to look for controls that keep z(t) within the admissible limits
for the maximum possible time.
General problems of calculating the maximum mean time of the first
passage to the boundary were considered in $1.4. In particular, the Bell-
man equation (1.4.40) was obtained. In the scalar case studied here, this
equation has the form

B d2F1 dF1 aFl


--
2 ay2
+ap+
ay m a x u-
.Iium [ I
ax = -1
(Eq. (2.2.24) follows from (1.4.40), (1.4.3 I ) , since AY = a , Ax = U, BY = B,
Bx = 0). Recall that the function F l ( x , y ) in (2.2.24) is equal to the
maximum mean time of the first passage to the boundary of the domain of
admissible phase variables if the initial state of the system is (x, y). In the
case where the domain of admissible values (x, y) is determined by the error
signal z = y - x, the function Fl depends only on the difference Fl(x, y) =
Fl (y - x) = Fl (z) and, instead of the partial differential equation (2.2.24),
we have the following ordinary differential equation for the function Fl(z):

B d2Fl dFl aFi


--
2 dz2
+a-
dz ~ &I=-1.
+ l mu al x~ u[ -u- (2.2.25)

The function F l ( z ) satisfies Eq. (2.2.25) a t the interior points of the domain
[11,12] of admissible errors z. At the boundary points of this domain, Fl
vanishes (see (1.4.41)):

The optimal system can be synthesized by solving Eq. (2.2.25) with the
boundary conditions (2.2.26). Just as in the preceding section, one can see
that the optimal control u,(z) is of relay type and is equal to

U, (z) = -urn sign


(3
Using (2.2.27), we transform Eq. (2.2.25) to the form

The condition of smooth matching (see [113], p. 52) implies that the
solution Fl(z) of Eq. (2.2.28) and the derivatives dF11d.z and d2Fl/dz2 are
Chapter I1

continuous everywhere in the interior of [.el, e2]. Therefore, the switch point
z: is determined by the condition

The same continuity conditions and the boundary conditions (2.2.26),


as well as the "physical" meaning of the function Fl(z), a priori allow us
to estimate the qualitative behavior of the functional dependence Fl(z).
The corresponding curve is shown in Fig. 20. It follows from (2.2.29) that
the switch point corresponds to the maximum value of F l ( z ) . In this case,
Fi(z) < 0 for z > z i , and F i ( z ) > 0 for z < 2 ; . In particular, this implies
that the optimal control (2.2.27) can be written in the form

U* (z) = U , sign(z - zr1),

which is similar t o (2.2.20) and differs only by the position of the switch
point. Thus, in this case, if the applied constant displacement -zr is re-
placed by -zt, then the block diagram of the optimal system coincides with
that in Fig. 15.
The switch point z: can be found by solving Eq. (2.2.28) with the bound-
ary conditions (2.2.26). Just as in the preceding section, we replace the
nonlinear equation (2.2.28) by the following pair of linear equations for the
Exact Methods for Synthesis Problems 113

function ~ : ( z ) ,z: < z < e2, and the function F c ( z ) ,l1< z < zi:

The required switch point z: can be obtained from the matching conditions
for F $ ( z ) and F c ( z ) . Since F l ( z ) is twice continuously differentiable, it
follows from (2.2.27) that these conditions have the form

The boundary conditions (2.2.26) and (2.2.32) for F: ( z ) and F; ( z ) imply

z - l2 2(um - a )( 2 ;
F$(Z) = - +
urn--a 2 ( u , - ~ ) ~ B
-t2)

I
- a)(. - 2:)
- exp [ 2 ( u m

e, - z
+ B { exp [ 2 ( u m+ a$l(z:
F; (2) = -
um+a ~(u,+u)~
-el)

I
2(um + a ) ( z zi)
exp [ -
-
-
B I)-
By using (2.2.33) and the continuity condition (2.2.31), we obtain the fol-
lowing transcendental equation for the required point z::

2aum B
+
2urnz: = (urn a)l2 + (u, - a)el +-
u& a2-

+ -{-
B
2
urn-a 2
+
u m + a exp [E(urn a )(2: - e l ) ]

Urn
--
2
+aexp[- B(um-a)(z:-&)]}. (2.2.34)
urn - a

In the simple special case a = 0, it follows from (2.2.34) that


114 Chapter I1

that is, the switch point is the midpoint of the interval of admissible er-
rors z. This natural result can be predicted without solving the Bellman
equation. In the other special case where -el = l2= l ( l > 0) and a << um,
Eq. (2.2.34) gives the following approximate expression for zi:

To find z: in the other cases, it is necessary to solve the transcendental


equation (2.2.34).
2.2.3. Assume that the performance of the servomechanism shown in
Fig. 2 is determined by the maximum error z ( t ) = y(z) - x(t) on a fixed
time interval 0 < t 5 T. Then it is natural to minimize the optimality
criterion
I[u]= E m a x Iz(t)l = E max ly(t) - x(t)l, (2.2.35)
O S t l T O<t<T

which is a special case of the criterion (1.1.18). For convenience, we shall


use the modification (1.4.48) of the criterion (1.1.18), that is, instead of
(2.2.35), we shall minimize

The parameter ,6 > 0 determines the observation time for the stochastic
process ~ ( 7 ) We
. assume that the criteria (2.2.35) and (2.2.36) are equiv-
alent if the terminal time T and the variable ,6 are matched, for example,
as follows: T = c/o, where c > 0 is a constant.
The Bellman equation for the problem considered can be obtained from
(1.4.51) with regard to the relation f2(x, y) = f2(y - x) = f2(z). This
equation has the form

Just as in the preceding sections, after the expression in the square brackets
is minimized, Eq. (2.2.37) acquires the form

df2
2 , iff2(z) > z], (2.2.38)
otherwise.

In this case, just as in the preceding sections, the optimal control u,(z) is
of relay type and can be written in the form (2.2.20). The only distinction
Exact Methods for Synthesis Problems 115

is that, in general, the switch point z: differs from zr and 2:. The point z:
can be found by solving Eq. (2.2.38).
Solving Eq. (2.2.38), we shall distinguish two domains on the z-axis:
the domain Z1 where f 2 (z) > jzI and the domain Z2 where f2(z) = 121.
Obviously, if f2(z*) = lz*I for any z*, then f2(z) = 1x1 for any z such that
IzI > Iz*I. In other words, the domain Z2 consists of two infinite intervals
(-CQ, z') and (z", +a). In the domain Z1 lying between the boundary
points z' < 0 and z" > 0, we have

Next, the interval [zl, z"] is divided by the switch point z: into the fol-
lowing two parts: the interval z' < < z z: where Eq. (2.2.39) takes the
form

and z: <z < Z" where

Thus, in this case, we have seven unknown variables: z', z", z:, and the
four constants obtained by integrating Eqs. (2.2.40) and (2.2.41). They can
be obtained from the following seven conditions:

Formulas (2.2.42) are smooth matching conditions for the solutions f; (z)
and f$(z). The last three conditions show that the solutions and their
first-order derivatives are continuous a t the switch point z: (see (2.2.31)
and (2.2.32)). The first four conditions show that the solutions and their
first-order derivatives are continuous a t the boundary points of Z1.
By solving (2.2.40) and (2.2.41) with regard to (2.2.42), we readily obtain
116 Chapter I1

the following three equations for z', zl', and 2:

In (2.2.43)-(2.2.45) we have used the notation

The desired switch point z: can be found by solving the system of tran-
scendental equations (2.2.43)-(2.2.45). Usually, this system can be solved
by numerical methods. One can obtain the explicit expression for z: from
Eqs. (2.2.43)-(2.2.45) only if the problem is symmetric, that is, if a = 0. In
this case, the domain 2 1 is symmetric about the origin, z' = -zl', and we
have the switch point z: = 0. However, this is a rather trivial case and of
no practical interest.
REMARK.It should be noted that the optimal systems considered in
Sections 2.2.2 and 2.2.3 are very close to each other (the switch points
nearly coincide, z: z:) if the corresponding parameters of the problem
agree well with each other. These parameters can be made consistent in
the following way. Assume that the same parameters a, B , and urn are
given in problems of Sections 2.2.2 and 2.2.3. Then, choosing a value of
Exact Methods for Synthesis Problems 117

the parameter p, we can calculate three numbers z' = zl(P), z" = z1'(/3),
and z: = z:(,B) in dependence on the choice of 0. Now if we use z' and z"
as the boundary values of admissible errors (el = zl(P), ez = zl'(P)) in the
problem considered in Section 2.2.2, then by solving Eq. (2.2.34), we obtain
the coordinate of the switch point z: and show that z:(/3) w ~:(/3)' for
varying from 1.0 to 10W4. This is confirmed by the numerical experiment
described in [92]. Moreover, in [92] it is shown that Fl(z;(P)) w P-l for
these values of the parameter p.
2.2.4. Now let us consider the synthesis problem of optimal tracking a
discontinuous Markov process. Let us assume that the input process y(t)
in the problem of Section 2.2.1 is a pure discontinuous Markov process. As
shown in 51.1, such processes are completely characterized by the intensity
A(y) of jumps and the density function ~ ( yy') , describing the transition
probabilities a t the jump moments. The one-dimensional density p(t, y) of
this process satisfies the Feller equation (see (1.1.71))

From (1.4.61) with regard to (2.2.1) and (2.2.2), we obtain the Bellman
equation

+ c(z, y) + min
IuIIum

If we denote the integro-differential operator of Eq. (2.2.46) by Lt,y, then


this equations can be written in the short form

Comparing Eqs. (2.2.46) and (2.2.47) with the Feller equations (1.1.69) and
(1.1.70), we see that, for pure discontinuous processes, the Bellman equation
(2.2.47) contains the integro-differential operator ~t~
of the backward Feller
equation; this operator is dual to Lt,y. Therefore, Eq. (2.2.47) can be
written in the form

9 ~ h approximate
e relation zt(P) Z x;(P) means that lz:(P)-zr(P)I << z"(P)-~'(P).
118 Chapter I1

In what follows, we assume that the input Markov process y(t) is homo-
geneous with respect to the state variable y, that is, X(y) = X = const and
~ ( yy'), = ~ ( -yy'). In this case, by using the formal method proposed in
[176], we can replace the integro-differential operator L:~ in (2.2.47) and
(2.2.49) by an equivalent differential operator.
Let us show how to do this. First, we try to write Eqs. (2.2.46) and
(2.2.48) in the form

where L is the required differential operator and L p is the density of the


probability flow [160, 1731. We apply the Fourier transform to (2.2.50) and
(2.2.46). For the Fourier transform of the probability density

we obtain the following two equations from the well-known property of the
Fourier transform of the convolution of two functions:1°

where r ( s ) denotes

Comparing (2.2.51) and (2.2.52), we obtain the spectral representation of


the desired operator
T(s) - 1
L(s) = .X- (2.2.54)
S

If the expression on the right-hand side of (2.2.54) is the ratio of polynomials

(hi and qi are constant numbers), then, as follows from the theory of Fourier
transforms [41], the desired operator L(d/ay) can be obtained from L(s)

l o ~ e c a lthat
l X(y) = X and T(Z, y) =~ ( -yZ) in (2.2.46).
Exact Methods for Synthesis Problems 119

by the formal change s 4 dldy. Using the operator L(d/dy), we transform


the Bellman equation (2.2.49) to the form

(note that if L(d/ay) = H ( d / d y ) / Q ( d / d y ) , then the L 4 = cp has the


meaning of the relation H(d/dy)+ = Q(d/dy)cp).
Equation (2.2.56) can be simplified just as in Section 2.2.1. Namely,
assuming that the penalty function c(x, Y) depends only on the difference
z = y - x, instead of (2.2.56), we obtain the following ordinary differential
equation for the stationary operating mode:

(here the time-invariant loss function f3 = f3(z) is determined by analogy


with (2.2.9) and y is the stationary error.) The optimal control

differs from (2.2.20) only by the position of the switch point z:, which can
be obtained from the condition

To complete the solution of the synthesis problem, we need to calcu-


late 2:. By analogy with Section 2.2.1, the switch point z: divides the
z-axis into two parts: the domain z > z:, where df3/dz > 0, u, = urn, and
Eq. (2.2.57) has the form

and the domain z < z:, where df3/dz < 0, u, = urn, and Eq. (2.2.57) has
the form

kz (i)
-L - f'+u,-&- df' = y - c(z).

At the switch point 2% the derivatives of the functions f + ( z ) and f - ( z )


vanish:
120 Chapter I1

To solve the linear equations (2.2.59) and (2.2.60) explicitly, we need to


specify the form of the linear operator L(d/dz). Assume that the density
of the transition probability Z(Y,Y') a t the jump moment is given by the
formula

k~kz exp(-klIyl - YI) for Y' > Y, (2.2.62)


--{
.(Y, Y') = .(Y' - Y) = kl k2
+
exp(-k2y' - y / ) for y' < y.
Calculating the integral (2.2.53), we obtain

After the change s + d l d z , we obtain the following expression for the


operator L(d/dz) from (2.2.63) and (2.2.54):

With regard to (2.2.64), we can write Eqs. (2.2.59) and (2.2.60) in the form

Introducing the functions

we transform the system (2.2.65) as follows:

d 2 ~ + d$Q k2 * XAk
- $ Q =( z ) (2.2.67)
( A ) Urn)

where
c f (z) ==I--
Urn

ReIations (2.2.6 1) and (2.2.66) impIy the following matching conditions for
the functions $Q* (z) a t the switch point zz:
Exact Methods for Synthesis Problems 121

The characteristic equations corresponding to Eqs. (2.2.67) are

By I-"$ and p i we denote the roots of the characteristic equation for the
function cp+(z) (correspondingly, by pT and py the roots of the character-
istic equation for cp-(2)). A straightforward verification shows that if

then
(1) all roots p t 2 are real,
(2) each characteristic equation in (2.2.70) has roots of opposite signs
(for definiteness, in what follows, we assume that p$ and p: are
positive, and respectively, p: < 0).
REMARK.Note that condition (2.2.71) must be satisfied, since this is
the existence condition for the stationary tracking considered here. Indeed,
the expression on the left-hand side in (2.2.71) is equal to the absolute value
of the mean rate of the regular displacement in the command signal y(t)
caused by random jumps. Obviously, this rate of change cannot exceed the
limit admissible speed of the servomotor. Inequality (2.2.7 1)just confirms
this fact.
Taking into account these properties of the roots of the characteristic
equations (2.2.70), one can readily see that the solutions of Eqs. (2.2.67)
have the form

Using (2.2.72), from the matching conditions (2.2.69), we obtain the fol-
lowing equation for the required switch point z;:
122 Chapter I1

For the quadratic penalty function c(z) = z 2 , Eq. (2.2.73) has an exact
solution. Actually, taking into account (2.2.68)) we can rewrite Eq. (2.2.73)
in the form

where

Calculating the integrals in (2.2.74), we obtain

Since p t 2 and pL2 satisfy the quadratic equations (2.2.70), after easily
transformation, we obtain the following explicit formula for the switch
point:

Using (2.2.69)) (2.2.72), and (2.2.77), we can readily calculate the stationary
specific error

2u; X 2
+ (AAk - k 2 ~ m )[(Ak
2 + -)
urn + (k2 E
urn ) .] (2.2.78)
-
Exact Methods for Synthesis Problems 123

If instead of condition (2.2.71), we have a stronger inequality

then we can substantially simplify (2.2.77) and (2.2.78) by expanding these


expressions in power series in the small parameter E = XAk/umk2 and
taking only the leading terms of these expansions. In this case, instead of
(2.2.77) and (2.2.78), we have

For the first time, formulas (2.2.79) and (2.2.80) were derived by somewhat
different methods in [176].

$2.3. Optimal control of the population size


Numerous investigations deal with the dynamics of animal and microor-
ganism populations and control of the population size. For example, an
extensive literature on this subject can be found in 151, 73, 87, 89, 133,
142, 186, 187, 1891. Various mathematical evolutional models depending
on the environmental conditions of biological populations are used for de-
scribing the variations in the population size. We begin with a brief review
of such models. The main attention is given to the models that we shall
consider in this book later.
2.3.1. Models describing the population dynamics. Apparently,
Malthus was the first who considered the following model for the population
dynamics in 1798:

Here x = x(t) is the population size1' a t time t , and the constant number a ,
called the growth factor, is determined as the difference between the birth-
rate and the death-rate factors.
If the birth-rate is larger than the death-rate ( a > O), then, according
to the Malthus model (2.3.1), the population size must grow infinitely.

"The variable I is assumed to be continuous, although the number of individuals in


the population can be only integer. However, if the number of individuals is sufficiently
large, then the continuous model (2.3.1) can be used. In this case, the variable x is
treated as the population density, that is, as the number of individuals per unit area (or
volume) of the population habitat.
124 Chapter I1

Usually, this result is not confirmed, which shows that the mode1 (2.3.1)
is not perfect. Nevertheless, the basic idea of this model, namely, the
assumption that the rate of the population variation is proportional to the
current population size proved to be very fruitful. Many more realistic
models were constructed on this basis by introducing the corresponding
corrections to the growth factor a.
So, for example, if we assume that in (2.3.1) the growth factor a depends
on the population size x as

then we obtain the Gompertz mode1 (1825)

or the Verhulst mode1 (1838)

Equation (2.3.3) is often called the logistic equation. The positive con-
stants r and K are usually called the natural growth factor and the capacity
of the medium.
Models for more complicated systems of interacting populations are also
based on the Malthus model (2.3.1). Assume that in the same habitat there
are two different populations of sizes x l and 22, respectively. Let each of
these populations be described by the Malthus type equations

Now we assume that individuals in the second population (predators) can


exist only if they eat individuals (prey) from the first population.12 In this
case, it is natural to assume that the growth factors a and b in (2.3.4) have
the form

12This model is usually illustrated by the assumption that z l denotes a community


of hares and x2 a community of wolves. Hares need vegetable food, and wolves feed on
hares (and only on hares).
Exact Methods for Synthesis Problems 125

Thus, we arrive a t the two equations

which are well-known Lotka-Volterra equations that model the behavior of


the simplest system of interacting populations, the "predator-prey" model.
These equations were studied in detail by V. Volterra [187], who found
many remarkable properties of their solutions.
T h e multidimensional generalization of the Lotka-Volterra model has
the form
= - a T T , r = 1 , 2 ,..., n. (2.3.6)

The dynamics of system (2.3.6) depends on the form of the matrix A =


ljaijI(;L. If this matrix is antisymmetric, i.e., if a i j = -aji and aii = 0, then
Eq. (2.3.6) describes a conservative model for the population interaction. If
the quadratic form a,,xTxs is positive definite, then model (2.3.6) is called
dissipative.
Further generalizations of models for the population dynamics were re-
lated to more detailed descriptions of the interaction between individuals
in the population. For example, in many actual situations, the growth fac-
tor a depends on the population size a t some preceding moment of time
rather than on the current population size. In these cases, it is expedient
to use the Hutchinson model (1948)

which is a generalization of the logistic model (2.3.3). In 1976 Cushing


proposed the following more general model, in which both discrete and
distributed delays are taken into account:

where K ( s ) is a nondecreasing bounded function and the integral on the


right-hand side is of the Stieltjes type.
In some special cases, it is necessary to take into account the spatial
distribution of the population size. In these cases, the state of the popu-
lation is described by the density function D ( t , x, y) a t the point (x, y). If
the movement of individuals within the habitat is a diffusion process, then
instead of (2.3.7), we have the Hutchinson model with diffusion
126 Chapter I1

Equations (2.3.1)-(2.3.9) model the behavior of isolated biological com-


munities (autonomous models). If there are external actions on the pop-
ulation, then some additional terms responsible for these actions appear
on the right-hand sides of Eqs. (2.3.1)-(2.3.9) As usual, we distinguish two
types of external actions: purposeful controlled actions, which can be used
to control the population size, and noncontrolled random perturbations.
Let us consider a population described by the model (2.3.3). If there are
external actions, say, some individuals are taken away from the population,
then we obtain the controlled logistic model

5=T
(1 - -3 x-qux,

>
where the function u = u(t) 0 is the intensity of the catching process and
the number q > 0 is the catchability coefficient. In this case, the value

Q =9 lr u(t)x(t) dt (2.3.11)

gives the number of individuals caught during the time interval [tl, t2].13
In a similar way, the Lotka-Volterra equations can be generalized to the
following controlled system:

Here the control functions u l = ul(t) >


0 and u2 = uz(t) 2 0 are, re-
spectively, the intensities of catching prey and predators. If individuals are
removed only from one population, then we have u l ( t ) 0 or ug(t) 0 in
(2.3.12).
If the population behavior is substantially influenced by noncontrolled
random perturbations, then the dynamics of the population is described
by stochastic differential equations. For example, in some problems, the
population behavior can be satisfactory described by the stochastic logistic
model
x - qux +2/2Bx[(t), (2.3.13)

where [(t) is the scalar Gaussian white noise (1.1.31) and the number B > 0
determines the intensity of random perturbations. Many other stochastic
models used to describe the dynamics of various biological systems can be
found in [51, 831.
13Note that Eq. (2.3.10) can be used not only in the case of "mechanical" removal
(catching, shooting, etc.) but also in the case where the population size is controlled by
treating the habitat with chemical agents.
Exact Methods for Synthesis Problems 127

2.3.2. Statement of the problem. In this section we consider the op-


timal control of the size of a population described by the controlled logistic
model (2.3.10). The statement of this problem is borrowed from the books
135, 681, where this problem is formulated in conformity to the problem of
fisheries management.
We shall assume that the state x = x(t) of the controlled system (2.3.10)
characterizes the general quantity (or the mean density) of fish a t time t
in some chosen habitat. We also assume that the intensity of fishing is
bounded by a given value u, > 0. In this case, the mathematical model
for the dynamics of fish population has the form

j:=r
(1 - -3 x-qux,

By p > 0 we denote the price of the unit mass of caught fish and by
c > 0 the price of unit "efforts" u for fishing. Then it is natural to estimate
the "quality" of control by the functional

which, with regard to (2.3.11), gives the profit provided by the fishing
<
process defined by the control function u ( t ) : 0 t 2 T. The problem is to
< <
find the optimal control u, (t) : 0 t T for which the functional (2.3.15)
attains its maximum.
Following [35, 681, instead of (2.3.15), we shall estimate the quality of
control (i.e., of fishing) by the functional in which the terminal time T + m.
In this case, an additional "killing" factor appears in the integrand to ensure
the convergence, namely,

I(u) = Lrn e-6t (pqx(t) - c)u(t) dt, (2.3.16)

where b > 0 is a given positive number. As a result, we arrive a t the


following problem:
for an arbitrary initial state x(0) = xo of the controlled system
(2.3.14), to find a control function 0 5 u,(t) 5 urn: t >
0 (or
< >
0 5 u, (x (t)) u, : t 0) for which the functional (2.3.16) attains
its maximum on the trajectories of system (2.3.14).
REMARK.If the initial population size xo does not exceed the capacity K
of the medium, then it follows from Eq. (2.3.14) that for any time moment
128 Chapter I1

t > 0 and any admissible control u(t), the population size has the same
property, x(t) < K . Therefore, this problem is well posed if the parameters
p, q, and c in the functional (2.3.16) satisfy the condition

Otherwise, (xl > - K ) , this problem has only a trivial solution u,(t) 0,
t >
- O.I4 Therefore, in what follows, we assume that the inequality (2.3.17)
is satisfied. We also assume that qu, > T .
2.3.3. The solution of problem (2.3.14), (2.3.16). If we define the
function F ( x ) of the maximum future profit by the relation

F(x) = max
O5u(t)lum
[lm I (pqx(t) - c)u(t) dt ~ ( 0 =) x
I , (2.3.18)

then, using the standard procedure described in 3 1.3, we obtain the Bellman
equation

max {[rx (1 -
0LuLum
) - qux] -bF + (pyx - c)u

corresponding to problem (2.3.14), (2.3.16).


It follows from Eq. (2.3.19) that, depending on the current state (the
population size) x of the system (2.3.14), to perform the optimal control
. . G 0 for all points x E R1 C R+ a t which the
we need to choose u,(x)
function
dF
p ( x ) = pqx - c - qx-
dx (2.3.20)
is negative. Conversely, a t all points x E R2 c Rf where p ( x ) > 0, we
need to take the maximum admissible control u,(x) urn. If p(x,) = 0
a t a point x, (in view of continuity, the point x, separating R1 from R2
is the limit point of these domains), then the optimal control u, (x,) for
Eq. (2.3.19) is formally undetermined. However, one can see that the choice
of any admissible control 0 5 u 5 urn a t the point x, does not affect the
solution of Eq. (2.3.19).
Now let us consider how the population size in system (2.3.14) varies
with time. Let x(0) = xo < XI. Obviously, in this case, there exists an
initial half-interval [O, t,) a t all points of which we must set u, (t) 0. This
statement immediately follows from the fact that the expression pqx - c in
the parentheses in (2.3.16) is negative for x(t) close to xo. Thus, we have

14Since X I > K, we have ~ ~ x -( tc )< 0 for all t.


Exact Methods for Synthesis Problems 129

x(t) E R1 for all t E 10, t,). Hence, it follows from Eq. (2.3.14) with u 0
that, on the interval [0, t,), the population size x(t) increases monotonically
u p to the value x, = x(t,) that separates the sets R1 and R2. At the
point x,, as was already noted, the control may take any arbitrary value.
It is expedient to take this value equal to

and keep it constant for t >t,. It follows from (2.3.14) that the control
(2.3.21) preserves the population size x,.
REMARK. For u(x,) # u l , the representative point of system (2.3.14),
starting from the state x,, comes either to the set R1 (for u(x,) > u l ) or to
the set R2 (for u(x,) < ul) during a n infinitely small time interval. Then
the control u = 0 (or u = u,) immediately returns the representative point
to the state x,. Thus, for u(x,) # u l , the population size x, is preserved
by infinitely rapid switches of control (this is the sliding mode). Though,
as follows from (2.3.19), the value of the functional (2.3.16) for this control
remains the same as for u(x,) = u l , the constant control u(t) E u(x,) = u1,
>
t t,, is more convenient, since in this case the existence problem does not
>
arise for the solution x(t), t t,, of Eq. (2.3.14). The optimal control

21, (t) =
for 0 < t < t, (x(t) < x,),
(2.3.22)
for t>t,,

realizes the generalized solution x,(t) of Eq. (2.3.14) in the Filippov sense
(see 51.1).
Thus, for x(0) = xo < X I the optimal control (2.3.22) is a piecewise
function shown in Fig. 21 together with the plot of the function x, (t), which
shows the change of the population size corresponding to this control. It
remains to find the moment t, a t which the catching of individuals starts or,
which is the same, to find the size (density) x, = x, (t,) of the population
that we need to keep constant in the area of active catching. These variables
can readily be obtained by calculating the functional (2.3.16) and taking its
maximum with respect to t,. Indeed, for the control (2.3.22), the functional
(2.3.16) is equal to

We can calculate its maximum with respect to t, by using the fact that
x, = x, (t,) as a function of t, satisfies Eq. (2.3.14) for u E 0. After the
Chapter I1

FIG. 21

differentiation, from the extremum condition

- -de-8t* d.1C, -
$ ( x , ) + e-6'. - dx*
dt* d z , dt,

we obtain the following equation for the optimal size x , of the population:

ScK

This equation has only one positive solution

which has a physical meaning.


Note that, in view of (2.3.17), the value x , determined by (2.3.25) always
satisfies the inequality X I < x , < K . We also note that the condition
xo < X I , introduced for the sake of obviousness, does not influence the
Exact Methods for Synthesis Problems 131

choice of the optimal control. This strategy is completely determined by


x,, and according to this strategy, we do not catch individuals if the current
population density x(t) < x, and start catching with constant intensity
(2.3.21) if the population size attains the value x, (2.3.25).
We can readily calculate the profit function (2.3.18) corresponding to
this strategy. Integrating the equation in (2.3.14) with x(0) = x and u 0,
we obtain
xK
x(t) = t 2 0.
+
x ( K - x)ecTt'
Using (2.3.26), we see that the condition

allows us to find the moment t,. From (2.3.23) and (2.3.27), we explicitly
calculate the profit function

-
T
- -(pqx*
6q
- c)
(1- -
'1 x ( K - x,) 'IT
[.*in - (2.3.2,)

for x 5 2,.
To solve problem (2.3.14), (2.3.16) completely, it remains to consider
the case x(0) = xo > x,, that is, the case where the initial population
size is larger than the optimal size (2.3.25). First, we note that, in view of
(2.3.28), the profit function F ( x ) monotonically increases on the interval
<
0 x 5 x, from zero to the maximum value

We also note that the function +(x) has only one maximum point

and since the "killing" factor 6 in (2.3.16) is strictly positive, we always


have the strict inequality x2 > x,. Now if x(0) = xo = x2 > x ,, then using
the constant control
u(x2) = r4 (1 - z) , (2.3.30)
132 Chapter I1

we can keep the population size a t a level of x2 for which the functional
(2.3.16) attains the value

However, the constant control (2.3.30) is not optimal. One can readily
see that the functional (2.3.16) can take values larger than I ( u ( x 2 ) ) =
$(a2) if, instead of (2.3.30), we use the piecewise constant control function

shown in Fig. 22.

We choose a time interval A during which the control urn is applied, so


that a t the end of A the population size attains the value (2.3.25), that is,
x ( A ) = x,.15 The time interval A for the control urn is determined by the
equation

15The inequality I(ua(t)) > I(u(12)) = 4(12) is obtained by calculating the func-
tional (2.3.16) with regard to Eq. (2.3.14), where control has the form (2.3.31). Here
we do not perform the corresponding elementary but cumbersome calculations and leave
them to the reader as an exercise.
Exact Methods for Synthesis Problems 133

Obviously, control functions of the form (2.3.31) can be used not only
for the initial population size x(0) = 2 2 but also for arbitrary initial sizes
x(0) = x > 2,. In this case, we must perform the change 2 2 -+ x only in
Eq. (2.3.32) for the length A of the initial pulse urn. One can easily verify
that (2.3.20) implies p ( x ) > 0 for all x > x,. Therefore, the optimal control
as a function of the current population size (the synthesizing function) for
problem (2.3.14), (2.3.16) has the form
for <
0 x < x,,
for x=x,, (2.3.33)
for x > x*,
where x, is determined by (2.3.25).
Formula (2.3.33) gives the mathematical expression for the control strat-
egy that is well known in the theory of optimal fisheries management [35,
681. The key point in this strategy is the existence of an optimal size x, of
fish population given by (2.3.25). The goal of control is to achieve the op-
timal size x, of the population as soon as possible and to preserve the size
x, by using the constant control (2.3.21). This control strategy maximizes
the profit obtained by fishing if the profit is estimated by the functional
(2.3.16).
In conclusion, we note that the results presented in this section can be
generalized to the case in which the dynamics of fish population is subject
to the retarded equation (or the equation with delay)

here we have studied the controlled Hutchinson model. For the results
related to this case, see [99].
The stochastic version of problem (2.3.14), (2.3.16), when the behavior
of the population is described by the stochastic equation (2.3.13), will be
described in $6.3.

$2.4. Stochastic problem of optimal fisheries management


Now let us consider the problem of optimal fisheries management that
differs from the problem considered in $2.3 by the stochastic character of
the model used for the description of the population dynamics. We assume
that the behavior of a fish population is subject to the stochastic differential
equation of the form
134 Chapter I1

where ((t) is a scalar Gaussian white noise (1.1.31), B > 0 is a given positive
number, the natural growth factor r > 0 and the catchability coefficient
q > 0 have the same meaning as similar coefficients in (2.3.10), (2.3.13),
and (2.3.14). Equation (2.4.1) is a special case (as K + m ) of Eq. (2.3.13)
and, in accordance with the classification presented in Section 2.3.1, the
model described by Eq. (2.4.1) can be called a controlled stochastic Malthus
model.
Just as in $2.3, the size x(t) of the fish population is controlled by catch-
ing a part of this population. In this case, the catching intensity u(t) has
an upper bound u,, and therefore, the set of all nonnegative measurable
bounded functions of the form u(t) : [0, a)+ [O, u,] is considered as the
set of admissible controls. The goal of control is to maximize the functional
(2.3.16), which, in view of the random character of the functions x(t) and
u(t), is replaced by the corresponding mean value. As a result, we have the
problem

In what follows, we assume that the decay index S in (2.4.2) satisfies the
condition S > r .
We shall solve problem (2.4.1), (2.4.2) by using the standard procedure
of the dynamic programming approach described in 51.4. We define the
profit function for problem (2.4.1), (2.4.2) by the relation

where El(.) I x(0) = x] denotes the conditional mathematical expectation


of (.). As was shown in [113, 1751, the second-order derivative of the profit
function (2.4.3) is continuous. It follows from Theorem 3.1.5 in [I131 that
for all x E R+ = [0, m ) , this function has the upper bound

where N > 0 is a constant.


The Bellman equation (F, = g,F,, d Z
= =) ~

for the profit function (2.4.3) can be obtained as usual (see $1.4). It should
be pointed out that a symmetrized stochastic integral (see [I741 and 51.2)
Exact Methods for Synthesis Problems 135

was used for writing (2.4.5). This led to a n additional term B in the
parentheses in (2.4.5), that is, in the coefficient of xF,.16
Equation (2.4.5) allows us to find the optimal control u, as a function
u,(x) of the current states of system (2.4.1). First, we note that, according
to (2.4.5), the set of all admissible states of (2.4.1) can be divided into the
following two subsets (just as in $2.3):
the subset R1, where p ( x ) = pqx - c - qxFx < 0 and u,(x) =0
and
the subset R ~where
, ~ ( x>
) 0 and u, (x) = urn.
The boundary between these two subsets is determined by the relation

pqx - c - qxFx = 0. (2.4.6)

The further calculations show that, in this problem, there exists a unique
point x, satisfying (2.4.6). Therefore, the subsets R1 and R2 are the inter-
vals R1 = [0, x,) and R~ = (2, , CQ). Thus the optimal control in the syn-
thesis form u, = u,(x) becomes uniquely determined a t all points x E R+
except for the point x,. It follows from (2.4.6) that we can use any admis-
sible control u(x,) E [0, urn] a t the point x,.
Therefore, the optimal control function u, (x) can be represented in the
form

and the final solution of the synthesis problem is reduced to calculating the
coordinate of the switch point 2,. To calculate x,, we need to solve the
Bellman equation (2.4.5).
As was already noted, the second-order derivative of the profit function
F ( x ) is continuous, thus the profit function F ( x ) satisfying (2.4.5) can
be obtained by using the matching method (see $2.2). In what follows,
we describe the procedure for solving the Bellman equation (2.4.5) and
calculating the coordinate of the switch point x, in detail.
By F1(x) and F 2 ( x ) we denote the profit function F ( x ) on the intervals
R1 = [O, x,) and R' = (x,, CQ). It follows from (2.4.5) and (2.4.7) that the
functions F1 and F2 satisfy the linear equations

BX~F,:,+ + B)XF: - 6 ~ =' 0, O 5 x < x,,


(T (2.4.8)
B x Fxx+ ( r + B q u r n ) x ~ :- 6~~ + (pqx - c)um = 0,
2 2
- x > 2,.
(2.4.9)

I6If the stochastic differential equation in (2.4.1) is the Ito equation, then the second
term in the Bellman equation (2.4.5) has the form ( T - qu)xF,.
136 Chapter I1

Since the profit function F ( x ) is sufficiently smooth (recall that the second-
order derivative of F ( x ) is continuous), both functions F 1 and F 2 must
satisfy the condition (2.4.6) a t the switch point x,. Taking into account
the fact that F ( 0 ) = 0 according to (2.4.1) and (2.4.3), we have the following
additional boundary condition for the function F 1 ( x ) :

The boundary conditions (2.4.6), (2.4.10), and the upper bound (2.4.4)
allow us to obtain the explicit analytic solution of Eqs. (2.4.8) and (2.4.9).
Equation (2.4.8) is the well-known homogeneous Euler equation. Its
general solution has the form

where A1 and A2 are constants, and k i and k l satisfy the characteristic


equation
+ +
B k ( k - 1 ) ( T B ) k - S = 0. (2.4.12)
The constants A1 and A 2 are determined by two boundary conditions
(2.4.10) and (2.4.6) a t the points x = 0 and x = x,. Since the roots

of Eq. (2.4.12) have opposite signs, we conclude that to satisfy the condition
(2.4.10), we need to set A 2 equal to zero in (2.4.11). The constant

can be calculated by substituting F 1 ( x ) = ~ 1 % into~ : (2.4.6) and taking


into account the fact that (2.4.6) is valid a t the switch point x , . Thus, the
solution of Eq. (2.4.8) is given by the formula

The inhomogeneous Euler equation (2.4.9) can be solved in a similar


way. By using the standard method of variation of parameters, we obtain
the general solution
Exact Methods for Synthesis Problems

where
1
k21 -
--2B [ P m - T. + J(~u, - T ) +~4B6],

satisfy the characteristic equation

and A3 and A4 are arbitrary constants.


Since kf is positive, we must set the constant As equal to zero (otherwise,
formula (2.4.15) contradicts the upper bound (2.4.4)). T h e constant A* can
be calculated from condition (2.4.6) a t the switch point x,. Substituting
F 2 ( x ) (determined by (2.4.15) with A3 = 0) into (2.4.6) instead of the
function F , we obtain

This implies the following expression for the function F 2 ( x ) :

T h e two functions F1(x) (2.4.14) and F 2 ( x ) (2.4.17) determine the profit


function F ( x ) satisfying the Bellman equation (2.4.5) for all x E R+ =
[O, co). These functions contain a parameter x,, which remains unknown.
We can calculate x, by using the continuity property of the profit func-
tion F (x).
Each of the functions F1 and F2 is continuous. Hence, to ensure the
continuity of F ( x ) , it suffices to satisfy the condition

a t the switch point 2,. It follows from (2.4.6), (2.4.8), and (2.4.9) that
(2.4.18) is equivalent to the condition

a t the switch point x,.


138 Chapter I1

Calculating the second-order derivative of the functions (2.4.14) and


(2.4.17), we derive the following equation for x, from (2.4.19):

Hence, the switch point x, is determined by the explicit formula

Formula (2.4.20) and the optimal control algorithm (2.4.7) constitute


the complete analytic solution of the stochastic problem (2.4.1), (2.4.2) of
optimal fisheries management.
Some final comments and remarks. It is of interest to compare
(2.4.20) and (2.3.25), which is the optimal size of the population in the
deterministic problem (2.3.14), (2.3.16) of optimal control considered in
$2.3. Denoting (2.3.25) by c,,we may expect that the equality

lim F* = lim 2, (2.4.21)


K+m B+O

is valid due to continuity reasons (since the deterministic version of problem


(2.4.1), (2.4.2) formally coincides with problem (2.3.14), (2.3.16) as K +
m).
We can verify (2.4.21) by straightforward calculations of the limits on
both sides. Indeed, using (2.3.25), we readily calculate the limit on the
left-hand side of (2.4.21) for 6 > r,

c6
lim T, =
~ - t m pq(6 - r ) '

The same result is obtained by calculating the limit of (2.4.20) as B + 0,


since
k: - 1
+ (6 - r aqum
k ; - k; B+O
) (qum - r)
,
which follows from (2.4.13) and (2.4.16).
Formula (2.4.21) shows how the results obtained in this section for
problem (2.4.1), (2.4.2) are related to similar results for problem (2.3.14),
(2.3.16) obtained in Section 2.3.3 by quite different methods.
There is another interesting specific feature of problem (2.4.1), (2.4.2).
Namely, the standard "classical" approach of the dynamic programming
Exact Methods for Synthesis Problems 139

that leads to the exact solution of the stochastic problem (2.4.1), (2.4.2)
does not allow us to solve the synthesis problem (that is, to find the switch
point 2,) for the deterministic version of problem (2.4.1), (2.4.2), that is,
in the case where there are no random perturbations in Eq. (2.4.1). This
fact can readily be verified if we consider the deterministic version of the
Bellman equation (2.4.5).

max [(T -
o<u<u,

and calculate the functions

which, in this case, determine the profit function F ( x ) on the intervals


R1= [O, x*) and R2 = [x,, 00).
Contrary to the stochastic case in which the continuity condition (2.4.18)
for the functions (2.4.14) and (2.4.17) determines the unique switch point
(2.4.20), one can readily verify that the same continuity condition F1(z,) =
F2(x,) for the functions (2.4.24) and (2.4.25) holds for any point x , E
(0, m). Therefore, the control problem considered can serve as an example
illustrating the well-known idea (see [113, 1751) that the dynamic program-
ming approach is more suited for solving control problems with stochastic
models of plants (which, by the way, describe the actual reality more ade-
quately).

REMARK.If the equation in (2.4.1) is understood as the Ito stochastic


equation, then the Bellman equation for problem (2.4.1), (2.4.2) differs from
(2.4.5) and has the form

The way for solving this equation is quite similar to the above procedure
for solving Eq. (2.4.5). However, the population size x, that determines
the switch point for the optimal control (2.4.7) differs from (2.4.20) and is
Chapter I1

given by the expressions

Z* =
c(6 - r + qu,)
(El 1)
pq[h--r+ - I qumll
(k:-%;)
CHAPTER I11

APPROXIMATE SYNTHESIS OF STOCHASTIC


CONTROL SYSTEMS WITH SMALL CONTROL ACTIONS

Various approximate synthesis methods can be useful if the Bellman


equation cannot be solved exactly. Chapters 111-VI deal with some of these
methods.
Approximate methods are usually efficient if the initial statement of the
optimal control problem contains a small parameter. Quasioptimal control
algorithms are constructed by using either the corresponding procedures of
successive approximations or asymptotic expansions of the loss function in
powers of a small parameter of the problem. The choice of a method for
constructing an approximate solution of the synthesis problem essentially
depends on the choice of a parameter that is considered to be small. For
example, in this chapter, the values of control actions are assumed to be
small. Chapter IV is about the Bellman equation with small diffusion
coefficients. In Chapter V, we consider control problems for oscillating
systems with small attenuation decrement. In Chapter VI, the role of small
parameters is played by the a posteriori covariances of unknown coefficients
in the plant equations.
Let us formulate the main idea of the approximate synthesis method
studied in this chapter. As was already noted, the method is based on the
assumption that control actions on the plant P are relatively small. From
the physical viewpoint, this assumption means that the effect of the control
actions on the phase trajectories of the system is small, and therefore the
system dynamics is similar to noncontrolled motion. In particular, this
assumption holds for control problems with constraints if the noises acting
on the plant are of large intensity.
Indeed, let us assume that the controlled (unperturbed) plant is a stable
mechanical system. Then large random perturbations lead to large devia-
tions of the system from the equilibrium state. In this case, some "internal"
inertial and elastic forces arise in the system. These forces can significantly
exceed the (bounded) control forces whose effects on the system turn out
to be relatively small.'

'Note that in this book we do not consider deterministic synthesis problems for

141
142 Chapter I11

From the formal mathematical viewpoint, the fact that control actions
are small leads to a small parameter in the nonlinear term in the Bellman
equation. To verify this fact, let us consider the synthesis problem for
the servomechanism (Fig. 10) governed by the Bellman equation (1.4.21).
Assume that the dimensions of the region U of admissible controls are
bounded by a small value of the order of E. For definiteness, we assume
that U is either an r-dimensional parallelepiped (R, > U = ( u : lujl < -

u,i, i = 1 , 2 , . . .,T ; maxi u,i = E ) ) or an r-dimensional ball of radius E,


that is, (R, > U = ( u : x:=l
U: E)).<
In the first case, according to (1.3.22), the solution of the synthesis prob-
lem is given by the formula (the control algorithm)

where the vector of partial derivatives d F / d x is calculated by solving the


equation

Here denotes the r-vector (column) U,/E and L is a linear operator of


the form2

In the second case (where U is a ball), the optimal control has the form
(see (1.3.23))

systems controlled by small forces. Such systems called weakly controllable in [32] were
studied in [32, 1371.
'Recall that relations (3.0.1) and (3.0.2) follow from the Bellman equation (1.4.21)
+
with c ( z , y,u ) = CI ( x , y) and A z ( t , z ) = a ( t ,x ) Q(t)u; {u,~, . . . ,u,,) denotes a diag-
onal (T x T)-matrix; for a column A with components Al , . . . , A,, the expressions sign A
and [ A [ denote T-columns with components sign Aj and [ A j [ ( i = 1 , . . . ,T), respectively.
Approximate Synthesis of Stochastic Control Systems 143

where the vector d F / d z is the gradient of the loss function satisfying the
equation

If we denote the nonlinear terms in Eqs. (3.0.2) and (3.0.4) in the same
way, then we can write both equations in the form

where @(t,d F / d x ) is a given nonlinear function of its arguments.


As a rule, equations of the type (3.0.5) cannot be solved exactly. How-
ever, the presence of a small parameter in the nonlinear term of this equa-
tion yields a rather natural way of solving this equation approximately. To
this end, one can use the method of successive approximations in which the
zero-order approximation Fo(t,x,y) satisfies the equation

and the successive approximations Fk(t,x, y) can be calculated recurrently


by solving the sequence of linear equations of the form

If we know the solution F k ( t ,x, y) of the equation for the kth approxima-
tion (k = 0,1, . . .), then we can perform an approximate synthesis of the
controlled system by setting the quasioptimal control algorithm as

In this chapter we consider an approximate method for the synthesis of


optimal systems whose "algorithmic" essence is given in formulas (3.0.6)-
(3.0.8).3
Needless to say, the practical use of procedure (3.0.6)-(3.08) in special
problems leads to additional problems of constructivity and efficiency of

3The approximate synthesis algorithm (3.0.6)-(3.08)is a modification of the well-


known Bellman method of successive approximations [14,161. This method was used by
W. Fleming for solving some stochastic problems of optimal control 1551. The procedure
(3.0.6)-(3.0.8) is a special case of the Bellman method if the trivial strategy uo(t,x,y) 5 0
is used as the initial "generating" control strategy in the Bellman method.
144 Chapter I11

this approximate synthesis method. In this chapter we shall discuss these


problems in detail. All related material is divided in sections as follows.
First ($$3.1-3.3), we consider some methods for calculating the successive
approximations for stationary synthesis problems. We write out approxi-
mate solutions (that correspond to the first two approximations) for some
special control systems with various types of disturbances affecting the sys-
tem. In $3.1 and 53.2, we consider random perturbations of the white noise
type. In $3.3 the results obtained in $3.1 and $3.2 are generalized to the
case of correlated noises.
In $3.4 we study nonstationary problems and estimate the error of the
approximate synthesis (3.0.6)-(3.0.8) for the first two approximations.
In $ 3.5 we study asymptotic properties of successive approximations
(3.0.7), (3.0.8) as k -+ co. We show that, under some special conditions, as
k -+ co the sequence Fk is convergent to the exact solution of the Bellman
equation, and the corresponding quasioptimal control algorithms (3.0.8) to
the optimal control u,(t, x, y). In this case, the convergence uk t u, is
understood in the sense of convergence of values of the functional to be
minimized.
Finally, in 53.6 the method of successive approximations (3.0.6)-(3.0.8)
is used for approximate synthesis of some stochastic control systems with
distributed parameters.

$3.1. Approximate solution of stationary synthesis problems


3.1.1. Let us consider the problem of optimal damping of oscillations in
a dynamic system subject to random perturbations of the white noise type
(Fig. 13). Let the plant P be described by the following system of linear
stochastic differential equations with constant coefficients:

Here x = x(t) is an n-vector (column) of current phase variables of the


system (xl(t), . . ., xn(t)), u = u(t) is an T-vector (column) of control actions
(ul(t), . . .,u, (t)) , J(t) is an n-vector (column) of random perturbations
with independent components ( ( ~ ( t )., . .,Jn(t)) of the standard white noise
type (1.1.31), and A, Q, and o are given constant matrices of corresponding
size.
It is required to construct a control block C so that to ensure the optimal
suppression (damping) of the oscillations x(t) arising in the output of the
closed loop system shown in Fig. 13 under the action of random perturba-
tions ((t). As the optimality criterion, we shall use an integral functional
of the form
(3.1.2)
Approximate Synthesis of Stochastic Control Systems 145

where c(x) >0 is a given convex penalty function attaining the absolute
minimum c(0) = 0 a t the point x = 0 (the restrictions on c(x) are discussed
in detail in $3.4 and $3.5).
Let admissible controls be bounded and small. We assume that all com-
ponents of the control vector u satisfy the conditions

where E > 0 is a small parameter and urnl,. . .,urn, > 0 are given numbers
of order 1.
The system shown in Fig. 13 is a special case (the input signal y(t) 0)=
of the servomechanism shown in Fig. 10. Therefore, the Bellman equation
for problem (3.1.1)-(3.1.3) readily follows from (1.4.21), and taking into
account the relations AY(t, y) = 0, BY@,y) = 0, Ax(t, x, u) = Ax + Q u , and
~ ( xy,, u) = c(x), we obtain

Here L denotes a linear elliptic operator of the form

where, according to (1.4.16), the matrix B = uuT, and, as usual, the sum
in the last expression on the right-hand side of (3.1.5) is taken over repeated
indices from 1 to n.
It follows from (3.0.1) and (3.0.2) that in this case the optimal control
has the form

U, (t, x) = -{EU,I, . . . ,E U ~ sign


~ ) (3.1.6)

where the loss function F ( t , x) satisfies the equation

Some methods for solving Eq. (3.1.7) will be considered in $3.4. In


the present section, we restrict our consideration to stationary operating
conditions of the stabilization system in question. It follows from $1.4
that the stationary mode of stabilization (damping) can take place if T -+
co, where T is the terminal instant of the operation interval (the upper
146 C h a p t e r I11

integration limit in (3.1.2)). Obviously, in this case, stationary operating


conditions exist only if the unperturbed motion of the plant P is stable,
that is, in other words, if the real parts of the eigenvalues of the matrix A
in (3.1.1) are negative. In what follows, we assume that these conditions
are satisfied.
If we define the stationary loss function f (x) by the relation (see (1.4.29),
(2.2.9))
f ( x ) = lim [ ~ ( t , x ) - y ( T - t ) ] ,
T+w

then (3.1.7) implies the following time-invariant equation for f (x):

where the parameter y characterizing stationary "specific losses" together


with the function f (x) can be found by solving Eq. (3.1.8). We shall solve
Eq. (3.1.8) by the method of successive approximations. The scheme for
calculating (3.0.6), (3.0.7), applied to the time-invariant equation (3.1.8)
leads to the sequence of equations

It follows from (3.1.9) and (3.1.10) that each time when we are calcu-
lating the next approximation, we need to solve a linear inhomogeneous
elliptic equation of the form

We shall consider a method for solving Eq. (3.1.11) with a given function
cp(x), which allows us to represent the function f (x) in the form of a series
in eigenfunctions of some Sturm-Liouville problem [179].
3.1.2. T h e passage t o t h e adjoint equation. Let us consider the
operator
L* = - - (aA , . 1 3 % ~ ) -
1 a2
axi + 2-axiaxj(Bij), (3.1.12)

that is the dual of the operator (3.1.5). The equation


Approximate Synthesis of Stochastic Control Systems 147

is the Fokker-Planck equation (1.1.67) for the n-dimensional Gaussian


Markov process x ( t ) [45, 167, 1731. The assumption that the matrix A
is stable implies that this process has a stationary density function p o ( x )
such that
L f p o ( z ) = 0. (3.1.14)
In this case the stationary probability density p o ( x ) has the form

where P-I is the matrix of covariances of components of the vector x .


We shall present a possible method for solving Eq. (3.1.13) and calculat-
ing the matrix P in (3.1.15). The diffusion Markov process x ( t ) described
by (3.1.13) satisfies the system of linear stochastic differential equations

describing the uncontrolled motion of the plant (3.1.1). We pass from x to


new variables y related to x by the linear transformation

with a nondegenerate matrix Zf.


As a result, instead of (3.1.16), we have
the following system for the new variables:

where
2 = V - ~ A V , a = V-Lu.
- (3.1.19)
We choose so that the matrix 2 is diagonal

As is known [62], the matrix always exists and can readily be constructed
if the eigenvalues of the matrix A are simple, that is, if the characteristic
equation of the matrix A,

det(A - X E ) = 0 , (3.1.21)
-
has different roots X I , X 2 , . . ., An. In this case, the columns of the matrix V
are the eigenvectors 2 of the matrix A that satisfy the linear equations
148 Chapter I11

The system (3.1.18) can readily be solved in the case of (3.1.20). Indeed,
writing (3.1.18) in rows, we obtain

where the random functions v e ( t ) = i i e k t k ( t ) are processes of the white


noise type and have the characteristics

-
Here gLmis a n element of the matrix B = iiiiT. Solving Eq. (3.1.23), we
obtain

and taking into account (3.1.24), derive the following expressions for the
means and covariances:

-
Btm (3.1.26)
Eyt ( t ) y m ( t ) - Eye ( t ) . Eym ( t ) = ---[ e ( ' t + A m ) ( t - t ~ )
Je + Am
- 11,
which determine the transition roba ability p ( y ( t ) I y(to)) of the Gaussian
process ~ ( t ) It
. follows from (3.1.26) that

in the stationary case as t + co, since, by assumption, ReXi < 0 (i =


1,. . ., n ) for all roots of the characteristic equation (3.1.2 1) of the matrix A.
It follows from (3.1.27) that the stationary density function p o ( y ) can be
written in the form

where each entry of the matrix FP1is given by the formula


Approximate Synthesis of Stochastic Control Systems 149

The stationary density po(y) satisfies the stationary Fokker-Planck equa-


tion
a
L*Po(Y)= - - ( X ~ Y ~ P O )
1
S --
a2 -
(BijPo) = 0-
ayi 2 ayiayj
Since the random processes y ( t ) and x ( t ) are related by the linear transfor-
mation (3.1.17), the comparison of (3.1.15) with (3.1.28) yields the formula

which together with (3.1.29) allows us to calculate the matrix P.


Now let us return to the Fokker-Planck equation (3.1.14). If the operator
(3.1.5) satisfies the potentiality condition (see $ 4 , Section 5 in [173]),then
the operator equality4
POL= L*PO (3.1.32)
readily follows from (3.1.5), (3.1.12), and (3.1.14). However, even if the
potentiality conditions are not satisfied and (3.1.32) does not hold, one can
choose an operator L ; satisfying a similar relation

One can readily see that the operator L; has the form5

where the matrix G = IIGijllT is similar to the transpose matrix AT from


(3.1.1) and (3.1.5),
G = P-' AT P. (3.1.35)
The similarity transform (3.1.35) employs the matrix P from (3.1.15).
Relation (3.1.33) allows us to replace Eq. (3.1.1 1 ) by a similar equation
for the dual operator. In other words, it follows from (3.1.11) and (3.1.33)
that the problem of finding f ( x ) in (3.1.11) is equivalent to the problem of
finding z ( x ) in the equation

where z ( x ) , + ( x ) , and the functions f ( x ) , p ( x ) from Eq. (3.1.11) satisfy


the relations

4As usual, the operator equality is understood in the sense that it is an ordinary
relation po(x)Lw(x) = L*po(x)w(x) for any sufficiently smooth function w(x).
5The verification of (3.1.33) is left to the reader as an exercise.
150 Chapter I11

3.1.3. The solution of equations (3.1.36) and (3.1.11). Let us


consider the following problem of finding the eigenfunctions z, (x) and eigen-
values A, of the operator L; (the Sturm-Liouville problem):

Since L; is the Fokker-Planck operator, its eigenfunctions z, must satisfy


the zero conditions a t infinity (as 1x1 t MI). -
By passing from x to new variables y (x = Vy) and acting in a way
similar to (3.1.17)-(3.1.31), we can transform the operator (3.1.34) to the
form

--
where gj is an element of the matrix B = mT, a = V 1a, 7is a nonde-
1 -
generate matrix such that the transformation V - GV makes the matrix
diagonal, = {XI, . . ., A,), and Xi are roots of Eq. (3.1.21).6
In the new variables the stationary Fokker-Planck equation has the form

This equation differs from (3.1.30) only by the matrix of diffusion coeffi-
cients; therefore, the stationary probability density %(y) is determined by
the formulas

similar to (3.1.28) and (3.1.29).


Differentiating (3.1.40) appropriately many times, we see that

~ (3.1.35),the matrix G is similar to the transpose A ~ Since


' ~ c c o r d i nto . all similar
and transpose matrices have the same eigenvalues, the characteristic equation det(G -
XE) = 0 for the matrix G coincides with (3.1.21)).
Approximate Synthesis of Stochastic Control Systems 151

where m l , m2, . . .,mn are any arbitrary integers between 0 and m.


aml+...+m, -
It follows from (3.1.43) that the functions p, and the num-
+
bers (mlX1 - .. + mnX,) can be treated as the elgenfunctions zs and the
eigenvalues As of problem (3.1.38), respectively.
By using (3.1.41), we can write the functions zs in more detail as follows:'

Here Hm,...,,(y) = Hml...m,(yl,. . .,yn) denote multidimensional Hermit-


ian polynomials (for instance, see [4]) that, by definition, are equal to

aml+...+m,
x
ayyl . . .a y r n
i
exp [ - (yTPy)]. (3.1.45)

It follows from the general theory [4] for Hermitian polynomials with real
variables y that these polynomials form a closed and complete system of
functions, and an arbitrary function from a sufficiently large class (these
functions grow a t infinity not faster than any finite power of lyl) can be
expanded in an absolutely and uniformly convergent series in this system
of functions. Furthermore, the polynomials H are orthogonal to another
group of Hermitian polynomials G given by the formula

Here the variables p and y satisfy the relation

and the orthogonality condition itself has the form

is the Kronecker delta).

h he constant coefficient [(27~)"d e t ~ - ~ ] - ' /in~ (3.1.44) is omitted.


152 Chapter I11

However, we often need to use a complex matrix 7for the change of


variables x + y (for instance, see the problem in 83.2.). To pass to complex
variables, we need to verify some additional statements from the general
theory [4], which hold for real variables. In particular, it is necessary to
verify the orthogonality conditions (3.1.48), which are the most important
in practical calculations.
This was verified in -[107],
- where it was shown that all properties of
the polynomials H and G remain valid for complex variables if only all
functions H u l...un (y), G,, ...,. (y), exp[i(yTFy)1, and e x p [ $ ( ~ ~ - l pare
)l
considered as functions of the initial real variables x of the problem. To
1
this end, we need to make the change of variables y = V- x in all these
functions. In particular, in this case, the orthogonality condition (3.1.48)
has the form

where the matrices PI and P satisfy the relation

similar to (3.1.31).
Thus, we obtain the following algorithm for constructing the solution
f (x) of Eq. (3.1.11). First, we seek a stationary density po(x) satisfying
(3.1.14) and a n operator L; satisfying (3.1.33). Then we transform prob-
lem (3.1.11) to problem (3.1.36). After this, to find the eigenfunctions and
eigenvalues of problem (3.1.38), we need to calculate the matrix 7 that
transforms the matrix G to the diagonal form {XI, . . ., A,} by the simi-
1 -
larity transform 7- GV. Next, using the known Xi and 7and (3.1.42),
1
we calculate the matrices P- and ?5 that determine the stationary distri-
bution (3.1.41). The expression obtained for %(y) enables us to find the
eigenfunctions z, = z,, ...mm (3.1.44) for problem (3.1.38) and the orthog-
onal polynomials G, ,... m n (3.1.46).
Finally, we seek the function z(x) satisfying (3.1.36) in the form of the
series with respect t o the eigenfunctions:
Approximate Synthesis of Stochastic Control Systems 153

where am,.,,mnare unknown coefficients; the eigenfunctions zml...mn(x) can


1
be calculated by formulas (3.1.44) with y = V - x.
If we also represent the right-hand side 4 ( x ) = p ~ ( x ) p ( x in
) (3.1.36) as
the series

po(x)p(x) = 5
ml...mn=O
bml ...mnzml...mn (XI, (3.1.52)

where, in view of (3.1.49),

then we can calculate the unknown coefficients in (3.1.51) by the


formula

which follows from (3.1.38) and (3.1.43).


Now we see that (3.1.37) implies the expression

for the solution of the initial equation (3.1.11).


The algorithm obtained for solving (3.1.11) can be used for calculating
the successive approximations (3.1.9) and (3.1.10) It remains only to solve
the problem of how to choose the stationary losses -yk (k=0,1,2, . . . ) in
Eqs. (3.1.9) and (3.1.10).

3.1.4. Calculation of the parameters -yk (k: = 0 , 1 , 2 , . . .). The


structure of the solution (3.1.55) and a natural requirement that the station-
ary loss function f (x) must be finite imply that there is a unique method for
choosing -yk. Indeed, since, according to (3.1.54), the eigenvalue Aoo...o = 0,
the coefficient aoo.,.o in (3.1.46) is finite if a necessary condition boo...o = 0
is satisfied, or, more precisely, (in view of (3.1.53) and (3.1.46)) if we have
154 Chapter I11

This relation, (3.1.9) and (3.1.10) imply the following expressions for the
stationary losses y k :

Ip0(x) dxl . . .dx,,


ax

Thus, we have completely solved the problem of how to calculate the succes-
sive approximations (3.1.9), (3.1.10) for the stationary operating conditions
of the optimal stabilization system.
If the loss function f k (x) in the kth approximation is calculated, then the
quasioptimal control uk (x) in the kth approximation is completely defined,
namely, in view of (3.0.8) and (3.1.6), we have

uk(x) = -{Eu,~, . . .,EU,,} sign (QT -

In the next section, using this general algorithm for approximate syn-
thesis, we shall calculate a special system of optimal damping of random
oscillations when the plant is a linear oscillating system with one degree of
freedom.

$3.2. Calculation of a quasioptimal


regulator for the oscillatory plant
In this section we consider the stabilization system shown in Fig. 13,
in which the plant P is an oscillatory dynamic system described by the
equation
+
Z+p5 x = u + &[(t), (3.2.1)
where the absolute value of the scalar control u is bounded,

the scalar random process [(t) is the standard white noise (1.1.31), and P,
B, and E are given positive numbers (p < 2).
Equations of the type of (3.2.1) describe the motion of a single mass
point under the action of elastic forces, viscous friction, controlling and
random perturbations. The same equation describes the dynamics of a
direct-current motor controlled by the voltage applied to the armature when
Approximate Synthesis of Stochastic Control Systems 155

the load on the shaft varies randomly. Examples of other actual physical
objects described by Eq. (3.2.1) can be found in [2, 19, 27, 1361.
For system (3.2.1), (3.2.2), it is required to calculate the optimal regu-
lator (damper) C (see Fig. 13), which will damp, in the best possible way
with respect to the mean square error, the oscillations constantly arising
in the system due to random perturbations ((t). More precisely, as the
optimality criterion (3.1.2), we shall consider the functional

which has the meaning of the mean energy of random oscillations in system
(3.2.1). Note that the mean square criterion (3.2.3) is used most frequently
and this criterion corresponds to the most natural statement of the optimal
damping problem [I, 501. However, there are other statements of the prob-
+
lem with penalty functions other than the function c(x) = x2 i2exploited
in (3.2.3). From the viewpoint of the method used here for solving the syn-
thesis problem, the choice of the penalty function is of no fundamental
importance.
To make the problem (3.2.1)-(3.2.3) consistent with the general state-
ment treated in 53.1, we write Eq. (3.2.1) as the following system of two
first-order equations for the phase coordinates x l and 2 2 (these variables
can be considered as the displacement x l = x and the velocity x2 = i ) :

Using the vector-matrix notation, we can write system (3.2.4) in the form
(3.1.1), where A, Q, and u are the matrices

According to $3.1, under the stationary operating conditions (T + co in


(3.2.3)), the desired optimal damper C (Fig. 13) is a relay type regulator
described by the equation (see (3.1.6))

U* (xl, 2 2 ) = -E sign
(S)-
Here f = f ( x l , x 2 ) is the loss function satisfying the stationary Bellman
equation (see (3.1.8))
156 Chapter I11

where, according to (3.1.5) and (3.2.5),

The equation

determines a switching line for the optimal control action (from u = +E to


u = -E or backwards) on the phase plane ( X I , 2 2 ) . The goal of the present
section is to obtain explicit expressions for the control algorithm (3.2.6) and
the switching line (3.2.9). To this end, it is necessary to solve Eq. (3.2.7).
We shall solve this equation by the method of successive approximations
discussed in $3.1.
First, we shall prepare the mathematical apparatus for calculating the
successive approximations. A straightforward verification shows that the
stationary distribution with the density function po(x) = po(xl, xz), satis-
fying the equation (see (3.1.14))

Hence, the matrices P and P-I in (3.1.15) are equal to

It follows from (3.2.11) and (3.1.35) that in this case the matrix G of the
operator (3.1.34) coincides with the transpose matrix AT, that is, according
to (3.2.5), we have

and the operator (3.1.34) has the form


a a B a2
LT = 22-
axl + -ax2
-(px2 - + --.
2 ax;
One can readily see that the same probability density (3.2.10) satisfies the
stationary equation LTpo(x1, 2 2 ) = 0. Therefore, in this case, the ma-
trix PI from (3.1.49) and (3.1.50) coincides with the matrix P determined
by (3.2.11).
Approximate Synthesis of Stochastic Control Systems 157

The matrix V that reduces (3.2.12) to the diagonal form by the similarity
transform is equal to

(3.2.14)
This expression and formulas (3.1.50) and (3.2.12) imply

1
Correspondingly, the inverse matrix P- has the form

The matrices (3.2.15) and (3.2.16) allow us to calculate the two-dimensional


Hermitian polynomials
ae+m
Hem = (-qe+"exp [ $ ( Y T F Y ) ]7 exp [ - % ( Y P Y ) ]-
1 T-
ad 8 ~ 2 (3.2.17)
ae+m
Gem = (-I)eim ~ X P[t(r P
T--1
PI] aa:apB exp [ - i ( p T ~ - l p ) ]
-
p = Py. (3.2.18)
Then these polynomials must be represented as functions of x l and 2 2
by using the formula x = V y and expression (3.2.14) for the matrix V .
Table 3.2.1 shows some first polynomials H and G.
In this case, in view of (3.1.51)-(3.1.55), (3.2.7), (3.2.10), and (3.2.11),
the solutions of the equations of successive approximations (3.1.9), (3.1.10)
can be written in terms of the Hermitian po1ynomiaIs H e m ( x l , 2 2 ) as the
series

(3.2.19)
where the coefficients bFm are calculated by the formulas
Chapter I11

Polynomials H

Polynomials G

Expressions for the polynomials G20, Gllr . . . can be obtained


-- 1
from the
corresponding expressions for He, by the change 5, -+ Ppq , ~lp+ yp.

Before we pass to the straightforward calculation of the successive ap-


proximations fk, we make the following two remarks about some singular-
Approximate Synthesis of Stochastic Control Systems 159

ities of the series on the right-hand side of (3.2.19).


REMARK3.2.1. In practice, the series (3.2.19) is usually replaced by a
finite sum. T h e number of terms of the series (3.2.19) left in this sum is
determined by the rate of convergence of the series (3.2.19). Here we do
not discuss this question (see, for example, [26, 166, 1791). However, in our
case, the series (3.2.19) cannot be truncated in a n arbitrary way, since it
contains complex terms such as the polynomials Hemand the coefficients
aFm (this follows from (3.2.14)-(3.2.18) and (3.2.20)). At the same time,
the loss function fk(xl, x2) represented by this series has the meaning of a
real function for real arguments. Therefore, truncating the series (3.2.19),
we must remember that a finite sum of this series determines a real function
only if the last terms of this sum contain all terms with He, of a certain
+
group (namely, all Hemwith l m = s , where s is the highest order of the
polynomials left in the sum (3.2.19)).
REMARK3.2.2. Equation (3.2.7), as well as the corresponding equa-
tions of successive approximations (3.1.9) and (3. l.l o ) , is centrally sym-
metric (such equations remain unchanged after the substitution (xl, 22) t
(-21, -22)). Therefore, the series (3.2.19) must not contain terms for which
+
the sum (l m ) is odd, since the polynomials Hemwith odd (e m ) are +
not centrally symmetric (see Table 3.2.1). If we take this fact into account,
then the body of practical calculations is considerably reduced.
In what follows, we present the first two approximations calculated ac-
cording to (3.1.9) and (3.1.10) and the quasioptimal control algorithms
U O ( X ~ , X ~and
) ul(x1,22) corresponding to these approximations.
The zero approximation. First of all, let us calculate the parameter
-yo of specific stationary losses in the zero approximation. From (3.1.56)
+
with regard to c(x) = xf x; and (3.2.10), we have

7 =
' & (x: + x i ) exp [ - (x: + xi)] dxldx2.
Calculating the integral, we obtain

In view of Remark 3.2.2, the first coefficients aYo and a:, in the series
(3.2.19) are equal to zero.8 The coefficients bgo, b!, and b,: can be cal-
culated by using the formulas for G20, GI1, and Go2 from Table 3.2.1 and

'The same result can be obtained if we formally calculate the coefficients byo and b:l
by using (3.2.20).
160 Chapter I11

(3.2.16). Then, according to (3.2.20), the coefficient b!& has the form

The integral in (3.2.22) can readily be calculated, thus, taking into account
(3.2.21) and (3.2.14), we obtain

In a similar way, we can easily find

All other coefficients b;,, with t+m > 2 are zero in view of the orthogonality
condition (3.1.49).
According to (3.2.19), it follows from (3.2.23) and (3.2.24) that

Finally, using the formulas for H20, Hll, and H o from~ Table 3.2.1 and
(3.2.25), we obtain the loss function in the zero approximation

This relation and condition (3.2.9) imply the following equation for the
zero-approximation switching line rO:
Approximate Synthesis of Stochastic Control Systems 161

In this case, the quasioptimal control algorithm uo (x) in the zero approxi-
mation has the form

REMARK 3.2.3. The loss function fo(xl, 2 2 ) in the zero approximation


(without a constant term) and the parameter of stationary losses (3.2.21)
can be calculated in a different way, without using the method considered
above. Indeed, if we first seek the solution of the zero approximation equa-
tion (L is the operator (3.2.8))

Lfo = y 0 -2,
2
-2,
2

as the quadratic form

with unknown coefficients hll, h12, and h22, then, substituting this ex-
pression into (3.2.39), we obtain four equations for hll, hlz, h22, and
However, higher approximations cannot be obtained by this simple reason-
ing.
The first approximation. It follows from (3.1.10) and (3.2.26) that
in the first approximation we need to solve the equation

This equation can be solved by analogy with the zero-approximation equa-


tion (3.2.29), but the calculations are much more cumbersome due to more
complicated expression on the right-hand side.
First, we employ (3.1.57) and (3.2.21) to find the specific stationary
losses

1 B
7 = --
P
*JJ_m_ + ix2/
TB
lzl exp [ - $(x: + xi)] dxldx2,
then, after the integral is calculated, we obtain

The coefficients a&,, a i l , . . . in (3.2.19) are calculated by (3.2.19) and


(3.2.20) with regard to the formulas for Gem from Table 3.2.1. We omit
162 Chapter I11

the intermediate calculations and write the final expression for f l ( x l , 22).
Taking only the first terms in the series (3.2.19) u p to the fourth order
inclusively (that is, omitting the terms for which (l+ m) > 4), we obtain
the following expression for the loss function in the first approximation:

The condition P2 << 1 has also been used for calculating the coefficients
(3.2.32).
From (3.2.9) and (3.2.31) we obtain the following equation for the switch-
ing Iine r1 in the first approximation:

It follows from the continuity conditions that for small E the switching
line r1 is close to r0 determined by Eq. (3.2.27). Therefore, if we set
2 2 = -(/3/2)21 in the terms of the order of E in (3.2.33), then we obtain
errors of the order of E' in the equation for rl. Using this fact and formulas
(3.2.32) and (3.2.33), we arrive a t the following expression with accuracy
up to the terms of the order of O ( E ~ ) :

Figure 23 shows the position of the switching lines r0 and I" on the phase
plane ( ~ 1 ~ 2 2 ) .
The switching line (3.2.34) determines the quasioptimal control algo-
rithm in the first approximation:

'We do not calculate the coefficients v and p and the constant term "const" in (3.2.31)
since they do not affect the position of the switching line and the control algorithm in
the first approximation.
Approximate Synthesis of Stochastic Control Systems 163

This algorithm can easily be implemented with the help of standard blocks
of analog computers. The corresponding block diagram of a quasioptimal
control system for damping of random oscillations is shown in Fig. 24, where
1 and 2 denote direct-current amplifiers with amplification factors
164 Chapter I11

In conclusion, we dwell on another statement that follows from the


calculations of the first approximation. Namely, all expressions with a
small parameter contain this parameter in the form of the product ~ a = r
€1-, This statement concerns the loss function (3.2.31), the switching
line (3.2.34), and the formula (3.2.30) for stationary specific losses, which
can be written in the form

or, more briefly,

if the condition ,B2 << 1 is used in the same way as for calculating (3.2.32).
As was already noted, the method of successive approximations exploited
in the present section is efficient if the nonlinear term of the Bellman equa-
tion contains a small parameter E (we discuss this question in 33.4 in detail).
However, in the problem considered here, the convergence was ensured, in
fact, by the parameter €1-. If we recall that, by the conditions of
problem (3.2.2), the parameter E determines the values of admissible con-
trol, then it turns out that this variable need not be small for the method
of successive approximations to be efficient. Only the relation between the
limits of the admissible control and the intensity of random perturbations B
is important.
All this confirms our assertion made a t the beginning of this chapter
that the method of successive approximations considered here is convenient
for solving problems with bounded controls when the intensity of random
perturbations is large.

33.3. Synthesis of quasioptimal


controls in the case of correlated noises
Now we shall show how the method of successive approximations studied
in this chapter can be used for constructing quasioptimal controls when
random actions on the system are not white noises. Instead of the system
shown in Fig. 13, we shall consider a stabilization system of a somewhat
more general form (see Fig. 25), where in addition to random actions [(t)
on the plant we also take into account the noise rl(t)in the feedback circuit.
Let the controlled plant P be described, just as in $3.1, by the system
of linear differential equations with constant coefficients
Approximate Synthesis of Stochastic Control Systems 165

where xT = (21,. . ., x,), uT = (u1, . . . , u,), tT(t) = ([~(t),


. . ., tm(t)),
and
the constant matrices A, Q, and r are of dimensions n x n , n x r , and
n x m, respectively. Block 1 in Fig. 25 is assumed to be a linear inertialess
device described by the equation

where yT = (yl,. . . , yl), rlT = ( ~ 1 ,... , ve), and C and D are constant
matrices of dimensions e x n and e x 1,respectively, (det D # 0). The goal
of control is to minimize a functional of the form

We assume that the random perturbations [(t) and ~ ( taffecting


) the system
are independent diffusion processes with drift coefficients

and matrices of local diffusion coefficients BE and B, (G and Bt are m x m


dimensional constant matrices; H and B, are e x e dimensional constant
matrices; the matrices BE and B, are symmetric, BE is a nonnegative defi-
nite matrix and B, is a positive definite matrix). It is well known that in
this case the diffusion processes [(t) and ~ ( t are
) Gaussian.
The stated problem is a special case of the synthesis problem treated in
$1.5. This problem is characterized by the fact that the controlled process
x(t) is not a Markov process (in contrast, say, with problems considered in
$3.1 and $3.2; moreover, x(t) is a nonobservable process), and therefore, to
describe the controlled system shown in Fig. 25, we need a special space Xt
166 Chapter I11

of states. This space was called the space of sufficient coordinates in $1.5
(see also [171]). As was shown in $1.5, in this case, as sufficient coordinates,
we must use a system of parameters that determine the current a posteriori
probability density of nonobserved stochastic processes:

The a posteriori density (3.3.5) satisfies a stochastic partial differential


equation, which is a special case of Eq. (1.5.39). It follows from 5 1.5 that to
write an equation for the density (3.3.5), we need to use a priori probability
+ +
characteristics of the (n m l)-dimensional stochastic Markov process
( x ( t ) , t ( t ) ,Y(t)).l0
It follows from (3.3.1), (3.3.2), and (3.3.4) that the combined process
(x (t),[ ( t ) y(t)) , has the drift coefficients

and the matrix of local variances

The matrices introduced in (3.3.6) and (3.3.7) are

''In this case, the control u in (3.3.1) is assumed to be a given known vector at each
time instant t .
Approximate Synthesis of Stochastic Control Systems 167

Using (3.3.6) and (3.3.7), we obtain the following equation for the a
posteriori probability density (3.3.5):"

Here p(t, z) = pt (x, <) denotes the a posteriori density (3.3.5), z denotes
the vector (x,[), a, is the vector composed of the vector-columns a, and
a t , the matrix B, is a part of the matrix (3.3.7) consisting of its first
(n + m) rows and columns, Eps denotes the a posteriori averaging of the
corresponding expressions (that is, the integration with respect to z with
the density p(t, 2 ) ) .
It follows from (3.3.6)-(3.3.8) that the matrix B, is constant, the com-
ponents of the vector a, are linear functions of z, and the expression in the
square brackets in (3.3.9) linearly and quadratically depends on z. There-
fore, as shown in $1.5 (see also [170, 175]), the a posteriori density p(t, z)
satisfying (3.3.9) is Gaussian, that is,
p(t, z) = [ ( 2 7 ~ ) ~ det
+ K(t)]-'I2
x exp [ - i ( z - ~ ( t ) ) ~ K - ' ( t ) (-z Z(t))], (3.3.10)
if the initial (a priori) density p(0, z) = po(z) is Gaussian (this is assumed
in the sequel).
Substituting (3.3.10) into (3.3.9), one can obtain a system of differential
equations for the parameters 2 and K-' of the a posteriori probability
density (3.3.10). One can readily see that this system has the form

(in our special case, the system (1.5.52) acquires the form (3.3.11), (3.3.12)).
If instead of K - I we use the inverse matrix K (which is the matrix of
a posteriori covariances), then the system (3.3.11), (3.3.12) can be written
in the form

"TO derive (3.3.9) from (1.5.39), we need to recall that, according to the notation
used in (1.5.39), the vector A, coincides with the vector a,, the vector A, with a y , and
the structure of the diffusion matrix (3.3.7) implies the following relations between the
matrices: llBapll = Bz, IIBaull = 0 , llFupll = By1, and llBoPll= B Y .
Chapter I11

Here u,, V, W are the matrices

where, in turn, k,,, k,(,. . . are elements of the block covariance matrix K ,

(the dimensions of a block are determined by the dimensions of its sub-


scripts; for example, kxE is of dimension n x m).
The loss function for problem (3.3.1)-(3.3.3)

F ( t , i t , Kt) = min Eps [l


T
c(x(r), u ( r ) ) d r 1 yi]
t<r<T

= min Eps
ufr)
[1
T
c ( x ( r ) ,u ( r ) ) d i 1 S(t) = i t , K (t) = K t
I
t<;<~
(3.3.16)
is completely determined by the time instant t and the current values of
the parameters (St,K t ) of the a posteriori density (3.3.10) a t this instant
of time. It follows from the definition given in $1.5 that ( Z ( t ) , ~ ( t ) are
)
sufficient coordinates for problem (3.3.1)-(3.3.3).
The Bellman equation (1.5.54) for the function (3.3.16) can readily be
obtained in the standard way from the Eqs. (3.3.13), (3.3.14) for the suffi-
cient coordinates. However, it should be noted that, in this case, the system
(3.3.13), (3.3.14) has a special feature that allows us to exclude the a pos-
teriori covariance K ( t ) from sufficient coordinates. The point is that, in
contrast, say, with a similar system (1.5.53), the matrix equation (3.3.14) is
independent of controls u and in no way related to the system of differential
equations (3.3.13) for the a posteriori means Z(t). This allows us first to
solve the system (3.3.14) and calculate the matrix of a posteriori covari-
ances K ( t ) in the form of a known function of time on the entire control
interval 0 < -t <- T (we solve (3.3.14) with the initial matrix K(0) = KO,
where KO is the covariance matrix of a priori probability density po(z)).
If K ( t ) is assumed to be known, then in view of (3.3.8) and (3.3.15) we
can also assume that the matrix a, in (3.3.13) is a known function of time,
u,(t), and the loss function (3.3.16) depends on the set ( t , i t ) . Therefore,
Approximate Synthesis of Stochastic Control Systems 169

instead of Eq. (1.5.54) for the loss function F ( t , 3, we have the Bellman
equation of the form

(here N (2, K (t)) denotes the normal probability density (3.3.10) with the
vector of mean values Zand the covariance matrix K(t)).
Just as in $3.1 and $3.2, Eq. (3.3.17) becomes simpler if we consider
the stationary operating conditions for the stabilization system shown in
Fig. 25. The stationary operating conditions established during a long
operating time interval (which corresponds to large time t ) can exist if only
there exists a real symmetric nonnegative definite matrix K, such that

and this constant matrix K, is an asymptotically stable soIution of (3.3.14).


Let us assume that this condition is satisfied. Denoting the mean "con-
trol losses" per unit time under the stationary operating conditions, as
usual, by y, we can define the stationary loss function

f(Z)= lim [ F ( t , q -
T+w

for which, from (3.3.17), we derive the time-invariant Bellman equation

In (3.3.19) a, is the matrix a, (see (3.3.8) and (3.3.15)), where k,,, let,, . . .
are the corresponding blocks of the matrix K, determined by (3.3.18).
In some cases, it is convenient to solve Eq. (3.3.19) by the method of
successive approximations treated in $3.1 and $3.2. The following example
shows how this method can be used. Let us consider the simplest version of
the synthesis problem (3.3.1)-(3.3.3) in which Eqs. (3.3.1), (3.3.2) contain
scalar variables instead of vectors and matrices. In (3.3.3) we write the
penalty function c(x, u) in the form
170 Chapter I11

where E > 0 is a small parameter. From the "physical" viewpoint, this


penalty function means that the control actions are penalized much more
strongly than the deviations of the phase coordinate x(t) of the control
system (3.3.1) from the equilibrium state x = 0.
For simplicity, we set Q = a = C = D = 1, A = -a, G = -g, and H =
-h (a, 9, h are given positive numbers) in (3.3.1), (3.3.2), and (3.3.4). Then
the optimality filtration equations (3.3.13) and (3.3.14) have the following
(nonmatrix) form

~=-az+f+u+~[$-(h-a)~-~-u+h~],
-
B,
- - a t -
= - 9 ~+ -[G- ( h - a ) 5 - ( - u+ ~ Y I ,
(3.3.21)

B,

In this case the Bellman equation (3.3.19) has the form

Here the constants u,* and a t * are

where the constant covariances k;,, k&, and kit form the stationary solu-
tion of the system of differential equations (3.3.22).
Passing to the new variables xl = (&/a,*)Z, x2 = ( & / a r * ) r a n d
denoting by L the linear operator

where r = u,*/up, we can rewrite Eq. (3.3.23) as


Approximate Synthesis of Stochastic Control Systems 171

where b = gX./&. Taking into account the formulas

= (2~kf,)-'/~ Jm1x1exp (- -(x


-m
1
2Gx
- bx1)?

and minimizing the expression in the square brackets, we obtain from


(3.3.25) the optimal control for the stationary stabilization conditions:

where the function f (xl, 2 2 ) satisfies the nonlinear elliptic equation

Equation (3.3.28) is similar to Eqs. (3.1.8) and (3.2.7), therefore, in this


case, we can use the same method of approximate synthesis as in $3.1
and $3.2. Then the quasioptimal control uk(xl, 22) in the kth approxima-
tion is determined by the formula

where the functions fk(xl,x2) satisfy the linear equations of successive


approximations

In this case, the calculations of successive approximations f k (xl, 2 2 ) are


completely similar to those discussed in $3.1 and $3.2. Therefore, here
172 Chapter I11

we restrict our consideration to a brief description of the calculation of


fk(xl, x2). We only dwell upon the distinctions in formulas.
The operator (3.3.24) can be written in the form (3.1.5) if A = IIAijll:
and B = \lBijll! in (3.1.5) are understood as the matrices

The stationary density po(x) satisfying (3.1.14) has the form (3.1.15), and
the matrices P and P-l, as one can readily see, have the form

(p=a+g, v=a-gPr,andp=r+2g).
Using (3.3.31), we can find the matrix (see (3.1.35))

as well as the matrix

By the sirr~ilaritytransform, this matrix reduces the matrix (3.3.32) to the


diagonal form

It follows from (3.1.44), (3.1.51), and (3.1.55) that solutions of the equa-
tions of successive approximations (3.3.30) can be represented as the series

where He, (xl, 2 2 ) are two-dimensional Hermitian polynomials calculated


1 1
by the formulas (3.2.17) with y = 7- x (the matrix 7-is inverse to
(3.3.33)).
Approximate Synthesis of Stochastic Control Systems 173

The coefficients a:m are calculated by the formula (see (3.2.19))

and the coefficients

him = det1I2
2*t!m!
P
IS_,"
~tm(~")I~=vTp~
+
exp [ - ( x T p x ) ] w(x)
~ dx1dx2
(3.3.37)
are expressed in terms of the group of Hermitian polynomials Gem(xl,x2)
orthogonal to Hem(xl, x2) and calculated by (3.2.18).
Parallel to the calculations of the successive approximations to the loss
function (3.3.35), we calculate specific stationary losses -yk (corresponding
to the kth approximation) from the condition bEo = 0. So, in the zero
approximation we have

hence, performing simple calculations and taking into account (3.3.31), we


obtain

Next, using the obtained value of and formulas (3.3.26), (3.3.30),


(3.3.36), and (3.3.27), we can calculate any desired number of coefficients
aim in the series (3.3.35). With the help of these coefficients, we can con-
struct an approximate expression for the function fo(xl, x2), which allows
us to derive an explicit formula for the quasioptimal control algorithm
uo(xl, 22) in the zero approximation and to calculate the variables
fi(x1, x2), and ul(xl, x2) related to the first approximation.
Here we write explicit formulas neither for fo(xl, 2 2 ) nor for f l (XI,x2),
since they are very cumbersome. We only remark that in this case all qua-
sioptimal control algorithms (3.3.29) are nonlinear functions of the phase
variables (xl, x2); moreover, the character of nonlinearity is determined by
the number of terms left in the series (3.3.35) for the calculations.
174 Chapter I11

Thus, from the preceding it follows that the methods for calculations
of stationary operating conditions of the stabilization system (Fig. 13) can
readily be generalized to the case of a more general system with correlated
noise (Fig. 25) if the noise is a Gaussian Markov process. In this case, the
optimal system is characterized by the appearance of an optimal filter in
the regulator circuit; this filter is responsible for the formation of sufficient
coordinates. In our example (Fig. 25), where x, y, u, t, and 7 are scalar,
this filter is described by Eqs. (3.3.21). The circuit of functional elements
of this closed-loop control system is shown in Fig. 26.

Blocks P and 1 are units of the initial block diagram (Fig. 25). The
rest of the diagram in Fig. 26 determines the structure of the optimal con-
troller. One can see that this controller contains standard linear elements of
analog computers such as integrators, amplifiers, adders, etc. and one non-
linear converter NC, which implements the functional dependence (3.3.29).
Units of the diagram marked by ">" and having the numbers 1 , 2 , . . . , 8
are amplifiers with the following amplification factors Ki:
Approximate Synthesis of Stochastic Control Systems 175

33.4. Nonstationary problems. Estimates


of the quality of approximate synthesis
3.4.1. Nonstationary synthesis problems. If equations of a plant
are time-dependent or if the operating time T of a system is bounded, then
the optimal control algorithm is essentially time-varying, and we cannot
find this algorithm by using the methods considered in $33.1-3.3. In this
case, to synthesize an optimal system, it is necessary to solve a time-varying
Bellman equation, which, in general, is a more complicated problem. How-
ever, if the plant is governed by a system of linear (time-varying) equations,
then we can readily write solutions of the successive approximation equa-
tions (3.0.6), (3.0.7) in quadratures.
Let us show how this is done. Just as in 33.1, we consider the synthesis
problem for the stabilization system (Fig. 13) with a plant P described by
equations of the form

where x is an n-dimensional vector of phase coordinates, u is an T-dimensi-


) respectively, given n x n,
onal vector of controls, A(t), Q(t), and ~ ( t are,
n x T, and n x n matrices continuous for all t E [0, TI, and [(t) is the
n-dimensional standard white noise (1.1.34). To estimate the quality of
control, we shall use the following criterion of the type of (1.1.13):

and assume that the absolute values of the components of the control vector
u are bounded by small values (see (3.1.3)):

According to (3.1.6) and (3.1.7), the optimal control u*(t,x) for problem
(3.4.1)-(3.4.3) is given by the formula

U, (t, x) = - {EU,~,. . .,EU,,} sign

where the loss function F ( t , x) satisfies the equation


176 Chapter 111

with Lt,, denoting a linear parabolic operator of the form

For the function @(t,


a F / d x ) , we have the expression

In this case, the function F ( t , x) must satisfy (3.4.5) for all x E R,, 0 _<
t < T, and be a continuous continuation of the function

as t -+T (see (1.4.22)).


The nonlinear equation (3.4.5) is similar to (3.0.5) and, according to
(3.0.6) and (3.0.7), can be solved by the method of successive approxima-
tions. To this end, we need to solve the sequence of linear equations

(all functions Fk(t,x) determined by (3.4.9) and (3.4.10) must satisfy con-
dition (3.4.8)). Next, if we take Fk(t, x) as an approximate solution of
Eq. (3.4.5) and substitute Fk into (3.4.4) instead of F, we obtain a qua-
sioptimal control algorithm u k ( t ,x) in the kth approximation.
Let us write the solutions Fk(t, x), k = 0,1,2,. . ., in quadratures. First,
let us consider Eq. (3.4.9). Obviously, its solution Fo(t, z ) is equal to the
value of the cost functional

on the time interval [t,T] provided that there are no control actions. In this
case, the functional on the right-hand side of (3.4.11) is calculated along the
trajectories x(T), t 5 T _< T, that are solutions of the system of stochastic
differential equations

describing the uncontrolled motion of the plant ( u r 0 in (3.4.1)).


Approximate Synthesis of Stochastic Control Systems 177

It follows from $1.1and 31.2 that the solution of (3.4.12) is a continuous


Markov process X(T) ( a diffusion process). This process is completely de-
termined by the transitive probability density function p(x, t; z, T), which
determines the probability density of the random variable z = X ( T ) if the
stochastic process x(t) was in the state x(t) = x a t the preceding time
moment t. Obviously, by using p(x, t; z, T ) , we can write the functional
(3.4.11) in the form

On the other hand, we can write the transitive density p(x, t; z, T) for the
diffusion process X(T) (3.4.12) as an explicit finite formula if we know the
fundamental matrix X ( t , T) for the nonperturbed (deterministic) system
.i= A(t)z.
Indeed, since Eqs. (3.4.12) are linear, the stochastic process X ( T ) satisfy-
ing this equation is Markov and Gaussian. Therefore, for this process, the
transitive probability density has the form

p ( x , t ; z, T) = [ ( 2 ~det
) ~~ ] - ~ / ~ e x ~ [ --$u()z~ D - ' ( -
~ a)], (3.4.14)

where a = Ez = E ( X ( T ) I x(t) = x) is the vector of mean values and


D = E[(z - Ez)(z - E z ) ~ is
] the covariance (dispersion) matrix of the ran-
dom vector z = x(T). On the other hand, using the fundamental matrix
X ( t , r)12we can write the solution of system (3.4.12) in the form (the
Cauchy formula)

Hence, performing the averaging and taking into account properties of the
white noise (1.1.34), we obtain the following expressions for the vector a
and the matrix D:

>
12Recallthat the fundamental matrix X ( t , T), T t , is a nondegenerate n x n matrix
whose columns are linearly independent solutions of the system i ( r ) = A(r)z(r), SO
that X ( t , t) = E, where E is the identity matrix. Methods for constructing fundamental
matrices and their properties are briefly described on page 101 (for details, see 162, 1111).
178 Chapter I11

Formulas (3.4.13)-(3.4.16) determine the solution Fo(t, x) of the zero-


approximation equation (3.4.9), satisfying (3.4.8), in quadratures. It fol-
lows from (3.4.13)-(3.4.16) that the function Fo(t, x) is infinitely many
times differentiable with respect to the components of the vector x if the
functions c(z) and +(z) belong to a rather wide class (it suffices that the
functions c(z) exp(- $zT D-'z) and +(z) exp(- i z T D-'2) were absolutely
integrable [25]). Therefore, by analogy with (3.4.13), we can write the solu-
tion Fk(t,z) of the successive approximation equations (3.4.10), satisfying
(3.4.8), in the form

To obtain explicit formulas for the functions Fo(t, x), Fl(t, x), . . ., which
allow us to write the quasioptimal control algorithms uo(t, x), ul(t, x), . . .
as finite analytic formulas, we need to have the analytic expression of the
matrix X ( t , T) and to calculate the integrals in (3.4.13) and (3.4.17). For
autonomous plants (the case where the matrix A(t) in (3.4.1) and (3.4.12)
is constant, A(t) G A = const), the fundamental matrix X ( ~ , T has ) the
form of a matrix exponential:

whose elements can be calculated by standard methods. On the other hand,


it is well known that fundamental matrices of nonautonomous systems can
be constructed, as a rule, by numerical methods.13 Thus for A(t) # const,
it is often difficult to obtain analytical results.
If the plant equation (3.4.1) contains a constant matrix A(t) A =
const, then formulas (3.4.13) and (3.4.17) allow us to generalize the results
obtained in $33.1-3.3 for the stationary operating conditions to the time-
varying case. For example, let us consider a time-varying version of the
problem of optimal damping of random oscillations studied in $3.2.
13Examples of special matrices A ( t ) for which the fundamental matrix of the system
x = A ( t ) x can be calculated analytically, can be found, e.g., in [139].
Approximate Synthesis of Stochastic Control Systems 179

Just as in $3.2, we shall consider the optimal control problem (3.2.1)-


(3.2.3). However, in contrast with $3.2, we now assume that the terminal
time (the upper limit T of integration in the functional (3.2.3)) is a finite
fixed value. By writing the plant equation (3.2.1) in the form of system
(3.2.4), we see that problem (3.2.1)-(3.2.3) is a special case of problem
(3.4.1)-(3.4.3) if

Therefore, it follows from the general scheme (3.4.4)-(3.4.10) that in this


case the optimal control has the form

where for 0 5 t < T the function F ( t , 21, 22) satisfies the equation

and vanished a t the terminal point, that is,

According to (3.4.6) and (3.4.13), the operator Lt,, in (3.4.21) has the form

Let us calculate the loss function Fo(t,X I , 22) of the zero approximation.
In view of (3.4.9), (3.4.21), and (3.4.22), this function satisfies the linear
equation
Lt,,Fo(t, XI., 22) = -2; - x:, 0 5 t < T, (3.4.23)
with the boundary condition

According to (3.4.13), the function Fo(t, x l , 2 2 ) can be written in quadra-


tures
180 Chapter I11

where the transition probability density p(x, t ; z , T) is given by (3.4.14).


It follows from (3.4.15) and (3.4.16) that to find the parameters of the
transition density we need to calculate the fundamental matrix (3.4.18).
Obviously, the roots X1 and X2 of the characteristic equation det(A -
XE) = 0 of the matrix A given by (3.4.19) are

From this and the Lagrange-Silvester formula [62] we obtain the following
expression for the fundamental matrix (3.4.18) (here p = (T - t)):

$ sin Sp + 6 cos Sp
- -1
- e-Ppf2
S -sinSp b cos Sp $ sin bp
sin-sp 1 (3.4.26)

It follows from (3.4.15), (3.4.16), and (3.4.26) that in this case the vector
of means a and the variance matrix D of the transitive probability density
(3.4.14) have the form

+ $xl) sin Sp
a = e-pp/2
/I XI
22
+
cos Jp f (xz
cos s p - $ (21 + f x 2 ) sin Sp (3.4.27)

1 1
p~(p)=-(l-e-~~), p2(p)=4~+e-Pp(26sin26p-~cos26p)],
P
p3 (p) = 26 - e-Pp ( p sin 2Sp + 26 cos 2Sp). (3.4.28)

Substituting (3.4.14) into (3.4.25) instead of p(x, t ; z , T ) , integrating, tak-


ing into account (3.4.27) and (3.4.28), and performing some easy calcula-
Approximate Synthesis of Stochastic Control Systems 181

tions, we obtain the following final expression for the function Fo(t, x l , 22):

+ Pxlx2 + x i )
I
where 7 = T - t.
Let us briefly discuss formula (3.4.29). If we consider the terms on
the right-hand side of (3.4.29) as function of "reverse" time 7 = T - t ,
then these terms can be divided into three groups: infinitely increasing,
damping, and independent of p as 7 -+ oo. These three types of terms have
the following physical meaning. The only infinitely growing term (B/P)p in
(3.4.29) shows how the mean losses (3.4.11) depend on the operating time
in the mode of stationary operating conditions. Therefore, the coefficient
B / P has the meaning of the specific mean error y, which was calculated
in 53.2 by other methods and for which we obtained = B / P in the
zero approximation (see (3.2.21)). Next, the terms independent of p (in
the braces in (3.4.29)) coincide with the expression for the stationary loss
function obtained in $3.2 (formula (3.2.26)). Finally, the damping terms in
(3.4.29) characterize the deviations of operating conditions of the control
system from the stationary ones.
Using (3.4.29), we can approximately synthesize the optimal system in
the zero approximation, where the control algorithm uo(t, X I , xa) has the
form (3.4.20) with F replaced by Fo. The equation

determines the switching line on the phase plane (XI,x2). Formula (3.4.30)
shows that this is a straight line coinciding with the x-axis as p -+ 0 and
rotating clockwise as p -+ oo (see Fig. 27) till the limit value X I + 2x2/P = 0
corresponding to the stationary switching line (see (3.2.27)).
Formulas (3.4.29) and (3.4.30) also allow us to estimate whether it is
important to take into account the fact that the control algorithm is time-
varying. Indeed, (3.4.29) and (3.4.30) show that deviations from the sta-
tionary operating conditions are observed only on the time interval lying a t
Chapter I11

the distance from the terminal time T. Thus, if the general operating
time T is substantially larger than this interval (say, T >> 3/,0), then we can
use the stationary algorithm on the entire interval [0, TI, since in this case
the value of the optimality criterion (3.2.3) does not practically differ from
the optimal value. This fact is important for the practical implementation
of optimal systems, since the design of regulators with varying parameters
is a rather sophisticated technical problem.
3.4.2. Estimates of the approximate synthesis performance. Up
to this point in the present chapter, we have studied the problem of how
to find a control syste close to the optimal one by using the method of
successive approximations. In this section we shall consider the problem of
how the quasioptimal system constructed in this way is close to the optimal
system, that is, the problem of approximate synthesis performance.
Let us estimate the approximate synthesis performance for the first two
(the zero and the first) approximations calculated by (3.0.6)-(3.0.8). As an
example, we use the time-varying problem (3.4.1)-(3.4.3). We assume that
the entries of the matrices A(t), Q(t), and ~ ( t in ) (3.4.1) are continuous
functions of time defined on the interval 0 5 t 5 T. We also assume that the
penalty functions c(x) and $(x) in (3.4.2) are continuous and bounded for
all x E R,. Then [I241 there exists a unique function F ( t , x) that satisfies
the Cauchy problem (3.4.5), (3.4.8) for the quasilinear parabolic equation
(3.4.5)14 This function is continuous in the strip IIT = (1x1 < m, 0 5 t 5 T }

14We shall use the following terminology: Eq. (3.4.5) is called a quasilinear (semi-
linear) parabolic equation, the problem of solving Eq. (3.4.5) with the boundary condi-
Approximate Synthesis of Stochastic Control Systems 183

and continuously differentiable once with respect t o t and twice with respect
to x for 0 5 t < T; its first- and second-order derivatives with respect to x
are bounded for x E IIT.
One can readily see that in this case

and hence, for small E, the functions Fo(t , x) and Fl (t, x) nicely approximate
the exact solution of Eq. (3.4.5).
To prove relations (3.4.31), let us consider the functions So@,X ) =
F ( t ,x) - Fo(t, x) and S l ( t , x) = F (t, x) - Fl(t, x). It follows from (3.4.5),
(3.4.9), and (3.4.10) that these functions satisfy the equations

Equations (3.4.32) and (3.4.33) differ from (3.4.9) only by the expressions
on the right-hand sides and by the initial data. Therefore, according to
(3.4.13), the functions So and S1 can be written in the form

Since the function @ is continuous (see (3.4.7)) and the components of


the vector dF/dx are bounded, we have I@(T, dF/dz)l <
P for all T E [O, TI;
hence, we have the estimate

tion (3.4.8)is called the Cauchy problem, and the boundary condition (3.4.8)itself is
sometimes called the "initial" condition for the Cauchy problem (3.4.5),(3.4.8). This
terminology corresponds to the universally accepted standards [61,1241 if (as we shall
do in 53.5) in Eq. (3.4.5)we perform a change of variables and use the "reverse" time
p = T - t instead of t . In this case, the backward parabolic equation (3.4.5)becon~es
a "usual" parabolic equation, and the boundary value problem (3.4.5),(3.4.8)takes the
form of the standard Cauchy problem.
184 Chapter I11

The first relation in (3.4.31) is thereby proved.


To prove the second relation in (3.4.31), we need to estimate the dif-
ference S$ = (aF/axi) - (aFo/dxi). To this end, we differentiate (3.4.32)
with respect to xi. As a result, we obtain the following equation for the
function s;:

(in fact, the derivative on the right-hand side of (3.4.37) is formal, since the
function @ (3.4.7) is not differentiable). Using (3.4.13) for s;, we obtain

Integrating (3.4.38) by parts with respect to zi and taking into account


(3.4.14) and (3.4.15), we arrive at

From (3.4.39) we obtain the following estimate for 5';:

Now we note that since Q(t) in (3.4.7) is bounded, the function @(t,y)
satisfies the Lipschitz condition with respect to y:

Using (3.4.40), (3.4.41), and (3.4.35), we obtain

5 E ~ N P V ( T- t), v = C 1/;,
Approximate Synthesis of Stochastic Control Systems 185

which proves the second relation in (3.4.31).


In a similar way, we can also estimate the difference a F / d x i - a F l / d x i =
Si. Indeed, just as (3.4.39) was obtained from (3.4.32), we use (3.4.33) to
obtain

This relation and (3.4.40), (3.4.41) for the function Si readily yield the
estimate

which we shall use later.


According t o (3.0.8), in this case the quasioptimal controls u o ( t ,x ) and
ul(t,x ) are determined by (3.4.4), where instead of the loss function F ( t , x )
we use the successive approximations F'(x, t ) and Fl ( x ,t ) ,respectively. By
G o ( t ,x ) and G l ( x , t ) we denote the mean values of the functional (3.4.11)
calculated on the trajectories of the system (3.4.1)

with the use of the quasioptimal controls uo(t,x ) and u l ( t ,x ) . The func-
tions Gi ( t ,z), i = 0 , 1 , estimate the performance of the quasioptimal control
algorithms ui(t,x ) , i = 0 , l . Therefore, it is clear that the approximate syn-
thesis may be considered to be justified if there is only a small difference
between the performance criteria G o ( t ,x ) and G l ( t ,x ) of the suboptimal
systems and the exact solution F ( t , x ) of Eq. (3.4.5) with the initial condi-
tion (3.4.8).
One can readily see that the functions Go and G I satisfy estimates of
type (3.4.31), that is,

Relations (3.4.45) can be proved by analogy with (3.4.31). Indeed, the


functions Go and G1 satisfy the linear partial differential equations [45,
1571

dGi
LGi ( t ,X ) = - c ( x ) - c ~ ? ( tx, ) ~ ~ -(t, ( t ) x), (3.4.46)
dx
-
G i ( T ,X ) = $ ( x ) , u i ( t , X ) = u i ( t ,x ) / E , i = 0 , l .
186 Chapter I11

This fact and (3.4.9), (3.4.10) imply the following equations for the func-
tions Ho = Fo - Go and H1 = Fl - G I :

Since zTQT% =~ ( t ,z),Eq. (3.4.48) can be rewritten as follows:

It follows from (3.4.4) that Eqs. (3.1.46), (3.4.49) are linear parabolic equa-
tions with discontinuous coefficients. Such equations were studied in [80,
81, 1441. It was shown that if, just as in our case, the coefficients in
(3.1.46), (3.1.49) have discontinuities of the first kind, then, under our
assumptions about the properties of A(t), Q(t), c(x), and $(x), the solu-
tions of Eqs. (3.4.46), (3.4.49) and their first-order partial derivatives are
bounded.
Using this fact, we can readily verify that the right-hand sides of (3.4.47)
and (3.4.49) are of the order of E and e2, respectively. For Eq. (3.4.47), this
statement readily follows from the boundedness of the components of the
vectors dGo/8x and Eo and the elements of the matrix Q. The right-hand
side of (3.4.49) can be estimated by the Lipschitz condition (3.4.41) and
the inequality

which follows from (3.4.40) and (3.4.44). Therefore, for the functions Ho
and H1 we have
IHoINE, IHII-E~. (3.4.50)
To prove (3.4.45), it suffices to take into account the inequalities

and to use (3.4.31) and (3.4.50).


Thus, relations (3.4.45) show that if the Bellman equation contains a
small parameter in nonlinear terms, then the difference between the qua-
sioptimal control system calculated by (3.0.6)-(3.0.8) and the optimal con-
trol system is small and, for sufficiently small E , we can restrict our calcula-
tions to a small number of approximations. We need either one (the zero)
Approximate Synthesis of Stochastic Control Systems 187

or two (the zero and the first) approximations. This depends on the ad-
missible deviation of the quasioptimal system performance criteria Gi (t, z )
from the loss function F ( t , 2).
In conclusion, we make two remarks about (3.4.45).
REMARK3.4.1. One can readily see that all arguments that lead to the
estimates (3.4.45) remain valid for any types of nonlinear functions in (3.4.5)
that satisfy the Lipschitz condition (3.4.41). Therefore, in particular, all
statements proved above for the function @ (3.4.7) automatically hold for
equations of the form (3.0.4) with T-dimensional ball taken as the set U of
admissible controls, instead of an T-dimensional parallelepiped.
REMARK3.4.2. The estimates of the approximate synthesis accuracy
considered in this section are based on the assumption that the solutions of
the Bellman equation and their first-order partial derivatives are bounded.
At first glance it would seem that this assumption substantially narrows the
class of problems for which the approximate synthesis procedure (3.0.6)-
(3.0.8) can be justified. Indeed, the solutions of Eqs. (3.4.5), (3.4.9),
(3.4.10), and (3.4.46) are unbounded for any x E R, if the functions
c(x) and $(x) infinitely grow as 1x1 + m . Therefore, for example, we
must eliminate frequently used quadratic penalty functions from consid-
eration. However, if we are interested in the solution of the synthesis
problem in a given bounded region Xo of initial states x(0) of the con-
trol system, then the procedure (3.0.6)-(3.0.8) can also be used in the case
of unbounded penalty functions. This statement is based on the follow-
ing heuristic arguments. Since the plant equation (3.4.1) is linear and the
matrices A(t), &(t), and a ( t ) and the control vector u are bounded, we
can always choose a sufficiently large number R such that the probability
>
P{supOltLT Ix(t)l R ) becomes arbitrary small [ l l , 45, 1571 for any fixed
domain Xo of the initial states x(0). Therefore, without loss of accuracy,
we can replace the unbounded functions c(x) and $(x) in (3.4.2) (if, in a
certain sense, these functions grow as 1x1 = R + m slower than the prob-
ability -
Iz(t)l 2 R ) decreases as R -t m ) by the expressions
for 1x1 < R,
c(x) for 1x1 2 R,
1x1 = R,
for lxl<R,
for lxl>R,

for which the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) satisfy
the boundedness assumptions.
188 Chapter I11

The question of whether procedure (3.0.6)-(3.0.8) can be used for solving


the synthesis problems with unbounded functions c(x) and $(x) in the
functional (3.4.2) will be rigorously examined in the next section.

$3.5. Analysis of the asymptotic convergence of


successive approximations (3.0.6)-(3.0.8) as k + oo
The method of successive approximations (3.0.6)-(3.0.8) can also be used
for the synthesis of quasioptimal control systems if the Bellman equation
does not contain a small parameter in nonlinear terms. Needless to say
that (in contrast with Section 3.4.2 in $3.4) the first two approximations,
as a rule, do not approximate the exact solution of the synthesis problem
sufficiently well. We only hope that the suboptimal system synthesized on
the basis of (3.0.9) is close to the optimal system for large Ic. Therefore,
we need to investigate the asymptotic behavior as k 4 oo of the functions
Fk (t, x) and uk (t, x) in (3.0.6)-(3.0.8). The present section deals with this
problem.
Let us consider the time-varying synthesis problem of the form (3.4.1)-
(3.4.3) in a more general setting. We assume that the plant is described by
the vector-matrix stochastic differential equation of the form

Here x is an n-dimensional vector of phase coordinates of the system, u is an


r-dimensional vector of controls, [(t) is an n-dimensional vector of random
actions of the standard white noise type (1.1.34), Z(t, x) is a given vector-
function of the phase coordinates x and time t , and q(t) and ~ ( tx), are n x r
and n x n matrices whose elements depend on t and (t, x), respectively. The
conditions imposed on the functions E(t, x), q(t), and Z are stated later in
detail. Here we only note that these functions are always assumed to be
such that for t >- to, 0 < to < T, the stochastic equation (3.5.1) has a
unique solution z ( t ) satisfying the condition x(t0) = xo at least in the weak
sense (see sIV.4 in [132]).
As an optimality criterion, we take the functional (3.4.2),

Here c(x) and $(x) are given nonnegative scalar penalty functions whose
special form is determined by the character of the problem considered (the
requirements on c(x) and +(x) are given later).
The constraints on the domain of admissible controls have the form
(1.1.22),
u € u, (3.5.3)
Approximate Synthesis of Stochastic Control Systems 189

where U C R, is a closed bounded convex set in the Euclidean space R,.


It is required to find a function u, = u,(t, x(t)) satisfying (3.5.3) such
that the functional (3.5.2) calculated on the trajectories of system (3.5.1)
with the control u, attains its minimum value.
In accordance with the dynamic programming approach, solving this
problem is equivalent to solving the Bellman equation that, for problem
(3.5.1)-(3.5.3), reads (see $1.4)

Here Z ( t , x) is a column of functions with components (see (1.2.48))

-a e ( t , x ) = & ( t , x ) + - - g1m i , -
& = I ,...,n. (3.5.5)
2 dx,
Recall that we assumed in $1.2 that throughout this book all stochastic
differential equations written (just as (3.5.1)) in the Langevin form [I271
are symmetrized [174].
By definition, the loss function F in (3.5.4) is equal to

F = F ( t , x ) = min E
u(r)EU

Here E[(-) I x(t) = x] means averaging over all possible realizations of the
>
controlled stochastic process x ( r ) = z u ( ~ ) ( r( )r t ) issued from the point
x a t r = t. It follows from (3.5.6) that

Passing to the "reverse" time p = T - t , we transform Eq. (3.5.4) and the


condition (3.5.7) to the form

LF(p,x)=-c(z)-min
UEU

F(0, 2) = +(XI. (3.5.9)

In (3.5.8) we have the following notation:


190 Chapter I11

ai (p, x) = iii (2, T - p), q(p) = q(T - p), bij (p, x) is a general element of the
matrix $ a ( T - p, x ) ? F T ( ~
- p, x) and, as usual, the sum in (3.5.10) (just
as in (3.5.5)) is taken over repeated indices from 1 to n.
Assuming that the gradient d F / a x of the loss function is a known vector
and calculating the minimum in (3.5.8), we obtain

In addition, we obtain the function

that satisfies the condition

and solves the synthesis problem (after we have solved Eq. (3.5.11) with
the initial condition (3.5.9)). The form of the functions cp and @ depends
on the form of the domain U in (3.5.3) (see (1.3.19)-(1.3.23)).
Equation (3.5.11) is an equation of the form (3.0.5). It differs from
Eq. (3.0.5) only by a small parameter (there is no small coefficient E of
the function 9). Nevertheless, in this case, we shall also use the ap-
proximate synthesis procedure (3.0.6)-(3.0.8) in which, instead of the ex-
act solution F ( p , x) of Eq. (3.5.11), we take the sequence of functions
Fo(p, x), Fl(p, x), . . . recurrently calculated by solving the following se-
quence of linear equations:

The successive approximations uo(p, x), ul(p, x), . . . of control are deter-
mined by the expressions

Below we shall find the conditions under which the recurrent procedure
(3.5.13)-(3.5.15) converges to the exact solution of the synthesis problem.
Approximate Synthesis of Stochastic Control Systems 191

Let us consider Eq. (3.5.11) with the operator L determined by (3.5.10).
The solution $F(\rho,x)$ and the coefficients $b_{ij}(\rho,x)$ and $a_i(\rho,x)$ of the operator
L are defined on $\Pi_T = \{[0,T]\times R_n\} \equiv \{(\rho,x)\colon 0 \le \rho \le T,\ x \in R_n\}$.
We assume that everywhere in $\Pi_T$ the matrix $\|b_{ij}(\rho,x)\|_1^n$ satisfies the
condition that the operator L is uniformly parabolic, that is, everywhere in
$\Pi_T$ for any real vector $\varkappa$ we have

$\lambda|\varkappa|^2 \le b_{ij}(\rho,x)\,\varkappa_i\varkappa_j \le \Lambda|\varkappa|^2,$   (3.5.16)

where $\lambda$ and $\Lambda$ are some positive constants. Moreover, we assume that the
functions $b_{ij}(\rho,x)$ and $a_i(\rho,x)$ are bounded in $\Pi_T$, continuous in both variables
$(\rho,x)$, and satisfy the Hölder inequality with respect to x uniformly
in $\rho$, that is,

$|b_{ij}(\rho,x) - b_{ij}(\rho,y)| + |a_i(\rho,x) - a_i(\rho,y)| \le K|x-y|^\alpha, \qquad 0 < \alpha \le 1.$   (3.5.17)

We assume that the functions c(x), ψ(x), and $\Phi(\rho,\partial F/\partial x)$ are continuous
in $\Pi_T$ and that c(x) and ψ(x) satisfy the following restrictions on the growth
as $|x| \to \infty$:

$c(x) \le K_1 e^{h|x|}, \qquad \psi(x) \le K_1 e^{h|x|}$   (3.5.18)

(h is a positive constant). We also assume that the function $\Phi(\rho,v)$ satisfies
the Lipschitz condition with respect to $v = (v_1,\dots,v_n)$ uniformly in $\rho \in [0,T]$,
that is,

$|\Phi(\rho,v') - \Phi(\rho,v'')| \le K_2|v' - v''|.$   (3.5.19)
In particular, the functions Φ from (3.4.7) and (1.3.23) satisfy (3.5.19).


The following three consequences of the above assumptions are well
known [74].
(1) There exists a unique fundamental solution $G(x,\rho;y,\sigma)$ of linear
equations (3.5.13), (3.5.14). This solution is defined for all $(x,\rho) \in \Pi_T$ and
$(y,\sigma) \in \Pi_T$ $(\rho > \sigma)$, satisfies the homogeneous equation LG = 0 in the
variables $(x,\rho)$, and

$\lim_{\rho\downarrow\sigma}\int_{R_n} G(x,\rho;y,\sigma)\,f(y)\,dy = f(x)$   (3.5.20)

for any continuous function f(x) such that

$|f(x)| \le K e^{h|x|^2}$   (3.5.21)

with a sufficiently small constant h > 0 (here λ is taken from (3.5.16)).


(2) Solutions of inhomogeneous equations (3.5.13) and (3.5.14) can be
expressed in terms of G(x, p; y, u ) as follows:

In this case, formula (3.5.22) holds unconditionally in view of (3.5.18); for-


mula (3.5.23) holds only if the derivatives dFk/dxi satisfy some inequalities
of the form (3.5.18) (or a t least of the form (3.5.21)). In the sequel, we show
that this condition is always satisfied. The solutions Fk(p, x), k = 0,1,. . .,
are twice continuously differentiable in x, and the derivatives dFk/dxi and
d2Fk/dxidxj can be calculated by differentiating the integrands on the
right-hand sides of (3.5.22) and (3.5.23).
(3) The following inequalities hold (for any $\lambda' < \lambda$, with λ from (3.5.16)):

$|G(x,\rho;y,\sigma)| \le K_3\,(\rho-\sigma)^{-n/2}\exp\Big(-\lambda'\,\dfrac{|x-y|^2}{\rho-\sigma}\Big),$   (3.5.24)

$\Big|\dfrac{\partial G(x,\rho;y,\sigma)}{\partial x_i}\Big| \le K_4\,(\rho-\sigma)^{-(n+1)/2}\exp\Big(-\lambda'\,\dfrac{|x-y|^2}{\rho-\sigma}\Big).$   (3.5.25)
Statements (1)-(3) hold for linear equations (3.5.13), (3.5.14) of successive
approximations. Now we return to the synthesis problem and consider
the two stages of solving this problem. First, by using the majorant estimates
(3.5.24) and (3.5.25), we prove that the successive approximations
$F_k(\rho,x)$ converge as $k\to\infty$ to the solution $F(\rho,x)$ of Eq. (3.5.11) (in
this case, we simultaneously prove that there exists a unique solution of
Eq. (3.5.11) with the initial condition (3.5.9)). Next, we show that the suboptimal
systems constructed by the control law (3.5.15) are asymptotically
as $k\to\infty$ equivalent to the optimal system.
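For orientation, the representation formula (3.5.22) is easy to check numerically in a one-dimensional constant-coefficient case, where the fundamental solution is the Gaussian heat kernel. The following sketch (Python with NumPy; the coefficient b and the functions ψ, c are illustrative choices, not taken from the text) evaluates $F_0$ by quadrature and confirms that it satisfies $\partial F_0/\partial\rho = b\,\partial^2 F_0/\partial x^2 + c(x)$ to discretization accuracy.

```python
import numpy as np

# 1D constant-coefficient case: LF = -F_rho + b F_xx; the fundamental
# solution G(x,rho;y,sigma) is a Gaussian with variance 2 b (rho - sigma).
b = 0.5                                  # illustrative diffusion coefficient
psi = lambda x: np.cos(x)                # illustrative terminal penalty
c = lambda x: x**2                       # illustrative running penalty

def G(x, rho, y, sigma):
    v = 2.0 * b * (rho - sigma)
    return np.exp(-(x - y)**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

y = np.linspace(-30.0, 30.0, 4001)       # quadrature grid in y

def F0(rho, x):
    # formula (3.5.22): smoothed terminal penalty + accumulated running cost
    val = np.trapz(G(x, rho, y, 0.0) * psi(y), y)
    sig = np.linspace(0.0, rho, 401)[1:-1]   # integrable endpoint singularity
    inner = [np.trapz(G(x, rho, y, s) * c(y), y) for s in sig]
    return val + np.trapz(inner, sig)

# residual of F0_rho = b F0_xx + c(x) at a test point
rho0, x0, d = 1.0, 0.7, 1e-2
F_rho = (F0(rho0 + d, x0) - F0(rho0 - d, x0)) / (2 * d)
F_xx = (F0(rho0, x0 + d) - 2 * F0(rho0, x0) + F0(rho0, x0 - d)) / d**2
print("residual:", F_rho - b * F_xx - c(x0))   # ~0 up to quadrature error
```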
1. First, we prove that the sequence of functions $F_0(\rho,x), F_1(\rho,x),\dots$
determined by recurrent formulas (3.5.22), (3.5.23) and the sequence of
their partial derivatives $\partial F_k(\rho,x)/\partial x_i$, $k = 0,1,2,\dots$, are uniformly
convergent. To this end, we construct the differences

$Q_k(\rho,x) = F_k(\rho,x) - F_{k-1}(\rho,x),$   (3.5.26)

$\dfrac{\partial Q_k(\rho,x)}{\partial x_i} = \dfrac{\partial F_k(\rho,x)}{\partial x_i} - \dfrac{\partial F_{k-1}(\rho,x)}{\partial x_i}$   (3.5.27)

(in (3.5.26), (3.5.27) we set $k = 0,1,2,\dots$ provided that $F_{-1} \equiv 0$). Using
(3.5.19), (3.5.26), and (3.5.27), we obtain the inequalities

$|Q_k(\rho,x)| \le K_2\int_0^\rho d\sigma\int_{R_n} G(x,\rho;y,\sigma)\,\Big|\dfrac{\partial Q_{k-1}(\sigma,y)}{\partial y}\Big|\,dy,$   (3.5.28)

$\Big|\dfrac{\partial Q_k(\rho,x)}{\partial x_i}\Big| \le K_2\int_0^\rho d\sigma\int_{R_n}\Big|\dfrac{\partial G(x,\rho;y,\sigma)}{\partial x_i}\Big|\,\Big|\dfrac{\partial Q_{k-1}(\sigma,y)}{\partial y}\Big|\,dy.$   (3.5.29)
Formulas (3.5.28), (3.5.29) and (3.5.24), (3.5.25) allow us to calculate estimates
for the differences (3.5.26), (3.5.27) recurrently. To this end, it is
necessary only to estimate $|\partial Q_0/\partial x_i|$. It turns out that an estimate of
the type (3.5.18) holds, that is,

$\Big|\dfrac{\partial Q_0(\rho,x)}{\partial x_i}\Big| \le K_5\,\rho^{-1/2}\,e^{h|x|}.$   (3.5.30)

Indeed, since

$\tau^{-n/2}\int_{R_n}\exp\Big(-\dfrac{\lambda'|y|^2}{\tau} + h|y|\Big)\,dy \le K_6$   (3.5.31)

for $\lambda' > 0$ and bounded $\tau$, we obtain the estimate (3.5.30)
for the derivative $\partial F_0/\partial x_i$ provided that (3.5.18), (3.5.22), and (3.5.25) are
taken into account.
By using the inequality

$e^{h|y|} \le e^{h|x|}\,e^{h|x-y|}$

with regard to (3.5.19), (3.5.27), and (3.5.32), we can bound the remaining
integral, whose integrand contains the factor

$\exp\Big(-\lambda'\,\dfrac{|x-y|^2}{\rho} + h|y|\Big),$

and since ρ is bounded, we arrive at (3.5.30).


Using (3.5.30) and applying formulas (3.5.28) and (3.5.29) repeatedly,
we estimate the differences (3.5.26) and (3.5.27) for an arbitrary number
$k \ge 1$ (here $\Gamma(\cdot)$ is the gamma function) as follows:

$|Q_k(\rho,x)| \le K_5\,\dfrac{\big(M\sqrt{\rho}\big)^k}{\Gamma\big(\frac{k}{2}+1\big)}\,e^{h|x|},$   (3.5.33)

$\Big|\dfrac{\partial Q_k(\rho,x)}{\partial x_i}\Big| \le K_5\,\dfrac{M^k\rho^{(k-1)/2}}{\Gamma\big(\frac{k+1}{2}\big)}\,e^{h|x|}, \qquad M = \mathrm{const}$   (3.5.34)

(formulas (3.5.33) and (3.5.34) are proved by induction over k).

The estimates obtained show that the sequences of functions

$F_k(\rho,x) = \sum_{j=0}^{k} Q_j(\rho,x), \qquad \dfrac{\partial F_k(\rho,x)}{\partial x_i} = \sum_{j=0}^{k}\dfrac{\partial Q_j(\rho,x)}{\partial x_i}$   (3.5.35)

converge to some limit functions

$F(\rho,x) = \lim_{k\to\infty} F_k(\rho,x), \qquad W_i(\rho,x) = \lim_{k\to\infty}\dfrac{\partial F_k}{\partial x_i}(\rho,x).$   (3.5.36)

In this case, the partial sums on the right-hand side of (3.5.35) uniformly
converge in any bounded domain lying in $\Pi_T$, while in (3.5.36) the partial
sums converge uniformly if they begin from the second term. The estimate
(3.5.32) shows that the first summand is majorized by a function with
singularity at ρ = 0. However, one can readily see that this is an integrable
singularity. Therefore, we can pass to the limit (as $k\to\infty$) in (3.5.23) and
in the formula obtained by differentiating (3.5.23) with respect to $x_i$. As a
result, we obtain

This implies that $W_i(\rho,x) = \partial F(\rho,x)/\partial x_i$ and hence the limit function
$F(\rho,x)$ satisfies the equation

$F(\rho,x) = \int_{R_n} G(x,\rho;y,0)\,\psi(y)\,dy + \int_0^\rho d\sigma\int_{R_n} G(x,\rho;y,\sigma)\Big[c(y) - \Phi\Big(\sigma,\dfrac{\partial F(\sigma,y)}{\partial y}\Big)\Big]\,dy.$   (3.5.37)

Equation (3.5.37) is equivalent to the initial equation (3.5.11) with the
initial condition (3.5.9), which can be readily verified by differentiating
with regard to (3.5.20).
Thus, we have proved that there exists a solution of Eq. (3.5.11) with
the initial condition (3.5.9). The proof of this statement shows that the
solution $F(\rho,x)$ and its derivatives $\partial F/\partial x_i$ have the following majorants
everywhere in $\Pi_T$:

$|F(\rho,x)| \le K e^{h|x|}, \qquad \Big|\dfrac{\partial F(\rho,x)}{\partial x_i}\Big| \le K\rho^{-1/2} e^{h|x|}.$   (3.5.38)

By using (3.5.38), we can prove that the solution of Eq. (3.5.11) with
the initial condition (3.5.9) is unique. Indeed, assume that there exist two
solutions $F_1$ and $F_2$ of Eq. (3.5.11) (or of (3.5.37)). For the difference
$V = F_1 - F_2$ we obtain the expression

$V(\rho,x) = \int_0^\rho d\sigma\int_{R_n} G(x,\rho;y,\sigma)\Big[\Phi\Big(\sigma,\dfrac{\partial F_2}{\partial y}\Big) - \Phi\Big(\sigma,\dfrac{\partial F_1}{\partial y}\Big)\Big]\,dy,$

which together with (3.5.19) allows us to write

$|V(\rho,x)| \le K_2\int_0^\rho d\sigma\int_{R_n} G(x,\rho;y,\sigma)\,\Big|\dfrac{\partial V(\sigma,y)}{\partial y}\Big|\,dy.$

The same reasoning as for the functions $F_k$ leads to the following estimate
for the difference $V = F_1 - F_2$ that holds for any k:

$|V(\rho,x)| \le \mathrm{const}\,\dfrac{M^k\rho^{k/2}}{\Gamma\big(\frac{k}{2}+1\big)}\,e^{h|x|}.$

This implies that $V(\rho,x) \equiv 0$, that is, $F_1(\rho,x) = F_2(\rho,x)$.


We have proved that the successive approximations $F_0(\rho,x), F_1(\rho,x),\dots$
obtained by recurrent formulas (3.5.13) and (3.5.14) converge asymptotically
as $k\to\infty$ to the solution of the Bellman equation, which exists and
is unique.
2. Now let us return to the synthesis problem. Previously, it was proposed
to use the functions $u_k(\rho,x)$ given by (3.5.15) for the synthesis of the
control system. By $H_k(\rho,x)$ we denote the functional

$H_k(\rho,x) = \mathsf{E}\Big[\int_{T-\rho}^{T} c\big(x(\tau)\big)\,d\tau + \psi\big(x(T)\big)\Big]$

calculated on the trajectories of system (3.5.1) that pass through the point
x at time $t = T - \rho$ under the action of control $u_k$. The function $H_k(\rho,x)$
determines the "quality" of the control $u_k(\rho,x)$ and satisfies the linear
equation

$LH_k(\rho,x) = -c(x) - u_k^T(\rho,x)\,q^T(\rho)\,\dfrac{\partial H_k}{\partial x}(\rho,x), \qquad H_k(0,x) = \psi(x).$   (3.5.39)
From (3.5.14), (3.5.39), and the relation $-u_k^T q^T\,\partial F_k/\partial x = \Phi(\rho,\partial F_k/\partial x)$,
it follows that the difference $\Delta_k(\rho,x) = F_k(\rho,x) - H_k(\rho,x)$ satisfies the
equation

$L_k\Delta_k \equiv L\Delta_k + u_k^T q^T\,\dfrac{\partial\Delta_k}{\partial x} = \Phi\Big(\rho,\dfrac{\partial F_{k-1}}{\partial x}\Big) - \Phi\Big(\rho,\dfrac{\partial F_k}{\partial x}\Big), \qquad \Delta_k(0,x) = 0.$   (3.5.40)

Since the right-hand side of (3.5.40) is small for large k (see (3.5.19),
(3.5.34)), that is,

$\Big|\Phi\Big(\rho,\dfrac{\partial F_{k-1}}{\partial x}\Big) - \Phi\Big(\rho,\dfrac{\partial F_k}{\partial x}\Big)\Big| \le K_2\Big|\dfrac{\partial Q_k}{\partial x}\Big| \le \varepsilon_k\,K_6\,e^{h|x|}, \qquad \varepsilon_k \to 0 \text{ as } k\to\infty,$   (3.5.41)

and the initial condition in (3.5.40) is zero, we can expect that the difference
$\Delta_k(\rho,x)$ considered as the solution of Eq. (3.5.40) is of the same order, that
is,

$|\Delta_k(\rho,x)| \le \varepsilon_k\,K_7\,e^{h|x|}.$   (3.5.42)
If the functions $u_k(\rho,x)$ are bounded and sufficiently smooth, so that the
coefficients of the operator $L_k$ are Hölder continuous, then the operator $L_k$
is of the same type as L, and the inequality (3.5.42) can readily be obtained from
(3.5.22), (3.5.24), and (3.5.41). Conversely, if $u_k(\rho,x)$ are discontinuous
functions (but without singularities, for example, such as in (3.0.1) and
(3.0.8)), then the inequality (3.5.42) follows from the results of [81].
Since the series (3.5.35) is convergent, we have $|F(\rho,x) - F_k(\rho,x)| \le \varepsilon_k' K_7 e^{h|x|}$
(where $\varepsilon_k' \to 0$ as $k\to\infty$). Finally, this fact, the inequality
$|F - H_k| \le |F - F_k| + |F_k - H_k|$, and (3.5.42) imply

$|F(\rho,x) - H_k(\rho,x)| \le 2\varepsilon_k''\,K_8\,e^{h|x|}$   (3.5.43)

$\big(\varepsilon_k'' = \max(\varepsilon_k,\varepsilon_k')$ and $K_8 = \max(K_6,K_7)\big)$. Formula (3.5.43) proves the
asymptotic (as $k\to\infty$) optimality of suboptimal systems constructed according
to the control algorithms $u_k(\rho,x)$ calculated by the recurrent formulas
(3.5.13)-(3.5.15).
REMARK 3.5.1. If the coefficients of the operator L are unbounded in
$\Pi_T$, then the estimates (3.5.24) and (3.5.25), generally speaking, do not
hold. However, there may be a change of variables that reduces the problem
to the case considered above. If, for example, the coefficients $\tilde a_i(t,x)$ in
(3.5.1) depend on x in a linear way (that is, $\tilde a(t,x) = A(t)x$, where A(t)
is an $n\times n$ matrix depending only on t), then the change of variables
$x = X(0,t)y$ (where $X(0,t)$ is the fundamental matrix of the system $\dot x = A(t)x$)
eliminates unbounded coefficients in the operator L (in the new
variables y), which allows us to investigate such systems by the methods
considered above.

In conclusion, let us consider an example from [96], which illustrates the
efficiency of the method of successive approximations for a one-dimensional
synthesis problem that can be solved exactly.
Let the control system be described by the scalar equation

$\dot x = u + \xi(t), \qquad \mathsf{E}\,\xi(t)\,\xi(t+\tau) = b\,\delta(\tau), \qquad |u| \le u_m.$   (3.5.44a)

Here δ(τ) is the delta function; b and $u_m$ are given positive numbers.
We shall assume that the penalty function c(x) in the optimality criterion
(3.5.2) is even (that is, c(x) = c(−x)) and the final state x(T) is not
penalized. Then the Bellman equation (3.5.8) and the initial condition
(3.5.9) take the form

$\dfrac{\partial F}{\partial\rho} = c(x) + \min_{|u|\le u_m}\Big[u\,\dfrac{\partial F}{\partial x}\Big] + \dfrac{b}{2}\,\dfrac{\partial^2 F}{\partial x^2}, \qquad F(0,x) = 0.$   (3.5.44)

Minimizing the expression in the square brackets, we obtain the optimal


control
dF
U* (p, X) = -Urn sign -(p, 2) ,
ax
and transform the Bellman equation to the form

Since the penalty function c(x) is even, it follows from (3.5.45) that for any
p the loss function F (p, x) satisfying (3.5.45) is an even function of x, hence
we have the explicit formula

u+(p,x) = u* (x) = -urn sign x.

In this case, for x > 0, the loss function F ( p , x) is determined by the


formula [26]

(x + + pl2 + dp} dy.


2ba
The successive approximations $F_0(\rho,x), F_1(\rho,x),\dots$ are even functions
of the variable x (since c(x) is even). Therefore, in this case, any approximate
control (3.5.15) coincides with the optimal control $u_*$, and the
efficiency of the method can be estimated by the deviation of the successive
approximations $F_0, F_1,\dots$ from the exact solution $F(\rho,x)$ written above.
Choosing the quadratic penalty function $c(x) = x^2$ and taking into account
the fact that in this case the fundamental solution $G(x,\rho;y,\sigma)$ (the
transition probability density $p(y,\sigma;x,\rho)$) has the form

$G(x,\rho;y,\sigma) = \dfrac{1}{\sqrt{2\pi b(\rho-\sigma)}}\,\exp\Big(-\dfrac{(x-y)^2}{2b(\rho-\sigma)}\Big),$

we obtain from (3.5.22) and (3.5.23) the following expressions for the first
two approximations:
The functions $F_0$, $F_1$, and F calculated for $u_m = b = \rho = 1$ are shown in
Fig. 28. One can see that already the second approximation gives a
satisfactory approximation to the exact solution.
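The comparison of Fig. 28 is easy to reproduce numerically. The sketch below (Python with NumPy; the grid sizes and the truncation of the x-axis are illustrative choices) integrates the exact equation (3.5.45) and the first two approximations (3.5.13), (3.5.14) by explicit finite differences and prints the relative deviations of $F_0$ and $F_1$ from F at ρ = 1.

```python
import numpy as np

# Successive approximations (3.5.13)-(3.5.15) for the scalar example (3.5.44):
# F_rho = c(x) - u_m |F_x| + (b/2) F_xx,  F(0, x) = 0,  c(x) = x**2.
u_m, b, T = 1.0, 1.0, 1.0
x = np.linspace(-5.0, 5.0, 501); dx = x[1] - x[0]
dt = 0.2 * dx**2 / b                     # step within the stability bound
n = int(T / dt)
c = x**2

def step(F, Phi):
    # one explicit Euler step of F_rho = c - Phi + (b/2) F_xx
    Fxx = np.zeros_like(F)
    Fxx[1:-1] = (F[2:] - 2*F[1:-1] + F[:-2]) / dx**2
    Fn = F + dt * (c - Phi + 0.5 * b * Fxx)
    Fn[0], Fn[-1] = Fn[1], Fn[-2]        # crude treatment of the cut-off ends
    return Fn

# exact equation: Phi = u_m |F_x| taken from the current iterate
F = np.zeros_like(x)
for _ in range(n):
    F = step(F, u_m * np.abs(np.gradient(F, dx)))

# zero approximation (3.5.13): Phi = 0; first approximation (3.5.14):
# Phi computed from the gradient of F_0, advanced in lockstep
F0, F1 = np.zeros_like(x), np.zeros_like(x)
for _ in range(n):
    Phi0 = u_m * np.abs(np.gradient(F0, dx))
    F1 = step(F1, Phi0)
    F0 = step(F0, 0.0)

m = np.abs(x) < 2.5                      # compare away from the boundaries
for name, Fk in (("F0", F0), ("F1", F1)):
    err = np.max(np.abs(F[m] - Fk[m]) / np.maximum(F[m], 1e-9))
    print(name, "max relative deviation from F:", round(err, 3))
```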
This example shows that the actual rate of convergence of successive
approximations to the exact solution of the Bellman equation can be larger
than the theoretical rate of convergence estimated by (3.5.35) and (3.5.33),
since the proof of the convergence of the method of successive approxi-
mations (3.5.13)-(3.5.15) is based on rather rough estimates (3.5.24) and
(3.5.25) for the fundamental solution.

§3.6. Approximate synthesis of some stochastic
systems with distributed parameters
This section occupies a special place in the book, since only here we
consider optimal control systems with distributed parameters in which the
plant dynamics is described by partial differential equations.
So far the theory of optimal control of systems with distributed parameters
is characterized by significant progress, first of all, in its deterministic
branch [30, 130]. Important results have also been obtained in stochastic problems
(the distributed Kalman filter, the separation theorem in the optimal control
synthesis for linear systems with quadratic criterion, etc. [118, 182]).

However, many problems in the stochastic theory of systems with lumped


parameters still remain to be generalized to the case of distributed plants.
We do not try to consider these problems in detail but only discuss
the possible use of the approximate synthesis procedure (3.0.6)-(3.0.8) for
solving some stochastic control problems for distributed systems. Our con-
sideration is confined to problems in which the plants are described by
linear partial differential equations of parabolic type.
3.6.1. Statement of the problem. Let us consider control systems
subject to the equation

$\dfrac{\partial v(t,x)}{\partial t} = \mathcal{L}_x v(t,x) + u(t,x) + \xi(t,x), \qquad 0 < t \le T, \qquad v(0,x) = v_0(x).$   (3.6.1)

Here $\mathcal{L}_x$ denotes a smooth elliptic operator with respect to the spatial variables
$x = (x_1,\dots,x_n)$,

$\mathcal{L}_x = a_{ij}(t,x)\,\dfrac{\partial^2}{\partial x_i\,\partial x_j} + b_i(t,x)\,\dfrac{\partial}{\partial x_i} + c(t,x),$   (3.6.2)

whose coefficients $a_{ij}(t,x)$, $b_i(t,x)$, and $c(t,x)$ are defined in the cylinder
$\Omega = D\times[0,T]$, where D is the closure of an arbitrary domain in the
n-dimensional Euclidean space $R_n$ and the matrix $a(t,x)$ satisfies the inequality

$\eta^T a\,\eta = a_{ij}(t,x)\,\eta_i\eta_j > 0$   (3.6.3)

for all $(t,x)\in\Omega$ and all $\eta = (\eta_1,\dots,\eta_n)$ (as usual, in (3.6.2) and (3.6.3)
the sum is taken over twice repeated indices from 1 to n).
If D does not coincide with the entire space $R_n$, then, in addition to
(3.6.1), the following boundary conditions must be satisfied at the boundary
$\partial D$ of the domain D:

$M_x v(t,x) = u^\Gamma(t,x),$   (3.6.4)

where the linear operator $M_x$ depends on the character of the boundary
problem. Thus, for the first, the second, and the third boundary value
problems, condition (3.6.4) has, respectively, the form

(I) $v(t,x) = u^\Gamma(t,x)$; (II) $\dfrac{\partial v(t,x)}{\partial\sigma} = u^\Gamma(t,x)$; (III) $\dfrac{\partial v(t,x)}{\partial\sigma} + \beta(x)\,v(t,x) = u^\Gamma(t,x)$.

Here $x\in\partial D$, $\partial v/\partial\sigma$ denotes the outward conormal derivative, and σ is
the outward conormal vector whose components $\sigma_i$ $(i = 1,\dots,n)$ and the
components of the outward normal ν on the boundary $\partial D$ are related by
the formulas $\sigma_i = a_{ij}\nu_j$ [61, 124]; in particular, if $\|a_{ij}\|_1^n$ is the identity
matrix, i.e., $a_{ij} = \delta_{ij}$, then the conormal coincides with the normal.
For example, equations of the form (3.6.1) with the boundary conditions
(3.6.4) describe heat propagation or variation in a substance concentration
in diffusion processes in some volume D [166, 179]. In this case,
v(t,x) is the temperature (or, respectively, the concentration) at the point
$x\in D$ at time t. Then the boundary condition (3.6.4.I) determines the
temperature (concentration), and the condition (3.6.4.II) determines the
heat (substance) flux through the boundary $\partial D$ of the volume D.

System (3.6.1) is controlled both by control actions u(t,x) distributed
throughout the volume and by variations in the boundary operating conditions
$u^\Gamma(t,x)$. The admissible controls are piecewise continuous functions
u(t,x) and $u^\Gamma(t,x)$ with values in bounded closed domains:

$u(t,x)\in U(x), \qquad u^\Gamma(t,x)\in U^\Gamma(x).$   (3.6.5)

We assume that the spatially distributed random action ξ(t,x) is of the
nature of a spatially correlated normal white noise

$\mathsf{E}\,\xi(t,x) = 0, \qquad \mathsf{E}\,\xi(t,x)\,\xi(t',y) = K(t,x,y)\,\delta(t-t'),$   (3.6.6)

where $K(t,x,y)$ is a positive definite kernel-function symmetric in x and y
and δ(t) is the delta function.
We also assume that, under the above assumptions, the function v(t,x)
characterizing the plant state at time t is uniquely determined as the generalized
solution of Eq. (3.6.1) that satisfies (3.6.1) for $(x,t)\in D\times(0,T]$
and is a continuous continuation of a given initial function $v(0,x) = v_0(x)$
as $t\to0$ and of the boundary conditions (3.6.4) as $x\to\partial D$.

The problem is to find functions $u_*(t,x)$ and $u_*^\Gamma(t,x)$ satisfying (3.6.5)
so as to minimize the optimality criterion

$I[u,u^\Gamma] = \mathsf{E}\Big[\int_0^T dt\int_D\cdots\int_D w\big[v(t,x^1),\dots,v(t,x^s)\big]\,dx^1\cdots dx^s\Big],$   (3.6.7)

where $x^i = (x_1^i,\dots,x_n^i)$, $dx^i = dx_1^i\cdots dx_n^i$ $(i = 1,2,\dots,s)$, and w is
an arbitrary nonnegative integrable function. In this case, the desired functions
$u_*$ and $u_*^\Gamma$ must depend on the current state v(t,x) of the controlled
system (the synthesis functions), that is, they must have the operator form

$u_*(t,x) = \varphi_*[t, v(t,x)], \quad x\in D, \qquad u_*^\Gamma(t,x) = \psi_*[t, v(t,x)], \quad x\in\partial D$   (3.6.8)

(it is assumed that the state function v(t,x) can be measured precisely).
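As a toy illustration of the plant model (3.6.1), (3.6.6), the following sketch (Python with NumPy; the one-dimensional operator $\mathcal{L}_x = \partial^2/\partial x^2$, the exponential correlation kernel, the feedback rule, and all numerical values are illustrative assumptions, not data from the text) integrates the controlled diffusion with spatially correlated white noise by an explicit Euler scheme.

```python
import numpy as np

# Toy version of (3.6.1): v_t = v_xx + u + xi on D = [0, 1], homogeneous
# Neumann boundaries; xi is white in time with spatial covariance (3.6.6)
# taken as the illustrative kernel K(x, y) = q * exp(-|x - y| / r).
rng = np.random.default_rng(0)
nx, T = 101, 0.5
x = np.linspace(0.0, 1.0, nx); dx = x[1] - x[0]
dt = 0.2 * dx**2                               # explicit-scheme stability
q, r = 0.1, 0.2
K = q * np.exp(-np.abs(x[:, None] - x[None, :]) / r)
C = np.linalg.cholesky(K + 1e-10 * np.eye(nx)) # K = C C^T, for sampling

v = np.cos(np.pi * x)                          # illustrative v_0(x)
u = lambda t, v: -1.0 * v                      # illustrative distributed control

t = 0.0
while t < T:
    lap = np.zeros_like(v)
    lap[1:-1] = (v[2:] - 2*v[1:-1] + v[:-2]) / dx**2
    lap[0], lap[-1] = lap[1], lap[-2]          # crude Neumann treatment
    xi = C @ rng.standard_normal(nx) / np.sqrt(dt)  # increment covariance K*dt
    v = v + dt * (lap + u(t, v) + xi)
    t += dt
print("terminal mean-square state:", np.mean(v**2))
```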
3.6.2. The Bellman equation and equations of successive approximations.
To find the operators (3.6.8), we shall use the dynamic
programming approach. Taking into account the properties of the parabolic
equation (3.6.1) and the nature of the random actions (3.6.6), we can
prove [95] that the time evolution of v(t,x) is Markov in the following sense:
for given functions u(t,x) and $u^\Gamma(t,x)$, the probability distribution of the
future values of $v(\tau,x)$ for $\tau > t$ is completely determined by the value
of the function v(t,x) at time t. This allows us to consider the minimum
losses on the time interval [t,T],

$F[t, v(t,x)] = \min_{\substack{u(\tau,x)\in U(x),\ u^\Gamma(\tau,x)\in U^\Gamma(x)\\ t\le\tau\le T}}\mathsf{E}\Big\{\int_t^T \tilde w\big[v(\tau,x)\big]\,d\tau\Big\},$   (3.6.9)

where

$\tilde w\big[v(\tau,x)\big] = \int_D\cdots\int_D w\big[v(\tau,x^1),\dots,v(\tau,x^s)\big]\,dx^1\cdots dx^s,$

as a functional depending only on the initial (at time t) state v(t,x) and
time t. Therefore, the fundamental difference equation of the dynamic
programming approach (see (1.4.6)) can be written as

$F[t, v(t,x)] = \min_{\substack{u_\tau\in U,\ u_\tau^\Gamma\in U^\Gamma\\ t\le\tau\le t+\Delta t}}\mathsf{E}\Big\{\int_t^{t+\Delta t}\tilde w\big[v(\tau,x)\big]\,d\tau + F\big[t+\Delta t,\ v(t+\Delta t,x)\big]\Big\}.$   (3.6.10)

For small Δt, in view of (3.6.1), we have

$v(t+\Delta t,x) = v(t,x) + \Delta v(t,x), \qquad \Delta v(t,x) = \int_t^{t+\Delta t}\big[\mathcal{L}_x v(\tau,x) + u(\tau,x) + \xi(\tau,x)\big]\,d\tau.$   (3.6.11)

Taking (3.6.11) into account, we can expand the functional $F[t+\Delta t, v(t+\Delta t,x)]$
in the functional Taylor series [91]

$F[t+\Delta t, v+\Delta v] = F[t+\Delta t, v] + \int_D \dfrac{\delta F[t+\Delta t, v(t,x)]}{\delta v(t,x)}\,\Delta v(t,x)\,dx$
$\qquad + \dfrac12\int_D\int_D \dfrac{\delta^2 F[t+\Delta t, v(t,x)]}{\delta v(t,x)\,\delta v(t,y)}\,\Delta v(t,x)\,\Delta v(t,y)\,dx\,dy + \cdots.$   (3.6.12)
The functional derivatives $\delta F/\delta v$ and $\delta^2 F/\delta v(x)\,\delta v(y)$ in (3.6.12) (for their
detailed description, see [91]) can be obtained by calculating the standard
derivatives in the formulas

$\dfrac{\delta F}{\delta v(t,x)} = \lim_{\substack{\Delta\to0\\ \Delta_j\to x}}\dfrac{1}{\Delta^n}\,\dfrac{\partial F_\Delta(v_1,v_2,\dots)}{\partial v_j},$
$\qquad \dfrac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)} = \lim_{\substack{\Delta\to0\\ \Delta_i\to x,\ \Delta_j\to y}}\dfrac{1}{\Delta^{2n}}\,\dfrac{\partial^2 F_\Delta(v_1,v_2,\dots)}{\partial v_i\,\partial v_j}.$   (3.6.13)

In (3.6.13) the functional $F_\Delta(v_1,v_2,\dots)$ denotes a discrete analog of the
functional F[t, v(t,x)], which can be obtained by dividing the volume D
into n-dimensional cubes $\Delta_i$ of equal volume $\Delta^n$ and replacing the continuous
function v(t,x) by a set of discrete values $v_1,v_2,\dots$, each of which
is equal to the value of v(t,x) at the center of the cube $\Delta_i$. In this case,
the functional F is assumed to be sufficiently smooth, that is, its weak and
strong Gâteaux and Fréchet derivatives [91] exist up to the second order
inclusively, are equal to each other, and coincide with (3.6.13).
Substituting the expansion (3.6.12) into (3.6.10), passing to the limit
as $\Delta t\to0$, and taking into account (3.6.6) and (3.6.11), we obtain the
Bellman equation with functional derivatives:

$-\dfrac{\partial F}{\partial t} = \min_{u\in U,\ u^\Gamma\in U^\Gamma}\Big\{w(u,u^\Gamma,v) + \int_D\dfrac{\delta F}{\delta v(t,x)}\big[\mathcal{L}_x v(t,x) + u(t,x)\big]\,dx\Big\}$
$\qquad + \dfrac12\int_D\int_D K(t,x,y)\,\dfrac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy.$   (3.6.14)

To find the desired optimal control operators (3.6.8), it is necessary to solve
Eq. (3.6.14).
To find the desired optimal control operators (3.6.8), it is necessary to solve


Eq. (3.6.14).
The integral in the braces in (3.6.14) depends (in addition to the "solid"
controls u(t, 2)) on the control actions ur(t, x) that determine the boundary
conditions (3.6.4) for the functions v(t, x) obtained by solving Eq. (3.6.1).
We can write this dependence explicitly by using the Green formula [61,
1241
204 Chapter I11

where C : denotes the differential operator dual to C, in the variables x


and v is the outward normal on aD. In (3.6.15) the integral over the
boundary d D of the domain D explicitly depends on the control ur (t, x) of
the boundary operating conditions as it follows from (3.6.4). To be definite,
let us consider the third boundary value problem (3.6.4.111). The outward
conormal derivative of the state function v(t, x) on the boundary d D can
be written as

Substituting (3.6.16) into (3.6.15) and (3.6.15) into (3.6.14), we obtain the
following final Bellman equation (for the third boundary value problem):

--
aF
at
-
- min { ~ ( uup,
u , ,
u,EUr
, v) + J, %udx
SF
+ LD

This equation can be solved only approximately if the penalty functions are
arbitrary and the controls u and u r are subject to constraints.
Let us consider one of the methods for solving (3.6.17) by using the
approximate synthesis procedure of the form (3.0.6)-(3.0.8). As already
noted (§§3.1-3.4), the approximate synthesis method is especially convenient
if the controlling actions are small, namely, $\|v - v_0\|/\|v\| \ll 1$, where
v is the solution of Eq. (3.6.1) with the boundary condition (3.6.4) and with
any admissible functions u and $u^\Gamma$ satisfying (3.6.5), $v_0$ is the solution of
the corresponding homogeneous (in u and $u^\Gamma$) problem, and $\|\cdot\|$ is the norm
in the space $L_2$. From a physical viewpoint, this means that the power of
the (controlled) sources is not large as compared with $\|v\|^2$ or with the intensity
$\int_D\int_D K(t,x,y)\,dx\,dy$ of the random perturbations ξ(t,x).

Then, by setting $u(t,x) = u^\Gamma \equiv 0$, we obtain the following equation for
the zero approximation instead of (3.6.17):

$-\dfrac{\partial F_0}{\partial t} = G_0\big[v(t,x)\big] + \int_D v\,\mathcal{L}_x^*\,\dfrac{\delta F_0}{\delta v}\,dx + \dfrac12\int_D\int_D K(t,x,y)\,\dfrac{\delta^2 F_0}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy, \qquad F_0[T, v(T,x)] = 0.$   (3.6.18)
Here, according to (3.6.9), $G_0[v(t,x)]$ is a functional of the form

$G_0\big[v(t,x)\big] = \int_D\cdots\int_D w_0\big[v(t,x^1),\dots,v(t,x^s)\big]\,dx^1\cdots dx^s.$   (3.6.19)
If the functional $F_0(t, v(t,x))$ satisfying (3.6.18) is found, then the condition

$\min_{u\in U,\ u^\Gamma\in U^\Gamma}\Big\{w(u,u^\Gamma,v) + \int_D\dfrac{\delta F_0}{\delta v}\,u\,dx + \int_{\partial D}\dfrac{\delta F_0}{\delta v}\,u^\Gamma\,dS\Big\}$   (3.6.20)

allows us to calculate the zero-approximation optimal control functions (operators)
$u_0(t,x) = \varphi_0(t, v(t,x))$ and $u_0^\Gamma(t,x) = \psi_0(t, v(t,x))$.

The expression for $G_1[v(t,x)]$ is used to calculate the first approximation
$F_1(t, v(t,x))$, and so on. In general, the kth approximation $F_k(t, v(t,x))$
$(k = 1,2,\dots)$ of the loss functional is determined as the solution of an
equation of the form (3.6.18), where the change $G_0 \to G_k$ and $F_0 \to F_k$ is
performed. Furthermore, simultaneously with $F_k$, we determine the pair of
functions (operators)

$u_k(t,x) = \varphi_k[t, v(t,x)], \quad x\in D, \qquad u_k^\Gamma(t,x) = \psi_k[t, v(t,x)], \quad x\in\partial D,$

which allow us to synthesize a suboptimal control system in the kth approximation
(the functions $\varphi_k$ and $\psi_k$ can be obtained from Eq. (3.6.20)
with $F_0$ replaced by $F_k$).
3.6.3. Quadrature formulas for functionals of successive approximations
$F_k[t, v(t,x)]$, $k = 0,1,2,\dots$. To use the above procedure
of approximate synthesis in practice, we need to solve Eq. (3.6.18) and the
corresponding equations for $F_k$ $(k = 1,2,\dots)$.
First, let us consider the zero-approximation equation (3.6.18). We show
that if the influence function $G(x,t;\zeta,\tau)$ of an instantaneous point source¹⁵
is known, then the solution of Eq. (3.6.18) can be written in the form

$F_0[t, v(t,x)] = \int_t^T d\tau\int_D\cdots\int_D dx^1\cdots dx^s\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} w_0(v_1,\dots,v_s)\,\dfrac{\exp\big[-\frac12\,(D_{t\tau}^{-1})_{\alpha\beta}\,(v_\alpha - \bar v_\alpha)(v_\beta - \bar v_\beta)\big]}{\big[(2\pi)^s\det\|D_{t\tau}\|\big]^{1/2}}\,dv_1\cdots dv_s,$   (3.6.21)

where the function $w_0(v_1,\dots,v_s)$ is defined by (3.6.19) and

$\bar v_\alpha = \int_D G(x^\alpha,\tau;\zeta,t)\,v(t,\zeta)\,d\zeta.$   (3.6.22)

Here the entries of the matrix $\|D_{t\tau}\|$ are given by the formulas

$D_{t\tau}^{\alpha\beta} = \int_t^\tau d\sigma\int_D\int_D G(x^\alpha,\tau;x,\sigma)\,K(\sigma,x,y)\,G(x^\beta,\tau;y,\sigma)\,dx\,dy,$

and $(D_{t\tau}^{-1})_{\alpha\beta}$ denotes the $(\alpha,\beta)$th element of the inverse matrix $\|D_{t\tau}\|^{-1}$.
To prove (3.6.21) and (3.6.22), we need to recall some well-known facts [61, 124]
from the theory of adjoint operators and Green functions.

Suppose that a smooth elliptic operator $\mathcal{L}_x$ of the form (3.6.2) is given in
an arbitrary domain D of an n-dimensional Euclidean space $R_n$. We also
assume that this operator is defined in the space of functions f sufficiently
smooth in D and satisfying the equation

$M_x f(x) = 0$   (3.6.24)

on the boundary $\partial D$ of the domain D; here $M_x$ denotes a certain differential
operator with respect to the variables $x\in\partial D$ (a boundary operator).

¹⁵The function $G(x,t;\zeta,\tau)$, $t > \tau$, with respect to the variables (x,t) is the solution
of the homogeneous boundary value problem (3.6.1), (3.6.4) (the case in which $u(t,x) = u^\Gamma(t,x) = \xi(t,x) \equiv 0$ in (3.6.1) and (3.6.4)) with the initial condition $v(\tau,x) = \delta(x-\zeta)$.
This function is also called the fundamental solution or the Green function of problem
(3.6.1), (3.6.4).
DEFINITION 3.6.1. The operators $\mathcal{L}_x^*$ $(x\in D)$ and $M_x^*$ $(x\in\partial D)$ are called
adjoint operators of $\mathcal{L}_x$ and $M_x$ if for arbitrary sufficiently smooth functions
f(x), satisfying (3.6.24), and g(x) satisfying

$M_x^* g(x) = 0, \qquad x\in\partial D,$   (3.6.25)

we have the relation

$\int_D\big[g\,\mathcal{L}_x f - f\,\mathcal{L}_x^* g\big]\,dx = 0.$

In general, the adjoint operators $\mathcal{L}_x^*$ and $M_x^*$ are not uniquely defined.
However, if we set $\mathcal{L}_x^*$ equal to the adjoint operator defined in the unbounded
domain $D = R_n$ [61], that is,

$\mathcal{L}_x^* g = \dfrac{\partial^2}{\partial x_i\,\partial x_j}\big[a_{ij}(t,x)\,g\big] - \dfrac{\partial}{\partial x_i}\big[b_i(t,x)\,g\big] + c(t,x)\,g,$

then it follows from Definition 3.6.1 and the Green formula

that the operator $M_x^*$ can be defined uniquely. So, for the first, second,
and third homogeneous boundary conditions (that is, for the conditions
(3.6.4.I)-(3.6.4.III)) where $u^\Gamma(t,x) = 0$, Eq. (3.6.25) takes, respectively, the
form

Now let us consider the parabolic operators

$L = \mathcal{L}_x - \dfrac{\partial}{\partial t}, \qquad L^* = \mathcal{L}_x^* + \dfrac{\partial}{\partial t}.$   (3.6.28), (3.6.29)
DEFINITION 3.6.2. A function $G(x,t;\zeta,\tau)$ defined and continuous for
$(x,t),(\zeta,\tau)\in\Omega$, $t > \tau$, is called the influence function of a point source
(the Green function) for the equation Lf = 0 in the domain Ω if for any
$\tau\in[0,T)$ the function $G(x,t;\zeta,\tau)$ satisfies the equation

$LG = 0$   (3.6.30)

in the variables (t,x) in the domain $D\times(\tau < t < T)$ and satisfies the initial
and boundary conditions

$\lim_{t\downarrow\tau} G(x,t;\zeta,\tau) = \delta(x-\zeta),$   (3.6.31)

$M_x G = 0 \quad\text{for } x\in\partial D,\ \tau < t < T.$   (3.6.32)

In a similar way, the Green function $G^*(x,t;\zeta,\tau)$ is defined for the adjoint
parabolic operator (3.6.29). The only difference is that, in this case, the
function $G^*$ is defined for time $t < \tau$. The conditions (similar to (3.6.30)-(3.6.32))
that determine the Green function for the adjoint problem have
the form

$L^*G^* = 0 \quad\text{for } (t,x)\in D\times(0 < t < \tau),$   (3.6.33)

$\lim_{t\uparrow\tau} G^*(x,t;\zeta,\tau) = \delta(x-\zeta),$   (3.6.34)

$M_x^* G^* = 0 \quad\text{for } (t,x)\in\partial D\times(0 < t < \tau).$   (3.6.35)

DUALITYT H E O R E M . If G ( x ,t ;C , r ) and G* ( x ,t ;C,r ) satisfy problems
(3.6.30)-(3.6.32) and (3.6.33)-(3.6.35), then

G ( x ,t ;C , 7 ) = G* (C, r ;x , t ) . (3.6.36)

PROOF. Let us consider the functions G ( y , q ; C , r ) and G * ( y ,q ; x , t ) for


y E D and r < q < t. Taking into account the fact that these functions
satisfy (3.6.30) and (3.6.33) in y and q , in view of Definition 3.6.1 of the
adjoint (in y ) operator C;, we have
Approximate Synthesis of Stochastic Control Systems 209

Rewriting (3.6.37) in the form

passing to the limit as E + 0, and taking into account (3.6.31) and (3.6.34),
we obtain (3.6.36).

Now, by using the properties of the Green functions, we shall show that
the functional (3.6.21) actually satisfies Eq. (3.6.18). To this end, we need
to calculate all derivatives in (3.6.18). Taking into account the relation

$\lim_{\tau\downarrow t}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} dv_1\cdots dv_s\,\Big\{w_0(v_1,\dots,v_s)\,\big[(2\pi)^s\det\|D_{t\tau}\|\big]^{-1/2}\exp\big[-\tfrac12(D_{t\tau}^{-1})_{\alpha\beta}(v_\alpha-\bar v_\alpha)(v_\beta-\bar v_\beta)\big]\Big\} = w_0\big(v(t,x^1),\dots,v(t,x^s)\big)$

and the property (3.6.31) of the Green function, we differentiate (3.6.21)
with respect to time and obtain

To calculate $\partial D_{t\tau}/\partial t$, we use the rules for differentiating determinants
and inverse matrices:

$\dfrac{\partial}{\partial t}\det B = \det B\ \operatorname{Sp}\big(B^{-1}\dot B\big), \qquad \dfrac{\partial}{\partial t}B^{-1} = -B^{-1}\dot B\,B^{-1}$

(here $\dot B$ is the matrix composed of the time-derivatives of the entries of the
matrix B). Performing the necessary calculations, we obtain

where, for brevity, we use the notation

By formulas (3.6.13) and (3.6.22), we can readily obtain the first- and
second-order functional derivatives

In view of (3.6.36), the Green functions G(xa, r ;x, t ) in (3.6.39)-(3.6.42)


satisfy (with respect to x and t ) the adjoint equation (3.6.33) in the interior
of the domain D and the adjoint boundary condition on the boundary dD.
Taking into account the fact that the adjoint boundary condition has the
form (3.6.25.111) for the third boundary value problem (Eq. (3.6.18) was
written just for this problem) and substituting (3.6.41) into (3.6.18), we
readily verify that the integral over the boundary a D in (3.6.18) is equal
to zero. Finally, substituting (3.6.38)-(3.6.42) into (3.6.18), we arrive a t an
identity, and relation (3.6.21) is thereby proved.
The solution of the zero approximation equation (3.6.18) is given by
formulas (3.6.21) and (3.6.22). As a rule, the higher-order approximations
Fk (t, v(t, x)), k 2 1, are calculated by more complicated formulas, where,
in addition, we must pass to the limit, since, in general, wk.(v(t, x)), k 2 1,
are not integral functionals of the form (3.6.19). Therefore, we can calculate
successive approximations Fk (t, v(t, x)), k >_ 1, by using, instead of (3.6.21),

the formula [95]

: (v(t, xl), . . . , v(t, xT)) is a finite-dimensional


where z g ( v 1 , . . .,v,) = G
analog of the functional Gk[v (t, x)] such that

lim w kA (VI, . . .,vr ) = Gk[v(t, x)].


r+m
A+O

The following example illustrates calculations with the help of formula


(3.6.43).
3.6.4. An example. If we choose some special expressions for the
functional (3.6.7), the operator (3.6.2), etc., then, using formulas (3.6.21)
and (3.6.43), we can obtain a finite approximate solution of the synthesis
problem. As an example, we calculate the optimal control of a substance
concentration in a cylinder of finite length.

Let us consider a control problem often encountered in chemical industry
processes. Suppose that there is a chemical reactor in which the output
product is obtained by catalytic synthesis reactions. We assume that the
reacting agents diffuse into the catalysis chamber through pipelines. There
may be branches in the pipeline through which reagents are coming to
technological units where the concentration of the entering substance varies
at random. Simultaneously, to obtain a qualitative output product, it is
necessary to maintain the reagent concentrations close to given values. One
of the possible ways to stabilize the concentration in the catalysis chamber is
to change the flow rate at the input of the corresponding pipeline.

After appropriate generalizations and idealizations, this problem can be
stated as follows. Let the plant (a pipeline) be a cylinder of length ℓ filled
with a homogeneous porous medium; the assumption $r \ll \ell$, where r is the
radius of the base, allows us to neglect radial variations of the concentration
v and assume that v depends only on (x,t), $0 \le x \le \ell$. We also assume
that the cylinder is closed at one end (x = ℓ) and the flow rate is given
at the other end of the cylinder. The concentration v of the substance in
the cylinder can be affected by changes in the flow rate at the end of the
cylinder (the rate of the incoming flow is the controlling action). Assuming
that the random perturbation ξ(t,x) is a stationary white noise, we obtain
the following mathematical model of the plant to be controlled [95]:
(here B and C are the diffusion and the porosity coefficients of the medium);

For the plant (3.6.45)-(3.6.47), we need to synthesize a regulator that minimizes
the mean value of the quadratic performance criterion

$I = \mathsf{E}\Big[\int_0^T\!\!\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,v(t,x)\,v(t,y)\,dt\,dx\,dy\Big]$   (3.6.48)

(θ(x,y) is a given positive definite function, i.e., the kernel) provided that
the absolute value of the boundary control action (the boundary flow of the
substance) u is bounded, that is,

$|u| \le u_m.$   (3.6.49)
In this example the Bellman equation (3.6.17) has the form

$-\dfrac{\partial F}{\partial t} = \int_0^\ell\!\!\int_0^\ell\theta(x,y)\,v(t,x)\,v(t,y)\,dx\,dy + a^2\!\int_0^\ell v\,\dfrac{\partial^2}{\partial x^2}\Big(\dfrac{\delta F}{\delta v}\Big)dx$
$\qquad + \dfrac12\int_0^\ell\!\!\int_0^\ell K(x,y)\,\dfrac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy + a^2\min_{|u|\le u_m}\Big[u\,\Big(\dfrac{\delta F}{\delta v}\Big)_{x=0}\Big], \qquad F[T, v(T,x)] = 0.$   (3.6.50)

Taking into account (3.6.45) and (3.6.46) and calculating the minimum with
respect to u, we can rewrite (3.6.50) in the form

$-\dfrac{\partial F}{\partial t} = \int_0^\ell\!\!\int_0^\ell\theta(x,y)\,v(t,x)\,v(t,y)\,dx\,dy + a^2\!\int_0^\ell v\,\dfrac{\partial^2}{\partial x^2}\Big(\dfrac{\delta F}{\delta v}\Big)dx$
$\qquad + \dfrac12\int_0^\ell\!\!\int_0^\ell K(x,y)\,\dfrac{\delta^2 F}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy - a^2 u_m\Big|\Big(\dfrac{\delta F}{\delta v}\Big)_{x=0}\Big|.$   (3.6.51)

Simultaneously, we obtain the optimal control law

$u_*[t, v(t,x)] = -u_m\,\mathrm{sign}\Big[\dfrac{\delta F}{\delta v(t,x)}\Big]_{x=0}.$   (3.6.52)

Thus, to obtain the final solution of the synthesis problem, it remains to
calculate the functional derivative $[\delta F/\delta v(t,x)]_{x=0}$ in (3.6.52). We calculate
it by the method of successive approximations.
The zero approximation. Suppose that $u_m$ is small. To solve (3.6.51),
we first set $u_m = 0$. As a result, we obtain the following equation of the
zero approximation:

$-\dfrac{\partial F_0}{\partial t} = \int_0^\ell\!\!\int_0^\ell\theta(x,y)\,v(t,x)\,v(t,y)\,dx\,dy + a^2\!\int_0^\ell v(t,x)\,\dfrac{\partial^2}{\partial x^2}\Big(\dfrac{\delta F_0}{\delta v(t,x)}\Big)dx$
$\qquad + \dfrac12\int_0^\ell\!\!\int_0^\ell K(x,y)\,\dfrac{\delta^2 F_0}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy, \qquad F_0[T, v(T,x)] = 0.$   (3.6.53)

Elementary calculations show that its solution (3.6.21) can be written in
the form

$F_0[t, v(t,x)] = \int_t^T d\tau\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,\big[\bar v(\tau,x)\,\bar v(\tau,y) + D_\tau(x,y)\big]\,dx\,dy,$   (3.6.54)

where

$\bar v(\tau,x) = \int_0^\ell G(x,\tau;\tilde x,t)\,v(t,\tilde x)\,d\tilde x, \qquad D_\tau(x,y) = \int_t^\tau d\sigma\int_0^\ell\!\!\int_0^\ell G(x,\tau;\tilde x,\sigma)\,K(\tilde x,\tilde y)\,G(y,\tau;\tilde y,\sigma)\,d\tilde x\,d\tilde y.$
Here the Green function G for the boundary value problem (3.6.45), (3.6.46)
can readily be obtained by the separation of variables (the Fourier method)
[26, 179] and represented as the series
The functional derivative of the quadratic functional (3.6.54) can readily
be calculated (for example, by using formulas (3.6.13); see also [91]) as
follows:

$\dfrac{\delta F_0}{\delta v(t,\tilde x)} = 2\int_t^T d\tau\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,G(x,\tau;\tilde x,t)\,\bar v(\tau,y)\,dx\,dy.$

Hence it follows that the optimal control law (3.6.52) has the following form
in the zero approximation:

$u_0[t, v(t,x)] = u_m\,\mathrm{sign}\Big[\int_t^T\!\!d\tau\int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,G(x,\tau;0,t)\,G(y,\tau;\tilde y,t)\,v(t,\tilde y)\,d\tilde y\,dx\,dy\Big].$   (3.6.56)
The first approximation. Taking into account (3.6.56), we can write
Eq. (3.6.51) in the first approximation with respect to $u_m$ as follows:

$-\dfrac{\partial F_1}{\partial t} = \int_0^\ell\!\!\int_0^\ell\theta(x,y)\,v(t,x)\,v(t,y)\,dx\,dy + a^2\!\int_0^\ell v\,\dfrac{\partial^2}{\partial x^2}\Big(\dfrac{\delta F_1}{\delta v}\Big)dx$
$\qquad + \dfrac12\int_0^\ell\!\!\int_0^\ell K(x,y)\,\dfrac{\delta^2 F_1}{\delta v(t,x)\,\delta v(t,y)}\,dx\,dy - 2a^2 u_m\,G[v(t,x)], \qquad F_1[T, v(T,x)] = 0.$   (3.6.57)

Now, formulas (3.6.21) and (3.6.22) are not sufficient for calculating
$F_1(t, v(t,x))$; we need to use the more complicated calculation procedure
based on (3.6.43) and (3.6.44). A finite-dimensional analog of the functional
G can be obtained by dividing the interval [0,t] into the intervals
$\Delta = t/r$ and replacing G by

Next, we use formulas (3.6.21), (3.6.22), and (3.6.43) as well as the
formula

$\mathsf{E}\,\big|h_\mu g_\mu + H\big| = \sqrt{\dfrac{2\gamma}{\pi}}\,e^{-H^2/2\gamma} + H\,\Phi\Big(\dfrac{H}{\sqrt{\gamma}}\Big), \qquad \gamma = \mathsf{E}\,(h_\mu g_\mu)^2,$
where

$\mu,\nu = 1,\dots,r; \qquad \Phi(x) = \dfrac{2}{\sqrt{2\pi}}\int_0^x e^{-y^2/2}\,dy; \qquad h_\mu g_\mu = h_1g_1 + \cdots + h_r g_r.$

As a result, for $F_1[t, v(t,x)]$, we obtain the expression

$F_1[t, v(t,x)] = F_0[t, v(t,x)] - 2a^2 u_m\int_t^T d\tau\Big\{\sqrt{\dfrac{2\gamma(\tau)}{\pi}}\,e^{-H^2/2\gamma(\tau)} + H\,\Phi\Big(\dfrac{H}{\sqrt{\gamma(\tau)}}\Big)\Big\},$   (3.6.59)

where $F_0[t, v(t,x)]$ is given by (3.6.54), and moreover,

$H = H[t,\tau,v(t,x)] = \int_\tau^T\!\!d\sigma\int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,G(x,\sigma;0,\tau)\,G(y,\sigma;\tilde z,\tau)\,G(\tilde z,\tau;\tilde y,t)\,v(t,\tilde y)\,dx\,dy\,d\tilde z\,d\tilde y,$   (3.6.60)

$\gamma(\tau) = \int_t^\tau\!\!d\tilde\sigma\int_\tau^T\!\!d\sigma\int_\tau^T\!\!d\sigma'\int_0^\ell\!\!\cdots\!\!\int_0^\ell\theta(x,y)\,\theta(x',y')\,G(x,\sigma;0,\tau)\,G(x',\sigma';0,\tau)\,G(y,\sigma;\tilde x,\tilde\sigma)\,G(y',\sigma';\tilde y,\tilde\sigma)\,K(\tilde x,\tilde y)\,dx\,dy\,dx'\,dy'\,d\tilde x\,d\tilde y.$

After the functional derivative $\big[\delta/\delta v(t,x)\big]_{x=0}$ is calculated, relations
(3.6.52) and (3.6.59) yield the controlling functional

$u_1[t, v(t,x)] = u_m\,\mathrm{sign}\Big\{\int_t^T\!\!d\tau\int_0^\ell\!\!\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,G(x,\tau;0,t)\,G(y,\tau;\tilde y,t)\,v(t,\tilde y)\,d\tilde y\,dx\,dy$
$\qquad - 2a^2 u_m\int_t^T\Phi\Big(\dfrac{H}{\sqrt{\gamma(\tau)}}\Big)\Big[\dfrac{\delta H}{\delta v(t,x)}\Big]_{x=0}\,d\tau\Big\}.$   (3.6.61)

Formula (3.6.61) enables us to synthesize the quasioptimal control system
in the first approximation.
Although the quasioptimal control algorithms (3.6.56) and (3.6.61) look
somewhat cumbersome (especially formula (3.6.61)), they admit a transparent
technical realization. For example, let us consider the zero-approximation
algorithm (3.6.56), which can be written as

$u_0[t, v(t,x)] = u_m\,\mathrm{sign}\Big[\int_0^\ell Q(\tilde y,t)\,v(t,\tilde y)\,d\tilde y\Big],$   (3.6.62)

where

$Q(\tilde y,t) = \int_t^T d\tau\int_0^\ell\!\!\int_0^\ell\theta(x,y)\,G(x,\tau;0,t)\,G(y,\tau;\tilde y,t)\,dx\,dy$

is a known function calculated previously. The current value of the state
function v(t,x) can be determined by a system of data units that measure
the concentration $v(t,x_1), v(t,x_2),\dots,v(t,x_p)$ at the points $x_1,x_2,\dots,x_p$
lying along the cylinder. In particular, if the concentration gauges are
placed uniformly along the cylinder, then the integral in (3.6.62) can be
replaced by the sum

$u_0 \approx u_m\,\mathrm{sign}\Big[\sum_{i=1}^p Q_i(t)\,v(t,x_i)\Big], \qquad Q_i(t) = \dfrac{\ell}{p}\,Q(x_i,t).$   (3.6.63)

As a result, we obtain an algorithm whose realization does not present any
difficulties.
Indeed, it follows from (3.6.63) that, besides a system of data units, the
control circuit (the feedback circuit) contains a system of linear amplifiers
with amplification factors $Q_i(t)$, an adder, and a relay-type switching device
that connects the pipeline [0,ℓ] either to reservoir 1 (for pumping additional
substance) or to reservoir 2 (for substance suction at the pipeline input).
Figure 29 shows the block diagram of the system realizing the control
algorithm (3.6.63).
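A minimal sketch of the relay law (3.6.63) (Python with NumPy; the weights are computed from the cosine-series Green function assumed above, and the kernel θ(x,y), the gauge readings, and all numbers are illustrative assumptions):

```python
import numpy as np

# Relay realization of (3.6.63): u0 = u_m * sign(sum_i Q_i(t) * v(t, x_i)).
l, a2, T, u_m, p = 1.0, 1.0, 1.0, 0.5, 10
xg = (np.arange(p) + 0.5) * l / p               # gauge locations along the pipe

def G(x, t, xs, tau, N=100):                    # assumed Neumann Green function
    n = np.arange(1, N + 1)[:, None]
    m = np.cos(n*np.pi*np.atleast_1d(x)/l) * np.cos(n*np.pi*xs/l)
    return 1.0/l + (2.0/l)*np.sum(m*np.exp(-a2*(n*np.pi/l)**2*(t - tau)), axis=0)

theta = lambda x, y: np.exp(-10.0*(x - y)**2)   # illustrative penalty kernel

def Q(ys, t, nq=60):
    # Q(ys,t) = int_t^T dtau int int theta(x,y) G(x,tau;0,t) G(y,tau;ys,t) dx dy
    xs = np.linspace(0.0, l, nq); taus = np.linspace(t + 1e-3, T, nq)
    Th = theta(xs[:, None], xs[None, :])
    vals = [G(xs, tau, 0.0, t) @ Th @ G(xs, tau, ys, t) * (l/nq)**2
            for tau in taus]
    return np.trapz(vals, taus)

t = 0.2
Qi = np.array([Q(y, t) for y in xg]) * l / p    # amplification factors Q_i(t)
v_meas = 0.1 * np.sin(np.pi * xg / l)           # illustrative gauge readings
print("relay control u0 =", u_m * np.sign(Qi @ v_meas))
```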
The quasioptimal first-approximation algorithm (3.6.61) can be realized
in a similar way. Here only the control circuit, along with a nonlinear
unit of an ideal relay type, contains nonlinear transformers that realize the
probability error function Φ(x).

However, it should be noted that an error is inevitably present in the
finite-dimensional approximation of the state function v(t,x) (when the
algorithm (3.6.56) is replaced by (3.6.63)), since it is impossible to measure
the system state v(t,x) precisely (this state is a point in the infinite-dimensional
Hilbert space $L_2$). However, if the points $x_1,\dots,x_p$ of location
of the concentration data units lie sufficiently close to each other, then this
error can be neglected.
CHAPTER IV

SYNTHESIS OF QUASIOPTIMAL SYSTEMS
IN THE CASE OF SMALL DIFFUSION
TERMS IN THE BELLMAN EQUATION
If random actions ξ(t) on the plant in the closed-loop control system
shown in Fig. 3 are of small intensity and the observation errors η(t) and ζ(t)
are large, then the Bellman equation contains a small parameter; namely,
the coefficients of the second-order derivatives of the loss function with respect to the
phase variables are small.

Indeed, considering the synthesis problem for which we derived the Bellman
equation in the form (1.4.26) in §1.4, we assume that the matrix
$\sigma(t,x(t))$ in the plant equation (1.4.2), which determines the intensity of
random perturbations, has the form $\sigma(t,x) = \varepsilon^{1/2}\sigma_0(t,x)$, where ε is a
small parameter $(0 < \varepsilon \ll 1)$. Moreover, if $B^y(t,y) = \varepsilon B_0^y(t,y)$ is the
diffusion matrix of the input process, then Eq. (1.4.26) acquires the form

$F_t + [A^y(t,y)]^T F_y + \dfrac{\varepsilon}{2}\big[\operatorname{Sp}B_0^x(t,x)F_{xx} + \operatorname{Sp}B_0^y(t,y)F_{yy}\big] + \Phi(t,x,y,F_x) = 0,$   (4.0.1)

where $B_0^x(t,x) = \sigma_0(t,x)\,\sigma_0^T(t,x)$.

On the other hand, large observation errors correspond to the case in
which the matrix Q(t) in (1.5.46) has the form $Q(t) = \varepsilon^{-1/2}Q_0(t)$. In
this case, the Bellman equation (1.5.54) for the problem considered can be
written in the form

$F_t + \dfrac{\varepsilon}{2}\operatorname{Sp}DRDF_{mm}^T + \operatorname{Sp}\big[F_D\big(\sigma\sigma^T - \varepsilon DRD\big)\big] + \tilde\Phi(m,D,F_m,F_D) = 0,$   (4.0.2)

where

If the value of the parameter E is small, then the solutions of the above
equations and the solutions of the equations

obtained from (4.0.1)) (4.0.2) by setting E = 0, are expected to be close to


each other. The equations for F0 are, generally speaking, simpler than the
original Bellman equations, since they do not contain second-order deriva-
tives and thus are partial differential equations of the first order. If these
simpler equations can be solved exactly, then we can construct solutions of
the original Bellman equations as series in powers of the small parameter E,
that is, as F = F0+&F1 + E ~. . .. Here the function F0plays the role of the
leading term (generating solution) of the expansion. Taking finitely many
terms
F~= F O + E F ~ + . . . + E ~ F ~ (4.0.5)
of the asymptotic series and considering Fk as an approximate solution of
the Bellman equation (the kth approximation), we can readily solve the
synthesis problem corresponding to this approximation. To this end, it
suffices to make the change F + Fk in the expression for the optimal
control algorithm u, = ~ ~ x, ( y,
t d, F / a x ) (see, for instance, (1.4.25)). In
this way, we obtain the quasioptimal algorithm for the kth approximation:
uk(t12,Y) = PO(^, 2, Y, aFk(t, 2, Y ) / ~ Z ) -
>
For k 1, the expressions of Fk (or Fk) can be calculated recurrently. If
the functions and are sufficiently smooth, then the system of equations
for the successive terms F1,F2,. . . in the expansion (4.0.5) can be obtained
in the standard way by substituting the expansion (4.0.5) into Eqs. (4.0.1)
or (4.0.2) and setting the coefficients of different powers E~ (k 2 1) of the
small parameter equal to zero. In other cases, it may be convenient to
use a somewhat different scheme of calculations in which the successive
approximations Fk (k >1) are obtained as solutions of the sequence of
equations:

This approximate synthesis procedure was studied in detail and exploited


for solving some special problems in [34, 56, 58, 172, 1751. The accuracy
of the approximate synthesis was investigated in [34, 561. It was shown
that, under certain conditions, the use of the quasioptimal control uk in
the kth approximation gives an error of the order of E~~~ in the value of
the minimized functional. In other words, if instead of the optimal control
algorithm u, we use the quasioptimal algorithm uk, then the difference
between the value of the optimality criterion I[uk] corresponding to this
control and the minimum possible (optimal) value I[u,] = F is of the order
of E"', that is,

where c is a constant.
In the present section the main attention is paid to the "algorithmic"
aspects of the method, that is, to calculational methods for obtaining qua-
sioptimal controls uk. As an example, we consider two specific problems of
the optimal servomechanism synthesis. First (in 54.1), we consider the syn-
thesis problem that generalizes the problem considered in 52.2 to the case in
which the input process ~ ( t is) a diffusion Markov process inhomogeneous
in the phase variable y. Next (in $4.2), we write an approximate solution of
the synthesis problem for an optimal system of tracking a discrete Markov
process of a "telegraph signal" type when the command input is observed
on the background of a white noise.

§4.1. Approximate synthesis of a
servomechanism with small-intensity noise
Let us consider the servomechanism shown in Fig. 10. Assume that the
plant P is described by a scalar equation of the form

$\dot x = u + \sqrt{\varepsilon N}\,\xi(t),$   (4.1.1)

where ξ(t) is the standard white noise of unit intensity (1.1.31), ε and N
are given positive constants (ε is a small parameter), and the values of
admissible controls u lie in the region¹

$-a - u_m \le u \le -a + u_m,$   (4.1.2)

¹The nonsymmetric constraints (4.1.2) are, first, more general (see [21]), and second,
they allow a more convenient comparison between the results obtained later and the
corresponding formulas constructed in §2.2.

where urn > a > 0. The command input ~ ( t is) a J(t)-independent scalar
Markov diffusion process with drift and diffusion coefficients

where @ and B > 0 are given numbers and E is the same small parameter
as in (4.1.1). The performance of the tracking system will be estimated by
the value of the integral optimality criterion

>
where the penalty function c (y(t) - x (t)) = c(z(t)) 0, c(0) = 0, is a given
concave function of the error signal z(t) = y(t) - x(t).
The problem stated above is a generalization of the problem studied in
Section 2.2.1 of $2.2 to the case in which the plant is subject to uncontrolled
random perturbations and the input Markov process y(t) is inhomogeneous
in the phase variable y (the drift coefficient AY = AY(y) = -@y # const).
The inhomogeneity of the input process y(t) makes the synthesis problem
more complicated, since in this case the Bellman equation cannot be re-
duced to a one-dimensional equation (as in Section 2.2.1 of $2.2).
Since problem (4.1.1)-(4.1.4) is a special case of problem (1.4.2)-(1.4.4),
it follows from (1.4.21), (1.4.22), and (4.1.1)-(4.1.4) that the Bellman
equation has the form

$-\beta y F_y + \min_{-a-u_m\le u\le -a+u_m}[uF_x] + \dfrac{\varepsilon}{2}\big(NF_{xx} + BF_{yy}\big) + c(y-x) = -F_t.$   (4.1.5)

If, as in Section 2.2.1 of §2.2, we introduce a new phase variable z = y − x
and replace the loss function F(t,x,y) by F(t,y,z), then Eq. (4.1.5) can
readily be written as

$-\beta y(F_y + F_z) + \min_{-a-u_m\le u\le -a+u_m}[-uF_z] + \dfrac{\varepsilon}{2}\big[BF_{yy} + 2BF_{yz} + (B+N)F_{zz}\big] + c(z) = -F_t.$   (4.1.6)

We are interested in the stationary tracking mode as the terminal time
$T\to\infty$. If the stationary loss function f(y,z) is introduced in the standard
way (see (1.4.29) and (2.2.9)),

$f(y,z) = \lim_{T\to\infty}\big[F(t,y,z) - \gamma\,(T-t)\big],$   (4.1.7)

then (4.1.6) implies the following stationary Bellman equation for the prob-
lem considered:

- P ~ ( f y+ f t ) + -a-u,<u<-a+u,
min [-ufrl

As usual, the number y 2 0 in (4.1.8) characterizes the mean losses per unit
time under stationary operating conditions. This number is an unknown
variable and can be obtained together with the solution of Eq. (4.1.8).
Let us discuss the possibility that Eq. (4.1.8) can be solved. By R+
we denote the domain on the phase plane (y, z) where f, > 0 and by R-
the domain where f, < 0. It follows from (4.1.8) that the optimal control
u,(y, z) must be equal to u, = u, - a in R+ and to u* = -urn - a in R - .
Denoting by f+(y, z) and f- (y, z) the values of the loss function f (y, z)
in the domains R+ and R-, we obtain the following two equations from
(4.1.8):

Since in (4.1.8) the first derivatives f y and f, are continuous on the interface
r between R+ and R- [172], both equations in (4.1.9) hold on r, and we
have the condition

Since the control action u, is of opposite sign on each side of the inter-
face I', the line I' is naturally called a switching line. It follows from the
preceding that the problem of the optimal system synthesis is equivalent to
the problem of finding the equation for the switching line I?.
Equations (4.1.9) cannot be solved exactly. The fact that the expressions
with second-order derivatives contain a small parameter ε allows us to solve
these equations by the method of successive approximations. In the zero
approximation, instead of (4.1.9), we need to solve the system of equations

$\beta y\,\dfrac{\partial f_\pm^0}{\partial y} + (\beta y - a \pm u_m)\,\dfrac{\partial f_\pm^0}{\partial z} = c(z) - \gamma^0.$   (4.1.11)

By $f_\pm^0$, $\gamma^0$, and $\Gamma^0$ we denote the loss function, the stationary error, and
the switching line obtained from Eq. (4.1.11) for the zero approximation.
The successive approximations $f_\pm^k$, $\gamma^k$, and $\Gamma^k$ $(k \ge 1)$ are calculated
recurrently by solving a sequence of equations of the form

$\beta y\,\dfrac{\partial f_\pm^k}{\partial y} + (\beta y - a \pm u_m)\,\dfrac{\partial f_\pm^k}{\partial z} = c^k(y,z) - \gamma^k,$   (4.1.12)

where

$c^k(y,z) = c(z) + \dfrac{\varepsilon}{2}\Big[B\,\dfrac{\partial^2 f^{k-1}}{\partial y^2} + 2B\,\dfrac{\partial^2 f^{k-1}}{\partial y\,\partial z} + (B+N)\,\dfrac{\partial^2 f^{k-1}}{\partial z^2}\Big].$   (4.1.13)
A method for solving Eqs. (4.1.11), (4.1.12) was proposed in [172]. Let
us briefly describe the procedure for calculating the successive approximations
$f^k$, $\gamma^k$, and $\Gamma^k$, $k = 0,1,2,\dots$. First of all, note that Eqs. (4.1.11), (4.1.12)
are the Bellman equations for deterministic problems of synthesis of second-order
control systems in which the equation of motion has the form

$\dot y = -\beta y, \qquad \dot z = -\beta y + a \mp u_m$   (4.1.14)

(in the second equation the signs "minus" and "plus" of $u_m$ correspond
to the domains $R_+^k$ and $R_-^k$, respectively). As was shown in [172], the
gradient $\nabla f$ of the solution of the nondiffusion equations (4.1.11), (4.1.12)
remains continuous when we cross the interface $\Gamma^k$, that is, on $\Gamma^k$ we have
the conditions

$\dfrac{\partial f_+^k}{\partial y} = \dfrac{\partial f_-^k}{\partial y}, \qquad \dfrac{\partial f_+^k}{\partial z} = \dfrac{\partial f_-^k}{\partial z}, \qquad k = 0,1,2,\dots,$   (4.1.15)

if the phase trajectories of the deterministic system (4.1.14) either approach
the line $\Gamma^k$ on both sides (the switching line of the first kind) or approach $\Gamma^k$
on one side and recede on the other side (the switching line of the second
kind, see Fig. 4). This fact allows us to calculate the gradient $\nabla f^k$ along
$\Gamma^k$. Indeed, in the domain $R_+^k$ we have

$\beta y\,\dfrac{\partial f_+^k}{\partial y} + (\beta y - a + u_m)\,\dfrac{\partial f_+^k}{\partial z} = c_+^k(y,z) - \gamma^k,$   (4.1.16)

and in the domain $R_-^k$

$\beta y\,\dfrac{\partial f_-^k}{\partial y} + (\beta y - a - u_m)\,\dfrac{\partial f_-^k}{\partial z} = c_-^k(y,z) - \gamma^k.$   (4.1.17)

It follows from the preceding continuity considerations that both equations


(4.1.16) and (4.1.17) must be satisfied on r simultaneously. Solving these
equations for the first-order derivatives, we find the gradient of the loss
function on the interface rL between R$ and Rf:

This allows us to write the difference between the values of the loss function
a t different points on the boundary rk as a contour integral along the
boundary,

1
f k ( ~-) f k ( p ) =
P
Q
+
A: dy A: dl. (4.1.19)

If the part of rk between the points P and Q is a boundary of the first


kind (that is, the representative point of system (4.1.14) once coming to the
boundary moves in the "sliding regime" along the boundary [172]), then
formula (4.1.19) makes it possible to obtain a necessary condition for the
boundary rk to be optimal. The corresponding equation for the desired
switching line z = z k ( y ) is obtained from the condition that the difference
(4.1.19) must be minimal. This equation can be written in the form [I721

Equation (4.1.20) is a consequence of the following illustrative argu-


ments. Let yQ and yp be the coordinates of the points Q and P on the
y-axis. We divide the interval [yQ,y,] into N equal intervals of length
A = Iy, - yQ[ I N and replace the contour integral (4.1.19) by the corre-
sponding integral sum

where yi = yp f (i - 1)A and zi = ~ ( y i ) We . need to choose zi so that to


minimize the function @ a ( z l , . . ., z ~ + ~A) .necessary extremum condition
a@,/azi = 0 written for a n arbitrary i and the sum (4.1.21) allows us to
write the following system of equations for optimal zi:
226 Chapter IV

If A;(y, z), A,k(y,z), and zk(y) are sufficiently smooth functions of their
arguments, then we have
k k a ~ k aA:
Az(Yi-~,zi-l)-A,(yi,zi) = -Az(yi,zi)+-(yi,zi)(zi-l-zi)+~(A)
BY az
(4.1.23)
for small A = yi - y;-1. Substituting (4.1.23) into (4.1.22), taking into
+
account the relation zi+1 - 2zi zi-1 = o(A), and passing to the limit as
A + 0, we obtain the condition

which coincides with (4.1.20), since i is an arbitrary number.
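The discretize-then-optimize argument above is easy to test numerically. The sketch below (Python with NumPy and SciPy; the integrands $A_y$, $A_z$ are toy smooth functions, not the ones of this problem) minimizes the sum (4.1.21) directly and checks that the minimizer tracks the curve on which the condition (4.1.20) holds.

```python
import numpy as np
from scipy.optimize import minimize

# Toy check of (4.1.21)-(4.1.24): minimize
# Phi_D(z_1..z_{N+1}) = sum_i [A_y(y_i,z_i) D + A_z(y_i,z_i)(z_{i+1}-z_i)]
A_y = lambda y, z: (z - y**2)**2      # illustrative smooth integrands;
A_z = lambda y, z: 0.0 * z            # with A_z = 0, condition (4.1.20)
                                      # reduces to dA_y/dz = 0, i.e. z = y^2
N = 40
y = np.linspace(0.5, 1.5, N + 1); D = y[1] - y[0]

def Phi(zs):
    return np.sum(A_y(y[:-1], zs[:-1]) * D + A_z(y[:-1], zs[:-1]) * np.diff(zs))

res = minimize(Phi, x0=np.zeros(N + 1), method="BFGS")
print(np.max(np.abs(res.x[:-1] - y[:-1]**2)))   # small: minimizer follows z = y^2
```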


If we know the gradient of the loss functions along the switching line
rk and the equation z = zk(y) for rk,then we can find a condition for
the parameter yk that is the kth approximation of the stationary tracking
error y in the original diffusion equation (4.1.8). By using (4.1.18) and the
equation z = zk (y), we obtain the following expression for total derivative
dfk/dy along rk:

The unknown parameter yk can be found from the condition that the de-
rivative (4.1.25) is finite a t a stable point; in the problem considered the
point y = 0 is stable. More precisely, this condition can be written as
lim ywk(y,y k ) = 0. (4.1.26)
Y+O
The expression

is the increment of the loss functions f k on the time interval dt. Hence
(4.1.26) means that this increment becomes zero after the controlled de-
terministic system (4.1.14) arrives a t the stable state y = 0. Obviously, in
this case, it follows from the above properties of the penalty function c(z)
that we also have z = 0. Thus, relation (4.1.26) is a necessary condition for
the deterministic Bellman equations (4.1. l l ) , (4.1.12) to have stationary
solutions.
Let us use the above-treated calculation procedure for solving the equa-
tions of successive approximations (4. l.l l ) , (4. l.12). We restrict our calcu-
lations to a small number of successive approximations that determine the
most important terms of the corresponding asymptotic expansions and pri-
marily affect the structure of the controller C when a quasioptimal control
system is designed.
The zero approximation. To calculate the zero approximation, we
need to solve the system of equations (see (4.1.11))

$\beta y\,\dfrac{\partial f_\pm^0}{\partial y} + (\beta y - a \pm u_m)\,\dfrac{\partial f_\pm^0}{\partial z} = c(z) - \gamma^0.$   (4.1.27)

Using (4.1.15) and solving system (4.1.27) with respect to $A_y^0 = \partial f^0/\partial y$
and $A_z^0 = \partial f^0/\partial z$, we obtain the following expressions for the
components of the gradient $\nabla f^0$ (4.1.18) on the switching line $\Gamma^0$:

$A_y^0(y,z) = \dfrac{c(z) - \gamma^0}{\beta y}, \qquad A_z^0(y,z) = 0.$   (4.1.28)

Equation (4.1.20), which is a necessary condition for the switching line of
the first kind, together with (4.1.28) allows us to obtain the equation for $\Gamma^0$:

$\dfrac{\partial A_y^0}{\partial z} = \dfrac{1}{\beta y}\,\dfrac{dc(z)}{dz} = 0.$   (4.1.29)

Since, by assumption, the penalty function c(z) attains its unique minimum
at z = 0, the condition (4.1.29) implies the equation

$z = z^0(y) \equiv 0,$   (4.1.30)

that is, in the zero approximation, the switching line coincides with the
y-axis on the plane (y,z).

Now let us verify whether (4.1.30) is a switching line of the first kind.
An examination of the phase trajectories of system (4.1.14) shows that on the
segment

$\ell_- \le y \le \ell_+, \qquad \ell_- = \dfrac{a - u_m}{\beta}, \quad \ell_+ = \dfrac{a + u_m}{\beta},$   (4.1.31)

the phase trajectories approach the y-axis on both sides;² therefore, this
segment is an actual switching line. For $y\notin[\ell_-,\ell_+]$, the equation for the
switching line $\Gamma^0$ will be obtained in the sequel.

Now let us calculate the stationary tracking error $\gamma^0$. From (4.1.25),
(4.1.26), and (4.1.28), we have

$\gamma^0 = c(0) = 0.$

²Obviously, in this case, the domain $R_+^0$ ($R_-^0$) is the upper (lower) half-plane of the
phase plane (y,z). Therefore, to construct the phase trajectories, in the second equation
in (4.1.14), we must take $u_m$ with the sign "minus" for z > 0 and with "plus" for z < 0.

It also follows from (4.1.28) and (4.1.31) (with regard to c(0) = 0) that the
loss function is constant on the y-axis for l- < y < l + ; thus we can set
fO(y,O) = 0 for y E [l-,[+I.
To calculate the loss function f 0 a t a n arbitrary point (y, z), we need
to integrate Eqs. (4.1.27). To this end, let us first write the system of
equations for the integral curves (characteristics):

If yo denotes the point a t which a given integral curve intersects the y-axis,
z = 0, then (4.1.32) implies the following equation for the characteristics
(the phase trajectories) :

as well as for the zero approximation of the loss function

c(zl) dz'
PVZ'JI'P*(Y)+ z - z 1 1 - a f urn' (4.1.34)

In (4.1.34) we have yo = y z l [ y h (y) + z ] , where y-l(y) is the inverse func-


tion of cp(y).
By using the loss function (4.1.34) obtained, we can determine how to
continue the switching line I'O outside the interval [l-, l+], where r0 is a
switching line of the second kind (that is, the phase trajectories of system
(4.1.14) approach r0 on one side and go away on the other side). In this
case, as already noted, the gradient (4.1.15) remains continuous on r O ;
therefore, the derivatives of the loss function along r0 are determined as
previously by (4.1.28). However, in general, formula (4.1.20), from which
Eq. (4.1.30) was derived, may not be valid any longer. In this case, the
equation for r0can be obtained by differentiating (4.1.34), say, with respect
to z and by setting, in view of (4.1.28), the expression obtained equal to
zero. This implies the following equation for the switching line rO:

Here we took into account the equality c(0) = 0 and assumed that the
condition (dy*/dyo) - (ayoldz) # O must be satisfied on r0 determined
by (4.1.35).

An analysis of the phase trajectories (4.1.14) shows that, to find r0for


y > l+,we must use the function cp- (y) in Eq. (4.1.35) (correspondingly,
cp+ (y), to obtain r0for y < L ) .
Let us calculate r0for y > t+ in the case of the penalty function c(z) =
z2. In this case, the integral in (4.1.35) can readily be calculated and
Eq. (4.1.35) acquires the form

+
(in (4.1.36) we have yo = cp~l[cp-(~) z], where cp-(~)is determined by
(4.1.33)).
Equation (4.1.36) determines the switching line z = zO(y) for z > L+
+
implicitly. Near the point y = l+= ( a u,)/P a t which the switching line
changes its type, Eq. (4.1.36) allows us to obtain an approximate formula
and thus write the equation for r0explicitly:

Figure 30 shows the position of the switching line r0and the phase trajec-
tories in the zero approximation.
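For a feel of how the relay law behaves, here is a minimal Euler-Maruyama simulation of the closed loop (4.1.1)-(4.1.4) with the zero-approximation switching line z = 0 (Python with NumPy; all parameter values are illustrative, and the simple rule $u = -a + u_m\,\mathrm{sign}\,z$ is the bang-bang law described above, applied over the whole phase plane for simplicity):

```python
import numpy as np

# Euler-Maruyama simulation of the tracking loop:
#   dy = -beta*y dt + sqrt(eps*B) dW1   (command input)
#   dx =  u dt      + sqrt(eps*N) dW2   (plant), u in [-a-u_m, -a+u_m]
rng = np.random.default_rng(1)
beta, eps, B, N, a, u_m = 1.0, 0.05, 1.0, 1.0, 0.2, 1.0
dt, n_steps = 1e-3, 200_000

y = x = 0.0
z2_sum = 0.0
for _ in range(n_steps):
    z = y - x
    u = -a + u_m * np.sign(z)           # relay law, switching line z = 0
    y += -beta * y * dt + np.sqrt(eps * B * dt) * rng.standard_normal()
    x += u * dt + np.sqrt(eps * N * dt) * rng.standard_normal()
    z2_sum += z * z
print("empirical stationary mean-square error E[z^2]:", z2_sum / n_steps)
```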
Higher-order approximations. Everywhere in the sequel we assume
that the penalty function $c(z) = z^2$. Let us consider Eqs. (4.1.12) corresponding
to the first approximation:

$\beta y\,\dfrac{\partial f_\pm^1}{\partial y} + (\beta y - a \pm u_m)\,\dfrac{\partial f_\pm^1}{\partial z} = c_\pm^1(y,z) - \gamma^1.$   (4.1.37)

To simplify the further calculations, we note that, in the case of the stationary
tracking mode and of the small diffusion coefficients considered here,
the probability that the phase variables y and z fluctuate near the origin
of the phase plane (y,z) is very large. The values $y = (a \pm u_m)/\beta$ at which
the switching line $\Gamma^0$ changes its type are attained very seldom (under the
stationary operating conditions); therefore, we are mainly interested in finding
the exact position of the switching line in the region $-(u_m - a)/\beta < y < (u_m + a)/\beta$,
where, in the zero approximation, the position of the switching
line is given by the equation z = 0. Next, note that the first-approximation
equation (4.1.37) differs from the corresponding zero-approximation equation
(4.1.27) only by a small (of the order of ε) term in the expression for
$c^1(y,z)$ (see (4.1.38)). Therefore, the continuity conditions imply that the
switching line $\Gamma^1$ in the first approximation determined by (4.1.37) is sufficiently
close to the previous position z = 0. Thus, we can calculate $\Gamma^1$ by
using, instead of exact formulas, approximate expressions corresponding to
small values of z.

Now, taking into account the preceding arguments, let us calculate the
function $c^1(y,z) = c_\pm^1(y,z)$ determined by (4.1.38). To this end, we differentiate
the second expression in (4.1.34) and restrict ourselves to the first-
and second-order terms in z. As a result, we obtain³

$\dfrac{\partial^2 f_\pm^0}{\partial z^2} = \dfrac{2z}{\beta y - a \pm u_m} + O(z^2), \qquad \dfrac{\partial^2 f_\pm^0}{\partial z\,\partial y} = -\dfrac{\beta z^2}{(\beta y - a \pm u_m)^2} + O(z^3), \qquad \dfrac{\partial^2 f_\pm^0}{\partial y^2} = O(z^3).$   (4.1.39)

Substituting (4.1.39) into (4.1.38) and (4.1.37), we arrive at the equations

$\beta y\,\dfrac{\partial f_\pm^1}{\partial y} + (\beta y - a \pm u_m)\,\dfrac{\partial f_\pm^1}{\partial z} = z^2 - \gamma^1 + \dfrac{\varepsilon(B+N)\,z}{\beta y - a \pm u_m}$   (4.1.40)

(in Eqs. (4.1.40) we preserve only the most important terms in the functions
$c_\pm^1(y,z)$ and neglect the terms of order higher than or equal to that
of $\varepsilon^3$).

³The functions $f_+^1(y,z)$ and $f_-^1(y,z)$, as the solutions of Eqs. (4.1.37), are defined in
$R_+^1$ and $R_-^1$. At the same time, the functions $f_+^0(y,z)$ and $f_-^0(y,z)$ are defined in $R_+^0$
and $R_-^0$. However, since the switching lines $\Gamma^0$ (between $R_+^0$ and $R_-^0$) and $\Gamma^1$ (between
$R_+^1$ and $R_-^1$) are close to each other, to calculate (4.1.39), we have used the expressions
(4.1.34) for $f_\pm^0$ in $R_+^1$ and $R_-^1$.

Substituting (4.1.39) into (4.1.38) and (4.1.37), we arrive a t the equations

af:
Py-+ (Py - a i u r n ) -
a f i =z2 -yl+ +
E(B N ) z (4.1.40)
ay 8.2 (PY- a + urn)
(in Eqs. (4.1.40) we preserve only the most important terms in the functions
c i ( y , z) and neglect the terms of the order higher than or equal to that
of e3).
In view of (4.1.15), both equations (4.1.40) hold on the boundary r l .
By solving these equations, we obtain the components of the gradient of
the loss function V fl(y, z ) on the switching line r l :

In this case, the condition (4.1.20) (a necessary condition for the switch-
ing line of the first kind) leads to the equation

Hence, neglecting the order terms, we obtain the following equation for
the switching line I'l in the first approximation:

Equation (4.1.43) allows us to calculate the stationary tracking error


y1 in the first approximation. The function wl(y, yl) readily follows from
(4.1.25), (4.1.41), and (4.1.43). Substituting the expression obtained for
wl(y, y l ) into (4.1.26), we see that y1 = O(e2), that is, the stationary
tracking error in the first approximation coincides with that in the zero
approximation, namely, = 0.
The stationary error y attains nonzero values only in the second approx-
imation. To calculate the derivative (4.1.25)

with desired accuracy, we need not calculate the loss function f& (y, z) in
the first approximation but can calculate c2(y,z) in (4.1.12) and (4.1.13)
232 Chapter IV

by using expressions (4.1.41) for the derivatives d f '/dy and d f l / d z , which


are satisfied along the switching line rl.
Differentiating the first relation in (4.1.41), we obtain

d2f1 -
--
E(B N ) + along I".
dz2 u&-(P~-a)~

As follows from (4.1.41), the other second-order derivatives d2f '/dzdy and
d2f1/dy2 on I'l are higher-order infinitesimals and can be neglected when
we calculate y2. Therefore, (4.1.45) and (4.1.13) yield the following approx-
imation expression for the function c2(y,z):

Taking (4.1.46) into account and solving the system (4.1.16), (4.1.17)
a
(with k = 2) for f 2/ayand 8f /dz, we calculate the functions A: and A:
in (4.1.44) as

From (4.1.26), (4.1.43), (4.1.44), and (4.1.47), we derive the equation foi
the stationary tracking error in the second approximation:

whence it follows that

Formula (4.1.48) exactly coincides with the stationary error (2.2.23) ob-
tained for a homogeneous (in y) input process. The inhomogeneity, in other
words, the dependence of the stationary error on the parameter P, begins to
manifest itself only in the calculations of higher approximations. However,
the drift coefficient -fly affects the position of the switching line (4.1.43)
already in the first approximation. Formula (4.1.43) is a generalization of
the corresponding formula (2.2.22); for /3 = O these formulas coincide.
Figure 31 shows the analogous circuit diagram of the tracking system
that realizes the optimal control algorithm in the first approximation. The
unit N C is an inertialess nonlinear transformer governed by the functional
Synthesis of Quasioptimal Systems

dependence (4.1.43). The realization of the unit N C in practice is substan-


tially simplified owing to the fact that the operating region of the input
variable y (where (4.1.43) must be maintained) is small. In fact, it suf-
fices to maintain (4.1.43) for lyl < C E ' / ~ , where C is a positive constant
of the order of O(1). Outside this region, the character of the functional
input-output relation describing N C is of no importance. In particular, for
Iyl > CE'/', the nonlinear transformer N C can be constructed by using the
equations for the switching line r0 in the zero approximation or, which is
even simpler, by using the equation z 0. This is due to the fact that the
system shown in Fig. 31 optimizes only the stationary tracking conditions
when the phase variables are fluctuating in a small neighborhood of the
origin on the plane (y, 2).

$4.2. Calculation of a quasioptimal system


for tracking a discrete Markov process
As the second example illustrating the approximate synthesis procedure
described above, we consider the problem of constructing an optimal system
for tracking a Markov "telegraph signal" type process ( a discrete process
with two states) in the case where the measurement of the input signal is
accompanied by a white noise and the plant is subject to random actions.
Figure 32 shows the block diagram of the system in question. We assume
that y(t) is a symmetric Markov process with two states (y(t) = f1) whose
a prior2 probabilities p t ( f 1) = P[y(t) = fl ] satisfy the equations
Chapter IV

Here the number ,u > 0 determines the intensity of transitions between


the states y = +1 and y = -1 per unit time. The system (4.2.1) is a
special case of system (1.1.49) with m = 2 and X,(t) = Xya(t) = p. I t
readily follows from (4.2.1) that realizations of the input signal y(t) are
sequences of random pulses; the lengths T of these pulses and of the intervals
between them are independent exponentially distributed random variables,
P(T > C) = e - P C .
The observable process y(t) is an additive mixture of the input signal
y(t) and a white noise (independent of y(t)) of intensity x :

Like in $4.1, the plant P is described by the scalar equation

where ((t) is the standard white noise independent of y(t) and C(t) and the
controlling action is bounded in absolute value,

To estimate the system performance, we use the integral optimality criterion

where the penalty function c(y - x) is the same as in (4.1.4). In the method
used here for solving problem (4.2.1)-(4.2.5), it is important that c(y - x)
is a differentiable function. In the subsequent calculations, this function is
quadratic, namely,
c(y - 2) = (y - x ) ~ . (4.2.6)
Synthesis of Quasioptimal Systems 235

A peculiar feature of our problem, in contrast, say, with the problem


studied in $4.1, is that the observed pair of stochastic processes (g(t), x(t))
is not a Markov process. Therefore, as was already noted in $1.5, to use
the dynamic programming approach, it is necessary to introduce a special
space of states formed by sufficient coordinates that already possess the
Markov property.
4.2.1. Sufficient coordinates and the Bellman equation. Let us
show that the current value of the output variable x(t) and the a posteriori
probability w t ( l ) = P[y(t) = +1 I fji] are sufficient coordinates X t in the
problem considered. In the sequel, owing to purely technical considerations,
it is more convenient to take, instead of wt(l), the variable zt = w t ( l ) -
wt(-1) as the second component of X t . It follows from the normalization
+
condition wt (1) wt (-1) = 1 that the a posteriori probabilities wt (1) and
wt(-1) can be uniquely expressed via zt as follows:

Obviously, zt randomly varies in time. Let us derive the stochastic equa-


tion describing the random function zt = z(t). Here we shall consider a
somewhat more general case of the input signal nonsymmetric with respect
to probability. In this case, instead of (4.2.1) the a priori properties of y(t)
are described by the equations

that is, the intensities of transitions between the states y = +l and y = -1


down from above (p) and upwards from below (v) are not equal to each
other.
Let us pass to the discrete time reference. In this case, random functions
in (4.2.2) are replaced by sequences of random variables

where gn, y,, and Cn are understood as the mean values of realizations over
the interval A of time quantization:
236 Chapter IV

It follows from (4.2.8) (see also (1.1.42)) that the sequence yn is a simple
Markov chain characterized by the following four transition probabilities
pA(yn+1 I yn):

(all relations in (4.2.11) hold u p to terms of the order of o(A)).


It follows from the properties of the white noise (1.1.31) that the random
variables incorresponding to different indices are independent of each other
and have the same probability densities

Using these properties of the sequences yn and in, we can write recurrent
formulas relating the a posteriori probabilities of successive time instants
+
(with numbers n and n 1) and the result of the last observation.
The probability addition and multiplication theorems yield the formulas

Taking into account the relation p(yn = f 1,?jT) = wn ( f l ) p ( ? j T ) , we can


rewrite (4.2.13) and (4.2.14) as follows:

We write dn = wn(l)/wn(-1) and note that (4.2.9) and (4.2.12) imply


Synthesis of Quasioptimal Systems 237

Now, dividing (4.2.15) by (4.2.16) and taking into account (4.2.11), we


obtain the following recurrent relation for the parameter d,:

By letting the time interval A -+ 0, and taking into account the fact that
lima+o (d,+l - d,)/A = dt and (4.2.17), we derive the following differential
equation for the function dt = d(t):

Since, in view of (4.2.7), the functions zt = z(t) and dt satisfy the relation
+
dt = (1 z t ) / ( l - zt), Eq. (4.2.18) for zt has the form

For a symmetric signal ( p = u), instead of (4.2.19), we have

REMARK.According to (4.2.2), the observable process y(t) contains a


white noise, and the coefficients of g(t) in (4.2.18)-(4.2.20) contain random
functions dt = d(t) and zt = z(t). It follows from $1.2 that, in this case, we
must indicate in which sense we understand the stochastic integrals used
for calculating the solutions of the stochastic differential equations (4.2.18)-
(4.2.20). A more rigorous analysis (e.g., see [132, 1751 shows that all three
equations (4.2.18)-(4.2.20) must be treated as symmetrized equations. In
particular, just due to this fact we can pass from Eq. (4.2.18) to Eq. (4.2.19)
by using the standard rules for differentiating composite functions (instead
of a more complicated differentiation rule (1.2.43) for solutions of differen-
tial Ito equations).
Now let us verify whether the coordinates X t = (xt, zt) are sufficient for
the solution of the synthesis problem in question. To this end, according
to [I711 and $1.5, we need to verify whether the coordinates X t = ( x t , z t )
are sufficient
(1) for obtaining the conditional mean penalties
238 Chapter IV

(2) for finding constraints on the set of admissible controls u;


(3) for determining their future evolution (that is, the probabilities of
the future values X t + a , A > 0).
In this problem, in view of (4.2.4), the set of admissible controls is a
given interval -1 5 u 5 1 of the number axis independent of anything;
therefore, we need not take into account the statement of item (2).4
Obviously, the conditional mean penalties (4.2.21) can be expressed via
the a posteriori probabilities as follows:

Since formulas (4.2.7) express the a posteriori probabilities wt(4Zl) in terms


of zt, statement (1) is trivially satisfied for the variables (xt, zt).
Let us study the time evolution of (xt,zt). The variable xt = x(t)
satisfies an equation of the form (4.2.3). If in this equation the control
ut a t time t is determined by the current values of (xt, zt), then, in view
of the white noise properties, the probabilities of the future values of x(T),
T > t , are completely determined by X t = (xt, zt). Now, let us consider
+
Eq. (4.2.20). Note that, according to (4.2.2), c ( t ) = y(t) f i C ( t ) , where
y(t) is a Markov process and <(t) is a white noise. Therefore, it follows from
Eq. (4.2.20) that the probabilities of the future values zt+a are determined
by zt and the behavior of Y(T), T 2 t. However, since Y(T) is a Markov
process, its behavior for T >
t is determined by the state yt described
by the probabilities wt(yt = *I), that is, in view of (4.2.7), still by the
coordinate zt. Thus, statement (3) is proved for Xt = ( x y ,zt).
Equations (4.2.3) and (4.2.20) allow us to write the Bellman equation
for the problem considered. Introducing the loss function

(4.2.23)
and using the Markov property of the sufficient coordinates ( x ( t ) , z ( t ) ) ,
from (4.2.23) we obtain the basic functional equation of the dynamic pro-
gramming approach:
r &+A
F ( t , xt, zt) = min
lu(r)l<1

4 ~ist necessary to verify the statement of item (2) only in special cases in which the
control constraints depend on the state of the control system. Such problems are not
considered in this book.
Synthesis of Quasioptimal Systems 239

The Bellman differential equation can be derived from (4.2.24) by the stan-
dard method (see $1.4 and $1.5) of expanding F ( t + A, xt+a, zt+a) in the
Taylor series around the point (t, xt, zt), averaging, and passing to the limit
as A t 0. In this procedure, we use the following obvious formulas that
are consequences of (4.2.3), (4.2.7), and (4.2.20)-(4.2.22):

E[(zt+a - ~ t ) ( z t + a- zt) I xt, ~ t=


] o(A), (4.2.29)
E[(xt+a - I xt,zt] = o(A),
E[(zt+a - I xt, zt] = o(A), k > 3. (4.2.30)
It is somewhat more difficult to calculate the mean value of the difference
(zt+a-zt). Since, as was already noted, (4.2.20) is a symmetrized stochastic
equation, E [ ( z ~ - ~ I xt, zt] = E[(zt+a - z t ) I zt] can be calculated with
+ zt)
the help of formulas (1.2.29) and (1.2.37) (with u = 112 in (1.2.37)). Then,
taking into account the relation

from (4.2.20) and (1.2.37), we obtain

As A -+ 0, relations (4.2.24)-(4.2.31) enable us to write the Bellman dif-


ferential equation in the form

-aF
+
at
min u-
1,51
aF
[ ax] -2pz-+--+
aF
az
B
2
~
axz
~( 1 -Fz 2 ) 2 a 2 ~
2~ az2
240 Chapter IV

The second term in Eq. (4.2.32) can also be written as -IdF/dxl.


To the equation obtained, we must add a condition on the loss function
in the end of the control process, namely,

and some boundary conditions.


Since the input signal takes one of the two values y(t) = f 1 a t each
instant of time t , we can restrict our consideration to the region 1x1 5 1.
Thus the sufficient coordinates are defined on the square -1 5 x $1, <
-1 <- z <- $1. The boundary conditions on the sides x = -1 and x = $1

of this square are

These conditions mean that there is no probability flow [ l l , 1731 through


the boundary x = f
On the other sides z = f1 of the square, the diffusion coefficient con-
tained in the second diffusion term *% is zero. Therefore, instead
of the conditions d F / d z = 0 on these sides of the square, we have the trivial
conditions

If, by analogy with the problem solved in $4.1, in the space of sufficient
coordinates (x, z) we denote the regions where d F / d x > 0 and d F / d x < 0
by R+ and R-, respectively, then in these regions the nonlinear equation
(4.2.32) is replaced by the corresponding linear equation and the optimal
control is formed by the rule

Since the first-order derivatives of the loss function are continuous [113,
1751, on the interface I' between R+ and R-, we have

To solve the synthesis problem is equivalent to find the interface r be-


tween R+ and R- (the switching line for the controlling action). A straight-
forward way for obtaining the equation for the switching line I' is to solve

5The condition (4.2.34) means that there are reflecting screens on the boundary
segments (x = +1, -1 5 z 5 $1) and (x = -1, -1 5 x 5 +1) (for a detailed description
of diffusion processes with phase constraints and various screens, see $6.2).
Synthesis of Quasioptimal Systems 241

the original nonlinear equation (4.2.32) with the initial and boundary con-
ditions (4.2.33)-(4.2.35) and then, on the plane ( z , z), to find the geometric
locus where the condition (4.2.36) is satisfied.
However, this method can be implemented only numerically. To solve
the synthesis problem analytically, let us return to the approximate method
used in $4.1.
4.2.2. Calculation of the successive approximations. Suppose
that the intensity of random actions on the plant is small but the error of
measurement of the input signal is large. In this case, we can set B = EBO
and ;ic = x O / &(where E > 0 is a small parameter). We consider, just as in
$4.1, the stationary tracking operating conditions. Then for the quadratic
loss function (4.2.6), the Bellman equation (4.2.32) has the form

+x2-2xz+1-y
az2
(4.2.37)
(here f = f ( x , z) is the stationary loss function defined just as in (4.1.7),
and y is the stationary tracking error).
Introducing the special notation f+ and f- for the loss function f in
R + and R-, we can replace the nonlinear equation (4.2.37) by the pair of
linear equations

each of which is valid only in one of the regions ( R + or R - ) on the phase


plane ( z , z) .
We shall solve Eqs. (4.2.38) by the method of successive approxima-
tions considered in $4.1. In this case, instead of (4.2.38), we need to
solve a number of simpler equations that successively approximate the
original equations (4.2.38). By setting E = 0 in (4.2.38), we obtain the
zero-approximation equations

The next approximations are calculated according to the scheme


242 Chapter IV

By solving the equations for the kth approximation (k = 0,1,2, . . .), we


obtain the set fz
(x, z), I'" -yk consisting of approximate expressions for the
loss function, the switching line, and the stationary tracking error.
In what follows, we solve the synthesis problem in the first two approxi-
mations, the zero and the first.
The zero approximation. Let us consider Eqs. (4.2.39). By analogy
with 34.1, the equation for the interface r0 between Rt
and R:, on which
both equations for f t and f! hold, and the stationary tracking error -yo
can be found without solving Eqs. (4.2.39). Indeed, using the condition
that the gradient v f k (see (4.1.15)) is continuous on the switching line r k ,

we obtain from (4.2.39) the following components of the gradient V f 0


along r O :

The condition

which is necessary for the existence of the switching line of the first kind
(see (4.1.20)), together with (4.2.42) implies that the line

is a possible F0 for the zero approximation.


An analysis of the phase trajectories of the deterministic system

shows that the trajectories actually approach the line (4.2.44) on both sides6
if only 2p < 1. In what follows, we assume that this condition is satisfied.
The stationary error is obtained from the condition that the derivative
d f O / d x calculated along r0 a t the stable point (e.g., a t the origin x = 0,

61n the first equation in (4.2.45), the sign + corresponds to the region z > x and the
sign - to z < x .
Synthesis of Quasioptimal Systems 243

z = 0) is finite (see (4.1.25) and (4.1.26)). In view of (4.2.42) and (4.2.44),

I-"=-
along rO.The condition (4.1.26) in this case has the form limx,o
0
-
2pxY -
0, which implies = 1.
Now, to solve Eq. (4.2.39), we write the characteristic equations

To solve (4.2.46) uniquely, it is necessary to pose some additional "initial"


condition (to pose the Cauchy problem) for the loss function f (x, z). This
condition follows from (4.2.42) and (4.2.44). The second relation in (4.2.42)
+
implies that f O ( z , z ) = -(z2/4,u) f O ( O , 0) on the line (4.2.44). Without
loss of generality, we can set fO(O,0) = 0. Thus, among the solutions fg
obtained from (4.2.46), we choose the solution satisfying the condition

on the line z = x. We readily obtain this solution

where xo = x*(x, z ) and the functions X* are determined as solutions of


the equations
X * e ~ 2 ~=~z *e ~ 2 ~ x . (4.2.49)
The first approximation. Now, using (4.2.48), we can find the switch-
ing line I'1 in the first approximation. Relations (4.2.40) and (4.2.41) allow
us to write the components of the gradient V f l on the line rl:

Differentiating (4.2.48) and using the relations


244 Chapter IV

that follow from (4.2.49), we find the components

Substituting (4.2.51) into (4.2.50), we obtain

Using again the condition (4.2.43), we find rl. The derivatives d A i / d z


and dAi/dx are calculated with regard to the fact that the difference
between the position of the switching line I'1 in the first approximation
and the position of r0 determined by (4.2.44) is small. Therefore, after
the differentiation of (4.2.52), we can replace X+ and X- by the relation
X+ = X- = z = x. If this replacement is performed only for the terms of
the order of E, then the error caused by this replacement is an infinitesimal
of higher order.
Synthesis of Quasioptimal Systems

Taking into account this fact, we obtain from (4.2.52):

Hence, using (4.2.43), we obtain the equation for the switching line I":

The position of r1 on the plane (x, z) depends on the values of p, xo,


and Bo. Figure 33 shows one of the possible switching lines and the phase
trajectories of system (4.2.45).
By analogy with the zero approximation, we find the stationary tracking
error y1 from the condition that the gradient (4.2.52) is finite a t the origin.
By letting z + 0 and x 4 0 in (4.2.52) and taking into account the fact
that X+ and X - tend to zero just as x and z , we obtain

Hence it follows that the stationary error in the first approximation depends
on the noise intensity a t the input of the system shown in Fig. 32 but is
independent of the noises in the plant.
246 Chapter IV

Using the equation (4.2.53) for the switching line and Eq. (4.2.20), we
construct the analogous circuit (see Fig. 34) for a quasioptimal tracking
system in the first approximation. The dotted line indicates the unit SC
that produces a sufficient coordinate z(t); the unit NC is an inertialess
transducer that realizes the functional dependence on the right-hand side of
(4.2.53). If we have E << 1for the small parameter contained in the problem,
then the output variable x(t) fluctuates mostly in a small neighborhood of
zero. In this case (Ix(t)l << I), as follows from (4.2.53), the nonlinear
unit NC can be replaced by a linear amplifier with the amplification factor
CHAPTER V

CONTROL OF OSCILLATORY SYSTEMS

The present chapter deals with some synthesis problems for optimal sys-
tems with quasiharmonic plants. Here the term "quasiharmonic" means
that the plant dynamics is close to harmonic oscillations in the process of
control. In this case, through time t = 2n, the phase trajectories of the
second-order systems considered in this chapter are close to circles on the
plane (3,k ) .
There exists an extensive literature on the methods for studying such
systems (including controlled systems) (e.g., see [2, 19, 27, 33, 69, 70,
136, 153, 1541 and the references therein). These methods are based on
the idea (going back to Poincark) that the motion in oscillatory systems
can be divided into "fast" and "slow" motions. This idea along with the
averaging method [2] enables one to derive equations for "slow" variables
that can readily be integrated. These equations are usually derived by
different versions of the method of successive approximations.
Various approximate methods based on the first-approximation equation
for slowly varying variables play an important role in industrial engineering.
For the first time, such a method for studying nonlinear oscillatory systems
was proposed by van der Pol [183, 1841 (the method of slowly varying
amplitudes). Among other first-approximation methods, we also point out
the "mean steepness" method [2] and the harmonic balance method [69,
701, which is widely used in engineering calculations of automatic control
systems.
More precise results can be obtained by regular asymptotic methods, the
most important of which is the asymptotic Krylov-Bogolyubov method [19].
Originally, this method was developed for studying nonlinear oscillations
in deterministic uncontrolled systems. Later on, this method was also used
for the investigation of stochastic [log, 1731 and controlled [33] oscillatory
systems. In the present chapter, the Krylov-Bogolyubov method is also
widely used for constructing quasioptimal control algorithms.
This chapter consists of four sections, in which we consider four special
problems of optimal damping of oscillations in quasiharmonic second-order
systems with constrained controlling actions. In the first two sections (55.1
248 Chapter V

and $5.2) we consider deterministic problems; the other two sections ($5.3
and $5.4) deal with stochastic synthesis problems.
First, in $5.1 we study the control problem for an arbitrary quasihar-
monic oscillator with one degree of freedom. We describe a method for
solving the synthesis problem approximately. In this method, the mini-
mized functional and the equation for the switching line are presented as
asymptotic expansions in powers of a small parameter contained in the
problem. The method of approximate synthesis is illustrated by some ex-
amples of solving the optimal control problems for a linear oscillator and
a nonlinear van der Pol oscillator. In $5.2 we use the method (consid-
ered in $5.1) for solving the control problem for a system of two biological
populations, namely, the "predator-prey" model described by the Lotka-
Volterra equation (see $2.3). We study a special Lotka-Volterra model
with a "poorly adapted predator." In this case, the sizes of both interact-
ing populations obey a quasiharmonic dynamics. Next, in $5.3, we consider
the stochastic version of the problem studied in $5.1. We consider an as-
ymptotic synthesis method that allows us to construct quasioptimal control
systems with an oscillatory plant subject to additive random disturbances.
Finally, in $5.4, the method considered in $5.3 is generalized to the case
of indirect observation when the measurement of the current state of the
oscillator is accompanied by a white noise.

$5.1. Optimal control of a quasiharmonic


oscillator. An asymptotic synthesis method
According to [2], a mechanical system with one degree of freedom is
called a quasiharmonic oscillator if its behavior is described by the system
of the form
+
331 = 2 2 E ~ 1 ( 2 1 , 2 2 u),
,
(5.1.1)
332 = -21 +
~ ~ 2 ( 2 212 , u),

where 21 and 2 2 are the phase coordinates, and ~2 are sufficiently


arbitrary (nonlinear, in the general case) functions of their arguments,'
u = ( u l , . . ., u T )is an r-dimensional vector of controlling actions subject to
various restrictions, and the number E is a small parameter.
It follows from (5.1.1) that for E = 0 the general solution of system
(5.1.1) is a union of two harmonic oscillations

z l ( t ) = a sin(t + a), 22(t) = a cos(t + a), (5.1.2)

'The only assumption is that, for some given functions ~1 and xz, the Cauchy prob-
lem of system (5.1.1) has a unique solution in a chosen domain D in the space of the
variables ( t ,xl, xz) (see $1.1).
Control of Oscillatory Systems 249

with the same period T = 27r and the phase shift Acp = n/2. Note that,
in the phase plane (21,22), the trajectory that is a circle of radius a corre-
sponds to the solution (5.1.2). If E # 0 but is a sufficiently small parameter,
then, in view of the continuity, the difference between the solution of sys-
tem (5.1.1) and the solution (5.1.2) is small on a time interval that is not
too large. More precisely, if for E # 0 we seek the solution of system (5.1.1)
in the form

+
then the "amplitude" increment A a = a ( t 27r) - a(t) and the "phase"
+
increment A a = a ( t 27r) - a ( t ) are small during time T = 27r, that is,
A a N E and A a E. This fact justifies the term "quasiharmonic" for
systems of the form (5.1.1) and serves as a basis for the elaboration of
various asymptotic methods for the analysis of such systems.
5.1.1. Statement of the problem. In the present section we consider
controlled oscillators whose behavior is described by an equation of the form

x + EX(%, I)&+ x = EU, (5.1.3)

where ~ ( xI) , is an arbitrary given function (nonlinear in the general case)


that is centrally symmetric, e.g., ~ ( xI) , = x(-x, -I). In the phase vari-
ables x l , x2 (determined, as usual, by x l = x and 2 2 = g), we can replace
Eq. (5.1.3) by the following equivalent system of first-order equations:

231 = 2 2 , &2 = -21 - +


E X ( X ~ , X ~ EU,
)X~ (5.1.4)

hence it follows that the oscillator (5.1.3) is a special case of the oscillator
(5.1.1) with XI 2 0 and X Z ( X ~ , X Z ,U) = u - ~ ( 2 122)x2.
,
It should be noted that equations of the form (5.1.3) describe a wide class
of controlled plants of various physical nature: mechanical (the Froude
pendulum [2]), electrical (vacuum-tube and semiconductor generators of
harmonic oscillations [2, 19, 183, 184]), electromechanical remote tracking
systems for angle reconstruction [2], etc. Numerous examples of actual
systems mathematically modeled by Eq. (5.1.3) can be found in [2, 19,
1361.
For the controlled oscillator (5.1.3), we shall consider the following op-
timal control problem with free right-hand endpoint of the trajectory.
We assume that the absolute value of the admissible (scalar) control
u = u(t) is bounded a t each time instant t:
250 Chapter V

and the goal of control for system (5.1.3) is to minimize the integral func-
tional
T
I[u] = c (x (t), &(t))dt i min (5.1.6)
I u ( t ) l < u m , O<t<T

over the trajectories {x(t) = xu(t): O 5 t 5 T) of system (5.1.3) that


correspond to all possible controls u satisfying (5.1.5). The time interval
[0, TI and the initial state of the oscillator x(0) = xl(0) = xlo, &(O) =
x2(0) = 220 are given. The penalty function c(x, 2 ) = c(xl,x2) in (5.1.6)
is assumed to be nonnegative and symmetrical with respect to the origin,
c(xl, 2 2 ) = c(-xl, -x2), and vanishing only a t the point ( X I = 0,x2 = 0).
In this case, the optimal control u, minimizing the functional (5.1.6) is
sought in the synthesis form u, = u, (t, xl(t), xz(t)).
Problem (5.1.3)-(5.1.6) is a special case of problem (1.3.1)-(1.3.3) con-
sidered in $1.3. Therefore, if we determine the function of minimum future
losses

F ( t , XI, 22) = min


Iu(r)I<urn,
t<s<T
[lT c(xl(r), xZ(T))d r I xl(t) = X I , ~ 2 ( t =
) 52
I
(5.1.7)
in the standard way and use the standard derivation procedure described
in $1.3, then, for the function (5.1.7), we obtain the Bellman differential
equation

dF
--at = "2- d F - dl7
axl
("1 + EX("', x2)xZ) -
ax2+ I u rnin
lSum [EU-
:xtl
+ c(x1, xz),
0 < t < T, F ( T ,~ 1 , x z=
) 0,
(5.1.8)
that corresponds to problem (5.1.3)-(5.1.6).
Equation (5.1.8) allow us to obtain some general properties of the optimal
control in the synthesis form u, (t, XI, xz), which we shall use later. Indeed,
it follows from (5.1.8) that the optimal control u, for which the expression
in the square brackets attains its minimum is a relay-type control and can
be written in the form

aF
-urn sign -(t, xl, x2).
8x2
REMARK 5.1.1. Rigorously speaking, the optimal control in this prob-
lem is not unique. This is related to the fact that at the points (t, xl, x2),
where a F ( t , X I , x2)/dx2 = 0, the optimal control u, is not uniquely deter-
mined by Eq. (5.1.8). On the other hand, one can see that a t the points
Control of Oscillatory Systems 251

(t, 21, 22), where a F / d x 2 = 0, the choice of any control u0 lying in the
admissible region [-urn, u,] does not affect the value of the loss function
F ( t , 21, 22) that satisfies the Bellman equation. Therefore, in particular,
the control (5.1.9) that requires the choice of u, = 0 a t the points (t, xl, x2),
where a F ( t , x l , x2)/dx2 = 0,2 is optimal.
Using (5.1.9), we can rewrite the Bellman equation (5.1.8) in the form

05t < T, F(T,21, 2 2 ) = 0.


(5.1.10)
It follows from (5.1.10) and the central symmetry of x(x1, z 2 ) and c(xl,22)
that the loss function (5.1.7) satisfying (5.1.10) is centrally symmetric with
respect to the phase coordinates, namely, F ( t , X I , 2 2 ) = F ( t , -21, - 2 2 ) .
Therefore, for any t , X I , 2 2 we have

It follows from this relation and (5.1.9) that the optimal control algorithm
u, (t, 21, x2) has an important property of being antisymmetric, namely,

The facts that the optimal control in problem (5.1.3)-(5.1.6) is of relay type
(5.1.9) and antisymmetric (5.1.11) play an important role in the asymptotic
synthesis method discussed in the sequel.
We also note that the optimal control algorithm in problem (5.1.3)-
(5.1.6) can be simplified significantly if we consider the optimal control of
system (5.1.3) on an infinite time interval. In this case, the upper limit
of integration T + oo in (5.1.6) and, instead of (5.1.7), we have the time-
independent3 loss function

f(xl,x2)= min
lu(~)I<~rn,

2Recall that the discontinuous function signx is determined by the relation

3The loss function (5.1.12)is time-independent, since the plant equations (5.1.4) are
time-invariant.
252 Chapter V

and, instead of (5.1.9), we have a time-invariant control algorithm of the


form
8f
u*(x1, x2) = -urn sign -(XI, x2). (5.1.13)
8x2
In what follows we shall consider just such a time-invariant version of
the optimal control problem (5.1.3)-(5.1.6) on an infinite time interval.
REMARK5.1.2. As T -+co, problem (5.1.3)-(5.1.6) makes sense only if
there exists an admissible control u(xl, 2 2 ) in the synthesis form ensuring
the convergence of the improper integral4

I"(XI, 22) =
I" c(x?(t), xF(t)) d t ,

where xT(t) and x%(t) denote solutions of system (5.1.4) with control E
(5.1.14)

and the initial conditions xl(0) = x1 and x2(0) = 2 2 . Simultaneously, for


some constraints of the form (5.1.5) imposed on the admissible controls
and for some nonlinear functions ~ ( x l2 ,2 ) in (5.1.3), (5.1.4), it may hap-
pen that none of the admissible controls u ensures the convergence of the
integral (5.1.14). For example, if x(xl,x2) = xq - 1, then system (5.1.3)
is a controlled van der Pol oscillator. It is well known [2, 183, 1841 that
undamped quasiharmonic auto-oscillations arise in such systems for u G 0.
Moreover, this auto-oscillating process is stable with respect to small dis-
turbances affecting the oscillator. Therefore, for sufficiently small urn in
(5.1.5), any admissible control is insufficient to "suppress" auto-oscillations
in the oscillator (5.1.3). In its turn, in view of the properties of the penalty
function c(x1, x2), it follows from this fact that the integral (5.1.14) does
not converge.
Everywhere in the sequel, we assume that the parameters of problem
(5.1.3)-(5.1.6) are chosen so that this problem has a solution as T + co.
The solvability conditions for problem (5.1.3)-(5.1.6) as T + co will be
studied in more detail in Section 5.1.4.
5.1.2. Equations for the amplitude and the phase. Reduction
of the synthesis problem. To study the quasiharmonic systems of the
form (5.1.1) and (5.1.3), it is convenient to describe the current state of
the system by using, instead of the coordinate X I and the velocity 2 2 , the
polar coordinates A and cp, which have the meaning of the "amplitude"
and the "phase" of almost harmonic oscillations. We can pass to the new
coordinates by the formulas
xl=Acos@, x2=-Asin@, @=t+p. (5.1.15)
-

4 ~also
t follows from the ~ r o ~ e r t i of
e sthe penalty function c(xl,x2) that the control
Z(x1,x2) guarantees the asymptotic stability of the trivial solution XI ( t ) = xa(t) r 0 of
system (5.1.4).
Control of Oscillatory Systems 253

The change of variables (5.1.15) transforms system (5.1.4) to the follow-


ing equations for the slowly changing amplitude and phase (equations in
the normal form [2, 19, 1361):

A = EG(A,@, u), @ = EH(A,@, u), (5.1.16)

where

G(A, @, u) = x,(A, @) - %(A, @I,

A
x,(A, @) = -(cos 2@- I)X(Acos @, -Asin@),
2
sin 2@
x&, @I = -- 2 ~ ( A c o@,
s -Asin@),
us (A, @) = u(A, @) sin @, u,(A, @) = u(A, @) cos @.

Since the optimal control is of relay type (5.1.9), (5.1.13) and antisym-
metric (5.1.11), for the control function u(A, @) in (5.1.17), we can imme-
diately write
u(A, @) = u, sign [ sin (@ - pr (A))]. (5.1.18)
Note that, in view of the change of variables (5.1.15), controls of the form
(5.1.18) are already of relay type and antisymmetric on the phase plane
(xl,x2). The function pJA) in (5.1.18) determines (in the polar coordi-
nates) an equation for the switching line of the controlling action. Thus,
in this case, the synthesis problem is equivalent to the problem of finding
the function $(A) that minimizes a given optimality criterion. The func-
tion p:(A) is calculated by using the method of successive approximations
presented in Section 5.1.4.
It is well known [2, 19, 331 that for a sufficiently small parameter E , in-
stead of Eqs. (5.1.16), one can use some other auxiliary equations, which
are constructed according to certain rules and are called truncated equa-
tions. These equations allow one to obtain approximate solutions of the
original equations in a rather simple way (the accuracy is the higher, the
smaller is the parameter E ) .5
In the simplest case, the truncated equations

5Here we do not justify the approximatingproperties of the solutions constructed with


the help of truncated equations. A detailed discussion of these problems can be found
in numerous textbooks and monographs devoted to the theory of nonlinear oscillations
(e.g., see 12, 19, 33, 1361).
254 Chapter V

are obtained from (5.1.16) by neglecting the vibrational terms in the ex-
pressions for G(A, @, u) and H(A, @, u) or, which is the same, by averaging
the right-hand sides of Eqs. (5.1.16) over the "fast phase" Q, while the
amplitude A is fixed,6 namely,

A higher accuracy of approximation to the solution of system (5.1.16) is


ensured by the regular asymptotic Krylov-Bogolyubov method [19, 1731,
in which the vibrational terms on the right-hand sides of Eqs. (5.1.16) are
eliminated by the additional change of variables

where

denote purely vibrational functions such that

v(A*,@*)= - iT12R v(A*, Q*) dm* = 0,

"(A*,@*) = - 2i12' a*) "(A*, dm* = 0.

By the change of variables (5.1.21), we obtain the following equations


for the nonvibrational amplitude A* and phase JO* from (5.1.16):

A* = EG*(A*) = EGT(A*) E~G;(A*)+ + E3 . . ., (5.1.23)


@* = EH*(A*)= EH;(A*) + E ~ H ; ( A * )+ E ~ . . .

In this case, the successive terms G;, H;, G;, H;, . . .,v1, "1, v2, "2,. . . of
the asymptotic series (5.1.23) and (5.1.22) are calculated recurrently by the
method of successive approximations.

'This method for obtaining truncated equations is often called the method of slowly
varying amplitudes or the v a n der Pol method.
Control of Oscillatory Systems 255

Let us illustrate this method. By using (5.1.21), we can write (5.1.16)


in the form

Substituting (5.1.22) and (5.1.23) into (5.1.24) and retaining only the terms
of the order of E in (5.1.24), we obtain the first-approximation relations

awl 1
H;(A*) + -(A*,
a@* a * ) = H(A*, a * ) = x,(A*, a * ) - -u,(A*,
A*
a*).
(5.1.25)
Now, by equating the nonvibrational and purely vibrational terms on the
left and on the right in (5.1.25), we obtain the following expressions for the
first terms of the asymptotic series (5.1.23) and (5.1.22):

G;(A*) = x A(A*, a * ) - u,(A*, a * ) = G(A*),


1 (5.1.26)
H;(A*) = xP(A*,a*)- ;u,(A*, a * ) = H(A*),
A

( A * a*)= * ) - ] d f - ( A * a*), (5.1.2'7)

where

Q, (A*, a * ) =
6;[u, (A*,@I) - ?&I da'.

In (5.1.26)-(5.1.28), as usual, the bar over an expression indicates the av-


eraging over the period, that is, & J:*. . . d@*;the lower integration limits
and are chosen so that the functions vl(A*, a*) and wl(A*, @*),
determined by (5.1.27) and (5.1.28), be "purely vibrational" in the vari-
able a*.
In a similar way, we can calculate the next terms Ga(A*), H,*(A*),
v2(Ar,a * ) , . . . of the asymptotic expansions (5.1.23) and (5.1.22). So, to
256 Chapter V

calculate the functions Ga, H,+,212, w2 in (5.1.24), we need to retain the ex-
pressions of the order of E ~ Then
. (5.1.24) implies the second-approximation
relations

G;(A*) + -G;(A*)
8%
d ~ *
+ -H;
avl
aa* (A*) +a
dv2
,
dG dG
= v17(A*,@*)
dA
+ wi;(A*,@*),
d@ (5.1.29)
H; (A*) + awl
--G;(A*)
dA*
awl
+
mH; (A*) +
dw2

In its turn, each equality in (5.1.29) splits into two separate relations for
the nonvibrational and vibrational terms contained in (5.1.29), respectively.
This allows us to calculate the four functions GZ(AS), H;(A*), v2(A*,a*),
and wg(A*, a*).In particular, for the nonvibrational terms, the first equal-
ity in (5.1.29) implies

Using (5.1.17), (5.1.27), and (5.1.28), we can write the right-hand side
of (5.1.30) in more details as follow^:^

where the expression

g; (A*) = - n,) dm* + ( -) a * (5.1.32)

indicates the control-independent terms.


We do not write out the expressions for H,+(A*), v2(A*,a * ) , . . . since we
do not need them in the sequel.

or brevity, we omit the arguments ( A * ,a * ) of the functions x,, xP, u s , and


Q, in (5.1.31) and (5.1.32).
Control of Oscillatory Systems 257

5.1.3. Auxiliary formulas. The functions G; (A*), HI (A*), Ga (A*),


Hz(A*), . . . that form the asymptotic series in (5.1.23) depend on the choice
of the control algorithm u(A, a ) , that is, in view of (5.1.18), on the func-
tion cp,(A). It follows from (5.1.26) and (5.1.31) that we can write this
dependence explicitly if we know the expressions

- 8% duS 8%
-
us, u,, aA sin n@, -cosna,
da
Qs -,
dA (5.1.33)
auS
Q-, Qc s i n n a , 9, cos n@.
aa
The average values (5.1.33) can readily be calculated by using (5.1.18),
the properties of the S-function, and the fact that the functions u,(A, a),
u,(A, a),Qs (A, a),and +,(A, @) are periodic (with respect to @).
1. If, for definiteness, we assume that 0 5 cpr 5 r / 2 , then it follows
from (5.1.17) and (5.1.18) that

u, sign [sin (a - c p , ( ~ ) ) ] sin @ d@

2u,
= -cos pr(A).
sinad@+/
"+qr(A)

Y,(A)
sinad@- J2"
"+Y,(A)
sin dm
I
(5.1.34)
7r

One can readily see that formula (5.1.34) remains valid for any cp,(A) such
that -7r <cp,(A) 5 7r.
2. In a similar way, we obtain

2% sin cpp(A)
u, sign [sin (a - cp, (A))] cos @ d@ = - -
7r

and the relation


U,U,= 0,

which we shall use later.


3. Using the formal relation

d
-sign x = 2S(x)
dx
Chapter V

and formula (5.1. IS), we can write


au,
-
a
= -[u(A, @) sin @I
dA dA
a
= -{urn sign [sin (0 - pr (A))] sin @)
aA

Using (5.1.37) and the properties of the &function, after the integration
and some elementary calculations, we obtain

[cos(n + l)pr - cos(n - l ) p r ] for even n ,


for odd n.

4. By the straightforward integration with regard to

8%
- = 2urn6[sin(0- p,)] cos(@- p,) sin cP + u, sign[sin(0 - pr)] cos 0,
a0
we obtain

[*~ i n ( +n l ) ~ -, sin(n - l ) p r ] for even n ,


for odd n.

5. Since \E, (A, cP) and du, (A, @)/dA are periodic functions, we have

Next, using (5.1.27) and (5.1.37), we arrive a t

where
&[sin(@'- pr)] cos(@' - p,) sin 0 ' d 0 '
Control of Oscillatory Systems

2 sin cp,. - - - - - - - - - - - I
I
I
I
sin pr .- - I
I I
I 1
I I
I I
*
0 'f, 7T n-+'f, 27T @

It follows from (5.1.40) that the choice of does not affect the value of
Q,%. Hence we set = 0. Furthermore, if we consider 0 5 cp, 5 n-,
then the piecewise constant function F ( @ )in (5.1.41) has jumps of value
+
sinp, a t the points cpr and n- cp, as shown in Fig. 35. For this function
F(@), one can readily calculate F and q,
namely,

These relations, (5.1.34), (5.1.40), and (5.1.41) imply

2u& 3
-- dcp ( 5sin 2pp - -1 sin f p , + sin cp,
n- dA n- 2

Carrying out similar calculations for - 7 ~ 5 cp, 5 0 and comparing the result
with the last formula, we finally obtain

6. Using the relation


260 Chapter V

and expressions (5.1.34)-(5.1.36), we obtain

du, 2u2
Qc-
aa = --sin2pr.
7r2

7. The relation
1
Q, sin n@ = -u, cos n 9 (5.1.44)
n
allows us to reduce the calculation of the desired mean value to finding a
simpler expression u, cos n a . Using (5.1.17) and (5.1.18) and performing
some simple calculations, we obtain
1
n+l sin(n + 1 ) + ~5 ~sin(n - l)y, for even n,
for odd n.

8. The value Q, c o s n 9 can readily be obtained by using the obvious


relation
1
Q, cos n@ = - --cos n 9
au,
n 2 d@
and formula (5.1.39).
The expressions obtained for the average values (5.1.33) will be used
later for solving the synthesis problem.
5.1.4. Approximate solution of the synthesis problem. Now let
us return to the basic problem of minimizing the functional (5.1.14). By
choosing the nonvibrational amplitude and phase as the state variables, we
rewrite (5.1.14) in the form8

( A ** = ] c*(A;, a;) d t ,

where c*(A*,9*)is obtained from the penalty function c(xl, x2) by the
change of variables (5.1.15), (5.1.21).
Note that the functional (5.1.46), treated as a function of the initial state
( A * ,a * ) , is a periodic function in the second variable, namely, I(A*, a * ) =

'The value of the functional (5.1.46) depends both on the initial state A*(O) = A*,
@ * ( O ) = @* of the system and on the control algorithm u(A;, Q;): 0 <
t < oo. There-
fore, for the functional (5.1.46) it is more correct to use the notation IU(~;,*:)(A*,@*)
a*)
or I ' ~ ( ~ * ) ( A * , (which, in view of (5.1.18), is the same). However, for simplicity, we
write I(A*, a * ) .
Control of Oscillatory Systems 261

I(A*, @* +27~). Therefore, taking into account (5.1.21) and the second
equation in (5.1.23), we obtain

from (5.1.46). In (5.1.47) the integration over the period is performed


along a trajectory of the system, and hence the amplitude A; is treated as
a function of the fast phase @ f . This function Af (a;) is determined by the
relation

that follows from Eqs. (5.1.23).


Note that the amplitude increment AA* = A* (a*+ 27~)- A*(@*)during
the period is, in view of (5.1.23), a small variable (of the order of E). By
using this fact and the Taylor expansion of the left-hand side of (5.1.47),
for the derivative dI(A*, @*)/dA*we obtain the following power series in
the small parameter E:

1 d 2 1 ( ~ *a*)
,
- - AA* - . .
2 dA*2
Since AA* = €G;(AS)2.rr in the first approximation with respect to E, it
follows from (5.1.49) that
dI(A*, a*) - --c*(A*)
& -
dA* G;(A*) + € . . . .
where
-
C* (A*) =- '*(A*, a,') d@F, (5.1.51)

and the function G;(A*) = G;(A*, (pr(A*)) is determined by (5.1.26),


(5.1.17), and (5.1.34).
Calculating the right-hand side of (5.1.49) with a higher accuracy (in
this case, to calculate the last term in (5.1.49), we need to differentiate
(5.1.50)), we obtain

dI(A*, a*) - - c*(A*) dc*


& - - €-(A*, @f)(@;- a*)
dA* +
Gf (A*) EG; (A*) dA*
dc*(A*)
- E7T- +c2...,
dA*
262 Chapter V

where, just as in (5.1.51), the bar over a letter indicates the averaging over
the period with respect to @:, and the function GB(A*) is determined by
(5.1.31).
Let us write the functional to be minimized as follows:

(note that, by the assumptions of the problem considered, we can set AT, =
I ( A 2 ) = 0).
It follows from (5.1.53) that, to minimize the functional (5.1.46), it suf-
fices to find the minimum of the derivative a I ( A * ,@*)/dASfor an arbitrary
current state (A*, @*) of the control system. The accuracy of this minimiza-
tion procedure depends on the number of terms retained in the expansion
of the right-hand side of (5.1.49) in powers of E. Let us perform the corre-
sponding calculations for the first two approximations.
According to (5.1.50), to minimize the functional (5.1.46) in the first
approximation in E, it suffices to minimize (in cpr) the expression

Since the penalty function c(x, i ) = c(xl, 2 2 ) is nonnegative, we have


i?*(A*) > 0 for A* # 0. Therefore, to minimize (5.1.54), it suffices to
minimize the function G; (A*, v r ( ~ * ) In
) . its turn, it follows from (5.1.17)
and (5.1.26) that GT attains its minimum value for the maximum value of

-
u. = u, (A*,m*) = -
1
2~
j 27r

0
U(A*,a*)sin a* d@*

This fact and (5.1.5) readily imply that the optimal control ul(A*, a*) in
the first approximation must have the form

Cornparing (5.1.55) and (5.1.18), we see that cpr(A*) z 0 in the first


approximation in E. This means that, in this case, the switching line of the
control coincides with the abscissa axis on the phase plane (xl= x, x2 = 2).
Indeed, if, instead of the amplitude A* and the phase @*, we take the
coordinate x and the velocity 2 as the state variables, then it follows from
(5.1.15), (5.1.21), and (5.1.55) that, in this approximation, the optimal
control of the oscillator (5.1.3) is ensured by the synthesis function of the
form
u1 (x, 2) = -urn sign 2. (5.1.56)
Control of Oscillatory Systems 263

From the mechanical viewpoint, this result means that, to obtain the
optimal damping of oscillations in the oscillator (5.1.3), we must apply
the maximum admissible controlling force (the torque) and this force (the
torque) must always be opposite to the velocity (the angular velocity) of
the motion. It must also be emphasized that the control algorithm in the
first approximation is universal, since it depends neither on the nonlinear
characteristics of the oscillator (that is, on the function ~ ( xi)
, in (5.1.3))
nor on the form of the penalty function c ( x , i)in the optimality criterion
(5.1.6).
To find the quasioptimal control algorithm in the second approximation,
we need to calculate the function cp,(A*) that minimizes (5.1.52) or, which
is the same, the expression

G; (A*, P,(A*)) + E G(A*,


~ P,(A* )) - (5.1.57)

Since (5.1.57) differs from G;(A*, c p r ( ~ * ) ) by a term of the order of E, it


is natural to assume that the difference between the function cp,(A*) that
minimizes (5.1.57) and the function cpr(A*) 0 in the first approximation
is small, that is, it is natural to assume that we have cp,(A*) -- E for the
desired function.
Having in mind the fact that cp,(A*) -- E and using the average values
(5.1.33) calculated in Section 5.1.3, we can estimate the order of differ-
ent terms in formula (5.1.31) for the function G; (A*, c p r ( ~ * ) ) . We also
note that since the function x in (5.1.3) is symmetric, that is, ~ ( xi) , =
x(-x, - i ) , there are only cosines (sines) of a,2@,. . . in the Fourier series
for the functions X, (A, a ) &(A, a ) ) . Thus, it follows from the results
obtained in Section 5.1.3 that, among all terms in (5.1.31), only two terms
&Q,$$ and -&*,&& are of the order of E . The other terms (depend-
ing on the control, that is, on cpr(A*)) in (5.1.31) are of the order of s2 or
E ~ This
. implies that the function cpr(At) minimizing (5.1.57) in the second
approximation is just the function maximizing the expression
E au, E
F(pr)= ii, - -!I!,-
A* a@*+ -A*
-!I! X".
"a@*
To obtain some special results, we need to define the function ~ ( xi)
, ex-
plicitly. Let us consider two examples.
EXAMPLE 1. Suppose that the plant is a linear quasiharmonic oscillator
described by Eq. (5.1.3). In this case, ~ ( xi), 1 and, in view of (5.1.17),
x(A, a) = A(cos 2@ - 1)/2.
By using (5.1.34), (5.1.43), and (5.1.45), we obtain

F(c,) =
2um
cos pr + E-urn (-1 sin 3cpr + sin cpr) + E- 2u;
27r 3 sin 2v,.
264 Chapter V

The desired function qr(A*) can be found from the condition

aF -
- 2u,
- -- sinv,
urn
+ r-(cos3q, + cosq,) +E=
44,
cos 2qr = 0.
a% '7r 21r

Since cpr is small (q, - E), it follows from (5.1.59) thatg


(5.1.59)

The function pr(A*) determines (in the polar coordinates) the switching
line equation for the quasioptimal control in the second approximation. The
position of this switching line on the phase plane (x, 5 ) is shown in Fig. 36.

It follows from (5.1.18) and (5.1.60) that in this case the quasioptimal
control algorithm (the synthesis function) in the second approximation has
the form

u2(A*,@*)=urnsign sin @* [ ( -E e(i


I))+::
- -

REMARK 5.1.3. It follows from (5.1.60) that qr(A*) + co as A* 4 0


and formulas (5.1.60) and (5.1.61) do not make sense any more. The reason

he terms of the order of c2 and of higher orders on the right-hand side of (5.1.60)
are omitted.
Control of Oscillatory Systems 265

is that if we use a control of the form (5.1.18), then there always exists a
small neighborhood of the origin on the phase plane (x, i ) and the quasi-
harmonic character of the trajectories of the plant (5.1.3) is violated in

-
this neighborhood. In Fig. 36, this neighborhood is the circle of radius R
(R E)." In the interior of this neighborhood, the applicability conditions
for the asymptotic (van der Pol, Krylov-Bogolyubov, etc.) methods are vi-
olated. Therefore, the quasioptimal control algorithms (5.1.56) and (5.1.61)
can be used everywhere except for the interior of this neighborhood. More-
over, it is important to keep in mind that, by using the asymptotic synthesis
method discussed in this section, it is in principle impossible to find the
optimal control in a small neighborhood of the point (x = 0, i = 0).
EXAMPLE 2. Now let x ( x , i ) = x2 - 1. In this case, the plant (5.1.3)
is a self-oscillating system (a self-exciting circuit) sometimes called the v a n
der Pol oscillator or the T h o m s o n generator. It follows from (5.1.17) that,
in this case, we have

Using formulas (5.1.34), (5.1.43), and (5.1.45) for the function (5.1.58), we
obtain

2urn
F(cpp)= cos cp sin 58, +3
- -
1
3
sin 3yr - sin cp, + 4um
-
A*
7r
sin 2yr
I
-
Just as in Example 1, from the condition d F / d p r = 0 with regard to the
fact that cpr is small (cpp E), we derive the equation of the switching line,

and the synthesis function in the second approximation,

u2(A*,a*)= urn sign [sin (a* - f ($- 1+ 4urn))].


7rA*
-- (5.1.64)

' O A ~ elementary analysis of the phase trajectories of a linear oscillator subject to the
control (5.1.56) shows that the phase trajectories of the system, once entering the circle
of radius R = 2&um,not only cease to be quasiharmonic, but cease to be oscillatory in
character at all.
Chapter V

FIG.37

The switching line (5.1.63) is shown in Fig. 37.

REMARK5.1.4. It was pointed out in Remark 5.1.2 that the problem


of optimal damping of oscillations in system (5.1.3) on an infinite time
interval is well posed if the optimal (quasioptimal ) control of the plant
(5.1.3) ensures the convergence of the improper integral (5.1.14) (or, which
is the same, of the integral (5.1.46)). Let us establish the convergence
conditions for these integrals in Example 2.
The properties of the penalty function c ( x , 2) readily imply that the in-
tegral (5.1.46) converges if, for a chosen control algorithm and any initial
value of the nonvibrational amplitude A* (0), the solution of the first equa-
tion in (5.1.23) A* (t) + 0 as t t oo, and furthermore, if A* (t) tends to zero
not too "slowly." Let us consider the special form of Eq. (5.1.23) in Exam-
ple 2. We confine ourselves to the first approximation A* = &G;(A*). Since
the quasioptimal control in the first approximation has the form (5.1.55), it
follows from (5.1.26) and (5.1.62) that the nonvibrational amplitude obeys
the equation

If u, > ~ 1 3 4then >


, for any A* 0 the function on the right-hand side of
(5.1.65) cannot be positive; therefore, A* (t) + 0 as t + oo for any solution
of (5.1.65). If in this case
Control of Oscillatory Systems 267

then the solution Ai(t) of Eq. (5.1.65) attains the value A* = 0 on a finite
time interval, which guarantees the convergence of the integral (5.1.46).
Thus, the inequality (5.1.66) is the solvability condition for problem (5.1.3)-
(5.1.6) a s T + m i n the case ofExample2.11
In conclusion we note that, in principle, the approximate method con-
sidered here can also be used for calculating the quasioptimal control al-
gorithms in the third, fourth and higher approximations. However, in this
case, the number of required calculations increases sharply.

$5.2. Control of the "predator-prey'' system.


The case of a poorly adapted predator
In this section, by using the asymptotic synthesis method considered in
$5.1, we solve the optimal control problem for a biological system consisting
of two different populations interpreted as "predators" and "prey" coexist-
ing in the same habitat (e.g., see $2.3 and 1133, 186, 1871). This system is
mathematically described by the standard Lotka-Volterra model in which
the behavior of an isolated system is subject to the following system of
equations (see (2.3.5)):

-
Recall that 5 = Z ( r ) and 5 = G(t ) are the respective population sizes1' of
prey and predators a t time Fand the positive constants ax, a2, bl, and b2
have the following meaning: a1 is the rate of growth of the number of prey,
a2 is the rate of prey consumption by predators, bl is the rate a t which the
prey biomass is processed into the new biomass of predators, and b2 is the
rate of predator natural death.
In this section we consider a special case of system (5.2.1) in which
the predators die a t a high natural rate and are "poor" predators, since
they consume their prey a t a low rate. In the nomenclature of [177], this
problem corresponds to the case of predators poorly adapted to the habitat.
For system (5.2.1), this means that we can take the ratio azbllb2 = E << 1
as a small parameter in the subsequent calculations.

''condition (5.1.66) becomes sharper with an increase in the number of terms re-
tained in the asymptotic series on the right-hand side of Eq. (5.1.23) for the nonvibra-
tional amplitude.
''If the distribution of species over the habitat is uniform, then Z and y" denote the
densities of the corresponding populations, that is, the numbers of species per unit area
(volume) of the habitat.
268 Chapter V

5.2.1. Statement of the problem. We assume that system (5.2.1)


is controlled by eliminating prey specimens from the population (by shoot-
ing, catching, and using herbicides). Then, instead of (5.2.1), we have the
system (see (2.3.12)

here the control Z = Z(T) satisfies the constraints

where 7 is a given positive number.


We consider the control problem for system (5.2.2) with infinite time
interval; the goal of control is to take the system from any initial state
- -
xo,& > 0 to the equilibrium state 2, = b2/bl, y* = a1/a2 of system
(5.2.1). For the optimality criterion we use the functional

I,, = Lrn ( ~ ( r-
)),b2 + c2 (5(T) - %)
a2
2
] (5.2.4)

where cl and c2 are given positive constants. We assume that the integral
(5.2.4) is convergent.
In (5.2.2) we change the variables as follows:

This allows us to rewrite system (5.2.2) in the form

In this case, the functional (5.2.4) to be minimized acquires the form

In the new variables (x, y), the goal of control is to transfer the system to
the origin (x = y = O), and the range of admissible values is bounded by the
Control o f Oscillatory Systems 269

quadrant z > - 1 / ~ ,y <W / E (since the initial variables are nonnegative,


Z , ? > 0).
We assume that the admissible control is bounded by a small value.
To this end, we set 7 = E~~ in (5.2.3). Then, changing the scale of the
controlling function, Z = E ~ U we
, can write system (5.2.6) and the constraint
(5.2.3) as

Thus the desired optimal control u, can be found from the condition that
the functional (5.2.7) attains the minimum value on the trajectories of
system (5.2.8) with constraint (5.2.9) imposed on the control actions. In
this case, we seek the control in the form u, = u, (x(t),y(t)).
5.2.2. Approximate solution o f problem (5.2.7)-(5.2.9). In the
case of "poorly adapted" predators, the number E in (5.2.8) is small, and
system (5.2.8) is a special case of the controlled quasiharmonic oscillator
(5.1.1). Therefore, the method of $5.1 can immediately be used for solving
problem (5.2.7)-(5.2.9). The single distinction is that admissible controls
are subject to nonsymmetric constraints (5.2.9); thus the antisymmetry
property (5.1.11) of the optimal control is violated. As a result, it is im-
possible to write the desired controls in the form (5.1.18). However, as is
shown later, no special difficulties in calculating the quasioptimal controls
in problem (5.2.7)-(5.2.9) arise due to this fact.
On the whole, the scheme for solving problem (5.2.7)-(5.2.9) repeats the
approximate synthesis procedure described in $5.1. Therefore, in what fol-
lows, the main attention is paid to distinctions in expressions and formulas
caused by the special nature of problem (5.2.7)-(5.2.9).
Just as in $5.1, by changing variables according to formulas (5.1. 15)13we
transform system (5.2.8) to the following equations for the slowly changing
amplitude and phase (5.1.16):

Now, instead of (5.1.17), we have the following expressions for the functions
G(A, 9 ) and H(A, 9 ) only:

G(A, 9) = g(A, 9) - uc(A, 9) - Eu:(A, @),

H ( A , 9) = h(A, 9) - u, (A, 9) - EU: ( A ,9),

13With the obvious change in notation: XI = x and 2 2 = y.


270 Chapter V

g(A, a ) = A' sin @ cos @(sincf, - cos a ) ,


h(A, a ) = A sin @ cos @(sin@ + cos a ) ,

u,(A, m) = - @)cos m, u(A' @)A cos2 a,


u:(A, m) = -
b2w b2w
us (A, m) = --"(A' @)sin a, u' (A, .PI = -- )' sin 8 cos a.
b2wA b2w

The passage to Eqs. (5.1.23) for the nonvibrational amplitude A* and


phase p* is performed, as above, by using formulas (5.1.21)-(5.1.24). The
terms Gfi,Hg ,Ga, . . . in the asymptotic series in (5.1.23) are calculated
from (5.1.24), (5.2.10) by the method of successive approximations. In
particular, in the first approximation, instead of (5.1.26)-(5.1.28), we have

G;(A*) = -u,(A*, a * ) , H,*(A*)= -us (A*,a*), (5.2.11)

~ l ( ~ * , m * ) = ~ ~ * l o ( ~ * l ~ ) - u , ( ~ (5.2.12)
* , ~ ) + ~ ~ d ~

w,(A*, m*) = l:*


[h(A*,5 ) - u,(A*, 6 ) + us]d a . (5.2.13)

In (5.2.11)-(5.2.13) we took into account the fact that, in view of (5.2.10),


we have

For the second term of the asymptotic series on the right-hand side of
Eq.(5.1.23), instead of (5.1.31), we have

&(A*, a*) - &(A*, a*)


[ aA* dA*

+ [
&(A*
am*
a*) -
am* I
~ u , ( A *7 a*) wl(A*, @*)

By $5.1 the quasioptimal controls ul(A*, @*),u2(A*,a*),. . . are found from


the condition that the partial derivative dI(A*,@*)/dA* attains its mini-
mum. In view of (5.1.50) and (5.1.52), this condition is equivalent to the
condition that G;(A*) attains its minimum (in the first approximation) or
Control of Oscillatory Systems 271

+
the sum GT(A*) EG; (A*) attains its minimum (in the second approxima-
tion). It follows from (5.2.9), (5.2.10), and (5.2.11) that minimization of
G'; (A*) means maximization of
- 1
U, = fic(A*,@*)= -
2.n o
1 2"
u(A*,@)cos@d@ i max .
oSu<r
(5.2.15)

This fact immediately implies the following implicit formula for quasiopti-
ma1 control in the first approximation:

ul(A*, a*)= Y
-(sign cos @*
2
+ 1). (5.2.16)

Taking into account formulas (5.1.15) and (5.1.21) for the change of vari-
ables, we can write x = A* cos @* with accuracy up to terms of the order
of E. This fact and (5.2.16) readily imply the following expression for the
synthesis control in the first approximation in terms of the variables (x, y):
I
ul(x, y) = -(sign%
2
+ 1). (5.2.17)

Thus, in the course of the control process, the controlling action assumes
only the boundary values from the admissible range (5.2.9) and is switched
from the state u l = 0 to the state u l = y (or conversely) each time when
the representative point (x, y) intersects the y-axis (the switching line in the
first approximation). We also point out that, according to (5.2.5), in the
variables (5,fj)corresponding to the original statement of problem (5.2.2)-
(5.2.4), this control algorithm leads to the switching line that is the vertical
line passing through the point 5 = 2, = b 2 / b l on the abscissa axis; this
point determines the number of prey if system (5.2.1) is in equilibrium.
To find the optimal control in the second approximation, we need to
minimize the expression G;(A*) +EG;(A*) = F ( A * ,u). The functions
G; (A*) = G; (A*, u) and G; (A*) = G; (A*,u) are calculated by formulas
(5.2.11) and (5.2.14) with regard to (5.2.10), (5.2.12), and (5.2.13). In ac-
tual calculations by these formulas, it is convenient to use the fact that the
difference between the optimal control uz(A*, a*) in the second approxi-
mation and (5.2.16) must be small. More precisely, we can assume that on
the one-period interval of the fast phase @* variation, the optimal control
in the second approximation has the form of the function shown in Fig. 38
(the solid lines), where A1 and A2 are the phase shifts of the switch times

-
with respect to the switch times of the control in the first approximation
(the dashed lines); these variables are small (A l, A2 E).
This fact allows us, without loss of generality, to seek the control algo-
rithm ug(A*, @*) in the second approximation immediately in the form
Y
UZ(A*,a*)= -
2 {sign[cos(@*- ( P ~ - +]1) .
) sin ( P ~
Chapter V

Here $71 = pl(A*) and $72 = $72(A*)are related to A l and A2 as

and hence, are also of the order of E.


If the desired control in the second approximation is written in the form
(5.2.18), then there are a t least two advantages. First, in this case, we can
minimize F(A*, u) = G;(A*) +
E G ~ ( A *by
) finding the minimum of the
known function F(A*, p l , $72) of two variables $71 and $72. Second, we can
calculate GT and G%by formulas (5.2.11) and (5.2.14) using the fact that
$71 and $72 are small ($71,$72 -- E)).
From (5.2.10), (5.2.11), and (5.2.18), we obtain

G;(A*) = -u,(A*,Q*) = -- I
27rbzw
1 277
n2(A*,Q*) cos Q* dQ*

- -- Y
2xb2w [c0~($71
- $72) +
cos((p1 $72)) + (5.2.20)

Since $71, $72 -- E , it follows from (5.2.20) that the maximal terms (de-
pending on $71 and $72) in the expansion of (5.2.20) in powers of E are of
the order of E ~ Therefore,
. to calculate the second term E G ; ~
in the function
F(A*, $71, $72) = G; +EG; to be minimized, we can retain only terms of the
order of E~ and neglect the terms of the order of e3 and of higher orders.
Control of Oscillatory Systems

With regard to this remark we calculate the mean values on the right-hand
side of (5.2.14) and thus see1* that we need to minimize the function

(5.2.21)
to obtain the optimal values of cpl and cpz in the second approximation.
From the condition dF/dcpl = dF/dcp2 = 0 necessary for an extremum,
we obtain the desired optimal values

Expressions (5.2.22) determine (in the polar coordinates) the switching


line for the optimal control in the second approximation. The form of this
line on the phase plane (x, y) is shown in Fig. 39. The neighborhood of the
origin in the interior of the circle of radius R = ~ E Y / wis ~the~ region where
the quasiharmonic character of the phase trajectories is violated. Generally
speaking, the results obtained here are not authentic, and we need to use
some other methods for constructing the switching line in this region.
5.2.3. Comparative analysis of different control algorithms. It
is of interest to compare the results obtained in the preceding subsection
14Here we omit cumbersome elementary transformations leading t o (5.2.21). To ob-
tain (5.2.21), we need t o use formulas (5.2.10), (5.2.12), (5.2.13), and (5.2.18) and the
technique used in Section 5.1.3 for calculating average values.
2 74 Chapter V

with the solutions of similar synthesis problems obtained by other methods.


To this end, we can use the results discussed in $7.2 (see also [105]), where
we present a numerical method for solving the synthesis problem for the
"normalized" predator-prey system controlled on a finite time interval.
In 57.2 we consider the optimal control problem in which the plant equa-
tions, the constraints on admissible controls, and the optimality criterion
have the form

In this case, in 57.2 we derive the optimal control G ( T ,5,y) in the synthesis
form by solving the Bellman equation corresponding to problem (5.2.23)-
(5.2.25) numerically.
Note that problem (5.2.23)-(5.2.25) turns into problem (5.2.2)-(5.2.4) if
the following assumptions are satisfied:

We also note that, in view of the changes of variables (5.2.5) and (5.2.26))
the quasioptimal control algorithm in the first approximation (5.2.17) ac-
quires the form
ul(Z, Y) = 7
-[sign@ - 1) + 11. (5.2.27)
2
To estimate the effectiveness of algorithm (5.2.27), we performed a nu-
merical simulation of the normalized system (5.2.23). Namely, we con-
structed a numerical solution of (5.2.23) on the fixed time interval 0 <
T 5 T = 15 for three different algorithms of control E (1) the optimal
control Ti = Z,(T,,: y); (2) the optimal stationary control Ti = ?i:(z,y)
corresponding to the case where the terminal time T t oo in problem
(5.2.23)-(5.2.25); (3) the quasioptimal control in the first approximation
(5.2.27).
Control of Oscillatory Systems

For these three control algorithms, the transient processes in system


(5.2.23) are shown as functions of time in Fig. 40 and as phase trajectories
in Fig. 41. Moreover, the following parameters of problem (5.2.2)-(5.2.4)
were used for the simulation: a1 = a2 = bl = b2 = 0.5, 7 = 0.125, E = 0.5,
w = 1, y = 0.5, cl = ca = 1 (in problem (5.2.23)-(5.2.25), to these values
there correspond = 0.25 and b = 1).

Comparing the curves in Figs. 40 and 41, we see that these three al-
gorithms lead to close transient processes in the control system. Hence,
276 Chapter V

the second and the third algorithms provide a sufficiently "good" con-
trol. This fact is also confirmed by calculating the quality functional
(5.2.25) for these three algorithms, namely, we obtain I[u,(r, Z,y)] = 4.812,
I[ui(%,?j)] = 4.827, and I[ul(:,y)] = 4.901. Thus, any of these algo-
rithms can be used with approximately the same result. Obviously, the
simplest practical realization is provided by the first-approximation algo-
rithm (5.2.27) obtained here; by the way, this algorithm corresponds to
reasonable intuitive heuristic considerations of how to control the system.
Indeed, according to (5.2.27), it is necessary to start catching (shooting,
etc.) every time when the prey population size becomes larger than the
equilibrium size (for the normalized dimensionless system (5.2.23), this
equilibrium size is equal to 1). Conversely, as soon as the prey popula-
tion size becomes smaller than the equilibrium size, any external action on
the system must be stopped.
It should be noted that the situation when the first-order approximation
allows one to obtain a control algorithm close to the optimal control is
rather typical not only of this special case but also of other cases where
the small parameter methods are used for solving approximate synthesis
problems for control systems. This fact is often (and not without success)
used in practice for solving special problems [2, 331. However, it should
be noted that this fact is not universal. There are several cases where the
first-approximation control leads to considerable increase in the value of
the functional to be minimized with respect to its optimal value. At the
same time, the higher-order approximations allow one to obtain control
algorithms close to the optimal control. Some examples of such situations
(however, related to control problems of different nature) are examined in
$6.1 and in [97, 981.

$5.3. Optimal damping of random oscillations


In this section we consider the optimal control problem for a quasihar-
monic oscillator, which is a stochastic generalization of the problem studied
in $5.1. Therefore, many ideas and calculational formulas from $5.1 are
widely used in the sequel.
However, it should be pointed out that the foundations underlying the
approximate synthesis methods in these two sections are absolutely dif-
ferent. In $5.1 the quasioptimal controls are obtained by straightforward
calculations and minimization of the cost functional, while in the present
section the approximate synthesis is based on an approximate method for
solving the Bellman equation corresponding to the problem in question.
5.3.1. Statement of the problem. Preliminary notes. Here we
consider a stochastic version of problem (5.1.3)-(5.1.6) as the initial synthe-
Control of Oscillatory Systems 277

sis problem. We assume that the quasiharmonic oscillator (5.1.3) is subject


to small controls EU = E U ( ~and,
) in addition, to random perturbations of
small intensity

where [(t) denotes the standard scalar white noise (1.1.31) and B > 0 is a
given number.
The admissible controls u = u(t), just as in (5.1.5), are subject to the
constraints
Iu(t) I I Urn, (5.3.2)
and the goal of control is to minimize the mean value of the functional

I[U] = E [l T
c(x(t), i ( t ) ) dt] i min
Iu(t)llum
O<t<T
. (5.3.3)

The nonlinear functions ~ ( xk), and c(x, k) in (5.3.1) and (5.3.3), just as
in $5.1, are assumed to be centrally symmetric, ~ ( x2), = x(-x, -k) and
c(x, k) = c(-x, -2). Next, it is assumed that the penalty function c(x, k)
is nonnegative and possesses a single minimum a t the point (x = 0, k = 0)
and c(0,O) = 0.
Let us introduce the coordinates x l = x, x2 = k and rewrite (5.3.1) as

Then, using the standard procedure from $1.4, for the function of minimum
future losses

F(t,x1,x2) = min
Iu(.)l<um
t<r<T
[ T ( 1 ( ) 2 ( ) ) d
I
1 xi(t) = xl,x2(t) = x2 ,

(5.3.5)
we obtain the Bellman differential equation

--
dF
dt
-
-22-
dF
ax1
- (xI+Ex(x~,x~)xz)-+

E B a2F
dF
min EU-
8x2 I u l l u , [ El
+ -2 dx, +-
7c ( x ~X,Z ) , 0 5 t < T, F ( T , XI,x2) = 0,
(5.3.6)

corresponding to problem (5.3.1)-(5.3.3).


278 Chapter V

It follows from (5.3.6) that the desired optimal control u,(t, XI, 2 2 ) can
be written in the form
aF
u*(t, xi, 2 2 ) = -urn sign -(t, 21, x2),
ax2
where the loss function F ( t , 21, x2) satisfies the following semilinear equa-
tion of parabolic type:

Equation (5.3.8) and the fact that the functions ~ ( x l2 2, ) and c(xl,x2)
are symmetric imply that F = F ( t , X I , x2), satisfying (5.3.8), is symmetric
with respect to the phase coordinates, that is, F ( t , X I , 2 2 ) = F ( t , -21, -22).
This and formula (5.3.7) show that the optimal control (5.3.7) possesses an
important property, which will be used in what follows; namely, the optimal
control (5.3.7) is antisymmetric (see (5.1.11)):

We also stress that in this section the main attention is paid to solving
the stationary version of problem (5.3.1)-(5.3.3), that is, to solving the
control problem in which the terminal time T + m. In the nomenclature
of [I], problem (5.3.1)-(5.3.3) as T + m is called the problem of optimal
stabilization of the oscillator (5.3.1).
5.3.2. Passage to the polar coordinates. The Bellman differen-
tial and functional equations. By using the change of variables (5.1.15),
we transform Eqs. (5.3.4) to equations for the slowly changing amplitude A
and phase cp:

where

(??(A, @, u, t ) = G(A, @, u) - E-'/~(, (t), (, (t) = B1I2<(t)sin @,


E-1/2
G(A,a,u , t ) = H(A, @, U) - -(,(t),
A
&(t) = ~ l I ~ ( (cos
t ) @,
(5.3.11)
and the functions G(A, @, u) and H(A, @, u) are determined by (5.1.17).
Control of Oscillatory Systems 279

Note that the right-hand sides of the differential equations (5.3.10) for
the amplitude and phase contain a random function ((t) that is a white
noise. Therefore, Eqs. (5.3.10) are stochastic equations. The expressions
(5.3.11) for 6'and & are derived from (5.3.4) and (5.1.15) by changing the
variables according to the usual rules valid for smooth functions [(t). Thus
it follows from 31.2 that the stochastic equations (5.3.4) and (5.3.10) are
equivalent if they are symmetrized.15
We also note that by passing to the polar coordinates (which become
the arguments of the loss function (5.3.5)), we can equally use either the
set (A, p , t ) of current values (at time t ) of the amplitude A, the "slow"
phase cp, and time t or the set (A, @, t ) in which the "slow" phase is replaced
by the "fast" phase @. For the calculations performed later, the set (A, @, t )
is more convenient.
For the loss function F ( t , A, @) defined by analogy with (5.3.5),

F (t, A, a) = min
Iu(.)I<um
E [ lT ~1( ~ ( 7~)(~r
I
d) r) 1 ~ ( t =) A, ~ ( t=) @ ,
t<s<T

we can write the basic functional equation of the dynamic programming


approach (see (1.4.6)) as

~ ( t , A t , @ t ) = min E
1u(.)11um
t<r<t+A
[l t+A
c i ( ~ ra,,) d r + ~ (+ tA, A,+*, at+,)].

(5.3.13)
This equation expresses the "optimality principle." It is important to stress
that relation (5.3.13) holds for any time interval A (not necessarily small).
This fact is important in what follows.
But if A -+ 0 in (5.3.13), than, using (5.3.10) and (5.3.11), we can readily
obtain (see 3 1.4) the following Bellman differential equation for the function
(5.3.12):

aF d F dF
--
at
= - ELF
a@ + + min
IU(T)ISU~
EG(A,@, u)-
aA + EH(A,@, u)-

15More precisely, for Eqs. (5.3.10) it is important to take into account the sym-
metrization property, since these equations contain a white noise E ( t ) multiplicatively
with expressions that depend on the state variables A and 9. As for Eqs.(5.3.4), they have
the same solutions independent of whether they are understood in the Ito, Stratonovich,
or any other sense.
Chapter V

where L denotes the operator

The last two terms in (5.3.15) appear due to the fact that the stochastic
equations (5.3.10) are symmetrized.
If we change the time scale and pass to the slowly varying time ?= ~ t ,
then Eq. (5.3.14) for the loss function F(K A, @) acquires the form

It follows from (5.3.16) that the derivatives of the loss functions with respect

dF/dA -
to the amplitude and the fast phase are of different orders of magnitude (if
1, then d F / a @ E). This fact, important for the subsequent
considerations, follows from the quasiharmonic character of the motion of
system (5.3.4).
Equation (5.3.16) can be simplified if, just as in $1.4, $2.2, $3.1, etc.,
we consider the stationary stabilization of random oscillations in system
(5.3.4). In this case, the upper limit of integration T -+oo in (5.3.5) and
(5.3.12). The right-hand side of (5.3.12) also tends to infinity because of
random perturbations ((t). Therefore, to suppress the divergence in the
stationary case, we need to consider the following stationary loss function
f (A, @) (see (1.4.29), (2.2.9), and (4.1.7)):

f (A, a) = lim [F(6A, a) - y (ET- t ) ] ,


T-tw

where the constant y characterizes the mean losses of control per unit time
in the stationary operating conditions. For the function (5.3.17), we have
the stationary version of Eq. (5.3.16):

- min
IuILum

Just as in $5.1, taking into account the relay (5.3.7) and the antisymme-
try (5.3.9) properties, without loss of generality, we can seek the optimal
Control of Oscillatory Systems 281

control u* (A, a ) , which minimizes the expression in the square brackets in


(5.3.18), in the set of controlling actions of the form (5.1.18):

u(A, a ) = u, sign [sin (a - p , ( ~ ) ) ] .

This allows us to rewrite Eq. (5.3.18) in the form

where G(A, a , p,) and H(A, a, p,) denote the functions obtained from
(5.1.17) after the substitution of the control u(A, a ) in the form (5.3.19).
Thus, solving the synthesis problem is reduced to finding the function
$(A) that minimizes the expression in the square brackets in (5.3.20) and
determines (in polar coordinates) the equation for the switching line of the
controlling actions u* = fU, under the optimal control u, (A, a). To calcu-
late the function p+r(A),just as to solve Eq. (5.3.20), we use the method of
successive approximations (see Section 5.3.3), which allows us to obtain the
desired function @ + ( Ain
) the form of a series in powers of the parameter E:

Now let us write the functional equation (5.3.13) for the time interval
A = 27r. With regard t o (5.3.19), we can write

F(t,At,at)= min
v,(Ar)
E [lt+2n
CI(A~, a,) d r + ~ +
( 271,
t ~ t + 2 n@t+2n
,
)I -
t<r<t+27T
(5.3.22)
Since the loss function (5.3.12) is periodic in the variable a, we have
F ( t , A, a ) = F ( t , A, - 27r). This and (5.3.10) imply that relation (5.3.22)
can be rewritten as

F ( t , At, a t ) = min E
vr
[l t+2n
cl(A,, a,) d ~
+F(t + 2 ~At, 1
+ EAA,at + E A ~ ,) (5.3.23)
Chapter V

where

EAA = E J: t + 2 ~G ( A r , @ r , u r , r ) d ~
= ,5 lt+2= G(A,, a,, cpr(Ar)) d r - &J
t
t+27r
L ( T ) dr,
(5.3.24)
t+a~
~ A c p= E H(A,, a,, u ~7), d r

H ( A r , @,, cpr(Ar)) d~ - & dr.

-
Using, just as in (5.3.16), the "slow" time t = ~t and expanding
+ +
F(;+ ~ T EAt, EAA,@t ~ A c p )in the Taylor series, we rewrite (5.3.23) in
the form

~F - d2F -
,,,
+--( E A A )d2 (t + 2re, At, at) + (EAA)( i A p ) (t + ~ T EAt,
, at)
d 2 -~
+--(&Av)'
2
(t + 2TE, At, @t)+ . . . = 0.
aa2 I
In the stationary case considered in what follows, Eq. (5.3.25) acquires
the form

min E [E
vF
ltiZn
a,) cl(A,, +aaAf
d r - 2 r & ~EAA-(At, at)
8f
+ ~Acp-(At,
da
at) + -
(EAA)' d2f
2
-
dA2 (At, at)

Equation (5.3.26) is of infinite order and looks much more complicated


than the differential equation (5.3.20). Nevertheless, since the differences
EAA and ~ A c pare small, the higher-order derivatives of the loss function in
(5.2.26) are, as a rule, of higher orders of magnitude with respect to powers
of the parameter E. This allows us, considering terms of more and more
Control of Oscillatory Systems 283

higher order of E in (5.3.26) successively, to solve equations of comparatively


non-high orders and then to use these solutions for approximate solving of
the synthesis problem.
In practice, in this procedure of approximate synthesis, special attention
must be paid to a very important fact that simplifies the calculations of
successive approximations. Namely, in this case, there are two equations,
(5.3.20) and (5.3.26), for the same function f (A, a). Thus, combining both
these equations, we can exclude the derivatives d f I d @ ,d2f I d A d a , . . . of
the loss function with respect to the phase from (5.3.26) and thus to de-
crease the dimension and to turn the two-dimensional equation (5.3.26) into
a one-dimensional equation.
It is convenient to exclude the derivatives with respect to the phase, just
as to solve Eqs. (5.3.26), by using the method of successive approximations.
5.3.3. Approximate solution of the synthesis problem. To apply
the method of successive approximations, we need to calculate the mean
value of the integral

in (5.3.26) and the mean values of the amplitude and phase increments

over the time 27r. By using system (5.3.10), we can calculate expressions
(5.3.27) and (5.3.28) with arbitrary accuracy in the form of series in powers
of the small parameter E.
Let us write
Z(A,O, urnsign[sin(@- (or(A))],t) = G ( A ,a,t ) ,
(5.3.29)
&(A, O, urnsign[sin(@- or(^))], t ) = H ( A , O, t).
Then it follows from (5.3.10) that the increments of the amplitude A and
the slow phase (o over an arbitrary time interval r are

By using formulas (5.3.30) repeatedly, we can present &SATand ESP, as


the series
284 Chapter V

where

The increments (5.3.24) are calculated by formulas (5.3.31)-(5.3.36) with


regard to (5.1.17), (5.3.10), and (5.3.11) as

Finally, we need to use (5.3.31)-(5.3.37) and average the corresponding


expressions with respect to ( ( t ) ,taking into account (1.1.31).
In a similar way, using formulas (5.3.32)-(5.3.36), we can also calculate
the integral in (5.3.27) as a series in powers of E . Indeed, writing

substituting &A,, Sly,, ... given by formulas (5.3.32), (5.3.35), .... and
averaging with respect to ( ( t ) ,we obtain the desired expansion for (5.3.27).
In practice, to use this method for calculating the mean values of (5.3.27)
and (5.3.28), we need to remember that formulas (5.3.30)-(5.3.38) possess a
Control o f Oscillatory Systems 285

specific distinguishing feature relative to the fact that the random functions
in expressions (5.3.29) have the coefficients &-'I2 :

G(A, a , t ) = G(A, a , p,) - E-'/~[, (t)

= xA(A,a) - us (A, a) - p t ( t ) sin a ,


E
E-'I2
H(A, a,t ) = H ( A , a , cp,) - -tc(t)
A

= x,(A, 9 ) - A E A

(formulas (5.3.39) follows from (5.1.17), (5.3.11), and (5.3.29)). Thus, terms
of the order of E-' appear in Eb2A2,, ES3A2,, . . ., ES2p2,, E 6 3 ( ~ 2.~. ..
,
Therefore, in the calculations of the mean values of (5.3.27) and (5.3.28),
the number of terms retained in the expansions (5.3.31) must always be
larger by 1 than needed for the accuracy desired (if, for example, we need
to calculate the mean values of (5.3.27) and (5.3.28) with accuracy up to
+
terms of order of E', then we need to retain (s 1) terms in the expansions
(5.3.31)).
For example, let us calculate the first term in the expansion of the mean
value E(EAA). From (5.3.32) and (5.3.35), we have

Averaging (5.3.40) with respect to t ( t ) and taking into account the prop-
erties of the white noise, we obtain

where the bar, as usual, indicates averaging with respect to the fast phase
over the period (e.g., z A ( A t ) = & gnx,(At, a)d a ) , and us(At, p,) = G,
is given by (5.1.34)). Next, it follows from (5.3.33), (5.3.40), and (5.3.41)
that
286 Chapter V

Averaging (5.3.43) with respect to [(t) and taking into account (1.1.31) and
(1.1.32), we obtain

Ed2A2, =
&At
J2, [ArE[(rl)t(t)cos(Qt + r') cos(Qt + r) d ~ d' r + D
1
= 8 J2"
&At 0
[AT6(r1 - r) cos(Qt + r') cos(mt + r) d r '
I dr +D
- LJ2' cos2(Qt + r)d r + D = I~B
-
+ D, (5.3.44)
2 4 0 2~At
where

Finally, from (5.1.34), (5.3.31), (5.3.37), (5.3.42), and (5.3.44) we obtain


the desired mean value

cp,) +-
4At I
+e2...

( p = u,/~r). In a similar way, we obtain

%(At) +-
At 1
2~ sin cpp(At) + c 2 . . .
(5.3.46)
Control of Oscillatory Systems 287

For the other mean values of (5.3.27) and (5.3.28), in the first approximation
in E, we have

All the other mean values E[(EAA)(EAY)],E ( E A ~ ~ . .). ~in, (5.3.28) are
higher-order infinitesimals in E.
Now let us calculate successive approximations of the Bellman equation
(5.3.26). Simultaneously, with the help of Eq. (5.3.20), we shall exclude the
derivatives of the loss function with respect to the phase from (5.3.26).
The first approximation. We represent the loss function f (A, a) as
the series
+ +
f (A, @) = fl(A, a) Ef2 (A, @) E2 - .., (5.3.49)
substitute into Eq. (5.3.26), and retain only terms of the order of E (omitting
the terms of the order of c2 and of higher orders). Since, in view of (5.3.20),
-
d f I d @ E, using (5.3.45)-(5.3.48), we obtain the following equation of the
first approximation from (5.3.26):

In (5.3.50) we calculate the minimum with respect to 9, under the assump-


tion that d f l /dA > 0 and thus obtain the expression

for the minimizing function cp*,(A)that determines the switching line in the
first approximation. In this case, in view of (5.3.51), Eq. (5.3.50) acquires
the form

Comparing the result obtained with the approximate synthesis result


(5.1.55) for a similar deterministic problem, we see that, in the first ap-
proximation in E, the perturbation ( ( t ) in no way affects the switching
line. Just as in the deterministic case (5.1.55), (5.1.56), the switching line
coincides with the abscissa axis on the phase plane (x,k ) for any type of
nonlinearity, that is, for any function ~ ( x&), in Eq. (5.3.1).
288 Chapter V

To find the switching line in the second approximation, we need to cal-


culate the derivative afl/b'A satisfying the differential equation (5.3.52),
where the stationary error y is not jet found. But we can readily show how
to calculate this error. Namely, since the stationary error is defined (in the
probability sense) as the mean penalty value (see (1.4.32)), we have

where pl(A) is the stationary probability density for the distribution of the
amplitude A. The Fokker-Planck equation that determines this stationary
density is conjugate to the Bellman equation. Therefore, in the case of
(5.3.52), the equation for pl(A) has the form

For the zero probability flow (see $4, item 4 in [173]), Eq. (5.3.54) has the
solution

where the constant C is determined by the normalization condition

As soon as y is known, we can solve Eq. (5.3.52). The unique solution of


this equation is specified by the condition that the function fl / a A must
behave as A -+ oo just as in the deterministic case (that is, in (5.3.1)
the random perturbations [ ( t ) 0). This assumption on the solution of
Eq. (5.3.52) is quite natural, since, obviously, the role of the diffusion term
in the equation decreases as A increases (similar considerations were used
in 32.2).
It follows from (5.3.52) that if there are no perturbations ( B = y = O),
then this equation has the solution
afl -
- - -
E l (A)
X,(A) - 2 ~ '
Therefore, the diffusion equation (5.3.52) has the solution

(y - E~(A'))exp
4
[z J A
A'
(A") - 2p + )4A1' 1
~A'Id ~ ' .
(5.3.58)
Control of Oscillatory Systems 289

Now we can verify whether the derivative dfl/dA is positive (this was
our assumption, when we derived (5.3.51)). It follows from (5.3.58) that
this assumption is satisfied for not too small values of the amplitude A.
Therefore, if we solve the synthesis problem by this method, we need not
consider a small neighborhood of the origin on the phase plane (x, k ) . Just
as in the deterministic case in 55.1, it is clear from the "physical" viewpoint
that the controlling action u and the perturbations [ ( t )lead to appearance
of a neighborhood where the quasiharmonic character of the phase trajec-
tories is violated.

The second approximation. To obtain the Bellman equation in the


second approximation, we retain the following expression in (5.3.26):

min E{E ['IT cl(A,, a,) d r - 2,rray $ EAA-af 1 + EAF-af 1


+'r aA a@

The other terms in (5.3.26) are necessarily of orders larger than that of E ~ .
The derivatives dfl/a@, d2fl/dAa@, . . . of the loss function with respect
to the phase can be eliminated from (5.3.59) by using (5.3.20). Hence we
have

To find the function cpr(A) that minimizes the expression in the braces
in (5.3.59), we shall consider only the terms in (5.3.59) that depend on the
control (or, which is the same, on cp,(A)). In this case, we shall use the fact
that the minimizing function p+r(A)is small in the second approximation:
p+r(A)= ~ c p (A).
a Therefore, in the part of Eq. (5.3.59) that depends on
cp,, we can retain only the terms that depend on cpa by expressions
-
and E ~ .
- E'

Clearly, it is no longer sufficient to have only formulas (5.3.45)-(5.3.48)


for the mean values of (5.3.27) and (5.3.28) in the first approximation.
In the expansions (5.3.45)-(5.3.48) we need to calculate the terms
Following the way for calculating (5.3.30)-(5.3.38) and retaining only ex-
- E ~ .

pressions depending on cp, = E ( P ~ in the terms of the order of E ~ we ,


see that, in the second approximation, formulas (5.3.45)-(5.3.48) must be
Chapter V

replaced by

2 r c ( A , p,) -1
+ B7T
2A

~'27rB-
E(EAA)' = EBT - -uc (A, p,) sin2 a, (5.3.62)
A

E [& lt+21 I cl(A,, a , ) d r = &27rE1(A)

where Z(A, cP) and &(A, cP) denote the purely vibrational components of
the functions G(A, a, p,) = E(A, p,) +
Z(A, a ) and cl(A, a) = El(A) +
El (A, a ) .
By using (5.3.60)-(5.3.64), (5.1.34), (5.3.42), and (5.3.59), we see that
the desired function p:(A) = ~cpz(A),which determines the switching line
in the second approximation, can be found by minimizing the expression

N ( p p ) = - ~ 4 u , cos p, afl
-
aA
- EZ { [Tic (A, p,)G (A, a) - uc (A, @ ) h(A, a)]
27T

2~
+ ,%(A, p,)
B
[r - cl(A, @) - - sin2 @-
a2fl
2 dA2
B afl af 1
- - cos2 @- -
2A 8A p p ) sin2 maA
B?T
+ aGc (A, p,) (5.3.65)

We collect similar terms in (5.3.65) with the help of (5.1.34)-(5.1.36).


Control of Oscillatory Systems

As a result, we obtain

In the following two examples, we calculate the function cp:(A) for which
(5.3.66) attains its minimum.
EXAMPLE 1. Suppose that the plant to be controlled is a linear system.
In this case, ~ ( xi)
, 1 in (5.3.1), and it follows from (5.1.17) that

For simplicity, we assume that the vibrational component of the penalty


function &(A, @) = 0 (this holds, e.g., if c(z, i)= x 2 + i 2 in (5.3.3)). Then,
in view of (5.1.44) and (5.1.45), the expression (5.3.66) acquires the form

The condition aN/dcp, = 0 leads to the following equation for the desired
function cp: (A):

cos 3p, + (14 - - +


2A2 A(8fl/dA)
) cos cpF] = 0. (5.3.67)

Representing the desired functions cpz(A) in the form of asymptotic expan-


sion p 3 A ) = &&(A) +s2. . . , we readily obtain from (5.3.67) the following
expression for the leading term of this expansion:

cp;(A) = scp; (A) = E 2[: - + :;:!2)]


Formula (5.3.68) determines the switching line of the suboptimal control
u2(A, cP) = urnsign[sin(@- &&(A))] in the second approximation.
292 Chapter V

In (5.3.68), y is calculated by formula (5.3.53) with the stationary prob-


ability density

C = ~2- 2 u , ~ e 1 6 P 2 / r , [ l - F (z*/) B] f F ( u ) =f- i l"e-x2 dr.


(5.3.69)
Here the derivative 8fl/dA, determined by (5.3.58) in the general case, has
the form

x lm [Fl (A') - y]A' exp [ -


1
(A" + PA')] dA'. (5.3.70)

Since

y -+0 and afl


-
E l (A)
as B+O;
aA 2 p - F A(A)

one can readily see that formula (5.3.68) coincides as B + 0 with the
corresponding expression (5.1.60) for the switching line of the deterministic
problem.
EXAMPLE 2. Let us consider a nonlinear plant with x(x, &) = x2 - 1 in
(5.3.1) (in this case, the plant is a self-exciting van der Pol circuit). For
such a system, it follows from (5.1.17) that

- A A3 A cos 2 a + -
a) = -- A3 cos 4G.
x(A) = - - -, Z,(A, (5.3.71)
2 8 2 8
Substituting (5.3.71) into (5.3.66) and using (5.1.44) and (5.1.45), from
(5.3.66) and the condition aN/acp, = 0 we derive the expression for the
switching line in the second approximation, which coincides in form with
the expression obtained in the previous example. However, now the loss
function and the stationary error in (5.3.68) must be calculated in a different
way.
So, in this case, the stationary probability density (5.3.55) for the dis-
tribution of the amplitude has the form
Control of Oscillatory Systems 293

where C is the normalization constant:

The stationary error y in (5.3.68) is calculated by formula (5.3.53) with the


help of (5.3.72) and (5.3.73). The expression for d fl/dA can be obtained
from (5.3.58) with regard to (5.3.71). As a result, we see that the derivative
d f l / d A in (5.3.68) has the form

afl
- 4
-- e x [I(g
p B 8 A2 + 8 , ~ A ) l
-
dA BA

A ( A ) - [
y) exp - -
B
1 A14
(-
8
- Af2 + ~,uA')]dAf.
Just as in Example 1, formula (5.3.68) coincides as B +0 with the
corresponding expression

obtained in $5.1 (see (5.1.63)) for the deterministic problem.

The influence of random perturbations on the position of the switching


line in the second approximation is shown in Fig. 42, where four switching
294 Chapter V

lines for the linear quasiharmonic system from Example 1 are depicted.
Curve 1 corresponds to the deterministic problem ( B = 0). Curves 2,
3, and 4 show the switching lines in the stochastic case and correspond
to the white noise intensities B = 1, B = 5, and B = 20, respectively.
These switching lines correspond to the quadratic penalty function c(x, &) =
+
x2 i 2in the optimality criterion (5.3.3) and the parameters u, = 1
and E = 0.25 in problem (5.3.1)-(5.3.3). The dashed circle in Fig. 42
approximately indicates the domain where the quasiharmonic character of
the phase trajectories of the system is violated. In the interior of this
domain, the synthesis method studied here may lead to large errors, and
we need to employ some other methods for calculating the switching line
near the origin.
5.3.4. Approximate synthesis of control that maximizes the
mean time of the first passage to the boundary. As another ex-
ample of the method of successive approximations treated above, let us
consider the synthesis problem for a system maximizing the mean time
during which the representative point (z(t), &(t))first comes to the bound-
ary of some domain on the phase plane (x, &). For definiteness, we assume
that this domain is the disk of radius Ro centered a t the origin. As be-
fore, we consider a system whose behavior is described by Eq. (5.3.1) with
constraints (5.3.2) imposed on control.
Passing to the polar coordinates and considering the new state variables
A and as functions of the "slow" time t =~ t we , transform Eq. (5.3.1)
to the system of equations of the form

2
where the functions and fi are given by (5.3.11) and (5.1.17). By using
Eq. (5.3.74), we can write the Bellman equation for the problem in question.
It follows from $1.4 that the maximum mean time during which the
representative point (A(T), @(TI) achieves the boundary (the loss function
for the synthesis problem considered) can be written as (see (1.4.38))

Recall that W ( r ,AT,at-)denotes the probability that the representative


point with the polar coordinates (AF, ai-)a t time does not achieve the
boundary of the region of admissible values during the time (T - t ) . For the
optimality principle (see (1.4.39)) corresponding to the function (5.3.75),
Control of Oscillatory Systems 295

we can write the equation


-
-
F (Ac, ac) = max
k(~)l$um
tjr<t+A
E[L+ t A
~(r,~i,ai)dr+~(~i+A,ai+A)]-

(5.3.76)
By letting the time interval A + 0, in the usual way ($1.4), we obtain the
following differential Bellman equation for the function F(A,a):
a~
a@
-= r { - ~ - L F - lulLurn
a~
max [G(A,@,u)-+H(A,@,u)-I}
dA
aF
-
a@ '
A < Ro, F(Ro,@)= 0.
(5.3.77)
Here L is the operator (5.3.15), and the functions G and H are determined
by formulas (5.1.17).
On the other hand, if we set A = 2as in (5.3.76), then we arrive a t the

[r2='
finite-difference Bellman equation (an analog of (5.3.26))

max E W (7, AT, aF)d r + (EAA)-


aF + ( r a p ) -
dF+- a2F
( E A A )-
~
pr dA a@ 2 dA2

Here the increments of the amplitude eAA and the "slow" phase r A y are
the same as in (5.3.24), and satisfy (5.3.45)-(5.3.48) and (5.3.61)-(5.3.64).
Next, to solve the synthesis problem approximately, we need, just as in
Section 5.3.3, to solve Eqs. (5.3.77) and (5.3.78) simultaneously. Here we
write out the first two approximations of the function cpr(A) determining
the switching line in the optimal regulator, which, just as in Section 5.3.3,
is of relay type and has the form (5.3.19).
The first approximation. Substituting the expression a F / a ~ from
(5.3.77) into Eq. (5.3.78), omitting the terms of the order of r 2 and of
higher orders, and using (5.3.45)-(5.3.48), we obtain the following Bellman
equation in the first approximation:

Since, by definition, W(T, Ai; @;) = 1 a t all points in the interior of the
domain of admissible states (that is, for all AT < Ro), we can transform
296 Chapter V

(5.3.79) with regard to (5.3.45) to the form

The function cpr(A) determining the switching line in the first approxi-
mation is found from the condition that the expression in the square brack-
ets in (5.3.80) attains its maximum. For 8Fl/dA < 016 we obtain

Comparing (5.3.81) with (5.3.51) as well as with (5.1.55), we conclude


that, in the first approximation in E , the switching line of the optimal quasi-
harmonic stabilization system always coincides with the abscissa axis on
the plane (x,2 ) ; this fact is independent of the type of system nonlinearity,
the existence of random perturbations, and the optimality criterion. Some
distinctions between expressions for cp+r(A)appear only in higher-order ap-
proximations.
The equation for the loss function F1(A) in the first approximation with
regard to (5.3.81) has the form

A unique solution of this equation is determined by the natural boundary


conditions

For simplicity, we shall consider the case where the plant is a linear quasi-
harmonic system. In this case, we have ~ ( x&), = 1 in (5.3.1) and z(A) =
-A12 in (5.3.82). Solving (5.3.81) with the second condition in (5.3.83),
we readily obtain

The expression (5.3.84) is used for determining the switching line in the
second approximation.

161t follows from (5.3.84) that the condition EJFl/dA < 0 is satisfied for all A E
(0, Rol.
Control of Oscillatory Systems 297

The second approximation. The switching line in the second approx-


imation is calculated by analogy with Section 5.3.3. Namely, in Eq. (5.3.78)
we consider the terms of the order of E' and retain the terms depending on
p r taking into account the fact that p: (A) is small (9: = E&). Then we see
that the desired function cp:(A) in the second approximation is determined
by the condition that the expression

attains its maximum.


If the system is linear, then we have %*(A, a) = ACOS2@/2, and the
desired expression for (pc(A), which follows from the condition dN /dp, = 0
with regard to (5.1.44) and (5.1.45), has the form

Figure 43 shows the switching line given by (5.3.86).

In conclusion, let us present the block diagram (Fig. 44) of a quasiop-


timal self-stabilizing feedback control system with plant P described by
298 Chapter V

Eq. (5.3.1). The feed-back circuit (the regulator) of this system contains
a differentiator, a multiplier, an adder, an inverter, a relay unit, and two
nonlinear transducers N C 1 and NC2. Unit N C 1 realizes the functional
dependence A = d m , that is, produces the current value of the am-
~ l i t u d eA. Unit NC2 models the functional dependence V;(A), which is
given either by (5.3.68) or by (5.3.86), depending on the problem consid-
ered. Thus, the feed-back circuit in the diagram in Fig. 44 realizes the
control law
+
u(x, J) = -€urn sign (1 xp:(&%Z)),

which coincides with (5.3.19) with accuracy up to terms -- E ~ .

We also note that the diagram in Fig. 44 becomes significantly simpler


if system (5.3.1) is controlled by using the quasioptimal algorithm in the
first approximation (5.1.55), (5.1.56). In this case, the part of the diagram
indicated by the dashed line is absent.

$5.4. Optimal control of quasiharmonic


systems with noise in the feedback circuit
Now we shall show how to generalize the results of the preceding section
to the case where the error in the measurement of the output (controlled)
variable x(t) cannot be removed.
5.4.1. Statement of the problem. We shall consider the feed-back
control system whose block diagram is shown in Fig. 25. Just as in 35.3, we
Control o f Oscillatory Systems 299

assume that the plant P is a quasiharmonic controlled system perturbed


by the standard white noise and described by the equation

We seek the optimal (scalar) control u, = u,(t) in the class of piecewise


continuous functions whose absolute value is bounded by urn:

It is required to calculate the controller C so that to provide the optimal


damping of oscillations x(t) arising in system (5.4.1) under the action of
random perturbations [(t). In this case, the quality of the damping is
estimated by the mean value of the functional

The functions ~ ( xk), and c(x, k) in (5.4.1), (5.4.3) are the same as in
(5.3.1), (5.3.3). Therefore, problem (5.4.1)-(5.4.3) is completely identical
to problem (5.3.1)-(5.3.3).
The single but important distinction between these problems is the fact
that now it is impossible to measure the current state of the controlled
variable x(t). We assume that the result y(t) of our measurement is an
additive mixture of the true value of x(t) and a random error of small
intensity:
+
y(t) = x(t) &rl(t), (5.4.4)
where E is a small parameter the same as in (5.4.1) and the random function
~ ( t is
) a white noise (independent of [(t)) with characteristics

where N > 0 is the intensity (spectral density) of the process ~ ( t ) .


Now to obtain some information about the current state of the plant
at time t , we need to use the entire prehistory of the observed process
y; = {y(r): 0 5 T 5 t} from the initial time t = 0 till the current time t.
Therefore, in this case, the current values of the control action ut and the
function (5.3.5) of minimum future losses depend on the observed realiza-
tion yh, that is, are the functionals

~ ( ty;), = min
Iu(.)I5um
E [ lT
c(x(r), i ( r ) )d r I y;] . (5.4.7)
t<r<T
300 Chapter V

The principal distinction between problems (5.4.1)-(5.4.4) and (5.3.1)-


(5.3.3) is that, to find the optimal control functional (5.4.6) that minimizes
the optimality criterion (5.4.3), we need to choose the space of states of
the controlled system (the sufficient coordinates of the problem; see $1.5,
$3.3, and $4.2) in a special way, which will allow us to use the dynamic
programming approach for solving the synthesis problem.
Let us show how to determine the sufficient coordinates for problem
(5.4.1)-(5.4.5).
5.4.2. Equations for the sufficient coordinates. Let us consider
the random function z(t) = J: ~ ( dr r). Then writing the plant equation
(5.4.1) as the system of first-order equations:

and assuming that the control u is a given function of time, we can readily
show that z(t) is the observable component of the three-dimensional Markov
process (xl(t), x2(t),~ ( t ) ) .By using (5.4.4), (5.4.5), and (5.4.8), as well
as the results of $1.5, we readily obtain an equation for the a posteriori
probability density wps(t, x) = wps (t, x1,x2) = w(xl, 2 2 1 2:) = w(xl, 2 2 I
y:) for the components of the unobservable diffusion process determined by
system (5.4.8). The corresponding equation is a special case of Eq. (1.5.39)
and has the form

awps (t, 2) - a (KawpS) a2


+ 21dz,axp
at
- --
8%
-- (Bapwps) + [Q(X,Y)- Glwps.
(5.4.9)
Here the subscripts a , P take the values 1 and 2, and

Equation (5.4.9) for the a posteriori density also remains valid if the
control u in (5.4.8) is a functional of the observed process 24, (or y); or even
of the a posteriori density wps(t, x) itself. This fact is justified in [I751 (see
also $ 1.5).
It follows from (5.4.4), (5.4.5), (5.4.9), (5.4.10), and the results of $1.5
that the a posteriori probability density wps(t, x), treated as a function of
time, is a Markov stochastic process and thus can be used as a sufficient
coordinate in the synthesis problem. However, usually, instead of wps (t, x),
it is more convenient to use a parameter system equivalent to wp,(t, x).
If we write xy(t) = x;,, xi(t) = xgt for the coordinates of the maximum
Control of Oscillatory Systems 301

point of the a posteriori probability density wps(t, x) a t time t,17 then,


expanding wps(t, x) in the Taylor series around this point, we obtain the
following representation for wp, (t, x) = wps(t, X I , x2) (see (1.5.41)):

(in (5.4.11) the sum is over ni, i = 1, . . . , s, assuming the values 1 and 2).
If we substitute (5.4.11) into (5.4.9) and set the coefficients of equal pow-
ers of (xn, - x:~). . . ( x , ~- x:~) on the left- and right-hand sides equal to
each other, then we obtain a system of differential equations for the pa-
rameters x:; (t) and a ,,.,* (t) (see (1.5.43)). Note that since Eq. (5.4.9)
is symmetrized, the stochastic equations obtained for x:;(t) and u,,...,~ (t)
are also symmetrized.
It is convenient to replace the probability density wps(t,x) by a set of pa-
rameters, since we often can truncate the infinite system of the parameters
xii, a [167, 170, 1811 retaining only a comparatively small number
of terms in the sum that is the exponent in (5.4.11). The error admitted
in this case as compared with the exact expression for wps is the less the
higher is the a posteriori accuracy of estimation of the unobservable com-
ponents x1 and x2 (or, which is the same, the less is the norm of the matrix
]lDaPll of the a posteriori variances); here, the norm of the matrix llDaPll
is of the order of E , since, in view of (5.4.4), the observation error is a small
variable of the order of fi.
It is often assumed [167, 1701 that a,,,,,, = a,,,, ,,,, = . = 0 in
(5.4.11) (this is the Gaussian approximation). In the Gaussian approxima-
tion, from (5.4.9) and (5.4.10) we have the following system of equations
for the parameters of the a posteriori density wps(t, X I , x2):18

1 7 ~ h variables
e x : ( t ) and z i ( t ) are estimates of the current values of the coordinate
x ( t ) and the velocity x ( t ) of the control system (5.4.1). If the estimation quality is
determined only by the value of the a posteriori probability, then z: ( t ) and z i ( t ) are
the optimal estimates.
''For the linear oscillator (when ~ ( z , & ) 1 in (5.4.1)), the a posteriori density
(5.4.11) is exactly Gaussian, and Eqs. (5.4.12) are precise.
302 Chapter V

DII = 2D12 - D?, (a


1
-E
a2x1~ - 2&DI1Dl2--
)
a2x1
+ a2x1
8% ax1ax2 8%;'
1
1 =2 D i i D 1 2 - - E
a2x1 + + DZ2+ E D ~ ~a2x1
D ~ ~ -
~
(EN ax, ~ - D~~) (1 ax;
- +
~(D11D22 Di2)-
a2x1 - EDZZ-,
ax2
ax2
ax2 ax2 (- 1 a2x1
622 = r B - 2D12 1
( + a + ax,
E- E--) - D:,
EN
- t7)
ax,
+ 2 a2x1- 2&Dl2DZ2---a2x1 . (5.4.12)
ax; axlax2
To write these equations, we have passed from the parameter system llaapll
to the matrix llDas 11 = /laaa11-l of the a posteriori covariances. Besides of
this, in (5.4.12) we have used the notation

Let us make some remarks concerning Eqs. (5.4.12). First, since (see
(5.4.1), (5.4.4), and (5.4.5)) the noise intensity in the plant and in the feed-
back circuit is assumed to be small (of the order of E ) , the covariances of the
a posteriori distribution are also small variables of the order of E , that is, we
can write D l 1 = &Ell,D12 = &Dl2,and D22 = &Dz2.This implies that the
terms in (5.4.12) are of different order of magnitude and thus Eqs. (5.4.12)
can be simplified furthermore. So, retaining the most important terms and
omitting the terms of the order of e2 and of higher orders, we can rewrite
(5.4.12) in the form

We also note that, in this approximation, the last three equations in (5.4.13)
can be solved independently of the first two equations. In particular, we
Control of Oscillatory Systems 303

see that, for a long observation, the stationary operating conditions occur
and the covariances of the a posteriori probability distribution attain some
steady-state values D;l, DT2, and Da2 that do not change during the further
observation. These limit covariances depend neither on the way of control
nor on the type of the plant nonlinearity (the function ~ ( x5 ,) in (5.4.1))
and are equal to

In what follows, we obtain the control algorithm for the optimal stabilizer
(controller) C under these stationary observation conditions.
5.4.3. The Bellman equation and the solution of the synthe-
sis problem. In the Gaussian approximation, the loss function (5.4.7)
is completely determined by the current values of the a posteriori means
0
zl(t) = x?, and x;(t) = xi, and by the values of the a posteriori covari-
ances Dl1, Dl2, and D22. Under the stationary observation conditions, the
a posteriori covariances (5.4.14) are constant, and therefore, we can take
x?(t), xg(t), and time t as the arguments of the loss function (5.4.7). Thus,
in this case, instead of (5.4.7), we have

-
(5.4.15)
In (5.4.15) the symbol E of the mathematical expectation means the a
posteriori averaging, that is, the averaging with the a posteriori probability
density. In other words, if we write the integral in the square brackets in
(5.4.15) as a function of the initial values of the unobservable variable xlt
and x2t, then, to obtain F ( t , x?,, xi,), we need to integrate this function
with respect to xlt and x2t with the Gaussian probability density

For the function (5.4.15), the basic functional equation (the optimality
304 Chapter V

principle) of the dynamic programming approach has the form

~ ( tx:,, x;,) = min cl(x1ry x2r) d 7


Iu(r)lLum
tLr<t+A

The differential Bellman equation can be obtained from (5.4.17) by using


the standard derivation procedure outlined in $1.4 and $1.5. To this end,
we need to expand the function F ( t + A, x ; , + ~ ) in the Taylor series
around the point ( t ,x?,, x:~), to calculate the mean values of the increments

and the integral

I
~ ~ ( x l~ r2r r d) 7 , (5.4.19)

to substitute the expressions obtained for (5.4.18) and (5.4.19) into (5.4.17),
and pass to the limit as A -+ 0.
To calculate the mean values of (5.4.18), we need Eqs. (5.4.13) and for-
mulas (5.4.4) and (5.4.5). So, from (5.4.13) we obtain

Since the stochastic processes XI, = x ~ ( T )xt,


, = x?(T), and x;, = x?(T)
are continuous, for small A we can replace these stochastic functions by the
constant values xlt, x?,, and xi,. The error of this replacement is of the
order of o(A). As a result, if we average with respect to q(t) with regard
to (5.4.5), then (5.4.20) implies

By averaging (*) with respect to xlt with the probability density (5.4.16),
we finally obtain
0 - 0,
E(z?t+a - XI,) - %2t + o(A)- (5.4.21)
Control of Oscillatory Systems 305

In a similar way, we can find the other expressions for (5.4.18) and
(5.4.19):

c(r,,, x2,) d r ] = A :[/ c(xlt, x2t)N(xP, D*) dxitdxzt

+o(A). (5.4.22)
Using (5.4.21) and (5.4.22) and letting A -+ 0 in (5.4.17), we obtain

datF (t, X 0I , x2)


-- 0
= x 0 2d F7 - (xy + E~(X:,
ax1

+ /c c(xl, x2)N(x0,a * ) dxldx2

(here we omit the subscript t in xy, xg, X I , 22).


(5.4.23)

If the terminal time T in (5.4.3), (5.4.7), and (5.4.15) is sufficiently large,


then the fact that F depends on t becomes unimportant (the stationary sta-
bilization conditions take place), since the derivative -dF/dt -+ y as T +
co (here y is a constant that characterizes the mean losses per unit time
under the optimal control). As is usual in such cases (see (1.4.29), (2.2.9),
(4.1.7), and (5.3.17)), passing from F ( t , x?, x;) to the time-independent loss
function
f (x:, 4 ) = [w,4, x i ) - y ( T - t)] ,
we arrive a t the stationary version of Eq. (5.4.23):
306 Chapter V

Just as in s5.3, it is more convenient to solve Eq. (5.4.24) in the polar


coordinates if, instead of the estimated values of the coordinate x: and the
velocity x;, we use, as the arguments of the loss function, the corresponding
values of the amplitude A. and the phase Go:

xy = A. cos Go, x: = -Ao sin (@o= t + yo). (5.4.25)

Performing the change of variables (5.4.25), we transform (5.4.24) to the


form

- min [G(AO,mo, u af ) + H(AO,


~ ~ , af
o%)-I). (5-4.26)
I ~ l l u m d@o

The expressions for G(Ao, Qo, u) and H(A0, a o , u) coincide with (5.1.17)
after the change A, @ + Ao, Qo. The function c*(Ao, @o) is determined by
+
the penalty function c(x, k) in (5.4.3) (e.g., for c(z, 2) = x 2 i2,we have
c*(Ao,a o ) = A; + +
EB,*, ED:^). In (5.4.26) Lo denotes the differential
operator

cos2 @O
+-
AO ~ A O
a - --
sin 2Qo
A;
aI)
a@, .
Note that as N + 0 formula (5.4.27) passes into formula (5.3.15) for the
operator L obtained in 35.3 for systems containing complete information
about the phase coordinates of the plant. We can readily verify this fact by
substituting the values (5.4.14) of the steady-state covariances into (5.4.27)
and passing to the limit as N + 0. Then (5.4.27) acquires the form of
(5.3.15), and Eq. (5.4.26) coincides with (5.3.18).
Control of Oscillatory Systems 307

Equation (5.4.26) can be solved by the approximate method outlined


in $5.3. Indeed, the principal assumption (necessary for the approximate
method to be efficient) that the trajectories of the sufficient coordinates
z:(t) and x:(t) are quasiharmonic is satisfied in this case, since the noises
[ ( t )in the plant and the noises ~ ( t in) the feed-back circuit are small (their
intensity is of the order of e). In view of this fact, the rate of change of
the estimated values for the amplitude A. and the phase cpo are small, and
hence we can use the successive approximation procedure of 85.3.
To this end, considering the loss function (5.4.15) as a function of the
estimates of the amplitude A. and the phase Go and writing the finite
difference equation (5.4.17) for the time interval A = 27r, we obtain the
equation

2
-+...I
( ~ A ' p o )d2
~
am:,
=o,

similar to Eq. (5.3.26). Next, just as in 85.3, by using (5.4.26), we elimi-


nate the derivatives of the loss function with respect to the phase eofrom
(5.4.28) and solve the obtained one-dimensional infinite-order equation by
the method of successive approximations.
Note that the increments of the estimated values of the amplitude EAAO
and the phase &Avoon the time interval A = 27r can readily be calculated
with the help of Eqs. (5.4.13) for the sufficient coordinates written in the
polar coordinates A. and Qo in accordance with the change of variables
(5.4.25). In this case, just as in 85.3, we assume in advance that, in view
of the symmetry of the problem, the optimal control has the form

u,(Ao, Go) = urn sign [sin (@o - cp,(Ao))], (5.4.29)

and thus solving the synthesis problem is equivalent to finding the equation
in the polar coordinates for the switching line cp,(Ao).
We do not consider the mathematical calculations in detail (they coincide
with those in $5.3), but illustrate the resuIts obtained for the switching line
in the first two approximations by way of example of a controlled plant that
is a linear quasiharmonic system (in (5.4.1) we have ~ ( zk ,) 1). By using
the above-described procedure, we simultaneously solve Eqs. (5.4.26) and
308 Chapter V

(5.4.28) and obtain the following one-dimensional Bellman equation in the


first approximation (in the case of quadratic penalties c(x, I ) = x2 + k 2 ) :
+
d2fi min [-2acoscpp-
d"
A0 I =?-A:,
(5.4.30)
Urn
p = -,
Y = (Dli)" (Kd2
4N

-
7i-

Hence we obtain the following equation for the switching line in the first
approximation:
c p p o ) 0, (5.4.31)
which corresponds to the control law

Taking into account (5.4.31), from (5.4.30) we obtain the expression

for the derivative dfl/dAo, which enters the formula for the switching line
in the second approximation:
Control of Oscillatory Systems 309

Since y?(Ao) is small, it follows from (5.4.25) and (5.4.29) that the qua-
sioptimal control algorithm in the second approximation can be written
as
ULJ(X?,X:) = -urn sign
The block diagram of a self-stabilizing system realizing the control algo-
rithm in the second approximation is shown in Fig. 45. The most important
distinction between this system and that in Fig. 44 is that the feed-back
circuit contains a n additional element SC producing the current values of
the sufficient coordinates x : ( t ) and x i ( t ) . Figure 46 presents the diagram
of this element in detail.
-1

<
I
CHAPTER VI

SOME SPECIAL APPLICATIONS OF


ASYMPTOTIC SYNTHESIS METHODS

In this chapter we consider some methods for solving adaptive problems


of optimal control ($6.1), as well as problems of control with constrained
phase coordinates ($6.2). Furthermore, in $6.3 we solve a problem of con-
trolling the size of a population whose behavior is described by a stochastic
logistic model.
"Adaptive problems" are optimal control problems, similar to those con-
sidered above, that are solved under the assumption that some system pa-
rameters are unknown a priori. In this case, just as in problems with
observation noise (33.3, $4.2, and $5.4), the optimal controller is a com-
bination of the optimal filtration unit and the controlling unit properly
producing the required controlling actions on the plant. In $6.1 we present
an approximate method for calculating such controllers; this method is ef-
fective if the a priori indeterminacy of unknown parameters is relatively
small.
In 56.2 we present exact and approximate solutions of some stochastic
problems of control with constrained phase coordinates. We consider two
servomechanisms and a stabilizing system under the assumption that the
range of admissible deviations between the command signal and the output
coordinate is a fixed interval on the coordinate axis. We consider two cases
of reflecting and absorbing screens a t the endpoints of this interval. In
solving the stabilization problem, we study a two-dimensional problem in
which the phase trajectories reflect along the normal on the boundary of
the region of admissible phase variables.
In $2.4 we have already studied the problem of control of a population
size and have exactly solved a special control problem based on the stochas-
tic Malthus model. In $6.3 we shall consider a general case of a stochastic
logistic controlled model and construct an optimal control algorithm for
this model in terms of generalized power series. We also obtain approx-
imate finite formulas for quasioptimal algorithms, which can be used for
large values of the model parameter called the m e d i u m capacity.
Chapter VI

$6.1. Adaptive problems of optimal control


In this section we consider the synthesis problem for controlled dynamic
systems perturbed by a white noise and described by equations with un-
known parameters. We assume that the system equations contain these pa-
rameters linearly and that the a priori indeterminacy of these parameters
is small in some sense. First we present a formal algorithm for solving the
Bellman equation approximately (and for the synthesis of a quasioptimal
control). The algorithm is based on the method of successive approxima-
tions in which the solution of the optimal control problem with completely
known values of all parameters is used as the zero approximation ( a gener-
ative solution). Next, we estimate the quality of the approximate synthesis
(for the first two approximations). Finally, we illustrate our method by cal-
culating a quasioptimal stabilization system in which the controlled plant
is an aperiodic dynamic unit with an unknown inertia factor.
6.1.1. We shall consider control systems where the plant is described
by stochastic differential equations of the form

Here x is an n-dimensional phase vector, u is an r-dimensional control


vector, O(x) is an n-dimensional vector of known functions, [(t) is an n-
dimensional vector of random functions of the white noise type (1.1.34),
and A, B , u are constant matrices of the corresponding dimensions.
Here B and u are known matrices (det u # O), and some (or all) elements
of the matrix A are a priori unknown. The functions $1 (x), . . .,On (x)are
arbitrary. The only assumption is that, a t least in the weak sense [131],
Eqs. (6.1.1) has a unique solution x(t) = xu (t), t 2 to, for a given x(to) = x
and any admissible control u.
In the following, it is convenient to denote the unknown parameters of
the matrix A by the special letter a. Numbering all unknown parameters
in an arbitrary way and writing them as a column a = ( a l , .. . , ak),we can
rewrite Eq. (6.1.1) as

where A* is obtained from the matrix A by substituting zeros instead of all


unknown elements and the n x k matrix Q(x) (that consists of the functions
Oi (x) and zeros) is uniquely determined by the vector a from the condition
+
AO(x) = A*O(x) Q(x)a. The goal of control is to minimize with respect
to u the mean value of the functional
Applications of Asymptotic Synthesis Methods 313

where c(x) and $(x) are some nonnegative bounded continuous functions,
and H is a positive definite constant r x r matrix. We do not impose any
restrictions on the admissible values of the control vector u and assume
that the state vector x can exactly be measured a t any time t E [O, TI.
Thus, we can seek the optimal control u,that minimizes the mathematical
expectation (6.1.3) in the form of the functional

where xh = {x(T) : 0 5 T 5 t ) is an observed realization of the state vector


from the initial instant of time to the current time t.
6.1.2. The approximate synthesis algorithm. We assume that
the difference between the unknown parameters a and the a priori known
vector a 0 is small. To obtain a rigorous mathematical statement, we assume
that a is a random vector subject to an a priori Gaussian distribution with
mean a0 and the covariance matrix Do = =Do (E is a small This
assumption and Eqs. (6.1.2) imply the following two facts that we need in
the sequel.
1. The a posteriori probability density p ( a I zk) = p t ( a ) calculated from
observations of the process x (t)' is a Gaussian (conditionally Gaussian)
density completely described by the vector m = m(t) = mt of a posteriori
mean values and the matrix D = D(t) = Dt of a posteriori covariances. The
latter are described by the following differential equations (see [132, 1751):
d0m = D Q (~x ( t ) ) ~ - l [ d o x ( t )- ( A ( m ) ~ ( x+) BU) d t ] ,
(6.1.5)

Throughout this section, N-I is the inverse of the matrix N = auT, Nl =


QTN-'Q, and the matrix x ( m ) is obtained from A in (6.1.1) in which all
unknown parameters a are replaced by their a posteriori means m.2 We
also note that system (6.1.5) contains stochastic differential Ito equations
and the differential equations in system (6.1.6) are understood in the usual
sense.
2. The elements of the matrix Dt are small variables ( N E) for all t > 0.
Indeed, by integrating the matrix equation (6.1.6), we obtain the following
explicit formula for the covariance matrix Dt in quadratures:

'It follows from (6.1.2) and (6.1.4) that x ( t ) is a diffusion type process.
2As is known 138,39, 1671, the a posteriori means m = mt are optimal estimates of
cu with respect to the minimum mean square error criterion.
314 Chapter VI

(E is the k x k identity matrix). Denoting the columns (with the same


numbers) of the matrices Dt and Do by yt and yo, respectively, we obtain
from (6.1.7) the relations

Since the constant matrices Do and N-' are positive definite, the matrix
R(s) is nonnegative definite; R(s) is degenerate if and only if all elements
of a t least one column of the matrix Q are zero.
>
Let X(s) 0 be the minimum eigenvalue of the matrix R(s). On multi-
plying (6.1.8) by yt in the scalar way, we obtain

(here llytll is the Euclidean norm of the vector yt). Replacing the quadratic
form in (6.1.9) by its lower bound and estimating the inner product (yo, yt)
with the help of the Cauchy-Schwarz-Bunyakovskii inequality, we arrive a t
the inequality

- - -
Since llyoll E , it follows from (6.1.10) that llyt 11 E . Thus we have Dt E
for all t E [0, TI.
We shall solve the problem of optimal control synthesis by the dynamic
programming approach. To this end, we first note that the a posten'ori
probability density p t ( a ) (or the current values of its parameters mt and
Dt) together with the current values of the phase vector xt form the suffi-
cient coordinates (see $1.5) for the problem in question. Therefore, these
parameters and time t are arguments of the loss function given, as usual,
by the formula

~ ( t , x , m , D ) = min
U(J)ER,
t<s<T

The expression in the square brackets in (6.1.5) is the differential of


the Wiener process (the innovation process [132]) with the matrix N of
Applications of Asymptotic Synthesis Methods 315

diffusion coefficients. Therefore, it follows from (6.1.2), (6.1.5), and (6.1.6)


that the variables (xt, mt, Dt) form a diffusion Markov process (degenerate
with respect to D). By applying the standard derivation procedure (see
31.4, as well as [97]), we obtain the following differential Bellman equation
for the function F = F ( t , x, m, D):

-Ft = oT (%)A (m)Fx + min [uTBTFX uTHu]


-T
uER,
+ + -21 S p ( N F x x ~ )

Here Ft = dF/dt, Fx is a vector-column with components . . ., E,


g,

dF
are matrices of partial derivatives,

and Sp(.) is the trace of the matrix (.).


Since the covariance matrix D is of the order of E, it is now expedient
to pass to new variables D according to the formula D = ED. Performing
this substitution and minimizing the expression in the square brackets, we
transform Eq. (6.1.12) to the form

In this case, the vector


316 Chapter VI

a t which the function in the square brackets in (6.1.12) attains its mini-
mum, determines the optimal control law, which becomes a known func-
tion u, = u, ( t ,x, m, D) of the sufficient coordinates, after the loss function
F = F (t, x, m, D) is calculated from Eq. (6.1.13).
Now let us discuss whether Eqs. (6.1.13) can be solved. Obviously, in the
more or less general case, it is hardly possible to obtain an exact solution.
Moreover, one cannot construct the exact solution of Eq. (6.1.13) even
in the special case where 8(x) is a linear function and c(x) and +(x) are
quadratic functions of x, that is, in the case in which the synthesis problem
with known parameters in system (6.1.1) can be solved exactly. The crucial
difficulty in this case is related to the bilinear form (in the variables x and
m) appearing in the coefficients of the first-order derivatives F,. On the
other hand, a high accuracy of estimating the unknown parameters a, due
to which a small parameter E appeared in the three last terms in (6.1.13),
results in a rather natural assumption that the difference between the exact
solution of (6.1.13) and the solution of (6.1.13) with E = 0 is small. (In
other words, the difference between the solution of the synthesis problem
with unknown parameters a and the similar solution with given a = a 0 is
small.)
The above considerations allow us to believe that an efficient approx-
imate solution of Eq. (6.1.13) (that is, of the synthesis problem) can be
obtained by means of the regular asymptotic method based on the expan-
sion of the desired loss function F in powers of the small parameter E:

Substituting (6.1.15) into (6.1.13) and grouping terms of the same order
with respect to E, we obtain the following equations for successive approx-
imations:
1
-F: = dT ( x ) z T ( m ) F -
~ -(F;)~BIF
4
: 21 S p ( ~ ~ 2 z , )c(x),
+ +
O<t<T, F0(~,x,m,D)=+(x); (6.1.16)
Applications of Asymptotic Synthesis Methods 317

The zero-approximation equation (6.1.16) is n ~ n l i n e a r while


,~ the suc-
cessive approximations can be found by solving the linear equations (6.1.17)
and (6.1.18), which usually is a simpler computational problem. Thus, the
described scheme for solving Eq. (6.1.13) approximately is useful only if
Eq. (6.1.16), that is, the Bellman equation for the problem with completely
known parameters a , can be solved exactly. As was already pointed out,
the last condition is satisfied if Oi (x) are linear functions and c(x) and $(x)
are quadratic functions of the phase variables x. In this case, all successive
approximations can also be calculated in the form of quadratures (see $3.1
in [34]).
The solutions of Eqs. (6.1.16)-(6.1.18) of successive approximations
can be used for obtaining approximate solution of the synthesis problem.
Namely, the quasioptimal control us(t, x, m, D), corresponding to the sth
approximation, is determined by formula (6.1.14) after the function F in
(6.1.14) is replaced by the approximate expression FS= F0 &F1 - - . + + +
cSFS.
6.1.3. Estimates of the quality of approximate synthesis. We as-
sume that the quasioptimal control us (t, x, m, D) has already been obtained
in the sth approximation. By

GS(t, x,m, D) = E [c(x(r)) + U


: (T)HU,(T)] d r + $(x(T))
we denote the mean value (calculated from the time instant t ) of the opti-
mality criterion (6.1.3) for the control us.4 The deviation As = GS - F of
the function (6.1.19) from the exact solution F (t, x, m, D) of the Bellman
equation (6.1.13) is a natural estimate of the quality of the approximate
control u S ( tx,
, m, D). In what follows, we calculate the order of A" in the
first two approximations, that is, we estimate AOand A'.
Just as in 53.4, we calculate the desired estimates AS ( s = 0 , l ) in two
steps. First, we estimate the differences 6' = F - FS,then, y" Fs - Gs,
which immediately implies the estimates for AS (in view of the triangle
inequality).
Estimation of the differences 6' and S 1 . Let O(x), c(x), and $(z)
be bounded continuous functions for all x E R n . Then it follows from The-
orem 2.8 (for the Cauchy problem) in [I241 that the quasilinear equations
3The partial differential equations (6.1.13) and (6.1.16)of parabolic type are linear
with respect to the higher-order derivatives of the loss function. That is why, equations
of the form (6.1.13) and (6.1.16)are sometimes called weakly nonlinear (quasilinear or
semi-linear), see [61, 1241.
41n (6.1.19), u , ( r ) = u , ( r , x U e ( r ) , m U ~ (DUE
r ) , ( T ) ) , where z U a ( T ) , m u . ( T ) , and
DUs( 7 )satisfy Eqs. (6.1.2), (6.1.5),and (6.1.6) with u = u , ( r ) for T > t and the initial
conditions xUs( t ) = x , m U a( t ) = m, and Dun ( t ) = D.
318 Chapter VI

(6.1.13) and (6.1.16) have a t most one solution in the class of functions
that are continuous in the strip IIT = (1x1 < m ; ID1 < m ; Iml < m ; 0 5 t <
T}, continuously differentiable once in t , and twice in other variables for
0 < t < T, and possess bounded first- and second-order derivatives with
respect to x, m, D in IIT. Furthermore, Theorem 2.5 (for quasilinear equa-
tions) in 11241 implies the following estimate for the solution of the Cauchy
problem (6.1.13):

(here C1, C2 2 0 are some constants; it is assumed that the function c may
depend not only on x as in (6.1.13) but on the other variables t , m , D).
The above arguments also hold for linear equations (6.1.17) and (6.1.18) of
successive approximations.
By introducing a quasilinear operator L, we rewrite Eq. (6.1.13) in the
form
LF=-c(x), O<t<T; F(T,x,m,D)=$(x).

Then, for 6' = F - F O , we obtain from (6.1.13) and (6.1.16) a quasilinear


equation of the form

(with regard to the fact that the solution F0 of the zero-approximation


equation (6.1.16) is independent of Dl and therefore, Fi = 0). The vector
of partial derivatives Fi is a bounded continuous function in view of the
above-mentioned properties of the solution to Eq. (6.1.16). Hence, (6.1.21)
is a n equation of the form (6.1.13). To use the estimate (6.1.20), we need
to verify whether the right-hand side of (6.1.21) is bounded.
The elements of the matrices Q and Nl are bounded, since the functions
O(x) are bounded and the matrix N is bounded and nondegenerate. More-
over, it follows from the inequality (6.1.10) that the norm of the matrix D
can only decrease with time t. Therefore, the matrix D is bounded for all
t E [0, T ] if the matrix Do of the initial (a priori) covariances is bounded,
which was assumed in advance.
It remains to estimate the matrices F : and ~ ~F:, of partial deriva-
tives. To this end, we turn to the zero-approximation equation (6.1.16). By
writing v i = a F O / d m i (here mi is an arbitrary component of the vector m )
and differentiating (6.1.16) with respect to the parameter mi, we obtain
Applications of Asymptotic Synthesis Methods

the linear equation for vi:

0 5 t < T; v' (T, x, m, D) = 0. (6.1.22)

Equation (6.1.22) is written for the case where the unknown parameter ai
stands on the r t h line and in the j t h column of the matrix A in the initial
system (6.1.1); here Bj = Bj(x) is the j t h component of the vector-function
B(x). Since is bounded, the solution v h f Eq. (6.1.22) and its partial
derivatives vk and vLXT,as was already noted, are also bounded. Finally,
since vL = FL; is bounded and the number i is arbitrary, the matrix F:mT
in the first term on the right in (6.1.21) is also bounded. In a similar way,
we verify the boundedness of F.,:
Thus, it follows from (6.1.21), (6.1.20) that So satisfies the estimate

where C is a positive constant.


-1
In a similar way, we can estimate S1 = F - F = F - F0 - aF1. From
(6.1.13), (6.1.16), and (6.1.17), it follows that S1 satisfies the equation

The boundedness of F:, FA, F;,~, and FA, can be verified by analogy
with the case where we estimated So. Therefore, (6.1.24) and the inequality
(6.1.20) imply
IS1[ 5 c a 2 . (6.1.25)
Estimation of the differences -yo and -yl. For the functions GS =
GS(t,z, m, D), s = 0,1,2,. . ., determined by (6.1.19), we have the linear
partial differential equations 1451

+ C(X)+ a SP(DQ~(X)G:,,) + €2
SP(DNI(X)DGR,T)
- a Sp(DNl(x)DGa, 05t < T,
GS(T,x, m, D) = y5 (2). (6.1.26)
320 Chapter VI

The quasioptimal controls

contained in (6.1.26) are bounded continuous functions. Therefore, in view


of [66], the functions GS satisfying (6.1.26) are also bounded and twice
continuously differentiable, just as the functions F and FSdiscussed above.
By using the expressions uo = - H - ~ B ~ F ; /and
~ u 1 = -H-'B~(F; +
~F,1)/2for quasioptimal controls, as well as equations (6.1.26), (6.1.16), and
(6.1.17), we can readily obtain the following equations for the differences
y O = F O - G O a n d y l =FO+&F1-G1:

where Lo and L1 are the linear differential operators

Since the expressions in the square brackets in (6.1.27) and (6.1.28)


are bounded, the inequalities (6.1.20) for the solutions yO(t,x , m , D) and
yl(t, x, m, D) of Eqs. (6.1.27) and (6.1.28) yield the estimates

Finally, from (6.1.29), (6.1.23), and (6.1.25) with regard to the inequality
/A"5 ]Iss + [ys[,we have
Applications of Asymptotic Synthesis Methods 321

The estimates (6.1.30) show that the use of the quasioptimal control uo or

in the functional (6.1.3) by -


u l instead of the optimal control (6.1.14) results in a deviation (an increase)
E in the zero approximation and by -- E~ in
the first approximation. Thus, it follows from (6.1.30) that the method
of approximate synthesis of optimal control considered in Section 6.1.2 is
asymptotically efficient.

6.1.4. An example. Let us consider the simplest case of system (6.1.1)


in which the plant is an aperiodic first-order unit with an unknown inertia
factor. In this case, Eq. (6.1.2) is a scalar equation

where a is an unknown parameter, b and v > 0 are given numbers, and


[(t) is a scalar white noise of intensity 1. We define the optimality criterion
(6.1.3) as

where g and h > 0 are given constants. The optimal filtration equations
(6.1.5), (6.1.6) and the Bellman equation (6.1.13) for problem (6.1.31),
(6.1.32) are

-
D
dom = - -z(t) [doz(t)
V
+ (mx(t) - bu) dt], (6.1.33)
- D~
D = --x2(t), (6.1.34)
V

(2, m, and D are scalar variables in (6.1.33)-(6.1.35)).


The zero approximation (6.1.6) for Eq. (6.1.35) has the form
322 Chapter VI

The exact solution of Eq. (6.1.36) is5

, m) = f O ( t ,m)x2
~ ' ( t x, + rO(t,m), (6.1.37)

a= Jm2+
b29
-
h '
o g v ( T - t ) - -1n
vh 2P
T (t, m) = + m + (P - m)e-2P(T-t)
p +m b2 ,B '

It follows from (6.1.14) and (6.1.37) that the quasioptimal control in the
zero approximation has the form

where fo(t, m ) is determined by (6.1.37).


To obtain the quasioptimal control in the first approximation, we need
to calculate the second term in the asymptotic expansion (6.1.15). In our
case, Eq. (6.1.17) for the function F1 = F1(t, x, m, D) has the form

-Ftl = - m x ~ ; b2
h
-
0
-f (t, m)xF,
1
+ -F,,
v 1
2
- DXF;,

O<t<T, ~l(T,x,m,D)=o. (6.1.39)

Since, in view of (6.1.37), we have F,: = 2fk(t,m)x, we obtain the fol-


lowing expression for the desired function F1(t, x, m, D):

(here fg(T- s, m) denotes the partial derivative 8f o ( s , m)/8m of the func-


tion f O ( s ,m) in (6.1.37) with respect to the parameter m).

5Note that the loss function in the zero approximation is independent of the estimate
variance D, i.e., Fa = F0 ( t ,z , m ) .
Applications of Asymptotic Synthesis Methods 323

It follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasiop-
timal control synthesis in the first approximation is given by the formula

Comparing (6.1.38) and (6.1.41), we note that the optimal regulators in the
zero and first approximations are linear in the phase variable x. However,
if higher-order approximations are used, then we obtain nonlinear "laws of
control."
For example, in the second approximation, we obtain from (6.1.18) and
(6.1.35) the following equation for the function F2 = F 2 ( t ,x, m, D):

Obviously, its solution has the form

( t ,D) = q(t,m, D)x4


~ ~ x,m, + f 2 ( t ,m, D ) X ~+ r2(t,m, D),
and therefore, it follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that
the quasioptimal control in the second approximation

is a linearly cubic function of x.


Figures 47 and 48 show block designs for quasioptimal feedback control
systems, which correspond to the first (Fig. 47) and the second (Fig. 48)
approximations. By Wi ( i = 0,1,2,3) we denote linear (in x) amplifiers
with varying amplification coefficients

W3 = -E 2 2b
-q(t,m, D).
h
324 Chapter VI

The plant P is described by Eq. (6.1.31). The unit S C of optimal filtration


forms the current values of the sufficient coordinates m = m(t) = mt and
D = D ( t ) = Dt. It should be noted that the coordinate mt is formed in S C
with the aid of the equation

which differs from Eq. (6.1.33). The reason is that only stochastic equations
understood in the symmetrized sense 11741 are subject to straightforward
simulation. Therefore, the symmetrized equation (6.1.42) is chosen so that
its solution coincides with the solution of the Ito equation (6.1.33).
6.1.5. Some results of numerical experiments. The estimates
(6.1.30) establish only the asymptotic optimality of the quasioptimal con-
trols uo and u1. Roughly speaking, the estimates (6.1.30) only mean that
the less the parameter E (i.e., the less the a priori indeterminacy of the
components of the vector a ) ,the more grounds we have for using the qua-
sioptimal controls uo and u1 (calculated according to the algorithm given in
Section 6.1.2) instead of the optimal (unknown) control (6.1.4) that solves
problem (6.1.1)-(6.1.3).
On the other hand, in practice we always deal with problems (6.1.1)-
(6.1.3) in which all parameters (including E) have definite finite values. As a
rule, in advance, it is difficult to determine whether a given specific value of
the parameter E is sufficiently small so that the above approximate synthesis
procedure can be used effectively. Some ideas about the situations arising
Applications of Asymptotic Synthesis Methods

for various relations between the parameters of problem (6.1.1)-(6.1.3) are


given by the results of numerical experiments performed to analyze the effi-
ciency of the quasioptimal algorithms (6.1.38) and (6.1.41) (see the example
considered in Section 6.1.4).
As was already noted, it is natural to estimate the quality of the qua-
sioptimal controls us (s = 0,1,2,. . .) by the differences AS = GS - F,
where the functions GS = GS(t, x, m, D), given by (6.1.19), satisfy the lin-
ear parabolic type equations (6.1.26) and the loss function F = F ( t , x, m, D)
satisfies the Bellman equation (6.1.13). In the example considered in Sec-
tion 6.1.4, the Bellman equation has the form (6.1.35), and the functions
GS (s = 0,1,2, . . .) satisfy the equations

Equations (6.1.35) and (6.1.43) were solved numerically (Eq. (6.1.43)


was solved for s = 0 and s = 1 with the quasioptimal controls (6.1.38) and
(6.1.41) taken as us, s = 0 , l ) .
Here we do not describe finite-difference schemes for constructing nu-
merical solutions of Eqs. (6.1.35) and (6.1.43)6 but only present the results

'Numerical methods for solving equations of the form (6.1.35) and (6.1.43) are dis-
cussed in Chapter VII.
Chapter VI

of the corresponding calculations performed for different values of the pa-


rameters of problem (6.1.31), (6.1.32).
In Fig. 49 the plots of the loss function F (solid curves) and the func-
tion Go (dashed curves) are given for three values of the a posteriori
variance D = ED in the case where m = 1, p = T - t = 3, and prob-
lem (6.1.31), (6.1.32) has the parameters g = h = b = u = 1. (Since
the functions F and Go are even with respect to the variable x, that is,
F ( t , x , m , D) = F ( t , -x,m, D) and Go(t, x , m ) = Go(t, -x,m), Fig. 49
shows the plots of F and Go only for x > 0.) Since the corresponding
curves for F and Go are close to each other, we can state that, in this case,
the quasioptimal zero-approximation control (6.1.38) ensures the control
quality close to that of the optimal control. However, this situation is not
universal, which is illustrated by the numerical results shown in Fig. 50.
Figure 50 shows the plots of the functions F (solid curves), Go (dot-
and-dash curves), and G1 (dashed curves) for the "reverse" time p =
T - t = 2.5 and the parameters g = h = 1, b = O , l , and u = 5 of
problem (6.31), (6.1.32). One can see that the use of the quasioptimal
zero-approximation control uo(t, x, m ) leads to a considerable increase in
the value of the functional (6.1.19) compared with the possible minimum
(optimal) value F ( t , x , m, D). Therefore, in this case, to ensure a quali-
tative control of system (6.1.31), we need to use quasioptimal controls in
higher-order approximations. In particular, it follows from Fig. 50 that, in
this case, the quasioptimal first-approximation control ul(t, x, m, D) deter-
mined by (6.1.37), (6.1.40) and (6.1.41) provides the control quality close
Applications of Asymptotic Synthesis Methods 327

to the optimal.
Thus, the results of numerical solution of Eqs. (6.1.35) and (6.1.43) con-
firm that the quasioptimal control algorithm (6.1.41) is "highly qualita-
tive." We point out that this result was obtained in spite of the fact that
the a posteriori variance D, which plays the role of a small parameter in
the asymptotic synthesis method considered here, is of the same order of
magnitude as the other parameters (g, h, b, v ) of problem (6.1.31), (6.1.32).
This fact allows us to believe that the asymptotic synthesis method (see
Section 6.1.2) can be used successfully for solving various practical problems
of the form (6.1.1)-(6.1.3) with finite values of the parameter E .
In conclusion, we make some methodological remarks. First, we recall
that in the title of this section the problems of optimal control with un-
known parameters of the form (6.1.1)-(6.1.3) are called "adaptive." It is
well known that problems of adaptive control are very important in the
modern control theory, and a t present there are numerous publications in
this field (e.g., see 16-9, 1901 and the references therein). Thus, it is of in-
terest to compare the results obtained in this section with other approaches
to similar problems.
The following heuristic idea is very often used for constructing adap-
tive algorithms of control. For example, suppose that for the feedback
control system shown in Fig. 13 it is required to construct a controller C
that provides some desired (not necessarily optimal) behavior of the sys-
tem in the case where some parameters cr of the plant P are not known
in advance. Suppose also that for some given parameters a , the required
328 Chapter VI

behavior of system in Fig. 13 is ensured by the well-known control algo-


rithm u = cp(t, x, a ) . In this case the proposed heuristic adaptive algorithm
consists in two steps: (1) to include a n optimal filter into the controller C
so that this filter will produce optimal estimates Gt = S(x6) of the vector
of unknown parameters of the plant by means of the observation of the
output process x i = {x(T): 0 5 T 5 t}; (2) to define the adaptive control
by the formula u, = ~ ( x, t ,Zt). Needless to say, a n additional analysis is
required to answer the question of whether such control ensures the desired
behavior of the system. The corresponding analysis [6-9, 1901 shows that
this method for constructing adaptive control is quite acceptable in many
specific problems.
Now let us discuss the results of this section. Note that the above-
mentioned heuristic idea is exactly realized if system (6.1.2) is controlled
by the quasioptimal zero-approximation control uo(t, x, m). To verify this
fact, we return to the example considered in Section 6.1.4. The algorithm
of the optimal control for problem (6.1.31), (6.1.32) with a known param-
eter a = -a is given by formulas (2.1.14) and (2.1.16) in $2.1. Compar-
ing (2.1.14), (2.1.16) with (6.1.37), (6.1.38), we see that the quasioptimal
zero-approximation algorithm (6.1.37), (6.1.38) can be obtained from the
optimal algorithm (2.1.14), (2.1.16) by replacing the parameter -a by its
optimal estimate mt established by means of the filter equations (6.1.33)
and (6.1.34).
On the other hand, a numerical analysis of the quasioptimal algorithms
uo(t, x, m ) and u l ( t , x, m, D) (see Figs. 49 and 50) shows that the algorithm
u l is preferable in contrast with the "heuristic" algorithm uo in the zero
approximation. This result proves that the regular asymptotic method con-
sidered in this section is effective for solving adaptive problems of optimal
control.

$6.2. Some stochastic control problems


with constrained phase coordinates
As was pointed out in $1.1,in the process of constructing actual control
systems, one often needs to take into account constraints of various types
imposed on the set of possible values of the phase coordinates. These con-
straints arise due to exploiting some specific systems, additional require-
ments on the transient processes, allowance to the fact that the time of
control switching is finite, and to other courses. In these cases, a region of
admissible values is specified in the phase space, so that the representative
point of the controlled system must not leave this region. In this case, the
equations of the system dynamics that determine the phase trajectories in
the interior of this region can be violated a t the boundary of this region.
Additional constraints that are imposed on the phase trajectories on the
Applications of Asymptotic Synthesis Methods

boundary depend on the type of the problem.


In what follows, we consider two one-dimensional and one two-dimensio-
nal problems of optimal control synthesis (the problem dimension is deter-
mined by the number of phase variables on which, in addition to time t ,
the loss function depends). In the one-dimensional problems, the controlled
variable z(t) is interpreted as the difference (error signal) between the cur-
rent values of the random command input y(t) and the controlled variable
x(t) in the servomechanism studied in $2.2. However, in contrast with $2.2
where any value of the error signal z(t) was admissible, in the present sec-
tion it is assumed that the region of admissible values of z is an interval
[11,12]. At the endpoints of this interval, we have either reflecting or ab-
sorbing screens [157, 1601. In the first case, if the representative point z(t)
comes to ll or 12,then it is instantaneously reflected into the interior of
the interval; in the second case, on the contrary, the representative point
"sticks" to the boundary and remains there forever. In practice, we have
the first problem if the error signal values lying outside the admissible inter-
val [11,12] are prohibited, and we have the second problem if the tracking
is interrupted a t the endpoints (just as in radio systems of phase small
adjustment [143, 1801).
In the two-dimensional problem we consider the optimal control of a
diffusion process in the interior of the disk of radius ro centered at the
origin on the phase plane (x, y). The circle bounding this disk is a regular
boundary [I241 reflecting the phase trajectories along the inward normal.

6.2.1. One-dimensional problems. Reflecting screens. Let us


consider, just as in 32.2, the synthesis problem of optimal tracking a wan-
dering coordinate in the case where the servomotor with bounded speed is
used as an executive mechanism. By analogy with $2.2, we assume that the
command input y(t) is a continuous Markov diffusion process with known
drift a and diffusion B coefficients ( a ,B = const, B > 0). By using a ser-
<
vomotor with bounded speed (x = u, lul um, um > la[),it is required to
< <
"follow" the command signal y(t) on the time interval 0 t T so that to
minimize the mathematical expectation (mean value) of the integral per-
formance criterion

where z(t) = y(t) - x(t) is the error signal, c(z) is a nonnegative penalty
function attaining its minimum a t the unique point z = 0, and c(0) = 0.
In this case, as shown in $2.2, solving the synthesis problem (in the case of
unbounded phase coordinates) is equivalent to solving the Bellman equation
330 Chapter VI

(see (2.2.4))

with the loss function

F ( t , z) = min
IU(J)IIU~
tls<T

satisfying the following natural condition for t = T:

F (T,z) = 0. (6.2.3)

According to 51.4, the Bellman equation is defined only by local charac-


teristics of the controlled process z(t). Therefore, for problems with con-
straints on the error signal, Eq. (6.2.1) also remains valid a t all interior
points el < z < 12. Indeed, since the stochastic process z(t) is continuous,
its realizations issued from an interior point z with large probability (almost
surely) move to a small distance during a small time At and cannot reach
the endpoints el and 12. Therefore, in a sufficiently small neighborhood of
any interior point z , a controlled stochastic process behaves in the same
way as if there were no reflecting screens. Hence, the differential equation
(6.2.1) is valid a t these points.
At the points el and e2, Eq. (6.2.1) is not valid, and additional conditions
on the function F a t these points are determined by the character of the
process z(t) near these points. For example, in the case of reflecting screens
considered here, we have the conditions [I571

The conditions (6.2.4) can be explained intuitively by modeling the


diffusion process z(t) approximately as a discrete random walk [I601 in
which with some definite probabilities the representative point goes from
the point z to neighboring points z f Az, Az = m, during the time
At. Then if a t any time instant t the point z comes to the boundary, say,
+
z = el, then with probability 1 the process z attains the value el Az at
+
time t At, and therefore, we can write the following relation for the loss
function (6.2.2):
Applications of Asymptotic Synthesis Methods 331

By expanding the second term in the Taylor series around the point (t, el),
we obtain

whence, passing to the limit as At + 0, we arrive a t (6.2.4).


Thus, to synthesize a n optimal servomechanism with the variable z sub-
ject to constraints in the form of reflecting screens a t the points z = el and
z = 12, we need to solve Eq. (6.2.1) with additional conditions (6.2.3) and
(6.2.4) on the function F ( t , z ) (el 5 z 5 l2, 0 < t <
- T). In this case, the
synthesis problem is solved according to the scheme studied in 52.2 for a
similar problem without constraints on z. Therefore, here we only briefly
recall this scheme paying the main attention to distinctions arising in the
calculational formulas due to constraints on the phase variable.
Obviously, the expression in the square brackets in (6.2.1) is minimized
by a n optimal control of the form
dF
U* (t, z) = urnsign -(t , z) . (6.2.5)
dz
Substituting (6.2.5) into (6.2.1) and omitting the symbol min, we obtain

If we pass to the reverse time T = T - t , then the boundary value problem


we need to solve acquires the form

By taking into account the properties of the penalty function c(z), we


see that the loss function F(T, z) satisfying the boundary value problem
(6.2.7)-(6.2.9) for all T (0 < T <
T) has a single minimum (with respect
to z) on the interval el < z < e2. Therefore, the optimal control (6.2.5)
can be written as (see (2.2.8))

U* (7, z) = urnsign (z - z, (7)), (6.2.10)


332 Chapter VI

where z, (7) is the minimum point (with respect to z) of the function F(T, Z)
and simultaneously the switch point of the controlling action. This point
can be found from the condition

Thus, to synthesize a n optimal system, we need to solve the boundary value


problem (6.2.7)-(6.2.9) and to use the condition (6.2.11).
Problem (6.2.7)-(6.2.9) can be solved exactly if, just as in $2.2, we con-
sider the stationary operating conditions corresponding to large values of 7.
In this case, instead of the function F(T, z), we can consider the stationary
loss function f (z) given by the relation
f (z) = lim [ F(r,z) - y r ]
T+W

(just as in (1.4.29), (2.2.9), (4.1.7), and (5.3.17), the number y characterizes


the mean losses per unit time in the stationary tracking mode). Therefore,
for large T (more precisely, as r + m), the partial differential equation
(6.2.7) is replaced by the following ordinary differential equation for the
function f (2):

with the boundary conditions

In this case, the coordinate of the switch point given by (6.2.11), where F
is replaced by f , attains a constant value z, (that is, we have a stationary
switch point).
The boundary value problem (6.2.12), (6.2.13) can readily be solved by
the matching method. By analogy with $2.2, let us consider Eq. (6.2.12) on
different sides of the switch point z,. Then the nonlinear equation (6.2.12)
is replaced by the pair of linear equations

Solving Eqs. (6.2.14), (6.2.15) with boundary conditions (6.2.13), we arrive


Applications of Asymptotic Synthesis Methods 333

By using (6.2.11), we obtain the two equations

for two unknown parameters y and z,. Substituting (6.2.16) into (6.2.17)
and eliminating the parameter y from the system obtained, we see that the
stationary switch point z, satisfies the transcendental equation

For the quadratic penalty function c(z) = z2, Eq. (6.2.18) acquires the form

If el -+ -aand l 2 + +a(that is, reflecting screens are absent), then


Eqs. (6.2.19) imply the following explicit formula for the switch point z,:

this formula was obtained in 52.2 (see (2.2.16)). In the other special case
e2 = -el and A 1 = -Az (the last equality is possible only if a = O),
Eq. (6.2.19) has a single trivial root z, = 0, that is, the optimal control
(6.2.10) coincides in sign with the error signal z.
6.2.2. Absorbing screens. Let us see how the tracking system studied
in the preceding section operates with absorbing screens. Obviously, in this
case, the loss function (6.2.2) must also satisfy Eq. (6.2.7) in the interior of
the interval [el, 12] and the zero initial condition (6.2.9). At the boundary
points, instead of (6.2.8), we have

The conditions (6.2.20) follow from formula (6.2.2) and the fact that the
trajectories z(t) stick to the boundary. Indeed, by using, as above, the
discrete random walk model for z(t), we can rewrite (6.2.2) as
334 Chapter VI

and hence, since t and A are arbitrary, we obtain

Just as in the preceding section, the exact solution of synthesis problem


with absorbing screens can be obtained only in the stationary case (as
.r + co). Suppose that the stationary operating mode exists and that zo is
the corresponding stationary switch point. Then for large T, the nonlinear
equation (6.2.7) can be replaced by two linear equations

For z = zo, z = el, and 2- = e2, the functions Fl and F 2 satisfy (6.2.11) and
(6.2.20).
In accordance with [26], for large T, we seek the solutions of the linear
equations (6.2.21) and (6.2.22) in the form

Using (6.2.23), we obtain from (6.2.21), (6.2.11), and (6.2.20) the following
system of ordinary differential equations for the functions $l(z) and fl(z):

d$1 df 1
-(zo)
dz = -(zo)
dz = 0, $l(el) = ~ ( e , ) , fl(el) = o.
(6.2.24)
From (6.2.24) we obtain

In a similar way, for the functions $2 and f2 we have

(here X1 and A2 are given by (6.2.16)).


It follows from (6.2.23), (6.2.25), and (6.2.26) that Eq. (6.2.7) has a
continuous solution only if
Applications of Asymptotic Synthesis Methods 335

The same continuity condition allows us also to obtain the following


equation for the switch point zo (provided that (6.2.27) is satisfied):

Just as in the case of reflecting screens, Eq. (6.2.28) can be specified by


various expressions for the penalty function c(z).
REMARK 6.2.1. If the condition (6.2.27) is violated, then it makes no
sense to study the stationary operating mode in the problem with absorbing
boundaries, since in this case the synthesis problem has only a trivial solu-
tion. In fact, we can readily see that for c(ll) > c(12) we always need to set
u = -urn (correspondingly, for c(C1) < c(12) we need to set u = +urn). This
character of control depends on the fact that, in view of its regularity, the
diffusion process z ( t ) sticks to that or other boundary with probability 1 (as
t + co). Therefore, it is clear that this algorithm for controlling the process
z(t) maximizes the probability of the event that the process sticks to the
boundary with the least possible value of the penalty function c(z).
In the general case c(el) # c(lz), we need to solve the nonstationary
boundary value problem (6.2.7), (6.2.20), and (6.2.9). Since this problem
cannot be solved exactly, it is necessary to use approximate synthesis meth-
ods. In particular, we can use the method of successive approximations
considered in Chapter I11 for problems with unbounded phase coordinates.
According to Chapter 111, the approximate solutions F ( ~ ) ( Tz), of Eq. (6.2.7)
can be found by recurrently solving the sequence of linear equations

, k = 0 , 1 , 2 , . .., in (6.2.29) satisfy (6.2.9) and (6.2.20)). Af-


(all F ( ~ ) ( Tz),
ter F ( ~ ) ( Tz)
, are calculated, the synthesis of a suboptimal system is es-
tablished by (6.2.10) and (6.2.11) with F replaced by F ( ~ ) Just ). as in
Chapter 111, we can prove that the function sequence F ( ~ (T,)z) asymptot-
ically as k -+ co converges to the exact solution F(T,Z) of the boundary
value problem (6.2.7), (6.2.9), (6.2.20), and the corresponding suboptimal
systems to the optimal system (the last convergence is estimated by the
quality functional).
6.2.3. The two-dimensional problem. Suppose that the motion
of a controlled system is similar to the dynamics of a Brownian particle
336 Chapter VI

randomly walking on the plane (x, y) so that along one of the axes, say,
along the x-axis, this motion is controlled by variations of the drift velocity
within a given region, while along the y-axis we have a purely diffusion
noncontrolled wandering. In this case, the equations describing the system
motion have the form

=u+rn&(t), j,=&&(t), - (urn - a ) 5u < urn + a ,


(6.2.30)
where J l ( t ) and t 2 ( t ) are independent stochastic processes of the white
noise type with intensity 1 and, by analogy with one-dimensional problems,
-(urn - a ) < 0 and (u, + a ) > 0 indicate the boundary values of the
nonsymmetric region of admissible controls u.
We assume that the representative point (x(t), y(t)) must not go away
from the origin on the plane (x, y) to distances larger than TO. To this end,
we assume that the phase trajectories reflect from the circle of radius ro
along the inward normal to this boundary. Under this assumption, it is
required to find a control law that minimizes the mean value of the quadratic
optimality criterion

One can readily see that the Bellman equation related to this problem,
written in the reverse time r = T - t , has the form (F,, Fx,Fyindicate the
~ a r t i a derivatives
l with respect to 7,x, y):

B(Fxx + Fyy)+ min


-(urn-a)<u<(um+a)
[uFx] = F, - x2 - Y2. (6.2.32)

In addition to Eq. (6.2.32) for the function F ( T ,x, y) such that 0 < r 5 T
and < T O , the loss function F ( r , x, y) must satisfy the zero initial
condition
F(O,x, Y) = 0 (6.2.33)

and the boundary condition of the form [I571

where d l d n is the normal derivative on the circle of radius TO.


Applications of Asymptotic Synthesis Methods 337

In the polar coordinates (r, cp) defined by the formulas x = r cos cp, y =
r sin cp, the boundary value problem (6.2.32)-(6.2.34) acquires the form

+ -(urn-a)<u<(um+a)
min

It follows from (6.2.35) that, just as in the one-dimensional case, the


optimal control is of relay type:

but now, instead of the switch point, we have a switching line on the plane
(x, y). This switching line is given by the equation (in the polar coordinates)

sin cp
COS pFT- -F,r = 0.

To obtain a n explicit formula for the switching line, we need to solve


Eq. (6.2.35) or (since this is impossible) equations of successive approxi-
mations obtained by analogy with Eqs. (6.2.29). Now we shall calculate
the loss functions and the corresponding switching lines for the first two
approximations of Eq. (6.2.35).
The zero approximation. Following the algorithm of successive ap-
proximations considered in Chapter I11 (see also (6.2.29)), we set the non-
linear term in the zero approximation of (6.2.35) equal to zero and thus
obtain
1
(6.2.40)

It follows from (6.2.40), (6.2.36), and (6.2.37) that the solution F(O) is
radially symmetric, F(')) = F(O)(r,r), and therefore, instead of (6.2.40),
(6.2.36), and (6.2.37), we have
338 Chapter VI

It is well known [I791 that the solution of Eq. (6.2.41) can be found by
separation of variables (by the Fourier method) as the series

Here l o ( % )is the Bessel function of zero order and is the m t h root of
the equation dIo(,u)/dp = 0.
It follows from the properties of zeros of the Bessel function [I791 that
the series (6.2.42) is rapidly convergent. Therefore, since we are interested
only in the qualitative character of suboptimal control laws, it suffices to
find only the first term of the series in the sum in (6.2.42).
Calculating cl and using the tables of Bessel functions [77], we obtain
the following approximate expression (0 = ~ / r ifor ) the function ~ ( ' 1 :
r r2 0
F(')(T, r) = AT- 0.0426210 ( h r ) (1 - exp [ - ~(&')'r]). (6.2.44)
2 0 ro
By differentiating (6.2.44) with respect to r and taking into account the
relations dIo(x)/dx = I l ( x ) and p: = 3.84, we find

r ) = 0.164~11
FJO)(r,
0 To
(gr)(1 - exp [ - ~ ( ~ : ) ~ r ] ) . (6.2.45)
Applications of Asymptotic Synthesis Methods 339

Since the first-order Bessel function I l ( p ~ r / r o )is positive for 0 < r < ro
(Il(&) = O), the derivative (6.2.45) is positive everywhere in the interior
of the disk of radius ro on the plane (x,y). Hence, in view of (6.2.38),
the sign of the controlling action in the zero-approximation is determined
by the sign of cos cp, that is, the switching line coincides with the vertical
diameter of the disk of radius ro on the plane (x, y) (in Fig. 51 the switching
line is indicated by AOB; the arrows show the direction of the mean drift
velocity).
The first approximation. By using the results obtained above, we
can write the first-approximation equation as

F(')(o, r, cp) = 0, TO, cp) = 0,


F,(l)(r,
F(')(r, r, cp) = F(')(r,r, cp + 27r) = 0,
r2 - (urn - a)F,(0) (r,r ) cos cp, 0 < cp < 5, < cp < 2 ~ ,
@(7,T, cp) =
r2 + (urn + a)~!') (7, T) cos C, $ < (f < 2, (6.2.47)

(here the function F ' O ) is given by formula (6.2.45)). The solution F(')
of Eq. (6.2.46) may also be written as a series in eigenfunctions, but since
now there is no radial symmetry, this series differs from (6.2.42) and has
the form [I791

F(')(T, r, cp) = coo(7) + (c,, cos ncp + c;, sin ncp)

where

dn = {1
2
for
for
n # 0,
n=O,
340 Chapter VI

and coo(r) denotes the terms independent of r and p, and hence, insignif-
icant for the control law (6.2.38). The numbers p k are the roots of the
equation dIn(p)/dp = 0, where In(p) is the nth-order Bessel function.
By analogy with the case of the zero approximation, we consider only
the first most important terms of the series (6.2.48). Namely, we retain
only the terms corresponding to the two roots p i and p: of the equation
dI,(p)/dp = 0; according to [77], p i = 1.84 and py = 3.84. This means
that all coefficients in (6.2.48) except for col, cll, and cil must be set
equal to zero. The coefficient col coincides with cl in (6.2.43) and has been
calculated in the zero approximation (therefore, in the series (6.2.48) the
term containing col coincides with the second term in formula (6.2.44)).
By calculating cil according to (6.2.50) with regard to (6.2.47), we obtain
cil = 0. Thus, to find the loss function F('), it suffices to calculate only
cll. Substituting (6.2.47) and (6.2.45) into (6.2.49), we obtain

x lrexp [ - ~ ( , L L : ) ~ (-
T g)](1 - e ~ ~ [ - - ~ ( p pdo
)~c])

Since we have (see [179], $2, Part 1, Appendix 2)

we calculate the other integrals in (6.2.51) and obtain


Applications of Asymptotic Synthesis Methods 341

Substituting (6.2.52) into (6.2.39) and letting r t m, we arrive a t the


following equation for the switching line corresponding to the stationary
operating conditions:

(here v = r/ro, E = Oro/a = B l u r o , and I i ( p ) = d I l ( p ) / d p ) .

Curves 1, 2, and 3 in Fig. 52 correspond to the three values of the


parameter E in Eq. (6.2.53): E = 0.4, 1.0, 3.0. Thus, the optimal control
in the first approximation consists in switching the control action from
u = -(urn - a ) in the region R- to u = +(urn + a ) in the region R+, which
(in dependence of the value of the parameter E ) lies inside one of the closed
curves 1-3 in Fig. 52.
REMARK6.2.2. T h e decomposition (Fig. 52) of the phase space into
the regions R- and R+ can be refined if the functions F(')(T,z) and
F(')(T, r, cp) are calculated more precisely (that is, are approximated by
a larger number of terms in the series (6.2.42) and (6.2.48)). However, as
the corresponding calculations show, curves 1-3 obtained in this case do
not practically differ from those shown in Fig. 52. 17

86.3. Optimal control of the population size


governed by the stochastic logistic model
In this section we return to the problem of optimal control of the pop-
ulation size, which was formulated in $2.4 (but not solved). Let us briefly
recall the statement of this problem.
342 Chapter VI

6.3.1. Statement of the problem. We shall consider a single-species


population whose dynamics is described by the controlled stochastic logistic
model

( 3
li. = T 1 - - x - qux + d%x[(t), t > 0, x(0) = xO, (6.3.1)

where x = x(t) is the population size (density) a t time t, [(t) is a stochastic


process (1.1.31) of the standard white noise type, and T , K q, B, and x0 are
given positive constants.
Admissible controls belong to the class of nonnegative scalar bounded
measurable functions u = u(t) that for all t satisfy a condition of the form

where u, is a given positive number.


We shall consider the control problem on an infinite time interval R+ =
[0, CQ) with an arbitrary initial population size x(0) = xO > 0. The goal of
control is to maximize the functional

I[u] = E [lcc
e-dt(pqx(t) - c)u(t) dt] -+ max ,
0<u(t)<um
t>o
(6.3.3)

where 6, p, q, c > 0 are given numbers and E denotes the mathematical


expectation of the expression in the square brackets (we average over the
ensemble of random trajectories issued from a given point x(0) = xO and
satisfying the stochastic differential equation (6.3.1)).
It follows from $2.4 that problem (6.3.1)-(6.3.3) is a stochastic general-
ization of optimal fisheries management problems studied in [35, 68, 1011.
If, just as in 52.4 and in [35, 68, 1011, the number p is the cost of unit
mass of caught fish, the number c denotes the cost of unit efforts u(t) for
fishing, and q is the catchability coefficient, then the functional (6.3.3) is
an estimate of the mean profit obtained by fishing during time of the order
of 116.
The optimal control function u, (t) : R+ -+ [0, urn] maximizing the func-
tional (6.3.3) is a random function of time. To obtain a constructive al-
gorithm for calculating this function, we need to use some results of the
general control theory for processes of diffusion type (see [58, 113, 1751 as
well as $ 1.4).
We assume that the controlling party has information about the current
values of the controlled process x(t). Then it is expedient to choose the
control u(t) a t time t on the basis of the entire body of information available
Applications of Asymptotic Synthesis Methods 343

on the controlled process. This leads to a controlling function of the form


u(t) = u(t, xk), xh = {x(s): 0 5 s 5 t } that is sometimes called the natural
strategy of control (the function u(t, 26) can be a probability measure).
But if, just as in our case, the controlled system obeys an equation of the
form (6.3.1) with perturbations [(t) in the form of a Gaussian white noise,
then, as was shown in [113, 1751, the prehistory of the controlled process
x(s): 0 5 s 5 t does not affect the quality of control. Therefore, to solve
the optimization problem (6.3.3), it suffices to consider only the class of
controlling functions that are deterministic functions of the current phase
variable u(t) = u(t, x(t)) (the nonrandomized Markov strategy). Next,
since the stochastic process [(t) is stationary and the coefficients in (6.3.1)
are time-invariant, the optimal strategy for the infinite-horizon problem
in question does not depend on time explicitly, that is, u,(t) = u,(x(t)).
By using the controlling function (the synthesizing function) in the form
u* (x), we can realize the optimal control of system (6.3.1) in the form of
an automatic feedback control system.
In what follows, we present a method for calculating the synthesizing
function u,(x) for problem (6.3.1)-(6.3.3).
6.3.2. Solution of problem (6.3.1)-(6.3.3). By analogy with $2.4
and on the basis of the results obtained in [113, 1751, we can assert that
the maximum value of the functional (6.3.3) (that is, the cost function)

F(X) = max
O<u(t)<um
E [lm 1 e-bt (pqr(t) - c)u(t) dt ~ ( 0 =
)
I
t20-

considered as a function of the initial state x is twice continuously dif-


ferentiable and satisfies the following Bellman equation7 (F' = dF/dx,
F" = d2F/dx2):

F'+ max [ ( p q x - c - q x ~ ' ) u ] - ~ ~= 0. (6.3.4)


O<u(t)<um

The cost function is defined only for nonnegative values of the variable x;
for x = 0, this function satisfies the natural boundary condition

    F(0) = 0,    (6.3.5)

which is a straightforward consequence of (6.3.1) and (6.3.3) (indeed, it
follows from (6.3.1) that if x(0) = 0, then x(t) ≡ 0 for all t ≥ 0; hence,
it follows from (6.3.3) that in this case the optimal control has the form
u*(t) ≡ 0; and hence, (6.3.3) implies (6.3.5)).

⁷Equation (6.3.4) is written with regard to the fact that the solution of the stochastic
equation (6.3.1) is understood in the symmetrized sense (see §1.2 and [174]).
First, note that Eq. (6.3.4) for δ > r + B and K → ∞ has the exact solution
(6.3.6) obtained in §2.4.

Here

    x₀ = c(δ − r − B + qu_m) / [pq(δ − r − B + qu_m·k₂⁰/(k₂⁰ − 1))]    (6.3.7)
determines the switch point of the optimal control in the synthesis form,
and the numbers k₁⁰ > 0 and k₂⁰ < 0 in (6.3.6) and (6.3.7) can be written
in terms of the parameters of problem (6.3.1)-(6.3.3) as
For an arbitrarily chosen value of the parameter (the medium capacity)
K > 0, it is impossible to find the solution of Eq. (6.3.4) in the form of
finite formulas like (6.3.6) and (6.3.7). Nevertheless, as is shown below,
constructive methods for solving the synthesis problem can also be found
in this case.
Let us construct a solution of Eq. (6.3.4). First, we note that it follows
from (6.3.4) that the optimal control takes only the boundary values
u = 0 and u = u_m of the set [0, u_m] of admissible controls. The choice
of one of these values is determined by the sign of the expression y(x) =
pqx − c − qxF'. If y(x) = 0, then the choice of control is not determined
formally by Eq. (6.3.4). However, one can see that in this case the choice of
any admissible value of u does not affect the solution of Eq. (6.3.4), since
the nonlinear term of Eq. (6.3.4) vanishes for y(x) = 0 and any admissible u.
Therefore, we can write the optimal control in the form

    u*(x) = u_m for y(x) > 0,    u*(x) = 0 for y(x) < 0.    (6.3.8)

If the equation y(x) = 0 has a single root x*, then the optimal control can
be written in the form

    u*(x) = 0 for x < x*,    u*(x) = u_m for x > x*,    (6.3.9)

similar to (6.3.8), where the coordinate of the switch point x* is determined
by the equation

    pqx − c − qxF'(x) = 0,    (6.3.10)

whose solution can be obtained after the cost function F(x) is calculated.
By F₀(x) and F₁(x) we shall denote the cost function F(x) on either side
of the switch point x*. Then, as it follows from (6.3.4) and (6.3.9), instead
of one nonlinear equation (6.3.4), we have two linear equations for F₀ and
F₁:

    Bx²F₀'' + (r + B − (r/K)x)xF₀' − δF₀ = 0,    0 ≤ x < x*,    (6.3.11)

    Bx²F₁'' + (r + B − qu_m − (r/K)x)xF₁' − δF₁ = u_m(c − pqx),    x* < x.    (6.3.12)
Since the cost function F(x), as the solution of the Bellman equation (6.3.4),
is twice continuously differentiable for all x ∈ [0, ∞), the functions F₀ and
F₁ satisfy the boundary condition (6.3.10) at the switch point x*. Moreover,
it follows from (6.3.5) that F₀(0) = 0. These boundary conditions allow us
to obtain the unique solution of Eqs. (6.3.11) and (6.3.12), and thus, for
all x ∈ [0, ∞), to construct the cost function F(x) satisfying the Bellman
equation (6.3.4).
We shall seek the solution of Eq. (6.3.11) as the generalized power series

    F₀(x) = Σ_{i=0}^∞ aᵢ x^{α+i}.    (6.3.13)

By substituting the series (6.3.13) into (6.3.11) and setting the coefficients
of x^α, x^{α+1}, ... equal to zero, we obtain the following system for the
characteristic factor α and the coefficients aᵢ, i = 0, 1, 2, ...:
If we set a₀ ≠ 0, then the first relation in (6.3.14) implies the characteristic
equation

    Bα² + rα − δ = 0,

whose roots

    α₁,₂ = [−r ± √(r² + 4Bδ)] / (2B)

determine two possible values of the characteristic factor α in (6.3.13). Since
α₂ is negative, it follows from the boundary condition (6.3.5) that we need
only to use α₁ = k₁⁰ in (6.3.13). Therefore, the solution of Eq. (6.3.11) can
be written in the form

    F₀(x) = a₀ψ(x),    (6.3.15)

where
For the coefficients of the series (6.3.16) we have the estimate

Thus, the series (6.3.16) converges for any finite x ≥ 0, we can differentiate
this series term by term, and its sum ψ(x) is an entire analytic function
satisfying the estimate

The constant a₀ in (6.3.15) can be found from the boundary condition
(6.3.10) for the function F₀ at the switch point x*. Hence we have the
following final expression for the solution of Eq. (6.3.11):

    F₀(x) = (pqx* − c)ψ(x) / (qx*ψ'(x*)).    (6.3.17)
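The series (6.3.16) and the constant a₀ in (6.3.17) are easy to evaluate
numerically. The following sketch assumes the coefficient recurrence obtained
by substituting (6.3.13) into (6.3.11), namely aₙ = ε(k₁⁰ + n − 1)aₙ₋₁/[n(n +
k₁⁰ − k₂⁰)] with ε = r/KB, where k₁⁰ > 0 and k₂⁰ < 0 are the roots of
Bk² + rk − δ = 0; this recurrence is a reconstruction and should be checked
against (6.3.14):

    import numpy as np

    def psi_and_derivative(x, r, B, delta, K, n_terms=60):
        """Sum the generalized power series for psi(x) (cf. (6.3.13)-(6.3.16))
        and its derivative, assuming the coefficient recurrence
            a_n = (r/(K*B)) * (k1 + n - 1) / (n * (n + k1 - k2)) * a_{n-1},
        with k1 > 0, k2 < 0 the roots of B*k**2 + r*k - delta = 0."""
        k1 = (-r + np.sqrt(r**2 + 4.0 * B * delta)) / (2.0 * B)
        k2 = (-r - np.sqrt(r**2 + 4.0 * B * delta)) / (2.0 * B)
        eps = r / (K * B)
        a, s, ds = 1.0, x**k1, k1 * x**(k1 - 1.0)
        for n in range(1, n_terms):
            a *= eps * (k1 + n - 1.0) / (n * (n + k1 - k2))
            s += a * x**(k1 + n)
            ds += a * (k1 + n) * x**(k1 + n - 1.0)
        return s, ds

    def F0(x, x_star, p, q, c, r, B, delta, K):
        """F0(x) = a0*psi(x), with a0 fixed by condition (6.3.10) at x_star."""
        psi_x, _ = psi_and_derivative(x, r, B, delta, K)
        _, dpsi_s = psi_and_derivative(x_star, r, B, delta, K)
        a0 = (p * q * x_star - c) / (q * x_star * dpsi_s)
        return a0 * psi_x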

The nonhomogeneous equation (6.3.12) is of the same type as Eq. (6.3.11),
and its solution can also be expressed in terms of generalized power series.
It is well known that the general solution of the nonhomogeneous equation
(6.3.12) is the sum of the general solution of the homogeneous equation

    Bx²F'' + (r + B − qu_m − (r/K)x)xF' − δF = 0    (6.3.18)

and any particular solution of Eq. (6.3.12). Equation (6.3.18) is similar to
Eq. (6.3.11), and therefore, its solution can be constructed by analogy with
the above-described procedure (6.3.13)-(6.3.17). Performing the required
calculations, we obtain the following expression for the general solution of
Eq. (6.3.18):

    F₁(x) = c₁ψ₁(x) + c₂ψ₂(x).    (6.3.19)
Here c₁ and c₂ are arbitrary constants and the functions ψ₁(x) and ψ₂(x)
are the sums of generalized power series

    ψ₁(x) = x^{k₁'} [ 1 + Σ_{n=1}^∞ (1/n!) · k₁'(k₁' + 1)···(k₁' + n − 1) / ((α' + 1)(α' + 2)···(α' + n)) · (rx/KB)ⁿ ],    (6.3.20)

where the numbers k₁', k₂', and α' are determined by the expressions

Note that the series (6.3.20) for any finite x can be majorized by a convergent
numerical series. Therefore, the series (6.3.20) can be differentiated
and integrated term by term, and its sum ψ₁(x) is an entire function. Similar
statements for the series (6.3.21) hold only for α' ≠ n (here n is a
positive integer); in what follows, we assume that this inequality is
satisfied.
A particular solution of the nonhomogeneous equation (6.3.12) can be
found by the standard procedure of variation of parameters. We write the
desired particular solution Φ as

    Φ(x) = c₁(x)ψ₁(x) + c₂(x)ψ₂(x),    (6.3.23)

where the desired functions c₁(x) and c₂(x) satisfy the condition

    c₁'(x)ψ₁(x) + c₂'(x)ψ₂(x) = 0.    (6.3.24)
By substituting, instead of F₁, the relation (6.3.23) into (6.3.12), after
simple calculations with regard to (6.3.24), we obtain

Note that the expression in the square brackets in the integrands in
(6.3.25) and (6.3.26) is the Wronskian of Eq. (6.3.12), which is not zero
for all x, since the solutions ψ₁(x) and ψ₂(x) are linearly independent.
Therefore, we can readily calculate the integrals in (6.3.25) and (6.3.26)
and thus find the functions c₁(x) and c₂(x) as generalized power series
obtained by term-by-term integration in (6.3.25) and (6.3.26).
Thus the general solution of the nonhomogeneous equation (6.3.12) has
the form

    F₁(x) = c₁ψ₁(x) + c₂ψ₂(x) + Φ(x),    (6.3.27)

where Φ(x) is given by (6.3.23), (6.3.25), and (6.3.26). To obtain the unique
solution satisfying the Bellman equation (6.3.4) for x > x*, we need to
choose the arbitrary constants c₁ and c₂ in (6.3.27). To this end, we use the
boundary condition (6.3.10) for the function F₁(x) at the switch point x*.
To obtain the second condition, we assume that the functions F₀(x) and
F₁(x) coincide as K → ∞ with the known exact solution F(x) given by
(6.3.6). It follows from (6.3.16), (6.3.17), (6.3.20), (6.3.21), (6.3.25), and
(6.3.26) that this condition is satisfied if we set c₁ = 0. The condition
(6.3.10) for the function F₁(x) at the point x* implies

Thus, the desired solution of the inhomogeneous equation (6.3.12) acquires
the form

Formulas (6.3.17) and (6.3.28) determine the cost function F(x) that
satisfies the Bellman equation (6.3.4) for all x ∈ [0, ∞). In these formulas,
only the coordinate of the switch point x* remains unknown. To find x*,
we use the condition that the cost function F(x) must be continuous at the
switch point:

    F₀(x)|_{x=x*} = F₁(x)|_{x=x*},    (6.3.29)
or, which is the same due to (6.3.10), the condition that the second-order
derivative must be continuous:

    F₀''(x)|_{x=x*} = F₁''(x)|_{x=x*}.    (6.3.30)

Since the series (6.3.16) and (6.3.21) are convergent, we can calculate
x* with any prescribed accuracy, and thus solve our equations numerically.
Furthermore, for large values of the medium capacity K, formulas (6.3.29)
and (6.3.30) give us approximate analytic formulas for the switch point,
and these formulas allow us to construct control algorithms that are close
to the optimal control.
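In practice, the root of (6.3.30) can be localized, for example, by bisection;
the sketch below assumes that the second derivatives F₀'' and F₁'' are
available as callables (computed, say, by term-by-term differentiation of the
series) and that they bracket a sign change:

    def find_switch_point(F0pp, F1pp, a, b, tol=1e-10):
        """Locate the root x* of M(x) = F0''(x) - F1''(x) (condition (6.3.30))
        by bisection, assuming M changes sign on the bracket [a, b]."""
        M = lambda x: F0pp(x) - F1pp(x)
        if M(a) * M(b) > 0.0:
            raise ValueError("x* is not bracketed by [a, b]")
        while b - a > tol:
            m = 0.5 * (a + b)
            if M(a) * M(m) <= 0.0:
                b = m
            else:
                a = m
        return 0.5 * (a + b)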
6.3.3. The calculation of x* for large K. In the case K → ∞, the
functions ψ(x), ψ₁(x), and ψ₂(x), as it follows from (6.3.16), (6.3.20), and
(6.3.21), are given by the finite formulas

Correspondingly, instead of the series (6.3.15) and (6.3.28), we have

By substituting (6.3.31) and (6.3.32) into (6.3.30), we obtain x* = x₀,
where x₀ is given by (6.3.7) (derived in §2.4).
If the medium capacity K is a finite number, then the coordinate x*
cannot be written as a finite formula. However, it follows from continuity
considerations that for large K the coordinate x* is close to x₀, so that
we can take x₀ as the first approximation to the root of Eqs. (6.3.29) and
(6.3.30). Then the corrections for refining this first approximation can be
calculated by the following scheme.
For large K, the quantity ε = r/KB can be considered as a small parameter
and, as follows from (6.3.15), (6.3.16), (6.3.20), (6.3.21), and (6.3.28), the
functions F₀(x) and F₁(x) can be represented as power series in ε:
We also seek the root of Eqs. (6.3.29) and (6.3.30), that is, the coordinate
x*, as the series

    x* = x₀ + εΔ₁ + ε²Δ₂ + ···,    (6.3.35)

where the numbers x₀, Δ₁, Δ₂, ... must be calculated. By substituting the
expansions (6.3.33)-(6.3.35) into Eq. (6.3.29) (or (6.3.30)) and setting the
coefficients of equal powers of the small parameter ε on the left- and right-hand
sides equal to each other, we obtain a system of equations for successive
calculation of the numbers x₀, Δ₁, Δ₂, ... in the expansion (6.3.35).
Obviously, the first term x₀ in (6.3.35) coincides with (6.3.7). To calculate
the first correction Δ₁ in the expansions (6.3.33) and (6.3.34), we
retain the zero-order and the first-order terms and omit the terms of order
ε² and higher. As a result, from (6.3.16), (6.3.17), (6.3.20), (6.3.21),
and (6.3.28) we obtain the following expressions for the functions F₀(x) and
F₁(x) in the first approximation:

where

By differentiating (6.3.36) and (6.3.37) twice, we rewrite Eq. (6.3.30)
as

To calculate the first two terms in the expansion (6.3.35), we substitute
the root x* = x₀ + εΔ₁ into Eq. (6.3.38) and collect the terms of the zero
and the first order with respect to the small parameter ε. If we retain only
the zero-order terms in Eq. (6.3.38), then we can readily see that (6.3.38)
implies formula (6.3.7) for x₀. Collecting the terms of the order of ε, from
(6.3.38) we obtain the first correction

Thus, for large values of the parameter K (that is, for small ε), the
coordinate x₀ given by (6.3.7) can be interpreted as the switch point in the
zero approximation. Correspondingly, the formula

    x₁ = x₀ + εΔ₁,    (6.3.40)

where x₀ and Δ₁ are given by (6.3.7) and (6.3.39), determines the switch
point in the first approximation.
Let u₀(x) and u₁(x) denote the controls

    uᵢ(x) = 0 for x < xᵢ,    uᵢ(x) = u_m for x > xᵢ,    i = 0, 1.    (6.3.41)

Obviously, by using these algorithms to control system (6.3.1), we can decrease
the value of the functional (6.3.3) compared with its maximum value
F(x), which can be obtained by using the optimal control (6.3.9). However,
it is natural to expect that this decrease in the value of the functional
(6.3.3) is negligible for large K, and moreover, the quasioptimal control
u₁(x) is "better" than the zero-approximation algorithm u₀(x) in the sense
that I[u₁] > I[u₀].
6.3.4. Results of the numerical analysis. Our expectations are
confirmed by the following results of numerical analysis of the quasioptimal
algorithms (6.3.41). By Gᵢ(x) we denote the value of the functional
(6.3.3) obtained by using the controls uᵢ and a given initial population size
x(0) = x. Then Gᵢ(x) is a continuously differentiable function of the initial
state x and satisfies the linear equation
Denoting by Gᵢ₀(x) and Gᵢ₁(x), just as in Section 6.3.2, the values of the
function Gᵢ(x) on either side of the switch point xᵢ, we obtain the following
equations for Gᵢ₀ and Gᵢ₁ from (6.3.42):

which are quite similar to Eqs. (6.3.11) and (6.3.12). Therefore, the general
solutions of these equations, by analogy with Section 6.3.2, have the form

where the functions ψ(x), ψ₂(x), and Φ(x) are given by formulas (6.3.16),
(6.3.20), (6.3.21), (6.3.23), (6.3.25), and (6.3.26).
The functions (6.3.45) differ from the corresponding functions (6.3.17)
and (6.3.28) in Section 6.3.2 by the method used for calculating the constants
C̃₁ and C̃₂ in (6.3.45). In Section 6.3.2 the corresponding constants
(a₀ in (6.3.15) and c₁, c₂ in (6.3.27)) were determined by the condition
(6.3.10) at an unknown switch point x*, while in Eqs. (6.3.42) the switch
point xᵢ was given in advance either by (6.3.7) with i = 0 or by (6.3.40)
with i = 1. By substituting (6.3.45) into the formulas Gᵢ₀(xᵢ) = Gᵢ₁(xᵢ)
and G'ᵢ₀(xᵢ) = G'ᵢ₁(xᵢ),⁸ we obtain the following formulas for the coefficients
C̃₁ and C̃₂ in (6.3.45):

By choosing specific numerical values of the parameters r, K, q, u_m in
problem (6.3.1)-(6.3.3), one can calculate the coefficients (6.3.46) and thus
construct the plots of the functions Gᵢ(x), i = 0, 1, by using computers.
We also note that the same formulas (6.3.45) and (6.3.46) can be used for
numerical calculation of the cost function F(x) satisfying the Bellman
equation (6.3.4). To this end, it suffices first to calculate the root of Eq. (6.3.29)
(or (6.3.30)), and then to substitute the obtained value into (6.3.46) instead
of xᵢ. In this case, the functions Gᵢ₀(x) and Gᵢ₁(x) given by (6.3.45)

⁸These formulas follow from the condition that the solutions Gᵢ(x) of Eqs. (6.3.42)
are continuously differentiable.
coincide, respectively, with the functions F₀(x) and F₁(x) given by (6.3.17)
and (6.3.28), that is, we have Gᵢ(x) ≡ F(x).
The above-described procedure for numerically constructing the functions
G₀(x), G₁(x), and F(x) was realized in the form of software and was used
in numerical experiments for estimating the quality of the quasioptimal
control algorithms u₀(x) in the zero approximation and u₁(x) in the first
approximation. Some results of these experiments are shown in Figs. 53
and 54, where the cost function F(x) is plotted by solid curves and the
functions G₀(x) and G₁(x) by dot-and-dash and dashed curves, respectively.

In Fig. 53 these curves are constructed for two values of the parameter K:
K = 7.5 and K = 11; the other parameters of problem (6.3.1)-(6.3.3) are:
r = 1, δ = 3, B = 1, q = 3, u_m = 1.5, c = 3, and p = 2. In this case,
the variable ε = r/KB treated as a small parameter in the expansions
(6.3.33)-(6.3.35) attains the values ε = 0.091 (the upper group of curves)
and ε = 0.133 (the lower group of curves). Figure 53 shows that in this
case all three curves F(x), G₀(x), and G₁(x), relative to the same group
of parameters, are sufficiently close to each other. Hence, the use of the
quasioptimal algorithms (6.3.41) ensures a control quality close to that
of the optimal control (obviously, the first-approximation control u₁(x) is
preferable to the zero-approximation control u₀(x), since the mean cost
G₁(x) corresponding to u₁(x) is closer to the optimal cost F(x)).

It is of interest to point out that an improvement in the control quality
can be obtained by using u₁(x) instead of u₀(x) even if the parameter
ε = r/KB is not small. This phenomenon is clearly illustrated by the
results of calculations shown in Fig. 54, where the curves F(x), G₀(x), and
G₁(x) are drawn for the following parameters of problem (6.3.1)-(6.3.3):
r = 1, δ = 20, B = 1, q = 3, u_m = 100, c = 3, p = 2, K = 0.3, and
K = 0.17.
Many times in Chapters III, V, and VI we have considered similar situations
(in which the formal use of the approximate synthesis procedure
developed for problems with a small parameter ε ≪ 1 provides satisfactory
results for ε ~ 1). Thus we see that the small parameter methods and
related methods of successive approximations are very effective tools for
the investigation and solution of various specific practical problems of optimal
control.
CHAPTER VII

NUMERICAL SYNTHESIS METHODS

Numerical synthesis methods are, in general, the most universal of all
methods for solving problems of optimal control, since numerical methods
are highly insensitive to the specific conditions of the problem.
Indeed, each of the approximate methods described in Chapters III-VI
is intended for solving optimal control problems from a certain class characterized
by the singularities of the plant dynamics equations, by small
parameters, etc. The choice of the method for obtaining quasioptimal control
algorithms depends essentially on the specific features of the control
problem considered.
On the other hand, if the control problem is solved, just as in the present
book, by the dynamic programming method, then the possibility to solve
the synthesis problem numerically is determined by the way of constructing
a numerical solution of the Bellman equation corresponding to the problem
in question. The type of this Bellman equation is determined by the character
of the problem considered. Thus, the majority of stochastic synthesis
problems studied in Chapters II-VI correspond to Bellman equations
in the form of nonlinear second-order partial differential equations of
parabolic type. Correspondingly, the Bellman equations for deterministic
synthesis problems are (nonlinear) first-order partial differential equations
of advection type.
Equations of both types were thoroughly studied long ago. Such equations
arise in many problems of mathematical physics and mechanics of continuous
media, in modeling chemical and biological processes, etc. Hence,
so far numerous different numerical methods have been developed for solving
such equations,¹ many of which are realized as standard programs that

¹It would be right to note that numerical methods have been developed mostly for
solving second-order parabolic equations. Nonlinear advection equations have been less
studied until the present time. However, many papers dealing with qualitative analysis
and numerical solution of such equations have appeared most recently. Here we would
like to mention the Italian school of mathematicians (M. Falcone, R. Ferretti, and others)
who studied various discrete schemes that allow the construction of numerical solutions
for various types of nonlinear advection equations including those with discontinuous
solutions [10, 31, 48, 49, 53].
are parts of well-known software packages such as MATLAB, Mathematica,
and some others.
It should be noted that the existing software can be used for solving
synthesis problems in practice rather seldom. This fact is related to some
peculiar features of the Bellman equations (see §3.5 in [34]), which make the
application of standard numerical methods rather difficult. For example,
the difficulties arising in solving the Bellman equations of higher dimensions
are well known. Furthermore, an obstacle known as the "boundary
difficulty" is often encountered in the numerical solution of synthesis problems.
Obviously, any numerical procedure allows us to construct the solution
of the Bellman equation only in a bounded region D where the arguments
of the loss function vary. Therefore, if, for example, we solve the Bellman
equation of parabolic type, then we need to pose the initial and boundary
conditions on the boundary of D. At the same time, many optimal control
problems do not contain any restrictions on the phase coordinates (in this
case, to solve the synthesis problem, we need to solve the Cauchy problem
for the Bellman equation). Thus, for a reasonable choice of the boundary
conditions required for the numerical solution of the problem, we need, in
addition, to study the asymptotic behavior of the loss function at infinity.
These problems are considered in more detail in §7.1.
In §7.1 and §7.2 we show how one of the most widely used methods
(known as the grid function method) for solving partial differential equations
numerically can be applied for the numerical solution of some specific
optimal control problems studied in the previous chapters by other methods.

§7.1. Numerical solution of the problem
of optimal damping of random oscillations

The main results of this section are related to the numerical solution of
the problem of optimal damping of random oscillations in a linear oscillator;
this problem was studied in §3.2 and §3.4. However, we begin with some
general problems concerning methods for stating the boundary conditions
for the loss function in solving the synthesis problem numerically.

7.1.1. Choice of the boundary conditions for the loss function.
Let us consider a control system governed by the differential Ito equation
of the form

    dx(t) = [a(t, x) + q(t)u] dt + σ(t, x) dη(t),    0 < t ≤ T,    x(0) = x₀.    (7.1.1)
Here x = x(t) is an n-dimensional vector of phase variables, u = u(t) is an
r-dimensional vector of controlling actions, η(t) is a d-dimensional vector
of independent Wiener stochastic processes of unit intensity, a(t, x) is an
n-dimensional vector of given functions, and q(t) and σ(t, x) are given n × r
and n × d matrices.
We assume that admissible control actions are subject to constraints of
the form

    u(t) ∈ U,    (7.1.2)

where U is a given closed bounded set in R_r.
If the vector of current phase variables x(t) can be measured exactly,
then we need to construct a control function u* = u*(t, x(t)), 0 ≤ t ≤ T,
in the synthesis form so that, for any given initial state x(0) = x₀, the
function u* minimizes the following functional defined on the trajectories
of Eq. (7.1.1):

    I[u] = E [ ∫₀ᵀ c(x(t)) dt + ψ(x(T)) ]    (7.1.3)

(here E[·] is the mathematical expectation of [·], c(x), ψ(x) ≥ 0 are given
penalty functions, and 0 ≤ t ≤ T is a given time interval).
According to §1.4, the dynamic programming approach allows one to
reduce problem (7.1.1)-(7.1.3) to solving the partial differential equation
(the Bellman equation) (7.1.4), where

    L = ∂/∂t + aᵀ(t, x) ∂/∂x + (1/2) Sp [ b(t, x) ∂²/∂x∂xᵀ ],    b(t, x) = σ(t, x)σᵀ(t, x).

Here F = F(t, x) is the loss function determined as usual by

Equation (7.1.4) is a semilinear (linear with respect to higher-order
derivatives) equation of parabolic type, and we shall try to solve it numerically
by using different versions of well-studied finite-difference procedures
in the grid function methods (the grid methods) [135, 162, 163, 179]. However,
these calculational schemes allow one to obtain the solution only in
a bounded domain D of the phase variables x. To apply these methods,
we need to impose some boundary conditions on the loss function F(t, x)
on the boundary of D. Since in the initial statement of the problem it is
assumed that the solution of Eq. (7.1.4) must be defined on an unbounded
phase space (x ∈ R_n), the boundary conditions for F(t, x) require a special
analysis if Eq. (7.1.4) is solved numerically.
A possible method for overcoming the boundary indeterminacy in stochastic
time-optimal control problems was proposed in [85].
For the problem considered here, the essence of the method suggested
in [85] consists in the following. Suppose that it is required to construct a
numerical solution of Eq. (7.1.4) in a bounded region D. Let us consider a
sequence of expanding bounded regions in the phase space D_R ⊃ D (D_R
can be the n-dimensional ball of radius R or the n-dimensional cube with
edge R centered at the origin). Then the desired solution F(t, x) is defined
in the region D as the limit of the sequence of numerical solutions of the
boundary value problems for Eq. (7.1.4) in the regions D_R corresponding
to the increasing sequence of values of the parameter R. In this case, the
boundary conditions posed on the boundaries of the regions D_R can be
arbitrary (for example, the zero conditions F(t, x)|_{∂D_R} = 0).
However, in practice, the use of this procedure in the numerical synthesis
requires an extremely large amount of calculations. For example, already
for the second-order system (7.1.1) (x ∈ R₂), this method is unacceptable,
since the time required to compute the solution is too large.
Here we present a more economical numerical method based on the use
of the asymptotic behavior of the loss function for large |x|. In this case, we
need an a priori estimate of the asymptotic behavior of F(t, x) satisfying
(7.1.4) as |x| → ∞.
Suppose that q(t) is a piecewise continuous bounded function for all
t ∈ [0, T] and a(t, x) and σ(t, x) are continuous functions in x, Borel functions
in (t, x), and satisfy the conditions

for all x, y ∈ R_n and t ∈ [0, T], where N > 0 is a constant, |a| is the
Euclidean norm of the vector a, and |σ| = √(Sp σσᵀ).
We assume that the penalty functions c(x), ψ(x) ≥ 0 are continuous and
satisfy the condition

for all x ∈ R_n and some m, N₁, N₂ > 0; furthermore,

for all R > 0 and x, y ∈ S_R (S_R is a ball of radius R in R_n).



By using Theorem IV.1.1 [113], one can show that the conditions (7.1.6),
(7.1.8), together with the upper estimates (7.1.7), guarantee that the function
F(t, x) satisfying problem (7.1.1)-(7.1.3) has generalized first-order
derivatives in x and the estimate

is satisfied for any t ∈ [0, T] and for almost all x.

The lower bounds for the penalty functions (7.1.7) and the continuity of
the phase trajectories x(t) imply the following lower estimate for the loss
function:

    F(t, x) ≥ N(1 + |x|)^m.    (7.1.10)
Let F⁰(t, x) denote the solution of the linear equation

(L is the operator in (7.1.4)). Obviously, F⁰ is the value of the functional

This functional is calculated on the trajectories of system (7.1.1) corresponding
to the noncontrolled motion (the averaging in (7.1.12) is performed
over the set of sample paths x(s): t < s ≤ T issued from a given
point x(t) = x and satisfying the stochastic differential equation (7.1.1) for
u ≡ 0).
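When no finite formula for F⁰ is available, the functional (7.1.12) can always
be estimated by direct averaging over simulated uncontrolled paths; a
minimal sketch for the scalar case (the drift a, the diffusion sigma, and the
penalties c_pen, psi_pen are assumed to be plain vectorized callables) is:

    import numpy as np

    def F0_monte_carlo(t, x, a, sigma, c_pen, psi_pen, T,
                       n_paths=2000, dt=1e-2, seed=0):
        """Monte Carlo estimate of F0(t, x) = E[ int_t^T c(x(s)) ds + psi(x(T)) ]
        along uncontrolled (u = 0) Euler-Maruyama paths of (7.1.1),
        written here for a scalar phase variable."""
        rng = np.random.default_rng(seed)
        n_steps = int((T - t) / dt)
        xs = np.full(n_paths, float(x))
        cost = np.zeros(n_paths)
        for k in range(n_steps):
            s = t + k * dt
            cost += c_pen(xs) * dt                  # accumulate the running penalty
            xs += a(s, xs) * dt + sigma(s, xs) * np.sqrt(dt) \
                  * rng.standard_normal(n_paths)
        return (cost + psi_pen(xs)).mean()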
It follows from (7.1.4) and (7.1.11) that the difference G(t, x) = F⁰(t, x) −
F(t, x) satisfies the equation

Here Φ denotes the nonlinear function Φ(t, F_x) = − min_{u∈U}[uᵀqᵀF_x]. Since
the set U of admissible controls and the function q(t) are bounded, we have
the estimate

    |Φ(t, F_x)| ≤ N|F_x(t, x)|.    (7.1.14)
If the transition probability density of the noncontrolled Markov process
x(s) satisfying Eq. (7.1.1) for u ≡ 0 is denoted by p(x, t; y, s) (s > t), then
we can write the solutions of Eqs. (7.1.11) and (7.1.13) in quadratures (see
(3.4.13)). In particular, for the function G we have
This relation and (7.1.9) imply the following (similar to (7.1.9)) upper
bound for the difference G = F − F⁰:

Hence, with regard to (7.1.10), we obtain

as |x| → ∞. This condition allows us to use F⁰(t, x) as the asymptotics of
the loss function F(t, x) for solving the Bellman equation (7.1.4) numerically.
In some cases, for instance, in the example considered below, we succeed
in obtaining a finite analytic formula for the function F⁰(t, x).
7.1.2. Numerical solution of a specific problem. We shall discuss
the method of numerical synthesis in more detail for the problem of optimal
damping of random oscillations studied in §3.2 and §3.4. Suppose that the
plant to be controlled is a linear oscillator with one degree of freedom
governed by an equation of the form

where ξ(t) is the scalar standard white noise (1.1.31), u is a scalar control,
and β, B, and u_m are given positive numbers (β < 2). By setting the
penalty functions c(x(t)) = x²(t) + ẋ²(t) and ψ(x) = 0 in (7.1.3), we obtain
the Bellman equation

for the loss function F(t, x, y) (here x and y = ẋ are the phase variables).
By passing to the reverse time ρ = T − t, we can rewrite (7.1.18) as the
standard Cauchy problem for a semilinear parabolic equation. By using
the old notation t for the reverse time ρ, we rewrite (7.1.18) as

We shall seek the numerical solution of Eq. (7.1.19) in the square region
D = {−L ≤ x ≤ L, −L ≤ y ≤ L} of the phase variables (see Fig. 55).
We need to pose boundary conditions for the function F(t, x, y) on the
boundary of D. It follows from (7.1.17) that the phase trajectories lying in
the interior of D cannot terminate on the boundary segments BC and ED
indicated by dashed lines in Fig. 55. Therefore, we need not pose boundary
conditions on these segments; on the other parts of the boundary, as it
follows from Section 7.1.1, the boundary conditions are posed with the aid
of the asymptotics F⁰(t, x, y) satisfying the linear equation

Up to the notation, Eq. (7.1.20) coincides with Eq. (3.4.23), whose solution
was obtained in §3.4 as the finite formula (3.4.29). Rewriting (3.4.29) with
regard to the notation used in the present problem, we obtain the solution
of Eq. (7.1.20) in the form

Formula (7.1.21) allows us to pose the boundary conditions for the desired
function F = F(t, x, y) on the unhatched parts of the boundary
of D = {−L ≤ x, y ≤ L}. To this end, we set F = F(t, x, y) =
F⁰(t, −L, y) on AB, F = F⁰(t, x, L) on CF, F = F⁰(t, L, y) on EF, and
F = F⁰(t, x, −L) on AD.
Let us construct a uniform grid in the domain Π_T = D × [0, T] =
{(x, y, t): −L ≤ x, y ≤ L, 0 ≤ t ≤ T}. By F^k_{i,j} we denote the value of the
function F(t, x, y) at the point with coordinates (t = kτ, x = ih, y = jh),
where h and τ are the approximation steps in the coordinates x, y and in
time t, and i, j, k are integer-valued variables with values −Q ≤ i ≤ Q,
−Q ≤ j ≤ Q, and 0 ≤ k ≤ K (L = Qh, T = Kτ).
The boundary conditions for the grid function F^k_{i,j} have the following
form (here F⁰(t, x, y) is the function (7.1.21)):

    F^k_{Q,j} = F⁰(kτ, Qh, jh),    0 ≤ j ≤ Q;
    F^k_{−Q,j} = F⁰(kτ, −Qh, jh),    −Q ≤ j ≤ 0;
    F^k_{i,Q} = F⁰(kτ, ih, Qh),    −Q + 1 ≤ i ≤ Q;
    F^k_{i,−Q} = F⁰(kτ, ih, −Qh),    −Q ≤ i ≤ Q − 1.    (7.1.22)

It follows from (7.1.19) that for k = 0 we must set

    F^0_{i,j} = 0    (7.1.23)

at all nodes of the grid.


For the difference approximation of Eq. (7.1.19) we shall use a locally
one-dimensional solution method (a lengthwise-transverse scheme) [163]. In
this case the complete approximation scheme consists in solving the follow-
ing two one-dimensional (with respect to the phase coordinates) equations
successively:

Each of Eqs. (7.1.24) and (7.1.25) is replaced by a two-layer difference


scheme defined by the three-point pattern (Eq. (7.1.24)) or by the four-point
pattern (Eq. (7.1.25)). In this case, since the parts of the boundary of D
indicated by dashed lines in Fig. 55 are inaccessible, we shall approximate
v_x = ∂v/∂x by the right difference derivative for y ≥ 0 (j ≥ 0) and by the
left difference derivative for y < 0 (j < 0). Then the derivatives V_y = ∂V/∂y
and V_yy = ∂²V/∂y² are approximated by the central difference derivatives
The values of the grid functions v^k_{i,j} and V^k_{i,j} at the grid nodes are calculated
successively for the time layers k = 1, 2, ... by an implicit scheme. In
this case the (k + 1)th layer function v^{k+1}_{i,j} corresponding to Eq. (7.1.24)
is used as the initial function for solving Eq. (7.1.25). The
grid functions F^k_{i,j} corresponding to the original equation (7.1.19) and the
functions v^k_{i,j} and V^k_{i,j} corresponding to the auxiliary equations (7.1.24) and
(7.1.25) are related as follows: F^k_{i,j} = v^k_{i,j}, v^{k+1}_{i,j} = V^k_{i,j}, and V^{k+1}_{i,j} = F^{k+1}_{i,j}.
Moreover, since the time step is assumed to be small (we take τ = 0.01),
in the difference approximation of Eq. (7.1.25) we can use the sign of the
derivative V^k_y instead of sign(V^{k+1}_{i,j+1} − V^{k+1}_{i,j−1}); that is, we shall use
u_{i,j} = sign(V^k_{i,j+1} − V^k_{i,j−1}) instead of sign V_y (a similar replacement was
performed in [34, 86]).
It follows from the preceding that the difference approximation transforms
Eqs. (7.1.24) and (7.1.25) to the following three difference equations:
Formulas (7.1.26) and (7.1.27) together with the boundary conditions
(7.1.22) and the initial conditions (7.1.23) allow us to calculate the functions
v^{k+1}_{i,j} recurrently at all nodes of the grid. Indeed, rewriting (7.1.26) and
(7.1.27) in the form
we see that, for given v^k_{i,j} = F^k_{i,j} and each fixed j ≥ 0, the desired set of the
values of v^{k+1}_{i,j} can be calculated successively from right to left by formula
(7.1.29). For the initial value of v^{k+1}_{Q,j} we take F⁰((k + 1)τ, L, jh), where
F⁰(t, x, y) is the function (7.1.21). Correspondingly, for j < 0 the values
of v^{k+1}_{i,j} can be calculated from left to right by formula (7.1.30) with the
initial value v^{k+1}_{−Q,j} = F⁰((k + 1)τ, −L, jh).²
Since V^k_{i,j} = v^{k+1}_{i,j}, we obtain the grid function V^k_{i,j} for the kth time layer
after the grid function v^{k+1}_{i,j} is calculated. Now to calculate the grid function
V^{k+1}_{i,j} = F^{k+1}_{i,j} on the layer (k + 1), we need to solve the linear algebraic
system (7.1.28). It is convenient to solve this system by the sweep method
[162, 179], which we briefly discuss here. Let us denote the desired values of
the grid function on the layer (k + 1) by z_j = V^{k+1}_{i,j}. Then system (7.1.28)
can be written in the form

where A_j, C_j, M_j, and φ_j are the known expressions

    A_j = 2τB + hτ(ih + jβh + u_m u_{i,j}),    C_j = 2h² + 4τB,
    M_j = 2τB − hτ(ih + jβh + u_m u_{i,j}),    φ_j = 2h²(v^{k+1}_{i,j} + τ(jh)²).    (7.1.32)
Since the number of equations in (7.1.31) is less than the number of unknown
variables z_j, −Q ≤ j ≤ Q, to solve the system (7.1.31) uniquely, we
need to complete this system with the two conditions

    z_{−Q} = F⁰((k + 1)τ, ih, −L),    z_Q = F⁰((k + 1)τ, ih, L),    (7.1.33)

which follow from the boundary conditions (7.1.22).


We seek the solution of problem (7.1.31), (7.1.33) in the form

where the coefficients p_j and ν_j are calculated by the recurrent formulas

²The recurrent formulas (7.1.29) and (7.1.30) are used for k = 0, 1, 2, ..., K − 1. It
follows from (7.1.23) that in (7.1.29) and (7.1.30) we must set v⁰_{i,j} = 0, −Q ≤ i, j ≤ Q,
for k = 0.
with the initial conditions

    p_{−Q+1} = 0,    ν_{−Q+1} = F⁰((k + 1)τ, ih, −L).    (7.1.36)

Thus, the algorithm for solving problem (7.1.31), (7.1.33) by the sweep
method consists in the following two steps:
(1) to find p_j and ν_j recurrently for −Q + 1 ≤ j ≤ Q (from left to right,
from j to j + 1) by using the initial values (7.1.36) and formulas (7.1.35);
(2) employing z_Q from (7.1.33), to calculate (from right to left, from
j + 1 to j) the values z_{Q−1}, z_{Q−2}, ..., z_{−Q+1}, z_{−Q} successively according to
formulas (7.1.34) (note that in this case, in view of (7.1.36), the value of
z_{−Q} coincides with that given by (7.1.33)).
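A compact realization of both steps for a generic tridiagonal system of the
assumed form M_j z_{j−1} − C_j z_j + A_j z_{j+1} = −φ_j with prescribed boundary
values is sketched below:

    import numpy as np

    def sweep_solve(A, C, M, phi, z_left, z_right):
        """Solve M[j]*z[j-1] - C[j]*z[j] + A[j]*z[j+1] = -phi[j], j = 1..n-2,
        with given end values z[0] and z[n-1] (the sweep/Thomas method;
        the coefficient form is assumed to match (7.1.31)-(7.1.33))."""
        n = len(phi)
        p = np.zeros(n)    # sweep coefficients: z[j-1] = p[j]*z[j] + nu[j]
        nu = np.zeros(n)
        nu[1] = z_left     # initial conditions, cf. (7.1.36)
        for j in range(1, n - 1):          # forward sweep
            denom = C[j] - M[j] * p[j]
            p[j + 1] = A[j] / denom
            nu[j + 1] = (phi[j] + M[j] * nu[j]) / denom
        z = np.zeros(n)
        z[-1] = z_right
        for j in range(n - 1, 0, -1):      # backward substitution
            z[j - 1] = p[j] * z[j] + nu[j]
        return z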
As was shown in [162, 179], the procedure of calculations by formulas
(7.1.34) and (7.1.35) is stable if for any j we have

It follows from (7.1.32) that these conditions can be reduced to the following
one in the problem in question:

Obviously, the last condition can always be satisfied by choosing a sufficiently
small approximation step h.
This calculational procedure was realized as a software package used
for numerical experiments on computers. The parameters of the difference
scheme were chosen so as to ensure a prescribed accuracy. It is well known
[163] that the total locally one-dimensional approximation scheme (7.1.22),
(7.1.23), (7.1.26)-(7.1.28) is absolutely stable and its error is O(h² + τ).
The approximation steps were τ = 0.01 and h = 0.1. The dimensions of
the region D were L = 3 and Q = 30. The other parameters β, u_m, B
of the problem were different in different specific calculations. The two-dimensional
data array of the loss function F(t, x, y) was printed for t =
0.25, 0.5, 0.75, ....
Some results of these calculations are shown in Figs. 56-60. Figure 56
presents the axonometry of the loss function F(t, x, y) in Eq. (7.1.19) with
β = B = u_m = 1 at three time moments t = 0.25, 0.5, 1.0. Figure 57 shows
curves of constant level F(t, x, y) = 3 and switching lines in an optimal
system with β = B = u_m = 1 at three time moments t = 0.5, 2.0, 8.0. In
view of the central symmetry of Eqs. (7.1.19), these curves are plotted in
two different halves of the region D. The switching line uniquely determines
the optimal control of system (7.1.17) as follows: u = −u_m at the points of
the phase plane (x, y) lying above the switching line, and u = +u_m below
this line.
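The quality of any such switching line can be assessed by direct simulation;
in the following sketch the oscillator (7.1.17) is assumed to have the form
ẍ + βẋ + x = u + ξ(t) with noise intensity B (the exact form of (7.1.17) is
not reproduced above), and the switching line is passed in as an arbitrary
callable:

    import numpy as np

    def simulate_oscillator(switch_line, beta=1.0, B=1.0, u_max=1.0,
                            x0=2.0, y0=0.0, T=10.0, dt=1e-3, seed=1):
        """Euler-Maruyama simulation of the controlled oscillator, assumed
        as x'' + beta*x' + x = u + xi(t) with white noise of intensity B.
        Bang-bang feedback: u = -u_max above y = switch_line(x), else +u_max.
        Returns the accumulated penalty int (x^2 + y^2) dt."""
        rng = np.random.default_rng(seed)
        x, y, cost = x0, y0, 0.0
        for _ in range(int(T / dt)):
            u = -u_max if y > switch_line(x) else u_max
            cost += (x**2 + y**2) * dt
            x_new = x + y * dt
            y += (-x - beta * y + u) * dt + np.sqrt(B * dt) * rng.standard_normal()
            x = x_new
        return cost

    # Example with the crude line y = -x (a hypothetical stand-in for the
    # numerically computed optimal switching line of Fig. 57):
    print(simulate_oscillator(lambda x: -x))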
Figure 58 illustrates how the switching line and the value of the performance
criterion of this optimal system depend on the value of the admissible
control u_m for B = β = 1 and t = 4. In Fig. 58 one can see that an increase
in the range of admissible controls uniformly improves the control quality,
that is, decreases the value of the optimality criterion independently of the
initial state of system (7.1.17).

Figures 59 and 60 show how the switching lines and the constant level
curves depend on the other parameters of the problem.
§7.2. Optimal control for the
"predator-prey" system (the general case)

In this section we consider the deterministic problem of optimal control
for a biological system consisting of two interacting populations ("predators"
and "prey"). We have already considered this system in §5.2, where
we studied a special type of this system called in §5.2 the case of a "poorly
adapted predator." In what follows, we consider the general case of this
problem. The synthesis problem corresponding to this case is solved numerically.
Furthermore, we obtain some analytic results for a control problem
with infinite horizon.
7.2.1. The normalized Lotka-Volterra model. Statement of
the problem. We assume that the system considered is described by the
Lotka-Volterra model (see [133, 186, 187] as well as §2.3 and §5.2) in which
the behavior of the isolated system is governed by a system of the form

Here x₁(τ) and y₁(τ) are the sizes (densities) of the prey and predator
populations at time τ, and the positive numbers aᵢ (i = 1, 2, 3, 4) characterize
the intraspecific (a₁, a₄) and interspecific (a₂, a₃) interactions. By changing
the variables

we rewrite system (7.2.1) in the dimensionless (normalized) form
Just as in §5.2, we assume that the external (controlling) action on system
(7.2.2) is to remove some prey species from the habitat (by catching,
shooting, or using some chemical substances). In this case, the control
system considered is described by equations of the form

    ẋ(t) = (1 − y)x − ux,    ẏ(t) = b(x − 1)y,
    t > 0,    x(0) = x₀ > 0,    y(0) = y₀ > 0,    (7.2.3)

where u = u(t) is a nonnegative bounded scalar controlling function that
for all t ≥ 0 satisfies the constraints

    0 ≤ u(t) ≤ u_m,    (7.2.4)

where u_m is a given positive number.


Let us consider the phase trajectories of the controlled system (7.2.3).
They are solutions of the differential equation

First, we note that in view of Eqs. (7.2.3), the phase variables z ( t ) and ~ ( t )
cannot attain negative values for all t 2 0 if the initial values xo and yo are
nonnegative (the last assumption is always satisfied, since xo and yo denote
the initial sizes of the prey and predator populations, respectively). There-
fore, all solutions of Eq. (7.2.5) (the phase trajectories of system (7.2.3)) lie
>
in the first quadrant (x 2 0, y 0) of the phase plane (x, y). Furthermore,
we shall consider only the phase trajectories that correspond to the two
boundary values of control: u = 0 and u = u,.
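These two families of trajectories are easy to visualize by direct integration
of (7.2.3) with a constant control; a minimal sketch:

    import numpy as np

    def lv_trajectory(x0, y0, u, b=0.5, T=20.0, dt=1e-3):
        """Integrate the controlled Lotka-Volterra system (7.2.3) with a
        constant control u by the classical 4th-order Runge-Kutta method;
        returns the phase trajectory as arrays (x, y)."""
        def f(z):
            x, y = z
            return np.array([(1.0 - y) * x - u * x, b * (x - 1.0) * y])
        n = int(T / dt)
        traj = np.empty((n + 1, 2))
        traj[0] = (x0, y0)
        z = np.array([x0, y0], dtype=float)
        for k in range(n):
            k1 = f(z); k2 = f(z + 0.5*dt*k1); k3 = f(z + 0.5*dt*k2); k4 = f(z + dt*k3)
            z = z + dt * (k1 + 2.0*k2 + 2.0*k3 + k4) / 6.0
            traj[k + 1] = z
        return traj[:, 0], traj[:, 1]

    # Closed orbits around (1, 1) for u = 0; aperiodic decay for u >= 1
    # (cf. the discussion below):
    x_free, y_free = lv_trajectory(2.0, 1.0, u=0.0)
    x_ctrl, y_ctrl = lv_trajectory(2.0, 1.0, u=1.5)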
For u = 0 Eqs. (7.2.3) coincide with Eqs. (7.2.2) for an isolated (autonomous)
Lotka-Volterra system. The dynamics of system (7.2.2) was
studied in detail in [187]. Omitting the details, we only note that in the
first quadrant (x ≥ 0, y ≥ 0) there are two singular points (x = 0, y = 0)
and (x = 1, y = 1) that are the equilibrium states of system (7.2.2). In this
case the origin (x = 0, y = 0) is an unstable equilibrium state, while the
state (x = 1, y = 1) is stable and is a center type singular point. All phase
trajectories of system (7.2.2) (except for the two that lie on the coordinate
axes: (x ≥ 0, y = 0) and (x = 0, y ≥ 0)) form a family of closed concentric
curves around the point (x = 1, y = 1). Thus, in a noncontrolled system
the sizes of both populations are subject to undecaying oscillations whose
period and amplitude depend on the initial state (x₀, y₀). However, if the
initial state (x₀, y₀) lies on one of the coordinate axes in the plane (x, y),
then there arise singular (aperiodic) phase trajectories. In this case it follows
from Eqs. (7.2.2) that the representative point of the system cannot
leave the corresponding coordinate axis and in the course of time either
approaches the origin (along the y-axis) or goes to infinity (along the x-axis).
The singular phase trajectories correspond to the degenerate case of
system (7.2.2). In this case, the biological system considered contains only
one population.
If u = u_m > 0, then the dynamics of system (7.2.3) substantially depends
on u_m. For example, if 0 < u_m < 1, then the periodic character of solutions
of system (7.2.3) is conserved (just as in the case u = 0), while only the
center of the family of phase trajectories moves to the point (x = 1, y = 1 −
u_m). For u_m ≥ 1 the solution of system (7.2.3) is aperiodic. In the special
case u_m = 1, Eq. (7.2.5) can easily be solved, and the phase trajectories of
system (7.2.3) can be written explicitly as

For u_m > 1 Eq. (7.2.5) has a unique singular point (x = 0, y = 0), and this
equilibrium state is globally asymptotically stable.³
Now let us formulate the goal of control for system (7.2.3). In many cases
[90, 105] it is most desirable that system (7.2.3) is in equilibrium for u = 0,
that is, the point (x = 1, y = 1) is the most desirable state of system (7.2.3).
In this case, one is interested in a control u* = u*(x, y) that takes system
(7.2.3) from any initial state (x₀, y₀) to the point x = 1, y = 1 in a minimum
time. This problem was solved in [90]. Here we consider the problem of
constructing a control u* = u*(t, x, y), which, in general, does not guarantee
that the system comes to the equilibrium point (x = 1, y = 1) but ensures
the minimum mean square deviation of the system phase trajectories from
the state (x = 1, y = 1) in a given time interval 0 ≤ t ≤ T:

    I[u] = ∫₀ᵀ [(x(t) − 1)² + (y(t) − 1)²] dt → min.    (7.2.7)

7.2.2. The Bellman equation and calculation of the boundary
conditions. By using the standard procedure of the dynamic programming
approach (see §1.3), we obtain the following algorithm for solving
problem (7.2.3), (7.2.4), (7.2.7).

³In this case the term "global" means that the trivial solution of system (7.2.3) is
asymptotically stable for any initial values (x₀, y₀) from the first quadrant of the phase
plane.
Now we define the loss function (the functional of minimum future losses)
by the relation

    F(t, x, y) = min_{0≤u(σ)≤u_m, t≤σ≤T} ∫ₜᵀ [(x(σ) − 1)² + (y(σ) − 1)²] dσ    (7.2.8)

and thus write the Bellman equation for problem (7.2.3), (7.2.4), (7.2.7) as

    ∂F/∂t + (1 − y)x ∂F/∂x + b(x − 1)y ∂F/∂y + (x − 1)² + (y − 1)²
        + min_{0≤u≤u_m} [−ux ∂F/∂x] = 0,
    x, y > 0,    0 ≤ t < T,    F(T, x, y) = 0.    (7.2.9)

If the function F(t, x, y) satisfying (7.2.9) is found, then the desired optimal
control u*(t, x, y) in the synthesis form is given by the expression

    u*(t, x, y) = 0 for g(t, x, y) < 0,    u*(t, x, y) = u_m for g(t, x, y) > 0,    (7.2.10)

where g(t, x, y) = x ∂F(t, x, y)/∂x.

By using (7.2.10), we can rewrite the Bellman equation in the form

It follows from (7.2.10) that the optimal control is a relay type function,
that is, at each time instant the control u is either u = 0 or u = u_m (this is a
bang-bang control). If the loss function (7.2.8) is continuously differentiable
with respect to x, then the control is switched from one value to the other
each time when the condition

    g(t, x, y) = 0    (7.2.12)

is satisfied. Equation (7.2.12) determines the switching line on the phase
plane (x, y) at each time instant. This switching line divides the phase
space x, y > 0 into two regions R₀ and R_m where the control u is either
u = 0 or u = u_m, respectively. To find the switching line is equivalent to
solving the problem of optimal control synthesis.
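Once a grid approximation of F is available, the relay control and the
switching line are recovered pointwise from the sign of the switching
function; a sketch (with the sign convention of (7.2.10) assumed) is:

    import numpy as np

    def bang_bang_control(F, x, u_max):
        """Recover the relay control field (7.2.10) from a grid of loss-function
        values F[i, j] ~ F(x_i, y_j): u = u_max where g = x * dF/dx > 0,
        and u = 0 where g < 0 (sign convention assumed as above)."""
        dFdx = np.gradient(F, x, axis=0)      # central differences in x
        g = x[:, None] * dFdx                 # switching function on the grid
        return np.where(g > 0.0, u_max, 0.0)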
Of course, it must be remembered that the above procedure for solving
the synthesis problem can be used only if the loss function (7.2.8) is
sufficiently smooth and the Bellman equation (7.2.9) (or (7.2.11)) holds at
all points of the domain Π_T = {x, y > 0, 0 ≤ t ≤ T} of definition of the
loss function. The smoothness properties of solutions satisfying equations
of the form (7.2.9) (or (7.2.11)) were studied in detail in [172]. As applied
to Eq. (7.2.9), the main result of [172] has the following meaning. The loss
function F(t, x, y) satisfying (7.2.9) has continuous first-order derivatives
with respect to all its arguments in the regions R₀ and R_m. On the interface
between R₀ and R_m, that is, on the switching line, the derivatives
∂F/∂x and ∂F/∂y can be discontinuous (have jumps) depending on the type
of the switching line. Namely, for the switching lines of the first and second
kind, the first-order derivatives of the loss function are continuous everywhere
in Π_T. On the switching line of the third kind, the partial derivatives
∂F/∂x and ∂F/∂y always have jumps. Recall that, according to the classification
given in [172], the type of the switching line is determined by
the character of the phase trajectories of system (7.2.3) in the regions R₀
and R_m near the switching line. For example, if the phase trajectories approach
the switching line on both sides, then such a switching line is called
a switching line of the first kind. In this case, the representative point of
system (7.2.3), once coming to the switching line, moves along this line in
the sliding mode (see §1.1). If the phase trajectories approach the switching
line on one side (say, in the region R₀) and leave it on the other side
(in R_m), then we have a switching line of the second kind. Finally, if the
switching line coincides with a phase trajectory in the region R_m (or R₀),
then we have a switching line of the third kind.
In what follows, switching lines of the third kind do not occur; thus
we can assume that for problem (7.2.3), (7.2.4), (7.2.7) studied here the
Bellman equation (7.2.9) (or (7.2.11)) is valid everywhere in the region
x > 0, y > 0, 0 ≤ t < T, and in this region the function F(t, x, y) satisfying
this equation has continuous first-order derivatives with respect to all its
arguments.
To solve Eq. (7.2.9) uniquely, we need to pose boundary conditions for
the loss function F(t, x, y) on the boundary of the region of admissible
phase variables, that is, for x = 0 and y = 0. Such boundary conditions
can readily be obtained by a straightforward calculation of the functional on
the right in (7.2.8) by using Eqs. (7.2.3) describing the system considered.
Let us write F(t, 0, y) = φ(t, y) and F(t, x, 0) = ψ(t, x). Then, using
(7.2.3) and (7.2.8), we obtain
To find ψ(t, x), we need to solve the following one-dimensional optimization
problem:

    ψ(t, x) = min_{0≤u(σ)≤u_m} ∫ₜᵀ [(x(σ) − 1)² + 1] dσ,
    ẋ(σ) = (1 − u)x(σ),    σ > t,    x(t) = x.    (7.2.14)

Problem (7.2.14) can readily be solved, although the solution of (7.2.14)
and hence the form of the function ψ(t, x) substantially depend on the
value of u_m.
(a) Let 0 < u_m < 1. In this case the points x₁ and x₂ given by (7.2.15)
divide the x-axis into three intervals. On the intervals 0 ≤ x ≤ x₁ and
x₂ ≤ x < ∞, the function has the explicit form

    ψ(t, x) = 2(T − t) − 2x[e^{T−t} − 1] + x²[e^{2(T−t)} − 1]/2,    0 ≤ x ≤ x₁,

    ψ(t, x) = 2(T − t) − (2x/(1 − u_m))[e^{(1−u_m)(T−t)} − 1]
              + (x²/(2(1 − u_m)))[e^{2(1−u_m)(T−t)} − 1],    x₂ ≤ x < ∞.    (7.2.16)

On the interval x₁ ≤ x ≤ x₂, the function ψ(t, x) is given by the formula

where z is the root of the transcendental algebraic equation

One can readily see that the possible values of the root z of Eq. (7.2.18)
always lie in the region 1 ≤ z ≤ e^{(1−u_m)(T−t)}, and the boundary values z = 1
and z = e^{(1−u_m)(T−t)} correspond to the endpoints (7.2.15) of the interval
x₁ ≤ x ≤ x₂. The optimal control u*, which solves problem (7.2.14),
depends on the variable x(t) = x and is determined as follows:

    if x ≤ x₁, then u* ≡ 0, t ≤ σ ≤ T;
    if x ≥ x₂, then u* ≡ u_m, t ≤ σ ≤ T;
    if x₁ < x ≤ x₂, then u* = u_m for x(σ) < x* = xz^{1/(1−u_m)},
    and u* = 0 for x(σ) > x*.

(b) Let u_m = 1. In this case, for u = u_m the coordinate x(σ) = const,
and problem (7.2.14) has the obvious solution

    u* = 0 for x(σ) < 1,    u* = u_m for x(σ) ≥ 1.    (7.2.19)

The minimum value of the functional (7.2.14) can readily be calculated for
control (7.2.19), and as a result, for the desired function ψ(t, x) we obtain
the expression

    ψ(t, x) = 2(T − t) − 2x[e^{T−t} − 1] + x²[e^{2(T−t)} − 1]/2,    0 ≤ x ≤ e^{−(T−t)},
    ψ(t, x) = (T − t) − ln x + 2x − x²/2 − 3/2,    e^{−(T−t)} ≤ x ≤ 1,
    ψ(t, x) = (T − t)(2 − 2x + x²),    x ≥ 1.    (7.2.20)

(c) Let u_m > 1. In this case the optimal control solving problem (7.2.14)
coincides with (7.2.19).⁴ After some simple calculations, we obtain
⁴For e^{−(T−t)} < x < e^{(u_m−1)(T−t)}, there always exists a time instant σ₀ at which
the solution x(σ) of the equation ẋ(σ) = (1 − u_m)x(σ), σ ≥ t, x(t) = x, attains the
value x(σ₀) = 1. After the time σ₀, the control (7.2.19) ensures the constant value
x(σ) ≡ 1, σ₀ ≤ σ ≤ T, by switching the control u infinitely fast between the boundary
values u = 0 and u = u_m (the sliding mode). Just the same trajectory x(σ), t ≤ σ ≤ T,
but without the sliding mode can be obtained by using, instead of (7.2.19), the control

    u* = 0 for x(σ) < 1,    u* = 1 for x(σ) = 1,    u* = u_m for x(σ) > 1.

Under this control we can realize the generalized solution in the sense of Filippov of the
equation ẋ(σ) = (1 − u*)x(σ) (see [54] and §1.1).
Thus, to find the optimal control in the synthesis form that solves problem
(7.2.3), (7.2.4), (7.2.7), we need to solve the following boundary value
problem for the loss function F(t, x, y):

where u* has the form (7.2.10), φ(t, y) is given by formula (7.2.13), and
the function ψ(t, x) is given by expressions (7.2.16)-(7.2.18), (7.2.20), or
(7.2.21) depending on the value of the maximum admissible control u_m.
The boundary value problem (7.2.22) was solved numerically. The results
obtained are given in Section 7.2.4.
7.2.3. Problem with infinite horizon. Stationary operating
mode. Let us consider the control problem (7.2.3), (7.2.4), (7.2.7) on an
infinite time interval (in this case the terminal time T → ∞). If the optimal
control u*(t, x, y) that solves problem (7.2.3), (7.2.4), (7.2.7) ensures
the convergence of the functional (7.2.8) for any initial state (x > 0, y > 0)
of the system, then due to the time-invariance of Eqs. (7.2.3) the loss function
(7.2.8) is also time-invariant, that is, F(t, x, y) → f(x, y), where the
function f(x, y) satisfies the equation

which is the stationary version of the Bellman equation (7.2.9).
In this case, the optimal control u*(x, y) and the switching line do not
depend on time explicitly and are given by formulas (7.2.10) and (7.2.12)
with F(t, x, y) replaced by the loss function f(x, y).
Let us denote the loss function f(x, y) in the region R₀ (u* = 0) by
f₀(x, y), and the loss function f(x, y) in the region R_m (u* = u_m) by
f_m(x, y). In R₀ the function f₀ satisfies the equation

Correspondingly, for the function f_m defined on R_m we have



Since the gradient of the loss function is continuous on the switching line,
that is, on the interface between R₀ and R_m, we have

Equations (7.2.24)-(7.2.26) allow us to obtain explicit formulas for the
partial derivatives ∂f/∂x and ∂f/∂y along the switching line
If the switching line contains intervals of sliding mode, then formulas
(7.2.27) allow us to find these intervals and to obtain explicit analytic
formulas for the switching line on these intervals. As was shown in §4.1 (see
also [172]), the second-order mixed partial derivatives of the loss function
f(x, y) must coincide on the intervals of sliding mode, that is, we have

By using formulas (7.2.27), one can readily see that the condition (7.2.28)
is satisfied along the two lines y = x and y = 2 − x. To verify whether these
lines (or some parts of them) are lines of the sliding mode, we need to
consider the families of phase trajectories (that is, the solutions of Eq. (7.2.5))
for u = 0 and u = u_m near these lines.
The corresponding analysis of the phase trajectories of system (7.2.3)
shows that the sliding mode may take place along the straight line y = x for
x < 1 and along the line y = 2 − x for x > 1. In this case the representative
point of system (7.2.3), once coming to the line y = x (x < 1), moves
along this line (due to the sliding mode) away from the equilibrium state
(x = 1, y = 1). On the other hand, along the line y = 2 − x (x > 1), system
(7.2.3) asymptotically as t → ∞ approaches the point (x = 1, y = 1) due
to the sliding mode. That is why only the straight line segment

    y = 2 − x,    1 < x ≤ x⁰,    (7.2.29)

can be considered as the switching line for the optimal control in the
stationary operating mode.
If u = u_m, then the integral curve of Eq. (7.2.5) is tangent to the line
y = 2 − x at the endpoint x⁰ of the segment (7.2.29). By using (7.2.5), we
can write the tangency condition as
For different values of the parameters in problem (7.2.3), (7.2.4), (7.2.7)
(that is, of the numbers b > 0 and u_m > 0), the solution of Eq. (7.2.30) has
the form

    x⁰ = [3b − 1 − u_m − √((3b − 1 − u_m)² − 8b(b − 1))] / [2(b − 1)]
              if 0 < u_m < 1, b ≠ 1, or u_m ≥ 1, b > b*,
    x⁰ = 2/(2 − u_m)    if 0 < u_m < 1, b = 1,
    x⁰ = ∞    if u_m ≥ 1, b ≤ b*,

where the number b* is equal to
One can easily obtain a finite formula for the stationary loss function
f(x, y) along the switching line (7.2.29). By using the second equation in
(7.2.3) and formula (7.2.29), we see that the coordinate y(t) is governed by
the differential equation

    ẏ = b(y − y²)    (7.2.32)

while moving along the straight line (7.2.29). By integrating (7.2.32) with
the initial condition y(0) = y, we obtain

    y(t) = y / [y + (1 − y)e^{−bt}].    (7.2.33)

Using (7.2.33) and the relation x(t) = 2 − y(t) and calculating the functional
I in (7.2.7) for T = ∞, we find the desired stationary loss function

    f(2 − y, y) = (2/b)(y − 1 − ln y).    (7.2.34)

Here y is an arbitrary point in the interval 2 − x⁰ < y ≤ 1.
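The formula (7.2.34) reconstructed above is readily verified against direct
quadrature of the functional along the sliding-mode solution (7.2.33):

    import numpy as np

    def f_stationary(y, b=0.5):
        """Stationary loss along the switching line y = 2 - x, cf. (7.2.34):
        f = (2/b)*(y - 1 - ln y), as derived above."""
        return 2.0 / b * (y - 1.0 - np.log(y))

    def f_quadrature(y, b=0.5, T=60.0, n=600_000):
        """Direct check: integrate 2*(y(t) - 1)**2 along the sliding-mode
        solution (7.2.33), y(t) = y / (y + (1 - y)*exp(-b*t))."""
        t, dt = np.linspace(0.0, T, n, retstep=True)
        yt = y / (y + (1.0 - y) * np.exp(-b * t))
        g = 2.0 * (yt - 1.0) ** 2
        return (g[:-1] + g[1:]).sum() * dt / 2.0   # trapezoidal rule

    print(f_stationary(0.7), f_quadrature(0.7))    # the two values should agree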


7.2.4. Numerical solution of the nonstationary synthesis problem.
If the control time T is finite, then the algorithm of the optimal control
u*(t, x, y) depends on time and, to find this control, we need to solve the
nonstationary Bellman equation (7.2.22). This equation is solved numerically
in the bounded region R = {0 ≤ x ≤ x_max, 0 ≤ y ≤ y_max, 0 ≤ t ≤ T}.
To this end, in R we construct the grid

    ω = {x_i = ih_x, i = 0, 1, ..., N_x, h_x N_x = x_max;
         y_j = jh_y, j = 0, 1, ..., N_y, h_y N_y = y_max;
         t_k = kτ, k = 0, 1, ..., N, τN = T},    (7.2.35)
and define the grid function F^k_{i,j} that approximates the desired continuous
solution F(t, x, y) of Eq. (7.2.22) at the nodes of the grid (x_i, y_j, t_k). The
values of the grid function F^k_{i,j} at the nodes of the grid (7.2.35) are related
to each other by algebraic equations obtained by the difference approximation
of the Bellman equation (7.2.22). In what follows, we use well-known
methods for constructing difference schemes [60, 135, 162, 163]; therefore,
here we restrict our consideration only to a formal description of the
difference equations used for solving Eq. (7.2.22) numerically. We stress that the
problems of approximation accuracy and stability and of the convergence
of the grid function F^k_{i,j} to the exact solution F(t, x, y) of Eq. (7.2.22) as
h_x, h_y, τ → 0 are studied in detail in [49, 53, 135, 162, 163, 179].
Just as in §7.1, by using the alternating direction method [163], we replace
the two-dimensional (with respect to the phase variables) equation
(7.2.22) by the following pair of one-dimensional equations:

each of which is approximated by a finite-difference scheme with fractional
steps in the variable t. To ensure the stability of the difference approximation
of Eqs. (7.2.36), (7.2.37), we use the scheme of "oriented differences"
[163]. For 0 < i < N_x, 0 < j < N_y, and 0 < k ≤ N, we replace Eq. (7.2.36)
by the difference scheme

where

and the approximation steps h_x and τ satisfy the condition τ|r_x| ≤ h_x for
all r_x on the grid ω.
For Eq. (7.2.37) we used the difference approximation
where

and the steps h_y and τ are specified by the condition τ|r_y| ≤ h_y for all r_y
on the grid (7.2.35).
The grid functions for the initial Bellman equation (7.2.22) and for the
auxiliary equations (7.2.36), (7.2.37) are related as F^k_{i,j} = v^k_{i,j},
v^{k−0.5}_{i,j} = V^{k−0.5}_{i,j}, and V^{k−1}_{i,j} = F^{k−1}_{i,j}.
The grid functions are calculated backwards over the time layers (numbered
by k), from k = N to an arbitrary number 0 ≤ k < N. The grid
function F^k_{i,j} approximates the loss function F(t, x, y) of Eq. (7.2.22) at
the node (t_k = kτ, ih_x, jh_y).
To obtain the unknown values of the grid functions vFj and K;
uniquely
from the algebraic equations (7.2.38) and (7.2.39), in view of (7.2.22), we
need to complete these equations with the zero "initial" conditions

and the boundary conditions of the form

where the function p(t, y) is determined by (7.2.13), and the function φ(t, x)
is calculated either by formulas (7.2.16)-(7.2.18) or by formula (7.2.20) (or
(7.2.21)) depending on the value of the admissible control u_m. According to
[163], the difference scheme (7.2.38)-(7.2.41) approximates the loss function
F(t, x, y) of Eq. (7.2.22) up to O(h_x + h_y + τ).
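To make the overall structure concrete (backward layers, the zero "initial"
condition, boundary values), here is a self-contained toy analogue with one
phase variable; the drift, running cost, control set, and boundary treatment
are illustrative assumptions, not the model of this section.

    # A toy 1-D analogue of the backward sweep (7.2.38)-(7.2.41) for
    # F_t + min_u [r(x,u) F_x + c(x,u)] = 0 with F(T, x) = 0; r, c, the
    # control set, and the boundary treatment are assumptions.
    import numpy as np

    Nx, N = 40, 1000
    x_max, T = 2.0, 5.0
    h, tau = x_max / Nx, T / N       # here tau * max|r| <= h holds
    x = h * np.arange(Nx + 1)
    controls = (-1.0, 1.0)           # admissible control values (assumed)
    r = lambda xi, u: u - 0.5 * xi   # drift (assumed)
    c = lambda xi, u: xi**2 + 0.1 * abs(u)   # running cost (assumed)

    F = np.zeros(Nx + 1)             # zero "initial" (terminal) layer
    for k in range(N):               # backward over the time layers
        F_new = F.copy()
        for i in range(1, Nx):
            best = min(
                (r(x[i], u) * ((F[i+1] - F[i]) / h if r(x[i], u) > 0
                               else (F[i] - F[i-1]) / h) + c(x[i], u))
                for u in controls)
            F_new[i] = F[i] + tau * best
        # Crude stand-in for the boundary conditions (7.2.41):
        F_new[0], F_new[Nx] = F_new[1], F_new[Nx - 1]
        F = F_new
    print(F[Nx // 2])                # loss at the midpoint, time t = 0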
Calculations according to formulas (7.2.38)-(7.2.41) were performed on
a computer, and some numerical results are shown in Figs. 61-64.
Figure 61 shows the position of the switching lines (7.2.12) on the phase
plane (x, y) for different values of the "reverse" time ρ = T - t. The
curves in Fig. 61 were constructed for the problem parameters b = u_m =
0.5 and the parameters h_x = h_y = 0.1, τ = 0.01, and N_x = N_y = 20
of the grid (7.2.35). Curves 1-5 correspond to the values of the reverse
time ρ = 1.5, 2.5, 3.5, 5.0, 7.0, respectively. The dashed line in Fig. 61
indicates the segment of the line (7.2.29) that is the part of the switching
line corresponding to the sliding mode of control in the limit case ρ =
T - t → ∞. Figures 62 and 63 show similar results for the maximum
values u_m = 1.0 and u_m = 1.5 of the admissible control. Curves 1-3 in
Figs. 62 and 63 are the switching lines corresponding to three values of the
reverse time ρ = 3.5, 6.0, 12.0. Figure 64 illustrates the variation of the
loss function F(t, x, y) along a part of the line (7.2.29) for different time
moments. The dotted line in Fig. 64 shows the stationary loss function
(7.2.34).

Figures 61-64 show that the results of the numerical solution of Eq. (7.2.22)
(and of the synthesis problem) as ρ → ∞ allow us to study the passage
to the stationary control of population sizes. Moreover, these data confirm
the results of the theoretical analysis of the stationary mode carried out in
Section 7.2.3.

We also point out that the nonstationary u_*(t, x, y) and the stationary
u_*(x, y) = lim_{ρ→∞} u_*(t, x, y) algorithms of optimal control, obtained by
solving the Bellman equation (7.2.22) numerically, were used for the nu-
merical simulation of transient processes in system (7.2.3) when the com-
parative analysis of different control algorithms was carried out. The results
of this simulation and comparative analysis were discussed in §5.2.
CONCLUSION

Design methods that use the frequency approach to the analysis and
synthesis of control systems [119-121, 146, 147] are widely applied in mod-
ern control engineering. Based on such notions as the transfer functions
of open- or closed-loop systems, these methods allow one to evaluate the
control quality by the position of zeros and poles of these transfer functions
in the frequency domain. The frequency methods are very illustrative and
effective in studying linear feedback control systems.
As for the methods for calculating optimal (suboptimal) control algo-
rithms in the state space that are presented in this book, modern engineer-
ing most frequently deals with results obtained by solving problems of linear
quadratic optimization, which lead to linear optimal control systems.
By now linear quadratic problems of optimal control have been studied
comprehensively, and the literature on this subject is quite extensive; there-
fore these problems are only briefly outlined here. It should be noted that
the practical realization of linear optimal systems often involves difficul-
ties, as one needs to solve the matrix-valued Riccati equation and to use
the solution of this equation in real time. These problems are discussed
in [47, 126, 134, 149, 150].
It is well known that a large number of practically important problems
of optimal control cannot be reduced to linear quadratic problems. In par-
ticular, this is true for control problems in which constraints imposed on
the values of the admissible control play an important role. Despite their
practical importance, there is currently no universal approach to solving
these constrained optimal control problems in a form that ensures a simple
technical realization of the optimal control algorithm. The author hopes
that the results obtained in this book will help to develop new engineer-
ing methods for solving such problems by using constructive methods for
solving the Bellman equations.
Some remarks concerning the prospects for solving applied problems of
optimal control on the basis of the dynamic programming approach should
be made.
The existing methods of optimal control synthesis could be categorized
as exact, approximate analytic, and numerical. If a synthesis problem can
be solved exactly, then the optimal control algorithm can be written as a
finite formula obtained by analytically solving the corresponding Bellman
equation. Then the block C (the controller) in the functional diagram (see
Figs. 2 and 3) is a device simulating the analytic expression derived for the
optimal algorithm. Unfortunately, the Bellman equations can seldom be
solved exactly (as a rule, only for one-dimensional control problems).
The same holds in the case of linear quadratic problems, for which the dy-
namic programming approach only simplifies the procedure of solving the
synthesis problem by reducing the problem of solving a nonlinear partial
differential equation to solving a finite system of ordinary differential equa-
tions (a matrix-valued Riccati equation). In general, one could say that
intuition and conjecture are crucial in the search for exact solutions to the
Bellman equations. Therefore, the construction of exact solutions resembles
a kind of art rather than a formal scientific approach.¹ Thus, we cannot ex-
pect that exact synthesis methods would be widely used for solving actual
control problems. The "practical" value of exact solutions to Bellman equa-
tions (and to synthesis problems) is that they, as a rule, form the basis for
a family of approximate analytic synthesis methods, which in turn enable
one to find control algorithms close to optimal algorithms for a significantly
larger class of specific applied problems.
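For the linear quadratic case mentioned above, this reduction can be carried
out with off-the-shelf tools. The following sketch, with illustrative matrices
A, B, Q, R that are not taken from the text, integrates the matrix Riccati
equation backward from the zero terminal condition and recovers the
time-varying feedback gain of the optimal linear regulator.

    # A minimal sketch of the LQ reduction: integrate the matrix Riccati
    # equation dP/dt = -(A'P + PA - P B R^{-1} B'P + Q), P(T) = 0, for
    # illustrative (assumed) matrices A, B, Q, R.
    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0], [0.0, -0.5]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    T = 5.0

    def rhs(s, p):
        # In reverse time s = T - t the terminal condition becomes an
        # initial one: dP/ds = A'P + PA - P B R^{-1} B'P + Q, P(0) = 0.
        P = p.reshape(2, 2)
        dP = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
        return dP.ravel()

    sol = solve_ivp(rhs, (0.0, T), np.zeros(4), dense_output=True)

    def gain(t):
        # Optimal feedback u_*(t, x) = -K(t) x, K(t) = R^{-1} B'P(t).
        P = sol.sol(T - t).reshape(2, 2)
        return np.linalg.solve(R, B.T @ P)

    print(gain(0.0))                 # gain matrix at the initial time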
The most common approximate synthesis methods employ various ver-
sions of the method of a small parameter and of the method of successive
approximations for solving the Bellman equation. On one hand, a large
variety of versions of asymptotic synthesis methods (described in this book
and by other authors, see [22, 33, 34, 56-58, 110]) is available, which allows
one to obtain solutions for many important classes of optimal control prob-
lems often encountered in practice. On the other hand, the asymptotic
synthesis methods usually possess a remarkable feature (demonstrated re-
peatedly in this book) that ensures their high effectiveness in practice.
Namely, quasioptimal control algorithms derived according to some scheme
with small parameters are often satisfactory even when the parameter sup-
posed to be small is in fact of a finite value comparable to the other
parameters of the problem. In the design of actual control systems, this
allows one to obtain reasonable control algorithms by introducing a purely
formal small parameter into the specific problem considered. Moreover, by
formally applying the method of a small parameter, it is often possible to
significantly improve various heuristic control algorithms commonly used
in engineering (a typical example of such an improvement is given in §6.1).
All this makes approximate synthesis methods based on the use of asymp-
totic methods for solving the Bellman equations one of the most promising
trends in the engineering design of optimal control systems.

¹A similar situation arises in the search for Liapunov functions in the theory of stability
[1, 29, 125, 129]. This fact was pointed out by T. Burton [29, p. 166]: ". . . Beyond any
doubt, construction of Liapunov functions is an art."
Another important branch of applied methods for solving problems of
optimal control is the development of numerical methods for solving the
Bellman equations (and synthesis problems). This field has recently re-
ceived much attention [10, 31, 48, 49, 53, 86, 104, 169]. The main benefit
of numerical synthesis methods is their high universality. It is worth noting
that numerical methods also play an important role in problems of evalu-
ating the performance index of quasioptimal control algorithms calculated
by other methods. Currently, the widespread use of numerical synthesis
methods in modern engineering is somewhat hampered by the following
two factors: (i) the approximation properties of discrete schemes for solving
some classes of Bellman equations still remain to be rigorously mathemat-
ically justified, and (ii) the calculation of grid functions requires a great
number of operations. All this makes it difficult to solve control problems
of higher dimension and those with unbounded phase space. However, one
must not consider these facts as an obstacle to using numerical methods
in engineering. Recent developments in numerical methods for solving the
Bellman equations and in the decomposition of multidimensional problems
[31], continuous advances in parallel computing, and the progress in com-
puter technology itself suggest that the numerical methods for the synthesis
of optimal systems will soon become a regular tool for all those dealing with
the design of actual control systems.
REFERENCES

1. V. N. Afanasiev, V. B. Kolmanovskii, and V. R. Nosov, Mathemati-
cal Theory of Control Systems Design, Dordrecht: Kluwer Academic
Publishers, 1996.
2. A. A. Andronov, A. A. Vitt, and S. E. Khaikin, Theory of Oscilla-
tions, Moscow: Fizmatgiz, 1971.
3. M. Aoki, Optimization of Stochastic Systems, New York-London:
Academic Press, 1967.
4. P. Appell et J. Kampé de Fériet, Fonctions hypergéométriques et
hypersphériques. Polynômes d'Hermite. Paris, 1926.
5. K. J. Åström, Introduction to Stochastic Control Theory. New
York: Academic Press, 1970.
6. K. J. Åström, Theory and Applications of Adaptive Control - a
Survey. Automatica-J. IFAC, 19: 471-486, 1983.
7. K. J. Åström, Adaptive control. In: Antoulas, ed., Mathematical
System Theory, Berlin: Springer, 1991, pp. 437-450.
8. K. J. Åström, Adaptive control around 1960. IEEE Control Sys-
tems, 16, No. 3: 44-49, 1996.
9. K. J. Åström and B. Wittenmark, A survey of adaptive control
applications. Proceedings 34th IEEE Conference on Decision and
Control, New Orleans, Louisiana, 1995, pp. 649-654.
10. M. Bardi, S. Bottacin, and M. Falcone, Convergence of discrete
schemes for discontinuous value functions of pursuit-evasion games.
In: G. J. Olsder, ed., New Trends in Dynamic Games and Applica-
tions, Basel-Boston: Birkhauser, 1995, pp. 273-304.
11. A. T. Bharucha-Reid, Elements of the Theory of Markov Processes
and Their Applications, New York: McGraw-Hill, 1960.
12. V. P. Belavkin, Optimization of quantum observation and control.
Proceedings of 9th IFIP Conference on Optimizations Techniques,
Warszawa, 1979, Springer, 1980, pp. 141-149.
13. V. P. Belavkin, Nondemolition measurement and control in quan-
tum dynamic systems. Proceedings of CISM Seminar on Informa-
tion Complexity and Control in Quantum Physics, Springer, 1987,
pp. 311-329.
14. R. Bellman, Dynamic Programming. Princeton: Princeton Univer-
sity Press, 1957.
15. R. Bellman and E. Angel, Dynamic Programming and Partial Dif-
ferential Equations. New York: Academic Press, 1972.
16. R. Bellman, I. Gliksberg, and O. A. Gross, Some Aspects of the
Mathematical Theory of Control Processes. Santa Monica, Califor-
nia: Rand Corporation, 1958.
17. R. Bellman and R. Kalaba, Theory of dynamic programming and
feedback systems. Proceedings of 1st IFAC Congress, Theory of
Discrete, Optimal, and Self-Tuning Systems, Moscow: Akad. Nauk
USSR, 1961.
18. D. P. Bertsekas, Dynamic Programming and Stochastic Control.
London: Academic Press, 1976.
19. N. N. Bogolyubov and Yu. A. Mitropolskii, Asymptotic Methods in
Nonlinear Oscillation Theory. Moscow: Fizmatgiz, 1974.
20. I. A. Boguslavskii, Navigation and Control under Incomplete Sta-
tistical Information. Moscow: Mashinostroenie, 1970.
21. I. A. Boguslavskii and A. V. Egorova, Stochastic optimal control of
motion with nonsymmetric constraints. Avtomat. i Telemekh., 33,
No. 8, 1972.
22. M. Y. Borodovskii, A. S. Bratus, and F. L. Chernous'ko, Optimal
pulse correction under random disturbances. Prikl. Mat. Mekh.,
39, No. 5, 1975.
23. N. D. Botkin and V. S. Patsko, Universal strategy in a differential
game with fixed terminal time. Problems Control Inform. Theory,
11, No. 6: 419-432, 1982.
24. A. E. Bryson and Y. C. Ho, Applied Optimal Control. Toronto-
London: Blaisdell, 1969.
25. B. M. Budak and S. V. Fomin, Multiple Integrals and Series. Mos-
cow: Nauka, 1965.
26. B. M. Budak, A. A. Samarskii, A. N. Tikhonov, Collection of Prob-
lems in Mathematical Physics. Moscow: Nauka, 1972.
27. B. V. Bulgakov, Oscillations. Moscow: Gostekhizdat, 1954.
28. R. Bulirsch and H. J. Pesch, The maximum principle, Bellman's
equation, and Caratheodory's work. J. Optim. Theory and Appl.,
80, No. 2: 203-229, 1994.
29. T. A. Burton, Volterra Integral and Differential Equations. New
York: Academic Press, 1983.
30. A. G. Butkovskii, Distributed Control Systems. New York: Else-
vier, 1969.
31. F. Camilli, M. Falcone, P. Lanucara, and A. Seghini, A domain
decomposition method for Bellman equations. In: D. E. Keyes
and J. Xu, eds., Domain Decomposition Methods in Scientific and
Engineering. Contemp. Math., Vol. 180, Providence: Amer. Math.
Soc., 1994, pp. 477-483.
32. F. L. Chernous'ko, Some problems of optimal control with a small
parameter. Prikl. Mat. Mekh., 32, No. 1, 1968.
33. F. L. Chernous'ko, L. D. Akulenko, and B. N. Sokolov, Control of
Oscillations. Moscow: Nauka, 1980.
34. F. L. Chernous'ko and V. B. Kolmanovskii, Optimal Control under
Random Disturbances. Moscow: Nauka, 1978.
35. C. W. Clark, Bioeconomic Modeling and Fisheries Managements.
New York: Wiley, 1985.
36. D. R. Cox and H. D. Miller, The Theory of Stochastic Processes.
Methuen, 1965.
37. M. L. Dashevskiy and R. S. Liptser, Analog modeling of stochastic
differential equations connected with change point problem. Av-
tomat. i Telemekh., 27, No. 4, 1966.
38. M. H. A. Davis and R. B. Vinter, Stochastic Modeling and Control.
London: Chapman and Hall, 1985.
39. M. H. DeGroot, Optimal Statistical Decisions. New York: McGraw-
Hill, 1970.
40. V. F. Dem'yanov, On minimization of maximal deviation. Vestnik
Leningrad Univ. Math., No. 7, 1966.
41. V. A. Ditkin and A. P. Prudnikov, Integral Transforms and Oper-
ational Calculus. Moscow: Fizmatgiz, 1961.
42. A. L. Dontchev, Error estimates for a discrete approximation to
constrained control problems. SIAM J. Numer. Anal., 18: 500-
514, 1981.
43. A. L. Dontchev, Perturbations, Approximations, and Sensitivity
Analysis of Optimal Control Systems. Lecture Notes in Control
and Inform. Sci., Vol. 52, Berlin: Springer, 1983.
44. J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
45. E. B. Dynkin, Markov Processes. Berlin: Springer, 1965.
46. S. V. Emel'yanov, ed., Theory of Variable-Structure Systems. Mos-
cow: Nauka, 1970.
47. C. Endrikat and I. Hartmann, Optimal design of discrete-time
MIMO systems in the frequency domain. Internat. J. Control,
48, No. 4: 1569-1582, 1988.
48. M. Falcone, Numerical solution of dynamic programming equations.
Appendix to the monograph by M. Bardi, I. Capuzzo Dolcetta, Op-
timal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman
Equations. Basel-Boston: Birkhauser, 1997.
49. M. Falcone and R. Ferretti, Convergence analysis for a class of semi-
Lagrangian advection schemes. SIAM J. Numer. Anal., 38, 1998.
50. A. A. Feldbaum, Foundations of the Theory of Optimal Automatic
Systems. Moscow: Nauka, 1966.
51. M. Feldman and J. Roughgarden, A population's stationary dis-
tribution and chance of extinction in stochastic environments with
remarks on the theory of species packing. Theor. Pop. Biol., 7,
No. 12: 197-207, 1975.
52. W. Feller, An Introduction to Probability Theory and Its Applica-
tions. New York: Wiley, 1970.
53. R. Ferretti, On a Class of Approximation Schemes for Linear Bound-
ary Control Problems. Lecture Notes in Pure and Appl. Math.,
Vol. 163, New York: Marcel Dekker, 1994.
54. A. F. Filippov, Differential Equations with Discontinuous Right-
Hand Sides. Dordrecht: Kluwer Academic Publishers, 1986.
55. W. H. Fleming, Some Markovian optimization problems. J. Math.
and Mech., 12 No. 1, 1963.
56. W. H. Fleming, Stochastic control for small noise intensities. SIAM
J. Control, 9, No. 3, 1971.
57. W. H. Fleming and M. R. James, Asymptotic series and exit time
probabilities. Ann. Probab., 20, No. 3: 1369-1384, 1992.
58. W. H. Fleming and R. W. Rishel, Deterministic and Stochastic
Optimal Control. Berlin: Springer, 1975.
59. W. H. Fleming and H. M. Soner, Controlled Markov Processes and
Viscosity Solutions. Berlin: Springer, 1993.
60. G. E. Forsythe, M. A. Malcolm, and C. B. Moler, Computer Meth-
ods for Mathematical Computation. Englewood Cliffs, N.J.: Pren-
tice Hall, 1977.
61. A. Friedman, Partial Differential Equations of Parabolic Type. En-
glewood Cliffs, N.J.: Prentice Hall, 1964.
62. F. R. Gantmacher, The Theory of Matrices. Vol. 1, New York:
Chelsea, 1964.
63. I. M. Gelfand, Generalized Stochastic Processes. Dokl. Akad. Nauk
SSSR, 100, No. 5, 1955.
64. I. M. Gelfand and S. V. Fomin, Calculus of Variations. Moscow:
Fizmatgiz, 1961.
65. I. M. Gelfand and G. I. Shilov, Generalized Functions and Their
Calculations. Moscow: Fizmatgiz, 1959.
66. I. I. Gikhman and A. V. Skorokhod, The Theory of Stochastic Pro-
cesses. Berlin: Springer, Vol. 1, 1974; Vol. 2, 1975.
67. B. V. Gnedenko, Theory of Probabilities. Moscow: Nauka, 1969.
68. B. S. Goh, Management and Analysis of Biological Populations.
Amsterdam: Elsevier Sci., 1980.
69. L. S. Goldfarb, On some nonlinearities in automatic regulation sys-
tems. Avtomat. i Telemekh., 8, No. 5, 1947.
70. L. S. Goldfarb, Research method for nonlinear regulation systems
based on harmonic balance principle. In: Theory of Automatic
Regulation, Moscow: Mashgiz, 1951.
71. E. Goursat, Cours d'Analyse Mathématique. Vol. 3, Paris: Gauthier-
Villars, 1927.
72. R. Z. Hasminskii, Stochastic Stability of Differential Equations.
Alphen: Sijthoff and Noordhoff, 1980.
73. G. E. Hutchinson, Circular control systems in ecology. Ann. New
York Acad. Sci., 50, 1948.
74. A. M. Il'in, A. S. Kalashnikov, and O. A. Oleynik, Second-order
parabolic linear equations. Uspekhi Mat. Nauk, 17, No. 3, 1962.
75. K. Ito, Stochastic integral. Proc. Imp. Acad., Tokyo, 20, 1944.
76. K. Ito, On a formula concerning stochastic differentials. Nagoya
Math. J., 3: 55-65, 1951.
77. E. Jahnke, F. Emde, and F. Lösch, Tafeln höherer Funktionen. Stutt-
gart: Teubner, 1960.
78. R. E. Kalman, On the general theory of control systems. In: Proceed-
ings of the 1st IFAC Congress, Vol. 2, Moscow: Akad. Nauk SSSR,
1960.
79. R. E. Kalman and R. S. Bucy, New results in linear filtering and
prediction theory. Trans. ASME Ser. D (J. Basic Engineering), 83:
95-108, 1961.
80. L. I. Kamynin, Methods of heat potentials for a parabolic equation
with discontinuous coefficients. Siberian Math. J., 4, No. 5, 1963.
81. L. I. Kamynin, On existence of boundary problem solution for par-
abolic equations with discontinuous coefficients. Izv. Akad. Nauk
SSSR Ser. Mat., 28, No. 4, 1964.
82. V. A. Kazakov, Introduction to the Theory of Markov Processes
and Radio Engineering Problems. Moscow: Sovetskoe Radio, 1973.
83. M. Kimura, Some problems of stochastic processes in genetics. Ann.
Math. Statist., 28: 882-901, 1957.
84. V. B. Kolmanovskii, On approximate synthesis of some stochastic
systems. Avtomat. i Telemekh., 36, No. 1, 1975.
85. V. B. Kolmanovskii, Some time-optimal control problems for sto-
chastic systems. Problems Control Inform. Theory, 4, No. 4, 1975.
86. V. B. Kolmanovskii and G. E. Kolosov, Approximate and numerical
methods to design optimal control of stochastic systems. Izv. Akad.
Nauk SSSR Tekhn. Kibernet., No. 4: 64-79, 1989.
87. V. B. Kolmanovskii and A. D. Myshkis, Applied Theory of Func-
tional Differential Equations. Dordrecht: Kluwer Academic Pub-
lishers, 1992.
88. V. B. Kolmanovskii and V. R. Nosov, Stability of Functional Dif-
ferential Equations. London: Academic Press, 1986.
89. V. B. Kolmanovskii and L. E. Shaikhet, Control of Systems with Af-
tereffect. Transl. Math. Monographs, Vol. 157, Providence: Amer.
Math. Soc., 1996.
90. V. B. Kolmanovskii and A. K. Spivak, Time-optimal control in a
predator-prey system. Prikl. Mat. Mekh., 54, No. 3: 502-506,
1990.
91. A. N. Kolmogorov and S. V. Fomin, Elements of Function Theory
and Functional Analysis. Moscow: Nauka, 1968.
92. G. E. Kolosov, Synthesis of statistical feedback systems optimal
with respect to different performance indices. Vestnik Moskov.
Univ. Ser. 111, No. 1: 3-14, 1966.
93. G. E. Kolosov, Optimal control of quasiharmonic plants under in-
complete information about the current values of phase variables.
Avtomat. i Telemekh., 30, No. 3: 33-41, 1969.
94. G. E. Kolosov, Some problems of optimal control of Markov plants.
Avtomat. i Telemekh., 35, No. 2: 16-24, 1974.
95. G. E. Kolosov, Analytical solution of problems in synthesis of opti-
mal distributed-parameter control systems subject to random per-
turbations. Automat. Remote Control, No. 11: 1612-1622, 1978.
96. G. E. Kolosov, Synthesis of optimal stochastic control systems by
the method of successive approximations. Prikl. Mat. Mekh., 43,
No. 1: 7-16, 1979.
97. G. E. Kolosov, Approximate synthesis of stochastic control systems
with random parameters. Avtomat. i Telemekh., 43, No. 6: 107-
116, 1982.
98. G. E. Kolosov, Approximate method for design of stochastic adap-
tive optimal control systems. In: G. S. Ladde and M. Sambandham,
eds., Proceedings of Dynamic Systems and Applications, Vol. 1,
1994, pp. 173-180.
99. G. E. Kolosov, On a problem of population size control. Izv. Ross.
Akad. Nauk Teor. Sist. Upravlen., No. 2: 181-189, 1995.
100. G. E. Kolosov, Numerical analysis of some stochastic suboptimal
controlled systems. In: Z. Deng, Z. Liang, G. Lu, and S. Ruan,
eds., Differential Equations and Control Theory. Lecture Notes in
Pure and Appl. Math., Vol. 176, New York: Marcel Dekker, 1996,
pp. 143-148.
101. G. E. Kolosov, Exact solution of a stochastic problem of optimal
control by population size. Dynamic Systems and Appl., 5, No. 1:
153-161, 1996.
102. G. E. Kolosov, Size control of a population described by a stochastic
logistic model. Automat. Remote Control, 58, No. 4: 678-686,
1997.
103. G. E. Kolosov and D. V. Nezhmetdinova, Stochastic problems of op-
timal fisheries managements. In: Proceedings of the 15th IMACS
Congress on Scientific Computation. Modelling and Applied Math-
ematics, Vol. 5, Berlin: Springer, 1997, pp. 15-20.
104. G. E. Kolosov and M. M. Sharov, Numerical method of design of
stochastic optimal control systems. Automat. Remote Control, 49,
No. 8: 1053-1058, 1988.
105. G. E. Kolosov and M. M. Sharov, Optimal damping of population
size fluctuations in an isolated "predator-prey" ecological system.
Automation and Remote Control, 53, No. 6: 912-920, 1992.
106. G. E. Kolosov and M. M. Sharov, Optimal control of population
sizes in a predator-prey system. Approximate design in the case
of an ill-adapted predator. Automat. Remote Control, 54, No. 10:
1476-1484, 1993.
107. G. E. Kolosov and R. L. Stratonovich, An asymptotic method for
solution of the problems of optimal regulators design. Avtomat. i
Telemekh., 25, No. 12: 1641-1655, 1964.
108. G. E. Kolosov and R. L. Stratonovich, On optimal control of quasi-
harmonic systems. Avtomat. i Telemekh., 26, No. 4: 601-614, 1965.
109. G. E. Kolosov and R. L. Stratonovich, Asymptotic method for so-
lution of stochastic problems of optimal control of quasiharmonic
systems. Avtomat. i Telemekh., 28, No. 2: 45-58, 1967.
110. N. N. Krasovskii and E. A. Lidskii, Analytical design of regulators
in the systems with random properties. Avtomat. i Telemekh., 22,
No. 9-11, 1961.
111. N. N. Krasovskii, Theory of the Control of Motion. Moscow: Nauka,
1968.
112. V. F. Krotov, Global Methods in Optimal Control Theory. New
York: Marcel Dekker, 1996.
113. N. V. Krylov, Controlled Diffusion Process. New York: Springer,
1980.
114. S. I. Kumkov and V. S. Patsko, Information sets in the problem of
pulse control. Avtomat. i Telemekh., 22, No. 7: 195-206, 1997.
115. A. B. Kurzhanskii, Control and Observation under Uncertainty.
Moscow: Nauka, 1977.
116. H. J. Kushner and A. Schweppe, Maximum principle for stochastic
control systems. J. Math. Anal. Appl., No. 8, 1964.
117. H. J. Kushner, Stochastic Stability and Control. New York-London:
Academic Press, 1967.
118. H. J. Kushner, On the optimal control of a system governed by a
linear parabolic equation with white noise inputs. SIAM J . Control,
6, No. 4, 1968.
119. H. Kwakernaak, The polynomial approach to H∞ optimal regu-
lation. In: E. Mosca and L. Pandolfi, eds., H∞-Control Theory,
Como, 1990. Lecture Notes in Math., Vol. 1496, Berlin: Springer,
1991.
120. H. Kwakernaak, Robust control and H∞-optimization. Automati-
ca-J. IFAC, 29, No. 2: 255-273, 1993.
121. H. Kwakernaak, Symmetries in control system design. In: Alberto
Isidori, ed., Trends in Control, A European Perspective, Rome.
Berlin: Springer, 1995.
122. H. Kwakernaak and R. Sivan, Linear Optimal Control Systems.
New York-London: Wiley, 1972.
123. J. P. La Salle, The time-optimal control problem. In: Contribution
to Differential Equations, Vol. 5, Princeton, N.J.: Princeton Univ.
Press, 1960.
124. O. Ladyzhenskaya, V. Solonnikov, and N. Uraltseva, Linear and
Quasilinear Equations of Parabolic Type. Transl. Math. Mono-
graphs, Vol. 23, Providence: Amer. Math. Soc., 1968.
125. V. Lakshmikantham, S. Leela and A. A. Martynyuk, Stability Anal-
ysis of Nonlinear Systems. New York: Marcel Dekker, 1988.
126. P. Lancaster and L. Rodman, Solutions of the continuous and dis-
crete time algebraic Riccati equations. In: S. Bittanti, A. J. Laub,
and J. G. Willems, eds., The Riccati Equation. Berlin: Springer,
1991.
127. P. Langevin, Sur la théorie du mouvement brownien. Comptes Ren-
dus Acad. Sci. Paris, 146, No. 10, 1908.
128. E. B. Lee and L. Marcus, Foundations of Optimal Control Theory.
New York-London: Wiley, 1969.
129. X. X. Liao, Mathematical Theory and Application of Stability,
Wuhan, China: Huazhong Normal Univ. Press, 1988.
130. J . L. Lions, Optimal Control of Systems Governed by Partial Dif-
ferential Equations. Berlin: Springer, 1971.
131. R. S. Liptser and A. N. Shiryaev, Statistics of conditionally Gauss-
ian random sequences. In: Proc. of the 6th Berkeley Symp. of
Mathem. Statistics and Probability. University of California, 1970.
132. R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes.
Berlin: Springer, Vol. 1, 1977 and Vol. 2, 1978.
133. A. J. Lotka, Elements of Physical Biology. Baltimore: Williams and
Wilkins, 1925.
134. R. Luttmann, A. Munack, and M. Thoma, Mathematical modelling,
parameter identification, and adaptive control of single cell protein
processes in tower loop bioreactors. In: Advances in Biochemical
Engineering, Biotechnology, Vol. 32, Berlin-Heidelberg: Springer,
1985, pp. 95-205.
135. G. I. Marchuk, Methods of Numerical Mathematics. New York-
Berlin: Springer, 1975.
136. N. N. Moiseev, Asymptotical Methods of Nonlinear Analysis. Mos-
cow: Nauka, 1969.
137. N. N. Moiseev, Foundations of the Theory of Optimal Systems.
Moscow: Nauka, 1975.
138. B. S. Mordukhovich, Approximation Methods in Problems of Opti-
mization and Control. Moscow: Nauka, 1988.
139. V. M. Morozov and I. N. Kalenkova, Estimation and Control in
Nonstationary Systems. Moscow: Moscow State Univ. Press, 1988.
140. E. M. Moshkov, On accuracy of optimal control of terminal condi-
tion. Prikl. Mat. i Mekh., 34, No. 3, 1970.
142. J. D. Murray, Lectures on Nonlinear-Differential-Equation Models in
Biology. Oxford: Clarendon Press, 1977.
143. G. V. Obrezkov and V. D. Razevig, Methods of Analysis of Tracking
Breakdowns. Moscow: Sovetskoe Radio, 1972.
144. O. A. Oleynik, Boundary problems for linear elliptic and parabolic
equations with discontinuous coefficients. Izv. Akad. Nauk SSSR
Ser. Mat., 25, No. 1, 1961.
145. V. S. Patsko, et al., Control of an aircraft landing in windshear. J.
Optim. Theory and Appl., 83, No. 2: 237-267, 1994.
146. A. E. Pearson, Y. Shen, and J. Q. Pan, Discrete frequency formats
for linear differential system identification. In: Proc. of 12th World
Congress IFAC, Sydney, Australia, Vol. VII, 1993, pp. 143-148
147. A. E. Pearson and A. A. Pandiscio, Control of time lag systems
via reducing transformations. In: Proc. of 15th IMACS World
Congress. A. Sydow, ed., Systems Engineering, Vol. 5, Berlin: Wis-
senschaft & Technik, 1997, pp. 9-14.
148. A. A. Pervozvanskii, On minimum of maximal deviation of con-
trolled linear system. Izv. Acad. Nauk SSSR Mekhanika, No. 2,
1965.
149. H. J. Pesch, Real-time computation of feedback controls for con-
strained extremals (Part 1: Neighboring extremals; Part 2: A cor-
rection method based on multiple shooting). Optimal Control Appl.
Methods, 10, No. 2: 129-171, 1989.
150. H. J. Pesch, A practical guide to the solution of real-life optimal
control problems. Control Cybernet., 23, No. 1 and 2: 7-60, 1994.
151. A. B. Piunovskiy, Optimal control of stochastic sequences with con-
straints. Stochastic Anal. Appl., 15, No. 2: 231-254, 1997.
152. A. B. Piunovskiy, Optimal Control of Random Sequences in Prob-
lems with Constraints. Dordrecht: Kluwer Academic Publishers,
1997.
153. H. Poincaré, Sur le problème des trois corps et les équations de la
dynamique. Acta Math., 13, 1890.
154. H. Poincaré, Les Méthodes Nouvelles de la Mécanique Céleste.
Paris: Gauthier-Villars, 1892-1899.
155. I. I. Poletayeva, Choice of optimality criterion. In: Engineering
Cybernetics, Moscow: Nauka, 1965.
156. L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and
E. F. Mischenko, The Mathematical Theory of Optimal Processes.
New York: Interscience, 1962.
157. Yu. V. Prokhorov and Yu. A. Rozanov, Probability Theory, Founda-
tions, Limit Theorems, and Stochastic Processes. Moscow: Nauka,
1967.
158. N. S. Rao and E. O. Roxin, Controlled growth of competing species.
SIAM J. Appl. Math., 50, No. 3: 853-864, 1990.
159. V. I. Romanovskii, Discrete Markov Chains. Moscow: Gostekhiz-
dat, 1949.
160. Yu. A. Rozanov, Stochastic Processes. Moscow: Nauka, 1971.
161. A. P. Sage and J. L. Melsa, Estimation Theory with Applications
to Communication and Control. New York: McGraw-Hill, 1971.
162. A. A. Samarskii, Introduction to Theory of Difference Schemes.
Moscow: Nauka, 1971.
163. A. A. Samarskii and A. V. Gulin, Numerical Methods. Moscow:
Nauka, 1989.
164. M. S. Sholar and D. M. Wiberg, Canonical equation for bound-
ary feedback control of stochastic distributed parameter systems.
Automatica-J. IFAC, 8, 1972.
165. H. L. Smith, Competitive coexistence in an oscillating chemostat.
SIAM J. Appl. Math., 40, No. 3: 498-552, 1981.
166. S. L. Sobolev, Equations of Mathematical Physics. Moscow: Nauka,
1966.
167. Yu. G. Sosulin, Theory of Detection and Estimation of Stochastic
Signals. Moscow: Sovetskoe Radio, 1978.
168. J. Song and J. Yu, Population System Control. Berlin: Springer,
1987.
169. J. Stoer, Principles of sequential quadratic programming methods
for solving nonlinear programs. In: K. Schittkowski, ed., Computa-
tional Mathematical Programming. NATO ASI Series, F15, 1985,
pp. 165-207.
170. R. L. Stratonovich, Application of Markov processes theory for op-
timal filtering of signals. Radiotekhn. i Elektron., 5, No. 11, 1960.
171. R. L. Stratonovich, On the optimal control theory. Sufficient coor-
dinates. Avtomat. i Telemekh., 23, No. 7, 1962.
172. R. L. Stratonovich, On the optimal control theory. Asymptotic
method for solving the diffusion alternative equation. Avtomat. i
Telemekh., 23, No. 11, 1962.
173. R. L. Stratonovich, Topics in the Theory of Random Noise. New
York: Gordon and Breach, Vol. 1, 1963 and Vol. 2, 1967.
174. R. L. Stratonovich, New form of stochastic integrals and equations.
Vestnik Moskov. Univ. Ser. I Mat. Mekh., No. 1, 1964.
175. R. L. Stratonovich, Conditional Markov Processes and Their Ap-
plication to the Theory of Optimal Control. New York: Elsevier,
1968.
176. R. L. Stratonovich and V. I. Shmalgauzen, Some stationary prob-
lems of dynamic programming, Izv. Akad. Nauk SSSR Energetika
i Avtomatika, No. 5, 1962.
177. Y. M. Svirezhev, Nonlinear Waves, Dissipative Structures, and Ca-
tastrophes in Ecology. Moscow: Nauka, 1987.
178. G. W. Swan, Role of optimal control theory in cancer chemotherapy,
Math. Biosci., 101: 237-284, 1990.
179. A. N. Tikhonov and A. A. Samarskii, Equations of Mathematical
Physics. Moscow: Nauka, 1972.
180. V. I. Tikhonov, Phase small adjustment of frequency in presence of
noises. Avtomat. i Telemekh., 21, No. 3, 1960.
181. V. I. Tikhonov and M. A. Mironov, Markov Processes. Moscow:
Sovetskoe Radio, 1977.
182. S. G. Tzafestas and J. M. Nightingale, Optimal control of a class
of linear stochastic distributed parameter systems. Proc. IEE, 115,
No. 8, 1968.
183. B. van der Pol, A theory of the amplitude of free and forced triode
vibration. Radio Review, 1, 1920.
184. B. van der Pol, Nonlinear theory of electrical oscillations. Proc.
IRE, 22, No. 9, 1934.
185. B. L. van der Waerden, Mathematische Statistik. Berlin: Springer,
1957.
186. V. Volterra, Variazioni e fluttuazioni del numero d'individui in specie
animali conviventi. Mem. Acad. Lincei, 2: 31-113, 1926.
187. V. Volterra, Leçons sur la théorie mathématique de la lutte pour la
vie. Paris: Gauthier-Villars, 1931.
188. A. Wald, Sequential Analysis, New York: Wiley, 1950.
189. K. E. F. Watt, Ecology and Resource Management. New York:
McGraw-Hill, 1968.
190. B. Wittenmark and K. J. Åström, Practical issues in the implemen-
tation of self-tuning control. Automatica-J. IFAC, 20: 595-605,
1984.
191. E. Wong and M. Zakai, On the relation between ordinary and sto-
chastic differential equations. Internat. J. Engrg. Sci., 3, 1965.
192. E. Wong and M. Zakai, On the relation between ordinary and sto-
chastic differential equations and applications to stochastic prob-
lems in control theory. Proc. Third Intern. Congress IFAC, Lon-
don, 1966.
193. W. M. Wonham, On the separation theorem of stochastic control.
SIAM J. Control, 6 : 312-326, 1968.
194. W. M. Wonham, Random differential equations in control theory.
In: A. T. Bharucha-Reid, ed., Probabilistic Methods in Applied
Mathematics, Vol. 2, New York: Academic Press, 1970.
195. M. A. Zarkh and V. S. Patsko, Strategy of second payer in the linear
differential game. Prikl. Math. Mekh., 51, No. 2: 193-200, 1987.
INDEX

Adaptive problems of optimal control, 9
A posteriori covariances, 90
A posteriori mean values, 91
Asymptotic series, 220
Asymptotic synthesis method, 248

Bellman equation, 47, 51
  differential, 63
  functional, 278
  integro-differential, 74
  stationary, 67
Bellman optimality principle, 49
Brownian motion, 33

Capacity of the medium, 124
Cauchy problem, 9
Chapman-Kolmogorov equation, 23
Constraints, control, 17
  on control resources, 17
  on phase variables, 18
Control, admissible, 9
  bang-bang, 105
  boundary, 212
  distributed, 201
  program, 2
  of relay type, 105, 111
Control problem with infinite horizon, 343
Controller, 1, 7
Cost function (functional), 49
Covariance matrix, 147

Diffusion process, 27
Dynamic programming approach, 47

Equations, Langevin, 45
  logistic, 124
  of a single population, 342
  stochastic differential, 32
  truncated, 253
Error, stationary tracking, 67, 226
Error signal, 104
Estimate, of approximate synthesis, 182
  of unknown parameters, 316
Euler equation, 136

Feedback control system, 2
Filippov generalized solution, 12
Fokker-Planck equation, 29
Functional, cost, 19
  quadratic, 93, 99

Gaussian, conditionally, 313
  probability density, 92
  process, 20

Hutchinson model, 125

Integral criterion, 14
Ito, equation, 42
  stochastic integral, 37

Kalman filter, 91
Kolmogorov, backward equation, 25
  forward equation, 25
Krylov-Bogolyubov method, 254

Loss function, 49
Lotka-Volterra, equation, 125
  normalized model, 274, 368

Malthus model, 123
Markov, process, 21
  conditional, 79
  continuous, 25
  discrete, 22
  strictly discontinuous, 31
Mathematical expectation, 15
  conditional, 60
Matrix, fundamental, 177
Method, alternating direction, 378
  grid function, 356
  small parameter, 220
  of successive approximation, 143
  sweep, 364
Model, stochastic logistic, 126, 311

Natural growth factor, 124
Nonvibrational amplitude, 254
  phase, 254

Optimal, damping of random oscillations, 276
  fisheries management, 133, 342
Optimality criterion, 2, 13
  terminal, 14
Oscillator, quasiharmonic, 248
Oscillatory systems, 247

Performance index, 2
Plant, 1, 7
Plant with distributed parameters, 199
Poorly adapted predator, 267
Population models, 123
Predator-prey model, 125
Probability, density, 20
Problem, boundary-value, 70
  linear-quadratic (LQ-), 53
  optimal stabilization, 278
  with free endpoint, 48
Process, stochastic, 19

Regulator, 154
Riccati equation, 100

Sample path, 108
Scheme, lengthwise-transverse, 362
Screen, reflecting, 329
  absorbing, 333
Servomechanism, 7
Sliding mode, 12
Stationary operating conditions, 65
Sufficient coordinates, 75
Switch point, 105
Switching line, 156
Symmetrized (Stratonovich) stochastic integral, 40
Synthesis, numerical, 355
Synthesis problem, 7

Transition probability, 22

Van der Pol method, 254
Van der Pol oscillator, 252

White noise, 19
Wiener random process, 33
