OPTIMAL DESIGN OF CONTROL SYSTEMS
Stochastic and Deterministic Problems
MONOGRAPHS AND TEXTBOOKS IN
PURE AND APPLIED MATHEMATICS
G. E. Kolosov
Moscow University of
Electronics and Mathematics
Moscow, Russia
MARCEL DEKKER, INC.
Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540
The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.
the book. This makes the book accessible to a wide circle of students and
specialists who are interested in applications of optimal control theory.
G. E. Kolosov
CONTENTS
Preface
Introduction
Conclusion
References
Index
INTRODUCTION
I[x(t)] attains its extremum value (in this case, the extremum type (minimum or maximum) is determined by the character of the control problem).
The functional I[x(t)] used for estimating the control quality is often called
the optimality criterion or the performance index of the control system
designed.
If there are no random actions on the system, the problem of finding the optimal trajectory x_*(t) amounts to finding the optimal control program {u_*(t): 0 ≤ t ≤ T} that ensures the plant motion along the extremum trajectory {x_*(t): 0 ≤ t ≤ T}. The optimal control u_*(t) can be calculated by using methods of classical calculus of variations [64], or, in more general situations, Pontryagin's maximum principle [156], or various approximate methods [138] based on these two fundamental approaches. Different methods for calculating the optimal control programs are discussed in [137].
If an optimal control system is constructed without considering stochastic effects, then the system can be open (as in Fig. 1), since the plant trajectory {x(t): 0 ≤ t ≤ T} and hence the value of the optimality criterion I[x(t)] are determined uniquely for a chosen realization {u(t): 0 ≤ t ≤ T} of control actions. (Needless to say, the equation of the plant is assumed to have a unique solution for a given initial state x(0) = x_0 and a given input function u(t).)
for 0 ≤ t ≤ T. However, the stochastic nature of the assigning action (command signal) y(t) on one side, and the inertial properties of the plant P on the other side, do not allow us to ensure the required identity between the input and output parameters. Therefore, a problem of optimal control arises in a natural way.
Hence, just as in the deterministic case, the optimality criterion I[|y(t) − x(t)|] is introduced, which is a measure of the "distance" between the functions y(t) and x(t) on the time interval 0 ≤ t ≤ T. The final statement of the problem depends on the type of assumptions on the properties of the assigning action y(t). Throughout this book, we use the probability description for all random actions on the system. This means that all assigning actions are treated as random functions with known (completely or partially) probability characteristics. In this approach, the optimal control law that determines the structure of the block C can be found from the condition that the mean value of the criterion I[|y(t) − x(t)|] attains its minimum. Another approach, in which the regions of admissible values of perturbations rather than their probability characteristics are specified and the optimal system is constructed by methods of game theory, is described in [23, 114, 115, 145, 195].
If the servomechanism shown in Fig. 2 is significantly affected by noises
arising due to measurement errors, instability of voltage sources in electri-
cal circuits, varying properties of the medium surrounding the automatic
system, then the block diagram in Fig. 2 becomes more complicated and
can be of the form shown in Fig. 3.
In this book we do not consider control systems whose block diagrams are
more complicated than that shown in Fig. 3. All control systems studied
in the sequel are special cases of the system shown in Fig. 3.
The main emphasis of this book is on the methods for calculating the optimal control algorithms

u_*(t) = φ_*(t, x(t)),  (*)

which determine the structure of the controller C and guarantee the optimal behavior of the feedback control system shown in Fig. 3. Since the methods studied in this book are oriented to solving applied control problems in mechanics, engineering, and biology, much attention is paid to obtaining (*) in a form such that it can easily be used in practice. This means that all optimal control algorithms described in the book for specific problems are such that the functional (mapping) φ_* in (*) has either a finite analytic form or can be implemented by sufficiently simple standard modeling methods.
From the mathematical viewpoint, all problems of optimal control are related to finding a conditional extremum of a functional (the optimality criterion), i.e., are problems of calculus of variations [28, 58, 64, 137]. However, a distinguishing feature of many optimal control problems is that they are "nonclassical" due to restrictions imposed on the admissible values of controlling actions u(t). For instance, this often leads to discontinuous extremals inadmissible in the classical theory [64]. Therefore, problems of optimal control are usually solved by contemporary mathematical methods, the most important being the Pontryagin maximum principle [156] and the Bellman dynamic programming approach [14]. These methods develop and
generalize two different approaches to variational problems in the classical
theory: the Euler method and the Weierstrass variational principle used for
constructing a separate extremal and the Hamilton-Jacobi method based
on the consideration of the entire field of extremals, which leads to partial
differential equations for controlled systems with lumped parameters or to
equations with functional derivatives for controlled systems with distributed
parameters.
The maximum principle, which is a rigorously justified mathematical method, can be used in general for solving both deterministic and stochastic problems of optimal control [58, 116, 156]. However, this method, based on the consideration of individual trajectories of the control process, leads to certain technical difficulties when one needs to find the structure of the controller C in feedback stochastic systems (see Figs. 2 and 3). In this situation, the dynamic programming approach looks more attractive. This method, however, suffers from some flaws from the accuracy viewpoint (for example, it is well known that the Bellman differential equations cannot be
and (1.1.6) and solve some specific control problems for mechanical, technical, and biological objects. In the present chapter, we only discuss general
restrictions that we need to impose on the function g(.) in (1.1.5) and (1.1.6)
to obtain a well-posed mathematical statement of the problem of optimal
control synthesis.
The most important and, in fact, the only restriction on the function g(.)
is the existence of a unique solution to the Cauchy problem for Eqs. (1.1.5)
and (1.1.6) with any given control function u(t) chosen from a function
class that is called the class of admissible controls. This means that the
trajectory x(t) of system (1.1.5) or (1.1.6) is uniquely determined² on the time interval t_0 ≤ t ≤ t_0 + T by the initial state x(t_0) = x_0 and a chosen function {u(t): t_0 ≤ t ≤ t_0 + T}.
The uniqueness of the solution x(t) of system (1.1.5) with the initial condition x(t_0) = x_0 is guaranteed by well-known existence and uniqueness theorems for systems of ordinary differential equations [137]. The following theorem [156] presents very general sufficient conditions for the existence and uniqueness of the solution of system (1.1.5) with the initial condition x(t_0) = x_0 (the Cauchy problem).
THEOREM. Let a vector-function g(t, x, u) be continuous with respect to all variables (t, x, u) and continuously differentiable with respect to the components of the vector x = (x_1, ..., x_n), and let the vector-function u = u(t) be continuous with respect to time. Then there exists a number T > 0 such that a unique continuous vector-function x(t) satisfies system (1.1.5) with the initial condition x(t_0) = x_0 on the interval t_0 ≤ t ≤ t_0 + T.
If T → ∞, that is, if the domain of existence of the unique solution is arbitrarily large, then the solution of the Cauchy problem is said to be infinitely continuable to the right.
It should be noted that the functions g(·) and u need not be continuous with respect to t. The theorem remains valid for piecewise continuous and even for bounded functions g(·) and u that are measurable with respect to t. In the last case, the solution x(t): t_0 ≤ t ≤ t_0 + T of system (1.1.5) is an absolutely continuous function [91].
The assumption that the function g(·) is smooth with respect to the components of the vector x is much more essential. If this condition is not satisfied, then we can encounter situations in which system (1.1.5) does not have any solutions in the "common" classical sense (for example, for some initial vectors x(t_0) = x_0, it may be impossible to construct a function
(1.1.9)
where p is a given positive number, and |a| denotes the Euclidean norm of a vector a, that is, |a| = (Σ_{j=1}^n a_j²)^{1/2}. In an "ideal" servomechanism, the controlled output process is identically equal to the command signal, that is, x(t) ≡ y(t), 0 ≤ t ≤ T, and the functional (1.1.9) is equal to zero, which is the least possible value. In other cases, the value of (1.1.9) is a numerical estimate of the proximity between the input and output processes.
It may happen that much "effort" is required to ensure a sufficient proximity between the processes x(t) and y(t), that is, the control action u(t) needs to be large at the input of the plant P. However, it is undesirable to use too "large" controls in many actual devices both from the energy and economy viewpoints, as well as from reliability considerations. In these cases, instead of (1.1.9), it is better to use, for example, the cost functional

where a, q > 0 are some given numbers. This functional takes into account both the proximity between the output process x(t) and a given input process y(t) and the total "cost" of control on the time interval [0, T].
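(The display of this functional, (1.1.10), is lost in this copy. A form consistent with the description — a proximity term plus an integral penalty on the control, with given numbers a, q > 0 — would be, for instance,

I = ∫_0^T [ |y(t) − x(t)|^a + q |u(t)|² ] dt,

though the exact exponents and weights of (1.1.10) cannot be recovered from this copy.)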
Of course, the functionals (1.1.9) and (1.1.10) do not exhaust all methods for stating integral optimality criteria that are used in problems of synthesis of optimal servomechanisms (Fig. 2). The most general form
this criterion depends both on the transition process and on the terminal
state of the system.
If the worst (with respect to a chosen penalty function) state of the controlled system on a fixed time interval [0, T] is a crucial factor, then, instead of (1.1.11), we must use the criterion
³According to [78, 111], the system (1.1.5) is called totally controllable if for any two given vectors x_0 and x_1, there always exist a finite number T > 0 and an admissible control function u(t): 0 ≤ t ≤ T such that system (1.1.5) is transferred from a given initial state x(0) = x_0 to a given terminal state x(T) = x_1 during time T.
is a special case of (1.1.20) with c(·) ≡ 1 and ψ(·) ≡ 0; the value of this functional is equal to the mean time during which the point (x, y) comes to the boundary ∂D of the set D. If the criterion I_11 is used, then the goal of control depends on whether the initial state (x(0), y(0)) of the system is an interior point or, vice versa, (x(0), y(0)) ∈ R_{n+m} \ D. If (x(0), y(0)) ∈ D, then, as a rule, we need to maximize (1.1.21) (see §2.2); otherwise ((x(0), y(0)) ∉ D), the goal of control is to minimize (1.1.21). The last problem is a stochastic version of the time-optimal problem [1, 85].
The criteria I_1, ..., I_11 considered do not exhaust all possible statements of optimal control problems. The other statements can be found in the literature devoted to the control theory [1, 3, 5, 24, 34, 43, 58, 111, 112, 128, 156]. The choice of the criterion depends on practical needs, that is, on the special technical problem that arises at the design stage. It should be noted that in the mathematical approach to optimal systems more attention is paid to general problems of the qualitative theory (the existence and uniqueness of the solution, justification of the optimality principle,
where x and u are scalars. Suppose that by solving the synthesis problem, we obtain the optimal control algorithm of the relay type:

u_*(t, x) = u_0 sign(x − x_0(t)),   sign y = { +1, y > 0;  0, y = 0;  −1, y < 0 },  (1.1.29)
where x_0(t) is a given function of time. In this algorithm the control action instantaneously varies by a finite value when the difference (x − x_0(t)) changes sign. If an actual control device implementing (1.1.29) has some inertial properties (for example, it turns out that the absolute rate v of changing the control action is bounded by v_0), then it is more convenient to model such a system by a plant whose behavior is described by two phase coordinates x_1 = x and x_2 = u such that
∫_a^b f(t) δ(t − t_0) dt = { f(t_0), a < t_0 < b;  f(b)/2, t_0 = b;  f(a)/2, t_0 = a }
and hence, to realize this process we need to have a source of infinite power. Therefore, a process of the white noise type can be considered on some time interval [0, T] as the limit (as Δ → 0) model of a time sequence of independent identically distributed random variables ξ_i = ξ(t_i = iΔ) (i = 0, 1, ..., N, N = T/Δ) with probability density
for any instants of time t_1 < t_2 < ... < t_n from [0, T] and any subset G ⊂ X.

Formula (1.1.35) means that the probabilities of future values of Markov processes are completely determined by the last measured state of the process and are independent of the preceding states (the absence of aftereffect). Depending on whether the sets [0, T] and X are discrete or continuous, we distinguish discrete Markov sequences or Markov chains (the sets [0, T] and X are discrete), continuous sequences (the set [0, T] is discrete and the set X is continuous), and discrete Markov processes (the set [0, T] is continuous and the set X is discrete). But if the phase space X is continuous and the argument t of the stochastic process ξ(t) may take any value t ∈ [0, T], then we have the following two types of Markov processes: continuous (all sample paths of the process ξ(t): 0 ≤ t ≤ T are continuous functions of time with probability 1) and strictly discontinuous (all sample paths of the process ξ(t): 0 ≤ t ≤ T are step functions, while the moments and amplitudes of jumps are continuous random variables).
Discrete Markov processes. As was already noted, in this case the time is continuous, but the phase space X is discrete. We assume that the set X consists of finitely many elements x_1, ..., x_α, ..., x_m. At each instant of time t ∈ [0, T] (possibly, T → ∞), the process ξ(t) with probability P_α(t) takes one of the m possible values x_α, α = 1, ..., m. The transitions to new states are instantaneous and take place at random moments. Thus a sample path of the process ξ(t) is a step function of time as shown in Fig. 7. Suppose that the process is in the state ξ(t) = x_α at time t. Then, it follows from (1.1.35) that the probability of the event that the process comes to the state ξ(τ) = x_β at time τ > t depends only on t, τ, x_α, and x_β. The corresponding conditional probability
probability multiplication theorem [52, 67] and the Markov property of the process ξ(t) imply that for any t_1 < t_2 < ... < t_s and α, β, ..., ω = 1, ..., m the probability of the event {ξ(t_1) = x_α, ξ(t_2) = x_β, ..., ξ(t_s) = x_ω} can be expressed in terms of the functions P_α(t) and P_αβ(t, τ) as follows:

On the other hand, the functions P_α(t) and P_αβ(t, τ) can be obtained as solutions of some systems of ordinary differential equations.

Let us derive the corresponding equations for P_α(t) and P_αβ(t, τ). To this end, we first obtain the Chapman-Kolmogorov equation for the transition probabilities
Since

Σ_{γ=1}^m P{ξ(t) = x_α, ξ(σ) = x_γ, ξ(τ) = x_β} = P{ξ(t) = x_α, ξ(τ) = x_β},  (1.1.40)

we write the right-hand side of (1.1.40) in the form

and, substituting (1.1.39) and (1.1.41) into (1.1.40), obtain Eq. (1.1.38) after P_α(t) is canceled out.
To derive differential equations for P_αβ(t, τ) we need some local time characteristics of the Markov process ξ(t). If we assume that there is at most one change of the state of the process ξ(t) on a small time interval Δ,⁵ then for small τ − t we can write the transition probabilities P_αβ(t, τ) as follows:

⁵This is a well-known condition that the process ξ(t) is ordinary [157, 160], which means that the probability of two and more jumps of ξ(t) is equal to o(Δ) on a small time interval Δ.
P_α = 1 − exp{ −∫_t^{t+τ} λ_α(s) ds }.
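As a numerical aside (not from the book), the step-function sample paths and the exponential waiting times implied by the formula above can be simulated directly. In the sketch below the jump intensity λ is taken constant and the transition kernel uniform over the remaining states; the state values and all parameters are invented for the illustration.

```python
import random

def simulate_markov_chain(states, lam, T, x0, seed=0):
    """Sample path of a continuous-time Markov process on a finite set.

    Waiting times in each state are exponential with intensity lam, so
    P{jump before tau} = 1 - exp(-lam * tau); at each jump the new state
    is drawn uniformly from the other states (a toy transition kernel).
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    path = [(t, x)]                 # the path is a step function of time
    while True:
        t += rng.expovariate(lam)   # random moment of the next jump
        if t >= T:
            break
        x = rng.choice([s for s in states if s != x])
        path.append((t, x))
    return path

print(simulate_markov_chain([1, 2, 3], lam=2.0, T=5.0, x0=1))
```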
∫ p(x_1, ..., x_n; t_1, ..., t_n) dx_1 ⋯ dx_n = 1
It follows from the probability multiplication theorem for joint distributions (for any, not necessarily Markov, process) that

p(x_1, x_2, ..., x_n; t_1, t_2, ..., t_n) = p(x_1; t_1) p(x_2; t_2 | x_1; t_1) ⋯ p(x_n; t_n | x_1, ..., x_{n−1}; t_1, ..., t_{n−1}).  (1.1.51)
Substituting (1.1.56) into (1.1.53) and taking into account (1.1.54) and
(1.1.55), we obtain
Φ(u; x, σ) = E{ exp[ ju(ξ(τ) − ξ(σ)) ] | ξ(σ) = x }
In (1.1.63) we have used the well-known formal relation [41] for the delta-function:

+ (Δ/2) ∂²/∂y² [ B(y, τ − Δ) p(x, t; y, τ − Δ) ] + o(Δ).
∂p(x, t)/∂t = −(∂/∂x_i)[A_i(x, t) p(x, t)] + (1/2)(∂²/∂x_i∂x_j)[B_ij(x, t) p(x, t)].  (1.1.67)
The sums on the right-hand side of (1.1.67) are taken over twice repeated
indices. This means that these expressions are understood as
p(x, t; y, τ) = [1 − (τ − t) λ(x, t)] δ(y − x) + (τ − t) λ(x, t) π(x, y, t) + o(τ − t).  (1.1.68)
Just as previously, in (1.1.53) we first set σ = t + Δ, then we set σ = τ − Δ, and apply (1.1.68). Passing to the limit as Δ → 0 in (1.1.53), we obtain the following pair of integro-differential Feller equations for the transition probability:
∂p(x, t)/∂t = −λ(x, t) p(x, t) + ∫ λ(z, t) π(z, x, t) p(z, t) dz.  (1.1.71)
In this case, we mainly consider a special case of Eq. (1.2.1), that is, the scalar equation

ẋ = a(t, x) + σ(t, x) ξ(t).  (1.2.2)

The results obtained for (1.2.2) in what follows can readily be generalized to the general case (1.2.1), as well as to the controlled motion case (1.1.6). We shall do this at the end of the section.
If the stochastic process ξ(t) is the time derivative ξ(t) = η̇(t) = dη(t)/dt of some random function η(t), then multiplying (1.2.2) by dt we can write Eq. (1.2.2) in terms of differentials:
Suppose that ξ(t) in (1.2.2) is the standard white noise with characteristics (1.1.31). Then the stochastic process

if the intervals [t_1, t_2] and [t_3, t_4] have no common interior points. The process η(t) is called Brownian motion or a Wiener random process. One can show [66] that with probability 1 the realizations of this process are continuous but (in view of (1.1.31) and (1.2.5)) nowhere differentiable functions of time. Formula (1.2.7) illustrates these properties of the Wiener process.
Indeed, the order of the increment
lim_{Δ→0} Σ_{i=0}^{N−1} [η(t_{i+1}) − η(t_i)]² = t − t_0
E(Δη_i)⁴ = (2π(t_{i+1} − t_i))^{−1/2} ∫ x⁴ exp{ −x²/(2(t_{i+1} − t_i)) } dx = 3(t_{i+1} − t_i)².  (1.2.10)
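Both properties — the quadratic-variation identity Σ[η(t_{i+1}) − η(t_i)]² → t − t_0 and the fourth-moment formula (1.2.10) — are easy to check numerically. A quick sketch (not from the book; the step count and seed are arbitrary):

```python
import random

rng = random.Random(1)
t0, t, N = 0.0, 1.0, 100_000
dt = (t - t0) / N

# Independent Brownian increments: eta(t_{i+1}) - eta(t_i) ~ N(0, dt).
incs = [rng.gauss(0.0, dt ** 0.5) for _ in range(N)]

quad_var = sum(w * w for w in incs)
print(quad_var)                     # ~ t - t0 = 1.0

fourth_moment = sum(w ** 4 for w in incs) / N
print(fourth_moment, 3 * dt * dt)   # sample E(d eta)^4 vs 3 (dt)^2, cf. (1.2.10)
```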
Let us consider the random sum
(3) the limit in (1.2.13), (1.2.14) is understood as the mean square limit (recall that a random variable ξ is called the mean square limit of the sequence of random variables ξ_n
x(τ) = x(t_i) + [x(t_{i+1}) − x(t_i)] (τ − t_i)/(t_{i+1} − t_i) + o(Δ),  t_i ≤ τ ≤ t_{i+1},  (1.2.16)

(relations (1.2.16), (1.2.8), (1.2.13), and (1.2.14) are satisfied almost surely). From (1.2.16) and (1.2.15), we have
Let us calculate the limit on the right-hand side of (1.2.23). To this end, following [175], we consider both the Δ-decomposition t_i, 0 ≤ i ≤ N, of the integration domain [t_0, t] and a larger ε-decomposition t*_j, 0 ≤ j ≤ M < N, such that max_j (t*_{j+1} − t*_j) = ε > Δ. For each fixed ε-decomposition we define piecewise constant functions σ̄_ε(t) and x̄_ε(t) whose constant values on the j-th part of the ε-decomposition are given by the formulas
Obviously, we have

≤ Σ_{i=0}^{N−1} σ̄_ε(t_i) [η(t_{i+1}) − η(t_i)]².  (1.2.24)
The last inequality holds for any fixed ε-decomposition. Since the function σ(t, x) is continuously differentiable, we have

as ε → 0. Thus the first and the last sums in (1.2.25) have the same limits as ε → 0:
Now we return to (1.2.23) and obtain the following relation between the stochastic integral I_ν and the Ito integral I_0:

Thus we see that the following similar formula holds for any square integrable function Φ(τ, x(τ)) that is continuously differentiable with respect to both arguments (provided that the stochastic process x(t) satisfies Eq. (1.2.3)):
take into account the equality η(t_N) = η(t) and (1.2.8), sum up, and thus obtain
A(t, x) = lim_{Δ→0} (1/Δ) E{ x(t + Δ) − x(t) | x(t) = x },  (1.2.29)

B(t, x) = lim_{Δ→0} (1/Δ) E{ [x(t + Δ) − x(t)]² | x(t) = x }.  (1.2.30)
Since the process x(τ) and the function a(τ, x) are continuous, the mean value of the first integral in (1.2.31) can be written for small Δ as follows:

The result of averaging the second integral in (1.2.31) depends on the definition of the stochastic integral. If we have an Ito integral, then by definition (1.2.20) we have

(for any Δ, not necessarily small). This important property of Ito integrals follows from the fact that the increments of η(τ) are independent of the integrand σ(τ, x(τ)) (here σ is not an extrapolating function [132]). By the same reason, for any Δ, formulas (1.2.8) and (1.2.20) imply

E[ ∫_t^{t+Δ} σ(τ, x(τ)) d_0η(τ) ]² = E ∫_t^{t+Δ} σ²(τ, x(τ)) dτ.  (1.2.34)
From (1.2.29)-(1.2.34), we obtain the local characteristics

of the Markov process defined by (1.2.4) on the basis of the Ito integral. If the second integral in (1.2.31) is understood in the sense of (1.2.19), then formula (1.2.34) remains valid after the change d_0η(τ) → d_νη(τ), but, instead of (1.2.33), we obtain the following formula from (1.2.26) and (1.2.33):
In this case, the diffusion process x(t) has the other characteristics

A(t, x) = a(t, x) + ν σ(t, x) ∂σ(t, x)/∂x,
B(t, x) = σ²(t, x).  (1.2.37)
dx(t) = [ a(t, x(t)) + ν σ(t, x(t)) ∂σ(t, x(t))/∂x ] dt + σ(t, x(t)) d_0η(t)  (1.2.42)

(formulas (1.2.39), (1.2.41), and (1.2.42) readily follow from (1.2.35) and (1.2.37)).
From (1.2.38), (1.2.40)-(1.2.42) we can see that different forms of differ-
ential equations can readily be transformed into each other.
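The transformation between the forms can also be seen numerically on the classical example of ∫ η dη over one Brownian path: the Ito sum (integrand frozen at left endpoints) converges to (η²(T) − T)/2, while the symmetrized sum (ν = 1/2, midpoint values) converges to η²(T)/2, in agreement with the drift correction (1.2.37). A sketch, not from the book; step count and seed are arbitrary:

```python
import random

rng = random.Random(2)
T, N = 1.0, 100_000
dt = T / N

eta = [0.0]                          # one Wiener path on [0, T]
for _ in range(N):
    eta.append(eta[-1] + rng.gauss(0.0, dt ** 0.5))

# Ito sum: integrand taken at the left endpoint of each subinterval.
ito = sum(eta[i] * (eta[i + 1] - eta[i]) for i in range(N))
# Symmetrized (nu = 1/2) sum: integrand taken as the midpoint value.
sym = sum(0.5 * (eta[i] + eta[i + 1]) * (eta[i + 1] - eta[i]) for i in range(N))

print(ito, (eta[-1] ** 2 - T) / 2)   # Ito integral of eta d(eta)
print(sym, eta[-1] ** 2 / 2)         # symmetrized: classical chain rule holds
```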
In this connection, the following two questions arise immediately: (1) Do we need different forms of Eq. (1.2.38) at all? (2) Is it possible to use only one definite form of stochastic equations, say, the Ito form, and to consider all differential equations of the form (1.2.3) as Ito equations? The last question has an affirmative answer. Namely, in the majority of mathematical papers [44, 45, 56, 66, 113, 132] it is postulated from the beginning that the motion of a controlled system is described by Ito differential equations and the theory is constructed via Ito stochastic differentials and integrals. The answer to the first question is based on advantages and disadvantages of different forms of stochastic equations and integrals, on whether an actual process is adequate to its mathematical model, and on the character of the problem considered.
The main advantage of Ito stochastic differential equations is the fact
that their solutions x(t) are martingales; this follows from (1.2.4), the def-
inition of the Ito integral (1.2.20), and formula (1.2.33). This fact allows
us to study the processes x(t) by rather general methods from the theory
of martingales [132]. Moreover, if we use the Ito equation, then we obtain
many formulas, for example, in the theory of filtration (§1.5), in the most
compact form.
However, sometimes it is inconvenient to use Ito differentials and inte-
grals, because very often we cannot use formulas from the common analysis
for operations with Ito processes. This was already pointed out when inte-
gral (1.2.28) was calculated. A similar situation arises when we differentiate
a function of the stochastic process x(t) that is a solution of the Ito equation
(1.2.40).
Suppose that φ(t, x) is a continuous function with continuous partial derivatives ∂φ/∂t, ∂φ/∂x, and ∂²φ/∂x². Then the stochastic process φ(t, x(t)) (here x(t) is a solution of Eq. (1.2.40)) has the following Ito stochastic differential [66, 131, 167]:
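(The display (1.2.43) is lost in this copy. For an Ito equation with drift A(t, x) and diffusion coefficient σ(t, x), the standard scalar Ito differential, which is what the text describes, is

dφ(t, x(t)) = [ ∂φ/∂t + A(t, x) ∂φ/∂x + (1/2) σ²(t, x) ∂²φ/∂x² ] dt + σ(t, x) (∂φ/∂x) d_0η(t);

the extra term (σ²/2) ∂²φ/∂x² is exactly what distinguishes it from the classical chain rule (1.2.44).)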
Formula (1.2.43) differs from the usual formula (1.2.44) for the differential of a composite function φ (under the condition that x(t) satisfies Eq. (1.2.40) with the usual differential dη(t)).
The outlined computational difficulties disappear if we use the symmetrized form of stochastic integrals and equations. This has already been shown for integration when we calculated the integral (1.2.28). Let us show that the usual formula of composite function differentiation (1.2.44) holds for the stochastic process x(t) defined by the symmetrized differential equation (that is, by Eq. (1.2.38) with ν = 1/2). The proof of this statement is indirect. Namely, we show that formula (1.2.44) for x(t), defined by the symmetrized stochastic equation

implies formula (1.2.43) for x(t), defined by the Ito equation (1.2.40). Indeed, it follows from (1.2.41) that the symmetrized equation equivalent to (1.2.40) has the form
(the arguments of a and σ are omitted). From this relation and (1.2.44) we obtain the symmetrized stochastic differential
(as usual, the summation is taken over repeated indices if any, that is, in (1.2.49) we have σ_ij ξ_j = Σ_{j=1}^n σ_ij ξ_j). Systems (1.2.2) and (1.2.3) (or (1.2.49)) determine an n-dimensional Markov process x(t) with the vector of drift coefficients

A_i(t, x) = a_i(t, x) + (1/2) [∂σ_ij(t, x)/∂x_k] σ_kj(t, x),  i = 1, ..., n,  (1.2.50)
If the process x(t) is defined by the Ito equation (1.2.40), then, instead of
(1.2.50) and (1.2.51), we have
(here K[α, β] = E(α − Eα)(β − Eβ) denotes the covariance of the random variables α and β; moreover, the mean Eg_i and the correlation functions in (1.2.53) and (1.2.54) are calculated under the assumption that the argument x is a nonrandom fixed vector).
Since similar characteristics of the Markov process defined by (1.2.2) (or by (1.2.49)) have the form (1.2.50), (1.2.51), we can obtain the differential equation (1.2.2), which is stochastically equivalent to (1.2.1), by solving system (1.2.50), (1.2.51) with respect to the unknown variables a_i and σ_ij, i, j = 1, ..., n. Note that the system of Eqs. (1.2.50), (1.2.51) can always be solved with respect to a_i and σ_ij. This follows from the fact that the diffusion matrix B is positive definite (semidefinite) and symmetric. As is known [62], to such matrices there corresponds a real symmetric positive (semidefinite) matrix σ that is the matrix square root σ = √B (here we do not consider methods for calculating σ). Since σ is symmetric, we have B = σ² = σσᵀ, that is, the matrix equation (1.2.51) is solvable, and hence (1.2.50) implies
(in (1.2.55), ξ(t) is the standard white noise with the characteristics (1.1.34); in (1.2.56)
and the control vector u may take values at each moment of time in a given bounded set U ⊂ R_r,

u(t) ∈ U.  (1.3.3)
In problem (1.3.1)-(1.3.3) the time interval [0, T] and the initial vector of phase variables x_0 are known; it is required to find the control function u_*(t): 0 ≤ t ≤ T satisfying (1.3.3) and minimizing the functional (1.3.2) on the trajectory x_*(t): 0 ≤ t ≤ T, which is a solution of the Cauchy problem (1.3.1) with u(t) = u_*(t). If the Cauchy problem (1.3.1) with u(t) = u_*(t) has a single solution, then the optimal control u_*(t): 0 ≤ t ≤ T may be represented in the form

u_*(t) = φ(t, x(t)),  (1.3.4)

where the current values of the control vector are expressed in terms of the current values of phase variables of system (1.3.1). The optimal control of the form (1.3.4) is called optimal control in the synthesis form, and formula (1.3.4) itself is often called the algorithm of optimal control.
The dynamic programming approach allows us to obtain the optimal
control in the synthesis form (1.3.4) for problem (1.3.1)-(1.3.3) as follows.
We write
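(The display that follows, the definition (1.3.5) of the loss function, is lost in this copy; judging by the later references to it — the minimized bracket containing the integral of the penalty c and the terminal term — it presumably reads

F(t, x_t) = min_{u(τ)∈U, t≤τ≤T} [ ∫_t^T c(τ, x(τ), u(τ)) dτ + ψ(x(T)) ].  (1.3.5) )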
F(t, x_t) = min_{u(σ)∈U, t≤σ≤t̄} [ ∫_t^{t̄} c(σ, x(σ), u(σ)) dσ + F(t̄, x_{t̄}) ],  (1.3.6)
for all t̄ ∈ [t, T]. For different statements of the optimality principle and comments see [1, 16, 50]. However, here we do not discuss these statements, since to derive Eq. (1.3.6) it suffices to have the definition of the loss function (1.3.5) and to understand that this is a function of time and of the state x(t) = x_t of the controlled system (1.3.1) at time t (recall that the control process is terminated at a fixed time T).
To derive Eq. (1.3.6), we write the integral in (1.3.5) as the sum ∫_t^T = ∫_t^{t̄} + ∫_{t̄}^T of two integrals and write the minimum as the succession of minima

min_{u(τ)∈U, t≤τ≤T} = min_{u(σ)∈U, t≤σ<t̄}  min_{u(ρ)∈U, t̄≤ρ≤T}.
Then we can write (1.3.5) as follows:

Since, by (1.3.1), the control u(ρ) on the interval [t̄, T) does not affect the solution x(σ) of (1.3.1) on the preceding interval [t, t̄], formula (1.3.7) takes the form

F(t, x_t) = min_{u(σ)∈U} { ⋯ }  (1.3.8)

Now, since by (1.3.5) the second term in the braces in (1.3.8) is the loss function F(t̄, x_{t̄}), we finally obtain Eq. (1.3.6) from (1.3.8).
The basic functional equation (1.3.6) of the dynamic programming approach naturally allows us to derive the differential equation for the loss function F(t, x). To this end, in (1.3.6) we set t̄ = t + Δ, where Δ > 0 is small, and obtain

F(t, x_t) = min_{u(σ)∈U, t≤σ≤t+Δ} [ ∫_t^{t+Δ} c(σ, x(σ), u(σ)) dσ + F(t + Δ, x_{t+Δ}) ].  (1.3.9)
Since the solutions x(t) of system (1.3.1) are continuous, the increments (x_{t+Δ} − x_t) of the phase vector are small for admissible controls u(t) = φ(t, x(t)). Taking into account this fact and assuming that the loss function F(t, x) is continuously differentiable with respect to all its arguments, we can expand the function F(t + Δ, x_{t+Δ}) into its Taylor series about the point (t, x_t) as follows:
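(The display (1.3.10) is lost in this copy; the expansion just described presumably reads

F(t + Δ, x_{t+Δ}) = F(t, x_t) + (∂F/∂t)(t, x_t) Δ + (∂F/∂x)ᵀ(t, x_t)(x_{t+Δ} − x_t) + o(Δ).  (1.3.10) )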
the function o(Δ) denotes the terms whose order is larger than that of the infinitesimal Δ. It follows from (1.3.1) that for small Δ the increment of the phase vector x can be written in the form
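(The display (1.3.12) is likewise lost; from the plant equation ẋ = g(t, x, u) one expects

x_{t+Δ} − x_t = g(t, x_t, u_t) Δ + o(Δ).  (1.3.12) )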
Substituting (1.3.10) and (1.3.12) into (1.3.9), and taking into account (1.3.11), we arrive at
Note that only the first and the fourth terms on the right-hand side of (1.3.13) depend on the control u_t. Therefore, the minimum is calculated only over these terms; the other terms in the brackets can be ignored. Dividing (1.3.13) by Δ, passing to the limit as Δ → 0, and taking into account the fact that lim_{Δ→0} o(Δ)/Δ = 0, we obtain the following differential equation for the loss function F(t, x):
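(The display (1.3.14) is lost in this copy; judging by the theorem cited at the end of this section, where the expression c + gᵀ ∂F_0/∂x is minimized, the Bellman equation presumably reads

∂F/∂t(t, x) + min_{u∈U} [ c(t, x, u) + gᵀ(t, x, u) ∂F/∂x(t, x) ] = 0.  (1.3.14) )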
(here we omit the subscript t of the phase vector x_t and the control u_t). Note that the loss function F(t, x) satisfies Eq. (1.3.14) on the entire interval of control 0 ≤ t < T except at the endpoint t = T, where, in view of (1.3.5), the loss function satisfies the condition
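(The lost display (1.3.15) is presumably the terminal condition F(T, x) = ψ(x), with ψ the terminal penalty from the criterion (1.3.2).)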
The differential equation (1.3.14), called the Bellman equation, plays the central role in applications of the dynamic programming approach to the synthesis of feedback optimal control. The solution of the synthesis problem, that is, the optimal strategy or the control algorithm u_*(t) = φ_*(t, x) = φ_*(t, x(t)), can be found simultaneously with the solution of Eq. (1.3.14). Namely, suppose that we have somehow found the function F(t, x) that satisfies (1.3.14) and (1.3.15). Then the expression in the square brackets in (1.3.14) is a known function of t, x, and u. Calculating the minimum of this function with respect to u, we obtain the optimal control u_* = φ_*(t, x) (u_* determines the minimum point of this function in U ⊂ R_r).

If the functions c(t, x, u) and g(t, x, u) and the set of admissible controls U allow us to minimize the function in the square brackets explicitly, then the optimal control can be written in the form
2°. Suppose that c(t, x, u) = c_1(t, x), g(t, x, u) = a(t, x) + Q(t, x)u, and the domain U is an r-dimensional parallelepiped, that is, |u_i| ≤ u_{0i}, i = 1, ..., r, where the numbers u_{0i} > 0 are given. One can readily see that

where sign A and |A| are matrices obtained from A by replacing each of its elements a_ij by sign a_ij and |a_ij|, respectively; {u_{01}, ..., u_{0r}} denotes the diagonal r × r matrix with u_{01}, ..., u_{0r} on its principal diagonal.
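(The lost display is, in the notation just explained, presumably the componentwise "bang-bang" minimizer

φ_*(t, x) = −{u_{01}, ..., u_{0r}} sign( Qᵀ(t, x) ∂F/∂x(t, x) ),

each component of the control sitting on the boundary of the parallelepiped with the sign opposite to that of the corresponding component of Qᵀ ∂F/∂x.)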
3°. Let the functions c(·) and g(·) be the same as in 2°; for the domain U, instead of a parallelepiped, we take an r-dimensional ball of radius R_0 centered at the origin. Then, instead of (1.3.22), we obtain the following expressions for the functions φ_0 and φ_*:
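(The displays are lost; for the ball |u| ≤ R_0 the minimizer of the linear-in-u term (∂F/∂x)ᵀ Q u presumably points along −Qᵀ ∂F/∂x with maximal norm:

φ_*(t, x) = −R_0 Qᵀ(t, x)(∂F/∂x) / |Qᵀ(t, x)(∂F/∂x)|. )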
(here A(t) and B(t) are given n × n and n × r matrices), the penalty functions c(t, x, u) and ψ(x) in the optimality criterion (1.3.2) are linear-quadratic forms of the phase variables x and controls u, and there are no restrictions on the domain of admissible controls (that is, U = R_r in (1.3.3)).
Let us solve the synthesis problem for the simplest one-dimensional LQ-problem with constant coefficients; in this case, the solution of the Bellman equation and the optimal control can be obtained as finite analytic formulas. Suppose that the plant is described by the scalar differential equation

ẋ = ax + bu,  (1.3.24)

(c_1 > 0, c > 0, T > 0, h > 0, and a and b in (1.3.24) and (1.3.25) are given constant numbers). The Bellman equation (1.3.14) and the boundary condition (1.3.15) for problem (1.3.24), (1.3.25) have the form
We shall seek the loss function F ( t , x) satisfying Eq. (1.3.29) and the
boundary condition (1.3.27) in the form
for 0 ≤ t < T. Moreover, it follows from (1.3.27) and (1.3.30) that the function p(t) assumes a given value at the right endpoint of the control interval:

p(T) = c_1.  (1.3.32)

Equation (1.3.31) can readily be integrated by separation of variables. The boundary condition (1.3.32) determines the unique solution of (1.3.31). Performing the necessary calculations, we obtain the following function p(t) that satisfies Eq. (1.3.31) and the boundary condition (1.3.32):
Thus it follows from (1.3.28) and (1.3.30) that the optimal control in the
synthesis form for problem (1.3.24), (1.3.25) has the form
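(The display with the final control law is lost in this copy; from (1.3.28) and (1.3.30) one expects u_*(t, x) = −(b/h) p(t) x with p(t) from (1.3.33).) A numerical sketch of this synthesis, assuming the criterion weights enter as in a standard quadratic cost ∫(c x² + h u²) dt + c_1 x²(T), so that p satisfies the Riccati equation ṗ = (b²/h)p² − 2ap − c with p(T) = c_1 (the exact display of (1.3.25) is not available here):

```python
def riccati(a, b, c, h, c1, T, n=20_000):
    """Integrate p' = (b*b/h) p^2 - 2*a*p - c backward from p(T) = c1."""
    dt = T / n
    p, ps = c1, [c1]
    for _ in range(n):
        p -= dt * ((b * b / h) * p * p - 2 * a * p - c)
        ps.append(p)
    ps.reverse()                     # ps[i] approximates p(i * dt)
    return ps

def closed_loop(a, b, c, h, c1, T, x0, n=20_000):
    """Simulate the plant x' = a x + b u under u = -(b/h) p(t) x."""
    ps, dt, x = riccati(a, b, c, h, c1, T, n), T / n, x0
    for i in range(n):
        u = -(b / h) * ps[i] * x     # optimal control in the synthesis form
        x += dt * (a * x + b * u)
    return x

print(closed_loop(a=1.0, b=1.0, c=1.0, h=1.0, c1=1.0, T=2.0, x0=1.0))
```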
smooth (or even analytic) functions. It is well known that, by this reason, the dynamic programming approach cannot be used for solving many time-optimal control problems [50, 156].

(2) Even in the case where the loss function F(x, t) satisfies the Bellman equation (1.3.14), the control u_*(t, x) minimizing the function in the square brackets in (1.3.14) may not be admissible. In particular, this control can violate the existence and uniqueness conditions for the solution of the Cauchy problem for system (1.3.1).

(3) The Bellman equation (1.3.14) (or (1.3.18)) with the boundary condition (1.3.15) can have nonunique solutions.
Nevertheless, we have the following theorem [1].
THEOREM. Suppose that there exists a unique continuously differentiable solution F_0(t, x) of Eq. (1.3.14) with boundary condition (1.3.15) and there exists an admissible control u_*(t, x) such that

min_{u∈U} [ c(t, x, u) + gᵀ(t, x, u) ∂F_0/∂x(t, x) ] = c(t, x, u_*) + gᵀ(t, x, u_*) ∂F_0/∂x(t, x).

Then the control u_*(t, x) in the synthesis form is optimal, and the function F_0(t, x) coincides with the loss function (1.3.5).
In conclusion, we point out another fact relative to the dynamic programming approach. The point is that this method can be used for solving problems of optimal control for which the optimal control u_*(t, x) does not exist. For example, such situations appear when the domain of admissible controls U in (1.3.3) is an open set.

The absence of an optimal control does not prevent us from deriving the basic equations of the dynamic programming approach. It only suffices to modify the definition of the loss function (1.3.5). So, if we define the function F(t, x_t) as the greatest lower bound of the functional in the square brackets in (1.3.5),

then one can readily see that the function (1.3.35) satisfies the equations

∂F/∂t + inf_{u∈U} [ c(t, x, u) + gᵀ(t, x, u) ∂F/∂x ] = 0,  (1.3.37)
which are similar to Eqs. (1.3.6) and (1.3.14). However, in this case the functions u_*(t, x) realizing the infimum of the function in the square brackets in (1.3.37) may not exist.

Nevertheless, the absence of an optimal control u_*(t, x) is of no fundamental importance in applications of the dynamic programming approach, since if the lower bound in (1.3.37) is not attainable, one can always construct the so-called ε-optimal strategy u_ε(t, x). If this strategy is used in system (1.3.1), then the performance functional (1.3.2) attains the value I(u_ε) = F(0, x_0) + ε, where ε is a given positive number. Obviously, to construct an actual control system, it suffices to know the ε-optimal strategy u_ε(t, x) for a small ε.

Here we do not describe methods for constructing ε-optimal strategies. First, these methods are considered in detail in the literature (see, for example, [113, 137]). Second (and this is the main point), the optimal control always exists in all special problems studied in Chapters II-VII. This is the reason that, from the very beginning, in the definition of the loss function (1.3.5) we use the symbol "min" instead of a more general symbol "inf."
One can readily see that the servomechanism shown in Fig. 10 possesses
the listed Markov properties if the following conditions are satisfied:
(1) the joint vector (y(t), x(t)) of instant values that define the input actions and output variables is treated as the phase vector of the system;
The loss function (1.4.5) for stochastic problem (i)-(v) differs from the loss function (1.3.5) in the deterministic case by the additional operation of averaging the functional in the square brackets in (1.4.5). The averaging in (1.4.5) is performed over the set of sample paths x_t^T = [x(τ): t ≤ τ ≤ T], y_t^T = [y(τ): t ≤ τ ≤ T] that on the interval [t, T] satisfy the stochastic differential equations (1.4.2) and (*) (see the footnote) with initial conditions x(t) = x_t, y(t) = y_t and control function u(τ) = φ(τ, x(τ), y(τ)), t ≤ τ ≤ T.
Since the process z(τ) = (x(τ), y(τ)) is Markov, the result of averaging F_φ(t, z_t) = F_φ(t, x_t, y_t) = E[·] in (1.4.5) is uniquely determined by the time moment t, by the state vector of the system z_t = (x_t, y_t) at this moment, and by a chosen algorithm of control, that is, by the vector-function φ(·) in (1.4.1). Therefore, it turns out that the loss function (1.4.5) obtained by minimizing F_φ(t, z_t) = F_φ(t, x_t, y_t) over all admissible controls⁸ (that is, over all admissible vector-functions φ(·)) depends only on time t and the state (x_t, y_t) of the servomechanism (Fig. 10) at this time moment.
⁷As was shown in §1.2, the coefficients A^y(t, y) and B^y(t, y) uniquely determine the system of stochastic differential equations

ẏ(t) = a^y(t, y(t)) + σ^y(t, y(t)) ν(t),  (*)

whose solutions are sample paths of the Markov process y(t); in (*) ν(t) denotes the standard white noise (1.1.34) independent of ξ(t).
⁸Just as in the deterministic case (§1.3), the control in the form (1.4.1) is called admissible if (i) for all t ∈ [0, T], x ∈ R_n, and y ∈ R_m, the vector-function φ(t, x, y) takes values from an admissible set U and (ii) the Cauchy problem for the stochastic differential equation (1.4.2) with control u(t) in the form (1.4.1) has a unique solution.
One can readily see that, for any t̄ ∈ [t, T], the loss function (1.4.5) satisfies the equation

Writing the integral in (1.4.5) as the sum ∫_t^T = ∫_t^{t̄} + ∫_{t̄}^T of two integrals and writing the minimum as the succession of minima

min_{u(σ)∈U, t≤σ<t̄}  min_{u(ρ)∈U, t̄≤ρ≤T}  E{ ⋯ },

Since the process (x(t), y(t)) is Markov, the result of averaging in the second term in the braces in (1.4.9) depends only on the terminal state of a fixed sample path (x_{t̄}, y_{t̄}). Thus, replacing the averaging conditioned on (x_t, y_t, x_{t̄}, y_{t̄}) by the averaging conditioned on (x_{t̄}, y_{t̄}) alone and taking into account the fact that, by (1.4.5), the second term in (1.4.9) is the loss function F(t̄, x_{t̄}, y_{t̄}), we finally obtain the functional equation (1.4.6) from (1.4.9).
Just as in the deterministic case, the functional equation (1.4.6) allows us to obtain a differential equation for the loss function F(t, x, y). By setting t̄ = t + Δ, we rewrite (1.4.6) in the form
Assuming that Δ > 0 is small and the penalty function c(x, y, u) is continuous in its arguments and having in mind that the diffusion processes x(τ) and y(τ) are continuous, we can represent the first term in the square brackets in (1.4.10) as
Here all derivatives of the loss function are calculated at the point (t, x_t, y_t); as usual, ∂F/∂x and ∂F/∂y denote the n- and m-column-vectors of partial derivatives of the loss function with respect to the components of the vectors x and y, respectively; ∂²F/∂x∂xᵀ, ∂²F/∂x∂yᵀ, and ∂²F/∂y∂yᵀ denote the n × n, n × m, and m × m matrices of second derivatives.
To obtain the desired differential equation for F(t, x, y), we substitute (1.4.11) and (1.4.12) into (1.4.10), average, and pass to the limit as Δ → 0. Note that if we average expressions containing the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), then all derivatives of F in (1.4.12) are considered as constants, since they depend on (t, x_t, y_t) and the mathematical expectation in (1.4.10) is calculated under the assumption that the values of x_t and y_t are known and fixed.
The mean values of the increments (x_{t+Δ} − x_t) can be calculated by integrating Eqs. (1.4.2). However, we can avoid this calculation if we use the results discussed in §1.2. Indeed, if just as in (1.4.11) we assume that the control u(τ) is fixed and constant, u(τ) ≡ u_t, then we see that for t ≤ τ ≤ t + Δ, Eq. (1.4.2) determines a Markov process x(τ) such that we can write (see (1.1.54))

where A^x(t, x_t, u_t) is the vector of drift coefficients of this process. But since (for a fixed u(t) = u_t) Eq. (1.4.2) is similar to (1.2.2), it follows from (1.2.50) that the components of this vector have the form⁹

⁹Recall that formula (1.4.14) holds for the symmetrized stochastic differential equation (1.4.2). But if (1.4.2) is an Ito equation, then we have A^x(t, x_t, u_t) = a(t, x_t, u_t) instead of (1.4.14).
where

B^x(t, x_t) = σ(t, x_t) σᵀ(t, x_t).  (1.4.16)
The other mean values in (1.4.12) can be expressed in terms of the input Markov process y(t) as follows:

E(y_{t+Δ} − y_t) = A^y(t, y_t) Δ + o(Δ),  (1.4.17)

Finally, since the stochastic processes y(t) and ξ(t) are independent, we have

E(x_{t+Δ} − x_t)(y_{t+Δ} − y_t)ᵀ = o(Δ).  (1.4.19)
Taking into account (1.4.13)-(1.4.19), we substitute (1.4.11) and (1.4.12) into (1.4.10) and rewrite the resulting expression as follows:

{ ⋯ ∂F/∂t + (A^x(t, x_t, u_t))ᵀ ∂F/∂x + (A^y(t, y_t))ᵀ ∂F/∂y + (1/2) Sp B^x(t, x_t) ∂²F/∂x∂xᵀ + (1/2) Sp B^y(t, y_t) ∂²F/∂y∂yᵀ ⋯ } + o(Δ).  (1.4.20)

For brevity, in (1.4.20) we omit the arguments (t, x_t, y_t) of all partial derivatives of F and denote the trace of the matrix A = ‖a_ij‖₁ⁿ by Sp A = a_11 + a_22 + ⋯ + a_nn.
By analogy with Eq. (1.3.14), we divide (1.4.20) by Δ, pass to the limit as Δ → 0, and obtain the following Bellman differential equation for the loss function F = F(t, x, y):
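(The display (1.4.21) is lost in this copy; with the operator notation introduced in (1.4.23) below, it presumably reads

∂F/∂t + min_{u∈U} [ c(x, y, u) + (A^x(t, x, u))ᵀ ∂F/∂x + (A^y(t, y))ᵀ ∂F/∂y + (1/2) Sp B^x ∂²F/∂x∂xᵀ + (1/2) Sp B^y ∂²F/∂y∂yᵀ ] = 0. )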
By analogy with (1.3.14), we omit the subscripts of x_t, y_t, and u_t, assuming that the phase variables x, y and the control vector u in (1.4.21) are taken
⋯ + (1/2) Sp B^x(t, x) ∂²/∂x∂xᵀ + (1/2) Sp B^y(t, y) ∂²/∂y∂yᵀ.

In the theory of Markov processes [45, 157, 175], the operator (1.4.23) is called an infinitesimal operator of the diffusion Markov process z(t) = (x(t), y(t)).
To obtain the optimal control in the synthesis form u_* = φ_*(t, x, y) for problem (i)-(v), we need to solve the Bellman equation (1.4.21) with the additional condition (1.4.22).

If it is possible to calculate the minimum of the function in the square brackets in (1.4.21) explicitly, then the optimal control can be written as follows (see §1.3, (1.3.16)-(1.3.18)):
and the Bellman equation (1.4.21) can be written without the symbol "min"
It follows from (1.4.23) and (1.4.24) that function (1.4.29) satisfies the stationary Bellman equation

Obviously, for the optimal control u_* = φ_*(x, y), the error γ of stationary tracking has the form

and, together with the functions f(x, y) and u_* = φ_*(x, y), can be found by solving the time-invariant equation (1.4.30). Some methods for solving the stationary Bellman equations are considered in Chapters III-VI.
1.4.3. Maximization of the mean time of the first passage to the boundary. As previously, we assume that in the servomechanism shown in Fig. 10 the stochastic process y(t) is homogeneous in time and the plant P is autonomous. We also assume that a simply connected closed domain D ⊂ R_{n+m} is chosen in the (n + m)-dimensional Euclidean space R_{n+m} of vectors (x, y). It is required to find a control that, for any initial state (x(0), y(0)) ∈ D of the system, maximizes the mean time Eτ during which the representative point (x(t), y(t)) achieves the boundary ∂D of the domain D (see the criterion (1.1.21) in §1.1).

By W^u(t − t_0, x_0, y_0) we denote the probability of the event that the representative point (x, y) does not reach ∂D during time t − t_0 if x(t_0) = x_0 and
For the mutually disjoint events {τ < t − t_0} and {τ ≥ t − t_0}, the probability addition theorem implies

P{τ < t − t_0 | x_0, y_0} = ∫_0^{t−t_0} w_u(σ) dσ = 1 − W^u(t − t_0, x_0, y_0)

from (1.4.34) and (1.4.35). Hence, after the differentiation with respect to t, we obtain the probability density w_u(t − t_0). Using the same notation for the argument of the density and for the random value, that is, writing w_u(t − t_0) = w(τ), from (1.4.33) and (1.4.36) we obtain the mean time Eτ of achieving the boundary

Eτ = ∫_0^∞ W^u(τ, x_0, y_0) dτ = ∫_{t_0}^∞ W^u(t − t_0, x_0, y_0) dt.  (1.4.37)
The mean time Eτ depends both on the initial state (x_0, y_0) of the controlled system shown in Fig. 10 and on a chosen control algorithm u = φ(x, y). Therefore, the Bellman function for the problem considered is determined by the relation

By analogy with (1.4.10), for the function (1.4.38), the basic functional equation of the dynamic programming approach has the form

F_1(x_t, y_t) = max_{u(τ)∈U, t≤τ≤t+Δ} [ ∫_t^{t+Δ} W^u(τ − t, x_t, y_t) dτ + E F_1(x_{t+Δ}, y_{t+Δ}) ].  (1.4.39)
The Bellman differential equation for the function F_1(x, y) can be derived from (1.4.39) by passing to the limit as Δ → 0. In this case, the procedure is almost the same as that used for the derivation of Eq. (1.4.21) for the basic problem (i)-(v). Expanding F_1(x_{t+Δ}, y_{t+Δ}) in the Taylor series around the point (x_t, y_t), averaging the expansion with respect to the random increments (x_{t+Δ} − x_t) and (y_{t+Δ} − y_t), taking into account the relation lim_{Δ→0} W^u(Δ, x_t, y_t) = 1 for all (x_t, y_t) lying in the interior of D, and passing to the limit as Δ → 0, from (1.4.39) with regard to (1.4.13)-(1.4.19), we obtain the Bellman differential equation for the function F_1(x, y):

max_{u∈U} L^u_{x,y} F_1(x, y) = −1,  (1.4.40)
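For a fixed (non-optimized) control, the mean first-passage time that F_1 describes can be checked by direct simulation. A sketch (not from the book): a scalar diffusion dx = a dt + σ dη on the domain D = (−d, d); for a = 0 the exact mean exit time is (d² − x_0²)/σ², which the estimate should approach. All names and parameters are invented for the example.

```python
import random

def mean_exit_time(a, sigma, d, x0, dt=1e-3, trials=2_000, seed=3):
    """Monte Carlo estimate of E[tau], tau = first exit of dx = a dt + sigma dW
    from the interval D = (-d, d), started at x0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, t = x0, 0.0
        while abs(x) < d:            # integrate until the boundary of D is hit
            x += a * dt + sigma * rng.gauss(0.0, dt ** 0.5)
            t += dt
        total += t
    return total / trials

# For a = 0 the exact mean exit time is (d*d - x0*x0) / sigma**2 = 1.0 here.
print(mean_exit_time(a=0.0, sigma=1.0, d=1.0, x0=0.0))
```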
F_2(t, x_t, y_t) = min_{u(s)∈U, t≤s≤T} E[ max_{t≤τ≤T} c(x(τ), y(τ), u(τ)) ].  (1.4.42)
provided that the function F_2(t, x_t, y_t) > c⁰(x_t, y_t) has been obtained from (1.4.45).
Acting by analogy with Section 1.4.1, that is, expanding the function F_2(t + Δ, x_{t+Δ}, y_{t+Δ}) in the series (1.4.12), averaging, and passing to the limit as Δ → 0, from (1.4.44) and (1.4.45) we obtain (with regard to (1.4.13)-(1.4.19)) the Bellman equation in the differential form:

min_{u∈U} L^u_{t,x,y} F_2(t, x, y) = 0  if F_2(t, x, y) > c⁰(x, y),
F_2(t, x, y) = c⁰(x, y)  otherwise,  (1.4.46)

as well as the matching conditions for the function F_2(t, x, y) on the interface between the domains on which the equations in (1.4.46) are defined. These conditions of "smooth matching" [113] require the continuity of the function F_2(t, x, y) and of its first-order derivatives with respect to the phase variables x and y on the interface mentioned above.
If, by analogy with Sections 1.4.2 and 1.4.3, the characteristics of the input process y(t) and of the controlled plant P are time-independent, then it is often expedient to use a somewhat different statement of the problem considered, which allows us to assume that the loss function is independent of time. In this case, we do not fix the observation time but assume that the optimal system minimizes the functional

where β > 0 is a given number. This change of the mathematical statement preserves all characteristic features of the problem. Indeed, it follows from (1.4.48) that the time of observation of the function c(x, y, u) is bounded and determined by β. Namely, this time is large for small β and small for large β.
For the criterion (1.4.48) the loss function is determined by the formula¹⁰

f_2(x, y) = min_{u(τ)∈U, τ≥t} E[ max_{τ≥t} c(x(τ), y(τ), u(τ)) e^{−β(τ−t)} ].  (1.4.49)

Since

min_{u(τ)∈U} E[ max_{τ≥t+Δ} c(x(τ), y(τ), u(τ)) e^{−β(τ−t)} ] = e^{−βΔ} min_{u(τ)∈U} E[ max_{τ≥t+Δ} c(x(τ), y(τ), u(τ)) e^{−β(τ−t−Δ)} ],
we can rewrite Eq. (1.4.43) for the function f_2(x, y) in the form

By analogy with the previous reasoning, from (1.4.50) we obtain the Bellman equation for the function f_2(x, y):

where L^u_{x,y} is the elliptic operator (1.4.31) that does not contain the derivative with respect to time t. In §2.2 of Chapter II, we solve Eq. (1.4.51) for a special problem of optimal control.
1.4.5. Optimal tracking of a strictly discontinuous Markov process. Let us consider a version of the synthesis problem for the optimal tracking system that differs from the basic problem (i)-(v) by conditions (i) and (iv). Namely, we assume that (i) the input process y(t) in the servomechanism (see Fig. 10) is given by a strictly discontinuous Markov process (see §1.1) with known characteristics λ(t, y) and π(y, z, t) determining the intensity of jumps and the density of the transition probability at the state (t, y) and that (ii) there are no random perturbations ξ(t) that act on the plant P. In this case, the plant P is described by the system of ordinary (nonstochastic) differential equations
It follows from (1.1.68) that for small Δ the transition probability p(t, y_t; t + Δ, y_{t+Δ}) = p(y(t + Δ) = y_{t+Δ} | y(t) = y_t) for the input process y(t) is determined by the formula
By analogy with the solution of the basic problem, in the case considered, the loss function F_3(t, x, y) is determined by (1.4.5) if E[·] in (1.4.5) is understood as the averaging of the functional [·] over the set of sample paths y_t^T = [y(τ): t ≤ τ ≤ T] issued from a given initial point y(t) = y_t. Obviously, F_3(t, x, y) satisfies the functional equations (1.4.6) and (1.4.10). We rewrite Eq. (1.4.10) for F_3 as follows:
(in (1.4.58) and (1.4.59) the functions O(Δ) denote terms of the order of Δ such that lim_{Δ→0} O(Δ)/Δ = N, where N is a finite number).
The Markov property of the process X(t) allows us to write the basic functional equation of the optimality principle, then to obtain the Bellman equation, etc., that is, to follow the procedure described in §1.4.

To implement the control algorithm in the form (1.5.1), it is necessary to measure the phase variables X_t exactly at each instant of time. This possibility is provided by the servomechanism shown in Fig. 10. In this case, the phase variables X_t = z_t = (x_t, y_t) are the components of the (n + m)-dimensional vector of instant input (assigning) actions and output (controlled) variables.
Now let us consider a more general case of the system shown in Fig. 3. At each instant of time, instead of true values of the vectors x_t and y_t, we have only the results of measurements x̃_t and ỹ_t, which are sample paths of the stochastic processes {x̃(s): 0 ≤ s ≤ t} and {ỹ(s): 0 ≤ s ≤ t}. These processes are mixtures of "useful signals" x_t, y_t and "random noises" η_t, ζ_t. Only these results of measurements can be used for calculating the current values of the control actions u_t; therefore, the desired control algorithm for the system shown in Fig. 3 has the form of the functional
(here we use the notation from (1.4.2), (1.4.3), and (1.4.4) in §1.4). The observed processes x̃(t) and ỹ(t) are determined by the relations
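(The display (1.5.6) is lost in this copy; given the matrices named just below, the measurement model is presumably of the additive form

x̃(t) = P(t) x(t) + Q(t) η(t),   ỹ(t) = H(t) y(t) + G(t) ζ(t),  (1.5.6)

i.e., noisy linear observations of the state and of the command signal.)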
Here P, Q, H, and G are given matrices whose dimensions agree with the dimensions of the vectors x̃, x, η, y, ỹ, and ζ. We also assume that the vectors x̃ and η (as well as the vectors ỹ and ζ) are of the same dimension, and the square matrices Q(t) and G(t) are nondegenerate for all t ∈ [0, T].¹¹
We assume that the stochastic process ξ(t) in (1.5.3) is the standard white noise (1.1.34) and the other stochastic functions y(t), ζ(t), and η(t) are Markov diffusion processes with known characteristics (that is, with given drift and diffusion coefficients). The stochastic processes ξ(t), y(t), ζ(t), and η(t) are assumed to be independent. We also note that the stochastic process x(t), which is a solution of the stochastic equation (1.5.3), is not Markov, since in this case the control functions u(t) = u_t on the right-hand side of (1.5.3) have the form of functionals (1.5.2) and depend on the history of the process.
Following the formal scheme of the dynamic programming approach, by analogy with (1.4.5), we can define the loss function for the problem considered as follows:

F(t, x̃_0^t, ỹ_0^t) = min_{u(τ)∈U, t≤τ≤T} E[ ∫_t^T c(x(τ), y(τ), u(τ)) dτ + ψ(x(T), y(T)) | x̃_0^t, ỹ_0^t ].  (1.5.7)
Since the functions x̃_0^t and ỹ_0^t are arguments of F in (1.5.7), it would be more correct if expression (1.5.7) were called a loss functional; however, both (1.5.7) and (1.4.5) are called loss functions.
In contrast with §1.4, it is essentially new that we cannot write the optimality principle equation of type (1.4.6) or (1.4.10) for the function (1.5.7), since this function depends on the stochastic processes x̃(t) and ỹ(t), which are not Markov. Formula (1.5.6) immediately shows that x̃(t) and ỹ(t) have no Markov properties, since the sum of Markov processes is not a Markov process. Moreover, it was pointed out that the process x(t) itself is not Markov. Therefore, we can solve the synthesis problem

¹¹For simplicity, we assume that Q(t) and G(t) are nondegenerate, but this condition is not necessary [132, 175].
and, on the other hand, the stochastic process X(t) be Markov. Such phase variables X_t are called sufficient coordinates [171] by analogy with sufficient statistics used in mathematical statistics [185].

It turns out that there exist sufficient coordinates for the problem considered, and X_t is a collection of instant values of the observable processes x̃(t) = x̃_t and ỹ(t) = ỹ_t and of the a posteriori probability density p(t, x_t, y_t) = p(x(t) = x_t, y(t) = y_t | x̃_0^t, ỹ_0^t) of the unobserved vectors x_t and y_t.

In what follows, it will be shown that the coordinates (1.5.8) are sufficient to compute the loss function (1.5.7). In the case of an uncontrolled process x(t), the Markov property of (1.5.8) follows from Theorem 5.9 in [175].
To derive the Bellman differential equation, it is necessary to know equations that determine the time evolution of sufficient coordinates. For the first two components of (1.5.8), that is, for the processes x̃(t) and ỹ(t), these equations can be assumed to be known, because one can readily obtain them from the a priori characteristics of the processes y(t), x(t), ζ(t), η(t) and formulas (1.5.6). Later we derive the equation for the a posteriori probability density p(t, x_t, y_t).
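Before the continuous-time derivation, the role of the a posteriori density as a sufficient coordinate can be illustrated by its discrete-time analogue: a Chapman-Kolmogorov prediction step under the a priori dynamics followed by a Bayes correction with the new measurement. The sketch below is not from the book; the scalar model x_{k+1} = x_k + a(x_k)Δ + N(0, qΔ) with measurement z = x + N(0, r), and all parameter values, are invented for the illustration.

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def filter_step(p, grid, a, q, z, r, dt):
    """One step of a grid-based Bayes filter for the scalar model
    x_{k+1} = x_k + a(x_k)*dt + N(0, q*dt),   z_k = x_k + N(0, r).
    p is the current a posteriori density on grid; returns the updated one."""
    h = grid[1] - grid[0]
    # Prediction: Chapman-Kolmogorov sum with the Gaussian transition kernel.
    pred = [sum(p[i] * gauss(xj, grid[i] + a(grid[i]) * dt, q * dt) * h
                for i in range(len(grid))) for xj in grid]
    # Correction: multiply by the measurement likelihood and renormalize.
    post = [pred_j * gauss(z, xj, r) for pred_j, xj in zip(pred, grid)]
    s = sum(post) * h
    return [v / s for v in post]

grid = [0.05 * i - 2.5 for i in range(101)]
p = [gauss(x, 0.0, 0.5) for x in grid]        # initial a posteriori density
p = filter_step(p, grid, a=lambda x: -x, q=0.2, z=0.4, r=0.1, dt=0.1)
print(max(zip(p, grid))[1])                   # a posteriori mode after the update
```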
First, we do not pay attention to the fact that the control u_t has the form of a functional (1.5.2). In other words, we assume that u(t) in (1.5.3) is a known deterministic function of time. Then the stochastic process x(t) that satisfies the stochastic equation (1.5.3) is a diffusion Markov process whose characteristics (the drift and diffusion coefficients) are uniquely determined by the vector a(t, x, u) and the matrix σ(t, x) (see §1.2). Thus, in our case, x(t), y(t), ζ(t), and η(t) are independent stochastic Markov diffusion processes with given drift coefficients and matrices of diffusion coefficients.
In view of formulas (1.5.6) and the fact that the matrices Q(t) and G(t)
are nondegenerate, it follows that the collection (Z(t),g(t),x(t), y(t)) is a
Markov diffusion process whose characteristics can be expressed via given
characteristics of the processes x(t), y(t), ((t), and q(t). Indeed, if we
denote the vectors of drift coefficients by A, (t, x), Ay (t, y), AC(t, 0, A, (t, q)
and the diffusion matrices of independent Markov processes x(t), y(t), ((t)),
and q(t) by B,(t, x ) , B y (t,y), B,-(t, (), B, (t, q), then it follows from (1.5.6)
that the drift coefficients A,- and AG for the components Z(t) and G(t) are
and the matrix B of the diffusion coefficients of the joint process (x̃(t), ỹ(t),
x(t), y(t)) has the block form.
Substituting (1.5.18) into (1.5.17) and taking the normalization of the transition
density into account, we obtain the following Bayes-type relation for the a posteriori density:

p(t{+}\Delta, x_{t+\Delta}) = \frac{\int p(t, x_t)\, p(x_{t+\Delta}, \tilde y_{t+\Delta} \mid x_t, \tilde x_t, \tilde y_t)\, dx_t}{\iint p(t, x_t)\, p(x_{t+\Delta}, \tilde y_{t+\Delta} \mid x_t, \tilde x_t, \tilde y_t)\, dx_t\, dx_{t+\Delta}}.   (1.5.19)
Equation (1.5.19) plays the same role for partially observable Markov processes
as the Markov (Smoluchowski) equation (1.1.53) plays in the case of complete
observation. To derive the differential equation for the a posteriori density
p(t, x_t) from (1.5.19), we use the same method as in the derivation of the
Fokker-Planck equation in §1.1 (see (1.1.59)-(1.1.64)).
Let us introduce two characteristic functions of the random increments Δz_α,
α = 1, ..., m, and Δz_k, k = 1, ..., n.¹³
¹³In (1.5.20) and (1.5.21), as usual, j = √−1 and the sum is taken over the repeated
indices.
Using the expansion of ln Θ₂(u, z_t) in the Maclaurin series, we can write

\Theta_2(u, z_t) = \exp\Big\{ \sum_{s=1}^{\infty} \frac{1}{s!} \sum_{k, \ell, \dots, r = 1}^{n} K_s[\Delta z_k, \Delta z_\ell, \dots, \Delta z_r]\, (j u_k)(j u_\ell) \cdots (j u_r) \Big\},   (1.5.24)

where K_s[Δz_k, ..., Δz_r] denotes the s-order correlation between the com-
ponents of the vector of increments Δz = z(t + Δ) − z(t) of the Markov
process z(t). Using well-known relations between correlations and initial
moments [173, 181], we see that (1.5.14) gives the following representation
for (1.5.24):

\Theta_2(u_1, \dots, u_n, z_t) = \exp\Big[ j A_k u_k \Delta - \frac{\Delta}{2} B_{k\ell}\, u_k u_\ell + o(\Delta) \Big].   (1.5.25)
A similar transformation of the characteristic function Θ₁(u₁, ..., u_m, z_t, y_{t+Δ})
leads to representation (1.5.26), where
¹⁴To obtain (1.5.27) and (1.5.28), it suffices to use the well-known formula [67],
which holds for real symmetric positive definite matrices B = ‖B_{kℓ}‖ and any constants
m_k and m_ℓ.
and K is a constant that does not influence the final result of the calculations.
Note that we calculated (1.5.26) under the assumption that the matrix
‖B_{αβ}‖ is nondegenerate, and we used the notation ‖F_{αβ}‖ = ‖B_{αβ}‖⁻¹.
Since the exponent in (1.5.27) is small (∼ Δ), we replace the exponential
by the first three terms of the Maclaurin series e^x = 1 + x + x²/2, truncate
the terms whose order with respect to Δ is higher than 1, and obtain an
expansion of the form

1 + j u_\alpha \big( A_\alpha \Delta + B_{\alpha\beta} F_{\beta\rho}\, \Delta y_\rho \big) + \cdots

Collecting the terms, we obtain the numerator in (1.5.19) (we omit the constant K, since K and
a similar constant in the denominator of (1.5.19) cancel each other); the result is formula (1.5.33)
(in (1.5.33), (t, x_{t+Δ}, y_t) are the arguments of the coefficients A_k, B_{kℓ},
and F_{αβ}).
The denominator of the right-hand expression in (1.5.19) differs from
the numerator by integration with respect to x_{t+Δ}. We perform this inte-
gration, take into account the normalization condition for the
probability density p(t, x_{t+Δ}), and from (1.5.33) obtain the following
expression (without K) for the denominator in (1.5.19):
¹⁵Here we do not show in detail how to transform the Ito equation (1.5.38) to the
symmetrized form (1.5.39); the reader is strongly recommended to do this useful exercise
on his own.
¹⁶Here √F_{ρσ} denotes an element of the matrix √F, which is the square root of the ma-
trix F; since the matrix ‖B_{ρσ}‖ is nondegenerate, the inverse F = ‖B_{ρσ}‖⁻¹, as well
as ‖B_{ρσ}‖, is positive definite and symmetric; therefore, the matrix square root √F
exists [62], and the matrix √F, as well as F, is positive definite and symmetric.
p(t, x) = \exp\Big\{ \cdots + \sum_{\alpha, \beta, \dots, \zeta = 1}^{n} a_{\alpha\beta\dots\zeta}(t)\, (x_\alpha - m_\alpha(t)) \cdots (x_\zeta - m_\zeta(t)) \Big\}.   (1.5.41)
Next we replace the functions Ā_α and Φ(x, y) by their Taylor series,¹⁷
substitute (1.5.41) into (1.5.42), and successively set the coefficients of equal
powers of (x_α − m_α), (x_α − m_α)(x_β − m_β), ... on the left- and right-hand
sides of (1.5.42) equal to each other; thus we obtain the system (1.5.43)
of ordinary differential equations for m_α(t), a_{αβ}(t), a_{αβγ}(t), ....
In (1.5.43) the dot over a variable indicates, as usual, the time derivative
(ṁ_β = dm_β(t)/dt). Moreover, in (1.5.43) we assume that B_{αβ} is inde-
pendent of x and omit the arguments of the functions Ā, Φ, and of their
derivatives; the values of these functions are taken at the
point x = m, that is, Ā_β = Ā_β(t, m, y), ∂Φ/∂x_α = ∂Φ(t, m, y)/∂x_α, etc.
It follows from (1.5.41) that the set of parameters m_α(t), a_{αβ}(t), ...
uniquely determines the a posteriori probability density p(t, x) at time t.
¹⁷The functions Ā_α and Φ(x, y) are expanded with respect to x in a neighborhood
of the point m(t).
Thus we can use these parameters as new arguments of the loss function,
since F(t, y_t, p(t, x)) = F(t, y_t, m_{αt}, a_{αβt}, ...). However, in the general case,
system (1.5.43) is of infinite order; therefore, if we use the new sufficient
coordinates (y_t, m_{αt}, a_{αβt}, ...) instead of the old coordinates (y_t, p(t, x)),
we do not gain any considerable advantage in solving special problems.
Nevertheless, there is an important class of problems in which the a poste-
riori probability density (1.5.41) is Gaussian (conditional Gaussian Markov
processes are studied in detail in [131, 132]). We have such processes if [175]:
(1) the elements of the diffusion matrix B are constant; (2) the functions
Ā_ρ depend on x linearly; (3) the function Φ(x, y) depends on x linearly
and quadratically; (4) the initial probability density (the a priori proba-
bility density of the unobservable components before the observation) p(0, x)
is Gaussian. Under these conditions, we have a_{αβγ} = a_{αβγδ} = ⋯ = 0 in
(1.5.41) and (1.5.43), and system (1.5.43) is closed and of finite dimension.
In (1.5.49) the indices α, β, γ take values from 1 to m and the indices ρ, σ, τ
from m + 1 to m + k.
In this case, it follows from (1.5.49) and (1.5.39) that the coefficients in (1.5.42)
are expressed via the matrices G(t, u), b(t, u), C(t), P(t), and Q(t); in particular,

\dot\Lambda = -\Lambda C(t) C^{\mathrm T}(t) \Lambda - G^{\mathrm T}(t, u) \Lambda - \Lambda G(t, u) + P^{\mathrm T}(t) [Q(t) Q^{\mathrm T}(t)]^{-1} P(t).
Now we note that the right-hand sides of (1.5.52) do not explicitly depend
on y(t); moreover, the cost functional (1.5.47) is independent of the
observable process x̃(t). Therefore, in this case, the current values of the
vector y_t do not belong to the sufficient coordinates of problem (1.5.45)-
(1.5.47); the sufficient coordinates are the current values of the components
of the vector m_t and of the elements of the matrix Λ_t.
If instead of the matrix Λ we consider the matrix D = Λ⁻¹ of a posteriori
covariances, then, multiplying the first equation in (1.5.52) by the matrix
D from the left and the second equation in (1.5.52) by D from the left and from
the right and taking into account the corresponding matrix identities, we obtain

\cdots = [(2\pi)^n \det D]^{-1/2} \int c(x, u)\, e^{-(x - m)^{\mathrm T} D^{-1} (x - m)/2}\, dx.   (1.5.55)
where a, b, and ν are given constants (ν > 0) and ξ(t) is the standard
white noise (1.1.31). The performance of this system is estimated by the
following functional of the form (1.4.3) with quadratic penalty functions:
F(t, x) = \min_{u(\tau),\; t \le \tau \le T} \mathrm E\Big[ \int_t^T \big[c\, x^2(\tau) + h\, u^2(\tau)\big]\, d\tau + c_1 x^2(T) \,\Big|\, x(t) = x \Big]   (2.1.4)

satisfies Eq. (2.1.3) in the strip Π_T = {0 ≤ t < T, −∞ < x < ∞} and
for t = T becomes the given quadratic function (2.1.5). Condition (2.1.5)
readily follows from the definition of the loss function (2.1.4) or from
formula (1.4.22) with ψ(x, y) = c₁x².
The optimal control u_* in the form (1.4.25), which minimizes the ex-
pression in the square brackets in (2.1.3), is determined by the condition
It follows from (2.1.5) that the solutions p(t) and r(t) of (2.1.9) and (2.1.10)
attain the values

p(T) = c_1, \qquad r(T) = 0   (2.1.11)

at the terminal time t = T.
The system of ordinary differential equations (2.1.9), (2.1.10) with the addi-
tional conditions (2.1.11) can readily be integrated. As a result, we obtain
the explicit expressions (2.1.12) for the functions p(t) and r(t).
From (2.1.6), (2.1.8), and (2.1.12), we obtain the optimal control law
(2.1.14), which is the solution of the synthesis problem for the optimal stabilization
system in Fig. 13. It follows from (2.1.14) that in this case the controller C
in Fig. 13 is a linear amplifier in the variable x with a time-varying amplifica-
tion factor determined by p(t). In the sequel, we indicate such amplifiers by a special
mark ">". Therefore, the optimal system for problem (2.1.1), (2.1.2) can
be represented as the block diagram shown in Fig. 14.
ways. The first interval [0, T − t₁] corresponds to the stationary operating
mode: p(t) ≈ c/(β − a) = const for t ∈ [0, T − t₁], the function
r(t) decreases linearly as t grows, and on this interval the rate of decrease
of r(t) is constant and equal to νc/(β − a). The terminal interval [T − t₁, T]
is essentially nonstationary. It follows from (2.1.16) and (2.1.17) that the
length of this nonstationary interval is of order 3/(2β). Obviously,
in the case where this nonstationary interval is a small part of the entire
operating time [0, T], the control performance is little affected if, instead of
the exact optimal control (2.1.14), we use the control
loss function (2.1.8) satisfies the approximate relation

F(t, x) \approx \frac{c}{\beta - a}\, x^2 + \frac{\nu c}{\beta - a}\, (T - t).   (2.1.19)
Comparing (2.1.19) and (1.4.29), we see that in this case the value γ of
stationary mean losses per unit time, introduced in §1.4, is equal to
γ = νc/(β − a), that is, it coincides with the rate of decrease of the function r(t) on the
stationary interval [0, T − t₁] (Fig. 15). In this case, the stationary loss
function defined by (1.4.29) is equal to
and to substitute the desired solution in the form f(x) = px² into (2.1.22).
We obtain the numbers p and γ, just as in the nonstationary case, by setting
the coefficients of x² and the free terms on the left- and right-hand sides of
(2.1.22) equal to each other.
We also note that if at least one of the parameters a, b, ν, c, and h of
problem (2.1.1), (2.1.2) depends on time, then, in general, there is no
stationary operating mode. In this case, one cannot obtain closed-form
expressions for the functions p(t) and r(t) in (2.1.8), since Eq. (2.1.9) is a
Riccati equation and, in general, cannot be integrated exactly. Therefore,
if the problem has variable parameters, the solution is constructed, as a
rule, by numerical integration.
2.1.2. All of the preceding can readily be generalized to multidimen-
sional problems of optimal stabilization. Let us consider the system shown
in Fig. 13 whose plant P is described by a linear vector-matrix equation of
the form

\dot x = A(t)x + B(t)u + \sigma(t)\xi(t),   (2.1.23)

where x = x(t) ∈ R_n is an n-column-vector of phase variables, u ∈ R_r is
an r-vector of controlling actions, and ξ(t) ∈ R_m is an m-vector of random
perturbations of the Gaussian white noise type with characteristics (1.1.34).
The dimensions of the matrices A, B, and σ match the dimensions
of the corresponding vectors and are equal to n × n, n × r, and n × m,
respectively. The elements of these matrices are continuous functions of
time¹ defined for all t from the interval [0, T] on which the controlled system
is considered.
For the optimality criterion, we take a quadratic functional of the form

I[u] = \mathrm E\Big[ \int_0^T \big( x^{\mathrm T}(t) G(t) x(t) + u^{\mathrm T}(t) H(t) u(t) \big)\, dt + x^{\mathrm T}(T)\, Q\, x(T) \Big].   (2.1.24)

The Bellman equation (2.1.25) then contains the linear expression

\frac{\partial F}{\partial t} + x^{\mathrm T} A^{\mathrm T}(t) \frac{\partial F}{\partial x} + \frac{1}{2} \operatorname{Sp}\Big[ \sigma(t) \sigma^{\mathrm T}(t) \frac{\partial^2 F}{\partial x\, \partial x^{\mathrm T}} \Big].
¹As was shown in [156], it suffices to assume that the elements of the matrices A(t),
B(t), and σ(t) are measurable and bounded.
In this case, the additional condition on the loss function (1.4.22) has
the form

F(T, x) = x^{\mathrm T} Q x.

The further considerations leading to the solution of the synthesis problem
are similar to those in the one-dimensional case. Calculating the minimum
value of the expression in the square brackets in (2.1.25), we obtain the
optimal control

u_* = -\frac{1}{2} H^{-1}(t) B^{\mathrm T}(t) \frac{\partial F}{\partial x};   (2.1.26)

substituting the expression obtained for u_* into (2.1.25), we arrive at the equation

\frac{\partial F}{\partial t} + x^{\mathrm T} A^{\mathrm T}(t) \frac{\partial F}{\partial x} + \frac{1}{2} \operatorname{Sp}\, \sigma(t)\sigma^{\mathrm T}(t) \frac{\partial^2 F}{\partial x\, \partial x^{\mathrm T}} + x^{\mathrm T} G(t) x - \frac{1}{4} \Big( \frac{\partial F}{\partial x} \Big)^{\mathrm T} B(t) H^{-1}(t) B^{\mathrm T}(t) \frac{\partial F}{\partial x} = 0.   (2.1.27)

We seek the solution of (2.1.27) as the following quadratic form with respect
to the phase variables:

F(t, x) = x^{\mathrm T} P(t) x + r(t).   (2.1.28)

Substituting (2.1.28) into (2.1.27) and setting the coefficients of the qua-
dratic (with respect to x) terms and the free terms on the left-hand side of
(2.1.27) equal to zero, we obtain the following system of differential equa-
tions for the unknown matrix P(t) and the scalar function r(t):

\dot P + A^{\mathrm T}(t) P + P A(t) - P B(t) H^{-1}(t) B^{\mathrm T}(t) P + G(t) = 0, \qquad \dot r = -\operatorname{Sp}\big[ \sigma(t)\sigma^{\mathrm T}(t) P(t) \big],   (2.1.29)

with P(T) = Q and r(T) = 0. The optimal control then takes the form

u_*(t, x) = -H^{-1}(t) B^{\mathrm T}(t) P(t)\, x,   (2.1.30)

which follows from (2.1.26) and (2.1.28). Formula (2.1.30) shows that the
controller C in the optimal system in Fig. 13 is a linear amplifier with n
inputs and r outputs and variable amplification factors.
Let us briefly discuss the possibility of solving system (2.1.29). The
existence and uniqueness of the nonnegative definite matrix P(t) satisfying
the matrix Riccati equation (2.1.29) are proved in [72] under the
above assumptions on the properties of the matrices A(t), B(t), G(t), H(t),
and Q. Explicit formulas for the elements of the matrix P(t) can be obtained
only by numerical methods,² which is a rather complicated problem for
large dimensions of the phase vector x.
In the special case of the zero matrix G(t) ≡ 0, the solution of the matrix
equation (2.1.29) has the form [1, 132]

P(t) = X^{\mathrm T}(T, t) \Big[ Q^{-1} + \int_t^T X(T, s) B(s) H^{-1}(s) B^{\mathrm T}(s) X^{\mathrm T}(T, s)\, ds \Big]^{-1} X(T, t).

Here X(t, s), t ≥ s, denotes the fundamental matrix of system (2.1.23);
sometimes this matrix is also called the Cauchy matrix. The properties of
the fundamental matrix are described by the relations
and the exponential matrix can be expressed in the standard way [62] either
via the Lagrange-Sylvester interpolation polynomial (in the case of simple
eigenvalues of the matrix A) or via the generalized interpolation polynomial
(in the case of multiple eigenvalues and nonsimple elementary divisors of
the matrix A).
If the matrix A is time-varying, the construction of the fundamental
matrix (2.1.33) becomes more complicated and requires, as a rule, the use
of numerical integration methods.
²There also exist approximate analytic methods for calculating the matrices P(t)
[1, 72]. However, for matrices P(t) of large dimensions, these methods meet serious
computational difficulties.
where N(t) and σ₀(t) are given matrices and η(t) is either a stochastic
process of the white noise type (1.1.34) or a Gaussian Markov process. In
this case, the optimal control algorithm coincides with (2.1.30), in which,
instead of the true values of the current phase vector x = x(t), we use the
vector of current estimates m = m(t) of the phase vector. These estimates
are formed with the help of Eqs. (1.5.53) for the Kalman filter, which, with
regard to the notation in (2.1.23) and (2.1.34), have the form³
³Equations (2.1.35) and (2.1.36) correspond to the case in which η(t) in (2.1.34) is a
white noise.
where ξ₁(t) and ξ₂(t) are scalar Gaussian white noises (1.1.31), θ(t) is an
ℓ-vector of independent Poisson processes with intensity coefficients λᵢ (i =
1, ..., ℓ), σ₁, σ₂, and σ₃ are given n × n, n × r, and n × ℓ matrices, and
the other variables have the same meaning as in (2.1.23). For the exact
solution of problem (2.1.37), (2.1.24), see [34].
We also note that sufficiently effective methods have been developed for
infinite-dimensional linear-quadratic problems of optimal control, in which the
plant P is either a linear dynamic system with distributed parameters or
a quantum-mechanical system. Results on the control of dis-
tributed parameter systems can be found in [118, 130, 164, 182], and on the
control of quantum systems in [12, 13].
All linear-quadratic problems of optimal control, just like the examples
treated above, are characterized by the fact that the loss function sat-
isfying the Bellman equation is a quadratic form (a quadratic functional)
and the optimal control law is a linear function (a linear operator) of
the phase variables (the state function).
Solving the Bellman equation becomes much more difficult if it is nec-
essary to take into account restrictions on the domain of admissible
control values in the design of an optimal system. In this case, exact an-
alytical results can be obtained, as a rule, only for one-dimensional synthesis
problems (or for problems reducible to one-dimensional ones). Some
problems of this type are considered in the following sections of this chapter.
(here u_m determines the admissible range of the servomotor speed, −u_m ≤
u ≤ u_m). Equation (2.2.1) adequately describes the dynamics of a direct-
current motor controlled by the voltage on the motor armature under the
assumption that the moment of inertia and the inductance of the armature
winding are small [2, 50]. We shall show that various synthesis problems
stated in §1.4 can be solved for such servomechanisms.
2.2.1. Let y(t) be a diffusion Markov process with constant drift coefficient a and
diffusion coefficient B. We need to calculate the controller C (see Fig. 2)
that minimizes the integral optimality criterion (2.2.2). The corresponding
Bellman equation is

\frac{\partial F}{\partial t} + a \frac{\partial F}{\partial y} + \frac{B}{2} \frac{\partial^2 F}{\partial y^2} + c(x, y) + \min_{|u| \le u_m} \Big[ u \frac{\partial F}{\partial x} \Big] = 0.   (2.2.3)
We shall consider penalty functions c(x, y) that depend only on the error
signal, that is, on the difference z = y − x between the command input y
and the controlled variable x. Obviously, in this case, the loss function
F(t, x, y) = F(t, y − x) = F(t, z) in (2.2.3) also depends only on z. Instead
of (2.2.3), we have

\frac{\partial F}{\partial t} + a \frac{\partial F}{\partial z} + \frac{B}{2} \frac{\partial^2 F}{\partial z^2} + c(z) + \min_{|u| \le u_m} \Big[ -u \frac{\partial F}{\partial z} \Big] = 0.   (2.2.4)
The minimum of the expression in the square brackets in (2.2.4) is attained
by the control⁴

u_*(t, z) = u_m \operatorname{sign} \frac{\partial F(t, z)}{\partial z},   (2.2.5)

⁴In (2.2.5), sign a denotes the following scalar function of a scalar argument a:
sign a = 1 for a > 0 and sign a = −1 for a < 0.
which requires switching the servomotor speed instantly from one admissible
limit value to the opposite one when the derivative ∂F(t, z)/∂z of the loss
function changes its sign. Control of the form (2.2.5) is naturally called
control of relay type (sometimes this control is called "bang-bang" control).
Substituting (2.2.5) instead of u into (2.2.4) and omitting the symbol
"min", we reduce Eq. (2.2.4) to the form
⁵More precisely, the condition that there exist positive constants A₁, A₂, and a
such that 0 < c(z) ≤ A₁ + A₂|z|^a for all z imposes a constraint on the growth of the
function c(z).
for the functions f₁(z) and f₂(z) that determine the function f(z) on each
side of the switch point z₀. The unique solutions of the linear equations (2.2.11)
are determined by the behavior of f₁ and f₂ as |z| → ∞. It follows from
the statement of the problem that if we take into account the diffusion
"divergence" of the trajectories z(t) for large |z|, then we only obtain small
corrections to the value of the optimality criterion and, in the limit as
|z| → ∞, the loss functions f₁(z) and f₂(z) must behave just as the solutions
of Eqs. (2.2.11) with B = 0. The corresponding solutions of Eqs. (2.2.11)
have the form

\frac{df_1}{dz} = \frac{2}{B} \int_z^{\infty} [c(\bar z) - \gamma] \exp\Big[ -\frac{2(u_m - a)(\bar z - z)}{B} \Big]\, d\bar z,   (2.2.12)

\frac{df_2}{dz} = -\frac{2}{B} \int_{-\infty}^{z} [c(\bar z) - \gamma] \exp\Big[ \frac{2(u_m + a)(\bar z - z)}{B} \Big]\, d\bar z.
According to (2.2.7), we have the matching relations (2.2.13) at the switch point z₀.
To obtain explicit formulas for the switch point and the stationary error, it
is necessary to choose a specific penalty function c(z). For example, for
the quadratic penalty function c(z) = z², formulas (2.2.14) and (2.2.15) yield
(2.2.16)-(2.2.19), and the block diagram of the optimal servomechanism takes the form
shown in Fig. 17.
The optimal system shown in Fig. 17 differs from the optimal systems
considered in the preceding section by the presence of an essentially nonlin-
ear element of the ideal-relay type in the feedback circuit. The other distinction
between the system in Fig. 17 and the optimal linear systems considered
in §2.1 is that the control method depends on the diffusion coefficient B
of the input stochastic process (in §2.1, the optimal control is independent
of the diffusion coefficients,⁶ and therefore the block diagrams of the optimal
deterministic and stochastic systems coincide).
If B = 0 (the deterministic case), then it follows from (2.2.16)-(2.2.19)
that the switch point z₀ = 0 and the stationary tracking error γ = 0. These
⁶This takes place if the current values of the state vector x(t) are measured exactly.
results readily follow from the statement of the problem; to obtain them
it is not necessary to use the dynamic programming method. Indeed,
if at some instant of time we have y(t) > x(t) (z(t) > 0), then, obviously, it
is necessary to increase x at the maximum rate (that is, at u = +u_m) until
the equality y = x (z = 0) is attained. Then the motor can be stopped. In
a similar way, for y < x (z < 0), the control u = −u_m is switched on and
operates until y becomes equal to x. After y = x is attained and the motor
is stopped, the zero error z remains constant, since there are no random
actions to take the system out of the state z = 0. Therefore, the stationary
tracking "error" is zero.⁷
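The effect of diffusion on this relay law can be checked by simulation. The sketch below uses Euler-Maruyama integration of the error dynamics dz = (a − u)dt + √B dW, which is the form implied by ẋ = u and the drift/diffusion description of y(t); the specific parameter values and the trial shifts zs of the switch point are hypothetical choices for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Error dynamics assumed from the text: dz = (a - u) dt + sqrt(B) dW,
# with relay control u = um * sign(z - zs) switching at a point zs.
a, B, um = 0.3, 0.5, 1.0
dt, steps = 1e-3, 300_000

def mean_square_error(zs):
    z, acc = 0.0, 0.0
    for _ in range(steps):
        u = um if z > zs else -um
        z += (a - u) * dt + np.sqrt(B * dt) * rng.normal()
        acc += z * z
    return acc / steps

# The deterministic switch point zs = 0 is no longer optimal when B > 0;
# scanning a few (hypothetical) shifts illustrates the dependence.
for zs in (0.0, -0.1, -0.2):
    print(f"zs = {zs:+.2f}:  E[z^2] ~ {mean_square_error(zs):.4f}")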
If the diffusion is taken into account, then the deterministic optimal
control u_opt = u_m sign z is no longer optimal. This fact can be explained as
follows. Let u = u_m sign z, and let B ≠ 0. Then the following two factors
affect the trajectories z(t): they regularly move downwards with velocity
(u_m − a) for z > 0 and upwards with velocity (u_m + a) for z < 0 due to the
drift a and the control u (see Fig. 18), and they "spread" due to the diffusion B,
which is the same for all z. As a result, the stochastic process z(t) becomes
stationary (since the regular displacement towards the t-axis is proportional
to t, while the diffusion spreading away from the t-axis is proportional to √t),
and all sample paths of z(t) are localized in a strip of finite width containing
the t-axis.⁸ However, since the "returning" velocities in the upper and lower
half-planes are different, the stationary trajectories of z(t) are arranged
asymmetrically with respect to the t-axis.
⁷It is assumed that the penalty function c(z) attains its minimum value at z = 0
and c(0) = 0.
⁸More precisely: if z(0) = 0, then with probability 1 the values of z(t) lie in a strip
of finite width for all t ≥ 0.
In conclusion, we note that all results obtained in this section can readily
be generalized to the case where the plant P is subject to additive uncon-
trolled perturbations of the white noise type (see Fig. 10). In this case,

\dot x = u + \sqrt{N}\, \xi(t),   (2.2.21)

where ξ(t) is the standard white noise (1.1.31) independent of the input
process y(t) and N > 0 is a given number.
In this case, the Bellman equation (2.2.3) acquires the form

\frac{\partial F}{\partial t} + a \frac{\partial F}{\partial y} + \frac{B}{2} \frac{\partial^2 F}{\partial y^2} + \frac{N}{2} \frac{\partial^2 F}{\partial x^2} + c(x, y) + \min_{|u| \le u_m} \Big[ u \frac{\partial F}{\partial x} \Big] = 0.
This equation differs from (2.2.4) only by the coefficient of the diffusion term.
Therefore, all results obtained for systems whose block diagram is shown
in Fig. 2 and whose plant is described by Eq. (2.2.1) remain
valid for the systems in Fig. 10 with Eq. (2.2.21) if in the original problem the
diffusion coefficient B is replaced by B + N. In particular, if the noise in the
plant is taken into account, then formulas (2.2.16) and (2.2.17) for the
stationary switch point and the stationary tracking error are modified accordingly.
Note also that the problem studied in this section is equivalent to the
synthesis problem for a servomechanism tracking a Wiener process of inten-
sity B under the nonsymmetric constraints −u_m + a ≤ u ≤ u_m + a on
admissible controls, since both problems have the same Bellman equation
(2.2.4).
2.2.2. Now let us consider a synthesis problem that differs from the
problem considered in the preceding section only by the optimality crite-
rion. We assume that there is an admissible domain [ℓ₁, ℓ₂] for the error
z(t) = y(t) − x(t) (ℓ₁ and ℓ₂ are given numbers such that ℓ₁ < ℓ₂). We
assume that if z(t) leaves this domain, then serious undesirable effects may
occur. For example, the system considered, or any more
complicated system containing it, may be destroyed. In this case,
it is natural to look for controls that keep z(t) within the admissible limits
for the maximum possible time.
General problems of calculating the maximum mean time of the first
passage to the boundary were considered in §1.4. In particular, the Bell-
man equation (1.4.40) was obtained there; in the scalar case studied here, it
takes the form (2.2.25). The function F₁(z) satisfies Eq. (2.2.25) at the interior
points of the domain [ℓ₁, ℓ₂] of admissible errors z. At the boundary points
of this domain, F₁ vanishes (see (1.4.41)):

F_1(\ell_1) = F_1(\ell_2) = 0.   (2.2.26)

The optimal system can be synthesized by solving Eq. (2.2.25) with the
boundary conditions (2.2.26). Just as in the preceding section, one can see
that the optimal control u_*(z) is of relay type and is given by formula (2.2.27).
The condition of smooth matching (see [113], p. 52) implies that the
solution F₁(z) of Eq. (2.2.28) and the derivatives dF₁/dz and d²F₁/dz² are
continuous everywhere in the interior of [ℓ₁, ℓ₂]. Therefore, the switch point
z₀′ is determined by the condition

\frac{dF_1}{dz}(z_0') = 0,

which is similar to (2.2.20) and differs from it only by the position of the switch
point. Thus, in this case, if the applied constant displacement −z₀ is re-
placed by −z₀′, then the block diagram of the optimal system coincides with
that in Fig. 15.
The switch point z₀′ can be found by solving Eq. (2.2.28) with the bound-
ary conditions (2.2.26). Just as in the preceding section, we replace the
nonlinear equation (2.2.28) by a pair of linear equations for the
function F₁⁺(z), z₀′ < z < ℓ₂, and the function F₁⁻(z), ℓ₁ < z < z₀′.
The required switch point z₀′ can be obtained from the matching conditions
for F₁⁺(z) and F₁⁻(z). Since F₁(z) is twice continuously differentiable, it
follows from (2.2.27) that these conditions have the form (2.2.31), (2.2.32).
The solutions of the two linear equations are
F_1^+(z) = \frac{z - \ell_2}{u_m - a} + \frac{B}{2(u_m - a)^2} \Big\{ \exp\Big[ \frac{2(u_m - a)(\ell_2 - z_0')}{B} \Big] - \exp\Big[ \frac{2(u_m - a)(z - z_0')}{B} \Big] \Big\},

F_1^-(z) = \frac{\ell_1 - z}{u_m + a} + \frac{B}{2(u_m + a)^2} \Big\{ \exp\Big[ \frac{2(u_m + a)(z_0' - \ell_1)}{B} \Big] - \exp\Big[ -\frac{2(u_m + a)(z - z_0')}{B} \Big] \Big\}.   (2.2.33)
By using (2.2.33) and the continuity condition (2.2.31), we obtain the fol-
lowing transcendental equation for the required point z₀′:

2 u_m z_0' = (u_m + a)\ell_2 + (u_m - a)\ell_1 + \frac{2 a u_m B}{u_m^2 - a^2} + \frac{B}{2} \Big\{ \frac{u_m - a}{u_m + a} \exp\Big[ \frac{2}{B}(u_m + a)(z_0' - \ell_1) \Big] - \frac{u_m + a}{u_m - a} \exp\Big[ -\frac{2}{B}(u_m - a)(z_0' - \ell_2) \Big] \Big\}.   (2.2.34)
In the special case a = 0, Eq. (2.2.34) yields z₀′ = (ℓ₁ + ℓ₂)/2;
that is, the switch point is the midpoint of the interval of admissible er-
rors z. This natural result could be predicted without solving the Bellman
equation. In the other special case where −ℓ₁ = ℓ₂ = ℓ (ℓ > 0) and a ≪ u_m,
Eq. (2.2.34) gives an approximate expression for z₀′.
2.2.3. The parameter β > 0 determines the observation time for the stochastic
process z(τ). We assume that the criteria (2.2.35) and (2.2.36) are equiv-
alent if the terminal time T and the variable β are matched, for example,
as follows: T = c/β, where c > 0 is a constant.
The Bellman equation for the problem considered can be obtained from
(1.4.51) with regard to the relation f₂(x, y) = f₂(y − x) = f₂(z). This
equation has the form
Just as in the preceding sections, after the expression in the square brackets
is minimized, Eq. (2.2.37) acquires the form

a \frac{df_2}{dz} + \frac{B}{2} \frac{d^2 f_2}{dz^2} - u_m \Big| \frac{df_2}{dz} \Big| = \beta f_2(z) \quad \text{if } f_2(z) > |z|; \qquad f_2(z) = |z| \quad \text{otherwise}.   (2.2.38)

In this case, just as in the preceding sections, the optimal control u_*(z) is
of relay type and can be written in the form (2.2.20). The only distinction
is that, in general, the switch point z₀″ differs from z₀ and z₀′. The point z₀″
can be found by solving Eq. (2.2.38).
Solving Eq. (2.2.38), we distinguish two domains on the z-axis:
the domain Z₁ where f₂(z) > |z| and the domain Z₂ where f₂(z) = |z|.
Obviously, if f₂(z*) = |z*| for some z*, then f₂(z) = |z| for any z such that
|z| > |z*|. In other words, the domain Z₂ consists of two infinite intervals
(−∞, z′) and (z″, +∞). In the domain Z₁ lying between the boundary
points z′ < 0 and z″ > 0, we have Eq. (2.2.39).
Next, the interval [z′, z″] is divided by the switch point z₀″ into two
parts: on the interval z′ < z < z₀″, Eq. (2.2.39) takes the form (2.2.40),
and on the interval z₀″ < z < z″ it takes the form (2.2.41).
Thus, in this case, we have seven unknown variables: z′, z″, z₀″, and the
four constants obtained by integrating Eqs. (2.2.40) and (2.2.41). They can
be found from the seven conditions (2.2.42).
Formulas (2.2.42) are the smooth matching conditions for the solutions f₂⁻(z)
and f₂⁺(z). The last three conditions show that the solutions and their
first-order derivatives are continuous at the switch point z₀″ (see (2.2.31)
and (2.2.32)). The first four conditions show that the solutions and their
first-order derivatives are continuous at the boundary points of Z₁.
By solving (2.2.40) and (2.2.41) with regard to (2.2.42), we readily obtain
The desired switch point z₀″ can be found by solving the system of tran-
scendental equations (2.2.43)-(2.2.45). Usually, this system can be solved
by numerical methods. An explicit expression for z₀″ can be obtained from
Eqs. (2.2.43)-(2.2.45) only if the problem is symmetric, that is, if a = 0. In
this case, the domain Z₁ is symmetric about the origin, z′ = −z″, and the
switch point is z₀″ = 0. However, this is a rather trivial case of
no practical interest.
REMARK. It should be noted that the optimal systems considered in
Sections 2.2.2 and 2.2.3 are very close to each other (the switch points
nearly coincide, z₀″ ≈ z₀′)⁹ if the corresponding parameters of the two problems
agree well with each other. These parameters can be made consistent in
the following way. Assume that the same parameters a, B, and u_m are
given in the problems of Sections 2.2.2 and 2.2.3. Then, choosing a value of
the parameter β, we can calculate the three numbers z′ = z′(β), z″ = z″(β),
and z₀″ = z₀″(β) in dependence on the choice of β. Now if we use z′ and z″
as the boundary values of admissible errors (ℓ₁ = z′(β), ℓ₂ = z″(β)) in the
problem considered in Section 2.2.2, then, solving Eq. (2.2.34), we obtain
the coordinate of the switch point z₀′ and find that z₀″(β) ≈ z₀′(β) for β
varying from 1.0 to 10⁻⁴. This is confirmed by the numerical experiment
described in [92]. Moreover, in [92] it is shown that F₁(z₀″(β)) ≈ β⁻¹ for
these values of the parameter β.
2.2.4. Now let us consider the synthesis problem of optimal tracking of a
discontinuous Markov process. Let us assume that the input process y(t)
in the problem of Section 2.2.1 is a purely discontinuous Markov process. As
shown in §1.1, such processes are completely characterized by the intensity
λ(y) of the jumps and the density function π(y, y′) describing the transition
probabilities at the jump instants. The one-dimensional density p(t, y) of
this process satisfies the Feller equation (see (1.1.71))
From (1.4.61), with regard to (2.2.1) and (2.2.2), we obtain the Bellman
equation (2.2.47), which contains the penalty term c(z, y) and the minimization
over |u| ≤ u_m.
Comparing Eqs. (2.2.46) and (2.2.47) with the Feller equations (1.1.69) and
(1.1.70), we see that, for purely discontinuous processes, the Bellman equation
(2.2.47) contains the integro-differential operator L̃_{t,y} of the backward Feller
equation; this operator is dual to L_{t,y}. Therefore, Eq. (2.2.47) can be
written in the form
⁹The approximate relation z₀″(β) ≈ z₀′(β) means that |z₀″(β) − z₀′(β)| ≪ z″(β) − z′(β).
In what follows, we assume that the input Markov process y(t) is homo-
geneous with respect to the state variable y, that is, λ(y) = λ = const and
π(y, y′) = π(y − y′). In this case, by using the formal method proposed in
[176], we can replace the integro-differential operator L̃_{t,y} in (2.2.47) and
(2.2.49) by an equivalent differential operator.
Let us show how to do this. First, we write Eqs. (2.2.46) and
(2.2.48) in the form (2.2.50), (2.2.51). Then, by the well-known property of the
Fourier transform of a convolution of two functions,¹⁰ we obtain two equations
involving a function L(s). If L(s) is a rational function
(hᵢ and qᵢ being constant numbers), then, as follows from the theory of Fourier
transforms [41], the desired operator L(d/dy) can be obtained from L(s)
by the formal substitution s → d/dy.
¹⁰Recall that λ(y) = λ and π(z̄, y) = π(y − z̄) in (2.2.46).
differs from (2.2.20) only by the position of the switch point z₀‴, which can
be obtained from the corresponding matching condition.
On the domain z < z₀‴, where df₃/dz < 0 and u_* = u_m, Eq. (2.2.57)
takes the form

-\lambda L\Big( \frac{d}{dz} \Big) f_3 + u_m \frac{df_3}{dz} = \gamma - c(z).

With regard to (2.2.64), we can write Eqs. (2.2.59) and (2.2.60) as the
second-order equations (2.2.67) for the functions ψ±(z), whose right-hand
sides c̃±(z) are expressed via c(z), γ, λ, and u_m.
Relations (2.2.61) and (2.2.66) imply the matching conditions (2.2.69) for
the functions ψ±(z) at the switch point z₀‴.
By I-"$ and p i we denote the roots of the characteristic equation for the
function cp+(z) (correspondingly, by pT and py the roots of the character-
istic equation for cp-(2)). A straightforward verification shows that if
then
(1) all roots p t 2 are real,
(2) each characteristic equation in (2.2.70) has roots of opposite signs
(for definiteness, in what follows, we assume that p$ and p: are
positive, and respectively, p: < 0).
REMARK.Note that condition (2.2.71) must be satisfied, since this is
the existence condition for the stationary tracking considered here. Indeed,
the expression on the left-hand side in (2.2.71) is equal to the absolute value
of the mean rate of the regular displacement in the command signal y(t)
caused by random jumps. Obviously, this rate of change cannot exceed the
limit admissible speed of the servomotor. Inequality (2.2.7 1)just confirms
this fact.
Taking into account these properties of the roots of the characteristic
equations (2.2.70), one can readily see that the solutions of Eqs. (2.2.67)
have the form (2.2.72). Using (2.2.72) and the matching conditions (2.2.69),
we obtain the following equation for the required switch point z₀‴:
For the quadratic penalty function c(z) = z², Eq. (2.2.73) can be solved
exactly. Indeed, taking into account (2.2.68), we can rewrite Eq. (2.2.73)
in the form (2.2.74). Since μ₁,₂⁺ and μ₁,₂⁻ satisfy the quadratic equations
(2.2.70), after easy transformations we obtain the explicit formula (2.2.77)
for the switch point.
Using (2.2.69), (2.2.72), and (2.2.77), we can readily calculate the stationary
specific error (2.2.78), which is expressed via the parameters λ, Δ, k, and u_m
of the problem.
Formulas (2.2.79) and (2.2.80) were first derived, by somewhat different
methods, in [176].
Here x = x(t) is the population size¹² at time t, and the constant number a,
called the growth factor, is defined as the difference between the birth-
rate and the death-rate factors.
If the birth rate is larger than the death rate (a > 0), then, according
to the Malthus model (2.3.1), the population size must grow infinitely.
Usually, this result is not confirmed, which shows that the model (2.3.1)
is not perfect. Nevertheless, the basic idea of this model, namely, the
assumption that the rate of variation of the population is proportional to the
current population size, proved to be very fruitful. Many more realistic
models were constructed on this basis by introducing appropriate
corrections into the growth factor a.
So, for example, if we assume that in (2.3.1) the growth factor a depends
on the population size x as a = r(1 − x/K), we arrive at the equation

\dot x = r x \Big( 1 - \frac{x}{K} \Big).   (2.3.3)
Equation (2.3.3) is often called the logistic equation. The positive con-
stants r and K are usually called the natural growth factor and the capacity
of the medium, respectively.
Models of more complicated systems of interacting populations are also
based on the Malthus model (2.3.1). Assume that two different populations
of sizes x₁ and x₂ inhabit the same habitat, and let each of them be described
by a Malthus-type equation.
When catching is taken into account, the logistic model (2.3.3) turns into

\dot x = r x \Big( 1 - \frac{x}{K} \Big) - q u x,   (2.3.10)

where the function u = u(t) ≥ 0 is the intensity of the catching process and
the number q > 0 is the catchability coefficient. In this case, the value

Q = q \int_{t_1}^{t_2} u(t)\, x(t)\, dt   (2.3.11)

gives the number of individuals caught during the time interval [t₁, t₂].¹³
In a similar way, the Lotka-Volterra equations can be generalized to a
controlled system (2.3.12). Taking random perturbations into account leads,
in particular, to the stochastic version of the controlled logistic model,

\dot x = r x \Big( 1 - \frac{x}{K} \Big) - q u x + \sqrt{B}\, x\, \xi(t),   (2.3.13)

where ξ(t) is the scalar Gaussian white noise (1.1.31) and the number B > 0
determines the intensity of the random perturbations. Many other stochastic
models used to describe the dynamics of various biological systems can be
found in [51, 83].
¹³Note that Eq. (2.3.10) can be used not only in the case of "mechanical" removal
(catching, shooting, etc.) but also in the case where the population size is controlled by
treating the habitat with chemical agents.
\dot x = r x \Big( 1 - \frac{x}{K} \Big) - q u x.   (2.3.14)

By p > 0 we denote the price of the unit mass of caught fish and by
c > 0 the price of the unit "effort" u of fishing. Then it is natural to estimate
the "quality" of control by the functional

I(u) = \int_0^T (p q x(t) - c)\, u(t)\, dt,   (2.3.15)

which, with regard to (2.3.11), gives the profit provided by the fishing
process defined by the control function u(t): 0 ≤ t ≤ T. The problem is to
find the optimal control u_*(t): 0 ≤ t ≤ T for which the functional (2.3.15)
attains its maximum.
Following [35, 68], instead of (2.3.15), we shall estimate the quality of
control (that is, of fishing) by the functional with the terminal time T → ∞.
In this case, an additional "killing" factor appears in the integrand to ensure
convergence, namely,

I(u) = \int_0^{\infty} e^{-\delta t} (p q x(t) - c)\, u(t)\, dt.   (2.3.16)
Note that if the initial size satisfies x(0) < K, then for any
t > 0 and any admissible control u(t) the population size has the same
property, x(t) < K. Therefore, this problem is well posed if the parameters
p, q, and c in the functional (2.3.16) satisfy the condition

x_1 = \frac{c}{p q} < K.   (2.3.17)

Otherwise (x₁ ≥ K), the problem has only the trivial solution u_*(t) ≡ 0,
t ≥ 0.¹⁴ Therefore, in what follows, we assume that inequality (2.3.17)
is satisfied. We also assume that qu_m > r.
2.3.3. The solution of problem (2.3.14), (2.3.16). If we define the
function F(x) of the maximum future profit by the relation

F(x) = \max_{0 \le u(t) \le u_m} \Big[ \int_0^{\infty} e^{-\delta t} (p q x(t) - c)\, u(t)\, dt \;\Big|\; x(0) = x \Big],   (2.3.18)

then, using the standard procedure described in §1.3, we obtain the Bellman
equation

\max_{0 \le u \le u_m} \Big\{ \Big[ r x \Big( 1 - \frac{x}{K} \Big) - q u x \Big] \frac{dF}{dx} + (p q x - c) u \Big\} = \delta F(x).   (2.3.19)
x(t) ∈ R¹ for all t ∈ [0, t_*). Hence, it follows from Eq. (2.3.14) with u ≡ 0
that, on the interval [0, t_*), the population size x(t) increases monotonically
up to the value x_* = x(t_*) that separates the sets R¹ and R². At the
point x_*, as was already noted, the control may take an arbitrary value.
It is expedient to take this value equal to

u_1 = \frac{r}{q} \Big( 1 - \frac{x_*}{K} \Big)   (2.3.21)

and keep it constant for t > t_*. It follows from (2.3.14) that the control
(2.3.21) preserves the population size x_*.
REMARK. For u(x_*) ≠ u₁, the representative point of system (2.3.14),
starting from the state x_*, arrives either at the set R¹ (for u(x_*) > u₁) or at
the set R² (for u(x_*) < u₁) within an infinitely small time interval. Then
the control u = 0 (or u = u_m) immediately returns the representative point
to the state x_*. Thus, for u(x_*) ≠ u₁, the population size x_* is preserved
by infinitely rapid switchings of the control (this is the sliding mode). Although,
as follows from (2.3.19), the value of the functional (2.3.16) for this control
remains the same as for u(x_*) = u₁, the constant control u(t) ≡ u(x_*) = u₁,
t ≥ t_*, is more convenient, since in this case the existence problem does not
arise for the solution x(t), t ≥ t_*, of Eq. (2.3.14). The optimal control

u_*(t) = \begin{cases} 0 & \text{for } 0 \le t < t_* \ (x(t) < x_*), \\ u_1 & \text{for } t \ge t_*, \end{cases}   (2.3.22)

realizes the generalized solution x_*(t) of Eq. (2.3.14) in the Filippov sense
(see §1.1).
Thus, for x(0) = x₀ < x_*, the optimal control (2.3.22) is the piecewise
constant function shown in Fig. 21, together with the plot of the function x_*(t)
showing the change of the population size under this control. It
remains to find the moment t_* at which the catching starts or,
which is the same, the size (density) x_* = x_*(t_*) of the population
that must be kept constant in the regime of active catching. These variables
can readily be obtained by calculating the functional (2.3.16) and maximizing it
with respect to t_*. Indeed, for the control (2.3.22), the functional
(2.3.16) is equal to

I(u_*) = e^{-\delta t_*}\, \psi(x_*), \qquad \psi(x) = \frac{r}{\delta q} (p q x - c) \Big( 1 - \frac{x}{K} \Big).   (2.3.23)

We can calculate its maximum with respect to t_* by using the fact that
x_* = x_*(t_*), as a function of t_*, satisfies Eq. (2.3.14) with u ≡ 0.
FIG. 21
After differentiating (2.3.23) with respect to t_*,

\frac{d}{dt_*} \big[ e^{-\delta t_*} \psi(x_*) \big] = -\delta e^{-\delta t_*} \psi(x_*) + e^{-\delta t_*} \frac{d\psi}{dx_*} \frac{dx_*}{dt_*} = 0,

we obtain the equation (2.3.25) for the optimal size x_* of the population;
this equation, in turn, allows us to find the moment t_*. From (2.3.23) and
(2.3.27), we explicitly calculate the profit function
F(x) = \frac{r}{\delta q} (p q x_* - c) \Big( 1 - \frac{x_*}{K} \Big) \Big[ \frac{x (K - x_*)}{x_* (K - x)} \Big]^{\delta / r}   (2.3.28)

for x ≤ x_*.
To solve problem (2.3.14), (2.3.16) completely, it remains to consider
the case x(0) = x₀ > x_*, that is, the case where the initial population
size is larger than the optimal size (2.3.25). First, we note that, in view of
(2.3.28), the profit function F(x) monotonically increases on the interval
0 ≤ x ≤ x_* from zero to the maximum value F(x_*) = ψ(x_*).
We also note that the function ψ(x) has only one maximum point x₂;
applying the constant control (2.3.30), we can keep the population size at
the level x₂, for which the functional (2.3.16) attains the value I(u(x₂)) = ψ(x₂).
However, the constant control (2.3.30) is not optimal. One can readily
see that the functional (2.3.16) can take values larger than I(u(x₂)) =
ψ(x₂) if, instead of (2.3.30), we use the piecewise constant control function¹⁵ (2.3.31).
¹⁵The inequality I(u_Δ(t)) > I(u(x₂)) = ψ(x₂) is verified by calculating the func-
tional (2.3.16) with regard to Eq. (2.3.14), where the control has the form (2.3.31). Here
we do not perform the corresponding elementary but cumbersome calculations and leave
them to the reader as an exercise.
Obviously, control functions of the form (2.3.31) can be used not only
for the initial population size x(0) = x₂ but also for an arbitrary initial size
x(0) = x > x_*. In this case, we must only replace x₂ by x in
Eq. (2.3.32) for the length Δ of the initial pulse u_m. One can easily verify
that (2.3.20) implies φ(x) > 0 for all x > x_*. Therefore, the optimal control
as a function of the current population size (the synthesizing function) for
problem (2.3.14), (2.3.16) has the form

u_*(x) = \begin{cases} 0 & \text{for } 0 \le x < x_*, \\ u_1 & \text{for } x = x_*, \\ u_m & \text{for } x > x_*, \end{cases}   (2.3.33)

where x_* is determined by (2.3.25).
Formula (2.3.33) gives the mathematical expression of the control strat-
egy well known in the theory of optimal fisheries management [35,
68]. The key point of this strategy is the existence of an optimal size x_* of
the fish population given by (2.3.25). The goal of control is to bring the
population to the optimal size x_* as soon as possible and then to preserve
this size by using the constant control (2.3.21). This control strategy maximizes
the profit obtained by fishing if the profit is estimated by the functional
(2.3.16).
In conclusion, we note that the results presented in this section can be
generalized to the case in which the dynamics of the fish population is
governed by a retarded equation (an equation with delay), that is, to the
controlled Hutchinson model. For the results related to this case, see [99].
The stochastic version of problem (2.3.14), (2.3.16), in which the behavior
of the population is described by the stochastic equation (2.3.13), will be
studied in §6.3.
where ξ(t) is the scalar Gaussian white noise (1.1.31), B > 0 is a given positive
number, and the natural growth factor r > 0 and the catchability coefficient
q > 0 have the same meaning as the corresponding coefficients in (2.3.10), (2.3.13),
and (2.3.14). Equation (2.4.1) is a special case (as K → ∞) of Eq. (2.3.13);
in accordance with the classification of Section 2.3.1, the
model described by Eq. (2.4.1) can be called a controlled stochastic Malthus
model.
Just as in §2.3, the size x(t) of the fish population is controlled by catch-
ing a part of this population. In this case, the catching intensity u(t) has
an upper bound u_m, and therefore the set of all nonnegative measurable
bounded functions of the form u(t): [0, ∞) → [0, u_m] is considered as the
set of admissible controls. The goal of control is to maximize the functional
(2.3.16), which, in view of the random character of the functions x(t) and
u(t), is replaced by the corresponding mean value. As a result, we have the
problem

I(u) = \mathrm E \Big[ \int_0^{\infty} e^{-\delta t} (p q x(t) - c)\, u(t)\, dt \Big] \to \max_{0 \le u(\cdot) \le u_m}.   (2.4.2)
In what follows, we assume that the decay index δ in (2.4.2) satisfies the
condition δ > r.
We shall solve problem (2.4.1), (2.4.2) by the standard procedure
of the dynamic programming approach described in §1.4. We define the
profit function for problem (2.4.1), (2.4.2) by the relation

F(x) = \max_{0 \le u(t) \le u_m} \mathrm E \Big[ \int_0^{\infty} e^{-\delta t} (p q x(t) - c)\, u(t)\, dt \;\Big|\; x(0) = x \Big];   (2.4.3)

the Bellman equation (2.4.5) for the profit function (2.4.3) can be obtained
in the usual way (see §1.4). It should be pointed out that a symmetrized
stochastic integral (see [174] and §1.2)
was used in writing (2.4.5). This leads to the additional term B in the
parentheses in (2.4.5), that is, in the coefficient of xF_x.¹⁶
Equation (2.4.5) allows us to find the optimal control u_* as a function
u_*(x) of the current state of system (2.4.1). First, we note that, according
to (2.4.5), the set of all admissible states of (2.4.1) can be divided into the
following two subsets (just as in §2.3):
the subset R¹, where φ(x) = pqx − c − qxF_x < 0 and u_*(x) = 0,
and
the subset R², where φ(x) > 0 and u_*(x) = u_m.
The boundary between these two subsets is determined by the relation

\varphi(x) = p q x - c - q x F_x = 0.   (2.4.6)

Further calculations show that, in this problem, there exists a unique
point x_* satisfying (2.4.6). Therefore, the subsets R¹ and R² are the inter-
vals R¹ = [0, x_*) and R² = (x_*, ∞). Thus the optimal control in the syn-
thesis form u_* = u_*(x) is uniquely determined at all points x ∈ R₊
except the point x_*. It follows from (2.4.6) that we can use any admis-
sible control u(x_*) ∈ [0, u_m] at the point x_*.
Therefore, the optimal control function u_*(x) can be represented in the
form

u_*(x) = \begin{cases} 0 & \text{for } 0 \le x < x_*, \\ u_m & \text{for } x > x_*, \end{cases}   (2.4.7)

and the final solution of the synthesis problem reduces to calculating the
coordinate of the switch point x_*. To calculate x_*, we need to solve the
Bellman equation (2.4.5).
As was already noted, the second-order derivative of the profit function
F(x) is continuous; thus the profit function F(x) satisfying (2.4.5) can
be obtained by the matching method (see §2.2). In what follows,
we describe in detail the procedure for solving the Bellman equation (2.4.5) and
calculating the coordinate of the switch point x_*.
By F¹(x) and F²(x) we denote the profit function F(x) on the intervals
R¹ = [0, x_*) and R² = (x_*, ∞), respectively. It follows from (2.4.5) and (2.4.7)
that the functions F¹ and F² satisfy the linear equations (2.4.8) and (2.4.9).
¹⁶If the stochastic differential equation in (2.4.1) is understood as an Ito equation, then
the second term in the Bellman equation (2.4.5) has the form (r − qu)xF_x.
Since the profit function F(x) is sufficiently smooth (recall that its second-
order derivative is continuous), both functions F¹ and F² must
satisfy condition (2.4.6) at the switch point x_*. Taking into account
the fact that F(0) = 0 by (2.4.1) and (2.4.3), we obtain the following
additional boundary condition for the function F¹(x):

F^1(0) = 0.   (2.4.10)

The boundary conditions (2.4.6), (2.4.10) and the upper bound (2.4.4)
allow us to obtain an explicit analytic solution of Eqs. (2.4.8) and (2.4.9).
Equation (2.4.8) is the well-known homogeneous Euler equation. Its
general solution has the form

F^1(x) = A_1 x^{k_1} + A_2 x^{k_2}.   (2.4.11)

Since the roots k₁ and k₂ of Eq. (2.4.12) have opposite signs, we conclude that,
to satisfy condition (2.4.10), we must set A₂ equal to zero in (2.4.11). The constant
A₁ is determined by the matching conditions at the switch point, and the exponents
are given by

k_{1,2} = \frac{1}{2B} \Big[ q u_m - r \pm \sqrt{(q u_m - r)^2 + 4 B \delta} \Big]
at the switch point x_*. It follows from (2.4.6), (2.4.8), and (2.4.9) that
(2.4.18) is equivalent to the condition (2.4.19), which determines the switch
point x_* (formula (2.4.20)); in particular,

\lim_{B \to \infty} x_* = \frac{c\, \delta}{p q (\delta - r)}.
It should be noted that the approach that leads to the exact solution of the
stochastic problem (2.4.1), (2.4.2) does not allow us to solve the synthesis problem
(that is, to find the switch point x_*) for the deterministic version of problem
(2.4.1), (2.4.2), that is, in the case where there are no random perturbations
in Eq. (2.4.1). This fact can readily be verified by considering the deterministic
version of the Bellman equation (2.4.5):

\max_{0 \le u \le u_m} \Big[ (r - q u)\, x \frac{dF}{dx} + (p q x - c)\, u \Big] = \delta F(x).   (2.4.21)

The way of solving this equation is quite similar to the above procedure
for solving Eq. (2.4.5). However, the population size x̃_* that determines
the switch point of the optimal control (2.4.7) differs from (2.4.20) and is
equal to

\tilde x_* = \frac{c\, (\delta - r + q u_m)}{p q \Big[ \delta - r + \dfrac{\tilde k_1}{\tilde k_1 - \tilde k_2}\, q u_m \Big]}.   (2.4.22)
CHAPTER III

APPROXIMATE SYNTHESIS OF STOCHASTIC CONTROL SYSTEMS
From the formal mathematical viewpoint, the fact that control actions
are small leads to a small parameter in the nonlinear term of the Bellman
equation. To verify this fact, let us consider the synthesis problem for
the servomechanism (Fig. 10) governed by the Bellman equation (1.4.21).
Assume that the dimensions of the region U of admissible controls are
bounded by a small value of order ε. For definiteness, we assume
that U is either an r-dimensional parallelepiped (R_r ⊃ U = {u: |u_j| ≤
εu_{mj}, j = 1, ..., r}) or a ball of radius ε.
In the second case (where U is a ball), the optimal control has the form
(see (1.3.23))
¹Note that in this book we do not consider deterministic synthesis problems for
systems controlled by small forces. Such systems, called weakly controllable in [32],
were studied in [32, 137].
²Recall that relations (3.0.1) and (3.0.2) follow from the Bellman equation (1.4.21)
with c(x, y, u) = c₁(x, y) and A_x(t, x) = a(t, x) + Q(t)u; {u_{m1}, ..., u_{mr}} denotes a diag-
onal (r × r)-matrix; for a column A with components A₁, ..., A_r, the expressions sign A
and |A| denote the r-columns with components sign A_j and |A_j| (j = 1, ..., r), respectively.
where the vector ∂F/∂x is the gradient of the loss function satisfying
equation (3.0.4).
If we denote the nonlinear terms in Eqs. (3.0.2) and (3.0.4) in the same
way, then we can write both equations in the common form (3.0.5).
If we know the solution F_k(t, x, y) of the equation for the kth approxima-
tion (k = 0, 1, ...), then we can perform an approximate synthesis of the
controlled system by taking the quasioptimal control algorithm in the form (3.0.8).
where c(x) ≥ 0 is a given convex penalty function attaining its absolute
minimum c(0) = 0 at the point x = 0 (the restrictions on c(x) are discussed
in detail in §3.4 and §3.5).
Let admissible controls be bounded and small. We assume that all com-
ponents of the control vector u satisfy the conditions

|u_j| \le \varepsilon u_{mj}, \qquad j = 1, \dots, r,   (3.1.3)

where ε > 0 is a small parameter and u_{m1}, ..., u_{mr} > 0 are given numbers
of order 1.
The system shown in Fig. 13 is a special case (the input signal y(t) ≡ 0)
of the servomechanism shown in Fig. 10. Therefore, the Bellman equation
for problem (3.1.1)-(3.1.3) readily follows from (1.4.21); taking into
account the relations A_y(t, y) = 0, B_y(t, y) = 0, A_x(t, x, u) = Ax + Qu, and
c(x, y, u) = c(x), we obtain equation (3.1.5), where, according to (1.4.16),
the matrix B = σσᵀ and, as usual, the sum in the last expression on the
right-hand side of (3.1.5) is taken over repeated indices from 1 to n.
It follows from (3.0.1) and (3.0.2) that in this case the optimal control
has the form (3.1.6).
It follows from (3.1.9) and (3.1.10) that at each step of the approximation
procedure we need to solve a linear inhomogeneous elliptic equation of the form

L f(x) = \varphi(x).   (3.1.11)

We now consider a method for solving Eq. (3.1.11) with a given function
φ(x), which allows us to represent the function f(x) as a series
in the eigenfunctions of some Sturm-Liouville problem [179].
3.1.2. The passage to the adjoint equation. Let us consider the
operator

L^* = -\frac{\partial}{\partial x_i}(A_i\, \cdot) + \frac{1}{2} \frac{\partial^2}{\partial x_i \partial x_j}(B_{ij}\, \cdot),   (3.1.12)

where

\tilde A = V^{-1} A V, \qquad \tilde \sigma = V^{-1} \sigma.   (3.1.19)

We choose V so that the matrix Ã is diagonal.
As is known [62], the matrix V always exists and can readily be constructed
if the eigenvalues of the matrix A are simple, that is, if the characteristic
equation of the matrix A,

\det(A - \lambda E) = 0,   (3.1.21)

has distinct roots λ₁, λ₂, ..., λₙ. In this case, the columns of the matrix V
are the eigenvectors of the matrix A satisfying the linear equations (3.1.22).
System (3.1.18) can readily be solved in the case (3.1.20). Indeed,
writing (3.1.18) in rows, we obtain Eqs. (3.1.23), where B̃_{ℓm} is an element
of the matrix B̃ = σ̃σ̃ᵀ. Solving Eqs. (3.1.23) and taking (3.1.24) into account,
we derive the following expressions for the means and covariances:

\mathrm E\, y_\ell(t) y_m(t) - \mathrm E\, y_\ell(t) \cdot \mathrm E\, y_m(t) = \frac{\tilde B_{\ell m}}{\lambda_\ell + \lambda_m} \Big[ e^{(\lambda_\ell + \lambda_m)(t - t_0)} - 1 \Big],   (3.1.26)

which determine the transition probability p(y(t) | y(t₀)) of the Gaussian
process y(t). It follows from (3.1.26) that
One can readily see that the operator L₀* has the form⁵ (3.1.33).
⁴As usual, the operator equality is understood in the sense of the ordinary
relation p₀(x)Lw(x) = L*[p₀(x)w(x)] for any sufficiently smooth function w(x).
⁵The verification of (3.1.33) is left to the reader as an exercise.
where B̃_{ij} is an element of the matrix B̃ = σ̃σ̃ᵀ, ã = V̄⁻¹a, V̄ is a nonde-
generate matrix such that the transformation V̄⁻¹GV̄ makes the matrix G
diagonal, V̄⁻¹GV̄ = {λ̄₁, ..., λ̄ₙ}, and the λ̄ᵢ are the roots of Eq. (3.1.21).⁶
In the new variables the stationary Fokker-Planck equation has the form (3.1.40).
This equation differs from (3.1.30) only by the matrix of diffusion coeffi-
cients; therefore, the stationary probability density p̃₀(y) is determined by
formulas of the type (3.1.41). The corresponding Hermitian polynomials are

H_{m_1 \dots m_n}(y) = (-1)^{m_1 + \dots + m_n} e^{\,y^{\mathrm T} P y} \frac{\partial^{m_1 + \dots + m_n}}{\partial y_1^{m_1} \cdots \partial y_n^{m_n}} \exp\big[ -(y^{\mathrm T} P y) \big].   (3.1.45)

It follows from the general theory [4] of Hermitian polynomials with real
variables y that these polynomials form a closed and complete system of
functions, and an arbitrary function from a sufficiently large class (these
functions grow at infinity not faster than a finite power of |y|) can be
expanded in an absolutely and uniformly convergent series in this system
of functions. Furthermore, the polynomials H are orthogonal to another
family of Hermitian polynomials G given by a formula (3.1.46)
similar to (3.1.31).
Thus, we obtain the following algorithm for constructing the solution
f(x) of Eq. (3.1.11). First, we seek a stationary density p₀(x) satisfying
(3.1.14) and an operator L₀* satisfying (3.1.33). Then we transform prob-
lem (3.1.11) to problem (3.1.36). After this, to find the eigenfunctions and
eigenvalues of problem (3.1.38), we calculate the matrix V̄ that
transforms the matrix G to the diagonal form {λ̄₁, ..., λ̄ₙ} by the simi-
larity transform V̄⁻¹GV̄. Next, using the known λ̄ᵢ, V̄, and (3.1.42),
we calculate the matrices P⁻¹ and P that determine the stationary distri-
bution (3.1.41). The expression obtained for p̃₀(y) enables us to find the
eigenfunctions z_m = z_{m₁...mₙ} (3.1.44) of problem (3.1.38) and the orthog-
onal polynomials G_{m₁...mₙ} (3.1.46).
Finally, we seek the function z(x) satisfying (3.1.36) in the form of a
series with respect to the eigenfunctions:
p_0(x)\varphi(x) = \sum_{m_1, \dots, m_n = 0}^{\infty} b_{m_1 \dots m_n}\, z_{m_1 \dots m_n}(x).   (3.1.52)

This relation and (3.1.9), (3.1.10) imply explicit expressions for the
stationary losses γ_k.
Thus, we have completely solved the problem of calculating the succes-
sive approximations (3.1.9), (3.1.10) for the stationary operating mode
of the optimal stabilization system.
If the loss function f_k(x) in the kth approximation is calculated, then the
quasioptimal control u_k(x) in the kth approximation is completely defined;
namely, in view of (3.0.8) and (3.1.6), u_k(x) is obtained from (3.1.6) by
replacing the exact loss function f with f_k.
In the next section, using this general algorithm of approximate syn-
thesis, we calculate a specific system of optimal damping of random
oscillations in which the plant is a linear oscillatory system with one degree of
freedom.
the scalar random process ξ(t) is the standard white noise (1.1.31), and β,
B, and ε are given positive numbers (β < 2).
Equations of the type (3.2.1) describe the motion of a single mass
point under the action of elastic forces, viscous friction, controlling and
random perturbations. The same equation describes the dynamics of a
direct-current motor controlled by the voltage applied to the armature when
the load on the shaft varies randomly. Examples of other actual physical
objects described by Eq. (3.2.1) can be found in [2, 19, 27, 136].
For system (3.2.1), (3.2.2), it is required to calculate the optimal regu-
lator (damper) C (see Fig. 13) which damps, in the best possible way
with respect to the mean square error, the oscillations constantly arising
in the system due to the random perturbations ξ(t). More precisely, as the
optimality criterion (3.1.2), we consider the functional (3.2.3) with the
quadratic penalty function c(x) = x² + ẋ², which has the meaning of the
mean energy of the random oscillations in system (3.2.1). Note that the
mean square criterion (3.2.3) is the one used most frequently, and it corre-
sponds to the most natural statement of the optimal damping problem [1, 50].
However, there are other statements of the problem, with penalty functions
other than the function c(x) = x² + ẋ² exploited in (3.2.3). From the
viewpoint of the method used here for solving the synthesis problem, the
choice of the penalty function is of no fundamental importance.
To make problem (3.2.1)-(3.2.3) consistent with the general state-
ment treated in §3.1, we write Eq. (3.2.1) as the following system of two
first-order equations for the phase coordinates x₁ and x₂ (these variables
can be considered as the displacement x₁ = x and the velocity x₂ = ẋ):

\dot x_1 = x_2, \qquad \dot x_2 = -x_1 - \beta x_2 + u + \sqrt{B}\, \xi(t).   (3.2.4)

Using the vector-matrix notation, we can write system (3.2.4) in the form
(3.1.1), where A, Q, and σ are the matrices
U* (xl, 2 2 ) = -E sign
(S)-
Here f = f ( x l , x 2 ) is the loss function satisfying the stationary Bellman
equation (see (3.1.8))
156 Chapter I11
The equation
It follows from (3.2.11) and (3.1.35) that in this case the matrix G of the
operator (3.1.34) coincides with the transpose matrix AT, that is, according
to (3.2.5), we have
The matrix V that reduces (3.2.12) to the diagonal form by the similarity
transform is equal to
(3.2.14)
This expression and formulas (3.1.50) and (3.2.12) imply
1
Correspondingly, the inverse matrix P- has the form
(3.2.19)
where the coefficients bFm are calculated by the formulas
Chapter I11
Polynomials H
Polynomials G
7 =
' & (x: + x i ) exp [ - (x: + xi)] dxldx2.
Calculating the integral, we obtain
In view of Remark 3.2.2, the first coefficients aYo and a:, in the series
(3.2.19) are equal to zero.8 The coefficients bgo, b!, and b,: can be cal-
culated by using the formulas for G20, GI1, and Go2 from Table 3.2.1 and
'The same result can be obtained if we formally calculate the coefficients byo and b:l
by using (3.2.20).
160 Chapter I11
(3.2.16). Then, according to (3.2.20), the coefficient b!& has the form
The integral in (3.2.22) can readily be calculated, thus, taking into account
(3.2.21) and (3.2.14), we obtain
All other coefficients b;,, with t+m > 2 are zero in view of the orthogonality
condition (3.1.49).
According to (3.2.19), it follows from (3.2.23) and (3.2.24) that
Finally, using the formulas for H20, Hll, and H o from~ Table 3.2.1 and
(3.2.25), we obtain the loss function in the zero approximation
This relation and condition (3.2.9) imply the following equation for the
zero-approximation switching line rO:
Approximate Synthesis of Stochastic Control Systems 161
In this case, the quasioptimal control algorithm uo (x) in the zero approxi-
mation has the form
Lfo = y 0 -2,
2
-2,
2
with unknown coefficients hll, h12, and h22, then, substituting this ex-
pression into (3.2.39), we obtain four equations for hll, hlz, h22, and
However, higher approximations cannot be obtained by this simple reason-
ing.
The first approximation. It follows from (3.1.10) and (3.2.26) that
in the first approximation we need to solve the equation
1 B
7 = --
P
*JJ_m_ + ix2/
TB
lzl exp [ - $(x: + xi)] dxldx2,
then, after the integral is calculated, we obtain
the intermediate calculations and write the final expression for f l ( x l , 22).
Taking only the first terms in the series (3.2.19) u p to the fourth order
inclusively (that is, omitting the terms for which (l+ m) > 4), we obtain
the following expression for the loss function in the first approximation:
The condition P2 << 1 has also been used for calculating the coefficients
(3.2.32).
From (3.2.9) and (3.2.31) we obtain the following equation for the switch-
ing Iine r1 in the first approximation:
It follows from the continuity conditions that for small E the switching
line r1 is close to r0 determined by Eq. (3.2.27). Therefore, if we set
2 2 = -(/3/2)21 in the terms of the order of E in (3.2.33), then we obtain
errors of the order of E' in the equation for rl. Using this fact and formulas
(3.2.32) and (3.2.33), we arrive a t the following expression with accuracy
up to the terms of the order of O ( E ~ ) :
Figure 23 shows the position of the switching lines r0 and I" on the phase
plane ( ~ 1 ~ 2 2 ) .
The switching line (3.2.34) determines the quasioptimal control algo-
rithm in the first approximation:
'We do not calculate the coefficients v and p and the constant term "const" in (3.2.31)
since they do not affect the position of the switching line and the control algorithm in
the first approximation.
Approximate Synthesis of Stochastic Control Systems 163
This algorithm can easily be implemented with the help of standard blocks
of analog computers. The corresponding block diagram of a quasioptimal
control system for damping of random oscillations is shown in Fig. 24, where
1 and 2 denote direct-current amplifiers with amplification factors
164 Chapter I11
if the condition ,B2 << 1 is used in the same way as for calculating (3.2.32).
As was already noted, the method of successive approximations exploited
in the present section is efficient if the nonlinear term of the Bellman equa-
tion contains a small parameter E (we discuss this question in 33.4 in detail).
However, in the problem considered here, the convergence was ensured, in
fact, by the parameter €1-. If we recall that, by the conditions of
problem (3.2.2), the parameter E determines the values of admissible con-
trol, then it turns out that this variable need not be small for the method
of successive approximations to be efficient. Only the relation between the
limits of the admissible control and the intensity of random perturbations B
is important.
All this confirms our assertion made a t the beginning of this chapter
that the method of successive approximations considered here is convenient
for solving problems with bounded controls when the intensity of random
perturbations is large.
where yT = (yl,. . . , yl), rlT = ( ~ 1 ,... , ve), and C and D are constant
matrices of dimensions e x n and e x 1,respectively, (det D # 0). The goal
of control is to minimize a functional of the form
of states. This space was called the space of sufficient coordinates in $1.5
(see also [171]). As was shown in $1.5, in this case, as sufficient coordinates,
we must use a system of parameters that determine the current a posteriori
probability density of nonobserved stochastic processes:
''In this case, the control u in (3.3.1) is assumed to be a given known vector at each
time instant t .
Approximate Synthesis of Stochastic Control Systems 167
Using (3.3.6) and (3.3.7), we obtain the following equation for the a
posteriori probability density (3.3.5):"
Here p(t, z) = pt (x, <) denotes the a posteriori density (3.3.5), z denotes
the vector (x,[), a, is the vector composed of the vector-columns a, and
a t , the matrix B, is a part of the matrix (3.3.7) consisting of its first
(n + m) rows and columns, Eps denotes the a posteriori averaging of the
corresponding expressions (that is, the integration with respect to z with
the density p(t, 2 ) ) .
It follows from (3.3.6)-(3.3.8) that the matrix B, is constant, the com-
ponents of the vector a, are linear functions of z, and the expression in the
square brackets in (3.3.9) linearly and quadratically depends on z. There-
fore, as shown in $1.5 (see also [170, 175]), the a posteriori density p(t, z)
satisfying (3.3.9) is Gaussian, that is,
p(t, z) = [ ( 2 7 ~ ) ~ det
+ K(t)]-'I2
x exp [ - i ( z - ~ ( t ) ) ~ K - ' ( t ) (-z Z(t))], (3.3.10)
if the initial (a priori) density p(0, z) = po(z) is Gaussian (this is assumed
in the sequel).
Substituting (3.3.10) into (3.3.9), one can obtain a system of differential
equations for the parameters 2 and K-' of the a posteriori probability
density (3.3.10). One can readily see that this system has the form
(in our special case, the system (1.5.52) acquires the form (3.3.11), (3.3.12)).
If instead of K - I we use the inverse matrix K (which is the matrix of
a posteriori covariances), then the system (3.3.11), (3.3.12) can be written
in the form
"TO derive (3.3.9) from (1.5.39), we need to recall that, according to the notation
used in (1.5.39), the vector A, coincides with the vector a,, the vector A, with a y , and
the structure of the diffusion matrix (3.3.7) implies the following relations between the
matrices: llBapll = Bz, IIBaull = 0 , llFupll = By1, and llBoPll= B Y .
Chapter I11
where, in turn, k,,, k,(,. . . are elements of the block covariance matrix K ,
= min Eps
ufr)
[1
T
c ( x ( r ) ,u ( r ) ) d i 1 S(t) = i t , K (t) = K t
I
t<;<~
(3.3.16)
is completely determined by the time instant t and the current values of
the parameters (St,K t ) of the a posteriori density (3.3.10) a t this instant
of time. It follows from the definition given in $1.5 that ( Z ( t ) , ~ ( t ) are
)
sufficient coordinates for problem (3.3.1)-(3.3.3).
The Bellman equation (1.5.54) for the function (3.3.16) can readily be
obtained in the standard way from the Eqs. (3.3.13), (3.3.14) for the suffi-
cient coordinates. However, it should be noted that, in this case, the system
(3.3.13), (3.3.14) has a special feature that allows us to exclude the a pos-
teriori covariance K ( t ) from sufficient coordinates. The point is that, in
contrast, say, with a similar system (1.5.53), the matrix equation (3.3.14) is
independent of controls u and in no way related to the system of differential
equations (3.3.13) for the a posteriori means Z(t). This allows us first to
solve the system (3.3.14) and calculate the matrix of a posteriori covari-
ances K ( t ) in the form of a known function of time on the entire control
interval 0 < -t <- T (we solve (3.3.14) with the initial matrix K(0) = KO,
where KO is the covariance matrix of a priori probability density po(z)).
If K ( t ) is assumed to be known, then in view of (3.3.8) and (3.3.15) we
can also assume that the matrix a, in (3.3.13) is a known function of time,
u,(t), and the loss function (3.3.16) depends on the set ( t , i t ) . Therefore,
Approximate Synthesis of Stochastic Control Systems 169
instead of Eq. (1.5.54) for the loss function F ( t , 3, we have the Bellman
equation of the form
(here N (2, K (t)) denotes the normal probability density (3.3.10) with the
vector of mean values Zand the covariance matrix K(t)).
Just as in $3.1 and $3.2, Eq. (3.3.17) becomes simpler if we consider
the stationary operating conditions for the stabilization system shown in
Fig. 25. The stationary operating conditions established during a long
operating time interval (which corresponds to large time t ) can exist if only
there exists a real symmetric nonnegative definite matrix K, such that
f(Z)= lim [ F ( t , q -
T+w
In (3.3.19) a, is the matrix a, (see (3.3.8) and (3.3.15)), where k,,, let,, . . .
are the corresponding blocks of the matrix K, determined by (3.3.18).
In some cases, it is convenient to solve Eq. (3.3.19) by the method of
successive approximations treated in $3.1 and $3.2. The following example
shows how this method can be used. Let us consider the simplest version of
the synthesis problem (3.3.1)-(3.3.3) in which Eqs. (3.3.1), (3.3.2) contain
scalar variables instead of vectors and matrices. In (3.3.3) we write the
penalty function c(x, u) in the form
170 Chapter I11
~=-az+f+u+~[$-(h-a)~-~-u+h~],
-
B,
- - a t -
= - 9 ~+ -[G- ( h - a ) 5 - ( - u+ ~ Y I ,
(3.3.21)
B,
where the constant covariances k;,, k&, and kit form the stationary solu-
tion of the system of differential equations (3.3.22).
Passing to the new variables xl = (&/a,*)Z, x2 = ( & / a r * ) r a n d
denoting by L the linear operator
The stationary density po(x) satisfying (3.1.14) has the form (3.1.15), and
the matrices P and P-l, as one can readily see, have the form
(p=a+g, v=a-gPr,andp=r+2g).
Using (3.3.31), we can find the matrix (see (3.1.35))
It follows from (3.1.44), (3.1.51), and (3.1.55) that solutions of the equa-
tions of successive approximations (3.3.30) can be represented as the series
him = det1I2
2*t!m!
P
IS_,"
~tm(~")I~=vTp~
+
exp [ - ( x T p x ) ] w(x)
~ dx1dx2
(3.3.37)
are expressed in terms of the group of Hermitian polynomials Gem(xl,x2)
orthogonal to Hem(xl, x2) and calculated by (3.2.18).
Parallel to the calculations of the successive approximations to the loss
function (3.3.35), we calculate specific stationary losses -yk (corresponding
to the kth approximation) from the condition bEo = 0. So, in the zero
approximation we have
Thus, from the preceding it follows that the methods for calculations
of stationary operating conditions of the stabilization system (Fig. 13) can
readily be generalized to the case of a more general system with correlated
noise (Fig. 25) if the noise is a Gaussian Markov process. In this case, the
optimal system is characterized by the appearance of an optimal filter in
the regulator circuit; this filter is responsible for the formation of sufficient
coordinates. In our example (Fig. 25), where x, y, u, t, and 7 are scalar,
this filter is described by Eqs. (3.3.21). The circuit of functional elements
of this closed-loop control system is shown in Fig. 26.
Blocks P and 1 are units of the initial block diagram (Fig. 25). The
rest of the diagram in Fig. 26 determines the structure of the optimal con-
troller. One can see that this controller contains standard linear elements of
analog computers such as integrators, amplifiers, adders, etc. and one non-
linear converter NC, which implements the functional dependence (3.3.29).
Units of the diagram marked by ">" and having the numbers 1 , 2 , . . . , 8
are amplifiers with the following amplification factors Ki:
Approximate Synthesis of Stochastic Control Systems 175
and assume that the absolute values of the components of the control vector
u are bounded by small values (see (3.1.3)):
According to (3.1.6) and (3.1.7), the optimal control u*(t,x) for problem
(3.4.1)-(3.4.3) is given by the formula
In this case, the function F ( t , x) must satisfy (3.4.5) for all x E R,, 0 _<
t < T, and be a continuous continuation of the function
(all functions Fk(t,x) determined by (3.4.9) and (3.4.10) must satisfy con-
dition (3.4.8)). Next, if we take Fk(t, x) as an approximate solution of
Eq. (3.4.5) and substitute Fk into (3.4.4) instead of F, we obtain a qua-
sioptimal control algorithm u k ( t ,x) in the kth approximation.
Let us write the solutions Fk(t, x), k = 0,1,2,. . ., in quadratures. First,
let us consider Eq. (3.4.9). Obviously, its solution Fo(t, z ) is equal to the
value of the cost functional
on the time interval [t,T] provided that there are no control actions. In this
case, the functional on the right-hand side of (3.4.11) is calculated along the
trajectories x(T), t 5 T _< T, that are solutions of the system of stochastic
differential equations
On the other hand, we can write the transitive density p(x, t; z, T) for the
diffusion process X(T) (3.4.12) as an explicit finite formula if we know the
fundamental matrix X ( t , T) for the nonperturbed (deterministic) system
.i= A(t)z.
Indeed, since Eqs. (3.4.12) are linear, the stochastic process X ( T ) satisfy-
ing this equation is Markov and Gaussian. Therefore, for this process, the
transitive probability density has the form
p ( x , t ; z, T) = [ ( 2 ~det
) ~~ ] - ~ / ~ e x ~ [ --$u()z~ D - ' ( -
~ a)], (3.4.14)
Hence, performing the averaging and taking into account properties of the
white noise (1.1.34), we obtain the following expressions for the vector a
and the matrix D:
>
12Recallthat the fundamental matrix X ( t , T), T t , is a nondegenerate n x n matrix
whose columns are linearly independent solutions of the system i ( r ) = A(r)z(r), SO
that X ( t , t) = E, where E is the identity matrix. Methods for constructing fundamental
matrices and their properties are briefly described on page 101 (for details, see 162, 1111).
178 Chapter I11
To obtain explicit formulas for the functions Fo(t, x), Fl(t, x), . . ., which
allow us to write the quasioptimal control algorithms uo(t, x), ul(t, x), . . .
as finite analytic formulas, we need to have the analytic expression of the
matrix X ( t , T) and to calculate the integrals in (3.4.13) and (3.4.17). For
autonomous plants (the case where the matrix A(t) in (3.4.1) and (3.4.12)
is constant, A(t) G A = const), the fundamental matrix X ( ~ , T has ) the
form of a matrix exponential:
where for 0 5 t < T the function F ( t , 21, 22) satisfies the equation
According to (3.4.6) and (3.4.13), the operator Lt,, in (3.4.21) has the form
Let us calculate the loss function Fo(t,X I , 22) of the zero approximation.
In view of (3.4.9), (3.4.21), and (3.4.22), this function satisfies the linear
equation
Lt,,Fo(t, XI., 22) = -2; - x:, 0 5 t < T, (3.4.23)
with the boundary condition
From this and the Lagrange-Silvester formula [62] we obtain the following
expression for the fundamental matrix (3.4.18) (here p = (T - t)):
$ sin Sp + 6 cos Sp
- -1
- e-Ppf2
S -sinSp b cos Sp $ sin bp
sin-sp 1 (3.4.26)
It follows from (3.4.15), (3.4.16), and (3.4.26) that in this case the vector
of means a and the variance matrix D of the transitive probability density
(3.4.14) have the form
+ $xl) sin Sp
a = e-pp/2
/I XI
22
+
cos Jp f (xz
cos s p - $ (21 + f x 2 ) sin Sp (3.4.27)
1 1
p~(p)=-(l-e-~~), p2(p)=4~+e-Pp(26sin26p-~cos26p)],
P
p3 (p) = 26 - e-Pp ( p sin 2Sp + 26 cos 2Sp). (3.4.28)
tions, we obtain the following final expression for the function Fo(t, x l , 22):
+ Pxlx2 + x i )
I
where 7 = T - t.
Let us briefly discuss formula (3.4.29). If we consider the terms on
the right-hand side of (3.4.29) as function of "reverse" time 7 = T - t ,
then these terms can be divided into three groups: infinitely increasing,
damping, and independent of p as 7 -+ oo. These three types of terms have
the following physical meaning. The only infinitely growing term (B/P)p in
(3.4.29) shows how the mean losses (3.4.11) depend on the operating time
in the mode of stationary operating conditions. Therefore, the coefficient
B / P has the meaning of the specific mean error y, which was calculated
in 53.2 by other methods and for which we obtained = B / P in the
zero approximation (see (3.2.21)). Next, the terms independent of p (in
the braces in (3.4.29)) coincide with the expression for the stationary loss
function obtained in $3.2 (formula (3.2.26)). Finally, the damping terms in
(3.4.29) characterize the deviations of operating conditions of the control
system from the stationary ones.
Using (3.4.29), we can approximately synthesize the optimal system in
the zero approximation, where the control algorithm uo(t, X I , xa) has the
form (3.4.20) with F replaced by Fo. The equation
determines the switching line on the phase plane (XI,x2). Formula (3.4.30)
shows that this is a straight line coinciding with the x-axis as p -+ 0 and
rotating clockwise as p -+ oo (see Fig. 27) till the limit value X I + 2x2/P = 0
corresponding to the stationary switching line (see (3.2.27)).
Formulas (3.4.29) and (3.4.30) also allow us to estimate whether it is
important to take into account the fact that the control algorithm is time-
varying. Indeed, (3.4.29) and (3.4.30) show that deviations from the sta-
tionary operating conditions are observed only on the time interval lying a t
Chapter I11
the distance from the terminal time T. Thus, if the general operating
time T is substantially larger than this interval (say, T >> 3/,0), then we can
use the stationary algorithm on the entire interval [0, TI, since in this case
the value of the optimality criterion (3.2.3) does not practically differ from
the optimal value. This fact is important for the practical implementation
of optimal systems, since the design of regulators with varying parameters
is a rather sophisticated technical problem.
3.4.2. Estimates of the approximate synthesis performance. Up
to this point in the present chapter, we have studied the problem of how
to find a control syste close to the optimal one by using the method of
successive approximations. In this section we shall consider the problem of
how the quasioptimal system constructed in this way is close to the optimal
system, that is, the problem of approximate synthesis performance.
Let us estimate the approximate synthesis performance for the first two
(the zero and the first) approximations calculated by (3.0.6)-(3.0.8). As an
example, we use the time-varying problem (3.4.1)-(3.4.3). We assume that
the entries of the matrices A(t), Q(t), and ~ ( t in ) (3.4.1) are continuous
functions of time defined on the interval 0 5 t 5 T. We also assume that the
penalty functions c(x) and $(x) in (3.4.2) are continuous and bounded for
all x E R,. Then [I241 there exists a unique function F ( t , x) that satisfies
the Cauchy problem (3.4.5), (3.4.8) for the quasilinear parabolic equation
(3.4.5)14 This function is continuous in the strip IIT = (1x1 < m, 0 5 t 5 T }
14We shall use the following terminology: Eq. (3.4.5) is called a quasilinear (semi-
linear) parabolic equation, the problem of solving Eq. (3.4.5) with the boundary condi-
Approximate Synthesis of Stochastic Control Systems 183
and continuously differentiable once with respect t o t and twice with respect
to x for 0 5 t < T; its first- and second-order derivatives with respect to x
are bounded for x E IIT.
One can readily see that in this case
and hence, for small E, the functions Fo(t , x) and Fl (t, x) nicely approximate
the exact solution of Eq. (3.4.5).
To prove relations (3.4.31), let us consider the functions So@,X ) =
F ( t ,x) - Fo(t, x) and S l ( t , x) = F (t, x) - Fl(t, x). It follows from (3.4.5),
(3.4.9), and (3.4.10) that these functions satisfy the equations
Equations (3.4.32) and (3.4.33) differ from (3.4.9) only by the expressions
on the right-hand sides and by the initial data. Therefore, according to
(3.4.13), the functions So and S1 can be written in the form
tion (3.4.8)is called the Cauchy problem, and the boundary condition (3.4.8)itself is
sometimes called the "initial" condition for the Cauchy problem (3.4.5),(3.4.8). This
terminology corresponds to the universally accepted standards [61,1241 if (as we shall
do in 53.5) in Eq. (3.4.5)we perform a change of variables and use the "reverse" time
p = T - t instead of t . In this case, the backward parabolic equation (3.4.5)becon~es
a "usual" parabolic equation, and the boundary value problem (3.4.5),(3.4.8)takes the
form of the standard Cauchy problem.
184 Chapter I11
(in fact, the derivative on the right-hand side of (3.4.37) is formal, since the
function @ (3.4.7) is not differentiable). Using (3.4.13) for s;, we obtain
Now we note that since Q(t) in (3.4.7) is bounded, the function @(t,y)
satisfies the Lipschitz condition with respect to y:
5 E ~ N P V ( T- t), v = C 1/;,
Approximate Synthesis of Stochastic Control Systems 185
This relation and (3.4.40), (3.4.41) for the function Si readily yield the
estimate
with the use of the quasioptimal controls uo(t,x ) and u l ( t ,x ) . The func-
tions Gi ( t ,z), i = 0 , 1 , estimate the performance of the quasioptimal control
algorithms ui(t,x ) , i = 0 , l . Therefore, it is clear that the approximate syn-
thesis may be considered to be justified if there is only a small difference
between the performance criteria G o ( t ,x ) and G l ( t ,x ) of the suboptimal
systems and the exact solution F ( t , x ) of Eq. (3.4.5) with the initial condi-
tion (3.4.8).
One can readily see that the functions Go and G I satisfy estimates of
type (3.4.31), that is,
dGi
LGi ( t ,X ) = - c ( x ) - c ~ ? ( tx, ) ~ ~ -(t, ( t ) x), (3.4.46)
dx
-
G i ( T ,X ) = $ ( x ) , u i ( t , X ) = u i ( t ,x ) / E , i = 0 , l .
186 Chapter I11
This fact and (3.4.9), (3.4.10) imply the following equations for the func-
tions Ho = Fo - Go and H1 = Fl - G I :
It follows from (3.4.4) that Eqs. (3.1.46), (3.4.49) are linear parabolic equa-
tions with discontinuous coefficients. Such equations were studied in [80,
81, 1441. It was shown that if, just as in our case, the coefficients in
(3.1.46), (3.1.49) have discontinuities of the first kind, then, under our
assumptions about the properties of A(t), Q(t), c(x), and $(x), the solu-
tions of Eqs. (3.4.46), (3.4.49) and their first-order partial derivatives are
bounded.
Using this fact, we can readily verify that the right-hand sides of (3.4.47)
and (3.4.49) are of the order of E and e2, respectively. For Eq. (3.4.47), this
statement readily follows from the boundedness of the components of the
vectors dGo/8x and Eo and the elements of the matrix Q. The right-hand
side of (3.4.49) can be estimated by the Lipschitz condition (3.4.41) and
the inequality
which follows from (3.4.40) and (3.4.44). Therefore, for the functions Ho
and H1 we have
IHoINE, IHII-E~. (3.4.50)
To prove (3.4.45), it suffices to take into account the inequalities
or two (the zero and the first) approximations. This depends on the ad-
missible deviation of the quasioptimal system performance criteria Gi (t, z )
from the loss function F ( t , 2).
In conclusion, we make two remarks about (3.4.45).
REMARK3.4.1. One can readily see that all arguments that lead to the
estimates (3.4.45) remain valid for any types of nonlinear functions in (3.4.5)
that satisfy the Lipschitz condition (3.4.41). Therefore, in particular, all
statements proved above for the function @ (3.4.7) automatically hold for
equations of the form (3.0.4) with T-dimensional ball taken as the set U of
admissible controls, instead of an T-dimensional parallelepiped.
REMARK3.4.2. The estimates of the approximate synthesis accuracy
considered in this section are based on the assumption that the solutions of
the Bellman equation and their first-order partial derivatives are bounded.
At first glance it would seem that this assumption substantially narrows the
class of problems for which the approximate synthesis procedure (3.0.6)-
(3.0.8) can be justified. Indeed, the solutions of Eqs. (3.4.5), (3.4.9),
(3.4.10), and (3.4.46) are unbounded for any x E R, if the functions
c(x) and $(x) infinitely grow as 1x1 + m . Therefore, for example, we
must eliminate frequently used quadratic penalty functions from consid-
eration. However, if we are interested in the solution of the synthesis
problem in a given bounded region Xo of initial states x(0) of the con-
trol system, then the procedure (3.0.6)-(3.0.8) can also be used in the case
of unbounded penalty functions. This statement is based on the follow-
ing heuristic arguments. Since the plant equation (3.4.1) is linear and the
matrices A(t), &(t), and a ( t ) and the control vector u are bounded, we
can always choose a sufficiently large number R such that the probability
>
P{supOltLT Ix(t)l R ) becomes arbitrary small [ l l , 45, 1571 for any fixed
domain Xo of the initial states x(0). Therefore, without loss of accuracy,
we can replace the unbounded functions c(x) and $(x) in (3.4.2) (if, in a
certain sense, these functions grow as 1x1 = R + m slower than the prob-
ability -
Iz(t)l 2 R ) decreases as R -t m ) by the expressions
for 1x1 < R,
c(x) for 1x1 2 R,
1x1 = R,
for lxl<R,
for lxl>R,
for which the solutions of Eqs. (3.4.5), (3.4.9), (3.4.10), and (3.4.46) satisfy
the boundedness assumptions.
188 Chapter I11
Here c(x) and $(x) are given nonnegative scalar penalty functions whose
special form is determined by the character of the problem considered (the
requirements on c(x) and +(x) are given later).
The constraints on the domain of admissible controls have the form
(1.1.22),
u € u, (3.5.3)
Approximate Synthesis of Stochastic Control Systems 189
-a e ( t , x ) = & ( t , x ) + - - g1m i , -
& = I ,...,n. (3.5.5)
2 dx,
Recall that we assumed in $1.2 that throughout this book all stochastic
differential equations written (just as (3.5.1)) in the Langevin form [I271
are symmetrized [174].
By definition, the loss function F in (3.5.4) is equal to
F = F ( t , x ) = min E
u(r)EU
Here E[(-) I x(t) = x] means averaging over all possible realizations of the
>
controlled stochastic process x ( r ) = z u ( ~ ) ( r( )r t ) issued from the point
x a t r = t. It follows from (3.5.6) that
LF(p,x)=-c(z)-min
UEU
ai (p, x) = iii (2, T - p), q(p) = q(T - p), bij (p, x) is a general element of the
matrix $ a ( T - p, x ) ? F T ( ~
- p, x) and, as usual, the sum in (3.5.10) (just
as in (3.5.5)) is taken over repeated indices from 1 to n.
Assuming that the gradient d F / a x of the loss function is a known vector
and calculating the minimum in (3.5.8), we obtain
and solves the synthesis problem (after we have solved Eq. (3.5.11) with
the initial condition (3.5.9)). The form of the functions cp and @ depends
on the form of the domain U in (3.5.3) (see (1.3.19)-(1.3.23)).
Equation (3.5.11) is an equation of the form (3.0.5). It differs from
Eq. (3.0.5) only by a small parameter (there is no small coefficient E of
the function 9). Nevertheless, in this case, we shall also use the ap-
proximate synthesis procedure (3.0.6)-(3.0.8) in which, instead of the ex-
act solution F ( p , x) of Eq. (3.5.11), we take the sequence of functions
Fo(p, x), Fl(p, x), . . . recurrently calculated by solving the following se-
quence of linear equations:
The successive approximations uo(p, x), ul(p, x), . . . of control are deter-
mined by the expressions
Below we shall find the conditions under which the recurrent procedure
(3.5.13)-(3.5.15) converges to the exact solution of the synthesis problem.
Approximate Synthesis of Stochastic Control Systems 191
where and X are some positive constants. Moreover, we assume that the
functions bij(p, x) and ai(p, x) are bounded in IIT, continuous in both vari-
ables (p, x), and satisfy the Holder inequality with respect to x uniformly
in p, that is,
We assume that the functions c(x), $(x), and @(p,dF/dx) are continuous
in IIT and that c(x) and $ (x) satisfy the following restrictions on the growth
as 1x1 + m:
<
~ ( x ) KlehlXl, <
$(x) KlehlXl (3.5.18)
(h is a positive constant). We also assume that the function @(p,v) satisfies
the Lipschitz condition with respect to v = (vl,. . ., v,) uniformly in p E
[O, TI, that is,
/ dQk (p, X )
axi / /" / I a G( ~ ,
K2
o Rn
dxi
P; Y, (. (aQk-l(a'
Indeed, since
for the derivative aFo/azi provided that (3.5.18), (3.5.22), and (3.5.25) are
taken into account.
By using the inequality
dFk
F (p, x) = klim
+m
F k (p, x), Wi (p, x) = k+m
lim -
dxi (p, z).
In this case, the partial sums on the right-hand side of (3.5.35) uniformly
converge in any bounded domain lying in HT,while in (3.5.36) the partial
Approximate Synthesis of Stochastic Control Systems 195
sums converge uniformly if they begin from the second term. The estimate
(3.5.32) shows that the first summand is majorized by a function with
singularity a t p = 0. However, one can readily see that this is an integrable
singularity. Therefore, we can pass to the limit (as Ic -+ oo) in (3.5.23) and
in the formula obtained by differentiating (3.5.23) with respect to xi. As a
result, we obtain
This implies that Wi(p, x) = d F ( p , %)/axi and hence the limit function
F(p, x) satisfies the equation
By using (3.5.38), we can prove that the solution of Eq. (3.5.11) with
the initial condition (3.5.9) is unique. Indeed, assume that there exist two
solutions Fl and Fz of Eq. (3.5.11) (or of (3.5.37)). For the difference
V = Fl - F2we obtain the expression
The same reasoning as for the functions Fk leads to the following estimate
for the difference V = Fl - F2 that holds for any k :
calculated on the trajectories of system (3.5.1) that pass through the point
x at time t = T - p under the action of control uk. The function Hk(p, x)
determines the "quality" of the control uk(p, x) and satisfies the linear
equation
aHk
LHk(~, = -c(x) - u:(P, x)qT(p) z ( ~ , x), Hk(O, X) = d ( ~ ) .
(3.5.39)
From (3.5.14), (3.5.39), and the relation - u T q T d ~ k / d x = @(p,dFk/dx),
it follows that the difference Ak(p,x) = F k ( p ,x) - Hk(p,x) satisfies the
equation
Since the right-hand side of (3.5.40) is small for large k (see (3.5.19),
(3.5.34)), that is,
,, .. ,
(3.5.41)
and the initial condition in (3.5.40) is zero, we can expect that the difference
Ak (p, x) considered as the solution of Eq. (3.5.40) is of the same order, that
is,
[Ak(p, x) < - €6 ~ ~ e ~ l ~ l . (3.5.42)
Approximate Synthesis of Stochastic Control Systems 197
If the functions uk (p, x) are bounded and sufficiently smooth, so that the
coefficients of the operator Lk are Holder continuous, then the operator Lk
is just the same as L and the inequality (3.5.42) can readily be obtained from
(3.5.22), (3.5.24), and (3.5.41). Conversely, if uk(p, x) are discontinuous
functions (but without singularities, for example, such as in (3.0.1) and
(3.0.8)), then the inequality (3.5.42) follows from the results of [811.
Since the series (3.5.35) is convergent, we have IF(p, x) - Fk(p,x)l 5
~ l ~ lE; t 0 as k + m ) . Finally, this fact, the inequality
~ g ~ 7 e (where
< +
IF - HkI IF - FkI IFk- Hk1, and (3.5.42) imply
Here d ( r ) is the delta function; b and urn are given positive numbers.
We shall assume that the penalty function c(x) in the optimality crite-
rion (3.5.2) is even (that is, c(x) = c(-x)) and the final state x(T) is not
penalized. Then the Bellman equation (3.5.8) and the initial condition
(3.5.9) take the form
dF
+ u min aF + --
bd2F
- = c(x)
ap [a
u-
2 ax2,
F(0, x) = 0. (3.5.44)
198 Chapter I11
Since the penalty function c(x) is even, it follows from (3.5.45) that for any
p the loss function F (p, x) satisfying (3.5.45) is an even function of x, hence
we have the explicit formula
we obtain from (3.5.22) and (3.5.23) the following expressions for the first
two approximations:
Approximate Synthesis of Stochastic Control Systems
aV(t,
dt
= L,v(t, x) + u(t, a ) + [(t, a ) , 0 <t 5 T,
v(0, x) = vo(x).
(3.6.1)
Here C, denotes a smooth elliptic operator with respect to spatial variables
2 = ( X I , . . - 7 xn),
d2 d
axiaxj +
L, = aij (t, x) - bi(t, 2)-
axi + c(t, x),
whose coefficients aij (t, x), bi (t, x), and c(t, x) are defined in the cylinder
fl = D x [O,T], where D is the closure of an arbitrary domain in the
n-dimensional Euclidean space R, and the matrix a(t, x) satisfies the in-
equality
T
rl a7 = aij(t, x ) ~ i r l >
j 0 (3.6.3)
for all (t, x) E 0 and all 7 = (vl, . . . , 7,) (as usual, in (3.6.2) and (3.6.3)
the sum is taken over twice repeated indices from 1 to n).
If D does not coincide with the entire space R,, then, in addition to
(3.6.1), the following boundary conditions must be satisfied at the boundary
d D of the domain D:
M,v(t, 2) = uy(t,x), (3.6.4)
where the linear operator M, depends on the character of the boundary
problem. Thus, for the first, the second, and the third boundary value
problems, condition (3.6.4) has the form
. .
where xi = (xi, x i , . . ., xh), dxi = d x y x i . . .dxh ( i = 1 , 2 , . . . , s), and w is
an arbitrary nonnegative integrable function. In this case, the desired func-
tions u, and u: must depend on the current state v(t, x) of the controlled
system (the synthesis functions), that is, they must have the operator form
(it is assumed that the state function v(t, x) can be measured precisely).
202 Chapter I11
F [ t ,v ( t , x ) ] = min
u(t,x)€U(s)
u , ( t , x ) € U r ( x ) t<r<T
where
+'// D D
J 2 F [ t v, ( t , 211
Sv(t, ~ ) ~ v Y)
A v ( t , x ) A v ( t ,y) d x d y
(t,
+ . .. .
(3.6.12)
Approximate Synthesis of Stochastic Control Systems 203
d2F . 1 d2Fa(vl,v2,.. .)
= lim -
(3.6.13)
6 ~ ( t~, ) 6 ~ y)
( t , a+o A2" d ~ i d ~ j
aj+x
1 62 F
+ 5 J, J, K(t'x' ')6v(t, x)bv(t, y)
dxdy.
Substituting (3.6.16) into (3.6.15) and (3.6.15) into (3.6.14), we obtain the
following final Bellman equation (for the third boundary value problem):
--
aF
at
-
- min { ~ ( uup,
u , ,
u,EUr
, v) + J, %udx
SF
+ LD
This equation can be solved only approximately if the penalty functions are
arbitrary and the controls u and u r are subject to constraints.
Let us consider one of the methods for solving (3.6.17) by using the
approximate synthesis procedure of the form (3.0.6)-(3.0.8). As already
noted (883.1-3.4), the approximate synthesis method is especially conve-
nient if the controlling actions are small, namely, Ilv - voll/llvII << 1, where
v is the solution of Eq. (3.6.1) with the boundary condition (3.6.4) and with
any admissible functions u and ur satisfying (3.6.5), vo is the solution of
the corresponding homogeneous (in u and u r ) problem, and 11 -11 is the norm
in the space L2. From a physical viewpoint, this means that the power of
(controlled) sources is not large as compared with llv112 or with the intensity
JD lD K ( t , x, y ) dxdy of random perturbations t ( t , x).
Then, by setting u(t, x) = ur 0, we obtain the following equation for
the zero approximation instead of (3.6.17):
Approximate Synthesis of Stochastic Control Systems 205
If the functional Fo(t, v(t, x)) satisfying (3.6.18) is found, then the condition
min { G ( ~ , u ~ , v )
uEU,
urEUr
+ JD ?udx+ LD
allows us to calculate the zero-approximation optimal control functions (op-
erators) uo(t, x) = (t, v(t, x)) and u''(t, x) = $0 (t, v(t, 2)).
The expression for GI (v(t, z)) is used to calculate the first approximation
F~(t, v(t, x)) , and so on. In general, the kth approximation Fk(t, v(t, x))
(k = 1,2,. . .) of the loss functional is determined as the solution of an
equation of the form (3.6.18), where the change Go t Gk and F o -+ Fk is
performed. Furthermore, simultaneously with F k , we determine the pair of
functions (operators)
Here the entries of the matrix llDtrll are given by the formulas
and (DF;),~ denotes the (a,P ) t h element of the inverse matrix IIDtTjl-l.
To prove (3.6.21) and (3.6.22), we need to recall some well-known facts [61,
1241 from the theory of adjoint operators and Green functions.
Suppose that a smooth elliptic operator L, of the form (3.6.2) is given in
an arbitrary domain D of a n n-dimensional Euclidean space R,. We also
assume that this operator is defined in the space of functions f sufficiently
smooth in D and satisfying the equation
that the operator M; can be defined uniquely. So, for the first, second,
and third homogeneous boundary conditions (that is, for the conditions
(3.6.4.1)-(3.6.4.111)) where up@,x) = 0, Eq. (3.6.25) takes, respectively, the
form
in the variables ( t ,x ) in the domain D x ( r < t < T ) and satisfies the initial
and boundary conditions
In a similar way, the Green function G* ( x ,t ;C , r )is defined for the adjoint
parabolic operator (3.6.29). The only difference is that, in this case, the
function G* is defined for time t < r. The conditions (similar to (3.6.30)-
(3.6.32)) that determine the Green function for the adjoint problem have
the form
G ( x ,t ;C , 7 ) = G* (C, r ;x , t ) . (3.6.36)
passing to the limit as E + 0, and taking into account (3.6.31) and (3.6.34),
we obtain (3.6.36).
Now, by using the properties of the Green functions, we shall show that
the functional (3.6.21) actually satisfies Eq.(3.6.18). To this end, we need
to calculate all derivatives in (3.6.18). Taking into account the relation
lim/
rJt
m
.lCO
-,
..
..
m
dvl ) ~1
. d v s { ~ ( v l , ...,v , ) [ ( 2 ~det 1~~,//]'~'
By formulas (3.6.13) and (3.6.22), we can readily obtain the first- and
second-order functional derivatives
that the cylinder is closed a t one end (x = l) and the flow rate is given
at the other end of the cylinder. The concentration v of the substance in
the cylinder can be affected by changes in the flow rate a t the end of the
cylinder (the rate of the incoming flow is the controlling action). Assuming
that a random perturbation ((t, x) is a stationary white noise, we obtain
the following mathematical model of the plant to be controlled [95]:
212 Chapter I11
(here B and C a r e the diffusion and the porosity coefficients of the medium);
(8(x,y) is a given positive definite function, i.e., the kernel) provided that
the absolute value of the boundary control action (the boundary flow of the
substance) u is bounded, that is,
Iul 5 Urn- (3.6.49)
In this example the Bellman equation (3.6.17) has the form
+ f "
K(x' ~ ) S v ( tx)Sv(t,
, y)
drdy
x=t
+ a 2 min
I U I L U ~ a x Sv
va(E)]
dx Sv %=a
, F[T,V(T,X)]=O.
Taking into account (3.6.45) and (3.6.46) and calculating the minimumwith
respect to u, we can rewrite (3.6.50) in the form
+ fl l S2F
K ( x ' ~ ) S v ( tx)Sv(t,
, y)
dxdy
The zero approximation. Suppose that urnis small. To solve (3.6.5 I),
we first set urn = 0. As a result, we obtain the following equation of the
zero approximation
a2 SF0
-?! at
= Jd( l
e
B(x, y)v(t, x)v(t, y) dxdy a
e
+ 2S,
v(t, x)- (----)
ax2sv(t,x)
dx
S2Fo
dxdy, FO[T, V(T,x)] = 0.
Here the Green function G for the boundary value problem (3.6.45), (3.6.46)
can readily be obtained by the separation of variables (the Fourier method)
[26, 1791 and represented as the series
Hence it follows that the optimal control law (3.6.52) has the following form
in the zero approximation:
S2 Fl
dxdy - 2a2u,G[v(t, x)],
Now, formulas (3.6.21) and (3.6.22) are not sufficient for calculating
Fl(t, v(t, x)); we need to use a more complicated calculation procedure
according to (3.6.43) and (3.6.44). A finite-dimensional analog of the func-
tional G can be obtained by dividing the interval [O,t] into the intervals
A = t / r and replacing G by
where
H = H [ t ,T , v ( t , x ) ] =
x G ( x ,u;0 , T ) G ( Y ,u;2, r ) G ( T ,T ; 8,t ) v ( t ,y) dxdydZdg,
(3.6.60)
where
Indeed, it follows from (3.6.63) that, besides a system of data units, the
control circuit (the feedback circuit) contains a system of linear amplifiers
with amplification factors Qi(t), an adder, and a relay type switching device
that relates the pipeline [0, .l]either to reservoir 1 (for pumping additional
substance) or to reservoir 2 (for substance suction a t the pipeline input).
Figure 29 shows the block diagram of the system realizing the control
algorithm (3.6.63).
The quasioptimal first-approximation algorithm (3.6.6 1) can be realized
in a similar way. Here only the control circuit, along with a nonlinear
unit of an ideal relay type, contains nonlinear transformers that realize the
probability error functions a(%).
However, it should be noted that an error is inevitably present in the
finite-dimensional approximation of the state function v(t, x) (when the
algorithm (3.6.56) is replaced by (3.6.63)), since it is impossible to measure
the system state v(t, x) precisely (this state is a point in the infinitely
dimensional Hilbert state L 2 ) . However, if the points X I , . . ., x p of location
of the concentration data units lie sufficiently close to each other, then this
error can be neglected.
CHAPTER IV
Ft + - Sp DRDF,,T
2
E
+ Sp [ F D ( U U ~ - EDRD)]+ %(m, D, F,, Fo) = 0,
(4.0.2)
where
220 Chapter IV
If the value of the parameter E is small, then the solutions of the above
equations and the solutions of the equations
where c is a constant.
In the present section the main attention is paid to the "algorithmic"
aspects of the method, that is, to calculational methods for obtaining qua-
sioptimal controls uk. As an example, we consider two specific problems of
the optimal servomechanism synthesis. First (in 54.1), we consider the syn-
thesis problem that generalizes the problem considered in 52.2 to the case in
which the input process ~ ( t is) a diffusion Markov process inhomogeneous
in the phase variable y. Next (in $4.2), we write an approximate solution of
the synthesis problem for an optimal system of tracking a discrete Markov
process of a "telegraph signal" type when the command input is observed
on the background of a white noise.
where ((t) is the standard white noise of unit intensity (1.1.31), E and N
are given positive constants (E is a small parameter), and the values of
admissible controls u lie in the region1
'The nonsymmetric constraints (4.1.2) are, first, more general (see [21]), and second,
they allow a more convenient comparison between the results obtained later and the
corresponding formulas constructed in 52.2.
222 Chapter IV
where urn > a > 0. The command input ~ ( t is) a J(t)-independent scalar
Markov diffusion process with drift and diffusion coefficients
where @ and B > 0 are given numbers and E is the same small parameter
as in (4.1.1). The performance of the tracking system will be estimated by
the value of the integral optimality criterion
>
where the penalty function c (y(t) - x (t)) = c(z(t)) 0, c(0) = 0, is a given
concave function of the error signal z(t) = y(t) - x(t).
The problem stated above is a generalization of the problem studied in
Section 2.2.1 of $2.2 to the case in which the plant is subject to uncontrolled
random perturbations and the input Markov process y(t) is inhomogeneous
in the phase variable y (the drift coefficient AY = AY(y) = -@y # const).
The inhomogeneity of the input process y(t) makes the synthesis problem
more complicated, since in this case the Bellman equation cannot be re-
duced to a one-dimensional equation (as in Section 2.2.1 of $2.2).
Since problem (4.1.1)-(4.1.4) is a special case of problem (1.4.2)-(1.4.4),
then it follows from (1.4.21), (1.4.22), and (4.1.1)-(4.1.4) that the Bellman
equation has the form
-PyFy + -a-u,<u<-a+u,
min
-
[uFz] + i(NF.m + B F y y )+ c(y - X) = -Ft,
E
f (Y,') = Tlim
-im
EF@l y, 1' - T ( -~t)l, (4.1.7)
Synthesis of Quasioptimal Systems 223
then (4.1.6) implies the following stationary Bellman equation for the prob-
lem considered:
- P ~ ( f y+ f t ) + -a-u,<u<-a+u,
min [-ufrl
As usual, the number y 2 0 in (4.1.8) characterizes the mean losses per unit
time under stationary operating conditions. This number is an unknown
variable and can be obtained together with the solution of Eq. (4.1.8).
Let us discuss the possibility that Eq. (4.1.8) can be solved. By R+
we denote the domain on the phase plane (y, z) where f, > 0 and by R-
the domain where f, < 0. It follows from (4.1.8) that the optimal control
u,(y, z) must be equal to u, = u, - a in R+ and to u* = -urn - a in R - .
Denoting by f+(y, z) and f- (y, z) the values of the loss function f (y, z)
in the domains R+ and R-, we obtain the following two equations from
(4.1.8):
Since in (4.1.8) the first derivatives f y and f, are continuous on the interface
r between R+ and R- [172], both equations in (4.1.9) hold on r, and we
have the condition
Since the control action u, is of opposite sign on each side of the inter-
face I', the line I' is naturally called a switching line. It follows from the
preceding that the problem of the optimal system synthesis is equivalent to
the problem of finding the equation for the switching line I?.
Equations (4.1.9) cannot be solved exactly. The fact that expressions
with second-order derivatives contain a small parameter E allows us to solve
these equations by the method of successive approximations. In the zero
approximation, instead of (4.1.9), we need to solve the system of equations
By f:, and r0 we denote the loss function, the stationary error, and
the switching line obtained from the Eq. (4.1.11) for the zero approxima-
tion. The successive approximations fi,-yk, and rk ( k >
- 1) are calculated
224 Chapter IV
where
A method for solving Eqs. (4.1.11), (4.1.12) was proposed in [172]. Let
us briefly describe the procedure for calculating successive approximations
f k , 7 k , and rk,k = 0,1,2, . . .. First of all, note that Eqs. (4.1.11), (4.1.12)
are the Bellman equations for deterministic problems of synthesis of second-
order control systems in which the equation of motion has the form
(in the second equation the signs "minus" and "plus" of urn correspond
to the domains R'C+
and R!, respectively). As was shown in [172], the
gradient v f of the solution of nondiffusion equations (4.1.1 I ) , (4.1.12)
remains continuous when we cross the interface rk,that is, on rkwe have
the conditions
k
aft
af,-- -
-
af+ - - aft k = 0 , 1 , 2 ,..., (4.1.15)
ay ay ' az az
if the phase trajectories of the deterministic system (4.1.14) either approach
the line l' on both sides (the switching line of the first kind) or approach rk
on one side and recede on the other side (the switching line of the second
kind, see Fig. 4). This fact allows us to calculate the gradient vfk along
rk. Indeed, in the domain R: we have
This allows us to write the difference between the values of the loss function
a t different points on the boundary rk as a contour integral along the
boundary,
1
f k ( ~-) f k ( p ) =
P
Q
+
A: dy A: dl. (4.1.19)
If A;(y, z), A,k(y,z), and zk(y) are sufficiently smooth functions of their
arguments, then we have
k k a ~ k aA:
Az(Yi-~,zi-l)-A,(yi,zi) = -Az(yi,zi)+-(yi,zi)(zi-l-zi)+~(A)
BY az
(4.1.23)
for small A = yi - y;-1. Substituting (4.1.23) into (4.1.22), taking into
+
account the relation zi+1 - 2zi zi-1 = o(A), and passing to the limit as
A + 0, we obtain the condition
The unknown parameter yk can be found from the condition that the de-
rivative (4.1.25) is finite a t a stable point; in the problem considered the
point y = 0 is stable. More precisely, this condition can be written as
lim ywk(y,y k ) = 0. (4.1.26)
Y+O
The expression
is the increment of the loss functions f k on the time interval dt. Hence
(4.1.26) means that this increment becomes zero after the controlled de-
terministic system (4.1.14) arrives a t the stable state y = 0. Obviously, in
this case, it follows from the above properties of the penalty function c(z)
that we also have z = 0. Thus, relation (4.1.26) is a necessary condition for
the deterministic Bellman equations (4.1. l l ) , (4.1.12) to have stationary
solutions.
Let us use the above-treated calculation procedure for solving the equa-
tions of successive approximations (4. l.l l ) , (4. l.12). We restrict our calcu-
lations to a small number of successive approximations that determine the
most important terms of the corresponding asymptotic expansions and pri-
marily affect the structure of the controller C when a quasioptimal control
system is designed.
Synthesis of Quasioptimal Systems 227
Since, by assumption, the penalty function c(z) attains its unique minimum
at z = 0, the condition (4.1.29) implies the equation
that is, in the zero approximation, the switching line coincides with the
y-axis on the plane (y, z).
Now let us verify whether (4.1.30) is a switching line of the first kind.
An examination of phase trajectories of system (4.1.14) shows that on the
segment
the phase trajectories approach the y-axis on both sides;' therefore, this
segment is an actual switching line. For y @ [ L , e+], the equation for the
switching line r0 will be obtained in the sequel.
Now let us calculate the stationary tracking error -yo. From (4.1.25),
(4.1.26), and (4.1.28), we have
20bviously, in this case, the domain Rt (RO)is the upper (lower) half-plane of the
phase plane (y,z). Therefore, to construct the phase trajectories, in the second equation
in (4.4.14), we must take urnwith sign "minus" for z > 0 and with "plus" for z < 0.
228 Chapter IV
It also follows from (4.1.28) and (4.1.31) (with regard to c(0) = 0) that the
loss function is constant on the y-axis for l- < y < l + ; thus we can set
fO(y,O) = 0 for y E [l-,[+I.
To calculate the loss function f 0 a t a n arbitrary point (y, z), we need
to integrate Eqs. (4.1.27). To this end, let us first write the system of
equations for the integral curves (characteristics):
If yo denotes the point a t which a given integral curve intersects the y-axis,
z = 0, then (4.1.32) implies the following equation for the characteristics
(the phase trajectories) :
c(zl) dz'
PVZ'JI'P*(Y)+ z - z 1 1 - a f urn' (4.1.34)
Here we took into account the equality c(0) = 0 and assumed that the
condition (dy*/dyo) - (ayoldz) # O must be satisfied on r0 determined
by (4.1.35).
Synthesis of Quasioptimal Systems 229
+
(in (4.1.36) we have yo = cp~l[cp-(~) z], where cp-(~)is determined by
(4.1.33)).
Equation (4.1.36) determines the switching line z = zO(y) for z > L+
+
implicitly. Near the point y = l+= ( a u,)/P a t which the switching line
changes its type, Eq. (4.1.36) allows us to obtain an approximate formula
and thus write the equation for r0explicitly:
Figure 30 shows the position of the switching line r0and the phase trajec-
tories in the zero approximation.
230 Chapter IV
To simplify the further calculations, we note that, in the case of the sta-
tionary tracking mode and of small diffusion coefficients considered here,
the probability that the phase variables y and z fluctuate near the origin
on phase plane (y, z) is very large. The values y = (a f u,)/P a t which
the switching line r0changes its type are attained very seldom (for the sta-
tionary operating conditions); therefore, we are mainly interested in finding
the exact position of the switching line in the region -(u, - a)/@ < y <
(u, + a ) / p , where, in the zero approximation, the position of the switching
line is given by the equation z = 0. Next, note that the first-approximation
equation (4.1.37) differs from the corresponding zero-approximation equa-
tion (4.1.27) only by a small (of the order of E ) term in the expression for
z) (see (4.1.38)). Therefore, the continuity conditions imply that the
switching line I'l in the first approximation determined by (4.1.37) is suf-
ficiently close to the previous position z = 0. Thus, we can calculate I" by
using, instead of exact formulas, approximate expressions corresponding to
small values of z.
Now, taking into account the preceding arguments, let us calculate the
function z) = c:(~, z) determined by (4.1.38). To this end, we differ-
entiate the second expression in (4.1.34) and restrict ourselves to the first-
and second-order terms in z. As a result, we obtain3
a2fg -
dz2
-
22
fly - a & u,
+ (fly -P2yz2
a 41
+z3...,
d2f2 -
- -- Pz2 +z3..., d2f2
= z 3 . .
dzdy (fly - a k dy2
3The functions f:(y, z) and fL(y,z), as the solutions of Eqs. (4.1.37),are defined in
R
: and R?. At the same time, the functions f t (Y,z) and f:(y, z) are defined in R
:
: and RO) and I'l (between
and RO. However, since the switching lines r0 (between R
: and R?) are close to each other, to calculate (4.1.39),we have used expressions
R
for f i in R$ and RL .
(4.1.34)
Synthesis of Quasioptimal Systems 231
af:
Py-+ (Py - a i u r n ) -
a f i =z2 -yl+ +
E(B N ) z (4.1.40)
ay 8.2 (PY- a + urn)
(in Eqs. (4.1.40) we preserve only the most important terms in the functions
c i ( y , z) and neglect the terms of the order higher than or equal to that
of e3).
In view of (4.1.15), both equations (4.1.40) hold on the boundary r l .
By solving these equations, we obtain the components of the gradient of
the loss function V fl(y, z ) on the switching line r l :
In this case, the condition (4.1.20) (a necessary condition for the switch-
ing line of the first kind) leads to the equation
Hence, neglecting the order terms, we obtain the following equation for
the switching line I'l in the first approximation:
with desired accuracy, we need not calculate the loss function f& (y, z) in
the first approximation but can calculate c2(y,z) in (4.1.12) and (4.1.13)
232 Chapter IV
d2f1 -
--
E(B N ) + along I".
dz2 u&-(P~-a)~
As follows from (4.1.41), the other second-order derivatives d2f '/dzdy and
d2f1/dy2 on I'l are higher-order infinitesimals and can be neglected when
we calculate y2. Therefore, (4.1.45) and (4.1.13) yield the following approx-
imation expression for the function c2(y,z):
Taking (4.1.46) into account and solving the system (4.1.16), (4.1.17)
a
(with k = 2) for f 2/ayand 8f /dz, we calculate the functions A: and A:
in (4.1.44) as
From (4.1.26), (4.1.43), (4.1.44), and (4.1.47), we derive the equation foi
the stationary tracking error in the second approximation:
Formula (4.1.48) exactly coincides with the stationary error (2.2.23) ob-
tained for a homogeneous (in y) input process. The inhomogeneity, in other
words, the dependence of the stationary error on the parameter P, begins to
manifest itself only in the calculations of higher approximations. However,
the drift coefficient -fly affects the position of the switching line (4.1.43)
already in the first approximation. Formula (4.1.43) is a generalization of
the corresponding formula (2.2.22); for /3 = O these formulas coincide.
Figure 31 shows the analogous circuit diagram of the tracking system
that realizes the optimal control algorithm in the first approximation. The
unit N C is an inertialess nonlinear transformer governed by the functional
Synthesis of Quasioptimal Systems
where ((t) is the standard white noise independent of y(t) and C(t) and the
controlling action is bounded in absolute value,
where the penalty function c(y - x) is the same as in (4.1.4). In the method
used here for solving problem (4.2.1)-(4.2.5), it is important that c(y - x)
is a differentiable function. In the subsequent calculations, this function is
quadratic, namely,
c(y - 2) = (y - x ) ~ . (4.2.6)
Synthesis of Quasioptimal Systems 235
where gn, y,, and Cn are understood as the mean values of realizations over
the interval A of time quantization:
236 Chapter IV
It follows from (4.2.8) (see also (1.1.42)) that the sequence yn is a simple
Markov chain characterized by the following four transition probabilities
pA(yn+1 I yn):
Using these properties of the sequences yn and in, we can write recurrent
formulas relating the a posteriori probabilities of successive time instants
+
(with numbers n and n 1) and the result of the last observation.
The probability addition and multiplication theorems yield the formulas
By letting the time interval A -+ 0, and taking into account the fact that
lima+o (d,+l - d,)/A = dt and (4.2.17), we derive the following differential
equation for the function dt = d(t):
Since, in view of (4.2.7), the functions zt = z(t) and dt satisfy the relation
+
dt = (1 z t ) / ( l - zt), Eq. (4.2.18) for zt has the form
(4.2.23)
and using the Markov property of the sufficient coordinates ( x ( t ) , z ( t ) ) ,
from (4.2.23) we obtain the basic functional equation of the dynamic pro-
gramming approach:
r &+A
F ( t , xt, zt) = min
lu(r)l<1
4 ~ist necessary to verify the statement of item (2) only in special cases in which the
control constraints depend on the state of the control system. Such problems are not
considered in this book.
Synthesis of Quasioptimal Systems 239
The Bellman differential equation can be derived from (4.2.24) by the stan-
dard method (see $1.4 and $1.5) of expanding F ( t + A, xt+a, zt+a) in the
Taylor series around the point (t, xt, zt), averaging, and passing to the limit
as A t 0. In this procedure, we use the following obvious formulas that
are consequences of (4.2.3), (4.2.7), and (4.2.20)-(4.2.22):
-aF
+
at
min u-
1,51
aF
[ ax] -2pz-+--+
aF
az
B
2
~
axz
~( 1 -Fz 2 ) 2 a 2 ~
2~ az2
240 Chapter IV
If, by analogy with the problem solved in $4.1, in the space of sufficient
coordinates (x, z) we denote the regions where d F / d x > 0 and d F / d x < 0
by R+ and R-, respectively, then in these regions the nonlinear equation
(4.2.32) is replaced by the corresponding linear equation and the optimal
control is formed by the rule
Since the first-order derivatives of the loss function are continuous [113,
1751, on the interface I' between R+ and R-, we have
5The condition (4.2.34) means that there are reflecting screens on the boundary
segments (x = +1, -1 5 z 5 $1) and (x = -1, -1 5 x 5 +1) (for a detailed description
of diffusion processes with phase constraints and various screens, see $6.2).
Synthesis of Quasioptimal Systems 241
the original nonlinear equation (4.2.32) with the initial and boundary con-
ditions (4.2.33)-(4.2.35) and then, on the plane ( z , z), to find the geometric
locus where the condition (4.2.36) is satisfied.
However, this method can be implemented only numerically. To solve
the synthesis problem analytically, let us return to the approximate method
used in $4.1.
4.2.2. Calculation of the successive approximations. Suppose
that the intensity of random actions on the plant is small but the error of
measurement of the input signal is large. In this case, we can set B = EBO
and ;ic = x O / &(where E > 0 is a small parameter). We consider, just as in
$4.1, the stationary tracking operating conditions. Then for the quadratic
loss function (4.2.6), the Bellman equation (4.2.32) has the form
+x2-2xz+1-y
az2
(4.2.37)
(here f = f ( x , z) is the stationary loss function defined just as in (4.1.7),
and y is the stationary tracking error).
Introducing the special notation f+ and f- for the loss function f in
R + and R-, we can replace the nonlinear equation (4.2.37) by the pair of
linear equations
The condition
which is necessary for the existence of the switching line of the first kind
(see (4.1.20)), together with (4.2.42) implies that the line
shows that the trajectories actually approach the line (4.2.44) on both sides6
if only 2p < 1. In what follows, we assume that this condition is satisfied.
The stationary error is obtained from the condition that the derivative
d f O / d x calculated along r0 a t the stable point (e.g., a t the origin x = 0,
61n the first equation in (4.2.45), the sign + corresponds to the region z > x and the
sign - to z < x .
Synthesis of Quasioptimal Systems 243
I-"=-
along rO.The condition (4.1.26) in this case has the form limx,o
0
-
2pxY -
0, which implies = 1.
Now, to solve Eq. (4.2.39), we write the characteristic equations
Hence, using (4.2.43), we obtain the equation for the switching line I":
Hence it follows that the stationary error in the first approximation depends
on the noise intensity a t the input of the system shown in Fig. 32 but is
independent of the noises in the plant.
246 Chapter IV
Using the equation (4.2.53) for the switching line and Eq. (4.2.20), we
construct the analogous circuit (see Fig. 34) for a quasioptimal tracking
system in the first approximation. The dotted line indicates the unit SC
that produces a sufficient coordinate z(t); the unit NC is an inertialess
transducer that realizes the functional dependence on the right-hand side of
(4.2.53). If we have E << 1for the small parameter contained in the problem,
then the output variable x(t) fluctuates mostly in a small neighborhood of
zero. In this case (Ix(t)l << I), as follows from (4.2.53), the nonlinear
unit NC can be replaced by a linear amplifier with the amplification factor
CHAPTER V
The present chapter deals with some synthesis problems for optimal sys-
tems with quasiharmonic plants. Here the term "quasiharmonic" means
that the plant dynamics is close to harmonic oscillations in the process of
control. In this case, through time t = 2n, the phase trajectories of the
second-order systems considered in this chapter are close to circles on the
plane (3,k ) .
There exists an extensive literature on the methods for studying such
systems (including controlled systems) (e.g., see [2, 19, 27, 33, 69, 70,
136, 153, 1541 and the references therein). These methods are based on
the idea (going back to Poincark) that the motion in oscillatory systems
can be divided into "fast" and "slow" motions. This idea along with the
averaging method [2] enables one to derive equations for "slow" variables
that can readily be integrated. These equations are usually derived by
different versions of the method of successive approximations.
Various approximate methods based on the first-approximation equation
for slowly varying variables play an important role in industrial engineering.
For the first time, such a method for studying nonlinear oscillatory systems
was proposed by van der Pol [183, 1841 (the method of slowly varying
amplitudes). Among other first-approximation methods, we also point out
the "mean steepness" method [2] and the harmonic balance method [69,
701, which is widely used in engineering calculations of automatic control
systems.
More precise results can be obtained by regular asymptotic methods, the
most important of which is the asymptotic Krylov-Bogolyubov method [19].
Originally, this method was developed for studying nonlinear oscillations
in deterministic uncontrolled systems. Later on, this method was also used
for the investigation of stochastic [log, 1731 and controlled [33] oscillatory
systems. In the present chapter, the Krylov-Bogolyubov method is also
widely used for constructing quasioptimal control algorithms.
This chapter consists of four sections, in which we consider four special
problems of optimal damping of oscillations in quasiharmonic second-order
systems with constrained controlling actions. In the first two sections (55.1
248 Chapter V
and $5.2) we consider deterministic problems; the other two sections ($5.3
and $5.4) deal with stochastic synthesis problems.
First, in $5.1 we study the control problem for an arbitrary quasihar-
monic oscillator with one degree of freedom. We describe a method for
solving the synthesis problem approximately. In this method, the mini-
mized functional and the equation for the switching line are presented as
asymptotic expansions in powers of a small parameter contained in the
problem. The method of approximate synthesis is illustrated by some ex-
amples of solving the optimal control problems for a linear oscillator and
a nonlinear van der Pol oscillator. In $5.2 we use the method (consid-
ered in $5.1) for solving the control problem for a system of two biological
populations, namely, the "predator-prey" model described by the Lotka-
Volterra equation (see $2.3). We study a special Lotka-Volterra model
with a "poorly adapted predator." In this case, the sizes of both interact-
ing populations obey a quasiharmonic dynamics. Next, in $5.3, we consider
the stochastic version of the problem studied in $5.1. We consider an as-
ymptotic synthesis method that allows us to construct quasioptimal control
systems with an oscillatory plant subject to additive random disturbances.
Finally, in $5.4, the method considered in $5.3 is generalized to the case
of indirect observation when the measurement of the current state of the
oscillator is accompanied by a white noise.
'The only assumption is that, for some given functions ~1 and xz, the Cauchy prob-
lem of system (5.1.1) has a unique solution in a chosen domain D in the space of the
variables ( t ,xl, xz) (see $1.1).
Control of Oscillatory Systems 249
with the same period T = 27r and the phase shift Acp = n/2. Note that,
in the phase plane (21,22), the trajectory that is a circle of radius a corre-
sponds to the solution (5.1.2). If E # 0 but is a sufficiently small parameter,
then, in view of the continuity, the difference between the solution of sys-
tem (5.1.1) and the solution (5.1.2) is small on a time interval that is not
too large. More precisely, if for E # 0 we seek the solution of system (5.1.1)
in the form
+
then the "amplitude" increment A a = a ( t 27r) - a(t) and the "phase"
+
increment A a = a ( t 27r) - a ( t ) are small during time T = 27r, that is,
A a N E and A a E. This fact justifies the term "quasiharmonic" for
systems of the form (5.1.1) and serves as a basis for the elaboration of
various asymptotic methods for the analysis of such systems.
5.1.1. Statement of the problem. In the present section we consider
controlled oscillators whose behavior is described by an equation of the form
hence it follows that the oscillator (5.1.3) is a special case of the oscillator
(5.1.1) with XI 2 0 and X Z ( X ~ , X Z ,U) = u - ~ ( 2 122)x2.
,
It should be noted that equations of the form (5.1.3) describe a wide class
of controlled plants of various physical nature: mechanical (the Froude
pendulum [2]), electrical (vacuum-tube and semiconductor generators of
harmonic oscillations [2, 19, 183, 184]), electromechanical remote tracking
systems for angle reconstruction [2], etc. Numerous examples of actual
systems mathematically modeled by Eq. (5.1.3) can be found in [2, 19,
1361.
For the controlled oscillator (5.1.3), we shall consider the following op-
timal control problem with free right-hand endpoint of the trajectory.
We assume that the absolute value of the admissible (scalar) control
u = u(t) is bounded a t each time instant t:
250 Chapter V
and the goal of control for system (5.1.3) is to minimize the integral func-
tional
T
I[u] = c (x (t), &(t))dt i min (5.1.6)
I u ( t ) l < u m , O<t<T
dF
--at = "2- d F - dl7
axl
("1 + EX("', x2)xZ) -
ax2+ I u rnin
lSum [EU-
:xtl
+ c(x1, xz),
0 < t < T, F ( T ,~ 1 , x z=
) 0,
(5.1.8)
that corresponds to problem (5.1.3)-(5.1.6).
Equation (5.1.8) allow us to obtain some general properties of the optimal
control in the synthesis form u, (t, XI, xz), which we shall use later. Indeed,
it follows from (5.1.8) that the optimal control u, for which the expression
in the square brackets attains its minimum is a relay-type control and can
be written in the form
aF
-urn sign -(t, xl, x2).
8x2
REMARK 5.1.1. Rigorously speaking, the optimal control in this prob-
lem is not unique. This is related to the fact that at the points (t, xl, x2),
where a F ( t , X I , x2)/dx2 = 0, the optimal control u, is not uniquely deter-
mined by Eq. (5.1.8). On the other hand, one can see that a t the points
Control of Oscillatory Systems 251
(t, 21, 22), where a F / d x 2 = 0, the choice of any control u0 lying in the
admissible region [-urn, u,] does not affect the value of the loss function
F ( t , 21, 22) that satisfies the Bellman equation. Therefore, in particular,
the control (5.1.9) that requires the choice of u, = 0 a t the points (t, xl, x2),
where a F ( t , x l , x2)/dx2 = 0,2 is optimal.
Using (5.1.9), we can rewrite the Bellman equation (5.1.8) in the form
It follows from this relation and (5.1.9) that the optimal control algorithm
u, (t, 21, x2) has an important property of being antisymmetric, namely,
The facts that the optimal control in problem (5.1.3)-(5.1.6) is of relay type
(5.1.9) and antisymmetric (5.1.11) play an important role in the asymptotic
synthesis method discussed in the sequel.
We also note that the optimal control algorithm in problem (5.1.3)-
(5.1.6) can be simplified significantly if we consider the optimal control of
system (5.1.3) on an infinite time interval. In this case, the upper limit
of integration T + oo in (5.1.6) and, instead of (5.1.7), we have the time-
independent3 loss function
f(xl,x2)= min
lu(~)I<~rn,
3The loss function (5.1.12)is time-independent, since the plant equations (5.1.4) are
time-invariant.
252 Chapter V
I"(XI, 22) =
I" c(x?(t), xF(t)) d t ,
where xT(t) and x%(t) denote solutions of system (5.1.4) with control E
(5.1.14)
4 ~also
t follows from the ~ r o ~ e r t i of
e sthe penalty function c(xl,x2) that the control
Z(x1,x2) guarantees the asymptotic stability of the trivial solution XI ( t ) = xa(t) r 0 of
system (5.1.4).
Control of Oscillatory Systems 253
where
A
x,(A, @) = -(cos 2@- I)X(Acos @, -Asin@),
2
sin 2@
x&, @I = -- 2 ~ ( A c o@,
s -Asin@),
us (A, @) = u(A, @) sin @, u,(A, @) = u(A, @) cos @.
Since the optimal control is of relay type (5.1.9), (5.1.13) and antisym-
metric (5.1.11), for the control function u(A, @) in (5.1.17), we can imme-
diately write
u(A, @) = u, sign [ sin (@ - pr (A))]. (5.1.18)
Note that, in view of the change of variables (5.1.15), controls of the form
(5.1.18) are already of relay type and antisymmetric on the phase plane
(xl,x2). The function pJA) in (5.1.18) determines (in the polar coordi-
nates) an equation for the switching line of the controlling action. Thus,
in this case, the synthesis problem is equivalent to the problem of finding
the function $(A) that minimizes a given optimality criterion. The func-
tion p:(A) is calculated by using the method of successive approximations
presented in Section 5.1.4.
It is well known [2, 19, 331 that for a sufficiently small parameter E , in-
stead of Eqs. (5.1.16), one can use some other auxiliary equations, which
are constructed according to certain rules and are called truncated equa-
tions. These equations allow one to obtain approximate solutions of the
original equations in a rather simple way (the accuracy is the higher, the
smaller is the parameter E ) .5
In the simplest case, the truncated equations
are obtained from (5.1.16) by neglecting the vibrational terms in the ex-
pressions for G(A, @, u) and H(A, @, u) or, which is the same, by averaging
the right-hand sides of Eqs. (5.1.16) over the "fast phase" Q, while the
amplitude A is fixed,6 namely,
where
In this case, the successive terms G;, H;, G;, H;, . . .,v1, "1, v2, "2,. . . of
the asymptotic series (5.1.23) and (5.1.22) are calculated recurrently by the
method of successive approximations.
'This method for obtaining truncated equations is often called the method of slowly
varying amplitudes or the v a n der Pol method.
Control of Oscillatory Systems 255
Substituting (5.1.22) and (5.1.23) into (5.1.24) and retaining only the terms
of the order of E in (5.1.24), we obtain the first-approximation relations
awl 1
H;(A*) + -(A*,
a@* a * ) = H(A*, a * ) = x,(A*, a * ) - -u,(A*,
A*
a*).
(5.1.25)
Now, by equating the nonvibrational and purely vibrational terms on the
left and on the right in (5.1.25), we obtain the following expressions for the
first terms of the asymptotic series (5.1.23) and (5.1.22):
where
Q, (A*, a * ) =
6;[u, (A*,@I) - ?&I da'.
calculate the functions Ga, H,+,212, w2 in (5.1.24), we need to retain the ex-
pressions of the order of E ~ Then
. (5.1.24) implies the second-approximation
relations
G;(A*) + -G;(A*)
8%
d ~ *
+ -H;
avl
aa* (A*) +a
dv2
,
dG dG
= v17(A*,@*)
dA
+ wi;(A*,@*),
d@ (5.1.29)
H; (A*) + awl
--G;(A*)
dA*
awl
+
mH; (A*) +
dw2
In its turn, each equality in (5.1.29) splits into two separate relations for
the nonvibrational and vibrational terms contained in (5.1.29), respectively.
This allows us to calculate the four functions GZ(AS), H;(A*), v2(A*,a*),
and wg(A*, a*).In particular, for the nonvibrational terms, the first equal-
ity in (5.1.29) implies
Using (5.1.17), (5.1.27), and (5.1.28), we can write the right-hand side
of (5.1.30) in more details as follow^:^
- 8% duS 8%
-
us, u,, aA sin n@, -cosna,
da
Qs -,
dA (5.1.33)
auS
Q-, Qc s i n n a , 9, cos n@.
aa
The average values (5.1.33) can readily be calculated by using (5.1.18),
the properties of the S-function, and the fact that the functions u,(A, a),
u,(A, a),Qs (A, a),and +,(A, @) are periodic (with respect to @).
1. If, for definiteness, we assume that 0 5 cpr 5 r / 2 , then it follows
from (5.1.17) and (5.1.18) that
2u,
= -cos pr(A).
sinad@+/
"+qr(A)
Y,(A)
sinad@- J2"
"+Y,(A)
sin dm
I
(5.1.34)
7r
One can readily see that formula (5.1.34) remains valid for any cp,(A) such
that -7r <cp,(A) 5 7r.
2. In a similar way, we obtain
2% sin cpp(A)
u, sign [sin (a - cp, (A))] cos @ d@ = - -
7r
d
-sign x = 2S(x)
dx
Chapter V
Using (5.1.37) and the properties of the &function, after the integration
and some elementary calculations, we obtain
8%
- = 2urn6[sin(0- p,)] cos(@- p,) sin cP + u, sign[sin(0 - pr)] cos 0,
a0
we obtain
5. Since \E, (A, cP) and du, (A, @)/dA are periodic functions, we have
where
&[sin(@'- pr)] cos(@' - p,) sin 0 ' d 0 '
Control of Oscillatory Systems
2 sin cp,. - - - - - - - - - - - I
I
I
I
sin pr .- - I
I I
I 1
I I
I I
*
0 'f, 7T n-+'f, 27T @
It follows from (5.1.40) that the choice of does not affect the value of
Q,%. Hence we set = 0. Furthermore, if we consider 0 5 cp, 5 n-,
then the piecewise constant function F ( @ )in (5.1.41) has jumps of value
+
sinp, a t the points cpr and n- cp, as shown in Fig. 35. For this function
F(@), one can readily calculate F and q,
namely,
2u& 3
-- dcp ( 5sin 2pp - -1 sin f p , + sin cp,
n- dA n- 2
Carrying out similar calculations for - 7 ~ 5 cp, 5 0 and comparing the result
with the last formula, we finally obtain
du, 2u2
Qc-
aa = --sin2pr.
7r2
7. The relation
1
Q, sin n@ = -u, cos n 9 (5.1.44)
n
allows us to reduce the calculation of the desired mean value to finding a
simpler expression u, cos n a . Using (5.1.17) and (5.1.18) and performing
some simple calculations, we obtain
1
n+l sin(n + 1 ) + ~5 ~sin(n - l)y, for even n,
for odd n.
( A ** = ] c*(A;, a;) d t ,
where c*(A*,9*)is obtained from the penalty function c(xl, x2) by the
change of variables (5.1.15), (5.1.21).
Note that the functional (5.1.46), treated as a function of the initial state
( A * ,a * ) , is a periodic function in the second variable, namely, I(A*, a * ) =
'The value of the functional (5.1.46) depends both on the initial state A*(O) = A*,
@ * ( O ) = @* of the system and on the control algorithm u(A;, Q;): 0 <
t < oo. There-
fore, for the functional (5.1.46) it is more correct to use the notation IU(~;,*:)(A*,@*)
a*)
or I ' ~ ( ~ * ) ( A * , (which, in view of (5.1.18), is the same). However, for simplicity, we
write I(A*, a * ) .
Control of Oscillatory Systems 261
I(A*, @* +27~). Therefore, taking into account (5.1.21) and the second
equation in (5.1.23), we obtain
1 d 2 1 ( ~ *a*)
,
- - AA* - . .
2 dA*2
Since AA* = €G;(AS)2.rr in the first approximation with respect to E, it
follows from (5.1.49) that
dI(A*, a*) - --c*(A*)
& -
dA* G;(A*) + € . . . .
where
-
C* (A*) =- '*(A*, a,') d@F, (5.1.51)
where, just as in (5.1.51), the bar over a letter indicates the averaging over
the period with respect to @:, and the function GB(A*) is determined by
(5.1.31).
Let us write the functional to be minimized as follows:
(note that, by the assumptions of the problem considered, we can set AT, =
I ( A 2 ) = 0).
It follows from (5.1.53) that, to minimize the functional (5.1.46), it suf-
fices to find the minimum of the derivative a I ( A * ,@*)/dASfor an arbitrary
current state (A*, @*) of the control system. The accuracy of this minimiza-
tion procedure depends on the number of terms retained in the expansion
of the right-hand side of (5.1.49) in powers of E. Let us perform the corre-
sponding calculations for the first two approximations.
According to (5.1.50), to minimize the functional (5.1.46) in the first
approximation in E, it suffices to minimize (in cpr) the expression
-
u. = u, (A*,m*) = -
1
2~
j 27r
0
U(A*,a*)sin a* d@*
This fact and (5.1.5) readily imply that the optimal control ul(A*, a*) in
the first approximation must have the form
From the mechanical viewpoint, this result means that, to obtain the
optimal damping of oscillations in the oscillator (5.1.3), we must apply
the maximum admissible controlling force (the torque) and this force (the
torque) must always be opposite to the velocity (the angular velocity) of
the motion. It must also be emphasized that the control algorithm in the
first approximation is universal, since it depends neither on the nonlinear
characteristics of the oscillator (that is, on the function ~ ( xi)
, in (5.1.3))
nor on the form of the penalty function c ( x , i)in the optimality criterion
(5.1.6).
To find the quasioptimal control algorithm in the second approximation,
we need to calculate the function cp,(A*) that minimizes (5.1.52) or, which
is the same, the expression
F(c,) =
2um
cos pr + E-urn (-1 sin 3cpr + sin cpr) + E- 2u;
27r 3 sin 2v,.
264 Chapter V
aF -
- 2u,
- -- sinv,
urn
+ r-(cos3q, + cosq,) +E=
44,
cos 2qr = 0.
a% '7r 21r
The function pr(A*) determines (in the polar coordinates) the switching
line equation for the quasioptimal control in the second approximation. The
position of this switching line on the phase plane (x, 5 ) is shown in Fig. 36.
It follows from (5.1.18) and (5.1.60) that in this case the quasioptimal
control algorithm (the synthesis function) in the second approximation has
the form
he terms of the order of c2 and of higher orders on the right-hand side of (5.1.60)
are omitted.
Control of Oscillatory Systems 265
is that if we use a control of the form (5.1.18), then there always exists a
small neighborhood of the origin on the phase plane (x, i ) and the quasi-
harmonic character of the trajectories of the plant (5.1.3) is violated in
-
this neighborhood. In Fig. 36, this neighborhood is the circle of radius R
(R E)." In the interior of this neighborhood, the applicability conditions
for the asymptotic (van der Pol, Krylov-Bogolyubov, etc.) methods are vi-
olated. Therefore, the quasioptimal control algorithms (5.1.56) and (5.1.61)
can be used everywhere except for the interior of this neighborhood. More-
over, it is important to keep in mind that, by using the asymptotic synthesis
method discussed in this section, it is in principle impossible to find the
optimal control in a small neighborhood of the point (x = 0, i = 0).
EXAMPLE 2. Now let x ( x , i ) = x2 - 1. In this case, the plant (5.1.3)
is a self-oscillating system (a self-exciting circuit) sometimes called the v a n
der Pol oscillator or the T h o m s o n generator. It follows from (5.1.17) that,
in this case, we have
Using formulas (5.1.34), (5.1.43), and (5.1.45) for the function (5.1.58), we
obtain
2urn
F(cpp)= cos cp sin 58, +3
- -
1
3
sin 3yr - sin cp, + 4um
-
A*
7r
sin 2yr
I
-
Just as in Example 1, from the condition d F / d p r = 0 with regard to the
fact that cpr is small (cpp E), we derive the equation of the switching line,
' O A ~ elementary analysis of the phase trajectories of a linear oscillator subject to the
control (5.1.56) shows that the phase trajectories of the system, once entering the circle
of radius R = 2&um,not only cease to be quasiharmonic, but cease to be oscillatory in
character at all.
Chapter V
FIG.37
then the solution Ai(t) of Eq. (5.1.65) attains the value A* = 0 on a finite
time interval, which guarantees the convergence of the integral (5.1.46).
Thus, the inequality (5.1.66) is the solvability condition for problem (5.1.3)-
(5.1.6) a s T + m i n the case ofExample2.11
In conclusion we note that, in principle, the approximate method con-
sidered here can also be used for calculating the quasioptimal control al-
gorithms in the third, fourth and higher approximations. However, in this
case, the number of required calculations increases sharply.
-
Recall that 5 = Z ( r ) and 5 = G(t ) are the respective population sizes1' of
prey and predators a t time Fand the positive constants ax, a2, bl, and b2
have the following meaning: a1 is the rate of growth of the number of prey,
a2 is the rate of prey consumption by predators, bl is the rate a t which the
prey biomass is processed into the new biomass of predators, and b2 is the
rate of predator natural death.
In this section we consider a special case of system (5.2.1) in which
the predators die a t a high natural rate and are "poor" predators, since
they consume their prey a t a low rate. In the nomenclature of [177], this
problem corresponds to the case of predators poorly adapted to the habitat.
For system (5.2.1), this means that we can take the ratio azbllb2 = E << 1
as a small parameter in the subsequent calculations.
''condition (5.1.66) becomes sharper with an increase in the number of terms re-
tained in the asymptotic series on the right-hand side of Eq. (5.1.23) for the nonvibra-
tional amplitude.
''If the distribution of species over the habitat is uniform, then Z and y" denote the
densities of the corresponding populations, that is, the numbers of species per unit area
(volume) of the habitat.
268 Chapter V
I,, = Lrn ( ~ ( r-
)),b2 + c2 (5(T) - %)
a2
2
] (5.2.4)
where cl and c2 are given positive constants. We assume that the integral
(5.2.4) is convergent.
In (5.2.2) we change the variables as follows:
In the new variables (x, y), the goal of control is to transfer the system to
the origin (x = y = O), and the range of admissible values is bounded by the
Control o f Oscillatory Systems 269
Thus the desired optimal control u, can be found from the condition that
the functional (5.2.7) attains the minimum value on the trajectories of
system (5.2.8) with constraint (5.2.9) imposed on the control actions. In
this case, we seek the control in the form u, = u, (x(t),y(t)).
5.2.2. Approximate solution o f problem (5.2.7)-(5.2.9). In the
case of "poorly adapted" predators, the number E in (5.2.8) is small, and
system (5.2.8) is a special case of the controlled quasiharmonic oscillator
(5.1.1). Therefore, the method of $5.1 can immediately be used for solving
problem (5.2.7)-(5.2.9). The single distinction is that admissible controls
are subject to nonsymmetric constraints (5.2.9); thus the antisymmetry
property (5.1.11) of the optimal control is violated. As a result, it is im-
possible to write the desired controls in the form (5.1.18). However, as is
shown later, no special difficulties in calculating the quasioptimal controls
in problem (5.2.7)-(5.2.9) arise due to this fact.
On the whole, the scheme for solving problem (5.2.7)-(5.2.9) repeats the
approximate synthesis procedure described in $5.1. Therefore, in what fol-
lows, the main attention is paid to distinctions in expressions and formulas
caused by the special nature of problem (5.2.7)-(5.2.9).
Just as in $5.1, by changing variables according to formulas (5.1. 15)13we
transform system (5.2.8) to the following equations for the slowly changing
amplitude and phase (5.1.16):
Now, instead of (5.1.17), we have the following expressions for the functions
G(A, 9 ) and H(A, 9 ) only:
~ l ( ~ * , m * ) = ~ ~ * l o ( ~ * l ~ ) - u , ( ~ (5.2.12)
* , ~ ) + ~ ~ d ~
For the second term of the asymptotic series on the right-hand side of
Eq.(5.1.23), instead of (5.1.31), we have
+ [
&(A*
am*
a*) -
am* I
~ u , ( A *7 a*) wl(A*, @*)
+
the sum GT(A*) EG; (A*) attains its minimum (in the second approxima-
tion). It follows from (5.2.9), (5.2.10), and (5.2.11) that minimization of
G'; (A*) means maximization of
- 1
U, = fic(A*,@*)= -
2.n o
1 2"
u(A*,@)cos@d@ i max .
oSu<r
(5.2.15)
This fact immediately implies the following implicit formula for quasiopti-
ma1 control in the first approximation:
ul(A*, a*)= Y
-(sign cos @*
2
+ 1). (5.2.16)
Taking into account formulas (5.1.15) and (5.1.21) for the change of vari-
ables, we can write x = A* cos @* with accuracy up to terms of the order
of E. This fact and (5.2.16) readily imply the following expression for the
synthesis control in the first approximation in terms of the variables (x, y):
I
ul(x, y) = -(sign%
2
+ 1). (5.2.17)
Thus, in the course of the control process, the controlling action assumes
only the boundary values from the admissible range (5.2.9) and is switched
from the state u l = 0 to the state u l = y (or conversely) each time when
the representative point (x, y) intersects the y-axis (the switching line in the
first approximation). We also point out that, according to (5.2.5), in the
variables (5,fj)corresponding to the original statement of problem (5.2.2)-
(5.2.4), this control algorithm leads to the switching line that is the vertical
line passing through the point 5 = 2, = b 2 / b l on the abscissa axis; this
point determines the number of prey if system (5.2.1) is in equilibrium.
To find the optimal control in the second approximation, we need to
minimize the expression G;(A*) +EG;(A*) = F ( A * ,u). The functions
G; (A*) = G; (A*, u) and G; (A*) = G; (A*,u) are calculated by formulas
(5.2.11) and (5.2.14) with regard to (5.2.10), (5.2.12), and (5.2.13). In ac-
tual calculations by these formulas, it is convenient to use the fact that the
difference between the optimal control uz(A*, a*) in the second approxi-
mation and (5.2.16) must be small. More precisely, we can assume that on
the one-period interval of the fast phase @* variation, the optimal control
in the second approximation has the form of the function shown in Fig. 38
(the solid lines), where A1 and A2 are the phase shifts of the switch times
-
with respect to the switch times of the control in the first approximation
(the dashed lines); these variables are small (A l, A2 E).
This fact allows us, without loss of generality, to seek the control algo-
rithm ug(A*, @*) in the second approximation immediately in the form
Y
UZ(A*,a*)= -
2 {sign[cos(@*- ( P ~ - +]1) .
) sin ( P ~
Chapter V
G;(A*) = -u,(A*,Q*) = -- I
27rbzw
1 277
n2(A*,Q*) cos Q* dQ*
- -- Y
2xb2w [c0~($71
- $72) +
cos((p1 $72)) + (5.2.20)
Since $71, $72 -- E , it follows from (5.2.20) that the maximal terms (de-
pending on $71 and $72) in the expansion of (5.2.20) in powers of E are of
the order of E ~ Therefore,
. to calculate the second term E G ; ~
in the function
F(A*, $71, $72) = G; +EG; to be minimized, we can retain only terms of the
order of E~ and neglect the terms of the order of e3 and of higher orders.
Control of Oscillatory Systems
With regard to this remark we calculate the mean values on the right-hand
side of (5.2.14) and thus see1* that we need to minimize the function
(5.2.21)
to obtain the optimal values of cpl and cpz in the second approximation.
From the condition dF/dcpl = dF/dcp2 = 0 necessary for an extremum,
we obtain the desired optimal values
In this case, in 57.2 we derive the optimal control G ( T ,5,y) in the synthesis
form by solving the Bellman equation corresponding to problem (5.2.23)-
(5.2.25) numerically.
Note that problem (5.2.23)-(5.2.25) turns into problem (5.2.2)-(5.2.4) if
the following assumptions are satisfied:
We also note that, in view of the changes of variables (5.2.5) and (5.2.26))
the quasioptimal control algorithm in the first approximation (5.2.17) ac-
quires the form
ul(Z, Y) = 7
-[sign@ - 1) + 11. (5.2.27)
2
To estimate the effectiveness of algorithm (5.2.27), we performed a nu-
merical simulation of the normalized system (5.2.23). Namely, we con-
structed a numerical solution of (5.2.23) on the fixed time interval 0 <
T 5 T = 15 for three different algorithms of control E (1) the optimal
control Ti = Z,(T,,: y); (2) the optimal stationary control Ti = ?i:(z,y)
corresponding to the case where the terminal time T t oo in problem
(5.2.23)-(5.2.25); (3) the quasioptimal control in the first approximation
(5.2.27).
Control of Oscillatory Systems
Comparing the curves in Figs. 40 and 41, we see that these three al-
gorithms lead to close transient processes in the control system. Hence,
276 Chapter V
the second and the third algorithms provide a sufficiently "good" con-
trol. This fact is also confirmed by calculating the quality functional
(5.2.25) for these three algorithms, namely, we obtain I[u,(r, Z,y)] = 4.812,
I[ui(%,?j)] = 4.827, and I[ul(:,y)] = 4.901. Thus, any of these algo-
rithms can be used with approximately the same result. Obviously, the
simplest practical realization is provided by the first-approximation algo-
rithm (5.2.27) obtained here; by the way, this algorithm corresponds to
reasonable intuitive heuristic considerations of how to control the system.
Indeed, according to (5.2.27), it is necessary to start catching (shooting,
etc.) every time when the prey population size becomes larger than the
equilibrium size (for the normalized dimensionless system (5.2.23), this
equilibrium size is equal to 1). Conversely, as soon as the prey popula-
tion size becomes smaller than the equilibrium size, any external action on
the system must be stopped.
It should be noted that the situation when the first-order approximation
allows one to obtain a control algorithm close to the optimal control is
rather typical not only of this special case but also of other cases where
the small parameter methods are used for solving approximate synthesis
problems for control systems. This fact is often (and not without success)
used in practice for solving special problems [2, 331. However, it should
be noted that this fact is not universal. There are several cases where the
first-approximation control leads to considerable increase in the value of
the functional to be minimized with respect to its optimal value. At the
same time, the higher-order approximations allow one to obtain control
algorithms close to the optimal control. Some examples of such situations
(however, related to control problems of different nature) are examined in
$6.1 and in [97, 981.
where [(t) denotes the standard scalar white noise (1.1.31) and B > 0 is a
given number.
The admissible controls u = u(t), just as in (5.1.5), are subject to the
constraints
Iu(t) I I Urn, (5.3.2)
and the goal of control is to minimize the mean value of the functional
I[U] = E [l T
c(x(t), i ( t ) ) dt] i min
Iu(t)llum
O<t<T
. (5.3.3)
The nonlinear functions ~ ( xk), and c(x, k) in (5.3.1) and (5.3.3), just as
in $5.1, are assumed to be centrally symmetric, ~ ( x2), = x(-x, -k) and
c(x, k) = c(-x, -2). Next, it is assumed that the penalty function c(x, k)
is nonnegative and possesses a single minimum a t the point (x = 0, k = 0)
and c(0,O) = 0.
Let us introduce the coordinates x l = x, x2 = k and rewrite (5.3.1) as
Then, using the standard procedure from $1.4, for the function of minimum
future losses
F(t,x1,x2) = min
Iu(.)l<um
t<r<T
[ T ( 1 ( ) 2 ( ) ) d
I
1 xi(t) = xl,x2(t) = x2 ,
(5.3.5)
we obtain the Bellman differential equation
--
dF
dt
-
-22-
dF
ax1
- (xI+Ex(x~,x~)xz)-+
E B a2F
dF
min EU-
8x2 I u l l u , [ El
+ -2 dx, +-
7c ( x ~X,Z ) , 0 5 t < T, F ( T , XI,x2) = 0,
(5.3.6)
It follows from (5.3.6) that the desired optimal control u,(t, XI, 2 2 ) can
be written in the form
aF
u*(t, xi, 2 2 ) = -urn sign -(t, 21, x2),
ax2
where the loss function F ( t , 21, x2) satisfies the following semilinear equa-
tion of parabolic type:
Equation (5.3.8) and the fact that the functions ~ ( x l2 2, ) and c(xl,x2)
are symmetric imply that F = F ( t , X I , x2), satisfying (5.3.8), is symmetric
with respect to the phase coordinates, that is, F ( t , X I , 2 2 ) = F ( t , -21, -22).
This and formula (5.3.7) show that the optimal control (5.3.7) possesses an
important property, which will be used in what follows; namely, the optimal
control (5.3.7) is antisymmetric (see (5.1.11)):
We also stress that in this section the main attention is paid to solving
the stationary version of problem (5.3.1)-(5.3.3), that is, to solving the
control problem in which the terminal time T + m. In the nomenclature
of [I], problem (5.3.1)-(5.3.3) as T + m is called the problem of optimal
stabilization of the oscillator (5.3.1).
5.3.2. Passage to the polar coordinates. The Bellman differen-
tial and functional equations. By using the change of variables (5.1.15),
we transform Eqs. (5.3.4) to equations for the slowly changing amplitude A
and phase cp:
where
Note that the right-hand sides of the differential equations (5.3.10) for
the amplitude and phase contain a random function ((t) that is a white
noise. Therefore, Eqs. (5.3.10) are stochastic equations. The expressions
(5.3.11) for 6'and & are derived from (5.3.4) and (5.1.15) by changing the
variables according to the usual rules valid for smooth functions [(t). Thus
it follows from 31.2 that the stochastic equations (5.3.4) and (5.3.10) are
equivalent if they are symmetrized.15
We also note that by passing to the polar coordinates (which become
the arguments of the loss function (5.3.5)), we can equally use either the
set (A, p , t ) of current values (at time t ) of the amplitude A, the "slow"
phase cp, and time t or the set (A, @, t ) in which the "slow" phase is replaced
by the "fast" phase @. For the calculations performed later, the set (A, @, t )
is more convenient.
For the loss function F ( t , A, @) defined by analogy with (5.3.5),
F (t, A, a) = min
Iu(.)I<um
E [ lT ~1( ~ ( 7~)(~r
I
d) r) 1 ~ ( t =) A, ~ ( t=) @ ,
t<s<T
~ ( t , A t , @ t ) = min E
1u(.)11um
t<r<t+A
[l t+A
c i ( ~ ra,,) d r + ~ (+ tA, A,+*, at+,)].
(5.3.13)
This equation expresses the "optimality principle." It is important to stress
that relation (5.3.13) holds for any time interval A (not necessarily small).
This fact is important in what follows.
But if A -+ 0 in (5.3.13), than, using (5.3.10) and (5.3.11), we can readily
obtain (see 3 1.4) the following Bellman differential equation for the function
(5.3.12):
aF d F dF
--
at
= - ELF
a@ + + min
IU(T)ISU~
EG(A,@, u)-
aA + EH(A,@, u)-
15More precisely, for Eqs. (5.3.10) it is important to take into account the sym-
metrization property, since these equations contain a white noise E ( t ) multiplicatively
with expressions that depend on the state variables A and 9. As for Eqs.(5.3.4), they have
the same solutions independent of whether they are understood in the Ito, Stratonovich,
or any other sense.
Chapter V
The last two terms in (5.3.15) appear due to the fact that the stochastic
equations (5.3.10) are symmetrized.
If we change the time scale and pass to the slowly varying time ?= ~ t ,
then Eq. (5.3.14) for the loss function F(K A, @) acquires the form
It follows from (5.3.16) that the derivatives of the loss functions with respect
dF/dA -
to the amplitude and the fast phase are of different orders of magnitude (if
1, then d F / a @ E). This fact, important for the subsequent
considerations, follows from the quasiharmonic character of the motion of
system (5.3.4).
Equation (5.3.16) can be simplified if, just as in $1.4, $2.2, $3.1, etc.,
we consider the stationary stabilization of random oscillations in system
(5.3.4). In this case, the upper limit of integration T -+oo in (5.3.5) and
(5.3.12). The right-hand side of (5.3.12) also tends to infinity because of
random perturbations ((t). Therefore, to suppress the divergence in the
stationary case, we need to consider the following stationary loss function
f (A, @) (see (1.4.29), (2.2.9), and (4.1.7)):
where the constant y characterizes the mean losses of control per unit time
in the stationary operating conditions. For the function (5.3.17), we have
the stationary version of Eq. (5.3.16):
- min
IuILum
Just as in $5.1, taking into account the relay (5.3.7) and the antisymme-
try (5.3.9) properties, without loss of generality, we can seek the optimal
Control of Oscillatory Systems 281
where G(A, a , p,) and H(A, a, p,) denote the functions obtained from
(5.1.17) after the substitution of the control u(A, a ) in the form (5.3.19).
Thus, solving the synthesis problem is reduced to finding the function
$(A) that minimizes the expression in the square brackets in (5.3.20) and
determines (in polar coordinates) the equation for the switching line of the
controlling actions u* = fU, under the optimal control u, (A, a). To calcu-
late the function p+r(A),just as to solve Eq. (5.3.20), we use the method of
successive approximations (see Section 5.3.3), which allows us to obtain the
desired function @ + ( Ain
) the form of a series in powers of the parameter E:
Now let us write the functional equation (5.3.13) for the time interval
A = 27r. With regard t o (5.3.19), we can write
F(t,At,at)= min
v,(Ar)
E [lt+2n
CI(A~, a,) d r + ~ +
( 271,
t ~ t + 2 n@t+2n
,
)I -
t<r<t+27T
(5.3.22)
Since the loss function (5.3.12) is periodic in the variable a, we have
F ( t , A, a ) = F ( t , A, - 27r). This and (5.3.10) imply that relation (5.3.22)
can be rewritten as
F ( t , At, a t ) = min E
vr
[l t+2n
cl(A,, a,) d ~
+F(t + 2 ~At, 1
+ EAA,at + E A ~ ,) (5.3.23)
Chapter V
where
EAA = E J: t + 2 ~G ( A r , @ r , u r , r ) d ~
= ,5 lt+2= G(A,, a,, cpr(Ar)) d r - &J
t
t+27r
L ( T ) dr,
(5.3.24)
t+a~
~ A c p= E H(A,, a,, u ~7), d r
-
Using, just as in (5.3.16), the "slow" time t = ~t and expanding
+ +
F(;+ ~ T EAt, EAA,@t ~ A c p )in the Taylor series, we rewrite (5.3.23) in
the form
~F - d2F -
,,,
+--( E A A )d2 (t + 2re, At, at) + (EAA)( i A p ) (t + ~ T EAt,
, at)
d 2 -~
+--(&Av)'
2
(t + 2TE, At, @t)+ . . . = 0.
aa2 I
In the stationary case considered in what follows, Eq. (5.3.25) acquires
the form
min E [E
vF
ltiZn
a,) cl(A,, +aaAf
d r - 2 r & ~EAA-(At, at)
8f
+ ~Acp-(At,
da
at) + -
(EAA)' d2f
2
-
dA2 (At, at)
in (5.3.26) and the mean values of the amplitude and phase increments
over the time 27r. By using system (5.3.10), we can calculate expressions
(5.3.27) and (5.3.28) with arbitrary accuracy in the form of series in powers
of the small parameter E.
Let us write
Z(A,O, urnsign[sin(@- (or(A))],t) = G ( A ,a,t ) ,
(5.3.29)
&(A, O, urnsign[sin(@- or(^))], t ) = H ( A , O, t).
Then it follows from (5.3.10) that the increments of the amplitude A and
the slow phase (o over an arbitrary time interval r are
where
substituting &A,, Sly,, ... given by formulas (5.3.32), (5.3.35), .... and
averaging with respect to ( ( t ) ,we obtain the desired expansion for (5.3.27).
In practice, to use this method for calculating the mean values of (5.3.27)
and (5.3.28), we need to remember that formulas (5.3.30)-(5.3.38) possess a
Control o f Oscillatory Systems 285
specific distinguishing feature relative to the fact that the random functions
in expressions (5.3.29) have the coefficients &-'I2 :
= x,(A, 9 ) - A E A
(formulas (5.3.39) follows from (5.1.17), (5.3.11), and (5.3.29)). Thus, terms
of the order of E-' appear in Eb2A2,, ES3A2,, . . ., ES2p2,, E 6 3 ( ~ 2.~. ..
,
Therefore, in the calculations of the mean values of (5.3.27) and (5.3.28),
the number of terms retained in the expansions (5.3.31) must always be
larger by 1 than needed for the accuracy desired (if, for example, we need
to calculate the mean values of (5.3.27) and (5.3.28) with accuracy up to
+
terms of order of E', then we need to retain (s 1) terms in the expansions
(5.3.31)).
For example, let us calculate the first term in the expansion of the mean
value E(EAA). From (5.3.32) and (5.3.35), we have
Averaging (5.3.40) with respect to t ( t ) and taking into account the prop-
erties of the white noise, we obtain
where the bar, as usual, indicates averaging with respect to the fast phase
over the period (e.g., z A ( A t ) = & gnx,(At, a)d a ) , and us(At, p,) = G,
is given by (5.1.34)). Next, it follows from (5.3.33), (5.3.40), and (5.3.41)
that
286 Chapter V
Averaging (5.3.43) with respect to [(t) and taking into account (1.1.31) and
(1.1.32), we obtain
Ed2A2, =
&At
J2, [ArE[(rl)t(t)cos(Qt + r') cos(Qt + r) d ~ d' r + D
1
= 8 J2"
&At 0
[AT6(r1 - r) cos(Qt + r') cos(mt + r) d r '
I dr +D
- LJ2' cos2(Qt + r)d r + D = I~B
-
+ D, (5.3.44)
2 4 0 2~At
where
cp,) +-
4At I
+e2...
%(At) +-
At 1
2~ sin cpp(At) + c 2 . . .
(5.3.46)
Control of Oscillatory Systems 287
For the other mean values of (5.3.27) and (5.3.28), in the first approximation
in E, we have
All the other mean values E[(EAA)(EAY)],E ( E A ~ ~ . .). ~in, (5.3.28) are
higher-order infinitesimals in E.
Now let us calculate successive approximations of the Bellman equation
(5.3.26). Simultaneously, with the help of Eq. (5.3.20), we shall exclude the
derivatives of the loss function with respect to the phase from (5.3.26).
The first approximation. We represent the loss function f (A, a) as
the series
+ +
f (A, @) = fl(A, a) Ef2 (A, @) E2 - .., (5.3.49)
substitute into Eq. (5.3.26), and retain only terms of the order of E (omitting
the terms of the order of c2 and of higher orders). Since, in view of (5.3.20),
-
d f I d @ E, using (5.3.45)-(5.3.48), we obtain the following equation of the
first approximation from (5.3.26):
for the minimizing function cp*,(A)that determines the switching line in the
first approximation. In this case, in view of (5.3.51), Eq. (5.3.50) acquires
the form
where pl(A) is the stationary probability density for the distribution of the
amplitude A. The Fokker-Planck equation that determines this stationary
density is conjugate to the Bellman equation. Therefore, in the case of
(5.3.52), the equation for pl(A) has the form
For the zero probability flow (see $4, item 4 in [173]), Eq. (5.3.54) has the
solution
(y - E~(A'))exp
4
[z J A
A'
(A") - 2p + )4A1' 1
~A'Id ~ ' .
(5.3.58)
Control of Oscillatory Systems 289
Now we can verify whether the derivative dfl/dA is positive (this was
our assumption, when we derived (5.3.51)). It follows from (5.3.58) that
this assumption is satisfied for not too small values of the amplitude A.
Therefore, if we solve the synthesis problem by this method, we need not
consider a small neighborhood of the origin on the phase plane (x, k ) . Just
as in the deterministic case in 55.1, it is clear from the "physical" viewpoint
that the controlling action u and the perturbations [ ( t )lead to appearance
of a neighborhood where the quasiharmonic character of the phase trajec-
tories is violated.
The other terms in (5.3.26) are necessarily of orders larger than that of E ~ .
The derivatives dfl/a@, d2fl/dAa@, . . . of the loss function with respect
to the phase can be eliminated from (5.3.59) by using (5.3.20). Hence we
have
To find the function cpr(A) that minimizes the expression in the braces
in (5.3.59), we shall consider only the terms in (5.3.59) that depend on the
control (or, which is the same, on cp,(A)). In this case, we shall use the fact
that the minimizing function p+r(A)is small in the second approximation:
p+r(A)= ~ c p (A).
a Therefore, in the part of Eq. (5.3.59) that depends on
cp,, we can retain only the terms that depend on cpa by expressions
-
and E ~ .
- E'
replaced by
2 r c ( A , p,) -1
+ B7T
2A
~'27rB-
E(EAA)' = EBT - -uc (A, p,) sin2 a, (5.3.62)
A
where Z(A, cP) and &(A, cP) denote the purely vibrational components of
the functions G(A, a, p,) = E(A, p,) +
Z(A, a ) and cl(A, a) = El(A) +
El (A, a ) .
By using (5.3.60)-(5.3.64), (5.1.34), (5.3.42), and (5.3.59), we see that
the desired function p:(A) = ~cpz(A),which determines the switching line
in the second approximation, can be found by minimizing the expression
N ( p p ) = - ~ 4 u , cos p, afl
-
aA
- EZ { [Tic (A, p,)G (A, a) - uc (A, @ ) h(A, a)]
27T
2~
+ ,%(A, p,)
B
[r - cl(A, @) - - sin2 @-
a2fl
2 dA2
B afl af 1
- - cos2 @- -
2A 8A p p ) sin2 maA
B?T
+ aGc (A, p,) (5.3.65)
As a result, we obtain
In the following two examples, we calculate the function cp:(A) for which
(5.3.66) attains its minimum.
EXAMPLE 1. Suppose that the plant to be controlled is a linear system.
In this case, ~ ( xi)
, 1 in (5.3.1), and it follows from (5.1.17) that
The condition aN/dcp, = 0 leads to the following equation for the desired
function cp: (A):
Since
one can readily see that formula (5.3.68) coincides as B + 0 with the
corresponding expression (5.1.60) for the switching line of the deterministic
problem.
EXAMPLE 2. Let us consider a nonlinear plant with x(x, &) = x2 - 1 in
(5.3.1) (in this case, the plant is a self-exciting van der Pol circuit). For
such a system, it follows from (5.1.17) that
- A A3 A cos 2 a + -
a) = -- A3 cos 4G.
x(A) = - - -, Z,(A, (5.3.71)
2 8 2 8
Substituting (5.3.71) into (5.3.66) and using (5.1.44) and (5.1.45), from
(5.3.66) and the condition aN/acp, = 0 we derive the expression for the
switching line in the second approximation, which coincides in form with
the expression obtained in the previous example. However, now the loss
function and the stationary error in (5.3.68) must be calculated in a different
way.
So, in this case, the stationary probability density (5.3.55) for the dis-
tribution of the amplitude has the form
Control of Oscillatory Systems 293
afl
- 4
-- e x [I(g
p B 8 A2 + 8 , ~ A ) l
-
dA BA
A ( A ) - [
y) exp - -
B
1 A14
(-
8
- Af2 + ~,uA')]dAf.
Just as in Example 1, formula (5.3.68) coincides as B +0 with the
corresponding expression
lines for the linear quasiharmonic system from Example 1 are depicted.
Curve 1 corresponds to the deterministic problem ( B = 0). Curves 2,
3, and 4 show the switching lines in the stochastic case and correspond
to the white noise intensities B = 1, B = 5, and B = 20, respectively.
These switching lines correspond to the quadratic penalty function c(x, &) =
+
x2 i 2in the optimality criterion (5.3.3) and the parameters u, = 1
and E = 0.25 in problem (5.3.1)-(5.3.3). The dashed circle in Fig. 42
approximately indicates the domain where the quasiharmonic character of
the phase trajectories of the system is violated. In the interior of this
domain, the synthesis method studied here may lead to large errors, and
we need to employ some other methods for calculating the switching line
near the origin.
5.3.4. Approximate synthesis of control that maximizes the
mean time of the first passage to the boundary. As another ex-
ample of the method of successive approximations treated above, let us
consider the synthesis problem for a system maximizing the mean time
during which the representative point (z(t), &(t))first comes to the bound-
ary of some domain on the phase plane (x, &). For definiteness, we assume
that this domain is the disk of radius Ro centered a t the origin. As be-
fore, we consider a system whose behavior is described by Eq. (5.3.1) with
constraints (5.3.2) imposed on control.
Passing to the polar coordinates and considering the new state variables
A and as functions of the "slow" time t =~ t we , transform Eq. (5.3.1)
to the system of equations of the form
2
where the functions and fi are given by (5.3.11) and (5.1.17). By using
Eq. (5.3.74), we can write the Bellman equation for the problem in question.
It follows from $1.4 that the maximum mean time during which the
representative point (A(T), @(TI) achieves the boundary (the loss function
for the synthesis problem considered) can be written as (see (1.4.38))
(5.3.76)
By letting the time interval A + 0, in the usual way ($1.4), we obtain the
following differential Bellman equation for the function F(A,a):
a~
a@
-= r { - ~ - L F - lulLurn
a~
max [G(A,@,u)-+H(A,@,u)-I}
dA
aF
-
a@ '
A < Ro, F(Ro,@)= 0.
(5.3.77)
Here L is the operator (5.3.15), and the functions G and H are determined
by formulas (5.1.17).
On the other hand, if we set A = 2as in (5.3.76), then we arrive a t the
[r2='
finite-difference Bellman equation (an analog of (5.3.26))
Here the increments of the amplitude eAA and the "slow" phase r A y are
the same as in (5.3.24), and satisfy (5.3.45)-(5.3.48) and (5.3.61)-(5.3.64).
Next, to solve the synthesis problem approximately, we need, just as in
Section 5.3.3, to solve Eqs. (5.3.77) and (5.3.78) simultaneously. Here we
write out the first two approximations of the function cpr(A) determining
the switching line in the optimal regulator, which, just as in Section 5.3.3,
is of relay type and has the form (5.3.19).
The first approximation. Substituting the expression a F / a ~ from
(5.3.77) into Eq. (5.3.78), omitting the terms of the order of r 2 and of
higher orders, and using (5.3.45)-(5.3.48), we obtain the following Bellman
equation in the first approximation:
Since, by definition, W(T, Ai; @;) = 1 a t all points in the interior of the
domain of admissible states (that is, for all AT < Ro), we can transform
296 Chapter V
The function cpr(A) determining the switching line in the first approxi-
mation is found from the condition that the expression in the square brack-
ets in (5.3.80) attains its maximum. For 8Fl/dA < 016 we obtain
For simplicity, we shall consider the case where the plant is a linear quasi-
harmonic system. In this case, we have ~ ( x&), = 1 in (5.3.1) and z(A) =
-A12 in (5.3.82). Solving (5.3.81) with the second condition in (5.3.83),
we readily obtain
The expression (5.3.84) is used for determining the switching line in the
second approximation.
161t follows from (5.3.84) that the condition EJFl/dA < 0 is satisfied for all A E
(0, Rol.
Control of Oscillatory Systems 297
Eq. (5.3.1). The feed-back circuit (the regulator) of this system contains
a differentiator, a multiplier, an adder, an inverter, a relay unit, and two
nonlinear transducers N C 1 and NC2. Unit N C 1 realizes the functional
dependence A = d m , that is, produces the current value of the am-
~ l i t u d eA. Unit NC2 models the functional dependence V;(A), which is
given either by (5.3.68) or by (5.3.86), depending on the problem consid-
ered. Thus, the feed-back circuit in the diagram in Fig. 44 realizes the
control law
+
u(x, J) = -€urn sign (1 xp:(&%Z)),
The functions ~ ( xk), and c(x, k) in (5.4.1), (5.4.3) are the same as in
(5.3.1), (5.3.3). Therefore, problem (5.4.1)-(5.4.3) is completely identical
to problem (5.3.1)-(5.3.3).
The single but important distinction between these problems is the fact
that now it is impossible to measure the current state of the controlled
variable x(t). We assume that the result y(t) of our measurement is an
additive mixture of the true value of x(t) and a random error of small
intensity:
+
y(t) = x(t) &rl(t), (5.4.4)
where E is a small parameter the same as in (5.4.1) and the random function
~ ( t is
) a white noise (independent of [(t)) with characteristics
~ ( ty;), = min
Iu(.)I5um
E [ lT
c(x(r), i ( r ) )d r I y;] . (5.4.7)
t<r<T
300 Chapter V
and assuming that the control u is a given function of time, we can readily
show that z(t) is the observable component of the three-dimensional Markov
process (xl(t), x2(t),~ ( t ) ) .By using (5.4.4), (5.4.5), and (5.4.8), as well
as the results of $1.5, we readily obtain an equation for the a posteriori
probability density wps(t, x) = wps (t, x1,x2) = w(xl, 2 2 1 2:) = w(xl, 2 2 I
y:) for the components of the unobservable diffusion process determined by
system (5.4.8). The corresponding equation is a special case of Eq. (1.5.39)
and has the form
Equation (5.4.9) for the a posteriori density also remains valid if the
control u in (5.4.8) is a functional of the observed process 24, (or y); or even
of the a posteriori density wps(t, x) itself. This fact is justified in [I751 (see
also $ 1.5).
It follows from (5.4.4), (5.4.5), (5.4.9), (5.4.10), and the results of $1.5
that the a posteriori probability density wps(t, x), treated as a function of
time, is a Markov stochastic process and thus can be used as a sufficient
coordinate in the synthesis problem. However, usually, instead of wps (t, x),
it is more convenient to use a parameter system equivalent to wp,(t, x).
If we write xy(t) = x;,, xi(t) = xgt for the coordinates of the maximum
Control of Oscillatory Systems 301
(in (5.4.11) the sum is over $n_i$, $i = 1, \dots, s$, assuming the values 1 and 2). If we substitute (5.4.11) into (5.4.9) and set the coefficients of equal powers of $(x_{n_1} - x_{n_1}^0) \cdots (x_{n_s} - x_{n_s}^0)$ on the left- and right-hand sides equal to each other, then we obtain a system of differential equations for the parameters $x_i^0(t)$ and $a_{n_1 \dots n_s}(t)$ (see (1.5.43)). Note that since Eq. (5.4.9) is symmetrized, the stochastic equations obtained for $x_i^0(t)$ and $a_{n_1 \dots n_s}(t)$ are also symmetrized.
It is convenient to replace the probability density $w_{ps}(t, x)$ by a set of parameters, since we can often truncate the infinite system of the parameters $x_i^0$, $a_{n_1 \dots n_s}$ [167, 170, 181], retaining only a comparatively small number of terms in the sum that forms the exponent in (5.4.11). The error thus admitted, as compared with the exact expression for $w_{ps}$, is the smaller, the higher the a posteriori accuracy of estimation of the unobservable components $x_1$ and $x_2$ (or, which is the same, the smaller the norm of the matrix $\|D_{\alpha\beta}\|$ of the a posteriori variances); here the norm of the matrix $\|D_{\alpha\beta}\|$ is of the order of $\varepsilon$, since, in view of (5.4.4), the observation error is a small variable of the order of $\sqrt{\varepsilon}$.
It is often assumed [167, 170] that $a_{n_1 n_2 n_3} = a_{n_1 n_2 n_3 n_4} = \dots = 0$ in (5.4.11) (this is the Gaussian approximation). In the Gaussian approximation, from (5.4.9) and (5.4.10) we have the following system of equations for the parameters of the a posteriori density $w_{ps}(t, x_1, x_2)$:$^{18}$
$^{17}$The variables $x_1^0(t)$ and $x_2^0(t)$ are estimates of the current values of the coordinate $x(t)$ and the velocity $\dot x(t)$ of the control system (5.4.1). If the estimation quality is determined only by the value of the a posteriori probability, then $x_1^0(t)$ and $x_2^0(t)$ are the optimal estimates.
$^{18}$For the linear oscillator (when $\chi(x, \dot x) \equiv 1$ in (5.4.1)), the a posteriori density (5.4.11) is exactly Gaussian, and Eqs. (5.4.12) are exact.
Let us make some remarks concerning Eqs. (5.4.12). First, since (see (5.4.1), (5.4.4), and (5.4.5)) the noise intensity in the plant and in the feedback circuit is assumed to be small (of the order of $\varepsilon$), the covariances of the a posteriori distribution are also small variables of the order of $\varepsilon$, that is, we can write $D_{11} = \varepsilon \widetilde D_{11}$, $D_{12} = \varepsilon \widetilde D_{12}$, and $D_{22} = \varepsilon \widetilde D_{22}$. This implies that the terms in (5.4.12) are of different orders of magnitude, and thus Eqs. (5.4.12) can be simplified further. Retaining the most important terms and omitting the terms of the order of $\varepsilon^2$ and of higher orders, we can rewrite (5.4.12) in the form
We also note that, in this approximation, the last three equations in (5.4.13) can be solved independently of the first two equations. In particular, we see that, for a long observation, the stationary operating conditions occur and the covariances of the a posteriori probability distribution attain some steady-state values $D_{11}^*$, $D_{12}^*$, and $D_{22}^*$ that do not change during the further observation. These limit covariances depend neither on the way of control nor on the type of the plant nonlinearity (the function $\chi(x, \dot x)$ in (5.4.1)) and are equal to
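Although the explicit values (5.4.14) are not rederived here, the way such limit covariances arise can be illustrated numerically. The sketch below integrates a generic Kalman-Bucy covariance (Riccati) equation to its steady state; the function name, the oscillator matrices, and the noise intensities are assumptions made purely for illustration.

```python
import numpy as np

def steady_state_covariance(A, Q, c, N, dt=1e-3, T=100.0):
    """Integrate the matrix Riccati equation
       dD/dt = A D + D A' + Q - D c c' D / N
    to its steady state D*; a generic Kalman-Bucy covariance equation,
    used only to illustrate how limit covariances like (5.4.14) arise."""
    D = np.zeros_like(A)
    for _ in range(int(T / dt)):
        D = D + dt * (A @ D + D @ A.T + Q - D @ np.outer(c, c) @ D / N)
    return D

# linear oscillator x1' = x2, x2' = -x1 + weak noise; observation y = x1 + noise
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
Q = np.diag([0.0, 1e-2])      # small plant noise intensity (assumed)
c = np.array([1.0, 0.0])      # the observation picks out x1
print(steady_state_covariance(A, Q, c, N=1e-2))
```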
In what follows, we obtain the control algorithm for the optimal stabilizer
(controller) C under these stationary observation conditions.
5.4.3. The Bellman equation and the solution of the synthesis problem. In the Gaussian approximation, the loss function (5.4.7) is completely determined by the current values of the a posteriori means $x_1^0(t) = x_{1t}^0$ and $x_2^0(t) = x_{2t}^0$ and by the values of the a posteriori covariances $D_{11}$, $D_{12}$, and $D_{22}$. Under the stationary observation conditions, the a posteriori covariances (5.4.14) are constant, and therefore we can take $x_1^0(t)$, $x_2^0(t)$, and time $t$ as the arguments of the loss function (5.4.7). Thus, in this case, instead of (5.4.7), we have
$$F(t, x_{1t}^0, x_{2t}^0) = \min_{|u(\tau)| \le u_m,\; t \le \tau \le T} \mathsf{E}\Bigl[\int_t^T c\bigl(x(\tau), \dot x(\tau)\bigr)\, d\tau \Bigm| x_{1t}^0, x_{2t}^0\Bigr]. \eqno(5.4.15)$$
In (5.4.15) the symbol $\mathsf{E}$ of the mathematical expectation means the a posteriori averaging, that is, the averaging with the a posteriori probability density. In other words, if we write the integral in the square brackets in (5.4.15) as a function of the initial values of the unobservable variables $x_{1t}$ and $x_{2t}$, then, to obtain $F(t, x_{1t}^0, x_{2t}^0)$, we need to integrate this function with respect to $x_{1t}$ and $x_{2t}$ with the Gaussian probability density
For the function (5.4.15), the basic functional equation (the optimality principle) is written in the standard way (see (5.4.17)). To pass from it to a differential equation, we need to calculate the mean values of the increments (5.4.18) of the sufficient coordinates and of the integral
$$\mathsf{E}\Bigl[\int_t^{t+\Delta} c\bigl(x_1(\tau), x_2(\tau)\bigr)\, d\tau\Bigr], \eqno(5.4.19)$$
to substitute the expressions obtained for (5.4.18) and (5.4.19) into (5.4.17), and to pass to the limit as $\Delta \to 0$.
To calculate the mean values in (5.4.18), we need Eqs. (5.4.13) and formulas (5.4.4) and (5.4.5). So, from (5.4.13) we obtain relation (*). By averaging (*) with respect to $x_{1t}$ with the probability density (5.4.16), we finally obtain
$$\mathsf{E}\bigl(x_{1, t+\Delta}^0 - x_{1t}^0\bigr) = x_{2t}^0\, \Delta + o(\Delta). \eqno(5.4.21)$$
In a similar way, we can find the other expressions in (5.4.18) and (5.4.19), each of the form
$$(\,\cdot\,)\,\Delta + o(\Delta). \eqno(5.4.22)$$
Using (5.4.21) and (5.4.22) and letting $\Delta \to 0$ in (5.4.17), we obtain
The expressions for $G(A_0, \Phi_0, u)$ and $H(A_0, \Phi_0, u)$ coincide with (5.1.17) after the change $A, \Phi \to A_0, \Phi_0$. The function $c_*(A_0, \Phi_0)$ is determined by the penalty function $c(x, \dot x)$ in (5.4.3) (e.g., for $c(x, \dot x) = x^2 + \dot x^2$, we have $c_*(A_0, \Phi_0) = A_0^2 + \varepsilon \widetilde D_{11}^* + \varepsilon \widetilde D_{22}^*$). In (5.4.26) $L_0$ denotes the differential operator
$$L_0 = \cdots + \frac{\cos^2 \Phi_0}{A_0}\, \frac{\partial}{\partial A_0} - \frac{\sin 2\Phi_0}{A_0^2}\, \frac{\partial}{\partial \Phi_0}. \eqno(5.4.27)$$
Note that as $N \to 0$ formula (5.4.27) passes into formula (5.3.15) for the operator $L$ obtained in §5.3 for systems with complete information about the phase coordinates of the plant. We can readily verify this fact by substituting the values (5.4.14) of the steady-state covariances into (5.4.27) and passing to the limit as $N \to 0$. Then (5.4.27) acquires the form of (5.3.15), and Eq. (5.4.26) coincides with (5.3.18).
$$\cdots + \frac{(\varepsilon A \varphi_0)^2}{2}\, \frac{\partial^2}{\partial \Phi_0^2} + \cdots = 0,$$
and thus solving the synthesis problem is equivalent to finding the equation in the polar coordinates for the switching line $\varphi_*(A_0)$.
We do not consider the mathematical calculations in detail (they coincide with those in §5.3) but illustrate the results obtained for the switching line in the first two approximations by the example of a controlled plant that is a linear quasiharmonic system (in (5.4.1) we have $\chi(x, \dot x) \equiv 1$). By using the above-described procedure, we simultaneously solve Eqs. (5.4.26) and
Hence we obtain the following equation for the switching line in the first approximation:
$$\varphi_1(A_0) \equiv 0, \eqno(5.4.31)$$
which corresponds to the control law $u(x_1^0, x_2^0) = -u_m \operatorname{sign} x_2^0$. We also obtain an expression for the derivative $\partial f_1/\partial A_0$, which enters the formula for the switching line in the second approximation:
Since $\varphi_2(A_0)$ is small, it follows from (5.4.25) and (5.4.29) that the quasioptimal control algorithm in the second approximation can be written as
$$u_2(x_1^0, x_2^0) = -u_m \operatorname{sign}\Bigl(x_2^0 + x_1^0\, \varphi_2\bigl(\sqrt{(x_1^0)^2 + (x_2^0)^2}\bigr)\Bigr).$$
The block diagram of a self-stabilizing system realizing the control algorithm in the second approximation is shown in Fig. 45. The most important distinction between this system and that in Fig. 44 is that the feedback circuit contains an additional element SC producing the current values of the sufficient coordinates $x_1^0(t)$ and $x_2^0(t)$. Figure 46 presents the diagram of this element in detail.
CHAPTER VI
where $c(x)$ and $\psi(x)$ are some nonnegative bounded continuous functions, and $H$ is a positive definite constant $r \times r$ matrix. We do not impose any restrictions on the admissible values of the control vector $u$ and assume that the state vector $x$ can be measured exactly at any time $t \in [0, T]$. Thus, we can seek the optimal control $u_*$ that minimizes the mathematical expectation (6.1.3) in the form of the functional
$^1$It follows from (6.1.2) and (6.1.4) that $x(t)$ is a diffusion type process.
$^2$As is known [38, 39, 167], the a posteriori means $m = m_t$ are optimal estimates of $\alpha$ with respect to the minimum mean square error criterion.
Since the constant matrices $D_0$ and $N^{-1}$ are positive definite, the matrix $R(s)$ is nonnegative definite; $R(s)$ is degenerate if and only if all elements of at least one column of the matrix $Q$ are zero.
Let $\lambda(s) \ge 0$ be the minimum eigenvalue of the matrix $R(s)$. Taking the scalar product of (6.1.8) with $y_t$, we obtain (6.1.9) (here $\|y_t\|$ is the Euclidean norm of the vector $y_t$). Replacing the quadratic form in (6.1.9) by its lower bound and estimating the inner product $(y_0, y_t)$ with the help of the Cauchy-Schwarz-Bunyakovskii inequality, we arrive at the inequality (6.1.10). Since $\|y_0\| \sim \varepsilon$, it follows from (6.1.10) that $\|y_t\| \sim \varepsilon$. Thus we have $D_t \sim \varepsilon$ for all $t \in [0, T]$.
We shall solve the problem of optimal control synthesis by the dynamic programming approach. To this end, we first note that the a posteriori probability density $p_t(\alpha)$ (or the current values of its parameters $m_t$ and $D_t$), together with the current values of the phase vector $x_t$, forms the sufficient coordinates (see §1.5) for the problem in question. Therefore, these parameters and time $t$ are the arguments of the loss function given, as usual, by the formula
$$F(t, x, m, D) = \min_{u(s) \in R_r,\; t \le s \le T} \mathsf{E}\Bigl[\int_t^T c\bigl(x(s)\bigr)\, ds + \psi\bigl(x(T)\bigr) \Bigm| x_t = x,\, m_t = m,\, D_t = D\Bigr].$$
where $F_x$, $F_m$, and $F_D$ are matrices of partial derivatives. The control at which the function in the square brackets in (6.1.12) attains its minimum determines the optimal control law, which becomes a known function $u_* = u_*(t, x, m, D)$ of the sufficient coordinates after the loss function $F = F(t, x, m, D)$ is calculated from Eq. (6.1.13).
Now let us discuss whether Eq. (6.1.13) can be solved. Obviously, in the more or less general case, it is hardly possible to obtain an exact solution. Moreover, one cannot construct the exact solution of Eq. (6.1.13) even in the special case where $B(x)$ is a linear function and $c(x)$ and $\psi(x)$ are quadratic functions of $x$, that is, in the case in which the synthesis problem with known parameters in system (6.1.1) can be solved exactly. The crucial difficulty in this case is related to the bilinear form (in the variables $x$ and $m$) appearing in the coefficients of the first-order derivatives $F_x$. On the other hand, the high accuracy of estimating the unknown parameters $\alpha$, due to which a small parameter $\varepsilon$ appears in the last three terms in (6.1.13), suggests the rather natural assumption that the difference between the exact solution of (6.1.13) and the solution of (6.1.13) with $\varepsilon = 0$ is small. (In other words, the difference between the solution of the synthesis problem with unknown parameters $\alpha$ and the similar solution with given $\alpha = \alpha_0$ is small.)
The above considerations allow us to believe that an efficient approximate solution of Eq. (6.1.13) (that is, of the synthesis problem) can be obtained by means of the regular asymptotic method based on the expansion of the desired loss function $F$ in powers of the small parameter $\varepsilon$:
Substituting (6.1.15) into (6.1.13) and grouping terms of the same order with respect to $\varepsilon$, we obtain the following equations for successive approximations:
$$-F_t^0 = a^\top(x)\, z^\top(m)\, F_x^0 - \tfrac{1}{4}\,(F_x^0)^\top B_1 F_x^0 + \tfrac{1}{2}\operatorname{Sp}(\cdots) + c(x),$$
$$0 \le t < T, \qquad F^0(T, x, m, D) = \psi(x); \eqno(6.1.16)$$
(6.1.13) and (6.1.16) have at most one solution in the class of functions that are continuous in the strip $\Pi_T = \{|x| < \infty,\ |D| < \infty,\ |m| < \infty,\ 0 \le t \le T\}$, continuously differentiable once in $t$ and twice in the other variables for $0 < t < T$, and possess bounded first- and second-order derivatives with respect to $x$, $m$, $D$ in $\Pi_T$. Furthermore, Theorem 2.5 (for quasilinear equations) in [124] implies the following estimate for the solution of the Cauchy problem (6.1.13):
(here $C_1, C_2 \ge 0$ are some constants; it is assumed that the function $c$ may depend not only on $x$, as in (6.1.13), but also on the other variables $t$, $m$, $D$). The above arguments also hold for the linear equations (6.1.17) and (6.1.18) of successive approximations.
By introducing a quasilinear operator $L$, we rewrite Eq. (6.1.13) in the form
$$LF = -c(x), \quad 0 \le t < T; \qquad F(T, x, m, D) = \psi(x).$$
Equation (6.1.22) is written for the case where the unknown parameter $\alpha_i$ stands in the $r$th row and the $j$th column of the matrix $A$ in the initial system (6.1.1); here $B_j = B_j(x)$ is the $j$th component of the vector-function $B(x)$. Since $B_j$ is bounded, the solution $v^i$ of Eq. (6.1.22) and its partial derivatives $v_x^i$ and $v_{xx^\top}^i$, as was already noted, are also bounded. Finally, since $v^i = F_{m_i}^0$ is bounded and the number $i$ is arbitrary, the matrix $F_{xm^\top}^0$ in the first term on the right in (6.1.21) is also bounded. In a similar way, we verify the boundedness of $F_{mm^\top}^0$.
Thus, it follows from (6.1.21) and (6.1.20) that $S^0$ satisfies the estimate (6.1.24). The boundedness of $F_x^1$, $F_m^1$, $F_{xx^\top}^1$, and $F_{mm^\top}^1$ can be verified by analogy with the case where we estimated $S^0$. Therefore, (6.1.24) and the inequality (6.1.20) imply
$$|S^1| \le C \varepsilon^2. \eqno(6.1.25)$$
Estimation of the differences $\gamma^0$ and $\gamma^1$. For the functions $G^s = G^s(t, x, m, D)$, $s = 0, 1, 2, \dots$, determined by (6.1.19), we have the linear partial differential equations [45]
$$\cdots + c(x) + \frac{\varepsilon}{2}\operatorname{Sp}\bigl(D Q_1(x)\, G_{mm^\top}^s\bigr) + \varepsilon^2 \operatorname{Sp}\bigl(D N_1(x)\, D\, G_{DD^\top}^s\bigr) - \frac{\varepsilon}{2}\operatorname{Sp}\bigl(D N_1(x)\, D\, G_D^s\bigr), \quad 0 \le t < T,$$
$$G^s(T, x, m, D) = \psi(x). \eqno(6.1.26)$$
Finally, from (6.1.29), (6.1.23), and (6.1.25), with regard to the inequality $|\Delta^s| \le |S^s| + |\gamma^s|$, we have
The estimates (6.1.30) show that the use of the quasioptimal control uo or
where g and h > 0 are given constants. The optimal filtration equations
(6.1.5), (6.1.6) and the Bellman equation (6.1.13) for problem (6.1.31),
(6.1.32) are
$$dm = \frac{D}{\nu}\, x(t)\,\bigl[dx(t) - \bigl(m\, x(t) - b u\bigr)\, dt\bigr], \eqno(6.1.33)$$
$$\dot D = -\frac{D^2}{\nu}\, x^2(t), \eqno(6.1.34)$$
$$F^0(t, x, m) = f^0(t, m)\, x^2 + r^0(t, m), \eqno(6.1.37)$$
where
$$\beta = \sqrt{m^2 + \frac{b^2 g}{h}}, \qquad r^0(t, m) = \frac{g\nu}{\beta + m}\,(T - t) - \frac{\nu h}{b^2}\, \ln \frac{2\beta}{\beta + m + (\beta - m)\, e^{-2\beta(T - t)}}.$$
It follows from (6.1.14) and (6.1.37) that the quasioptimal control in the
zero approximation has the form
$$-F_t^1 = \Bigl(m - \frac{b^2}{h}\, f^0(t, m)\Bigr) x\, F_x^1 + \frac{\nu}{2}\, F_{xx}^1 - D\, x\, F_{xm}^0,$$
$^5$Note that the loss function in the zero approximation is independent of the estimate variance $D$, i.e., $F^0 = F^0(t, x, m)$.
It follows from (6.1.14), (6.1.15), (6.1.37), and (6.1.40) that the quasioptimal control synthesis in the first approximation is given by formula (6.1.41). Comparing (6.1.38) and (6.1.41), we note that the optimal regulators in the zero and first approximations are linear in the phase variable $x$. However, if higher-order approximations are used, then we obtain nonlinear laws of control.
For example, in the second approximation, we obtain from (6.1.18) and (6.1.35) the following equation for the function $F^2 = F^2(t, x, m, D)$:
$$\cdots - \varepsilon^2\, \frac{2b}{h}\, q(t, m, D).$$
which differs from Eq. (6.1.33). The reason is that only stochastic equations understood in the symmetrized sense [174] admit straightforward simulation. Therefore, the symmetrized equation (6.1.42) is chosen so that its solution coincides with the solution of the Ito equation (6.1.33).
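The distinction can be made concrete: for a scalar SDE $dX = a(X)\,dt + \sigma(X)\,dW$, the Euler-Maruyama step converges to the Ito solution, while the Heun (trapezoidal) step converges to the symmetrized (Stratonovich) one; the two limits differ by the drift correction $\tfrac12 \sigma \sigma'$. A minimal sketch:

```python
def euler_ito_step(x, dt, dw, a, s):
    # Euler-Maruyama step: consistent with the Ito interpretation
    return x + a(x) * dt + s(x) * dw

def heun_strat_step(x, dt, dw, a, s):
    # Heun (trapezoidal) step: consistent with the symmetrized
    # (Stratonovich) interpretation
    xp = x + a(x) * dt + s(x) * dw              # predictor
    return x + 0.5 * (a(x) + a(xp)) * dt + 0.5 * (s(x) + s(xp)) * dw
```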
6.1.5. Some results of numerical experiments. The estimates (6.1.30) establish only the asymptotic optimality of the quasioptimal controls $u_0$ and $u_1$. Roughly speaking, the estimates (6.1.30) only mean that the smaller the parameter $\varepsilon$ (i.e., the smaller the a priori uncertainty of the components of the vector $\alpha$), the more grounds we have for using the quasioptimal controls $u_0$ and $u_1$ (calculated according to the algorithm given in Section 6.1.2) instead of the optimal (unknown) control (6.1.4) that solves problem (6.1.1)–(6.1.3).
On the other hand, in practice we always deal with problems (6.1.1)–(6.1.3) in which all parameters (including $\varepsilon$) have definite finite values. As a rule, it is difficult to determine in advance whether a given specific value of the parameter $\varepsilon$ is sufficiently small for the above approximate synthesis procedure to be used effectively. Some ideas about the situations arising
$^6$Numerical methods for solving equations of the form (6.1.35) and (6.1.43) are discussed in Chapter VII.
to the optimal.
Thus, the results of the numerical solution of Eqs. (6.1.35) and (6.1.43) confirm the high quality of the quasioptimal control algorithm (6.1.41). We point out that this result was obtained in spite of the fact that the a posteriori variance $D$, which plays the role of a small parameter in the asymptotic synthesis method considered here, is of the same order of magnitude as the other parameters ($g$, $h$, $b$, $\nu$) of problem (6.1.31), (6.1.32). This fact allows us to believe that the asymptotic synthesis method (see Section 6.1.2) can be used successfully for solving various practical problems of the form (6.1.1)–(6.1.3) with finite values of the parameter $\varepsilon$.
In conclusion, we make some methodological remarks. First, we recall that in the title of this section the problems of optimal control with unknown parameters of the form (6.1.1)–(6.1.3) are called "adaptive." It is well known that problems of adaptive control are very important in modern control theory, and at present there are numerous publications in this field (e.g., see [6-9, 190] and the references therein). Thus, it is of interest to compare the results obtained in this section with other approaches to similar problems.
The following heuristic idea is very often used for constructing adaptive algorithms of control. For example, suppose that for the feedback control system shown in Fig. 13 it is required to construct a controller C that provides some desired (not necessarily optimal) behavior of the system in the case where some parameters $\alpha$ of the plant P are not known in advance. Suppose also that for some given parameters $\alpha$, the required
where $z(t) = y(t) - x(t)$ is the error signal and $c(z)$ is a nonnegative penalty function attaining its minimum at the unique point $z = 0$, with $c(0) = 0$. In this case, as shown in §2.2, solving the synthesis problem (in the case of unbounded phase coordinates) is equivalent to solving the Bellman equation
(see (2.2.4))
$$F(t, z) = \min_{|u(s)| \le u_m,\; t \le s \le T}\bigl[\,\cdots\,\bigr], \qquad F(T, z) = 0. \eqno(6.2.3)$$
By expanding the second term in the Taylor series around the point $(t, z)$, we obtain an expression in which $z_*(\tau)$ is the minimum point (with respect to $z$) of the function $F(\tau, z)$ and simultaneously the switch point of the controlling action. This point can be found from the condition
In this case, the coordinate of the switch point given by (6.2.11), where $F$ is replaced by $f$, attains a constant value $z_*$ (that is, we have a stationary switch point).
The boundary value problem (6.2.12), (6.2.13) can readily be solved by the matching method. By analogy with §2.2, let us consider Eq. (6.2.12) on different sides of the switch point $z_*$. Then the nonlinear equation (6.2.12) is replaced by a pair of linear equations for two unknown parameters $\gamma$ and $z_*$. Substituting (6.2.16) into (6.2.17) and eliminating the parameter $\gamma$ from the system obtained, we see that the stationary switch point $z_*$ satisfies the transcendental equation
For the quadratic penalty function $c(z) = z^2$, Eq. (6.2.18) acquires the form (6.2.19); this formula was obtained in §2.2 (see (2.2.16)). In the other special case $\ell_2 = -\ell_1$ and $A_1 = -A_2$ (the last equality is possible only if $a = 0$), Eq. (6.2.19) has a single trivial root $z_* = 0$, that is, the optimal control (6.2.10) coincides in sign with the error signal $z$.
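Numerically, once the matched solutions (6.2.16), (6.2.17) are in hand, $z_*$ can be found by standard root bracketing. A minimal sketch, in which the function phi stands for the left-hand side of (6.2.18) and the concrete example is purely hypothetical:

```python
import math
from scipy.optimize import brentq

def switch_point(phi, z_lo, z_hi):
    """Locate the stationary switch point z* as a root of the matching
    condition phi(z*) = 0 from (6.2.18); the concrete form of phi depends
    on the penalty c(z) and the screens l1, l2, and phi must change sign
    on [z_lo, z_hi]."""
    return brentq(phi, z_lo, z_hi)

# purely hypothetical example of a matching function with one root
z_star = switch_point(lambda z: math.tanh(2.0 * z) - z, 0.1, 2.0)
```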
6.2.2. Absorbing screens. Let us see how the tracking system studied in the preceding section operates with absorbing screens. Obviously, in this case, the loss function (6.2.2) must also satisfy Eq. (6.2.7) in the interior of the interval $[\ell_1, \ell_2]$ and the zero initial condition (6.2.9). At the boundary points, instead of (6.2.8), we have (6.2.20). The conditions (6.2.20) follow from formula (6.2.2) and the fact that the trajectories $z(t)$ stick to the boundary. Indeed, by using, as above, the discrete random walk model for $z(t)$, we can rewrite (6.2.2) as
For $z = z_0$, $z = \ell_1$, and $z = \ell_2$, the functions $F_1$ and $F_2$ satisfy (6.2.11) and (6.2.20).
In accordance with [26], for large T, we seek the solutions of the linear
equations (6.2.21) and (6.2.22) in the form
Using (6.2.23), we obtain from (6.2.21), (6.2.11), and (6.2.20) the following system of ordinary differential equations for the functions $\psi_1(z)$ and $f_1(z)$:
$$\frac{d\psi_1}{dz}(z_0) = \frac{df_1}{dz}(z_0) = 0, \qquad \psi_1(\ell_1) = c(\ell_1), \qquad f_1(\ell_1) = 0. \eqno(6.2.24)$$
From (6.2.24) we obtain
randomly walking on the plane $(x, y)$ so that along one of the axes, say, the $x$-axis, the motion is controlled by variations of the drift velocity within a given region, while along the $y$-axis we have a purely diffusive uncontrolled wandering. In this case, the equations describing the system motion have the form
One can readily see that the Bellman equation related to this problem, written in the reverse time $\tau = T - t$, has the following form ($F_\tau$, $F_x$, $F_y$ denote the partial derivatives with respect to $\tau$, $x$, $y$):
In addition to Eq. (6.2.32), which holds for $0 < \tau \le T$ and $r = \sqrt{x^2 + y^2} < r_0$, the loss function $F(\tau, x, y)$ must satisfy the zero initial condition
$$F(0, x, y) = 0 \eqno(6.2.33)$$
In the polar coordinates $(r, \varphi)$ defined by the formulas $x = r \cos\varphi$, $y = r \sin\varphi$, the boundary value problem (6.2.32)–(6.2.34) acquires a form in which the minimum is taken over $-(u_m - a) \le u \le u_m + a$;
but now, instead of the switch point, we have a switching line on the plane $(x, y)$. This switching line is given by the equation (in the polar coordinates)
$$\cos\varphi\, F_r - \frac{\sin\varphi}{r}\, F_\varphi = 0.$$
It follows from (6.2.40), (6.2.36), and (6.2.37) that the solution $F^{(0)}$ is radially symmetric, $F^{(0)} = F^{(0)}(\tau, r)$, and therefore, instead of (6.2.40), (6.2.36), and (6.2.37), we have
It is well known [179] that the solution of Eq. (6.2.41) can be found by separation of variables (by the Fourier method) as the series (6.2.42). Here $I_0(\cdot)$ is the Bessel function of zero order and $\mu_m^0$ is the $m$th root of the equation $dI_0(\mu)/d\mu = 0$.
It follows from the properties of zeros of the Bessel function [179] that the series (6.2.42) converges rapidly. Therefore, since we are interested only in the qualitative character of suboptimal control laws, it suffices to find only the first term of the series in (6.2.42).
Calculating $c_1$ and using the tables of Bessel functions [77], we obtain the following approximate expression for the function $F^{(0)}$:
$$F^{(0)}(\tau, r) = A\tau - 0.04262\, I_0\Bigl(\frac{\mu_1^0 r}{r_0}\Bigr)\Bigl(1 - \exp\Bigl[-B\Bigl(\frac{\mu_1^0}{r_0}\Bigr)^2 \tau\Bigr]\Bigr). \eqno(6.2.44)$$
By differentiating (6.2.44) with respect to $r$ and taking into account the relations $dI_0(x)/dx = I_1(x)$ and $\mu_1^0 = 3.84$, we find
$$F_r^{(0)}(\tau, r) = \frac{0.164}{r_0}\, I_1\Bigl(\frac{\mu_1^0 r}{r_0}\Bigr)\Bigl(1 - \exp\Bigl[-B\Bigl(\frac{\mu_1^0}{r_0}\Bigr)^2 \tau\Bigr]\Bigr). \eqno(6.2.45)$$
Since the first-order Bessel function $I_1(\mu_1^0 r / r_0)$ is positive for $0 < r < r_0$ ($I_1(\mu_1^0) = 0$), the derivative (6.2.45) is positive everywhere in the interior of the disk of radius $r_0$ on the plane $(x, y)$. Hence, in view of (6.2.38), the sign of the controlling action in the zero approximation is determined by the sign of $\cos\varphi$, that is, the switching line coincides with the vertical diameter of the disk of radius $r_0$ on the plane $(x, y)$ (in Fig. 51 the switching line is indicated by $AOB$; the arrows show the direction of the mean drift velocity).
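The one-term expressions (6.2.44) and (6.2.45) are easy to evaluate. The sketch below does so under two assumptions: the book's $I_n$ are read as the standard Bessel functions $J_n$ (the tabulated roots 1.84 and 3.84 are those of $dJ_n/d\mu = 0$), and the constants $B$, $r_0$ and the overall sign convention of the control are illustrative.

```python
import numpy as np
from scipy.special import j1, jnp_zeros

B, r0 = 1.0, 1.0           # diffusion coefficient and disk radius (assumed)
mu10 = jnp_zeros(0, 1)[0]  # first positive root of dJ0/dmu = 0 (~3.8317)

def F_r0(tau, r):
    """Radial derivative (6.2.45), as reconstructed: its positivity inside
    the disk makes sign(cos(phi)) the zero-approximation switching rule."""
    return (0.164 / r0) * j1(mu10 * r / r0) * (1 - np.exp(-B * (mu10 / r0) ** 2 * tau))

def u_zero(tau, x, y, u_m):
    # bang-bang control whose sign is set by cos(phi); sign convention assumed
    phi = np.arctan2(y, x)
    return -u_m * np.sign(np.cos(phi) * F_r0(tau, np.hypot(x, y)))
```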
The first approximation. By using the results obtained above, we can write the first-approximation equation as (6.2.46) (here the function $F_r^{(0)}$ is given by formula (6.2.45)). The solution $F^{(1)}$ of Eq. (6.2.46) may also be written as a series in eigenfunctions, but since now there is no radial symmetry, this series differs from (6.2.42) and has the form [179]
where
$$d_n = \begin{cases} 1, & n \ne 0, \\ 2, & n = 0, \end{cases}$$
and $c_{00}(\tau)$ denotes the terms independent of $r$ and $\varphi$ and hence insignificant for the control law (6.2.38). The numbers $\mu_k^n$ are the roots of the equation $dI_n(\mu)/d\mu = 0$, where $I_n(\mu)$ is the $n$th-order Bessel function.
By analogy with the case of the zero approximation, we consider only the first, most important terms of the series (6.2.48). Namely, we retain only the terms corresponding to the two roots $\mu_1^1$ and $\mu_1^0$ of the equation $dI_n(\mu)/d\mu = 0$; according to [77], $\mu_1^1 = 1.84$ and $\mu_1^0 = 3.84$. This means that all coefficients in (6.2.48) except for $c_{01}$, $c_{11}$, and $c_{11}'$ must be set equal to zero. The coefficient $c_{01}$ coincides with $c_1$ in (6.2.43) and has been calculated in the zero approximation (therefore, in the series (6.2.48) the term containing $c_{01}$ coincides with the second term in formula (6.2.44)). By calculating $c_{11}'$ according to (6.2.50) with regard to (6.2.47), we obtain $c_{11}' = 0$. Thus, to find the loss function $F^{(1)}$, it suffices to calculate only $c_{11}$. Substituting (6.2.47) and (6.2.45) into (6.2.49), we obtain
$$\times \int_0^\tau \exp\Bigl[-B\Bigl(\frac{\mu_1^1}{r_0}\Bigr)^2 (\tau - \sigma)\Bigr]\Bigl(1 - \exp\Bigl[-B\Bigl(\frac{\mu_1^0}{r_0}\Bigr)^2 \sigma\Bigr]\Bigr)\, d\sigma \cdots$$
$$\dot x = r\Bigl(1 - \frac{x}{K}\Bigr)x - q u x + \sqrt{B}\, x\, \xi(t), \quad t > 0, \qquad x(0) = x_0, \eqno(6.3.1)$$
$$I[u] = \mathsf{E}\Bigl[\int_0^\infty e^{-\delta t}\bigl(pq\, x(t) - c\bigr)\, u(t)\, dt\Bigr] \to \max_{0 \le u(t) \le u_m,\; t \ge 0}, \eqno(6.3.3)$$
$$F(x) = \max_{0 \le u(t) \le u_m,\; t \ge 0} \mathsf{E}\Bigl[\int_0^\infty e^{-\delta t}\bigl(pq\, x(t) - c\bigr)\, u(t)\, dt \Bigm| x(0) = x\Bigr].$$
The cost function is defined only for nonnegative values of the variable $x$; for $x = 0$, this function satisfies the natural boundary condition
$$F(0) = 0 \eqno(6.3.5)$$
(it follows from (6.3.3) that in this case the optimal control has the form $u_*(t) \equiv 0$, and hence (6.3.3) implies (6.3.5)).
First, note that for $\delta > r + B$ and $K \to \infty$, Eq. (6.3.4) has the exact solution (6.3.6) (obtained in §2.4). Here
$$x_0 = \frac{c\,\bigl(\delta - r - B + q u_m\bigr)}{pq\,\Bigl(\delta - r - B + \dfrac{k_1^0}{k_1^0 - 1}\, q u_m\Bigr)} \eqno(6.3.7)$$
determines the switch point of the optimal control in the synthesis form, and the numbers $k_1^0 > 0$ and $k_2^0 < 0$ in (6.3.6) and (6.3.7) can be written in terms of the parameters of problem (6.3.1)–(6.3.3) as
If the equation $\gamma(x) = 0$ has a single root $x_*$, then the optimal control can be written in the form
$$u_*(x) = \begin{cases} 0, & x < x_*, \\ u_m, & x > x_*, \end{cases}$$
and the cost function to the right of the switch point satisfies
$$\frac{B x^2}{2}\, F_1'' + \Bigl(r - \frac{r}{K}\, x + B - q u_m\Bigr) x\, F_1' - \delta F_1 = u_m\,(c - pq\, x), \qquad x_* < x. \eqno(6.3.12)$$
Since the cost function $F(x)$, as the solution of the Bellman equation (6.3.4), is twice continuously differentiable for all $x \in [0, \infty)$, the functions $F_0$ and $F_1$ satisfy the boundary condition (6.3.10) at the switch point $x_*$. Moreover, it follows from (6.3.5) that $F_0(0) = 0$. These boundary conditions allow us to obtain the unique solution of Eqs. (6.3.11) and (6.3.12) and thus, for all $x \in [0, \infty)$, to construct the cost function $F(x)$ satisfying the Bellman equation (6.3.4).
We shall seek the solution of Eq. (6.3.11) as the generalized power series (6.3.13). By substituting the series (6.3.13) into (6.3.11) and setting the coefficients of $x^\alpha$, $x^{\alpha+1}, \dots$ equal to zero, we obtain the following system for the characteristic exponent $\alpha$ and the coefficients $a_i$, $i = 0, 1, 2, \dots$:
Thus, the series (6.3.16) converges for any finite $x \ge 0$, we can differentiate this series term by term, and its sum $\psi(x)$ is an entire analytic function satisfying the estimate
$$\psi_1(x) = x^{k_1^0}\Bigl[1 + \sum_{n=1}^\infty \frac{1}{n!}\, \frac{k_1^0 (k_1^0 + 1) \cdots (k_1^0 + n - 1)}{(\alpha + 1)(\alpha + 2) \cdots (\alpha + n)}\, (\varepsilon x)^n\Bigr] \eqno(6.3.20)$$
Note that the series (6.3.20) for any finite $x$ can be majorized by a convergent numerical series. Therefore, the series (6.3.20) can be differentiated and integrated term by term, and its sum $\psi_1(x)$ is an entire function. Similar statements for the series (6.3.21) hold only for $\alpha \ne n$ (here $n$ is a positive integer); in what follows, we assume that this condition is satisfied.
A particular solution of the nonhomogeneous equation (6.3.12) can be found by the standard procedure of variation of parameters. We write the desired particular solution $\Phi$ as a combination of the homogeneous solutions with variable coefficients $c_1(x)$ and $c_2(x)$, which satisfy the condition
Formulas (6.3.17) and (6.3.28) determine the cost function $F(x)$ that satisfies the Bellman equation (6.3.4) for all $x \in [0, \infty)$. In these formulas, only the coordinate of the switch point $x_*$ remains unknown. To find $x_*$, we use the condition that the cost function $F(x)$ must be continuous at the switch point:
$$F_0(x)\big|_{x = x_*} = F_1(x)\big|_{x = x_*}, \eqno(6.3.29)$$
or, which is the same due to (6.3.10), the condition that the second-order derivative must be continuous:
Since the series (6.3.16) and (6.3.21) are convergent, we can calculate $x_*$ with any prescribed accuracy and thus solve our equations numerically. Furthermore, for large values of the medium capacity $K$, formulas (6.3.29) and (6.3.30) give approximate analytic formulas for the switch point, and these formulas allow us to construct control algorithms that are close to the optimal control.
6.3.3. The calculation of $x_*$ for large $K$. In the case $K \to \infty$, the functions $\psi(x)$, $\psi_1(x)$, and $\psi_2(x)$, as follows from (6.3.16), (6.3.20), and (6.3.21), are given by the finite formulas
We also seek the root of Eqs. (6.3.29) and (6.3.30), that is, the coordinate $x_*$, as the series
$$x_* = x_0 + \varepsilon \Delta_1 + \varepsilon^2 \Delta_2 + \cdots, \eqno(6.3.35)$$
where the numbers $x_0, \Delta_1, \Delta_2, \dots$ must be calculated. By substituting the expansions (6.3.33)–(6.3.35) into Eq. (6.3.29) (or (6.3.30)) and setting the coefficients of equal powers of the small parameter $\varepsilon$ on the left- and right-hand sides equal to each other, we obtain a system of equations for the successive calculation of the numbers $x_0, \Delta_1, \Delta_2, \dots$ in the expansion (6.3.35).
Obviously, the first term $x_0$ in (6.3.35) coincides with (6.3.7). To calculate the first correction $\Delta_1$ in the expansions (6.3.33) and (6.3.34), we retain the zero-order and the first-order terms and omit the terms of the order of $\varepsilon^2$ and the higher-order terms. As a result, from (6.3.16), (6.3.17), (6.3.20), (6.3.21), and (6.3.28) we obtain the following expressions for the functions $F_0(x)$ and $F_1(x)$ in the first approximation:
where the functions entering (6.3.36) and (6.3.37) are defined above. Substituting these expressions into (6.3.29), we obtain Eq. (6.3.38), which holds to within terms of the zero and the first order with respect to the small parameter $\varepsilon$. If we retain only the zero-order terms in Eq. (6.3.38), then we can readily see that (6.3.38) implies formula (6.3.7) for $x_0$. Collecting the terms of the order of $\varepsilon$, from (6.3.38) we obtain the first correction
Thus, for large values of the parameter $K$ (that is, for small $\varepsilon$), the coordinate $x_0$ given by (6.3.7) can be interpreted as the switch point in the zero approximation. Correspondingly, the formula
$$x_* = x_0 + \varepsilon \Delta_1, \eqno(6.3.40)$$
where $x_0$ and $\Delta_1$ are given by (6.3.7) and (6.3.39), determines the switch point in the first approximation.
Let $u_0(x)$ and $u_1(x)$ denote the controls of the form (6.3.41) with the switch points given by (6.3.7) and (6.3.40), respectively.
Denoting by $G_{i0}(x)$ and $G_{i1}(x)$, just as in Section 6.3.2, the values of the function $G_i(x)$ on either side of the switch point $x_i$, we obtain from (6.3.42) equations for $G_{i0}$ and $G_{i1}$ that are quite similar to Eqs. (6.3.11) and (6.3.12). Therefore, the general solutions of these equations, by analogy with Section 6.3.2, have the form (6.3.45), where the functions $\psi(x)$, $\psi_1(x)$, $\psi_2(x)$, and $\Phi(x)$ are given by formulas (6.3.16), (6.3.20), (6.3.21), (6.3.23), (6.3.25), and (6.3.26).
The functions (6.3.45) differ from the corresponding functions (6.3.17) and (6.3.28) in Section 6.3.2 by the method used for calculating the constants $C_1$ and $C_2$ in (6.3.45). In Section 6.3.2 the corresponding constants ($a_0$ in (6.3.15) and $c_1$, $c_2$ in (6.3.27)) were determined by the condition (6.3.10) at an unknown switch point $x_*$, while in Eqs. (6.3.42) the switch point $x_i$ was given in advance either by (6.3.7) with $i = 0$ or by (6.3.40) with $i = 1$. By substituting (6.3.45) into the formulas $G_{i0}(x_i) = G_{i1}(x_i)$ and $G_{i0}'(x_i) = G_{i1}'(x_i)$,$^8$ we obtain the following formulas for the coefficients $C_1$ and $C_2$ in (6.3.45):
$^8$These formulas follow from the condition that the solutions $G_i(x)$ of Eqs. (6.3.42) are continuously differentiable.
coincide, respectively, with the functions $F_0(x)$ and $F_1(x)$ given by (6.3.17) and (6.3.28), that is, we have $G_i(x) \equiv F(x)$.
The above-described procedure for the numerical construction of the functions $G_0(x)$, $G_1(x)$, and $F(x)$ was implemented in software and used in numerical experiments for estimating the quality of the quasioptimal control algorithms $u_0(x)$ in the zero approximation and $u_1(x)$ in the first approximation. Some results of these experiments are shown in Figs. 53 and 54, where the cost function $F(x)$ is plotted by solid curves and the functions $G_0(x)$ and $G_1(x)$ by dot-and-dash and dashed curves, respectively.
In Fig. 53 these curves are constructed for two values of the parameter $K$: $K = 7.5$ and $K = 11$; the other parameters of problem (6.3.1)–(6.3.3) are $r = 1$, $\delta = 3$, $B = 1$, $q = 3$, $u_m = 1.5$, $c = 3$, and $p = 2$. In this case, the variable $\varepsilon = r/KB$ treated as a small parameter in the expansions (6.3.33)–(6.3.35) attains the values $\varepsilon = 0.091$ (the upper group of curves) and $\varepsilon = 0.133$ (the lower group of curves). Figure 53 shows that in this case all three curves $F(x)$, $G_0(x)$, and $G_1(x)$ corresponding to the same set of parameters are sufficiently close to each other. Hence, the use of the quasioptimal algorithms (6.3.41) ensures a control quality close to that of the optimal control (obviously, the first-approximation control $u_1(x)$ is
preferable to the zero-approximation control $u_0(x)$, since the mean cost $G_1(x)$ corresponding to $u_1(x)$ is closer to the optimal cost $F(x)$).
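The comparison just described can be reproduced in spirit by direct simulation. The sketch below estimates the discounted profit of a threshold policy under the dynamics (6.3.1) taken in the reconstructed form, with the parameter values quoted above; the two switch-point values passed in are hypothetical stand-ins for $x_0$ and $x_1 = x_0 + \varepsilon\Delta_1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# parameter values quoted in the text for problem (6.3.1)-(6.3.3)
r, delta, B, q, u_m, c, p, K = 1.0, 3.0, 1.0, 3.0, 1.5, 3.0, 2.0, 7.5

def mean_profit(x_switch, x0=1.0, T=5.0, dt=1e-3, n_paths=2000):
    """Monte Carlo estimate of the discounted profit of the threshold policy
    u(x) = u_m for x > x_switch and u(x) = 0 otherwise, with the dynamics
    dx = [r(1 - x/K) - qu] x dt + sqrt(B) x dW; an illustrative sketch."""
    x = np.full(n_paths, x0)
    profit = np.zeros(n_paths)
    for k in range(int(T / dt)):
        u = np.where(x > x_switch, u_m, 0.0)
        profit += np.exp(-delta * k * dt) * (p * q * x - c) * u * dt
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = np.maximum(x + (r * (1 - x / K) - q * u) * x * dt
                       + np.sqrt(B) * x * dw, 0.0)
    return profit.mean()

# two hypothetical switch points standing in for x0 and x1
print(mean_profit(0.8), mean_profit(0.9))
```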
It should be noted that numerical methods have been developed mostly for solving second-order parabolic equations; nonlinear advection equations have been studied less until the present time. However, many papers dealing with the qualitative analysis and numerical solution of such equations have appeared quite recently. Here we would like to mention the Italian school of mathematicians (M. Falcone, R. Ferretti, and others) who studied various discrete schemes that allow the construction of numerical solutions for various types of nonlinear advection equations, including those with discontinuous solutions [10, 31, 48, 49, 53].
Chapter VII
$$L = \frac{\partial}{\partial t} + a^\top(t, x)\, \frac{\partial}{\partial x} + \frac{1}{2}\operatorname{Sp}\Bigl(b(t, x)\, \frac{\partial^2}{\partial x\, \partial x^\top}\Bigr), \qquad b(t, x) = \sigma(t, x)\, \sigma^\top(t, x).$$
By using Theorem IV.1.1 of [113], one can show that the conditions (7.1.6), (7.1.8), together with the upper estimates (7.1.7), guarantee that the function $F(t, x)$ satisfying problem (7.1.1)–(7.1.3) has generalized first-order derivatives in $x$ and satisfies a corresponding estimate. This relation and (7.1.9) imply the following (similar to (7.1.9)) upper bound for the difference $G = F - F^0$:
where $\xi(t)$ is the scalar standard white noise (1.1.31), $u$ is a scalar control, and $\beta$, $B$, and $u_m$ are given positive numbers ($\beta < 2$). By setting the penalty functions $c(x(t)) = x^2(t) + \dot x^2(t)$ and $\psi(x) = 0$ in (7.1.3), we obtain the Bellman equation
for the loss function $F(t, x, y)$ (here $x$ and $y = \dot x$ are the phase variables). By passing to the reverse time $\rho = T - t$, we can rewrite (7.1.18) as the standard Cauchy problem for a semilinear parabolic equation. Using the old notation $t$ for the reverse time $\rho$, we rewrite (7.1.18) as
Up to the notation, Eq. (7.1.20) coincides with Eq. (3.4.23), whose solution was obtained in §3.4 as the finite formula (3.4.29). Rewriting (3.4.29) with regard to the notation used in the present problem, we obtain the solution of Eq. (7.1.20) in the form
Formula (7.1.21) allows us to pose the boundary conditions for the de-
sired function F = F ( t , x, y) on the unhatched parts of the boundary
$$F_{Q, j}^k = F^0(k\tau, Qh, jh), \quad 0 \le j \le Q; \qquad F_{-Q, j}^k = F^0(k\tau, -Qh, jh), \quad -Q \le j \le 0;$$
$$F_{i, Q}^k = F^0(k\tau, ih, Qh), \quad -Q + 1 \le i \le Q; \qquad F_{i, -Q}^k = F^0(k\tau, ih, -Qh), \quad -Q \le i \le Q - 1.$$
The values of the grid functions $v_{i,j}^k$ and $\widetilde v_{i,j}^k$ at the grid nodes are calculated successively for the time layers $k = 1, 2, \dots$ by an implicit scheme. In this case the $(k+1)$th layer function $v_{i,j}^{k+1}$ corresponding to Eq. (7.1.24) is used as the initial function $\widetilde v_{i,j}^k = v_{i,j}^{k+1}$ for solving Eq. (7.1.25). The grid functions $F_{i,j}^k$ corresponding to the original equation (7.1.19) and the functions $v_{i,j}^k$ and $\widetilde v_{i,j}^k$ corresponding to the auxiliary equations (7.1.24) and (7.1.25) are related as follows: $F_{i,j}^k = v_{i,j}^k$, $v_{i,j}^{k+1} = \widetilde v_{i,j}^k$, and $\widetilde v_{i,j}^{k+1} = F_{i,j}^{k+1}$. Moreover, since the time step is assumed to be small (we take $\tau = 0.01$), in the difference approximation of Eq. (7.1.25) we can use the sign of the difference computed on the $k$th layer, that is, $u_{i,j} = \operatorname{sign}(\widetilde v_{i,j+1}^k - \widetilde v_{i,j-1}^k)$, instead of $\operatorname{sign}(\widetilde v_{i,j+1}^{k+1} - \widetilde v_{i,j-1}^{k+1})$ (a similar replacement was performed in [34, 86]).
It follows from the preceding that the difference approximation transforms Eqs. (7.1.24) and (7.1.25) into the following three difference equations; we see that, for given $v_{i,j}^k = F_{i,j}^k$ and each fixed $j \ge 0$, the desired set of the values of $v_{i,j}^{k+1}$ can be calculated successively from right to left by formula (7.1.29). For the initial value of $v_{Q,j}^{k+1}$ we take $F^0((k+1)\tau, L, jh)$, where $F^0(t, x, y)$ is the function (7.1.21). Correspondingly, for $j < 0$ the values of $v_{i,j}^{k+1}$ can be calculated from left to right by formula (7.1.30) with the initial value $v_{-Q,j}^{k+1} = F^0((k+1)\tau, -L, jh)$.$^2$
After the grid function $v_{i,j}^{k+1}$ is calculated, we also obtain the grid function $\widetilde v_{i,j}^k$ for the $k$th time layer, since $\widetilde v_{i,j}^k = v_{i,j}^{k+1}$. Now, to calculate the grid function $v_{i,j}^{k+1} = F_{i,j}^{k+1}$ on the layer $(k+1)$, we need to solve the linear algebraic system (7.1.28). It is convenient to solve this system by the sweep method [162, 179], which we briefly discuss here. Let us denote the desired values of the grid function on the layer $(k+1)$ by $z_j$. Then system (7.1.28) can be written in the form
$^2$The recurrent formulas (7.1.29) and (7.1.30) are used for $k = 0, 1, 2, \dots, K - 1$. It follows from (7.1.23) that in (7.1.29) and (7.1.30) we must set $v_{i,j}^0 = 0$, $-Q \le i, j \le Q$, for $k = 0$.
$$\lambda_{-Q+1} = 0, \qquad \nu_{-Q+1} = F^0\bigl((k+1)\tau,\, ih,\, -L\bigr). \eqno(7.1.36)$$
Thus, the algorithm for solving problem (7.1.31), (7.1.33) by the sweep method consists of the following two steps:
(1) find $\lambda_j$ and $\nu_j$ recurrently for $-Q + 1 \le j \le Q$ (from left to right, from $j$ to $j + 1$) by using the initial values (7.1.36) and formulas (7.1.35);
(2) employing $z_Q$ from (7.1.33), calculate (from right to left, from $j + 1$ to $j$) the values $z_{Q-1}, z_{Q-2}, \dots, z_{-Q+1}, z_{-Q}$ successively according to formulas (7.1.34) (note that in this case, in view of (7.1.36), the value of $z_{-Q}$ coincides with that given by (7.1.33)). The two steps are illustrated by the sketch below.
As was shown in [162, 179], the procedure of calculations by formulas (7.1.34) and (7.1.35) is stable if for any $j$ the corresponding diagonal dominance conditions hold. It follows from (7.1.32) that these conditions can be reduced to the following single condition in the problem in question:
the phase plane $(x, y)$ lying above the switching line, and $u = +u_m$ below this line.
Figure 58 illustrates how the switching line and the value of the performance criterion of this optimal system depend on the value of the admissible control $u_m$ for $B = \beta = 1$ and $t = 4$. In Fig. 58 one can see that an increase in the range of admissible controls uniformly improves the control quality, that is, decreases the value of the optimality criterion independently of the initial state of system (7.1.17).
Figures 59 and 60 show how the switching lines and the constant level
curves depend on the other parameters of the problem.
Here $x_1(\tau)$ and $y_1(\tau)$ are the sizes (densities) of the prey and predator populations at time $\tau$, and the positive numbers $a_i$ ($i = 1, 2, 3, 4$) characterize the intraspecific ($a_1$, $a_4$) and interspecific ($a_2$, $a_3$) interactions. By changing the variables
First, we note that in view of Eqs. (7.2.3), the phase variables $x(t)$ and $y(t)$ cannot attain negative values for $t \ge 0$ if the initial values $x_0$ and $y_0$ are nonnegative (the last assumption is always satisfied, since $x_0$ and $y_0$ denote the initial sizes of the prey and predator populations, respectively). Therefore, all solutions of Eq. (7.2.5) (the phase trajectories of system (7.2.3)) lie in the first quadrant ($x \ge 0$, $y \ge 0$) of the phase plane $(x, y)$. Furthermore, we shall consider only the phase trajectories that correspond to the two boundary values of control: $u = 0$ and $u = u_m$.
For $u = 0$ Eqs. (7.2.3) coincide with Eqs. (7.2.2) for an isolated (autonomous) Lotka-Volterra system. The dynamics of system (7.2.2) was studied in detail in [187]. Omitting the details, we only note that in the first quadrant ($x \ge 0$, $y \ge 0$) there are two singular points, $(x = 0, y = 0)$ and $(x = 1, y = 1)$, that are the equilibrium states of system (7.2.2). In this case the origin $(x = 0, y = 0)$ is an unstable equilibrium state, while the state $(x = 1, y = 1)$ is stable and is a center type singular point. All phase trajectories of system (7.2.2) (except for the two that lie on the coordinate axes: $(x \ge 0, y = 0)$ and $(x = 0, y \ge 0)$) form a family of closed concentric curves around the point $(x = 1, y = 1)$. Thus, in a noncontrolled system the sizes of both populations are subject to undecaying oscillations whose period and amplitude depend on the initial state $(x_0, y_0)$. However, if the initial state $(x_0, y_0)$ lies on one of the coordinate axes in the plane $(x, y)$, then there arise singular (aperiodic) phase trajectories. In this case it follows from Eqs. (7.2.2) that the representative point of the system cannot
leave the corresponding coordinate axis and in the course of time either
approaches the origin (along the y-axis) or goes to infinity (along the x-
axis). The singular phase trajectories correspond to the degenerate case of
system (7.2.2). In this case, the biological system considered contains only
one population.
If $u = u_m > 0$, then the dynamics of system (7.2.3) substantially depends on $u_m$. For example, if $0 < u_m < 1$, then the periodic character of solutions of system (7.2.3) is preserved (just as in the case $u = 0$), while the center of the family of phase trajectories moves to the point $(x = 1, y = 1 - u_m)$. For $u_m \ge 1$ the solution of system (7.2.3) is aperiodic. In the special case $u_m = 1$, Eq. (7.2.5) can easily be solved, and the phase trajectories of system (7.2.3) can be written out explicitly. For $u_m > 1$ Eq. (7.2.5) has a unique singular point $(x = 0, y = 0)$, and this equilibrium state is globally asymptotically stable.$^3$
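The qualitative picture just described can be checked numerically. The sketch below integrates a normalized system of the form $\dot x = x(1 - y - u)$, $\dot y = b\,y(x - 1)$, which is an assumption inferred from the equilibria mentioned in the text ($(1, 1)$ for $u = 0$, $(1, 1 - u_m)$ for $0 < u_m < 1$, decay to the origin for $u_m > 1$).

```python
import numpy as np

def rhs(state, u, b=1.0):
    # normalized predator-prey system; this form is inferred from the
    # equilibria described in the text and is an assumption
    x, y = state
    return np.array([x * (1.0 - y - u), b * y * (x - 1.0)])

def rk4_path(x0, y0, u, T=30.0, dt=1e-2, b=1.0):
    s = np.array([x0, y0], dtype=float)
    path = [s.copy()]
    for _ in range(int(T / dt)):
        k1 = rhs(s, u, b)
        k2 = rhs(s + 0.5 * dt * k1, u, b)
        k3 = rhs(s + 0.5 * dt * k2, u, b)
        k4 = rhs(s + dt * k3, u, b)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        path.append(s.copy())
    return np.array(path)

orbit_free = rk4_path(2.0, 1.0, u=0.0)   # closed orbit around (1, 1)
orbit_ctrl = rk4_path(2.0, 1.0, u=0.5)   # closed orbit around (1, 0.5)
decay      = rk4_path(2.0, 1.0, u=1.5)   # u_m > 1: decay to the origin
```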
Now let us formulate the goal of control for system (7.2.3). In many cases [90, 105] it is most desirable that system (7.2.3) be in equilibrium for $u = 0$, that is, the point $(x = 1, y = 1)$ is the most desirable state of system (7.2.3). In this case, one is interested in a control $u_* = u_*(x, y)$ that takes system (7.2.3) from any initial state $(x_0, y_0)$ to the point $(x = 1, y = 1)$ in minimum time. This problem was solved in [90]. Here we consider the problem of constructing a control $u_* = u_*(t, x, y)$ which, in general, does not guarantee that the system comes to the equilibrium point $(x = 1, y = 1)$ but ensures the minimum mean square deviation of the system phase trajectories from the state $(x = 1, y = 1)$ in a given time interval $0 \le t \le T$:
$^3$In this case the term "global" means that the trivial solution of system (7.2.3) is asymptotically stable for any initial values $(x_0, y_0)$ from the first quadrant of the phase plane.
Now we define the loss function (the functional of minimum future losses) by the relation
$$F(t, x, y) = \min_{0 \le u(\sigma) \le u_m,\; t \le \sigma \le T}\bigl[\,\cdots\,\bigr] \eqno(7.2.8)$$
and thus write the Bellman equation for problem (7.2.3), (7.2.4), (7.2.7) as
It follows from (7.2.10) that the optimal control is a relay type function, that is, at each time instant the control $u$ is either $u = 0$ or $u = u_m$ (this is a bang-bang control). If the loss function (7.2.8) is continuously differentiable with respect to $x$, then the control is switched from one value to the other each time the corresponding switching condition is satisfied.
$$\dot x(\sigma) = (1 - u)\, x(\sigma), \quad \sigma > t, \qquad x(t) = x.$$
$$\psi(t, x) = 2(T - t) - 2x\bigl[e^{T - t} - 1\bigr] + \frac{x^2}{2}\bigl[e^{2(T - t)} - 1\bigr], \qquad 0 \le x \le x_1.$$
On the interval $x_1 \le x \le x_2$,
$$\psi(t, x) = 2(T - t) - \frac{2x}{1 - u_m}\bigl[e^{(1 - u_m)(T - t)} - 1\bigr] + \frac{x^2}{2(1 - u_m)}\bigl[e^{2(1 - u_m)(T - t)} - 1\bigr].$$
One can readily see that the possible values of the root $z$ of Eq. (7.2.18) always lie in the region $1 \le z \le e^{(1 - u_m)(T - t)}$, and the boundary values $z = 1$ and $z = e^{(1 - u_m)(T - t)}$ correspond to the endpoints (7.2.15) of the interval $x_1 \le x \le x_2$. The optimal control $u_*$, which solves problem (7.2.14), depends on the variable $x(t) = x$ and is determined as follows:
(b) Let $u_m = 1$. In this case, for $u = u_m$ the coordinate $x(\sigma) = \mathrm{const}$, and problem (7.2.14) has the obvious solution
$$u_* = \begin{cases} 0, & x(\sigma) < 1, \\ u_m, & x(\sigma) \ge 1. \end{cases} \eqno(7.2.19)$$
The minimum value of the functional (7.2.14) can readily be calculated for the control (7.2.19), and as a result, for the desired function $\psi(t, x)$ we obtain the expression
$$\psi(t, x) = \begin{cases} 2(T - t) - 2x\bigl[e^{T - t} - 1\bigr] + \dfrac{x^2}{2}\bigl[e^{2(T - t)} - 1\bigr], & 0 \le x \le e^{-(T - t)}, \\[4pt] (T - t) - \ln x + 2x - x^2/2 - 3/2, & e^{-(T - t)} \le x \le 1, \\[4pt] (T - t)\,(2 - 2x + x^2), & x \ge 1. \end{cases} \eqno(7.2.20)$$
(c) Let $u_m > 1$. In this case the optimal control solving problem (7.2.14) coincides with (7.2.19),$^4$ where on the line $x(\sigma) = 1$ the control takes the intermediate value $u = 1$:
$$u_* = \begin{cases} 0, & x(\sigma) < 1, \\ 1, & x(\sigma) = 1, \\ u_m, & x(\sigma) > 1. \end{cases}$$
After some simple calculations, we obtain the corresponding expression (7.2.21) for $\psi(t, x)$.
$^4$Under this control we can realize the generalized solution in the sense of Filippov of the equation $\dot x(\sigma) = (1 - u_*)\, x(\sigma)$ (see [54] and §1.1).
Thus, to find the optimal control in the synthesis form that solves problem (7.2.3), (7.2.4), (7.2.7), we need to solve the boundary value problem (7.2.22) for the loss function $F(t, x, y)$, where $u_*$ has the form (7.2.10), $\chi(t, y)$ is given by formula (7.2.13), and the function $\psi(t, x)$ is given by expressions (7.2.16)–(7.2.18), (7.2.20), or (7.2.21), depending on the value of the maximum admissible control $u_m$. The boundary value problem (7.2.22) was solved numerically. The results obtained are given in Section 7.2.4.
7.2.3. Problem with infinite horizon. Stationary operating mode. Let us consider the control problem (7.2.3), (7.2.4), (7.2.7) on an infinite time interval (in this case the terminal time $T \to \infty$). If the optimal control $u_*(t, x, y)$ that solves problem (7.2.3), (7.2.4), (7.2.7) ensures the convergence of the functional (7.2.8) for any initial state ($x > 0$, $y > 0$) of the system, then, due to the time-invariance of Eqs. (7.2.3), the loss function (7.2.8) is also time-invariant, that is, $F(t, x, y) \to f(x, y)$, where the function $f(x, y)$ satisfies the equation
Since the gradient of the loss function is continuous on the switching line, that is, on the interface between the regions $R_0$ and $R_u$, we have (7.2.28). By using formulas (7.2.27), one can readily see that the condition (7.2.28) is satisfied along the two lines $y = x$ and $y = 2 - x$. To verify whether these lines (or some parts of them) are lines of the sliding mode, we need to consider the families of phase trajectories (that is, the solutions of Eq. (7.2.5)) for $u = 0$ and $u = u_m$ near these lines.
The corresponding analysis of the phase trajectories of system (7.2.3) shows that the sliding mode may take place along the straight line $y = x$ for $x < 1$ and along the line $y = 2 - x$ for $x > 1$. In the first case the representative point of system (7.2.3), once coming to the line $y = x$ ($x < 1$), moves along this line (due to the sliding mode) away from the equilibrium state $(x = 1, y = 1)$. On the other hand, along the line $y = 2 - x$ ($x > 1$), system (7.2.3) asymptotically, as $t \to \infty$, approaches the point $(x = 1, y = 1)$ due to the sliding mode. That is why only the straight line segment (7.2.29) can be considered as the switching line for the optimal control in the stationary operating mode.
If $u = u_m$, then the integral curve of Eq. (7.2.5) is tangent to the line $y = 2 - x$ at the endpoint $x^0$ of the segment (7.2.29). By using (7.2.5), we can write the tangency condition as
One can easily obtain a finite formula for the stationary loss function $f(x, y)$ along the switching line (7.2.29). By using the second equation in (7.2.3) and formula (7.2.29), we see that the coordinate $y(t)$ is governed by the differential equation
$$\dot y = b\,(y - y^2) \eqno(7.2.32)$$
while moving along the straight line (7.2.29). By integrating (7.2.32) with the initial condition $y(0) = y$, we obtain
$$y(t) = \frac{y\, e^{bt}}{1 - y + y\, e^{bt}}$$
and define the grid function $F_{ij}^k$ that approximates the desired continuous solution $F(t, x, y)$ of Eq. (7.2.22) at the nodes of the grid $(x_i, y_j, t_k)$. The values of the grid function $F_{ij}^k$ at the nodes of the grid (7.2.35) are related to each other by algebraic equations obtained by the difference approximation of the Bellman equation (7.2.22). In what follows, we use well-known methods for constructing difference schemes [60, 135, 162, 163]; therefore, here we restrict our consideration to a formal description of the difference equations used for solving Eq. (7.2.22) numerically. We stress that the problems of approximation accuracy and stability and of the convergence of the grid function $F_{ij}^k$ to the exact solution $F(t, x, y)$ of Eq. (7.2.22) as $h_x, h_y, \tau \to 0$ are studied in detail in [49, 53, 135, 162, 163, 179].
Just as in §7.1, by using the alternating direction method [163], we replace the two-dimensional (with respect to the phase variables) equation (7.2.22) by the pair of one-dimensional equations (7.2.36) and (7.2.37). For Eq. (7.2.36) we used a difference approximation in which the steps $h_x$ and $\tau$ satisfy the condition $\tau |r_x| \le h_x$ for all $r_x$ on the grid $\omega$.
For Eq. (7.2.37) we used the difference approximation
where the steps $h_y$ and $\tau$ are specified by the condition $\tau |r_y| \le h_y$ for all $r_y$ on the grid (7.2.35).
The grid functions for the initial Bellman equation (7.2.22) and for the auxiliary equations (7.2.36), (7.2.37) are related as $F_{ij}^k = v_{ij}^k$ and $v_{ij}^{k - 0.5} = \widetilde v_{ij}^{k - 0.5} = F_{ij}^{k - 0.5}$.
The grid functions are calculated backwards over the time layers (numbered by $k$), from $k = N$ to an arbitrary number $0 \le k < N$. The grid function $F_{ij}^k$ approximates the loss function $F(T - k\tau, ih_x, jh_y)$ corresponding to Eq. (7.2.22).
To obtain the unknown values of the grid functions $v_{ij}^k$ and $\widetilde v_{ij}^k$ uniquely from the algebraic equations (7.2.38) and (7.2.39), in view of (7.2.22), we need to complete these equations with the zero "initial" conditions
reverse time $\rho = 3.5$, 6.0, 12.0. Figure 64 illustrates the variation of the
loss function F ( t , x, y) along a part of the line (7.2.29) for different time
moments. The dotted line in Fig. 64 shows the stationary loss function
(7.2.34).
Figures 61–64 show that the results of the numerical solution of Eq. (7.2.22) (and of the synthesis problem) as $\rho \to \infty$ allow us to study the passage to the stationary control of population sizes. Moreover, these data confirm the results of the theoretical analysis of the stationary mode carried out in Section 7.2.3.
We also point out that the nonstationary $u_*(t, x, y)$ and the stationary $u_*(x, y) = \lim_{\rho \to \infty} u_*(t, x, y)$ algorithms of optimal control, obtained by solving the Bellman equation (7.2.22) numerically, were used for the numerical simulation of transient processes in system (7.2.3) when the comparative analysis of different control algorithms was carried out. The results of this simulation and comparative analysis were discussed in §5.2.
CONCLUSION
Design methods that use the frequency approach to the analysis and synthesis of control systems [119–121, 146, 147] are widely applied in modern control engineering. Based on such notions as the transfer functions of open- or closed-loop systems, these methods allow one to evaluate the control quality by the position of zeros and poles of these transfer functions in the frequency domain. The frequency methods are very illustrative and effective in studying linear feedback control systems.
As for methods of calculating optimal (suboptimal) control algorithms in the state space, such as those presented in this book, modern engineering most frequently deals with results obtained by solving problems of linear quadratic optimization, which lead to linear optimal control systems.
Linear quadratic problems of optimal control have by now been studied comprehensively, and the literature on this subject is quite extensive; therefore these problems are only briefly outlined here. It should be noted that the practical realization of linear optimal systems often involves difficulties, as one needs to solve the matrix Riccati equation and to use the solution of this equation in real time. These problems are discussed in [47, 126, 134, 149, 150].
It is well known that a large number of practically important problems of optimal control cannot be reduced to linear quadratic problems. In particular, this is true for control problems in which constraints imposed on the values of the admissible control play an important role. Despite their practical importance, there is currently no universal approach to solving such constrained optimal control problems in a form that ensures a simple technical realization of the optimal control algorithm. The author hopes that the results obtained in this book will help to develop new engineering methods for solving such problems by using constructive methods for solving the Bellman equations.
Some remarks concerning the prospects for solving applied problems of
optimal control on the basis of the dynamic programming approach should
be made.
The existing methods of optimal control synthesis could be categorized
as exact, approximate analytic, and numerical. If a synthesis problem can
$^1$A similar situation arises in the search for Liapunov functions in the theory of stability [1, 29, 125, 129]. This fact was pointed out by T. Burton [29, p. 166]: ". . . Beyond any doubt, construction of Liapunov functions is an art."
totic methods for solving the Bellman equations one of the most promising
trends in the engineering design of optimal control systems.
Another important branch of applied methods for solving problems of optimal control is the development of numerical methods for solving the Bellman equations (and synthesis problems). This field has recently received much attention [10, 31, 48, 49, 53, 86, 104, 169]. The main benefit of numerical synthesis methods is their high universality. It is worth noting that numerical methods also play an important role in problems of evaluating the performance index of quasioptimal control algorithms calculated by other methods. Currently, the widespread use of numerical synthesis methods in modern engineering is somewhat hampered by the following two factors: (i) the approximation properties of discrete schemes for solving some classes of Bellman equations still remain to be rigorously mathematically justified, and (ii) the calculation of grid functions requires a great number of operations. All this makes it difficult to solve control problems of higher dimension and those with unbounded phase space. However, one must not consider these facts as an obstacle to using numerical methods in engineering. Recent developments in numerical methods for solving the Bellman equations and in the decomposition of multidimensional problems [31], continuous advances in parallel computing, and the progress in computer technology itself suggest that the numerical methods for the synthesis of optimal systems will soon become a regular tool for all those dealing with the design of actual control systems.
INDEX

Adaptive problems of optimal control, 9
A posteriori covariances, 90
A posteriori mean values, 91
Asymptotic series, 220
Asymptotic synthesis method, 248
Constraints, control, 17
  on control resources, 17
  on phase variables, 18
Control problem with infinite horizon, 343
Control program, 2
Control of relay type, 105, 111
Controller, 1, 7
Cost function (functional), 49
Covariance matrix, 147