
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 21, No. 4, APRIL 1977

Forward Differential Dynamic Programming


K. WIT¹

Communicated by S. E. Dreyfus

Abstract. The dynamic programming formulation of the forward principle of optimality in the solution of optimal control problems results in a
partial differential equation with initial boundary condition whose
solution is independent of terminal cost and terminal constraints. Based
on this property, two computational algorithms are described. The
first-order algorithm with minimum computer storage requirements
uses only integration of a system of differential equations with specified
initial conditions and numerical minimization in finite-dimensional
space. The second-order algorithm is based on the differential dynamic
programming approach. Either of the two algorithms may be used for
problems with nondifferentiable terminal cost or terminal constraints,
and the solution of problems with complicated terminal conditions (e.g.,
with free terminal time) is greatly simplified.

Key Words. Optimal control, forward dynamic programming, differential dynamic programming.

1. Introduction

Unlike the numerical integration of ordinary differential equations with specified initial conditions and the numerical minimization of functions, for which efficient computational methods are available, the solution of two-point boundary-value problems (TPBVP) still presents great difficulties in the development of computational methods for optimal control problems. By embedding a given problem into a more general class
of similar problems, dynamic programming (DP) avoids the tedious solution
of TPBVP, but it is not generally computationally feasible. All other
methods based on the backward principle of optimality invariably require
solution of a TPBVP for which some analyticity of terminal boundary
conditions is always essential. By redefining the principle of optimality in the

¹ Lecturer, Department of Electrical Engineering, University of Salford, Lancashire, England.

forward way, it is possible to separate the functional minimization from the specified terminal conditions, thus eliminating the need for their analyticity.
Nondifferentiable terminal cost or terminal constraints may appear quite
often in the solution of practical problems, as illustrated by examples, and
they are fairly common in econometrics and optimum planning. Even
optimal control with unspecified terminal time poses a rather difficult
problem for which new methods continuously emerge in the control litera-
ture. Problems in which the time of the first arrival at the target set may not
be optimal, as illustrated in Ref. 1, have hardly been tackled. All that these
problems have in common is some complexity of their terminal conditions.
Using the approach developed in this paper, this complexity is traded for
more or less complicated constraints in the numerical minimization of nonlinear
programming (NP) for which a number of reliable and efficient algorithms
are available. At the same time, as a consequence of the forward formulation
of the principle of optimality, all boundary conditions are defined at the
initial time and the corresponding differential equations may therefore be
readily solved.

2. Problem Statement

We shall consider an n-dimensional dynamic system

$$\dot{x}(t) = f(x, u, t), \qquad (1)$$

with given initial conditions

$$x(t_0) = x_0. \qquad (2)$$
The problem is to find a piecewise-continuous vector function u(t) which belongs to a closed control set U(t) for t ∈ [t0, tf] and minimizes the following cost functional:

$$F(x(t), t) + \int_{t_0}^{t_f} L(x, u, \tau)\, d\tau, \qquad t \ge t_f. \qquad (3)$$

The corresponding optimal state-space trajectory is required to satisfy a set of terminal constraints

$$\Psi(x(t), t) = 0, \qquad t \ge t_f, \qquad (4)$$
$$\Phi(x(t), t) \le 0, \qquad t \ge t_f. \qquad (5)$$

It is assumed that the control of system (1) finishes at a terminal time tf, which may or may not be specified in advance, so that any x(t), t > tf, needed in (3), (4), or (5) can be determined from (1) with u(t) = 0, t > tf.

The terminal cost F in (3) and the terminal constraints (4) and (5) are not required to be differentiable. Some possible examples of nondifferentiable terminal conditions are

$$F = \max_{t \ge t_f} |x_i(t)|, \qquad (6)$$
$$\Psi = x_i(t), \qquad t = t_f + kT, \quad k = 0, 1, 2, \ldots, \quad T > 0, \qquad (7)$$
$$\Phi = |x_i(t)| - |x_i(t_0)|, \qquad t \ge t_f, \qquad (8)$$

where x_i(t), 1 ≤ i ≤ n, is the ith component of the state vector x(t). Such terminal conditions appear naturally in the optimal control of dynamic systems that are not in equilibrium at the terminal time tf, after which no further control is applied. The functions f and L are assumed to be sufficiently smooth in all arguments.
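A terminal cost like (6) can be evaluated numerically by simulating the free motion beyond tf and taking the peak. The sketch below is an illustration added here, not part of the paper; the helper name, the use of scipy, the scalar control, and the finite post-tf horizon are all assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def terminal_cost_max_abs(f, x_tf, t_f, t_horizon, i=0):
    """Evaluate F = max_{t >= tf} |x_i(t)| as in (6): integrate the free
    motion u = 0 from x(tf) over a finite horizon and take the peak of
    the ith state component.  Assumes a scalar control entering f."""
    free = lambda t, x: f(x, 0.0, t)          # u(t) = 0 for t > tf
    sol = solve_ivp(free, (t_f, t_horizon), x_tf, max_step=0.01)
    return float(np.max(np.abs(sol.y[i])))
```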

3. Forward Versus Backward Dynamic Programming

The conventional backward DP embeds the solution of the original problem with given initial condition x0 into a family of related problems with the same terminal conditions, viz., time tf, terminal cost, and terminal constraints, but with initial conditions in some subset X0 of the state space at time t0. For any particular x0 ∈ X0, the optimal policy and trajectory are therefore readily available. The computation proceeds backward in time from tf to t0. Therefore, the optimal cost function
$$V(x(t), t) = \min_{u \in U}\Big\{ F(x(t_f), t_f) + \int_{t}^{t_f} L(x, u, \tau)\, d\tau \Big\}, \qquad t \le t_f, \qquad (9)$$
$$V(x(t_f), t_f) = F(x(t_f), t_f), \qquad (10)$$
$$\Psi(x(t_f), t_f) = 0, \qquad (11)$$
$$\Phi(x(t_f), t_f) \le 0, \qquad (12)$$

depends on the terminal conditions.
The well-known dimensionality problem associated with a straightforward application of DP has led to the development of a number of optimization methods which invariably use some analytical information about the terminal conditions and result in a nonlinear TPBVP requiring a large computational effort and much computer core to solve.
In forward DP (Ref. 2), the embedding family is constructed with the terminal conditions ignored; it consists of state-space trajectories emanating from the given initial point x0, each of which is a candidate for the optimal trajectory. A particular one among them is recognized as optimal

after inclusion of the terminal conditions (terminal time, cost, and constraints). Thus, at any time t > t0, there is a trajectory passing through x(t) ∈ X(t) with minimum cost

$$S(x(t), t) = \min_{u \in U} \int_{t_0}^{t} L(x, u, \tau)\, d\tau, \qquad (13)$$
$$S(x_0, t_0) = 0, \qquad (14)$$

where the attainable set X(t) is the set of all possible solutions of (1) for admissible u(τ), τ ∈ [t0, t]. The optimal trajectory must satisfy

$$\min_{x(t_f),\, t_f}\ \min_{u \in U}\Big\{ F(x(t), t) + \int_{t_0}^{t_f} L(x, u, \tau)\, d\tau \Big\}, \qquad t \ge t_f, \qquad (15)$$

subject to (4) and (5) which, using (13) and noticing that F is independent of u, can be written as

$$\min_{x(t_f),\, t_f}\ \big\{ F(x(t), t) + S(x(t_f), t_f) \big\}, \qquad t \ge t_f, \qquad (16)$$

subject to (4) and (5).


It is seen that the functional minimization in (13) has been separated
from the (n + 1)-dimensional NP problem in (16). If the terminal time is
specified in advance, the corresponding minimization in (16) is omitted, and
the dimension of the NP problem is reduced to n.
The fact that the optimal cost function S is not dependent on the
terminal conditions allows one to use standard functional optimization
techniques also for problems with nondifferentiable terminal cost or con-
straints. The problems with free terminal time tf are also simplified. The price that one has to pay for this convenience is that the very desirable feedback nature of backward DP has been lost; i.e., given a new initial condition x0′ ≠ x0, no optimal policy is available unless the whole problem is re-solved for the new x0′. However, in most existing optimization algorithms, the embedding nature of backward DP has been sacrificed in any case for computational feasibility, by generating an open-loop optimal control for a particular x0. Applying these algorithms to deterministic systems will also yield the closed-loop optimal strategy. In the optimization of stochastic
systems, however, it is vital to preserve the closed-loop feature of optimal
control to allow for compensation of future disturbances. Also, in some
analytical developments (e.g., in the proof of optimality of a singular control,
Ref. 3), the forward principle of optimality as described above will not work.
There are, however, many situations in which it may greatly simplify the
solution (e.g., in problems of economic planning with limited equipment
age, Ref. 4) or even yield solutions to problems which cannot be solved
otherwise.

From these comments, it should be clear that forward DP is not merely backward DP applied in reverse time to a problem with initial and terminal conditions interchanged, as is sometimes erroneously assumed.

4. Optimum Cost Function

Before the NP problem (16) can be solved, it is necessary to have a means of generating the embedding family of trajectories starting at x0 and having the minimum cost (13) for any reachable x(t) ∈ X(t). On application of the forward principle of optimality, it is easy to derive the following partial differential equation for the optimal cost function in (13):

$$\partial S/\partial t = \min_{u \in U} H(x, u, S_x, t), \qquad (17)$$
$$S(x(t_0), b, t_0) = \langle b,\, x(t_0) - x_0 \rangle, \qquad (18)$$


with the Hamiltonian defined by

$$H(x, u, S_x, t) = L(x, u, t) - \langle S_x,\, f(x, u, t) \rangle, \qquad (19)$$

where S_x = ∂S/∂x is the gradient of S and b is a vector Lagrange multiplier. Eq. (17) resembles the well-known Hamilton-Jacobi-Bellman equation of backward optimization and differs from it only by a sign and by the time at which the boundary condition (18) is defined.
Because the direct numerical solution of Eq. (17) presents the same
difficulties as that of its backward counterpart, we will seek some feasible
methods of generating trajectories satisfying (13). For this, the well-
established methods of solution of the Bellman equation may be used as a
helpful guide. In particular, we shall examine two approaches: (a) solution of
hyperbolic partial differential equations by the method of characteristics;
and (b) second-order approximation to the optimal cost function in (13). In
order to unify the derivations, we shall use the method of differential
dynamic programming (DDP, Ref. 5), which has been applied successfully
to a number of optimization problems.

5. Forward Minimum Principle

Let us assume that a control u*(t), t ∈ [t0, tf], and the corresponding trajectory x*(t), x*(t0) = x0, satisfying (13), are available. Then, Eq. (17) becomes

$$\partial S(x^*, t)/\partial t = H(x^*, u^*, S_x, t), \qquad S(x_0, t_0) = 0. \qquad (20)$$

For trajectories originating at x0, the optimal cost function is independent of b. If a small control variation δu is now introduced, so that u* + δu ∈ U, the new trajectory satisfies

$$\partial S(x^* + \delta x, t)/\partial t = H(x^* + \delta x,\, u^* + \delta u,\, S_x + S_{xx}\,\delta x,\, t). \qquad (21)$$

By a small variation δu we mean a variation for which S_x + S_xx δx is a good approximation to S_x(x* + δx, t). Expanding Eq. (21) up to the first-order terms, using (20), and noticing that either H_u = 0 for u* interior to U or δu = 0 for u* on the boundary of U, we obtain the adjoint equation

$$\dot{S}_x(t) = (\partial H/\partial x)(x^*, u^*, S_x, t), \qquad (22)$$

with the initial condition

$$S_x(t_0) = b. \qquad (23)$$
From (1) and (19), it follows that

$$\dot{x}^*(t) = -(\partial H/\partial S_x)(x^*, u^*, S_x, t), \qquad x^*(t_0) = x_0, \qquad (24)$$

which is (1) evaluated along the optimal trajectory. Equations (22) and (24) are clearly necessary conditions which the control u* minimizing the Hamiltonian in (17), as well as the integral in (13), must satisfy. They play the same role in what we can call the forward minimum principle as the canonical equations in the backward minimum principle of Pontryagin. For any given b, Eqs. (22) and (24) can be easily integrated forward, with u* determined by a constrained minimization of the Hamiltonian (19) with respect to u ∈ U, and the optimal cost can be computed as

$$S(x^*(t), t) = \int_{t_0}^{t} L(x^*, u^*, \tau)\, d\tau. \qquad (25)$$
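As a minimal sketch of this step (added here for illustration; the helper names, the use of scipy, and the scalar control are assumptions, not part of the paper), the canonical system (22), (24) can be integrated forward for a given b while u* is found by numerical minimization of the Hamiltonian (19) at each evaluation:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

def integrate_canonical(f, L, H_x, x0, b, t0, tf, u_bounds):
    """Integrate the state (24) and adjoint (22) forward from t0, choosing
    u* at each evaluation by minimizing the Hamiltonian (19) over U.
    Returns x*(tf), S_x(tf), and the accumulated cost S in (25).
    Assumes a scalar control; H_x(x, u, Sx, t) supplies dH/dx for (22)."""
    n = len(x0)

    def u_star(x, Sx, t):
        # H = L - <Sx, f>; constrained minimization over U = [lo, hi]
        res = minimize_scalar(lambda u: L(x, u, t) - Sx @ f(x, u, t),
                              bounds=u_bounds, method="bounded")
        return res.x

    def rhs(t, y):
        x, Sx = y[:n], y[n:2 * n]
        u = u_star(x, Sx, t)
        # state (24), adjoint (22), and running cost L for (25)
        return np.concatenate([f(x, u, t), H_x(x, u, Sx, t), [L(x, u, t)]])

    y0 = np.concatenate([x0, b, [0.0]])    # S(x0, t0) = 0 by (14)
    sol = solve_ivp(rhs, (t0, tf), y0, max_step=0.05)
    return sol.y[:n, -1], sol.y[n:2 * n, -1], sol.y[2 * n, -1]
```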

By varying b in (23), we are able to generate the desired family of candidate trajectories with x*(tf) ∈ X(tf). Thus, it is possible to replace the minimization over x(tf) in (16) by the minimization over b. This gives the following equivalent problem:

$$\min_{b,\, t_f}\Big\{ F(x^*(t), t) + \int_{t_0}^{t_f} L(x^*, u^*, \tau)\, d\tau \Big\}, \qquad t \ge t_f, \qquad (26\text{-}1)$$

subject to

$$\Psi(x^*(t), t) = 0, \qquad t \ge t_f, \qquad (26\text{-}2)$$
$$\Phi(x^*(t), t) \le 0, \qquad t \ge t_f. \qquad (26\text{-}3)$$
Accepting the open-loop strategy from the very beginning, we have reduced the TPBVP to an initial-value problem which is much easier to solve, at the expense of a constrained minimization in the (n + 1)-dimensional space (b, tf).
Integration of the adjoint equation (22) in the unstable direction may
pose numerical problems similar to those encountered with the shooting
methods often used to solve TPBVPs. In spite of some similarity between
these two approaches, the new algorithm works directly with the value of the
terminal cost function and the terminal constraints instead of their deriva-
tives. Thus, it appears to be more straightforward; it overcomes the problem
of sensitivity of the optimal cost to the accuracy to which the TPBVP is
solved and, as will be shown by examples, it is applicable to a broader class of
problems.

6. Computational Algorithm

Using a direct-search numerical method of NP, the computation proceeds as follows.
Step 1. For a particular b and tf specified by the minimization routine, integrate the 2n equations (22) and (24) from t0 to tf, with u* determined by either analytical minimization or, if necessary, numerical minimization of the Hamiltonian (19).
Step 2. Compute S(x*(tf), tf) in (25).
Step 3. Integrate (1) from tf forward with u(t) ≡ 0, t > tf, to obtain x*(t) needed in F, Ψ, and Φ. Some degree of familiarity with the problem is necessary in order to determine how far to continue the integration.
Step 4. Evaluate the total cost in (26-1) and the constraints (26-2) and
(26-3) as required by the minimization routine.
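A sketch of Steps 1-4 follows, reusing the integrate_canonical sketch of Section 5. The quadratic-penalty treatment of (26-2) and the weight rho are one possible choice, not the paper's prescription, which leaves the NP method open.

```python
import numpy as np
from scipy.optimize import minimize

def make_total_cost(f, L, H_x, F, psi, x0, t0, u_bounds, rho=100.0):
    """Build the objective of the NP problem (26), i.e. Steps 1-4, for
    each (b, tf) proposed by the direct-search routine."""
    def total_cost(z):
        b, tf = z[:-1], max(z[-1], 1e-3)
        # Steps 1-2: integrate (22), (24) forward and accumulate (25)
        x_tf, _, S = integrate_canonical(f, L, H_x, x0, b, t0, tf, u_bounds)
        # Step 3 would continue (1) with u = 0 where F or psi need x(t), t > tf
        # Step 4: total cost (26-1) plus the penalty for (26-2)
        return S + F(x_tf, tf) + rho * np.sum(np.atleast_1d(psi(x_tf, tf)) ** 2)
    return total_cost

# Direct search in the (n + 1)-dimensional (b, tf)-space, e.g. by simplex:
# res = minimize(make_total_cost(f, L, H_x, F, psi, x0, t0, (-0.1, 0.1)),
#                z_init, method="Nelder-Mead")
```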
If the terminal cost F is independent of t > tf [i.e., if t = tf applies in (26-1), (26-2), and (26-3)], it is advantageous to write the minimization in (26-1) separately, namely,

$$\min_{b,\, t_f} \{\,\cdot\,\} = \min_{b}\ \big\{ \min_{t_f} [\,\cdot\,] \big\},$$

and to simplify the algorithm as follows. Introducing a penalty function P for the constraints (26-2) and (26-3), integrate (22) and (24) from t0 onward as long as

$$P(x^*(t), t) + F(x^*(t), t) + \int_{t_0}^{t} L(x^*, u^*, \tau)\, d\tau \qquad (27)$$

is decreasing. The problem has thus been reduced to an n-dimensional unconstrained minimization in the space of adjoint initial conditions (23).
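The inner minimization over tf can be sketched as follows; step and cost_27 are hypothetical helpers (one integration step of (22), (24), and the value of (27)), named here only for illustration.

```python
def minimize_over_tf(step, cost_27, t0, dt, t_max):
    """Inner minimization over tf for fixed b: advance the integration of
    (22), (24) one step at a time and stop as soon as the penalized total
    (27) stops decreasing."""
    t, best = t0, float("inf")
    while t < t_max:
        step(t, dt)                 # advance x*, S_x, and the integral by dt
        c = cost_27(t + dt)
        if c >= best:
            return t, best          # (27) stopped decreasing: take tf = t
        t, best = t + dt, c
    return t, best
```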

Fig. 1. Optimal control of a crane: (a) moving crane; (b) moving crab. (Crab speed u = ±0.1 m/sec; rope length R = 2 m.)

Example 6.1. Optimal Pendulum Control. A load is suspended from an overhead crane crab as shown in Fig. 1a. The crab and the load are at rest with respect to the crane, which is moving at a constant speed v. If the crane suddenly stops, the load starts to swing as in Fig. 1b. A bang-bang policy of the crab speed u is to be found which will quickly reduce the amplitude of the load swings without excessive movement of the crab. It is required that the crab eventually be positioned at the same point on the crane as before it started to move.
The system is described by the equations

$$\dot{x}_1 = x_2, \qquad x_1(0) = 0, \qquad (28)$$
$$\dot{x}_2 = -\frac{g(x_1 - x_3)\sqrt{R^2 - (x_1 - x_3)^2}}{R^2} - \frac{(x_1 - x_3)(x_2 - u)^2}{R^2 - (x_1 - x_3)^2}, \qquad x_2(0) = v, \qquad (29)$$
$$\dot{x}_3 = u, \qquad x_3(0) = 0, \qquad (30)$$

where x1 and x2 are the position and velocity of the load, x3 is the position of the crab, and g = 9.81 m·sec⁻² is the acceleration due to gravity.

In mathematical form, the above requirements may be written as the following minimization problem:

$$\min_{u \in U,\, t_f}\Big\{ \max_{t \ge t_f} |x_1(t)| + \tfrac{1}{2}\int_0^{t_f} x_1^2(\tau)\, d\tau \Big\}, \qquad (31)$$

subject to

$$x_3(t_f) = 0. \qquad (32)$$

Fig. 2. Local minima. (Cost along the tf-axis; the global minimum is marked.)
The Hamiltonian

$$H = \tfrac{1}{2}x_1^2 - \sum_{i=1}^{3} S_{x_i}\dot{x}_i \qquad (33)$$

is minimized for

$$u^* = 0.1\,\mathrm{sign}\big\{ 2S_{x_2}\big[x_2(x_1 - x_3)/(R^2 - (x_1 - x_3)^2)\big] + S_{x_3} \big\}. \qquad (34)$$
The penalty term 100|x3(tf)| was added to the cost functional (31). A four-dimensional simplex minimization and a fourth-order Runge-Kutta integration method with step 0.02 were used. From the numerical experiments, the minimized function in (b, tf)-space was found to have a number of local minima along the tf-axis, located approximately half the period of the pendulum oscillations apart (Fig. 2). The global minimum of 7.601 was found for tf = 5.58 and

F °°°11
b=/-8.902 /
L-0.175_1
The optimal strategy and the corresponding phase diagram in the x1x2-plane are given in Fig. 3.
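A minimal sketch of the model and control law in code, assuming the forms of (29) and (34) shown above; the function names are illustrative and the outer simplex search over (b, tf) proceeds as in Section 6.

```python
import numpy as np

g, R, U_MAX = 9.81, 2.0, 0.1

def crane_f(x, u, t):
    """Crane dynamics (28)-(30), using the form of (29) given above."""
    d = x[0] - x[2]
    x2dot = (-g * d * np.sqrt(R**2 - d**2) / R**2
             - d * (x[1] - u)**2 / (R**2 - d**2))
    return np.array([x[1], x2dot, u])

def u_star(x, Sx):
    """Bang-bang law (34): the crab speed switches on the sign of the
    switching function."""
    d = x[0] - x[2]
    switch = 2.0 * Sx[1] * x[1] * d / (R**2 - d**2) + Sx[2]
    return U_MAX * np.sign(switch)
```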

Example 6.2. Optimal Shutdown of a Nuclear Reactor. It is required to shut down a xenon-dominated nuclear reactor in minimum time in such a way that a restart is always possible. The fission reactions are described by (Ref. 6)

$$\dot{x}_1 = -x_1 + u, \qquad x_1(0) = 1, \qquad (35)$$
$$\dot{x}_2 = ax_1 + bx_2 + (c + dx_2)u, \qquad x_2(0) = 1, \qquad (36)$$

Fig. 3. Phase trajectory and optimal control.

where x1 and x2 are the normalized concentrations of I-135 and Xe-135, respectively, u is a normalized neutron flux, and a, b, c, and d are constants. The neutron flux is limited by the amount of fuel available, that is,

$$u \le u_{\max}; \qquad (37)$$

and the xenon concentration must not overpoison the reactor, that is,

$$x_2(t) \le x_{\max}, \qquad t \ge 0. \qquad (38)$$

Fig. 4. Optimal shutdown policy. (Flux normalized to 6 × 10¹³ neutrons/cm²·sec; u_max = 2.)



If tf is the time at which the shutdown process is completed, i.e.,

u(t) ≡ 0, t > tf,

then, for t ≥ tf, (38) is a terminal constraint of the form (5). For t < tf, however, (38) is a phase-space constraint not considered in our general development.
The Hamiltonian is found to be

$$H = 1 + S_{x_1}x_1 - S_{x_2}(ax_1 + bx_2) - u\big[S_{x_1} + S_{x_2}(c + dx_2)\big], \qquad (39)$$

and the adjoint equations are

$$\dot{S}_{x_1} = S_{x_1} - aS_{x_2}, \qquad (40)$$
$$\dot{S}_{x_2} = -(b + du)S_{x_2}. \qquad (41)$$

Let the solution of (35) and (36) for which equality holds in (38) be called a constrained arc, otherwise an unconstrained arc. The beginning of a constrained arc is called an entry point. On an unconstrained arc, the optimal control (Fig. 4) is given by

$$u^* = \begin{cases} 0, & S_{x_1} + S_{x_2}(c + dx_2) < 0, \\ u_{\max}, & S_{x_1} + S_{x_2}(c + dx_2) > 0; \end{cases} \qquad (42)$$

and, on a constrained arc,

$$u^* = u_c = -(ax_1 + bx_{\max})/(c + dx_{\max}). \qquad (43)$$
Generally, the unconstrained arc has to satisfy some additional tangency constraints at the entry point. From the special nature of this problem, it was found by means of a geometrical argument that only one constrained arc exists and that the adjoint variables at the entry point Q must satisfy (Ref. 7)

$$S_{x_1}(t_q) = 1/(u_c - x_1), \qquad (44)$$
$$S_{x_2}(t_q) = -1/\big[(u_c - x_1)(c + dx_{\max})\big]. \qquad (45)$$
Thus, due to the separability of the optimal arcs, the first unconstrained arc
OQ in Fig. 5 is obtained by solving the TPBVP (35), (36), (40), (41). It turns

Fig. 5. Phase trajectory of optimal shutdown process.

out that

u*(t) = 0, 0 < t < t_q,

so that no TPBVP need in fact be solved.
One approach to solving the problem is to choose the time t_r at which to leave the constrained arc (i.e., to switch u_c to u_max) and the time tf, in order to minimize tf subject to (38) for t ≥ tf. Another approach is obtained by noting that, once control has ceased, the optimal trajectory is the unique trajectory of the free motion

$$\dot{x}_1 = -x_1, \qquad (46)$$
$$\dot{x}_2 = ax_1 + bx_2, \qquad (47)$$

passing through the origin and tangential to the line x2 = x_max at P. A fairly elementary calculation based on this consideration gives the coordinates of P as

$$x_1(t_p) = -(b/a)x_{\max}, \qquad (48)$$
$$x_2(t_p) = x_{\max}. \qquad (49)$$

The arc PFG can now be obtained by integrating (46)-(47) backward in time from the point P given by (48)-(49). The problem is, therefore, reduced to choosing t_r in order to minimize the time tf. This approach requires no more than the integration of two equations and a one-dimensional minimization, as sketched below.
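A sketch of this reduced computation; the constants a, b, x_max below are illustrative placeholders, since the paper does not list numerical values.

```python
from scipy.integrate import solve_ivp

# Illustrative constants only; the paper does not give numerical values.
a, b, x_max = 0.9, -0.3, 2.0

def free_motion(t, x):
    """Free motion (46)-(47) after control has ceased."""
    return [-x[0], a * x[0] + b * x[1]]

# Tangency point P from (48)-(49).
xP = [-(b / a) * x_max, x_max]

# The arc through P is obtained by integrating (46)-(47) backward in time;
# what remains is a one-dimensional search over the departure time t_r
# from the constrained arc.
arc = solve_ivp(free_motion, (0.0, -10.0), xP, max_step=0.01)
```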
It is to be pointed out that the solutions of similar problems without the constraint (38) were obtained in Ref. 6 by backward procedures only because of the bilinear form of Eq. (36). For u = 0, this equation becomes linear as in (47), with a known analytical solution, from which

$$F(x_1(t_f), x_2(t_f)) = \max_{t \ge t_f} x_2(t) \qquad (50)$$

needed in (10), as well as the boundary values V_x1(tf) and V_x2(tf) of the adjoint variables, can be found analytically. Using the forward approach, it was possible to obtain the solution of a more general case with the state-space constraint (38) without exploiting the special bilinear structure of the system equations.

7. Second-Order Expansion

Using the method of DDP (Ref. 5), we will derive a second-order algorithm from the forward principle of optimality. Because the system equation (1) will be integrated backward in time, we have to ensure

satisfaction of the initial condition (2). This will be achieved by adjoining it to (3) by a constant Lagrange vector multiplier b. The new optimal cost function depends on b and is written as

$$W(x(t), b, t) = \min_{u}\Big\{ \langle b,\, x(t_0) - x_0 \rangle + \int_{t_0}^{t} L(x, u, \tau)\, d\tau \Big\}. \qquad (51)$$

From the forward principle of optimality, it must satisfy

$$\partial W/\partial t = \min_{u} G(x, u, W_x, t), \qquad (52)$$
$$W(x(t_0), b, t_0) = \langle b,\, x(t_0) - x_0 \rangle, \qquad (53)$$

where the Hamiltonian is given by

$$G(x, u, W_x, t) = L(x, u, t) - \langle W_x,\, f(x, u, t) \rangle. \qquad (54)$$

Similarly to (16), the optimal trajectory must yield

$$\min_{x(t_f),\, t_f}\ \big\{ F(x(t), t) + W(x(t_f), t_f) \big\}, \qquad t \ge t_f. \qquad (55)$$

The control vector u is assumed to be unconstrained, for reasons that will become obvious later.

Proceeding in a similar way to Ref. 5, it is assumed that a nominal multiplier b̄ and a nominal trajectory x̄, not necessarily passing through x0, are available for t ∈ [t0, tf]. If tf is not specified explicitly, t ∈ [t0, t̄f] for some nominal time t̄f > t0. The nominal trajectory is assumed to satisfy the terminal constraints (4) and (5) at time t̄f. This is easily achieved by integrating (1) on the first run backward in time from a feasible point. The constraints are never violated from then on.
The minimization of G along x̄(t) gives the necessary condition

$$G_u(\bar{x}, u^*, W_x, t) = 0,$$

where u* is the minimizing control. Henceforward, it is assumed that G_uu(x̄, u*, W_x, t) is a positive-definite matrix. On introduction of variations δx, δb, and δu, Eq. (52) is expanded up to the second order about x̄, u*, and b̄. The control variation is chosen in the form

$$\delta u = \beta_1\,\delta x.$$


This is not to be interpreted as a local linearized feedback which, in the
backward formulation, determines the control that should be applied in
order to maintain optimality along 2 +6x. Here, 6u rather determines a
trajectory which should have been followed in order to satisfy Eq. (51).

On collection of the corresponding terms of the expanded equation, one obtains the following set of differential equations:

$$\dot{a} = G(\bar{x}, \bar{u}, W_x, t) - G, \qquad a(t_0) = 0, \qquad (56)$$
$$\dot{W}_x = G_x + W_{xx}\big[f(\bar{x}, \bar{u}, t) - f\big], \qquad W_x(t_0) = b, \qquad (57)$$
$$\dot{W}_b = W_{bx}\big[f(\bar{x}, \bar{u}, t) - f\big], \qquad W_b(t_0) = \bar{x}(t_0) - x_0, \qquad (58)$$
$$\dot{W}_{xb} = -(f_x + f_u\beta_1)^T W_{xb}, \qquad W_{xb}(t_0) = I, \qquad (59)$$
$$\dot{W}_{bb} = -W_{xb}^T f_u G_{uu}^{-1} f_u^T W_{xb}, \qquad W_{bb}(t_0) = 0, \qquad (60)$$
$$\dot{W}_{xx} = G_{xx} - W_{xx}f_x - f_x^T W_{xx} - (G_{ux} - f_u^T W_{xx})^T G_{uu}^{-1}(G_{ux} - f_u^T W_{xx}), \qquad W_{xx}(t_0) = 0, \qquad (61)$$

where

$$\beta_1 = G_{uu}^{-1}\big(f_u^T W_{xx} - G_{ux}\big).$$
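One way to see where β₁ comes from (a brief sketch added here for clarity, not spelled out in the paper): collect the terms of the expansion of (52) that are quadratic in δu and bilinear in δu, δx, and minimize over δu under the assumption G_uu > 0:

```latex
\min_{\delta u}\Big\{\tfrac12\,\delta u^{T}G_{uu}\,\delta u
  + \delta u^{T}\big(G_{ux}-f_u^{T}W_{xx}\big)\delta x\Big\}
\quad\Longrightarrow\quad
\delta u = G_{uu}^{-1}\big(f_u^{T}W_{xx}-G_{ux}\big)\delta x
         = \beta_1\,\delta x .
```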

Unless otherwise stated, all quantities are evaluated at x̄, b̄, u*. a(t) is the predicted reduction of W(x̄, b̄, t), provided that the new control

u = u* + β₁ δx

is applied to (1) backward in time from x̄(t).

8. Computational Algorithm

The first four steps of the new algorithm are similar to those of Jacobson (Ref. 5), with the opposite direction of integration. As a result of their completion, a candidate x̄(t), t ∈ [t0, t̄f], for an optimum trajectory satisfying both the initial and the terminal constraints is obtained, along which a ≡ 0.
The last step requires numerical minimization in (55). The values of the optimal cost function are determined from its expansion

$$W(\bar{x} + \delta x,\, \bar{b},\, \bar{t}_f + \delta t_f) = W + \langle W_x, \delta x \rangle + \tfrac{1}{2}\langle \delta x,\, W_{xx}\,\delta x \rangle \qquad (62)$$

at time t̄f + δtf, which is known for δtf ≤ 0. If δtf > 0, one continues the integration of (1), (57), and (61) from t̄f to t̄f + δtf, while minimizing G to obtain u*. The variation δtf need not be small, since no time expansion is used in (62). To limit the size of δx, it may be necessary to take only a size-limited step in a decreasing direction δx_d of the minimized function in (55), rather than to complete the minimization. If δx_d ≠ 0, (1) is integrated backward in time from [x̄ + δx_d] at t̄f + δtf, applying the new nominal control

ū = u* + β₁ δx,

and the algorithm is restarted.

Fig. 6. Circuit diagram (R1 = 2 Ω, R2 = 5 Ω, C = 1 F).

Example 8.1. Optimal Charging of a Capacitor. Let us consider the simple circuit of Fig. 6, described by

$$\dot{x} = -(1/C)(1/R_1 + 1/R_2)x + (1/CR_1)u, \qquad x(0) = 0. \qquad (63)$$

We wish to find the voltage u(t) which charges the capacitor to

x_f = 10 V,

while minimizing the sum of the energy dissipated in the resistor R1 and the square of the deviation of x(t) from x_f, i.e.,

$$\min\Big\{ \tfrac{1}{2}\int_0^{t_f}\big[(x - x_f)^2 + R_1 i^2\big]\, dt \Big\}. \qquad (64)$$

The terminal time tf is free.


The starting nominal voltage was 26 V. The time histories of the input and output voltages and the charging current in Figs. 7-9, with dashed extensions for t > 3, are indexed by the iteration number. Because of the linear-quadratic nature of this problem, t̄f was determined by global minimization, and it is marked by vertical dashed lines. The cost functions represented in Fig. 7 by convex curves are clearly not parabolas; therefore, the variation δtf computed by backward DDP, which does use the time expansion, would

Fig. 7. Output voltage and corresponding cost.



Fig. 8. Optimal input voltage.

have generally to be reduced to ε δtf, ε < 1. The minimum cost of 98.32 was reached in four iterations, with t̄f = 3.075 and b = 14.14.
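For comparison, the same circuit can also be treated with the first-order algorithm of Section 6, since H_u = 0 here gives u* in closed form. The sketch below is an illustration, not the paper's computation; the circuit constants are from Fig. 6, while the penalty weight and the starting simplex point are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Circuit data from Fig. 6 and the target voltage.
R1, R2, C, XF = 2.0, 5.0, 1.0, 10.0
ALPHA = (1.0 / C) * (1.0 / R1 + 1.0 / R2)

def rhs(t, y):
    """Canonical system (22), (24) for (63)-(64) plus the running cost.
    With unconstrained control, H_u = 0 gives u* = x + Sx/C in closed
    form (the charging current is i = (u - x)/R1)."""
    x, Sx, _ = y
    u = x + Sx / C
    L = 0.5 * ((x - XF) ** 2 + (u - x) ** 2 / R1)
    xdot = -ALPHA * x + u / (C * R1)
    Sxdot = (x - XF) - (u - x) / R1 + ALPHA * Sx   # (22): dSx/dt = dH/dx
    return [xdot, Sxdot, L]

def cost(z, rho=100.0):
    """Objective over (b, tf); the terminal requirement x(tf) = 10 V is
    handled by a penalty whose weight is an assumption."""
    b, tf = z[0], max(z[1], 1e-3)
    sol = solve_ivp(rhs, (0.0, tf), [0.0, b, 0.0], max_step=0.01)
    x_tf, _, S = sol.y[:, -1]
    return S + rho * abs(x_tf - XF)

res = minimize(cost, x0=[10.0, 3.0], method="Nelder-Mead")
```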

9. Comments on Second-Order Algorithm

It was seen in Example 8.1 that even a linear-quadratic problem required iteration of the second-order algorithm. This was caused by ignoring the variation δb, which would keep the initial condition (2) satisfied after varying the terminal point during minimization. Knowledge of the system transition matrix would be needed to compute such a variation δb; therefore, an additional matrix equation would have to be integrated. The algorithm would then solve linear-quadratic problems in one iteration by global minimization in (55).

If it is easy to satisfy the terminal constraints on the first run by forward integration, it is advantageous to guess the Lagrange multiplier b̄ and to

Fig. 9. Optimal input current.



integrate (1) from x0 together with Eqs. (57) and (61) until some time tf, while minimizing G to obtain u*. Along such a trajectory, a ≡ 0, and the first four steps of the first iteration may be left out. No starting nominal control ū is needed in this case.
Comparing our second-order algorithm with Jacobson's algorithm for problems with free terminal time tf, it is seen that his algorithm requires the additional integration of two scalar equations and two vector equations with complicated boundary conditions. Also, the optimal terminal time in the case of bang-bang control has to be known in advance, and the computation of δtf in his algorithm is complicated by the condition V_tf ≥ 0, which may not hold along some nominal trajectories. The computed δtf generally has to be reduced to ε δtf, ε < 1, to ensure that the expansion of his optimal cost function in terms of tf is valid. No such expansion is used in our algorithm; neither is it required that the terminal cost or the constraints be differentiable.
The above second-order algorithm in its present form does not appear suitable for problems with constrained control. The reachable set for such problems is bounded; and, if a point outside this set is chosen by the minimization procedure, the algorithm will fail to find a variation δb enforcing satisfaction of the system initial conditions.
Computational experience with more complicated systems showed that
the forward integration of the Riccati-like equation (61) using normally
available integration subroutines causes computer overflow. This result
agrees with the observations reported in Ref. 8 and seems to limit the
general applicability of all second-order forward methods.

10. Conclusions

Using the forward principle of optimality, we have derived two computational algorithms, of which at least the first one, based on the forward
minimum principle, appears to be a powerful and easily programmable
method for the solution of optimal control problems with complicated or
nondifferentiable terminal conditions or both. It basically requires only an
integration subroutine and a procedure for the numerical minimization of
constrained functions. The approach seems to be ideally suited to on-line
dynamic optimization, since its storage requirements are minimal. The only
data that need to be stored are the adjoint initial conditions and their
associated costs. The fourth-order Runge-Kutta method and simplex
minimization combined with a penalty function for treating the constraints
should be adequate for most practical problems and simple enough to
program easily on small on-line computers.

Problems with free terminal time may be solved at no additional expense. The difficult question of locating the global minimum of a mul-
timodal function (which is likely to accompany problems with complicated
terminal conditions) could, at least partially, be resolved by off-line simula-
tions. It should also be possible to extend the forward approach to provide
solutions to stochastic control problems for which the open-loop feedback
policy (Ref. 9) is acceptable.

References

1. LEITMANN, G., and STALFORD, H., A Note on Termination in Optimal Control Problems, Journal of Optimization Theory and Applications, Vol. 8, pp. 228-230, 1971.
2. LARSON, R. E., State Increment Dynamic Programming, American Elsevier Publishing Company, New York, New York, 1968.
3. JACOBSON, D. H., A New Necessary Condition of Optimality for Singular Control Problems, SIAM Journal on Control, Vol. 7, pp. 578-593, 1969.
4. PESCHON, J., and HENAULT, P. H., Long-Term Power System Expansion Planning by Dynamic Programming and Production Cost Simulation, 9th IEEE Symposium on Adaptive Processes, Decision, and Control, Austin, Texas, 1970.
5. JACOBSON, D. H., and MAYNE, D. Q., Differential Dynamic Programming, American Elsevier Publishing Company, New York, New York, 1970.
6. ASH, M., Optimal Shutdown of Nuclear Reactors, Academic Press, New York, New York, 1966.
7. KWAN, H. K., Optimal Shutdown Control of Nuclear Reactors, University of Salford, MS Dissertation, 1973.
8. TAPLEY, B. D., and WILLIAMSON, W. E., Comparison of Linear and Riccati Equations Used to Solve Optimal Control Problems, AIAA Journal, Vol. 10, pp. 1154-1159, 1972.
9. DREYFUS, S. E., Dynamic Programming and the Calculus of Variations, Academic Press, New York, New York, 1965.
