09 Forward Differential Dynamic Programming
Communicated by S. E. Dreyfus
1. Introduction
1 Lecturer, Department of Electrical Engineering, University of Salford, Lancashire, England.
© 1977 Plenum Publishing Corp., 227 West 17th Street, New York, N.Y. 10011. To promote freer access to published material
in the spirit of the 1976 Copyright Law, Plenum sells reprint articles from all its journals. This availability underlines the fact
that no part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means,
electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission of the publisher.
Shipment is prompt; rate per article is $7.50.
JOTA: VOL. 21, NO. 4, APRIL 1977
2. Problem Statement
It is assumed that the control of system (1) finishes at a terminal time $t_f$, which
may or may not be specified in advance, so that any $x(t)$, $t > t_f$, needed in (3),
(4), or (5) can be determined from (1) with $u(t) = 0$, $t > t_f$.
The terminal cost F in (3) and the terminal constraints (4) and (5) are
not required to be differentiable. Some possible examples of nondifferenti-
able terminal conditions are
$F = \max_{t \ge t_f} |x_i(t)|,$ (6)
after inclusion of the terminal conditions (terminal time, cost, and constraints).
Thus, at any time $t > t_0$, there is a trajectory passing through
$x(t) \in X(t)$ with minimum cost
$S(x(t), t) = \min_{u \in U} \int_{t_0}^{t} L(x, u, \tau)\, d\tau,$ (13)
subject to (4) and (5) which, using (13) and noticing that F is independent of
u, can be written as
$\min_{x(t_f),\, t_f} \{ F(x(t), t) + S(x(t_f), t_f) \}, \quad t \ge t_f,$ (16)
Let us assume that a control $u^*(t)$, $t \in [t_0, t_f]$, and the corresponding
trajectory $x^*(t)$, $x^*(t_0) = x_0$, satisfying (13), are available. Then, Eq. (17)
becomes
$\partial S(x^*, t)/\partial t = H(x^*, u^*, S_x, t), \quad S(x_0, t_0) = 0.$ (20)
subject to
$\psi(x^*(t), t) = 0, \quad t \ge t_f,$ (26-2)
$\phi(x^*(t), t) \le 0, \quad t \ge t_f.$ (26-3)
Accepting the open-loop strategy from the very beginning, we have reduced
the TPBVP to an initial-value problem which is much easier to solve at the
6. Computational Algorithm
[Figure: crab with suspended load; $u = \pm 0.1$ m s$^{-1}$, $R = 2$ m.]
where $x_1$ and $x_2$ are the position and velocity of the load and $x_3$ is the position of the
crab; $g = 9.81$ m sec$^{-2}$ is the acceleration due to gravity.
In mathematical form, the above requirements may be written as the
following minimization problem:
subject to
$x_3(t_f) = 0.$ (32)
Fig. 2. Local minima. [Plot of cost showing several local minima and the marked global minimum.]
The Hamiltonian
3
H = ½ x ~ - E Sx,2i (33)
is minimized for
$u^* = 0.1\,\mathrm{sign}\left[ S_{x_2}\,\frac{x_2 (x_1 - x_3)}{R^2 - (x_1 - x_3)^2} + S_{x_3} \right].$ (34)
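The bang-bang form of (34) can be motivated as follows. This is a sketch under the assumption that the control enters the system equations, and hence the Hamiltonian, linearly; the symbols $H_0$ and $\sigma$ are introduced here for illustration only and do not appear in the original.

```latex
% Assumed splitting of the Hamiltonian into a control-free part H_0 and a
% part linear in u (H_0 and \sigma are illustrative names, not from the paper).
\[
  H = H_0(x, S_x) - \sigma(x, S_x)\,u, \qquad
  \sigma = S_{x_2}\,\frac{x_2\,(x_1 - x_3)}{R^2 - (x_1 - x_3)^2} + S_{x_3},
  \qquad |u| \le 0.1 .
\]
% A function linear in u attains its minimum over |u| <= 0.1 on the boundary:
\[
  u^* = \arg\min_{|u| \le 0.1} H = 0.1\,\operatorname{sign}\sigma .
\]
```

Under this assumption the minimizing control always sits on one of the bounds $\pm 0.1$ and switches where $\sigma$ changes sign.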
The penalty term $100\,|x_3(t_f)|$ was added to the cost functional (31). A
four-dimensional simplex minimization and a fourth-order Runge-Kutta
integration method with step 0.02 were used. From the numerical experiments,
the minimized function in $(b, t_f)$-space was found to have a number of
local minima along the $t_f$-axis, located approximately half the period of
pendulum oscillations apart (Fig. 2). The global minimum of 7.601 was
found for $t_f = 5.58$ and
F °°°11
b=/-8.902 /
L-0.175_1
The optimal strategy and the corresponding phase diagram in the $x_1 x_2$-plane are
given in Fig. 3.
[Fig. 3: optimal strategy and phase diagram in the $x_1 x_2$-plane; arcs with $u = 0.1$ and $u = -0.1$.]
[Figure: control against time in hours; flux = $6 \cdot 10^{13}$ neutron/cm$^2$ sec, $u_{\max} = 2$; marked times $t_0$, $t_q$, $t_r$, $t_f$.]
Fig. 5. Phase trajectory of optimal shutdown process. [Phase plane $x_1$, $x_2$ with marked points P, R, Q, G, J.]
out that
$u^*(t) = 0, \quad 0 < t < t_q,$
so that no TPBVP need in fact be solved.
One approach to solving the problem is to choose the time $t_r$ at which to
leave the constrained arc (i.e., to switch $u_c$ to $u_{\max}$) and the time $t_f$, in order to
minimize $t_f$ subject to (38) for $t \ge t_f$. Another approach is obtained by noting
that, once control has ceased, the optimal trajectory is the unique trajectory
of the free motion
$\dot x_1 = -x_1,$ (46)
$\dot x_2 = a x_1 + b x_2,$ (47)
passing through the origin and tangential to the line $x_2 = x_{\max}$ at P. A fairly
elementary calculation based on this consideration gives the coordinates of
P as
$x_1(t_p) = -(b/a)\,x_{\max},$ (48)
$x_2(t_p) = x_{\max}.$ (49)
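The elementary calculation is not spelled out; one way to obtain (48)-(49), assuming that "tangential" means that the trajectory touches the line $x_2 = x_{\max}$ with zero slope in time, is the following.

```latex
% Tangency to the line x_2 = x_max at P: x_2 is stationary there.
\[
  x_2(t_p) = x_{\max}, \qquad \dot x_2(t_p) = 0 .
\]
% Substituting the free motion (47) at t = t_p:
\[
  a\,x_1(t_p) + b\,x_{\max} = 0
  \quad\Longrightarrow\quad
  x_1(t_p) = -\frac{b}{a}\,x_{\max} .
\]
```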
The arc PFG can now be obtained by integrating (46)-(47) backward in time
from the point P given by (48)-(49). The problem is, therefore, reduced to
choosing tr in order to minimize the time tf. This approach requires no more
than integration of two equations and one-dimensional minimization.
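The backward integration from P can be sketched as below. This is a minimal illustration, not the author's program: the function names, the fixed-step Runge-Kutta scheme, and the numerical values of `a`, `b`, and `x_max` used in the usage note are all assumptions chosen for the example.

```python
def free_motion(x, a, b):
    """Right-hand side of the free motion (46)-(47)."""
    x1, x2 = x
    return (-x1, a * x1 + b * x2)


def rk4_step(f, x, h, *args):
    """One classical fourth-order Runge-Kutta step; h may be negative."""
    k1 = f(x, *args)
    k2 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)), *args)
    k3 = f(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)), *args)
    k4 = f(tuple(xi + h * ki for xi, ki in zip(x, k3)), *args)
    return tuple(
        xi + (h / 6.0) * (p + 2.0 * q + 2.0 * r + s)
        for xi, p, q, r, s in zip(x, k1, k2, k3, k4)
    )


def tangency_point(a, b, x_max):
    """Coordinates of P from (48)-(49)."""
    return (-(b / a) * x_max, x_max)


def integrate_backward(a, b, x_max, duration, h=0.02):
    """Integrate (46)-(47) backward in time from P over `duration`.

    Returns the list of states, starting at P and moving to earlier times.
    """
    x = tangency_point(a, b, x_max)
    path = [x]
    for _ in range(round(duration / h)):
        x = rk4_step(free_motion, x, -h, a, b)  # negative step: backward in time
        path.append(x)
    return path
```

For instance, `integrate_backward(1.0, -0.5, 1.0, 1.0)` traces the free-motion arc one time unit back from P for the hypothetical values a = 1, b = -0.5, x_max = 1. A one-dimensional search over $t_r$ would then look for the point where the controlled trajectory meets this arc.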
It is to be pointed out that the solutions for similar problems without the
constraint (38) were obtained in Ref. 6 by backward procedures only
because of the bilinear form of Eq. (36). For u = 0, this equation becomes
linear as in (47) with known analytical solution, from which
$F(x_1(t_f), x_2(t_f)) = \max_{t \ge t_f} x_2(t)$ (50)
needed in (10) as well as the boundary values $V_{x_1}(t_f)$ and $V_{x_2}(t_f)$ of the
adjoint variables can be found analytically. Using the forward approach, it
was possible to obtain the solution of a more general case with the
state-space constraint (38) without exploiting the special bilinear structure of
the system equations.
7. Second-Order Expansion
$\partial W/\partial t = \min_u G(x, u, W_x, t),$ (52)
$-(G_{ux} - f_u^T W_{xx})^T G_{uu}^{-1} (G_{ux} - f_u^T W_{xx}), \qquad W_{xx}(t_0) = 0,$ (61)
where
$\beta_1 = G_{uu}^{-1} (f_u^T W_{xx} - G_{ux}).$
Unless otherwise stated, all quantities are evaluated at $\bar x$, $\bar b$, $u^*$. $a(t)$ is the
predicted reduction of $W(\bar x, \bar b, t)$ provided that the new control
$u = u^* + \beta_1\,\delta x$
is applied to (1) backward in time from $\bar x(t)$.
8. Computational Algorithm
The first four steps of the new algorithm are similar to those of Jacobson
(Ref. 5), with opposite direction of integration. As a result of their completion,
a candidate $\bar x(t)$, $t \in [t_0, \bar t_f]$, for an optimum trajectory satisfying both
the initial and the terminal constraints is obtained, along which $a \equiv 0$.
The last step requires numerical minimization in (55). The values of the
optimal cost function are determined from its expansion
$W(\bar x + \delta x, \bar b, \bar t_f + \delta t_f) = W + \langle W_x, \delta x \rangle + \tfrac{1}{2} \langle \delta x, W_{xx}\,\delta x \rangle$ (62)
at time $\bar t_f + \delta t_f$, which is known for $\delta t_f \le 0$. If $\delta t_f > 0$, one continues the
integration of (1), (57), and (61) from $\bar t_f$ to $\bar t_f + \delta t_f$, while minimizing G to
obtain $u^*$. The variation $\delta t_f$ need not be small, since no time expansion is
used in (62). To limit the size of $\delta x$, it may be necessary to take only a
size-limited step in a decreasing direction $\delta x_d$ of the minimized function in
(55), rather than to complete the minimization. If $\delta x_d \ne 0$, (1) is integrated
backward in time from $[\bar x + \delta x_d]$ at $\bar t_f + \delta t_f$, applying the new nominal control
$\bar u = u^* + \beta_1\,\delta x,$
and the algorithm is restarted.
[Figure: example circuit ($R_1 = 2\ \Omega$, $C = 1$ F, $R_2 = 5\ \Omega$) and plot of the control u (volts) against time (seconds).]
have generally to be reduced to $\varepsilon\,\delta t_f$, $\varepsilon < 1$. The minimum cost of 98.32 was
reached in four iterations with
$\bar t_f = 3.075$ and $b = 14.14$.
[Figure: current i (amperes) against time (seconds).]
integrate (1) from $x_0$ together with Eqs. (57) and (61) until some time $t_f$,
while minimizing G to obtain $u^*$. Along such a trajectory, $a \equiv 0$, and the first
four steps of the first iteration may be left out. No starting nominal control $\bar u$
is needed in this case.
Comparing our second-order algorithm with Jacobson's algorithm for
problems with free terminal time $t_f$, it is seen that his algorithm requires the
additional integration of two scalar equations and two vector equations with
complicated boundary conditions. Also, the optimal terminal time in the case of
bang-bang control has to be known in advance, and the computation of $\delta t_f$ in
his algorithm is complicated by the condition $V_{t_f t_f} \ge 0$, which may not hold
along some nominal trajectories. The computed $\delta t_f$ has generally to be
reduced to $\varepsilon\,\delta t_f$, $\varepsilon < 1$, to ensure that the expansion of his optimal cost
function in terms of $t_f$ is valid. No such expansion is used in our algorithm;
neither is it required that the terminal cost or the constraints be differentiable.
The above second-order algorithm in its present form does not appear
suitable for problems with constrained control. The reachable set for such
problems is bounded; and, if a point outside this set is chosen by the
minimization procedure, the algorithm will fail to find a variation $\delta b$
enforcing satisfaction of the system initial conditions.
Computational experience with more complicated systems showed that
the forward integration of the Riccati-like equation (61) using normally
available integration subroutines causes computer overflow. This result
agrees with the observations reported in Ref. 8 and seems to limit the
general applicability of all second-order forward methods.
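One mechanism that can produce such overflow is finite escape time, which quadratic (Riccati-type) right-hand sides may exhibit. The sketch below illustrates this on the scalar equation dw/dt = 1 + w^2 with w(0) = 0, whose exact solution tan(t) becomes unbounded at t = pi/2. This scalar equation is a stand-in chosen for illustration and is not Eq. (61); the function names, step size, and threshold are likewise assumptions.

```python
import math


def riccati_rhs(w):
    """Scalar Riccati-type right-hand side (illustrative, not Eq. (61))."""
    return 1.0 + w * w


def rk4_step(f, w, h):
    """One classical fourth-order Runge-Kutta step for a scalar ODE."""
    k1 = f(w)
    k2 = f(w + 0.5 * h * k1)
    k3 = f(w + 0.5 * h * k2)
    k4 = f(w + h * k3)
    return w + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def escape_time(h=0.001, limit=1e12, t_max=10.0):
    """Integrate forward from w(0) = 0 until |w| exceeds `limit`.

    Returns the time at which the threshold is crossed, or None if the
    solution stays bounded up to t_max.
    """
    t, w = 0.0, 0.0
    while t < t_max:
        w = rk4_step(riccati_rhs, w, h)
        t += h
        if not math.isfinite(w) or abs(w) > limit:
            return t
    return None
```

Calling `escape_time()` returns a time close to pi/2: a fixed-step routine simply runs into the singularity, which is consistent with the kind of failure described above, although the cause in the reported experiments may differ.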
10. Conclusions
References