
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 21, No. 4, APRIL 1977

Forward Differential Dynamic Programming


K. WIT¹

Communicated by S. E. Dreyfus

Abstract. The dynamic programming formulation of the forward principle of optimality in the solution of optimal control problems results in a
partial differential equation with initial boundary condition whose
solution is independent of terminal cost and terminal constraints. Based
on this property, two computational algorithms are described. The
first-order algorithm with minimum computer storage requirements
uses only integration of a system of differential equations with specified
initial conditions and numerical minimization in finite-dimensional
space. The second-order algorithm is based on the differential dynamic
programming approach. Either of the two algorithms may be used for
problems with nondifferentiable terminal cost or terminal constraints,
and the solution of problems with complicated terminal conditions (e.g.,
with free terminal time) is greatly simplified.

Key Words. Optimal control, forward dynamic programming, differential dynamic programming.

1. Introduction

Unlike the numerical integration of ordinary differential equations with specified initial conditions and the numerical minimization of functions, for which efficient computational methods are available, the solution of two-point boundary-value problems (TPBVP) still presents great difficulties in the development of computational methods for optimal control problems. By embedding a given problem into a more general class
of similar problems, dynamic programming (DP) avoids the tedious solution
of TPBVP, but it is not generally computationally feasible. All other
methods based on the backward principle of optimality invariably require
solution of a TPBVP for which some analyticity of terminal boundary
conditions is always essential. By redefining the principle of optimality in the

¹ Lecturer, Department of Electrical Engineering, University of Salford, Lancashire, England.

forward way, it is possible to separate the functional minimization from the specified terminal conditions, thus eliminating the need for their analyticity.
Nondifferentiable terminal cost or terminal constraints may appear quite
often in the solution of practical problems, as illustrated by examples, and
they are fairly common in econometrics and optimum planning. Even
optimal control with unspecified terminal time poses a rather difficult
problem for which new methods continuously emerge in the control litera-
ture. Problems in which the time of the first arrival at the target set may not
be optimal, as illustrated in Ref. 1, have hardly been tackled. All that these
problems have in common is some complexity of their terminal conditions.
Using the approach developed in this paper, this complexity is traded for
more or less complicated constraints in the numerical minimization of nonlinear
programming (NP) for which a number of reliable and efficient algorithms
are available. At the same time, as a consequence of the forward formulation
of the principle of optimality, all boundary conditions are defined at the
initial time and the corresponding differential equations may therefore be
readily solved.

2. Problem Statement

We shall consider an n-dimensional dynamic system

$$\dot{x}(t) = f(x, u, t), \qquad (1)$$

with given initial conditions

$$x(t_0) = x_0. \qquad (2)$$
The problem is to find a piecewise-continuous vector function u(t) which belongs to a closed control set U(t) for t ∈ [t0, tf] and minimizes the following cost functional:

$$F(x(t), t) + \int_{t_0}^{t_f} L(x, u, \tau)\, d\tau, \qquad t \ge t_f. \qquad (3)$$

The corresponding optimal state-space trajectory is required to satisfy a set of terminal constraints

$$\Psi(x(t), t) = 0, \qquad t \ge t_f, \qquad (4)$$
$$\Phi(x(t), t) \le 0, \qquad t \ge t_f. \qquad (5)$$

It is assumed that the control of system (1) finishes at a terminal time tf, which may or may not be specified in advance, so that any x(t), t > tf, needed in (3), (4), or (5) can be determined from (1) with u(t) = 0, t > tf.

The terminal cost F in (3) and the terminal constraints (4) and (5) are not required to be differentiable. Some possible examples of nondifferentiable terminal conditions are

$$F = \max_{t \ge t_f} |x_i(t)|, \qquad (6)$$
$$\Psi = x_i(t), \qquad t = t_f + kT, \quad k = 0, 1, 2, \ldots, \quad T > 0, \qquad (7)$$
$$\Phi = |x_i(t)| - |x_i(t_0)|, \qquad t \ge t_f, \qquad (8)$$

where x_i(t), 1 ≤ i ≤ n, is the ith component of the state vector x(t). Such terminal conditions appear naturally in the optimal control of dynamic systems that are not in equilibrium at the terminal time tf, after which no further control is applied. The functions f and L are assumed to be sufficiently smooth in all arguments.
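A terminal cost like (6) can be evaluated numerically by simulating the free motion beyond tf and taking the peak. The sketch below is an illustration added here, not part of the paper; the helper name, the use of scipy, the scalar control, and the finite post-tf horizon are all assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def terminal_cost_max_abs(f, x_tf, t_f, t_horizon, i=0):
    """Evaluate F = max_{t >= tf} |x_i(t)| as in (6): integrate the free
    motion u = 0 from x(tf) over a finite horizon and take the peak of
    the ith state component.  Assumes a scalar control entering f."""
    free = lambda t, x: f(x, 0.0, t)          # u(t) = 0 for t > tf
    sol = solve_ivp(free, (t_f, t_horizon), x_tf, max_step=0.01)
    return float(np.max(np.abs(sol.y[i])))
```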

3. Forward Versus Backward Dynamic Programming

The conventional backward DP embeds the solution of the original problem with given initial condition x0 into a family of related problems with the same terminal conditions, viz., time tf, terminal cost, and terminal constraints, but with initial conditions in some subset X0 of the state space at time t0. For any particular x0 ∈ X0, the optimal policy and trajectory are therefore readily available. The computation proceeds backward in time from tf to t0. Therefore, the optimal cost function
$$V(x(t), t) = \min_{u \in U}\Big\{ F(x(t_f), t_f) + \int_{t}^{t_f} L(x, u, \tau)\, d\tau \Big\}, \qquad t \le t_f, \qquad (9)$$
$$V(x(t_f), t_f) = F(x(t_f), t_f), \qquad (10)$$
$$\Psi(x(t_f), t_f) = 0, \qquad (11)$$
$$\Phi(x(t_f), t_f) \le 0, \qquad (12)$$

depends on the terminal conditions.
The well-known dimensionality problem associated with a straightforward application of DP has led to the development of a number of optimization methods which invariably use some analytical information about the terminal conditions and result in a nonlinear TPBVP requiring a large computational effort and much computer core to solve.
In forward DP (Ref. 2), the embedding family is constructed with the terminal conditions ignored; it consists of state-space trajectories emanating from the given initial point x0, each of which is a candidate for the optimal trajectory. A particular one among them is recognized as optimal

after inclusion of the terminal conditions (terminal time, cost, and constraints). Thus, at any time t > t0, there is a trajectory passing through x(t) ∈ X(t) with minimum cost

$$S(x(t), t) = \min_{u \in U} \int_{t_0}^{t} L(x, u, \tau)\, d\tau, \qquad (13)$$
$$S(x_0, t_0) = 0, \qquad (14)$$

where the attainable set X(t) is the set of all possible solutions of (1) for admissible u(τ), τ ∈ [t0, t]. The optimal trajectory must satisfy

$$\min_{x(t_f),\, t_f}\ \min_{u \in U}\Big\{ F(x(t), t) + \int_{t_0}^{t_f} L(x, u, \tau)\, d\tau \Big\}, \qquad t \ge t_f, \qquad (15)$$

subject to (4) and (5) which, using (13) and noticing that F is independent of u, can be written as

$$\min_{x(t_f),\, t_f}\ \big\{ F(x(t), t) + S(x(t_f), t_f) \big\}, \qquad t \ge t_f, \qquad (16)$$

subject to (4) and (5).


It is seen that the functional minimization in (13) has been separated
from the (n + 1)-dimensional NP problem in (16). If the terminal time is
specified in advance, the corresponding minimization in (16) is omitted, and
the dimension of the NP problem is reduced to n.
The fact that the optimal cost function S is not dependent on the
terminal conditions allows one to use standard functional optimization
techniques also for problems with nondifferentiable terminal cost or con-
straints. The problems with free terminal time tf are also simplified. The price that one has to pay for this convenience is that the very desirable feedback nature of backward DP has been lost; i.e., given a new initial condition x0′ ≠ x0, no optimal policy is available unless the whole problem is re-solved for the new x0′. However, in most existing optimization algorithms, the embedding nature of backward DP has been sacrificed in any case for computational feasibility, by generating an open-loop optimal control for a particular x0. Applying these algorithms to deterministic systems will also yield the closed-loop optimal strategy. In the optimization of stochastic
systems, however, it is vital to preserve the closed-loop feature of optimal
control to allow for compensation of future disturbances. Also, in some
analytical developments (e.g., in the proof of optimality of a singular control,
Ref. 3), the forward principle of optimality as described above will not work.
There are, however, many situations in which it may greatly simplify the
solution (e.g., in problems of economic planning with limited equipment
age, Ref. 4) or even yield solutions to problems which cannot be solved
otherwise.

From these comments, it should be clear that forward DP is not merely backward DP applied in reverse time to a problem with initial and terminal conditions interchanged, as is sometimes erroneously assumed.

4. Optimum Cost Function

Before the NP problem (16) can be solved, it is necessary to have a means of generating the embedding family of trajectories starting at x0 and having the minimum cost (13) for any reachable x(t) ∈ X(t). On application of the forward principle of optimality, it is easy to derive the following partial differential equation for the optimal cost function in (13):

$$\partial S/\partial t = \min_{u \in U} H(x, u, S_x, t), \qquad (17)$$
$$S(x(t_0), b, t_0) = \langle b,\, x(t_0) - x_0 \rangle, \qquad (18)$$


with the Hamiltonian defined by

$$H(x, u, S_x, t) = L(x, u, t) - \langle S_x,\, f(x, u, t) \rangle, \qquad (19)$$

where S_x = ∂S/∂x is the gradient of S and b is a vector Lagrange multiplier. Eq. (17) resembles the well-known Hamilton-Jacobi-Bellman equation of backward optimization and differs from it only by a sign and by the time at which the boundary condition (18) is defined.
Because the direct numerical solution of Eq. (17) presents the same
difficulties as that of its backward counterpart, we will seek some feasible
methods of generating trajectories satisfying (13). For this, the well-
established methods of solution of the Bellman equation may be used as a
helpful guide. In particular, we shall examine two approaches: (a) solution of
hyperbolic partial differential equations by the method of characteristics;
and (b) second-order approximation to the optimal cost function in (13). In
order to unify the derivations, we shall use the method of differential
dynamic programming (DDP, Ref. 5), which has been applied successfully
to a number of optimization problems.

5. Forward Minimum Principle

Let us assume that a control u*(t), t ∈ [t0, tf], and the corresponding trajectory x*(t), x*(t0) = x0, satisfying (13), are available. Then, Eq. (17) becomes

$$\partial S(x^*, t)/\partial t = H(x^*, u^*, S_x, t), \qquad S(x_0, t_0) = 0. \qquad (20)$$

For trajectories originating at x0, the optimal cost function is independent of b. If a small control variation δu is now introduced, so that u* + δu ∈ U, the new trajectory satisfies

$$\partial S(x^* + \delta x, t)/\partial t = H(x^* + \delta x,\, u^* + \delta u,\, S_x + S_{xx}\,\delta x,\, t). \qquad (21)$$

By a small variation δu we mean a variation for which S_x + S_xx δx is a good approximation to S_x(x* + δx, t). Expanding Eq. (21) up to the first-order terms, using (20), and noticing that either H_u = 0 for u* interior to U or δu = 0 for u* on the boundary of U, we obtain the adjoint equation

$$\dot{S}_x(t) = (\partial H/\partial x)(x^*, u^*, S_x, t), \qquad (22)$$

with the initial condition

$$S_x(t_0) = b. \qquad (23)$$
From (1) and (19), it follows that

$$\dot{x}^*(t) = -(\partial H/\partial S_x)(x^*, u^*, S_x, t), \qquad x^*(t_0) = x_0, \qquad (24)$$

which is (1) evaluated along the optimal trajectory. Equations (22) and (24) are clearly necessary conditions which the control u* minimizing the Hamiltonian in (17), as well as the integral in (13), must satisfy. They play the same role in what we can call the forward minimum principle as the canonical equations in the backward minimum principle of Pontryagin. For any given b, Eqs. (22) and (24) can be easily integrated forward, with u* determined by a constrained minimization of the Hamiltonian (19) with respect to u ∈ U, and the optimal cost can be computed as

$$S(x^*(t), t) = \int_{t_0}^{t} L(x^*, u^*, \tau)\, d\tau. \qquad (25)$$
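As a minimal sketch of this step (added here for illustration; the helper names, the use of scipy, and the scalar control are assumptions, not part of the paper), the canonical system (22), (24) can be integrated forward for a given b while u* is found by numerical minimization of the Hamiltonian (19) at each evaluation:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

def integrate_canonical(f, L, H_x, x0, b, t0, tf, u_bounds):
    """Integrate the state (24) and adjoint (22) forward from t0, choosing
    u* at each evaluation by minimizing the Hamiltonian (19) over U.
    Returns x*(tf), S_x(tf), and the accumulated cost S in (25).
    Assumes a scalar control; H_x(x, u, Sx, t) supplies dH/dx for (22)."""
    n = len(x0)

    def u_star(x, Sx, t):
        # H = L - <Sx, f>; constrained minimization over U = [lo, hi]
        res = minimize_scalar(lambda u: L(x, u, t) - Sx @ f(x, u, t),
                              bounds=u_bounds, method="bounded")
        return res.x

    def rhs(t, y):
        x, Sx = y[:n], y[n:2 * n]
        u = u_star(x, Sx, t)
        # state (24), adjoint (22), and running cost L for (25)
        return np.concatenate([f(x, u, t), H_x(x, u, Sx, t), [L(x, u, t)]])

    y0 = np.concatenate([x0, b, [0.0]])    # S(x0, t0) = 0 by (14)
    sol = solve_ivp(rhs, (t0, tf), y0, max_step=0.05)
    return sol.y[:n, -1], sol.y[n:2 * n, -1], sol.y[2 * n, -1]
```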

By varying b in (23), we are able to generate the desired family of candidate trajectories with x*(tf) ∈ X(tf). Thus, it is possible to replace the minimization over x(tf) in (16) by the minimization over b. This gives the following equivalent problem:

$$\min_{b,\, t_f}\Big\{ F(x^*(t), t) + \int_{t_0}^{t_f} L(x^*, u^*, \tau)\, d\tau \Big\}, \qquad t \ge t_f, \qquad (26\text{-}1)$$

subject to

$$\Psi(x^*(t), t) = 0, \qquad t \ge t_f, \qquad (26\text{-}2)$$
$$\Phi(x^*(t), t) \le 0, \qquad t \ge t_f. \qquad (26\text{-}3)$$
Accepting the open-loop strategy from the very beginning, we have reduced the TPBVP to an initial-value problem which is much easier to solve, at the expense of a constrained minimization in the (n + 1)-dimensional space (b, tf).
Integration of the adjoint equation (22) in the unstable direction may
pose numerical problems similar to those encountered with the shooting
methods often used to solve TPBVPs. In spite of some similarity between
these two approaches, the new algorithm works directly with the value of the
terminal cost function and the terminal constraints instead of their deriva-
tives. Thus, it appears to be more straightforward; it overcomes the problem
of sensitivity of the optimal cost to the accuracy to which the TPBVP is
solved and, as will be shown by examples, it is applicable to a broader class of
problems.

6. Computational Algorithm

Using a direct-search numerical method of NP, the computation proceeds as follows.
Step 1. For a particular b and tf specified by the minimization routine, integrate the 2n equations (22) and (24) from t0 to tf, with u* determined by either analytical minimization or, if necessary, numerical minimization of the Hamiltonian (19).
Step 2. Compute S(x*(tf), tf) in (25).
Step 3. Integrate (1) from tf forward with u(t) ≡ 0, t > tf, to obtain x*(t) needed in F, Ψ, and Φ. Some degree of familiarity with the problem is necessary in order to determine how far to continue the integration.
Step 4. Evaluate the total cost in (26-1) and the constraints (26-2) and
(26-3) as required by the minimization routine.
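A sketch of Steps 1-4 follows, reusing the integrate_canonical sketch of Section 5. The quadratic-penalty treatment of (26-2) and the weight rho are one possible choice, not the paper's prescription, which leaves the NP method open.

```python
import numpy as np
from scipy.optimize import minimize

def make_total_cost(f, L, H_x, F, psi, x0, t0, u_bounds, rho=100.0):
    """Build the objective of the NP problem (26), i.e. Steps 1-4, for
    each (b, tf) proposed by the direct-search routine."""
    def total_cost(z):
        b, tf = z[:-1], max(z[-1], 1e-3)
        # Steps 1-2: integrate (22), (24) forward and accumulate (25)
        x_tf, _, S = integrate_canonical(f, L, H_x, x0, b, t0, tf, u_bounds)
        # Step 3 would continue (1) with u = 0 where F or psi need x(t), t > tf
        # Step 4: total cost (26-1) plus the penalty for (26-2)
        return S + F(x_tf, tf) + rho * np.sum(np.atleast_1d(psi(x_tf, tf)) ** 2)
    return total_cost

# Direct search in the (n + 1)-dimensional (b, tf)-space, e.g. by simplex:
# res = minimize(make_total_cost(f, L, H_x, F, psi, x0, t0, (-0.1, 0.1)),
#                z_init, method="Nelder-Mead")
```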
If the terminal cost F is independent of t > tf [i.e., if t = tf applies in (26-1), (26-2), and (26-3)], it is advantageous to write the minimization in (26-1) separately, namely,

$$\min_{b,\, t_f} \{\,\cdot\,\} = \min_{b}\ \big\{ \min_{t_f} [\,\cdot\,] \big\},$$

and to simplify the algorithm as follows. Introducing a penalty function P for the constraints (26-2) and (26-3), integrate (22) and (24) from t0 onward as long as

$$P(x^*(t), t) + F(x^*(t), t) + \int_{t_0}^{t} L(x^*, u^*, \tau)\, d\tau \qquad (27)$$

is decreasing. The problem has thus been reduced to an n-dimensional unconstrained minimization in the space of adjoint initial conditions (23).
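The inner minimization over tf can be sketched as follows; step and cost_27 are hypothetical helpers (one integration step of (22), (24), and the value of (27)), named here only for illustration.

```python
def minimize_over_tf(step, cost_27, t0, dt, t_max):
    """Inner minimization over tf for fixed b: advance the integration of
    (22), (24) one step at a time and stop as soon as the penalized total
    (27) stops decreasing."""
    t, best = t0, float("inf")
    while t < t_max:
        step(t, dt)                 # advance x*, S_x, and the integral by dt
        c = cost_27(t + dt)
        if c >= best:
            return t, best          # (27) stopped decreasing: take tf = t
        t, best = t + dt, c
    return t, best
```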

Fig. 1. Optimal control of a crane: (a) moving crane; (b) moving crab. (Crab speed u = ±0.1 m/sec; rope length R = 2 m.)

Example 6.1. Optimal Pendulum Control. A load is suspended from an overhead crane crab as shown in Fig. 1a. The crab and the load are at rest with respect to the crane, which is moving at a constant speed v. If the crane suddenly stops, the load starts to swing as in Fig. 1b. A bang-bang policy of the crab speed u is to be found which will quickly reduce the amplitude of the load swings without excessive movement of the crab. It is required that the crab eventually be positioned at the same point on the crane as before it started to move.
The system is described by the equations

$$\dot{x}_1 = x_2, \qquad x_1(0) = 0, \qquad (28)$$
$$\dot{x}_2 = -\frac{g(x_1 - x_3)\sqrt{R^2 - (x_1 - x_3)^2}}{R^2} - \frac{(x_1 - x_3)(x_2 - u)^2}{R^2 - (x_1 - x_3)^2}, \qquad x_2(0) = v, \qquad (29)$$
$$\dot{x}_3 = u, \qquad x_3(0) = 0, \qquad (30)$$

where x1 and x2 are the position and velocity of the load, x3 is the position of the crab, and g = 9.81 m·sec⁻² is the acceleration due to gravity.

In mathematical form, the above requirements may be written as the following minimization problem:

$$\min_{u \in U,\, t_f}\Big\{ \max_{t \ge t_f} |x_1(t)| + \tfrac{1}{2}\int_0^{t_f} x_1^2(\tau)\, d\tau \Big\}, \qquad (31)$$

subject to

$$x_3(t_f) = 0. \qquad (32)$$

Fig. 2. Local minima. (Cost along the tf-axis; the global minimum is marked.)
The Hamiltonian

$$H = \tfrac{1}{2}x_1^2 - \sum_{i=1}^{3} S_{x_i}\dot{x}_i \qquad (33)$$

is minimized for

$$u^* = 0.1\,\mathrm{sign}\big\{ 2S_{x_2}\big[x_2(x_1 - x_3)/(R^2 - (x_1 - x_3)^2)\big] + S_{x_3} \big\}. \qquad (34)$$
The penalty term 100|x3(tf)| was added to the cost functional (31). A four-dimensional simplex minimization and a fourth-order Runge-Kutta integration method with step 0.02 were used. From the numerical experiments, the minimized function in (b, tf)-space was found to have a number of local minima along the tf-axis, located approximately half the period of the pendulum oscillations apart (Fig. 2). The global minimum of 7.601 was found for tf = 5.58 and

F °°°11
b=/-8.902 /
L-0.175_1
The optimal strategy and the corresponding phase diagram in the x1x2-plane are given in Fig. 3.
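A minimal sketch of the model and control law in code, assuming the forms of (29) and (34) shown above; the function names are illustrative and the outer simplex search over (b, tf) proceeds as in Section 6.

```python
import numpy as np

g, R, U_MAX = 9.81, 2.0, 0.1

def crane_f(x, u, t):
    """Crane dynamics (28)-(30), using the form of (29) given above."""
    d = x[0] - x[2]
    x2dot = (-g * d * np.sqrt(R**2 - d**2) / R**2
             - d * (x[1] - u)**2 / (R**2 - d**2))
    return np.array([x[1], x2dot, u])

def u_star(x, Sx):
    """Bang-bang law (34): the crab speed switches on the sign of the
    switching function."""
    d = x[0] - x[2]
    switch = 2.0 * Sx[1] * x[1] * d / (R**2 - d**2) + Sx[2]
    return U_MAX * np.sign(switch)
```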

Example 6.2. Optimal Shutdown of a Nuclear Reactor. It is required to shut down a xenon-dominated nuclear reactor in minimum time in such a way that a restart is always possible. The fission reactions are described by (Ref. 6)

$$\dot{x}_1 = -x_1 + u, \qquad x_1(0) = 1, \qquad (35)$$
$$\dot{x}_2 = ax_1 + bx_2 + (c + dx_2)u, \qquad x_2(0) = 1, \qquad (36)$$

Fig. 3. Phase trajectory and optimal control.

where x1 and x2 are the normalized concentrations of I-135 and Xe-135, respectively, u is a normalized neutron flux, and a, b, c, and d are constants. The neutron flux is limited by the amount of fuel available, that is,

$$u \le u_{\max}; \qquad (37)$$

and the xenon concentration must not overpoison the reactor, that is,

$$x_2(t) \le x_{\max}, \qquad t \ge 0. \qquad (38)$$

Fig. 4. Optimal shutdown policy. (Flux normalized to 6 × 10¹³ neutrons/cm²·sec; u_max = 2.)



If tf is the time at which the shutdown process is completed, i.e.,

u(t) ≡ 0, t > tf,

then, for t ≥ tf, (38) is a terminal constraint of the form (5). For t < tf, however, (38) is a phase-space constraint not considered in our general development.
The Hamiltonian is found to be

$$H = 1 + S_{x_1}x_1 - S_{x_2}(ax_1 + bx_2) - u\big[S_{x_1} + S_{x_2}(c + dx_2)\big], \qquad (39)$$

and the adjoint equations are

$$\dot{S}_{x_1} = S_{x_1} - aS_{x_2}, \qquad (40)$$
$$\dot{S}_{x_2} = -(b + du)S_{x_2}. \qquad (41)$$

Let the solution of (35) and (36) for which equality holds in (38) be called a constrained arc, otherwise an unconstrained arc. The beginning of a constrained arc is called an entry point. On an unconstrained arc, the optimal control (Fig. 4) is given by

$$u^* = \begin{cases} 0, & S_{x_1} + S_{x_2}(c + dx_2) < 0, \\ u_{\max}, & S_{x_1} + S_{x_2}(c + dx_2) > 0; \end{cases} \qquad (42)$$

and, on a constrained arc,

$$u^* = u_c = -(ax_1 + bx_{\max})/(c + dx_{\max}). \qquad (43)$$
Generally, the unconstrained arc has to satisfy some additional tangency constraints at the entry point. From the special nature of this problem, it was found by means of a geometrical argument that only one constrained arc exists and that the adjoint variables at the entry point Q must satisfy (Ref. 7)

$$S_{x_1}(t_q) = 1/(u_c - x_1), \qquad (44)$$
$$S_{x_2}(t_q) = -1/\big[(u_c - x_1)(c + dx_{\max})\big]. \qquad (45)$$
Thus, due to the separability of the optimal arcs, the first unconstrained arc
OQ in Fig. 5 is obtained by solving the TPBVP (35), (36), (40), (41). It turns

Fig. 5. Phase trajectory of optimal shutdown process.

out that

u*(t) = 0, 0 < t < t_q,

so that no TPBVP need in fact be solved.
One approach to solving the problem is to choose the time t_r at which to leave the constrained arc (i.e., to switch u_c to u_max) and the time tf, in order to minimize tf subject to (38) for t ≥ tf. Another approach is obtained by noting that, once control has ceased, the optimal trajectory is the unique trajectory of the free motion

$$\dot{x}_1 = -x_1, \qquad (46)$$
$$\dot{x}_2 = ax_1 + bx_2, \qquad (47)$$

passing through the origin and tangential to the line x2 = x_max at P. A fairly elementary calculation based on this consideration gives the coordinates of P as

$$x_1(t_p) = -(b/a)x_{\max}, \qquad (48)$$
$$x_2(t_p) = x_{\max}. \qquad (49)$$

The arc PFG can now be obtained by integrating (46)-(47) backward in time from the point P given by (48)-(49). The problem is, therefore, reduced to choosing t_r in order to minimize the time tf. This approach requires no more than the integration of two equations and a one-dimensional minimization, as sketched below.
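A sketch of this reduced computation; the constants a, b, x_max below are illustrative placeholders, since the paper does not list numerical values.

```python
from scipy.integrate import solve_ivp

# Illustrative constants only; the paper does not give numerical values.
a, b, x_max = 0.9, -0.3, 2.0

def free_motion(t, x):
    """Free motion (46)-(47) after control has ceased."""
    return [-x[0], a * x[0] + b * x[1]]

# Tangency point P from (48)-(49).
xP = [-(b / a) * x_max, x_max]

# The arc through P is obtained by integrating (46)-(47) backward in time;
# what remains is a one-dimensional search over the departure time t_r
# from the constrained arc.
arc = solve_ivp(free_motion, (0.0, -10.0), xP, max_step=0.01)
```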
It is to be pointed out that the solutions of similar problems without the constraint (38) were obtained in Ref. 6 by backward procedures only because of the bilinear form of Eq. (36). For u = 0, this equation becomes linear as in (47), with a known analytical solution, from which

$$F(x_1(t_f), x_2(t_f)) = \max_{t \ge t_f} x_2(t) \qquad (50)$$

needed in (10), as well as the boundary values V_x1(tf) and V_x2(tf) of the adjoint variables, can be found analytically. Using the forward approach, it was possible to obtain the solution of a more general case with the state-space constraint (38) without exploiting the special bilinear structure of the system equations.

7. Second-Order Expansion

Using the method of DDP (Ref. 5), we will derive a second-order algorithm from the forward principle of optimality. Because the system equation (1) will be integrated backward in time, we have to ensure

satisfaction of the initial condition (2). This will be achieved by adjoining it to (3) by a constant Lagrange vector multiplier b. The new optimal cost function depends on b and is written as

$$W(x(t), b, t) = \min_{u}\Big\{ \langle b,\, x(t_0) - x_0 \rangle + \int_{t_0}^{t} L(x, u, \tau)\, d\tau \Big\}. \qquad (51)$$

From the forward principle of optimality, it must satisfy

$$\partial W/\partial t = \min_{u} G(x, u, W_x, t), \qquad (52)$$
$$W(x(t_0), b, t_0) = \langle b,\, x(t_0) - x_0 \rangle, \qquad (53)$$

where the Hamiltonian is given by

$$G(x, u, W_x, t) = L(x, u, t) - \langle W_x,\, f(x, u, t) \rangle. \qquad (54)$$

Similarly to (16), the optimal trajectory must yield

$$\min_{x(t_f),\, t_f}\ \big\{ F(x(t), t) + W(x(t_f), t_f) \big\}, \qquad t \ge t_f. \qquad (55)$$

The control vector u is assumed to be unconstrained, for reasons that will become obvious later.

Proceeding in a similar way to Ref. 5, it is assumed that a nominal multiplier b̄ and a nominal trajectory x̄, not necessarily passing through x0, are available for t ∈ [t0, tf]. If tf is not specified explicitly, t ∈ [t0, t̄f] for some nominal time t̄f > t0. The nominal trajectory is assumed to satisfy the terminal constraints (4) and (5) at time t̄f. This is easily achieved by integrating (1) on the first run backward in time from a feasible point. The constraints are never violated from then on.
The minimization of G along x̄(t) gives the necessary condition

$$G_u(\bar{x}, u^*, W_x, t) = 0,$$

where u* is the minimizing control. Henceforward, it is assumed that G_uu(x̄, u*, W_x, t) is a positive-definite matrix. On introduction of variations δx, δb, and δu, Eq. (52) is expanded up to the second order about x̄, u*, and b̄. The control variation is chosen in the form

$$\delta u = \beta_1\,\delta x.$$


This is not to be interpreted as a local linearized feedback which, in the
backward formulation, determines the control that should be applied in
order to maintain optimality along 2 +6x. Here, 6u rather determines a
trajectory which should have been followed in order to satisfy Eq. (51).

On collection of the corresponding terms of the expanded equation, one obtains the following set of differential equations:

$$\dot{a} = G(\bar{x}, \bar{u}, W_x, t) - G, \qquad a(t_0) = 0, \qquad (56)$$
$$\dot{W}_x = G_x + W_{xx}\big[f(\bar{x}, \bar{u}, t) - f\big], \qquad W_x(t_0) = b, \qquad (57)$$
$$\dot{W}_b = W_{bx}\big[f(\bar{x}, \bar{u}, t) - f\big], \qquad W_b(t_0) = \bar{x}(t_0) - x_0, \qquad (58)$$
$$\dot{W}_{xb} = -(f_x + f_u\beta_1)^T W_{xb}, \qquad W_{xb}(t_0) = I, \qquad (59)$$
$$\dot{W}_{bb} = -W_{xb}^T f_u G_{uu}^{-1} f_u^T W_{xb}, \qquad W_{bb}(t_0) = 0, \qquad (60)$$
$$\dot{W}_{xx} = G_{xx} - W_{xx}f_x - f_x^T W_{xx} - (G_{ux} - f_u^T W_{xx})^T G_{uu}^{-1}(G_{ux} - f_u^T W_{xx}), \qquad W_{xx}(t_0) = 0, \qquad (61)$$

where

$$\beta_1 = G_{uu}^{-1}\big(f_u^T W_{xx} - G_{ux}\big).$$
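One way to see where β₁ comes from (a brief sketch added here for clarity, not spelled out in the paper): collect the terms of the expansion of (52) that are quadratic in δu and bilinear in δu, δx, and minimize over δu under the assumption G_uu > 0:

```latex
\min_{\delta u}\Big\{\tfrac12\,\delta u^{T}G_{uu}\,\delta u
  + \delta u^{T}\big(G_{ux}-f_u^{T}W_{xx}\big)\delta x\Big\}
\quad\Longrightarrow\quad
\delta u = G_{uu}^{-1}\big(f_u^{T}W_{xx}-G_{ux}\big)\delta x
         = \beta_1\,\delta x .
```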

Unless otherwise stated, all quantities are evaluated at x̄, b̄, u*. a(t) is the predicted reduction of W(x̄, b̄, t), provided that the new control

u = u* + β₁ δx

is applied to (1) backward in time from x̄(t).

8. Computational Algorithm

The first four steps of the new algorithm are similar to those of Jacobson (Ref. 5), with the opposite direction of integration. As a result of their completion, a candidate x̄(t), t ∈ [t0, t̄f], for an optimum trajectory satisfying both the initial and the terminal constraints is obtained, along which a ≡ 0.
The last step requires numerical minimization in (55). The values of the optimal cost function are determined from its expansion

$$W(\bar{x} + \delta x,\, \bar{b},\, \bar{t}_f + \delta t_f) = W + \langle W_x, \delta x \rangle + \tfrac{1}{2}\langle \delta x,\, W_{xx}\,\delta x \rangle \qquad (62)$$

at time t̄f + δtf, which is known for δtf ≤ 0. If δtf > 0, one continues the integration of (1), (57), and (61) from t̄f to t̄f + δtf, while minimizing G to obtain u*. The variation δtf need not be small, since no time expansion is used in (62). To limit the size of δx, it may be necessary to take only a size-limited step in a decreasing direction δx_d of the minimized function in (55), rather than to complete the minimization. If δx_d ≠ 0, (1) is integrated backward in time from [x̄ + δx_d] at t̄f + δtf, applying the new nominal control

ū = u* + β₁ δx,

and the algorithm is restarted.

Fig. 6. Circuit diagram (R1 = 2 Ω, R2 = 5 Ω, C = 1 F).

Example 8.1. Optimal Charging of a Capacitor. Let us consider the simple circuit of Fig. 6, described by

$$\dot{x} = -(1/C)(1/R_1 + 1/R_2)x + (1/CR_1)u, \qquad x(0) = 0. \qquad (63)$$

We wish to find the voltage u(t) which charges the capacitor to

x_f = 10 V,

while minimizing the sum of the energy dissipated in the resistor R1 and the square of the deviation of x(t) from x_f, i.e.,

$$\min\Big\{ \tfrac{1}{2}\int_0^{t_f}\big[(x - x_f)^2 + R_1 i^2\big]\, dt \Big\}. \qquad (64)$$

The terminal time tf is free.


The starting nominal voltage was 26 V. The time histories of the input and output voltages and the charging current in Figs. 7-9, with dashed extensions for t > 3, are indexed by the iteration number. Because of the linear-quadratic nature of this problem, t̄f was determined by global minimization, and it is marked by vertical dashed lines. The cost functions represented in Fig. 7 by convex curves are clearly not parabolas; therefore, the variation δtf computed by backward DDP, which does use the time expansion, would

Fig. 7. Output voltage and corresponding cost.



Fig. 8. Optimal input voltage.

have generally to be reduced to ε δtf, ε < 1. The minimum cost of 98.32 was reached in four iterations, with t̄f = 3.075 and b = 14.14.
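For comparison, the same circuit can also be treated with the first-order algorithm of Section 6, since H_u = 0 here gives u* in closed form. The sketch below is an illustration, not the paper's computation; the circuit constants are from Fig. 6, while the penalty weight and the starting simplex point are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

# Circuit data from Fig. 6 and the target voltage.
R1, R2, C, XF = 2.0, 5.0, 1.0, 10.0
ALPHA = (1.0 / C) * (1.0 / R1 + 1.0 / R2)

def rhs(t, y):
    """Canonical system (22), (24) for (63)-(64) plus the running cost.
    With unconstrained control, H_u = 0 gives u* = x + Sx/C in closed
    form (the charging current is i = (u - x)/R1)."""
    x, Sx, _ = y
    u = x + Sx / C
    L = 0.5 * ((x - XF) ** 2 + (u - x) ** 2 / R1)
    xdot = -ALPHA * x + u / (C * R1)
    Sxdot = (x - XF) - (u - x) / R1 + ALPHA * Sx   # (22): dSx/dt = dH/dx
    return [xdot, Sxdot, L]

def cost(z, rho=100.0):
    """Objective over (b, tf); the terminal requirement x(tf) = 10 V is
    handled by a penalty whose weight is an assumption."""
    b, tf = z[0], max(z[1], 1e-3)
    sol = solve_ivp(rhs, (0.0, tf), [0.0, b, 0.0], max_step=0.01)
    x_tf, _, S = sol.y[:, -1]
    return S + rho * abs(x_tf - XF)

res = minimize(cost, x0=[10.0, 3.0], method="Nelder-Mead")
```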

9. Comments on Second-Order Algorithm

It was seen in Example 8.1 that even a linear-quadratic problem required iteration of the second-order algorithm. This was caused by ignoring the variation δb, which would keep the initial condition (2) satisfied after varying the terminal point during minimization. Knowledge of the system transition matrix would be needed to compute such a variation δb; therefore, an additional matrix equation would have to be integrated. The algorithm would then solve linear-quadratic problems in one iteration by global minimization in (55).

If it is easy to satisfy the terminal constraints on the first run by forward integration, it is advantageous to guess the Lagrange multiplier b̄ and to

Fig. 9. Optimal input current.



integrate (1) from x0 together with Eqs. (57) and (61) until some time tf, while minimizing G to obtain u*. Along such a trajectory, a ≡ 0, and the first four steps of the first iteration may be left out. No starting nominal control ū is needed in this case.
Comparing our second-order algorithm with Jacobson's algorithm for problems with free terminal time tf, it is seen that his algorithm requires the additional integration of two scalar equations and two vector equations with complicated boundary conditions. Also, the optimal terminal time in the case of bang-bang control has to be known in advance, and the computation of δtf in his algorithm is complicated by the condition V_tf ≥ 0, which may not hold along some nominal trajectories. The computed δtf generally has to be reduced to ε δtf, ε < 1, to ensure that the expansion of his optimal cost function in terms of tf is valid. No such expansion is used in our algorithm; neither is it required that the terminal cost or the constraints be differentiable.
The above second-order algorithm in its present form does not appear suitable for problems with constrained control. The reachable set for such problems is bounded; and, if a point outside this set is chosen by the minimization procedure, the algorithm will fail to find a variation δb enforcing satisfaction of the system initial conditions.
Computational experience with more complicated systems showed that
the forward integration of the Riccati-like equation (61) using normally
available integration subroutines causes computer overflow. This result
agrees with the observations reported in Ref. 8 and seems to limit the
general applicability of all second-order forward methods.

10. Conclusions

Using the forward principle of optimality, we have derived two computational algorithms, of which at least the first one, based on the forward
minimum principle, appears to be a powerful and easily programmable
method for the solution of optimal control problems with complicated or
nondifferentiable terminal conditions or both. It basically requires only an
integration subroutine and a procedure for the numerical minimization of
constrained functions. The approach seems to be ideally suited to on-line
dynamic optimization, since its storage requirements are minimal. The only
data that need to be stored are the adjoint initial conditions and their
associated costs. The fourth-order Runge-Kutta method and simplex
minimization combined with a penalty function for treating the constraints
should be adequate for most practical problems and simple enough to
program easily on small on-line computers.

Problems with free terminal time may be solved at no additional expense. The difficult question of locating the global minimum of a mul-
timodal function (which is likely to accompany problems with complicated
terminal conditions) could, at least partially, be resolved by off-line simula-
tions. It should also be possible to extend the forward approach to provide
solutions to stochastic control problems for which the open-loop feedback
policy (Ref. 9) is acceptable.

References

1. LEITMANN, G., and STALFORD, H., A Note on Termination in Optimal Control Problems, Journal of Optimization Theory and Applications, Vol. 8, pp. 228-230, 1971.
2. LARSON, R. E., State Increment Dynamic Programming, American Elsevier Publishing Company, New York, New York, 1968.
3. JACOBSON, D. H., A New Necessary Condition of Optimality for Singular Control Problems, SIAM Journal on Control, Vol. 7, pp. 578-593, 1969.
4. PESCHON, J., and HENAULT, P. H., Long-Term Power System Expansion Planning by Dynamic Programming and Production Cost Simulation, 9th IEEE Symposium on Adaptive Processes, Decision, and Control, Austin, Texas, 1970.
5. JACOBSON, D. H., and MAYNE, D. Q., Differential Dynamic Programming, American Elsevier Publishing Company, New York, New York, 1970.
6. ASH, M., Optimal Shutdown of Nuclear Reactors, Academic Press, New York, New York, 1966.
7. KWAN, H. K., Optimal Shutdown Control of Nuclear Reactors, University of Salford, MS Dissertation, 1973.
8. TAPLEY, B. D., and WILLIAMSON, W. E., Comparison of Linear and Riccati Equations Used to Solve Optimal Control Problems, AIAA Journal, Vol. 10, pp. 1154-1159, 1972.
9. DREYFUS, S. E., Dynamic Programming and the Calculus of Variations, Academic Press, New York, New York, 1965.
