Direct Methods SLP Report
May 8, 2021
Certificate
I certify that this Supervised Learning Project Report titled "Direct Methods for Optimal Control" represents, to the best of my knowledge, the research and work carried out by the candidate. I confirm that the investigations were conducted in accord with the institute's ethics and integrity standards, and that the research data are presented honestly and without prejudice.
Table of Contents
Abstract
Acknowledgements
List of Figures
1 Introduction
1.1 Problem Formulation
1.2 Indirect Methods
1.3 Direct Methods
4.3 Conclusions
References
Abstract
In this report, we study numerical methods for optimal control. In particular, we discuss direct methods, with the objective of presenting an approach applicable to general optimal control problems. These methods, which include single shooting, multiple shooting, and collocation methods, are relatively simple to follow and effectively solve a wide variety of optimization problems. We illustrate each of the methods by working through example problems.
Acknowledgements
List of Figures
Chapter 1
Introduction
1.1 Problem Formulation
In general, an objective function can include two terms: a boundary objective Φ(·)
and a path integral along the entire trajectory, with the integrand L(·). A problem
with both terms is said to be in Bolza form. A problem with only the integral
term is said to be in Lagrange form, and a problem with only a boundary term is
said to be in Mayer form [5].
J := Φ(x(t_0), t_0, x(t_f), t_f) + ∫_{t_0}^{t_f} L(x(t), u(t), t) dt        (1.1)
Here we determine the state (equivalently, the trajectory) x(t) ∈ R^n, the control u(t) ∈ R^m, the initial time t_0 ∈ R, and the terminal time t_f ∈ R (where t ∈ [t_0, t_f] is the independent variable) that optimize (generally minimize) the objective function J in (1.1). The optimization is subject to a variety of limits and constraints, detailed in (1.2)–(1.4).
The first, and perhaps most important, of these constraints is the system dynamics, which are typically nonlinear and describe how the system changes in time:

ẋ(t) = f(x(t), u(t), t),    t ∈ [t_0, t_f].        (1.2)
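In a typical formulation, consistent with the functions h and g that appear later in the multiple shooting NLP (2.2), the remaining constraints take the form of inequality path constraints and boundary constraints (the precise form of (1.3)–(1.4) is an assumption here):

h(x(t), u(t), t) ≤ 0,    t ∈ [t_0, t_f],        (1.3)

g(x(t_0), t_0, x(t_f), t_f) ≤ 0.        (1.4)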
Numerical methods for solving optimal control problems are divided into two ma-
jor classes: indirect methods (optimize, then discretize) and direct methods (dis-
cretize, then optimize) [1].
1.2 Indirect Methods
Indirect methods use the necessary conditions of optimality of the infinite-dimensional problem to derive a boundary value problem (BVP) in ordinary differential equations (ODEs). The BVP is solved numerically to determine candidate optimal trajectories, called extremals. Each computed extremal is then examined to see whether it is a local minimum, maximum, or saddle point. Of the locally optimizing solutions, the extremal with the lowest cost is chosen.
Here we seek a solution to the (closed system of) conditions of optimality, which can be derived using the well-known calculus of variations and the Euler-Lagrange equations.
1.3 Direct Methods
Direct methods for continuous optimal control finitely parameterize the infinite-dimensional decision variables, notably the controls u(t) and sometimes both the states x(t) and the controls u(t), such that the original problem is approximated by a finite-dimensional nonlinear program (NLP). This NLP can then be addressed by structure-exploiting numerical NLP solution methods. For this reason, the approach is often characterized as "first discretize, then optimize." We will discuss direct single shooting, direct multiple shooting, and direct collocation in this report.
Chapter 2
Direct Shooting Methods
2.1 Overview
2.2 Direct Single Shooting
We denote the finite control parameters by the vector q [7], and the resulting control function by u(t; q). The most widespread parameterization is piecewise constant controls, for which we choose a fixed grid t_0 = 0 < t_1 < ... < t_N = t_f and N parameters q_i ∈ R^{n_u}, i = 0, ..., N − 1, and then set

u(t; q) = q_i   if t ∈ [t_i, t_{i+1}].

Thus, the dimension of the vector q = (q_0, ..., q_{N−1}) is N n_u. In single shooting, which is a sequential approach, we then regard the states x(t) on [t_0, t_f], keeping t_0 fixed at zero, as dependent variables that are obtained by a forward integration of the dynamic system, starting at x_0 and using the controls u(t; q). This integration can be performed using methods such as forward Euler, backward Euler, Crank-Nicolson, Runge-Kutta, Hermite-Simpson, and many more, depending upon the required accuracy. We denote the resulting trajectory by x(t; q). In order to discretize inequality path constraints, we choose a grid, typically the same as for the control discretization, at which we check the inequalities. Thus, in single shooting, we transcribe the OCP (1.1) into the following NLP:
min_{q ∈ R^{N n_u}}   Φ(x(t_0; q), t_0, x(t_f; q), t_f) + ∫_{t_0}^{t_f} L(x(t; q), u(t; q), t) dt
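To make the sequential nature of single shooting concrete, the following is a minimal Python sketch (not from the report; the dynamics, cost, grid size, and solver settings are illustrative assumptions). It parameterizes the control as piecewise constant, integrates the states forward with a fixed-step RK4 scheme, and hands the resulting finite-dimensional objective to a generic optimization routine.

import numpy as np
from scipy.optimize import minimize

# Illustrative problem data (assumptions, not from the report):
# double-integrator dynamics with a quadratic running cost.
def f(x, u):
    return np.array([x[1], u])          # x = (position, velocity)

def stage_cost(x, u):
    return x[0] ** 2 + x[1] ** 2 + 0.1 * u ** 2

x0 = np.array([1.0, 0.0])               # fixed initial state
t0, tf, N = 0.0, 2.0, 20                # control grid t_0 = 0 < ... < t_N = t_f
h = (tf - t0) / N

def rk4_step(x, u, h):
    k1 = f(x, u)
    k2 = f(x + 0.5 * h * k1, u)
    k3 = f(x + 0.5 * h * k2, u)
    k4 = f(x + h * k3, u)
    return x + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def objective(q):
    """Forward-simulate with piecewise-constant controls q and accumulate the cost."""
    x, J = x0.copy(), 0.0
    for i in range(N):
        J += h * stage_cost(x, q[i])    # simple rectangle rule for the integral cost
        x = rk4_step(x, q[i], h)
    return J

q_opt = minimize(objective, np.zeros(N), method="BFGS").x
print("piecewise-constant controls:", q_opt)

Because the states are eliminated by the forward simulation, the only decision variables are the N control parameters; this is exactly the "sequential" character of single shooting.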
2.3 Direct Multiple Shooting
The direct multiple shooting method first performs a piecewise control discretization on a grid, exactly as in single shooting, i.e. we set

u(t) = q_i   for t ∈ [t_i, t_{i+1}].

But then it solves the ODE separately on each interval [t_i, t_{i+1}], starting from artificial initial values s_i:

ẋ_i(t; s_i, q_i) = f(x_i(t; s_i, q_i), q_i)   for t ∈ [t_i, t_{i+1}],
x_i(t_i; s_i, q_i) = s_i.
Thus, we obtain trajectory pieces x_i(t; s_i, q_i). Likewise, we numerically compute the integrals

l_i(s_i, q_i) := ∫_{t_i}^{t_{i+1}} L(x_i(t; s_i, q_i), u(t; q_i), t) dt.
Finally, we choose a grid at which we check the inequality path constraints; here we choose the same grid as for the controls and states, but note that a much finer sampling would be possible as well, which, however, requires continuous output from the integrator. Thus, the NLP that is solved in multiple shooting is given
by:
min_{s, q}   Φ(s_0, t_0, s_N, t_f) + ∑_{i=0}^{N−1} l_i(s_i, q_i)
s.t.   h(t_i, s_i, q_i) ≤ 0,   i = 0, ..., N,
       g(t_0, t_N, s_0, s_N) ≤ 0,                                        (2.2)
       x_i(t_{i+1}; s_i, q_i) − s_{i+1} = 0,   i = 0, ..., N − 1.
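As a complement to the single shooting sketch above, the following Python fragment (again illustrative only; the segment integrator, problem data, and solver are assumptions) shows how the multiple shooting decision vector (s_0, ..., s_N, q_0, ..., q_{N−1}) and the continuity (defect) constraints x_i(t_{i+1}; s_i, q_i) − s_{i+1} = 0 of (2.2) can be assembled for an equality-constrained NLP solver.

import numpy as np
from scipy.optimize import minimize

nx, N, h = 2, 20, 0.1                       # state dimension, segments, segment length

def f(x, u):
    return np.array([x[1], u])              # illustrative double-integrator dynamics

def simulate_segment(s, q):
    """Integrate one segment [t_i, t_{i+1}] from s with a few RK4 steps under constant control q."""
    x, steps = s.copy(), 4
    dt = h / steps
    for _ in range(steps):
        k1 = f(x, q); k2 = f(x + 0.5 * dt * k1, q)
        k3 = f(x + 0.5 * dt * k2, q); k4 = f(x + dt * k3, q)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def split(z):
    s = z[: (N + 1) * nx].reshape(N + 1, nx)    # node states s_0, ..., s_N
    q = z[(N + 1) * nx:]                        # controls q_0, ..., q_{N-1}
    return s, q

def objective(z):
    s, q = split(z)
    return h * np.sum(s[:-1, 0] ** 2 + s[:-1, 1] ** 2 + 0.1 * q ** 2)

def defects(z):
    """Continuity constraints: the end of each simulated segment must match the next node state."""
    s, q = split(z)
    return np.concatenate([simulate_segment(s[i], q[i]) - s[i + 1] for i in range(N)])

def initial_condition(z):
    s, _ = split(z)
    return s[0] - np.array([1.0, 0.0])          # fix s_0 to the given initial state

z0 = np.zeros((N + 1) * nx + N)
res = minimize(objective, z0, method="SLSQP",
               constraints=[{"type": "eq", "fun": defects},
                            {"type": "eq", "fun": initial_condition}])
print("converged:", res.success)

The node states s_i now appear explicitly as decision variables, which is what makes the NLP larger but also sparser and more linear than its single shooting counterpart.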
2.4 Discussions
In both methods the state trajectory is obtained as the result of a simulation. Notice that multiple shooting is just like a series of single shooting methods, with defect constraints added to make the trajectory continuous. Multiple shooting results in a higher-dimensional nonlinear program, but one that is sparse and more linear than the program produced by single shooting. It is therefore important that we employ a sparsity-exploiting NLP solver.
Single shooting works well enough for simple problems, but it will almost certainly fail on problems that are more complicated. This is because the relationship between the decision variables and the objective and constraint functions is not well approximated by the linear (or quadratic) model that the NLP solver uses [2]. This is much less of a problem for multiple shooting: because each segment is shorter, the relationship becomes more linear.
Chapter 3
Direct Collocation Methods
3.1 Overview
Arguably the most powerful methods for solving general optimal control problems
are direct collocation methods. A direct collocation method is a state and con-
trol parameterization method where the state and control are approximated using a
specified functional form. The time derivative of the state is approximated by differentiating the interpolating polynomial, and the dynamics are enforced by constraining this derivative to equal the vector field at a finite set of collocation points.
The two most common forms of collocation are local collocation and global collocation [1]; in terms of methodology, these correspond to h-methods and p-methods, respectively.
For p-methods, we divide the time horizon [t_0, t_f] into a fixed number of mesh intervals. The state is approximated in each interval by an Nth-order polynomial, and convergence of the p-method is achieved by increasing the degree of the polynomial approximation. For problems whose solutions are smooth and well behaved, a Gaussian quadrature collocation p-method has a simple structure and converges at an exponential rate. The most well-developed Gaussian quadrature p-methods are those that employ either Legendre-Gauss (LG) points, Legendre-Gauss-Radau (LGR) points, or Legendre-Gauss-Lobatto (LGL) points [8].
In a local (h-method) approach, the time interval is divided into a large number of subintervals, called segments or finite elements, and a small number of collocation points is used within each segment. The segments are then linked via continuity conditions on the state, the independent variable, and possibly the control. The rationale for using local collocation is that a local method provides so-called local support (i.e., the discretization points are located so that they support the local behavior of the dynamics).
A local collocation method follows a procedure in which the time interval [t_0, t_f] is divided into S subintervals [t_{s−1}, t_s], (s = 1, ..., S), where t_S = t_f. In order to ensure continuity in the state across subintervals, the following compatibility constraint is enforced at the interface of each subinterval:

x(t_s^−) = x(t_s^+),   (s = 2, ..., S − 1).
In the context of optimal control, local collocation has been employed using one of two categories of discretization: Runge-Kutta methods and orthogonal collocation methods [1].
The three most commonly used sets of collocation points are Legendre-Gauss (LG), Legendre-Gauss-Radau (LGR), and Legendre-Gauss-Lobatto (LGL) points. These three sets of points are obtained from the roots of a Legendre polynomial and/or linear combinations of a Legendre polynomial and its derivatives. All three sets of points are defined on the domain [−1, 1], but they differ significantly in that the LG points include neither of the endpoints, the LGR points include one of the endpoints, and the LGL points include both of the endpoints [3].
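As a small numerical illustration (a sketch using NumPy, not part of the report): the LG points are the roots of the degree-N Legendre polynomial P_N, the LGR points are the roots of P_{N−1} + P_N, and the LGL points are the endpoints ±1 together with the roots of the derivative of P_{N−1}.

import numpy as np
from numpy.polynomial import legendre as L

N = 5  # number of collocation points (illustrative choice)

# LG points: roots of P_N; include neither endpoint.
lg = L.leggauss(N)[0]

# LGR points: roots of P_{N-1} + P_N; include the endpoint -1.
lgr = L.Legendre(np.r_[np.zeros(N - 1), 1.0, 1.0]).roots()

# LGL points: endpoints -1 and +1 plus the roots of P'_{N-1}.
interior = L.Legendre.basis(N - 1).deriv().roots()
lgl = np.concatenate(([-1.0], interior, [1.0]))

print("LG :", np.round(lg, 4))
print("LGR:", np.round(lgr, 4))
print("LGL:", np.round(lgl, 4))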
Figure 3.2: Differences between LGL, LGR, and LG collocation points: ref. [3]
Figure 3.3: Distribution of nodes and collocation points for both the global and local
approaches (N = 20): ref. [4]
The results obtained in [4] suggest that, except in special circumstances, global orthogonal collocation is preferable to local orthogonal collocation. [4] also suggests that the global Gauss pseudospectral method (GPM) is much more accurate than the local GPM for a given number of total collocation points. Furthermore, for a desired accuracy, the global approach is computationally more efficient than the local approach on smooth problems. For non-smooth problems, the local and global approaches are quite similar in terms of accuracy.
For this illustration we employ local collocation with Legendre-Gauss (LG) collocation points. The state at a set of collocation points (in addition to the beginning of the interval) enters the NLP as variables. An example of a choice of time points on each control interval [t_k, t_{k+1}], of length h_k = t_{k+1} − t_k, is t_{k,j} = t_k + τ_j h_k, j = 0, ..., d, where τ_0 = 0 and τ_1, ..., τ_d are the LG collocation points scaled to (0, 1], as well as the final time t_{N,0} = t_f. Also let x_{k,j} denote the states at these time points. On each control interval, we shall define a Lagrangian polynomial basis:
L_j(τ) = ∏_{r=0, r≠j}^{d} (τ − τ_r)/(τ_j − τ_r).

The state on interval k is then approximated by the interpolating polynomial x_k(t) ≈ ∑_{r=0}^{d} L_r(τ) x_{k,r}, with τ = (t − t_k)/h_k, so that the state derivative at the collocation points and the state at the end of the interval are approximated by

ẋ_k(t_{k,j}) ≈ (1/h_k) ∑_{r=0}^{d} C_{r,j} x_{k,r},        (3.1)

x_{k+1,0} ≈ ∑_{r=0}^{d} D_r x_{k,r},        (3.2)

where C_{r,j} := L̇_r(τ_j) and D_r := L_r(1). Plugging the approximation of the state derivative (3.1) into the ODE gives us a set of collocation equations that need to be satisfied at every collocation point:
h_k f(t_{k,j}, x_{k,j}, u_k) − ∑_{r=0}^{d} C_{r,j} x_{k,r} = 0,   k = 0, ..., N − 1,  j = 1, ..., d.        (3.3)
And the approximation of the end state (3.2) gives us a set of continuity equations
that must be satisfied for every control interval:
x_{k+1,0} − ∑_{r=0}^{d} D_r x_{k,r} = 0,   k = 0, ..., N − 1.        (3.4)
These two sets of equations [(3.3) and (3.4)] take the place of the continuity equa-
tion (represented by the integrator call) in direct multiple shooting.
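The coefficients C_{r,j} and D_r depend only on the chosen collocation points, so they can be precomputed once. The following Python sketch (illustrative, not from the report) builds them by differentiating and evaluating the Lagrange basis polynomials numerically:

import numpy as np

def collocation_coefficients(tau):
    """Given nodes tau[0..d] on [0, 1] (tau[0] = 0 is the start of the interval),
    return C[r, j] = dL_r/dtau evaluated at tau[j], and D[r] = L_r(1)."""
    d = len(tau) - 1
    C = np.zeros((d + 1, d + 1))
    D = np.zeros(d + 1)
    for r in range(d + 1):
        # Build the Lagrange basis polynomial L_r: L_r(tau[r]) = 1, L_r(tau[s]) = 0 for s != r.
        p = np.poly1d([1.0])
        for s in range(d + 1):
            if s != r:
                p = p * np.poly1d([1.0, -tau[s]]) / (tau[r] - tau[s])
        D[r] = p(1.0)               # value of L_r at the end of the interval
        dp = np.polyder(p)
        for j in range(d + 1):
            C[r, j] = dp(tau[j])    # derivative of L_r at node tau[j]
    return C, D

# Example: three Legendre-Gauss points shifted from [-1, 1] to (0, 1],
# prepended with tau_0 = 0 for the start of the interval.
lg = np.polynomial.legendre.leggauss(3)[0]
tau = np.concatenate(([0.0], (lg + 1.0) / 2.0))
C, D = collocation_coefficients(tau)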
Chapter 4
Optimal Control Problems
4.1 The Bryson-Denham Problem
4.1.1 Description
The Bryson-Denham optimal control problem is a benchmark test problem for op-
timal control algorithms. Consider the system
ẋ(t) = v(t),   v̇(t) = u(t).        (4.1)
The control u(t) ∈ R (acceleration) is adjusted over the time horizon from a starting time of zero to a final time of one. The variable x(t) ∈ R is the position and v(t) ∈ R is the velocity. Consider the optimal control problem
minimize     (1/2) ∫_0^1 u(t)^2 dt
subject to   the system dynamics (4.1),
             x(0) = x(1) = 0,                        (4.2)
             v(0) = −v(1) = 1,
             x(t) ≤ l,   where l = 1/9.
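As an illustration of how such a problem can be transcribed, the following Python sketch (not from the report; the grid size, integration rule, and solver are assumptions) discretizes (4.1)–(4.2) on a uniform grid with trapezoidal defect constraints and solves the resulting NLP with a generic SQP-type solver:

import numpy as np
from scipy.optimize import minimize

N = 30                      # number of grid intervals on [0, 1] (assumed)
h = 1.0 / N
l = 1.0 / 9.0

def split(z):
    """The decision vector holds x_k, v_k, u_k at the N + 1 grid points."""
    x = z[0 : N + 1]
    v = z[N + 1 : 2 * (N + 1)]
    u = z[2 * (N + 1) :]
    return x, v, u

def objective(z):
    _, _, u = split(z)
    # Trapezoidal approximation of (1/2) * integral of u^2.
    return 0.5 * h * (0.5 * u[0] ** 2 + np.sum(u[1:-1] ** 2) + 0.5 * u[-1] ** 2)

def defects(z):
    """Trapezoidal collocation of xdot = v and vdot = u on every interval."""
    x, v, u = split(z)
    dx = x[1:] - x[:-1] - 0.5 * h * (v[1:] + v[:-1])
    dv = v[1:] - v[:-1] - 0.5 * h * (u[1:] + u[:-1])
    return np.concatenate([dx, dv])

def boundary(z):
    x, v, _ = split(z)
    return np.array([x[0], x[-1], v[0] - 1.0, v[-1] + 1.0])

# Path constraint x(t) <= l imposed at the grid points via simple bounds.
bounds = [(None, l)] * (N + 1) + [(None, None)] * (2 * (N + 1))

z0 = np.zeros(3 * (N + 1))
res = minimize(objective, z0, method="SLSQP", bounds=bounds,
               constraints=[{"type": "eq", "fun": defects},
                            {"type": "eq", "fun": boundary}])
x_opt, v_opt, u_opt = split(res.x)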
4.2 Time-Optimal Control of the Double Integrator
4.2.1 Description
Consider the system ẍ = u, u ∈ [−1, 1], which can represent a car with position x ∈ R and with bounded acceleration u acting as the control (negative acceleration corresponds to braking). Let us study the problem of parking the car at the origin, i.e., bringing it to rest at x = 0, in minimal time. It is clear that the system can indeed be brought to rest at the origin from every initial condition. However, since the control is bounded, we cannot do this arbitrarily fast (we are ignoring the trivial case when the system is initialized at the origin). Thus we expect that there exists an optimal control u∗ which achieves the transfer in the smallest amount of time [6].
We know that the dynamics of the double integrator are equivalently described by
the state-space equations
ẋ_1 = x_2,   ẋ_2 = u.        (4.3)
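Since the final time is itself unknown in this problem, a direct transcription must treat t_f as an additional decision variable, for example by scaling time to a fixed interval. The sketch below is illustrative only (the Euler discretization, grid size, initial state, and solver are assumptions): it minimizes t_f subject to the time-scaled dynamics of (4.3), bounded controls u ∈ [−1, 1], and the terminal condition x(t_f) = 0.

import numpy as np
from scipy.optimize import minimize

N = 40                                   # grid intervals on the scaled time [0, 1]
x_init = np.array([1.0, 0.0])            # assumed initial state (x_1, x_2)

def split(z):
    tf = z[0]
    x = z[1 : 1 + 2 * (N + 1)].reshape(N + 1, 2)   # states at the grid points
    u = z[1 + 2 * (N + 1) :]                        # piecewise-constant controls
    return tf, x, u

def objective(z):
    return z[0]                                     # minimize the final time t_f

def defects(z):
    """Forward-Euler defects of the time-scaled dynamics dx/dtau = tf * f(x, u)."""
    tf, x, u = split(z)
    h = 1.0 / N
    f = np.column_stack([x[:-1, 1], u])             # f(x, u) = (x_2, u) on each interval
    return (x[1:] - x[:-1] - h * tf * f).ravel()

def boundary(z):
    _, x, _ = split(z)
    return np.concatenate([x[0] - x_init, x[-1]])   # fixed initial state, rest at the origin

bounds = [(1e-3, None)] + [(None, None)] * (2 * (N + 1)) + [(-1.0, 1.0)] * N
z0 = np.concatenate([[2.0], np.tile(x_init, N + 1), np.zeros(N)])

res = minimize(objective, z0, method="SLSQP", bounds=bounds,
               constraints=[{"type": "eq", "fun": defects},
                            {"type": "eq", "fun": boundary}])
tf_opt, x_opt, u_opt = split(res.x)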
4.3 Conclusions
Except for the single shooting method used to solve the bang-bang double integrator problem, all the methods gave reasonably accurate results for each of the problems. The non-smoothness of the optimal control function in bang-bang problems might have caused the single shooting method to fail.
Looking at these direct methods for optimal control, we can summarise them in the following way: when we went from direct single shooting to direct multiple shooting, we essentially traded nonlinearity for problem size. The NLP in single shooting is small but often highly nonlinear, whereas the NLP for multiple shooting is larger but less nonlinear, with a sparsity structure that can be exploited efficiently. Direct collocation takes one more step in the same direction, adding even more degrees of freedom. The resulting NLP is even larger, but has even more structure that can be exploited.
Direct methods are easier to implement and allow effective incorporation of state and control constraints. However, this can come at the cost of accuracy, which is why it is important to choose an appropriate method based on the problem and its requirements.
References
[1] Anil V. Rao. A survey of numerical methods for optimal control. Advances in
the Astronautical Sciences, 135, 2010.
[3] Divya Garg et al. Direct trajectory optimization and costate estimation of general optimal control problems using a Radau pseudospectral method. AIAA Guidance, Navigation, and Control Conference, 2009.
[4] Geoffrey T. Huntington and Anil V. Rao. Comparison of global and local collocation methods for optimal control. Journal of Guidance, Control, and Dynamics, 31, 2008.
[6] Daniel Liberzon. Calculus of Variations and Optimal Control Theory: A Concise
Introduction. Princeton University Press, 2012.