Pontryagin Principle of Maximum Time-Optimal Control: Constrained Control, Bang-Bang Control
Zdeněk Hurák
April 27, 2017
From the conditions derived earlier,

$$H_{y'} = 0, \qquad (1)$$

$$L_{y'y'} \geq 0, \qquad (2)$$

we concluded that the Hamiltonian is not only stationary along the extremal; it is actually maximized, since

$$H_{y'y'} = -L_{y'y'} \leq 0. \qquad (3)$$
Pontryagin's principle of maximum; time-optimal control
or, equivalently, as
The essence of the celebrated Pontryagin's principle is that the above condition, the maximization of the Hamiltonian, is the actual necessary condition of optimality. The condition

$$H_u = 0 \qquad (9)$$

is just its consequence in the situation when $H_u$ exists and the set of allowable controls is not bounded. Let us emphasize the fact that the control u comes from some bounded set U by writing Pontryagin's principle as follows.
Theorem 1 (Pontryagin's principle of maximum). For a given system and a given optimality criterion, let u* ∈ U be an optimal control. Then there is a variable called the costate which together with the state satisfies the Hamilton canonical equations

$$\dot{x} = \nabla_\lambda H, \qquad (10)$$

$$\dot{\lambda} = -\nabla_x H, \qquad (11)$$

where

$$H(t, x, u, \lambda) = \lambda^T(t)\, f(x, u, t) - L(t, x, u) \qquad (12)$$

and

$$H(t, x^*, u^*, \lambda^*) \geq H(t, x^*, u, \lambda^*), \qquad u \in U. \qquad (13)$$

Moreover, the corresponding boundary conditions must hold.
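To appreciate what the maximum condition (13) adds over the stationarity condition (9), consider the typical time-optimal situation in which the Hamiltonian is affine in a scalar control restricted to U = [−1, 1]: the stationarity condition is then useless, while the maximum condition picks a vertex of U. A minimal sketch in Python (the numbers below are illustrative, not from the lecture):

    import numpy as np

    # Hamiltonian affine in the scalar control: H(u) = h0 + h1*u, with u in U = [-1, 1].
    # If h1 != 0, H_u = h1 never vanishes, so (9) gives nothing; the maximum
    # condition (13) still determines the optimal control uniquely.
    def argmax_H_on_interval(h1):
        """Return the u in [-1, 1] that maximizes h0 + h1*u (h0 is irrelevant)."""
        return float(np.sign(h1)) if h1 != 0 else 0.0   # any u in U is optimal if h1 == 0

    for h1 in [2.3, -0.7, 0.0]:
        print(f"h1 = {h1:+.1f}  ->  maximizing u = {argmax_H_on_interval(h1):+.1f}")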
Figure 1: Optimizing over curves with one of their end points on a curve. (The sketch shows the optimal curve y*(x), a perturbed curve y(x) with perturbation δy(x), and the right end of the interval moving from b to b + db.)
The trick is that the stretching or shrinking of the interval of the independent variable is done by perturbing the stationary value of the right end b of the interval with the same α that we use to perturb the functions y and y'. That is, b is perturbed by ∆b = α∆x and the perturbed cost functional is then

$$J(y^* + \alpha\eta) = \int_a^{b+\alpha\Delta x} L\big(x,\; y^* + \alpha\eta,\; (y^*)' + \alpha\eta'\big)\,\mathrm{d}x. \qquad (15)$$

Note that we have a minor technical problem here since y* is only defined on the interval [a, b]. But there is an easy fix: we define a continuation of the function to the right of b in the form of a linear approximation given by the derivative of y* at b. We will exploit it in a while.
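Written out explicitly (this is just the extension the previous paragraph describes, stated as a formula), the continuation can be taken as

$$y^*(x) := y^*(b) + (y^*)'(b)\,(x - b), \qquad x \in (b,\, b + \alpha\Delta x],$$

which matches y* in value and slope at x = b.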
Now, in order to find the variation δJ, we can either proceed by fitting the Taylor expansion of the above perturbed cost functional to the general Taylor expansion and identifying the first-order term in α, or, alternatively (in fact, equivalently), we can use the already stated fact that

$$\delta J = \left.\frac{\mathrm{d}}{\mathrm{d}\alpha} J(y^* + \alpha\eta)\right|_{\alpha=0} \alpha. \qquad (16)$$
In order to find this derivative, we have to observe that the variable with respect to which we are differentiating also appears in the upper bound of the integral. Therefore we cannot simply interchange differentiation and integration. This situation is handled by the well-known Leibniz rule for differentiation under the integral sign; look it up in its full generality yourself. In our case it leads to
$$\left.\frac{\mathrm{d}}{\mathrm{d}\alpha} J(y^* + \alpha\eta)\right|_{\alpha=0} = \int_a^b \left( L_y - \frac{\mathrm{d}}{\mathrm{d}x} L_{y'} \right)\eta(x)\,\mathrm{d}x + L_{y'}\big|_b\,\eta(b) + L\big|_b\,\Delta x, \qquad (17)$$

which after multiplication by α gives the variation of the functional

$$\delta J = \int_a^b \left( L_y - \frac{\mathrm{d}}{\mathrm{d}x} L_{y'} \right)\delta y(x)\,\mathrm{d}x + L_{y'}\big|_b\,\delta y(b) + L\big|_b\,\underbrace{\Delta x\,\alpha}_{\Delta b}, \qquad (18)$$
where the first two terms on the right are already known to us. The only new one is the third term. The reasoning now is pretty much the same as in the fixed-interval, free-end case. Among the admissible variations there are also those with δy vanishing at b and with ∆b = 0; for these the last two terms disappear, and since the variation δJ must vanish for them too, the integral must be zero, which gives rise to the familiar Euler-Lagrange equation. But then the integral vanishes for all variations, and hence the last two terms must together be zero as well. It does not hurt to rewrite them in a complete notation to dispel any ambiguity:

$$L_{y'}(b, y(b), y'(b))\,\delta y(b) + L(b, y(b), y'(b))\,\Delta b = 0. \qquad (19)$$
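For reference, the general form of the Leibniz rule invoked above (a standard result, quoted here for convenience) reads, for sufficiently smooth F, a and b,

$$\frac{\mathrm{d}}{\mathrm{d}\alpha}\int_{a(\alpha)}^{b(\alpha)} F(x,\alpha)\,\mathrm{d}x = \int_{a(\alpha)}^{b(\alpha)} \frac{\partial F(x,\alpha)}{\partial \alpha}\,\mathrm{d}x + F\big(b(\alpha),\alpha\big)\,\frac{\mathrm{d}b}{\mathrm{d}\alpha} - F\big(a(\alpha),\alpha\big)\,\frac{\mathrm{d}a}{\mathrm{d}\alpha}.$$

In our case a is fixed and db/dα = ∆x, which is precisely where the boundary term L|_b ∆x in (17) comes from.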
Now, in order to get some more insight into the above condition, the relation between the participating objects can be explored further. We will do it using Fig. 1, augmented with a few more labels; see Fig. 2 below.
Note that we have included a brand new label here, namely dyf for the perturbation of the value of the function y(·) at the end of the interval (taking into consideration that the length of the interval can change as well). We can now write

$$y^*(b + \Delta b) + \delta y(b + \Delta b) = y^*(b) + \mathrm{d}y_f, \qquad (20)$$
Figure 2: Optimizing over curves with one end of the interval of the independent variable x set free and relaxing also the value of the function there. (The sketch shows y*(x) and a perturbed y(x); at the right end, the total change dyf is composed of δy(b) and the term y*'(b) db due to the shift of the end point from b to b + db.)
which after approximating each term with the first two terms of its Taylor expansion gives

$$y^*(b) + (y^*)'(b)\,\Delta b + \delta y(b) + \delta y'(b)\,\Delta b = y^*(b) + \mathrm{d}y_f. \qquad (21)$$

Note that the last product on the left can be neglected since it is a product of two factors that are both of order one in the perturbation variable α. In other words, we approximate δy(b + ∆b) by δy(b). In addition, the term y*(b) can be subtracted from both sides. From what remains after these cancellations, we can conclude that

$$(y^*)'(b)\,\Delta b + \delta y(b) = \mathrm{d}y_f, \qquad (22)$$

or, equivalently,

$$\delta y(b) = \mathrm{d}y_f - (y^*)'(b)\,\Delta b. \qquad (23)$$
We will now substitute this into the general form of the boundary equation (19):

$$L_{y'}(b, y(b), y'(b)) \cdot \big(\mathrm{d}y_f - (y^*)'(b)\,\Delta b\big) + L(b, y(b), y'(b))\,\Delta b = 0. \qquad (24)$$

Collecting the terms with the two independent perturbation variables dyf and ∆b, we rearrange the above expression into

$$L_{y'}(b, y(b), y'(b))\,\mathrm{d}y_f + \big(L(b, y(b), y'(b)) - L_{y'}(b, y(b), y'(b))\,(y^*)'(b)\big)\,\Delta b = 0. \qquad (25)$$

Now, since dyf and ∆b are assumed independent, the corresponding terms must vanish simultaneously and independently, that is,

$$L_{y'}(b, y(b), y'(b)) = 0, \qquad (26)$$

$$L(b, y(b), y'(b)) - L_{y'}(b, y(b), y'(b))\,(y^*)'(b) = 0. \qquad (27)$$

Note that the first condition actually constitutes n scalar conditions whereas the second one is just a single scalar condition, hence n + 1 boundary conditions in total.
We now switch to the optimal control setting: the cost to be minimized is

$$J = \phi(x(t_f), t_f) + \int_{t_i}^{t_f} L(x, u, t)\,\mathrm{d}t \qquad (28)$$

subject to

$$\dot{x}(t) = f(x, u, t), \qquad x(t_i) = r_i. \qquad (29)$$
We have already seen that the integrand of the augmented cost function now contains not only the term that corresponds to the Lagrange multiplier but also the term coming from the penalty on the state at the final time, that is,

$$L_{\mathrm{aug}}(x, u, \lambda, t) = L(x, u, t) + \underbrace{\frac{\partial \phi}{\partial t} + (\nabla_x \phi)^T \frac{\mathrm{d}x}{\mathrm{d}t}}_{\frac{\mathrm{d}\phi(x(t), t)}{\mathrm{d}t}} + \lambda^T\big(\dot{x} - f(x, u, t)\big). \qquad (30)$$
Since here we assume that the final time and the state at the final time are independent, this single condition breaks down into two boundary conditions¹, which, in turn, enforces the scalar boundary condition (on top of those other n conditions)

$$H(x(t), u(t), \lambda(t), t)\big|_{t=t_f}\, \mathrm{d}t_f = 0. \qquad (36)$$
¹ Note that here we commit the common abuse of notation of writing the functions to be differentiated as explicitly dependent on tf, such as in ∂φ(x(tf), tf)/∂t. Instead, we should perhaps keep writing it as ∂φ(x(t), t)/∂t |_{t=tf}, but that is tiring and the formulas look cluttered.
Now, if neither the system equations nor the cost function of the optimal control problem depend explicitly on time, that is, if ∂H/∂t = 0, the Hamiltonian remains constant along the optimal solution (trajectory), that is, H(x*(t), u*(t), λ*(t)) = const. Combined with the previous result (the boundary value of H at the end of the free time interval is zero), we obtain the powerful conclusion that the Hamiltonian evaluated along the optimal trajectory is always zero in the free-final-time scenario: H(x*(t), u*(t), λ*(t)) = 0 for all t.
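The constancy claim follows from a short computation along the extremal (a sketch; the case of a control on the boundary of U needs a slightly more careful argument, but the conclusion is the same). Using the canonical equations (10)-(11) and the fact that H is maximized with respect to u,

$$\frac{\mathrm{d}H}{\mathrm{d}t} = (\nabla_x H)^T \dot{x} + (\nabla_u H)^T \dot{u} + (\nabla_\lambda H)^T \dot{\lambda} + \frac{\partial H}{\partial t} = (\nabla_x H)^T \nabla_\lambda H + 0 - (\nabla_\lambda H)^T \nabla_x H + \frac{\partial H}{\partial t} = \frac{\partial H}{\partial t},$$

where the u-term vanishes because in the interior of U the maximization gives ∇_u H = 0, while between switches of a bang-bang control u̇ = 0.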
Remark on notation. In the previous lecture (notes) we already discussed the unfortunate discrepancy in the definitions of the Hamiltonian in the literature. Perhaps there is no need to come back to this topic because you are now aware of the problem, but I will do it anyway. My only motivation is to have the formulas at hand.
Recall that the ambiguity starts with the definition of the augmented Lagrangian. I could have just as easily written, instead of (30), the following:

$$\hat{L}_{\mathrm{aug}}(x, u, \hat\lambda, t) = L(x, u, t) + \underbrace{\frac{\partial \phi}{\partial t} + (\nabla_x \phi)^T \frac{\mathrm{d}x}{\mathrm{d}t}}_{\frac{\mathrm{d}\phi(x(t), t)}{\mathrm{d}t}} + \hat\lambda^T\big(f(x, u, t) - \dot{x}\big), \qquad (40)$$
which can be rewritten, in the case of ∂φ(x(tf), tf)/∂t = 0 and using the alternative definition of the Hamiltonian $\hat{H} = L + \hat\lambda^T f$, as

$$\big(\nabla_x \phi - \hat\lambda\big)^T\Big|_{t=t_f}\, \mathrm{d}x_f + \hat{H}(x, u, \hat\lambda, t)\Big|_{t=t_f}\, \mathrm{d}t_f, \qquad (42)$$

whose vanishing for independent dxf and dtf gives the boundary conditions $\hat\lambda(t_f) = \nabla_x\phi\big|_{t=t_f}$ and $\hat{H}\big|_{t=t_f} = 0$.
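For quick reference, the two conventions are related as follows (a summary under the identification λ̂ = −λ, which makes the two augmented Lagrangians (30) and (40) coincide):

$$\hat{H}(t, x, u, \hat\lambda) = L + \hat\lambda^T f = -\big(\lambda^T f - L\big) = -H(t, x, u, \lambda), \qquad \dot{x} = \nabla_{\hat\lambda}\hat{H}, \qquad \dot{\hat\lambda} = -\nabla_x \hat{H},$$

so the canonical equations keep the same form, while the maximization of H over u ∈ U in (13) turns into a minimization of Ĥ over u ∈ U.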
2.2 Free final time but the final state on a prescribed curve
2.2.1 Calculus of variations setting
We will now investigate the special case in which the final value of the solution y(x) is required to lie on the curve described by ψ(x), that is,

$$y^*(b + \Delta b) + \delta y(b + \Delta b) = \psi(b + \Delta b). \qquad (43)$$
(Figure: the optimal curve y*(x), a perturbed curve y(x), and the target curve ψ(x) on which the right end point must lie.)
Expanding both sides to first order, neglecting the second-order terms as before, and using the fact that y*(b) = ψ(b), we get

$$y^*(b) + (y^*)'(b)\,\Delta b + \delta y(b) = \underbrace{y^*(b)}_{\psi(b)} + \psi'(b)\,\Delta b, \qquad (45)$$

from which δy(b) = (ψ'(b) − (y*)'(b)) ∆b. We substitute this into the boundary condition (19), which after cancelling the common ∆b term yields

$$L_{y'}(b, y(b), y'(b)) \cdot \big(\psi'(b) - y'(b)\big) + L(b, y(b), y'(b)) = 0. \qquad (47)$$
This is just one scalar boundary condition. But the n conditions stating that y(x) = ψ(x) at the right end of the interval must be added. Altogether, we have n + 1 boundary conditions.
Anyway, the above single equation is called the transversality condition, for a reason to be illuminated by the next example.
Example 2.2. To get some insight, consider again the minimum-distance problem. This time we want to find the shortest distance from a point to a curve given by φ(x). The answer is intuitive, but let us see what our rigorous tools offer here. The EL equation stays intact, therefore we know that the shortest path is a line. It starts at (a, 0), but in order to determine its end, we need to invoke the other boundary condition. Remember that the Lagrangian is

$$L = \sqrt{1 + (y')^2} \qquad (48)$$

and

$$L_{y'} = \frac{y'}{\sqrt{1 + (y')^2}}. \qquad (49)$$

Substituting into the transversality condition (47) and simplifying gives y'(b) φ'(b) = −1, that is, the slopes of the two curves at their intersection are negative reciprocals of each other. The interpretation of this result is that our desired curve y hits the target curve φ in a perpendicular (transverse) direction.
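A quick numerical sanity check of this perpendicularity claim, done in Python with an arbitrarily chosen target curve and start point (both are illustrative, not from the notes): we brute-force the closest point on the curve and verify that the product of the slopes is close to −1.

    import numpy as np

    # An arbitrary target curve phi(x) and its derivative, plus a fixed start point (a, ya).
    phi  = lambda x: 2.0 + 0.5 * np.sin(x)
    dphi = lambda x: 0.5 * np.cos(x)
    a, ya = 0.0, 0.0

    # Brute-force the end point b that minimizes the straight-line distance to the curve.
    xs = np.linspace(-3.0, 3.0, 200001)
    dist = np.hypot(xs - a, phi(xs) - ya)
    b = xs[np.argmin(dist)]

    slope_path  = (phi(b) - ya) / (b - a)   # slope y'(b) of the optimal straight path
    slope_curve = dphi(b)                   # slope phi'(b) of the target curve
    print("y'(b) * phi'(b) =", slope_path * slope_curve)   # expected: close to -1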
Understanding the boundary conditions is crucial, so let us have yet another look at the result just derived. Using the Hamiltonian H = y'L_{y'} − L of the calculus of variations, it can be written as

$$L_{y'}(b)\,\psi'(b) - H(b) = 0.$$

It follows that for a free length of the interval but a fixed value of the variable at its end, that is, for

$$\psi(x) = c, \qquad c \in \mathbb{R}, \qquad (53)$$

the transversality condition simplifies to

$$H(b) = 0. \qquad (54)$$
In the optimal control setting, the final state is now required to lie on a prescribed curve, x(tf) = ψ(tf), while the cost is minimized subject to the system dynamics ẋ = f(x, u, t). Translating the above derived transversality condition from the domain (and notation) of the calculus of variations into this setting gives

$$\left[(\nabla_x \phi + \lambda)^T\, \dot\psi(t) + L + \frac{\partial \phi}{\partial t} - \lambda^T f(x, u, t)\right]_{t=t_f} = 0. \qquad (59)$$
(Figure: the optimal control u*(t) plotted against the switching quantity bᵀλ(t); it equals +1 when bᵀλ(t) is positive and −1 when it is negative, a relay characteristic saturating at ±1.)
We do not know λ2(t). In order to get it, we may need to solve the costate equations. Indeed, we can solve them independently of the state equations, since they are decoupled from them:

$$\begin{bmatrix}\dot\lambda_1\\ \dot\lambda_2\end{bmatrix} = -\begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}\begin{bmatrix}\lambda_1\\ \lambda_2\end{bmatrix}, \qquad (74)$$

from which it follows that

$$\lambda_1(t) = c_1 \qquad (75)$$

and

$$\lambda_2(t) = c_2 - c_1 t \qquad (76)$$

for some constants c1 and c2. To determine the constants, we will finally have to bring the boundary conditions into the game; one relation among them follows from the condition H(tf) = 0.
We can now sketch possible profiles of the switching function. A few characteristic versions are shown in Fig. 5. What we have learnt is that the costate λ2, being an affine function of time, goes through zero at most once during the whole control interval. Therefore we will have at most one switch of the control signal. This is a valuable observation.
We are approaching the final stage of the derivation. So far we have learnt that we only need to consider u(t) = 1 and u(t) = −1. The state equations can easily be integrated to get

$$v(t) = v(0) + ut, \qquad y(t) = y(0) + v(0)\,t + \tfrac{1}{2}ut^2.$$
(Figure 5: characteristic profiles of the switching function λ2(t) on the control interval from t0 to tf.)
To visualize this in the y-v plane, express t from the first equation and substitute into the second one:

$$u\,(y - y(0)) = v(0)\,(v - v(0)) + \tfrac{1}{2}(v - v(0))^2,$$

which is a family of parabolas parameterized by (y(0), v(0)). These are visualized in Fig. 6.
There is a distinguished curve in the figure, composed of two branches. It is special in that for every state starting on this curve, the system is brought to the origin by the corresponding constant setting of the control (and no further switching). This curve, called the switching curve, can be expressed as

$$y = \begin{cases} \tfrac{1}{2}v^2, & v < 0,\\ -\tfrac{1}{2}v^2, & v > 0, \end{cases}$$

or, more compactly,

$$y = -\tfrac{1}{2}\,v\,|v|.$$
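A minimal plotting sketch (Python with matplotlib; the initial conditions below are arbitrary picks) that reproduces a picture like Fig. 6 directly from the formulas just derived:

    import numpy as np
    import matplotlib.pyplot as plt

    v = np.linspace(-2.0, 2.0, 400)

    # Family of parabolas: solve u*(y - y0) = v0*(v - v0) + 0.5*(v - v0)**2 for y.
    for y0, v0 in [(-2.0, 0.0), (0.0, 1.0), (1.0, -1.0), (2.0, 0.0)]:   # arbitrary initial states
        for u in (+1.0, -1.0):
            y = y0 + (v0 * (v - v0) + 0.5 * (v - v0) ** 2) / u
            plt.plot(y, v, linewidth=0.8)

    # The switching curve y = -0.5*v*|v| (its two branches meet at the origin).
    plt.plot(-0.5 * v * np.abs(v), v, 'r', linewidth=2.0)
    plt.xlabel('y'); plt.ylabel('v'); plt.xlim(-3, 3); plt.ylim(-2, 2)
    plt.show()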
The final step can be done using this figure. Point your finger anywhere in the plane and follow the state trajectory that emanates from that particular point and along which you can get to the origin with at most one switch. Clearly the strategy is to set u such that it brings us to the switching curve (the red one in the figure) and then, after switching, follow the curve. That is it. This control strategy can be written as

$$u^*(y, v) = \begin{cases} -1, & y + \tfrac{1}{2}v|v| > 0,\\ +1, & y + \tfrac{1}{2}v|v| < 0,\\ -\operatorname{sign}(v), & y + \tfrac{1}{2}v|v| = 0. \end{cases}$$
Figure 6: Typical trajectories for both u(t) = 1 and u(t) = −1. Red is the switching curve.
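Before discussing the simulation results, here is a minimal forward-Euler simulation of the feedback law above, written in Python as a sketch only (the lecture's own experiment was done in Simulink; the initial state, step size and horizon below are illustrative choices):

    import numpy as np

    def u_star(y, v):
        """Time-optimal feedback for the double integrator, as derived above."""
        s = y + 0.5 * v * abs(v)           # signed "distance" from the switching curve
        if s > 0:
            return -1.0
        if s < 0:
            return +1.0
        return -float(np.sign(v))

    dt, T = 1e-3, 4.0                      # illustrative step size and horizon
    y, v = 2.0, 0.0                        # illustrative initial state
    switches, u_prev = 0, None
    for _ in range(int(T / dt)):
        u = u_star(y, v)
        if u_prev is not None and u != u_prev:
            switches += 1
        u_prev = u
        y, v = y + v * dt, v + u * dt      # double-integrator dynamics, forward Euler

    print(f"final state: y = {y:.4f}, v = {v:.4f}, control switches: {switches}")

Running this, one typically finds far more than the single ideal switch once the state gets close to the origin, which is exactly the phenomenon discussed next.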
(Figure: simulation of the resulting bang-bang control of the double integrator: the control u, the velocity v, and the position y plotted against time in seconds.)

This is actually not quite what we see in the plot above, is it? We can see two switches in the control signal. The first one happened at about 2.2 s, but the second happened close to 3.5 s. In fact, what you would experience if you ran the code in Simulink is the error statement "At time 3.449489782192745, simulation hits (1000) consecutive zero crossings." and the simulation stops. Obviously, what is going on is that the simulator is tempted to include not just two but in fact a huge number of switches in the control signal as it approaches the origin. This is quite characteristic of bang-bang control, a phenomenon called chattering. In this particular example you may decide to ignore it, since both state variables are already close enough to the origin and you may want to declare the control task as finished². Generally, though, this chattering phenomenon needs to be handled somehow. Any suggestion how to reduce it?

² Remember that we still consider control over a finite time interval, even though its length is a tunable parameter. Hence, after reaching the end of the interval, the task is over.
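One common remedy, sketched here only as a suggestion and not taken from the lecture notes, is to soften the hard sign: stop switching once the state is within a small tolerance of the origin, and/or replace the relay by a saturated proportional band around the switching curve, which is also the basic idea behind the PTOS-type schemes mentioned below. A hypothetical Python variant of the feedback (the function name and the tolerance values are made up for illustration):

    import numpy as np

    def u_softened(y, v, band=0.05, tol=1e-2):
        """Bang-bang law with two anti-chattering modifications (illustrative values):
        a dead zone of radius `tol` around the origin and a proportional band of
        width `band` around the switching curve instead of a hard sign."""
        if abs(y) < tol and abs(v) < tol:
            return 0.0                                  # close enough: stop commanding the relay
        s = y + 0.5 * v * abs(v)                        # signed distance from the switching curve
        return float(np.clip(-s / band, -1.0, 1.0))     # saturates at +-1 away from the curve

    print(u_softened(2.0, 0.0), u_softened(0.001, 0.0))

The price is that the behaviour near the target is no longer exactly time-optimal; trading this off systematically is, roughly, the concern of the PTOS literature mentioned in the next section.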
4 Further reading
This lecture was prepared using [1], in particular chapter 3 (application of the calculus of variations to the general problem of optimal control) and chapter 4 (Pontryagin's principle). We did not talk about the proof of Pontryagin's principle at the lecture and we do not even ask the students to go through the proof in the book. Understanding the result, its roots in the calculus of variations and how it removes the deficiencies of the calculus-of-variations-based results will suffice for our purposes.
The transition from the calculus of variations to optimal control, especially when it comes to the definition of the Hamiltonian, is somewhat tricky. Unfortunately, it is not discussed satisfactorily in the literature. Even Liberzon leaves it as an (unsolved) exercise to the student (exercises 3.5 and 3.6). Other major textbooks avoid the topic altogether. The only treatment can be found in the famous journal paper [2], in particular the section "The first fork in the road: Hamilton" on page 39. The issue is so delicate that the authors even propose to distinguish the two types of Hamiltonian by referring to one of them as the control Hamiltonian.
Time-optimal control for linear systems, in particular bang-bang control for a double integrator, is described in sections 4.4.1 and 4.4.2. The material is quite standard and can be found in many books and lecture notes. What is not covered, however, is the fact that without any adjustment the bang-bang control is very troublesome from an implementation viewpoint. A dedicated research thread has evolved, driven especially by the needs of the hard disk drive industry, under the name proximate time-optimal servomechanism (PTOS). Many dozens of papers can be found with this keyword in the title.
References
[1] Daniel Liberzon. Calculus of Variations and Optimal Control Theory: A Concise
Introduction. Princeton University Press, December 2011.
[2] H. J. Sussmann and J. C. Willems. 300 years of optimal control: from the brachystochrone to the maximum principle. IEEE Control Systems, 17(3):32–44, 1997.