0% found this document useful (0 votes)

96 views45 pages

Controltheory PDF

This document provides an introduction to optimal control theory and the calculus of variations. It begins with an overview of how optimal control theory extends the classical calculus of variations to find functions that maximize or minimize integrals involving variable functions. It then covers several foundational concepts in functions of several variables, including partial derivatives, gradients, level surfaces, and directional derivatives, which are necessary to understand optimal control theory and the calculus of variations. Finally, it presents the Mean Value Theorem for functions of several variables.

Uploaded by

Fiaz Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

96 views45 pages

Controltheory PDF

Uploaded by

Fiaz Khan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 45

LECTURES ON OPTIMAL CONTROL

THEORY
Terje Sund
August 9, 2012

CONTENTS

INTRODUCTION
1. FUNCTIONS OF SEVERAL VARIABLES
2. CALCULUS OF VARIATIONS
3. OPTIMAL CONTROL THEORY

INTRODUCTION

In the theory of mathematical optimization one try to find maximum or

minimum points of functions depending of real variables and of other func-
tions. Optimal control theory is a modern extension of the classical calculus
of variations. Euler and Lagrange developed the theory of the calculus of
variations in the eighteenth century. Its main ingredient is the Euler equa-
tion1 which was discovered already in 1744. The simplest problems in the
calculus of variation are of the type
Z t1
max F (t, x(t), ẋ(t)) dt
t0

where x(t0 ) = x0 , x(t1 ) = x1 , t0 , t1 , x0 , x1 are given numbers and F is a

given function of three (real) variables. Thus the problem consists of finding
1
the Euler equation is a partiall differential equation of the form ∂F d ∂F
∂x − dt ( ∂ ẋ ) = 0,
where F is a function of three real variables, x is an unknown function of one variable t
and ẋ(t) = dx ∂F
dt (t). ∂x stands for the partial derivative of F with respect to the second
variable x, ∂x (t, x, ẋ), and ∂F
∂F
∂ ẋ are the partial derivatives with respect to the third variable
ẋ.

1
Rt
functions x(t) that make the integral t01 F (t, x(t), ẋ(t)) dt maximal or mini-
mal. Optimal control theory has since the 1960-s been applied in the study of
many different fields, such as economical growth, logistics, taxation, exhaus-
tion of natural resources, and rocket technology (in particular, interception
of missiles).

1 FUNCTIONS OF SEVERAL VARIABLES.

Before we start on the calculus of variations and control theory, we shall need
some basic results from the theory of functions of several variables. Let A ⊆
Rn , and let F : A → R be a real valued function defined p on the set A. We
let F (~x) = F (x1 , . . . , xn ), ~x = (x1 , . . . , xn ) ∈ A, ||~x|| = x21 + · · · + x2n . ||~x||
is called (the Euklidean) norm of ~x.
A vector ~x0 is called an inner point of the set A, if there exists a r > 0
such that
B(~x0 , r) = {~x ∈ Rn : ||~x − ~x0 || < r} ⊆ A.
We let
A0 = {~x0 ∈ A : ~x0 is an inner point of A}
The set A0 is called the interior of A. Let

~e1 , ~e2 , . . . , ~en

be the vectors of the standard basis of Rn , that is,

~ei = (0, 0, . . . , 0, 1, 0, . . . , 0) (i = 1, 2, . . . , n),

hence ~ei has a 1 on the i-th entry, 0 on all the other entries.
Definition 1. let ~x0 ∈ A be an inner point. The first order partial
derivatives Fi0 (~x0 ) of F at the point ~x0 = (x01 , . . . , x0n ) with respect to the
i-th variable xi , is defined by
1
Fi0 (~x0 ) = lim [F (~x0 + h~ei ) − F (~x0 )]
h→0 h
1
= lim [F (x01 , . . . , x0i + h, x0i+1 , . . . , x0n ) − F (x01 , . . . , x0n )],
h→0 h

if the limit exists.

2
We also write
∂F
Fi0 (~x0 ) = (~x0 ) = Di F (~x0 ) (1 ≤ i ≤ n)
∂xi
for the i-th first order partial derivative of of F i ~x0 .
Second order partial derivatives are written

0 ∂ 2F
Fi,j (~x0 ) = (~x0 ) = Di,j F (~x0 ) (1 ≤ i ≤ n)
∂xj ∂xi
where we let
∂ 2F ∂ ∂F
(~x0 ) = (~x0 ) = Fj0 (Fi0 (~x0 )).
∂xj ∂xi ∂xj ∂xi
The gradient of the function F at ~x0 is the vector
∂F ∂F
∇F (~x0 ) = ( (~x0 ), . . . , ( (~x0 ))
∂x1 ∂xn

Example 1. Let F (x1 , x2 ) = 7x21 x32 + x42 , ~x0 = (2, 1) Then

∇F (x1 , x2 ) = (14x1 x32 , 21x21 x22 + 4x32 ),

∇F (x1 , x2 ) = (14x1 x32 , 21x21 x22 + 4x32 )
= (28, 28).

Level surfaces: Let c ∈ R, F : Rn → R. The set

Mc = {~x : F (~x) = c}

is called the level surface (or the level hypersurface ) of F corresponding to

the level c. if ~x0 = (x01 , x02 , . . . , x0n ) ∈ Mc , then we define the tangent plane of
the level surface Mc at the point ~x0 as the solution set of the equation

∇F (~x0 ) · (~x − ~x0 ) = 0.

In particular, ∇F ~x0 ) is perpendicular to the tangent plane at ~x0 , ∇F (~x0 ) ⊥

the tangent plane at ~x0 .
Example 2. let F (x1 , x2 ) = x21 x2 . then

∇F (x1 , x2 ) = (2x1 x2 , x21 ) and ∇F (1, 2) = (4, 1).

3
Tangent the equation at the point (1, 2) for the level surface (the level curve)
x21 x2 = 2 is ∇F (1, 2) · (~x − (1, 2)) = 0, hence (4, 1) · (x1 − 1, x2 − 1) = 0,
4x1 − 4 + x2 − 2 = 0, that is 4x1 + x2 = 6.
Example 3. Let F (x1 , x2 , x3 ) = x21 + x22 + x23 , and consider the level
surface
x21 + x22 + x23 = 1,
the sphere with radius 1 and centre at the origin. (a) Then

∇F (x1 , x2 , x3 ) = 2(x1 , x2 , x3 )
1 1 1 2
∇F ( √ , √ , √ ) = √ (1, 1, 1).
3 3 3 3

The tangent plane at the point √1 (1, 1, 1) has the equation:

2 1 1 1
√ (1, 1, 1) · (x1 − √ , x2 − √ , x3 − √ ) = 0,
3 3 3 3
2 2 3
√ (x1 + x2 + x3 ) = √ √ ,
3 3 3
√
x1 + x2 + x3 = 3.

(b) Here
3 4 3 4 2
∇F (0, , ) = 2(0, , ) = (0, 3, 4)
5 5 5 5 5
3 4
The tangent plane at (0, 5 , 5 ) has the equation

2 3 4
(0, 3, 4) · (x1 − 0, x2 − , x3 − ) = 0,
5 5 5
3 4
0 + 3(x2 − ) + 4(x3 − ) = 0,
5 5
3x2 + 4x3 = 5.

Directional derivatives. Assume that F : A ⊆ Rn → R. Let ~a ∈ Rn

and ~x ∈ A0 be an inner point of A. The derivative of F along ~a at the point
~x is defined as
1
F~a0 (~x) = lim [F (~x + h~a) − F (~x)]
h→0 h

if the limit exists.

4
Remark 1. The average growth of F from ~x of ~x + h~a is
1
[F (~x + h~a) − F (~x)],
h
hence the above definition is natural if we think of the derivative as a rate of
change.
Definition 2. We say that F is continuously differentiable, or C 1 , if the
function F~a0 (~x) is continuous at ~x for all ~a ∈ Rn and all ~x ∈ A0 , that is if

lim F~a0 (~x + ~y ) = F~a0 (~x).

y →0
~

if F is C 1 , then we can show that the map ~a 7→ F~a0 (~x) is linear, for all
~x ∈ A0 . We show first the following version of the Mean Value Theorem for
functions of several variables,
Proposition 1 (Mean Value Theorem). Assume that F is continuously
differentiable on an open set A ⊆ Rn and that the closed line segment

[~x, ~y ] = {t~x + (1 − t)~y : 0 ≤ t ≤ 1}

from ~x to ~y is contained in A. Then there exists a point w~ on the open line

segment (~x, ~y ) = {t~x + (1 − t)~y : 0 < t < 1} such that

F (~x) − F (~y ) = F~x0 −~y (w)

Proof. Set g(t) = F (t~x + (1 − t)~y ), t ∈ (0, 1). Then

1 1
[g(t + h) − g(t)] = [F ((t + h)~x + (1 − (t + h))~y ) − F (t~x + (1 − t)~y )]
h h
1
= [F (t)~x + (1 − t)~y + h(~x − ~y )) − F (t~x + (1 − t)~y )]
h
−→ F~x0 −~y (t~x + (1 − t)~y )
h→0

Since F was continuously differentiable, it is clear that g 0 (t) is continuous

for 0 < t < 1, and g 0 (t) = F~x0 −~y (t~x + (1 − t)~y ). Moreover, g(0) = F (~y ) and
g(1) = F (~x). We apply the ”ordinary” Mean Value Theorem to the function
g. Hence there exists a θ ∈ (0, 1) such that

g(1) − g(0) = g 0 (θ)

= F~x0 −~y (θ~x + (1 − θ)~y ) = F~x0 −~y (w),
~

5
~ = θx + (1 − θ)~y lies on the open line segment (~x, ~y ). Hence we have
where w
proved the Mean Value Theorem.
Proposition 2. Let ~x ∈ A0 be given. If F is C 1 , then

(1) Fc~0a (~x) = cF~a0 (~x), for all c ∈ R and ~a ∈ Rn

and
(2) F~a0 +~b (~x) = F~a0 (~x) + F~b0 (~x),

that is, for each fixed ~x, the map ~a 7→ F~a0 (~x) is a linear map of Rn into R.
Proof. (1) if c = 0, it is clear that both sides of (1) are equal to 0.
Assume that c 6= 0. Then
1
Fc~0a (~x) = lim [F (~x + hc~a) − F (~x)]
h→0 h
1
= c · lim [F (~x + hc~a) − F (~x)]
h→0 hc
k=hc 1
= c lim [F (~x + k~a) − F (~x)] = cF~a0 (~x).
k→0 h

(2) F is C 1 hence F~a0 (~x) and F~b0 (~x) exist. Since x ∈ A0 , exists r > 0 such
that
||~x − ~y || < r ⇒ ~y ∈ A.
Choose h such that

||h(~a + ~b)|| < r and ||h~b|| < r.

Then
~x + h(~a + ~b) ∈ A and ~x + h~b ∈ A,
hence

(∗) F (~x + h(~a + ~b)) − F (~x)

F (~x + h(~a + ~b)) − F (~x + h~b) − F (~x)] + [F (~x + h~b) − F (~x)]

We apply the Mean Value Theorem to the expression in the first bracket.
Thus there exists a θ ∈ (0, 1) such that

(∗∗) F ((~x + h~b) + h~a) − F (~x + h~b) = Fh~

0
x + h~a + θh~b).
a (~

6
By part (1) we have
0
Fh~ x + h~a + θh~b) = hF~a0 (~x + h~a + θh~b)
a (~

If we divide the equation in (∗∗) by h, then we find

1
[F ((~x + h~b) + ~a)) − F (~x + h~b)] = F~a0 (~x + h~a + θh~b) −→ F~a0 (~x),
h h→0

since F is continuously differentiable. Consequently, dividing by h in (∗), it

follows that
1
lim [F ((~x + h(~a + ~b)) − F (~x)]
h→0 h
1
= F~a0 (~x) + lim [F (~x + h~b) − F (~x)] = F~a0 (~x) + F~b0 (~x)
h→0 h

We write ~a = ni=1 ai~ei . Then

n n
X X ∂F
F~a0 (~x) = ai F~e0i (~x) = ai (~x) = ∇F (~x) · ~a.
i=1 i=1
∂xi

As a consequence we have,
Corollary 1. If F is C 1 on A ⊆ Rn , then

F~a0 (~x) = ∇F (~x) · ~a,

for all ~x ∈ A0 .
If ||~a|| = 1, then F~a0 (~x) is often called the directional derivative of F at
the point ~x in the direction of ~a. From Corollary 1 it follows easily that
Corollary 2. F is continuously differentiable at the point ~x0 ⇔ ∇F is
continuous at the point ~x0 ⇔ all the first order partial derivatives of F are
continuous at ~x0 .
Proof. Let ~a = (a1 , . . . , an ) ∈ Rn , ||~a|| = 1. Then

|F~a0 (~x) − F~a0 (~x0 )| = |[∇F (~x) − ∇F (~x0 )] · ~a]|

=|∇F (~x) − ∇F (~x0 )| · ||~a|| · |∇F (~x) − ∇F (~x0 )|.

This proves that F~a0 (~x) − F~a0 (~x0 ) −→ 0 ⇔ ∇F (~x) − ∇F (~x0 ) −→ 0. Hence
x→~
~ x0 x→~
~ x0
F is continuously differentiable at the point ~x0 ⇔ ∇F is continuous at ~x0 .

7
If we let ~a = ~ei , (1 ≤ i ≤ n), we find that ∂x ∂F
i
(~x) = F~e0i (~x) is continuous at
~x0 for i = 1, . . . , n if F is continuously differentiable i ~x0 . Conversely, if the
∂F
partial derivatives ∂x i
(~x0 ) are continuous for i = 1, . . . n, then
n n
X X ∂F
F~a0 (~x) = ai F~e0i (~x) = ai (~x)
i=1 i=1
∂xi

is continuous at ~x0 for all ~a ∈ Rn . Hence it follows that F is continuously

differentiable at ~x0 .
Convex sets. A set S ⊆ Rn is called convex if

t~x + (1 − t)~y ∈ S

for all ~x, ~y ∈ S, and all t ∈ [0, 1].

In other words, S is convex if and only if the closed line segment [~x, ~y ]
between ~x and ~y is contained in S for all ~x, ~y ∈ S.
Example 4. Let us show that the unit disc with centre at the origin,
S = {(x1 , x2 ) : x21 + x22 ≤ 1} is a convex subset of R2 .
Let ~x = (x1 , x2 ), ~y = (y1 , y2 ) ∈ S, 0 ≤ t ≤ 1. We must show that

~z = t~x + (1 − t)~y = t(x1 , x2 ) + (1 − t)(y1 , y2 ) ∈ S.

Using the triangle inequality and that ||t~u|| = |t| · ||~u||, ~u ∈ Rn , we find that

||~z|| = ||t~x + (1 − t)~y ||

≤ ||t~x|| + ||(1 − t)~y || = t||~x|| + (1 − t)||~y ||
≤ t · 1 + (1 − t) · 1 = 1

Hence ~z ∈ S. This shows that S is convex.

Exercise 1. Let a > 0 and b > 0. Show that the elliptic disc

x21 x22
S = {(x1 , x2 ) : + 2 ≤ 1}
a2 b
er a convex subset of R2 .
Concave and convex functions. A C 2 −function (that is, a func-
tion possessing a continuous second derivative) f : [a, b] → R is concave if
f 00 (x) ≥ 0 for all x in the open interval (a, b). The function f is convex if

8
f 00 (x) ≥ 0 for all x ∈ (a, b). If f is convex, we see see geometrically that the
chord [(x, f (x))] always lies under or on the graph of f . Equivalently, the
inequality
(∗) tf (x) + (1 − t)f (y) ≤ f (tx + (1 − t)y),
holds for all x, y ∈ (a, b), and all t ∈ [0, 1]. The converse is also right: If
(∗) holds, then f is concave. For functions of several variables, the second
derivative at a vector ~x is no real number, but a bilinear function. As a
consequence, we will use the inequality (∗) when we define concavity for
functions of several variables.
Definition 3. Let S be a convex subset of Rn and let f be a real valued
function defined on S. We say that f is concave if f satisfies the inequality

(∗∗) f (t~x + (1 − t)~y ) ≥ tf (~x) + (1 − t)f (~y ),

for all ~x, ~y ∈ S and all t ∈ [0, 1]. A function f is called convex if the opposite
inequality holds. The function is strongly concave (respectively strongly
convex) if strict inequality holds in (∗∗).
Remark 2. That f is concave, means geometrically that the tangent
plane of the surface z = f (~x) lies over (or on) the surface at every point
(~y , f (~y )): the equation for the tangent plane through the point (~y , f (~y ))is

z = ∇f (~y ) · (~x − ~y ) + f (~y ),

hence
f (~x) ≤ z = ∇f (~y ) · (~x − ~y ) + f (~y )
for all ~x on the tangent plane.
Proposition 3. Assume that the functions f and g are defined on a
convex set S ⊆ Rn . Then the following statements hold
(a) If f and g are concave and a ≥ 0, b ≥ 0, then the function af + bg is
concave.
(b) If f and g are convex and a ≥ 0 then the function af + bg is convex.
Proof. (b): Assume that f and g are convex, a ≥ 0, b ≥ 0. Set h(~x) =
af (~x) + bg(~x) (~x ∈ S), and let t ∈ (0, 1). For all ~x, ~y ∈ S then

h(t~x + (1 − t)~y ) = af (t~x + (1 − t)~y ) + bg(t~x + (1 − t)~y )

≤ a[tf (~x) + (1 − t)f (~y )] + b[tg(~x) + (1 − t)g(~y )] = th(~x) + (1 − t)h(~y ).

9
Accordingly h convex.
Part (a) is shown in the same way.
The following useful ”2nd derivative test” holds:
Proposition. Let f : S ⊆ R2 → R, where S is open and convex. If all
the 2nd order partial derivatives of f are continuous (that is, f is C 2 on S),
then

∂ 2f ∂ 2f
(a) f is convex ⇔ (~
x ) ≥ 0, (~x) ≥ 0 and
∂x21 ∂x22
2
∂ 2f ∂ 2f ∂ 2 f (~x)
(~
x ) (~
x ) − ≥ 0 for all ~x ∈ S.
∂x21 ∂x22 ∂x1 ∂x2
∂ 2f ∂ 2f
(b) f is concave ⇔ (~
x ) ≤ 0, (~x) ≤ 0 and
∂x21 ∂x22
2
∂ 2f ∂ 2f ∂ 2 f (~x)
(~
x ) (~
x ) − ≥ 0 for all ~x ∈ S.
∂x2 ∂x22 ∂x1 ∂x2

We vil prove this proposition below (see Proposition 8). First we shall
need some results on symmetric 2 × 2 matrices Let

a b
Q=
b c

be a symmetric 2 × 2-matrix, and let

x
Q(~x) = ~x Q~x = (x1 , x2 )Q 1 = ax21 + 2bx1 x2 + cx22 ,
t
~x ∈ R2 .
x2

Definition 4. We say that Q is

(a) positive semidefinite if ~xt Q~x ≥ 0 for all ~x ∈ R2 .
(b) positive definite if ~xt Q~x > 0 for all ~x ∈ R2 \ {0}.
(c) negative semidefinite if ~xt Q~x ≤ 0 for all ~x ∈ R2 .
(d) negative definite if ~xt Q~x < 0 for all ~x ∈ R2 \ {0}.
We will also write Q ≥ 0 if Q is positive semidefinite, Q > 0 if Q is
positive definite, and similarly for (c) and (d).
Proposition 4. Q is positive semidefinite ⇔ a ≥ 0, c ≥ 0 and ac−b2 ≥ 0.

10
Proof. ⇐: assume that a ≥ 0, b ≥ 0 and ac − b2 ≥ 0. if a = 0, vil

ac − b2 = −b2 ≥ 0

then b = 0. Hence

x1 0 0 x1
(x1 , x2 )Q = (x1 , x2 )
x2 0 c x2

0
= (x1 , x2 ) = cx22 ≥ 0,
cx2

so that Q is positive semidefinite.

Assume that a 6= 0. Then

a b x1 ax1 + bx2
(x1 , x2 ) = (x1 , x2 ) = ax21 + 2bx1 x2 + cx22
b c x2 bx1 + cx1
b c
= a[x21 + 2 x1 x2 + x22 ]
a a
b b2 c b2
= a[x21 + 2 x1 x2 + 2 x22 + ( − 2 )x22 ]
a a a a

b ac − b2 2
= a[(x1 + x2 )2 + x2 ] ≥ 0
a a2
for all (x1 , x2 ) ∈ R2 . Hence is Q ≥ 0.
⇒: Assume that Q ≥ 0, then ~xt Q~x ≥ 0, ~x ∈ R2 . In particular,

a b 1
(1, 0) =a≥0
b c 0
and

a b 0
(0, 1) =c≥0
b c 1

Assume that a > 0 : As we have seen above,

a b x1
(x1 , x2 )
b c x2
b ac − b2 2
= a[(x1 + x2 )2 + x2 ].
a a2
11
ac−b2 2
If we let x1 = − ab x2 , we derive that a2
x2 ≥ 0, for all x2 , henceac − b2 ≥ 0.
Assume that a = 0 : Then
0 ≤ ~xt Q~x = ax21 + 2bx1 x2 + cx22
= 2bx1 x2 + cx22
if b > 0, the fact that x2 = 1, x1 = −n, implies
0 ≤ −2bn + c −→ −∞,
n→∞

a contradiction. Hence b = 0, and ac − b2 = 0 ≥ 0.

Further, it is easy to show
Proposition 5. Q is positive definite ⇔ a > 0 (and c > 0) and ac − b2 >
0.
The following chain rule for real functions of several variables will be
useful.
Proposition 6. (Chain Rule) Let f be a real C 1 - function defined on an
open convex subset S of Rn , and assume that ~r : [a, b] → S is differentiable.
Then the composed function f ◦ ~r = f ◦ (r1 , . . . , rn ) : [a, b] → R is well
defined. Set g(t) = f (~r(t)). Then g is differentiable and
∂f ∂f
g 0 (t) = (~r(t))r10 (t) + · · · + (~r(t))rn0 (t) = ∇f (~r(t)) · ~r 0 (t).
∂x1 ∂xn

Proof. consider
g(t + h) − g(t) f (~r(t + h)) − f (~r(t))
(∗) =
h h
Since S is open, there exists an open sphere Br with positive radius and centre
at ~r(t) which is contained in S. We choose h 6= 0 so small that ~r(t + h) ∈ Br .
Set ~u = ~r(t + h), ~v = ~r(t). Then the numerator on the right hand side of (∗)
is f (~u) − f (~v ). By applying the Mean Value Theorem to the last difference,
we obtain
f (~u) − f (~v ) = ∇f (~z) · (~u − ~v ),
where ~z = ~u + θ(~u − ~v ) for a θ ∈ (0, 1). Hence the difference quotient in (∗)
may be written
1 1
(∗∗) [g(t + h) − g(t)] = ∇f (~z) · [(~r(t + h)) − (~r(t))]
h h
12
As h → 0, we find ~u → ~v , so that ~z → ~u. Sincef is C 1 , is ∇f continuous,
∇f (~z) → ∇f (~u) = ∇f (~r(t)). Therefore the right side of (∗∗) converges to
∇f (~r(t)) ·~r 0 (t). This shows that g 0 (t) exists and is equal to the inner product
∇f (~r(t)) · ~r 0 (t).
If the function f is defined on an open subset S of R2 for which the partial
derivatives of 2nd order exist on S, then we may ask if the two mixed partial
2f 2f
derivative are equal. That is, if ∂x∂1 ∂x2
= ∂x∂2 ∂x 1
. This is not always the case
however, the following result which we state without proof, will be sufficient
for our needs.
Proposition 7. Assume that f is a real valued function defined on an
open subset S of R2 and that all the first and second order partial derivatives
of f are continuouse on S. Then

∂ 2f ∂ 2f
(~x) = (~x),
∂x1 ∂x2 ∂x2 ∂x1
for all ~x i S.
Proof. See for instance Tom M. Apostol, Calculus Vol. II, 4.25.
Next we will apply the last four propositions to prove
Proposition 8. Let f be a C 2 function defined on an open convex subset
S of R2 . Then the following statements hold,
2 2 2 2 2f
(a) f is convex ⇔ ∂∂xf2 ≥ 0, ∂∂xf2 ≥ 0 and ∆f = ∂∂xf2 ∂∂xf2 − ( ∂x∂1 ∂x 2
)2 ≥ 0
1 2 1 2
∂2f ∂2f
(b) f is concave ⇔ ∂x21
≤ 0, ∂x22
≤ 0 and ∆f ≥ 0
∂2f
(c) ∂x21
> 0 and ∆f > 0 ⇒ f is strictly convex.
∂2f
(d) ∂x21
< 0 and ∆f > 0 ⇒ f is strictly concave.
Alle the above inequalities are supposed to hold at each point of S.
Proof. (a) ⇐: Choose two arbitrary vectors ~a, ~b in S, and let t ∈ [0, 1].
We consider the function g given by

g(t) = f (~b + t(~a − ~b)) = f (t~a + (1 − t)~b).

Put ~r(t) = ~b + t(~a − ~b), thus g(t) = f (~r(t)). Let ~a = (a1 , a2 ), ~b = (b1 , b2 ). The

13
Chain Rule yields

g 0 (t) = ∇f (~r(t)) · ~r0 (t)

∂f ∂f
(~r(t)) · (a1 − b1 ) + (~r(t)) · (a2 − b2 )
∂x1 ∂x2
and
∂ 2 f (~r(t)) ∂ 2 f (~r(t))
g 00 (t) =( , ) · (~a − ~b)(a1 − b1 )
∂x21 ∂x2 ∂x1
∂ 2 f (~r(t)) ∂ 2 f (~r(t))
+( , ) · (~a − ~b)(a2 − b2 )
∂x1 ∂x2 ∂x22
∂ 2f ∂ 2f
=[ 2 (a1 − b1 )2 + (a1 − b1 )(a2 − b2 )
∂x1 ∂x2 ∂x1
∂ 2f ∂ 2f
+ (a1 − b1 )(a2 − b2 ) + 2 (a2 − b2 )2 ]~r(t)
∂x1 ∂x2 ∂x2
2
∂ f 2
∂ f
!
~ ∂x21
=(~a − b) t
∂ f 2
∂x1 ∂x2
∂ f 2 (~a − ~b)
∂x2 ∂x1 ∂x22 ~
r(t)

From the conditions in (a) the ”Hessematrix” H of f ,

∂2f ∂2f
!
∂x21 ∂x1 ∂x2
H= ∂2f ∂2f
∂x2 ∂x1 ∂x22

is positive semidefinite in S. Accordingly g 00 (t) ≥ 0, for all t ∈ (0, 1), so that

g is convex. In particular it follows that

f (t~a + (1 − t)~b) = g(t) = g(t · 1 + (1 − t) · 0)

≤ tg(1) + (1 − t)g(0) = tf (~a) + (1 − t)f (~b).

Since ~a and ~b were arbitrary in S, we deduce that f is convex.

⇒: In light of Proposition 4 it suffices to prove that the Hessematrix H
of f is positive semidefinite in S. Let ~x ∈ S, h ∈ R2 be arbitrary. As S is
open, there exists r > 0 such that |t| < r ⇒ ~x + t~h ∈ S. Define p on the
interval I = (−r, r) ved p(t) = f (~x + t~h).
Claim: p is a convex function.

14
In order to see this, let λ ∈ [0, 1], t and s ∈ I. Then

p(λt + (1 − λ)s)) = f (~x + λt~h + (1 − λ)s~h)

= f (λ(~x + t~h) + (1 − λ)(~x + s~h))
≤ λf (~x + t~h) + (1 − λ)f (~x + s~h)
= λp(t) + (1 − λ)p(s),

which proves the claim.

Since p is C 2 and convex, we deduce p00 ≥ 0. Let ~r(t) = ~x + t~h, t ∈ I. By
The Chain Rule,
p0 (t) = ∇f (~r(t)) · ~h
and as in the first part of the proof,

p00 (t) = ~ht H(~r(t))~h (~h ∈ R2 arbitrary).

As p00 (t) ≥ 0, it follows that H(~r(t)) is positive semidefinite, t ∈ I. In partic-

ular,
H(~x) = H(~r(0)) ≥ 0
for all ~x ∈ S. Hence part (a) follows.
The proofs of (b), (c) and (d) are similar.
The following ”gradient inequality” and its consequenses are useful in the
calculus of variations.
Proposition 9. (Gradient inequality) Assume that S ⊆ Rn is convex,
and let f : S → R be a C 1 -function. Then the following equivalence holds

f is concave ⇔
n
X ∂f
(1) f (~x) − f (~y ) ≤ ∇f (~y ) · (~x − ~y ) = (~y )(xi − yi )
i=1
∂xi
for all ~x, ~y ∈ S.

Proof. ⇒: Assume that f is concave, and let ~x, ~y ∈ S. For all t ∈ (0, 1) :
we know that
tf (~x) + (1 − t)f (~y ) ≤ f (t~x − (1 − t)~y ),

15
that is
t(f (~x) − f (~y )) ≤ f (t(~x − ~y ) + ~y ) − f (~y )
hence
1
f (~x) − f (~y ) ≤ [f (~y + t(~x − ~y )) − f (~y )]
t
If we let t → 0, we find that the expression to the right approaches the
derivative of f at ~y along ~x − ~y . Hence,
1
f (~x) − f (~y ) ≤ lim [f (~y + t(~x − ~y )) − f (~y )] = f~x0 −~y (~y )
t→0 t
= ∇f (~y ) · (~x − ~y )

⇐: Assume that the inequality (1) holds. Let ~x, ~y ∈ S, and let t ∈ (0, 1). We
put
~z = t~x + (1 − t)~y .
It is clear that ~z ∈ S, since S is convex. By (1),

(2) f (~x) − f (~z) ≤ ∇f (~z) · (~x − ~z)

and
(3) f (~y ) − f (~z) ≤ ∇f (~z) · (~y − ~z)

Multiplying the inequality in (2) by t and the inequality in (3) by 1 − t, and

then adding the two resulting inequalities, we derive that

(4) t(f (~x) − f (~z)) + (1 − t)(f (~y ) − f (~z))

≤ ∇f (~z) · [t(~x − ~z) + (1 − t)(~y − ~z)]

Here the right side of (4) equals ~0, since

t(~x − ~z) + (1 − t)(~y − ~z) = t~x + (1 − t)~y − t~z − ~z + t~z

= t~x + (1 − t)~y − ~z = ~0.

Rearranging the inequality (4) we hence find,

tf (~x) + (1 − t)f (~y ) ≤ f (~z) + (1 − t)f (~z)

= f (~z) = f (t~x + (1 − t)~y ),

which shows that f is convex.

16
Remark 3. We also have that f is strictly concave ⇔ proper inequality
holds in (1). Corresponding results with the opposite inequality (respectively
the opposite proper inequality) in (1), holds for convex (respectively strongly
convex) functions.
Definition 5. Let f : S ⊆ Rn → R, and let ~x∗ = (x∗1 , . . . , x∗n ) ∈ S.
A vector ~x∗ is called a (global) maximum point for f if f (~x∗ ) ≥ f (~x) for
all ~x ∈ S. Similarly we define global minimum point. ~x∗ is called a local
maximum point for f if there exists a positive r such that f (~x∗ ) ≥ f (~x) for
all ~x ∈ S that satisfies ||~x − ~x∗ || < r.
A stationary (or critical) point for f is a point ~y ∈ S such that ∇f (~y ) = ~0,
that is, all the first order partial derivatives of f at the point ~y are zero,
∂f
∂xi
(~y ) = 0 (1 ≤ i ≤ n).
If ~x∗ is a local maximum- or minimum point in the interior S 0 of S, then
∇f (~x∗ ) = ~0 :

∂f ∗ 1
(~x ) = lim+ [f (~x∗ ) − f (~x∗ + h~ei )] ≥ 0
∂xi h→0 h

and
∂f ∗ 1
(~x ) = lim− [f (~x∗ ) − f (~x∗ + h~ei )] ≤ 0,
∂xi h→0 h
∂f
and hence ∂x i
(~x∗ ) = 0, (1 ≤ i ≤ n). For arbitrary C 1 -functions f the
condition ∇f (~x∗ ) = ~0 is not sufficient to ensure that f has a loal maximum
or minimum at ~x∗ , however, if f is convex or concave, the following holds,
Proposition 10. Let f be a real valued C 1 function defined on a convex
subset S of Rn , and let ~x∗ ∈ S. The following holds

(a) if f is concave, then ~x∗ is a local maximum point for f

⇔ ∇f (~x∗ ) = ~0.

(b) if f is convex, is ~x∗ a local minimum point for f

⇔ ∇f (~x∗ ) = ~0.

Proof. (a)
⇒: We have seen above that if ~x∗ is a local maximum point for f , then
∇f (~x∗ ) = ~0.

17
⇐: Assume that ~x∗ is a stationary point and that f is concave. For all ~x ∈ S
it follows from the gradient inequality in Proposition 9 that

f (~x) − f (~x∗ ) ≤ ∇f (~x∗ ) · (~x − ~x∗ ) = 0,

hence f (~x) ≤ f (~x∗ ), for all ~x ∈ S.

the proof of (b) is similar.
The next result will also be useful,
Proposition 11. Assume that F is a real valued function defined on a
convex subset S of Rn , and that G is a real function defined on an interval in
R that contains the image f (S) = {f (~x) : ~x ∈ S} of f . Then the following
statements hold,
(a) f is concave and G is concave and increasing ⇒ G ◦ f is concave.
(b) f is convex and G is convex and increasing ⇒ G ◦ f is convex.
(c) f is concave and G is convex and decreasing ⇒ G ◦ f is convex.
(d) f is convex and G concave and decreasing ⇒ G ◦ f is concave.
Proof. (a) Let ~x, ~y ∈ S, and let t ∈ (0, 1). Put U = G ◦ f. Then

U (t~x + (1 − t)~y ) = G(f (t~x + (1 − t)~y )

≥ G(tf (~x) + (1 − t)f (~y )) (f concave and G increasing)
≥ tG(f (~x)) + (1 − t)G(f (~y )) (G concave)
= tU (~x) + (1 − t)U (~y ),

hence U is concave.
(b) is proved as (a).
(c) and (d): Apply (a) and (b) to the function −G. (Notice that G is convex
and decreasing ⇔ −G is concave and increasing.)

2 CALCULUS OF VARIATIONS.

We may say that the basic problem of the calculus of variations is to deter-
mine the maximum or minimum of an integral of the form
Z t1
(1) J(x) = F (t, x(t), ẋ(t)) dt,
t0

18
where F is a given C 2 −function of three variables and x = x(t) is an unknown
C 2 function on the interval [t0 , t1 ], such that
(2) x(t0 ) = x0 , and x(t1 ) = x1 ,
where x0 and x1 are given numbers. Additional side conditions for the prob-
lem may also be included. Functions x that are C 2 and satisfy the endpoint
conditions (2) are called admissible functions.
Example 1. (Minimal surface of revolution.)
Consider curves x = x(t) in the tx−plane, t0 ≤ t ≤ t1 , all with the same
given, end points (t0 , x0 ), (t1 , x1 ). By revolving such curves around the t−axis,
we obtain a surface of rvolution S.
Problem. Which curve x gives the smallest surface of revolution?
If we assume that x(t) ≥ 0, the area of S is
Z t1 √
A(x) = 2πx 1 + ẋ2 dt
t0

Our problem is to determine the curve(s) x∗ for which min A(x) = A(x∗ ),
when x(t0 ) = x0 , x(t1 ) = x1 . This problem is of the same type as in (1)
above.
We shall next formulate the main result for problems of the type (1).
Theorem 1. (”Main Theorem of the Calculus of Variations”)
Assume that F is a C 2 −function defined on R3 . Consider the integral
Z t1
(∗) F (t, x, ẋ) dt
t0

If a function x∗ maximizes or minimizes the integral in (∗) among all C 2 −functions

x on [t0 , t1 ] that satisfy the end point conditions
x(t0 ) = x0 , x(t1 ) = x1 ,
then x∗ satisfies the Euler equation
∂F d ∂F
(E) (t, x(t), ẋ(t)) − ( (t, x(t), ẋ(t)) = 0 (t ∈ [t0 , t1 ]).
∂x dt ∂x
If the function (x, ẋ) 7→ F (t, x, ẋ) is concave (respectively convex) for each
t ∈ [t0 , t1 ], then a function x∗ that satisfies the Euler equation (E), solves
the maximum (respectively minimum) problem.

19
A proof of Theorem 1 will be given below. First we consider
Example 2. Solve
Z 1
min (x2 + ẋ2 ) dt, x(0) = 0, x(1) = e2 − 1.
x 0

∂F d ∂F d
We let F (t, x, ẋ) = x2 + ẋ2 . Then ∂x
= 2x, dt ∂ ẋ
= dt
(2ẋ) = 2ẍ. Hence the
Euler equation of the problem is

ẍ − x = 0,

The general solution is

x(t) = aet + Be−t
Here x(0) = 0 = A + B, B = −A, x(1) = e2 − 1 = Ae + Be−1 = A(e − e−1 ).
This implies that A = e, B = −e, hence

x(t) = et+1 − e1−t

is the only possible solution, by the first part of Theorem 1. Here F (t, x, ẋ) =
x2 + ẋ2 is convex as a function of (x, ẋ), for each t. As a consequence of the
last part of Theorem 1, we conclude that x does in fact solve the problem.
The minimum is
Z 1 Z 1
2 2
(x + ẋ ) dt = [(et+1 − e1−t )2 + (et+1 + e1−t )2 ] dt = . . . = e4 − 1.
0 0

We notice that t does not occur explicitly in the formula for F in this example.
Next we shall take a closer look at this particular case.
The case F = F (x, ẋ) (that is, t does not occur explicitly in the formula
of the function F ).
If x(t) is a solution of the Euler equation, then the Chain Rule yields:
d ∂F ∂F ∂F ∂F d ∂F
[f (x, ẋ) − ẋ (x, ẋ)] = ẋ + ẍ − ẍ − ẋ
dt ∂ ẋ ∂x ∂ ẋ ∂ ẋ dt ∂ ẋ
∂F d ∂F
= ẋ[ − ( )] = ẋ · 0 = 0,
∂x dt ∂ ẋ
so that
∂F
F − ẋ =C (is constant).
∂ ẋ
20
This equation is called a first integral of the Euler equation.
In order to prove Theorem 1 we will need,
The Fundamental Lemma (of the Calculus of Variations). Assume
that f : [t0 , t1 ] → R is a continuous function, and at
Z t1
f (t)µ(t) dt = 0
t0

for all C 2 −functions µ that satisfies µ(t0 ) = µ(t1 ) = 0. Then f (t) = 0 for all
t in the interval [t0 , t1 ].
Proof. If there exists an s ∈ (t0 , t1 ) such that f (s) 6= 0, say f (s) > 0,
then f (t) > 0 for all t in some interval [s − , s + ] of s since f is continuous.
We choose a function µ such that µ is C 2 and

µ ≥ 0, µ(s − ) = µ(s + ) = 0, µ > 0/on (s − , s + )

and µ = 0 outside the interval (s − , s + ). We may for instance let

(
(t − (s − ))3 ((s + ) − t)3 , y ∈ [s − , s + ]
0, t∈/ [s − , s + ].

which is C 2 . Then
Z t1 Z s+
f (t)µ(t) dt = f (t)µ(t) dt > 0.
t0 s−

since µ(t)f (t) > 0 and µ(t)f (t) is continuous on the open interval (s−, s+).
Hence we hofe obtained a contradiction.
We shall also need to ”differentiate under the integral sign”:
Proposition 2. Let g be a C 2 −function defined on the rectangle R =
[a, b] × [c, d] i R2 .
Z b
G(u) = g(t, u) dt, u ∈ [c, d].
a

Then Z b
0 ∂g
G (u) = (t, u) dt.
a ∂u

21
∂g
Proof. Let > 0. Since ∂u is continuous, it is also uniformly continuous
on R (that is, closed and bounded, hence compact). Therefore there exists a
δ > 0 such that
∂g ∂g
| (s, u) − (t, v)| < ,
∂u ∂u b−a
for all (s, u), (t, v) in R such that ||(s, u) − (t, v)|| < δ. By the Mean value
Theorem there is a θ = θ(t, u, v) between u and v such that
∂g
g(t, v) − g(t, u) = (t, θ)(v − u).
∂u
Consequently,
b
g(t, v) − g(t, u) ∂g
Z
| [ − (t, u)] dt|
a v−u ∂u
Z b
∂g ∂g
≤ | (t, θ) − (t, u)| dt
a ∂u ∂u
Z b

≤ dt = , for |v − u| < δ.
a b−a

Hence
b
G(v) − G(u) g(t, v) − g(t, u)
Z
0
G (u) = lim = lim dt
v→u v−u v→u a v−u
Z b
∂g
= (t, u) dt.
a ∂u

Proof of Theorem 1. We will consider the maximum problem (the
minimum problem is similar)
 Z t1

 max F (t, x, ẋ) dt
(1) x t0

 when x(t ) = x and x(t ) = x , x , x given numbers.
0 0 1 1 0 1

assume that x∗ is a C 2 −function that solves (1). Let µ be an arbitrary

C 2 −function on [t0 , t1 ] that satisfies µ(t0 ) = µ(t1 ) = 0. For each real the
function
x = x∗ + µ

22
is an admissible function, that is, a C 2 −function satisfying the endpoint
conditions x(t0 ) = x0 and x(t1 ) = x1 . Then we must have

J(x∗ ) ≥ J(x + µ)

for all real . Let µ be fixed. We will study I() = J(x∗ + µ) as a function
of . The function I has a maximum for = 0. Hence

I 0 (0) = 0.

Now Z t1
I() = F (t, x∗ + µ, ẋ∗ + µ̇) dt
t0

Differentiating under the integral sign (see Proposition 2 above) with respect
to µ we find, Z t1
0
∂F ∗ ∂F ∗
0 = I (0) = µ+ µ̇ dt
t0 ∂x ∂ ẋ
∗ ∗ ∗
where we put ∂F ∂x
= ∂F
∂x
(t, x∗ , ẋ∗ ) and ∂F ∂ ẋ
= ∂F
∂ ẋ
(t, x∗ , ẋ∗ ). Hence
Z t1 Z t1
0 ∂F ∗ ∂F ∗
0 = I (0) = µ dt + µ̇ dt
t0 ∂x t0 ∂ ẋ
Z t1 ∗ t1 Z t1
∂F ∗ ∂F d ∂F ∗
= µ dt + µ − µ dt (ved delvis integrasjon)
t0 ∂x ∂x t0 t0 dt ∂ ẋ
Z t1
∂F ∗ d ∂F ∗
= − µ dt (since µ(t1 ) = µ(t0 ) = 0)
t0 ∂x dt ∂ ẋ
hence t1
∂F ∗ d ∂F ∗
Z
− µ dt = 0
t0 ∂x dt ∂ ẋ
2
for all such C −functions µ with µ(t1 ) = µ(t1 ) = 0. By the Fundamental
Lemma it follows that
∂F ∗ d ∂F ∗
(2) − = 0,
∂x dt ∂ ẋ
hence x∗ satisfies the Euler equation. Using a quite similar argument we find
that an optimal solution x∗ for the minimum problem also satisfies (2).
Sufficiency: assume that F (t, x, ẋ) is concave with respect to the last
two variables (x, ẋ), and let x∗ satisfy the Euler equation with the endpoint

23
conditions of (1). We shall prove that x∗ solves the the maximum problem.
Let x be an arbitrary admissible function for the problem. By concavity the
”Gradient inequality” yields,
∂F ∗ ∂F ∗
F (t, x, ẋ) − F (t, x∗ , ẋ∗ ) ≤ (x − x∗ ) + (ẋ − ẋ∗ )
∂x ∂ ẋ
d ∂F ∗ ∂F ∗ d ∂F ∗
( )(x − x∗ ) + (ẋ − ẋ∗ ) = ( (x − x∗ )),
dt ∂ ẋ ∂ ẋ dt ∂ ẋ
for all t ∈ [t0 , t1 ]. By integration it then follows that
Z t1
F (t, x, ẋ) − F (t, x∗ , ẋ∗ ) dt

t0
t1 t1
d ∂F ∗ ∂F ∗
Z
≤ (x − x∗ ) dt = ∗
(x − x ) = 0,
t0 dt ∂ ẋ ∂ ẋ t0

where we used the endpoint conditions for x and x∗ at the last step. Hence
Z t1 Z t1
F (t, x, ẋ) dt ≤ F (t, x∗ , ẋ∗ ) dt
t0 t0

as we wanted to prove.
Remark. In the above theorem, assume instead that F is concave with
respect to (x, ẋ) (for each t ∈ [t0 , t1 ]) only in a certain open and convex
subset R of R2 . The above sufficiency proof still applies to the set V of
all admissible functions x enjoying the property that (x(t), ẋ(t)) ∈ R for all
t ∈ [t0 , t1 ]. Hence any admissible solution x∗ of the
R t Euler equation that is
also an element of V will maximize the integral t01 F (t, x, ẋ) dt among all
members of V . Consequently, we obtain a maximum relative to the set V .
Note that the proof uses the ”Gradient Inequality” (Proposition 9 in Section
1) which requires the region to be open and convex.
Example 3. We will find a solution x = x(t) of the problem
Z 1
min F (t, x, ẋ) dt
−1

where F (t, x, ẋ) = t2 ẋ2 + 12x2 and x(−1) = −1, x(1) = 1.

(a) First we will solve the problem by means of Theorem 1. Here we find
∂F d ∂F d 2
t · 2ẋ = 24x − 4tẋ − 2t2 ẍ,

− = 24x −
∂x dt ∂ ẋ dt
24
and the Euler equation takes the form

t2 ẍ + 2tẋ − 12x = 0.

We seek solutions of the type x(t) = tk . This gives

(k(k − 1) + 2k − 12)tk = 0, for all t,

that is, k 2 + k − 12 = 0, which has the solutions k = 3 and k = −4.

Consequently, x(t) = At3 + Bt−4 where

x(−1) = −A + B = −1
x(1) = A + B = 1

Hence B = 0 and A = 1, so that x(t) = t3 is the only possible solution.

Furthermore,

∂ 2F 2 ∂ 2F ∂ 2F
= 2t ≥ 0, = 24 > 0, = 0,
∂ ẋ2 ∂x2 ∂x∂ ẋ
and the determinant of the Hesse matrix is 24 · 2t2 = 48t2 ≥ 0. It follows that
F is convex with respect to (x, ẋ). Hence, in view of Theorem 1, x(t) = t3
gives a minimum.
(b) Suppose next that x is a solution of the Euler equation to the problem
which in addition satisfies the given endpoint conditions. If z is an arbitrary
admissible function, then µ = z − x is a C 2 -function that satisfies µ(−1) =
µ(1) = 0. Let us show that

J(z) − J(x) = J(x + µ) − J(x) ≥ 0.

From this it will follow that x minimizes J. Now

Z 1
J(x + µ) − J(x) = [t2 (ẋ + µ̇)2 + 12(x + µ)2 − t2 ẋ2 − 12x2 ] dt
−1
Z 1 Z 1
2 2 2 2
= [2t ẋµ̇ + t µ̇ + 24xµ + 12µ ] dt = [t2 µ̇2 + 12µ2 ] dt + I
−1 −1

25
where
Z 1
I= [2t2 ẋµ̇ + 24xµ] dt
−1
1 Z 1 Z 1
2 2
= 2t ẋµ − (2t ẍµ + 4tẋµ) dt + 24xµ dt (using integration by parts)
−1 −1 −1
Z 1
2µ[t2 ẍ + 2tẋ − 12x] dt

=0− µ(−1) = µ(1) = 0
−1
= 0,
where the last inequality follows from the fact that x satisfies the Euler
equation
t2 ẍ + 2tẋ − 12x = 0.
Hence Z 1
J(x + µ) − J(x) = [t2 µ̇2 + 12µ2 ] dt > 0
−1
if µ 6= 0, hence every solution of the Euler equation that satisfies the endpoint
conditions, yields a minimum. We have seen in (a) that x(t) = t3 is the only
such solution.
Exercise 1. Consider the problem min J(x), where
Z 2
J(x) = ẋ(1 + t2 ẋ) dt, x(1) = 3, x(2) = 5.
1

(a) Find the Euler equation of the problem and show that it has exactly
one solution x that satisfies the given endpoint conditions.
(b) Apply Theorem 1 to prove that the solution from (a) really solves
the minimum problem.
(c) For an arbitrary C 2 -function µ that satisfies µ(1) = µ(2) = 0, show
that Z 2
J(x + µ) − J(x) = t2 µ̇2 dt,
1
where x is the solution from (a). Explain in light of this, that x minimizes
J.
Exercise 2. A function F is given by
F (x, y) = x2 (1 + y 2 ), (x, y) ∈ R2

26
(a) Show that F is convex in the region
1
R = {(x, y) : |y| ≤ √ }
3
(b) Show that the variation problem
Z 1
min x2 (1 + ẋ2 ) dt,
0
x(0) = x(1) = 1

has Euler equation

(∗) xẍ + ẋ2 − 1 = 0

V = {x : x is a C 2 function such that x(0) = x(1) = 1

1
and |ẋ(t)| < √ for all t ∈ [0, 1]}
3
Show that x∗ ∈ V . Explain that x∗ minimizes the integral in (1) among all
functions in V .
Weierstrass’ sufficiency condition.2
Example 4. Consider the problem of maximizing or minimizing the
integral Z 1
J(x) = (ẋ2 − x2 ) dt, x(0) = x(1) = 0.
0
The Euler equation of the problem is

ẍ + x = 0

The extremals (that is, the solutions of the Euler equation) are

x(t) = A sin t + B cos t, A and B arbitrary constants.

Among all extremals only the zero function x∗ , x∗ (t) = 0 for all t ∈ [0, 1],
satisfies the endpoint conditions. It is easy to see, using the second-derivative
2
This section is optional.

27
test, that F (t, x, ẋ) = ẋ2 − x2 is neither concave nor convex i (x, ẋ). Hence
we cannot use Theorem 1 to decide if x∗ gives a maximum or a minimum for
the problem.
Exercise 3. Show that the function F (t, x, ẋ) = ẋ2 − x2 of the last
Example is neither convex nor concave.
The sufficiency condition stated in Theorem 3 below, is often useful for
solving problems
(1) max or min J(x)
where Z t1
J(x) = F (t, x(t), ẋ(t)) dt.
t0

As above F is assumed to be a given C 2 −function of three variables and

x = x(t) is an unknown C 2 -function on the interval [t0 , t1 ], such that

(2) x(t0 ) = x0 , x(t1 ) = x1 ,

where x0 and x1 are given numbers.

The solutions of the Euler equation of this problem (without any endpoint
conditions), are called extremals.
Definition. we say that an extremal x∗ for
Z t1
J(x) = F (t, x(t), ẋ(t)) dt, x(t0 ) = x0 , x(t1 ) = x1 ,
t0

is a relative maximum if there exists an r > 0 such that J(x∗ ) ≥ J(x) for all
C 2 -functions x with x(t0 ) = x0 , x(t1 ) = x1 that satisfies |x∗ (t) − x(t)| < r
for all t ∈ [0, 1]. A relative minimum x∗ is defined similarly.
Theorem 3. (Weierstrass’ sufficiency condition). Assume that an ex-
tremal x∗ for the problem (1) satisfies the endpoint conditions (2) and that

28
there exists a parameter family x(·, α) 3 , − < α < , of extremals such that
(1) x(t, 0) = x∗ (t), t ∈ [t0 , t1 ],
∂x
(2) x(·, α) is differentiable with respect to α and (t, α)α=0 6= 0,
∂α
t0 ≤ t ≤ t1 ,
(3) if α1 6= α2 , then the two curves given by x(·, α1 ) and x(·, α2 )have no
common points,
∂ 2F
(4) (t, x, ẋ) < 0 for all t ∈ [t0 , t1 ] and all x = x(·, α), α ∈ (−, )
∂ ẋ2
Then x∗ is a relative maximum for J(x) with the given endpoint conditions.
If the condition
∂ 2F
(40 ) (t, x, ẋ) > 0 for all t ∈ [t0 , t1 ] and all x = x(·, α), α ∈ (−, ),
∂ ẋ2
holds instead of (4), then x∗ is a relative minimum.

Consider the extremals

x(t) = A cos t + B sin t, t ∈ [0, 1].

of Example 4. The parameter family of extremals x(·, α given by

(∗) x(t, α) = α cos t, α ∈ (−1, 1),

satisfies x(·, 0) = x∗ = 0 and is differentiable with respect to α with ∂x(t,α)

∂α

α=0
=
cos t 6= 0 for all t ∈ [0, 1]. Moreover, it is clear that the family of curves given
by (∗) have no common points. Finally we have
∂ 2F ∂
(t, x, ẋ) = (2ẋ) = 2 > 0.
∂ ẋ2 ∂ ẋ
Hence, by the Weierstrass sufficiency condition, x∗ = 0 is a relative minimum
point for J(x).
Exercise 4. Let
Z t1 √
J(x) = 1 + ẋ2 dt, x(t0 ) = x0 , x(t1 ) = x1 , where t0 6= t1 .
t0

3
For fixed α, x(·, α) denotes the function t 7→ x(t, α)

29
J(x) gives the arc length of the curve x(t) between the two given points
(t0 , x0 ) and (t1 , x1 ). We wish to determine the curve x∗ that yields the
minimal arc length.
(a) Show that the Euler equation of the problem may be written
ẋ
(E) √ = c, where c is a constant different from ± 1.
1 + ẋ
(b) Show that (E) has a unique solution x∗ that satisfies the given end-
point conditions.
(c) Show that among the solutions of (E) there exists a parameter family
given by x(t, α) = x∗ (t) + α, −1 < α < 1, where x∗ is as in (b), and that the
conditions of Theorem 3 are satified such that x∗ gives a (relative) minimum.
(d) Show that F (t, x, ẋ) is convex i (x, ẋ). Conclude by Theorem 1 that
x∗ solves the minimum problem.
Other endpoint conditions. Next we will consider optimation prob-
lems for which the left end point x(t0 ) of the admissible functions x is fixed,
whereas the right end point x(t1 ) is free. In this case we have the following
result:
Theorem 4. Assume that x0 is a given number. A necessary condition
for a C 2 -function4 x∗ to solve the problem
Z t1
max (min) F (t, x(t), ẋ(t)) dt, x(t0 ) = x0 , x(t1 ) is free,
t0

is that x∗ satisfies the Euler equation and, in addition, the condition

∂F
(T ) = 0.
∂ ẋ t=t1
If the function F (t, x, ẋ) is concave (respectively convex) in (x, ẋ) for each
t ∈ [t0 , t1 ], then any admissible function x∗ that satisfies the Euler equation
and the condition (T ), solves the maximum (respectively minimum) problem.
The condition (T ) is called the transversality condition.
Proof. We will give a proof of the maximum problem. The proof of
the minimum problem is similar. Assume that x∗ solves the problem. In
particular all admissible (C 2 -) functions x which have the same value x∗ (t1 )
4
As can be shown, it suffices to consider C 1 -functions.

30
as x∗ at the right endpoint t = t1 , satisfy J(x) ≤ J(x∗ ), hence x∗ is optimal
among those functions. However, then x∗ satisfies the Euler equation. In the
proof of the Euler equation, see the proof of Theorem 1, we considered the
function Z t1
I(α) = F (t, x∗ + αµ, ẋ∗ + αµ̇) dt
t0
After integration by parts and differentiation under the integral, we con-
cluded that
Z t1
0 ∂F ∗ d ∂F ∗ ∂F ∗ t
0 = I (0) = [ − ( )] dt + µ(t) t10 ,
t0 ∂x dt ∂ ẋ ∂ ẋ
where
∂F ∗ ∂F ∂F ∗ ∂F
(∗) = (t, x∗ (t), ẋ∗ (t)) and = (t, x∗ (t), ẋ∗ (t))
∂x ∂x ∂ ẋ ∂ ẋ
Here, at this point, we let µ be a C 2 -function such that µ(t0 ) = 0 and µ(t1 ) is
free. Since x∗ satisfies the Euler equation, the integral in (∗) must be equal
to zero, hence
∂F ∗
µ(t1 ) = 0.
∂ ẋ t=t1
∗
If we choose a µ with µ(t1 ) 6= 0, it follows that ( ∂F )
∂ ẋ t=t1
= 0.
Sufficiency: Assume that F (t, x, ẋ) is concave in (x, ẋ), and that x∗ is
an admissible function which satisfies the Euler equation and the condition
(T ). The argument led to the inequality
Z t1
F (t, x, ẋ) − F (t, x∗ , ẋ∗ ) dt

t0
t1 t1
d ∂F ∗ ∂F ∗
Z
≤ (x − x∗ ) dt = (x − x∗ )
t0 dt ∂ ẋ ∂ ẋ t0

The last part of the proof of Theorem 1, goes through as before. Here we
find
∂F ∗

∗
(x − x ) =0
∂ ẋ t=t0
since x(t0 ) = x∗ (t0 ) = x0 . Finally the condition (T ) yields
∂F ∗

∗
(x − x ) = 0.
∂ ẋ t=t1

Thus it follows that J(x) ≤ J(x∗ ), hence x∗ maximizes J.

31
3 OPTIMAL CONTROL THEORY.
In the Calculus of Variations we studied problems of the type
Z t1
max/min f (t, x, ẋ) dt, x(t0 ) = x0 , x(t1 ) = x1 (or x(t1 ) free).
t0

We assumed that all the relevant functions were of class C 2 (twice continu-
ously differentiable). If we let
u = ẋ,
then the above problem may be reformulated as
Z t1
(0) max/min f (t, x, u) dt, ẋ = u, x(t0 ) = x0 , x(t1 ) = x1 (or x(t1 ) free).
t0

We shall next study a more general class of problems, problems of the type:
 R t1
max/min t0 f (t, x(t), u(t)) dt, ẋ(t) = g(t, x(t), u(t)),

(1) x(t0 ) = x0 , x(t1 ) free,

u(t) ∈ R, t ∈ [t0 , t1 ], u piecewise continuous.


Later we shall also consider such problems with other endpoint conditions
and where the functions u can take values in more general subsets of R.
Such problems are called control problems, u is called a control vari-
able (or just a control). The control region is the common range of the
possible controls u.5 Pairs of functions (x, u) that satisfies the given end-
point conditions and, in addition, the equation of state ẋ = g(t, x, u), are
called admissible pairs.
We shall always assume that the controls u are piecewise continuous, that
is, u may possess a finite number of jump discontinuities. (In more advanced
texts measurable contols are considered.) In the calculus of variations we
always assumed that u = ẋ was of class C 1 .
In the optimal control theory it proves very useful to apply an auxiliary
function H of four variables defined by

H(t, x, u, p) = f (t, x, u) + pg(t, x.u),

called the Hamilton function (or Hamiltonian) of the given problem.

5
Variable control regions may also be considered

32
Pontryagin’s Maximum principle gives conditions that are necessary for
an admissible pair (x∗ , u∗ ) to solve a given control problem:
Theorem (The Maximum Principle I)
Assume that (x∗ , u∗ ) is an optimal pair for the problem in (1). Then
there exist a continuous function p = p(t), such that for all t ∈ [t0 , t1 ], the
following conditions are satisfied:
(a) u∗ (t) maximizes H(t, x∗ (t), u, p(t)), u ∈ R, that is,

H(t, x∗ (t), u, p(t)) ≤ H(t, x∗ (t), u∗ (t), p(t)), for all u ∈ R.

(b) The function p (called the adjoint function ) satisfies the differential
equation
∂H ∂H ∗
ṗ(t) = − (t, x∗ (t), u∗ (t), p(t)) (written − ),
∂x ∂x
except at the discontinuities of u∗ .
(c) The function p obeys the condition

(T ) p(t1 ) = 0 (the transversality condition)

Remark.
For the sake of simplicity we will frequently use the notations
∂H ∗ ∂H
= (t, x∗ (t), u∗ (t), p(t))
∂x ∂x
and similarly,

f ∗ = f (t, x∗ (t), u∗ (t)), g ∗ = g(t, x∗ (t), u∗ (t)), H ∗ = H(t, x∗ (t), u∗ (t), p(t))

Remark.
For the varition problem in (0), where ẋ = u, the Hamiltonian H will be
particularly simple,

H(t, x, u, p) = f (t, x, u) + pu,

hence, by the Maximum Principle, it is necessary that

∂H ∗ ∂f ∗
(i) ṗ(t) = − =−
∂x ∂x
33
Since u ∈ R, we must have (there are no endpoints to consider here)

∂H ∗ ∂f ∗
0= = + p(t).
∂u ∂u
or
∂f ∗ ∂f ∗
(ii) p(t) = −=− ,
∂u ∂ ẋ
in order that u = u∗ (t) shall maximize H. Equations (ii) and (i) now yield

d ∂f ∗ ∂f ∗
ṗ(t) = − ( )=− ,
dt ∂ ẋ ∂x
Hence we have deduced the Euler equation
∂f ∗ d ∂f ∗
(E) − ( )=0
∂x dt ∂ ẋ
∗
As p(t) = − ∂f ∂ ẋ
, the condition (T ) above yields the ”old” transversality
∂f ∗
condition ( ∂ ẋ )t=t1 = 0 from the Calculus of Variations. This shows that
The Maximum Principle implies our previous results from The Calculus of
Variations.
The following theorem of Mangasarian supplements the Maximum Prin-
ciple with sufficient conditions for optimality.
Mangasarian’s Theorem (First version)
Let the notation be as in the statement of the Maximum Principle. If the
map (x, u) 7→ H(t, x, u, p(t)) is concave for each t ∈ [t0 , t1 ], then each admis-
sible pair (x∗, u∗ ) that satisfies conditions (a), (b), and (c) of the Maximum
Principle, will give a maximum.
Example 1
We will solve the problem
Z T
max [1 − tx(t) − u(t)2 ] dt, ẋ = u(t), x(0) = x0 , x(T ) free,
0

where x0 and T are positive constants.

Solution.
The Hamiltonian is

H(t, x, u, p) = 1 − tx − u2 + pu

34
In order that u = u∗ (t) shall maximize H(t, x∗ (t), u, p(t)), it is necessary that
∂H
(t, x∗ (t), u, p(t)) = 0,
∂u
∂2H ∗
that is −2u + p(t) = 0, or u = 21 p(t). This yields a maximum since ∂u2
=
−2 < 0. Hence
u∗ (t) = p(t)/2, t ∈ [0, T ].
Furthermore,
∂H
= −t = −ṗ(t),
∂x
so that ṗ = t, and p(t) = 12 t2 + A. In addition, the condition (T ) gives
0 = p(T ) = 12 T 2 + A, hence A = − 12 T 2 . Thus
1
p(t) = (t2 − T 2 )
2
and
1 1
ẋ∗ (t) = u∗ (t) = p(t) = (t2 − T 2 ),
2 4
1 1
x∗ (t) = t3 − T 2 t + x0 .
12 4
Here H(t, x, u, p(t)) = 1 − tx − u2 + pu is concave in (x, u) for each t, being
the sum of the two concave functions (x, u) 7→ pu − tx (which is linear)
and (x, u) 7→ 1 − u2 , (which is constant in x and concave as a function
of u). As an alternative, we could have used the second derivative test.
Accordingly Mangasarian’s Theorem shows that (x∗ , u∗ ) is an optimal pair
for the problem.
Example 2
Z T
max (x − u2 ) dt, ẋ = x + u, x(0) = 0, x(T )free, u ∈ R.
0
Here
H(t, x, u, p) = x − u2 + p(x + u) = −u2 + pu + (1 + p)x,
which is concave in (x, u). Let us maximize H with respect to u: ∂H ∂u
=
1 ∂2H
−2u + p = 0 if and only if u = 2 p. This gives a maximum as ∂u2 = −2 < 0.
∗
As a consequence we find ẋ = x + 21 p, ṗ = − ∂H
∂x
= −(1 + p) Hence
Z Z
dp
= (−1) dt, log |1 + p| = −t + C, |1 + p| = Ae−t
1+p

35
From (T ) we find p(T ) = 0, thus 1 = Ae−T, A = eT . Accordingly, 1 + p =
±eT −t , p(t) = eT −t − 1 (p(T ) = 0 hence the plus sign must be used.) It
follows that ẋ − x = 21 p = 12 [eT −t − 1], dt
d
(e−t x) = 12 [eT −2t − e−t ],
1 1
x(t) = − eT −t + + Det
4 2
Now
1 1 1 1 1 1 1
x(0) = 0 = − eT + + D, D = (−1 + eT ) = eT ) = eT − ,
4 2 2 2 4 4 2
hence
1 1 1 1 1 1
x∗ (t) = − eT −t + eT +t − et + = [eT +t − eT −t ] + (1 − et )
4 4 2 2 4 2

Next we willl study control problems of the form
 R t1

 max/min t0
f (t, x(t), u(t)) dt, ẋ(t) = g(t, x(t), u(t)),

x(t ) = x , x(t ) free,
0 0 1
(2)


 u(t) ∈ U, t ∈ [t0 , t1 ], u piecewise continuous,

U a given interval in R

We shall assume that f and g are C 1 −functions and that x(t1 ) satisfies
exactly one of the following three terminal conditions:

(i)
 x(t1 ) = x1 , where x1 is given
(3) (ii) x(t1 ) ≥ x1 , where x1 is given

(iii) x(t1 ) is free


In the present situation it turns out that we must consider two different pos-
sibilities for the Hamilton function in order to obtain the correct Maximum
Principle.

(A) H(t, x, u, p) = f (t, x, u) + pg(t, x, u) (normal problems)

or
(B) H(t, x, u, p) = pg(t, x, u) (degenerate problems)

36
Case (A) is by far the most interesting one. Consequently, we shall almost
always discuss normal problems in our applications. Note that in case (B)
the Hamiltonian does not depend on the integrand f .
Remark. If x(t1 ) is free (as in (iii)), then the problem will always be
normal. More general, if p(t) = 0 for some t ∈ [t0 , t1 ], then it can be shown
that the problem is normal.
Pontryagin’s Maximum Principle takes the following form.
Theorem (The Maximum Principle II) Assume that (x∗ , u∗ ) is an
optimal pair for the control problem in (2) and that one of the terminal
conditions (i), (ii), or (iii) is satisfied. Then there exists a continuous and
piecewise differentiable function p such that p(t1 ) satisfies exactly one of the
following transversality conditions:

0
(i )
 no condition on p(t1 ), if x(t1 ) = x1
0
(ii ) p(t1 ) ≥ 0 if x(t1 ) ≥ x1 , (p(t1 ) = 0 if x(t1 ) > x1 )

 0
(iii ) p(t1 ) = 0 if x(t1 ) is free.

In addition, for each t ∈ [t0 , t1 ], the following conditions must hold

(a) u = u∗ (t) maximizes H(t, x∗ (t), u, p(t)), u ∈ U, that is,

H(t, x∗ (t), u, p(t)) ≤ H(t, x∗ (t), u∗ (t), p(t)), for all u ∈ U.

(b) The function p (called the adjoint function) satisfies the differential
equation
∂H ∂H ∗
ṗ(t) = − (t, x∗ (t), u∗ (t), p(t)) (written − ),
∂x ∂x
except at the discontinuity points of u∗ .
(c) If p(t) = 0 for some t ∈ [t0 , t1 ], then the problem is normal.
Once again concavity of the Hamiltonian H in (x, u) yields sufficiency for
the existence of an optimal pair:
Mangasarian’s Theorem II. Assume that (x∗ , u∗ ) is an admissible
pair for the maximum problem given in (2) and (3) above. Assume further
that the problem is normal. If the map (x, u) 7→ H(t, x, u, p(t)) is concave for
each t ∈ [t0 , t1 ], then each admissible pair (x∗, u∗ ) that satisfies conditions
R(a),
t1
(b), and (c) of the Maximum Principle II, will maximize the integral
t0
f (t, x, ẋ) dt.

37
In contrast to the Maximum Principle, the proof of Mangasarian’s Theo-
rem is neither particularly long nor difficult. For the proof we shall need the
following
Lemma. If φ is a concave real-valued C 1 −function on an interval I,
then
φ has a maximum at x0 ∈ I ⇐⇒ φ0 (x0 )(x0 − x) ≥ 0, for all x ∈ I.
Proof. We let a < b be the endpoints of I.
=⇒: Assume that x : 0 is a max point for φ. Since the derivative φ0 exists
there are exactly three possibilities:
(1) x0 = a and φ0 (x0 ) ≤ 0,
(2) a < x0 < b and φ0 (x0 ) = 0, and
(3) x0 = b and φ0 (x0 ) ≥ 0,
In all three cases we have φ0 (x0 )(x0 − x) ≥ 0.
⇐=: Suppose that φ0 (x0 )(x0 − x) ≥ 0. Again there are three possibilities:
(1) x0 = a: Then x0 − x ≤ 0, hence φ0 (x0 ) ≤ 0. Since φ is concave, the
tangent lies above or on the graph of φ at each point. Hence φ0 decreases
(this may also be shown analytically) so that
φ0 (x) ≤ φ0 (a) = φ0 (x0 ) ≤ 0, for all x ∈ I.
Hence φ decreases and x0 = a is a maximum point.
(2) a < x0 < b: For any x ∈ I which satisfies x0 < x we have x0 − x < 0,
hence φ0 (x0 ) ≤ 0. On the other hand, if x0 > x, then x − x0 > 0, and hence
φ0 (x0 ) ≥ 0 too. It follows that φ(x0 ) = 0. As φ0 is decreasing, x0 must be a
max point for φ.
(3) x0 = b: Then x0 − x ≥ 0, hence φ0 (x0 ) ≥ 0. Since φ0 is decreasing,
0
φ (x) ≥ 0 for all x ∈ I. Therefore φ is increasing and x0 = b must be a
maximum point.
Proof of Mangasarian’s Theorem. Assume that (x∗ , u∗ ) is an admis-
sible pair and satisfies conditions (a), (b), and (c) in the hypothesis of the
Maximum Principle. Assume further that the map
(x, u) 7−→ H(t, x, u, p(t))
is concave for each t ∈ [t0 , t1 ]. Let (x, u) be any admissible pair for the control
problem. We must show that
Z t1 Z t1
∗ ∗
∆= f (t, x (t), u (t)) dt − f (t, x(t), u(t)) dt ≥ 0
t0 t0

38
Let us introduce the simplified notations

f ∗ = f (t, x∗ (t), u∗ (t)), f = f (t, x(t), u(t)),

and similarly for g ∗ , g, H ∗ , and H. Thus

H ∗ = H(t, x∗ (t), u∗ (t), p(t)), H = H(t, x(t), u(t), p(t)).

Then
H ∗ = f ∗ + pg ∗ ,
hence

f ∗ = H ∗ − pg ∗ = H ∗ − pẋ∗ since ẋ∗ = g ∗ by condition (b))

and
f = H − pẋ.
It follows that
Z t1 Z t1
∗
∆= (f − f ) dt = (H ∗ − pẋ∗ − H + pẋ) dt
t t
Z 0t1 Z0 t1
= p(ẋ∗ − ẋ) dt − (H − H ∗ ) dt
t0 t0

Since H is concave with respect to (x, u) the ”gradient inequality” (see Propo-
sition 9 of Section 1) holds

∂H ∗ ∗ ∗ ∂H ∗
H −H ≤ (x − x ) + (u − u∗ )
∂x ∂u
∂H ∗
= −ṗ(x − ẋ) + (u − u∗ )
∂u
≤ −ṗ(x − ẋ) (by the last Lemma),

except at the discontinuities of u∗ . For the last inequality above we used

∗
that ∂H
∂u
(u∗ − u) ≥ 0 which holds since H is concave in (x, u), hence in u,
Lemma. Consequently,

39
Z t1 Z t1
∗
∆≥ p(ẋ − ẋ ) dt + ṗ(x − x∗ ) dt
t t0
Z t1 0
d
[p(x − x∗ )] dt ( here we used that the problem is normal ).
t0 dt
= |tt10 p(x − x∗ )
= p(t1 )[x(t1 ) − x∗ (t1 )] − p(t0 )[x(t0 ) − x∗ (t0 )]
= p(t1 )[x(t1 ) − x∗ (t1 )] (since x(t0 ) = x0 = x∗ (t0 ))
By the terminal conditions,
(i) x(t1 ) = x1 = x∗ (t1 ) : ∆ ≥ 0 is clear.
(ii) x(t1 ) ≥ x1 : If x∗ (t1 ) = x1 , then
x(t1 ) ≥ x1 = x∗ (t1 ), and since p(t1 ) ≥ 0 we find that ∆ ≥ 0
(iii) x(t1 ) is free: p(t1 ) = 0, hence ∆ ≥ 0.
Hence we have shown that the pair (x∗ , u∗ ) is optimal.
Example 3 (SSS 12.4 Eksempel 1)
We shall solve the control problem
Z 1
max x(t) dt, ẋ = x+u, x(0) = 0, x(1) ≥ 1, u(t) ∈ [−1, 1] = U, for all t ∈ [0, 1].
0

We will assume (as usual) that the problem is normal. Hence the Hamiltonian
is
H(t, x, u, p) = x + p(x + u) = (1 + p)x + pu
which is linear in (x, u)m hence it is concave (for all t ∈ [0, 1]). Thus Man-
gasarian’s Theorem applies and any admissible pair (x∗ , u∗ ) that satisfies
the conditions of the Maximum Principle will solve the problem. Here the
transversality condition
(iii0 ) p(1) ≥ 0(p(1) = 0 if x(1) > 1)
applies. Since H is linear in u, the maximum is attained at one of the
endpoints u = 1 or u = −1, hence

1, if p(t) > 0

∗
u (t) = −1, if p(t) < 0

undetermined if p(t) = 0. We let u∗ (t) = 1 in this case.


40
Furthermore, for x = x∗ , u = u∗ ,
∂H
= 1 + p = −ṗ (from condition (b) in the Maximum Principle)
∂x
Hence ṗ + p = −1 which yields p(t) = Ae−t − 1, where p(1) = Ae−1 − 1 ≥ 0,
so that A ≥ e. Therefore

p(t) = Ae−t − 1 ≥ e1−t − 1 = h(t),

where
h0 (t) = −e1−t < 0, and h(1) = 0, h(0) = e − 1 > 0.
Hence h(t) ∈ [0, e − 1]. In particular,

p(t) ≥ h(t) > 0 for t ∈ [0, 1), p(1) ≥ 0.

Accordingly
u = u∗ (t) = 1, t ∈ [0, 1].
It follows that
ẋ∗ − x∗ = 1, x∗ (t) = Bet − 1.
Further,
x∗ (0) = 0 ⇒ B = 1 ⇒ x∗ (t) = et − 1.
Thus x∗ (1) = e − 1 > 0, so that p(1) = 0, and A = e. We have found

x∗ (t) = et − 1, u∗ (t) = 1, for all t ∈ [0, 1]

and this is an optimal pair by Mangasarian’s Theorem.

An inspection of the proof of Mangasarian’s Theorem reveals that it will
work locally if the function (x, u) 7→ H(t, x, u, p(t)) is concave only in some
open and convex subset V of R2 . In fact, under these hypothesis the Gradient
Inequality (Proposition 1.9) is valid. This yields
Corollary. Assume that (x∗ , u∗ ) is an admissible pair for the maximum
problem given in (2) and (3) above. Assume further that the problem is
normal. Let V be an open and convex subset of R2 and let

W = {(x, u) : (x, u) is admissible and (x(t), u(t)) ∈ V for all t ∈ [t0 , t1 ]}.

If for each t ∈ [t0 , t1 ] the map (x, u) 7→ H(t, x, u, p(t)) is concave on V , then
each admissible pair (x∗ , u∗ ) that satisfies conditions (a), (b), and (c) of the

41
R t1
Maximum Principle II, will maximize the integral t0
f (t, x, ẋ) dt among all
admissible pairs (x, u) in W .

Arrow’s condition.
Mangasarian’s Theorem requires that the Hamilton function is concave with
respect to (x, u). In many important applications this condition is not sat-
isfied. Consequently, we will consider a weaker but related condition that in
some cases is sufficient for optimality. Define
Ĥ(t, x, p) = max H(t, x, u, p)
u∈U

where we assume that the maximum exists. Then we have

Arrow’s Sufficiency Condition. In the control problem (2) above as-
sume that the Hamilton function is given by
H(t, x, u, p) = f (t, x, u) + pg(t, x, u).
(that is, the problem is normal). If the conditions of the Maximum Principle
are satisfied for an admissible pair (x∗ , u∗ ) and the function x 7→ Ĥ(t, x, p(t))
is concave for each t ∈ [t0 , t1 ], then (x∗ , u∗ ) is optimal for the problem.
Example. We shall apply the Maximum Principle to find a possible
optimal pair for the problem. Then we will show that this pair is indeed
optimal.
Z 1
max (tx − u2 ) dt, where ẋ = x + u2 , u(t) ∈ [0, 1] for all t ∈ [−1, 1],
−1

and with the endpoint conditions x(−1) = −2e−1 − 1, x(1) free.

Solution:
Assume that (x∗ , u∗ ) is an optimal pair. The Hamilton function is
H(t, x, u, p) = tx − u2 + p(x + u2 ) = (t + p)x + (p − 1)u2 .
According to the Maximum Principle there exists a continuous function p(t)
such that for each t the control u = u∗ (t) maximizes H(t, x∗ (t), u, p(t)). Here
∂H
= 2(p − 1)u = 0 if and only if u = 0 or p = 1.
∂u
The second derivative of H with respect to u is equal to 2(p − 1) and it is
negative ⇔ p < 1. Consequently, whenever

42
p = p(t) > 1 :
u = u∗ (t) = 0 yields a minimum for H. Further, u = u∗ (t) = 1 gives a
maximum, and for all such t the function x∗ must satisfy the equation
ẋ∗ − x∗ = 1, hence x∗ (t) = Aet − 1. Next, if
p = p(t) < 1 :
u = u∗ (t) = 0 maximizes H, so that ẋ∗ − x∗ = 0, x∗ (t) = Bet .
Furthermore,
∂H
−ṗ(t) = = t + p(t),
∂x
hence ṗ(t) + p(t) = −t, p(t) = 1 − t + Ce−t . Here p(1) = C = 0, since x(1)
is free. Therefore,
p(t) = 1 − t, t ∈ [−1, 1].
Thus p(t) > 1 ⇔ t < 0, so that p(t) > 1 on [−1, 0) and p(t) < 1 on (0, 1].
t ∈ [−1, 0) :
p(t) > 1, x∗ (t) = Aet − 1, where x(−1) = Ae−1 − 1 = −2e−1 − 1, A = −2.
Hence x∗ (t) = −2et − 1, u∗ (t) = 1.
t ∈ (0, 1] :
p(t) < 1, x∗ (t) = Bet . Continuity of x∗ at t = 0 gives x∗ (0) = B = −2 − 1 =
−3. Therefore,
x∗ (t) = −3et , u∗ (t) = 0
t=1:
∗
u (1) can be assigned any value in [0, 1], since H is constant in u. Let us
choose the value u∗ (1) = 1.
Then the pair (x∗ , u∗ ) is the only possible candidate for a solution.
It remains to prove that (x∗ , u∗ ) is in fact optimal. We see that H(t, x, u, p(t))
is concave in (x, u) ⇔ p(t) ≤ 1 ⇔ t ∈ [0, 1]. Hence Mangasarian’s Theorem
does not apply. However, we observe that
Ĥ(t, x, p(t)) = max H(t, x, u, p(t)) = (t + p(t)x) + p(t) − 1 if p(t) > 1
u∈[0,1]

and
Ĥ(t, x, p(t)) = (t + p(t))x, if p(t) ≤ 1,
whence Ĥ is linear and is therefore concave in x. Accordingly, Arrow’s The-
orem implies that the pair (x∗ , u∗ ) solves the problem.
Exercise. Solve the control problem in the last example if the end point
conditions instead are
1
x(−1) = 0, x(1) = e2 − e1+ e .

43
Hint: Try with p(t) < 1 on an interval (t0 , 1] (p denotes the adjoint function).

44
REFERENCES
[EP] Edwards & Penney, Elementary Differential Equations, 2009. Upper
Saddle River, N.J. : Pearson Prentice Hall. ISBN: 978-0-13-235881-1.
[GF] Gelfand, I. M., Fomin, S. V., Calculus of variations, Prentice-Hall,
1963.
[O] Osgood, W. F., Sufficient Conditions in the Calculus of Variations,
The Annals of Mathematics, 2nd Series, Vol. 2, No. 1/4, 1900-1901 (105-129)
(Online at: www.jstor.org/stable/2007189)
[PBGM] Pontryagin, Boltyanskii, Gamkrelidze, Mischchenko, The Math-
ematical Theory of Optimal Processes, Interscience Publishers (Wiley), 1962.
[SSS] Sydsæter K., Seierstad A., Strøm, A., Matematisk Analyse, Bind
2, Gyldendal Akademisk 2004.
[SS] Seierstad, A., Sydster, K., Optimal Control Theory with Economic
Applications, North-Holland 1987. ISBN: 0-444-87923-4.

B. Solving Quadratic Equations
No ratings yet
B. Solving Quadratic Equations
23 pages
2.2 Double Integrals - Volume
No ratings yet
2.2 Double Integrals - Volume
2 pages
Limits Exercises With Answers
100% (4)
Limits Exercises With Answers
17 pages
Differential Calculus of Functions of One Variable
No ratings yet
Differential Calculus of Functions of One Variable
30 pages
Partial Derivatives
No ratings yet
Partial Derivatives
29 pages
Short Notes - Conic Section - Achievers 2026
No ratings yet
Short Notes - Conic Section - Achievers 2026
5 pages
Derivative Many Variables
No ratings yet
Derivative Many Variables
25 pages
Introduction To Nonlinear Systems and Numerical Optimization
No ratings yet
Introduction To Nonlinear Systems and Numerical Optimization
83 pages
SMAM34 III Sem Calculas of Variations and Integral Equations Up
No ratings yet
SMAM34 III Sem Calculas of Variations and Integral Equations Up
176 pages
Of For: New Proof Giorgi's Theorem Concerning The Problem Elliptic Differential Equations
No ratings yet
Of For: New Proof Giorgi's Theorem Concerning The Problem Elliptic Differential Equations
12 pages
Formulas Vector Analysis (Eng)
No ratings yet
Formulas Vector Analysis (Eng)
2 pages
Directional Derivative
No ratings yet
Directional Derivative
11 pages
Pow6 2
No ratings yet
Pow6 2
3 pages
10th STD Maths Passing and Scoring Package Eng Version 2024-25
No ratings yet
10th STD Maths Passing and Scoring Package Eng Version 2024-25
12 pages
Convex Optimization
No ratings yet
Convex Optimization
108 pages
Mat121 Module 5
No ratings yet
Mat121 Module 5
18 pages
Classnotes Ma1101 Sayfalar 21 73
No ratings yet
Classnotes Ma1101 Sayfalar 21 73
53 pages
Real Analysis (해석학2) 2024
No ratings yet
Real Analysis (해석학2) 2024
14 pages
4 Periodic Orbits: 4.1 Poincar E-Bendixson Theorem
No ratings yet
4 Periodic Orbits: 4.1 Poincar E-Bendixson Theorem
12 pages
CRW 1. Functions Markscheme
No ratings yet
CRW 1. Functions Markscheme
7 pages
2.5. Differentiability
No ratings yet
2.5. Differentiability
22 pages
28.05.2018 Math152 Final Exam
No ratings yet
28.05.2018 Math152 Final Exam
6 pages
Notes Partial Derivatives Chapter From Book
No ratings yet
Notes Partial Derivatives Chapter From Book
25 pages
A Frame-Free Formulation of Elasticity
No ratings yet
A Frame-Free Formulation of Elasticity
16 pages
Multivariable Calculus
No ratings yet
Multivariable Calculus
23 pages
Differential Forms - No Untitled
No ratings yet
Differential Forms - No Untitled
161 pages
34922study Material of Semester Iii C5 T Unit 23 20 08 2020
No ratings yet
34922study Material of Semester Iii C5 T Unit 23 20 08 2020
30 pages
0 Main
No ratings yet
0 Main
13 pages
Activity Sheets in Math 10 Quarter 1, Week 7: MELC: Performs Division of Polynomial Using Long Division and
No ratings yet
Activity Sheets in Math 10 Quarter 1, Week 7: MELC: Performs Division of Polynomial Using Long Division and
11 pages
Lec 1
No ratings yet
Lec 1
16 pages
Linear Graphs Test
No ratings yet
Linear Graphs Test
7 pages
PartialDerivative and Polar Coordinates
No ratings yet
PartialDerivative and Polar Coordinates
7 pages
Lectures On Optimal Control Theory: Terje Sund March 3, 2014
No ratings yet
Lectures On Optimal Control Theory: Terje Sund March 3, 2014
54 pages
Lecture 9
No ratings yet
Lecture 9
5 pages
ps4 Sol
No ratings yet
ps4 Sol
3 pages
10th SA2 I Q
No ratings yet
10th SA2 I Q
29 pages
Control Theory
No ratings yet
Control Theory
45 pages
Ex1 71sol
No ratings yet
Ex1 71sol
19 pages
Assignment 4
No ratings yet
Assignment 4
2 pages
Mathematical Reasoning: Solutions
No ratings yet
Mathematical Reasoning: Solutions
16 pages
53S21 Review Midterm 2 Solutions
No ratings yet
53S21 Review Midterm 2 Solutions
27 pages
Lecture 28: Directional Derivatives, Gradient, Tangent Plane
No ratings yet
Lecture 28: Directional Derivatives, Gradient, Tangent Plane
2 pages
Document Calculus II Chapter 1
No ratings yet
Document Calculus II Chapter 1
20 pages
Functions of Several Variables: Unconstrained Extrema: N K N N N H 0 N N
No ratings yet
Functions of Several Variables: Unconstrained Extrema: N K N N N H 0 N N
5 pages
Chapter 3 - Matrix
No ratings yet
Chapter 3 - Matrix
8 pages
Newton's Law
No ratings yet
Newton's Law
8 pages
Lecture 7
No ratings yet
Lecture 7
15 pages
6502 Multi 4
No ratings yet
6502 Multi 4
5 pages
Ade NPTL Notes
No ratings yet
Ade NPTL Notes
207 pages
Ross Elementary Analysis
No ratings yet
Ross Elementary Analysis
26 pages
Fall 2013 Math 290 Lecture 16
No ratings yet
Fall 2013 Math 290 Lecture 16
23 pages
Wa0011.
No ratings yet
Wa0011.
21 pages
Domain and Range of Rational Functions
No ratings yet
Domain and Range of Rational Functions
2 pages
Differentiation in Several Variables
No ratings yet
Differentiation in Several Variables
12 pages
Week 3 Partial Derivatives, Chain Rule, Directional Derivatives
No ratings yet
Week 3 Partial Derivatives, Chain Rule, Directional Derivatives
2 pages
Week2 - Common Procedure FEM
No ratings yet
Week2 - Common Procedure FEM
32 pages
Calculus 3.formulas
No ratings yet
Calculus 3.formulas
49 pages
Chapter 14
No ratings yet
Chapter 14
27 pages
EEE 1206 011 Direct Quadrate Modeling Induction Motor MatLab Simulink PDF
No ratings yet
EEE 1206 011 Direct Quadrate Modeling Induction Motor MatLab Simulink PDF
7 pages
Chapter 4: Matrices and Systems of Linear Equations
No ratings yet
Chapter 4: Matrices and Systems of Linear Equations
26 pages
MPM Libre PDF
No ratings yet
MPM Libre PDF
37 pages
Dynamical Systems: Supplementary Notes
No ratings yet
Dynamical Systems: Supplementary Notes
27 pages
063) AOD SJDDJ
No ratings yet
063) AOD SJDDJ
10 pages
3 × 3 Magic Squares With Duplicate Digits Allowed: 1 Problem
No ratings yet
3 × 3 Magic Squares With Duplicate Digits Allowed: 1 Problem
4 pages
Learning Hessian Matrix PDF
No ratings yet
Learning Hessian Matrix PDF
100 pages
EE 452 Digital Control Systems
No ratings yet
EE 452 Digital Control Systems
2 pages
Higher Order Linear Differential Equations: Math 240 - Calculus III
No ratings yet
Higher Order Linear Differential Equations: Math 240 - Calculus III
25 pages
Several Variable Differential Calculus: − ωt) represents
No ratings yet
Several Variable Differential Calculus: − ωt) represents
17 pages
MATHS
No ratings yet
MATHS
6 pages
AMA286 Supplementary Notes: September 3, 2008
No ratings yet
AMA286 Supplementary Notes: September 3, 2008
30 pages
Elijah's Math Notes
No ratings yet
Elijah's Math Notes
58 pages
Answers of The Exam of November 1, 2013 Differential Equations (wi2180LR)
No ratings yet
Answers of The Exam of November 1, 2013 Differential Equations (wi2180LR)
8 pages
SequnceSeries of Functions
No ratings yet
SequnceSeries of Functions
50 pages
MAT2322 Notes - by Eric Hua
No ratings yet
MAT2322 Notes - by Eric Hua
63 pages
MA2104 CheatSheet PDF
No ratings yet
MA2104 CheatSheet PDF
2 pages
Numerical Optimal Control Draft
No ratings yet
Numerical Optimal Control Draft
122 pages
Partial Derivatives
No ratings yet
Partial Derivatives
9 pages
Spivak
No ratings yet
Spivak
22 pages
Math 319 Notes, A.S
No ratings yet
Math 319 Notes, A.S
13 pages
KSR-Numerical Methods
No ratings yet
KSR-Numerical Methods
3 pages
Vector Calculus (1016-410-01) : Plan For The Day - Tuesday 12/4 1 Functions of Several Variables
No ratings yet
Vector Calculus (1016-410-01) : Plan For The Day - Tuesday 12/4 1 Functions of Several Variables
8 pages
First Periodic Test in Math 7: Pagalanggang National High School
No ratings yet
First Periodic Test in Math 7: Pagalanggang National High School
2 pages
Advance Cal Unit 1 2
No ratings yet
Advance Cal Unit 1 2
34 pages
Calculus of Variations: X X X X 0
No ratings yet
Calculus of Variations: X X X X 0
7 pages
z12 Partial Derivatives
No ratings yet
z12 Partial Derivatives
6 pages
International A Level Mathematics Pure Mathematics 3 Student Book Sample
No ratings yet
International A Level Mathematics Pure Mathematics 3 Student Book Sample
34 pages
Aops Community 2023 Aime
No ratings yet
Aops Community 2023 Aime
6 pages
Introduction To The Calculus of Variations
100% (1)
Introduction To The Calculus of Variations
12 pages
Elgenfunction Expansions Associated with Second Order Differential Equations
From Everand
Elgenfunction Expansions Associated with Second Order Differential Equations
E. C. Titchmarsh
No ratings yet
Theory of Approximation
From Everand
Theory of Approximation
N. I. Achieser
No ratings yet
Multiple Integrals, A Collection of Solved Problems
From Everand
Multiple Integrals, A Collection of Solved Problems
Steven Tan
No ratings yet
Lectures on Integral Equations
From Everand
Lectures on Integral Equations
Harold Widom
4.5/5 (2)
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet

Controltheory PDF

Uploaded by

Controltheory PDF

Uploaded by

LECTURES ON OPTIMAL CONTROL

In the theory of mathematical optimization one try to find maximum or

where x(t0 ) = x0 , x(t1 ) = x1 , t0 , t1 , x0 , x1 are given numbers and F is a

1 FUNCTIONS OF SEVERAL VARIABLES.

~e1 , ~e2 , . . . , ~en

be the vectors of the standard basis of Rn , that is,

~ei = (0, 0, . . . , 0, 1, 0, . . . , 0) (i = 1, 2, . . . , n),

if the limit exists.

Example 1. Let F (x1 , x2 ) = 7x21 x32 + x42 , ~x0 = (2, 1) Then

∇F (x1 , x2 ) = (14x1 x32 , 21x21 x22 + 4x32 ),

Level surfaces: Let c ∈ R, F : Rn → R. The set

is called the level surface (or the level hypersurface ) of F corresponding to

∇F (~x0 ) · (~x − ~x0 ) = 0.

In particular, ∇F ~x0 ) is perpendicular to the tangent plane at ~x0 , ∇F (~x0 ) ⊥

∇F (x1 , x2 ) = (2x1 x2 , x21 ) and ∇F (1, 2) = (4, 1).

The tangent plane at the point √1 (1, 1, 1) has the equation:

Directional derivatives. Assume that F : A ⊆ Rn → R. Let ~a ∈ Rn

if the limit exists.

lim F~a0 (~x + ~y ) = F~a0 (~x).

[~x, ~y ] = {t~x + (1 − t)~y : 0 ≤ t ≤ 1}

from ~x to ~y is contained in A. Then there exists a point w~ on the open line

F (~x) − F (~y ) = F~x0 −~y (w)

Proof. Set g(t) = F (t~x + (1 − t)~y ), t ∈ (0, 1). Then

Since F was continuously differentiable, it is clear that g 0 (t) is continuous

g(1) − g(0) = g 0 (θ)

(1) Fc~0a (~x) = cF~a0 (~x), for all c ∈ R and ~a ∈ Rn

||h(~a + ~b)|| < r and ||h~b|| < r.

(∗) F (~x + h(~a + ~b)) − F (~x)

(∗∗) F ((~x + h~b) + h~a) − F (~x + h~b) = Fh~

If we divide the equation in (∗∗) by h, then we find

since F is continuously differentiable. Consequently, dividing by h in (∗), it

We write ~a = ni=1 ai~ei . Then

F~a0 (~x) = ∇F (~x) · ~a,

|F~a0 (~x) − F~a0 (~x0 )| = |[∇F (~x) − ∇F (~x0 )] · ~a]|

is continuous at ~x0 for all ~a ∈ Rn . Hence it follows that F is continuously

for all ~x, ~y ∈ S, and all t ∈ [0, 1].

~z = t~x + (1 − t)~y = t(x1 , x2 ) + (1 − t)(y1 , y2 ) ∈ S.

||~z|| = ||t~x + (1 − t)~y ||

Hence ~z ∈ S. This shows that S is convex.

(∗∗) f (t~x + (1 − t)~y ) ≥ tf (~x) + (1 − t)f (~y ),

z = ∇f (~y ) · (~x − ~y ) + f (~y ),

h(t~x + (1 − t)~y ) = af (t~x + (1 − t)~y ) + bg(t~x + (1 − t)~y )

be a symmetric 2 × 2-matrix, and let

Definition 4. We say that Q is

so that Q is positive semidefinite.

Assume that a > 0 : As we have seen above,

a contradiction. Hence b = 0, and ac − b2 = 0 ≥ 0.

g(t) = f (~b + t(~a − ~b)) = f (t~a + (1 − t)~b).

g 0 (t) = ∇f (~r(t)) · ~r0 (t)

From the conditions in (a) the ”Hessematrix” H of f ,

is positive semidefinite in S. Accordingly g 00 (t) ≥ 0, for all t ∈ (0, 1), so that

f (t~a + (1 − t)~b) = g(t) = g(t · 1 + (1 − t) · 0)

Since ~a and ~b were arbitrary in S, we deduce that f is convex.

p(λt + (1 − λ)s)) = f (~x + λt~h + (1 − λ)s~h)

which proves the claim.

p00 (t) = ~ht H(~r(t))~h (~h ∈ R2 arbitrary).

As p00 (t) ≥ 0, it follows that H(~r(t)) is positive semidefinite, t ∈ I. In partic-

(2) f (~x) − f (~z) ≤ ∇f (~z) · (~x − ~z)

Multiplying the inequality in (2) by t and the inequality in (3) by 1 − t, and

(4) t(f (~x) − f (~z)) + (1 − t)(f (~y ) − f (~z))

Here the right side of (4) equals ~0, since

t(~x − ~z) + (1 − t)(~y − ~z) = t~x + (1 − t)~y − t~z − ~z + t~z

Rearranging the inequality (4) we hence find,

tf (~x) + (1 − t)f (~y ) ≤ f (~z) + (1 − t)f (~z)

which shows that f is convex.

(a) if f is concave, then ~x∗ is a local maximum point for f

(b) if f is convex, is ~x∗ a local minimum point for f

f (~x) − f (~x∗ ) ≤ ∇f (~x∗ ) · (~x − ~x∗ ) = 0,

hence f (~x) ≤ f (~x∗ ), for all ~x ∈ S.

U (t~x + (1 − t)~y ) = G(f (t~x + (1 − t)~y )

If a function x∗ maximizes or minimizes the integral in (∗) among all C 2 −functions

The general solution is

x(t) = et+1 − e1−t

µ ≥ 0, µ(s − ) = µ(s + ) = 0, µ > 0/on (s − , s + )

and µ = 0 outside the interval (s − , s + ). We may for instance let

assume that x∗ is a C 2 −function that solves (1). Let µ be an arbitrary

µ ≥ 0, µ(s − ) = µ(s + ) = 0, µ > 0/on (s − , s + )

and µ = 0 outside the interval (s − , s + ). We may for instance let

J(x∗ ) ≥ J(x + µ)