
Notes for ENEE 664: Optimal Control

André L. Tits

September 2024

Copyright ©1993–2024, André L. Tits. All Rights Reserved


Chapter 1

Motivation and Scope

1.1 Some Examples


We give some examples of design problems in engineering that can be formulated as mathematical optimization problems. Although we emphasize here engineering design, optimization is widely used in other fields such as economics or operations research. Such examples can be found, e.g., in [?].

Example 1.1 Design of an operational amplifier (opamp)


Suppose the following features (specifications) are desired:
1. a large gain-bandwidth product

2. sufficient stability

3. low power dissipation


In this course, we deal with parametric optimization. This means, for this example, that we assume the topology of the circuit has already been chosen, the only freedom left being the choice of the values of a number of “design parameters” (resistors, capacitors, various transistor parameters). In the real world, once the parametric optimization has been performed, the designer will possibly decide to modify the topology of the circuit, hoping to achieve better performance. Another parametric optimization is then performed. This loop may be repeated many times.
To formulate the opamp design problem as an optimization problem, one has to specify one (possibly several) objective function(s) and various constraints. We settle on the following goal:
minimize the power dissipated
subject to gain-bandwidth product ≥ M1 (given)
frequency response ≤ M2 at all frequencies.
The last constraint will prevent too high a “peaking” in the frequency response, thereby ensuring a sufficient closed-loop stability margin. We now denote by x the vector of design
parameters
x = (R1 , R2 , . . . , C1 , C2 , . . . , αi , . . . ) ∈ Rn


For any given x, the circuit is now entirely specified and the various quantities mentioned
above can be computed. More precisely, we can define
P (x) = power dissipated
GB(x) = gain-bandwidth product
F R(x, ω) = frequency response, as a function of the frequency ω.
We then write the optimization problem as

min{P (x)|GB(x) ≥ M1 , F R(x, ω) ≤ M2 ∀ω ∈ Ω} (1.1)

where Ω = [ω1, ω2] is a range of “critical frequencies.” To obtain a canonical form, we now define

f (x) := P (x) (1.2)


g(x) := M1 − GB(x) (1.3)
φ(x, ω) := F R(x, ω) − M2 (1.4)

and we obtain

min{f (x)|g(x) ≤ 0, φ(x, ω) ≤ 0 ∀ω ∈ Ω} (1.5)

Note. We will systematically use notations such as

min{f (x) : g(x) ≤ 0}

not to just indicate the minimum value, but rather as a short-hand for

minimize f (x)
subject to g(x) ≤ 0

More generally, one would have

min{f (x)|g i (x) ≤ 0, i = 1, 2, . . . , m, φi (x, ω) ≤ 0, ∀ω ∈ Ωi , i = 1, . . . , k}. (1.6)

If we define g : Rn → Rm by

g(x) := (g1(x), . . . , gm(x))T    (1.7)

and, assuming that all the Ωi’s are identical, if we define φ : Rn × Ω → Rk by

φ(x, ω) := (φ1(x, ω), . . . , φk(x, ω))T    (1.8)

we obtain again

min{f (x)|g(x) ≤ 0, φ(x, ω) ≤ 0 ∀ω ∈ Ω} (1.9)


[This is called a semi-infinite optimization problem: finitely many variables, infinitely many
constraints.]
Note. If we define

ψi(x) = sup{φi(x, ω) : ω ∈ Ω}, i = 1, . . . , k,    (1.10)

(1.9) is equivalent to

min{f(x) | g(x) ≤ 0, ψ(x) ≤ 0}    (1.11)

(more precisely, {x | φ(x, ω) ≤ 0 ∀ω ∈ Ω} = {x | ψ(x) ≤ 0}). Further, ψ(x) can then be absorbed into g(x).
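As an illustration of this transformation, here is a minimal numerical sketch (in Python; the instance is hypothetical) in which the ‘sup’ in (1.10) is approximated by a maximum over a finite grid on Ω; caveats (ii) and (iii) below apply (ψ is only approximately evaluated, and it is nonsmooth).

import numpy as np
from scipy.optimize import minimize

# Hypothetical instance (illustration only): phi(x, w) <= 0 for all w in
# Omega = [0, 2*pi] is equivalent here to ||x||_2 <= 1.
f = lambda x: (x[0] - 2.0)**2 + x[1]**2
phi = lambda x, w: np.sin(w)*x[0] + np.cos(w)*x[1] - 1.0

omega = np.linspace(0.0, 2*np.pi, 401)   # finite surrogate for Omega
psi = lambda x: np.max(phi(x, omega))    # approximates (1.10)

# (1.11): min{f(x) : psi(x) <= 0}; SLSQP expects constraints as fun(x) >= 0.
res = minimize(f, np.array([0.0, 0.0]), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -psi(x)}])
print(res.x)   # near (1, 0), where the constraint is active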

Exercise 1.1 Prove the equivalence between (1.9) and (1.11). (To prove A = B, prove
A ⊆ B and B ⊆ A.)

Such transformation may not be advisable, for the following reasons:

(i) some potentially useful information (e.g., what are the ‘critical’ values of ω) is lost
when replacing (1.9) by (1.11)

(ii) for given x, ψ(x) may not be computable exactly in finite time (this computation
involves another optimization problem)

(iii) ψ may not be smooth even when φ is. E.g., with x ∈ R, Ω = {−1, 1}, φ(x, −1) = −x, and φ(x, 1) = x, one gets ψ(x) = |x|, which is nonsmooth at x = 0; or, more generally, with x, ω ∈ Rn, Ω = {ω : ‖ω‖2 = 1}, and φ(x, ω) = ωT x, one gets ψ(x) = ‖x‖2 (2nd-order cone).

Exercise 1.2 Suppose that φ : Rn × Ω → R is continuous and that Ω is compact, so that the ‘sup’ in (1.10) can be written as a ‘max’.

(a) Show that ψ is continuous.

(b) By exhibiting a counterexample, show that there might not exist a continuous function
ω(·) such that, for all x, ψ(x) = φ(x, ω(x)).

Exercise 1.3 Again referring to (1.10), show, by exhibiting counterexamples, that continuity
of ψ is no longer guaranteed if either (i) Ω is compact but φ is merely continuous in each
variable separately or (ii) φ is jointly continuous but Ω is not compact, even when the “sup”
in (1.10) is attained for all x.

On the other hand, smoothness of φ and compactness of Ω do not guarantee differentiability of ψ.

Exercise 1.4 Referring still to (1.10), exhibit an example where φ ∈ C∞ (all derivatives exist and are continuous), where Ω is compact, but where ψ is not everywhere differentiable.


In this course, we will mostly limit ourselves to classical (non semi-infinite) problems
(and will generally assume continuous differentiability), i.e., to problems of the form

min{f 0 (x) : f (x) ≤ 0, g(x) = 0} (1.12)

where f0 : Rn → R, f : Rn → Rm, g : Rn → Rℓ, for some positive integers n, m and ℓ, are continuously differentiable.

Remark 1.1 To fit the opamp design problem into formulation (1.11) we had to pick one
of the design specifications as objective (to be minimized). Intuitively more appealing would
be some kind of multiobjective optimization problem.

Remark 1.2 Problem (1.12) is broader than it may seem at first sight. For instance, it includes 0/1 variables (in the scalar case, take g(x) := x(x − 1)) and integer variables (in the scalar case take, e.g., g(x) := sin(πx)).

Example 1.2 Design of a p.i.d. controller (proportional-integral-derivative)


The scalar plant G(s) is to be controlled by a p.i.d. controller (see Figure 1.1). Again, the
structure of the controller has already been chosen; only the values of three parameters have
to be determined (x = [x1 , x2 , x3 ]T ).

Figure 1.1: Feedback loop: the p.i.d. controller x1 + x2 s + x3/s acts on the error E(x, s) = R(s) − Y (x, s) and drives the plant G(s), producing the output Y (x, s); T (x, s) denotes the loop transfer function.

Suppose the specifications are as follows

– low value of the ISE for a step input (ISE = integral of the square of the difference
(error) between input and output, in time domain)

– “enough stability”

– short rise time, settling time, low overshoot

We decide to minimize the ISE, while keeping the Nyquist plot of T (x, s) outside some
forbidden region (see Figure 1.2) and keeping rise time, settling time, and overshoot under
given values. The following constraints are also specified.

−10 ≤ x1 ≤ 10, −10 ≤ x2 ≤ 10, 0.1 ≤ x3 ≤ 10

Exercise 1.5 Put the p.i.d. problem in the form (1.6), i.e., specify f , g i , φi , Ωi .


Figure 1.2: Left: forbidden region in the Nyquist plane, bounded by the parabola y = (a + 1)x² − b and containing the point (−1, 0), with the plot of T (x, jω) for ω from 0 to ω̄. Right: step-response envelope: bounds l(t) and u(t) on the response y(t), with rise time Tr, settling time Ts, and horizon T.

T (x, jω) has to stay outside the forbidden region ∀ω ∈ [0, ω̄]

For a step input, y(x, t) is desired to remain between l(t) and u(t) for t ∈ [0, T ]

Example 1.3 Consider again a plant, possibly nonlinear and time varying, and suppose we
want to determine the best control u(t) to approach a desired response.

ẋ = F (x, u, t)
y = G(x, t)

We may want to determine u(·) to minimize the integral

J(u) = ∫_{0}^{T} (yu(t) − v(t))² dt

where yu (t) is the output corresponding to control u(·) and v(·) is some reference signal.
Various features may have to be taken into account:

– Constraints on u(·) (for realizability)

– piecewise continuous
– |u(t)| ≤ umax ∀t

– T may be finite or infinite

– x(0), x(T ) may be free, fixed, constrained

– The entire state trajectory may be constrained (x(·)), e.g., to keep the temperature
reasonable

– One may require a “closed-loop” control, e.g., u(t) = u(x(t)). It is well known that
such ‘feedback’ control systems are much less sensitive to perturbations and modeling
errors.


Unlike Example 1.1 and Example 1.2, Example 1.3 is an ‘optimal control’ problem. Whereas discrete-time optimal control problems can be solved by classical optimization techniques, continuous-time problems involve optimization in infinite-dimensional spaces (a complete ‘waveform’ has to be determined).

1.2 Scope of the Course


To conclude this chapter we now introduce the class of problems that will be studied in this
course. Consider the abstract optimization problem

(P) min{f (x) | x ∈ S}

where S is a subset of a vector space X and where f : X → R is the cost or objective function. S is the feasible set. Any x in S is a feasible point.

Definition 1.1 A point x̂ is called a (strict) global minimizer for (P ) if x̂ ∈ S and

f (x̂) ≤ f (x) ∀x ∈ S
(<) (∀x ∈ S, x ≠ x̂)

Assume now X is equipped with a norm.

Definition 1.2 A point x̂ is called a (strict) local minimizer for (P) if x̂ ∈ S and ∃ ǫ > 0
such that

f (x̂) ≤ f (x) ∀x ∈ S ∩ B(x̂, ǫ)


(<) (∀x ∈ S ∩ B(x̂, ǫ), x ≠ x̂)

Notation.
It often helps to distinguish the scalar 0 from the origin of Rn or of a more general vector space (see Appendix A). We will usually denote the latter by θ, sometimes specialized to θn in the case of Rn or to θV in the case of a vector space V .

Scope
1. Type of optimization problems considered

(i) Finite-dimensional
unconstrained
equality constrained
inequality [and equality] constrained


linear, quadratic programs, convex problems


multiobjective problems
discrete optimal control

(ii) Infinite-dimensional
calculus of variations (no “control” signal) (old: 1800)
optimal control (new: 1950’s)

Note: most types in (i) can be present in (ii) as well.

2. Results sought

Essentially, solve the problem. The steps are

– conditions of optimality (“simpler” characterization of solutions)


– numerical methods: solve the problem, generally by solving some optimality con-
dition or, at least, using the insight such conditions provide.
– sensitivity: how “good” is the solution, in the sense of “what if we didn’t solve
exactly the right problem?”
(– duality: some transformation of the original problem into a hopefully simpler
optimization problem)



Chapter 2

Linear Optimal Control: Some Readily Solvable Instances

References: [?, ?, ?, ?].


At the outset, we consider linear time–varying models. A motivation for not starting
with the time-invariant case is that while, in a finite-horizon context (which is the simpler
situation as far as optimal control is concerned), allowing for time–varying models hardly
complicates the analysis, linear time–varying models are of much practical importance, e.g.,
in the context of trajectory tracking for nonlinear (even when time-invariant) systems.
Indeed, given a nominal control signal û and corresponding state trajectory x̂, a typical approach to synthesizing close tracking of the trajectory is to first substitute a linear model of the system, obtained by linearizing the original system around that trajectory, and then focus on keeping the state of the linearization close to the origin. E.g., given

ẋ(t) = f (x(t), u(t))

(with appropriate regularity assumptions on f ) and a “nominal” associated trajectory (û, x̂), linearization about this trajectory gives rise to the linear time-varying system

x̃˙ (t) = A(t)x̃(t) + B(t)ũ(t),

with A(t) := (∂f /∂x)(x̂(t), û(t)) and B(t) := (∂f /∂u)(x̂(t), û(t)).
Thus, consider the linear control system
ẋ(t) = A(t)x(t) + B(t)u(t), x(t0 ) = x0 (2.1)
where x(t) ∈ Rn , u(t) ∈ Rm for all t, and A(·) and B(·) are matrix-valued functions. Suppose
A(·), B(·) and u(·) are continuous. Then (2.1) has the unique solution

x(t) = Φ(t, t0)x0 + ∫_{t0}^{t} Φ(t, σ)B(σ)u(σ) dσ    (2.2)

where the state transition matrix Φ satisfies the homogeneous differential equation

(∂/∂t) Φ(t, t0) = A(t)Φ(t, t0)


with initial condition


Φ(t0 , t0 ) = I.
Further, for any t1 , t2 , the transition matrix Φ(t1 , t2 ) is invertible and

Φ(t1 , t2 )−1 = Φ(t2 , t1 ).
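Numerically, Φ(·, t0) is easy to obtain by integrating this matrix ODE. A minimal sketch (Python; the A(·) below is a hypothetical example), which also checks the inverse property above:

import numpy as np
from scipy.integrate import solve_ivp

A = lambda t: np.array([[0.0, 1.0], [-2.0 - np.sin(t), -0.5]])  # hypothetical A(t)

def phi(t, t0, n=2):
    # Integrate (d/ds) Phi(s, t0) = A(s) Phi(s, t0), Phi(t0, t0) = I, up to s = t.
    sol = solve_ivp(lambda s, m: (A(s) @ m.reshape(n, n)).ravel(),
                    (t0, t), np.eye(n).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(n, n)

# Check Phi(t1, t2)^{-1} = Phi(t2, t1):
print(np.allclose(np.linalg.inv(phi(1.0, 0.0)), phi(0.0, 1.0), atol=1e-6))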

2.1 Free terminal state, unconstrained, quadratic cost: linear quadratic regulator (LQR)
A quadratic cost function of a function (i.e., a quadratic functional) takes the form of an integral plus, possibly, terms associated with “important” time points, typically the “terminal time”. Such a quadratic cost might be obtained from a second-order expansion of the “true” cost function around the desired trajectory.

2.1.1 Finite horizon


For simplicity, we start with the case in which the terminal state is free and no constraints (apart from the dynamics and the initial condition) are imposed on the control and state values.
Consider the optimal control problem (see Exercise 2.3 below for a more general quadratic
cost function)
minimize J(u) := (1/2) ∫_{t0}^{tf} (x(t)T L(t)x(t) + u(t)T u(t)) dt + (1/2) x(tf)T Qx(tf)    (P)
subject to ẋ(t) = A(t)x(t) + B(t)u(t) ∀t ∈ [t0, tf],    (2.3)
x(t0) = x0, u ∈ C,    (2.4)

where x(t) ∈ Rn, u(t) ∈ Rm and A(·), B(·) and L(·) are matrix-valued functions, and C denotes the set of continuous mappings; minimization is with respect to u and x. (Equivalently, x can be viewed as a function xu of u defined by the dynamics and initial condition, so the only constraint is continuity of u.) The initial and final times t0 and tf are given, as is the initial state x0. The mappings A(·), B(·), and L(·), defined on the domain [t0, tf], are assumed to be continuous. Without loss of generality, L(t) (for all t) and Q are assumed symmetric.
Remark 2.1 Clearly, inclusion of the terminal cost in (P) could equivalently be achieved
by replacing L(t) with L(t) + δ(t − tf )Q, where δ is the Dirac impulse. In these notes though,
we rule out such impulses (controls u are functions, not distributions).
The problem just stated is, in a sense, the simplest meaningful continuous-time optimal
control problem. Indeed, the cost function is quadratic and the dynamics linear, and there
are no constraints on u (except for continuity). While a linear cost function may be even
simpler than a quadratic one, in the absence of (implicit or explicit) constraints on the
control, such problem would have no solution (except for the trivial situation where the cost
function is constant, independent of u and x).


In fact, the problem is simple enough that it can be solved without much advanced mathematical machinery, by simply “completing the square”. Indeed, if (i) the integrand in (P) were a perfect square and (ii) the given Q were the zero matrix, then the minimum value of J would be attained with u selected (perhaps as a function of x) in such a way that the integrand vanishes for all t.
Accordingly, the thought arises of first modifying the integrand to make it a perfect square, e.g., by adding to it an appropriate quantity, say φ(x(t), u(t)). Of course, doing so adds to J(u) a quantity (the integral of φ), which modifies the control problem to be solved, unless φ can be chosen in such a way that its integral does not depend on u or x. The following fundamental lemma provides a first step toward resolving this conundrum. There, the integral of φ depends on (u, x) only via x(t0) and x(tf), which, as we will see, can be taken care of by making a proper choice of the matrix K(·) involved in the lemma.

Lemma 2.1 (Fundamental Lemma/Path Independence Lemma) Let A(·), B(·) be continuous matrix-valued functions and K(·) = KT(·) be a matrix-valued function with continuously differentiable entries (in fact, absolute continuity is sufficient) on [t0, tf]. Then, if x(t) and u(t) are related by
ẋ(t) = A(t)x(t) + B(t)u(t) ∀t, (2.5)
it holds that
x(tf)T K(tf)x(tf) − x(t0)T K(t0)x(t0) = ∫_{t0}^{tf} φ(x(t), u(t), K(t), K̇(t)) dt    (2.6)

where
φ(x(t), u(t), K(t), K̇(t)) = x(t)T (K̇(t) + AT (t)K(t) + K(t)A(t))x(t) + 2x(t)T K(t)B(t)u(t).

Proof. Because x(·)T K(·)x(·) is continuously differentiable, the Fundamental Theorem of Calculus yields

x(tf)T K(tf)x(tf) − x(t0)T K(t0)x(t0) = ∫_{t0}^{tf} (d/dt)(x(t)T K(t)x(t)) dt
= ∫_{t0}^{tf} (ẋ(t)T K(t)x(t) + x(t)T K̇(t)x(t) + x(t)T K(t)ẋ(t)) dt

and the claim follows if one substitutes for ẋ(t) the right hand side of (2.5).

Importantly, in Lemma 2.1, the matrix K(·) can be freely selected (subject to symmetry and continuous differentiability) for the purpose at hand. This is key to invoking it toward solving problem (P), as we now show.
Selecting K(tf ) to be equal to Q and invoking (2.6) yields, since x(t0 ) = x0 ,
x(tf)T Qx(tf) = x0T K(t0)x0 + ∫_{t0}^{tf} φ(x(t), u(t), K(t), K̇(t)) dt


which, substituted in (P), yields

2J(u) = x0T K(t0)x0 + ∫_{t0}^{tf} (φ(x(t), u(t), K(t), K̇(t)) + x(t)T L(t)x(t) + u(t)T u(t)) dt.

Since K(·) does not depend on the control u or the state x, the first term in the right-hand side is constant, and it remains to select K(·) in such a way that the integrand is a perfect square. Expanding φ we get

2J(u) = ∫_{t0}^{tf} (xT(K̇ + ATK + KA + L)x + 2xTKBu + uTu) dt + x0T K(t0)x0,

where we have removed the explicit dependence on t for notational compactness. Now recall
that the above holds independently of the choice of K(·). If we select it to satisfy (assuming
a solution exists on [t0 , tf ]!) the differential equation

K̇(t) = −AT (t)K(t) − K(t)A(t) − L(t) + K(t)B(t)B(t)T K(t) (2.7)

the above simplifies to

2J(u) = ∫_{t0}^{tf} (x(t)T K(t)B(t)B(t)TK(t)x(t) + 2x(t)T K(t)B(t)u(t) + u(t)T u(t)) dt + x0T K(t0)x0,

i.e.,

J(u) = (1/2) ∫_{t0}^{tf} ‖B(t)T K(t)x(t) + u(t)‖₂² dt + (1/2) x0T K(t0)x0.    (2.8)

Note that, because K(·) is selected independently of u, the last term in (2.8) does not depend
on u.
We have completed the square! Equation (2.7) is an instance of a differential Riccati
equation (DRE) (after Jacopo F. Riccati, Italian mathematician, 1676–1754). Its right-hand
side is quadratic in the unknown K. We postpone the discussion of existence and uniqueness
of a solution to this equation and of whether such solution is symmetric (as required for the
Fundamental Lemma to hold). The following scalar example shows that this is an issue
indeed.

Example 2.1 (scalar Riccati equation) Consider the case of scalar, time-independent values
a = 0, b = 1, l = −1, q = 0, corresponding to the optimal control problem
minimize ∫_{t0}^{tf} (u(t)² − x(t)²) dt s.t. ẋ(t) = u(t), t ∈ [t0, tf), u ∈ C.

The corresponding Riccati equation is

k̇(t) = 1 + k(t)2 , k(tf ) = 0

We get
atan(k(t)) − atan(k(tf )) = t − tf ,


yielding

k(t) = tan(t − tf),

with a finite escape time at t̂ = tf − π/2. (In fact, if t0 < tf − π/2, even if we “forget” about the singularity at t̂, k(t) = tan(t − tf) is not the integral of its derivative (as would be required by the Fundamental Lemma): k̇(t) is positive everywhere, while k(t) goes from positive values just before t̂ to negative values after t̂.) It is readily verified that this optimal control problem itself has no solution when, e.g., x0 = 0 and tf is too large. Indeed, with x0 = 0, controls of the form u(t) = αt yield

J(u) = α² ∫_{0}^{tf} t²(1 − t²/4) dt,

which is negative for tf > 10 and hence can be made arbitrarily large and negative by letting α → ∞.
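The escape time is easy to observe numerically. A minimal sketch (Python; tf is an arbitrary choice): integrating k̇ = 1 + k² backward from k(tf) = 0, the solver grinds to a halt as t approaches tf − π/2, with k(t) → −∞, in agreement with k(t) = tan(t − tf).

import numpy as np
from scipy.integrate import solve_ivp

tf = 5.0
# Backward integration of kdot = 1 + k^2 from k(tf) = 0.
sol = solve_ivp(lambda t, k: 1.0 + k**2, (tf, 0.0), [0.0], rtol=1e-8)
print(sol.status, sol.t[-1])   # solver gives up near the escape time...
print(tf - np.pi/2)            # ...which is tf - pi/2 ~= 3.4292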

For the time being, we assume that (2.7) with terminal condition K(tf) = Q has a unique solution on [t0, tf], and we denote its value at time t by

K(t) = Π(t, Q, tf),

so that Π(t, Q, tf) satisfies the DRE

(∂/∂t) Π(t, Q, tf) = −AT(t)Π(t, Q, tf) − Π(t, Q, tf)A(t) − L(t) + Π(t, Q, tf)B(t)B(t)T Π(t, Q, tf) ∀t,

with Π(tf, Q, tf) = Q. (Note that such a solution Π(·, Q, tf) must then be continuous on [t0, tf], indeed continuously differentiable on [t0, tf], since the (continuous) right-hand side is its derivative for all t ∈ [t0, tf].) Accordingly, since u is unconstrained, it would seem from (2.8)
that J is minimized by the choice

u(t) = −B(t)T Π(t, Q, tf )x(t), (2.9)

and that its optimal value is (1/2) x0T Π(t0, Q, tf)x0. Equation (2.9) is a feedback law, which specifies a control signal in closed-loop form.

Exercise 2.1 Show that this feedback law yields a well-defined, continuous control signal û.
(Hint. First solve for x, then obtain û.)
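The development above also suggests a direct numerical recipe for (P): integrate DRE (2.7) backward from K(tf) = Q to obtain Π(·, Q, tf), then simulate the closed loop (2.9). A minimal sketch (Python), with hypothetical time-invariant data; any continuous A(·), B(·), L(·) fits the same mold, and with L, Q ⪰ 0 as here, Theorem 2.1 below guarantees no finite escape in the backward pass.

import numpy as np
from scipy.integrate import solve_ivp

n, m = 2, 1
A = np.array([[0.0, 1.0], [0.0, 0.0]])       # hypothetical double integrator
B = np.array([[0.0], [1.0]])
L = np.eye(n); Q = np.eye(n)
t0, tf = 0.0, 5.0

def dre(t, k):                                # right-hand side of (2.7)
    K = k.reshape(n, n)
    return (-A.T @ K - K @ A - L + K @ B @ B.T @ K).ravel()

# Backward pass: K(tf) = Q; dense output gives Pi(t, Q, tf) at any t.
back = solve_ivp(dre, (tf, t0), Q.ravel(), dense_output=True, rtol=1e-8)
Pi = lambda t: back.sol(t).reshape(n, n)

# Forward pass: closed loop xdot = (A - B B^T Pi(t)) x, per (2.9).
x0 = np.array([1.0, 0.0])
fwd = solve_ivp(lambda t, x: (A - B @ B.T @ Pi(t)) @ x, (t0, tf), x0, rtol=1e-8)
print(fwd.y[:, -1])                           # state at tf
print(0.5 * x0 @ Pi(t0) @ x0)                 # optimal cost, per the text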

Remark 2.2 By “closed-loop” it is meant that the right-hand side of (2.9) does not depend on the initial state x0 nor on the initial time t0, but only on the current state and time. Such formulations are of major practical importance: If, for whatever reason (modeling errors, perturbations), the state at some time τ ∈ [t0, tf] is not what it was predicted to be (when the optimal control u∗(·) was computed, at time t0), the control generated by (2.9) is still optimal over [τ, tf] (for the same cost function but with the integral starting at time τ), assuming no modeling errors or perturbations between times τ and tf. This is of course not so for the open-loop optimal control obtained for the original problem, with starting time t0.


Remark 2.3 It follows from the above that existence of a solution to the DRE over [t0, tf] is a sufficient condition for existence of a solution to optimal control problem (P) for every x0 ∈ Rn. (Of course, existence of a solution to (P) is not guaranteed at the outset; this was seen, e.g., in Example 2.1 above.)

Let us now use a ∗ superscript to denote optimality, so (2.9) becomes

u∗ (t) = −B(t)T Π(t, Q, tf )x∗ (t), (2.10)

where x∗ is the “optimal trajectory”, i.e., the trajectory generated by the optimal control.
As noted above, the optimal value is given by

V (t0, x0) := J(u∗) = (1/2) x0T Π(t0, Q, tf)x0.
V is known as the value function. Now suppose that, starting from x0 at time t0 , perhaps after
having undergone perturbations, the state reaches x(τ ) at time τ ∈ (t0 , tf ). The remaining
portion of the minimal cost, to be incurred over [τ, tf], is the minimum, over u ∈ U, subject to ẋ = Ax + Bu with x(τ) fixed, of

Jτ(u) := (1/2) ∫_{τ}^{tf} (x(t)T L(t)x(t) + u(t)T u(t)) dt + (1/2) x(tf)T Qx(tf)    (2.11)
       = (1/2) ∫_{τ}^{tf} ‖B(t)T Π(t, Q, tf)x(t) + u(t)‖₂² dt + (1/2) x(τ)T Π(τ, Q, tf)x(τ),    (2.12)

where we simply have replaced, in (2.8), t0 with τ and x0 with x(τ). The cost-to-go is

Jτ(u∗) = (1/2) x(τ)T Π(τ, Q, tf)x(τ).

Hence, the cost-to-go from an arbitrary time t < tf and state ξ ∈ Rn is

V (t, ξ) = Jt(u∗) = (1/2) ξT Π(t, Q, tf)ξ.    (2.13)

Remark 2.4 We have not made any positive definiteness (or semi-definiteness) assumption on L(t) or Q. The key assumption we have made is that the stated Riccati equation has a solution Π(t, Q, tf) over [t0, tf]. Below we investigate conditions (in particular, on L and Q) which insure that this is the case. At this point, note that, if L(t) ⪰ 0 for all t and Q ⪰ 0, then J(u) ≥ 0 for all u ∈ C, and expression (2.13) of the cost-to-go implies that Π(t, Q, tf) ⪰ 0 whenever it exists.

Remark 2.5 Note that (DRE) involves neither t0 nor x0. In particular, when a solution Π(·, Q, tf) exists on [t0, tf], it yields optimal controls for every problem Pτ,ξ with the same A(·), B(·), L(·), Q and tf as in (P), but with initial state ξ ∈ Rn and initial time τ ∈ [t0, tf]. In particular the value function V : (τ, ξ) ↦ V (τ, ξ) is the same for every problem Pτ,ξ, as long as (DRE) has a solution on [τ, tf]; in particular, this holds for every (τ, ξ) such that τ ∈ [t0, tf]. The notation Pτ,ξ will be used in the remainder of this chapter.


Returning to the question of existence/uniqueness of the solution to the differential Riccati equation, first note that the right-hand side of (2.7) is locally Lipschitz-continuous (indeed, continuously differentiable) in K over Rn×n. This, together with continuity of A, B, and L, implies that, for any given Q, there exists τ < tf such that a (continuously differentiable) solution exists and is unique on [τ, tf]. The alternative to existence of a unique solution on (−∞, tf] is existence of a finite escape time. Indeed, the following holds.
Fact. (See, e.g., [?, Chapter 1, Theorems 2.1 & 3.1].) Let ϕ : Rn × R → Rn be continuous,
and Lipschitz-continuous in its first argument. Then for every x0 ∈ Rn and t0 ∈ R, there exist t1, t2 ∈ R, with t0 ∈ (t1, t2), such that the differential equation ẋ(t) = ϕ(x(t), t), with
x(t0 ) = x0 , has a (continuously differentiable) solution x(t) in (t1 , t2 ). Furthermore, this
solution is unique. Finally, suppose there exists a compact set S ⊂ Rn , with x0 ∈ S, that
enjoys the following property: For every t1 , t2 such that t0 ∈ (t1 , t2 ) and the solution x(t)
exists for all t ∈ (t1 , t2 ), x(t) belongs to S for all t ∈ (t1 , t2 ). Then the solution x(t) exists
and is unique (and continuously differentiable) for all t ∈ R.

Lemma 2.2 Let t̂ = inf{τ : Π(t, Q, tf) exists ∀t ∈ [τ, tf]}. If t̂ is finite, then ‖Π(·, Q, tf)‖ is unbounded on (t̂, tf].

Proof. Let

ϕ(K, t) = −A(t)TK − KA(t) + KB(t)BT(t)K − L(t),

so that (2.7) can be written

K̇(t) = ϕ(K(t), t), K(tf) = Q,

where ϕ is continuous and is Lipschitz-continuous in its first argument on every bounded set. Proceeding by contradiction, suppose Π(·, Q, tf) is bounded on (t̂, tf], and let S be a compact set containing {Π(t, Q, tf) : t ∈ (t̂, tf]}. The claim is then an immediate consequence of the previous Fact.

In other words, either Π(t, Q, tf) exists and is unique over (−∞, tf], or there exists a finite t̂ < tf such that Π(t, Q, tf) is unbounded on (t̂, tf] (finite escape time). Further, since clearly the transpose of a solution to the Riccati equation is also a solution, this unique solution must be symmetric, as required in the path-independence lemma.

Remark 2.6 It follows that, without any further conditions, if t0 is close enough to tf , the
optimal control problem has a (unique) solution.

Next, note that if L(t) ⪰ 0 for all t, and Q ⪰ 0, then J(u) ≥ 0 for all u ∈ C. Hence, in such case, as long as an optimal control exists,

V (t, ξ) ≥ 0 ∀t ≤ tf, ξ ∈ Rn.

It then follows from (2.13) that Π(τ, Q, tf) is positive semidefinite for every τ such that Π(t, Q, tf) exists for all t ∈ [τ, tf].

Theorem 2.1 Suppose L(t) ⪰ 0 for all t and Q ⪰ 0. Then Π(t, Q, tf) exists ∀t ≤ tf, i.e., t̂ = −∞.


Proof. Again, let t̂ = inf{τ : Π(t, Q, tf) exists ∀t ∈ [τ, tf]}, so that Π(·, Q, tf) exists on (t̂, tf]. Below, we show that ‖Π(·, Q, tf)‖ is bounded by a continuous function over (t̂, tf]. If t̂ were finite, this function would be bounded on (t̂, tf], hence so would be ‖Π(·, Q, tf)‖, contradicting Lemma 2.2; hence t̂ = −∞ and the claim follows.
Thus, let τ ∈ (t̂, tf]. For any ξ ∈ Rn, using the positive semi-definiteness assumption on L(t) and Q, we have

ξT Π(τ, Q, tf)ξ = 2V (τ, ξ) = min_{u∈C} { ∫_{τ}^{tf} (u(t)T u(t) + x(t)T L(t)x(t)) dt + x(tf)T Qx(tf) } ≥ 0,    (2.14)

where x(t) satisfies (2.3) with initial condition x(τ) = ξ. We show that there exists a continuous function F : (−∞, tf] → Rn×n such that, if Π(·, Q, tf) exists (and is unique and continuously differentiable) on [τ, tf], then, for all ξ ∈ Rn,

ξT Π(τ, Q, tf)ξ ≤ ξT F(τ)ξ,    (2.15)

implying, in view of Exercise 2.2 below and since Π(τ, Q, tf) ⪰ 0, that

‖Π(τ, Q, tf)‖2 ≤ ‖F(τ)‖2 ∀τ ∈ (t̂, tf],
as claimed. To conclude the proof, we construct such an F. From (2.14), we have, for every u ∈ C,

ξT Π(τ, Q, tf)ξ ≤ ∫_{τ}^{tf} (u(t)T u(t) + x(t)T L(t)x(t)) dt + x(tf)T Qx(tf).    (2.16)

To obtain an upper bound quadratic in ξ, we let û be identically zero. The corresponding x̂ then satisfies ẋ = Ax, so that x̂(t) = ΦA(t, τ)ξ for all t. This yields the upper bound (2.15) with

F(τ) := ∫_{τ}^{tf} ΦA(t, τ)T L(t)ΦA(t, τ) dt + ΦA(tf, τ)T QΦA(tf, τ).

Exercise 2.2 Prove that, if A = AT and F = FT, and 0 ≤ xTAx ≤ xTFx for all x, then ‖A‖2 ≤ ‖F‖2, where ‖ · ‖2 denotes the spectral norm of the matrix argument. (I.e., ‖A‖2 = max{‖Ax‖2 : ‖x‖2 = 1}.)
Thus, when L(t) is positive semidefinite for all t and Q is positive semidefinite, our problem has a unique optimal control, given by (2.10).
Exercise 2.3 Investigate the case of the more general cost function

J(u) = ∫_{t0}^{tf} ((1/2) x(t)T L(t)x(t) + u(t)T S(t)x(t) + (1/2) u(t)T R(t)u(t)) dt,

where L, S and R are continuous on [t0 , tf ], L(t) and R(t) are symmetric on [t0 , tf ], and
R(t) ≻ 0 for all t. Hint: Let v(t) = T (t)u(t) + M(t)x(t), where T satisfies R(t) = T (t)T T (t)
for all t, and T is continuous (does such T exist?).


From here on, we assume a cost function of the form

J(u) = ∫_{t0}^{tf} ((1/2) x(t)T L(t)x(t) + (1/2) u(t)T R(t)u(t)) dt,

with L and R as in Exercise 2.3.


Solving the DRE
Let u∗ be an optimal control, with x∗ the associated optimal state trajectory. Let τ < tf
be such that Π(t, Q, tf ) is well defined and continuous for all t ∈ [τ, tf ]. For t ∈ [τ, tf ], define

p∗ (t) = −Π(t, Q, tf )x∗ (t), (2.17)

so that the optimal control law (2.10) satisfies

u∗ (t) = R(t)−1 B T (t)p∗ (t). (2.18)

Then x∗ and p∗ together satisfy the linear system

[ẋ∗(t); ṗ∗(t)] = [A(t), B(t)R(t)−1BT(t); L(t), −AT(t)] [x∗(t); p∗(t)]    (2.19)

(rows separated by semicolons), evolving in R2n.

Exercise 2.4 Verify that x∗ and p∗ satisfy (2.19).

Suppose now that t0 ∈ [τ, tf] and x(t0) = x0. From (2.17), we also have the condition p∗(tf) = −Qx∗(tf), yielding a linear two-point boundary-value problem (TPBVP). We know that, without a positive semidefiniteness assumption on L and Q, there might not be a solution; and indeed, in contrast with linear initial-value problems, linear TPBVPs do not always have a solution. Still, regardless of positive semi-definiteness of L and Q, whenever Π(t, Q, tf) exists and is continuous on [t0, tf], the optimal control is readily obtained via solution of an initial-value problem, as seen next.

Theorem 2.2 Let X(t) and P(t) be n × n matrices satisfying the differential equation

[Ẋ(t); Ṗ(t)] = [A(t), B(t)R(t)−1BT(t); L(t), −AT(t)] [X(t); P(t)],    (2.20)

with X(tf ) = I and P (tf ) = −Q. Then

P (t) = −Π(t, Q, tf )X(t) (2.21)

satisfies DRE (2.7) for t ∈ [τ, tf ], for every τ < tf such that Π(t, Q, tf ) exists on [τ, tf ].

Proof. Just plug in.
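Spelled out: on any interval where X(t) is invertible (cf. Exercise 2.5 below), set K(t) := −P(t)X(t)−1, i.e., P = −KX. Then, suppressing arguments and using (d/dt)X−1 = −X−1ẊX−1,

K̇ = −ṖX−1 + PX−1ẊX−1 = −(LX − ATP)X−1 − K(AX + BR−1BTP)X−1 = −L − ATK − KA + KBR−1BTK,

where the second equality uses (2.20) and the last uses P = −KX (so that PX−1 = −K). Together with K(tf) = −P(tf)X(tf)−1 = Q, this is exactly DRE (2.7) (with BBT generalized to BR−1BT).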


Exercise 2.5 Show that, if the DRE has a continuous solution Π(t, Q, tf ) on [τ, tf ] then
X(t) is nonsingular on [τ, tf ], so that Π(t, Q, tf ) = −P (t)X(t)−1 for all t ∈ [τ, tf ]. [Hint:
Use (2.21) to solve (2.20) for X(·).]

Instance of Pontryagin’s Principle (Lev S. Pontryagin, Soviet mathematician, 1908–1988)
In connection with cost function J with uTu generalized to uTRu (although we will still assume R = I for the time being), let ψ : Rn → R be the “terminal cost” function, i.e.,

ψ(ξ) = (1/2) ξT Qξ ∀ξ ∈ Rn,

and let H : R × Rn × Rn × Rm → R be given by

H(τ, ξ, η, υ) = −(1/2)(υT R(τ)υ + ξT L(τ)ξ) + ηT (A(τ)ξ + B(τ)υ).
Equations (2.19), (2.18), and (2.17) then yield

ẋ∗ (t) = ∇η H(t, x∗ (t), p∗ (t), u∗ (t)), x∗ (t0 ) = x0 , (2.22)


ṗ∗ (t) = −∇ξ H(t, x∗ (t), p∗ (t), u∗ (t)), p∗ (tf ) = −∇ψ(x∗ (tf )), (2.23)

a two-point boundary-value problem. Now note that, from (2.18),

H(t, x∗(t), p∗(t), u∗(t)) = max_{v∈Rm} H(t, x∗(t), p∗(t), v) ∀t,

where clearly the maximum is achieved at v = R(t)−1BT(t)p∗(t). Thus the following result (an instance of Pontryagin’s Principle; see Chapter 6 for more details) holds.1
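(Indeed, R(t) ≻ 0 makes H(t, x∗(t), p∗(t), ·) strictly concave, so its unique maximizer is its unique stationary point:

∇υH(τ, ξ, η, υ) = −R(τ)υ + BT(τ)η = 0, i.e., υ = R(τ)−1BT(τ)η,

which, with η = p∗(t), recovers (2.18).)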

Theorem 2.3 Suppose Π(t, Q, tf) exists on [t0, tf] (so that the TPBVP has a (unique) solution). Then u∗ ∈ C solves (P) if and only if

H(t, x∗(t), p∗(t), u∗(t)) = max_{v∈Rm} H(t, x∗(t), p∗(t), v) ∀t ∈ [t0, tf],

where x∗, with x∗(t0) = x0, is the state trajectory generated by u∗, and where p∗ : R → Rn, continuously differentiable, satisfies

ṗ∗(t) = −∇ξH(t, x∗(t), p∗(t), u∗(t)) (= −AT(t)p∗(t) + L(t)x∗(t)) ∀t ∈ [t0, tf],
p∗(tf) = −∇ψ(x∗(tf)) (= −Qx∗(tf)).

This suggests we define H : R × Rn × Rn → R to take values

H(τ, ξ, η) = max_{v∈Rm} H(τ, ξ, η, v).

1
For the present special case, the condition is necessary and sufficient; as seen in Chapter 6, this is not
so in general.


Then
H(t, x∗ (t), p∗ (t), u∗ (t)) = H(t, x∗ (t), p∗ (t)) ∀t.
Function H is the pre-Hamiltonian,2 (sometimes called control Hamiltonian or pseudo-Hamiltonian)
and H the Hamiltonian (or true Hamiltonian). (Sir William R. Hamilton, Irish mathemati-
cian, 1805–1865.) Thus

H(t, ξ, η) = −(1/2) ξT L(t)ξ + ηT A(t)ξ + (1/2) ηT B(t)R(t)−1BT(t)η
           = (1/2) [ξ; η]T [−L(t), AT(t); A(t), B(t)R(t)−1BT(t)] [ξ; η].

Finally, note that the optimal cost J(u∗) can be equivalently expressed as

J(u∗) = −(1/2) x(t0)T p(t0).

Remark 2.7 The gradient of H with respect to z := [ξ; η] is given by

∇zH(t, ξ, η) = [−L(t), AT(t); A(t), B(t)R(t)−1BT(t)] [ξ; η],

so that, in view of (2.17),

∇z H(t, x∗ (t), p∗ (t)) = ∇z H(t, x∗ (t), p∗ (t), u∗ (t)). (2.24)

Note however that while, as we will see later, (2.22)–(2.23) hold rather generally, (2.24) no longer does when constraints are imposed on the values of the control signal, as is the case later in this chapter as well as in Chapter 6. Indeed, H usually is non-smooth.

Remark 2.8 Because ∇υH(t, x∗(t), p∗(t), u∗(t)) = 0 and u∗ is differentiable, along trajectories of (2.19), with z∗(t) := [x∗(t); p∗(t)], we have

(d/dt) H(t, x∗(t), p∗(t), u∗(t)) = ∇zH(t, x∗(t), p∗(t), u∗(t))T ż∗(t) + (∂/∂t) H(t, x∗(t), p∗(t), u∗(t))
= (∂/∂t) H(t, x∗(t), p∗(t), u∗(t))

since

∇zH(t, x∗(t), p∗(t), u∗(t))T ż∗(t) = ∇zH(t, x∗(t), p∗(t), u∗(t))T J ∇zH(t, x∗(t), p∗(t), u∗(t)) = 0 (since JT = −J).

In particular, if A, B and L do not depend on t, H(t, x∗(t), p∗(t), u∗(t)) is constant along trajectories of (2.19).
2 This terminology is borrowed from P.S. Krishnaprasad.


2.1.2 Infinite horizon, LTI systems


We now turn our attention to the case of infinite horizon (tf = ∞). Because we are mainly
interested in stabilizing control laws (so that x(t) → 0 as t → ∞ will be guaranteed), we
assume without loss of generality that Q = 0. To simplify the analysis, we further assume
that A, B and L are constant. We also simplify the notation by translating the origin of time
to the initial time t0 , i.e., by letting t0 = 0. Assuming (as above) that L = LT ≥ 0, so that
existence and uniqueness of a solution to (DRE) is guaranteed over every finite interval, we
can write L = CTC for some (non-unique) C, so that the problem can be written as

minimize J(u) = (1/2) ∫_{0}^{∞} (y(t)T y(t) + u(t)T u(t)) dt

subject to

ẋ(t) = Ax(t) + Bu(t) ∀t ≥ 0,
y(t) = Cx(t),
u ∈ C,
x(0) = x0.

Note that y is merely some linear image of x, and need not be a physical output. For
example, it could be an error signal to be driven to the origin.
Consider the differential Riccati equation (2.7) with our constant A, B, and L and, in
agreement with the notation used in the previous section, denote by Π(t, 0, τ ) the value at
time t of the (continuously differentiable) solution to this equation that vanishes at time τ .
Since L is positive semi-definite, in view of Theorem 2.1, such solution exists for all t and τ
(with t ≤ τ ) and is unique, symmetric and positive semi-definite. Noting that Π(t, 0, τ ) only
depends on t − τ (since the Riccati equation is now time-invariant), we define

Π(τ ) := Π(0, 0, τ ),

so that
Π(t, 0, τ ) = Π(0, 0, τ − t) = Π(τ − t).

Exercise 2.6 Formally prove that Π(t, 0, τ ) = Π(0, 0, τ − t) for all t, τ ∈ R.

It is intuitively reasonable to expect that, for fixed t, as τ goes to ∞, the optimal feedback law u(t) = −BT Π(t, 0, τ)x(t) for the finite-horizon problem with cost function

Jτ(u) := (1/2) ∫_{0}^{τ} (x(t)T CTCx(t) + u(t)T u(t)) dt

will tend to an optimal feedback law for our infinite horizon problem. Since the optimal cost-to-go is (1/2) x(t)T Π(τ − t)x(t), it is also tempting to guess that Π(τ − t) itself will converge, as τ → ∞, to some matrix Π∞ independent of t (since the time-to-go will approach ∞ at


every time t and the dynamics is time-invariant), satisfying the algebraic Riccati equation
(since the derivative of Π∞ with respect to t is zero)

AT Π∞ + Π∞ A − Π∞ BB T Π∞ + C T C = 0, (ARE)

with optimal control law


u∗ (t) = −B T Π∞ x∗ (t).
We will see that, under a mild assumption, introduced next, our intuition is correct.
When dealing with an infinite-horizon problem, it is natural (if nothing else, for practical
reasons) to seek control laws that are stabilizing. Otherwise, while x∗ might still remain
bounded for some initial conditions, such property would not hold robustly.
Assumption. (A, B) is stabilizable.
We first show that Π(t) converges to some limit Π∞ , then that Π∞ satisfies (ARE) (i.e.,
Π∞ is an equilibrium point for (DRE)), and finally that the control law u(t) = −B T Π∞ x(t)
is optimal for our infinite-horizon problem and, if it is stabilizing, is the unique stabilizing
optimal control.

Lemma 2.3 As t → ∞, Π(t) converges to some symmetric, positive semi-definite matrix Π∞. Furthermore, Π∞ satisfies (ARE).

Proof. Since the optimal value for Jτ(u) is (1/2) x0T Π(τ)x0, for any u ∈ C we have, for all τ > 0,

x0T Π(τ)x0 ≤ ∫_{0}^{τ} (x(σ)T CTCx(σ) + u(σ)T u(σ)) dσ.    (2.25)

Invoking stabilizability, let u(t) = F x(t) be a stabilizing static state feedback and let x̂ be
the corresponding solution of

ẋ = (A + BF )x
x(0) = x0 ,

i.e., x̂(t) = e^{(A+BF)t} x0; further, let û = F x̂. Then, since A + BF is Hurwitz stable and the integrand in (2.25) is nonnegative, we have, for all τ ≥ 0,

x0T Π(τ)x0 ≤ ∫_{0}^{τ} (x̂(σ)T CTC x̂(σ) + û(σ)T û(σ)) dσ ≤ ∫_{0}^{∞} (x̂(σ)T CTC x̂(σ) + û(σ)T û(σ)) dσ = x0T M x0,

where

M = ∫_{0}^{∞} e^{(A+BF)T t} (CTC + FTF) e^{(A+BF)t} dt

is well defined and independent of τ. Since, in view of (2.25), x0T Π(τ)x0 is nonnegative, it is bounded for every fixed x0. Further, non-negative definiteness of CTC implies that x0T Π(τ)x0

is monotonically non-decreasing as τ increases. Since it is bounded, x0T Π(τ)x0 must converge for every x0.3 Using the fact that Π(τ) is symmetric, it is easily shown that Π(τ) converges (see Exercise 2.7 below), i.e., for some symmetric matrix Π∞,

lim_{τ→∞} Π(τ) = Π∞.

Symmetry and positive semi-definiteness are inherited from Π(τ). Finally, since4

−Π̇(τ) = −AT Π(τ) − Π(τ)A − CTC + Π(τ)BBT Π(τ)

and the right-hand side converges when τ → ∞, Π̇(τ) must also converge and, since it is the derivative of Π(τ), which itself converges, its limit must be zero. Hence Π∞ satisfies (ARE).

Note: The final portion of the proof is taken from [?, p.296–297], [?, section 8.4].
Exercise 2.7 Prove that convergence of x0T Π(τ)x0 for arbitrary x0 implies convergence of Π(τ).

Now note that, for any ũ ∈ C and τ ≥ 0, we have

(1/2) x0T Π(τ)x0 = (1/2) x0T Π(0, 0, τ)x0 = min_{u∈C} Jτ(u) ≤ Jτ(ũ) ≤ J(ũ),

where J(ũ) might be infinite. Letting τ → ∞ we obtain

(1/2) x0T Π∞ x0 ≤ J(ũ) ∀ũ ∈ C,

implying that

(1/2) x0T Π∞ x0 ≤ inf_{u∈C} J(u)    (2.26)
(where inf_{u∈C} J(u) could be infinite). Finally, we construct a control u∗ which attains the cost value (1/2) x0T Π∞ x0, proving optimality of u∗ and equality in (2.26). For this, note that the constant matrix Π∞, since it satisfies (is an equilibrium point for) (ARE), automatically satisfies the corresponding differential Riccati equation. Using this fact and making use of the Fundamental Lemma with K(t) := Π∞ for all t, we obtain, analogously to (2.8),

Jτ(u) = (1/2) ( ∫_{0}^{τ} ‖BT Π∞ x(t) + u(t)‖₂² dt − x(τ)T Π∞ x(τ) + x0T Π∞ x0 ) ∀τ ≥ 0.    (2.27)

Since Π∞ ⪰ 0, x(τ)T Π∞ x(τ) ≥ 0 for all τ, yielding, for every u ∈ C,

Jτ(u) ≤ (1/2) ( ∫_{0}^{τ} ‖BT Π∞ x(t) + u(t)‖₂² dt + x0T Π∞ x0 ) ∀τ ≥ 0.
3
It cannot be inferred from the mere fact that Π(τ ) converges as τ → ∞ that its derivative converges
to zero—which would imply, by taking limits in (2.7), that Π∞ satisfies the ARE. (E.g., as t → ∞,
exp(−t) sin(exp(t)) tends to a constant (zero) but its derivative does not go to zero.) In particular, Barbalat’s
Lemma can’t be applied without first showing that Π(·) has a uniformly continuous derivative.
4
The minus sign in the right-hand side is due to the identity (when Q = 0 and tf = 0) K(t) = Π(t, 0, 0) =
Π(0, 0, −t) = Π(−t).


Substituting the feedback control law

u = −B T Π∞ x, (2.28)

and denoting by u∗ the resulting control signal yields


Jτ(u∗) ≤ (1/2) x0T Π∞ x0 ∀τ ≥ 0,

and invoking Lebesgue’s monotone convergence theorem yields

J(u∗) = lim_{τ→∞} Jτ(u∗) ≤ (1/2) x0T Π∞ x0,
which, together with (2.26), proves that u∗ is optimal. Finally, if feedback (2.28) is stabilizing, then asymptotic stability implies that x(τ) → 0 as τ → ∞, and taking limits on both sides of (2.27) yields

J(u∗) = (1/2) x0T Π∞ x0,

showing that u must satisfy feedback law (2.28) in order to be optimal, so that u∗ is the unique (continuous) optimal control.

Theorem 2.4 Suppose (A, B) is stabilizable. Then Π∞ solves (ARE), the control law u = −BT Π∞ x is optimal, yielding u∗, and

J(u∗) = (1/2) x0T Π∞ x0,
x∗(t) = e^{(A−BBT Π∞)t} x0 ∀t.

Also, V (t, ξ) = (1/2) ξT Π∞ ξ does not depend on t. Finally, if control law (2.28) is stabilizing, then u∗ is the only optimal control.

This solves the infinite horizon LTI problem. Note however that it is not guaranteed that
the optimal control law is stabilizing. For example, consider the extreme case when C = 0
(in which case Π∞ = 0) and the system is open loop unstable. It seems clear that this is due
to unstable modes not being observable through C. This of course is undesirable. We now
show that, indeed, under a detectability assumption, the optimal control law is stabilizing.

Theorem 2.5 Suppose (A, B) is stabilizable and (C, A) detectable. Then, if K ⪰ 0 solves (ARE), A − BBTK is Hurwitz stable; in particular, A − BBT Π∞ is Hurwitz stable.

Proof. (ARE) can be rewritten

(A − BB T K)T K + K(A − BB T K) = −KBB T K − C T C. (2.29)

Proceed now by contradiction. Let λ, with Re λ ≥ 0, and v ≠ 0 be such that

(A − BBTK)v = λv.    (2.30)


Multiplying (2.29) on the left by v∗ and on the right by v we get

2(Re λ)(v∗Kv) = −‖BTKv‖² − ‖Cv‖².

Since the left-hand side is non-negative (since K ⪰ 0) and the right-hand side non-positive, both sides must vanish. Thus (i) Cv = 0 and (ii) BTKv = 0, which together with (2.30) implies that Av = λv. Since Re λ ≥ 0, this contradicts detectability of (C, A).

Corollary 2.1 If (A, B) is stabilizable and (C, A) is detectable, then the optimal control law
u = −B T Π∞ x is stabilizing.

Exercise 2.8 Suppose (A, B) is stabilizable. Prove that (i) given x ∈ Rn, Π∞ x = 0n (the origin of Rn) if and only if xT Π∞ x = 0; and (ii) if Π∞ x = 0n then x belongs to the unobservable subspace. In particular, if (C, A) is observable, then Π∞ ≻ 0. [Hint: Use the fact that J(u∗) = (1/2) x0T Π∞ x0.]

To summarize, we have the following theorem.

Theorem 2.6 Suppose (A, B) is stabilizable. Then, as t → ∞, Π(t) → Π∞ , a symmetric,


positive semi-definite matrix that satisfies (ARE); and the feedback law u(t) = −B T Π∞ x(t)
is optimal. Further, if (C, A) is detectable, then (i) the resulting closed-loop system is asymp-
totically stable (i.e., Π∞ is a “stabilizing” solution of (ARE)); and (ii) the optimal solution
u∗ is unique. Finally, if (C, A) is observable, then Π∞ is positive definite.

Remark 2.9 Some intuition concerning the solutions of (ARE) can be gained as follows. In the scalar case, if B ≠ 0 and C ≠ 0, the left-hand side is a downward parabola that intersects the vertical axis at C² (> 0); hence (ARE) has two real solutions, one positive, the other negative. When n > 1, the number of solutions increases, most of them being indefinite matrices (i.e., with some positive eigenvalues and some negative eigenvalues). In line with the fact that Π(t) ⪰ 0 for all t, a (or the) positive semi-definite solution will be the focus of our investigation.

Finally, we discuss how (ARE) can be solved directly (without computing the limit of Π(t)). In the process, we establish that, under stabilizability of (A, B) and detectability of (C, A), Π∞ is the unique stabilizing solution of (ARE), hence the unique symmetric positive semi-definite solution of (ARE). Also see Appendix E in [?].
Consider the Hamiltonian matrix (see (2.19))

H = [A, BBT; L, −AT],

and let K+ be a stabilizing solution to (ARE), i.e., such that A − BBTK+ is stable. Let

T = [I, 0; −K+, I].


Then

T−1 = [I, 0; K+, I].

Now (elementary block column and block row operations)

T−1HT = [I, 0; K+, I] [A, BBT; L, −AT] [I, 0; −K+, I]
      = [I, 0; K+, I] [A − BBTK+, BBT; L + ATK+, −AT]
      = [A − BBTK+, BBT; 0, −(A − BBTK+)T]

since K+ is a solution to (ARE). It follows that

σ(H) = σ(A − BBTK+) ∪ σ(−(A − BBTK+)T),

where σ(·) denotes the spectrum (set of eigenvalues). Thus, if (A, B) is stabilizable and (C, A) detectable (so such a solution K+ exists), H cannot have any imaginary eigenvalues: It must have n eigenvalues in C− and n eigenvalues in C+. Furthermore the first n columns of T form a basis for the stable invariant subspace of H, i.e., for the span of all generalized eigenvectors of H associated with stable eigenvalues (see, e.g., Chapter 13 of [?] for more on this).
Now let [S11; S21] be a basis for the stable invariant subspace of H, i.e., let

S = [S11, S12; S21, S22]

be any invertible matrix such that

S−1HS = [X, Z; 0, Y]

for some X, Y, Z such that σ(X) ⊂ C− and σ(Y) ⊂ C+. (Note that σ(H) = σ(X) ∪ σ(Y).) Then it must hold that, for some non-singular R′,

[S11; S21] = [I; −K+] R′.

It follows that S11 = R′ and S21 = −K+R′, yielding

K+ = −S21 S11−1,

which also shows that (ARE) cannot have more than one stabilizing solution. From Theorem 2.5, it also follows that, if (A, B) is stabilizable and (C, A) detectable, then (ARE) has exactly one positive semidefinite solution, namely Π∞.
We have thus proved the following.


Theorem 2.7 Suppose (A, B) is stabilizable and (C, A) is detectable. Then Π∞ = −S21 S11−1, where [S11; S21] is any basis for the stable invariant subspace of H. In particular, S21 S11−1 is symmetric and there is exactly one stabilizing solution to (ARE), namely Π∞, which is also the only positive semi-definite solution.
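Theorem 2.7 doubles as a computational recipe. A minimal sketch (Python, hypothetical data): an ordered real Schur decomposition of H places the stable eigenvalues first, so the leading n Schur vectors form a basis [S11; S21] for the stable invariant subspace; scipy’s solve_continuous_are provides an independent check.

import numpy as np
from scipy.linalg import schur, solve_continuous_are

n = 2
A = np.array([[0.0, 1.0], [3.0, 0.0]])   # hypothetical unstable plant
B = np.array([[0.0], [1.0]])
C = np.eye(n)
L = C.T @ C

# Hamiltonian matrix as in the text: H = [A, B B^T; L, -A^T].
H = np.block([[A, B @ B.T], [L, -A.T]])

# Real Schur form with stable (left-half-plane) eigenvalues ordered first;
# the first n columns of U then span the stable invariant subspace of H.
_, U, sdim = schur(H, output="real", sort="lhp")
assert sdim == n
S11, S21 = U[:n, :n], U[n:, :n]
Pi = -S21 @ np.linalg.inv(S11)           # Theorem 2.7

print(np.allclose(Pi, solve_continuous_are(A, B, L, np.eye(B.shape[1]))))
print(np.linalg.eigvals(A - B @ B.T @ Pi))   # all should have Re < 0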


Exercise 2.9 Given J = [0, I; −I, 0] (a unitary matrix such that J² = −I), any real matrix H that satisfies J−1HJ = −HT is said to be Hamiltonian. Show that if H is Hamiltonian and λ is an eigenvalue of H, then −λ also is.

The 2n × 2n matrix in (2.19) is Hamiltonian for all t.


In summary, we have the following. If (A, B) is stabilizable, then Π∞ is well defined (as the limit of Π(t)); it is a positive semi-definite solution of (ARE); and the control law u(t) = −BT Π∞ x(t) is optimal, with optimal value (1/2) x0T Π∞ x0. Further, if in addition

• (C, A) is detectable, then (i) the optimal control u∗ is unique and is generated by the feedback control law u = −BT Π∞ x; (ii) A − BBT Π∞ is Hurwitz stable, i.e., the optimal control law is stabilizing; and (iii) Π∞ is the only stabilizing solution of (ARE) and the only positive semi-definite solution of (ARE).

• (C, A) is observable, then Π∞ is positive definite.

2.2 Fixed terminal state, unconstrained control values, quadratic cost

Question: Given xf ∈ Rn , tf > t0 , does there exist u ∈ U such that, for system (2.1),
x(tf ) = xf ? If the answer to the above is “yes”, we say that xf is reachable from (x0 , t0 )
at time tf . If moreover this holds for all x0 , xf ∈ Rn then we say that the system (2.1) is
reachable on [t0 , tf ].
There is no loss of generality in assuming that xf = 0n ,5 as shown by the following
exercise.

Exercise 2.10 Define x̂(t) := x(t) − Φ(t, tf)xf. Then x̂ satisfies

(d/dt) x̂(t) = A(t)x̂(t) + B(t)u(t) ∀t ∈ [t0, tf).

Conclude that, under dynamics (2.1), u steers (x0 , t0 ) to (xf , tf ) if and only if it steers (ξ, t0 )
to (0n , tf ), where ξ (= ξ(x0 , t0 )) := x0 − Φ(t0 , tf )xf .
5 Here and elsewhere in these notes, 0n is the origin of Rn.


Since Φ(t0 , tf ) is invertible, it follows that system (2.1) is reachable on [t0 , tf ] if and only if it
is controllable on [t0 , tf ], i.e., if and only if, given x0 , there exists u ∈ C that steers (x0 , t0 ) to
(0n , tf ). [Note. Equivalence between reachability and controllability (to the origin) does not
hold in the discrete-time case, where controllability is a weaker property than reachability.]
Now controllability to (0n, tf) from (ξ, t0), for some ξ, is equivalent to solvability of the equation (in u ∈ C):

Φ(tf, t0)ξ + ∫_{t0}^{tf} Φ(tf, σ)B(σ)u(σ) dσ = 0n.

Equivalently (multiplying on the left by the non-singular matrix Φ(t0, tf)), (0n, tf) can be reached from (ξ, t0) if there exists u ∈ C such that

ξ = Lu := − ∫_{t0}^{tf} Φ(t0, σ)B(σ)u(σ) dσ,

where L : C → Rn is a linear map.


Assume now that 0n is indeed reachable at time tf from (ξ, t0) and, as a first step, consider the minimum energy problem: suppose that we want to steer (ξ, t0) to (0n, tf) while spending the least amount of energy, i.e., while minimizing

J(u) := (1/2) ⟨u, u⟩ = (1/2) ∫_{t0}^{tf} u(t)T u(t) dt

subject to ξ = Lu, where ⟨·, ·⟩ : C × C → R is the L2m inner product and, as before, the factor of one half has been inserted for convenience. The linear map L is continuous (bounded) with respect to the norm derived from this inner product (see Exercise A.50) and the problem at hand is a linear least-squares problem. Clearly, 0n is reachable at time tf from (ξ, t0) if and only if ξ ∈ R(L), so that (2.1) is controllable on [t0, tf] if and only if R(L) = Rn. From Exercise A.50, L has an adjoint L∗. In view of Theorem A.5, because R(LL∗) (a subspace of Rn) is closed, R(L) = R(LL∗). It then follows from Theorem A.6 that the unique optimal control is the unique u ∈ R(L∗) satisfying Lu = ξ, namely

u = L∗η,

where η is any point in Rn satisfying

LL∗ η = ξ

(and such points do exist). It is shown in Exercise A.50 of Appendix A that L∗ is given by

(L∗ µ)(t) = −B T (t)ΦT (t0 , t)µ


which yields

LL∗µ = − ∫_{t0}^{tf} Φ(t0, t)B(t)(L∗µ)(t) dt = ∫_{t0}^{tf} Φ(t0, t)B(t)BT(t)ΦT(t0, t) dt µ ∀µ ∈ Rn,

i.e., LL∗ : Rn → Rn is given by

LL∗ = ∫_{t0}^{tf} Φ(t0, t)B(t)BT(t)ΦT(t0, t) dt =: W(t0, tf).

(W(t0, tf) is often defined with Φ(tf, t) instead of Φ(t0, t); note though that Φ(t0, tf) is invertible.) Since R(L) = R(LL∗), 0n is reachable at tf from (ξ, t0) if and only if

ξ ∈ R(W(t0, tf)).

Note that W(t0, tf) has entries

(W(t0, tf))ij = ⟨vi, vj⟩L2,

with vi(t) := B(t)T Φi•(t0, t)T for all t ∈ [t0, tf], where ⟨·, ·⟩L2 is again the L2m inner product; i.e., W(t0, tf) is the Gramian matrix (or Gram matrix, or Gramian; Jørgen P. Gram, Danish mathematician, 1850–1916) associated with the vectors (Φj•(t0, ·)B(·))T, j = 1, . . . , n. It is known as the reachability Gramian. It is invertible if and only if R(L) = Rn, i.e., if and only if the system is reachable on [t0, tf].
Suppose W(t0, tf) is indeed invertible and, for simplicity, assume that the target state is the origin, i.e., xf = 0n. (Also see Theorem A.6.) The unique minimum energy control that steers (x0, t0) to (0n, tf) is then given by6

û = L∗(LL∗)−1 x(t0),

i.e.,

û(t) = −BT(t)ΦT(t0, t)W(t0, tf)−1 x(t0)    (2.31)

and the corresponding energy is given by

(1/2) ⟨û, û⟩ = (1/2) x(t0)T W(t0, tf)−1 x(t0).    (2.32)
6 L† := L∗(LL∗)−1 is known as the pseudo-inverse of integral operator L; this concept was first introduced by E.I. Fredholm (Swedish mathematician, 1866–1937) in 1903. It was later further developed, in the narrower context of linear operators L from Rm to Rn, by E.H. Moore (American mathematician, 1862–1932), A. Bjerhammar (Swedish geodesist, 1917–2011), and R. Penrose (English mathematician, 1931–). Also see end of Appendix A.


Note that, as expressed in (2.31), û(t) depends explicitly on the initial state x0 and initial
time t0 . Consequently, if between t0 and the current time t, the state x has been affected by
an external perturbation, û as expressed by (2.31) is no longer optimal (minimum energy)
over the remaining time interval [t, tf ]. Let us address this issue. (We still assume that
xf = 0n .) At time t0 , we have
û(t0 ) = −B T (t0 )ΦT (t0 , t0 )W (t0 , tf )−1 x(t0 )
= −B T (t0 )W (t0 , tf )−1 x(t0 ).
Intuitively, this must hold independently of the value of t0 , i.e., for the problem un-
der consideration, Bellman’s Principle of Optimality holds (Richard E. Bellman, American
mathematician, 1920–1984): Independently of the initial state (at t0 ), for û to be optimal,
it is necessarily the case that û applied from the current time t ≥ t0 up to the final time tf, starting at the current state x(t), be optimal for the remaining problem, i.e., for the objective function ∫_{t}^{tf} u(τ)T u(τ) dτ. Specifically, given x ∈ Rn and t ∈ [t0, tf] such that xf is reachable

at time tf from (x, t), denote by P (x, t; xf , tf ) the problem of determining the control of least
energy that steers (x, t) to (xf , tf ), i.e., with (x, t) replacing (x0 , t0 ). Let x(·) be the state
trajectory that results when optimal control û is applied, starting from x0 at time t0 . Then
Bellman’s Principle of Optimality asserts that, for any t ∈ [t0 , tf ], the restriction of û to [t, tf ]
solves P (x(t), t; xf , tf ). (See Chapter 3 for a proof of Bellman’s Principle in the discrete-time
case.)

Remark 2.10 It is readily seen that the optimality principle holds only for a class of optimal
control problems (which includes those considered so far as well as most of those to be
considered later on). Indeed, the principle assumes that the cost is incurred sequentially, as
time increases (and as the state is integrated accordingly): the cost incurred up to time t
does not depend on values of u beyond t.

Exercise 2.11 Assuming for simplicity that piecewise continuous controls (i.e., controls that are right-continuous everywhere, and continuous everywhere except possibly at finitely many points, with finite left and right limits at those points) are admissible,⁷ prove that Bellman's Principle of Optimality holds for the minimum energy fixed endpoint problem. [Hint: Given an optimal control u∗ for P(x0, t0; xf, tf), assuming by contradiction that a lower energy control û exists for P(x(t), t; xf, tf), construct, based on û, a better control than u∗ for P(x0, t0; xf, tf), a contradiction.]

It follows that, for any t ∈ [t0, tf] such that W(t, tf) is invertible,

û(t) = −B^T(t)W(t, tf)^{-1} x(t),        (2.33)

which yields the "closed-loop" implementation depicted in Figure 2.1; in (2.33), v = 0n. Further, the optimal cost-to-go V(t, ξ) is (from (2.32))

V(t, ξ) = (1/2) ξ^T W(t, tf)^{-1} ξ
7
It can be checked (but you are not asked to do so) that the essence of the entire derivation so far still
applies when the class of admissible controls is expanded from continuous to piecewise continuous signals.


[Figure 2.1: Closed-loop implementation. Block diagram: the input v(t) and the feedback −B^T(t)W(t, tf)^{-1}x(t) are summed to form u(t), which drives ẋ(t) = A(t)x(t) + B(t)u(t).]

[Figure 2.2: Charging capacitor. A current source i feeds a capacitor c with voltage v.]

whenever the inverse exists. Note that, with a fixed ξ ≠ 0n, V(t, ξ) → ∞ as t → tf. This reflects the fact that, with x(tf) = 0n required, for t close to tf, very high energy must be expended to reach the origin in the very short time tf − t.

Exercise 2.12 Prove (2.33) from (2.31) directly, without invoking Bellman’s Principle.

Example 2.2 (charging capacitor)

(d/dt)(c v(t)) = i(t)

minimize ∫_0^{tf} r i(t)² dt   s.t. v(0) = v0, v(tf) = v1.

We obtain

B(t) ≡ 1/c,   A(t) ≡ 0,

W(0, tf) = ∫_0^{tf} (1/c²) dt = tf/c².


η0 = c²(v0 − v1)/tf

i0(t) = −(1/c) · c²(v0 − v1)/tf = c(v1 − v0)/tf = constant.

The closed-loop optimal feedback law is given by

i0(t) = −(c/(tf − t)) (v(t) − v1).
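As a quick sanity check of Example 2.2 (with assumed illustrative values r = 1, c = 2, v0 = 0, v1 = 5, tf = 1, which are not from the notes), one can simulate the closed-loop law and evaluate the optimal energy:

```python
import numpy as np
from scipy.integrate import solve_ivp

r, c, v0, v1, tf = 1.0, 2.0, 0.0, 5.0, 1.0   # assumed values

# closed-loop: dv/dt = i0(t)/c with i0(t) = -c (v(t) - v1)/(tf - t);
# stop just short of tf to avoid the 0/0 form in the feedback gain
sol = solve_ivp(lambda t, v: -(v - v1)/(tf - t), (0.0, tf - 1e-9), [v0],
                rtol=1e-10, atol=1e-12)
print("v(tf) ~", sol.y[0, -1])               # ~ v1 = 5

# open-loop optimum is the constant current i0 = c (v1 - v0)/tf,
# with energy int r i0^2 dt = r c^2 (v1 - v0)^2 / tf
i0 = c*(v1 - v0)/tf
print("optimal energy =", r * i0**2 * tf)
```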

Exercise 2.13 Discuss the same optimal control problem (with fixed end points) with the objective function replaced by

J(u) ≡ (1/2) ∫_{t0}^{tf} u(t)^T R(t) u(t) dt,

where R(t) = R(t)^T ≻ 0 for all t ∈ [t0, tf] and R(·) is continuous. [Hint: define a new inner product on U.]

The controllability Gramian W(·, ·) happens to satisfy certain simple equations. Recalling that

W(t, tf) = ∫_t^{tf} Φ(t, σ)B(σ)B^T(σ)Φ(t, σ)^T dσ,

one easily verifies that W(tf, tf) = 0 and

(d/dt) W(t, tf) = A(t)W(t, tf) + W(t, tf)A^T(t) − B(t)B^T(t),        (2.34)

implying that, if W(t, tf) is invertible, it satisfies

(d/dt) W(t, tf)^{-1} = −W(t, tf)^{-1}A(t) − A^T(t)W(t, tf)^{-1} + W(t, tf)^{-1}B(t)B^T(t)W(t, tf)^{-1}.        (2.35)

Equation (2.34) is linear: it is a Lyapunov equation. Equation (2.35) is quadratic: it is a Riccati equation (for W(t, tf)^{-1}).
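The Lyapunov equation (2.34) suggests a convenient way of computing W(t0, tf): integrate (2.34) backward from W(tf, tf) = 0. The sketch below (with illustrative time-invariant data, assumed rather than taken from the notes) compares this with direct quadrature of the defining integral:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -1.0]])   # assumed LTI data
B = np.array([[0.0], [1.0]])
t0, tf = 0.0, 1.5

def lyap_rhs(t, w):                        # (2.34): dW/dt = A W + W A^T - B B^T
    W = w.reshape(2, 2)
    return (A @ W + W @ A.T - B @ B.T).ravel()

sol = solve_ivp(lyap_rhs, (tf, t0), np.zeros(4), rtol=1e-10, atol=1e-12)
W_ode = sol.y[:, -1].reshape(2, 2)         # W(t0, tf) via backward integration

ss = np.linspace(t0, tf, 4001)
ds = ss[1] - ss[0]
W_quad = sum(expm(A*(t0-s)) @ B @ B.T @ expm(A*(t0-s)).T for s in ss) * ds
print(np.max(np.abs(W_ode - W_quad)))      # small discrepancy
```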

Exercise 2.14 Prove that if a matrix M(t) := W(t, tf) satisfies Lyapunov equation (2.34) then, at every t < tf at which it is invertible, its inverse satisfies Riccati equation (2.35). (Hint: (d/dt) M(t)^{-1} = −M(t)^{-1} ((d/dt) M(t)) M(t)^{-1}.)

Exercise 2.15 Prove that W (·, ·) also satisfies the functional equation

W (t0 , tf ) = W (t0 , t) + Φ(t0 , t)W (t, tf )ΦT (t0 , t).


As we have already seen, the Riccati equation plays a fundamental role in optimal control
systems involving linear dynamics and quadratic cost (linear-quadratic problems). At this
point, note that, if xf = 0n and W (t, tf ) is invertible, then û(t) = −B(t)T P (t)x(t), where
P (t) = W (t, tf )−1 solves Riccati equation (2.35).
We have seen that, if W (t0 , tf ) is invertible, the optimal cost for problem (FEP) is given
by
J(û) = (1/2)⟨û, û⟩ = (1/2) x0^T W(t0, tf)^{-1} x0.        (2.36)
This is clearly true for any t0 , so that, from a given time t < tf (such that W (t, tf ) is
invertible) the “cost-to-go” is given by
(1/2) x(t)^T W(t, tf)^{-1} x(t).

I.e., the value function is given by

V(t, ξ) = (1/2) ξ^T W(t, tf)^{-1} ξ   ∀ξ ∈ R^n, t < tf.
This clearly bears resemblance with results we obtained for the free endpoint problem since
W (t, tf )−1 satisfies the associated Riccati equation (see Exercise 2.14).
Now consider the more general quadratic cost

J(u) := (1/2) ∫_{t0}^{tf} ( x(t)^T L(t)x(t) + u(t)^T u(t) ) dt = (1/2) ∫_{t0}^{tf} [x(t); u(t)]^T [ L(t), 0 ; 0, I ] [x(t); u(t)] dt,        (2.37)

where L(·) = L(·)^T ∈ C (and [x; u] denotes the stacked vector). Let K(t) = K(t)^T be some continuously differentiable time-dependent matrix. Using the Fundamental Lemma we see that, since x0 and xf are fixed, it is equivalent to minimize (we no longer assume that xf = 0n)

J̃(u) := J(u) + (1/2)( xf^T K(tf) xf − x0^T K(t0) x0 )
      = (1/2) ∫_{t0}^{tf} [x(t); u(t)]^T [ L(t) + K̇(t) + A^T(t)K(t) + K(t)A(t), K(t)B(t) ; B(t)^T K(t), I ] [x(t); u(t)] dt.

To "complete the square," suppose there exists a K(·) that satisfies

L(t) + K̇(t) + A^T(t)K(t) + K(t)A(t) = K(t)B(t)B(t)^T K(t),

i.e., that satisfies the Riccati differential equation

K̇(t) = −A^T(t)K(t) − K(t)A(t) + K(t)B(t)B(t)^T K(t) − L(t).        (2.38)

(As we have seen when discussing the free endpoint problem, if L(t) is positive semi-definite for all t, then a solution exists for every prescribed positive semi-definite "final" value K(tf).) Then we get
J̃(u) = (1/2) ∫_{t0}^{tf} [x(t); u(t)]^T [ K(t)B(t)B(t)^T K(t), K(t)B(t) ; B(t)^T K(t), I ] [x(t); u(t)] dt
      = (1/2) ∫_{t0}^{tf} ‖B(t)^T K(t)x(t) + u(t)‖² dt.

Now, again supposing that some solution to the Riccati differential equation (2.38) (DRE) exists, let K(·) be such a solution with, say, K(tf) = Kf, and let

v^{Kf}(t) = B^T(t)K(t)x(t) + u(t).

It is readily verified that, in terms of the new control input v^{Kf}, the system dynamics become

ẋ(t) = [A(t) − B(t)B^T(t)K(t)]x(t) + B(t)v^{Kf}(t),   t ∈ [t0, tf),

and the cost function takes the form

J̃(u) = (1/2) ∫_{t0}^{tf} v^{Kf}(t)^T v^{Kf}(t) dt.

That is, we end up with the problem

minimize ∫_{t0}^{tf} ⟨v^{Kf}(t), v^{Kf}(t)⟩ dt
subject to ẋ(t) = [A(t) − B(t)B^T(t)Π(t, Kf, tf)]x(t) + B(t)v^{Kf}(t),   t ∈ [t0, tf),        (2.39)
           x(t0) = x0,
           x(tf) = xf,
           v^{Kf} ∈ C,        (2.40)

where, following standard usage, we have parametrized the solutions to DRE by their value (Kf) at time tf, and denoted them by Π(t, Kf, tf). This transformed problem (parametrized by Kf) is of a form identical to the one we solved earlier. Denote by Φ_{A−BB^TΠ} and W_{A−BB^TΠ} the state transition matrix and controllability Gramian for (2.39) (for economy of notation, we have kept implicit the dependence of Π on Kf). Then, for a given Kf, we can write the optimal control v^{Kf} for the transformed problem as (see (2.33))

v^{Kf}(t) = −B(t)^T W_{A−BB^TΠ}(t, tf)^{-1} x(t),

where Π := Π(t, Kf, tf), and the optimal control u∗ for the original problem as

u∗(t) = v^{Kf}(t) − B(t)^T Π(t, Kf, tf)x(t) = −B(t)^T [ Π(t, Kf, tf) + W_{A−BB^TΠ}(t, tf)^{-1} ] x(t).

We also obtain, with ξ0 = x(t0) (and assuming xf^T Kf xf = 0, e.g., xf = 0n),

J(u∗) = J̃(u∗) + (1/2) ξ0^T Π(t0, Kf, tf) ξ0 = (1/2) ξ0^T [ W_{A−BB^TΠ}(t0, tf)^{-1} + Π(t0, Kf, tf) ] ξ0.

The cost-to-go at time t is

(1/2) x(t)^T [ W_{A−BB^TΠ}(t, tf)^{-1} + Π(t, Kf, tf) ] x(t).

Finally, if L(t) is identically zero, we can pick K(t) identically zero and we recover the
previous result.
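To tie the pieces together, here is a scalar numerical sketch (all numbers assumed for illustration, not from the notes): it takes ẋ = ax + bu with running cost (1/2)∫(ℓx² + u²)dt and x(tf) = 0, solves (2.38) backward with Kf = 0 (cf. Remark 2.11 below), computes Φ_{A−BB^TΠ} via Exercise 2.21 and the transformed Gramian by quadrature, forms u∗, and checks both the terminal constraint and the optimal-cost formula above.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, l = 0.5, 1.0, 2.0                    # assumed scalar data
t0, tf, x0 = 0.0, 1.0, 1.0

# 1. Riccati equation (2.38), backward from K(tf) = Kf = 0
ksol = solve_ivp(lambda t, k: -2*a*k + b*b*k*k - l, (tf, t0), [0.0],
                 dense_output=True, rtol=1e-10, atol=1e-12)
k = lambda t: float(ksol.sol(t)[0])
abar = lambda t: a - b*b*k(t)              # transformed dynamics

# 2. phi(t) = Phi_{A-BB^T K}(t0, t), from d/dt phi = -phi*abar (Exercise 2.21)
psol = solve_ivp(lambda t, p: -p*abar(t), (t0, tf), [1.0],
                 dense_output=True, rtol=1e-10, atol=1e-12)
phi = lambda t: float(psol.sol(t)[0])

# 3. transformed Gramian, open-loop v (cf. (2.31)), and u*
ss = np.linspace(t0, tf, 4001); ds = ss[1] - ss[0]
Wbar = sum(phi(s)**2 * b*b for s in ss) * ds
v = lambda t: -b * phi(t) * x0 / Wbar
u = lambda t, x: v(t) - b*k(t)*x

# 4. simulate, then check terminal state and cost formula
xsol = solve_ivp(lambda t, x: a*x[0] + b*u(t, x[0]), (t0, tf), [x0],
                 dense_output=True, rtol=1e-10, atol=1e-12)
print("x(tf) ~", xsol.y[0, -1])            # ~ 0
J = 0.5 * sum(l*xsol.sol(s)[0]**2 + u(s, xsol.sol(s)[0])**2 for s in ss) * ds
print(J, 0.5 * x0**2 * (1.0/Wbar + k(t0)))  # the two should agree
```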


Exercise 2.16 Show that reachability of (2.1) on [t0, tf] implies invertibility of W_{A−BB^TΠ}(t0, tf), and vice-versa.

We obtain the block diagram depicted in Figure 2.3.⁸ Thus u∗(t) = B(t)^T p(t), with

[Figure 2.3: Optimal feedback law. Block diagram: exogenous input w0 = 0; inner feedback −B^T(t)Π(t, Kf, tf)x(t) and outer feedback −B^T(t)W_{A−BB^TΠ}(t, tf)^{-1}x(t) around ẋ(t) = A(t)x(t) + B(t)u(t).]

p(t) = −[ Π(t, Kf, tf) + W_{A−BB^TΠ}(t, tf)^{-1} ] x(t).

Remark 2.11 While v^{Kf} clearly depends on Kf, u∗ obviously cannot, since Kf is an arbitrary symmetric matrix (subject to DRE having a solution with K(tf) = Kf). Thus we could have assigned Kf = 0 throughout the analysis. Check the details of this.
The above is a valid closed-loop implementation as it does not involve the initial point (x0, t0) (indeed, perturbations may have affected the trajectory between t0 and the current time t). Π(t, Kf, tf) can be precomputed (again, we must assume that such a solution exists).
Also note that the optimal cost J(u∗) is given by (when xf = 0n)

J(u∗) = J̃(u∗) − (1/2)( xf^T Kf xf − x0^T Π(t0, Kf, tf) x0 )
      = (1/2)( x0^T W_{A−BB^TΠ}(t0, tf)^{-1} x0 + x0^T Π(t0, Kf, tf) x0 )

and is independent of Kf.
Finally, for all optimal control problems considered so far, the optimal control can be
expressed in terms of the adjoint variable (or co-state) p(·). More precisely, the following
holds.
⁸ If xf ≠ 0n then, on Figure 2.3, either replace w0 = 0 with

w0(t) = B^T(t)W_{A−BB^TΠ}(t, tf)^{-1} Φ_{A−BB^TΠ}(t, tf) xf,

or equivalently insert a summing junction immediately to the right of the bottom feedback block, adding −Φ_{A−BB^TΠ}(t, tf) xf as exogenous input.


Exercise 2.17 Consider the fixed terminal state problem with xf = 0n. Suppose that the controllability Gramian W(t0, tf) is non-singular and that the relevant Riccati equation has a (unique) solution Π(t, Kf, tf) on [t0, tf] with K(tf) = Kf. Let x(t) be the optimal trajectory and define p(t) by

p(t) = −[ Π(t, Kf, tf) + W_{A−BB^TΠ(t,Kf,tf)}(t, tf)^{-1} ] x(t),

so that the optimal control is given by

u∗(t) = B^T(t)p(t).

Prove that

[ẋ(t); ṗ(t)] = [ A(t), B(t)B^T(t) ; L(t), −A^T(t) ] [x(t); p(t)]

and that the optimal cost is −(1/2) x(t0)^T p(t0).

Exercise 2.18 Verify that a minor modification of Theorem 2.3 holds in the present case of
fixed terminal state, the only difference being that p∗ (tf ) is now free (which “compensates”
for x∗ (tf ) being known).

Exercise 2.19 Consider an objective function of the form

J(u) = ∫_{t0}^{tf} ( ϕ(x(t), t) + u(t)^T u(t) ) dt + ψ(x(tf)),

for some functions ϕ and ψ. Show that if an optimal control exists in U, its value u∗(t) must be in the range space of B^T for all t. (Assume that B does not vary with time.)

2.3 Free terminal state, constrained control values, linear terminal cost

Consider the linear system

(S)   ẋ(t) = A(t)x(t) + B(t)u(t),   a.e. t ∈ [t0, tf],        (2.41)

where A(·), B(·) are assumed to be continuous, and an (a.e. continuously differentiable) solution x is sought. The "a.e." (almost every) and continuity (of x) specifications are needed here because we will allow for discontinuous controls u.


Remark 2.12 When u is allowed to be discontinuous, “a.e.” is clearly needed. Consider


e.g., the very simple case of ẋ = u on [−1, 1], with x(−1) = 0, u(t) = 0 for t < 0, and
u(t) = 1 for t ≥ 0. The continuous “solution” is x(t) = 0 for t < 0 and x(t) = t for t ≥ 0,
but such x does not satisfy the differential equation for all t: indeed, x is not differentiable
at t = 0. It does satisfy the equation for almost every t though. As for the continuity
requirement, it renders the solution unique, for given initial state x(t0 ) = x0 . Indeed, the
continuous solution is the unique solution x (for given u) of the integral equation
x(t) = x0 + ∫_{t0}^t ( A(τ)x(τ) + B(τ)u(τ) ) dτ   ∀t,

which amounts to specifying that x must be the integral of its almost-everywhere derivative. Functions that have this property are termed absolutely continuous. Absolutely continuous functions are a superset of continuously differentiable functions and a subset of almost everywhere differentiable functions. Further, if u ∈ PC (see Definition 2.1 below) and x is continuous and satisfies (2.41) everywhere in [t0, tf] except possibly at (finitely many) points t at which u is discontinuous, then it must be the integral of the right-hand side, hence it is absolutely continuous. Hence, for those x that satisfy (2.41) with u ∈ PC, continuity implies absolute continuity.

From this point on, we will typically merely assume that the control function u belongs to the set PC of piecewise continuous functions, in the sense of the following definition.⁹

Definition 2.1 A function u : R → R^m belongs to PC if it is right-continuous¹⁰ and, for every (finite) a, b ∈ R with a < b, it is continuous on [a, b] except for possibly finitely many points of discontinuity, and has (finite) left and right limits at every point.

Throughout, given an initial time t0 and a terminal time tf, the set U of admissible controls is defined by

U := {u : [t0, tf] → R^m : u ∈ PC, u(t) ∈ U ∀t ∈ [t0, tf]},        (2.42)

where U ⊆ R^m is to be specified. The reason for not requiring that admissible controls be continuous is that, in many important cases, when U is not all of R^m, optimal controls are "naturally" discontinuous, i.e., the problem has no solution if minimization is carried out over the set of continuous functions.¹¹

Example 2.3 Consider the problem of bringing a point mass from rest at some point P to rest at some point Q in the least amount of time, subject to upper and lower bounds on the acceleration—equivalently, on the force applied. (For example, a car is stopped at a red light and the driver wants to get as early as possible to a state of rest at the next light.) Clearly, the best strategy is to use maximum acceleration up to some point, then switch instantaneously to maximum deceleration. This is an instance of a "bang-bang" control. If we insist that the acceleration must be a continuous function of time, the problem has no solution.

⁹ Everything in these notes remains valid, with occasionally some minor changes, if the continuity assumption on A and B is relaxed to piecewise continuity as well.
¹⁰ Many authors do not insist on right-continuity in the definition of piecewise continuity. The reason we do is that with such requirement Pontryagin's Maximum Principle holds for all t rather than for almost all t, and that moreover the optimal control will be unique, which is not the case without such assumption. Indeed, note that without a right- (or left-) continuity requirement, changing the value of an optimal control at, say, a single time point does not affect optimality.
¹¹ This is related to the fact that the space of continuous functions is not "complete" under, say, the L1 norm. See more on this in Appendix A.

(For another, explicit example, see Exercise 2.23 below.)

Given the constraint on the values of u (set U), contrary to the previous section, a linear cost function can now be meaningful. Accordingly, we start with the following problem. Let c ∈ R^n, c ≠ 0, x0 ∈ R^n and let tf ≥ t0 be a fixed time. Find a control u∗ ∈ U so as to minimize c^T x(tf) subject to

dynamics:  ẋ(t) = A(t)x(t) + B(t)u(t),   a.e. t ∈ [t0, tf],
initial condition:  x(t0) = x0,
final condition:  x(tf) ∈ R^n (no constraints),
control constraint:  u ∈ U,
x absolutely continuous.

Definition 2.2 The adjoint system to system

ẋ(t) = A(t)x(t)        (2.43)

is given by¹²

ṗ(t) = −A(t)^T p(t).

Let Φ(t, τ ) be the state transition matrix for (2.43).

Exercise 2.20 Show that the state transition matrix Ψ(t, τ) of the adjoint system is given by Ψ(t, τ) = Φ(τ, t)^T. Also show that, if ẋ(t) = A(t)x(t) and p solves the adjoint system, then x(t)^T p(t) is constant.

Exercise 2.21 Prove that (d/dt) Φ_A(t0, t) = −Φ_A(t0, t)A(t).

Notation: For any u ∈ PC, z ∈ R^n and t0 ≤ t1 ≤ t2 ≤ tf, let φ(t2, t1, z, u) denote the state at time t2 given that at time t1 the state is z and control u is applied, i.e., let

φ(t2, t1, z, u) = Φ(t2, t1)z + ∫_{t1}^{t2} Φ(t2, τ)B(τ)u(τ) dτ.

Also let

K(t2, t1, z) = {φ(t2, t1, z, u) : u ∈ U}.

This set is called the reachable set at time t2 (from (t1, z)).
¹² Note that for the present problem, L = 0 (no integral cost), and this equation is a special case of (2.19).


Theorem 2.8 Let u∗ ∈ U and let

x∗(t) = φ(t, t0, x0, u∗),   t0 ≤ t ≤ tf.

Let p∗(·) satisfy the adjoint equation

ṗ∗(t) = −A^T(t)p∗(t),   t0 ≤ t ≤ tf,

with terminal condition

p∗(tf) = −c.

Then u∗ is optimal if and only if

p∗(t)^T B(t)u∗(t) = sup{p∗(t)^T B(t)v : v ∈ U}   ∀t ∈ [t0, tf)        (2.44)

(implying that the "sup" is achieved for all t ∈ [t0, tf)). [Note that the optimization in (2.44) is over a finite-dimensional space.]

Proof. u∗ is optimal if and only if, for all u ∈ U,

c^T [ Φ(tf, t0)x0 + ∫_{t0}^{tf} Φ(tf, τ)B(τ)u∗(τ) dτ ] ≤ c^T [ Φ(tf, t0)x0 + ∫_{t0}^{tf} Φ(tf, τ)B(τ)u(τ) dτ ]

or, equivalently,

∫_{t0}^{tf} (Φ(tf, τ)^T c)^T B(τ)(u∗(τ) − u(τ)) dτ ≤ 0.

As pointed out above, for p∗(t) as defined,

p∗(t) = Φ(tf, t)^T p∗(tf) = −Φ(tf, t)^T c,

so that u∗ is optimal if and only if, for all u ∈ U,

∫_{t0}^{tf} p∗(τ)^T B(τ)(u∗(τ) − u(τ)) dτ ≥ 0,

and the 'if' direction of the theorem follows immediately. Suppose now u∗ ∈ U is optimal. We show that (2.44) is satisfied for all t ∈ [t0, tf). Indeed, if this is not the case, there exist t∗ ∈ [t0, tf) and v ∈ U such that

p∗(t∗)^T B(t∗)u∗(t∗) < p∗(t∗)^T B(t∗)v.

By right-continuity of u∗ (and of p∗ and B), there exists δ > 0 such that this inequality holds for all t ∈ [t∗, t∗ + δ]. Define ũ ∈ U by (a "needle" perturbation)

ũ(t) = v for t∗ ≤ t < t∗ + δ,   ũ(t) = u∗(t) otherwise.

Then

∫_{t0}^{tf} p∗(t)^T B(t)u∗(t) dt < ∫_{t0}^{tf} p∗(t)^T B(t)ũ(t) dt,

which contradicts optimality of u∗.


Corollary 2.2 For t0 ≤ t ≤ tf,

p∗(t)^T x∗(t) ≥ p∗(t)^T ξ   ∀ξ ∈ K(t, t0, x0).        (2.45)

Exercise 2.22 Prove the corollary.

Because the problem under consideration has no integral cost, the pre-Hamiltonian H defined in section 2.1.1 reduces to

H(τ, ξ, η, υ) = η^T (A(τ)ξ + B(τ)υ),

and the Hamiltonian H to

H(τ, ξ, η) = sup{H(τ, ξ, η, v) : v ∈ U}.

Since p^T A(t)x does not involve the variable u, condition (2.44) can be written as

H(t, x∗(t), p∗(t), u∗(t)) = H(t, x∗(t), p∗(t))   ∀t.        (2.46)
This is another instance of Pontryagin’s Principle. The previous theorem states that, for
linear systems with linear objective functions, Pontryagin’s Principle provides a necessary
and sufficient condition of optimality.
Remark 2.13
1. Let ψ : Rn → R be the “terminal cost” function, defined by ψ(x) = cT x. Then
p∗ (tf ) = −∇ψ(x∗ (tf )), just like in the case of the problem of section 2.1.
2. Linearity in u was not used. Thus the result (and proof) apply to systems with
dynamics of the form
ẋ(t) = A(t)x(t) + B(t, u(t))
where B(·, ·) is, say, a continuous function.
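For constant A, B and U = [−1, 1]^m, condition (2.44) is easy to evaluate: since p∗(t)^T B u is linear in u, the sup over the box is attained componentwise at u∗_j(t) = sgn((B^T p∗(t))_j) (when a component vanishes, any admissible value attains the sup). A sketch with illustrative, assumed data (a double integrator, not an example from the notes):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])     # assumed: double integrator
B = np.array([[0.0], [1.0]])
c = np.array([1.0, 1.0])
t0, tf = 0.0, 1.0

def p_star(t):                  # p*(t) = Phi(tf,t)^T p*(tf), with p*(tf) = -c
    return expm(A.T*(tf - t)) @ (-c)

def u_star(t):                  # maximizes p*(t)^T B v over v in [-1, 1]
    return np.sign(B.T @ p_star(t))

for t in np.linspace(t0, tf, 5):
    print(f"t = {t:.2f}, B^T p*(t) = {float(B.T @ p_star(t)):+.3f}, "
          f"u*(t) = {u_star(t)}")
```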

Exercise 2.23 Compute the optimal control u∗ for the following time-invariant data: t0 = 0,
tf = 1, A = diag(1, 2), B = [1; 1], c = [−2; 1], U = [−1, 1]. Note that u∗ is not continuous!
(Indeed, this is an instance of a “bang-bang” control.)

Fact. Let A and B be constant matrices, and suppose there exists an optimal control

u∗ , with corresponding trajectory x∗ . Then m(t) = H(t, x∗ (t), p∗ (t)) is constant (i.e., the
Hamiltonian is constant along the optimal trajectory).
Exercise 2.24 Prove the fact under the assumption that U = [α, β]. (The general case will
be considered in Chapter 6.)

Exercise 2.25 Suppose U = [α, β], so that B(t) is an n × 1 matrix. Suppose that A(t) = A and B(t) = B are constant matrices and A has n distinct real eigenvalues. Show that there is an optimal control u∗ and t0 = τ0 ≤ τ1 ≤ · · · ≤ τn = tf (n = dimension of x) such that u∗(t) = α or β on [τi, τi+1), 0 ≤ i ≤ n − 1. [Hint: first show that p∗(t)^T B = γ1 exp(δ1 t) + · · · + γn exp(δn t) for some δi, γi ∈ R. Then use appropriate induction.]


2.4 More general optimal control problems


We have shown how to solve optimal control problems where the dynamics are linear, the
objective function is quadratic, and the constraints are of a very simple type (fixed initial
point, fixed terminal point). In most problems of practical interest, though, one or more of
the following features is present.

(i) nonlinear dynamics and objective function

(ii) constraints on the control or state trajectories, e.g., u(t) ∈ U ∀ t, where U ⊂ Rm

(iii) more general constraints on the initial and terminal state, e.g., g(x(tf )) ≤ 0.

To tackle such more general optimization problems, we will make use of additional mathe-
matical machinery. We first proceed to develop such machinery.



Chapter 3

Dynamic Programming

Much of the material in this chapter is borrowed from [?].


The dynamic-programming approach to optimal control compares candidate solutions to
all controls, yielding global solutions, even in the absence of linearity/convexity properties.
This is in contrast with the Pontryagin-Principle approach, which compares them to controls
that yield nearby trajectories and hence, in the absence of appropriate linearity/convexity
properties, tends to yield mere local solutions. (More on this in Chapter 6.) The price for
this is that dynamic programming tends to be computationally rather demanding, in many cases prohibitively so. An advantage of dynamic programming is that, at an introductory level,
it does not require as much mathematical machinery as Pontryagin’s Principle does. This
motivates its introduction at an early point in this course. We first consider discrete-time
problems, then turn to continuous time.

3.1 Discrete time

See, e.g., [?, ?, ?].

Let X ⊂ R^n and U ⊂ R^m and, for i = 0, . . . , N − 1, N a positive integer, let L(i, ·, ·) : X × U → R and f(i, ·, ·) : X × U → R^n; also let ψ : R^n → R be given, and let u := {u0, . . . , u_{N−1}} ∈ R^{Nm} designate (finite) sequences of control values. No regularity is assumed. For a given x0 ∈ X, consider the problem

minimize_{u∈R^{Nm}}  J(u) := ∑_{i=0}^{N−1} L(i, xi, ui) + ψ(xN)
s.t.  x_{i+1} = f(i, xi, ui),  i = 0, . . . , N − 1,  x0 fixed,
      ui ∈ U,  i = 0, . . . , N − 1,        (P)
      xi ∈ X,  i = 1, . . . , N.


The key idea is to embed problem (P) into a family of problems, with all possible initial times k ∈ {0, . . . , N − 1} and initial conditions ξ ∈ X:

minimize_u  Jk(u) := ∑_{i=k}^{N−1} L(i, xi, ui) + ψ(xN)
s.t.  x_{i+1} = f(i, xi, ui),  i = k, . . . , N − 1,  xk = ξ,
      ui ∈ U,  i = k, . . . , N − 1,        (P_{k,ξ})
      xi ∈ X,  i = k + 1, . . . , N.

The cornerstone of dynamic programming is Bellman’s Principle of Optimality, a form of


which is given in the following lemma. (R.E. Bellman, 1920–1984.)

Lemma 3.1 (Bellman’s Principle of Optimality) Suppose u∗k , . . . , u∗N −1 is optimal for (Pk,ξ )
with associated state trajectory x∗k = ξ, x∗k+1 , . . . , x∗N . Then, if ℓ ∈ {k, . . . , N − 1},
{u∗ℓ , u∗ℓ+1, . . . , u∗N −1 } is optimal for (Pℓ,x∗ℓ ).

Proof. Suppose not. Then there exists û := (ûℓ, . . . , û_{N−1}), with corresponding trajectory x̂ℓ, x̂_{ℓ+1}, . . . , x̂N, with x̂ℓ = x∗ℓ, such that

∑_{i=ℓ}^{N−1} L(i, x̂i, ûi) + ψ(x̂N) < ∑_{i=ℓ}^{N−1} L(i, x∗i, u∗i) + ψ(x∗N).        (3.1)

Consider then the control ũ := (ũk, . . . , ũ_{N−1}) given by

ũi = u∗i,  i = k, . . . , ℓ − 1;   ũi = ûi,  i = ℓ, . . . , N − 1,

and the corresponding trajectory x̃k = ξ, x̃_{k+1}, . . . , x̃N, with

x̃i = x∗i,  i = k, . . . , ℓ;   x̃i = x̂i,  i = ℓ + 1, . . . , N.

The value of the objective function corresponding to this control is

Jk(ũ) = ∑_{i=k}^{N−1} L(i, x̃i, ũi) + ψ(x̃N)
      = ∑_{i=k}^{ℓ−1} L(i, x∗i, u∗i) + ∑_{i=ℓ}^{N−1} L(i, x̂i, ûi) + ψ(x̂N)
      < ∑_{i=k}^{ℓ−1} L(i, x∗i, u∗i) + ∑_{i=ℓ}^{N−1} L(i, x∗i, u∗i) + ψ(x∗N)        (3.2)
      = ∑_{i=k}^{N−1} L(i, x∗i, u∗i) + ψ(x∗N)


where (3.1) was invoked. Hence ũ yields a lower cost than u∗, a contradiction.

Let V again denote the optimal value function, i.e., for ξ ∈ X, V(N, ξ) = ψ(ξ) and, for k ∈ {0, . . . , N − 1},

V(k, ξ) = inf_{ui∈U ∀i} Jk(u)   s.t.  x_{i+1} = f(i, xi, ui),  i = k, . . . , N − 1,  xi ∈ X,  i = k + 1, . . . , N,  xk = ξ.

Theorem 3.1 (Dynamic programming) Given k ∈ {0, . . . , N − 1} and ξ ∈ X, the following holds:

V(k, ξ) = inf{L(k, ξ, v) + V(k + 1, f(k, ξ, v)) : v ∈ U, f(k, ξ, v) ∈ X},        (3.3)

and v ∈ U is a minimizer for (3.3) if and only if there exists (u_{k+1}, . . . , u_{N−1}) such that (v, u_{k+1}, . . . , u_{N−1}) is optimal for (P_{k,ξ}).

Proof. Let

F(k, ξ, v) := L(k, ξ, v) + V(k + 1, f(k, ξ, v)).

Let u∗i, i = k, . . . , N − 1, be optimal for (P_{k,ξ}), with x∗i, i = k + 1, . . . , N, the corresponding state trajectory. Further, let ûk := v ∈ U, with f(k, ξ, v) ∈ X, be such that no control sequence of the form (v, u_{k+1}, . . . , u_{N−1}) is optimal for (P_{k,ξ}); let (û_{k+1}, . . . , û_{N−1}) be optimal for (P_{k+1, f(k,ξ,v)}), and let x̂i, i = k + 1, . . . , N, be the state trajectory generated by (ûk, . . . , û_{N−1}). We show that V(k, ξ) = F(k, ξ, u∗k) < F(k, ξ, v), proving both claims. Indeed,

V(k, ξ) = L(k, ξ, u∗k) + ∑_{i=k+1}^{N−1} L(i, x∗i, u∗i) + ψ(x∗N) = L(k, ξ, u∗k) + V(k + 1, f(k, ξ, u∗k)) = F(k, ξ, u∗k)        (3.4)

and

V(k, ξ) < L(k, ξ, v) + ∑_{i=k+1}^{N−1} L(i, x̂i, ûi) + ψ(x̂N) = L(k, ξ, v) + V(k + 1, f(k, ξ, v)) = F(k, ξ, v),

where in both instances the penultimate equality follows from the Principle of Optimality.

Definition 3.1 A function φ : {0, 1, . . . , N − 1} × X → Rm is an optimal control law if, for


all k ∈ {0, 1, . . . , N − 1} and all ξ ∈ X, φ(k, ξ) ∈ U and

V (k, ξ) = L(k, ξ, φ(k, ξ)) + V (k + 1, f (k, ξ, φ(k, ξ))).

Corollary 3.1 Let φ : {0, 1, . . . , N − 1} × X → R^m. Then φ is an optimal control law if and only if, for all k ∈ {0, 1, . . . , N − 1} and all ξ ∈ X,

L(k, ξ, φ(k, ξ)) + V(k + 1, f(k, ξ, φ(k, ξ))) = min_{v∈U} {L(k, ξ, v) + V(k + 1, f(k, ξ, v))}.        (3.5)


How can this result be used? First note that V(N, ξ) = ψ(ξ) for all ξ; then, for k = N − 1, N − 2, . . . , 0 (i.e., "backward in time"), solve (3.5) to obtain, for each ξ, φ(k, ξ) and then V(k, ξ).
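The sketch below implements this backward recursion on finite grids; the scalar dynamics, costs, and grids are illustrative assumptions, and the interpolation of V(k+1, ·) between grid points (with clamping at the grid boundary) is a crude approximation, since the exact recursion requires f(k, ξ, v) ∈ X.

```python
import numpy as np

N = 20
X = np.linspace(-2.0, 2.0, 81)             # state grid (the set X)
U = np.linspace(-1.0, 1.0, 41)             # control grid (the set U)
f = lambda k, x, u: x + 0.1*(x + u)        # assumed dynamics f(k, x, u)
L = lambda k, x, u: 0.1*(x**2 + u**2)      # assumed stage cost L(k, x, u)
psi = lambda x: x**2                       # assumed terminal cost

V = {N: psi(X)}                            # V(N, .) = psi(.)
policy = {}
for k in range(N - 1, -1, -1):             # backward in time
    Vk = np.empty_like(X)
    uk = np.empty_like(X)
    for j, x in enumerate(X):
        xnext = f(k, x, U)                 # next state for each candidate u
        q = L(k, x, U) + np.interp(xnext, X, V[k + 1])
        Vk[j], uk[j] = q.min(), U[q.argmin()]
    V[k], policy[k] = Vk, uk               # value and control law at time k

print(V[0][np.searchsorted(X, 0.5)])       # approximate V(0, 0.5)
```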

Remark 3.1
1. No regularity assumptions were made.
2. It must be stressed that, indeed, since for a given initial condition x(0) = x0 the
optimal trajectory is not known a priori, V (k, ξ) must be computed for all ξ at every
time k > 0 (or at least for ξ equal to every “possibly optimal” xk ). Practically, such
computation is to be carried out over a grid of values ξ ∈ X; with d values for each
coordinate of ξ, d^n optimizations would have to be carried out at each step k, i.e., an exponential number of such calculations in terms of the state dimension n.

Exercise 3.1 (From [?].) Apply dynamic programming to solve the discrete-time, free terminal state linear quadratic regulator problem, with cost function

∑_{i=0}^{N−1} ( xi^T Li xi + ui^T ui ) + xN^T Q xN,

where Q and Li, i = 0, . . . , N − 1, are symmetric and positive semidefinite. The dynamics are

x_{i+1} = Ai xi + Bi ui,   i = 0, . . . , N − 1.

(The Riccati equation is now of course a difference equation. Its right-hand side is a bit more complicated than before.)
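For this problem, the backward recursion (3.5) can be carried out in closed form: V(k, ξ) = ξ^T P_k ξ for suitable matrices P_k. The sketch below implements the standard discrete-time Riccati difference recursion (which the exercise asks you to derive) with illustrative, assumed data:

```python
import numpy as np

N = 20
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # assumed A_i (constant here)
Bm = np.array([[0.0], [0.1]])              # assumed B_i
Lm = 0.1 * np.eye(2)                       # assumed L_i
Q = np.eye(2)

P = Q                                      # P_N = Q, so V(N, x) = x^T Q x
for k in range(N - 1, -1, -1):             # backward in time
    S = np.eye(1) + Bm.T @ P @ Bm
    K = np.linalg.solve(S, Bm.T @ P @ A)   # optimal law u_k = -K x_k
    P = Lm + A.T @ P @ A - A.T @ P @ Bm @ K

x0 = np.array([1.0, 0.0])
print("optimal cost =", x0 @ P @ x0)       # V(0, x0) = x0^T P_0 x0
```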

3.2 Continuous time


See [?, ?].

In this section, for a differentiable mapping F, we denote by Di F the partial Fréchet derivative of F with respect to its ith argument. Similarly, for a scalar function V, we use ∇i V for its gradient with respect to its ith argument.
Given t0 and tf, with t0 < tf, and ξ ∈ R^n, consider the problem (P_{τ,ξ}):

minimize ∫_τ^{tf} f0(t, x(t), u(t)) dt + ψ(x(tf))
s.t.  ẋ(t) = f(t, x(t), u(t)),  a.e. t ∈ [τ, tf],
      x(τ) = ξ,
      u ∈ U,  x absolutely continuous.
We impose the following regularity conditions, which are sufficient for existence and unique-
ness of an absolutely continuous solution to the differential equation and well-definedness of
the objective function.


(i) for each t ∈ [t0 , tf ], f (t, ·, ·) and f0 (t, ·, ·) are continuously differentiable on Rn × Rm ,
and ψ is continuously differentiable.

(ii) f , D2 f , D3f are continuous on [t0 , tf ] × Rn × Rm .

(iii) for every α ∈ R, there exist β, γ such that

||f(t, ξ, v)|| ≤ β + γ||ξ||   ∀t ∈ [t0, tf], ξ ∈ R^n, v ∈ R^m with ||v|| ≤ α.

Also, for every τ ∈ [t0, tf) and ξ ∈ R^n consider the problem (P_{τ,ξ})

minimize Jτ(u) := ∫_τ^{tf} f0(t, x(t), u(t)) dt + ψ(x(tf))
s.t.  ẋ(t) = f(t, x(t), u(t)),  a.e. t ∈ [τ, tf],  x(τ) = ξ,  u ∈ U,  x absolutely continuous.

Assumption. (Pτ,ξ ) has an optimal control for all τ , ξ.


(Again, see, e.g., [?], for a derivation without such assumption; the resulting HJB equation
then involves an “inf” instead of the “min”.)
As before, let V(τ, ξ) be the minimum value of the objective function, starting from state ξ at time τ < tf; in particular V(tf, ξ) = ψ(ξ). An argument similar to that used in the discrete-time case yields, for any ∆ ∈ (0, tf − τ),

V(τ, ξ) = inf { ∫_τ^{τ+∆} f0(t, x(t), u(t)) dt + V(τ + ∆, x(τ + ∆)) : u ∈ U },        (3.6)

subject to

ẋ(t) = f(t, x(t), u(t)),  a.e. t ∈ [τ, tf],  x absolutely continuous,  x(τ) = ξ.
(Existence of an optimal control is not assumed nor guaranteed at this point.)


The idea is then to differentiate both sides with respect to ∆ (the left-hand side being constant), at ∆ = 0, thus obtaining a differential equation. (This, of course, cannot be done in the discrete-time case.) This requires a regularity assumption on V(·, ·).
Assumption. V is continuously differentiable (jointly in both arguments).
Now, for given τ ∈ [t0 , tf ], let v ∈ U, let ũ ∈ U satisfy

ũ(t) = v ∀t ∈ [τ, τ + ∆],

and let x̃ be the corresponding state trajectory, with x̃(τ ) = ξ. Next, motivated by (3.6),
define ϕv : [0, tf − τ ) → R by
Z τ +∆
ϕv (∆) := V (τ + ∆, x̃(τ + ∆)) − V (τ, ξ) + f0 (t, x̃(t), ũ(t))dt.
τ

Then ϕv (0) = 0 and, from (3.6),

ϕv (∆) ≥ 0 ∀∆ ∈ [0, tf − τ ].


Further, ϕv is continuously right-differentiable on (0, tf − τ), so that its right-derivative ϕ′v(0) at ∆ = 0 is non-negative, i.e., for all ξ ∈ R^n, v ∈ U, and all τ < tf (since ũ(τ) = v),

ϕ′v(0) = D1 V(τ, ξ) + D2 V(τ, ξ) f(τ, ξ, v) + f0(τ, ξ, v) ≥ 0.        (3.7)

Now let u∗ ∈ U be optimal for (P_{τ,ξ}). Then, from Bellman's Principle of Optimality (u∗ optimal on (τ, τ + ∆)), for every ∆ ∈ (0, tf − τ), the function ϕ∗ : [0, tf − τ] → R given by

ϕ∗(∆) := V(τ + ∆, x∗(τ + ∆)) − V(τ, ξ) + ∫_τ^{τ+∆} f0(t, x∗(t), u∗(t)) dt

is identically zero, so that (since u∗ is right-continuous at t = τ), for all ξ ∈ R^n, τ ∈ [t0, tf),

ϕ′∗(0) = D1 V(τ, ξ) + D2 V(τ, ξ) f(τ, ξ, u∗(τ)) + f0(τ, ξ, u∗(τ)) = 0.        (3.8)
From (3.7) and (3.8), and since u∗(τ) ∈ U for all τ ∈ [t0, tf), we get

min_{v∈U} { D1 V(τ, ξ) + D2 V(τ, ξ) f(τ, ξ, v) + f0(τ, ξ, v) } = 0   ∀τ ∈ [t0, tf], ∀ξ ∈ R^n.

We summarize the above in a formal statement, with "min" replaced with "inf"; such a statement holds regardless of whether or not an optimal control exists.

Theorem 3.2 Suppose that the value function V is continuously differentiable. Then, for all τ ∈ [t0, tf], ξ ∈ R^n,

D1 V(τ, ξ) + inf_{v∈U} { f0(τ, ξ, v) + D2 V(τ, ξ) f(τ, ξ, v) } = 0,        (HJB)
V(tf, ξ) = ψ(ξ).

This partial differential equation for V(·, ·) is known as the Hamilton–Jacobi–Bellman equation (W.R. Hamilton, Irish mathematician, 1805–1865; K.G.J. Jacobi, German mathematician, 1804–1851; R.E. Bellman, American mathematician, 1920–1984). Note that minimization is now over (a subset of) R^m rather than over (a subset of) the function space U.
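As a sanity check of (HJB), consider a scalar linear-quadratic instance, ẋ = ax + bu with f0(t, x, u) = (ℓx² + u²)/2 and ψ(x) = qf x²/2 (all numbers below are illustrative assumptions). The quadratic guess V(t, x) = k(t)x²/2 satisfies (HJB) exactly when k solves a Riccati ODE, and the residual can be checked numerically:

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b, l, qf, tf = 0.5, 1.0, 2.0, 1.0, 1.0  # assumed data

# With V(t,x) = k(t) x^2 / 2, the inf over v in (HJB) is attained at
# v = -b k(t) x, and (HJB) reduces to kdot = b^2 k^2 - 2 a k - l, k(tf) = qf.
ksol = solve_ivp(lambda t, k: b*b*k*k - 2*a*k - l, (tf, 0.0), [qf],
                 dense_output=True, rtol=1e-10, atol=1e-12)
k = lambda t: float(ksol.sol(t)[0])
kdot = lambda t: b*b*k(t)**2 - 2*a*k(t) - l

def hjb_residual(t, x):
    v = -b * k(t) * x                      # minimizer of the bracket
    D1V = 0.5 * kdot(t) * x**2
    D2V = k(t) * x
    return D1V + 0.5*(l*x**2 + v**2) + D2V*(a*x + b*v)

for (t, x) in [(0.0, 1.0), (0.3, -2.0), (0.9, 0.5)]:
    print(hjb_residual(t, x))              # ~ 0 at every (t, x)
```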
Now, for all τ, ξ, η, v, define H : R × R^n × R^n × R^m → R and H : R × R^n × R^n → R by

H(τ, ξ, η, v) = −f0(τ, ξ, v) + η^T f(τ, ξ, v)        (3.9)

and

H(τ, ξ, η) = sup_{v∈U} H(τ, ξ, η, v).

Then (HJB) can be written as

D1 V(τ, ξ) = H(τ, ξ, −∇2 V(τ, ξ)),
V(tf, ξ) = ψ(ξ).

Indeed, from (HJB) we have

D1 V(τ, ξ) = sup_{υ∈U} ( −f0(τ, ξ, υ) − D2 V(τ, ξ) f(τ, ξ, υ) ) = sup_{υ∈U} H(τ, ξ, −∇2 V(τ, ξ), υ) = H(τ, ξ, −∇2 V(τ, ξ)).

We now show that (HJB) is also a sufficient condition of optimality, i.e., that if V (·, ·)
satisfies (HJB), it must be the value function. (In discrete time this was obvious since the
solution to (3.3) was clearly unique, given the final condition V (N, ξ) = ψ(ξ).) Furthermore,
obtaining V by solving (HJB) yields an optimal control in feedback form.


Theorem 3.3 (Sufficient condition of optimality) Suppose there exists V (·, ·), continuously
differentiable, satisfying (HJB) together with the boundary condition. Further suppose that
the ‘inf ’ in (HJB) is attained for all (τ, ξ) (i.e., inf=min) and that there exists φ(·, ·), with
values in U, piecewise continuous in the first argument and Lipschitz in the second argument,
such that φ(τ, ξ) is a minimizer for all τ, ξ, so that V satisfies

D1 V (τ, ξ) + D2 V (τ, ξ)f (τ, ξ, φ(τ, ξ)) + f0 (τ, ξ, φ(τ, ξ)) = 0 ∀τ, ξ. (3.10)

Then φ is an optimal feedback law, i.e., the control law u(t) := φ(t, x(t)) generates an
optimal control û, and V is the value function. Further, together with the associated optimal
trajectory x̂, û satisfies

D1 V (t, x̂(t)) + D2 V (t, x̂(t))f (t, x, û(t)) + f0 (t, x̂(t), û(t)) = 0 ∀t. (3.11)

Proof. (Idea: Carefully integrate back what we differentiated.) Let τ < tf and ξ ∈ Rn be
arbitrary.
1. Let u ∈ U, yielding x(·) with x(τ) = ξ (initial condition), i.e.,

ẋ(t) = f(t, x(t), u(t)),  a.e. t ∈ [τ, tf],  x absolutely continuous,  x(τ) = ξ.

Since V satisfies (HJB) and u(t) ∈ U for all t, we have

−D1 V (t, x(t)) ≤ f0 (t, x(t), u(t)) + D2 V (t, x(t))f (t, x(t), u(t)) ∀t ∈ [τ, tf ),

which yields (as usual, V̇ denotes the total time derivative of V)

V̇ (t, x(t)) + f0 (t, x(t), u(t)) ≥ 0 a.e. t ∈ [τ, tf ].

Since V is continuously differentiable and x absolutely continuous, and since V (tf , x(tf )) =
ψ(x(tf )), integration of both sides from τ to tf yields
ψ(x(tf)) − V(τ, ξ) + ∫_τ^{tf} f0(t, x(t), u(t)) dt ≥ 0,

i.e.,

V(τ, ξ) ≤ ∫_τ^{tf} f0(t, x(t), u(t)) dt + ψ(x(tf))   (= Jτ(u)).        (3.12)
Since u ∈ U is arbitrary, this implies that

V (τ, ξ) ≤ Jτ (u) ∀u ∈ U.

2. To show that V is the value function and that φ is an optimal feedback law indeed, it
now suffices to show that the control signal produced by φ achieves equality in (3.12). Thus
let x∗ (·) be the unique (due to the assumptions on φ) absolutely continuous function that
satisfies

ẋ∗ (t) = f (t, x∗ (t), φ(t, x∗ (t))) a.e. t ∈ [τ, tf ]


x∗ (τ ) = ξ,


and let

u∗(t) = φ(t, x∗(t)).

Then, proceeding as above and using (3.10), we get

V(τ, ξ) = ∫_τ^{tf} f0(t, x∗(t), u∗(t)) dt + ψ(x∗(tf)),

showing optimality of control law φ and, by the same token, of any control signal that satisfies (3.11), and showing that V is indeed the value function. (We leave out the proof that, given the regularity assumptions on the data, φ satisfies the regularity conditions in the statement of the theorem.)

Remark 3.2 (HJB) is a partial differential equation. Thus, just like that of discrete-time
dynamic programming, its solution is prohibitively CPU-demanding when n is large (curse
of dimensionality).

Exercise 3.2 Solve the free end-point linear quadratic regulator problem using HJB (see
Example 6.3). Is V (t, x) always well-defined for all t ∈ [t0 , tf ) and x ∈ Rn ?

Finally, we derive an instance of Pontryagin’s Principle for continuous-time problem (P),


under the additional (strong) assumption that the value function V is twice continuously
differentiable. We aim for a necessary condition.
Assumption. V is twice continuously differentiable.

Theorem 3.4 Let u∗ ∈ U be an optimal control for (Pt0 ,x0 ), and x∗ be the corresponding
state trajectory. Suppose the value function V is twice continuously differentiable. Then
Pontryagin’s Principle holds, with

p∗ (t) := −∇2 V (t, x∗ (t)), (3.13)

an absolutely continuous function.

Proof. Absolute continuity (indeed, continuous differentiability) of p∗ is implied by twice


continuous differentiability of V . Next,

p∗ (tf ) = −∇2 V (tf , x∗ (tf )) = −∇ψ(x∗ (tf ))

and, from (3.8) (recall that u∗ is right-continuous),

H(t, x∗ (t), p∗ (t), u∗(t)) = H(t, x∗ (t), p∗ (t)) ∀t.

It remains to show that ṗ∗ (t) = −∇2 H(t, x∗ (t), p∗ (t), u∗ (t)). Since V satisfies (HJB), we
have, for all (τ, ξ) and for all υ ∈ U,

G(τ, ξ, υ) := D1 V (τ, ξ) − H(τ, ξ, −∇2V (τ, ξ), υ) ≥ 0 (3.14)


and in particular, since u∗ (t) ∈ U for all t,

G(τ, ξ, u∗(t)) ≥ 0 ∀τ, ξ.

Now, (3.8) yields


G(t, x∗ (t), u∗ (t)) = 0 ∀t
so that
G(t, x∗(t), u∗(t)) = min_{x∈R^n} G(t, x, u∗(t))

and
D2 G(t, x∗ (t), u∗ (t)) = 0 ∀t. (3.15)
Now, since V is twice continuously differentiable (so that D1∇2V = ∇2D1V), the chain rule gives, for all (τ, ξ, υ),

D2 G(τ, ξ, υ) = D2 D1 V(τ, ξ) − D2 H(τ, ξ, −∇2 V(τ, ξ), υ) + f(τ, ξ, υ)^T ∇2∇2 V(τ, ξ),

and hence, using (3.15), definition (3.13) of p∗, and the fact that (d/dt)∇2V(t, x∗(t)) = ∇2 D1 V(t, x∗(t)) + ∇2∇2 V(t, x∗(t)) f(t, x∗(t), u∗(t)) = −ṗ∗(t),

−ṗ∗(t)^T − D2 H(t, x∗(t), p∗(t), u∗(t)) = 0   ∀t,

completing the proof.

Exercise 3.3 Consider the optimal control problem, with scalar x and u,

minimize x(1) s.t. ẋ(t) = x(t)u(t) a.e., |u(t)| ≤ 1 ∀t ∈ [0, 1],

where u ∈ PC and x is absolutely continuous. Obtain (by “inspection”) the value function
V (t, x), for t ∈ [0, 1], x ∈ R. Verify that it is not everywhere differentiable with respect to
x (hence is not a solution to (HJB)). [A fortiori, V is not twice continuously differentiable,
and the derivation of Pontryagin’s Principle given in Theorem 3.4 is not valid. The Principle
itself does hold though, as we'll see down the road. This situation is typical in problems with nonlinear (in (x, u)) dynamics that are linear in u for fixed x, with simple bounds on u, and with terminal cost; such problems are commonplace in engineering applications.]



Chapter 4

Unconstrained Optimization

References: [?, ?].

4.1 First order condition of optimality


We consider the problem

min{f (x) : x ∈ V } (4.1)

where V is a normed vector space and f : V → R.

Remark 4.1 This problem is technically very similar to the problem

min{f (x) : x ∈ Ω} (4.2)

where Ω is an open set in V , as shown in the following exercise.

Exercise 4.1 Suppose Ω ⊂ V is open. Prove carefully, using the definitions given earlier,
that x̂ is a local minimizer for (4.2) if and only if x̂ is a local minimizer for (4.1) and x̂ ∈ Ω.
Further, prove that the “if ” direction is true for general (not necessarily open) Ω, and show
by exhibiting a simple counterexample that the “only if ” direction is not.

Now suppose f is (Fréchet) differentiable (see Appendix B). (In fact, many of the results
we obtain below hold under milder assumptions.) We next obtain a first order necessary
condition for optimality.

Theorem 4.1 Let f be differentiable. Suppose x̂ is a local minimizer for (4.1). Then
Df (x̂) = 0B(V,R) .

Proof. Since x̂ is a local minimizer for (4.1), there exists ǫ > 0 such that

f (x̂ + h) ≥ f (x̂) ∀h ∈ B(0V , ǫ) (4.3)


Since f is Fréchet-differentiable, we have, for all h ∈ V ,


f (x̂ + h) = f (x̂) + Df (x̂)h + o(h) (4.4)
with o(h)/||h|| → 0 as h → 0V. Hence, from (4.3), whenever h ∈ B(0V, ǫ),

Df (x̂)h + o(h) ≥ 0 (4.5)


or, equivalently,
Df (x̂)(td) + o(td) ≥ 0 ∀d ∈ B(0V , 1), ∀t ∈ (0, ǫ] (4.6)
and, dividing by t, for t ≠ 0,

Df(x̂)d + o(td)/t ≥ 0   ∀d ∈ B(0V, 1), ∀t ∈ (0, ǫ].        (4.7)

It is easy to show (see exercise below) that o(td)/t → 0 as t → 0. Hence, letting t → 0 in (4.7),
we get

Df(x̂)d ≥ 0   ∀d ∈ B(0V, 1).

Since d ∈ B(0V, 1) implies −d ∈ B(0V, 1), we also have Df(x̂)(−d) ≥ 0, thus

Df(x̂)d ≤ 0   ∀d ∈ B(0V, 1).

Hence

Df(x̂)d = 0   ∀d ∈ B(0V, 1),

which implies (since Df(x̂) is linear)

Df(x̂)d = 0   ∀d ∈ V,

i.e., Df(x̂) = 0B(V,R).

Remark 4.2 The same optimality condition can be established (with essentially the same
proof) under less restrictive assumptions, specifically mere G-differentiability of f at x̂ (which
does not require a norm on V ) and mere “weak” local optimality of x̂: For every h ∈ V ,
there exists ǫh > 0 such that f (x̂) ≤ f (x̂ + th) for all t ≤ ǫh .

Exercise 4.2 If o(·) is such that o(h)/||h|| → 0 as h → 0V then, for any fixed d ≠ 0V, o(td)/t → 0 as t → 0, t ∈ R.
→0
as t → 0 t ∈ R.
Remark 4.3 The optimality condition above, like several other conditions derived in this
course, is only a necessary condition, i.e., a point x satisfying this condition need not be
optimal, even locally. However, if there is an optimal x, it has to be among those which satisfy
the optimality condition. Also it is clear that this optimality condition is also necessary for
a global minimizer (since a global minimizer is also a local minimizer). Hence, if a global
minimizer is known to exist, it must be, among the points satisfying the optimality condition,
the one with minimum value of f .


Suppose now that we want to find a minimizer. Solving Df(x̂) = 0B(V,R) for x̂ (n nonlinear equations in n unknowns, when V = R^n) is usually hard and could well yield maximizers or other stationary points instead of minimizers. A central idea is, given an "initial guess" x̃, to determine a descent direction, i.e., a direction in which, starting from x̃, f decreases (at least for small enough displacements). Thus suppose Df(x̃) ≠ 0B(V,R) (hence x̃ is not a local minimizer). If h is such that Df(x̃)h < 0 (such h exists; why?) then, for t > 0 small enough, we have

f(x̃ + th) − f(x̃) = t ( Df(x̃)h + o(th)/t ) < 0

and, hence, for some t_h > 0,

f(x̃ + th) < f(x̃)   ∀t ∈ (0, t_h].

Such h is called a descent direction for f at x̃. The concept of descent direction is essential to numerical methods.

4.2 Steepest descent method


Suppose now that V is a Hilbert space. Then Df(x̃)h = ⟨gradf(x̃), h⟩ and a particular
descent direction is h = −gradf (x̃). (This is so irrespective of which inner product (and
associated gradient) is used. We will return to this point when studying Newton’s method
and variable metric methods.) This direction is known as the direction of steepest descent.
The next exercise justifies this terminology.

Exercise 4.3 Let V be a Hilbert space, with inner product ⟨·, ·⟩V and associated gradient gradV. Let f : V → R be differentiable at x ∈ V, with gradient gradV f(x) ≠ 0V, and let d̂ := −gradV f(x)/||gradV f(x)||V. (i) Show that

argmin {Df(x)d : ⟨d, d⟩V = 1} = {d̂}.

(Hint: Use the Cauchy-Bunyakovskii-Schwartz inequality.) (ii) Show that, given any d̃ ≠ d̂ with ⟨d̃, d̃⟩ = 1, there exists t̄ > 0 such that

f(x + t d̂) < f(x + t d̃)   ∀t ∈ (0, t̄].

Exercise 4.4 Under the same assumptions as in Exercise 4.3, show that ĥ = −gradf(x) is also the unique solution of

min_h  f(x) + Df(x)h + (1/2)⟨h, h⟩,

i.e., the only minimizer of the second order expansion of f about x when the Hessian is the identity.


In view of the above, a natural algorithm for attempting to solve (4.1) would be the following.

Algorithm SD (steepest descent with exact line search)
Data x0 ∈ H
i := 0
while gradf(xi) ≠ 0V do {
    pick ti ∈ argmin_t {f(xi − t gradf(xi)) : t ≥ 0} (if there is no such minimizer the algorithm fails)
    xi+1 := xi − ti gradf(xi)
    i := i + 1
}
stop

Notation: Given a real-valued function φ, the (possibly empty) set of global minimizers for the problem

minimize φ(x)   s.t. x ∈ S

is denoted by

argmin_x {φ(x) : x ∈ S}.

In the algorithm above, like in other algorithms we will study in this course, each iteration
consists of essentially 2 operations:

• computation of a search direction (here, directly opposed to the gradient of f )

• a search along that search direction, which amounts to solving, often approximately, a
minimization problem in only one variable, t. The function φ(t) = f (x + th) can be
viewed as the one-dimensional section of f at x in direction h. This second operation
is often also called “step-size computation” or “line search”.

Before analyzing the algorithm above, we point out a practical difficulty. Computation of
ti involves an exact minimization which cannot in general be performed exactly in finite
time (it requires construction of an infinite sequence). Hence, point xf will never be actually
constructed and convergence of the sequence {xi } cannot be observed. One says that the
algorithm is not implementable, but merely conceptual. An implementable algorithm for
solving (4.1) will be examined later.
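For intuition, here is a rough numerical rendition of Algorithm SD, with the exact line search replaced by a bounded one-dimensional numerical minimization (so it is only approximately the conceptual algorithm); the quadratic test function and the search bound are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

Q = np.array([[3.0, 1.0], [1.0, 1.0]])     # assumed positive definite matrix
f = lambda x: 0.5 * x @ Q @ x              # f(x) = (1/2) x^T Q x
grad_f = lambda x: Q @ x

x = np.array([1.0, -2.0])
while np.linalg.norm(grad_f(x)) > 1e-6:    # stand-in for "gradf(x_i) != 0"
    g = grad_f(x)
    # approximate exact line search: min over t >= 0 of f(x - t g)
    t = minimize_scalar(lambda s: f(x - s*g), bounds=(0.0, 10.0),
                        method="bounded").x
    x = x - t * g
print(x)                                   # ~ (0, 0), the unique minimizer
```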

4.3 Introduction to convergence analysis


In order to analyze Algorithm SD, we embed it in a class of algorithms characterized by
the following algorithm model. Here, a : V → V , V a normed vector space, represents an
iteration map; and ∆ ⊂ V is a set of “desirable” points.
Algorithm Model 1
Data. x0 ∈ V

i=0
while xi ∉ ∆ do {
xi+1 = a(xi )
i=i+1
}
stop
Theorem 4.2 Suppose there exists a function v : V → R such that
(i) a(·) is continuous in ∆c (∆c is the complement of ∆);
(ii) v(·) is continuous in ∆c;
(iii) v(a(x)) < v(x) ∀x ∈ ∆c.
Then, if the sequence {xi} constructed by Algorithm Model 1 is infinite, every accumulation point of {xi} is desirable (i.e., belongs to ∆).

Exercise 4.5 Prove the theorem.

Exercise 4.6 Give an example (i.e., exhibit a(·), v(·) and ∆) showing that condition (ii),
in Theorem 4.2, cannot be dropped.
Remark 4.4
1. Note that Algorithm Model 1 does not imply any type of “optimization” idea. The
result of Theorem 4.2 will hold if one can show the existence of a function v(·) that,
together with a(·), would satisfy conditions (i) to (iii). This idea is related to that of
a Lyapunov function for the “discrete-time system” xi+1 = a(xi ) (but assumptions on
v are weaker, and the resulting sequence may be unbounded).

2. The result of Theorem 4.2 is stronger than it may appear at first glance.
(i) if {xi } is bounded (e.g., if all level sets of f are bounded) and V is finite-
dimensional, accumulation points do exist.
(ii) if accumulation point(s) exist(s) (which implies that ∆ is nonempty), and if ∆ is
a finite set (which is often the case), there are simple techniques that can force
the entire sequence to converge to one of these accumulation points. For example,
if ∆ is the set of stationary points of v, one might restrict the step kxi+1 − xi k to
never be larger than some constant multiple of k∇v(xi )k (step-size limitation).

Exercise 4.7 Consider Algorithm SD (steepest descent) with H = R^n and define, for any x ∈ R^n,

t(x) = argmin_t {f(x − t∇f(x)) : t ≥ 0},

where we assume that t(x) is uniquely defined (unique global minimizer) for every x ∈ R^n. Also suppose that t(·) is locally bounded, i.e., for any bounded set K, there exists M > 0 s.t. |t(x)| < M for all x ∈ K. Show that the hypotheses of Theorem 4.2 are satisfied with ∆ = {x ∈ R^n : Df(x) = 0} and v = f. [Hint: the key point is to show that a(·) is continuous, i.e., that t(·) is continuous. This does hold because, since f is continuously differentiable, the curve below (Figure 4.1) does not change too much in a neighborhood of x.]


[Figure 4.1: Steepest descent with exact search. Plot of λ ↦ f(x − λ∇f(x)) − f(x), whose minimizer over λ ≥ 0 is λ(x).]

Remark 4.5 We just proved that Algorithm SD yields accumulation points x̂ (if any) such
that Df (x̂) = 0. There is no guarantee, however, that x̂ is even a local minimizer (e.g., take
the case where x0 is a local maximizer). Nevertheless, this will very likely be the case, since
the cost function decreases at each iteration (and thus, local minimizers are the only ‘stable’
points.)

In many cases t(x) will not be uniquely defined for all x and, when it is, a(·) may not be
continuous. (See, e.g., the “Armijo” line search discussed below.) Also, the iteration map
may have memory, i.e., may not be a function of x alone. This issue is addressed in Algorithm
Model 2 below, which is based on a point-to-set iteration map

A : V → 2^V

(2^V is the set of subsets of V).


Algorithm Model 2
Data. x0 ∈ V
i=0
while xi ∉ ∆ do {
pick xi+1 ∈ A(xi )
i=i+1
}
stop
The advantages of using a point-to-set iteration map are that

(i) compound algorithms can be readily analyzed (two or more algorithms are intertwined);

(ii) this can include algorithms for which the iteration depends on some past information (e.g., conjugate gradient methods);

(iii) algorithms not satisfying the conditions of the previous theorem (e.g., a(·) not continuous) may satisfy the conditions of the theorem below.

The algorithm above, with the convergence theorem below, will allow us to analyze an
implementable algorithm. The following theorem is due to Polak [?].

Theorem 4.3 Suppose that there exists a function v : V → R such that


(i) v(·) is continuous in ∆c

(ii) ∀x ∈ ∆c ∃ ǫ > 0, δ > 0 such that

v(y ′) − v(x′ ) ≤ −δ ∀x′ ∈ B(x, ǫ) ∀y ′ ∈ A(x′ ) (4.8)

Then, if the sequence {xi } constructed by Algorithm model 2 is infinite, every accumulation
point of {xi } is desirable (i.e., belongs to ∆).

Remark 4.6 (4.8) indicates a uniform decrease in the neighborhood of any non-desirable
point. Note that a similar property was implied by (ii) and (iii) in Theorem 4.2.

Lemma 4.1 Let {ti} ⊂ R be a monotonically decreasing sequence such that ti →^K t∗ (i.e., the subsequence indexed by K converges to t∗) for some infinite K ⊂ N and t∗ ∈ R. Then ti ↘ t∗.

Exercise 4.8 Prove the lemma.

Proof of Theorem 4.3

By contradiction. Suppose xi →^K x̂ ∉ ∆. Since v(·) is continuous, v(xi) →^K v(x̂). Since, in view of (ii),

v(xi+1) < v(xi)   ∀i,
v(xi+1 ) < v(xi ) ∀i

it follows from the lemma above that

v(xi ) → v(x̂) . (4.9)

Now let ǫ, δ correspond to x̂ in assumption (ii). Since xi →^K x̂, there exists i0 such that, ∀i ≥ i0, i ∈ K, xi belongs to B(x̂, ǫ). Hence, ∀i ≥ i0, i ∈ K,

v(y) − v(xi ) ≤ −δ ∀y ∈ A(xi ) (4.10)

and, in particular

v(xi+1 ) ≤ v(xi ) − δ ∀i ≥ i0 , i ∈ K (4.11)

But this contradicts (4.9) and the proof is complete.

Exercise 4.9 Show that Algorithm SD (steepest descent) with H = R^n satisfies the assumptions of Theorem 4.3. Hence xi →^K x̂ implies Df(x̂) = 0B(V,R) (assuming that argmin_t {f(xi − t∇f(xi))} is always nonempty).


In the following algorithm, a line search due to Armijo (Larry Armijo, 20th century Amer-
ican mathematician) replaces the exact line search of Algorithm SD, making the algorithm
implementable. This line search imposes a decrease of f (xi ) at each iteration, which is com-
mon practice. Note however that such “monotone decrease” in itself is not sufficient for
inducing convergence to stationary points. Two ingredients in the Armijo line search insure
that “sufficient decrease” is achieved: (i) the back-tracking technique insures that, away
from stationary points, the step will not be vanishingly small, and (ii) the “Armijo line”
test insures that, whenever a reasonably large step is taken, a reasonably large decrease is
achieved.
We present this algorithm in a more general form, with a search direction hi (hi =
−gradf (xi ) corresponds to Armijo-gradient).
Algorithm 2 (Armijo step-size rule)
Parameters α, β ∈ (0, 1)
Data x0 ∈ V
i = 0
while Df(xi) ≠ 0B(V,R) do {
    compute an appropriate search direction hi
    t = 1
    while f(xi + t hi) − f(xi) > α t f′(xi; hi) do t := βt
    xi+1 = xi + t hi
    i = i + 1
}
stop

To get an intuitive picture of this step-size rule, let us define a function φi : R → R by

φi (t) = f (xi + thi ) − f (xi )

Using the chain rule we have

φ′i (0) = Df (xi + 0hi )hi = f ′ (xi ; hi )

so that the condition to be satisfied by t = β^k can be written as

φi(t) ≤ α t φ′i(0).

Hence the Armijo rule prescribes to choose the step-size ti as the least power of β (hence the largest value, since β < 1) at which the curve φi(t) is below the straight line α t φ′i(0), as shown in Figure 4.2. In the case of the figure, k = 2 will be chosen. We see that this step-size will be well defined as long as

f′(xi; hi) < 0,

which insures that the straight lines on the picture are downward. Let us state and prove this precisely.


Proposition 4.1 Suppose that Df(xi) ≠ 0 and that hi is such that

f′(xi; hi) < 0.

Then there exists an integer k such that t = β^k satisfies the line search criterion.

Proof.

φi(t) = φi(t) − φi(0) = t φ′i(0) + o(t)
      = t α φ′i(0) + t(1 − α) φ′i(0) + o(t)
      = t α φ′i(0) + t ( (1 − α) φ′i(0) + o(t)/t ),   t ≠ 0.

Since φ′i(0) = f′(xi; hi) < 0, the expression within parentheses is negative for t > 0 small enough; thus φi(t) < t α φ′i(0) for t > 0 small enough.
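A direct transcription of Algorithm 2 with hi = −∇f(xi) (Armijo-gradient) is sketched below, on an assumed quadratic test function; the parameter values α = 0.4, β = 0.5 are illustrative choices, not prescriptions from the notes.

```python
import numpy as np

def armijo_gradient(f, grad_f, x0, alpha=0.4, beta=0.5, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    while np.linalg.norm(grad_f(x)) > tol:   # stand-in for "Df(x_i) != 0"
        h = -grad_f(x)                       # search direction h_i
        fp = float(grad_f(x) @ h)            # f'(x_i; h_i), negative here
        t = 1.0
        while f(x + t*h) - f(x) > alpha * t * fp:
            t *= beta                        # back-tracking: t := beta t
        x = x + t*h
    return x

Q = np.array([[3.0, 1.0], [1.0, 1.0]])       # assumed positive definite
print(armijo_gradient(lambda x: 0.5 * x @ Q @ x,
                      lambda x: Q @ x, [1.0, -2.0]))   # ~ (0, 0)
```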

We will now apply Theorem 4.3 to prove convergence of Algorithm 2. We just have to show that condition (ii) holds (using v ≡ f; (i) holds by assumption). For simplicity, we let V = R^n.

[Author's note: the next theorem is to be simplified, and stated and proved for the special case of hi being the negative gradient direction, with a remark mentioning the generalization.]
Theorem 4.4 Let H(x) denote the set of search directions that could possibly be constructed by Algorithm 2 when x is the current iterate. (In the case of steepest descent, H(x) = {−gradf(x)}.) Suppose that H(x) is bounded away from zero near non-stationary points, i.e., for any x̂ such that Df(x̂) ≠ 0, there exists ǫ > 0 such that

inf{||h|| : h ∈ H(x), ||x − x̂|| ≤ ǫ} > 0.        (4.12)

[Figure 4.2: Armijo rule. Plot of φi(λ) = f(xi + λhi) − f(xi) together with the lines λφ′i(0) = λ⟨∇f(xi), hi⟩ and αλφ′i(0) = αλ⟨∇f(xi), hi⟩, and candidate step-sizes 1 = β⁰, β¹, β², . . .]


Further suppose that, for any x̂ for which Df(x̂) ≠ 0, there exist positive numbers ǫ and ρ such that, ∀x ∈ B(x̂, ǫ), ∀h ∈ H(x),

⟨∇f(x), h⟩ ≤ −ρ ||∇f(x)|| ||h||,        (4.13)

where ||·|| is the norm induced by the inner product. Then xi →^K x∗ implies Df(x∗) = 0.

Proof. (For simplicity we assume the standard Euclidean inner product.)

f(x + th) = f(x) + ∫_0^1 ⟨∇f(x + σth), th⟩ dσ.

Thus

f(x + th) − f(x) − αt⟨∇f(x), h⟩ = t ∫_0^1 ⟨∇f(x + σth) − ∇f(x), h⟩ dσ + t(1 − α)⟨∇f(x), h⟩        (4.14)
    ≤ t ( sup_{σ∈[0,1]} |⟨∇f(x + σth) − ∇f(x), h⟩| + (1 − α)⟨∇f(x), h⟩ )   ∀t ≥ 0.

Suppose now that x̂ is such that ∇f(x̂) ≠ 0 and let ǫ and ρ satisfy the hypotheses of the theorem. Substituting (4.13) into (4.14) yields (since α ∈ (0, 1) and t ≥ 0), using the Schwartz inequality,

f(x + th) − f(x) − αt⟨∇f(x), h⟩ ≤ t||h|| ( sup_{σ∈[0,1]} ||∇f(x + σth) − ∇f(x)|| − (1 − α)ρ||∇f(x)|| )
    ∀x ∈ B(x̂, ǫ), ∀h ∈ H(x).        (4.15)

Assume now that h(x) = −∇f(x) [the proof for the general case is left as a (not entirely trivial) exercise]. First let us pick ǫ′ ∈ (0, ǫ] s.t., for some η > 0 (using continuity of ∇f),

(1 − α)ρ||∇f(x)|| > η > 0   ∀x ∈ B(x̂, ǫ′).

Also, by continuity of ∇f, there exists C s.t. ||∇f(x)|| ≤ C ∀x ∈ B(x̂, ǫ′). Since B̄(x̂, ǫ′) is compact, ∇f is uniformly continuous over B̄(x̂, ǫ′). Thus, there exists ǭ > 0 such that

||∇f(x + v) − ∇f(x)|| < η   ∀||v|| < ǭ, ∀x ∈ B̄(x̂, ǫ′).

Thus,

||∇f(x − σt∇f(x)) − ∇f(x)|| < η   ∀σ ∈ [0, 1], ∀t ∈ [0, ǭ/C], ∀x ∈ B̄(x̂, ǫ′),

which implies

sup_{σ∈[0,1]} ||∇f(x − σt∇f(x)) − ∇f(x)|| < η   ∀t ∈ [0, t̄], ∀x ∈ B̄(x̂, ǫ′),

with t̄ = ǭ/C > 0. Thus (4.15) yields

f(x − t∇f(x)) − f(x) + αt||∇f(x)||² < 0   ∀t ∈ (0, t̄], x ∈ B̄(x̂, ǫ′).        (4.16)

Let us denote by k(x) the exponent k constructed by Algorithm 2 (i.e., ti = β^{k(x)}) if xi = x. Then, from (4.16) and the definition of k(x),

k(x) ≤ k∗ := max(0, k̃)   ∀x ∈ B(x̂, ǫ′),        (4.17)

where k̃ is such that

β^{k̃} ≤ t̄ < β^{k̃−1}        (4.18)

(since β^{k̃} will then always satisfy inequality (4.16)). The iteration map A(x) (singleton-valued in this case) is

A(x) = {x − β^{k(x)} ∇f(x)},

and, from the line search criterion, using (4.17) and (4.18),

f(A(x)) − f(x) ≤ −αβ^{k(x)} ||∇f(x)||² ≤ −αβ^{k∗} ||∇f(x)||² ≤ −αβ^{k∗} η²/((1 − α)²ρ²)   ∀x ∈ B̄(x̂, ǫ′),

and condition (ii) of Theorem 4.3 is satisfied with

δ = αβ^{k∗} η²/((1 − α)²ρ²) > 0.

Hence xi →^K x∗ implies x∗ ∈ ∆, i.e., ∇f(x∗) = 0.

Remark 4.7 Condition (4.13) expresses that the angle between h and −gradf(x) (in the 2D plane spanned by these two vectors) is uniformly bounded by cos⁻¹(ρ) (note that ρ > 1 cannot possibly satisfy (4.13) except if both sides are = 0). In other words, this angle is uniformly bounded away from 90°. This angle just being less than 90° for all x, insuring that h(x) is always a descent direction, is indeed not enough. Condition (4.12) prevents h(x) from collapsing (resulting in a very small step ||xk+1 − xk||) except near a desirable point.

Exercise 4.10 Prove that Theorem 4.4 is still true if the Armijo search is replaced by an
exact search (as in Algorithm SD).

Remark 4.8

1. A key condition in Theorem 4.3 is that of uniform descent in the neighborhood of a


non-desirable point. Descent by itself may not be enough.


2. The best values for α and β in Armijo step-size are not obvious a priori. More about
all this can be found in [?, ?].

3. Many other step-size rules can be used, such as golden section search, quadratic or cubic
interpolation, Goldstein step-size. With some line searches, a stronger convergence
result than that obtained above can be proved, under the additional assumption that f
is bounded from below (note that, without such assumption, the optimization problem
would not be well defined): Irrespective of whether or not {xk } has accumulation
points, the sequence of gradients {∇f (xk )} always converges to zero. See, e.g., [?,
section 3.2].
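
As a concrete illustration, here is a minimal Python sketch of the Armijo backtracking rule
(the function names and default parameter values are mine, for illustration only):

    import numpy as np

    def armijo_step(f, grad_f, x, h, alpha=0.4, beta=0.5):
        # Return t = beta**k, with k the smallest nonnegative integer such that
        #   f(x + t h) - f(x) <= alpha * t * <grad f(x), h>.
        slope = float(np.dot(grad_f(x), h))   # negative if h is a descent direction
        t = 1.0
        while f(x + t * h) - f(x) > alpha * t * slope:
            t *= beta
        return t

    # usage: steepest descent with Armijo line search on a simple quadratic
    f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
    grad_f = lambda x: np.array([x[0], 10 * x[1]])
    x = np.array([1.0, 1.0])
    for _ in range(50):
        h = -grad_f(x)
        x = x + armijo_step(f, grad_f, x, h) * h
    print(x)   # should be close to the minimizer (0, 0)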

Note on the assumption xi → x∗ . The convergence results given earlier assert that,
under some assumptions, every accumulation point of the sequence generated by a suitable
algorithm (e.g., Armijo gradient) satisfies the first order necessary condition of optimality.
A much “nicer” result would be that the entire sequence converges. The following exercises
address this question.

Exercise 4.11 Let f : Rn → R be continuously differentiable. Suppose {xi} is constructed
by the Armijo-gradient algorithm. Suppose that {x : Df (x) = 0B(V,R)} is finite and suppose
that {xi} has an accumulation point x̂. Then xi → x̂.

Exercise 4.12 Let x̂ be an isolated stationary point of f : Rn → R and let {xi } be a


sequence with the property that all its accumulation points are stationary, and that one of
those is x̂. (A stationary point x̂ of f is said to be isolated if there exists ǫ > 0 such that f
has no stationary point in B(x̂, ǫ) \ {x̂}.) Further suppose that there exists ρ > 0 such that
the index set K := {i : xi ∈ B(x̂, ρ)} is such that ||xi+1 − xi|| → 0 as i → ∞, i ∈ K. Prove
that under these assumptions xi → x̂ as i → ∞.

The assumption in Exercise 4.12 is often satisfied. In particular, any local minimizer satisfying
the 2nd order sufficient condition of optimality is isolated (why?). Finally, if xi+1 = xi −
t gradf (xi) and |t| ≤ 1 (e.g., Armijo-gradient) then ||xi+1 − xi|| ≤ ||gradf (xi)||, which goes to
0 on sub-sequences converging to a stationary point; for other algorithms, suitable step-size
limitation schemes will yield the same result.

Exercise 4.13 Let f : Rn → R be continuously differentiable and let {xi } be a bounded


sequence with the property that every accumulation point x̂ satisfies Df (x̂) = 0. Then
Df (xi ) → 0 as i → ∞.

4.4 Second order optimality conditions


Here we consider only V = Rn (the analysis in the general case is slightly more involved).
Consider again the problem

min{f (x) : x ∈ Rn}    (4.19)


Theorem 4.5 (2nd order necessary condition) Suppose that f is twice differentiable and let
x̂ be a local minimizer for (4.19). Then ∇2 f (x̂) is positive semi-definite, i.e.

dᵀ∇²f (x̂)d ≥ 0    ∀d ∈ Rn

Proof. Let d ∈ Rn, ||d|| = 1, and let t > 0. Since x̂ is a local minimizer, ∇f (x̂) = 0. Second
order expansion of f around x̂ yields

0 ≤ f (x̂ + td) − f (x̂) = (t²/2) ( dᵀ∇²f (x̂)d + o₂(td)/t² )    (4.20)

with o₂(h)/||h||² → 0 as h → 0. The claim then follows by letting t → 0.

Theorem 4.6 (2nd order sufficiency condition) Suppose that f is twice differentiable, that
Df (x̂) = 0 and that ∇2 f (x̂) is positive definite. Then x̂ is a strict local minimizer for (4.19).

Proof. Let m > 0 be the smallest eigenvalue of ∇²f (x̂). (It is indeed positive since V is
assumed finite-dimensional.) Then, since hᵀ∇²f (x̂)h ≥ m||h||² for all h,

f (x̂ + h) − f (x̂) ≥ ||h||² ( m/2 + o₂(h)/||h||² )    ∀h ≠ 0V.

Let ǫ > 0 be such that |o₂(h)|/||h||² < m/2 for all ||h|| < ǫ, h ≠ 0V. Then

f (x̂ + h) > f (x̂)    ∀ 0 < ||h|| < ǫ,

proving the claim.

Alternatively, under the further assumption that the second derivative of f is continuous,
Theorem 4.6 can be proved by making use of (B.15), and using the fact that, due to the
assumed continuity of the second derivative, (D²f (x̂ + th)h)h ≥ (m/2)||h||² for all h small
enough and t ∈ (0, 1).

Exercise 4.14 Show by a counterexample that the 2nd order sufficiency condition given
above is not valid when the space is infinite-dimensional. [Hint: Consider the space of
sequences in R with finitely nonzero entries (equivalently, the space of univariate polynomi-
als), with an appropriate norm.] Show that the condition remains sufficient in the infinite-
dimensional case if it is expressed as: there exists m > 0 such that, for all h,

(D²f (x̂)h)h ≥ (m/2)||h||²

Remark 4.9 Consider the following “proof” for Theorem 4.6. “Let d ≠ 0. Then
⟨d, D²f (x̂)d⟩ = δ > 0. Let h = td. Proceeding as in the proof of Theorem 4.5, (4.20) yields
f (x̂+td)−f (x̂) > 0 ∀t ∈ (0, t̄] for some t̄ > 0, which shows that x̂ is a local minimizer.” This
argument is in error. Why? (Note that if this argument were correct, it would imply that
the result also holds on infinite-dimensional spaces. However, on such spaces, it is not
sufficient that ⟨d, D²f (x̂)d⟩ be positive for every nonzero d: it must be bounded away from
zero for ||d|| = 1.)


Remark 4.10 Note that, in the proof of Theorem 4.6, it is not enough that o2 (h)/khk2 goes
to zero along straight lines. (Thus twice Gateaux differentiable is not enough.)

Exercise 4.15 Exhibit an example where f has a strict minimum at some x̂, with D²f (x̂) ⪰ 0
(as required by the 2nd order necessary condition), but such that there is no neighborhood
of x̂ where D²f (x) is everywhere positive semi-definite. (Try x ∈ R²; while examples in R
do exist, they are contrived.) Simple examples exist where there is a neighborhood of x̂ where
the Hessian is nowhere positive semi-definite (except at x̂). [Hint: First ignore the strictness
requirement.]

4.5 Minimization of convex functions


Convex functions have very nice properties in relation with optimization as shown by the
exercise and theorem below.
Exercise 4.16 The set of global minimizers of a convex function is convex (without any
differentiability assumption). Also, every local minimizer is global. If such a minimizer exists
and f is strictly convex, then it is the unique (strict) global minimizer.
Theorem 4.7 Suppose f : V → R, is differentiable and convex. Then Df (x∗ ) = 0 implies
that x∗ is a global minimizer for f . If f is strictly convex, x∗ is the unique (hence global and
strict) minimizer.

Proof. If f is convex, then, ∀x ∈ V


f (x) ≥ f (x∗ ) + Df (x∗ )(x − x∗ )
and, since Df (x∗ ) = 0
f (x) ≥ f (x∗ ) ∀x ∈ V
and x∗ is a global minimizer. If f is strictly convex, then (e.g., [?]),
f (x) > f (x∗ ) + Df (x∗ )(x − x∗ ) ∀x 6= x∗ .
hence
f (x) > f (x∗ ) ∀x 6= x∗
and x∗ is strict and is the unique global minimizer.

Remark 4.11 Let V = Rn and suppose f is strongly convex. Then there is a global
minimizer (why?). By the previous theorem it is unique and Df vanishes at no other point.

Exercise 4.17 Suppose that f : Rn → R is strictly convex and has a minimizer x̂, and
suppose that the sequence {xi } is such that
(i) every accumulation point x̃ satisfies Df (x̃) = 0n ;
(ii) f (xi+1 ) ≤ f (xi ) for all i.
Then xi → x̂.


4.6 Conjugate direction methods


(see [?])
We restrict the discussion to V = Rn .
Steepest descent type methods can be very slow.

Exercise 4.18 Let f (x, y) = (1/2)(x² + ay²) where a > 0. Consider the steepest descent al-
gorithm with exact minimization. Given (xi, yi) ∈ R², obtain formulas for xi+1 and yi+1.
Using these formulas, give a qualitative discussion of the performance of the algorithm for
a = 1, a very large, and a very small. Verify numerically using, e.g., MATLAB.
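
A minimal Python sketch of such an experiment (my own code, not part of the notes; for a
quadratic, the exact minimizing step along the gradient is available in closed form):

    import numpy as np

    def sd_exact(a, z0, iters=15):
        # Steepest descent with exact line search on f(x, y) = (x^2 + a y^2)/2.
        Q = np.diag([1.0, a])
        z = np.array(z0, dtype=float)
        for i in range(iters):
            g = Q @ z                        # gradient at z
            t = (g @ g) / (g @ (Q @ g))      # exact minimizing step size
            z = z - t * g
            print(i, z, np.linalg.norm(z))
        return z

    sd_exact(1.0,   [1.0, 1.0])    # a = 1: converges in one iteration
    sd_exact(100.0, [1.0, 1.0])    # a >> 1: slow zig-zagging convergence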

If the objective function is quadratic and x ∈ R², two function evaluations and two gradient
evaluations are enough to identify the function exactly (why?). Thus there ought to be a way
to reach the solution in two iterations. As most functions look quadratic locally, such a method
should give good results in the general case. Clearly, such a method must have memory (to
remember previous function and gradient values). It turns out that a very simple idea gives
answers to these questions: the idea of conjugate directions.

Definition 4.1 Given a symmetric matrix Q, two vectors d₁ and d₂ are said to be Q-
orthogonal, or conjugate with respect to Q, if d₁ᵀQd₂ = 0.

Fact. If Q is positive definite and d0 , . . . , dk are Q-orthogonal and are all nonzero, then
these vectors are linearly independent (and thus there can be no more than n such vectors).
Proof. See [?].

Theorem 4.8 (Expanding Subspace Theorem)


Consider the quadratic function f (x) = (1/2)xᵀQx + bᵀx with Q ≻ 0 and let h0, h1, . . . , hn−1 be
a sequence of Q-orthogonal vectors in Rn. Then, given any x0 ∈ Rn, if the sequence {xk} is
generated according to xk+1 = xk + tk hk, where tk minimizes f (xk + thk), then xk minimizes
f over the linear variety (affine set) x0 + span{h0, . . . , hk−1}.

Proof. Since f is convex we can write

f (xk + h) ≥ f (xk ) + ∇f (xk )T h

Thus it is enough to show that ∇f (xk)ᵀh ≥ 0 for all h ∈ span{h0, . . . , hk−1},
i.e., since h ∈ span{h0, . . . , hk−1} if and only if −h ∈ span{h0, . . . , hk−1}, that

∇f (xk)ᵀh = 0    ∀h ∈ span{h0, . . . , hk−1},

which holds if and only if

∇f (xk)ᵀhi = 0,    i = 0, . . . , k − 1.


We prove this by induction on k, for i = 0, 1, 2, . . . First, for any i, and k = i + 1,

∇f (xi+1)ᵀhi = (∂/∂t) f (xi + thi)|_{t=ti} = 0
Suppose it holds for some k > i. Then

∇f (xk+1)ᵀhi = (Qxk+1 + b)ᵀhi
    = (Qxk + tk Qhk + b)ᵀhi
    = ∇f (xk)ᵀhi + tk hkᵀQhi = 0.

The first term vanishes due to the induction hypothesis, the second due to Q-orthogonality
of the hi's.

Corollary 4.1 xn minimizes f (x) = (1/2)xᵀQx + bᵀx over Rn, i.e., the given iteration yields
the minimizer for any quadratic function in no more than n iterations.

Remark 4.12 The minimizing step size tk is given by

tk = −(Qxk + b)ᵀhk / (hkᵀQhk).
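
In code, this exact step size is a one-liner (a Python sketch of my own, using my notation):

    import numpy as np

    def exact_step(Q, b, x, h):
        # minimizing step of f(x) = x^T Q x / 2 + b^T x along direction h:
        # solve d/dt f(x + t h) = (Q(x + t h) + b)^T h = 0 for t
        return -float((Q @ x + b) @ h) / float(h @ (Q @ h))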

Conjugate gradient method


There are many ways to choose a set of conjugate directions. The conjugate gradient method
selects each direction as the negative gradient added to a linear combination of the previous
directions. It turns out that, in order to achieve Q-orthogonality, one must use

hk+1 = −∇f (xk+1 ) + βk hk (4.21)

i.e., only the preceding direction hk has a nonzero coefficient. This is because, if (4.21) was
used to construct the previous iterates, then ∇f (xk+1) is already conjugate to h0, . . . , hk−1.
Indeed, first notice that hi is always a descent direction (unless ∇f (xi) = 0), so that ti ≠ 0.
Then, for i < k
 
∇f (xk+1)ᵀQhi = ∇f (xk+1)ᵀQ ((1/ti)(xi+1 − xi))
    = (1/ti) ∇f (xk+1)ᵀ(∇f (xi+1) − ∇f (xi))
    = (1/ti) ∇f (xk+1)ᵀ(βi hi − hi+1 + hi − βi−1 hi−1) = 0
where we have used (4.21) for k = i + 1 and k = i, and the Expanding Subspace Theorem.
The coefficient βk is chosen so that hk+1ᵀQhk = 0. One gets

βk = ∇f (xk+1)ᵀQhk / (hkᵀQhk).    (4.22)


Non-quadratic objective functions


If x∗ is a minimizer for f, ∇²f (x∗) is positive semi-definite. Generically (i.e., in ‘most’ cases),
it will be strictly positive definite, since matrices are generically non-singular. Thus (since
∇f (x∗) = 0)

f (x) = f (x∗) + (1/2)(x − x∗)ᵀQ(x − x∗) + o₂(x − x∗)

with Q = ∇²f (x∗) ≻ 0. Thus, close to x∗, f looks like a quadratic function with positive
definite Hessian matrix and the conjugate gradient algorithm should work very well in such
a neighborhood of the solution. However, (4.22) cannot be used for βk since Q is unknown
and since we do not want to compute the second derivative. Yet, for a quadratic function,
it can be shown that
βk = ||∇f (xk+1)||² / ||∇f (xk)||² = (∇f (xk+1) − ∇f (xk))ᵀ∇f (xk+1) / ||∇f (xk)||²    (4.23)
The first expression yields the Fletcher-Reeves conjugate gradient method. The second one
gives the Polak-Ribière conjugate gradient method.
Exercise 4.19 Prove (4.23).
Algorithm 3 (conjugate gradient, Polak-Ribière version)
Data x0 ∈ Rn
i=0
h0 = −∇f (x0)
while ∇f (xi) ≠ 0 do {
    ti ∈ arg min_t f (xi + thi)    (exact search)
    xi+1 = xi + ti hi
    hi+1 = −∇f (xi+1) + βi hi
    i=i+1
}
stop
The Polak-Ribière formula uses

βi = (∇f (xi+1) − ∇f (xi))ᵀ∇f (xi+1) / ||∇f (xi)||².    (4.24)
It has the following advantage over the Fletcher-Reeves formula: away from a solution there
is a possibility that the search direction obtained is not very good, yielding a small step
||xk+1 − xk||. If such a difficulty occurs, ||∇f (xk+1) − ∇f (xk)|| will be small as well and P-R
will yield −∇f (xk+1) as the next direction, thus “resetting” the method.
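
A minimal Python sketch of Algorithm 3 (my own illustrative code; it assumes SciPy is
available and replaces the exact search by a numerical one-dimensional minimization):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def cg_polak_ribiere(f, grad_f, x0, tol=1e-8, max_iter=200):
        # Conjugate gradient, Polak-Ribiere version.
        x = np.asarray(x0, dtype=float)
        g = grad_f(x)
        h = -g
        for _ in range(max_iter):
            if np.linalg.norm(g) <= tol:
                break
            t = minimize_scalar(lambda t: f(x + t * h)).x   # approximate exact search
            x = x + t * h
            g_new = grad_f(x)
            beta = (g_new - g) @ g_new / (g @ g)            # formula (4.24)
            h = -g_new + beta * h
            g = g_new
        return x

    # usage: on a quadratic, the minimizer is reached in at most n iterations
    Q = np.array([[3.0, 1.0], [1.0, 2.0]]); b = np.array([1.0, -1.0])
    f = lambda x: 0.5 * x @ Q @ x + b @ x
    grad_f = lambda x: Q @ x + b
    print(cg_polak_ribiere(f, grad_f, [0.0, 0.0]), np.linalg.solve(Q, -b))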
By inspection, one verifies that hi is a descent direction for f at xi, i.e., ∇f (xi)ᵀhi < 0
whenever ∇f (xi) ≠ 0. The following stronger statement can be proved.
Fact. If f is twice continuously differentiable and strongly convex then ∃ρ > 0 such that
⟨∇f (xi), hi⟩ ≤ −ρ||∇f (xi)|| ||hi||    ∀i
where {hi } and {xi } are as constructed by Algorithm 3 (in particular, this assumes an exact
line search).


Exercise 4.20 Show that


||hi || ≥ ||∇f (xi )|| ∀i.

As pointed out earlier one can show that the convergence theorem of Algorithm 2 still holds
in the case of an exact search (since an exact search results in a larger decrease).

Exercise 4.21 Show how the theorem just mentioned can be applied to Algorithm 3, by
specifying H(x).

Thus, all accumulation points are stationary and, since f is strongly convex, xi → x̂, the
unique global minimizer of f. An implementable version of Algorithm 3 can be found in [?].
If f is not convex, it is advisable to periodically reset the search direction (i.e., set hi =
−∇f (xi ) whenever i is a multiple of some number k; e.g., k = n to take advantage of the
quadratic termination property).

4.7 Rates of convergence


(see [?])
Note: Our definitions are simpler and not exactly equivalent to the ones in [?].
Quotient convergence rates

Definition 4.2 Suppose xi → x∗ . One says that {xi } converges to x∗ with a Q-order of
p(≥ 1) and a corresponding Q-factor = γ if there exists i0 such that, for all i ≥ i0

||xi+1 − x∗|| ≤ γ ||xi − x∗||^p.


Obviously, for a given initial point x0 , and supposing i0 = 0, the larger the Q-order, the faster
{xi } converges and, for a given Q-order, the smaller the Q-factor, the faster {xi } converges.
Also, according to our definition, if {xi } converges with Q-order = p it also converges with
Q-order = p′ for any p′ ≤ p, and if it converges with Q-order = p and Q-factor = γ it
also converges with Q-order = p and Q-factor = γ ′ for any γ ′ ≥ γ. Thus it may be more
appropriate to say that {xi } converges with at least Q-order = p and at most Q-factor =
γ. But what is more striking is the fact that a larger Q-order will overcome any initial
conditions, as shown in the following exercise. (If p = 1, a smaller Q-factor also overcomes
any initial conditions.)

Exercise 4.22 Let xi → x∗ be such that ||xi+1 − x∗|| = γ ||xi − x∗||^p for all i, with γ > 0,
p ≥ 1, and suppose that yi → y∗ is such that

||yi+1 − y∗|| ≤ δ ||yi − y∗||^q    ∀i, for some δ > 0.

Show that, if q > p, for any x0, y0, x0 ≠ x∗, ∃N such that

||yi − y ∗ || < ||xi − x∗ || ∀i ≥ N


Q-linear convergence
If γ ∈ (0, 1) and p = 1, convergence is called Q-linear. This terminology comes from the fact
that, in that case, we have (assuming i0 = 0),

||xi − x∗|| ≤ γ^i ||x0 − x∗||    ∀i

and, hence
log ||xi − x∗ || ≤ i log γ + log ||x0 − x∗ || ∀i
so that −log ||xi − x∗|| (which, when positive, is roughly proportional to the number of exact
figures in xi) is linear (more precisely, affine) in i.
If xi → x∗ Q-linearly with Q-factor γ, clearly ||xi+1 − x∗||/||xi − x∗|| ≤ γ for i large
enough. This motivates the following definition.

Definition 4.3 xi → x∗ Q-superlinearly if

||xi+1 − x∗|| / ||xi − x∗|| → 0 as i → ∞

Exercise 4.23 Show, by a counterexample, that a sequence {xi } can converge Q-superlinearly,
without converging with any Q-order p > 1. (However, Q-order larger than 1 implies Q-
superlinear convergence.)

Exercise 4.24 Show that, if xi → x∗ Q-superlinearly, ||xi+1 − xi|| is a good estimate of the
“current error” ||xi − x∗|| in the sense that

lim_{i→∞} ||xi − x∗|| / ||xi+1 − xi|| = 1

Definition 4.4 If xi → x∗ with a Q-order p = 2, {xi } is said to converge q-quadratically.

q-quadratic convergence is very fast: for large i, the number of exact figures is doubled at
each iteration.
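
The digit-doubling effect is easy to see numerically. A short Python illustration (my own,
purely for intuition), comparing a Q-linearly convergent error sequence with a q-quadratically
convergent one:

    import math

    e_lin, e_quad = 0.5, 0.5
    for i in range(6):
        # -log10(error) is roughly the number of correct digits
        print(i, round(-math.log10(e_lin), 1), round(-math.log10(e_quad), 1))
        e_lin  = 0.5 * e_lin      # Q-linear, Q-factor 0.5: digits grow linearly
        e_quad = e_quad ** 2      # q-quadratic: digits double at each iteration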

Exercise 4.25 [?]. The Q-factor is norm-dependent (unlike the Q-order). Let

xi = (.9)^i (1, 0)ᵀ for i even,    xi = (.9)^i (1/√2, 1/√2)ᵀ for i odd,

and consider the norms √(x1² + x2²) and max(|x1|, |x2|). Show that in both cases the sequence
converges with Q-order = 1, but that only in the 1st case convergence is Q-linear (γ > 1 in
the second case).

Root convergence rates


Definition 4.5 One says that xi → x∗ with an R-order equal to p ≥ 1 and an R-factor equal
to γ ∈ (0, 1) if there exists i0 such that, for all i ≥ i0,

||x_{i0+i} − x∗|| ≤ γ^i δ    for p = 1    (4.25)
||x_{i0+i} − x∗|| ≤ γ^(p^i) δ    for p > 1    (4.26)

for some δ > 0. (Note: by increasing δ, i0 can always be set to 0.)

Equivalently, there exist δ′ > 0 and γ′ ∈ (0, 1) such that, for all i,

||xi − x∗|| ≤ γ′^i δ′    for p = 1    (4.27)
||xi − x∗|| ≤ γ′^(p^i) δ′    for p > 1    (4.28)

(take γ′ = γ^(1/p^i0)).
Again, with the definition as given, it would be appropriate to use the phrases ‘at least’ and
‘at most’.

Exercise 4.26 Show that, if xi → x∗ with Q-order= p ≥ 1 and Q-factor= γ ∈ (0, 1) then
xi → x∗ with R-order= p and R-factor= γ. Show that the converse is not true. Finally,
exhibit a sequence {xi } converging

(i) with Q-order p ≥ 1 which does not converge with any R-order= p′ > p, with any
R-factor

(ii) with Q-order p ≥ 1 and Q-factor γ, not converging with R-order p with any R-factor
γ ′ < γ.

Definition 4.6 If p = 1 and γ ∈ (0, 1), convergence is called R-linear.


If xi → x∗ R-linearly we see from (4.27) that

lim sup_{i→∞} ||xi − x∗||^(1/i) ≤ lim_{i→∞} γ δ′^(1/i) = γ

Definition 4.7 {xi} is said to converge to x∗ R-superlinearly if

lim_{i→∞} ||xi − x∗||^(1/i) = 0

Exercise 4.27 Show that, if xi → x∗ Q-superlinearly, then xi → x∗ R-superlinearly.

Exercise 4.28 Show, by a counterexample, that a sequence {xi } can converge R-superlinearly
without converging with any R-order p > 1

Definition 4.8 If xi → x∗ with R-order p = 2, {xi } is said to converge R-quadratically.

Remark 4.13 R-order, as well as R-factor, are norm independent.


Rate of convergence of first order algorithms (see [?, ?])


Consider the following algorithm
Algorithm
Data x0 ∈ Rn
i=0
while Df (xi) ≠ 0 do {
obtain hi
ti ∈ arg min{f (xi + thi) : t ≥ 0}
xi+1 = xi + ti hi
i=i+1
}
stop

Suppose that there exists ρ > 0 such that

∇f (xi )T hi ≤ −ρ||∇f (xi )|| ||hi || ∀i, (4.29)

where || · || is the Euclidean norm, and that hi ≠ 0 whenever Df (xi) ≠ 0 (a condition of the
type ||hi|| ≥ c||Df (xi)|| is obviously superfluous if we use an exact search, since, then, xi+1
depends only on the direction of hi). We know that this implies that any accumulation point
is stationary. We now suppose that, actually, xi → x∗. This will happen for sure if f is
strongly convex. It will in fact happen in most cases when {xi} has some accumulation point.
Then, we can study the rate of convergence of {xi }. We give the following theorem without
proof (see [?]).

Theorem 4.9 Suppose xi → x∗ , for some x∗ , where {xi } is constructed by the algorithm
above, and assume that (4.29) holds. Suppose that f is twice continuously differentiable and
that the second order sufficiency condition holds at x∗ (as discussed above, this is a mild
assumption). Let m, M, ǫ be positive numbers such that ∀x ∈ B(x∗ , ǫ), ∀y ∈ Rn

m||y||2 ≤ y T ∇2 f (x)y ≤ M||y||2

[Such numbers always exist. Why?] Then xi → x∗ R-linearly (at least) with an R-factor of
(at most)

γ = √(1 − (ρm/M)²)    (4.30)

If ρ = 1 (steepest descent), convergence is Q-linear with the same Q-factor (in the Euclidean
norm).

Exercise 4.29 Show that if ρ < 1 convergence is not necessarily Q-linear (with Euclidean
norm).


Remark 4.14

1. f as above could be called ‘locally strongly convex’, why?

2. Without knowing anything else about hi, the fastest convergence is achieved with ρ = 1
(steepest descent), which yields γs = √(1 − (m/M)²). m and M are related to the smallest
and largest eigenvalues of D²f (x∗) (why? how?). If m ≪ M, convergence may be very
slow again (see Figure 4.3) (also see [?]).

Figure 4.3: Level curves and steepest-descent iterates x0, x1, x2, . . .: rapid progress toward
x∗ when m ∼ M, slow zig-zagging when m ≪ M.

We will see below that if hi is cleverly chosen (e.g., conjugate gradient method) the rate of
convergence can be much faster.
If instead of using exact line search we use an Armijo line search, with parameters α, β ∈
(0, 1), convergence is still R-linear and the R-factor is now given by

γa = √(1 − 4βα(1 − α)(ρm/M)²)    (4.31)

Remark 4.15

1. For α = 1/2 and β ≃ 1, γa is close to γs, i.e., Armijo gradient converges as fast as
steepest descent. Note, however, that the larger β is, the more computer time is going
to be needed to perform the Armijo line search, since β^k will be very slowly decreasing
when k increases.

2. Hence it appears that the rate of convergence does not by itself tell how fast the
problem is going to be solved (even asymptotically). The time needed to complete
one iteration has to be taken into account. In particular, for the rate of convergence
to have any significance at all, the work per iteration must be bounded. See exercise
below.

Exercise 4.30 Suppose xi → x∗ with xi ≠ x∗ for all i. Show that for all p, γ there exists a
sub-sequence {xik} such that xik → x∗ with Q-order = p and Q-factor = γ.


Exercise 4.31 Consider two algorithms for the solution of some given problem. Algo-
rithms SD and 2 construct sequences {x1k } and {x2k }, respectively, both of which converge
to x∗ . Suppose x10 = x20 ∈ B(x∗ , 1) (open unit ball) and suppose

||x^i_{k+1} − x∗|| = ||x^i_k − x∗||^(p_i)

with p1 > p2 > 0. Finally, suppose that, for both algorithms, the CPU time needed to generate
xk+1 from xk is bounded (as a function of k), as well as bounded away from 0. Show that
there exists ǭ > 0 such that, for all ǫ ∈ (0, ǭ), {x1k } enters the ball B(x∗ , ǫ) in less total
CPU time than {x2k } does. Thus, under bounded, and bounded away from zero, time per
iteration, Q-orders can be meaningfully compared. (This is in contrast with the point made
in Exercise 4.30.)

Conjugate direction methods


The rates of convergence (4.30) and (4.31) are conservative since they do not take into
account the way the directions hi are constructed, but only assume that they satisfy (4.29).
The following modification of Algorithm 3 can be shown to converge superlinearly.
Algorithm 4 (conjugate gradient with periodic reinitialization)
Parameter k > 0 (integer)
Data x0 ∈ Rn
i=0
h0 = −∇f (x0 )
while ∇f (xi) ≠ 0 do {
pick ti ∈ argmin{f (xi + thi ) : t ≥ 0}
xi+1 = xi + ti hi
hi+1 = −∇f (xi+1 ) + βi hi

0
 if i is a multiple of k
with βi =

 (∇f (xi+1 )−∇f (xi ))T ∇f (xi+1 )
||∇f (xi )||2
otherwise
i=i+1
}
stop

Exercise 4.32 Show that any accumulation point of the sub-sequence {xi }i=ℓk of the se-
quence {xi } constructed by Algorithm 4 is stationary.

Exercise 4.33 (Pacer step) Show that, if f is strongly convex in any bounded set, then
the sequence {xi } constructed by Algorithm 4 converges to the minimum x∗ and the rate of
convergence is at least R-linear.

Convergence is in fact n-step q-quadratic if k = n. If it is known that xi → x∗ , clearly the


strong convexity assumption can be replaced by strong convexity around x∗ , i.e., 2nd order
sufficiency condition.


Theorem 4.10 Suppose that k = n and suppose that the sequence {xi } constructed by
Algorithm 4 is such that xi → x∗ , at which point the 2nd order sufficiency condition is
satisfied. Then xi → x∗ n-step q-quadratically, i.e. ∃q, l0 such that

||xi+n − x∗|| ≤ q ||xi − x∗||²    for i = ln, l ≥ l0.

This should be compared with the quadratic rate obtained below for Newton’s method:
Newton’s method achieves the minimum of a quadratic convex function in 1 step (compared
to n steps here).

Exercise 4.34 Show that n-step q-quadratic convergence does not imply R-superlinear con-
vergence. Show that the implication would hold under the further assumption that, for some
C > 0,

kxk+1 − x∗ k ≤ Ckxk − x∗ k ∀k.

4.8 Newton’s method


Let us first consider Newton’s method (Isaac Newton, English mathematician, 1642–1727)
for solving a system of equations. Consider the system of equations

F (x) = 0V (4.32)

with F : V → V , V a normed vector space, and F is differentiable. The idea is to replace


(4.32) by an affine (i.e., 1st order) approximation at the current estimate of the solution (see
Figure 4.4). Suppose xi is the current estimate. Assuming Fréchet-differentiability, consider
the equation

Figure 4.4: Newton's iteration (convergent case, n = 1): successive iterates x0, x1, x2
approach the solution of F (x) = 0.

F̃i (x) := F (xi ) + DF (xi )(x − xi ) = 0V (4.33)


and we denote by xi+1 the solution to (4.33) (assuming DF (xi ) is invertible).


Note that, from (4.33), xi+1 is given by

xi+1 = xi − DF (xi )−1 F (xi ) (4.34)

but it should not be computed that way but rather by solving the linear system (4.33)
(much cheaper than computing an inverse). It turns out that, under suitable conditions, xi
converges very fast to a solution x∗ .
Note also that, in view of (4.34), if F : Rn → Rn, Newton's method is invariant under scaling
of the individual components of x (see also Exercise 4.36 below).
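
A minimal Python sketch of the iteration (my own illustrative code), solving the linear
system (4.33) at each step rather than forming DF (xi)⁻¹:

    import numpy as np

    def newton(F, DF, x0, tol=1e-12, max_iter=50):
        # Newton's method for F(x) = 0: solve DF(x_i) d = -F(x_i) for the step d.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            Fx = F(x)
            if np.linalg.norm(Fx) <= tol:
                break
            x = x + np.linalg.solve(DF(x), -Fx)
        return x

    # usage: intersect the circle x^2 + y^2 = 4 with the curve y = e^x - 1
    F  = lambda x: np.array([x[0]**2 + x[1]**2 - 4, np.exp(x[0]) - 1 - x[1]])
    DF = lambda x: np.array([[2*x[0], 2*x[1]], [np.exp(x[0]), -1.0]])
    print(newton(F, DF, [1.0, 1.0]))   # local q-quadratic convergence is typical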

Theorem 4.11 Suppose V is a Banach space and F : V → V is twice continuously differ-


entiable. Let x∗ be such that F (x∗) = 0V and suppose that DF (x∗) is invertible. Then there
exists ρ > 0 and C > 0 such that, if ||x0 − x∗|| < ρ, xi → x∗ and ||xi+1 − x∗|| ≤ C||xi − x∗||²
for all i.

Proof. (We use the following instance of the Inverse Function Theorem: Let f : V → V
be continuously differentiable, V a Banach space and let x0 ∈ V be such that DF (x0 ) is
invertible. Then there exists ρ > 0 such that DF (x)−1 exists and is continuous on B(x0 , ρ).
See, e.g. [?, Section 9.7, Problem 1].) Let xi be such that DF (xi ) is invertible. Then, from
(4.34),

||xi+1 − x∗|| = ||xi − x∗ − DF (xi)⁻¹F (xi)|| ≤ ||DF (xi)⁻¹||i · ||F (xi) + DF (xi)(x∗ − xi)||,

where the induced norm || · ||i is used for the (inverse) linear map. Now, there exist (i) ρ1 > 0
and β1 > 0 such that

||DF (x)⁻¹||i ≤ β1    ∀x ∈ B(x∗, ρ1)
and (ii) ρ2 > 0 and β2 > 0 such that

||F (x) + DF (x)(x∗ − x)|| ≤ β2 ||x − x∗||²    ∀x ∈ B(x∗, ρ2)

(Existence of β1 and ρ1 follow from the Inverse Function Theorem; existence of β2 , for ρ2
small enough follows from continuity of the second derivative of F and from Corollary B.2.)
Further, let ρ > 0, with ρ < min{ρ1 , ρ2 } be such that

β1 β2 ρ < 1.

It follows that, whenever xi ∈ B(x∗ , ρ),

||xi+1 − x∗|| ≤ β1β2 ||xi − x∗||²
    ≤ β1β2ρ ||xi − x∗|| ≤ ||xi − x∗||,
so that, if x0 ∈ B(x∗, ρ), then xi ∈ B(x∗, ρ) for all i, and thus

||xi+1 − x∗|| ≤ β1β2 ||xi − x∗||²    ∀i    (4.35)


and
||xi+1 − x∗|| ≤ β1β2ρ ||xi − x∗|| ≤ (β1β2ρ)^(i+1) ||x0 − x∗||    ∀i.
Since β1 β2 ρ < 1, it follows that xi → x∗ as i → ∞. In view of (4.35), convergence is
q-quadratic.

We now turn back to our optimization problem

min{f (x)|x ∈ V }.

Since we are looking for points x∗ such that Df (x∗) = 0B(V,R), we want to solve a system of
nonlinear equations and the theory just presented applies, provided f is twice differentiable.
The Newton iteration now amounts to solving the linear system

Df (xi ) + D2 f (xi )(x − xi ) = 0


i.e., finding a stationary point for the quadratic approximation to f

f (xi) + Df (xi)(x − xi) + (1/2)(D²f (xi)(x − xi))(x − xi).
In particular, if f is quadratic (i.e., Df is linear), one Newton iteration yields such point
(which may or may not be the minimizer) exactly. Quadratic convergence is achieved, e.g., if
f is three times continuously differentiable and the 2nd order sufficiency condition holds at
x∗ , so that D2 f (x∗ ) is invertible. We will now show that we can obtain stronger convergence
properties when the Newton iteration is applied to a minimization problem (than when
it is applied to a general root-finding problem). In particular, we want to achieve global
convergence (convergence for any initial guess x0 ). We can hope to achieve this because
the optimization problem has more structure than the general equation solving problem, as
shown in the next exercise.

Exercise 4.35 Exhibit a function F : R2 → R2 which is not the gradient of any C 2


function f : R2 → R.

Exercise 4.36 Newton’s method is invariant under affine transformations of the domain
z 7→ Lz + b, where L : V → V is an invertible, bounded linear map. [Note that, if L is a
bounded linear map and F is continuously differentiable, then G : z 7→ F (Lz + b) also is
continuously differentiable. Why?] Express this statement in a mathematically precise form,
and prove it. Next, turning to the application to minimization, show (e.g., by exhibiting
an example) that steepest descent (e.g., with exact line search) is not invariant under such
transformations, a significant shortcoming in comparison to Newton’s method, since selecting
a good scaling is then the user’s responsibility.

Remark 4.16 Global strong convexity of f (over all of Rn) does not imply global conver-
gence of Newton's method to minimize f. (The integral of the function plotted in Figure 4.5
gives such an example.)


Exercise 4.37 Come up with a specific globally strongly convex function f : R → R and
an initial point x0 such that Newton’s iteration started at x0 does not converge to the unique
minimizer of f . Experiment with Matlab, trying various initial points, and comment (in
particular, you should observe local quadratic convergence).
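
One candidate for such an experiment, in Python rather than Matlab (this specific f is
my own suggestion, to be verified as part of the exercise, not an example given in the notes):

    import numpy as np

    # f(x) = sqrt(1 + x^2) + (eps/2) x^2 is globally strongly convex (f'' >= eps > 0);
    # for small eps, pure Newton started at moderate x0 appears to oscillate
    # rather than converge, while small x0 exhibits fast local convergence.
    eps = 0.01
    fp  = lambda x: x / np.sqrt(1 + x**2) + eps * x            # f'
    fpp = lambda x: (1 + x**2) ** (-1.5) + eps                 # f''

    for x0 in [0.5, 1.0, 2.0]:
        x, traj = x0, [x0]
        for _ in range(8):
            x = x - fp(x) / fpp(x)                             # Newton step
            traj.append(x)
        print(x0, np.round(traj, 3))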

Figure 4.5: Example of non-convergence of Newton's method for minimizing a strongly
convex function f, equivalently for locating a point at which its derivative F (pictured, with
derivative F′ bounded away from zero, so that ∂F/∂x (x) > 0 for all x) vanishes; the iterates
x0, x1, x2, x3 diverge.

Global convergence will be obtained by making use of a suitable step-size rule, e.g., the
Armijo rule.
Now suppose V = Rn. (The same ideas apply in any Hilbert space, but some additional
notation is needed.) Note that the Newton direction hN(x) at some x is given by
hN(x) = −∇²f (x)⁻¹∇f (x).
When ∇²f (x) ≻ 0, hN(x) is minus the gradient associated with the (local) inner product
⟨u, v⟩ = uᵀ∇²f (x)v.
Hence, in such case, Newton's method is a special case of steepest descent, albeit with a
norm that changes at each iteration.

Armijo–Newton
The idea is the following: replace the Newton iterate xi+1 = xi − ∇2 f (xi )−1 ∇f (xi ) by a
suitable step in the Newton direction, i.e.,
xi+1 = xi − ti ∇2 f (xi )−1 ∇f (xi ) (4.36)
with ti suitably chosen. By controlling the length of each step, one hopes to prevent insta-
bility. Formula (4.36) fits into the framework of Algorithm 2, with h(x) := hN (x). Following
Algorithm 2, we define ti in (4.36) via the Armijo step-size rule.
Note that if ∇2 f (x) is positive definite, h(x) is a descent direction. Global convergence with
an Armijo step, however, requires more (see (4.12)–(4.13)).


Exercise 4.38 Suppose f : Rn → R is in C 2 and suppose ∇2 f (x) ≻ 0 for all x ∈ Rn . Then


(i) For every x̂ ∈ L there exists C > 0 and ρ > 0 such that, for all x close enough to x̂,

||hN (x)|| ≥ C||∇f (x)||

hN (x)T ∇f (x) ≤ −ρ||hN (x)|| ||∇f (x)||


so that the assumptions of Theorem 4.4 hold.

(ii) The sequence {xk } generated by the Armijo-Newton algorithm converges to the unique
global minimizer.

Armijo-Newton yields global convergence. However, the q-quadratic rate may be lost since
nothing ensures that the step-size ti will be equal to 1, even very close to a solution x∗.
However, as shown in the next theorem, this will be the case if the parameter α is less
than 1/2.

Theorem 4.12 Suppose f : Rn → R is in C 3 and let an initial point x0 be given. Consider


Algorithm 2 with α ∈ (0, 1/2) and hi := hN (xi ), and suppose that x̂ is an accumulation point
of {xi } with (∇f (x̂) = 0 and) ∇2 f (x̂) ≻ 0 (second order sufficiency condition). Then there
exists i0 such that ti = 1 for all i ≥ i0 , and xi → x̂ q-quadratically.

Exercise 4.39 Prove Theorem 4.12.

We now have a globally convergent algorithm, with (locally) a q-quadratic rate (if f is thrice
continuously differentiable). However, we needed an assumption of strong convexity on f on
bounded sets (for the iteration to be well-defined).
Suppose now that f is not strongly convex (perhaps not even convex). We noticed earlier
that, around a local minimizer, the strong convexity assumption is likely to hold. Hence,
we need to steer the iterate xi towards the neighborhood of such a local solution and then
use Armijo-Newton for local convergence. Such a scheme is called a stabilization scheme.
Armijo-Newton stabilized by a gradient method would look like the following.
Algorithm
Parameters. α ∈ (0, 1/2), β ∈ (0, 1)
Data. x0 ∈ Rn
i=0
while ∇f (xi ) 6= 0 do {
set
Hi := ∇2 f (xi ) + δi I,
where δi ≥ 0 is appropriately selected;
set
hi := −Hi−1 ∇f (xi );
compute Armijo step size ti ;
set xi+1 := xi + ti hi


}
stop.
In defining Hi, the non-negative scalar δi should be selected large enough (so that hi is close
enough to the negative gradient direction) that global convergence ensues, while ideally being
set to zero from some iteration onwards, so that locally the Armijo-Newton iteration takes
over, yielding q-quadratic convergence. The δ-selection rule is critical: a poor rule may cause
the iteration to enter an infinite loop, repeatedly switching between δi = 0 and a “large”
value, possibly even hindering global convergence.
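
One simple δ-selection rule (my own illustrative choice; the notes deliberately leave the rule
unspecified, and this naive version is subject to the switching pitfall just mentioned) tests
positive definiteness via a Cholesky factorization:

    import numpy as np

    def stabilized_newton_direction(H, g, tau=1e-3):
        # Return h = -(H + delta I)^{-1} g, with delta the smallest value in
        # {0, tau, 10 tau, 100 tau, ...} making H + delta I positive definite.
        n = H.shape[0]
        delta = 0.0
        while True:
            try:
                L = np.linalg.cholesky(H + delta * np.eye(n))  # fails if not pos. def.
                break
            except np.linalg.LinAlgError:
                delta = tau if delta == 0.0 else 10.0 * delta
        # solve (H + delta I) h = -g using the Cholesky factor L L^T
        y = np.linalg.solve(L, -g)
        return np.linalg.solve(L.T, y)

The returned h would then be combined with an Armijo step size ti, as in the algorithm above.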

4.9 Variable metric methods


Two major drawbacks of Newton’s method are as follows:
1. for hi = −∇2 f (xi )−1 ∇f (xi ) to be a descent direction for f at xi , the Hessian ∇2 f (xi )
must be positive definite. (For this reason, we had to stabilize it with a gradient
method.)

2. second derivatives must be computed (n(n + 1)/2 of them!) and a linear system of equations
has to be solved.

The variable metric methods avoid these 2 drawbacks. The price paid is that the rate of
convergence is not quadratic anymore, but merely superlinear (as in the conjugate gradient
method). The idea is to construct increasingly better estimates Si of the inverse of the
Hessian, making sure that those estimates remain positive definite. Since the Hessian ∇2 f
is generally positive definite around a local solution, the latter requirement presents no
contradiction.
Algorithm
Data. x0 ∈ Rn , S0 ∈ Rn×n , positive definite (e.g., S0 = I)
while ∇f (xi) ≠ 0 do {
hi = −Si ∇f (xi )
pick ti ∈ arg min{f (xi + thi) | t ≥ 0}
xi+1 = xi + ti hi
compute Si+1 , positive definite, using some update formula
}
stop

If the Si are bounded and uniformly positive definite, all accumulation points of the sequence
constructed by this algorithm are stationary (why?).
Exercise 4.40 Consider the above algorithm with the step-size rule ti = 1 for all i instead
of an exact search and assume f is strongly convex. Show that, if ||Si − ∇²f (xi)⁻¹|| → 0,
convergence is Q-superlinear (locally). [Follow the argument used for Newton's method. In
fact, if ||Si − ∇²f (xi)⁻¹|| → 0 fast enough, convergence may be quadratic.]


A number of possible update formulas have been suggested. The most popular one, due
independently to Broyden, Fletcher, Goldfarb and Shanno (BFGS) is given by

Si+1 = Si + (γiγiᵀ)/(δiᵀγi) − (Si δi δiᵀ Si)/(δiᵀ Si δi)

where

δi = xi+1 − xi,
γi = ∇f (xi+1) − ∇f (xi).

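
A minimal Python sketch of a BFGS iteration (my own illustrative code; it uses the standard
inverse-Hessian form of the BFGS update, a common way to implement BFGS so that
computing hi = −Si∇f (xi) requires no linear solve, and it takes a unit step where a line
search would normally be used):

    import numpy as np

    def bfgs(grad_f, x0, tol=1e-8, max_iter=200):
        x = np.asarray(x0, dtype=float)
        n = x.size
        S = np.eye(n)                  # estimate of the inverse Hessian
        g = grad_f(x)
        for _ in range(max_iter):
            if np.linalg.norm(g) <= tol:
                break
            h = -S @ g                 # quasi-Newton direction
            x_new = x + h              # a line search would normally pick the step
            g_new = grad_f(x_new)
            d, y = x_new - x, g_new - g    # delta_i and gamma_i of the notes
            if d @ y > 1e-12:              # keeps S positive definite
                rho = 1.0 / (d @ y)
                V = np.eye(n) - rho * np.outer(d, y)
                S = V @ S @ V.T + rho * np.outer(d, d)
            x, g = x_new, g_new
        return x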
Convergence
If f is three times continuously differentiable and strongly convex, the sequence {xi } gener-
ated by the BFGS algorithm converges superlinearly to the solution x∗ .

Remark 4.17

1. BFGS has been observed to perform remarkably well on non convex cost functions

2. Variable metric methods, much like conjugate gradient methods, use past information
in order to improve the rate of convergence. Variable metric methods require more
storage than conjugate gradient methods (an n × n matrix) but generally exhibit much
better convergence properties.

Exercise 4.41 Justify the name “variable metric” as follows. Given a symmetric positive
definite matrix M, define

||x||M = (xᵀMx)^(1/2)    (= new metric)

and show that

inf_h {f (x + αh) : ||h||_(Si⁻¹) = 1}    (Pα)

is achieved for h close to ĥ := −Si∇f (x) / ||Si∇f (x)||_(Si⁻¹) for α small. Specifically, show
that, given any h̃ ≠ ĥ, with ||h̃||_(Si⁻¹) = 1, there exists ᾱ > 0 such that

f (x + αĥ) < f (x + αh̃)    ∀α ∈ (0, ᾱ].

In particular, if D²f (x) > 0, the Newton direction is the direction of steepest descent in the
corresponding norm.



Chapter 5

Constrained Optimization

Consider the problem

min{f (x) : x ∈ Ω} (P )
where f : V → R is continuously Fréchet differentiable and Ω is a subset of V , a normed
vector space. Although the case of interest here is when Ω is not open (see Exercise 4.1), no
such assumption is made.

Remark 5.1 Note that this formulation is very general. It allows in particular for binary
variables (e.g., Ω = {x : x(x − 1) = 0}) or integer variables (e.g., Ω = {x : sin(xπ) = 0}).
These two examples are in fact instances of smooth equality-constrained optimization: see
section 5.2 below.

5.1 Abstract Constraint Set


In the unconstrained case, we obtained a first order necessary condition of local optimality
by approximating f in the neighborhood of x̂ with its first order expansion at x̂. Specifically
we had, for all small enough h and some little-o function,

0 ≤ f (x̂ + h) − f (x̂) = Df (x̂)h + o(h),

and since the direction (and orientation) of h was unrestricted, we could conclude that
Df (x̂) = 0B(V,R) .
In the case of problem (P ) however, the inequality f (x̂) ≤ f (x̂ + h) is not known to
hold for every small h. It is known to hold when x̂ + h ∈ Ω though. Since Ω is potentially
very complicated to specify, some approximation of it near x̂ is needed. A simple yet very
powerful such class of approximations is the class of conic approximations. Perhaps the most
obvious one is the “radial cone”.

Definition 5.1 The radial cone RC(x, Ω) to Ω at x is defined by

RC(x, Ω) = {h ∈ V : ∃t̄ > 0 s.t. x + th ∈ Ω ∀t ∈ (0, t̄]}.


The following result is readily proved (as an extension of the proof of Theorem 4.1 of uncon-
strained minimization, or with a simplification of the proof of the next theorem).

Proposition 5.1 Suppose x∗ is a local minimizer for (P). Then

Df (x∗ )h ≥ 0 ∀h ∈ cl(coRC(x∗ , Ω))

While this result is useful, it has a major drawback: When nonlinear equality constraints
are present, RC(x∗ , Ω) is usually empty, so that theorem is vacuous. This motivates the
introduction of the “tangent cone”. But first of all, let us define what is meant by “cone”.

Definition 5.2 Given a vector space V, a set C ⊆ V is a cone if x ∈ C implies αx ∈ C
for all α > 0.

In other words given x ∈ C, the entire ray from the origin through x belongs to C (but
possibly not the origin itself). A cone may be convex or not.
Example 5.1 (Figure 5.1)

Figure 5.1: Examples in ℜ² and ℜ³: a convex cone, a set that is not a cone, and several
non-convex cones (among them the union ι1 ∪ ι2 of two lines through the origin).

A cone may or may not include the origin of the space. If it does, it is pointed. A cone
is salient if it does not include a nontrivial subspace. (A subspace S is a nontrivial subspace
if S ≠ {0V}.)

Exercise 5.1 Show that a cone C is convex if and only if αx + βy ∈ C for all α, β ≥ 0,
α + β > 0, x, y ∈ C.

Exercise 5.2 Show that a cone C is convex if and only if (1/2)(x + y) ∈ C for all x, y ∈ C.
Show by exhibiting a counterexample that this statement is incorrect if “cone” is replaced
with “set”.

Exercise 5.3 Prove that radial cones are cones.


To address the difficulty just encountered with radial cones, let us focus for a moment
on the case Ω = {(x, y) : y = f (x)} (e.g., with x and y scalars), and with f nonlinear.
As just observed, the radial cone is empty in this situation. From freshman calculus we
do know how to approximate such set around x̂ though: just replace f with its first-order
Taylor expansion, yielding {(x, y) : y = f (x̂) + Df (x̂)(x − x̂)}, where we have replaced the
curve with its “tangent” at x̂. Hence we have replaced the radial-cone specification (that
a ray belongs to the cone if short displacements along that ray yield points within Ω) with
a requirement that short displacements along a candidate tangent direction to our curve
yield points “little-o-close” to the curve. Merging the two ideas yields the “tangent cone”, a
super-set of the radial cone. In the case of Figure 5.2 below, the tangent cone is the closure
of the radial cone: it includes the radial cone as well as the two “boundary lines”, which are
the tangents to the “boundary curves” of Ω. (It is of course not always the case that the
tangent cone is the closure of the radial cone: Just think of the case that motivated this
discussion, where the radial cone was empty.)

Definition 5.3 Given an normed space V , Ω ⊆ V , and x ∈ Ω, the tangent cone TC(x, Ω)
to Ω at x is defined by

TC(x, Ω) = {h ∈ V : ∃o(·), t̄ > 0 s.t. x + th + o(t) ∈ Ω ∀t ∈ (0, t̄], ||o(t)||/t → 0 as t → 0, t > 0}.
The next exercise shows that this definition meets the geometric intuition discussed above.

Exercise 5.4 Let


Ω = epif := {(y, z) ∈ Rn+1 : z ≥ f (y)},
with f : Rn → R continuously differentiable. Verify that, given (ŷ, ẑ) with ẑ = f (ŷ),

TC((ŷ, ẑ), Ω) = {(v, w) : w ≥ Df (ŷ)v},

i.e., the tangent cone is the half space that lies “above” the subspace parallel to the tangent
hyperplane—when n = 1, the line through the origin parallel to the tangent line—to f at
(ŷ, ẑ).

Exercise 5.5 Show that tangent cones are cones.

Exercise 5.6 Given an normed space V , Ω ⊆ V , and x ∈ Ω, show that

TC(x, Ω) = {h ∈ V : ∃o(·) s.t. x + th + o(t) ∈ Ω ∀t > 0, o(t)/t → 0 as t → 0, t > 0}.

Exercise 5.7 Let Ω ⊆ H, H a Hilbert space (e.g., H = Rn ). Let ϕ : R → Ω be continuously


differentiable. (Then given α ∈ R, Dϕ(t)α ∈ H for all t, i.e., by linearity, αDϕ(t) ∈ H, in
particular (with α = 1) Dϕ(t) ∈ H for all t.) Prove that, for all t, Dϕ(t) and −Dϕ(t) both
belong to TC(ϕ(t), Ω).


Some authors require that o(·) be continuous. This may yield a smaller set (and hence a
weaker version of the necessary condition of optimality), as shown in the next exercise.
Exercise 5.8 Let Ω := {0, 1, 1/2, 1/3, . . .}. Show that TC(0, Ω) = {h ∈ R : h ≥ 0} but there
exists no continuous little-o function such that th + o(t) ∈ Ω for all t > 0 small enough.
TC need not be convex, even if Ω is defined by smooth equality and inequality constraints
(although N (Dg(x)) and S(x), introduced below, are convex). Example: Ω = {x ∈ R2 :
x1 x2 ≤ 0}. Also note that x∗ + TC(x∗ , Ω) is an approximation to Ω around x∗ .
Theorem 5.1 Suppose x∗ is a local minimizer for (P). Then

Df (x∗ )h ≥ 0 ∀h ∈ cl(coTC(x∗ , Ω)).

Proof. Let h ∈ TC(x∗ , Ω). Then


∃o(·) ∋ x∗ + th + o(t) ∈ Ω ∀t ≥ 0 and lim_{t→0} o(t)/t = 0.    (5.1)

By definition of a local minimizer, ∃ t̄ > 0 ∋

f (x∗ + th + o(t)) ≥ f (x∗ ) ∀t ∈ [0, t̄] (5.2)

But, using the definition of derivative,

f (x∗ + th + o(t)) − f (x∗) = Df (x∗)(th + o(t)) + ô(th + o(t)), where ô(·) is a little-o function,
    = tDf (x∗)h + õ(t)

with õ(t) = Df (x∗)o(t) + ô(th + o(t)).


Hence

f (x∗ + th + o(t)) − f (x∗) = t (Df (x∗)h + õ(t)/t) ≥ 0    ∀t ∈ (0, t̄]

so that

Df (x∗)h + õ(t)/t ≥ 0    ∀t ∈ (0, t̄].

It is readily verified that õ(t)/t → 0 as t → 0. Thus, letting t ց 0, one obtains

Df (x∗)h ≥ 0.
Df (x∗ )h ≥ 0.

The remainder of the proof is left as an exercise.

Dual cone. In a Hilbert space X, if x∗ = 0X solves (P), then ⟨gradf (x∗), h⟩ ≥ 0 for all h in
the tangent cone, i.e., gradf (x∗) belongs to the dual cone to the tangent cone.

Definition 5.4 Given a cone K, its dual cone K∗ is given by

K∗ := {y ∈ X : ⟨y, x⟩ ≥ 0 ∀x ∈ K}.


Figure 5.2: In the situation pictured (where x∗ = 0X for clarity), the tangent cone to Ω at
x∗ is the closure of the radial cone to Ω at x∗; the translate x∗ + TC(x∗, Ω) and a cone C
are also shown.

In Figure 5.2, when x∗ = 0X, cone C is the negative of the dual cone (i.e., it is the polar
cone) of TC(x∗, Ω).

The necessary condition of optimality we just obtained is not easy to use. By considering
specific types of constraint sets Ω, we hope to obtain a simple expression for TC, thus a
convenient condition of optimality. We first consider the equality constrained problem.

5.2 Equality Constraints - First Order Conditions


For simplicity, we now restrict ourselves to V = Rn . Consider the problem

min{f (x) : g(x) = 0ℓ } (5.3)

i.e.,

min{f (x) : x ∈ Ω},    where Ω := {x : g(x) = 0ℓ},
where f : Rn → R and g : Rn → Rℓ are both continuously Fréchet differentiable. Let x∗


be such that g(x∗ ) = 0ℓ . Let h ∈ TC(x∗ , Ω), i.e., suppose there exists a o(·) function such
that
g(x∗ + th + o(t)) = 0ℓ ∀t ≥ 0.
Since g(x∗ ) = 0ℓ , we readily conclude that Dg(x∗)h = 0ℓ . Hence TC(x∗ , Ω) ⊆ N (Dg(x∗ )),
i.e., if h ∈ TC(x∗ , Ω) then ∇gi (x∗ )T h = 0 for every i: tangent vectors are orthogonal to the
gradients of the components of g.

Exercise 5.9 Prove that, furthermore,

cl(coTC(x∗ , Ω)) ⊆ N (Dg(x∗ )) .

Definition 5.5 The pair (x∗ , g) is said to be non-degenerate if

cl(coTC(x∗ , Ω)) = N (Dg(x∗)) . (5.4)


Below, we investigate conditions that guarantee non-degeneracy. Note that, indeed, non-
degeneracy may fail to hold. In particular, unlike TC(x∗ , Ω) and the closure of its convex
hull, N (Dg(x∗ )) depends not only on Ω, but also on the way Ω is formulated. For example,
with x ∈ R, {x : x = 0} ≡ {x : x² = 0} but N (Dg(0)) is {0} for the left-hand side and R
for the right-hand side.
We now derive a first order optimality conditions for the case when (5.4) does hold.
Theorem 5.2 Suppose x∗ is a local minimizer for (5.3) and suppose that (x∗ , g) is non-
degenerate, i.e., cl coTC(x∗ , Ω) = N (Dg(x∗ )). Then
Df (x∗ )h = 0 ∀h ∈ N (Dg(x∗))

Proof. From Theorem 5.1,


Df (x∗)h ≥ 0    ∀h ∈ cl(coTC(x∗, Ω))    (5.5)
i.e.,
Df (x∗ )h ≥ 0 ∀h ∈ N (Dg(x∗ )) . (5.6)
Now obviously h ∈ N (Dg(x∗ )) implies −h ∈ N (Dg(x∗ )). Hence
Df (x∗ )h = 0 ∀h ∈ N (Dg(x∗ )) (5.7)

Remark 5.2 Theorem 5.2 can also be proved directly, without making use of Theorem 5.1,
by invoking the “Inverse Function Theorem”. See, e.g., [?, Section 9.3, Lemma 1].
Remark 5.3 Our non-degeneracy condition is in fact a type of “constraint qualification”;
more on this later. In fact, as mentioned in Remark 5.22 it is the least restrictive constraint
qualification for our equality-constrained problem.
Corollary 5.1 (Lagrange multipliers. Joseph-Louis Lagrange, Italian-born mathematician,
1736–1813.) Under the same assumptions, ∃λ∗ ∈ Rℓ such that

∇f (x∗) + Dg(x∗)ᵀλ∗ = 0n    (5.8)

i.e.

∇f (x∗) + ∑_{j=1}^{ℓ} λ∗,j ∇g^j(x∗) = 0n.    (5.9)

Proof. From the theorem above,

∇f (x∗) ∈ N (Dg(x∗))⊥ = R(Dg(x∗)ᵀ),

i.e.,

∇f (x∗) = Dg(x∗)ᵀλ̃    (5.10)

for some λ̃. The proof is complete if one sets λ∗ = −λ̃.

Next, we seek intuition concerning (5.4) by means of examples.


Example 5.2 (Figure 5.3)

(1) m = 1, n = 2. Claim: ∇g(x∗) ⊥ TC(x∗, Ω) (why?). Thus, from the picture, TC(x∗, Ω) =
N (Dg(x∗)) (assuming ∇g(x∗) ≠ 02), so (5.4) holds.

(2) m = 2, n = 3. Again, TC(x∗, Ω) = N (Dg(x∗)), so again (5.4) holds. (We assume here
that ∇g1(x∗) ≠ 03 ≠ ∇g2(x∗), and it is clear from the picture that these two gradients
are not parallel to each other.)

(3) m = 2, n = 3. Here TC is a line but N (Dg(x∗)) is a plane (assuming ∇g1(x∗) ≠
03 ≠ ∇g2(x∗)). Thus cl(coTC(x∗, Ω)) ≠ N (Dg(x∗)). Note that x∗ could be a local
minimizer with ∇f (x∗) as depicted, although ∇f (x∗)ᵀh < 0 for some h ∈ N (Dg(x∗)).

Figure 5.3: Inclusion (5.4) holds in case (1) and (2), but not in case (3).

Now let h ∈ N (Dg(x∗)), i.e., suppose Dg(x∗)h = 0ℓ . We want to find o(·) : R → Rn s.t.

s(t) = x∗ + th + o(t) ∈ Ω ∀t ≥ 0

Geometric intuition (see Figure 5.4) suggests that we could try to find o(t) orthogonal to
h. (This does not work for the third example in Figure 5.3.) Since h ∈ N (Dg(x∗)), we try
with o(t) in the range of Dg(x∗)ᵀ, i.e.,


Figure 5.4: The trial point x∗ + th is corrected to x∗ + th + o(t), which lies in Ω.

s(t) = x∗ + th + Dg(x∗)T ϕ(t)


for some ϕ(t) ∈ Rℓ . We want to find ϕ(t) such that s(t) ∈ Ω ∀t, i.e., such that g(x∗ + th +
Dg(x∗ )T ϕ(t)) = 0ℓ ∀t, and to see under what condition ϕ exists and is a “little o” function.
We will make use of the implicit function theorem.

Theorem 5.3 (Implicit Function Theorem (IFT); see, e.g., [?].) Let F : Rm ×
Rn−m → Rm, m < n, and x̂1 ∈ Rm, x̂2 ∈ Rn−m be such that

(a) F ∈ C 1

(b) F (x̂1 , x̂2 ) = 0m

(c) D1 F (x̂1 , x̂2 ) is non-singular.

Then ∃ǫ > 0 and a function Φ : B(x̂2 , ǫ) → Rm such that

(i) x̂1 = Φ(x̂2 )

(ii) F (Φ(x2 ), x2 ) = 0m ∀x2 ∈ B(x̂2 , ǫ)

and Φ is the only function satisfying (i) and (ii);

(iii) Φ ∈ C 1 in B(x̂2 , ǫ)

(iv) DΦ(x2 ) = −[D1 F (Φ(x2 ), x2 )]−1 D2 F (Φ(x2 ), x2 ) ∀x2 ∈ B(x̂2 , ǫ), where D denotes
differentiation and Di differentiation with respect to the ith argument.

Interpretation (Figure 5.5)

Figure 5.5: The curve F (x1, x2) = 0; near x∗ it defines x1 as a function of x2, but not near x̃.


Let n = 2, m = 1, i.e., x1 ∈ R, x2 ∈ R. [The idea is to “solve” the system of equations
for x1, locally around (x̂1, x̂2).] Around x∗ (where, “likely”, D1F (x∗1, x∗2) ≠ 0), x1 is a well
defined continuous function of x2: x1 = Φ(x2). Around x̃ (where D1F (x̃1, x̃2) = 0), x1 is not
everywhere defined (specifically, it is not defined for x2 < x̃2), i.e., (c) is violated. Note that
(iv) is obtained by differentiating (ii) using the chain rule. We are now ready to prove the
following theorem.

Exercise 5.10 Let A be a full rank m × n matrix with m ≤ n. Then AAT is non-singular.
If m > n, AAT is singular.

Definition 5.6 x ∈ Ω is a regular point for problem (5.3) if Dg(x) is surjective, i.e., has
full row rank.

Theorem 5.4 Suppose that g(x∗ ) = 0ℓ and x∗ is a regular point for (5.3). Then (x∗ , g) is
non-degenerate, i.e., TC(x∗ , Ω) = N (Dg(x∗ )).

Proof. Let h ∈ N (Dg(x∗ )). We want to find ϕ : R → Rℓ such that s(t) := x∗ + th +


Dg(x∗ )T ϕ(t) satisfies
g(s(t)) = 0ℓ ∀t > 0
i.e.,
g(x∗ + th + Dg(x∗)ᵀϕ(t)) = 0ℓ    ∀t > 0.
Consider the function g̃ : Rℓ × R → Rℓ,

g̃ : (ϕ, t) ↦ g(x∗ + th + Dg(x∗)ᵀϕ).

We now use the IFT on g̃ with ϕ̂ = 0ℓ , t̂ = 0. We have


(i) g̃ ∈ C 1

(ii) g̃(0ℓ , 0) = 0ℓ

(iii) D1g̃(0ℓ, 0) = Dg(x∗)Dg(x∗)ᵀ.

Hence, by Exercise 5.10, D1g̃(0ℓ, 0) is non-singular and the IFT applies, i.e., ∃ϕ : R → Rℓ and t̄ > 0 such that
ϕ(0) = 0ℓ and
g̃(ϕ(t), t) = 0ℓ ∀t ∈ [−t̄, t̄]
i.e.,
g(x∗ + th + Dg(x∗)ᵀϕ(t)) = 0ℓ    ∀t ∈ [−t̄, t̄].
Now note that a differentiable function that vanishes at 0 is a “o” function if and only if its
derivative vanishes at 0. To exploit this fact, note that

(dϕ/dt)(t) = −(D1g̃(ϕ(t), t))⁻¹ D2g̃(ϕ(t), t)    ∀t ∈ [−t̄, t̄].
dt
But, from the definition of g̃ and since h ∈ N (Dg(x∗ )),

D2 g̃(0ℓ , 0) = Dg(x∗ )h = 0ℓ


i.e.,
(dϕ/dt)(0) = 0ℓ
and
ϕ(t) = ϕ(0) + (dϕ/dt)(0) t + o(t) = o(t),
so that
g(x∗ + th + o′(t)) = 0ℓ    ∀t
with
o′(t) = Dg(x∗)ᵀo(t),
which implies that h ∈ TC(x∗, Ω).

Remark 5.4 Note that, in order for Dg(x∗) to be full row rank, it is necessary that ℓ ≤ n,
a natural condition for problem (5.3).
Remark 5.5 Regularity of x∗ is not necessary in order for (5.8) to hold. For example,
consider cases where two components of g are identical (in which case there are no regular
points), or cases where x∗ happens to also be an unconstrained local minimizer (so (5.8)
holds trivially) but is not a regular point, e.g., min x² subject to x = 0. Also, it is a simple
exercise to show that, if g is affine, then every x∗ such that g(x∗) = 0ℓ is regular.
Remark 5.6 It follows from Theorem 5.4 that, when x∗ is a regular point, TC(x∗, Ω) is
convex and closed, i.e.,
TC(x∗, Ω) = cl(coTC(x∗, Ω)).
Indeed, N (Dg(x∗)) is a closed subspace. This subspace is typically referred to as the tangent
plane to Ω at x∗.
Remark 5.7 Suppose now g is scalar-valued. The set Lα (g) := {x : g(x) = α} is often
referred to as the α-level set of g. Let x∗ ∈ Lα (g) for some α. Regardless of whether x∗ is
regular for g, we have
TC(x∗ , Lα (g)) ⊆ N (Dg(x∗ )) = {∇g(x∗ )}⊥ ,
i.e., ∇g(x∗ ) is orthogonal to TC(x∗ , Lα (g)). It is said to be normal at x∗ to Lα (g), the
(unique) level set (or level surface, or level curve) of g that contains x∗ .
Remark 5.8 Also of interest is the connection between (i) the gradient at some x̂ ∈ Rn of a
scalar, continuously differentiable function f : Rn → R and (ii) the normal at (x̂, f (x̂)) to its
graph

G := {(x, z) ∈ Rn+1 : z = f (x)}.

(Note that ∇f (x) lies in Rn while G belongs to Rn+1.) Thus let x̂ ∈ Rn and ẑ = f (x̂). Then
(i) (x̂, ẑ) is regular for the function z − f (x) and (ii) (g, −1) is orthogonal to the tangent plane
at (x̂, ẑ) to G (i.e., is normal to G) if and only if g = ∇f (x̂). Thus, the gradient at x̂ of f is
the horizontal projection of the downward normal (which makes an angle of more than π/2
with the positive z-axis) at (x̂, ẑ) to the graph of f, when the normal is scaled so that its
vertical projection (which is always nonzero since f is differentiable at x̂) has unit magnitude.


Exercise 5.11 Prove claims (i) and (ii) in Remark 5.8.

Corollary 5.2 Suppose that x∗ is a local minimizer for (5.3) (without full rank assumption).
Then ∃λ∗ ∈ R1+ℓ, λ∗ = (λ∗,0, λ∗,1, . . . , λ∗,ℓ) ≠ 01+ℓ, such that

λ∗,0 ∇f (x∗) + ∑_{j=1}^{ℓ} λ∗,j ∇g^j(x∗) = 0n,    (5.11)

i.e., the set {∇f (x∗), ∇g^j(x∗), j = 1, . . . , ℓ} is linearly dependent.

Proof. If Dg(x∗) is not full rank, then (5.11) holds with λ∗,0 = 0 (the ∇g^j(x∗) are linearly
dependent). If Dg(x∗) has full rank, (5.11) holds with λ∗,0 = 1, from Corollary 5.1.

Remark 5.10

1. How does the optimality condition (5.9) help us solve the problem? Just remember
that x∗ must satisfy the constraints, i.e.,

g(x∗ ) = 0ℓ (5.12)

Hence we have a system of n + ℓ equations with n + ℓ unknowns (x∗ and λ∗). Now,


keep in mind that (5.9) is only a necessary condition. Hence, all we can say is that,
if there exists a local minimizer satisfying the full rank assumption, it must be among
the solutions of (5.9)+(5.12).
2. Solutions with λ∗,0 = 0 are degenerate in the sense that the cost function does not enter
the optimality condition. If λ∗,0 6= 0, dividing both sides of (5.11) by λ∗,0 yields (5.9).

Lagrangian function
If one defines L : Rn × Rℓ → R by

L(x, λ) = f(x) + ∑_{j=1}^{ℓ} λ^j g^j(x)

then the optimality conditions (5.9)+(5.12) can be written as: ∃λ∗ ∈ Rℓ s.t.

D₁L(x∗, λ∗) = 0_{B(Rn,R)},    (5.13)

D₂L(x∗, λ∗) = 0_{B(Rℓ,R)}.    (5.14)

(In fact, D₂L(x∗, λ) = 0_{B(Rℓ,R)} ∀λ.) L is called the Lagrangian for problem (5.3).
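To make item 1 of Remark 5.10 concrete, the following is a minimal sketch (my own illustration, not part of the notes) that solves the square system (5.9)+(5.12) symbolically for a made-up problem:

```python
# Minimal sketch (illustration only): solve the square system (5.9)+(5.12),
# i.e., D1 L = 0 and g = 0, for the made-up problem
#   min { x1^2 + x2^2 : x1 + x2 - 1 = 0 }   (n = 2, ell = 1).
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + x2**2            # cost
g = x1 + x2 - 1              # equality constraint
L = f + lam * g              # Lagrangian

eqs = [sp.diff(L, x1), sp.diff(L, x2), g]   # n + ell = 3 equations, 3 unknowns
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# [{lam: -1, x1: 1/2, x2: 1/2}]
```

As the remark cautions, such solutions are only candidates: (5.9)+(5.12) is a necessary condition, so each solution of the system must still be checked.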

Copyright ©1993–2024, André L. Tits. All Rights Reserved 95


Constrained Optimization

5.3 Equality Constraints – Second Order Condition


Assume V = Rn. In view of the second order condition for unconstrained problems, one might expect a second order necessary condition of the type

h^T ∇²f(x∗) h ≥ 0  ∀h ∈ cl(co TC(x∗, Ω)) (= N(Dg(x∗)))    (5.15)

since it should be enough to consider directions h in the tangent plane to Ω at x∗. The following exercise shows that this is not true in general. (In a sense, the tangent cone is a mere first-order approximation to Ω near x∗, and is not accurate enough for (5.15) to hold.)
Exercise 5.12 Consider the problem

min{f(x, y) ≡ −x² + y : g(x, y) ≡ y − kx² = 0},  k > 1.

Show that (0, 0) is the unique global minimizer and that it does not satisfy (5.15). ((0, 0) does not minimize f over the tangent cone (= tangent plane).)

The correct second order conditions will involve the Lagrangian. (The statement given here is more restrictive than strictly necessary. Specifically, while surjectivity of the Jacobian of the constraints is sufficient for the theorem to hold, it also holds under milder appropriate conditions.)

Theorem 5.5 (2nd order necessary condition). Suppose x∗ is a local minimizer for (5.3) and suppose that Dg(x∗) is surjective. Also suppose that f and g are twice continuously differentiable. Then there exists λ ∈ Rℓ such that (5.13) holds and

h^T ∇²_{xx}L(x∗, λ) h ≥ 0  ∀h ∈ N(Dg(x∗)).    (5.16)

Proof. Let h ∈ N(Dg(x∗)) = TC(x∗, Ω) (since Dg(x∗) has full row rank). Then there exists o(·) s.t. x∗ + th + o(t) ≡ s(t) ∈ Ω ∀t ≥ 0, i.e.,

g(s(t)) = 0_ℓ ∀t ≥ 0    (5.17)

and, since x∗ is a local minimizer, for some t̄ > 0,

f(s(t)) ≥ f(x∗) ∀t ∈ [0, t̄].    (5.18)

We can write, ∀t ∈ [0, t̄],

0 ≤ f(s(t)) − f(x∗) = ∇f(x∗)^T (th + o(t)) + (1/2)(th + o(t))^T ∇²f(x∗)(th + o(t)) + o₂(t)    (5.19)

and, for j = 1, 2, . . . , ℓ,

0 = g^j(s(t)) − g^j(x∗) = ∇g^j(x∗)^T (th + o(t)) + (1/2)(th + o(t))^T ∇²g^j(x∗)(th + o(t)) + o₂^j(t)    (5.20)

where o₂(t)/t² → 0 as t → 0, and o₂^j(t)/t² → 0 as t → 0 for j = 1, 2, . . . , ℓ. (Note that the first term in the RHS of (5.19) is generally not 0, because of the “o” term, and is likely to even dominate the second term; this is why conjecture (5.15) is incorrect.) We have shown (1st order condition) that there exists λ ∈ Rℓ such that

∇f(x∗) + ∑_{j=1}^{ℓ} λ^j ∇g^j(x∗) = 0_n.

Hence, multiplying the jth equation in (5.20) by λ^j and adding all of them, together with (5.19), we get

0 ≤ L(s(t), λ) − L(x∗, λ) = ∇_x L(x∗, λ)^T (th + o(t)) + (1/2)(th + o(t))^T ∇²_{xx}L(x∗, λ)(th + o(t)) + õ₂(t),

yielding (since ∇_x L(x∗, λ) = 0 by (5.13))

0 ≤ (1/2)(th + o(t))^T ∇²_{xx}L(x∗, λ)(th + o(t)) + õ₂(t)

with

õ₂(t) = o₂(t) + ∑_{j=1}^{ℓ} λ^j o₂^j(t),

which can be rewritten as

0 ≤ (t²/2) h^T ∇²_{xx}L(x∗, λ) h + ō₂(t),  with ō₂(t)/t² → 0 as t → 0.

Dividing by t² and letting t ↘ 0 (t > 0), we obtain the desired result.

Just like in the unconstrained case, the above proof cannot be used directly to obtain a sufficiency condition by changing ‘≤’ to ‘<’ and proceeding backwards. In fact, an additional difficulty here is that all we would get is

f(x∗) < f(s(t)) for any h ∈ N(Dg(x∗)) and s(t) = x∗ + th + o_h(t),

for some “little o” function o_h and for all t ∈ (0, t̄_h] for some t̄_h > 0. It is not clear whether this would ensure that

f(x∗) < f(x) ∀x ∈ Ω ∩ B(x∗, ǫ) \ {x∗}, for some ǫ > 0.

It turns out that it does.

Theorem 5.6 (2nd order sufficiency condition). Suppose that f and g are twice continuously differentiable. Suppose that x∗ ∈ Rn is such that

(i) g(x∗) = 0_ℓ

(ii) (5.13) is satisfied for some λ∗ ∈ Rℓ

(iii) h^T ∇²_{xx}L(x∗, λ∗) h > 0 ∀h ∈ N(Dg(x∗)), h ≠ 0_n

Then x∗ is a strict local minimizer for (5.3).

Proof. By contradiction. Suppose x∗ is not a strict local minimizer. Then ∀ǫ > 0, ∃x ∈ B(x∗, ǫ), x ≠ x∗, s.t. g(x) = 0_ℓ and f(x) ≤ f(x∗), i.e., there exist a sequence {h_k} ⊂ Rn, ||h_k|| = 1, and a sequence of positive scalars t_k → 0 such that, with x_k := x∗ + t_k h_k,

g(x_k) = 0_ℓ    (5.21)

f(x_k) ≤ f(x∗)    (5.22)

We can write

0 ≥ f(x_k) − f(x∗) = t_k ∇f(x∗)^T h_k + (1/2) t_k² h_k^T ∇²f(x∗) h_k + o₂(t_k h_k)

0 = g^j(x_k) − g^j(x∗) = t_k ∇g^j(x∗)^T h_k + (1/2) t_k² h_k^T ∇²g^j(x∗) h_k + o₂^j(t_k h_k),  j = 1, . . . , ℓ.

Multiplying the jth equation by the multiplier λ∗,j given by (ii) and adding, we get

0 ≥ (1/2) t_k² h_k^T ∇²_{xx}L(x∗, λ∗) h_k + õ₂(t_k h_k),

i.e.,

(1/2) h_k^T ∇²_{xx}L(x∗, λ∗) h_k + õ₂(t_k h_k)/t_k² ≤ 0 ∀k.    (5.23)

Since ||h_k|| = 1 ∀k, {h_k} lies in a compact set and there exist h∗ and an infinite index set K s.t. h_k →^K h∗. Without loss of generality, assume h_k → h∗. Taking k → ∞ in (5.23), we get

(h∗)^T ∇²_{xx}L(x∗, λ∗) h∗ ≤ 0.

Further, (5.21) yields

0 = t_k Dg(x∗) h_k + o(t_k) = t_k (Dg(x∗) h_k + o(t_k)/t_k).    (5.24)

Since t_k > 0 and h_k → h∗, and since o(t_k)/t_k → 0 as k → ∞, this implies

Dg(x∗) h∗ = 0_ℓ, i.e., h∗ ∈ N(Dg(x∗)).

This is in contradiction with assumption (iii).

Exercise 5.13 Give an alternate proof of Theorem 4.6, using the line of proof used above for Theorem 5.6. Where do you make use of the assumption that the domain is finite dimensional? (Recall Exercise 4.14.)

5.4 Inequality Constraints – First Order Conditions


Consider the problem

min{f^0(x) : f(x) ≤ 0}    (5.25)

where f^0 : Rn → R and f : Rn → Rm are continuously differentiable, and “≤” is meant component-wise. Recall that, in the equality constrained case (see (5.3)), we obtained two first order conditions: a strong condition

∃λ∗ such that ∇f(x∗) + Dg(x∗)^T λ∗ = 0_n    (5.26)

subject to the assumption that Dg(x∗) has rank ℓ, and a weaker condition

∃(λ∗,0, λ∗) ≠ 0_{1+ℓ} such that λ∗,0 ∇f(x∗) + Dg(x∗)^T λ∗ = 0_n,    (5.27)

not subject to such assumption. For the inequality constraint case, we shall first obtain a weak condition (F. John condition; Fritz John, German-born American mathematician, 1910–1994) analogous to (5.27). Then we will investigate when a strong condition (Karush-Kuhn-Tucker condition) holds. (William Karush, American mathematician, 1917–1997; Harold W. Kuhn, American mathematician, 1925–2014; Albert W. Tucker, Canadian mathematician, 1905–1995. From Wikipedia: “The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951. Later, in the 1970s, it was discovered that the necessary conditions for this problem had been stated by William Karush in his master's thesis in 1939.”) We use the following notation. For x ∈ Rn,

J(x) = {j ∈ {1, . . . , m} : f^j(x) ≥ 0}    (5.28)

J_0(x) = {0} ∪ J(x)    (5.29)

i.e., J(x) is the set of indices of active or violated constraints at x, whereas J_0(x) includes the index of the cost function as well.
Exercise 5.14 Let s ∈ Rm be a vector of “slack” variables. Consider the two problems

(P1) min_x {f^0(x) : f^j(x) ≤ 0, j = 1, . . . , m}

(P2) min_{x,s} {f^0(x) : f^j(x) + (s^j)² = 0, j = 1, . . . , m}

(i) Carefully prove that if (x̂, ŝ) solves (P2), then x̂ solves (P1).

(ii) Carefully prove that if x̂ solves (P1), then (x̂, ŝ) solves (P2), where ŝ^j = √(−f^j(x̂)).

(iii) Use the Lagrange multiplier rule and part (ii) to prove (carefully) the following weak result: If x̂ solves (P1) and {∇f^j(x̂) : j ∈ J(x̂)} is a linearly-independent set, then there exists a vector µ ∈ Rm such that

∇f^0(x̂) + Df(x̂)^T µ = 0_n    (5.30)

µ^j = 0 if f^j(x̂) < 0, j = 1, 2, . . . , m    (5.31)

[This result is weak because: (i) there is no constraint on the sign of µ, (ii) the linear independence assumption is significantly stronger than necessary.]

To obtain stronger conditions, we now proceed, as in the case of equality constraints, to characterize TC(x∗, Ω). For x ∈ Ω we first define the set of “first-order strictly feasible” directions

S̃_f(x) := {h : ∇f^j(x)^T h < 0 ∀j ∈ J(x)}.

It is readily checked that S̃_f(x) is a convex cone. Further, as shown next, it is a subset of the radial cone, hence of the tangent cone.
Theorem 5.7 If x ∈ Ω (i.e., f(x) ≤ 0_m), then

S̃_f(x) ⊆ RC(x, Ω).

Proof. Let h ∈ S̃_f(x). For j ∈ J(x) we have, since f^j(x) = 0,

f^j(x + th) = f^j(x + th) − f^j(x) = t ∇f^j(x)^T h + o^j(t)    (5.32)
  = t (∇f^j(x)^T h + o^j(t)/t).    (5.33)

Since ∇f^j(x)^T h < 0, there exists t̄_j > 0 such that

∇f^j(x)^T h + o^j(t)/t < 0 ∀t ∈ (0, t̄_j]    (5.34)

and, with t̄ = min{t̄_j : j ∈ J(x)} > 0,

f^j(x + th) < 0 ∀j ∈ J(x), ∀t ∈ (0, t̄].    (5.35)

For j ∈ {1, . . . , m}\J(x), f^j(x) < 0, thus, by continuity, there exists t̃ > 0 s.t.

f^j(x + th) < 0 ∀j ∈ {1, . . . , m}\J(x), ∀t ∈ (0, t̃].

Thus

x + th ∈ Ω ∀t ∈ (0, min(t̄, t̃)],

i.e., h ∈ RC(x, Ω).

Theorem 5.8 Suppose x∗ ∈ Ω is a local minimizer for (5.25). Then

∇f^0(x∗)^T h ≥ 0 for all h s.t. ∇f^j(x∗)^T h < 0 ∀j ∈ J(x∗)

or, equivalently,

∄h s.t. ∇f^j(x∗)^T h < 0 ∀j ∈ J_0(x∗).

Proof. Follows directly from Theorems 5.7 and 5.1.

Remark 5.11 Since RC(x∗, Ω) ⊆ TC(x∗, Ω), it follows from Theorem 5.7 that

cl(S̃_f(x∗)) ⊆ cl(RC(x∗, Ω)) ⊆ cl(co TC(x∗, Ω)).

However, as further discussed below, it does not follow from Theorem 5.8 that ∇f^0(x∗)^T h ≥ 0 for all h such that ∇f^j(x∗)^T h ≤ 0 for all j ∈ J(x∗).

Definition 5.7 A set of vectors v₁, . . . , v_k ∈ Rn is said to be positively linearly independent if

∑_{i=1}^{k} µ_i v_i = 0_n,  µ_i ≥ 0, i = 1, . . . , k   implies   µ_i = 0, i = 1, 2, . . . , k.

If the condition does not hold, they are positively linearly dependent.

Note that if v₁, . . . , v_k are linearly independent, then they are positively linearly independent.
The following proposition gives a geometric interpretation to the necessary condition just
obtained.

Theorem 5.9 Given v₁, . . . , v_k ∈ Rn, the following three statements are equivalent:

(i) ∄h ∈ Rn such that v_j^T h < 0, j = 1, . . . , k

(ii) 0_n ∈ co{v₁, . . . , v_k}

(iii) the v_j's are positively linearly dependent.

Proof.
(i)⇒(ii): By contradiction. If 0_n ∉ co{v₁, . . . , v_k}, then by the separation theorem (see Appendix B) there exists h such that

v^T h < 0 ∀v ∈ co{v₁, . . . , v_k}.

In particular, v_j^T h < 0 for j = 1, . . . , k. This contradicts (i).

(ii)⇒(iii): If 0_n ∈ co{v₁, . . . , v_k}, then (see Exercise B.23 in Appendix B), for some α_j,

0_n = ∑_{j=1}^{k} α_j v_j,  α_j ≥ 0,  ∑_{j=1}^{k} α_j = 1.

Since nonnegative numbers that sum to 1 cannot all be zero, this proves (iii).

(iii)⇒(i): By contradiction. Suppose (iii) holds, i.e., ∑ α_i v_i = 0_n for some α_i ≥ 0, not all zero, and suppose there exists h such that v_j^T h < 0, j = 1, . . . , k. Then

0 = (∑_{j=1}^{k} α_j v_j)^T h = ∑_{j=1}^{k} α_j v_j^T h.

But since the α_j's are nonnegative and not all zero, the right-hand side is negative, a contradiction.
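Statement (ii) is also easy to test numerically: deciding whether 0_n lies in the convex hull of finitely many vectors is a linear feasibility problem. A minimal sketch (my own illustration; the helper name is made up):

```python
# Minimal sketch (illustration only): test 0_n in co{v_1,...,v_k} by checking
# feasibility of  V a = 0, 1^T a = 1, a >= 0  (an LP with zero objective).
import numpy as np
from scipy.optimize import linprog

def zero_in_convex_hull(V):
    """Columns of V are v_1, ..., v_k."""
    n, k = V.shape
    A_eq = np.vstack([V, np.ones((1, k))])
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.status == 0   # feasible iff 0_n is in the convex hull

V = np.array([[1.0, -1.0, -1.0],
              [0.0,  1.0, -1.0]])   # v_1 = (1,0), v_2 = (-1,1), v_3 = (-1,-1)
print(zero_in_convex_hull(V))      # True: the v_j are positively linearly dependent
```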

Corollary 5.3 Suppose x∗ is a local minimizer for (5.25). Then

0_n ∈ co{∇f^j(x∗) : j ∈ J_0(x∗)}.

Corollary 5.4 (F. John conditions). Suppose x∗ is a local minimizer for (5.25). Then there exist µ∗0, µ∗1, . . . , µ∗m, not all zero, such that

(i) µ∗0 ∇f^0(x∗) + ∑_{j=1}^{m} µ∗j ∇f^j(x∗) = 0_n

(ii) f^j(x∗) ≤ 0, j = 1, . . . , m

(iii) µ∗j ≥ 0, j = 0, 1, . . . , m

(iv) µ∗j f^j(x∗) = 0, j = 1, . . . , m

Proof. From Corollary 5.3 and (ii)⇒(iii) in Theorem 5.9, there exist µ∗j, j ∈ J_0(x∗), µ∗j ≥ 0, not all zero, such that

∑_{j∈J_0(x∗)} µ∗j ∇f^j(x∗) = 0_n.

By defining µ∗j = 0 for j ∉ J_0(x∗) we obtain (i). Finally, (ii) just states that x∗ is feasible, (iii) directly follows, and (iv) follows from µ∗j = 0 ∀j ∉ J_0(x∗).

Remark 5.12 Condition (iv) in Corollary 5.4 is called complementary slackness. In view of conditions (ii) and (iii), it can be equivalently stated as

(µ∗)^T f(x∗) = ∑_{j=1}^{m} µ∗j f^j(x∗) = 0.

The similarity between the above F. John conditions and the weak conditions obtained for the equality constrained case is obvious. Again, if µ∗0 = 0, the cost f^0 does not enter the conditions at all. Such conditions are degenerate. (For example, min x s.t. x³ ≥ 0 has x∗ = 0 as global minimizer, and the F. John condition only holds with µ∗0 = 0. See more meaningful examples below.)
We are now going to try to obtain conditions for the optimality conditions to be non-degenerate, i.e., for the first multiplier µ∗0 to be nonzero (in that case, it can be set to 1 by dividing through by µ∗0). The resulting optimality conditions are called Karush-Kuhn-Tucker (or Kuhn-Tucker) optimality conditions. The general condition for µ∗0 ≠ 0 is called the Kuhn-Tucker constraint qualification (KTCQ). But before considering it in some detail, we give a simpler (but more restrictive) condition, which is closely related to the “regularity” condition of equality-constrained optimization.

Proposition 5.2 Suppose x∗ is a local minimizer for (5.25) and suppose {∇f^j(x∗) : j ∈ J(x∗)} is a positively linearly independent set of vectors. Then the F. John conditions hold with µ∗0 ≠ 0.

Proof. By contradiction. Suppose µ∗0 = 0. Then, from Corollary 5.4 above, there exist µ∗j ≥ 0, j = 1, . . . , m, not all zero, such that

∑_{j=1}^{m} µ∗j ∇f^j(x∗) = 0_n

and, since µ∗j = 0 ∀j ∉ J_0(x∗),

∑_{j∈J(x∗)} µ∗j ∇f^j(x∗) = 0_n,

which contradicts the positive linear independence assumption.

Note: Positive linear independence of {∇f^j(x∗) : j ∈ J(x∗)} is not necessary in order for µ∗0 to be nonzero. E.g., min x² s.t. x³ ≥ 0 has x∗ = 0, where the active gradient vanishes (so positive linear independence fails), yet the F. John conditions hold with µ∗0 = 1 (take µ∗1 = 0).
We now investigate weaker conditions under which a strong result holds. For x ∈ Ω, define

S_f(x) := {h : ∇f^j(x)^T h ≤ 0 ∀j ∈ J(x)}.    (5.36)

It turns out that the strong result holds whenever S_f(x∗) is equal to the closure of the convex hull of the tangent cone. (It is readily verified that S_f(x∗) is a closed convex cone.)

Remark 5.13 In the case of equality constraints recast as inequality constraints (g(x) ≤ 0, −g(x) ≤ 0), S_f(x) reduces to N(Dg(x))!

Remark 5.14 The inclusion S̃_f(x∗) ⊆ TC(x∗, Ω) (Theorem 5.7) does not imply that

S_f(x∗) ⊆ cl(co TC(x∗, Ω))    (5.37)

since, in general,

cl(S̃_f(x∗)) ⊆ S_f(x∗)    (5.38)

but equality in (5.38) need not hold. (As noted below (see Proposition 5.6: Mangasarian-Fromovitz constraint qualification), it holds if and only if S̃_f(x∗) is nonempty.) Condition (5.37) is known as the KTCQ condition (see below).

The next exercise shows that the inclusion opposite to (5.37) always holds. That inclusion
is analogous to the inclusion cl(coTC(x∗ , Ω)) ⊆ N (Dg(x∗ )) in the equality-constrained case.

Exercise 5.15 If x∗ ∈ Ω, then

cl(coTC(x∗ , Ω)) ⊆ Sf (x∗ ).

Definition 5.8 The Kuhn-Tucker constraint qualification (KTCQ) is satisfied at x̂ ∈ Ω if

cl(coTC(x̂, Ω)) = {h : ∇f j (x̂)T h ≤ 0 ∀j ∈ J(x̂)} (= Sf (x̂)) (5.39)




Figure 5.6:

Figure 5.7:

Thus KTCQ is satisfied iff (5.37) holds. In particular, this holds whenever S̃_f(x) ≠ ∅ (see Proposition 5.6 below).

Example 5.3 In (1), (2), and (3) below, the gradients of the active constraints are not positively linearly independent. In (1), though, KTCQ does hold.

(1) (Figure 5.6) f^1(x) ≡ x₂ − x₁², f^2(x) ≡ −x₂, x∗ = (0, 0). Then

S̃_f(x∗) = ∅,  S_f(x∗) = {h : h₂ = 0} = TC(x∗, Ω),

so (5.37) holds anyway.

(2) (Figure 5.7) f^1(x) ≡ x₂ − x₁³, f^2(x) ≡ −x₂, x∗ = (0, 0). Then

S̃_f(x∗) = ∅,  S_f(x∗) = {h : h₂ = 0},  TC(x∗, Ω) = {h : h₂ = 0, h₁ ≥ 0},

and (5.37) does not hold.

If f^2 is changed to x₁⁴ − x₂, KTCQ still does not hold, but with the cost function f^0(x) = x₂ the KKT conditions do hold! (Hence KTCQ is sufficient but not necessary in order for KKT to hold. Also see Exercise 5.20 below.)

(3) Example similar to that given in the equality constraint case; TC(x∗, Ω) and S_f(x∗) are the same as in the equality case (Figure 5.8). KTCQ does not hold.

(4) Ω = {(x, y, z) : z − (x² + 2y²) ≥ 0, z − (2x² + y²) ≥ 0}. At (0, 0, 0), the gradients are positively linearly independent, though they are not linearly independent. S̃_f(x∗) ≠ ∅, so KTCQ does hold.

Remark 5.15 The equality constraint g(x) = 0ℓ can be written instead as the pair of
inequalities g(x) ≤ 0ℓ and −g(x) ≤ 0ℓ . If there are no other constraints, then we get

Figure 5.8: the curve f^1(x) = 0 (with f^1 ≤ 0 above it) and the curve f^2(x) = 0 (with f^2 ≤ 0 below it).
S_f(x̂) = N(Dg(x̂)), so that KTCQ holds if and only if (x̂, g) is non-degenerate. (In particular, if x̂ is regular, then KTCQ does hold.) However, none of the sufficient conditions we have discussed (for KTCQ to hold) are satisfied in such a case (implying that none are necessary)! (Also, as noted in Remark 5.5, in the equality-constrained case, regularity is (sufficient but) not necessary for a “strong” condition (i.e., with λ∗,0 ≠ 0) to hold.)

Exercise 5.16 (due to H.W. Kuhn). Obtain the sets S_f(x∗), S̃_f(x∗) and TC(x∗, Ω) at the minimizer x∗ for the following examples (both have the same Ω): (i) minimize −x₁ subject to x₁ + x₂ − 1 ≤ 0, x₁ + 2x₂ − 1 ≥ 0, x₁ ≥ 0, x₂ ≥ 0; (ii) minimize −x₁ subject to (x₁ + x₂ − 1)(x₁ + 2x₂ − 1) ≤ 0, x₁ ≥ 0, x₂ ≥ 0. In each case, indicate whether KTCQ holds.

Exercise 5.17 In the following examples, exhibit the tangent cone at (0, 0) and determine whether KTCQ holds:

(i) Ω = {(x, y) : (x − 1)² + y² − 1 ≤ 0, x² + (y − 1)² − 1 ≤ 0}

(ii) Ω = {(x, y) : −x² + y ≤ 0, −x² − y ≤ 0}

(iii) Ω = {(x, y) : −x² + y ≤ 0, −x² − y ≤ 0, −x ≤ 0}

(iv) Ω = {(x, y) : xy ≤ 0}.

Remark 5.16 Convexity of f^j, j = 0, 1, . . . , m, does not imply KTCQ. E.g., for Ω := {x : x² ≤ 0} ⊂ R (or, say, Ω := {(x, y) : x² ≤ 0} ⊂ R²), KTCQ does not hold at the origin.

The following necessary condition of optimality follows trivially.

Proposition 5.3 Suppose x∗ is a local minimizer for (5.25) and KTCQ is satisfied at x∗. Then ∇f^0(x∗)^T h ≥ 0 ∀h ∈ S_f(x∗), i.e.,

∇f^0(x∗)^T h ≥ 0 ∀h s.t. ∇f^j(x∗)^T h ≤ 0 ∀j ∈ J(x∗).    (5.40)

Proof. Follows directly from (5.39) and Theorem 5.1.

Remark 5.17 As mentioned in Remark 5.22 below, KTCQ is in fact the least restrictive
constraint qualification for our inequality-constrained problem.

Remark 5.18 The above result may appear to the reader to be only slightly stronger than Theorem 5.8. It turns out that this is enough to ensure µ∗0 ≠ 0, though. This follows directly from Farkas's Lemma, named after Gyula Farkas (Hungarian mathematician, 1847–1930).

Proposition 5.4 (Farkas's Lemma). Let a₁, . . . , a_k, b ∈ Rn. Then b^T x ≤ 0 for all x ∈ {x : a_i^T x ≤ 0, i = 1, . . . , k} if and only if

∃λ_i ≥ 0, i = 1, 2, . . . , k, s.t. b = ∑_{i=1}^{k} λ_i a_i.

Proof. (⇐): Obvious. (⇒): Consider the set

C = {y : y = ∑_{i=1}^{k} λ_i a_i, λ_i ≥ 0, i = 1, . . . , k}.

Exercise 5.18 Prove that C is a closed convex cone.

Our claim can now be simply expressed as b ∈ C. We proceed by contradiction. Thus suppose b ∉ C. Then, by Exercise ??, there exists x̃ such that x̃^T b > 0 and x̃^T v ≤ 0 for all v ∈ C; in particular, x̃^T a_i ≤ 0 for all i, contradicting the premise.

Theorem 5.10 (Karush-Kuhn-Tucker). Suppose x∗ is a local minimizer for (5.25) and KTCQ holds at x∗. Then there exists µ∗ ∈ Rm such that

∇f^0(x∗) + ∑_{j=1}^{m} µ∗j ∇f^j(x∗) = 0_n
µ∗j ≥ 0, j = 1, . . . , m
f^j(x∗) ≤ 0, j = 1, . . . , m
µ∗j f^j(x∗) = 0, j = 1, . . . , m

Proof. From (5.40) and Farkas's Lemma, there exist µ∗j, j ∈ J(x∗), such that

∇f^0(x∗) + ∑_{j∈J(x∗)} µ∗j ∇f^j(x∗) = 0_n

with µ∗j ≥ 0, j ∈ J(x∗). Setting µ∗j = 0 for j ∉ J(x∗) yields the claim.
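The KKT conditions are straightforward to verify numerically at a candidate point. A minimal sketch (my own made-up example, not from the notes):

```python
# Minimal sketch (illustration only): check the KKT conditions of Theorem 5.10
# at x* = (1, 0) for  min { x1^2 + x2^2 : 1 - x1 <= 0 }  (f^1(x) = 1 - x1).
import numpy as np

x = np.array([1.0, 0.0])              # candidate minimizer
grad_f0 = 2 * x                       # gradient of the cost at x
f1 = 1 - x[0]                         # constraint value (active: f1 = 0)
grad_f1 = np.array([-1.0, 0.0])
mu = 2.0                              # candidate multiplier

print(np.allclose(grad_f0 + mu * grad_f1, 0))   # stationarity: True
print(mu >= 0 and f1 <= 0 and mu * f1 == 0)     # sign, feasibility, compl. slackness
```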

Remark 5.19 An interpretation of Farkas's Lemma is that the closed convex cone C and the closed convex cone

D := {x : a_i^T x ≥ 0, i = 1, . . . , k}

are dual of each other; see Definition 5.4. (Hint: ⟨b, x⟩ ≤ 0 for all x ∈ {x : ⟨a_i, x⟩ ≤ 0} iff ⟨b, x⟩ ≥ 0 for all x ∈ {x : ⟨a_i, x⟩ ≥ 0}.) In the case of a subspace S (subspaces are cones), the dual cone is simply the orthogonal complement, i.e., S∗ = S⊥ (check it). In that special case, the fundamental property of linear maps L,

N(L) = R(L∗)⊥,

can be expressed in the notation of cone duality as

N(L) = R(L∗)∗,

which shows that our proofs of Corollary 5.1 (equality constraint case) and Theorem 5.10 (inequality constraint case), starting from Theorem 5.2 and Proposition 5.3 respectively, are analogous.
We pointed out earlier a sufficient condition for µ∗0 ≠ 0 in the F. John conditions. We now consider two other important sufficient conditions, and we show that they imply KTCQ.
Definition 5.9 h : Rn → R is affine if h(x) = a^T x + b.

Proposition 5.5 Suppose the f^j's are affine. Then Ω = {x : f^j(x) ≤ 0, j = 1, 2, . . . , m} satisfies KTCQ at any x ∈ Ω.

Exercise 5.19 Prove Proposition 5.5.


The next exercise shows that KTCQ (a property of the description of the constraints) is not
necessary in order for the KKT conditions to hold for some objective function.
Exercise 5.20 Consider again the optimization problems in Exercise 5.16. In both cases
(i) and (ii) check whether the Kuhn-Tucker optimality conditions hold. Then repeat with the
cost function x2 instead of −x1 , which does not move the minimizer x∗ . (This substitution
clearly does not affect whether KTCQ holds!)
Remark 5.20 Note that, as a consequence of this, the strong optimality condition holds for
any equality constrained problem whose constraints are all affine (with no “regular point”
assumption).

Now, we know that it always holds that

cl(TC(x∗, Ω)) ⊆ S_f(x∗)  and  cl(S̃_f(x∗)) ⊆ cl(TC(x∗, Ω)).

Thus, a sufficient condition for KTCQ is

S_f(x∗) ⊆ cl(S̃_f(x∗)).

The following proposition establishes that this holds whenever S̃_f(x∗) is nonempty. (This is essentially the only instance: since S_f(x∗) always contains 0_n, the inclusion fails whenever S̃_f(x∗) is empty.)

Proposition 5.6 (Mangasarian-Fromovitz). Suppose that there exists ĥ ∈ Rn such that ∇f^j(x̂)^T ĥ < 0 ∀j ∈ J(x̂), i.e., suppose S̃_f(x̂) ≠ ∅. Then S_f(x̂) ⊆ cl(S̃_f(x̂)) (and thus KTCQ holds at x̂).

Proof. Let h ∈ S_f(x̂) and let h_i = (1/i) ĥ + h, i = 1, 2, . . . Then

∇f^j(x̂)^T h_i = (1/i) ∇f^j(x̂)^T ĥ + ∇f^j(x̂)^T h < 0 ∀j ∈ J(x̂)    (5.41)

(the first term is negative and the second nonpositive), so that h_i ∈ S̃_f(x̂) ∀i. Since h_i → h as i → ∞, h ∈ cl(S̃_f(x̂)).

Exercise 5.21 Suppose the gradients of the active constraints are positively linearly independent. Then KTCQ holds. (Hint: show ∃h s.t. ∇f^j(x)^T h < 0 ∀j ∈ J(x).)

Exercise 5.22 Suppose the gradients of the active constraints are linearly independent.
Then the KKT multipliers are unique.

Exercise 5.23 (First order sufficient condition of optimality.) Consider the problem

minimize f₀(x) s.t. f_j(x) ≤ 0, j = 1, . . . , m,

where f_j : Rn → R, j = 0, 1, . . . , m, are continuously differentiable. Suppose the F. John conditions hold at x∗ with multipliers µ∗j, j = 0, 1, . . . , m, not all zero. In particular, x∗ satisfies the constraints, µ∗j ≥ 0, j = 0, 1, . . . , m, µ∗j f_j(x∗) = 0, j = 1, . . . , m, and

µ∗0 ∇f₀(x∗) + ∑_{j=1}^{m} µ∗j ∇f_j(x∗) = 0_n.

(There is no assumption that these multipliers are unique in any way.) Further suppose that there exists a subset J of J_0(x∗) of cardinality n with the following properties: (i) µ∗j > 0 for all j ∈ J, (ii) {∇f_j(x∗) : j ∈ J} is a linearly independent set of vectors. Prove that x∗ is a strict local minimizer.

5.5 Mixed Constraints – First Order Conditions


We consider the problem

min{f^0(x) : f(x) ≤ 0_m, g(x) = 0_ℓ}    (5.42)

with

f^0 : Rn → R,  f : Rn → Rm,  g : Rn → Rℓ,  all C¹.

We first obtain an extended Karush-Kuhn-Tucker condition. We define

S_{f,g}(x∗) = {h : ∇f^j(x∗)^T h ≤ 0 ∀j ∈ J(x∗), Dg(x∗)h = 0_ℓ}.

As earlier, KTCQ is said to hold at x∗ if

S_{f,g}(x∗) = cl(co TC(x∗, Ω)).    (5.43)

Remark 5.21 In the pure equality case (m = 0) this reduces to N(Dg(x∗)) = cl(co TC(x∗, Ω)), which is less restrictive than regularity of x∗.

Exercise 5.24 Again Sf,g (x∗ ) ⊇ cl(coTC(x∗ , Ω)) always holds.

Fact. If {∇g j (x), j = 1, . . . , ℓ} ∪ {∇f j (x), j ∈ J(x)} is a linearly independent set of vectors,
then KTCQ holds at x.

Exercise 5.25 Prove the Fact.

Theorem 5.11 (extended KKT conditions). If x∗ is a local minimizer for (5.42) and KTCQ holds at x∗, then there exist µ∗ ∈ Rm and λ∗ ∈ Rℓ such that

µ∗ ≥ 0_m
f(x∗) ≤ 0_m,  g(x∗) = 0_ℓ
∇f^0(x∗) + ∑_{j=1}^{m} µ∗j ∇f^j(x∗) + ∑_{j=1}^{ℓ} λ∗j ∇g^j(x∗) = 0_n
µ∗j f^j(x∗) = 0, j = 1, . . . , m

Proof (sketch). Express S_{f,g}(x∗) by means of pairs of inequalities of the form ∇g^j(x∗)^T h ≤ 0, ∇g^j(x∗)^T h ≥ 0, and use Farkas's lemma.

Remark 5.22 As shown in the next exercise, constraint qualification (5.43) is the least restrictive valid constraint qualification: it is necessary in some appropriate sense. The non-degeneracy condition we considered in the context of equality-constrained optimization and the (“restricted”) KTCQ we obtained in the pure inequality-constraint case are special cases of the above and hence are necessary in the same sense.

Exercise. Prove the following: Constraint qualification (5.43) is necessary in the sense that,
if it does not hold at x∗ then there exists a continuously differentiable objective function f 0
that attains a (constrained) local minimum at x∗ at which the extended KKT conditions do
not hold. [Hint: Invoke the main result in paper [?] (which involves the concept of polar
cone) and show that the CQ condition used there is equivalent to the above.]
Without the KTCQ assumption, the following (weak) result holds (see, e.g., [?], section 3.3.5).

Theorem 5.12 (extended F. John conditions). If x∗ is a local minimizer for (5.42), then there exist µ∗j, j = 0, 1, 2, . . . , m, and λ∗j, j = 1, . . . , ℓ, not all zero, such that

µ∗j ≥ 0, j = 0, 1, . . . , m
f(x∗) ≤ 0_m,  g(x∗) = 0_ℓ
∑_{j=0}^{m} µ∗j ∇f^j(x∗) + ∑_{j=1}^{ℓ} λ∗j ∇g^j(x∗) = 0_n
µ∗j f^j(x∗) = 0, j = 1, . . . , m (complementary slackness)

Note the constraint on the sign of the µ∗j's (but not of the λ∗j's) and the complementary slackness condition for the inequality constraints.
Exercise 5.26 The argument that consists in splitting again the equalities into sets of 2
inequalities and expressing the corresponding F. John conditions is inappropriate. Why?

Theorem 5.13 (Convex problems, sufficient condition). Consider problem (5.42). Suppose
that f j , j = 0, 1, 2, . . . , m are convex and that g j , j = 1, 2, . . . , ℓ are affine. Under those
conditions, if x∗ is such that ∃µ∗ ∈ Rm , λ∗ ∈ Rℓ which, together with x∗ , satisfy the KKT
conditions, then x∗ is a global minimizer for (5.42).

Proof. Define ℓ : Rn → R as

ℓ(x) := L(x, µ∗, λ∗) = f^0(x) + ∑_{j=1}^{m} µ∗j f^j(x) + ∑_{j=1}^{ℓ} λ∗j g^j(x)

with µ∗ and λ∗ as given in the theorem.

(i) ℓ(·) is convex (prove it);

(ii) ∇ℓ(x∗) = ∇f^0(x∗) + ∑_{j=1}^{m} µ∗j ∇f^j(x∗) + ∑_{j=1}^{ℓ} λ∗j ∇g^j(x∗) = 0_n, since (x∗, µ∗, λ∗) is a KKT triple.

(i) and (ii) imply that x∗ is a global minimizer for ℓ, i.e.,

ℓ(x∗) ≤ ℓ(x) ∀x ∈ Rn;

in particular,

ℓ(x∗) ≤ ℓ(x) ∀x ∈ Ω,

i.e.,

f^0(x∗) + ∑_{j=1}^{m} µ∗j f^j(x∗) + ∑_{j=1}^{ℓ} λ∗j g^j(x∗) ≤ f^0(x) + ∑_{j=1}^{m} µ∗j f^j(x) + ∑_{j=1}^{ℓ} λ∗j g^j(x) ∀x ∈ Ω.
j=1 j=1 j=1 j=1

Since (x∗, µ∗, λ∗) is a KKT triple, this simplifies to

f^0(x∗) ≤ f^0(x) + ∑_{j=1}^{m} µ∗j f^j(x) + ∑_{j=1}^{ℓ} λ∗j g^j(x) ∀x ∈ Ω

and, since for all x ∈ Ω, g(x) = 0_ℓ, f(x) ≤ 0_m and µ∗ ≥ 0_m,

f^0(x∗) ≤ f^0(x) ∀x ∈ Ω.

Exercise 5.27 Under the assumptions of the previous theorem, if f 0 is strictly convex, then
x∗ is the unique global minimizer for (P ).

Remark 5.23 Our assumptions require that g be affine (not just convex). In fact, what we really need is that ℓ(x) be convex, and this might not hold if g is merely convex. For example, {(x, y) : x² − y = 0} is obviously not convex.

5.6 Mixed Constraints – Second order Conditions


Theorem 5.14 (necessary condition). Suppose that x∗ is a local minimizer for (5.42) and suppose that {∇f^j(x∗), j ∈ J(x∗)} ∪ {∇g^j(x∗), j = 1, . . . , ℓ} is a linearly independent set of vectors. Then there exist µ∗ ∈ Rm and λ∗ ∈ Rℓ such that the KKT conditions hold and

h^T ∇²_{xx}L(x∗, µ∗, λ∗) h ≥ 0 ∀h ∈ {h : Dg(x∗)h = 0_ℓ, ∇f^j(x∗)^T h = 0 ∀j ∈ J(x∗)}    (5.44)

with L(x, µ, λ) = f^0(x) + ∑_{j=1}^{m} µ^j f^j(x) + ∑_{j=1}^{ℓ} λ^j g^j(x).

Proof. It is clear that x∗ is also a local minimizer for the problem

minimize f^0(x) s.t. g(x) = 0_ℓ, f^j(x) = 0 ∀j ∈ J(x∗).

The claim is then a direct consequence of the second order necessary condition of optimality for equality constrained problems.

Remark 5.24

1. The linear independence assumption is more restrictive than KTCQ (and is more restrictive than necessary; see the related comment in connection with Theorem 5.5). It ensures uniqueness of the KKT multipliers (prove it).

2. Intuition may lead one to believe that (5.44) should hold for all h in the larger set

S_{f,g}(x∗) := {h : Dg(x∗)h = 0_ℓ, ∇f^j(x∗)^T h ≤ 0 ∀j ∈ J(x∗)}.

The following scalar example shows that this is not true:

min{log(x) : x ≥ 1} (with x ∈ R).

(Note that a first order sufficiency condition holds for this problem: see Exercises 5.23 and 5.28.)

There are a number of (non-equivalent) second-order sufficient conditions (SOSCs) for problems with mixed (or just inequality) constraints. The one stated below strikes a good trade-off between power and simplicity.

Theorem 5.15 (SOSC with strict complementarity). Suppose that x∗ ∈ Rn is such that

(i) the KKT conditions (see Theorem 5.11) hold with µ∗, λ∗ as multipliers, and µ∗j > 0 ∀j ∈ J(x∗);

(ii) h^T ∇²_{xx}L(x∗, µ∗, λ∗) h > 0 ∀h ∈ {h ≠ 0_n : Dg(x∗)h = 0_ℓ, ∇f^j(x∗)^T h = 0 ∀j ∈ J(x∗)}.

Then x∗ is a strict local minimizer.

Proof. See [?].

Remark 5.25 Without strict complementarity, the result is not valid. (Example: min{x² − y² : y ≥ 0} with (x∗, y∗) = (0, 0), which is a KKT point but not a local minimizer.) An alternative (stronger) condition is obtained by dropping the strict complementarity assumption but replacing J(x∗) in (ii) with its subset I(x∗) := {j : µ∗j > 0}, the set of indices of binding constraints. Notice that if µ∗j = 0, the corresponding constraint does not enter the KKT conditions. Hence, if such a “non-binding” constraint is removed, x∗ will still be a KKT point.

Exercise 5.28 Show that the second order sufficiency condition still holds if condition (ii) is replaced with

h^T ∇²_{xx}L(x∗, µ∗, λ∗) h > 0 ∀h ∈ S_{f,g}(x∗)\{0_n}.

Find an example where this condition holds while (ii) does not.

This condition is overly strong: see the example in Remark 5.24.

Remark 5.26 If sp{∇f^j(x∗) : j ∈ I(x∗)} = Rn, then condition (ii) of the sufficient condition (with J(x∗) replaced by I(x∗), as in Remark 5.25) holds trivially, yielding a first order sufficiency condition. (Relate this to Exercise 5.23.)

5.7 Glance at Numerical Methods for Constrained Problems

Penalty functions methods

(P)  min{f^0(x) : f(x) ≤ 0_m, g(x) = 0_ℓ}

The idea is to replace (P) by a sequence of unconstrained problems

(P_i)  min φ_i(x) ≡ f^0(x) + c_i P(x)

with P(x) = (1/2)(||g(x)||² + ||f(x)₊||²), where (f(x)₊)^j = max(0, f^j(x)), and where c_i grows to infinity. Note that P(x) = 0 if and only if x ∈ Ω (the feasible set). The rationale behind the method is that, if c_i is large, the penalty term P(x) will tend to push the solution x_i of (P_i) towards the feasible set. The norm used in defining P(x) is arbitrary, although the Euclidean norm has clear computational advantages (it makes P(x) continuously differentiable; this is the reason for squaring the norms).

Exercise 5.29 Show that if || · || is any norm in Rn , || · || is not continuously differentiable


at 0n .

First, let us suppose that each (P_i) can be solved for a global minimizer x_i (conceptual version).

Theorem 5.16 Suppose x_i →^K x̂ for some infinite index set K. Then x̂ solves (P).

Exercise 5.30 Prove Theorem 5.16

As pointed out earlier, the above algorithm is purely conceptual since it requires exact
computation of a global minimizer for each i. However, using one of the algorithms previously
studied, given ǫi > 0, one can construct a point xi satisfying

||∇φi (xi )|| ≤ ǫi , (5.45)

by constructing a sequence {x_j} such that ∇φ_i(x_j) → 0_n as j → ∞ and stopping computation when (5.45) holds. We choose {ǫ_i} such that ǫ_i → 0 as i → ∞.
For simplicity, we consider now problems with equality constraints only.
Exercise 5.31 Show that ∇φi (x) = ∇f (x) + ci Dg(x)T g(x)
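Here is a minimal sketch (my own, on a made-up equality-constrained problem) of the practical scheme above: inner unconstrained minimizations to tolerance ǫ_i, with c_i increasing.

```python
# Minimal sketch (illustration only): quadratic penalty method for
# min { x1 + x2 : x1^2 + x2^2 - 2 = 0 }, whose solution is x* = (-1, -1)
# with lambda* = 1/2. Here phi_i(x) = f(x) + (c_i/2)||g(x)||^2.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]
g = lambda x: np.array([x[0]**2 + x[1]**2 - 2])
phi = lambda x, c: f(x) + 0.5 * c * g(x) @ g(x)

x = np.zeros(2)
for c, eps in zip([1.0, 10.0, 100.0, 1000.0], [1e-2, 1e-3, 1e-4, 1e-5]):
    x = minimize(phi, x, args=(c,), method='BFGS',
                 options={'gtol': eps}).x     # stop when ||grad phi_i|| <= eps_i
    print(c, x, c * g(x))   # c_i g(x_i) estimates the multiplier (Theorem 5.17 below)
```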
Theorem 5.17 Suppose that, for all x ∈ Rn, Dg(x) has full row rank. Suppose that x_i →^K x∗. Then x∗ ∈ Ω and there exists λ∗ such that

∇f(x∗) + Dg(x∗)^T λ∗ = 0_n,    (5.46)

i.e., the first order necessary condition of optimality holds at x∗. Moreover, c_i g(x_i) →^K λ∗.

Exercise 5.32 Prove Theorem 5.17


Remark 5.27 The main drawback of this penalty function algorithm is the need to drive c_i to ∞ in order to achieve convergence to a solution. When c_i is large, (P_i) is very difficult to solve (slow convergence and numerical difficulties due to ill-conditioning). In practice, one should compute x_i for a few values of c_i, set γ_i = 1/c_i and define x_i = x(γ_i), and then extrapolate for γ_i = 0 (i.e., c_i = ∞). Another approach is to modify (P_i) as follows, yielding the method of multipliers, due to Hestenes and Powell:

(P_i)  min f(x) + (1/2) c_i ||g(x)||² + λ_i^T g(x)    (5.47)

where

λ_{i+1} = λ_i + c_i g(x_i)    (5.48)

with x_i the solution to (P_i). It can be shown that convergence to the solution x∗ can now be achieved without having to drive c_i to ∞, but merely by driving it above a certain threshold ĉ (see [?] for details).
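A minimal sketch of the multiplier iteration (5.47)–(5.48) (again my own illustration, reusing the made-up problem from the penalty sketch above; note that c stays fixed):

```python
# Minimal sketch (illustration only): method of multipliers per (5.47)-(5.48).
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]
g = lambda x: np.array([x[0]**2 + x[1]**2 - 2])
AL = lambda x, lam, c: f(x) + 0.5 * c * g(x) @ g(x) + lam @ g(x)

x, lam, c = np.zeros(2), np.zeros(1), 10.0
for i in range(10):
    x = minimize(AL, x, args=(lam, c), method='BFGS').x   # solve (P_i)
    lam = lam + c * g(x)                                  # update (5.48)
print(x, lam)   # approaches x* = (-1, -1) and lambda* = 0.5
```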

Methods of feasible directions (inequality constraints)

For unconstrained problems, the value of ∇f at a point x indicates whether x is stationary and, if not, in some sense how far it is from a stationary point. In constrained optimization, a similar role is played by optimality functions. Our first optimality function is defined as follows. For x ∈ Rn,

θ_1(x) = min_{h∈S} max{∇f^j(x)^T h : j ∈ J_0(x)}    (5.49)

with S = {h : ||h|| ≤ 1}. Any norm could be used, but we will focus essentially on the Euclidean norm, which does not favor any direction. Since the “max” is taken over a finite set (thus a compact set), it is continuous in h. Since S is compact, the minimum is achieved.
Proposition 5.7 For all x ∈ Rn , θ1 (x) ≤ 0. Moreover, if x ∈ Ω, θ1 (x) = 0 if and only if x
is a F. John point.

Exercise 5.33 Prove Proposition 5.7


We thus have an optimality function through which we can identify F. John points. Now suppose that θ_1(x) < 0 (hence x is not a F. John point) and let ĥ be a minimizer in (5.49). Then ∇f^j(x)^T ĥ < 0 for all j ∈ J_0(x), i.e., ĥ is a descent direction for the cost function and all the active constraints, i.e., a feasible descent direction. A major drawback of θ_1(x) is its lack of continuity, due to the jump in the set J_0(x) when x hits a constraint boundary. Hence |θ_1(x)| may be large even if x is very close to a F. John point. This drawback is avoided by the following optimality function:

θ_2(x) = min_{h∈S} max{∇f^0(x)^T h; f^j(x) + ∇f^j(x)^T h, j = 1, . . . , m}    (5.50)

Exercise 5.34 Show that θ2 (x) is continuous.

Proposition 5.8 Suppose x ∈ Ω. Then θ2 (x) ≤ 0 and, moreover, θ2 (x) = 0 if and only if x
is a F. John point.

Exercise 5.35 Prove Proposition 5.8

Hence θ2 has the same properties as θ1 but, furthermore, it is continuous. A drawback of θ2 ,


however, is that its computation requires evaluation of the gradients of all the constraints,
as opposed to just those of the active constraints for θ1 .
We will see later that computing θ1 or θ2 , as well as the minimizing h, amounts to solving a
quadratic program, and this can be done quite efficiently. We now use θ2 (x) in the following
optimization algorithm, which belongs to the class of methods of feasible directions.
Algorithm (method of feasible directions)
Parameters. α, β ∈ (0, 1).
Data. x_0 ∈ Ω.
i = 0
while θ_2(x_i) ≠ 0 do {
    obtain h_i = h(x_i), a minimizer in (5.50)
    k = 0
    repeat {
        if (f^0(x_i + β^k h_i) − f^0(x_i) ≤ αβ^k ∇f^0(x_i)^T h_i and f^j(x_i + β^k h_i) ≤ 0 for j = 1, 2, . . . , m)
            then break
        k = k + 1
    } forever
    x_{i+1} = x_i + β^k h_i
    i = i + 1
}
stop
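Computing θ_2(x) (and the minimizing h) is a small convex program; with the Euclidean ball it can be handled through an epigraph reformulation. A minimal sketch (my own illustration; a dedicated QP solver would be preferable in practice):

```python
# Minimal sketch (illustration only): compute theta_2(x) of (5.50) by solving
#   min_{h,t} t  s.t.  grad_f0^T h <= t,
#                      f_j + grad_fj^T h <= t (j = 1..m),  h^T h <= 1,
# with variables z = (h, t).
import numpy as np
from scipy.optimize import minimize

def theta2(grad_f0, f_vals, grads_f):
    n = grad_f0.size
    cons = [{'type': 'ineq', 'fun': lambda z: z[n] - grad_f0 @ z[:n]},
            {'type': 'ineq', 'fun': lambda z: 1.0 - z[:n] @ z[:n]}]
    for fj, gj in zip(f_vals, grads_f):      # one constraint per f^j
        cons.append({'type': 'ineq',
                     'fun': lambda z, fj=fj, gj=gj: z[n] - fj - gj @ z[:n]})
    res = minimize(lambda z: z[n], np.zeros(n + 1),
                   constraints=cons, method='SLSQP')
    return res.fun, res.x[:n]                # theta_2(x) and the direction h

# Made-up data at a feasible x: cost gradient and one active constraint.
val, h = theta2(np.array([1.0, 0.0]), [0.0], [np.array([0.0, -1.0])])
print(val, h)   # theta_2 < 0: h is a feasible descent direction
```

For feasible x, z = 0 is feasible for this program, so the computed value is ≤ 0, consistent with Proposition 5.8.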

We state without proof a corresponding convergence theorem (the proof is essentially patterned after the corresponding proof in the unconstrained case).

Theorem 5.18 Suppose that {x_i} is constructed by the above algorithm. Then x_i ∈ Ω ∀i, and x_i →^K x̂ implies θ_2(x̂) = 0 (i.e., x̂ is a F. John point).

Note: The proof of this theorem relies crucially on the continuity of θ2 (x).
Newton’s method for constrained problems
Consider again the problem

min{f (x) : g(x) = 0} (5.51)

We know that, if x∗ is a local minimizer for (5.51) and Dg(x∗) has full rank, then there exists λ∗ ∈ Rm such that

∇_x L(x∗, λ∗) = 0_n
g(x∗) = 0_m,

which is a system of n + m equations with n + m unknowns. We can try to solve this system using Newton's method. Define z = (x, λ) and

F(z) = (∇_x L(x, λ), g(x)).

We have seen that, in order for Newton's method to converge locally (and quadratically), it is sufficient that DF(z∗) be non-singular, with z∗ = (x∗, λ∗), and that DF be Lipschitz continuous around z∗.
Exercise 5.36 Suppose that 2nd order sufficiency conditions of optimality are satisfied at
x∗ and, furthermore, suppose that Dg(x∗ ) has full rank. Then DF (z ∗ ) is non singular.
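A minimal sketch (my own illustration) of the resulting Newton iteration on F(z) = 0, with the derivatives written out by hand for the same made-up problem used in the penalty sketches above:

```python
# Minimal sketch (illustration only): Newton's method on F(z) = 0 with
# z = (x, lambda), for  min { x1 + x2 : x1^2 + x2^2 - 2 = 0 }.
import numpy as np

def F(z):
    x1, x2, lam = z
    return np.array([1 + 2 * lam * x1,        # grad_x L
                     1 + 2 * lam * x2,
                     x1**2 + x2**2 - 2])      # g(x)

def DF(z):
    x1, x2, lam = z
    return np.array([[2 * lam, 0.0, 2 * x1],
                     [0.0, 2 * lam, 2 * x2],
                     [2 * x1, 2 * x2, 0.0]])

z = np.array([-1.5, -0.5, 1.0])               # guess close to z*
for _ in range(10):
    z = z - np.linalg.solve(DF(z), F(z))      # Newton step
print(z)   # converges to (x*, lambda*) = (-1, -1, 0.5) from this start
```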
As in the unconstrained case, it will be necessary to “stabilize” the algorithm in order to
achieve convergence from any initial guess. Again such stabilization can be achieved by
• using a suitable step-size rule
• using a composite algorithm.
Sequential quadratic programming
This is an extension of Newton's method to the problem

min{f^0(x) : g(x) = 0, f(x) ≤ 0}.    (P)

Starting from an estimate (x_i, µ_i, λ_i) of a KKT triple, solve the following minimization problem:

min_v {∇_x L(x_i, µ_i, λ_i)^T v + (1/2) v^T ∇²_{xx}L(x_i, µ_i, λ_i) v : g(x_i) + Dg(x_i)v = 0, f(x_i) + Df(x_i)v ≤ 0}    (P_i)

i.e., the constraints are linearized and the cost is quadratic (but is an approximation to L rather than to f^0). (P_i) is a quadratic program (we will discuss those later) since the cost is quadratic (in v) and the constraints linear (in v). It can be solved exactly and efficiently. Let us denote by v_i its solution and by ξ_i, η_i the associated multipliers. The next iterate is

x_{i+1} = x_i + v_i
µ_{i+1} = ξ_i
λ_{i+1} = η_i

Then (P_{i+1}) is solved, and so on. Under suitable conditions (including the second order sufficiency condition at x∗ with multipliers µ∗, λ∗), the algorithm converges locally quadratically if (x_i, µ_i, λ_i) is close enough to (x∗, µ∗, λ∗). As previously, ∇²_{xx}L can be replaced by an estimate, e.g., using an update formula. It is advantageous to keep those estimates positive definite. To stabilize the method (i.e., to obtain global convergence), suitable step-size rules are available.

Exercise 5.37 Show that, if m = 0 (no inequality) the iteration above is identical to the
Newton iteration considered earlier.

5.8 Sensitivity
An important question for practical applications is to know what the effect would be of slightly modifying the constraints, i.e., of solving the problem

min{f^0(x) : g(x) = b₁, f(x) ≤ b₂}    (5.52)

with the components of b₁ and b₂ small. Specifically, given a (local) minimizer x∗ for the problem when b := (b₁, b₂) = (0_ℓ, 0_m), (i) does there exist ǫ > 0 such that, whenever ||b|| ≤ ǫ, the problem has a local minimizer x(b) which is “close” to x∗? and (ii) if such ǫ exists, what can be said about the “value function” V : B(0_{ℓ+m}, ǫ) → R given by

V(b) := f^0(x(b));

in particular, what about ∇V(0_{ℓ+m})?
For simplicity, we first consider the case with equalities only:

min{f^0(x) : g(x) = b}.    (5.53)

When b = 0_ℓ, there exists λ∗ such that the first-order conditions of optimality

∇f^0(x∗) + Dg(x∗)^T λ∗ = 0_n
g(x∗) = 0_ℓ    (5.54)

hold. Consider now the left-hand side of (5.54) as a function F(x, λ, b) of x, λ and b (for (5.53), the second equation reads g(x) − b = 0_ℓ) and try to solve (5.54) locally for x and λ using the IFT (assuming f^0 and g are twice continuously Fréchet-differentiable):

D_{1,2}F(x∗, λ∗, 0_ℓ) = [ ∇²_{1,1}L(x∗, λ∗)   Dg(x∗)^T
                          Dg(x∗)              0      ]    (5.55)

(where D_{1,2} denotes the Fréchet-derivative with respect to the first two arguments), which was shown to be non-singular (in Exercise 5.36) under the same hypotheses. Hence x(b) and λ(b) are well defined and continuously differentiable for ||b|| small enough, and they form a KKT pair for (5.53) since they satisfy (5.54). Furthermore, by continuity, they still satisfy the 2nd order sufficiency conditions (check this) and hence x(b) is a strict local minimizer for (5.53). Finally, by the chain rule, we have

DV(0_ℓ) = Df^0(x∗) Dx(0_ℓ) = −(λ∗)^T Dg(x∗) Dx(0_ℓ),

where we have invoked (5.54). Now let ϕ : Rℓ → Rℓ be given by ϕ(b) := g(x(b)). By the second equation in (5.54) (with b in place of 0_ℓ), ϕ(b) = b for ||b|| small enough, so that

Dg(x(b)) Dx(b) = Dϕ(b) = I;

in particular, with b = 0_ℓ,

Dg(x∗) Dx(0_ℓ) = I.

Hence (DV(0_ℓ))^T = −λ∗.
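A quick numerical check of this sensitivity formula (my own sketch, on the same made-up problem as before, where λ∗ = 1/2):

```python
# Minimal sketch (illustration only): check (DV(0))^T = -lambda* for
# min { x1 + x2 : x1^2 + x2^2 - 2 = b }. By symmetry the perturbed
# minimizer is x(b) = (-sqrt((2+b)/2), -sqrt((2+b)/2)).
import numpy as np

V = lambda b: -2 * np.sqrt((2 + b) / 2)   # value function f^0(x(b))
db = 1e-6
print((V(db) - V(-db)) / (2 * db))        # approx -0.5 = -lambda*
```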
The next theorem addresses the case of the general problem (5.52). Strict complemen-
tarity is needed in order for µ(b) to still satisfy µ(b) ≥ 0m .

Theorem 5.19 Consider the family of problems (5.52) where f^0, f and g are twice continuously differentiable. Suppose that for b = (b₁, b₂) = 0_{ℓ+m} there is a local solution x∗ such that S = {∇g^j(x∗), j = 1, 2, . . . , ℓ} ∪ {∇f^j(x∗), j ∈ J(x∗)} is a linearly independent set of vectors. Suppose that, together with the multipliers µ∗ ∈ Rm and λ∗ ∈ Rℓ, x∗ satisfies SOSC with strict complementarity. Then there exists ǫ > 0 such that for all b ∈ B(0_{ℓ+m}, ǫ) there exists x(b), with x(·) continuously differentiable, such that x(0_{ℓ+m}) = x∗ and x(b) is a strict local minimizer for (5.52). Furthermore,

(DV(0_{ℓ+m}))^T = −[λ∗; µ∗].

The proof essentially follows the lines of that for the equality-constrained case.

Exercise 5.38 In the theorem above, instead of considering (5.55), one has to consider the matrix

D_{1,2,3}F(x∗, λ∗, µ∗, 0_{ℓ+m}) =
[ ∇²_{1,1}L(x∗, λ∗, µ∗)   Dg(x∗)^T   Df^1(x∗)^T  · · ·  Df^m(x∗)^T ]
[ Dg(x∗)                                                            ]
[ µ∗1 Df^1(x∗)                        f^1(x∗)                       ]
[   ...                                          . . .              ]
[ µ∗m Df^m(x∗)                                         f^m(x∗)      ]

(with the blank blocks equal to zero). Show that, under the assumptions of the theorem above, this matrix is non-singular. (Hence, again, one can solve locally for x(b), λ(b) and µ^j(b).)

Remark 5.28 Equilibrium price interpretation. For simplicity, consider a problem with a single constraint,

min{f^0(x) : f(x) ≤ 0} with f : Rn → R.

(Extension to multiple constraints is straightforward.) Suppose that, at the expense of paying a price of p per unit of b, the producer can, by acquiring some amount of b, replace the inequality constraint used above by the less stringent

f(x) ≤ b,  b > 0

(conversely, he/she will save p per unit if b < 0), for a total additional cost to the producer of pb. From the theorem above, the resulting savings will be, to first order,

f^0(x(0)) − f^0(x(b)) ≃ − (d/db) f^0(x(b))|_{b=0} · b = µ∗ b.

Hence, if p < µ∗, it is to the producer's advantage to relax the constraint by relaxing the right-hand side, i.e., by acquiring some additional amount of b. If p > µ∗, to the contrary, s/he can save by tightening the constraint. If p = µ∗, neither relaxation nor tightening yields any gain (to first order). Hence µ∗ is called the equilibrium price.

Note. As seen earlier, the linear independence condition in the theorem above (linear independence of the gradients of the active constraints) ensures uniqueness of the KKT multipliers λ∗ and µ∗. (Uniqueness is obviously required for the interpretation of µ∗ as “sensitivity”.)
Exercise 5.39 Discuss the more general case
min{f 0 (x) : g(x, b1 ) = 0ℓ , f (x, b2 ) ≤ 0m }.

5.9 Duality
See [?]. Most results given in this section do not require any differentiability of the objective and constraint functions. Also, some functions will take values on the extended real line (including ±∞). The crucial assumption will be that of convexity. The results are global.
We consider the inequality constrained problem

min{f^0(x) : f(x) ≤ 0_m, x ∈ X}    (P)

with f^0 : Rn → R, f : Rn → Rm, and X a given subset of Rn (e.g., X = Rn); the inequality is meant componentwise. As before, we define the Lagrangian function by

L(x, µ) = f^0(x) + ∑_{j=1}^{m} µ^j f^j(x).

Exercise 5.40 (sufficient condition of optimality; no differentiability or convexity assumed). Suppose that (x∗, µ∗) ∈ X × Rm is such that µ∗ ≥ 0_m and

L(x∗, µ) ≤ L(x∗, µ∗) ≤ L(x, µ∗) ∀µ ≥ 0_m, ∀x ∈ X,    (5.56)

i.e., (x∗, µ∗) is a saddle point for L. Then x∗ is a global minimizer for (P). (In particular, it is feasible for (P).)

We will see that, under assumptions of convexity and a certain constraint qualification, the following converse holds: if x∗ solves (P) then there exists µ∗ ≥ 0_m such that (5.56) holds. As we show below (Proposition 5.9), the latter is equivalent to the statement that

min_{x∈X} sup_{µ≥0_m} L(x, µ) = max_{µ≥0_m} inf_{x∈X} L(x, µ)

and that the left-hand side attains its minimum at x∗ and the right-hand side attains its maximum at µ∗. If this holds, strong duality is said to hold. In such a case, one could compute any µ∗ which globally maximizes ψ(µ) = inf_x L(x, µ), subject to the simple constraint µ ≥ 0. Once µ∗ is known, (5.56) shows that x∗ is a minimizer of L(x, µ∗), unconstrained if X = Rn. (Note: L(x, µ∗) may have other “spurious” minimizers, i.e., minimizers for which (5.56) does not hold; see below.)
Instead of L(x, µ), we now consider a more general function F : Rn × Rm → R. First of all, “weak duality” always holds.

Lemma 5.1 (Weak duality). Given two sets X and Y and a function F : X × Y → R,

sup_{y∈Y} inf_{x∈X} F(x, y) ≤ inf_{x∈X} sup_{y∈Y} F(x, y).    (5.57)

Proof. We have successively

inf_{x∈X} F(x, y) ≤ F(x, y) ∀x ∈ X, ∀y ∈ Y.    (5.58)

Hence, taking sup_{y∈Y} on both sides,

sup_{y∈Y} inf_{x∈X} F(x, y) ≤ sup_{y∈Y} F(x, y) ∀x ∈ X,    (5.59)

where now only x is free. Taking inf_{x∈X} on both sides (i.e., in the right-hand side) yields

sup_{y∈Y} inf_{x∈X} F(x, y) ≤ inf_{x∈X} sup_{y∈Y} F(x, y).    (5.60)
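A tiny numerical illustration of (5.57) (my own example) in which the inequality is strict, i.e., there is a “duality gap”:

```python
# Minimal sketch (illustration only): weak duality on finite sets,
# F(x, y) = (x - y)^2 with x, y in {0, 1}.
import numpy as np

F = np.array([[(x - y)**2 for y in (0, 1)] for x in (0, 1)])  # rows: x, cols: y
print(F.min(axis=0).max())   # sup_y inf_x F = 0
print(F.max(axis=1).min())   # inf_x sup_y F = 1  (strict inequality here)
```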

In the sequel, we will make use of the following result, of independent interest.

Proposition 5.9 Given two sets X and Y and a function F : X × Y → R, the following statements are equivalent (under no regularity or convexity assumption):

(i) x∗ ∈ X, y∗ ∈ Y and

F(x∗, y) ≤ F(x∗, y∗) ≤ F(x, y∗) ∀x ∈ X, ∀y ∈ Y;

(ii)

min_{x∈X} sup_{y∈Y} F(x, y) = max_{y∈Y} inf_{x∈X} F(x, y),

where the common value is finite, the left-hand side is attained at x∗, and the right-hand side is attained at y∗.

Proof. ((ii)⇒(i)) Let α = min_x sup_y F(x, y) = max_y inf_x F(x, y), and let x∗ and y∗ attain the ‘min’ in the first expression and the ‘max’ in the second expression, respectively. Then

F(x∗, y) ≤ sup_y F(x∗, y) = α = inf_x F(x, y∗) ≤ F(x, y∗) ∀x, y.

Thus

F(x∗, y∗) ≤ α ≤ F(x∗, y∗),

and the proof is complete.

((i)⇒(ii))

inf_x sup_y F(x, y) ≤ sup_y F(x∗, y) = F(x∗, y∗) = inf_x F(x, y∗) ≤ sup_y inf_x F(x, y).

By weak duality (Lemma 5.1) it follows that

inf_x sup_y F(x, y) = sup_y F(x∗, y) = F(x∗, y∗) = inf_x F(x, y∗) = sup_y inf_x F(x, y).

Further, the first and fourth equalities show that the “inf” and “sup” are attained at x∗ and y∗, respectively.

A pair (x∗ , y ∗) satisfying the conditions of Proposition 5.9 is referred to as a saddle point.

Exercise 5.41 The set of saddle points of a function F is a Cartesian product, that is, if
(x1 , y1 ) and (x2 , y2) are saddle points, then (x1 , y2 ) and (x2 , y1 ) also are. Further, F takes
the same value at all its saddle points.

Let us apply the above to L(x, µ). Thus, consider problem (P) and let Y = {µ ∈ Rm : µ ≥ 0}. Let p : Rn → R ∪ {+∞} and ψ : Rm → R ∪ {−∞} be given by

p(x) = sup_{µ≥0} L(x, µ) = { f^0(x) if f(x) ≤ 0;  +∞ otherwise },

ψ(µ) = inf_{x∈X} L(x, µ).

Then (P) can be written

minimize p(x) s.t. x ∈ X    (5.61)

and weak duality implies that

inf_{x∈X} p(x) ≥ sup_{µ≥0_m} ψ(µ).    (5.62)

Definition 5.10 It is said that duality holds (or strong duality holds) if equality holds in (5.62).

Remark 5.29 Some authors use the phrase “strong duality holds” or “duality holds” to mean that not only is there no duality gap but furthermore the primal infimum and dual supremum are attained, at some x∗ ∈ X and µ∗ ∈ Y.

If duality holds, x∗ ∈ X minimizes p(x) over X, and µ∗ solves the dual problem

maximize ψ(µ) s.t. µ ≥ 0_m,    (5.63)

it follows that

L(x∗, µ) ≤ L(x∗, µ∗) ≤ L(x, µ∗) ∀x ∈ X, ∀µ ≥ 0_m    (5.64)

and, in particular, x∗ is a global minimizer for

minimize L(x, µ∗) s.t. x ∈ X.

Remark 5.30 That some x̂ ∈ X minimizes L(x, µ∗ ) over X is not sufficient for x̂ to solve
(P ) (even if (P ) does have a global minimizer). [Similarly, it is not sufficient that µ̂ ≥ 0
maximize L(x∗ , µ) in order for µ̂ to solve (5.63); in fact it is immediately clear that µ̂
maximizing L(x∗ , µ) implies nothing about µ̂j for j ∈ J(x∗ ).] However, as we saw earlier,
(x̂, µ̂) satisfying both inequalities in (5.64) is enough for x̂ to solve (P ) and µ̂ to solve (5.63).

Suppose now that duality holds, i.e.,

min_{x∈X} sup_{µ≥0_m} L(x, µ) = max_{µ≥0_m} inf_{x∈X} L(x, µ)    (5.65)

(with the min and the max being attained). Suppose we can easily compute a maximizer µ∗ for the right-hand side. Then, by Proposition 5.9 and Exercise 5.40, there exists x∗ ∈ X such that (5.56) holds, and such x∗ is a global minimizer for (P). From (5.56), such x∗ is among the minimizers of L(x, µ∗). The key is thus whether (5.65) holds. This is known as (strong) duality. We first state without proof a more general result, about the existence of a saddle point for convex-concave functions. (This and other related results can be found in [?]. The present result is a minor restatement of Proposition 2.6.4 in that book.)

Theorem 5.20 Let X and Y be convex sets and let F : X × Y → R be convex in its first argument and concave (i.e., −F is convex) in its second argument, and further suppose that, for each x ∈ X and y ∈ Y, the epigraphs of F(·, y) and −F(x, ·) are closed. Further suppose that the set of minimizers of sup_{y∈Y} F(x, y) is nonempty and compact. Then

min_{x∈X} sup_{y∈Y} F(x, y) = max_{y∈Y} inf_{x∈X} F(x, y).

We now prove this result in the specific case of our Lagrangian function L.

Proposition 5.10 ψ(µ) = inf_x L(x, µ) is concave (i.e., −ψ is convex) (without any convexity assumption).

Proof. For each x, L(x, ·) is affine in µ, hence concave, and the pointwise infimum of a set of concave functions is concave (prove it!).

We will now see that convexity of f^0 and f and a certain “stability” assumption (related to KTCQ) are sufficient for duality to hold. This result, as well as the results derived so far in this section, holds without differentiability assumptions. Nevertheless, we will first prove the differentiable case, with the additional assumption that X = Rn. Indeed, in that case, the proof is immediate.

Theorem 5.21 Suppose x∗ solves (P) with X = Rn, suppose that f^0 and f are differentiable and that KTCQ holds, and let µ∗ be a corresponding KKT multiplier vector. Furthermore, suppose f^0 and f are convex functions. Then µ∗ solves the dual problem, duality holds, and x∗ minimizes L(x, µ∗).

Proof. Since (x∗, µ∗) is a KKT pair, one has

∇f^0(x∗) + ∑_{j=1}^{m} µ∗j ∇f^j(x∗) = 0_n    (5.66)

and, by complementary slackness,

L(x∗, µ∗) = f^0(x∗) = p(x∗).

Now, under our convexity assumption,

ℓ(x) := f^0(x) + ∑_{j=1}^{m} µ∗j f^j(x) = L(x, µ∗)    (5.67)

is convex, and (5.66) states precisely that x∗ solves ∇_x L(x, µ∗) = 0_n; it follows that x∗ is a global minimizer of L(·, µ∗). Hence ψ(µ∗) = L(x∗, µ∗) = p(x∗) = inf_{x∈X} p(x) and, in view of weak duality (5.62), µ∗ solves the dual and duality holds.

Remark 5.31 The condition above is also necessary for µ∗ to solve the dual: indeed, if µ̂ is not a KKT multiplier vector at x∗, then ∇_x L(x∗, µ̂) ≠ 0_n, so that x∗ does not minimize L(·, µ̂) and (x∗, µ̂) is not a saddle point.

Remark 5.32

1. The only convexity assumption we used is convexity of ℓ(x) = L(x, µ∗), which is weaker than convexity of f^0 and f (for one thing, the inactive constraints are irrelevant).

2. A weaker result, called local duality, is as follows: if the min's in (5.61)–(5.63) are taken in a local sense, around a KKT point x∗, then duality holds with only local convexity of L(x, µ∗). Now, if 2nd order sufficiency conditions hold at x∗, then we know that ∇²_{xx}L(x∗, µ∗) is positive definite over the tangent space to the active constraints. If positive definiteness can be extended over the whole space, then local (strong) convexity would hold. This is in fact one of the ideas which led to the method of multipliers mentioned above (also called the augmented Lagrangian method).

Exercise 5.42 Define again (see Method of Feasible Directions in section 5.7)

Θ(x) = min_{||h||₂≤1} max{∇f^j(x)^T h : j ∈ J_0(x)}.    (5.68)

Using duality, show that

Θ(x) = −min{ ||∑_{j∈J_0(x)} µ^j ∇f^j(x)||₂ : ∑_{j∈J_0(x)} µ^j = 1, µ^j ≥ 0 ∀j ∈ J_0(x) }    (5.69)

and that, if h∗ solves (5.68) and µ∗ solves (5.69), then, if Θ(x) ≠ 0, we have

h∗ = − (∑_{j∈J_0(x)} µ∗j ∇f^j(x)) / ||∑_{j∈J_0(x)} µ∗j ∇f^j(x)||₂.    (5.70)

Hint: first show that the given problem is equivalent to the minimization over (h, h̃) ∈ R^{n+1}:

min_{h,h̃} {h̃ : ∇f^j(x)^T h ≤ h̃ ∀j ∈ J_0(x), h^T h ≤ 1}.

Remark 5.33 This shows that the search direction corresponding to Θ is the direction
opposite to the nearest point to the origin in the convex hull of the gradients of the active
constraints. Notice that applying duality has resulted in a problem (5.69) with fewer variables
and simpler constraints than the original problem (5.68). Also, (5.69) is a quadratic program
(see below).
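A minimal sketch (my own illustration) of (5.69)–(5.70): project the origin onto the convex hull of the active gradients, then flip the sign. (scipy's SLSQP stands in for a proper QP solver; minimizing the squared norm gives the same minimizer as (5.69).)

```python
# Minimal sketch (illustration only): solve (5.69) over the simplex and
# recover the search direction h* of (5.70). Columns of G are the gradients
# grad f^j(x), j in J_0(x).
import numpy as np
from scipy.optimize import minimize

def direction(G):
    k = G.shape[1]
    res = minimize(lambda mu: (G @ mu) @ (G @ mu), np.ones(k) / k,
                   bounds=[(0, None)] * k,
                   constraints=[{'type': 'eq', 'fun': lambda mu: mu.sum() - 1}],
                   method='SLSQP')
    v = G @ res.x                       # nearest point to the origin
    theta = -np.linalg.norm(v)          # = Theta(x), by (5.69)
    return theta, (-v / np.linalg.norm(v) if theta < 0 else np.zeros(G.shape[0]))

G = np.array([[1.0, -0.5],
              [0.0,  1.0]])             # two made-up active gradients as columns
print(direction(G))
```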

We now drop the differentiability assumption on f 0 and f and merely assume that they are
convex. We substitute for (P ) the family of problems

min{f 0 (x) : f (x) ≤ b, x ∈ X}

with b ∈ Rm and X a convex set, and we will be mainly interested in b in a neighborhood


of the origin. We know that, when f 0 and f are continuously differentiable, the KKT
multipliers can be interpreted as sensitivities of the optimal cost to variation of components
of f . We will see here how this can be generalized to the non-differentiable case. When the
generalization holds, duality will hold.
The remainder of our analysis will take place in R^{m+1}, where points of the form (f(x), f^0(x)) lie (see Figure 5.9). We will denote vectors in R^{m+1} by (z, z^0), where z^0 ∈ R, z ∈ Rm. We also define f̄ : X → R^{m+1} by

f̄(x) = (f(x), f^0(x))

and f̄(X) by

f̄(X) = {f̄(x) : x ∈ X}.

In Figure 5.9, the cross indicates the position of f̄(x) for some x ∈ X; and, for some µ ≥ 0_m, the oblique line represents the hyperplane H_{x,µ} orthogonal to (µ, 1) ∈ R^{m+1}, i.e.,

H_{x,µ} = {(z, z^0) : (µ, 1)^T (z, z^0) = α},

where α is such that f̄(x) ∈ H_{x,µ}, i.e.,

µ^T f(x) + f^0(x) = α,

i.e.,

z^0 = L(x, µ) − µ^T z.


Figure 5.9: the point f̄(x) = (f(x), f^0(x)) (marked ×, for some x ∈ X) and the line z^0 = f^0(x) − ⟨µ, z − f(x)⟩ = L(x, µ) − ⟨µ, z⟩ through it, with slope −µ ≤ 0.

In particular, the oblique line intersects the vertical axis at z^0 = L(x, µ).
Next, we define the following objects:

Ω(b) = {x ∈ X : f(x) ≤ b},
B = {b ∈ Rm : Ω(b) ≠ ∅},
V : Rm → R ∪ {±∞}, with V(z) := inf_{x∈X} {f^0(x) : f(x) ≤ z}.

V is the value function. It can take the values +∞ (when {x ∈ X : f(x) ≤ z} is empty) and −∞ (when f^0 is unbounded from below on {x ∈ X : f(x) ≤ z}).
Exercise 5.43 If f^0 and f are convex and X is a convex set, then Ω(b), B, and the epigraph of V are convex sets (so that V is a convex function), and V is monotonic non-increasing. (Note that, on the other hand, min{f^0(x) : f(x) = b} need not be convex in b: e.g., f(x) = e^x and f^0(x) = x. Also, f̄(X) need not be convex (e.g., it could be just a curve, as, for instance, when n = m = 1).)
The following exercise points out a simple geometric relationship between epi V and f̄(X), which yields simple intuition for the central result of this section.

Exercise 5.44 Prove that

cl(epi V) = cl(f̄(X) + R^{m+1}_+),

where R^{m+1}_+ is the set of points in R^{m+1} with non-negative components. Further, if for all z such that V(z) is finite the "min" in the definition of V is attained, then

epi V = f̄(X) + R^{m+1}_+.

This relationship is portrayed in Figure 5.10, which immediately suggests that the "lower tangent plane" to epi V with "slope" −µ (i.e., orthogonal to (µ, 1), with µ ≥ 0_m) intersects the vertical axis at ordinate ψ(µ), i.e.,

inf{z^0 + µ^T z : (z, z^0) ∈ epi V} = inf_{x∈X} L(x, µ) = ψ(µ)  ∀µ ≥ 0_m.    (5.71)

We now provide a simple, rigorous derivation of this result.


First, the following identity is easily derived.


Exercise 5.45

inf{z^0 + µ^T z : z^0 ≥ V(z)} = inf{z^0 + µ^T z : z^0 > V(z)}.

The claim now follows:

inf{z^0 + µ^T z : (z, z^0) ∈ epi V}
  = inf_{(z,z^0)∈R^{m+1}} {z^0 + µ^T z : z^0 ≥ V(z)}
  = inf_{(z,z^0)∈R^{m+1}} {z^0 + µ^T z : z^0 > V(z)}
  = inf_{(z,z^0)∈R^{m+1}, x∈X} {z^0 + µ^T z : z^0 > f^0(x), f(x) ≤ z}
  = inf_{(z^0,x)∈R×X} {z^0 + µ^T f(x) : z^0 > f^0(x)}
  = inf_{x∈X} {f^0(x) + µ^T f(x)}
  = inf_{x∈X} L(x, µ)
  = ψ(µ),

where we have used the fact that µ ≥ 0_m.
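As a sanity check of (5.71) and of the strong-duality picture, the following small Python sketch (not from the notes) works out the toy convex instance f^0(x) = x², f(x) = 1 − x, X = R, for which V(z) = max(0, 1 − z)² and ψ(µ) = µ − µ²/4, so that µ∗ = 2 and V(0) = ψ(µ∗) = 1.

```python
# Numerical check of strong duality and the sensitivity interpretation on a
# toy convex problem: f0(x) = x^2, f(x) = 1 - x, X = R. (Illustrative only.)
import numpy as np

def V(z):                     # value function V(z) = inf{x^2 : 1 - x <= z}
    return max(0.0, 1.0 - z) ** 2

def psi(mu):                  # dual function psi(mu) = inf_x {x^2 + mu(1 - x)}
    x = mu / 2.0              # unconstrained minimizer of the Lagrangian
    return x**2 + mu * (1.0 - x)

mus = np.linspace(0.0, 4.0, 401)
mu_star = mus[np.argmax([psi(m) for m in mus])]
print(V(0.0), psi(mu_star), mu_star)   # 1.0  ~1.0  ~2.0

# Slope check: -mu* should be the slope of V at 0 (sensitivity interpretation)
eps = 1e-6
print((V(eps) - V(0.0)) / eps)         # ~ -2.0 = -mu*
```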


In the case of the picture, V(0_m) = f^0(x∗) = ψ(µ∗), i.e., duality holds and the inf and sup are attained and finite. This works because

(i) V (·) is convex

(ii) there exists a non-vertical supporting line (supporting hyperplane) to epi V at (0, V (0)).
(A sufficient condition for this is that 0m ∈ int B.)

Note that if V (·) is continuously differentiable at 0, then −µ∗ is its slope where µ∗ is the
KKT multiplier vector. This is exactly the sensitivity result we proved in section 5.8!
In the convex (not necessarily differentiable) case, condition (ii) acts as a substitute
for KTCQ. It is known as the Slater constraint qualification (after Morton L. Slater, 20th
century American mathematician). It says that there exists some feasible x at which none
of the constraints is active, i.e., f(x) < 0. Condition (i), which implies convexity of epi V, implies the existence of a supporting hyperplane, i.e., ∃(µ, µ_0) ≠ 0_{m+1} s.t.

µ^T 0 + µ_0 V(0) ≤ µ^T z + µ_0 z^0  ∀(z, z^0) ∈ epi V,

i.e.,

µ_0 V(0) ≤ µ_0 z^0 + µ^T z  ∀(z, z^0) ∈ epi V,    (5.72)

and the epigraph property of epi V ((0, β) ∈ epi V ∀β > V(0)) implies that µ_0 ≥ 0.

Exercise 5.46 Let S be convex and closed and let x ∈ ∂S, where ∂S denotes the boundary of S. Then there exists a hyperplane H separating x and S; H is called a supporting hyperplane to S at x. (Note: The result still holds without the assumption that S is closed, but the proof is harder. Hint for this harder result: if S is convex, then ∂S = ∂ cl S.)


Exercise 5.47 Suppose f 0 and f j are continuously differentiable and convex, and suppose
Slater’s constraint qualification holds. Then MFCQ holds at every feasible x∗ .

Under condition (ii), µ_0 > 0, i.e., the supporting hyperplane is non-vertical. To see this, suppose that 0 ∈ int B and proceed by contradiction: if µ_0 = 0, then (5.72) reduces to µ^T z ≥ 0 for all (z, z^0) ∈ epi V, in particular for all z with ‖z‖ small enough; since (µ_0, µ) ≠ 0_{1+m}, this is impossible. Under this condition, dividing through by µ_0 we obtain:

∃ µ∗ ∈ R^m s.t. V(0_m) ≤ z^0 + (µ∗)^T z  ∀(z, z^0) ∈ epi V.

Next, the fact that V is monotonic non-increasing implies that µ∗ ≥ 0_m. (Indeed, we can keep z^0 fixed and let any component of z go to +∞.) Formalizing the argument made above, we now obtain, since (f(x), f^0(x)) ∈ epi V for all x ∈ X,

f^0(x∗) = V(0_m) ≤ f^0(x) + (µ∗)^T f(x) = L(x, µ∗)  ∀x ∈ X,

implying that

f^0(x∗) ≤ inf_{x∈X} L(x, µ∗) = ψ(µ∗).

In view of weak duality, duality holds and sup_{µ≥0_m} ψ(µ) is attained at µ∗ ≥ 0_m.

Remark 5.34 Figure 5.10 may be misleading, as it focuses on the special case where m = 1 and the constraint is active at the solution (since µ∗ > 0). Figure 5.11 illustrates other cases. It shows V(z) for z = γe_j, with γ a scalar and e_j the jth coordinate vector, both in the case when the constraint is active and in the case it is not.

To summarize: if x∗ solves (P) (in particular, the infimum in (P) is attained and finite) and conditions (i) and (ii) hold, then sup_{µ≥0} ψ(µ) is attained and equal to f^0(x∗), and x∗ solves

minimize L(x, µ∗) s.t. x ∈ X.

Note. Given a solution µ∗ to the dual, there may exist x minimizing L(x, µ∗) such that x does not solve the primal. A simple example is given by the problem

min (−x) s.t. x ≤ 0,

with x ∈ R, where L(x, µ∗) is constant (V(·) is a straight line): here L(x, µ) = (µ − 1)x, so that ψ(µ) is finite only at µ∗ = 1, and L(·, µ∗) ≡ 0. For, e.g., x̂ = 1,

L(x̂, µ∗) ≤ L(x, µ∗)  ∀x,

but it is not true that

L(x̂, µ) ≤ L(x̂, µ∗)  ∀µ ≥ 0_m.


5.10 Linear and Quadratic Programming


(See [?])
Consider the problem

min{cT x : Gx = b1 , F x ≤ b2 } (5.73)

where

c ∈ Rn
G ∈ Rm×n
F ∈ Rk×n

For simplicity, assume that the feasible set is nonempty and bounded. Note that if n = 2
and m = 0, we have a picture such as that on Figure 5.12. Based on this figure, we make
the following guesses

1. If a solution exists (there is one, in view of our assumptions), then there is a solution
on a vertex (there may be a continuum of solutions, along an edge)

2. If all vertices “adjacent” to x̂ have a larger cost, then x̂ is optimal

Hence a simple algorithm would be

1. Find x0 ∈ Ω, a vertex. Set i = 0

2. If all vertices adjacent to xi have higher cost, stop. Else, select an edge along which
the directional derivative of cT x is the most negative. Let xi+1 be the corresponding
adjacent vertex and iterate.

This is the basic idea of the simplex algorithm.


In the sequel, we restrict ourselves to the following canonical form

min{cT x : Ax = b, x ≥ 0} (5.74)

where c ∈ Rn , A ∈ Rm×n , b ∈ Rm

Proposition 5.11 Consider the problem (5.73). Also consider

min{cT (v − w) : G(v − w) = b1 , F (v − w) + y = b2 , v ≥ 0, w ≥ 0, y ≥ 0}. (5.75)

If (v̂, ŵ, ŷ) solves (5.75) then x̂ = v̂ − ŵ solves (5.73).

Exercise 5.48 Prove Proposition 5.11.
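For concreteness, the following hedged Python sketch (not from the notes; the data is illustrative only) solves a small instance of (5.73) with scipy.optimize.linprog, which accepts equality and inequality constraints directly; the conversion of Proposition 5.11 is needed mainly when an algorithm requires the canonical form (5.74).

```python
# Sketch: solve an instance of (5.73), min c^T x s.t. Fx <= b2 (no equality
# constraints here, i.e., m = 0), using scipy.optimize.linprog. Data is made up.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])
F = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])   # Fx <= b2
b2 = np.array([0.0, 0.0, 4.0])                          # encodes x >= 0, x1 + x2 <= 4

res = linprog(c, A_ub=F, b_ub=b2, bounds=(None, None))  # x otherwise unrestricted
print(res.x, res.fun)   # optimal vertex and cost: here x = (0, 0), cost 0
```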


Problem (5.75) is actually of the form (5.74) since it can be rewritten as

min{ [c; −c; 0]^T [v; w; y] : [G  −G  0; F  −F  I] [v; w; y] = [b1; b2],  [v; w; y] ≥ 0 }.

Hence, we do not lose any generality by considering (5.74) only. To give meaning to our first
guess, we need to introduce a suitable notion of vertex or extreme point.

Definition 5.11 Let Ω ⊆ R^n be a polyhedron (intersection of half-spaces). Then x ∈ Ω is an extreme point of Ω if

x = λx1 + (1 − λ)x2, with λ ∈ (0, 1) and x1, x2 ∈ Ω  ⇒  x = x1 = x2.

Proposition 5.12 (See [?])


Suppose (5.74) has a solution (it does under our assumptions). Then there exists x∗ , an
extreme point, such that x∗ solves (5.74).

Until the recent introduction of new ideas in linear programming (Karmarkar and others),
the only practical method for the solution of general linear programs was the simplex method.
The idea is as follows:

1. obtain an extreme point of Ω, x0 , set i = 0.

2. if xi is not a solution, pick among the components of xi which are zero, a component
such that its increase (xi remaining on Ax = b) causes a decrease in the cost function.
Increase this component, xi remaining on Ax = b, until the boundary of Ω is reached
(i.e., another component of x is about to become negative). The new point, xi+1 , is an
extreme point adjacent to xi with lower cost.

Obtaining an initial extreme point of Ω can be done by solving another linear program:

min{ Σ_j ε_j : Ax − b − ε = 0, x ≥ 0, ε ≥ 0 }.    (5.76)

Exercise 5.49 Let (x̂, ε̂) solve (5.76) and suppose that

min{c^T x : Ax = b, x ≥ 0}

has a feasible point. Then ε̂ = 0 and x̂ is feasible for the given problem.


More about the simplex method can be found in [?].


Quadratic programming (See [?])
Problems with quadratic cost function and linear constraints frequently appear in optimiza-
tion (although not as frequently as linear programs). We have twice met such problems
when studying algorithms for solving general nonlinear problems: in some optimality func-
tions and in the sequential quadratic programming method. As for linear programs, any
quadratic program can be put into the following canonical form:

min{ (1/2) x^T Qx + c^T x : Ax = b, x ≥ 0 }    (5.77)

We assume that Q is positive definite, so that (5.77) (with strongly convex cost function and convex feasible set) admits at most one KKT point, which is then the global minimizer. The KKT conditions can be expressed as stated in the following theorem.

Theorem 5.22 x̂ solves (5.77) if and only if there exist ψ ∈ R^m and ε ∈ R^n such that

Ax̂ = b,  x̂ ≥ 0
Qx̂ + c + A^T ψ − ε = 0
ε^T x̂ = 0
ε ≥ 0

Exercise 5.50 Prove Theorem 5.22.

In this set of conditions, all relations except for the next-to-last one (complementary slackness) are linear. They can be solved using techniques analogous to the simplex method (Wolfe algorithm; see [?]).


Figure 5.10: The epigraph of V, supported at (0, V(0)) by a non-vertical line of slope −µ∗.


Figure 5.11: V(z) along z = γe_j, in the case where the jth constraint is active at the solution and in the case where it is not.


Figure 5.12: A planar feasible polyhedron bounded by the hyperplanes ⟨f_i, x⟩ = b²_i, i = 1, ..., 4 (rows of Fx ≤ b2), together with a cost level line ⟨c, x⟩ = constant; the minimum is attained at the vertex marked x.



Chapter 6

Calculus of Variations and Pontryagin's Principle

This chapter deals with a subclass of optimization problems of prime interest to control theorists and practitioners, that of optimal control problems, and with the classical field of calculus of variations (whose formalization dates back to the late 18th century), a precursor to continuous-time optimal control. Optimal control problems can be optimization problems in finite-dimensional spaces (as in the case of discrete-time optimal control with finite horizon) or in infinite-dimensional spaces (as in the case of a broad class of continuous-time optimal control problems). In the latter case, finite-dimensional optimization ideas often still play a key role, at least when the underlying state space is finite-dimensional.

6.1 Introduction to the calculus of variations


In this section, we give a brief introduction to the classical calculus of variations, and fol-
lowing the discussion in [?], we show connections with optimal control and Pontryagin’s
Principle. (Also see [?].)
Let X := [C 1 ([a, b])]n , a, b ∈ R, let A, B ∈ Rn be given, let

Ω = {x ∈ X : x(a) = A, x(b) = B}, (6.1)

and let J : Ω → R be given by

J(x) = ∫_a^b L(t, x(t), ẋ(t)) dt  ∀x ∈ X    (6.2)

where L : [a, b] × Rn × Rn → R is a given smooth function. L is typically referred to as “La-


grangian” in the calculus-of-variations literature; note that it is unrelated to the Lagrangian
encountered in Chapter 5. The basic problem in the classical calculus of variations is

minimize J(x) s.t. x ∈ Ω. (6.3)


Remark 6.1 Note at this point that problem (6.3) can be thought of as the "optimal control" problem

minimize ∫_a^b L(t, x(t), u(t)) dt  s.t. ẋ(t) = u(t) ∀t, x(a) = A, x(b) = B, u continuous,    (6.4)

where minimization is to be carried out over the pair (x, u). The more general ẋ = f(x, u), with moreover the values of u(t) restricted to lie in some U (which could be the entire R^m), amounts to generalizing Ω by imposing constraints on the values of ẋ in (6.4), viz.,

ẋ(t) ∈ f(x(t), U) ∀t

(a "differential inclusion"), which can also be thought of as

ẋ(t) = v(t),  v(t) ∈ Ũ := f(x(t), U) ∀t,

i.e., as a state-dependent constraint on the control values. E.g., f(x, u) = sin(u) with U = R^m yields the constraint |ẋ(t)| ≤ 1 for all t. In general, such constraints are not allowed within the framework of the calculus of variations. If f(x, U) = R^n though (more likely to be the case when U = R^n), then the problem does fit within (6.3), via (6.4), but requires the solution of an equation of the form v = f(x, u) for u.
While the tangent cone is norm dependent, the radial cone is not, so as a first approach
we base our analysis on the latter. Indeed, it turns out that much can be said about this
problem1 without need to specify a norm on X.

Exercise 6.1 Show that, for any x ∈ Ω,

RC(x, Ω) = {h ∈ X : h(a) = h(b) = 0_n}.

Furthermore, RC(x, Ω) is a subspace. Finally, for all x ∈ Ω, x + RC(x, Ω) = Ω.

It is readily established that J is G-differentiable. (Recall that, unlike F-differentiability,


G-differentiability is independent of the norm on the domain of the function.) In the sequel,
prompted by the connection pointed out above with optimal control, we denote by Du L the
derivative of L with respect to its third argument. Similarly, Dt L and Dx L denote its partial
derivatives with respect to its first and second arguments.
Exercise 6.2 Show that J is G-differentiable, with derivative DJ given by

DJ(x)h = ∫_a^b ( D_x L(t, x(t), ẋ(t)) h(t) + D_u L(t, x(t), ẋ(t)) ḣ(t) ) dt  ∀x, h ∈ X.

[Hint: First investigate directional differentiability, recalling that if J is differentiable then


DJ(x)h is the directional derivative of J at x in direction h.]
¹Even concerning "weak" local minimizers (see Remark 4.2), whose definition, in contrast to that of local minimizers, does not rely on an underlying norm. In the sequel though, we focus on global minimizers.


Since RC(x∗ , Ω) is a subspace, we know that, if x∗ is a minimizer for J, then we must


have DJ(x∗ )h = 0 for all h ∈ RC(x∗ , Ω). The following key result ensues.

Proposition 6.1 If x∗ ∈ Ω is optimal for problem (6.3) then, ∀h ∈ X s.t. h(a) = h(b) = 0_n,

∫_a^b ( D_x L(t, x∗(t), ẋ∗(t)) h(t) + D_u L(t, x∗(t), ẋ∗(t)) ḣ(t) ) dt = 0.    (6.5)

Remark 6.2 Clearly, this result also holds for "stationary" points. E.g., see below the discussion of the Principle of Least Action.

We now proceed to transform (6.5) to obtain a more manageable condition: the Euler–Lagrange equation (Leonhard Euler, Swiss mathematician, 1707–1783). The following derivation is simple but assumes that x∗ is twice continuously differentiable. (An alternative, which does not require such an assumption, is to use the DuBois-Reymond Lemma; see, e.g., [?, ?].) Integrating (6.5) by parts, one gets, for all h ∈ X with h(a) = h(b) = 0_n,

∫_a^b D_x L(t, x∗(t), ẋ∗(t)) h(t) dt + [D_u L(t, x∗(t), ẋ∗(t)) h(t)]_a^b − ∫_a^b (d/dt)(D_u L(t, x∗(t), ẋ∗(t))) h(t) dt = 0,

i.e.,

∫_a^b ( D_x L(t, x∗(t), ẋ∗(t)) − (d/dt) D_u L(t, x∗(t), ẋ∗(t)) ) h(t) dt = 0.    (6.6)

Since the integrand is continuous, the Euler–Lagrange equation follows

Theorem 6.1 (Euler–Lagrange Equation.) If x∗ ∈ Ω is optimal for problem (6.3) then

D_x L(t, x∗(t), ẋ∗(t)) − (d/dt) D_u L(t, x∗(t), ẋ∗(t)) = 0  ∀t ∈ [a, b].    (6.7)
Indeed, it is a direct consequence of the following result.

Exercise 6.3 If f : [a, b] → R^n is continuous and if

∫_a^b f(t)^T h(t) dt = 0  ∀h ∈ X s.t. h(a) = h(b) = 0_n,

then

f(t) = 0_n  ∀t ∈ [a, b].

Hamiltonian formalism. Define H : R × R^n × R^n × R^m → R by

H(τ, ξ, π, v) = π^T v − L(τ, ξ, v)

and let

p∗(t) = ∇_u L(t, x∗(t), u∗(t)),

where u∗(t) := ẋ∗(t). Then, invoking E-L,

ṗ∗(t) = (d/dt) ∇_u L(t, x∗(t), u∗(t)) = ∇_x L(t, x∗(t), u∗(t)) = −∇_x H(t, x∗(t), p∗(t), u∗(t)).

Also,

∇_u H(t, x∗(t), p∗(t), u∗(t)) = p∗(t) − ∇_u L(t, x∗(t), u∗(t)) = 0,

and (2nd-order necessary condition)

∇²_{uu} L(t, x∗(t), u∗(t)) ⪰ 0,

so that, finally,

∇²_{uu} H(t, x∗(t), p∗(t), u∗(t)) = −∇²_{uu} L(t, x∗(t), u∗(t)) ⪯ 0  ∀t.


Remark 6.3

1. The Euler–Lagrange equation is a necessary condition for optimality. If x∗ is a local minimizer for (6.3) in some norm (in particular, x∗ ∈ C¹), then it satisfies E.-L. Problem (6.3) may also have solutions which are not in C¹, not satisfying E.-L.

2. E.-L. amounts to a second-order ordinary differential equation in x with two-point boundary conditions (x∗(a) = A, x∗(b) = B). Existence and uniqueness of a solution are not guaranteed in general.

Example 6.1 (see [?] for details)

Among all the curves joining 2 given points (x1 , t1 ) and (x2 , t2 ) in R2 , find the one which
generates the surface of minimum area when rotated about the t-axis.
The area of the surface of revolution generated by rotating the curve x around the t-axis is
J(x(·)) = 2π ∫_{t1}^{t2} x(t) √(1 + ẋ(t)²) dt,    (6.8)

so that

(∂L/∂x)(x, ẋ) = √(1 + ẋ²),  (∂L/∂u)(x, ẋ) = x ẋ / √(1 + ẋ²),

and the Euler–Lagrange equation can be integrated to give

x∗(t) = C cosh((t + C1)/C)    (6.9)

where C and C1 are constants to be determined using the boundary conditions.
where C and C1 are constants to be determined using the boundary conditions.

Exercise 6.4 Check (6.9).
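In practice C and C1 must be computed numerically from the boundary conditions. A hedged Python sketch follows (not from the notes; the endpoint data and initial guess are made up, and for some endpoint positions no solution exists, as in case 3 below).

```python
# Sketch: solve for the constants C, C1 in (6.9) from the boundary conditions
# x(t1) = x1, x(t2) = x2, using a root finder. Illustrative data only.
import numpy as np
from scipy.optimize import fsolve

t1, x1, t2, x2 = -1.0, 2.0, 1.0, 2.0

def eqs(p):
    C, C1 = p
    return [C * np.cosh((t1 + C1) / C) - x1,
            C * np.cosh((t2 + C1) / C) - x2]

C, C1 = fsolve(eqs, [1.0, 0.0])
print(C, C1)   # by symmetry of this data, C1 ~ 0
```

Note that, consistent with the discussion below, fsolve may converge to either of the two catenaries through the endpoints (or fail when none exists), depending on the initial guess.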

It can be shown that 3 cases are possible, depending on the positions of (x1 , t1 ) and (x2 , t2 )


1. There are 2 curves of the form (6.9) passing through (x1 , t1 ) and (x2 , t2 ) (in limit cases,
these 2 curves are identical). One of them solves the problem (see Figure 6.1).

2. There are 2 curves of the form (6.9) passing through (x1 , t1 ) and (x2 , t2 ). One of them
is a local minimizer. The global minimizer is as in (3) below (non smooth).

3. There is no curve of the form (6.9) passing through (x1, t1) and (x2, t2). Then there is no smooth curve that achieves the minimum. The solution is not C¹ and is shown in Figure 6.2 below (it is not even continuous).

Figure 6.1: A smooth curve of the form (6.9) joining (x1, t1) and (x2, t2).

Figure 6.2: The non-smooth (indeed discontinuous) solution; the generated surface is the sum of two disks.

Various extensions
The following classes of problems can be handled in a similar way as above.

(1) Variable end points

– x(a) and x(b) may be unspecified, as well as either a or b (free time problems)
– some of the above may be constrained without being fixed, e.g.,

g(x(a)) ≤ 0

(2) Isoperimetric problems: One can have constraints of the form

K(x) = ∫_a^b G(x(t), ẋ(t), t) dt = given constant.

For more detail, see [?, ?].


Theorem 6.2 (Legendre second-order necessary condition (for a weak minimum). Adrien-Marie Legendre, French mathematician, 1752–1833.) If x∗ ∈ Ω is optimal for problem (6.3), then

∇²_{uu} L(t, x∗(t), ẋ∗(t)) ⪰ 0  ∀t ∈ [a, b].

Again, see [?, ?] for details.

Toward Pontryagin’s Principle (see [?])


Along the lines of like sub-section in section 2.1.1, but adapted to problem (6.4) (more
general nonlinear objective but simpler dynamics), define the pre-Hamiltonian H : R × Rn ×
Rn × Rm → R by
H(t, x, p, u) := −L(t, x, u) + pT u (6.10)
and, given a minimizer u∗ and corresponding state trajectory x∗ for problem (6.4) (with
ẋ∗ = u∗ ), let
p∗ (t) := ∇u L(t, x∗ (t), u∗ (t)) ∀t, (6.11)
Then the Euler–Lagrange equation yields

ṗ∗ (t) = ∇x L(t, x∗ (t), u∗ (t)) = −∇x H(t, x∗ (t), p∗ (t), u∗ (t)) ∀t. (6.12)

Next, since ∇p H(t, x, p, u) = u, and since u∗ = ẋ∗ , we have

ẋ∗ (t) = ∇p H(t, x∗ (t), p∗ (t), ẋ∗ (t)) ∀t. (6.13)

Further,
∇u H(t, x∗ (t), p∗ (t), u∗ (t)) = −p∗ (t) + p∗ (t) = 0 ∀t. (6.14)
Finally, Legendre's second-order condition yields

∇²_{uu} H(t, x∗(t), p∗(t), ẋ∗(t)) ⪯ 0  ∀t.    (6.15)

Equations (6.14)-(6.13)-(6.12)-(6.15), taken together, are very close to Pontryagin’s Prin-


ciple applied to problem (6.4) (fixed initial and terminal states). The only missing piece is
that, instead of (recall that ẋ∗ (t) = u∗ (t))

H(t, x∗(t), p∗(t), u∗(t)) = max_{v∈R^n} H(t, x∗(t), p∗(t), v),    (6.16)

we merely have (6.14) and (6.15), which are necessary conditions for u∗ (t) to be such maxi-
mizer.
Exercise. Verify that this (in particular equation (6.11)) is consistent with the exercise
immediately following Exercise 2.17 (Pontryagin’s Principle for the case of linear dynamics
and fixed terminal state).
Exercise. Reconcile definition (6.11) of p∗ (t) (in the case when the dynamics is ẋ = u) with
definition (3.13) used in the context of dynamic programming. [Hint: Invoke (HJB).]


Remark 6.4 Condition (2.3) can be written as

H(t, x∗ (t), p∗ (t), u∗(t)) = H(t, x∗ (t), p∗ (t)),

where H is the Hamiltonian, which in the present case is also the Legendre-Fenchel transform
of the Lagrangian L(t, x, ·), viz.

H(t, x, p) = sup_v {p^T v − L(t, x, v)},

also known as the convex conjugate of L(t, x, ·). (Indeed, as the supremum of linear functions, the Legendre-Fenchel transform is convex; in our context, in p.)

Remark 6.5 In view of Remark 6.1 and with equations (6.14)-(6.13)-(6.12)-(6.15) in hand,
it is tempting to conjecture that, subject to a simple modification, such “maximum principle”
still holds in much more general cases, when ẋ = u is replaced by ẋ = f (x, u) and u(t)
is constrained to lie in a certain set U for all t: Merely replace pT u by pT f (x, u) in the
definition (6.10) of H and, in (6.16), replace the unconstrained maximization by one over
U. This intuition turns out to be essentially correct indeed, as we will see below (and as we
have already seen, in a limited context, in section 2.3).

Connection with classical mechanics


Classical mechanics extensively refers to a “Hamiltonian” which is the total energy in
the system. This quantity can be linked to the above as follows. (See [?, Section 2.4.3], [?,
Section 1.4] for additional insight.)
Consider an isolated mechanical system and denote by x(t) the vector of its position (and
angle) variables. According to Hamilton’s Principle of Least Action (which should be more
appropriately called Principle of Stationary Action), the state of such system evolves so as
to annihilate the derivative of the “action” S(x), where S : C 1 [a, b] → R is given by
S(x) = ∫_a^b L(t, x(t), ẋ(t)) dt,

where
L(t, x(t), ẋ(t)) = T − V,
T and V being the kinetic and potential energies. Again, define

p∗ (t) := ∇u L(t, x∗ (t), ẋ∗ (t)). (6.17)

In classical mechanics, the potential energy does not depend on ẋ(t), while the kinetic energy is of the form T = (1/2) ẋ(t)^T M ẋ(t), where M is a symmetric, positive definite matrix. Substituting into (6.17) yields p∗(t) = M ẋ∗(t). Hence, from (6.10),

H(t, x∗(t), p∗(t), ẋ∗(t)) = p∗(t)^T ẋ∗(t) − L(t, x∗(t), ẋ∗(t)) = ẋ∗(t)^T M ẋ∗(t) − (T − V) = 2T − T + V = T + V,

i.e., the pre-Hamiltonian evaluated along the trajectory that makes the action stationary is
indeed the total energy in the system. In this context, p∗ is known as the momentum. For
more details, see e.g., [?].


6.2 Discrete-Time Optimal Control


(see [?])
Consider the problem (time–varying system)
min J(u) := Σ_{i=0}^{N−1} L(i, x_i, u_i) + ψ(x_N)  s.t.    (6.18)

x_{i+1} = x_i + f(i, x_i, u_i),  i = 0, 1, ..., N − 1  (dynamics)²
g_0(x_0) = θ,  h_0(x_0) ≤ θ  (initial state constraints)
g_N(x_N) = θ,  h_N(x_N) ≤ θ  (final state constraints)
q_i(u_i) ≤ θ,  i = 0, ..., N − 1  (control constraints)

where all functions are continuously differentiable in the x's and u's, with u_i ∈ R^m, x_i ∈ R^n, and where ψ : R^n → R is real-valued and all other functions are vector-valued. To keep things simple, we will not consider trajectory constraints of the type

r(i, x_i, u_i) ≤ θ.

The given problem can be formulated as

min_z {f^0(z) : f̄(z) ≤ θ, ḡ(z) = θ}    (6.19)

where

z = [x_0; ...; x_N; u_0; ...; u_{N−1}] ∈ R^{(N+1)n+Nm}

is an augmented vector on which to optimize, and

f^0(z) = Σ_{i=0}^{N−1} L(i, x_i, u_i) + ψ(x_N),

f̄(z) = [q_0(u_0); ...; q_{N−1}(u_{N−1}); h_0(x_0); h_N(x_N)],

ḡ(z) = [x_1 − x_0 − f(0, x_0, u_0); ...; x_N − x_{N−1} − f(N−1, x_{N−1}, u_{N−1}); g_0(x_0); g_N(x_N)]

²Or equivalently, Δx_i := x_{i+1} − x_i = f(i, x_i, u_i).


(the dynamics are now handled as constraints). If ẑ is optimal, the F. John conditions hold for (6.19), i.e., there exist multipliers p⁰ ≥ 0 (for the objective ψ and L-terms), λ_0, ..., λ_{N−1} ≥ 0 (for the q_i's), ν_0, ν_N ≥ 0 (for h_0, h_N), p_1, ..., p_N (for the dynamics), and η_0, η_N (for g_0, g_N), not all zero, such that

(∂/∂x_0):  p⁰ ∇_x L(0, x̂_0, û_0) − p_1 − (∂f/∂x)(0, x̂_0, û_0)^T p_1 + p_0 = θ,    (6.20)

where (note that p_0 ∈ R^n is to be distinguished from the scalar p⁰ ∈ R)

p_0 := (∂g_0/∂x)(x̂_0)^T η_0 + (∂h_0/∂x)(x̂_0)^T ν_0,    (6.21)

(∂/∂x_i):  p⁰ ∇_x L(i, x̂_i, û_i) + p_i − p_{i+1} − (∂f/∂x)(i, x̂_i, û_i)^T p_{i+1} = θ,  i = 1, ..., N − 1,    (6.22)

(∂/∂x_N):  p⁰ ∇ψ(x̂_N) + p_N + (∂g_N/∂x)(x̂_N)^T η_N + (∂h_N/∂x)(x̂_N)^T ν_N = θ,    (6.23)

(∂/∂u_i):  p⁰ ∇_u L(i, x̂_i, û_i) − (∂f/∂u)(i, x̂_i, û_i)^T p_{i+1} + (∂q_i/∂u)(û_i)^T λ_i = θ,  i = 0, ..., N − 1,    (6.24)

plus complementary slackness

λ_i^T q_i(û_i) = 0,  i = 0, ..., N − 1,    (6.25)
ν_0^T h_0(x̂_0) = 0,  ν_N^T h_N(x̂_N) = 0.    (6.26)

To simplify these conditions, let us define the pre-Hamiltonian function H : {0, 1, ..., N − 1} × R^n × R^{1+n} × R^m → R,

H(i, x, [p⁰; p], u) = −p⁰ L(i, x, u) + p^T f(i, x, u).

Then we obtain, with p̃_i := [p⁰; p_i],

(6.20) + (6.22) ⇒  p_i = p_{i+1} + ∇_x H(i, x̂_i, p̃_{i+1}, û_i),  i = 0, 1, ..., N − 1,    (6.27)

(6.24) + (6.25) ⇒  −∇_u H(i, x̂_i, p̃_{i+1}, û_i) + (∂q_i/∂u)(û_i)^T λ_i = θ  and  λ_i^T q_i(û_i) = 0,  i = 0, ..., N − 1,    (6.28)

and (6.28) is a necessary condition (assuming KTCQ holds) for the problem

max_v H(i, x̂_i, p̃_{i+1}, v)  s.t. q_i(v) ≤ θ,  i = 0, ..., N − 1,

to have v := û_i as a solution. (So this is a weak version of Pontryagin's Principle.) Also, (6.21) + (6.23) + (6.26) imply the transversality conditions

p_0 ⊥ N( [ (∂g_0/∂x)(x̂_0) ; (∂h_0^j/∂x)(x̂_0), j s.t. h_0^j(x̂_0) = 0 ] ),    (6.29)

p_N + p⁰ ∇ψ(x̂_N) ⊥ N( [ (∂g_N/∂x)(x̂_N) ; (∂h_N^j/∂x)(x̂_N), j s.t. h_N^j(x̂_N) = 0 ] ).    (6.30)


Remark 6.6 Simple examples show that it is indeed the case that a strong version of
Pontryagin’s Principle does not hold in general, in the discrete-time case. One such example
can be found, e.g., in [?, Section 4.1, Example 37]. A less restrictive condition than convexity,
that still guarantees that a strong Pontryagin’s principle holds, is “directional convexity”;
see [?, Section 4.2].

Remark 6.7 1. If KTCQ holds for the original problem, one can set p⁰ = 1. Then, if there are no constraints on x_N (no g_N's or h_N's), we obtain that p_N = −∇ψ(x̂_N).

2. If p⁰ ≠ 0, then without loss of generality we can assume p⁰ = 1 (this amounts to scaling all multipliers).

3. If x_0 is fixed, p_0 is free (i.e., no additional information is known about p_0). Every degree of freedom given to x_0 results in one constraint on p_0. The same is true for x_N and p_N + p⁰ ∇ψ(x̂_N).

4. The vector p_i is known as the co-state or adjoint variable (or dual variable) at time i. Equation (6.27) is the adjoint difference equation. Note that problems (6.31) are decoupled in the sense that the kth problem can be solved for û_k as a function of x̂_k and p̃_{k+1} only. We will discuss this further in the context of continuous-time optimal control. (A minimal numerical illustration follows.)

Remark 6.8 If all equality constraints (including dynamics) are affine and the objective function and inequality constraints are convex, then (6.28) is a necessary and sufficient condition for the problem

max_u H(i, x̂_i, p̃_{i+1}, u)  s.t. q_i(u) ≤ 0,  i = 0, ..., N − 1    (6.31)

to have a global maximum at û_i; in particular, a true Pontryagin Principle holds: there exist vectors p_0, p_1, ..., p_N satisfying (6.27) (dynamics) such that û_i solves the constrained optimization problem (6.31) for i = 0, ..., N − 1, and such that the transversality conditions hold.

Remark 6.9 More general conditions under which a true maximum principle holds are
discussed in [?, Section 6.2].

Remark 6.10 (Sensitivity interpretation of p_i.) From section 5.8, we know that, under appropriate assumptions (which imply, in particular, that we can choose p⁰ = 1), if we modify problem (6.18) by changing the ith dynamic equation (and only the ith one) from

−x_{i−1} − f(i − 1, x_{i−1}, u_{i−1}) + x_i = θ

to

−x_{i−1} − f(i − 1, x_{i−1}, u_{i−1}) + x_i = b,

then, if we denote by û(b) the new optimal control, we have

∇_b J(û(b))|_{b=θ} = p_i.


Now note that changing θ to b is equivalent to introducing at time i a perturbation that replaces x_i with x_i − b, so that varying b is equivalent to varying x_i in the opposite direction. Consider now the problem (P_{i,x}), where we start at time i from state x, with value function V(i, x). It follows from the above that

∇_x V(i, x_i) = −p_i,

the discrete-time analog to (3.13). This can be verified using the dynamic-programming approach of section 3.1. Equation (3.4),

V(i, x) = L(i, x, û_i) + V(i + 1, x + f(i, x, û_i)),

yields

∇_x V(i, x) = ∇_x L(i, x, û_i) + (I + (∂f/∂x)(i, x, û_i))^T ∇_x V(i + 1, x + f(i, x, û_i)).

With x := x_i, and with −p_i and −p_{i+1} substituted for ∇_x V(i, x_i) and ∇_x V(i + 1, ·), we get (6.27) (with p⁰ = 1) indeed!

6.3 Continuous-Time Optimal Control

More on optimal control of linear systems


Definition 6.1 Let K ⊂ R^n, x∗ ∈ K. We say that d is the inward (resp. outward) normal to a hyperplane supporting K at x∗ if d ≠ 0 and

d^T x∗ ≤ d^T x  ∀x ∈ K  (resp. d^T x∗ ≥ d^T x  ∀x ∈ K).

Remark 6.11 Equivalently

dT (x − x∗ ) ≥ 0 ∀x ∈ K (resp. dT (x − x∗ ) ≤ 0 ∀x ∈ K)

i.e., from x∗ , all the directions towards a point in K make with d an angle of less (resp. more)
than 90◦ .

Proposition 6.2 Let u∗ ∈ U and let x∗ (t) = φ(t, t0 , x0 , u∗ ). Then u∗ is optimal if and only
if c is the inward normal to a hyperplane supporting K(tf , t0 , x0 ) at x∗ (tf ) (which implies that
x∗ (tf ) is on the boundary of K(tf , t0 , x0 )).

Exercise 6.5 Prove Proposition 6.2.

Further, as seen in Corollary 2.2, at every t ∈ [t0 , tf ], p∗ (t) is the outward normal at x∗ (t)
to K(t, t0 , x0 ).


Figure 6.3: The reachable sets K(t_0, t_0, x_0) = {x_0}, K(t_1, t_0, x_0), K(t_2, t_0, x_0), K(t_f, t_0, x_0) in R^n; at each time t, p∗(t) is the outward normal to a hyperplane supporting K(t, t_0, x_0) at x∗(t).

Exercise 6.6 (i) Assuming that U is convex, show that U, the set of admissible controls, is convex. (ii) Assuming that the set of admissible controls is convex, show that K(tf, t0, z) is convex.

Remark 6.12 It can be shown that K(tf , t0 , z) is convex even if U is not, provided we
enlarge U to include all bounded measurable functions u : [t0 , ∞) → U: see Theorem 1A,
page 164 of [?].

From (2.45), we see that, if u∗ is optimal, i.e., if c = −p∗ (tf ) is the inward normal to a
hyperplane supporting K(tf , t0 , x0 ) at x∗ (tf ) then, for t0 ≤ t ≤ tf , x∗ (t) is on the boundary of
K(t, t0 , x0 ) and p∗ (t) is the outward normal to a hyperplane supporting K(t, t0 , x0 ) at x∗ (t).
This normal is obtained by transporting backwards in time, via the adjoint equation, the
outward normal p∗ (tf ) at time tf .

Suppose now that the objective function is of the form ψ(x(tf )), ψ continuously differentiable,
instead of cT x(tf ) and suppose K(tf , t0 , x0 ) is convex (see remark above on convexity of
K(t, t0 , z)).
We want to

minimize ψ(x) s.t. x ∈ K(tf, t0, x0).

We know that if x∗(tf) is optimal, then

∇ψ(x∗(tf))^T h ≥ 0  ∀h ∈ cl(co TC(x∗(tf), K(tf, t0, x0))).

Claim: from convexity of K(tf , t0 , x0 ), this implies

∇ψ(x∗ (tf ))T (x − x∗ (tf )) ≥ 0 ∀x ∈ K(tf , t0 , x0 ). (6.32)

Exercise 6.7 Prove the claim.


Note: Again, ∇ψ(x∗ (tf )) is an inward normal to K(tf , t0 , x0 ) at x∗ (tf ).


An argument identical to that used in the case of a linear objective function shows that a
version of Pontryagin’s Principle still holds in this case (but only as a necessary condition,
since (6.32) is merely necessary), with the terminal condition on the adjoint equation being
now
p∗ (tf ) = −∇ψ(x∗ (tf )).
Note that the adjoint equation can no longer be integrated independently of the state equation, since x∗(tf) is needed to integrate p∗ (backward in time from tf to t0).

Exercise 6.8 By following the argument in these notes, show that a version of Pontryagin’s
Principle also holds for a discrete-time linear optimal control problem. (We saw earlier that
it may not hold for a discrete nonlinear optimal control problem. Also see discussion in the
next section of these notes.)

Exercise 6.9 If ψ is convex and U is convex, Pontryagin’s Principle is again a necessary


and sufficient condition of optimality.

Optimal control of nonlinear systems


See [?, ?]. We consider the problem

minimize ψ(x(tf )) s.t. ẋ(t) = f (t, x(t), u(t)), a.e. t ∈ [t0 , tf ], u ∈ U, (6.33)

where x(t0 ) := x0 is prescribed and x is absolutely continuous. Unlike the linear case, we do
not have an explicit expression for φ(tf , t0 , x0 , u). We shall settle for a comparison between
the trajectory x∗ and trajectories x obtained by perturbing the control u∗ . By considering
strong perturbations of u∗ we will be able to still obtain a global Pontryagin Principle, involving a global maximization of the pre-Hamiltonian. Some proofs will be omitted.
We assume that ψ is continuously differentiable, and impose the following regularity conditions on (6.33) (same assumptions as in Chapter 3):

(i) for each t ∈ [t0, tf], f(t, ·, ·) : R^n × R^m → R^n is continuously differentiable;

(ii) the functions f, ∂f/∂x, and ∂f/∂u are continuous on [t0, tf] × R^n × R^m;

(iii) for every finite α, ∃β, γ s.t.

‖f(t, x, u)‖ ≤ β + γ‖x‖  ∀t ∈ [t0, tf], x ∈ R^n, u ∈ R^m, ‖u‖ ≤ α.


Under these conditions, for any t̂ ∈ [t0 , tf ], any z ∈ Rn , and u ∈ PC, (6.33) has a unique
continuous solution
x(t) = φ(t, t̂, z, u) t̂ ≤ t ≤ tf
such that x(t̂ ) = z. Let x(t0 ) = x0 be given. Again, let U ⊆ Rm and let

U = {u : [t0 , tf ] → U , u ∈ PC}


be the set of admissible controls. Let³

K(t, t0 , z) = {φ(t, t0 , z, u) : u ∈ U}.

Problem (6.33) can be seen to be closely related to

minimize ψ(x) s.t. x ∈ K(tf , t0 , x0 ), (6.34)

where x here belongs to R^n, a finite-dimensional space! Indeed, it is clear that x∗(·) is an


optimal state-trajectory for (6.33) if and only if x∗ (tf ) solves (6.34). Characterization of a
related optimal control will be obtained as a by-product of solving this problem.
Now, let u∗ ∈ U be optimal and let x∗ be the corresponding state trajectory. As in the
linear case we must have (necessary condition)

∇ψ(x∗ (tf ))T h ≥ 0 ∀h ∈ cl coTC(x∗ (tf ), K(tf , t0 , x0 )), (6.35)

where ψ is the (continuously differentiable) terminal cost. However, unlike in the linear case
(convex reachable set), there is no explicit expression for this tangent cone. We will obtain
a characterization for a subset of interest of this cone. This subset will correspond to a
particular type of perturbations of x∗ (tf ). The specific type of perturbation to be used is
motivated by the fact that we are seeking a 'global' Pontryagin Principle, involving a 'max' over the entire U set.
Let D be the set of discontinuity points of u∗. Let τ ∈ (t0, tf), τ ∉ D, and let v ∈ U. For ε > 0, we consider the strongly perturbed control u_{τ,v,ε} defined by

u_{τ,v,ε}(t) = v for t ∈ [τ − ε, τ),  u_{τ,v,ε}(t) = u∗(t) elsewhere.

This is often referred to as a "needle" perturbation. A key fact is that, as shown by the following proposition, even though v may be very remote from u∗(τ), the effect of this perturbed control on x is small. Such "strong" perturbations lead to a global Pontryagin's Principle even though local "tools" are used. Let x_{τ,v,ε} be the trajectory corresponding to u_{τ,v,ε}. For small ε, this trajectory will be close to x∗ and, because u_{τ,v,ε} ∈ U, x_{τ,v,ε}(tf) will be in K(tf, t0, x0).

Proposition 6.3

x_{τ,v,ε}(tf) = x∗(tf) + ε h_{τ,v} + o(ε),

with o satisfying o(ε)/ε → 0 as ε → 0, and with

h_{τ,v} = Φ(tf, τ)[f(τ, x∗(τ), v) − f(τ, x∗(τ), u∗(τ))],

where Φ is the state transition matrix for the linear (time-varying) system ξ̇(t) = (∂f/∂x)(t, x∗(t), u∗(t)) ξ(t).

³Notations φ(·, ·, ·, ·) and K(·, ·, ·) were already used in Chapter 2 of these notes in the context of linear dynamics.


See, e.g., [?, p. 246-250] for a detailed proof of this result. The gist of the argument is that (i)

x_{τ,v,ε}(τ) = x∗(τ − ε) + ∫_{τ−ε}^{τ} f(t, x_{τ,v,ε}(t), v) dt = x∗(τ − ε) + ε f(τ, x∗(τ), v) + o(ε)

and

x∗(τ) = x∗(τ − ε) + ∫_{τ−ε}^{τ} f(t, x∗(t), u∗(t)) dt = x∗(τ − ε) + ε f(τ, x∗(τ), u∗(τ)) + o′(ε),

so that

x_{τ,v,ε}(τ) − x∗(τ) = ε[f(τ, x∗(τ), v) − f(τ, x∗(τ), u∗(τ))] + o″(ε);

and (ii) for t > τ, with ξ(t) := x_{τ,v,ε}(t) − x∗(t),

ξ̇(t) := ẋ_{τ,v,ε}(t) − ẋ∗(t) = f(t, x_{τ,v,ε}(t), u∗(t)) − f(t, x∗(t), u∗(t)) = (∂f/∂x)(t, x∗(t), u∗(t)) ξ(t) + o(ε).

Exercise 6.10 ∀τ ∈ (t0, tf), τ ∉ D, ∀v ∈ U,

h_{τ,v} ∈ TC(x∗(tf), K(tf, t0, x0)).    (6.36)

This leads to the following Pontryagin Principle (necessary condition).

Theorem 6.3 Let ψ : Rn → R be continuously differentiable and consider the problem

minimize ψ(x(tf )) s.t.


ẋ(t) = f (t, x(t), u(t)) a.e. t ∈ [t0 , tf ]
x(t0 ) = x0 , u ∈ U, x continuous

Suppose u∗ ∈ U is optimal and let x∗ (t) = φ(t, t0 , x0 , u∗ ). Let p∗ (t), t0 ≤ t ≤ tf , continuous,


satisfy the (linear) adjoint equation

ṗ∗(t) = −(∂f/∂x)(t, x∗(t), u∗(t))^T p∗(t) = −∇_x H(t, x∗(t), p∗(t), u∗(t))  a.e. t ∈ [t0, tf],    (6.37)

p∗(tf) = −∇ψ(x∗(tf)),    (6.38)
with
H(t, x, p, u) = pT f (t, x, u) ∀t, x, u, p.
Then u∗ satisfies the Pontryagin Principle

H(t, x∗(t), p∗(t), u∗(t)) = H(t, x∗(t), p∗(t))  ( = max_{v∈U} H(t, x∗(t), p∗(t), v) )    (6.39)

for all t ∈ [t0, tf). Finally, if f does not depend explicitly on t, i.e., if f(t, x, u) = f̂(x, u) for all t, for some f̂, then H(t, x∗(t), p∗(t), u∗(t)) is constant.


Proof (compare with linear case). In view of (6.36), we must have

∇ψ(x∗(tf))^T h_{t,v} ≥ 0  ∀t ∈ [t0, tf], v ∈ U.    (6.40)

Using the expression we obtained above for h_{t,v}, we get

∇ψ(x∗(tf))^T Φ(tf, t)[f(t, x∗(t), v) − f(t, x∗(t), u∗(t))] ≥ 0  ∀t ∈ (t0, tf), t ∉ D, ∀v ∈ U.

Since, in view of (6.38)–(6.37), p∗(t) = −Φ(tf, t)^T ∇ψ(x∗(tf)), we get

p∗(t)^T f(t, x∗(t), u∗(t)) ≥ p∗(t)^T f(t, x∗(t), v)  ∀t ∈ (t0, tf), t ∉ D, ∀v ∈ U.

Finally, for t ∈ D, the result follows from right-continuity of the two sides.

Remark 6.13 If u∗ ∈ U is locally optimal, in the sense that x∗ (tf ) is a local minimizer for ψ
in K(tf , t0 , x0 ), Pontryagin’s Principle still holds (with a global minimization over U). Why?

The following exercise generalizes Exercise 2.24.


Exercise 6.11 Prove that, if f does not explicitly depend on t, then m(t) = H(t, x∗ (t),
p∗ (t)) is constant. Assume that u∗ is piecewise continuously differentiable.

Integral objective functions (Lagrange problems)


Suppose the objective function, instead of being ψ(x(tf)), is of the form

∫_{t0}^{tf} L(t, x(t), u(t)) dt.

Such problems are known as Lagrange problems, while terminal-state-cost problems are known as Mayer problems.
Lagrange problems can be converted to the Mayer form. To this end, we consider the augmented system with state variable x̃ = [x⁰; x] ∈ R^{1+n}, as follows:

x̃̇(t) = f̃(t, x(t), u(t)) := [L(t, x(t), u(t)); f(t, x(t), u(t))]  a.e. t ∈ [t0, tf],

x⁰(t0) = 0, x⁰(tf) free. Now the problem is equivalent to minimizing

ψ(x̃(tf)) := x⁰(tf)

with dynamics and constraints of the same form as before. After some simplifications, we get the following result.
Theorem 6.4 Let u∗ ∈ U be optimal, let x∗ be the associated state trajectory, and let

H(τ, ξ, η, υ) = −L(τ, ξ, υ) + η^T f(τ, ξ, υ),
H(τ, ξ, η) = sup_{v∈U} H(τ, ξ, η, v).


Then there exists a function p∗ : [t0 , tf ] → Rn , continuous, satisfying

ṗ∗ (t) = −∇ξ H(t, x∗ (t), p∗ (t), u∗ (t)) a.e. t ∈ [t0 , tf ],

with p∗ (tf ) = 0. Furthermore

H(t, x∗ (t), p∗ (t), u∗(t)) = H(t, x∗ (t), p∗ (t)) ∀t.

Finally, if L and f do not depend explicitly on t,

m(t) = H(t, x∗ (t), p∗ (t)) = constant.

Exercise 6.12 Prove Theorem 6.4.

Remark 6.14 Note that the expression for H is formally quite similar to the negative of that
for the “Lagrangian” used in constrained optimization. Indeed L is the (integrand in the)
cost function, and f specifies the “constraints” (dynamics in the present case). (But beware!!
In the calculus of variations literature, the term “Lagrangian” refers to the integrand L in
problem (6.2).)

Exercise 6.13 Conversely, express terminal cost ψ(x(tf )) as an integral cost.

Lagrange multiplier interpretation of p∗(t)


For each t, p∗ (t) can be thought of as a (vector of) Lagrange multiplier(s) for the con-
straint
ẋ(t) = f (t, x(t), u(t)),
or rather for
dx(t) = f (t, x(t), u(t))dt. (6.41)
To see this, finely discretize time: let Δ := t_{i+1} − t_i, and let x_i := x(t_i), u_i := u(t_i), and f_i(x_i, u_i) := Δ · f(t_i, x_i, u_i). Then (6.41) is appropriately approximated by

xi+1 = xi + fi (xi , ui ), (6.42)

which is the dynamics used in section 6.2 (Discrete-time optimal control). Similarly, if we let p_i = p∗(t_i), we can appropriately approximate the adjoint equation (6.37) with

p_{i+1} = p_i − (∂f_i/∂x)(x_i, u_i)^T p_i,

which is (to first order) the recursion (6.22), i.e., p_i is the Lagrange multiplier associated with (6.42). Also, recall that in Chapter 3, we noted in (3.13) that, when the value function V is smooth enough,

p∗(t) := −∇_x V(t, x∗(t)),

which, again (see section 6.2) can be viewed in terms of the sensitivity interpretation of
Lagrange multipliers.


Geometric approach to discrete-time case


We investigate to what extent the approach just used for the continuous-time case can
also be used in the discrete-time case, the payoff being the geometric intuition.
A hurdle is immediately encountered: strong perturbations as described above cannot
work in the discrete-time case. The reason is that, in order to build an approximation to
the reachable set we must construct small perturbations of x∗ (tf ). In the discrete-time case,
the time interval during which the control is varied cannot be made arbitrarily small (it is
at least one time step), and thus the “smallness” of the perturbation must come from the
perturbed value v. Consequently, at every t, u∗ (t) can only be compared to nearby values v
and a true Pontryagin Principle cannot be obtained. We investigate what can be obtained
by considering appropriate “weak” variations.
Thus consider the discrete-time system

x_{i+1} = x_i + f(i, x_i, u_i),  i = 0, ..., N − 1,

and the problem of minimizing ψ(x_N), given a fixed x_0 and the constraint that u_i ∈ U for all i. Suppose u∗_i, i = 0, ..., N − 1, is optimal, and x∗_i, i = 1, ..., N is the corresponding optimal state trajectory. Given k ∈ {0, ..., N − 1}, ε > 0, and w ∈ TC(u∗_k, U), consider the weak variation

(u_{k,ε})_i = u∗_k + εw + o(ε) for i = k,  (u_{k,ε})_i = u∗_i otherwise,

where o(·) is selected in such a way that (u_{k,ε})_i ∈ U for all i, which is achievable due to the choice of w. The next state value is then given by

(x_{k,ε})_{k+1} = x∗_{k+1} + f(k, x∗_k, u∗_k + εw + o(ε)) − f(k, x∗_k, u∗_k)    (6.43)
             = x∗_{k+1} + ε (∂f/∂u)(k, x∗_k, u∗_k) w + õ(ε).    (6.44)
The final state is then given by

(x_{k,ε})_N = x∗_N + ε Φ(N, k + 1) (∂f/∂u)(k, x∗_k, u∗_k) w + ô(ε),

which shows that h_{k,w} := Φ(N, k + 1)(∂f/∂u)(k, x∗_k, u∗_k) w belongs to TC(x∗_N, K(N, 0, x_0)). We can now proceed as we did in the continuous-time case. Thus

∇ψ(x∗_N)^T Φ(N, k + 1) (∂f/∂u)(k, x∗_k, u∗_k) w ≥ 0  ∀k, ∀w ∈ TC(u∗_k, U).
Letting p∗_i solve the adjoint equation

p∗_i = p∗_{i+1} + (∂f/∂x)(i, x∗_i, u∗_i)^T p∗_{i+1},  with p∗_N = −∇ψ(x∗_N),

i.e., p∗_{k+1} = −Φ(N, k + 1)^T ∇ψ(x∗_N), we get

(p∗_{k+1})^T (∂f/∂u)(k, x∗_k, u∗_k) w ≤ 0  ∀k, ∀w ∈ TC(u∗_k, U).


And defining H(j, ξ, η, v) = η^T f(j, ξ, v), we get

(∂H/∂u)(k, x∗_k, p∗_{k+1}, u∗_k) w ≤ 0  ∀k, ∀w ∈ TC(u∗_k, U),

which is a mere necessary condition of optimality for the maximization of H(k, x∗_k, p∗_{k+1}, u) with respect to u ∈ U. It is a special case of the result we obtained earlier in this chapter for the discrete-time case.

Partially free initial state


Suppose now that x(t0) is not necessarily fixed but, more generally, is merely constrained to satisfy g⁰(x(t0)) = 0_{ℓ0}, where g⁰ : R^n → R^{ℓ0} is a given continuously differentiable function. Let T0 = {x : g⁰(x) = 0_{ℓ0}}. (T0 = R^n is a special case of this, where the image space of g⁰ has dimension 0. T0 = {x0}, i.e., g⁰(x) = x − x0, is the other extreme: fixed initial point.) Problem (6.34) becomes

minimize ψ(x) s.t. x ∈ K(tf, t0, T0).    (6.45)

Note that, if x∗ (·) is an optimal trajectory, then K(t, t0 , T0 ) contains K(t, t0 , x∗ (t0 )) for all t,
so that
TC(x∗ (tf ), K(tf , t0 , x∗ (t0 )) ⊆ TC(x∗ (tf ), K(tf , t0 , T0 )).
Since x∗ (tf ) ∈ K(tf , t0 , x∗ (t0 )), x∗ (tf ) is also optimal for (6.33) with fixed initial state x0 :=
x∗ (t0 ), and the necessary conditions we obtained for the fixed initial point problem apply to
the present problem, with x0 := x∗ (t0 )—but, of course, x∗ (t0 ) isn’t known. We now obtain
an additional condition (which will be of much interest, since we now have one more degree
of freedom). Let x∗ (t0 ) be the optimal initial point. From now on, the following additional
assumption will be in force:
Assumption. (x∗ (t0 ), g 0) is non-degenerate.
Let

h ∈ N((∂g⁰/∂x)(x∗(t0))) = TC(x∗(t0), T0).

Then there exists a little-o function o such that, for ε > 0, x∗(t0) + εh + o(ε) ∈ T0. For ε > 0 small enough, let x_ε(t0) = x∗(t0) + εh + o(ε), and consider applying our optimal control u∗ (for initial point x∗(t0)), but starting from x_ε(t0) as initial point. We now invoke the following result, given as an exercise.

Exercise 6.14 Show that, if h ∈ TC(x∗ (t0 ), T0 ), then

Φ(tf , t0 )h ∈ TC(x∗ (tf ), K(tf , t0 , T0 )),

where Φ is as in Proposition 6.3. (Hint: Use the fact that D_ε x_ε(t) follows the linearized dynamics. See, e.g., Theorem 10.1 in [?].)

It follows that optimality of x∗(tf) for (6.45) yields

∇ψ(x∗(tf))^T Φ(tf, t0) h ≥ 0  ∀h ∈ N((∂g⁰/∂x)(x∗(t0))),
 
i.e., since N((∂g⁰/∂x)(x∗(t0))) is a subspace,

(Φ(tf, t0)^T ∇ψ(x∗(tf)))^T h = 0  ∀h ∈ N((∂g⁰/∂x)(x∗(t0))).

Thus

p∗(t0) ⊥ N((∂g⁰/∂x)(x∗(t0))),    (6.46)

which is known as a transversality condition. Note that p∗(t0) is thus no longer free. Indeed, for each degree of freedom "gained" on x∗(t0), we "lose" one on p∗(t0).

Remark 6.15 An alternative derivation of this result follows by observing that (excuse the abuse of notation)

(∂ψ/∂x0)(x∗(tf)) h = 0  ∀h ∈ N((∂g⁰/∂x)(x∗(t0)))

and

(∂ψ/∂x0)(x∗(tf)) = (∂ψ/∂x)(x∗(tf)) (∂x(tf)/∂x0) = (∂ψ/∂x)(x∗(tf)) Φ(tf, t0).

Constrained terminal state


Now suppose that the final state x∗ (tf ), instead of being entirely free, is possibly con-
strained, specifically, suppose x∗ (tf ) is required to satisfy g f (x∗ (tf )) = 0ℓf , where g f : Rn →
Rℓf is a given continuously differentiable function. Let Tf = {x : g f (x) = 0}. (Tf = {xf },
with xf given, is a special case of this, where g f is given by g f (x) ≡ x − xf .)
An important special case is that of completely free initial state. Indeed this is the
“mirror image” of the case with free terminal state and (possibly) constrained initial state,
which was considered in the previous subsection. It is the object of the next exercise.
Exercise. Obtain a Pontryagin Principle for problem (6.33) but with free initial state and
possibly constrained (or even fixed) terminal state. Use an initial cost instead of a terminal
cost, or consider the case of an integral objective function. [Hint: Reverse time.]
Turning to the general case, note that ψ should no longer be minimized over the entire
reachable set, but only on its intersection with Tf . The Pontryagin Principle as previously
stated no longer holds. Rather we can write
∇ψ(x∗(tf))^T h ≥ 0  ∀h ∈ cl co TC(x∗(tf), K(tf, t0, T0) ∩ Tf),    (6.47)

which involves the tangent cone to a smaller set than when the terminal state is unconstrained. We now make simplifying assumptions.
Assumptions Terminal-State (TS):
1. (x∗ (tf ), g f ) is non-degenerate.
2.
cl coTC(x∗ (tf ), K(tf , t0 , T0 ) ∩ Tf ) = cl coTC(x∗ (tf ), K(tf , t0 , T0 )) ∩ cl coTC(x∗ (tf ), Tf ).
(6.48)


3. the convex cone

C = { [ ∇ψ(x∗(tf))^T h ; (∂g^f/∂x)(x∗(tf)) h ] : h ∈ cl co TC(x∗(tf), K(tf, t0, T0)) }

is closed.
The first two assumptions amount to a type of constraint qualification for the constraint
x∗ (tf ) ∈ Tf . In contrast, the third one also involves the objective function.

Exercise 6.15 Show that, if x ∈ Ω1 ∩ Ω2 then cl coTC(x, Ω1 ∩ Ω2 ) ⊆ cl coTC(x, Ω1 ) ∩


cl coTC(x, Ω2 ). Provide an example showing that equality does not always hold.

Exercise 6.16 Consider minimizing f(x) subject to x ∈ Ω1 ∩ Ω2, with Ωi := {x : gi(x) ≥ 0}, i = 1, 2, where f, g1, g2 : R^n → R are smooth. Let x̂ be a local minimizer for this problem. Further assume that (i) ∇gi(x̂) ≠ 0_n, i = 1, 2, and (ii) TC(x̂, Ω1 ∩ Ω2) = TC(x̂, Ω1) ∩ TC(x̂, Ω2). Show that, under such assumptions, KKT holds at x̂; i.e., without further assumptions on f, there exists λ̂ ∈ R² such that

∇f(x̂) + λ̂1 ∇g1(x̂) + λ̂2 ∇g2(x̂) = 0_n.

This shows that (i)+(ii) forms a constraint qualification indeed.

Under these three assumptions, Pontryagin's Principle can be readily proved, as follows. First, TC(x∗(tf), Tf) = N((∂g^f/∂x)(x∗(tf))), and from (6.47) and (6.48), we obtain

∇ψ(x∗(tf))^T h ≥ 0  ∀h ∈ cl co TC(x∗(tf), K(tf, t0, T0)) ∩ N((∂g^f/∂x)(x∗(tf))).    (6.49)
Now let

R = (−1, 0, ..., 0)^T ∈ R^{ℓf+1}.

Then, in view of (6.49), R ∉ C. Since C is a closed convex cone, it follows from Exercise ?? that there exists µ := [p∗₀; π] ∈ R^{ℓf+1} such that µ^T R < 0 (i.e., p∗₀ > 0) and µ^T v ≥ 0 for all v ∈ C, i.e.,

( p∗₀ ∇ψ(x∗(tf)) + (∂g^f/∂x)(x∗(tf))^T π )^T h ≥ 0  ∀h ∈ cl co TC(x∗(tf), K(tf, t0, T0)),

or equivalently (since p∗₀ > 0), by redefining π,

( ∇ψ(x∗(tf)) + (∂g^f/∂x)(x∗(tf))^T π )^T h ≥ 0  ∀h ∈ cl co TC(x∗(tf), K(tf, t0, T0))
(to be compared to (6.35)). If, instead of imposing p∗(tf) = −∇ψ(x∗(tf)), we impose the condition

p∗(tf) = −∇ψ(x∗(tf)) − (∂g^f/∂x)(x∗(tf))^T π  for some π,


we obtain, formally, the same Pontryagin Principle as above. While we do not know π, the above guarantees that

p∗(tf) + ∇ψ(x∗(tf)) ⊥ N((∂g^f/∂x)(x∗(tf))).

This is again a transversality condition.


It can be shown that, without our assumptions except for the requirement that (x∗(tf), g^f) is non-degenerate, the same result still holds, except that the transversality condition becomes: there exists p∗₀ ≥ 0, with (p∗(tf), p∗₀) not identically zero, such that

p∗(tf) + p∗₀ ∇ψ(x∗(tf)) ⊥ N((∂g^f/∂x)(x∗(tf))).    (6.50)

This result is significantly harder to prove though. It is the central difficulty in the proof of
Pontryagin’s principle. Proofs are found in [?, ?, ?]. Also see [?].

General case (Bolza problems)


Finally, consider the case of an objective function that includes both an integral term and a terminal-state term (such a problem is known as a Bolza problem), where both the initial and terminal states are possibly fixed or constrained. (Such problems are readily converted to the Mayer form, using the same transformation that we used earlier to transform Lagrange problems to the Mayer form.) The following theorem can be proved.

Theorem 6.5 Consider the problem (assume L is smooth enough)

minimize ∫_{t0}^{tf} L(t, x(t), u(t)) dt + ψ(x(tf))  subject to
ẋ(t) = f(t, x(t), u(t))  a.e. t ∈ [t0, tf],
g⁰(x(t0)) = 0_{ℓ0},  g^f(x(tf)) = 0_{ℓf},
u ∈ U, x continuous,

and the associated pre-Hamiltonian H : R × R^n × R^{1+n} × R^m → R and Hamiltonian H : R × R^n × R^{1+n} → R given by

H(τ, ξ, η̃, υ) := −η₀ L(τ, ξ, υ) + η^T f(τ, ξ, υ),  H(τ, ξ, η̃) := sup_{v∈U} H(τ, ξ, η̃, v),

where η̃ := [η₀; η]. Suppose u∗ ∈ U is an optimal control, x∗₀ ∈ {ξ : g⁰(ξ) = 0_{ℓ0}} an optimal initial state, and let x∗(·) be the associated optimal trajectory (with x∗(t0) = x∗₀). Then there exist an absolutely continuous function p∗ : [t0, tf] → R^n and a scalar constant p∗₀ ≥ 0, with p̃∗ := [p∗₀; p∗] not identically zero, that satisfy

ṗ∗(t) = −(∂H/∂x)(t, x∗(t), p̃∗(t), u∗(t))^T  a.e. t ∈ [t0, tf],
H(t, x∗(t), p̃∗(t), u∗(t)) = H(t, x∗(t), p̃∗(t))  ∀t ∈ [t0, tf).


Further, if (∂g⁰/∂x)(x∗(t0)) has full row rank, then

p∗(t0) ⊥ N((∂g⁰/∂x)(x∗(t0))),

and if (∂g^f/∂x)(x∗(tf)) has full row rank, then

p∗(tf) + p∗₀ ∇ψ(x∗(tf)) ⊥ N((∂g^f/∂x)(x∗(tf))).

Also, if L and f do not depend explicitly on t (i.e., if L(t, x, u) = L̂(x, u) and f(t, x, u) = f̂(x, u) for all t, for some L̂ and f̂), then m(t) := H(t, x∗(t), p̃∗(t)) is constant. Finally, if Assumptions TS hold for the associated Mayer problem, then such [p∗₀; p∗] exists for which p∗₀ = 1.

Remark 6.16 When constraint qualification (6.48) does not hold for the associated Mayer
problem, then p∗0 may have to be equal to zero (with p∗ not identically zero). This makes
the result “weak”, just like the F. John condition of constrained optimization, because it
involves neither L nor ψ, i.e., does not involve the function being minimized.

Remark 6.17 It can be shown that Assumptions TS (which imply that p∗₀ can be chosen strictly positive) also hold if a certain controllability condition is satisfied, provided the control values are unconstrained, i.e., U = R^m. See, e.g., [?].

Free final time


Suppose the final time is itself a decision variable (example: minimum-time problem).
Consider the problem
minimize ∫_{t0}^{tf} L(t, x(t), u(t)) dt  subject to
ẋ(t) = f(t, x(t), u(t))  a.e. t ∈ [t0, tf], x absolutely continuous  (dynamics)
g⁰(x(t0)) = 0_{ℓ0},  g^f(x(tf)) = 0_{ℓf}  (initial and final conditions)
u ∈ U  (control constraints)
tf ≥ t0  (final time constraint)

We analyze this problem by converting the variable-length time interval [t0 , tf ] into a fixed-
length time interval [0, 1]. Define t(·), absolutely continuous, to satisfy

dt(s)
= α(s) a.e. s ∈ [0, 1]. (6.51)
ds
To fall back into a known formalism, we will consider s as the new time, t(s) as a new state variable, and α(s) as a new control. Note that, clearly, given any optimal α∗(·), there is an equivalent constant optimal α∗, equal to t∗_f − t0; accordingly, among the controls (u(·), α(·))


that satisfy Pontryagin’s principle, we can choose to focus only on those for which α is
constant.⁴ The initial and final conditions on t(s) are

t(0) = t0,  t(1) free.

Denoting z(s) = x(t(s)), v(s) = u(t(s)), we obtain the state equation

(d/ds) z(s) = α f(t(s), z(s), v(s)),  (d/ds) t(s) = α  a.e. s ∈ [0, 1],
g⁰(z(0)) = 0_{ℓ0},  g^f(z(1)) = 0_{ℓf}.

Now suppose that (u∗ , tf ∗ , x∗0 ) is optimal for the original problem. Then the corresponding
(v ∗ , α∗ , x∗0 ), with α∗ = tf ∗ − t0 is optimal for the transformed problem. Expressing the known
conditions for this problem and performing some simplifications, we obtain the following
result.

Theorem 6.6 Same as above, with the pre-Hamiltonian

H(t, x, p̃, u) = −p0 L(t, x, u) + pT f (t, x, u),

and the additional necessary condition (related to the additional degree of freedom, on the
terminal time)
H(tf ∗ , x∗ (tf ∗ ), p̃∗ (tf ∗ )) = 0
where p̃∗ (t) = (p∗0 , p∗ (t)) and tf ∗ is the optimal final time. Again, if L and f do not explicitly
depend on t, then
H(t, x∗ (t), p̃∗ (t)) = constant = 0 ∀t

Exercise 6.17 Prove Theorem 6.6 by applying the previous results.

Minimum time problem


Consider the following special case of the previous problem (with fixed initial and final
states), with L = 1,

minimize tf subject to
ẋ(t) = f (t, x(t), u(t)) a.e. t ∈ [t0 , tf ], x absolutely continuous
x(t0 ) = x0 , x(tf ) = xf
u ∈ U, tf ≥ t0 (tf free)

The previous theorem can be simplified to give the following.


4
Alternatively, α(s) can be viewed as an additional state variable, with associated dynamics α̇ = 0.


Theorem 6.7 Let tf ∗ ≥ t0 and u∗ ∈ U be optimal. Let x∗ be the corresponding trajectory.


Then there exists an absolutely continuous function p∗ : [t0, tf∗] → Rⁿ, not identically zero, such that

ṗ∗ (t) = −Dx H(t, x∗ (t), p∗ (t), u∗(t)) a.e. t ∈ [t0 , tf ∗ ]


[p∗(t0), p∗(tf∗) free]
H(t, x∗(t), p∗(t), u∗(t)) = H(t, x∗(t), p∗(t))   ∀t ∈ [t0, tf∗)
H(tf ∗ , x∗ (tf ∗ ), p∗ (tf ∗ )) ≥ 0

with

H(t, x, p, u) = pᵀf(t, x, u)   (L = 0)
H(t, x, p) = sup_{v∈U} H(t, x, p, v)

Also, if f does not depend explicitly on t then H(t, x∗ (t), p∗ (t)) is constant.

Exercise 6.18 Prove Theorem 6.7. Hint: Note that the statement does not involve a
scalar p∗0 . Indeed, when starting from Theorem 6.6 to solve this exercise, you will see that
p∗0 = H(tf ∗ , x∗ (tf ∗ ), p∗ (tf ∗ )), which yields the inequality H(tf ∗ , x∗ (tf ∗ ), p∗ (tf ∗ )) ≥ 0 in the
statement of Theorem 6.7.

Remark 6.18 Note that p is determined only up to a constant scalar factor.

6.4 Applying Pontryagin’s Principle


Given that the finite-dimensional maximization problem (maximization of the pre-Hamiltonian) for obtaining u∗(t) at each time t involves x∗(t) and p∗(t), both of which are yet to be determined, it may at first appear that Pontryagin's Principle is of little help for solving a "real" problem (e.g., numerically). The following approach comes to mind though:

1. Maximize the pre-Hamiltonian with respect to the control, yielding u∗(t) in terms of x∗(t) and p∗(t) at every time t;

2. Plug the expression for u∗ into the differential equations for x∗ and p∗ .

3. Solve the resulting differential equations in (x∗, p∗). This is in general a two-point boundary-value problem.

4. Plug x∗ (t) and p∗ (t) into the expression obtained for u∗ (t) (for each t) in step 1 above.

One difficulty with the scheme outlined above is that, in most cases of practical interest,
no “closed-form” solution can be obtained at step 1 for u∗ (t) in terms of (x∗ (t), p∗ (t)): the
maximization is to be carried out for fixed values of t, e.g., on a fine time grid, and this has
to be done concurrently with carrying out steps 3 and 4, since x∗ (t) and p∗ (t) must be known
at the “current” time in order to be able to proceed with the (numerical) maximization.


Another difficulty is that the differential equation to be solved at step 3, in 2n variables


(assuming p∗0 is strictly positive and hence can be set to 1), has n auxiliary conditions at time
t0 and n auxiliary conditions at time tf . I.e., it is a “two-point boundary-value problem”.
Such problems are notoriously hard to analyze (let alone solve), as compared to initial value
problems. (E.g., the question of existence and uniqueness of solutions for such problems is
largely open.) If instead x∗(t0) and p∗(t0) were fully known, then a solution process would be as follows: Starting from (x∗(t0), p∗(t0)), proceed with a single step of numerical integration of the set of differential equations, with u∗(t0) a maximizer of the pre-Hamiltonian at time t0, yielding values of x∗, p∗, and then (via maximization of the pre-Hamiltonian) u∗, at the next time point, and proceed. (Note that, in a real-time application, x∗(t) could possibly be
measured rather than computed—but this is not so for p∗ (t).)
Given that (x∗ (t0 ), p∗ (t0 )) is not fully known though, the above cannot be implemented.
A standard way to proceed is then to use a “shooting method”, by which the n “missing”
initial conditions are first “guessed”, and the scheme of the previous paragraph is carried
out based on this guess. If extreme luck strikes and, at the end of the process, it so happens
that the conditions to be satisfied at tf are met, then the problem is solved! If not, the same
process is restarted with a revised (possibly educated) guess of the missing initial conditions,
i.e., another “shot” is taken at the “target”, until a satisfactory result is achieved. The
choice of the next guess could be driven, e.g., by an optimization process that would work
at minimizing an expression of the error in the values obtained at time tf .
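To make the procedure concrete, here is a minimal numerical sketch of the shooting idea, in Python (NumPy and SciPy assumed available), on a hypothetical toy problem—not one from these notes—simple enough that the maximization in step 1 has a closed form: minimize (1/2)∫_0^T u(t)² dt subject to ẋ = u, x(0) = x0, x(T) = xT. Here H = −u²/2 + pu is maximized at u = p, and ṗ = 0.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import brentq

    x0, xT, T = 0.0, 1.0, 2.0   # hypothetical data

    def terminal_error(p0):
        # integrate state and costate forward from a guessed initial costate p0
        def rhs(t, y):
            x, p = y
            u = p                # maximizer of H = -u**2/2 + p*u
            return [u, 0.0]      # xdot = u; pdot = -dH/dx = 0
        sol = solve_ivp(rhs, (0.0, T), [x0, p0], rtol=1e-8)
        return sol.y[0, -1] - xT   # miss distance at the final time

    # take "shots" with revised guesses until the terminal condition is met;
    # here a root-finder plays the role of the guess-revision process
    p0_star = brentq(terminal_error, -10.0, 10.0)
    print(p0_star)   # analytic answer for this toy problem: (xT - x0)/T = 0.5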
Next we consider in detail a (linear) example that can be solved explicitly.
Example 6.2 (see [?]). Consider the motion of a point mass
mẍ + σẋ = u,   x(t), u(t) ∈ R,   σ, m > 0.
Suppose that u(t) is constrained by
|u(t)| ≤ 1 ∀t
Starting from x0, ẋ0 we want to reach x = 0, ẋ = 0 in minimum time. Set x1 = x, x2 = ẋ, α = σ/m > 0, β = 1/m > 0. Let U = [−1, 1]. The state equation is
      
[ẋ1(t); ẋ2(t)] = [0, 1; 0, −α] [x1(t); x2(t)] + [0; β] u(t) (6.52)
Since the system is linear, and the objective function is linear in (x, tf ), the Pontryagin
Principle is a necessary and sufficient condition of optimality. We seek an optimal control
u∗ with corresponding state trajectory x∗ . The pre-Hamiltonian H is
H(t, x, p, u) = pT (Ax + Bu) = pT [x2 ; −αx2 + βu] = (p1 − αp2 )x2 + βp2 u.
(i) Maximize H(t, x∗(t), p∗(t), v) with respect to v. Pontryagin's principle then yields u∗(t) in terms of t, x∗(t), and p∗(t). Specifically, since β > 0,

u∗(t) = +1 when p∗2(t) > 0,  u∗(t) = −1 when p∗2(t) < 0,  u∗(t) arbitrary when p∗2(t) = 0.
Unfortunately, this is not a bona fide feedback control law, because it involves p∗ (·).


(ii) Use the results of (i) to integrate the adjoint equation:


[ṗ∗1(t); ṗ∗2(t)] = −[0, 0; 1, −α] [p∗1(t); p∗2(t)], (6.53)

yielding

p∗1(t) = p∗1(0)
p∗2(t) = (1/α)p∗1(0) + e^{αt}(−(1/α)p∗1(0) + p∗2(0)).
(The fact that, for the problem at hand, the adjoint equation involves neither x∗(t) nor u∗(t) simplifies matters.) Note that we initially have no information on p∗(0) or p∗(tf), since x∗(0) and x∗(tf) are fixed. We need to determine p∗(0) from the knowledge of x∗(0) and x∗(tf), i.e., we have to determine p∗(0) such that the corresponding u∗ steers x∗ from x0 to the target (0, 0). For this, we could just “guess” p∗(0) and use trial-and-error (“shooting”), but the specific structure of the problem (in particular, we only need the sign of p∗2(t)) allows us to do better.
(iii) Plug the solution of the adjoint equation into the expression we obtained for u∗. Observe that, because p∗2 is monotonic in t, u∗(t) can change its value at most once, except if p∗2(t) = 0 ∀t. Clearly, the latter cannot occur since (check it) it would imply that p∗ is identically zero, which the theorem rules out. The following cases can arise:

case 1. −p∗1(0) + αp∗2(0) > 0: p∗2 strictly monotonically increasing. Then either u∗(t) = +1 ∀t, or u∗(t) = −1 for t < t̂ and +1 for t > t̂ (for some t̂), or u∗(t) = −1 ∀t.

case 2. −p∗1(0) + αp∗2(0) < 0: p∗2 strictly monotonically decreasing. Then either u∗(t) = −1 ∀t, or u∗(t) = +1 for t < t̂ and −1 for t > t̂ (for some t̂), or u∗(t) = +1 ∀t.

case 3. −p∗1(0) + αp∗2(0) = 0: p∗2 constant, p∗2(t) = (1/α)p∗1(0). Then either u∗(t) = −1 ∀t or u∗(t) = +1 ∀t.


Thus we have narrowed down the possible optimal controls to those having the following properties:

|u∗(t)| = 1 ∀t, and u∗(t) changes sign at most once.

(iv) Integrate the state equation. The only piece of information we have not used yet is
the knowledge of x(0) and x(tf ). We now investigate the question of which among the
controls just obtained steers the given initial point to the origin. It turns out that
exactly one such control will do the job, hence will be optimal.

We proceed as follows. Starting from x = (0, 0), we apply all possible controls backward in
time and check which yield the desired initial condition. Let y(t) = x(t∗ − t).

1. u∗ (t) = 1 ∀t
We obtain the system

ẏ1(t) = −y2(t)
ẏ2(t) = αy2(t) − β

with y1(0) = y2(0) = 0. This gives

y1(t) = (β/α)(−t + (e^{αt} − 1)/α),   y2(t) = (β/α)(1 − e^{αt}) < 0 ∀t > 0.

Also, eliminating t yields

y1 = −(1/α)((β/α) log(1 − (α/β)y2) + y2).

Thus y1 is increasing, y2 is decreasing (see curve OA in Figure 6.4).

2. u∗(t) = −1 ∀t

y1(t) = −(β/α)(−t + (e^{αt} − 1)/α),   y2(t) = −(β/α)(1 − e^{αt}) > 0 ∀t > 0.

Also, eliminating t yields

y1 = (1/α)((β/α) log(1 + (α/β)y2) − y2).

Thus y1 is decreasing, y2 is increasing (see curve OB in Figure 6.4).

3. Suppose now that v ∗ (t) := u∗ (t∗ − t) = +1 until some time t̂, and −1 afterward. Then
the trajectory for y is of the type OCD (y1 must keep increasing while y2 < 0). If
u∗ (t) = −1 first, then +1, the trajectory is of the type OEF.


[Figure: the (x1, x2) plane divided by the switching curve BOA, with branch OA (x2 < 0) and branch OB (x2 > 0) through the origin, regions labeled by the feedback values +1 (below BOA) and −1 (above BOA), and sample trajectories OCD and OEF.]

Figure 6.4: “+1”/“−1” indicate values of u∗(t) in/on the associated regions/curves (α < 1 is assumed).

The reader should convince himself/herself that one and only one trajectory passes through
any point in the plane. Thus the given control, inverted in time, must be the optimal control
for initial conditions at the given point (assuming that an optimal control exists).
We see then that the optimal control u∗(t) has the following properties, at each time t:

u∗(t) = −1 if x∗(t) is above BOA or on OB;
u∗(t) = +1 if x∗(t) is below BOA or on OA.

Thus we can synthesize the optimal control in feedback form: u∗(t) = ψ(x∗(t)), where the function ψ is given by

ψ(x1, x2) = +1 if (x1, x2) is below BOA or on OA;  ψ(x1, x2) = −1 if (x1, x2) is above BOA or on OB.

BOA is called the switching curve.
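The feedback law ψ is easily simulated. Below is a minimal sketch (Python with NumPy assumed; the values α = β = 1 and the initial state are arbitrary choices for illustration), using the time-eliminated expressions for curves OA and OB derived above and a crude Euler integration of (6.52).

    import numpy as np

    alpha, beta = 1.0, 1.0   # hypothetical parameter values

    def switching_curve(x2):
        # x1-coordinate of BOA at ordinate x2 (branches OA and OB combined)
        s = np.sign(x2)
        return (s / alpha) * ((beta / alpha) * np.log(1.0 + (alpha / beta) * abs(x2)) - abs(x2))

    def psi(x1, x2):
        # -1 above BOA or on OB; +1 below BOA or on OA
        on_curve = (x1 == switching_curve(x2))
        return -1.0 if x1 > switching_curve(x2) or (on_curve and x2 > 0) else 1.0

    x, dt = np.array([2.0, 0.0]), 1e-3
    for _ in range(30000):   # crude Euler integration of the state equation
        u = psi(x[0], x[1])
        x = x + dt * np.array([x[1], -alpha * x[1] + beta * u])
        if np.hypot(x[0], x[1]) < 1e-2:
            break
    print(x)   # (approximately) the origin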

Example 6.3 Linear quadratic regulator

ẋ(t) = A(t)x(t) + B(t)u(t) (dynamics),  A(·), B(·) continuous,  x(0) = x0 given;

we want to

minimize J = (1/2) ∫_0^1 (x(t)ᵀL(t)x(t) + u(t)ᵀR(t)u(t)) dt
where L(t) = L(t)ᵀ ⪰ 0 ∀t, R(t) = R(t)ᵀ ≻ 0 ∀t, say, both continuous, and where the final state is free. Since the final state is free, we can select p∗0 = 1, so the pre-Hamiltonian H is given by

H(t, x, p, u) = −(1/2)(xᵀL(t)x + uᵀR(t)u) + pᵀ(A(t)x + B(t)u).


We first maximize H(t, x∗(t), p∗(t), v) with respect to v in order to find u∗ in terms of t, x∗(t), and p∗(t). Since U = Rᵐ, the Pontryagin Principle yields, as R(t) ≻ 0 ∀t,

R(t)u∗ (t) − B T (t)p∗ (t) = 0 ,

which we can explicitly solve for u∗ (t) in terms of p∗ (t). Thus

u∗ (t) = R(t)−1 B(t)T p∗ (t).

Next, we plug this expression into the adjoint equation and state equation, yielding

ṗ∗ (t) = −AT (t)p∗ (t) + L(t)x∗ (t)
(S)
ẋ∗ (t) = A(t)x∗ (t) + B(t)R(t)−1 B(t)T p∗ (t)

with p∗ (1) = 0, x∗ (0) = x0 . Integrating, we obtain


[p∗(1); x∗(1)] = [Φ11(1, t), Φ12(1, t); Φ21(1, t), Φ22(1, t)] [p∗(t); x∗(t)]

Since p∗(1) = 0, the first row yields

p∗(t) = −Φ11(1, t)⁻¹ Φ12(1, t) x∗(t) (6.54)

provided Φ11(1, t) is nonsingular ∀t, which was proven to be the case (since L(t) is positive semi-definite; see Theorem 2.1). Now let

K(t) = −Φ11(1, t)⁻¹ Φ12(1, t)

so that p∗(t) = K(t)x∗(t). We now show that K(t) satisfies a fairly simple equation. Note that K(t) does not depend on the initial state x0. Differentiating p∗(t) = K(t)x∗(t), we obtain

ṗ∗ (t) = K̇(t)x∗ (t) + K(t)[A(t)x∗ (t) + B(t)R(t)−1 B(t)T p∗ (t)]


= (K̇(t) + K(t)A(t) + K(t)B(t)R(t)−1 B(t)T K(t))x∗ (t) (6.55)

On the other hand, the first equation in (S) gives

ṗ∗ (t) = −A(t)T K(t)x∗ (t) + L(t)x∗ (t) (6.56)

Since K(t) does not depend on the initial state x0 , this implies

K̇(t) = −K(t)A(t) − A(t)T K(t) − K(t)B(t)R(t)−1 B(t)T K(t) + L(t) (6.57)

For the same reason, p(1) = 0 implies K(1) = 0. Equation (6.57) is a Riccati equation. It
has a unique solution, which is a symmetric matrix. Note that we have

u∗ (t) = R(t)−1 B(t)T K(t)x∗ (t)

which is an optimal feedback control law. This was obtained in Chapter 2 using elementary
arguments.
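Numerically, (6.57) is integrated backward from K(1) = 0. Here is a minimal sketch (Python with NumPy/SciPy assumed; the constant matrices A, B, L, R are arbitrary choices for illustration):

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0], [0.0, 0.0]])   # hypothetical data
    B = np.array([[0.0], [1.0]])
    L = np.eye(2)
    Rinv = np.linalg.inv(np.array([[1.0]]))

    def riccati_rhs(t, k):
        K = k.reshape(2, 2)
        dK = -K @ A - A.T @ K - K @ B @ Rinv @ B.T @ K + L   # equation (6.57)
        return dK.ravel()

    # integrate from t = 1 down to t = 0, with terminal condition K(1) = 0
    sol = solve_ivp(riccati_rhs, (1.0, 0.0), np.zeros(4), dense_output=True, rtol=1e-8)
    K0 = sol.sol(0.0).reshape(2, 2)
    print(Rinv @ B.T @ K0)   # feedback gain in u*(0) = R^{-1} B^T K(0) x*(0)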


Exercise 6.19 (Singular control. From [?].) Obtain the minimum-time control to bring the state from [1; 0] to the origin under the dynamics

ẋ1 = x2² − 1,   ẋ2 = u,

where u ∈ U and u(t) ∈ U := [−1, 1] for all t. Show by inspection that the sole optimal control u∗ is identically zero. Show that, as asserted by Pontryagin's Principle, there exists a p∗ satisfying the conditions of the principle, but that such p∗ is such that, at every time t, the pre-Hamiltonian along (x∗, p∗) is maximized by every v ∈ [−1, 1], so that Pontryagin's Principle is of no help for solving the problem. When such a situation arises (which is not that rare in real-life problems), the term singular control (or singular arc, which refers to the time portion of the curve u∗(·) on which Pontryagin's Principle is of no help) is used.



Appendix A

Generalities on Vector Spaces

Note. Some of the material contained in this appendix (and to a lesser extent in the second
appendix) is beyond what is strictly needed for this course. We hope it will be helpful to
many students in their research and in more advanced courses.
References: [?], [?, Appendix A]
Definition A.1 Let F = R or C and let V be a set. V is a vector space (linear space)
over F if two operations, addition and scalar multiplication, are defined, with the following
properties
(a) ∀x, y ∈ V, x + y ∈ V and V is an Abelian (aka commutative) group for the addition
operation (i.e., “+” is associative and commutative, there exists an additive identity
0V and every x ∈ V has an additive inverse −x).
(b) ∀α ∈ F, x ∈ V, αx ∈ V and
(i) ∀x ∈ V, ∀α, β ∈ F:
1x = x, α(βx) = (αβ)x, 0x = 0V, α0V = 0V
(ii) ∀x, y ∈ V, ∀α, β ∈ F:
α(x + y) = αx + αy
(α + β)x = αx + βx
If F = R, V is said to be a real vector space. If F = C, it is said to be a complex vector
space. Elements of V are often referred to as “vectors” or as “points”.
Exercise A.1 Let x ∈ V , a vector space. Prove that x + x = 2x.
In the context of optimization and optimal control, the primary emphasis is on real vector
spaces (i.e., F = R). In the sequel, unless explicitly indicated otherwise, we will assume this
is the case.
Example A.1 R, Rⁿ, Rⁿˣᵐ (the set of n × m real matrices); the set of all univariate polynomials of degree less than n; the set of all continuous functions f : Rⁿ → Rᵏ; the set C[a, b] of all continuous functions over an interval [a, b] ⊂ R. (All of these with the usual + and · operations.) The 2D plane (or 3D space), with an origin (there is no need for coordinate axes!), with the usual vector addition (parallelogram rule) and multiplication by a scalar.


Exercise A.2 Show that the set of functions f : R → R such that f (0) = 1 is not a vector
space.

Definition A.2 A set S ⊂ V is said to be a subspace of V if it is a vector space in its own


right with the same “+” and “·” operations as in V .

Definition A.3 Let V be a linear space. The family of vectors {x1 , . . . , xn } ⊂ V is said to
be linearly independent if any relation of the form

α1 x1 + α2 x2 + . . . + αn xn = 0

implies
α1 = α2 = . . . = αn = 0.

Given a finite collection of vectors, its span is given by


sp({b1, . . . , bn}) := { Σ_{i=1}^{n} αᵢbᵢ : αᵢ ∈ R, i = 1, . . . , n }.

Definition A.4 Let V be a linear space. The family of vectors {b1 , . . . , bn } ⊂ V is


said to be a basis for V if (i) {b1 , . . . , bn } is a linearly independent family, and (ii)
V = sp({b1 , . . . , bn }).

Definition A.5 For i = 1, . . . , n, let ei ∈ Rn be the n-tuple consisting of all zeros, except
for a one in position i. Then {e1 , . . . , en } is the canonical basis for Rn .

Exercise A.3 The canonical basis for Rn is a basis for Rn .

Exercise A.4 Let V be a vector space and suppose {b1, . . . , bn} is a basis for V. Prove that, given any x ∈ V, there exists a unique n-tuple of scalars {α1, . . . , αn} such that x = Σ_{i=1}^{n} αᵢbᵢ. (Such an n-tuple is referred to as the coordinate vector of x in basis {b1, . . . , bn}.)

Exercise A.5 Suppose {b1 , . . . , bn } and {b′1 , . . . , b′m } both form bases for V . Then m = n.

Definition A.6 If a linear space V has a basis consisting of n elements then V is said to
be finite-dimensional or of dimension n. Otherwise, it is said to be infinite-dimensional.

Example A.2 Rⁿ is n-dimensional. The set Rⁿˣᵐ of n × m real matrices forms an nm-dimensional vector space. Univariate polynomials of degree < n form an n-dimensional vector space. Points in the plane, once one such point is selected to be the origin (but no coordinate system has been selected yet), form a 2-dimensional vector space. (Come up with a basis for each of the preceding examples!) C[a, b] is infinite-dimensional. The set of all univariate polynomials forms an infinite-dimensional vector space, and so does the set of all scalar sequences with no more than finitely many nonzero entries. These last two examples are “isomorphic” to each other.


Exercise A.6 Prove that the vector space of all univariate polynomials is infinite-dimensional.

Suppose V is finite-dimensional and let {b1, . . . , bn} ⊂ V be a basis. Then, given any x ∈ V there exists a unique n-tuple (α1, . . . , αn) ∈ Rⁿ (the coordinates, or components, of x) such that x = Σ_{i=1}^{n} αᵢbᵢ (prove it). Conversely, to every (α1, . . . , αn) ∈ Rⁿ corresponds a unique x = Σ_{i=1}^{n} αᵢbᵢ ∈ V. Moreover the coordinates of a sum (in V) are the sums (in R) of the corresponding coordinates, and similarly for scalar multiples. Thus, once a basis has been selected, any n-dimensional vector space (over R) can be thought of as Rⁿ itself. (Every n-dimensional vector space is said to be isomorphic to Rⁿ.) We write V ∼ Rⁿ.

Normed vector spaces


Definition A.7 A norm on a vector space V is a function k · k: V → R with the properties
(i) ∀α ∈ R, ∀x ∈ V, ‖αx‖ = |α|‖x‖ (positive homogeneity)

(ii) kxk > 0 ∀x ∈ V \ {θ}

(iii) ∀x, y ∈ V, kx + yk ≤ kxk + kyk (triangle inequality).

A normed vector space is a pair (V, k · k) where V is a vector space and k · k is a norm on
V . Often, when the specific norm is irrelevant or clear from the context, we simply refer to
“normed vector space V ”.

Example A.3 In Rⁿ: ‖x‖1 = Σ_{i=1}^{n} |xᵢ|, ‖x‖2 = (Σ_{i=1}^{n} xᵢ²)^{1/2}, ‖x‖p = (Σ_{i=1}^{n} |xᵢ|^p)^{1/p} for p ∈ [1, ∞), and ‖x‖∞ = maxᵢ |xᵢ|. In the space of bounded continuous functions f : R → R, ‖f‖∞ = sup_t |f(t)|. In C[0, 1], ‖f‖p = (∫_0^1 |f(t)|^p dt)^{1/p}, p ∈ [1, ∞).

Note that the p-norm requires that p ≥ 1. Indeed, when p < 1, the triangle inequality does
not hold. E.g., take p = 1/2, x = (1, 0), y = (0, 1).
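A quick numeric check of this failure (a sketch in Python, assuming NumPy):

    import numpy as np

    def p_cost(x, p):
        # the quantity (sum |x_i|^p)^(1/p); a norm only when p >= 1
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    x, y, p = np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.5
    print(p_cost(x + y, p))               # 4.0
    print(p_cost(x, p) + p_cost(y, p))    # 2.0: the triangle inequality fails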
Once the concept of norm has been introduced, one can talk about balls and convergent
sequences.
Definition A.8 Given a normed vector space (V, k · k), a sequence {xn } ⊂ V is said to be
convergent (equivalently, to converge) to x∗ ∈ V if kxn − x∗ k → 0 as n → ∞.

Remark A.1 Examples of sequences that converge in one norm and not in another are well known. For example, it is readily checked that, in the space P of univariate polynomials, equivalently, of scalar sequences with only finitely many nonzero terms, the sequence {z^k} (i.e., the kth term in the sequence is the monomial given by the kth power of the variable) does not converge in norm ‖·‖1 (sum of absolute values of coefficients) but converges to zero in the weighted norm ‖p‖ = Σᵢ |pᵢ|/(i + 1), where pᵢ is the coefficient of the ith power term. As another example, consider the sequence of piecewise-continuous functions xₖ from [0, 1] to R with xₖ(t) = k for t ∈ [0, 1/k³] and 0 otherwise. Check that this sequence converges to θ in norm ‖·‖2 but does not converge in norm ‖·‖∞. Examples where a sequence converges to two different limits in two different norms are more of a curiosity. The following one is due
to Tzvetan Ivanov from Catholic University of Louvain (UCL) and Dmitry Yarotskiy from
Ludwig Maximilian Universität München. Consider the space P defined above, and for a polynomial p ∈ P, write p(z) = Σᵢ pᵢzⁱ. Consider the following two norms on P:

‖p‖a = max{ |p0|, max_{i≥1} |pᵢ|/i },

‖p‖b = max{ |p0 + Σ_{i≥1} pᵢ|, max_{i≥1} |pᵢ|/i }.

(Note that the coefficients pᵢ of p in the basis {1, z¹, z², . . .}, used in norm a, are replaced in norm b by the coefficients of p in the basis {1, z¹ − 1, z² − 1, . . .}.) As above, consider the sequence {xₖ} = {z^k} of monomials of increasing power. It is readily checked that xₖ tends to zero in norm a, but that it tends to the “constant” polynomial z⁰ = 1 in norm b, since ‖xₖ − 1‖b = 1/k tends to zero.
Definition A.9 Given a normed vector space (V, k · k), a sequence {xn } ⊂ V is said to be
a Cauchy sequence if kxn − xm k → 0 as n, m → ∞, i.e., if for every ǫ > 0 there exists N
such that n, m ≥ N implies kxn − xm k < ǫ.

Exercise A.7 Every convergent sequence is a Cauchy sequence.

Inner-product spaces
Definition A.10 Let V be a (possibly complex) vector space and let F be either R or C.
The function h·, ·i : V × V → F is called an inner product (scalar product) if
⟨y, x⟩ = the complex conjugate of ⟨x, y⟩, ∀x, y ∈ V
⟨x, αy + βz⟩ = α⟨x, y⟩ + β⟨x, z⟩ ∀x, y, z ∈ V, α, β ∈ F
⟨x, x⟩ > 0 ∀x ≠ θ
A vector space endowed with an inner-product is termed inner-product space, or pre-Hilbert
space. It is readily checked that, if h·, ·i is an inner product on V , then the function k · k
given by
kxk = hx, xi1/2
is a norm on V (the norm induced by the inner product). (Check it.) Hence every inner-
product space is a normed vector space. Unless otherwise noted, the notation kxk, when x
belongs to an inner-product space refers to hx, xi1/2 .
Remark A.2 Some authors use a slightly different definition for the inner product, with
the second condition replaced by hαx, yi = αhx, yi, or equivalently hx, αyi = ᾱhx, yi. Note
that the difference is merely notational, since (x, y) := hy, xi satisfies such definition. The
definition given here has the advantage that it is satisfied by the standard dot product in
Cn , hx, yi := x∗ y = x̄T y, rather than by the slightly less “friendly” xT ȳ. When F = R, these
two definitions are equivalent, given the symmetry property hy, xi = hx, yi.)


Example A.4 Let V be (real and) finite-dimensional, and let {bᵢ}ⁿᵢ₌₁ be a basis. Then ⟨x, y⟩ = Σ_{i=1}^{n} ξᵢηᵢ, where ξ ∈ Rⁿ and η ∈ Rⁿ are the vectors of coordinates of x and y in basis {bᵢ}ⁿᵢ₌₁, is an inner product. It is known as the Euclidean inner product associated to basis {bᵢ}ⁿᵢ₌₁.

The following exercise characterizes all inner products on finite-dimensional vector spaces.

Exercise A.8 Let V be an n-dimensional inner-product space, with a basis {bi }. Prove the
following statement. A mapping h·, ·i : V × V → R is an inner product on V if and only if
there exists a symmetric positive definite matrix M such that

hx, yi = ξ T Mη ∀x, y ∈ V,

where ξ (resp., η) is the column-vector of components of x (resp., y) in basis {bi }.

Note that, if M = Mᵀ ≻ 0, then M = AᵀA for some square nonsingular matrix A, so that ξᵀMη = (Aξ)ᵀ(Aη) = ξ′ᵀη′, where ξ′ := Aξ and η′ := Aη are the coordinates of x and y in a new basis. Hence, up to a change of basis, an inner product over Rⁿ always takes the form xᵀy.

Example A.5 V = C[t0 , tf ], the space of all continuous functions from [t0 , tf ] to R, with

⟨x, y⟩ = ∫_{t0}^{tf} x(t)y(t) dt.

This inner product is known as the L2 inner product.

Example A.6 V = C[t0 , tf ]m , the space of continuous functions from [t0 , tf ] to Rm . For
x(·) = (ξ1 (·), . . . , ξm (·)), y(·) = (η1 (·), . . . , ηm (·))

⟨x, y⟩ = ∫_{t0}^{tf} Σ_{i=1}^{m} ξᵢ(t)ηᵢ(t) dt = ∫_{t0}^{tf} x(t)ᵀy(t) dt.

This inner product is again known as the L2 inner product. The same inner product is valid
for the space of piecewise-continuous functions U considered in Chapter 2.

Exercise A.9 (Gram matrix). Let V be an inner-product space, let v1, . . . , vk ∈ V and let G be a k × k matrix with (i, j) entry given by Gᵢⱼ := ⟨vᵢ, vⱼ⟩. Then G ⪰ 0, and G ≻ 0 if and only if the vᵢ's are linearly independent.


Exercise A.10 Let V be the vector space of univariate quadratic polynomials (more pre-
cisely, polynomials of degree no higher than two) over [0, 1], endowed with the inner product
Z 1
hp, qi := p(t)q(t)dt ∀p, q ∈ V.
0

For p, q ∈ V , let P, Q ∈ R3 be the associated vectors of coefficients (say, P = [p0 ; p1 ; p2 ]).


Obtain a symmetric matrix S such that

hp, qi = P T SQ, ∀p, q ∈ V.

Theorem A.1 (Cauchy-Bunyakovskii-Schwarz (CBS) inequality, after Baron Augustin-


Louis Cauchy, French mathematician, 1789–1857; Viktor Ya. Bunyakovsky, Russian mathe-
matician, 1804–1889; K. Hermann A. Schwarz, German mathematician, 1843–1921.) If V
is a vector space with inner product h·, ·i,

|hx, yi| ≤ kxk · kyk ∀x, y ∈ V.

Moreover both sides are equal if and only if x = θ or y = λx for some λ ∈ R.

Exercise A.11 Prove the CBS inequality. [Hint: hx + αy, x + αyi ≥ 0 ∀α ∈ R.]

Exercise. Prove that for all y ∈ V , h·, yi is continuous on V (so that, if {xk } → x∗ ,
hxk , yi → hx∗ , yi).

Theorem A.2 (Parallelogram law.) In an inner-product space V , the sum of the squares
of the norms of the two diagonals of a parallelogram is equal to the sum of the squares of the
norms of its four sides, i.e., for every x, y ∈ V ,

kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ), (A.1)

where k · k is the norm induced by the inner product.

Exercise A.12 Prove Theorem A.2.

Fact. Given a normed space V whose norm satisfies (A.1),


⟨x, y⟩ := (1/4)(‖x + y‖² − ‖x − y‖²) ∀x, y ∈ V
is an inner product; further, hx, xi = kxk2 for all x ∈ V .

Definition A.11 Given x, y ∈ V, x and y are said to be orthogonal if hx, yi = 0. Given a


set S ⊂ V , the set
S ⊥ = {x ∈ V : hx, si = 0 ∀s ∈ S} .
is called the orthogonal complement of S.


Exercise A.13 Prove that S ⊥ is a closed subspace.

Exercise A.14 Prove that, if S is a subset of an inner-product space, then S ⊆ (S ⊥ )⊥ .

Example A.7 Equality may not hold even when S is a subspace. E.g., let V be the space
of continuous functions with the L2 inner product, and S ⊂ V the set of all polynomials.
Then S 6= (S ⊥ )⊥ = V . (Indeed, S is not closed.)

Exercise A.15 Consider a plane (e.g., a blackboard or sheet of paper) together with a point
in that plane declared to be the origin. With an origin in hand, we can add vectors (points)
in the plane using the parallelogram rule, and multiply vectors by scalars, and it is readily
checked that all vector space axioms are satisfied; hence we have a vector space V . Two non-
collinear vectors e1 and e2 of V form a basis for V . Any vector x ∈ V is now uniquely specified
by its components in this basis; let us denote by xE the column vector of its components. Now,
let us say that two vectors x, y ∈ V are perpendicular if the angle θ(x, y) between them (e.g.,
measured with a protractor on your sheet of paper) is π/2, i.e., if cos θ(x, y) = 0. Clearly,
in general, (xᴱ)ᵀyᴱ = 0 is not equivalent to x and y being perpendicular. (In particular, of course, (e1ᴱ)ᵀe2ᴱ = 0 (since e1ᴱ = [1, 0]ᵀ and e2ᴱ = [0, 1]ᵀ), while e1 and e2 may not be
perpendicular to each other.) Question: Determine a symmetric positive-definite matrix S
such that hx, yiS := (xE )T Sy E = 0 if and only if x and y are perpendicular.

Gram-Schmidt ortho-normalization (Erhard Schmidt, Balto-German mathematician,


1876–1959)
Let V be a finite-dimensional inner product space and let {b1, . . . , bn} be a basis for V. Let

u1 = b1,   e1 = u1/‖u1‖2;
uk = bk − Σ_{i=1}^{k−1} ⟨bk, eᵢ⟩eᵢ,   ek = uk/‖uk‖2,   k = 2, . . . , n.

Then {e1 , . . . , en } is an orthonormal basis for V , i.e., kei k2 = 1 for all i, and hei , ej i = 0 for
all i 6= j (Check it).
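A minimal NumPy sketch of the recursion above, for the Euclidean inner product on Rⁿ (the columns of the input matrix play the role of the basis {b1, . . . , bn}):

    import numpy as np

    def gram_schmidt(B):
        # columns of B: the basis b_1, ..., b_n; columns of E: the orthonormal e_i
        n = B.shape[1]
        E = np.zeros_like(B, dtype=float)
        for k in range(n):
            u = B[:, k].astype(float).copy()
            for i in range(k):
                u -= (B[:, k] @ E[:, i]) * E[:, i]   # subtract <b_k, e_i> e_i
            E[:, k] = u / np.linalg.norm(u)
        return E

    E = gram_schmidt(np.array([[1.0, 1.0], [0.0, 1.0]]))
    print(np.round(E.T @ E, 12))   # identity matrix: the e_i are orthonormal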

Closed, open, compact sets


Definition A.12 Let V be a normed vector space. A subset S ⊂ V is closed (in V ) if
every x ∈ V for which there is a sequence {xn } ⊂ S that converges to x, belongs to S. A
subset S ⊂ V is open if its complement is closed. The closure clS of a set S is the smallest
closed set that contains S, i.e., the intersection of all closed sets that contain S (see the next
exercise). The interior intS of a set S is the largest open set that is contained in S, i.e., the union of all open sets that are contained in S.


Exercise A.16 Prove the following, which shows that Definition A.12 is valid. The inter-
section ∩α Sα of an arbitrary (possibly uncountable) family of closed sets is closed. The union
∪α Sα of an arbitrary (possibly uncountable) family of open sets is open.

Exercise A.17 Prove that intS = (cl(Sᶜ))ᶜ and clS = (int(Sᶜ))ᶜ.

Exercise A.18 Let V be a normed vector space and let S ⊂ V . Show that the closure of S
is the set of all limit points of sequences of S that converge in V .

Exercise A.19 Show that a subset S of a normed vector space is open if and only if given
any x̂ ∈ S there exists ǫ > 0 such that {x : kx − x̂k < ǫ} ⊂ S.

Exercise A.20 Show that, in a normed linear space, every finite-dimensional subspace is
closed. In particular, all subspaces of Rn are closed.

Example A.8 Given a positive integer n, the set of polynomials of degree ≤ n in one vari-
able over [0,1] is a finite-dimensional subspace of C[0, 1]. The set of all univariate polynomials
over [0,1] is an infinite-dimensional subspace of C[0, 1]; it is not closed in either of the norms
of Example A.3 (prove it).

Exercise A.21 Given any set E in an inner product space, E ⊥ is a closed subspace (in the
norm derived from the inner product).

Exercise A.22 If S is a subspace of an inner product space, (S ⊥ )⊥ = clS. In particular, if


S is a finite-dimensional subspace of an inner product space, then S ⊥⊥ = S. [Hint: choose
an orthogonal basis for S. ]

Definition A.13 A set S in a normed vector space is bounded if there exists ρ > 0 s.t.
S ⊂ {x : kxk ≤ ρ}

Definition A.14 A subset S of a normed vector space is said to be (sequentially) compact


if, given any sequence {xk }∞k=0 ⊂ S, there exists a sub-sequence that converges to a point of
S, i.e., there exists an infinite index set K ⊆ {0, 1, 2, . . .} and x∗ ∈ S such that xk −→ x∗
as k −→ ∞, k ∈ K.

Remark A.3 The concept of compact set is also used in more general “topological spaces”
than normed vector spaces, but with a different definition (“Every open cover includes a
finite sub-cover”). In such general context, the concept introduced in Definition A.14 is
referred to as “sequential compactness” and is weaker than compactness. In the case of
normed vector spaces (or, indeed, of general “metric spaces”), compactness and sequential
compactness are equivalent.


Fact. (Bolzano-Weierstrass, Heine-Borel). Let S ⊂ Rn . Then S is compact if and only if it


is closed and bounded. (For a proof see, e.g., wikipedia.)

Example A.9 The “simplest” infinite-dimensional vector space may be the space P of
univariate polynomials (of arbitrary degrees), or equivalently the space of finite-length se-
quences (infinite sequences with finitely many nonzero entries). Consider P together with
the ℓ∞ norm (maximum absolute value among the (finitely many) non-zero entries). The
closed unit ball in P is not compact. For example, the sequence xk , where xk is the monomial
z k , z being the unknown, which clearly belongs to the unit ball, does not have a converging
sub-sequence. Similarly, the closed unit ball B in ℓ1 (absolutely summable real sequences) is not compact. For example, the following continuous function is unbounded over B (example due to Nuno Martins):

f(x) = max_n{xₙ} if xₙ ≤ 1/2 ∀n, and f(x) = 1/2 + max_n{n(xₙ − 1/2)} otherwise.

In fact, it is an important result due to Riesz that the closed unit ball of a normed vector space is compact if and only if the space is finite-dimensional. (See, e.g., [?, Theorem 6, Ch. 5].)

Definition A.15 Supremum, infimum. Given a set S ⊆ R, the supremum sup S


(resp. infimum inf S) of S is the lowest/leftmost (resp., highest/rightmost) x such that s ≤ x
(resp. s ≥ x) for all s ∈ S. If there is no such x, then sup S := +∞ (resp. inf S := −∞), and
if S is empty, then sup S := −∞ and inf S := +∞. Finally, if sup S (resp. inf S) belongs
to S, the supremum (resp. infimum) is said to be attained, and is known as the maximum
max S (resp. minimum min S) of S.

It is an axiom of R (i.e., part of the definition of R) that every upper-bounded subset of R


has a finite supremum and every lower-bounded subset of R has a finite infimum.

Definition A.16 Let (V, k · kV ) and (W, k · kW ) be normed spaces, and let f : V → W . Then
f is continuous at x̂ ∈ V if for every ǫ > 0, there exists δ > 0 such that kf (x) − f (x̂)kW < ǫ
for all x such that kx − x̂kV < δ. If f is continuous at x̂ for all x̂ ∈ V , it is said to be
continuous.

Exercise A.23 Prove that, in any normed vector space, the norm is continuous with respect
to itself.

Exercise A.24 Let V be a normed space and let S be a compact set in V. Let f : V → R be continuous. Then there exist x̲, x̄ ∈ S such that

f(x̲) ≤ f(x) ≤ f(x̄) ∀x ∈ S,

i.e., the supremum and infimum of {f(x) : x ∈ S} are attained.


Exercise A.25 Let f : V → R be continuous, V a normed vector space. Then, for all
α ∈ R, the sub-level set {x : f (x) ≤ α} is closed.

Definition A.17 Let f : Rn −→ Rm and let S ⊂ Rn . Then f is uniformly continuous


over S if for all ε > 0 there exists δ > 0 such that

x, y ∈ S and ‖x − y‖ < δ =⇒ ‖f(x) − f(y)‖ < ε.

Exercise A.26 Let S ⊂ Rn be compact and let f : Rn −→ Rm be continuous over S.


Then f is uniformly continuous over S.

For a given vector space, it is generally possible to define many different norms.
Norm equivalence

Definition A.18 Two norms k · ka and k · kb on a same vector space V are equivalent if
there exist N, M > 0 such that ∀x ∈ V, N||x||a ≤ ||x||b ≤ M||x||a .

Exercise A.27 Verify that the above is a bona fide equivalence relation, i.e., that it is
reflexive, symmetric and transitive. (Example: Similarity of n × n matrices is an equivalence
relation.)

We will see that equivalent norms can often be used interchangeably. The following result
is thus of great importance.

Exercise A.28 Prove that, if V is a finite-dimensional vector space, all norms on V are
equivalent. [Hint. First select an arbitrary basis {bi }ni=1 for V . Then show that k · k∞ :
V → R, defined by kxk∞ := max |xi |, where the xi ’s are the coordinates of x in basis {bi },
is a norm. Next, show that, if k · k is an arbitrary norm, it is a continuous function from
(V, k · k∞ ) to (R, | · |). Finally, conclude by invoking Exercise A.24. ]

Exercise A.29 Does the result of Exercise A.28 extend to some infinite-dimensional con-
text, say, to the space of univariate polynomials of arbitrary degrees? If not, where does your
proof break down?

Exercise A.30 Suppose that ‖·‖a and ‖·‖b are two equivalent norms on a vector space V and let the sequence {xₖ} ⊂ V be such that the sequence {‖xₖ‖a^{1/k}} converges. Then the sequence {‖xₖ‖b^{1/k}} also converges and both limits are equal. Moreover, if V is a space of matrices and xₖ is the kth power of a given matrix A, then the limit exists and is the spectral radius ρ(A), i.e., the radius of the smallest disk centered at the origin containing all eigenvalues of A.
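A numeric illustration of the last claim (a sketch, assuming NumPy; the matrix is an arbitrary choice):

    import numpy as np

    A = np.array([[0.5, 1.0], [0.0, 0.4]])   # spectral radius 0.5
    rho = max(abs(np.linalg.eigvals(A)))
    Ak = np.eye(2)
    for k in range(1, 201):
        Ak = Ak @ A
        if k in (10, 50, 200):   # ||A^k||^(1/k) in two different norms
            print(k, np.linalg.norm(Ak, 2) ** (1 / k),
                  np.linalg.norm(Ak, 1) ** (1 / k), rho)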


Exercise A.31 Let {xᵢ} ⊂ V and let ‖·‖a and ‖·‖b be two equivalent norms on V. Then a set S is open (resp. closed) w.r.t. norm ‖·‖a if and only if it is open (resp. closed) w.r.t. norm ‖·‖b. Furthermore,

(i) {xᵢ} converges to x∗ with respect to norm a if and only if it converges to x∗ with respect to norm b;

(ii) {xᵢ} is Cauchy w.r.t. norm a if and only if it is Cauchy w.r.t. norm b.

Hence, in Rn , we can talk about converging sequences and Cauchy sequences without
specifying the norm.

Exercise A.32 Suppose {xₖ} ⊂ Rⁿ is such that, for some b ∈ Rⁿ and α > 0, ‖xₖ₊₁ − xₖ‖ ≤ αbᵀ(xₖ₊₁ − xₖ) for all k. Prove that, if {bᵀxₖ} is bounded, then {xₖ} converges. Hint: Show that the sum Σ_{k=0}^{N} ‖xₖ₊₁ − xₖ‖ remains bounded as N → ∞, and that this implies that {xₖ} is Cauchy. Such a situation may arise when attempting to construct a maximizing sequence for the maximization of bᵀx over a certain set.

Complete vector spaces


Definition A.19 A normed linear space V is said to be complete if every Cauchy sequence
in V converges to a point of V .

Exercise A.33 Prove that Rⁿ is complete.

Hence every finite-dimensional vector space is complete. Complete normed spaces are
known as Banach spaces (Stefan Banach, Polish mathematician, 1892–1945). Complete
inner product spaces (with norm derived from the inner product) are known as Hilbert
spaces (David Hilbert, German mathematician, 1862–1943). Rn is a Hilbert space.

Example A.10 C[0, 1] with the sup norm is Banach. The vector space of polynomials over [0, 1], with the sup norm, is not Banach, nor is C[0, 1] with an Lp norm, p finite. The space of square-summable real sequences, with norm derived from the inner product

⟨x, y⟩ = Σ_{i=1}^{∞} xᵢyᵢ

is a Hilbert space.

Exercise A.34 Exhibit an example showing that the inner product space of Example A.5
is not a Hilbert space. (Hence the set U of admissible controls is not complete. Even when
enlarged to include piecewise-continuous functions it is still not complete.)


While the concepts of completeness and closedness are somewhat similar in spirit, they
are clearly distinct. In particular, completeness applies to vector spaces and closedness
applies to subsets of vectors spaces. Yet, for example, the vector space P of univariate
polynomials is not complete under the sup norm, but it is closed (as a subset of itself), and
its closure (in itself) is itself; indeed, all vector spaces are closed subsets of themselves. Note
however that P can also be thought of as a subspace of the (complete under the sup norm)
space C([0, 1]) of continuous functions over [0, 1]. Under the sup norm, P is not a closed (in
C([0, 1])) subspace. (Prove it.) More generally, closedness and completeness are related by
the following result.

Theorem A.3 Let V be a normed vector space, and let W be a Banach space that contains
a subspace V ′ with the property that V and V ′ are isometric normed vector spaces. (Two
normed vector spaces are isometric if they are isomorphic as vector spaces and the isomor-
phism leaves the norm invariant.) Then V ′ is closed (in W ) if and only if V is complete.
Furthermore, given any such V there exist such W and V ′ such that the closure of V ′ (in
W ) is W itself. (Such W is known as the completion of V . It is isomorphic to a certain
normed space of equivalence classes of Cauchy sequences in V .) In particular, a subspace S
of a Banach space V is closed if and only if it is complete as a vector space (i.e., if and only
if it is a Banach space).

You may think of incomplete vector spaces as being “porous”, with pores being elements of
the completion of the space. You may also think of non-closed subspaces as being porous;
pores are elements of the “mother” space whenever that space is complete.

Direct sums and orthogonal projections


Definition A.20 Let S and T be two subspaces of a linear space V . The sum S + T :=
{s + t : s ∈ S, t ∈ T } of S and T is called a direct sum if S ∩ T = {θ}. The direct sum is
denoted S ⊕ T .

Exercise A.35 Given two subspaces S and T of V , V = S ⊕ T if and only if for every
v ∈ V there is a unique decomposition v = s + t such that s ∈ S and t ∈ T .

It can be shown that, if S is a closed subspace of a Hilbert space H, then

H = S ⊕ S ⊥.

Equivalently, ∀x ∈ H there is a unique y ∈ S such that x − y ∈ S ⊥ . The (linear) map


P : x 7→ y is called orthogonal projection of x onto the subspace S.

Exercise A.36 Prove that if P is the orthogonal projection onto S, then

kx − P xk = inf{kx − sk, s ∈ S},

where ‖ · ‖ is the norm derived from the inner product.


Linear maps
Definition A.21 Let V, W be vector spaces. A map L : V → W is said to be linear if

L(αx1 + βx2) = αL(x1) + βL(x2) ∀α, β ∈ R, ∀x1, x2 ∈ V.

Exercise A.37 Let L : V → W be linear, where V and W have dimension n and m, respectively, and let {bᵢⱽ} and {bᵢᵂ} be bases for V and W. Show that L can be represented by a matrix, i.e., there exists a matrix M_L such that, for any x ∈ V, y ∈ W such that y = L(x), it holds that v_y = M_L · v_x, where v_x and v_y are n × 1 and m × 1 matrices (column vectors) with entries given by the components of x and y, and “·” is the usual matrix product. The entries in the ith column of M_L are the components in {bᵢᵂ} of Lbᵢⱽ.

Linear maps from V to W themselves form a vector space L(V, W ).


Given a linear map L ∈ L(V, W ), its range is given by

R(L) = {Lx : x ∈ V } ⊆ W

and its nullspace (or kernel) by

N (L) = {x ∈ V : Lx = θW } ⊆ V.

Exercise A.38 R(L) and N (L) are subspaces of W and V respectively.

Exercise A.39 Let V and W be vector spaces, with V finite-dimensional, and let L : V →
W be linear. Then R(L) is also finite-dimensional, of dimension no larger than that of V .

Definition A.22 A linear map L : V → W is said to be surjective if R(L) = W ; it is said


to be injective if N (L) = {θV }.

Exercise A.40 Prove that a linear map A : Rn → Rm is surjective if and only if the
matrix that represents it has full row rank, and that it is injective if and only if the matrix
that represents it has full column rank.

Bounded linear maps


Definition A.23 Let V, W be normed vector spaces. A linear map L ∈ L(V, W ) is said to
be bounded if there exists c > 0 such that kLxkW ≤ ckxkV for all x ∈ V . If L is bounded,
the operator norm (induced norm) of L is defined by

‖L‖ = inf{c : ‖Lx‖_W ≤ c‖x‖_V ∀x ∈ V} = sup{ ‖Lx‖_W/‖x‖_V : x ≠ θ_V }.


The set of bounded linear maps from V to W is a vector space. It is denoted by B(V, W).
Let V be C[0, 1] with the L1 norm, let W = R, and let Lx = x(0); L is a linear map. Then ‖Lx‖ reaches arbitrarily large values without ‖x‖ being large, for instance, on the unit ball in L1: with xₖ(t) := k exp(−kt), ‖xₖ‖1 = 1 − exp(−k) < 1 for all k > 0, but xₖ(0) = k becomes arbitrarily large as k increases. Such a linear map is said to be “unbounded”. For another example, let V be the vector space of continuously differentiable functions on [0, 1] with ‖x‖ = max_{t∈[0,1]} |x(t)|. Let W = R and let L be defined by Lx = x′(0). Then L is an unbounded linear map. (Think of the sequence xₖ(t) = sin(kt).)

Exercise A.41 Show that ‖ · ‖ as defined above is a norm on B(V, W).

Exercise A.42 Let V be a finite-dimensional normed vector space, and let L be a linear
map over V . Prove that L is bounded.

Unbounded linear maps exist whenever the space is infinite-dimensional, as shown by the following example in the “smallest” infinite-dimensional space.
Example A.11 [?, Example 4, p.105] On the space of finitely nonzero infinite sequences with norm equal to the maximum of the absolute values of the entries, define, for x = {ξ1, . . . , ξn, 0, 0, . . .},

f(x) = Σ_{k=1}^{∞} kξₖ.

The functional f is clearly linear but unbounded.
Now for an example from linear system theory.
Example A.12 Let L be a linear time-invariant dynamical input-output system, say with
scalar input and scalar output. If the input space and output space both are endowed with
the ∞-norm, then the induced norm of L is
‖L‖ = ∫ |h(t)| dt

where h is the system’s unit impulse response. L is bounded if and only if h is absolutely
integrable, which is the case if and only if the system is bounded-input/bounded-output
stable—i.e., if the ∞-norm of the output is finite whenever that of the input is.
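For instance (a sketch, assuming NumPy; the first-order impulse response h(t) = e^{−t} is a hypothetical choice), the induced norm ∫|h| = 1 is approached by feeding in the worst-case bounded input u(t) = sign(h(T − t)):

    import numpy as np

    dt, T = 1e-3, 20.0
    t = np.arange(0.0, T, dt)
    h = np.exp(-t)                       # impulse response; integral of |h| is ~1
    u = np.sign(h[::-1])                 # worst-case input u(t) = sign(h(T - t))
    y_T = np.sum(h * u[::-1]) * dt       # y(T) = int_0^T h(s) u(T - s) ds
    print(y_T, np.sum(np.abs(h)) * dt)   # both close to 1 = ||L||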
An important characterization of bounded linear maps is as follows.
Exercise A.43 Let V, W be normed vector spaces and let L ∈ L(V, W ). The following are
equivalent: (i) L is bounded; (ii) L is continuous over V ; (iii) L is continuous at θV .
Moreover, if L is bounded then N (L) is closed.

Example A.13 Let V be the vector space of continuously differentiable functions on [0, 1] with ‖x‖ = max_{t∈[0,1]} |x(t)|. Let W = R and again let L be defined by Lx = x′(0), an unbounded linear map. It can be verified that N(L) is not closed. For example, let xₖ(t) = kt³/(1 + kt²). Then xₖ ∈ N(L) for all k and xₖ → x̂ with x̂(t) = t, but Lx̂ = 1.


Exercise A.44 Show that

‖L‖ = sup_{‖x‖_V ≤ 1} ‖Lx‖_W = sup_{x ≠ θ} ‖Lx‖_W/‖x‖_V = sup_{‖x‖_V = 1} ‖Lx‖_W.

Also, ‖Lx‖_W ≤ ‖L‖ ‖x‖_V for all x ∈ V.

Exercise A.45 Prove that

kABk ≤ kAk · kBk ∀A ∈ B(W, Z), B ∈ B(V, W )

with AB defined by
AB(x) = A(B(x)) ∀x ∈ V.

Theorem A.4 (Riesz–Fréchet Theorem) (e.g., [?, Theorem 4-12]). (Frigyes Riesz,
Hungarian mathematician, 1880–1956; Maurice R. Fréchet, French mathematician, 1878–1973.) Let H be a Hilbert space and let L ∈ B(H, R) (i.e., L is a bounded linear functional
on H). Then there exists ℓ ∈ H such that

L(x) = hℓ, xi ∀x ∈ H.

Adjoint of a linear map

Definition A.24 Let V, W be two spaces endowed with inner products h·, ·iV and h·, ·iW
respectively and let L ∈ L(V, W ). An adjoint map to L is a map L∗ : W → V satisfying

hL∗ y, xiV = hy, LxiW ∀x ∈ V, y ∈ W.

Fact. If V is a Hilbert space, then every L ∈ B(V, W ) has an adjoint.

Exercise A.46 Suppose L has an adjoint map L∗. Show that (i) L has no other adjoint map, i.e., the adjoint map (when it exists) is unique; (ii) L∗ is linear; (iii) L∗ has an adjoint, with (L∗)∗ = L; (iv) if L is bounded, then L∗ also is, and ‖L∗‖ = ‖L‖.

Exercise A.47 Given L, L′ ∈ L(V, W) and L′′ ∈ L(U, V) (with U a third inner-product space), and assuming the adjoints exist, we have (L + L′)∗ = L∗ + L′∗; (αL)∗ = αL∗; (LL′′)∗ = L′′∗L∗; (L∗)∗ = L.

When L∗ = L (hence V = W ), L is said to be self-adjoint.

Exercise A.48 Let L be a linear map with adjoint L∗ . Show that hx, Lxi = 12 hx, (L + L∗ )xi
for all x.


Exercise A.49 Let L be a linear map from V to W, where V and W are finite-dimensional vector spaces, and let M_L be its matrix representation in certain bases {bᵢⱽ} and {bᵢᵂ}. Let Sₙ and Sₘ be n × n and m × m symmetric positive definite matrices. Obtain the matrix representation of L∗ under the inner products ⟨x1, x2⟩_V := ξ1ᵀSₙξ2 and ⟨y1, y2⟩_W := η1ᵀSₘη2, where ξₖ and ηₖ, k = 1, 2, are corresponding vectors of coordinates in bases {bᵢⱽ} and {bᵢᵂ}. In particular show that if the Euclidean inner product is used for both spaces (i.e., Sₙ and Sₘ are both the identity), then M_{L∗} = M_Lᵀ, so that L is self-adjoint if and only if M_L is symmetric.

Exercise A.50 Let U be given by Example A.6. Consider the map L : U → Rⁿ given by

L(u) = ∫_{t0}^{tf} G(σ)u(σ) dσ,

with G : [t0, tf] → Rⁿˣᵐ continuous. Assume the inner product on U given by ⟨u, v⟩_U := ∫_{t0}^{tf} u(t)ᵀv(t) dt and, in Rⁿ, ⟨x, y⟩ := xᵀy. Then L is linear and bounded. Verify that L has an adjoint L∗ : Rⁿ → U given by

(L∗x)(t) = G(t)ᵀx.

[Proving boundedness of L takes some effort. Hint: If ϕ : [0, 1] → R is continuous (or piecewise continuous), then ‖ϕ‖1 ≤ ‖ϕ‖2, which can be proved by craftily invoking the CBS inequality.]
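The adjoint identity ⟨x, Lu⟩ = ⟨L∗x, u⟩_U is easy to check numerically. A sketch (Python/NumPy assumed), with hypothetical data n = m = 1, [t0, tf] = [0, 1], and G(t) = e^t:

    import numpy as np

    dt = 1e-5
    t = np.arange(0.0, 1.0 + dt, dt)
    G = np.exp(t)                    # 1x1 "matrix" G(t)
    u = np.sin(3 * t)                # a control in U
    x = 2.0                          # a vector in R^n, with n = 1

    Lu = np.sum(G * u) * dt          # L(u) = int_0^1 G(s) u(s) ds (Riemann sum)
    lhs = x * Lu                     # <x, L(u)> in R^n
    rhs = np.sum((G * x) * u) * dt   # <L* x, u>_U, with (L* x)(t) = G(t)^T x
    print(lhs, rhs)                  # agree up to quadrature error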

Exercise A.51 Prove that the linear map L in Exercise A.50 is bounded. [Hint: (∫_0^1 x(t) dt)² ≤ ∫_0^1 x(t)² dt, which can be proved from Jensen's inequality.]

Exercise A.52 Consider the linear time-invariant state-space model (A, B, and C are real)
ẋ = Ax + Bu (A.2)
y = Cx (A.3)
and the associated transfer-function matrix G(s) = C(sI − A)−1 B (for s ∈ C). The time-
invariant state-space model, with input y and output v,
ṗ = −AT p + C T y (A.4)
v = −B T p (A.5)
is said to be adjoint to (A.2)-(A.3). (Note the connection with system (2.19), when L =
C T C.) Show that the transfer-function matrix G∼ associated with the adjoint system (A.4)-
(A.5) is given by G∼(s) = G(−s)ᵀ; in particular, for every ω ∈ R, G∼(jω) = G(jω)ᴴ, where j := √−1 and an H superscript denotes the complex conjugate transpose of a matrix (which
is the adjoint with respect to the complex Euclidean inner product hu, vi = ξ H η, where ξ and
η are the vectors of coordinates of u and v, and ξ H is the complex conjugate transpose of ξ).
This justifies referring to p as the adjoint variable, or co-state. The triple (−AT , C T , −B T )
is also said to be dual to (A, B, C).


Exercise A.53 Let G be a linear map from C[0, 1]ᵐ to C[0, 1]ᵖ, m and p positive integers, defined by

(Gu)(t) = ∫_0^t G(t, τ)u(τ) dτ,   t ∈ [0, 1], (A.6)

where the matrix G(t, τ) depends continuously on t and τ. Let C[0, 1]ᵐ and C[0, 1]ᵖ be endowed with the L2 inner product, i.e., ⟨r, s⟩ = ∫_0^1 r(t)ᵀs(t) dt. Then G is linear and bounded. Prove that G has an adjoint G∗ given by

(G∗y)(t) = ∫_t^1 G(τ, t)ᵀy(τ) dτ,   t ∈ [0, 1]. (A.7)

Further, suppose that G is given by

G(t, τ ) = C(t)Φ(t, τ )B(τ )1(t − τ ),

where 1(t) is the unit step and Φ(t, τ ) is the state-transition matrix associated with a cer-
tain matrix A(t), or equivalently, since G(t, τ ) only affects (A.6) for τ < t, G(t, τ ) =
C(t)Φ(t, τ )B(τ ). Then G is the mapping from u to y generated by

ẋ(t) = A(t)x(t) + B(t)u(t), x(0) = 0, (A.8)


y(t) = C(t)x(t) (A.9)

and G ∗ is the mapping from z to v generated by

ṗ(t) = −A(t)T p(t) + C(t)T z(t), p(1) = 0, (A.10)


v(t) = −B(t)T p(t). (A.11)

Observe from (A.7) that G ∗ is anticausal, i.e., the value of the output at any given time in
[0, 1] depends only on present and future values of the input. Also note that, when A, B, and
C are constant, the transfer function matrix associated with G is G(s) = C(sI − A)−1 B and
that associated with G ∗ is G∼ (s), discussed in Exercise A.52.

Remark A.4 In connection with Exercise A.53, note that, with z := y, u := −v is the
optimal control for problem (P) of section 2.1.1, with t0 = 0, tf = 1, L = C T C, and Q = 0.
Hence the optimal control can be automatically generated by the simple feedback loop.

ẋ(t) = Ax(t) + BB T p(t),

ṗ(t) = −AT p(t) + C T Cx(t),


Unfortunately, the initial costate p(0) is unknown, so the adjoint system cannot be integrated. Of course, because p(1) is known, p(0) can be pre-calculated, BUT it will have to be assumed that no disturbance or model errors affect that computation. I.e., the feedback-looking implementation would not enjoy the benefits provided by true feedback. (When p(1) is known, this “feedback” loop effectively integrates the Riccati equation!)

Theorem A.5 (Fundamental Theorem of linear algebra) Let V, W be inner product spaces,
let L ∈ L(V, W ), with adjoint L∗ . Then N (L) and N (L∗ ) are closed and


(a) N (L∗ ) = R(L)⊥ ; N (L) = R(L∗ )⊥ ;

(b) N (L∗ ) = N (LL∗ ); N (L) = N (L∗ L);

(c) cl(R(L)) = cl(R(LL∗ )), and if R(LL∗ ) is closed, then R(L) = R(LL∗ );

(d) (from (a) and (c):) if R(L) = R(LL∗), then V = R(L∗) ⊕ R(L∗)⊥ = R(L∗) ⊕ N (L).

Proof. Closedness of N (L) and N (L∗ ) follows from (a).

(a)

y ∈ N (L∗ ) ⇔ L∗ y = θV
⇔ hL∗ y, xi = 0 ∀x ∈ V
⇔ hy, Lxi = 0 ∀x ∈ V
⇔ y ∈ R(L)⊥ .

(We have used the fact that, if hL∗ y, xi = 0 ∀x ∈ V , then, in particular, hL∗ y, L∗ yi =
0, so that L∗ y = θV .)

(b)

y ∈ N (L∗ ) ⇔ L∗ y = θV ⇒ LL∗ y = θW ⇔ y ∈ N (LL∗ )

and

y ∈ N (LL∗ ) ⇔ LL∗ y = θW ⇒ hy, LL∗ yi = 0 ⇔ hL∗ y, L∗ yi = 0 ⇔ L∗ y = θV ⇔ y ∈ N (L∗ )

(c) R(L)⊥ = N (L∗ ) = N (LL∗ ) = R(LL∗ )⊥ . Thus R(L)⊥⊥ = R(LL∗ )⊥⊥ . The result
follows. Finally, if R(LL∗ ) is closed, then

R(LL∗ ) ⊆ R(L) ⊆ cl(R(L)) = cl(R(LL∗ )) = R(LL∗ ),

which implies that R(LL∗ ) = R(L) = cl R(L) = cl R(LL∗ ).

Now let L ∈ L(V, W ), with adjoint L∗ , and suppose that R(L) = R(LL∗ ). Let ŵ ∈
R(L) (= R(LL∗ )). Then there exists v̂ ∈ R(L∗ ) such that Lv̂ = ŵ. Further, v ∈ V satisfies
the equation Lv = ŵ if and only if v − v̂ ∈ N (L) (= R(L∗ )⊥ ). Hence, for any such v, v̂ is
the orthogonal projection of v (i.e., of the solution set of Lv = ŵ) on the subspace R(L∗ ),
which implies that, unless v = v̂, hv̂, v̂i < hv, vi. (Indeed, since hv − v̂, v̂i = 0,

hv, vi = hv̂ − (v̂ − v), v̂ − (v̂ − v)i = hv̂, v̂i + hv̂ − v, v̂ − vi > hv̂, v̂i .)

This leads to the solution of the linear least squares problem, stated next.


Theorem A.6 Let V, W be inner-product spaces and let L ∈ L(V, W), with adjoint L∗.
Suppose that R(L) = R(LL∗ ). Let w ∈ R(L). Then the problem

minimize hv, vi s.t. Lv = w (A.12)

has a unique minimizer v0 . Further v0 ∈ R(L∗ ).

Proof. Let v ∈ V . Since R(L) = R(LL∗ ), Lv = LL∗ ξ for some ξ; with v0 := L∗ ξ, we have
Lv0 = LL∗ ξ = Lv and hence v − v0 ∈ N (L) = R(L∗ )⊥ and v = v0 + (v − v0 ).

Moore-Penrose pseudo-inverse
Let L ∈ L(V, W ), V and W inner-product spaces, with adjoint L∗ . Suppose that R(L) =
R(LL∗ ). It follows from the above that L|R(L∗ ) : R(L∗ ) → R(L) is a bijection. Let L† |R(L) :
R(L) → R(L∗ ) denote its inverse. Further, define L† on N (L∗ ) by

L† w = θV ∀w ∈ N (L∗ ).

Exercise A.54 Suppose W = R(L) ⊕ N (L∗ ). (For instance, W is Hilbert and R(L) is
closed.) Prove that L† has a unique linear extension to W .

This extension is the Moore-Penrose pseudo-inverse (after Eliakim H. Moore, American


mathematician, 1862–1932; Roger Penrose, English mathematician, born 1931). L† is linear
and LL† restricted to R(L) is the identity in W , and L† L restricted to R(L∗ ) is the identity
in V ; i.e.,
LL† L = L and L† LL† = L† . (A.13)

Exercise A.55 Let L : Rn → Rm be a linear map with matrix representation ML (an m × n


matrix). We know that the restriction of L to R(L∗ ) is one-to-one, onto R(L). Thus there
is an inverse map from R(L) to R(L∗ ). The Moore-Penrose pseudo-inverse L† of L is a
linear map from Rm to Rn that agrees with the just mentioned inverse on R(L) and maps
to θ every point in N (L∗ ). Prove the following.
• Such L† is uniquely defined, i.e., for an arbitrary linear map L, there exists a linear map, unique among linear maps, that satisfies all the listed conditions.
• Let ML = UΣV T be the singular value decomposition of ML , and let k be such that the
nonzero entries of Σ are its (i, i) entries, i = 1, . . . , k. Then the matrix representation
of L† is given by ML† := V Σ† U T where Σ† (an n × m matrix) has its (i, i) entry,
i = 1, . . . , k equal to the inverse of that of Σ and all other entries equal to zero.
In particular,
• If L is one-to-one, then L∗ L is invertible and L† = (L∗ L)−1 L∗ .
• If L is onto, then LL∗ is invertible and L† = L∗ (LL∗ )−1 .
• If L is one-to-one and onto, then L is invertible and L† = L−1 .
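A numeric sketch (assuming NumPy) of the SVD construction above, on a hypothetical rank-deficient matrix—neither injective nor surjective—checked against NumPy's built-in pinv and against (A.13):

    import numpy as np

    ML = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0]])             # rank 1
    U, s, Vt = np.linalg.svd(ML)                 # ML = U Sigma V^T
    Sigma_dag = np.zeros((ML.shape[1], ML.shape[0]))
    k = int(np.sum(s > 1e-12))                   # number of nonzero singular values
    Sigma_dag[:k, :k] = np.diag(1.0 / s[:k])
    ML_dag = Vt.T @ Sigma_dag @ U.T              # V Sigma^dagger U^T

    print(np.allclose(ML_dag, np.linalg.pinv(ML)))   # True
    print(np.allclose(ML @ ML_dag @ ML, ML))         # L L-dagger L = L, cf. (A.13)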


[Figure A.1 appears here: V is drawn as R(L∗) ⊕ N(L) and W as R(L) ⊕ N(L∗); L maps R(L∗) onto R(L) (and N(L) to θ_W), while L∗ maps R(L) onto R(L∗) (and N(L∗) to θ_V).]

Figure A.1: Structure of a linear map



Appendix B

On Differentiability and Convexity

B.1 Differentiability
[?, ?, ?]
First, let f : R → R. We know that f is differentiable at x∗ if

lim_{t→0} (f(x∗ + t) − f(x∗))/t

exists, i.e., if there exists a ∈ R such that

(f(x∗ + t) − f(x∗))/t = a + ϕ(t)

with ϕ(t) → 0 as t → 0, i.e., if there exists a ∈ R such that

f(x∗ + t) = f(x∗) + at + o(t) ∀t ∈ R (B.1)

where o : R → R satisfies o(t)/t → 0 as t → 0. Scalar a is the derivative of f at x∗, often noted f′(x∗). Obviously, we have

f′(x∗) = lim_{t→0} (f(x∗ + t) − f(x∗))/t.
t→0 t
Equation (B.1) shows that f ′ (x∗ )t (= at) is a linear (in t) approximation to f (x∗ + t) −
f (x∗ ). Our first goal in this appendix is to generalize this to f : V → W , with V and W
more general (than R) vector spaces. In such context, according to a formula such as (B.1)
(with a := f ′ (x∗ )) for t ∈ V , f ′ (x∗ )t will lie in W . Hence it is natural to view f ′ (x∗ ) (value
of the derivative of f at a given point of domain V ) as a linear map from V to W .
While it is typical to write the derivative of f at a point x∗ as ∂f/∂x(x∗), in these notes
we leave out the (meaningless) “x” in the denominator and write instead Df(p) for the
derivative of f at p ∈ V (e.g., Df(x∗) is the derivative of f at x∗), such notation being
simpler and in some instances (e.g., when invoking the chain rule) clearer than ∂f/∂x(x∗).

As a next step we consider the case of f : Rn → Rm.


Definition B.1 The function f : Rn → Rm has partial derivatives at x∗ if there exist
aij ∈ R, for i = 1, . . . , m, j = 1, . . . , n, such that

fi(x∗ + tej) = fi(x∗) + aij t + oij(t) ∀t ∈ R

with oij(t)/t → 0 as t → 0, for i = 1, . . . , m, j = 1, . . . , n.

We will note the “partial derivatives” aij of f at x∗ by Dj fi (x∗ ). Note that existence of all
partial derivatives at a point does not imply continuity at that point, as seen in the next
example.
Example B.1 Let f : R2 → R be defined by

f(x, y) = 0 if xy = 0;  f(x, y) = 1 elsewhere.

Then f has partial derivatives at (0, 0), yet it is not continuous at (0, 0).

Hence, existence of partial derivatives at x∗ does not imply continuity at x∗ . Also, the
notion of partial derivative does not readily extend to functions whose domain is infinite-
dimensional. For both of these reasons, we next consider notions of differentiability which,
while being more restrictive than mere existence of partial derivatives, readily generalize to
more general domains (and codomains) for which partial derivatives can’t even be defined.
Before doing so, we note the following fact, which applies when f has finite-dimensional
domain and co-domain.
Fact. (e.g., [?, Theorem 13.20]) Let f : Rn → Rm, and let Ω ⊆ Rn be an open set. If the
partial derivatives of (the components of) f exist and are continuous throughout Ω, then f
is continuous on Ω.
We now consider f : V → W , where V and W are vector spaces, and W is equipped
with a norm.
Definition B.2 f is 1-sided (2-sided) directionally differentiable at x∗ ∈ V if for all h ∈ V
there exists ah ∈ W such that

f (x∗ + th) = f (x∗ ) + tah + oh (t) ∀t ∈ R (B.2)

with

(1/t)‖oh(t)‖W → 0 as t → 0 (for any given h) (2-sided)
(1/t)‖oh(t)‖W → 0 as t ↓ 0 (for any given h) (1-sided)
ah is the directional derivative of f at x∗ in direction h, often denoted f ′ (x∗ ; h).

Definition B.3 f is Gâteaux– (or G–) differentiable at x∗ if there exists a linear map A ∈
L(V, W ) such that

f (x∗ + th) = f (x∗ ) + tAh + oh (t) ∀h ∈ V ∀t ∈ R (B.3)


with, for fixed h ∈ V,

(1/t)oh(t) → θ as t → 0.
Linear map A is termed G-derivative of f at x∗ and will be denoted Df (x∗ ).

(René E. Gâteaux, French mathematician, 1889-1914.)


Note: The term G-differentiability is also used in the literature to refer to other concepts
of differentiability. Here we follow the terminology used in [?].
It is readily checked that f is G-differentiable at x∗ if and only if (i) it is 2-sided direc-
tionally differentiable at x∗ in all directions and (ii) f ′ (x∗ ; h) is linear in h. Indeed, in such
case, the directional derivative in direction h is the image of h under the mapping defined
by the G-derivative, i.e.,
f ′ (x∗ ; h) = Df (x∗ )h ∀h ∈ V.
G-differentiability at x∗ still does not imply continuity at x∗ though, even when V and W
are finite-dimensional! See Exercise B.5 below.
Suppose now that V is also equipped with a norm.

Definition B.4 f is Fréchet– (or F–) differentiable at x∗ if there exists a continuous linear
map A ∈ B(V, W ) such that

f (x∗ + h) = f (x∗ ) + Ah + o(h) ∀h ∈ V (B.4)

with

‖o(h)‖W /‖h‖V → 0 as h → 0V.
Linear map A is termed F-derivative of f at x∗ and denoted Df (x∗ ) in these notes.

In other words, f is F-differentiable at x∗ if there exists a continuous linear map A : V → W
such that

(1/‖h‖)(f(x∗ + h) − f(x∗) − Ah) → 0W as h → 0V.

With A written Df (x∗ ), we can write (B.4) as

f (x∗ + h) = f (x∗ ) + Df (x∗ )h + o(h)

whenever f : V → W is F-differentiable at x∗ .
If f : Rn → Rm is F-differentiable at x∗ then, given bases for Rn and Rm , Df (x∗ ) can
be represented by a matrix (as any linear map from Rn to Rm ).

Exercise B.1 Prove that, if f : Rn → Rm is F-differentiable at x∗ then, given bases for
Rn and Rm, the entries of the matrix representation of Df(x∗) are the partial derivatives
Dj fi(x∗).
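
This correspondence lends itself to a simple numerical sanity check via central finite differences; in the sketch below, the function f and the step eps are illustrative choices only:

    import numpy as np

    def f(x):                                  # sample map from R^2 to R^2
        return np.array([x[0]**2 * x[1], np.sin(x[1])])

    def jacobian_fd(f, x, eps=1e-6):
        m, n = f(x).size, x.size
        J = np.zeros((m, n))
        for j in range(n):
            e = np.zeros(n); e[j] = 1.0
            J[:, j] = (f(x + eps*e) - f(x - eps*e)) / (2*eps)  # approx D_j f_i(x)
        return J

    x_star = np.array([1.0, 2.0])
    J_exact = np.array([[2*x_star[0]*x_star[1], x_star[0]**2],
                        [0.0, np.cos(x_star[1])]])             # hand-computed
    print(np.allclose(jacobian_fd(f, x_star), J_exact, atol=1e-6))  # True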


It should be clear that, if f is F-differentiable at x with F-derivative Df(x), then (i) it
is G-differentiable at x with G-derivative Df(x), (ii) it is 2-sided directionally differentiable
at x in all directions, with directional derivative in direction h given by Df(x)h, and (iii) if
f : Rn → Rm, then f has partial derivatives at x, given by the entries of the matrix
representation of Df(x).

The difference between Gâteaux and Fréchet is that, for the latter, ‖o(h)‖W /‖h‖V must
tend to zero no matter how h goes to θ whereas, for the former, convergence is along straight
lines th, with fixed h. Further, F-differentiability at x requires that Df(x) ∈ B(V, W) (bounded
linear map). Clearly, every function that is Fréchet-differentiable at x is continuous at x (why?).
The following exercises taken from [?] show that each definition is “strictly stronger”
than the previous one.

Exercise B.2 (proven in [?])

1. If f is Gâteaux-differentiable, its Gâteaux derivative is unique.

2. If f is Fréchet differentiable, it is also Gâteaux differentiable and its Fréchet derivative
is given by its (unique) Gâteaux derivative.

Exercise B.3 Define f : R2 → R by f(x) = x1 if x2 = 0, f(x) = x2 if x1 = 0, and
f(x) = 1 otherwise. Show that the partial derivatives D1f(0) and D2f(0) exist, but that f
is not directionally differentiable at (0, 0).

Exercise B.4 Define f : R2 → R by

f (x) = sgn(x2 ) min(|x1 |, |x2 |).

Show that, for any h ∈ R2 ,

lim_{t→0} (1/t)[f(th) − f(0, 0)] = f(h),

and thus that f is 2-sided directionally differentiable at (0, 0), but that f is not G-differentiable
at (0, 0).

Exercise B.5 Define f : R2 → R by

f(x) = 0 if x1 = 0, and f(x) = 2x2 exp(−1/x1²) / (x2² + exp(−2/x1²)) if x1 ≠ 0.

Show that f is G-differentiable at (0, 0), but that f is not continuous (and thus not F-
differentiable) at (0, 0).

Remark B.1 For f : Rm → Rn, again from the equivalence of norms, F-differentiability
does not depend on the particular norm.


Gradient of a differentiable functional over a Hilbert space


Let f : H → R where H is a Hilbert space and suppose f is Fréchet-differentiable at x,
so that Df(x) ∈ B(H, R). Then, in view of the Riesz-Fréchet theorem (Theorem A.4), there
exists a unique g ∈ H such that

Df(x)h = ⟨g, h⟩ ∀h ∈ H.

Such g is called the gradient of f at x, and denoted grad f(x), i.e.,

⟨grad f(x), h⟩ = Df(x)h ∀h ∈ H.

When H = Rn and ⟨·, ·⟩ is the Euclidean inner product ⟨x, y⟩ = xT y, we will often denote
the gradient of f at x by ∇f(x).

Exercise B.6 Note that the gradient depends on the inner product to which it is associated.
For example, suppose f : Rn → R. Let S be a symmetric positive definite n × n matrix, and
define ⟨x, y⟩ = xT Sy. Prove that grad f(x) = S−1Df(x)T. In particular, under the Euclidean
inner product, ∇f(x) = Df(x)T.
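
The claim of Exercise B.6 can be spot-checked numerically (this is no substitute for a proof; the function f, matrix S, and point x below are arbitrary sample choices):

    import numpy as np

    S = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                # symmetric positive definite
    x = np.array([1.0, -1.0])

    def f(x):                                 # sample functional on R^2
        return x[0]**2 + 3*x[0]*x[1]

    Df = np.array([2*x[0] + 3*x[1], 3*x[0]])  # row vector representing Df(x)
    grad_S = np.linalg.solve(S, Df)           # S^{-1} Df(x)^T

    # Defining property of the gradient: <grad f(x), h>_S = Df(x) h for all h
    for h in np.eye(2):
        assert np.isclose(grad_S @ S @ h, Df @ h)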

In the sequel, unless otherwise specified, “differentiable” will mean “Fréchet-differentiable”.


An important property, which may not hold if f is merely Gâteaux differentiable, is given
by the following fact.
Fact [?]. If φ : X → Y and θ : Y → Z are F-differentiable, respectively at x∗ ∈ X and at
φ(x∗) ∈ Y, then h : X → Z defined by h(x) = θ(φ(x)) is F-differentiable at x∗ and the
following chain rule applies:

Dh(x∗) = Dθ(φ(x∗))Dφ(x∗).

Exercise B.7 Prove this fact using the notation used in these notes.

Exercise B.8 Express the “total” derivative of θ(φ(x), ψ(x)) with respect to x, with θ, φ,
and ψ F-differentiable maps between appropriate spaces.

Exercise B.9 Let Q be an n × n (not necessarily symmetric) matrix and let b ∈ Rn. Let
f(x) = (1/2)⟨x, Qx⟩ + ⟨b, x⟩, where ⟨x, y⟩ = xT Sy, with S = ST > 0. Show that f is
F-differentiable and obtain its gradient with respect to the same inner product.

Remark B.2

1. We will say that a function is differentiable (in any of the previous senses) if it is
differentiable everywhere.

2. When x is allowed to move, Df can be viewed as a function of x, whose values are
bounded linear maps:


Df : V → B(V, W), x ↦ Df(x).

First order “exact” expansions


Mean Value Theorem (e.g., [?]) Let φ : R → R and suppose φ is continuous on [a, b] ⊂ R
and differentiable on (a, b). Then, we know that there exists ξ ∈ (a, b) such that

φ(b) − φ(a) = φ′ (ξ)(b − a) (B.5)

i.e.,
φ(a + h) − φ(a) = φ′ (a + th)h, for some t ∈ (0, 1) (B.6)

We have the following immediate consequence for functionals.


Fact. Suppose f : V → R is differentiable. Then for any x, h ∈ V there exists t ∈ (0, 1)
such that

f (x + h) − f (x) = Df (x + th)h (B.7)

Proof. Consider φ : R → R defined by φ(s) := f(x + sh). By the result above there exists
t ∈ (0, 1) such that

f(x + h) − f(x) = φ(1) − φ(0) = φ′(t) = Df(x + th)h, (B.8)

where we have applied the chain rule for F-derivatives.

It is important to note that this result is generally not valid for f : V → Rm, m > 1, because
different components of f will generally correspond to different values of t. For this reason, we
will often make use, as a substitute, of the fundamental theorem of integral calculus, which
requires continuous differentiability (though the weaker condition of “absolute continuity”,
which implies existence of the derivative almost everywhere, is sufficient):

Definition B.5 f : V → W is said to be continuously Fréchet-differentiable if it is Fréchet-
differentiable and its Fréchet derivative Df is a continuous function from V to B(V, W).

The following fact strengthens the result we quoted earlier that, when f maps Rn to Rm ,
continuity of its partial derivatives implies its own continuity.
Fact. (E.g., [?], Theorem 2.5.) Let f : Rn → Rm, and let Ω be an open subset of Rn. If the
partial derivatives of (the components of) f exist on Ω and are continuous at x̂ ∈ Ω, then f
is (Fréchet) differentiable at x̂.
Now, if φ : R → R is continuously differentiable on [a, b] ⊂ R, then the fundamental
theorem of integral calculus asserts that

φ(b) − φ(a) = ∫_a^b φ′(ξ) dξ. (B.9)


Now, for a continuously differentiable g : R → Rm, we define the integral of g componentwise:
∫_a^b g(t) dt is the vector in Rm whose ith component is ∫_a^b g^i(t) dt, i = 1, . . . , m. (B.10)

For f : V → Rm , we then obtain

Theorem B.1 (Fundamental Theorem of integral calculus) If f : V → Rm is continuously
(Fréchet) differentiable, then for all x, h ∈ V,

f(x + h) − f(x) = ∫_0^1 Df(x + th)h dt.

Proof. Define φ : R → Rm by φ(s) = f(x + sh). Apply (B.9) componentwise, together with
(B.10) and the chain rule.

Note. For f : V → W , W a Banach space, the same result holds, with a suitable definition
of the integral (of “integral-regulated” functions, see [?]).
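
Theorem B.1 is easy to verify by numerical quadrature; in the sketch below, the map f, its hand-computed Jacobian Df, and the grid size are illustrative choices only:

    import numpy as np

    def f(x):
        return np.array([np.exp(x[0]) * x[1], x[0]**2 + x[1]**2])

    def Df(x):                                 # Jacobian matrix of f
        return np.array([[np.exp(x[0]) * x[1], np.exp(x[0])],
                         [2*x[0],              2*x[1]]])

    x = np.array([0.3, -0.7]); h = np.array([1.1, 0.4])
    ts = np.linspace(0.0, 1.0, 2001)
    integrand = np.array([Df(x + t*h) @ h for t in ts])
    lhs = f(x + h) - f(x)
    rhs = np.trapz(integrand, ts, axis=0)      # componentwise integral over [0, 1]
    print(np.allclose(lhs, rhs, atol=1e-6))    # True, up to quadrature error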

Corollary B.1 Let V be a normed vector space. If f : V → Rn is continuously differentiable,
then for all x ∈ V,

f(x + h) = f(x) + O(‖h‖),

in the sense that f is locally Lipschitz; i.e., given any x∗ ∈ V there exist ρ > 0, δ > 0, and
β > 0 such that for all h ∈ V with ‖h‖ ≤ δ, and all x ∈ B(x∗, ρ),

‖f(x + h) − f(x)‖ ≤ β‖h‖. (B.11)

Further, if V is finite-dimensional, then given any ρ > 0, and any r > 0, there exists β > 0
such that (B.11) holds for all x ∈ B(x∗, ρ) and all h ∈ V with ‖h‖ ≤ r.

Exercise B.10 Prove Corollary B.1.

A stronger and more general version of the second statement in Corollary B.1 is as follows.
(See, e.g., [?], Theorem 2.3.)
Fact. Let V and W be normed vector spaces, and let B be an open ball in V. Let f : V → W
be differentiable on B, and suppose ‖Df(x)‖ is bounded on B. Then there exists β > 0 such
that, for all x ∈ B, h ∈ V such that x + h ∈ B,

‖f(x + h) − f(x)‖ ≤ β‖h‖.

(In this version, B need not be “small”.)

Second derivatives


Definition B.6 Suppose f : V → W is differentiable on V and use the induced norm for
B(V, W). If

Df : V → B(V, W), x ↦ Df(x)

is itself differentiable, then f is twice differentiable and the derivative of Df at x ∈ V is
noted D²f(x) and is called the second derivative of f at x. Thus

D²f(x) : V → B(V, W) (B.12)

and

D²f : V → B(V, B(V, W)), x ↦ D²f(x).

Fact. If f : V → W is twice continuously Fréchet-differentiable then its second derivative
is symmetric in the sense that, for all u, v ∈ V and x ∈ V,

(D²f(x)u)v = (D²f(x)v)u (∈ W). (B.13)

[Note: the reader may want to resist the temptation of viewing D²f(x) as a “cube” matrix.
It is simpler to think about it as an abstract linear map.]
Now let f : H → R, where H is a Hilbert space, be twice differentiable. Then, in view
of the Riesz-Fréchet theorem, B(H, R) is isomorphic to H, and in view of (B.12) D²f(x)
can be thought of as a bounded linear map Hessf(x) : H → H. This can be made precise
as follows. For any x ∈ H, Df(x) : H → R is a bounded linear functional and, for any
u ∈ H, D²f(x)u : H → R is also a bounded linear functional. Thus, in view of the Riesz-
Fréchet representation theorem, there exists a unique ψu ∈ H such that

(D²f(x)u)v = ⟨ψu, v⟩ ∀v ∈ H.

For given x ∈ H, the map from u ∈ H to ψu ∈ H is linear and bounded (why?). Let
us denote this map by Hessf(x) ∈ B(H, H) (after L. Otto Hesse, German mathematician,
1811–1874), i.e., ψu = Hessf(x)u. We get

(D²f(x)u)v = ⟨Hessf(x)u, v⟩ ∀x, u, v ∈ H.

In view of (B.13), if f is twice continuously differentiable, Hessf(x) is self-adjoint. If H = Rn
and ⟨·, ·⟩ is the Euclidean inner product, then Hessf(x) is represented by an n × n symmetric
matrix, which we will denote by ∇²f(x). In the sequel though, following standard usage, we
will often abuse notation and use ∇²f and D²f interchangeably.
Fact. If f : V → W is twice F-differentiable at x ∈ V then

f(x + h) = f(x) + Df(x)h + (1/2)(D²f(x)h)h + o2(h) (B.14)

with

‖o2(h)‖/‖h‖² → 0 as h → θ.


Exercise B.11 Prove the above in the case W = Rn . Hint: first use Theorem B.1.

Finally, a second order integral expansion.


Theorem B.2 If f : V → Rm is twice continuously differentiable, then for all x, h ∈ V,

f(x + h) = f(x) + Df(x)h + ∫_0^1 (1 − t)(D²f(x + th)h)h dt. (B.15)

(This generalizes the relation, valid for φ : R → R twice continuously differentiable,

φ(1) − φ(0) − φ′(0) = ∫_0^1 (1 − t)φ″(t) dt.

Check it by integrating by parts. Let φ(s) = fi(x + sh), i = 1, . . . , m, to prove the theorem.)

Corollary B.2 Let V be a normed vector space. Suppose f : V → Rn is twice continuously
differentiable. Then given any x∗ ∈ V there exist ρ > 0, δ > 0, and β > 0 such that for all
h ∈ V with ‖h‖ ≤ δ, and all x ∈ B(x∗, ρ),

‖f(x + h) − f(x) − Df(x)h‖ ≤ β‖h‖². (B.16)

Further, if V is finite-dimensional, then given any ρ > 0, and any r > 0, there exists β > 0
such that (B.16) holds for all x ∈ B(x∗, ρ) and all h ∈ V with ‖h‖ ≤ r.

Again, a more general version of this result holds. (See, e.g., [?], Theorem 4.8.)
Fact. Let V and W be normed vector spaces, and let B be an open ball in V. Let
f : V → W be twice differentiable on B, and suppose ‖D²f(x)‖ is bounded on B. Then there
exists β > 0 such that, for all x ∈ B, h ∈ V such that x + h ∈ B,

‖f(x + h) − f(x) − Df(x)h‖ ≤ β‖h‖².

[Again, in this version, B need not be “small”.]

B.2 Some elements of convex analysis


[?]
Let V be a vector space.

Definition B.7 A set S ⊆ V is said to be convex if, ∀x, y ∈ S, ∀λ ∈ (0, 1),

λx + (1 − λ)y ∈ S. (B.17)

Further, a function f : V → R is convex on the convex set S ⊆ V if ∀x, y ∈ S, ∀λ ∈ (0, 1),

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y), (B.18)

i.e., if the arc lies below the chord (see Figure B.1).


However, this definition may fail (in the sense of the right-hand side of (B.18) not being well
defined) when f is allowed to take on values of both −∞ and +∞, which is typical in convex
analysis. (The definition is valid if f is “proper”: see below.) A more general definition is
as follows.

Definition B.8 A function f : S → R∪{±∞} is convex on convex set S ⊆ V if its epigraph

epi f := {(x, z) ∈ S × R : z ≥ f (x)}

is a convex set.

The (effective) domain dom f of a convex function f : S → R ∪ {±∞} is the set of points
x ∈ S such that f(x) < ∞; f is proper if (i) its domain is non-empty and (ii) for all x ∈ S,
f(x) ≠ −∞.
Fact. Let S be a convex open subset of a finite-dimensional vector space. If f : S → R is
convex on S, then it is continuous on S. More generally, if f : S → R ∪ {±∞} is convex
and proper, then it is continuous over the relative interior of its domain (in particular, over
the interior of its domain). See, e.g., [?], section 1.4 and [?].

Definition B.9 Let V be a normed vector space. Then S ⊆ V is strictly convex if, ∀x, y ∈
S, x ≠ y, ∀λ ∈ (0, 1),

λx + (1 − λ)y ∈ int(S),

where int(S) denotes the interior of S. A function f : S → R is said to be strictly convex
on the convex set S ⊆ V if its epigraph is strictly convex.

Definition B.10 A convex combination of k points x1, . . . , xk ∈ V is any point x expressible
as

x = Σ_{i=1}^{k} λi xi (B.19)

with λi ≥ 0 for i = 1, . . . , k, and Σ_{i=1}^{k} λi = 1.

Exercise B.12 Show that a set S is convex if and only if it contains the convex combinations
of all its finite subsets. Hint: use induction.

Fact 1 [?] (Jensen’s inequality). A function f : V → R is convex on the convex set S ⊂ V
if and only if for any finite set of points x1, . . . , xk ∈ S and any λi ≥ 0, i = 1, . . . , k, with
Σ_{i=1}^{k} λi = 1, one has

f( Σ_{i=1}^{k} λi xi ) ≤ Σ_{i=1}^{k} λi f(xi).
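
A quick numerical illustration of Jensen's inequality with the convex function exp on R (the points and weights below are arbitrary choices):

    import numpy as np

    f = np.exp
    x = np.array([-1.0, 0.5, 2.0])
    lam = np.array([0.2, 0.5, 0.3])            # nonnegative weights summing to one

    lhs = f(np.dot(lam, x))                    # f(sum_i lam_i x_i)
    rhs = np.dot(lam, f(x))                    # sum_i lam_i f(x_i)
    print(lhs <= rhs)                          # True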


Figure B.1: Convexity: over [x, y], the graph of f (the arc) lies below the chord joining
(x, f(x)) and (y, f(y)); at λx + (1 − λ)y the chord takes the value λf(x) + (1 − λ)f(y).

Exercise B.13 Prove the above. (Hint: Use Exercise B.12 and mathematical induction.)

Convex Hull

Definition B.11 The convex hull of a set S, denoted coS, is the smallest convex set con-
taining S. The following exercise shows that it makes sense to talk about the “smallest” such
set.

Exercise B.14 Show that ∩{Y : Y convex, S ⊆ Y } is convex and contains S. Since it is
contained in every convex set containing S, it is the ‘smallest’ such set.

Exercise B.15 Prove that

coX = ∪_{k=1}^{∞} { Σ_{i=1}^{k} λi xi : xi ∈ X, λi ≥ 0 ∀i, Σ_{i=1}^{k} λi = 1 }.

In other words, coX is the set of all convex combinations of finite subsets of X. In particular,
if X is finite, X = {x1, . . . , xℓ}, then

coX = { Σ_{i=1}^{ℓ} λi xi : λi ≥ 0 ∀i, Σ_{i=1}^{ℓ} λi = 1 }.

(Hint: To prove ⊆, show that X ⊆ RHS (right-hand side) and that RHS is convex. To
prove ⊇, use mathematical induction.)


Exercise B.16 (due to Constantin Carathéodory, Greek-born mathematician, 1873–1950;
e.g., [?]). Show that, in the above exercise, for X ⊂ Rn, it is enough to consider convex
combinations of n + 1 points, i.e.,

coX = { Σ_{i=1}^{n+1} λi xi : λi ≥ 0, xi ∈ X, Σ_{i=1}^{n+1} λi = 1 },

and show by example (say, in R2 ) that n points is generally not enough.

Exercise B.17 Prove that the convex hull of a compact subset of Rn is compact and that
the closure of a convex set is convex. Show by example that the convex hull of a closed subset
of Rn need not be closed.

Suppose now V is a normed space.


Proposition B.1 Suppose that f : V → R is differentiable on an open convex subset S of
a normed vector space V . Then f is convex on S if and only if, for all x, y ∈ S

f (y) ≥ f (x) + Df (x)(y − x). (B.20)

Proof. (only if) (see Figure B.2). Suppose f is convex. Then, ∀x, y ∈ S, λ ∈ [0, 1],

Figure B.2: Illustration of the “only if” part of the proof of Proposition B.1: for λ ∈ (0, 1],
the secant slope (f(x + λ(y − x)) − f(x))/λ is bounded above by f(y) − f(x).

f(x + λ(y − x)) = f(λy + (1 − λ)x) ≤ λf(y) + (1 − λ)f(x) = f(x) + λ(f(y) − f(x)), (B.21)

i.e.,

f(y) − f(x) ≥ [f(x + λ(y − x)) − f(x)]/λ ∀λ ∈ (0, 1], ∀x, y ∈ S, (B.22)

and, letting λ ↓ 0, since f is differentiable,

f(y) − f(x) ≥ Df(x)(y − x) ∀x, y ∈ S. (B.23)

(if) (see Figure B.3). Suppose (B.20) holds ∀x, y ∈ S. Then, for given x, y ∈ S and
z = λx + (1 − λ)y,


f (x) ≥ f (z) + Df (z)(x − z) (B.24)

f (y) ≥ f (z) + Df (z)(y − z), (B.25)


and λ×(B.24) +(1 − λ)×(B.25) yields
λf (x) + (1 − λ)f (y) ≥ f (z) + Df (z)(λx + (1 − λ)y − z)
= f (λx + (1 − λ)y)
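
Inequality (B.20) can be spot-checked numerically for a function known to be convex; the sketch below uses the log-sum-exp function f(x) = log(e^{x1} + e^{x2}) (convex on R2) and randomly sampled points (all choices here are illustrative):

    import numpy as np

    def f(x):
        return np.log(np.exp(x[0]) + np.exp(x[1]))

    def Df(x):                                 # gradient (row) of log-sum-exp
        e = np.exp(x)
        return e / e.sum()

    rng = np.random.default_rng(1)
    ok = True
    for _ in range(1000):
        x, y = rng.standard_normal(2), rng.standard_normal(2)
        ok &= f(y) >= f(x) + Df(x) @ (y - x) - 1e-12   # allow roundoff slack
    print(ok)                                  # True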

Fact. Moreover, f is strictly convex if and only if inequality (B.20) is strict whenever x ≠ y
(see [?]).
Definition B.12 f : V → R, V a normed space, is said to be strongly convex over a convex
set S if f is differentiable on S and there exists m > 0 s.t. ∀x, y ∈ S,

f(y) ≥ f(x) + Df(x)(y − x) + (m/2)‖y − x‖².
Proposition B.2 If f : V → R is strongly convex, it is strictly convex and, for any x0 ∈ V ,
the sub-level set
{x : f (x) ≤ f (x0 )}
is bounded.

Proof. The first claim follows from Definition B.12 and the fact preceding it. Now, let h be
an arbitrary vector in V. Then

f(x0 + h) ≥ f(x0) + Df(x0)h + (m/2)‖h‖²
         ≥ f(x0) − ‖Df(x0)‖ ‖h‖ + (m/2)‖h‖²
         = f(x0) + ((m/2)‖h‖ − ‖Df(x0)‖) ‖h‖
         > f(x0) whenever ‖h‖ > (2/m)‖Df(x0)‖.

Hence {x : f(x) ≤ f(x0)} ⊆ B̄(x0, (2/m)‖Df(x0)‖), which is a bounded set.

Interpretation. The function f lies above the function

f̃(x) = f(x0) + Df(x0)(x − x0) + (m/2)‖x − x0‖²,

which grows without bound uniformly in all directions.

Figure B.3: Illustration of the “if” part of the proof of Proposition B.1: if A lies above A1
and B lies above B1, then C lies above C1.


Proposition B.3 Suppose that f : V → R is twice continuously differentiable (so that its
Hessian is symmetric). Then f is convex on V if and only if its Hessian D²f(x) is positive
semi-definite for all x ∈ V.

Proof. We prove sufficiency. By Theorem B.2, for all x, h ∈ V we can write

f(x + h) = f(x) + Df(x)h + ∫_0^1 (1 − t)(D²f(x + th)h)h dt ≥ f(x) + Df(x)h,

since the integrand is nonnegative. Thus, with y := x + h, f(y) ≥ f(x) + Df(x)(y − x) for
all x, y ∈ V and, from Proposition B.1, f is convex.

Exercise B.18 Prove the necessity part of Proposition B.3.

Exercise B.19 Show that if f is twice continuously differentiable and D2 f is positive definite
on V , then f is strictly convex. Show by example that the converse does not hold in general.

Exercise B.20 Suppose f : V → R is twice continuously differentiable. Then f is strongly
convex if and only if there exists m > 0 such that for all x, h ∈ V,

(D²f(x)h)h ≥ m‖h‖². (B.26)

Exercise B.21 Let V = Rn. Show that (B.26) holds if and only if the eigenvalues of
∇²f(x) (they are real, why?) are all positive and bounded away from zero, i.e., there exists
m > 0 such that for all x ∈ Rn all eigenvalues of ∇²f(x) are larger than m.
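
As an illustration of this criterion, for a quadratic f(x) = (1/2)xT Qx with Q symmetric the Hessian is Q at every x, so strong convexity reduces to a single eigenvalue computation (the matrix Q below is an arbitrary sample):

    import numpy as np

    Q = np.array([[4.0, 1.0],
                  [1.0, 3.0]])                 # symmetric positive definite

    m = np.linalg.eigvalsh(Q).min()            # eigenvalues are real: Q symmetric
    print(m > 0)                               # True: hT Q h >= m ||h||^2 for all h,
                                               # so f(x) = 0.5 xT Q x is strongly convex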

Exercise B.22 Exhibit a function f : R → R, twice continuously differentiable with Hessian
everywhere positive definite, which is not strongly convex.

Exercise B.23 Exhibit a function f : R2 → R such that for all x, y ∈ R, f(x, ·) : R → R
and f(·, y) : R → R are strongly convex but f is not even convex. Hint: consider the Hessian
matrix.

Separation of Convex Sets [?, ?]


Here we consider subsets of a Hilbert space H.

Definition B.13 Let a ∈ H, a ≠ 0, and α ∈ R. The set

P(a, α) := {x ∈ H : ⟨a, x⟩ = α}

is called a hyperplane. Thus, hyperplanes are level sets of linear functionals.


Figure B.4: A hyperplane in R2: the set of points of the form ξa + v, with ξ ∈ R fixed and
v ⊥ a.

Let us check, for n = 2, that this corresponds to the intuitive notion we have of a hyperplane,
i.e., a straight line in this case. Such a straight line can be expressed, for a given scalar ξ and
vector a orthogonal to the straight line, as the set of all x of the form x = ξa + v for some
v ⊥ a (see Figure B.4). Hence

⟨a, x⟩ = ξ‖a‖² + ⟨a, v⟩ = ξ‖a‖² for all such x,

which corresponds to the above definition with α = ξ‖a‖².


{x : ha, xi ≤ α} and {x : ha, xi ≥ α} are closed half spaces;
{x : ha, xi < α} and {x : ha, xi > α} are open half spaces.

Definition B.14 Let X, Y ⊂ H. X and Y are (strictly) separated by P(a, α) if

⟨a, x⟩ ≥ (>) α ∀x ∈ X
⟨a, y⟩ ≤ (<) α ∀y ∈ Y.

This means that X is entirely in one of the half spaces and Y entirely in the other. Examples
are given in Figure B.5.

Figure B.5: X and Y are separated by H and strictly separated by H′. X′ and Y′ are not
separated by any hyperplane (i.e., cannot be separated).

Note. X and Y can be separated without being disjoint. Any hyperplane is even separated
from itself (by itself!)
The idea of separability is strongly related to that of convexity. For instance, it can be shown
that two disjoint convex sets can be separated. We will present here only one of the numerous
separation theorems; this theorem will be used in connection with constrained optimization.


Figure B.6: Strict separation of a closed convex set X from the origin: x̂ is the point of X
of minimum norm; H1 = {x : ⟨x, x̂⟩ = ‖x̂‖²} supports X at x̂, and H2 = {x : ⟨x, x̂⟩ = ‖x̂‖²/2}
strictly separates X from θ.

Theorem B.3 Suppose X ⊂ H is nonempty, closed and convex, and suppose that 0 ∉ X.
Then there exist a ∈ H, a ≠ 0, and α > 0 such that

⟨a, x⟩ > α ∀x ∈ X (B.27)

(i.e., X and {0} are strictly separated by P(a, α)).


Proof (see Figure B.6). We first prove the result for the case H = Rn. Let ‖ · ‖ denote the
norm derived from the underlying inner product. We first show that there exists x̂ ∈ X such
that

‖x̂‖ ≤ ‖x‖ ∀x ∈ X. (B.28)

Choose ρ > 0 such that B̄(0, ρ) ∩ X ≠ ∅. B̄(0, ρ) ∩ X is bounded and closed (intersection of
closed sets), hence compact. Since ‖ · ‖ is continuous, there exists x̂ ∈ X such that

‖x̂‖ ≤ ‖x‖ ∀x ∈ B̄(0, ρ) ∩ X, (B.29)

hence (B.28) holds since ‖x̂‖ ≤ ρ and ‖x‖ > ρ for all x ∉ B̄(0, ρ). We now show that
x̂ is normal to a hyperplane that strictly separates X from the origin. We first show that
P(x̂, ‖x̂‖²), which contains x̂, (non-strictly) separates X from θ. Clearly, ⟨θ, x̂⟩ < ‖x̂‖². We
show by contradiction that

⟨x, x̂⟩ ≥ ‖x̂‖² ∀x ∈ X, (B.30)

proving the separation. Thus suppose there exists x ∈ X such that

⟨x, x̂⟩ = ‖x̂‖² − ε, (B.31)

where ε > 0. Since X is convex,

xλ := λx + (1 − λ)x̂ = x̂ + λ(x − x̂) ∈ X ∀λ ∈ [0, 1].

Further,

‖xλ‖² = ‖x̂‖² + λ²‖x − x̂‖² + 2λ(⟨x̂, x⟩ − ‖x̂‖²) ∀λ ∈ [0, 1],

i.e., using (B.31),

‖xλ‖² = ‖x̂‖² + λ²‖x − x̂‖² − 2λε,


so that

‖xλ‖² < ‖x̂‖² (B.32)

holds for λ > 0 small enough (since ε > 0), which contradicts (B.28). Hence, (B.30) must
hold. Since x̂ ≠ θ, it follows that, with α := ‖x̂‖²/2,

⟨θ, x̂⟩ = 0 < α < ⟨x, x̂⟩ ∀x ∈ X,

concluding the proof for the case H = Rn .


Note that this proof in fact applies to the case of a general Hilbert space, except for the
use of a compactness argument to prove (B.28): in general, B̄(0, ρ) may not be compact.
We borrow the general proof from [?, Section 3.12]. Thus let δ := inf_{x∈X} ‖x‖, so that

‖x‖ ≥ δ ∀x ∈ X,

and let {xi} ⊂ X be such that

‖xi‖ → δ as i → ∞. (B.33)

Since X is convex, (xi + xj)/2 also belongs to X, so that ‖(xi + xj)/2‖ ≥ δ, implying

‖xi + xj‖² ≥ 4δ². (B.34)

Now the parallelogram law gives

‖xi − xj‖² + ‖xi + xj‖² = 2(‖xi‖² + ‖xj‖²),

i.e., in view of (B.33) and (B.34),

‖xi − xj‖² ≤ 2(‖xi‖² + ‖xj‖²) − 4δ² → 0 as i, j → ∞.

The sequence {xi} is Cauchy, hence (since H is complete) convergent, to some x̂ ∈ X since
X is closed. From continuity of the norm, we conclude that ‖x̂‖ = δ, concluding the proof.
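
A concrete finite-dimensional instance of Theorem B.3 (all data below are arbitrary sample choices): take X to be a closed disk in R2 not containing the origin; then the minimum-norm point x̂ of X is normal to a strictly separating hyperplane, as in the proof above:

    import numpy as np

    c, r = np.array([3.0, 4.0]), 2.0           # center and radius; ||c|| = 5 > r
    x_hat = c - r * c / np.linalg.norm(c)      # minimum-norm point of X
    alpha = 0.5 * np.dot(x_hat, x_hat)

    rng = np.random.default_rng(2)
    for _ in range(1000):                      # sample points of X, test (B.27)
        d = rng.standard_normal(2)
        x = c + r * rng.uniform() * d / np.linalg.norm(d)
        assert np.dot(x_hat, x) > alpha        # <a, x> > alpha on X ...
    assert alpha > 0.0                         # ... while <a, theta> = 0 < alpha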

Corollary B.3 If X is closed and convex and b ∉ X, then b and X are strictly separated.

Remark B.3 Theorem B.3 also holds more generally on Banach spaces V; see, e.g., [?,
Section 5.12]. In this context (there is no inner product), hyperplanes are more generally
defined by

P(ℓ, α) := {x ∈ V : ℓx = α}

where ℓ : V → R is a continuous linear functional. The proof is based on the celebrated
Hahn-Banach theorem.

Fact. If X and Y are nonempty disjoint and convex, with X compact and Y closed, then
X and Y are strictly separated.


Exercise B.24 Prove the Fact. Hint: first show that Y − X, defined as {z|z = y − x, y ∈
Y, x ∈ X}, is closed, convex and does not contain 0.

Exercise B.25 Show by an example that if, in the above fact, X is merely closed, then X
and Y may not be strictly separated. (In particular, the difference of two closed sets, defined
as above, may not be closed.)

Exercise B.26 Prove that, if C is a closed convex cone and x ∉ C, then there exists h such
that hT x > 0 and hT v ≤ 0 for all v ∈ C.


Acknowledgment
The author wishes to thank the numerous students who have contributed constructive com-
ments towards improving these notes over the years. In addition, special thanks are addressed
to Ji-Woong Lee, who used these notes when he taught an optimal control course at Penn
State University in the Spring 2008 semester and, after the semester was over, provided
many helpful comments towards improving the notes.



Bibliography

[1] B.D.O. Anderson and J.B. Moore. Optimal Control: Linear Quadratic Methods. Pren-
tice Hall, 1990.

[2] A. Avez. Differential Calculus. J. Wiley and Sons, 1986.

[3] D.P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic
Press, New York, 1982.

[4] D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol. 1. Athena Scientific,
Belmont, Massachusetts, 2017.

[5] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts,
1996.

[6] D.P. Bertsekas, A. Nedic, and A.E. Ozdaglar. Convex Analysis and Optimization.
Athena Scientific, Belmont, Massachusetts, 2003.

[7] R.W. Brockett. Finite Dimensional Linear Systems. J. Wiley and Sons, 1970.

[8] F.M. Callier and C.A. Desoer. Linear System Theory. Springer-Verlag, New York, 1991.

[9] M.D. Canon, C.D. Cullum, Jr., and E. Polak. Theory of Optimal Control and
Mathematical Programming. McGraw-Hill Book Company, New York, 1970.


[11] F.H. Clarke. Optimization and Nonsmooth Analysis. Wiley Interscience, 1983.

[12] V.F. Dem’yanov and L.V. Vasil’ev. Nondifferentiable Optimization. Translations Series
in Mathematics and Engineering. Springer-Verlag, New York, Berlin, Heidelberg, Tokyo,
1985.

[13] P.M. Fitzpatrick. Advanced Calculus. Thomson, 2006.

[14] W.H. Fleming and R.W. Rishel. Deterministic and Stochastic Optimal Control.
Springer-Verlag, New York, 1975.

[15] F.J. Gould and J.W. Tolle. A necessary and sufficient qualification for constrained
optimization. SIAM J. on Applied Mathematics, 20, 1971.


[16] J.K. Hale. Ordinary Differential Equations. Wiley-Interscience, 1969.


[17] H.K. Khalil. Nonlinear Systems. Prentice Hall, 2002. Third edition.
[18] Peter D. Lax. Functional Analysis. J. Wiley & Sons Inc., 2002.
[19] E.B. Lee and L. Markus. Foundations of Optimal Control Theory. Wiley, New York,
1967.
[20] G. Leitmann. The Calculus of Variations and Optimal Control. Plenum Press, 1981.
[21] D. Liberzon. Calculus of Variations and Optimal Control Theory: A Concise
Introduction. Princeton University Press, 2011.
[22] D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley,
Reading, Mass., 1973.
[23] D.G. Luenberger. Optimization by Vector Space Methods. J. Wiley and Sons, 1969.
[24] J. Nocedal and S.J. Wright. Numerical Optimization. Second edition. Springer-Verlag,
2006.
[25] J. Ortega and W. Rheinboldt. Iterative Solution of Nonlinear Equations in Several
Variables. Academic Press, New York, 1970.
[26] E. Polak. Computational Methods in Optimization. Academic Press, New York, N.Y.,
1971.
[27] L.S. Pontryagin, V.G. Boltyansky, R.V. Gamkrelidze, and E.F. Mishchenko. The
Mathematical Theory of Optimal Processes. Interscience, 1962.
[28] W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, N.Y., 1974. Second
edition.
[29] W.J. Rugh. Linear System Theory. Prentice Hall, 1993.
[30] E.D. Sontag. Mathematical Control Theory. Deterministic Finite Dimensional Systems.
Springer-Verlag, 1990.
[31] E.D. Sontag. Mathematical Control Theory. Deterministic Finite Dimensional Systems.
Springer-Verlag, 1998. Second edition.
[32] H.J. Sussmann and J.C. Willems. 300 years of optimal control: From the brachystochrone
to the maximum principle. IEEE CSS Magazine, 17, 1997.
[33] P.P. Varaiya. Notes in Optimization. Van Nostrand Reinhold, 1972.
[34] R. Vinter. Optimal Control. Springer Verlag, 2010.
[35] K. Zhou, J.C. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, New
Jersey, 1996.
