Minimization of Functionals
If the measure of the utility of any theorem is judged by how concisely it may
be expressed, and how widely it may be applied, then the Weierstrass Theorem
rightly plays a central role in optimization theory. It provides sufficient conditions
for the solution of the optimization problem where we seek to find u ∈ U ⊆ X
such that
f(u) = inf_{v∈U} f(v).
Because the proof of this theorem, while well-known, is instructive and serves as
a model for the proofs of more general results in this chapter, we will summarize
it here. We will require the following alternative characterizations of continuity
on topological spaces to carry out this proof in a manner that can be “lifted” to
more general circumstances. Recall that one of the most common definitions of
continuity is cast in terms of inverse images of open sets.
Definition 5.1.1. Let (X, τ_X) and (Y, τ_Y) be topological spaces. A function f : X →
Y is continuous at x0 ∈ X if the inverse image of every open set O in Y that
contains f(x0) contains an open set in X that contains x0. That is,
C_{k+1} ⊆ C_k   ∀ k ∈ N
and each C_k is compact, being a closed subset of a compact set. The sequence of
compact sets {C_k}_{k=1}^∞ clearly satisfies the finite intersection property, so that

∃ x0 ∈ ⋂_{k=1}^∞ C_k.

5.2 Elementary Calculus
where C is the constraint set. Now, there are many ways in which we can construct
simple functions for which there is no minimizer over the constraint set. If the
constraint set is unbounded, such as the entire real line, an increasing function
like f (x) = x obviously does not have a minimizer. Even if the constraint set is
bounded, for example C ≡ (0, 1], there is no minimizer for the simple function
f (x) = x. Intuitively, we would like to say that x0 = 0 is the minimizer, but
this point is not in the constraint set C. While there are many theorems that can
describe when a function will achieve its minimum over some constraint set, one
prototypical example is due to Weierstrass.
Theorem 5.2.1. If f is a continuous, real-valued function defined on a closed and
bounded subset C of the real line, then f achieves its minimum on C.
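The contrast between the closed set [0, 1] and the half-open set (0, 1] discussed above can be checked numerically. The following sketch (the grids, sample points, and tolerances are my own choices, not from the text) uses f(x) = x:

```python
# Numerical illustration: f(x) = x attains its minimum on the closed, bounded
# set [0, 1], but has no minimizer on (0, 1]. The feasible points 1/k drive f
# toward the infimum 0 without any point of (0, 1] ever achieving it.

def f(x):
    return x

# On the closed interval [0, 1] the minimum is attained, here at x = 0.
closed_min = min(f(x) for x in [0.0, 0.25, 0.5, 0.75, 1.0])

# On (0, 1], feasible points can approach 0 but never reach it.
open_values = [f(1.0 / k) for k in range(1, 1001)]

assert closed_min == 0.0
assert min(open_values) > 0.0   # the infimum 0 is never attained on (0, 1]
assert min(open_values) < 1e-2  # yet feasible values come arbitrarily close
```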
If f is in fact a differentiable function of the real variable x, and is defined on all
of R, then the problem of characterizing the values of x where extrema may occur
is well known: the extrema may occur only when the derivative of the function f
vanishes. From elementary calculus we know that:
Theorem 5.2.2. If f is a differentiable, real-valued function of the real variable x
and is defined on all of R, then

f(x0) = inf_{x∈R} f(x)   implies that   f′(x0) = 0.
In fact, most students studying calculus for the first time spend a great deal of time
finding the zeros of the derivative of a function, in order to find the extrema of the
function. Soon after learning that the first derivative can be used to characterize
the possible locations of the extrema of a real-valued function, the student of
calculus is taught to examine the second derivative of a function to gain some
insight into the nature of the extrema.
Theorem 5.2.3. If f is a twice differentiable, real-valued function defined on all of
R, f′(x0) = 0, and

f″(x0) > 0   (5.3)

then x0 is a relative minimum. In other words,

f(x) ≥ f(x0)

for all x in some neighborhood of x0.
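Theorems 5.2.2 and 5.2.3 can be illustrated with a concrete one-variable sketch (the particular function and its hand-coded derivatives are my own choices, not from the text):

```python
# For f(x) = (x - 2)^2 + 1 the derivative vanishes only at x0 = 2, and
# f''(x0) = 2 > 0, so x0 is a relative (here in fact global) minimum.

def f(x):
    return (x - 2.0) ** 2 + 1.0

def df(x):     # f'(x), computed by hand
    return 2.0 * (x - 2.0)

def d2f(x):    # f''(x), constant for this quadratic
    return 2.0

x0 = 2.0
assert df(x0) == 0.0    # Theorem 5.2.2: the derivative vanishes at the minimizer
assert d2f(x0) > 0.0    # condition (5.3) of Theorem 5.2.3
# f(x) >= f(x0) throughout a neighborhood of x0:
assert all(f(x0 + h) >= f(x0) for h in [-0.1, -0.01, 0.01, 0.1])
```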
Of course, readers will recognize these theorems immediately. These theorems
are foundational to real-variable calculus, and require no abstract, functional
analytic framework whatsoever. Because of their simple form
and graphical interpretation, they are easy to remember. They are important to
this chapter in that they provide a touchstone for more abstract results in func-
tional analysis that are required to treat optimization problems in mechanics. For
example, the elastic energy stored in a beam, rod, plate, or membrane cannot be
expressed in terms of a real-valued function of a real variable f (x). One can hy-
pothesize that equilibria of these structures correspond to minima in their stored
energy, but the expressions for the stored energy are not classically differentiable
functions of a real variable. We cannot simply differentiate the energy expressions
in a classical sense to find the equilibria as described in the above theorems.
What is required, then, is a generalization of these theorems that is suffi-
ciently rich to treat the meaningful collection of problems in mechanics and control
theory. For a large class of problems, we will find that each of the simple, intuitive
theorems above can be generalized so that they are meaningful for problems in
control and mechanics. In particular, this chapter will show that:
• The Weierstrass Theorem can be generalized to a functional analytic frame-
work. To pass to the treatment of control and mechanics problems, we will
need to generalize the idea of considering closed and bounded subsets of the
real line, and consider compact subsets of topological spaces. We will need to
generalize the notion of continuity of functions of a real variable to continuity
of functionals on topological spaces.
• The characterization of minima of real-valued functions by derivatives that
vanish will be generalized by considering Gateaux and Fréchet derivatives of
functionals on abstract spaces. It will be shown that Theorem 5.2.2 has an
immediate generalization to a functional analytic setting.
• The method of determining that a given extremum of a real-valued function is
a relative minimum, by checking to see if its second derivative is positive, also
has a simple generalization. In this case, a relative minimum can be deduced if
the second Gateaux derivative is positive.
5.3 Minimization of Differentiable Functionals
This is clearly a result that is directly analogous to the local character of the
characterization of extrema of real-valued functions. In fact, the primary results
of this section are derived by exploiting the identification of f (x0 + th) with a
real-valued function
g(t) ≡ f (x0 + th)
where t ∈ [0, 1] and h ∈ X. Note that for fixed x0 , h ∈ X, g(t) is a real-valued
function. Indeed, if g is sufficiently smooth, uniformly for all x0 and h in some
subset of X, we can expand g in a Taylor series about t = 0:

g(t) = g(0) + Σ_{k=1}^{n} (t^k g^{(k)}(0))/k! + R_{n+1}.
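The reduction of f(x0 + th) to the scalar function g(t) is easy to probe numerically. In this sketch (the function, sample points, and step size are my own illustrative choices) a difference quotient of g at t = 0 recovers the Gateaux derivative of f(x) = Σ x_i² on R^n, which is Df(x0) ∘ h = 2⟨x0, h⟩:

```python
# Approximate the Gateaux derivative of f at x0 in the direction h via the
# scalar function g(t) = f(x0 + t*h), using (g(t) - g(0)) / t for small t.

def f(x):
    return sum(xi * xi for xi in x)   # f(x) = sum of squares

def gateaux(f, x0, h, t=1e-6):
    # difference quotient of g(t) = f(x0 + t*h) at t = 0
    xt = [a + t * b for a, b in zip(x0, h)]
    return (f(xt) - f(x0)) / t

x0 = [1.0, -2.0, 0.5]
h = [0.3, 0.1, -1.0]
exact = 2.0 * sum(a * b for a, b in zip(x0, h))   # Df(x0) ∘ h = 2<x0, h>
approx = gateaux(f, x0, h)
assert abs(approx - exact) < 1e-4
```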
Now we obtain the most direct, simple generalization of Theorem 5.2.2 for real-
variable functions.
Theorem 5.3.1. Let X be a normed vector space, and let f : X → R. If f has a
local minimum at x0 ∈ X and the Gateaux derivative Df(x0) exists, then

Df(x0) = 0.
Theorem 5.3.2. Let X be a normed vector space and let f : X → R be a functional.
Suppose that n is an even number with n ≥ 2 and
(i) f is n times Fréchet differentiable in a neighborhood of x0 ,
(ii) Df (n) is continuous at x0 , and
(iii) the nth derivative is coercive, that is
where
C = {x : g(x) = 0}.
Provided that the functions are smooth enough and the constraints are regular,
there is a very satisfactory Lagrange multiplier representation for this problem.
Ljusternik’s Theorem
As will be seen in many applications, the regularity of the constraints plays an im-
portant role in justifying the applicability of Lagrange multipliers to many equality
constrained problems. In fact, this pivotal role is made clear in the following the-
orem due to Ljusternik.
Theorem 5.4.1 (Ljusternik’s Theorem). Let X and Y be Banach spaces. Suppose
that
(i) g : X → Y is Fréchet differentiable on an open set O ⊆ X,
(ii) g is regular at x0 ∈ O, and
(iii) the Fréchet derivative x0 → Dg(x0) is continuous at x0 in the uniform op-
erator topology on L(X, Y).
Then there is a neighborhood N (y0 ) of y0 = g(x0 ) and a constant C such that the
equation
y = g(x)
has a solution x for every y ∈ N (y0 ) and
‖x − x0‖_X ≤ C‖y − y0‖_Y.
With these preliminary definitions, we can now state the Lagrange multiplier the-
orem for equality constrained extremization.
Theorem 5.4.2. Let X and Y be Banach spaces, f : X → R, and g : X → Y .
Suppose that:
(i) f and g are Fréchet differentiable on an open set O ⊆ X,
(ii) the Fréchet derivatives
x0 → Df (x0 )
x0 → Dg(x0 )
are continuous in the uniform operator topology on L(X, R) and L(X, Y ),
respectively, and
(iii) x0 ∈ O is a regular point of the constraints g(x).
If f has a local extremum under the constraint g(x) = 0 at the regular point
x0 ∈ O, then there is a Lagrange multiplier y0∗ ∈ Y∗ such that the Lagrangian
f (x) + y0∗ g(x)
is stationary at x0 . That is, we have
Df (x0 ) + y0∗ ◦ Dg(x0 ) = 0.
Proof. We first show that if x0 is a local extremum, then Df(x0) ∘ x = 0 for all x
such that Dg(x0) ∘ x = 0. Define the mapping

F : X → R × Y,
F(x) = (f(x), g(x)).
Suppose, to the contrary, that there is some u ∈ X with

Dg(x0) ∘ u = 0

but

Df(x0) ∘ u = z ≠ 0.

If this were the case, then x0 would be a regular point of the mapping F. To see
why this is the case, we can compute

DF(x0) ∘ x = (Df(x0) ∘ x, Dg(x0) ∘ x) ∈ R × Y.
Let (α, y) ∈ R × Y be arbitrary, choose x̄ ∈ X with Dg(x0) ∘ x̄ = y (possible since
g is regular at x0), and set β = Df(x0) ∘ x̄. Then

Df(x0) ∘ ((α − β)/z) u = ((α − β)/z) Df(x0) ∘ u = α − β = α − Df(x0) ∘ x̄

and

Dg(x0) ∘ ((α − β)/z) u = ((α − β)/z) Dg(x0) ∘ u = 0.

If we choose x = ((α − β)/z) u + x̄, it is readily seen that

DF(x0) ∘ (((α − β)/z) u + x̄) = (Df(x0) ∘ (((α − β)/z) u + x̄), Dg(x0) ∘ (((α − β)/z) u + x̄))
= (((α − β)/z) Df(x0) ∘ u + Df(x0) ∘ x̄, Dg(x0) ∘ x̄)
= (α − Df(x0) ∘ x̄ + Df(x0) ∘ x̄, y)
= (α, y).
Thus x0 is a regular point of F, and Ljusternik's theorem (Theorem 5.4.1) applies
to F: there is a neighborhood

N(α0, 0) ⊆ R × Y

of F(x0) = (α0, 0), where α0 = f(x0), such that the equation F(x) = (α, y)
has a solution for every (α, y) ∈ N(α0, 0) and the solution satisfies

‖x − x0‖_X ≤ C {|α − α0| + ‖y‖_Y}.

In particular, the element (α0 − ε, 0) is in the neighborhood N(α0, 0) for all ε small
enough. For every such ε > 0 there is a solution x_ε to the equation

F(x_ε) = (α0 − ε, 0).
But this means that

f(x_ε) = α0 − ε = f(x0) − ε

and

g(x_ε) = 0.

Furthermore, we have that

‖x_ε − x0‖_X ≤ Cε.
This contradicts the fact that x0 is a local extremum, and we conclude
Df (x0 ) ◦ x = 0
for all x ∈ X such that
Dg(x0 ) ◦ x = 0.
Recall that

{x ∈ X : Dg(x0) ∘ x = 0} = ker(Dg(x0)).

In fact Df(x0) ∈ X∗ and Df(x0) ∈ (ker(Dg(x0)))^⊥. Since the range of Dg(x0)
is closed, we have

range((Dg(x0))∗) = (ker(Dg(x0)))^⊥.
By definition

Dg(x0) : X → Y

and

(Dg(x0))∗ : Y∗ → X∗.

We conclude that there is a y0∗ ∈ Y∗ such that

Df(x0) = −(Dg(x0))∗ ∘ y0∗,

that is,

Df(x0) + (Dg(x0))∗ ∘ y0∗ = 0.

By definition

⟨(Dg(x0))∗ ∘ y0∗, x⟩_{X∗×X} = ⟨y0∗, Dg(x0) ∘ x⟩_{Y∗×Y},

so this is precisely the claimed statement that Df(x0) + y0∗ ∘ Dg(x0) = 0.
The above theorem bears a close resemblance to the Lagrange multiplier the-
orem from undergraduate calculus discussed in the introduction [12], [18]. The
essential ingredients of the above theorem include smoothness of the functionals
f and g and the regularity of the constraints. There is an alternative form of this
theorem that weakens the requirement that the constraints are in fact regular at
x0 . It will be useful in many applications.
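As a point of contact with that undergraduate theorem, here is a minimal finite-dimensional sketch of the stationarity condition in Theorem 5.4.2 (the particular f, g, minimizer, and multiplier value are my own illustrative choices):

```python
# Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0.
# The minimizer is x0 = (1/2, 1/2); with multiplier y0* = -1 the Lagrangian
# f + y0* g is stationary: Df(x0) + y0* Dg(x0) = 0.

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)   # gradient of f

def grad_g(x, y):
    return (1.0, 1.0)           # gradient of the linear constraint g

x0, y0 = 0.5, 0.5
mult = -1.0                     # the Lagrange multiplier y0*
gf, gg = grad_f(x0, y0), grad_g(x0, y0)
residual = tuple(a + mult * b for a, b in zip(gf, gg))

assert residual == (0.0, 0.0)   # stationarity of the Lagrangian at x0
assert x0 + y0 - 1.0 == 0.0     # the constraint g(x0) = 0 holds
```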
Theorem 5.4.3. Let X and Y be Banach spaces, f : X → R, and g : X → Y .
Suppose that:
(i) f and g are Fréchet differentiable on an open set O ⊆ X,
(ii) the Fréchet derivatives
x0 → Df (x0 )
x0 → Dg(x0 )
are continuous in the uniform operator topology on L(X, R) and L(X, Y ),
respectively, and
(iii) the range of Dg(x0 ) is closed in Y .
If f has a local extremum under the constraint g(x) = 0 at the point x0 ∈ O, then
there are multipliers λ0 ∈ R and y0∗ ∈ Y∗ such that the Lagrangian
λ0 f (x) + y0∗ g(x)
is stationary at x0 . That is
λ0 Df (x0 ) + y0∗ ◦ Dg(x0 ) = 0.
Proof. The proof of this theorem can be carried out in two steps. First, suppose
that the range of Dg(x0) is all of Y. In this case, the constraint g is regular at
x0. We can apply the preceding theorem and select λ0 ≡ 1. If, on the other hand,
the range of Dg(x0) is strictly contained in Y, we know that there is some ỹ ∈ Y
such that

d = inf{‖ỹ − y‖_Y : y ∈ range(Dg(x0))} > 0.
By Theorem 2.2.2 there is an element y0∗ ∈ (range(Dg(x0)))^⊥ such that

⟨y0∗, ỹ⟩ = d ≠ 0

and y0∗ ≠ 0. But for any linear operator A

(range(A))^⊥ = ker(A∗),

so that

y0∗ ∈ (range(Dg(x0)))^⊥ ≡ ker((Dg(x0))∗).

By definition, since y0∗ ∈ Y∗,

⟨y0∗, Dg(x0) ∘ x⟩_{Y∗×Y} = ⟨(Dg(x0))∗ ∘ y0∗, x⟩_{X∗×X} = 0
for all x ∈ X. We choose λ0 = 0 and conclude
λ0 Df (x0 ) + y0∗ ◦ Dg(x0 ) = 0.
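A scalar sketch of this degenerate branch (the example is mine, not from the text): take f(x) = x and g(x) = x², with x0 = 0. The constraint set {x : x² = 0} is the single point 0, so x0 is trivially an extremum, yet Dg(x0) = 0 and the range of Dg(x0) is {0}, strictly contained in R. The multiplier rule holds only with λ0 = 0:

```python
# Degenerate Lagrange multipliers: f(x) = x, g(x) = x^2, x0 = 0.
# Dg(x0) = 0, so x0 is not a regular point; lambda0 = 0 and y0* = 1 make
# lambda0 * Df(x0) + y0* * Dg(x0) = 0 even though Df(x0) = 1 alone is nonzero.

def df(x):
    return 1.0          # derivative of f(x) = x

def dg(x):
    return 2.0 * x      # derivative of g(x) = x^2

x0 = 0.0
lambda0, y0_star = 0.0, 1.0
assert dg(x0) == 0.0                               # the constraint is not regular at x0
assert df(x0) != 0.0                               # Df(x0) alone does not vanish
assert lambda0 * df(x0) + y0_star * dg(x0) == 0.0  # degenerate multiplier rule
```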
5.5 Fréchet Differentiable Implicit Functionals
A(x(u), u) = 0.
for some C ∈ R.
If there is a solution λ ∈ L(Y, Z) to the equation
λ ◦ Dx A(x, u) = Dx J (x, u)
at x = x(u), then
J(u) = J (x(u), u)
is Gateaux differentiable at u and
so that

R₁(‖x(u_ε) − x(u)‖_X)/ε ≤ C‖ũ − u‖_U · R₁(‖x(u_ε) − x(u)‖_X)/‖x(u_ε) − x(u)‖_X.
Consequently, we write

lim_{ε→0} R(ε)/ε = 0.
In the various derivative expressions that follow, we will use R(ε) to denote gener-
ically any remainder terms that have the above asymptotic behavior as a function
of ε. In addition, by the Gateaux differentiability of J(x, ·), we have

(J(u_ε) − J(u))/ε = Du J(x(u_ε), u) ∘ (ũ − u)
+ (Dx J(x(u), u) ∘ (x(u_ε) − x(u)))/ε + R(ε)/ε.   (5.12)
Since the pairs (x(u_ε), u_ε) and (x(u), u) are solutions of A(·, ·) = 0, it is always true
that

λ ∘ (A(x(u_ε), u_ε) − A(x(u), u)) = 0 ∈ Z.

We can write

λ ∘ { (A(x(u_ε), u_ε) − A(x(u_ε), u))/ε + (A(x(u_ε), u) − A(x(u), u))/ε } = 0.   (5.13)
In this last limit, we have used the continuity of Du J (x, u) and Du A(x, u) on
X × U.
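The structure of this sensitivity computation can be sketched in finite dimensions (the matrices, names, and the linear model A(x, u) = Kx − ub are my own illustrative assumptions, not from the text): the multiplier λ solving λ ∘ Dx A = Dx J lets us differentiate J(u) = J(x(u), u) without differentiating the state x(u) directly.

```python
# Adjoint-style sensitivity sketch: the state x(u) solves A(x, u) = K x - u b = 0,
# and J(u) = c^T x(u). The multiplier lam solves the adjoint equation
# K^T lam = c, and then dJ/du = lam^T b; we check this against a finite
# difference of J at u = 1.

def solve2(K, rhs):
    # direct 2x2 linear solve via Cramer's rule
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return [(rhs[0] * K[1][1] - rhs[1] * K[0][1]) / det,
            (K[0][0] * rhs[1] - K[1][0] * rhs[0]) / det]

K = [[4.0, 1.0], [1.0, 3.0]]   # symmetric, so K^T = K
b = [1.0, 2.0]
c = [2.0, -1.0]

def J(u):
    x = solve2(K, [u * bi for bi in b])   # state equation K x = u b
    return sum(ci * xi for ci, xi in zip(c, x))

lam = solve2(K, c)                        # adjoint equation K^T lam = c
dJ_adjoint = sum(li * bi for li, bi in zip(lam, b))

eps = 1e-6
dJ_fd = (J(1.0 + eps) - J(1.0)) / eps     # finite-difference check at u = 1
assert abs(dJ_adjoint - dJ_fd) < 1e-6
```

Because J is linear in u here, the finite difference agrees with the adjoint value up to rounding; for nonlinear A the same adjoint structure gives the derivative at each fixed u.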