
Chapter 1

Optimality Conditions: Unconstrained Optimization

1.1 Differentiable Problems

Consider the problem of minimizing a function $f : \mathbb{R}^n \to \mathbb{R}$, where $f$ is twice continuously differentiable on $\mathbb{R}^n$:

$$\mathcal{P} : \quad \underset{x \in \mathbb{R}^n}{\text{minimize}}\ f(x).$$

We wish to obtain constructible first- and second-order necessary and sufficient conditions for optimality. Recall the following elementary results.
Theorem 1.1.1 [First-Order Necessary Conditions for Optimality]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable at a point $\bar{x} \in \mathbb{R}^n$. If $\bar{x}$ is a local solution to the problem $\mathcal{P}$, then $\nabla f(\bar{x}) = 0$.

Proof: From the definition of the derivative we have that
$$f(x) = f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + o(\|x - \bar{x}\|),$$
where $\lim_{x \to \bar{x}} \frac{o(\|x - \bar{x}\|)}{\|x - \bar{x}\|} = 0$. Let $x := \bar{x} - t \nabla f(\bar{x})$. Then
$$\frac{f(\bar{x} - t \nabla f(\bar{x})) - f(\bar{x})}{t} = -\|\nabla f(\bar{x})\|^2 + \frac{o(t\|\nabla f(\bar{x})\|)}{t}.$$
Since $\bar{x}$ is a local solution, the left-hand side is nonnegative for all sufficiently small $t > 0$. Taking the limit as $t \downarrow 0$ we obtain
$$0 \le -\|\nabla f(\bar{x})\|^2 \le 0.$$
Hence $\nabla f(\bar{x}) = 0$. □
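To see the condition in action, here is a small numerical sketch (added for illustration; the particular function, points, and step size are assumptions, not from the text). It evaluates the gradient at the minimizer of a simple smooth function and takes a short step along the negative gradient from a non-stationary point, mirroring the trial point $x := \bar{x} - t\nabla f(\bar{x})$ used in the proof.

```python
import numpy as np

# Illustrative sketch: f(x) = (x1 - 1)^2 + 3*(x2 + 2)^2 has the unique
# minimizer xbar = (1, -2); Theorem 1.1.1 says the gradient must vanish there.

def f(x):
    return (x[0] - 1.0)**2 + 3.0 * (x[1] + 2.0)**2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 6.0 * (x[1] + 2.0)])

xbar = np.array([1.0, -2.0])
print(grad_f(xbar))                       # -> [0. 0.]

# At a non-stationary point, a short step along -grad f decreases f,
# which is exactly why a nonzero gradient rules out local optimality.
x0 = np.array([2.0, 0.0])
t = 1e-3
print(f(x0 - t * grad_f(x0)) < f(x0))     # -> True
```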



Theorem 1.1.2 [Second-Order Optimality Conditions]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable at the point $\bar{x} \in \mathbb{R}^n$.

1. (necessity) If $\bar{x}$ is a local solution to the problem $\mathcal{P}$, then $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive semi-definite.

2. (sufficiency) If $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive definite, then there is an $\alpha > 0$ such that $f(x) \ge f(\bar{x}) + \alpha \|x - \bar{x}\|^2$ for all $x$ near $\bar{x}$.
Proof:

1. We make use of the second-order Taylor series expansion
$$(1.1.1)\qquad f(x) = f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{1}{2}(x - \bar{x})^T \nabla^2 f(\bar{x})(x - \bar{x}) + o(\|x - \bar{x}\|^2).$$
Given $d \in \mathbb{R}^n$ and $t > 0$, set $x := \bar{x} + td$. Plugging this into (1.1.1) we find that
$$0 \le \frac{f(\bar{x} + td) - f(\bar{x})}{t^2} = \frac{1}{2} d^T \nabla^2 f(\bar{x}) d + \frac{o(t^2)}{t^2},$$
since $\nabla f(\bar{x}) = 0$ by Theorem 1.1.1. Taking the limit as $t \downarrow 0$ we get that
$$0 \le d^T \nabla^2 f(\bar{x}) d.$$
Now since $d$ was chosen arbitrarily, $\nabla^2 f(\bar{x})$ is positive semi-definite.

2. From (1.1.1) we have that
$$(1.1.2)\qquad \frac{f(x) - f(\bar{x})}{\|x - \bar{x}\|^2} = \frac{1}{2}\,\frac{(x - \bar{x})^T}{\|x - \bar{x}\|}\, \nabla^2 f(\bar{x})\, \frac{(x - \bar{x})}{\|x - \bar{x}\|} + \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2}.$$
If $\lambda > 0$ is the smallest eigenvalue of $\nabla^2 f(\bar{x})$, choose $\epsilon > 0$ so that
$$(1.1.3)\qquad \left| \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2} \right| \le \frac{\lambda}{4}$$
whenever $\|x - \bar{x}\| < \epsilon$. Then for all $\|x - \bar{x}\| < \epsilon$ we have from (1.1.2) and (1.1.3) that
$$\frac{f(x) - f(\bar{x})}{\|x - \bar{x}\|^2} \ge \frac{1}{2}\lambda + \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2} \ge \frac{1}{4}\lambda.$$
Consequently, if we set $\alpha := \frac{1}{4}\lambda$, then
$$f(x) \ge f(\bar{x}) + \alpha \|x - \bar{x}\|^2$$
whenever $\|x - \bar{x}\| < \epsilon$. □
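The following sketch (added for illustration; the quadratic $f$ and the random sampling are assumptions, not from the text) checks the two ingredients of the sufficiency part numerically: the gradient vanishes and the smallest Hessian eigenvalue $\lambda$ is positive, and it then spot-checks the growth estimate $f(x) \ge f(\bar{x}) + \frac{\lambda}{4}\|x - \bar{x}\|^2$.

```python
import numpy as np

# Sketch: second-order sufficient condition for f(x) = x1^2 + x1*x2 + 2*x2^2
# at xbar = 0: zero gradient and positive definite Hessian.

def f(x):
    return x[0]**2 + x[0]*x[1] + 2.0*x[1]**2

def grad_f(x):
    return np.array([2.0*x[0] + x[1], x[0] + 4.0*x[1]])

def hess_f(x):
    return np.array([[2.0, 1.0],
                     [1.0, 4.0]])

xbar = np.array([0.0, 0.0])
lam = np.linalg.eigvalsh(hess_f(xbar)).min()       # smallest eigenvalue
print(np.allclose(grad_f(xbar), 0.0), lam > 0)      # -> True True

# Spot-check the growth estimate f(x) >= f(xbar) + (lam/4)*||x - xbar||^2
# at random points near xbar.
rng = np.random.default_rng(0)
for _ in range(5):
    x = xbar + 1e-2 * rng.standard_normal(2)
    assert f(x) >= f(xbar) + (lam/4.0) * np.linalg.norm(x - xbar)**2
print("growth estimate holds at sampled points")
```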



1.2 Convex Problems

Observe that Theorem 1.1.1 establishes first-order necessary conditions while Theorem 1.1.2 establishes both second-order necessary and sufficient conditions. What about first-order sufficiency conditions? For this we introduce the following definitions.
Definition 1.2.1 [Convex Sets and Functions]

1. A subset $C \subset \mathbb{R}^n$ is said to be convex if for every pair of points $x$ and $y$ taken from $C$, the entire line segment connecting $x$ and $y$ is also contained in $C$, i.e.,
$$[x, y] \subset C, \quad\text{where}\quad [x, y] = \{(1 - \lambda)x + \lambda y : 0 \le \lambda \le 1\}.$$

2. A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ is said to be convex if the set
$$\mathrm{epi}(f) = \{(\mu, x) : f(x) \le \mu\}$$
is a convex subset of $\mathbb{R}^{1+n}$. In this context, we also define the set
$$\mathrm{dom}(f) = \{x \in \mathbb{R}^n : f(x) < +\infty\}$$
to be the essential domain of $f$.

Lemma 1.2.1 The function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if for every two points $x^1, x^2 \in \mathrm{dom}(f)$ and $\lambda \in [0, 1]$ we have
$$f(\lambda x^1 + (1 - \lambda)x^2) \le \lambda f(x^1) + (1 - \lambda)f(x^2).$$
That is, the secant line connecting $(x^1, f(x^1))$ and $(x^2, f(x^2))$ lies above the graph of $f$.

Example: The following functions are examples of convex functions: $c^T x$, $\|x\|$, $e^x$, $x^2$.
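As a quick spot check (added here for illustration; the random sampling is an assumption, not a proof technique from the text), one can test the secant inequality of Lemma 1.2.1 for these example functions at randomly drawn points. A single violation would disprove convexity, while passing the test is merely consistent with it.

```python
import numpy as np

# Illustrative sketch: sample the secant inequality of Lemma 1.2.1 for the
# example functions above.  Passing random spot checks does not prove
# convexity, but one failure would refute it.

rng = np.random.default_rng(1)

def secant_ok(f, x1, x2, lam):
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    return lhs <= rhs + 1e-12            # small tolerance for round-off

c = rng.standard_normal(3)
examples = {
    "c^T x": lambda x: c @ x,
    "||x||": np.linalg.norm,
    "e^x":   lambda x: np.exp(x[0]),
    "x^2":   lambda x: x[0]**2,
}

for name, f in examples.items():
    ok = all(secant_ok(f,
                       rng.standard_normal(3),
                       rng.standard_normal(3),
                       rng.uniform())
             for _ in range(1000))
    print(name, ok)                      # each should print True
```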
The significance of convexity in optimization theory is illustrated in the following result.



Theorem 1.2.1 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ be convex. If $\bar{x} \in \mathrm{dom}(f)$ is a local solution to the problem $\mathcal{P}$, then $\bar{x}$ is a global solution to the problem $\mathcal{P}$.

Proof: If $f(\bar{x}) = -\infty$ we are done, so let us assume that $-\infty < f(\bar{x})$. Suppose there is an $\hat{x} \in \mathbb{R}^n$ with $f(\hat{x}) < f(\bar{x})$. Let $\epsilon > 0$ be such that $f(\bar{x}) \le f(x)$ whenever $\|x - \bar{x}\| \le \epsilon$. Set $\lambda := \epsilon (2\|\bar{x} - \hat{x}\|)^{-1}$ and $x_\lambda := \bar{x} + \lambda(\hat{x} - \bar{x})$. Then $\|x_\lambda - \bar{x}\| \le \epsilon/2$ and
$$f(x_\lambda) \le (1 - \lambda)f(\bar{x}) + \lambda f(\hat{x}) < f(\bar{x}).$$
This contradicts the choice of $\epsilon$, hence no such $\hat{x}$ exists. □

If f is a differentiable convex function, then a better result can be established. In order
to obtain this result we need the following lemma.
Lemma 1.2.2 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex.

1. Given $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$, the difference quotient
$$(1.2.4)\qquad \frac{f(x + td) - f(x)}{t}$$
is a non-decreasing function of $t$ on $(0, +\infty)$.

2. For every $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$ the directional derivative $f'(x; d)$ always exists and is given by
$$(1.2.5)\qquad f'(x; d) := \inf_{t > 0} \frac{f(x + td) - f(x)}{t}.$$

3. For every $x \in \mathrm{dom}(f)$, the function $f'(x; \cdot)$ is sublinear, i.e. $f'(x; \cdot)$ is positively homogeneous,
$$f'(x; \lambda d) = \lambda f'(x; d) \quad \forall\, d \in \mathbb{R}^n,\ \lambda \ge 0,$$
and subadditive,
$$f'(x; u + v) \le f'(x; u) + f'(x; v).$$
Proof: We assume (1.2.4) is true and show (1.2.5). If $x + td \notin \mathrm{dom}(f)$ for all $t > 0$, then the result is obviously true. Therefore, we may as well assume that there is a $\bar{t} > 0$ such that $x + td \in \mathrm{dom}(f)$ for all $t \in (0, \bar{t}]$. Recall that
$$(1.2.6)\qquad f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}.$$
Now if the difference quotient (1.2.4) is non-decreasing in $t$ on $(0, +\infty)$, then the limit in (1.2.6) is necessarily given by the infimum in (1.2.5). This infimum always exists, and so $f'(x; d)$ always exists and is given by (1.2.5).

We now prove (1.2.4). Let $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$. If $x + td \notin \mathrm{dom}(f)$ for all $t > 0$, then the result is obviously true. Thus, we may assume that
$$0 < \bar{t} = \sup\{t : x + td \in \mathrm{dom}(f)\}.$$


Let $0 < t_1 < t_2 < \bar{t}$ (we allow the possibility that $t_2 = \bar{t}$ if $\bar{t} < +\infty$). Then
$$f(x + t_1 d) = f\!\left(x + \tfrac{t_1}{t_2}\, t_2 d\right) = f\!\left(\left(1 - \tfrac{t_1}{t_2}\right)x + \tfrac{t_1}{t_2}(x + t_2 d)\right) \le \left(1 - \tfrac{t_1}{t_2}\right) f(x) + \tfrac{t_1}{t_2} f(x + t_2 d).$$
Hence
$$\frac{f(x + t_1 d) - f(x)}{t_1} \le \frac{f(x + t_2 d) - f(x)}{t_2}.$$
We now show Part 3 of this result. To see that $f'(x; \cdot)$ is positively homogeneous, let $d \in \mathbb{R}^n$ and $\lambda > 0$ and note that
$$f'(x; \lambda d) = \lambda \lim_{t \downarrow 0} \frac{f(x + (t\lambda)d) - f(x)}{t\lambda} = \lambda f'(x; d).$$
To see that $f'(x; \cdot)$ is subadditive, let $u, v \in \mathbb{R}^n$; then
$$\begin{aligned}
f'(x; u + v) &= \lim_{t \downarrow 0} \frac{f(x + t(u + v)) - f(x)}{t} \\
&= \lim_{t \downarrow 0} \frac{f\!\left(x + \tfrac{t}{2}(u + v)\right) - f(x)}{t/2} \\
&= \lim_{t \downarrow 0} \frac{f\!\left(\tfrac{1}{2}(x + tu) + \tfrac{1}{2}(x + tv)\right) - f(x)}{t/2} \\
&\le \lim_{t \downarrow 0} \frac{\tfrac{1}{2}f(x + tu) + \tfrac{1}{2}f(x + tv) - f(x)}{t/2} \\
&= \lim_{t \downarrow 0} \left[ \frac{f(x + tu) - f(x)}{t} + \frac{f(x + tv) - f(x)}{t} \right] \\
&= f'(x; u) + f'(x; v). \qquad \Box
\end{aligned}$$
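A small numerical illustration of Parts 1 and 2 (added; the particular nonsmooth convex function is an assumption, not from the text): the difference quotients of (1.2.4) increase with $t$, and their infimum recovers the one-sided directional derivative of (1.2.5).

```python
import numpy as np

# For the convex function f(x) = |x| + x^2 the difference quotients
# (f(x + t*d) - f(x))/t are non-decreasing in t, and their infimum over
# t > 0 equals the directional derivative f'(0; 1) = 1.

f = lambda x: np.abs(x) + x**2
x, d = 0.0, 1.0

ts = np.logspace(-6, 1, 50)            # increasing values of t
q = (f(x + ts * d) - f(x)) / ts        # difference quotients

print(np.all(np.diff(q) >= -1e-12))    # non-decreasing in t -> True
print(q.min())                         # approx 1.000001, close to f'(0; 1) = 1
```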

From Lemma 1.2.2 we immediately obtain the following result.


Theorem 1.2.2 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex and suppose that $\bar{x} \in \mathbb{R}^n$ is a point at which $f$ is differentiable. Then $\bar{x}$ is a global solution to the problem $\mathcal{P}$ if and only if $\nabla f(\bar{x}) = 0$.

Proof: If $\bar{x}$ is a global solution to the problem $\mathcal{P}$, then, in particular, $\bar{x}$ is a local solution to the problem $\mathcal{P}$ and so $\nabla f(\bar{x}) = 0$ by Theorem 1.1.1. Conversely, if $\nabla f(\bar{x}) = 0$, then, by setting $t := 1$, $x := \bar{x}$, and $d := y - \bar{x}$ in (1.2.5), we get that
$$0 = f'(\bar{x}; y - \bar{x}) \le f(y) - f(\bar{x}),$$
or $f(\bar{x}) \le f(y)$. Since $y$ was chosen arbitrarily, the result follows. □
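For a concrete instance (added for illustration; the random least-squares data and the sampling check are assumptions), consider the differentiable convex function $f(x) = \frac{1}{2}\|Ax - b\|^2$: any solution of the normal equations has zero gradient, and Theorem 1.2.2 then guarantees it is a global minimizer, which the sketch below spot-checks.

```python
import numpy as np

# The convex quadratic f(x) = 0.5*||A x - b||^2 has gradient A^T(A x - b),
# which vanishes exactly at solutions of the normal equations A^T A x = A^T b.

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b)**2
grad = lambda x: A.T @ (A @ x - b)

xbar = np.linalg.solve(A.T @ A, A.T @ b)       # gradient vanishes here
print(np.allclose(grad(xbar), 0.0))             # -> True

# No sampled point does better than xbar (spot check of global optimality).
samples = rng.standard_normal((10000, 3)) * 10
print(all(f(x) >= f(xbar) - 1e-9 for x in samples))   # -> True
```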



As Theorems 1.2.1 and 1.2.2 demonstrate, convex functions are very nice functions indeed.
This is especially so with regard to optimization theory. Thus, it is important that we be
able to recognize when a function is convex. For this reason we give the following result.
Theorem 1.2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$.

1. If $f$ is differentiable on $\mathbb{R}^n$, then the following statements are equivalent:
(a) $f$ is convex,
(b) $f(y) \ge f(x) + \nabla f(x)^T(y - x)$ for all $x, y \in \mathbb{R}^n$,
(c) $(\nabla f(x) - \nabla f(y))^T(x - y) \ge 0$ for all $x, y \in \mathbb{R}^n$.

2. If $f$ is twice differentiable, then $f$ is convex if and only if $\nabla^2 f(x)$ is positive semi-definite for all $x \in \mathbb{R}^n$.
Proof: (a) $\Rightarrow$ (b) If $f$ is convex, then (1.2.5) holds. By setting $t := 1$ and $d := y - x$ we obtain (b).

(b) $\Rightarrow$ (c) Let $x, y \in \mathbb{R}^n$. From (b) we have
$$f(y) \ge f(x) + \nabla f(x)^T(y - x)$$
and
$$f(x) \ge f(y) + \nabla f(y)^T(x - y).$$
By adding these two inequalities we obtain (c).

(c) $\Rightarrow$ (b) Let $x, y \in \mathbb{R}^n$. By the mean value theorem there exists $0 < \lambda < 1$ such that
$$f(y) - f(x) = \nabla f(x_\lambda)^T(y - x)$$
where $x_\lambda := \lambda y + (1 - \lambda)x$. By hypothesis,
$$0 \le [\nabla f(x_\lambda) - \nabla f(x)]^T(x_\lambda - x) = \lambda[\nabla f(x_\lambda) - \nabla f(x)]^T(y - x) = \lambda[f(y) - f(x) - \nabla f(x)^T(y - x)].$$
Hence $f(y) \ge f(x) + \nabla f(x)^T(y - x)$.

(b) $\Rightarrow$ (a) Let $x, y \in \mathbb{R}^n$ and set
$$\bar{\alpha} := \max_{\lambda \in [0,1]} \varphi(\lambda) := \left[f(\lambda y + (1 - \lambda)x) - \bigl(\lambda f(y) + (1 - \lambda)f(x)\bigr)\right].$$
We need to show that $\bar{\alpha} \le 0$. Since $[0, 1]$ is compact and $\varphi$ is continuous, there is a $\bar{\lambda} \in [0, 1]$ such that $\varphi(\bar{\lambda}) = \bar{\alpha}$. If $\bar{\lambda}$ equals zero or one, we are done. Hence we may as well assume that $0 < \bar{\lambda} < 1$, in which case
$$0 = \varphi'(\bar{\lambda}) = \nabla f(x_{\bar{\lambda}})^T(y - x) + f(x) - f(y)$$


where $x_{\bar{\lambda}} = x + \bar{\lambda}(y - x)$, or equivalently
$$\bar{\lambda}\bigl(f(y) - f(x)\bigr) = -\nabla f(x_{\bar{\lambda}})^T(x - x_{\bar{\lambda}}).$$
But then
$$\bar{\alpha} = f(x_{\bar{\lambda}}) - \bigl(f(x) + \bar{\lambda}(f(y) - f(x))\bigr) = f(x_{\bar{\lambda}}) + \nabla f(x_{\bar{\lambda}})^T(x - x_{\bar{\lambda}}) - f(x) \le 0$$
by (b).

2) Suppose $f$ is convex and let $x, d \in \mathbb{R}^n$; then by (b) of Part 1,
$$f(x + td) \ge f(x) + t\nabla f(x)^T d$$
for all $t \in \mathbb{R}$. Replacing the left-hand side of this inequality with its second-order Taylor expansion yields the inequality
$$f(x) + t\nabla f(x)^T d + \frac{t^2}{2} d^T \nabla^2 f(x) d + o(t^2) \ge f(x) + t\nabla f(x)^T d,$$
or equivalently
$$\frac{1}{2} d^T \nabla^2 f(x) d + \frac{o(t^2)}{t^2} \ge 0.$$
Letting $t \downarrow 0$ yields the inequality
$$d^T \nabla^2 f(x) d \ge 0.$$
Since $d$ was arbitrary, $\nabla^2 f(x)$ is positive semi-definite.

Conversely, if $x, y \in \mathbb{R}^n$, then by the mean value theorem there is a $\lambda \in (0, 1)$ such that
$$f(y) = f(x) + \nabla f(x)^T(y - x) + \frac{1}{2}(y - x)^T \nabla^2 f(x_\lambda)(y - x)$$
where $x_\lambda = \lambda y + (1 - \lambda)x$. Hence
$$f(y) \ge f(x) + \nabla f(x)^T(y - x)$$
since $\nabla^2 f(x_\lambda)$ is positive semi-definite. Therefore, $f$ is convex by (b) of Part 1. □
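As an illustration of Part 2 (added; the log-sum-exp function is a standard convex example and is not taken from the text), the sketch below forms the Hessian $\mathrm{diag}(p) - pp^T$ of $f(x) = \log\sum_i e^{x_i}$, where $p$ is the softmax vector, and checks that its smallest eigenvalue is nonnegative at random points.

```python
import numpy as np

# Theorem 1.2.3(2): a twice differentiable f is convex iff its Hessian is
# positive semi-definite everywhere.  For f(x) = log(sum(exp(x))) the Hessian
# is diag(p) - p p^T with p the softmax probabilities.

def hess_logsumexp(x):
    p = np.exp(x - x.max())              # numerically stable softmax
    p = p / p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(3)
min_eigs = [np.linalg.eigvalsh(hess_logsumexp(rng.standard_normal(4))).min()
            for _ in range(100)]
print(min(min_eigs) >= -1e-12)           # -> True: PSD at all sampled points
```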


We have established that $f'(x; d)$ exists for all $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$, but we have not yet discussed the continuity properties of $f$. We give a partial result in this direction in the next lemma.

Lemma 1.2.3 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex. Then $f$ is bounded in a neighborhood of a point $\bar{x}$ if and only if $f$ is Lipschitz in a neighborhood of $\bar{x}$.



Proof: If $f$ is Lipschitz in a neighborhood of $\bar{x}$, then $f$ is clearly bounded above in a neighborhood of $\bar{x}$. Therefore, we assume local boundedness and establish Lipschitz continuity.

Let $\epsilon > 0$ and $M > 0$ be such that $|f(x)| \le M$ for all $x \in \bar{x} + 2\epsilon\mathbb{B}$, where $\mathbb{B}$ denotes the closed unit ball. Set $g(x) = f(x + \bar{x}) - f(\bar{x})$. It is sufficient to show that $g$ is Lipschitz on $\epsilon\mathbb{B}$. First note that for all $x \in 2\epsilon\mathbb{B}$
$$0 = g(0) = g\!\left(\tfrac{1}{2}x + \tfrac{1}{2}(-x)\right) \le \tfrac{1}{2}g(x) + \tfrac{1}{2}g(-x),$$
and so $-g(-x) \le g(x)$ for all $x \in 2\epsilon\mathbb{B}$. Next, let $x, y \in \epsilon\mathbb{B}$ with $x \ne y$ and set $\alpha = \epsilon^{-1}\|x - y\|$. Then $w = y + \alpha^{-1}(y - x) \in 2\epsilon\mathbb{B}$, and so
$$g(y) = g\!\left(\frac{\alpha^{-1}}{1 + \alpha^{-1}}\,x + \frac{1}{1 + \alpha^{-1}}\,w\right) \le \frac{\alpha^{-1}}{1 + \alpha^{-1}}\,g(x) + \frac{1}{1 + \alpha^{-1}}\,g(w).$$
Consequently,
$$g(y) - g(x) \le \frac{1}{1 + \alpha^{-1}}\bigl(g(w) - g(x)\bigr) \le 2M\alpha = 2M\epsilon^{-1}\|x - y\|.$$
Since this inequality is symmetric in $x$ and $y$, we obtain the result. □

1.3 Convex Composite Problems

Convex composite optimization is concerned with the minimization of functions of the form $f(x) := h(F(x))$, where $h : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a proper convex function and $F : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable. Most problems from nonlinear programming can be cast in this framework.
Examples:

(1) Let $F : \mathbb{R}^n \to \mathbb{R}^m$ where $m > n$, and consider the equation $F(x) = 0$. Since $m > n$ it is highly unlikely that a solution to this equation exists. However, one might try to obtain a best approximate solution by solving the problem $\min\{\|F(x)\| : x \in \mathbb{R}^n\}$. This is a convex composite optimization problem since the norm is a convex function. (A small numerical sketch of this formulation appears after these examples.)

(2) Again let $F : \mathbb{R}^n \to \mathbb{R}^m$ where $m > n$, and consider the inclusion $F(x) \in C$, where $C \subset \mathbb{R}^m$ is a non-empty closed convex set. One can pose this inclusion as the optimization problem $\min\{\mathrm{dist}(F(x) \mid C) : x \in \mathbb{R}^n\}$. This is a convex composite optimization problem since the distance function
$$\mathrm{dist}(y \mid C) := \inf_{z \in C} \|y - z\|$$
is a convex function.


(3) Let $F : \mathbb{R}^n \to \mathbb{R}^m$, let $C \subset \mathbb{R}^m$ be a non-empty closed convex set, and let $f_0 : \mathbb{R}^n \to \mathbb{R}$, and consider the constrained optimization problem $\min\{f_0(x) : F(x) \in C\}$. One can approximate this problem by the unconstrained optimization problem
$$\min\{f_0(x) + \mathrm{dist}(F(x) \mid C) : x \in \mathbb{R}^n\}.$$
This is a convex composite optimization problem where $h(\eta, y) = \eta + \mathrm{dist}(y \mid C)$ is a convex function. The function $f_0(x) + \mathrm{dist}(F(x) \mid C)$ is called an exact penalty function for the problem $\min\{f_0(x) : F(x) \in C\}$. We will review the theory of such functions in a later section.
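Here is a minimal numerical sketch of Example (1) (added for illustration; the random affine map $F(x) = Ax - b$ and the sampling check are assumptions). Because $F$ is affine here, minimizing $\|F(x)\|$ and minimizing $\|F(x)\|^2$ give the same solution, so an ordinary least-squares solve already solves this particular convex composite problem.

```python
import numpy as np

# Overdetermined affine system F(x) = A x - b with m > n, treated as the
# convex composite problem min ||F(x)|| with h = ||.|| and F affine.

rng = np.random.default_rng(4)
m, n = 8, 3                              # m > n: F(x) = 0 is unlikely to be solvable
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

F = lambda x: A @ x - b
obj = lambda x: np.linalg.norm(F(x))     # h(F(x)) with h the Euclidean norm

xstar, *_ = np.linalg.lstsq(A, b, rcond=None)
print(obj(xstar))                        # best achievable residual norm
print(all(obj(xstar) <= obj(xstar + rng.standard_normal(n))
          for _ in range(1000)))         # -> True
```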
Most of the first-order theory for convex composite functions is easily derived from the observation that
$$(1.3.7)\qquad f(y) = h(F(y)) = h\bigl(F(x) + F'(x)(y - x)\bigr) + o(\|y - x\|).$$
This local representation for $f$ is a direct consequence of $h$ being locally Lipschitz:
$$\bigl|h(F(y)) - h\bigl(F(x) + F'(x)(y - x)\bigr)\bigr| \le K\|y - x\| \int_0^1 \|F'(x + t(y - x)) - F'(x)\|\, dt$$
for some $K \ge 0$. Equation (1.3.7) can be written equivalently as
$$(1.3.8)\qquad h(F(x + d)) = h(F(x)) + \Delta f(x; d) + o(\|d\|)$$
where
$$\Delta f(x; d) := h\bigl(F(x) + F'(x)d\bigr) - h(F(x)).$$
From (1.3.8), one immediately obtains the following result.

Lemma 1.3.1 Let $h : \mathbb{R}^m \to \mathbb{R}$ be convex and let $F : \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable. Then the function $f = h \circ F$ is everywhere directionally differentiable, and one has
$$(1.3.9)\qquad f'(x; d) = h'\bigl(F(x); F'(x)d\bigr) = \inf_{\lambda > 0} \frac{\Delta f(x; \lambda d)}{\lambda}.$$
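The following sketch (added; the map $F$, the base point, and the use of $h = \|\cdot\|$ are assumptions, not from the text) compares the two sides of (1.3.9) numerically. Since the Euclidean norm is differentiable at any nonzero point, $h'(F(x); F'(x)d)$ reduces there to $\langle F(x)/\|F(x)\|,\, F'(x)d\rangle$, which should match a one-sided difference quotient of $f = h \circ F$.

```python
import numpy as np

# Compare a one-sided difference quotient of f = ||F(.)|| with the formula
# h'(F(x); F'(x) d) = <F(x)/||F(x)||, F'(x) d>, valid where F(x) != 0.

def F(x):
    return np.array([x[0]**2 - 1.0, np.sin(x[1]), x[0] * x[1]])

def Fprime(x):                       # Jacobian of F
    return np.array([[2.0 * x[0], 0.0],
                     [0.0, np.cos(x[1])],
                     [x[1], x[0]]])

f = lambda x: np.linalg.norm(F(x))   # convex composite f = h o F

x = np.array([1.5, 0.7])
d = np.array([1.0, -2.0])

t = 1e-7
quotient = (f(x + t * d) - f(x)) / t                       # approximates f'(x; d)
formula = F(x) @ (Fprime(x) @ d) / np.linalg.norm(F(x))    # h'(F(x); F'(x) d)
print(quotient, formula)             # the two values should agree closely
```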

This result yields the following optimality condition for convex composite optimization problems.

Theorem 1.3.1 Let $h : \mathbb{R}^m \to \mathbb{R}$ be convex and $F : \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable. If $\bar{x}$ is a local solution to the problem $\min\{h(F(x))\}$, then $d = 0$ is a global solution to the problem
$$(1.3.10)\qquad \min_{d \in \mathbb{R}^n} h\bigl(F(\bar{x}) + F'(\bar{x})d\bigr).$$

There are various ways to test condition (1.3.10). A few of these are given below.



Lemma 1.3.2 Let $h$ and $F$ be as in Theorem 1.3.1. The following conditions are equivalent:

(a) $d = 0$ is a global solution to (1.3.10).

(b) $0 \le h'\bigl(F(x); F'(x)d\bigr)$ for all $d \in \mathbb{R}^n$.

(c) $0 \le \Delta f(x; d)$ for all $d \in \mathbb{R}^n$.

Proof: The equivalence of (a) and (b) follows immediately from convexity. Indeed, this equivalence is the heart of the proof of Theorem 1.3.1. The equivalence of (b) and (c) is an immediate consequence of (1.3.9).

In the sequel, we say that $x \in \mathbb{R}^n$ satisfies the first-order condition for optimality for the convex composite optimization problem if it satisfies any of the three conditions (a)-(c) of Lemma 1.3.2.

1.3.1 A Note on Directional Derivatives

Recall that if $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, then the function $f'(x; d)$ is linear in $d$:
$$f'(x; \alpha d_1 + \beta d_2) = \alpha f'(x; d_1) + \beta f'(x; d_2).$$
If $f$ is only assumed to be convex and not necessarily differentiable, then $f'(x; \cdot)$ is sublinear and hence convex. Finally, if $f = h \circ F$ is convex composite with $h : \mathbb{R}^m \to \mathbb{R}$ convex and $F : \mathbb{R}^n \to \mathbb{R}^m$ continuously differentiable, then, by Lemma 1.3.1, $f'(x; \cdot)$ is also sublinear and hence convex. Moreover, the approximate directional derivative $\Delta f(x; d)$ satisfies
$$\frac{1}{\lambda_1} \Delta f(x; \lambda_1 d) \le \frac{1}{\lambda_2} \Delta f(x; \lambda_2 d) \quad \text{for } 0 < \lambda_1 \le \lambda_2,$$
by the non-decreasing nature of the difference quotients. Thus, in particular,
$$\Delta f(x; \lambda d) \le \lambda\, \Delta f(x; d) \quad \text{for all } \lambda \in [0, 1].$$
