
Chapter 1

Optimality Conditions: Unconstrained Optimization

1.1 Differentiable Problems

Consider the problem of minimizing a function $f : \mathbb{R}^n \to \mathbb{R}$, where $f$ is twice continuously differentiable on $\mathbb{R}^n$:

$$\mathcal{P} : \quad \underset{x \in \mathbb{R}^n}{\text{minimize}}\ f(x).$$

We wish to obtain constructible first- and second-order necessary and sufficient conditions for optimality. Recall the following elementary results.
Theorem 1.1.1 [First-Order Necessary Conditions for Optimality]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be differentiable at a point $\bar{x} \in \mathbb{R}^n$. If $\bar{x}$ is a local solution to the problem $\mathcal{P}$, then $\nabla f(\bar{x}) = 0$.

Proof: From the definition of the derivative we have that
$$f(x) = f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + o(\|x - \bar{x}\|),$$
where $\lim_{x \to \bar{x}} \frac{o(\|x - \bar{x}\|)}{\|x - \bar{x}\|} = 0$. Let $x := \bar{x} - t \nabla f(\bar{x})$. Then
$$\frac{f(\bar{x} - t \nabla f(\bar{x})) - f(\bar{x})}{t} = -\|\nabla f(\bar{x})\|^2 + \frac{o(t\|\nabla f(\bar{x})\|)}{t}.$$
Since $\bar{x}$ is a local solution, the left-hand side is nonnegative for all sufficiently small $t > 0$. Taking the limit as $t \downarrow 0$ we obtain
$$0 \le -\|\nabla f(\bar{x})\|^2 \le 0.$$
Hence $\nabla f(\bar{x}) = 0$. □
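To see the condition in action, here is a small numerical sketch (added for illustration; the particular function, points, and step size are assumptions, not from the text). It evaluates the gradient at the minimizer of a simple smooth function and takes a short step along the negative gradient from a non-stationary point, mirroring the trial point $x := \bar{x} - t\nabla f(\bar{x})$ used in the proof.

```python
import numpy as np

# Illustrative sketch: f(x) = (x1 - 1)^2 + 3*(x2 + 2)^2 has the unique
# minimizer xbar = (1, -2); Theorem 1.1.1 says the gradient must vanish there.

def f(x):
    return (x[0] - 1.0)**2 + 3.0 * (x[1] + 2.0)**2

def grad_f(x):
    return np.array([2.0 * (x[0] - 1.0), 6.0 * (x[1] + 2.0)])

xbar = np.array([1.0, -2.0])
print(grad_f(xbar))                       # -> [0. 0.]

# At a non-stationary point, a short step along -grad f decreases f,
# which is exactly why a nonzero gradient rules out local optimality.
x0 = np.array([2.0, 0.0])
t = 1e-3
print(f(x0 - t * grad_f(x0)) < f(x0))     # -> True
```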



Theorem 1.1.2 [Second-Order Optimality Conditions]
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable at the point $\bar{x} \in \mathbb{R}^n$.

1. (necessity) If $\bar{x}$ is a local solution to the problem $\mathcal{P}$, then $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive semi-definite.

2. (sufficiency) If $\nabla f(\bar{x}) = 0$ and $\nabla^2 f(\bar{x})$ is positive definite, then there is an $\alpha > 0$ such that $f(x) \ge f(\bar{x}) + \alpha \|x - \bar{x}\|^2$ for all $x$ near $\bar{x}$.
Proof:

1. We make use of the second-order Taylor series expansion
$$(1.1.1)\qquad f(x) = f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{1}{2}(x - \bar{x})^T \nabla^2 f(\bar{x})(x - \bar{x}) + o(\|x - \bar{x}\|^2).$$
Given $d \in \mathbb{R}^n$ and $t > 0$, set $x := \bar{x} + td$. Plugging this into (1.1.1) we find that
$$0 \le \frac{f(\bar{x} + td) - f(\bar{x})}{t^2} = \frac{1}{2} d^T \nabla^2 f(\bar{x}) d + \frac{o(t^2)}{t^2},$$
since $\nabla f(\bar{x}) = 0$ by Theorem 1.1.1. Taking the limit as $t \downarrow 0$ we get that
$$0 \le d^T \nabla^2 f(\bar{x}) d.$$
Now since $d$ was chosen arbitrarily, $\nabla^2 f(\bar{x})$ is positive semi-definite.

2. From (1.1.1) we have that
$$(1.1.2)\qquad \frac{f(x) - f(\bar{x})}{\|x - \bar{x}\|^2} = \frac{1}{2}\,\frac{(x - \bar{x})^T}{\|x - \bar{x}\|}\, \nabla^2 f(\bar{x})\, \frac{(x - \bar{x})}{\|x - \bar{x}\|} + \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2}.$$
If $\lambda > 0$ is the smallest eigenvalue of $\nabla^2 f(\bar{x})$, choose $\epsilon > 0$ so that
$$(1.1.3)\qquad \left| \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2} \right| \le \frac{\lambda}{4}$$
whenever $\|x - \bar{x}\| < \epsilon$. Then for all $\|x - \bar{x}\| < \epsilon$ we have from (1.1.2) and (1.1.3) that
$$\frac{f(x) - f(\bar{x})}{\|x - \bar{x}\|^2} \ge \frac{1}{2}\lambda + \frac{o(\|x - \bar{x}\|^2)}{\|x - \bar{x}\|^2} \ge \frac{1}{4}\lambda.$$
Consequently, if we set $\alpha := \frac{1}{4}\lambda$, then
$$f(x) \ge f(\bar{x}) + \alpha \|x - \bar{x}\|^2$$
whenever $\|x - \bar{x}\| < \epsilon$. □
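The following sketch (added for illustration; the quadratic $f$ and the random sampling are assumptions, not from the text) checks the two ingredients of the sufficiency part numerically: the gradient vanishes and the smallest Hessian eigenvalue $\lambda$ is positive, and it then spot-checks the growth estimate $f(x) \ge f(\bar{x}) + \frac{\lambda}{4}\|x - \bar{x}\|^2$.

```python
import numpy as np

# Sketch: second-order sufficient condition for f(x) = x1^2 + x1*x2 + 2*x2^2
# at xbar = 0: zero gradient and positive definite Hessian.

def f(x):
    return x[0]**2 + x[0]*x[1] + 2.0*x[1]**2

def grad_f(x):
    return np.array([2.0*x[0] + x[1], x[0] + 4.0*x[1]])

def hess_f(x):
    return np.array([[2.0, 1.0],
                     [1.0, 4.0]])

xbar = np.array([0.0, 0.0])
lam = np.linalg.eigvalsh(hess_f(xbar)).min()       # smallest eigenvalue
print(np.allclose(grad_f(xbar), 0.0), lam > 0)      # -> True True

# Spot-check the growth estimate f(x) >= f(xbar) + (lam/4)*||x - xbar||^2
# at random points near xbar.
rng = np.random.default_rng(0)
for _ in range(5):
    x = xbar + 1e-2 * rng.standard_normal(2)
    assert f(x) >= f(xbar) + (lam/4.0) * np.linalg.norm(x - xbar)**2
print("growth estimate holds at sampled points")
```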



1.2 Convex Problems

Observe that Theorem 1.1.1 establishes first-order necessary conditions while Theorem 1.1.2 establishes both second-order necessary and sufficient conditions. What about first-order sufficiency conditions? For this we introduce the following definitions.
Definition 1.2.1 [Convex Sets and Functions]

1. A subset $C \subset \mathbb{R}^n$ is said to be convex if for every pair of points $x$ and $y$ taken from $C$, the entire line segment connecting $x$ and $y$ is also contained in $C$, i.e.,
$$[x, y] \subset C, \quad\text{where}\quad [x, y] = \{(1 - \lambda)x + \lambda y : 0 \le \lambda \le 1\}.$$

2. A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ is said to be convex if the set
$$\mathrm{epi}(f) = \{(\mu, x) : f(x) \le \mu\}$$
is a convex subset of $\mathbb{R}^{1+n}$. In this context, we also define the set
$$\mathrm{dom}(f) = \{x \in \mathbb{R}^n : f(x) < +\infty\}$$
to be the essential domain of $f$.

Lemma 1.2.1 The function $f : \mathbb{R}^n \to \mathbb{R}$ is convex if and only if for every two points $x^1, x^2 \in \mathrm{dom}(f)$ and $\lambda \in [0, 1]$ we have
$$f(\lambda x^1 + (1 - \lambda)x^2) \le \lambda f(x^1) + (1 - \lambda)f(x^2).$$
That is, the secant line connecting $(x^1, f(x^1))$ and $(x^2, f(x^2))$ lies above the graph of $f$.

Example: The following functions are examples of convex functions: $c^T x$, $\|x\|$, $e^x$, $x^2$.
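As a quick spot check (added here for illustration; the random sampling is an assumption, not a proof technique from the text), one can test the secant inequality of Lemma 1.2.1 for these example functions at randomly drawn points. A single violation would disprove convexity, while passing the test is merely consistent with it.

```python
import numpy as np

# Illustrative sketch: sample the secant inequality of Lemma 1.2.1 for the
# example functions above.  Passing random spot checks does not prove
# convexity, but one failure would refute it.

rng = np.random.default_rng(1)

def secant_ok(f, x1, x2, lam):
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    return lhs <= rhs + 1e-12            # small tolerance for round-off

c = rng.standard_normal(3)
examples = {
    "c^T x": lambda x: c @ x,
    "||x||": np.linalg.norm,
    "e^x":   lambda x: np.exp(x[0]),
    "x^2":   lambda x: x[0]**2,
}

for name, f in examples.items():
    ok = all(secant_ok(f,
                       rng.standard_normal(3),
                       rng.standard_normal(3),
                       rng.uniform())
             for _ in range(1000))
    print(name, ok)                      # each should print True
```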
The significance of convexity in optimization theory is illustrated in the following result.



Theorem 1.2.1 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{\pm\infty\}$ be convex. If $\bar{x} \in \mathrm{dom}(f)$ is a local solution to the problem $\mathcal{P}$, then $\bar{x}$ is a global solution to the problem $\mathcal{P}$.

Proof: If $f(\bar{x}) = -\infty$ we are done, so let us assume that $-\infty < f(\bar{x})$. Suppose there is an $\hat{x} \in \mathbb{R}^n$ with $f(\hat{x}) < f(\bar{x})$. Let $\epsilon > 0$ be such that $f(\bar{x}) \le f(x)$ whenever $\|x - \bar{x}\| \le \epsilon$. Set $\lambda := \epsilon (2\|\bar{x} - \hat{x}\|)^{-1}$ and $x_\lambda := \bar{x} + \lambda(\hat{x} - \bar{x})$. Then $\|x_\lambda - \bar{x}\| \le \epsilon/2$ and
$$f(x_\lambda) \le (1 - \lambda)f(\bar{x}) + \lambda f(\hat{x}) < f(\bar{x}).$$
This contradicts the choice of $\epsilon$, hence no such $\hat{x}$ exists. □

If f is a differentiable convex function, then a better result can be established. In order
to obtain this result we need the following lemma.
Lemma 1.2.2 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex.

1. Given $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$, the difference quotient
$$(1.2.4)\qquad \frac{f(x + td) - f(x)}{t}$$
is a non-decreasing function of $t$ on $(0, +\infty)$.

2. For every $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$ the directional derivative $f'(x; d)$ always exists and is given by
$$(1.2.5)\qquad f'(x; d) := \inf_{t > 0} \frac{f(x + td) - f(x)}{t}.$$

3. For every $x \in \mathrm{dom}(f)$, the function $f'(x; \cdot)$ is sublinear, i.e. $f'(x; \cdot)$ is positively homogeneous,
$$f'(x; \lambda d) = \lambda f'(x; d) \quad \forall\, d \in \mathbb{R}^n,\ \lambda \ge 0,$$
and subadditive,
$$f'(x; u + v) \le f'(x; u) + f'(x; v).$$
Proof: We assume (1.2.4) is true and show (1.2.5). If $x + td \notin \mathrm{dom}(f)$ for all $t > 0$, then the result is obviously true. Therefore, we may as well assume that there is a $\bar{t} > 0$ such that $x + td \in \mathrm{dom}(f)$ for all $t \in (0, \bar{t}]$. Recall that
$$(1.2.6)\qquad f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}.$$
Now if the difference quotient (1.2.4) is non-decreasing in $t$ on $(0, +\infty)$, then the limit in (1.2.6) is necessarily given by the infimum in (1.2.5). This infimum always exists, and so $f'(x; d)$ always exists and is given by (1.2.5).

We now prove (1.2.4). Let $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$. If $x + td \notin \mathrm{dom}(f)$ for all $t > 0$, then the result is obviously true. Thus, we may assume that
$$0 < \bar{t} = \sup\{t : x + td \in \mathrm{dom}(f)\}.$$


Let $0 < t_1 < t_2 < \bar{t}$ (we allow the possibility that $t_2 = \bar{t}$ if $\bar{t} < +\infty$). Then
$$f(x + t_1 d) = f\!\left(x + \tfrac{t_1}{t_2}\, t_2 d\right) = f\!\left(\left(1 - \tfrac{t_1}{t_2}\right)x + \tfrac{t_1}{t_2}(x + t_2 d)\right) \le \left(1 - \tfrac{t_1}{t_2}\right) f(x) + \tfrac{t_1}{t_2} f(x + t_2 d).$$
Hence
$$\frac{f(x + t_1 d) - f(x)}{t_1} \le \frac{f(x + t_2 d) - f(x)}{t_2}.$$
We now show Part 3 of this result. To see that $f'(x; \cdot)$ is positively homogeneous, let $d \in \mathbb{R}^n$ and $\lambda > 0$ and note that
$$f'(x; \lambda d) = \lambda \lim_{t \downarrow 0} \frac{f(x + (t\lambda)d) - f(x)}{t\lambda} = \lambda f'(x; d).$$
To see that $f'(x; \cdot)$ is subadditive, let $u, v \in \mathbb{R}^n$; then
$$\begin{aligned}
f'(x; u + v) &= \lim_{t \downarrow 0} \frac{f(x + t(u + v)) - f(x)}{t} \\
&= \lim_{t \downarrow 0} \frac{f\!\left(x + \tfrac{t}{2}(u + v)\right) - f(x)}{t/2} \\
&= \lim_{t \downarrow 0} \frac{f\!\left(\tfrac{1}{2}(x + tu) + \tfrac{1}{2}(x + tv)\right) - f(x)}{t/2} \\
&\le \lim_{t \downarrow 0} \frac{\tfrac{1}{2}f(x + tu) + \tfrac{1}{2}f(x + tv) - f(x)}{t/2} \\
&= \lim_{t \downarrow 0} \left[ \frac{f(x + tu) - f(x)}{t} + \frac{f(x + tv) - f(x)}{t} \right] \\
&= f'(x; u) + f'(x; v). \qquad \Box
\end{aligned}$$
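A small numerical illustration of Parts 1 and 2 (added; the particular nonsmooth convex function is an assumption, not from the text): the difference quotients of (1.2.4) increase with $t$, and their infimum recovers the one-sided directional derivative of (1.2.5).

```python
import numpy as np

# For the convex function f(x) = |x| + x^2 the difference quotients
# (f(x + t*d) - f(x))/t are non-decreasing in t, and their infimum over
# t > 0 equals the directional derivative f'(0; 1) = 1.

f = lambda x: np.abs(x) + x**2
x, d = 0.0, 1.0

ts = np.logspace(-6, 1, 50)            # increasing values of t
q = (f(x + ts * d) - f(x)) / ts        # difference quotients

print(np.all(np.diff(q) >= -1e-12))    # non-decreasing in t -> True
print(q.min())                         # approx 1.000001, close to f'(0; 1) = 1
```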

From Lemma 1.2.2 we immediately obtain the following result.


Theorem 1.2.2 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex and suppose that $\bar{x} \in \mathbb{R}^n$ is a point at which $f$ is differentiable. Then $\bar{x}$ is a global solution to the problem $\mathcal{P}$ if and only if $\nabla f(\bar{x}) = 0$.

Proof: If $\bar{x}$ is a global solution to the problem $\mathcal{P}$, then, in particular, $\bar{x}$ is a local solution to the problem $\mathcal{P}$ and so $\nabla f(\bar{x}) = 0$ by Theorem 1.1.1. Conversely, if $\nabla f(\bar{x}) = 0$, then, by setting $t := 1$, $x := \bar{x}$, and $d := y - \bar{x}$ in (1.2.5), we get that
$$0 = f'(\bar{x}; y - \bar{x}) \le f(y) - f(\bar{x}),$$
or $f(\bar{x}) \le f(y)$. Since $y$ was chosen arbitrarily, the result follows. □
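For a concrete instance (added for illustration; the random least-squares data and the sampling check are assumptions), consider the differentiable convex function $f(x) = \frac{1}{2}\|Ax - b\|^2$: any solution of the normal equations has zero gradient, and Theorem 1.2.2 then guarantees it is a global minimizer, which the sketch below spot-checks.

```python
import numpy as np

# The convex quadratic f(x) = 0.5*||A x - b||^2 has gradient A^T(A x - b),
# which vanishes exactly at solutions of the normal equations A^T A x = A^T b.

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b)**2
grad = lambda x: A.T @ (A @ x - b)

xbar = np.linalg.solve(A.T @ A, A.T @ b)       # gradient vanishes here
print(np.allclose(grad(xbar), 0.0))             # -> True

# No sampled point does better than xbar (spot check of global optimality).
samples = rng.standard_normal((10000, 3)) * 10
print(all(f(x) >= f(xbar) - 1e-9 for x in samples))   # -> True
```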



As Theorems 1.2.1 and 1.2.2 demonstrate, convex functions are very nice functions indeed.
This is especially so with regard to optimization theory. Thus, it is important that we be
able to recognize when a function is convex. For this reason we give the following result.
Theorem 1.2.3 Let $f : \mathbb{R}^n \to \mathbb{R}$.

1. If $f$ is differentiable on $\mathbb{R}^n$, then the following statements are equivalent:
(a) $f$ is convex,
(b) $f(y) \ge f(x) + \nabla f(x)^T(y - x)$ for all $x, y \in \mathbb{R}^n$,
(c) $(\nabla f(x) - \nabla f(y))^T(x - y) \ge 0$ for all $x, y \in \mathbb{R}^n$.

2. If $f$ is twice differentiable, then $f$ is convex if and only if $\nabla^2 f(x)$ is positive semi-definite for all $x \in \mathbb{R}^n$.
Proof: (a) $\Rightarrow$ (b) If $f$ is convex, then (1.2.5) holds. By setting $t := 1$ and $d := y - x$ we obtain (b).

(b) $\Rightarrow$ (c) Let $x, y \in \mathbb{R}^n$. From (b) we have
$$f(y) \ge f(x) + \nabla f(x)^T(y - x)$$
and
$$f(x) \ge f(y) + \nabla f(y)^T(x - y).$$
By adding these two inequalities we obtain (c).

(c) $\Rightarrow$ (b) Let $x, y \in \mathbb{R}^n$. By the mean value theorem there exists $0 < \lambda < 1$ such that
$$f(y) - f(x) = \nabla f(x_\lambda)^T(y - x)$$
where $x_\lambda := \lambda y + (1 - \lambda)x$. By hypothesis,
$$0 \le [\nabla f(x_\lambda) - \nabla f(x)]^T(x_\lambda - x) = \lambda[\nabla f(x_\lambda) - \nabla f(x)]^T(y - x) = \lambda[f(y) - f(x) - \nabla f(x)^T(y - x)].$$
Hence $f(y) \ge f(x) + \nabla f(x)^T(y - x)$.

(b) $\Rightarrow$ (a) Let $x, y \in \mathbb{R}^n$ and set
$$\bar{\alpha} := \max_{\lambda \in [0,1]} \varphi(\lambda) := \left[f(\lambda y + (1 - \lambda)x) - \bigl(\lambda f(y) + (1 - \lambda)f(x)\bigr)\right].$$
We need to show that $\bar{\alpha} \le 0$. Since $[0, 1]$ is compact and $\varphi$ is continuous, there is a $\bar{\lambda} \in [0, 1]$ such that $\varphi(\bar{\lambda}) = \bar{\alpha}$. If $\bar{\lambda}$ equals zero or one, we are done. Hence we may as well assume that $0 < \bar{\lambda} < 1$, in which case
$$0 = \varphi'(\bar{\lambda}) = \nabla f(x_{\bar{\lambda}})^T(y - x) + f(x) - f(y)$$


where $x_{\bar{\lambda}} = x + \bar{\lambda}(y - x)$, or equivalently
$$\bar{\lambda}\bigl(f(y) - f(x)\bigr) = -\nabla f(x_{\bar{\lambda}})^T(x - x_{\bar{\lambda}}).$$
But then
$$\bar{\alpha} = f(x_{\bar{\lambda}}) - \bigl(f(x) + \bar{\lambda}(f(y) - f(x))\bigr) = f(x_{\bar{\lambda}}) + \nabla f(x_{\bar{\lambda}})^T(x - x_{\bar{\lambda}}) - f(x) \le 0$$
by (b).

2) Suppose $f$ is convex and let $x, d \in \mathbb{R}^n$; then by (b) of Part 1,
$$f(x + td) \ge f(x) + t\nabla f(x)^T d$$
for all $t \in \mathbb{R}$. Replacing the left-hand side of this inequality with its second-order Taylor expansion yields the inequality
$$f(x) + t\nabla f(x)^T d + \frac{t^2}{2} d^T \nabla^2 f(x) d + o(t^2) \ge f(x) + t\nabla f(x)^T d,$$
or equivalently
$$\frac{1}{2} d^T \nabla^2 f(x) d + \frac{o(t^2)}{t^2} \ge 0.$$
Letting $t \downarrow 0$ yields the inequality
$$d^T \nabla^2 f(x) d \ge 0.$$
Since $d$ was arbitrary, $\nabla^2 f(x)$ is positive semi-definite.

Conversely, if $x, y \in \mathbb{R}^n$, then by the mean value theorem there is a $\lambda \in (0, 1)$ such that
$$f(y) = f(x) + \nabla f(x)^T(y - x) + \frac{1}{2}(y - x)^T \nabla^2 f(x_\lambda)(y - x)$$
where $x_\lambda = \lambda y + (1 - \lambda)x$. Hence
$$f(y) \ge f(x) + \nabla f(x)^T(y - x)$$
since $\nabla^2 f(x_\lambda)$ is positive semi-definite. Therefore, $f$ is convex by (b) of Part 1. □
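As an illustration of Part 2 (added; the log-sum-exp function is a standard convex example and is not taken from the text), the sketch below forms the Hessian $\mathrm{diag}(p) - pp^T$ of $f(x) = \log\sum_i e^{x_i}$, where $p$ is the softmax vector, and checks that its smallest eigenvalue is nonnegative at random points.

```python
import numpy as np

# Theorem 1.2.3(2): a twice differentiable f is convex iff its Hessian is
# positive semi-definite everywhere.  For f(x) = log(sum(exp(x))) the Hessian
# is diag(p) - p p^T with p the softmax probabilities.

def hess_logsumexp(x):
    p = np.exp(x - x.max())              # numerically stable softmax
    p = p / p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(3)
min_eigs = [np.linalg.eigvalsh(hess_logsumexp(rng.standard_normal(4))).min()
            for _ in range(100)]
print(min(min_eigs) >= -1e-12)           # -> True: PSD at all sampled points
```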


We have established that $f'(x; d)$ exists for all $x \in \mathrm{dom}(f)$ and $d \in \mathbb{R}^n$, but we have not yet discussed the continuity properties of $f$. We give a partial result in this direction in the next lemma.

Lemma 1.2.3 Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be convex. Then $f$ is bounded in a neighborhood of a point $\bar{x}$ if and only if $f$ is Lipschitz in a neighborhood of $\bar{x}$.



Proof: If $f$ is Lipschitz in a neighborhood of $\bar{x}$, then $f$ is clearly bounded above in a neighborhood of $\bar{x}$. Therefore, we assume local boundedness and establish Lipschitz continuity.

Let $\epsilon > 0$ and $M > 0$ be such that $|f(x)| \le M$ for all $x \in \bar{x} + 2\epsilon\mathbb{B}$, where $\mathbb{B}$ denotes the closed unit ball. Set $g(x) = f(x + \bar{x}) - f(\bar{x})$. It is sufficient to show that $g$ is Lipschitz on $\epsilon\mathbb{B}$. First note that for all $x \in 2\epsilon\mathbb{B}$
$$0 = g(0) = g\!\left(\tfrac{1}{2}x + \tfrac{1}{2}(-x)\right) \le \tfrac{1}{2}g(x) + \tfrac{1}{2}g(-x),$$
and so $-g(-x) \le g(x)$ for all $x \in 2\epsilon\mathbb{B}$. Next, let $x, y \in \epsilon\mathbb{B}$ with $x \ne y$ and set $\alpha = \epsilon^{-1}\|x - y\|$. Then $w = y + \alpha^{-1}(y - x) \in 2\epsilon\mathbb{B}$, and so
$$g(y) = g\!\left(\frac{\alpha^{-1}}{1 + \alpha^{-1}}\,x + \frac{1}{1 + \alpha^{-1}}\,w\right) \le \frac{\alpha^{-1}}{1 + \alpha^{-1}}\,g(x) + \frac{1}{1 + \alpha^{-1}}\,g(w).$$
Consequently,
$$g(y) - g(x) \le \frac{1}{1 + \alpha^{-1}}\bigl(g(w) - g(x)\bigr) \le 2M\alpha = 2M\epsilon^{-1}\|x - y\|.$$
Since this inequality is symmetric in $x$ and $y$, we obtain the result. □

1.3 Convex Composite Problems

Convex composite optimization is concerned with the minimization of functions of the form $f(x) := h(F(x))$, where $h : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ is a proper convex function and $F : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable. Most problems from nonlinear programming can be cast in this framework.
Examples:

(1) Let $F : \mathbb{R}^n \to \mathbb{R}^m$ where $m > n$, and consider the equation $F(x) = 0$. Since $m > n$ it is highly unlikely that a solution to this equation exists. However, one might try to obtain a best approximate solution by solving the problem $\min\{\|F(x)\| : x \in \mathbb{R}^n\}$. This is a convex composite optimization problem since the norm is a convex function. (A small numerical sketch of this formulation appears after these examples.)

(2) Again let $F : \mathbb{R}^n \to \mathbb{R}^m$ where $m > n$, and consider the inclusion $F(x) \in C$, where $C \subset \mathbb{R}^m$ is a non-empty closed convex set. One can pose this inclusion as the optimization problem $\min\{\mathrm{dist}(F(x) \mid C) : x \in \mathbb{R}^n\}$. This is a convex composite optimization problem since the distance function
$$\mathrm{dist}(y \mid C) := \inf_{z \in C} \|y - z\|$$
is a convex function.


(3) Let $F : \mathbb{R}^n \to \mathbb{R}^m$, let $C \subset \mathbb{R}^m$ be a non-empty closed convex set, and let $f_0 : \mathbb{R}^n \to \mathbb{R}$, and consider the constrained optimization problem $\min\{f_0(x) : F(x) \in C\}$. One can approximate this problem by the unconstrained optimization problem
$$\min\{f_0(x) + \mathrm{dist}(F(x) \mid C) : x \in \mathbb{R}^n\}.$$
This is a convex composite optimization problem where $h(\eta, y) = \eta + \mathrm{dist}(y \mid C)$ is a convex function. The function $f_0(x) + \mathrm{dist}(F(x) \mid C)$ is called an exact penalty function for the problem $\min\{f_0(x) : F(x) \in C\}$. We will review the theory of such functions in a later section.
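Here is a minimal numerical sketch of Example (1) (added for illustration; the random affine map $F(x) = Ax - b$ and the sampling check are assumptions). Because $F$ is affine here, minimizing $\|F(x)\|$ and minimizing $\|F(x)\|^2$ give the same solution, so an ordinary least-squares solve already solves this particular convex composite problem.

```python
import numpy as np

# Overdetermined affine system F(x) = A x - b with m > n, treated as the
# convex composite problem min ||F(x)|| with h = ||.|| and F affine.

rng = np.random.default_rng(4)
m, n = 8, 3                              # m > n: F(x) = 0 is unlikely to be solvable
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

F = lambda x: A @ x - b
obj = lambda x: np.linalg.norm(F(x))     # h(F(x)) with h the Euclidean norm

xstar, *_ = np.linalg.lstsq(A, b, rcond=None)
print(obj(xstar))                        # best achievable residual norm
print(all(obj(xstar) <= obj(xstar + rng.standard_normal(n))
          for _ in range(1000)))         # -> True
```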
Most of the first-order theory for convex composite functions is easily derived from the observation that
$$(1.3.7)\qquad f(y) = h(F(y)) = h\bigl(F(x) + F'(x)(y - x)\bigr) + o(\|y - x\|).$$
This local representation for $f$ is a direct consequence of $h$ being locally Lipschitz:
$$\bigl|h(F(y)) - h\bigl(F(x) + F'(x)(y - x)\bigr)\bigr| \le K\|y - x\| \int_0^1 \|F'(x + t(y - x)) - F'(x)\|\, dt$$
for some $K \ge 0$. Equation (1.3.7) can be written equivalently as
$$(1.3.8)\qquad h(F(x + d)) = h(F(x)) + \Delta f(x; d) + o(\|d\|)$$
where
$$\Delta f(x; d) := h\bigl(F(x) + F'(x)d\bigr) - h(F(x)).$$
From (1.3.8), one immediately obtains the following result.

Lemma 1.3.1 Let $h : \mathbb{R}^m \to \mathbb{R}$ be convex and let $F : \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable. Then the function $f = h \circ F$ is everywhere directionally differentiable, and one has
$$(1.3.9)\qquad f'(x; d) = h'\bigl(F(x); F'(x)d\bigr) = \inf_{\lambda > 0} \frac{\Delta f(x; \lambda d)}{\lambda}.$$
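The following sketch (added; the map $F$, the base point, and the use of $h = \|\cdot\|$ are assumptions, not from the text) compares the two sides of (1.3.9) numerically. Since the Euclidean norm is differentiable at any nonzero point, $h'(F(x); F'(x)d)$ reduces there to $\langle F(x)/\|F(x)\|,\, F'(x)d\rangle$, which should match a one-sided difference quotient of $f = h \circ F$.

```python
import numpy as np

# Compare a one-sided difference quotient of f = ||F(.)|| with the formula
# h'(F(x); F'(x) d) = <F(x)/||F(x)||, F'(x) d>, valid where F(x) != 0.

def F(x):
    return np.array([x[0]**2 - 1.0, np.sin(x[1]), x[0] * x[1]])

def Fprime(x):                       # Jacobian of F
    return np.array([[2.0 * x[0], 0.0],
                     [0.0, np.cos(x[1])],
                     [x[1], x[0]]])

f = lambda x: np.linalg.norm(F(x))   # convex composite f = h o F

x = np.array([1.5, 0.7])
d = np.array([1.0, -2.0])

t = 1e-7
quotient = (f(x + t * d) - f(x)) / t                       # approximates f'(x; d)
formula = F(x) @ (Fprime(x) @ d) / np.linalg.norm(F(x))    # h'(F(x); F'(x) d)
print(quotient, formula)             # the two values should agree closely
```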

This result yields the following optimality condition for convex composite optimization problems.

Theorem 1.3.1 Let $h : \mathbb{R}^m \to \mathbb{R}$ be convex and $F : \mathbb{R}^n \to \mathbb{R}^m$ be continuously differentiable. If $\bar{x}$ is a local solution to the problem $\min\{h(F(x))\}$, then $d = 0$ is a global solution to the problem
$$(1.3.10)\qquad \min_{d \in \mathbb{R}^n} h\bigl(F(\bar{x}) + F'(\bar{x})d\bigr).$$

There are various ways to test condition (1.3.10). A few of these are given below.



Lemma 1.3.2 Let $h$ and $F$ be as in Theorem 1.3.1. The following conditions are equivalent:

(a) $d = 0$ is a global solution to (1.3.10).

(b) $0 \le h'\bigl(F(x); F'(x)d\bigr)$ for all $d \in \mathbb{R}^n$.

(c) $0 \le \Delta f(x; d)$ for all $d \in \mathbb{R}^n$.

Proof: The equivalence of (a) and (b) follows immediately from convexity. Indeed, this equivalence is the heart of the proof of Theorem 1.3.1. The equivalence of (b) and (c) is an immediate consequence of (1.3.9).

In the sequel, we say that $x \in \mathbb{R}^n$ satisfies the first-order condition for optimality for the convex composite optimization problem if it satisfies any of the three conditions (a)-(c) of Lemma 1.3.2.

1.3.1 A Note on Directional Derivatives

Recall that if $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, then the function $f'(x; d)$ is linear in $d$:
$$f'(x; \alpha d_1 + \beta d_2) = \alpha f'(x; d_1) + \beta f'(x; d_2).$$
If $f$ is only assumed to be convex and not necessarily differentiable, then $f'(x; \cdot)$ is sublinear and hence convex. Finally, if $f = h \circ F$ is convex composite with $h : \mathbb{R}^m \to \mathbb{R}$ convex and $F : \mathbb{R}^n \to \mathbb{R}^m$ continuously differentiable, then, by Lemma 1.3.1, $f'(x; \cdot)$ is also sublinear and hence convex. Moreover, the approximate directional derivative $\Delta f(x; d)$ satisfies
$$\frac{1}{\lambda_1} \Delta f(x; \lambda_1 d) \le \frac{1}{\lambda_2} \Delta f(x; \lambda_2 d) \quad \text{for } 0 < \lambda_1 \le \lambda_2,$$
by the non-decreasing nature of the difference quotients. Thus, in particular,
$$\Delta f(x; \lambda d) \le \lambda\, \Delta f(x; d) \quad \text{for all } \lambda \in [0, 1].$$
