CS 726: Nonlinear Optimization 1 Lecture 04: Convexity and Continuity

This document summarizes a lecture on convexity and continuity in nonlinear optimization. It defines Lipschitz continuity, uniform continuity, and continuity. It presents a lemma showing the relationships between different definitions of continuity and Lipschitz continuity of the gradient. It proves implications between statements about Lipschitz continuity of the gradient and properties of the Hessian. The document reviews key concepts to provide background for further lectures on nonlinear optimization.

CS 726: Nonlinear Optimization 1

Lecture 04 : Convexity and Continuity

Michael C. Ferris

Computer Sciences Department


University of Wisconsin-Madison

February 1 2021

Michael C. Ferris (UW-Madison) CS726:Lecture 04 Convexity and Continuity 1 / 15


Review I

Lots of things coming at you - for completeness and reference


Proofs are important, please review to ensure you understand what is
being done
Lecture 3: Introduced the notion of (strict and) strong convexity and
used this to prove a key theorem giving equivalences of strong
convexity when $f \in C^1$
Lecture 4A: please review lecture (slides and recording) at your
convenience. Key points:
Eigenvalues (decomposition: $AQ = Q\Lambda$) and singular values
(decomposition: $A = USV^T$)
Definition of psd/pd and equivalences
Lemma that bounds quadratic form using eigenvalues
Slides 16 onwards are additional reading and not critical for this
course - useful background material



Review II

where the last step follows from continuity: $\nabla f(x + \gamma p) - \nabla f(x) \to 0$ as $p \to 0$,
for all $\gamma \in (0, 1)$.
As we will see throughout this text, a crucial quantity in optimization is the
Lipschitz constant $L$ for the gradient of $f$, which is defined to satisfy

$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \quad \text{for all } x, y \in \operatorname{dom}(f). \tag{2.7}$$

We say that a continuously differentiable function $f$ with this property is
$L$-smooth or has $L$-Lipschitz gradients. We say that $f$ is $L_0$-Lipschitz if

$$|f(x) - f(y)| \le L_0 \|x - y\|, \quad \text{for all } x, y \in \operatorname{dom}(f). \tag{2.8}$$

From (2.2), we have

$$f(y) - f(x) - \nabla f(x)^T (y - x) = \int_0^1 [\nabla f(x + \gamma (y - x)) - \nabla f(x)]^T (y - x) \, d\gamma.$$

By using (2.7), we have

$$[\nabla f(x + \gamma (y - x)) - \nabla f(x)]^T (y - x) \le \|\nabla f(x + \gamma (y - x)) - \nabla f(x)\| \, \|y - x\| \le L \gamma \|y - x\|^2.$$

By substituting this bound into the previous integral, we obtain the following
result.

Lemma 2.2 Given an $L$-smooth function $f$, we have for any $x, y \in \operatorname{dom}(f)$ that

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2. \tag{2.9}$$

Lemma 2.2 asserts that f can be upper bounded by a quadratic function
whose value at x is equal to f (x).
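
The quadratic upper bound (2.9) can be checked numerically. The sketch below is only an illustration under stated assumptions: the test function (the one-dimensional softplus $\log(1 + e^x)$), the names `upper_bound_gap` and `min_gap`, and the sampling range are all chosen here and do not come from the lecture. The softplus has second derivative at most 1/4, so its gradient is Lipschitz with $L = 1/4$.

```python
import math
import random

def f(x):
    # Softplus log(1 + e^x), written to avoid overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def grad_f(x):
    # The derivative of softplus is the logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

L = 0.25  # f''(x) = s(x)(1 - s(x)) <= 1/4, so grad_f is (1/4)-Lipschitz

def upper_bound_gap(x, y):
    """Right-hand side of (2.9) minus f(y); nonnegative when the bound holds."""
    return f(x) + grad_f(x) * (y - x) + 0.5 * L * (y - x) ** 2 - f(y)

def min_gap(trials=10_000, seed=0):
    """Smallest gap observed over random pairs (x, y) in [-10, 10]^2."""
    rng = random.Random(seed)
    return min(upper_bound_gap(rng.uniform(-10, 10), rng.uniform(-10, 10))
               for _ in range(trials))
```

The gap is zero at $y = x$, which matches the remark that the quadratic majorant takes the value $f(x)$ at $x$.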
When $f$ is twice continuously differentiable, we can characterize the con-

Continuity I

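Definition (Lipschitz Continuity)

A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be Lipschitz continuous on $\Omega$ if $\exists \rho > 0$
such that $\forall x, y \in \Omega$

$$\|F(y) - F(x)\| \le \rho \|y - x\|$$

Definition (Uniform Continuity)

A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be uniformly continuous if
$\forall \epsilon > 0$, $\exists \delta > 0$, $\forall x$, $\forall y$:

$$\|y - x\| < \delta \implies \|F(y) - F(x)\| < \epsilon$$

Example
$F(x) = \sqrt{x}$ $\forall x \in [0, 1]$ is uniformly continuous but not Lipschitz continuous, since its slope is unbounded near $x = 0$.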
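
A small numerical illustration of the square-root example, offered as a sketch: the helper names `slope` and `holder_holds` are hypothetical, and the inequality $|\sqrt{y} - \sqrt{x}| \le \sqrt{|y - x|}$ used for uniform continuity is a standard fact rather than something stated on the slide. It gives $\delta = \epsilon^2$ independently of $x$, while the difference quotients at 0 grow without bound.

```python
import math

def slope(x, y):
    """Difference quotient |F(y) - F(x)| / |y - x| for F = sqrt (requires x != y)."""
    return abs(math.sqrt(y) - math.sqrt(x)) / abs(y - x)

# Slopes between 0 and h behave like 1/sqrt(h), so no single Lipschitz
# constant can work on [0, 1].
slopes = [slope(0.0, 10.0 ** (-k)) for k in range(1, 7)]

def holder_holds(x, y, tol=1e-12):
    """Check |sqrt(y) - sqrt(x)| <= sqrt(|y - x|), which yields delta = eps**2."""
    return abs(math.sqrt(y) - math.sqrt(x)) <= math.sqrt(abs(y - x)) + tol
```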



Continuity II

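In uniform continuity, unlike ordinary continuity, the $\delta$ that guarantees
$\|F(y) - F(x)\| < \epsilon$ may depend on $\epsilon$ but not on the points $x$ and $y$ themselves.

Definition (Continuity)
A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be continuous if $\forall \epsilon > 0$, $\forall x$, $\exists \delta > 0$, $\forall y$:

$$\|y - x\| < \delta \implies \|F(y) - F(x)\| < \epsilon$$

Example
$F(x) = x^{-1}$ $\forall x \in (0, 1)$ is continuous but not uniformly continuous.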
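
To see why $x^{-1}$ fails to be uniformly continuous on $(0, 1)$, one can fix a single $\delta$ and watch the change in $F$ grow as $x$ approaches 0. This is a minimal sketch; the function name `gap` and the particular value of `delta` are choices made here for illustration.

```python
def gap(x, delta):
    """|F(x) - F(x + delta)| for F(x) = 1/x, with x and x + delta in (0, 1)."""
    return abs(1.0 / x - 1.0 / (x + delta))

delta = 1e-3
# The same delta gives ever larger changes in F as x moves toward 0,
# so no delta can serve every x for a given epsilon.
gaps = [gap(10.0 ** (-k), delta) for k in range(1, 6)]
```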

Fact
Lipschitz continuity $\implies$ Uniform continuity $\implies$ Continuity



Lipschitz Continuity

Lemma (Lipschitz continuity)

Consider the following statements:
(a) $\nabla f$ is Lipschitz continuous on $\Omega$ with constant $\rho$
(b) For all $x, y \in \Omega$, $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + (\rho/2) \|x - y\|^2$
(c) For all $x, y \in \Omega$, $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \rho \|x - y\|^2$
Then (a) implies (b) implies (c). If $f$ is twice continuously differentiable,
then (c) implies
(d) For all $x, y \in \Omega$, $\langle y - x, \nabla^2 f(z)(y - x) \rangle \le \rho \|y - x\|^2$
If $\langle y - x, \nabla^2 f(x)(y - x) \rangle \ge \gamma \|y - x\|^2$ for some $\gamma$ (not necessarily
positive), then (d) implies (a), possibly with a different constant $\rho$.
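
Statements (a), (b) and (c) do not require convexity. The sketch below checks them on a nonconvex test function, $f(x) = \sin(x_1) + \cos(x_2)$ on $\mathbb{R}^2$, which is an assumption made here for illustration; its Hessian is $\mathrm{diag}(-\sin x_1, -\cos x_2)$, so $\rho = 1$ works. The helper names are hypothetical.

```python
import math
import random

RHO = 1.0  # the Hessian diag(-sin x1, -cos x2) has norm at most 1

def f(x):
    return math.sin(x[0]) + math.cos(x[1])

def grad(x):
    return (math.cos(x[0]), -math.sin(x[1]))

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def check(x, y, tol=1e-12):
    """Return the truth values of (a), (b), (c) at the pair (x, y)."""
    d = sub(y, x)                 # y - x
    g = sub(grad(y), grad(x))     # grad f(y) - grad f(x)
    sq = dot(d, d)                # ||y - x||^2
    a = math.sqrt(dot(g, g)) <= RHO * math.sqrt(sq) + tol
    b = f(y) <= f(x) + dot(grad(x), d) + 0.5 * RHO * sq + tol
    # <grad f(x) - grad f(y), x - y> equals <g, d> because both signs flip.
    c = dot(g, d) <= RHO * sq + tol
    return a, b, c

def all_hold(trials=5000, seed=1):
    rng = random.Random(seed)
    pt = lambda: (rng.uniform(-5, 5), rng.uniform(-5, 5))
    return all(all(check(pt(), pt())) for _ in range(trials))
```

Note that (b) and (c) are one-sided: for this nonconvex $f$ the inner product in (c) can be negative, which the inequality allows.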



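Proof.
(a) $\implies$ (b)

$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle = \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt$$
$$\le \|y - x\| \int_0^1 \|\nabla f(x + t(y - x)) - \nabla f(x)\| \, dt$$
$$\le \rho \|y - x\|^2 \int_0^1 t \, dt$$
$$= \frac{\rho}{2} \|y - x\|^2$$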



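Proof.
(b) $\implies$ (c) Invoking (b) twice gives

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\rho}{2} \|x - y\|^2$$
$$f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{\rho}{2} \|x - y\|^2$$

Adding these inequalities gives

$$f(y) + f(x) \le f(x) + f(y) + \langle \nabla f(x) - \nabla f(y), y - x \rangle + \rho \|x - y\|^2$$

from which (c) follows after cancelling $f(x) + f(y)$ and moving the inner product to the left-hand side.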


(c) $\implies$ (d) It follows from (c) that

$$\langle \nabla f(x + \lambda (y - x)) - \nabla f(x), \lambda (y - x) \rangle \le \rho \lambda^2 \|y - x\|^2$$

If we divide both sides by $\lambda^2$, (d) then follows in the limit as $\lambda \to 0$.



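Proof.
(d) $\implies$ (a) Let $\gamma$ be the constant in the assumed lower bound $\langle y - x, \nabla^2 f(x)(y - x) \rangle \ge \gamma \|y - x\|^2$, and let $\bar{\rho} := 2 \max\{-\gamma, 0\} + \rho$. We first show that $\|\nabla^2 f(z)\| \le \bar{\rho}$. Note that

$$\|\nabla^2 f(z)\| = \sup_{\|y\| = 1} \|\nabla^2 f(z) y\| = \sup_{\|x\| = 1, \|y\| = 1} \langle x, \nabla^2 f(z) y \rangle$$

However,

$$\langle x, \nabla^2 f(z) y \rangle = \frac{1}{2} \langle x - y, \nabla^2 f(z)(y - x) \rangle + \frac{1}{2} \langle x, \nabla^2 f(z) x \rangle + \frac{1}{2} \langle y, \nabla^2 f(z) y \rangle$$
$$\le \frac{1}{2} \{ -\gamma \|x - y\|^2 + \rho (\|x\|^2 + \|y\|^2) \}$$
$$\le \frac{1}{2} \{ \max\{-\gamma, 0\} (\|x\|^2 + 2 \|x\| \|y\| + \|y\|^2) + \rho (\|x\|^2 + \|y\|^2) \}$$

Hence, taking $\|x\| = \|y\| = 1$, $\|\nabla^2 f(z)\| \le 2 \max\{-\gamma, 0\} + \rho$, as required. The Lipschitz continuity now follows easily since

$$\|\nabla f(y) - \nabla f(x)\| = \left\| \int_0^1 \nabla^2 f(x + t(y - x))(y - x) \, dt \right\| \le \bar{\rho} \|y - x\|$$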

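
The constant built in the proof of (d) $\implies$ (a) can be examined on a quadratic $f(x) = \frac{1}{2} x^T Q x$, whose Hessian is the fixed symmetric matrix $Q$. The sketch below is illustrative only. It assumes a $2 \times 2$ matrix so the eigenvalues have a closed form, and it writes `gamma` for the lower curvature bound (smallest eigenvalue), `rho` for the upper bound in (d) (largest eigenvalue) and `rho_bar` for $2 \max\{-\gamma, 0\} + \rho$; these labels are chosen here.

```python
import math

def sym_eigs(a, b, c):
    """Eigenvalues (min, max) of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    rad = math.hypot(0.5 * (a - c), b)
    return mean - rad, mean + rad

def constants(a, b, c):
    """Return (gamma, rho, spectral norm, rho_bar) for Q = [[a, b], [b, c]]."""
    gamma, rho = sym_eigs(a, b, c)          # gamma*|d|^2 <= <d, Qd> <= rho*|d|^2
    norm = max(abs(gamma), abs(rho))        # true Lipschitz constant of x -> Qx
    rho_bar = 2.0 * max(-gamma, 0.0) + rho  # constant obtained in the proof
    return gamma, rho, norm, rho_bar
```

For an indefinite matrix such as `constants(1.0, 2.0, -3.0)` the proof's constant is valid but not tight; when $Q$ is positive semidefinite, `rho_bar` coincides with the spectral norm.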


Quadratic Bound Lemma

Lemma
If $F : \mathbb{R}^n \to \mathbb{R}^m$ is such that $DF$ is Lipschitz continuous on a convex set
$\Omega$, then

$$\|F(y) - F(x) - DF(x)(y - x)\|_2 \le \frac{L}{2} \|y - x\|_2^2$$

$\forall x, y \in \Omega$, where $L$ is a Lipschitz constant for $DF$ on $\Omega$.

Proof.
Problem in homework and also [Wright and Recht(2020), Lemma 2.2].
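
A vector-valued instance of the quadratic bound can be checked by hand or in code. The map $F(x) = (x_1^2, x_1 x_2)$ below is an assumption made for illustration, as are the helper names. Its Jacobian is $[[2x_1, 0], [x_2, x_1]]$, and $\|DF(y) - DF(x)\|_2 \le \|DF(y) - DF(x)\|_F = \sqrt{5 d_1^2 + d_2^2} \le \sqrt{5} \|d\|$ with $d = y - x$, so $L = \sqrt{5}$ is a valid (not necessarily smallest) Lipschitz constant.

```python
import math

L = math.sqrt(5.0)  # Lipschitz constant for DF via the Frobenius-norm bound

def F(x):
    return (x[0] * x[0], x[0] * x[1])

def DF_times(x, d):
    """Jacobian [[2*x1, 0], [x2, x1]] at x applied to the vector d."""
    return (2.0 * x[0] * d[0], x[1] * d[0] + x[0] * d[1])

def bound_holds(x, y, tol=1e-9):
    """Check ||F(y) - F(x) - DF(x)(y - x)|| <= (L/2) ||y - x||^2."""
    d = (y[0] - x[0], y[1] - x[1])
    Fx, Fy, Jd = F(x), F(y), DF_times(x, d)
    r = (Fy[0] - Fx[0] - Jd[0], Fy[1] - Fx[1] - Jd[1])
    return math.hypot(r[0], r[1]) <= 0.5 * L * (d[0] ** 2 + d[1] ** 2) + tol
```

For this $F$ the remainder is exactly $(d_1^2, d_1 d_2)$, whose norm $|d_1| \, \|d\|$ is at most $\|d\|^2$.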



Characterizations of Strictly/Strongly Convex Functions
Proposition
Suppose $\operatorname{dom} f$ is open and $f : \mathbb{R}^n \to \bar{\mathbb{R}}$ is twice continuously differentiable
over $\operatorname{dom} f$. Then:
(a) If $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is positive semi-definite (psd) $\forall x \in \operatorname{dom} f$,
then $f$ is convex.
(b) If $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is positive definite (pd) $\forall x \in \operatorname{dom} f$,
then $f$ is strictly convex.
(c) If $f$ is convex, then $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is psd $\forall x \in \operatorname{dom} f$.
(d) If $f(x) = \frac{1}{2} x^T Q x + p^T x$ and $Q$ is symmetric, then $f$ is convex $\iff$
$Q$ is psd. Also, $f$ is strictly convex $\iff$ $Q$ is pd. (Note: Here $\operatorname{dom} f$
is all of $\mathbb{R}^n$, which is convex.)

Example
$f(x) = x^2$ is a nice example of this proposition.



Example
$f(x) = -\ln(x)$ for $x > 0$. In this case, $\operatorname{dom} f = \{x \mid x > 0\}$ and
$\nabla^2 f(x) = \frac{1}{x^2} > 0$.

Example
Note that while $f(x) = \frac{1}{x^2}$ has a psd (and in fact pd) second derivative,
$\operatorname{dom} f$ is not convex ($\operatorname{dom} f = \mathbb{R} \setminus \{0\}$) and thus $f$ is not convex.

[Figure: A visualization of the function $f(x) = \frac{1}{x^2}$, where the gap at $x = 0$ is clear.]
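
The failure of convexity for $1/x^2$ can be made concrete with the extended-value convention, where $f$ is set to $+\infty$ outside its domain. The sketch below assumes that convention; the function name `violates_convexity` is hypothetical.

```python
import math

def f(x):
    """f(x) = 1/x**2, extended by +infinity at x = 0 (outside dom f)."""
    return math.inf if x == 0 else 1.0 / (x * x)

def violates_convexity(x, y, lam):
    """True if f(lam*x + (1-lam)*y) > lam*f(x) + (1-lam)*f(y)."""
    z = lam * x + (1.0 - lam) * y
    return f(z) > lam * f(x) + (1.0 - lam) * f(y)
```

Taking points on opposite sides of the gap, for instance `violates_convexity(-1.0, 1.0, 0.5)`, breaks the convexity inequality, whereas pairs on the same side of 0 satisfy it.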



A related result is found as [Wright and Recht(2020), Lemma 2.3].

Lemma
Suppose $f$ is twice continuously differentiable on $\mathbb{R}^n$ and convex. Then
(a) $f$ is strongly convex with modulus of convexity $m$ if and only if
$\nabla^2 f(x) \succeq mI$ for all $x$.
(b) $\nabla f$ is Lipschitz continuous with Lipschitz constant $L$ if and only if
$\nabla^2 f(x) \preceq LI$ for all $x$.

Proof.
Statement (a) is proven as [Wright and Recht(2020), Lemma 2.5 (a)].
The statement (b) follows in a similar manner (to generate a lower bound
on eigenvalues) and the implication (d) implies (a) in Lemma 7 (see
above).
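
Together, the two Hessian bounds sandwich the first-order Taylor remainder between $\frac{m}{2} \|y - x\|^2$ and $\frac{L}{2} \|y - x\|^2$. The sketch below checks this for a one-dimensional test function chosen here for illustration, $f(x) = x^2 + \cos(x)$, for which $f''(x) = 2 - \cos(x) \in [1, 3]$, so $m = 1$ and $L = 3$. The helper names are hypothetical.

```python
import math

M, L = 1.0, 3.0  # f''(x) = 2 - cos(x) lies in [1, 3]

def f(x):
    return x * x + math.cos(x)

def df(x):
    return 2.0 * x - math.sin(x)

def remainder(x, y):
    """f(y) - f(x) - f'(x)(y - x), the first-order Taylor remainder."""
    return f(y) - f(x) - df(x) * (y - x)

def sandwiched(x, y, tol=1e-9):
    """Check (m/2)(y-x)^2 <= remainder <= (L/2)(y-x)^2 up to rounding."""
    half_sq = 0.5 * (y - x) ** 2
    return M * half_sq - tol <= remainder(x, y) <= L * half_sq + tol
```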



Note:

Proof of the above result can be found in [Singer(2014)].


Strong convexity $\implies$ strict convexity
If $f$ is differentiable ($C^1$), then strong convexity $\iff$
$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2$ for all $x, y$

Theorem
Let $f(x) = \frac{1}{2} x^T Q x + p^T x$. Then $f$ is strictly convex $\iff$ $f$ is strongly convex
$\iff$ $f$ is coercive.

Proof.
Exercise (put the pieces together from the last two lectures).
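
A small worked case may help fix ideas before attempting the exercise. The matrices below are chosen here purely for illustration: with a psd but singular $Q$, all three properties fail together, and with $Q = I$ all three hold together.

```latex
% Illustrative case: Q = diag(1, 0), p = 0 (psd but not pd).
f(x) = \tfrac{1}{2}\, x^T \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} x = \tfrac{1}{2} x_1^2,
\qquad
f(0, t) = 0 \quad \text{for all } t \in \mathbb{R}.
% f is constant along the x_2-axis, so it is not strictly convex,
% not strongly convex, and not coercive (f(0, t) stays at 0 as |t| grows).
% With Q = I instead, f(x) = \tfrac{1}{2}\|x\|^2 is strictly convex,
% strongly convex with modulus m = 1, and coercive.
```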



Finally, the strength of strong convexity is apparent in the following result.
Theorem
Let $f$ be differentiable and strongly convex with modulus $m > 0$. Then the
minimizer $x^*$ of $f$ exists and is unique.

Proof.
See [Wright and Recht(2020), Theorem 2.6]. The differentiability
assumption is actually not needed; see, for example, the Wikipedia entry on
Convex Analysis/Strong convexity.
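
Uniqueness of the minimizer can be observed numerically on a strongly convex quadratic $f(x) = \frac{1}{2} x^T Q x + p^T x$. The sketch below is an illustration, not part of the lecture: the matrix `Q`, vector `p`, step size and function names are all assumptions made here, and plain gradient descent is used only as a convenient way to reach the minimizer from different starting points.

```python
def grad(x, Q, p):
    """Gradient Qx + p of f(x) = 0.5 x^T Q x + p^T x for a 2x2 symmetric Q."""
    return (Q[0][0] * x[0] + Q[0][1] * x[1] + p[0],
            Q[1][0] * x[0] + Q[1][1] * x[1] + p[1])

def gradient_descent(x0, Q, p, step, iters=500):
    """Fixed-step gradient descent started from x0."""
    x = x0
    for _ in range(iters):
        g = grad(x, Q, p)
        x = (x[0] - step * g[0], x[1] - step * g[1])
    return x

def exact_minimizer(Q, p):
    """Solve Qx = -p by Cramer's rule (Q positive definite, so det > 0)."""
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    return ((-p[0] * Q[1][1] + p[1] * Q[0][1]) / det,
            (-p[1] * Q[0][0] + p[0] * Q[1][0]) / det)

Q = ((2.0, 1.0), (1.0, 3.0))   # eigenvalues (5 +/- sqrt(5))/2, both positive
p = (-1.0, 1.0)
STEP = 0.25                    # 1/L with L = 4, an upper bound on the largest eigenvalue
```

Running `gradient_descent` from several different starting points with these values returns, up to rounding, the same point as `exact_minimizer(Q, p)`.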



[Singer(2014)] Y. Singer. Advanced optimization. Lecture notes for AS 221, Harvard University, 2014.
[Wright and Recht(2020)] S. J. Wright and B. Recht. Optimization for Data Analysis. In proof, 2020.
