CS 726: Nonlinear Optimization 1
Lecture 04 : Convexity and Continuity
Michael C. Ferris
Computer Sciences Department
University of Wisconsin-Madison
February 1 2021
Michael C. Ferris (UW-Madison) CS726:Lecture 04 Convexity and Continuity 1 / 15
Review I
Lots of things coming at you - for completeness and reference
Proofs are important, please review to ensure you understand what is
being done
Lecture 3: Introduced the notion of (strict and) strong convexity and
used this to prove a key theorem giving equivalences of strong
convexity when $f \in C^1$
Lecture 4A: please review lecture (slides and recording) at your
convenience. Key points:
Eigenvalues (decomposition: $AQ = Q\Lambda$) and singular values
(decomposition: $A = USV^T$)
Definition of psd/pd and equivalences
Lemma that bounds quadratic form using eigenvalues
Slides 16 onwards are additional reading and not critical for this
course - useful background material
Review II
where the last step follows from continuity: $\nabla f(x + \gamma p) - \nabla f(x) \to 0$ as $p \to 0$, for all $\gamma \in (0, 1)$.
As we will see throughout this text, a crucial quantity in optimization is the
Lipschitz constant $L$ for the gradient of $f$, which is defined to satisfy
\[ \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \quad \text{for all } x, y \in \operatorname{dom}(f). \tag{2.7} \]
We say that a continuously differentiable function $f$ with this property is $L$-smooth
or has $L$-Lipschitz gradients. We say that $f$ is $L_0$-Lipschitz if
\[ |f(x) - f(y)| \le L_0 \|x - y\|, \quad \text{for all } x, y \in \operatorname{dom}(f). \tag{2.8} \]
From (2.2), we have
\[ f(y) - f(x) - \nabla f(x)^T (y - x) = \int_0^1 [\nabla f(x + \gamma(y - x)) - \nabla f(x)]^T (y - x) \, d\gamma. \]
By using (2.7), we have
\[ [\nabla f(x + \gamma(y - x)) - \nabla f(x)]^T (y - x) \le \|\nabla f(x + \gamma(y - x)) - \nabla f(x)\| \, \|y - x\| \le L \gamma \|y - x\|^2. \]
By substituting this bound into the previous integral, we obtain the following
result.
Lemma 2.2 Given an $L$-smooth function $f$, we have for any $x, y \in \operatorname{dom}(f)$
that
\[ f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|^2. \tag{2.9} \]
Lemma 2.2 asserts that $f$ can be upper bounded by a quadratic function
whose value at $x$ is equal to $f(x)$.
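The bound (2.9) can be checked numerically. The sketch below is only an illustration under assumptions not in the slides: it picks the softplus function $f(x) = \log(1 + e^x)$, whose derivative is the logistic sigmoid and whose second derivative is at most $1/4$, so $L = 1/4$ is a valid Lipschitz constant for the gradient. The names `f`, `grad_f` and `upper_bound_gap` are hypothetical.

```python
import math
import random

def f(x):
    # Softplus log(1 + e^x), written in a form that avoids overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def grad_f(x):
    # Derivative of softplus: the logistic sigmoid, in a stable form.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Assumption for this example: f''(x) = s(x)(1 - s(x)) <= 1/4, so L = 1/4.
L = 0.25

def upper_bound_gap(x, y):
    """Right-hand side of (2.9) minus f(y); nonnegative when the bound holds."""
    quadratic = f(x) + grad_f(x) * (y - x) + 0.5 * L * (y - x) ** 2
    return quadratic - f(y)

rng = random.Random(0)
pairs = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(1000)]
min_gap = min(upper_bound_gap(x, y) for x, y in pairs)
```

The gap vanishes at $y = x$, which matches the remark that the quadratic upper bound takes the value $f(x)$ at $x$.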
When $f$ is twice continuously differentiable, we can characterize the con-
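Continuity I
Definition (Lipschitz Continuity)
A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be Lipschitz continuous on $\Omega$ if $\exists \rho > 0$
such that $\forall x, y \in \Omega$
\[ \|F(y) - F(x)\| \le \rho \|y - x\| \]
Definition (Uniform Continuity)
A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be uniformly continuous if
$\forall \epsilon > 0$, $\exists \delta > 0$, $\forall x$, $\forall y$:
\[ \|y - x\| < \delta \implies \|F(y) - F(x)\| < \epsilon \]
Example
$F(x) = \sqrt{x}$ $\forall x \in [0, 1]$ (uniformly continuous, but not Lipschitz continuous)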
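A small numerical sketch may help show why the square root on $[0, 1]$ is uniformly continuous but not Lipschitz continuous. It relies on two standard facts that are not stated on the slide: the slope between $0$ and $h$ equals $1/\sqrt{h}$, which is unbounded as $h \to 0$, while $|\sqrt{y} - \sqrt{x}| \le \sqrt{|y - x|}$, so $\delta = \epsilon^2$ works at every point. The helper names are hypothetical.

```python
import math

def difference_quotient(F, x, y):
    # Slope of the secant line of F between x and y.
    return abs(F(y) - F(x)) / abs(y - x)

# Slopes between 0 and h grow like 1/sqrt(h): no single rho can bound them.
hs = [10.0 ** (-k) for k in range(1, 9)]
slopes = [difference_quotient(math.sqrt, 0.0, h) for h in hs]

def uniform_bound_holds(x, y, tol=1e-12):
    # |sqrt(y) - sqrt(x)| <= sqrt(|y - x|), so delta = eps**2 works for all x, y.
    return abs(math.sqrt(y) - math.sqrt(x)) <= math.sqrt(abs(y - x)) + tol

grid = [i / 100.0 for i in range(101)]
bound_ok = all(uniform_bound_holds(x, y) for x in grid for y in grid)
```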
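Continuity II
In uniform continuity, unlike ordinary continuity, $\delta$ may depend only on $\epsilon$;
it cannot depend on the points $x$ and $y$ themselves.
Definition (Continuity)
A function $F : \mathbb{R}^n \to \mathbb{R}^m$ is said to be continuous if $\forall \epsilon > 0$, $\forall x$, $\exists \delta > 0$, $\forall y$:
\[ \|y - x\| < \delta \implies \|F(y) - F(x)\| < \epsilon \]
Example
$F(x) = x^{-1}$ $\forall x \in (0, 1)$ (continuous, but not uniformly continuous)
Fact
Lipschitz continuity $\implies$ Uniform continuity $\implies$ Continuity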
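To see why $1/x$ on $(0, 1)$ fails uniform continuity, one can exhibit, for any $\delta$, two points closer than $\delta$ whose images differ by $1$. The sketch below uses the pair $x = 1/n$, $y = 1/(n+1)$; this choice and the function name `witness` are assumptions made for illustration.

```python
def F(x):
    return 1.0 / x

def witness(delta):
    """Return (x, y) in (0, 1) with |x - y| < delta but |F(x) - F(y)| = 1."""
    n = 2
    # |1/n - 1/(n+1)| = 1/(n(n+1)); increase n until this drops below delta.
    while 1.0 / (n * (n + 1)) >= delta:
        n += 1
    return 1.0 / n, 1.0 / (n + 1)

examples = {delta: witness(delta) for delta in (1e-1, 1e-3, 1e-6)}
```

However small $\delta$ is, the image gap stays at $1$, so no $\delta$ can serve $\epsilon = 1$ at every point.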
Lipschitz Continuity
Lemma (Lipschitz continuity)
Consider the following statements:
(a) $\nabla f$ is Lipschitz continuous on $\Omega$ with constant $\rho$
(b) For all $x, y \in \Omega$, $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + (\rho/2) \|x - y\|^2$
(c) For all $x, y \in \Omega$, $\langle \nabla f(x) - \nabla f(y), x - y \rangle \le \rho \|x - y\|^2$
Then (a) implies (b) implies (c). If $f$ is twice continuously differentiable,
then (c) implies
(d) For all $x, y \in \Omega$, $\langle y - x, \nabla^2 f(z)(y - x) \rangle \le \rho \|y - x\|^2$
If $\langle y - x, \nabla^2 f(x)(y - x) \rangle \ge \sigma \|y - x\|^2$ for some $\sigma$ (not necessarily
positive), then (d) implies (a), possibly with a different constant $\rho$.
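Statements (b) and (c) do not require convexity, which a quick numerical sketch can illustrate. The example below assumes $f = \sin$ on the real line (not from the slides); its gradient $\cos$ is Lipschitz with $\rho = 1$, and its second derivative $-\sin$ is bounded below only by $-1$, a case where the lower bound in the lemma is negative.

```python
import math
import random

f, grad_f = math.sin, math.cos
rho = 1.0  # assumption for this example: |cos(x) - cos(y)| <= |x - y|

def holds_b(x, y, tol=1e-12):
    # (b): f(y) <= f(x) + <grad f(x), y - x> + (rho/2) |x - y|^2
    return f(y) <= f(x) + grad_f(x) * (y - x) + 0.5 * rho * (x - y) ** 2 + tol

def holds_c(x, y, tol=1e-12):
    # (c): <grad f(x) - grad f(y), x - y> <= rho |x - y|^2
    return (grad_f(x) - grad_f(y)) * (x - y) <= rho * (x - y) ** 2 + tol

rng = random.Random(1)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(2000)]
all_b = all(holds_b(x, y) for x, y in samples)
all_c = all(holds_c(x, y) for x, y in samples)
```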
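Proof.
(a) $\implies$ (b)
\begin{align*}
f(y) - f(x) - \langle \nabla f(x), y - x \rangle &= \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt \\
&\le \|y - x\| \int_0^1 \|\nabla f(x + t(y - x)) - \nabla f(x)\| \, dt \\
&\le \rho \|y - x\|^2 \int_0^1 t \, dt \\
&= \frac{\rho}{2} \|y - x\|^2
\end{align*}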
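Proof.
(b) $\implies$ (c) Invoking (b) twice gives
\begin{align*}
f(y) &\le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\rho}{2} \|x - y\|^2 \\
f(x) &\le f(y) + \langle \nabla f(y), x - y \rangle + \frac{\rho}{2} \|x - y\|^2
\end{align*}
Adding these inequalities gives
\[ f(y) + f(x) \le f(x) + f(y) + \langle \nabla f(x) - \nabla f(y), y - x \rangle + \rho \|x - y\|^2 \]
from which (c) follows.
(c) $\implies$ (d) It follows from (c) that
\[ \langle \nabla f(x + \lambda(y - x)) - \nabla f(x), \lambda(y - x) \rangle \le \rho \lambda^2 \|y - x\|^2 \]
If we divide both sides by $\lambda^2$, (d) then follows in the limit as $\lambda \to 0$.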
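The limiting step in (c) $\implies$ (d) can be written out as below. This is a sketch under the assumption that $x + \lambda(y - x)$ stays in $\Omega$ for small $\lambda > 0$, where $\lambda$ denotes the scalar step along $y - x$; it uses only the definition of the Hessian as the derivative of the gradient.

```latex
\[
\frac{1}{\lambda^2}
\big\langle \nabla f(x + \lambda(y - x)) - \nabla f(x),\; \lambda (y - x) \big\rangle
= \Big\langle \frac{\nabla f(x + \lambda(y - x)) - \nabla f(x)}{\lambda},\; y - x \Big\rangle
\;\longrightarrow\;
\big\langle \nabla^2 f(x)(y - x),\; y - x \big\rangle
\quad (\lambda \to 0),
\]
\[
\text{so that}\quad
\big\langle y - x,\; \nabla^2 f(x)(y - x) \big\rangle \le \rho \, \|y - x\|^2 .
\]
```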
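Proof.
(d) $\implies$ (a) Let $\bar\rho := 2 \max\{-\sigma, 0\} + \rho$. We first show that $\|\nabla^2 f(z)\| \le \bar\rho$. Note that
\[ \|\nabla^2 f(z)\| = \sup_{\|y\| = 1} \|\nabla^2 f(z) y\| = \sup_{\|x\| = 1, \|y\| = 1} \langle x, \nabla^2 f(z) y \rangle \]
However,
\begin{align*}
\langle x, \nabla^2 f(z) y \rangle &= \frac{1}{2} \langle x - y, \nabla^2 f(z)(y - x) \rangle + \frac{1}{2} \langle x, \nabla^2 f(z) x \rangle + \frac{1}{2} \langle y, \nabla^2 f(z) y \rangle \\
&\le \frac{1}{2} \{ -\sigma \|x - y\|^2 + \rho (\|x\|^2 + \|y\|^2) \} \\
&\le \frac{1}{2} \{ \max\{-\sigma, 0\} (\|x\|^2 + 2 \|x\| \|y\| + \|y\|^2) + \rho (\|x\|^2 + \|y\|^2) \}
\end{align*}
Hence, $\|\nabla^2 f(z)\| \le 2 \max\{-\sigma, 0\} + \rho$, as required. The Lipschitz continuity now follows easily since
\[ \|\nabla f(y) - \nabla f(x)\| = \left\| \int_0^1 \nabla^2 f(x + t(y - x))(y - x) \, dt \right\| \le \bar\rho \|y - x\| \]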
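The identity that splits $\langle x, Hy \rangle$ into three quadratic forms relies on the symmetry of the Hessian $H$. The following sketch checks it numerically on random symmetric $2 \times 2$ matrices, and shows that it can fail for a nonsymmetric matrix. The helper names and the restriction to two dimensions are assumptions made to keep the example short.

```python
import random

def matvec(H, v):
    return [H[0][0] * v[0] + H[0][1] * v[1], H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def identity_residual(H, x, y):
    """<x,Hy> minus (1/2)<x-y,H(y-x)> + (1/2)<x,Hx> + (1/2)<y,Hy>."""
    x_minus_y = [x[0] - y[0], x[1] - y[1]]
    y_minus_x = [y[0] - x[0], y[1] - x[1]]
    rhs = 0.5 * (dot(x_minus_y, matvec(H, y_minus_x))
                 + dot(x, matvec(H, x))
                 + dot(y, matvec(H, y)))
    return dot(x, matvec(H, y)) - rhs

rng = random.Random(2)
worst = 0.0
for _ in range(500):
    a, b, c = (rng.uniform(-3, 3) for _ in range(3))
    H = [[a, b], [b, c]]  # symmetric, possibly indefinite
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    worst = max(worst, abs(identity_residual(H, x, y)))

# For a nonsymmetric matrix the identity need not hold.
nonsym_residual = identity_residual([[0.0, 1.0], [0.0, 0.0]], [1.0, 0.0], [0.0, 1.0])
```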
Quadratic Bound Lemma
Lemma
If $F : \mathbb{R}^n \to \mathbb{R}^m$ is such that $DF$ is Lipschitz continuous on a convex set
$\Omega$, then
\[ \|F(y) - F(x) - DF(x)(y - x)\|_2 \le \frac{L}{2} \|y - x\|_2^2 \]
$\forall x, y \in \Omega$, where $L$ is a Lipschitz constant for $DF$ on $\Omega$.
Proof.
Problem in homework; see also [Wright and Recht(2020), Lemma 2.2].
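As a sanity check of the quadratic bound for a vector-valued map, the sketch below uses the hypothetical example $F(x) = (\sin x_1, \cos x_2)$ on $\mathbb{R}^2$. Its Jacobian is $\operatorname{diag}(\cos x_1, -\sin x_2)$, and since $\sin$ and $\cos$ are 1-Lipschitz, $L = 1$ is assumed to be a valid Lipschitz constant for $DF$ in the 2-norm.

```python
import math
import random

def F(x):
    return [math.sin(x[0]), math.cos(x[1])]

def DF_times(x, d):
    """Jacobian of F at x applied to d; DF(x) = diag(cos x1, -sin x2)."""
    return [math.cos(x[0]) * d[0], -math.sin(x[1]) * d[1]]

def norm2(v):
    return math.hypot(v[0], v[1])

L = 1.0  # assumed Lipschitz constant of DF for this particular F

def slack(x, y):
    """(L/2)||y - x||^2 minus the linearization error; >= 0 if the lemma holds."""
    d = [y[0] - x[0], y[1] - x[1]]
    Fx, Fy, Jd = F(x), F(y), DF_times(x, d)
    err = [Fy[0] - Fx[0] - Jd[0], Fy[1] - Fx[1] - Jd[1]]
    return 0.5 * L * norm2(d) ** 2 - norm2(err)

rng = random.Random(3)
points = [([rng.uniform(-4, 4), rng.uniform(-4, 4)],
           [rng.uniform(-4, 4), rng.uniform(-4, 4)]) for _ in range(2000)]
min_slack = min(slack(x, y) for x, y in points)
```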
Characterizations of Strictly/Strongly Convex Functions
Proposition
Suppose $\operatorname{dom} f$ is open and $f : \mathbb{R}^n \to \bar{\mathbb{R}}$ is twice continuously differentiable
over $\operatorname{dom} f$. Then:
(a) If $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is positive semi-definite (psd) $\forall x \in \operatorname{dom} f$,
then $f$ is convex.
(b) If $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is positive definite (pd) $\forall x \in \operatorname{dom} f$,
then $f$ is strictly convex.
(c) If $f$ is convex, then $\operatorname{dom} f$ is convex and $\nabla^2 f(x)$ is psd $\forall x \in \operatorname{dom} f$.
(d) If $f(x) = \frac{1}{2} x^T Q x + p^T x$ and $Q$ is symmetric, then $f$ is convex $\iff$
$Q$ is psd. Also, $f$ is strictly convex $\iff$ $Q$ is pd. (Note: Here $\operatorname{dom} f$
is all of $\mathbb{R}^n$, which is convex.)
Example
$f(x) = x^2$ is a nice example of this proposition.
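Part (d) reduces convexity of a quadratic to an eigenvalue test on $Q$. A minimal sketch for the $2 \times 2$ symmetric case follows, using the closed-form eigenvalues of $[[a, b], [b, c]]$. The function names and the tolerance `tol` are assumptions for illustration; a general implementation would call a linear algebra library instead.

```python
import math

def eigenvalues_sym2(Q):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = Q[0][0], Q[0][1], Q[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean - radius, mean + radius

def classify_quadratic(Q, tol=1e-12):
    """Classify f(x) = 0.5 x^T Q x + p^T x by the smallest eigenvalue of Q."""
    smallest, _ = eigenvalues_sym2(Q)
    if smallest > tol:
        return "strictly convex"   # Q is pd
    if smallest >= -tol:
        return "convex"            # Q is psd but singular
    return "not convex"            # Q has a negative eigenvalue

labels = [classify_quadratic(Q) for Q in (
    [[2.0, 0.0], [0.0, 2.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 2.0], [2.0, 1.0]],
)]
```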
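Example
$f(x) = -\ln(x)$ for $x > 0$. In this case, $\operatorname{dom} f = \{x \mid x > 0\}$ and
$\nabla^2 f(x) = \frac{1}{x^2} > 0$, so $f$ is strictly convex by part (b) of the proposition.
Example
Note that while $f(x) = \frac{1}{x^2}$ has a psd (and in fact pd) second derivative,
$\operatorname{dom} f$ is not convex ($\operatorname{dom} f = \mathbb{R} \setminus \{0\}$) and thus $f$ is not convex.
Figure: a visualization of the function $f(x) = \frac{1}{x^2}$, where the gap at $x = 0$ is clear.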
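The failure of convexity for $1/x^2$ can also be seen directly from the defining inequality $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$. The sketch below, with hypothetically chosen points, takes $x = -1$ and $y = 2$ on opposite sides of the gap and compares with a pair on the same side.

```python
def f(x):
    return 1.0 / x ** 2

def jensen_gap(x, y, theta):
    """theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y); >= 0 for convex f."""
    z = theta * x + (1 - theta) * y
    return theta * f(x) + (1 - theta) * f(y) - f(z)

# Points on opposite sides of 0: the midpoint 0.5 lies in dom f, yet the gap is negative.
gap_across = jensen_gap(-1.0, 2.0, 0.5)
# Points on the same side of 0: the inequality holds, as on any convex piece of dom f.
gap_same_side = jensen_gap(1.0, 3.0, 0.5)
```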
A related result is found as [Wright and Recht(2020), Lemma 2.3].
Lemma
Suppose $f$ is twice continuously differentiable on $\mathbb{R}^n$ and convex. Then
(a) $f$ is strongly convex with modulus of convexity $m$ if and only if
$\nabla^2 f(x) \succeq mI$ for all $x$.
(b) $\nabla f$ is Lipschitz continuous with Lipschitz constant $L$ if and only if
$\nabla^2 f(x) \preceq LI$ for all $x$.
Proof.
Statement (a) is proven as [Wright and Recht(2020), Lemma 2.5 (a)].
Statement (b) follows in a similar manner (to generate a lower bound
on eigenvalues) together with the implication (d) implies (a) in Lemma 7 (see
above).
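Both parts of the lemma can be illustrated on a one-dimensional example. The sketch assumes $f(x) = x^2 + \cos x$, which is not from the slides; its second derivative $2 - \cos x$ lies in $[1, 3]$, so $m = 1$ and $L = 3$. The code checks the Hessian bounds on a grid and the corresponding strong convexity and Lipschitz gradient inequalities on random pairs.

```python
import math
import random

def f(x):
    return x * x + math.cos(x)

def df(x):
    return 2.0 * x - math.sin(x)

def d2f(x):
    return 2.0 - math.cos(x)

m, L = 1.0, 3.0  # assumed bounds: m <= f''(x) <= L for this example

grid = [-10.0 + 0.01 * k for k in range(2001)]
hessian_ok = all(m - 1e-12 <= d2f(x) <= L + 1e-12 for x in grid)

rng = random.Random(4)
pairs = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(2000)]

# (a): f(y) >= f(x) + f'(x)(y - x) + (m/2)(y - x)^2
strong_ok = all(
    f(y) - f(x) - df(x) * (y - x) - 0.5 * m * (y - x) ** 2 >= -1e-9
    for x, y in pairs
)
# (b): |f'(y) - f'(x)| <= L |y - x|
lipschitz_ok = all(abs(df(y) - df(x)) <= L * abs(y - x) + 1e-12 for x, y in pairs)
```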
Note:
Proof of the above result can be found in [Singer(2014)].
Strong convexity $\implies$ strict convexity
If $f$ is differentiable ($C^1$), then strong convexity $\iff$
$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2} \|y - x\|^2$ for all $x, y$
Theorem
$f(x) = \frac{1}{2} x^T Q x + p^T x$. $f$ is strictly convex $\iff$ $f$ is strongly convex
$\iff$ $f$ is coercive.
Proof.
Exercise (put the pieces together from the last two lectures).
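The role of coercivity in the theorem can be illustrated by evaluating a quadratic along a ray. This sketch is not a proof; it compares a positive definite $Q$ with a singular psd $Q$, where the matrices, the vector $p$ and the direction $(0, 1)$ are hypothetical choices. With the singular $Q$ the function decreases without bound along the null direction, so it is not coercive.

```python
def quad(Q, p, x):
    """Evaluate f(x) = 0.5 x^T Q x + p^T x for 2x2 Q."""
    Qx = [Q[0][0] * x[0] + Q[0][1] * x[1], Q[1][0] * x[0] + Q[1][1] * x[1]]
    return 0.5 * (x[0] * Qx[0] + x[1] * Qx[1]) + p[0] * x[0] + p[1] * x[1]

p = [0.0, -1.0]
Q_pd = [[1.0, 0.0], [0.0, 1.0]]    # positive definite
Q_psd = [[1.0, 0.0], [0.0, 0.0]]   # psd but singular: (0, 1) is a null direction

ts = [1.0, 10.0, 100.0, 1000.0]
along_pd = [quad(Q_pd, p, [0.0, t]) for t in ts]    # 0.5 t^2 - t, grows without bound
along_psd = [quad(Q_psd, p, [0.0, t]) for t in ts]  # -t, decreases without bound
```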
Finally, the strength of strong convexity is apparent in the following result.
Theorem
Let $f$ be differentiable and strongly convex with modulus $m > 0$. Then the
minimizer $x^*$ of $f$ exists and is unique.
Proof.
See [Wright and Recht(2020), Theorem 2.6]. The differentiability
assumption is actually not needed; see, for example, the Wikipedia entry on
Convex Analysis/Strong convexity.
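As an illustration of uniqueness, the sketch below runs plain gradient descent from two different starting points on a strongly convex quadratic and observes that both runs reach the same point. The method, the data $Q = [[3, 1], [1, 2]]$, $p = (-1, -1)$, and the step size $0.25$ (below $1/L$ with $L \approx 3.62$, the largest eigenvalue of $Q$) are assumptions chosen for this example; the minimizer solves $Qx = -p$, giving $x^* = (0.2, 0.4)$.

```python
def grad(x):
    # Gradient Qx + p of f(x) = 0.5 x^T Q x + p^T x, Q = [[3, 1], [1, 2]], p = [-1, -1].
    return [3.0 * x[0] + 1.0 * x[1] - 1.0, 1.0 * x[0] + 2.0 * x[1] - 1.0]

def gradient_descent(x0, step=0.25, iters=200):
    """Fixed-step gradient descent; returns the final iterate."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [x[0] - step * g[0], x[1] - step * g[1]]
    return x

x_a = gradient_descent([10.0, -10.0])
x_b = gradient_descent([-5.0, 7.0])
```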
[Singer(2014)] Y. Singer. Advanced optimization. Lecture notes for AS 221, Harvard University, 2014.
[Wright and Recht(2020)] S. J. Wright and B. Recht. Optimization for Data Analysis. In proof, 2020.