Some Special Class of Functions in Optimization: Convex, Lipschitz, Strongly Convex

The document summarizes some key concepts in optimization:
- Convex functions have certain properties like Jensen's inequality and their epigraph is convex.
- Strongly convex functions have an additional quadratic term in their properties compared to convex functions.
- Lipschitz continuity means a function's slope is bounded, and Lipschitz gradient means the gradient's slope is bounded. These relate a function to linear or quadratic bounds.

Some special class of functions in optimization:

convex, Lipschitz, strongly convex

Andersen Ang

Mathématique et recherche opérationnelle


UMONS, Belgium
Homepage: angms.science

First draft: June 6, 2017


Last update: November 4, 2020
Overview

1 Convex function

2 α-strongly convex function

3 Lipschitz continuity, Lipschitz gradient and Lipschitz Hessian

4 Summary

Convex function
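A function $f : \operatorname{dom} f \to \mathbb{R}$ is convex if:
- $\operatorname{dom} f$ is a convex set, and
- for all $x, y \in \operatorname{dom} f$, $f$ satisfies one of the following:
  - Jensen's inequality: for all $\lambda \in [0, 1]$,
    $$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y).$$
  - The gradient of $f$ is monotone:
    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge 0.$$
  - The 1st-order Taylor approximation at the point $x$ is a global under-estimator:
    $$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle.$$
  - The epigraph of $f$ is a convex set.
- $f$ is strictly convex if $\le, \ge$ become $<, >$ (i.e. strict inequality) for $x \ne y$ and $\lambda \in (0, 1)$.
- The 4 definitions are equivalent (the two gradient conditions assume $f$ is differentiable): you can move from one definition to another as "if and only if". See optimization books for the proof of equivalence between these 4 definitions.
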
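The three inequality-based definitions can be spot-checked numerically. The following is a minimal sketch, not a proof: it assumes log-sum-exp as the example convex function (a choice made here, not taken from the slides) and samples random points. The helper names `dot`, `sub` and `convexity_violations` are hypothetical.

```python
import math
import random

def f(x):
    # Assumed example: log-sum-exp, a standard smooth convex function on R^n.
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def grad_f(x):
    # The gradient of log-sum-exp is the softmax of x.
    m = max(x)
    w = [math.exp(xi - m) for xi in x]
    s = sum(w)
    return [wi / s for wi in w]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def convexity_violations(trials=1000, dim=3, tol=1e-12, seed=0):
    """Count random samples that violate any of the three inequalities."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        y = [rng.uniform(-5, 5) for _ in range(dim)]
        lam = rng.random()
        z = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
        # Jensen's inequality
        jensen = f(z) <= lam * f(x) + (1 - lam) * f(y) + tol
        # Monotone gradient
        monotone = dot(sub(x, y), sub(grad_f(x), grad_f(y))) >= -tol
        # 1st-order Taylor approximation is a global under-estimator
        taylor = f(y) >= f(x) + dot(grad_f(x), sub(y, x)) - tol
        if not (jensen and monotone and taylor):
            bad += 1
    return bad

print(convexity_violations())
```

A count of zero is only evidence on the sampled points; the small tolerance `tol` absorbs floating-point rounding.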
Convexity: the geometry of Jensen’s inequality
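$f : \operatorname{dom} f \to \mathbb{R}$ is convex if:
(1) $\operatorname{dom} f$ is a convex set, and
(2) for all $x, y \in \operatorname{dom} f$ and $\lambda \in [0, 1]$: $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$.

[Figure: graph of $f(x)$ for $x$ from $-2$ to $1$, marking the point $\lambda x + (1-\lambda) y$, the function value $f(\lambda x + (1-\lambda) y)$ and the chord value $\lambda f(x) + (1-\lambda) f(y)$.]
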
Convexity: the geometry of 1st-order Taylor approximation
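$f : \operatorname{dom} f \to \mathbb{R}$ is convex if:
(1) $\operatorname{dom} f$ is a convex set, and
(2) for all $x, y \in \operatorname{dom} f$: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$.

[Figure: graph of $f(y)$ for $y$ from $-4$ to $2$, together with the tangent line $f(-1) + \nabla f(-1)(y - (-1))$.]
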
α-strongly convex function
A function $f : \operatorname{dom} f \to \mathbb{R}$ is $\alpha$-strongly convex if:
- $\operatorname{dom} f$ is a convex set, and
- for all $x, y \in \operatorname{dom} f$, $f$ satisfies one of the following, with $\alpha > 0$:
  - Jensen's inequality with an additional quadratic term: for all $\lambda \in [0, 1]$,
    $$f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|_2^2.$$
  - $\nabla f$ is monotone with an additional quadratic term:
    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \alpha\|x - y\|_2^2 \ge 0.$$
  - The 1st-order Taylor approximation at the point $x$ is a global under-estimator with an additional quadratic term:
    $$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\alpha}{2}\|x - y\|_2^2,$$
    or we say $f$ is lower bounded by a quadratic function.
  - The function $f(x) - \frac{\alpha}{2}\|x\|_2^2$ is convex.
  - If $f$ is twice differentiable: $\nabla^2 f(x) \succeq \alpha I$ for all $x \in \operatorname{dom} f$.
- These definitions are equivalent.

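For a quadratic $f(x) = \frac{1}{2} x^\top A x$ with symmetric positive definite $A$, the Hessian is $A$, so the Hessian condition suggests $\alpha = \lambda_{\min}(A)$. The sketch below checks the three inequalities on random points for one assumed $2 \times 2$ matrix (the matrix and all helper names are illustrative choices, not from the slides).

```python
import math
import random

A = [[2.0, 0.5], [0.5, 1.0]]  # assumed symmetric positive definite example

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def matvec(M, v):
    return [dot(row, v) for row in M]

def f(x):
    return 0.5 * dot(x, matvec(A, x))

def grad_f(x):
    return matvec(A, x)

def min_eig_2x2(M):
    # Closed-form smallest eigenvalue of a symmetric 2x2 matrix.
    a, b, d = M[0][0], M[0][1], M[1][1]
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)

def strong_convexity_holds(alpha, trials=1000, tol=1e-9, seed=0):
    """Return True if no sampled pair violates the alpha-strong convexity inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(2)]
        y = [rng.uniform(-3, 3) for _ in range(2)]
        lam = rng.random()
        z = [lam * xi + (1 - lam) * yi for xi, yi in zip(x, y)]
        d = sub(x, y)
        sq = dot(d, d)  # ||x - y||_2^2
        jensen = f(z) <= lam * f(x) + (1 - lam) * f(y) - 0.5 * alpha * lam * (1 - lam) * sq + tol
        monotone = dot(d, sub(grad_f(x), grad_f(y))) >= alpha * sq - tol
        taylor = f(y) >= f(x) + dot(grad_f(x), sub(y, x)) + 0.5 * alpha * sq - tol
        if not (jensen and monotone and taylor):
            return False
    return True

alpha = min_eig_2x2(A)
print(alpha, strong_convexity_holds(alpha), strong_convexity_holds(3.0))
```

The value 3.0 exceeds the largest eigenvalue of this $A$, so the inequalities fail there; any $\alpha$ up to the smallest eigenvalue passes.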
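Equivalence between definitions of strong convexity
We show $\nabla^2 f(x) \succeq \alpha I \implies \langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \alpha\|x - y\|_2^2$, with $\alpha > 0$.

First recall from calculus $G(b) - G(a) = \int_a^b g(\theta)\,d\theta$. Next, a smart step: let $\theta = y + \tau(x - y)$, then $d\theta = (x - y)\,d\tau$. Taking the integral range from 0 to 1 for $\tau$, with $G$ being $\nabla f$ and $g$ being $\nabla^2 f$, this gives
$$\nabla f(x) - \nabla f(y) = \int_0^1 \nabla^2 f\big(y + \tau(x - y)\big)(x - y)\,d\tau.$$
(The left hand side is a vector; the right hand side is a matrix-vector product, also a vector.)

Take the inner product with $x - y$ on both sides of the equation:
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle = \Big\langle x - y, \int_0^1 \nabla^2 f\big(y + \tau(x - y)\big)(x - y)\,d\tau \Big\rangle.$$

By $\nabla^2 f(x) \succeq \alpha I$ for all $x$, we have $\nabla^2 f\big(y + \tau(x - y)\big) \succeq \alpha I$, i.e. $\langle v, \nabla^2 f(\cdot)\,v \rangle \ge \alpha\|v\|_2^2$ for every vector $v$. Applying this with $v = x - y$ inside the integral gives
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \Big\langle x - y, \int_0^1 \alpha(x - y)\,d\tau \Big\rangle = \alpha\|x - y\|_2^2.$$
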
α-strongly convex: the geometry of the lower bound
$f : \operatorname{dom} f \to \mathbb{R}$ is $\alpha$-strongly convex if
(1) $\operatorname{dom} f$ is a convex set, and
(2) for all $x, y \in \operatorname{dom} f$: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\alpha}{2}\|x - y\|_2^2$.

[Figure: graph of $f(y)$ for $y$ from $-4$ to $2$, together with the quadratic lower bound $f(-1) + \nabla f(-1)(y - (-1)) + \frac{\alpha}{2}\|y - (-1)\|_2^2$ and the tangent line $f(-1) + \nabla f(-1)(y - (-1))$.]

Interpretation: $f$ is lower bounded by a quadratic curve with some curvature, which is in turn lower bounded by the 1st-order Taylor approximation (zero curvature) $\implies$ $f$ is not "too flat" (at least not "as flat as" the lower bound). In other words: $f$ has at least an $\alpha$-amount of "bumpiness".

Lipschitz continuity
A function $f : \operatorname{dom} f \to \mathbb{R}$ is Lipschitz if there exists a constant $L \ge 0$ (the Lipschitz constant) such that, for any two points $x, y \in \operatorname{dom} f$,
$$|f(x) - f(y)| \le L\|x - y\|.$$
- Re-arranging gives $\frac{|f(x) - f(y)|}{\|x - y\|} \le L$ for $x \ne y$, which is approximately the magnitude of the gradient when $x, y$ are close $\implies$ $f$ being Lipschitz means the "slope" (rate of change) of $f$ is bounded above by a global constant $L$.
- Removing the absolute value sign:
  $$f(x) \le f(y) + L\|x - y\|, \qquad f(x) \ge f(y) - L\|x - y\|,$$
  meaning that, for all $x$, $f$ is bounded above and below by functions that grow linearly in $\|x - y\|$.

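The "bounded slope" reading can be illustrated by sampling difference quotients. This is a rough sketch under assumed examples: $\sin$ (Lipschitz with $L = 1$) and $x^2$ (not globally Lipschitz, since its slope grows with the interval). The function name `max_slope` is hypothetical, and sampling only gives a lower estimate of the true Lipschitz constant.

```python
import math
import random

def max_slope(f, lo, hi, trials=10000, seed=0):
    """Largest sampled value of |f(x) - f(y)| / |x - y| on [lo, hi]."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

def square(x):
    return x * x

# sin: sampled slopes stay (up to rounding) below L = 1.
print(max_slope(math.sin, -4.0, 4.0))
# x^2: the sampled slope |x + y| grows as the interval widens.
print(max_slope(square, -10.0, 10.0), max_slope(square, -100.0, 100.0))
```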
The geometry of Lipschitz continuity
A function being Lipschitz means it has no sharp changes anywhere: for every $x$, the graph of $f$ lies entirely outside a cone which is modeled by the two linear bounds on the previous page.

[Figure: graph of $f(x)$ for $x$ from $-4$ to $4$, vertical axis from $-10$ to $10$.]

Important note: this property is global; such a cone exists at every point on $f$, i.e. the cone can "slide" along the curve and the argument still holds.

Lipschitz continuous gradient
A function $f : \operatorname{dom} f \to \mathbb{R}$ is smooth if there exists a constant $L \ge 0$ such that, for any two points $x, y \in \operatorname{dom} f$,
$$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|.$$
- This assumes $f$ is differentiable.
- An $L$-smooth $f$ is also said to have an $L$-Lipschitz gradient.
- $f$ being $L$-smooth is equivalent to
  $$\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| \le \frac{L}{2}\|y - x\|_2^2.$$
  Removing the absolute value sign:
  $$\begin{cases} f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|_2^2 \\ f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle - \frac{L}{2}\|y - x\|_2^2 \end{cases}$$
  meaning that $f$ is bounded above and below by a quadratic function.

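Both the Lipschitz-gradient inequality and the two-sided quadratic bound can be checked on a concrete function. The sketch below assumes the one-dimensional softplus function $f(x) = \log(1 + e^x)$ as an example (not from the slides); its derivative is the sigmoid and its second derivative is at most $1/4$, so $L = 1/4$ is the value tested. The name `smoothness_holds` is hypothetical.

```python
import math
import random

def softplus(t):
    # Numerically stable log(1 + exp(t)).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sigmoid(t):
    # Derivative of softplus, written to avoid overflow.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def smoothness_holds(L, trials=10000, lo=-10.0, hi=10.0, tol=1e-12, seed=0):
    """Return True if no sampled pair violates either L-smoothness inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(lo, hi)
        y = rng.uniform(lo, hi)
        # Lipschitz gradient: |f'(x) - f'(y)| <= L |x - y|
        grad_ok = abs(sigmoid(x) - sigmoid(y)) <= L * abs(x - y) + tol
        # Quadratic bound: |f(y) - f(x) - f'(x)(y - x)| <= (L/2) (y - x)^2
        gap = softplus(y) - softplus(x) - sigmoid(x) * (y - x)
        quad_ok = abs(gap) <= 0.5 * L * (y - x) ** 2 + tol
        if not (grad_ok and quad_ok):
            return False
    return True

print(smoothness_holds(0.25), smoothness_holds(0.1))
```

With a constant smaller than $1/4$, pairs sampled near the origin violate the gradient inequality, which is why the second call is expected to fail.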
Equivalent definitions of L-smooth function
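A function $f$ is $L$-smooth if
- $\nabla f$ is $L$-Lipschitz with Lipschitz constant $L \ge 0$, i.e. for all $x, y \in \operatorname{dom} f$ we have
  $$\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|.$$
- $f$ is bounded by a quadratic function with $L > 0$:
  $$\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| \le \frac{L}{2}\|y - x\|_2^2.$$
- the gradient of $f$ is monotone with an additional term with $L > 0$ (this form requires $f$ to be convex):
  $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|_2^2.$$
- the norm of the slope of $\nabla f$ (which is $\nabla^2 f$) is bounded above.
- If $f$ is twice differentiable and convex, $\nabla^2 f(x) \preceq LI$, i.e. all the eigenvalues of $\nabla^2 f(x)$ are upper bounded by $L$. (Without convexity the condition is $-LI \preceq \nabla^2 f(x) \preceq LI$.)
These definitions are equivalent, subject to the convexity caveats above. E.g. applying the Cauchy-Schwarz inequality to the 3rd condition gives the 1st condition.
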
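The remark that the 3rd condition yields the 1st can be written out as a short worked step. This is a sketch using the Cauchy-Schwarz inequality; the shorthand $g = \nabla f(x) - \nabla f(y)$ is introduced here for brevity and is not part of the original notation.

```latex
\begin{align*}
\frac{1}{L}\,\|g\|_2^2
  &\le \langle x - y,\; g \rangle   && \text{(3rd condition)} \\
  &\le \|x - y\|_2 \,\|g\|_2        && \text{(Cauchy--Schwarz)}.
\end{align*}
% If g = 0 the 1st condition holds trivially.
% Otherwise divide both ends by \|g\|_2 / L > 0:
\[
\|\nabla f(x) - \nabla f(y)\|_2 \;\le\; L\,\|x - y\|_2 .
\]
```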
Proof of equivalence
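We show that, for $L > 0$, $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ implies
$$\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| \le \frac{L}{2}\|y - x\|_2^2.$$

Recall from calculus $G(b) - G(a) = \int_a^b g(\theta)\,d\theta$. Next, a smart step: take $g(\tau) = \langle \nabla f(x + \tau(y - x)), y - x \rangle$ as a function of $\tau$, with $d\theta = d\tau$. Consider the definite integral of $g(\tau)$ from 0 to 1, and let $G(\tau) = f(x + \tau(y - x))$ so that $G(1) = f(y)$ and $G(0) = f(x)$, hence
$$\begin{aligned} f(y) - f(x) &= \int_0^1 \big\langle \nabla f(x + \tau(y - x)),\, y - x \big\rangle\,d\tau \\ &= \int_0^1 \big\langle \nabla f(x + \tau(y - x)) - \nabla f(x) + \nabla f(x),\, y - x \big\rangle\,d\tau. \end{aligned}$$

As $\nabla f(x)$ is independent of $\tau$, it can be taken out of the integral:
$$f(y) - f(x) = \langle \nabla f(x), y - x \rangle + \int_0^1 \big\langle \nabla f(x + \tau(y - x)) - \nabla f(x),\, y - x \big\rangle\,d\tau.$$

The idea is to create the term $\langle \nabla f(x), y - x \rangle$ so that we can move it to the left and get $f(y) - f(x) - \langle \nabla f(x), y - x \rangle$.
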
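Proof of equivalence (continued)
$$\begin{aligned} \big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| &= \Big|\int_0^1 \big\langle \nabla f(x + \tau(y - x)) - \nabla f(x),\, y - x \big\rangle\,d\tau\Big| \\ &\le \int_0^1 \big|\big\langle \nabla f(x + \tau(y - x)) - \nabla f(x),\, y - x \big\rangle\big|\,d\tau \\ &\overset{\text{c.s.}}{\le} \int_0^1 \|\nabla f(x + \tau(y - x)) - \nabla f(x)\| \cdot \|y - x\|\,d\tau. \end{aligned}$$
Here c.s. means the Cauchy-Schwarz inequality.

Now look at $\|\nabla f(x + \tau(y - x)) - \nabla f(x)\|$: this is exactly where we can apply the Lipschitz gradient inequality
$$\|\nabla f(x + \tau(y - x)) - \nabla f(x)\| \le L\|\tau(y - x)\| = L|\tau|\,\|y - x\| = L\tau\|y - x\|,$$
where $\|\tau(y - x)\| = |\tau|\,\|y - x\|$ by absolute homogeneity of the norm. Note that the integral range is from 0 to 1, so $\tau \ge 0$ and the absolute value sign on $\tau$ can be removed. Lastly
$$\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| \le \int_0^1 L\tau\,d\tau \cdot \|y - x\|^2 = \frac{L}{2}\|y - x\|^2.$$
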
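The two ingredients of this proof, the integral representation of $f(y) - f(x)$ and the final $\frac{L}{2}\|y - x\|^2$ bound, can be checked numerically. The sketch below assumes a sum of softplus terms as the test function (its Hessian is diagonal with entries in $(0, 1/4]$, so $L = 1/4$), fixed sample points, and a midpoint rule for the integral. All of these are illustrative choices, not part of the slides.

```python
import math

def softplus(t):
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def f(x):
    # Assumed test function: sum of softplus terms, L-smooth with L = 1/4.
    return sum(softplus(xi) for xi in x)

def grad_f(x):
    return [sigmoid(xi) for xi in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def line_integral(x, y, n=2000):
    # Midpoint rule for the integral over tau in [0, 1] of
    # <grad f(x + tau (y - x)), y - x>.
    d = [yi - xi for xi, yi in zip(x, y)]
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) / n
        p = [xi + tau * di for xi, di in zip(x, d)]
        total += dot(grad_f(p), d)
    return total / n

x, y, L = [-1.0, 2.0], [3.0, 0.5], 0.25
d = [yi - xi for xi, yi in zip(x, y)]
lhs = f(y) - f(x)                        # G(1) - G(0)
integral = line_integral(x, y)           # numerical value of the integral
remainder = lhs - dot(grad_f(x), d)      # f(y) - f(x) - <grad f(x), y - x>
bound = 0.5 * L * dot(d, d)              # (L/2) ||y - x||^2
print(abs(lhs - integral), abs(remainder), bound)
```

The first printed number should be close to zero (quadrature error only), and the second should not exceed the third.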
L-smoothness: the geometry of the upper bound
A function $f$ is $L$-smooth if for any two points $x, y \in \operatorname{dom} f$,
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|_2^2.$$

[Figure: graph of $f(y)$ for $y$ from $-4$ to $2$, together with the quadratic upper bound $f(-1) + \nabla f(-1)(y - (-1)) + \frac{L}{2}\|y - (-1)\|_2^2$.]

Interpretation: $f$ is globally bounded above by a quadratic function, i.e. $f$ cannot be "too sharp" ($f$ is flatter than the upper bound), or $f$ cannot grow "too fast".

Lipschitz continuous Hessian
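A function $f : \operatorname{dom} f \to \mathbb{R}$ has $L$-Lipschitz Hessian if there exists a constant $L \ge 0$ (the Lipschitz constant) such that, for any two points $x, y \in \operatorname{dom} f$,
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L\|x - y\|.$$
- This assumes $f$ is twice differentiable.
- If $f$ is three times differentiable, this means the norm of $\nabla^3 f(x)$ is bounded above by $L$.
- If $f$ has $L$-Lipschitz Hessian, then
  $$\Big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle - \frac{1}{2}\langle \nabla^2 f(x)(y - x), y - x \rangle\Big| \le \frac{L}{6}\|y - x\|_2^3.$$
  See here for the proof.
  Removing the absolute value sign and making $f(y)$ the subject:
  $$\begin{aligned} f(y) &\ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2}\langle \nabla^2 f(x)(y - x), y - x \rangle - \frac{L}{6}\|y - x\|_2^3, \\ f(y) &\le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2}\langle \nabla^2 f(x)(y - x), y - x \rangle + \frac{L}{6}\|y - x\|_2^3, \end{aligned}$$
  which means $f(y)$ is bounded above and below by two cubic functions parameterized at the point $x$, for all $y$.
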
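The cubic bound can be checked on a simple one-dimensional example. The sketch below assumes $f = \sin$, whose third derivative is bounded by 1 in absolute value, so $L = 1$ is tested. The bound is checked in the standard second-order Taylor form, with the factor $\frac{1}{2}$ on the Hessian term and with $f(y) - f(x)$ on the left-hand side. The function names are hypothetical.

```python
import math
import random

def taylor2_error(x, y):
    # |f(y) - f(x) - f'(x) d - (1/2) f''(x) d^2| for f = sin, d = y - x.
    d = y - x
    model = math.sin(x) + math.cos(x) * d - 0.5 * math.sin(x) * d * d
    return abs(math.sin(y) - model)

def cubic_bound_holds(L=1.0, trials=10000, tol=1e-12, seed=0):
    """Return True if every sampled pair satisfies error <= (L/6) |y - x|^3."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-4.0, 4.0)
        y = rng.uniform(-4.0, 4.0)
        if taylor2_error(x, y) > L / 6.0 * abs(y - x) ** 3 + tol:
            return False
    return True

print(cubic_bound_holds(1.0), cubic_bound_holds(0.1))
```

With $L = 1$ the bound follows from Taylor's theorem with the Lagrange remainder; a constant as small as 0.1 is violated by sampled pairs, as the second call shows.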
Last page - summary
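$f$ is convex if $\operatorname{dom} f$ is convex and
1. $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)$
2. $\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge 0$
3. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$

$f$ is $\alpha$-strongly convex if $\operatorname{dom} f$ is convex and
1. $f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y) - \frac{\alpha}{2}\lambda(1-\lambda)\|x - y\|_2^2$
2. $\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \alpha\|x - y\|_2^2$
3. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\alpha}{2}\|x - y\|_2^2$
4. $f(x) - \frac{\alpha}{2}\|x\|_2^2$ is convex
5. $\nabla^2 f(x) \succeq \alpha I$, if $f$ is twice differentiable

$f$ is $L$-Lipschitz gradient ($L$-smooth) if $f$ is differentiable and
1. $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$
2. $\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle\big| \le \frac{L}{2}\|y - x\|_2^2$
3. $\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|_2^2$ (for convex $f$)
4. $\nabla^2 f(x) \preceq LI$, if $f$ is twice differentiable and convex (in general, $-LI \preceq \nabla^2 f(x) \preceq LI$)

$f$ is $L$-Lipschitz Hessian if $f$ is twice differentiable and
1. $\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L\|x - y\|$
2. $\big|f(y) - f(x) - \langle \nabla f(x), y - x \rangle - \frac{1}{2}\langle \nabla^2 f(x)(y - x), y - x \rangle\big| \le \frac{L}{6}\|y - x\|_2^3$

End of document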
