Some Special Class of Functions in Optimization: Convex, Lipschitz, Strongly Convex
Andersen Ang
Convex function
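A function $f : \mathrm{dom}\, f \to \mathbb{R}$ is convex if:
- $\mathrm{dom}\, f$ is a convex set, and
- for all $x, y \in \mathrm{dom}\, f$, $f$ satisfies any one of the following:
  - Jensen's inequality: for all $\lambda \in [0, 1]$,
    $$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y).$$
  - The gradient of $f$ is monotone:
    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge 0.$$
  - The 1st-order Taylor approximation at the point $x$ is a global under-estimator:
    $$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle.$$
  - The epigraph of $f$ is a convex set.
- $f$ is strictly convex if $\le, \ge$ become $<, >$ (i.e. strict inequality) for $x \ne y$ and $\lambda \in (0, 1)$.
- The 4 definitions are equivalent (the two gradient-based ones assume $f$ is differentiable): you can move from one definition to another as "if and only if". See optimization books for the proof of equivalence between these 4 definitions.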
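The three inequality-based definitions can be spot-checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: it uses the 1-D example $f(x) = e^x$, a fixed grid on $[-3, 3]$, and a small floating-point tolerance. Passing such a check on a grid does not prove convexity; failing it does disprove it.

```python
import math

def f(x):
    # Assumed example function: f(x) = exp(x), convex on the real line.
    return math.exp(x)

def grad_f(x):
    # Derivative of exp(x).
    return math.exp(x)

def satisfies_convexity_conditions(f, grad_f, points, lambdas, tol=1e-9):
    """Check Jensen, monotone gradient and the 1st-order bound on a grid."""
    for x in points:
        for y in points:
            # Monotone gradient: (x - y) * (f'(x) - f'(y)) >= 0
            if (x - y) * (grad_f(x) - grad_f(y)) < -tol:
                return False
            # 1st-order Taylor approximation at x under-estimates f(y)
            if f(y) < f(x) + grad_f(x) * (y - x) - tol:
                return False
            for lam in lambdas:
                # Jensen's inequality at the point lam*x + (1 - lam)*y
                z = lam * x + (1 - lam) * y
                if f(z) > lam * f(x) + (1 - lam) * f(y) + tol:
                    return False
    return True

points = [-3.0 + 0.25 * k for k in range(25)]   # grid on [-3, 3]
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

exp_passes = satisfies_convexity_conditions(f, grad_f, points, lambdas)
# sin is not convex on [-3, 3], so at least one condition must fail.
sin_passes = satisfies_convexity_conditions(math.sin, math.cos, points, lambdas)
print(exp_passes, sin_passes)
```

The same helper accepts any pair of 1-D callables, so other candidate functions can be tested by passing them in place of `f` and `grad_f`.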
Convexity: the geometry of Jensen’s inequality
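$f : \mathrm{dom}\, f \to \mathbb{R}$ is convex if:
(1) $\mathrm{dom}\, f$ is a convex set and
(2) $\forall x, y \in \mathrm{dom}\, f$ and $\lambda \in [0, 1]$: $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$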
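[Figure: plot of $f(x)$ against $x$; at the point $\lambda x + (1 - \lambda) y$, the chord value $\lambda f(x) + (1 - \lambda) f(y)$ is compared with $f(\lambda x + (1 - \lambda) y)$.]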
Convexity: the geometry of 1st-order Taylor approximation
$f : \mathrm{dom}\, f \to \mathbb{R}$ is convex if:
(1) $\mathrm{dom}\, f$ is a convex set and
(2) $\forall x, y \in \mathrm{dom}\, f$: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$
[Figure: plot of $f(y)$ against $y$, together with the tangent line $f(-1) + \nabla f(-1)(y - (-1))$ at the point $x = -1$.]
α-strongly convex function
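A function $f : \mathrm{dom}\, f \to \mathbb{R}$ is $\alpha$-strongly convex if:
- $\mathrm{dom}\, f$ is a convex set.
- For all $x, y \in \mathrm{dom}\, f$, $f$ satisfies any one of the following:
  - Jensen's inequality with an additional quadratic term with $\alpha > 0$: for all $\lambda \in [0, 1]$,
    $$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y) - \frac{\alpha}{2} \lambda (1 - \lambda) \|x - y\|_2^2.$$
  - $\nabla f$ is monotone with an additional quadratic term with $\alpha > 0$:
    $$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \alpha \|x - y\|_2^2 \ge 0.$$
  - The 1st-order Taylor approximation plus a quadratic term is a global under-estimator:
    $$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\alpha}{2} \|y - x\|_2^2,$$
    or we say $f$ is lower bounded by a quadratic function.
- With $\alpha > 0$, the function $f(x) - \frac{\alpha}{2} \|x\|_2^2$ is convex.
- If $f$ is twice differentiable, it is $\alpha$-strongly convex iff $\nabla^2 f(x) \succeq \alpha I$.
- These definitions are equivalent.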
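As a rough numerical illustration of the strengthened Jensen and gradient inequalities, the sketch below tests them on a grid. Everything specific in it is an assumption made for the example: the function $f(x) = x^2 + e^x$ (for which $f''(x) = 2 + e^x \ge 2$, so $\alpha = 2$), the grid on $[-3, 3]$, and the tolerance.

```python
import math

def f(x):
    # Assumed example: f(x) = x^2 + exp(x), so f''(x) = 2 + exp(x) >= 2.
    return x * x + math.exp(x)

def grad_f(x):
    # Derivative of x^2 + exp(x).
    return 2.0 * x + math.exp(x)

def is_strongly_convex_on_grid(f, grad_f, alpha, points, lambdas, tol=1e-9):
    """Check the alpha-strengthened gradient and Jensen inequalities."""
    for x in points:
        for y in points:
            gap = (x - y) ** 2
            # Gradient condition: (x - y)(f'(x) - f'(y)) >= alpha * |x - y|^2
            if (x - y) * (grad_f(x) - grad_f(y)) < alpha * gap - tol:
                return False
            for lam in lambdas:
                # Jensen with the extra term -(alpha/2) lam (1 - lam) |x - y|^2
                z = lam * x + (1 - lam) * y
                rhs = (lam * f(x) + (1 - lam) * f(y)
                       - 0.5 * alpha * lam * (1 - lam) * gap)
                if f(z) > rhs + tol:
                    return False
    return True

points = [-3.0 + 0.25 * k for k in range(25)]   # grid on [-3, 3]
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

passes_alpha_2 = is_strongly_convex_on_grid(f, grad_f, 2.0, points, lambdas)
# alpha = 3 is too large: f''(x) is close to 2 for very negative x.
passes_alpha_3 = is_strongly_convex_on_grid(f, grad_f, 3.0, points, lambdas)
print(passes_alpha_2, passes_alpha_3)
```

Trying a larger `alpha` until the check fails gives a crude feel for the largest valid strong-convexity constant on the sampled region.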
Equivalence between definitions of strong convexity
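We show $\nabla^2 f(x) \succeq \alpha I \implies \langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \alpha \|x - y\|_2^2$, $\alpha > 0$.
First recall from calculus $G(b) - G(a) = \int_a^b g(\theta)\, d\theta$. Next, a smart step: let $\theta = y + \tau (x - y)$, then $d\theta = (x - y)\, d\tau$. Taking the integral range from 0 to 1 for $\tau$, and letting $G$ be $\nabla f$ and $g$ be $\nabla^2 f$, this gives
$$\nabla f(x) - \nabla f(y) = \int_0^1 \nabla^2 f\big(y + \tau (x - y)\big) (x - y)\, d\tau.$$
(The left-hand side is a vector; the right-hand side is a matrix-vector product, also a vector.)
By $\nabla^2 f(x) \succeq \alpha I$ for all $x$, we have $\nabla^2 f\big(y + \tau (x - y)\big) \succeq \alpha I$, i.e. $(x - y)^\top \nabla^2 f\big(y + \tau (x - y)\big) (x - y) \ge \alpha \|x - y\|_2^2$ for every $\tau$. Taking the inner product of the identity above with $x - y$ gives
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \Big\langle x - y, \int_0^1 \alpha (x - y)\, d\tau \Big\rangle = \alpha \|x - y\|_2^2.$$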
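As a worked instance of this implication, consider a quadratic function. The example is an assumption for illustration and not from the slides: $A$ is a symmetric matrix, $b$ a vector, and $\lambda_{\min}(A)$ denotes the smallest eigenvalue of $A$.

```latex
% Assumed example: f(x) = (1/2) x^T A x - b^T x with A symmetric.
\begin{aligned}
\nabla f(x) &= A x - b, \qquad \nabla^2 f(x) = A, \\
\langle x - y, \nabla f(x) - \nabla f(y) \rangle
  &= (x - y)^\top A (x - y)
  \;\ge\; \lambda_{\min}(A)\, \|x - y\|_2^2 .
\end{aligned}
```

So whenever $\lambda_{\min}(A) > 0$, this $f$ is $\alpha$-strongly convex with $\alpha = \lambda_{\min}(A)$, which matches the Hessian condition $\nabla^2 f(x) = A \succeq \lambda_{\min}(A) I$.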
α-strongly convex: the geometry of the lower bound
$f : \mathrm{dom}\, f \to \mathbb{R}$ is $\alpha$-strongly convex if
(1) $\mathrm{dom}\, f$ is a convex set and
(2) for all $x, y \in \mathrm{dom}\, f$: $f(y) \ge f(x) + \nabla f(x)^\top (y - x) + \frac{\alpha}{2} \|x - y\|_2^2$
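[Figure: plot of $f(y)$ against $y$, together with the tangent line $f(-1) + \nabla f(-1)(y - (-1))$ and the quadratic lower bound $f(-1) + \nabla f(-1)(y - (-1)) + \frac{\alpha}{2} \|y - (-1)\|_2^2$ at the point $x = -1$.]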
Interpretation: $f$ is lower bounded by a quadratic curve with some curvature, which is in turn lower bounded by the 1st-order Taylor approximation (zero curvature) $\implies$ $f$ is not "too flat" (at least not "as flat as" the lower bound). In other words: $f$ is at least an $\alpha$-amount "bumpy".
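The ordering "tangent line ≤ quadratic lower bound ≤ $f$" can be checked numerically at the expansion point $x = -1$. This is a sketch under assumptions that are not from the slides: the example function $f(y) = y^2 + e^y$ with $\alpha = 2$, a grid on $[-4, 2]$, and a small tolerance.

```python
import math

ALPHA = 2.0   # strong-convexity constant of the assumed example
X0 = -1.0     # expansion point

def f(y):
    # Assumed example: f(y) = y^2 + exp(y), which has f''(y) >= 2.
    return y * y + math.exp(y)

def grad_f(y):
    # Derivative of y^2 + exp(y).
    return 2.0 * y + math.exp(y)

def tangent(y):
    # 1st-order Taylor approximation of f at X0 (zero curvature).
    return f(X0) + grad_f(X0) * (y - X0)

def quadratic_lower_bound(y):
    # Tangent line plus the (alpha / 2) * |y - X0|^2 term.
    return tangent(y) + 0.5 * ALPHA * (y - X0) ** 2

ys = [-4.0 + 0.1 * k for k in range(61)]   # grid on [-4, 2]
sandwich_holds = all(
    tangent(y) <= quadratic_lower_bound(y) + 1e-9
    and quadratic_lower_bound(y) <= f(y) + 1e-9
    for y in ys
)
print(sandwich_holds)
```

All three curves touch at `X0`; away from it, the gap between the tangent and the quadratic bound grows like $\frac{\alpha}{2}(y - x)^2$.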
Lipschitz continuity
A function $f : \mathrm{dom}\, f \to \mathbb{R}$ is Lipschitz if there exists a constant $L \ge 0$ (the Lipschitz constant) such that, for any two points $x, y \in \mathrm{dom}\, f$,
$$|f(x) - f(y)| \le L \|x - y\|.$$
- Rearranging gives $\dfrac{|f(x) - f(y)|}{\|x - y\|} \le L$ for $x \ne y$. The left-hand side is approximately the magnitude of the gradient when $x, y$ are close $\implies$ $f$ being Lipschitz means the "slope" (rate of change) of $f$ is bounded above by a global constant $L$.
- Removing the absolute value sign:
$$f(x) \le f(y) + L \|x - y\|,$$
$$f(x) \ge f(y) - L \|x - y\|.$$
[Figure: plot of $f(x)$ against $x$.]
Important note: the two bounds form a cone with apex at $(y, f(y))$ that contains the graph of $f$. This property is global: such a cone exists at every point on $f$, i.e. the cone can "slide" along the curve and the argument still holds.
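The bounded-slope reading and the sliding cone can both be explored numerically. The sketch below is illustrative only, and its examples are assumptions not taken from the slides: $\sin$ (Lipschitz with $L = 1$) and $x^2$ (not globally Lipschitz on the real line), sampled on fixed grids.

```python
import math

def max_slope(f, points):
    """Largest |f(x) - f(y)| / |x - y| over all pairs of distinct points."""
    best = 0.0
    for i, x in enumerate(points):
        for y in points[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

def inside_cone(f, L, x0, points, tol=1e-12):
    """Check f(x0) - L|y - x0| <= f(y) <= f(x0) + L|y - x0| for all y."""
    return all(abs(f(y) - f(x0)) <= L * abs(y - x0) + tol for y in points)

def square(x):
    # Assumed counterexample: x^2 has unbounded slope on the real line.
    return x * x

grid = [-4.0 + 0.05 * k for k in range(161)]   # samples on [-4, 4]
wide = [-40.0 + 0.5 * k for k in range(161)]   # samples on [-40, 40]

# sin: the sampled slopes never exceed the Lipschitz constant L = 1.
sin_slope = max_slope(math.sin, grid)
# The cone with L = 1 contains the curve wherever its apex x0 is placed.
cone_slides = all(inside_cone(math.sin, 1.0, x0, grid) for x0 in grid)

# x^2: the largest sampled slope keeps growing as the interval widens.
square_slope_small = max_slope(square, grid)
square_slope_wide = max_slope(square, wide)
print(sin_slope, cone_slides, square_slope_small, square_slope_wide)
```

A sampled maximum slope is only a lower estimate of the true Lipschitz constant, so it can suggest that no global $L$ exists (as for $x^2$) but cannot certify one.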
Lipschitz continuous gradient
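A function $f : \mathrm{dom}\, f \to \mathbb{R}$ is $L$-smooth (has an $L$-Lipschitz continuous gradient) if there exists a constant $L > 0$ such that, for any two points $x, y \in \mathrm{dom}\, f$,
$$\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2,$$
i.e. the norm of the slope of $\nabla f$ (which is $\nabla^2 f$) is bounded above.
- For convex $f$, this is equivalent to the co-coercivity of the gradient:
$$\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_2^2.$$
- If $f$ is convex and twice differentiable, it is equivalent to $\nabla^2 f(x) \preceq L I$, i.e. all the eigenvalues of $\nabla^2 f(x)$ are upper bounded by $L$.
These definitions are equivalent for convex $f$. For example, applying the Cauchy-Schwarz inequality to the left-hand side of the co-coercivity condition gives $\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_2^2 \le \|x - y\|_2 \|\nabla f(x) - \nabla f(y)\|_2$, which is the Lipschitz gradient condition.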
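A quick numerical check of the Lipschitz gradient condition and of co-coercivity, as a sketch under stated assumptions. The example function is not from the slides: softplus, $f(x) = \log(1 + e^x)$, whose derivative is the sigmoid $s(x)$ and whose second derivative $s(1 - s)$ is at most $1/4$, so $L = 1/4$.

```python
import math

def grad_f(x):
    # Gradient of the assumed example f(x) = log(1 + exp(x)) (softplus),
    # i.e. the sigmoid function. f is convex and f''(x) <= 1/4.
    return 1.0 / (1.0 + math.exp(-x))

def gradient_is_lipschitz(grad_f, L, points, tol=1e-12):
    """Check |f'(x) - f'(y)| <= L |x - y| on all pairs of grid points."""
    return all(
        abs(grad_f(x) - grad_f(y)) <= L * abs(x - y) + tol
        for x in points for y in points
    )

def gradient_is_cocoercive(grad_f, L, points, tol=1e-12):
    """Check (x - y)(f'(x) - f'(y)) >= (1/L) (f'(x) - f'(y))^2 on the grid."""
    return all(
        (x - y) * (grad_f(x) - grad_f(y))
        >= (grad_f(x) - grad_f(y)) ** 2 / L - tol
        for x in points for y in points
    )

points = [-5.0 + 0.25 * k for k in range(41)]   # grid on [-5, 5]

lipschitz_quarter = gradient_is_lipschitz(grad_f, 0.25, points)
cocoercive_quarter = gradient_is_cocoercive(grad_f, 0.25, points)
# L = 0.2 is too small: the sigmoid has slope close to 1/4 near x = 0.
lipschitz_too_small = gradient_is_lipschitz(grad_f, 0.2, points)
print(lipschitz_quarter, cocoercive_quarter, lipschitz_too_small)
```

Both conditions pass with $L = 1/4$ and the Lipschitz check fails once $L$ drops below the largest curvature, which is consistent with the Hessian characterization.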
Proof of equivalence
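We show that, for $L > 0$, $\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2$ implies
$$f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{L}{2} \|y - x\|_2^2.$$
Recall from calculus $G(b) - G(a) = \int_a^b g(\theta)\, d\theta$. Next, a smart step: take $g(\tau) = \langle \nabla f(x + \tau (y - x)), y - x \rangle$ as a function of $\tau$ (so $d\theta = d\tau$), which is the derivative of $G(\tau) = f(x + \tau (y - x))$. Consider the definite integral of $g(\tau)$ from 0 to 1, with $G(1) = f(y)$ and $G(0) = f(x)$, hence
$$\begin{aligned}
f(y) - f(x) &= \int_0^1 \big\langle \nabla f(x + \tau (y - x)), y - x \big\rangle\, d\tau \\
&= \int_0^1 \big\langle \nabla f(x + \tau (y - x)) - \nabla f(x) + \nabla f(x), y - x \big\rangle\, d\tau.
\end{aligned}$$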
Proof of equivalence - continued
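Now look at $\|\nabla f(x + \tau (y - x)) - \nabla f(x)\|_2$: this is exactly where we can apply the Lipschitz gradient inequality, which gives $\|\nabla f(x + \tau (y - x)) - \nabla f(x)\|_2 \le L \tau \|y - x\|_2$ for $\tau \in [0, 1]$.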
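The remaining steps can be written out as follows. This is the standard completion of the argument (it is not shown on the slide); it moves the constant term $\langle \nabla f(x), y - x \rangle$ out of the integral and then bounds what is left.

```latex
\begin{aligned}
f(y) - f(x) - \langle \nabla f(x), y - x \rangle
  &= \int_0^1 \big\langle \nabla f(x + \tau (y - x)) - \nabla f(x),\; y - x \big\rangle \, d\tau \\
  &\le \int_0^1 \|\nabla f(x + \tau (y - x)) - \nabla f(x)\|_2 \, \|y - x\|_2 \, d\tau
     && \text{(Cauchy--Schwarz)} \\
  &\le \int_0^1 L \tau \, \|y - x\|_2^2 \, d\tau
     && \text{(Lipschitz gradient)} \\
  &= \frac{L}{2} \, \|y - x\|_2^2 .
\end{aligned}
```

The factor $\frac{1}{2}$ comes from $\int_0^1 \tau \, d\tau = \frac{1}{2}$.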
L-smoothness: the geometry of the upper bound
A function $f$ is $L$-smooth if for any two points $x, y \in \mathrm{dom}\, f$,
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|_2^2.$$
[Figure: plot of $f(y)$ against $y$, together with the quadratic upper bound $f(-1) + \nabla f(-1)(y - (-1)) + \frac{L}{2} \|y - (-1)\|_2^2$ at the point $x = -1$.]
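For a function that is both convex and $L$-smooth, the tangent line and the quadratic upper bound sandwich $f$. The sketch below checks this at the expansion point $x = -1$; the example function (softplus, with $L = 1/4$), the grid, and the tolerance are assumptions for illustration and do not come from the slides.

```python
import math

L_SMOOTH = 0.25   # smoothness constant of the assumed example
X0 = -1.0         # expansion point

def f(y):
    # Assumed example: softplus f(y) = log(1 + exp(y)), convex, f'' <= 1/4.
    return math.log1p(math.exp(y))

def grad_f(y):
    # Derivative of softplus: the sigmoid function.
    return 1.0 / (1.0 + math.exp(-y))

def tangent(y):
    # 1st-order Taylor approximation of f at X0 (a global lower bound).
    return f(X0) + grad_f(X0) * (y - X0)

def quadratic_upper_bound(y):
    # Tangent line plus the (L / 2) * |y - X0|^2 term.
    return tangent(y) + 0.5 * L_SMOOTH * (y - X0) ** 2

ys = [-4.0 + 0.1 * k for k in range(61)]   # grid on [-4, 2]
sandwich_holds = all(
    tangent(y) <= f(y) + 1e-9 and f(y) <= quadratic_upper_bound(y) + 1e-9
    for y in ys
)
print(sandwich_holds)
```

Here the lower bound comes from convexity and the upper bound from $L$-smoothness; both touch $f$ at `X0`.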
Lipschitz continuous Hessian
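A function $f : \mathrm{dom}\, f \to \mathbb{R}$ has $L$-Lipschitz Hessian if there exists a constant $L \ge 0$ (the Lipschitz constant) such that, for any two points $x, y \in \mathrm{dom}\, f$,
$$\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L \|x - y\|.$$
- This assumes $f$ is twice differentiable.
- If $f$ is three times differentiable, this means the norm of $\nabla^3 f(x)$ is bounded above by $L$.
- $f$ having $L$-Lipschitz Hessian implies the 2nd-order Taylor bound
$$\Big| f(y) - f(x) - \langle \nabla f(x), y - x \rangle - \frac{1}{2} \langle \nabla^2 f(x)(y - x), y - x \rangle \Big| \le \frac{L}{6} \|y - x\|_2^3.$$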
see here for the proof.
Removing the absolute value sign and making $f(y)$ the subject:
$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 f(x)(y - x), y - x \rangle - \frac{L}{6} \|y - x\|_2^3,$$
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 f(x)(y - x), y - x \rangle + \frac{L}{6} \|y - x\|_2^3,$$
which means $f(y)$ is bounded above and below by two cubic functions parameterized at the point $x$, for all $y$.
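The cubic bound on the 2nd-order Taylor error (with the factor $\frac{1}{2}$ on the Hessian term) can be spot-checked numerically. The sketch below uses an assumed example that is not from the slides: $f = \sin$, whose third derivative $-\cos$ is bounded by 1 in magnitude, so $L = 1$.

```python
import math

def taylor2_error(x, y):
    """|f(y) - f(x) - f'(x)(y - x) - (1/2) f''(x)(y - x)^2| for f = sin."""
    d = y - x
    # f'(x) = cos(x) and f''(x) = -sin(x), hence the "+ 0.5 * sin(x)" term.
    return abs(math.sin(y) - math.sin(x) - math.cos(x) * d
               + 0.5 * math.sin(x) * d * d)

def cubic_bound_holds(L, points, tol=1e-12):
    """Check taylor2_error(x, y) <= (L / 6) |y - x|^3 on all grid pairs."""
    return all(
        taylor2_error(x, y) <= L / 6.0 * abs(y - x) ** 3 + tol
        for x in points for y in points
    )

points = [-4.0 + 0.25 * k for k in range(33)]   # grid on [-4, 4]

holds_with_L_1 = cubic_bound_holds(1.0, points)
# L = 0.5 is too small, since |f'''| reaches 1 (for example at x = 0).
holds_with_L_half = cubic_bound_holds(0.5, points)
print(holds_with_L_1, holds_with_L_half)
```

The failure for `L = 0.5` already shows up at nearby points around `x = 0`, where the third derivative of $\sin$ has magnitude 1.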
Last page - summary
$f$ is convex if $\mathrm{dom}\, f$ is convex and
1. $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$
2. $\langle x - y, \nabla f(x) - \nabla f(y) \rangle \ge 0$
3. $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle$