
6FMAI19 Nonlinear Optimization Spring, 2022

Lecture #12 — 13/4, 2022


Lecturer: Yura Malitsky Scribe: Aban Husain

1 L-smooth functions and strong convexity


Unless otherwise specified, $X$ is a finite-dimensional real vector space equipped with a $p$-norm $\|\cdot\|$.

Definition 1. The dual space $X^*$ of $X$ is the space of linear forms on $X$ with norm $\|\cdot\|_*$ defined by
\[
\|f\|_* = \max_{\|x\| = 1} f(x).
\]

As $X$ is assumed to be finite dimensional, there is a natural equivalence between $X$ and $X^*$, i.e. $X^*$ is a real vector space of the same dimension as $X$.

Remark 2. For $X$ with norm $\|\cdot\|_p$, the dual norm is $\|\cdot\|_q$, where $\frac{1}{p} + \frac{1}{q} = 1$ for $p > 1$, and $q = \infty$ for $p = 1$. In particular, if $X$ has the Euclidean norm, then so does $X^*$.
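Remark 2 can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes $1 < p < \infty$ and a nonzero $y$, and uses the standard closed-form maximizer $x_i = \operatorname{sign}(y_i)|y_i|^{q-1} / \|y\|_q^{q-1}$ from the equality case of Hölder's inequality. The helper names (`p_norm`, `conjugate_exponent`, `dual_maximizer`) are illustrative choices.

```python
import math
import random


def p_norm(x, p):
    """Return the p-norm of x (a list of floats); p may be math.inf."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def conjugate_exponent(p):
    """Return q with 1/p + 1/q = 1, assuming 1 < p < infinity."""
    return p / (p - 1.0)


def dual_maximizer(y, p):
    """Return x with ||x||_p = 1 and <y, x> = ||y||_q (y nonzero, 1 < p < inf)."""
    q = conjugate_exponent(p)
    scale = p_norm(y, q) ** (q - 1.0)
    return [math.copysign(abs(v) ** (q - 1.0), v) / scale for v in y]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


rng = random.Random(0)
p = 3.0
q = conjugate_exponent(p)
y = [rng.uniform(-1.0, 1.0) for _ in range(5)]
x_star = dual_maximizer(y, p)

# Random points on the unit sphere of the p-norm never beat the maximizer.
best_random = -math.inf
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    norm = p_norm(x, p)
    x = [v / norm for v in x]
    best_random = max(best_random, dot(y, x))

print(dot(y, x_star), p_norm(y, q), best_random)
```

The first two printed values should agree up to rounding, and the third should not exceed them.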

Recall that, for an arbitrary function $f : X \to \mathbb{R}$, the Legendre-Fenchel transform (or convex conjugate) $f^* : X^* \to \mathbb{R}$ can be constructed as:
\[
f^*(y) = \sup_{x \in X} \{ \langle y, x \rangle - f(x) \}.
\]

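Consider the negative entropy function on the $n$-simplex
\[
h(x) =
\begin{cases}
\sum_{i=1}^n x_i \log x_i & \text{if } x \in \Delta_n = \{ x \in \mathbb{R}^n \mid x \ge 0,\ \sum_{i=1}^n x_i = 1 \}, \\
\infty & \text{else,}
\end{cases}
\]
with the convention $0 \log 0 = 0$.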

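Then by straightforward calculation
\[
\begin{aligned}
h^*(y) &= \sup_{x \in X} \{ \langle y, x \rangle - h(x) \} \\
&= \sup_{x \in \Delta_n} \Big\{ \langle y, x \rangle - \sum_{i=1}^n x_i \log x_i \Big\} \\
&= \log\Big( \sum_{i=1}^n \exp y_i \Big).
\end{aligned}
\]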
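The calculation above can be sanity-checked numerically. The sketch below is an illustration under the assumption, not spelled out in the notes, that the supremum is attained at the softmax point $x_i = e^{y_i} / \sum_k e^{y_k}$. The function names are illustrative.

```python
import math


def log_sum_exp(y):
    """Numerically stable log(sum_i exp(y_i))."""
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))


def softmax(y):
    """Point of the simplex with x_i proportional to exp(y_i)."""
    m = max(y)
    w = [math.exp(v - m) for v in y]
    s = sum(w)
    return [v / s for v in w]


def neg_entropy(x):
    """h(x) = sum_i x_i log x_i on the simplex, with 0 log 0 = 0."""
    return sum(v * math.log(v) for v in x if v > 0.0)


def conjugate_objective(y, x):
    """<y, x> - h(x), the quantity maximized in the definition of h*(y)."""
    return sum(a * b for a, b in zip(y, x)) - neg_entropy(x)


y = [0.3, -1.2, 2.0]
at_softmax = conjugate_objective(y, softmax(y))
at_uniform = conjugate_objective(y, [1.0 / 3.0] * 3)
print(at_softmax, log_sum_exp(y), at_uniform)
```

The value at the softmax point should match `log_sum_exp(y)` up to rounding, while any other simplex point, such as the uniform one, gives a smaller value.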

1.1 L-smooth functions


The definition of L-smooth functions can be generalized to X with an unspecified p-norm.

Definition 3. A differentiable function $f : X \to \mathbb{R}$ is $L$-smooth with respect to a norm $\|\cdot\|$ if
\[
\|\nabla f(y) - \nabla f(x)\|_* \le L \|y - x\|, \quad \forall x, y \in X.
\]

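Theorem 4. Let $f : X \to \mathbb{R}$ be convex and differentiable, and $L > 0$. Then the following are equivalent for all $x, y \in X$ and $\lambda \in [0, 1]$:

1. $f$ is $L$-smooth with respect to $\|\cdot\|$;

2. $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \|y - x\|^2$;

3. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|_*^2$;

4. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_*^2$;

5. $f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y) - \frac{L}{2} \lambda (1 - \lambda) \|x - y\|^2$.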


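Proof. (1) ⇒ (2) Let $x_\lambda = x + \lambda(y - x)$ for $\lambda \in [0, 1]$. Using the fundamental theorem of calculus and Hölder's inequality:
\[
\begin{aligned}
f(y) - f(x) - \langle \nabla f(x), y - x \rangle
&= \int_0^1 \langle \nabla f(x_\lambda) - \nabla f(x), y - x \rangle \, d\lambda \\
&\le \int_0^1 \|\nabla f(x_\lambda) - \nabla f(x)\|_* \, \|y - x\| \, d\lambda \\
&\le \int_0^1 L \lambda \|y - x\|^2 \, d\lambda \\
&= \frac{L}{2} \|y - x\|^2.
\end{aligned}
\]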

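(2) ⇒ (3) For fixed $x \in X$ let
\[
\varphi(y) = f(y) - f(x) - \langle \nabla f(x), y - x \rangle.
\]
By definition $\nabla \varphi(y) = \nabla f(y) - \nabla f(x)$, so $\varphi$ also satisfies (2), and by convexity $\varphi(x) = 0$ is its minimum value. For $y \in X$, set $z = y - \frac{\|\nabla \varphi(y)\|_*}{L} v$, where $v$ is chosen so that $\langle \nabla \varphi(y), v \rangle = \|\nabla \varphi(y)\|_*$ and $\|v\| = 1$. Then, applying (2) to $\varphi$ at the points $y$ and $z$,
\[
\begin{aligned}
0 \le \varphi(z)
&\le \varphi(y) - \frac{\|\nabla \varphi(y)\|_*}{L} \langle \nabla \varphi(y), v \rangle + \frac{L}{2} \Big\| \frac{\|\nabla \varphi(y)\|_*}{L} v \Big\|^2 \\
&= f(y) - f(x) - \langle \nabla f(x), y - x \rangle - \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|_*^2.
\end{aligned}
\]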
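(3) ⇒ (4) For each $x, y \in X$,
\[
\begin{aligned}
f(y) - f(x) - \langle \nabla f(x), y - x \rangle &\ge \frac{1}{2L} \|\nabla f(x) - \nabla f(y)\|_*^2, \\
f(x) - f(y) - \langle \nabla f(y), x - y \rangle &\ge \frac{1}{2L} \|\nabla f(y) - \nabla f(x)\|_*^2.
\end{aligned}
\]
Summation yields (4).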
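(4) ⇒ (1) Using Hölder's inequality,
\[
\frac{1}{L} \|\nabla f(x) - \nabla f(y)\|_*^2 \le \langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\|_* \, \|x - y\|.
\]
Dividing by $\|\nabla f(x) - \nabla f(y)\|_*$ (when it is nonzero) gives (1).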
(2) ⇒ (5) This follows from the definition of convexity and the inequality in (2).
(5) ⇒ (2) Rewrite (5) as
\[
f(y) \le f(x) + \frac{f(x + \lambda(y - x)) - f(x)}{\lambda} + \frac{L(1 - \lambda)}{2} \|y - x\|^2.
\]
The limit as λ → 0 results in (2).
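As an illustration of Theorem 4, the inequalities in items 2 to 4 can be tested on random points. The sketch below uses an example that is not from the notes: the separable function $f(x) = \sum_i \log(1 + e^{x_i})$ with the Euclidean norm, for which $L = 1/4$ is assumed because each second derivative $s(1 - s)$ of the softplus is at most $1/4$.

```python
import math
import random

L = 0.25  # assumed smoothness constant of the softplus sum (Euclidean norm)


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    """Derivative of softplus, computed without overflow."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def f(x):
    return sum(softplus(t) for t in x)


def grad(x):
    return [sigmoid(t) for t in x]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def check_theorem4(x, y, tol=1e-12):
    """Return truth values of items 2, 3, 4 of Theorem 4 at the pair (x, y)."""
    gx, gy = grad(x), grad(y)
    d = [b - a for a, b in zip(x, y)]        # y - x
    dg = [a - b for a, b in zip(gx, gy)]     # grad f(x) - grad f(y)
    gap = f(y) - f(x) - dot(gx, d)           # left-hand side of item 3
    item2 = gap <= 0.5 * L * dot(d, d) + tol
    item3 = gap >= dot(dg, dg) / (2.0 * L) - tol
    item4 = dot(dg, [-v for v in d]) >= dot(dg, dg) / L - tol
    return item2, item3, item4


rng = random.Random(1)
results = []
for _ in range(200):
    x = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    y = [rng.uniform(-3.0, 3.0) for _ in range(4)]
    results.append(check_theorem4(x, y))

print(all(all(r) for r in results))
```

A random test of this kind cannot prove the equivalences; it only confirms that the three inequalities hold with the same constant on the sampled pairs.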

Claim 5. The function $f(x) = \log\big( \sum_{i=1}^n \exp x_i \big)$ is 1-smooth with respect to $\|\cdot\|_2$ and $\|\cdot\|_\infty$.

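The first and second order partial derivatives of $f$ are
\[
\frac{\partial f}{\partial x_i}(x) = \frac{e^{x_i}}{\sum_{k=1}^n e^{x_k}},
\qquad
\frac{\partial^2 f}{\partial x_i \partial x_j}(x) =
\begin{cases}
-\dfrac{e^{x_i} e^{x_j}}{\big( \sum_{k=1}^n e^{x_k} \big)^2} & \text{if } i \ne j, \\[2ex]
-\dfrac{e^{x_i} e^{x_i}}{\big( \sum_{k=1}^n e^{x_k} \big)^2} + \dfrac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} & \text{if } i = j.
\end{cases}
\]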

With the notation $\sigma = \nabla f(x)$, the Hessian is $\nabla^2 f(x) = \operatorname{diag}(\sigma) - \sigma \sigma^T$.

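1. In the case of the Euclidean norm, $L$ is bounded by the largest eigenvalue of the Hessian. By Weyl's inequality $\nabla^2 f(x) \preceq \operatorname{diag}(\sigma) \preceq I$, so $f$ is 1-smooth with respect to $\|\cdot\|_2$.

2. Given $\|\cdot\|_\infty$, for any $d \in \mathbb{R}^n$ the inequality $\langle \nabla^2 f(x) d, d \rangle \le \langle \operatorname{diag}(\sigma) d, d \rangle \le \|d\|_\infty^2$ holds, where the last step uses $\sigma \ge 0$ and $\sum_i \sigma_i = 1$. Since $f$ is twice continuously differentiable, for $x, y \in \mathbb{R}^n$ there exists some $z \in [x, y]$ such that
\[
\begin{aligned}
f(y) &= f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \langle \nabla^2 f(z)(y - x), y - x \rangle \\
&\le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \|x - y\|_\infty^2.
\end{aligned}
\]
By Theorem 4 (item 2), $f$ is 1-smooth with respect to $\|\cdot\|_\infty$.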
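Claim 5 can be probed numerically through Definition 3. The sketch below is an illustrative check on random pairs, not a proof: it uses the fact that the gradient of log-sum-exp is the softmax map and that the dual norm of $\|\cdot\|_\infty$ is $\|\cdot\|_1$ (Remark 2). Function names are illustrative.

```python
import math
import random


def softmax(x):
    """Gradient of log(sum_i exp(x_i)), computed stably."""
    m = max(x)
    w = [math.exp(v - m) for v in x]
    s = sum(w)
    return [v / s for v in w]


def norm1(v):
    return sum(abs(t) for t in v)


def norm2(v):
    return math.sqrt(sum(t * t for t in v))


def norm_inf(v):
    return max(abs(t) for t in v)


def smoothness_ratios(x, y):
    """Return the ratios ||grad f(y) - grad f(x)||_* / ||y - x|| for both norms."""
    dg = [a - b for a, b in zip(softmax(y), softmax(x))]
    d = [a - b for a, b in zip(y, x)]
    ratio_euclidean = norm2(dg) / norm2(d)   # dual of the 2-norm is the 2-norm
    ratio_sup = norm1(dg) / norm_inf(d)      # dual of the inf-norm is the 1-norm
    return ratio_euclidean, ratio_sup


rng = random.Random(2)
worst_euclidean = 0.0
worst_sup = 0.0
for _ in range(2000):
    x = [rng.uniform(-4.0, 4.0) for _ in range(5)]
    y = [rng.uniform(-4.0, 4.0) for _ in range(5)]
    r2, rinf = smoothness_ratios(x, y)
    worst_euclidean = max(worst_euclidean, r2)
    worst_sup = max(worst_sup, rinf)

print(worst_euclidean, worst_sup)
```

Both printed ratios should stay at or below 1, consistent with $L = 1$ in each norm.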

1.2 µ-strongly convex functions


The definition of strongly convex functions can also be generalized.

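Definition 6. A function $f : X \to \mathbb{R}$ is $\mu$-strongly convex with respect to $\|\cdot\|$ if for all $x, y \in X$ and $\lambda \in [0, 1]$:
\[
\lambda f(x) + (1 - \lambda) f(y) \ge f(\lambda x + (1 - \lambda) y) + \frac{\mu}{2} \lambda (1 - \lambda) \|y - x\|^2.
\]
It is important to note that the equivalence
\[
f \text{ is } \mu\text{-strongly convex} \iff f(x) - \frac{\mu}{2} \|x\|^2 \text{ is convex}
\]
holds only in the Euclidean case.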
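The reason can be made explicit with a short worked identity (added here as a sketch, not taken from the notes). When $\|\cdot\|$ comes from an inner product, expanding the square gives
\[
\begin{aligned}
\lambda \|x\|^2 + (1 - \lambda) \|y\|^2 - \|\lambda x + (1 - \lambda) y\|^2
&= \lambda (1 - \lambda) \big( \|x\|^2 - 2 \langle x, y \rangle + \|y\|^2 \big) \\
&= \lambda (1 - \lambda) \|x - y\|^2.
\end{aligned}
\]
Writing the convexity inequality for $g(x) = f(x) - \frac{\mu}{2} \|x\|^2$ and substituting this identity reproduces exactly the inequality of Definition 6. For a general $p$-norm the expansion is unavailable, since it relies on the inner product, so the two notions are no longer interchangeable.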

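Theorem 7. Let $f : X \to \mathbb{R} \cup \{\infty\}$. The following are equivalent for all $x, y \in X$:

1. $f$ is $\mu$-strongly convex with respect to $\|\cdot\|$;

2. $f(y) \ge f(x) + \langle g_x, y - x \rangle + \frac{\mu}{2} \|y - x\|^2$, $\forall g_x \in \partial f(x)$;

3. $\langle g_x - g_y, x - y \rangle \ge \mu \|x - y\|^2$, $\forall g_x \in \partial f(x)$, $\forall g_y \in \partial f(y)$.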

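Proof. (1) ⇒ (2) Let $x_\lambda = x + \lambda(y - x)$, $\lambda \in (0, 1]$. The definition of $\mu$-strong convexity can be rewritten as
\[
f(y) \ge f(x) + \frac{\mu}{2} (1 - \lambda) \|y - x\|^2 + \frac{f(x_\lambda) - f(x)}{\lambda}.
\]
Letting $\lambda \to 0$, the difference quotient converges to the directional derivative $f'(x; y - x)$ of $f$ at $x$ in the direction $y - x$, which dominates $\langle g_x, y - x \rangle$ for every subgradient $g_x$. Hence
\[
\begin{aligned}
f(y) &\ge f(x) + \frac{\mu}{2} \|y - x\|^2 + f'(x; y - x) \\
&\ge f(x) + \frac{\mu}{2} \|y - x\|^2 + \langle g_x, y - x \rangle \quad \forall g_x \in \partial f(x).
\end{aligned}
\]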

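(2) ⇒ (1) For $x, y \in X$ and $x_\lambda = x + \lambda(y - x)$, $\lambda \in [0, 1]$:
\[
\begin{aligned}
\lambda f(y) &\ge \lambda \Big( f(x_\lambda) + \langle g_{x_\lambda}, y - x_\lambda \rangle + \frac{\mu}{2} \|y - x_\lambda\|^2 \Big), \\
(1 - \lambda) f(x) &\ge (1 - \lambda) \Big( f(x_\lambda) + \langle g_{x_\lambda}, x - x_\lambda \rangle + \frac{\mu}{2} \|x - x_\lambda\|^2 \Big).
\end{aligned}
\]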
Summation yields (1).
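The summation step can be spelled out as follows (a worked computation added as a sketch, not part of the original notes). Since $y - x_\lambda = (1 - \lambda)(y - x)$ and $x - x_\lambda = -\lambda (y - x)$,
\[
\begin{aligned}
\lambda \langle g_{x_\lambda}, y - x_\lambda \rangle + (1 - \lambda) \langle g_{x_\lambda}, x - x_\lambda \rangle
&= \big( \lambda (1 - \lambda) - (1 - \lambda) \lambda \big) \langle g_{x_\lambda}, y - x \rangle = 0, \\
\lambda \|y - x_\lambda\|^2 + (1 - \lambda) \|x - x_\lambda\|^2
&= \big( \lambda (1 - \lambda)^2 + (1 - \lambda) \lambda^2 \big) \|y - x\|^2 = \lambda (1 - \lambda) \|y - x\|^2.
\end{aligned}
\]
Adding the two inequalities therefore gives
\[
\lambda f(y) + (1 - \lambda) f(x) \ge f(x_\lambda) + \frac{\mu}{2} \lambda (1 - \lambda) \|y - x\|^2,
\]
which is the inequality of Definition 6 with the roles of $x$ and $y$ exchanged.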
(2) ⇒ (3) Adding (2) to the same inequality with the roles of $x$ and $y$ exchanged yields the monotonicity property (3).
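(3) ⇒ (2) For $\lambda \in [0, 1]$, let $x_\lambda = x + \lambda(y - x)$. Given that $f$ is convex, for $g_{x_\lambda} \in \partial f(x_\lambda)$,
\[
f(y) - f(x) = \int_0^1 \langle g_{x_\lambda}, y - x \rangle \, d\lambda.
\]
Applying (3) to the pair $x_\lambda, x$ and dividing by $\lambda$ gives $\langle g_{x_\lambda}, y - x \rangle \ge \langle g_x, y - x \rangle + \mu \lambda \|x - y\|^2$. Integrating over $\lambda \in [0, 1]$, (2) follows.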

2 Fenchel duality of L-smooth and strongly convex functions


In the last lecture, the following relations between subgradients of a function and its convex
conjugate were established.

Lemma 8 (Fenchel-Young equality). For a proper, lower semicontinuous convex function $f : X \to \mathbb{R}$, the following conditions are equivalent:

1. $f(x) + f^*(y) = \langle y, x \rangle$;

2. $x \in \partial f^*(y)$;

3. $y \in \partial f(x)$.

Theorem 9. Let $f : X \to \mathbb{R}$. The following statements hold:

1. If $f$ is closed and $\mu$-strongly convex with respect to $\|\cdot\|$, then $f^*$ is $\frac{1}{\mu}$-smooth with respect to $\|\cdot\|_*$;

2. If $f$ is convex and $L$-smooth with respect to $\|\cdot\|$, then $f^*$ is $\frac{1}{L}$-strongly convex with respect to $\|\cdot\|_*$.

Proof. Both statements are direct consequences of the Fenchel-Young equality (Lemma 8), Theorem 4 and Theorem 7.

Claim 10. The negative entropy function $h(x)$ on the $n$-simplex is 1-strongly convex with respect to both $\|\cdot\|_1$ and $\|\cdot\|_2$.

Since the convex conjugate $h^*$ of $h$ is 1-smooth with respect to $\|\cdot\|_2$ and $\|\cdot\|_\infty$ (Claim 5), Theorem 9 applied to $h^*$ ensures 1-strong convexity of $h = h^{**}$ with respect to the dual norms $\|\cdot\|_2$ and $\|\cdot\|_1$.
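Claim 10 can be illustrated numerically through item 2 of Theorem 7. The sketch below is a random test under stated assumptions, not a proof: it samples points in the interior of the simplex, where $h$ is differentiable with $\nabla h(x)_i = \log x_i + 1$, and checks the strong convexity inequality with $\mu = 1$ for both norms. The sampling scheme and function names are illustrative.

```python
import math
import random


def random_simplex_point(n, rng):
    """Sample a point with strictly positive coordinates summing to 1."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def neg_entropy(x):
    return sum(v * math.log(v) for v in x)


def first_order_gap(y, x):
    """h(y) - h(x) - <grad h(x), y - x> for interior simplex points."""
    grad_x = [math.log(v) + 1.0 for v in x]
    linear = sum(g * (b - a) for g, a, b in zip(grad_x, x, y))
    return neg_entropy(y) - neg_entropy(x) - linear


rng = random.Random(3)
holds_norm1 = True
holds_norm2 = True
for _ in range(2000):
    x = random_simplex_point(6, rng)
    y = random_simplex_point(6, rng)
    gap = first_order_gap(y, x)
    d = [b - a for a, b in zip(x, y)]
    sq_norm1 = sum(abs(t) for t in d) ** 2
    sq_norm2 = sum(t * t for t in d)
    holds_norm1 = holds_norm1 and gap >= 0.5 * sq_norm1 - 1e-12
    holds_norm2 = holds_norm2 and gap >= 0.5 * sq_norm2 - 1e-12

print(holds_norm1, holds_norm2)
```

On the simplex the gap computed by `first_order_gap` coincides with the Kullback-Leibler divergence of $y$ from $x$, so the check with $\|\cdot\|_1$ is an instance of Pinsker's inequality.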
