Lecture 12
Definition 1. The dual space $X^*$ of $X$ is the space of linear forms on $X$, equipped with the norm $\|\cdot\|_*$ defined by
$$\|f\|_* = \max_{\|x\| = 1} f(x).$$
Remark 2. For $X$ with norm $\|\cdot\|_p$, the dual norm is $\|\cdot\|_q$, where $\frac{1}{p} + \frac{1}{q} = 1$ for $p > 1$, and $q = \infty$ for $p = 1$. In particular, if $X$ has the Euclidean norm, then so does $X^*$.
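As a numerical illustration of Remark 2, the sketch below assumes $X = \mathbb{R}^n$ with the $\ell_p$ norm. It builds the vector $x$ from the equality case of Hölder's inequality, which satisfies $\|x\|_p = 1$ and attains $\langle a, x\rangle = \|a\|_q$, and compares it with a crude random search over unit vectors. The helper names (`p_norm`, `conjugate_exponent`, `hoelder_maximizer`, `sampled_dual_norm`) are illustrative and do not come from the notes.

```python
import math
import random

def p_norm(v, p):
    """l_p norm of a vector v; p may be math.inf."""
    if p == math.inf:
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def conjugate_exponent(p):
    """q with 1/p + 1/q = 1 for 1 < p < inf, and q = inf for p = 1."""
    return math.inf if p == 1 else p / (p - 1)

def hoelder_maximizer(a, p):
    """A vector x with ||x||_p = 1 and <a, x> = ||a||_q (assumes a != 0, 1 <= p < inf)."""
    if p == 1:
        # Dual norm is ||a||_inf: put all mass on a largest-magnitude coordinate.
        k = max(range(len(a)), key=lambda i: abs(a[i]))
        return [math.copysign(1.0, a[k]) if i == k else 0.0 for i in range(len(a))]
    q = conjugate_exponent(p)
    scale = p_norm(a, q) ** (q - 1)
    # Equality case of Hoelder: x_i proportional to sign(a_i) |a_i|^(q - 1).
    return [math.copysign(abs(t) ** (q - 1), t) / scale for t in a]

def sampled_dual_norm(a, p, trials=2000, seed=0):
    """Lower estimate of max_{||x||_p = 1} <a, x> from random Gaussian directions."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in a]
        # Normalize to the unit l_p sphere, then evaluate the linear form.
        best = max(best, sum(s * t for s, t in zip(a, x)) / p_norm(x, p))
    return best
```

For example, with $a = (3, -4)$ and $p = 2$ the maximizer is $(0.6, -0.8)$ and $\langle a, x\rangle = 5 = \|a\|_2$; the random search can only approach this value from below.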
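Theorem 4. Let $f : X \to \mathbb{R}$ be convex and differentiable, and let $L > 0$. Then the following are equivalent for all $x, y \in X$ and $\lambda \in [0, 1]$:
1. $\|\nabla f(x) - \nabla f(y)\|_* \le L\|x - y\|$;
2. $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$;
3. $f(y) - f(x) - \langle \nabla f(x), y - x \rangle \ge \frac{1}{2L}\|\nabla f(x) - \nabla f(y)\|_*^2$;
4. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \ge \frac{1}{L}\|\nabla f(x) - \nabla f(y)\|_*^2$;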
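(2) ⇒ (3) Fix $x$ and define $\varphi(y) = f(y) - f(x) - \langle \nabla f(x), y - x \rangle$. Then $\nabla\varphi(y) = \nabla f(y) - \nabla f(x)$, and $\varphi$ satisfies (2) with the same constant $L$, since it differs from $f$ by an affine function. Since $\varphi$ is convex and $\nabla\varphi(x) = 0$, its minimum value is $\varphi(x) = 0$. For $y \in X$, set $z = y - \frac{\|\nabla\varphi(y)\|_*}{L} v$, where $v$ is chosen so that $\langle \nabla\varphi(y), v \rangle = \|\nabla\varphi(y)\|_*$ and $\|v\| = 1$. Then, applying (2) to $\varphi$,
$$\begin{aligned}
0 \le \varphi(z) &\le \varphi(y) - \frac{\|\nabla\varphi(y)\|_*}{L}\langle \nabla\varphi(y), v \rangle + \frac{L}{2}\Big\|\frac{\|\nabla\varphi(y)\|_*}{L} v\Big\|^2 \\
&= \varphi(y) - \frac{1}{2L}\|\nabla\varphi(y)\|_*^2 \\
&= f(y) - f(x) - \langle \nabla f(x), y - x \rangle - \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_*^2.
\end{aligned}$$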
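(3) ⇒ (4) For each $x, y \in X$,
$$\begin{aligned}
f(y) - f(x) - \langle \nabla f(x), y - x \rangle &\ge \frac{1}{2L}\|\nabla f(x) - \nabla f(y)\|_*^2, \\
f(x) - f(y) - \langle \nabla f(y), x - y \rangle &\ge \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_*^2.
\end{aligned}$$
Summation yields (4).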
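(4) ⇒ (1) Using (4) and Hölder's inequality $\langle g, h \rangle \le \|g\|_*\|h\|$ (which follows from the definition of the dual norm),
$$\frac{1}{L}\|\nabla f(x) - \nabla f(y)\|_*^2 \le \langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\|_*\|x - y\|.$$
Dividing by $\|\nabla f(x) - \nabla f(y)\|_*$ (the case where it vanishes is trivial) gives (1).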
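The equivalences of Theorem 4 can be sanity-checked on a simple example. The sketch below assumes the Euclidean norm (self-dual by Remark 2) and a hypothetical diagonal quadratic $f(x) = \frac{1}{2}\sum_i a_i x_i^2$ with $a_i \ge 0$, which is convex with smoothness constant $L = \max_i a_i$. It evaluates conditions (1) to (4) at random pairs of points; the function names are illustrative.

```python
import random

def make_diag_quadratic(a):
    """Hypothetical test function f(x) = 0.5 * sum a_i x_i^2 (a_i >= 0) and its gradient."""
    def f(x):
        return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
    def grad(x):
        return [ai * xi for ai, xi in zip(a, x)]
    return f, grad

def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def sub(u, v):
    return [s - t for s, t in zip(u, v)]

def theorem4_conditions(f, grad, L, x, y, tol=1e-9):
    """Booleans for conditions (1)-(4) of Theorem 4 at (x, y), Euclidean norm."""
    gx, gy = grad(x), grad(y)
    dg = sub(gx, gy)                  # grad f(x) - grad f(y)
    d = sub(y, x)                     # y - x
    gap = f(y) - f(x) - dot(gx, d)    # f(y) - f(x) - <grad f(x), y - x>
    c1 = dot(dg, dg) <= L * L * dot(d, d) + tol       # (1), both sides squared
    c2 = gap <= 0.5 * L * dot(d, d) + tol             # (2)
    c3 = gap >= dot(dg, dg) / (2 * L) - tol           # (3)
    c4 = -dot(dg, d) >= dot(dg, dg) / L - tol         # (4), since <dg, x - y> = -<dg, y - x>
    return (c1, c2, c3, c4)

def all_hold(a, L, trials=500, seed=0):
    """True if (1)-(4) hold at every sampled pair of points."""
    f, grad = make_diag_quadratic(a)
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5, 5) for _ in a]
        y = [rng.uniform(-5, 5) for _ in a]
        if not all(theorem4_conditions(f, grad, L, x, y)):
            return False
    return True
```

With a constant smaller than $\max_i a_i$ the checks fail: for $a = (1)$, $L = 0.5$, $x = 0$, $y = 1$ all four conditions are violated, while $L = 1$ makes each of them hold with equality.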
(2) ⇒ (5) This follows from the definition of convexity and the inequality in (2).
(5) ⇒ (2) Rewrite (5) as
Claim 5. The function $f(x) = \log\left(\sum_{i=1}^n \exp x_i\right)$ is 1-smooth with respect to $\|\cdot\|_2$ and $\|\cdot\|_\infty$.
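1. In the case of the Euclidean norm, it suffices to show that the largest eigenvalue of the Hessian is at most $L = 1$ everywhere. Writing $\sigma = \nabla f(x)$, i.e. $\sigma_i = \exp x_i / \sum_{j=1}^n \exp x_j$, the Hessian is $\nabla^2 f(x) = \mathrm{diag}(\sigma) - \sigma\sigma^\top$. Since $\sigma\sigma^\top \succeq 0$ (so that, by Weyl's inequality, $\lambda_{\max}(\nabla^2 f(x)) \le \lambda_{\max}(\mathrm{diag}(\sigma))$) and $\sigma_i \le 1$, we get $\nabla^2 f(x) \preceq \mathrm{diag}(\sigma) \preceq I$, so $f$ is 1-smooth with respect to $\|\cdot\|_2$.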
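2. Given $\|\cdot\|_\infty$, for any $d \in \mathbb{R}^n$ the inequality
$$\langle \nabla^2 f(x) d, d \rangle \le \langle \mathrm{diag}(\sigma) d, d \rangle = \sum_{i=1}^n \sigma_i d_i^2 \le \|d\|_\infty^2$$
holds, using $\sum_i \sigma_i = 1$. Since $f$ is twice continuously differentiable, for $x, y \in \mathbb{R}^n$ there exists some $z \in [x, y]$ such that
$$\begin{aligned}
f(y) &= f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2}\langle \nabla^2 f(z)(y - x), y - x \rangle \\
&\le f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2}\|x - y\|_\infty^2.
\end{aligned}$$
By Theorem 4 (condition 2 with $L = 1$), $f$ is 1-smooth with respect to $\|\cdot\|_\infty$.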
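The two bounds in the proof of Claim 5 can be checked numerically. The sketch below uses the Hessian formula $\nabla^2 f(x) = \mathrm{diag}(\sigma) - \sigma\sigma^\top$, so that $\langle \nabla^2 f(x) d, d\rangle = \sum_i \sigma_i d_i^2 - (\sum_i \sigma_i d_i)^2$, and also tests the one-sided inequality (2) of Theorem 4 directly in both norms. The sampling ranges and helper names are arbitrary choices for illustration.

```python
import math
import random

def lse(x):
    """f(x) = log(sum_i exp x_i), computed stably by shifting by max(x)."""
    m = max(x)
    return m + math.log(sum(math.exp(t - m) for t in x))

def softmax(x):
    """sigma = grad f(x); shifting by max(x) does not change the result."""
    m = max(x)
    e = [math.exp(t - m) for t in x]
    s = sum(e)
    return [t / s for t in e]

def hessian_quadratic_form(x, d):
    """<hess f(x) d, d> = sum sigma_i d_i^2 - (sum sigma_i d_i)^2."""
    s = softmax(x)
    return sum(si * di * di for si, di in zip(s, d)) - sum(si * di for si, di in zip(s, d)) ** 2

def check_claim5(n=4, trials=1000, seed=0, tol=1e-9):
    """True if both Hessian bounds and inequality (2) hold at all sampled points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(n)]
        y = [rng.uniform(-3, 3) for _ in range(n)]
        d = [b - a for a, b in zip(x, y)]
        q = hessian_quadratic_form(x, d)
        sq_l2 = sum(t * t for t in d)
        sq_inf = max(abs(t) for t in d) ** 2
        linear = lse(x) + sum(g * t for g, t in zip(softmax(x), d))
        ok = (
            -tol <= q <= sq_l2 + tol                   # 0 <= hess f(x) <= I
            and q <= sq_inf + tol                      # <hess f(x) d, d> <= ||d||_inf^2
            and lse(y) <= linear + 0.5 * sq_l2 + tol   # (2) with L = 1, l_2 norm
            and lse(y) <= linear + 0.5 * sq_inf + tol  # (2) with L = 1, l_inf norm
        )
        if not ok:
            return False
    return True
```

At $x = 0$ in two dimensions with $d = (1, -1)$, the quadratic form equals $1 = \|d\|_\infty^2$, so the $\|\cdot\|_\infty$ bound cannot be improved in general.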
Proof. (1) ⇒ (2) Let $x_\lambda = x + \lambda(y - x)$, $\lambda \in (0, 1]$. The definition of $\mu$-strong convexity can be rewritten as
$$f(y) \ge f(x) + \frac{\mu}{2}(1 - \lambda)\|y - x\|^2 + \frac{f(x_\lambda) - f(x)}{\lambda}.$$
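Letting $\lambda \to 0$,
$$\begin{aligned}
f(y) &\ge f(x) + \frac{\mu}{2}\|y - x\|^2 + f'(x; y - x) \\
&\ge f(x) + \frac{\mu}{2}\|y - x\|^2 + \langle g_x, y - x \rangle \quad \forall g_x \in \partial f(x),
\end{aligned}$$
where $f'(x; y - x)$ is the directional derivative of $f$ at $x$ in the direction $y - x$, which is at least $\langle g_x, y - x \rangle$ for every subgradient $g_x$.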
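(2) ⇒ (1) For $x, y \in X$, let $x_\lambda = x + \lambda(y - x)$, $\lambda \in [0, 1]$, and $g_{x_\lambda} \in \partial f(x_\lambda)$. By (2),
$$\begin{aligned}
\lambda f(y) &\ge \lambda\Big(f(x_\lambda) + \langle g_{x_\lambda}, y - x_\lambda \rangle + \frac{\mu}{2}\|y - x_\lambda\|^2\Big), \\
(1 - \lambda) f(x) &\ge (1 - \lambda)\Big(f(x_\lambda) + \langle g_{x_\lambda}, x - x_\lambda \rangle + \frac{\mu}{2}\|x - x_\lambda\|^2\Big).
\end{aligned}$$
Summation yields (1): the linear terms cancel because $\lambda(y - x_\lambda) + (1 - \lambda)(x - x_\lambda) = 0$, and since $y - x_\lambda = (1 - \lambda)(y - x)$ and $x - x_\lambda = -\lambda(y - x)$, the quadratic terms add up to $\frac{\mu}{2}\big(\lambda(1 - \lambda)^2 + (1 - \lambda)\lambda^2\big)\|y - x\|^2 = \frac{\mu}{2}\lambda(1 - \lambda)\|y - x\|^2$.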
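(2) ⇒ (3) Monotonicity follows immediately from (2): adding (2) for the pair $(x, y)$ to (2) for the pair $(y, x)$ gives $\langle g_x - g_y, x - y \rangle \ge \mu\|x - y\|^2$ for all $g_x \in \partial f(x)$, $g_y \in \partial f(y)$.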
(3) ⇒ (2) For $\lambda \in [0, 1]$, let $x_\lambda = x + \lambda(y - x)$. Given that $f$ is convex, for $g_{x_\lambda} \in \partial f(x_\lambda)$,
$$f(y) - f(x) = \int_0^1 \langle g_{x_\lambda}, y - x \rangle \, d\lambda.$$
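To see the strong convexity conditions side by side, here is a one-dimensional sketch with the hypothetical function $f(x) = \frac{\mu}{2}x^2 + |x|$, which is $\mu$-strongly convex but not differentiable at $0$, so a subgradient has to be chosen there. It assumes that (1) is the usual definition $f(x_\lambda) \le \lambda f(y) + (1 - \lambda) f(x) - \frac{\mu}{2}\lambda(1 - \lambda)\|y - x\|^2$ and checks monotonicity in the strong form $\langle g_x - g_y, x - y\rangle \ge \mu\|x - y\|^2$; the value `MU = 0.5` is arbitrary.

```python
import random

MU = 0.5  # hypothetical strong-convexity modulus for this example

def f(x):
    """(MU/2) x^2 + |x|: MU-strongly convex, not differentiable at 0."""
    return 0.5 * MU * x * x + abs(x)

def subgradient(x):
    """One element of the subdifferential of f at x (at 0 we pick 0 from [-1, 1])."""
    sign = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
    return MU * x + sign

def strong_convexity_checks(x, y, lam, tol=1e-9):
    """Definition (1), first-order bound (2), and strong monotonicity at (x, y, lam)."""
    x_lam = x + lam * (y - x)
    g_x, g_y = subgradient(x), subgradient(y)
    c1 = f(x_lam) <= lam * f(y) + (1 - lam) * f(x) - 0.5 * MU * lam * (1 - lam) * (y - x) ** 2 + tol
    c2 = f(y) >= f(x) + g_x * (y - x) + 0.5 * MU * (y - x) ** 2 - tol
    c3 = (g_x - g_y) * (x - y) >= MU * (x - y) ** 2 - tol
    return (c1, c2, c3)

def all_checks_hold(trials=1000, seed=0):
    """Check consecutive random points, starting at the kink x = 0."""
    rng = random.Random(seed)
    points = [0.0] + [rng.uniform(-4, 4) for _ in range(trials)]
    for i in range(trials):
        if not all(strong_convexity_checks(points[i], points[i + 1], rng.random())):
            return False
    return True
```

For $x = 0$, $y = 2$, $\lambda = \frac{1}{2}$ the definition holds with equality, because both the quadratic term and $|x|$ are affine along that segment.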
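Lemma 8 (Fenchel-Young equality). For a proper, lower semicontinuous convex function $f : X \to \mathbb{R}$ and for $x \in X$, $y \in X^*$, the following conditions are equivalent:
1. $f(x) + f^*(y) = \langle y, x \rangle$;
2. $x \in \partial f^*(y)$;
3. $y \in \partial f(x)$.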
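A concrete instance of Lemma 8 pairs the negative entropy $h(x) = \sum_i x_i \log x_i$ on the simplex with its convex conjugate $h^*(y) = \log\sum_i \exp y_i$ (a standard fact, assumed here rather than derived). The sketch computes the Fenchel-Young gap $h(x) + h^*(y) - \langle x, y\rangle$: it is nonnegative for every $x$ in the simplex and vanishes at $x = \nabla h^*(y) = \mathrm{softmax}(y)$, i.e. condition 1 holds exactly when condition 2 does. Function names are illustrative.

```python
import math

def neg_entropy(x):
    """h(x) = sum x_i log x_i on the simplex, with the convention 0 log 0 = 0."""
    return sum(t * math.log(t) for t in x if t > 0)

def lse(y):
    """h*(y) = log sum exp y_i, computed stably by shifting by max(y)."""
    m = max(y)
    return m + math.log(sum(math.exp(t - m) for t in y))

def softmax(y):
    """grad h*(y), a point of the simplex."""
    m = max(y)
    e = [math.exp(t - m) for t in y]
    s = sum(e)
    return [t / s for t in e]

def fenchel_young_gap(x, y):
    """h(x) + h*(y) - <x, y>: nonnegative, and zero when x = softmax(y)."""
    return neg_entropy(x) + lse(y) - sum(a * b for a, b in zip(x, y))
```

For $y = (0.3, -1.2, 2.0)$, the gap is zero up to rounding at `softmax(y)` and strictly positive at the uniform point or at a vertex of the simplex.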
Claim 10. The negative entropy function $h(x) = \sum_{i=1}^n x_i \log x_i$ on the $n$-simplex is 1-strongly convex with respect to both $\|\cdot\|_1$ and $\|\cdot\|_2$.
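The convex conjugate of $h$ is $h^*(y) = \log\left(\sum_{i=1}^n \exp y_i\right)$, which is 1-smooth with respect to $\|\cdot\|_2$ and $\|\cdot\|_\infty$ by Claim 5. Hence 9 ensures 1-strong convexity of $h$ with respect to the dual norms $\|\cdot\|_2$ and $\|\cdot\|_1$.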
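A direct check of Claim 10 for $\|\cdot\|_1$: for $x, y$ in the simplex with $x$ strictly positive, the Bregman divergence $h(y) - h(x) - \langle \nabla h(x), y - x\rangle$ (with $\nabla h(x)_i = \log x_i + 1$) equals $\mathrm{KL}(y \,\|\, x) = \sum_i y_i \log(y_i / x_i)$, so 1-strong convexity reduces to Pinsker's inequality $\mathrm{KL}(y \,\|\, x) \ge \frac{1}{2}\|y - x\|_1^2$, and the $\|\cdot\|_2$ case follows from $\|\cdot\|_2 \le \|\cdot\|_1$. The sketch verifies the identity and both inequalities at random points; the sampling scheme and helper names are illustrative assumptions.

```python
import math
import random

def neg_entropy(x):
    """h(x) = sum x_i log x_i, with the convention 0 log 0 = 0."""
    return sum(t * math.log(t) for t in x if t > 0)

def bregman(y, x):
    """h(y) - h(x) - <grad h(x), y - x>, with grad h(x)_i = log x_i + 1 (x > 0)."""
    return neg_entropy(y) - neg_entropy(x) - sum((math.log(b) + 1) * (a - b) for a, b in zip(y, x))

def kl(y, x):
    """KL(y || x) for simplex points, x strictly positive."""
    return sum(a * math.log(a / b) for a, b in zip(y, x) if a > 0)

def random_simplex_point(n, rng):
    """Normalized exponential weights; the tiny offset keeps every entry positive."""
    w = [rng.expovariate(1.0) + 1e-12 for _ in range(n)]
    s = sum(w)
    return [t / s for t in w]

def claim10_holds(n=5, trials=1000, seed=0, tol=1e-9):
    """True if Bregman = KL and KL >= 0.5||y-x||_1^2 >= 0.5||y-x||_2^2 at all samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = random_simplex_point(n, rng)
        y = random_simplex_point(n, rng)
        d = [a - b for a, b in zip(y, x)]
        half_l1_sq = 0.5 * sum(abs(t) for t in d) ** 2
        half_l2_sq = 0.5 * sum(t * t for t in d)
        D = bregman(y, x)
        if abs(D - kl(y, x)) > tol or D < half_l1_sq - tol or half_l1_sq < half_l2_sq - tol:
            return False
    return True
```

For instance, from the uniform point to a vertex of the 3-simplex the divergence is $\log 3 \approx 1.099$, while $\frac{1}{2}\|y - x\|_1^2 = \frac{8}{9}$.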