
Convex Optimization

(EE227A: UC Berkeley)

Lecture 4
(Conjugates, subdifferentials)
31 Jan, 2013

Suvrit Sra
Organizational
♥ HW1 due: 14th Feb 2013 in class.
♥ Please LaTeX your solutions (contact TA if this is an issue)
♥ Discussion with classmates is ok
♥ Each person must submit his/her individual solutions
♥ Acknowledge any help you receive
♥ Do not copy!
♥ Make sure you understand the solution you submit
♥ Cite any source that you use
♥ Have fun solving problems!

2 / 30
Recap
▸ Eigenvalues, singular values, positive definiteness
▸ Convex sets, θ1 x + θ2 y ∈ C, θ1 + θ2 = 1, θi ≥ 0
▸ Convex functions, midpoint convex, recognizing convexity
▸ Norms, mixed-norms, matrix norms, dual norms
▸ Indicator, distance function, minimum of jointly convex
▸ Brief mention of other forms of convexity

f((x + y)/2) ≤ ½[f(x) + f(y)] + continuity =⇒ f is cvx
∇²f(x) ⪰ 0 implies f is cvx.

3 / 30
Fenchel Conjugate

4 / 30
Dual norms

Def. Let ‖·‖ be a norm on R^n. Its dual norm is

    ‖u‖∗ := sup { u^T x | ‖x‖ ≤ 1 }.

Exercise: Verify that we may write ‖u‖∗ = sup_{x ≠ 0} u^T x / ‖x‖
Exercise: Verify that ‖u‖∗ is a norm.
▸ ‖u + v‖∗ = sup { (u + v)^T x | ‖x‖ ≤ 1 }
▸ But sup (A + B) ≤ sup A + sup B

Exercise: Let 1/p + 1/q = 1, where p, q ≥ 1. Show that ‖·‖_q is
dual to ‖·‖_p. In particular, the ℓ2-norm is self-dual.
Hint: Use Hölder's inequality: u^T v ≤ ‖u‖_p ‖v‖_q
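A quick numerical sanity check of this exercise (my own sketch, not part of the slides; assumes numpy). Random sampling only lower-bounds the sup in the definition, but in low dimensions the sampled value should land close to the predicted ‖u‖_q.

```python
# Rough check (assumption: numpy available) that the dual of ||.||_p is ||.||_q,
# 1/p + 1/q = 1, using ||u||_* = sup_{x != 0} u^T x / ||x||_p.
# Sampling only lower-bounds the sup, so expect values slightly below ||u||_q.
import numpy as np

rng = np.random.default_rng(0)
u = np.array([3.0, -1.0, 2.0])

for p, q in [(1.0, np.inf), (2.0, 2.0), (np.inf, 1.0)]:
    X = rng.standard_normal((500_000, u.size))
    ratios = X @ u / np.linalg.norm(X, ord=p, axis=1)   # u^T x / ||x||_p
    print(p, ratios.max(), np.linalg.norm(u, ord=q))    # sampled sup vs ||u||_q
```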

5 / 30
Fenchel conjugate
Def. The Fenchel conjugate of a function f is

    f∗(z) := sup_{x ∈ dom f} ( x^T z − f(x) ).

Note: f∗ is a pointwise (over x) sup of linear functions of z, so cvx!

Example Let f(x) = ‖x‖. We have f∗(z) = I_{‖·‖∗ ≤ 1}(z). That is, the
conjugate of a norm is the indicator function of the dual-norm ball.

▸ Consider two cases: (i) ‖z‖∗ > 1; (ii) ‖z‖∗ ≤ 1
▸ Case (i): by definition of the dual norm (sup over z^T u) there is a u s.t.
  ‖u‖ ≤ 1 and z^T u > 1
▸ f∗(z) = sup_x x^T z − f(x). Rewrite x = αu, and let α → ∞
▸ Then, z^T x − ‖x‖ = α z^T u − ‖αu‖ = α(z^T u − ‖u‖) → ∞
▸ Case (ii): Since z^T x ≤ ‖x‖ ‖z‖∗, x^T z − ‖x‖ ≤ ‖x‖(‖z‖∗ − 1) ≤ 0.
▸ x = 0 maximizes ‖x‖(‖z‖∗ − 1), hence f∗(z) = 0.
▸ Thus, f∗(z) = +∞ in case (i) and 0 in case (ii), as desired.
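A crude numerical illustration of this example (my sketch, not from the lecture; assumes numpy), using f(x) = ‖x‖₁ whose dual norm is ‖·‖∞: the conjugate should be about 0 inside the dual ball and blow up outside (here capped only by the finite search box).

```python
# Sketch: for f(x) = ||x||_1, f*(z) = 0 if ||z||_inf <= 1, +infinity otherwise.
# We approximate the sup over a large but finite box, so "+infinity" shows up
# as a value on the order of the box size.
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-100.0, 100.0, size=(200_000, 2))        # search box for the sup

def conj_estimate(z):
    return float(np.max(X @ z - np.abs(X).sum(axis=1)))  # sup_x x^T z - ||x||_1

print(conj_estimate(np.array([0.7, -0.9])))   # ~ 0   (||z||_inf <= 1)
print(conj_estimate(np.array([1.5,  0.2])))   # ~ 50  (||z||_inf > 1; grows with the box)
```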
6 / 30
Fenchel conjugate
Example f(x) = ax + b; then,

    f∗(z) = sup_x ( zx − (ax + b) ) = ∞, if z ≠ a.

Thus, dom f∗ = {a}, and f∗(a) = −b.

Example Let a ≥ 0, and set f(x) = −√(a² − x²) if |x| ≤ a, and +∞
otherwise. Then, f∗(z) = a√(1 + z²).

Example f(x) = ½ x^T A x, where A ≻ 0. Then, f∗(z) = ½ z^T A⁻¹ z.

Exercise: If f(x) = max(0, 1 − x), then dom f∗ is [−1, 0], and
within this domain, f∗(z) = z.
Hint: Analyze the cases max(0, 1 − x) = 0 and max(0, 1 − x) = 1 − x.
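A small numerical check of the quadratic example above (my own sketch; assumes numpy and scipy, and the particular A and z are arbitrary test data): a generic maximizer of x^T z − ½ x^T A x should reproduce ½ z^T A⁻¹ z.

```python
# Sketch: verify f*(z) = 0.5 z^T A^{-1} z for f(x) = 0.5 x^T A x with A > 0.
import numpy as np
from scipy.optimize import minimize

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])            # symmetric positive definite
z = np.array([1.0, -2.0])

neg_obj = lambda x: -(x @ z - 0.5 * x @ A @ x)   # we minimize the negative
res = minimize(neg_obj, x0=np.zeros(2))

print(-res.fun)                          # numerical value of f*(z)
print(0.5 * z @ np.linalg.solve(A, z))   # closed form, ~ 3.1429 here
```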

7 / 30
Subdifferentials

8 / 30
First order global underestimator

[Figure: a convex function f(x) and its tangent line f(y) + ⟨∇f(y), x − y⟩ touching it at y]

f(x) ≥ f(y) + ⟨∇f(y), x − y⟩

9 / 30
First order global underestimator

[Figure: the same f(x) with several supporting lines f(y) + ⟨g1, x − y⟩ at y, with slopes g1, g2, g3]

f(x) ≥ f(y) + ⟨g, x − y⟩

9 / 30
Subgradients

[Figure: as above; the lines through (y, f(y)) with slopes g1, g2, g3 all lie below f]

g1, g2, g3 are subgradients at y

10 / 30
Subgradients – basic facts
▸ f is convex, differentiable: ∇f(y) is the unique subgradient at y
▸ A vector g is a subgradient at a point y if and only if
  f(y) + ⟨g, x − y⟩ globally underestimates f(x)
▸ Usually, one subgradient costs approx. as much as evaluating f(x)
▸ Determining all subgradients at a given point — difficult.
▸ Subgradient calculus — a great achievement in convex analysis
▸ Without convexity, things become wild! — advanced course

11 / 30
Subgradients – example

f (x) := max(f1 (x), f2 (x)); both f1 , f2 convex, differentiable

[Figure: f(x) = max of f1(x) and f2(x); the two curves cross at y]

⋆ f1(x) > f2(x): unique subgradient of f is f1′(x)
⋆ f1(x) < f2(x): unique subgradient of f is f2′(x)
⋆ f1(y) = f2(y): the subgradients form the segment [f1′(y), f2′(y)]
  (imagine all supporting lines turning about the point y)
12 / 30
Subgradients

Def. A vector g ∈ R^n is called a subgradient at a point y if, for all
x ∈ dom f, it holds that

    f(x) ≥ f(y) + ⟨g, x − y⟩

13 / 30
Subdifferential

Def. The set of all subgradients at y is denoted by ∂f(y). This set is
called the subdifferential of f at y.

If f is convex, ∂f(x) is nice:
♣ If x ∈ relative interior of dom f, then ∂f(x) is nonempty
♣ If f is differentiable at x, then ∂f(x) = {∇f(x)}
♣ If ∂f(x) = {g}, then f is differentiable and g = ∇f(x)

14 / 30
Subdifferential – example

f (x) = |x|

[Figure: plot of ∂f(x), equal to −1 for x < 0, +1 for x > 0, and jumping through [−1, 1] at x = 0]

            −1        x < 0,
    ∂|x| =  +1        x > 0,
            [−1, 1]   x = 0.

15 / 30
More examples

Example f(x) = ‖x‖₂. Then,

    ∂f(x) :=  x/‖x‖₂             x ≠ 0,
              {z | ‖z‖₂ ≤ 1}     x = 0.

Proof (case x = 0). Any g ∈ ∂f(0) must satisfy, for all z,
    ‖z‖₂ ≥ ‖0‖₂ + ⟨g, z − 0⟩ = ⟨g, z⟩,
and taking z = g gives
    =⇒ ‖g‖₂ ≤ 1.
(Conversely, if ‖g‖₂ ≤ 1 then ⟨g, z⟩ ≤ ‖g‖₂‖z‖₂ ≤ ‖z‖₂ by Cauchy–Schwarz, so every such g is a subgradient.)
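A quick numerical check of this characterization (my sketch; assumes numpy): any g with ‖g‖₂ ≤ 1 should satisfy ‖z‖₂ ≥ ⟨g, z⟩ on random test points, while a g with ‖g‖₂ > 1 should get caught violating it.

```python
# Sketch: test the subgradient inequality ||z||_2 >= <g, z> at x = 0
# for candidate subgradients g of f(x) = ||x||_2.
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((100_000, 5))            # random test points z

def is_subgradient_at_zero(g):
    return bool(np.all(np.linalg.norm(Z, axis=1) >= Z @ g - 1e-12))

g_in  = rng.standard_normal(5); g_in  *= 0.9 / np.linalg.norm(g_in)   # ||g||_2 = 0.9
g_out = rng.standard_normal(5); g_out *= 1.1 / np.linalg.norm(g_out)  # ||g||_2 = 1.1

print(is_subgradient_at_zero(g_in))    # expected: True
print(is_subgradient_at_zero(g_out))   # expected: False (violated for z near g_out)
```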

16 / 30
More examples

Example A convex function need not be subdifferentiable everywhere. Let

    f(x) :=  −(1 − ‖x‖₂²)^(1/2)   if ‖x‖₂ ≤ 1,
             +∞                   otherwise.

f is differentiable for all x with ‖x‖₂ < 1, but ∂f(x) = ∅ whenever ‖x‖₂ ≥ 1.

17 / 30
Calculus

18 / 30
Recall basic calculus
If f and k are differentiable, we know that
Addition: ∇(f + k)(x) = ∇f(x) + ∇k(x)
Scaling: ∇(αf)(x) = α∇f(x)

Chain rule
If f : R^n → R^m and k : R^m → R^p, let h : R^n → R^p be the
composition h(x) = (k ∘ f)(x) = k(f(x)). Then,

    Dh(x) = Dk(f(x)) Df(x).

Example If f : R^n → R and k : R → R, then using the fact that
∇h(x) = [Dh(x)]^T, we obtain

    ∇h(x) = k′(f(x)) ∇f(x).
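A short numerical check of this scalar chain rule (my sketch, not from the slides; assumes numpy), comparing k′(f(x))∇f(x) against central finite differences for a simple smooth pair f, k.

```python
# Sketch: verify grad h(x) = k'(f(x)) * grad f(x) for h = k o f,
# with the arbitrary choices f(x) = ||x||_2^2 and k(t) = exp(t).
import numpy as np

f, grad_f = (lambda x: x @ x), (lambda x: 2.0 * x)   # f: R^n -> R
k, dk     = np.exp, np.exp                           # k: R -> R
h = lambda x: k(f(x))

x = np.array([0.3, -0.7, 0.2])
analytic = dk(f(x)) * grad_f(x)

eps = 1e-6
numeric = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps) for e in np.eye(3)])

print(analytic)
print(numeric)    # should agree to roughly 1e-6
```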

19 / 30
Subgradient calculus
♠ Finding one subgradient within ∂f (x)
♠ Determining entire subdifferential ∂f (x) at a point x
♠ Do we have the chain rule?
♠ Usually not easy!

20 / 30
Subgradient calculus
♦ If f is differentiable, ∂f(x) = {∇f(x)}
♦ Scaling: for α > 0, ∂(αf)(x) = α ∂f(x) = {αg | g ∈ ∂f(x)}
♦ Addition∗: ∂(f + k)(x) = ∂f(x) + ∂k(x) (set addition)
♦ Chain rule∗: Let A ∈ R^{m×n}, b ∈ R^m, f : R^m → R, and
  h : R^n → R be given by h(x) = f(Ax + b). Then,

      ∂h(x) = A^T ∂f(Ax + b).

♦ Chain rule∗: h(x) = f ∘ k, where k : X → Y is diff.

      ∂h(x) = ∂f(k(x)) ∘ Dk(x) = [Dk(x)]^T ∂f(k(x))

♦ Max function∗: If f(x) := max_{1≤i≤m} fi(x), then

      ∂f(x) = conv ⋃ {∂fi(x) | fi(x) = f(x)},

  the convex hull over the subdifferentials of the “active” functions at x
♦ Conjugation: z ∈ ∂f(x) if and only if x ∈ ∂f∗(z)
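A numerical illustration of the affine chain rule above (my own sketch; assumes numpy, with A, b, x arbitrary test data): take any g in the subdifferential of ‖·‖₁ at Ax + b (a sign vector) and check that A^T g satisfies the subgradient inequality for h(x) = ‖Ax + b‖₁.

```python
# Sketch: A^T g, with g a subgradient of ||.||_1 at Ax + b, should be a
# subgradient of h(x) = ||Ax + b||_1 (affine chain rule).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

h = lambda v: np.abs(A @ v + b).sum()

g_inner = np.sign(A @ x + b)          # one subgradient of ||.||_1 at Ax + b
g = A.T @ g_inner                     # candidate subgradient of h at x

Z = rng.standard_normal((10_000, 3))                    # random test points
h_vals = np.abs((x + Z) @ A.T + b).sum(axis=1)          # h(x + z) for each row z
print(bool(np.all(h_vals >= h(x) + Z @ g - 1e-9)))      # expected: True
```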

21 / 30
Examples

It can happen that ∂(f1 + f2 ) 6= ∂f1 + ∂f2

Example Define f1 and f2 by

    f1(x) :=  −2√x   if x ≥ 0,        f2(x) :=  +∞        if x > 0,
              +∞     if x < 0,                  −2√(−x)   if x ≤ 0.

Then, f1 + f2 = max{f1, f2} = I_{0} (the indicator of the set {0}), whereby ∂(f1 + f2)(0) = R


But ∂f1 (0) = ∂f2 (0) = ∅.

However, ∂f1 (x) + ∂f2 (x) ⊂ ∂(f1 + f2 )(x) always holds.

22 / 30
Examples

Example f(x) = ‖x‖∞. Then,

    ∂f(0) = conv {±e1, . . . , ±en},

where ei is the i-th canonical basis vector (column of the identity matrix).

To prove this, notice that f(x) = max_{1≤i≤n} |ei^T x|.


23 / 30
Example (S. Boyd)

Example Let f(x) = max { s^T x | si ∈ {−1, 1} }   (2^n members)




[Figure: three panels showing ∂f at x = (0, 0) (the full square conv{(±1, ±1)}), at x = (1, 0) (the edge from (1, −1) to (1, 1)), and at x = (1, 1) (the single corner (1, 1))]

24 / 30
Rules for subgradients

25 / 30
Subgradient for pointwise sup
    f(x) := sup_{y ∈ Y} h(x, y)

Getting ∂f(x) is complicated!

A simple way to obtain some g ∈ ∂f(x):

▸ Pick any y∗ for which h(x, y∗) = f(x)
▸ Pick any subgradient g ∈ ∂h(x, y∗)
▸ This g ∈ ∂f(x)

    h(z, y∗) ≥ h(x, y∗) + g^T (z − x)
    h(z, y∗) ≥ f(x) + g^T (z − x)
    f(z) ≥ h(z, y∗)           (because of the sup)
    f(z) ≥ f(x) + g^T (z − x).

26 / 30
Example
Suppose ai ∈ R^n and bi ∈ R, and

    f(x) := max_{1≤i≤n} (ai^T x + bi).

This f is a max (in fact, over a finite number of terms)

▸ Suppose f(x) = ak^T x + bk for some index k
▸ Here h(x, y∗) = fk(x) = ak^T x + bk, and ∂fk(x) = {∇fk(x)}
▸ Hence, ak ∈ ∂f(x) works!
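A concrete sketch of this recipe (mine, not from the slides; assumes numpy): evaluate all affine pieces, pick any maximizing index k, and return ak as a subgradient.

```python
# Sketch: one subgradient of f(x) = max_i (a_i^T x + b_i) via an active index.
import numpy as np

def max_affine_subgradient(A, b, x):
    """Return (f(x), g) with g a subgradient of x -> max_i (A[i] @ x + b[i])."""
    vals = A @ x + b
    k = int(np.argmax(vals))           # any active index works
    return vals[k], A[k]

rng = np.random.default_rng(3)
A, b, x = rng.standard_normal((6, 3)), rng.standard_normal(6), rng.standard_normal(3)

fx, g = max_affine_subgradient(A, b, x)
z = rng.standard_normal(3)
print(np.max(A @ z + b) >= fx + g @ (z - x))   # expected: True
```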

27 / 30
Subgradient of expectation
Suppose f = E f(x, u), where f is convex in x for each u (an r.v.):

    f(x) := ∫ f(x, u) p(u) du

▸ For each u choose any g(x, u) ∈ ∂_x f(x, u)
▸ Then, g = ∫ g(x, u) p(u) du = E g(x, u) ∈ ∂f(x)
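A minimal Monte Carlo illustration of this rule (my sketch; assumes numpy, and the choice f(x, u) = |x − u| with u standard normal is mine): averaging the per-sample subgradients sign(x − u) approximates a subgradient of the expectation, which here equals 2Φ(x) − 1.

```python
# Sketch: stochastic subgradient of f(x) = E|x - u| with u ~ N(0, 1).
# sign(x - u) is a subgradient of x -> |x - u|; its average over samples
# approximates an element of the subdifferential of the expectation.
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(1_000_000)
x = 0.5

g_mc = np.mean(np.sign(x - u))     # Monte Carlo estimate of E sign(x - u)
g_exact = 2 * 0.6915 - 1           # 2*Phi(0.5) - 1, with Phi(0.5) ~ 0.6915
print(g_mc, g_exact)               # both ~ 0.383
```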

28 / 30
Subgradient of composition
Suppose h : R^n → R is cvx and nondecreasing in each argument; each fi is cvx

    f(x) := h(f1(x), f2(x), . . . , fn(x)).

To find a vector g ∈ ∂f(x), we may:

▸ For i = 1 to n, compute gi ∈ ∂fi(x)
▸ Compute u ∈ ∂h(f1(x), . . . , fn(x))
▸ Set g = u1 g1 + u2 g2 + · · · + un gn; this g ∈ ∂f(x)
▸ Compare with the smooth case: ∇f(x) = J ∇h(f1(x), . . . , fn(x)), where J is the matrix with columns ∇fi(x)

Exercise: Verify g ∈ ∂f(x) by showing f(z) ≥ f(x) + g^T (z − x)
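A numerical sketch of this recipe (mine; assumes numpy): with h(y1, y2) = max(y1, y2), which is convex and nondecreasing in each argument, and convex pieces f1(x) = ‖x‖₁ and f2(x) = x^T x, the weight vector u picks out an active coordinate and g = u1 g1 + u2 g2 passes the subgradient test.

```python
# Sketch: subgradient of f(x) = h(f1(x), f2(x)) with h(y) = max(y1, y2)
# (convex, nondecreasing) and convex pieces f1(x) = ||x||_1, f2(x) = x^T x.
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(3)

f1, g1 = np.abs(x).sum(), np.sign(x)     # f1(x) and one of its subgradients
f2, g2 = x @ x, 2 * x                    # f2 is smooth
u = np.array([1.0, 0.0]) if f1 >= f2 else np.array([0.0, 1.0])   # u in dh at (f1, f2)

g = u[0] * g1 + u[1] * g2                # candidate subgradient of f at x

f = lambda v: max(np.abs(v).sum(), v @ v)
z = rng.standard_normal(3)
print(f(z) >= f(x) + g @ (z - x) - 1e-12)   # expected: True
```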

29 / 30
References
1. R. T. Rockafellar, Convex Analysis.
2. S. Boyd (Stanford), EE364b Lecture Notes.

30 / 30
