09 Convex

The document discusses convex optimization, defining convex sets and functions, and explaining their properties. It outlines methods to check for convexity in both one-dimensional and multi-dimensional functions, including conditions based on order derivatives and the Hessian matrix. The importance of convexity is emphasized, particularly in relation to finding global minima in optimization problems.

Uploaded by

Shahd Mounir
Copyright
© All Rights Reserved

Convex Optimization

CS 360 Lecture 9

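Convex sets
A set $\mathcal{C}$ is convex if, for all $x, y \in \mathcal{C}$ and all $\lambda \in \mathbb{R}$ with $0 \le \lambda \le 1$, we have
$\lambda x + (1 - \lambda) y \in \mathcal{C}$

If we take any two elements of $\mathcal{C}$ and draw the line segment between them, then every
point on that segment also belongs to $\mathcal{C}$.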
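
The definition can be probed numerically by sampling pairs of points and testing their convex combinations. The sketch below is only an illustration: the two sets (a unit disc and an annulus), the helper names, and the sampling scheme are assumptions chosen for the example, and finding no violation is evidence of convexity rather than a proof.

```python
import random

def in_disc(p):
    """Unit disc: a convex set."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def in_annulus(p):
    """Unit disc with a hole of radius 0.5: not a convex set."""
    return 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0

def sample_point(contains, rng):
    """Rejection-sample a point of the set from the square [-1, 1]^2."""
    while True:
        p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if contains(p):
            return p

def find_violation(contains, trials=5000, seed=0):
    """Look for x, y in the set and lam in [0, 1] whose combination leaves the set."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = sample_point(contains, rng), sample_point(contains, rng)
        lam = rng.random()
        z = (lam * x[0] + (1 - lam) * y[0], lam * x[1] + (1 - lam) * y[1])
        if not contains(z):
            return x, y, lam
    return None

disc_violation = find_violation(in_disc)
annulus_violation = find_violation(in_annulus)
print(disc_violation is None, annulus_violation is not None)
```

For the annulus, two points on opposite sides of the hole have a midpoint inside the hole, which is the kind of counterexample the search returns.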

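Convex Set
● $\mathbb{R}^d$ is a convex set for every $d$
● Affine subspaces $\{x \mid Ax = b\}$ where $A \in \mathbb{R}^{m \times d}$, $b \in \mathbb{R}^m$
● Polyhedra $\{x \mid Ax \le b\}$
● Intersections of convex sets are convex: $\bigcap_i C_i$ is convex when $C_1, C_2, \dots$ are convex sets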

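Convex Functions
A function $f$ is convex if its domain $\mathcal{D}(f)$ is a convex set and, for all $x_1, x_2 \in \mathcal{D}(f)$ and all $\lambda \in [0, 1]$, we have
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$

A function is strictly convex if $f(\lambda x_1 + (1 - \lambda) x_2) < \lambda f(x_1) + (1 - \lambda) f(x_2)$ for all $x_1 \ne x_2$ and all $\lambda \in (0, 1)$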


Convex function
A function $f$ is concave if $-f$ is convex

Convex functions

Convexity along all lines

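Why is convexity so important

If a convex function has a point where $f'(x) = 0$, that point is a global minimum.
Critical point: $f'(x) = 0$ or $f'(x)$ does not exist. (A point where $f'$ does not exist need not be a minimum, so the guarantee applies to the case $f'(x) = 0$.)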

3 ways to check convexity for a 1D function ($x_1, x_2$ are scalars)

● 0th order: prove the original convexity condition
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$

● 1st order: the tangent line must lie below the function
$f(x_2) \ge f(x_1) + \frac{d}{dx} f(x_1) (x_2 - x_1)$

● 2nd order: the curvature is never negative
$\frac{d^2}{dx^2} f(x) \ge 0$

Checking convexity: 0th order ($x_1, x_2$ are scalars)

Prove $f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$
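
A proof is what the 0th-order check asks for, but a quick random search for counterexamples can rule a function out before attempting one. The following is a minimal sketch under assumptions: the function name, the interval, the number of trials, and the tolerance are all illustrative choices, and an empty result is not a proof of convexity.

```python
import random

def find_counterexample(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Search [lo, hi] for (x1, x2, lam) that violates the 0th-order condition.

    Returns the violating triple, or None if no violation was sampled.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + tol:  # small tolerance absorbs floating-point rounding
            return x1, x2, lam
    return None

square_result = find_counterexample(lambda x: x * x, -5, 5)   # x^2 is convex
cubic_result = find_counterexample(lambda x: x ** 3, -5, 5)   # x^3 is not convex on [-5, 5]
print(square_result is None, cubic_result is not None)
```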

Checking convexity: 1st order

Confirm $f(x_2) \ge f(x_1) + \frac{d}{dx} f(x_1) (x_2 - x_1)$
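
The 1st-order condition can be spot-checked on a grid when the derivative is known in closed form. This is a sketch, not a proof technique: the helper name `tangent_below`, the grid, and the tolerance are assumptions made for illustration.

```python
def tangent_below(f, df, points, tol=1e-9):
    """Check f(x2) >= f(x1) + f'(x1) * (x2 - x1) for every pair of sample points."""
    return all(
        f(x2) >= f(x1) + df(x1) * (x2 - x1) - tol
        for x1 in points
        for x2 in points
    )

grid = [i / 10 for i in range(-30, 31)]  # sample points in [-3, 3]

# x^2 satisfies the condition everywhere: the gap equals (x2 - x1)^2.
square_ok = tangent_below(lambda x: x * x, lambda x: 2 * x, grid)

# x^3 fails, e.g. for x1 = 1 and x2 = -3: -27 < 1 + 3 * (-4) = -11.
cubic_ok = tangent_below(lambda x: x ** 3, lambda x: 3 * x * x, grid)

print(square_ok, cubic_ok)
```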

Checking convexity: 2nd order

Check $\frac{d^2}{dx^2} f(x) \ge 0$

3 ways to check convexity for an N-d function
In the following, $x$ and $y$ are N-d vectors.
● 0th order: prove the original convexity condition
$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$

● 1st order: the tangent plane must lie below the function
$f(y) \ge f(x) + \nabla f(x)^T (y - x)$

● 2nd order: the Hessian is positive semi-definite
$\nabla_x^2 f(x) \succeq 0$

Checking convexity: 0th order

Checking convexity: 1st order

Tangent plane is the span of tangents: $f(y) \ge f(x) + \nabla_x f(x)^T (y - x)$

Checking convexity: 2nd order
Hessian matrix is positive semi-definite: $\nabla_x^2 f(x) \succeq 0$

● A matrix $A$ is positive semi-definite (PSD) if $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$
● A symmetric matrix is PSD if and only if all of its eigenvalues are non-negative.

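Hessian
$f(x) = 2x_1^2 + x_2^2$

First derivatives: $\frac{\partial}{\partial x_1} f(x) = 4x_1$, $\frac{\partial}{\partial x_2} f(x) = 2x_2$
Second derivatives:
$\frac{\partial^2}{\partial x_1^2} f(x) = 4$
$\frac{\partial^2}{\partial x_1 \partial x_2} f(x) = 0$
$\frac{\partial^2}{\partial x_2 \partial x_1} f(x) = 0$
$\frac{\partial^2}{\partial x_2^2} f(x) = 2$

$H = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}$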

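Hessian
$f(x) = 2x_1^2 - x_2^2$

First derivatives: $\frac{\partial}{\partial x_1} f(x) = 4x_1$, $\frac{\partial}{\partial x_2} f(x) = -2x_2$
Second derivatives:
$\frac{\partial^2}{\partial x_1^2} f(x) = 4$
$\frac{\partial^2}{\partial x_1 \partial x_2} f(x) = 0$
$\frac{\partial^2}{\partial x_2 \partial x_1} f(x) = 0$
$\frac{\partial^2}{\partial x_2^2} f(x) = -2$

$H = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$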

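Checking convexity: 2nd order

$H = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$    $H = \begin{pmatrix} -4 & 0 \\ 0 & -2 \end{pmatrix}$

$H = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}$    $H = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}$

What if it's not diagonal?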


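Checking convexity: 2nd order
$f(x) = x_1 + 2x_1^2 + x_2^2 - x_1 x_2 - 2$

$H = \begin{pmatrix} 4 & -1 \\ -1 & 2 \end{pmatrix}$

Eigenvalues: $\lambda = 3 - \sqrt{2},\ 3 + \sqrt{2}$
Eigenvectors (up to scaling): $e = \left(\frac{1}{\sqrt{2} + 1},\ 1\right)^T,\ \left(1,\ -\frac{1}{\sqrt{2} + 1}\right)^T$
Both eigenvalues are positive, so $H$ is positive definite and $f$ is convex.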
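
When the Hessian is not diagonal, its eigenvalues can be computed numerically. The sketch below assumes NumPy is available and uses `numpy.linalg.eigvalsh`, which is meant for symmetric matrices; the function name `classify_hessian` and the tolerance are illustrative choices, not part of the lecture.

```python
import numpy as np

def classify_hessian(H, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(eigvals > tol):
        return "positive definite"
    if np.all(eigvals >= -tol):
        return "positive semi-definite"
    if np.all(eigvals < -tol):
        return "negative definite"
    if np.all(eigvals <= tol):
        return "negative semi-definite"
    return "indefinite"

# Hessians of the kind that appear in these slides.
examples = {
    "[[4, 0], [0, 2]]": [[4, 0], [0, 2]],
    "[[4, 0], [0, 0]]": [[4, 0], [0, 0]],
    "[[4, 0], [0, -2]]": [[4, 0], [0, -2]],
    "[[-4, 0], [0, -2]]": [[-4, 0], [0, -2]],
    "[[4, -1], [-1, 2]]": [[4, -1], [-1, 2]],
}
for name, H in examples.items():
    print(name, "->", classify_hessian(H))
```

A function whose Hessian is positive definite or positive semi-definite everywhere passes the 2nd-order test; an indefinite or negative definite Hessian rules convexity out.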

3 ways to check convexity for an N-d function
In the following, $x$ and $y$ are N-d vectors.
● 0th order: prove the original convexity condition
$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$

● 1st order: the tangent plane must lie below the function
$f(y) \ge f(x) + \nabla f(x)^T (y - x)$

● 2nd order: the Hessian is positive semi-definite
$\nabla_x^2 f(x) \succeq 0$

An N-d function is convex if it is convex in every direction, that is, if every 1-d slice of the function
is convex.
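
Equivalence of the Three Definitions (i)->(ii)
$f(\lambda y + (1 - \lambda) x) \le \lambda f(y) + (1 - \lambda) f(x)$
$f(x + \lambda (y - x)) \le f(x) + \lambda (f(y) - f(x))$
$f(y) - f(x) \ge \frac{f(x + \lambda (y - x)) - f(x)}{\lambda}$
As $\lambda \to 0$:
$f(y) - f(x) \ge \lim_{\lambda \to 0} \frac{f(x + \lambda (y - x)) - f(x)}{\lambda} = \nabla f(x)^T (y - x)$
Directional derivative!
The direction is $(y - x)$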

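Equivalence of the Three Definitions (ii)->(i)
Let $z = \lambda x + (1 - \lambda) y$
$f(x) \ge f(z) + \nabla f(z)^T (x - z)$
$f(y) \ge f(z) + \nabla f(z)^T (y - z)$
$\lambda f(x) + (1 - \lambda) f(y)$
$\ge \lambda f(z) + (1 - \lambda) f(z) + \lambda \nabla f(z)^T (x - z) + (1 - \lambda) \nabla f(z)^T (y - z)$
$= f(z) + \nabla f(z)^T (\lambda x - \lambda z + (1 - \lambda) y - (1 - \lambda) z)$
$= f(z)$
$= f(\lambda x + (1 - \lambda) y)$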

Equivalence of the Three Definitions (ii)->(iii)
For any direction given by two points $x, y$ ($x \ne y$):
$f(y) \ge f(x) + f'(x)(y - x)$
$f(x) \ge f(y) + f'(y)(x - y)$
$f'(x)(y - x) \le f(y) - f(x) \le f'(y)(y - x)$
So $(f'(y) - f'(x))(y - x) \ge 0$. Divide both sides by $(y - x)^2$:
$\frac{f'(y) - f'(x)}{y - x} \ge 0$
As $y \to x$, we have $f''(x) \ge 0$

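Equivalence of the Three Definitions (iii)->(ii)
For any direction given by two points $x, y$ ($x \ne y$), Taylor's remainder theorem gives
$f(y) = f(x) + f'(x)(y - x) + \frac{1}{2} f''(\delta)(y - x)^2$ for some $\delta$ between $x$ and $y$
$f''(\delta) \ge 0$
$f(y) \ge f(x) + f'(x)(y - x)$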

Examples of convex functions
● Exponential: $f(x) = e^{ax}$ for every $a \in \mathbb{R}$
● Affine functions: $f(x) = a^T x + b$ where $x \in \mathbb{R}^n$
● Quadratic functions: $f(x) = \frac{1}{2} x^T A x + b^T x + c$ for a PSD matrix $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$
● Nonnegative weighted sums of convex functions: $f(x) = \sum_{i=1}^{K} w_i f_i(x)$
○ where $f_1, f_2, \dots, f_K$ are convex functions and $w_1, w_2, \dots, w_K$ are nonnegative real numbers

Affine functions are both convex and concave


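Jensen's Inequality
For any convex function $f$ we have:
$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$ for $\lambda \in [0, 1]$
Using induction, we can extend this to
$f\left(\sum_{i=1}^{k} \lambda_i x_i\right) \le \sum_{i=1}^{k} \lambda_i f(x_i)$ for $\sum_{i=1}^{k} \lambda_i = 1$ and $\lambda_i > 0$
We can further extend this to integrals
$f\left(\int p(x)\, x\, dx\right) \le \int p(x) f(x)\, dx$ for $\int p(x)\, dx = 1$ and $p(x) \ge 0$

$f(E[x]) \le E[f(x)]$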
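
Jensen's inequality is easy to observe on sampled data, because a sample mean is a finite convex combination with weights $1/k$. The sketch below assumes $f(x) = e^x$ and standard normal samples purely for illustration; any convex $f$ and any sample would do.

```python
import math
import random

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

def mean(values):
    return sum(values) / len(values)

f = math.exp  # a convex function

f_of_mean = f(mean(samples))               # f(E[x]) on the empirical distribution
mean_of_f = mean([f(x) for x in samples])  # E[f(x)] on the empirical distribution

print(f_of_mean <= mean_of_f)
```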

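Proof of Jensen's Inequality
Proof by induction
Base case: $f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$
Induction hypothesis: $f\left(\sum_{i=1}^{k} \lambda_i x_i\right) \le \sum_{i=1}^{k} \lambda_i f(x_i)$
Induction step:
$f\left(\sum_{i=1}^{k+1} \lambda_i x_i\right) = f\left(\lambda_{k+1} x_{k+1} + (1 - \lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i x_i}{1 - \lambda_{k+1}}\right)$
$\le \lambda_{k+1} f(x_{k+1}) + (1 - \lambda_{k+1}) f\left(\sum_{i=1}^{k} \frac{\lambda_i x_i}{1 - \lambda_{k+1}}\right)$
$\le \lambda_{k+1} f(x_{k+1}) + (1 - \lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i}{1 - \lambda_{k+1}} f(x_i)$
$= \sum_{i=1}^{k+1} \lambda_i f(x_i)$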

Convex optimization
Convex optimization problems:
Minimize $f(x)$
subject to $x \in C$
where $f$ is a convex function and $C$ is a convex set.

We can often write it as

Minimize $f(x)$    (objective)
s.t. $g_i(x) \le 0$, $i = 1, \dots, m$    (constraints)
$h_j(x) = 0$, $j = 1, \dots, p$
where $f$ is a convex function, the $g_i$ are convex functions, and the $h_j$ are affine functions

Solving unconstrained convex optimization

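How to solve convex optimization problems
Assume there are no constraints. Then the problem
$\arg\min_x f(x)$, where $f(x)$ is a differentiable convex function,

is equivalent to solving
$\nabla f(x) = 0$

$\nabla f(x) = \left(\frac{\partial}{\partial x_0} f(x), \frac{\partial}{\partial x_1} f(x), \dots, \frac{\partial}{\partial x_p} f(x)\right)$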

Example
Objective: $f(x) = x^2 + 1$
Solve: $\frac{df(x)}{dx} = 2x = 0$

$x = 0$

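Example in 2D
Objective: $f(x) = x_1^2 + x_2^2 + x_1 x_2 - x_1 + 1$

Solve: $\frac{\partial f(x)}{\partial x_1} = 2x_1 + x_2 - 1$

$\frac{\partial f(x)}{\partial x_2} = 2x_2 + x_1$

$\nabla f(x) = \left(\frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x)\right) = (0, 0)$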
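
Setting both partial derivatives to zero gives a 2-by-2 linear system. One way to work it out by substitution is sketched below. The slide itself stops at the gradient equation, so the intermediate steps are filled in here.

```latex
\begin{aligned}
2x_1 + x_2 - 1 &= 0 \\
x_1 + 2x_2 &= 0 \quad\Rightarrow\quad x_1 = -2x_2 \\
2(-2x_2) + x_2 - 1 &= 0 \quad\Rightarrow\quad x_2 = -\tfrac{1}{3},\quad x_1 = \tfrac{2}{3} \\
f\!\left(\tfrac{2}{3}, -\tfrac{1}{3}\right) &= \tfrac{4}{9} + \tfrac{1}{9} - \tfrac{2}{9} - \tfrac{2}{3} + 1 = \tfrac{2}{3}
\end{aligned}
```

The Hessian of this objective is $\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, with eigenvalues 1 and 3, so the function is convex and this critical point is the global minimum.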

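Caveat: saddle points
Check the Hessian at the critical point $x$. If the Hessian is positive definite, $x$ is a local minimum; if it is negative definite, $x$ is a local maximum; if it has both positive and negative eigenvalues, $x$ is a saddle point.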

Higher dimensional: use scipy

Optimization using Scipy
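
The code that originally accompanied this slide is not recoverable from the text, so the following is only a sketch of what a SciPy call might look like. It uses `scipy.optimize.minimize` on the 2D example objective $f(x) = x_1^2 + x_2^2 + x_1 x_2 - x_1 + 1$; the starting point, the choice of the BFGS method, and passing the analytic gradient via `jac` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Objective from the 2D example."""
    x1, x2 = x
    return x1**2 + x2**2 + x1 * x2 - x1 + 1

def grad_f(x):
    """Analytic gradient of f."""
    x1, x2 = x
    return np.array([2 * x1 + x2 - 1, 2 * x2 + x1])

# Start from the origin (an arbitrary choice) and let BFGS find the minimum.
result = minimize(f, x0=np.zeros(2), jac=grad_f, method="BFGS")

print(result.success)
print(result.x)    # location of the minimum
print(result.fun)  # objective value at the minimum
```

The returned `result.x` should be close to the stationary point obtained by solving $\nabla f(x) = 0$ by hand, which is $(2/3, -1/3)$.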

Gradient Descent

Gradient

Each arrow is the gradient evaluated at that point. The gradient points in the direction of
steepest ascent.

Gradient Descent Algorithm
● Choose an initial guess $x_0$
● While not converged ($\eta$ is the learning rate):
$x_{t+1} = x_t - \eta \nabla f(x_t)$
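
The update rule above translates almost line for line into code. This is a minimal sketch: the slide does not say how "converged" is decided, so the stopping rule (gradient norm below `tol`), the default learning rate, and the iteration cap are assumptions.

```python
def gradient_descent(grad, x0, eta=0.1, tol=1e-8, max_iters=10_000):
    """Iterate x <- x - eta * grad(x) until the gradient norm drops below tol.

    Returns the final point and the number of update steps taken.
    """
    x = list(x0)
    for step in range(max_iters):
        g = grad(x)
        grad_norm = sum(gi * gi for gi in g) ** 0.5
        if grad_norm < tol:  # assumed convergence test
            return x, step
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x, max_iters

# f(x) = x^2 + 1 has gradient 2x and its minimum at x = 0.
x_min, steps = gradient_descent(lambda x: [2 * x[0]], x0=[5.0])
print(x_min, steps)
```

With this learning rate each step multiplies the iterate by 0.8, so the sequence shrinks geometrically toward 0.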

Choice of learning rate

Debugging gradient descent

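Example in 2D (revisited)
Objective: $f(x) = x_1^2 + x_2^2 + x_1 x_2 - x_1 + 1$

Solve: $\frac{\partial f(x)}{\partial x_1} = 2x_1 + x_2 - 1$

$\frac{\partial f(x)}{\partial x_2} = 2x_2 + x_1$

$\nabla f(x) = \left(\frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x)\right)$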

Example in 2D (revisited)
Algorithm:

$\nabla f(x) = \begin{pmatrix} 2x_1 + x_2 - 1 \\ 2x_2 + x_1 \end{pmatrix}$

$x_{t+1} \leftarrow x_t - \alpha \nabla f(x_t)$
Try it for $\alpha = 0.3$
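
Trying $\alpha = 0.3$ takes only a few lines. In this sketch the starting point $(0, 0)$ and the fixed count of 100 iterations are assumptions; the gradient is the one given on the slide.

```python
def grad_f(x1, x2):
    """Gradient of f(x) = x1^2 + x2^2 + x1*x2 - x1 + 1."""
    return (2 * x1 + x2 - 1, 2 * x2 + x1)

alpha = 0.3
x1, x2 = 0.0, 0.0  # assumed starting point

for t in range(100):
    g1, g2 = grad_f(x1, x2)
    x1, x2 = x1 - alpha * g1, x2 - alpha * g2

print(x1, x2)
```

The iterates approach the stationary point $(2/3, -1/3)$ of the gradient equations. The Hessian eigenvalues are 1 and 3, so with $\alpha = 0.3$ the error along the two eigen-directions shrinks by factors of 0.7 and 0.1 per step.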

Caveat: Perils of non-convex functions

Non-convex functions can lead to local minima or singularities


Caveat: local minima

Non-convex functions can lead to local minima


Caveat: singularities

Example
Find the minimum of the function $g(u, v) = (u + 2v^2 + 10\sin(u) + 2u^2 - uv - 2)/30$
