09 Convex
CS 360 Lecture 9
Convex sets
A set $\mathcal{C}$ is convex if, for all $x, y \in \mathcal{C}$ and $\lambda \in \mathbb{R}$ with $0 \le \lambda \le 1$, we have
$$\lambda x + (1-\lambda)y \in \mathcal{C}$$
If we take any two elements of $\mathcal{C}$ and draw a line segment between them, then every point on the segment also belongs to $\mathcal{C}$.
Examples of convex sets
● $\mathbb{R}^d$ is a convex set for all $d$
● Affine subspaces $\{x \mid Ax = b\}$, where $A \in \mathbb{R}^{m \times d}$, $b \in \mathbb{R}^m$
● Polyhedra $\{x \mid Ax \le b\}$
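For instance, the polyhedron is convex by a direct check: if $Ax \le b$ and $Ay \le b$, then for any $\lambda \in [0,1]$,
$$A(\lambda x + (1-\lambda)y) = \lambda Ax + (1-\lambda)Ay \le \lambda b + (1-\lambda)b = b,$$
so the convex combination also satisfies the constraints.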
Convex Functions
A function $f$ is convex if its domain $\mathcal{D}(f)$ is a convex set and, for all $x_1, x_2 \in \mathcal{D}(f)$ and $\lambda \in [0,1]$, we have
$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$$
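As a quick numerical sanity check of this definition, one can sample random points and mixing weights; a minimal sketch (the test function $f(x) = x^2$ and the tolerance are arbitrary choices):

```python
import numpy as np

# Spot-check f(lam*x1 + (1-lam)*x2) <= lam*f(x1) + (1-lam)*f(x2)
# on random samples; suggestive evidence, not a proof of convexity.
f = lambda x: x**2  # a function known to be convex

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    lam = rng.uniform()  # lambda in [0, 1)
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)
    assert lhs <= rhs + 1e-9  # tolerance for floating-point error
print("0th-order condition held on all samples")
```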
Why is convexity so important?
If you can find a critical point of a convex function, it is a global minimum.
Critical point: $f'(x) = 0$ or $f'(x)$ does not exist.
3 ways to check convexity for a 1D function ($x_1, x_2$ are scalars)
Checking convexity: 0th order ($x_1, x_2$ are scalars)
Prove $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$ for all $\lambda \in [0,1]$.
Checking convexity: 1st order
Confirm $f(x_2) \ge f(x_1) + \frac{d}{dx}f(x_1)\,(x_2 - x_1)$.
Checking convexity: 2nd order
Check $\frac{d^2}{dx^2} f(x) \ge 0$.
3 ways to check convexity for an N-d function
In the following, $x, y$ are N-d vectors.
● 0th order: prove the original convexity condition
$$f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$$
● 1st order: confirm $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
● 2nd order: check that the Hessian is positive semi-definite, $\nabla_x^2 f(x) \succeq 0$
Checking convexity: 0th order
Prove $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$ for all $\lambda \in [0,1]$.
Checking convexity: 1st order
Confirm $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.
Checking convexity: 2nd order
Check that the Hessian matrix is positive semi-definite: $\nabla_x^2 f(x) \succeq 0$.
Hessian
For $f(x) = 2x_1^2 + x_2^2$:
$$\frac{\partial^2 f}{\partial x_1^2} = 4, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 0, \quad \frac{\partial^2 f}{\partial x_2 \partial x_1} = 0, \quad \frac{\partial^2 f}{\partial x_2^2} = 2$$
$$H = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}$$
Hessian
For $f(x) = 2x_1^2 - x_2^2$:
$$\frac{\partial^2 f}{\partial x_1^2} = 4, \quad \frac{\partial^2 f}{\partial x_1 \partial x_2} = 0, \quad \frac{\partial^2 f}{\partial x_2 \partial x_1} = 0, \quad \frac{\partial^2 f}{\partial x_2^2} = -2$$
$$H = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}$$
Checking convexity: 2nd order
$$H = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix} \qquad H = \begin{pmatrix} -4 & 0 \\ 0 & -2 \end{pmatrix} \qquad H = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix} \qquad H = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}$$
$$H = \begin{pmatrix} 4 & -1 \\ -1 & 2 \end{pmatrix}$$
Eigenvalues: $\lambda = 3 - \sqrt{2},\ 3 + \sqrt{2}$
Eigenvectors: $e = \begin{pmatrix} 1 \\ \sqrt{2}+1 \end{pmatrix},\ \begin{pmatrix} 1 \\ -\sqrt{2}+1 \end{pmatrix}$
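These eigenvalue checks are easy to reproduce numerically; a minimal sketch with numpy (the matrix labels H1..H5 are mine):

```python
import numpy as np

# PSD test: a symmetric matrix is positive semi-definite
# iff all of its eigenvalues are >= 0.
hessians = {
    "H1": np.array([[4.0, 0.0], [0.0, -2.0]]),   # indefinite
    "H2": np.array([[-4.0, 0.0], [0.0, -2.0]]),  # negative definite
    "H3": np.array([[4.0, 0.0], [0.0, 2.0]]),    # positive definite
    "H4": np.array([[4.0, 0.0], [0.0, 0.0]]),    # PSD (one zero eigenvalue)
    "H5": np.array([[4.0, -1.0], [-1.0, 2.0]]),  # eigenvalues 3 +- sqrt(2)
}
for name, H in hessians.items():
    eigvals = np.linalg.eigvalsh(H)  # eigvalsh is for symmetric matrices
    print(name, eigvals, "PSD" if np.all(eigvals >= 0) else "not PSD")
```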
3 ways to check convexity for an N-d function
In the following, $x, y$ are N-d vectors.
(i) 0th order: $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y)$
(ii) 1st order: $f(y) \ge f(x) + \nabla f(x)^T (y - x)$
(iii) 2nd order: $\nabla_x^2 f(x) \succeq 0$
Equivalence of the Three Definitions (ii)->(i)
Let $z = \lambda x + (1-\lambda)y$. By (ii),
$$f(x) \ge f(z) + \nabla f(z)^T (x - z)$$
$$f(y) \ge f(z) + \nabla f(z)^T (y - z)$$
Taking the convex combination of the two inequalities:
$$\lambda f(x) + (1-\lambda)f(y) \ge \lambda f(z) + (1-\lambda)f(z) + \lambda \nabla f(z)^T (x - z) + (1-\lambda) \nabla f(z)^T (y - z)$$
$$= f(z) + \nabla f(z)^T \big(\lambda x - \lambda z + (1-\lambda)y - (1-\lambda)z\big)$$
$$= f(z) = f(\lambda x + (1-\lambda)y)$$
(The gradient term vanishes because $\lambda x + (1-\lambda)y = z$.)
Equivalence of the Three Definitions (ii)->(iii)
For any direction, denoted by two points $x, y$ ($x \ne y$):
$$f(y) \ge f(x) + f'(x)(y - x)$$
$$f(x) \ge f(y) + f'(y)(x - y)$$
Combining the two:
$$f'(x)(y - x) \le f(y) - f(x) \le f'(y)(y - x)$$
Subtracting the outer terms and dividing both sides by $(y - x)^2 > 0$:
$$\frac{f'(y) - f'(x)}{y - x} \ge 0$$
As $y \to x$, we have $f''(x) \ge 0$.
Equivalence of the Three Definitions (iii)->(ii)
For any direction, denoted by two points $x, y$ ($x \ne y$), Taylor's theorem gives
$$f(y) = f(x) + f'(x)(y - x) + \tfrac{1}{2} f''(\delta)(y - x)^2 \quad \text{for some } \delta \in (x, y)$$
Since $f''(\delta) \ge 0$,
$$f(y) \ge f(x) + f'(x)(y - x)$$
Examples of convex functions
● Exponential: $f(x) = e^{ax}$ for all $a \in \mathbb{R}$
● Affine functions: $f(x) = a^T x + b$, where $x \in \mathbb{R}^n$
● Quadratic functions: $f(x) = \frac{1}{2} x^T A x + b^T x + c$ for a PSD matrix $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$
● Nonnegative weighted sums of convex functions: $f(x) = \sum_{i=1}^{K} w_i f_i(x)$
○ where $f_1, f_2, \ldots, f_K$ are convex functions and $w_1, w_2, \ldots, w_K$ are nonnegative real numbers
Jensen's Inequality
For a convex function $f$ and a random variable $x$:
$$f(E[x]) \le E[f(x)]$$
Proof of Jensen's Inequality
Proof by induction.
Base case: $f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$
Induction hypothesis: $f\left(\sum_{i=1}^{k} \lambda_i x_i\right) \le \sum_{i=1}^{k} \lambda_i f(x_i)$
Induction step:
$$f\left(\sum_{i=1}^{k+1} \lambda_i x_i\right) = f\left(\lambda_{k+1} x_{k+1} + (1-\lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i x_i}{1-\lambda_{k+1}}\right)$$
$$\le \lambda_{k+1} f(x_{k+1}) + (1-\lambda_{k+1}) f\left(\sum_{i=1}^{k} \frac{\lambda_i x_i}{1-\lambda_{k+1}}\right)$$
$$\le \lambda_{k+1} f(x_{k+1}) + (1-\lambda_{k+1}) \sum_{i=1}^{k} \frac{\lambda_i}{1-\lambda_{k+1}} f(x_i)$$
$$= \sum_{i=1}^{k+1} \lambda_i f(x_i)$$
(The first inequality is the base case; the second applies the induction hypothesis to the weights $\lambda_i / (1-\lambda_{k+1})$, which are nonnegative and sum to 1.)
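A quick empirical illustration of Jensen's inequality (a sketch; the choice of $f = \exp$ and the standard normal samples are arbitrary):

```python
import numpy as np

# Jensen: f(E[x]) <= E[f(x)] for convex f.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # samples of a standard normal variable
print(np.exp(x.mean()))       # f(E[x])  ~ e^0      ~ 1.00
print(np.exp(x).mean())       # E[f(x)]  ~ e^{1/2}  ~ 1.65
```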
Convex optimization
Convex optimization problems:
$$\text{minimize } f(x) \quad \text{subject to } x \in C$$
where $C$ is a convex set. Equivalently, with explicit constraints:
$$\text{minimize } f(x)$$
$$\text{s.t. } g_i(x) \le 0, \quad i = 1, \ldots, m$$
$$h_j(x) = 0, \quad j = 1, \ldots, p$$
where $f$ is a convex function, the $g_i$'s are convex functions, and the $h_j$'s are affine functions.
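As a concrete instance of this template, a minimal sketch with scipy (the objective and constraints here are invented for illustration; note that scipy encodes inequality constraints as fun(x) >= 0, so $g_i(x) \le 0$ becomes $-g_i(x) \ge 0$):

```python
import numpy as np
from scipy.optimize import minimize

# minimize f(x) = x1^2 + x2^2        (convex objective)
# s.t.     g(x) = -x1 <= 0           (i.e. x1 >= 0, convex)
#          h(x) = x1 + x2 - 1 = 0    (affine equality)
f = lambda x: x[0]**2 + x[1]**2
cons = [
    {"type": "ineq", "fun": lambda x: x[0]},             # -g(x) >= 0
    {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1},  # h(x) = 0
]
res = minimize(f, x0=np.array([0.0, 0.0]), constraints=cons)
print(res.x)  # approximately [0.5, 0.5]
```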
Solving unconstrained convex optimization
How to solve convex optimization problems
Assume no constraints. Then the problem
$$\arg\min_x f(x), \quad \text{where } f(x) \text{ is a convex function,}$$
is equivalent to solving
$$\nabla f(x) = 0$$
$$\nabla f(x) = \left( \frac{\partial}{\partial x_0} f(x), \frac{\partial}{\partial x_1} f(x), \ldots, \frac{\partial}{\partial x_p} f(x) \right)$$
Example
Objective: $f(x) = x^2 + 1$
Solve: $\frac{df(x)}{dx} = 2x = 0$
$$x = 0$$
Example in 2D
Objective: $f(x) = x_1^2 + x_2^2 + x_1 x_2 - x_1 + 1$
Solve:
$$\frac{\partial f(x)}{\partial x_1} = 2x_1 + x_2 - 1$$
$$\frac{\partial f(x)}{\partial x_2} = 2x_2 + x_1$$
$$\nabla f(x) = \left( \frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x) \right) = (0, 0)$$
Solving the two linear equations gives $(x_1, x_2) = (2/3, -1/3)$.
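Setting the gradient to zero is a linear system here, so it can also be solved directly; a minimal numpy check:

```python
import numpy as np

# grad f(x) = 0  <=>  2*x1 + x2 = 1  and  x1 + 2*x2 = 0
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0])
print(np.linalg.solve(A, b))  # [ 0.6667, -0.3333 ] = (2/3, -1/3)
```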
Caveat: saddle points
Check the Hessian. If the Hessian at $x$ is positive definite, $x$ is a local minimum; if it is negative definite, a local maximum; if it is indefinite, a saddle point.
Optimization using scipy
In higher dimensions, solving $\nabla f(x) = 0$ by hand becomes impractical; use scipy.
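A minimal sketch, assuming the 2D objective from the earlier example (for unconstrained problems, scipy.optimize.minimize defaults to a quasi-Newton method):

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    x1, x2 = x
    return x1**2 + x2**2 + x1*x2 - x1 + 1

res = minimize(f, x0=np.zeros(2))  # start from the origin
print(res.x)    # approximately [ 2/3, -1/3 ]
print(res.fun)  # minimum value of f
```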
Gradient Descent
Gradient
Each arrow is the gradient evaluated at that point. The gradients point in the direction of steepest ascent.
Gradient Descent Algorithm
● Choose an initial guess $x_0$
● While not converged ($\eta$ is the learning rate):
$$x_{t+1} = x_t - \eta \nabla f(x_t)$$
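A minimal sketch of this loop in Python (stopping when the gradient norm is small is one common convergence test, an assumption here):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, tol=1e-8, max_iters=10_000):
    """Iterate x_{t+1} = x_t - eta * grad_f(x_t) until convergence."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:  # gradient ~ 0: treat as converged
            break
        x = x - eta * g
    return x
```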
Choice of learning rate
If $\eta$ is too small, convergence is slow; if $\eta$ is too large, the iterates can oscillate or diverge.
Debugging gradient descent
Plot $f(x_t)$ against the iteration number $t$; with a suitable learning rate, the objective should decrease at every iteration.
Example in 2D (revisited)
Objective: $f(x) = x_1^2 + x_2^2 + x_1 x_2 - x_1 + 1$
Compute the gradient:
$$\frac{\partial f(x)}{\partial x_1} = 2x_1 + x_2 - 1$$
$$\frac{\partial f(x)}{\partial x_2} = 2x_2 + x_1$$
$$\nabla f(x) = \left( \frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x) \right)$$
Example in 2D (revisited)
Algorithm:
$$\nabla f(x) = \begin{pmatrix} 2x_1 + x_2 - 1 \\ 2x_2 + x_1 \end{pmatrix}$$
$$x_{t+1} \leftarrow x_t - \alpha \nabla f(x_t)$$
Try it for $\alpha = 0.3$.
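Running this update for the objective above (a sketch; the fixed cap of 100 iterations is an arbitrary choice):

```python
import numpy as np

def grad(x):
    # gradient of f(x) = x1^2 + x2^2 + x1*x2 - x1 + 1
    return np.array([2*x[0] + x[1] - 1, 2*x[1] + x[0]])

alpha = 0.3
x = np.zeros(2)  # initial guess (0, 0)
for _ in range(100):
    x = x - alpha * grad(x)
print(x)  # approaches [ 2/3, -1/3 ], matching the closed-form solution
```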
Caveat: Perils of non-convex functions
On a non-convex objective, gradient descent can get stuck in a local minimum or at a saddle point, depending on the initialization.
Example
Find the minimum of the function
$$g(u, v) = \big(u + 2v^2 + 10\sin(u) + 2u^2 - uv - 2\big)/30$$
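One way to probe such a function is to minimize from several starting points and compare the results (a sketch; the parenthesization of $g$ follows the reconstruction above, and the start points are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def g(x):
    u, v = x
    return (u + 2*v**2 + 10*np.sin(u) + 2*u**2 - u*v - 2) / 30

# On a non-convex objective, different initializations may converge
# to different stationary points; keep the best result found.
results = [minimize(g, x0=np.array([u0, 0.0])) for u0 in (-6.0, -2.0, 0.0, 4.0)]
best = min(results, key=lambda r: r.fun)
print(best.x, best.fun)
```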