3. Subgradient method
• subgradient method
• convergence analysis
• alternating projections
Subgradient method
x_{k+1} = x_k - t_k g_k, \qquad k = 0, 1, \ldots

g_k is any subgradient of f at x_k
Step size rules:

• fixed step size: t_k = t (constant)
• fixed step length: t_k = s / \|g_k\|_2 (so that \|x_{k+1} - x_k\|_2 = s)
• diminishing: t_k \to 0 and \sum_{k=0}^\infty t_k = \infty

Assumption: f is Lipschitz continuous with constant G:

|f(x) - f(y)| \le G \|x - y\|_2 \quad \text{for all } x, y

this is equivalent to \|g\|_2 \le G for all x and g \in \partial f(x): if some g \in \partial f(x) had \|g\|_2 > G, then y = x + g / \|g\|_2 would give

f(y) \ge f(x) + g^T (y - x) = f(x) + \|g\|_2 > f(x) + G = f(x) + G \|y - x\|_2

contradicting the Lipschitz condition
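As a concrete illustration, here is a minimal Python sketch of the method (the oracle interface and function names are illustrative choices, not part of the original notes); any of the step size rules above can be passed in as a callable:

    import numpy as np

    def subgradient_method(f_and_subgrad, x0, step_size, num_iters=1000):
        # f_and_subgrad(x) returns (f(x), g) with g any subgradient of f at x;
        # step_size(k, fx, g) returns the step t_k
        x = x0.copy()
        f_best, x_best = np.inf, x0.copy()
        for k in range(num_iters):
            fx, g = f_and_subgrad(x)
            if fx < f_best:                     # f(x_k) need not decrease, so track the best point
                f_best, x_best = fx, x.copy()
            x = x - step_size(k, fx, g) * g     # x_{k+1} = x_k - t_k g_k
        return x_best, f_best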
Convergence results: with fixed step size t_k = t,

f_{best,k} - f^\star \le \frac{\|x_0 - x^\star\|_2^2}{2(k+1)t} + \frac{G^2 t}{2}

where f_{best,k} = \min_{i=0,\ldots,k} f(x_i); with fixed step length t_k = s / \|g_k\|_2,

f_{best,k} - f^\star \le \frac{G \|x_0 - x^\star\|_2^2}{2(k+1)s} + \frac{G s}{2}
with diminishing step sizes (t_i \to 0, \sum_{i=0}^\infty t_i = \infty),

f_{best,k} - f^\star \le \frac{\|x_0 - x^\star\|_2^2}{2 \sum_{i=0}^{k} t_i} + \frac{G^2 \sum_{i=0}^{k} t_i^2}{2 \sum_{i=0}^{k} t_i}

• one can show that \left( \sum_{i=0}^{k} t_i^2 \right) / \left( \sum_{i=0}^{k} t_i \right) \to 0; hence f_{best,k} converges to f^\star
• examples of diminishing step size rules (see the sketch after this list):

t_i = \frac{\tau}{i+1}, \qquad t_i = \frac{\tau}{\sqrt{i+1}}
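Continuing the Python sketch above (numpy already imported), these rules are one-line callables; the constants 0.01 stand in for arbitrary values of t, s, and \tau:

    fixed_size       = lambda k, fx, g: 0.01                        # t_k = t
    fixed_length     = lambda k, fx, g: 0.01 / np.linalg.norm(g)    # t_k = s / ||g_k||_2
    diminishing      = lambda k, fx, g: 0.01 / (k + 1)              # t_k = tau / (k+1)
    diminishing_sqrt = lambda k, fx, g: 0.01 / np.sqrt(k + 1)       # t_k = tau / sqrt(k+1)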
Example: minimize \|Ax - b\|_1
[two plots: f_{best,k} - f^\star (log scale, 10^{-2} down to 10^{-4}) versus iteration k, over 500 and 3000 iterations]
Diminishing step size: t_k = 0.01/\sqrt{k+1} and t_k = 0.01/(k+1)

[plot: (f_{best,k} - f^\star)/f^\star (log scale, 10^0 down to 10^{-5}) versus iteration k, over 5000 iterations, for the two step size rules]
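A sketch of an experiment in this spirit, reusing the subgradient_method sketch above on randomly generated data (the dimensions and constants are arbitrary; A^T \mathrm{sign}(Ax - b) is a valid subgradient of \|Ax - b\|_1):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 500, 100
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    def f_and_subgrad(x):
        r = A @ x - b
        return np.abs(r).sum(), A.T @ np.sign(r)   # f(x) = ||Ax - b||_1 and a subgradient

    x, f_best = subgradient_method(f_and_subgrad, np.zeros(n),
                                   lambda k, fx, g: 0.01 / np.sqrt(k + 1),
                                   num_iters=3000)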
in terms of the step lengths s_i = t_i \|g_i\|_2 (and with R = \|x_0 - x^\star\|_2),

f_{best,k} - f^\star \le \frac{R^2 + \sum_{i=0}^{k} s_i^2}{(2/G) \sum_{i=0}^{k} s_i}

• for given k, the right-hand side is minimized by the fixed step length

s_i = s = \frac{R}{\sqrt{k+1}}

which yields the bound

f_{best,k} - f^\star \le \frac{G R}{\sqrt{k+1}}
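Substituting s_i = R/\sqrt{k+1} into the bound verifies this:

f_{best,k} - f^\star
  \le \frac{R^2 + \sum_{i=0}^{k} s_i^2}{(2/G) \sum_{i=0}^{k} s_i}
  = \frac{R^2 + (k+1) R^2/(k+1)}{(2/G)(k+1) R/\sqrt{k+1}}
  = \frac{2 R^2}{(2R/G) \sqrt{k+1}}
  = \frac{G R}{\sqrt{k+1}}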
Polyak's optimal step size (requires knowledge of f^\star):

t_i = \frac{f(x_i) - f^\star}{\|g_i\|_2^2}

substituting this t_i in the basic inequality \|x_{i+1} - x^\star\|_2^2 \le \|x_i - x^\star\|_2^2 - 2 t_i (f(x_i) - f^\star) + t_i^2 \|g_i\|_2^2 gives

\frac{\left( f(x_i) - f^\star \right)^2}{\|g_i\|_2^2} \le \|x_i - x^\star\|_2^2 - \|x_{i+1} - x^\star\|_2^2

summing over i = 0, \ldots, k and using \|g_i\|_2 \le G yields

f_{best,k} - f^\star \le \frac{G \|x_0 - x^\star\|_2}{\sqrt{k+1}}
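When f^\star is known (for example, f^\star = 0 in the feasibility problems below), the Polyak rule fits the same callable interface as the earlier sketch (a hedged sketch; `polyak_step` is an illustrative name):

    def polyak_step(f_star):
        # t_k = (f(x_k) - f_star) / ||g_k||_2^2, usable when f_star is known
        return lambda k, fx, g: (fx - f_star) / (np.linalg.norm(g) ** 2)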
Alternating projections: to find a point in the intersection of closed convex sets C_1, \ldots, C_m, apply the subgradient method to

minimize f(x) = \max_{j} f_j(x), \qquad f_j(x) = \mathrm{dist}(x, C_j) = \|x - P_j(x)\|_2

if the intersection is nonempty, f^\star = 0; a subgradient at \hat{x} is obtained from the farthest set C_j:

g = 0 \;\text{ if } \hat{x} \in C_j, \qquad g = \frac{\hat{x} - P_j(\hat{x})}{\|\hat{x} - P_j(\hat{x})\|_2} \;\text{ if } \hat{x} \notin C_j

since \|g\|_2 = 1 and f^\star = 0, the optimal (Polyak) step size is t_k = f(x_k), and the update becomes

x_{k+1} = x_k - \frac{f(x_k)}{f_j(x_k)} \left( x_k - P_j(x_k) \right) = P_j(x_k)

at each step, we project the current point onto the farthest set (a sketch follows below)
• later, we will see faster sequential projection methods that are almost as simple
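A minimal sketch for two sets, the unit Euclidean ball and a halfspace; the specific sets are illustrative choices (the halfspace projection formula also appears below):

    import numpy as np

    # illustrative sets: C1 = {x : ||x||_2 <= 1}, C2 = {x : a^T x <= b}
    a, b = np.array([3.0, 4.0]), -4.0
    proj_ball = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
    proj_half = lambda x: x if a @ x <= b else x + (b - a @ x) / (a @ a) * a

    x = np.array([5.0, 5.0])
    for _ in range(200):
        # project the current point onto the farthest of the two sets
        projections = [P(x) for P in (proj_ball, proj_half)]
        dists = [np.linalg.norm(x - p) for p in projections]
        x = projections[int(np.argmax(dists))]
    # on exit, x is (approximately) in the intersection of C1 and C2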
Projected subgradient method: for the constrained problem

minimize f(x)
subject to x \in C

the update is

x_{k+1} = P_C(x_k - t_k g_k), \qquad k = 0, 1, \ldots

where P_C is Euclidean projection onto C; examples of inexpensive projections (a sketch follows the examples):

Halfspace: C = \{x \mid a^T x \le b\} (with a \ne 0):

P_C(x) = x + \frac{b - a^T x}{\|a\|_2^2} \, a \;\text{ if } a^T x > b, \qquad P_C(x) = x \;\text{ if } a^T x \le b
Rectangle: C = \{x \in \mathbf{R}^n \mid l \preceq x \preceq u\} where l \preceq u (componentwise):

P_C(x)_k = \begin{cases} l_k & x_k \le l_k \\ x_k & l_k \le x_k \le u_k \\ u_k & x_k \ge u_k \end{cases}
Norm balls: 𝐶 = {𝑥 | k𝑥k ≤ 𝑅} for many common norms (e.g., 236B page 5.26)
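For example, a sketch of the projected subgradient method for minimizing \|Ax - b\|_1 over a rectangle, where P_C is the componentwise clipping above (data and constants are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 200, 50
    A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
    l, u = -np.ones(n), np.ones(n)          # C = {x : l <= x <= u}

    x = np.zeros(n)
    for k in range(2000):
        g = A.T @ np.sign(A @ x - b)        # subgradient of ||Ax - b||_1
        step = 0.01 / np.sqrt(k + 1)        # diminishing step size
        x = np.clip(x - step * g, l, u)     # x_{k+1} = P_C(x_k - t_k g_k)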
Characterization of the projection: u = P_C(x) if and only if

(x - u)^T (z - u) \le 0 \quad \text{for all } z \in C

which in turn is equivalent to

\|x - z\|_2^2 \ge \|x - u\|_2^2 + \|z - u\|_2^2 \quad \text{for all } z \in C
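The equivalence of the last two conditions follows by expanding the square:

\|x - z\|_2^2 = \|(x - u) - (z - u)\|_2^2 = \|x - u\|_2^2 + \|z - u\|_2^2 - 2 (x - u)^T (z - u)

so (x - u)^T (z - u) \le 0 for all z \in C holds exactly when \|x - z\|_2^2 \ge \|x - u\|_2^2 + \|z - u\|_2^2 for all z \in C.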
Convergence with projection: for the problem

minimize f(x)
subject to x \in C

• C is a closed convex set; the other assumptions are the same as for the unconstrained method
• the first inequality of the earlier analysis still holds: since x^\star \in C, the projection property above gives

\|x_{i+1} - x^\star\|_2^2 = \|P_C(x_i - t_i g_i) - x^\star\|_2^2
\le \|x_i - t_i g_i - x^\star\|_2^2
= \|x_i - x^\star\|_2^2 - 2 t_i g_i^T (x_i - x^\star) + t_i^2 \|g_i\|_2^2
\le \|x_i - x^\star\|_2^2 - 2 t_i \left( f(x_i) - f^\star \right) + t_i^2 \|g_i\|_2^2

the rest of the analysis is unchanged, so the same convergence bounds hold
can the f_{best,k} - f^\star \le G R / \sqrt{k+1} bound derived above be improved?
Problem class

minimize f(x)

with f convex and Lipschitz continuous with constant G on \{x \mid \|x - x^\star\|_2 \le R\}, and \|x^{(0)} - x^\star\|_2 \le R

Algorithm class

• the algorithm can choose any x^{(i+1)} from the set x^{(0)} + \mathrm{span}\{g^{(0)}, g^{(1)}, \ldots, g^{(i)}\}, where g^{(j)} is a subgradient of f at x^{(j)}
• we stop after a fixed number k of iterations
Worst-case example: take

f(x) = \max_{i=1,\ldots,k+1} x_i + \frac{1}{2} \|x\|_2^2 \quad (\text{with } k < n), \qquad x^{(0)} = 0

the solution and optimal value are

x^\star = -\left( \frac{1}{k+1}, \ldots, \frac{1}{k+1}, 0, \ldots, 0 \right) \quad (k+1 \text{ nonzero entries}), \qquad f^\star = -\frac{1}{2(k+1)}

(optimality holds because 0 \in \partial f(x^\star): all k+1 leading components attain the maximum at x^\star, and averaging the subgradients e_i + x^\star with equal weights 1/(k+1) gives zero)
• the distance of the starting point to the solution is R = \|x^{(0)} - x^\star\|_2 = 1/\sqrt{k+1}
• the Lipschitz constant on \{x \mid \|x - x^\star\|_2 \le R\} satisfies

G = \sup_{g \in \partial f(x), \, \|x - x^\star\|_2 \le R} \|g\|_2 \le \frac{2}{\sqrt{k+1}} + 1
• the subgradient oracle can answer queries so that each iterate has the form

x^{(i)} = (x_1^{(i)}, \ldots, x_i^{(i)}, 0, \ldots, 0)

so components i+1, \ldots, k+1 of x^{(i)} are zero for i \le k, and

f(x^{(i)}) \ge \frac{1}{2} \|x^{(i)}\|_2^2 \ge 0 = f(x^{(0)}), \qquad \text{hence } f_{best,k} = 0

• therefore

f_{best,k} - f^\star = -f^\star = \frac{1}{2(k+1)} = \frac{G R}{2 \left( 2 + \sqrt{k+1} \right)}
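A quick numeric check of these formulas (k and n are arbitrary choices with k < n):

    import numpy as np

    k, n = 9, 20
    f = lambda x: x[:k + 1].max() + 0.5 * (x @ x)     # f(x) = max_{i<=k+1} x_i + ||x||_2^2 / 2
    x_star = np.concatenate([-np.ones(k + 1) / (k + 1), np.zeros(n - k - 1)])
    print(f(x_star), -1 / (2 * (k + 1)))              # both equal -0.05
    R = 1 / np.sqrt(k + 1)
    G = 2 / np.sqrt(k + 1) + 1
    print(G * R / (2 * (2 + np.sqrt(k + 1))), 1 / (2 * (k + 1)))   # both equal 0.05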
Conclusion

• the example shows that the O(G R / \sqrt{k}) bound cannot be improved
• the subgradient method is "optimal" (for this problem class and algorithm class)