
L. Vandenberghe ECE236C (Spring 2022)

3. Subgradient method

• subgradient method

• convergence analysis

• optimal step size when 𝑓 ★ is known

• alternating projections

• projected subgradient method

• optimality of subgradient method

3.1
Subgradient method

to minimize a nondifferentiable convex function 𝑓 : choose 𝑥 0 and repeat

𝑥 𝑘+1 = 𝑥 𝑘 − 𝑡 𝑘 𝑔 𝑘 , 𝑘 = 0, 1, . . .

𝑔 𝑘 is any subgradient of 𝑓 at 𝑥 𝑘

Step size rules

• fixed step: 𝑡 𝑘 constant

• fixed length: 𝑡 𝑘 ‖𝑔 𝑘 ‖₂ = ‖𝑥 𝑘+1 − 𝑥 𝑘 ‖₂ is constant

• diminishing: 𝑡 𝑘 → 0 and ∑_{𝑘=0}^{∞} 𝑡 𝑘 = ∞
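
A rough NumPy sketch of the update and the three step size rules (not from the slides; the objective 𝑓, its subgradient oracle, and the constants below are illustrative assumptions):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step_rule, iters=1000):
    """Subgradient method: x_{k+1} = x_k - t_k * g_k, with g_k any subgradient of f at x_k."""
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(iters):
        g = subgrad(x)
        x = x - step_rule(k, g) * g
        if f(x) < f_best:                 # not a descent method, so track the best iterate
            f_best, x_best = f(x), x.copy()
    return x_best, f_best

# the three step size rules above (the constants 1e-3 and 1e-2 are placeholders)
fixed_step   = lambda k, g: 1e-3                       # t_k constant
fixed_length = lambda k, g: 1e-2 / np.linalg.norm(g)   # t_k * ||g_k||_2 = s constant
diminishing  = lambda k, g: 1e-2 / np.sqrt(k + 1)      # t_k -> 0 and sum_k t_k = infinity
```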

Subgradient method 3.2


Assumptions

• problem has finite optimal value 𝑓 ★, optimal solution 𝑥★

• 𝑓 is convex with dom 𝑓 = R𝑛

• 𝑓 is Lipschitz continuous with constant 𝐺 > 0:

| 𝑓 (𝑥) − 𝑓 (𝑦)| ≤ 𝐺 ‖𝑥 − 𝑦‖₂ for all 𝑥, 𝑦

this is equivalent to ‖𝑔‖₂ ≤ 𝐺 for all 𝑥 and 𝑔 ∈ 𝜕 𝑓 (𝑥) (see next page)

Subgradient method 3.3


Proof.

• assume ‖𝑔‖₂ ≤ 𝐺 for all subgradients; choose 𝑔𝑦 ∈ 𝜕 𝑓 (𝑦), 𝑔𝑥 ∈ 𝜕 𝑓 (𝑥):

𝑔𝑥ᵀ (𝑥 − 𝑦) ≥ 𝑓 (𝑥) − 𝑓 (𝑦) ≥ 𝑔𝑦ᵀ (𝑥 − 𝑦)

by the Cauchy–Schwarz inequality

𝐺 ‖𝑥 − 𝑦‖₂ ≥ 𝑓 (𝑥) − 𝑓 (𝑦) ≥ −𝐺 ‖𝑥 − 𝑦‖₂

• assume ‖𝑔‖₂ > 𝐺 for some 𝑔 ∈ 𝜕 𝑓 (𝑥); take 𝑦 = 𝑥 + 𝑔/‖𝑔‖₂:

𝑓 (𝑦) ≥ 𝑓 (𝑥) + 𝑔ᵀ (𝑦 − 𝑥) = 𝑓 (𝑥) + ‖𝑔‖₂ > 𝑓 (𝑥) + 𝐺

Subgradient method 3.4


Analysis

• the subgradient method is not a descent method


• therefore we keep track of 𝑓best,𝑘 = min_{𝑖=0,...,𝑘} 𝑓 (𝑥𝑖 ), which can be less than 𝑓 (𝑥 𝑘 )
• the key quantity in the analysis is the distance to the optimal set

Progress in one iteration


• distance to 𝑥★:

‖𝑥𝑖+1 − 𝑥★‖₂² = ‖𝑥𝑖 − 𝑡𝑖 𝑔𝑖 − 𝑥★‖₂²
             = ‖𝑥𝑖 − 𝑥★‖₂² − 2𝑡𝑖 𝑔𝑖ᵀ (𝑥𝑖 − 𝑥★) + 𝑡𝑖² ‖𝑔𝑖 ‖₂²
             ≤ ‖𝑥𝑖 − 𝑥★‖₂² − 2𝑡𝑖 ( 𝑓 (𝑥𝑖 ) − 𝑓 ★) + 𝑡𝑖² ‖𝑔𝑖 ‖₂²

• best function value: combine the inequalities for 𝑖 = 0, . . . , 𝑘 :

2 (∑_{𝑖=0}^{𝑘} 𝑡𝑖) ( 𝑓best,𝑘 − 𝑓 ★) ≤ ‖𝑥0 − 𝑥★‖₂² − ‖𝑥 𝑘+1 − 𝑥★‖₂² + ∑_{𝑖=0}^{𝑘} 𝑡𝑖² ‖𝑔𝑖 ‖₂²
                                  ≤ ‖𝑥0 − 𝑥★‖₂² + ∑_{𝑖=0}^{𝑘} 𝑡𝑖² ‖𝑔𝑖 ‖₂²
Subgradient method 3.5
Fixed step size and fixed step length

Fixed step size: 𝑡𝑖 = 𝑡 with 𝑡 constant


𝑓best,𝑘 − 𝑓 ★ ≤ ‖𝑥0 − 𝑥★‖₂² / (2(𝑘 + 1)𝑡) + 𝐺²𝑡/2

• does not guarantee convergence of 𝑓best,𝑘

• for large 𝑘 , 𝑓best,𝑘 is approximately 𝐺²𝑡/2-suboptimal

Fixed step length: 𝑡𝑖 = 𝑠/‖𝑔𝑖 ‖₂ with 𝑠 constant

𝑓best,𝑘 − 𝑓 ★ ≤ 𝐺 ‖𝑥0 − 𝑥★‖₂² / (2(𝑘 + 1)𝑠) + 𝐺𝑠/2

• does not guarantee convergence of 𝑓best,𝑘


• for large 𝑘 , 𝑓best,𝑘 is approximately 𝐺𝑠/2-suboptimal

Subgradient method 3.6


Diminishing step size


𝑡𝑖 → 0,   ∑_{𝑖=0}^{∞} 𝑡𝑖 = ∞

• bound on function value:

𝑓best,𝑘 − 𝑓 ★ ≤ ‖𝑥0 − 𝑥★‖₂² / (2 ∑_{𝑖=0}^{𝑘} 𝑡𝑖) + 𝐺² (∑_{𝑖=0}^{𝑘} 𝑡𝑖²) / (2 ∑_{𝑖=0}^{𝑘} 𝑡𝑖)

• can show that (∑_{𝑖=0}^{𝑘} 𝑡𝑖²)/(∑_{𝑖=0}^{𝑘} 𝑡𝑖) → 0; hence, 𝑓best,𝑘 converges to 𝑓 ★

• examples of diminishing step size rules:

𝑡𝑖 = 𝜏/(𝑖 + 1),   𝑡𝑖 = 𝜏/√(𝑖 + 1)
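
A quick numerical check (mine, not from the slides) that (∑ 𝑡𝑖²)/(∑ 𝑡𝑖) → 0 for these two rules, with an illustrative 𝜏 = 0.01:

```python
import numpy as np

tau = 0.01
for k in (10**2, 10**4, 10**6):
    i = np.arange(k + 1)
    for name, t in (("tau/(i+1)", tau / (i + 1)), ("tau/sqrt(i+1)", tau / np.sqrt(i + 1))):
        ratio = (t**2).sum() / t.sum()    # the term appearing in the bound; shrinks as k grows
        print(f"k = {k:>7}, t_i = {name:<14}: ratio = {ratio:.2e}")
```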

Subgradient method 3.7


Example: 1-norm minimization

minimize ‖𝐴𝑥 − 𝑏‖₁

• subgradient is given by 𝐴ᵀ sign( 𝐴𝑥 − 𝑏)


• example with 𝐴 ∈ R500×100, 𝑏 ∈ R500

Fixed step length 𝑡 𝑘 = 𝑠/‖𝑔 𝑘 ‖₂ for 𝑠 = 0.1, 0.01, 0.001

[Figure: ( 𝑓 (𝑥 𝑘 ) − 𝑓 ★)/ 𝑓 ★ versus 𝑘 (left, 0 to 500 iterations) and ( 𝑓best,𝑘 − 𝑓 ★)/ 𝑓 ★ versus 𝑘 (right, 0 to 3000 iterations) for the three values of 𝑠]
Subgradient method 3.8
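
A compact sketch of this experiment with synthetic data (the random 𝐴, 𝑏 below are illustrative, not the instance used for the figures):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 100))
b = rng.standard_normal(500)

f = lambda x: np.linalg.norm(A @ x - b, 1)
subgrad = lambda x: A.T @ np.sign(A @ x - b)   # a subgradient of ||Ax - b||_1

x, s, f_best = np.zeros(100), 0.01, np.inf     # fixed step length s
for k in range(3000):
    g = subgrad(x)
    x = x - (s / np.linalg.norm(g)) * g        # t_k = s / ||g_k||_2
    f_best = min(f_best, f(x))
```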

Diminishing step size: 𝑡 𝑘 = 0.01/√(𝑘 + 1) and 𝑡 𝑘 = 0.01/(𝑘 + 1)

[Figure: ( 𝑓best,𝑘 − 𝑓 ★)/ 𝑓 ★ versus 𝑘 (0 to 5000 iterations) for the two step size rules]

Subgradient method 3.9


Optimal step size for fixed number of iterations

from page 3.5: if 𝑠𝑖 = 𝑡𝑖 ‖𝑔𝑖 ‖₂ and ‖𝑥0 − 𝑥★‖₂ ≤ 𝑅 , then

𝑓best,𝑘 − 𝑓 ★ ≤ (𝑅² + ∑_{𝑖=0}^{𝑘} 𝑠𝑖²) / ((2/𝐺) ∑_{𝑖=0}^{𝑘} 𝑠𝑖)

• for given 𝑘 , the right-hand side is minimized by the fixed step length

𝑠𝑖 = 𝑠 = 𝑅/√(𝑘 + 1)

• the resulting bound after 𝑘 steps is

𝑓best,𝑘 − 𝑓 ★ ≤ 𝐺𝑅/√(𝑘 + 1)

• this guarantees an accuracy 𝑓best,𝑘 − 𝑓 ★ ≤ 𝜖 in 𝑘 = 𝑂 (1/𝜖²) iterations
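
A minimal sketch of this rule for a fixed budget of 𝐾 + 1 steps, assuming a known bound 𝑅 on the initial distance (the function name and arguments are mine):

```python
import numpy as np

def subgradient_fixed_budget(f, subgrad, x0, R, K):
    """Fixed step length s = R / sqrt(K + 1), which optimizes the bound for K + 1 iterations."""
    x = np.asarray(x0, dtype=float)
    f_best = f(x)
    s = R / np.sqrt(K + 1)
    for k in range(K + 1):
        g = subgrad(x)
        x = x - (s / np.linalg.norm(g)) * g   # step length ||x_{k+1} - x_k||_2 = s
        f_best = min(f_best, f(x))
    return f_best                             # guaranteed: f_best - f_star <= G * R / sqrt(K + 1)
```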

Subgradient method 3.10


Optimal step size when 𝑓 ★ is known

• the right-hand side in the first inequality of page 3.5 is minimized by

𝑡𝑖 = ( 𝑓 (𝑥𝑖 ) − 𝑓 ★) / ‖𝑔𝑖 ‖₂²

• the optimized bound is

( 𝑓 (𝑥𝑖 ) − 𝑓 ★)² / ‖𝑔𝑖 ‖₂² ≤ ‖𝑥𝑖 − 𝑥★‖₂² − ‖𝑥𝑖+1 − 𝑥★‖₂²

• applying this recursively from 𝑖 = 0 to 𝑖 = 𝑘 (and using ‖𝑔𝑖 ‖₂ ≤ 𝐺 ) gives

𝑓best,𝑘 − 𝑓 ★ ≤ 𝐺 ‖𝑥0 − 𝑥★‖₂ / √(𝑘 + 1)
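
A minimal sketch of this (Polyak) step size, assuming the optimal value 𝑓 ★ is available:

```python
import numpy as np

def polyak_subgradient(f, subgrad, x0, f_star, iters=1000):
    """Subgradient method with the optimal step t_i = (f(x_i) - f_star) / ||g_i||_2^2."""
    x = np.asarray(x0, dtype=float)
    f_best = f(x)
    for k in range(iters):
        g = subgrad(x)
        t = (f(x) - f_star) / max(g @ g, 1e-16)   # guard against g = 0 (x is then optimal)
        x = x - t * g
        f_best = min(f_best, f(x))
    return x, f_best
```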

Subgradient method 3.11


Example: find point in intersection of convex sets

find a point in the intersection of 𝑚 closed convex sets 𝐶1, . . . , 𝐶𝑚 :

minimize 𝑓 (𝑥) = max { 𝑓1 (𝑥), . . . , 𝑓𝑚 (𝑥)}

where 𝑓 𝑗 (𝑥) = inf_{𝑦∈𝐶 𝑗 } ‖𝑥 − 𝑦‖₂ is the Euclidean distance of 𝑥 to 𝐶 𝑗

• 𝑓 ★ = 0 if the intersection is nonempty


• (from page 2.14) 𝑔 ∈ 𝜕 𝑓 (𝑥̂) if 𝑔 ∈ 𝜕 𝑓 𝑗 (𝑥̂) and 𝐶 𝑗 is the farthest set from 𝑥̂

• (from page 2.20) a subgradient 𝑔 ∈ 𝜕 𝑓 𝑗 (𝑥̂) follows from the projection 𝑃 𝑗 (𝑥̂) on 𝐶 𝑗 :

𝑔 = 0 if 𝑥̂ ∈ 𝐶 𝑗 ,   𝑔 = (𝑥̂ − 𝑃 𝑗 (𝑥̂)) / ‖𝑥̂ − 𝑃 𝑗 (𝑥̂)‖₂ if 𝑥̂ ∉ 𝐶 𝑗

note that ‖𝑔‖₂ = 1 if 𝑥̂ ∉ 𝐶 𝑗

Subgradient method 3.12


Subgradient method for point in intersection of convex sets

• optimal step size (page 3.11) for 𝑓 ★ = 0 and ‖𝑔𝑖 ‖₂ = 1 is 𝑡𝑖 = 𝑓 (𝑥𝑖 )

• at iteration 𝑘 , find farthest set 𝐶 𝑗 (with 𝑓 (𝑥 𝑘 ) = 𝑓 𝑗 (𝑥 𝑘 ) ), and take

𝑥 𝑘+1 = 𝑥 𝑘 − ( 𝑓 (𝑥 𝑘 )/ 𝑓 𝑗 (𝑥 𝑘 )) (𝑥 𝑘 − 𝑃 𝑗 (𝑥 𝑘 )) = 𝑃 𝑗 (𝑥 𝑘 )

at each step, we project the current point onto the farthest set

• a version of the alternating projections algorithm

• for 𝑚 = 2, projections alternate onto one set, then the other

• later, we will see faster sequential projection methods that are almost as simple
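
A hedged sketch of this scheme for two sets, using halfspaces as an illustrative (assumed) choice of 𝐶1, 𝐶2:

```python
import numpy as np

# illustrative sets: two halfspaces C_j = {x : a_j^T x <= b_j}
halfspaces = [(np.array([1.0, 2.0]), 1.0), (np.array([-3.0, 1.0]), -2.0)]

def proj_halfspace(x, a, b):
    """Euclidean projection onto {x : a^T x <= b}."""
    r = float(a @ x - b)
    return x - (r / (a @ a)) * a if r > 0 else x

x = np.array([-10.0, 10.0])                   # starting point outside both sets
for k in range(100):
    projections = [proj_halfspace(x, a, b) for a, b in halfspaces]
    dists = [np.linalg.norm(x - p) for p in projections]
    if max(dists) < 1e-9:                     # x is (numerically) in the intersection
        break
    x = projections[int(np.argmax(dists))]    # x_{k+1} = P_j(x_k), with C_j the farthest set
```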

Subgradient method 3.13


Projected subgradient method

the subgradient method is easily extended to handle constrained problems

minimize 𝑓 (𝑥)
subject to 𝑥∈𝐶

where 𝐶 is a closed convex set

Projected subgradient method: choose 𝑥 0 ∈ 𝐶 and repeat

𝑥 𝑘+1 = 𝑃𝐶 (𝑥 𝑘 − 𝑡 𝑘 𝑔 𝑘 ), 𝑘 = 0, 1, . . .

• 𝑃𝐶 (𝑦) denotes the Euclidean projection of 𝑦 on 𝐶


• 𝑔 𝑘 is any subgradient of 𝑓 at 𝑥 𝑘
• 𝑡 𝑘 is chosen by same step size rules as for unconstrained problem (page 3.2)
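
A minimal sketch, with the projection 𝑃𝐶 passed in as a function (argument names and defaults are mine):

```python
import numpy as np

def projected_subgradient(f, subgrad, proj_C, x0, step_rule, iters=1000):
    """Projected subgradient method: x_{k+1} = P_C(x_k - t_k * g_k)."""
    x = proj_C(np.asarray(x0, dtype=float))
    f_best, x_best = f(x), x.copy()
    for k in range(iters):
        g = subgrad(x)
        x = proj_C(x - step_rule(k, g) * g)
        if f(x) < f_best:
            f_best, x_best = f(x), x.copy()
    return x_best, f_best
```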

Subgradient method 3.14


Examples of simple convex sets

the projected subgradient method is practical only if the projection on 𝐶 is easy to compute

Halfspace: 𝐶 = {𝑥 | 𝑎ᵀ𝑥 ≤ 𝑏} (with 𝑎 ≠ 0)

𝑃𝐶 (𝑥) = 𝑥 + ((𝑏 − 𝑎ᵀ𝑥)/‖𝑎‖₂²) 𝑎 if 𝑎ᵀ𝑥 > 𝑏,   𝑃𝐶 (𝑥) = 𝑥 if 𝑎ᵀ𝑥 ≤ 𝑏

Rectangle: 𝐶 = {𝑥 ∈ R𝑛 | 𝑙 ⪯ 𝑥 ⪯ 𝑢} where 𝑙 ⪯ 𝑢

𝑃𝐶 (𝑥)𝑘 = 𝑙𝑘 if 𝑥𝑘 ≤ 𝑙𝑘 ,   𝑃𝐶 (𝑥)𝑘 = 𝑥𝑘 if 𝑙𝑘 ≤ 𝑥𝑘 ≤ 𝑢𝑘 ,   𝑃𝐶 (𝑥)𝑘 = 𝑢𝑘 if 𝑥𝑘 ≥ 𝑢𝑘

Norm balls: 𝐶 = {𝑥 | ‖𝑥‖ ≤ 𝑅} for many common norms (e.g., 236B page 5.26)

we’ll encounter many other examples later in the course
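
Hedged NumPy sketches of these three projections (function names are mine; the norm ball shown is the Euclidean one):

```python
import numpy as np

def proj_halfspace(x, a, b):
    """Projection onto the halfspace {x : a^T x <= b}, a != 0."""
    r = float(a @ x - b)
    return x - (r / (a @ a)) * a if r > 0 else x

def proj_rectangle(x, l, u):
    """Componentwise projection onto {x : l <= x <= u}."""
    return np.clip(x, l, u)

def proj_ball2(x, R):
    """Projection onto the Euclidean ball {x : ||x||_2 <= R}."""
    nrm = np.linalg.norm(x)
    return x if nrm <= R else (R / nrm) * x
```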

Subgradient method 3.15


Projection on closed convex set

𝑃𝐶 (𝑥) = argmin_{𝑢∈𝐶} ‖𝑢 − 𝑥‖₂²

𝑢 = 𝑃𝐶 (𝑥)
⇕
(𝑥 − 𝑢)ᵀ (𝑧 − 𝑢) ≤ 0 ∀𝑧 ∈ 𝐶
⇕
‖𝑥 − 𝑧‖₂² ≥ ‖𝑥 − 𝑢‖₂² + ‖𝑧 − 𝑢‖₂² ∀𝑧 ∈ 𝐶

this follows from general optimality conditions in 236B page 4.9

Subgradient method 3.16


Analysis

minimize 𝑓 (𝑥)
subject to 𝑥∈𝐶

• 𝐶 is a closed convex set; other assumptions are the same as on page 3.3
• first inequality on page 3.5 still holds:

‖𝑥𝑖+1 − 𝑥★‖₂² = ‖𝑃𝐶 (𝑥𝑖 − 𝑡𝑖 𝑔𝑖 ) − 𝑥★‖₂²
             ≤ ‖𝑥𝑖 − 𝑡𝑖 𝑔𝑖 − 𝑥★‖₂²
             = ‖𝑥𝑖 − 𝑥★‖₂² − 2𝑡𝑖 𝑔𝑖ᵀ (𝑥𝑖 − 𝑥★) + 𝑡𝑖² ‖𝑔𝑖 ‖₂²
             ≤ ‖𝑥𝑖 − 𝑥★‖₂² − 2𝑡𝑖 ( 𝑓 (𝑥𝑖 ) − 𝑓 ★) + 𝑡𝑖² ‖𝑔𝑖 ‖₂²

second line follows from page 3.16 (with 𝑧 = 𝑥★, 𝑥 = 𝑥𝑖 − 𝑡𝑖 𝑔𝑖 )

• hence, the earlier analysis also applies to the projected subgradient method

Subgradient method 3.17


Optimality of the subgradient method


can the 𝑓best,𝑘 − 𝑓 ★ ≤ 𝐺𝑅/√(𝑘 + 1) bound on page 3.10 be improved?

Problem class
minimize 𝑓 (𝑥)

• assumptions on page 3.3 are satisfied


• we are given a starting point 𝑥 (0) with ‖𝑥 (0) − 𝑥★‖₂ ≤ 𝑅
• we are given the Lipschitz constant 𝐺 of 𝑓 on {𝑥 | ‖𝑥 − 𝑥★‖₂ ≤ 𝑅}
• 𝑓 is defined by an oracle: given 𝑥 , the oracle returns 𝑓 (𝑥) and a 𝑔 ∈ 𝜕 𝑓 (𝑥)

Algorithm class

• algorithm can choose any 𝑥 (𝑖+1) from the set 𝑥 (0) + span{𝑔 (0) , 𝑔 (1) , . . . , 𝑔 (𝑖) }
• we stop after a fixed number 𝑘 of iterations

Subgradient method 3.18


Test problem and oracle

𝑓 (𝑥) = max_{𝑖=1,...,𝑘+1} 𝑥𝑖 + (1/2)‖𝑥‖₂²   (with 𝑘 < 𝑛),   𝑥 (0) = 0

• subdifferential 𝜕 𝑓 (𝑥) = conv{𝑒 𝑗 + 𝑥 | 1 ≤ 𝑗 ≤ 𝑘 + 1, 𝑥 𝑗 = max_{𝑖=1,...,𝑘+1} 𝑥𝑖 }

• solution and optimal value

𝑥★ = −(1/(𝑘 + 1), . . . , 1/(𝑘 + 1), 0, . . . , 0)   (𝑘 + 1 nonzero entries),   𝑓 ★ = −1/(2(𝑘 + 1))

• distance of starting point to solution is 𝑅 = ‖𝑥 (0) − 𝑥★‖₂ = 1/√(𝑘 + 1)
• Lipschitz constant on {𝑥 | ‖𝑥 − 𝑥★‖₂ ≤ 𝑅}:

𝐺 = sup_{𝑔∈𝜕 𝑓 (𝑥), ‖𝑥−𝑥★‖₂ ≤ 𝑅} ‖𝑔‖₂ ≤ 2/√(𝑘 + 1) + 1

• the oracle returns the subgradient 𝑒 𝚥̂ + 𝑥 where 𝚥̂ = min{ 𝑗 | 𝑥 𝑗 = max_{𝑖=1,...,𝑘+1} 𝑥𝑖 }
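
A small sketch of this worst-case instance and its oracle (the sizes 𝑛 and 𝑘 below are illustrative):

```python
import numpy as np

def make_oracle(k):
    """Oracle for f(x) = max_{i=1,...,k+1} x_i + 0.5 * ||x||_2^2."""
    def oracle(x):
        head = x[:k + 1]
        jhat = int(np.argmax(head))        # np.argmax returns the smallest maximizing index
        fval = head[jhat] + 0.5 * float(x @ x)
        g = x.copy()
        g[jhat] += 1.0                     # subgradient e_jhat + x
        return fval, g
    return oracle

k, n = 20, 50                              # requires k < n
oracle = make_oracle(k)
x0 = np.zeros(n)
f0, g0 = oracle(x0)                        # f(x0) = 0, so f_best stays 0 during the first k steps
```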

Subgradient method 3.19


Iteration

• after 𝑖 ≤ 𝑘 iterations of any algorithm in the algorithm class,

𝑥 (𝑖) = (𝑥1(𝑖) , . . . , 𝑥𝑖(𝑖) , 0, . . . , 0),   𝑓 (𝑥 (𝑖) ) ≥ (1/2)‖𝑥 (𝑖) ‖₂² ≥ 0,   𝑓best,𝑖 = 0

• suboptimality after 𝑘 iterations

𝑓best,𝑘 − 𝑓 ★ = −𝑓 ★ = 1/(2(𝑘 + 1)) = 𝐺𝑅 / (2(2 + √(𝑘 + 1)))

Conclusion

• the example shows that the 𝑂 (𝐺𝑅/√𝑘) bound cannot be improved
• subgradient method is “optimal” (for this problem and algorithm class)

Subgradient method 3.20


Summary: subgradient method

• handles general nondifferentiable convex problems

• often leads to very simple algorithms

• convergence can be very slow

• no good stopping criterion

• theoretical complexity: 𝑂 (1/𝜖²) iterations to find an 𝜖 -suboptimal point

• an “optimal” first-order method: the 𝑂 (1/𝜖²) bound cannot be improved

Subgradient method 3.21


References

• S. Boyd, Lecture slides and notes for EE364b, Convex Optimization II.

• Yu. Nesterov, Lectures on Convex Optimization (2018), section 3.2.3. The example on page 3.19 is in §3.2.1.

• B. T. Polyak, Introduction to Optimization (1987), section 5.3.

Subgradient method 3.22
