OPTIMIZATION
Chapter 3: Unconstrained Convex Optimization
CONTENTS
• Unconstrained optimization problems
• Descent method
• Gradient descent method
• Newton method
• Subgradient method
Unconstrained convex optimization
• Unconstrained, smooth convex optimization problem:
  min f(x)
• f: R^n → R is convex and twice differentiable
• dom f = R^n: no constraints
• Assumption: the problem is solvable, with f* = min f(x) and x* = argmin f(x)
• To find x*, solve the equation ∇f(x*) = 0: usually not easy to solve analytically
• An iterative scheme is preferred: compute a minimizing sequence x(0), x(1), …
  s.t. f(x(k)) → f(x*) as k → ∞
• The algorithm stops at some point x(k) when the error is within an acceptable
  tolerance ε: f(x(k)) − f* ≤ ε
Local minimizer
• x* is a local minimizer for f: R^n → R if f(x*) ≤ f(x) whenever ||x* − x|| ≤ ε (ε > 0 is a constant)
• x* is a global minimizer for f: R^n → R if f(x*) ≤ f(x) for all x ∈ R^n
(Figure) f(x) = e^x + e^(-x) − 3x² + x has one local minimizer and one global minimizer
(Figure) f(x) = e^x + e^(-x) − 3x² has two global minimizers
Local minimizer
• Theorem (Necessary condition for a local minimum): If x* is a local
  minimizer for f: R^n → R, then ∇f(x*) = 0 (x* is also called a stationary point
  of f)
Local minimizer
Example
• f(x, y) = x² + y² − 2xy + x
• ∇f(x, y) = (2x − 2y + 1, 2y − 2x)ᵀ = 0 has no solution → f has no stationary point, hence no local minimizer
Local minimizer
• Example: f(x, y) = e^(x² + y²)
  ∇f(x, y) = (2x e^(x² + y²), 2y e^(x² + y²))ᵀ = 0 has the unique solution x* = (0, 0)
  ∇²f(0, 0) = [2 0; 0 2] ≻ 0 (positive definite) → (0, 0) is a minimizer of f
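A quick symbolic check of this example — a minimal sketch, assuming SymPy is available:

import sympy as sp

x, y = sp.symbols("x y", real=True)
f = sp.exp(x**2 + y**2)

grad = [sp.diff(f, v) for v in (x, y)]       # gradient components of f
print(sp.solve(grad, (x, y), dict=True))     # [{x: 0, y: 0}]: unique stationary point

H = sp.hessian(f, (x, y))                    # Hessian matrix of f
print(H.subs({x: 0, y: 0}))                  # Matrix([[2, 0], [0, 2]]): positive definite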
Local minimizer
• Example: Find a minimizer of f(x, y) = x² + y² − 2xy − x?
Descent method
Determine a starting point x(0) ∈ R^n;
k ← 0;
while (stopping condition not reached) {
  Determine a search direction p_k ∈ R^n;
  Determine a step size α_k > 0 s.t. f(x(k) + α_k p_k) < f(x(k));
  x(k+1) ← x(k) + α_k p_k;
  k ← k + 1;
}
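A minimal Python sketch of this template; the callables `direction` and `step_size` are placeholders for the concrete rules that the following slides instantiate:

import numpy as np

def descent(f, grad, x0, direction, step_size, tol=1e-8, max_iter=10_000):
    """Generic descent scheme: x(k+1) = x(k) + alpha_k * p_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # stopping condition: gradient nearly zero
            break
        p = direction(x, g)            # search direction p_k
        alpha = step_size(f, x, p)     # step size alpha_k with f(x + alpha*p) < f(x)
        x = x + alpha * p
    return x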
Gradient descent method
init x(0);
k ← 1;
while (stopping condition not reached) {
  specify a step size α_k;
  x(k) ← x(k-1) − α_k ∇f(x(k-1));
  k ← k + 1;
}

• α_k might be specified in such a way that f(x(k-1) − α_k ∇f(x(k-1))) is minimized over α_k (exact line search): set ∂f/∂α_k = 0
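Exact line search is often impractical; the runnable sketch below substitutes backtracking (Armijo) line search for the step-size rule — an assumption, not the slide's exact method:

import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:   # stop when the gradient is nearly zero
            break
        alpha = 1.0
        # Backtracking (Armijo): shrink alpha until f decreases sufficiently
        while f(x - alpha * g) > f(x) - 0.5 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x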
Gradient descent method
Example: f(x1, x2, x3) = x1² + x2² + x3² − x1 x2 − x2 x3 + x1 + x3 → min?
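A worked sketch: setting ∇f(x) = (2x1 − x2 + 1, 2x2 − x1 − x3, 2x3 − x2 + 1)ᵀ = 0 yields the unique minimizer x* = (−1, −1, −1) with f* = −1. The `gradient_descent` sketch from the previous slide reproduces this:

f = lambda x: x[0]**2 + x[1]**2 + x[2]**2 - x[0]*x[1] - x[1]*x[2] + x[0] + x[2]
grad = lambda x: np.array([2*x[0] - x[1] + 1,
                           2*x[1] - x[0] - x[2],
                           2*x[2] - x[1] + 1])
print(gradient_descent(f, grad, x0=[0.0, 0.0, 0.0]))   # ≈ [-1. -1. -1.]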
Newton method
• The second-order Taylor approximation g of f at x is
  f(x + h) ≈ g(x + h) = f(x) + ∇f(x)ᵀh + (1/2) hᵀ∇²f(x) h
• which is a convex quadratic function of h
• g(x + h) is minimized when ∂g/∂h = 0 → h = −∇²f(x)⁻¹ ∇f(x)
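A minimal sketch of the resulting iteration, solving the linear system ∇²f(x) h = ∇f(x) rather than forming the inverse explicitly:

import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=100):
    """Newton's method: x(k+1) = x(k) - hess(x(k))^{-1} grad(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        h = np.linalg.solve(hess(x), g)   # Newton step from the quadratic model
        x = x - h
    return x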
Newton method
Step           | x         | y         | f
Initialization | [0, 0, 0] | [1, 1, 1] | 0
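Applied to the earlier quadratic example (reusing `grad` and `newton` from the sketches above), Newton's method converges in a single iteration, since the second-order Taylor model of a quadratic is exact:

hess = lambda x: np.array([[ 2., -1.,  0.],
                           [-1.,  2., -1.],
                           [ 0., -1.,  2.]])   # Hessian of the quadratic example
print(newton(grad, hess, x0=[0.0, 0.0, 0.0]))  # [-1. -1. -1.] after one step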
Subgradient method
• Subgradient of f at x:
  • any vector g such that f(x') ≥ f(x) + gᵀ(x' − x) for all x'
• If f is differentiable, the only possible choice is g(k) = ∇f(x(k)) → the
  subgradient method reduces to the gradient method
Subgradient method
• Example: Given f(x) = max{3 − x, x − 3}. Find a subgradient of f at any x.
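A worked sketch (note f(x) = |x − 3|): for x > 3, f is differentiable and the only choice is g = 1; for x < 3, g = −1; at x = 3, any g ∈ [−1, 1] is a subgradient, since f(x') = |x' − 3| ≥ g(x' − 3) holds for all x' whenever |g| ≤ 1.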
Basic subgradient method
• Update rule: x(k+1) = x(k) − α_k g(k), where g(k) is any subgradient of f at x(k) and α_k > 0 is the step size
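A minimal sketch of this iteration; since it is not a descent method, the best point found so far is tracked (the f_best used in the convergence analysis below). The `steps` callable (k → α_k) is an illustrative interface, not from the slides:

import numpy as np

def subgradient_method(f, subgrad, x0, steps, num_iter=1000):
    """Basic subgradient method: x(k+1) = x(k) - alpha_k * g(k)."""
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(num_iter):
        x = x - steps(k) * subgrad(x)   # step along any subgradient
        if f(x) < f_best:               # keep the best iterate seen so far
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best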
Convergence proof
• Notation: x* is a minimizer of f
• Assumptions
  • The norm of the subgradients is bounded (by a constant G): ||g(k)||₂ ≤ G
    (this is the case if, for example, f satisfies the Lipschitz condition
    |f(u) − f(v)| ≤ G ||u − v||₂)
  • ||x(1) − x*||₂ ≤ R (with a known constant R)
• We have
  ||x(k+1) − x*||₂² = ||x(k) − α_k g(k) − x*||₂²
  = ||x(k) − x*||₂² − 2α_k g(k)ᵀ(x(k) − x*) + α_k² ||g(k)||₂²
  ≤ ||x(k) − x*||₂² − 2α_k (f(x(k)) − f(x*)) + α_k² ||g(k)||₂²   (1)
  (the last step uses the subgradient inequality f(x*) ≥ f(x(k)) + g(k)ᵀ(x* − x(k)))
Convergence proof
• Applying inequality (1) recursively, we have
  ||x(k+1) − x*||₂² ≤ ||x(1) − x*||₂² − 2 Σ_{i=1..k} α_i (f(x(i)) − f*) + Σ_{i=1..k} α_i² ||g(i)||₂²   (where f* = f(x*))
  → 2 Σ_{i=1..k} α_i (f(x(i)) − f*) ≤ R² + Σ_{i=1..k} α_i² ||g(i)||₂²
• Since 2 Σ_{i=1..k} α_i (f(x(i)) − f*) ≥ 2 (Σ_{i=1..k} α_i) min_{i=1..k}(f(x(i)) − f*) = 2 (Σ_{i=1..k} α_i)(f_best(k) − f*), we get
  → f_best(k) − f* ≤ (R² + Σ_{i=1..k} α_i² ||g(i)||₂²) / (2 Σ_{i=1..k} α_i)   (2)
Convergence proof
• Different cases
  • Constant step size α_k = α:
    → f_best(k) − f* ≤ (R² + G² α² k) / (2αk)
    → f_best(k) − f* converges to G²α/2 as k → ∞
  • Constant step length α_k = γ / ||g(k)||₂:
    → f_best(k) − f* ≤ (R² + γ² k) / (2γk/G)
    → f_best(k) − f* converges to Gγ/2 as k → ∞
Convergence proof
• Different cases
  • Square summable but not summable step sizes:
    ||α||₂² = Σ_{i=1..∞} α_i² < ∞ and Σ_{i=1..∞} α_i = ∞ (e.g., α_k = 1/k)
    → by (2), f_best(k) − f* → 0 as k → ∞, since the numerator stays bounded while the denominator grows without bound
Example
minimize f(x) = max_{i=1..m} (a_iᵀx + b_i)
• Finding a subgradient: given x, choose an index j for which
  a_jᵀx + b_j = max_{i=1..m} (a_iᵀx + b_i)
  → a subgradient at x is g = a_j
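A sketch applying the earlier `subgradient_method` to this problem; the random data A, b and the 1/(k+1) step-size rule are illustrative choices, not from the slides:

rng = np.random.default_rng(0)
m, n = 20, 5
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

f = lambda x: np.max(A @ x + b)               # piecewise-linear objective
subgrad = lambda x: A[np.argmax(A @ x + b)]   # g = a_j for a maximizing index j
steps = lambda k: 1.0 / (k + 1)               # square summable but not summable

x_best, f_best = subgradient_method(f, subgrad, x0=np.zeros(n), steps=steps)
print(x_best, f_best)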
Thank you for your attention!