Chapter 3 Unconstrained Convex Optimization

FUNDAMENTALS OF OPTIMIZATION
Unconstrained convex optimization
CONTENT
• Unconstrained optimization problems
• Descent method
• Gradient descent method
• Newton method
• Subgradient method
Unconstrained convex optimization
• Unconstrained, smooth convex optimization problem:
min f(x)
• f: Rn → R is convex and twice differentiable
• dom f = Rn: no constraints
• Assumption: the problem is solvable, with optimal value f* = min_x f(x) attained at x* = argmin_x f(x)
• To find x*, one could solve the equation ∇f(x*) = 0, but this is usually not easy to do analytically
• An iterative scheme is therefore preferred: compute a minimizing sequence x(0), x(1), … s.t. f(x(k)) → f(x*) as k → ∞
• The algorithm stops at some point x(k) when the error is within an acceptable tolerance: f(x(k)) − f* ≤ ε
Local minimizer
• x* is a local minimizer for f: Rn → R if f(x*) ≤ f(x) for all x with ||x* − x|| ≤ ε (ε > 0 is a constant)
• x* is a global minimizer for f: Rn → R if f(x*) ≤ f(x) for all x ∈ Rn

[Figures] f(x) = e^x has no minimizer; f(x) = −x + e^(−x) has no minimizer

Local minimizer
[Figures] f(x) = e^x + e^(−x) − 3x^2 + x has one local minimizer and one global minimizer; f(x) = e^x + e^(−x) − 3x^2 has two global minimizers
Local minimizer
• Theorem (Necessary condition for a local minimum) If x* is a local minimizer for f: Rn → R, then ∇f(x*) = 0 (x* is also called a stationary point of f)
Local minimizer

Example
• f(x,y) = x^2 + y^2 − 2xy + x
• ∇f(x,y) = (2x − 2y + 1, 2y − 2x) = 0 has no solution: the second component forces x = y, which contradicts the first (it would require 1 = 0)
→ there is no minimizer of f(x,y)
Local minimizer

• Theorem (Sufficient condition for a local minimum) Assume x* is a stationary point and that ∇²f(x*) is positive definite; then x* is a local minimizer

∇²f(x) =
[ ∂²f(x)/∂x1∂x1   ∂²f(x)/∂x1∂x2   …   ∂²f(x)/∂x1∂xn ]
[ ∂²f(x)/∂x2∂x1   ∂²f(x)/∂x2∂x2   …   ∂²f(x)/∂x2∂xn ]
[       …               …         …         …       ]
[ ∂²f(x)/∂xn∂x1   ∂²f(x)/∂xn∂x2   …   ∂²f(x)/∂xn∂xn ]
Local minimizer

• A (symmetric) matrix A of size n×n is positive definite if det(A_i) > 0 for i = 1, …, n, where A_i is the leading principal i×i submatrix

A_i =
[ a_{1,1}   a_{1,2}   …   a_{1,i} ]
[ a_{2,1}   a_{2,2}   …   a_{2,i} ]
[    …         …      …      …    ]
[ a_{i,1}   a_{i,2}   …   a_{i,i} ]
Local minimizer

• Example f(x,y) = e^(x^2 + y^2)
∇f(x,y) = ( 2x·e^(x^2 + y^2), 2y·e^(x^2 + y^2) ) = 0 has the unique solution x* = (0, 0)

∇²f(0,0) =
[ 2   0 ]
[ 0   2 ]
which is positive definite → (0, 0) is a minimizer of f
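As a quick illustration of the positive-definiteness test, here is a minimal Python/NumPy sketch (not part of the slides; the function name is chosen for illustration) that checks Sylvester's criterion and applies it to the Hessian from the example above:

import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Sylvester's criterion: every leading principal minor must be > 0."""
    A = np.asarray(A, dtype=float)
    return all(np.linalg.det(A[:i, :i]) > tol
               for i in range(1, A.shape[0] + 1))

# Hessian of f(x, y) = exp(x^2 + y^2) at the stationary point (0, 0)
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])
print(is_positive_definite(H))   # True -> (0, 0) is a local minimizer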
Local minimizer
• Example Find a minimizer of f(x,y) = x^2 + y^2 − 2xy − x ?
Descent method
Determine starting point x(0) ∈ Rn;
k ← 0;
while (stop condition not reached) {
    Determine a search direction p_k ∈ Rn;
    Determine a step size α_k > 0 s.t. f(x(k) + α_k·p_k) < f(x(k));
    x(k+1) ← x(k) + α_k·p_k;
    k ← k + 1;
}

Stop condition may be
• ||∇f(x(k))|| ≤ ε
• ||x(k+1) − x(k)|| ≤ ε
• k > K (maximum number of iterations)
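A minimal Python sketch of this generic loop (not from the slides; the helper names direction and step_size are placeholders for whatever rule a concrete method supplies):

import numpy as np

def descent_method(grad_f, x0, direction, step_size, eps=1e-6, max_iter=1000):
    """Generic descent loop: x(k+1) = x(k) + alpha_k * p_k.

    direction(x)    -> search direction p_k
    step_size(x, p) -> step size alpha_k > 0 with f(x + alpha_k * p_k) < f(x)
    Stops when ||grad f(x)|| <= eps or after max_iter iterations.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        if np.linalg.norm(grad_f(x)) <= eps:     # stop condition
            break
        p = direction(x)
        alpha = step_size(x, p)
        x = x + alpha * p
    return x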
Gradient descent method

• Gradient descent scheme

x(k) = x(k−1) − α_k·∇f(x(k−1))

init x(0);
k = 1;
while (stop condition not reached) {
    choose step size α_k;
    x(k) = x(k−1) − α_k·∇f(x(k−1));
    k = k + 1;
}

• α_k might be chosen so that f(x(k−1) − α_k·∇f(x(k−1))) is minimized (exact line search): solve ∂f(x(k−1) − α_k·∇f(x(k−1)))/∂α_k = 0
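A minimal Python sketch of gradient descent (not from the slides); since the slide leaves the choice of α_k open, this version simply halves a trial step until the function value decreases, one possible rule among many:

import numpy as np

def gradient_descent(f, grad_f, x0, alpha0=1.0, beta=0.5, eps=1e-6, max_iter=1000):
    """Gradient descent x(k) = x(k-1) - alpha_k * grad f(x(k-1)),
    with a simple backtracking choice of alpha_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        alpha = alpha0
        for _ in range(50):                 # shrink alpha until f decreases
            if f(x - alpha * g) < f(x):
                break
            alpha *= beta
        x = x - alpha * g
    return x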
Gradient descent method
Example f(x1, x2, x3) = x1^2 + x2^2 + x3^2 − x1·x2 − x2·x3 + x1 + x3 → min ?
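A sketch of gradient descent with exact line search applied to this example (my own rewriting of it as f(x) = ½·xᵀQx + cᵀx; for a quadratic the minimizing step size has the closed form α = gᵀg / gᵀQg):

import numpy as np

# f(x) = x1^2 + x2^2 + x3^2 - x1*x2 - x2*x3 + x1 + x3 = 1/2 x^T Q x + c^T x
Q = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])
c = np.array([1., 0., 1.])
f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

x = np.zeros(3)
for _ in range(1000):
    g = grad(x)
    if np.linalg.norm(g) <= 1e-10:
        break
    alpha = (g @ g) / (g @ Q @ g)     # exact line search for a quadratic
    x = x - alpha * g

print(x, f(x))                        # approaches x* = [-1, -1, -1], f* = -1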
Newton method
• The second-order Taylor approximation g of f at x is
f(x + h) ≈ g(x + h) = f(x) + ∇f(x)ᵀh + (1/2)·hᵀ∇²f(x)·h
• which is a convex quadratic function of h
• g(x + h) is minimized when ∂g/∂h = 0 → h = −∇²f(x)⁻¹·∇f(x) (the Newton step)
Newton method

Generate x(0); // starting point
k = 0;
while (stop condition not reached) {
    x(k+1) ← x(k) − ∇²f(x(k))⁻¹·∇f(x(k));
    k = k + 1;
}
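A minimal Python sketch of this loop (not from the slides), applied to the earlier quadratic example; since that f is quadratic, one Newton step already lands on the minimizer:

import numpy as np

def newton(grad_f, hess_f, x0, eps=1e-8, max_iter=50):
    """Newton's method: x(k+1) = x(k) - (hess f(x(k)))^{-1} grad f(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - np.linalg.solve(hess_f(x), g)   # solve, rather than invert, the Hessian
    return x

# Quadratic example from before: f(x) = 1/2 x^T Q x + c^T x
Q = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
c = np.array([1., 0., 1.])
print(newton(lambda x: Q @ x + c, lambda x: Q, np.zeros(3)))   # -> [-1. -1. -1.]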
Newton method

Step           | x            | y                                                  | f
Initialization | [0, 0, 0]    | [1, 1, 1]                                          | 0
Step 1         | [-1, -1, -1] | [-2.46519033e-32, 1.11022302e-16, 2.22044605e-16]  | -1.0000000000000004
Step 2         | [-1, -1, -1] | [0, 0, 0]                                          | -1
Subgradient method

• For minimizing nondifferentiable convex functions

• The subgradient method is not a descent method: the function value can increase
Subgradient method
• Subgradient of f at x:
• any vector g such that f(x′) ≥ f(x) + gᵀ(x′ − x) for all x′
• If f is differentiable, the only possible choice is g(k) = ∇f(x(k)) → the subgradient method reduces to the gradient method
Subgradient method
• Example Given f(x) = max{3 − x, x − 3}. Find a subgradient of f at any x
• Solve?
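One way to work this out (my sketch, not spelled out on the slide): f(x) = max{3 − x, x − 3} = |x − 3|, so

\[
\partial f(x) =
\begin{cases}
\{-1\}   & x < 3,\\
[-1,\,1] & x = 3,\\
\{+1\}   & x > 3,
\end{cases}
\]

and any g in the corresponding set satisfies f(x′) ≥ f(x) + g·(x′ − x).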
Basic subgradient method

x(k+1) = x(k) − α_k·g(k)

• x(k): the iterate at the kth iteration
• g(k): any subgradient of f at x(k)
• α_k > 0 is the kth step size
• Note: the subgradient method is not a descent method, thus keep track of the best value so far: f_best(k) = min{f(x(1)), f(x(2)), …, f(x(k))}
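A minimal Python sketch of this scheme (not from the slides; step is a placeholder for whichever step-size rule is used):

import numpy as np

def subgradient_method(f, subgrad, x0, step, max_iter=1000):
    """Basic subgradient method x(k+1) = x(k) - alpha_k * g(k).

    step(k, g) -> step size alpha_k > 0
    Keeps the best point seen, since the method is not a descent method.
    """
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(max_iter):
        g = subgrad(x)
        x = x - step(k, g) * g
        if f(x) < f_best:
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best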
Convergence proof
• Notation: x* is a minimizer of f
• Assumptions
• The norm of the subgradients is bounded by a constant G: ||g(k)||_2 ≤ G (this is the case if, for example, f satisfies the Lipschitz condition |f(u) − f(v)| ≤ G·||u − v||_2)
• ||x(1) − x*||_2 ≤ R (with a known constant R)
• We have
||x(k+1) − x*||_2^2 = ||x(k) − α_k·g(k) − x*||_2^2
= ||x(k) − x*||_2^2 − 2α_k·g(k)ᵀ(x(k) − x*) + α_k^2·||g(k)||_2^2
≤ ||x(k) − x*||_2^2 − 2α_k·(f(x(k)) − f(x*)) + α_k^2·||g(k)||_2^2    (1)
(the inequality follows from f(x*) ≥ f(x(k)) + g(k)ᵀ(x* − x(k)))
Convergence proof
• Applying inequality (1) recursively, we have

||x(k+1) − x*||_2^2 ≤ ||x(1) − x*||_2^2 − 2·Σ_{i=1..k} α_i·(f(x(i)) − f*) + Σ_{i=1..k} α_i^2·||g(i)||_2^2    (where f* = f(x*))

→ 2·Σ_{i=1..k} α_i·(f(x(i)) − f*) ≤ R^2 + Σ_{i=1..k} α_i^2·||g(i)||_2^2

→ R^2 + Σ_{i=1..k} α_i^2·||g(i)||_2^2 ≥ 2·Σ_{i=1..k} α_i·(f(x(i)) − f*) ≥ 2·(Σ_{i=1..k} α_i)·min_{i=1,…,k}(f(x(i)) − f*) = 2·(Σ_{i=1..k} α_i)·(f_best(k) − f*)

→ f_best(k) − f* ≤ (R^2 + Σ_{i=1..k} α_i^2·||g(i)||_2^2) / (2·Σ_{i=1..k} α_i)    (2)
Convergence proof

• Different cases
• Constant step size α_k = α
→ f_best(k) − f* ≤ (R^2 + G^2·α^2·k) / (2·α·k)
→ f_best(k) − f* converges to G^2·α/2 as k → ∞
• Constant step length α_k = γ / ||g(k)||_2
→ f_best(k) − f* ≤ (R^2 + γ^2·k) / (2·γ·k/G)
→ f_best(k) − f* converges to G·γ/2 as k → ∞
Convergence proof

• Different cases
• Square summable but not summable step sizes:
||α||_2^2 = Σ_{i=1..∞} α_i^2 < ∞ and Σ_{i=1..∞} α_i = ∞

→ f_best(k) − f* ≤ (R^2 + G^2·||α||_2^2) / (2·Σ_{i=1..k} α_i)

→ f_best(k) − f* converges to 0 as k → ∞
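The three step-size rules above, written as small Python helpers that could be passed as the step argument of the subgradient-method sketch earlier (illustrative code, not from the slides):

import numpy as np

def constant_step_size(alpha):
    return lambda k, g: alpha                          # alpha_k = alpha

def constant_step_length(gamma):
    return lambda k, g: gamma / np.linalg.norm(g)      # alpha_k = gamma / ||g(k)||_2

def square_summable_not_summable(a=1.0):
    # alpha_k = a / (k + 1): the sum of squares is finite, the plain sum diverges
    return lambda k, g: a / (k + 1)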
Example
minimize f(x) = max_{i=1,…,m} (a_iᵀx + b_i)
• Finding a subgradient: given x, find an index j for which
a_jᵀx + b_j = max_{i=1,…,m} (a_iᵀx + b_i)
→ a subgradient at x is g = a_j
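A small Python sketch of this rule (the matrix A, vector b, and point x below are made-up illustrative data):

import numpy as np

def pwl_max_and_subgradient(A, b, x):
    """f(x) = max_i (a_i^T x + b_i); a subgradient is a_j for any maximizing j."""
    vals = A @ x + b
    j = int(np.argmax(vals))
    return vals[j], A[j]

A = np.array([[1., 2.], [-1., 0.], [0., -3.]])
b = np.array([0., 1., -2.])
x = np.zeros(2)
fx, g = pwl_max_and_subgradient(A, b, x)
x_next = x - 0.1 * g     # one subgradient step with step size 0.1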
Thank you for your attention!
