Chapter 3 Unconstrained Convex Optimization

FUNDAMENTALS OF OPTIMIZATION
Unconstrained convex optimization
CONTENT
• Unconstrained optimization problems
• Descent method
• Gradient descent method
• Newton method
• Subgradient method
Unconstrained convex optimization
• Unconstrained, smooth convex optimization problem:
min f(x)
• f: Rn → R is convex and twice differentiable
• dom f = Rn: no constraints
• Assumption: the problem is solvable, with optimal value f* = minx f(x) attained at x* = argminx f(x)
• To find x*, solve the equation ∇f(x*) = 0: usually not easy to solve analytically
• An iterative scheme is preferred: compute a minimizing sequence x(0), x(1), … s.t. f(x(k)) → f(x*) as k → ∞
• The algorithm stops at some point x(k) when the error is within an acceptable tolerance: f(x(k)) − f* ≤ ε
Local minimizer
• x* is a local minimizer for f: Rn → R if f(x*) ≤ f(x) for all x with ||x − x*|| ≤ δ (δ > 0 a constant)
• x* is a global minimizer for f: Rn → R if f(x*) ≤ f(x) for all x ∈ Rn

[Figures: f(x) = e^x has no minimizer; f(x) = −x + e^(−x) has no minimizer]


Local minimizer
• x* is a local minimizer for f: Rn → R if f(x*) ≤ f(x) for all x with ||x − x*|| ≤ δ (δ > 0 a constant)
• x* is a global minimizer for f: Rn → R if f(x*) ≤ f(x) for all x ∈ Rn

[Figures: f(x) = e^x + e^(−x) − 3x^2 + x has one local minimizer and one global minimizer; f(x) = e^x + e^(−x) − 3x^2 has two global minimizers]
Local minimizer
• Theorem (Necessary condition for a local minimum) If x* is a local minimizer for f: Rn → R, then ∇f(x*) = 0 (such x* is also called a stationary point of f)
Local minimizer

Example
• f(x,y) = x^2 + y^2 − 2xy + x
• ∇f(x,y) = [2x − 2y + 1, 2y − 2x]^T = 0 has no solution

→ there is no minimizer of f(x,y)

Local minimizer

• Theorem (Sufficient condition for a local minimum) Assume x* is a stationary point and that ∇^2 f(x*) is positive definite; then x* is a local minimizer

∇^2 f(x) = [ ∂^2 f(x)/∂x1∂x1   ∂^2 f(x)/∂x1∂x2   . . .   ∂^2 f(x)/∂x1∂xn ]
           [ ∂^2 f(x)/∂x2∂x1   ∂^2 f(x)/∂x2∂x2   . . .   ∂^2 f(x)/∂x2∂xn ]
           [        .                  .                         .        ]
           [ ∂^2 f(x)/∂xn∂x1   ∂^2 f(x)/∂xn∂x2   . . .   ∂^2 f(x)/∂xn∂xn ]

Local minimizer

• A symmetric n×n matrix A is called positive definite if every leading principal minor is positive:

Ai = [ a1,1  a1,2  . . .  a1,i ]
     [ a2,1  a2,2  . . .  a2,i ]
     [  .     .           .   ]
     [ ai,1  ai,2  . . .  ai,i ],   det(Ai) > 0,  i = 1, …, n

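A quick numerical check of this leading-principal-minor test, as a minimal Python sketch (the helper name is_positive_definite is our own):

import numpy as np

def is_positive_definite(A):
    # Sylvester's criterion: all leading principal minors must be positive
    n = A.shape[0]
    return all(np.linalg.det(A[:i, :i]) > 0 for i in range(1, n + 1))

print(is_positive_definite(np.array([[2.0, 0.0], [0.0, 2.0]])))  # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False (det = -3)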
Local minimizer

• Example f(x,y) = e^(x^2+y^2)
∇f(x,y) = [2x·e^(x^2+y^2), 2y·e^(x^2+y^2)]^T = 0 has the unique solution x* = (0,0)

∇^2 f(0,0) = [ 2  0 ]
             [ 0  2 ]  is positive definite → (0,0) is a minimizer of f

Local minimizer
• Example Find a minimizer of f(x,y) = x^2 + y^2 − 2xy − x ?

Descent method
Determine starting point x(0) ∈ Rn;
k ← 0;
while (stop condition not reached) {
    Determine a search direction pk ∈ Rn;
    Determine a step size αk > 0 s.t. f(x(k) + αk·pk) < f(x(k));
    x(k+1) ← x(k) + αk·pk;
    k ← k + 1;
}

Stop condition may be
• ||∇f(x(k))|| ≤ ε
• ||x(k+1) − x(k)|| ≤ ε
• k > K (maximum number of iterations)

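A minimal Python sketch of this loop (the name descent and the halving rule are our own choices; the direction here is the negative gradient, and the step size is shrunk until the required decrease f(x(k) + αk·pk) < f(x(k)) holds):

import numpy as np

def descent(f, grad, x0, eps=1e-8, K=10000):
    x = np.asarray(x0, dtype=float)
    for k in range(K):
        g = grad(x)
        if np.linalg.norm(g) <= eps:        # stop condition ||grad f(x(k))|| <= eps
            break
        p = -g                              # search direction p_k
        alpha = 1.0
        while f(x + alpha * p) >= f(x) and alpha > 1e-12:
            alpha *= 0.5                    # shrink until f decreases
        x = x + alpha * p
    return x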
Gradient descent method

• Gradient descent scheme

x(k) = x(k−1) − αk·∇f(x(k−1))

init x(0);
k = 1;
while (stop condition not reached) {
    specify step size αk;
    x(k) = x(k−1) − αk·∇f(x(k−1));
    k = k + 1;
}

• αk might be chosen so that f(x(k−1) − αk·∇f(x(k−1))) is minimized (exact line search): solve ∂f/∂αk = 0

Gradient descent method
Example f(x1,x2,x3) = x1^2 + x2^2 + x3^2 − x1·x2 − x2·x3 + x1 + x3 → min ?

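A minimal gradient-descent sketch for this example (the fixed step size 0.2 is our own choice, small enough for this quadratic):

import numpy as np

def f(x):
    x1, x2, x3 = x
    return x1**2 + x2**2 + x3**2 - x1*x2 - x2*x3 + x1 + x3

def grad_f(x):
    x1, x2, x3 = x
    return np.array([2*x1 - x2 + 1, 2*x2 - x1 - x3, 2*x3 - x2 + 1])

x = np.zeros(3)
for k in range(1000):
    g = grad_f(x)
    if np.linalg.norm(g) <= 1e-8:            # stop condition
        break
    x = x - 0.2 * g                          # x(k) = x(k-1) - alpha * grad f(x(k-1))

print(x, f(x))   # approaches the minimizer (-1, -1, -1), where f = -1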
Newton method
• Second-order Taylor approximation g of f at x is
f(x + h) ≈ g(x + h) = f(x) + ∇f(x)^T h + (1/2)·h^T ∇^2 f(x) h
• which is a convex quadratic function of h
• g(x + h) is minimized when ∂g/∂h = 0 → h = −∇^2 f(x)^(−1) ∇f(x)

Newton method

Generate x(0); // starting point
k = 0;
while (stop condition not reached) {
    x(k+1) ← x(k) − ∇^2 f(x(k))^(−1) ∇f(x(k));
    k = k + 1;
}

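A minimal Python sketch of this loop (the name newton is our own; np.linalg.solve is used instead of forming the inverse Hessian explicitly):

import numpy as np

def newton(grad, hess, x0, eps=1e-10, K=100):
    x = np.asarray(x0, dtype=float)
    for k in range(K):
        g = grad(x)
        if np.linalg.norm(g) <= eps:          # stop condition
            break
        x = x - np.linalg.solve(hess(x), g)   # x(k+1) = x(k) - hess^{-1} grad
    return x

# The quadratic example above has constant Hessian H and gradient Hx + c:
H = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 2.]])
c = np.array([1., 0., 1.])
print(newton(lambda x: H @ x + c, lambda x: H, np.zeros(3)))  # [-1. -1. -1.]

Since the second-order Taylor model is exact for a quadratic, Newton's method reaches x* in a single step, which matches the run on the next slide.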
Newton method

Sample run (consistent with the quadratic example above):

Step            x                  y                                                     f
Initialization  [0, 0, 0]          [1, 1, 1]                                             0
Step 1          [-1., -1., -1.]    [-2.46519033e-32, 1.11022302e-16, 2.22044605e-16]     -1.0000000000000004
Step 2          [-1., -1., -1.]    [0., 0., 0.]                                          -1

Subgradient method

• For minimizing a nondifferentiable convex function
• The subgradient method is not a descent method: the function value can increase

Subgradient method
• Subgradient of f at x
• Any vector g such that f(x′) ≥ f(x) + g^T(x′ − x) for all x′
• If f is differentiable, the only possible choice is g(k) = ∇f(x(k)) → the subgradient method reduces to the gradient method

Subgradient method
• Example Given f(x) = max{3 − x, x − 3}. Find a subgradient of f at any x
• Solve ? (see the sketch below)

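A sketch of one valid answer: f(x) = max{3 − x, x − 3} = |x − 3|, and the slope of an active piece is always a subgradient; at the kink x = 3 any g ∈ [−1, 1] works:

def subgrad(x):
    # one valid subgradient of f(x) = |x - 3|
    if x > 3:
        return 1.0     # active piece x - 3
    if x < 3:
        return -1.0    # active piece 3 - x
    return 0.0         # at x = 3, any value in [-1, 1] is valid; 0 is one choice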
Basic subgradient method

x(k+1) = x(k) − αk·g(k)

• x(k): the iterate at the kth iteration
• g(k): any subgradient of f at x(k)
• αk > 0 is the kth step size
• Note: the subgradient method is not a descent method, thus we track f_best(k) = min{f(x(1)), f(x(2)), …, f(x(k))}

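A minimal Python sketch of this scheme (the name subgradient_method is our own; alphas is any sequence of positive step sizes, e.g. the diminishing rules analyzed next):

import numpy as np

def subgradient_method(f, subgrad, x0, alphas):
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for alpha in alphas:
        x = x - alpha * np.asarray(subgrad(x))   # x(k+1) = x(k) - alpha_k * g(k)
        if f(x) < f_best:                        # not a descent method: keep the best iterate
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best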
Convergence proof
• Notation: x* is a minimizer of f
• Assumptions
• The norm of the subgradients is bounded (by a constant G): ||g(k)||2 ≤ G (this is the case if, for example, f satisfies the Lipschitz condition |f(u) − f(v)| ≤ G||u − v||2)
• ||x(1) − x*||2 ≤ R (with a known constant R)
• We have ||x(k+1) − x*||2^2 = ||x(k) − αk·g(k) − x*||2^2
= ||x(k) − x*||2^2 − 2αk·g(k)^T(x(k) − x*) + αk^2·||g(k)||2^2
≤ ||x(k) − x*||2^2 − 2αk(f(x(k)) − f(x*)) + αk^2·||g(k)||2^2   (1)
(due to the fact that f(x*) ≥ f(x(k)) + g(k)^T(x* − x(k)))

Convergence proof
• Applying inequality (1) recursively, we have

||x(k+1) − x*||2^2 ≤ ||x(1) − x*||2^2 − 2·Σi=1..k αi(f(x(i)) − f*) + Σi=1..k αi^2·||g(i)||2^2   (where f* = f(x*))

→ 2·Σi=1..k αi(f(x(i)) − f*) ≤ R^2 + Σi=1..k αi^2·||g(i)||2^2

• Since 2·Σi=1..k αi(f(x(i)) − f*) ≥ 2(Σi=1..k αi)·min i=1,…,k (f(x(i)) − f*) = 2(Σi=1..k αi)(f_best(k) − f*):

→ f_best(k) − f* ≤ (R^2 + Σi=1..k αi^2·||g(i)||2^2) / (2·Σi=1..k αi)   (2)

Convergence proof

• Different cases
• Constant step size αk = α
→ f_best(k) − f* ≤ (R^2 + G^2·α^2·k) / (2αk)
→ f_best(k) − f* converges to G^2·α/2 as k → ∞
• Constant step length αk = γ/||g(k)||2 (so that ||x(k+1) − x(k)||2 = γ)
→ f_best(k) − f* ≤ (R^2 + γ^2·k) / (2γk/G)
→ f_best(k) − f* converges to G·γ/2 as k → ∞

Convergence proof

• Different cases
• Square summable but not summable step sizes (e.g., αi = 1/i):
||α||2^2 = Σi=1..∞ αi^2 < ∞ and Σi=1..∞ αi = ∞

→ f_best(k) − f* ≤ (R^2 + G^2·||α||2^2) / (2·Σi=1..k αi)

→ f_best(k) − f* converges to 0 as k → ∞

Example
minimize f(x) = max i=1,…,m (ai^T x + bi)
• Finding a subgradient: given x, take an index j for which
aj^T x + bj = max i=1,…,m (ai^T x + bi)
→ a subgradient at x is g = aj

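An end-to-end sketch of the subgradient method on a random instance of this problem (the data A, b and the step rule αk = 1/k, square summable but not summable, are our own choices):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))       # rows are a_i^T
b = rng.standard_normal(20)

f = lambda x: np.max(A @ x + b)

def subgrad(x):
    j = np.argmax(A @ x + b)           # an index where the max is attained
    return A[j]                        # g = a_j

x = np.zeros(5)
f_best = f(x)
for k in range(1, 5001):
    x = x - (1.0 / k) * subgrad(x)     # step size alpha_k = 1/k
    f_best = min(f_best, f(x))         # track the best value seen

print(f_best)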
Thank you
for your
attention!
