
N-D OPTIMIZATION

Dhish Kumar Saxena


Professor
Department of Mechanical & Industrial Engineering
(Joint Faculty, Department of Computer Science)
IIT Roorkee
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Given Problem: Minimize f(X); X ∈ R^n

(A) Initialize X0, that is, Xk with k = 0
(B) While some stopping criterion is not met by Xk:
    (i) find Xk+1 such that f(Xk+1) < f(Xk)
    (ii) Set k = k + 1
    Endwhile
(C) Declare output X* = Xk, a stationary point of f(X)

Key questions:
Q1. Stopping conditions?
Q2. How to find Xk+1?
Q3. Does the algorithm converge over iterations?
Q4. Is convergence independent of X0?
Q5. Speed of convergence?
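As a minimal sketch, the framework above can be written as a generic descent loop. The test function, its gradient, the steepest-descent direction, and the fixed step length used here are illustrative assumptions, not the only choices the framework admits:

```python
import numpy as np

def minimize_nd(f, grad_f, x0, eps=1e-6, max_iter=10_000):
    """Generic framework: (A) initialize, (B) iterate while not converged, (C) output.

    Illustrative (assumed) choices: descent direction d_k = -grad f(X_k)
    and a small fixed step length alpha_k.
    """
    x = np.asarray(x0, dtype=float)          # (A) X_k with k = 0
    for k in range(max_iter):                # (B) while stopping criterion not met
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:         # Q1: a stopping condition
            break
        d = -g                               # Q2: a descent direction
        alpha = 1e-2                         # Q2: a step length (fixed, for illustration)
        x = x + alpha * d                    # (i) f(X_{k+1}) < f(X_k) for small alpha
    return x                                 # (C) X*, an approximate stationary point

# Minimize f(X) = ||X||^2; the stationary point is the origin.
x_star = minimize_nd(lambda x: x @ x, lambda x: 2 * x, [3.0, -4.0])
```

Each of the questions Q1-Q5 refines one line of this loop, as the following slides develop.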
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Q1: Stopping Conditions

We do know:
∙ First-order Necessary Condition (FONC): ∇f(X) = 0
∙ Second-order Necessary Condition (SONC): the Hessian matrix H is positive semi-definite (p.s.d.)
∙ Second-order Sufficiency Condition (SOSC): the Hessian matrix H is positive definite (p.d.)

Checking the Hessian-based conditions is computationally expensive, so in practice we utilize the FONC:

∙ | ∇f(Xk) | ≤ ϵ … dependent on the units of Xk
∙ | f(Xk) − f(Xk+1) | / | f(Xk) | ≤ ϵ … a scale-free relative-decrease test
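A minimal sketch of the two FONC-based stopping tests above; the helper name and the default tolerance are assumptions for illustration:

```python
import numpy as np

def should_stop(grad_k, f_k, f_k1, eps=1e-6):
    """Two FONC-based stopping tests (illustrative tolerance).

    1) | grad f(X_k) | <= eps                     -- depends on the units of X_k
    2) | f(X_k) - f(X_{k+1}) | / | f(X_k) | <= eps -- scale-free relative decrease
    """
    grad_small = np.linalg.norm(grad_k) <= eps
    if f_k != 0:
        rel_decrease_small = abs(f_k - f_k1) <= eps * abs(f_k)
    else:
        rel_decrease_small = abs(f_k1) <= eps
    return bool(grad_small or rel_decrease_small)
```

Either test alone can fire: a tiny gradient, or a relative decrease in f below the tolerance.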
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Q2: How to find Xk+1

How about modeling Xk+1 = Xk + αk dk, where:

∙ Xk is the current point
∙ dk is a descent direction: one that promises a reduction in function value
∙ αk is a step length along dk

This needs two considerations, one each for dk and αk.
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Xk+1 = Xk + αk dk

Perspective-I (descent direction dk): a vector dk ∈ R^n, dk ≠ 0, is said to be a descent direction at a given point Xk ∈ R^n if there exists δd > 0 such that f(Xk + αk dk) < f(Xk) ∀ αk ∈ (0, δd).

Set of all descent directions at a given point: Ŝd(X) = {dk : f(Xk + αk dk) < f(Xk) ∀ αk ∈ (0, δd)}

Perspective-II: set of all directions which make an obtuse angle with ∇f: Sd(X) = {dk : ∇f^T(X) dk < 0}

Sd(X) ⊆ Ŝd(X), that is, membership in Sd(X) implies membership in Ŝd(X):

{dk : ∇f^T(X) dk < 0} ⟹ {dk : f(Xk + αk dk) < f(Xk) ∀ αk ∈ (0, δd)}

Why? By the first-order Taylor expansion, f(Xk + αk dk) ≈ f(Xk) + ∇f^T(Xk) αk dk, since

∇f^T(Xk) dk = lim_{αk → 0+} [f(Xk + αk dk) − f(Xk)] / αk

Hence ∇f^T(Xk) dk < 0 ⟹ f(Xk + αk dk) < f(Xk) for all sufficiently small αk > 0.

All optimization methods, despite usage of different dk's, honor these definitions of a descent direction.
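The two perspectives can be checked numerically; the function f(X) = x1² + x2² and the helper below are assumptions for demonstration:

```python
import numpy as np

def is_descent_direction(grad, d):
    # Perspective-II: d is a descent direction at X if grad f(X)^T d < 0
    return float(np.dot(grad, d)) < 0.0

# Illustrative f(X) = x1^2 + x2^2; at X = (1, 1) the gradient is (2, 2).
f = lambda X: X @ X
X = np.array([1.0, 1.0])
grad = np.array([2.0, 2.0])

d_good = np.array([-1.0, -1.0])   # obtuse angle with grad f: descent
d_bad = np.array([1.0, 0.0])      # acute angle with grad f: ascent

# Perspective-I holds along d_good for small alpha, as the limit argument predicts.
decreases = all(f(X + a * d_good) < f(X) for a in (1e-3, 1e-2, 1e-1))
```

As the slide notes, Perspective-II is the computationally cheap test: one dot product, no trial evaluations of f.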
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Xk+1 = Xk + αk dk, assuming dk is fixed through the choice of a method.

αk: Exact Method

For a given Xk and dk, f(Xk + αk dk) is a function of the single variable αk; write f(αk) ≡ f(Xk + αk dk). Apply the FONC for a single variable: f′(αk) = 0.

f′(αk) = df/dαk = (df/dX)^T · (dX/dαk) = ∇f^T(Xk + αk dk) dk = 0 ⟹ ∇f^T(Xk+1) dk = 0

That is, the exact αk takes you to the point Xk+1 at which the gradient is perpendicular to the current direction dk. Note that at the start of the search, f′(αk = 0) ≡ ∇f^T(Xk) dk < 0.

αk: Inexact Method. The step length should be such that:

∙ it provides a sufficient decrease in the f value ⟹ Armijo's condition
∙ it provides a sufficient shift from Xk to Xk+1 ⟹ Goldstein's condition / Wolfe's condition

Together, the sufficient-decrease and curvature requirements form the Armijo-Wolfe conditions.
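For a quadratic f(X) = ½XᵀAX − bᵀX the exact-method condition ∇f^T(Xk+1) dk = 0 can be solved in closed form; a sketch under that assumption (the matrices and points below are illustrative):

```python
import numpy as np

def exact_step_quadratic(A, b, x, d):
    """Exact line search for f(X) = 0.5 X^T A X - b^T X (A symmetric p.d.).

    Solving f'(alpha) = grad f(x + alpha d)^T d = 0 in closed form gives
    alpha = -(grad f(x)^T d) / (d^T A d).
    """
    g = A @ x - b                      # grad f(x)
    return -(g @ d) / (d @ (A @ d))

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 4.0])               # minimizer X* = (1, 1)
x = np.array([0.0, 0.0])
d = b - A @ x                          # steepest-descent direction -grad f(x)
alpha = exact_step_quadratic(A, b, x, d)
x_next = x + alpha * d

g_next = A @ x_next - b                # gradient at the new point
```

The FONC of the one-variable problem is visible in the result: the new gradient g_next is perpendicular to the search direction d.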



FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Xk+1 = Xk + αk dk, assuming dk is fixed through the choice of a method.

αk: Inexact Method. The step length should provide a sufficient decrease in the f value ⟹ Armijo's condition.

Along dk, at αk = 0: f(αk = 0) ≡ f(0) ≡ f(Xk), and the slope is f′(0) ≡ ∇f^T(Xk) dk < 0.

The tangent line at αk = 0 (in y = mx + c form): ϕ0(αk) = [∇f^T(Xk) dk] · αk + f(Xk).

A relaxed (less steep) line: ϕ(αk) = A [∇f^T(Xk) dk] · αk + f(Xk); A ∈ (0, 1).

Armijo's condition: choose αk such that f(αk) ≤ ϕ(αk), that is, f(Xk + αk dk) ≤ f(Xk) + A [∇f^T(Xk) dk] · αk; A ∈ (0, 1).

Backtracking:
∙ Start with αk = 1
∙ If Armijo's condition is violated, reset αk,new = αk/β, where β may be set at 2

Caveat: even a very small step length may fulfil Armijo's condition; hence a significant step length may not prevail.
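A minimal sketch of the backtracking loop above; the values A = 1e-4 and β = 2 and the test function are illustrative assumptions:

```python
import numpy as np

def backtracking_armijo(f, grad_f, x, d, A=1e-4, beta=2.0):
    """Backtracking as on the slide: start at alpha = 1 and divide by beta
    until Armijo's condition f(x + a d) <= f(x) + A (grad^T d) a holds.
    Assumes d is a descent direction, so the loop terminates.
    """
    alpha = 1.0
    fx, slope = f(x), grad_f(x) @ d    # slope = f'(0) = grad f(x)^T d < 0
    while f(x + alpha * d) > fx + A * slope * alpha:
        alpha /= beta                  # reset alpha_new = alpha / beta
    return alpha

f = lambda X: X @ X                    # illustrative f(X) = ||X||^2
grad_f = lambda X: 2 * X
x = np.array([1.0, 1.0])
d = -grad_f(x)                         # steepest-descent direction
alpha = backtracking_armijo(f, grad_f, x, d)
```

Here α = 1 overshoots (f is unchanged at the reflected point), so one halving is needed.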

FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Xk+1 = Xk + αk dk, assuming dk is fixed through the choice of a method.

αk: Inexact Method. Armijo's sufficient-decrease condition alone is not enough, since even very small steps satisfy it; a curvature condition rules out such steps.

Wolfe's condition: choose αk such that the new slope at αk is larger (less negative) than the original slope:

f′(αk) ≥ W f′(0); W ∈ (0, 1)

Combined with Armijo's condition, this gives the Armijo-Wolfe conditions:

f(Xk + αk dk) ≤ f(Xk) + A [∇f^T(Xk) dk] · αk; A ∈ (0, 1)
f′(αk) ≥ W f′(0); W ∈ (A, 1)
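The combined conditions can be checked for a candidate step length; a sketch with illustrative A and W values on an assumed test function:

```python
import numpy as np

def satisfies_armijo_wolfe(f, grad_f, x, d, alpha, A=1e-4, W=0.9):
    """Check the two Armijo-Wolfe conditions for a candidate step alpha:
    sufficient decrease: f(x + a d) <= f(x) + A (grad_k^T d) a,   A in (0, 1)
    curvature:           f'(a) = grad f(x + a d)^T d >= W f'(0),  W in (A, 1)
    """
    slope0 = grad_f(x) @ d
    armijo = f(x + alpha * d) <= f(x) + A * slope0 * alpha
    wolfe = grad_f(x + alpha * d) @ d >= W * slope0
    return bool(armijo and wolfe)

f = lambda X: X @ X                    # illustrative f(X) = ||X||^2
grad_f = lambda X: 2 * X
x = np.array([1.0, 1.0])
d = -grad_f(x)

ok_moderate = satisfies_armijo_wolfe(f, grad_f, x, d, alpha=0.4)
ok_tiny = satisfies_armijo_wolfe(f, grad_f, x, d, alpha=1e-9)
```

A moderate step passes both tests, while a tiny step passes Armijo but fails the curvature (Wolfe) test, which is exactly the gap the slide's caveat describes.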





FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Global Convergence (Zoutendijk) Theorem

Consider: Minimize f(X), where X ∈ R^n, and suppose:

∙ f is C^1 continuous
∙ f is bounded below
∙ ∇f is Lipschitz continuous: | ∇fk+1 − ∇fk | ≤ L | Xk+1 − Xk |, with 0 < L < ∞

If at every iteration k of an optimization algorithm:

∙ a descent direction dk is chosen such that cos^2(θk) ≥ δ > 0, where θk is the angle between dk and ∇fk
∙ αk satisfies the Armijo-Wolfe conditions

then the optimization algorithm either terminates in a finite number of iterations or lim_{k→∞} | ∇fk | = 0.

The above claim does not depend on X0; hence, convergence is independent of the initial point.
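The claim can be illustrated empirically: steepest descent with Armijo backtracking drives | ∇fk | to (near) zero from very different initial points. The convex quadratic test problem and the parameter values are assumptions for demonstration:

```python
import numpy as np

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=10_000):
    """Steepest descent (cos(theta_k) = 1, so the angle condition holds)
    with Armijo backtracking; A = 0.3 is an illustrative choice in (0, 1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        d, alpha = -g, 1.0
        while f(x + alpha * d) > f(x) + 0.3 * (g @ d) * alpha:
            alpha /= 2.0
        x = x + alpha * d
    return x

f = lambda X: X[0] ** 2 + 10 * X[1] ** 2     # convex quadratic, minimizer (0, 0)
grad_f = lambda X: np.array([2 * X[0], 20 * X[1]])

# Two very different initial points converge to the same stationary point.
xa = steepest_descent(f, grad_f, [100.0, -50.0])
xb = steepest_descent(f, grad_f, [-3.0, 0.5])
```

Both runs stop with a tiny gradient norm, consistent with the theorem's independence from X0.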
FRAMEWORK FOR UNCONSTRAINED N-D OPTIMIZATION

Speed of Convergence (order and rate of convergence)

Consider that an optimization algorithm generates a sequence of points Xk, with errors δk = | Xk − X* |.

The sequence Xk converges to X* with order p if:

lim_{k→∞} | Xk+1 − X* | / | Xk − X* |^p = r (a finite number)

Asymptotically, | δk+1 | = r | δk |^p.

p: order of convergence
r: rate of convergence

∙ δk ≈ 0 for very large k, and the aim is to achieve δk+1 = 0
∙ The aim is better fulfilled with a larger p (which makes δk^p even smaller, since δk < 1) and a smaller r

An algorithm with a higher order and a lower rate is said to converge faster (faster speed of convergence).

∙ p = 1: first-order/linear convergence
∙ p = 2: second-order/quadratic convergence
∙ Superlinear convergence: when an algorithm reports lim_{k→∞} | Xk+1 − X* | / | Xk − X* | = 0 while lim_{k→∞} | Xk+1 − X* | / | Xk − X* |^2 = ∞
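The effect of the order p can be seen by iterating the asymptotic relation | δk+1 | = r | δk |^p directly; the sequences below (δ0 = 0.5, with r = 0.5 for the linear case and r = 1 for the quadratic case) are illustrative:

```python
# Illustrative error sequences delta_k = |X_k - X*| with the same delta_0 = 0.5:
# linear convergence (p = 1, r = 0.5) vs quadratic convergence (p = 2, r = 1).
linear, quad = [0.5], [0.5]
for _ in range(6):
    linear.append(0.5 * linear[-1])      # delta_{k+1} = r * delta_k
    quad.append(quad[-1] ** 2)           # delta_{k+1} = r * delta_k^2

# The ratio delta_{k+1} / delta_k^p equals the rate r for the matching order p.
linear_ratios_ok = all(abs(b / a - 0.5) < 1e-12 for a, b in zip(linear, linear[1:]))
quad_ratios_ok = all(abs(b / a ** 2 - 1.0) < 1e-12 for a, b in zip(quad, quad[1:]))
```

After only six steps the quadratic sequence has crushed the error far below the linear one, which is why a higher order means a faster speed of convergence.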
Thank You
