E1 251 Linear and Nonlinear Optimization
8.1. Steepest Descent Method
Step along an appropriately chosen direction of fastest decrease:
$$x_{k+1} = x_k + \alpha_k d_k$$
where $d_k$ is the locally best (steepest) descent direction.
To get the direction, look at Taylor's formula (for some $\theta \in (0,1)$):
$$f(x_{k+1}) = f(x_k) + (x_{k+1} - x_k)^T \nabla f(x_k) + \tfrac{1}{2}(x_{k+1} - x_k)^T F\big(x_k + \theta(x_{k+1} - x_k)\big)(x_{k+1} - x_k)$$
$$f(x_{k+1}) = f(x_k) + \alpha_k d_k^T \nabla f(x_k) + \tfrac{1}{2}\alpha_k^2\, d_k^T F\big(x_k + \theta(x_{k+1} - x_k)\big)\, d_k$$
For $\alpha_k$ arbitrarily small:
$$f(x_{k+1}) \approx f(x_k) + \alpha_k d_k^T \nabla f(x_k)$$
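A minimal sketch of the resulting iteration in Python (the helper name `grad_f` and the fixed step size are illustrative, not from the slides); the choice $d_k = -\nabla f(x_k)$ minimizes the first-order term above:

```python
import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k + alpha_k d_k with d_k = -grad_f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad_f(x)                 # locally best (steepest) descent direction
        if np.linalg.norm(d) < tol:    # gradient ~ 0: (near-)stationary point
            break
        x = x + alpha * d
    return x

# Example: f(x) = x1^2 + 4 x2^2, minimizer at the origin
grad_f = lambda x: np.array([2 * x[0], 8 * x[1]])
print(steepest_descent(grad_f, [2.0, 1.0]))      # ~ [0, 0]
```

With an exact or Armijo line search in place of the fixed `alpha`, this becomes the steepest descent method analyzed below.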
Search path in steepest descent method (figure)
8.3 Convergence rate of SD for quadratic function
$$E(x_k) = \tfrac{1}{2}(x_k - x^*)^T Q (x_k - x^*) = \tfrac{1}{2} y_k^T Q y_k, \quad \text{where } y_k = x_k - x^*.$$
$$E(x_{k+1}) = \tfrac{1}{2}(x_{k+1} - x^*)^T Q (x_{k+1} - x^*)$$
Case I: Exact line search:
$$E(x_{k+1}) = \left[1 - \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}\right] E(x_k)$$
Case II: Inexact line search with the following condition (Armijo's rule), for some $c_1 \in (0,1)$ and $\eta > 1$:
$$f(x_k + \alpha_k d_k) \le f(x_k) + c_1 \alpha_k \nabla^T f(x_k)\, d_k$$
$$f(x_k + \eta\alpha_k d_k) > f(x_k) + c_1 \eta\alpha_k \nabla^T f(x_k)\, d_k$$
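A backtracking line search is the usual way to find such a step: start from a trial step and divide by $\eta$ until the sufficient-decrease test passes, so the previously rejected step $\eta\alpha_k$ violates it. A sketch (the defaults `c1 = 1e-4`, `eta = 2` are common choices, not from the slides):

```python
import numpy as np

def armijo_step(f, gx, x, d, alpha0=1.0, c1=1e-4, eta=2.0, max_backtracks=50):
    """Backtrack from alpha0, dividing by eta, until Armijo's
    sufficient-decrease condition holds:
        f(x + alpha d) <= f(x) + c1 * alpha * g(x)^T d."""
    fx = f(x)
    slope = gx @ d                     # directional derivative, < 0 for descent d
    alpha = alpha0
    for _ in range(max_backtracks):
        if f(x + alpha * d) <= fx + c1 * alpha * slope:
            return alpha               # accepted step
        alpha /= eta
    return alpha
```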
$$f(x_{k+1}) - f^* \le \left(\frac{A - a}{A + a}\right)^2 \big(f(x_k) - f^*\big)$$
where $a$ and $A$ are the smallest and largest eigenvalues of $F(x^*)$ and $k$ is sufficiently large.
Exercise (Luenberger):
Suppose we use the method of steepest descent to minimize the quadratic function $f(x) = (x - x^*)^T Q (x - x^*)$, but we allow a tolerance $\pm\delta\bar{\alpha}_k$, $\delta > 0$, in the line search. In other words, we use the iteration
$$x_{k+1} = x_k - \alpha_k g_k, \quad \text{where} \quad (1-\delta)\bar{\alpha}_k \le \alpha_k \le (1+\delta)\bar{\alpha}_k,$$
with $\bar{\alpha}_k$ being the exact line-search step
$$\bar{\alpha}_k = \frac{g_k^T g_k}{g_k^T Q g_k}.$$
(a) Find the convergence rate.
(b) What is the largest $\delta$ that guarantees convergence?
From the Case I derivation above:
$$E(x_{k+1}) = \left[1 - \frac{2\alpha_k\, g_k^T Q y_k - \alpha_k^2\, g_k^T Q g_k}{y_k^T Q y_k}\right] E(x_k)$$
Substitute $\alpha_k = (1 \pm \delta)\dfrac{g_k^T g_k}{g_k^T Q g_k}$:
$$= \left[1 - \frac{2\left(\dfrac{g_k^T g_k}{g_k^T Q g_k}\right)(1 \pm \delta)\, g_k^T Q y_k - \left(\dfrac{g_k^T g_k}{g_k^T Q g_k}\right)^{\!2} (1 \pm \delta)^2\, g_k^T Q g_k}{y_k^T Q y_k}\right] E(x_k)$$
Using the relation $Q y_k = g_k$ we get
$$= \left[1 - \frac{2\dfrac{(g_k^T g_k)^2}{g_k^T Q g_k}(1 \pm \delta) - \dfrac{(g_k^T g_k)^2}{g_k^T Q g_k}(1 \pm \delta)^2}{g_k^T Q^{-1} g_k}\right] E(x_k)$$
$$= \left[1 - \big(2 \pm 2\delta - (1 \pm 2\delta + \delta^2)\big)\, \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}\right] E(x_k)$$
$$= \left[1 - (1 - \delta^2)\, \frac{(g_k^T g_k)^2}{(g_k^T Q g_k)(g_k^T Q^{-1} g_k)}\right] E(x_k) = \big[1 - (1 - \delta^2)\,\gamma_k\big]\, E(x_k)$$
$$E(x_{k+1}) = \big[1 - (1 - \delta^2)\gamma_k\big] E(x_k) \le \left[1 - \frac{4(1 - \delta^2)\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}\right] E(x_k)$$
$$= \left[\frac{(\lambda_1 - \lambda_n)^2 + 4\delta^2\lambda_1\lambda_n}{(\lambda_1 + \lambda_n)^2}\right] E(x_k) = \left[\frac{(r - 1)^2 + 4\delta^2 r}{(r + 1)^2}\right] E(x_k), \quad r = \lambda_n / \lambda_1 \text{ (condition number)}.$$
The inequality uses the Kantorovich bound $\gamma_k \ge 4\lambda_1\lambda_n/(\lambda_1 + \lambda_n)^2$. The contraction factor is below 1 iff $4\delta^2 r < (r+1)^2 - (r-1)^2 = 4r$, i.e. iff $\delta < 1$: convergence is guaranteed for every $\delta < 1$, and the guarantee is lost at $\delta = 1$.
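A quick numerical check of this per-step bound (a sketch: the random SPD matrix, the seed, and the worst-case endpoint choice $\alpha_k = (1 \pm \delta)\bar{\alpha}_k$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 5, 0.5
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)                       # random SPD matrix
lam = np.linalg.eigvalsh(Q)
r = lam[-1] / lam[0]                          # condition number
bound = ((r - 1)**2 + 4 * delta**2 * r) / (r + 1)**2

x_star = rng.standard_normal(n)
x = x_star + rng.standard_normal(n)
E = lambda z: 0.5 * (z - x_star) @ Q @ (z - x_star)

for k in range(20):
    g = Q @ (x - x_star)                      # gradient of the quadratic
    step = (g @ g) / (g @ Q @ g)              # exact line-search step
    alpha = (1 + delta * rng.choice([-1.0, 1.0])) * step   # tolerated step
    x_new = x - alpha * g
    assert E(x_new) <= bound * E(x) + 1e-12   # per-step bound holds
    x = x_new
print("per-step contraction bound:", bound)
```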
Exercise (Luenberger):
Consider the quadratic problem $f(x) = (1/2)\,x^T Q x - b^T x$, where $Q$ is an $n \times n$ matrix. Let $\{v_0, v_1, \ldots, v_{p-1}\}$ be a subset of the eigenvectors of $Q$ such that their corresponding eigenvalues $\{\lambda_0, \lambda_1, \ldots, \lambda_{p-1}\}$ are in nondecreasing order. Suppose that the initial guess is chosen in such a way that the corresponding gradient is a linear combination of the set $\{v_0, v_1, \ldots, v_{p-1}\}$. Show that any subsequent gradient is also a linear combination of $\{v_0, v_1, \ldots, v_{p-1}\}$.
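A numerical illustration of the claim (a sketch with a hypothetical 4-dimensional example): if $g_0$ lies in the span of $p$ eigenvectors of $Q$, every subsequent steepest descent gradient stays in that span.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 2
lam = np.array([1.0, 2.0, 5.0, 10.0])             # eigenvalues of Q
V = np.linalg.qr(rng.standard_normal((n, n)))[0]  # orthonormal eigenvectors
Q = V @ np.diag(lam) @ V.T
b = np.zeros(n)                                   # so that g(x) = Q x

x = V[:, 0] + V[:, 1]                             # g0 = Q x0 in span{v0, v1}
for k in range(10):
    g = Q @ x - b
    # component of g_k outside span{v0, ..., v_{p-1}} stays at roundoff level
    assert np.allclose(V[:, p:].T @ g, 0, atol=1e-8)
    alpha = (g @ g) / (g @ Q @ g)                 # exact line search
    x = x - alpha * g
print("all gradients stayed in the p-dimensional eigenspace")
```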
Exercise (Luenberger):
Suppose an iterative algorithm of the form $x_{k+1} = x_k + \alpha_k d_k$ is applied to the quadratic problem with matrix $Q$, where $\alpha_k$, as usual, is chosen as the minimum point of the line search, and where $d_k$ is a vector satisfying $d_k^T g_k < 0$ and
$$\frac{(d_k^T g_k)^2}{(d_k^T Q d_k)(g_k^T Q^{-1} g_k)} \ge \beta$$
for some $\beta > 0$ and all $k$. Show that $E(x_{k+1}) \le (1 - \beta)\,E(x_k)$.
Recap from Chapter 7:
Determine $\alpha_k$:
$$f(x_k + \alpha d_k) = \tfrac{1}{2}(x_k + \alpha d_k)^T Q (x_k + \alpha d_k) - (x_k + \alpha d_k)^T b$$
$$= \tfrac{1}{2} x_k^T Q x_k + \tfrac{\alpha^2}{2}\, d_k^T Q d_k + \alpha\, d_k^T Q x_k - x_k^T b - \alpha\, d_k^T b$$
$\alpha_k$ is determined by
$$\frac{d}{d\alpha} f(x_k + \alpha d_k) = \alpha\, d_k^T Q d_k + d_k^T Q x_k - d_k^T b = 0,$$
i.e.
$$\alpha\, d_k^T Q d_k + d_k^T \underbrace{(Q x_k - b)}_{g_k} = 0 \;\;\Longrightarrow\;\; \alpha_k = -\frac{d_k^T g_k}{d_k^T Q d_k}.$$
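As a sanity check, a sketch of this formula in code (the small quadratic is illustrative), verifying that $\alpha_k$ indeed minimizes $f$ along $d_k$:

```python
import numpy as np

def exact_step(Q, b, x, d):
    """alpha_k = -d^T g_k / (d^T Q d), with g_k = Q x - b."""
    g = Q @ x - b
    return -(d @ g) / (d @ Q @ d)

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
x = np.array([1.0, 1.0])
d = -(Q @ x - b)                        # e.g. the steepest descent direction
a = exact_step(Q, b, x, d)
# f along the ray is a convex parabola, so f is smallest exactly at alpha_k
assert f(x + a * d) <= min(f(x + (a + 1e-3) * d), f(x + (a - 1e-3) * d))
```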
$$E(x_k) = \tfrac{1}{2}(x_k - x^*)^T Q (x_k - x^*) = \tfrac{1}{2} y_k^T Q y_k, \quad \text{where } y_k = x_k - x^*.$$
$$E(x_{k+1}) = \tfrac{1}{2}(x_{k+1} - x^*)^T Q (x_{k+1} - x^*)$$
Substitute $x_{k+1} = x_k + \alpha_k d_k$:
$$E(x_{k+1}) = \tfrac{1}{2}(x_k + \alpha_k d_k - x^*)^T Q (x_k + \alpha_k d_k - x^*) = \tfrac{1}{2}(y_k + \alpha_k d_k)^T Q (y_k + \alpha_k d_k)$$
$$= \tfrac{1}{2} y_k^T Q y_k + \alpha_k\, d_k^T Q y_k + \tfrac{1}{2}\alpha_k^2\, d_k^T Q d_k = \left[1 - \frac{-2\alpha_k\, d_k^T Q y_k - \alpha_k^2\, d_k^T Q d_k}{y_k^T Q y_k}\right] \left(\tfrac{1}{2} y_k^T Q y_k\right)$$
Substitute $\alpha_k = -\dfrac{d_k^T g_k}{d_k^T Q d_k}$:
$$= \left[1 - \frac{2\left(\dfrac{d_k^T g_k}{d_k^T Q d_k}\right) d_k^T Q y_k - \left(\dfrac{d_k^T g_k}{d_k^T Q d_k}\right)^{\!2} d_k^T Q d_k}{y_k^T Q y_k}\right] E(x_k)$$
Using the relation $Q y_k = g_k$ we get
$$= \left[1 - \frac{2\dfrac{(d_k^T g_k)^2}{d_k^T Q d_k} - \dfrac{(d_k^T g_k)^2}{d_k^T Q d_k}}{g_k^T Q^{-1} g_k}\right] E(x_k)$$
$$E(x_{k+1}) = \left[1 - \frac{(d_k^T g_k)^2}{(d_k^T Q d_k)(g_k^T Q^{-1} g_k)}\right] E(x_k)$$
Since $\dfrac{(d_k^T g_k)^2}{(d_k^T Q d_k)(g_k^T Q^{-1} g_k)} \ge \beta$,
$$E(x_{k+1}) \le (1 - \beta)\, E(x_k).$$
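A numerical check of the identity behind this conclusion (a sketch; the randomly perturbed descent directions are illustrative): under exact line search, $E(x_{k+1}) = (1 - \beta_k)E(x_k)$ with $\beta_k$ the ratio above, so any uniform lower bound $\beta$ gives the stated decrease.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + np.eye(n)                        # SPD
b = rng.standard_normal(n)
x_star = np.linalg.solve(Q, b)
Qinv = np.linalg.inv(Q)
E = lambda z: 0.5 * (z - x_star) @ Q @ (z - x_star)

x = x_star + rng.standard_normal(n)
for k in range(10):
    g = Q @ x - b
    d = -g + 0.3 * np.linalg.norm(g) * rng.standard_normal(n)  # noisy direction
    if d @ g >= 0:
        d = -g                                 # ensure d_k^T g_k < 0
    beta_k = (d @ g)**2 / ((d @ Q @ d) * (g @ Qinv @ g))
    alpha = -(d @ g) / (d @ Q @ d)             # exact line search along d_k
    x_new = x + alpha * d
    assert np.isclose(E(x_new), (1 - beta_k) * E(x))  # the identity above
    x = x_new
```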
8.5 Newton iteration:
Newton's method is based on the local quadratic approximation of the function:
$$f(x) \approx f_k(x) = f(x_k) + \nabla f^T(x_k)(x - x_k) + \tfrac{1}{2}(x - x_k)^T F(x_k)(x - x_k).$$
With respect to the standard quadratic form $f_k(x) = 0.5\, x^T Q_k x - b_k^T x + c$, we have
$$Q_k = F(x_k)$$
$$b_k = -\nabla f(x_k) + F(x_k)\, x_k$$
$$c = f(x_k) - \nabla f^T(x_k)\, x_k + 0.5\, x_k^T F(x_k)\, x_k$$
The next iterate $x_{k+1}$ is the minimizer of $f_k(x)$:
$$x_{k+1} = x_k - [F(x_k)]^{-1} \nabla f(x_k).$$
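A sketch of the iteration (the callables `grad_f`, `hess_f`, and the test function are illustrative; a linear solve replaces forming $[F(x_k)]^{-1}$ explicitly):

```python
import numpy as np

def newton(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    """x_{k+1} = x_k - F(x_k)^{-1} grad_f(x_k), via a linear solve."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess_f(x), g)  # Newton step
    return x

# Example: f(x) = x1^4 + x2^2
grad_f = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess_f = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton(grad_f, hess_f, [1.0, 1.0]))      # ~ [0, 0]
```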
Newton's iteration on a quadratic function:
$$x_{k+1} = x_k - [F(x_k)]^{-1} \nabla f(x_k)$$
$$x_{k+1} = x_k - Q^{-1}(Q x_k - b)$$
$$x_{k+1} = x_k - x_k + \underbrace{Q^{-1} b}_{x^*}$$
$$x_{k+1} = x^*$$
So on a quadratic, Newton's method reaches the minimizer in a single step.
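Verifying the one-step property numerically (a sketch with an arbitrary SPD $Q$ and starting point):

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # SPD Hessian of the quadratic
b = np.array([1.0, 2.0])
x0 = np.array([10.0, -7.0])              # arbitrary starting point
x1 = x0 - np.linalg.solve(Q, Q @ x0 - b) # one Newton step
assert np.allclose(x1, np.linalg.solve(Q, b))  # x1 equals x* = Q^{-1} b
```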
Interpretation of damped Newton iteration:
1. For a given function $f(x)$, define $h(y) = f(Ty)$. Then the gradients and Hessians are related as follows:
$$\nabla h(y) = T^T \nabla f(Ty), \qquad H(y) = T^T F(Ty)\, T$$
$a$ and $A$ are now the smallest and largest eigenvalues of $F^{-1/2}(x'')\, F(x')\, F^{-1/2}(x'')$, where $x'$ and $x''$ are any two points within the neighborhood of interest.
Marquardt-Levenberg Modification:
When $F(x_k)$ is not positive definite:
$$x_{k+1} = x_k - \alpha_k \big[F(x_k) + \mu_k I\big]^{-1} \nabla f(x_k)$$
$\mu_k$ is chosen in such a way that $F(x_k) + \mu_k I$ is positive definite. To choose a proper $\mu_k$, one needs to know the extreme eigenvalues of $F(x_k)$.
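In practice one can avoid computing the eigenvalues of $F(x_k)$: a common heuristic (a sketch, not from the slides; `mu0` and the growth factor are illustrative) increases $\mu_k$ until a Cholesky factorization of $F(x_k) + \mu_k I$ succeeds, which certifies positive definiteness.

```python
import numpy as np

def lm_direction(H, g, mu0=1e-4, grow=10.0):
    """Solve (H + mu I) d = -g, raising mu until the shifted
    Hessian is positive definite (Cholesky succeeds)."""
    mu = mu0
    while True:
        try:
            L = np.linalg.cholesky(H + mu * np.eye(H.shape[0]))
            break                        # positive definite: accept this mu
        except np.linalg.LinAlgError:
            mu *= grow                   # shift more strongly and retry
    d = np.linalg.solve(L.T, np.linalg.solve(L, -g))  # solve via the factor
    return d, mu

# Example with an indefinite Hessian
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
d, mu = lm_direction(H, g)
print("mu =", mu, "d =", d)
```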