
Chapter 4

Unconstrained Minimization in $\mathbb{R}^n$: Newton Methods

1. Motivation
Again the problem here is
\[
\min_{x \in \mathbb{R}^n} f(x). \tag{1}
\]
The basic idea is to take a quadratic approximation around the current point $x_i$ and minimize this approximation to get $x_{i+1}$. So we have:
\[
\min_{x \in \mathbb{R}^n} \; f(x_i) + \nabla f(x_i)(x - x_i) + \frac{1}{2}(x - x_i)\, \nabla^2 f(x_i)\, (x - x_i).
\]
By setting the gradient of this approximation to zero, we get:
\[
\nabla f(x_i) + \nabla^2 f(x_i)(x_{i+1} - x_i) = 0.
\]
If $\nabla^2 f(x_i)^{-1}$ exists, then
\[
x_{i+1} = x_i - \nabla^2 f(x_i)^{-1} \nabla f(x_i),
\]
which is the classical Newton iteration. An equivalent interpretation is obtained by linearizing $\nabla f(x)$ around $x_i$ and solving the linearized equations for $x_{i+1}$.
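A minimal sketch of the iteration in Python/NumPy follows (the test function, starting point and tolerance are my own illustrative choices, not part of the notes); in practice one solves the linear system $\nabla^2 f(x_i)\, p = -\nabla f(x_i)$ rather than forming the inverse:

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Classical Newton iteration x_{i+1} = x_i - hess(x_i)^{-1} grad(x_i)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        # Solve hess(x) p = -grad(x) instead of inverting the Hessian.
        x = x + np.linalg.solve(hess(x), -g)
    return x

# Illustrative example: f(x) = x1^2 + x1^4/4 + x2^2, minimizer x_bar = (0, 0).
grad = lambda x: np.array([2 * x[0] + x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[2 + 3 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton(grad, hess, [1.0, 1.0]))  # -> approximately [0. 0.]
```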

2. Local Convergence of Newton's Method and its Superlinear and Quadratic Rates of Convergence

1 Theorem Let $g\colon \mathbb{R}^n \to \mathbb{R}^n$ be differentiable on an open neighborhood of a solution $\bar x$ of $g(x) = 0$ and let $\nabla g$ be continuous and nonsingular at $\bar x$. Then:

(a) $\bar x$ is a point of attraction of the Newton iteration
\[
x_{k+1} = x_k - \nabla g(x_k)^{-1} g(x_k)
\]
and
\[
\lim_{k \to \infty} \frac{\| x_{k+1} - \bar x \|}{\| x_k - \bar x \|} = 0 \qquad \text{(superlinear quotient rate)}
\]

(b) Moreover, if
\[
\| \nabla g(x) - \nabla g(\bar x) \| \le \alpha \| x - \bar x \|
\]
for all $x$ in some neighborhood $N(\bar x)$ of $\bar x$, that is $g \in LC^1_\alpha(N(\bar x))$, then there is a positive constant $c$ s.t.:
\[
\| x_{k+1} - \bar x \| \le c \| x_k - \bar x \|^2 \qquad \text{(quadratic quotient rate)}
\]
for all $k \ge k_0$, where $k_0$ depends on $x_0$.

Proof.

(a) We first show that $\nabla g(x)$ is nonsingular for all $x$ in a neighborhood of $\bar x$. Set $\beta = \|\nabla g(\bar x)^{-1}\|$ and let $0 < \varepsilon < (2\beta)^{-1}$. By the continuity of $\nabla g$ at $\bar x$ there is a $\delta > 0$ s.t. $\|\nabla g(x) - \nabla g(\bar x)\| \le \varepsilon$ whenever $x \in S := \{x \mid \|x - \bar x\| \le \delta\}$. Hence by the Banach Perturbation Lemma A.18:
\[
\left\| \left( \nabla g(\bar x) + \left( \nabla g(x) - \nabla g(\bar x) \right) \right)^{-1} \right\| = \| \nabla g(x)^{-1} \| \le \frac{\beta}{1 - \beta \varepsilon} < 2\beta.
\]
Define the Newton mapping:
\[
h(x) = x - \nabla g(x)^{-1} g(x).
\]
By B.6.2:
\[
\nabla h(\bar x) = I - \nabla g(\bar x)^{-1} \nabla g(\bar x) = 0.
\]
Hence by the Ostrowski Point of Attraction Theorem B.6.1, $\bar x$ is a point of attraction. Moreover, since $h$ is differentiable at $\bar x$:
\[
\lim_{k \to \infty} \frac{\| h(x_k) - h(\bar x) - \nabla h(\bar x)(x_k - \bar x) \|}{\| x_k - \bar x \|} = 0
\]
whenever $\lim_{k \to \infty} x_k = \bar x$. Since $h(x_k) = x_{k+1}$, $h(\bar x) = \bar x$ and $\nabla h(\bar x) = 0$, this gives:
\[
\lim_{k \to \infty} \frac{\| x_{k+1} - \bar x \|}{\| x_k - \bar x \|} = 0.
\]

(b) By the Quadratic Bound Lemma B.1.3:
\[
\| g(x) - g(\bar x) - \nabla g(\bar x)(x - \bar x) \| \le \frac{\alpha}{2} \| x - \bar x \|^2,
\]
and
\[
\begin{aligned}
\| h(x) - h(\bar x) \| &= \| x - \nabla g(x)^{-1} g(x) - \bar x \| \\
&\le \left\| \nabla g(x)^{-1} \left( g(x) - g(\bar x) - \nabla g(\bar x)(x - \bar x) \right) \right\|
   + \left\| \nabla g(x)^{-1} \left( \nabla g(x) - \nabla g(\bar x) \right)(x - \bar x) \right\| \\
&\le \beta \alpha \| x - \bar x \|^2 + 2 \beta \alpha \| x - \bar x \|^2,
\end{aligned}
\]
in a suitable neighborhood of $\bar x$. Hence:
\[
\| x_{k+1} - \bar x \| \le 3 \beta \alpha \| x_k - \bar x \|^2,
\]
provided that, for a given $x_0$, $k_0$ is chosen so that $x_k$ lies in that neighborhood for all $k \ge k_0$; by part (a) such a $k_0$ is guaranteed to exist. $\square$
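The quotient rates are easy to observe numerically. The sketch below (the scalar equation $g(x) = x^3 - 1$, with root $\bar x = 1$ and $\nabla g(\bar x) = 3 \neq 0$, is my own choice) prints the ratio $\| x_{k+1} - \bar x \| / \| x_k - \bar x \|^2$, which stays bounded as part (b) predicts:

```python
import numpy as np

# Newton iteration for g(x) = x^3 - 1 = 0, root x_bar = 1, g'(x_bar) = 3 != 0.
g  = lambda x: x**3 - 1.0
dg = lambda x: 3.0 * x**2

x, xbar = 2.0, 1.0
errs = [abs(x - xbar)]
for k in range(7):
    x = x - g(x) / dg(x)            # x_{k+1} = x_k - g(x_k) / g'(x_k)
    errs.append(abs(x - xbar))

for k in range(len(errs) - 1):
    if errs[k] > 1e-8:              # skip ratios dominated by rounding
        # Quadratic quotient rate: the ratio stays bounded (here it tends
        # to |g''(x_bar)| / (2 |g'(x_bar)|) = 1).
        print(k, errs[k + 1] / errs[k]**2)
```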

3. Local convergence of Newton's method, global convergence of damped Newton's method and their quadratic rate of convergence

1 Theorem (Local Newton convergence & its quadratic rate)

Consider the Newton algorithm
\[
x_{i+1} = x_i - \nabla^2 f(x_i)^{-1} \nabla f(x_i)
\]
for solving $\nabla f(x) = 0$, where $f\colon \mathbb{R}^n \to \mathbb{R}$. Assume that:

(i) $\exists \bar x$: $\nabla f(\bar x) = 0$

(ii) $\| \nabla^2 f(y) - \nabla^2 f(x) \| \le R \| y - x \|$, $\forall x, y \in B(\bar x)$, for some $R > 0$

(iii) $\| \nabla^2 f(x)^{-1} \| \le \frac{1}{m}$, $\forall x \in B(\bar x)$, for some $m > 0$, and $B_{\frac{2m}{R}}(\bar x) \subset B(\bar x)$

(iv) $\| x_0 - \bar x \| < \frac{2m}{R}$

Then

(a): (i), (ii) and (iii) $\Longrightarrow$ $\| x_{i+1} - \bar x \| \le \frac{R}{2m} \| x_i - \bar x \|^2$

(b): (i), (ii), (iii) and (iv) $\Longrightarrow$ $\forall i$: $x_i \in B_{\frac{2m}{R}}(\bar x)$ and $x_i \to \bar x$

Note: $B_{\frac{2m}{R}}(\bar x) \subset B(\bar x)$ can be induced by taking $R$ larger, but then $x_0$ must be closer to $\bar x$.

Proof.

(a) Writing $\nabla f_i := \nabla f(x_i)$ and $\nabla^2 f_i := \nabla^2 f(x_i)$:
\[
\begin{aligned}
x_{i+1} - \bar x &= x_i - \bar x - \nabla^2 f_i^{-1} \left( \nabla f_i - \nabla f(\bar x) \right) && \text{(by (i))} \\
&= x_i - \bar x - \nabla^2 f_i^{-1} \Big[ \nabla f\big( \bar x + t(x_i - \bar x) \big) \Big]_{t=0}^{t=1} \\
&= x_i - \bar x - \nabla^2 f_i^{-1} \int_0^1 \Big( \nabla^2 f\big( \bar x + t(x_i - \bar x) \big) - \nabla^2 f_i + \nabla^2 f_i \Big)(x_i - \bar x)\, dt
\end{aligned}
\]
(by continuity of $\nabla^2 f$ from (ii); note that the $\nabla^2 f_i$ term integrates to $\nabla^2 f_i (x_i - \bar x)$ and, after multiplication by $\nabla^2 f_i^{-1}$, cancels the leading $x_i - \bar x$). Hence:
\[
\begin{aligned}
\| x_{i+1} - \bar x \| &\le \| \nabla^2 f_i^{-1} \| \int_0^1 \big\| \nabla^2 f\big( \bar x + t(x_i - \bar x) \big) - \nabla^2 f_i \big\| \cdot \| x_i - \bar x \|\, dt && \text{(by Lemma B.2.2)} \\
&\le \frac{1}{m} \int_0^1 R(1 - t) \| x_i - \bar x \|\, dt\, \| x_i - \bar x \| && \text{(by (ii), (iii))} \\
&= \frac{R}{2m} \| x_i - \bar x \|^2
\end{aligned}
\]
(b) By induction:
\[
\frac{R}{2m} \| x_i - \bar x \| \le \left( \frac{R}{2m} \| x_0 - \bar x \| \right)^{2^i}, \qquad i = 1, 2, \ldots
\]
(Induction proof: $a_i \le a_0^{2^i}$, where $a_i := \frac{R}{2m} \| x_i - \bar x \|$. The inequality holds for $i = 1$ by part (a). Suppose it holds for $i$, that is $a_i \le a_0^{2^i}$. Then:
\[
a_{i+1} \le (a_i)^2 \le \left( a_0^{2^i} \right)^2 = a_0^{2^{i+1}},
\]
where the first inequality follows from part (a). Thus the induction is extended to $i + 1$.)

Hence, using (iv), i.e. $\frac{R}{2m} \| x_0 - \bar x \| < 1$:
\[
\| x_i - \bar x \| \le \frac{2m}{R} \left( \frac{R}{2m} \| x_0 - \bar x \| \right)^{2^i} < \frac{2m}{R},
\]
so every $x_i$ stays in $B_{\frac{2m}{R}}(\bar x)$ and $\lim_{i \to \infty} \| x_i - \bar x \| = 0$. $\square$
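Part (a)'s constants can be exhibited concretely (the function and ball below are my own illustration, not from the notes). For $f(x) = x^2/2 + x^4/12$ we have $\bar x = 0$, $\nabla^2 f(x) = 1 + x^2 \ge 1$ gives $m = 1$, and $| \nabla^2 f(y) - \nabla^2 f(x) | = |y + x|\,|y - x| \le 2|y - x|$ on $B(\bar x) = \{ |x| \le 1 \}$ gives $R = 2$, so the bound reads $| x_{i+1} - \bar x | \le | x_i - \bar x |^2$:

```python
# Check |x_{i+1} - x_bar| <= (R / 2m) |x_i - x_bar|^2 = |x_i|^2 for
# f(x) = x^2/2 + x^4/12 (x_bar = 0, m = 1, R = 2 on {|x| <= 1}).
df  = lambda x: x + x**3 / 3.0        # gradient of f
d2f = lambda x: 1.0 + x**2            # Hessian of f

x = 0.9                               # (iv): |x0 - x_bar| < 2m/R = 1
for i in range(6):
    x_next = x - df(x) / d2f(x)       # Newton step
    print(i, abs(x_next), "<=", abs(x)**2, abs(x_next) <= abs(x)**2)
    x = x_next
```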

2 Theorem (Globally convergent damped Newton & its rate of convergence)

Consider the damped Newton algorithm:

Direction choice: $p_i = -\nabla^2 f(x_i)^{-1} \nabla f(x_i)$

Stepsize: $x_{i+1} = x_i + \lambda_i p_i$, with $\lambda_i$ chosen such that:
\[
f(x_i) - f(x_{i+1}) \ge \sigma_2\big( -\nabla f(x_i)\, p_i \big),
\]
for some forcing function $\sigma_2$. (Hence all stepsizes of SCAT1 can be used.¹)

Assumption

(v) $f \in C^2$ and $\exists M \ge m > 0$ s.t. $\forall x, y \in \mathbb{R}^n$:
\[
m \| y \|^2 \le y\, \nabla^2 f(x)\, y \le M \| y \|^2
\]

Then

(a): (v) $\Longrightarrow$ $\{x_i\} \to \bar x$, where $\bar x$ is the unique solution of $\min_{x \in \mathbb{R}^n} f(x)$

(b): (ii), (v) and "min. or first stationary point along $p_i$" stepsize $\Longrightarrow$
\[
\| x_{i+1} - \bar x \| \le \frac{M^{\frac12} R}{2 m^{\frac32}} \| x_i - \bar x \|^2
\]
¹ O. L. Mangasarian: "Parallel gradient distribution in unconstrained optimization", SIAM Journal on Control and Optimization 33, 1995, 1916-1925, Theorem 2.
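Before the proof, here is one concrete admissible stepsize in code: Armijo backtracking (a sketch; the constants $10^{-4}$ and $1/2$ are my own conventional choices, and the theorem only requires the forcing-function condition on $\lambda_i$):

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-10, max_iter=100):
    """Damped Newton: Newton direction p_i plus a backtracking stepsize
    lambda_i, one admissible choice among the stepsize rules allowed here."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            break
        p = np.linalg.solve(hess(x), -g)   # p_i = -hess(x_i)^{-1} grad(x_i)
        lam, slope = 1.0, g @ p            # slope = grad(x_i) . p_i < 0
        # Backtrack until sufficient decrease: f(x + lam p) <= f(x) + c lam slope.
        while f(x + lam * p) > f(x) + 1e-4 * lam * slope:
            lam *= 0.5
        x = x + lam * p
    return x
```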

Proof.

(a)
\[
-\nabla f_i\, p_i = \nabla f_i\, \nabla^2 f_i^{-1}\, \nabla f_i \ge \frac{1}{M} \| \nabla f_i \|^2 =: \sigma(\| \nabla f_i \|)
\]
Because $f(x_{i+1}) \le f(x_i)$, the bounded set $S = \{ x \mid f(x) \le f(x_0) \} \subset \{ x \mid \| x - x_0 \| \le \frac{2}{m} \| \nabla f(x_0) \| \}$ contains $\{x_i\}$. But:
\[
\| p_i \|^2 = \nabla f_i\, \nabla^2 f_i^{-2}\, \nabla f_i \le \frac{1}{m^2} \| \nabla f_i \|^2 \le \frac{1}{m^2} \max_{x \in S} \| \nabla f(x) \|^2.
\]
So $\{x_i, p_i\}$ is bounded and hence must have an accumulation point. Moreover, since $f$ is bounded below on $S$, $f(x_i) - f(x_{i+1}) \to 0$, so the forcing property of $\sigma_2$ gives $-\nabla f_i\, p_i \to 0$ and hence, by the first display, $\nabla f(x_i) \to 0$. So for each such accumulation point $(\bar x, \bar p)$, $\bar x$ is the unique solution of $\min_{x \in \mathbb{R}^n} f(x)$. So $\{x_i\} \to \bar x$.
(b) We need the following lemma:

3 Lemma Let $f\colon \mathbb{R}^n \to \mathbb{R}$ and let $M \ge m > 0$ s.t. $\forall x, y \in \mathbb{R}^n$:
\[
M \| y \|^2 \ge y\, \nabla^2 f(x)\, y \ge m \| y \|^2
\]
Then
\[
f(y) \le f(x), \; \nabla f(\bar x) = 0 \;\Longrightarrow\; \| y - \bar x \| \le \left( \frac{M}{m} \right)^{\frac12} \| x - \bar x \|
\]
Proof of Lemma
\[
\begin{aligned}
f(y) &= f(\bar x) + \nabla f(\bar x)(y - \bar x) + \frac{1}{2}(y - \bar x)\, \nabla^2 f(s)\, (y - \bar x) \\
f(x) &= f(\bar x) + \nabla f(\bar x)(x - \bar x) + \frac{1}{2}(x - \bar x)\, \nabla^2 f(t)\, (x - \bar x),
\end{aligned}
\]
where $s \in (\bar x, y)$ and $t \in (\bar x, x)$. Since $\nabla f(\bar x) = 0$, subtraction gives:
\[
0 \ge f(y) - f(x) = \frac{1}{2}(y - \bar x)\, \nabla^2 f(s)\, (y - \bar x) - \frac{1}{2}(x - \bar x)\, \nabla^2 f(t)\, (x - \bar x)
\ge \frac{1}{2} m \| y - \bar x \|^2 - \frac{1}{2} M \| x - \bar x \|^2. \qquad \square
\]
Going back to the proof of part (b), let $\bar x_i = x_i - \nabla^2 f_i^{-1} \nabla f_i$, that is, $\bar x_i$ is the "$x_{i+1}$" of the pure Newton algorithm. Because we are minimizing along $p_i$, obviously $f(x_{i+1}) \le f(\bar x_i)$, and hence by the above lemma:
\[
\| x_{i+1} - \bar x \| \le \left( \frac{M}{m} \right)^{\frac12} \| \bar x_i - \bar x \|.
\]
By part (a) of the previous theorem, Theorem 1,
\[
\| \bar x_i - \bar x \| \le \frac{R}{2m} \| x_i - \bar x \|^2.
\]
Hence
\[
\| x_{i+1} - \bar x \| \le \frac{M^{\frac12} R}{2 m^{\frac32}} \| x_i - \bar x \|^2. \qquad \square
\]
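To see Theorem 2 in action, one can run the damped iteration on a function satisfying (v) and watch $\| x_i - \bar x \|$: steady global progress from a distant start, then the quadratic tail once the iterates are near $\bar x$. A self-contained sketch (the test function, with $\nabla^2 f = \mathrm{diag}(1 + \operatorname{sech}^2 x_j) \in [I, 2I]$, hence $m = 1$, $M = 2$, and $\bar x = 0$, is my own choice):

```python
import numpy as np

# f(x) = sum_j (x_j^2 / 2 + log cosh x_j) satisfies (v) with m = 1, M = 2.
f    = lambda x: np.sum(0.5 * x**2 + np.log(np.cosh(x)))
grad = lambda x: x + np.tanh(x)
hess = lambda x: np.diag(1.0 + 1.0 / np.cosh(x)**2)

x = np.full(3, 10.0)                       # start far from x_bar = 0
for i in range(10):
    g = grad(x)
    p = np.linalg.solve(hess(x), -g)       # Newton direction
    lam = 1.0
    while f(x + lam * p) > f(x) + 1e-4 * lam * (g @ p):
        lam *= 0.5                         # Armijo backtracking stepsize
    x = x + lam * p
    print(i, np.linalg.norm(x))            # global progress, then quadratic tail
```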