
Chapter 10 QUASI-NEWTON METHODS

In this chapter we take another approach toward the development of methods lying
somewhere intermediate to steepest descent and Newton’s method. Again working
under the assumption that evaluation and use of the Hessian matrix is impractical
or costly, the idea underlying quasi-Newton methods is to use an approximation to
the inverse Hessian in place of the true inverse that is required in Newton’s method.
The form of the approximation varies among different methods—ranging from
the simplest where it remains fixed throughout the iterative process, to the more
advanced where improved approximations are built up on the basis of information
gathered during the descent process.
The quasi-Newton methods that build up an approximation to the inverse
Hessian are analytically the most sophisticated methods discussed in this book for
solving unconstrained problems and represent the culmination of the development
of algorithms through detailed analysis of the quadratic problem. As might be
expected, the convergence properties of these methods are somewhat more difficult
to discover than those of simpler methods. Nevertheless, we are able, by continuing
with the same basic techniques as before, to illuminate their most important features.
In the course of our analysis we develop two important generalizations of
the method of steepest descent and its corresponding convergence rate theorem.
The first, discussed in Section 10.1, modifies steepest descent by taking as the
direction vector a positive definite transformation of the negative gradient. The
second, discussed in Section 10.8, is a combination of steepest descent and Newton’s
method. Both of these fundamental methods have convergence properties analogous
to those of steepest descent.

10.1 MODIFIED NEWTON METHOD


A very basic iterative process for solving the problem

$$\text{minimize } f(x)$$

which includes as special cases most of our earlier ones is



$$x_{k+1} = x_k - \alpha_k S_k \nabla f(x_k)^T \qquad (1)$$

where $S_k$ is a symmetric $n \times n$ matrix and where, as usual, $\alpha_k$ is chosen to
minimize $f(x_{k+1})$. If $S_k$ is the inverse of the Hessian of $f$, we obtain Newton's
method, while if $S_k = I$ we have steepest descent. It would seem to be a good idea,
in general, to select $S_k$ as an approximation to the inverse of the Hessian. We examine
that philosophy in this section.
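
To make the role of $S_k$ concrete, the following sketch (Python with NumPy and SciPy; the function names and the bracket used for the one-dimensional search are illustrative choices, not part of the text) carries out a single step of (1), with $\alpha_k$ obtained by numerically minimizing $f$ along the deflected direction. Taking S equal to the identity recovers steepest descent, while taking it equal to the inverse Hessian recovers Newton's method.

import numpy as np
from scipy.optimize import minimize_scalar

def modified_newton_step(f, grad_f, x, S):
    # One step of (1): x_{k+1} = x_k - alpha_k S_k grad f(x_k),
    # with alpha_k chosen to (approximately) minimize f(x_{k+1}).
    g = grad_f(x)
    d = -S @ g                                   # deflected negative gradient
    # One-dimensional search over an illustrative bracket [0, 10].
    res = minimize_scalar(lambda a: f(x + a * d), bounds=(0.0, 10.0), method="bounded")
    return x + res.x * d
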
First, we note, as in Section 8.8, that in order that the process (1) be guaranteed
to be a descent method for small values of $\alpha$, it is necessary in general to require
that $S_k$ be positive definite. We shall therefore always impose this as a requirement.
Because of the similarity of the algorithm (1) with steepest descent† it should
not be surprising that its convergence properties are similar in character to our
earlier results. We derive the actual rate of convergence by considering, as usual,
the standard quadratic problem with

$$f(x) = \tfrac{1}{2} x^T Q x - b^T x \qquad (2)$$

where Q is symmetric and positive definite. For this case we can find an explicit
expression for $\alpha_k$ in (1). The algorithm becomes

$$x_{k+1} = x_k - \alpha_k S_k g_k \qquad (3a)$$

where

$$g_k = Q x_k - b \qquad (3b)$$

$$\alpha_k = \frac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}. \qquad (3c)$$
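
Since $\alpha_k$ has the closed form (3c) in the quadratic case, the iteration (3) is easy to carry out directly. A minimal sketch in Python with NumPy (the function name and stopping tolerance are illustrative choices):

import numpy as np

def modified_newton_quadratic(Q, b, S, x0, iters=100, tol=1e-10):
    # Iteration (3) for f(x) = (1/2) x^T Q x - b^T x with a fixed S.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = Q @ x - b                                # (3b) gradient
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ S @ g) / (g @ S @ Q @ S @ g)    # (3c) exact line-search step
        x = x - alpha * (S @ g)                      # (3a) update
    return x
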

We may then derive the convergence rate of this algorithm by slightly extending
the analysis carried out for the method of steepest descent.

Modified Newton Method Theorem (Quadratic case). Let $x^*$ be the unique
minimum point of $f$, and define $E(x) = \tfrac{1}{2}(x - x^*)^T Q (x - x^*)$.
Then for the algorithm (3) there holds at every step $k$

$$E(x_{k+1}) \le \left( \frac{B_k - b_k}{B_k + b_k} \right)^2 E(x_k) \qquad (4)$$

where $b_k$ and $B_k$ are, respectively, the smallest and largest eigenvalues of the
matrix $S_k Q$.


† The algorithm (1) is sometimes referred to as the method of deflected gradients, since the
direction vector can be thought of as being determined by deflecting the gradient through
multiplication by $S_k$.

Proof. We have by direct substitution


$$\frac{E(x_k) - E(x_{k+1})}{E(x_k)} = \frac{(g_k^T S_k g_k)^2}{(g_k^T S_k Q S_k g_k)(g_k^T Q^{-1} g_k)}.$$

Letting $T_k = S_k^{1/2} Q S_k^{1/2}$ and $p_k = S_k^{1/2} g_k$ we obtain

$$\frac{E(x_k) - E(x_{k+1})}{E(x_k)} = \frac{(p_k^T p_k)^2}{(p_k^T T_k p_k)(p_k^T T_k^{-1} p_k)}.$$
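
For reference, recalling the inequality from the analysis of steepest descent: the Kantorovich inequality states that for a symmetric positive definite matrix $T$ with smallest and largest eigenvalues $b$ and $B$, and any nonzero vector $p$,

$$\frac{(p^T p)^2}{(p^T T p)(p^T T^{-1} p)} \ge \frac{4 b B}{(b + B)^2},$$

and $1 - 4bB/(b+B)^2 = \left( (B - b)/(B + b) \right)^2$.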

From the Kantorovich inequality we obtain easily


$$E(x_{k+1}) \le \left( \frac{B_k - b_k}{B_k + b_k} \right)^2 E(x_k),$$

where $b_k$ and $B_k$ are the smallest and largest eigenvalues of $T_k$. Since
$S_k^{1/2} T_k S_k^{-1/2} = S_k Q$, we see that $S_k Q$ is similar to $T_k$ and therefore has
the same eigenvalues.

This theorem supports the intuitive notion that for the quadratic problem one
should strive to make $S_k$ close to $Q^{-1}$, since then both $b_k$ and $B_k$ would be close
to unity and convergence would be rapid. For a nonquadratic objective function $f$
the analog to $Q$ is the Hessian $F(x)$, and hence one should try to make $S_k$ close to
$F(x_k)^{-1}$.
Two remarks may help to put the above result in proper perspective. The
first remark is that both the algorithm (1) and the theorem stated above are only
simple, minor, and natural extensions of the work presented in Chapter 8 on steepest
descent. As such the result of this section can be regarded, correspondingly, not as
a new idea but as an extension of the basic result on steepest descent. The second
remark is that this one simple result when properly applied can quickly characterize
the convergence properties of some fairly complex algorithms. Thus, rather than
an isolated result concerned with a specific form of algorithm, the theorem above
should be regarded as a general tool for convergence analysis. It provides significant
insight into various quasi-Newton methods discussed in this chapter.
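
The bound (4) is also easy to check numerically on a small example. The following sketch (Python with NumPy; the particular $Q$, $S$, and starting point are arbitrary illustrative choices, not taken from the text) performs one step of (3), computes $b_k$ and $B_k$ as the extreme eigenvalues of $S_k Q$, and verifies the inequality:

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                 # symmetric positive definite Q
b = rng.standard_normal(n)
S = np.linalg.inv(Q) + 0.1 * np.eye(n)      # a rough approximation to Q^{-1}

x_star = np.linalg.solve(Q, b)              # unique minimum point of f
E = lambda x: 0.5 * (x - x_star) @ Q @ (x - x_star)

x = rng.standard_normal(n)
g = Q @ x - b
alpha = (g @ S @ g) / (g @ S @ Q @ S @ g)   # (3c)
x_next = x - alpha * (S @ g)                # (3a)

eigs = np.linalg.eigvals(S @ Q).real        # S Q is similar to T_k, so its eigenvalues are real
bk, Bk = eigs.min(), eigs.max()
bound = ((Bk - bk) / (Bk + bk)) ** 2
assert E(x_next) <= bound * E(x) + 1e-12    # inequality (4)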

A Classical Method
We conclude this section by mentioning the classical modified Newton's method, a
standard method for approximating Newton's method without evaluating $F(x_k)^{-1}$
for each $k$. We set

$$x_{k+1} = x_k - \alpha_k [F(x_0)]^{-1} \nabla f(x_k)^T. \qquad (5)$$

In this method the Hessian at the initial point $x_0$ is used throughout the process.
The effectiveness of this procedure is governed largely by how fast the Hessian is
changing—in other words, by the magnitude of the third derivatives of $f$.
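
A minimal sketch of (5) in Python with NumPy (the callables f, grad_f, and hess_f and the backtracking parameters are illustrative assumptions; a backtracking search stands in for the exact choice of $\alpha_k$, and $F(x_0)$ is assumed positive definite so that the direction is a descent direction):

import numpy as np

def classical_modified_newton(f, grad_f, hess_f, x0, iters=20):
    # Iteration (5): x_{k+1} = x_k - alpha_k F(x0)^{-1} grad f(x_k).
    F0_inv = np.linalg.inv(hess_f(x0))       # inverse Hessian at x0, computed once
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad_f(x)
        if np.linalg.norm(g) < 1e-10:
            break
        d = -F0_inv @ g                      # search direction
        alpha = 1.0                          # backtracking stand-in for the exact search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d):
            alpha *= 0.5
        x = x + alpha * d
    return x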
