
45010: Optimization I 2019W

Lecture 15: BFGS and SR1


Lecturer: Yimin Zhong Scribes: None

Note: In all notes, bold face letters denote vectors.

15.1 Quasi-Newton methods

It should be noted that quasi-Newton methods are different from modified Hessian methods, since quasi-Newton methods do not work with the Hessian matrix at all.
Newton's method requires us to solve for the Newton step $p = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$. In the general case this is not a cheap operation, since it means solving a linear system.
On the other hand, Newton's method has quadratic local convergence, which is much better than linear convergence. Quasi-Newton methods are a compromise between convergence rate and per-iteration cost.

15.1.1 The basic rule for quasi-Newton

Suppose we have the model function at $x_k$ (you can imagine Newton's method or the trust-region method):
$$m_k(p) = f(x_k) + p^T \nabla f(x_k) + \frac{1}{2} p^T B_k p \tag{15.1}$$
This $B_k$ is an approximation to the Hessian of $f$, but how do we get $B_{k+1}$ from $B_k$ without computing the Hessian exactly?
After one step, the next model is
$$m_{k+1}(p) = f(x_{k+1}) + p^T \nabla f_{k+1} + \frac{1}{2} p^T B_{k+1} p \tag{15.2}$$
where $x_{k+1} = x_k + p_k$. The quasi-Newton approach requires the approximation $B_{k+1}$ to satisfy a condition (the secant condition): the gradient of the model function $m_{k+1}$ should match the gradient of $f$ at both $x_k$ and $x_{k+1}$. The match at $x_{k+1}$ holds automatically, since $\nabla m_{k+1}(0) = \nabla f_{k+1}$; the match at $x_k$ means
$$\nabla m_{k+1}(-p_k) = \nabla f_{k+1} - B_{k+1} p_k = \nabla f_k \tag{15.3}$$
so $B_{k+1} p_k = \nabla f_{k+1} - \nabla f_k$. We denote
$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f_{k+1} - \nabla f_k \tag{15.4}$$
and obtain the secant equation
$$B_{k+1} s_k = y_k \tag{15.5}$$
This (15.5) is the condition that our $B_{k+1}$ should satisfy! Intuitively, we can "informally" write it (imagine the one-dimensional case) as
$$B_{k+1} = \frac{\nabla f_{k+1} - \nabla f_k}{x_{k+1} - x_k} \tag{15.6}$$
The right-hand side is a difference quotient, so it is "like" the Hessian. But the single equation (15.5) cannot determine $B_{k+1}$ uniquely: it imposes only $n$ scalar equations, while a symmetric $n \times n$ matrix has $n(n+1)/2$ free entries. For many problems, we also hope that $B_{k+1}$ is positive definite, to make sure the resulting direction is a descent direction.
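To make the secant condition concrete, here is a minimal sketch in NumPy (the quadratic $f$, the matrix $A$, and the iterates are our own illustrative choices, not from the notes). For a quadratic $f(x) = \frac{1}{2} x^T A x + b^T x$ we have $\nabla f(x) = Ax + b$, so $y_k = A s_k$ for any step, and the exact Hessian $A$ satisfies the secant equation (15.5):

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T A x + b^T x (A, b chosen arbitrarily).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])        # symmetric positive definite "Hessian"
b = np.array([1.0, -1.0])
grad = lambda x: A @ x + b        # exact gradient of f

x_k  = np.array([0.5, 0.5])       # current iterate
x_k1 = np.array([-0.25, 1.0])     # next iterate (any step works)

s_k = x_k1 - x_k                  # s_k = x_{k+1} - x_k
y_k = grad(x_k1) - grad(x_k)      # y_k = grad f_{k+1} - grad f_k

# The secant equation B_{k+1} s_k = y_k holds exactly with B_{k+1} = A:
print(np.allclose(A @ s_k, y_k))  # True
```

For a general nonlinear $f$, we only have $y_k \approx \nabla^2 f(x_{k+1}) s_k$, which is why (15.5) is a sensible requirement to impose on $B_{k+1}$.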


15.1.2 SR1

In this part, we use the symmetric rank-1 (SR1) update
$$B_{k+1} = B_k + \sigma v v^T \tag{15.7}$$
where $\sigma = 1$ or $-1$, and $\sigma, v$ are chosen to satisfy the secant equation $y_k = B_{k+1} s_k$. The method is called rank-1 because $v v^T$ is a rank-1 matrix. Substituting the update into the secant equation, we compute
$$y_k = B_k s_k + [\sigma v^T s_k] v \tag{15.8}$$
so $v$ is along the direction $y_k - B_k s_k$; say $v = a(y_k - B_k s_k)$. Then
$$\sigma a^2 \left( s_k^T (y_k - B_k s_k) \right) = 1 \tag{15.9}$$
so
$$\sigma = \operatorname{sign}\!\left( s_k^T (y_k - B_k s_k) \right), \qquad a = \pm \left| s_k^T (y_k - B_k s_k) \right|^{-1/2} \tag{15.10}$$
which gives
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k} \tag{15.11}$$
The Sherman-Morrison formula (see A.27 in the book) lets us invert this matrix cheaply, giving the corresponding update for $H_k = B_k^{-1}$:
$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k} \tag{15.12}$$
We can also derive the above formula by positing a rank-1 update for $H_{k+1}$ directly, as we did for $B_{k+1}$. However, there are two issues with this method:

1. the denominator $(s_k - H_k y_k)^T y_k$ may be too close to 0;

2. if $(s_k - H_k y_k)^T y_k$ is negative, then $H_{k+1}$ may no longer be positive definite.

For the first issue, we can set a rule to skip the iteration: if
$$\left| (s_k - H_k y_k)^T y_k \right| < r \, \|y_k\| \, \|s_k - H_k y_k\| \tag{15.13}$$
for some small $r$ (say $r = 10^{-8}$), we skip this iteration by setting $H_{k+1} = H_k$; otherwise the denominator is not too small and we can still use the update formula. A sketch of this update with the skip rule follows below.
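Here is a minimal NumPy sketch of the SR1 inverse update (15.12) with the skip rule (15.13); the function name is our own, while the formulas and the threshold $r = 10^{-8}$ come from the notes:

```python
import numpy as np

def sr1_inverse_update(H, s, y, r=1e-8):
    """SR1 update of the inverse Hessian approximation H, eq. (15.12).

    Skips the update (returns H unchanged) when the denominator
    (s - H y)^T y is too small in the sense of eq. (15.13).
    """
    v = s - H @ y                       # s_k - H_k y_k
    denom = v @ y                       # (s_k - H_k y_k)^T y_k
    if abs(denom) < r * np.linalg.norm(y) * np.linalg.norm(v):
        return H                        # skip this iteration: H_{k+1} = H_k
    return H + np.outer(v, v) / denom   # rank-1 correction
```

When the update is not skipped, the result satisfies the inverse secant equation $H_{k+1} y_k = s_k$, which is easy to check numerically with `np.allclose(H_new @ y, s)`.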

15.1.3 BFGS

This famous quasi-Newton method, BFGS, is named after four distinguished mathematicians (Broyden, Fletcher, Goldfarb, and Shanno). The idea is similar to the SR1 method above, but instead of a rank-1 update we use a rank-2 update formula:
$$B_{k+1} = B_k + a u u^T + b v v^T \tag{15.14}$$
Multiplying by $s_k$ and imposing the secant equation,
$$y_k = B_k s_k + a u (u^T s_k) + b v (v^T s_k) \tag{15.15}$$
which means
$$a (u^T s_k) u + b (v^T s_k) v = y_k - B_k s_k \tag{15.16}$$

Here we actually have multiple choices for the vectors $u$ and $v$, but BFGS takes $u = y_k$ and $v = B_k s_k$ to match the right-hand side. Then we must have
$$a (y_k^T s_k) = 1, \qquad b (s_k^T B_k s_k) = -1 \tag{15.17}$$

The resulting update formula is
$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} \qquad \text{(BFGS)} \tag{15.18}$$
The closely related DFP method instead updates $B_k$ by
$$B_{k+1} = (I - \rho_k y_k s_k^T) B_k (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T \qquad \text{(DFP)}$$
where $\rho_k = \frac{1}{s_k^T y_k}$. (Note that these are two different updates, not two forms of the same one.) Using the relation $H_{k+1} y_k = s_k$ for $H_{k+1} = B_{k+1}^{-1}$, we get the corresponding rank-2 updates for $H_{k+1}$:
$$H_{k+1} = H_k + \frac{s_k s_k^T}{s_k^T y_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} \qquad \text{(DFP)} \tag{15.19}$$
$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T \qquad \text{(BFGS)}$$
The latter is the BFGS update of the inverse Hessian approximation. Now that we have the updating formula, what should the initial value $H_0$ be? It is quite difficult to come up with a good choice unless we compute the inverse Hessian explicitly; in practice we can sometimes simply set it to be the identity.
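A minimal NumPy sketch of the BFGS inverse-Hessian update, i.e. the last line of (15.19); the function name is our own:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of the inverse Hessian approximation H, eq. (15.19)."""
    rho = 1.0 / (y @ s)                 # rho_k = 1 / (s_k^T y_k)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)        # I - rho_k s_k y_k^T
    # (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    return V @ H @ V.T + rho * np.outer(s, s)
```

Note that the update needs $s_k^T y_k > 0$ (the curvature condition) for $H_{k+1}$ to stay positive definite; this is exactly why the line search in the algorithm below must enforce the Wolfe conditions.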

15.1.4 BFGS Algorithm


1. Set the starting point $x_0$ and the approximate inverse Hessian $H_0$; $k \leftarrow 0$.

2. If $\|\nabla f(x_k)\|$ is not small enough (say, larger than $10^{-9}$), then compute
$$p_k = -H_k \nabla f_k \tag{15.20}$$
and $x_{k+1} = x_k + \alpha_k p_k$, with the step length $\alpha_k$ chosen to satisfy the Wolfe conditions (important). Compute $s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$, update $H_{k+1}$ by (15.19), and set $k \leftarrow k + 1$. Go to step 2.

The step length should not be generated by a backtracking algorithm, since the method relies on the curvature condition: backtracking enforces only sufficient decrease, while the curvature condition guarantees $s_k^T y_k > 0$ and hence that $H_{k+1}$ remains positive definite. The performance may be degraded if backtracking is used. We can use an exact line search here, since it satisfies the curvature condition. A sketch of the full algorithm follows below.
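Putting the pieces together, here is a hedged sketch of the full loop. It reuses `bfgs_inverse_update` from above and delegates the step length to `scipy.optimize.line_search`, which enforces the Wolfe conditions as step 2 requires; the Rosenbrock test problem, the fallback step, and all tolerances are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import line_search

def bfgs(f, grad, x0, tol=1e-9, max_iter=200):
    """BFGS with H_0 = I, following the pseudocode above."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                  # H_0 = identity, as suggested in the notes
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:    # stopping test on ||grad f(x_k)||
            break
        p = -H @ g                      # p_k = -H_k grad f_k, eq. (15.20)
        alpha = line_search(f, grad, x, p, gfk=g)[0]  # Wolfe line search
        if alpha is None:               # line search failed: take a tiny step
            alpha = 1e-3
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g     # s_k and y_k
        if y @ s > 1e-12:               # curvature condition keeps H_{k+1} pd
            H = bfgs_inverse_update(H, s, y)
        x, g = x_new, g_new
    return x

# Example: minimize the Rosenbrock function, whose minimizer is (1, 1).
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
print(bfgs(f, grad, np.array([-1.2, 1.0])))   # approximately [1. 1.]
```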
