
6 Second-Order Methods

The previous chapter focused on optimization methods that involve first-order
approximations of the objective function using the gradient. This chapter focuses
on leveraging second-order approximations that use the second derivative in
univariate optimization or the Hessian in multivariate optimization to direct the
search. This additional information can help improve the local model used for
informing the selection of directions and step lengths in descent algorithms.

6.1 Newton’s Method

Knowing the function value and gradient for a design point can help determine
the direction to travel, but this first-order information does not directly help
determine how far to step to reach a local minimum. Second-order information,
on the other hand, allows us to make a quadratic approximation of the objective
function and approximate the right step size to reach a local minimum as shown
in figure 6.1. As we have seen with quadratic fit search in chapter 3, we can
analytically obtain the location where a quadratic approximation has a zero
gradient. We can then use that location as the next iteration to approach a local
minimum.
In univariate optimization, the quadratic approximation about a point x^(k)
comes from the second-order Taylor expansion:

    q(x) = f(x^(k)) + (x − x^(k)) f′(x^(k)) + ((x − x^(k))² / 2) f′′(x^(k))    (6.1)

Figure 6.1. A comparison of first-order and second-order approximations.
Bowl-shaped quadratic approximations have unique locations where the
derivative is zero.

Setting the derivative to zero and solving for the root yields the update equation
for Newton's method:

    (∂/∂x) q(x) = f′(x^(k)) + (x − x^(k)) f′′(x^(k)) = 0    (6.2)

    x^(k+1) = x^(k) − f′(x^(k)) / f′′(x^(k))    (6.3)

This update is shown in figure 6.2.

Figure 6.2. Newton's method can be interpreted as a root-finding method
applied to f′ that iteratively improves a univariate design point by taking
the tangent line at (x, f′(x)), finding the intersection with the x-axis, and
using that x value as the next design point.

The update rule in Newton’s method involves dividing by the second derivative.
The update is undefined if the second derivative is zero, which occurs when
the quadratic approximation is a line. Instability also occurs when the second
derivative is very close to zero, in which case the next iterate will lie very far from
the current design point, far from where the local quadratic approximation is
valid. Poor local approximations can lead to poor performance with Newton’s
method. Figure 6.3 shows three kinds of failure cases.

© 2019 Massachusetts Institute of Technology, shared under a Creative Commons CC-BY-NC-ND license.
2022-05-22 00:25:57-07:00, revision 47fd495, comments to [email protected]

Figure 6.3. Examples of failure cases with Newton's method: oscillation,
overshoot, and a negative second derivative f′′.
Newton's method does tend to converge quickly when in a bowl-like region
that is sufficiently close to a local minimum. It has quadratic convergence, meaning
the difference between the minimum and the iterate is approximately squared
with every iteration. This rate of convergence holds for Newton's method starting
from x^(1) within a distance δ of a root x* if¹

• f′′(x) ≠ 0 for all points in I,

• f′′′(x) is continuous on I, and

• (1/2) |f′′′(x^(1)) / f′′(x^(1))| < c |f′′′(x*) / f′′(x*)| for some c < ∞

for an interval I = [x* − δ, x* + δ]. The final condition guards against overshoot.

¹ The final condition enforces sufficient closeness, ensuring that the function is
sufficiently approximated by the Taylor expansion. J. Stoer and R. Bulirsch,
Introduction to Numerical Analysis, 3rd ed. Springer, 2002.
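The quadratic convergence described above can be observed numerically. The sketch below applies the univariate update of equation (6.3) to f(x) = x² + exp(x), whose derivative 2x + exp(x) has a single root at the minimizer; the function, starting point, and iteration count are illustrative choices, not from the text.

```julia
# Univariate Newton's method, equation (6.3): x ← x − f′(x)/f′′(x).
# Returns the final iterate along with |f′| after each step, a proxy for
# the distance to the minimizer.
function newtons_method_1d(f′, f′′, x, k_max)
    errs = Float64[]
    for k in 1 : k_max
        x -= f′(x) / f′′(x)
        push!(errs, abs(f′(x)))
    end
    return x, errs
end

# Minimize f(x) = x² + exp(x); f′(x) = 2x + exp(x), f′′(x) = 2 + exp(x).
x, errs = newtons_method_1d(x -> 2x + exp(x), x -> 2 + exp(x), 0.0, 5)
```

The recorded values of |f′| shrink roughly quadratically: each entry is on the order of the square of the previous one, until the floating-point precision floor is reached.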


Newton's method can be extended to multivariate optimization (algorithm 6.1).
The multivariate second-order Taylor expansion at x^(k) is:

    f(x) ≈ q(x) = f(x^(k)) + (g^(k))⊤ (x − x^(k)) + (1/2) (x − x^(k))⊤ H^(k) (x − x^(k))    (6.4)

where g^(k) and H^(k) are the gradient and Hessian at x^(k), respectively.
We evaluate the gradient and set it to zero:

    ∇q(x) = g^(k) + H^(k) (x − x^(k)) = 0    (6.5)

We then solve for the next iterate, thereby obtaining Newton's method in multivariate form:

    x^(k+1) = x^(k) − (H^(k))⁻¹ g^(k)    (6.6)


If f is quadratic and its Hessian is positive definite, then the update converges
to the global minimum in one step. For general functions, Newton's method
is often terminated once x ceases to change by more than a given tolerance.²
Example 6.1 shows how Newton's method can be used to minimize a function.

² Termination conditions for descent methods are given in chapter 4.

Example 6.1. Newton's method used to minimize Booth's function; see appendix B.2.

With x^(1) = [9, 8], we will use Newton's method to minimize Booth's function:

    f(x) = (x₁ + 2x₂ − 7)² + (2x₁ + x₂ − 5)²

The gradient of Booth's function is:

    ∇f(x) = [10x₁ + 8x₂ − 34, 8x₁ + 10x₂ − 38]

The Hessian of Booth's function is:

    H(x) = [10   8
             8  10]

The first iteration of Newton's method yields:

    x^(2) = x^(1) − (H^(1))⁻¹ g^(1)
          = [9, 8] − [10  8; 8  10]⁻¹ [10·9 + 8·8 − 34, 8·9 + 10·8 − 38]
          = [9, 8] − [10  8; 8  10]⁻¹ [120, 114]
          = [1, 3]

The gradient at x^(2) is zero, so we have converged after a single iteration. The
Hessian is positive definite everywhere, so x^(2) is the global minimum.

Newton's method can also be used to supply a descent direction to line search
or can be modified to use a step factor.³ Smaller steps toward the minimum or
line searches along the descent direction can increase the method's robustness.
The descent direction is:⁴

    d^(k) = −(H^(k))⁻¹ g^(k)    (6.7)

³ See chapter 5.
⁴ The descent direction given by Newton's method is similar to the natural
gradient or covariant gradient. S. Amari, "Natural Gradient Works Efficiently
in Learning," Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.


Algorithm 6.1. Newton's method, which takes the gradient of the function ∇f,
the Hessian of the objective function H, an initial point x, a step size tolerance ϵ,
and a maximum number of iterations k_max.

    function newtons_method(∇f, H, x, ϵ, k_max)
        k, Δ = 1, fill(Inf, length(x))
        while norm(Δ) > ϵ && k ≤ k_max
            Δ = H(x) \ ∇f(x)
            x -= Δ
            k += 1
        end
        return x
    end
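Restating algorithm 6.1 so the snippet runs standalone (norm comes from Julia's LinearAlgebra standard library), we can reproduce the single-step convergence on Booth's function from example 6.1; the tolerance and iteration cap below are arbitrary choices.

```julia
using LinearAlgebra  # provides norm

function newtons_method(∇f, H, x, ϵ, k_max)
    k, Δ = 1, fill(Inf, length(x))
    while norm(Δ) > ϵ && k ≤ k_max
        Δ = H(x) \ ∇f(x)   # solve H Δ = ∇f rather than inverting H
        x -= Δ
        k += 1
    end
    return x
end

# Gradient and Hessian of Booth's function, from example 6.1.
∇f(x) = [10x[1] + 8x[2] - 34, 8x[1] + 10x[2] - 38]
H(x) = [10.0 8.0; 8.0 10.0]

newtons_method(∇f, H, [9.0, 8.0], 1e-10, 100)   # converges to [1, 3] in one step
```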

6.2 Secant Method

Newton’s method for univariate function minimization requires the first and
second derivatives f ′ and f ′′ . In many cases, f ′ is known but the second derivative
is not. The secant method (algorithm 6.2) applies Newton’s method using estimates
of the second derivative and thus only requires f ′ . This property makes the secant
method more convenient to use in practice.
The secant method uses the last two iterates to approximate the second derivative:

    f′′(x^(k)) ≈ (f′(x^(k)) − f′(x^(k−1))) / (x^(k) − x^(k−1))    (6.8)

This estimate is substituted into Newton's method:

    x^(k+1) ← x^(k) − ((x^(k) − x^(k−1)) / (f′(x^(k)) − f′(x^(k−1)))) f′(x^(k))    (6.9)
The secant method requires an additional initial design point. It suffers from
the same problems as Newton’s method and may take more iterations to converge
due to approximating the second derivative.

6.3 Quasi-Newton Methods

Just as the secant method approximates f′′ in the univariate case, quasi-Newton
methods approximate the inverse Hessian. Quasi-Newton method updates have
the form:

    x^(k+1) ← x^(k) − α^(k) Q^(k) g^(k)    (6.10)

where α^(k) is a scalar step factor and Q^(k) approximates the inverse of the Hessian
at x^(k).


Algorithm 6.2. The secant method for univariate function minimization. The
inputs are the first derivative f′ of the target function, two initial points x0
and x1, and the desired tolerance ϵ. The final x-coordinate is returned.

    function secant_method(f′, x0, x1, ϵ)
        g0 = f′(x0)
        Δ = Inf
        while abs(Δ) > ϵ
            g1 = f′(x1)
            Δ = (x1 - x0)/(g1 - g0)*g1
            x0, x1, g0 = x1, x1 - Δ, g1
        end
        return x1
    end
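Restating algorithm 6.2 so the snippet runs standalone, the secant method can minimize f(x) = x² + x⁴ (the function revisited in exercise 6.5) using only its first derivative; the starting points and tolerance below are illustrative choices.

```julia
function secant_method(f′, x0, x1, ϵ)
    g0 = f′(x0)
    Δ = Inf
    while abs(Δ) > ϵ
        g1 = f′(x1)
        Δ = (x1 - x0)/(g1 - g0)*g1   # secant estimate of the Newton step
        x0, x1, g0 = x1, x1 - Δ, g1
    end
    return x1
end

# Minimize f(x) = x² + x⁴ with f′(x) = 2x + 4x³; the minimizer is x = 0.
x_min = secant_method(x -> 2x + 4x^3, -4.0, -3.0, 1e-10)
```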

These methods typically set Q^(1) to the identity matrix, and they then apply up-
dates to reflect information learned with each iteration. To simplify the equations
for the various quasi-Newton methods, we define the following:

    γ^(k+1) ≡ g^(k+1) − g^(k)    (6.11)

    δ^(k+1) ≡ x^(k+1) − x^(k)    (6.12)

The Davidon-Fletcher-Powell (DFP) method (algorithm 6.3) uses:⁵

    Q ← Q − (Q γ γ⊤ Q) / (γ⊤ Q γ) + (δ δ⊤) / (δ⊤ γ)    (6.13)

where all terms on the right-hand side are evaluated at the same iteration.
The update for Q in the DFP method has three properties:

1. Q remains symmetric and positive definite.

2. If f(x) = (1/2) x⊤ A x + b⊤ x + c, then Q = A⁻¹. Thus the DFP method has the
   same convergence properties as the conjugate gradient method.

3. For high-dimensional problems, storing and updating Q can be significant
   compared to other methods like the conjugate gradient method.

⁵ The original concept was presented in a technical report, W. C. Davidon,
"Variable Metric Method for Minimization," Argonne National Laboratory, Tech.
Rep. ANL-5990, 1959. It was later published: W. C. Davidon, "Variable Metric
Method for Minimization," SIAM Journal on Optimization, vol. 1, no. 1, pp. 1–17,
1991. The method was modified by R. Fletcher and M. J. D. Powell, "A Rapidly
Convergent Descent Method for Minimization," The Computer Journal, vol. 6,
no. 2, pp. 163–168, 1963.

An alternative to DFP, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method
(algorithm 6.4), uses:⁶

    Q ← Q − (δ γ⊤ Q + Q γ δ⊤) / (δ⊤ γ) + (1 + (γ⊤ Q γ) / (δ⊤ γ)) (δ δ⊤) / (δ⊤ γ)    (6.14)

⁶ R. Fletcher, Practical Methods of Optimization, 2nd ed. Wiley, 1987.


Algorithm 6.3. The Davidon-Fletcher-Powell descent method.

    mutable struct DFP <: DescentMethod
        Q
    end
    function init!(M::DFP, f, ∇f, x)
        m = length(x)
        M.Q = Matrix(1.0I, m, m)
        return M
    end
    function step!(M::DFP, f, ∇f, x)
        Q, g = M.Q, ∇f(x)
        x′ = line_search(f, x, -Q*g)
        g′ = ∇f(x′)
        δ = x′ - x
        γ = g′ - g
        Q[:] = Q - Q*γ*γ'*Q/(γ'*Q*γ) + δ*δ'/(δ'*γ)
        return x′
    end

Algorithm 6.4. The Broyden-Fletcher-Goldfarb-Shanno descent method.

    mutable struct BFGS <: DescentMethod
        Q
    end
    function init!(M::BFGS, f, ∇f, x)
        m = length(x)
        M.Q = Matrix(1.0I, m, m)
        return M
    end
    function step!(M::BFGS, f, ∇f, x)
        Q, g = M.Q, ∇f(x)
        x′ = line_search(f, x, -Q*g)
        g′ = ∇f(x′)
        δ = x′ - x
        γ = g′ - g
        Q[:] = Q - (δ*γ'*Q + Q*γ*δ')/(δ'*γ) +
               (1 + (γ'*Q*γ)/(δ'*γ))[1]*(δ*δ')/(δ'*γ)
        return x′
    end
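On a quadratic objective with exact line searches, quasi-Newton updates such as BFGS terminate in n steps with Q equal to the inverse Hessian, mirroring property 2 of the DFP update. The sketch below checks this with the BFGS update from equation (6.14); the quadratic, its Hessian, and the starting point are illustrative choices, and the closed-form step length stands in for the line_search routine assumed by algorithm 6.4.

```julia
using LinearAlgebra

# BFGS inverse-Hessian update, equation (6.14).
bfgs_update(Q, δ, γ) =
    Q - (δ*γ'*Q + Q*γ*δ')/(δ'*γ) + (1 + (γ'*Q*γ)/(δ'*γ))*(δ*δ')/(δ'*γ)

# Minimize f(x) = ½x'Ax − b'x, with gradient ∇f(x) = Ax − b, using an
# exact line search, which is available in closed form for quadratics.
function bfgs_quadratic(A, b, x)
    Q, g = Matrix(1.0I, length(b), length(b)), A*x - b
    for k in 1 : length(b)
        d = -Q*g
        α = -(g'*d)/(d'*A*d)     # exact minimizer of f along d
        x′ = x + α*d
        g′ = A*x′ - b
        Q = bfgs_update(Q, x′ - x, g′ - g)
        x, g = x′, g′
    end
    return x, Q
end

A, b = [2.0 0.0; 0.0 10.0], [2.0, 10.0]   # minimum at [1, 1]
x, Q = bfgs_quadratic(A, b, [0.0, 0.0])   # x ≈ [1, 1] and Q ≈ inv(A)
```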

© 2019 Massachusetts Institute of Technology, shared under a Creative Commons CC-BY-NC-ND license.
2022-05-22 00:25:57-07:00, revision 47fd495, comments to [email protected]
94 c ha p te r 6 . se c on d -ord e r me thod s

BFGS does better than DFP with approximate line search but still uses an
n × n dense matrix. For very large problems where space is a concern, the Limited-
memory BFGS method (algorithm 6.5), or L-BFGS, can be used to approximate
BFGS.⁷ L-BFGS stores the last m values for δ and γ rather than the full inverse
Hessian, where i = 1 indexes the oldest value and i = m indexes the most recent.
The process for computing the descent direction d at x begins by computing
q^(m) = ∇f(x). The remaining vectors q^(i) for i from m − 1 down to 1 are computed
using

    q^(i) = q^(i+1) − (((δ^(i+1))⊤ q^(i+1)) / ((γ^(i+1))⊤ δ^(i+1))) γ^(i+1)    (6.15)

These vectors are used to compute another m + 1 vectors, starting with

    z^(0) = (γ^(m) ⊙ δ^(m) ⊙ q^(m)) / ((γ^(m))⊤ γ^(m))    (6.16)

⁷ J. Nocedal, "Updating Quasi-Newton Matrices with Limited Storage,"
Mathematics of Computation, vol. 35, no. 151, pp. 773–782, 1980.

and proceeding with z^(i) for i from 1 to m according to

    z^(i) = z^(i−1) + δ^(i−1) ( ((δ^(i−1))⊤ q^(i−1)) / ((γ^(i−1))⊤ δ^(i−1)) − ((γ^(i−1))⊤ z^(i−1)) / ((γ^(i−1))⊤ δ^(i−1)) )    (6.17)

The descent direction is d = −z^(m).
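The two-loop recursion of equations (6.15) to (6.17) can be sketched directly. The history pairs below are illustrative values taken from the quadratic with Hessian A = diag(2, 10), for which γ = Aδ; with a full history on a quadratic, the recursion recovers the Newton direction −A⁻¹g.

```julia
using LinearAlgebra  # provides the dot product ⋅

# Two-loop recursion of equations (6.15)–(6.17). δs and γs hold the m
# stored pairs, oldest first; g is the current gradient.
function lbfgs_direction(δs, γs, g)
    m = length(δs)
    qs = Vector{Vector{Float64}}(undef, m)
    q = copy(g)
    for i in m : -1 : 1                        # equation (6.15)
        qs[i] = copy(q)
        q -= (δs[i]⋅q)/(γs[i]⋅δs[i]) * γs[i]
    end
    z = (γs[m] .* δs[m] .* q) / (γs[m]⋅γs[m])  # equation (6.16)
    for i in 1 : m                             # equation (6.17)
        z += δs[i]*(δs[i]⋅qs[i] - γs[i]⋅z)/(γs[i]⋅δs[i])
    end
    return -z                                  # descent direction d = −z⁽ᵐ⁾
end

# History from the quadratic with Hessian A = diag(2, 10), where γ = Aδ.
δs = [[1.0, 0.0], [0.0, 1.0]]
γs = [[2.0, 0.0], [0.0, 10.0]]
d = lbfgs_direction(δs, γs, [2.0, 10.0])   # → [-1.0, -1.0], the Newton step
```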


For minimization, the inverse Hessian Q must remain positive definite. The
initial Hessian is often set to the diagonal of

    Q^(1) = (γ^(1) (δ^(1))⊤) / ((γ^(1))⊤ γ^(1))    (6.18)

Computing the diagonal for the above expression and substituting the result into
z^(1) = Q^(1) q^(1) results in the equation for z^(1).
The quasi-Newton methods discussed in this section are compared in figure 6.4.
They often perform quite similarly.


Figure 6.4. Several quasi-Newton methods (DFP, BFGS, and L-BFGS with
m = 1, 2, 3) compared on the Rosenbrock function; see appendix B.6. All
methods have nearly identical updates, with L-BFGS noticeably deviating
only when its history, m, is 1.

6.4 Summary

• Incorporating second-order information in descent methods often speeds convergence.

• Newton's method is a root-finding method that leverages second-order information
  to quickly descend to a local minimum.

• The secant method and quasi-Newton methods approximate Newton's method
  when the second-order information is not directly available.

6.5 Exercises

Exercise 6.1. What advantage does second-order information provide about
convergence that first-order information lacks?

Exercise 6.2. When finding roots in one dimension, when would we use Newton’s
method instead of the bisection method?

Exercise 6.3. Apply Newton's method to f(x) = x² from a starting point of your
choice. How many steps do we need to converge?


Algorithm 6.5. The Limited-memory BFGS descent method, which avoids storing
the approximate inverse Hessian. The parameter m determines the history size.
The LimitedMemoryBFGS type also stores the step differences δs, the gradient
changes γs, and storage vectors qs.

    mutable struct LimitedMemoryBFGS <: DescentMethod
        m
        δs
        γs
        qs
    end
    function init!(M::LimitedMemoryBFGS, f, ∇f, x)
        M.δs = []
        M.γs = []
        M.qs = []
        return M
    end
    function step!(M::LimitedMemoryBFGS, f, ∇f, x)
        δs, γs, qs, g = M.δs, M.γs, M.qs, ∇f(x)
        m = length(δs)
        if m > 0
            q = g
            for i in m : -1 : 1
                qs[i] = copy(q)
                q -= (δs[i]⋅q)/(γs[i]⋅δs[i])*γs[i]
            end
            z = (γs[m] .* δs[m] .* q) / (γs[m]⋅γs[m])
            for i in 1 : m
                z += δs[i]*(δs[i]⋅qs[i] - γs[i]⋅z)/(γs[i]⋅δs[i])
            end
            x′ = line_search(f, x, -z)
        else
            x′ = line_search(f, x, -g)
        end
        g′ = ∇f(x′)
        push!(δs, x′ - x); push!(γs, g′ - g)
        push!(qs, zeros(length(x)))
        while length(δs) > M.m
            popfirst!(δs); popfirst!(γs); popfirst!(qs)
        end
        return x′
    end


Exercise 6.4. Apply Newton's method to f(x) = (1/2) x⊤ H x starting from x^(1) =
[1, 1]. What have you observed? Use H as follows:

    H = [1     0
         0  1000]    (6.19)

Next, apply gradient descent to the same optimization problem by stepping
with the unnormalized gradient. Do two steps of the algorithm. What have you
observed? Finally, apply the conjugate gradient method. How many steps do you
need to converge?

Exercise 6.5. Compare Newton's method and the secant method on f(x) = x² + x⁴,
with x^(1) = −3 and x^(0) = −4. Run each method for 10 iterations. Make two
plots:

1. Plot f vs. the iteration for each method.

2. Plot f′ vs. x. Overlay the progression of each method, drawing lines from
   (x^(i), f′(x^(i))) to (x^(i+1), 0) to (x^(i+1), f′(x^(i+1))) for each transition.

What can we conclude about this comparison?

Exercise 6.6. Give an example of a sequence of points x (1) , x (2) , . . . and a function
f such that f ( x (1) ) > f ( x (2) ) > · · · and yet the sequence does not converge to a
local minimum. Assume f is bounded from below.

Exercise 6.7. What is the advantage of a quasi-Newton method over Newton's
method?

Exercise 6.8. Give an example where the BFGS update does not exist. What would
you do in this case?

Exercise 6.9. Suppose we have a function f(x) = (x₁ + 1)² + (x₂ + 3)² + 4. If we
start at the origin, what is the resulting point after one step of Newton's method?

Exercise 6.10. In this problem we will derive the optimization problem from
which the Davidon-Fletcher-Powell update is obtained. Start with a quadratic
approximation at x^(k):

    f^(k)(x) = y^(k) + (g^(k))⊤ (x − x^(k)) + (1/2) (x − x^(k))⊤ H^(k) (x − x^(k))


where y^(k), g^(k), and H^(k) are the objective function value, the true gradient, and
a positive definite Hessian approximation at x^(k).
The next iterate is chosen using line search to obtain:

    x^(k+1) ← x^(k) − α^(k) (H^(k))⁻¹ g^(k)

We can construct a new quadratic approximation f^(k+1) at x^(k+1). The approxi-
mation should enforce that the local function evaluation is correct:

    f^(k+1)(x^(k+1)) = y^(k+1)

and that the local gradient is correct:

    ∇f^(k+1)(x^(k+1)) = g^(k+1)

and that the previous gradient is correct:

    ∇f^(k+1)(x^(k)) = g^(k)

Show that updating the Hessian approximation to obtain H^(k+1) requires:⁸

    H^(k+1) δ^(k+1) = γ^(k+1)

Then, show that in order for H^(k+1) to be positive definite, we require:⁹

    (δ^(k+1))⊤ γ^(k+1) > 0

⁸ This condition is called the secant equation. The vectors δ and γ are defined in
equation (6.11).
⁹ This condition is called the curvature condition. It can be enforced using the
Wolfe conditions during line search.

Finally, assuming that the curvature condition is enforced, explain why one
then solves the following optimization problem to obtain H^(k+1):¹⁰

    minimize over H:   ‖H − H^(k)‖
    subject to:        H = H⊤
                       H δ^(k+1) = γ^(k+1)

where ‖H − H^(k)‖ is a matrix norm that defines a distance between H and H^(k).

¹⁰ The Davidon-Fletcher-Powell update is obtained by solving such an
optimization problem to obtain an analytical solution and then finding the
corresponding update equation for the inverse Hessian approximation.
