Lecture 8: Unconstrained Optimization II (2023)
Two Approaches to Finding an Optimum
Basic Concept
Consider the problem
\[
\min_{x} \; f(x), \qquad x \in \mathbb{R}^n
\]
Basic Concept: Example
[Figure: plot of f(α) versus α for 0 ≤ α ≤ 0.5.]
Basic Concept : Example
\[
\left. \frac{df(\alpha)}{d\alpha} \right|_{\alpha=0} = \bigl( -6(3 - 3\alpha) - 50(1 - 5\alpha) \bigr)\big|_{\alpha=0} = -68
\]
\[
\nabla f(x_0)^T d = \begin{bmatrix} 2(3) & 10(1) \end{bmatrix} \begin{bmatrix} -3 \\ -5 \end{bmatrix} = -68
\]
3. Minimize f(α) with respect to α to obtain the step size α0, giving the corresponding new point x1 and the value f1 = f(x1).
We have
\[
\frac{df(\alpha)}{d\alpha} = -6(3 - 3\alpha) - 50(1 - 5\alpha) = 0
\;\Longrightarrow\; 268\alpha = 68 \;\text{ or }\; \alpha_0 = 0.2537
\]
\[
x_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix} + \alpha_0 \begin{bmatrix} -3 \\ -5 \end{bmatrix} = \begin{bmatrix} 2.2388 \\ -0.2687 \end{bmatrix},
\qquad f(x_1) = 5.3732, \text{ which is less than } f_0 = 14.
\]
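As a quick numerical cross-check of the computation above (a sketch: the objective f(x) = x1^2 + 5*x2^2 is inferred from df/dα and is not stated explicitly in the extracted slide), we can reproduce α0 and x1 with a bounded scalar minimizer:

# Numerical check of the exact line-search step above.
# Assumption: f(x) = x1^2 + 5*x2^2, x0 = [3, 1], direction d = [-3, -5].
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: x[0]**2 + 5 * x[1]**2
x0 = np.array([3.0, 1.0])
d = np.array([-3.0, -5.0])

phi = lambda a: f(x0 + a * d)            # one-dimensional line-search objective
res = minimize_scalar(phi, bounds=(0.0, 1.0), method="bounded")

alpha0 = res.x                           # approx 0.2537
x1 = x0 + alpha0 * d                     # approx [2.2388, -0.2687]
print(alpha0, x1, f(x1))                 # f(x1) approx 5.3732 < f(x0) = 14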
Basic Concept : Example
[Figure: contour plot showing the iterates x0, x1, x2 in the (x1, x2) plane.]
Basic Concept : Example
\[
\sigma_0 = \frac{6M}{wh^2}, \quad \text{where } M \text{ is a moment.}
\]
\[
\sigma_0 = \frac{6(2000 \times 24)}{1 \cdot (3^2)} = 32{,}000 \text{ psi}
\]
Using d = [−1/√5  −2/√5]^T and α = 0.2, we have
Assume we have chosen a descent direction d. We need to choose the step factor α to obtain our next design point. One approach is to use line search, which selects the step factor that minimizes the one-dimensional function
\[
\min_{\alpha} \; f(x + \alpha d)
\]
To inform the search, we can use the derivative of the line-search objective, which is simply the directional derivative along d at x + αd.
function LINE_SEARCH(f, x, d)
    objective = α -> f(x + α ∗ d)
    a, b = bracket_minimum(objective)
    α = minimize(objective, a, b)
    return x + α ∗ d
end function
Exact line search can be expensive if we need to perform it at every step of the optimization. In the MATLAB environment, we can use the commands fminbnd or fminsearch.
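Below is a rough Python analogue of the LINE_SEARCH pseudocode above (a sketch, not the lecture's reference code); scipy's bounded scalar minimizer plays the role of bracket_minimum plus minimize, and the alpha_max bound is an arbitrary choice.

# Sketch of an (approximately) exact line search in Python.
import numpy as np
from scipy.optimize import minimize_scalar

def line_search(f, x, d, alpha_max=10.0):
    # One-dimensional objective along the direction d.
    objective = lambda a: f(x + a * d)
    res = minimize_scalar(objective, bounds=(0.0, alpha_max), method="bounded")
    return x + res.x * d

# Example usage on f(x) = x1^2 + 5*x2^2 from the earlier example.
f = lambda x: x[0]**2 + 5 * x[1]**2
x_new = line_search(f, np.array([3.0, 1.0]), np.array([-3.0, -5.0]))
print(x_new)   # approx [2.2388, -0.2687]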
Line Search : Exact Line Search
Decaying step factors are popular when minimizing noisy objective functions, and are commonly used in machine learning applications.
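As an illustration (not from the slides), one simple decaying schedule multiplies a base step by a fixed decay factor each iteration; the values α0 = 1 and γ = 0.9 below are arbitrary choices.

# Minimal sketch of a decaying step factor schedule: alpha_k = alpha0 * gamma**k.
def decaying_steps(alpha0=1.0, gamma=0.9):
    k = 0
    while True:
        yield alpha0 * gamma**k
        k += 1

steps = decaying_steps()
alphas = [next(steps) for _ in range(5)]   # [1.0, 0.9, 0.81, 0.729, 0.6561]
print(alphas)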
Line Search : Exact Line Search
\[
\cdots \;+\; e^{(2-\alpha)+(3-\alpha)} - (3 - \alpha)
\]
Approximate Line Search
Curvature Condition
The curvature condition requires the directional derivative at the next iterate to be shallower (so that α is not too close to zero):
\[
\nabla f(x_{k+1})^T d_k \;\geq\; \sigma \, \nabla f(x_k)^T d_k
\]
• Here σ controls how shallow the next directional derivative must be.
• It is common to set β < σ < 1, with σ = 0.1 when approximate line search is used with the conjugate gradient method and σ = 0.9 when used with Newton's method.
• The strong curvature condition is a more restrictive criterion, requiring that the directional derivative also not be too positive:
\[
\left| \nabla f(x_{k+1})^T d_k \right| \;\leq\; -\sigma \, \nabla f(x_k)^T d_k
\]
• Together, the sufficient decrease condition and the strong curvature condition are called the strong Wolfe conditions; a sketch that checks them numerically follows below.
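The sketch below checks the sufficient decrease and strong curvature conditions for a candidate step; the test function, point, and the values β = 1e-4 and σ = 0.9 are illustrative choices, not values from the slides.

# Sketch: check the strong Wolfe conditions for a candidate step alpha.
# beta is the sufficient-decrease parameter, sigma the curvature parameter.
import numpy as np

def strong_wolfe(f, grad, x, d, alpha, beta=1e-4, sigma=0.9):
    g0 = grad(x)
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + beta * alpha * (g0 @ d)
    strong_curvature = abs(grad(x_new) @ d) <= -sigma * (g0 @ d)
    return sufficient_decrease and strong_curvature

# Example: f(x) = x1^2 + 5*x2^2, steepest-descent direction at x0 = [3, 1].
f = lambda x: x[0]**2 + 5 * x[1]**2
grad = lambda x: np.array([2 * x[0], 10 * x[1]])
x0 = np.array([3.0, 1.0])
d = -grad(x0)
print(strong_wolfe(f, grad, x0, d, alpha=0.12))   # True for this step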
Wolfe Condition
For α = 0.5(10) = 5, we have
The Steepest Descent Method
The steepest-descent method (also called gradient descent) is a simple and intuitive method for determining the search direction.
Direction Vector:
• Let xk be the current point at the kth iteration: k = 0 corresponds to the
starting point.
• We need to choose a downhill direction d and then a step size α > 0 such that
the new point xk + αd is better. We desire f (xk + αd) < f (xk ).
• To see how d should be chosen, we use the Taylor expansion
\[
\delta f \approx \alpha \nabla f(x_k)^T d
\]
The Steepest Descent Method
For a decrease in the objective we require
\[
\nabla f(x_k)^T d < 0
\]
• The steepest descent method is based on choosing d at the kth iteration, which we will denote as dk, as
\[
d_k = -\nabla f(x_k)
\]
The Steepest Descent Method : Example
Given f(x) = x1 x2² and x0 = [1  2]^T.
1. Find the steepest descent direction at x0:
\[
d = -\nabla f(x)\big|_{x=x_0} = \begin{bmatrix} -4 & -4 \end{bmatrix}^T
\]
2. Is d = [−1  2]^T a direction of descent?
\[
\nabla f(x)\big|_{x=x_0}^{T} \, d = \begin{bmatrix} 4 & 4 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \end{bmatrix} = 4 > 0 ,
\]
so it is not a descent direction.
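A quick numerical check of this example (a sketch with the gradient of f = x1*x2^2 hand-coded):

# Verify the steepest-descent direction and the descent test for f = x1*x2^2.
import numpy as np

grad = lambda x: np.array([x[1]**2, 2 * x[0] * x[1]])   # gradient of f = x1*x2^2
x0 = np.array([1.0, 2.0])

d_sd = -grad(x0)                  # steepest-descent direction: [-4, -4]
d_test = np.array([-1.0, 2.0])
slope = grad(x0) @ d_test         # directional derivative: 4 > 0, not a descent direction
print(d_sd, slope)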
The Steepest Descent Method
After we have the direction vector dk at the point xk, how far should we go along this direction?
• We need to develop a numerical procedure to determine the step size αk along
dk .
• If we move along dk, the design variables and the objective function depend only on α, i.e. f(α) = f(xk + α dk), with
\[
\frac{df(\hat{\alpha})}{d\alpha} = \nabla f(x_k + \hat{\alpha} d_k)^T d_k
\]
The Steepest Descent Method
• In the steepest descent method, the direction vector is −∇f(xk), resulting in the slope at the current point (α = 0) being
\[
\left. \frac{df(\alpha)}{d\alpha} \right|_{\alpha = 0} = \nabla f(x_k)^T \left( -\nabla f(x_k) \right) = -\|\nabla f(x_k)\|^2 < 0
\]
The Steepest Descent Method
• Starting from an initial point, we determine a direction vector and a step size,
and obtain a new point as xk+1 = xk + αk dk .
• The question is when to stop the iterative process. We have two stopping criteria to discuss here.
• First, before performing the line search, the necessary condition for optimality is checked:
\[
\|\nabla f(x_k)\| \leq \varepsilon_G ,
\]
The Steepest Descent Method : Stopping Criteria
Steepest Descent Algorithm : Algorithm
Require: x0, εG, εA, εR
k = 0
while true do
    Compute ∇f(xk)
    if ∥∇f(xk)∥ ≤ εG then
        Stop
    else
        dk = −∇f(xk)/∥∇f(xk)∥
    end if
    αk = line_search(f, dk)
    xk+1 = xk + αk dk
    if |f(xk+1) − f(xk)| ≤ εA + εR |f(xk)| then
        Stop
    else
        k = k + 1
    end if
end while
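A compact Python sketch of this algorithm (illustrative: a bounded scalar minimization stands in for line_search, and the bound of 10 on α is arbitrary).

# Sketch of steepest descent with the two stopping criteria above.
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, eps_g=1e-6, eps_a=1e-8, eps_r=1e-8, k_max=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(k_max):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:                 # optimality check
            break
        d = -g / np.linalg.norm(g)                     # normalized steepest-descent direction
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 10.0), method="bounded").x
        x_new = x + alpha * d
        converged = abs(f(x_new) - f(x)) <= eps_a + eps_r * abs(f(x))   # progress check
        x = x_new
        if converged:
            break
    return x

# Example on f(x) = x1^2 + 5*x2^2:
f = lambda x: x[0]**2 + 5 * x[1]**2
grad = lambda x: np.array([2 * x[0], 10 * x[1]])
print(steepest_descent(f, grad, [3.0, 1.0]))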
Steepest Descent Algorithm: Zig-Zags Property
• The steepest descent method zig-zags its way towards the optimum point.
Consider
[Figure: steepest descent with 30 iterations, zig-zagging from x0 toward x∗ in the (x1, x2) plane.]
Steepest Descent Algorithm: Zig-Zags Property
With an exact line search, the step size along dk satisfies
\[
\frac{\partial f(x_k + \alpha d_k)}{\partial \alpha} = 0
\]
\[
\frac{\partial f(x_{k+1})}{\partial \alpha}
= \frac{\partial f(x_{k+1})}{\partial x_{k+1}} \frac{\partial x_{k+1}}{\partial \alpha}
= \frac{\partial f(x_{k+1})}{\partial x_{k+1}} \frac{\partial (x_k + \alpha d_k)}{\partial \alpha} = 0
\]
\[
\nabla f(x_{k+1})^T d_k = 0
\]
From the last line, since d_{k+1} = −∇f(x_{k+1}), the (k+1)th direction is perpendicular to the kth direction. If you use an approximate line search, this perpendicularity is lost, but the zig-zags are still there.
Steepest Descent Algorithm : The Bean function
Minimize the bean function
\[
f(x_1, x_2) = (1 - x_1)^2 + (1 - x_2)^2 + \frac{1}{2}\left(2x_2 - x_1^2\right)^2 ,
\]
using the steepest-descent algorithm with an exact line search and a convergence tolerance of ∥∇f∥ ≤ 10⁻⁶.
\[
x^{*} = \begin{bmatrix} 1.2134 \\ 0.8241 \end{bmatrix}, \qquad f(x^{*}) = 0.0919
\]
[Figure: steepest-descent iterates on the contours of the bean function.]
Steepest Descent Algorithm : Convergence Characteristics
The convergence of steepest descent is governed by the condition number of the Hessian,
\[
\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}
\]
Steepest Descent Algorithm : Convergence Characteristics
\[
H(f) = \nabla^2 f = \begin{bmatrix} 2 & 0 \\ 0 & 2\beta \end{bmatrix}, \quad \text{with } \beta = 1, 5, 15
\]
[Figure: steepest-descent iterations for β = 1, 5, 15, requiring 1, 29, and 90 iterations, respectively.]
The Conjugate Gradient Method
The conjugate gradient method is motivated by the quadratic problem
\[
\min_{x} \; q(x) = \frac{1}{2} x^T A x + b^T x + c ,
\]
which it solves using search directions that are mutually conjugate with respect to A:
\[
d_i^T A d_j = 0, \quad i \neq j, \quad 0 \leq i, j \leq n
\]
The Conjugate Gradient Method : The method
• The mutually conjugate vectors are the basis vectors of A. They are generally not orthogonal to one another.
• The algorithm is started with the direction of steepest descent:
\[
d_1 = -g_1
\]
• Use line search to find the next design point. For quadratic functions, the step factor α can be computed exactly. The update is then:
\[
x_2 = x_1 + \alpha_1 d_1
\]
The Conjugate Gradient Method : The method
• Suppose we want to derive the optimal step factor for a line search on a quadratic function:
\[
\min_{\alpha} \; f(x + \alpha d)
\]
We have
\[
\frac{\partial f(x + \alpha d)}{\partial \alpha}
= \frac{\partial}{\partial \alpha} \left[ \frac{1}{2}(x + \alpha d)^T A (x + \alpha d) + b^T (x + \alpha d) + c \right]
= d^T A (x + \alpha d) + d^T b
= d^T (Ax + b) + \alpha \, d^T A d
\]
Setting this derivative to zero results in
\[
\alpha = -\frac{d^T (Ax + b)}{d^T A d}
\]
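A small numerical illustration of this step-factor formula; the quadratic with A = diag(2, 8) and b = 0 corresponds to f = x1^2 + 4*x2^2, which is used in the example a couple of slides below.

# Exact step factor for a line search on q(x) = 0.5*x^T A x + b^T x + c.
import numpy as np

def quadratic_step(A, b, x, d):
    return -(d @ (A @ x + b)) / (d @ A @ d)

A = np.diag([2.0, 8.0])                 # Hessian of f = x1^2 + 4*x2^2
b = np.zeros(2)
x0 = np.array([1.0, 1.0])
d0 = -(A @ x0 + b)                      # steepest-descent direction [-2, -8]
alpha0 = quadratic_step(A, b, x0, d0)   # approx 0.1308
print(alpha0, x0 + alpha0 * d0)         # approx [0.7385, -0.0462]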
The Conjugate Gradient Method : The method
• Subsequent iterations choose the next direction based on the current gradient and a contribution from the previous direction:
\[
d_k = -g_k + \beta_k d_{k-1}
\]
for a scalar parameter β. Larger values of β indicate that the previous descent direction contributes more strongly.
• To find the best value of β for a known A, we use the fact that dk is conjugate to dk−1:
\[
d_k^T A d_{k-1} = 0 \;\Longrightarrow\; (-g_k + \beta_k d_{k-1})^T A d_{k-1} = 0
\]
\[
-g_k^T A d_{k-1} + \beta_k d_{k-1}^T A d_{k-1} = 0
\;\Longrightarrow\; \beta_k = \frac{g_k^T A d_{k-1}}{d_{k-1}^T A d_{k-1}}
\]
In general we do not know the value of A that best approximates f around xk. Several choices for βk tend to work well (a sketch implementing both follows below):
• Fletcher–Reeves:
\[
\beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}
\]
• Polak–Ribière:
\[
\beta_k = \frac{g_k^T (g_k - g_{k-1})}{g_{k-1}^T g_{k-1}},
\]
for which convergence can be guaranteed with the modification β ← max(β, 0).
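A brief sketch of the two β choices (illustrative; the gradient values are approximate values from the quadratic example that follows).

# The two beta choices used to build nonlinear conjugate gradient directions.
import numpy as np

def beta_fletcher_reeves(g_new, g_old):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old):
    return max((g_new @ (g_new - g_old)) / (g_old @ g_old), 0.0)   # with the max(beta, 0) reset

g0 = np.array([2.0, 8.0])               # gradient at x0 = [1, 1] for f = x1^2 + 4*x2^2
g1 = np.array([1.4769, -0.3695])        # gradient at x1 = [0.7385, -0.0462] (approximate)
print(beta_fletcher_reeves(g1, g0))     # approx 0.034
print(beta_polak_ribiere(g1, g0))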
The Conjugate Gradient Method : Example
Consider f = x1² + 4x2² with x0 = [1  1]^T. We will perform two iterations of the conjugate gradient algorithm. The first step is the steepest descent iteration. Thus
\[
d_0 = -\nabla f(x_0) = -\begin{bmatrix} 2 & 8 \end{bmatrix}^T ,
\]
which yields α0 = 0.1308 and x1 = x0 + α0 d0 = [0.7385  −0.0462]^T. The next iteration uses the Fletcher–Reeves method:
The Conjugate Gradient Method : Example
We have
\[
d_1 = -g_1 + \beta_1 d_0 = \begin{bmatrix} -1.5451 \\ 0.0966 \end{bmatrix},
\]
which yields α1 = 0.4780 and
\[
x_2 = x_1 + \alpha_1 d_1 = \begin{bmatrix} 0.7385 \\ -0.0462 \end{bmatrix} + 0.4780 \begin{bmatrix} -1.5451 \\ 0.0966 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
The Conjugate Gradient Method : Example
[Figure: conjugate gradient reaching x∗ from x0 in 2 iterations on the contours of f.]
The Conjugate Gradient Method : Algorithm
Require: x0, εG
k = 0
while ∥∇fk∥ > εG do
    if k = 0 then
        dk = −∇fk / ∥∇fk∥
    else
        βk = (∇fk^T ∇fk) / (∇fk−1^T ∇fk−1)
        dk = −∇fk / ∥∇fk∥ + βk dk−1
    end if
    αk = line_search(f, dk)
    xk+1 = xk + αk dk
    k = k + 1
end while
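A minimal Python sketch of this nonlinear conjugate gradient loop, using the Fletcher–Reeves β and a bounded scalar line search (a sketch under those assumptions; the bean-function gradient below is hand-derived).

# Nonlinear conjugate gradient with the Fletcher-Reeves beta.
import numpy as np
from scipy.optimize import minimize_scalar

def conjugate_gradient(f, grad, x0, eps_g=1e-6, k_max=200):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g / np.linalg.norm(g)
    for _ in range(k_max):
        if np.linalg.norm(g) <= eps_g:
            break
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 10.0), method="bounded").x
        x = x + alpha * d
        g_new = grad(x)
        if np.linalg.norm(g_new) <= eps_g:
            break
        beta = (g_new @ g_new) / (g @ g)                # Fletcher-Reeves
        d = -g_new / np.linalg.norm(g_new) + beta * d
        g = g_new
    return x

# Bean function from the following example (gradient hand-derived):
f = lambda x: (1 - x[0])**2 + (1 - x[1])**2 + 0.5 * (2 * x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 2 * x[0] * (2 * x[1] - x[0]**2),
                           -2 * (1 - x[1]) + 2 * (2 * x[1] - x[0]**2)])
print(conjugate_gradient(f, grad, [-2.0, 2.0]))   # approx [1.2134, 0.8241]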
The Conjugate Gradient Method : Example
\[
f(x_1, x_2) = (1 - x_1)^2 + (1 - x_2)^2 + \frac{1}{2}\left(2x_2 - x_1^2\right)^2
\]
[Figure: conjugate gradient (CG, 18 iterations) versus steepest descent (SD, 31 iterations) on the bean function.]
Newton’s Method
• The function value and gradient can help to determine the direction to travel, but they do not directly help to determine how far to step to reach a local minimum.
• Second-order information allows us to make a quadratic approximation of the objective function and approximate the right step size to reach a local minimum.
• As we have seen with a quadratic fit search, we can analytically obtain the location where a quadratic approximation has a zero gradient. We can use that location as the next iterate to approach a local minimum.
• The quadratic approximation about a point xk comes from the second-order
Taylor expansion:
\[
q(x) = f(x_k) + (x - x_k) f'(x_k) + \frac{1}{2}(x - x_k)^2 f''(x_k)
\]
\[
\frac{\partial}{\partial x} q(x) = f'(x_k) + (x - x_k) f''(x_k) = 0
\]
\[
x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
\]
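A minimal sketch of the univariate Newton iteration; the example function f(x) = x^4 − 3x^3 + 2 is a hypothetical choice (the slide's example function does not survive in the extracted text).

# Univariate Newton's method: x_{k+1} = x_k - f'(x_k)/f''(x_k).
def newton_1d(df, d2f, x0, tol=1e-8, k_max=50):
    x = x0
    for _ in range(k_max):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Hypothetical example: f(x) = x^4 - 3*x^3 + 2, so f'(x) = 4x^3 - 9x^2 and f''(x) = 12x^2 - 18x.
print(newton_1d(lambda x: 4 * x**3 - 9 * x**2,
                lambda x: 12 * x**2 - 18 * x, x0=3.0))   # converges to 2.25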
Newton’s Method : Example
With x0 = 3, we can form the quadratic using the function value and the first and second derivatives evaluated at that point.
[Figure: successive quadratic approximations and the Newton iterates x0, x1, x2, x3 for the univariate example.]
Newton’s Method : Disadvantages
• The update rule in Newton’s method involves dividing by the second derivative.
The update is undefined if the second derivative is zero, which occurs when the
quadratic approximation is a horizontal line.
• Instability also occurs when the second derivative is very close to zero, in which
case the next iterate will lie very far from the current design point, far from
where the local quadratic approximation is valid.
• Poor local approximations can lead to poor performance with Newton’s method.
[Figure: cases where a poor local quadratic approximation causes Newton’s method to overshoot or oscillate between iterates.]
Newton’s Method : Multivariate Optimization
\[
f(x) \approx q(x) = f(x_k) + \nabla f(x_k)^T (x - x_k) + \frac{1}{2}(x - x_k)^T H_k (x - x_k)
\]
\[
\frac{\partial q(x)}{\partial (x - x_k)} = \nabla f(x_k) + H_k (x - x_k) = 0
\]
We then solve for the next iterate, thereby obtaining Newton's method in multivariate form:
\[
x_{k+1} = x_k - H_k^{-1} \nabla f(x_k)
\]
• If f (x) is quadratic and its Hessian is positive definite, then the update
converges to the global minimum in one step. For general functions, Newton’s
method is often terminated once x ceases to change by more than a given
tolerance.
Newton’s Method : Example
With the starting point x^{(1)} = [9  8]^T, we will use Newton's method to minimize Booth's function,
\[
f(x) = (x_1 + 2x_2 - 7)^2 + (2x_1 + x_2 - 5)^2 .
\]
The gradient at the next iterate x^{(2)} is zero, so we have converged after a single iteration. The Hessian is positive definite everywhere, so x^{(2)} is the global minimum.
Newton’s Method : Multivariate Optimization
[Figure: steepest descent (6 iterations) versus Newton’s method (1 iteration) on the previous example, with a zoomed view near the minimum.]
Require: x0, εG, ∇f, H
k = 0
while ∥∇f(x)∥ > εG && k ≤ kmax do
    ∆ = H(x)⁻¹ ∇f(x)
    x = x − ∆
    k = k + 1
end while
return x
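A multivariate Newton sketch in Python; a linear solve replaces the explicit inverse, and Booth's function from the earlier example is used as a test (gradient and Hessian hand-derived).

# Multivariate Newton's method: solve H(x) * delta = grad(x), then x <- x - delta.
import numpy as np

def newton(grad, hess, x0, eps_g=1e-8, k_max=50):
    x = np.asarray(x0, dtype=float)
    k = 0
    while np.linalg.norm(grad(x)) > eps_g and k <= k_max:
        delta = np.linalg.solve(hess(x), grad(x))
        x = x - delta
        k += 1
    return x

# Booth's function f = (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2 is quadratic,
# so Newton converges in one step from [9, 8].
grad = lambda x: np.array([2 * (x[0] + 2 * x[1] - 7) + 4 * (2 * x[0] + x[1] - 5),
                           4 * (x[0] + 2 * x[1] - 7) + 2 * (2 * x[0] + x[1] - 5)])
hess = lambda x: np.array([[10.0, 8.0], [8.0, 10.0]])
print(newton(grad, hess, [9.0, 8.0]))   # -> [1, 3]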
Secant Method
Newton's method requires the second derivative. The secant method instead approximates it using the last two iterates:
\[
f''(x_k) \approx \frac{f'(x_k) - f'(x_{k-1})}{x_k - x_{k-1}}
\]
\[
x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f'(x_k) - f'(x_{k-1})} \, f'(x_k)
\]
• The secant method requires an additional initial design point. It suffers from the same problems as Newton's method.
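A minimal sketch of the secant update (two starting points are required; the derivative f'(x) = 4x^3 − 9x^2 reuses the hypothetical example from the Newton sketch above).

# Secant method: approximates f'' from the last two iterates of f'.
def secant(df, x0, x1, tol=1e-8, k_max=100):
    for _ in range(k_max):
        d0, d1 = df(x0), df(x1)
        if abs(d1 - d0) < 1e-16:           # avoid division by (near) zero
            break
        x2 = x1 - d1 * (x1 - x0) / (d1 - d0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

print(secant(lambda x: 4 * x**3 - 9 * x**2, 2.0, 3.0))   # converges to 2.25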
Quasi-Newton Methods
Quasi-Newton methods replace the inverse Hessian in Newton's update with an approximation Qk that is updated at each iteration:
\[
x_{k+1} = x_k - \alpha_k Q_k \nabla f_k ,
\]
Quasi-Newton Methods : Davidon-Fletcher-Powell (DFP)
\[
Q_{k+1} = Q_k - \frac{Q_k \gamma_k \gamma_k^T Q_k}{\gamma_k^T Q_k \gamma_k} + \frac{\delta_k \delta_k^T}{\delta_k^T \gamma_k},
\qquad \text{where } \delta_k = x_{k+1} - x_k, \;\; \gamma_k = \nabla f_{k+1} - \nabla f_k .
\]
Quasi-Newton Methods : Davidon-Fletcher-Powell (DFP)
Require: x0, εG, ∇f
k = 0, Q = I
while ∥∇f(x)∥ > εG && k ≤ kmax do
    g = ∇f(x)
    x′ = line_search(f, x, −Q ∗ g)
    g′ = ∇f(x′)
    δ = x′ − x
    γ = g′ − g
    Q = Q − (Q γ γ^T Q)/(γ^T Q γ) + (δ δ^T)/(δ^T γ)
    x = x′
    k = k + 1
end while
return x
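A Python sketch of the DFP loop (illustrative; it reuses the bounded scalar line search and the hand-coded bean-function gradient from the earlier sketches).

# DFP quasi-Newton method: maintain an approximation Q of the inverse Hessian.
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x0, eps_g=1e-6, k_max=200):
    x = np.asarray(x0, dtype=float)
    Q = np.eye(len(x))
    g = grad(x)
    k = 0
    while np.linalg.norm(g) > eps_g and k <= k_max:
        d = -Q @ g
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 10.0), method="bounded").x
        x_new = x + alpha * d
        g_new = grad(x_new)
        delta, gamma = x_new - x, g_new - g
        Q = Q - np.outer(Q @ gamma, gamma @ Q) / (gamma @ Q @ gamma) \
              + np.outer(delta, delta) / (delta @ gamma)
        x, g, k = x_new, g_new, k + 1
    return x

# Bean function (same hand-derived gradient as in the CG sketch):
f = lambda x: (1 - x[0])**2 + (1 - x[1])**2 + 0.5 * (2 * x[1] - x[0]**2)**2
grad = lambda x: np.array([-2 * (1 - x[0]) - 2 * x[0] * (2 * x[1] - x[0]**2),
                           -2 * (1 - x[1]) + 2 * (2 * x[1] - x[0]**2)])
print(dfp(f, grad, [-2.0, 2.0]))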
Quasi-Newton Methods : Broyden–Fletcher–Goldfarb–Shanno (BFGS)
\[
Q_{k+1} = Q_k - \frac{\delta_k \gamma_k^T Q_k + Q_k \gamma_k \delta_k^T}{\delta_k^T \gamma_k}
+ \left( 1 + \frac{\gamma_k^T Q_k \gamma_k}{\delta_k^T \gamma_k} \right) \frac{\delta_k \delta_k^T}{\delta_k^T \gamma_k}
\]
Require: x0, εG, ∇f
k = 0, Q = I
while ∥∇f(x)∥ > εG && k ≤ kmax do
    g = ∇f(x)
    x′ = line_search(f, x, −Q ∗ g)
    g′ = ∇f(x′)
    δ = x′ − x
    γ = g′ − g
    Q = Q − (δ γ^T Q + Q γ δ^T)/(δ^T γ) + (1 + (γ^T Q γ)/(δ^T γ)) (δ δ^T)/(δ^T γ)
    x = x′
    k = k + 1
end while
return x
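In practice one rarely hand-codes BFGS; as a usage sketch, scipy's off-the-shelf implementation can be applied directly (shown here on the bean function).

# Using scipy's BFGS implementation on the bean function.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (1 - x[0])**2 + (1 - x[1])**2 + 0.5 * (2 * x[1] - x[0]**2)**2
res = minimize(f, x0=np.array([-2.0, 2.0]), method="BFGS", tol=1e-8)
print(res.x, res.fun)    # approx [1.2134, 0.8241], 0.0919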
Compare Four Methods
[Figure: four panels comparing the iteration paths of the four methods.]
Compare Four Methods
\[
\min_{x_1, x_2} \;\; \frac{1}{2} k_1 \left( \sqrt{(l_1 + x_1)^2 + x_2^2} - l_1 \right)^2
+ \frac{1}{2} k_2 \left( \sqrt{(l_2 - x_1)^2 + x_2^2} - l_2 \right)^2 - m g x_2
\]
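A sketch of solving this two-spring problem numerically; the parameter values k1, k2, l1, l2, m, g below are illustrative assumptions, since the slide does not specify them.

# Two-spring potential-energy minimization with illustrative parameters.
import numpy as np
from scipy.optimize import minimize

k1, k2, l1, l2, m, g = 100.0, 100.0, 10.0, 10.0, 5.0, 9.81

def energy(x):
    s1 = np.hypot(l1 + x[0], x[1]) - l1      # stretch of spring 1
    s2 = np.hypot(l2 - x[0], x[1]) - l2      # stretch of spring 2
    return 0.5 * k1 * s1**2 + 0.5 * k2 * s2**2 - m * g * x[1]

res = minimize(energy, x0=np.array([0.0, 1.0]), method="BFGS")
print(res.x, res.fun)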
Reference
2. Mykel J. Kochenderfer and Tim A. Wheeler, “Algorithms for Optimization,” The MIT Press, 2019.