Lecture 5
November 6, 2017
Most algorithms for unconstrained optimization problems are what we call line search type algorithms.
Definition (line search type algorithm). A line search type algorithm generates iterates xk+1 = xk + αk pk , where pk is a search direction and αk > 0 is a step length chosen along pk .
Most algorithms we consider are inherently local, meaning that the search direction pk is only
based on the information at the current point xk , that is, f (xk ), ∇f (xk ), and ∇2 f (xk ).
Think of a near-sighted mountain climber: she is in a deep fog and can only check her barometer for the height and feel the steepness of the slope under her feet.
TMA947 / MMG621 – Nonlinear optimization Lecture 5
Vector pk is a descent direction at xk if f (xk + αpk ) < f (xk ) for all α ∈ (0, δ] for some δ > 0.
The steepest descent direction at xk is obtained by solving

minimize ∇f (xk )T p   over p ∈ Rn with ∥p∥ = 1.
Let Q ∈ Rn×n be an arbitrary symmetric, positive definite matrix. Then pk = −Q∇f (xk ) is a descent direction for f at xk , because

∇f (xk )T pk = −∇f (xk )T Q∇f (xk ) < 0

whenever ∇f (xk ) ≠ 0, by the positive definiteness of Q.
Examples:
– Steepest descent: Q = I,
– Newton’s method: Q = [∇2 f (xk )]−1 .
We will now derive Newton’s method. To do so, we need to assume that f ∈ C 2 . We also first
assume that ∇2 f (x) is positive definite. A second-order Taylor approximation is then:
f (xk + p) − f (xk ) ≈ ∇f (xk )T p + (1/2) pT ∇2 f (xk )p =: φxk (p).
We now try to minimize this approximation by setting the gradient of φxk (p) to zero:

∇φxk (p) = ∇f (xk ) + ∇2 f (xk )p = 0.

Choosing the vector fulfilling this, we obtain pk = −[∇2 f (xk )]−1 ∇f (xk ) as the search direction. When n = 1, we get that pk = −f ′ (xk )/f ′′ (xk ).
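As a minimal sketch of computing the Newton direction pk = −[∇2 f (xk )]−1 ∇f (xk ) numerically (NumPy assumed; the example function f (x) = x1^2 + 2 x2^2 and its derivatives are supplied by hand, not taken from the lecture):

```python
import numpy as np

def newton_direction(grad, hess):
    """Newton search direction: solve H p = -g instead of inverting H.

    grad: gradient of f at x_k; hess: Hessian of f at x_k, assumed
    symmetric positive definite so the linear system has a unique solution.
    """
    return np.linalg.solve(hess, -grad)

# Example: f(x) = x1^2 + 2*x2^2, so grad f = (2x1, 4x2), Hessian = diag(2, 4).
x = np.array([3.0, 1.0])
g = np.array([2 * x[0], 4 * x[1]])
H = np.diag([2.0, 4.0])
p = newton_direction(g, H)
# For this quadratic f, a single full Newton step x + p lands on the minimizer.
```

Since f is itself quadratic here, the second-order model is exact and one Newton step reaches the minimum at the origin.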
When the Hessian ∇2 f (xk ) is positive definite, this search direction is a descent direction. But when ∇2 f (xk ) is negative definite (it may also be non-invertible), the search direction is an ascent direction, meaning that Newton's method does not differentiate between minimization and maximization problems. The solution to this problem is to modify ∇2 f (xk ) by adding a diagonal matrix γI such that ∇2 f (xk ) + γI is positive definite (this can always be done, why?). This method is called the Levenberg–Marquardt modification. We thus take as search direction
pk = −[∇2 f (xk ) + γI]−1 ∇f (xk ).
Note that
– Steepest descent: γ = ∞,
– Newton’s method: γ = 0.
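A minimal Python sketch of the Levenberg–Marquardt modification: γ is increased until a Cholesky factorization of ∇2 f (xk ) + γI succeeds, which certifies positive definiteness. The doubling schedule and the starting value 10^−3 are illustrative choices, not part of the lecture:

```python
import numpy as np

def lm_direction(grad, hess, gamma=0.0):
    """Levenberg-Marquardt direction: p = -(H + gamma*I)^{-1} g.

    gamma is grown until H + gamma*I passes a Cholesky test, i.e. is
    positive definite; the schedule below is an illustrative choice.
    """
    n = hess.shape[0]
    while True:
        try:
            L = np.linalg.cholesky(hess + gamma * np.eye(n))
            break
        except np.linalg.LinAlgError:   # not positive definite yet
            gamma = max(2 * gamma, 1e-3)
    # Solve (H + gamma*I) p = -g using the Cholesky factor L L^T.
    y = np.linalg.solve(L, -grad)
    return np.linalg.solve(L.T, y)

# Indefinite Hessian: a plain Newton step could point uphill here,
# but the modified direction is guaranteed to be a descent direction.
g = np.array([1.0, 1.0])
H = np.diag([1.0, -2.0])
p = lm_direction(g, H)
```

The descent property can be checked directly: ∇f T p < 0 by the positive definiteness of the modified matrix.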
What happens when we cannot compute ∇2 f (xk )? We try to approximate the Hessian in some way by choosing an approximating matrix B k . From the Taylor expansion of ∇f around xk we have that

∇f (xk + p) ≈ ∇f (xk ) + ∇2 f (xk )p,

so a good approximation B k should satisfy B k (xk+1 − xk ) ≈ ∇f (xk+1 ) − ∇f (xk ).
Many different choices of B k exist, and they lead to what is called quasi-Newton methods.
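One standard choice of B k is the BFGS update; the formula below is not derived in these notes and is shown only as an illustrative Python sketch:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B_k.

    s = x_{k+1} - x_k (step taken), y = grad f(x_{k+1}) - grad f(x_k)
    (change in gradient). The updated matrix satisfies the secant
    condition B_new @ s = y, mimicking the Taylor relation above.
    """
    Bs = B @ s
    return (B
            - np.outer(Bs, Bs) / (s @ Bs)    # remove old curvature along s
            + np.outer(y, y) / (y @ s))      # insert observed curvature

# One update starting from the identity approximation.
B = np.eye(2)
s = np.array([1.0, 0.0])
y = np.array([2.0, 1.0])
B1 = bfgs_update(B, s, y)
```

The key property to verify is the secant condition B1 s = y, which holds by construction.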
We now turn to the choice of the step length αk . Consider the one-dimensional problem of minimizing φ(α) := f (xk + αpk ) over α ≥ 0; its optimality conditions are

φ′ (α∗ ) ≥ 0,   α∗ φ′ (α∗ ) = 0,   α∗ ≥ 0.
These conditions state that if α∗ > 0, then φ′ (α∗ ) = 0, which implies that

∇f (xk + α∗ pk )T pk = 0,

i.e., the gradient at the new point is orthogonal to the search direction.
However, solving the line search problem to optimality is unnecessary; the optimal solution to the original problem lies elsewhere anyway. Examples of methods to choose the step length αk :
– Interpolation: Use f (xk ), ∇f (xk ), and ∇f (xk )T pk to approximate φ(α) = f (xk + αpk ) quadratically. Then minimize this approximation of φ analytically.
– Newton's method: Repeat improvements from a quadratic approximation: α := α − φ′ (α)/φ′′ (α).
– Golden section: Derivative-free method which shrinks an interval wherein a solution to φ′ (α) = 0 lies.
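The golden section method from the list above can be sketched in Python as follows (a minimal version for a unimodal φ on an interval [a, b]; the tolerance is an illustrative choice):

```python
def golden_section(phi, a, b, tol=1e-6):
    """Shrink [a, b] around a minimizer of a unimodal phi, derivative-free.

    Each iteration keeps the subinterval that must contain the minimizer;
    the golden ratio lets one of the two interior evaluations be reused.
    """
    r = (5 ** 0.5 - 1) / 2            # inverse golden ratio, about 0.618
    x1 = b - r * (b - a)
    x2 = a + r * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while b - a > tol:
        if f1 < f2:                   # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = phi(x1)
        else:                         # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = phi(x2)
    return (a + b) / 2

# Example: phi(alpha) = (alpha - 0.3)^2 has its minimizer at 0.3.
alpha = golden_section(lambda a: (a - 0.3) ** 2, 0.0, 1.0)
```

Each iteration shrinks the interval by the constant factor 0.618, so the cost to reach a given tolerance is predictable in advance.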
We will often use what is denoted the Armijo rule. The idea is to choose a step length α which provides a sufficient decrease in f . We have that

f (xk + αpk ) − f (xk ) ≈ α∇f (xk )T pk

for very small values of α > 0, meaning that we predict that the objective function will decrease by α∇f (xk )T pk if we move a step length α in the direction of pk . Now this might be too optimistic, and we will therefore accept the step length if the actual decrease is at least a fraction µ (µ is small, typically µ ∈ [0.001, 0.01]) of the predicted decrease, i.e., we will accept α if

f (xk + αpk ) − f (xk ) ≤ µα∇f (xk )T pk ,

or equivalently, if

φ(α) − φ(0) ≤ µαφ′ (0).

We usually start with α = 1. If the condition is not fulfilled, then choose α := α/2 and repeat.
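The Armijo rule with halving can be sketched in a few lines of Python (φ and its slope at zero are passed in; the example minimizes f (x) = x^2 from xk = 1 with pk = −1, an illustrative choice):

```python
def armijo_step(phi, dphi0, mu=0.01):
    """Armijo backtracking: start at alpha = 1 and halve until
    phi(alpha) - phi(0) <= mu * alpha * phi'(0).

    phi(a) = f(x_k + a*p_k); dphi0 = grad f(x_k)^T p_k, which is
    negative for a descent direction, so the right-hand side is negative.
    """
    alpha = 1.0
    while phi(alpha) - phi(0.0) > mu * alpha * dphi0:
        alpha /= 2.0
    return alpha

# Example: f(x) = x^2 at x_k = 1 with p_k = -1, so phi(a) = (1 - a)^2
# and phi'(0) = -2. The full step alpha = 1 already satisfies the test.
phi = lambda a: (1.0 - a) ** 2
alpha = armijo_step(phi, dphi0=-2.0)
```

Termination is guaranteed for a descent direction: for small enough α the actual decrease approaches the predicted one, which exceeds the fraction µ of it.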
Figure 1: The interval (R) accepted by the Armijo step length rule
Convergence
In order to state a convergence result for the algorithm, we make an additional assumption for
the search directions. We need the directions pk to fulfill
−∇f (xk )T pk / (∥∇f (xk )∥ · ∥pk ∥) ≥ s1 ,   ∥pk ∥ ≥ s2 ∥∇f (xk )∥,   and   ∥pk ∥ ≤ M   (2)
for some s1 , s2 , M > 0, where the first inequality forces the angle between pk and −∇f (xk ) to stay between 0 and π/2, but not too close to π/2. The second inequality makes sure that the only case when pk can be zero is when the gradient is zero. These conditions guarantee a certain descent quality.
Theorem (convergence of unconstrained algorithm). Suppose f ∈ C 1 and that for the starting point x0 the level set {x ∈ Rn | f (x) ≤ f (x0 )} is bounded. Consider the iterative algorithm described above, and suppose that for all k, pk fulfills (2) and αk is chosen according to the Armijo rule. Then every limit point of the sequence {xk } is a stationary point, i.e., a point x with ∇f (x) = 0.
If we add the assumption that f is a convex function, then every limit point of {xk } is a globally optimal solution.
We cannot terminate the algorithm when ∇f (xk ) = 0, since this rarely happens in practice. We need to have some tolerance level ε > 0. Three examples of termination criteria are
– ∥∇f (xk )∥ ≤ ε,
– |f (xk ) − f (xk−1 )| ≤ ε,
– ∥xk − xk−1 ∥ ≤ ε.
Trust region methods use a quadratic approximation of the function around the current iterate xk , avoiding a line search and instead bounding the length of the search direction. Let
φxk (p) := f (xk ) + ∇f (xk )T p + (1/2) pT ∇2 f (xk )p.
Since this is a local approximation, we restrict it to a trust region in the neighborhood of xk , i.e., we trust the model in the region where ∥p∥ ≤ ∆k . We then solve the problem

minimize φxk (p)   subject to ∥p∥ ≤ ∆k ,

and let the solution be pk . Then we update our iterate as xk+1 = xk + pk . We also update the trust region parameter ∆k depending on the progress so far (the ratio of actual reduction to predicted reduction).
The method is robust and possesses strong convergence properties. More detailed information about trust region methods can be found in the book on pages 301–302.
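Solving the trust region subproblem exactly is itself nontrivial; one common cheap approximation, not covered in these notes, is the Cauchy point: the minimizer of the quadratic model along the steepest descent direction within the region. A Python sketch:

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Cauchy point for the subproblem: minimize the quadratic model
    g^T p + (1/2) p^T H p along p = -t*g subject to ||p|| <= delta.

    If g^T H g <= 0 the model decreases all the way to the boundary;
    otherwise the unconstrained 1-D minimizer is clipped to the region.
    """
    gnorm = np.linalg.norm(g)
    gHg = g @ H @ g
    if gHg <= 0:
        tau = 1.0                                   # go to the boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * gHg))  # clip 1-D minimizer
    return -tau * (delta / gnorm) * g

# With a large region the step is the unconstrained 1-D minimizer -g here.
g = np.array([2.0, 0.0])
H = np.eye(2)
p = cauchy_point(g, H, delta=10.0)
```

The Cauchy point already guarantees a sufficient model decrease, which is what the convergence theory of trust region methods builds on.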
In some cases the value of the objective function f (x) is given through some unknown simulation
procedure. This implies that we do not have a clear representation of the gradient of the objective
function. In some cases, we can perform numerical differentiation and approximate the partial
derivatives as, e.g.,
∂f (x)/∂xi ≈ (f (x + αei ) − f (x))/α,
where ei = (0, . . . , 0, 1, 0, . . . , 0)T is the i-th unit vector in Rn and α > 0 is small.
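The forward-difference approximation above can be sketched as follows (α = 10^−6 is an illustrative choice; too small a value amplifies rounding error):

```python
import numpy as np

def fd_gradient(f, x, alpha=1e-6):
    """Forward-difference gradient: component i is
    (f(x + alpha*e_i) - f(x)) / alpha, with e_i the i-th unit vector."""
    n = len(x)
    fx = f(x)                 # reuse the base value for all n components
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        g[i] = (f(x + alpha * e) - fx) / alpha
    return g

# Example: f(x) = x1^2 + 3*x2 at (1, 2); the exact gradient is (2, 3).
g = fd_gradient(lambda x: x[0] ** 2 + 3 * x[1], np.array([1.0, 2.0]))
```

Note the cost: one gradient estimate requires n + 1 evaluations of f, which matters when each evaluation is an expensive simulation.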
If the simulation is not accurate, we get bad derivative information. We can then use derivative-free methods instead. These try to build a model fˆ of the objective function f by evaluating the objective function at some specific test points, and then optimize the model fˆ instead of the function f .