School of Computer Science and Applied Mathematics
LEARNING OUTCOMES
By the end of this lecture, you should be able to:
1. Describe the stepsize selection of line search methods.
2. Describe convergence of gradient methods with dynamic stepsize selection.
Reference:
• Chapter 3, Jorge Nocedal and Stephen J. Wright, ‘Numerical Optimization’.
• Chapter 1, Dimitri P. Bertsekas, ‘Nonlinear Programming’.
3.1 Introduction
In unconstrained optimization, we minimize an objective function that depends on real variables, with no
restrictions on the values of these variables. The formulation is

min_{x ∈ ℝ^n} f(x),        (3.1)

where f : ℝ^n → ℝ is a smooth function. Iterative methods for this problem generate a sequence of iterates of the form

x_{k+1} = x_k + s_k d_k,        (3.2)

where d_k is a search direction and s_k > 0 is the stepsize.
The aim of the different choices of s_k is mainly to ensure that the algorithm defined by (3.2) is
globally convergent, possibly without deteriorating the rate of convergence. Up to this point we have assumed a
constant stepsize, i.e. a fixed stepsize s_k = s is used for all iterations. The constant stepsize rule is
straightforward. However, if the stepsize is too large, divergence will occur, while if it is too small,
the convergence rate may be very slow. Thus, the constant stepsize rule is useful only for problems where
an appropriate constant stepsize value is known or can be determined fairly easily.
The best possible choice of the stepsize s_k is the value that minimizes the objective function f along the
search direction d_k:

min_{s > 0} f(x_k + s d_k).        (3.3)

However, unless the function f has some particular structure, an exact minimization of (3.3) is computationally
expensive. Moreover, in general, the overall minimization algorithm does not gain any special advantage
from the knowledge of such an optimal s_k. Therefore, it is more convenient to use approximate methods,
i.e., computationally simple methods that guarantee particular convergence properties. We discuss how to
choose the stepsize s_k in a dynamic manner by approximately solving the one-dimensional minimization
problem (3.3).
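One case where (3.3) can be solved exactly is a strictly convex quadratic: if f(x) = ½ x^T Q x − b^T x with Q positive definite, then the exact stepsize along a descent direction d_k has the closed form s_k = −∇f(x_k)^T d_k / (d_k^T Q d_k). The following minimal sketch illustrates this closed form; the particular Q, b and starting point are assumptions chosen only for the example.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 x^T Q x - b^T x (Q and b are assumed data).
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad(x):
    return Q @ x - b

x = np.array([2.0, -1.0])
d = -grad(x)                        # steepest descent direction

# Exact minimizer of f(x + s d) over s for a quadratic:
s_exact = -(grad(x) @ d) / (d @ Q @ d)
print("exact stepsize:", s_exact)
print("f before:", f(x), " f after:", f(x + s_exact * d))
```

For a general nonlinear f no such closed form exists, which is why the approximate rules discussed below are used instead.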
3.2 Gradient Methods

We consider gradient methods of the form

x_{k+1} = x_k − s_k D_k ∇f(x_k),        (3.4)

where D_k is a positive definite symmetric matrix. Since d_k = −D_k ∇f(x_k), the descent condition
∇f(x_k)^T d_k < 0 is written as

−∇f(x_k)^T D_k ∇f(x_k) < 0,

and it holds due to the positive definiteness of D_k. Here are some choices of the matrix D_k, resulting in
methods that are widely used:
• Steepest descent method: The choice of D_k is the identity matrix, that is, D_k = I. This is the
simplest choice, but it often leads to slow convergence.
• Newton's method: The choice of D_k is the inverse Hessian of the objective function at the current
iterate, that is, D_k = [∇²f(x_k)]^{-1}. Newton's method generally converges faster than the steepest
descent method (a sketch contrasting the two choices is given below).
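The following minimal sketch contrasts the two choices of D_k on an assumed convex quadratic test problem; the function, the fixed stepsizes and the iteration counts are all assumptions chosen only for illustration.

```python
import numpy as np

# Assumed test objective: f(x) = 0.5 x^T Q x, whose gradient is Q x and whose
# Hessian is the constant matrix Q.
Q = np.array([[10.0, 0.0],
              [0.0, 1.0]])

def grad(x):
    return Q @ x

def hess(x):
    return Q

def gradient_method(x0, D, s, iters):
    """Iterate x_{k+1} = x_k - s * D(x_k) @ grad(x_k) with a fixed stepsize s."""
    x = x0.copy()
    for _ in range(iters):
        x = x - s * D(x) @ grad(x)
    return x

x0 = np.array([1.0, 1.0])
x_sd = gradient_method(x0, lambda x: np.eye(2), s=0.1, iters=50)               # D_k = I
x_nt = gradient_method(x0, lambda x: np.linalg.inv(hess(x)), s=1.0, iters=5)   # D_k = inverse Hessian
print("steepest descent:", x_sd, " Newton:", x_nt)
```

On this quadratic, the Newton choice with s_k = 1 reaches the minimizer in a single step, while the identity choice only approaches it gradually, which is the behaviour described in the two bullet points above.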
We need to choose the stepsize by solving the one-dimensional minimization problem (3.3). An exact solution
is, in general, difficult to obtain. Most practical methods try a few candidate values of s and select one that
reduces the function value. The main aim is to ensure a sufficiently large decrease in f
without taking a tiny step.
3.3 Line Search Methods

Once the search direction is computed, we have to decide how far to move along that direction. The distance
moved along the search direction d_k is determined by the stepsize s_k. We would like to choose s_k to give
a substantial reduction of f. The ideal choice of s_k would be the global minimizer of the one-dimensional
optimization problem

min_{s > 0} f(x_k + s d_k).        (3.5)

By solving (3.5) exactly, we would derive the maximum benefit from the search direction d_k (exact line
search). However, exact minimization may be computationally expensive, and it is usually unnecessary.
Most practical methods perform an inexact line search, i.e. they find an approximate solution of (3.5).
A typical inexact line search method tries out a sequence of candidate values for s, stopping and accepting
one of these values when certain conditions are satisfied. We impose the following two conditions on the
stepsize: (1) s_k has to guarantee a sufficient reduction of the objective function f, and (2) s_k has to be
sufficiently far from 0. Inexact line search methods find an interval of acceptable values for s_k
in which these conditions are satisfied; a bisection or interpolation step can then be used to compute a good
stepsize within this interval.
3.3.1 Armijo Condition
A popular inexact line search condition stipulates that s_k must give a sufficient decrease in the objective
function f, as measured by the following inequality:

f(x_k + s_k d_k) ≤ f(x_k) + c_1 s_k ∇f(x_k)^T d_k,        (3.6)

for some constant c_1 ∈ (0, 1). In other words, the reduction in f is required to be proportional to both the stepsize s_k
and the directional derivative ∇f(x_k)^T d_k. The inequality (3.6) is called the Armijo condition. Can you
spot any weakness of the Armijo condition?
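The following minimal sketch checks (3.6) at a few candidate stepsizes; the helper name, the value of c_1 and the small test problem are assumptions chosen only for illustration. Note how every sufficiently small stepsize passes the test, which is exactly the weakness hinted at above.

```python
import numpy as np

def armijo_holds(f, grad, x, d, s, c1=1e-3):
    """Check the sufficient decrease (Armijo) condition (3.6) at stepsize s."""
    return f(x + s * d) <= f(x) + c1 * s * (grad(x) @ d)

# Tiny assumed test problem: f(x) = ||x||^2 with a steepest descent direction.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
d = -grad(x)

for s in (1.0, 0.5, 0.25, 1e-6):
    print(s, armijo_holds(f, grad, x, d, s))   # s = 1.0 fails; all smaller steps pass
```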
The Armijo condition may not be enough by itself to ensure that the algorithm makes reasonable progress,
since it is satisfied for all sufficiently small values of s. To rule out unacceptably short steps, the stepsize
s_k can be required to satisfy the Goldstein conditions, defined as follows.

Definition 3.1. Let c_1 and c_2 be two constants such that 0 < c_1 < c_2 < 1. The stepsize s_k satisfies the
Goldstein conditions if

f(x_k + s_k d_k) ≤ f(x_k) + c_1 s_k ∇f(x_k)^T d_k,
f(x_k + s_k d_k) ≥ f(x_k) + c_2 s_k ∇f(x_k)^T d_k.

The first condition ensures a sufficient reduction in f, while the second ensures a sufficient distance
between x_k and x_{k+1}. The weak point of the Goldstein conditions is that they can exclude the optimal value
of the stepsize.
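A minimal sketch of checking both Goldstein conditions at a candidate stepsize follows; the helper name, the constants and the test problem are assumptions made for illustration. The very small stepsize is rejected by the second condition.

```python
import numpy as np

def goldstein_holds(f, grad, x, d, s, c1=0.1, c2=0.9):
    """Check the two Goldstein conditions of Definition 3.1 at stepsize s."""
    g_dot_d = grad(x) @ d
    upper = f(x) + c1 * s * g_dot_d   # sufficient decrease bound
    lower = f(x) + c2 * s * g_dot_d   # bound that rules out tiny steps
    return lower <= f(x + s * d) <= upper

f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
d = -grad(x)

for s in (0.5, 0.4, 1e-6):
    print(s, goldstein_holds(f, grad, x, d, s))   # the tiny step is rejected
```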
An alternative way of preventing the step length from being too short is to impose conditions on the
derivative of the objective function f. We introduce the curvature condition, which requires s_k to satisfy

c_2 ∇f(x_k)^T d_k ≤ ∇f(x_k + s_k d_k)^T d_k,        (3.7)

for some constant c_2 ∈ (c_1, 1), where c_1 is the constant from the Armijo condition (3.6). The right-hand
side of (3.7) is simply the derivative of the objective in (3.5) with respect to s, evaluated at s = s_k. The curvature condition ensures that the
slope of the univariate function f(x_k + s d_k) at s = s_k is greater than c_2 times the initial slope at s = 0.
This makes sense because if the slope ∇f(x_k + s d_k)^T d_k is strongly negative, we have an indication that
we can reduce f significantly by moving further along the search direction d_k. On the other hand, if ∇f(x_k + s d_k)^T d_k is
only slightly negative or even positive, it is an indication that we cannot expect much more decrease in f along
the direction d_k.
We couple the sufficient decrease condition (3.6) with the curvature condition (3.7) to define the Wolfe
conditions:

Definition 3.2. Let c_1 and c_2 be two constants such that 0 < c_1 < c_2 < 1. The stepsize s_k satisfies the
Wolfe conditions if

f(x_k + s_k d_k) ≤ f(x_k) + c_1 s_k ∇f(x_k)^T d_k,
∇f(x_k + s_k d_k)^T d_k ≥ c_2 ∇f(x_k)^T d_k.
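A minimal sketch of checking the Wolfe conditions of Definition 3.2 at a candidate stepsize follows; the helper name, the constants and the test problem are assumptions made for illustration. Unlike the Armijo check above, a very small stepsize now fails the curvature test.

```python
import numpy as np

def wolfe_holds(f, grad, x, d, s, c1=1e-3, c2=0.9):
    """Check the Wolfe conditions (sufficient decrease + curvature) at stepsize s."""
    g0_dot_d = grad(x) @ d
    sufficient_decrease = f(x + s * d) <= f(x) + c1 * s * g0_dot_d   # condition (3.6)
    curvature = grad(x + s * d) @ d >= c2 * g0_dot_d                 # condition (3.7)
    return sufficient_decrease and curvature

f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
d = -grad(x)

for s in (1e-6, 0.25, 0.5):
    print(s, wolfe_holds(f, grad, x, d, s))   # the tiny step fails the curvature test
```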
The stepsize may satisfy the Wolfe conditions without being particularly close to a minimizer of (3.5). We
can modify the curvature condition to force s_k to lie in a small neighbourhood of a local minimizer of
(3.5). This leads to the definition of the strong Wolfe conditions:

Definition 3.3. Let c_1 and c_2 be two constants such that 0 < c_1 < c_2 < 1. The stepsize s_k satisfies the
strong Wolfe conditions if

f(x_k + s_k d_k) ≤ f(x_k) + c_1 s_k ∇f(x_k)^T d_k,
|∇f(x_k + s_k d_k)^T d_k| ≤ c_2 |∇f(x_k)^T d_k|.

The second of the strong Wolfe conditions no longer allows the slope ∇f(x_k + s_k d_k)^T d_k to be too positive.
Hence, it excludes points that are far from stationary points of (3.5).
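Under the same assumed setup as the Wolfe sketch above, only the curvature test changes for the strong Wolfe conditions; the sketch below is a hypothetical variant written for illustration, not code from the notes.

```python
def strong_wolfe_holds(f, grad, x, d, s, c1=1e-3, c2=0.9):
    """Check the strong Wolfe conditions (Definition 3.3) at stepsize s."""
    g0_dot_d = grad(x) @ d
    sufficient_decrease = f(x + s * d) <= f(x) + c1 * s * g0_dot_d
    # Taking absolute values also rejects steps whose new slope is too positive.
    curvature = abs(grad(x + s * d) @ d) <= c2 * abs(g0_dot_d)
    return sufficient_decrease and curvature
```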
We now look at the implementation of the above conditions. The Armijo condition alone does not ensure
that the method makes reasonable progress along the given search direction. However, if the
line search method chooses its candidate steps appropriately by using a so-called backtracking approach,
then we can dispense with the second Goldstein condition and use only the sufficient decrease
condition (Armijo condition) to terminate the line search procedure. In its most basic form, backtracking
proceeds as described in Algorithm 1.
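A minimal sketch of a backtracking line search of this form follows; the contraction factor ρ = 0.5, the default parameter values and the small test problem are assumptions chosen only for illustration.

```python
import numpy as np

def backtracking(f, grad, x, d, s_bar=1.0, rho=0.5, c1=1e-3):
    """Shrink the stepsize from s_bar until the Armijo condition (3.6) holds."""
    s = s_bar
    g0_dot_d = grad(x) @ d     # directional derivative, negative for a descent direction
    while f(x + s * d) > f(x) + c1 * s * g0_dot_d:
        s *= rho               # contract the candidate stepsize
    return s

# Example: one steepest descent step on the assumed objective f(x) = ||x||^2.
f = lambda x: x @ x
grad = lambda x: 2 * x
x = np.array([1.0, -2.0])
d = -grad(x)
s = backtracking(f, grad, x, d)
print("accepted stepsize:", s, " new f:", f(x + s * d))
```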
In this procedure, the initial stepsize s̄ is chosen to be 1 in Newton's method, but it can take different values
in other algorithms such as the steepest descent method. The parameter s̄ restricts the search for the stepsize to
the interval (0, s̄]. In practice, c_1 is chosen to be very small, for example c_1 = 10⁻³. An acceptable
stepsize will be found after a finite number of trials, because s_k will eventually become small enough
that the sufficient decrease condition holds. Notice that the other conditions need not be implemented, because the
backtracking strategy itself avoids excessively short steps.
3.4 Convergence
We now state a convergence result for gradient methods that use the Armijo condition to select the stepsize.

Theorem 3.1. Let {x_k} be a sequence generated by a gradient method of the form (3.4), and assume that d_k satisfies
the descent condition and that the stepsize s_k is chosen using the Armijo condition. Then every limit point of
{x_k} is a stationary point.

Refer to the references for a proof of this theorem. Similar theorems can be stated for the Goldstein and Wolfe
conditions. The same conclusion as in Theorem 3.1 holds if the exact minimizer of f(x_k + s d_k) is used
as the stepsize.
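As a purely numerical illustration of Theorem 3.1 (the test function, the starting point and all parameter values below are assumptions, not part of the notes), steepest descent with Armijo backtracking drives the gradient norm towards zero:

```python
import numpy as np

# Assumed smooth test function and its gradient.
def f(x):
    return x[0]**2 + 2.0 * x[1]**2 + 0.1 * x[0]**4

def grad(x):
    return np.array([2.0 * x[0] + 0.4 * x[0]**3, 4.0 * x[1]])

x = np.array([2.0, 1.0])
for k in range(200):
    d = -grad(x)                              # descent direction (D_k = I)
    s, g0_dot_d = 1.0, grad(x) @ d
    while f(x + s * d) > f(x) + 1e-3 * s * g0_dot_d:
        s *= 0.5                              # backtracking on the Armijo condition
    x = x + s * d
print("final x:", x, " ||grad(x)||:", np.linalg.norm(grad(x)))
```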
3.5 Summary
In this lecture, we have discussed inexact line search methods. Along a given descent direction, we take a
step that reduces the function value to an acceptable degree. The Armijo condition measures whether the reduction
in the objective function is acceptable. In addition to requiring a sufficient decrease, the Goldstein conditions also
rule out tiny stepsizes. However, the Goldstein conditions may exclude the optimal value of the stepsize;
the Wolfe conditions resolve this problem. In practice, an implementation of the Armijo condition with backtracking is usually
sufficient.