Hauser Lecture 2
Unconstrained Optimisation
• The algorithms we will construct have the common feature that, starting
from an initial educated guess x0 ∈ Rn for a solution of (UCM), a sequence
of iterates (xk)k∈N ⊂ Rn is produced with
xk → x∗ ∈ Rn,
where x∗ satisfies the first and second order necessary optimality conditions
g(x∗) = 0,
H(x∗) ⪰ 0 (positive semidefiniteness).
• We usually wish to make progress towards solving (UCM) in every iteration, that is, we will construct xk+1 so that
f(xk+1) < f(xk)
(descent methods).
Actual methods differ from one another in how the two steps of each iteration, computing a search direction pk and computing a step length αk (steps i) and ii)), are carried out.
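The following Python sketch illustrates the generic scheme (it is not the lecture's own pseudocode; the names generic_line_search, direction_fn, step_fn and the tolerances are placeholders): the loop repeats the update xk+1 = xk + αk pk until the first-order condition g(xk) ≈ 0 holds.

```python
import numpy as np

def generic_line_search(f, grad, x0, direction_fn, step_fn,
                        tol=1e-6, max_iter=1000):
    """Generic descent loop: x_{k+1} = x_k + alpha_k * p_k.

    direction_fn(x, g) returns a descent direction p_k, and
    step_fn(f, grad, x, p) returns a step length alpha_k > 0.
    The loop stops once the first-order condition g(x_k) ~ 0 holds.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # first-order optimality reached
            break
        p = direction_fn(x, g)            # step i): search direction p_k
        alpha = step_fn(f, grad, x, p)    # step ii): step length alpha_k
        x = x + alpha * p                 # descent update
    return x
```

With direction_fn returning −g and step_fn a backtracking line search (both discussed below), this becomes the steepest-descent method illustrated later.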
Computing a Step Length αk
(The objective function f(x) = x² and the iterates xk+1 = xk + αk pk generated by the descent directions pk = −1 and steps αk = 1/2^(k+1) from x0 = 2.)
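In this example the iterates are xk = 1 + 1/2^k, so f decreases at every step and yet xk → 1, which is not the minimiser x∗ = 0: the steps are too short. A few lines of Python (illustrative only) confirm this:

```python
# iterates x_{k+1} = x_k + alpha_k * p_k with p_k = -1, alpha_k = 1/2^(k+1)
x = 2.0
for k in range(25):
    x = x + (1 / 2 ** (k + 1)) * (-1.0)
print(x)   # ~1.00000003: converges to 1, not to the minimiser x* = 0 of x^2
```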
Exact Line Search: αk is chosen to minimise f(xk + α pk) exactly over α > 0. In practice this is usually too expensive, and inexact line searches based on two principles are used instead:
• Formulate a criterion that assures that steps are neither too long nor too short.
• Construct a sequence of updates that satisfies the above criterion after very few steps.
Backtracking Line Search:
1. Given αinit > 0 (e.g., αinit = 1), let α(0) = αinit and l = 0.
2. Until f(xk + α(l) pk) < f(xk),
i) set α(l+1) = τ α(l), where τ ∈ (0, 1) is fixed (e.g., τ = 1/2),
ii) increment l by 1.
3. Set αk = α(l).
This method prevents the step from getting too small, but it does not prevent
steps that are too long relative to the decrease in f.
Backtracking-Armijo Line Search:
1. Given αinit > 0 (e.g., αinit = 1), let α(0) = αinit and l = 0.
2. Until f(xk + α(l) pk) ≤ f(xk) + α(l) β [gk]^T pk (the Armijo condition, with β ∈ (0, 1) fixed),
i) set α(l+1) = τ α(l), where τ ∈ (0, 1) is fixed (e.g., τ = 1/2),
ii) increment l by 1.
3. Set αk = α(l).
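A minimal Python sketch of this procedure (the name backtracking_armijo and the default parameter values are illustrative, not part of the lecture); replacing the acceptance test by the simple decrease f(xk + α pk) < f(xk) gives the plain backtracking method above.

```python
import numpy as np

def backtracking_armijo(f, g_k, x_k, p_k, alpha_init=1.0,
                        beta=0.1, tau=0.5, max_backtracks=50):
    """Backtracking-Armijo line search.

    Shrinks alpha by the factor tau until the Armijo condition
        f(x_k + alpha p_k) <= f(x_k) + alpha * beta * g_k^T p_k
    holds, where p_k is a descent direction (g_k^T p_k < 0).
    """
    f_k = f(x_k)
    slope = beta * np.dot(g_k, p_k)     # negative for a descent direction
    alpha = alpha_init
    for _ in range(max_backtracks):
        if f(x_k + alpha * p_k) <= f_k + alpha * slope:
            break                       # sufficient decrease achieved
        alpha *= tau                    # step too long: backtrack
    return alpha
```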
Theorem 1 (Termination of Backtracking-Armijo). Let f ∈ C¹ with gradient
g(x) that is Lipschitz continuous with constant γk at xk, and let pk be a
descent direction at xk. Then, for fixed β ∈ (0, 1),
i) the Armijo condition is satisfied for all step lengths
α ≤ 2(β − 1)[gk]^T pk / (γk ‖pk‖₂²),
ii) and furthermore, for fixed τ ∈ (0, 1) the stepsize generated by the backtracking-Armijo line search terminates with
αk ≥ min( αinit, 2τ(β − 1)[gk]^T pk / (γk ‖pk‖₂²) ).
iii) lim_{k→∞} min( |[gk]^T pk|, |[gk]^T pk| / ‖pk‖₂ ) = 0.
Computing a Search Direction pk
• pk is a descent direction.
• pk is cheap to compute.
iii) lim_{k→∞} gk = 0.
Advantages and disadvantages of steepest descent:
Contours for the objective function f(x, y) = 10(y − x²)² + (x − 1)² (Rosenbrock function),
and the iterates generated by the generic line search steepest-descent method.
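A self-contained sketch of this experiment (the starting point, the Armijo parameters and all helper names are illustrative choices, not taken from the lecture): steepest descent uses pk = −gk together with a backtracking-Armijo step, and makes only slow progress towards the minimiser (1, 1).

```python
import numpy as np

def rosenbrock(z):
    x, y = z
    return 10 * (y - x**2)**2 + (x - 1)**2

def rosenbrock_grad(z):
    x, y = z
    return np.array([-40 * x * (y - x**2) + 2 * (x - 1),
                     20 * (y - x**2)])

def armijo_step(f, g, x, p, alpha=1.0, beta=0.1, tau=0.5):
    # backtracking-Armijo line search (see the sketch above)
    for _ in range(60):
        if f(x + alpha * p) <= f(x) + alpha * beta * np.dot(g, p):
            break
        alpha *= tau
    return alpha

x = np.array([-1.2, 1.0])               # illustrative starting point
for k in range(20000):
    g = rosenbrock_grad(x)
    if np.linalg.norm(g) < 1e-6:
        break
    p = -g                              # steepest-descent direction
    x = x + armijo_step(rosenbrock, g, x, p) * p
print(k, x)                             # many iterations, x close to (1, 1)
```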
More General Descent Methods:
Compute the search direction pk by solving
Bk pk = −gk,
where Bk is a symmetric positive definite matrix (Bk = I gives steepest descent; Bk = H(xk), when positive definite, gives Newton's method).
iii) lim_{k→∞} gk = 0.
ii) lim_{k→∞} xk = x∗,
iii) the sequence converges Q-quadratically, that is, there exists κ > 0 such
that
lim_{k→∞} ‖xk+1 − x∗‖ / ‖xk − x∗‖² ≤ κ.
The mechanism that makes Theorem 5 work is that once the sequence
(xk)k∈N enters a certain domain of attraction of x∗, it cannot escape again
and quadratic convergence to x∗ commences.
Note that this is only a local convergence result, that is, Newton’s method is
not guaranteed to converge to a local minimiser from all starting points.
The fast convergence of Newton’s method becomes apparent
when we apply it to the Rosenbrock function:
Contours for the objective function f(x, y) = 10(y − x²)² + (x − 1)², and the iterates generated
by the Generic Linesearch Newton method.
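A sketch of the corresponding Newton experiment (again with an illustrative starting point and parameters; the steepest-descent fallback used when the Newton direction fails to be a descent direction is a safeguard added here, not part of the lecture's method):

```python
import numpy as np

def rosenbrock(z):
    x, y = z
    return 10 * (y - x**2)**2 + (x - 1)**2

def grad(z):
    x, y = z
    return np.array([-40 * x * (y - x**2) + 2 * (x - 1),
                     20 * (y - x**2)])

def hess(z):
    x, y = z
    return np.array([[-40 * (y - x**2) + 80 * x**2 + 2, -40 * x],
                     [-40 * x,                           20.0]])

x = np.array([-0.75, 1.0])                  # illustrative starting point
for k in range(200):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    p = np.linalg.solve(hess(x), -g)        # Newton direction: H p = -g
    if np.dot(g, p) >= 0:                   # safeguard: not a descent direction,
        p = -g                              # fall back to steepest descent
    alpha = 1.0                             # try the full Newton step first
    for _ in range(60):                     # backtracking-Armijo
        if rosenbrock(x + alpha * p) <= rosenbrock(x) + 0.1 * alpha * np.dot(g, p):
            break
        alpha *= 0.5
    x = x + alpha * p
print(k, x)                                 # reaches (1, 1) in far fewer iterations
```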
Modified Newton Methods:
• Mk = max(0, −λmin(Hk)) I.
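A sketch of how such a modification can produce a descent direction (the combination Bk = Hk + Mk solved as Bk pk = −gk is assumed here from the context above, and the modest extra shift that keeps Bk invertible is an added safeguard, not part of the formula):

```python
import numpy as np

def modified_newton_direction(H, g, margin=0.1):
    """Solve (H + M) p = -g with M = max(0, -lambda_min(H)) * I.
    When H has a negative eigenvalue, a modest extra 'margin' is added
    so the shifted matrix is safely positive definite (an assumption
    of this sketch)."""
    lam_min = np.linalg.eigvalsh(H).min()    # smallest eigenvalue of H
    shift = max(0.0, -lam_min)               # M = shift * I per the formula
    if shift > 0:
        shift += margin                      # extra shift so B is nonsingular
    B = H + shift * np.eye(H.shape[0])       # B = H + M
    return np.linalg.solve(B, -g)

# Example with an indefinite Hessian, where the unmodified Newton
# system could yield an ascent direction:
H = np.array([[1.0, 0.0],
              [0.0, -2.0]])
g = np.array([1.0, 1.0])
p = modified_newton_direction(H, g)
print(p, g @ p)                              # g^T p < 0: a descent direction
```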
3. Output pk ≈ p(i) .
Important features of the conjugate gradient method:
• [gk]^T p(i) < 0 for all i, that is, the algorithm always stops with
a descent direction as an approximation to pk.
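A minimal sketch of such a conjugate gradient inner iteration for the Newton system Hk p = −gk (the function name, tolerances, and the rule of exiting on non-positive curvature and falling back to −gk are illustrative choices for this sketch):

```python
import numpy as np

def cg_direction(H, g, tol=1e-8, max_iter=None):
    """Approximately solve H p = -g by conjugate gradients, started at p = 0.
    Exits early if non-positive curvature is met, returning the last iterate
    (or -g on the very first step), so the output is a descent direction."""
    n = g.size
    max_iter = n if max_iter is None else max_iter
    p = np.zeros(n)
    r = -g.copy()                      # residual of H p = -g at p = 0
    d = r.copy()                       # first CG direction
    for i in range(max_iter):
        Hd = H @ d
        curvature = d @ Hd
        if curvature <= 0:             # non-positive curvature detected
            return p if i > 0 else -g
        alpha = (r @ r) / curvature
        p = p + alpha * d
        r_new = r - alpha * Hd
        if np.linalg.norm(r_new) < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p

# Example: for a positive definite H this reproduces the Newton direction.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, -1.0])
print(cg_direction(H, g), np.linalg.solve(H, -g))
```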