
AIS323

Spring 2023

Lecture 5
• Steepest descent algorithm weaknesses
• Addressing steepest descent weaknesses
• Step size
• Slow convergence
• Zig-zagging
• Momentum-based approaches

• Michel Bierlaire, Optimization: Principles and Algorithms, 2nd edition, EPFL Press, 2018.
• Mykel J. Kochenderfer, Tim A. Wheeler, Algorithms for Optimization, MIT Press, 2019.
• Jeremy Watt, Reza Borhani, Aggelos K. Katsaggelos, Machine Learning Refined: Foundations, Algorithms, and Applications, 2nd Edition, Cambridge University Press, 2020.
Topic: Problem formulation; exact solution of univariate functions and stationary points
Source: Michel Bierlaire, Optimization: Principles and Algorithms, EPFL Press, 2018, Ch. 1

Topic: Contour plots; first order and second order derivatives of multivariable functions; exact solution of multivariate functions and stationary points
Source: Mykel J. Kochenderfer, Tim A. Wheeler, Algorithms for Optimization, MIT Press, 2019, Ch. 1, Sec. 2.1 and 2.2

Topic: Gradient descent and its variants
Source: Jeremy Watt, Reza Borhani, Aggelos K. Katsaggelos, Machine Learning Refined: Foundations, Algorithms, and Applications, 2nd Edition, Cambridge University Press, 2020, Ch. 3, Appendix A
Gradient Descent Weaknesses

• The steepest descent algorithm generally converges slowly towards an extremum.

• A natural weakness of steepest descent: the direction of the negative gradient can rapidly oscillate during a run of gradient descent, often producing zig-zagging steps that take considerable time for the algorithm to reach a minimum point.

• A natural weakness of steepest descent: the magnitude of the negative gradient can vanish rapidly near stationary points, leading gradient descent to slowly crawl near minima and saddle (inflection) points.

• If the function has more than one local minimum, the algorithm can get stuck in one of them.
Gradient Descent Weaknesses

• Finding a good step size may take some trial and error for the specific function. It is problem dependent.

• The step size controls whether the algorithm converges to a minimum quickly or slowly. At each step of the algorithm we always move in a descent direction, but how far we move in this direction is controlled by the step size. If the step size is set too small, we descend too slowly; if set too large, we may even ascend.
Crawling Near Stationary Points

• On the way to a minimum, the magnitude of the negative gradient can vanish rapidly near stationary points, leading gradient descent to slowly crawl near minima and saddle points. This slows gradient descent's progress near stationary points.

• A popular solution to this slow-crawling behavior is normalized gradient descent.
Addressing Crawling Near Stationary Points: Normalizing Gradient Descent

Crawling towards stationary points is due to the vanishing gradient. This can be addressed by normalizing gradient descent: we normalize the steepest descent direction by dividing it by its magnitude. Doing so gives a normalized gradient descent step of the form

    x^(k+1) = x^(k) − α^(k) ∇f(x^(k)) / ‖∇f(x^(k))‖

So the movement in the steepest descent direction is controlled entirely by the step size.

Normalized gradient descent empowers the standard gradient descent method to push through flat regions of a function with much greater ease. This includes flat regions that may lead to a local minimum, or the region around a saddle point of a nonconvex function where standard gradient descent can halt.
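As a quick sketch (not from the lecture), the normalized update rule above can be written in a few lines of Python; the names `normalized_gradient_step` and `grad_f` are illustrative:

```python
import numpy as np

def normalized_gradient_step(x, grad_f, alpha):
    """One step of normalized gradient descent:
    x(k+1) = x(k) - alpha * grad f(x(k)) / ||grad f(x(k))||"""
    g = grad_f(x)
    norm = np.linalg.norm(g)
    if norm == 0.0:              # already at a stationary point
        return x.copy()
    return x - alpha * g / norm

# On f(x, y) = 1/2 x^2 + 9/2 y^2 the gradient is (x, 9y).  The normalized
# direction always has unit length, so how far we move is controlled
# entirely by the step size alpha:
grad_f = lambda x: np.array([x[0], 9.0 * x[1]])
x0 = np.array([9.0, 3.0])
x1 = normalized_gradient_step(x0, grad_f, alpha=0.5)
print(np.linalg.norm(x1 - x0))   # step length equals alpha = 0.5
```

Because the direction is rescaled to unit length, each step moves exactly α^(k), no matter how small the gradient is near a stationary point.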
Addressing Steepest Descent Weaknesses: Normalizing Gradient Descent

For example, if

    ∇f = (0.3, 0.1)

then

    ∇f / ‖∇f‖ = (0.3, 0.1) / √(0.3² + 0.1²) = (0.949, 0.316)

[Figure: descent paths before and after normalization; less crawling after normalization.]
Addressing Steepest Descent Weaknesses: Normalizing Gradient Descent

For example, if

    ∇f = (0.3, 10)

then

    ∇f / ‖∇f‖ = (0.3, 10) / √(0.3² + 10²) = (0.030, 0.9996)

[Figure: descent paths before and after normalization; less zigzagging after normalization.]
Steepest Descent and Learning Rate

• Finding a good step size may take some trial and error for the specific function. It is problem dependent.

• The step size controls whether the algorithm converges to a minimum quickly or slowly. At each step of the algorithm we always move in a descent direction, but how far we move in this direction is controlled by the step size. If the step size is set too small, we descend too slowly; if set too large, we may even ascend.
Steepest Descent with Diminishing Step Length

• Instead of using a fixed step length rule, it is also possible to change the value of α from one step to another with what is often referred to as an adjustable step length rule.

• A diminishing step length rule starts with a large α value that gets smaller through the iterations; for example, α = 1/k at the k-th iteration.

• This avoids overshooting the minimum and zigzagging around the minimum.
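As a sketch (not the lecture's code), a diminishing step length rule of the form α_k = α0/k can be plugged into plain gradient descent as follows; the base step α0 = 0.1 and the function names are illustrative assumptions:

```python
import numpy as np

def gd_diminishing(x0, grad_f, f, alpha0=0.1, iters=200):
    """Gradient descent with a diminishing step length rule:
    alpha_k = alpha0 / k at the k-th iteration."""
    x = np.array(x0, dtype=float)
    history = [f(x)]
    for k in range(1, iters + 1):
        alpha = alpha0 / k            # large early steps, smaller later ones
        x = x - alpha * grad_f(x)
        history.append(f(x))
    return x, history

# f(x, y) = 1/2 x^2 + 9/2 y^2 with gradient (x, 9y)
f = lambda x: 0.5 * x[0] ** 2 + 4.5 * x[1] ** 2
grad_f = lambda x: np.array([x[0], 9.0 * x[1]])
x_final, history = gd_diminishing([9.0, 3.0], grad_f, f)
```

On this convex quadratic, every factor (1 − α_k) and (1 − 9α_k) stays in (0, 1) for α0 = 0.1, so the loss decreases at every iteration without overshooting.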
Steepest Descent with Optimized Step Size (Analytically)

• Some algorithms attempt to optimize the step size at each iteration so that the step maximally decreases f along each gradient descent direction (analytical methods, exact line search methods, inexact line search methods, ...).

• Deriving the optimum step size mathematically at every step to force descent is computationally expensive.
Steepest Descent with Optimized Step Size (Analytically)

The procedure starts with a point x^(0) and involves the following steps:

1. Check whether x^(k) satisfies the termination conditions. If it does, terminate; otherwise proceed to the next step.

2. Determine the steepest descent direction d^(k) = −∇f(x^(k)).

3. Construct h(α) = f(x^(k+1)) = f(x^(k) − α∇f(x^(k))) and determine analytically the step size (learning rate) α^(k) that minimizes h(α):

    α^(k) = argmin_α h(α)

4. Compute the next design point according to

    x^(k+1) = x^(k) + α^(k)(−∇f(x^(k)))
Gradient Descent with Optimized Step Size (Analytical): Example

Use the steepest descent method with optimized step size to minimize the following function:

    f(x, y) = (1/2)x² + (9/2)y²

The gradient is

    ∇f(x, y) = (x, 9y)

so the update x^(k+1) = x^(k) − α^(k)∇f(x^(k)) becomes

    (x^(k+1), y^(k+1)) = (x^(k), y^(k)) − α(x^(k), 9y^(k)) = (x^(k)(1 − α), y^(k)(1 − 9α))

Therefore

    h(α) = f(x^(k+1)) = (1/2)(x^(k))²(1 − α)² + (9/2)(y^(k))²(1 − 9α)²

h(α) is a univariate function, and we need to solve min_α h(α):

    dh/dα = −(x^(k))²(1 − α) − 81(y^(k))²(1 − 9α) = 0

    α^(k) = argmin_α h(α) = ((x^(k))² + 81(y^(k))²) / ((x^(k))² + 729(y^(k))²)

Michel Bierlaire, Optimization: Principles and Algorithms, EPFL Press, 2018.
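As a sketch (not from the text), the closed-form α^(k) above can be iterated directly in Python for this particular quadratic:

```python
def steepest_descent_optimized(x0, y0, iters=4):
    """Steepest descent on f(x, y) = 1/2 x^2 + 9/2 y^2 using the
    analytically optimal step size derived above:
    alpha = (x^2 + 81 y^2) / (x^2 + 729 y^2)."""
    f = lambda x, y: 0.5 * x ** 2 + 4.5 * y ** 2
    x, y = float(x0), float(y0)
    values = [f(x, y)]
    for _ in range(iters):
        alpha = (x ** 2 + 81.0 * y ** 2) / (x ** 2 + 729.0 * y ** 2)
        # the update (x, y) <- (x(1 - alpha), y(1 - 9 alpha)) from the slide
        x, y = x * (1.0 - alpha), y * (1.0 - 9.0 * alpha)
        values.append(f(x, y))
    return values

print(steepest_descent_optimized(9.0, 3.0))
# -> [81.0, 31.6097..., 12.3355..., 4.8138..., 1.8785...]
```

Starting from (9, 3), this reproduces the sequence of f values reported on the next slide; the optimal α alternates between 5/41 and 5/9 for this function.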


x^(1) = (9, 3), optimized α

• Notice that the gradient vectors of two successive steps are perpendicular.

• The values of f for the first five iterations:

    81
    31.609756097561
    12.3355145746579
    4.81385934620797
    1.87857925705677

Michel Bierlaire, Optimization: Principles and Algorithms, EPFL Press, 2018.
x^(1) = (9, 3): fixed α = 0.2 versus optimized α [figure of the two descent paths]

Michel Bierlaire, Optimization: Principles and Algorithms, EPFL Press, 2018.
x^(1) = (9, 3), optimized α [figure of f(x, y) versus iterations]

Michel Bierlaire, Optimization: Principles and Algorithms, EPFL Press, 2018.
Gradient Descent with Exact Line Search

The procedure starts with a point x^(0) and involves the following steps:

1. Check whether x^(k) satisfies the termination conditions. If it does, terminate; otherwise proceed to the next step.

2. Determine the steepest descent direction d^(k) = −∇f(x^(k)).

3. Construct h(α) = f(x^(k+1)) = f(x^(k) − α∇f(x^(k))) and determine the step size (learning rate) α^(k) that minimizes h(α),

    α^(k) = argmin_α h(α),

using numerical univariate optimization methods (golden section, Newton's method, ...).

4. Compute the next design point according to

    x^(k+1) = x^(k) + α^(k)(−∇f(x^(k)))

Gradient Descent with Inexact Line Search

• We want to spend less effort calculating the step size. So, instead of trying to solve the univariate optimization problem (analytically or numerically), a trial and error approach is used in which various step sizes are tested and the first one that is suitable is accepted.

• Inexact line search is a strategy for efficiently picking a good value of the step size, which results in quicker convergence.
Gradient Descent with Inexact Line Search (Backtracking Line Search)

The procedure starts with a point x^(0) and involves the following steps:

1. Check whether x^(k) satisfies the termination conditions. If it does, terminate; otherwise proceed to the next step.

2. Determine the steepest descent direction d^(k) = −∇f(x^(k)).

3. a) Calculate the initial loss f(x^(k)) and initialize α^(k) to a large value.
   b) Calculate f(x^(k) − α^(k)∇f(x^(k))).
   c) If f(x^(k) − α^(k)∇f(x^(k))) is less than f(x^(k)), then this value of α^(k) is acceptable; else decrease α^(k) by some factor and repeat b).

4. Compute the next design point according to

    x^(k+1) = x^(k) + α^(k)(−∇f(x^(k)))

Wolfe conditions provide an upper and lower bound on the admissible step length values.
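A minimal sketch of the backtracking procedure in steps 3a-3c; the starting value α = 1, the shrink factor 0.5, and the cap on retries are illustrative assumptions (practical implementations usually require sufficient decrease via an Armijo/Wolfe condition rather than any decrease):

```python
import numpy as np

def backtracking_step(x, f, grad_f, alpha0=1.0, shrink=0.5, max_tries=50):
    """One gradient descent step with backtracking line search (steps 3a-3c):
    start from a large alpha and shrink it until the loss decreases."""
    fx = f(x)                           # 3a: initial loss
    g = grad_f(x)
    alpha = alpha0                      # 3a: initialize alpha to a large value
    for _ in range(max_tries):
        if f(x - alpha * g) < fx:       # 3c: accept the first alpha that lowers f
            return x - alpha * g, alpha
        alpha *= shrink                 # else shrink alpha and repeat 3b
    return x, 0.0                       # no acceptable step found

f = lambda x: 0.5 * x[0] ** 2 + 4.5 * x[1] ** 2
grad_f = lambda x: np.array([x[0], 9.0 * x[1]])
x1, alpha = backtracking_step(np.array([9.0, 3.0]), f, grad_f)
print(alpha)   # 0.125: the first halved step size that lowers f
```

At (9, 3) the candidates α = 1, 0.5, and 0.25 all increase f (the step overshoots along y), so the search settles on α = 0.125.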
The Zig-Zagging Behavior of Gradient Descent

[Figure: an oscillating negative gradient producing zig-zagging steps.]

A popular solution to the zig-zagging behavior is momentum accelerated gradient descent.


Momentum Accelerated Steepest Descent

Momentum is the quantity of motion of a moving body, measured as the product of its mass and velocity.

The gradient descent with momentum algorithm (or momentum for short) borrows this idea from physics. Imagine rolling a ball down the inside of a frictionless bowl. Instead of stopping at the bottom, the momentum it has accumulated pushes it forward, and the ball keeps rolling back and forth. The ball naturally gathers momentum as gravity causes it to accelerate, just as the gradient causes momentum to accumulate in this descent method.

https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Momentum Accelerated Steepest Descent

[Figure]

https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Momentum Accelerated Steepest Descent

Momentum helps solve the issue of slow convergence.

[Figures: descent on a loss surface over weights ω1 and ω2.]

https://medium.com/analytics-vidhya/momentum-a-simple-yet-efficient-optimizing-technique-ef76834e4423
Momentum Accelerated Steepest Descent

• Gradient descent takes a long time to traverse a nearly flat surface, since such surfaces have gradients with small magnitudes and thus require many iterations to traverse.

• The momentum term is a function of every negative gradient that precedes it: it captures an average of the preceding directions.

• We modify gradient descent to incorporate momentum. The momentum update equations are:

    d^(k) = βd^(k−1) + (1 − β)∇f(x^(k))
    x^(k+1) = x^(k) − αd^(k)

• For β = 0, we recover gradient descent.

• For larger β, d^(k) represents a summary of the preceding gradients.
Momentum Accelerated Steepest Descent

The procedure starts with a point x^(0), β (typically 0.7), α, and d^(0) = ∇f(x^(0)), and involves the following steps:

1. Check whether x^(k) satisfies the termination conditions. If it does, terminate; otherwise proceed to the next step.

2. Determine the momentum accelerated descent direction

    d^(k) = βd^(k−1) + (1 − β)∇f(x^(k))

3. Compute the next design point according to

    x^(k+1) = x^(k) − αd^(k)


Momentum Accelerated Steepest Descent: Example

Use the momentum accelerated steepest descent method to minimize the following function:

    f(x, y) = 0.5x² + 9.75y²

    α = 0.1, x^(0) = (10, 1)

[Figure: descent paths for β = 0, β = 0.2, and β = 0.7.]

• Both momentum-accelerated versions (β = 0.2 and β = 0.7) clearly outperform the standard scheme (β = 0), in that they reach a point closer to the true minimum of the function, and the overall path taken by gradient descent is smoothest for β = 0.7.
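A sketch of this experiment in Python (not the lecture's code); the 20-iteration budget is an arbitrary illustrative choice:

```python
import numpy as np

f = lambda x: 0.5 * x[0] ** 2 + 9.75 * x[1] ** 2
grad_f = lambda x: np.array([x[0], 19.5 * x[1]])

def momentum_descent(x0, grad_f, alpha, beta, iters):
    """Momentum accelerated steepest descent:
    d(k) = beta * d(k-1) + (1 - beta) * grad f(x(k))
    x(k+1) = x(k) - alpha * d(k), with d(0) = grad f(x(0)).
    beta = 0 recovers plain gradient descent."""
    x = np.array(x0, dtype=float)
    d = grad_f(x)
    for _ in range(iters):
        x = x - alpha * d
        d = beta * d + (1.0 - beta) * grad_f(x)
    return x

x0 = [10.0, 1.0]
f_plain = f(momentum_descent(x0, grad_f, alpha=0.1, beta=0.0, iters=20))
f_mom2 = f(momentum_descent(x0, grad_f, alpha=0.1, beta=0.2, iters=20))
f_mom7 = f(momentum_descent(x0, grad_f, alpha=0.1, beta=0.7, iters=20))
```

Running this, both momentum variants end at a lower f than the β = 0 scheme after the same number of iterations, consistent with the figure on the slide.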
Momentum Accelerated Steepest Descent: Weakness

• One issue with momentum is that the steps do not slow down enough at the bottom of a valley and tend to overshoot the valley floor.
