Steepest Descent
Keywords: optimization, gradient, minimization, Cauchy
Introduction
The classical steepest descent method is one of the oldest methods for the minimization of a general nonlinear function. The steepest descent method, also known as the gradient descent method, was first proposed by Cauchy in 1847 [1]. In the original paper, Cauchy proposed the use of the gradient as a way of solving a nonlinear equation of the form
$$ f(x_1, x_2, \ldots, x_n) = 0, \qquad (1) $$
where $f$ is a real-valued continuous function that never becomes negative and which remains continuous, at least within certain limits. The basis for the method is the simple observation that a continuous function should decrease, at least initially, if one takes a step along the direction of the negative gradient. The only difficulty then is deciding how to choose the length of the step one should take. While this is easy to compute for special cases such as a convex quadratic function, the general case usually requires the minimization of the function in question along the negative gradient direction.

Despite its simplicity, the steepest descent method has played an important role in the development of the theory of optimization. Unfortunately, the method is known to be quite slow on most real-world problems and is therefore not widely used. Instead, more powerful methods such as the conjugate gradient method or quasi-Newton methods are frequently used. Recently, however, several modifications have been proposed to improve the efficiency of the method. These modifications have led to a newfound interest in the steepest descent method, from both a theoretical and a practical viewpoint. They point to the interesting observation that the gradient direction itself is not a bad choice, but rather that the classical choice of step length leads to the slow convergence behavior.

This work was supported in part by the Director, Office of Science, U.S. Department of Energy, under Contract No. DE-AC02-05CH11231.
Suppose that we would like to find the minimum of a function $f(x)$, where $x \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$. We will denote the gradient of $f$ by $g_k = g(x_k) = \nabla f(x_k)$. The general idea behind most minimization methods is to compute a step along a given search direction $d_k$, for example,
$$ x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \qquad (2) $$
where the step length $\alpha_k$ is chosen by a line search,
$$ \alpha_k = \arg\min_{\alpha \ge 0} f(x_k + \alpha d_k). \qquad (3) $$
Here $\arg\min$ refers to the argument of the minimum for the given function. For the steepest descent method, the search direction is given by $d_k = -\nabla f(x_k)$. The steepest descent algorithm can now be written as follows (Algorithm 1). The two main computational advantages of the steepest descent algorithm are the ease with which it can be implemented and its low storage requirements, $O(n)$. The main work per iteration is the line search required to compute the step length $\alpha_k$ and the computation of the gradient.
Algorithm 1: Steepest Descent Method
  Given an initial $x_0$, $d_0 = -g_0$, and a convergence tolerance tol
  for $k = 0$ to maxiter do
    Set $\alpha_k = \arg\min_\alpha \phi(\alpha) = f(x_k - \alpha g_k)$
    $x_{k+1} = x_k - \alpha_k g_k$
    Compute $g_{k+1} = \nabla f(x_{k+1})$
    if $\|g_{k+1}\|_2 \le$ tol then
      converged
    end if
  end for
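As a concrete illustration, the following is a minimal sketch of Algorithm 1 in Python. The names (steepest_descent, max_iter) are ours, and the exact line search of Algorithm 1 is replaced here by a simple backtracking search, since the exact minimizer of $f$ along $-g_k$ is rarely available in closed form for a general nonlinear function.

    import numpy as np

    def steepest_descent(f, grad, x0, tol=1e-6, max_iter=10000):
        """Algorithm 1 with a backtracking (Armijo) line search."""
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) <= tol:        # convergence test on ||g_k||_2
                return x, k
            # Backtracking line search along d_k = -g_k:
            # halve alpha until a sufficient-decrease condition holds.
            alpha, fx, gg = 1.0, f(x), g @ g
            while f(x - alpha * g) > fx - 1e-4 * alpha * gg:
                alpha *= 0.5
            x = x - alpha * g                   # x_{k+1} = x_k - alpha_k g_k
        return x, max_iter

For a convex quadratic, the backtracking loop can be replaced by the closed-form step length derived in the next section.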
Convergence Theory
One of the main advantages of the steepest descent method is that it has a well-developed convergence theory [2, 3]. It is fairly easy to show that the steepest descent method has a linear rate of convergence, which is not too surprising given the simplicity of the method. Unfortunately, even for mildly nonlinear problems this results in convergence that is too slow for most practical applications. On the other hand, the convergence theory for the steepest descent method is extremely useful in understanding the convergence behavior of more sophisticated methods. To start, let us consider the case of minimizing the quadratic function
$$ f(x) = \frac{1}{2} x^T Q x - b^T x, \qquad (4) $$
where $b \in \mathbb{R}^n$ and $Q$ is an $n \times n$ symmetric positive definite matrix. Since $Q$ is symmetric and positive definite, all of its eigenvalues are real and positive. Let the eigenvalues of $Q$ be given by $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. The gradient of (4) is simply
$$ g(x) = Qx - b, \qquad (5) $$
so we can write one step of the method of steepest descent as
$$ x_{k+1} = x_k - \alpha_k (Q x_k - b), \qquad (6) $$
where $\alpha_k$ is chosen to minimize $f(x)$ along the direction $-g_k$. A simple calculation (for the quadratic case) yields the following formula for $\alpha_k$:
$$ \alpha_k = \frac{g_k^T g_k}{g_k^T Q g_k}. \qquad (7) $$
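For completeness, here is the short calculation behind (7). Restricting the quadratic (4) to the ray $x_k - \alpha g_k$ and using $g_k = Qx_k - b$ gives
$$ \phi(\alpha) = f(x_k - \alpha g_k) = f(x_k) - \alpha\, g_k^T g_k + \frac{1}{2}\alpha^2\, g_k^T Q g_k. $$
Setting $\phi'(\alpha) = -g_k^T g_k + \alpha\, g_k^T Q g_k = 0$ and solving for $\alpha$ yields (7); since $Q$ is positive definite, $g_k^T Q g_k > 0$, so this stationary point is indeed the minimizer along the ray.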
To analyze the convergence, it is easiest to consider the quantity $f(x_k) - f(x^*)$, where $x^*$ denotes the global minimizer of equation (4). Here we will follow proofs that can be found in standard texts such as [2, 3]. We first note that the unique minimizer of equation (4) is given by the solution of the linear system
$$ Qx = b. \qquad (8) $$
Consider the quantity:
$$ f(x_k) - f(x^*) = \frac{1}{2} x_k^T Q x_k - b^T x_k - \left( \frac{1}{2} (x^*)^T Q x^* - b^T x^* \right) $$
$$ = \frac{1}{2} x_k^T Q x_k - (Qx^*)^T x_k - \frac{1}{2} (x^*)^T Q x^* + (Qx^*)^T x^* $$
$$ = \frac{1}{2} (x_k - x^*)^T Q (x_k - x^*). $$
To compute a bound, one uses a lemma due to Kantorovich, which can be found in Luenberger [2]. In particular, when the method of steepest descent with exact line searches is used on a strongly convex quadratic function, then
$$ f(x_{k+1}) - f(x^*) \le \left( \frac{\kappa(Q) - 1}{\kappa(Q) + 1} \right)^2 \left( f(x_k) - f(x^*) \right), \qquad (9) $$
where $\kappa(Q) = \lambda_n / \lambda_1$ is the condition number of the matrix $Q$. A similar bound can be derived for a general nonlinear objective function if we assume that $\alpha_k$ is the global minimizer along the search direction.
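To get a feel for the numbers in (9): for a well-conditioned problem with $\kappa(Q) = 10$, the contraction factor is $(9/11)^2 \approx 0.67$, so the error in the function value is reduced by roughly a third at every iteration. For $\kappa(Q) = 100$, the factor is $(99/101)^2 \approx 0.961$, and since $\ln(10)/\ln(1/0.961) \approx 58$, the method needs on the order of 58 iterations to gain a single digit of accuracy.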
Example
Consider a simple example of a three-dimensional quadratic function given by
$$ f(x) = \frac{1}{2} x^T Q x - b^T x, \qquad (10) $$
where
$$ Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \gamma & 0 \\ 0 & 0 & \gamma^2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. $$
Using the steepest descent algorithm on this example problem produces the following results. The convergence tolerance was set so that the algorithm would terminate when $\|g(x_k)\| \le 10^{-6}$. One can clearly see the effects of even a mildly large condition number, as predicted by the error bound and as seen in the number of iterations required to achieve convergence.
[Table: number of iterations required for convergence for $\gamma = 2, 5, 10, 20, 50$.]
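An experiment of this shape can be reproduced with a few lines of Python, using the $Q$ and $b$ of (10) and the exact step length (7) in place of a backtracking search. The function name sd_quadratic and the starting point $x_0 = 0$ are our choices, so the printed iteration counts should be read qualitatively; the growth with $\gamma$ is the point.

    import numpy as np

    def sd_quadratic(Q, b, x0, tol=1e-6, max_iter=1000000):
        """Steepest descent on f(x) = 0.5 x^T Q x - b^T x with exact step (7)."""
        x = np.asarray(x0, dtype=float)
        for k in range(max_iter):
            g = Q @ x - b                      # gradient, equation (5)
            if np.linalg.norm(g) <= tol:
                return x, k
            alpha = (g @ g) / (g @ (Q @ g))    # exact line search, formula (7)
            x = x - alpha * g
        return x, max_iter

    for gamma in [2, 5, 10, 20, 50]:
        Q = np.diag([1.0, gamma, gamma**2])
        b = np.ones(3)
        _, iters = sd_quadratic(Q, b, np.zeros(3))
        print(f"gamma = {gamma:2d}: kappa = {gamma**2:4d}, iterations = {iters}")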
Scaling
One of the most important aspects of minimizing real-world problems is the issue of scaling. Because of the way many scientific and engineering problems are initially formulated, it is not uncommon to run into trouble because the variables have widely differing magnitudes. This can happen for many reasons, but a common one is that the variables carry different physical units, which can leave the optimization variables orders of magnitude apart. For example, one variable could be given in kilometers ($10^3$ meters) and another in milliseconds ($10^{-3}$ seconds), leading to a six-order-of-magnitude difference. As a general rule of thumb, however, one would like all the variables in an optimization problem to have roughly similar magnitudes. This leads to better decisions about which search direction to choose, as well as about when convergence has been achieved. One fairly standard approach is to use a diagonal scaling based on what a typical value of each variable is expected to be. One then transforms the variables by the scaling
$$ \hat{x} = Dx, \qquad (11) $$
where $D$ is a diagonal scaling matrix. In the test problem given above, for example, the minimizer is $x^* = (1, 1/\gamma, 1/\gamma^2)^T$, whose components differ by a factor of $\gamma^2$, so one simple choice would be
$$ D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \gamma & 0 \\ 0 & 0 & \gamma^2 \end{pmatrix}. \qquad (12) $$
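It is worth recording what the transformation (11) does to the quadratic test problem; the following identity is a standard observation rather than something from the text above. Substituting $x = D^{-1}\hat{x}$ into (10), the objective in the scaled variables becomes
$$ \frac{1}{2} \hat{x}^T \left( D^{-1} Q D^{-1} \right) \hat{x} - \left( D^{-1} b \right)^T \hat{x}, $$
so the method effectively sees the Hessian $D^{-1} Q D^{-1}$. In the extreme case $D = Q^{1/2}$, the scaled Hessian is the identity, the condition number drops to 1, and steepest descent with an exact line search converges in a single step. For general problems no such perfect scaling is available, which is why cheap diagonal scalings based on typical variable magnitudes are used instead.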
Extensions
Recently, several new modifications of the steepest descent method have been proposed. In 1988, Barzilai and Borwein [4] proposed two new step sizes for use with the negative gradient direction. Although their method does not guarantee descent in the objective function values, their numerical results indicated a substantial improvement over the classical steepest descent method. One of their main observations was that the behavior of the steepest descent algorithm depends as much on the step size as on the search direction. They proposed instead the following procedure. First one writes the new iterate as
$$ x_{k+1} = x_k - \frac{1}{\alpha_k} g_k. \qquad (13) $$
Then, instead of computing the step size by doing a line search or by using the formula (7) for the quadratic case, one computes the step length $\alpha_k$ through the formula
$$ \alpha_k = \frac{s_{k-1}^T y_{k-1}}{s_{k-1}^T s_{k-1}}, \qquad (14) $$
where $s_{k-1} = x_k - x_{k-1}$ and $y_{k-1} = g_k - g_{k-1}$. A sketch of this update is given below.
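The sketch below implements the iteration (13)-(14) in Python; the name bb_gradient and the choice alpha = 1 for the first step (before any history exists) are ours. No safeguards or globalization are included, so, as noted above, descent is not guaranteed.

    import numpy as np

    def bb_gradient(grad, x0, tol=1e-6, max_iter=10000):
        """Barzilai-Borwein iteration: x_{k+1} = x_k - (1/alpha_k) g_k."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        alpha = 1.0                       # first step: no history yet
        for k in range(max_iter):
            if np.linalg.norm(g) <= tol:
                return x, k
            x_new = x - (1.0 / alpha) * g
            g_new = grad(x_new)
            s = x_new - x                 # s_{k-1} = x_k - x_{k-1}
            y = g_new - g                 # y_{k-1} = g_k - g_{k-1}
            alpha = (s @ y) / (s @ s)     # formula (14); note s^T y > 0 holds for
                                          # strictly convex quadratics, not in general
            x, g = x_new, g_new
        return x, max_iter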
Using this new formula, Barzilai and Borwein were able to produce a substantial improvement in the performance of the steepest descent algorithm on certain test problems. Subsequently, Raydan was able to prove convergence of the Barzilai-Borwein method for a strictly convex quadratic function in any number of variables, and in 1997 he employed a nonmonotone line search strategy due to Grippo, Lampariello, and Lucidi [5] to obtain a method that guarantees global convergence [6] for the general nonlinear case. For an excellent overview of this subject and further details, see [7].

The steepest descent method is one of the oldest known methods for minimizing a general nonlinear function. The convergence theory for the method is widely used and is the basis for understanding many of the more sophisticated and well-known algorithms. However, the basic method is well known to converge slowly on many problems and is rarely used in practice. Recent results have generated a renewed interest in the steepest descent method. The main observation is that the steepest descent direction can be used with a different step size than the classical method, one that can substantially improve the convergence. One disadvantage, however, is the loss of monotone convergence. After so many years, it is interesting to note that this method can still yield some surprising results.
References
[1] A. Cauchy. Méthodes générales pour la résolution des systèmes d'équations simultanées. C. R. Acad. Sci. Paris, 25:536-538, 1847.
[2] D. G. Luenberger and Yinyu Ye. Linear and Nonlinear Programming. Springer, 2008.
[3] Stephen G. Nash and Ariela Sofer. Linear and Nonlinear Programming. McGraw-Hill, 1996.
[4] J. Barzilai and J. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8:141-148, 1988.
[5] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23:707-716, 1986.
[6] M. Raydan. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7(1):26-33, 1997.
[7] R. Fletcher. On the Barzilai-Borwein method. In L. Qi, K. Teo, and X. Yang, editors, Optimization and Control with Applications, pages 235-256. Springer, 2005.