Chapter 8 Lecture Notes

This chapter discusses search methods for real-valued functions using gradients, specifically focusing on the concept of level sets and the gradient descent algorithm. The method of steepest descent is introduced as a technique to minimize functions, where the step size is chosen to maximize the decrease in the objective function. Practical stopping criteria for the algorithm are also provided, along with examples illustrating the application of the steepest descent method to quadratic functions.


In this chapter we consider a class of search methods for real-valued functions on ℝn.

These methods use the gradient of the given function.


• A level set of a function f: ℝn → ℝ is the set of points x satisfying f(x) = c for some constant c.
○ Thus, a point x0 ∈ ℝn is on the level set corresponding to level c if f(x0) = c.
• In the case of functions of two real variables, f: ℝ2 → ℝ, the notion of a level set is illustrated in the following figure.

• The gradient of f at x0, denoted ∇f(x0), if it is not a zero vector, is orthogonal to the tangent vector to an arbitrary smooth curve passing
through x0 on the level set f(x) = c.
○ Thus, the direction of maximum rate of increase of a real-valued differentiable function at a point is orthogonal to the level set of the
function through that point.
• In other words, the gradient acts in such a direction that for a given small displacement, the function f increases more in the direction
of the gradient than in any other direction.
• To prove this statement, recall that 〈∇f(x), d〉, ||d|| = 1, is the rate of increase of f in the direction d at the point x.
• By the Cauchy-Schwarz inequality,
〈∇f(x), d〉 ≤ ||∇f(x)|| ||d|| = ||∇f(x)||,
with equality if and only if d = ∇f(x)/||∇f(x)|| (assuming ∇f(x) ≠ 0).
• Thus, the direction in which ∇f(x) points is the direction of maximum rate of increase of f at x.
○ The direction in which – ∇f(x) points is the direction of maximum rate of decrease of f at x.
• Hence, the direction of negative gradient is a good direction to search if we want to find a function minimizer.
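As a rough numerical illustration of the Cauchy-Schwarz argument above (the quadratic test function below is an assumption for the example, not part of the notes), sampling many unit directions d shows that 〈∇f(x), d〉 never exceeds ||∇f(x)|| and that the bound is attained along the gradient direction:

```python
import numpy as np

# Illustrative test function f(x) = x1^2 + 3*x2^2 and its gradient.
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])

x = np.array([1.0, 2.0])
g = grad_f(x)

# Rate of increase <grad f(x), d> along 1000 random unit directions d.
rng = np.random.default_rng(0)
rates = [g @ (d / np.linalg.norm(d)) for d in rng.standard_normal((1000, 2))]
print(max(rates), np.linalg.norm(g))   # max sampled rate is at most ||grad f(x)||
print(g @ (g / np.linalg.norm(g)))     # the bound ||grad f(x)|| is attained for d = g/||g||
```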

• Let x(0) be a starting point, and consider the point x(0) – α∇f(x(0)). Then, by Taylor’s theorem, we obtain
f(x(0) – α∇f(x(0))) = f(x(0)) – α||∇f(x(0))||² + o(α).
○ Thus, if ∇f(x(0)) ≠ 0, then for sufficiently small α > 0, we have
f(x(0) – α∇f(x(0))) < f(x(0)).
• This means that the point x(0) – α∇f(x(0)) is an improvement over the point x(0) if we are searching for a minimizer.

• To formulate an algorithm that implements this idea, suppose that we are given a point x(k).
• To find the next point x(k+1), we start at x(k) and move by an amount –αk∇f(x(k)), where αk is a positive scalar called the step size.
• This procedure leads to the following iterative algorithm:
x(k+1) = x(k) – αk∇f(x(k))
○ We refer to this as a gradient descent algorithm (aka gradient algorithm).
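A minimal Python sketch of this iteration with a fixed step size (the test function, step size, and iteration count are illustrative assumptions, not taken from the notes):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_iters=200):
    """Fixed-step gradient algorithm: x(k+1) = x(k) - alpha * grad f(x(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)
    return x

# Illustrative use on f(x1, x2) = x1^2 + 3*x2^2, whose minimizer is the origin.
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])
print(gradient_descent(grad_f, [5.0, -3.0]))   # approaches [0, 0]
```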


• The gradient varies as the search proceeds, tending to zero as we approach the minimizer.
• We have the option of either taking very small steps and reevaluating the gradient at every step, or taking large steps each time.
• The first approach results in a laborious method of reaching the minimizer, whereas the second approach may result in a more
zigzag path to the minimizer.
• The advantage of the second approach is possibly fewer gradient evaluations.
• Among the many different methods that use this philosophy, the most popular is the method of steepest descent, which we discuss next.
• A very popular example is applying a gradient method to the training of a class of neural networks.

The Method of Steepest Descent:


The method of steepest descent is a gradient algorithm where the step size αk is chosen to achieve the maximum amount of decrease of the objective function at each individual step. Specifically, αk is chosen to minimize ϕk(α) ≜ f(x(k) – α∇f(x(k))). In other words,
αk = arg min α≥0 f(x(k) – α∇f(x(k))).
• To summarize, the steepest descent algorithm proceeds as follows:
○ At each step, starting from the point x(k) we conduct a line search in the direction – ∇f(x(k)) until a minimizer, x(k+1), is found.
○ A typical sequence resulting from the method of steepest descent is depicted in the following figure
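The figure is not reproduced here; the following is a minimal Python sketch of the procedure just summarized, assuming SciPy's minimize_scalar for the one-dimensional line search and an illustrative test function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iters=1000):
    """Steepest descent: a line search along -grad f(x(k)) at every iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:                 # practical stopping criterion
            break
        phi = lambda a, x=x, g=g: f(x - a * g)      # phi_k(alpha) = f(x(k) - alpha * grad f(x(k)))
        alpha = minimize_scalar(phi, bounds=(0.0, 10.0), method="bounded").x
        x = x - alpha * g
    return x

# Illustrative use on f(x1, x2) = x1^2/5 + x2^2.
f = lambda x: x[0] ** 2 / 5 + x[1] ** 2
grad_f = lambda x: np.array([2 * x[0] / 5, 2 * x[1]])
print(steepest_descent(f, grad_f, [5.0, 1.0]))      # converges toward [0, 0]
```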

• Note: The method of steepest descent moves in orthogonal steps, as stated in the following proposition.

Proposition 1
If {x(k)}, k = 0, 1, 2, …, is a steepest descent sequence for a given function f: ℝn → ℝ, then for each k the vector x(k+1) – x(k) is orthogonal to the vector x(k+2) – x(k+1).

• The proposition above implies that ∇f(x(k)) is parallel to the tangent plane to the level set {f(x) = f(x(k+1))} at x(k+1).
○ Note that as each new point is generated by the steepest descent algorithm, the corresponding value of the function f decreases, as stated below.

Proposition 2
If {x(k)}, k = 0, 1, 2, …, is a steepest descent sequence for f: ℝn → ℝ and if ∇f(x(k)) ≠ 0, then f(x(k+1)) < f(x(k)).
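A possible numerical check of both propositions on an illustrative quadratic (the function and starting point are assumptions; the exact step size used here anticipates the formula derived for quadratics later in these notes):

```python
import numpy as np

# f(x) = 0.5 * x'Qx with Q = diag(2/5, 2), i.e. f(x1, x2) = x1^2/5 + x2^2.
Q = np.diag([2.0 / 5.0, 2.0])
f = lambda x: 0.5 * x @ Q @ x

xs = [np.array([5.0, 1.0])]
for _ in range(5):
    g = Q @ xs[-1]                       # gradient at the current iterate
    alpha = (g @ g) / (g @ Q @ g)        # exact line-search step for a quadratic
    xs.append(xs[-1] - alpha * g)

steps = [xs[k + 1] - xs[k] for k in range(len(xs) - 1)]
print([float(steps[k] @ steps[k + 1]) for k in range(len(steps) - 1)])  # ~0: consecutive steps orthogonal (Prop. 1)
print([float(f(x)) for x in xs])                                        # strictly decreasing values (Prop. 2)
```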

• If for some k we have ∇f(x(k)) = 0, then the point x(k) satisfies the first-order necessary condition (FONC).
○ In this case, x(k+1) = x(k). We can use the above as the basis for a stopping (termination) criterion for the algorithm.
○ The condition ∇f(x(k+1)) = 0, however, is not directly suitable as a practical stopping criterion, because the numerical computation of the
gradient will rarely be identically equal to zero.
• A practical stopping criterion is to check if the norm ||∇f(x(k))|| of the gradient is less than a prespecified threshold, in which case we
stop.
○ Alternatively, we may compute the absolute difference |f(x(k+1)) – f(x(k))| between objective function values for every two successive iterations, and if the difference is less than some prespecified threshold ε > 0, then we stop; that is, we stop when
|f(x(k+1)) – f(x(k))| < ε.
○ Yet another alternative is to compute the norm ||x(k+1) – x(k)|| of the difference between two successive iterates, and we stop if the norm is less than a prespecified threshold:
||x(k+1) – x(k)|| < ε.
○ Alternatively, we may check “relative” values of the quantities above; for example,
|f(x(k+1)) – f(x(k))| / |f(x(k))| < ε  or  ||x(k+1) – x(k)|| / ||x(k)|| < ε.
Note: The two (relative) stopping criteria above are preferable to the previous (absolute) criteria because the relative criteria are “scale-independent.”
○ For example, scaling the objective function does not change the satisfaction of the criterion |f(x(k+1)) – f(x(k))|/|f(x(k))| < ε.
○ Similarly, scaling the decision variable does not change the satisfaction of the criterion ||x(k+1) – x(k)||/||x(k)|| < ε.
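A sketch of how these relative tests might be wired into a gradient iteration (the function, the fixed step size, and the threshold are illustrative assumptions; the small constants guarding the denominators are a practical safeguard, not part of the notes):

```python
import numpy as np

f = lambda x: x[0] ** 2 / 5 + x[1] ** 2
grad_f = lambda x: np.array([2 * x[0] / 5, 2 * x[1]])

eps = 1e-8
x = np.array([5.0, 1.0])
for k in range(100_000):
    x_new = x - 0.1 * grad_f(x)
    # Relative ("scale-independent") stopping tests.
    rel_f = abs(f(x_new) - f(x)) / max(abs(f(x)), 1e-12)
    rel_x = np.linalg.norm(x_new - x) / max(np.linalg.norm(x), 1e-12)
    x = x_new
    if rel_f < eps or rel_x < eps:
        break
print(k, x)
```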

Example 1) Use the method of steepest descent to find the minimizer of the following function
• Let us now see what the method of steepest descent does with a quadratic function of the form
f(x) = ½ x⊤Qx – b⊤x,
○ where Q ∈ ℝn×n is a symmetric positive definite matrix, b ∈ ℝn, and x ∈ ℝn. The unique minimizer of f can be found by setting the gradient of f to zero, where
∇f(x) = Qx – b, so that the minimizer is x* = Q⁻¹b.
○ There is no loss of generality in assuming Q to be a symmetric matrix.


• Therefore, if we are given a quadratic form x⊤Ax with A ≠ A⊤, then because the transposition of a scalar equals itself, we obtain
x⊤Ax = (x⊤Ax)⊤ = x⊤A⊤x, and hence x⊤Ax = ½ x⊤(A + A⊤)x, where Q = ½(A + A⊤) is symmetric.

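A quick numerical sanity check of this identity (the non-symmetric matrix and the vector below are randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))        # a non-symmetric matrix
Q = (A + A.T) / 2                      # the symmetric matrix with the same quadratic form
x = rng.standard_normal(3)
print(x @ A @ x, x @ Q @ x)            # equal up to floating-point rounding
```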
• The Hessian of f is F(x) = Q = Q⊤ > 0. To simplify the notation we write g(k) = ∇f(x(k)) = Qx(k) – b. Then, the steepest descent algorithm for the quadratic function can be represented as
x(k+1) = x(k) – αk g(k).
• In the quadratic case, we can find an explicit formula for αk.


○ Assume that g(k) ≠ 0, for if g(k) = 0, then x(k) = x* and the algorithm stops.
○ Because αk ≥ 0 is a minimizer of ϕk(α) = f(x(k) – αg(k)), we apply the FONC to ϕk(α) to obtain
ϕk′(α) = (x(k) – αg(k))⊤Q(–g(k)) – b⊤(–g(k)) = αg(k)⊤Qg(k) – g(k)⊤g(k) = 0,
which gives αk = (g(k)⊤g(k)) / (g(k)⊤Qg(k)).
• In summary, the method of steepest descent for the quadratic takes the form
x(k+1) = x(k) – ((g(k)⊤g(k)) / (g(k)⊤Qg(k))) g(k), where g(k) = Qx(k) – b.
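A compact Python sketch of this quadratic special case (the function name and tolerance are illustrative choices):

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iters=1000):
    """Steepest descent for f(x) = 0.5*x'Qx - b'x using the explicit step
    alpha_k = (g'g)/(g'Qg), where g = Qx - b."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = Q @ x - b
        if np.linalg.norm(g) < tol:    # stop once the gradient is numerically zero
            break
        x = x - (g @ g) / (g @ Q @ g) * g
    return x
```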
Example 2) Let f(x1, x2) = x1² + x2². Find the minimizer.

• What if f(x1, x2) = x1²/5 + x2²?
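A possible worked check of both parts of this example, reusing the steepest_descent_quadratic sketch above (the starting points are arbitrary). For f(x1, x2) = x1² + x2² we have Q = 2I and b = 0, and the exact line search reaches the minimizer (0, 0) in a single step; for f(x1, x2) = x1²/5 + x2² (Q = diag(2/5, 2)) the iterates zigzag but still converge to (0, 0):

```python
import numpy as np

print(steepest_descent_quadratic(np.diag([2.0, 2.0]), np.zeros(2), [3.0, -4.0]))   # -> [0, 0] in one step
print(steepest_descent_quadratic(np.diag([0.4, 2.0]), np.zeros(2), [3.0, -4.0]))   # -> [0, 0] after zigzagging
```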
