Lecture 3 Gradient Descent


Gradient Descent

Nicholas Ruozzi
University of Texas at Dallas
Gradient Descent
• Method to find local optima of a differentiable function
• Intuition: gradient tells us direction of greatest increase,
negative gradient gives us direction of greatest decrease
• Take steps in directions that reduce the function value
• Definition of derivative guarantees that if we take a small
enough step in the direction of the negative gradient, the
function will decrease in value
• How small is small enough?

2
Gradient Descent

Gradient Descent Algorithm:

• Pick an initial point x^(0)
• Iterate until convergence:
  x^(t+1) = x^(t) − γ_t ∇f(x^(t))

where γ_t is the step size (sometimes called the learning rate)

3
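The iteration above can be sketched in a few lines of Python (function and parameter names here are illustrative, not from the slides):

```python
def gradient_descent(grad, x0, step_size, num_iters):
    """Fixed-step gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad(x)  # x^(t+1) = x^(t) - step * grad f(x^(t))
    return x

# Minimize f(x) = x^2, whose gradient is 2x; the minimizer is x = 0.
x_final = gradient_descent(grad=lambda x: 2 * x, x0=-4.0, step_size=0.1, num_iters=100)
```

With step size .1 each update multiplies x by 0.8, so the iterates shrink toward 0.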
Gradient Descent

Gradient Descent Algorithm:

• Pick an initial point x^(0)
• Iterate until convergence:
  x^(t+1) = x^(t) − γ_t ∇f(x^(t))

where γ_t is the step size (sometimes called the learning rate)

When do we stop?

4
Gradient Descent

Gradient Descent Algorithm:

• Pick an initial point x^(0)
• Iterate until convergence:
  x^(t+1) = x^(t) − γ_t ∇f(x^(t))

where γ_t is the step size (sometimes called the learning rate)

Possible stopping criterion: iterate until ‖∇f(x^(t))‖ ≤ ε for some small ε > 0

How small should ε be?

5
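The stopping criterion above can be realized directly in code; this is a sketch for the one-dimensional case, where the gradient norm is just an absolute value and eps plays the role of ε:

```python
def gradient_descent_until(grad, x0, step_size, eps):
    """Iterate until the (1-D) gradient magnitude drops below eps."""
    x = x0
    while abs(grad(x)) > eps:
        x = x - step_size * grad(x)
    return x

# f(x) = x^2: the loop stops once |2x| <= eps, i.e., once |x| <= eps / 2.
x_final = gradient_descent_until(lambda x: 2 * x, x0=-4.0, step_size=0.1, eps=1e-6)
```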
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4

6
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4
x^(1) = −4 − .8 · 2 · (−4)

7
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4
x^(1) = 2.4

8
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4
x^(1) = 2.4
x^(2) = 2.4 − .8 · 2 · 2.4
9
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4
x^(1) = 2.4
x^(2) = −1.44

10
Gradient Descent
f(x) = x²

Step size: .8
x^(0) = −4
x^(1) = 2.4
x^(2) = −1.44
x^(3) = .864
x^(4) = −0.5184
x^(5) = 0.31104
⋮
x^(30) = −8.84296e−07
11
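The iterates on this slide can be reproduced in a few lines: since f'(x) = 2x, each update multiplies x by (1 − .8 · 2) = −0.6, which is why the sign alternates while the magnitude shrinks.

```python
x = -4.0
iterates = []
for _ in range(5):
    x = x - 0.8 * (2 * x)  # gradient of x^2 is 2x; step size .8
    iterates.append(x)
# iterates is approximately [2.4, -1.44, 0.864, -0.5184, 0.31104]
```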
Gradient Descent

Step size: .9

12
Gradient Descent

Step size: .2

13
Gradient Descent

Step size matters!

14
Gradient Descent

Step size matters!

15
Line Search
• Instead of picking a fixed step size that may or may not actually
result in a decrease in the function value, we can consider
minimizing the function along the direction specified by the
gradient to guarantee that the next iteration decreases the
function value
• In other words, choose γ_t = argmin_{γ ≥ 0} f(x^(t) − γ ∇f(x^(t)))
• This is called exact line search
• This optimization problem can be expensive to solve exactly
• However, if f is convex, this is a univariate convex optimization
problem

16
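Because the line-search objective is univariate (and convex when f is convex), a simple ternary search gives an approximate exact line search. This is a sketch with illustrative names, not the lecture's code:

```python
def exact_line_search(f, x, direction, hi=1.0, iters=100):
    """Approximately minimize step -> f(x + step * direction) over [0, hi]
    by ternary search (valid when that univariate function is convex)."""
    lo = 0.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(x + m1 * direction) < f(x + m2 * direction):
            hi = m2   # the minimizer lies left of m2
        else:
            lo = m1   # the minimizer lies right of m1
    return (lo + hi) / 2

# f(x) = x^2 at x = -4, direction = -grad = 8: the best step is 0.5 (lands at 0).
step = exact_line_search(lambda x: x * x, x=-4.0, direction=8.0)
```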
Backtracking Line Search
• Instead of exact line search, could simply use a strategy that
finds some step size that decreases the function value (one must
exist)
• Backtracking line search: start with a large step size, γ, and keep
shrinking it until f(x^(t) − γ ∇f(x^(t))) < f(x^(t))
• This always guarantees a decrease, but it may not decrease as
much as exact line search
• Still, this is typically much faster in practice as it only requires
a few function evaluations

17
Backtracking Line Search

• To implement backtracking line search, choose two parameters
α ∈ (0, .5) and β ∈ (0, 1)

• Set γ = 1
• While f(x^(t) − γ ∇f(x^(t))) > f(x^(t)) − α γ ‖∇f(x^(t))‖²
  • Set γ = β γ

Iterations continue until
a step size is found that
decreases the function
“enough”
18
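A direct transcription of the loop above, specialized to one dimension so that ‖∇f(x)‖² becomes g·g (names are illustrative):

```python
def backtracking_line_search(f, grad, x, alpha=0.2, beta=0.99):
    """Shrink the step by beta until the sufficient-decrease test passes."""
    g = grad(x)
    gamma = 1.0
    while f(x - gamma * g) > f(x) - alpha * gamma * g * g:
        gamma *= beta
    return gamma

# f(x) = x^2 at x = -4, using the slide's alpha = .2, beta = .99.
gamma = backtracking_line_search(lambda x: x * x, lambda x: 2 * x, x=-4.0)
```

The returned step is guaranteed to decrease f, though possibly by less than exact line search would.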
Backtracking Line Search

α = .2, β = .99
19
Backtracking Line Search

α = .1, β = .3
20
Gradient Descent: Convex Functions

• For convex functions, local optima are always global optima (this
follows from the definition of convexity)
• If gradient descent converges to a critical point, then the
result is a global minimizer

• Not all convex functions are differentiable; can we still apply
gradient descent?

21
Gradients of Convex Functions
• For a differentiable convex function f, its gradients yield linear
underestimators: f(y) ≥ f(x) + ∇f(x)ᵀ (y − x) for all x and y

[figure: a convex function with a tangent line g(x) lying below it]
22
Gradients of Convex Functions
• For a differentiable convex function f, its gradients yield linear
underestimators: zero gradient corresponds to a global
optimum

[figure: a convex function with a horizontal tangent line g(x) at its minimum]
24
Subgradients
• For a convex function f, a subgradient at a point x₀ is given by any
line g such that g(x₀) = f(x₀) and g(x) ≤ f(x) for all x, i.e., it is a
linear underestimator

[figure: a convex function f with a subgradient line g(x) touching it at x₀]
25
Subgradients
• For a convex function f, a subgradient at a point x₀ is given by any
line g such that g(x₀) = f(x₀) and g(x) ≤ f(x) for all x, i.e., it is a
linear underestimator

If the zero-slope line is a subgradient
at x₀, then x₀ is a global
minimum

[figure: a convex function with a horizontal subgradient line g(x) at x₀]
28
Subgradients
• If a convex function f is differentiable at a point x, then it has a
unique subgradient at that point, given by the gradient
• If a convex function is not differentiable at a point x, it can have
many subgradients
  • E.g., the set of subgradients of the convex function f(x) = |x| at the
point x = 0 is given by the set of slopes [−1, 1]
• The set of all subgradients of f at x forms a convex set, i.e., if g₁ and g₂
are subgradients, then λ g₁ + (1 − λ) g₂ is also a subgradient for all λ ∈ [0, 1]
• Subgradients are only guaranteed to exist for convex functions

29
Subgradient Example

• Subgradient of max(f₁, f₂) for convex functions f₁ and f₂?

30
Subgradient Example

• Subgradient of max(f₁, f₂) for convex functions f₁ and f₂?

• If f₁(x) > f₂(x), ∇f₁(x) is a subgradient

• If f₂(x) > f₁(x), ∇f₂(x) is a subgradient

• If f₁(x) = f₂(x), ∇f₁(x) and ∇f₂(x) are both subgradients (and so are all convex
combinations of these)

31
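The case analysis above can be written as a small hypothetical helper: return the gradient of whichever function attains the max; at a tie, either choice (or any convex combination) is valid.

```python
def subgradient_of_max(f1, grad1, f2, grad2, x):
    """Return one valid subgradient of max(f1, f2) at x."""
    if f1(x) >= f2(x):      # at a tie, grad1 is one valid choice
        return grad1(x)
    return grad2(x)

# |x| = max(x, -x): the subgradient is 1 for x > 0 and -1 for x < 0.
s = subgradient_of_max(lambda x: x, lambda x: 1.0, lambda x: -x, lambda x: -1.0, 2.0)
```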
Subgradient Descent

Subgradient Descent Algorithm:

• Pick an initial point x^(0)
• Iterate until convergence:
  x^(t+1) = x^(t) − γ_t g^(t)

where γ_t is the step size and g^(t) is a subgradient of f at x^(t)

32
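A sketch of the algorithm on f(x) = |x|, whose subgradient is sign(x). Because the objective need not decrease at every step, the code also tracks the best iterate seen so far (names are illustrative):

```python
def subgradient_descent(subgrad, x0, step_sizes):
    """Run subgradient descent; also track the best iterate for f(x) = |x|."""
    x = best = x0
    for gamma in step_sizes:
        x = x - gamma * subgrad(x)
        if abs(x) < abs(best):   # best-so-far, specific to the |x| objective
            best = x
    return x, best

sign = lambda x: 0.0 if x == 0 else (1.0 if x > 0 else -1.0)
# With a fixed step of .9, the iterates eventually just oscillate around 0.
x_last, x_best = subgradient_descent(sign, x0=-4.0, step_sizes=[0.9] * 50)
```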
Subgradient Descent

Subgradient Descent Algorithm:

• Pick an initial point x^(0)
• Iterate until convergence:
  x^(t+1) = x^(t) − γ_t g^(t)

where γ_t is the step size and g^(t) is a subgradient of f at x^(t)

Can you use line search here?

33
Subgradient Descent

Step Size: .9

34
Diminishing Step Size Rules
• A fixed step size may not result in convergence for non-
differentiable functions
• Instead, can use a diminishing step size:
• Required property: step size must decrease as number of
iterations increase but not too quickly that the algorithm fails
to make progress
• Common diminishing step size rules:
• for some
• for some

35
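Repeating the |x| experiment with the diminishing rule γ_t = c/√t (c = 1 here, an illustrative choice) drives the best iterate toward the optimum, unlike the fixed-step run:

```python
import math

sign = lambda x: 0.0 if x == 0 else (1.0 if x > 0 else -1.0)

x = -4.0
best = abs(x)
for t in range(1, 2001):
    gamma = 1.0 / math.sqrt(t)   # diminishing step size rule, c = 1
    x = x - gamma * sign(x)
    best = min(best, abs(x))
# best keeps shrinking toward 0 as t grows
```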
Subgradient Descent

Diminishing Step Size

36
Theoretical Guarantees
• The hard work in convex optimization is to identify conditions
that guarantee quick convergence to within a small error of the
optimum
• Let
• For a fixed step size, , we are guaranteed that

where is some positive constant that depends on


• If is differentiable, then we have whenever is small enough

37
