
Department of Computer Science & Engineering

Regulation 21
Semester: III
Course Code: CS3491
Course Name: Artificial Intelligence and Machine Learning

K.Sumithra Devi
Assistant Professor, CSE
UNIT III SUPERVISED LEARNING – GRADIENT DESCENT

CO 3: Build supervised learning models (K3)



Gradient Descent
The algorithm:

$$x^{(t+1)} = x^{(t)} - \alpha^{(t)} \nabla f\left(x^{(t)}\right), \qquad t = 0, 1, 2, \ldots,$$

where α^(t) is called the step size.
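As a concrete illustration, here is a minimal Python sketch of this update rule with a fixed step size (the function name gradient_descent, the test objective, and the parameter values are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, num_iters=100):
    """Iterate x^(t+1) = x^(t) - alpha * grad_f(x^(t)) with a fixed step size."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)          # step against the gradient
    return x

# Example: minimize f(x) = ||x - [3, -1]||^2, whose gradient is 2 * (x - [3, -1]).
grad_f = lambda x: 2.0 * (x - np.array([3.0, -1.0]))
print(gradient_descent(grad_f, x0=[0.0, 0.0]))   # converges toward [3, -1]
```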



Why is the direction −∇f(x)?
If x* is optimal, then for every direction d the difference quotient is nonnegative:

$$\underbrace{\lim_{\epsilon \to 0} \frac{1}{\epsilon}\left[f(x^* + \epsilon d) - f(x^*)\right]}_{\ge\, 0,\ \forall d} = \nabla f(x^*)^T d
\;\;\Longrightarrow\;\; \nabla f(x^*)^T d \ge 0, \quad \forall d.$$

But if x^(t) is not optimal, then we want f(x^(t) + εd) ≤ f(x^(t)) for some direction d. So,

$$\underbrace{\lim_{\epsilon \to 0} \frac{1}{\epsilon}\left[f\left(x^{(t)} + \epsilon d\right) - f\left(x^{(t)}\right)\right]}_{\le\, 0,\ \text{for some } d} = \nabla f\left(x^{(t)}\right)^T d
\;\;\Longrightarrow\;\; \nabla f\left(x^{(t)}\right)^T d \le 0 \ \text{ for some } d.$$
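A quick numerical sanity check of this argument (the quadratic objective and helper names below are illustrative, not from the slides): a direction with a negative inner product against the gradient decreases f for a small enough step, while the opposite direction increases it.

```python
import numpy as np

f      = lambda x: np.sum((x - np.array([3.0, -1.0]))**2)
grad_f = lambda x: 2.0 * (x - np.array([3.0, -1.0]))

x   = np.array([0.0, 0.0])
d   = -grad_f(x)                      # grad_f(x)^T d < 0: negative side
eps = 1e-3
print(f(x + eps * d) < f(x))          # True: the cost decreases
print(f(x - eps * d) < f(x))          # False: the opposite direction increases the cost
```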
Descent Direction
Pictorial illustration:
∇f(x) is perpendicular to the contour.
A search direction d can lie either on the positive side, ∇f(x)^T d ≥ 0, or on the negative side, ∇f(x)^T d < 0.
Only the directions on the negative side can reduce the cost. All such d's are called descent directions.



The Steepest d
Previous slide: if x^(t) is not optimal yet, then some d will give ∇f(x^(t))^T d ≤ 0.

So, let us make ∇f(x^(t))^T d as negative as possible:

$$d^{(t)} = \underset{\|d\|_2 = \delta}{\mathrm{argmin}} \ \nabla f\left(x^{(t)}\right)^T d.$$

We need δ to control the magnitude; otherwise d is unbounded.

The solution (up to the scaling δ) is

$$d^{(t)} = -\nabla f\left(x^{(t)}\right).$$

Why? By Cauchy-Schwarz,

$$\nabla f\left(x^{(t)}\right)^T d \ \ge\ -\left\|\nabla f\left(x^{(t)}\right)\right\|_2 \left\|d\right\|_2.$$

The minimum is attained when d is parallel to −∇f(x^(t)).


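A small numerical illustration of this argmin (the gradient value, δ, and variable names are assumptions for the demo, not from the slides): among random directions of the same length, the scaled negative gradient attains the smallest inner product with ∇f(x^(t)).

```python
import numpy as np

rng   = np.random.default_rng(0)
grad  = np.array([2.0, -4.0])                  # stand-in for grad_f(x^(t))
delta = 1.0                                    # fixed magnitude ||d||_2 = delta

# Random candidate directions on the sphere of radius delta
cands = rng.standard_normal((10000, 2))
cands = delta * cands / np.linalg.norm(cands, axis=1, keepdims=True)

best  = cands[np.argmin(cands @ grad)]         # empirical argmin of grad^T d
steep = -delta * grad / np.linalg.norm(grad)   # -grad, scaled to length delta
print(best, steep)                             # the two nearly coincide
```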
Steepest Descent Direction
Pictorial illustration:
Put a ball around the current point. All d's inside the ball are feasible.
Pick the one that minimizes ∇f(x)^T d.
This direction must be parallel (but with opposite sign) to ∇f(x).

Step Size
The algorithm:

$$x^{(t+1)} = x^{(t)} - \alpha^{(t)} \nabla f\left(x^{(t)}\right), \qquad t = 0, 1, 2, \ldots,$$

where α^(t) is called the step size.

1. Fixed step size:
α^(t) = α.

2. Exact line search:

$$\alpha^{(t)} = \underset{\alpha}{\mathrm{argmin}}\ f\left(x^{(t)} + \alpha d^{(t)}\right).$$

E.g., if f(x) = (1/2) x^T H x + c^T x, then

$$\alpha^{(t)} = -\frac{\nabla f\left(x^{(t)}\right)^T d^{(t)}}{d^{(t)T} H\, d^{(t)}}.$$

3. Inexact line search:
Armijo / Wolfe conditions. See Nocedal-Wright, Chapter 3.1.
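The exact line search case can be sketched in a few lines of Python (H, c, the tolerance, and the variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

# Gradient descent with exact line search on a quadratic f(x) = 0.5 x^T H x + c^T x.
H = np.array([[3.0, 0.5],
              [0.5, 1.0]])                    # symmetric positive definite
c = np.array([-1.0, 2.0])
grad_f = lambda x: H @ x + c

x = np.zeros(2)
for t in range(50):
    d = -grad_f(x)                            # steepest descent direction
    if np.linalg.norm(d) < 1e-10:
        break
    alpha = -(grad_f(x) @ d) / (d @ H @ d)    # exact line search formula above
    x = x + alpha * d

print(x, np.linalg.solve(H, -c))              # both approximate the minimizer -H^{-1} c
```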
Convergence
Let x* be the global minimizer. Assume the following:
Assume f is twice differentiable, so that ∇²f exists.
Assume λ_min I ⪯ ∇²f(x) ⪯ λ_max I for all x ∈ R^n, with λ_min ≥ 0 (inequalities in the positive semidefinite sense).
Run gradient descent with exact line search.
Then (Nocedal-Wright, Chapter 3, Theorem 3.3):

$$\begin{aligned}
f\left(x^{(t+1)}\right) - f(x^*) &\le \left(1 - \frac{\lambda_{\min}}{\lambda_{\max}}\right)^{2} \left[f\left(x^{(t)}\right) - f(x^*)\right] \\
&\le \left(1 - \frac{\lambda_{\min}}{\lambda_{\max}}\right)^{4} \left[f\left(x^{(t-1)}\right) - f(x^*)\right] \\
&\;\;\vdots \\
&\le \left(1 - \frac{\lambda_{\min}}{\lambda_{\max}}\right)^{2t} \left[f\left(x^{(1)}\right) - f(x^*)\right].
\end{aligned}$$

Thus, f(x^(t)) → f(x*) as t → ∞.
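A rough numerical check of this rate on a quadratic objective (the Hessian, starting point, and names are illustrative assumptions, not from the slides): the observed per-iteration reduction of f(x^(t)) − f(x*) under exact line search should be no worse than the factor (1 − λ_min/λ_max)².

```python
import numpy as np

H = np.diag([1.0, 10.0])              # Hessian with lambda_min = 1, lambda_max = 10
c = np.array([1.0, -3.0])
f      = lambda x: 0.5 * x @ H @ x + c @ x
grad_f = lambda x: H @ x + c
x_star = np.linalg.solve(H, -c)
rate   = (1 - 1.0 / 10.0) ** 2        # (1 - lambda_min / lambda_max)^2

x, gap = np.array([5.0, 5.0]), []
for t in range(20):
    d = -grad_f(x)
    alpha = (d @ d) / (d @ H @ d)     # exact line search on the quadratic
    x = x + alpha * d
    gap.append(f(x) - f(x_star))

ratios = np.array(gap[1:]) / np.array(gap[:-1])
print(ratios.max() <= rate + 1e-12)   # True: observed ratios respect the bound
```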
Understanding Convergence
Gradient descent can be viewed as successive approximation. Approximate the function as

$$f\left(x^{(t)} + d\right) \approx f\left(x^{(t)}\right) + \nabla f\left(x^{(t)}\right)^T d + \frac{1}{2\alpha}\|d\|_2^2.$$

We can show that the d minimizing the right-hand side is d = −α∇f(x^(t)).
This suggests: use a quadratic function to locally approximate f.
Gradient descent converges when the step size α, which sets the curvature 1/α of the approximation, is not too big.
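The minimizing d can be made explicit by setting the gradient of the quadratic model to zero (a short worked step filling in the claim above):

$$\nabla_d \left[ f\left(x^{(t)}\right) + \nabla f\left(x^{(t)}\right)^T d + \frac{1}{2\alpha}\|d\|_2^2 \right]
= \nabla f\left(x^{(t)}\right) + \frac{1}{\alpha} d = 0
\;\;\Longrightarrow\;\; d = -\alpha \nabla f\left(x^{(t)}\right),$$

which is exactly one gradient descent step with step size α.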
Advice on Gradient Descent

Gradient descent is useful because:
• Simple to implement (compared to ADMM, FISTA, etc.)
• Low computational cost per iteration (no matrix inversion)
• Requires only the first-order derivative (no Hessian)
• The gradient is available in deep networks (via back-propagation)
• Most machine learning libraries have built-in (stochastic) gradient descent (a minimal usage sketch follows after this list)

You are welcome to implement your own, but you need to be careful with:
• Convex non-differentiable problems, e.g., the l1-norm
• Non-convex problems, e.g., ReLU in a deep network
• Getting trapped in local minima
• Inappropriate step size, a.k.a. learning rate

Consider more “transparent” algorithms such as CVX when:
• Formulating problems (no need to worry about the algorithm)
• Trying to obtain insights
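As referenced in the list above, here is a minimal usage sketch of a built-in (stochastic) gradient descent optimizer, assuming PyTorch as the library (the objective and hyperparameters are illustrative, not from the slides):

```python
import torch

# Minimize f(x) = ||x - target||^2, letting the framework compute gradients
# by back-propagation and apply the SGD update.
target = torch.tensor([3.0, -1.0])
x = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.1)

for _ in range(200):
    opt.zero_grad()                  # clear gradients from the previous step
    loss = torch.sum((x - target) ** 2)
    loss.backward()                  # autograd fills x.grad
    opt.step()                       # x <- x - lr * x.grad

print(x.detach())                    # close to [3, -1]
```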
Types of Gradient Descent

• Based on how much of the training data is used to compute the error for each update, the gradient descent learning algorithm can be divided into the following types (a sketch contrasting their update loops is given after this list):

• Batch gradient descent
• Stochastic gradient descent
• Mini-batch gradient descent
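A sketch contrasting the three update loops on a small least-squares problem (the data, batch size, and hyperparameters are illustrative assumptions, not from the slides):

```python
import numpy as np

# Least-squares objective f(w) = ||Xw - y||^2 / (2n), minimized three ways.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def grad(w, Xb, yb):
    """Gradient of the least-squares loss over the (mini-)batch (Xb, yb)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

alpha, epochs = 0.1, 50
w_batch, w_sgd, w_mini = np.zeros(3), np.zeros(3), np.zeros(3)

for _ in range(epochs):
    # Batch: one update per epoch using the full dataset
    w_batch = w_batch - alpha * grad(w_batch, X, y)

    # Stochastic: one update per single (shuffled) training example
    for i in rng.permutation(len(y)):
        w_sgd = w_sgd - alpha * grad(w_sgd, X[i:i+1], y[i:i+1])

    # Mini-batch: one update per small batch of examples
    for start in range(0, len(y), 20):
        w_mini = w_mini - alpha * grad(w_mini, X[start:start+20], y[start:start+20])

print(w_batch, w_sgd, w_mini)   # all three approach w_true = [1, -2, 0.5]
```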
