
Lect 5 - Gradient Descent

Gradient Descent (GD) is an iterative optimization algorithm used to find local minima or maxima of functions, commonly applied in machine learning to minimize cost functions. It requires functions to be differentiable and convex, with the gradient indicating the direction of steepest increase. Variants of GD include Batch, Stochastic, and Mini-batch methods, each with different computational efficiencies and convergence characteristics.


GRADIENT DESCENT

Umarani Jayaraman
Gradient Descent - Introduction
 Gradient descent (GD) is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function.
 This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function (e.g., in linear regression).
 This method was proposed long before the era of modern computers, by Augustin-Louis Cauchy in 1847.
Function requirements

 The gradient descent algorithm does not work for all functions. There are two specific requirements. A function has to be:
 differentiable
 convex
First requirement - differentiable

 First, what does it mean that a function has to be differentiable?
 If a function is differentiable, it has a derivative at each point in its domain.
 Not all functions meet this criterion. First, let’s see some examples of functions that do meet it.
 Typical non-differentiable functions have a step, a cusp, or a discontinuity.
Next requirement - function has to be convex

 For a univariate function, this means that the line segment connecting any two points on the function’s curve lies on or above the curve (it does not cross it).
 If it does cross the curve, the function has a local minimum which is not a global one.
 Mathematically, for two points x₁, x₂ lying on the function’s curve this condition is expressed as:
 f((1−λ)·x₁ + λ·x₂) ≤ (1−λ)·f(x₁) + λ·f(x₂)
 where λ denotes a point’s location on the section line and its value has to be between 0 (left point x₁) and 1 (right point x₂),
 e.g. λ=0.5 means a location in the middle.
 Below there are two functions with exemplary section lines.
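This inequality can also be checked numerically; a minimal sketch, assuming an illustrative convex function f(x) = x² and two arbitrary sample points:

import numpy as np

f = lambda x: x**2                       # illustrative convex function (assumption)
x1, x2 = -1.0, 3.0                       # two arbitrary points on the curve

for lam in np.linspace(0.0, 1.0, 11):
    curve = f((1 - lam) * x1 + lam * x2)        # point on the function's curve
    chord = (1 - lam) * f(x1) + lam * f(x2)     # point on the section line
    assert curve <= chord + 1e-12               # convexity condition holds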
Caution: first requirement – differentiable; what about the second requirement?

 First, what does it mean that a function has to be differentiable?
 If a function is differentiable, it has a derivative at each point in its domain.
 The second and third functions are not convex.
 Another way to check mathematically whether a univariate function is convex is to calculate its second derivative and check that its value is always non-negative (a strictly positive second derivative gives strict convexity).
Let’s do a simple example
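A minimal sketch of such a check, assuming the illustrative function f(x) = x²; the second derivative can be computed symbolically with SymPy:

import sympy as sp

x = sp.symbols('x')
f = x**2                               # illustrative function (assumption)

second = sp.diff(f, x, 2)              # second derivative of f
print(second)                          # 2, positive everywhere, so f is convex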
Saddle points
 It is also possible to use quasi-convex functions with a gradient descent algorithm.
 However, they often have so-called saddle points (also called minimax points) where the algorithm can get stuck.
 An example of a quasi-convex function is:
 First-order derivative:
 Second-order derivative:
 An example of a saddle point in a bivariate function is shown below.
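A minimal sketch of how a saddle point can be verified, assuming the illustrative bivariate function f(x, y) = x² − y² (not the slide’s example): the gradient vanishes at the origin, yet the Hessian has mixed-sign eigenvalues, so the point is neither a minimum nor a maximum.

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 - y**2                                  # saddle-shaped surface (assumption)

grad = [sp.diff(f, v) for v in (x, y)]           # [2*x, -2*y]
print([g.subs({x: 0, y: 0}) for g in grad])      # [0, 0] -> stationary point at the origin

H = sp.hessian(f, (x, y))                        # Matrix([[2, 0], [0, -2]])
print(H.eigenvals())                             # eigenvalues of mixed sign -> saddle point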
Gradient
 Intuitively, the gradient is the slope of a curve at a given point in a specified direction.
 In the case of a univariate function, it is simply the first derivative at a selected point.
 In the case of a multivariate function, it is a vector of derivatives in each main direction (along the variable axes), i.e. the partial derivatives.
 The gradient of an n-dimensional function f(x) at a given point p is defined as follows:
 ∇f(p) = (∂f/∂x₁(p), ∂f/∂x₂(p), ..., ∂f/∂xₙ(p))
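A minimal numerical sketch of this definition using central finite differences (the test function and point are illustrative assumptions):

import numpy as np

def numerical_gradient(f, p, h=1e-6):
    # Approximate each partial derivative of f at point p with a central difference.
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

# Example: f(x, y) = x**2 + y**2 has gradient (2x, 2y); at p = (1, 2) this is (2, 4).
print(numerical_gradient(lambda v: v[0]**2 + v[1]**2, [1.0, 2.0]))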
Gradient Descent Procedure
 In summary, the Gradient Descent method’s steps are:
 1. choose a starting point (initialization)
 2. calculate the gradient at this point
 3. make a scaled step in the opposite direction to the gradient (objective: minimize)
 4. repeat steps 2 and 3 until one of the criteria is met:
 maximum number of iterations reached
 step size is smaller than the tolerance (due to scaling or a small gradient).
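In symbols, step 3 corresponds to the standard update rule, where η is the learning rate (scaling factor) and pₙ the current point:

pₙ₊₁ = pₙ − η·∇f(pₙ)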
Gradient Descent: sample code
 This function takes 5 parameters:
 1. starting point [float] - in our case, we define it manually, but in practice it is often a random initialisation
 2. gradient function [object] - a function calculating the gradient, which has to be specified beforehand and passed to the GD function
 3. learning rate [float] - scaling factor for step sizes
 4. maximum number of iterations [int]
 5. tolerance [float] to conditionally stop the algorithm (in this case the default value is 0.01)
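A minimal Python sketch of a routine with these five parameters (names and details are illustrative assumptions, not the original slide’s code):

import numpy as np

def gradient_descent(start, gradient, learn_rate, max_iter, tol=0.01):
    # Minimize a function given its gradient, starting from `start`.
    x = start
    history = [x]
    for _ in range(max_iter):
        step = learn_rate * gradient(x)      # scaled step along the gradient
        if np.abs(step) < tol:               # stop when the step becomes too small
            break
        x = x - step                         # move against the gradient direction
        history.append(x)
    return history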
Effect of different learning rates
 The animation below shows steps taken by the GD algorithm for learning rates of 0.1 and 0.8.
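As a usage sketch, the function above can be run with both learning rates on an assumed quadratic test function f(x) = x² (gradient 2x) from an assumed starting point:

quadratic_grad = lambda x: 2 * x   # gradient of f(x) = x**2 (illustrative assumption)

path_small = gradient_descent(start=9.0, gradient=quadratic_grad,
                              learn_rate=0.1, max_iter=100)
path_large = gradient_descent(start=9.0, gradient=quadratic_grad,
                              learn_rate=0.8, max_iter=100)
# A small rate takes many short, steady steps; a large rate overshoots and
# zig-zags around the minimum before settling.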
Results of various learning rates
Gradient - summary
 The gradient is a fundamental concept in calculus and optimization.
 The gradient of a function, denoted by ∇ (nabla), is a vector that points in the direction of the steepest increase of the function at a given point.
 Mathematically, for a function f(x₁, x₂, ..., xₙ), the gradient is given by:
 ∇f = (∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ)
 Each component of the gradient represents the partial derivative of the function with respect to one of its input variables.
Significance in Optimization

 In the context of optimization problems, the goal is often to find the minimum or maximum of a function.
 The gradient provides crucial information about the direction and rate of change of the function.
 The negative gradient points in the direction of the steepest decrease of the function.
 Therefore, moving in the direction opposite to the gradient helps in descending towards the minimum of the function.
Gradient Descent Algorithm
Batch Gradient Descent
 The gradient is computed over the entire training set at each step, which can be computationally intensive.
Stochastic Gradient Descent
 The gradient is estimated from a single training sample at each step: easy to compute, but very noisy.
Mini-batch Gradient Descent
 The gradient is computed over a small random batch of samples: fast to compute and a much better estimate of the true gradient.
Mini-batches while training
 More accurate estimation of the gradient
 Smoother convergence
 Allows for larger learning rates
 Mini-batches lead to fast training
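A minimal sketch contrasting the three gradient estimates for a least-squares problem (the data, model, and batch size are illustrative assumptions):

import numpy as np

# Illustrative setup: gradient of 0.5 * ||X @ w - y||**2, averaged over the samples used.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)

def batch_gradient(w):
    # Batch GD: uses every sample -> accurate but computationally intensive.
    return X.T @ (X @ w - y) / len(y)

def stochastic_gradient(w):
    # Stochastic GD: uses one random sample -> cheap but a very noisy estimate.
    i = rng.integers(len(y))
    return X[i] * (X[i] @ w - y[i])

def minibatch_gradient(w, batch_size=32):
    # Mini-batch GD: uses a small random batch -> cheap and a much better estimate.
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size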
Error minimization with iterations
Gradient Descent - Variants
 Batch
 Stochastic
 Mini-batch
Thank you
