
Lec 5 - Gradient-Descent

Gradient descent is an iterative optimization algorithm used to minimize loss functions. It works by taking steps proportional to the negative gradient of the function to reach a local minimum. The function must be differentiable and convex for gradient descent to work. Each step of gradient descent calculates the gradient of the loss function at the current point and moves in the opposite direction, repeating until convergence within a specified tolerance.


GRADIENT DESCENT

Umarani Jayaraman
Gradient Descent - Introduction
◻ Gradient descent (GD) is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function.
◻ This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function (e.g., in linear regression).
◻ The method was proposed long before the era of modern computers, by Augustin-Louis Cauchy in 1847.
Function requirements
◻ The gradient descent algorithm does not work for all functions. There are two specific requirements. A function has to be:
◻ differentiable
◻ convex
First requirement - differentiable
◻ First, what does it mean for a function to be differentiable?
◻ If a function is differentiable, it has a derivative at each point in its domain.
◻ Not all functions meet this criterion. First, let's see some examples of functions that do meet it.
◻ Typical non-differentiable functions have a step, a cusp, or a discontinuity.
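◻ An illustrative pair of examples (not from the original slides): f(x) = x² is differentiable everywhere, with f′(x) = 2x; f(x) = |x| is not differentiable at x = 0 because of the cusp there (the slope is −1 on the left and +1 on the right, so no single derivative exists at that point); a step function such as f(x) = ⌊x⌋ fails because of its jump discontinuities.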
Next requirement - function has to be convex
◻ For a univariate function, convexity means that the line segment connecting any two points on the function's curve lies on or above the curve (it does not cross it).
◻ If the segment crosses the curve, the function has a local minimum that is not a global one.
◻ Mathematically, for two points x₁, x₂ lying on the function's curve, this condition is expressed as:

f(λx₁ + (1 − λ)x₂) ≤ λf(x₁) + (1 − λ)f(x₂)

◻ where λ denotes a point's location on the section line and its value has to be between 0 (left point) and 1 (right point),
◻ e.g. λ = 0.5 means a location in the middle.
◻ Below are two functions with example section lines.
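◻ A quick illustrative check (not from the original slides): take f(x) = x² with x₁ = 0, x₂ = 2 and λ = 0.5. The left-hand side is f(1) = 1 and the right-hand side is 0.5·f(0) + 0.5·f(2) = 2, so the inequality 1 ≤ 2 holds, as expected for the convex function x².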
Caution: the first requirement (differentiable) may hold, but what about the second requirement?
◻ A function may be differentiable (it has a derivative at each point in its domain) and still fail the convexity requirement.
◻ The second and third example functions are differentiable but not convex.
◻ Another way to check mathematically whether a univariate function is convex is to calculate its second derivative and check whether its value is always greater than 0.
Let’s do a simple example
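The slide's worked example is not preserved here, so the following is an illustrative one using the second-derivative test above:
◻ f(x) = x² gives f″(x) = 2 > 0 for every x, so the function is convex.
◻ f(x) = x⁴ − 2x² gives f″(x) = 12x² − 4, which is negative for |x| < 1/√3, so the function is not convex (it has two separate local minima at x = ±1).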
Saddle points
◻ It is also possible to use quasi-convex functions with a gradient descent algorithm.
◻ However, they often have so-called saddle points (also called minimax points) where the algorithm can get stuck.
◻ An example of a quasi-convex function, with its first-order and second-order derivatives, is worked through below.
◻ An example of a saddle point in a bivariate function is shown below.
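The specific function from this slide is not preserved, so the following is a standard illustrative choice:
◻ f(x) = x⁴ − 2x³ + 2, which is quasi-convex (it decreases up to x = 1.5 and increases afterwards).
◻ First-order derivative: f′(x) = 4x³ − 6x² = 2x²(2x − 3), which is zero at x = 0 and x = 1.5.
◻ Second-order derivative: f″(x) = 12x² − 12x, so f″(0) = 0; since f′ does not change sign at x = 0, that point is a saddle point where gradient descent can stall, while x = 1.5 is the actual minimum.
◻ A classic bivariate saddle is f(x, y) = x² − y², which has a saddle point at the origin.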
Gradient
◻ Intuitively, the gradient is the slope of a curve at a given point in a specified direction.
◻ In the case of a univariate function, it is simply the first derivative at a selected point.
◻ In the case of a multivariate function, it is a vector of derivatives in each main direction (along the variable axes), i.e. the partial derivatives.
◻ The gradient of an n-dimensional function f(x) at a given point p is defined as:

∇f(p) = (∂f/∂x₁(p), ∂f/∂x₂(p), ..., ∂f/∂xₙ(p))
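As a small numerical illustration of this definition (the function, point, and helper below are assumptions for the example, not part of the lecture):

import numpy as np

def f(x):
    # illustrative function f(x1, x2) = x1**2 + 3*x2, whose gradient is (2*x1, 3)
    return x[0]**2 + 3 * x[1]

def numerical_gradient(func, p, h=1e-6):
    # estimate each partial derivative with a central difference
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (func(p + step) - func(p - step)) / (2 * h)
    return grad

print(numerical_gradient(f, [1.0, 2.0]))  # approximately [2. 3.]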
Gradient Descent Procedure
◻ In summary, the Gradient Descent method's steps are:
◻ 1. choose a starting point (initialization)
◻ 2. calculate the gradient at this point
◻ 3. make a scaled step in the opposite direction of the gradient (objective: minimize)
◻ 4. repeat steps 2 and 3 until one of the stopping criteria is met:
   the maximum number of iterations is reached
   the step size is smaller than the tolerance (due to scaling or a small gradient)
Gradient Descent: sample code
◻ The function takes 5 parameters (a sketch of such a function is given below):
◻ 1. starting point [float] - in our case we define it manually, but in practice it is often a random initialisation
◻ 2. gradient function [object] - a function calculating the gradient, which has to be specified beforehand and passed to the GD function
◻ 3. learning rate [float] - scaling factor for step sizes
◻ 4. maximum number of iterations [int]
◻ 5. tolerance [float] - used to conditionally stop the algorithm (here the default value is 0.01)
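The original listing is not reproduced on this slide, so below is a minimal sketch following the five parameters described above (the exact names and details are assumptions, not the lecture's code):

def gradient_descent(start, gradient, learn_rate, max_iter, tol=0.01):
    # minimize a univariate function given its gradient, following the procedure above
    x = start
    history = [x]
    for _ in range(max_iter):
        step = learn_rate * gradient(x)   # scaled step in the gradient direction
        if abs(step) < tol:               # stop once the step is smaller than the tolerance
            break
        x = x - step                      # move opposite to the gradient
        history.append(x)
    return x, history

# example: minimize f(x) = x**2, whose gradient is 2*x
x_min, path = gradient_descent(start=9.0, gradient=lambda x: 2 * x,
                               learn_rate=0.1, max_iter=100)
print(x_min)  # close to 0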
Effect of different learning rates
◻ The animation below shows steps taken by the GD
algorithm for learning rates of 0.1 and 0.8.
Results for various learning rates
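The original plots are not reproduced here; a quick numerical comparison using the gradient_descent sketch above gives a feel for the difference:

for lr in (0.1, 0.8):
    x_min, path = gradient_descent(start=9.0, gradient=lambda x: 2 * x,
                                   learn_rate=lr, max_iter=100)
    print(f"learning rate {lr}: visited {len(path)} points, final x = {x_min:.4f}")

With the smaller rate the iterates approach the minimum monotonically; with 0.8 they overshoot and oscillate around it before settling near zero.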
Gradient - summary
◻ The gradient is a fundamental concept in calculus and in optimization techniques.
◻ The gradient of a function, denoted by ∇ (nabla), is a vector that points in the direction of the steepest increase of the function at a given point.
◻ Mathematically, for a function f(x₁, x₂, ..., xₙ), the gradient is given by:
◻ ∇f = (∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ)
◻ Each component of the gradient is the partial derivative of the function with respect to one of its input variables.
Significance in Optimization
◻ In the context of optimization problems, the goal is often to find the minimum or maximum of a function.
◻ The gradient provides crucial information about the direction and rate of change of the function.
◻ The negative gradient points in the direction of the steepest decrease of the function.
◻ Therefore, moving in the direction opposite to the gradient helps in descending towards the minimum of the function.
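◻ A one-step illustration (not from the slides): for f(x) = x² at x = 2, the gradient is f′(2) = 4; with learning rate 0.1 the update is x ← 2 − 0.1·4 = 1.6, and f decreases from 4 to 2.56, showing how a step against the gradient reduces the function value.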
Gradient Descent Algorithm
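The algorithm box from this slide is not preserved; the standard per-iteration update it describes is θ ← θ − η∇L(θ), where θ are the parameters, η is the learning rate, and L is the loss function being minimized.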
Batch Gradient Descent
◻ The gradient is computed over the entire training set at every step, which can be computationally intensive.
Stochastic Gradient Descent
◻ The gradient is estimated from a single training example at a time: easy to compute, but very noisy.
Mini-batch Gradient Descent
◻ The gradient is estimated from a small batch of examples: fast to compute and a much better estimate of the true gradient.
Mini-batches while training
◻ More accurate estimation of the gradient
◻ Smoother convergence
◻ Allows for larger learning rates
◻ Mini-batches lead to fast training (a sketch contrasting the three variants follows)
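A minimal sketch contrasting the three variants on a toy linear-regression loss (the data, names, and hyperparameters are assumptions for illustration, not the lecture's code):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                      # toy features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)    # toy targets

def mse_gradient(w, Xb, yb):
    # gradient of the mean-squared-error loss on the batch (Xb, yb)
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, learn_rate=0.1, epochs=50):
    # batch_size = len(X): batch GD; batch_size = 1: stochastic GD; otherwise mini-batch GD
    w = np.zeros(3)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            w = w - learn_rate * mse_gradient(w, X[batch], y[batch])
    return w

print("batch     :", train(batch_size=len(X)))
print("stochastic:", train(batch_size=1))
print("mini-batch:", train(batch_size=32))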
Error minimization with iterations
Gradient Descent - Variants
◻ Batch
◻ Stochastic
◻ Mini-batch
Thank you
