Gradient Descent and Its Types

Uploaded by

sai ramya prabhaji

BEGINNER | DEEP LEARNING | MACHINE LEARNING

This article was published as a part of the Data Science Blogathon.

Introduction

Gradient descent is an optimization algorithm widely used in machine learning and deep learning. It
iteratively adjusts a model's parameters to move a function toward a local minimum. In linear
regression, it finds the weights and biases, and in deep learning it is used together with backpropagation.
The objective is to identify the model parameters, such as weights and biases, that minimize the model's
error on the training data.

In this article, we will explore different types of gradient descent. So let’s get started with the article.

What is a Gradient?

Gradient = dy / dx, where:

dy = change in y

dx = change in x

1. A gradient measures how much the output of a function changes when you change its inputs a little.
2. In machine learning, the gradient generalizes the derivative to a function that has more than one input
variable: it is a vector of partial derivatives, one per input. Known in mathematical terms as the slope
of a function, the gradient here measures how the error changes with respect to a change in each weight.
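
To make the idea of "one partial derivative per weight" concrete, here is a minimal sketch. The error function, its two weights, and the helper names are hypothetical and chosen only for illustration; the numerical version nudges one weight at a time and measures how the error responds.

```python
import numpy as np

def error(w):
    # Hypothetical error surface with two weights: E(w) = w0^2 + 3 * w1^2
    return w[0] ** 2 + 3 * w[1] ** 2

def analytic_gradient(w):
    # Partial derivatives of E: dE/dw0 = 2 * w0 and dE/dw1 = 6 * w1
    return np.array([2 * w[0], 6 * w[1]])

def numerical_gradient(f, w, eps=1e-6):
    # Central differences: nudge one weight at a time and measure the change in f
    grad = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        step = np.zeros_like(w, dtype=float)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0])
print(analytic_gradient(w))          # exact partial derivatives
print(numerical_gradient(error, w))  # finite-difference approximation
```

The two printed vectors should agree closely, which is a common sanity check for hand-written gradient code.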

Learning Rate:

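The learning rate is a hyperparameter, set by the algorithm designer, that controls the size of each update
step. If the learning rate is too small, the parameters update very slowly and more iterations are needed to
reach a good solution. If it is too large, the updates can overshoot the minimum and fail to converge.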
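
The effect of the learning rate can be seen on a toy problem. The sketch below assumes a hypothetical one-parameter function f(w) = w**2 and a helper named `descend`; neither comes from the article. The third rate is deliberately too large and is included only for contrast.

```python
def descend(learning_rate, steps=20, w=5.0):
    # Minimise f(w) = w**2, whose derivative is 2 * w, starting from w = 5
    for _ in range(steps):
        gradient = 2 * w
        w = w - learning_rate * gradient
    return w

for lr in (0.001, 0.1, 1.1):
    # A tiny rate barely moves, a moderate rate approaches 0, a large rate diverges
    print(lr, descend(lr))
```

After 20 steps the smallest rate has hardly left the starting point, the moderate rate is close to the minimum at 0, and the largest rate has moved further away with every step.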

Types of Gradient Descent:

There are three popular types that mainly differ in the amount of data they use:

1. BATCH GRADIENT DESCENT:

Batch gradient descent, also known as vanilla gradient descent, calculates the error for each example
in the training dataset, but the model is not updated until every training example has been evaluated.
One such pass over the whole dataset is called a cycle, or a training epoch.

Batch gradient descent is computationally efficient because it makes few updates, and it produces a
stable error gradient and stable convergence. A drawback is that the stable error gradient can sometimes
lead to convergence at a point that isn’t the best the model can achieve. It also requires the entire
training dataset to be in memory and available to the algorithm.

import numpy as np

class GDRegressor:
    def __init__(self, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs

    def fit(self, X_train, y_train):
        # Initialise the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # One update per epoch, computed from the entire training set
            y_hat = np.dot(X_train, self.coef_) + self.intercept_

            # Gradient of the mean squared error with respect to the intercept
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)

            # Gradient of the mean squared error with respect to the coefficients
            coef_der = -2 * np.dot((y_train - y_hat), X_train) / X_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_, self.coef_)

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_

Advantages

1. Fewer model updates make this variant of gradient descent more computationally efficient than
stochastic gradient descent.
2. The lower update frequency gives a more stable error gradient and, on some problems, more stable
convergence.
3. Because calculating the prediction errors is separate from updating the model, the algorithm lends
itself to parallel implementations.

Disadvantages

1. A more stable error gradient can cause the model to converge prematurely to a suboptimal set of
parameters.
2. Updating at the end of each training epoch adds the complexity of accumulating prediction errors
across all training examples.
3. Batch gradient descent typically requires the entire training dataset to be in memory and
available to the algorithm.
4. Large datasets can result in very slow model updates and training speeds.
5. It is slow and requires more computational power.
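
The accumulation mentioned in disadvantage 2 can be sketched as follows. This is an illustrative example on synthetic data, with hypothetical function names and a hypothetical `chunk_size` parameter; it shows that summing the mean-squared-error gradient chunk by chunk and averaging at the end gives the same result as computing it from the whole dataset at once.

```python
import numpy as np

def full_gradient(X, y, coef, intercept):
    # Mean-squared-error gradient computed from the whole dataset at once
    residual = y - (X @ coef + intercept)
    return -2 * (residual @ X) / len(X), -2 * residual.mean()

def accumulated_gradient(X, y, coef, intercept, chunk_size):
    # Same gradient, but errors are summed chunk by chunk and averaged at the end
    coef_sum = np.zeros_like(coef)
    intercept_sum = 0.0
    for start in range(0, len(X), chunk_size):
        X_chunk = X[start:start + chunk_size]
        y_chunk = y[start:start + chunk_size]
        residual = y_chunk - (X_chunk @ coef + intercept)
        coef_sum += -2 * (residual @ X_chunk)
        intercept_sum += -2 * residual.sum()
    return coef_sum / len(X), intercept_sum / len(X)

# Synthetic data, assumed only for this illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0
coef, intercept = np.ones(3), 0.0

g_full = full_gradient(X, y, coef, intercept)
g_acc = accumulated_gradient(X, y, coef, intercept, chunk_size=32)
print(np.allclose(g_full[0], g_acc[0]), np.isclose(g_full[1], g_acc[1]))
```

The model is still updated only once per epoch; chunking changes how the gradient is computed, not how often the parameters change.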

2. STOCHASTIC GRADIENT DESCENT:

By contrast, stochastic gradient descent (SGD) updates the parameters after each training example, one
example at a time. Depending on the problem, this can make SGD faster than batch gradient descent. One
benefit is that the frequent updates give a fairly detailed picture of the rate of improvement.

However, these frequent updates are more computationally expensive than the batch approach. They can
also produce noisy gradients, which may cause the error rate to fluctuate rather than decrease
steadily.

Advantages

1. Frequent updates give immediate insight into the model’s performance and rate of improvement.
2. This variant of gradient descent is probably the easiest to understand and implement,
especially for beginners.
3. The higher update frequency can result in faster learning on some problems.
4. The noisy update process can allow the model to escape local minima (e.g., premature convergence).
5. Each individual update is faster and needs less computational power than a full-batch update.
6. It is suitable for larger datasets.

Disadvantages

1. Frequent model updates are more computationally intensive than other gradient descent
configurations, so training on large datasets can take considerably longer.
2. Frequent updates can produce a noisy gradient signal, which can cause the model parameters, and in
turn the model error, to jump around (higher variance across training epochs).
3. The noisy learning process along the error gradient can also make it hard for the algorithm to settle
on an error minimum for the model.
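
For comparison with the batch and mini-batch regressors in this article, the following is a minimal sketch of a one-example-at-a-time update for linear regression. The class name `SGDRegressorSketch`, the `seed` parameter, and the synthetic data are assumptions made for illustration; this is not scikit-learn's `SGDRegressor` and not the author's code.

```python
import numpy as np

class SGDRegressorSketch:  # hypothetical name, unrelated to sklearn's SGDRegressor
    def __init__(self, learning_rate=0.01, epochs=50, seed=0):
        self.lr = learning_rate
        self.epochs = epochs
        self.rng = np.random.default_rng(seed)
        self.coef_ = None
        self.intercept_ = None

    def fit(self, X_train, y_train):
        self.intercept_ = 0.0
        self.coef_ = np.ones(X_train.shape[1])
        for _ in range(self.epochs):
            # Visit the examples in a fresh random order every epoch
            for i in self.rng.permutation(X_train.shape[0]):
                error = y_train[i] - (np.dot(X_train[i], self.coef_) + self.intercept_)
                # Update immediately from this single example's squared error
                self.intercept_ += self.lr * 2 * error
                self.coef_ += self.lr * 2 * error * X_train[i]
        return self

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_

# Noise-free synthetic data, assumed only for this illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.0]) + 2.0
model = SGDRegressorSketch().fit(X, y)
print(model.coef_, model.intercept_)
```

With 200 examples and 50 epochs this performs 10,000 parameter updates, compared with 50 for a batch version run for the same number of epochs.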

Implementation of the SGD classifier in sklearn:

from sklearn.linear_model import SGDClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)
clf.fit(X, y)
# Output shown in an interactive session: SGDClassifier(max_iter=5)

3. MINI-BATCH GRADIENT DESCENT:

Mini-batch gradient descent combines the ideas of batch gradient descent and SGD, which makes it the
preferred technique. It splits the training dataset into small batches and performs an update for each batch.
This strikes a balance between the efficiency of batch gradient descent and the robustness of stochastic
gradient descent.

Mini-batch sizes typically range from 50 to 256, although, as with other machine learning techniques,
there is no set standard because the best value depends on the application. This is the most popular
variant in deep learning and the one typically used when training a neural network.

import random

import numpy as np

class MBGDRegressor:
    def __init__(self, batch_size, learning_rate=0.01, epochs=100):
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        self.batch_size = batch_size

    def fit(self, X_train, y_train):
        # Initialise the intercept to 0 and every coefficient to 1
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1])

        for i in range(self.epochs):
            # One update per mini-batch, several mini-batches per epoch
            for j in range(X_train.shape[0] // self.batch_size):
                # Draw batch_size distinct row indices at random
                idx = random.sample(range(X_train.shape[0]), self.batch_size)
                y_hat = np.dot(X_train[idx], self.coef_) + self.intercept_

                intercept_der = -2 * np.mean(y_train[idx] - y_hat)
                self.intercept_ = self.intercept_ - (self.lr * intercept_der)

                # Averaged over the batch, consistent with intercept_der and GDRegressor
                coef_der = -2 * np.dot((y_train[idx] - y_hat), X_train[idx]) / self.batch_size
                self.coef_ = self.coef_ - (self.lr * coef_der)

        print(self.intercept_, self.coef_)

    def predict(self, X_test):
        return np.dot(X_test, self.coef_) + self.intercept_

Advantages

1. The model is updated more frequently than in batch gradient descent, allowing for more
robust convergence and helping to avoid local minima.
2. Batched updates provide a more computationally efficient process than stochastic gradient descent.
3. Working in batches means the full training dataset need not be in memory at once, and it permits
efficient implementations of the algorithm.

Disadvantages

1. Mini-batch gradient descent requires an additional hyperparameter, the “mini-batch size”, to be set for
the learning algorithm.
2. Error information must be accumulated across each mini-batch of training examples, as in batch gradient
descent.
3. It will generate complex functions.

Configure Mini-Batch Gradient Descent:

Mini-batch gradient descent is the variant of gradient descent recommended for most applications,
especially in deep learning.

Mini-batch sizes, commonly called “batch sizes” for brevity, are often tuned to some aspect of the
computing architecture on which the implementation runs, for example a power of 2 that fits
the memory of the GPU or CPU hardware, such as 32, 64, 128, or 256.

The batch size is a slider on the learning process.

Smaller values give a learning process that converges quickly at the cost of noise in the training
process. Larger values give a learning process that converges slowly but with accurate estimates of the
error gradient.
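
The "slider" can be illustrated by counting how many parameter updates one epoch produces for different batch sizes. The helper below is a sketch with a hypothetical name, `minibatches`; unlike the `random.sample` approach used in the article's regressor, it shuffles the indices once and cuts them into consecutive batches, so every example is used exactly once per epoch.

```python
import numpy as np

def minibatches(n_samples, batch_size, rng):
    # Shuffle the row indices once, then cut them into consecutive batches
    order = rng.permutation(n_samples)
    return [order[start:start + batch_size]
            for start in range(0, n_samples, batch_size)]

rng = np.random.default_rng(0)
n_samples = 1000
for batch_size in (1, 32, 256, n_samples):
    batches = minibatches(n_samples, batch_size, rng)
    # batch_size, then the number of updates in one epoch
    print(batch_size, len(batches))
```

A batch size of 1 corresponds to stochastic gradient descent and a batch size equal to the dataset size corresponds to batch gradient descent; the values in between trade update frequency against the accuracy of each gradient estimate.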

Conclusion

In this article, we learned about different types of gradient descent. The key takeaways from the article are:

1. Mini-batch gradient descent is the recommended method because it combines the ideas of batch
gradient descent and SGD. It divides the training dataset into small batches and performs an update
for each one. This balances the efficiency of batch gradient descent with the robustness of
stochastic gradient descent.
2. With batch gradient descent, the parameters are adjusted only after the error has been calculated for
the entire training dataset. One advantage of batch gradient descent is its computational efficiency;
it also produces a stable error gradient and stable convergence.
3. Stochastic gradient descent (SGD) updates the parameters after each training example in the
dataset, one at a time. This can make SGD faster than batch gradient descent. One benefit is that
the frequent updates give a fairly detailed picture of the rate of improvement.
4. In general, the higher the learning rate, the faster the model learns, at the risk of ending with a
sub-optimal final set of weights. With a low learning rate, the model can learn a more optimal, or even
globally optimal, set of weights, but training can take considerably longer.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Article Url - https://www.analyticsvidhya.com/blog/2022/07/gradient-descent-and-its-types/

Akash Wagh
