
Gradient Descent

What is a Cost Function?

A cost function measures the performance of a model for any given data. It quantifies the error between predicted values and expected values and presents it as a single real number.

After making a hypothesis with initial parameters, we calculate the cost function, and then, with the goal of reducing it, we modify the parameters using the gradient descent algorithm over the given data. For a hypothesis h_theta with m training examples, a commonly used cost function is the mean squared error:

J(theta) = (1 / 2m) * sum over i of (h_theta(x_i) - y_i)^2

where h_theta(x_i) is the predicted value and y_i is the expected value for the i-th example.
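As a concrete sketch of the mean squared error cost above (the function name and toy data are illustrative assumptions, not part of the original):

```python
import numpy as np

def mse_cost(theta, X, y):
    """Mean squared error cost: J(theta) = (1/2m) * sum((X @ theta - y)**2)."""
    m = len(y)                     # number of training examples
    errors = X @ theta - y         # predicted minus expected values
    return (1.0 / (2 * m)) * np.sum(errors ** 2)

# Toy data generated from y = 2x, so theta = [0, 2] should give zero cost.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term
y = np.array([2.0, 4.0, 6.0])
print(mse_cost(np.array([0.0, 2.0]), X, y))  # 0.0 (perfect fit)
print(mse_cost(np.array([0.0, 0.0]), X, y))  # positive cost for a bad fit
```

A lower value of J means the hypothesis fits the data better, which is why gradient descent tries to push this number down.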

What is Gradient Descent?


Gradient Descent is one of the most commonly used optimization algorithms for training machine learning models: it minimizes the error between actual and predicted results. It is also widely used to train neural networks.

In mathematical terminology, optimization refers to the task of minimizing or maximizing an objective function f(x) parameterized by x. Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters. The main objective of gradient descent is to minimize the (ideally convex) cost function through iterative parameter updates. Once optimized, these models can be used as powerful tools for artificial intelligence and various computer science applications.

In this tutorial on Gradient Descent in Machine Learning, we will learn about gradient descent in detail: the role of cost functions as a barometer within machine learning, the types of gradient descent, learning rates, and more.

Example of Gradient Descent

Let’s say you are playing a game where the players are at the top of a mountain and are asked to reach its lowest point. Additionally, they are blindfolded. What approach do you think would get you to the bottom?

Take a moment to think about this before you read on.

The best way is to feel the ground and find where the land descends. From that position, take a step in the descending direction, and repeat this process until you reach the lowest point.

Finding the lowest point in a hilly landscape. (Source: Fisseha Berhane)

Gradient descent is an iterative optimization algorithm for finding the local minimum of a
function.

To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient (i.e., we move in the direction opposite to the gradient) of the function at the current point. If we instead take steps proportional to the positive of the gradient (moving in the direction of the gradient), we approach a local maximum of the function; that procedure is called Gradient Ascent.

Gradient descent was originally proposed by Cauchy in 1847 and is also known as steepest descent. The main objective of the gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively:

o Calculate the first-order derivative of the function to compute the gradient (slope) at the current point.
o Move in the direction opposite to the gradient, stepping away from the current point by alpha times the gradient, where alpha is the learning rate: a tuning parameter in the optimization process that decides the length of the steps.
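The two steps above can be sketched on a toy one-dimensional objective; f(x) = (x - 3)**2 is a hypothetical function chosen only for illustration:

```python
def gradient(x):
    # Step 1: first-order derivative of f(x) = (x - 3)**2, whose minimum is at x = 3.
    return 2 * (x - 3)

alpha = 0.1   # learning rate: decides the length of each step
x = 0.0       # arbitrary starting point

for _ in range(100):
    x = x - alpha * gradient(x)   # step 2: move opposite to the gradient

print(round(x, 4))  # converges close to the minimum at x = 3
```

Each iteration shrinks the distance to the minimum by a constant factor, which is why the iterate settles at x = 3.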

How does Gradient Descent work?

Before examining the working principle of gradient descent, we should recall how the slope of a line is described in linear regression. The equation for simple linear regression is:

Y = mX + c

where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.
An arbitrary starting point is used to evaluate the initial performance. At this starting point, we compute the first derivative (slope) and use a tangent line to measure its steepness. This slope then informs the updates to the parameters (weights and bias).

The slope is steep at the starting point, but as new parameters are generated, the steepness gradually reduces until, at the lowest point, it approaches zero. This lowest point is called the point of convergence.

The main objective of gradient descent is to minimize the cost function, i.e., the error between expected and actual values. Minimizing the cost function requires two factors:

o Direction & Learning Rate

These two factors determine the partial-derivative calculations of future iterations and carry the algorithm toward the point of convergence (a local or global minimum). Let's discuss the learning rate in brief:

Learning Rate:

It is defined as the step size taken toward the minimum or lowest point. It is typically a small value, evaluated and adjusted based on the behavior of the cost function. A high learning rate gives larger steps but risks overshooting the minimum; a low learning rate gives small steps, which sacrifices overall efficiency but has the advantage of more precision.
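This trade-off can be seen on the toy objective f(x) = x**2, whose gradient is 2x (a hypothetical example, not from the original text):

```python
def descend(alpha, steps=50, x0=10.0):
    """Run gradient descent on f(x) = x**2 with a given learning rate alpha."""
    x = x0
    for _ in range(steps):
        x -= alpha * 2 * x   # gradient of x**2 is 2x; minimum is at x = 0
    return x

print(descend(0.01))  # low rate: precise but slow, still far from 0 after 50 steps
print(descend(0.4))   # well-chosen rate: converges essentially to 0
print(descend(1.1))   # too high: each step overshoots and the iterate diverges
```

With alpha too large, each update flips the sign of x and grows its magnitude, which is exactly the overshooting risk described above.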
How Does Gradient Descent Work?
1. Gradient descent is an optimization algorithm used to minimize the cost function of a
model.
2. The cost function measures how well the model fits the training data and is defined
based on the difference between the predicted and actual values.
3. The gradient of the cost function is the derivative with respect to the model’s parameters
and points in the direction of the steepest ascent.
4. The algorithm starts with an initial set of parameters and updates them in small steps to
minimize the cost function.
5. In each iteration of the algorithm, the gradient of the cost function with respect to each
parameter is computed.
6. The gradient tells us the direction of the steepest ascent, and by moving in the opposite
direction, we can find the direction of the steepest descent.
7. The size of the step is controlled by the learning rate, which determines how quickly the
algorithm moves towards the minimum.
8. The process is repeated until the cost function converges to a minimum, indicating that
the model has reached the optimal set of parameters.
9. There are different variations of gradient descent, including batch gradient descent,
stochastic gradient descent, and mini-batch gradient descent, each with its own
advantages and limitations.
10. Efficient implementation of gradient descent is essential for achieving good performance
in machine learning tasks. The choice of the learning rate and the number of iterations
can significantly impact the performance of the algorithm.
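The steps above can be sketched as a minimal batch gradient descent fitting the line Y = mX + c; the function name and toy data here are illustrative assumptions:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.05, iterations=1000):
    """Fit y ≈ m*X + c by repeatedly stepping against the cost gradient."""
    m_coef, c = 0.0, 0.0           # step 4: start from an initial set of parameters
    n = len(y)
    for _ in range(iterations):
        preds = m_coef * X + c
        # step 5: gradient of the MSE cost with respect to each parameter
        grad_m = (2.0 / n) * np.sum((preds - y) * X)
        grad_c = (2.0 / n) * np.sum(preds - y)
        # steps 6-7: move opposite to the gradient, scaled by the learning rate
        m_coef -= alpha * grad_m
        c -= alpha * grad_c
    return m_coef, c

X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 1.0                  # ground truth: slope 2, intercept 1
m_coef, c = batch_gradient_descent(X, y)
print(round(m_coef, 2), round(c, 2))  # recovers approximately 2.0 and 1.0
```

Step 8 (convergence) shows up here as the gradients shrinking toward zero, after which further iterations barely change the parameters.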
Types of Gradient Descent

Based on how much of the training data is used to compute each update, gradient descent can be divided into batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Let's understand these different types:

1. Batch Gradient Descent:

Batch gradient descent (BGD) computes the error for every point in the training set and updates the model only after all training examples have been evaluated; one such full pass over the data is known as a training epoch. In simple words, each update sums the gradient over all examples.

Advantages of Batch gradient descent:

o It produces less noise than other types of gradient descent.
o It produces stable convergence.
o It is computationally efficient, since all resources are applied to all training samples at once.

2. Stochastic gradient descent

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration: it updates the parameters after each example in the dataset, one at a time. Since it requires only one training example at a time, it is easier to fit in allocated memory. However, it loses some computational efficiency compared with batch gradient descent because of its frequent updates. Those frequent updates also make the gradient noisy; nonetheless, this noise can sometimes be helpful for escaping local minima and finding the global minimum.
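A minimal sketch of SGD on a toy linear-regression setup (the function name and data are hypothetical); note the update uses a single example's gradient:

```python
import random

def sgd(data, alpha=0.05, epochs=50, seed=0):
    """Stochastic gradient descent: update the parameters one example at a time."""
    random.seed(seed)
    m_coef, c = 0.0, 0.0
    for _ in range(epochs):
        random.shuffle(data)                  # visit examples in a random order
        for x, y in data:                     # one training example per update
            error = (m_coef * x + c) - y
            m_coef -= alpha * 2 * error * x   # noisy single-example gradient
            c -= alpha * 2 * error
    return m_coef, c

data = [(x, 2.0 * x + 1.0) for x in range(5)]  # toy data from y = 2x + 1
m_coef, c = sgd(data)
print(round(m_coef, 2), round(c, 2))  # close to 2.0 and 1.0
```

Because only one example lives in memory per update, this is the memory-friendly behaviour the description above refers to.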

Advantages of Stochastic gradient descent:

In stochastic gradient descent (SGD), learning happens on every example, which brings a few advantages over other types of gradient descent:

o It is easier to fit in allocated memory.
o It is relatively faster to compute than batch gradient descent.
o It is more efficient for large datasets.

3. MiniBatch Gradient Descent:

Mini-batch gradient descent is a combination of batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs an update after each batch. Splitting the dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent, giving higher computational efficiency with a less noisy gradient.

Advantages of Mini Batch gradient descent:

o It is easier to fit in allocated memory.
o It is computationally efficient.
o It produces stable gradient descent convergence.
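A minimal sketch of mini-batch gradient descent on the same toy linear-regression setup (names, batch size, and data are illustrative assumptions):

```python
import numpy as np

def minibatch_gd(X, y, batch_size=2, alpha=0.05, epochs=500, seed=0):
    """Mini-batch gradient descent: one parameter update per small batch."""
    rng = np.random.default_rng(seed)
    m_coef, c = 0.0, 0.0
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)            # shuffle, then split into batches
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            preds = m_coef * X[idx] + c
            # averaged gradient over the batch: less noisy than a single example
            grad_m = (2.0 / len(idx)) * np.sum((preds - y[idx]) * X[idx])
            grad_c = (2.0 / len(idx)) * np.sum(preds - y[idx])
            m_coef -= alpha * grad_m
            c -= alpha * grad_c
    return m_coef, c

X = np.arange(4, dtype=float)
y = 2.0 * X + 1.0                  # ground truth: slope 2, intercept 1
m_coef, c = minibatch_gd(X, y)
print(round(m_coef, 2), round(c, 2))  # close to 2.0 and 1.0
```

Averaging the gradient over each batch is exactly the compromise described above: fewer updates than SGD, but far cheaper per update than full-batch descent.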

Challenges with the Gradient Descent

Although gradient descent is one of the most popular methods for optimization problems, it still comes with some challenges, including the following:

1. Local Minima and Saddle Point:

For convex problems, gradient descent can find the global minimum easily, while for non-convex problems it can struggle to find the global minimum, which is where the machine learning model achieves its best results.

Whenever the slope of the cost function is zero or very close to zero, the model stops learning. Apart from the global minimum, this near-zero slope also occurs at saddle points and local minima. A local minimum has a shape similar to the global minimum: the slope of the cost function increases on both sides of the current point.

In contrast, at a saddle point the negative gradient occurs only on one side of the point: the function reaches a local maximum along one direction and a local minimum along the other. The name comes from the shape of a horse's saddle. A local minimum is so named because the loss function is minimal at that point within a local region; the global minimum is where the loss function is minimal over its entire domain.
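The stalling behaviour near a saddle can be sketched on the classic saddle function f(x, y) = x**2 - y**2, which has a saddle at the origin (a hypothetical example, not from the original text):

```python
def gd_step(x, y, alpha=0.1):
    # Gradient of f(x, y) = x**2 - y**2 is (2x, -2y); descent subtracts it.
    return x - alpha * 2 * x, y + alpha * 2 * y

x, y = 1.0, 1e-6      # start almost exactly on the ridge through the saddle
for _ in range(60):
    x, y = gd_step(x, y)

# x collapses toward the saddle at (0, 0), where the gradient is nearly zero
# and progress stalls, until the tiny y-component finally grows and escapes.
print(x, y)
```

Along x the point is attracted to the saddle, while along y it is repelled only very slowly, which is why learning appears to stop near such points.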

2. Vanishing and Exploding Gradient

In a deep neural network trained with gradient descent and backpropagation, two further issues can occur besides local minima and saddle points.

Vanishing Gradients:

A vanishing gradient occurs when the gradient becomes smaller than expected. During backpropagation, the gradient shrinks as it flows backwards, so the earlier layers of the network learn more slowly than the later layers. When this happens, the weight updates of the early layers become insignificant and training stalls.
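A rough illustration of why gradients vanish, assuming sigmoid activations (an assumption for this sketch): backpropagation multiplies one derivative factor per layer, and the sigmoid's derivative never exceeds 0.25.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)          # maximum value is 0.25, reached at z = 0

# Backpropagation multiplies one derivative factor per layer, so even the
# best case (0.25 per layer) shrinks geometrically with network depth.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_derivative(0.0)

print(grad)   # 0.25**10 ≈ 9.5e-7: the earliest layers receive almost no signal
```

This geometric shrinkage is why the early layers described above update so little.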

Exploding Gradient:

An exploding gradient is the opposite of a vanishing gradient: it occurs when the gradient grows too large, creating an unstable model. In this scenario, the model weights grow until they are represented as NaN. This problem can be addressed by reducing the complexity of the model, for example through dimensionality reduction.
