GRADIENT LEARNING

In deep learning, gradient-based learning is the core principle behind training neural networks. Gradient descent is one of the most commonly used optimization algorithms: it trains machine learning models, including neural networks, by iteratively minimizing the error between predicted and actual results.

1. Loss Function: The first step in gradient-based learning is defining a loss function, which measures
the difference between the predicted outputs of the neural network and the actual target values. Common
loss functions include mean squared error (MSE) for regression tasks and categorical cross-entropy for
classification tasks.
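
As a small illustration, the two losses above can be computed directly with NumPy; the array values below are made-up toy numbers, not taken from the text.

    import numpy as np

    # Mean squared error for a regression task (toy values)
    y_true = np.array([2.0, 3.5, 5.0])
    y_pred = np.array([2.5, 3.0, 4.0])
    mse = np.mean((y_true - y_pred) ** 2)      # average squared difference

    # Categorical cross-entropy for a 3-class classification task
    p_true = np.array([0.0, 1.0, 0.0])         # one-hot target
    p_pred = np.array([0.2, 0.7, 0.1])         # predicted class probabilities
    cross_entropy = -np.sum(p_true * np.log(p_pred))

    print(mse, cross_entropy)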

2. Backpropagation: Backpropagation is the algorithm used to efficiently compute the gradients of the loss function with respect to the model parameters. It involves two main steps (a short sketch follows the list below):

o Forward Pass: During the forward pass, input data is propagated through the neural network and predictions are made. The activations of each layer are computed sequentially until the output is generated.
o Backward Pass: During the backward pass, the gradients of the loss function with respect to the model parameters are computed by applying the chain rule. Gradients are propagated backward through the network, layer by layer, so that each layer's gradients can be computed efficiently from the gradients of the layer above it.
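
The sketch below walks through both passes for a single linear layer with a sigmoid activation and an MSE loss; the layer sizes, random data, and variable names are illustrative assumptions rather than anything specified above.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))          # 4 samples, 3 input features
    y = rng.normal(size=(4, 1))          # toy regression targets
    W = rng.normal(size=(3, 1))          # layer weights
    b = np.zeros((1, 1))                 # layer bias

    # Forward pass: propagate the inputs to a prediction and a loss value
    z = x @ W + b
    y_hat = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
    loss = np.mean((y_hat - y) ** 2)     # MSE loss

    # Backward pass: apply the chain rule, layer by layer
    d_yhat = 2.0 * (y_hat - y) / y.shape[0]   # dLoss/dy_hat
    d_z = d_yhat * y_hat * (1.0 - y_hat)      # chain through the sigmoid
    dW = x.T @ d_z                            # gradient w.r.t. the weights
    db = d_z.sum(axis=0, keepdims=True)       # gradient w.r.t. the bias

    print(loss, dW.ravel(), db.ravel())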

3. Parameter Updates: Once the gradients have been computed, the model parameters are updated using the gradient descent algorithm. This process is repeated iteratively for a set number of epochs or until convergence, with the goal of minimizing the loss function and improving the model's performance on the training data.
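
A plain gradient descent update can be sketched as the rule new_parameter = parameter - learning_rate * gradient; the helper function, parameter values, and learning rate below are illustrative assumptions.

    import numpy as np

    def gradient_descent_step(params, grads, learning_rate=0.1):
        # Move every parameter a small step against its gradient
        return [p - learning_rate * g for p, g in zip(params, grads)]

    # Toy weight vector and gradient for illustration
    W = np.array([0.5, -0.2, 0.1])
    dW = np.array([0.03, -0.01, 0.02])
    (W_new,) = gradient_descent_step([W], [dW])
    print(W_new)
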
Types of Gradient Descent
1. Batch Gradient Descent:
Batch gradient descent (BGD) computes the error for every point in the training set and updates the model only after evaluating all training examples. One full pass over the training set is known as a training epoch. In simple words, each update sums (or averages) the gradients over all training examples (see the sketch after the advantages below).

Advantages of Batch gradient descent:

o It produces less noisy updates than the other gradient descent variants.
o It produces stable convergence.
o It is computationally efficient, since all training examples are processed together in a single vectorized update.
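
As a sketch of batch gradient descent, the toy linear regression below performs exactly one update per epoch, using the gradient averaged over all training examples; the data, learning rate, and epoch count are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                               # 100 examples, 2 features
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=100)  # toy targets

    w = np.zeros(2)
    lr = 0.1
    for epoch in range(50):
        # One update per epoch: gradient averaged over ALL training examples
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad

    print(w)   # approaches the true weights [1.5, -2.0]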

2. Stochastic Gradient Descent:


Stochastic gradient descent (SGD) is a variant of gradient descent that processes one training example per iteration: the model parameters are updated after every single example in the dataset. Because it requires only one training example at a time, it needs very little memory (see the sketch after the advantages below).

Advantages of Stochastic gradient descent:

In stochastic gradient descent, learning happens on every example, which gives it a few advantages over the other gradient descent variants:

o It requires little memory, since only one example is needed per update.
o Each update is faster to compute than in batch gradient descent.
o It is more efficient for large datasets.
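
For comparison, here is a sketch of stochastic gradient descent on the same kind of toy linear regression problem; the parameters are updated once per training example, and the data, learning rate, and epoch count are again illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=100)

    w = np.zeros(2)
    lr = 0.05
    for epoch in range(10):
        for i in rng.permutation(len(y)):
            # One (noisy) update per training example
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad

    print(w)   # close to [1.5, -2.0], with some noise from single-example updates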

3. Mini-Batch Gradient Descent:


Mini-batch gradient descent combines ideas from batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs an update on each batch separately. Splitting the training data into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent (see the sketch after the advantages below).

Advantages of Mini Batch gradient descent:

o Each mini-batch fits easily in allocated memory.
o It is computationally efficient.
o It produces stable convergence.
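
Finally, a sketch of mini-batch gradient descent on the same toy problem; the batch size of 16 and the other settings are illustrative assumptions. The parameters are updated once per mini-batch, using the gradient averaged over that batch.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=100)

    w = np.zeros(2)
    lr = 0.1
    batch_size = 16
    for epoch in range(20):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            # One update per mini-batch: gradient averaged over the batch
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad

    print(w)   # approaches [1.5, -2.0]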
