Unit 4 - GRADIENT LEARNING
In deep learning, gradient-based learning is the core principle behind training neural networks. Gradient
descent is one of the most commonly used optimization algorithms for training machine learning models,
including neural networks: it iteratively adjusts the model's parameters so as to minimize the error
between predicted and actual results.
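At its core, each gradient descent step moves the parameters a small distance against the gradient of the
loss. Writing θ for the parameters, η for the learning rate, and L for the loss function, the update rule is
commonly written as

    θ ← θ − η · ∇θ L(θ)

The steps below describe how this rule is applied when training a neural network.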
1. Loss Function: The first step in gradient-based learning is defining a loss function, which measures
the difference between the predicted outputs of the neural network and the actual target values. Common
loss functions include mean squared error (MSE) for regression tasks and categorical cross-entropy for
classification tasks.
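For concreteness, these two losses can be sketched in a few lines of NumPy; the function names and the toy
arrays below are illustrative choices, not part of any particular library:

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared difference between targets and predictions
    return np.mean((y_true - y_pred) ** 2)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true holds one-hot targets, y_pred holds predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid taking log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Toy usage
print(mse(np.array([2.0, 3.5, 1.0]), np.array([2.5, 3.0, 0.5])))   # 0.25
print(categorical_cross_entropy(np.array([[1, 0, 0], [0, 1, 0]]),
                                np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])))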
2. Backpropagation: Backpropagation is the algorithm used to compute the gradients of the loss function
with respect to the model parameters efficiently. It involves two main steps:
Forward Pass: During the forward pass, input data is propagated through the neural network, and
predictions are made. The activations of each layer are computed sequentially until the output is
generated.
Backward Pass: During the backward pass, the gradients of the loss function with respect to the
model parameters are computed by applying the chain rule. Gradients are propagated backward
through the network, layer by layer, reusing the gradients already computed for later layers, which
makes the overall computation efficient.
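The sketch below illustrates a forward and backward pass for a tiny one-hidden-layer network trained with
MSE, using NumPy. The network size, data, and variable names are arbitrary choices for illustration, not a
prescribed implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 3 input features, 1 output
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Parameters of a 3 -> 2 -> 1 network
W1, b1 = rng.normal(size=(3, 2)), np.zeros((1, 2))
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))

# ---- Forward pass: compute activations layer by layer ----
z1 = X @ W1 + b1                   # pre-activation of the hidden layer
a1 = np.tanh(z1)                   # hidden activation
y_hat = a1 @ W2 + b2               # network output (linear output layer)
loss = np.mean((y_hat - y) ** 2)   # MSE loss

# ---- Backward pass: apply the chain rule layer by layer ----
n = X.shape[0]
d_yhat = 2 * (y_hat - y) / n       # dL/dy_hat
dW2 = a1.T @ d_yhat                # dL/dW2
db2 = d_yhat.sum(axis=0, keepdims=True)
d_a1 = d_yhat @ W2.T               # propagate the gradient to the hidden layer
d_z1 = d_a1 * (1 - a1 ** 2)        # tanh'(z1) = 1 - tanh(z1)^2
dW1 = X.T @ d_z1
db1 = d_z1.sum(axis=0, keepdims=True)

print("loss:", loss)
print("gradient shapes:", dW1.shape, db1.shape, dW2.shape, db2.shape)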
3. Parameter Updates: Once the gradients have been computed, the model parameters are updated using
the gradient descent algorithm. This process is repeated iteratively for a certain number of epochs or until
convergence, with the goal of minimizing the loss function and improving the model's performance on the
training data.
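A deliberately tiny, self-contained example of this update loop is sketched below; the loss, starting point,
learning rate, and number of epochs are all arbitrary illustration choices:

# Minimize L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3)
w = 0.0     # initial parameter value
lr = 0.1    # learning rate

for epoch in range(50):
    grad = 2 * (w - 3)     # gradient of the loss at the current w
    w = w - lr * grad      # gradient descent update: w <- w - lr * dL/dw

print(w)    # approaches the minimizer w = 3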
Types of Gradient Descent
1. Batch Gradient Descent:
Batch gradient descent (BGD) computes the error for every example in the training set and
updates the model only after all training examples have been evaluated. One complete pass
through the training set is called a training epoch. In simple words, each update requires
summing the gradients over all training examples.
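As a sketch of this idea, the following batch gradient descent loop for linear regression with an MSE loss
computes the gradient over the entire training set before each update; the data, learning rate, and epoch
count are made-up illustration values:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # 100 examples, 2 features
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy linear targets

w = np.zeros(2)
lr = 0.1

for epoch in range(200):                     # one update per pass over the data
    y_pred = X @ w
    grad = 2 * X.T @ (y_pred - y) / len(X)   # gradient averaged over ALL examples
    w -= lr * grad

print(w)   # close to the true weights [2.0, -1.0]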
2. Stochastic Gradient Descent:
In stochastic gradient descent (SGD), the model parameters are updated after every single
training example rather than after a full pass over the data. Each update is therefore much
cheaper, which gives SGD several advantages over batch gradient descent, such as faster initial
progress and lower memory requirements, at the cost of noisier updates.
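For contrast, a stochastic gradient descent sketch on the same kind of linear regression problem updates
the weights after every individual example; again, the data and hyperparameters are arbitrary illustration
values:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                # 100 examples, 2 features
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=100)  # noisy linear targets

w = np.zeros(2)
lr = 0.05

for epoch in range(20):
    for i in rng.permutation(len(X)):        # visit examples in random order
        error = X[i] @ w - y[i]              # error on a single example
        grad = 2 * error * X[i]              # gradient from that one example only
        w -= lr * grad                       # immediate parameter update

print(w)   # noisier path, but also approaches [2.0, -1.0]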