Finalized Review Report 3 (Gradient, Confusion Matrix)
Submitted by
Section: A
1. Initialization: The model’s weights and biases are initialized with small random values.
2. Forward Pass: During the forward pass, the input data is fed into the neural network and propagates through it layer by layer. Each neuron applies its activation function to produce its output, and this continues until the network’s final output is generated.
3. Loss Function: A loss function is used to quantify how well the model is performing on the training data. It measures the difference between the predicted output and the actual target values. Common loss functions for different tasks include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for classification.
4. Backpropagation: The gradient of the loss function with respect to each of the model’s parameters is computed by propagating the error backwards through the network using the chain rule.
5. Gradient Update: The gradients calculated in the previous step are used to update the model’s parameters. The goal is to adjust the parameters in a way that reduces the loss function and improves the model’s predictions. The learning rate hyperparameter controls the size of the steps taken during the update.
6. Repeat: Steps 2 to 5 are repeated for each batch of training data; one full pass over the training data is known as an epoch. Training can continue for multiple epochs until the model converges to a point where the loss is minimized, or a predefined stopping criterion is met. The complete loop is sketched just after this list.
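The steps above can be condensed into a few lines of code. The following is a minimal sketch, assuming a single linear layer trained with full-batch gradient descent and an MSE loss on synthetic regression data; the layer size, data, and hyperparameter values are illustrative and not taken from the report.

# Sketch of the training loop in steps 1-6 for one linear layer (illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): 100 samples, 3 features.
X = rng.normal(size=(100, 3))
true_w = np.array([[1.5], [-2.0], [0.5]])
y = X @ true_w + 0.1 * rng.normal(size=(100, 1))

# Step 1: initialize weights and bias with small random values.
W = 0.01 * rng.normal(size=(3, 1))
b = np.zeros((1, 1))

lr = 0.1                                  # learning rate: controls the step size
for epoch in range(100):                  # step 6: repeat for several epochs
    # Step 2: forward pass.
    y_pred = X @ W + b
    # Step 3: loss function (Mean Squared Error).
    loss = np.mean((y_pred - y) ** 2)
    # Step 4: gradients of the loss with respect to the parameters.
    grad_y = 2 * (y_pred - y) / len(X)
    grad_W = X.T @ grad_y
    grad_b = grad_y.sum(axis=0, keepdims=True)
    # Step 5: gradient update.
    W -= lr * grad_W
    b -= lr * grad_b

print(f"final MSE: {loss:.4f}")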
By iteratively applying the Gradient Descent algorithm, the neural network “learns”
from the training data and updates its parameters in the direction that reduces the
prediction error. The process continues until the model reaches a satisfactory level of
performance, allowing it to make accurate predictions on new, unseen data.
Gradient Descent is a fundamental optimization technique used in training neural networks and other machine learning models. Several variants exist, such as Stochastic Gradient Descent (SGD), Mini-batch Gradient Descent, and more advanced techniques like Adam and RMSprop, which refine the update rule and typically converge faster.
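For concreteness, the following is a minimal sketch, not drawn from the report, contrasting the update rules behind these variants on a single parameter array; the hyperparameter values are common defaults. Mini-batch SGD uses the same update as plain gradient descent, only with the gradient estimated on a small batch of samples rather than the full dataset.

# Illustrative update rules: plain gradient descent, momentum, and Adam.
import numpy as np

def gd_step(theta, grad, lr=0.01):
    # Vanilla gradient descent: theta <- theta - lr * grad.
    return theta - lr * grad

def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    # Momentum accumulates an exponential average of past gradients,
    # smoothing the updates and often speeding up convergence.
    velocity = beta * velocity + grad
    return theta - lr * velocity, velocity

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running estimates of the gradient mean (m) and of its
    # squared magnitude (v), with bias correction for early steps.
    # The timestep t starts at 1 so the correction terms are nonzero.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v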