Finalized Review Report 3 (Gradient, Confusion Matrix)
Submitted by
Section: A
1. Initialization: The model’s weights and biases are initialized with small random values.
2. Forward Pass: During the forward pass, the input data is fed into the neural network and propagates through it layer by layer. Each neuron applies its activation function to produce its output, and this continues until the network’s final output is generated.
3. Loss Function: A loss function is used to quantify how well the model is performing on the training data. It measures the difference between the predicted output and the actual target values. Common loss functions for different tasks include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for classification.
4. Backpropagation: The gradient of the loss function with respect to each of the model’s parameters is computed by propagating the error backwards through the network using the chain rule.
5. Gradient Update: The gradients calculated in the previous step are used to update the model’s parameters. The goal is to adjust the parameters in a way that reduces the loss function and improves the model’s predictions. The learning rate hyperparameter controls the size of the steps taken during the update.
6. Repeat: Steps 2 to 5 are repeated for each batch of training data; one full pass over the training data is known as an epoch. Training can continue for multiple epochs until the model converges to a point where the loss is minimized, or a predefined stopping criterion is met. The complete loop is sketched just after this list.
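The steps above can be condensed into a few lines of code. The following is a minimal sketch, assuming a single linear layer trained with full-batch gradient descent and an MSE loss on synthetic regression data; the layer size, data, and hyperparameter values are illustrative and not taken from the report.

# Sketch of the training loop in steps 1-6 for one linear layer (illustrative).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): 100 samples, 3 features.
X = rng.normal(size=(100, 3))
true_w = np.array([[1.5], [-2.0], [0.5]])
y = X @ true_w + 0.1 * rng.normal(size=(100, 1))

# Step 1: initialize weights and bias with small random values.
W = 0.01 * rng.normal(size=(3, 1))
b = np.zeros((1, 1))

lr = 0.1                                  # learning rate: controls the step size
for epoch in range(100):                  # step 6: repeat for several epochs
    # Step 2: forward pass.
    y_pred = X @ W + b
    # Step 3: loss function (Mean Squared Error).
    loss = np.mean((y_pred - y) ** 2)
    # Step 4: gradients of the loss with respect to the parameters.
    grad_y = 2 * (y_pred - y) / len(X)
    grad_W = X.T @ grad_y
    grad_b = grad_y.sum(axis=0, keepdims=True)
    # Step 5: gradient update.
    W -= lr * grad_W
    b -= lr * grad_b

print(f"final MSE: {loss:.4f}")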
By iteratively applying the Gradient Descent algorithm, the neural network “learns”
from the training data and updates its parameters in the direction that reduces the
prediction error. The process continues until the model reaches a satisfactory level of
performance, allowing it to make accurate predictions on new, unseen data.
Gradient Descent is a fundamental optimization technique used in training neural networks and other machine learning models. Several variants exist, such as Stochastic Gradient Descent (SGD), Mini-batch Gradient Descent, and more advanced techniques like Adam and RMSprop, which refine the update rule and typically converge faster.
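For concreteness, the following is a minimal sketch, not drawn from the report, contrasting the update rules behind these variants on a single parameter array; the hyperparameter values are common defaults. Mini-batch SGD uses the same update as plain gradient descent, only with the gradient estimated on a small batch of samples rather than the full dataset.

# Illustrative update rules: plain gradient descent, momentum, and Adam.
import numpy as np

def gd_step(theta, grad, lr=0.01):
    # Vanilla gradient descent: theta <- theta - lr * grad.
    return theta - lr * grad

def momentum_step(theta, grad, velocity, lr=0.01, beta=0.9):
    # Momentum accumulates an exponential average of past gradients,
    # smoothing the updates and often speeding up convergence.
    velocity = beta * velocity + grad
    return theta - lr * velocity, velocity

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam keeps running estimates of the gradient mean (m) and of its
    # squared magnitude (v), with bias correction for early steps.
    # The timestep t starts at 1 so the correction terms are nonzero.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v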