
Lecture #4

Evaluation metrics are essential quantitative measures for assessing the performance of deep learning and machine learning models, aiding in model comparison, hyperparameter tuning, and monitoring over time. The document outlines various evaluation metrics, including classification metrics like accuracy, precision, recall, and F1 score, as well as regression metrics such as mean absolute error and mean squared error. Additionally, it discusses the importance of using techniques like k-fold cross-validation to ensure reliable model performance assessment.


Evaluation Metrics in DL(ML)

Q: What are Evaluation Metrics?


▪ Evaluation metrics are quantitative measures used to assess the performance of
a data science or DL (ML) model.
▪ These metrics provide insights into how well the model is performing and help
in comparing different models or algorithms.
Q: Why is it Important?
Evaluation metrics are important because they help:
▪ To assess the performance of a model: Evaluation metrics provide a
quantitative measure of how well a model performs on a given task.
➢ This is essential for understanding a model’s strengths and weaknesses and deciding
whether to deploy it to production.
▪ To compare different models: Evaluation metrics can be used to compare deep
learning models trained on the same dataset to solve the same problem.
➢ For example, if two models have similar accuracy scores but one has a higher precision
score, the more precise model will be preferred.

Evaluation Metrics in DL(ML) Cont.
Q: Why is it Important?
▪ To tune hyperparameters: Evaluation metrics are often used to tune the
hyperparameters of a machine learning model.
▪ Hyperparameters control a model’s training process, such as the learning rate and the
number of epochs.
➢ By adjusting the hyperparameters, data scientists can improve the performance of their models.
▪ To monitor the performance of a model over time: Evaluation metrics can be used to
monitor the performance of a machine learning model over time.
▪ This is important because a model’s performance can degrade over time due to changes
in the data distribution and concept drift.
➢ By monitoring the performance of a model, data scientists can identify any problems early and
take corrective action.
▪ To identify overfitting: Overfitting occurs when a model learns the training data too
well and cannot generalize to new data.
➢ Evaluation metrics can identify overfitting by comparing the model’s performance on the
training data to its performance on a held-out test set, as sketched below.
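
A minimal sketch of this comparison, assuming scikit-learn and a synthetic dataset (the decision-tree model is an illustrative choice, not prescribed by the slides):

```python
# Minimal sketch: a large gap between training accuracy and held-out test
# accuracy signals overfitting (synthetic data; illustrative model choice).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)  # unpruned trees tend to overfit
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # typically near 1.0 here
test_acc = model.score(X_test, y_test)     # noticeably lower when overfitting
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```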

Evaluation of the DL (ML) Models
Q: How can you evaluate the DL models?
You can follow these steps:
▪ Choose the right evaluation metric: The choice of the evaluation metric will
depend on the specific machine learning task and the desired outcome.
➢ For a classification model, you can choose accuracy, precision, recall, and
F1 score as evaluation metrics.
➢ For a Regression model, you may use mean absolute error (MAE), mean
squared error (MSE), and root mean squared error (RMSE) as evaluation
metrics.
▪ Split data into training and test sets: The training set is used to train the model,
and the test set is used to evaluate the model’s performance on unseen data.
▪ This ensures that the reported performance reflects generalization to unseen data rather
than memorization of the training data.
➢ To reduce the risk of overfitting, use a cross-validation technique, as sketched below.
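
A minimal sketch of this step, assuming scikit-learn and a synthetic dataset (the logistic-regression model and the 80/20 split are illustrative choices):

```python
# Split the data, cross-validate on the training portion, and report
# a final score on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # 5-fold CV
print("mean cross-validation accuracy:", cv_scores.mean())

model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))  # unseen data
```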

Using k-fold cross-validation to assess model performance
• A DL model can suffer from underfitting (high bias) if it is too simple, or overfit the
training data (high variance) if it is too complex for the underlying training data.
• The common cross-validation technique can help us obtain reliable estimates of the
model’s performance, that is, how well the model performs on unseen data.
• A better way of using the holdout cross-validation method for model selection is to
separate the data into three parts: a training set, a validation set, and a test set.
• We use the validation set to repeatedly evaluate the model during tuning, trying
different hyperparameter values.
• Finally, we estimate the model’s performance on the held-out test dataset.
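
A minimal sketch of the three-way split, assuming scikit-learn (the 60/20/20 ratio is an illustrative choice, not prescribed by the slides):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
# First carve off a 20% test set, then split the rest 75/25 into
# training and validation sets (overall 60/20/20).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)
# The validation set is evaluated repeatedly while tuning hyperparameters;
# the test set is used only once, for the final performance estimate.
```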

Model Training, Tuning, and Evaluation
• Attempt to find a good compromise between the complexity of a model and its
prediction accuracy on the training data.
• Use three sets (training, validation, and test data) to build, tune, and measure the
performance of a model.

(Figure: cross-validation within the training, tuning, and evaluation workflow)

Model Training, Tuning, and Evaluation
K-fold cross-validation
The Figure shows an example of 10-fold cross-validation
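
A minimal sketch of 10-fold cross-validation, assuming scikit-learn and a synthetic dataset (the logistic-regression model is an illustrative choice):

```python
# Each of the 10 folds serves once as the evaluation set while the
# remaining 9 folds train the model; the scores are then averaged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
scores = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))
print("mean accuracy over 10 folds:", np.mean(scores))
```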

Evaluation of the DL (ML) models Cont.
Q: How can you evaluate the ML models?
You can follow these steps:
▪ Train and evaluate multiple models: Try different deep learning
algorithms and hyperparameters to see which models perform best.
▪ Select the best model: Once you have evaluated all of your models,
you can select the best model based on the evaluation metrics.
➢ For example, if the model is going to be used to make high-stakes
decisions, then it is important to select a model with high accuracy
and precision.

Types of Evaluation Metrics
Q: How are evaluation metrics classified?
Evaluation metrics are classified into:
▪ Classification Metrics
▪ Regression Metrics
Classification Metrics
▪ For every classification model, a confusion matrix is used to check the performance
of any given set of test data.
Confusion Matrix
▪ A confusion matrix is a summary of correct and incorrect predictions and helps
visualize the outcomes.
▪ A confusion matrix looks like this:
              Actual 0              Actual 1
Predicted 0   True Negative (TN)    False Negative (FN)
Predicted 1   False Positive (FP)   True Positive (TP)
where,
True Positive (TP): Predicted positive, and the actual class is positive. True Negative (TN): Predicted negative, and the actual class is negative.
False Positive (FP): Predicted positive, but the actual class is negative. False Negative (FN): Predicted negative, but the actual class is positive.
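
A minimal sketch of computing these four counts with scikit-learn (the label vectors below are hypothetical):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (hypothetical)

# For binary labels (0, 1), sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```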

Types of Evaluation Metrics Cont.
Accuracy
▪ Accuracy is one of the most widely used evaluation metrics in classification problems.
▪ It measures the proportion of correct predictions among all predictions made.
▪ It is defined as:
Accuracy = Number of Correct Predictions/Total Number of Predictions
Mathematically, it is defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
▪ Precision evaluates the accuracy of the positive prediction made by the classifier.
▪ In simple terms, precision answers the question: “Of all the instances that the model
predicted as positive, how many were actually positive?”.
Mathematically, it is defined as:
Precision = True Positive (TP) / (True Positive (TP) + False Positive (FP))
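
A minimal worked example of both formulas, reusing the hypothetical counts TP=3, TN=3, FP=1, FN=1 from the confusion-matrix sketch above:

```python
tp, tn, fp, fn = 3, 3, 1, 1  # hypothetical counts

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (3 + 3) / 8 = 0.75
precision = tp / (tp + fp)                  # 3 / 4 = 0.75
print(accuracy, precision)
```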

Types of Evaluation Metrics Cont.
Recall
▪ Recall is also known as sensitivity or the true positive rate.
▪ It is the ratio of the number of true positive predictions to the total number of actual positive
instances in the dataset.
▪ Recall measures the ability of a model to identify all relevant instances.
Mathematically, recall is defined as:
Recall = True Positive (TP) / (True Positive (TP) + False Negative (FN))
F1-Score
▪ The F1 score is the harmonic mean of precision and recall. It provides a single metric that
balances the trade-off between precision and recall. It is especially useful when the class
distribution is imbalanced.
Mathematically, it is given by:
F1 Score = 2 x [(Precision x Recall)/ (Precision + Recall)]
▪ The F1-score ranges between 0 and 1.
1: indicates perfect precision and recall
0: indicates that precision, recall, or both are zero
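
Continuing the same hypothetical worked example for recall and the F1 score:

```python
tp, fp, fn = 3, 1, 1  # hypothetical counts

recall = tp / (tp + fn)                               # 3 / 4 = 0.75
precision = tp / (tp + fp)                            # 3 / 4 = 0.75
f1 = 2 * (precision * recall) / (precision + recall)  # 0.75
print(recall, f1)
```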

Types of Evaluation Metrics Cont.
AUC-ROC Curve
▪ AUC-ROC stands for the Area Under the Receiver Operating Characteristic
Curve.
▪ ROC curve is a graphical representation of classification model performance at
different thresholds.
▪ It is created by plotting the True Positive Rate (TPR) against the False Positive
Rate (FPR).
▪ AUC represents the area under the ROC curve.
▪ It provides a single scalar value that summarizes the overall performance of a
classifier across all possible threshold values.
The formulas for TPR and FPR are:
True Positive Rate (TPR/Sensitivity/Recall) = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
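
A minimal sketch of computing the ROC curve and AUC with scikit-learn (the labels and predicted probabilities below are hypothetical):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]                # actual labels (hypothetical)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR per threshold
print("AUC:", roc_auc_score(y_true, y_score))      # about 0.89 for this data
```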

Types of Evaluation Metrics Cont.
Regression Metrics
Mean Absolute Error
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left| y_j - \hat{y}_j \right|$$
▪ The Mean Absolute Error (MAE) is the average of the absolute differences
between predicted and actual values.
▪ By calculating MAE, we can get an idea of how wrong the model’s
predictions were.
Mean Squared Error
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left( y_j - \hat{y}_j \right)^2$$
▪ The Mean Squared Error (MSE) is similar to the mean absolute error, but it squares the
differences before averaging, so larger errors are penalized more heavily.
▪ Like MAE, it summarizes the average magnitude of the error between predicted and
actual values.
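
A minimal worked example of both formulas, using NumPy on hypothetical predicted and actual values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values (hypothetical)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # model predictions (hypothetical)

mae = np.mean(np.abs(y_true - y_pred))   # (0.5 + 0 + 1.5 + 1.0) / 4 = 0.75
mse = np.mean((y_true - y_pred) ** 2)    # (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
print(mae, mse)
```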

Quiz #3
i) What are the success and error rates based on the given confusion matrix in a multi-class prediction problem? Does this give us a good metric?
ii) Compute the accuracy, precision, recall, and F1 score.
iii) Draw a schematic diagram for a DL model to find a compromise between its complexity and prediction accuracy.

Generative AI (GenAI)
▪ Generative AI (GenAI) is an advanced technological approach that
enables the creation of content, including text, images, and videos.
▪ By analyzing and recognizing patterns within extensive training
datasets, generative AI can autonomously construct material that
shares comparable characteristics with its training input.
▪ GenAI is a subset of deep learning, which means it uses artificial
neural networks and can process both labeled and unlabeled data
using supervised, unsupervised, and semi-supervised methods.

Generative AI (GenAI)

Large Language Models are also a subset of DL.

AI vs ML vs DL vs Generative AI



Discriminative vs Generative Models
• Deep learning models or machine learning models, in general, can
be divided into two types:
Generative and Discriminative.

• A discriminative model is a type of model that is used to classify or predict labels for data
points.
• Discriminative models are typically trained on a dataset of labeled data points, and they learn
the relationship between the features of the data points and the labels. Once a discriminative
model is trained, it can be used to predict the label for new data points.
• A generative model generates new data instances based on a learned probability distribution of
existing data. Thus, generative models generate new content.
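
A minimal sketch contrasting the two model types, assuming scikit-learn and hypothetical 1-D Gaussian data: the logistic regression is discriminative (it only assigns labels), while fitting a per-class Gaussian is a simple generative approach (it can also sample new instances):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 100)  # class 0 samples (hypothetical)
x1 = rng.normal(3.0, 1.0, 100)  # class 1 samples (hypothetical)
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

# Discriminative: learns p(y|x) and can only assign labels to inputs.
clf = LogisticRegression().fit(X, y)
print("predicted label for x=2.5:", clf.predict([[2.5]])[0])

# Generative: model the class-1 data distribution itself, then draw
# brand-new instances from it.
mu1, sigma1 = x1.mean(), x1.std()
new_samples = rng.normal(mu1, sigma1, 5)  # generated class-1 data
print("generated samples:", new_samples)
```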
Example:
▪ In this context, the discriminative model learns
the conditional probability distribution, which
is the probability of Y (the output) given X (the
input). For example, it determines that an input
represents a dog and classifies it as such, rather
than as a cat.
▪ On the other hand, the generative model learns
the joint probability distribution, which
encompasses the probability of both X and Y.
This model not only predicts the conditional
probability that a given input represents a dog,
but it can also generate a picture of a dog.
▪ So, to summarize, generative models can
generate new data instances, while
discriminative models can discriminate
between different kinds of data instances.

