
Lecture #4

Evaluation metrics are essential quantitative measures for assessing the performance of deep learning and machine learning models, aiding in model comparison, hyperparameter tuning, and monitoring over time. The document outlines various evaluation metrics, including classification metrics like accuracy, precision, recall, and F1 score, as well as regression metrics such as mean absolute error and mean squared error. Additionally, it discusses the importance of using techniques like k-fold cross-validation to ensure reliable model performance assessment.


Evaluation Metrics in DL(ML)

Q: What are Evaluation Metrics?


▪ Evaluation metrics are quantitative measures used to assess the performance of
a data science or DL (ML) model.
▪ These metrics provide insights into how well the model is performing and help
in comparing different models or algorithms.
Q: Why is it Important?
Evaluation metrics are important because they help:
▪ To assess the performance of a model: Evaluation metrics provide a
quantitative measure of how well a model performs on a given task.
➢ This is essential for understanding a model’s strengths and weaknesses and deciding
whether to deploy it to production.
▪ To compare different models: Evaluation metrics can be used to compare deep
learning models trained on the same dataset to solve the same problem.
➢ For example, if two models have similar accuracy scores but one has a higher precision
score, the more precise model will be preferred.

Evaluation Metrics in DL(ML) Cont.
Q: Why is it Important?
▪ To tune hyperparameters: Evaluation metrics are often used to tune the
hyperparameters of a machine learning model.
▪ Hyperparameters control a model’s training process, such as the learning rate and the
number of epochs.
➢ By adjusting the hyperparameters, data scientists can improve the performance of their models.
▪ To monitor the performance of a model over time: Evaluation metrics can be used to
monitor the performance of a machine learning model over time.
▪ This is important because a model’s performance can degrade over time due to changes
in the data distribution and concept drift.
➢ By monitoring the performance of a model, data scientists can identify any problems early and
take corrective action.
▪ To identify overfitting: Overfitting occurs when a model learns the training data too
well and cannot generalize to new data.
➢ Evaluation metrics can identify overfitting by comparing the model’s performance on the
training data to its performance on a held-out test set, as sketched below.
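
A minimal sketch of this comparison, assuming scikit-learn and a synthetic dataset (the decision-tree model is an illustrative choice, not prescribed by the slides):

```python
# Minimal sketch: a large gap between training accuracy and held-out test
# accuracy signals overfitting (synthetic data; illustrative model choice).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)  # unpruned trees tend to overfit
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # typically near 1.0 here
test_acc = model.score(X_test, y_test)     # noticeably lower when overfitting
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```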

Evaluation of the DL (ML) Models
Q: How can you evaluate the DL models?
You can follow these steps:
▪ Choose the right evaluation metric: The choice of the evaluation metric will
depend on the specific machine learning task and the desired outcome.
➢ For a classification model, you can choose accuracy, precision, recall, and
F1 score as evaluation metrics.
➢ For a Regression model, you may use mean absolute error (MAE), mean
squared error (MSE), and root mean squared error (RMSE) as evaluation
metrics.
▪ Split data into training and test sets: The training set is used to train the model,
and the test set is used to evaluate the model’s performance on unseen data.
▪ This ensures that the reported performance reflects generalization to unseen data rather
than memorization of the training data.
➢ To reduce the risk of overfitting, use a cross-validation technique, as sketched below.
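
A minimal sketch of this step, assuming scikit-learn and a synthetic dataset (the logistic-regression model and the 80/20 split are illustrative choices):

```python
# Split the data, cross-validate on the training portion, and report
# a final score on the held-out test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # 5-fold CV
print("mean cross-validation accuracy:", cv_scores.mean())

model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))  # unseen data
```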

Using k-fold cross-validation to assess model performance
• A DL model can suffer from underfitting (high bias) if it is too simple, or overfit the
training data (high variance) if it is too complex for the underlying training data.
• The common cross-validation technique can help us obtain reliable estimates of the
model’s performance, that is, how well the model performs on unseen data.
• A better way of using the holdout cross-validation method for model selection is to
separate the data into three parts: a training set, a validation set, and a test set.
• We use the validation set to repeatedly evaluate the model during tuning, trying
different hyperparameter values.
• Finally, we estimate the model’s performance on the held-out test dataset.
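
A minimal sketch of the three-way split, assuming scikit-learn (the 60/20/20 ratio is an illustrative choice, not prescribed by the slides):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
# First carve off a 20% test set, then split the rest 75/25 into
# training and validation sets (overall 60/20/20).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0)
# The validation set is evaluated repeatedly while tuning hyperparameters;
# the test set is used only once, for the final performance estimate.
```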

Model Training, Tuning, and Evaluation
• Attempt to find a good compromise between the complexity of a model and its
prediction accuracy on the training data.
• Use three sets (training, validation, and test data) to build, tune, and measure the
performance of a model.

(Figure: cross-validation within the training, tuning, and evaluation workflow)

Model Training, Tuning, and Evaluation
K-fold cross-validation
The Figure shows an example of 10-fold cross-validation
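
A minimal sketch of 10-fold cross-validation, assuming scikit-learn and a synthetic dataset (the logistic-regression model is an illustrative choice):

```python
# Each of the 10 folds serves once as the evaluation set while the
# remaining 9 folds train the model; the scores are then averaged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
scores = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))
print("mean accuracy over 10 folds:", np.mean(scores))
```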

Evaluation of the DL (ML) models Cont.
Q: How can you evaluate the ML models?
You can follow these steps:
▪ Train and evaluate multiple models: Try different deep learning
algorithms and hyperparameters to see which models perform best.
▪ Select the best model: Once you have evaluated all of your models,
you can select the best model based on the evaluation metrics.
➢ For example, if the model is going to be used to make high-stakes
decisions, then it is important to select a model with high accuracy
and precision.

Types of Evaluation Metrics
Q: How are evaluation metrics classified?
Evaluation metrics are classified into:
▪ Classification Metrics
▪ Regression Metrics
Classification Metrics
▪ For every classification model, a confusion matrix is used to check the performance
of any given set of test data.
Confusion Matrix
▪ A confusion matrix is a summary of correct and incorrect predictions and helps
visualize the outcomes.
▪ A confusion matrix looks like this:
              Actual 0              Actual 1
Predicted 0   True Negative (TN)    False Negative (FN)
Predicted 1   False Positive (FP)   True Positive (TP)
where,
True Positive (TP): Predicted positive, and the actual class is positive. True Negative (TN): Predicted negative, and the actual class is negative.
False Positive (FP): Predicted positive, but the actual class is negative. False Negative (FN): Predicted negative, but the actual class is positive.
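
A minimal sketch of computing these four counts with scikit-learn (the label vectors below are hypothetical):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions (hypothetical)

# For binary labels (0, 1), sklearn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```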

Types of Evaluation Metrics Cont.
Accuracy
▪ Accuracy is one of the most widely used evaluation metrics in classification problems.
▪ It measures the proportion of correct predictions among all predictions made.
▪ It is defined as:
Accuracy = Number of Correct Predictions/Total Number of Predictions
Mathematically, it is defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision
▪ Precision evaluates the accuracy of the positive prediction made by the classifier.
▪ In simple terms, precision answers the question: “Of all the instances that the model
predicted as positive, how many were actually positive?”.
Mathematically, it is defined as:
Precision = True Positive (TP) / (True Positive (TP) + False Positive (FP))
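
A minimal worked example of both formulas, reusing the hypothetical counts TP=3, TN=3, FP=1, FN=1 from the confusion-matrix sketch above:

```python
tp, tn, fp, fn = 3, 3, 1, 1  # hypothetical counts

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (3 + 3) / 8 = 0.75
precision = tp / (tp + fp)                  # 3 / 4 = 0.75
print(accuracy, precision)
```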

Types of Evaluation Metrics Cont.
Recall
▪ Recall is also known as sensitivity or the true positive rate.
▪ It is the ratio of the number of true positive predictions to the total number of actual positive
instances in the dataset.
▪ Recall measures the ability of a model to identify all relevant instances.
Mathematically, recall is defined as:
Recall = True Positive (TP) / (True Positive (TP) + False Negative (FN))
F1-Score
▪ The F1 score is the harmonic mean of precision and recall. It provides a single metric that
balances the trade-off between precision and recall. It is especially useful when the class
distribution is imbalanced.
Mathematically, it is given by:
F1 Score = 2 x [(Precision x Recall)/ (Precision + Recall)]
▪ The F1-score ranges between 0 and 1.
1: indicates perfect precision and recall
0: indicates that precision, recall, or both are zero
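
Continuing the same hypothetical worked example for recall and the F1 score:

```python
tp, fp, fn = 3, 1, 1  # hypothetical counts

recall = tp / (tp + fn)                               # 3 / 4 = 0.75
precision = tp / (tp + fp)                            # 3 / 4 = 0.75
f1 = 2 * (precision * recall) / (precision + recall)  # 0.75
print(recall, f1)
```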

Types of Evaluation Metrics Cont.
AUC-ROC Curve
▪ AUC-ROC stands for the Area Under the Receiver Operating Characteristic
Curve.
▪ ROC curve is a graphical representation of classification model performance at
different thresholds.
▪ It is created by plotting the True Positive Rate (TPR) against the False Positive
Rate (FPR).
▪ AUC represents the area under the ROC curve.
▪ It provides a single scalar value that summarizes the overall performance of a
classifier across all possible threshold values.
The formulas for TPR and FPR are:
True Positive Rate (TPR/Sensitivity/Recall) = TP / (TP + FN)
False Positive Rate (FPR) = FP / (FP + TN)
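
A minimal sketch of computing the ROC curve and AUC with scikit-learn (the labels and predicted probabilities below are hypothetical):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]                # actual labels (hypothetical)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted probability of class 1

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR and FPR per threshold
print("AUC:", roc_auc_score(y_true, y_score))      # about 0.89 for this data
```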

Types of Evaluation Metrics Cont.
Regression Metrics
Mean Absolute Error
$$\mathrm{MAE} = \frac{1}{N}\sum_{j=1}^{N}\left| y_j - \hat{y}_j \right|$$
▪ The Mean Absolute Error (MAE) is the average of the absolute differences
between predicted and actual values.
▪ By calculating MAE, we can get an idea of how wrong the model’s
predictions were.
Mean Squared Error
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left( y_j - \hat{y}_j \right)^2$$
▪ The Mean Squared Error (MSE) is similar to the mean absolute error, but it squares the
differences before averaging, so larger errors are penalized more heavily.
▪ Like MAE, it summarizes the average magnitude of the error between predicted and
actual values.
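
A minimal worked example of both formulas, using NumPy on hypothetical predicted and actual values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # actual values (hypothetical)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # model predictions (hypothetical)

mae = np.mean(np.abs(y_true - y_pred))   # (0.5 + 0 + 1.5 + 1.0) / 4 = 0.75
mse = np.mean((y_true - y_pred) ** 2)    # (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
print(mae, mse)
```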

Quiz #3
i) What are the success and error rates based on the given confusion matrix in a multi-class prediction problem? Does this give us a good metric?
ii) Compute the accuracy, precision, recall, and F1 score.
iii) Draw a schematic diagram for a DL model to find a compromise between its complexity and prediction accuracy.

Generative AI (GenAI)
▪ Generative AI (GenAI) is an advanced technological approach that
enables the creation of content, including text, images, and videos.
▪ By analyzing and recognizing patterns within extensive training
datasets, generative AI can autonomously construct material that
shares comparable characteristics with its training input.
▪ GenAI is a subset of deep learning, which means it uses artificial
neural networks and can process both labeled and unlabeled data
using supervised, unsupervised, and semi-supervised methods.

Generative AI (GenAI)

Large Language Models are also a subset of DL.

AI vs ML vs DL vs Generative AI



Discriminative vs Generative Models
• Deep learning models or machine learning models, in general, can
be divided into two types:
Generative and Discriminative.

• A discriminative model is a type of model that is used to classify or predict labels for data
points.
• Discriminative models are typically trained on a dataset of labeled data points, and they learn
the relationship between the features of the data points and the labels. Once a discriminative
model is trained, it can be used to predict the label for new data points.
• A generative model generates new data instances based on a learned probability distribution of
existing data. Thus, generative models generate new content.
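
A minimal sketch contrasting the two model types, assuming scikit-learn and hypothetical 1-D Gaussian data: the logistic regression is discriminative (it only assigns labels), while fitting a per-class Gaussian is a simple generative approach (it can also sample new instances):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 100)  # class 0 samples (hypothetical)
x1 = rng.normal(3.0, 1.0, 100)  # class 1 samples (hypothetical)
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.array([0] * 100 + [1] * 100)

# Discriminative: learns p(y|x) and can only assign labels to inputs.
clf = LogisticRegression().fit(X, y)
print("predicted label for x=2.5:", clf.predict([[2.5]])[0])

# Generative: model the class-1 data distribution itself, then draw
# brand-new instances from it.
mu1, sigma1 = x1.mean(), x1.std()
new_samples = rng.normal(mu1, sigma1, 5)  # generated class-1 data
print("generated samples:", new_samples)
```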
Example:
▪ In this context, the discriminative model learns
the conditional probability distribution, which
is the probability of Y (the output) given X (the
input). For example, it determines that an input
represents a dog and classifies it as such, rather
than as a cat.
▪ On the other hand, the generative model learns
the joint probability distribution, which
encompasses the probability of both X and Y.
This model not only predicts the conditional
probability that a given input represents a dog,
but it can also generate a picture of a dog.
▪ So, to summarize, generative models can
generate new data instances, while
discriminative models can discriminate
between different kinds of data instances.

