
Machine Learning Model Evaluation

Model evaluation is the process of using metrics to analyze how well a model performs. Think of training a model like teaching a student: model evaluation is the test that shows whether the student truly learned the subject or just memorized the answers. It helps us answer two questions:
Did the model actually learn the underlying patterns?
Will it fail on new, unseen questions?

Model development is a multi-step process, and we need to keep checking how well the model will predict on future data and where its weaknesses lie. There are many metrics for this. Cross-validation is one such model evaluation technique, applied during the training phase.

Cross-Validation: The Ultimate Practice Test


Cross-validation is a method in which we do not use the whole dataset for training: part of the dataset is reserved for testing the model. There are many types of cross-validation, of which K-Fold cross-validation is the most widely used. In K-Fold cross-validation the original dataset is divided into k subsets, known as folds. The process is repeated k times; in each round one fold is used for testing and the remaining k-1 folds are used for training. This technique tends to generalize the model well and gives a more reliable estimate of the error rate.
Holdout is the simplest approach and is used with neural networks as well as many other classifiers. In this technique the dataset is split into a train set and a test set, usually in a ratio such as 70:30 or 80:20. The larger portion is used for training the model and the smaller portion for testing it. The sketch below shows both approaches.
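As an illustration, here is a minimal sketch using scikit-learn's KFold, cross_val_score and train_test_split; the iris dataset and the decision tree are used here purely as placeholders:

from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small example dataset (illustration only)
X, y = load_iris(return_X_y=True)

# K-Fold cross-validation: 5 folds, each fold is used once as the test set
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=kfold)
print("Per-fold accuracy:", scores, "mean:", scores.mean())

# Holdout: a single 80:20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)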

Evaluation Metrics for Classification Task


Classification is used to categorize data into predefined labels or classes.
To evaluate the performance of a classification model we commonly use
metrics such as accuracy, precision, recall, F1 score and confusion matrix.
These metrics are useful in assessing how well the model distinguishes
between classes, especially in the case of imbalanced datasets. By
understanding the strengths and weaknesses of each metric, we can
select the most appropriate one for a given classification problem.

In this Python code, we import the iris dataset, whose features are the length and width of the sepals and petals. The target values are Iris setosa, Iris virginica, and Iris versicolor. After importing the dataset we split it into train and test sets in the ratio 80:20. We then fit a Decision Tree classifier, run predictions on the test set, and calculate the accuracy, precision, recall, and F1 score. We also plot the confusion matrix.
Importing Libraries and Dataset

Now let’s load the toy iris flowers dataset from the sklearn.datasets library and then split it into training and testing parts (for model evaluation) in an 80:20 ratio.
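The import and loading steps are not reproduced in full in this excerpt; a minimal sketch using the standard scikit-learn API (the exact import list is an assumption) would be:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, ConfusionMatrixDisplay)

# Load the iris features and target labels
X, y = load_iris(return_X_y=True)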

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=20)

Now, let’s train a Decision Tree Classifier model on the training data, and
then we will move on to the evaluation part of the model using different
metrics.
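The training code itself is not shown in the excerpt; a minimal sketch (default hyperparameters are an assumption) would be:

# Fit a Decision Tree on the training split and predict on the test split
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)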

1. Accuracy

Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. This is the most fundamental metric used to evaluate the model. The formula is given by:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
However, accuracy has a drawback: it does not work well on an imbalanced dataset. Suppose a model simply predicts the majority class for most of the data. It achieves high accuracy, but it cannot classify the minority class labels and its real performance is poor.

print("Accuracy:", accuracy_score(y_test,

Output:

Accuracy: 0.9333333333333333
2. Precision and Recall

Precision is the ratio of true positives to the sum of true positives and false positives. It focuses on how reliable the positive predictions are.
Precision = TP / (TP + FP)
The drawback of precision is that it does not consider the true negatives and false negatives.
Recall is the ratio of true positives to the sum of true positives and false negatives. It measures how many of the actual positive samples are correctly identified.
Recall = TP / (TP + FN)
The drawback of recall is that optimizing for it alone often leads to a higher false positive rate.

print("Precision:", precision_score(y_test,

average="weighted"))

print('Recall:', recall_score(y_test,

average="weighted"))

Output:

Precision: 0.9435897435897436
Recall: 0.9333333333333333

3. F1 score

F1 score is the harmonic mean of precision and recall. Because of the precision-recall trade-off, increasing precision tends to decrease recall and vice versa. The goal of the F1 score is to combine the two into a single number.
F1 Score = (2 × Precision × Recall) / (Precision + Recall)
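The corresponding call is not shown in the excerpt; with the variable names assumed above it would look like:

print("F1 score:", f1_score(y_test, y_pred, average="weighted"))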
Output:

F1 score: 0.9327777777777778

4. Confusion Matrix

A confusion matrix is an N x N matrix, where N is the number of target classes. It tabulates the actual labels against the predicted labels. Some of the terminology used in the matrix is as follows:
True Positives: It is also known as TP. It is the output in which the
actual and the predicted values are YES.
True Negatives: It is also known as TN. It is the output in which the
actual and the predicted values are NO.
False Positives: It is also known as FP. It is the output in which the
actual value is NO but the predicted value is YES.
False Negatives: It is also known as FN. It is the output in which the
actual value is YES but the predicted value is NO.

cm_display = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix(y_test, y_pred))
cm_display.plot()

Output:
(Figure: confusion matrix for the output of the model)

In the output, the accuracy of the model is 93.33%. Precision is approximately 0.944, recall is 0.933, and the F1 score is approximately 0.933. Finally the confusion matrix is plotted. Here the class labels denote the target classes:

0 = Setosa
1 = Versicolor
2 = Virginica

From the confusion matrix, we see that all 8 Setosa test cases and all 11 Versicolor test cases were correctly predicted. Of the Virginica test cases, 2 were misclassified and the remaining 9 were correctly predicted.

5. AUC-ROC Curve

AUC (Area Under the Curve) is an evaluation metric used to analyze a classification model across different threshold values. The Receiver Operating Characteristic (ROC) curve plots the model’s performance at every threshold using two parameters:
TPR: the true positive rate, which follows the same formula as recall, TP / (TP + FN).
FPR: the false positive rate, defined as the ratio of false positives to the sum of false positives and true negatives, FP / (FP + TN).
The curve is useful because it shows the model’s capacity to distinguish between the classes. Let us illustrate this with a simple Python example.
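The original listing is not reproduced here; the following minimal sketch uses made-up binary labels and scores, chosen so that roc_auc_score returns 0.75 as in the quoted output:

from sklearn.metrics import roc_auc_score

# Hypothetical ground-truth labels and predicted scores (illustrative only)
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.6, 0.4, 0.8]

print("Auc", roc_auc_score(y_true, y_scores))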

Output:

Auc 0.75

The AUC score is a useful metric for evaluating the model, as it highlights the model’s capacity to separate the classes. In the above code, 0.75 is a reasonably good AUC score. An AUC of 0.5 corresponds to random guessing; a model is considered better the closer its score gets to 1.

Evaluation Metrics for Regression Task


Regression is used to predict continuous values, most often to find the relationship between a dependent and an independent variable. For classification we use a confusion matrix, accuracy, F1 score, etc., but in regression we are predicting a numerical value that may differ from the actual output, so we rely on error calculations that summarize how close the predictions are to the actual values. There are many metrics available for evaluating a regression model.
In this Python code we implement a simple regression model using the Mumbai weather CSV file, which contains the columns Day, Hour, Temperature, Relative Humidity, Wind Speed and Wind Direction. The link for the dataset is here.
We are interested in the relationship between Temperature and Relative Humidity. Here Relative Humidity is the dependent variable and Temperature is the independent variable. We perform linear regression and use different metrics to evaluate the performance of the model. To calculate the metrics we make extensive use of the sklearn library.

from sklearn.metrics import mean_absolute_error, \
    mean_squared_error, mean_absolute_percentage_error

Now let’s load the data into a pandas DataFrame and then split it into training and testing parts (for model evaluation) in an 80:20 ratio, as sketched below.
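The loading code is not shown in full in this excerpt; a minimal sketch (the file name and exact column names are assumptions) would be:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the weather data; the file name and column names are assumed here
df = pd.read_csv("mumbai_weather.csv")
X = df[["Temperature"]]          # independent variable
Y = df["Relative Humidity"]      # dependent variable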

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.20,
                                                    random_state=0)

Now, let’s train a simple linear regression model on the training data, and then we will move on to the evaluation part of the model using different metrics.
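The training step is not reproduced in the excerpt; a minimal sketch would be:

# Fit the linear regression and predict relative humidity on the test set
reg = LinearRegression()
reg.fit(X_train, Y_train)
Y_pred = reg.predict(X_test)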

1. Mean Absolute Error ( MAE)

This is the simplest metric used to analyze the loss over the whole dataset. The error is the difference between the predicted and actual values, and MAE is defined as the average of the absolute errors: we take the absolute value of each error, sum them, and divide by the total number of data points. It is always a non-negative value. The formula of MAE is given by
MAE = (1/N) Σ_{i=1}^{N} |y_pred – y_actual|

mae = mean_absolute_error(y_true=Y_test, y_pred=Y_pred)
print("Mean Absolute Error", mae)
Output:

Mean Absolute Error 1.7236295632503873

2. Mean Squared Error( MSE)

The most commonly used metric is Mean Squared Error, or MSE. It is a loss function: we take the difference between the predicted and actual values, square it, and average over all the data points in the dataset. MSE is always non-negative because the values are squared, and the smaller the MSE, the better the performance of the model. The formula of MSE is given by
MSE = (1/N) Σ_{i=1}^{N} (y_pred – y_actual)^2
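The corresponding call is not shown in the excerpt; with the names assumed above:

mse = mean_squared_error(y_true=Y_test, y_pred=Y_pred)
print("Mean Square Error", mse)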

Output:

Mean Square Error 3.9808057060106954

3. Root Mean Squared Error( RMSE)

RMSE is a popular metric and is an extension of MSE: it is simply the square root of the MSE. It indicates how much the data points are spread around the best-fit line, and a lower value means that the data points lie closer to that line.
RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_pred – y_actual)^2 )

rmse = mean_squared_error(y_true=Y_test, y_pred=Y_pred,
                          squared=False)
print("Root Mean Square Error", rmse)

Output:
Root Mean Square Error 1.9951956560725306

4. Mean Absolute Percentage Error ( MAPE)

MAPE expresses the error as a percentage. For each data point we take the absolute difference between the actual and predicted value and divide it by the absolute actual value; these ratios are then summed and averaged. The smaller the percentage, the better the performance of the model. The formula is given by
MAPE = (1/N) Σ_{i=1}^{N} (|y_pred – y_actual| / |y_actual|) × 100%
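The corresponding call is not shown in the excerpt; note that sklearn's mean_absolute_percentage_error returns a fraction rather than a percentage, which is why the output below is about 0.023 (roughly 2.3%):

mape = mean_absolute_percentage_error(y_true=Y_test, y_pred=Y_pred)
print("Mean Absolute Percentage Error", mape)  # returned as a fraction, not multiplied by 100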

Output:

Mean Absolute Percentage Error 0.02334408993333347

Evaluating machine learning models is an important step in ensuring their effectiveness and reliability in real-world applications. Using appropriate metrics, such as accuracy, precision, recall and F1 score for classification, and regression-specific measures like MAE, MSE, RMSE and MAPE, lets us assess model performance for different tasks. Moreover, adopting evaluation techniques like cross-validation and holdout helps ensure that models generalize well to unseen data.
