Unit III IML Final
Classification
Evaluating the performance of your classification model is crucial to ensure its accuracy and
effectiveness. While accuracy is important, it’s just one piece of the puzzle. There are several other
evaluation metrics that provide a more comprehensive understanding of your model’s performance.
This article will discuss these metrics and how they can guide you in making the right decisions to
improve your model’s predictive power.
Classification Metrics in Machine Learning
Classification is the task of predicting a class label for given input data. In binary classification, there are only two possible output classes (i.e., a dichotomy), while in multiclass classification more than two classes can be present. I'll focus only on binary classification here.
A very common example of binary classification is spam detection, where the input data could include
the email text and metadata (sender, sending time), and the output label is either “spam” or “not spam.”
(See Figure.) The two classes are also sometimes called "positive" and "negative," or "class 1" and "class 0."
There are many ways to measure classification performance. Accuracy, the confusion matrix, log loss, and AUC-ROC are some of the most popular metrics, and precision-recall is also widely used for classification problems.
When a model reports an accuracy of 99%, you might think it is performing very well, but accuracy alone can be misleading in some situations. Consider the example below.
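For instance (with made-up numbers), suppose a spam filter is evaluated on 1,000 emails of which only 10 are spam. A model that blindly labels every email as "not spam" scores 99% accuracy yet catches no spam at all. A minimal sketch of this, assuming scikit-learn is available:

from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 990 legitimate emails (0) and 10 spam emails (1)
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts "not spam"
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks excellent
print(recall_score(y_true, y_pred))    # 0.0  -- every spam email is missed

Accuracy looks excellent here only because the classes are heavily imbalanced, which is exactly why the metrics below matter.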
Confusion Matrix
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table of the combinations of predicted and actual values.
True Positive: We predicted positive and it’s true. In the image, we predicted that a woman is
pregnant and she actually is.
True Negative: We predicted negative and it’s true. In the image, we predicted that a man is
not pregnant and he actually is not.
False Positive (Type 1 Error): We predicted positive and it’s false. In the image, we predicted
that a man is pregnant but he actually is not.
False Negative (Type 2 Error): We predicted negative and it’s false. In the image, we predicted
that a woman is not pregnant but she actually is.
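As a quick illustration (labels invented for this sketch, assuming scikit-learn), the four counts can be read straight out of the confusion matrix:

from sklearn.metrics import confusion_matrix

# 1 = positive (e.g. "pregnant"), 0 = negative; made-up labels
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1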
We have discussed accuracy; now let's look at some other metrics derived from the confusion matrix.
Precision
Precision tells us how many of the cases predicted as positive actually turned out to be positive. It is useful when a False Positive is a bigger concern than a False Negative, for example in music or video recommendation systems and e-commerce websites, where wrong results could lead to customer churn and harm the business.
Precision for a label is defined as the number of true positives divided by the number of predicted
positives.
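In confusion-matrix terms, Precision = TP / (TP + FP). A small sketch, reusing the made-up labels from the confusion-matrix example above:

from sklearn.metrics import precision_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# 3 true positives out of 4 predicted positives
print(precision_score(y_true, y_pred))  # 0.75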
Recall (Sensitivity)
It explains how many of the actual positive cases we were able to predict correctly with our model.
Recall is a useful metric in cases where False Negative is of higher concern than False Positive.
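In confusion-matrix terms, Recall = TP / (TP + FN), i.e., the fraction of actual positives the model finds. Continuing the same made-up labels:

from sklearn.metrics import recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# 3 of the 4 actual positives were caught
print(recall_score(y_true, y_pred))  # 0.75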
F1 Score
It gives a combined idea about Precision and Recall metrics. It is maximum when Precision is equal to
Recall.
F1 Score is the harmonic mean of precision and recall.
The F1 score punishes extreme values more. It can be an effective evaluation metric in the following cases:
When FP and FN are equally costly.
When adding more data doesn't effectively change the outcome.
When the number of True Negatives is high.
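As a formula, F1 = 2 × (Precision × Recall) / (Precision + Recall). A short sketch with the same made-up labels:

from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Harmonic mean of precision (0.75) and recall (0.75)
print(f1_score(y_true, y_pred))  # 0.75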
AUC-ROC
The Receiver Operating Characteristic (ROC) is a probability curve that plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various threshold values, separating the 'signal' from the 'noise'.
The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes; in the graph, it is the area enclosed by the ROC curve and the X and Y axes.
From the graph shown below, the greater the AUC, the better the model is at separating the positive and negative classes across threshold points. When AUC is equal to 1, the classifier can perfectly distinguish all Positive and Negative class points. When AUC is equal to 0, the classifier predicts all Negatives as Positives and vice versa. When AUC is 0.5, the classifier cannot distinguish between the Positive and Negative classes at all.
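A minimal sketch of computing AUC from predicted probabilities, assuming scikit-learn and using invented scores:

from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]
# Predicted probability of the positive class from some hypothetical classifier
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

print(roc_auc_score(y_true, y_score))  # ~0.89 -- close to 1, so the scores separate the classes well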
Working of AUC
In a ROC curve, the X-axis shows the False Positive Rate (FPR) and the Y-axis shows the True Positive Rate (TPR). A higher X value means a larger number of False Positives (FP) relative to True Negatives (TN), while a higher Y value means a larger number of True Positives (TP) relative to False Negatives (FN). So the choice of threshold depends on how you want to balance FP against FN.
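The sketch below (same invented scores, assuming scikit-learn) lists the (FPR, TPR) point produced at each threshold, which is exactly the trade-off described above:

from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, thr in zip(fpr, tpr, thresholds):
    # Each threshold gives one (FPR, TPR) point on the ROC curve
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")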
Log Loss
Log loss (logistic loss), or cross-entropy loss, is one of the major metrics to assess the performance of a classification problem.
For a single sample with true label y∈{0,1} and a probability estimate p=Pr(y=1), the log loss is:
Log loss = −(y · log(p) + (1 − y) · log(1 − p))
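A quick sketch with scikit-learn's log_loss, using invented probabilities:

from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]
# Predicted probability of class 1 for each sample
y_prob = [0.9, 0.1, 0.8, 0.4]

# Averages -(y*log(p) + (1-y)*log(1-p)) over the samples
print(log_loss(y_true, y_prob))  # ~0.34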
Mean Absolute Error (MAE)
Mean absolute error, or L1 loss, is one of the simplest and most easily understood loss functions and evaluation metrics. It is computed by averaging the absolute differences between predicted and actual values across the dataset. Mathematically, it is the arithmetic mean of the absolute errors, focusing solely on their magnitude, irrespective of direction. A lower MAE indicates better model accuracy.
MAE = (1/n) Σᵢ |yᵢ − ŷᵢ|
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of samples.
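A minimal sketch with made-up values, assuming scikit-learn:

from sklearn.metrics import mean_absolute_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# Mean of |3-2.5|, |-0.5-0|, |2-2|, |7-8| = (0.5 + 0.5 + 0 + 1) / 4
print(mean_absolute_error(y_true, y_pred))  # 0.5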
Mean Bias Error (MBE)
In Mean Bias Error, bias reflects the tendency of a measurement process to overestimate or underestimate a parameter. Bias has a single direction, positive or negative: a positive bias means the model tends to overestimate, while a negative bias means it tends to underestimate. MBE calculates the mean difference between predicted and actual values, quantifying the overall bias without taking absolute values. It is similar to MAE, differing only in that it does not take the absolute value. Caution is needed with MBE, as positive and negative errors can cancel each other out.
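scikit-learn has no dedicated MBE helper, so here is a short NumPy sketch with the same made-up values:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Signed errors can cancel: (-0.5 + 0.5 + 0.0 + 1.0) / 4
mbe = np.mean(y_pred - y_true)
print(mbe)  # 0.25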
Relative Absolute Error (RAE)
RAE measures the performance of a predictive model and is expressed as a ratio. A good model has an RAE close to zero, with zero being the best value; a value of one means the model is no better than simply predicting the mean of the target, and values above one mean it is worse. This error shows how the total absolute residual relates to the total absolute deviation of the target values from their mean.
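RAE likewise has no dedicated scikit-learn helper; a NumPy sketch under the definition above, with the same made-up values:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

# Sum of absolute errors relative to sum of absolute deviations of y_true from its mean
rae = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true - y_true.mean()))
print(rae)  # ~0.24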
Mean Squared Error (MSE)
MSE is also known as quadratic loss because the penalty is proportional not to the error but to the square of the error. Squaring the error gives higher weight to outliers and produces a smooth gradient for small errors.
Optimization algorithms benefit from this penalization of large errors, as it helps find optimum parameter values using the least-squares method. MSE can never be negative since the errors are squared, and its value ranges from zero to infinity. MSE grows quadratically as the error increases. A good model has an MSE close to zero, indicating a better fit to the data.
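A minimal MSE sketch with the same made-up values, assuming scikit-learn:

from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

# Mean of 0.25, 0.25, 0.0, 1.0 -- the squared error weights the largest error most
print(mean_squared_error(y_true, y_pred))  # 0.375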