D3 IT Performance Metrics May 2023
5/17/2023 1
Performance Estimation of a Classifier
Predicted   Actual
0           1
1           1
0           0
0           0

Accuracy = 3/4 = 0.75, or 75%

Predicted \ Actual   1                     0
1                    True Positive (TP)    False Positive (FP)
0                    False Negative (FN)   True Negative (TN)

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
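The four predicted/actual pairs and the accuracy formula can be checked with a short script. This is a minimal sketch in Python; the helper name `confusion_counts` is illustrative and does not come from the slides.

```python
def confusion_counts(predicted, actual, positive=1):
    """Return (TP, FP, FN, TN) for two equal-length label sequences."""
    tp = fp = fn = tn = 0
    for p, a in zip(predicted, actual):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

# The four pairs from the slide.
predicted = [0, 1, 0, 0]
actual = [1, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(predicted, actual)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, fp, fn, tn, accuracy)  # 1 0 1 2 0.75
```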
Performance Estimation of a Classifier
True Positive (TP)
The predicted value matches the actual value.
The actual value was positive and the model predicted a positive value.
True Negative (TN)
The predicted value matches the actual value.
The actual value was negative and the model predicted a negative value.
False Positive (FP) – Type 1 error
The predicted value does not match the actual value.
The actual value was negative but the model predicted a positive value.
False Negative (FN) – Type 2 error
The predicted value does not match the actual value.
The actual value was positive but the model predicted a negative value.
Performance Estimation of a Classifier
Predicted   Actual
0           1
1           1
0           0
0           0

Accuracy = 3/4 = 0.75, or 75%

Predicted \ Actual   1   0
1                    1   0
0                    1   2
Performance Estimation of a Classifier
Predictive accuracy works well when the classes
are balanced,
that is, when every class in the data set is equally
important.
Performance Estimation of a Classifier
Effectiveness of Predictive Accuracy
Given a data set of stock markets, we are to classify
them as “good” or “worst”. Suppose that, out of 100 entries
in the data set, 98 belong to the “good” class and only
2 belong to the “worst” class.
With this data set, a classifier can reach a predictive accuracy of 0.98, a very high
value, simply by labelling every entry “good”.
In that case, both “worst” stock markets are
incorrectly classified as “good”.
On the other hand, a predictive accuracy of 0.02 is what results when none of the
stock markets is classified as “good”.
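The effect can be reproduced in a few lines. The sketch below assumes a hypothetical classifier that answers “good” for every entry; the variable names are illustrative only.

```python
# 98 "good" and 2 "worst" entries, as in the example above.
actual = ["good"] * 98 + ["worst"] * 2
predicted = ["good"] * 100  # hypothetical classifier: always answers "good"

correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)

# Fraction of the "worst" entries that the classifier actually found.
worst_found = sum(p == a == "worst" for p, a in zip(predicted, actual))
worst_recall = worst_found / actual.count("worst")

print(accuracy, worst_recall)  # 0.98 0.0
```

The accuracy is high even though the minority class is never detected.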
Performance Estimation of a Classifier
Predicted Fever   Actual Fever   Status
1                 1              TP
0                 0              TN
0                 0              TN
1                 0              FP
0                 0              TN
1                 0              FP
0                 1              FN
1                 1              TP
0                 0              TN
1                 0              FP
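As a quick check, the Status column can be tallied programmatically. This sketch uses the standard definitions Precision = TP/(TP+FP) and Recall = TP/(TP+FN), which the extracted slides do not spell out at this point.

```python
from collections import Counter

# (predicted, actual) pairs copied from the fever table.
rows = [(1, 1), (0, 0), (0, 0), (1, 0), (0, 0),
        (1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]

# Map each (predicted, actual) pair to its status label.
status = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FP", (0, 1): "FN"}
counts = Counter(status[r] for r in rows)

accuracy = (counts["TP"] + counts["TN"]) / len(rows)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])

print(dict(counts))
print(accuracy, precision, round(recall, 2))  # 0.6 0.4 0.67
```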
Performance Estimation of a Classifier
Thus, when a classifier is evaluated on a test data set
with an imbalanced class distribution,
predictive accuracy on its own is not a reliable
indicator of the classifier’s effectiveness.
This calls for alternative metrics to judge
the classifier.
Performance Estimation of a Classifier
These metrics may be derived from a single matrix
known as “Confusion Matrix” or “Contingency Matrix”
Performance Estimation of a Classifier
Source: https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Performance Estimation of a Classifier
The F1 Score (or F-score) is the harmonic mean of precision and
recall:
$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
It is useful when dealing with imbalanced
datasets.
A high value of F1 means that Precision and
Recall are both reasonably high.
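A small sketch shows why the harmonic mean is sensitive to the weaker of the two values. The function name `f1_score` and the input values are chosen for illustration; returning `None` when both inputs are zero is an assumption of this sketch, standing in for "not applicable".

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; None when both are zero."""
    if precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)

# One high and one low value: F1 stays close to the low value.
print(round(f1_score(0.9, 0.1), 2))  # 0.18
# Both values reasonably high: F1 is high as well.
print(round(f1_score(0.8, 0.8), 2))  # 0.8
```

The arithmetic mean of 0.9 and 0.1 would be 0.5, which hides the poor recall; the harmonic mean does not.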
Analysis with Performance Measurement Metrics
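Based on the various performance metrics, we can characterize a classifier.
When FP = 0 and FN = 0, the CM is

             Predicted +   Predicted -
Actual +     P             0
Actual -     0             N

$\text{TPR} = \frac{P}{P} = 1$
$\text{FPR} = \frac{0}{N} = 0$
$\text{Precision} = \frac{P}{P} = 1$
$\text{F1 Score} = \frac{2 \times 1}{1 + 1} = 1$
$\text{Accuracy} = \frac{P + N}{P + N} = 1$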
Analysis with Performance Measurement Metrics
When every instance is wrongly classified, it is called the worst classifier. In this
case, TP = 0, TN = 0 and the CM is

             Predicted +   Predicted -
Actual +     0             P
Actual -     N             0

$\text{TPR} = \frac{0}{P} = 0$
$\text{FPR} = \frac{N}{N} = 1$
$\text{Precision} = \frac{0}{N} = 0$
F1 Score = Not applicable, as Recall + Precision = 0
$\text{Accuracy} = \frac{0}{P + N} = 0$
Analysis with Performance Measurement Metrics
This classifier always predicts the + class, so every actual + instance is classified correctly. Here, the False Negative
(FN) and True Negative (TN) counts are zero. The CM is

             Predicted +   Predicted -
Actual +     P             0
Actual -     N             0

$\text{TPR} = \frac{P}{P} = 1$
$\text{FPR} = \frac{N}{N} = 1$
$\text{Precision} = \frac{P}{P + N}$
$\text{F1 Score} = \frac{2P}{2P + N}$
$\text{Accuracy} = \frac{P}{P + N}$
Analysis with Performance Measurement Metrics
This classifier always predicts the - class, so every actual - instance is classified correctly. Here, the True Positive
(TP) and False Positive (FP) counts are zero. The CM is

             Predicted +   Predicted -
Actual +     0             P
Actual -     0             N

$\text{TPR} = \frac{0}{P} = 0$
$\text{FPR} = \frac{0}{N} = 0$
Precision = Not applicable (as TP + FP = 0)
F1 Score = Not applicable
$\text{Accuracy} = \frac{N}{P + N}$
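The four extreme classifiers (perfect, worst, always +, always -) can be compared side by side. The sketch below assumes class sizes P = 30 and N = 70, chosen only for illustration, and uses `None` to mark "not applicable"; the function name `metrics` is hypothetical.

```python
def metrics(tp, fn, fp, tn):
    """TPR, FPR, precision, F1 and accuracy; None marks 'not applicable'."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp) if tp + fp else None
    if precision is None or precision + tpr == 0:
        f1 = None
    else:
        f1 = 2 * precision * tpr / (precision + tpr)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision,
            "F1": f1, "Accuracy": accuracy}

P, N = 30, 70  # assumed class sizes, for illustration only

cases = {
    "perfect":  dict(tp=P, fn=0, fp=0, tn=N),
    "worst":    dict(tp=0, fn=P, fp=N, tn=0),
    "always +": dict(tp=P, fn=0, fp=N, tn=0),
    "always -": dict(tp=0, fn=P, fp=0, tn=N),
}

for name, cm in cases.items():
    print(name, metrics(**cm))
```

With these sizes, the "always +" classifier has accuracy P/(P+N) = 0.3 and the "always -" classifier has accuracy N/(P+N) = 0.7, even though neither has learned anything.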
Predictive Accuracy versus TPR and FPR
One strength of characterizing a classifier by its TPR and FPR is that they do
not depend on the relative sizes of P and N.
The same applies to FNR and TNR, which are also computed within a single row of the CM.
In contrast, Predictive Accuracy, Precision, Error Rate, F1 Score, etc. are
affected by the relative sizes of P and N.
FPR, TPR, FNR and TNR are each calculated from a single row of the CM (one actual class).
On the other hand, Predictive Accuracy, etc. are derived from the values in both
rows.
This suggests that FPR, TPR, FNR and TNR are more reliable than
Predictive Accuracy, etc. when the class distribution is imbalanced.
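This can be illustrated by keeping the per-class behaviour of a classifier fixed while changing only the number of negatives. The counts below are made up for the illustration, and the function name `rates` is hypothetical.

```python
def rates(tp, fn, fp, tn):
    tpr = tp / (tp + fn)            # uses the actual-positive row only
    fpr = fp / (fp + tn)            # uses the actual-negative row only
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, accuracy, precision

# Assumed counts: P = 100 positives, N = 100 negatives.
balanced = rates(tp=80, fn=20, fp=10, tn=90)
# Same per-class behaviour, but ten times as many negatives (N = 1000).
imbalanced = rates(tp=80, fn=20, fp=100, tn=900)

print(balanced)    # TPR 0.8, FPR 0.1, accuracy 0.85, precision about 0.889
print(imbalanced)  # TPR 0.8, FPR 0.1, accuracy about 0.891, precision about 0.444
```

TPR and FPR are identical in both settings, while accuracy and precision move with the class ratio.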
Confusion Matrix
A classifier is built on a dataset of stock markets with the classes Good and
Worst. The model is then tested with a test set
of 10000 unseen instances. The result is shown in the form of a
confusion matrix. The result is self-explanatory.
Predictive accuracy?
Multiclass Classification
Multiclass (or multinomial) classification is a type
of supervised learning task in machine learning and
data science where the goal is to predict the
categorical class label of an input data sample from
a set of more than two possible classes.
In other words, given an input, the model has to
predict which one of the multiple possible classes that
input belongs to. For example, given an image of an
animal, the task might be to classify it as a cat, dog,
or bird.
Multiclass Classification
Multiclass classification is different from binary
classification, where the goal is to predict a binary label,
such as "yes" or "no," or "true" or "false." Multiclass
classification can be performed using various algorithms
such as logistic regression, decision trees, random forests,
support vector machines, neural networks, etc.
Some algorithms (such as Random Forest classifiers or
naive Bayes classifiers) are capable of handling multiple
classes directly. Others (such as SVM classifiers or linear
classifiers) are strictly binary classifiers. However, there are
various strategies that you can use to perform multiclass
classification using multiple binary classifiers.
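One commonly used strategy of this kind is one-vs-rest, which the slides do not name explicitly: train one binary scorer per class and pick the class whose scorer is most confident. The sketch below uses hand-written, hypothetical scorers over a single made-up numeric feature in place of trained models.

```python
def one_vs_rest_predict(scorers, x):
    """Pick the class whose binary scorer is most confident about x."""
    return max(scorers, key=lambda label: scorers[label](x))

# Hypothetical binary scorers: each returns a confidence that x belongs
# to its own class rather than to any of the other classes.
scorers = {
    "cat":  lambda x: 1.0 - abs(x - 4.0),
    "dog":  lambda x: 1.0 - abs(x - 20.0),
    "bird": lambda x: 1.0 - abs(x - 0.5),
}

print(one_vs_rest_predict(scorers, 3.2))   # cat
print(one_vs_rest_predict(scorers, 18.0))  # dog
```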
Multilabel Classification
In multilabel classification, the goal is to predict multiple categorical class labels for an input data
sample.
In other words, instead of predicting a single class label for a data
sample, the model predicts multiple labels.
For example, in a movie recommendation system, a movie might
have multiple genre labels, such as "comedy," "action," and
"drama."
Multilabel classification is different from multiclass classification,
where the goal is to predict a single class label from multiple
possible classes. In multilabel classification, the labels are not
mutually exclusive, and a data sample can
belong to one or more labels at the same time.
Multilabel classification can be performed using various approaches
such as binary relevance, label powerset, classifier chains, and
hierarchical classification.
Confusion Matrix for Multiclass Classifier
With m classes, the confusion matrix is a table of size
m×m, where the element at (i, j) indicates the number of
instances of class i that were classified as class j.
For a classifier to have good accuracy, ideally most
diagonal entries should have large values, with the rest of
the entries being close to zero.

23    1    4    0    1
 2   35    6    2    2
 3    1   73    3    7
 4    2    4   50    3
 5    4    2    5   28

A confusion matrix may have additional rows or columns
to provide totals or recognition rates per class.
Confusion Matrix for Multiclass Classifier
Unlike binary classification, there are no positive or
negative classes here.
At first, it might seem difficult to find TP, TN, FP
and FN since there are no positive or negative classes,
but it is actually straightforward.
We find TP, TN, FP and FN for
each individual class. For example, if we take class
Apple, we can read the values of these counts
from the confusion matrix.
Source: https://towardsdatascience.com/confusion-matrix-for-your-multi-class-machine-learning-model-ff9aa3bf7826
Confusion Matrix for Multiclass Classifier
TP = 7
TN = (2+3+2+1) = 8
FP = (8+9) = 17
FN = (1+3) = 4
Since we have all the necessary counts for class Apple from the confusion
matrix, we can now calculate the performance measures for class Apple.
For example, class Apple has
Precision = 7/(7+17) = 0.29
Recall = 7/(7+4) = 0.64
F1-score = 0.40
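The per-class counts can be derived mechanically from a multiclass confusion matrix. The matrix below is an assumption: it is reconstructed from the sums on the slide (rows taken as predicted class, columns as actual class), and only the class name Apple appears in the text, so the other two names are placeholders.

```python
# Rows: predicted class, columns: actual class (layout inferred from the
# sums on the slide). "class_2" and "class_3" are placeholder names.
labels = ["Apple", "class_2", "class_3"]
cm = [
    [7, 8, 9],
    [1, 2, 3],
    [3, 2, 1],
]

def per_class_counts(cm, k):
    """TP, FP, FN, TN for class index k, treating all other classes as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # predicted k, actually another class
    fn = sum(row[k] for row in cm) - tp   # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

tp, fp, fn, tn = per_class_counts(cm, labels.index("Apple"))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)                                        # 7 8 17 4
print(round(precision, 2), round(recall, 2), round(f1, 2))   # 0.29 0.64 0.4
```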
Confusion Matrix for Multiclass Classifier
Confusion matrix with multiple classes
The following table shows the confusion matrix of a classification problem with six
classes labeled C1, C2, C3, C4, C5 and C6.

Class   C1   C2   C3   C4   C5   C6
C1      52   10    7    0    0    1
C2      15   50    6    2    1    2
C3       5    6    6    0    0    0
C4       0    2    0   10    0    1
C5       0    1    0    0    7    1
C6       1    3    0    1    0   24

Predictive accuracy?
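One way to answer the question is to divide the sum of the diagonal entries by the total number of instances. A minimal sketch, with the matrix copied from the table:

```python
# Six-class confusion matrix for C1..C6, copied from the table.
cm = [
    [52, 10, 7,  0, 0,  1],
    [15, 50, 6,  2, 1,  2],
    [ 5,  6, 6,  0, 0,  0],
    [ 0,  2, 0, 10, 0,  1],
    [ 0,  1, 0,  0, 7,  1],
    [ 1,  3, 0,  1, 0, 24],
]

correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal: correctly classified
total = sum(sum(row) for row in cm)              # all test instances
accuracy = correct / total

print(correct, total, round(accuracy, 3))  # 149 214 0.696
```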
Data Mining: Concepts and Techniques, (3rd Edn.), Jiawei Han, Micheline Kamber, Morgan Kaufmann, 2015.
Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley, 2014
Confusion Matrix for Multiclass Classifier
In multiclass classification, sometimes one class is important enough to
be regarded as positive, with all other classes combined together as negative.
Thus a large m×m confusion matrix can be condensed into a 2×2 matrix.
m×m CM to 2×2 CM
For example, the six-class CM can be transformed into a CM of size 2×2
by considering the class C1 as the positive class and classes C2, C3, C4, C5 and C6
combined together as negative.

Class   +     -
+       52    18
-       21    123
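The 2×2 matrix can be reproduced from the six-class matrix. The sketch assumes rows are actual classes and columns are predicted classes, following the (i, j) definition of the multiclass CM; the function name `collapse` is illustrative.

```python
def collapse(cm, k):
    """Collapse an m x m CM (rows actual, columns predicted) to 2 x 2 for class k."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # another class, predicted as k
    tn = total - tp - fn - fp
    return [[tp, fn], [fp, tn]]

# Six-class confusion matrix for C1..C6.
cm = [
    [52, 10, 7,  0, 0,  1],
    [15, 50, 6,  2, 1,  2],
    [ 5,  6, 6,  0, 0,  0],
    [ 0,  2, 0, 10, 0,  1],
    [ 0,  1, 0,  0, 7,  1],
    [ 1,  3, 0,  1, 0, 24],
]

print(collapse(cm, 0))  # [[52, 18], [21, 123]]
```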
ROC Curves
ROC is an abbreviation of Receiver Operating Characteristic. The term comes from
signal detection theory, developed during World War 2 for the analysis of radar
images.
In the context of classifiers, the ROC plot is a useful tool for studying the behaviour of
a classifier or for comparing two or more classifiers.
Since the values of FPR and TPR vary from 0 to 1, both inclusive, the two
axes run from 0 to 1 only.
Each point (x, y) on the plot indicates that the FPR has value x and the TPR
has value y.
ROC Plot
A typical ROC plot with a few points in it is shown in the following
figure.
Note that the four corner points are the four extreme cases of classifiers.
Interpretation of Different Points in ROC Plot
Let us interpret the different points in the ROC plot.
Tuning a Classifier through ROC Plot
Using the ROC plot, we can compare two or more classifiers by their TPR and
FPR values. The plot also depicts the trade-off between the TPR and FPR of a
classifier.
Examining ROC curves can give insights into the best way of tuning the
parameters of a classifier.
For example, in the curve C2, the result degrades after the point P.
Similarly, for the curve C1, the settings beyond Q are not acceptable.
Comparing Classifiers through ROC Plot
The two curves C1 and C2 correspond to experiments for choosing between two
classifiers and their parameters.
Comparing Classifiers through ROC Plot
We can use the “area under curve” (AUC) as a better method to
compare two or more classifiers.
If a model is perfect, then its AUC = 1.
If a model simply performs random guessing, then its AUC = 0.5.
A model that is strictly better than another has a larger AUC
than the other.
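The AUC values quoted above can be checked numerically. The sketch below approximates the area with the trapezoidal rule over a list of (FPR, TPR) points; the function name `auc` and the third set of points are made up for the illustration.

```python
def auc(points):
    """Area under an ROC curve given (FPR, TPR) points, by the trapezoidal rule."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

perfect = [(0, 0), (0, 1), (1, 1)]        # passes through the best point (0, 1)
random_guess = [(0, 0), (1, 1)]           # the diagonal
example = [(0, 0), (0.1, 0.6), (0.4, 0.9), (1, 1)]  # made-up operating points

print(auc(perfect), auc(random_guess), round(auc(example), 3))  # 1.0 0.5 0.825
```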
A Quantitative Measure of a Classifier
The concept of the ROC plot can be extended to compare classifiers quantitatively using the
Euclidean distance measure.
Here, C(fpr, tpr) is a classifier and δ denotes the Euclidean distance between
the best classifier (0, 1) and C. That is,
$\delta = \sqrt{fpr^2 + (1 - tpr)^2}$
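The distance is straightforward to compute. In this sketch the function name `distance_to_best` and the sample point (0.2, 0.8) are illustrative; a smaller value means the classifier lies closer to the best point (0, 1).

```python
import math

def distance_to_best(fpr, tpr):
    """Euclidean distance from the point (fpr, tpr) to the best classifier (0, 1)."""
    return math.hypot(fpr, 1 - tpr)

print(distance_to_best(0.0, 1.0))            # 0.0, the best classifier itself
print(round(distance_to_best(1.0, 0.0), 3))  # 1.414, the worst classifier
print(round(distance_to_best(0.2, 0.8), 3))  # 0.283
```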
Performance Estimation of a Regression Model
Different Measures :
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
R Squared (R^2)
     Actual   Predicted   Predicted
1    1        1           1
2    4        3           4
3    9        8           9           0.80
4    16       17          16
5    25       23          25          R squared = 1
Performance Estimation of a Regression Model
The Mean Absolute Error (or MAE) is the average of the
absolute differences between predictions and actual
values. It gives an idea of how wrong the predictions
were.
The measure gives an idea of the magnitude of the error,
but no idea of its direction (e.g. over- or under-predicting).
The R^2 (or R Squared) metric provides an indication of
the goodness of fit of a set of predictions to the actual
values. In statistical literature, this measure is called the
coefficient of determination.
Its value typically lies between 0 (no fit) and 1 (perfect fit); it can be negative
when the predictions fit worse than always predicting the mean of the actual values.
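The four measures can be computed for the two prediction columns of the small table with actual values 1, 4, 9, 16, 25. This sketch uses the standard definitions of MSE and RMSE, which the slides list but do not define, and the function name `regression_metrics` is illustrative. The 0.80 annotation on the slide is not reproduced, because the extracted text does not say which measure it refers to.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE, R^2) for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

actual = [1, 4, 9, 16, 25]
first = [1, 3, 8, 17, 23]    # first "Predicted" column
second = [1, 4, 9, 16, 25]   # second "Predicted" column

print([round(v, 3) for v in regression_metrics(actual, first)])   # [1.0, 1.4, 1.183, 0.981]
print([round(v, 3) for v in regression_metrics(actual, second)])  # [0.0, 0.0, 0.0, 1.0]
```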