CE880_Lecture6_slides
Haider Raza
Tuesday, 21st February 2023
About Myself
What we will be covering in this lecture
- Evaluation Metrics
- Confusion Matrix
- F1 Score, Recall, Precision
- Hypothesis testing
- p-value
- Types of Hypothesis testing
Classification/Evaluation Metrics
[Figure: overview of classification/evaluation metrics. Source: https://towardsdatascience.com]
Prerequisites
- Condition positive (P): the number of real positive cases in the data
- Condition negative (N): the number of real negative cases in the data
- True positive (TP): sick people correctly identified as sick
- False positive (FP): healthy people incorrectly identified as sick
- True negative (TN): healthy people correctly identified as healthy
- False negative (FN): sick people incorrectly identified as healthy
Classification Accuracy
- Classification accuracy = correct predictions / total predictions, e.g. if 80 out of 100 samples are correctly classified, then 80/100 = 0.8.
- It is often presented as a percentage (%) by multiplying the result by 100.
Example (accuracy in %)
classification accuracy = correct predictions / total predictions * 100
classification accuracy = 80/100 * 100 = 80%
Example (error rate)
error rate = incorrect predictions / total predictions * 100
error rate = 20/100 * 100 = 20%
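A minimal Python sketch of the accuracy and error-rate calculation above (the 80-out-of-100 counts come from the example; the variable names are illustrative):

    # Accuracy and error rate from prediction counts (80 correct out of 100)
    correct_predictions = 80
    total_predictions = 100

    accuracy = correct_predictions / total_predictions * 100                              # 80.0 %
    error_rate = (total_predictions - correct_predictions) / total_predictions * 100      # 20.0 %

    print(f"accuracy = {accuracy:.1f}%, error rate = {error_rate:.1f}%")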
Problem with classification accuracy?
- When your data has more than 2 classes: with 3 or more classes you may get a classification accuracy of 80%, but you do not know whether that is because all classes are being predicted equally well or whether one or two classes are being neglected by the model.
- When your data does not have an even distribution of classes (data imbalance): you may achieve an accuracy of 90% or more, but this is not a good score if 90 records out of every 100 belong to one class, because you can achieve it by always predicting the most common class value.
Classification accuracy can hide the detail you need to diagnose the performance of
your model. But thankfully we can tease apart this detail by using a confusion matrix.
Confusion Matrix
Accuracy and Confusion Matrix
Example: Covid vs Non-Covid
False Positive Rate: Type I error rate
FPR = FP / (FP + TN) = 2 / (2 + 3) = 0.4
False Negative Rate: Type II error rate
FNR = FN / (TP + FN) = 1 / (4 + 1) = 0.2
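A small Python sketch (assuming scikit-learn is available) that reproduces the Covid vs Non-Covid counts used above (TP = 4, FN = 1, FP = 2, TN = 3) and derives the FPR and FNR; the label vectors are constructed here purely for illustration:

    from sklearn.metrics import confusion_matrix

    # 1 = Covid, 0 = Non-Covid; labels chosen to match the slide's counts
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

    # sklearn orders the matrix as [[TN, FP], [FN, TP]] for labels [0, 1]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    fpr = fp / (fp + tn)   # 2 / (2 + 3) = 0.4  (Type I error rate)
    fnr = fn / (tp + fn)   # 1 / (4 + 1) = 0.2  (Type II error rate)

    print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}, FPR={fpr:.1f}, FNR={fnr:.1f}")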
Let's consider an Imbalanced Dataset
Suppose we have 10K records, where 9K belong to label A and 1K to label B. If we calculate accuracy, it is obvious that we will get around 90% accuracy from a model that simply predicts most records as label A. Clearly, accuracy is not a good way of measuring the performance of a model when the dataset is not balanced. In such situations we use Recall, Precision and F-beta as the classification metrics.
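A quick illustrative sketch of this pitfall (the 9K/1K split is taken from the example above; the always-predict-A "model" is assumed purely for illustration, and scikit-learn is assumed to be installed):

    from sklearn.metrics import accuracy_score, recall_score

    # 9,000 records of class A (encoded 0) and 1,000 of class B (encoded 1)
    y_true = [0] * 9000 + [1] * 1000
    y_pred = [0] * 10000          # a "model" that always predicts the majority class

    print(accuracy_score(y_true, y_pred))              # 0.9  -> looks good
    print(recall_score(y_true, y_pred, pos_label=1))   # 0.0  -> class B is never detected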
Recall: Sensitivity, Hit Rate, or True Positive Rate (TPR)
Recall measures, out of the total actual positive values, how many positives we were able to predict correctly:
Recall = TP / (TP + FN)
Example: If we have 100 Covid-positive cases, how many of those 100 did we correctly predict as positive?
Note: in the case of Recall we are concerned with 'False Negatives'.
Precision: Positive Predictive Value (PPV)
Precision measures, out of the total predicted positive results, how many were actually positive:
Precision = TP / (TP + FP)
Note: in the case of Precision we are concerned with 'False Positives'.
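Using the TP/FP/FN counts from the Covid example above (TP = 4, FP = 2, FN = 1), a quick Python check of both metrics:

    # Counts taken from the Covid vs Non-Covid example
    tp, fp, fn = 4, 2, 1

    precision = tp / (tp + fp)   # 4 / 6 ≈ 0.67
    recall    = tp / (tp + fn)   # 4 / 5 = 0.80

    print(f"precision = {precision:.2f}, recall = {recall:.2f}")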
Precision Vs Recall: Example 1: SPAM Detection
In this case, we mostly have to consider Precision. Suppose we receive an email that is not actually spam, but the model flags it as spam, which is a False Positive. As a result, the user misses an email that may be of high importance.
Note: in such cases, where a False Positive is costly, our main focus should always be to reduce False Positives to a minimum.
Precision Vs Recall: Example 2: Cancer Detection
In this case, we mostly have to consider Recall. In Cancer vs Not Cancer: suppose the model predicts Not Cancer while the patient actually has Cancer; this is a False Negative, and it could be a serious blunder by the model.
In such cases a False Positive is not a very big issue, because even if a person who does not have Cancer is predicted as having Cancer, he or she can go for another test to verify the result. But if a person has Cancer and is predicted as negative (a False Negative), then chances are they might not go for another test, which could turn out to be a disaster. Therefore, it is important to use Recall in such situations.
NOTE: Our goal should always be to minimise both False Positives and False Negatives (i.e. maximise Precision and Recall); however:
- Whenever the False Positive is of more importance with respect to the problem statement, use Precision.
- If the False Negative has greater importance with respect to the problem statement, use Recall.
F-Beta
Sometimes both the False Positive and the False Negative play an important role in an imbalanced dataset. In such cases, we have to consider both Recall and Precision.
If the β value is 1, then F-beta becomes the F1-Score. The β value can also be set to, for example, 0.5 or 2.
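For reference, the standard definition of the F-beta score (the formula itself is not shown in the extracted slide text), which reduces to the F1-Score when β = 1:

    F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}},
    \qquad
    F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}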
F1 Score
Selecting beta value
Beta = 1: If both False Positives and False Negatives are equally important, then we select Beta = 1 (the F1-Score).
Beta < 1 (close to 0): If the False Positive has more impact than the False Negative, then we reduce the Beta value by selecting something between 0 and 1. Example: SPAM Detection.
Beta > 1: If the False Negative impact is high, which is essentially what Recall captures, then we increase the Beta value above 1. Example: Cancer Detection.
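A short Python sketch (assuming scikit-learn) comparing these choices of beta on the same illustrative Covid predictions used earlier:

    from sklearn.metrics import fbeta_score, f1_score

    # Same labels/predictions as the Covid vs Non-Covid example
    y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

    print(fbeta_score(y_true, y_pred, beta=0.5))  # weights Precision more (e.g. SPAM detection)
    print(fbeta_score(y_true, y_pred, beta=1.0))  # same as the F1-Score
    print(f1_score(y_true, y_pred))               # F1-Score computed directly
    print(fbeta_score(y_true, y_pred, beta=2.0))  # weights Recall more (e.g. Cancer detection)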
Hypothesis Testing
Hypothesis Testing: Decision Rule (p-value) based
- If p-value (p) > level of significance (α), we fail to reject the Null Hypothesis.
- If p-value (p) ≤ level of significance (α), we reject the Null Hypothesis.
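A minimal Python sketch of this decision rule (assuming SciPy; the sample data and α = 0.05 are made up for illustration), using a one-sample t-test as an example:

    from scipy import stats

    # H0: the population mean equals 5.0 (sample values are illustrative only)
    sample = [4.9, 5.1, 5.3, 4.8, 5.2, 5.0, 5.4, 4.7]
    alpha = 0.05

    t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

    if p_value <= alpha:
        print(f"p = {p_value:.3f} <= {alpha}: reject the Null Hypothesis")
    else:
        print(f"p = {p_value:.3f} > {alpha}: fail to reject the Null Hypothesis")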
Selecting a Hypothesis Test