
School of Computer Science and Electronic Engineering, University of Essex

Lecture 6: Evaluation Metrics and Hypothesis Testing

CE880: An Approachable Introduction to Data Science

Haider Raza
Tuesday, 21st February 2023

About Myself

- Name: Haider Raza
- Position: Senior Lecturer in AI
- Research interests: AI, Machine Learning, Data Science
- Contact: [email protected]
- Academic Support Hours: 1-2 on Friday via Zoom; the Zoom link is available on Moodle
- Website: www.sagihaider.com

What we will be covering in this lecture

- Evaluation Metrics
- Confusion Matrix
- F1 Score, Recall, Precision
- Hypothesis testing
- p-value
- Types of Hypothesis testing

Classification/Evaluation Metrics

[Figure: overview of classification/evaluation metrics. Source: https://towardsdatascience.com]

Prerequisites

- Condition positive (P): the number of real positive cases in the data
- Condition negative (N): the number of real negative cases in the data
- True positive (TP): sick people correctly identified as sick
- False positive (FP): healthy people incorrectly identified as sick
- True negative (TN): healthy people correctly identified as healthy
- False negative (FN): sick people incorrectly identified as healthy

Classification Accuracy

- Classification accuracy is the ratio of correct predictions to total predictions made.

Example (accuracy)
classification accuracy = correct predictions / total predictions

- For example, if 80 out of 100 samples are correctly classified, then accuracy = 80/100 = 0.8.
- It is often presented as a percentage (%) by multiplying the result by 100.

Example (accuracy in %)
classification accuracy = correct predictions / total predictions * 100
classification accuracy = 80/100 * 100 = 80%

- Classification accuracy can also easily be turned into a misclassification rate or error rate by inverting the value, such as:

Example (error)
error rate = incorrect predictions / total predictions * 100
error rate = 20/100 * 100 = 20%

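A minimal sketch of these calculations in Python, using the 80-out-of-100 example above:

```python
# Accuracy and error rate from raw prediction counts
correct_predictions = 80   # hypothetical: 80 of 100 samples classified correctly
total_predictions = 100

accuracy = correct_predictions / total_predictions   # 0.8
error_rate = 1 - accuracy                            # 0.2

print(f"accuracy:   {accuracy * 100:.1f}%")    # 80.0%
print(f"error rate: {error_rate * 100:.1f}%")  # 20.0%
```
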
Problem with classification accuracy?

- When your data has more than 2 classes: with 3 or more classes you may get a classification accuracy of 80%, but you don't know whether that is because all classes are being predicted equally well or because one or two classes are being neglected by the model.
- When your classes are not evenly distributed (data imbalance): you may achieve an accuracy of 90% or more, but this is not a good score if 90 records out of every 100 belong to one class, because you can reach that score simply by always predicting the most common class.

Classification accuracy can hide the detail you need to diagnose the performance of your model. Thankfully, we can tease apart this detail by using a confusion matrix.

Confusion Matrix

A confusion matrix is a summary of prediction results on a classification problem. In other words, the confusion matrix shows the ways in which your classification model is confused when it makes predictions.
It gives you insight not only into the errors being made by your classifier but, more importantly, into the types of errors that are being made.

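A minimal sketch using scikit-learn's confusion_matrix (assuming scikit-learn is installed); the label vectors are made up so that the counts match the Covid example used on the following slides (TP = 4, FN = 1, FP = 2, TN = 3):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels: 1 = positive (e.g. sick), 0 = negative (e.g. healthy)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)

tn, fp, fn, tp = cm.ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```
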
Accuracy and Confusion Matrix

[Figure: accuracy expressed in terms of the confusion matrix, i.e. accuracy = (TP + TN) / (TP + TN + FP + FN)]

Example: Covid vs Non-Covid

[Figure: confusion matrix for the Covid vs Non-Covid example, with TP = 4, FN = 1, FP = 2, TN = 3; these counts are used in the rate calculations on the next slides]

False Positive Rate: Type I error rate

The false positive rate (FPR) is the probability of a false alarm.

FPR = FP / (FP + TN)

FPR = 2 / (2 + 3) = 0.4

False Negative Rate: Type II error rate

The false negative rate (FNR) is the probability of a miss.

FNR = FN / (TP + FN)

FNR = 1 / (4 + 1) = 0.2

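A small sketch computing both rates from the example counts above:

```python
# Counts from the Covid vs Non-Covid example
tp, fn, fp, tn = 4, 1, 2, 3

fpr = fp / (fp + tn)   # Type I error rate (false alarm rate) -> 0.4
fnr = fn / (tp + fn)   # Type II error rate (miss rate)       -> 0.2

print(f"FPR = {fpr:.1f}, FNR = {fnr:.1f}")
```
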
Let's consider an Imbalanced Dataset

Suppose we have 10K records, with 9K belonging to label A and 1K to label B. If we calculate accuracy, a model that simply predicts label A for almost every record will score around 90%. Clearly, accuracy is not a good way of measuring the performance of the model when the dataset is not balanced. In such situations we use Recall, Precision, or F-beta as the classification metric.

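A short illustration (made-up data matching the 9K/1K split above) of how a model that always predicts the majority class still reaches 90% accuracy:

```python
# 9K records of class "A", 1K of class "B"
y_true = ["A"] * 9000 + ["B"] * 1000
y_pred = ["A"] * 10000   # a useless model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.0%}")  # 90%, despite never detecting class B
```
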
Recall: Sensitivity, Hit Rate, or True Positive Rate (TPR)

Recall measures, out of all the actual positive cases, how many we were able to predict correctly as positive:

Recall = TP / (TP + FN)

Example: if we have 100 Covid-positive cases, how many of those 100 did we correctly predict as positive?
Note: recall is driven by the 'False Negative' count.

Precision: Positive Predictive Value (PPV)

Precision measures, out of all the results we predicted as positive, how many were actually positive:

Precision = TP / (TP + FP)

Note: precision is driven by the 'False Positive' count.

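A minimal sketch of recall and precision with scikit-learn, reusing the hypothetical labels from the confusion-matrix example (TP = 4, FN = 1, FP = 2):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 4/5 = 0.80
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 4/6 ≈ 0.67
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```
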
Precision Vs Recall: Example 1: SPAM Detection

In this case, we mostly have to consider Precision. Suppose we receive an email that is not actually spam, but the model flags it as spam; that is a False Positive. As a result, the user is going to miss an email that may be of high importance.
Note: in such cases, where the cost of a False Positive is high, our main focus should be to reduce it to a minimum.

Precision Vs Recall: Example 2: Cancer Detection

In this case, we mostly have to consider Recall. In Cancer vs Not Cancer: suppose the model predicted Not Cancer whereas the patient actually had Cancer; that is a False Negative, and it could be a serious failure of the model.
In such cases a False Positive is not a very big issue, because even if a person who does not have Cancer is predicted as having Cancer, he/she can go for another test to verify the result. But if the person has Cancer and is predicted as negative (a False Negative), the chances are they might not go for another test, which could turn out to be a disaster. Therefore, it is important to use Recall in such situations.
NOTE: our goal should always be to reduce both False Positives and False Negatives (i.e. keep Precision and Recall high); however:

- Whenever the False Positive is of more importance with respect to the problem statement, use Precision.
- If the False Negative has greater importance with respect to the problem statement, use Recall.

F-Beta

Sometimes both the False Positive and the False Negative play an important role in an imbalanced dataset. In such cases, we have to consider both Recall and Precision, which the F-beta score combines:

F_β = (1 + β²) × (Precision × Recall) / (β² × Precision + Recall)

If the β value is 1, then F-beta becomes the F1-Score. Sometimes the beta value can also be 0.5 or 2.

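A minimal sketch of the effect of beta, assuming scikit-learn's fbeta_score and reusing the hypothetical labels from earlier:

```python
from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# beta < 1 weights precision more heavily, beta > 1 weights recall more heavily
for beta in (0.5, 1.0, 2.0):
    score = fbeta_score(y_true, y_pred, beta=beta)
    print(f"F_{beta} = {score:.3f}")
```
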
F1 Score

If the beta value is 1, then:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

This formula is the harmonic mean of Precision and Recall. Now, let's understand when to choose which value of beta.

Selecting beta value

Beta = 1: if the False Positive and the False Negative are equally important, then we select beta = 1.

Beta < 1 (close to 0): if the False Positive has more impact than the False Negative, then we reduce the beta value by selecting something between 0 and 1. Example: SPAM detection.

Beta > 1: if the False Negative impact is high (which relates to Recall), then we increase the beta value above 1. Example: Cancer detection.

Hypothesis Testing

Hypothesis testing is the evaluation of two mutually exclusive statements about a population using sample data. The workflow (sketched in code below) is:

- Start by specifying the Null and Alternative Hypotheses about a population parameter
- Set the level of significance (α)
- Collect sample data and calculate the test statistic and p-value by running a hypothesis test that suits the data
- Make a conclusion: reject or fail to reject the Null Hypothesis

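A minimal sketch of this workflow, assuming SciPy and an independent two-sample t-test on made-up data:

```python
import numpy as np
from scipy import stats

# H0: the two groups have equal means; H1: the means differ (two-sided)
alpha = 0.05  # level of significance

# Hypothetical sample data
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=53, scale=5, size=30)

# Test statistic and p-value from an independent two-sample t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Decision rule
if p_value <= alpha:
    print("Reject the Null Hypothesis")
else:
    print("Fail to reject the Null Hypothesis")
```
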
Confusion Matrix in Hypothesis testing

The possible outcomes of a hypothesis test can be arranged like a confusion matrix:

- Confidence: the probability of accepting a True Null Hypothesis; it is denoted as (1 - α).
- Type I error: occurs when we reject a True Null Hypothesis; its probability is denoted as α.
- Type II error: occurs when we accept a False Null Hypothesis; its probability is denoted as β.

Hypothesis Testing: Decision Rule Based on the p-value

The decision rule for the p-value method:

- If the p-value (p) > level of significance (α), we fail to reject the Null Hypothesis
- If the p-value (p) ≤ level of significance (α), we reject the Null Hypothesis

Selecting a Hypothesis Test

[Figure: guide to selecting an appropriate hypothesis test]