Module 8

The document discusses the k-Nearest Neighbors (kNN) algorithm, focusing on its classification and regression applications, effects of outliers, and the importance of choosing an appropriate k value. It also covers performance metrics for classification models, including confusion matrices, accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). Additionally, it highlights challenges in kNN computation and optimization techniques such as k-d trees.

Learning Objectives

• Continuing discussion on kNN

• Performance metrics for Classification

• Significance of different metrics

1
kNN: Classification

Effect of Outliers:

● Consider k = 1.
● Sensitive to outliers: the decision boundary changes drastically in their presence.
● Solution?
○ Increase k

3
kNN: Classification

Effect of k:

● Low k: overfitting, highly unstable decision boundary
● Good k: smooth boundary, no overfitting/underfitting
● Higher k: everything classified as the most probable class
● How to find a good k?

[Figure: decision boundaries for k = 1 and k = 15]
Cross validation is our friend!
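To make "cross validation is our friend" concrete, here is a minimal scikit-learn sketch (my own illustration, not part of the slides; the dataset and the candidate k values are arbitrary assumptions) that picks k by 5-fold cross-validated accuracy:

# Choosing k for kNN classification by cross-validation (illustrative sketch).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)            # stand-in dataset (an assumption, not from the slides)

best_k, best_score = None, -np.inf
for k in [1, 3, 5, 7, 9, 15, 25]:            # candidate values of k
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5, scoring="accuracy").mean()   # 5-fold CV accuracy
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, cross-validated accuracy = {best_score:.3f}")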

5
kNN: Classification

What if we have the same number of votes from both classes?

Potential solutions for tie-breaking:
● Take k odd
● Randomly select
● Use the class with the larger prior

[Figure: decision boundaries for k = 1 and k = 15]

6
kNN: Classification

A probabilistic variant: Probabilistic kNN

E.g., k = 4, c = 3 classes: three of the four nearest neighbours belong to class y = 1 and one to class y = 3, so

P = [3/4, 0, 1/4]   (for y = 1, y = 2, y = 3)

8
kNN: Classification

A probabilistic variant: Probabilistic kNN

E.g., k = 4, c = 3. Variant with pseudo counts (add 1 to each class count and c to the denominator):

P = [(3+1)/(4+3), (0+1)/(4+3), (1+1)/(4+3)]
  = [4/7, 1/7, 2/7]   (for y = 1, y = 2, y = 3)
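A small numpy sketch of this smoothing (my own illustration; the function name and the 0-indexed class labels are assumptions, not from the slides):

# Probabilistic kNN with pseudo counts (illustrative sketch).
import numpy as np

def knn_class_probabilities(neighbor_labels, num_classes, pseudo_count=1):
    # Smoothed class distribution from the labels of the k nearest neighbours.
    k = len(neighbor_labels)
    counts = np.bincount(neighbor_labels, minlength=num_classes)
    return (counts + pseudo_count) / (k + pseudo_count * num_classes)

# Slide's example: k = 4 neighbours, three of class y=1 and one of class y=3 (0-indexed labels [0, 0, 0, 2])
print(knn_class_probabilities(np.array([0, 0, 0, 2]), num_classes=3))   # [4/7, 1/7, 2/7]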

9
kNN: Regression

A simple regression algorithm:

● Training examples: (x_1, y_1), …, (x_n, y_n), where each y_i is a continuous real-valued target
● Given a test input x
● Find the distances from x to the n training examples using a distance metric
● Select the k closest training examples and their target values
● The output is the mean of the target values of the k neighbours

Can be used for interpolation (see the sketch below).
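The procedure above fits in a few lines of numpy; a minimal sketch (the function name and the toy sine data are assumptions of mine, not from the slides):

# kNN regression: predict the mean target of the k nearest neighbours (illustrative sketch).
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)   # distances to all n training examples
    nearest = np.argsort(dists)[:k]                     # indices of the k closest examples
    return y_train[nearest].mean()                      # mean of their target values

# Toy 1-D example: noisy samples of y = sin(x)
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 6, size=(50, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.1, size=50)
print(knn_regress(X_train, y_train, np.array([1.5]), k=5))   # roughly sin(1.5) ≈ 1.0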

12
kNN: Challenges

Computationally expensive:
● Need to store all training examples
● Need to compute distances to all n training examples for every query: O(n · d) for d-dimensional inputs

There are ways to optimize kNN computation:
● Reduce dimensionality using dimensionality reduction techniques
● Reduce the number of comparisons:
○ k-d tree implementation
○ Locality-sensitive hashing

14
kNN: Computational Complexity
Brute force method
● Training time complexity: O(1)
● Training space complexity: O(1)
● Prediction time complexity: O(k * n * d)
● Prediction space complexity: O(1)

15
kNN: Computational Complexity

k-d tree method


● Training time complexity: O(d * n * log(n))
● Training space complexity: O(d * n)
● Prediction time complexity: O(k * log(n))
● Prediction space complexity: O(1)

16
kNN: k-d Tree

● A k-dimensional tree (or k-d tree) is a tree data structure used to represent points in a k-dimensional space (here k is the number of dimensions, not the number of neighbours).
● Used for various applications like nearest-point search (in k-dimensional space), efficient storage of spatial data, range search, etc. (see the sketch below).
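In practice the tree is rarely implemented by hand; a minimal sketch using scipy's cKDTree (my choice of library, not one named in the slides; the random 2-D data is made up):

# Nearest-neighbour queries through a k-d tree (illustrative sketch).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((60_000, 2))                   # n = 60k points in a 2-D space

tree = cKDTree(X_train)                             # built once, at "training" time: O(d * n * log n)
dists, idx = tree.query(rng.random((10, 2)), k=3)   # 3 nearest neighbours of 10 query points
print(idx.shape)                                    # (10, 3): neighbour indices per query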

18
kNN: k-d Tree

Example: [figures omitted: a worked k-d tree construction and nearest-neighbour search]

19-20
kNN: Computational Complexity

The more “traditional” application of kNN is classification, and the data often contains quite a lot of points; e.g., MNIST has 60k training images and 10k test images. Classification is done offline: we first run the training phase and then simply reuse its results during prediction, so the data structure only needs to be constructed once. The next slide compares, per query, brute force (which computes all distances every time) against a k-d tree for 3 neighbours, plugging in n = 10,000.

21
kNN: Computational Complexity

● Brute force (O(k * n)): 3 * 10,000 = 30,000
● k-d tree (O(k * log(n))): 3 * log2(10,000) ≈ 3 * 13 = 39

22
Classification Metrics

How to measure the performance of a classification model?

24
Classification Metrics

Most widely used metrics and tools to assess classification models:

● Confusion matrix
● Accuracy
● Precision/Recall/F1-score
● Area under the ROC curve

25
Classification Metrics

Confusion Matrix

A table to summarize how successful the classification model is at predicting examples belonging to various classes.

26
Classification Metrics

Confusion Matrix

E.g., for binary classification, a model predicts one of two classes, “spam” or “not_spam”, for a given email.

                        prediction
                  spam                  not_spam
actual  spam      True Positive (TP)    False Negative (FN)
        not_spam  False Positive (FP)   True Negative (TN)
27
Classification Metrics

Confusion Matrix

Exercise 1: Consider a cricket tournament. Find the mapping of each statement to TP, FN, FP, or TN.

1. You had predicted that India would win and it won.
2. You had predicted that England would not win and it lost.
3. You had predicted that England would win, but it lost.
4. You had predicted that India would not win, but it won.

28
Classification Metrics

Confusion Matrix

Exercise 2:

                  prediction
                  1          0
actual  1         TP = ?     FN = ?
        0         FP = ?     TN = ?

29
Classification Metrics

Confusion Matrix

Exercise 2:

                  prediction
                  1          0
actual  1         TP = 6     FN = 2
        0         FP = 1     TN = 3

30
Classification Metrics

Confusion Matrix
Multiclass Classification: e.g., emotion classification

A 6 x 6 table with actual classes as rows and predicted classes as columns, both over the classes Happy, Sad, Angry, Surprise, Disgust, and Neutral. [Matrix entries shown in the original figure are omitted.]

31
Classification Metrics

Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples.
Acc = (TP + TN) / (TP + TN + FP + FN)

                        prediction
                  spam                  not_spam
actual  spam      True Positive (TP)    False Negative (FN)
        not_spam  False Positive (FP)   True Negative (TN)

33
Classification Metrics

Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples.

                  prediction
                  1          0
actual  1         TP = 6     FN = 2
        0         FP = 1     TN = 3

Accuracy = ?
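(One worked answer, not shown on this slide: Accuracy = (TP + TN) / (TP + TN + FP + FN) = (6 + 3) / (6 + 2 + 1 + 3) = 9/12 = 0.75.)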
34
Classification Metrics

Precision
Precision is the ratio of correct positive predictions to the overall number of positive predictions.

Precision = TP / (TP + FP)

FP is costly!

                        prediction
                  spam                  not_spam
actual  spam      True Positive (TP)    False Negative (FN)
        not_spam  False Positive (FP)   True Negative (TN)

35
Classification Metrics

Recall
Recall is the ratio of correct positive predictions to the overall number of positive
examples.
Recall = TP / (TP + FN)

FN is costly!

                        prediction
                  spam                  not_spam
actual  spam      True Positive (TP)    False Negative (FN)
        not_spam  False Positive (FP)   True Negative (TN)

36
Classification Metrics

F1-Score

● The standard F1-score is the harmonic mean of precision and recall:
  F1 = 2 · Precision · Recall / (Precision + Recall)
● Best of both worlds
● A perfect model has an F1-score of 1.
● FP & FN both are costly! (see the sketch below)
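A minimal scikit-learn sketch of all three metrics (my own illustration; the labels are made up so that they reproduce the TP = 6, FN = 2, FP = 1, TN = 3 matrix used in the exercises):

# Precision, recall, and F1 for a binary classifier (illustrative sketch).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # 8 actual positives, 4 actual negatives
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]   # TP=6, FN=2, FP=1, TN=3

print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 6/7 ≈ 0.857
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 6/8 = 0.75
print(f1_score(y_true, y_pred))          # 2PR / (P + R) = 0.8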

37
Classification Metrics

Visualizing Precision/Recall

[Figure omitted: the standard diagram of relevant vs. retrieved elements illustrating precision and recall, from Wikipedia]

38
Classification Metrics

Precision/Recall/F1-score

                  prediction
                  1          0
actual  1         TP = 6     FN = 2
        0         FP = 1     TN = 3

Precision = ?
Recall = ?
F1-score = ?
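(Worked answers, not shown on this slide: Precision = 6/7 ≈ 0.86, Recall = 6/8 = 0.75, F1 = 2 · (6/7) · (6/8) / (6/7 + 6/8) = 0.80.)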

39
Classification Metrics

Examples: It all depends on the problem!

Diagnosis of cancer.

                        prediction
                  cancer       no_cancer
actual  cancer    Perfect      X
        no_cancer OK           Perfect

In medical cases it doesn’t matter much if we raise a false alarm, but the actual positive cases should not go undetected! What metric would you pick?

Recall = TP / (TP + FN)

42
Classification Metrics

Examples: It all depends on the problem!

Detecting if an email is spam or not spam.

                        prediction
                  spam         no_spam
actual  spam      Perfect      OK
        no_spam   X            Perfect

For email it is more important not to lose any important message by marking it as spam than to let an occasional spam message through. What metric would you pick?

Precision = TP / (TP + FP)

44
Classification Metrics

Multiclass Classification

Consider the 6 x 6 emotion confusion matrix (actual classes as rows, predicted classes as columns; classes: Happy, Sad, Angry, Surprise, Disgust, Neutral).

● Can you define recall (Happy)?
● Can you define precision (Happy)?

45
Classification Metrics

Multiclass Classification

recall(Happy) = (number of examples correctly predicted as Happy) / (total number of actual Happy examples), i.e. the diagonal Happy entry divided by the sum of the Happy row.

46
Classification Metrics

Multiclass Classification

precision(Happy) = (number of examples correctly predicted as Happy) / (total number of examples predicted as Happy), i.e. the diagonal Happy entry divided by the sum of the Happy column.

47
Classification Metrics

Multiclass Classification

● Can you define accuracy?

48
Classification Metrics

Multiclass Classification

accuracy = (sum of the diagonal entries of the confusion matrix) / (sum of all entries), i.e. the fraction of examples whose predicted class matches the actual class (see the sketch below).
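A minimal numpy sketch of these multiclass definitions (my own illustration; the 3 x 3 matrix values are made up and stand in for the 6 x 6 emotion matrix):

# Per-class precision/recall and overall accuracy from a confusion matrix (illustrative sketch).
import numpy as np

cm = np.array([[50,  3,  2],    # rows = actual class, columns = predicted class
               [ 5, 40,  5],
               [ 2,  4, 44]])

diag = np.diag(cm)
recall_per_class = diag / cm.sum(axis=1)      # diagonal entry / row sum
precision_per_class = diag / cm.sum(axis=0)   # diagonal entry / column sum
accuracy = diag.sum() / cm.sum()              # trace / total

print(recall_per_class, precision_per_class, accuracy)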

49
Classification Metrics

Area under the ROC Curve (AUC)


● The ROC curve (“receiver operating characteristic”; the term comes from radar engineering, where the method was originally developed for operators of military radar receivers starting in 1941) is a commonly used method to assess the performance of binary classification models.

● ROC curves use a combination of:


(1) true positive rate (the proportion of positive examples predicted correctly,
defined exactly as recall) and
(2) false positive rate (the proportion of negative examples predicted
incorrectly)
to build up a summary picture of the classification performance.
51
Classification Metrics

Area under the ROC Curve (AUC)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

52
Classification Metrics

Area under the ROC Curve (AUC)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

Sensitivity / Recall = TPR
Specificity = 1 - FPR = TN / (TN + FP)

                        prediction
                  spam                  not_spam
actual  spam      True Positive (TP)    False Negative (FN)
        not_spam  False Positive (FP)   True Negative (TN)
53
Classification Metrics

Area under the ROC Curve (AUC)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)

54
Classification Metrics

Area under the ROC Curve (AUC)

● We used a threshold for classification in many classification models
● Typically for models that give a probabilistic output score

55
Classification Metrics

Area under the ROC Curve (AUC)

● To compare different classifiers, it can be useful to summarize the performance of each classifier into a single measure.
● One common approach is to calculate the area under the ROC curve, abbreviated AUC.

56
Classification Metrics

Area under the ROC Curve (AUC)

● AUC ranges in value from 0 to 1
● A model whose predictions are 100% wrong has an AUC of 0.0
● One whose predictions are 100% correct has an AUC of 1.0
● AUC is classification-threshold-invariant and suitable for comparing models (see the sketch below)
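A minimal scikit-learn sketch of an ROC curve and its AUC (my own illustration; the labels and scores are made up):

# ROC curve points and AUC from probabilistic scores (illustrative sketch).
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 0, 0, 1, 1, 1, 1]                       # actual classes
y_score = [0.1, 0.35, 0.4, 0.8, 0.45, 0.6, 0.7, 0.9]     # model's probabilistic scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
print(list(zip(fpr, tpr)))                          # points tracing the ROC curve
print(roc_auc_score(y_true, y_score))               # area under that curve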

58
Classification Metrics

Area under the ROC Curve (AUC)

                  prediction
                  spam       not_spam
actual  spam      10          0
        not_spam  10          0

All predictions say “spam”.

(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?

59
Classification Metrics

Area under the ROC Curve (AUC)

                  prediction
                  spam       not_spam
actual  spam       0         10
        not_spam   0         10

All predictions say “not_spam”.

(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?

60
Classification Metrics

Area under the ROC Curve (AUC)

                  prediction
                  spam       not_spam
actual  spam      10          0
        not_spam   0         10

All predictions are perfect.

(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?

61
Classification Metrics

Area under the ROC Curve (AUC)

                  prediction
                  spam       not_spam
actual  spam       5          5
        not_spam   5          5

Some random predictions.

(1) TPR = ?
(2) FPR = ?
(3) Where is the point on the ROC curve?

62
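(Worked answers, not given in the slides: (1) everything predicted “spam”: TPR = 10/10 = 1, FPR = 10/10 = 1, the point (1, 1); (2) everything predicted “not_spam”: TPR = 0, FPR = 0, the point (0, 0); (3) perfect predictions: TPR = 1, FPR = 0, the top-left corner (0, 1); (4) the random 5/5 split: TPR = 0.5, FPR = 0.5, the point (0.5, 0.5) on the diagonal.)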
