Module 8 - PDF
kNN: Classification
Effect of Outliers:
● Consider k=1.
● Sensitive to outliers: the decision boundary changes drastically with outliers.
● Solution?
○ Increase k (see the sketch below)
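A minimal sketch of this effect, assuming scikit-learn is available (the 2-D clusters, the mislabeled outlier, and the query point are all made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))   # class 0 cluster
X1 = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(20, 2))   # class 1 cluster
outlier = np.array([[0.9, 0.9]])                           # sits in class 0 territory...
X = np.vstack([X0, X1, outlier])
y = np.array([0] * 20 + [1] * 20 + [1])                    # ...but carries label 1

query = np.array([[0.95, 0.95]])                           # a point right next to the outlier
for k in (1, 5, 15):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.predict(query))
# With k=1 the prediction follows the single outlier; with larger k the
# surrounding class-0 points outvote it and the boundary smooths out.
```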
kNN: Classification
Effect of k: (figure slides showing decision boundaries for different values of k)
kNN: Regression
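kNN regression replaces the majority vote with an average of the k nearest neighbours' target values. A minimal sketch, assuming scikit-learn and toy 1-D data (both illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Noisy samples of y = sin(x) on [0, 5] (toy data for illustration).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# The prediction at a query point is the mean target of its k nearest neighbours.
reg = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(reg.predict([[2.5]]))   # close to sin(2.5) ≈ 0.60
```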
kNN: Challenges
Computationally expensive:
● Needs to store all training examples.
● Needs to compute distances to all training examples at prediction time.
kNN: Computational Complexity
Brute force method
● Training time complexity: O(1)
● Training space complexity: O(1)
● Prediction time complexity: O(k * n * d)
● Prediction space complexity: O(1) (see the brute-force sketch below)
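A minimal NumPy sketch of the brute-force predictor described above (illustrative code, not the slides' own): training just stores the data, and every query computes distances to all n training examples in d dimensions.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Brute-force kNN classification for a single query point."""
    # O(n * d): squared Euclidean distance to every training example.
    dists = np.sum((X_train - x_query) ** 2, axis=1)
    # Full sort costs O(n log n); np.argpartition would give O(n) selection.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# "Training" is just storing the data: O(1) extra work.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.0, 0.9]), k=3))  # -> 1
```

The slides' O(k * n * d) prediction bound corresponds to selecting the k minima by repeated scans; with a full sort, as in this sketch, the cost per query is O(n * d + n log n).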
kNN: k-d Tree
● A k-dimensional tree (or k-d tree) is a tree data structure used to represent
points in a k-dimensional space.
● Used for applications such as nearest-neighbour search (in k-dimensional space),
efficient storage of spatial data, and range search (see the sketch below).
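A minimal sketch of these uses, assuming SciPy is available (its cKDTree is one common k-d tree implementation; the random points are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 3))          # 1000 points in 3-D space

tree = cKDTree(points)                  # build the k-d tree once

# Nearest-neighbour query: distances and indices of the 5 closest points.
dist, idx = tree.query([0.5, 0.5, 0.5], k=5)

# Range search: indices of all points within radius 0.1 of the query.
inside = tree.query_ball_point([0.5, 0.5, 0.5], r=0.1)
print(idx, len(inside))
```

scikit-learn's KDTree exposes a similar interface, and KNeighborsClassifier can use it internally via algorithm='kd_tree'.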
kNN: k-d Tree
Example: (figure slides)
Classification Metrics
Classification Metrics
● Confusion matrix
● Accuracy
● Precision/Recall/F1-score
● Area under the ROC curve (see the combined sketch below)
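All four can be computed with scikit-learn from the true and predicted labels; a minimal sketch with made-up labels and scores:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score,
                             roc_auc_score)

y_true  = [1, 1, 1, 1, 0, 0, 0, 1]                    # actual labels (toy example)
y_pred  = [1, 1, 1, 0, 0, 0, 1, 1]                    # hard predictions
y_score = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1, 0.6, 0.95]   # predicted probabilities

print(confusion_matrix(y_true, y_pred))    # rows = actual, columns = predicted
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))      # AUC needs scores, not hard labels
```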
Classification Metrics
Confusion Matrix
E.g., for binary classification, a model predicts two classes, “spam” and
“not_spam”, for a given email.
                   prediction
                   spam                  not_spam
actual  spam       True Positive (TP)    False Negative (FN)
        not_spam   False Positive (FP)   True Negative (TN)
Classification Metrics
Confusion Matrix
Exercise 2:
                   prediction
                   1       0
actual  1          TP=?    FN=?
        0          FP=?    TN=?
Classification Metrics
Confusion Matrix
Exercise 2 (solution):
                   prediction
                   1       0
actual  1          TP=6    FN=2
        0          FP=1    TN=3
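These counts can be reproduced with scikit-learn (the label vectors below are hypothetical, chosen only to match TP=6, FN=2, FP=1, TN=3). Note that confusion_matrix orders rows and columns by sorted label, so for labels {0, 1} the layout is [[TN, FP], [FN, TP]].

```python
from sklearn.metrics import confusion_matrix

# 8 actual positives (6 predicted 1, 2 predicted 0) and 4 actual negatives
# (1 predicted 1, 3 predicted 0) -- hypothetical labels matching the exercise.
y_true = [1]*8 + [0]*4
y_pred = [1]*6 + [0]*2 + [1]*1 + [0]*3

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)   # 6 2 1 3
```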
Classification Metrics
Confusion Matrix
Multiclass Classification: e.g., emotion classification
                   prediction
                   Happy   Sad   Angry   Surprise   Disgust   Neutral
actual  Happy
        Sad
        Angry
        Surprise
        Disgust
        Neutral
Classification Metrics
Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples:

Acc = (TP + TN) / (TP + TN + FP + FN)

                   prediction
                   spam                  not_spam
actual  spam       True Positive (TP)    False Negative (FN)
        not_spam   False Positive (FP)   True Negative (TN)
Classification Metrics
Accuracy
Accuracy is given by the number of correctly classified examples divided by the
total number of classified examples.
                   prediction
                   1       0
actual  1          TP=6    FN=2
        0          FP=1    TN=3
Accuracy = ?
Classification Metrics
Precision
Precision is the ratio of correct positive predictions to the overall number of
positive predictions:

Precision = TP / (TP + FP)

                   prediction
                   spam                  not_spam
actual  spam       True Positive (TP)    False Negative (FN)
        not_spam   False Positive (FP)   True Negative (TN)
Classification Metrics
Recall
Recall is the ratio of correct positive predictions to the overall number of positive
examples:

Recall = TP / (TP + FN)

                   prediction
                   spam                  not_spam
actual  spam       True Positive (TP)    False Negative (FN)
        not_spam   False Positive (FP)   True Negative (TN)
Classification Metrics
F1-Score
The F1-score is the harmonic mean of precision and recall:

F1 = 2 * Precision * Recall / (Precision + Recall)
Classification Metrics
Visualizing Precision/Recall
(Figure: precision/recall illustration, © Wikipedia)
Classification Metrics
Precision/Recall/F1-score
                   prediction
                   1       0
actual  1          TP=6    FN=2
        0          FP=1    TN=3
Precision = ?
Recall = ?
F1-score = ?
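A quick check of the exercise values, computed directly from the counts above:

```python
tp, fn, fp, tn = 6, 2, 1, 3

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # (6 + 3) / 12 = 0.75
precision = tp / (tp + fp)                                  # 6 / 7 ≈ 0.857
recall    = tp / (tp + fn)                                  # 6 / 8 = 0.75
f1 = 2 * precision * recall / (precision + recall)          # = 0.80
print(accuracy, precision, recall, f1)
```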
Classification Metrics
Recall matters in medical cases: it does not matter much whether we raise a false
alarm (a false positive), but missing an actual cancer case (a false negative) must
be avoided.
                   prediction
                   cancer     no_cancer
actual  cancer     Perfect    X
Classification Metrics
Precision matters in spam filtering: it is more important that we do not lose any
important email to the spam folder (a false positive) than that we catch every spam
(a missed spam is merely OK).
                   prediction
                   spam       no_spam
actual  spam       Perfect    OK
Classification Metrics
Multiclass Classification
● Can you define recall(Happy)?
● Can you define precision(Happy)?
                   prediction
                   Happy   Sad   Angry   Surprise   Disgust   Neutral
actual  Happy
        Sad
        Angry
        Surprise
        Disgust
        Neutral
Classification Metrics
Multiclass Classification
recall(Happy) = (Happy examples correctly predicted as Happy) / (all actual Happy examples),
i.e., the diagonal Happy cell divided by the sum of the Happy row of the confusion matrix.
Classification Metrics
Multiclass Classification
precision(Happy) = (Happy examples correctly predicted as Happy) / (all examples predicted as Happy),
i.e., the diagonal Happy cell divided by the sum of the Happy column of the confusion matrix.
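A minimal NumPy sketch of both definitions (the 6×6 matrix below is made up for illustration): recall of a class divides its diagonal entry by the row sum, precision divides it by the column sum.

```python
import numpy as np

classes = ["Happy", "Sad", "Angry", "Surprise", "Disgust", "Neutral"]
# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
C = np.array([
    [50,  3,  1,  4,  0,  2],
    [ 2, 40,  5,  0,  1,  2],
    [ 1,  6, 35,  1,  3,  4],
    [ 5,  0,  1, 44,  0,  0],
    [ 0,  2,  4,  0, 30,  4],
    [ 3,  1,  2,  1,  2, 41],
])

i = classes.index("Happy")
recall_happy    = C[i, i] / C[i, :].sum()   # correct Happy / all actual Happy
precision_happy = C[i, i] / C[:, i].sum()   # correct Happy / all predicted Happy
print(recall_happy, precision_happy)
```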
Classification Metrics
TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
Classification Metrics
                   prediction
                   spam                  not_spam
actual  spam       True Positive (TP)    False Negative (FN)
        not_spam   False Positive (FP)   True Negative (TN)

TPR = TP / (TP + FN)
FPR = FP / (FP + TN)
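Sweeping the decision threshold over the classifier's scores gives one (FPR, TPR) point per threshold; plotting them yields the ROC curve. A minimal sketch with hypothetical scores, assuming scikit-learn:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 1, 1, 1, 0, 0, 0, 0]                      # actual labels (toy)
y_score = [0.95, 0.8, 0.6, 0.4, 0.55, 0.3, 0.2, 0.1]    # predicted P(spam)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))                 # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_score))      # area under the ROC curve
```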
Classification Metrics
Four example classifiers on the same 20 emails (10 spam, 10 not_spam):

1) Predicts everything as spam:
                   prediction
                   spam    not_spam
actual  spam       10      0
        not_spam   10      0

2) Predicts everything as not_spam:
                   prediction
                   spam    not_spam
actual  spam       0       10
        not_spam   0       10

3) Perfect classifier:
                   prediction
                   spam    not_spam
actual  spam       10      0
        not_spam   0       10

4) Chance-level classifier:
                   prediction
                   spam    not_spam
actual  spam       5       5
        not_spam   5       5
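Applying TPR = TP / (TP + FN) and FPR = FP / (FP + TN) to each matrix shows where these classifiers land in ROC space:

```python
# (TP, FN, FP, TN) read off each confusion matrix above.
cases = {
    "always predicts spam":  (10, 0, 10, 0),   # TPR = 1.0, FPR = 1.0
    "never predicts spam":   (0, 10, 0, 10),   # TPR = 0.0, FPR = 0.0
    "perfect classifier":    (10, 0, 0, 10),   # TPR = 1.0, FPR = 0.0
    "chance-level":          (5, 5, 5, 5),     # TPR = 0.5, FPR = 0.5
}
for name, (tp, fn, fp, tn) in cases.items():
    print(name, tp / (tp + fn), fp / (fp + tn))
```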