Supervised Learning: k-NN Classifier
❑ k-Nearest-Neighbour (k-NN) classifier
➢ To classify unseen data, it searches the training dataset for the k most
similar instances (the nearest neighbours) and assigns the majority class among them.
Figure: Nearest-neighbour classification using the 11-NN rule; the point denoted by a
“star” is assigned to the class holding the majority among its 11 nearest neighbours.
k-NN Classifier Algorithm
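The algorithm can be sketched in a few lines. A minimal Python sketch (assuming Euclidean distance and simple majority voting; the function names are illustrative):

```python
from collections import Counter
import math

def knn_classify(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Compute the Euclidean distance from the query to every training instance.
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Sort by distance and keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    k_nearest_labels = [label for _, label in distances[:k]]
    # Return the most common class among those k neighbours.
    return Counter(k_nearest_labels).most_common(1)[0][0]
```

Note that k-NN is a "lazy" learner: there is no training step beyond storing the data, and all work happens at prediction time.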
Distance Measures
Euclidean Distance
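The Euclidean distance between two points is d(x, y) = √Σ(xᵢ − yᵢ)², i.e. the square root of the summed squared coordinate differences; the Manhattan distance instead sums the absolute differences. A small sketch of both:

```python
import math

def euclidean(x, y):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    # Sum of the absolute coordinate differences.
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Example: distance between (1, 2) and (4, 6).
print(euclidean((1, 2), (4, 6)))  # 5.0
print(manhattan((1, 2), (4, 6)))  # 7
```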
Classification Types
❑ Binary Classification:
Binary classification is the task of classifying the
elements of a given set into two groups on the basis of a
classification rule.
❑ Multi-Class Classification:
A classification task with more than two classes.
Example (Multi-class)
Table 1: Data for Height Classification (M = Male, F = Female).

Name        Gender   Height   Output
Kristina    F        1.60 m   Short
Jim         M        2.00 m   Tall
Maggie      F        1.90 m   Medium
Martha      F        1.88 m   Medium
Stephanie   F        1.70 m   Short
Bob         M        1.85 m   Medium
Kathy       F        1.60 m   Short
Dave        M        1.70 m   Short
Worth       M        2.20 m   Tall
Steven      M        2.10 m   Tall
Debbie      F        1.80 m   Medium
Todd        M        1.95 m   Medium
Kim         F        1.90 m   Medium
Amy         F        1.80 m   Medium
Wynette     F        1.75 m   Medium
➢ Using the sample data from Table 1 and the Output classification as the
training set output value, we classify the instance (Pat, F, 1.6).
➢ First, convert the discrete attribute (Gender) to numeric, e.g. let M = 0, F = 1.
➢ Then compute the distances. Here, the Euclidean and Manhattan distance
measures yield the same result: the distance reduces to the absolute value
of the difference between the values.
➢ Of the five nearest items (k = 5), three are classified as Short and two as
Medium. Thus, the k-NN classifier labels Pat as Short.
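This vote can be reproduced in code. A minimal sketch (assuming gender is encoded M = 0, F = 1 as above, and Euclidean distance over the (gender, height) pairs):

```python
import math
from collections import Counter

# Training data from Table 1, with gender encoded as M = 0, F = 1.
train = [
    ("Kristina", 1, 1.60, "Short"), ("Jim", 0, 2.00, "Tall"),
    ("Maggie", 1, 1.90, "Medium"), ("Martha", 1, 1.88, "Medium"),
    ("Stephanie", 1, 1.70, "Short"), ("Bob", 0, 1.85, "Medium"),
    ("Kathy", 1, 1.60, "Short"), ("Dave", 0, 1.70, "Short"),
    ("Worth", 0, 2.20, "Tall"), ("Steven", 0, 2.10, "Tall"),
    ("Debbie", 1, 1.80, "Medium"), ("Todd", 0, 1.95, "Medium"),
    ("Kim", 1, 1.90, "Medium"), ("Amy", 1, 1.80, "Medium"),
    ("Wynette", 1, 1.75, "Medium"),
]

pat = (1, 1.60)  # the query instance (F, 1.6 m)
k = 5

# Distance from Pat to every training instance, sorted nearest-first.
neighbours = sorted(
    (math.dist((g, h), pat), label) for _, g, h, label in train
)
votes = Counter(label for _, label in neighbours[:k])
print(votes)                       # class counts among the 5 nearest
print(votes.most_common(1)[0][0])  # -> Short
```

Running this yields three Short and two Medium votes among the five nearest neighbours, so Pat is classified as Short, matching the result above.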
What should be the value of “k”?
❑ k should be large enough that the error rate is minimized
➢ A k that is too small leads to noisy decision boundaries
❑ Setting k to the square root of the number of training samples may lead to
better results (Empirically Found)
Number of training samples = 20
k = √20 ≈ 4.47 ≈ 4
How to determine a good value for k?
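A common approach is to evaluate several candidate values of k and keep the one with the best cross-validation accuracy. A minimal sketch using leave-one-out cross-validation (the helper names are illustrative, not from the slides):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def best_k(train, candidates):
    """Pick k by leave-one-out cross-validation accuracy."""
    def loo_accuracy(k):
        # Predict each instance from all the others and count correct labels.
        hits = sum(
            knn_predict(train[:i] + train[i + 1:], x, k) == y
            for i, (x, y) in enumerate(train)
        )
        return hits / len(train)
    return max(candidates, key=loo_accuracy)
```

In practice the candidates are usually odd values (1, 3, 5, ...) so that binary votes cannot tie.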