
k-Nearest Neighbour Classifier (k-NN)
❑ k-Nearest-Neighbour (k-NN) classifier

❑ k-NN classifier - Examples


What is the k-Nearest Neighbour (k-NN) Algorithm?
➢ k-Nearest Neighbour (k-NN) is a supervised learning algorithm that classifies a new data point into a target class based on the features of its neighbouring data points.
Principle of the k-NN Classifier
➢ It is a lazy learner: it does not learn a discriminative function from the training data but instead memorizes the training dataset.

➢ It performs classification by taking a majority vote among the k closest training points to the unlabelled data point.

➢ Given an unseen data point, it searches the training dataset for the k most similar instances.

➢ Euclidean distance or Manhattan distance is used as the metric for calculating the distance between points.

➢ A practical question is how to choose k, e.g. when should k = 3 or k = 5 be used?
k-NN Classifier (contd.)

Figure: Nearest-neighbour classification using the 11-NN rule; the point denoted by a star is assigned to the class holding the majority among its 11 nearest neighbours. (The figure itself is not reproduced here.)
k-NN Classifier Algorithm
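The step-by-step listing for this slide was an image in the original deck. The sketch below is a minimal, from-scratch illustration of the procedure described above (compute distances, keep the k closest training points, take a majority vote); the function and variable names are illustrative, not taken from the slides.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Compute the distance from the query to every training point.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # 2. Keep the k closest training points.
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # 3. Majority vote among the labels of those k neighbours.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]
```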
Distance Measures
❑ Euclidean Distance
❑ Manhattan Distance
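The distance formulas on these slides were images in the original deck; the standard definitions of the two metrics named in the text, for n-dimensional points x and y, are:

```latex
% Euclidean distance
d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

% Manhattan (city-block) distance
d_{\mathrm{Manhattan}}(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert
```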
Classification Types
❑ Binary Classification:
Binary classification is the task of classifying the
elements of a given set into two groups on the basis of a
classification rule.
❑ Multi-Class Classification:
A classification task with more than two classes.
Example (Multi-class)
Table 1: Data for Height Classification (M = Male, F = Female)

Name       Gender   Height    Output
Kristina   F        1.60 m    Short
Jim        M        2.00 m    Tall
Maggie     F        1.90 m    Medium
Martha     F        1.88 m    Medium
Stephanie  F        1.70 m    Short
Bob        M        1.85 m    Medium
Kathy      F        1.60 m    Short
Dave       M        1.70 m    Short
Worth      M        2.20 m    Tall
Steven     M        2.10 m    Tall
Debbie     F        1.80 m    Medium
Todd       M        1.95 m    Medium
Kim        F        1.90 m    Medium
Amy        F        1.80 m    Medium
Wynette    F        1.75 m    Medium
➢ Using the sample data from Table 1, with the Output column as the training-set class value, we classify the new instance (Pat, F, 1.6 m).

➢ First convert the discrete attribute to numeric, e.g. let M = 0, F = 1.

➢ Then calculate the distances. Here both the Euclidean and Manhattan distance measures yield the same result; that is, the distance is simply the absolute value of the difference between the values.

➢ Suppose that k = 5 is given. The k nearest neighbours of the input instance are then (Kristina, F, 1.6), (Kathy, F, 1.6), (Stephanie, F, 1.7), (Wynette, F, 1.75) and (Debbie, F, 1.8).

➢ Of these five items, three are classified as Short and two as Medium. Thus, k-NN classifies Pat as Short.
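One way to verify this worked example is with scikit-learn's KNeighborsClassifier (scikit-learn is not part of the original slides; it is assumed here purely for illustration). The data are taken from Table 1, with gender encoded as M = 0, F = 1 as described above.

```python
from sklearn.neighbors import KNeighborsClassifier

# Table 1, encoded as [gender, height]: M = 0, F = 1; height in metres.
X = [[1, 1.60], [0, 2.00], [1, 1.90], [1, 1.88], [1, 1.70],
     [0, 1.85], [1, 1.60], [0, 1.70], [0, 2.20], [0, 2.10],
     [1, 1.80], [0, 1.95], [1, 1.90], [1, 1.80], [1, 1.75]]
y = ["Short", "Tall", "Medium", "Medium", "Short",
     "Medium", "Short", "Short", "Tall", "Tall",
     "Medium", "Medium", "Medium", "Medium", "Medium"]

# Classify Pat (F, 1.6 m) using the 5 nearest neighbours.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X, y)
print(clf.predict([[1, 1.60]]))  # -> ['Short']
```

Because the five nearest neighbours contain three Short and two Medium labels, the prediction agrees with the manual calculation above.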
What should be the value of "k"?
❑ k should be large enough that the error rate is minimized; too small a k leads to noisy decision boundaries.

❑ k should be small enough that only nearby samples are included; too large a k leads to over-smoothed boundaries.

❑ Empirically, setting k to the square root of the number of training samples often gives good results.
Example: with N = 20 training samples, k = √20 ≈ 4.5, rounded to 4 (or to 5 to keep k odd).
How to determine a good value for k?

❑ Starting with k = 1, use a test set to estimate the error rate of the classifier.

❑ Select the value of k that gives the minimum error rate (a sketch of this procedure is given after this list).

❑ Choose an odd value of k for a two-class problem.

❑ k should not be a multiple of the number of classes.
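A hedged sketch of this selection procedure follows. It assumes scikit-learn is available and, in place of a single held-out test set, uses 5-fold cross-validation to estimate the error rate for each candidate k; the dataset (Iris) and the range of k values tried are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Estimate the error rate for each candidate k and keep the best one.
best_k, best_error = None, 1.0
for k in range(1, 21):
    accuracy = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    error = 1.0 - accuracy
    if error < best_error:
        best_k, best_error = k, error

print(f"best k = {best_k}, estimated error rate = {best_error:.3f}")
```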


Disadvantages of the k-NN Classifier
➢ The main disadvantage of the k-NN classifier is that it is a lazy learner: it does not learn anything from the training data and simply uses the training data itself for classification.

➢ A serious drawback of the k-NN technique is the complexity of searching for the nearest neighbour(s) among the N available training samples, roughly O(kN) distance comparisons per query with a brute-force search.

➢ Although the k-NN rule achieves good results on large datasets thanks to its asymptotic error performance, the performance of the classifier may degrade dramatically when the number N of training instances is relatively small.
Applications of the k-NN Classifier
➢ General classification tasks
➢ Imputing missing values
➢ Pattern recognition
➢ Gene-expression analysis
➢ Protein–protein interaction prediction
➢ Predicting the 3D structure of proteins
➢ Measuring document similarity
*** THANK YOU ***
