KNN Algorithm
KNN Algorithm
Machine Learning
K-Nearest Neighbor (KNN)
Algorithm
(Introduction, Algorithm, Example, Applications)
Contents
1. Introduction to KNN
2. Need of KNN Algorithm
3. KNN Algorithms Steps
4. How to select the value of K?
5. Euclidean Distance Formula
6. Example
7. Applications
8. Advantages & Disadvantages
Introduction to KNN Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning algorithm based on
Supervised Learning technique.
• KNN algorithm assumes the similarity between the new case/data and available cases
and put the new case into the category that is most similar to the available categories.
• KNN algorithm stores all the available data and classifies a new data point based on the
similarity. This means when new data appears then it can be easily classified into a well
suited category by using KNN algorithm.
• KNN algorithm can be used for Regression as well as for Classification but mostly it is
used for the Classification problems.
• KNN algorithm at the training phase just stores the dataset and when it gets new data,
then it classifies that data into a category that is much similar to the new data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately instead it stores the dataset and at the time of classification, it performs an
action on the dataset.
Introduction to KNN Algorithm (Contd.)
Why do we need a KNN Algorithm?
• Suppose there are two categories, i.e., Category A and Category B, and we have
a new data point x1, so this data point will lie in which of these categories.
• To solve this type of problem, we need a KNN algorithm.
• With the help of KNN, we can easily identify the category or class of a particular
dataset.
How does KNN work?
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance
• Step-4: Among these k neighbors, count the number of the data points in each category
• Step-5: Assign the new data points to that category for which the number of the neighbor
is maximum
• Step-6: Our model is ready
How does KNN work? (Contd.)
• Firstly, we will choose the number of neighbors, so we will choose the k=5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the
distance between two points, which we have already studied.
• By calculating the Euclidean distance we got the nearest neighbors, as three nearest neighbors in category
A and two nearest neighbors in category B.
How to select the value of K in the K-NN Algorithm?
• There is no particular way to determine the best value for "K", so we need to try some values to
find the best out of them.
• The most preferred value for K is 5.
• A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers in the
model.
• Large values for K are good, but it may find some difficulties.
Revise: Euclidean Distance Formula
Another Example
Name Age Gender Sport
Ajay 32 M Football
Mark 40 M Neither
Sara 16 F Cricket
Zaira 34 F Cricket
Sachin 55 M Neither
Rahul 40 M Cricket
Pooja 20 F Neither
Smith 15 M Cricket
Laxmi 55 F Football
Michael 15 M Football
Angelina 5 F ??
Male = 0
Female = 1
Another Example (Contd.)
Name Age Gender Distance Sport
Ajay 32 0 27.02 Football
Mark 40 0 35.01 Neither
Sara 16 1 11.00 Cricket
Zaira 34 1 9.00 Cricket
Sachin 55 0 50.01 Neither
Rahul 40 0 35.01 Cricket
Pooja 20 1 15.00 Neither
Smith 15 0 10.00 Cricket
Laxmi 55 1 50.00 Football
Michael 15 0 10.05 Football