KNN (K Nearest Neighbor)
Machine Learning
What is K-NN?
• The K-Nearest Neighbor (KNN) algorithm is a popular
machine learning technique used for classification and
regression tasks.
• During the training phase, the KNN algorithm stores the
entire training dataset as a reference. When making
predictions, it calculates the distance between the input
data point and all the training examples, using a chosen
distance metric such as Euclidean distance.
• Requires a lot of memory, since the entire training
dataset must be stored.
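The prediction-time work described above can be sketched in plain Python. This is a minimal illustration, assuming the training data is kept in memory as a list of (features, label) pairs. The names `training_data` and `distances_to_all` are hypothetical, and the three rows are borrowed from the brightness/saturation table in the example below. Every query visits every stored row, which is why the whole dataset has to stay in memory.

```python
import math

# A few (features, label) rows from the brightness/saturation example.
# KNN keeps every training row in memory; there is no smaller "model"
# that could be stored instead.
training_data = [
    ((40, 20), "Red"),
    ((50, 50), "Blue"),
    ((10, 25), "Red"),
]

def distances_to_all(query, data):
    """Return (distance, label) for every stored training row.

    Uses Euclidean distance via math.dist (Python 3.8+). The cost grows
    with the number of stored rows, because each query visits all of them.
    """
    return [(math.dist(query, features), label) for features, label in data]

result = distances_to_all((20, 35), training_data)
print(result)
```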
Continued…
• K-NN is a non-parametric algorithm, which
means it makes no assumptions about the
underlying data distribution.
• It is also called a lazy learner
algorithm because it does not learn from the
training set immediately; instead, it stores the
dataset and defers all computation on it until
classification time.
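To make the "lazy" behaviour concrete, here is a minimal sketch, assuming a hypothetical `LazyKNN` class whose `fit`/`predict_one` method names are chosen for illustration only. `fit` merely stores the data; all distance computation is deferred to prediction time.

```python
import math
from collections import Counter

class LazyKNN:
    """Hypothetical minimal KNN classifier illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k
        self.X = []
        self.y = []

    def fit(self, X, y):
        # "Training" only memorises the data; nothing is estimated here.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, query):
        # All the work happens at prediction time: measure the distance to
        # every stored row, keep the k closest, and take a majority vote.
        nearest = sorted(
            zip(self.X, self.y), key=lambda row: math.dist(query, row[0])
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

model = LazyKNN(k=3).fit([(0, 0), (1, 1), (5, 5)], ["a", "a", "b"])
print(model.predict_one((0.5, 0.5)))
```

Contrast this with an eager learner such as linear regression, where `fit` does the expensive work once and prediction is cheap.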
How does the K-NN classifier work?
1) Load the data
2) Initialize the value of k
3) Iterate over every row of the training dataset and
calculate the distance between the test data point and
that row. Here we use Euclidean distance as the
distance metric, since it is the most popular choice.
Other distance metrics that can be used include
Manhattan distance, Minkowski distance, and cosine
distance. If there are categorical variables, Hamming
distance can be used.
Continued…
4) Sort the calculated distances in ascending order.
5) Take the top k rows from the sorted array.
6) Find the most frequent class among these rows.
7) Return that class as the predicted class.
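The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names (`knn_classify`, `euclidean`, `manhattan`, `minkowski`, `cosine_distance`, `hamming`) are hypothetical, and the distance functions cover the metrics listed in the distance step. Ties in the vote are broken in favour of the label encountered first among the sorted neighbours (the closer one), which is only one of several reasonable choices.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=2):
    # p=1 gives Manhattan distance, p=2 gives Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def cosine_distance(a, b):
    # Undefined (division by zero) if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def hamming(a, b):
    # For categorical features: count the positions that differ.
    return sum(x != y for x, y in zip(a, b))

def knn_classify(train_X, train_y, query, k, distance=euclidean):
    # Distance from the query to every training row.
    distances = [(distance(query, row), label) for row, label in zip(train_X, train_y)]
    # Sort ascending by distance.
    distances.sort(key=lambda pair: pair[0])
    # Keep the k closest rows.
    top_k = distances[:k]
    # Majority vote; Counter.most_common keeps first-encountered order on ties.
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["a", "a", "b", "b"]
print(knn_classify(train_X, train_y, (0.2, 0.3), k=3))
print(knn_classify(train_X, train_y, (5.5, 5), k=3, distance=manhattan))
```

Passing the metric as a function keeps the classifier itself unchanged when switching from Euclidean to, say, Hamming distance for categorical data.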
Example
A new data entry, shown in green, has been introduced to the
dataset.
Continued…
• Let's assume the value of K is 3.
Continued…
• Out of the 3 nearest neighbors, the majority
class is red, so the new entry is assigned
to the red class.
Example
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
Continued…
• Let's assume the value of K is 5.
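• Here's the new data entry:
BRIGHTNESS SATURATION CLASS
20 35 ?
• Euclidean distance from the new entry to each row, sorted in ascending order:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25.00
50 50 Blue 33.54
25 80 Blue 45.28
60 10 Red 47.17
70 70 Blue 61.03
60 90 Blue 68.01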
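The hand calculation for this example can be checked with a short script. This is a minimal sketch in plain Python, using the seven rows of the brightness/saturation table and `math.dist` for Euclidean distance; the variable names are illustrative.

```python
import math
from collections import Counter

# Training rows from the BRIGHTNESS / SATURATION table.
data = [
    (40, 20, "Red"),
    (50, 50, "Blue"),
    (60, 90, "Blue"),
    (10, 25, "Red"),
    (70, 70, "Blue"),
    (60, 10, "Red"),
    (25, 80, "Blue"),
]
new_entry = (20, 35)
k = 5

# Euclidean distance from the new entry to every row, sorted ascending.
ranked = sorted(
    (math.dist(new_entry, (b, s)), b, s, label) for b, s, label in data
)
for dist, b, s, label in ranked:
    print(f"{b:>3} {s:>3} {label:<5} {dist:6.2f}")

# Majority vote over the k nearest rows.
votes = Counter(label for _, _, _, label in ranked[:k])
print(votes.most_common(1)[0][0])
```

With K = 5, the five nearest rows are Red, Red, Blue, Blue, Red, so the vote is 3 Red to 2 Blue and the new entry is classified as Red.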