
KNN (k-Nearest Neighbor)

Machine Learning
What is K-NN?
• The K-Nearest Neighbor (KNN) algorithm is a popular
machine learning technique used for classification and
regression tasks.
• During the training phase, the KNN algorithm simply stores
the entire training dataset as a reference. When making a
prediction, it calculates the distance between the input
data point and all the training examples, using a chosen
distance metric such as Euclidean distance.
• It requires high memory, since it needs to store all of the
training data.
Continue…
• K-NN is a non-parametric algorithm, which
means it makes no assumptions about the
underlying data distribution.
• It is also called a lazy learner
algorithm because it does not learn from the
training set immediately; instead, it stores the
dataset and performs the actual computation
only at classification time.
How does the K-NN classifier work?
1) Load the data
2) Initialize the value of k
3) To get the predicted class, iterate from 1 to the
total number of training data points:
i. Calculate the distance between the test data and each row
of the training dataset. Here we will use Euclidean distance
as our distance metric, since it is the most popular
choice. Other distance functions that can be used
include Manhattan distance, Minkowski distance, and
cosine distance. For categorical variables,
Hamming distance can be used.
Continue…
ii. Sort the calculated distances in ascending order
based on distance values
iii. Get the top k rows from the sorted array
iv. Get the most frequent class of these rows
v. Return the predicted class
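The steps above can be sketched as a small from-scratch classifier (the function and variable names here are my own, not from the slides):

```python
import math
from collections import Counter

def knn_predict(train, test_point, k):
    """Classify test_point by majority vote among its k nearest
    training rows. train is a list of ((feature, ...), label) pairs."""
    # i. Calculate the Euclidean distance to every training row
    distances = [(math.dist(features, test_point), label)
                 for features, label in train]
    # ii. Sort the calculated distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # iii. Get the top k rows from the sorted list
    top_k = distances[:k]
    # iv./v. Get the most frequent class of these rows and return it
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]

# The brightness/saturation data set used later in these slides
train = [((40, 20), "Red"), ((50, 50), "Blue"), ((60, 90), "Blue"),
         ((10, 25), "Red"), ((70, 70), "Blue"), ((60, 10), "Red"),
         ((25, 80), "Blue")]
print(knn_predict(train, (20, 35), k=5))  # Red
```

Note that all the work happens inside `knn_predict`, which is exactly what "lazy learner" means: there is no separate training step beyond storing `train`.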
Example

(Figure: scatter plot of a data set consisting of two classes, red and blue.)
Continue…

(Figure: a new data entry, shown in green, has been introduced to the data set.)
Continue…
• Let's assume the value of K is 3.
Continue…
• Out of the 3 nearest neighbors, the majority
class is red, so the new entry will be assigned
to that class.
EXAMPLE
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
Continue…
• Let's assume the value of K is 5.
• Here's the new data entry:
BRIGHTNESS SATURATION CLASS
20 35 ?

• Euclidean distance formula:

d = √((X₂ - X₁)² + (Y₂ - Y₁)²)

• Where:
– X₂ = the new entry's brightness (20)
– X₁ = the existing entry's brightness
– Y₂ = the new entry's saturation (35)
– Y₁ = the existing entry's saturation
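As a quick sketch, the formula translates directly into code (the function name `euclidean` is mine):

```python
import math

def euclidean(x1, y1, x2, y2):
    # d = sqrt((X2 - X1)^2 + (Y2 - Y1)^2)
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# Distance from the new entry (20, 35) to the first row (40, 20)
print(euclidean(40, 20, 20, 35))  # 25.0
```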
Continue…
• For the first row:
d1 = √((20 - 40)² + (35 - 20)²)
= √(400 + 225)
= √625
= 25
• For the second row:
d2 = √((20 - 50)² + (35 - 50)²)
= √(900 + 225)
= √1125
= 33.54
Continue…
• For the third row:
d3 = √((20 - 60)² + (35 - 90)²)
= √(1600 + 3025)
= √4625
= 68.01
• Calculate the distances from the other data
points as well, using the same Euclidean distance
formula.
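The remaining distances can be checked with a short loop over the full data set (a sketch using my own variable names):

```python
import math

# Brightness, saturation, class for each existing row
rows = [(40, 20, "Red"), (50, 50, "Blue"), (60, 90, "Blue"),
        (10, 25, "Red"), (70, 70, "Blue"), (60, 10, "Red"),
        (25, 80, "Blue")]
new = (20, 35)  # the new entry

for b, s, cls in rows:
    # Euclidean distance from the new entry to this row
    d = math.sqrt((new[0] - b) ** 2 + (new[1] - s) ** 2)
    print(f"{b:>3} {s:>3} {cls:<5} {d:6.2f}")
```

Running this yields 25.00, 33.54, 68.01, 14.14, 61.03, 47.17, and 45.28, which are the values tabulated on the next slide.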
Continue…
BRIGHTNESS SATURATION CLASS DISTANCE
40 20 Red 25
50 50 Blue 33.54
60 90 Blue 68.01
10 25 Red 14.14
70 70 Blue 61.03
60 10 Red 47.17
25 80 Blue 45.28
Continue…

Let's rearrange the distances in ascending order:

BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45.28
60 10 Red 47.17
70 70 Blue 61.03
60 90 Blue 68.01
Continue…
• Since we chose 5 as the value of K, we'll only
consider the first five rows. That is:
BRIGHTNESS SATURATION CLASS DISTANCE
10 25 Red 14.14
40 20 Red 25
50 50 Blue 33.54
25 80 Blue 45.28
60 10 Red 47.17
Continue…
• As you can see above, the majority class
within the 5 nearest neighbors to the
new entry is Red. Therefore, we'll
classify the new entry as Red.
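Assuming scikit-learn is available, the same result can be reproduced with its off-the-shelf `KNeighborsClassifier` (this library call is my addition, not part of the slides):

```python
from sklearn.neighbors import KNeighborsClassifier

# Brightness/saturation features and class labels from the example
X = [[40, 20], [50, 50], [60, 90], [10, 25],
     [70, 70], [60, 10], [25, 80]]
y = ["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"]

model = KNeighborsClassifier(n_neighbors=5)  # K = 5; Euclidean by default
model.fit(X, y)                              # "fit" only stores the data
print(model.predict([[20, 35]]))             # classify the new entry
```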
Advantages of the K-NN Algorithm
• It is simple to implement.
• No explicit training phase is required before classification.
Disadvantages of the K-NN Algorithm
• Prediction can be computationally expensive on large
data sets, since every query is compared against all
training points.
• A lot of memory is required, because the entire training
set must be stored.
• Choosing the right value of K can be tricky.
