
K-Nearest Neighbors (K-NN)

By Mohamed Gamal
Agenda
▪ What is K-NN?
▪ How does K-NN work?
▪ K-NN algorithm and structure
▪ Advantages of K-NN
▪ Disadvantages of K-NN
▪ Example

What is KNN?
▪ K-Nearest Neighbors (KNN) is a simple yet powerful supervised, non-linear machine learning algorithm used for classification and regression tasks.

▪ It is a non-parametric algorithm, meaning it makes no assumptions about the underlying data distribution and does not learn a model during the training phase.

▪ Instead, it memorizes the entire training dataset and makes predictions based on the similarity of new data points to the known data points.
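For orientation, here is a minimal usage sketch with scikit-learn's KNeighborsClassifier; the toy points and labels below are made up for illustration and are not from these slides.

```python
# Minimal K-NN usage sketch with scikit-learn (toy data, for illustration only).
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: two features per point, two classes.
X_train = [[40, 20], [50, 50], [60, 90], [10, 25]]
y_train = ["Red", "Blue", "Blue", "Red"]

# "Training" only stores the data; no model parameters are learned.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Prediction compares the new point against all stored points.
print(knn.predict([[20, 35]]))  # ['Red']
```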

How does KNN work? (Algorithm)
▪ Step-1: Select the number K of neighbors.

▪ Step-2: Calculate the distance between the new data point and every point in the training set.

▪ Step-3: Take the K nearest neighbors as per the calculated distances.

▪ Step-4: Among these K neighbors, count the number of data points in each category.

▪ Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
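The five steps above can be sketched from scratch in Python. This is a minimal sketch using NumPy and Euclidean distance; the function name knn_predict is ours, not from the slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 2: distance from x_new to every training point (Euclidean).
    distances = np.linalg.norm(np.asarray(X_train) - np.asarray(x_new), axis=1)
    # Step 3: indices of the k training points with the smallest distances.
    nearest = np.argsort(distances)[:k]
    # Steps 4-5: count the labels among those k neighbors, return the majority.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```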
(Figure: a worked illustration of the steps. 1) Set K = 5. 2) Calculate the distances. 3) Choose the K = 5 neighbors with the minimum distances. 4) Among the selected K nearest neighbors, count the number of points in each class/category (Blue: 3, Orange: 2). 5) Assign the new data point to the class/category with the majority of votes.)
Ways to calculate the distance in KNN

Minkowski Distance
(Named after the German mathematician, Hermann Minkowski)

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

Manhattan Distance ($p = 1$)
(Also called taxicab distance or city-block distance)

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Euclidean Distance ($p = 2$)
(The shortest distance between any two points)

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
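The three metrics above can be written as short Python functions. A minimal sketch assuming NumPy; note that Manhattan and Euclidean are just Minkowski with p = 1 and p = 2.

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p)

def manhattan(x, y):
    """Manhattan (taxicab) distance: Minkowski with p = 1."""
    return minkowski(x, y, p=1)

def euclidean(x, y):
    """Euclidean distance: Minkowski with p = 2."""
    return minkowski(x, y, p=2)

print(euclidean([20, 35], [40, 20]))  # 25.0
```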


How to select the value of K?
• There's no particular way to determine the best value for K, so you need to try several values and pick the one that performs best, as sketched below.

• The most commonly preferred value for K is 5.

• A very low value for K, such as K = 1 or K = 2, can be noisy and make the model sensitive to outliers.

• Large values for K smooth out noise, but they can blur class boundaries and make prediction slower.
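One common way to try several values of K is cross-validation. Here is a hedged sketch with scikit-learn; the Iris dataset is only a stand-in, not data from these slides.

```python
# Sweep over K and score each setting with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a range of odd K values (odd K avoids ties in binary problems).
for k in range(1, 16, 2):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k:2d}  mean accuracy={scores.mean():.3f}")
```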
Advantages:
▪ Simple and effective algorithm
▪ Quick calculation time (there is no training phase)
▪ High accuracy (with small datasets)
▪ No need to make additional assumptions about the data

Disadvantages:
▪ Accuracy depends on the quality of the data (e.g., noise can affect accuracy)
▪ With large data, the prediction stage might be slow
▪ Sensitive to the scale of the data and to irrelevant features
▪ Requires a large amount of memory: it needs to store all of the training data
Example

▪ Assume that:
• The value of K is 5.
• Euclidean distance is used.

▪ Note: you can calculate the distance using any other measure (e.g., Manhattan, Minkowski, etc.).

(Figure: the dataset and the new data entry, plotted with Saturation on one axis.)
Calculating Distances:

Using the Euclidean distance between each data point and the new entry $(20, 35)$:

▪ $d_1 = \sqrt{(40 - 20)^2 + (20 - 35)^2} = 25$

▪ $d_2 = \sqrt{(50 - 20)^2 + (50 - 35)^2} = 33.54$

▪ $d_3 = \sqrt{(60 - 20)^2 + (90 - 35)^2} = 68.01$

▪ $d_4 = \sqrt{(10 - 20)^2 + (25 - 35)^2} = 14.14$

▪ $d_5 = \sqrt{(70 - 20)^2 + (70 - 35)^2} = 61.03$

▪ $d_6 = \sqrt{(60 - 20)^2 + (10 - 35)^2} = 47.17$

▪ $d_7 = \sqrt{(25 - 20)^2 + (80 - 35)^2} = 45.28$
▪ As you can see, based on the $K = 5$ selected neighbors with the lowest distances ($d_4$, $d_1$, $d_2$, $d_7$, $d_6$), the majority of votes are for the Red class; therefore, the new entry is classified as Red.
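The example can be checked numerically. In the sketch below, the seven points come from the distance calculations above, but the Red/Blue labels are assumptions read off the example figure, chosen to be consistent with the stated 3-to-2 Red majority.

```python
import numpy as np
from collections import Counter

# Points from the distance calculations above; labels are assumed from the
# example figure (not given in the text) and match the slide's conclusion.
X = np.array([[40, 20], [50, 50], [60, 90], [10, 25],
              [70, 70], [60, 10], [25, 80]], dtype=float)
y = np.array(["Red", "Blue", "Blue", "Red", "Blue", "Red", "Blue"])
x_new = np.array([20.0, 35.0])

d = np.linalg.norm(X - x_new, axis=1)            # Euclidean distances d1..d7
nearest = np.argsort(d)[:5]                      # indices of the 5 closest points
print(np.round(d, 2))                            # [25. 33.54 68.01 14.14 61.03 47.17 45.28]
print(Counter(y[nearest]).most_common(1)[0][0])  # Red (3 Red vs. 2 Blue)
```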
KNN failure cases
▪ If the data is a jumble of all the different classes, KNN will fail: the K nearest neighbors of a query point then carry no consistent class signal, so the majority vote is essentially random.
▪ Outlier points can pull predictions toward the wrong class, especially for small values of K.

(Figures: jumbled data; outliers.)


Thank You!
