K-Nearest Neighbor Algorithm
(K-NN)
By Mohamed Gamal
Agenda
▪ What is K-NN?
▪ How does K-NN work?
▪ K-NN algorithm and structure
▪ Advantages of K-NN
▪ Disadvantages of K-NN
▪ Example
1
What is KNN?
▪ K-Nearest Neighbors (KNN) is a simple, yet powerful, supervised non-linear machine
learning algorithm used for classification and regression tasks.
▪ It's a non-parametric algorithm, meaning it doesn't make any assumptions about the
underlying data distribution and doesn't learn a model during the training phase.
▪ Instead, it memorizes the entire training dataset and makes predictions based on the
similarity of new data points to the known data points.
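As a quick illustration of this "lazy learning" behavior, here is a minimal sketch using scikit-learn (assuming it is installed; the data points and labels below are invented purely for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: feature vectors and their class labels
X_train = [[40, 20], [50, 50], [10, 25]]
y_train = ["Red", "Blue", "Red"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)        # "training" only stores the data
print(knn.predict([[20, 35]]))   # -> ['Red'], decided by the nearest neighbors
```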
How does KNN work? (Algorithm)
▪ Step-1: Select the number K of the neighbors.
▪ Step-2: Calculate the distance between the new data point and each training data point.
▪ Step-3: Take the K nearest neighbors, i.e., those with the minimum distances.
▪ Step-4: Among the selected K nearest neighbors, count the number of data points in each class/category.
▪ Step-5: Assign the new data point to the category for which the number of neighbors is maximum (see the figure and the code sketch below).
[Figure: KNN illustrated step by step. (1) Set K = 5. (2) Calculate the distances. (3) Choose the K = 5 neighbors with the minimum distances. (4) Among the selected K nearest neighbors, count the number of points in each class/category (here Blue: 3, Orange: 2). (5) Assign the new data point to the class/category with the majority of votes.]
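To make the five steps concrete, here is a compact from-scratch sketch in Python; the function names and data layout are my own choices, not part of the original slides:

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbors.

    train is a list of (features, label) pairs.
    """
    # Step 2: compute the distance from the new point to every training point
    distances = [(math.dist(features, new_point), label)
                 for features, label in train]
    # Step 3: keep the k neighbors with the minimum distances
    nearest = sorted(distances)[:k]
    # Steps 4-5: count the labels among the neighbors, return the majority class
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```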
Ways to calculate the distance in KNN

Minkowski Distance
(Named after the German mathematician Hermann Minkowski)

D(x, y) = (Σᵢ₌₁ⁿ |xᵢ − yᵢ|ᵖ)^(1/p)

▪ p = 1: Manhattan Distance (also called taxicab distance or city-block distance)
  D(x, y) = Σᵢ₌₁ⁿ |xᵢ − yᵢ|

▪ p = 2: Euclidean Distance
  D(x, y) = √(Σᵢ₌₁ⁿ (xᵢ − yᵢ)²)
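Since Manhattan and Euclidean are just special cases of Minkowski, all three can be written as one small function of p; a minimal sketch:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between equal-length vectors x and y."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def manhattan(x, y):
    return minkowski(x, y, 1)   # p = 1

def euclidean(x, y):
    return minkowski(x, y, 2)   # p = 2
```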
Advantages of KNN
▪ No need to make additional assumptions about the data.

Disadvantages of KNN
▪ Requires a large amount of memory: it needs to store all of the training data.
Example
▪ Assume that:
• The value of K is 5.
• Euclidean distance is used.
▪ Note: you can calculate the distance using any other measure (e.g., Manhattan, Minkowski, etc.)!
[Figure: the dataset table and the new data entry plotted in feature space (vertical axis: Saturation)]
Calculating Distances:
▪ d₁ = √((40 − 20)² + (20 − 35)²) = 25
▪ d₂ = √((50 − 20)² + (50 − 35)²) ≈ 33.54
▪ d₃ = √((60 − 20)² + (90 − 35)²) ≈ 68.01
▪ d₄ = √((10 − 20)² + (25 − 35)²) ≈ 14.14
▪ d₅ = √((70 − 20)² + (70 − 35)²) ≈ 61.03
▪ d₇ = √((25 − 20)² + (80 − 35)²) ≈ 45.28
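These values can be verified with a short Python sketch; the coordinates below are taken from the calculations above (the point behind d₆ is not listed, so it is omitted here):

```python
import math

new_point = (20, 35)
points = {"d1": (40, 20), "d2": (50, 50), "d3": (60, 90),
          "d4": (10, 25), "d5": (70, 70), "d7": (25, 80)}

# Euclidean distance from each dataset point to the new data entry
for name, p in points.items():
    print(name, round(math.dist(p, new_point), 2))
# d1 25.0, d2 33.54, d3 68.01, d4 14.14, d5 61.03, d7 45.28
```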
Result
▪ Among the K = 5 nearest neighbors, the majority belong to the Red class, so the new data entry is classified as Red.