4 Intro To K Nearest Neighbors
K Nearest Neighbors
Prediction Algorithm:
1. Calculate the distance from x to all points in your data
2. Sort the points in your data by increasing distance from x
3. Predict the majority label of the “k” closest points
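A minimal sketch of these three steps in Python (the function name knn_predict, the toy data, and the choice of Euclidean distance are illustrative assumptions, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # 1. Calculate the distance from x to all points in the data
    distances = np.linalg.norm(X_train - x, axis=1)
    # 2. Sort the points by increasing distance from x
    nearest = np.argsort(distances)[:k]
    # 3. Predict the majority label of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[1, 1], [1, 2], [4, 4], [5, 4]])
y = np.array(["green", "green", "red", "red"])
print(knn_predict(X, y, np.array([2, 2])))  # -> green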
KNN
Choosing the value of k affects which class a new point is assigned to.
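For example, with scikit-learn's KNeighborsClassifier (the toy coordinates below are invented for illustration), the same query point can flip class as k grows:

from sklearn.neighbors import KNeighborsClassifier

# One red point sits closest to the query point,
# but green points dominate a little farther out.
X = [[1.0, 0.0], [2.0, 0.0], [0.0, 2.1], [-2.2, 0.0]]
y = ["red", "green", "green", "green"]

for k in (1, 3):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.predict([[0.0, 0.0]]))  # k=1 -> red, k=3 -> green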
KNN Distance Metrics
• For the algorithm to work well on a particular dataset, we need to choose the most appropriate distance metric for that data.
• There are many different distance metrics available, but we will only cover a few widely used ones.
• The Euclidean distance is the most popular of them all, and it is the default in the scikit-learn KNN classifier in Python (see the sketch below).
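Concretely, scikit-learn's KNeighborsClassifier uses metric="minkowski" with p=2 (which is Euclidean distance) unless told otherwise; a brief sketch (n_neighbors=5 is an arbitrary choice):

from sklearn.neighbors import KNeighborsClassifier

# The default metric is "minkowski" with p=2, i.e. Euclidean distance.
euclidean_knn = KNeighborsClassifier(n_neighbors=5)
# Passing another metric by name switches the distance, e.g. Manhattan:
manhattan_knn = KNeighborsClassifier(n_neighbors=5, metric="manhattan")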
KNN Distance Metrics
• Manhattan Distance
The distance between two points is the sum of the absolute differences of their Cartesian coordinates.
Example
Suppose we have two points, the red (4,4) and the green (1,1). We get:
d = |4-1| + |4-1| = 6
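A minimal sketch of this computation in Python (the function name manhattan is an illustrative choice):

def manhattan(p, q):
    # Sum of the absolute differences of the coordinates
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((4, 4), (1, 1)))  # |4-1| + |4-1| = 6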
KNN Distance Metrics
• Euclidean Distance
It is a measure of the true straight-line distance between two points in Euclidean space.
Example
Now suppose we have the same two points, the red (4,4) and the green (1,1). We get:
d = √((4-1)² + (4-1)²) = √18 ≈ 4.24
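The same example in Python (the function name euclidean is an illustrative choice):

import math

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((4, 4), (1, 1)))  # sqrt(18) ≈ 4.24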
KNN Distance Metrics
Other distances
• Minkowski Distance
• Cosine Distance
• Jaccard Distance
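Of these, the Minkowski distance of order r generalizes the two metrics above: r=1 gives Manhattan distance and r=2 gives Euclidean distance. A sketch (the parameter name r is an illustrative choice):

def minkowski(p, q, r=2):
    # r=1 reproduces Manhattan distance, r=2 reproduces Euclidean distance
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

print(minkowski((4, 4), (1, 1), r=1))  # 6.0 (Manhattan)
print(minkowski((4, 4), (1, 1), r=2))  # ~4.24 (Euclidean)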
KNN
Pros
● Very simple
● Training is trivial/simple
● Works with any number of classes
● Easy to add more data
● Few parameters
○ k
○ Distance Metric
KNN
Cons
● High prediction cost (worse for large datasets)
● Not good with high-dimensional data
● Categorical features don’t work well