
Introduction to K Nearest Neighbors

Dr BENDIABDALLAH Mohammed Hakim

Overview
K Nearest Neighbors (KNN) is a simple classification algorithm that predicts the class of a new point from the majority label of its k closest points in the dataset. The choice of k and of the distance metric, such as Euclidean or Manhattan distance, strongly influences the classification outcome. KNN is easy to implement and works with any number of classes, but prediction is expensive on large datasets, and performance degrades with high-dimensional or categorical data.


KNN
K Nearest Neighbors is a classification algorithm that operates on a very simple principle. It is best shown through an example!
Imagine we have some made-up data on dogs and horses, with heights and weights.
KNN
[Figures: scatter plots of the dog/horse example data (height vs. weight), omitted here]
Training Algorithm:
1. Store all the data

Prediction Algorithm:
1. Calculate the distance from x to all points in your data
2. Sort the points in your data by increasing distance from x
3. Predict the majority label of the "k" closest points (see the sketch below)
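A minimal from-scratch sketch of these steps, assuming Euclidean distance; the function name and the toy dog/horse numbers are invented for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Distance from x_new to every stored point (Euclidean here)
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Indices of the k points closest to x_new
    nearest = np.argsort(dists)[:k]
    # 3. Majority label among those k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data (hypothetical numbers): [height_cm, weight_kg]
X = np.array([[50, 20], [55, 25], [150, 400], [160, 450]])
y = np.array(["dog", "dog", "horse", "horse"])
print(knn_predict(X, y, np.array([52, 22])))  # -> "dog"
```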
KNN
Choosing a K will affect what class a new point is assigned to:
[Figures: the same new point classified differently as k varies, omitted here]
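A small demo of this effect using scikit-learn; the class names and coordinates are invented for this sketch:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: one "A" point sits right next to the query, but several "B"
# points surround it a little further out, so the vote flips as k grows.
X = [[0, 0], [4, 4], [4, 5], [1.1, 0], [0, 1.1], [1.1, 1.1]]
y = ["A", "A", "A", "B", "B", "B"]
new_point = [[0.4, 0]]

for k in (1, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}: predicted class = {clf.predict(new_point)[0]}")
# k=1 -> A (nearest single point); k=3 and k=5 -> B (majority vote)
```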
KNN Distance Metrics
• For the algorithm to work well on a particular dataset, we need to choose the most appropriate distance metric.
• Many different distance metrics exist, but we will only discuss a few widely used ones.
• The Euclidean distance is the most popular of them all; it is the default in scikit-learn's KNN classifier in Python.
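For reference, this is how the metric is chosen in scikit-learn (its actual default is metric="minkowski" with p=2, which is equivalent to Euclidean distance):

```python
from sklearn.neighbors import KNeighborsClassifier

clf_default = KNeighborsClassifier(n_neighbors=5)  # Minkowski p=2, i.e. Euclidean
clf_manhattan = KNeighborsClassifier(n_neighbors=5, metric="manhattan")
```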
KNN Distance Metrics
• Manhattan Distance
The distance between two points is the sum of the absolute differences of their Cartesian coordinates: d = |x1 - x2| + |y1 - y2|.

Example
Suppose we have two points as shown in the image: the red (4,4) and the green (1,1). We get:
d = |4 - 1| + |4 - 1| = 6
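A one-line check of this example (the variable names are just for illustration):

```python
red, green = (4, 4), (1, 1)
d = abs(red[0] - green[0]) + abs(red[1] - green[1])
print(d)  # 6
```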
KNN Distance Metrics
• Euclidean Distance
A measure of the true straight-line distance between two points in Euclidean space: d = √((x1 - x2)² + (y1 - y2)²).

Example
Now take the same two points, the red (4,4) and the green (1,1). We get:
d = √((4 - 1)² + (4 - 1)²) = √18 ≈ 4.24
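And the matching check for the Euclidean example:

```python
import math

red, green = (4, 4), (1, 1)
d = math.hypot(red[0] - green[0], red[1] - green[1])  # straight-line distance
print(round(d, 2))  # 4.24
```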
KNN Distance Metrics
Other distances
• Minkowski distance (a generalization that includes both Manhattan, p = 1, and Euclidean, p = 2)
• Cosine distance (based on the angle between two vectors rather than their magnitude)
• Jaccard distance (for comparing sets or binary features)
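A quick sketch of these metrics using SciPy's scipy.spatial.distance module, reusing the (4,4)/(1,1) points from the earlier examples (the Jaccard distance is omitted here since it expects boolean, set-like inputs):

```python
from scipy.spatial import distance

u, v = [4, 4], [1, 1]
print(distance.minkowski(u, v, p=1))  # 6.0   -> reduces to Manhattan
print(distance.minkowski(u, v, p=2))  # ~4.24 -> reduces to Euclidean
print(distance.cosine(u, v))          # 0.0: the two vectors point the same way
```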
KNN
Pros
● Very simple
● Training is trivial (just store the data)
● Works with any number of classes
● Easy to add more data
● Few parameters
○ K
○ Distance metric
KNN
Cons
● High prediction cost (worse for large datasets)
● Not good with high-dimensional data
● Categorical features don't work well (distance metrics assume numeric inputs)
