
K Nearest Neighbors

Copyright © 2019 by Simplifying Skills. Contact: 9579708361 / 7798283335 / 8390096208


KNN

K Nearest Neighbors is a classification algorithm that operates on a very simple principle. It is best shown through an example!

Imagine we had some data on Dogs and Horses, with their heights and weights.
KNN

[Figure: heights and weights of Dogs and Horses plotted as a scatter]

Training Algorithm:
1. Store all the Data

Prediction Algorithm:
1. Calculate the distance from x to all points in your data
2. Sort the points in your data by increasing distance from x
3. Predict the majority label of the “k” closest points
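
As a rough sketch, the three prediction steps might look like this in plain Python (the toy data and the knn_predict helper are invented for illustration, not taken from the slides):

from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3):
    # 1. Calculate the distance from x to all points in the data.
    distances = [(euclidean(p, x), label) for p, label in zip(X_train, y_train)]
    # 2. Sort the points by increasing distance from x.
    distances.sort(key=lambda pair: pair[0])
    # 3. Predict the majority label of the k closest points.
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data: (height_cm, weight_kg)
X = [(45, 12), (50, 15), (150, 400), (160, 450)]
y = ["Dog", "Dog", "Horse", "Horse"]
print(knn_predict(X, y, (55, 20), k=3))  # -> "Dog"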
KNN

Choosing a K will affect what class a new point is assigned to:
[Figures: the same new point classified with different values of K]
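
To see the effect concretely, here is a hypothetical snippet (assuming scikit-learn is installed; the points are made up) where the predicted class flips with k: at k = 1 a single stray class-1 point next to the query wins, while a larger k lets the surrounding class-0 majority take over.

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 3], [3, 1], [3, 3],  # class 0 surrounds the query point
     [2.1, 2.1],                      # one stray class-1 point very close to it
     [6, 6], [7, 6]]                  # the rest of class 1, far away
y = [0, 0, 0, 0, 1, 1, 1]

new_point = [[2, 2]]
for k in (1, 3, 5):
    model = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, model.predict(new_point))  # k=1 -> [1]; k=3, k=5 -> [0]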
KNN

Pros
● Very simple
● Training is trivial
● Works with any number of classes
● Easy to add more data
● Few parameters
  ○ K
  ○ Distance Metric
KNN

Cons
● High prediction cost (worse for large data sets)
● Not good with high-dimensional data
● Categorical features don’t work well
kNN can be used for both classification and regression problems, though in industry it is more widely used for classification. It is a simple algorithm that stores all available cases and classifies a new case by a majority vote of its k nearest neighbors: the new case is assigned to the class most common amongst those neighbors, as measured by a distance function. For example, with K = 5, if three of the five nearest neighbors are labelled “Dog” and two “Horse”, the new point is classified as “Dog”.

These distance functions can be Euclidean, Manhattan, Minkowski, or Hamming distance. The first three are used for continuous variables, and the fourth (Hamming) for categorical variables. If K = 1, then the case is simply assigned to the class of its single nearest neighbor. At times, choosing K turns out to be a challenge while performing kNN modeling.
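
For illustration, the four distance functions could be written in plain Python as below (standard textbook definitions, not code from the original slides):

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    # Generalizes the other two: p=1 gives Manhattan, p=2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    # For categorical vectors: the number of positions that differ.
    return sum(x != y for x, y in zip(a, b))

print(euclidean((0, 0), (3, 4)))            # 5.0
print(manhattan((0, 0), (3, 4)))            # 7
print(hamming(("red", "S"), ("red", "M")))  # 1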
kNN can easily be mapped to our real lives. If you want to learn about a person of whom you have no information, you might find out about their close friends and the circles they move in, and learn about them that way!

Things to consider before selecting kNN:

• kNN is computationally expensive at prediction time
• Variables should be normalized, or else variables with a higher range can bias the distance (see the scaling sketch after this list)
• Spend more effort on the pre-processing stage before applying kNN, e.g. outlier and noise removal
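
To sketch the normalization point (the numbers here are invented): if weight is recorded in grams and height in metres, the raw distance is driven almost entirely by weight, so each feature should be rescaled to a comparable range first.

import numpy as np

X = np.array([[1.70, 68000.0],   # height (m), weight (g)
              [1.65, 67000.0],
              [0.40, 12000.0]])

# Min-max scale each column to [0, 1] so both features contribute comparably.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)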
Let’s Implement
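
A minimal end-to-end version might look like the sketch below (assuming scikit-learn; the dog/horse numbers are invented, and a real implementation would tune k, e.g. via cross-validation):

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: (height_cm, weight_kg)
X = [[45, 12], [50, 15], [40, 10], [55, 18],
     [150, 400], [160, 450], [155, 420], [165, 480]]
y = ["Dog", "Dog", "Dog", "Dog", "Horse", "Horse", "Horse", "Horse"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale features first (kNN is distance-based), then fit the classifier.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))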
