Introduction To KNN
KNN, which stands for K-Nearest Neighbors, is a supervised machine learning algorithm that
classifies a new data point into a target class according to the classes of the data points
nearest to it in feature space. To illustrate how the KNN algorithm works, consider the
following scenario:
[Figure: KNN algorithm overview. KNN is a simple, supervised, non-parametric machine learning algorithm based on feature similarity; it memorizes the training data set and is used for classification and regression.]
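KNN is also used for regression. A minimal sketch, assuming the common approach of predicting the mean target value of the k nearest training points under Euclidean distance; the function name `knn_regress` and the toy data are hypothetical.

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a numeric value as the mean target of the k nearest training points.

    Hypothetical helper. train_X is a list of feature tuples, train_y a list of
    numeric targets. Euclidean distance (math.dist) is an assumption here.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point with its target, then sort the pairs by distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest_targets = [target for _, target in ranked[:k]]
    return sum(nearest_targets) / k

# Hypothetical toy data: one feature, targets equal to 2 * x.
X = [(1.0,), (2.0,), (3.0,), (10.0,)]
y = [2.0, 4.0, 6.0, 20.0]
print(knn_regress(X, y, (2.1,), k=3))  # mean of 4.0, 6.0 and 2.0 -> 4.0
```

Classification and regression differ only in the last step: a vote over neighbor labels versus an average over neighbor values.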
Euclidean distance: The Euclidean distance between any two instances is the
length of the line segment connecting them. In this study, the dataset is composed
of 22 attributes, so each instance is represented as a point in 22-dimensional space.
If x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) are two points, then the distance
from x to y is given by:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
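The Euclidean distance can be computed directly by summing the squared differences of corresponding attributes and taking the square root. A minimal sketch; the function name `euclidean_distance` and the sample points are hypothetical, and Python's built-in `math.dist` (Python 3.8+) computes the same quantity.

```python
import math

def euclidean_distance(x, y):
    """Return the Euclidean distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of attributes")
    # Sum the squared differences attribute by attribute, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two hypothetical 2-D points forming a 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```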
Let K=3
With K=3, there is one Default=Y neighbor (Jack, 33, 150000, Y) and two Default=N neighbors
(Kate, 35, 120000, N) and (George, 60, 100000, N). Of the three closest objects, only one is Y
and two are N, so by majority vote the predicted Default class for Andrew is N.
Let K=4
With K=4, there are two Default=Y neighbors (Jack, 33, 150000, Y) and (Anil, 23, 95000, Y)
and two Default=N neighbors (Kate, 35, 120000, N) and (George, 60, 100000, N). Of the four
closest objects, two are Y and two are N. This is a tie, so we cannot decide the class for
Andrew with K=4; this is why an odd K is usually preferred for two-class problems.
Let K=5
With K=5, there are two Default=Y neighbors (Jack, 33, 150000, Y) and (Anil, 23, 95000, Y)
and three Default=N neighbors (Kate, 35, 120000, N), (George, 60, 100000, N) and
(Alex, 45, 80000, N). Of the five closest objects, two are Y and three are N, so the
predicted Default class for Andrew is N.
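The three cases above are a majority vote over the labels of the K closest objects. A minimal sketch, assuming the neighbors are already sorted by distance to Andrew in the order the text adds them: Jack, Kate and George within the first three (their relative order does not change the K=3 vote), then Anil, then Alex. The function name `majority_vote` is hypothetical, and a tie is reported as `None` rather than broken arbitrarily.

```python
from collections import Counter

def majority_vote(sorted_labels, k):
    """Return the most common label among the first k labels, or None on a tie."""
    counts = Counter(sorted_labels[:k]).most_common()
    # A tie means the two most frequent labels have the same count.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Default labels of Andrew's neighbors from the text:
# Jack (Y), Kate (N), George (N), Anil (Y), Alex (N).
neighbor_labels = ["Y", "N", "N", "Y", "N"]
for k in (3, 4, 5):
    print(k, majority_vote(neighbor_labels, k))
# 3 N
# 4 None
# 5 N
```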
Pros of KNN
1. Simple to implement
2. Flexible to feature/distance choices
3. Naturally handles multi-class cases
4. Can do well in practice with enough representative data
Cons of KNN
1. Need to determine the value of parameter K (number of nearest neighbors)
2. Computation cost is high, because the distance from each query instance to every
training sample must be computed.
3. The entire training data set must be stored, since KNN memorizes it.
4. A meaningful distance function must be chosen for the data.
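The first three drawbacks are visible in a brute-force implementation: fitting only stores the data, every prediction computes a distance to every stored sample, and K must be chosen by the user. A minimal sketch, assuming Euclidean distance via `math.dist` and using leave-one-out accuracy to compare candidate K values; the class `BruteForceKNN`, the helper `leave_one_out_accuracy` and the toy data are hypothetical.

```python
import math
from collections import Counter

class BruteForceKNN:
    """Minimal k-nearest-neighbor classifier (hypothetical helper, Euclidean distance)."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the data, so storage grows with the training set.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, query):
        # Cost: one distance computation per stored training sample.
        ranked = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], query))
        votes = Counter(self.y[i] for i in ranked[: self.k])
        # most_common orders equal counts by first occurrence, so the nearer label wins a tie.
        return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k):
    """Fraction of points classified correctly when each one is held out in turn."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        model = BruteForceKNN(k).fit(rest_X, rest_y)
        correct += model.predict_one(X[i]) == y[i]
    return correct / len(X)


# Hypothetical two-cluster toy data with two features.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ["A", "A", "A", "B", "B", "B"]
for k in (1, 3, 5):
    print(k, leave_one_out_accuracy(X, y, k))
# 1 1.0
# 3 1.0
# 5 0.0
```

Accuracy collapses at K=5 because, once a point is held out, only two same-class points remain among the five voters, so the other class outvotes it every time. This is why K has to be tuned on the data rather than fixed arbitrarily.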