KNN ALGORITHM
Classification Algorithm
KNN (K-Nearest Neighbors) is one of the simplest supervised learning algorithms.
It determines which group something belongs to, e.g. the type of a tumor.
It is used to predict the group of new, unseen records.
It computes the distance between the new data record and every record in the
reference data.
Then it looks at the "K" closest records in the reference data and assigns the most common class among them.
For example:
With k=5, it looks at the 5 closest records.
The distance is normally computed as
d = √[(x₂ – x₁)² + (y₂ – y₁)²]
i.e. the Euclidean distance.
Other distance measures include the Manhattan distance, the Minkowski distance, etc.
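The distance measures named above can be sketched in a few lines. This is only an illustration: the function names and the sample points (1, 2) and (4, 6) are assumptions, not part of the original notes. Manhattan and Euclidean distance are the Minkowski distance with p=1 and p=2 respectively.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    # Minkowski with p = 1: sum of absolute differences
    return minkowski(a, b, 1)


def euclidean(a, b):
    # Minkowski with p = 2: the straight-line distance from the formula above
    return minkowski(a, b, 2)


# Hypothetical sample points, chosen so the results are easy to check by hand
p1, p2 = (1, 2), (4, 6)
print(euclidean(p1, p2))  # 5.0
print(manhattan(p1, p2))  # 7.0
```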
How do we choose the factor "k"?
To choose a value of k:
1) Take sqrt(n), where n is the total number of data points.
2) Choose an odd value of k to reduce the chance of a tied vote between classes.
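The two rules of thumb above could be combined as in the sketch below. The function name choose_k and the choice to round sqrt(n) and then move up to the next odd number are assumptions made for illustration; in practice k is often tuned by testing several values.

```python
import math


def choose_k(n):
    """Rule-of-thumb k: round sqrt(n), then move to the next odd number if even."""
    k = max(1, round(math.sqrt(n)))
    return k if k % 2 == 1 else k + 1


print(choose_k(10))   # 3
print(choose_k(100))  # 11
```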
When do we use the KNN algorithm?
When the data is labeled.
When the data is noise-free (no undefined class names, e.g. hellokitty, 23, etc.).
When the dataset is small.
Some use cases:
Predicting diabetes
Spam email filtering
Iris flower classification
Handwritten digit recognition, etc.
NAME AGE GENDER SPORTS
Ajay 32 M Football
Mark 40 M Neither
Sara 16 F Cricket
Zaira 34 F Cricket
Sachin 55 M Neither
Rahul 40 M Cricket
Pooja 20 F Neither
Smith 15 M Cricket
Laxmi 55 F Football
Michael 15 M Football
Angelina 5 F ?
For the name Angelina, find the sport using k=3.
Here we encode gender as a number:
Male=0
Female=1
ITERATION 1:
NAME AGE GENDER SPORTS
Ajay 32 M Football
Angelina 5 F ?
d = √[(5 – 32)² + (1 – 0)²]
= 27.02
ITERATION 2:
NAME AGE GENDER SPORTS
Mark 40 M Neither
Angelina 5 F ?
d = √[(5 – 40)² + (1 – 0)²]
= 35.01
ITERATION 3:
NAME AGE GENDER SPORTS
Sara 16 F Cricket
Angelina 5 F ?
d = √[(5 – 16)² + (1 – 1)²]
= 11.00
ITERATION 4:
NAME AGE GENDER SPORTS
Zaira 34 F Cricket
Angelina 5 F ?
d = √[(5 – 34)² + (1 – 1)²]
= 29.00
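The iterations above follow the same pattern for every record in the table, so they can be run in a loop. The sketch below is an illustration only: the variable names are assumptions, and gender is encoded as Male=0, Female=1 as in the notes.

```python
import math

GENDER = {"M": 0, "F": 1}  # encoding used in the notes

# (name, age, gender, sport) rows from the table
records = [
    ("Ajay", 32, "M", "Football"),
    ("Mark", 40, "M", "Neither"),
    ("Sara", 16, "F", "Cricket"),
    ("Zaira", 34, "F", "Cricket"),
    ("Sachin", 55, "M", "Neither"),
    ("Rahul", 40, "M", "Cricket"),
    ("Pooja", 20, "F", "Neither"),
    ("Smith", 15, "M", "Cricket"),
    ("Laxmi", 55, "F", "Football"),
    ("Michael", 15, "M", "Football"),
]

query = (5, GENDER["F"])  # Angelina: age 5, female

distances = []
for name, age, gender, sport in records:
    # Euclidean distance over the two features (age, encoded gender)
    d = math.hypot(query[0] - age, query[1] - GENDER[gender])
    distances.append((name, round(d, 2), sport))

for name, d, sport in distances:
    print(f"{name:8s} {d:6.2f} {sport}")
# The first four rows match iterations 1 to 4: 27.02, 35.01, 11.00, 29.00
```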
Here k=3.
After repeating the calculation for the remaining records, the 3 closest distances to Angelina are 10.05, 10.05 and 11.00,
i.e. Smith, Michael and Sara.
Now we look at the classes of the three nearest neighbors.
NAME DISTANCE SPORTS
Smith 10.05 Cricket
Michael 10.05 Football
Sara 11.00 Cricket
Here the majority is Cricket, so Angelina belongs to the Cricket group.
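The whole procedure (measure distances, keep the k nearest records, take the majority class) can be sketched as a single function. The name knn_classify and the (features, label) layout of the training data are assumptions for illustration. It uses math.dist, which needs Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, training, k):
    """Return the majority label among the k training points nearest to query."""
    # Sort the training rows by Euclidean distance to the query point
    ranked = sorted(training, key=lambda row: math.dist(query, row[0]))
    # Count the labels of the k nearest rows and pick the most common one
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# ((age, gender), sport) with Male=0, Female=1, taken from the table
training = [
    ((32, 0), "Football"), ((40, 0), "Neither"), ((16, 1), "Cricket"),
    ((34, 1), "Cricket"), ((55, 0), "Neither"), ((40, 0), "Cricket"),
    ((20, 1), "Neither"), ((15, 0), "Cricket"), ((55, 1), "Football"),
    ((15, 0), "Football"),
]

print(knn_classify((5, 1), training, k=3))  # Cricket
```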