KNN Algorithm

The KNN algorithm is a supervised machine learning algorithm that can be used for classification problems. It works by finding the K closest training examples in a dataset to a new data point based on distance, and assigns the new point the most common label of its K nearest neighbors. The document provides examples of how to calculate distances between data points, choose an appropriate value for K, and demonstrates running the KNN algorithm on a sample dataset to predict the sports for a new data point.

Uploaded by

charan11082001
Copyright
© All Rights Reserved

KNN ALGORITHM

Classification Algorithm
It is one of the simplest supervised learning algorithms.
It determines which group something belongs to, e.g. the type of a tumor.
It is used to predict the group of new, unseen data records.
It computes the distance between the new data record and every reference (labeled) data record.
Then it looks at the “K” closest data records in the reference data and assigns the most common group among them.
For example:
With k=5, it looks at the 5 closest data records.
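The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_predict` and the layout of `reference` (a list of `(features, label)` pairs) are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(reference, query, k):
    """Return the most common label among the k reference records closest to query.

    reference: list of (features, label) pairs, where features is a tuple of numbers.
    query:     tuple of numbers with the same length as each features tuple.
    """
    # Step 1: distance from the query to every reference record (Euclidean).
    scored = [(math.dist(features, query), label) for features, label in reference]
    # Step 2: keep the k closest records.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over the labels of those k records.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups "A" and "B".
toy = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
       ((8.0, 8.0), "B"), ((9.0, 7.5), "B")]
print(knn_predict(toy, (1.2, 1.1), k=3))
```

Sorting every distance is fine for a small dataset; larger datasets usually call for a spatial index or a library implementation.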
The distance is normally computed as

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

i.e. the Euclidean distance.
Other distance measures:
• Manhattan distance
• Minkowski distance
etc.
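The three distances named above are closely related: the Minkowski distance of order p reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2. The sketch below shows this, with function names chosen only for this example.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Sum of absolute coordinate differences (Minkowski with p = 1)."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski(a, b, 2)


print(euclidean((0, 0), (3, 4)))  # 3-4-5 triangle
print(manhattan((0, 0), (3, 4)))  # 3 + 4
```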
How do we choose the factor “k”?

To choose a value of k:

1) A common rule of thumb is k ≈ sqrt(n), where n is the total number of data points.

2) An odd value of k is usually selected, to reduce the chance of a tied vote between classes.
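Both rules can be combined into one small helper. This is only a sketch of the heuristic: the name `suggest_k` is made up here, and bumping an even result up (rather than down) to the next odd number is an assumption, since the rule itself does not say which way to go.

```python
import math


def suggest_k(n):
    """Rule-of-thumb k: round sqrt(n) to the nearest integer, then make it odd."""
    k = max(1, round(math.sqrt(n)))
    if k % 2 == 0:
        k += 1  # assumption: move up to the next odd value
    return k


print(suggest_k(10))   # sqrt(10) is about 3.16
print(suggest_k(100))  # sqrt(100) is exactly 10, which is even
```

In practice the suggested value is a starting point; it is common to try a few values of k and keep the one that classifies held-out data best.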
When do we use the KNN algorithm?
• When the data is labeled.
• When the data is noise-free (no undefined class names, e.g. hellokitty, 23, etc.).
• When the dataset is small.
Some use cases:
• Predicting diabetes
• Spam email filtering
• Iris flower classification
• Handwritten digit recognition, etc.
NAME      AGE  GENDER  SPORTS
Ajay      32   M       Football
Mark      40   M       Neither
Sara      16   F       Cricket
Zaira     34   F       Cricket
Sachin    55   M       Neither
Rahul     40   M       Cricket
Pooja     20   F       Neither
Smith     15   M       Cricket
Laxmi     55   F       Football
Michael   15   M       Football
Angelina  5    F       ?

For Angelina, find the sport using k=3.


Here we encode gender as a number:
Male = 0
Female = 1
ITERATION 1:
NAME      AGE  GENDER  SPORTS
Ajay      32   M       Football
Angelina  5    F       ?

$$d = \sqrt{(5 - 32)^2 + (1 - 0)^2} \approx 27.02$$
ITERATION 2:
NAME      AGE  GENDER  SPORTS
Mark      40   M       Neither
Angelina  5    F       ?

$$d = \sqrt{(5 - 40)^2 + (1 - 0)^2} \approx 35.01$$
ITERATION 3:
NAME      AGE  GENDER  SPORTS
Sara      16   F       Cricket
Angelina  5    F       ?

$$d = \sqrt{(5 - 16)^2 + (1 - 1)^2} = 11.00$$
ITERATION 4:
NAME      AGE  GENDER  SPORTS
Zaira     34   F       Cricket
Angelina  5    F       ?

$$d = \sqrt{(5 - 34)^2 + (1 - 1)^2} = 29.00$$
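The same calculation can be run over every row of the table at once. The sketch below copies the ten labeled rows as given, encodes gender as M=0 and F=1, and prints the distances to Angelina from nearest to farthest. The names `PLAYERS`, `ANGELINA` and `distances_to` are chosen for this example only.

```python
import math

# (name, age, gender, sport) rows copied from the table; gender encoded as M=0, F=1.
PLAYERS = [
    ("Ajay", 32, 0, "Football"),
    ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),
    ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"),
    ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"),
    ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"),
    ("Michael", 15, 0, "Football"),
]
ANGELINA = (5, 1)  # age 5, female


def distances_to(query, rows):
    """Return (name, distance, sport) tuples sorted from nearest to farthest."""
    scored = [
        (name, math.hypot(query[0] - age, query[1] - gender), sport)
        for name, age, gender, sport in rows
    ]
    return sorted(scored, key=lambda row: row[1])


for name, dist, sport in distances_to(ANGELINA, PLAYERS):
    print(f"{name:<8} {dist:6.2f}  {sport}")
```

This reproduces the four distances worked out above (27.02, 35.01, 11.00, 29.00) and fills in the remaining six rows.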
Here k=3.
Computing the remaining distances the same way, the 3 smallest distances to Angelina are 10.05, 10.05 and 11.00,
i.e. Smith, Michael and Sara.
Now we look at the classification of the three nearest neighbors.
NAME     DISTANCE  SPORTS
Smith    10.05     Cricket
Michael  10.05     Football
Sara     11.00     Cricket

Here the majority is Cricket, so Angelina is assigned to the Cricket group.
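The final vote can be written as a short tally over the neighbors' labels. This is a sketch only: `majority_vote` is a name made up for this example, and a tie is broken in favor of the label seen first, which is one simple choice rather than part of the method described here.

```python
from collections import Counter


def majority_vote(labels):
    """Return (winning label, vote counts) for the neighbors' labels."""
    counts = Counter(labels)
    # most_common keeps first-seen order among labels with equal counts.
    winner, _ = counts.most_common(1)[0]
    return winner, dict(counts)


# Labels of the three nearest neighbors: two Cricket, one Football.
winner, counts = majority_vote(["Cricket", "Cricket", "Football"])
print(winner, counts)
```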
