KNN-Assignment-Description
KNN-Assignment-Description
Experiment No 4
Implementing K-Nearest Neighbors (KNN)
Problem Description:
1. Take input from “train.txt” and plot the points with different colored markers according to the
assigned class label.
2. Implement KNN algorithm. The value of K will be taken from user. Classify the test points from
“test.txt” with different colored markers according to the predicted class label.
3. Print the top K distances along with their class labels and the predicted class to “prediction.txt” for
each of the test data. So, for example, if K = 3, for one of the test data (3,7), the “prediction.txt” may
look like:
Test point: 3, 7
Distance 1: 2 Class: 1
Distance 2: 4 Class: 0
Distance 3: 5 Class: 1
Predicted class: 1
Marks Distribution:
Task Mark
1 2
2 4
3 4
Example:
We have data from the questionnaires survey (to ask people opinion) and objective testing with
two attributes (acid durability and strength) to classify whether a special paper tissue is good or
not. Here is four training samples
X1 = Acid Durability X2 = Strength
Y = Classification
(seconds) (kg/square meter)
7 7 Bad
7 4 Bad
3 4 Good
1 4 Good
Now the factory produces a new paper tissue that pass laboratory test with X1 = 3 and X2 = 7.
Without another expensive survey, can we guess what the classification of this new tissue is?
2. Calculate the distance between the query-instance and all the training samples
Coordinate of query instance is (3, 7), instead of calculating the distance we compute square
distance which is faster to calculate (without square root)
3. Sort the distance and determine nearest neighbors based on the K-th minimum distance
4. Gather the category of the nearest neighbors. Notice in the second row last column that the
category of nearest neighbor (Y) is not included because the rank of this data is more than 3
(=K).
5. Use simple majority of the category of nearest neighbors as the prediction value of the query
instance
We have 2 good and 1 bad, since 2>1 then we conclude that a new paper tissue that pass
laboratory
test with X1 = 3 and X2 = 7 is included in Good category.