K-Nearest Neighbors
K-Nearest Neighbors (KNN) is one of the simplest algorithms used in machine learning for regression and classification problems. KNN stores the training data and classifies new data points based on a similarity measure (e.g. a distance function). Classification is done by a majority vote among a point's nearest neighbors.
K-nearest neighbor (KNN) algorithm notation:
Let (Xi, Ci), where i = 1, 2, …, n, be the training data points. Xi denotes the feature vector and Ci denotes the class label of Xi for each i.
Assuming the number of classes is c,
Ci ∈ {1, 2, 3, …, c} for all values of i.
Let x be a point whose label is not known, and suppose we would like to find its class label using the k-nearest neighbor algorithm.
KNN algorithm pseudocode:
1. Calculate d(x, Xi) for i = 1, 2, …, n, where d denotes the Euclidean distance between the points.
2. Arrange the n calculated Euclidean distances in non-decreasing order.
3. Let k be a positive integer; take the first k distances from this sorted list.
4. Find the k points corresponding to these k distances.
5. Let ki denote the number of points belonging to the ith class among those k points, so that ki ≥ 0 and the ki sum to k.
6. If ki > kj for all i ≠ j, then put x in class i.
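A minimal sketch of these steps in Python is given below (assuming NumPy is available; the function name knn_predict and the Counter-based majority vote are illustrative choices, not part of the pseudocode itself):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single query point x with the k-nearest neighbor rule."""
    # Step 1: Euclidean distance from x to every training point.
    distances = np.linalg.norm(X_train - x, axis=1)
    # Steps 2-4: indices of the k smallest distances (the k nearest points).
    nearest = np.argsort(distances)[:k]
    # Steps 5-6: majority vote among the labels of those k points.
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]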
Nearest Neighbor Algorithm:
Nearest neighbor is the special case of the k-nearest neighbor classifier where k = 1. In this case, the new data point is assigned the class of its single closest neighbor.
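Continuing the hypothetical knn_predict sketch above, setting k = 1 gives the nearest neighbor rule (the toy data here is purely illustrative):

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])

# With k = 1 the query point simply takes the label of its single closest neighbor.
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=1))  # -> 0
print(knn_predict(X_train, y_train, np.array([4.8, 5.2]), k=1))  # -> 1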
Advantages of the K-nearest neighbors algorithm
KNN is simple to implement.
KNN executes quickly for small training data sets.
Its performance asymptotically approaches the performance of the Bayes classifier.
It does not need any prior knowledge about the structure of the data in the training set.
No retraining is required when a new training pattern is added to the existing training set (see the sketch below).
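Because KNN simply stores the training set, adding a new training pattern amounts to appending it to the stored data; a sketch, continuing the hypothetical arrays from the example above:

# Append a new labelled point; no model parameters need to be re-fit.
X_train = np.vstack([X_train, [3.0, 3.0]])
y_train = np.append(y_train, 1)

# Subsequent predictions immediately take the new point into account.
print(knn_predict(X_train, y_train, np.array([3.2, 2.9]), k=3))  # -> 1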
Limitations of the K-nearest neighbors algorithm
When the training set is large, storing it may take a lot of space.
For every test point, the distance to every training point must be computed.
Thus a lot of time may be needed for testing.