K-Nearest Neighbors in Machine Learning
For classification problems, the KNN algorithm assigns the test data point to the class
that appears most frequently among the k-nearest neighbors. In other words, the class
with the highest number of neighbors is the predicted class.
For regression problems, the KNN algorithm assigns the test data point the average of
the k-nearest neighbors' values.
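As a minimal sketch of these two prediction rules, assuming the labels and target values of the k = 3 nearest neighbors have already been found (the numbers below are made up for illustration) −
from collections import Counter
import numpy as np

neighbor_labels = ["red", "blue", "red"]   # hypothetical classes of the 3 nearest neighbors
neighbor_values = [4.2, 3.9, 5.0]          # hypothetical target values of the 3 nearest neighbors

predicted_class = Counter(neighbor_labels).most_common(1)[0][0]   # classification: majority vote, here "red"
predicted_value = np.mean(neighbor_values)                        # regression: average, here about 4.37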
The distance metric used to measure the similarity between two data points is an
essential factor that affects the KNN algorithm's performance. The most commonly used
distance metrics are Euclidean distance, Manhattan distance, and Minkowski distance.
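For illustration, these metrics can be computed directly with NumPy; the two sample points used here are arbitrary −
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 8.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))          # L2 distance, about 7.07
manhattan = np.sum(np.abs(a - b))                  # L1 distance, 12.0
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)  # generalizes both (p = 1 or 2), here 6.0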
Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase; it uses all of the training data at classification time.
Step 1 − For implementing any algorithm, we need a dataset. So, during the first step of KNN, we must load the training as well as the test data.
Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consider. K can be any integer.
Step 3 − For each point in the test data: calculate its distance to every row of the training data, sort the distances in ascending order, pick the top K rows, and assign the test point the most frequent class (or, for regression, the average value) among those rows.
Step 4 − End
Example
Now, we need to classify a new data point, shown as a black dot at point (60, 60), into either the blue or the red class. We are assuming K = 3, i.e. the algorithm would find the three nearest data points. It is shown in the next diagram −
We can see in the above diagram the three nearest neighbors of the data point with the black dot. Among those three, two of them lie in the Red class, hence the black dot will also be assigned to the Red class.
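A small from-scratch sketch of this example: the red and blue training points below are made up for illustration, and only the query point (60, 60) and K = 3 come from the scenario above −
import numpy as np
from collections import Counter

# Hypothetical training points and their classes
points = np.array([[55, 57], [62, 65], [64, 63], [20, 25], [30, 28], [58, 48]])
labels = np.array(["red", "red", "blue", "blue", "blue", "red"])

query = np.array([60, 60])
k = 3

# Euclidean distance from the query point to every training point
distances = np.sqrt(((points - query) ** 2).sum(axis=1))

# Pick the k nearest neighbors and take a majority vote over their labels
nearest = np.argsort(distances)[:k]
print(Counter(labels[nearest]).most_common(1)[0][0])   # "red" (two red, one blue)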
Load the data − The first step is to load the dataset into memory. This can be
done using various libraries such as pandas or numpy.
Split the data − The next step is to split the data into training and test sets. The
training set is used to train the KNN algorithm, while the test set is used to evaluate
its performance.
Normalize the data − Before training the KNN algorithm, it is essential to
normalize the data to ensure that each feature contributes equally to the distance
metric calculation.
Calculate distances − Once the data is normalized, the KNN algorithm calculates
the distances between the test data point and each data point in the training set.
Select k-nearest neighbors − The KNN algorithm selects the k-nearest neighbors
based on the distances calculated in the previous step.
Make a prediction − For classification problems, the KNN algorithm assigns the
test data point to the class that appears most frequently among the k-nearest
neighbors. For regression problems, the KNN algorithm assigns the test data point
the average of the k-nearest neighbors' values.
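Putting these steps together, a compact from-scratch sketch of the whole procedure (illustrative only; the scikit-learn based version follows in the next section) could look like this −
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    # Normalize the features so each one contributes equally to the distance
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    X_norm = (X_train - mean) / std
    q_norm = (x_query - mean) / std

    # Distances from the query point to every training point
    distances = np.sqrt(((X_norm - q_norm) ** 2).sum(axis=1))

    # Select the k nearest neighbors and take a majority vote over their labels
    nearest = np.argsort(distances)[:k]
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]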
KNN as Classifier
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(path, names=headernames)   # read the Iris data into a dataframe
Data preprocessing will be done with the help of the following script lines −
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
Next, we will divide the data into train and test splits. The following code will split the dataset into 60% training data and 40% test data −
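A minimal version of that split uses scikit-learn's train_test_split; the random_state value here is an arbitrary choice that only fixes the shuffle for reproducibility −
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)   # 60% train / 40% test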
Next, the feature values are scaled so that each feature contributes equally to the distance calculation −
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(X_train)   # standardization is one common choice of scaling
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
Next, train the model with the help of the KNeighborsClassifier class of sklearn as follows −
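A minimal version of that training step; the value of n_neighbors is an illustrative choice, not one fixed by this tutorial −
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=8)   # n_neighbors = 8 is an arbitrary, illustrative choice
classifier.fit(X_train, y_train)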
At last, we need to make predictions. This can be done with the help of the following script −
y_pred = classifier.predict(X_test)
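The confusion matrix, classification report, and accuracy shown below can be produced with scikit-learn's metrics utilities, for example −
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))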
Output
Confusion Matrix:
[[21 0 0]
[ 0 16 0]
[ 0 7 16]]
Classification Report:
precision recall f1-score support
Iris-setosa 1.00 1.00 1.00 21
Iris-versicolor 0.70 1.00 0.82 16
Iris-virginica 1.00 0.70 0.82 23
micro avg 0.88 0.88 0.88 60
macro avg 0.90 0.90 0.88 60
Accuracy: 0.8833333333333333
KNN as Regressor
First, start with importing the necessary Python packages −
import numpy as np
import pandas as pd
path = "https://archive.ics.uci.edu/ml/machine-learning-
databases/iris/iris.data"
output:(150, 5)
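As a minimal sketch of KNN regression on this data, the first two columns can serve as features and the third as the target; this column choice and the value of n_neighbors are illustrative assumptions, not fixed by the tutorial −
from sklearn.neighbors import KNeighborsRegressor

X = data.iloc[:, :2].values    # first two measurements as features (illustrative choice)
y = data.iloc[:, 2].values     # third measurement as the regression target
knnr = KNeighborsRegressor(n_neighbors=10)   # n_neighbors = 10 is an arbitrary choice
knnr.fit(X, y)
print(knnr.predict(X[:5]))     # predicted target values for the first five samples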
Pros
It is very useful for nonlinear data because the algorithm makes no assumptions about the underlying data.
It has relatively high accuracy, although better supervised learning models than KNN exist.
Cons
It is computationally expensive at prediction time, because it stores all of the training data and must compute distances to every stored point.
It requires high memory storage compared to other supervised learning algorithms.
Prediction is slow when the training set is large.
It is very sensitive to the scale of the data as well as to irrelevant features.
Applications of KNN
The following are some of the areas in which KNN can be applied successfully −
Banking System
KNN can be used in a banking system to predict whether an individual is fit for loan approval, and whether that individual has characteristics similar to those of defaulters.
KNN algorithms can be used to find an individual's credit rating by comparing the individual with persons having similar traits.
Politics
With the help of KNN algorithms, we can classify a potential voter into various classes like "Will Vote", "Will Not Vote", "Will Vote for Party 'Congress'", and "Will Vote for Party 'BJP'".
Other areas in which the KNN algorithm can be used are Speech Recognition, Handwriting Detection, Image Recognition, and Video Recognition.