
K-Nearest Neighbor Algorithm in Python
Introduction
The k-Nearest Neighbor (k-NN) algorithm is a simple yet powerful technique for solving classification and regression problems. It makes predictions for an input sample by examining how similar that sample is to the examples seen during training. In this article we explain the k-NN technique and its implementation in Python using two different approaches. To ensure a clear understanding of this well-known method, we provide step-by-step explanations along with executable code and results.
k-Nearest Neighbor Algorithm
The k-Nearest Neighbor (k-NN) algorithm is a supervised machine learning (ML) technique used for classification and regression problems. It operates on the premise that similar instances tend to produce similar results. Given a new input, the algorithm locates the k closest training examples and predicts either the class (classification) or the value (regression) based on the majority label or the average value of those neighbors, respectively.
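To make these mechanics concrete, here is a minimal from-scratch sketch of k-NN classification. This is not the scikit-learn implementation used later in this article; it assumes Euclidean distance and a simple majority vote, and the toy arrays X_train, y_train, and the query point are made up for illustration.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
   # Euclidean distance from the new point to every training point
   distances = np.linalg.norm(X_train - x_new, axis=1)
   # Indices of the k closest training examples
   nearest = np.argsort(distances)[:k]
   # Majority vote among the labels of the k nearest neighbors
   return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two features per sample, two classes (0 and 1)
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.2, 1.9]), k=3))   # prints 0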
Syntax
The syntax is similar for both tasks. Here we use the scikit-learn library, which provides ready-made implementations of the k-NN method in Python. There are two classes, depending on the user's needs: KNeighborsClassifier for classification tasks, and KNeighborsRegressor for predicting numerical values.
1. Classification
from sklearn.neighbors import KNeighborsClassifier

# Create an instance of the k-NN classifier
knn = KNeighborsClassifier(n_neighbors=k)

# Train the classifier using the training data
knn.fit(X_train, y_train)

# Make predictions on new data
predictions = knn.predict(X_test)
2. Regression
from sklearn.neighbors import KNeighborsRegressor

# Create an instance of the k-NN regressor
knn = KNeighborsRegressor(n_neighbors=k)

# Train the regressor using the training data
knn.fit(X_train, y_train)

# Make predictions on new data
predictions = knn.predict(X_test)
In the code above, k denotes the number of neighbors to take into account, X_train and y_train denote the training features and labels, and X_test denotes the new data on which predictions should be made.
Explanation of Syntax
The relevant class is imported from the sklearn.neighbors package.
We create an instance of the k-NN classifier or regressor, specifying the number of neighbors (n_neighbors) it should take into account.
We train the classifier or regressor on the training set using the fit() method.
Finally, we use the predict() method to generate predictions for the new data.
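Putting these steps together, here is a small runnable sketch that fills in the placeholders above with toy data; the arrays and the choice of k=3 are illustrative assumptions, not part of the original syntax.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two features per sample, two classes
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[1.2, 1.9], [5.5, 8.5]])

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3
knn.fit(X_train, y_train)
print(knn.predict(X_test))   # [0 1]
In practice, k is a hyperparameter; one common way to choose it is to compare several values with cross-validation, for example via sklearn.model_selection.cross_val_score.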
Algorithm
Step 1: Load the Data - Read or load the dataset into your Python environment.
Step 2: Split the Data - Divide the dataset into training and testing sets so the algorithm's effectiveness can be assessed.
Step 3: Preprocess the Data - Carry out any necessary preparation, such as scaling or normalization, to ensure a consistent data representation.
Step 4: Train the k-NN Model - Create an instance of the k-NN classifier or regressor and fit it to the training data.
Step 5: Evaluate the Model - Make predictions on the testing set, then assess the model's performance using suitable metrics such as accuracy or mean squared error. A sketch tying these five steps together follows below.
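As a concrete illustration of the five steps, here is a minimal sketch using scikit-learn's Iris dataset; the use of StandardScaler and the choice of k=3 are assumptions made for illustration, not requirements of the algorithm.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Step 1: load the data
X, y = load_iris(return_X_y=True)

# Step 2: split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 3: preprocess (scale features); fit the scaler on the training set only
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 4: train the k-NN model
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Step 5: evaluate on the testing set
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))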
Approach
Approach 1: k-NN Classification Example
Approach 2: k-NN Regression Example
Approach 1: k-NN Classification Example
Let's look at a real-world application of k-NN classification: identifying the species of iris flowers based on the dimensions of their sepals and petals. We will use the well-known Iris dataset for this demonstration.
Example
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create an instance of the k-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier using the training data
knn.fit(X_train, y_train)

# Make predictions on the testing set
predictions = knn.predict(X_test)

# Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
Output
Accuracy: 1.0
In Approach 1, the Iris dataset is loaded, divided into training and testing sets, and a k-NN classifier instance with n_neighbors=3 is created.
Using the training set of data, we train the classifier, and then we make predictions using the testing set.
To determine the model's accuracy, we compare the predicted labels to the actual labels. In this instance the output shows an accuracy of 1.0, or 100%, meaning the k-NN classifier identified the species of every iris flower in the testing set correctly.
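Once trained, the same classifier can label a brand-new flower. The measurements below are an illustrative assumption rather than part of the tutorial's data, and the snippet is meant to be run after the example above.
# Predict the species of a hypothetical new flower
# (sepal length, sepal width, petal length, petal width in cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted = knn.predict(new_flower)
print(iris.target_names[predicted[0]])   # e.g. 'setosa'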
Approach 2: k-NN Regression Example
As an example of regression, let's use the Boston Housing dataset to forecast the median price of owner-occupied homes, this time with the k-NN regressor. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older scikit-learn release.
Example
from sklearn.datasets import load_boston   # deprecated; removed in scikit-learn 1.2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset
boston = load_boston()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Create an instance of the k-NN regressor
knn = KNeighborsRegressor(n_neighbors=5)

# Train the regressor using the training data
knn.fit(X_train, y_train)

# Make predictions on the testing set
predictions = knn.predict(X_test)

# Calculate and print the mean squared error of the model
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
Output
Mean Squared Error: 30.137858823529412
In Approach 2, the Boston Housing dataset is loaded, divided into training and testing sets, and a k-NN regressor instance with n_neighbors=5 is created.
We train the regressor on the training set and then use it to make predictions on the testing set.
Finally, we compare the predicted and actual values to obtain the model's mean squared error (MSE), which in this instance is about 30.1379. The MSE is the average squared difference between the predicted median home values and the true values in the testing set; a lower MSE indicates a more accurate regression model. The exact value depends on the n_neighbors parameter used to build the KNeighborsRegressor instance and on the random state used to split the data into training and testing sets.
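Because the MSE is expressed in squared units, its square root is often easier to interpret. This short follow-up, meant to run after the example above, is an illustrative addition rather than part of the original tutorial.
import numpy as np

# Root mean squared error, in the same units as the target
# (the Boston Housing prices are in thousands of dollars)
rmse = np.sqrt(mse)
print("RMSE:", rmse)   # about 5.49 for an MSE of ~30.14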
Conclusion
The k-Nearest Neighbor (k-NN) algorithm is a flexible and popular machine learning method that is especially helpful for problems involving classification and regression. This article covered the fundamentals of the k-NN technique, its syntax in Python, and detailed descriptions of how to implement it. We also looked at two approaches for applying k-NN to classification and regression, along with complete executable code and results. By understanding the k-NN method and its practical applications, you can use it as a powerful tool for a range of machine learning problems.