MINI PROJECT
Use the KNN algorithm to classify the Iris flower dataset.
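KNN (k-nearest neighbours) labels a new sample by majority vote among the k training samples closest to it. The sketch below is a minimal illustration of that idea, assuming Euclidean distance; the helper `knn_predict` and the toy measurements are made up for illustration and are not part of the project code. Ties are resolved by whichever label was counted first, which may differ from what scikit-learn does.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbours and take the most common one
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (petal_length, petal_width) pairs
X_toy = [[1.4, 0.2], [1.5, 0.2], [4.5, 1.5], [4.7, 1.4], [5.9, 2.1]]
y_toy = ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica']

print(knn_predict(X_toy, y_toy, [1.3, 0.3], k=3))
```

The project itself uses KNeighborsClassifier from scikit-learn, which applies the same voting idea with optimized neighbour search.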
#STEP 1 : IMPORT THE NECESSARY LIBRARIES
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
#STEP 2 : Store the provided dataset values in a dictionary
# ('...' marks the values omitted here; the full dataset has 150 rows)
data = {
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, ...],  # List of sepal lengths
    'sepal_width': [3.5, 3.0, 3.2, 3.1, 3.6, ...],  # List of sepal widths
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, ...],  # List of petal lengths
    'petal_width': [0.2, 0.2, 0.2, 0.2, 0.2, ...],  # List of petal widths
    'species': ['setosa', 'setosa', 'setosa', 'setosa', 'setosa', ...]  # List of species
}
# Load the data into a pandas DataFrame
iris_data = pd.DataFrame(data)
#STEP 3 : Display the first 5 rows of the dataset
print("First 5 rows of the Iris dataset:")
print(iris_data.head())
#STEP 4 : Split the dataset into features (X) and target (y)
X = iris_data.drop('species', axis=1)
y = iris_data['species']
#STEP 5 : Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
#STEP 6 : Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
#STEP 7 : Train the model using the training data
knn.fit(X_train, y_train)
#STEP 8 : Make predictions on the test data
y_pred = knn.predict(X_test)
#STEP 9 : Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
#STEP 10 : Generate and print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred))
#STEP 11 : Generate and print the confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
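The lists in STEP 2 are abbreviated, so the script needs all 150 rows filled in before it can run. One possible shortcut, assuming the copy of Iris bundled with scikit-learn matches the provided dataset, is to build the same DataFrame from `load_iris`. The helper name `load_iris_frame` is made up for this sketch; the column names are chosen to match STEP 2.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame():
    """Build a DataFrame with the same column names as the STEP 2 dictionary."""
    bunch = load_iris(as_frame=True)
    frame = bunch.frame.copy()
    # The bundled frame uses names like 'sepal length (cm)'; rename to match STEP 2
    frame.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'target']
    # Map the integer class codes (0, 1, 2) to species names
    code_to_name = {code: str(name) for code, name in enumerate(bunch.target_names)}
    frame['species'] = frame['target'].map(code_to_name)
    return frame.drop(columns='target')


iris_data = load_iris_frame()
print(iris_data.shape)
print(iris_data.head())
```

With `iris_data` built this way, STEP 4 onward can be run unchanged.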
OUTPUT :
First 5 rows of the Iris dataset:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Accuracy: 0.9666666666666667
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.92      0.96        13
   virginica       0.89      1.00      0.94         9

    accuracy                           0.97        32
   macro avg       0.96      0.97      0.97        32
weighted avg       0.97      0.97      0.97        32
Confusion Matrix:
[[10  0  0]
 [ 0 12  1]
 [ 0  0  9]]
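The figures in a classification report can be derived from the confusion matrix, where each row is a true class and each column a predicted class. The sketch below recomputes precision, recall, and accuracy from the matrix printed above; the class order (setosa, versicolor, virginica) is assumed to be the alphabetical order scikit-learn uses for string labels. Recomputing the figures this way is also a quick way to cross-check a printed report against its matrix.

```python
import numpy as np

# Assumed class order: alphabetical, as scikit-learn sorts string labels
labels = ['setosa', 'versicolor', 'virginica']

# Confusion matrix from the output: rows = true class, columns = predicted class
cm = np.array([[10, 0, 0],
               [0, 12, 1],
               [0, 0, 9]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in the class
accuracy = true_positives.sum() / cm.sum()   # correct / all test samples

for name, p, r in zip(labels, precision, recall):
    print(f"{name:>12}  precision={p:.2f}  recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```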
Output for the entire Iris dataset:
With all 150 rows loaded and printed, the output shows the dataset (pandas abbreviates the middle rows), followed by the accuracy, classification report, and confusion matrix.
First 150 rows of the Iris dataset:
     sepal_length  sepal_width  petal_length  petal_width    species
0             5.1          3.5           1.4          0.2     setosa
1             4.9          3.0           1.4          0.2     setosa
2             4.7          3.2           1.3          0.2     setosa
3             4.6          3.1           1.5          0.2     setosa
4             5.0          3.6           1.4          0.2     setosa
..            ...          ...           ...          ...        ...
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica
Accuracy: 0.9666666666666667
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.92      0.96        13
   virginica       0.89      1.00      0.94         9

    accuracy                           0.97        32
   macro avg       0.96      0.97      0.97        32
weighted avg       0.97      0.97      0.97        32
Confusion Matrix:
[[10  0  0]
 [ 0 12  1]
 [ 0  0  9]]
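STEP 1 imports StandardScaler, but the script never applies it, and the value n_neighbors=3 is fixed by hand. Because KNN relies on distances, features on larger scales can dominate the vote, so scaling is often worth trying. The sketch below is one possible extension, not part of the original project: it chains scaling and KNN in a pipeline and compares a few values of k with 5-fold cross-validation. It assumes the bundled scikit-learn copy of Iris; the helper `mean_cv_accuracy` and the candidate values of k are made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled scikit-learn copy of Iris stands in for the provided data
X, y = load_iris(return_X_y=True)


def mean_cv_accuracy(k, folds=5):
    """Mean cross-validated accuracy of scaled KNN with k neighbours."""
    # The pipeline refits the scaler on each training fold, so no test data leaks in
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return cross_val_score(model, X, y, cv=folds).mean()


# Candidate values of k chosen only for illustration
scores = {k: mean_cv_accuracy(k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: mean accuracy={score:.3f}")
print("Best k:", best_k)
```

Cross-validation averages over several splits, so it gives a steadier estimate than the single 80/20 split used in STEP 5.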
***THE END***