KNN

The document outlines a KNN (K-Nearest Neighbors) classification example using the Iris dataset, focusing on sepal length and width for visualization. It includes steps for data loading, preprocessing, feature scaling, training the KNN model, and evaluating its performance using metrics like confusion matrix and classification report. The model achieved perfect accuracy on the test data with a confusion matrix showing no misclassifications.

Uploaded by

abdelazizasma80

10/04/2022 07:51 KNN

KNN
In [17]:

import matplotlib.pyplot as plt  # Matplotlib plotting interface
import numpy as np
import pandas as pd
import seaborn as sns
from collections import Counter

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import Normalizer

We are going to use the well-known Iris dataset.

Attributes (all in cm): sepal length, sepal width, petal length, petal width.
Classes: Iris Setosa, Iris Versicolour, Iris Virginica.

Using only two features, sepal length and sepal width, would make visualization easier; the code below nevertheless keeps all four features.

Load the Dataset

In [3]:

# Load the Iris dataset bundled with scikit-learn
iris = datasets.load_iris()

# np.c_ stacks its arguments column-wise, appending the target as a fifth column
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])
iris_df.head()

Out[3]:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2     0.0
1                4.9               3.0                1.4               0.2     0.0
2                4.7               3.2                1.3               0.2     0.0
3                4.6               3.1                1.5               0.2     0.0
4                5.0               3.6                1.4               0.2     0.0
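The target column above shows 0.0 rather than 0 because np.c_ builds a single array with one dtype. The following is a minimal sketch of that behaviour; the arrays `features` and `labels` are hypothetical stand-ins for iris['data'] and iris['target'].

```python
import numpy as np

# Hypothetical toy arrays standing in for iris['data'] and iris['target'].
features = np.array([[5.1, 3.5],
                     [4.9, 3.0],
                     [4.7, 3.2]])
labels = np.array([0, 0, 1])

# np.c_ treats the 1-D label vector as a column and appends it on the right.
combined = np.c_[features, labels]

print(combined.shape)   # (3, 3)
print(combined.dtype)   # float64: the integer labels are promoted to floats
```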

file:///C:/Users/pc/Downloads/KNN.html 1/4

In [4]:

iris_df.describe()

Out[4]:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.057333           3.758000          1.199333    1.000000
std             0.828066          0.435866           1.765298          0.762238    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
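The introduction mentions restricting the data to sepal length and sepal width. The notebook itself does not do this, so the following is only a sketch of how such a two-feature subset could be selected; the names `full_df`, `sepal_df`, and `class_means` are illustrative and do not appear in the original.

```python
from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
full_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Keep only the two sepal measurements (column names as returned by load_iris).
sepal_df = full_df[['sepal length (cm)', 'sepal width (cm)']]

# Per-class mean of each sepal feature: a quick look at how far apart the classes sit.
class_means = sepal_df.groupby(iris.target).mean()
class_means.index = iris.target_names
print(class_means)
```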

In [7]:

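# Rebuild the feature table and keep features and labels in separate DataFrames
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
x = pd.DataFrame(iris.data)
y = pd.DataFrame(iris.target)

# Shorter column names for the four features
x.columns = ['Sepal_Length', 'Sepal_width', 'Petal_Length', 'Petal_width']
x  # display the feature table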

Out[7]:

     Sepal_Length  Sepal_width  Petal_Length  Petal_width
0             5.1          3.5           1.4          0.2
1             4.9          3.0           1.4          0.2
2             4.7          3.2           1.3          0.2
3             4.6          3.1           1.5          0.2
4             5.0          3.6           1.4          0.2
...           ...          ...           ...          ...
145           6.7          3.0           5.2          2.3
146           6.3          2.5           5.0          1.9
147           6.5          3.0           5.2          2.0
148           6.2          3.4           5.4          2.3
149           5.9          3.0           5.1          1.8

150 rows × 4 columns


In [22]:

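---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-22-668994d18e71> in <module>
----> 1 X = dataset.iloc[:, :-1].values
      2 y = dataset.iloc[:, 4].values

NameError: name 'dataset' is not defined

This cell fails because no variable named dataset was ever created. The features and labels are already held in x and y, so the rest of the notebook does not depend on it.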

In [8]:

y.columns=['Targets']
y

Out[8]:

     Targets
0          0
1          0
2          0
3          0
4          0
...      ...
145        2
146        2
147        2
148        2
149        2

150 rows × 1 columns

In [20]:

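# Train/test split: hold out 20% of the rows for testing.
# No random_state is set, so the split (and the scores below) can change between runs.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.20)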
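The split above is random and unseeded. As a hedged variation, not what the notebook ran, the sketch below fixes a seed and stratifies by class so that each run produces the same, class-balanced split. The seed value 42 and the names `features`, `labels`, `X_tr`, `X_te`, `y_tr`, `y_te` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

features, labels = datasets.load_iris(return_X_y=True)

# random_state=42 is an arbitrary seed chosen for illustration.
# stratify=labels keeps the three classes equally represented in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.20, random_state=42, stratify=labels
)

print(X_tr.shape, X_te.shape)   # (120, 4) (30, 4)
print(np.bincount(y_te))        # [10 10 10]
```

Without stratification the test set can end up unbalanced, as in the 11/10/9 class counts reported in the evaluation output of this notebook.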

Feature Scaling

KNN classifies by distance, so a feature with a larger numeric range would otherwise dominate the result. Before making any predictions, it is good practice to scale the features so that each contributes comparably.

In [21]:

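from sklearn.preprocessing import StandardScaler

# Learn the mean and standard deviation from the training set only
scaler = StandardScaler()
scaler.fit(X_train)

# Apply the same transformation to both sets
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)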
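To make the scaling step concrete, here is a small sketch of what StandardScaler computes: each feature has its training mean subtracted and is divided by its training standard deviation. The arrays `train` and `test` are hypothetical toy data, not values from the Iris set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales.
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

mean = train.mean(axis=0)
std = train.std(axis=0)              # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std    # reuse the training statistics for the test set

scaler = StandardScaler().fit(train)
print(np.allclose(train_scaled, scaler.transform(train)))  # True
print(np.allclose(test_scaled, scaler.transform(test)))    # True
```

Reusing the training mean and standard deviation for the test set keeps information about the test data out of the preprocessing step.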


Training and Predictions

Training the KNN algorithm and making predictions with it is straightforward, especially with Scikit-Learn.

In [23]:

# Create the KNN classifier; n_neighbors is the number of neighbours that vote on each prediction
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
# ravel() turns the single-column y_train into the 1-D array that fit() expects,
# which avoids scikit-learn's DataConversionWarning
classifier.fit(X_train, y_train.values.ravel())

Out[23]:

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

The final step is to make predictions on our test data.

In [24]:

y_pred = classifier.predict(X_test)
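For intuition about what predict does, the following is a simplified sketch of KNN classification written with NumPy and Counter: measure the Euclidean distance from the query to every training point, take the k closest, and return the most common label among them. The function `knn_predict` and the toy arrays are hypothetical and are not scikit-learn's implementation; in particular, tie-breaking here may differ from the library's.

```python
from collections import Counter
import numpy as np

def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of one query point by majority vote of its k nearest neighbours."""
    distances = np.linalg.norm(train_X - query, axis=1)   # Euclidean distance to each training point
    nearest = np.argsort(distances)[:k]                   # indices of the k closest points
    votes = Counter(train_y[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters.
toy_X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(toy_X, toy_y, np.array([0.3, 0.3]), k=3))  # 0
print(knn_predict(toy_X, toy_y, np.array([4.8, 5.0]), k=3))  # 1
```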

Evaluating the Algorithm

The confusion matrix, precision, recall, and F1 score are the most commonly used metrics for evaluating a classifier. The confusion_matrix and classification_report functions in sklearn.metrics compute them.

In [25]:

from sklearn.metrics import classification_report, confusion_matrix


print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[11  0  0]
 [ 0 10  0]
 [ 0  0  9]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      1.00      1.00        10
           2       1.00      1.00      1.00         9

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
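Because every score in this run is 1.00, the report does not show how the metrics relate to the confusion matrix. The sketch below derives them by hand from a hypothetical matrix that contains a few errors; the matrix `cm` is made up for illustration and is not a result from this notebook.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
cm = np.array([[11, 0, 0],
               [0, 9, 1],
               [0, 2, 7]])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)   # column sums = number of predictions per class
recall = true_positives / cm.sum(axis=1)      # row sums = true count per class (support)
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

print(np.round(precision, 2))
print(np.round(recall, 2))
print(np.round(f1, 2))
print(round(float(accuracy), 2))   # 0.9
```

Row sums correspond to the support column of classification_report, and the diagonal holds the correctly classified samples.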
