K Nearest Neighbour - Jupyter Notebook

This Jupyter Notebook demonstrates K-Nearest Neighbors (KNN) classification on the Iris dataset. It loads and explores the data, plots setosa against versicolor, and splits the rows into training and test sets. It then fits a KNeighborsClassifier, which labels a new sample according to its nearest training examples, evaluates the model with a confusion matrix and a classification report, and tunes n_neighbors and weights with GridSearchCV.

Uploaded by

AYUSH KUMAR

4/28/23, 3:12 PM K Nearest Neighbour - Jupyter Notebook

localhost:8888/notebooks/K Nearest Neighbour.ipynb 1/24



In [1]: import pandas as pd
        import numpy as np
        from sklearn.datasets import load_iris
        iris = load_iris()

In [2]: iris.feature_names

Out[2]: ['sepal length (cm)',
         'sepal width (cm)',
         'petal length (cm)',
         'petal width (cm)']

In [3]: iris.target_names

Out[3]: array(['setosa', 'versicolor', 'virginica'], dtype='<U10')


In [4]: df = pd.DataFrame(iris.data, columns=iris.feature_names)
        df.head()

Out[4]:    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
        0                5.1               3.5                1.4               0.2
        1                4.9               3.0                1.4               0.2
        2                4.7               3.2                1.3               0.2
        3                4.6               3.1                1.5               0.2
        4                5.0               3.6                1.4               0.2

In [5]: df.shape

Out[5]: (150, 4)

In [6]: df['target'] = iris.target
        df.head()

Out[6]:    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
        0                5.1               3.5                1.4               0.2       0
        1                4.9               3.0                1.4               0.2       0
        2                4.7               3.2                1.3               0.2       0
        3                4.6               3.1                1.5               0.2       0
        4                5.0               3.6                1.4               0.2       0


In [7]: df[df.target == 1].head()

Out[7]:     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
        50                7.0               3.2                4.7               1.4       1
        51                6.4               3.2                4.5               1.5       1
        52                6.9               3.1                4.9               1.5       1
        53                5.5               2.3                4.0               1.3       1
        54                6.5               2.8                4.6               1.5       1

In [8]: df[df.target == 2].head()

Out[8]:      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
        100                6.3               3.3                6.0               2.5       2
        101                5.8               2.7                5.1               1.9       2
        102                7.1               3.0                5.9               2.1       2
        103                6.3               2.9                5.6               1.8       2
        104                6.5               3.0                5.8               2.2       2

In [9]: df['flower_name'] = df.target.apply(lambda x: iris.target_names[x])
        df.head()

Out[9]:    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target  flower_name
        0                5.1               3.5                1.4               0.2       0       setosa
        1                4.9               3.0                1.4               0.2       0       setosa
        2                4.7               3.2                1.3               0.2       0       setosa
        3                4.6               3.1                1.5               0.2       0       setosa
        4                5.0               3.6                1.4               0.2       0       setosa


In [10]: df0 = df[:50]      # setosa
         df1 = df[50:100]   # versicolor
         df2 = df[100:]     # virginica

In [11]: import matplotlib.pyplot as plt

Sepal Length vs Sepal Width (Setosa vs Versicolor)

In [12]: plt.xlabel('sepal Length')
         plt.ylabel('sepal Width')
         plt.scatter(df0['sepal length (cm)'], df0['sepal width (cm)'], color="green", marker='+')
         plt.scatter(df1['sepal length (cm)'], df1['sepal width (cm)'], color="blue", marker='.')

Out[12]: <matplotlib.collections.PathCollection at 0x13d9b12d8b0>

Petal Length vs Petal Width (Setosa vs Versicolor)


In [13]: plt.xlabel('petal Length')
         plt.ylabel('petal Width')
         plt.scatter(df0['petal length (cm)'], df0['petal width (cm)'], color="green", marker='+')
         plt.scatter(df1['petal length (cm)'], df1['petal width (cm)'], color="blue", marker='.')

Out[13]: <matplotlib.collections.PathCollection at 0x13d9b238eb0>

In [14]: from sklearn.model_selection import train_test_split

In [15]: X = df.drop(['target', 'flower_name'], axis='columns')
         y = df.target

In [16]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

In [17]: len(X_train)

Out[17]: 120


In [18]: len(X_test)

Out[18]: 30
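The split above is purely random, so the three species need not be equally represented in the 30 test rows (the confusion matrix later in the notebook reports 11, 13, and 6 samples per class). A minimal standalone sketch of a stratified alternative follows; it assumes the same split parameters and adds only `stratify=y`, which the notebook itself does not use. It reloads the data with `load_iris(return_X_y=True)` so that it runs on its own.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split parameters as the notebook, plus stratify=y (an addition,
# not part of the original cell) so each species keeps its one-third
# share in both the training and the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

test_counts = Counter(y_test.tolist())
print(sorted(test_counts.items()))
```

With balanced classes like these, a stratified test set gives every species the same number of rows, which makes per-class metrics easier to compare.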

Create KNN

In [19]: from sklearn.neighbors import KNeighborsClassifier
         knn = KNeighborsClassifier()
         knn.fit(X_train, y_train)

Out[19]: KNeighborsClassifier()
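With default arguments, KNeighborsClassifier uses 5 neighbours, Euclidean distance, and an unweighted majority vote. The sketch below is a plain-Python illustration of that idea, not scikit-learn's implementation; the function name `knn_predict` and the toy data are assumptions made for this example (the six training rows reuse petal measurements from the `df.head()` outputs above).

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy rows in (petal length, petal width) order:
# three setosa rows (label 0) and three versicolor rows (label 1).
train_points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_labels = [0, 0, 0, 1, 1, 1]

# Hypothetical query flowers, one near each cluster.
prediction_small = knn_predict(train_points, train_labels, (1.6, 0.3), k=3)
prediction_large = knn_predict(train_points, train_labels, (4.4, 1.3), k=3)
print(prediction_small, prediction_large)  # prints "0 1"
```

There is no training step beyond storing the data; all the work happens at prediction time, which is why `fit` is fast and `predict` grows slower as the training set grows.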

In [20]: knn.score(X_test, y_test)

Out[20]: 1.0

In [21]: from sklearn.metrics import confusion_matrix
         y_pred = knn.predict(X_test)
         cm = confusion_matrix(y_test, y_pred)
         cm

Out[21]: array([[11,  0,  0],
                [ 0, 13,  0],
                [ 0,  0,  6]], dtype=int64)
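In this matrix, rows are true classes and columns are predicted classes, so every off-diagonal zero means no test sample was misclassified. As a small worked example, the summary metrics can be recomputed by hand from the matrix; the values are copied from Out[21] and the variable names are chosen for this sketch.

```python
# Confusion matrix copied from Out[21]:
# rows are true classes, columns are predicted classes.
cm = [
    [11, 0, 0],
    [0, 13, 0],
    [0, 0, 6],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = sum(cm[i][i] for i in range(n_classes)) / total

# Recall for class i: diagonal entry over its row sum (all true members of i).
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Precision for class i: diagonal entry over its column sum (all predictions of i).
precision = [
    cm[i][i] / sum(cm[r][i] for r in range(n_classes)) for i in range(n_classes)
]

# Support: number of true samples per class (the row sums).
support = [sum(row) for row in cm]

print(accuracy, precision, recall, support)
```

The row sums (11, 13, 6) are the per-class supports, and they add up to the 30 test rows.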


In [22]: import matplotlib.pyplot as plt
         import seaborn as sn
         plt.figure(figsize=(7, 5))
         sn.heatmap(cm, annot=True)
         plt.xlabel('Predicted')
         plt.ylabel('Truth')

Out[22]: Text(42.0, 0.5, 'Truth')


In [23]: from sklearn.metrics import classification_report
         print(classification_report(y_test, y_pred))

                       precision    recall  f1-score   support

                    0       1.00      1.00      1.00        11
                    1       1.00      1.00      1.00        13
                    2       1.00      1.00      1.00         6

             accuracy                           1.00        30
            macro avg       1.00      1.00      1.00        30
         weighted avg       1.00      1.00      1.00        30

In [24]: from sklearn.model_selection import GridSearchCV
         k_range = list(range(1, 31))
         print(k_range)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]

In [25]: param_grid = dict(n_neighbors=k_range)
         print(param_grid)

{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]}

In [26]: # 10-fold cross-validated grid search over the n_neighbors candidates
         grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)
         grid.fit(X, y)

Out[26]: GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
                      param_grid={'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                                                  13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
                                                  23, 24, 25, 26, 27, 28, 29, 30]},
                      scoring='accuracy')
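For each candidate value of n_neighbors, the grid search fits and scores the classifier on 10 train/validation folds and records the mean accuracy. As a rough standalone sketch of what a single grid point involves, the same evaluation can be done by hand with `cross_val_score`. The choice of k = 13 here is only an example; for a classifier with an integer `cv`, scikit-learn uses stratified folds in both cases.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One grid point evaluated by hand: 10-fold cross-validated accuracy
# for a single, example value of n_neighbors.
fold_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=13), X, y, cv=10, scoring="accuracy"
)
mean_score = fold_scores.mean()

print(len(fold_scores))      # one accuracy value per fold
print(round(mean_score, 4))  # the mean is what the grid search records
```

GridSearchCV repeats this for every entry in the parameter grid and keeps the per-candidate means in `cv_results_['mean_test_score']`.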


In [27]: grid_mean_scores = grid.cv_results_['mean_test_score']
         print(grid_mean_scores)

[0.96       0.95333333 0.96666667 0.96666667 0.96666667 0.96666667
 0.96666667 0.96666667 0.97333333 0.96666667 0.96666667 0.97333333
 0.98       0.97333333 0.97333333 0.97333333 0.97333333 0.98
 0.97333333 0.98       0.96666667 0.96666667 0.97333333 0.96
 0.96666667 0.96       0.96666667 0.95333333 0.95333333 0.95333333]

In [28]: plt.plot(k_range, grid_mean_scores)
         plt.xlabel('Value of K for KNN')
         plt.ylabel('Cross-Validated Accuracy')

Out[28]: Text(0, 0.5, 'Cross-Validated Accuracy')

In [29]: print(grid.best_score_)
         print(grid.best_params_)
         print(grid.best_estimator_)

0.9800000000000001
{'n_neighbors': 13}
KNeighborsClassifier(n_neighbors=13)
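The reported best value k = 13 is not the only candidate that reaches 0.98. Judging by the rounded values printed in Out[27], several values of k tie for the top score, and the search appears to report the first of them in grid order. The short check below works only from those printed numbers (copied by hand, so ties are judged at print precision) and lists every tied k.

```python
import math

# Mean cross-validated accuracies copied from Out[27], for k = 1..30.
mean_scores = [
    0.96, 0.95333333, 0.96666667, 0.96666667, 0.96666667, 0.96666667,
    0.96666667, 0.96666667, 0.97333333, 0.96666667, 0.96666667, 0.97333333,
    0.98, 0.97333333, 0.97333333, 0.97333333, 0.97333333, 0.98,
    0.97333333, 0.98, 0.96666667, 0.96666667, 0.97333333, 0.96,
    0.96666667, 0.96, 0.96666667, 0.95333333, 0.95333333, 0.95333333,
]
k_values = list(range(1, 31))

best_score = max(mean_scores)
# Every k whose printed mean accuracy matches the maximum.
tied_ks = [k for k, s in zip(k_values, mean_scores) if math.isclose(s, best_score)]

print(best_score, tied_ks)  # prints "0.98 [13, 18, 20]"
```

Since these candidates are indistinguishable on this metric, the choice among them is a judgment call; a larger k gives a smoother decision boundary at the cost of more averaging across neighbours.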


In [30]: k_range = list(range(1, 31))
         weight_options = ['uniform', 'distance']
         param_grid = dict(n_neighbors=k_range, weights=weight_options)
         print(param_grid)

{'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 'weights': ['uniform', 'distance']}
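The two weights options differ in how the k neighbours vote: 'uniform' counts each neighbour once, while 'distance' weights each neighbour by the inverse of its distance, so closer points count for more. The sketch below illustrates the difference in plain Python. The helper `weighted_vote` and the neighbourhood are hypothetical, and it assumes all distances are strictly positive (a real implementation has to handle a zero distance separately).

```python
from collections import defaultdict


def weighted_vote(neighbors, weights="uniform"):
    """Pick a label from (distance, label) pairs of the k nearest points."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        # 'uniform': every neighbour adds 1.
        # 'distance': every neighbour adds 1 / distance (assumes distance > 0).
        totals[label] += 1.0 if weights == "uniform" else 1.0 / distance
    return max(totals, key=totals.get)


# Hypothetical neighbourhood: one very close point of class 1
# and two more distant points of class 2.
neighbors = [(0.1, 1), (0.8, 2), (0.9, 2)]

uniform_label = weighted_vote(neighbors, "uniform")
distance_label = weighted_vote(neighbors, "distance")
print(uniform_label, distance_label)  # prints "2 1"
```

With uniform voting the two class-2 points outvote the single class-1 point; with distance weighting the very close class-1 point dominates.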

In [31]: grid = GridSearchCV(knn, param_grid, cv=10, scoring='accuracy', return_train_score=False)
         grid.fit(X, y)

Out[31]: GridSearchCV(cv=10, estimator=KNeighborsClassifier(),
                      param_grid={'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                                                  13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
                                                  23, 24, 25, 26, 27, 28, 29, 30],
                                  'weights': ['uniform', 'distance']},
                      scoring='accuracy')


In [32]: pd.DataFrame(grid.cv_results_)[['mean_test_score', 'std_test_score', 'params']]

Out[32]:     mean_test_score  std_test_score  params
          0         0.960000        0.053333  {'n_neighbors': 1, 'weights': 'uniform'}
          1         0.960000        0.053333  {'n_neighbors': 1, 'weights': 'distance'}
          2         0.953333        0.052068  {'n_neighbors': 2, 'weights': 'uniform'}
          3         0.960000        0.053333  {'n_neighbors': 2, 'weights': 'distance'}
          4         0.966667        0.044721  {'n_neighbors': 3, 'weights': 'uniform'}
          5         0.966667        0.044721  {'n_neighbors': 3, 'weights': 'distance'}
          6         0.966667        0.044721  {'n_neighbors': 4, 'weights': 'uniform'}
          7         0.966667        0.044721  {'n_neighbors': 4, 'weights': 'distance'}
          8         0.966667        0.044721  {'n_neighbors': 5, 'weights': 'uniform'}
          9         0.966667        0.044721  {'n_neighbors': 5, 'weights': 'distance'}
         10         0.966667        0.044721  {'n_neighbors': 6, 'weights': 'uniform'}
         11         0.966667        0.044721  {'n_neighbors': 6, 'weights': 'distance'}
         12         0.966667        0.044721  {'n_neighbors': 7, 'weights': 'uniform'}
         13         0.966667        0.044721  {'n_neighbors': 7, 'weights': 'distance'}
         14         0.966667        0.044721  {'n_neighbors': 8, 'weights': 'uniform'}
         15         0.966667        0.044721  {'n_neighbors': 8, 'weights': 'distance'}
         16         0.973333        0.032660  {'n_neighbors': 9, 'weights': 'uniform'}
         17         0.973333        0.032660  {'n_neighbors': 9, 'weights': 'distance'}
         18         0.966667        0.044721  {'n_neighbors': 10, 'weights': 'uniform'}
         19         0.973333        0.032660  {'n_neighbors': 10, 'weights': 'distance'}
         20         0.966667        0.044721  {'n_neighbors': 11, 'weights': 'uniform'}
         21         0.973333        0.032660  {'n_neighbors': 11, 'weights': 'distance'}
         22         0.973333        0.032660  {'n_neighbors': 12, 'weights': 'uniform'}
         23         0.973333        0.044222  {'n_neighbors': 12, 'weights': 'distance'}
         24         0.980000        0.030551  {'n_neighbors': 13, 'weights': 'uniform'}
         25         0.973333        0.032660  {'n_neighbors': 13, 'weights': 'distance'}
         26         0.973333        0.044222  {'n_neighbors': 14, 'weights': 'uniform'}
         27         0.973333        0.032660  {'n_neighbors': 14, 'weights': 'distance'}
         28         0.973333        0.032660  {'n_neighbors': 15, 'weights': 'uniform'}
         29         0.980000        0.030551  {'n_neighbors': 15, 'weights': 'distance'}
         30         0.973333        0.032660  {'n_neighbors': 16, 'weights': 'uniform'}
         31         0.973333        0.032660  {'n_neighbors': 16, 'weights': 'distance'}
         32         0.973333        0.032660  {'n_neighbors': 17, 'weights': 'uniform'}
         33         0.980000        0.030551  {'n_neighbors': 17, 'weights': 'distance'}
         34         0.980000        0.030551  {'n_neighbors': 18, 'weights': 'uniform'}
         35         0.973333        0.032660  {'n_neighbors': 18, 'weights': 'distance'}
         36         0.973333        0.032660  {'n_neighbors': 19, 'weights': 'uniform'}
         37         0.980000        0.030551  {'n_neighbors': 19, 'weights': 'distance'}
         38         0.980000        0.030551  {'n_neighbors': 20, 'weights': 'uniform'}
         39         0.966667        0.044721  {'n_neighbors': 20, 'weights': 'distance'}
         40         0.966667        0.033333  {'n_neighbors': 21, 'weights': 'uniform'}
         41         0.966667        0.044721  {'n_neighbors': 21, 'weights': 'distance'}
         42         0.966667        0.033333  {'n_neighbors': 22, 'weights': 'uniform'}
         43         0.966667        0.044721  {'n_neighbors': 22, 'weights': 'distance'}
         44         0.973333        0.032660  {'n_neighbors': 23, 'weights': 'uniform'}
         45         0.973333        0.032660  {'n_neighbors': 23, 'weights': 'distance'}
         46         0.960000        0.044222  {'n_neighbors': 24, 'weights': 'uniform'}
         47         0.973333        0.032660  {'n_neighbors': 24, 'weights': 'distance'}
         48         0.966667        0.033333  {'n_neighbors': 25, 'weights': 'uniform'}
         49         0.973333        0.032660  {'n_neighbors': 25, 'weights': 'distance'}
         50         0.960000        0.044222  {'n_neighbors': 26, 'weights': 'uniform'}
         51         0.966667        0.044721  {'n_neighbors': 26, 'weights': 'distance'}
         52         0.966667        0.044721  {'n_neighbors': 27, 'weights': 'uniform'}
         53         0.980000        0.030551  {'n_neighbors': 27, 'weights': 'distance'}
         54         0.953333        0.042687  {'n_neighbors': 28, 'weights': 'uniform'}
         55         0.973333        0.032660  {'n_neighbors': 28, 'weights': 'distance'}
         56         0.953333        0.042687  {'n_neighbors': 29, 'weights': 'uniform'}
         57         0.973333        0.032660  {'n_neighbors': 29, 'weights': 'distance'}
         58         0.953333        0.042687  {'n_neighbors': 30, 'weights': 'uniform'}
         59         0.966667        0.033333  {'n_neighbors': 30, 'weights': 'distance'}

In [33]: print(grid.best_score_)
         print(grid.best_params_)

0.9800000000000001
{'n_neighbors': 13, 'weights': 'uniform'}
