Lecture-11-K Nearest Neighbors-Part2 - Jupyter Notebook

This document is a Jupyter Notebook detailing the implementation of the K Nearest Neighbors (KNN) algorithm using Python. It covers data import, standardization, train-test splitting, model training, and evaluation, including the use of confusion matrices and classification reports. The notebook also discusses selecting an optimal K value using the elbow method.


Lecture 11-K Nearest Neighbour-Part 2

K Nearest Neighbors with Python

Import Libraries
In [10]: 1 import pandas as pd
2 import seaborn as sns
3 import matplotlib.pyplot as plt
4 import numpy as np
5 %matplotlib inline

Get the Data


In [11]: 1 df = pd.read_csv('Downloads/KNN_Project_Data')

In [12]: 1 df.head()

Out[12]:
          XVPM         GWYH         TRAT        TLLZ         IGGA         HYKR         EDFS  ...

0  1636.670614   817.988525  2565.995189  358.347163   550.417491  1618.870897  2147.641254  ...

1  1013.402760   577.587332  2644.141273  280.428203  1161.873391  2084.107872   853.404981  ...

2  1300.035501   820.518697  2025.854469  525.562292   922.206261  2552.355407   818.676686  ...

3  1059.347542  1066.866418   612.000041  480.827789   419.467495   685.666983   852.867810  ...

4  1018.340526  1313.679056   950.622661  724.742174   843.065903  1370.554164   905.469453  ...

Standardize the Variables


In [4]: 1 from sklearn.preprocessing import StandardScaler

In [5]: 1 scaler = StandardScaler()

In [13]: 1 scaler.fit(df.drop('TARGET CLASS',axis=1))

Out[13]: StandardScaler()


In [16]: 1 scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))

In [20]: 1 df_feat = pd.DataFrame(scaled_features,columns=df.columns[:-1])
2 df_feat.head()

Out[20]:
XVPM GWYH TRAT TLLZ IGGA HYKR EDFS GUUB MGJM

0 1.568522 -0.443435 1.619808 -0.958255 -1.128481 0.138336 0.980493 -0.932794 1.008313

1 -0.112376 -1.056574 1.741918 -1.504220 0.640009 1.081552 -1.182663 -0.461864 0.258321

2 0.660647 -0.436981 0.775793 0.213394 -0.053171 2.030872 -1.240707 1.149298 2.184784

3 0.011533 0.191324 -1.433473 -0.100053 -1.507223 -1.753632 -1.183561 -0.888557 0.162310

4 -0.099059 0.820815 -0.904346 1.609015 -0.282065 -0.365099 -1.095644 0.391419 -1.365603
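
As a side note, the fit and transform steps above can be combined; a minimal sketch (not part of the original notebook), assuming the same DataFrame df:

    # Equivalent one-liner: fit the scaler and transform the features in one call
    scaled_features = StandardScaler().fit_transform(df.drop('TARGET CLASS', axis=1))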

Train Test Split


In [21]: 1 from sklearn.model_selection import train_test_split

In [22]: 1 X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['TARGET CLASS'],
2                                                      test_size=0.30)
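
Note that train_test_split shuffles the rows randomly, so the exact numbers shown below will vary from run to run. A minimal sketch (not part of the original notebook) of a reproducible split, using a hypothetical seed value of 101:

    X_train, X_test, y_train, y_test = train_test_split(
        scaled_features, df['TARGET CLASS'], test_size=0.30, random_state=101)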

Using KNN
Remember that we are trying to come up with a model that predicts whether an observation falls into the TARGET CLASS (1) or not (0). We'll start with k=1.
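
For intuition, here is a minimal sketch (not part of the original notebook) of what KNeighborsClassifier does for a single query point, assuming plain Euclidean distance and a simple majority vote among the k nearest training points:

    def knn_predict_one(X_train, y_train, query, k=1):
        # Distance from the query point to every training point
        distances = np.linalg.norm(X_train - query, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote among the labels of those k neighbours
        labels, counts = np.unique(np.asarray(y_train)[nearest], return_counts=True)
        return labels[np.argmax(counts)]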

In [23]: 1 from sklearn.neighbors import KNeighborsClassifier

In [24]: 1 knn = KNeighborsClassifier(n_neighbors=1)

In [25]: 1 knn.fit(X_train,y_train)

Out[25]: KNeighborsClassifier(n_neighbors=1)

In [26]: 1 pred = knn.predict(X_test)

Predictions and Evaluations


Let's evaluate our KNN model!

In [27]: 1 from sklearn.metrics import classification_report,confusion_matrix


In [28]: 1 print(confusion_matrix(y_test,pred))

[[109 45]
[ 33 113]]

In [29]: 1 print(classification_report(y_test,pred))

              precision    recall  f1-score   support

           0       0.77      0.71      0.74       154
           1       0.72      0.77      0.74       146

    accuracy                           0.74       300
   macro avg       0.74      0.74      0.74       300
weighted avg       0.74      0.74      0.74       300
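
As a sanity check (not part of the original notebook), the 0.74 accuracy reported above is simply the fraction of correct predictions, which can be read straight off the confusion matrix:

    # Unpack the 2x2 confusion matrix and recompute accuracy by hand
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    print((tn + tp) / (tn + fp + fn + tp))   # (109 + 113) / 300 ≈ 0.74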

Choosing a K Value
Let's go ahead and use the elbow method to pick a good K Value:

In [30]: 1 error_rate = []
2 
3 # Will take some time
4 for i in range(1,40):
5 
6     knn = KNeighborsClassifier(n_neighbors=i)
7     knn.fit(X_train,y_train)
8     pred_i = knn.predict(X_test)
9     error_rate.append(np.mean(pred_i != y_test))


In [31]: 1 plt.figure(figsize=(10,6))
2 plt.plot(range(1,40),error_rate,color='blue', linestyle='dashed', marker='o',
3          markerfacecolor='red', markersize=10)
4 plt.title('Error Rate vs. K Value')
5 plt.xlabel('K')
6 plt.ylabel('Error Rate')

Out[31]: Text(0, 0.5, 'Error Rate')
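
A possible follow-up (not part of the original notebook): besides reading the "elbow" off the plot, the K with the lowest error rate on this particular test split can be picked programmatically:

    # K values start at 1, so shift the argmin index by one
    best_k = int(np.argmin(error_rate)) + 1
    print(best_k, error_rate[best_k - 1])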


In [32]: 1 # FIRST A QUICK COMPARISON TO OUR ORIGINAL K=1
2 knn = KNeighborsClassifier(n_neighbors=1)
3 
4 knn.fit(X_train,y_train)
5 pred = knn.predict(X_test)
6 
7 print('WITH K=1')
8 print('\n')
9 print(confusion_matrix(y_test,pred))
10 print('\n')
11 print(classification_report(y_test,pred))

WITH K=1

[[109 45]
[ 33 113]]

              precision    recall  f1-score   support

           0       0.77      0.71      0.74       154
           1       0.72      0.77      0.74       146

    accuracy                           0.74       300
   macro avg       0.74      0.74      0.74       300
weighted avg       0.74      0.74      0.74       300


In [35]: 1 # NOW WITH K=23
2 knn = KNeighborsClassifier(n_neighbors=23)
3 
4 knn.fit(X_train,y_train)
5 pred = knn.predict(X_test)
6 
7 print('WITH K=23')
8 print('\n')
9 print(confusion_matrix(y_test,pred))
10 print('\n')
11 print(classification_report(y_test,pred))

WITH K=23

[[114 40]
[ 20 126]]

              precision    recall  f1-score   support

           0       0.85      0.74      0.79       154
           1       0.76      0.86      0.81       146

    accuracy                           0.80       300
   macro avg       0.80      0.80      0.80       300
weighted avg       0.81      0.80      0.80       300
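
One caveat: the elbow method above scores every K on the same single train/test split, so the chosen K depends on that split. A minimal sketch (not part of the original notebook) of a more robust alternative, assuming scikit-learn's cross_val_score with 5 folds over the scaled features:

    from sklearn.model_selection import cross_val_score

    cv_scores = []
    for k in range(1, 40):
        knn = KNeighborsClassifier(n_neighbors=k)
        # Mean accuracy across 5 cross-validation folds of the full dataset
        scores = cross_val_score(knn, scaled_features, df['TARGET CLASS'], cv=5)
        cv_scores.append(scores.mean())

    # Highest mean accuracy corresponds to the lowest cross-validated error
    best_k = int(np.argmax(cv_scores)) + 1
    print(best_k, cv_scores[best_k - 1])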

