
ML Practical 3D

Uploaded by Samir Bhosale
Copyright: © All Rights Reserved

Assignment No: 03

Name: Bhosale Samir Shamkant
Roll no: CO407
Class: BE COMP

Title: Implement the K-Nearest Neighbors (KNN) algorithm on the diabetes.csv dataset.

Compute the confusion matrix, accuracy, error rate, precision, and recall on the given dataset. Dataset link:
https://www.kaggle.com/datasets/abdallamahgoub/diabetes
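For reference, the requested metrics can be written in terms of the four cells of a binary confusion matrix (TP, TN, FP, FN), assuming Outcome = 1 is treated as the positive class, which is the default `pos_label` of scikit-learn's `precision_score` and `recall_score`:

```latex
\begin{aligned}
\text{Accuracy}   &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Error rate} &= 1 - \text{Accuracy} = \frac{FP + FN}{TP + TN + FP + FN} \\
\text{Precision}  &= \frac{TP}{TP + FP} \\
\text{Recall}     &= \frac{TP}{TP + FN}
\end{aligned}
```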

Import the libraries


In [1]: import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

Import the dataset


In [2]: df = pd.read_csv("diabetes.csv")

In [3]: df.head()

Out[3]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  Pedigree  Age  Outcome
0            6      148             72             35        0  33.6     0.627   50        1
1            1       85             66             29        0  26.6     0.351   31        0
2            8      183             64              0        0  23.3     0.672   32        1
3            1       89             66             23       94  28.1     0.167   21        0
4            0      137             40             35      168  43.1     2.288   33        1


Preprocess the dataset
In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype
 0   Pregnancies    768 non-null    int64
 1   Glucose        768 non-null    int64
 2   BloodPressure  768 non-null    int64
 3   SkinThickness  768 non-null    int64
 4   Insulin        768 non-null    int64
 5   BMI            768 non-null    float64
 6   Pedigree       768 non-null    float64
 7   Age            768 non-null    int64
 8   Outcome        768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
In [5]: df.shape

Out[5]: (768, 9)

In [6]: df.columns

Out[6]: Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'Pedigree',
       'Age', 'Outcome'],
      dtype='object')

In [7]: df.describe()

Out[7]:
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI    Pedigree         Age
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000  768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578    0.471876   33.240885
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160    0.331329   11.760232
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000    0.078000   21.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000    0.243750   24.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000    0.372500   29.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000    0.626250   41.000000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000    2.420000   81.000000

In [8]: df.isna().sum()

Out[8]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
Pedigree 0
Age 0
Outcome 0
dtype: int64

In [9]: # Features and target variable


X = df.drop('Outcome', axis=1)
y = df['Outcome']
In [10]: df.head()

Out[10]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  Pedigree  Age  Outcome
0            6      148             72             35        0  33.6     0.627   50        1
1            1       85             66             29        0  26.6     0.351   31        0
2            8      183             64              0        0  23.3     0.672   32        1
3            1       89             66             23       94  28.1     0.167   21        0
4            0      137             40             35      168  43.1     2.288   33        1

In [11]: # Split the dataset into training and testing sets


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
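With a float `test_size`, scikit-learn's `train_test_split` rounds the test-set size up and assigns the remaining rows to training, so 30% of 768 rows gives 231 test rows and 537 training rows. That 231 matches the total of the confusion-matrix cells printed in the evaluation step (119 + 32 + 37 + 43). The sketch below reproduces only the size arithmetic in plain Python; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Hypothetical helper: mimic the size rule for a float test_size.

    The test count is rounded up (ceil) and every remaining row goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(768, 0.3)
print(n_train, n_test)  # 537 231

# Cells of the confusion matrix printed later in this notebook: [[119, 32], [37, 43]]
print(119 + 32 + 37 + 43 == n_test)  # True
```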
In [12]: # Standardize the features (zero mean, unit variance); the scaler is fitted on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
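`StandardScaler` applies the z-score z = (x - mean) / std column by column, with the mean and population standard deviation learned from the training data. `fit_transform` learns them from `X_train`, and `transform` reuses them on `X_test`, so no test-set statistics leak into the model. A minimal NumPy sketch of the same idea, assuming a toy array built from the Glucose and Pedigree values of rows 0 to 3 of `df.head()` (training) and row 4 (test); the helper names are hypothetical:

```python
import numpy as np


def fit_standardizer(X_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Learn per-column mean and population std (ddof=0) from training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Guard against constant columns; StandardScaler likewise uses a scale of 1 there
    std[std == 0] = 1.0
    return mean, std


def apply_standardizer(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply z = (x - mean) / std using previously learned statistics."""
    return (X - mean) / std


# Toy data: [Glucose, Pedigree] for rows 0-3 (train) and row 4 (test) of df.head()
X_train = np.array([[148.0, 0.627], [85.0, 0.351], [183.0, 0.672], [89.0, 0.167]])
X_test = np.array([[137.0, 2.288]])

mean, std = fit_standardizer(X_train)
X_train_scaled = apply_standardizer(X_train, mean, std)
X_test_scaled = apply_standardizer(X_test, mean, std)  # reuses the training statistics

print(X_train_scaled.mean(axis=0))  # each column close to 0
print(X_train_scaled.std(axis=0))   # each column close to 1
print(X_test_scaled)
```

Without this step, Glucose (values in the hundreds) would dominate the Euclidean distances KNN relies on, while Pedigree (values below 3) would barely matter.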

In [13]: # Initialize and train KNN classifier


k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train_scaled, y_train)

Out[13]:
KNeighborsClassifier()
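With its default settings (`weights='uniform'`, Minkowski metric with `p=2`, i.e. Euclidean distance), `KNeighborsClassifier(n_neighbors=5)` labels a point by a majority vote among the 5 training points closest to it; `fit` mostly stores the scaled training data, and the distance computations happen at prediction time. The NumPy sketch below illustrates that rule with a brute-force search. It is not scikit-learn's implementation (which may use tree-based neighbor search), the function name `knn_predict` is hypothetical, and ties here simply go to the smaller label.

```python
import numpy as np


def knn_predict(X_train: np.ndarray, y_train: np.ndarray,
                X_query: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force k-NN: Euclidean distance + unweighted majority vote."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k nearest training points for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        # bincount counts votes per integer label; argmax picks the most
        # frequent label (on a tie, the smaller label wins)
        preds.append(np.bincount(y_train[idx]).argmax())
    return np.array(preds)


# Hypothetical toy data: two clusters in an already-scaled 2-D feature space
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [-0.1, 0.0],
                    [3.0, 3.0], [3.1, 2.9], [2.9, 3.2], [3.2, 3.1]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_query = np.array([[0.05, 0.1], [3.0, 3.1]])

print(knn_predict(X_train, y_train, X_query, k=5))  # [0 1]
```

Each cluster holds only 4 points, so with k = 5 every vote includes one point from the other cluster and the prediction still follows the 4-to-1 majority. An odd k avoids exact ties in a two-class problem such as Outcome.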

In [14]: # Predict on the test set


y_pred = knn.predict(X_test_scaled)

In [15]: # Compute evaluation metrics


conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
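For binary labels 0 and 1, scikit-learn's `confusion_matrix(y_test, y_pred)` places true labels on the rows and predicted labels on the columns, so its layout is [[TN, FP], [FN, TP]]. The metrics above can therefore be recomputed by hand from the four counts. The sketch below does that for the matrix this notebook prints, [[119, 32], [37, 43]], treating Outcome = 1 as the positive class; `binary_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def binary_metrics(cm):
    """Accuracy, error rate, precision and recall from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,  # same as 1 - accuracy
        "precision": tp / (tp + fp),      # share of predicted positives that are truly positive
        "recall": tp / (tp + fn),         # share of actual positives that were detected
    }


metrics = binary_metrics([[119, 32], [37, 43]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, this gives 0.7013, 0.2987, 0.5733 and 0.5375, matching the Accuracy, Error Rate, Precision and Recall values printed by the notebook. Recall is the weakest of the four: the model misses 37 of the 80 positive cases in the test set.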

In [16]: # Print metrics
print("Confusion Matrix:")

Confusion Matrix:

In [17]: print(conf_matrix)

[[119  32]
 [ 37  43]]

In [18]: print("Accuracy:", accuracy)

Accuracy: 0.7012987012987013

In [19]: print("Error Rate:", error_rate)

Error Rate: 0.2987012987012987

In [20]: print("Precision:", precision)

Precision: 0.5733333333333334

In [21]: print("Recall:", recall)

Recall: 0.5375
