Diabetes Prediction - Logistic Regression - Jupyter Notebook

This notebook applies logistic regression to a diabetes dataset. It loads the data, standardizes the features, splits both the raw and the standardized features into training and test sets, fits one logistic regression model to each, and compares their test-set accuracy. It also draws a correlation heatmap and plots a confusion matrix. The model trained on standardized features scores slightly higher (0.766 vs. 0.760 accuracy).

Uploaded by saravanakumar
Copyright © All Rights Reserved

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, plot_confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

In [2]:

data = pd.read_csv('diabetes.csv')
data.head()

Out[2]:

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction
0            6      148             72             35        0  33.6                     0.627
1            1       85             66             29        0  26.6                      0.35
2            8      183             64              0        0  23.3                     0.672
3            1       89             66             23       94  28.1                     0.167
4            0      137             40             35      168  43.1                     2.288

In [3]:

data.groupby('Outcome').mean()

Out[3]:

         Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin        BMI  Diab
Outcome
0           3.298000  109.980000      68.184000      19.664000   68.792000  30.304200
1           4.865672  141.257463      70.824627      22.164179  100.335821  35.142537

In [4]:

X=data.drop(columns='Outcome')
y=data['Outcome']
print(X.shape,y.shape)

(768, 8) (768,)

The standard score of a sample x is calculated as:

z = (x - u) / s

where u is the mean of the training samples (or zero if with_mean=False) and s is the standard deviation of the training samples (or one if with_std=False).
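To check the formula above by hand, the sketch below standardizes a small made-up matrix column by column with NumPy and compares the result with StandardScaler. The toy values are hypothetical, not taken from diabetes.csv. One detail worth knowing: StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default for std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix: four samples (rows), two features (columns).
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [6.0, 40.0]])

u = X_toy.mean(axis=0)       # per-feature mean
s = X_toy.std(axis=0)        # per-feature population std (ddof=0), as StandardScaler uses
z_manual = (X_toy - u) / s   # z = (x - u) / s, applied column-wise

z_sklearn = StandardScaler().fit_transform(X_toy)

print(np.allclose(z_manual, z_sklearn))        # True: same result as StandardScaler
print(np.allclose(z_manual.mean(axis=0), 0))   # True: every column is centred
print(np.allclose(z_manual.std(axis=0), 1))    # True: every column has unit std
```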

In [5]:

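# Note: the scaler is fit on all 768 rows here, before the train/test split in In [6],
# so the test rows also contribute to the mean and standard deviation used for scaling.
scaler=StandardScaler()
X_standard=scaler.fit_transform(X)
print(X_standard)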

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]

In [6]:

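# The same random_state makes both calls select exactly the same rows.
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=2)
X_standard_train,X_standard_test,y_standard_train,y_standard_test=train_test_split(X_standard,y,random_state=2)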
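Because In [5] fits the scaler before this split, the test rows influence the u and s used to scale the training data. A minimal sketch of the leakage-free order, assuming synthetic random data in place of diabetes.csv: split first, fit StandardScaler on the training rows only, then apply that same fitted scaler to both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y (assumption: diabetes.csv is not loaded here).
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=20.0, size=(768, 8))
y = rng.integers(0, 2, size=768)

# Split first, with the same random_state as In [6] ...
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# ... then learn u and s from the training rows only,
# and reuse those same statistics to transform the test rows.
scaler = StandardScaler().fit(X_train)
X_standard_train = scaler.transform(X_train)
X_standard_test = scaler.transform(X_test)

print(X_standard_train.shape, X_standard_test.shape)  # (576, 8) (192, 8)
```

With the default test_size=0.25, 768 rows split into 576 training and 192 test rows, the same sizes the notebook works with.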

In [7]:

lr=LogisticRegression()
lr_standard=LogisticRegression()

In [8]:

lr.fit(X_train,y_train)
lr_standard.fit(X_standard_train,y_standard_train)

C:\Users\SUPER\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[8]:

LogisticRegression()
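The ConvergenceWarning most likely comes from the unscaled model lr: the default lbfgs solver stopped at its default limit of 100 iterations on raw features whose scales differ widely. The warning names two remedies, and the sketch below shows both on synthetic, deliberately unscaled data (an assumption standing in for diabetes.csv). Putting the scaler and classifier in one pipeline also guarantees the scaler is fit only on the data passed to fit.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, deliberately unscaled stand-in for X and y (not the notebook's data).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(3.0, 3.0, size=768),      # small-scale feature
    rng.normal(120.0, 30.0, size=768),   # large-scale feature
    rng.normal(80.0, 100.0, size=768),   # widely spread feature
])
y = (X[:, 1] + rng.normal(0.0, 20.0, size=768) > 120.0).astype(int)

# Remedy 1: keep the raw features and allow lbfgs more than the default 100 iterations.
lr_more_iter = LogisticRegression(max_iter=1000).fit(X, y)
print("raw features, iterations used:", lr_more_iter.n_iter_[0])

# Remedy 2: standardize inside a pipeline so lbfgs sees well-scaled features.
lr_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
with warnings.catch_warnings():
    # Escalate ConvergenceWarning to an error so a non-converged fit cannot pass silently.
    warnings.simplefilter("error", ConvergenceWarning)
    lr_pipeline.fit(X, y)
print("scaled features, iterations used:", lr_pipeline[-1].n_iter_[0])
```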
In [9]:

y_pred = lr.predict(X_test)
y_standard_pred = lr_standard.predict(X_standard_test)
# y_standard_test equals y_test because both splits in In [6] used random_state=2
print(accuracy_score(y_test,y_pred))
print(accuracy_score(y_test,y_standard_pred))

0.7604166666666666

0.765625
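Both scores are fractions of the same test set. With train_test_split's default test_size=0.25, 768 rows give 192 test rows, so the two accuracies correspond to 146 and 147 correct predictions: the standardized model gets exactly one more test sample right. A short check of that arithmetic, plus the equivalence between accuracy_score and the mean of element-wise matches on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

n_test = 192  # 768 rows * 0.25, train_test_split's default test_size

# Accuracy is simply (correct predictions) / (test samples).
print(146 / n_test)  # 0.7604166666666666, the unscaled model's score
print(147 / n_test)  # 0.765625, the standardized model's score

# accuracy_score equals the mean of element-wise matches (made-up labels):
y_true = np.array([0, 1, 1, 0, 1])
y_hat = np.array([0, 1, 0, 0, 1])
print(accuracy_score(y_true, y_hat), np.mean(y_true == y_hat))  # 0.8 0.8
```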

In [10]:

plt.figure(dpi=150)
sns.heatmap(data.corr(),annot=True)

Out[10]:

<AxesSubplot:>
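data.corr() returns the pairwise Pearson correlation matrix that sns.heatmap draws. To read off which features move most with the label, one can take the Outcome column of that matrix and sort it. A minimal sketch on a made-up three-column frame; the column names feature_a and feature_b are hypothetical.

```python
import pandas as pd

# Made-up data: feature_a increases as Outcome does, feature_b is its mirror image.
df = pd.DataFrame({
    "feature_a": [1, 3, 5, 7, 9],
    "feature_b": [5, 4, 3, 2, 1],
    "Outcome":   [0, 0, 1, 1, 1],
})

corr = df.corr()  # Pearson correlation, the same kind of matrix the heatmap visualizes
# Correlation of each feature with the label, strongest positive first.
corr_with_outcome = corr["Outcome"].drop("Outcome").sort_values(ascending=False)
print(corr_with_outcome)
```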
In [11]:

# display_labels follow class order [0, 1]; plot_confusion_matrix needs scikit-learn < 1.2
plot_confusion_matrix(lr, X_test, y_test, display_labels=['Non-diabetic', 'Diabetic'])

Out[11]:

<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x22b8a65d4f0>
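plot_confusion_matrix draws the matrix returned by confusion_matrix: rows are true classes and columns are predicted classes, both in sorted label order [0, 1], so display_labels must list the class-0 name first. The sketch below uses hypothetical labels (not the notebook's predictions) to show how the four cells unpack and how the already-imported classification_report summarizes them. The commented lines show the replacement plotting call available from scikit-learn 1.0 onward.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: 0 = non-diabetic, 1 = diabetic (not the notebook's results).
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tn, fp, fn, tp = cm.ravel()            # row-major order for binary labels [0, 1]
print(cm)                              # [[2 2]
                                       #  [1 3]]
print(classification_report(y_true, y_pred, target_names=["Non-diabetic", "Diabetic"]))

# Plotting equivalent of the notebook's cell on scikit-learn >= 1.0:
# from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay.from_estimator(lr, X_test, y_test,
#                                       display_labels=["Non-diabetic", "Diabetic"])
```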
