Diabetes Prediction Using Logistic Regression (Untitled.ipynb, Prajwal10031999/Diabetes-Prediction-using-Logistic-Regression, GitHub)

Prajwal10031999 / Diabetes-Prediction-using-Logistic-Regression


Diabetes-Prediction-using-Logistic-Regression / Untitled.ipynb

Prajwal10031999 Add files via upload 4 years ago

1305 lines (1305 loc) · 166 KB


In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
df1=pd.read_csv("diabetes.csv")

In [3]:
df1.head()

Out[3]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  …
0            6      148             72             35        0  33.6  …
1            1       85             66             29        0  26.6  …
2            8      183             64              0        0  23.3  …
3            1       89             66             23       94  28.1  …
4            0      137             40             35      168  43.1  …

In [4]:
df1.describe()

Out[4]:
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000
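The zero minimums above for Glucose, BloodPressure, SkinThickness, Insulin and BMI are physiologically implausible and most likely encode missing values. A quick count, sketched here on a hypothetical mini-frame (in the notebook the same check would run on df1):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of diabetes.csv;
# the notebook would apply the same expression to df1.
demo = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI": [33.6, 26.6, 23.3, 0.0],
})

# Count zeros per column to gauge how many "missing" readings each has
zero_counts = (demo[["Glucose", "BloodPressure", "BMI"]] == 0).sum()
print(zero_counts)
```

On the real dataset, columns with many zeros could then be imputed (e.g. with the column median) before fitting the model.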

In [5]:
sns.heatmap(df1.isnull(),yticklabels=False,cmap='viridis')

Out[5]: <AxesSubplot:>
In [6]:
sns.heatmap(df1,yticklabels=False,cmap='viridis')

Out[6]: <AxesSubplot:>

In [7]:
sns.set_style('whitegrid')

In [9]:
sns.countplot(x='Outcome',hue='Outcome',data=df1,palette='cubehelix')

Out[9]: <AxesSubplot:xlabel='Outcome', ylabel='count'>


In [10]:
plt.scatter(x='Outcome',y='Age',data=df1)
plt.ylabel('Age')
plt.xlabel('Outcome')

Out[10]: Text(0.5, 0, 'Outcome')

In [11]:
sns.distplot(df1['Age'],kde=False,color='darkblue',bins=30)

C:\Users\Hp\anaconda3\envs\car\lib\site-packages\seaborn\distributions.py:2551:
FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
Out[11]: <AxesSubplot:xlabel='Age'>

In [12]:
sns.distplot(df1['BloodPressure'],kde=False,color='royalblue',bins=20)

Out[12]: <AxesSubplot:xlabel='BloodPressure'>
In [13]:
sns.jointplot(x='Age',y='BloodPressure',data=df1)

Out[13]: <seaborn.axisgrid.JointGrid at 0x1f8b5d60f88>



In [14]:
import seaborn as sns
sns.set(style="whitegrid")
#tips=sns.load_dataset("diabetes.csv")
plt.figure(figsize=(15,8))

ax=sns.barplot(x="Age", y="BloodPressure", data=df1)


In [15]:
from sklearn.model_selection import train_test_split

In [16]:
df1.head()

Out[16]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  …
0            6      148             72             35        0  33.6  …
1            1       85             66             29        0  26.6  …
2            8      183             64              0        0  23.3  …
3            1       89             66             23       94  28.1  …
4            0      137             40             35      168  43.1  …

In [17]:
x=['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']

In [18]:
y=['Outcome']

In [19]:
df2=pd.DataFrame(data=df1)
df2.head()

Out[19]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  …
0            6      148             72             35        0  33.6  …
1            1       85             66             29        0  26.6  …
2            8      183             64              0        0  23.3  …
3            1       89             66             23       94  28.1  …
4            0      137             40             35      168  43.1  …

In [20]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(df1.drop('Outcome',axis=1),df1['Outcome'],test_size=0.2)  # 20% hold-out: 154 of 768 rows

In [21]:
X_test.head()

Out[21]:
     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  …
766            1      126             60              0        0  30.1  …
748            3      187             70             22      200  36.4  …
42             7      106             92             18        0  22.7  …
485            0      135             68             42      250  42.3  …
543            4       84             90             23       56  39.5  …

In [30]:
from sklearn.linear_model import LogisticRegression
LRModel=LogisticRegression(solver='lbfgs', max_iter=7600)
LRModel.fit(X_train,y_train)

Out[30]: LogisticRegression(max_iter=7600)
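The very large max_iter=7600 is needed mainly because the features sit on very different scales (Insulin up to 846, DiabetesPedigreeFunction below 3), which slows lbfgs convergence. A common alternative is to standardize inside a pipeline; a sketch on synthetic data (make_classification stands in for the notebook's df1 features):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic 8-feature stand-in for the diabetes data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling first lets the solver converge with the default iteration budget
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)
print(round(pipe.score(X, y), 2))
```

The pipeline also guarantees the same scaling is applied to any later test data or single-patient input.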

In [31]:
predictions_diabetes=LRModel.predict(X_test)

In [33]:
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test,predictions_diabetes))

              precision    recall  f1-score   support

           0       0.83      0.86      0.85       103
           1       0.70      0.65      0.67        51

    accuracy                           0.79       154
   macro avg       0.77      0.76      0.76       154
weighted avg       0.79      0.79      0.79       154
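confusion_matrix is imported above but never printed; it shows where the 0.79 accuracy comes from (which class the errors fall on). A minimal sketch on toy labels, standing in for y_test and predictions_diabetes:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for y_test / predictions_diabetes
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

In the notebook, `confusion_matrix(y_test, predictions_diabetes)` would give the per-class error counts behind the report above.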

In [58]:
# paitentid_54=pd.DataFrame([1,123,126,60,0,30.1,0.349,47],columns=x)
#Defining a sample data to test the model
x=['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
data=[0,170,126,60,35,30.1,0.649,78]
paitentid_54=pd.DataFrame([data],columns=x)
paitentid_54.head()

Out[58]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age
0            0      170            126             60       35  30.1                     0.649   78

In [59]:
df1.head()

Out[59]:
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  …
0            6      148             72             35        0  33.6  …
1            1       85             66             29        0  26.6  …
2            8      183             64              0        0  23.3  …
3            1       89             66             23       94  28.1  …
4            0      137             40             35      168  43.1  …


In [60]:
predictions_diabetes=LRModel.predict(paitentid_54)

In [61]:
print(predictions_diabetes)

[1]
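predict returns only the hard class label ([1] = diabetic here); predict_proba exposes the underlying probability the logistic model actually computes, which is more informative for a borderline patient. A sketch on synthetic data (make_classification stands in for the diabetes features and paitentid_54):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 8-feature stand-in for the diabetes data
X, y = make_classification(n_samples=200, n_features=8, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# One row in, one row of [P(class 0), P(class 1)] out
proba = model.predict_proba(X[:1])
print(proba.shape)
```

In the notebook, `LRModel.predict_proba(paitentid_54)` would show how confident the model is in the [1] above.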

In [ ]:
