
logistic.ipynb - Colaboratory

The document discusses building a logistic regression model to predict diabetes using a diabetes dataset. It loads and explores the dataset, splits it into training and test sets, trains a logistic regression classifier on the training set, evaluates the model on the test set, and predicts on a new sample. Key steps include data preprocessing, training an LR model, calculating accuracy and other metrics, and predicting the probability of diabetes for a new data point.



import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns 
import pandas_profiling as pp 
import plotly.express as px

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report,accuracy_score,f1_score,precision_score,recall_score,roc_curve,roc_auc_score

%matplotlib inline

df = pd.read_csv('/content/diabetes2.csv')

df_temp = df.copy()

df.head()

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  ...
0            6      148             72             35        0  33.6  ...
1            1       85             66             29        0  26.6  ...
2            8      183             64              0        0  23.3  ...
3            1       89             66             23       94  28.1  ...

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

df.shape

(768, 9)

df["Outcome"].value_counts()

0 500
1 268
Name: Outcome, dtype: int64

df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

# The visualisation of outcome
sns.catplot(x="Outcome", kind="count", data=df_temp, palette="Set2")
plt.show()

# Count of patients by age, split by diabetes outcome
ax = sns.catplot(x="Age", kind="count",hue="Outcome",data=df_temp, palette="pastel", legend=False)
ax.fig.set_figwidth(20)
plt.legend(loc='upper right', labels= ["Non diabetic", "Diabetic"])
plt.show()

# Age Distribution
fig = px.histogram(df, x="Age",
                   marginal="box")
fig.show()


[Output: histogram of Age with a marginal box plot]

# Age distribution by Outcome 0
fig = px.histogram(df, x=df[df.Outcome==0].Age,
                   marginal="box",
                   color_discrete_sequence=['lightgreen'])
fig.show()

[Output: histogram of Age for non-diabetic patients (Outcome 0) with a marginal box plot]

# Age distribution by Outcome 1
fig = px.histogram(df, x=df[df.Outcome==1].Age,
                   marginal="box",
                   color_discrete_sequence=['purple'])
fig.show()


[Output: histogram of Age for diabetic patients (Outcome 1) with a marginal box plot]

# Glucose distribution by Outcome 1
fig = px.histogram(df, x=df[df.Outcome==1].Glucose,
                   marginal="box",
                   color_discrete_sequence=['#AB63FA'])
fig.show()

[Output: histogram of Glucose for diabetic patients (Outcome 1) with a marginal box plot]

# Average glucose for diabetic patients
df[df.Outcome==1].Glucose.mean()

141.25746268656715
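For comparison, the same statistic can be computed for non-diabetic patients in one line (this cell is not in the original notebook, so no output is shown):

# Average glucose for non-diabetic patients (Outcome == 0), for comparison
df[df.Outcome==0].Glucose.mean()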

x = df_temp.drop(['Outcome'], axis = 1)
y = df_temp.loc[:,"Outcome"].values

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 123)

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(514, 8)
(514,)
(254, 8)
(254,)
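MinMaxScaler is imported above but never applied. A minimal sketch of how it could be used to scale the features to [0, 1] before fitting; this is not part of the original workflow, and the unscaled x_train/x_test are what the later cells actually use:

# Hypothetical preprocessing step: scale each feature to [0, 1]
scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)  # fit the scaler on the training data only
x_test_scaled = scaler.transform(x_test)        # apply the same min/max to the test data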
lr = LogisticRegression(solver='liblinear', max_iter = 10)  # solver='liblinear' is required for Kaggle
lr.fit(x_train, y_train)


/usr/local/lib/python3.9/dist-packages/sklearn/svm/_base.py:1244: ConvergenceWarning:
Liblinear failed to converge, increase the number of iterations.

▾ LogisticRegression
LogisticRegression(max_iter=10, solver='liblinear')
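The warning appears because max_iter=10 gives liblinear too few iterations to converge on this data. A minimal sketch of one way to address it, simply raising the iteration budget (scaling the features, as sketched earlier, would also typically help); the results below still come from the original max_iter=10 model:

# Sketch: refit with a larger iteration budget so liblinear can converge
lr_converged = LogisticRegression(solver='liblinear', max_iter=1000)
lr_converged.fit(x_train, y_train)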

x_pred = lr.predict(x_train)

from sklearn.metrics import confusion_matrix
confusion_matrix(y_train, x_pred)

array([[311,  32],
       [120,  51]])

#train score
score = accuracy_score(y_train, x_pred)
score

0.7042801556420234
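As a sanity check, this value can be recovered directly from the training confusion matrix above: the diagonal entries are the correct predictions.

# Training accuracy from the confusion matrix: correct predictions / all predictions
(311 + 51) / (311 + 32 + 120 + 51)  # 362 / 514 ≈ 0.7043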

y_pred = lr.predict(x_test)

confusion_matrix(y_pred, y_test)  # note: arguments here are (y_pred, y_test), the transpose of the usual (y_true, y_pred)

array([[143,  61],
       [ 14,  36]])

cm1 = confusion_matrix(y_test, y_pred)
sns.heatmap(cm1, annot=True, fmt=".0f")
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('Accuracy Score: {0}'.format(score), size = 15)
plt.show()
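Note that `score` in the title above is still the training accuracy computed a few cells earlier. A small sketch of titling the heatmap with the test-set accuracy instead (not in the original notebook):

# Sketch: label the test-set confusion matrix with the test accuracy
test_score = accuracy_score(y_test, y_pred)
sns.heatmap(cm1, annot=True, fmt=".0f")
plt.xlabel('Predicted Values')
plt.ylabel('Actual Values')
plt.title('Test Accuracy Score: {0}'.format(test_score), size=15)
plt.show()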

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.70      0.91      0.79       157
           1       0.72      0.37      0.49        97

    accuracy                           0.70       254
   macro avg       0.71      0.64      0.64       254
weighted avg       0.71      0.70      0.68       254
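The class-1 row of the report follows directly from cm1, which (with rows as actual values and columns as predicted values) is the transpose of the array printed two cells above, i.e. [[143, 14], [61, 36]]:

# Recover the class-1 metrics from the test confusion matrix cm1 = [[TN, FP], [FN, TP]]
TP, FP, FN = 36, 14, 61
precision_1 = TP / (TP + FP)                                          # 36 / 50 = 0.72
recall_1    = TP / (TP + FN)                                          # 36 / 97 ≈ 0.37
f1_1        = 2 * precision_1 * recall_1 / (precision_1 + recall_1)   # ≈ 0.49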

# Define a new sample (expected to be diabetic)
data = [[5, 150, 33.7, 50, 150, 74, 0.5, 53]]  # note: the BloodPressure (33.7) and BMI (74) values look swapped relative to the column order below

# Create the pandas DataFrame with the same feature columns used for training
df_test = pd.DataFrame(data, columns = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age'])

# Predict on new data
res = lr.predict(df_test)
res

array([1])
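The summary above also mentions the probability of diabetes for a new data point; the fitted model can report it with predict_proba (this cell is not in the original notebook, so no output is shown):

# Class probabilities for the new sample: [P(Outcome = 0), P(Outcome = 1)]
lr.predict_proba(df_test)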
