Apply Logistic Regression Model Techniques To Predict Data On Any Dataset

5 - Jupyter Notebook http://localhost:8888/notebooks/Practicals_AI/5.ipynb

5. Apply Logistic Regression Model techniques to predict data on any dataset.

In [20]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

In [21]: df = pd.read_csv("blood_pressure.csv")

In [22]: df.head()

Out[22]:
   Patient_Number  Blood_Pressure_Abnormality  Level_of_Hemoglobin  Genetic_Pedigree_Coefficient
0               1                           1                11.28                          0.90
1               2                           0                 9.75                          0.23
2               3                           1                10.79                          0.91
3               4                           0                11.00                          0.43
4               5                           1                14.17                          0.83

In [23]: df.tail()

Out[23]:
      Patient_Number  Blood_Pressure_Abnormality  Level_of_Hemoglobin  Genetic_Pedigree_Coefficient
1995            1996                           1                10.14
1996            1997                           1                11.77
1997            1998                           1                16.91
1998            1999                           0                11.15
1999            2000                           1                11.36

In [24]: df.shape

Out[24]: (2000, 15)

1 of 5 30-10-2024, 22:07

In [25]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Patient_Number 2000 non-null int64
1 Blood_Pressure_Abnormality 2000 non-null int64
2 Level_of_Hemoglobin 2000 non-null float64
3 Genetic_Pedigree_Coefficient 1908 non-null float64
4 Age 2000 non-null int64
5 BMI 2000 non-null int64
6 Sex 2000 non-null int64
7 Pregnancy 442 non-null float64
8 Smoking 2000 non-null int64
9 Physical_activity 2000 non-null int64
10 salt_content_in_the_diet 2000 non-null int64
11 alcohol_consumption_per_day 1758 non-null float64
12 Level_of_Stress 2000 non-null int64
13 Chronic_kidney_disease 2000 non-null int64
14 Adrenal_and_thyroid_disorders 2000 non-null int64
dtypes: float64(4), int64(11)
memory usage: 234.5 KB

In [26]: df.isnull().sum()

Out[26]: Patient_Number 0
Blood_Pressure_Abnormality 0
Level_of_Hemoglobin 0
Genetic_Pedigree_Coefficient 92
Age 0
BMI 0
Sex 0
Pregnancy 1558
Smoking 0
Physical_activity 0
salt_content_in_the_diet 0
alcohol_consumption_per_day 242
Level_of_Stress 0
Chronic_kidney_disease 0
Adrenal_and_thyroid_disorders 0
dtype: int64
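Converting the counts in Out[26] into fractions of the 2000 rows makes the later decision (drop `Pregnancy`, impute the other two columns) easier to justify. A minimal sketch using the non-zero counts copied from Out[26]; the 50% cut-off is an illustrative assumption, not a rule stated in the notebook.

```python
import pandas as pd

# Non-zero missing-value counts copied from Out[26]; the dataset has 2000 rows.
n_rows = 2000
missing = pd.Series({
    "Genetic_Pedigree_Coefficient": 92,
    "Pregnancy": 1558,
    "alcohol_consumption_per_day": 242,
})

# Fraction of each column that is missing.
missing_frac = missing / n_rows
print(missing_frac.round(3))

# Assumed rule of thumb: drop a column if more than half of it is missing,
# otherwise keep it and impute the gaps.
THRESHOLD = 0.5
to_drop = missing_frac[missing_frac > THRESHOLD].index.tolist()
to_impute = missing_frac[missing_frac <= THRESHOLD].index.tolist()
print("drop:", to_drop)
print("impute:", to_impute)
```

On the real DataFrame, `df.isnull().mean()` returns the same fractions for every column in one call.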


In [53]: sns.countplot(x = df.Blood_Pressure_Abnormality)

Out[53]: <Axes: xlabel='Blood_Pressure_Abnormality', ylabel='count'>
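The countplot image did not survive the export, so a numeric check of class balance is useful. The full-dataset class counts are not shown anywhere in the notebook, so this sketch rebuilds labels from the test-set supports reported later in Out[40] (223 of class 0, 177 of class 1) and computes the accuracy of always predicting the majority class.

```python
import pandas as pd

# Labels rebuilt from the test-set supports in the classification report
# (223 rows of class 0, 177 rows of class 1).
y_test_labels = pd.Series([0] * 223 + [1] * 177, name="Blood_Pressure_Abnormality")

counts = y_test_labels.value_counts()
shares = y_test_labels.value_counts(normalize=True)
print(counts)
print(shares.round(4))

# Accuracy of a model that always predicts the majority class:
# the baseline any real classifier has to beat.
majority_baseline = shares.max()
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

Against this 0.5575 baseline, the model's test accuracy of 0.7175 is a real but moderate improvement. In the notebook, `df.Blood_Pressure_Abnormality.value_counts(normalize=True)` gives the full-dataset shares.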

In [28]: df['Genetic_Pedigree_Coefficient'] = df.Genetic_Pedigree_Coefficient.fillna(df

In [29]: df['alcohol_consumption_per_day'] = df['alcohol_consumption_per_day'].fillna(df

In [30]: df = df.drop(['Pregnancy'],axis=1)

In [31]: df = df.drop(['Patient_Number'],axis=1)
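The fill statistic in In [28] and In [29] is cut off in the export. The sketch below shows the same impute-then-drop sequence on a small hypothetical frame that reuses the notebook's column names; filling with the column median is an assumption (the notebook may use the mean or something else).

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the notebook's column names; the values are made up.
df = pd.DataFrame({
    "Patient_Number": [1, 2, 3, 4],
    "Genetic_Pedigree_Coefficient": [0.90, np.nan, 0.91, 0.43],
    "Pregnancy": [np.nan, np.nan, 1.0, np.nan],
    "alcohol_consumption_per_day": [205.0, 67.0, np.nan, 150.0],
})

# Assumption: fill gaps with each column's median, which is robust to outliers.
for col in ["Genetic_Pedigree_Coefficient", "alcohol_consumption_per_day"]:
    df[col] = df[col].fillna(df[col].median())

# Pregnancy is mostly missing and Patient_Number is only a row ID,
# so neither carries useful signal for the model.
df = df.drop(columns=["Pregnancy", "Patient_Number"])
print(df)
print(df.isnull().sum())
```

Computing the fill value on the whole frame before `train_test_split` lets test rows influence it slightly; fitting `sklearn.impute.SimpleImputer` on the training split only avoids that.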

In [32]: df.isnull().sum()

Out[32]: Blood_Pressure_Abnormality 0
Level_of_Hemoglobin 0
Genetic_Pedigree_Coefficient 0
Age 0
BMI 0
Sex 0
Smoking 0
Physical_activity 0
salt_content_in_the_diet 0
alcohol_consumption_per_day 0
Level_of_Stress 0
Chronic_kidney_disease 0
Adrenal_and_thyroid_disorders 0
dtype: int64

In [33]: X = df.drop(['Blood_Pressure_Abnormality'],axis=1)
y = df['Blood_Pressure_Abnormality']

In [34]: X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state
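The split call above is truncated after `random_state`. Because the two classes are not equally frequent, passing `stratify=y` keeps the class ratio identical in the training and test sets. A sketch on synthetic data from `make_classification` (the synthetic dataset, its roughly 55/45 class weights, and `random_state=0` are illustrative assumptions); 12 features matches the 13 columns left after the drops minus the target.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y: 2000 rows, 12 features, roughly 55/45 classes.
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.55, 0.45],
                           random_state=0)

# stratify=y preserves the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)
print("train share of class 1:", round(float(y_train.mean()), 3))
print("test share of class 1:", round(float(y_test.mean()), 3))
```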


In [35]: lg = LogisticRegression(C=1438.44988828766,
                                 max_iter=100,
                                 penalty='l2',
                                 solver='liblinear')

In [36]: lg.fit(X_train,y_train)

Out[36]: LogisticRegression(C=1438.44988828766, solver='liblinear')
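Two details of In [35] are worth noting. `StandardScaler` is imported in In [20] but never applied, even though the features sit on very different scales. And `C` is the inverse of the regularization strength, so a value near 1438 means an almost unpenalized fit. That value equals `np.logspace(-4, 4, 20)[17]`, which suggests, though the notebook does not confirm it, that it came from a grid search. The sketch below scales inside a `Pipeline` and searches that same grid; the `make_classification` data is an assumption standing in for the blood-pressure features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); replace with the notebook's X and y.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Putting the scaler in the pipeline means it is fit on training folds only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(penalty="l2", solver="liblinear", max_iter=100)),
])

# Search C (inverse regularization strength) over a 20-point log grid.
param_grid = {"clf__C": np.logspace(-4, 4, 20)}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best C:", search.best_params_["clf__C"])
print("test accuracy:", round(search.score(X_test, y_test), 4))
```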



In [37]: y_pred = lg.predict(X_test)

In [38]: from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, precision_score, recall_score, roc_auc_score

In [39]: confusion_matrix(y_test,y_pred)

Out[39]: array([[159,  64],
                [ 49, 128]], dtype=int64)

In [40]: print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.76      0.71      0.74       223
           1       0.67      0.72      0.69       177

    accuracy                           0.72       400
   macro avg       0.72      0.72      0.72       400
weighted avg       0.72      0.72      0.72       400
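Every number in the classification report follows from the confusion matrix in Out[39]. scikit-learn puts true classes in rows and predicted classes in columns, so precision divides the diagonal by column sums and recall divides it by row sums. A small numpy sketch reproducing the report:

```python
import numpy as np

# Confusion matrix from Out[39]: rows = true class, columns = predicted class.
cm = np.array([[159, 64],
               [49, 128]])

support = cm.sum(axis=1)                   # true count per class: [223, 177]
precision = np.diag(cm) / cm.sum(axis=0)   # correct / predicted, per class
recall = np.diag(cm) / support             # correct / actual, per class
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()         # (159 + 128) / 400

for label in (0, 1):
    print(label, round(precision[label], 2), round(recall[label], 2),
          round(f1[label], 2), support[label])
print("accuracy", accuracy)
print("macro avg", np.round([precision.mean(), recall.mean(), f1.mean()], 2))
```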

In [41]: recall = 128 / (128 + 49)    # TP / (TP + FN), counts from Out[39]

In [42]: FPR = 64 / (64 + 159)    # FP / (FP + TN), counts from Out[39]

In [43]: round(FPR, 4)

Out[43]: 0.287
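Typing counts into arithmetic by hand is error-prone: without parentheses Python divides before it adds, so `64/64+159` evaluates to 160.0 rather than a rate. Unpacking the matrix with `ravel()` avoids both the precedence slip and copying numbers by hand. A minimal sketch with the counts from Out[39]:

```python
import numpy as np

cm = np.array([[159, 64],
               [49, 128]])

# Division binds tighter than addition: this is (64 / 64) + 159, not a rate.
wrong_fpr = 64/64+159

# For a binary confusion matrix, ravel() yields the cells as TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
tpr = tp / (tp + fn)   # recall / sensitivity
fpr = fp / (fp + tn)   # false positive rate = 1 - specificity
tnr = tn / (tn + fp)   # specificity

print(wrong_fpr)
print(round(tpr, 4), round(fpr, 4), round(tnr, 4))
```

On the live model, `tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()` gives the same four counts without retyping them.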

In [44]: accuracy_score(y_test,y_pred)

Out[44]: 0.7175

In [45]: precision_score(y_test,y_pred)

Out[45]: 0.6666666666666666


In [46]: recall_score(y_test,y_pred)

Out[46]: 0.7231638418079096

In [47]: from sklearn.metrics import roc_curve

In [48]: pred_proba = lg.predict_proba(X_test)

In [49]: roc_auc_score(y_test,y_pred)*100

Out[49]: 71.80841630564213
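In [49] passes the hard 0/1 predictions `y_pred` to `roc_auc_score`, not the probabilities computed in In [48]. With only two distinct score values the ROC curve has a single interior point, so the "AUC" collapses to (TPR + TNR) / 2, which is balanced accuracy at the 0.5 threshold. The sketch rebuilds labels that reproduce the Out[39] matrix and confirms that both metrics give the 71.81 shown above; the usual AUC would come from `roc_auc_score(y_test, pred_proba[:, 1])`.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Labels rebuilt to reproduce Out[39]: 159 TN, 64 FP, 49 FN, 128 TP.
y_true = np.array([0] * 223 + [1] * 177)
y_hard = np.array([0] * 159 + [1] * 64 + [0] * 49 + [1] * 128)

auc_hard = roc_auc_score(y_true, y_hard)
bal_acc = balanced_accuracy_score(y_true, y_hard)

# Both equal (128/177 + 159/223) / 2.
print(round(auc_hard * 100, 4), round(bal_acc * 100, 4))
```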

In [50]: fpr,tpr,ther = roc_curve(y_test,pred_proba[:,-1])

In [51]: plt.plot(fpr, tpr, c='g')
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.grid()
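The ROC plot above has no chance diagonal and no AUC annotation. A sketch that adds both, scores with `predict_proba(...)[:, 1]`, and marks the threshold that maximizes Youden's J (TPR minus FPR), which is one common way, not the only one, to choose a cut-off from the thresholds array (the notebook's `ther`). Synthetic `make_classification` data is an assumption standing in for the notebook's `X_test` and `y_test`.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); in the notebook use lg, X_test, y_test.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score with the positive-class probability, not the hard 0/1 prediction.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

# Youden's J = TPR - FPR; its maximum is one candidate decision threshold.
best = (tpr - fpr).argmax()
best_threshold = thresholds[best]

plt.plot(fpr, tpr, c="g", label=f"logistic regression (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "k--", label="chance")
plt.scatter(fpr[best], tpr[best], c="r", zorder=3,
            label=f"max Youden J, threshold = {best_threshold:.2f}")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.legend(loc="lower right")
plt.grid()
plt.show()
```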
