
205 Vaishnavi Nilawar

Logistic Regression for Heart Disease Classification

1. Importing required Libraries

In [1]:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sn
from sklearn.metrics import confusion_matrix
import matplotlib.mlab as mlab
%matplotlib inline

2. Importing the Dataset

In [2]:

df = pd.read_csv('heart_disease.csv')
df.drop(['education'], axis=1, inplace=True)
df.rename(columns={'male': 'sex_male'}, inplace=True)
df.head()

Out[2]:

sex_male age currentSmoker cigsPerDay BPMeds prevalentStroke prevalentHyp diabetes …

0 1 39 0 0.0 0.0 0 0

1 0 46 0 0.0 0.0 0 0

2 1 48 1 20.0 0.0 0 0

3 0 61 1 30.0 0.0 0 1

4 0 46 1 23.0 0.0 0 0

3. Checking for Null values


In [3]:

df.isnull().sum()

Out[3]:

sex_male 0

age 0

currentSmoker 0

cigsPerDay 29

BPMeds 53

prevalentStroke 0

prevalentHyp 0

diabetes 0

totChol 50

sysBP 0

diaBP 0

BMI 19

heartRate 1

glucose 388

TenYearCHD 0

dtype: int64

In [4]:

count = 0
for i in df.isnull().sum(axis=1):
    if i > 0:
        count = count + 1
print('Total number of rows with missing values is', count)

Total number of rows with missing values is 489
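A quick sanity check (a sketch, not part of the original notebook) shows these 489 rows are a modest share of the dataset, so dropping them outright is a defensible choice:

In [ ]:

# Sketch: fraction of rows that dropna() will discard.
# Assumes df has not yet been modified, so len(df) is the full row count.
print('Fraction of rows with missing values: {:.1%}'.format(count / len(df)))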

4. Removing Null values


In [5]:

df.dropna(axis=0, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>

Int64Index: 3749 entries, 0 to 4237

Data columns (total 15 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 sex_male 3749 non-null int64

1 age 3749 non-null int64

2 currentSmoker 3749 non-null int64

3 cigsPerDay 3749 non-null float64

4 BPMeds 3749 non-null float64

5 prevalentStroke 3749 non-null int64

6 prevalentHyp 3749 non-null int64

7 diabetes 3749 non-null int64

8 totChol 3749 non-null float64

9 sysBP 3749 non-null float64

10 diaBP 3749 non-null float64

11 BMI 3749 non-null float64

12 heartRate 3749 non-null float64

13 glucose 3749 non-null float64

14 TenYearCHD 3749 non-null int64

dtypes: float64(8), int64(7)

memory usage: 468.6 KB

5. Data Visualization
In [6]:

def draw_histograms(dataframe, features, rows, cols):
    fig = plt.figure(figsize=(20, 20))
    for i, feature in enumerate(features):
        ax = fig.add_subplot(rows, cols, i + 1)
        dataframe[feature].hist(bins=20, ax=ax, facecolor='red')
        ax.set_title(feature + " Distribution", color='blue')
    fig.tight_layout()
    plt.show()

draw_histograms(df, df.columns, 6, 3)

In [7]:

sn.countplot(x='TenYearCHD', data=df)

Out[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x224600f92e0>
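The countplot shows that TenYearCHD is heavily imbalanced, with most patients in the negative class. A small sketch (not in the original notebook) quantifies this, which is useful later when judging the model's accuracy:

In [ ]:

# Sketch: class proportions behind the countplot above.
df.TenYearCHD.value_counts(normalize=True)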

6. Inserting a constant value

In [8]:

from statsmodels.tools import add_constant as add_constant

df_constant = add_constant(df)
df_constant.head()

Out[8]:

const sex_male age currentSmoker cigsPerDay BPMeds prevalentStroke prevalentHyp …

0 1.0 1 39 0 0.0 0.0 0 0

1 1.0 0 46 0 0.0 0.0 0 0

2 1.0 1 48 1 20.0 0.0 0 0

3 1.0 0 61 1 30.0 0.0 0 1

4 1.0 0 46 1 23.0 0.0 0 0
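For context (standard logistic-regression theory, not stated in the original notebook): sm.Logit does not add an intercept on its own, so the column of ones supplies the intercept term $\beta_0$ in

$$P(\mathrm{TenYearCHD} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$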


In [9]:

# Shim: chisqprob was removed from scipy.stats, but older statsmodels
# versions still call it when building the summary.
st.chisqprob = lambda chisq, df: st.chi2.sf(chisq, df)

cols = df_constant.columns[:-1]
model = sm.Logit(df.TenYearCHD, df_constant[cols])
result = model.fit()
result.summary()

Optimization terminated successfully.

Current function value: 0.377199

Iterations 7

Out[9]:

Logit Regression Results

Dep. Variable: TenYearCHD No. Observations: 3749

Model: Logit Df Residuals: 3734

Method: MLE Df Model: 14

Date: Sat, 11 Sep 2021 Pseudo R-squ.: 0.1169

Time: 11:39:42 Log-Likelihood: -1414.1

converged: True LL-Null: -1601.4

Covariance Type: nonrobust LLR p-value: 2.922e-71

coef std err z P>|z| [0.025 0.975]

const -8.6463 0.687 -12.577 0.000 -9.994 -7.299

sex_male 0.5740 0.107 5.343 0.000 0.363 0.785

age 0.0640 0.007 9.787 0.000 0.051 0.077

currentSmoker 0.0732 0.155 0.473 0.636 -0.230 0.376

cigsPerDay 0.0184 0.006 3.003 0.003 0.006 0.030

BPMeds 0.1446 0.232 0.622 0.534 -0.311 0.600

prevalentStroke 0.7191 0.489 1.471 0.141 -0.239 1.677

prevalentHyp 0.2146 0.136 1.574 0.116 -0.053 0.482

diabetes 0.0025 0.312 0.008 0.994 -0.609 0.614

totChol 0.0022 0.001 2.074 0.038 0.000 0.004

sysBP 0.0153 0.004 4.080 0.000 0.008 0.023

diaBP -0.0039 0.006 -0.619 0.536 -0.016 0.009

BMI 0.0103 0.013 0.820 0.412 -0.014 0.035

heartRate -0.0023 0.004 -0.550 0.583 -0.010 0.006

glucose 0.0076 0.002 3.408 0.001 0.003 0.012


Several predictors in the full model (currentSmoker, BPMeds, prevalentStroke, prevalentHyp, diabetes, diaBP, BMI and heartRate) have p-values above 0.05, so backward elimination is used to drop the least significant feature one at a time:

In [10]:

def back_feature_elem(data_frame, dep_var, col_list):
    # Repeatedly refit the model and drop the predictor with the largest
    # p-value until every remaining p-value is below 0.05.
    while len(col_list) > 0:
        model = sm.Logit(dep_var, data_frame[col_list])
        result = model.fit(disp=0)
        largest_pvalue = round(result.pvalues, 3).nlargest(1)
        if largest_pvalue[0] < 0.05:
            return result
        else:
            col_list = col_list.drop(largest_pvalue.index)

result = back_feature_elem(df_constant, df.TenYearCHD, cols)

result.summary()

Out[10]:

Logit Regression Results

Dep. Variable: TenYearCHD No. Observations: 3749

Model: Logit Df Residuals: 3742

Method: MLE Df Model: 6

Date: Sat, 11 Sep 2021 Pseudo R-squ.: 0.1148

Time: 11:40:26 Log-Likelihood: -1417.6

converged: True LL-Null: -1601.4

Covariance Type: nonrobust LLR p-value: 2.548e-76

coef std err z P>|z| [0.025 0.975]

const -9.1211 0.468 -19.491 0.000 -10.038 -8.204

sex_male 0.5813 0.105 5.521 0.000 0.375 0.788

age 0.0654 0.006 10.330 0.000 0.053 0.078

cigsPerDay 0.0197 0.004 4.803 0.000 0.012 0.028

totChol 0.0023 0.001 2.099 0.036 0.000 0.004

sysBP 0.0174 0.002 8.166 0.000 0.013 0.022

glucose 0.0076 0.002 4.573 0.000 0.004 0.011


In [11]:

params = np.exp(result.params)
conf = np.exp(result.conf_int())
conf['OR'] = params
pvalue = round(result.pvalues, 3)
conf['pvalue'] = pvalue
conf.columns = ['CI 95%(2.5%)', 'CI 95%(97.5%)', 'Odds Ratio', 'pvalue']
print(conf)

CI 95%(2.5%) CI 95%(97.5%) Odds Ratio pvalue

const 0.000044 0.000274 0.000109 0.000

sex_male 1.454877 2.198166 1.788313 0.000

age 1.054409 1.080897 1.067571 0.000

cigsPerDay 1.011730 1.028128 1.019896 0.000

totChol 1.000150 1.004386 1.002266 0.036

sysBP 1.013299 1.021791 1.017536 0.000

glucose 1.004343 1.010895 1.007614 0.000
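A worked reading of this table (a sketch; the interpretation is standard, and the numbers come from the output above): each coefficient exponentiates into an odds ratio, so for age,

In [ ]:

# exp(coef) for age from the summary above: each additional year of age
# multiplies the odds of ten-year CHD by about 1.068, i.e. roughly +6.8%.
np.exp(0.0654)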

7. Splitting the data into training and test sets

In [15]:

import sklearn
from sklearn.model_selection import train_test_split

new_features = df[['age', 'sex_male', 'cigsPerDay', 'totChol', 'sysBP', 'glucose', 'TenYearCHD']]
x = new_features.iloc[:, :-1]
y = new_features.iloc[:, -1]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.20, random_state=5)
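A small check (not in the original notebook) to confirm the 80/20 split:

In [ ]:

# Sketch: verify the split sizes; with 3749 rows and test_size=0.20,
# expect roughly 2999 training rows and 750 test rows.
print(x_train.shape, x_test.shape)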

8. Fitting the Model to the training data

In [ ]:

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)
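Depending on the scikit-learn version, the default lbfgs solver may emit a ConvergenceWarning on these unscaled features; raising max_iter is a common workaround (an assumption on my part, not something the original notebook does):

In [ ]:

# Alternative fit (sketch): allow more iterations in case lbfgs
# does not converge on the unscaled features.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(x_train, y_train)
y_pred = logreg.predict(x_test)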

9. Calculating the accuracy

In [18]:

print("Model Accuracy:")

sklearn.metrics.accuracy_score(y_test,y_pred)

Model Accuracy:

Out[18]:

0.8706666666666667
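Since the countplot in section 5 showed TenYearCHD is heavily skewed toward the negative class, an accuracy near 0.87 is only somewhat better than always predicting the majority class. A confusion matrix (confusion_matrix was already imported in cell [1]) gives a fuller picture; a minimal sketch:

In [ ]:

# Sketch: per-class errors; rows are true classes, columns are predictions.
print(confusion_matrix(y_test, y_pred))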
