Brain Stroke Prediction Using ML - Jupyter Notebook

7/25/23, 8:16 PM brain stroke prediction using ml - Jupyter Notebook

In [1]:

!pip install matplotlib

Requirement already satisfied: matplotlib in c:\users\91809\downloads\anaconda\lib\site-packages (3.7.0)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (1.0.5)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (4.25.0)
Requirement already satisfied: pillow>=6.2.0 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (9.4.0)
Requirement already satisfied: numpy>=1.20 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (1.23.5)
Requirement already satisfied: cycler>=0.10 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (3.0.9)
Requirement already satisfied: packaging>=20.0 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (22.0)
Requirement already satisfied: kiwisolver>=1.0.1 in c:\users\91809\downloads\anaconda\lib\site-packages (from matplotlib) (1.4.4)
Requirement already satisfied: six>=1.5 in c:\users\91809\downloads\anaconda\lib\site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)

In [2]:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.rcParams['figure.figsize'] = (5, 5)

In [5]:

data=pd.read_csv(r"C:\Users\91809\OneDrive\Desktop\healthcare-dataset-stroke-data.csv")

localhost:8888/notebooks/brain stroke prediction using ml.ipynb 1/17



In [6]:

data

Out[6]:

         id  gender   age  hypertension  heart_disease  ever_married      work_type  Residence_type  avg_gl
0      9046    Male  67.0             0              1           Yes        Private           Urban
1     51676  Female  61.0             0              0           Yes  Self-employed           Rural
2     31112    Male  80.0             0              1           Yes        Private           Rural
3     60182  Female  49.0             0              0           Yes        Private           Urban
4      1665  Female  79.0             1              0           Yes  Self-employed           Rural
...     ...     ...   ...           ...            ...           ...            ...             ...
5105  18234  Female  80.0             1              0           Yes        Private           Urban
5106  44873  Female  81.0             0              0           Yes  Self-employed           Urban

#exploratory analysis

In [7]:

data.shape

Out[7]:

(5110, 12)


In [8]:

data.info  # without parentheses this only displays the bound method; data.info() prints the column summary

Out[8]:

<bound method DataFrame.info of          id  gender   age  hypertension  heart_disease ever_married  \
0      9046    Male  67.0             0              1          Yes
1     51676  Female  61.0             0              0          Yes
2     31112    Male  80.0             0              1          Yes
3     60182  Female  49.0             0              0          Yes
4      1665  Female  79.0             1              0          Yes
...     ...     ...   ...           ...            ...          ...
5105  18234  Female  80.0             1              0          Yes
5106  44873  Female  81.0             0              0          Yes
5107  19723  Female  35.0             0              0          Yes
5108  37544    Male  51.0             0              0          Yes
5109  44679  Female  44.0             0              0          Yes

          work_type Residence_type  avg_glucose_level   bmi   smoking_status  \
0           Private          Urban             228.69  36.6  formerly smoked
1     Self-employed          Rural             202.21   NaN     never smoked
2           Private          Rural             105.92  32.5     never smoked
3           Private          Urban             171.23  34.4           smokes
4     Self-employed          Rural             174.12  24.0     never smoked
...             ...            ...                ...   ...              ...
5105        Private          Urban              83.75   NaN     never smoked
5106  Self-employed          Urban             125.20  40.0     never smoked
5107  Self-employed          Rural              82.99  30.6     never smoked
5108        Private          Rural             166.29  25.6  formerly smoked
5109       Govt_job          Urban              85.28  26.2          Unknown

      stroke
0          1
1          1
2          1
3          1
4          1
...      ...
5105       0
5106       0
5107       0
5108       0
5109       0

[5110 rows x 12 columns]>


In [9]:

data.isnull().sum()

Out[9]:

id 0
gender 0
age 0
hypertension 0
heart_disease 0
ever_married 0
work_type 0
Residence_type 0
avg_glucose_level 0
bmi 201
smoking_status 0
stroke 0
dtype: int64

In [10]:

# Inspect bmi before filling its null values


data['bmi'].value_counts()

Out[10]:

28.7 41
28.4 38
26.7 37
27.6 37
26.1 37
..
48.7 1
49.2 1
51.0 1
49.4 1
14.9 1
Name: bmi, Length: 418, dtype: int64

In [11]:

data['bmi'].describe()

Out[11]:

count 4909.000000
mean 28.893237
std 7.854067
min 10.300000
25% 23.500000
50% 28.100000
75% 33.100000
max 97.600000
Name: bmi, dtype: float64


In [12]:

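# Fill missing bmi values with the column mean; assign back rather than using inplace on a column selection
data['bmi']=data['bmi'].fillna(data['bmi'].mean())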

In [13]:

data['bmi'].describe()

Out[13]:

count 5110.000000
mean 28.893237
std 7.698018
min 10.300000
25% 23.800000
50% 28.400000
75% 32.800000
max 97.600000
Name: bmi, dtype: float64
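
The two `describe()` outputs show the usual effect of mean imputation: the count rises from 4909 to 5110 and the mean stays at 28.893237, while the standard deviation drops from 7.854 to 7.698. A minimal sketch on a made-up series (toy values, not the stroke data) reproduces the same pattern:

```python
import numpy as np
import pandas as pd

# Toy series with missing values (hypothetical numbers, not the stroke dataset).
bmi = pd.Series([22.0, 28.0, np.nan, 35.0, np.nan, 31.0])

# Fill the gaps with the mean of the observed values.
filled = bmi.fillna(bmi.mean())

# Count grows, mean is unchanged, spread shrinks.
print(bmi.count(), filled.count())
print(bmi.mean(), filled.mean())
print(bmi.std(), filled.std())
```

The shrinking spread is worth keeping in mind when the filled column is later standardized.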

In [14]:

data.drop('id',axis=1,inplace=True)

In [15]:

data

Out[15]:

      gender   age  hypertension  heart_disease  ever_married      work_type  Residence_type  a
0       Male  67.0             0              1           Yes        Private           Urban
1     Female  61.0             0              0           Yes  Self-employed           Rural
2       Male  80.0             0              1           Yes        Private           Rural
3     Female  49.0             0              0           Yes        Private           Urban
4     Female  79.0             1              0           Yes  Self-employed           Rural
...      ...   ...           ...            ...           ...            ...             ...
5105  Female  80.0             1              0           Yes        Private           Urban
5106  Female  81.0             0              0           Yes  Self-employed           Urban
5107  Female  35.0             0              0           Yes  Self-employed           Rural
5108    Male  51.0             0              0           Yes        Private           Rural
5109  Female  44.0             0              0           Yes       Govt_job           Urban

5110 rows × 11 columns


In [16]:

# Outlier removal
from matplotlib.pyplot import figure
figure(num=None, figsize=(8, 6), dpi=800, facecolor='w', edgecolor='k')

Out[16]:

<Figure size 6400x4800 with 0 Axes>

<Figure size 6400x4800 with 0 Axes>

In [17]:

data.plot(kind='box')
plt.show()


In [18]:

data.head()

Out[18]:

   gender   age  hypertension  heart_disease  ever_married      work_type  Residence_type  avg_
0    Male  67.0             0              1           Yes        Private           Urban
1  Female  61.0             0              0           Yes  Self-employed           Rural
2    Male  80.0             0              1           Yes        Private           Rural
3  Female  49.0             0              0           Yes        Private           Urban
4  Female  79.0             1              0           Yes  Self-employed           Rural

In [19]:

from sklearn.preprocessing import LabelEncoder


enc=LabelEncoder()

In [20]:

gender=enc.fit_transform(data['gender'])

In [21]:

smoking_status=enc.fit_transform(data['smoking_status'])

In [22]:

work_type=enc.fit_transform(data['work_type'])
Residence_type=enc.fit_transform(data['Residence_type'])
ever_married=enc.fit_transform(data['ever_married'])

In [23]:

data['work_type']=work_type

In [24]:

data['ever_married']=ever_married
data['Residence_type']=Residence_type
data['smoking_status']=smoking_status
data['gender']=gender
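
Because the single `enc` object is refit for each column, only the last fit (`ever_married`) is still available in `enc.classes_` afterwards. A sketch, assuming one wanted to keep the category-to-integer mapping for every column, uses one encoder per column; the toy frame and the `encoders` dict are illustrative names, not part of the notebook:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with two of the categorical columns (illustrative values only).
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "ever_married": ["Yes", "No", "Yes"],
})

encoders = {}  # hypothetical helper: column name -> fitted encoder
for col in ["gender", "ever_married"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# classes_ is sorted, so the label at position i was encoded as integer i.
mapping = {col: list(e.classes_) for col, e in encoders.items()}
print(mapping)
print(df)
```

Keeping the fitted encoders also makes it possible to apply the same mapping to new records with `transform`.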


In [25]:

data

Out[25]:

gender age hypertension heart_disease ever_married work_type Residence_type a

0 1 67.0 0 1 1 2 1

1 0 61.0 0 0 1 3 0

2 1 80.0 0 1 1 2 0

3 0 49.0 0 0 1 2 1

4 0 79.0 1 0 1 3 0

... ... ... ... ... ... ... ...

5105 0 80.0 1 0 1 2 1

5106 0 81.0 0 0 1 3 1

5107 0 35.0 0 0 1 3 0

5108 1 51.0 0 0 1 2 0

5109 0 44.0 0 0 1 0 1

5110 rows × 11 columns

In [26]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5110 entries, 0 to 5109
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 5110 non-null int32
1 age 5110 non-null float64
2 hypertension 5110 non-null int64
3 heart_disease 5110 non-null int64
4 ever_married 5110 non-null int32
5 work_type 5110 non-null int32
6 Residence_type 5110 non-null int32
7 avg_glucose_level 5110 non-null float64
8 bmi 5110 non-null float64
9 smoking_status 5110 non-null int32
10 stroke 5110 non-null int64
dtypes: float64(3), int32(5), int64(3)
memory usage: 339.5 KB

In [27]:

X=data.drop('stroke',axis=1)


In [28]:

X.head()

Out[28]:

gender age hypertension heart_disease ever_married work_type Residence_type avg_

0 1 67.0 0 1 1 2 1

1 0 61.0 0 0 1 3 0

2 1 80.0 0 1 1 2 0

3 0 49.0 0 0 1 2 1

4 0 79.0 1 0 1 3 0

In [29]:

Y=data['stroke']

In [30]:

Y

Out[30]:

0 1
1 1
2 1
3 1
4 1
..
5105 0
5106 0
5107 0
5108 0
5109 0
Name: stroke, Length: 5110, dtype: int64

In [68]:

from sklearn.model_selection import train_test_split


X_train, X_test, Y_train, Y_test=train_test_split(X,Y,test_size=0.2,random_state=10)
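
If the stroke label is imbalanced (the previews in this notebook show mostly 0s), a plain random split can leave the test set with a somewhat different share of positives than the training set. A sketch, assuming one wanted to preserve the class proportions, passes `stratify` to `train_test_split`; the data below is synthetic and the 5% rate is made up:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 5% positives (made-up proportions).
rng = np.random.default_rng(10)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 50 + [0] * 950)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=10, stratify=y_demo
)

# Both splits keep roughly the same positive rate as the full data.
print(y_tr.mean(), y_te.mean())
```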


In [69]:

X_train

Out[69]:

gender age hypertension heart_disease ever_married work_type Residence_type a

2285 1 49.0 0 0 1 2 0

4733 1 67.0 0 0 1 2 0

3905 1 78.0 0 0 1 2 1

4700 1 47.0 0 0 1 2 0

4939 0 59.0 0 0 1 2 1

... ... ... ... ... ... ... ...

1180 0 62.0 0 0 1 2 0

3441 0 59.0 0 0 1 3 1

1344 1 47.0 0 0 1 2 0

4623 1 25.0 0 0 1 0 1

1289 0 80.0 0 0 1 3 0

4088 rows × 10 columns

In [70]:

Y_train

Out[70]:

2285 0
4733 0
3905 0
4700 0
4939 0
..
1180 0
3441 0
1344 0
4623 0
1289 0
Name: stroke, Length: 4088, dtype: int64


In [71]:

X_test

Out[71]:

gender age hypertension heart_disease ever_married work_type Residence_type

2413 0 58.00 0 0 1 2 0

1141 1 57.00 0 0 1 2 0

146 1 65.00 0 0 1 3 1

3883 0 1.64 0 0 0 4 1

1044 0 79.00 0 0 1 0 1

... ... ... ... ... ... ... ...

2261 1 59.00 0 0 1 2 1

4712 1 57.00 0 0 1 2 1

4971 0 63.00 0 0 1 2 1

2224 1 57.00 0 0 1 2 0

4825 0 14.00 0 0 0 4 1

1022 rows × 10 columns

In [72]:

Y_test

Out[72]:

2413 0
1141 0
146 1
3883 0
1044 0
..
2261 0
4712 0
4971 0
2224 0
4825 0
Name: stroke, Length: 1022, dtype: int64


In [73]:

data.describe()

Out[73]:

gender age hypertension heart_disease ever_married work_type Re

count 5110.000000 5110.000000 5110.000000 5110.000000 5110.000000 5110.000000

mean 0.414286 43.226614 0.097456 0.054012 0.656164 2.167710

std 0.493044 22.612647 0.296607 0.226063 0.475034 1.090293

min 0.000000 0.080000 0.000000 0.000000 0.000000 0.000000

25% 0.000000 25.000000 0.000000 0.000000 0.000000 2.000000

50% 0.000000 45.000000 0.000000 0.000000 1.000000 2.000000

75% 1.000000 61.000000 0.000000 0.000000 1.000000 3.000000

max 2.000000 82.000000 1.000000 1.000000 1.000000 4.000000

In [74]:

from sklearn.preprocessing import StandardScaler


std=StandardScaler()

In [75]:

X_train_std=std.fit_transform(X_train)
X_test_std=std.transform(X_test)
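
The scaler is fitted on `X_train` only and then reused on `X_test`, so the test rows are scaled with the training mean and standard deviation. A small sketch on synthetic arrays (made-up numbers) shows the consequence: the training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately centered:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic train/test arrays (illustrative values only).
rng = np.random.default_rng(0)
train = rng.normal(50, 10, size=(80, 2))
test = rng.normal(50, 10, size=(20, 2))

scaler = StandardScaler()
train_std = scaler.fit_transform(train)  # learns mean_ and scale_ from train
test_std = scaler.transform(test)        # reuses the training statistics

print(train_std.mean(axis=0), train_std.std(axis=0))
print(test_std.mean(axis=0))
```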

In [39]:

import pickle

In [40]:

# Persist the fitted scaler (the file holds the StandardScaler, not a classifier)
with open('model.pkl','wb') as scaler_file:
    pickle.dump(std,scaler_file)
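
A sketch of reading a pickled scaler back, for example in a separate prediction script. The file name `scaler_demo.pkl`, the temporary directory, and the toy array are assumptions for illustration:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler on a toy column with mean 2.0.
scaler = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))

# Hypothetical file location used only for this demo.
path = os.path.join(tempfile.gettempdir(), "scaler_demo.pkl")

with open(path, "wb") as f:
    pickle.dump(scaler, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored scaler applies the same statistics as the original.
print(restored.transform(np.array([[2.0]])))
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.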


In [116]:

X_train_std

Out[116]:

array([[ 1.19359699,  0.2521852 , -0.33069968, ..., -0.58626884,
         0.00238781,  1.51158251],
       [ 1.19359699,  1.04686385, -0.33069968, ..., -0.50843521,
        -0.44065504, -0.35191245],
       [ 1.19359699,  1.5325008 , -0.33069968, ...,  2.27080023,
        -0.58427812, -0.35191245],
       ...,
       [ 1.19359699,  0.16388757, -0.33069968, ..., -0.43745625,
         1.34810513,  1.51158251],
       [ 1.19359699, -0.80738634, -0.33069968, ...,  1.33171097,
        -0.75401449,  0.57983503],
       [-0.83780372,  1.62079843, -0.33069968, ..., -0.74171498,
        -0.16646553,  0.57983503]])

In [42]:

X_test_std

Out[42]:

array([[-0.83780372,  0.64952452, -0.33069968, ..., -0.12678509,
         1.38727506,  1.51158251],
       [ 1.19359699,  0.60537571, -0.33069968, ..., -0.35586361,
         0.12078063, -1.28365994],
       [ 1.19359699,  0.95856622, -0.33069968, ..., -0.83414241,
         0.00238781, -0.35191245],
       ...,
       [-0.83780372,  0.87026859, -0.33069968, ..., -1.08555387,
         1.17836876,  0.57983503],
       [ 1.19359699,  0.60537571, -0.33069968, ..., -0.66056457,
         0.32968693, -0.35191245],
       [-0.83780372, -1.29302329, -0.33069968, ..., -0.75962556,
        -1.31545016, -1.28365994]])

In [43]:

from sklearn.tree import DecisionTreeClassifier


dt=DecisionTreeClassifier()

In [44]:

dt.fit(X_train_std,Y_train)

Out[44]:

▾ DecisionTreeClassifier
DecisionTreeClassifier()


In [45]:

dt.feature_importances_

Out[45]:

array([0.04214775, 0.17247766, 0.01142995, 0.03098266, 0.02544746,
       0.04860749, 0.03847329, 0.28277809, 0.27806247, 0.06959316])

In [46]:

X_train.columns

Out[46]:

Index(['gender', 'age', 'hypertension', 'heart_disease', 'ever_married',
       'work_type', 'Residence_type', 'avg_glucose_level', 'bmi',
       'smoking_status'],
      dtype='object')
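
Lining up the two outputs above position by position shows which column each importance belongs to. A small sketch that copies the printed values into a labeled pandas Series; in the notebook itself one could pass `dt.feature_importances_` and `X_train.columns` directly, and since the tree was created without a `random_state` the exact numbers may differ between runs:

```python
import pandas as pd

# Column names and importances copied from the outputs above.
columns = ['gender', 'age', 'hypertension', 'heart_disease', 'ever_married',
           'work_type', 'Residence_type', 'avg_glucose_level', 'bmi',
           'smoking_status']
importances = [0.04214775, 0.17247766, 0.01142995, 0.03098266, 0.02544746,
               0.04860749, 0.03847329, 0.28277809, 0.27806247, 0.06959316]

# Label each importance with its column and sort from most to least important.
ranked = pd.Series(importances, index=columns).sort_values(ascending=False)
print(ranked.head(3))
```

With these values the three largest importances belong to `avg_glucose_level`, `bmi`, and `age`.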

In [47]:

Y_pred=dt.predict(X_test_std)

In [48]:

Y_pred

Out[48]:

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [49]:

from sklearn.metrics import accuracy_score

In [50]:

ac_dt=accuracy_score(Y_test,Y_pred)

In [51]:

ac_dt

Out[51]:

0.901174168297456
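
Accuracy alone can be hard to interpret when one class dominates, as the mostly-zero `Y_test` and `Y_pred` previews suggest. A sketch on made-up labels (not the notebook's predictions) shows that a classifier which always predicts 0 still reaches high accuracy while its recall for the positive class is 0:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels: 95 negatives, 5 positives, and an all-zero prediction.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))   # share of correct predictions
print(recall_score(y_true, y_pred))     # share of positives that were found
print(confusion_matrix(y_true, y_pred)) # rows: true class, columns: predicted class
```

Running `confusion_matrix(Y_test, Y_pred)` on the notebook's own predictions would show how many stroke cases each model actually catches.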

In [52]:

from sklearn.linear_model import LogisticRegression


lr=LogisticRegression()


In [53]:

lr.fit(X_train_std,Y_train)

Out[53]:

▾ LogisticRegression
LogisticRegression()

In [54]:

Y_pred_lr=lr.predict(X_test_std)

In [55]:

Y_pred_lr

Out[55]:

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [56]:

ac_lr=accuracy_score(Y_test,Y_pred_lr)

In [57]:

ac_lr

Out[57]:

0.9383561643835616

In [76]:

from sklearn.neighbors import KNeighborsClassifier


knn=KNeighborsClassifier()

In [77]:

knn.fit(X_train_std,Y_train)

Out[77]:

▾ KNeighborsClassifier
KNeighborsClassifier()

In [78]:

Y_pred=knn.predict(X_test_std)


In [79]:

ac_knn=accuracy_score(Y_test,Y_pred)

In [80]:

ac_knn

Out[80]:

0.9344422700587084

In [81]:

from sklearn.ensemble import RandomForestClassifier


rf=RandomForestClassifier()

In [82]:

rf.fit(X_train_std,Y_train)

Out[82]:

▾ RandomForestClassifier
RandomForestClassifier()

In [83]:

Y_pred=rf.predict(X_test_std)

In [84]:

ac_rf=accuracy_score(Y_test,Y_pred)

In [85]:

ac_rf

Out[85]:

0.9363992172211351


In [110]:

# Candidate models and the hyperparameter grids to search for each
models_params = {
    'rand':{
        'model':RandomForestClassifier(),
        'params' :{
            'n_estimators':[100,120,130]
        }
    },
    'logi':{
        'model':LogisticRegression(solver='liblinear',multi_class='auto'),
        'params':{
            'C':[ 1,5,10 ]
        }
    },
    'knn':{
        'model':KNeighborsClassifier(),
        'params':{
            'n_neighbors':[5,6,8]
        }
    }
}

In [ ]:

In [93]:

from sklearn.model_selection import GridSearchCV

In [113]:

# Run a 5-fold grid search for each model and report its best mean CV score
for model_name,mp in models_params.items():
    clf = GridSearchCV(mp['model'],mp['params'],cv=5,return_train_score=False)
    clf.fit(X_train_std,Y_train)
    print(model_name,"score",clf.best_score_)

rand score 0.9537675855072377
logi score 0.9540120842847438
knn score 0.9540120842847438
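
The loop prints only `best_score_`, which is the mean cross-validated score of the best parameter setting (accuracy by default for classifiers). A sketch on synthetic data, assuming one also wanted to see which setting won, reads `best_params_` from the fitted search; the dataset comes from `make_classification` and is not the stroke data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative only).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Same n_neighbors grid as in the notebook, searched with 5-fold CV.
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [5, 6, 8]}, cv=5)
search.fit(X_demo, y_demo)

# Winning parameter setting and its mean cross-validated accuracy.
print(search.best_params_, search.best_score_)
```

In the notebook's loop the equivalent would be printing `clf.best_params_` next to `clf.best_score_`.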

In [ ]:
