
Merging_Scaled_1D_&_Trying_Different_Classification_ML_Models.ipynb - Colaboratory

The document discusses preprocessing ECG data from multiple datasets and combining them. It imports libraries, lists the file paths, sorts the files, reads and combines the CSVs for each lead across all data. It performs dimensionality reduction via PCA on the combined data, explaining over 75% of variance with 100 components. It then trains and tests different ML models like KNN on the reduced data for a single lead, achieving 78% accuracy.


Open in Colab

IMPORTING LIBRARIES

import pandas as pd
import numpy as np
import os
from natsort import natsorted
import joblib

WORKING ON COMBINING MULTIPLE LEAD FILES

#creating lists to store file names


NORMAL_=[]
MI_=[]
PMI_=[]
HB_=[]

normal = '/content/drive/MyDrive/CMPE255_PROJECT/NORMAL'
abnormal = '/content/drive/MyDrive/CMPE255_PROJECT/AHB'
MI = '/content/drive/MyDrive/CMPE255_PROJECT/MI'
MI_history = '/content/drive/MyDrive/CMPE255_PROJECT/PM'

Types_ECG = {'normal':normal,'Abnormal_hear_beat':abnormal,'MI':MI,'History_MI':MI_history}

for types,folder in Types_ECG.items():
    for files in os.listdir(folder):
        if types=='normal':
            NORMAL_.append(files)
        elif types=='Abnormal_hear_beat':
            HB_.append(files)
        elif types=='MI':
            MI_.append(files)
        elif types=='History_MI':
            PMI_.append(files)

NORMAL_ = natsorted(NORMAL_)
NORMAL_

['scaled_data_1D_1.csv',
'scaled_data_1D_2.csv',
'scaled_data_1D_3.csv',
'scaled_data_1D_4.csv',
'scaled_data_1D_5.csv',
'scaled_data_1D_6.csv',
'scaled_data_1D_7.csv',
'scaled_data_1D_8.csv',
'scaled_data_1D_9.csv',
'scaled_data_1D_10.csv',
'scaled_data_1D_11.csv',
'scaled_data_1D_12.csv',
'scaled_data_1D_13.csv']


MI_ = natsorted(MI_)
MI_

['scaled_data_1D_1.csv',
'scaled_data_1D_2.csv',
'scaled_data_1D_3.csv',
'scaled_data_1D_4.csv',
'scaled_data_1D_5.csv',
'scaled_data_1D_6.csv',
'scaled_data_1D_7.csv',
'scaled_data_1D_8.csv',
'scaled_data_1D_9.csv',
'scaled_data_1D_10.csv',
'scaled_data_1D_11.csv',
'scaled_data_1D_12.csv',
'scaled_data_1D_13.csv']

PMI_ = natsorted(PMI_)
PMI_

['scaled_data_1D_1.csv',
'scaled_data_1D_2.csv',
'scaled_data_1D_3.csv',
'scaled_data_1D_4.csv',
'scaled_data_1D_5.csv',
'scaled_data_1D_6.csv',
'scaled_data_1D_7.csv',
'scaled_data_1D_8.csv',
'scaled_data_1D_9.csv',
'scaled_data_1D_10.csv',
'scaled_data_1D_11.csv',
'scaled_data_1D_12.csv',
'scaled_data_1D_13.csv']

HB_ = natsorted(HB_)
HB_

['scaled_data_1D_1.csv',
'scaled_data_1D_2.csv',
'scaled_data_1D_3.csv',
'scaled_data_1D_4.csv',
'scaled_data_1D_5.csv',
'scaled_data_1D_6.csv',
'scaled_data_1D_7.csv',
'scaled_data_1D_8.csv',
'scaled_data_1D_9.csv',
'scaled_data_1D_10.csv',
'scaled_data_1D_11.csv',
'scaled_data_1D_12.csv',
'scaled_data_1D_13.csv']

COMBINED CSV OF EACH LEAD (1-12) FROM ALL IMAGES

#loop over and create a combined csv file for each lead
for x in range(len(MI_)):
    df1=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/NORMAL/{}'.format(NORMAL_[x]))
    df2=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/AHB/{}'.format(HB_[x]))
    df3=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/MI/{}'.format(MI_[x]))
    df4=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/PM/{}'.format(PMI_[x]))
    final_df = pd.concat([df1,df2,df3,df4],ignore_index=True)
    final_df.to_csv('/content/drive/MyDrive/CMPE255_PROJECT/Combined_IDLead_{}.csv'.format(x+1))

#now reading just lead1


df=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/Combined_IDLead_1.csv')
df['Target'].unique()

array(['No', 'HB', 'MI', 'PM'], dtype=object)

df.drop(columns=['Unnamed: 0'],inplace=True)

#convert Target column values to numeric codes using ngroup


encode_target_label = df.groupby('Target').ngroup().rename("target").to_frame()
test_final = df.merge(encode_target_label, left_index=True, right_index=True)
test_final.drop(columns=['Target'],inplace=True)
test_final

[output truncated: preview of test_final, 928 rows × 256 columns — 255 scaled lead-1 sample values per row plus the numeric target column]
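
ngroup() numbers the groups in the sorted order of the keys, so the mapping here is HB→0, MI→1, No→2, PM→3. A small sanity check along these lines (not in the original notebook) recovers the mapping explicitly:

# illustrative only: recover the label -> numeric-code mapping produced by ngroup()
mapping = (
    df[['Target']]
    .assign(code=df.groupby('Target').ngroup())
    .drop_duplicates()
    .sort_values('code')
)
print(mapping)   # expected: HB->0, MI->1, No->2, PM->3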

PERFORM DIMENSIONALITY REDUCTION JUST FOR CHECKING/UNDERSTANDING

#just for testing


# Now Perform Dimensionality reduction (PCA) on that Dataframe and check
from sklearn.decomposition import PCA

#do PCA and choose components as 100


pca = PCA(n_components=100)
x_pca = pca.fit_transform(test_final.iloc[:,0:-1])
x_pca = pd.DataFrame(x_pca)

# Calculate the variance explained by the principal components


explained_variance = pca.explained_variance_ratio_
print('Variance of each component:', pca.explained_variance_ratio_)
print('\n Total Variance Explained:', round(sum(list(pca.explained_variance_ratio_))*100,2))

#store the new pca generated dimensions in a dataframe


pca_df = pd.DataFrame(data = x_pca)
target = pd.Series(test_final['target'], name='target')
result_df = pd.concat([pca_df, target], axis=1)
result_df
Variance of each component: [1.76145888e-01 9.50265614e-02 6.99060614e-02 6.1
5.34876630e-02 4.23664893e-02 3.68320213e-02 3.38541791e-02
3.00884979e-02 2.90396728e-02 2.64962509e-02 2.42272738e-02
2.10221030e-02 1.99751559e-02 1.77321042e-02 1.63016802e-02
1.53898622e-02 1.48412074e-02 1.33644825e-02 1.19674074e-02
1.16813409e-02 1.05807650e-02 9.68875480e-03 9.47385060e-03
8.65347748e-03 8.47506998e-03 7.93382172e-03 7.30163338e-03
6.76380665e-03 6.36886390e-03 6.02004791e-03 5.46823032e-03
... (output truncated)
result_df

[output truncated: preview of result_df, 928 rows × 101 columns — 100 principal components plus the target column]
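
If the goal is a variance target rather than a fixed component count, one option (not in the original notebook) is to look at the cumulative explained-variance curve, or to pass a float to PCA so scikit-learn picks the number of components itself:

import numpy as np
from sklearn.decomposition import PCA

# cumulative explained variance of the 100-component fit above
cumulative = np.cumsum(pca.explained_variance_ratio_)
# assumes the 75% threshold is reached within the 100 fitted components
print('components needed for 75% variance:', int(np.argmax(cumulative >= 0.75)) + 1)

# alternatively, let PCA pick the component count for a target variance fraction
pca_75 = PCA(n_components=0.75, svd_solver='full')
x_pca_75 = pca_75.fit_transform(test_final.iloc[:, 0:-1])
print(x_pca_75.shape)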

TRYING DIFFERENT ML MODELS ON A SINGLE LEAD (EX: 1) POST DIMENSIONALITY REDUCTION

KNN

# Import the necessary modules for ML model


from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline steps:


steps = [('knn', KNeighborsClassifier())]

# Create the pipeline: pipeline


pipeline = Pipeline(steps)

# passed a small hyperparameter range since I'm using the free-tier version of Google Colab
k_range = list(range(1, 9))
parameters = dict(knn__n_neighbors=k_range)

#input
X = result_df.iloc[:,0:-1]

#target
y=result_df.iloc[:,-1]

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=42)

#increasing the number of CV folds takes a lot of time in Google Colab, so kept it at just 2.


cv = GridSearchCV(pipeline,parameters,cv=2)

cv.fit(X_train,y_train)

# Predict the labels of the test set: y_pred


y_pred = cv.predict(X_test)

Knn_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


print("Accuracy: {}".format(Knn_Accuracy))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Accuracy: 0.782258064516129
precision recall f1-score support

0 0.87 0.63 0.73 105


1 0.91 0.91 0.91 94
2 0.72 0.88 0.79 112
3 0.63 0.67 0.65 61

accuracy 0.78 372


macro avg 0.78 0.77 0.77 372
weighted avg 0.80 0.78 0.78 372

Tuned Model Parameters: {'knn__n_neighbors': 1}
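
The grid search settles on n_neighbors=1, which can be a sign of near-duplicate samples across folds; one optional check (not in the original notebook) is to look at the mean cross-validated accuracy for every k that was tried:

# illustrative only: mean CV accuracy for each k searched above
cv_results = pd.DataFrame(cv.cv_results_)
print(cv_results[['param_knn__n_neighbors', 'mean_test_score']])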


LOGISTIC REGRESSION

from sklearn.pipeline import Pipeline


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline steps:


steps = [('lr', LogisticRegression())]

# Create the pipeline: pipeline


pipeline = Pipeline(steps)

#input
X = result_df.iloc[:,0:-1]

#target
y=result_df.iloc[:,-1]

#parameters for GridSearchCV; searching more values of C takes longer in Colab, so only a few are used


c_space = np.logspace(-4, 4, 3)
parameters = {'lr__C': c_space,'lr__penalty': ['l2']}

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=42)

#call GridSearchCV and set the number of CV folds to 2


cv = GridSearchCV(pipeline,parameters,cv=2)

cv.fit(X_train,y_train)

# Predict the labels of the test set: y_pred


y_pred = cv.predict(X_test)
LR_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


print("Accuracy: {}".format(LR_Accuracy))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Accuracy: 0.543010752688172
precision recall f1-score support

0 0.36 0.33 0.35 105


1 0.73 0.91 0.81 94
2 0.56 0.58 0.57 112
3 0.38 0.26 0.31 61

accuracy 0.54 372


macro avg 0.51 0.52 0.51 372
weighted avg 0.52 0.54 0.53 372

Tuned Model Parameters: {'lr__C': 10000.0, 'lr__penalty': 'l2'}
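
The selected C sits at the upper edge of the searched grid, and logistic regression on unscaled PCA scores can also stop at the default iteration limit; a possible variant (an assumption, not part of the original run) is to standardize the components and raise max_iter:

from sklearn.preprocessing import StandardScaler

# illustrative variant: scale the PCA scores and allow more iterations
steps = [('scaler', StandardScaler()),
         ('lr', LogisticRegression(max_iter=1000))]
pipeline = Pipeline(steps)

parameters = {'lr__C': np.logspace(-4, 4, 5), 'lr__penalty': ['l2']}
cv = GridSearchCV(pipeline, parameters, cv=2)
cv.fit(X_train, y_train)
print(cv.score(X_test, y_test), cv.best_params_)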

SVM

# Import the necessary modules for ML model


from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline


steps = [('SVM', SVC())]

pipeline = Pipeline(steps)

#input
X = result_df.iloc[:,0:-1]

#target
y=result_df.iloc[:,-1]

# Specify the hyperparameter space; increasing the penalty (C) and gamma values affects the accuracy
#since it takes a lot of time in Google Colab, only a single value is provided
parameters = {'SVM__C':[10],'SVM__gamma':[1]}
# Create train and test sets
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=21)

cv = GridSearchCV(pipeline,parameters,cv=3)
cv.fit(X_train,y_train)

y_pred = cv.predict(X_test)
SVM_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


SVM_Accuracy=cv.score(X_test, y_test)

print("Accuracy: {}".format(SVM_Accuracy))
print(classification_report(y_test, y_pred))

Accuracy: 0.8225806451612904
precision recall f1-score support

0 0.58 1.00 0.74 93


1 1.00 1.00 1.00 99
2 1.00 0.61 0.76 117
3 1.00 0.68 0.81 63

accuracy 0.82 372


macro avg 0.90 0.82 0.83 372
weighted avg 0.90 0.82 0.83 372

NOW COMBINING ALL 12 LEADS INTO A SINGLE CSV FILE AND THEN PERFORM MODEL ANALYSIS

#let's try combining all 12 leads into a single csv


location= '/content/drive/MyDrive/CMPE255_PROJECT/'
for files in natsorted(os.listdir(location)):
    if files.endswith(".csv") and not files.endswith("13.csv"):
        if files!='Combined_IDLead_1.csv':
            df=pd.read_csv('/content/drive/MyDrive/CMPE255_PROJECT/{}'.format(files))
            df.drop(columns=['Unnamed: 0'],inplace=True)
            test_final=pd.concat([test_final,df],axis=1,ignore_index=True)
            test_final.drop(columns=test_final.columns[-1],axis=1,inplace=True)

#drop the target column


test_final.drop(columns=[255],axis=1,inplace=True)
test_final
[output truncated: preview of test_final after concatenating all 12 leads column-wise (first five rows shown)]

#write the final file to csv


test_final.to_csv('final_1D.csv',header=False,index=False)

TEST DIMENSIONALITY REDUCTION EXPLAINED VARIANCE ON THE DATA

# Now Perform Dimensionality reduction (PCA) on that Dataframe and check


from sklearn.decomposition import PCA

#do PCA and choose components as 400


pca = PCA(n_components=400)
x_pca = pca.fit_transform(test_final)
x_pca = pd.DataFrame(x_pca)

# Calculate the variance explained by the principal components


explained_variance = pca.explained_variance_ratio_
print('Variance of each component:', pca.explained_variance_ratio_)
print('\n Total Variance Explained:', round(sum(list(pca.explained_variance_ratio_))*100,2))

#store the new pca generated dimensions in a dataframe


pca_df = pd.DataFrame(data = x_pca)
target = pd.Series(result_df.iloc[:,-1], name='target')
final_result_df = pd.concat([pca_df, target], axis=1)
final_result_df
Variance of each component: [8.04649534e-02 4.68818003e-02 3.76212504e-02 2.9
2.57031130e-02 2.32574514e-02 2.14376788e-02 2.04315151e-02
1.94482863e-02 1.79877408e-02 1.64766264e-02 1.53241665e-02
1.50689862e-02 1.41398267e-02 1.36330466e-02 1.33375324e-02
1.26355566e-02 1.25577001e-02 1.16968257e-02 1.11671338e-02
1.07975552e-02 1.06183806e-02 1.03402122e-02 1.01248410e-02
9.73197948e-03 9.25504395e-03 9.16367637e-03 8.76267060e-03
8.54270112e-03 8.20665462e-03 8.07642149e-03 7.90742343e-03
7.54929819e-03 7.21938018e-03 7.07604659e-03 6.89135251e-03
6.80575532e-03 6.71875790e-03 6.38252148e-03 6.33951897e-03
6.10254734e-03 5.94560955e-03 5.76371295e-03 5.71788829e-03
5.55354810e-03 5.42316932e-03 5.35640711e-03 5.08429353e-03
5.03302777e-03 4.96811576e-03 4.87696491e-03 4.63686128e-03
4.55349933e-03 4.45390625e-03 4.31579996e-03 4.28316592e-03
4.17213140e-03 4.12346241e-03 4.09072049e-03 3.99349122e-03
3.92129459e-03 3.81982060e-03 3.78116652e-03 3.73307150e-03
3.68894307e-03 3.55238746e-03 3.49148625e-03 3.40490507e-03
3.33593814e-03 3.25467389e-03 3.20023474e-03 3.14871964e-03
3.09091665e-03 3.07180393e-03 3.05651457e-03 2.95447952e-03
2.90507083e-03 2.84618700e-03 2.80939396e-03 2.76324718e-03
2.71487874e-03 2.68959207e-03 2.67378836e-03 2.62085254e-03
2.55991613e-03 2.53614502e-03 2.47015404e-03 2.45768102e-03
2.41851536e-03 2.39477316e-03 2.35560704e-03 2.29236345e-03
2.26928539e-03 2.24965527e-03 2.22764534e-03 2.19258829e-03
2.14654982e-03 2.09081474e-03 2.08656961e-03 2.04315332e-03
2.01191187e-03 1.99715030e-03 1.98092986e-03 1.93183566e-03
1.90133601e-03 1.86628808e-03 1.85847904e-03 1.79040117e-03
1.77318190e-03 1.76278440e-03 1.73682193e-03 1.70177712e-03
1.69142157e-03 1.66289246e-03 1.64192361e-03 1.62455779e-03
1.59836820e-03 1.57166872e-03 1.56017874e-03 1.55193712e-03
1.52130395e-03 1.50860404e-03 1.48563216e-03 1.45667689e-03
1.44862677e-03 1.43014707e-03 1.42443426e-03 1.39341888e-03
1.38941740e-03 1.38032166e-03 1.35292505e-03 1.33403513e-03
1.33300728e-03 1.31774024e-03 1.29238722e-03 1.24574072e-03
1.23408862e-03 1.21598644e-03 1.20568485e-03 1.19391143e-03
1.18690274e-03 1.16630751e-03 1.16159095e-03 1.14539199e-03
... (remaining explained-variance output truncated)

#save the dimensionally reduced data to a csv file
final_result_df.to_csv("pca_final.csv")

import joblib

#save the fitted PCA model
joblib_file='PCA_ECG.pkl'
joblib.dump(pca,joblib_file)

['PCA_ECG.pkl']
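
A brief sketch (not from the original notebook) of how the saved PCA model could be reloaded later, for example in the real-time pipeline, to project new scaled rows into the same 400-dimensional space:

import joblib
import numpy as np

pca_loaded = joblib.load('PCA_ECG.pkl')

# dummy rows purely for illustration: real input must have the same number of
# columns and the same scaling as the data the PCA was fitted on
new_rows = np.random.rand(2, pca_loaded.n_features_in_)
print(pca_loaded.transform(new_rows).shape)   # (2, 400)
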
TRYING DIFFERENT ML MODELS ON ALL 12 LEADS COMBINED FILE WITHOUT DIMENSIONALITY REDUCTION

KNN

# Import the necessary modules for ML model


from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline steps:


steps = [('knn', KNeighborsClassifier())]

# Create the pipeline: pipeline


pipeline = Pipeline(steps)

# passed a small hyperparameter range since I'm using the free-tier version of Google Colab
k_range = list(range(1, 30))
parameters = dict(knn__n_neighbors=k_range)

#input
X = final_result_df.iloc[:,:-1]

#target
y=final_result_df.iloc[:,-1]

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=42)

#increasing the number of CV folds takes a lot of time in Google Colab, so kept it at just 2.


cv = GridSearchCV(pipeline,parameters,cv=2)

cv.fit(X_train,y_train)

# Predict the labels of the test set: y_pred


y_pred = cv.predict(X_test)

Knn_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


print("Accuracy: {}".format(Knn_Accuracy))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Accuracy: 0.793010752688172
precision recall f1-score support

0 0.92 0.65 0.76 105


1 0.95 0.91 0.93 94
2 0.70 0.86 0.77 112
3 0.65 0.74 0.69 61

accuracy 0.79 372


macro avg 0.80 0.79 0.79 372
weighted avg 0.81 0.79 0.79 372

Tuned Model Parameters: {'knn__n_neighbors': 1}

LOGISTIC REGRESSION

from sklearn.pipeline import Pipeline


from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline steps:


steps = [('lr', LogisticRegression())]

# Create the pipeline: pipeline


pipeline = Pipeline(steps)

#input
X = final_result_df.iloc[:,:-1]

#target
y=final_result_df.iloc[:,-1]

#parameters for gridsearchcv


c_space = np.logspace(-4, 4, 10)
parameters = {'lr__C': c_space,'lr__penalty': ['l2']}

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=42)

#call GridSearchCV and set the number of CV folds to 2


cv = GridSearchCV(pipeline,parameters,cv=2)

cv.fit(X_train,y_train)

# Predict the labels of the test set: y_pred


y_pred = cv.predict(X_test)
LR_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


print("Accuracy: {}".format(LR_Accuracy))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Accuracy: 0.7768817204301075
precision recall f1-score support

0 0.83 0.57 0.68 105


1 0.83 0.91 0.87 94
2 0.82 0.86 0.84 112
3 0.59 0.77 0.67 61

accuracy 0.78 372


macro avg 0.77 0.78 0.76 372
weighted avg 0.79 0.78 0.77 372

Tuned Model Parameters: {'lr__C': 0.3593813663804626, 'lr__penalty': 'l2'}

SVM

# Import the necessary modules for ML model


from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

# Setup the pipeline


steps = [('SVM', SVC())]

pipeline = Pipeline(steps)

#input
X = final_result_df.iloc[:,:-1]

#target
y=final_result_df.iloc[:,-1]

# Specify the hyperparameter space; increasing the penalty (C) and gamma values affects the accuracy
#since it takes a lot of time in Google Colab, only a few values are provided
parameters = {'SVM__C':[1, 10, 100],
'SVM__gamma':[0.1, 0.01]}

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5,random_state=21)

cv = GridSearchCV(pipeline,parameters,cv=3)
cv.fit(X_train,y_train)

y_pred = cv.predict(X_test)
SVM_Accuracy = cv.score(X_test, y_test)

# Compute and print metrics


SVM_Accuracy=cv.score(X_test, y_test)

print("Accuracy: {}".format(SVM_Accuracy))
print(classification_report(y_test, y_pred))
print("Tuned Model Parameters: {}".format(cv.best_params_))

Accuracy: 0.9051724137931034
precision recall f1-score support

0 0.81 0.92 0.86 119


1 1.00 1.00 1.00 125
2 0.91 0.89 0.90 140
3 0.93 0.78 0.84 80

accuracy 0.91 464


macro avg 0.91 0.89 0.90 464
weighted avg 0.91 0.91 0.91 464

Tuned Model Parameters: {'SVM__C': 10, 'SVM__gamma': 0.01}

XGBOOST

from xgboost import XGBClassifier


from sklearn.metrics import accuracy_score

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: {}".format(accuracy))
print(classification_report(y_test, y_pred))

Accuracy: 0.853448275862069
precision recall f1-score support

0 0.79 0.70 0.74 119


1 0.98 1.00 0.99 125
2 0.82 0.87 0.84 140
3 0.80 0.82 0.81 80

accuracy 0.85 464


macro avg 0.85 0.85 0.85 464
weighted avg 0.85 0.85 0.85 464

SAVING A VERY BASIC ML MODEL AND USING IT IN A REAL-TIME PIPELINE TO CHECK IT WORKS

# Import the necessary modules for ML model


from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report
import joblib
#input
X = final_result_df.iloc[:,:-1]
#target
y=final_result_df.iloc[:,-1]

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.4,random_state=42)

knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(X_train,y_train)

joblib_file='knn_model_test.pkl'
joblib.dump(knn,joblib_file)

['knn_model_test.pkl']

# Import the necessary modules for ML model


from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report

#input
X = pd.read_csv('final_1D.csv',header=None)

#target
y=final_result_df.iloc[:,-1]

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.5,random_state=21)

svm=SVC(C=10,gamma=0.01)

svm.fit(X_train,y_train)

joblib_file='svm_model_test.pkl'
joblib.dump(svm,joblib_file)

['svm_model_test.pkl']
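
Before wiring the saved files into the real-time pipeline, a quick reload-and-score check along these lines (illustrative only, not part of the original notebook) can confirm the pickles behave as expected:

import joblib

# reload the models saved above and sanity-check them
svm_loaded = joblib.load('svm_model_test.pkl')
print('reloaded SVM test accuracy:', svm_loaded.score(X_test, y_test))

knn_loaded = joblib.load('knn_model_test.pkl')
# the KNN model expects PCA-reduced rows (the feature columns of final_result_df)
print('reloaded KNN prediction:', knn_loaded.predict(final_result_df.iloc[[0], :-1]))
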
ENSEMBLE

# Importing required modules


from sklearn import linear_model, tree, ensemble
from sklearn.naive_bayes import GaussianNB
import xgboost
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
import pickle

#input
X = final_result_df.iloc[:,0:-1]

#target
y=final_result_df.iloc[:,-1]

# Create train and test sets


X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=42)

# Voting ensemble of ML models
eclf = VotingClassifier(estimators=[
('SVM', SVC(probability=True)),
('knn', KNeighborsClassifier()),
('rf', ensemble.RandomForestClassifier()),
('bayes',GaussianNB()),
('logistic',LogisticRegression()),
], voting='soft')

# Hyperparameter Tuning using gridSearch


params = {'SVM__C':[1, 10, 100],
'SVM__gamma':[0.1, 0.01],
'knn__n_neighbors': [1,3,5],
'rf__n_estimators':[300, 400],
}

grid = GridSearchCV(estimator=eclf, param_grid=params, cv=5)


voting_clf = grid.fit(X_train, y_train)

print(grid.best_params_)
y_pred = voting_clf.predict(X_test)

# Compute and print metrics


Voting_Accuracy=voting_clf.score(X_test, y_test)

print("Accuracy: {}".format(Voting_Accuracy))
print(classification_report(y_test, y_pred))
print(voting_clf.best_params_)

{'SVM__C': 1, 'SVM__gamma': 0.1, 'knn__n_neighbors': 1, 'rf__n_estimators': 3


Accuracy: 0.9247311827956989
precision recall f1-score support

0 0.89 0.96 0.92 80


1 1.00 1.00 1.00 72
2 0.92 0.92 0.92 79
3 0.88 0.75 0.81 48
accuracy 0.92 279
macro avg 0.92 0.91 0.91 279
weighted avg 0.92 0.92 0.92 279

{'SVM__C': 1, 'SVM__gamma': 0.1, 'knn__n_neighbors': 1, 'rf__n_estimators': 3
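
For a quick side-by-side view of the models tried on the 12-lead data, the accuracy variables collected in the cells above can be printed together (Knn_Accuracy, LR_Accuracy and SVM_Accuracy hold the values from their most recent runs):

# illustrative summary of the accuracies computed above
summary = {
    'KNN': Knn_Accuracy,
    'Logistic Regression': LR_Accuracy,
    'SVM': SVM_Accuracy,
    'XGBoost': accuracy,
    'Voting ensemble': Voting_Accuracy,
}
for name, acc in summary.items():
    print('{:<20s} {:.3f}'.format(name, acc))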

# open a file where you want to store the data


file = open('Heart_Disease_Prediction_using_ECG.pkl', 'wb')
# dump information to that file
pickle.dump(voting_clf, file)

SAVE AND USE THE ABOVE MODEL IN THE STREAMLIT APP :


https://colab.research.google.com/drive/139YVmcUBCiP52J2sX3QE_eiu2sukVgpn?usp=sharing
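
As a rough illustration of the Streamlit side (hypothetical code, assuming a recent Streamlit version; the image-to-1D preprocessing is assumed to live elsewhere in the app), the pickled ensemble and the saved PCA model could be loaded once and reused for every uploaded ECG:

import pickle
import joblib
import streamlit as st

@st.cache_resource
def load_models():
    # artifacts saved by this notebook
    pca = joblib.load('PCA_ECG.pkl')
    with open('Heart_Disease_Prediction_using_ECG.pkl', 'rb') as f:
        clf = pickle.load(f)
    return pca, clf

pca, clf = load_models()
st.write('Loaded model:', type(clf).__name__)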
