P21 Final Project Report


Arrhythmia Prediction and Diagnosis using Data Analysis

Mangalnathan Vijayagopal, Shreyas Muralidhara, Nischal Kashyap, and Pawandeep Mendiratta
NC State University, Raleigh 27606
Project Team 21, Spring 2020, NC State University

ABSTRACT
We aim to detect and predict the type of arrhythmia from Electrocardiogram (ECG) data using machine learning models and algorithms. We train a model on a given dataset and then use test data to classify instances with unknown class labels.

KEYWORDS
datasets, support vector machines, neural networks, classifiers, accuracy

1 BACKGROUND AND INTRODUCTION
Problem Statement
Most cardiac disorders cause irregularities in heartbeat. These irregular patterns in the rhythm of the heartbeat are called arrhythmia. The electrocardiogram (ECG) [1] is the tool most preferred by clinical practitioners to capture heartbeat. ECG is known to be cost-effective, easy to use and noninvasive to the human body. However, physicians may not interpret electrocardiograms for large data sets effectively, as doing so is time consuming and can lead to misclassification of beats. They also cannot effectively identify the normalities and abnormalities in the heartbeat.
Although a single arrhythmia heartbeat may not have a serious impact on life, continuous arrhythmia beats can result in fatal circumstances. Therefore, automatic detection of arrhythmia beats from ECG signals is a significant task in the field of cardiology. To reduce the complexity and the possibility of human error in diagnosis, we leverage the computational power and indefatigability of machine learning models [1][2].
The UCI Machine Learning Repository transformed the ECG signals into QRS complexes (column 10) based on the R-peak (column 19) of 17 different types of beats. Our approach includes weighted class models (proportionate sampling for all the 17 classes of beats, because of a high standard deviation in the number of samples per class); QRS complex extraction based on annotated files, with the R-peak as the centre and a constant size; and designing machine learning models, including a deep neural network, and calibrating hyperparameters to obtain the best results.
The data set is split (70/30 rule) into training and testing data. We use 1D-CNN, MLP, SVM and KNN with k-fold cross validation to achieve the objectives.

2 RELATED WORK
We referred to the paper Automated Screening of Arrhythmia Using Wavelet Based Machine Learning Techniques [3].
The paper summarizes that different techniques have been used to extract the R point in ECG delineation. Most of the methods compress time domain features using Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA). PCA or any other feature compression technique should condense the features better than the time domain counterpart method. Classification uses 3 different algorithms: support vector machine (SVM) with various kernels, error back propagation neural network (EBPNN) and Gaussian mixture models (GMM).
From further analysis we infer that PCA would be better for feature selection than LDA. We implement Random Forest in contrast with PCA.
The block diagram of the proposed scheme was implemented with the suggested models: Support Vector Machine with various kernels, and neural network classifier models, namely a 1-dimensional Convolutional Neural Network and a back-propagation implementation using a simple Multi Layer Perceptron (MLP) network. As a novelty we have implemented k-fold kNN even though it was not suggested by the block diagram. However, we achieved the lowest accuracy for the kNN
model.

We have also referred to Machine Intelligent Diagnosis of ECG for Arrhythmia Classification Using DWT, ICA and SVM techniques [4].
In the paper, it is mentioned that arrhythmia classes can be grouped into 5 major classes for the MIT-BIH arrhythmia dataset: Non-ectopic (N), Supraventricular ectopic (S), Ventricular ectopic (V), Fusion (F) and Unknown (U). The R-peak of the de-noised ECG is detected using the Pan-Tompkins algorithm. The R-peak detected signal is segmented such that each segment consists of 99 samples before the R-peak and 100 samples after the R-peak. These 200-sample segments of cardiac beats from the five arrhythmia classes are used in that study. SVM separates the binary labeled training features with maximum margin from the hyperplane.
We infer that most classes of cardiac arrhythmia can be classified by linear separation, but when linear separation is not possible we can use non-linear kernel transformations for non-linear mapping to a higher dimensional feature space.

In the paper An integrated ECG feature extraction scheme using PCA and wavelet transform [5], it is mentioned that novel feature extraction on ECG using discrete random wavelet provides many features in time. Using PCA for feature selection compresses the features in the time domain. Hence wavelet features contribute more significantly than time domain features to arrhythmia class detection.
We found that applying PCA to capture the components with maximum variance will remove time domain features and retain the wavelet features, as the majority of wavelet features are captured when we have 95% cumulative variance. As a novelty we have implemented a Random Forest classifier for feature selection, which is not found in any of the references; it performs equally well in the SVM model and in fact better in the k-fold kNN models.

Finally, from the paper Heart rate dynamics distinguish among atrial fibrillation, normal sinus rhythm and sinus rhythm with frequent ectopy [6], we inferred that ventricular response analysis is based on the predictability of the inter-beat timing ('RR intervals') of the QRS complexes in the ECG. RR intervals are derived from the most obvious large amplitude feature in the ECG, the R-peak, the detection of which can be far more noise resistant. This approach may therefore be more suitable for automatic, real-time AF detection.

Therefore we conclude that the QRS complexes along with the R-peak detect the presence of cardiac arrhythmia. They also classify the majority of cardiac arrhythmias: Ischemic, Ventricular, Supraventricular and Atrioventricular arrhythmia.

3 METHODS
Approach
As noted in the introduction, the machine learning algorithms used for training the prediction model include Support Vector Machines (SVM), 1-Dimensional Convolutional Neural Network, Multilayer Perceptron (MLP), and K-Nearest Neighbour (KNN).

Figure 1: Arrhythmia Model Architecture

Before training, the first step in exploratory data analysis is data pre-processing. Data pre-processing involves cleaning the data to ensure that the data set fed to the model is consistent, clean and meaningful. This involves handling missing values, scaling the values of the attributes of the data set, and selecting only those attributes which contribute towards predictions.
This document explains the complete steps used for arrhythmia prediction and classification, which will contribute valuable information to medical institutes for patient diagnosis.

Data Preprocessing
Data preprocessing is a crucial step in exploratory data analysis. It involves ensuring that clean data is fed to the machine learning models so that they can use this information to provide accurate predictions.
We have performed the following steps for data preprocessing in the following order:
• Remove unwanted columns - Delete the attributes having more than 40% missing values as they do not contribute effectively to predictions.
• Replace missing values - Impute the missing values by replacing them with the attribute median. Median is chosen over mean because the mean is more susceptible to outliers.
• Attribute Scaling - Normalize the attributes so that values are scaled by processing. Scaling is necessary
because there might be values which might be too large compared to others.
We found that there was only one column (feature 14) eligible to be removed because it had more than 40% missing values across rows.
We used SimpleImputer [7], a class in the Python sklearn [8] package, to replace all the remaining missing values with the median of the column in which the missing value was present.
Similarly, we used the StandardScaler [9] (sklearn) functionality to scale all the values to a similar range while retaining their proportional differences.

Data Splitting
After data preprocessing, the next step is to split the data set into training and testing data. We followed the 70-30 rule for splitting: 70% of the entire data set is used for training the model and the remaining 30% for testing the accuracy of the trained model. We use a random data split with stratified sampling so that the class labels are distributed in the same proportions in the training and testing data. In other words, all the class labels have an equal chance to be considered for either training data or testing data.

Feature Selection
Here we select the features which predominantly contribute to the prediction of class labels. First, we remove the categorical features for which 95% of all the values were either completely 0s or completely 1s. We primarily use two techniques for feature selection.
• Random Forests
• Principal Component Analysis

Principal Component Analysis
Principal Component Analysis [11] or PCA is a statistical procedure used for feature selection and feature extraction. We try to map various principal components with respect to the given data points and find out which components cover or represent the variance in the correlated variables. In simple terms, PCA is mainly used to highlight and quantify the similarities and differences between features in the data set. The number of components is always less than or equal to the number of attributes in the data set. The first principal component always captures the largest variance in the data set, followed by subsequent orthogonal principal components. In our dataset, applying PCA to capture the components with maximum variance will remove time domain features and retain the wavelet features, as the majority of wavelet features are captured when we have 95% cumulative variance.
We decided the number of components based on the following plots.
• Plot of eigenvalues vs. number of components.
• Plot of percent of cumulative variance vs. number of components.
Based on the plots, we select the components whose cumulative variance captures 95% of variability.
By performing principal component analysis on the given data set, we obtained the information compiled into the plot in Figure 2.

Figure 2: Principal Component vs Eigen Values

By inspecting the plots, we estimate that approximately 88 principal components capture 95% of the cumulative variance. Therefore we select these 88 features to predict the class labels.

Random Forests
This model [10] works on a portion of the data set by continuously sampling with replacement and then fitting a decision tree to each sample. Each decision tree is a sequence of yes-no questions based on a single feature or a combination of features. Not all the features are considered by each tree, which ensures that individual decision trees are not correlated. Hence the classifier is less prone to over-fitting.
The measure of impurity is the Gini index. By implementing Random Forests, we select the desired attributes which do not cause model over-fitting.
The hyperparameter for Random Forest is the number of classifiers considered. We have set this value to 20. This is done to ensure that there is no bias in selecting the attributes from the dataset. If a feature is selected, then we can be sure that it was selected by a majority of decision trees generated by Random Forests.
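
The preprocessing, splitting and feature selection steps of this section can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the helper names (preprocess, pca_components_for, random_forest_features) and the synthetic data are hypothetical, and the rule for turning forest importances into a feature subset (keep the features whose Gini importance exceeds the mean importance, which is the SelectFromModel default) is an assumption, since the exact selection rule is not stated above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(X, max_missing=0.40):
    """Drop columns with more than 40% missing values, impute medians, standardize."""
    X = np.asarray(X, dtype=float)
    keep = np.isnan(X).mean(axis=0) <= max_missing  # per-column missing fraction
    X = SimpleImputer(strategy="median").fit_transform(X[:, keep])
    return StandardScaler().fit_transform(X), keep


def pca_components_for(X_train, variance=0.95):
    """Smallest number of principal components whose cumulative variance reaches the target."""
    cumulative = np.cumsum(PCA().fit(X_train).explained_variance_ratio_)
    return int(np.searchsorted(cumulative, variance) + 1)


def random_forest_features(X_train, y_train, n_trees=20, seed=0):
    """Boolean mask of columns kept by a 20-tree Gini forest (assumed rule: importance > mean)."""
    forest = RandomForestClassifier(n_estimators=n_trees, criterion="gini", random_state=seed)
    return SelectFromModel(forest).fit(X_train, y_train).get_support()


# Hypothetical stand-in for the arrhythmia table: 200 rows, 12 numeric attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # scattered missing values
X[:120, 5] = np.nan                     # one column that is 60% missing

X_clean, kept = preprocess(X)
# 70/30 split, stratified so both parts keep the class proportions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clean, y, test_size=0.30, stratify=y, random_state=0
)
print("columns kept after the 40% rule:", int(kept.sum()))
print("PCA components for 95% variance:", pca_components_for(X_tr))
print("features kept by the forest:", int(random_forest_features(X_tr, y_tr).sum()))
```

The sketch follows the order described above (preprocess, then split). A stricter protocol would fit the imputer and scaler on the training portion only, so that no statistics from the test rows leak into training.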
All the wavelet features were better extracted by Random Forest than by PCA, resulting in slightly more features than PCA. Hence the novel idea of using Random Forest seems to work better.
By implementing Random Forests on the given data set, we found that only 99 features (columns) out of 279 contribute predominantly to the prediction of the class label.

4 MODELS
Machine learning models are functions or algorithms which are used to predict output for an unknown input based on previous patterns of input-output combinations.

Support Vector Machines
Support vector machines [12] or SVM is a supervised machine learning model used for classification and regression analysis.
The objective of a support vector machine is to create a hyperplane in an N-dimensional space that distinctly classifies the data points. Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends upon the number of input features. For example, if the number of input features is 2, then the hyperplane is a line. If the number of input features is 3, then the hyperplane is a 2D plane.
In SVM, we first find the points of each class that lie closest to the dividing plane. These points are called support vectors. Next, we find the distance between the dividing plane and the support vectors. This distance is called the margin. The main goal is to maximize this margin to obtain an optimal dividing plane, also known as the hyperplane.
For our project, we have made use of the sklearn library for implementing the SVM model.
SVM uses a set of mathematical functions known as kernels [13]. A kernel function transforms input data into a required form. There are 9 kernel functions available. We have implemented the following kernels:
• Linear kernel
• Polynomial kernel
• Gaussian Radial Basis Function (RBF) kernel
• Sigmoid kernel function

The linear kernel is best suited for our model as the data set is linearly separated. For novelty, we compare the linear kernel function with the polynomial kernel and the radial basis kernel to verify that the accuracies are lower with the polynomial and radial basis kernel functions.
Our SVM implementation tunes the regularization hyperparameter (the critical factor) over values ranging from 10^-3 to 10^3.
We have implemented SVM using the linear kernel, polynomial kernel, and radial basis function kernel for the features selected by PCA and Random Forests. We can ascertain that the accuracy scores from Random Forest features and PCA features are similar.

k-Nearest Neighbors
The K Fold k-nearest neighbors (KNN) [14] algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems.
The basic idea of a K Fold KNN algorithm is to find the k nearest points in a training data set to a test data point and predict the class of the test data point based on the nearest data entities. Here the global data set is divided into 5 folds. Every fold is considered as a test data set at least once and prediction is made on it based on the other folds, which are aggregated as a training data set. The nearest points are decided on the basis of Euclidean distance or Minkowski distance calculations.
In our KNN algorithm, we initially calculate the Euclidean distance between every data point in the training data set and the test data point. We then select the k nearest neighbors from the training data, that is, the points which have the least Euclidean distance with respect to the test data point. The class variable of the test data point is predicted using the class variables of the k nearest neighbors.
KNN is desirable in areas where there is less information about the data set. For example, there may be outliers or redundancy in the data set, for which we may want to incorporate other rules for queries that don't fit well in the dimensional space in which the KNN algorithm runs.
The Cardiac Arrhythmia dataset is imbalanced: it has 16 class labels with disproportionate class counts (with no instances for class labels 11, 12 and 13). It is important for us to cross validate every data point in order to obtain higher efficiency. This is where the K Fold KNN [16] algorithm comes into play. With the help of this algorithm, we test every data point and classify it to its respective class label.

1-D Convoluted Neural Network
Since this is a classification problem, the convolutional model works the best as a classifier. We are using the keras.layers.Conv1D layer along with fully connected layers for building the model. The approach includes creating a single block of convolutions and flattening the results as required by the connected layers to generate the class labels.
We are using the Conv1D model because it is tabular data. Since
our data is imbalanced (the distribution of class labels is not uniform across the dataset), we use a weighted loss function to penalize misclassification of labels. The labels which occur less often in the dataset cannot afford to be misclassified. Therefore higher weights are assigned to such classes using sklearn's compute_class_weight function.
We are using the Adam optimizer with a softmax layer in order to obtain the final predictions.

Figure 3: 1D-CNN Layer Architecture

Multi Layer Perceptron
Multi Layer Perceptron, or MLP, is a feed-forward neural network whose layers of neurons are fully connected. The input layer in the model is fully connected to the hidden layer and the output layer is fully connected to the hidden layer. Such an implementation is called a Deep Neural Network. For this model, we implement the weighted loss function to compensate for the imbalanced classes in the training data set. [1]

Figure 4: Multi Layer Perceptron Architecture

Rationale
We implemented SVM because it is the most reliable model for the stratified arrhythmia data set for classification and regression.
We implemented KNN because this model requires no training before making predictions. Therefore, new data can be seamlessly added to the model without impacting the accuracy scores.
Neural networks are known for high classification accuracy. This is because the deep layers of a neural network represent hierarchical and non-linear combinations of features and patterns detected from the input. Therefore, instead of hand coding essential features, the neural network autonomously chooses features, resulting in high classification accuracy. This justifies our choice of neural networks (MLP and 1D-CNN) for arrhythmia classification.
Multi Layer Perceptrons (MLP) are universal approximators which can be used to create mathematical models using regression analysis. Since classification is a form of regression when the class labels are categorical, MLPs make good classifier algorithms.
CNNs are generally preferred for image classification, but a flattened image is equivalent to a 1D input with multiple attributes. Since the waveform is converted into attributes in the data set, 1D-CNN is also a good choice as a machine learning model for this data set.

5 EXPERIMENTS AND RESULTS
Dataset
We are using the UCI Machine Learning repository (http://archive.ics.uci.edu/ml/datasets/Arrhythmia), which comprises a data set containing arrhythmia data for 452 patients (rows), each of which contains 279 attributes (columns). These patients are classified into 1 out of 16 types of arrhythmia (class labels). Our feature space of 279 attributes includes patient information such as gender, age, height, PQRST wave signal and channel signal information [2].
The class names, key factors and class distribution for the UCI Arrhythmia dataset are described in Table 1.

Hypothesis
After thorough analysis of the dataset and diligent review of related work, we have come up with the following hypothesis:
(1) Predicting the common types of arrhythmia from ECG data by detecting patterns in QRS complexes based on the R-peak values which are derived from the dataset. If the ECG waveform is between 30 ms and 60 ms or beyond that, i.e. the RR peak width is beyond 60 ms, then the data is considered a probable arrhythmia. Our main goal is to predict the non-arrhythmia results as accurately as possible.
(2) Perform proportionate sampling for all 16 classes of heartbeats, due to the high standard deviation in the number of samples, by using weighted class models.

Support Vector Machines
The Support Vector Machines algorithm was applied along with the feature selection approaches PCA and Random Forest. With the kernels and the critical factors as the hyperparameters, we obtained a maximum accuracy of 73.53% with linear
kernel (critical factor c = 0.01 (PCA), c = 0.1 (Random Forest classifier)) for both the feature selection methods.
The plots in Figure 5 and Figure 6 show the comparisons between the accuracies of the kernel functions for the corresponding critical factors for both the feature selection approaches. Figure 7 displays the classification report for the best model.

Arrhythmia class | Key factors | No. of instances
1. Normal | QRS complexes based on the R-peak | 245
2. Ischemic changes | QRS complexes based on the R-peak | 44
3. Old Anterior Myocardial Infarction | RR interval model & PR interval variability with P similarity | 15
4. Old Inferior Myocardial Infarction | interval model & PR interval variability with P similarity | 15
5. Sinus tachycardy | RR interval irregularity | 13
6. Sinus bradycardy | RR interval irregularity | 25
7. Ventricular Premature Contraction (PVC) | QRS complexes based on the R-peak | 3
8. Supraventricular Premature Contraction | QRS complexes based on R-peak | 2
9. Left bundle branch block | QRS complexes based on R-peak | 9
10. Right bundle branch block | QRS complexes based on R-peak | 50
11. 1st degree AtrioVentricular block | QRS complexes based on R-peak | 0
12. 2nd degree AV block | QRS complexes based on R-peak | 0
13. 3rd degree AV block | QRS complexes based on R-peak | 0
14. Left ventricule hypertrophy | QRS complexes based on R-peak | 4
15. Atrial Fibrillation or Flutter | Absence of P waves, presence of f-waves in TQ interval | 5
16. Others | | 22
Table 1: Class distribution for UCI Arrhythmia data

Figure 5: SVM Accuracies for PCA Features

Figure 6: SVM Accuracies for Random Forest Features

k-Nearest Neighbors
We generated accuracy scores of the model for multiple values of k for both the feature selection methods. We found a maximum accuracy of 61.27% at k = 5 for PCA and a maximum accuracy of 65.49% at k = 3 for Random Forest. The data set was initially split into 7 folds and K Fold cross validation was performed for various values of k. We also observed that the accuracy was slightly higher for Random Forest than for PCA.
The plots for the accuracy scores mentioned are found in Figure 8 and Figure 9 respectively.

1D-CNN
The convolution layers with 64 filters and then 128 filters are added as a block of two respectively. The activation function for the convolutional block is ReLU with the fixed
kernel size of 10. In order to retain the important features from the block, we apply a dropout of 0.3 to the convolution max-pool output. The fully connected dense layers with 128 and 64 units are added in order to extract the features specific to the class, and the predictions are extracted from the softmax layer with the Adam optimizer at its default learning rate of 0.001. We report the precision, recall and f1-score values for the individual arrhythmia classes along with the macro and weighted averages, as shown in Figure 10.
With the above hyperparameters we achieved a best accuracy of 69.85% on imbalanced test data of 136 records. Since all the layers are trainable, hyperparameter tuning can be performed for all layers.

Figure 7: Classification report for Linear SVM

Figure 8: KNN Accuracies for PCA Features

Figure 9: KNN Accuracies for Random Forest Features

Figure 10: Classification report for 1D-CNN

Multi Layer Perceptron
Since the hidden layers and the softmax output layer are trainable, hyperparameter tuning is performed for all layers. The hidden layers with 64 and 128 units and ReLU activation are fixed for the multi layer perceptron, and the results are extracted using softmax with class labels of size 17. The model uses the Adam optimizer with the learning rate set to its default of 0.001. The classification report for the Multi Layer Perceptron model is found in Figure 11.

Discussion
For SVM, the data is linearly separated. From our work, we conclude that the linear kernel works best for the data set by comparing accuracy scores with other multi-dimensional
kernels.
For KNN, this was a novel approach which was not implemented in prior related work. The reason we chose KNN was that it was the only model where the entire dataset is validated using k-fold KNN, making the model more robust to provide accuracy for all the classes. But the accuracies turned out to be considerably lower than those of the models suggested within the scope of prior work: SVM, 1D-CNN and MLP.

The previous work suggested using PCA for feature selection, while we find the Random Forest classifier to be equally good at feature selection for the SVM and KNN implementations. This is another novel approach.
In prior related work, the ECG data was directly fed to the models. We have used the tabular form of ECG data available at the UCI Machine Learning repository. Therefore, it is not insightful to compare accuracy scores with each other.
Using weighted loss functions for 1D-CNN as well as MLP, we reduce the penalty for using balanced class models.
None of the related work referenced has implemented 1D-CNN and MLP. These are novel approaches to the dataset implemented by us to effectively solve the problem statement.
By implementing these models, we have covered the objectives specified in the hypothesis section.

Figure 11: Classification report for MLP

6 CONCLUSIONS AND FUTURE SCOPE
As mentioned in the Discussion subsection, we were successfully able to predict classifications for all the labels which had adequate information required to train the models and then perform proportionate sampling of all the 16 classes of heartbeats.
In our current approach we have considered detecting and classifying the class-imbalanced UCI Arrhythmia tabular data based on the QRS wavelets based on the R-peak value, and achieved accuracy values ranging from 60% to 70% with a maximum accuracy of 73.53% by the Neural Network Models 1D-CNN and MLP, using the weighted loss function.
Future scope of work includes developing transfer learning models using ResNet, InceptionResNetV2 and VGG, which would be trained on the ECG sinus wave graph data for arrhythmia detection and classification. By ensembling results from our 1D-CNN model predictions with transfer learning model predictions, based on individual class confidence levels for both the models, we expect that the hybrid model could potentially reach an accuracy of 85% for each class of cardiac arrhythmia.

7 ACKNOWLEDGEMENTS
We thank Dr. Thomas Price, the teaching assistants and the Dept. of Computer Science at North Carolina State University for their support and guidance.

8 REFERENCES
[1] A. Das, F. Catthoor and S. Schaafsma, "Heartbeat Classification in Wearables Using Multi-layer Perceptron and Time-Frequency Joint Distribution of ECG," 2018 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington, DC, USA, 2018, pp. 69-74.
[2] N. Kalkstein, Y. Kinar, M. Na'aman, N. Neumark and P. Akiva, "Using machine learning to detect problems in ECG data collection," 2011 Computing in Cardiology, Hangzhou, 2011, pp. 437-440.
[3] Martis, R.J., Krishnan, M.M.R., Chakraborty, C. et al. Automated Screening of Arrhythmia Using Wavelet Based Machine Learning Techniques. J Med Syst 36, 677–688 (2012). https://doi-org.prox.lib.ncsu.edu/10.1007/s10916-010-9535-7
[4] U. Desai, R. J. Martis, C. G. Nayak, Sarika K. and G. Seshikala, "Machine intelligent diagnosis of ECG for arrhythmia classification using DWT, ICA and SVM techniques," 2015 Annual IEEE India Conference (INDICON), New Delhi, 2015, pp. 1-4.
[5] R. J. Martis, C. Chakraborty and A. K. Ray, "An Integrated ECG Feature Extraction Scheme Using PCA and Wavelet Transform," 2009 Annual IEEE India Conference, Gujarat, 2009, pp. 1-4.
[6] M. Carrara, L. Carozzi, T.J. Moss, M. De Pasquale, S. Cerutti, M. Ferrario, D.E. Lake, J.R. Moorman, Heart rate dynamics distinguish among atrial fibrillation, normal sinus rhythm and sinus rhythm with frequent ectopy, Physiol Meas 36 (9) (2015) 1873–1888.
[7] A. M. Salem, K. Revett and E. A. El-Dahshan, "Machine learning in electrocardiogram diagnosis," 2009 International
Multiconference on Computer Science and Information Technology, Mragowo, 2009, pp. 429-433.
[8] Scikit-learn: A machine learning library for python - https://scikit-learn.org/stable/
[9] StandardScaler: Feature Scaler - https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02
[10] Random Forests - https://en.wikipedia.org/wiki/Random_forest
[11] Principal Component Analysis - https://en.wikipedia.org/wiki/Principal_component_analysis
[12] Support Vector Machines - https://data-flair.training/blogs/svm-support-vector-machine-tutorial/
[13] SVM Kernel Functions - https://data-flair.training/blogs/svm-kernel-functions/
[14] Novitasari, H B, Nur Hadianto, Sfenrianto, A Rahmawati, Risha Prasetyo, Jaja Miharja and Windu Gata. "K-nearest neighbor analysis to predict the accuracy of product delivery using administration of raw material model in the cosmetic industry (PT Cedefindo)." (2019).
[15] Cross Validation - https://en.wikipedia.org/wiki/Cross-validation_(statistics)
[16] K Fold - http://statweb.stanford.edu/~tibs/sta306bfiles/cvwrong.pdf

A MEETING SCHEDULES
Throughout the project duration, all four of us punctually attended meetings via Zoom video conference on the scheduled days given below:
(1) April 4th - 1:30pm to 3:30pm
(2) April 9th - 5:30pm to 8:30pm
(3) April 14th - 7:30pm to 10:30pm
(4) April 16th - 7:30pm to 10:30pm
(5) April 18th - 4:30pm to 6:30pm
(6) April 21st - 1:00pm to 5:00pm
(7) April 22nd - 1:00pm to 5:00pm
(8) April 23rd - 4:00pm to 7:00pm
(9) April 24th - 3:00pm to 6:00pm

B PROJECT LINK
You can find this project on the NCSU GitHub Enterprise server using the following link.
https://github.ncsu.edu/mvijaya2/ALDAProject
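
For readers without access to the repository, the class weighting and the model evaluation described in Sections 4 and 5 can be approximated with the sketch below. It is illustrative only and makes several assumptions that are not taken from the project code: the function names (kfold_knn_accuracy, svm_accuracy_grid) are hypothetical, the folds are shuffled with a fixed seed, "balanced" class weights are used, and the SVM and KNN demonstrations run on synthetic data rather than the arrhythmia features. Only the class counts are taken from Table 1.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Instance counts from Table 1, skipping classes 11, 12 and 13 (no instances).
counts = {1: 245, 2: 44, 3: 15, 4: 15, 5: 13, 6: 25, 7: 3,
          8: 2, 9: 9, 10: 50, 14: 4, 15: 5, 16: 22}
labels = np.repeat(list(counts), list(counts.values()))
classes = np.unique(labels)
# "balanced" weight of a class = n_samples / (n_classes * class_count)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = dict(zip(classes.tolist(), weights.tolist()))


def kfold_knn_accuracy(X, y, k_values=(1, 3, 5, 7), folds=5, seed=0):
    """Mean k-fold accuracy of a Euclidean KNN classifier for each k."""
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)  # shuffling is an assumption
    return {
        k: float(cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=cv).mean())
        for k in k_values
    }


def svm_accuracy_grid(X_train, y_train, X_test, y_test,
                      kernels=("linear", "poly", "rbf"), exponents=range(-3, 4)):
    """Test accuracy for every kernel and every C = 10**e, e from -3 to 3."""
    scores = {}
    for kernel in kernels:
        for e in exponents:
            model = SVC(kernel=kernel, C=10.0 ** e).fit(X_train, y_train)
            scores[(kernel, e)] = float(model.score(X_test, y_test))
    return scores


# Synthetic, linearly separable stand-in data (not the arrhythmia features).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)

knn_scores = kfold_knn_accuracy(X, y)
svm_scores = svm_accuracy_grid(X[:100], y[:100], X[100:], y[100:])
best_kernel, best_exponent = max(svm_scores, key=svm_scores.get)
print("class weight, Normal vs. class 8:", class_weight[1], class_weight[8])
print("KNN mean accuracy per k:", knn_scores)
print("best SVM setting:", best_kernel, "C = 10 **", best_exponent)
```

The class_weight dictionary has the form that Keras accepts in the class_weight argument of model.fit once the labels are mapped to consecutive integer indices, which is one way to realise the weighted loss described for the 1D-CNN and MLP models.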
