P21 Final Project Report
Arrhythmia Prediction and Diagnosis using Data Analysis
Mangalnathan Vijayagopal ([email protected]), NC State University, Raleigh 27606
Shreyas Muralidhara ([email protected]), NC State University, Raleigh 27606
Nischal Kashyap, NC State University, Raleigh 27606
Pawandeep Mendiratta, NC State University, Raleigh 27606
model. We have also referred to Machine Intelligent Diagnosis of ECG for Arrhythmia Classification Using DWT, ICA and SVM Techniques [4]. That paper groups arrhythmia into five major classes for the MIT-BIH arrhythmia dataset: Non-ectopic (N), Supraventricular ectopic (S), Ventricular ectopic (V), Fusion (F), and Unknown (U). The R-peak is detected in the de-noised ECG using the Pan-Tompkins algorithm, and the R-peak-detected signal is segmented such that each segment consists of 99 samples before the R-peak and 100 samples after it. Each of these 200-sample cardiac beats across the five arrhythmia classes is used in that study. SVM separates the binary-labeled training features with maximum margin from the hyperplane. We infer that most classes of cardiac arrhythmia can be classified by linear separation; when linear separation is not possible, we can use non-linear kernel transformations to map the features into a higher-dimensional space.
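To make the windowing scheme concrete, the following is a minimal sketch of the fixed-length beat extraction described in [4]; the synthetic signal, the peak positions, and the function name are placeholder assumptions of ours, not material from that paper.

```python
import numpy as np

def segment_beats(ecg, r_peaks, before=99, after=100):
    """Cut one (before + 1 + after)-sample window around each R-peak,
    i.e. 200 samples per beat for the defaults reported in [4]."""
    segments = [ecg[p - before : p + after + 1]
                for p in r_peaks
                if p - before >= 0 and p + after < len(ecg)]  # skip edge beats
    return np.asarray(segments)

# Synthetic stand-ins for a de-noised ECG trace and Pan-Tompkins peak indices.
ecg = np.sin(np.linspace(0, 60 * np.pi, 10_000))
r_peaks = np.arange(200, 9_800, 350)
print(segment_beats(ecg, r_peaks).shape)  # (28, 200)
```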
3 METHODS
Approach
As noted in the introduction, the machine learning algorithms used for training the prediction model include Support Vector Machines (SVM), 1-Dimensional Convolutional Neural Network (1D-CNN), Multilayer Perceptron (MLP), and K-Nearest Neighbours (KNN) to achieve our objectives.
because there might be values that are too large compared to others.
We found that only one column (feature 14) was eligible to be removed, because it had more than 40% missing values across rows.
We used SimpleImputer [7], a function in the Python sklearn [8] package, to replace all the remaining missing values with the median of the column in which the missing value was present.
Similarly, we used the StandardScaler [9] functionality (also from sklearn) to scale all the values to a similar range while retaining their proportional differences.
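As an illustration of these two steps, here is a minimal sketch using the cited sklearn utilities; the toy matrix stands in for the UCI arrhythmia table and is our assumption, not project data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy stand-in: 3 rows x 4 features with missing entries (np.nan).
X = np.array([[1.0, 200.0, np.nan, 3.0],
              [2.0, np.nan, 5.0, 4.0],
              [3.0, 180.0, 6.0, np.nan]])

# Replace each missing value with the median of its column.
X_imputed = SimpleImputer(strategy="median").fit_transform(X)

# Standardize each column to zero mean and unit variance, so that
# features with large values do not dominate the small-valued ones.
X_scaled = StandardScaler().fit_transform(X_imputed)
print(X_scaled.round(2))
```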
Data Splitting
After data preprocessing, the next step is to split the data set into training and testing data. We follow the 70-30 rule: 70% of the entire data set is used for training the model, and the remaining 30% is used for testing the accuracy of the trained model. We use a random split with stratified sampling so that the class labels are distributed in the same proportions in the training and testing data; in other words, every class is represented in both the training data and the testing data.
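A minimal sketch of this split, assuming a synthetic stand-in for the preprocessed feature matrix:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed arrhythmia table.
X, y = make_classification(n_samples=452, n_features=20,
                           n_informative=10, n_classes=3, random_state=42)

# 70-30 random split; stratify=y keeps the class proportions the same
# in the training and the testing partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)  # (316, 20) (136, 20)
```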
Feature Selection
Here we select the features which predominantly contribute to the prediction of the class labels. First, we remove the categorical features for which 95% of all the values are either 0s or 1s. We primarily use two techniques for feature selection:
• Random Forests
• Principal Component Analysis
All the wavelet features were better extracted by Random Forest than by PCA, resulting in slightly more selected features than with PCA; hence the novel idea of using Random Forest for feature selection seems to work better.
By implementing Random Forests on the given data set, we found that only 99 features (columns) out of 279 contribute predominantly to the prediction of the class label.
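The following is a minimal sketch of how such a Random-Forest-driven selection can be done with sklearn; the estimator settings and the synthetic data are our assumptions, with only the 99-of-279 feature count taken from the text above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in shaped like the arrhythmia data (279 features).
X, y = make_classification(n_samples=452, n_features=279,
                           n_informative=60, random_state=42)

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Keep the 99 most important columns, mirroring the count reported above;
# threshold=-np.inf makes max_features the only selection criterion.
selector = SelectFromModel(forest, threshold=-np.inf,
                           max_features=99, prefit=True)
print(selector.transform(X).shape)  # (452, 99)
```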
4 MODELS
Machine learning models are functions or algorithms that are used to predict the output for an unknown input based on previous patterns of input-output combinations.
Support Vector Machines
Support vector machines [12], or SVM, is a supervised machine learning model used for classification and regression analysis.
The objective of a support vector machine is to create a hyperplane in an N-dimensional space that distinctly classifies the data points. Hyperplanes are decision boundaries that help classify the data points: points falling on either side of the hyperplane can be attributed to different classes. The dimension of the hyperplane depends on the number of input features. For example, if the number of input features is 2, the hyperplane is a line; if the number of input features is 3, the hyperplane is a 2D plane.
In SVM, we first find the points of each class that lie closest to the other classes; these points are called support vectors. Next, we find the distance between the dividing plane and the support vectors, which is called the margin. The main goal is to maximize this margin to obtain an optimal dividing plane, also known as the hyperplane. For our project, we used the sklearn library to implement the SVM model.
SVM uses a set of mathematical functions known as kernels [13]. A kernel function transforms the input data into the required form. Several kernel functions are available; we implemented the following:
• Linear kernel
• Polynomial kernel
• Gaussian Radial Basis Function (RBF) kernel
• Sigmoid kernel
The linear kernel is best suited for our model, as the data set is linearly separable. For novelty, we compare the linear kernel with the polynomial and radial basis kernels to verify that the accuracies are lower with those kernel functions. Our SVM implementation also includes regularization, with the penalty factor C ranging from 10^-3 to 10^3.
We decided the number of principal components to retain based on the following plots:
• Eigenvalues vs. number of components
• Percent of cumulative variance vs. number of components
Based on these plots, we select the components whose cumulative variance captures 95% of the variability. By performing principal component analysis on the given data set, we obtained the information compiled into the plot in Figure 1.
Figure 2: Principal Component vs Eigen Values
By inspecting the plots, we estimate that approximately 88 principal components capture 95% of the cumulative variance; we therefore select these 88 features to predict the class labels.
We implemented SVM with the linear, polynomial, and radial basis function kernels on the features selected by both PCA and Random Forests, and found that the accuracy scores from the Random Forest features and the PCA features are similar.
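A minimal sketch combining the two steps above, PCA retaining 95% of the cumulative variance and a kernel/C sweep for the SVM; the grid granularity and the synthetic data are our assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the 279-feature arrhythmia table.
X, y = make_classification(n_samples=452, n_features=279,
                           n_informative=60, random_state=42)

# Keep enough principal components to capture 95% of the variance
# (roughly 88 components on the real data set, per the plots above).
X_pca = PCA(n_components=0.95).fit_transform(X)

# Sweep the kernels and the regularization factor C over 10^-3..10^3.
grid = GridSearchCV(SVC(),
                    {"kernel": ["linear", "poly", "rbf"],
                     "C": np.logspace(-3, 3, 7)},
                    cv=5)
grid.fit(X_pca, y)
print(grid.best_params_, grid.best_score_)
```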
k-Nearest Neighbors
The k-nearest neighbors (KNN) [14] algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems.
The basic idea of our K-fold KNN algorithm is to find the k nearest points in a training data set to a test data point and predict the class of the test data point from those nearest data entities. Here the global data set is divided into 5 folds. Every fold is used as a test data set exactly once, and predictions on it are made from the other folds, which are aggregated into a training data set. The nearest points are decided on the basis of Euclidean or Minkowski distance calculations.
In our KNN algorithm, we first calculate the Euclidean distance between every data point in the training data set and the test data point. We then select the k neighbors from the training data with the least Euclidean distance to the test data point, and predict the class variable of the test data point from the class variables of those k nearest neighbors.
KNN is desirable in areas where there is little information about the data set. For example, there may be outliers or redundancy in the data set, for which we may want to incorporate additional rules for queries that do not fit well into the dimensional space in which the KNN algorithm runs.
The Cardiac Arrhythmia dataset is an imbalanced dataset with 16 class labels and disproportionate class counts (no samples for class labels 11, 12, and 13). It is important for us to cross-validate every data point in order to obtain higher efficiency. This is where the K-fold KNN [16] algorithm comes into play: with its help, we test every data point and classify it into its respective class label.
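A minimal sketch of this 5-fold KNN validation; the value of k, the shuffling seed, and the synthetic data are our assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the selected-feature arrhythmia table.
X, y = make_classification(n_samples=452, n_features=99,
                           n_informative=40, random_state=42)

# Every point serves as test data exactly once across the 5 folds and
# is classified from its k nearest (Euclidean) training neighbours.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(knn, X, y, cv=folds)
print(accuracy_score(y, y_pred))
```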
1-D Convolutional Neural Network
Since this is a classification problem, a convolutional model works well as a classifier. We use the keras.layers.Conv1D layer along with fully connected layers to build the model. The approach consists of creating a single block of convolutions and flattening the results, as required by the connected layers, to generate the class labels. We use a Conv1D model because the data is tabular.
Since our data is imbalanced (the distribution of class labels is not uniform across the dataset), we use a weighted loss function to penalize misclassification of labels: the labels that occur less often in the dataset cannot afford to be misclassified, so higher weights are assigned to those classes using sklearn's compute_class_weight utility. We use the Adam optimizer with a softmax layer to obtain the final predictions.
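A minimal sketch of such a model, with layer sizes of our own choosing (the actual architecture is shown in Figure 3); the class-weight computation follows the compute_class_weight usage described above, and the data is a synthetic stand-in.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras

n_features, n_classes = 99, 16
X = np.random.rand(452, n_features, 1)           # stand-in feature matrix
y = np.random.randint(0, n_classes, size=452)    # stand-in class labels

# Give rarer classes higher weights so misclassifying them costs more.
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
class_weight = {int(c): w for c, w in zip(np.unique(y), weights)}

# One convolution block, flattened into dense layers, softmax output.
model = keras.Sequential([
    keras.layers.Conv1D(32, kernel_size=5, activation="relu",
                        input_shape=(n_features, 1)),
    keras.layers.MaxPooling1D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, class_weight=class_weight, verbose=0)
```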
Figure 3: 1D-CNN Layer Architecture
Figure 4: Multi Layer Perceptron Architecture
Rationale
We implemented SVM because it is the most reliable model for the stratified arrhythmia data set for classification and regression.
We implemented KNN because this model requires no training before making predictions; therefore, new data can be seamlessly added to the model without impacting the accuracy scores.
Neural networks are known for high classification accuracy. This is because the deep layers of a neural network represent hierarchical and non-linear combinations of features and patterns detected from the input. Therefore, instead of hand-coding essential features, the neural network autonomously chooses features, resulting in high classification accuracy. This justifies our choice of neural networks (MLP and 1D-CNN) for arrhythmia classification.
Multi Layer Perceptrons (MLP) are universal approximators which can be used to create mathematical models through regression analysis. Since classification is a form of regression when the class labels are categorical, MLPs make good classifier algorithms.
1D-CNNs are generally preferred for image classification, but a flattened image is equivalent to a 1D input with multiple attributes. Since the waveform is already converted into attributes in our data set, a 1D-CNN is also a good choice of machine learning model for this data set.
Hypothesis
After thorough analysis of the dataset and diligent review of related work, we have come up with the following hypotheses:
(1) Predict the common types of arrhythmia from ECG data by detecting patterns in the QRS complexes, based on the R-peak values derived from the dataset. If the ECG waveform's RR peak width is between 30 ms and 60 ms, or beyond 60 ms, the record is considered to indicate a probability of arrhythmia. Our main goal is to predict the non-arrhythmia results as accurately as possible.
(2) Perform proportionate sampling for all 16 classes of heartbeats, which show a high standard deviation in the number of samples per class, by using weighted class models.
Support Vector Machines
The Support Vector Machines algorithm was applied along with the feature selection approaches PCA and Random Forest. With the kernels and the regularization factors as the hyperparameters, we obtained a maximum accuracy of 73.53% with the linear kernels.
Figure 11: Classification report for MLP
For KNN, this was a novel approach that was not implemented in prior related work. The reason we chose KNN was that it was the only model where the entire dataset is validated, using k-fold KNN, making the model more robust in providing accuracy for all the classes. However, the accuracies turned out to be considerably lower than those of the models suggested in prior work: SVM, 1D-CNN, and MLP.
The previous work suggested using PCA for feature selection, while we find the Random Forest classifier to be equally good at feature selection for the SVM and KNN implementations. This is another novel approach.
In prior related work, the ECG data was fed directly to the models, whereas we used the tabular form of the ECG data available at the UCI Machine Learning repository. Therefore, it is not insightful to compare the accuracy scores with each other.
By using weighted loss functions for 1D-CNN as well as MLP, we reduce the penalty that comes with using balanced class models.
None of the referenced related work has implemented 1D-CNN or MLP. These are novel approaches to the dataset, implemented by us to effectively solve the problem statement.
By implementing these models, we have covered the objectives specified in the hypothesis section.
6 CONCLUSIONS AND FUTURE SCOPE Cerutti, M. Ferrario, D.E. Lake, J.R. Moorman, Heart rate
As mentioned in the Discussion subsection, we were successfully able to predict classifications for all the labels which had adequate information to train the models, and then to perform proportionate sampling of all the 16 classes of heartbeats.

7 ACKNOWLEDGEMENTS
We thank Dr. Thomas Price, the teaching assistants, and the Dept. of Computer Science at North Carolina State University for their support and guidance.

8 REFERENCES
[1] A. Das, F. Catthoor and S. Schaafsma, "Heartbeat Classification in Wearables Using Multi-layer Perceptron and Time-Frequency Joint Distribution of ECG," 2018 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington, DC, USA, 2018, pp. 69-74.
[2] N. Kalkstein, Y. Kinar, M. Na'aman, N. Neumark and P. Akiva, "Using machine learning to detect problems in ECG data collection," 2011 Computing in Cardiology, Hangzhou, 2011, pp. 437-440.
[3] R. J. Martis, M. M. R. Krishnan, C. Chakraborty et al., "Automated Screening of Arrhythmia Using Wavelet Based Machine Learning Techniques," J Med Syst 36, 677-688 (2012). https://doi-org.prox.lib.ncsu.edu/10.1007/s10916-010-9535-7
[4] U. Desai, R. J. Martis, C. G. Nayak, Sarika K. and G. Seshikala, "Machine intelligent diagnosis of ECG for arrhythmia classification using DWT, ICA and SVM techniques," 2015 Annual IEEE India Conference (INDICON), New Delhi, 2015, pp. 1-4.
[5] R. J. Martis, C. Chakraborty and A. K. Ray, "An Integrated ECG Feature Extraction Scheme Using PCA and Wavelet Transform," 2009 Annual IEEE India Conference, Gujarat, 2009, pp. 1-4.
[6] M. Carrara, L. Carozzi, T. J. Moss, M. De Pasquale, S. Cerutti, M. Ferrario, D. E. Lake and J. R. Moorman, "Heart rate dynamics distinguish among atrial fibrillation, normal sinus rhythm and sinus rhythm with frequent ectopy," Physiol Meas 36 (9) (2015) 1873-1888.
[7] A. M. Salem, K. Revett and E. A. El-Dahshan, "Machine learning in electrocardiogram diagnosis," 2009 International