Arrhythmia Detection - A Machine Learning Based
Abstract— Cardiovascular arrhythmias are the most common cardiac problem in the world. This work focuses on the development of automated detection and classification of arrhythmia using the MIT-BIH database and compares five machine learning algorithms with three different features. Pre-processing followed by beat detection is applied on one channel to obtain individual beats having a PQRST window. Three features (viz. a simple amplitude feature of 300 samples, an area feature with a non-overlapped sliding window, and an area feature with an overlapped sliding window) were used with the five classifiers in this study. It was observed that the Artificial Neural Network with amplitude features gave the best result with 99.59% accuracy, which is comparable to state-of-the-art methods.

Keywords—ANN, Arrhythmia, Decision Tree, ECG, MIT-BIH, Machine Learning, Signal Processing, Random Forest, SVM

I. INTRODUCTION
Cardiovascular arrhythmias are a very common form of disease which might cause cardiac arrest or even death [1]. Occurrences of some arrhythmias are very infrequent; hence the patient has to be monitored for a long time to identify the arrhythmia [2]. Manual diagnosis of arrhythmia from ECG is time consuming and often varies with the expertise of the cardiologist. Various techniques for auto-detection of arrhythmia have been developed across the world to diagnose patients [1], [3]–[6]. There is a need for robust detection of arrhythmias for prevention of further loss of life.
In this work, we have developed an auto-detection and classification approach for different types of arrhythmias. Normal beats and four different types of arrhythmias from the benchmark MIT-BIH dataset are used for the development of the approach. Beat detection followed by classification using three different features (viz. amplitude, area with non-overlapped sliding window and area with overlapped sliding window) was used to compare five different classifiers (viz. Support Vector Machine, Decision Tree, Random Forest, Naïve Bayes and Artificial Neural Network). It is demonstrated that the Artificial Neural Network with the amplitude feature performs best amongst all, with 99.59% accuracy.

II. STATE-OF-THE-ART
During the acquisition of ECG data, various noises and physiological artifacts affect the signal [5]–[7]. Therefore, a pre-processing algorithm is required to refine the data. Previous work includes various ECG features such as R-time domain, frequency and morphological features [1]. Thereafter, various machine learning algorithms have been trained in recent works to identify arrhythmia or its types. Most of this work used support vector machines (SVM) [8], artificial neural networks (ANN) [9], decision trees [10] and random forest classifiers [11]. Deep learning methodology has been applied to differentiate classes of arrhythmia [3], [4]. Recently, artificial neural networks (ANN) and deep learning architectures have become prominent in decision-making systems. A CNN-based 34-layer deep learning framework was trained on patient data to achieve cardiologist-level performance for arrhythmia detection, classifying 14 classes in real time [3]. To automate the task of arrhythmia detection, another work classified normal (N), right bundle branch block (RBBB) and paced beats by a transfer learning approach using AlexNet [4]. In another work, ventricular arrhythmia and non-ventricular arrhythmia classes were diagnosed with the help of two and three personalized features with a support vector machine [5]. A system based on nonlinear analysis of variational modes of ECG was presented with two novel features, viz. variational mode sample entropy and variational mode distribution entropy, followed by a multiclass support vector machine classifier with a radial basis function [6]. Multinomial logistic regression for detecting arrhythmia achieved 93.13% using R-R interval based features [1].

III. MATERIALS AND METHODOLOGY
This section describes the database, the selection of arrhythmia types, beat detection and the classification approach.
A. Benchmark Dataset
The MIT-BIH dataset, which is publicly available at PhysioNet [12], is used in this work. The dataset, which was prepared over five years, consists of 48 records of 2-channel ECG digitized at a rate of 360 samples per second. 25 male subjects aged from 32 to 89 years and 22 female subjects aged from 23 to 89 years were involved in the development of this database. 60% of the total subjects were inpatients. The beats which could be identified as QRS were annotated and number about 109,000. Six records in the dataset contain 33 beats which remain unclassified because of inability to reach agreement on beat types [13].
This work is focused on classification of five classes of arrhythmia, namely Normal (N), Paced Beat (/), Right Bundle Branch Block Beat (R), Left Bundle Branch Block (L) and
Authorized licensed use limited to: University of Wollongong. Downloaded on May 30,2020 at 00:31:42 UTC from IEEE Xplore. Restrictions apply.
Premature Ventricular Beat (V). From the two-channel ECG data, we considered only the first channel for this work. ECG may be affected by noise, so we applied pre-processing steps to refine the signal. We then segmented the signal into windows of 300 sample points for further feature computation, as explained in the next section.
B. Data Pre-processing
The ECG signal may contain various kinds of noise, such as baseline wander and respiratory muscle noise. A sliding window of 300 samples was used to compute the mean of the ECG, which was then subtracted from each sample in the window to shift the window mean to zero. After the mean shift, an averaging filter with kernel size 10 was used to smooth the signal within that sliding window of size 300. This step generated the pre-processed signal used for beat detection followed by classification.
C. Data Preparation
For peak detection, the Pan-Tompkins algorithm is applied [14]. After detecting peaks, the closest matching annotated peaks were taken into consideration for our studies. Based on the detected peaks, a window of 300 samples around each peak P (P-149 to P+150 samples) was segmented. This window of 300 samples was used as the amplitude feature. From the many beat types, we considered only those with enough examples, i.e. more than 5000 beats. Five beat types met this criterion. We prepared the data by taking equal numbers of beats of each type to balance the dataset for comparison.
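The pre-processing and segmentation steps of Sections III.B and III.C can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the 300-sample windows are assumed to be non-overlapping blocks, and the moving average is assumed to shrink at the start of each block, since the paper does not specify either detail.

```python
def preprocess(signal, window=300, kernel=10):
    """Zero-mean each block of `window` samples, then smooth it with a moving average."""
    cleaned = []
    for start in range(0, len(signal), window):
        block = signal[start:start + window]
        mean = sum(block) / len(block)
        centered = [v - mean for v in block]
        # Trailing moving average over up to `kernel` samples; the window
        # shrinks at the start of the block (an assumed edge treatment).
        for i in range(len(centered)):
            lo = max(0, i - kernel + 1)
            cleaned.append(sum(centered[lo:i + 1]) / (i + 1 - lo))
    return cleaned


def segment_beat(signal, peak, before=149, after=150):
    """Return the 300 samples from peak-149 to peak+150, or None near the record edges."""
    start, stop = peak - before, peak + after + 1
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]
```

In this sketch a beat whose window would run past either end of the record is skipped by returning None; how the original work treated such beats is not stated.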
IV. BEAT FEATURE EXTRACTION
We extracted the same feature in two ways by considering two different types of sliding window over the ECG signal. We considered area features with the intuition that the QRS curve encompasses more area, that the pattern of the QRS complex alters between beat types, and that the area under the curve changes accordingly. We used Simpson's 3/8 rule, as it gives promising results when calculating the area under the curve.
[Equation: Simpson's 3/8 rule. The formula and its symbol definitions are not preserved in the source text.]
A. Non-overlapped sliding window
A non-overlapped sliding window of size 10 was used to calculate the area over the 300 data samples of each beat prepared in Section III.C. We applied Simpson's rule to the absolute values in each sliding window of 10 samples. Thus, we formed a feature vector of 30 features for each beat sample. Figure 1, second column, shows the area under the curve for the non-overlapped sliding window.
B. Overlapped sliding window
An overlapped sliding window of size 50 and stride 25 was applied to calculate the area over the 300 data samples prepared in Section III.C. Again, we applied Simpson's rule to the absolute values in each overlapped window of 50 samples. Thus, a feature vector of 11 features was prepared for training. The accompanying figure shows a graphical representation of the area under the curve for the overlapped sliding window.
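The two window layouts can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical and unit sample spacing is assumed. The basic 3/8 rule approximates the integral over three equal intervals of width h as (3h/8)(f0 + 3 f1 + 3 f2 + f3). A 10-sample window spans 9 intervals, which tiles exactly into three such panels, but a 50-sample window spans 49 intervals, so the sketch falls back to the trapezoidal rule for the leftover interval. That fallback is an assumption; the paper does not say how this case was handled.

```python
def simpson_38(y, h=1.0):
    """Composite Simpson 3/8 rule over equally spaced samples.

    Intervals that do not fill a complete three-interval panel are
    integrated with the trapezoidal rule (an assumed fallback).
    """
    n = len(y) - 1              # number of intervals
    if n < 1:
        return 0.0
    m = n - n % 3               # intervals covered by full 3/8 panels
    area = 0.0
    for i in range(0, m, 3):
        area += 3.0 * h / 8.0 * (y[i] + 3.0 * y[i + 1] + 3.0 * y[i + 2] + y[i + 3])
    for i in range(m, n):
        area += h / 2.0 * (y[i] + y[i + 1])
    return area


def area_features(beat, size, stride):
    """Area of |beat| in each window of `size` samples taken every `stride` samples."""
    return [
        simpson_38([abs(v) for v in beat[s:s + size]])
        for s in range(0, len(beat) - size + 1, stride)
    ]
```

With a 300-sample beat, `area_features(beat, 10, 10)` yields the 30 non-overlapped features and `area_features(beat, 50, 25)` yields the 11 overlapped features described in the text.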
TABLE I: CONFUSION MATRIX, CLASSIFICATION REPORTS AND OVERALL RESULT OF FIVE CLASSIFIERS

A. Best confusion matrix: ANN model with amplitude features
Predicted/Actual    N    /    R    L    V

B. Worst confusion matrix: Naïve Bayes classifier model with overlapped area features
Predicted/Actual    N    /    R    L    V

C. Classification report for ANN model with amplitude features
Class        Precision   Recall   F1-score   Support
N            1.00        1.00     1.00       1465
/            1.00        1.00     1.00       1559
R            1.00        0.99     1.00       1528
L            1.00        0.99     1.00       1515
V            0.99        0.99     0.99       1433
Avg/Total    1.00        1.00     1.00       7500

D. Classification report for Naïve Bayes classifier model with overlapped area features
Class        Precision   Recall   F1-score   Support
N            0.58        0.93     0.71       1516
/            0.71        0.55     0.62       1519
R            0.81        0.90     0.85       1533
L            0.69        0.53     0.60       1489
V            0.85        0.60     0.71       1443
Avg/Total    0.72        0.70     0.70       7500

E. Overall comparison for five classifiers and different features in terms of accuracies
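To make the columns of the classification reports concrete, the following Python sketch computes per-class precision, recall, F1-score and support from a confusion matrix. The function names, the matrix orientation (rows are the actual class, columns the predicted class) and the toy two-class matrix are assumptions for illustration; they are not the paper's confusion matrices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def per_class_metrics(cm):
    """Return (precision, recall, f1, support) per class.

    cm[i][j] is the number of beats of actual class i predicted as class j
    (assumed orientation).
    """
    n = len(cm)
    rows = []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum
        actual_k = sum(cm[k])                           # row sum = support
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        rows.append((precision, recall, f1_score(precision, recall), actual_k))
    return rows


# Toy two-class example, not data from the paper.
toy = per_class_metrics([[8, 2], [1, 9]])
```

As a consistency check against the reported values, `f1_score(0.58, 0.93)` rounds to 0.71, which agrees with the N row of the Naïve Bayes report in Table I.D.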
V. CLASSIFICATION OF BEATS
On a total of 25000 beat samples (5000 for each class), we split the data into 70% for training and 30% for testing. The five models used in our study are described below:
Support vector machine (SVM): SVM is a class of supervised machine learning algorithms [8]. We applied four different variants of SVM, i.e. linear SVM, SVM with RBF kernel, SVM with polynomial kernel of degree 3 and sigmoid SVM. SVM with the RBF kernel gave the most satisfactory results among these variants.
Decision tree classifier: A decision tree is a classification algorithm which contains nodes that form a directed tree with a node called the root that has no incoming edge [10].
Random forest classifier: A random forest classifier, also known as a random decision forest, is a set of decision trees whose capacity can be arbitrarily increased or decreased to improve accuracy for both training and unseen data [11]. We implemented a random forest with 10 trees, which gave high accuracy.
Naïve Bayes classifier: This classifier is a simple probabilistic classifier based on Bayes' theorem. There are many variants of Naïve Bayes classifiers, viz. Gaussian naïve Bayes, multinomial naïve Bayes etc. In this study, Gaussian naïve Bayes is used.
Artificial neural network (ANN): The basic processing elements of neural networks are called neurons, which learn by adjusting weights in accordance with the data to be learnt [9]. We used an ANN with an input layer of 300 nodes, 4 hidden layers with 152, 300, 300 and 152 nodes, and one output layer with 5 nodes.

VI. RESULTS AND DISCUSSIONS
The algorithm was developed and tested on an HP Z6 Workstation with 32 GB RAM and an 8 GB NVIDIA P4000 graphics card. The Python libraries sklearn and keras were used for the development to make it an open-source package.
The performance of the classifiers is compared and the results are shown in Table I. ANN with amplitude features of 300 samples gives the best result with 99.59% accuracy. With area features from the non-overlapping sliding window, SVM with RBF kernel performed best with 98.97%. Likewise, area features from the overlapped sliding window, when fed to the Random Forest with 10 trees, give the third best result with 97.733%.
The generated models are tested on the data for final validation. Sample data with normal and abnormal beats is used for testing the model. Fig. 1 shows segmentation and classification of normal and Premature Ventricular beats with the artificial neural network using amplitude features. Different colours are used to highlight the normal and abnormal beats. In this example, green and yellow represent normal and Premature Ventricular beats respectively.
Fig. 1: Detection of Normal and Premature Ventricular beat with artificial neural network using amplitude features
An automated beat detection and classification system for the quantification of types of arrhythmia has been shown here. Various signal features could be targeted for further development.

VII. CONCLUSION
In this work, we have demonstrated an automated approach for detection of beats followed by classification. Five different classifiers were compared for five classes of data (normal and four types of arrhythmias) with three different features. It is observed that the Artificial Neural Network with the amplitude feature gives the best results with 99.59% accuracy. In future, we aim to develop a comprehensive module for mobile platforms.

ACKNOWLEDGMENT
The authors wish to acknowledge funding received by CSIR-CSIO under the CSIR mission on Intelligent Systems (IS) – Intelligent Technologies and Solutions. Vishavpreet Singh is thankful for the financial support of the CSIR-Senior Research Fellowship program.

REFERENCES
[1] O. Behadada, M. Trovati, M. A. Chikh, N. Bessis, and Y. Korkontzelos, “Logistic regression multinomial for arrhythmia detection,” in Foundations and Applications of Self* Systems, IEEE International Workshops on, 2016, pp. 133–137.
[2] R. A. Sanders, T. A. Kurosawa, and M. D. Sist, “Ambulatory electrocardiographic evaluation of the occurrence of arrhythmias in healthy Salukis,” J. Am. Vet. Med. Assoc., vol. 252, no. 8, pp. 966–969, 2018.
[3] P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng, “Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks,” arXiv, 2017.
[4] A. Isin and S. Ozdalili, “Cardiac arrhythmia detection using deep learning,” in Procedia Computer Science, 2017.
[5] P. Cheng and X. Dong, “Life-threatening ventricular arrhythmia detection with personalized features,” IEEE Access, 2017.
[6] A. Chetan, R. K. Tripathy, and S. Dandapat, “A Diagnostic System for Detection of Atrial and Ventricular Arrhythmia Episodes from Electrocardiogram,” J. Med. Biol. Eng., 2018.
[7] J. Park, M. Kang, J. Gao, Y. Kim, and K. Kang, “Cascade Classification with Adaptive Feature Extraction for Arrhythmia Detection,” J. Med. Syst., 2017.
[8] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Scholkopf, “Support vector machines,” IEEE Intell. Syst. their Appl., vol. 13, no. 4, pp. 18–28, 1998.
[9] A. Abraham, “Artificial neural networks,” Handb. Meas. Syst. Des., 2005.
[10] L. Rokach and O. Maimon, “Top-down induction of decision trees classifiers-a survey,” IEEE Trans. Syst. Man, Cybern. Part C (Applications Rev.), vol. 35, no. 4, pp. 476–487, 2005.
[11] T. K. Ho, “Random decision forests,” in Document analysis and recognition, 1995., proceedings of the third international conference on, 1995, vol. 1, pp. 278–282.
[12] G. B. Moody, R. G. Mark, and A. L. Goldberger, “PhysioNet: a web-based resource for the study of physiologic signals,” IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp. 70–75, 2001.
[13] G. B. Moody and R. G. Mark, “The impact of the MIT-BIH arrhythmia database,” IEEE Eng. Med. Biol. Mag., vol. 20, no. 3, pp. 45–50, 2001.
[14] J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm,” IEEE Trans. Biomed. Eng, vol. 32, no. 3, pp. 230–236, 1985.
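As a small illustration of the classification setup in Section V, the Python sketch below reproduces the 70/30 split sizes for 25000 beats and counts the trainable parameters implied by the stated ANN layer sizes. It assumes a fully connected network with one bias per node and a seeded random shuffle for the split; neither detail is stated in the paper, and the function names are hypothetical.

```python
import random

# Input layer, four hidden layers and output layer as listed in Section V.
LAYER_SIZES = [300, 152, 300, 300, 152, 5]


def dense_parameter_count(sizes):
    """Weights plus biases of a fully connected network (assumed architecture)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(25000)
n_params = dense_parameter_count(LAYER_SIZES)
```

The resulting test set holds 7500 beats, which matches the total support in the classification reports of Table I.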