15.ECG Based Decision Support System For Clinical Man
Abstract. This paper presents a system that predicts heart disease from electrocardiogram (ECG) signals. After the brain, the heart is the most vital organ in the human body. Diagnosing heart disease is a complex task that requires considerable experience and knowledge. The data generated for heart disease prediction is too voluminous and complex to be processed by traditional methods, and with those methods doctors take a long time to diagnose the disease. Therefore, an entropy-based feature selection technique is combined with classification algorithms to reduce the search space. The proposed model was tested on a real-time dataset of medical records from NRI Hospital. The system makes the disease easier to predict and helps doctors make quick decisions.
1. Introduction
Heart attacks are one of the major causes of death worldwide [1]. An electrocardiogram (ECG) is a graphical record of the extent and course of the electrical activity that drives the contraction and relaxation of the atria and ventricles. It is used to check the heart rate by placing several electrodes on the body. People become prone to heart disease because of factors such as age, weight, high cholesterol level and obesity.
A change in the rhythm of the ECG signal can indicate heart disease. According to a 2004 survey, 17.1 million people died from cardiovascular disease; 7.2 million of these deaths were due to coronary heart disease and 5.7 million were due to stroke. A recent survey estimates that about 23.6 million people will die of cardiovascular disease by 2030. Chronic disease requires constant treatment to improve patients' quality of life. It is estimated that 12% of natural deaths are sudden, and 88% of these are cardiac. Early identification of abnormal heartbeat rhythms therefore plays a vital role in preventing heart disease.
Nowadays, data is generated in ever-increasing volumes [14], so content extraction has become a more challenging task [7]. Information on the web is also of different types, including structured, semi-structured and unstructured data, and current websites present a wider variety of complexities than traditional ones [13]. Data mining is about describing the past and projecting the future for analysis, and it helps to derive information from large datasets [3] [4]. The process involves data planning, evaluation, data interpretation, modelling and deployment. To deal with such complex data, a powerful tool such as Machine Learning (ML) is needed [2].
2. Literature Survey
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
AICERA 2020 IOP Publishing
IOP Conf. Series: Materials Science and Engineering 1085 (2021) 012016 doi:10.1088/1757-899X/1085/1/012016
The healthcare system produces massive quantities of data every day, but much of it is not used effectively. Animesh Hazra et al. [5] reviewed recent work on heart disease prediction using data mining techniques. They analysed the various combinations of mining algorithms used and concluded which techniques are successful and efficient. The electrocardiogram (ECG), a non-stationary signal, is commonly used to measure the rate and rhythm of heartbeats. Comparing the overall pattern and shape of the ECG waveform helps doctors identify possible illnesses. Currently, computer-based diagnosis applies signal processing to a patient's ECG. A feature extraction scheme then determines the amplitudes and intervals in the ECG signal, along with any other features, for subsequent analysis. Various techniques for analysing the ECG signal have recently been developed [6].
3. Proposed Model
3.1. Experimental setup: The experiment was conducted on a 64-bit Windows 10 Pro machine with an Intel® Core™ i7 processor. It used Jupyter Notebook from the Anaconda 5.1.0 Python distribution. The dataset consists of real-time data that we collected at NRI Hospital, Vijayawada: we gathered the ECG records of the patients and noted down the values in an Excel sheet. The dataset has 9 attributes: PR interval [12], QRS duration, QTc interval, QT interval, vent rate, P wave, T wave, QRS wave and problem. The problem attribute is the class variable, and the remaining attributes are used as predictor variables. Figure 1 shows the process of the proposed heart disease prediction system using ECG signals.
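A minimal pandas sketch of how such a spreadsheet might be separated into predictors and the class variable is shown below. The column names are hypothetical; they are modelled on the attribute names in the attribute table. The two rows are synthetic toy values, not NRI Hospital data. The real sheet would presumably be loaded with `pd.read_excel`, but its file name is not given in the paper.

```python
import pandas as pd

# Hypothetical column names modelled on the attribute names used in this paper;
# the real NRI Hospital sheet may label its columns differently.
PREDICTORS = ["PR_interval", "QRS_duration", "QTC_interval", "QT_interval",
              "vent_rate", "P_axes", "T_axes", "QRS_axes"]
TARGET = "problem"


def split_predictors_and_class(df: pd.DataFrame):
    """Return (X, y): the eight predictor columns and the class column."""
    missing = [c for c in PREDICTORS + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df[PREDICTORS], df[TARGET]


# The real data would come from something like pd.read_excel(<path>), which
# needs the openpyxl package for .xlsx files. Synthetic toy rows stand in here.
df = pd.DataFrame({
    "PR_interval": [160, 150], "QRS_duration": [90, 130],
    "QTC_interval": [420, 470], "QT_interval": [380, 410],
    "vent_rate": [72, 95], "P_axes": [60, 45], "T_axes": [40, 120],
    "QRS_axes": [30, -40], "problem": ["Normal ECG", "Abnormal ECG"],
})
X, y = split_predictors_and_class(df)
print(X.shape)  # (2, 8)
```

Checking for the columns up front makes a renamed or missing column fail loudly instead of silently producing a smaller feature set.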
3. QT_interval, normal range 300-440 ms: the time from the start of the QRS complex, which represents ventricular depolarization, to the end of the T wave, which marks the end of ventricular repolarization.
4. QTC_interval, normal range 400-440 ms: the QT interval corrected for heart rate.
5. P_axes, 110 ms: the P wave is the first positive deflection on the ECG and represents atrial depolarization.
6. T_axes, 160 ms: the T wave reflects ventricular repolarization.
7. QRS duration, normal range 80-120 ms: the QRS complex is the combination of the Q, R and S waves and reflects ventricular depolarization.
8. QRS_axes, <100 ms: the QRS axis describes the direction of ventricular depolarization in the frontal plane.
9. Problem, no normal range: the class label, which describes whether the ECG is normal, abnormal, borderline, or normal except for the rate.
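The table does not say which formula was used to correct the QT interval. A short Python sketch follows, assuming Bazett's formula, the most widely used correction. In that formula QTc = QT / sqrt(RR), where RR is the beat-to-beat interval in seconds (60 divided by the ventricular rate). The sketch also checks values against three of the normal ranges listed above.

```python
import math

# Normal ranges (ms) as listed in the attribute table above.
NORMAL_RANGES_MS = {
    "QT_interval": (300, 440),
    "QTC_interval": (400, 440),
    "QRS_duration": (80, 120),
}


def qtc_bazett(qt_ms: float, heart_rate_bpm: float) -> float:
    """Heart-rate-corrected QT in ms, using Bazett's formula QTc = QT / sqrt(RR).

    RR is the interval between beats in seconds, i.e. 60 / heart rate.
    Bazett is an assumption here; other corrections (e.g. Fridericia) exist.
    """
    if heart_rate_bpm <= 0:
        raise ValueError("heart rate must be positive")
    rr_s = 60.0 / heart_rate_bpm
    return qt_ms / math.sqrt(rr_s)


def within_normal_range(attribute: str, value_ms: float) -> bool:
    """True if the value lies inside the table's normal range (inclusive)."""
    low, high = NORMAL_RANGES_MS[attribute]
    return low <= value_ms <= high


print(round(qtc_bazett(400, 60), 1))   # 400.0 (RR = 1 s, so no correction)
print(round(qtc_bazett(400, 75), 1))   # 447.2 (RR = 0.8 s)
print(within_normal_range("QTC_interval", qtc_bazett(400, 75)))  # False
```

The same QT of 400 ms is within range at 60 bpm but prolonged at 75 bpm once corrected. That rate dependence is why the corrected value is recorded separately.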
From Figure 7 we can observe that the dataset has missing values. As Figure 8 shows, we replaced the missing values with the column means so that no missing values remain.
Figure 7. Heat map before filling missing values
Figure 8. Heat map after filling missing values
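A minimal pandas sketch of this mean imputation follows, run on synthetic toy values with hypothetical column names. Heat maps like Figures 7 and 8 can be drawn from the Boolean mask `df.isnull()`, for example with seaborn's `heatmap`. The paper does not say which plotting tool the authors used.

```python
import numpy as np
import pandas as pd


def fill_missing_with_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Replace NaNs in each numeric column with that column's mean.

    Non-numeric columns (such as the class label) are left untouched.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    filled = df.copy()
    # fillna with a Series fills each column with the value under its own label
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())
    return filled


# Synthetic toy values with gaps (not from the NRI dataset).
df = pd.DataFrame({
    "PR_interval": [160.0, np.nan, 140.0],
    "QT_interval": [380.0, 400.0, np.nan],
    "problem": ["Normal ECG", "Abnormal ECG", "Borderline ECG"],
})
print(df.isnull().sum().sum())         # 2 missing cells before filling
filled = fill_missing_with_mean(df)
print(filled.isnull().sum().sum())     # 0 after filling
print(filled.loc[1, "PR_interval"])    # 150.0, the mean of 160 and 140
```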
3.2.2 Correlation
Correlation is a statistical measure of the extent to which two or more variables fluctuate together. A positive correlation indicates the degree to which the variables increase together. A negative correlation indicates the degree to which one variable decreases as the other increases. Figure 9 shows that vent_rate and QT_interval are negatively correlated. Because correlated attributes carry overlapping information, the vent_rate attribute was dropped.
Figure 9. Correlation heat map of the attributes
Figure 10. Information gain of the attributes
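The correlation step can be sketched with pandas' `DataFrame.corr`, which computes Pearson coefficients by default. The |r| threshold of 0.8 and the synthetic data are assumptions for illustration, since the paper does not report a cut-off. The code only flags correlated pairs; the analyst still decides which member of a pair to drop, as the authors did with vent_rate.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.8):
    """List (col_a, col_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()   # Pearson by default
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):   # upper triangle only, no self-pairs
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic toy data: QT shortens as the rate rises, so the two are negatively correlated.
rng = np.random.default_rng(0)
vent_rate = rng.uniform(50, 120, size=50)
df = pd.DataFrame({
    "vent_rate": vent_rate,
    "QT_interval": 520 - 1.5 * vent_rate + rng.normal(0, 5, size=50),
    "QRS_duration": rng.uniform(80, 120, size=50),
})
pairs = correlated_pairs(df, threshold=0.8)
print(pairs)                                  # one pair: vent_rate vs QT_interval, r near -1
reduced = df.drop(columns=["vent_rate"])      # mirrors the choice described in the text
```

Using the absolute value of r means strongly negative pairs, like the one reported in Figure 9, are flagged just as strongly positive ones are.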
3.2.3 Information Gain
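Information gain is another pre-processing technique; it measures the reduction in entropy achieved by splitting the data on an attribute. It is widely used when building models such as decision trees from a training dataset. The information gain of each variable is calculated, and the variable that maximizes it is chosen to divide the dataset into groups for classification. Figure 10 shows that PR_interval has the lowest information gain. Information gain can be calculated using the following formula:
$$ IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v), \qquad H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i $$
where S is the set of training records, A is an attribute, S_v is the subset of S for which A takes the value v, c is the number of classes and p_i is the proportion of records in S that belong to class i.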
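A minimal pure-Python sketch of entropy-based information gain for a discretised attribute follows. Binning ECG values into labels such as "normal" and "long" is a hypothetical simplification. For continuous attributes, scikit-learn's `mutual_info_classif` estimates the same quantity, but in nats rather than bits.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy H(S) = -sum_i p_i * log2(p_i), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(attribute_values, labels) -> float:
    """IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v) for a discrete attribute A."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)   # S_v: labels where A == v
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy example with hypothetical bins (not NRI data).
labels = ["abnormal", "abnormal", "normal", "normal"]
qtc_bin = ["long", "long", "normal", "normal"]   # separates the classes perfectly
pr_bin = ["normal", "long", "normal", "long"]    # says nothing about the class
print(information_gain(qtc_bin, labels))  # 1.0
print(information_gain(pr_bin, labels))   # 0.0
```

An attribute that splits the records into pure groups recovers the full class entropy (1 bit here). An attribute whose groups look like the whole dataset gains nothing, which is the situation Figure 10 reports for PR_interval.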
4. Implementation
4.1 Splitting the Dataset
The dataset is split into two parts, a training set and a test set. 80% of the data goes into the training set and the remaining 20% into the test set, as shown in Figure 11.
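With scikit-learn, the 80/20 split is a single call to `train_test_split`. The data in this sketch are synthetic. Both `random_state` and `stratify=y`, which keeps the class proportions equal in both parts, are assumptions rather than settings reported in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 records, 7 predictors, two balanced classes.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 7))
y = np.array([0, 1] * 50)

# 80% training / 20% test; stratify=y preserves the 50/50 class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)   # (80, 7) (20, 7)
```

Stratifying matters more when some classes (for example borderline ECGs) are rare, because an unstratified 20% split can leave them out of the test set.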
5. Result Analysis
Figure 14 compares the accuracy of the three classifiers (Decision Tree, Gaussian Naïve Bayes and SVC). The highest accuracy, 97%, is achieved by both Decision Tree and Gaussian Naïve Bayes. Figure 15 compares the accuracy of the same three classifiers when K-fold cross-validation is applied. From Figure 15, the highest accuracy is 98.2%, achieved by Decision Tree. All three classifiers achieve their highest accuracy after K-fold cross-validation is applied.
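The comparison can be sketched with scikit-learn's `DecisionTreeClassifier`, `GaussianNB` and `SVC`. Each model is scored on an 80/20 hold-out split and with K-fold cross-validation. The value k = 10, the random seeds and the synthetic data from `make_classification` are assumptions. The accuracies this prints say nothing about the paper's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ECG features (NOT the NRI dataset).
X, y = make_classification(n_samples=300, n_features=7, n_informative=4,
                           random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian Naive Bayes": GaussianNB(),
    "SVC": SVC(),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=0)   # k = 10 is an assumption

results = {}
for name, model in models.items():
    # Hold-out accuracy: train on 80%, score on the held-back 20%.
    holdout = model.fit(X_train, y_train).score(X_test, y_test)
    # K-fold accuracy: mean over k train/validate rounds on the full dataset.
    cv = cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()
    results[name] = (holdout, cv)
    print(f"{name}: hold-out {holdout:.3f}, {kfold.get_n_splits()}-fold CV {cv:.3f}")
```

Averaging over k folds uses every record for both training and validation. This typically gives a more stable accuracy estimate than a single hold-out split, particularly on a small hospital dataset.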
Figure 16 compares the accuracy of the three classifiers (Decision Tree, Gaussian Naïve Bayes and SVC) before and after applying Information Gain. After Information Gain is applied, the accuracy of the decision tree increases from 97.3% to 98.2%.
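One way to reproduce a before/after comparison is sketched below, under two assumptions. First, information gain is estimated with scikit-learn's `mutual_info_classif`. Second, "applying Information Gain" means dropping the lowest-ranked attribute, as PR_interval is ranked in Figure 10. The text confirms neither detail, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the ECG features (NOT the NRI dataset).
X, y = make_classification(n_samples=300, n_features=7, n_informative=4,
                           n_redundant=0, random_state=1)

# Estimated information gain (mutual information, in nats) of each feature.
ig = mutual_info_classif(X, y, random_state=1)
weakest = int(np.argmin(ig))              # plays the role of PR_interval here
X_reduced = np.delete(X, weakest, axis=1)  # drop that single column

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
tree = DecisionTreeClassifier(random_state=1)
before = cross_val_score(tree, X, y, cv=kfold).mean()
after = cross_val_score(tree, X_reduced, y, cv=kfold).mean()
print(f"dropped feature {weakest}: accuracy {before:.3f} -> {after:.3f}")
```

On synthetic data the accuracy may go up or down. The point of the sketch is the procedure: rank, drop, and re-evaluate with the same folds so the two scores are comparable.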
6. CONCLUSION
In this paper, various pre-processing techniques were applied to the data. Three machine learning algorithms (Decision Tree, Gaussian Naive Bayes and SVC) were then used to predict the presence or absence of heart disease. Accuracy varies across algorithms. The highest accuracy, 98.2%, was achieved by the Decision Tree with Information Gain and K-fold cross-validation. Such a system can help reduce medical errors, enhance patient safety and improve patient outcomes. It makes the disease easier to predict and helps doctors make quick decisions.
7. References
[1] M. Sireesha, S. N. Tirumala Rao and Srikanth V 2019 Optimized feature extraction and hybrid classification model for heart disease and breast cancer prediction International Journal of Recent Technology and Engineering vol 7 no 6 pp 1754-1722
[2] K. B. Prakash and M. A. Dorai Rangaswamy 2016 Content extraction studies using neural network and attribute generation Indian Journal of Science and Technology pp 1-10
[3] M. Sireesha, Srikanth Vemuru and S. N. Tirumala Rao 2018 Coalesce based binary table: an enhanced algorithm for mining frequent patterns International Journal of Engineering & Technology vol 7 no 1.5 pp 51-55
[4] M. Sireesha, S. N. Tirumala Rao and Srikanth V 2018 Frequent itemset mining algorithms: a survey Journal of Theoretical and Applied Information Technology vol 96 no 3 pp 744-755
[5] Animesh H, Subrata Kumar M, Amit G, Arkomita M and Asmita M 2017 Heart disease diagnosis and prediction using machine learning and data mining techniques: a review Advances in Computational Sciences and Technology vol 10 no 7 pp 2137-2159
[6] Naveen Ku. Dewangan and S. P. Shukla 2015 A survey on ECG signal feature extraction and analysis techniques International Journal of Innovative Research in Electrical, Electronics, Instrumentation and Control Engineering vol 3 no 6 pp 12-19
[7] K. B. Prakash and M. A. D. Rangaswamy 2016 Content extraction of biological datasets using soft computing techniques Journal of Medical Imaging and Health Informatics pp 932-936
[8] M. Sireesha, S. N. Tirumala Rao and Srikanth V 2020 Predictive analysis of imbalanced cardiovascular disease using SMOTE International Journal of Advanced Science and Technology vol 29 no 5 pp 6301-6311
[9] A. Dhanasekar and R. Mala 2016 Analysis of association rule for heart disease prediction from large datasets International Journal of Innovative Research in Science, Engineering and Technology vol 5 no 10 pp 18059-18063
[10] M. Sireesha, Srikanth V and S. N. Tirumala Rao 2020 Classification model for prediction of heart disease using correlation coefficient technique International Journal of Advanced Trends in Computer Science and Engineering vol 9 no 2 pp 2116-2123
[11] A. Calderon, A. Pérez and J. Valente 2019 ECG feature extraction and ventricular fibrillation (VF) prediction using data mining techniques Proc. IEEE 32nd Int. Symp. on Computer-Based Medical Systems (CBMS) (Cordoba, Spain) pp 14-19
[12] https://www.healio.com/cardiology/learn-the-heart/ecgreview/ecginterpretation-tutorial
[13] K. B. Prakash, M. A. Dorai R and A. R. Raman 2010 Proc. Int. Conf. on Trendz in Information