Heart Disease Prediction Using Machine Learning Techniques
Devansh Shah · Samir Patel · Santosh Kumar Bharti
https://doi.org/10.1007/s42979-020-00365-y
ORIGINAL RESEARCH
Received: 27 September 2020 / Accepted: 2 October 2020 / Published online: 16 October 2020
© Springer Nature Singapore Pte Ltd 2020
Abstract
Heart disease, alternatively known as cardiovascular disease, encompasses various conditions that affect the heart and
has been the primary cause of death worldwide over the past few decades. Many risk factors are associated with heart
disease, and there is a pressing need for accurate, reliable, and sensible approaches to early diagnosis so that the
disease can be managed promptly. Data mining is a commonly used technique for processing enormous data in the healthcare
domain. Researchers apply several data mining and machine learning techniques to analyse huge, complex medical data,
helping healthcare professionals to predict heart disease. This research paper presents various attributes related to
heart disease, and a model based on supervised learning algorithms such as Naïve Bayes, decision tree, K-nearest
neighbor, and random forest. It uses the existing dataset from the Cleveland database of the UCI repository of heart
disease patients. The dataset comprises 303 instances and 76 attributes. Of these 76 attributes, only 14 are considered
for testing, which is important to substantiate the performance of the different algorithms. This research paper aims to
estimate the probability of developing heart disease in patients. The results show that the highest accuracy score is
achieved with K-nearest neighbor.
Keywords Heart disease prediction · Data mining · Decision tree · Naïve Bayes · K-NN · Random forest · Machine learning
SN Computer Science (2020) 1:345
classification algorithms to predict heart disease in patients [4].
Data mining is the process of extracting valuable data and information from huge databases. Various data mining
techniques such as regression, clustering, association rules, and classification techniques like Naïve Bayes, decision
tree, random forest, and K-nearest neighbor are used to classify various heart disease attributes in predicting heart
disease. A comparative analysis of the classification techniques is used [5]. In this research, we have taken the
dataset from the UCI repository. The classification model is developed using classification algorithms for the
prediction of heart disease. This paper discusses the algorithms used for heart disease prediction and compares the
existing systems. It also mentions possibilities for further research and advancement.

Background

Heart disease affects millions of people, and it remains the chief cause of death in the world. Medical diagnosis should
be proficient, reliable, and aided by computer techniques to reduce the effective cost of diagnostic tests. Data mining
is a software technology that helps computers to build and classify various attributes. This research paper uses
classification techniques to predict heart disease. This section describes the related subjects: machine learning and
its methods, data pre-processing, evaluation measurements, and the dataset used in this research.

Machine Learning

Machine learning is an emerging subdivision of artificial intelligence. Its primary focus is to design systems that
learn and make predictions based on experience. Machine learning algorithms are trained on a training dataset to create
a model. In doing so, they detect hidden patterns in the input dataset, so that the resulting model can make
predictions for new datasets. The dataset is cleaned and missing values are filled. The model then uses new input data
to predict heart disease and is tested for accuracy.
Machine learning techniques are classified as:

Supervised Learning

The model is trained on a dataset that is labelled: it has input data and the corresponding outcomes. The data are
classified and split into a training dataset and a test dataset. The training dataset trains the model, while the test
dataset functions as new data to measure the accuracy of the model. Each example in the dataset comes with its known
output. Classification and regression are examples of supervised learning.

Unsupervised Learning

The data used for training are not classified or labelled. The aim is to find hidden patterns in the data. The model is
trained to discover patterns: by exploring the data, it draws conclusions that describe the hidden structure, and it
can then identify such patterns in new input data. In this technique, no responses (labels) are present in the dataset.
Clustering is an example of an unsupervised learning technique.

Reinforcement Learning

It uses neither a labelled dataset nor results associated with the data; the model learns from experience. In this
technique, the model improves its performance based on its interaction with the environment, and figures out how to
correct its faults and reach the right outcome by assessing and testing various possibilities.
Classification algorithms are commonly used supervised learning techniques to estimate the probability of heart disease
occurrence.

Classification Machine Learning Techniques

The classification task is used to predict subsequent cases based on past information. Many data mining techniques,
such as Naïve Bayes, neural network, and decision tree, have been applied by researchers to obtain a precise diagnosis
of heart disease. The accuracy given by different techniques varies with the number of attributes. This research
provides diagnostic accuracy scores with the aim of improving health outcomes. We have used the WEKA tool in this
research for pre-processing the dataset, which is in ARFF (attribute-relation file format). Only 14 of the 76
attributes have been considered for analysis to get precise results. By comparison and analysis using different
algorithms with the WEKA tool, heart disease can be predicted early and treated promptly [5].

Approach Methodology

This research aims to foresee the odds of having heart disease through computerized prediction of heart disease, which
is helpful in the medical field for clinicians and
Data Source
For this study, we used the dataset from the UCI Machine Learning Repository. It comprises a real dataset of 300
examples with 14 attributes (13 predictors; 1 class), such as blood pressure, type of chest pain, and
electrocardiogram result (Table 1). In this research, we have used four algorithms to identify the reasons for heart
disease and to create a model with the maximum possible accuracy.
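As a rough illustration of the 13-predictor, 1-class layout described above, the following Python sketch parses a few comma-separated rows, fills missing values, and separates predictors from the class label. Several things here are assumptions rather than details from this paper: the attribute names follow the commonly distributed 14-attribute Cleveland file, the sample rows are invented, missing values (written as "?") are filled with the column median, and any non-zero class value is treated as "disease present".

```python
import csv
import io
from statistics import median

# Assumed names for the 14 attributes (13 predictors, then the class "num").
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Invented rows in the same comma-separated layout; "?" marks a missing value.
SAMPLE = """\
52,1,3,130,210,0,0,160,0,1.0,1,0,3,0
61,0,4,150,280,1,2,120,1,2.5,2,2,7,3
45,1,2,120,190,0,0,175,0,0.0,1,?,3,0
58,0,1,140,250,0,2,150,0,1.2,2,0,3,0
66,1,4,160,300,0,2,110,1,3.0,2,0,?,1
"""


def load_rows(text):
    """Parse rows into dicts of floats, using None for missing values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        values = [None if v.strip() == "?" else float(v) for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def fill_missing(rows):
    """Replace None in each predictor column with that column's median (an assumed strategy)."""
    for name in COLUMNS[:-1]:
        known = [r[name] for r in rows if r[name] is not None]
        fill = median(known)
        for r in rows:
            if r[name] is None:
                r[name] = fill
    return rows


def split_xy(rows):
    """Separate the 13 predictors from a binary class label."""
    X = [[r[name] for name in COLUMNS[:-1]] for r in rows]
    # Assumption: any non-zero "num" counts as "disease present".
    y = [1 if r["num"] > 0 else 0 for r in rows]
    return X, y


rows = fill_missing(load_rows(SAMPLE))
X, y = split_xy(rows)
print(len(X), len(X[0]), y)
```

With a real file, `SAMPLE` would be replaced by the file contents; the rest of the sketch does not depend on the number of rows.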
Data Pre‑processing
Integration: the data may not be acquired from a single source but from varied sources, and they have to be integrated
before processing.
Reduction: the data gained are complex and need to be formatted to achieve effective results.
The data are then classified and split into a training data set and a test data set, which are run on the various
algorithms to obtain accuracy scores.

This algorithm splits the data into two or more analogous sets based on the most important indicators. The entropy of
each attribute is calculated and then the data are divided on the predictors having maximum information gain or
minimum entropy:

$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -P_i \log_2 P_i,$$
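The entropy formula above, and the information gain used to choose a split, can be sketched in a few lines of Python. This is a minimal illustration on invented toy data, not the implementation used in this study; the function names and the categorical attributes `attr_a` and `attr_b` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S): sum over the classes of -P_i * log2(P_i)."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by partitioning the labels on one categorical attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented toy data: 4 positive and 4 negative cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
attr_a = ["x", "x", "x", "x", "y", "y", "y", "y"]  # separates the classes perfectly
attr_b = ["p", "q", "p", "q", "p", "q", "p", "q"]  # carries no information

print(entropy(labels))
print(information_gain(attr_a, labels))
print(information_gain(attr_b, labels))
```

Here `attr_a` has the larger information gain, so a tree built on the maximum-gain rule would split on it first.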
Table 2 Percentage accuracy results of classification techniques

Accuracy | Naïve Bayes | K-nearest neighbor | Decision tree | Random forest
Testing set | 88.15789474 | 78.94736842 (K = 2) | 73.68421052631578 (parameter = entropy) | 84.21052631578947 (parameter = entropy)
Training set | 81.05726872 | 90.78947368 (K = 7) | 80.26315789473685 (parameter = gini) | 82.89473684210526 (parameter = gini)
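To make concrete how a percentage accuracy score of this kind is obtained, the following Python sketch classifies a few test points with a plain K-nearest neighbor vote and reports the share of correct predictions. It is an illustration under stated assumptions, not the code behind Table 2: the two-feature data are invented, Euclidean distance is assumed, and `knn_predict` and `accuracy_percent` are hypothetical helper names.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Invented two-feature toy data with two well-separated classes.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1.1, 0.9], [3.0, 3.0], [2.0, 2.2], [0.8, 1.0]]
test_y = [0, 1, 0, 0]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(predictions)
print(accuracy_percent(test_y, predictions))
```

Because the vote depends on distances, attributes measured on very different scales are usually rescaled before K-nearest neighbor is applied, and the choice of K changes the result, which is why K is reported alongside each score.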
and random forest. The data were pre-processed and then used in the model. K-nearest neighbor, Naïve Bayes, and random
forest are the algorithms showing the best results in this model. After implementing the four algorithms, we found the
accuracy to be highest for K-nearest neighbor (K = 7). This research can be expanded further by incorporating other
data mining techniques such as time series, clustering and association rules, support vector machines, and genetic
algorithms. Considering the limitations of this study, there is a need to implement more complex models and
combinations of models to achieve higher accuracy in the early prediction of heart disease.

Author contribution All authors have equal contribution to this study and all authors have read and approved the final
manuscript.

Compliance with Ethical Standards

Conflict of interest The authors declare that they have no competing interest.

References

1. Seckeler MD, Hoke TR. The worldwide epidemiology of acute rheumatic fever and rheumatic heart disease. Clin Epidemiol. 2011;3:67.
2. Gaziano TA, Bitton A, Anand S, Abrahams-Gessel S, Murphy A. Growing epidemic of coronary heart disease in low- and middle-income countries. Curr Probl Cardiol. 2010;35(2):72–115.
3. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS ONE. 2017;12(4):e0174944.
4. Ramalingam VV, Dandapath A, Raja MK. Heart disease prediction using machine learning techniques: a survey. Int J Eng Technol. 2018;7(2.8):684–7.
5. Patel J, TejalUpadhyay D, Patel S. Heart disease prediction using machine learning and data mining technique. Heart Dis. 2015;7(1):129–37.
6. Fatima M, Pasha M. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9:1–16. https://doi.org/10.4236/jilsa.2017.91001.
7. Pahwa K, Kumar R. Prediction of heart disease using hybrid technique for selecting features. In: 2017 4th IEEE Uttar Pradesh section international conference on electrical, computer and electronics (UPCON). IEEE. p. 500–504.
8. Pouriyeh S, Vahid S, Sannino G, De Pietro G, Arabnia H, Gutierrez J. A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: 2017 IEEE symposium on computers and communications (ISCC). IEEE. p. 204–207.
9. Chauhan R, Bajaj P, Choudhary K, Gigras Y. Framework to predict health diseases using attribute selection mechanism. In: 2015 2nd international conference on computing for sustainable global development (INDIACom). IEEE. p. 1880–84.
10. Bouali H, Akaichi J. Comparative study of different classification techniques: heart disease use case. In: 2014 13th international conference on machine learning and applications. IEEE. p. 482–86.
11. Xu S, Zhang Z, Wang D, Hu J, Duan X, Zhu T. Cardiovascular risk prediction method based on CFS subset evaluation and random forest classification framework. In: 2017 IEEE 2nd international conference on big data analysis (ICBDA). IEEE. p. 228–32.
12. Otoom AF, Abdallah EE, Kilani Y, Kefaye A, Ashour M. Effective diagnosis and monitoring of heart disease. Int J Softw Eng Appl. 2015;9(1):143–56.
13. Vembandasamy K, Sasipriya R, Deepa E. Heart diseases detection using Naive Bayes algorithm. Int J Innov Sci Eng Technol. 2015;2(9):441–4.
14. Chaurasia V, Pal S. Data mining approach to detect heart diseases. Int J Adv Comput Sci Inf Technol (IJACSIT). 2014;2:56–66.
15. Parthiban G, Srivatsa SK. Applying machine learning methods in diagnosing heart disease for diabetic patients. Int J Appl Inf Syst (IJAIS). 2012;3(7):25–30.
16. Deepika K, Seema S. Predictive analytics to prevent and control chronic diseases. In: 2016 2nd international conference on applied and theoretical computing and communication technology (iCATccT). IEEE. p. 381–86.
17. Dwivedi AK. Performance evaluation of different machine learning techniques for prediction of heart disease. Neural Comput Appl. 2018;29(10):685–693.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.