2022-Heart Disease Prediction Using Machine Learning Techniques Publication
net/publication/357422370
All content following this page was uploaded by Reldean Williams on 20 September 2022.
Abstract— Heart disease is one of the main contributors to deaths globally. Heart illnesses affect many middle-aged and elderly people and, in most instances, lead to serious adverse health effects such as strokes and heart attacks. Therefore, it is necessary to diagnose and predict heart diseases to prevent serious health issues before they occur. In this paper, a provisional study and examination using different state-of-the-art Machine Learning techniques, namely Artificial Neural Networks, Decision Trees, Naïve Bayes, Random Forest, Logistic Regression, Support Vector Machines and XGBoost, were implemented at various evaluation stages to predict heart diseases. Results show that the Random Forest technique outperformed the other techniques and achieved a prediction accuracy of 95%.

Keywords— Machine Learning, Heart Disease, Decision Trees, Naive Bayes, Neural Networks.

I. INTRODUCTION

The triggers and treatment of cardiac arrest have been researched for many years, and new findings are under development. However, new findings suggest that it might not be prudent to specifically restrict the consumption of dietary concentrated trans-fats or supplement them with polyunsaturated trans-fats while taking into consideration certain health conditions [1]. The most important task in the healthcare profession is the detection of illness. If an illness is diagnosed early, many lives can be saved.

Machine learning recognition methods may make a significant contribution to the medical profession by providing accurate and efficient diagnosis of diseases. Several complications emerge at a gradual rate, and developing disorders of the heart are quickly detected [2] [19].

The health of an adult's heart is based on how the person lives, which depends on the individual actions of that person. There may also be a variety of genetic factors through which a form of heart disease has been passed on over decades [3]. Cardiac disorders are more common in men than in women. According to World Health Organization (WHO) statistics, reports show that 24 per cent of mortality in India is caused by non-communicable illnesses, which take place because of heart attacks [3].

In this study, criteria such as heart rate, blood pressure, gender, diabetes, age and so on are used in the prediction and diagnosis of heart disease. The prediction of this condition is difficult since multiple causes are involved in heart disease.

The main signs of heart disease include:

• Sweating and fatigue.
• Chest tightness.
• Pressure or pain in the upper back that spreads to the arm.
• Nausea, indigestion, heartburn, or stomach pain.
• Shortness of breath.

The term 'cardio' refers to the heart, so all cardiac disorders fall under the category of heart diseases. Heart disease is classified into various forms [4]:

• Angina pectoris
• Coronary heart disease
• Congenital heart diseases
• Congestive heart failure
• Cardiomyopathy

Heart disease is an umbrella term for a range of illnesses, injuries and conditions of the heart and blood vessels. Symptoms of heart disease differ by type of heart condition [4].

Congenital cardiac disorder relates to problems in the formation and operation of the heart that have been present since birth [5] [12]. Congestive heart failure develops when the heart fails to pump enough blood to the body's different organs. The most prevalent type of heart condition is coronary heart disease, also known by the scientific term ischemic heart disorder [6]. Coronary heart disease is a condition in which the heart is injured because blood flow is reduced: fatty deposits build up on the linings of the blood vessels that supply oxygen to the cardiac muscles, contributing to their constriction [7] [13].
978-1-6654-1656-6/21/$31.00 ©2021 IEEE
Authorized licensed use limited to: University of Johannesburg. Downloaded on February 17,2022 at 21:15:14 UTC from IEEE Xplore. Restrictions apply.
2021 International Conference on Data Analytics for Business and Industry (ICDABI)
Cardiovascular disease is a major cause of morbidity and death due to today's lifestyle [8] [14]. Identifying the signs of cardiovascular disease is a necessary yet complex task that should be carried out in a timely and professional manner, and proper automation would be highly desirable [9][15]. Currently, several doctors and hospitals have adopted information systems to manage their healthcare or patient records [10]. These systems usually generate a lot of data in the form of charts, images, text and numbers, but unfortunately this rich database is only occasionally used to support clinical decisions [11] [16].

This paper presents an experiment in forecasting the likelihood of heart disease for a patient using powerful and popular machine learning techniques. This work is conducted as an attempt to achieve greater diagnostic accuracy, thus reducing the risks of heart disease and mitigating its severe impacts on people’s health.

This paper is structured as follows. Section II presents a comprehensive literature review of the most important research that employed machine learning in heart disease detection. Section III introduces the proposed methodology. Results are introduced and discussed in Section IV, followed by conclusions and an analysis of the findings in Section V.

II. LITERATURE REVIEW

Decision Tree (DT), Naïve Bayes (NB) and Neural Network (NN) algorithms were used by Gandhi and Singh [1] to analyze a medical data set. They found that a wide number of features are involved and that the number of features must therefore be limited. In their study they claim that the processing time is minimized by limiting the number of features. The decision tree and neural network algorithms were employed by them. The research was strong in its use of decision trees and neural networks to determine heart disease prediction accuracy, but minimizing the time could compromise the accuracy of the results.

For heart disease prediction, Thomas and Princy [2] used the NN, KNN, DT and NB algorithms. They used data mining techniques to detect the risk factors for heart disease. This type of research could lack a contribution to the study of heart disease, as it is primarily focused on data mining techniques.

Bharti and Singh [3] applied Artificial Neural Networks, a Genetic Algorithm for prediction, and Associative Classification using Particle Swarm Optimization. Associative classification is a modern and successful methodology that incorporates association rule mining and classification into a prediction model and achieves reasonable accuracy.

Purushottam, Saxena and Sharma [4] stated: “Medical diagnosis could be improved with the help of an electronic device, which could also save costs. In this research, we created a framework that can quickly discover rules for predicting a patient's risk level based on a health parameter. According to the user's needs, the guidelines may be prioritized. The system's success is assessed in terms of classification precision, and the findings suggest that it has a lot of potential for more reliably predicting the risk of heart disease.”

Naïve Bayes, Artificial Neural Networks and Decision Trees were used by Palaniyappan and Awang [5] to develop Intelligent Heart Disease Prediction Systems (IHDPS). The system shows the data in both tabular and graphical formats to improve presentation and ease of understanding. It also aims to lower healthcare costs by delivering effective care. Hidden patterns and relationships have also been shown to be unexploited; advanced data mining techniques helped resolve this problem.

Deep Learning, Decision Tree, SVM and KNN are the methods employed by Sharma and Rizvi [6] to predict heart disease. Although the data sets included noise, they were able to lower the dimensionality of the data set by cleaning and pre-processing it. They found that Neural Networks have a high level of accuracy.

Cardiovascular disorders and multiple indications of heart failure have been addressed in depth by Hazra, Mandal, Gupta, Mukherjee, and Mukherjee [7], who covered various types of algorithms and methods for classification and clustering.

A data mining analysis was proposed by Krishnaiah, Narsimha, and Chandra [8]. According to their findings, different accuracies for heart disease prediction may be obtained by employing a variety of methods and considering a variety of factors.

Kaur and Kaur [9] showed that needless, repetitive information is stored in heart disease data and must be pre-processed. To produce better results, they also state that feature selection needs to be performed on the data set.

Data mining was employed by Vijayashree and Iyengar [10]. A massive amount of data is generated on a regular basis and cannot, as such, be interpreted manually. Data mining can be used efficiently to forecast diseases from these databases. Their paper applies numerous data mining strategies to a heart disease database and contrasts how multiple classification algorithms operate on it.

Benjamin, Virani, Callaway, Chamberlain, Chang, and Cheng [11] identify seven major risk factors for heart disease, including diet, smoking, obesity, diabetes, inactivity, cholesterol and high blood pressure. Statistics on cardiovascular illness, such as stroke and coronary artery disease, were also discussed.

In their experiment, Kishore, Kumar, Singh, Punia, and Hambir [12] showed that, relative to other techniques such as CNN, NB and SVM, recurrent NNs have fair accuracy. Neural Networks also do well in heart disease detection. A system that can detect silent heart failure and warn the patient as early as possible has now been developed [12].

To predict cardiac disease, a number of algorithms were utilized by Kumar, Koushik, and Deepak [13]: Logistic Model Tree, Random Forest, Decision Tree, KNN, NB and SVM. Unlike the other algorithms, the Naïve Bayes algorithm achieved strong results. Kumar, Koushik, and Deepak used the Cleveland heart disease data set from the UCI repository. The J48 algorithm also took less time to build and showed good performance.
Arthy and Murugeshwari [16] analysed the current studies on the prediction of heart disease using data mining. In cardiac disease detection, data mining methods are widely used. The databases used, such as the UCI heart disease data set, and the tools used, such as Apache Mahout, RapidMiner, Weka, KEEL, DataMelt, R data mining, Rattle, etc., are also discussed. They presume that good predictive performance can be obtained from a single algorithm; however, hybridizing two or more algorithms will boost and enhance the detection of heart disease with reasonable accuracy.

Sudha, Gayathri, and Jaisankar [18] address data mining technologies. They also recommend a conceptual diagram that includes the following steps: data-set multiplication, normalization and pre-processing, dimensionality reduction using principal component analysis, feature subset selection, classification algorithm, and results analysis. They used three classifiers: Naïve Bayes, Neural Networks, and Decision Tree. They conclude that Neural Networks perform well in contrast to other classifiers.

To diagnose and predict heart diseases, eight different Machine Learning methods were used in this paper. The Machine Learning methods employed in this study are well known and proven to perform exceptionally well in many applications and tasks.

III. PROPOSED METHODOLOGY

A study of different Machine Learning approaches is presented in this paper to forecast patients' heart disease from their medical records. Below is the flow chart for the proposed system approach:

The heart disease data set is taken as input and then pre-processed by substituting the column median for non-available values. The data is then analyzed for different Machine Learning algorithms. If it is a supervised learning algorithm, a training and testing set will be created and the supervised learning algorithm will be applied. Thereafter, a cross-validation process takes place to ensure that the previous processes were done correctly.

The system will then show the accuracy of the algorithm once the cross-validation process has been passed. This means that every step has been completed successfully and the system will stop. If the data is analyzed and it is found that it is not a supervised learning algorithm, the system will stop immediately and start analyzing the next data set.

Fig 2 illustrates the various strategies used. The output is the accuracy metrics of the machine learning models. The model can then be used in prediction. Various Machine Learning methods, namely SVM, KNN, the NB algorithm and Decision Trees, have been applied in this study. These methods are known to be some of the best Machine Learning classifiers and have been proven to perform well and produce excellent results.

K-Nearest Neighbours (KNN)

KNN is a non-parametric machine learning algorithm. The KNN algorithm is a supervised learning method. This implies that the algorithm uses all of the data and learns from it to predict the outcome. It works well when the training data is large and includes noisy values.

The data is split into training and test sets. The training set is used for model building and training. The k-value is then determined (commonly the square root of the number of observations). Test data is then predicted on the built model. Several distance measures are commonly used. For continuous variables, measures such as Minkowski distance, Manhattan distance and Euclidean distance can be utilized.

The most commonly used metric, though, is the Euclidean distance. The formula for the Euclidean distance is as follows:

$$d = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2} \qquad (1)$$
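As a minimal sketch of the KNN procedure and the Euclidean distance of Equation (1), the following Python code classifies a query point by majority vote among its k nearest training points. The toy feature vectors, the labels, and the helper names (`euclidean`, `default_k`, `knn_predict`) are illustrative assumptions and are not taken from the paper; rounding k up to an odd number is likewise an assumption added here to avoid tied votes.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance: sqrt(sum_i (x_i - y_i)^2), as in Equation (1)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def default_k(n_samples):
    """Square-root heuristic for k.

    Assumption: an even result is bumped to the next odd number so that
    a two-class vote cannot tie.
    """
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1


def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points nearest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled toy data: two features per patient; 1 = disease.
train_X = [[0.2, 0.1], [0.3, 0.2], [0.25, 0.15], [0.8, 0.9], [0.7, 0.8], [0.9, 0.85]]
train_y = [0, 0, 0, 1, 1, 1]

k = default_k(len(train_X))
prediction = knn_predict(train_X, train_y, [0.75, 0.8], k)
```

Because the distance sums raw coordinate differences, features on very different scales (for example age versus cholesterol) would normally be rescaled before this step.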
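The pre-processing and validation flow described in Section III (filling non-available values, creating training and testing sets, then cross-validating) might be sketched as follows. This is a dependency-free illustration under stated assumptions: missing values are represented as `None`, the substitution uses the column median, and the function names, the toy records, the 25% test fraction and the three folds are all hypothetical choices rather than the authors' settings.

```python
import random
import statistics


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    columns = list(zip(*rows))
    medians = [statistics.median(v for v in col if v is not None) for col in columns]
    return [[medians[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs for k-fold cross-validation."""
    for fold in range(n_folds):
        validation = [i for i in range(n_samples) if i % n_folds == fold]
        training = [i for i in range(n_samples) if i % n_folds != fold]
        yield training, validation


# Hypothetical records: [age, resting blood pressure, cholesterol]; None = missing.
rows = [
    [63, 145, 233], [37, None, 250], [41, 130, None], [56, 120, 236],
    [57, 140, 192], [None, 140, 354], [44, 120, 263], [52, 172, 199],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

clean = impute_median(rows)
X_train, X_test, y_train, y_test = train_test_split(clean, labels)
folds = list(k_fold_indices(len(X_train), 3))
```

Each `(training, validation)` pair in `folds` would be used to fit a classifier on the training indices and score it on the validation indices; averaging those scores gives the cross-validated accuracy that the flow reports.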
By fine-tuning the hyperparameters, the SVM classifier's performance can be improved. Grid Search CV is used to do this. Different C values can be supplied to this tool as input. With the defined values, it constructs various SVM models and then seeks the C value for which the model performs best.

Naive Bayes algorithm (NB)

This is a classification technique that is employed when the input dimensionality is extremely high. According to a Naive Bayes classifier, the presence of one feature in a class is unrelated to the presence of any other feature. It is based on Bayes' theorem, which is as follows:

$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \qquad (3)$$

This is the probability of Y given X, where X is the prior event and Y is the dependent event. It needs minimal training data. It is simple and may be used for binary classification problems.

Decision Trees (DT)

A decision tree is one way of visualizing an algorithm. It is a well-known machine learning algorithm. There are multiple factors in heart disease, such as nicotine, BP, cholesterol, weight, etc. The difficulty with a decision tree lies in choosing the root node. The element chosen for the root node must categorize the data distinctly. We use age as the root node. The decision tree is simple to understand. Decision trees are non-parametric, and feature selection is implicit.

When compared to other algorithms, it can be observed that the Random Forest method has the best prediction accuracy. The Random Forest technique produced the most accurate findings since it outputs the class that is the mode of the classes or the mean/average prediction of the individual trees.

Attributes used:
Age – Age in years
CP – Chest Pain Type, 1: Typical Angina, 2: Atypical Angina, 3: Non-anginal pain, 4: Asymptomatic
Sex – Male:1, Female:0
Exang – Exercise Induced Angina
Fbs – Fasting Blood Sugar > 120 mg/dl
Trestbps – Resting Blood Pressure
Restecg – Resting Electrocardiographic Results (values 0,1,2)
Slope – The slope of the peak exercise ST segment
Chol – Serum Cholesterol in mg/dl
CA – Number of major vessels (0-3) colored by fluoroscopy
Thalach – Maximum Heart Rate Achieved
Thal – Normal = 3, Fixed Defect = 6, Reversible Defect = 7
Num – no heart disease present, heart disease present
Oldpeak – ST depression induced by exercise relative to rest
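The grid search over C described for the SVM classifier can be illustrated with a small, dependency-free stand-in. This sketch assumes a linear soft-margin SVM trained by batch subgradient descent and a simple interleaved k-fold split; the function names, learning rate, epoch count, candidate C values and toy data are all hypothetical. The "Grid Search CV" tool named in the text most likely refers to a library utility such as scikit-learn's `GridSearchCV`, which this code does not reproduce.

```python
def decision(w, b, x):
    """Signed score w.x + b of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, C, epochs=300, lr=0.01):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b))) by batch
    subgradient descent. Labels must be -1 or +1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # the regulariser contributes w itself
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # point violates the margin
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(model, X, y):
    """Fraction of points whose predicted sign matches the label."""
    w, b = model
    hits = sum(1 for xi, yi in zip(X, y) if (1 if decision(w, b, xi) >= 0 else -1) == yi)
    return hits / len(y)


def grid_search_c(X, y, c_grid, n_folds=3):
    """Return (best C, {C: mean cross-validated accuracy})."""
    scores = {}
    for C in c_grid:
        fold_scores = []
        for fold in range(n_folds):
            val = [i for i in range(len(X)) if i % n_folds == fold]
            trn = [i for i in range(len(X)) if i % n_folds != fold]
            model = train_linear_svm([X[i] for i in trn], [y[i] for i in trn], C)
            fold_scores.append(accuracy(model, [X[i] for i in val], [y[i] for i in val]))
        scores[C] = sum(fold_scores) / n_folds
    return max(scores, key=scores.get), scores


# Hypothetical scaled toy data; +1 = disease, -1 = no disease.
X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.3], [0.85, 0.7],
     [0.3, 0.2], [0.7, 0.95], [0.25, 0.25], [0.95, 0.75], [0.2, 0.35], [0.75, 0.8]]
y = [-1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1]
c_grid = [0.01, 0.1, 1.0, 10.0]

best_c, cv_scores = grid_search_c(X, y, c_grid)
```

A small C tolerates margin violations in exchange for a wider margin, while a large C penalises them heavily, so the search simply keeps whichever candidate gives the highest mean validation accuracy.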
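To make the Naive Bayes step concrete, the sketch below applies Bayes' theorem, P(Y|X) = P(X|Y) P(Y) / P(X), to binary features under the naive independence assumption. All probabilities are invented for illustration and are not estimates from the Cleveland data; the feature names `exang` and `fbs` are borrowed from the attribute list only as labels, and the function name is hypothetical.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Compute P(class | observed features) for binary features.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature = 1 | class)}}
    observed:    {feature: 0 or 1}
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        for feature, value in observed.items():
            p_one = likelihoods[cls][feature]
            # Naive assumption: features are independent given the class,
            # so their likelihoods simply multiply.
            p *= p_one if value == 1 else 1.0 - p_one
        joint[cls] = p
    evidence = sum(joint.values())  # P(X), the denominator of Bayes' theorem
    return {cls: p / evidence for cls, p in joint.items()}


# Hypothetical probabilities, chosen only to make the arithmetic easy to follow.
priors = {"disease": 0.3, "healthy": 0.7}
likelihoods = {
    "disease": {"exang": 0.8, "fbs": 0.4},
    "healthy": {"exang": 0.1, "fbs": 0.2},
}

# One feature: 0.8 * 0.3 / (0.8 * 0.3 + 0.1 * 0.7) = 0.24 / 0.31
posterior_one = naive_bayes_posterior(priors, likelihoods, {"exang": 1})

# Two features: 0.096 / (0.096 + 0.014) = 0.096 / 0.110
posterior_two = naive_bayes_posterior(priors, likelihoods, {"exang": 1, "fbs": 1})
```

With these made-up numbers, observing exercise-induced angina raises the probability of disease from the prior of 0.3 to roughly 0.77, and adding a raised fasting blood sugar raises it further to roughly 0.87.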
Experiments are carried out in Fig 3 to determine the efficacy of the proposed procedure. To begin, the proposed approach is tested on data sets related to heart disease. A comparison with other well-known machine learning approaches is performed to verify the efficacy of the proposed process. SVM, logistic regression, KNN and random forest (RF) are some of the methods used. Table I summarizes the test results of the various methods on the data sets. From Table I, it is evident that the RF algorithm achieved superior classification performance.

This study's data came from the UCI data repository. The UCI Machine Learning Repository's data is open to the public. Because of the lower number of missing values and outliers, several researchers have found the Cleveland and Hungarian datasets to be ideal for constructing mining models. Before being sent to the suggested algorithm for training and testing, the data is cleaned and pre-processed.

The UCI Machine Learning Repository contains data sets, domain theories, and data generators that the machine learning community may use to empirically evaluate machine learning methods.

The overarching goal of this research is to better predict the presence of cardiac disease. The UCI repository data set is utilized in this work to obtain more accurate results. Decision trees and Naive Bayes were used as data mining classification algorithms.

Although this database has 14 features, only a subset of them is used in published research. To date, machine learning researchers have solely used the Cleveland database. The "target" field indicates the presence of heart disease in the individual, with a value ranging from 0 (no presence) to 4. Research with the Cleveland database has primarily focused on trying to distinguish between presence (values 1,2,3,4) and absence (value 0).

Most machine learning algorithms require numerical values, so attributes with categorical values were transformed to numerical values. For variables with more than two categories, dummy variables were constructed. Dummy variables aid Neural Networks in learning from the data.

V. CONCLUSION

With the rising number of fatalities from heart disease, it is becoming increasingly necessary to develop a system capable of accurately and effectively predicting heart illness. The study's objective was to find the most effective machine learning method for detecting heart disease.

For predicting heart disease, the accuracy of the DT, LR, RF and NB algorithms was evaluated in this study. It uses data such as blood pressure, cholesterol and chest pain, and then helps to predict a patient's potential heart attack. As stated earlier, a family history of heart failure may also be a cause of developing heart disease. Thus, this patient information may also be incorporated to help improve the model's accuracy.

This study would be helpful in recognizing new patients in the immediate future that might be suffering from heart disease. This will help in taking preventive steps and thereby keeping the patient from the risk of heart failure. Thus, if a patient is predicted to be positive for heart disease, the doctors will closely examine the diagnostic evidence for the condition. For example, if the patient has diabetes, which can cause heart disease, the patient can then manage and monitor the diabetes, which can in turn prevent heart disease.

To predict heart disease, several machine learning methods can be employed. For binary classification problems such as heart attack prediction, Logistic Regression will also perform well. Random Forest, then Decision Trees, should perform well. It is also possible to apply ensemble approaches and Artificial Neural Networks to the data set, which may improve the findings.

ACKNOWLEDGMENT

The author wishes to express her gratitude to everyone whose unwavering collaboration made this work possible, as well as for the continual advice and support that crowned all efforts with success. My project supervisor, Professor Thokozani Shongwe, and co-supervisors, Dr Ali Hasan and Mr Vikash Rameshar, are to be commended for their guidance, inspiration, and constructive ideas in the development of this research. I would also like to thank my former classmate Mr Thabani Poswa for his skillful advice in supporting me to ensure that my project is a success. I also want to thank my family, partner and friends at large for their moral support and continual encouragement in making this endeavour a success. The author would like to thank the University of Bahrain for supporting this research.

REFERENCES

[1] M. Gandhi and S. N. Singh. “Predictions in heart disease using techniques of data mining”, 2015.
[2] J. Thomas and R. T. Princy. “Human heart disease prediction system using data mining techniques”, 2016.
[3] S. Bharti and S. N. Singh. “India analytical study of heart disease prediction comparing with different algorithms”, May 2015.
[4] S. Purushottam, K. Saxena, and R. Sharma. “Efficient heart disease prediction system using decision tree”, 2015.
[5] S. Palaniyappan and R. Awang. “Intelligent heart disease prediction using data mining techniques”, August 2008.
[6] H. Sharma and M. A. Rizvi. “Prediction of heart disease using machine learning algorithms: A survey”, August 2017.
[7] A. Hazra, S. K. Mandal, A. Gupta, A. Mukherjee, and A. Mukherjee. “Heart disease diagnosis and prediction using machine learning and data mining techniques: A review”, 2017.
[8] V. Krishnaiah, G. Narsimha, and N. S. Chandra. “Heart disease prediction system using data mining techniques and intelligent fuzzy approach: A review”, February 2016.
[9] R. Kaur and P. Kaur. “A review – Heart disease forecasting pattern using various data mining techniques”, June 2016.