Comprehensive Review of Machine Learning Applications in Heart Disease Prediction
Abstract:- Heart disease is responsible for many deaths and is now a major contributor to depression in many individuals. Regular monitoring and early identification of heart conditions can significantly reduce the number of deaths. Detecting heart disease has therefore become a task for data analysis. Although accurately predicting heart disease is challenging, advanced machine learning techniques can make it easier. Studies have shown that machine learning methods can effectively predict heart disease, enabling detection and assessment of its severity. This approach aims to lower mortality rates, decrease the severity of the illness, and facilitate diagnosis. The integration of machine learning techniques is advancing the field of therapy and improving the accuracy of interpreting analyses. These techniques play a role in identifying indicators for predicting cardiac diseases with precision. The study uses classification techniques such as Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Machine (SVM). The performance of these four algorithms is assessed from several angles, including specificity, recall, accuracy, and precision. While precision varies, SVM appears to deliver the best results in this approach in many instances.

Keywords:- Heart Disease, Machine Learning, Prediction, Supervised Learning, Unsupervised Learning, Deep Learning.

I. INTRODUCTION

... heart disease treatment, as there is advanced development in the diagnosis of heart diseases and a well-informed population. Early detection of heart disease is crucial, as it can significantly improve the chances of effective treatment and reduce mortality rates (see Figure 1). The factors that cause heart disease are grouped into modifiable and non-modifiable components. Non-modifiable risk factors cannot be changed biologically and include age, gender, and genes, among others. Whereas non-modifiable risk factors are beyond one's control, modifiable risk factors are lifestyle factors. These include being overweight, smoking, lack of exercise, poor diet, high blood pressure, and high cholesterol. It is therefore advisable to work on the modifiable risk factors as a way of decreasing the prevalence and severity of heart disease.

Another difficulty closely linked with heart disease is the extent of public ignorance about these factors. This lack of knowledge frequently leads to late presentation and hence delayed intervention, which is followed by severe health consequences or even death. It is widely recognized among the medical community and healthcare providers that early detection and preventive measures are the best approach to heart disease. Heart disease diagnosed in time is usually manageable; diagnosed at an advanced stage, it can be fatal. Better diagnosis, supported by improved technology and a knowledgeable populace, will encourage early care and intervention and lead to better results.
Often, the disease is only recognized in its final stages or after death. This has prompted healthcare organizations to emphasize the importance of early disease detection.

II. LITERATURE SURVEY

Cardiovascular disease (CVD), which includes heart disease, is a global problem, and early diagnostic tools are easier to implement. Work in this area using machine learning (ML) has improved the accuracy and efficiency of heart disease prediction. This review focuses on developments from the last decade in ML-based methods for diagnosing heart disease. To understand the effectiveness of these methods, compare their results, and investigate new trends, we consider only studies based on ML. In searching current academic databases (IEEE Xplore, PubMed, Google Scholar), we focus on the most significant work on predictive models. The goal of this review is to provide useful information to those involved in research or healthcare management practice. Researchers will find future research directions and prospects for further exploration, and clinicians will identify ML applications and best practices for applying them in clinical practice. This review thus supports ongoing efforts to improve people's health and reduce the frequency of heart disease by pointing out the more promising approaches while indicating the areas that require further investigation and development.

T. Ullah et al. [1] focused on increasing the efficiency of CVD diagnosis by choosing the best features of ECG signals, which are widely employed for the automatic detection of CVD using machine learning methods. The research used two main datasets: the Hungarian Heart Disease Dataset (HHDD) and the Behavioural Risk Factor Surveillance System (BRFSS) dataset. The algorithms used were Gradient Boosting, Logistic Regression, Extra Tree, Random Forest, and Support Vector Machines (SVM), together with a comparison against state-of-the-art algorithms and a machine-learning-based architecture. Leveraging these techniques, the study achieved impressive accuracy rates in cardiovascular disease detection tasks.

P. Ghosh et al. [2] introduced a framework for accurate heart disease prediction that stresses efficient data collection, data pre-processing, and data transformation for model training. Using data from the Cleveland, Long Beach VA, Switzerland, Hungarian, and Statlog sources, the study selects suitable features by applying Relief and LASSO, improving heart disease prediction accuracy. By applying Decision Tree, Bayesian classifier, neural network, association rule, SVM, KNN, and other methods, heart disease is detected with good results. The proposed model attained the highest accuracy of 99.05%, achieved using the Random Forest Bagging Method and Relief feature selection.

S. Mondal et al. [3] proposed a new two-tiered stacked predictive model for analyzing heart disease risk using machine learning. The model used a dataset containing eleven significant characteristics from 1190 patients, drawn from five distinct datasets: the Hungarian, Cleveland, Switzerland, Long Beach, and Statlog datasets. Several other ML algorithms, such as Stochastic Gradient Descent, K-Nearest Neighbor, Logistic Regression, and Random Forest, have been used in previous works to predict heart disease. The proposed stacking model outperforms the works reviewed in that study in accuracy, recall, and ROC-AUC, reaching an accuracy of 96% and a recall of 0.98.

A. Lakshmanarao et al. [4] addressed improved prediction of heart disease using state-of-the-art machine learning combined with better feature engineering. The experiments used a heart disease dataset obtained from Kaggle. Through association and correlation analysis of the original dataset, the authors built a fused dataset containing 29 features and 918 samples. The algorithms used were K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, and XGBoost. All of these classifiers achieved good accuracy for heart disease detection, ranging from 84% to 96% across the algorithms.

Gorapalli Srinivasa Rao and G Muneeswari [5] looked at how IoT, data mining, deep learning, and machine learning help predict heart disease. They used public datasets such as the UCI open dataset, Heart Disease Dataset, Cleveland heart disease dataset, and Cardiovascular Disease Dataset, plus local datasets. To predict heart disease, they tried feature selection methods such as GRAE, CAE, Lasso, IGAE, and ETC, then used classifiers such as Random Forest, Naive Bayes, and Gradient Boost to determine whether patients had heart disease. The study concluded that more complex systems are needed to better detect heart disease and stressed the value of combining different approaches.

C. M. Bhatt et al. [6] examined how well different machine learning methods predict heart disease. The research considered random forest, decision tree classifier, multilayer perceptron, and XGBoost algorithms. The team used a dataset of 70,000 patient records, each with 12 key features such as age, gender, and blood pressure readings. The resulting models showed high accuracy: decision trees reached 86.37% with cross-validation, XGBoost reached 86.87%, random forest reached 87.05%, and
multilayer perceptron reached 87.28%. Going forward, the authors want to check how reliable and useful these findings are. They also aim to make the results easy to interpret, and they suggest exploring how missing data and outliers might affect model performance.

A. M. Qadri et al. [7] conducted an in-depth analysis of heart-related dataset features, contributing to the understanding of key factors influencing heart disease prediction. The dataset contained 499 healthy patients and 526 patients with heart failure, the latter comprising 300 males and 226 females. Various machine learning algorithms implemented with the Python scikit-learn library, such as SVM, random forest, decision tree, logistic regression, and naïve Bayes, were used to predict heart failure. The decision tree emerged as the top-performing model, achieving a reported accuracy of 100% in heart failure prediction, surpassing the other algorithms and highlighting the success of the proposed feature engineering approach.

P. Rahman et al. [8] developed a machine learning tool that combines various machine learning algorithms with artificial neural networks (ANNs) to predict heart failure risk, supporting early diagnosis and cost-effective treatment. The average patient age in the dataset was 65 years, reflecting the demographics affected by cardiovascular disease, with a minimum of 42 and a maximum above 95. A combination of supervised classifiers such as KNN, SVM, DT, RF, and XGBoost/Random Forest and XGBoost/CatBoost combinations was used to develop a stable and accurate model. The results highlighted the importance of predictive modeling and early detection of heart failure for effectively controlling the worldwide cardiovascular disease epidemic.

J. Rashid et al. [9] aimed to identify relevant input features using a brute-force algorithm and to improve heart disease classification with machine learning techniques. The study uses three heart disease datasets, Cleveland, Statlog, and Hungarian, taken from the University of California, Irvine (UCI) ML repository. Heart disease prediction is improved using techniques such as Support Vector Machine (SVM), Random Forest, K-Nearest Neighbor (KNN), and Naive Bayes. With split validation, Naive Bayes reached an accuracy of 97%, whereas Random Forest with cross-validation reached about 95%.

N. Lutimath et al. [10] focused on machine learning techniques to predict heart disease using biomedical data. The data were drawn from heart disease, diabetes mellitus, and liver disease datasets; the dataset comprised 600 records, and a high accuracy of about 98% was reported. The classification methods used are Decision Tree, Support Vector Machines, and Random Forest. The paper compares the random forest regression model with the decision tree regression model in terms of classification, feature engineering, and performance measures, and finds that Random Forest predicts heart disease more accurately than Decision Tree.

N. Alageel et al. [11] aimed to increase the accuracy of stroke prediction using machine learning and deep learning technologies. Datasets from Kaggle and EHRs were used to predict stroke with varying feature sets that included age, BMI, glucose level, and smoking status. The algorithms employed include Naïve Bayes, SVM, Decision Tree, Random Forest, K-Nearest Neighbours (KNN), and Stacking. The work highlights the benefit of simple algorithms, which demonstrate high accuracy with explainable outcomes, over complex algorithms.

M. K. Joshi et al. [12] compared the performance of different machine learning algorithms in predicting cardiovascular indices, emphasizing data-driven methods for early detection and management of disease. They used datasets such as the Cleveland dataset from the UCI ML repository and the PIMA dataset to train and test their models for predicting cardiovascular disease. The methodologies include deep neural networks, KNN- and SVM-based models, decision trees with random forest classifiers, logistic regression, the Naive Bayes classifier, and the support vector classifier. For heart disease prediction, novel methods such as MSSO-ANFIS are discussed, and feature selection methods such as LCSA show higher accuracy with a lower error rate.

Machine Learning in Healthcare

Machine learning (ML) involves algorithms that allow computers to learn from and make predictions on data. In healthcare, ML can analyze large datasets to uncover patterns and make predictions. ML methods are increasingly being used for disease prediction, including heart disease.
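
Several of the surveyed studies report accuracy under either split (hold-out) validation or cross-validation, and the two numbers are not directly comparable. The following is a minimal sketch of the difference, assuming scikit-learn and a synthetic dataset from `make_classification` as a stand-in for real patient records. The helper names, sample size, and hyperparameters are illustrative assumptions and are not taken from any of the cited papers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a tabular heart disease dataset (assumption).
X, y = make_classification(
    n_samples=600, n_features=13, n_informative=6, random_state=0
)


def split_validation_accuracy(model, X, y, test_size=0.2, seed=0):
    """Accuracy on a single stratified hold-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    return model.fit(X_train, y_train).score(X_test, y_test)


def cross_validation_accuracy(model, X, y, folds=5):
    """Mean accuracy over k folds; every sample is tested exactly once."""
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


results = {
    "naive_bayes_split": split_validation_accuracy(GaussianNB(), X, y),
    "random_forest_cv": cross_validation_accuracy(
        RandomForestClassifier(n_estimators=100, random_state=0), X, y
    ),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

A single hold-out score depends on which records land in the test set, whereas the cross-validated mean averages over several such splits, so it is usually the more stable figure to compare across studies.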
III. TECHNIQUES FOR HEART DISEASE PREDICTION

A. Supervised Learning Methods

Logistic Regression: Logistic regression is used for binary classification problems. It is simple and interpretable, making it a common choice for medical predictions.
Decision Trees: Decision trees partition data into subsets based on feature values. They are easy to interpret but prone to overfitting.
Random Forest: Random forest is an ensemble method that builds multiple decision trees and merges their results to improve accuracy and control overfitting.
Support Vector Machines: Support vector machines (SVM) find the hyperplane that best separates different classes in the feature space. They are effective in high-dimensional spaces.
k-Nearest Neighbors: k-Nearest Neighbors (k-NN) classifies data points based on their proximity to other points. It is simple but can be computationally intensive.
Neural Networks: Neural networks consist of layers of nodes loosely modeled on the human brain. They can model complex relationships but require large datasets and computational power.

B. Unsupervised Learning Methods

Clustering: Clustering algorithms like k-means and hierarchical clustering group data points based on similarity, which can reveal patterns in heart disease data.

C. Ensemble Methods

Bagging: Bagging, or Bootstrap Aggregating, improves the stability and accuracy of machine learning algorithms by combining the predictions of multiple models.
Boosting: Boosting applies models sequentially to improve weak predictions. Commonly used algorithms include AdaBoost and Gradient Boosting.

D. Deep Learning Methods

Convolutional Neural Networks: Convolutional Neural Networks (CNNs) are effective for image data and are useful in analyzing medical imaging for heart disease.
Recurrent Neural Networks: Recurrent Neural Networks (RNNs) handle sequential data, such as time series of patient vitals, to predict heart disease trends.

IV. PROPOSED METHODOLOGY FOR HEART DISEASE PREDICTION

This section outlines the methodology used to develop a prediction model for identifying individuals at risk of heart disease. The approach encompasses a series of steps for data collection, preparation, analysis, and evaluation to construct an effective prediction model. The methodology provides a comprehensive framework for conducting a thorough and productive study, as shown in Figure 2.

A. Data Collection

Effective heart disease prediction begins with the collection of relevant patient data. This data typically includes:

Medical Histories: Patient health records detailing past medical conditions and treatments.
Demographic Information: Age, gender, ethnicity, and other demographic factors.
Clinical Measurements: Blood pressure, cholesterol levels, blood sugar levels, and body mass index.
Lifestyle Factors: Smoking status, physical activity levels, dietary habits, and family history of heart disease.

B. Data Preprocessing

Preprocessing is essential to ensure the quality and consistency of the dataset. The steps involved are:

Data Cleaning: Address missing values, correct inconsistencies, and handle outliers.
Normalization: Scale features to ensure uniformity across different attributes.
Data Splitting: Divide the dataset into training and testing subsets, typically using an 80:20 ratio.

C. Feature Selection

Feature selection involves choosing the most relevant features for the predictive model. The techniques used assess feature relevance based on the relationship between each feature and the target variable.

Information Gain: Measures how much a feature improves the prediction of the target variable.

Wrapper Methods:
Forward Selection: Starts with an empty set and adds features based on model performance.
Backward Elimination: Begins with all features and
removes them iteratively based on performance.
Recursive Feature Elimination (RFE): Removes features
iteratively to find the best subset.
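
The three wrapper strategies listed above can be sketched with scikit-learn's `SequentialFeatureSelector` (forward and backward) and `RFE`. This is an illustrative sketch under stated assumptions: the data is synthetic, logistic regression is an arbitrary choice of wrapped estimator, and the target of four features (`N_KEEP`) is a hypothetical setting, none of which comes from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 8 features, 4 of them informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
estimator = LogisticRegression(max_iter=1000)
N_KEEP = 4  # hypothetical number of features to retain

# Forward selection: start empty, add the feature that most improves the CV score.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=N_KEEP, direction="forward", cv=3
).fit(X, y)

# Backward elimination: start with all features, drop the least useful one each round.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=N_KEEP, direction="backward", cv=3
).fit(X, y)

# RFE: refit repeatedly and discard the feature with the smallest coefficient magnitude.
rfe = RFE(estimator, n_features_to_select=N_KEEP).fit(X, y)

selected = {
    "forward": forward.get_support(indices=True).tolist(),
    "backward": backward.get_support(indices=True).tolist(),
    "rfe": rfe.get_support(indices=True).tolist(),
}
for name, indices in selected.items():
    print(f"{name}: feature indices {indices}")
```

The three selectors need not agree on the same subset. Forward and backward selection score candidate subsets by cross-validated performance, while RFE ranks features by the fitted model's coefficients.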
Ensemble Methods:
Feature Importance from Ensemble Models: Algorithms like Random Forest and Gradient Boosting provide scores based on feature contribution to predictions.
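
As a rough illustration of ensemble-based feature scores, the sketch below reads the `feature_importances_` attribute that scikit-learn's Random Forest and Gradient Boosting classifiers expose after fitting. The synthetic data, the `feature_i` names, and the `ranked_importances` helper are assumptions made for the example, not part of the reviewed methodology.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in data (assumption); real inputs would be clinical attributes.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]


def ranked_importances(model, X, y, names):
    """Fit the model and return (name, importance) pairs, highest first."""
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


rf_ranking = ranked_importances(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, feature_names
)
gb_ranking = ranked_importances(
    GradientBoostingClassifier(random_state=0), X, y, feature_names
)

for label, ranking in (("random forest", rf_ranking), ("gradient boosting", gb_ranking)):
    print(label)
    for name, score in ranking[:4]:
        print(f"  {name}: {score:.3f}")
```

These impurity-based scores are normalized to sum to one within each model, so they indicate relative rather than absolute contribution.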
D. Model Testing
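
The abstract states that the four classifiers (DT, KNN, RF, SVM) are assessed on specificity, recall, accuracy, and precision. The sketch below shows one way those four metrics can be computed from a confusion matrix after an 80:20 split with feature scaling. It assumes scikit-learn, synthetic data, and default hyperparameters. The `binary_metrics` helper is hypothetical and is not claimed to reproduce the paper's own evaluation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and specificity from TP, TN, FP, FN counts."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Synthetic stand-in data (assumption), split 80:20 into training and testing sets.
X, y = make_classification(n_samples=500, n_features=13, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    # The scaler is fit on the training data only, inside the pipeline.
    pipeline = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    scores[name] = binary_metrics(y_test, pipeline.predict(X_test))

for name, metrics in scores.items():
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

Reporting specificity next to recall matters in a screening setting, because a model can reach high accuracy on imbalanced data while still missing many true cases.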
Ethical Considerations: Protecting patients' information while ensuring the effectiveness of results derived from big data and machine learning in the health sector is challenging. Despite its usefulness for developing accurate AI models, patient data is not readily available because of regulatory restraints such as HIPAA and GDPR. Other ethical issues include informed consent and a clear statement of how data will be used. Patient data is the main ingredient in designing and implementing an AI system in healthcare, and such data needs to be trusted to ensure proper usage.
Model Development & Integration: Applying machine learning models to practical clinical work involves multiple stages. First, the most appropriate machine learning method is selected depending on the problem and data type. Real-world testing then makes it possible to improve the safety and efficiency of the interventions. Clinicians must understand why the model's predictions are accurate for a given patient. Lastly, integration with existing healthcare systems and electronic medical records greatly improves the chances of the model's successful implementation.
Long-Term Challenges: Even after successful development, the long-term problems of applying machine learning models to cardiac diagnosis remain. Models must stay flexible because patient demographics, treatment methods, and rules and laws keep changing. This flexibility ensures that models do not become outdated or invalid, or infringe on patients' rights, and so supports long-term sustainable disease management.
Scalability & Sustainability: The effectiveness and safety of algorithms for diagnosing heart disease depend on their applicability to real-life use. This makes it necessary to test their efficacy, particularly across diverse patients and scenarios. In addition, the models require constant data updates to remain effective, which can be cumbersome. Finally, financial resources and personnel time must be invested over the long term to routinely check the models' validity and to update and fine-tune them.

VIII. RECENT ADVANCES AND FUTURE DIRECTIONS

Machine learning is revolutionizing heart disease prediction. Real-time data from wearables and internet-connected devices allows for continuous patient monitoring, enabling models to adapt to changing health conditions and make dynamic predictions. Deep learning techniques like Convolutional Neural Networks and Recurrent Neural Networks excel at handling complex medical data such as images and time-series information. This leads to improved feature extraction and overall model performance. Additionally, data fusion techniques are bringing together information from electronic health records, genetics, and lifestyle data to create a more comprehensive picture of patient health, resulting in more accurate predictions. To build trust and transparency, researchers are developing methods like SHAP and LIME that provide insights into how models reach their conclusions. Finally, hybrid models that combine traditional machine learning with deep learning leverage the strengths of both approaches to enhance prediction accuracy and reliability.

Looking ahead, the future of heart disease prediction with machine learning is bright. Continuous monitoring through real-time data streams holds promise for even better predictions and for enabling proactive interventions. Exploring advanced deep learning techniques remains a key area of interest, with the potential to further improve feature extraction and model performance. Personalized prediction models that consider individual variations in genetics, lifestyle, and environmental factors are another exciting avenue for research. Ultimately, the success of these models hinges on their interpretability and usability in clinical settings. Developing user-friendly tools and making models more transparent will be crucial for their seamless integration into healthcare workflows.

IX. CONCLUSION

This review provides a systematic overview of several machine learning models applied to predicting the risk of cardiac disease, noting notable developments in early detection and treatment performance. Machine learning approaches have greatly helped improve the prediction of heart disease cases through proper sorting of risk factors with the help of big data. The ability of ML algorithms to sift through large amounts of data from different sources can uncover risk factors that conventional means miss, so that detecting and intervening before an event occurs may save lives. The research emphasizes the critical need for ML in addressing the growing epidemic of heart disease.

Some promising directions for further development of ML for improving cardiac risk assessment are described in the paper. The potential to further enhance precision and clinical relevance lies in real-time monitoring via wearables, advanced deep learning algorithms trained across clusters of patients, and personalized models. But broader adoption requires resolving issues including interpretability for clinicians and integration into existing healthcare workflows. The paper also argues that rigorous innovation of machine learning models, transparency about their predictions, and integration into the clinical workflow are critical going forward. Altogether, these advances hold considerable promise for finer and more accurate diagnostics, potentially saving lives worldwide by helping to alleviate the burden of cardiovascular diseases.