3rd International Conference on Computation, Automation and Knowledge Management (ICCAKM 2022)


2022 3rd International Conference on Computation, Automation and Knowledge Management (ICCAKM) | 978-1-6654-5319-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICCAKM54721.2022.9990484

Heart Failure Prediction Using Machine Learning Algorithm

Sameer Pandey, Research Scholar, Chandigarh University, Punjab. Email: [email protected]
Dr. Ravinder Kaur, Assistant Professor, Chandigarh University, Punjab. Email: [email protected]

Abstract: According to the WHO, cardiovascular diseases (CVDs) are the leading cause of death worldwide, claiming an estimated 17.9 million lives every year, about 32% of all deaths worldwide. With the rising population, heart disease is becoming increasingly difficult to identify and treat at an early stage. With the advancement of machine learning (ML) in the healthcare sector, we can build models that predict heart disease. The aim of this paper is to create a model that predicts whether or not a person has cardiovascular disease based on 11 clinical features. We use the UCI heart failure prediction dataset for this work. Five ML algorithms are used to develop the model: Gradient Boosting (GB), Random Forest (RF), K-Nearest Neighbors (KNN), Logistic Regression (LR) and Support Vector Machine (SVM). A comparison of these techniques shows that SVM is the most suitable, with 94.56% accuracy in predicting whether or not a person has a cardiovascular ailment. The implementation portion of the paper also includes a comparison of all the algorithms. Such an effective prediction algorithm can also assist doctors in reducing the number of heart disease-related fatalities.

Keywords: Cardiovascular diseases, KNN, Logistic Regression, SVM, Gradient Boosting and Random Forest

I. INTRODUCTION

In today's society, people are so preoccupied with their lives and jobs that they have no time to meditate or take care of their health. Human life depends entirely on the effective operation of the heart. The heart pumps blood through the blood vessels to the various parts of the body, supplying adequate oxygen and other important nutrients for the body's proper functioning. A healthy heart is conducive to a healthy life. Because of their hurried lives, most individuals suffer from stress, anxiety, depression, and a variety of other ailments, and as a result people are becoming unwell and suffering from serious illnesses [1]. Many illnesses cause deaths each year, such as Covid-19, cancer, heart disease, and TB, but the leading cause of mortality is heart disease, or cardiovascular disease (CVD) [2]. Healthcare is one of humanity's main concerns. According to WHO principles, every individual has the basic potential to improve the quality of their health. It is assumed that adequate healthcare support is available for regular health checkups. Heart diseases account for more than 32% of all deaths worldwide. Early detection and diagnosis of many CVDs is highly challenging, mainly in developing nations, due to a lack of diagnostic hospitals, physicians, and other facilities, which affects the accurate prognosis of cardiovascular disease.

ML is one of the most rapidly growing areas of Artificial Intelligence (AI), with applications in a wide range of industries, including healthcare. Because ML is an intelligent tool for analyzing data, and the medical profession is rich in data, it offers a lot of value in the healthcare field. ML techniques may be used to identify several kinds of ailment; however, this study focuses on heart disease diagnosis. Because CVDs are the leading cause of mortality worldwide today, early detection of the condition is critical to saving lives. The significance of ML in identifying hidden patterns and analyzing the data cannot be overstated. After data analysis, ML strategies can help predict and identify cardiovascular problems. This paper gives an experimental analysis of the ML classification algorithms used in cardiac ailment diagnosis [3]. It allows the algorithms to be compared, because their performance parameters were collected in experiments on the same input, i.e. the same dataset. It demonstrates how important machine learning is in healthcare, as well as how it can produce accurate predictions and assist healthcare practitioners. The rest of the paper is organized as follows. Section II covers the background and basics of machine learning, classification algorithms, and the heart disease dataset most widely used by researchers. Section III describes the dataset and its attributes. Section IV gives a literature overview of current research in this area. Section V discusses the methodology. Section VI compares the classification techniques based on their accuracy. Finally, Section VII offers the conclusion.

II. BACKGROUND

This part summarizes the disciplines related to the paper: machine learning, its algorithms (with brief definitions) and data preparation strategies.

A. Machine Learning (ML)

ML is a subfield of artificial intelligence. Machine learning allows machines to learn from input attributes and perform quantitative and qualitative analysis to produce outcomes within a given range. The objective of ML is to
comprehend the structure of the data so that users can understand it and fit it to an algorithm. ML makes it simpler for machines to develop a reliable model from a training dataset and, as a consequence, to make decisions based on input data [4]. Autonomous vehicles, industrial automation, medical surgery, robotics, virtual assistants, medicine, pattern recognition, natural language processing, statistical arbitrage, data mining, dialogue systems, game playing, Google Maps, product recommendation, share prediction, medical diagnosis, chatbots, image recognition, and crime prediction through video surveillance systems are just some of the fields where machine learning is used.

There are two important kinds of ML algorithms in common use:

1) Supervised Learning: Supervised ML approaches predict outcomes using what they have learned from prior data with the help of attribute labels. Starting from the training dataset, the algorithm builds an estimated function to predict expected outcomes. After a suitable training phase, the model can provide outcomes for given input data. The algorithm compares its outputs with the true results to find errors and adapts the model accordingly [5].

2) Unsupervised Learning: If the training dataset is unlabeled, unsupervised ML approaches are used. They investigate how a model can derive a function that describes the underlying structure of unlabeled data. The model does not know the correct outcomes; instead, it explores the data and draws insights from it in order to find hidden patterns in unlabeled data [5].

B. Machine Learning Techniques

In this paper, five machine learning techniques are used for model development.

1) Logistic Regression: Because its underlying theory is quite similar to that of Linear Regression, it is called "Logistic Regression"; the term "Logistic" comes from the logit function used in this classification method. Logistic Regression (LR) [6] is a predictive technique based on probability that is used for classification. As a classification technique [7], it is used to estimate the likelihood of a set of categories. In LR, the outcome is a binary variable encoded as 0 or 1.

The goal of Logistic Regression is to discover a link between a collection of features and the likelihood of a certain outcome. A Logistic Regression model is better suited to classification than a Linear Regression model in that it uses the "sigmoid" (also called "logistic") function rather than a linear function.

2) K-Nearest Neighbors (KNN): KNN is perhaps one of ML's most fundamental and important supervised algorithms, and it works on categorical datasets [8]. Its application areas include facial recognition, recommendation systems, text mining and healthcare. Once the dataset has been supplied, KNN performs no further training, which is why it is called a lazy learning algorithm. Rather, it just stores the information and does not perform any calculations during the training phase. It does not begin to build a model until the dataset is queried. As a consequence, KNN is an excellent data mining algorithm.

3) Support Vector Machines (SVM): SVMs are supervised ML techniques used for classification and regression tasks, and more frequently for classification problems. SVMs [7] were first presented in the early 1960s and came into use for model building in the 1990s. Compared to other ML techniques, SVMs are implemented in a distinctive way. The SVM algorithm is now popular and frequently used because it can handle both categorical and continuous attributes. The data points nearest to the hyperplane are referred to as support vectors. These data points are used to define the separating line.

A hyperplane is a subspace that separates a group of entities into distinct classes, as seen in the picture above. The space between the two lines through the nearest data points of the different classes is known as the margin. It can be computed as the perpendicular distance between the line and the support vectors. A large margin is considered a good margin, whereas a small margin is considered a poor one.

4) Random Forest (RF): RF is a supervised learning approach. It constructs a "forest" consisting of a collection of decision trees, usually trained with the "bagging" approach. The basic idea of bagging is that combining several learning models improves the end result [9]. It is a straightforward machine learning approach that gives very good results in many situations even without hyper-parameter tuning. It is widely used owing to its efficiency and adaptability.

5) Gradient Boosting (GB): The primary idea behind this technique is to build models sequentially, with each model attempting to reduce the errors of the preceding model.

When the target attribute is continuous, we use the GB Regressor, and when the problem is one of classification, we use the GB Classifier [10]. The only difference between the two is the loss function. The idea is to use gradient descent to reduce this loss function as training proceeds. Because the method depends on a loss function, different loss functions are available, such as mean squared error (MSE) for regression problems.

C. Data Preprocessing

Data preprocessing is a crucial stage in the data preparation and machine learning process [11]. It is the first stage of every data analysis effort and consists of the following steps:
1. Attribute cleaning
2. Attribute transformation
3. Attribute integration
4. Attribute reduction

III. EXPERIMENTAL SETUP

This section describes the experimental setup for this work, which includes the dataset, preprocessing and the other tasks required to obtain the results for analysis. These are explained in the following subsections.
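
As a minimal sketch of the preprocessing part of such a setup, the four steps listed under Data Preprocessing can be illustrated in plain Python. The toy records, the `RecordId` column, the helper names, and the order in which the steps are applied are all assumptions made for illustration; the paper does not specify how its preprocessing was implemented.

```python
# Hypothetical toy records standing in for two source datasets. Column names
# loosely follow the heart failure data; the values are made up.
source_a = [
    {"Age": 54, "Sex": "M", "Cholesterol": 230, "RecordId": "a1", "HeartDisease": 1},
    {"Age": 54, "Sex": "M", "Cholesterol": 230, "RecordId": "a1", "HeartDisease": 1},   # duplicate
    {"Age": 41, "Sex": "F", "Cholesterol": None, "RecordId": "a2", "HeartDisease": 0},  # missing value
]
source_b = [
    {"Age": 63, "Sex": "F", "Cholesterol": 180, "RecordId": "b1", "HeartDisease": 0},
    {"Age": 47, "Sex": "M", "Cholesterol": 280, "RecordId": "b2", "HeartDisease": 1},
]


def integrate(*sources):
    """Integration: stack the records of several sources into one list."""
    return [dict(row) for source in sources for row in source]


def clean(rows):
    """Cleaning: drop rows with missing values and exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def transform(rows, numeric=("Age", "Cholesterol")):
    """Transformation: encode Sex as 0/1 and min-max scale numeric columns."""
    out = [dict(row, Sex=1 if row["Sex"] == "M" else 0) for row in rows]
    for col in numeric:
        lo = min(r[col] for r in out)
        hi = max(r[col] for r in out)
        span = (hi - lo) or 1  # avoid dividing by zero for constant columns
        for r in out:
            r[col] = (r[col] - lo) / span
    return out


def reduce_columns(rows, drop=("RecordId",)):
    """Reduction: remove columns that are not used for modelling."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


prepared = reduce_columns(transform(clean(integrate(source_a, source_b))))
for row in prepared:
    print(row)
```

In practice a data-frame library would usually replace these helpers; the sketch only makes the role of each step concrete.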

Fig. 1: Data Representation

A. Collection of Data

This dataset was created by combining various datasets that were already available independently but had not been combined before. It combines five heart datasets over 11 common features, making it the largest cardiovascular disease dataset available so far for research purposes. The following five datasets were used in its curation:
• Cleveland, Ohio: 303 observations
• Hungarian: 294 observations
• Switzerland: 123 observations
• Long Beach VA: 200 observations
• Statlog (Heart) Data Set: 270 observations
• Total: 1190 observations
• Duplicates: 272 observations
• Final dataset: 918 observations

We use Kaggle's UCI Heart Failure Prediction Dataset. This dataset contains 918 observations with 11 clinical variables (such as age, sex, cholesterol, and so on) and one target label, heart disease. The major goal is to develop a model that reliably predicts whether or not a person has CVD based on these 11 clinical characteristics. This is therefore a binary classification problem.

B. Attributes description

The dataset consists of 12 attributes: the target attribute, "Heart Disease", and 11 input attributes. The attribute descriptions are shown in Fig. 1.

IV. LITERATURE REVIEW

Devansh et al. noted that cardiovascular disease, a term covering the different kinds of disorders affecting the human heart, has been the leading cause of mortality worldwide over the last few decades. Their work covers several features associated with heart disease and uses ML methods such as decision tree (DT), Naive Bayes, KNN, and RF for data mining classification. The dataset has 303 instances and 76 attributes, of which only 14 are chosen for evaluation, which is what is needed to examine the performance of the different techniques. After comparison, the authors found that KNN performed best, with an accuracy of 90.78% [4].

Pronab et al. Satish et al. proposed that feature selection methods can be used to lower the cost of diagnosis by picking the important attributes. The authors used different ML models such as RF, KNN, SVM, Naive Bayes, and neural networks. They combined 5 datasets into a single dataset and implemented all the algorithms. After comparing the algorithms, the authors found that the Random Forest model was best for predicting heart disease, with an accuracy of 92.16% [12].

Apurb et al. observed that data science is critical for processing massive amounts of data in healthcare. Because predicting cardiac disease is a complicated undertaking, the process needs to be automated in order to reduce risks and warn patients well in advance. The authors used the UCI heart failure prediction dataset and implemented 5 algorithms; on the basis of accuracy, they found that Random Forest was the most accurate algorithm for predicting heart failure, with an accuracy of 90.16% [13].

Chimaa et al. discussed the critical conditions that people with CVDs can suffer and the use of ML algorithms to develop advanced technologies that can properly diagnose individuals based on electronic health records. Focusing on several cardiac factors, they implemented different ML models such as multi-layer perceptron, RF, SVM and Naive Bayes, and found that the SVM model worked best in the comparison, with an accuracy of 91.67% [14].

V. METHODOLOGIES

A. Exploratory Data Analysis (EDA)

First, we need to analyze the data and check whether the target label is balanced or imbalanced. A count plot is sufficient to find out. As shown in Fig. 2:
• Number of patients without heart disease: 410
• Number of patients with heart disease: 508

Fig. 2: Heart Disease

Once we know the data is balanced, we need to check all the attribute columns and remove any unnecessary columns that are not used in the analysis of heart failure [15].

The count plots show that men are more than twice as likely as women to have heart disease. It turns out that ASY (asymptomatic) is the most frequent type of chest pain

among people with heart disease. Patients whose exercise-induced angina value is Y are more likely to have heart disease than those whose value is N. The slope of the peak exercise ST segment most strongly linked to the target is Flat. A fasting blood sugar value of 1 (FastingBS > 120 mg/dl) indicates that the patient may have heart disease.

B. Correlation Matrix with Heatmap

Correlation indicates how closely the attributes are related to each other or to the target attribute. The heatmap makes it simple to identify which features are most relevant to the target attribute; we use the seaborn library to visualize the heatmap of correlated features.

Correlation can be positive (when the value of a feature rises, the value of the target attribute rises) or negative (when the value of a feature rises, the value of the target attribute falls). This heatmap shows that chest pain ('cp') is strongly related to the target variable. Compared with how the other components are connected, we can claim that chest pain plays the most important role in predicting the presence of heart disease. A heart attack is a real emergency. It occurs when a blood clot blocks the blood supply to the heart [16]. Without blood, tissue lacks oxygen and dies, producing chest pain.

Fig. 3: Correlation features of Heart Disease

According to the figure, Age and Oldpeak are the numerical variables most strongly associated with heart disease (Oldpeak measures ST depression, so this makes sense). Fasting blood sugar is also important, but we treat it in more depth later as a categorical feature. The other numerical features are not positively associated with heart disease. In fact, the target has a negative correlation with maximum heart rate and cholesterol, which is not what doctors usually tell us. A plausible hypothesis is that these two features need to interact with other features in order to contribute to heart disease.

VI. RESULT AND ANALYSIS

This section presents the results for RF, KNN, SVM, GB, and LR. Accuracy, precision, recall, and F1 score are the parameters used to assess the algorithms.

For each algorithm, a confusion matrix was used to calculate the precision, recall, F-measure and accuracy of the output [9]. The formulae below were used to compute all the parameters:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]

• True Positive (TP): positive report, and the person has a heart ailment.
• True Negative (TN): negative report, and the person has no heart ailment.
• False Positive (FP): positive report, but the person has no heart ailment.
• False Negative (FN): negative report, but the person has a heart ailment.

Table I shows the confusion matrix obtained by the proposed model for the various methods. Table II shows the accuracy score for the Random Forest, KNN, SVM, Logistic Regression, and Gradient Boosting classification algorithms.

TABLE I: Outcomes from Confusion Matrix

Algorithm            TP   TN  FP  FN
Logistic Regression  90   67  10  17
KNN                  96   70   7  11
SVM                  105  69   5   5
Random Forest        95   66  11  12
Gradient Boosting    96   65  12  11

VII. CONCLUSION

This study noted that the number of deaths from heart disease is increasing rapidly. The major goal of the study is to develop a system that can reliably forecast the risk of a heart attack at an early stage. We compared several machine learning methods for heart disease prediction using the UCI dataset. All of the models performed well. The Logistic Regression, Random Forest and Gradient Boosting models fared somewhat worse than the others, whereas KNN models scored nearly identically, with minor

changes in False Positives/Negatives. Finally, the Support Vector Classifier model outperformed the others, achieving a remarkable accuracy of 94.56%, as shown in Fig. 4.

TABLE II: Comparison of Machine Learning algorithms

Algorithm            Precision  Recall  F1     Accuracy
Logistic Regression  0.857      0.853   0.854  85.3%
KNN                  0.904      0.902   0.902  90.2%
SVM                  0.954      0.954   0.954  94.56%
Random Forest        0.875      0.875   0.875  87.5%
Gradient Boosting    0.875      0.875   0.875  87.5%

Fig. 4: Comparison Graph

VIII. FUTURE WORK AND SCOPE

The presented approach may be improved and expanded to increase the accuracy of heart disease prediction. To obtain better and more accurate results, we will apply additional machine learning algorithms to the UCI heart failure dataset. We can also collect real-time datasets from hospitals in various countries, which might increase performance and accuracy in predicting cardiac disease. We can also implement an optimized deep-learning model to improve accuracy and reliability. Similar prediction methods can be developed for a variety of other chronic or deadly conditions such as cancer, diabetes, and so on.

REFERENCES

[1] J. O. R. Kim, Y.-S. Jeong, J. H. Kim, J.-W. Lee, D. Park, and H.-S. Kim, “Machine learning-based cardiovascular disease prediction model: A cohort study on the Korean national health insurance service health screening database,” Diagnostics, vol. 11, no. 6, p. 943, 2021.
[2] F. Z. Abdeldjouad, M. Brahami, and N. Matta, “A hybrid approach for heart disease diagnosis and prediction using machine learning techniques,” in International Conference on Smart Homes and Health Telematics. Springer, 2020, pp. 299–306.
[3] C. R. Olsen, R. J. Mentz, K. J. Anstrom, D. Page, and P. A. Patel, “Clinical applications of machine learning in the diagnosis, classification, and prediction of heart failure,” American Heart Journal, vol. 229, pp. 1–17, 2020.
[4] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, no. 6, pp. 1–6, 2020.
[5] R. Saravanan and P. Sujatha, “A state of art techniques on machine learning algorithms: A perspective of supervised learning approaches in data classification,” in 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), 2018, pp. 945–949.
[6] A. Gupta, R. Kumar, H. S. Arora, and B. Raman, “MIFH: A machine intelligence framework for heart disease diagnosis,” IEEE Access, vol. 8, pp. 14659–14674, 2019.
[7] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019.
[8] S. F. Weng, “Can machine-learning improve cardiovascular risk prediction using routine clinical data?” Apr. 2017.
[9] P. Rani, R. Kumar, N. M. Ahmed, and A. Jain, “A decision support system for heart disease prediction based upon machine learning,” Journal of Reliable Intelligent Environments, vol. 7, no. 3, pp. 263–275, 2021.
[10] P. Theerthagiri et al., “Cardiovascular disease prediction using recursive feature elimination and gradient boosting classification techniques,” arXiv preprint arXiv:2106.08889, 2021.
[11] N. A. Saeed and Z. T. M. Al-Ta’i, “Heart disease prediction system using optimization techniques,” in International Conference on New Trends in Information and Communications Technology Applications. Springer, 2020, pp. 167–177.
[12] N. S. C. Reddy, S. S. Nee, L. Z. Min, and C. X. Ying, “Classification and feature selection approaches by machine learning techniques: Heart disease prediction,” International Journal of Innovative Computing, vol. 9, no. 1, 2019.
[13] A. Rajdhan, A. Agarwal, M. Sai, D. Ravi, and P. Ghuli, “Heart disease prediction using machine learning,” International Journal of Research and Technology, vol. 9, no. 04, pp. 659–662, 2020.
[14] C. Boukhatem, H. Y. Youssef, and A. B. Nassif, “Heart disease prediction using machine learning,” in 2022 Advances in Science and Engineering Technology International Conferences (ASET), 2022, pp. 1–6.
[15] R. Katarya and S. K. Meena, “Machine learning techniques for heart disease prediction: a comparative study and analysis,” Health and Technology, vol. 11, no. 1, pp. 87–97, 2021.
[16] A. Gavhane, G. Kokkula, I. Pandya, and K. Devadkar, “Prediction of heart disease using machine learning,” in 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2018, pp. 1275–1278.
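
As a supplementary check, the accuracy column of Table II can be recomputed from the confusion-matrix counts in Table I using Eqs. (1)-(4). The sketch below is an illustration only, not the authors' code; the dictionary and function names are hypothetical, and the precision, recall and F1 values it prints are for the positive class, which is an assumption about how those columns were defined.

```python
# Counts copied from Table I: (TP, TN, FP, FN) per algorithm.
confusion = {
    "Logistic Regression": (90, 67, 10, 17),
    "KNN": (96, 70, 7, 11),
    "SVM": (105, 69, 5, 5),
    "Random Forest": (95, 66, 11, 12),
    "Gradient Boosting": (96, 65, 12, 11),
}


def metrics(tp, tn, fp, fn):
    """Apply Eqs. (1)-(4) to one set of confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


results = {name: metrics(*counts) for name, counts in confusion.items()}
for name, (acc, prec, rec, f1) in results.items():
    print(f"{name:20s} accuracy={acc:.4f} precision={prec:.3f} "
          f"recall={rec:.3f} F1={f1:.3f}")
```

Each row of Table I sums to 184 test cases, and the recomputed accuracies agree with Table II (for example, SVM gives 174/184, about 94.6%). The positive-class precision and recall need not match Table II for every model, which would be expected if those columns were averaged over both classes.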
