Heart Failure Prediction Using Classification Methods
1 Introduction
Heart disease is the leading cause of death around the world, and it is mainly caused by blockage of the arteries. A heart attack can occur without the person being aware of it, because it is not always obvious. Common symptoms include pain in the arms and chest, shortness of breath, cold sweats, fatigue, swollen legs, and a rapid heartbeat. A silent heart attack is one with no symptoms, minimal symptoms, or unrecognized symptoms. High blood pressure, high cholesterol, diabetes, smoking, a family history of heart disease, obesity, and aging are all risk factors for silent attacks. Most cardiovascular diseases can be prevented by addressing risk factors such as tobacco use, unhealthy diet, obesity, physical inactivity, and excessive alcohol consumption. Heart disease is not easy to diagnose because its symptoms overlap with those of other diseases, so, beyond a healthy lifestyle and diet control, diagnosis at an early stage is what ultimately saves lives. Despite advances in health care, people remain unaware of the complications and symptoms associated with chronic illness.
This paper analyzes the performance of the K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) classifiers for heart failure prediction.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances in Computer Engineering and Communication Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-7389-4_53
546 O. S. Priya et al.
2 Background Work
In [1], the researchers experimented with a dataset obtained from the UCI repository. They compared several decision tree classification algorithms in order to improve performance in cardiovascular disease diagnosis, applying data mining techniques to extract hidden patterns. J48, logistic model tree, and random forest classifier (RFC) were tested; J48 achieved the highest accuracy (56.7%) and logistic model tree the lowest (55.77%).
The review in [2] provides an overall view of current research on heart disease prediction. It notes that classification techniques have mainly focused on heart disease prediction rather than on data cleaning and pruning approaches. The data mining approaches covered include decision tree (DT), C4.5, K-means, ID3, SVM, naïve Bayes (NB), artificial neural network (ANN), classification and regression trees (CART), regression, and J48. Selecting a combination of data mining techniques and applying it to the dataset yields a fast and effective system for heart disease management.
In [3], the authors used 15 medical parameters for prediction, including age, gender, blood pressure, cholesterol, and obesity. An MLP trained with backpropagation was used to build an efficient system for predicting heart disease risk levels. The reported results contain zero false negatives and zero false positives, that is, the system predicted heart disease with 100% accuracy.
In [4], the authors developed and presented a real-time patient monitoring device based on Arduino that senses parameters such as body temperature, blood pressure, humidity, and heartbeat. It is a cloud-based heart disease prediction device that uses machine learning techniques to identify impending heart disease. Of the algorithms used (ANN, SVM, and RFC), SVM achieved the highest accuracy, 97.53%.
In [5], the authors developed a predictive approach to forecast the chance of heart failure for a patient admitted to hospital. The reported accuracies are decision tree 93.19%, logistic regression 87.36%, random forest 89.14%, Naïve Bayes 87.27%, and support vector machine 92.30%.
In [6], the dataset was collected from the Framingham dataset, with attributes such as gender, age, education, diabetes, BP meds (whether the person is on blood pressure medication), and cigarettes per day. Machine learning is used to predict the risk of coronary heart disease, with accuracies of 96.8% for random forest, 92.7% for decision tree, and 92.87% for K-nearest neighbor. K-nearest neighbor had a longer execution time than both decision tree and random forest.
In [7], the dataset is a retrospective sample of males from a high-risk region of the Western Cape, South Africa, obtained from KEEL. SVM, DT, and NB were used, and all three models achieved accuracies above 70%.
In [8], boosting was applied to each ML technique to demonstrate prediction. NB, SVM, RFC, Hoeffding tree, and logistic model tree were used, and random forest gave better results than all the other techniques. The results were then compared with the proposed model under the boosting, bagging, and AdaBoost techniques, of which AdaBoost performed best, with 80.32% accuracy.
In [9], the dataset was collected from the UCI repository and consists of biological parameters including blood pressure, sex, age, and cholesterol. The reported accuracies are SVM (83%), DT (79%), linear regression (78%), and K-nearest neighbor (87%), so KNN achieved the highest accuracy of all the algorithms.
In [10], the researchers used machine learning algorithms such as logistic regression, RFC, DT, and K-nearest neighbor (KNN). KNN was the most effective, with 85.71% accuracy.
3 Proposed System
All the sources cited above stress the importance of early detection of HF disease, which may help people live longer and healthier lives. The framework depicted in Fig. 1 is used in this process.
4 Methodology
Data preprocessing transforms raw, unstructured data into a format that is more readable and understandable. Its purpose is to clean the dataset by eliminating duplicates, inconsistencies, missing values, and errors.
Accordingly, we used a data cleaning approach for preprocessing, which comprises checking for missing values, filling in missing data, and cleaning. Missing values were filled using the mean, median, and mode.
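The paper does not state which statistic was applied to which column. As a minimal sketch, assuming missing entries are stored as `None` in a plain Python list, the hypothetical helper `impute_column` below fills the gaps with the mean, median, or mode of the observed values using Python's `statistics` module. The toy column is invented for illustration.

```python
from statistics import mean, median, mode

# Map each strategy name to the function that computes the fill value.
FILLERS = {"mean": mean, "median": median, "mode": mode}

def impute_column(values, strategy="mean"):
    """Return a copy of `values` with every None replaced by the mean,
    median, or mode of the observed (non-missing) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = FILLERS[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries (not the paper's data).
ages = [45, None, 50, 80, None, 50]
print(impute_column(ages, "mean"))    # [45, 56.25, 50, 80, 56.25, 50]
print(impute_column(ages, "median"))  # [45, 50.0, 50, 80, 50.0, 50]
print(impute_column(ages, "mode"))    # [45, 50, 50, 80, 50, 50]
```

A common convention is to use the mean or median for numeric columns (the median is less sensitive to outliers) and the mode for categorical ones.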
4.3 Implementation
The original dataset is split into two parts: 80% of the records form the training data and the remaining 20% form the test data. The classification methods are trained on the training data and evaluated on the test data.
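A minimal sketch of an 80/20 hold-out split, assuming the records are held in a list. The function name `split_train_test` and the seed value are hypothetical choices, not taken from the paper.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` reproducibly, then return (train, test)
    with round(len(rows) * test_fraction) rows held out for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))  # stand-ins for 100 patient records
train, test = split_train_test(records)
print(len(train), len(test))  # 80 20
```

In scikit-learn the same split is usually done with `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42)`; passing `stratify=y` keeps the class ratio equal in both parts, which matters for an imbalanced target. Whether the authors stratified their split is not stated.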
K-nearest neighbor (KNN) is based on the distances between data points: similar data points lie close together and form groups. The user chooses the number of neighbors, k, which is a very important parameter in the analysis. KNN can perform both regression and classification using the k nearest neighbors; it categorizes a new data point according to a similarity (distance) measure with respect to those neighbors. We set the number of neighbors to 4.
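The classification rule described above can be sketched in plain Python as follows. This is an illustrative implementation with k = 4 and Euclidean distance (the distance metric is an assumption, since the paper does not name one), run on invented two-feature points.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=4):
    """Predict the label of point x by majority vote among its k nearest
    training points, using Euclidean distance."""
    # Indices of training points sorted from nearest to farthest.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest_labels = [train_y[i] for i in order[:k]]
    # most_common orders equal counts by first occurrence, so a tied vote
    # goes to the label that appears first, i.e. the closest neighbor's.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical, already-scaled two-feature points; label 1 = disease.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # 0
print(knn_predict(train_X, train_y, (5.0, 4.8)))  # 1
```

Because an even k such as 4 can produce tied votes, the sketch breaks ties toward the label of the nearest neighbor; library implementations may resolve ties differently. Since KNN relies on distances, features are usually standardized first so that no single attribute dominates.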
In a support vector machine (SVM), the support vectors are the data points closest to the hyperplane, measured by their perpendicular distance to it. The classifier chooses the hyperplane that maximizes this distance (the margin), and the resulting model is called a support vector classifier.
It gives the best possible decision boundary, which lets us categorize data points easily. The extreme points that support the hyperplane are called support vectors, which is where the support vector machine gets its name. A kernel is a function that maps the data into a different set of dimensions in which a hyperplane fits the data more easily. We used a linear kernel with the penalty parameter C set to 2.
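To make the margin idea concrete, the sketch below trains a soft-margin linear SVM with C = 2 by subgradient descent on the primal hinge-loss objective. This is a swapped-in, simplified training method chosen for readability, not the solver the authors used; the learning rate, epoch count, and toy data are assumptions, and the labels must be recoded from 0/1 to -1/+1.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, C=2.0, lr=0.01, epochs=500):
    """Soft-margin linear SVM trained by full-batch subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if dot(w, x) + b >= 0 else -1

# Invented, linearly separable toy data.
X = [(-2.0, -2.0), (-2.2, -1.8), (-1.8, -2.2), (2.0, 2.0), (2.2, 1.8), (1.8, 2.2)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(svm_predict(w, b, (1.5, 1.0)), svm_predict(w, b, (-1.0, -2.0)))  # 1 -1
```

With a fixed learning rate the weights hover around the optimum rather than converging exactly. Library solvers handle this more precisely; for example, scikit-learn's `SVC(kernel="linear", C=2)` is built on LIBSVM.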
Random forests, or random decision forests, use an ensemble learning approach to classification: they build multiple decision trees from random samples of the training data.
A random forest is a meta-estimator that fits a large number of decision tree classifiers on various sub-samples of the dataset and combines their predictions to improve predictive accuracy while avoiding overfitting. We set the number of estimators (trees) to 200 and used the Gini criterion, which measures the quality of each candidate split in a tree.
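The Gini criterion and the bootstrap-and-vote idea can be sketched with depth-1 trees (decision stumps). This is a deliberately simplified forest, not a full random forest: each stump sees one randomly chosen feature and one bootstrap sample, whereas real random forests grow deeper trees and sample candidate features at every split. The function names and toy data are hypothetical; only `n_estimators = 200` and the Gini criterion come from the text.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2, where p_k are the class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of a two-way split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Depth-1 tree: choose the threshold on `feature` with the lowest split
    Gini. Returns (threshold, left_label, right_label), or None if the
    feature takes a single value in this sample."""
    best = None
    for t in sorted({row[feature] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        if not right:  # threshold at the maximum puts everything on the left
            continue
        score = split_gini(left, right)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    return None if best is None else best[1:]

def fit_forest(X, y, n_estimators=200, seed=0):
    """Fit up to n_estimators stumps, each on a bootstrap sample (drawn with
    replacement) and one randomly chosen feature; stumps whose feature is
    constant in their sample are skipped."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feature)
        if stump is not None:
            forest.append((feature,) + stump)
    return forest

def forest_predict(forest, x):
    """Majority vote over all stumps in the forest."""
    return majority([lo if x[f] <= t else hi for f, t, lo, hi in forest])

X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y, n_estimators=200)
print(gini([0, 0, 1, 1]))                  # 0.5
print(forest_predict(forest, (0.5, 0.5)))  # 0
print(forest_predict(forest, (6.0, 6.0)))  # 1
```

A node whose labels are split evenly between two classes has Gini impurity 0.5 and a pure node has impurity 0; each stump keeps the threshold whose weighted impurity is lowest.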
To perform classification, the MLP classifier uses a basic feedforward neural network, a type of ANN, consisting of an input layer, two hidden layers, and an output layer.
It is a supervised learning method: an ANN built from many perceptrons. We used the 'tanh' activation function. Having multiple layers and a nonlinear activation is what distinguishes an MLP from a plain linear perceptron.
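The layer structure described above (input, two tanh hidden layers, output) can be sketched as a forward pass. The paper does not give the hidden-layer sizes or the output activation, so the sizes 8 and 4, the sigmoid output, and the random untrained weights are all assumptions; training by backpropagation is omitted to keep the example short.

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, params):
    """Input -> tanh hidden layer 1 -> tanh hidden layer 2 -> sigmoid output.
    Returns the estimated probability of class 1 (death event)."""
    h1 = dense(x, *params["hidden1"], math.tanh)
    h2 = dense(h1, *params["hidden2"], math.tanh)
    (p,) = dense(h2, *params["output"], sigmoid)
    return p

def random_params(n_in, n_h1, n_h2, seed=0):
    """Hypothetical untrained weights drawn uniformly from [-0.5, 0.5]."""
    rng = random.Random(seed)
    def layer(n_out, n_inputs):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_out)]
        return weights, [0.0] * n_out
    return {"hidden1": layer(n_h1, n_in),
            "hidden2": layer(n_h2, n_h1),
            "output": layer(1, n_h2)}

p = mlp_forward([0.2, -0.1, 0.4], random_params(n_in=3, n_h1=8, n_h2=4))
print(p)  # a probability strictly between 0 and 1
```

In practice a library would handle training; scikit-learn's `MLPClassifier(hidden_layer_sizes=(8, 4), activation="tanh")` is one option, where the layer sizes are again hypothetical.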
4.3.5 Classification
We classify the data with label 0 indicating a person without the disease and label 1 indicating a person with the disease.
We construct a confusion matrix and calculate accuracy, precision, recall, and F1-score.
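The four metrics follow directly from the confusion-matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall), with label 1 (disease) as the positive class. A minimal sketch with invented predictions, where the function names are hypothetical:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 = disease (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 if nothing predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # 0.0 if there are no actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented labels for eight test patients (not the paper's results).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))        # (3, 3, 1, 1)
print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch.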
Predictions: the models predict the target attribute, death event, for each patient. The output labels are:
• A person with an attack
• A person without an attack.
5 Results
6 Conclusion
Heart failure is one of the most widespread diseases. Silent attacks are commonly seen in women, and they cause a large number of deaths because they are identified late.
Understanding how heart failure develops in the human body helps diagnostic centers identify it with good accuracy, so that complications such as severe pain and shortness of breath can be avoided.
In this paper, we classified data from a dataset obtained from Kaggle. We experimented with the KNN, SVM, RFC, and MLP algorithms to predict the death event from the available dataset, and we report their accuracy. The experimental findings indicate that machine learning algorithms such as K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) can successfully predict this outcome. The average accuracies obtained on the test data for the target attribute death event are KNN (75.0%), SVM (83.3%), RFC (85.0%), and MLP classifier (71.6%). The random forest classifier achieves the best results in predicting the death event.
References
1. J. Patel, T. Upadhyay, S. Patel, Heart disease prediction using machine learning and data mining
technique. Int. J. Comput. Sci. Commun. 7, 129–137 (2016)
2. A. Hazra, S.K. Mandal, A. Gupta, A. Mukherjee, A. Mukherjee, Heart disease diagnosis and
prediction using machine learning and data mining techniques: a review. Adv. Comput. Sci.
Technol. 10(7), 2137–2159 (2017). ISSN 0973-6107
3. P. Singh, S. Singh, G.S. Pandi-Jain, Effective heart disease prediction system using data mining
techniques. Int. J. Nanomed. 13, 121–124 (2018)
4. S. Nashif, M.R. Raihan, M.R. Islam, M.H. Imam, Heart disease detection using machine
learning algorithms and a real-time cardiovascular health monitoring system. World J. Eng.
Technol., 854–873 (2018)
5. F.S. Alotaibi, Implementation of machine learning model to predict heart failure disease. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 10(6) (2019)
6. D. Krishnani, A. Kumari, A. Dewangan, A. Singh, N.S. Naik, Prediction of coronary
heart disease using supervised machine learning algorithms, in IEEE Region 10 Conference
(TENCON) (2019), pp. 367–372
7. A.H. Gonsalves, F. Thabtah, R.M.A. Mohammad, G. Singh, Prediction of coronary heart
disease using machine learning: an experimental analysis, in Proceedings of the 2019 3rd
International Conference on Deep Learning Technologies (2019)
8. P. Motarwar, A. Duraphe, G. Suganya, M. Premalatha, Cognitive approach for heart disease
prediction using machine learning, in International Conference on Emerging Trends in
Information Technology and Engineering (ic-ETITE) (2020), pp. 1–5
9. A. Singh, R. Kumar, Heart disease prediction using machine learning algorithms, in International Conference on Electrical and Electronics Engineering (ICE3) (Gorakhpur, India, 2020), pp. 452–457
10. A. Singh, Prediction of heart disease using machine learning. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. (2020). ISSN 2456-3307