
Heart Failure Prediction

The document discusses heart failure (HF) as a leading cause of death globally, emphasizing the importance of early detection and the role of classification algorithms in predicting heart failure. It analyzes various machine learning techniques, including KNN, SVM, RF, and MLP, using a dataset from Kaggle, and presents their accuracy in predicting heart disease outcomes. The findings indicate that the Random Forest Classifier achieved the highest accuracy at 85.0%, highlighting the effectiveness of machine learning in improving heart disease diagnosis.

Chapter 51

Heart Failure Prediction Using Classification Methods

Oruganti Shashi Priya, Kanakala Srinivas, and Sagar Yeruva

1 Introduction

Heart failure (HF) is the number one cause of death around the world. The disease is
mainly due to blockage in the arteries. A heart attack can occur without a person being
aware of it. A heart attack is not always obvious; common symptoms include pain in the
arms and chest, shortness of breath, cold sweats, fatigue, swollen legs, and a rapid
heartbeat. A silent heart attack is one that has no symptoms, minimal symptoms, or
unrecognized symptoms. High blood pressure, high cholesterol, diabetes, smoking, a
family history of heart disease, obesity, and aging are all risk factors for silent attacks.
The majority of cardiovascular diseases can be prevented by addressing risk factors
like cigarette use, unhealthy diet, obesity, physical inactivity, and excessive alcohol
consumption. Heart disease cannot be diagnosed easily because its symptoms overlap
with those of other diseases. Apart from a healthy lifestyle and diet control, diagnosis
at an early stage ultimately saves lives. Despite advances in healthcare, people are
often unaware of the complications and symptoms associated with chronic illness.
This paper analyzes the performance of the classification algorithms K-nearest neighbor
(KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron
(MLP) for heart failure prediction.

O. S. Priya (B) · K. Srinivas · S. Yeruva


Department of CSE, VNR Vignana Jyothi Institute of Engineering and Technology, Bachupally,
Hyderabad, India
K. Srinivas
e-mail: [email protected]
S. Yeruva
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
A. B. Reddy et al. (eds.), Proceedings of Second International Conference on Advances
in Computer Engineering and Communication Systems, Algorithms for Intelligent
Systems, https://doi.org/10.1007/978-981-16-7389-4_53

2 Background Work

In [1], the researchers experimented with a dataset obtained from the UCI repository.
They compare various decision tree classification algorithms in order to improve
performance in cardiovascular disease diagnosis, and they apply data mining techniques
to extract hidden patterns. Algorithms like J48, logistic model tree, and RFC are used
in testing. J48 has the highest accuracy, 56.7%, while logistic model tree has the
lowest accuracy, 55.77%.
The authors of [2] provide an overall view of the current research on predicting heart
disease. Work on classification techniques mainly focuses on heart disease prediction
rather than on studying various data cleaning and pruning approaches. The review covers
different data mining approaches such as decision tree (DT), C4.5, K-means, ID3, SVM,
Naïve Bayes (NB), artificial neural network (ANN), classification and regression trees
(CART), regression, and J48. Selecting a combination of data mining techniques and
implementing it on the dataset yields a fast and effective system for heart disease
management.
In [3], the authors used 15 medical parameters for prediction, including age, gender,
blood pressure, cholesterol, and obesity. An MLP with backpropagation is used to build
an efficient system for predicting heart disease risk levels. The results show zero
false negative and false positive entries, so the system is reported to predict heart
disease with 100% accuracy.
In [4], the authors developed and presented a real-time patient monitoring device that
uses Arduino and is capable of sensing real-time parameters like body temperature,
blood pressure, humidity, and heartbeat. It is a cloud-based heart disease prediction
device that uses machine learning techniques to identify impending heart disease.
Algorithms like ANN, SVM, and RFC have been used, of which the support vector machine
shows the highest accuracy, 97.53%.
In [5], the authors developed a predictive approach to forecast the chances of heart
failure for a patient admitted to the hospital. The reported accuracies are decision
tree 93.19%, logistic regression 87.36%, random forest 89.14%, Naïve Bayes 87.27%,
and support vector machine 92.30%.
In [6], the dataset was collected from ‘Framingham’ with attributes such as gender,
age, education, diabetes, BP meds (person on BP medicines), and cigarettes per day.
Machine learning is used to predict the risk of coronary heart disease, with reported
accuracies of random forest 96.8%, decision tree 92.7%, and K-nearest neighbor 92.87%.
K-nearest neighbor has a higher execution time than decision tree and random forest.
In [7], the dataset considered is a retrospective sample of males from a high-risk
region of the Western Cape of South Africa (KEEL). Different algorithms like SVM,
DT, and NB have been used. The accuracy of all three models is greater than 70%.
In [8], boosting was applied to each ML technique to demonstrate prediction. Algorithms
like NB, SVM, RFC, Hoeffding tree, and logistic model tree have been used for effective
prediction. Random forest shows better results than all other techniques. The obtained
results were compared with the proposed model for techniques such as boosting, bagging,
and AdaBoost, of which AdaBoost is the best technique with 80.32% accuracy.

Fig. 1 Proposed framework for heart failure prediction

In [9], the dataset was collected from the UCI repository and consists of biological
parameters including blood pressure, sex, age, and cholesterol. The algorithms and
their obtained accuracies are SVM (83%), DT (79%), linear regression (78%), and
K-nearest neighbor (87%). KNN shows the highest accuracy of all the algorithms.
In [10], the researchers used machine learning algorithms like logistic regression,
RFC, DT, and K-nearest neighbor (KNN). KNN is effective in prediction, with 85.71%
accuracy.

3 Proposed System

All the sources cited above note the importance of early detection of HF, which may
help people live longer and healthier lives. The framework depicted in Fig. 1 is used
in this process.

4 Methodology

4.1 Description of the Dataset

The input dataset ‘Heart_Failure_Clinical_Records’ was obtained from Kaggle. It
consists of 12 attributes plus 1 target attribute (death event) and 300 records. There
are 7 nominal and 6 numeric attributes. The attributes are age, anemia, creatinine
phosphokinase, diabetes, platelets, ejection fraction, high blood pressure, serum
creatinine, serum sodium, sex, smoking, time, and death event. The death event is the
outcome we are going to predict: 0 means no heart disease, and 1 means heart disease.
Figure 2 displays the information for each attribute.

Table 1 Accuracy obtained using various classification algorithms

S. No.  Algorithm                Accuracy (%)
1       K-nearest neighbor       75.0
2       Support vector machine   83.3
3       Random forest            85.0
4       Multilayer perceptron    71.6

Fig. 2 Description of heart failure prediction

4.2 Data Preprocessing

Data preprocessing transforms raw data into a format that is more readable and
understandable. Its purpose is to clean up the dataset by eliminating duplicates,
inconsistencies, missing values, and errors.
We use a data cleaning approach for preprocessing, which comprises checking for
missing values, filling in missing data, and cleaning. Missing data are filled using
the mean, median, or mode.
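The filling step can be illustrated with a minimal sketch. It assumes missing entries are represented as None and that a numeric column is filled with its mean or median and a nominal column with its mode; the function name `fill_missing` and the sample columns are hypothetical and not taken from the paper.

```python
from statistics import mean, median, mode


def fill_missing(values, strategy="mean"):
    """Return a copy of `values` with None entries replaced by a summary statistic."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    # Pick the statistic by name; an unknown strategy raises KeyError.
    statistic = {"mean": mean, "median": median, "mode": mode}[strategy]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]


# Hypothetical columns for illustration only (not rows from the dataset).
ages = [60, None, 50, 70]
smoking = [0, 1, None, 0]

filled_ages = fill_missing(ages, "mean")        # numeric column, filled with the mean
filled_smoking = fill_missing(smoking, "mode")  # nominal column, filled with the mode
```

The median is often preferred over the mean when a numeric column contains outliers, since a single extreme value does not shift it.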

4.3 Implementation

The original dataset is split into two parts: 80% for training data and 20% for
testing data. Machine learning classification methods are trained on the training
data and then evaluated on the testing data.
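As a rough illustration of the 80/20 split, the sketch below shuffles the records and holds out one fifth of them. The function name, the fixed seed, and the use of integer stand-ins for records are assumptions made for the example; the paper does not say how the split was randomized.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and return (train, test) lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 300 integer stand-ins for the 300 records described in Sect. 4.1.
records = list(range(300))
train, test = train_test_split(records)
```

With 300 records this leaves 240 for training and 60 for testing.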

Classifiers: Classification is a type of supervised learning in which a model learns
from the labeled input it receives and then applies the learned knowledge to
categorize new observations. We apply different algorithms to build models and make
predictions. We use the following classifiers:
1. K-nearest Neighbor
2. Support Vector Machine
3. Random Forest Classifier
4. Multilayer Perceptron.

4.3.1 K-Nearest Neighbor Classifier

KNN is based on the distance between data points: points that lie close to one another
are grouped together. The user chooses how many nearby points, referred to as
neighbors, are consulted for each new point, and this choice is very important in
dataset analysis.
KNN can perform both regression and classification tasks using a number (k) of
neighbors. It categorizes new data points based on similarity measures. We set the
number of neighbors to 4.
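The voting rule described above can be sketched as follows, assuming Euclidean distance as the similarity measure and k = 4 as in the text. The tie-breaking rule (favoring the label of the closest tied neighbor) is an assumption added here because an even k can produce ties; the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=4):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    votes = Counter(nearest_labels)
    top = max(votes.values())
    # With an even k a tie is possible; return the tied label that is closest.
    for label in nearest_labels:
        if votes[label] == top:
            return label


# Hypothetical two-feature points: three of class 0, three of class 1.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

near_class_0 = knn_predict(points, labels, (1.1, 1.0))
near_class_1 = knn_predict(points, labels, (5.0, 4.9))
```

Because the vote depends on raw distances, features on very different scales (for example platelets versus serum creatinine) would normally be rescaled first.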

4.3.2 Support Vector Machine Classifier

The support vectors are the data points nearest to the hyperplane, with distance
measured perpendicular to the hyperplane. The classifier chooses the hyperplane that
maximizes this distance (the margin) to the nearest points, and is therefore called a
support vector classifier.
It gives the best possible decision boundary, allowing us to categorize data points
easily. The extreme points that define the hyperplane are referred to as support
vectors, which gives the support vector machine its name. A kernel is a function that
maps the data into a different set of dimensions in which a hyperplane fits more
easily. We have used a linear kernel with the penalty parameter C set to 2.
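For reference, the role of the penalty C can be seen in the standard soft-margin formulation of a linear SVM, written out below. The symbols are assumptions introduced here and do not appear in the paper: x_i are the n training points, y_i ∈ {−1, +1} their labels, w and b the hyperplane parameters, and ξ_i the slack variables that allow margin violations.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

Minimizing the first term widens the margin, whose width is 2/‖w‖, while C (here 2) sets how heavily margin violations are penalized: a larger C tolerates fewer misclassified training points at the cost of a narrower margin.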

4.3.3 Random Forest Classifier

Random forests, or random decision forests, use an ‘ensemble’ learning approach for
classification by building multiple decision trees from random samples of the training
data.
An RF is a meta estimator that fits a large number of DT classifiers on various
sub-samples of the dataset and combines them to improve the predictive accuracy while
avoiding overfitting. We have used 200 estimators (trees), and the criterion used is
‘Gini,’ which measures the quality of each split.
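To make the ‘Gini’ criterion concrete, the sketch below computes the Gini impurity of a set of labels and the weighted impurity of a candidate split, which is the quantity a decision tree tries to reduce when choosing a split. The function names and the sample labels are hypothetical; this is not the authors' implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


mixed = gini_impurity([0, 0, 1, 1])        # evenly mixed node
pure = gini_impurity([1, 1, 1])            # node with a single class
perfect_split = split_gini([0, 0], [1, 1])  # split that separates the classes
```

An impurity of 0 means every sample in the node has the same label, and 0.5 is the maximum for two classes.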

4.3.4 Multilayer Perceptron

To perform classification tasks, the MLP classifier uses an elementary neural network,
which comes under the category of ANN and consists of an input layer, two hidden
layers, and an output layer.
It is a supervised learning method: an ANN with a large number of perceptrons. The
‘tanh’ activation function has been used. Having multiple layers and a nonlinear
activation makes an MLP different from a normal linear perceptron.
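The forward pass of such a network can be sketched in a few lines. The layer sizes, the weights, and the sigmoid on the output layer are assumptions chosen for illustration; the paper states only that there are two hidden layers with ‘tanh’ activation.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, layers):
    """`layers` is a list of (weights, biases); tanh on hidden layers, sigmoid on the last."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = dense(x, weights, biases, math.tanh)
    weights, biases = output
    return dense(x, weights, biases, sigmoid)[0]


# Hypothetical network: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.2, -0.7]], [0.05]),
]
probability = mlp_forward([0.2, 0.4, 0.6], layers)
predicted_label = 1 if probability >= 0.5 else 0
```

In practice the weights are learned by backpropagation rather than set by hand; the sketch shows only how an input is turned into a class label once the weights are fixed.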

4.3.5 Classification

We classify the data with label 0, which indicates a person with no disease, and
label 1, which indicates a person with disease.
We construct a confusion matrix and calculate accuracy, precision, recall, and
F1-score.
Predictions: Here we predict the death event. The output labels are:
• A person with attack
• A person without attack.
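The four metrics named above follow directly from the confusion matrix counts. The sketch below uses the standard definitions; the function names and the label lists are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for the positive label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight test records (not results from the paper).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Under this convention, a model that never predicts label 1 scores 0.0 for precision, recall, and F1 on that label even when its overall accuracy looks reasonable.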

5 Results

We have 12 independent attributes, of which ‘age,’ ‘ejection_fraction,’
‘serum_creatinine,’ ‘serum_sodium,’ ‘creatinine_phosphokinase,’ and ‘time’ are the
important features in the considered dataset (Fig. 3).
Data Correlation: The correlation matrix shows the correlation among the features
and their correlation with DEATH_EVENT (the target attribute). Six features (‘age,’
‘ejection fraction,’ ‘serum creatinine,’ ‘serum sodium,’ ‘creatinine phosphokinase,’
and ‘time’) seem to be the most correlated with the death event when compared with
the other features. Figure 4 shows the correlation matrix between the features, and
Fig. 5 shows the distribution of classes.
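A correlation matrix of this kind can be computed as sketched below. The sketch assumes the Pearson correlation coefficient, which the paper does not name explicitly; the function names and the four-row sample columns are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has zero spread and would divide by zero here.
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict with the correlation of every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values for illustration only (not rows from the dataset).
columns = {
    "time": [10, 50, 90, 130],
    "serum_creatinine": [2.0, 1.5, 1.1, 0.9],
    "DEATH_EVENT": [1, 1, 0, 0],
}
matrix = correlation_matrix(columns)
```

Reading the DEATH_EVENT row of such a matrix and ranking features by absolute value is one way to arrive at a list of the most correlated features.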
The accuracy of each algorithm, computed with Eq. (1), is given in Table 1, and
Table 2 gives a comparative study of the algorithms.

\[
\text{Accuracy} = \frac{\text{Number of correctly predicted values}}{\text{Total number of predicted values}} \tag{1}
\]

Fig. 3 Feature importance

Fig. 4 Feature correlation matrix



Fig. 5 Distribution of classes

Table 2 Comparison of performance with various metrics

Algorithms         KNN           SVM           RF            MLP
Predicted values   0      1      0      1      0      1      0      1
Precision          0.75   0.78   0.88   0.75   0.88   0.79   0.72   0.00
Recall             0.95   0.35   0.88   0.75   0.90   0.75   1.00   0.00
F1-score           0.84   0.48   0.88   0.75   0.89   0.77   0.83   0.00
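As a worked check on Table 2, the F1-score is the harmonic mean of precision and recall. The definitions below are the standard ones and are not spelled out in the paper; TP, FP, and FN denote the true positive, false positive, and false negative counts for the class in question. The numeric line uses the KNN values for class 1.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

F_1^{\text{KNN, class 1}} = \frac{2 \times 0.78 \times 0.35}{0.78 + 0.35} = \frac{0.546}{1.13} \approx 0.48
```

The result agrees with the tabulated value, and it shows how a low recall (0.35) pulls the F1-score well below the precision (0.78).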

6 Conclusion

Heart failure is one of the most prevalent diseases. Silent attacks are commonly seen
in women, and they cause a large number of deaths as a result of delayed
identification.
Identifying how heart failure develops in the human body helps diagnostic centers
identify its properties with good accuracy, so that complications like severe pain
and shortness of breath can be avoided.
In this paper, we classified data using a dataset obtained from Kaggle. We
experimented with the KNN, SVM, RFC, and MLP algorithms to predict the death event
from the available dataset, and their accuracy is presented. Based on the experimental
findings, we conclude that machine learning algorithms such as K-nearest neighbor
(KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron
(MLP) can successfully predict the outcome. The accuracy obtained on the testing data
is KNN (75.0%), SVM (83.3%), RFC (85.0%), and MLP classifier (71.6%) for the target
attribute death event. The random forest classifier achieves the best results in
predicting the death event.

References

1. J. Patel, T. Upadhyay, S. Patel, Heart disease prediction using machine learning and data mining
technique. Int. J. Comput. Sci. Commun. 7, 129–137 (2016)
2. A. Hazra, S.K. Mandal, A. Gupta, A. Mukherjee, A. Mukherjee, Heart disease diagnosis and
prediction using machine learning and data mining techniques: a review. Adv. Comput. Sci.
Technol. 10(7), 2137–2159 (2017). ISSN 0973-6107
3. P. Singh, S. Singh, G.S. Pandi-Jain, Effective heart disease prediction system using data mining
techniques. Int. J. Nanomed. 13, 121–124 (2018)
4. S. Nashif, M.R. Raihan, M.R. Islam, M.H. Imam, Heart disease detection using machine
learning algorithms and a real-time cardiovascular health monitoring system. World J. Eng.
Technol., 854–873 (2018)
5. F.S. Alotaibi, Implementation of machine learning model to predict heart failure disease. Int.
J. Adv. Comput. Sci. Appl. (IJACSA) 10(6) (2019)
6. D. Krishnani, A. Kumari, A. Dewangan, A. Singh, N.S. Naik, Prediction of coronary
heart disease using supervised machine learning algorithms, in IEEE Region 10 Conference
(TENCON) (2019), pp. 367–372
7. A.H. Gonsalves, F. Thabtah, R.M.A. Mohammad, G. Singh, Prediction of coronary heart
disease using machine learning: an experimental analysis, in Proceedings of the 2019 3rd
International Conference on Deep Learning Technologies (2019)
8. P. Motarwar, A. Duraphe, G. Suganya, M. Premalatha, Cognitive approach for heart disease
prediction using machine learning, in International Conference on Emerging Trends in
Information Technology and Engineering (ic-ETITE) (2020), pp. 1–5
9. A. Singh, R. Kumar, Heart disease prediction using machine learning algorithms, in
International Conference on Electrical and Electronics Engineering (ICE3) (Gorakhpur, India,
2020), pp. 452–457
10. A. Singh, Prediction of heart disease using machine learning. Int. J. Sci. Res. Comput. Sci.
Eng. Inf. Technol. (2020). ISSN: 2456-3307
