Heart Disease Prediction Using Adaptive Infinite Feature Selection and Deep Neural Networks
Sudipta Modak, Member, IEEE, Esam Abdel-Raheem, Senior Member, IEEE, and Luis Rueda, Senior Member, IEEE
Department of Electrical and Computer Engineering
School of Computer Science
University of Windsor
401 Sunset Ave, Windsor, ON N9B 3P4, Canada
E-mail: {modak, lrueda, eraheem}@uwindsor.ca
Abstract—Prediction of heart disease is one of the most important fields of study in modern science. By studying data such as cholesterol levels, blood sugar, and blood pressure, heart disease can be predicted. In recent years, several machine learning techniques have been used to aid in fast prediction by learning from the data. However, the prediction accuracy still remains low, mainly due to the low number of records contained in the available databases. In this paper, we propose a new method of heart disease prediction using a modified variation of infinite feature selection and a multilayer perceptron. The method achieves an accuracy of 87.70%, an F1-score of 87.21%, a sensitivity of 88.50%, a specificity of 87.02%, and a precision of 86.05% on the Cleveland, Hungarian, Switzerland, Long Beach, and Statlog datasets. For evaluation purposes, we have combined all the datasets and then divided the combined dataset into training and test samples, with 20% of the samples allocated for testing.

Index Terms—Heart disease prediction, Infinite feature selection, Multilayer perceptron, Neural networks

I. INTRODUCTION

Automatic disease detection, classification, and prediction have been important areas of research for several decades. For this purpose, several algorithms have been developed to aid doctors with accurate predictions of several types of diseases. One such field is that of heart disease, which is one of the biggest causes of death in the modern world [1]. By studying the patterns of electrocardiogram (ECG) signals and correlating them to existing data, common anomalies in the heart can be identified. Several techniques for the detection of the QRS complex exist with high accuracy, such as the works in [2]–[5]. However, the prediction of heart disease remains quite challenging due to the low number of records contained in the available databases.

By learning from the available data, machine learning models can predict these diseases at early stages. The attributes taken into account by such techniques can be obtained from an individual's body, such as the electrocardiogram, blood pressure, sugar levels, age, sex, cholesterol levels, etc. [6]. However, there can be redundant features present in the datasets. These redundant features make predictions inaccurate and use up precious memory and time. A significantly large amount of data has been collected by the healthcare industry from previous cases of heart-related disease from patients all over the world [1]. These datasets contain hidden information that is directly related to the condition of the heart and needs to be identified. Due to the presence of such a huge quantity of data, it is impossible to manually analyze it and create methods for prediction [6]. Therefore, machine learning techniques are required to deal with such data and predict diseases at early stages.

The work of [7] compares six different types of algorithms, including Linear, Quadratic, Cubic, and Medium Gaussian support vector machines (SVM), as well as Decision Tree and Ensemble Subspace Discriminant, in terms of prediction accuracy. Deep learning has been used in the work of [8], where different combinations of the number of hidden layers and the number of epochs have been tested to learn which combination produces the best prediction accuracy. Heart disease prediction using artificial neural networks can be found in [9]. That method uses six different classifiers to test the data and employs deep neural networks (DNN) for classification to achieve high prediction accuracy. Similar algorithms can be found in [10], [11].

Feature selection techniques for heart disease prediction have been used in the works of [12]–[14]. In [12], an optimized genetic algorithm was combined with SVM to achieve good prediction accuracy, while in [13], a brute-force approach was used to select relevant features. That technique takes a small subset of features, with at least three features, and evaluates the combination of such features on several classifiers such as logistic regression (LR), k-nearest neighbour, decision tree, Naive Bayes, SVM, neural network, and vote. An integer-coded genetic algorithm has been used to select features as well [14]. That method aids the SVM-based prediction of heart disease and improves accuracy. A combination of a genetic algorithm and recursive feature elimination followed by a random forest classifier for better prediction has been presented in [15]. The work of [15] uses the genetic algorithm and recursive feature elimination to select relevant features from the datasets, along with different classifiers for classification. Based on that work, the random forest classifier provides the best results when combined with the hybrid feature selection technique used in [15].
The work of [16] investigates the performance of several classifiers, such as decision trees, Naive Bayes, k-nearest neighbour, and neural networks, on heart disease prediction by varying the number of features provided as inputs. According to [16], the Naive Bayes classifier works best when the number of features is low.

This paper presents a new technique for the prediction of heart disease using an advanced version of Infinite Feature Selection (InfFS) [17] and deep neural networks (DNN). The method is evaluated on five different datasets, namely, Cleveland, Hungary, Switzerland, Statlog, and Long Beach V [18]. A comparison with state-of-the-art methods in the field is included as well.

II. MATERIALS AND METHODS

The proposed method involves four stages, namely, preprocessing, which reads the raw data and converts it into usable quantities; a feature selection stage that selects a suitable subset of features and eliminates the redundant ones; a deep learning stage that is used to learn from the training datasets; and finally, a prediction stage that predicts the outcomes from the test dataset. The block diagram for the entire process is summarized in Fig. 1.

Fig. 1. Block diagram for the proposed method: data, preprocessing, rank and energy generation by InfFS with an adaptive threshold, feature selection, training and test sets, deep learning model (hidden layers), and prediction.

A. Dataset Description

The datasets used in the experiments contain 1,190 records from five distinct databases. There are 14 distinct features, of which eight are categorical and six are numeric. The features are age, sex, chest pain type (cp), resting blood pressure (trestbps), serum cholesterol (chol), fasting blood sugar greater than 120 mg/dl (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise-induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), the slope of the peak exercise ST segment (slope), number of major vessels colored by fluoroscopy (ca), thallium scan (thal), and the class attribute (num). Out of these 14 features, 13 are taken as inputs to the proposed algorithm and the class attribute (num) is taken as the output. The aim is to design an algorithm that takes the 13 attributes and predicts the output. Since the data is labeled, the algorithm is regarded as a supervised learning algorithm.

B. Pre-processing

The algorithm begins by combining all five datasets into one complete dataset. Any record with missing values was eliminated across the five databases, and therefore only 1,025 records out of 1,190 were used in this work. The reasoning is that a neural network learns better when more data is fed into it, while missing values hamper the learning process by creating ambiguity in the learning mechanism. The output is also changed to a binary output, that is, anyone with no disease is labeled '0' and an individual with heart disease is labeled '1'. Originally, the data had classes from 0 to 3, which indicated the type of heart disease present in the individual, with '0' being the absence of any heart disease and '1', '2', and '3' being the presence of three different types of disease. For our work, we have considered only the binary case of not having any disease ('0') and the presence of disease ('1').
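As an illustration of this pre-processing step, the sketch below shows one way to combine the records, drop incomplete rows, and binarize the class attribute using pandas. The file names, the assumption that all five files share the same 14-column layout, and the use of '?' as the missing-value marker (as in the UCI distributions) are assumptions made for the sketch, not details taken from the paper.

```python
import pandas as pd

# Hypothetical file names; the paper combines the Cleveland, Hungary,
# Switzerland, Long Beach V, and Statlog records.
FILES = ["cleveland.csv", "hungary.csv", "switzerland.csv",
         "long_beach_va.csv", "statlog.csv"]

COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Combine all five datasets into one complete dataset.
frames = [pd.read_csv(f, names=COLUMNS, na_values="?") for f in FILES]
data = pd.concat(frames, ignore_index=True)

# Eliminate any record with missing values
# (1,190 -> 1,025 records in the paper).
data = data.dropna().reset_index(drop=True)

# Binarize the class attribute: 0 = no disease, 1 = presence of disease.
data["num"] = (data["num"] > 0).astype(int)

X = data.drop(columns="num")   # the 13 input attributes
y = data["num"]                # the binary target
```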
C. Feature Selection

A dataset might contain hundreds of features, many of which can be uncorrelated with the output. The main objective of feature selection (FS) algorithms is to pick out a subset of input variables that directly influence the outcome of the data, while reducing the noise and filtering out the unwanted variables. For each dataset, a feature selection algorithm needs a particular selection criterion that can evaluate the applicability of each feature to the output classes. Once this measure is calibrated, the irrelevant features are identified one by one and eliminated if they do not satisfy the imposed conditions.

In our research, we have employed an improved version of InfFS to select a distinct subset of features from the dataset. The method was initially developed in [17]; it maps the features onto an affinity graph as nodes and then connects them. It then considers moving from feature to feature and, by doing so, creates a path by selecting several features as a subset of the original list of features. Given a list of features, F, the algorithm considers two aspects of the features: the vertices, V, and the edges, E. V contains a set of values that represent a feature distribution for each of the values in F, while E represents the relations between two features in the feature distribution space [17]. An adjacency matrix A is formulated, containing all the pairwise energies, where the energy combines the maximal feature dispersion and the correlation between two features [17]. Once all the features are mapped onto the graph based on their weights in A, several pathways with more than two features are selected at each iteration. These features are all connected nodes, and the energy of each path is built from the pairwise energies

a_{i,j} = \alpha\,\sigma_{i,j} + (1 - \alpha)\,c_{i,j}, \qquad (1)
\xi_{\gamma} = \prod_{k=0}^{l-1} a_{v_k, v_{k+1}}, \qquad (2)
R_l(i,j) = \sum_{\gamma \in P_{i,j}^{l}} \xi_{\gamma}, \qquad (3)

where P_{i,j}^{l} contains all the paths of length l between i and j. The single-feature energy score S(i) is given by:

S(i) = \sum_{j \in V} R_l(i,j) = \sum_{j \in V} A^{l}(i,j), \qquad (4)

Fig. 2. Schematic view of a deep neural network.
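Returning to the feature-selection step, the following is a minimal sketch of InfFS-style feature scoring in Python/NumPy, corresponding to Eqs. (1) through (4). The concrete choices for the dispersion term (the maximum of the two feature standard deviations) and the correlation term (one minus the absolute correlation) follow the original InfFS formulation [17] and are assumptions here, as are the values of alpha and the path length; the adaptive-threshold step of the proposed method is not reproduced.

```python
import numpy as np

def inffs_scores(X, alpha=0.5, path_len=3):
    """InfFS-style feature energy scores (cf. Eqs. (1)-(4)).

    X        : (n_samples, n_features) feature matrix (e.g., min-max scaled).
    alpha    : mixing coefficient between dispersion and correlation terms.
    path_len : path length l used when summing path energies via A^l.
    """
    std = X.std(axis=0)

    # sigma_ij: maximal dispersion of the two features (assumption from [17]).
    sigma = np.maximum.outer(std, std)

    # c_ij: 1 - |corr|, so weakly correlated feature pairs get higher energy.
    c = 1.0 - np.abs(np.corrcoef(X, rowvar=False))

    # Eq. (1): pairwise energies forming the adjacency matrix A.
    A = alpha * sigma + (1.0 - alpha) * c
    np.fill_diagonal(A, 0.0)

    # Eqs. (2)-(4): the sum of energies over all paths of length l from
    # i to j is the (i, j) entry of the matrix power A^l.
    A_l = np.linalg.matrix_power(A, path_len)
    scores = A_l.sum(axis=1)             # S(i) = sum_j A^l(i, j)
    ranking = np.argsort(scores)[::-1]   # highest-energy features first
    return scores, ranking
```

A threshold on these scores (the adaptive threshold shown in Fig. 1) would then decide how many of the top-ranked features are passed on to the classifier.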
The features are common to all five datasets, and therefore the datasets can be combined into one comprehensive database. The combined database is then divided into separate segments for training the DNN and testing the model. This is done using five-fold cross-validation: the database is divided into five equal segments, one segment is kept as the test set, and the remaining four are used for training the neural network. This process is repeated five times, with a different segment used for testing each time and the rest used for training, in order to cross-validate the results. Out of the 1,025 instances, 205 samples are kept as test samples each time and are later used to generate the metrics for the algorithm. Once the design of the neural network is finalized using the above procedure, it is finally tested on the Cleveland database to ensure a fair comparison with other methods, including those in the literature.
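As a rough illustration of this evaluation protocol, the sketch below runs stratified five-fold cross-validation with a small multilayer perceptron in scikit-learn. The use of MLPClassifier, the (8, 8) hidden-layer sizes, and the training hyperparameters are assumptions made for the sketch; the paper's own DNN design and the adaptive InfFS stage are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def cross_validate_mlp(X, y, n_splits=5, seed=0):
    """Five-fold cross-validation of a small MLP; returns per-fold accuracy."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in skf.split(X, y):
        model = make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(8, 8), max_iter=500,
                          random_state=seed),
        )
        model.fit(X[train_idx], y[train_idx])
        accuracies.append(model.score(X[test_idx], y[test_idx]))
    return np.asarray(accuracies)

# Example usage (X, y as NumPy arrays):
# fold_acc = cross_validate_mlp(X, y); print(fold_acc, fold_acc.mean())
```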
The main metrics are accuracy, sensitivity, specificity, precision, and F1-score, which are given by the following equations:

\mathrm{accuracy} = \frac{tp + tn}{tp + tn + fp + fn}, \qquad (9)

\mathrm{sensitivity} = \frac{tp}{tp + fn}, \qquad (10)

\mathrm{specificity} = \frac{tn}{tn + fp}, \qquad (11)

\mathrm{precision} = \frac{tp}{tp + fp}, \qquad (12)

\mathrm{F1} = \frac{tp}{tp + 0.5\,(fp + fn)}, \qquad (13)

where tp is the number of true positives, tn is the number of true negatives, fn is the number of false negatives, and fp is the number of false positives.
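For reference, the sketch below computes these five metrics from a confusion matrix on a held-out fold; it is a direct transcription of Eqs. (9) through (13), using scikit-learn only to obtain the confusion-matrix counts.

```python
from sklearn.metrics import confusion_matrix

def classification_metrics(y_true, y_pred):
    """Compute the metrics of Eqs. (9)-(13) for binary labels (0/1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "f1":          tp / (tp + 0.5 * (fp + fn)),
    }

# Example: classification_metrics([0, 1, 1, 0], [0, 1, 0, 0])
```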
Table I shows the results on the test samples of the five different folds. The values of accuracy, sensitivity, specificity, precision, and F1-score are included in the table. Figure 3 shows the test and training accuracy trends. The algorithm requires approximately 50 epochs to reach the desired weights for the DNN classifier; therefore, there is little need to increase the number of epochs, as doing so might lead to overfitting. As shown in the graph, the training accuracy is close to 86% and stabilizes at this value at approximately epoch 45. The testing accuracy stabilizes at approximately 85% at roughly the same epoch number as the training accuracy. The figure shows the accuracy versus the number of epochs for Fold 5.

Fig. 3. Accuracy versus Epochs.

In addition, the proposed method is compared to five state-of-the-art methods in the field. The performances of these methods are collected from the corresponding papers. For a fair comparison, we have only used the Cleveland dataset, which has also been used by all of the methods for testing and comparison of metrics. The results are presented in Table II, while Figure 4 displays the comparison more vividly. It is observed that the proposed method outperforms the methods of Latha et al. [6], Gokulnath et al. [12], and Rani et al. [15] in terms of accurate predictions. Similarly, the accuracy achieved by the proposed method is approximately equal to that of the method of Amin et al. [13]. However, the method of Bharti et al. [9] shows higher accuracy than the proposed method, so the methods are evaluated further using additional merits such as sensitivity and specificity. Figures 5 and 6 illustrate the comparison of both quantities between the proposed method and the one in [9]. It is noticeable that the proposed method outperforms the method of Bharti et al. [9] in both cases, with higher sensitivity and specificity. Furthermore, the method of Bharti et al. uses three dense layers with 128, 64, and 32 units in layers 1, 2, and 3, respectively. This significantly increases the number of computations and the processing time, and makes the model quite complex. On the other hand, the proposed algorithm uses only 11, 8, and 8 units for the input layer, hidden layer 1, and hidden layer 2, respectively, and therefore has lower time complexity. Considering this, it is safe to say that the proposed algorithm performs better than all of the methods considered in this context.
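To make the complexity argument concrete, the short sketch below counts the trainable parameters of fully connected networks with the two layer configurations discussed above. The single-unit output layer and the 13-attribute input assumed for the model of Bharti et al. are our assumptions for this back-of-the-envelope comparison, not figures reported in either paper.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected network whose layer widths
    are given in order (input, hidden layers..., output)."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Proposed model: 11 inputs, two hidden layers of 8 units, and an assumed
# single output unit for the binary label.
proposed = dense_param_count([11, 8, 8, 1])        # 177 parameters

# Bharti et al. [9]: dense layers of 128, 64, and 32 units; we assume all
# 13 attributes feed the input layer and a single output unit.
bharti = dense_param_count([13, 128, 64, 32, 1])   # 12,161 parameters

print(proposed, bharti)
```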
IV. CONCLUSION

We have presented a new method of heart disease prediction using adaptive infinite feature selection and deep neural networks. The proposed method has shown high prediction accuracy on five datasets, namely Cleveland, Hungary, Switzerland, Statlog, and Long Beach V. The method has been tested on different folds of the data and shows high accuracy in terms of heart disease prediction. In the future, the proposed method can be tested with eigenvector centrality for feature selection, and more neural-network-based classifiers, such as graph neural networks, can be implemented. The work can also be extended to the prediction of other acute and chronic diseases such as anemia, diabetes, and tumors.

ACKNOWLEDGEMENT

This work has been partially supported by a grant provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would also like to thank the Office
TABLE I
Metrics of evaluation on the test set per fold.

TABLE II
Comparison with the state-of-the-art methods on the Cleveland database.
Fig. 5. Comparison of sensitivity with Bharti et al. [9].

REFERENCES

[1] R. Chitra and V. Seenivasagam, "Review of heart disease prediction system using data mining and hybrid intelligent techniques," ICTACT Journal on Soft Computing, vol. 3, no. 04, pp. 605–609, 2013.
[2] S. Modak, L. Y. Taha, and E. Abdel-Raheem, "A novel method of QRS detection using time and amplitude thresholds with statistical false peak elimination," IEEE Access, vol. 9, pp. 46079–46092, 2021.
[3] S. Modak, E. Abdel-Raheem, and L. Y. Taha, "A novel adaptive multilevel thresholding based algorithm for QRS detection," Biomedical Engineering Advances, p. 100016, 2021.
[4] S. Modak, L. Y. Taha, and E. Abdel-Raheem, "Single channel QRS detection using wavelet and median denoising with adaptive multilevel thresholding," in 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2020, pp. 1–6.
[5] Z. Zhang, Q. Yu, Q. Zhang, N. Ning, and J. Li, "A Kalman filtering based adaptive threshold algorithm for QRS complex detection," Biomedical Signal Processing and Control, vol. 58, p. 101827, 2020.
[6] C. B. C. Latha and S. C. Jeeva, "Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques," Informatics in Medicine Unlocked, vol. 16, p. 100203, 2019.
[7] S. Ekız and P. Erdoğmuş, "Comparative study of heart disease classification," in 2017 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT). IEEE, 2017, pp. 1–4.
[8] P. Ramprakash, R. Sarumathi, R. Mowriya, and S. Nithyavishnupriya, "Heart disease prediction using deep neural network," in 2020 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2020, pp. 666–670.
[9] R. Bharti, A. Khamparia, M. Shabaz, G. Dhiman, S. Pande, and P. Singh, "Prediction of heart disease using a combination of machine learning and deep learning," Computational Intelligence and Neuroscience, vol. 2021, 2021.
[10] A. Khemphila and V. Boonjing, "Heart disease classification using neural network and feature selection," in 2011 21st International Conference on Systems Engineering. IEEE, 2011, pp. 406–409.
[11] Y. E. Shao, C.-D. Hou, and C.-C. Chiu, "Hybrid intelligent modeling schemes for heart disease classification," Applied Soft Computing, vol. 14, pp. 47–52, 2014.
[12] C. B. Gokulnath and S. Shantharajah, "An optimized feature selection based on genetic approach and support vector machine for heart disease," Cluster Computing, vol. 22, no. 6, pp. 14777–14787, 2019.
[13] M. S. Amin, Y. K. Chiam, and K. D. Varathan, "Identification of significant features and data mining techniques in predicting heart disease," Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[14] S. Bhatia, P. Prakash, and G. Pillai, "SVM based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features," in Proceedings of the World Congress on Engineering and Computer Science, 2008, pp. 34–38.
[15] P. Rani, R. Kumar, N. M. S. Ahmed, and A. Jain, "A decision support system for heart disease prediction based upon machine learning," Journal of Reliable Intelligent Environments, pp. 1–13, 2021.
[16] T. J. Peter and K. Somasundaram, "An empirical study on prediction of heart disease using classification data mining techniques," in IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012). IEEE, 2012, pp. 514–518.
[17] G. Roffo, S. Melzi, and M. Cristani, "Infinite feature selection," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4202–4210.
[18] D. Dua and C. Graff, "UCI machine learning repository," 2017.