
Heart Disease Prediction Using Adaptive Infinite Feature Selection and Deep Neural Networks


2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC) | DOI: 10.1109/ICAIIC54071.2022.9722652 | ©2022 IEEE

Sudipta Modak, Member, IEEE, Esam Abdel-Raheem, Senior Member, IEEE, and Luis Rueda, Senior Member, IEEE
Department of Electrical and Computer Engineering
School of Computer Science
University of Windsor
401 Sunset Ave, Windsor, ON N9B 3P4, Canada
E-mail: {modak, lrueda, eraheem}@uwindsor.ca

Abstract—Prediction of heart disease is one of the most important fields of study in modern science. By studying data such as cholesterol levels, blood sugar, and blood pressure, heart disease can be predicted. In recent years, several machine learning techniques have been used to aid in fast prediction by learning from the data. However, the prediction accuracy still remains low, due to the low number of records contained in the available databases. In this paper, we propose a new method of heart disease prediction using a modified variation of infinite feature selection and a multilayer perceptron. The method shows a high accuracy of 87.70%, a high F1-score of 87.21%, a high sensitivity of 88.50%, a high specificity of 87.02%, and a high precision of 86.05% on the Cleveland, Hungarian, Switzerland, Long Beach, and Statlog datasets. For evaluation purposes, we have combined all the datasets together and then divided the combined dataset into training and test samples, with 20% of the samples allocated for testing.

Index Terms—Heart disease prediction, Infinite feature selection, Multilayer perceptron, Neural networks

I. INTRODUCTION

Automatic disease detection, classification, and prediction have been important areas of research for several decades. For this purpose, several algorithms have been developed to aid doctors with accurate predictions of several types of diseases. One such field is that of heart disease, which is one of the biggest causes of death in the modern world [1]. By studying the patterns of electrocardiogram (ECG) signals and correlating them to existing data, common anomalies in the heart can be identified. Several techniques for the detection of the QRS complex exist with high accuracy, such as the works in [2]–[5]. However, the prediction of heart disease remains quite challenging due to the low number of records contained in the available databases.

By learning from the data available, machine learning models can predict these diseases at early stages. The attributes taken into account for such techniques can be obtained from an individual's body, such as the electrocardiogram, blood pressure, sugar levels, age, sex, cholesterol levels, etc. [6]. However, there can be redundant features present in the datasets. These redundant features make predictions inaccurate and use up precious memory and time. A significantly large amount of data has been collected by the healthcare industry from previous cases of heart-related disease from patients all over the world [1]. These datasets contain hidden information that is directly related to the condition of the heart and needs to be identified. Due to the presence of such a huge quantity of data, it is impossible to manually analyze it and create methods for prediction [6]. Therefore, machine learning techniques are required to deal with such data and predict diseases at early stages.

The work of [7] compares six different types of algorithms, including Linear, Quadratic, Cubic, and Medium Gaussian support vector machines (SVM), as well as Decision Tree and Ensemble Subspace Discriminant, for prediction accuracy. Deep learning has been used in the work of [8], where different combinations of the number of hidden layers and the number of epochs have been tested to learn which combination produces the best prediction accuracy. Heart disease prediction using artificial neural networks can be found in [9]. This method uses six different classifiers to test the data and employs deep neural networks (DNN) for classification to achieve high accuracy in prediction. Similar algorithms can be found in [10], [11].

Feature selection techniques for heart disease prediction have been used in the works of [12]–[14]. In [12], an optimized genetic algorithm was combined with an SVM to achieve good prediction accuracy, while in [13], a brute-force approach was used to select relevant features. That technique takes a small subset of features with at least three features and evaluates the combination of such features on several classifiers such as Logistic Regression (LR), k-nearest neighbour, decision tree, Naive Bayes, SVM, neural network, and vote. An integer-coded genetic algorithm has been used to select features as well [14]. That method aids the SVM-based prediction of heart disease and improves accuracy. A combination of a genetic algorithm and recursive feature elimination followed by a random forest classifier for better prediction has been presented in [15]. The work of [15] uses the genetic algorithm and recursive feature elimination to select relevant features from the datasets, along with different classifiers for classification. Based on that work, the random forest classifier provides the best results when combined with the hybrid feature selection technique used in [15]. The work of [16] investigates the performance of several classifiers such as
decision trees, Naive Bayes, k-nearest neighbor, and neural networks on heart disease prediction by varying the number of features provided as inputs. According to [16], the Naive Bayes classifier works best if the number of features is low.

This paper presents a new technique for the prediction of heart disease using an advanced version of Infinite Feature Selection (InfFS) [17] and deep neural networks (DNN). The method is evaluated on five different datasets, namely, Cleveland, Hungary, Switzerland, Statlog, and Long Beach V [18]. A comparison with state-of-the-art methods in the field is included in this context as well.

II. MATERIALS AND METHODS

The proposed method involves four stages: preprocessing, which reads the raw data and converts it into usable quantities; a feature selection stage that selects a suitable subset of features and eliminates the redundant ones; a deep learning stage that is used to learn from the training datasets; and finally, a prediction stage that predicts the outcomes from the test dataset. The block diagram for the entire process is summarized in Fig. 1.

Fig. 1. Block diagram for the proposed method (data → preprocessing → rank and energy generation by InfFS → adaptive threshold → feature selection → training and test sets → deep learning model → prediction).
A. Dataset Description

The datasets used in the experiments contain 1,190 records from five distinct databases. Each record has 14 distinct features, of which eight are categorical and six are numeric. The features are age, sex, chest pain type (cp), resting blood pressure (trestbps), serum cholesterol (chol), fasting blood sugar greater than 120 mg/dl (fbs), resting electrocardiographic results (restecg), maximum heart rate achieved (thalach), exercise-induced angina (exang), ST depression induced by exercise relative to rest (oldpeak), the slope of the peak exercise ST segment (slope), number of major vessels colored by fluoroscopy (ca), thallium scan (thal), and the class attribute (num). Out of these 14 features, 13 are taken as inputs to the proposed algorithm and the class attribute (num) is taken as the output. The aim is to design an algorithm that takes the 13 attributes and predicts the output. Since the data is labeled, the algorithm is regarded as a supervised learning algorithm.

B. Pre-processing

The algorithm begins by combining all five datasets into one complete dataset. Any record with missing values was eliminated from across the five databases, and therefore only 1,025 records out of 1,190 were used in this work. The reasoning is twofold: a neural network learns better when it is fed more data, while missing values hamper the learning process by creating ambiguity in the learning mechanism. The output is also changed to a binary output: anyone with no disease is regarded as a '0', and an individual with heart disease is regarded as a '1'. Originally, the data had classes from 0 to 3, which indicated the type of heart disease present in the individual, with '0' being the absence of any heart disease and '1', '2', and '3' being the presence of three different types of disease. For our work, we have considered only the binary case of not having any disease ('0') versus the presence of disease ('1').
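As a rough illustration of this step, the following sketch combines the five tables, removes incomplete records, and binarizes the class attribute. This is a minimal sketch assuming the datasets are available as CSV files sharing the 14 columns named above; the file names are hypothetical.

```python
import pandas as pd

# Hypothetical file names; the five tables are assumed to share
# the same 14 columns described in Section II-A.
files = ["cleveland.csv", "hungary.csv", "switzerland.csv",
         "long_beach_va.csv", "statlog.csv"]

# Combine all five datasets into one complete dataset.
data = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Eliminate any record with missing values (1,190 -> 1,025 records).
data = data.dropna()

# Binarize the output: 0 = no disease; classes 1-3 collapse to 1.
data["num"] = (data["num"] > 0).astype(int)

X = data.drop(columns=["num"]).to_numpy()  # the 13 input attributes
y = data["num"].to_numpy()                 # binary class attribute
```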
C. Feature Selection

A dataset might have hundreds of features, many of which can be uncorrelated with the output. The main objective of feature selection (FS) algorithms is to pick out a subset of input variables that directly influence the outcome while reducing the noise and filtering out the unwanted variables. For each dataset, a feature selection algorithm needs a particular selection criterion that can evaluate the applicability of each feature to the output classes. Once this measure is calibrated, the irrelevant features are identified one by one and eliminated if they do not satisfy the conditions being imposed.

In our research, we have employed an improved version of InfFS to select a distinct subset of features from the dataset. The method was initially developed in [17]; it maps the features onto an affinity graph as nodes and then connects them. It then considers moving from feature to feature and, by doing so, creates a path by selecting several features as a subset of the original list of features. Given a list of features, F, the algorithm considers two aspects of the features: the vertices, V, and the edges, E. V contains a set of values that represent a feature distribution for each of the values in F, while E represents the relations between two features in the feature distribution space [17]. An adjacency matrix A is formulated, containing all the pairwise energies, that is, the maximal feature dispersion and the correlation between two features [17]. Once all the features are mapped onto the graph based on their weights in A, several pathways with more than two features are selected at each iteration. These features are all connected nodes, and the energies of each of these paths are calculated as follows:

$a_{i,j} = \alpha \sigma_{i,j} + (1 - \alpha) c_{i,j}$,  (1)

$\xi_\gamma = \prod_{k=0}^{l-1} a_{v_k, v_{k+1}}$,  (2)

where $\alpha$ is a loading coefficient, $\sigma_{i,j}$ is the maximal standard deviation of features $i$ and $j$, $c_{i,j}$ is their correlation, and $\xi_\gamma$ accounts for the pairwise energies of all the feature pairs that compose the path $\gamma$ [17]. Here, $l$ is the length of the path, $i$ and $j$ are the indices of the features, and $a_{v_k,v_{k+1}}$ is the pairwise energy between consecutive features on the path. The cycles are recorded in $R_l$, computed as follows:

$R_l(i,j) = \sum_{\gamma \in P_{(i,j)}^{l}} \xi_\gamma = A^l(i,j)$,  (3)

where $P_{(i,j)}^{l}$ contains all the paths of length $l$ between $i$ and $j$. The single-feature energy score $S(i)$ is given by:

$S(i) = \sum_{j \in V} R_l(i,j) = \sum_{j \in V} A^l(i,j)$.  (4)

The vector $S$ stores the feature energies individually. The feature energies are then arranged in decreasing order, with the feature having the highest energy first, and stored in vector $M$.

Once all the feature rank energies are calculated, the algorithm automatically selects the number of features to keep, and hence some features are deemed redundant and are eliminated. A vector $B$ is formulated, which stores the squares of the individual rank energies. Equation (5) shows how each element in $B$ is calculated:

$B(k) = M(k)^2$.  (5)

Here, $k$ is the element number in both $B$ and $M$. Finally, a threshold $T$ is used to decide how many features to keep for the classification step. This threshold is calculated as follows:

$T = C \sum_{k=1}^{n} B(k)$,  (6)

where $n$ is the number of elements in $B$ and $C$ is a constant, taken as 0.325 for the entire dataset.

If the individual energy of a feature exceeds the threshold $T$, then that feature is kept; otherwise it is discarded. The remaining features are arranged in the order of their energy values in vector $M$, and the number of features kept is denoted by $N$.
D. Classification

Artificial neural networks (ANN) are modeled on the neurons in a human brain [9]. Neural networks consist of several components such as neurons, synapses, weights, and biases. Deep neural networks (DNN) are an extension of ANN in which multiple hidden layers are included to maximize the accuracy of decisions. These networks are feedforward networks, where the data always propagates in the forward direction, that is, from the input toward the output, and does not loop backward. Figure 2 shows an example of a DNN with one input layer, one output layer, and two connected hidden layers.

Fig. 2. Schematic view of a deep neural network.

For our research, we have used two hidden layers in the deep learning platform. The learning algorithm takes the N inputs from the FS stage and formulates the same number of neurons in the input stage. The training dataset is then fed to the neural network in batches of size 32. The activation function used for the input stage is the rectified linear unit (ReLU), a piecewise linear function that transfers the input directly to the output if it is positive and passes zero to the output otherwise. The hidden layers consist of eight neurons each, and the activation function of the first hidden layer is also kept as ReLU. The second hidden layer is initialized with the sigmoid activation function, a logistic function represented by the following equation:

$\phi(x) = \frac{1}{1 + e^{-x}}$,  (7)

where $e$ is Euler's number and $x$ is the input to the function. The dropout rate for all three layers, that is, the input layer and the two hidden layers, is 0.25 each.

The activation function used for the output layer is softmax, a normalized exponential function that provides the probabilities of obtaining a '0' and a '1' as the output. Equation (8) represents the softmax function:

$\gamma(\vec{z})_l = \frac{e^{z_l}}{\sum_{q=1}^{P} e^{z_q}}$.  (8)

Here, $P$ is the number of classes in the multi-class classifier, $\vec{z}$ is the input vector, and $e^{z_l}$ is the standard exponential function applied to its $l$-th element, so that the outputs are normalized into class probabilities.
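A minimal Keras sketch of this architecture is given below. It follows our reading of the description above, with the 11, 8, and 8 units reported in Section III; the optimizer and loss function are assumptions, since the paper does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_inputs=11):
    """DNN sketch: ReLU input stage, ReLU and sigmoid hidden layers
    (8 units each), dropout 0.25 after each stage, softmax output."""
    model = models.Sequential([
        layers.Input(shape=(n_inputs,)),
        layers.Dense(n_inputs, activation="relu"),  # input stage
        layers.Dropout(0.25),
        layers.Dense(8, activation="relu"),         # hidden layer 1
        layers.Dropout(0.25),
        layers.Dense(8, activation="sigmoid"),      # hidden layer 2
        layers.Dropout(0.25),
        layers.Dense(2, activation="softmax"),      # class probabilities
    ])
    # Optimizer and loss are assumptions, not stated in the paper.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described: batches of 32, roughly 50 epochs.
# model = build_model()
# model.fit(X_train, y_train, batch_size=32, epochs=50)
```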

III. EXPERIMENT AND RESULTS

First, the datasets are combined into one database, and then the combined database is divided into training and test datasets. The reason for this is that the datasets all come from the same repository, the UCI repository, and according to other methods such as the one in [15], only the fourteen features mentioned in the dataset description of this paper are selected for training and testing. These fourteen features are common to all five datasets, and therefore they can be combined into one comprehensive database. The combined database is then divided into separate segments for training the DNN and testing the model. This is done using five-fold cross-validation: the database is divided into five equal segments; one segment is kept as the test set and the remaining four are used for training the neural networks. This process is repeated five times, with a separate segment used for testing each time and the rest used for training, to cross-validate the results. Out of the 1,025 instances, 205 samples are kept each time as the test samples, which are later used to generate the metrics for the algorithm.
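This protocol is a standard five-fold split; a sketch using scikit-learn is shown below, where X and y are the pre-processed arrays from Section II-B and build_model is the hypothetical constructor sketched in Section II-D.

```python
import numpy as np
from sklearn.model_selection import KFold

# Five equal segments; each fold holds out ~205 of the 1,025 records.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_accuracies = []
for train_idx, test_idx in kfold.split(X):
    model = build_model(n_inputs=X.shape[1])
    model.fit(X[train_idx], y[train_idx],
              batch_size=32, epochs=50, verbose=0)
    _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
    fold_accuracies.append(acc)

print("Average accuracy:", np.mean(fold_accuracies))
```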
Once the design of the neural network is finalized using the above procedure, it is finally tested on the Cleveland database to ensure a fair comparison with other methods, including those in the literature. The main metrics are accuracy, sensitivity, specificity, precision, and F1-score, which are represented by the following equations:

$accuracy = \frac{tp + tn}{tp + tn + fp + fn}$,  (9)

$sensitivity = \frac{tp}{tp + fn}$,  (10)

$specificity = \frac{tn}{tn + fp}$,  (11)

$precision = \frac{tp}{tp + fp}$,  (12)

$F1 = \frac{tp}{tp + 0.5(fp + fn)}$,  (13)

where $tp$ is the number of true positives, $tn$ is the number of true negatives, $fn$ is the number of false negatives, and $fp$ is the number of false positives.
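These five quantities follow directly from the four confusion-matrix counts; a small helper along these lines (names are our own) reproduces Eqs. (9)–(13):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five evaluation metrics of Eqs. (9)-(13)."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # Eq. (9)
        "sensitivity": tp / (tp + fn),                   # Eq. (10)
        "specificity": tn / (tn + fp),                   # Eq. (11)
        "precision":   tp / (tp + fp),                   # Eq. (12)
        "f1":          tp / (tp + 0.5 * (fp + fn)),      # Eq. (13)
    }

# Example with arbitrary counts:
# print(classification_metrics(tp=88, tn=90, fp=13, fn=14))
```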
Table I shows the results on the test samples of the five different folds. The values of accuracy, sensitivity, specificity, precision, and F1-score are included in the table. Figure 3 shows the test and training accuracy trends. The algorithm requires approximately 50 epochs to reach the desired weights for the DNN classifier; there is therefore little need to increase the number of epochs, as doing so might lead to overfitting. As shown in the graph, the training accuracy is close to 86% and stabilizes at this value at approximately epoch 45. The testing accuracy stabilizes at approximately 85%, at roughly the same epoch number as the training accuracy. The figure shows the accuracy versus the number of epochs for Fold 5.

Fig. 3. Accuracy versus Epochs.

In addition, the proposed method is compared to five state-of-the-art methods in the field. The performances of these methods are collected from the papers mentioned in the literature. For a fair comparison, we have only used the Cleveland dataset, which has also been used by all methods for testing and comparison of metrics. The results are presented in Table II, while Figure 4 displays the comparison more vividly. It is observed that the proposed method outperforms the methods of Latha et al. [6], Gokulnath et al. [12], and Rani et al. [15] in terms of accurate predictions. Similarly, the accuracy achieved by the proposed method is approximately equal to that of the method of Amin et al. [13]. However, the method of Bharti et al. [9] shows higher accuracy than the proposed method. That method is therefore evaluated further using merits such as sensitivity and specificity. Figures 5 and 6 illustrate the comparison of both quantities between the proposed method and the one in [9]. It is noticeable that the proposed method outperforms the method of Bharti et al. [9] in both cases, with higher sensitivity and specificity. Furthermore, the method of Bharti et al. uses three dense layers with 128, 64, and 32 units in layers 1, 2, and 3, respectively. This significantly increases the number of computations and the processing time, and makes the model quite complex. On the other hand, the proposed algorithm uses only 11, 8, and 8 units for the input, hidden layer 1, and hidden layer 2, respectively, and so has lower time complexity. Considering this, it is safe to say that the proposed algorithm performs better than all methods included in this context.

IV. CONCLUSION

We have presented a new method of heart disease prediction using adaptive infinite feature selection and deep neural networks. The proposed method has shown high prediction accuracy on five datasets, namely Cleveland, Hungary, Switzerland, Statlog, and Long Beach V. It has been tested on different folds of data and shows high accuracy in the prediction of heart disease. In the future, the proposed method can be tested with eigenvector centrality for feature selection, and more neural network-based classifiers, such as graph neural networks, can be implemented. The work can also be extended to the prediction of more acute and chronic diseases such as anemia, diabetes, and tumors.

ACKNOWLEDGEMENT

This work has been partially supported by a grant provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would also like to thank the Office of Research and Innovation Services of the University of Windsor.

TABLE I
METRICS OF EVALUATION ON THE TEST SET PER FOLD

                  Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Average
Accuracy (%)       90.24    88.29    90.24    83.41    84.91    87.70
Sensitivity (%)    92.63    84.69    92.55    84.38    88.24    88.50
Specificity (%)    88.18    91.59    88.29    82.57    84.47    87.02
Precision (%)      87.13    90.22    87.00    81.00    84.91    86.05
F1-Score (%)       89.80    87.37    89.69    82.65    86.54    87.21

TABLE II
COMPARISON WITH THE STATE-OF-THE-ART METHODS ON THE CLEVELAND DATABASE

Method                         Year   Accuracy (%)
M. S. Amin et al. [13]         2019   87.41
C. B. C. Latha et al. [6]      2019   85.48
R. Bharti et al. [9]           2021   94.20
C. B. Gokulnath et al. [12]    2019   84.40
P. Rani et al. [15]            2021   86.60
Proposed Method                2021   87.13

Fig. 4. Comparison of accuracy with state-of-the-art methods.

Fig. 5. Comparison of sensitivity with Bharti et al. [9].
Fig. 6. Comparison of specificity with Bharti et al. [9].


REFERENCES

[1] R. Chitra and V. Seenivasagam, "Review of heart disease prediction system using data mining and hybrid intelligent techniques," ICTACT Journal on Soft Computing, vol. 3, no. 4, pp. 605–609, 2013.
[2] S. Modak, L. Y. Taha, and E. Abdel-Raheem, "A novel method of QRS detection using time and amplitude thresholds with statistical false peak elimination," IEEE Access, vol. 9, pp. 46079–46092, 2021.
[3] S. Modak, E. Abdel-Raheem, and L. Y. Taha, "A novel adaptive multilevel thresholding based algorithm for QRS detection," Biomedical Engineering Advances, p. 100016, 2021.
[4] S. Modak, L. Y. Taha, and E. Abdel-Raheem, "Single channel QRS detection using wavelet and median denoising with adaptive multilevel thresholding," in 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2020, pp. 1–6.
[5] Z. Zhang, Q. Yu, Q. Zhang, N. Ning, and J. Li, "A Kalman filtering based adaptive threshold algorithm for QRS complex detection," Biomedical Signal Processing and Control, vol. 58, p. 101827, 2020.
[6] C. B. C. Latha and S. C. Jeeva, "Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques," Informatics in Medicine Unlocked, vol. 16, p. 100203, 2019.
[7] S. Ekız and P. Erdoğmuş, "Comparative study of heart disease classification," in 2017 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT). IEEE, 2017, pp. 1–4.
[8] P. Ramprakash, R. Sarumathi, R. Mowriya, and S. Nithyavishnupriya, "Heart disease prediction using deep neural network," in 2020 International Conference on Inventive Computation Technologies (ICICT). IEEE, 2020, pp. 666–670.
[9] R. Bharti, A. Khamparia, M. Shabaz, G. Dhiman, S. Pande, and P. Singh, "Prediction of heart disease using a combination of machine learning and deep learning," Computational Intelligence and Neuroscience, vol. 2021, 2021.
[10] A. Khemphila and V. Boonjing, "Heart disease classification using neural network and feature selection," in 2011 21st International Conference on Systems Engineering. IEEE, 2011, pp. 406–409.
[11] Y. E. Shao, C.-D. Hou, and C.-C. Chiu, "Hybrid intelligent modeling schemes for heart disease classification," Applied Soft Computing, vol. 14, pp. 47–52, 2014.
[12] C. B. Gokulnath and S. Shantharajah, "An optimized feature selection based on genetic approach and support vector machine for heart disease," Cluster Computing, vol. 22, no. 6, pp. 14777–14787, 2019.
[13] M. S. Amin, Y. K. Chiam, and K. D. Varathan, "Identification of significant features and data mining techniques in predicting heart disease," Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[14] S. Bhatia, P. Prakash, and G. Pillai, "SVM based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features," in Proceedings of the World Congress on Engineering and Computer Science, 2008, pp. 34–38.
[15] P. Rani, R. Kumar, N. M. S. Ahmed, and A. Jain, "A decision support system for heart disease prediction based upon machine learning," Journal of Reliable Intelligent Environments, pp. 1–13, 2021.
[16] T. J. Peter and K. Somasundaram, "An empirical study on prediction of heart disease using classification data mining techniques," in IEEE International Conference on Advances in Engineering, Science and Management (ICAESM-2012). IEEE, 2012, pp. 514–518.
[17] G. Roffo, S. Melzi, and M. Cristani, "Infinite feature selection," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4202–4210.
[18] D. Dua and C. Graff, "UCI Machine Learning Repository," 2017.
