

Hindawi

Journal of Healthcare Engineering


Volume 2021, Article ID 6260022, 9 pages
https://doi.org/10.1155/2021/6260022

Research Article
Heart Disease Prediction Based on the Embedded Feature
Selection Method and Deep Neural Network

Dengqing Zhang,1,2 Yunyi Chen,3 Yuxuan Chen,3 Shengyi Ye,1,2 Wenyu Cai,1,2
Junxue Jiang,1,2 Yechuan Xu,1,2 Gongfeng Zheng,1,2 and Ming Chen1,4

1 Jinjiang Hospital Affiliated to Fujian Medical University, Fujian, Jinjiang 362200, China
2 Department of Cardiology, Jinjiang Hospital of Fujian Province, Fujian, Jinjiang 362200, China
3 School of Informatics Xiamen University, Xiamen University, Fujian, Xiamen 361000, China
4 Department of Public Health, Jinjiang Hospital of Fujian Province, Fujian, Jinjiang 362200, China

Correspondence should be addressed to Ming Chen.

Received 21 August 2021; Revised 7 September 2021; Accepted 16 September 2021; Published 29 September 2021

Academic Editor: Gu Xiaoqing

Copyright © 2021 Dengqing Zhang et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
In recent decades, heart disease has seriously threatened people's health because of its prevalence and high risk of death. Therefore, predicting heart disease at an early stage from simple physical indicators obtained during regular physical examinations has become a valuable subject. Clinically, it is essential to be sensitive to the indicators related to heart disease in order to make predictions and provide a reliable basis for further diagnosis. However, the large amount of data makes manual analysis and prediction taxing and arduous. Our research aims to predict heart disease both accurately and quickly from various indicators of the body. In this paper, we propose a novel heart disease prediction algorithm that combines an embedded feature selection method with a deep neural network. The embedded feature selection method is based on the LinearSVC algorithm and uses the L1 norm as a penalty term to choose a subset of features significantly associated with heart disease. These features are fed into the deep neural network we built. The weights of the network are initialized with the He initializer to prevent vanishing or exploding gradients, so that the predictor performs better. Our model is tested on the heart disease dataset obtained from Kaggle. Accuracy, recall, precision, and F1-score are calculated to evaluate the predictor: our model achieves 98.56%, 99.35%, 97.84%, and 0.983, respectively, and its average AUC score reaches 0.983, confirming that the proposed method is efficient and reliable for predicting heart disease.

1. Introduction

Heart disease is a common fatal disease and is currently the number one killer of the global population. According to the World Health Organization report [1], cardiovascular disease kills 17.9 million people every year, accounting for about 32% of the world's deaths. The report also stated that heart disease and stroke are the leading causes of cardiovascular death, accounting for approximately 85% of these deaths. As a circulatory system disease, cardiovascular disease is caused by many factors, such as high blood pressure, smoking, diabetes, and lack of exercise.
So far, methods to reduce deaths from heart disease have always been a focus of research, and studies have shown that about 90% of heart disease can be prevented [2]. In addition, heart disease can be detected, treated, and recovered from early; therefore, early detection is the key to treatment. To obtain a patient's cardiovascular status, the hospital needs to collect specific physical values, such as resting blood pressure, blood sugar, cholesterol, maximum heart rate, chest pain type, and electrocardiogram. However, traditional manual analysis of large volumes of heart disease-related data is time-consuming and prone to misdiagnosis. Artificial intelligence is widely used in prediction to solve this problem, with machine learning (ML) and deep learning (DL) being the most common approaches. These prediction models analyze a large amount of medical data to determine whether
a patient has the disease, and they obtain more accurate prediction results than manual diagnosis.
This study combines machine learning with deep learning, applying LinearSVC and DNN techniques to propose a new heart disease prediction model. The LinearSVC algorithm is applied in the feature selection module after data preprocessing. In this module, Lasso (L1) regularization serves as the penalty term that generates a sparse weight matrix, filters out a subset of features closely related to heart disease, and provides more reliable input for the DNN. Furthermore, we compare several widely used weight initializers and finally choose the He initialization method since it provides the best initial weights for the network. According to the results, the proposed model achieves an accuracy, recall, and precision of 98.56%, 99.35%, and 97.84%, respectively.
The paper is structured as follows: Section 2 reviews previous research on heart disease prediction; Section 3 introduces the dataset we use and analyzes the methods of our proposed model; Section 4 presents the detailed results of this research and the comparison with other algorithms; and finally, Section 5 concludes the paper.

2. Literature Review

Researchers have applied various data mining techniques to heart disease prediction. Amin et al. [3] used the UCI Cleveland database to identify important features and mining techniques and finally used the UCI Statlog dataset for evaluation and verification. The study identified 9 salient features among the 13 features of the original dataset and compared three data mining techniques: Vote, Naïve Bayes (NB), and Support Vector Machine (SVM). Among them, Vote performed best, with an accuracy of 87.41%. Similarly, Nalluri et al. [4] compared the performance of the XGBoost algorithm and the logistic regression (LR) method in predicting Chronic Heart Disease (CHD). The results show that LR reaches an accuracy of 85.86%, better than XGBoost at 84.46%. Louridi et al. [5] used the UCI machine learning repository and compared three methods: SVM, k-Nearest Neighbor (kNN), and NB. Their experiments show that the SVM with a linear kernel performs best, with an accuracy of 86.8%. Shah et al. [6] used an existing dataset from the Cleveland database of the UCI Cardiology Patient Repository and considered 14 attributes. They compared four ML algorithms, namely, kNN, NB, decision tree (DT), and random forest (RF). The experimental results showed that kNN classification performs best (90.78%).
Decision trees also play an essential role in heart disease prediction. Kumar et al. [7] evaluated and analyzed three methods, NB, SVM, and DT, which achieved 81.58%, 61.26%, and 90.79% accuracy, respectively, so DT outperformed the others. In similar research, Pires et al. [8] compared a range of machine learning methods: neural networks, kNN, DT, SVM, the CN2 rule inducer, and Stochastic Gradient Descent (SGD). Across different numbers of cross-validation folds, DT and SVM achieved the best accuracy under 10-fold and 20-fold cross-validation (87.69%), and SGD achieved the best accuracy under 5-fold cross-validation (87.69%). There are also studies using mixed models to make predictions. Kavitha et al. [9] applied RF, DT, and a mixed model (RF + DT) to the UCI Cleveland dataset. The final result showed that the mixed model performs best, with an accuracy of 88.7%.
In addition, researchers have proposed many new predictive models. Spencer et al. [10] combined feature selection with ML algorithms; their model, which pairs chi-square feature selection with the BayesNet algorithm, achieved an accuracy of 85%. Khan [11] proposed an IoT framework based on an improved deep convolutional neural network. The framework is attached to a wearable device that monitors the patient's blood pressure and electrocardiogram (ECG). Compared with an existing deep learning neural network and LR, this method performs better (98.2%). Mohan et al. [12] combined RF with a linear method (LM) and proposed a hybrid random forest with a linear model (HRFLM). Experiments on the UCI Cleveland dataset showed that its classification accuracy reached 88.7%, which is better than other classification methods. Magesh and Swarnalatha [13] adopted a cluster-based DT learning (CDTL) method. After feature processing, the CDTL-RF prediction accuracy reaches 89.30%, an improvement of 12.60% over the non-CDTL method. Mehmood et al. [14] proposed a method called CardioHelp, which uses a convolutional neural network (CNN) with temporal modeling to predict heart failure (HF) at the earliest stage. Compared with other state-of-the-art methods, this method achieves the best performance, with an accuracy of 97%.

3. Materials and Methods

Data preprocessing, feature selection, and classification are the three most crucial parts of the heart disease prediction model. We carry out outlier processing and standardization on the dataset so that all the data are well structured after preprocessing. A feature selection process based on the LinearSVC algorithm is then applied to choose valuable features. The selected feature subset is divided into a training set and a test set at a ratio of 3 : 1; the former is fed into the deep neural network we built. Specifically, our deep neural network uses the he_normal initializer to construct good initial weights that prevent gradients from exploding or vanishing. Finally, the effectiveness of the model is measured on the test samples. The structure of our heart disease prediction model is shown in Figure 1.

Figure 1: Proposed heart disease prediction system structure (heart disease dataset; data preprocessing: outlier removal, data standardization, feature selection; input split into train, valid, and test sets; deep learning training; DNN predictor; prediction results; results evaluation).

3.1. Data Collection. There are many databases related to heart disease, such as the Cleveland database and the heart disease database provided by the National Cardiovascular Disease Surveillance System. This paper uses a widely used heart disease dataset from Kaggle [15], composed of four databases: Cleveland, Hungary, Switzerland, and the VA Long Beach. The dataset has 14 attributes, and each attribute
has a value in every record. It contains 1025 patient records of different ages, of which 713 are male and 312 are female. This dataset is a subset of [16]. The original dataset contains 76 attributes, but most scholars use only 14 of them, since the other attributes, such as the time of the exercise ECG reading and the exercise protocol, have little effect on heart disease. The attributes of this dataset are described in Table 1.

Table 1: Description of features.
SN   Attribute   Description
1    age         Age in years
2    sex         1 = male; 0 = female
3    cp          Chest pain type
4    trestbps    Resting blood pressure (in mm Hg on admission to the hospital)
5    chol        Serum cholesterol in mg/dl
6    fbs         Fasting blood sugar >120 mg/dl (1 = true; 0 = false)
7    restecg     Resting electrocardiographic results
8    thalach     Maximum heart rate achieved
9    exang       Exercise-induced angina (1 = yes; 0 = no)
10   oldpeak     ST depression induced by exercise relative to rest
11   slope       The slope of the peak exercise ST segment
12   ca          Number of major vessels (0–3) colored by fluoroscopy
13   thal        1 = normal; 2 = fixed defect; 3 = reversible defect
14   target      1 = disease; 0 = no disease

3.2. Data Preprocessing. We use the heart disease dataset publicly available on Kaggle [15]. To ensure the stability and accuracy of the prediction model, it is essential to analyze and preprocess the data before inputting them into the deep neural network. Data preprocessing has two main parts: outlier removal and data standardization.

3.2.1. Outlier Removal Process. Well-processed and structured data largely determine the effectiveness of the model. Raw datasets commonly contain unreasonable values that are inconsistent with the rest of the data; these abnormal values are called outliers. We analyze the heart disease dataset and apply the interquartile range (IQR) method to detect and remove outliers. It is worth mentioning that the physical indicators of healthy people usually lie in a similar range, and abnormal values of specific biological indicators may reflect disease. Therefore, the heart disease prediction model should retain some outliers as warning signals instead of removing all of them indiscriminately. This paper applies the IQR method only to the outliers of the chol and trestbps columns, since these two columns are generally normally distributed, but their boxplots show apparent abnormalities that deviate from the normal range. IQR is a technique used to help detect outliers in data. It defines the difference between the third quartile and the first quartile as the IQR [17], and the lower and upper boundaries are then calculated by the following equations:

$$
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1,\\
B_l &= Q_1 - 1.5 \times \mathrm{IQR},\\
B_u &= Q_3 + 1.5 \times \mathrm{IQR}.
\end{aligned}
\tag{1}
$$

Values outside the range $[B_l, B_u]$ are recognized as outliers and are filtered out of the dataset. Figure 2 shows the boxplots before and after outlier processing.

Figure 2: Boxplots of chol and trestbps before and after outlier removal using IQR. (a) Raw data. (b) Results of outlier removal.

3.2.2. Data Standardization Process. Data standardization aims to eliminate scale differences between features so that subsequent models can learn weights without being dominated by features with large ranges. Networks trained on standardized data usually produce better results [18]. Standardization rescales each feature to zero mean and unit variance without changing the shape of its distribution. We use the StandardScaler method to standardize the data, since the outliers were removed in the previous step and our data roughly obey a normal distribution. The conversion equation is as follows:

$$
x^{*} = \frac{x - \mu}{\sigma}, \tag{2}
$$

where $\mu$ is the mean of the training samples (or zero if with_mean = False) and $\sigma$ is the standard deviation of the training samples (or one if with_std = False).
Standardization calculates the mean and variance of the data and transforms the data with them. Because our data are roughly normal, the standardized features approximately follow a standard normal distribution, which suits the network that follows.

3.3. Feature Selection Based on an Embedded Method. Irrelevant features often disturb the model's training process, and some noise features even lead the model off the correct track. Feature selection chooses a subset of variables that can effectively describe the input data and ensure good prediction results [19]. Feature selection methods are applied to reduce the influence of noise or irrelevant variables; they can be roughly grouped into filter methods, wrapper methods, and embedded methods. However, the feature subset selected by a filter method has high redundancy, and a wrapper method has high computational complexity because evaluating different feature subsets requires retraining and testing, while an embedded method can efficiently select a subset with better performance. The dataset used in this paper already contains only 14 of the 76 attributes of the original dataset. We use a penalty-based embedded feature selection method to verify these chosen features and to pick the most relevant ones among them. Embedded feature selection integrates the feature selection process with the model training process: instead of being carried out as separate steps, the two are completed in the same optimization process. The machine learning algorithm is trained and obtains a weight coefficient for each feature; these coefficients often represent the importance of the features to the model, and the evaluation module then selects the most contributing features according to the values of the weight coefficients. The embedded feature selection method thus relies on model evaluation to complete feature selection.
Our embedded feature selection is based on the LinearSVC algorithm, which is applicable to this binary classification problem. We use L1 norm regularization [20] as a penalty term because it is robust and makes the coefficients sparse. In its regression form, the L1-regularized loss function is called Lasso regression. It is expressed as follows:

$$
\min_{w} \sum_{i=1}^{N} \left(w^{T} x_i - y_i\right)^{2} + \lambda \lVert w \rVert_1, \tag{3}
$$

where $w$ is the vector of feature coefficients, $x_i$ is the feature vector of the $i$-th sample, $y_i$ is its target value, $N$ is the number of samples, and $\lambda$ is the regularization strength. LinearSVC itself combines the L1 penalty with a squared hinge loss rather than the squared error of equation (3); the sparsity argument below is the same for both.
The regularization restricts the coefficients, and L1 regularization yields a sparse coefficient vector. A feature with a coefficient of 0 can be regarded as inconsequential to the model, so removing it will not affect the model's effectiveness. Therefore, we can concentrate on the features with nonzero coefficients to achieve feature selection. Through feature selection, we reduce the number of features and select the most reliable feature subset.
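The preprocessing and selection steps of Sections 3.2 and 3.3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: quartiles are linearly interpolated (statistics.quantiles with method="inclusive"), $\sigma$ is the population standard deviation (as StandardScaler uses), and, because LinearSVC's L1-penalized squared hinge loss needs a dedicated solver, the selection step swaps in plain coordinate descent on the Lasso objective of equation (3). All function names are hypothetical.

```python
import statistics


def iqr_bounds(values):
    """Return (B_l, B_u) from equation (1), using interpolated quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(rows, columns):
    """Drop rows whose value in any listed column lies outside [B_l, B_u]."""
    bounds = {c: iqr_bounds([r[c] for r in rows]) for c in columns}
    return [r for r in rows
            if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)]


def fit_standardizer(column):
    """Return (mu, sigma) of a training column; sigma is the population std."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(column, mu, sigma):
    """Apply equation (2): x* = (x - mu) / sigma."""
    return [(x - mu) / sigma for x in column]


def soft_threshold(a, t):
    """Shrink a toward zero by t; values inside [-t, t] become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_w sum_i (w^T x_i - y_i)^2 + lam * ||w||_1."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            # rho: correlation of feature j with the residual that excludes j.
            # z: squared norm of feature j.
            rho = 0.0
            z = 0.0
            for xi, yi in zip(X, y):
                pred_without_j = sum(w[k] * xi[k] for k in range(n_features) if k != j)
                rho += xi[j] * (yi - pred_without_j)
                z += xi[j] ** 2
            # The squared error has no 1/2 factor, so the threshold is lam / 2.
            w[j] = soft_threshold(rho, lam / 2) / z if z > 0 else 0.0
    return w


def select_nonzero(names, coefs):
    """Keep the features whose L1-penalized coefficient is nonzero."""
    return [n for n, c in zip(names, coefs) if c != 0.0]
```

For example, with columns x0 = (1, −1, 2, −2) and x1 = (1, 1, −1, −1) and targets y = 3·x0 + 0.1·x1, lasso_cd returns about [3.0, 0.1] for λ = 0 but about [2.95, 0.0] for λ = 1: the weak feature is zeroed exactly, which is how a feature with a score of 0, such as fbs, ends up discarded.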
3.4. Heart Disease Classification Using Deep Neural Networks

3.4.1. Deep Neural Networks. A deep neural network is a deep learning framework, usually a feedforward neural network. In addition, the deep neural network is a discriminative model that can be trained with the backpropagation algorithm [21]. Through continued research, such networks have been widely used in speech recognition, cancer detection, and other fields, with outstanding performance. This is because deep neural networks can use statistical learning methods to extract high-level features from the input data.
The basic structure of a DNN can be divided into three kinds of layers: the input layer, the hidden layers, and the output layer. Unlike perceptrons, deep neural networks have at least one hidden layer; therefore, they are sometimes called Multilayer Perceptrons (MLPs). The hidden layers increase the depth and complexity of the model, improve its capabilities, and allow multiple activation functions to be used. Each hidden layer consists of interconnected neurons. In a deep neural network, the hidden layers extract features from the input, and the classification result is obtained in the output layer. Our DNN structure is shown in Figure 3. We input the 12 features selected by the feature selection module into the DNN. The network has 7 hidden layers and finally produces 2 outputs, corresponding to the scores for each category.

3.4.2. Loss and Activation Function. Heart disease prediction in this paper is essentially a binary classification problem. We use BinaryCrossentropy as the loss function to measure the quality of the model's predictions; BinaryCrossentropy is widely used in binary classification problems. The loss is calculated with the following equation:

$$
H_p(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\big(p(y_i)\big) + (1 - y_i) \log\big(1 - p(y_i)\big) \right], \tag{4}
$$

where $y_i$ is the binary label of sample $i$, and $p(y_i)$ is the predicted probability that sample $i$ has label 1.
BinaryCrossentropy measures classification quality because reducing the loss makes samples whose label equals 1 obtain a larger predicted probability $p(y_i)$, whereas the predicted probability of samples with label 0 becomes smaller. The accuracy of the model can therefore be improved significantly as the loss is reduced.
The input layer and hidden layers of our deep neural network use the ReLU activation function, while the Sigmoid activation function is used in the output layer to map the output to the range [0, 1] to suit the BinaryCrossentropy loss function. In addition, choosing Sigmoid instead of ReLU makes the output easier to control. The ReLU and Sigmoid functions are as follows:

$$
\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}. \tag{5}
$$

The output of the Sigmoid can be regarded as the probability of belonging to the corresponding category. Therefore, we convert the data labels into one-hot encoding. One-hot encoding uses an N-bit status register to encode N states: each label is represented as a binary vector in which only one bit, the index of the valid state, is 1, and the remaining bits are 0. This encoding converts the labels into a form convenient for the network, which facilitates the calculation of the BinaryCrossentropy loss function.

3.4.3. Application of Initializers. Deep neural networks usually need to learn an extremely complex nonlinear model, and different initializers often lead to different convergence speeds and results. If the weights of each layer are all initialized to 0 or 1, the neural network cannot learn important features during backpropagation, and it is challenging to update the parameters. In addition, an excessively large initial value causes exploding gradients, while an initial value that is too small causes vanishing gradients; both reduce the learning ability of the network. To solve these problems, it is necessary to find a suitable weight initialization method that meets the following demands:
(1) Avoid saturation of the activation values of neurons in each layer
(2) Avoid the activation values of each layer becoming zero
However, the prevalent random normal method for weight initialization can put network optimization in a dilemma: if the random distribution is not properly chosen, the output values of a deep network may approach 0, resulting in vanishing gradients. The basic idea of Xavier [22] initialization is to keep the variance of each layer's activations and of the gradients consistent during propagation, which prevents all output values from tending to 0 and gives each layer effective feedback during backpropagation. However, Xavier initialization works well with Tanh but is ineffective with the ReLU activation function. He initialization [23] adapts this idea to ReLU: because ReLU sets roughly half of its inputs to zero, He initialization uses a weight variance of $2/n_i$, twice the fan-in-based Xavier variance of $1/n_i$, which keeps the variance unchanged from layer to layer even though only about half of the neurons in each layer are activated. Equation (6) states the Xavier (uniform) method, and equation (7) states the He (normal) method, where $n_i$ and $n_{i+1}$ are the numbers of neurons in layers $i$ and $i+1$:

$$
w_i \sim U\left[-\sqrt{\frac{6}{n_i + n_{i+1}}},\ \sqrt{\frac{6}{n_i + n_{i+1}}}\right], \tag{6}
$$

$$
w_i \sim N\left(0,\ \frac{2}{n_i}\right). \tag{7}
$$

In Keras, the he_normal initializer samples from a truncated version of the normal distribution in equation (7).
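Equations (6) and (7) can be sampled with Python's standard library as below. This is a sketch under assumptions, not the Keras initializers themselves: it draws from a plain normal where Keras's he_normal truncates, the weight matrix is assumed to be n_in × n_out, and the function names are our own.

```python
import math
import random


def xavier_uniform(n_in, n_out, rng=random):
    """Equation (6): U[-limit, limit] with limit = sqrt(6 / (n_in + n_out))."""
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]


def he_normal(n_in, n_out, rng=random):
    """Equation (7): N(0, 2 / n_in), i.e. standard deviation sqrt(2 / n_in).

    Plain (untruncated) normal; Keras's he_normal truncates this distribution.
    """
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


def empirical_std(matrix):
    """Population standard deviation of all entries of a weight matrix."""
    flat = [w for row in matrix for w in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
```

For a 512-to-256 layer like the one in the network, the He standard deviation is sqrt(2/512) = 0.0625, while Xavier uniform gives limit/√3 ≈ 0.051, so He starts ReLU layers with somewhat larger weights.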
Figure 3: The structure of our deep neural network (inputs x[1] to x[12]; fully connected hidden layers with n = 1028, 512, 256, 256, 128, 128, and 32 neurons; two sigmoid output units).
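A forward pass through a network of this shape can be sketched in plain Python as follows. It is an illustration under assumptions, not the authors' Keras model: widths are read from Figure 3, every layer is dense with a bias, biases start at zero, weights follow the He normal rule of equation (7), and the loss averages equation (4) over both sigmoid outputs and all samples. Helper names are hypothetical.

```python
import math
import random

# Layer widths read from Figure 3: 12 inputs, 7 hidden layers, 2 sigmoid outputs.
FIGURE3_WIDTHS = [12, 1028, 512, 256, 256, 128, 128, 32, 2]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def param_count(widths):
    """Weights plus biases of a fully connected network."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


def init_layers(widths, rng):
    """He-normal weights (std sqrt(2 / n_in)) and zero biases for each layer."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        std = math.sqrt(2.0 / n_in)
        W = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    a = x
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + bj for row, bj in zip(W, b)]
        is_output = idx == len(layers) - 1
        a = [sigmoid(v) if is_output else relu(v) for v in z]
    return a


def one_hot(label, n_classes=2):
    return [1.0 if k == label else 0.0 for k in range(n_classes)]


def binary_crossentropy(targets, probs, eps=1e-7):
    """Equation (4), averaged over every output unit of every sample."""
    total, count = 0.0, 0
    for t_vec, p_vec in zip(targets, probs):
        for t, p in zip(t_vec, p_vec):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return -total / count
```

Under these assumptions, param_count(FIGURE3_WIDTHS) gives 790,934 trainable parameters; init_layers can build that full network in pure Python, but slowly, so a tiny network such as [3, 4, 2] is more convenient for experimenting.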

We compare several well-known weight initialization methods; the results are presented in Section 4. Because the He initializer suits the ReLU activation function, we employ it in our network.

4. Results and Discussion

We employed the proposed method to predict heart disease and evaluated the results. The heart disease dataset was first preprocessed through outlier removal and data standardization, after which the feature selection module was applied, and the selected feature subset was fed into the deep neural network for training. We tried a wide range of network optimization algorithms to improve the effect and stability of the model.
We divided the data into a training set and a test set at a ratio of 3 : 1, and 20% of the training data was held out for validation. The DNN was trained on the training set. We calculate accuracy, recall, precision, and F1-score to evaluate the results. Accuracy is the proportion of correct predictions among all predictions. Recall is the proportion of actual positive cases that are correctly predicted as positive, and precision is the proportion of predicted positive cases that are actually positive [24]; F1-score combines precision and recall and can be regarded as their harmonic mean. They are calculated as follows:

$$
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN},\\
\text{recall} &= \frac{TP}{TP + FN},\\
\text{precision} &= \frac{TP}{TP + FP},\\
F1\text{-score} &= \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}},
\end{aligned}
\tag{8}
$$

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively.
In our experiment, we selected the Adam optimizer, a stochastic gradient-based optimization method. Adam needs only first-order gradients and is computationally efficient with little memory. It computes an individual adaptive learning rate for each parameter by estimating the first and second moments of the gradient, which gives it advantages over other optimization methods [25]. In this experiment, the learning rate is 0.0001, and the number of iterations is 150. To ensure the reliability of the results, all statistics reported in this paper are averages over 10 experiments. On average, our model reaches an accuracy of 98.56%, a recall of 99.35%, a precision of 97.84%, and an F1-score of 0.983. Table 2 lists the detailed results, and Figure 4 shows the confusion matrix of the predictions from one experiment.

Table 2: Results of the proposed method.
Metric      Class 0 (%)   Class 1 (%)
Accuracy    98.56         98.56
Recall      97.84         99.35
Precision   99.35         97.84

Figure 4: Confusion matrix on test data (true label versus predicted label; true label 0: 120 predicted as 0 and 2 as 1; true label 1: 2 predicted as 0 and 121 as 1).

In addition, we use ROC and AUC to evaluate the performance of the model. The receiver operating characteristic (ROC) curve is drawn over a series of decision thresholds, with the true positive rate on the ordinate and the false positive rate on the abscissa [26]. AUC is the area under the ROC curve; it equals the probability that a randomly selected positive sample receives a higher score than a randomly selected negative sample, and it measures the quality of the prediction model. The average AUC value of our model is 0.983, and the ROC curve of one experiment is shown in Figure 5.

Figure 5: ROC curve and AUC of the proposed algorithm (TPR versus FPR; AUC = 0.98).

In data preprocessing, we used the IQR method to remove the outliers of chol and trestbps and standardized the dataset. In the feature selection module, we used an embedded feature selection method based on LinearSVC with the L1 norm as a penalty term and picked 12 features that contribute to the model; the fbs feature, whose score is 0, was removed. Table 3 lists the score of each feature.

Table 3: Importance value of features.
SN   Attribute   Importance value
1    age         −0.00144
2    sex         −0.56535
3    cp          0.288000
4    trestbps    −0.005696
5    chol        −0.001637
6    fbs         0.0
7    restecg     0.1190487
8    thalach     0.0079078
9    exang       −0.332790
10   oldpeak     −0.199450
11   slope       0.1589541
12   ca          −0.252994
13   thal        −0.2927249

With the He initialization method, our model achieved outstanding stability and accuracy.
We compared the network's performance using the He initialization, RandomNormal, and Xavier methods. He initialization proves superior: its accuracy is 9.3 and 13.3 percentage points higher than that of the RandomNormal and Xavier methods, respectively; its recall is 9.0 and 12.4 points higher; its precision is 9.3 and 14.2 points higher; and its F1-score is 0.083 and 0.127 higher. These results are shown in detail in Figure 6.

Figure 6: Results of different initializers (accuracy, recall, precision, and F1-score for He_normal, RandomNormal, and Xavier).

Figure 7: Comparison of results using batch normalization layer (accuracy, recall, precision, and F1-score without BN and with BN).

Additionally, we found that batch normalization performed poorly in our model. When we add batch normalization after the fully connected layers, the accuracy, recall, precision, and F1-score of the model change to 97.5%, 98.3%, 96.7%, and 0.98, respectively, decreases of 1.1, 1.0, and 1.1 percentage points and 0.003. The comparison is illustrated in Figure 7. We conjecture that this is because the He initializer already gives the network good initial weights, so that each layer of
8 Journal of Healthcare Engineering

Table 4: Comparison of classification performance of the proposed method with others.


Authors Methods Accuracy (%) Recall (%) Precision (%)
Ramprakash et al. [27] χ2 -DNN 94.0 93.00 —
Gao et al. [28] Bagging ensemble method with decision tree 98.6 99.0 97.8
Gao et al. [28] PCA + decision tree 99.0 97.0 98.0
Ali et al. [29] MLP 97.95 98 98
Proposed LinearSVC + DNN 98.56 99.35 97.84

the network has well-behaved input and output values, avoiding vanishing and exploding gradients. Furthermore, we compared our method with several methods published by other researchers. For example, Ramprakash et al. [27] combined PCA feature extraction with a DNN and obtained high classification accuracy, but their recall reached only 97%. The detailed comparison results are shown in Table 4.

5. Conclusions

In this paper, we propose a heart disease prediction algorithm that combines a deep neural network (DNN) with an embedded feature selection method based on LinearSVC. Outliers in the dataset are removed with the interquartile range (IQR) method, and all data are standardized to obtain reliable input. The feature selection module then uses LinearSVC with an L1-norm penalty to select the optimal feature subset: the 12 most relevant features are retained and fed into the subsequent DNN. To enhance the network’s performance, we compare three weight initialization methods (He_normal, random_normal, and Xavier) and find that He initialization gives the best results for this heart disease prediction model. We also find that a batch normalization layer is not suitable for this method, since the model with batch normalization scores lower on every evaluation metric. For this binary classification problem, we use BinaryCrossentropy as the loss function and a sigmoid activation in the output layer to map the output to the range [0, 1]. The experimental results show that the model predicts heart disease with high accuracy: the proposed method reaches an accuracy of 98.56%, a recall of 99.35%, a precision of 97.84%, an F1-score of 0.983, and an AUC of 0.983. These results indicate that the proposed feature selection method and deep neural network are feasible and reliable for predicting heart disease. In future work, we will continue to tune the depth and parameters of the DNN to improve the model’s stability, and we will investigate other deep learning optimization techniques to obtain better performance [30].

Data Availability

The heart disease dataset used to support the findings of this study is available at https://www.kaggle.com/johnsmith88/heart-disease-dataset.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Authors’ Contributions

Dengqing Zhang and Yunyi Chen contributed equally to this work.

References

[1] World Health Organization, https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1, 2021.
[2] H. C. McGill, C. A. McMahan, and S. S. Gidding, “Preventing heart disease in the 21st century,” Circulation, vol. 117, no. 9, pp. 1216–1227, 2008.
[3] M. S. Amin, Y. K. Chiam, and K. D. Varathan, “Identification of significant features and data mining techniques in predicting heart disease,” Telematics and Informatics, vol. 36, pp. 82–93, 2019.
[4] S. Nalluri, R. V. Saraswathi, S. Ramasubbareddy, K. Govinda, and E. Swetha, “Chronic heart disease prediction using data mining techniques, advances in intelligent systems and computing,” in Data Engineering and Communication Technology, pp. 903–912, Springer, Singapore, 2020.
[5] N. Louridi, M. Amar, and B. E. Ouahidi, “Identification of cardiovascular diseases using machine learning,” in Proceedings of the 2019 7th Mediterranean Congress of Telecommunications (CMT), pp. 1–6, IEEE, Fez, Morocco, October 2019.
[6] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, no. 6, pp. 1–6, 2020.
[7] A. Kumar, P. Kumar, A. Srivastava, V. D. A. Kumar, K. Vengatesan, and A. Singhal, “Comparative analysis of data mining techniques to predict heart disease for diabetic patients,” in Proceedings of the International Conference on Advances in Computing and Data Sciences, pp. 507–518, Springer, Valletta, Malta, April 2020.
[8] I. M. Pires, G. Marques, N. M. Garcia, and V. Ponciano, “Machine learning for the evaluation of the presence of heart disease,” Procedia Computer Science, vol. 177, pp. 432–437, 2020.
[9] M. Kavitha, G. Gnaneswar, R. Dinesh, Y. R. Sai, and R. S. Suraj, “Heart disease prediction using hybrid machine learning model,” in Proceedings of the 2021 6th International Conference on Inventive Computation Technologies (ICICT), pp. 1329–1333, IEEE, Coimbatore, India, January 2021.
[10] R. Spencer, F. Thabtah, N. Abdelhamid et al., “Exploring feature selection and classification methods for predicting heart disease,” Digital Health, vol. 6, 2020.
[11] M. A. Khan, “An IoT framework for heart disease prediction based on MDCNN classifier,” IEEE Access, vol. 8, Article ID 34717, 2020.
[12] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, Article ID 81542, 2019.
[13] G. Magesh and P. Swarnalatha, “Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction,” Evolutionary Intelligence, vol. 14, no. 2, pp. 583–593, 2021.
[14] A. Mehmood, M. Iqbal, Z. Mehmood et al., “Prediction of heart disease using deep convolutional neural networks,” Arabian Journal for Science and Engineering, vol. 46, no. 4, pp. 3409–3422, 2021.
[15] https://www.kaggle.com/johnsmith88/heart-disease-dataset.
[16] https://archive.ics.uci.edu/ml/datasets/Heart+Disease.
[17] H. P. Vinutha, B. Poornima, and B. M. Sagar, “Detection of outliers using interquartile range technique from intrusion dataset,” Advances in Intelligent Systems and Computing, vol. 701, pp. 511–518, 2018.
[18] M. Shanker, M. Y. Hu, and M. S. Hung, “Effect of data standardization on neural network training,” Omega, vol. 24, no. 4, pp. 385–397, 1996.
[19] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.
[20] S. L. Kukreja, J. Löfberg, and M. J. Brenner, “A least absolute shrinkage and selection operator (LASSO) for nonlinear system identification,” IFAC Proceedings Volumes, vol. 39, no. 1, pp. 814–819, 2006.
[21] T. Epelbaum, “Deep learning: technical introduction,” 2017, https://arxiv.org/abs/1709.01412.
[22] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR, vol. 9, pp. 249–256, Sardinia, Italy, May 2010.
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, Santiago, Chile, December 2015.
[24] D. M. W. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” 2020, https://arxiv.org/abs/2010.16061.
[25] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, https://arxiv.org/abs/1412.6980.
[26] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.
[27] P. Ramprakash, R. Sarumathi, R. Mowriya et al., “Heart disease prediction using deep neural network,” in Proceedings of the International Conference on Inventive Computation Technologies (ICICT), pp. 666–670, IEEE, Coimbatore, India, February 2020.
[28] X. Y. Gao, A. A. Ali, H. S. Hassan, and E. M. Amwar, “Improving the accuracy for analyzing heart diseases prediction based on the ensemble method,” Complexity, vol. 2021, Article ID 6663455, 10 pages, 2021.
[29] M. M. Ali, B. P. Kumar, K. Ahmad, M. B. Francis, M. W. Q. Julian, and M. A. Moni, “Heart disease prediction using supervised machine learning algorithms: performance analysis and comparison,” Computers in Biology and Medicine, vol. 136, Article ID 104672, 2021.
[30] M. Rahman, M. M. Zahin, and L. Islam, “Effective prediction on heart disease: anticipating heart disease using data mining techniques,” in Proceedings of the 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 536–541, Tirunelveli, India, November 2019.
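As a supplementary illustration of the preprocessing summarized in the Conclusions, the following NumPy sketch removes outliers with the IQR rule and then standardizes each feature. It is not the authors' code. The helper names `iqr_filter` and `standardize` are hypothetical. The conventional 1.5 × IQR fence and the choice to drop a sample when any one of its features falls outside the fence are both assumptions, since this section does not state them.

```python
import numpy as np


def iqr_filter(X, k=1.5):
    """Keep rows whose features all lie inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumptions: k = 1.5 (the conventional Tukey fence), and a row is dropped
    if ANY of its features is an outlier; the text does not specify either.
    Returns the filtered array and the boolean row mask.
    """
    X = np.asarray(X, dtype=float)
    q1 = np.percentile(X, 25, axis=0)  # per-feature first quartile
    q3 = np.percentile(X, 75, axis=0)  # per-feature third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = np.all((X >= lower) & (X <= upper), axis=1)
    return X[mask], mask


def standardize(X):
    """Z-score each column: (x - mean) / std; constant columns become 0."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std


# Toy data: feature 0 has one extreme value (100.0) in the last row.
X = np.array([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0], [100.0, 12.0]])
y = np.array([0, 1, 0, 1, 1])
X_clean, mask = iqr_filter(X)
y_clean = y[mask]  # keep labels aligned with the retained rows
X_std = standardize(X_clean)
print(mask.tolist())  # [True, True, True, True, False]
```

The returned mask must also be applied to the label vector so that samples and labels stay aligned. In a train/test setting, the mean and standard deviation would normally be computed on the training split only and reused on the test split. For the subsequent L1-penalized LinearSVC feature selection, one common option is scikit-learn's `LinearSVC(penalty="l1", dual=False)` wrapped in `SelectFromModel`. That tooling is an assumption, not something this section confirms.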

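The output layer and evaluation metrics described in the Conclusions can likewise be illustrated in plain Python:

- a sigmoid maps a logit into (0, 1);
- binary cross-entropy scores those probabilities;
- accuracy, precision, recall, F1-score, and AUC are computed from the predictions.

This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the 0.5 decision threshold, and the `eps` clipping constant are hypothetical choices. AUC is computed with the pairwise-ranking (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping a logit into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]
    so that log(0) is never evaluated (eps is an assumed constant)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall and F1 from thresholded probabilities."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as 0.5), i.e. the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
print(binary_metrics(y, p)["recall"])  # 2 of 3 positives detected
print(roc_auc(y, p))  # 8 of 9 positive/negative pairs ranked correctly
```

The double loop in `roc_auc` is quadratic in the number of samples. That is acceptable for a small tabular dataset, but a rank-based computation would be preferable for larger data.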