
Informatics in Medicine Unlocked 42 (2023) 101370

Contents lists available at ScienceDirect

Informatics in Medicine Unlocked


journal homepage: www.elsevier.com/locate/imu

Cardiovascular disease identification using a hybrid CNN-LSTM model with explainable AI☆

Md Maruf Hossain a, Md Shahin Ali a,**, Md Mahfuz Ahmed a, Md Rakibul Hasan Rakib a, Moutushi Akter Kona a, Sadia Afrin b, Md Khairul Islam a, Md Manjurul Ahsan c,d, Sheikh Md Razibul Hasan Raj e, Md Habibur Rahman f,g,*
a Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh
b Department of Food and Nutrition, Government College of Applied Human Science, Azimpur, Dhaka, 1205, Bangladesh
c Department of Radiology, Machine and Hybrid Intelligence Lab, Northwestern University, Chicago, IL, 60611, USA
d Industrial & Production Engineering, Shahjalal University of Science and Technology, Sylhet, 3114, Bangladesh
e Department of Information and Communication Technology, Islamic University, Kushtia, 7003, Bangladesh
f Department of Computer Science and Engineering, Islamic University, Kushtia, 7003, Bangladesh
g Center for Advanced Bioinformatics and Artificial Intelligent Research, Islamic University, Kushtia, 7003, Bangladesh

A R T I C L E  I N F O

Keywords: Cardiovascular disease; Deep learning; CNN-LSTM; Feature engineering; Explainable AI

A B S T R A C T

Cardiovascular disease (CVD) is a leading cause of death worldwide, with millions dying each year. The identification and early diagnosis of CVD are critical in preventing adverse health outcomes. Hence, this study proposes a hybrid deep learning (DL) model that combines a convolutional neural network (CNN) and long short-term memory (LSTM) to identify CVD from clinical data. The CNN extracts the relevant features from the input data, and the LSTM network processes sequential data and captures dependencies and patterns over time. This study provides insights into the potential of a hybrid DL model combined with feature engineering and explainable AI to improve the accuracy and interpretability of CVD prediction. We evaluated our model on a publicly available dataset, where the proposed CNN-LSTM achieved high accuracies of 73.52% and 74.15% with and without feature engineering, respectively, in identifying individuals with CVD, the best result compared to current state-of-the-art models. The results of this study demonstrate the potential of DL models for the early diagnosis of CVD. Our proposed CNN-LSTM model also incorporates explainable AI to identify the top features responsible for CVD, which could be used to develop more effective screening tools in clinical practice.

URLs: iu.ac.bd (M.M. Hossain); iu.ac.bd (M.S. Ali); iu.ac.bd (M.M. Ahmed); iu.ac.bd (M.R.H. Rakib); iu.ac.bd (M.A. Kona); cahs.gov.bd/#/ (S. Afrin); iu.ac.bd (M.K. Islam); sust.edu (M.M. Ahsan); iu.ac.bd (S.M.R.H. Raj); iu.ac.bd (M.H. Rahman).
* Corresponding author. Department of Computer Science and Engineering, Islamic University, Kushtia, 7003, Bangladesh.
** Corresponding author. Department of Biomedical Engineering, Islamic University, Kushtia, 7003, Bangladesh.
E-mail addresses: [email protected] (M.M. Hossain), [email protected] (M.S. Ali), [email protected] (M.M. Ahmed), [email protected] (M.R.H. Rakib), [email protected] (M.A. Kona), [email protected] (S. Afrin), [email protected] (M.K. Islam), [email protected] (M.M. Ahsan), [email protected] (S.M.R.H. Raj), [email protected] (M.H. Rahman).

https://doi.org/10.1016/j.imu.2023.101370
Received 2 April 2023; Received in revised form 27 September 2023; Accepted 2 October 2023; Available online 4 October 2023.
2352-9148/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Cardiovascular disease (CVD) is a condition that causes blood vessel obstruction and heart attacks with chest discomfort, as well as other heart diseases and heart failure that may result in death or other severe complications [1]. It has been the number one cause of death for the past 15 years, with 15 million people dying from it in 2015 [2]. Research from January 2017 demonstrated that cardiovascular diseases are the main cause of death on a global scale; in 2020, according to the World Health Organization, this disease was the primary cause of death worldwide, with an estimated 17.9 million fatalities annually [3]. Additionally, coronary disease mortality increases annually, and the death toll is anticipated to surpass 23.6 million by 2030. CVD is considered the leading cause of death worldwide. Coronary heart disease, cerebrovascular disease, peripheral arterial disease, stroke and transient ischaemic attack (TIA), vascular illness, chronic heart illness,
congenital heart defects, deep vein thrombosis, and pulmonary embolism are the most common cardiovascular conditions [4]. Several contributory risk factors, including hypertension, elevated blood pressure, excessive lipids, and an abnormal pulse rate, complicate the diagnosis of cardiac disease. Detecting CVD at an early stage is essential to reducing this toll.

Machine learning (ML) is one of the most swiftly evolving areas of Artificial Intelligence (AI). ML algorithms can analyze vast amounts of data from numerous disciplines, including the vital medical field [5]. Data mining is among the numerous methods for enhancing disease detection and diagnosis, and early detection of CVD reduces both costs and CVD mortality. By employing classification algorithms, which play an essential part in medical research, data analysis strategies can complete this task efficiently and affordably. Using a variety of ML algorithms, such as Logistic Regression, Naive Bayes (NB), Support Vector Machines, and Decision Trees, we constructed a model.

Through supervised learning and binary analysis of user data, we addressed the identified problem. By dividing the data into test and training sets, applying different methods or combinations of methods, and evaluating their precision, we developed an accurate predictive model for assessing an individual's risk of CVD. Using this procedure, adverse factors are identified with high accuracy. This study anticipated the dangers and etiology of CVD from one generation to the next. Health informatics for CVD is changing across various disciplines, including the storage and transmission of data. The medical consequence of ML for CVD, as of 2020, is that it accesses the material in a supervised format and provides more accurate data for the anticipated condition.

1.1. Motivation

Late identification of the most critical behavioural risk factors for heart disease and stroke underscores the need for this study. Early detection of these conditions through routine screening exams could improve treatment options and lengthen survival times. Public health initiatives have been established to encourage communities to undergo regular screening for chronic conditions such as CVD. That is why we approach this study using ML methods. There are many algorithm-based studies, but we aim to bring various algorithms together and evaluate them with good accuracy. Here we employed Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), NB, Multilayer Perceptron (MLP), CatBoost, Gradient Boosting Machine (GBM), AdaBoost, Random Forest (RF), and a hybrid CNN-LSTM model. Based on our models and their performance, we are motivated to conclude with the proposed CNN-LSTM method. Pre-screening systems for early disease prediction, detection, and prevention are a crucial source of information; they apply cutting-edge ICT methods to address issues with health data collection, analysis, and interpretation, and enhance current health systems for the advanced screening of diseases, which we discuss explicitly in this work. We aim to overcome previous restrictions and produce a successful outcome.

1.2. Contribution

A summary of the primary contributions of our study includes the following:

• We introduce our novel CNN-LSTM model, achieving exceptional accuracy in early-stage cardiac condition diagnosis for patients. To the best of our knowledge, we are the first to develop such a deep learning model using CSV datasets.
• Furthermore, our model excels in both accuracy and efficiency. Utilizing CNNs and LSTM networks, we achieve faster diagnosis compared to current ML algorithms, facilitating prompt interventions and potentially improving patient outcomes.
• Additionally, we used feature engineering techniques to identify and incorporate relevant features such as 'blood_diff,' 'BMI,' 'obese,' and 'hypertension' into our model for CVD detection, and we employed preprocessing strategies to refine the dataset, enhancing its specificity and improving the overall performance of our model.
• Lastly, we utilize explainable AI techniques to identify the key features affecting cardiac patients in our CNN-LSTM model, gaining valuable insights for clinical decision-making and improving patient outcomes.

1.3. Organization

The remaining sections of the paper are organized as follows: Section 2 provides a comprehensive literature review, offering an overview of existing research. Section 3 delves into the challenges associated with CVD detection. The methodology is outlined in Section 4, detailing the research design and data collection techniques. Sections 5 and 6 present the results and discussion, respectively, analyzing the findings and their implications. Finally, Section 7 concludes the paper, summarizing the main conclusions and proposing directions for future research.

2. Literature review

Cardiovascular disease identification using ML techniques is expanding rapidly, with numerous studies investigating the application of ML algorithms and DL models. These techniques can help revolutionize healthcare by enabling more precise diagnoses and tailored treatment plans for patients. This section reviews prior studies on identifying CVD through diverse ML approaches.

In [6], the authors proposed an RF algorithm correlating diabetes and heart disease. The method estimates the extent to which diabetes affects coronary artery disease and its contribution to heart disease prediction; better performance could be obtained by using more parameters. The authors of [1] concentrated on leveraging healthcare data for cardiovascular disease prediction through a mobile-based iOS application, obtaining an accuracy of 72.7%. They propose extending the model to encompass other diseases and exploring deep learning and CNN methods for potential efficiency enhancements.

The authors of [7] proposed a model employing four data mining classification methods: KNN, NB, DT, and RF. They utilized numerous data mining techniques, including regression, clustering, and association rules, to retrieve valuable information from large databases, and found the maximum accuracy with KNN (k = 7). Implementing more complex models and incorporating additional data mining techniques, including time-series analysis, combined association and clustering rules, SVM, and genetic algorithms, could improve the accuracy of early heart disease prediction. In Ref. [8], the authors applied several ML techniques, namely DT, NB, and Neural Networks, and developed a prototype of the Intelligent Heart Disease Prediction System (IHDPS). This approach improved accuracy and also saved time and money. They observed that the accuracy of NB was 60%, logistic regression 61.45%, and SVM 64.4%. The authors of [9] employed a method to boost the coronary artery disease prediction rate, utilizing effective prediction techniques such as Gaussian NB, SVM, RF, Hoeffding Tree, and LMT; RF gave the most accurate results among all the algorithms used. Adding more input attributes and reducing the data size can enhance the genetic algorithm's performance. In Ref. [10], the authors proposed a method for predicting cardiac disease using sklearn, pandas, matplotlib, and other required libraries in a TkInter-based Python application; their hybrid model achieved 88% accuracy, and applying DL techniques may yield improved results. The authors of [11] analyzed heart disease prediction via ML techniques. Comparing all the algorithms used in this method, RF shows the best


Fig. 1. Workflow of our proposed methodology.

result of 95.60%; the results could be improved by implementing other algorithms and fruitful deep learning techniques. In Ref. [12], the authors proposed a method for evaluating and summarizing the overall predictive capability of ML algorithms in CVD. They demonstrated that the predictive ability of ML algorithms in CVD, particularly SVM and boosting algorithms, is promising, and an optimal accuracy rate might be obtained using more feature selection methodologies and techniques. The authors of [13] employed a hybrid approach using ML classifiers for CVD forecasting, constructing their framework using a neural network based on general principles. Using the confusion matrix, the proposed method employing the ML classifier obtained a higher accuracy of 85.71%; accuracy can be further increased by varying the number of testing datasets.

In [14], Sonam et al. developed a heart disease prediction system using two classification techniques: NB and DT classifiers. The DT classifier outperformed the NB classifier, but removing irrelevant attributes from the dataset improved the NB classifier's performance. This research emphasizes the importance of classification techniques and data preprocessing for accurate heart disease prediction. In Ref. [4], the authors proposed a method for predicting CVD using symptom input. Six classification algorithms were used to analyze 14 attributes of the Cleveland dataset, with SVM and MLP achieving the highest accuracy of 91.7%. Ensembling and exploring more parameter settings could further improve performance; this method shows promise in providing accurate and immediate disease prediction. The authors of [15] employed a Convolutional Neural Network (CNN) approach for predicting CVD. Traditional ML methods, such as Logistic Regression, K-Nearest Neighbors (KNN), NB, Support Vector Machine (SVM), and Neural Networks (NN), are compared to the proposed CNN model. The paper claims a high accuracy of 94% for the CNN model using the UCI ML repository dataset; the CNN model aims to improve efficiency and accuracy by leveraging DL capabilities to analyze complex patterns in medical data. In Ref. [16], the authors addressed the global health burden of CVD, specifically coronary heart disease, and proposed an enhanced deep neural network (DNN) model for accurate and reliable diagnosis. The model achieved 83.67% diagnostic accuracy, 93.51% sensitivity, 72.86% specificity, and 79.12% precision; investigating enhanced DL methods and advanced models could further improve the accuracy of heart disease diagnoses worldwide. In Ref. [17], the author employed a model using DL Techniques (DLTs), specifically Artificial Neural Networks (ANN), to analyze and predict CVD in the Robust Healthcare Industry (RHI). The ANN model achieves a high accuracy of 98.4% compared to other models, highlighting its effectiveness in disease analysis and prognosis.

The existing literature on CVD detection using ML algorithms reveals that several studies have been conducted in this domain. However, there is a research gap in developing more effective and productive models that can improve the accuracy and efficiency of these algorithms. Although most studies have focused on using ML algorithms for classification and prediction, the lack of research on efficient preprocessing techniques to enhance algorithm performance is evident. Although some studies have achieved high accuracy rates, there is still a need to improve the prediction of CVD at an early stage and to identify more accurate risk indicators. There is no available related work that detects CVD using a DL method on tabular data. Furthermore, most researchers used a limited dataset, raising


concerns about the generalizability of their findings. Additionally, the utilization of explainable artificial intelligence (XAI) and feature engineering techniques has not been reported in many studies, although these can provide insightful information on the performance of a classifier. Therefore, this study aims to address these gaps and enhance the development of effective and efficient ML models for CVD prediction.

3. Challenges of CVD detection

Though the model presented here performs well, we face some challenges as well. New charts for 21 geographical regions have been produced by the WHO CVD Risk Chart Working Group to aid risk prediction in clinics and national public health initiatives [18]. CVD is a significant cause of death and disability worldwide, and early detection and treatment are crucial for preventing its progression. ML algorithms have shown promise in detecting CVD, but several challenges must be addressed to improve their accuracy and reliability. The development of population-specific models that fully address the issues of present risk prediction models will require further resources and efforts to collect more observational data with extensive follow-ups. Risk prediction methods are undoubtedly still under active development.

• Lack of standardized data: ML algorithms require large amounts of high-quality data for training and validation, but there is a lack of standardized data for CVD detection. The data may be incomplete, inaccurate, or inconsistent, which can affect the performance of ML algorithms.
• Imbalanced data: The distribution of CVD and non-CVD cases in the training data may be imbalanced, with most cases being non-CVD. This can lead to biased results and poor performance of ML algorithms in detecting CVD [19].
• The complexity of CVD: CVD is a complex disease with many risk factors and symptoms, which makes it challenging to identify the most critical features for detection. ML algorithms may also be challenged by comorbidities and confounding factors that can affect the accuracy of the results.
• Ethical considerations: Using ML algorithms in healthcare raises ethical concerns such as privacy, informed consent, and bias. It is essential to ensure that the algorithms are transparent, unbiased, and do not violate ethical principles.
• Interpretability: ML algorithms are often considered black boxes, making it difficult to interpret the results and understand how the algorithm arrived at a particular decision. This complicates CVD detection, where the underlying factors and mechanisms are not yet fully understood.

4. Materials and methodology

We divide our methodology into several stages: data collection, data normalization, feature engineering, model architecture design, model training, hyperparameter tuning, and model evaluation, as shown in Fig. 1.

4.1. Materials

4.1.1. Dataset
In this work, a dataset on heart disease was analyzed to develop our anticipated model [20]. The dataset used in this study was obtained from the Kaggle CVD dataset. There are 14 attributes contained within this dataset. Table 1 illustrates the specifics of all features.

Table 1
Details of dataset features.

SN | Attribute name | Description
1. | age | Age in days.
2. | gender | Sex, indicated as 1 (woman) or 2 (man).
3. | height | Height in centimeters (cm).
4. | weight | Weight in kilograms (kg).
5. | ap_hi | Systolic blood pressure (while the heart beats).
6. | ap_lo | Diastolic blood pressure (while the heart rests between beats).
7. | cholesterol | Lipoprotein level in blood: 1: normal; 2: above normal; 3: well above normal.
8. | gluc | Fasting blood glucose level: 1: normal; 2: above normal; 3: well above normal.
9. | smoke | Whether the patient is a smoker (1) or nonsmoker (0).
10. | alco | Alcohol consumption: nonalcoholic (0); alcoholic (1).
11. | active | Physical activity: present (1) or absent (0).
12. | cardio | Target variable (working capacity of the heart's valves and chambers): no disease (0); disease (1).

The data set comprises 70,000 patient records: 24,470 males and 45,530 females of various ages, with 35,021 (50.03%) normal patients and 34,979 (49.97%) heart disease patients, as shown in Fig. 2.

Fig. 2. The number of diseased and non-diseased people.

The descriptive statistics of the dataset are shown in Table 2.

4.2. Methodology

4.2.1. Data normalization
Normalization of data is a fundamental preprocessing stage for data mining and learning. It entails converting the data into an accepted format that can improve data precision and integrity, reduce duplication and inconsistency, and ensure that the data is arranged in an effective and consistent way [21]. Normalization is necessary when noticeable variations exist between the ranges of various features, and it is particularly appropriate when there are no outliers in the dataset. There are several data normalization techniques; the most commonly used are decimal scaling, unit vector normalization, Z-score normalization, log transformation, and min-max normalization.

Our study used the min-max normalization technique to scale our data between 0 and 1. We chose this technique because of its simplicity and its effectiveness for distance-based algorithms. It is essential to remember that the minimum and maximum values may not accurately represent the data in certain cases, potentially resulting in information loss. Below, we provide a brief overview of the normalization techniques employed in our study.

Min-max normalization balances value comparisons between data before and after the procedure by applying a linear transformation to the original data [22,23]. The data inputs used in min-max normalization are translated into a predetermined range (typically [0, 1]). This standardization preserves the associations between the original data values. The method can be described by the following formula [24]:


Table 2
Descriptive statistics for the dataset.

Variable | Min | Max | Mean | Median | STD | Variance | MAD | RMS | Skewness | Missing
age | 30 | 65 | 53.338 | 54 | 6.765 | 45.769 | 5.644 | 53.766 | −0.306 | 0
gender | 1 | 2 | 1.349 | 1 | 0.477 | 0.227 | 0.455 | 1.431 | 0.631 | 0
height | 55 | 250 | 164.359 | 165 | 8.210 | 67.406 | 6.324 | 164.564 | −0.642 | 0
weight | 10 | 200 | 74.206 | 72 | 14.396 | 207.237 | 11.044 | 75.589 | 1.012 | 0
ap_hi | −150 | 16020 | 128.817 | 120 | 154.011 | 23719.517 | 16.007 | 200.781 | 85.296 | 0
ap_lo | −70 | 11000 | 96.630 | 80 | 188.472 | 35521.894 | 31.044 | 211.799 | 32.114 | 0
cholesterol | 1 | 3 | 1.367 | 1 | 0.680 | 0.462 | 0.549 | 1.526 | 1.587 | 0
gluc | 1 | 3 | 1.226 | 1 | 0.572 | 0.327 | 0.385 | 1.653 | 2.397 | 0
smoke | 0 | 1 | 0.088 | 0 | 0.283 | 0.080 | 0.161 | 0.297 | 2.906 | 0
alco | 0 | 1 | 0.054 | 0 | 0.225 | 0.051 | 0.102 | 0.231 | 3.956 | 0
active | 0 | 1 | 0.804 | 1 | 0.397 | 0.158 | 0.315 | 0.896 | −1.529 | 0
cardio | 0 | 1 | 0.500 | 0 | 0.500 | 0.250 | 0.500 | 0.707 | 0.001 | 0
blood_diff | 0 | 1 | 44.899 | 40 | 10.658 | 113.589 | 8.282 | 46.147 | 0.833 | 0
BMI | 3.472 | 298.667 | 27.464 | 26.298 | 5.609 | 31.471 | 4.032 | 28.031 | 5.581 | 0
obese | 0 | 1 | 0.260 | 0 | 0.438 | 0.192 | 0.389 | 0.510 | 1.093 | 0
hypertense | 0 | 1 | 0.814 | 0 | 0.389 | 0.151 | 0.303 | 0.902 | −1.612 | 0
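To make the normalization of Section 4.2.1 concrete, the sketch below applies min-max scaling (Eq. (1)) and decimal scaling (Eqs. (2)–(3)) with NumPy. This is an illustrative sketch rather than the authors' preprocessing code; the ap_hi sample values are invented, and j is rounded up to an integer so that all scaled magnitudes fall below 1.

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scaling, Eq. (1): x_new = (X - min(X)) / (max(X) - min(X))."""
    x = x.astype(float)
    lo, hi = x.min(), x.max()
    if hi == lo:                       # constant feature: avoid division by zero
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

def decimal_scale(x: np.ndarray) -> np.ndarray:
    """Decimal scaling, Eqs. (2)-(3): x* = x / 10^j, taking j as the smallest
    integer >= log10(max|x_i|) so every scaled magnitude is below 1."""
    j = int(np.ceil(np.log10(np.abs(x).max())))
    return x / 10 ** j

ap_hi = np.array([110, 120, 130, 140, 160])      # invented systolic pressures
print(min_max_normalize(ap_hi))                  # values now span [0, 1]
print(decimal_scale(ap_hi))                      # all magnitudes now < 1
```

Min-max scaling preserves the relative spacing of values, which is why it suits the distance-based algorithms mentioned above.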

Fig. 3. Flowchart of feature engineering.
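The Fig. 3 workflow (data cleaning, feature selection, feature transformation, feature scaling) can be sketched as a small pandas routine. This is an illustrative arrangement, not the authors' code: the column names follow Table 1, and the rule dropping non-positive blood pressures is an assumed cleaning step.

```python
import numpy as np
import pandas as pd

def feature_engineering_workflow(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative version of the Fig. 3 workflow:
    cleaning -> selection -> transformation -> scaling."""
    # 1) Data cleaning: drop duplicates and physiologically impossible rows
    df = df.drop_duplicates()
    df = df[(df["ap_hi"] > 0) & (df["ap_lo"] > 0)]
    # 2) Feature selection: keep a relevant subset (assumed, per Table 1)
    df = df[["age", "ap_hi", "ap_lo", "weight", "height"]].copy()
    # 3) Feature transformation: derive the systolic-diastolic difference
    df["blood_diff"] = df["ap_hi"] - df["ap_lo"]
    # 4) Feature scaling: min-max to [0, 1], per Eq. (1)
    return (df - df.min()) / (df.max() - df.min())

raw = pd.DataFrame({
    "age":    [50, 55, 60, 60],
    "ap_hi":  [120, 140, -150, 160],   # -150 mimics the bad minimum in Table 2
    "ap_lo":  [80, 90, 80, 100],
    "weight": [70, 80, 75, 90],
    "height": [165, 170, 175, 180],
})
clean = feature_engineering_workflow(raw)
print(clean)
```

Running the routine drops the impossible blood-pressure row and leaves every remaining column scaled into [0, 1].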

x_new = (X − min(X)) / (max(X) − min(X))   (1)

where X is the original value, max(X) is the maximum value of the feature in the dataset, and min(X) is the minimum.

For decimal scaling, the equation used is as follows:

x* = x / 10^j   (2)

where j = log10(max(x_i)).   (3)

4.2.2. Feature engineering
Feature engineering maximizes performance by extracting predictive data and choosing ML algorithms based on the feature sets [3]. It uses transform functions, arithmetic, and aggregation operators to create new features, transforming a dataset's feature space to improve predictive modeling [25].

In this study, we selected the most relevant features and created new features to improve the performance of our proposed CNN-LSTM model. Feature engineering includes the steps shown in Fig. 3. The workflow has a few steps:

1. Coefficient analysis to remove highly correlated descriptors.
2. Dimensionality enhancement using basic structural parameters and prototypical functions (such as multiplication).
3. Selection of the top important descriptors to discover a good descriptor subset for material characterization [26].

ML requires laborious feature engineering, which helps DL and improves the learning of text classification rules. Our study incorporates a visual aid to illustrate the feature engineering process systematically. Fig. 3 presents a flowchart outlining the steps in feature engineering. The flowchart provides a clear understanding of the process, starting with data collection, then data cleaning, feature selection, feature transformation, and finally feature scaling. This approach helps to ensure that the most relevant and informative features are selected to improve model performance.

Moreover, we leveraged advanced feature engineering techniques to identify and incorporate crucial features like 'blood_diff,' 'BMI,' 'obese,' and 'hypertension,' significantly enhancing the model's diagnostic capabilities. "blood_diff" indicates the difference between systolic and diastolic blood pressures, while "hypertense" flags hypertension status. These features serve as crucial indicators for cardiovascular health analysis, identifying individuals at risk of hypertension-related cardiovascular conditions. Additionally, the standardized transformation of height and weight known as "BMI" plays a significant role in assessing cardiovascular health and identifying potential obesity-related risks. By incorporating these features, our method enhances CVD detection accuracy and supports early intervention for better patient outcomes.

4.2.3. Feature creation
New features have been created using feature engineering techniques, including blood_diff, Body Mass Index (BMI), obese, and hypertense. Creating new features by combining or interacting existing variables can aid in capturing nonlinear interactions and enhance model precision.

Σ_{n_i ∈ T, f_{n_i} = f} (E_{n_i} / E) · InfoGain(f_{n_i}, E_{n_i})   (4)

When f is used as a splitting feature, it is ascribed its weighted information gain in that node. When used as an argument of a splitting feature, it is credited with the splitting feature's weighted info-gain, divided by the total number of elements in its arguments and discounted accordingly. This demonstrates that the value of a derived feature should not be attributed only to its constituent parts.

4.2.4. Feature transformation
Feature transformation involves converting or modifying raw input data into a more meaningful representation that can be used as input to an ML model and serve as a better indicator.

In this paper, we calculated blood_diff and hypertense status based on the ap_hi and ap_lo features, a type of feature transformation in feature engineering. BMI is a standardized and recognized transformation of the height and weight features, and we determine obesity based on a person's BMI. Several feature transformation methods exist, such as log transformation, Z-normalization, scaled transformation, sine transformation, and square-root transformation [27].

Log transformation translates feature values by applying the natural logarithm, a popular data transformation method: y = ln(1 + x), where y is the transformed feature, x is the original feature, and ln is the natural logarithm. Because ln(x) is defined only for positive numbers, the constant in ln(1 + x) is introduced to handle zero values [28].
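The derived features of Sections 4.2.3–4.2.4 can be sketched in pandas as follows. The thresholds for "obese" (BMI ≥ 30) and "hypertense" (ap_hi ≥ 140 or ap_lo ≥ 90) are standard clinical cut-offs assumed here for illustration; the paper does not state the exact values used.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add blood_diff, BMI, obese and hypertense columns.

    Column names follow Table 1; the obesity and hypertension
    thresholds are common clinical cut-offs assumed for
    illustration, not values stated in the paper.
    """
    out = df.copy()
    # Difference between systolic and diastolic pressure (pulse pressure)
    out["blood_diff"] = out["ap_hi"] - out["ap_lo"]
    # BMI = weight (kg) / height (m)^2; height is stored in cm
    out["BMI"] = out["weight"] / (out["height"] / 100) ** 2
    # Assumed cut-offs: BMI >= 30 -> obese; 140/90 mmHg -> hypertensive
    out["obese"] = (out["BMI"] >= 30).astype(int)
    out["hypertense"] = ((out["ap_hi"] >= 140) | (out["ap_lo"] >= 90)).astype(int)
    return out

sample = pd.DataFrame({"height": [170], "weight": [80], "ap_hi": [150], "ap_lo": [95]})
print(add_engineered_features(sample)[["blood_diff", "BMI", "obese", "hypertense"]])
```

For the sample patient (170 cm, 80 kg, 150/95 mmHg) this yields blood_diff = 55, a BMI of roughly 27.7 (so obese = 0), and hypertense = 1.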
Fig. 4. Architecture of the proposed CNN-LSTM model.
4.2.5. Feature selection
This is the process of picking a subset of the most relevant features from a larger collection for use in a model. Employing too many features can lead to overfitting and poor model performance. Feature selection produces a subset of an original feature set based on a set of selection criteria, picking the pertinent features of a dataset. This study shows which features are the most significant in different algorithms for predicting the disease more accurately.

4.2.6. Feature extraction
The feature extraction process transforms raw data into characteristics that are more relevant and useful for a particular application. Various techniques can be used for feature extraction, including convolutional methods that use manually designed kernels and syntactic and structural methods applied to sequential, geographic, or other structured data [29].

4.2.7. Feature scaling
This involves rescaling features to a comparable range, which can enhance the effectiveness of ML algorithms; feature scaling is one of the most critical transformations we must apply to our data [30]. For the blood_diff feature, we eliminated all values above 80, because such values are uncommon and reduce the effectiveness of the algorithms.

4.2.8. Feature encoding
This is the process of encoding categorical information into numerical form, using techniques such as one-hot encoding or ordinal encoding, so that it may be utilized in an ML model [31].

4.3. Feature importance

In classical ML, the importance of features is crucial in data preprocessing [32]. "Feature importance" refers to methods that grade input features according to how well they predict a target variable. In predictive modeling, feature importance values are essential, as they provide the basis for dimensionality reduction, insight into the data and model, and feature independence testing. A predictive model's effectiveness and efficiency can be increased through feature selection [32,33]. The feature (variable) importance indicates the impact of each feature on the model's prediction; it establishes the usefulness of a particular variable for a given model and forecast. Although numerous methods for determining feature importance exist, they can generally be divided into model-specific and model-agnostic. Numerous model-agnostic techniques have been created to increase the openness, dependability, and interoperability of ML models. Interpreting feature relevance in supervised learning models can be difficult because attributes are interdependent. Such dependencies are disregarded by permutation feature importance (PFI), leading to incorrect extrapolation-based conclusions. More sophisticated conditional PFI techniques permit the evaluation of feature relevance conditioned on all other features; this change in perspective is advantageous when the conditioning is apparent and understood, and it permits accurate interpretations. The target outcome (whether cardiac disease is present) is the same across datasets, despite variations in the numbers and types of features. Not all information is equally essential for predicting heart disease; some features are more useful because of their relevance [3]. Using ML techniques, the significance of each feature was determined, and the features were sorted according to their feature scores. In our study of CVD identification using a hybrid CNN-LSTM model with explainable AI, feature importance analysis plays a crucial role. We performed individual feature importance analyses for each algorithm used in our study. Random Forest and Decision Tree algorithms provided feature scores, allowing us to determine the significance of each feature. However, KNN, SVM, and MLP algorithms did not generate feature scores [34], so we relied on other techniques to assess feature importance in those cases. Our proposed algorithm for CVD identification identifies ap_hi, ap_lo, and age as the top three features without feature engineering. With feature engineering, the top three features are age, ap_hi, and hypertense. These are the most significant features across the feature analyses of the various algorithms.

4.4. Proposed CNN-LSTM model

The proposed CNN-LSTM model consists of two main components: a CNN for feature extraction and an LSTM for sequence modeling. The CNN component takes in the raw input data and extracts relevant features by applying convolutional filters. These filters capture local patterns in the data and can learn to identify important features [35]. The output of the CNN component is then fed into the LSTM component,

M.M. Hossain et al. Informatics in Medicine Unlocked 42 (2023) 101370

Fig. 5. Heatmap showing correlations among all the features of the dataset.

which processes the sequence of feature vectors to model temporal dependencies and capture long-term context [36]. The LSTM network contains memory cells that store information over time and gates that control the flow of information into and out of the cells. This allows the model to learn and remember patterns in the data that occur over long periods of time [37]. The final output of the model can be either a classification label or a prediction of the next value in the sequence, depending on the task at hand. The model is trained using back-propagation through time, which allows the gradients to flow through the entire sequence and update the model's parameters. The architecture of the model we proposed in this study is shown in Fig. 4.

The proposed CNN-LSTM model performs sequential data processing. Firstly, the CNN component extracts features automatically from the raw data, reducing the need for manual feature engineering. Secondly, the LSTM component can capture long-term dependencies and context, allowing the model to make more accurate predictions or classifications. Finally, the model can be trained end-to-end using back-propagation through time, making it more efficient and effective than traditional models that require separate feature extraction and sequence modeling steps.

In our study, we proposed a hybrid CNN-LSTM DL model for analyzing the dataset that leverages the strengths of both architectures. This allows us to capture spatial and temporal dependencies in the data effectively. Our approach fills a gap in the literature, as no previous work applies this hybrid model to the dataset. The integration of CNN and LSTM enables us to extract complex patterns and capture long-term dependencies, resulting in superior performance compared to existing methods.

In our proposed model for the CSV dataset, adjusting the CNN parameters (number of layers, filter size, stride, pooling size), LSTM parameters (number of layers, units, dropout rate), training parameters (learning rate, batch size, number of epochs), feature engineering (feature selection, scaling), and model architecture (skip connections, attention mechanisms) can improve accuracy. Hyperparameter optimization methods like grid search or Bayesian optimization are essential for finding optimal combinations. Thorough experimentation and validation are crucial for achieving the best performance.

4.5. SHAP as XAI

SHAP (SHapley Additive exPlanations) is a well-known Explainable Artificial Intelligence (XAI) technique that describes ML model outcomes. It provides a method for measuring the impact of each model feature on the output for a specific data point. This can help to understand why a model made a particular prediction and which features were most influential in producing that prediction [38]. By utilizing SHAP, insights into a model's predictions can be obtained, allowing biases to be uncovered and rectified, performance to be enhanced, and end-user confidence to be created. First, the SHAP values are calculated to reveal the most insightful features, which guide the subsequent steps [39]. The SHAP value can be determined via the formula below:

φ_i(f, x) = Σ_{z′ ⊆ x′} [ |z′|! (M − |z′| − 1)! / M! ] [ f_x(z′) − f_x(z′ \ i) ]   (5)

Where,

φ_i = Shapley value for feature i
f = black-box model
x = input data point
x′ = simplified data input
z′ = a subset of x′
M = number of input features
f_x(z′) = model prediction with feature i included
f_x(z′ \ i) = model prediction with feature i withheld

It calculates the contribution of a single characteristic by considering all potential subsets of features that include that feature. The contribution is then weighted by the number of subgroups containing the feature and averaged over all potential subsets. This produces a single SHAP value for each feature, which may be utilized to explain the model prediction for a specific instance.
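The hybrid model described in Section 4.4 can be sketched in Keras along the following lines. The layer sizes (32 convolutional filters, 64 LSTM units, 0.3 dropout) and the treatment of each 11-feature record as a one-dimensional sequence are illustrative assumptions, not the exact configuration used in this study.

```python
# Minimal sketch of a hybrid CNN-LSTM classifier for tabular CVD records.
# Layer sizes, filter counts, and the 11-feature input shape are
# illustrative assumptions, not the paper's exact configuration.
from tensorflow.keras import layers, models

def build_cnn_lstm(n_features: int = 11) -> models.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),   # each record as a 1-D sequence
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),      # CNN extracts local patterns
        layers.LSTM(64),                       # LSTM models dependencies across the feature sequence
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"), # binary output: CVD present or not
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn_lstm()
model.summary()
```

Trained with `model.fit` on the preprocessed feature matrix, the convolutional and recurrent parameters are updated jointly, reflecting the end-to-end training described above.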


Fig. 7. Graphical representation of K-fold cross-validation.

Fig. 6. KDE plot representing diseased and non-diseased patients based on the age distribution.

5. Results

5.1. Experiment setup

This study employed a conventional laptop with Windows 10, an Intel Core i7 processor, and 16 GB of RAM. The experiment was conducted five times, and the ultimate outcome was determined by taking an average of all five computational results.

5.2. Performance evaluation

The dataset underwent evaluation by eleven classification algorithms, which were then compared using 5-fold cross-validation to determine the best approach based on accuracy and other statistical factors. The algorithms tested included GNB, SVM, DT, LR, CatBoost, AdaBoost, GBM, RF, MLP, XGBoost, and a novel hybrid CNN-LSTM model. The evaluation metrics are used to measure each algorithm's performance.

To assess the effectiveness of the various algorithms employed in the experiment, several statistical measures were utilized, including kappa statistics, precision, recall, f-measure, Matthew's correlation coefficient (MCC), receiver operating characteristic (ROC), and precision-recall (PRC). These evaluation metrics employ the terms true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP and TN refer to successfully recognized samples; meanwhile, FP and FN belong to instances that the models wrongly identified. These measures were used to compare the effectiveness of the algorithms and ensure the reliability and accuracy of the experiment's outcomes [3]. The formulas for these metrics are as follows:

Recall = TP / (TP + FN)   (6)

F1-score = 2 × (Precision × Recall) / (Precision + Recall)   (7)

Precision = TP / (TP + FP)   (8)

Sensitivity = TP / (TP + FN)   (9)

Specificity = TN / (TN + FP)   (10)

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (11)

5.3. Exploratory data analysis

In order to gain a better understanding of the dataset's characteristics, exploratory data analytics were performed. The results of these analyses are presented in the following section. Fig. 5 is a heatmap depicting the inter-feature correlations and their associated values. Each colored cell shows the correlation between two characteristics; the cell color indicates the strength of the connection, with negative values represented by a distinct color and zero indicating no correlation between variables.

In addition, Fig. 6 depicts the density distribution of patients with and without illness. According to the dataset, the afflicted group largely consists of patients aged 50 to 60. The graph indicates that age is a significant risk factor for heart disease and that the likelihood of developing the condition grows with age.

5.4. Experimental results

In our study, many classification techniques, including GNB, SVM, DT, LR, CatBoost, AdaBoost, GBM, MLP, XGBoost, and CNN-LSTM, were applied to our dataset; the features were processed by applying feature engineering (FE) techniques, and outliers were found and removed. These classification algorithms were applied to the dataset using 5-fold cross-validation, whose operation is shown in Fig. 7.

Table 3 displays each classification method's sensitivity, specificity, and accuracy with and without FE. Without FE, the CatBoost, RF, and CNN-LSTM models give the highest accuracy; XGBoost, GBM, and CNN-LSTM give the highest sensitivity, whereas CNN-LSTM, RF, and AdaBoost give the highest specificity. The results show that MLP, XGBoost, and CNN-LSTM achieved the highest accuracy after applying FE. With FE, CatBoost, GBM, and MLP achieved the highest sensitivity, while RF, AdaBoost, and CNN-LSTM achieved the highest specificity. However, our proposed model outperformed the other algorithms regarding overall performance.

Table 4 reveals that LR, SVM, and GNB under-performed compared to other algorithms when precision, recall, and f-measures were evaluated without FE. However, when FE was applied, DT, GBM, and CatBoost exhibited poor performance. On the other hand, RF, AdaBoost, and GNB showed exceptionally high performance without FE. With FE, CNN-LSTM, AdaBoost, and SVM demonstrated high performance.

The results of kappa statistics and MCC values for the various classification systems, with and without FE, are presented in Table 5. The findings indicate that our proposed model outperforms the other most effective and valuable algorithms. Table 6 displays each classification method's area under ROC and PRC values.

The area under ROC represents a common region between the TP and FP rates. In contrast, the area under PRC indicates a common area between precision and recall. Even though GNB, SVM, and LR get the closest results, AdaBoost, MLP, and CNN-LSTM perform better without
closest results, AdaBoost, MLP, and CNN-LSTM perform better without


Table 3
Classification results of the different classification algorithms regarding Sensitivity, Specificity, and Accuracy.
Algorithm Sensitivity Specificity Accuracy

Without FE With FE Without FE With FE Without FE With FE

GNB 56.12% 69.80% 71.86% 75.79% 59.70% 72.16%


SVM 62.06% 69.48% 64.27% 76.79% 63.09% 72.27%
DT 71.07% 72.20% 74.22% 72.83% 72.55% 72.48%
LR 64.14% 71.84% 64.34% 75.33% 64.24% 73.32%
CatBoost 71.14% 73.48% 75.31% 73.64% 73.06% 73.55%
AdaBoost 69.01% 71.42% 78.47% 76.85% 72.84% 73.62%
GBM 72.01% 73.92% 73.69% 73.78% 72.83% 73.86%
RF 70.67% 72.43% 76.78% 75.96% 73.36% 73.93%
MLP 71.32% 74.21% 73.34% 73.90% 72.29% 74.06%
XGBOOST 72.02% 73.35% 73.48% 75.06% 72.74% 74.11%
Proposed CNN-LSTM 71.17% 72.04% 76.39% 77.11% 73.52% 74.15%

Table 4
Comparison based on precision, recall, and F-Measures.
Classifier Algorithm Precision Recall F-Measure

Without FE With FE Without FE With FE Without FE With FE

GNB 87.15% 81.59% 56.12% 69.80% 68.28% 75.23%


SVM 66.46% 82.91% 62.06% 69.48% 64.19% 75.60%
DT 75.63% 76.27% 71.07% 72.20% 73.28% 74.18%
LR 63.83% 79.80% 64.14% 71.84% 63.99% 75.61%
CatBoost 77.16% 76.62% 71.14% 73.48% 74.03% 75.02%
AdaBoost 82.46% 81.87% 69.01% 71.42% 75.14% 76.29%
GBM 74.26% 76.57% 72.01% 73.92% 73.12% 75.22%
Random Forest 79.45% 80.26% 70.67% 72.43% 74.80% 76.14%
MLP 74.14% 76.57% 71.32% 74.21% 72.70% 75.37%
XGBoost 73.93% 78.61% 72.02% 73.35% 72.97% 75.89%
Proposed CNN-LSTM 78.64% 81.82% 71.17% 72.04% 74.72% 76.62%

FE. Using FE, MLP, SVM, and GNB get the closest results, while GBM, XGBoost, and CNN-LSTM demonstrate superior performance.

Fig. 8a and b depict the ROC curves of the different classification algorithms with and without FE, constructed from the true and false positive rates; they are a graphical depiction of Table 6's area under ROC (AUROC). Fig. 9a and b illustrate the area under the precision-recall curve (AUPRC) for the classification methods, with and without FE, respectively. The AUPRC values agree with the values reported in Table 6, and these figures provide a visual representation of them. Table 7 displays the five most relevant characteristics based on correlation value and feature relevance.

According to the table, high arterial pressure (ap_hi) is the most significant trait for identifying and predicting heart disease, with and without FE, across the various classification algorithms. In addition to age, the level of cholesterol in the blood (cholesterol), whether the patient is underweight, overweight, or normal (BMI), and the alcohol level in the patient's blood (alco) are other important predictors of heart disease.

Table 8 displays the applicable classification algorithms' feature importance and coefficient scores, excluding SVM, MLP, and GNB, which do not produce such values. The importance and coefficients of each feature are presented in the table. Fig. 12a and b visually represent the important features of our proposed CNN-LSTM model as presented in Table 8. The figures illustrate the feature ranking based on significance and coefficient scores, providing insight into the significant risk factors for CVD.

Without FE, the classifiers had an accuracy range of 59.70%-73.52%. However, when utilizing the same splitting with FE, the

Table 5
Evaluation by kappa and MCC.
Classifier Algorithm Kappa MCC
Without FE With FE Without FE With FE
GNB 19.61% 43.89% 23.46% 44.59%
SVM 26.21% 44.06% 26.27% 44.98%
Decision Tree 45.12% 45.12% 45.21% 44.85%
Logistic Regression 28.48% 46.35% 28.48% 46.66%
CatBoost 46.13% 46.94% 46.30% 46.99%
AdaBoost 45.73% 46.88% 46.62% 47.44%
GBM 45.66% 47.58% 45.69% 47.61%
RF 46.76% 47.58% 47.12% 47.88%
MLP 44.59% 48.00% 44.63% 48.02%
XGBoost 45.48% 48.01% 45.49% 48.14%
Proposed CNN-LSTM 47.07% 47.97% 47.33% 48.41%

Table 6
Value of area under ROC and PRC.
Classifier Algorithm AUROC AUPRC
Without FE With FE Without FE With FE
GNB 0.5982 0.7160 0.6733 0.76254
SVM 0.6311 0.7060 0.6935 0.7346
DT 0.7256 0.7234 0.7653 0.7593
LR 0.6423 0.7213 0.7726 0.7592
CatBoost 0.7307 0.7342 0.7858 0.7801
AdaBoost 0.7288 0.7292 0.7766 0.7717
GBM 0.7283 0.7373 0.7886 0.7828
RF 0.7326 0.7370 0.7894 0.7804
MLP 0.7273 0.5273 0.7787 0.6081
XGBoost 0.7274 0.7383 0.7862 0.7829
Proposed CNN-LSTM 0.7344 0.7395 0.7873 0.7829


Fig. 8. ROC curve analysis obtained from the proposed hybrid CNN-LSTM model.

Fig. 9. PRC curve analysis obtained from the proposed hybrid CNN-LSTM model.

Table 7
Top five features for heart disease according to applied algorithms.
Feature Ranking 1st 2nd 3rd 4th 5th

DT With FE blood_diff ap_hi ap_lo cholesterol BMI


Without FE ap_hi age cholesterol alco gluc
CatBoost With FE ap_hi cholesterol ap_lo active BMI
Without FE ap_hi age cholesterol ap_lo weight
AdaBoost With FE ap_hi cholesterol BMI active weight
Without FE age ap_hi weight cholesterol ap_lo
GBM With FE ap_hi cholesterol BMI ap_lo weight
Without FE ap_hi age cholesterol weight ap_lo
RF With FE ap_hi ap_lo blood_diff cholesterol BMI
Without FE ap_hi ap_lo age cholesterol weight
XGBoost With FE ap_hi cholesterol ap_lo active smoke
Without FE ap_hi age cholesterol ap_lo active
Proposed CNN-LSTM With FE age ap_hi hypertense ap_lo blood_diff
Without FE ap_hi ap_lo age cholesterol weight
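Rankings such as those in Tables 7 and 8 can be derived from the feature_importances_ attribute of tree-based scikit-learn models. The sketch below uses synthetic data in place of the cardiovascular dataset; the column names follow the paper's features.

```python
# Sketch of obtaining the per-feature scores behind Tables 7 and 8 from a
# tree-based model. Synthetic data stands in for the cardiovascular dataset;
# the column names follow the paper's feature set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

features = ["age", "gender", "height", "weight", "ap_hi", "ap_lo",
            "cholesterol", "gluc", "smoke", "alco", "active"]

X, y = make_classification(n_samples=1000, n_features=len(features),
                           random_state=42)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Sort features by importance score, highest first (cf. Table 7's ranking).
ranking = sorted(zip(features, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.4f}")
```

For models without this attribute (e.g. SVM or MLP, as noted in Section 4.3), model-agnostic techniques such as permutation importance or SHAP values serve the same purpose.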

classifiers' accuracy ranged from 72.16% to 74.15%. Additionally, we evaluated our proposed CNN-LSTM classifier alongside all the ML classifiers used in our experiment. Table 9 presents the results obtained with and without FE.

6. Discussions

The results presented in Table 9 show that our proposed CNN-LSTM model outperformed all other classification algorithms regarding accuracy, precision, recall, and F1-score, with and without FE. The accuracy of the proposed model was 73.89% without FE, which increased to
74.95% with FE. Moreover, the proposed model’s precision, recall, and


Table 8
Feature importance and coefficient scores of different applied algorithms.
Algorithm DT CatBoost AdaBoost GBM RF XGBoost Proposed CNN-LSTM

age Without FE 0.1031 24.0725 0.22 0.1233 0.1253 0.1472 0.0363


With FE 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gender Without FE 0.0013 0.0 0.0 0.0010 0.0050 0.0184 0.0184
With FE 0.0 0.5600 0.0 0.0003 0.0042 0.0047 0.0047
height Without FE 0.0010 0.2378 0.01 0.0050 0.0307 0.0046 0.0046
With FE 0.0 0.3025 0.005 0.0026 0.0225 0.0033 0.0033
weight Without FE 0.0042 5.3016 0.15 0.0230 0.0307 0.0288 0.0288
With FE 0.0096 1.4251 0.07 0.0089 0.0448 0.0117 0.0117
ap_hi Without FE 0.7926 42.7182 0.23 0.7345 0.4731 0.5227 0.5226
With FE 0.2856 65.4973 0.23 0.8060 0.3503 0.6932 0.6932
ap_lo Without FE 0.0034 7.1226 0.075 0.0202 0.1785 0.0681 0.0681
With FE 0.5394 3.2506 0.065 0.0171 0.1763 0.0411 0.0411
cholesterol Without FE 0.0798 17.7410 0.135 0.0745 0.0930 0.1120 0.1193
With FE 0.0746 19.7296 0.165 0.1008 0.0945 0.1295 0.1295
gluc Without FE 0.0072 0.0 0.05 0.0061 0.0142 0.0119 0.0119
With FE 0.0 2.2176 0.065 0.0073 0.0144 0.0166 0.0166
smoke Without FE 0.0000 1.0115 0.03 0.0023 0.0051 0.0250 0.0250
With FE 0.0 0.8309 0.045 0.0037 0.0049 0.0187 0.0187
alco Without FE 0.0080 0.0 0.035 0.0025 0.0049 0.0172 0.0172
With FE 0.0 0.6074 0.055 0.0027 0.0041 0.0139 0.0139
active Without FE 0.0064 1.7944 0.065 0.0073 0.0089 0.0363 0.0363
With FE 0.0008 2.8462 0.075 0.0085 0.0084 0.0342 0.0342

Table 9
Results comparison of different ML algorithms against our proposed CNN-LSTM classifier using a 5-fold cross-validation technique.
Algorithms 1st fold C (%) 2nd fold C (%) 3rd fold C (%) 4th fold C (%) 5th fold C (%) CV Means (%) CV STD (%)

Without With Without With Without With Without With Without With Without With Without With
FE FE FE FE FE FE FE FE FE FE FE FE FE FE

GNB 0.596 0.715 0.595 0.716 0.587 0.713 0.593 0.712 0.594 0.711 0.593 0.713 0.003 0.0019
SVM 0.677 0.483 0.689 0.480 0.679 0.489 0.689 0.489 0.685 0.491 0.684 0.486 0.0051 0.0043
DT 0.727 0.727 0.732 0.732 0.733 0.733 0.727 0.727 0.723 0.723 0.729 0.728 0.0036 0.0037
LR 0.714 0.719 0.725 0.724 0.718 0.720 0.729 0.723 0.718 0.7111 0.721 0.720 0.0055 0.0044
CatBoost 0.731 0.727 0.739 0.735 0.733 0.734 0.732 0.732 0.729 0.727 0.733 0.731 0.0033 0.0031
AdaBoost 0.729 0.508 0.731 0.732 0.729 0.729 0.726 0.727 0.725 0.491 0.728 0.638 0.0024 0.1127
GBM 0.732 0.509 0.743 0.739 0.735 0.734 0.737 0.734 0.733 0.648 0.736 0.672 0.0040 0.0885
RF 0.527 0.676 0.738 0.736 0.736 0.731 0.73 0.731 0.716 0.730 0.690 0.721 0.082 0.0222
MLP 0.728 0.518 0.742 0.492 0.732 0.491 0.734 0.659 0.729 0.509 0.733 0.534 0.0048 0.0635
XGBoost 0.731 0.738 0.743 0.740 0.734 0.732 0.737 0.735 0.734 0.732 0.736 0.734 0.0039 0.0041
Proposed CNN-LSTM 0.736 0.742 0.743 0.746 0.736 0.739 0.732 0.738 0.734 0.744 0.746 0.749 0.0042 0.0040

Fig. 10. Accuracy curve analysis of our proposed hybrid CNN-LSTM model.

F1-score were 74.02%, 73.72%, and 73.84%, respectively, without FE, which improved to 75.09%, 75.22%, and 75.15%, respectively, with FE. These results indicate that the proposed CNN-LSTM model predicts CVD risk more effectively when FE is applied. The study conducted an evaluation of the proposed CNN-LSTM model, comparing its performance with and without FE. Figs. 10a and 11a illustrate the outcomes of the model without FE, indicating a test accuracy of 73.52% and the convergence of training and validation loss


Fig. 11. Model loss curve analysis of our proposed hybrid CNN-LSTM model.

Fig. 12. Important features according to SHAP using our proposed CNN-LSTM model.
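For a model with only a few features, Eq. (5) can be evaluated exactly by enumerating every subset. The toy example below does this for a hypothetical three-feature linear model, replacing "absent" features with background values (a common convention assumed here); averaging the absolute values of such Shapley values across a dataset yields global rankings like the one shown in Fig. 12.

```python
# Exact Shapley values per Eq. (5) for a toy 3-feature model, enumerating
# every subset z' of the remaining features. Missing features are replaced
# by background values (a baseline convention assumed here, not taken from
# the paper).
from itertools import combinations
from math import factorial
import numpy as np

def model(x):  # stand-in black-box model f
    return 2.0 * x[0] + 1.0 * x[1] - 0.5 * x[2]

background = np.zeros(3)        # values used when a feature is "absent"
x = np.array([1.0, 1.0, 1.0])   # instance being explained
M = 3                           # number of features

def f_masked(subset):
    """Model output with only the features in `subset` taken from x."""
    z = background.copy()
    z[list(subset)] = x[list(subset)]
    return model(z)

phi = np.zeros(M)
for i in range(M):
    others = [j for j in range(M) if j != i]
    for k in range(M):
        for s in combinations(others, k):
            # Weight |z'|!(M - |z'| - 1)!/M! from Eq. (5).
            weight = factorial(len(s)) * factorial(M - len(s) - 1) / factorial(M)
            phi[i] += weight * (f_masked(s + (i,)) - f_masked(s))

print(phi)  # for this linear model the values equal the weighted inputs: 2.0, 1.0, -0.5
```

The values sum to the difference between the model's prediction for x and its prediction for the background, which is the additivity property that makes SHAP explanations consistent. Real libraries approximate this enumeration, since the number of subsets grows exponentially with M.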

Fig. 13. The confusion matrix of our proposed CNN-LSTM without FE, using 80% of the data for training and 20% for testing.
Fig. 14. The confusion matrix of our proposed CNN-LSTM with FE, using 80% of the data for training and 20% for testing.
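Confusion matrices such as those in Figs. 13-15 can be produced from an 80/20 train-test split as sketched below; the random-forest classifier and the synthetic data are placeholders for the actual models and the cardiovascular dataset.

```python
# Sketch of deriving the confusion matrices of Figs. 13-15 from a fitted
# classifier and an 80/20 train-test split. The random forest and synthetic
# data stand in for the paper's models and dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))
tn, fp, fn, tp = cm.ravel()  # sklearn's binary layout: [[TN, FP], [FN, TP]]
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"accuracy = {(tp + tn) / cm.sum():.4f}")  # Eq. (11)
```

The TP and TN counts read off this matrix are the quantities the Discussion compares between the with-FE and without-FE models.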


Fig. 15. The confusion matrices of all the evaluated classifiers on the dataset before and after feature engineering (FE), respectively.


Table 10
Comparison between the state-of-the-art approaches and our proposed CNN-LSTM model on the same dataset.
Study Precision Recall F-Measure Kappa MCC AUROC Sensitivity Specificity Accuracy

[1] – – 72% – – – – – 72.70%


Our study 81.82% 72.04% 76.62% 47.97% 48.41% 73.95% 72.04% 77.11% 74.15%

after the 55th epoch. Subsequently, Figs. 10b and 11b present the network's performance on training and validation data with FE.

These figures provide a comprehensive overview of accuracy and loss for each epoch. With FE, the CNN-LSTM model exhibited convergence at the 82nd epoch, achieving the highest validation score. The model demonstrated a classification performance of 74.15% on unseen test data not encountered during training. These results indicate that FE enhances the CNN-LSTM model's performance, improving accuracy and convergence. The findings emphasize incorporating FE techniques in the proposed model for enhanced predictive capabilities. The significance of individual features in a model can be determined by analyzing their absolute Shapley values. The absolute Shapley values for each feature are averaged across the dataset to obtain a global understanding of feature importance. Afterwards, the features are sorted in descending order of importance, and a plot is generated to visualize the results. In this context, the proposed CNN-LSTM model was utilized to predict CVD. Fig. 12a depicts the Shapley values for important features without employing FE techniques; the plot indicates that the most significant feature is ap_hi. Conversely, Fig. 12b shows the important features after applying FE techniques, and the plot reveals that age is the most significant feature, followed by ap_hi.

In our study, we utilized the confusion matrix to compare the performance of the proposed CNN-LSTM model with and without FE. Fig. 13 displays the confusion matrix obtained from the CNN-LSTM model without FE. The confusion matrix shows that the model correctly predicted 5479 positive instances (TP), while the TN count was 4814. Fig. 14 displays the confusion matrix obtained from the CNN-LSTM model with FE, showing that the model correctly predicted 6008 positive instances, while the TN count was 4018. Overall, adding FE led to an increase in the number of accurate predictions, indicating that the model could better identify cases of heart disease.

Fig. 15 displays the confusion matrices for the dataset. The dataset was divided into training and testing splits, with 80% of the data used for training and 20% used to assess the performance of the various classifiers.

The confusion matrix provides a valuable visual representation of the model's performance and can be used to guide further improvements to the model. Furthermore, it is worth noting that some classification algorithms, such as RF, AdaBoost, and XGBoost, achieved high accuracy and F1-score with FE. However, their performance decreased without FE, indicating that FE is crucial for these algorithms' performance.

On the other hand, MLP and CatBoost achieved high accuracy and F1-score with and without FE, indicating their robustness to FE. Our proposed CNN-LSTM model performs better in predicting cardiovascular disease (CVD) risk than the other classification algorithms. One recent study [1] on the same dataset achieved an accuracy of 72.7%, while our model demonstrates an accuracy of 74.15% in diagnosing early-stage cardiac conditions. This represents an improvement of 1.45% over the previous state-of-the-art approach. A comparison between the state-of-the-art approaches and our proposed CNN-LSTM model on the same dataset is shown in Table 10. However, in that study, the authors did not employ explainable AI or feature engineering techniques to visualize the specific features responsible for the disease. We utilized these, which underscores the uniqueness of our study.

However, other algorithms such as RF, AdaBoost, XGBoost, MLP, and CatBoost also achieved high accuracy and F1-score with FE. Therefore, FE is an essential step in improving the performance of these algorithms. Moreover, our study addressed these research gaps and limitations by utilizing a hybrid CNN-LSTM model, incorporating a diverse dataset, and employing XAI and feature engineering techniques. These contributions enhance the development of effective and efficient ML models for CVD prediction, providing valuable insights for future research and clinical applications.

However, some limitations of this study should be acknowledged. First, the proposed model was evaluated on a single dataset, and its performance should be validated on other datasets. Second, the model was trained using a supervised learning approach, which requires a large amount of labeled data. Obtaining labeled data can be challenging, and alternative approaches, such as semi-supervised or unsupervised learning, may be necessary.

7. Conclusions

In conclusion, our study makes significant strides in the accurate prediction of heart disease using data mining and machine learning techniques. Through the evaluation of various ML algorithms, we have demonstrated that the CNN-LSTM hybrid model emerges as the most effective algorithm, achieving a commendable accuracy of 74.15%. This finding highlights the potential of advanced ML models in the realm of cardiovascular disease (CVD) prediction, providing a valuable tool for early intervention and prevention. Moreover, we employed the SHAP technique to interpret the importance of features in the models, offering crucial insights into the underlying mechanisms of disease prediction. By identifying key features such as age, systolic blood pressure, and cholesterol levels as essential contributors to heart disease prediction, we have deepened our understanding of the factors influencing CVD development. This valuable information can potentially aid clinicians in devising personalized treatment plans and risk management strategies for patients, ultimately leading to improved clinical outcomes. Our study addresses several research gaps and limitations by leveraging a hybrid CNN-LSTM model, utilizing a diverse dataset, and incorporating feature engineering techniques. These contributions enhance the effectiveness and efficiency of ML models for CVD prediction, positioning them as promising tools in the domain of cardiovascular health management. However, we acknowledge that further research is warranted to validate our findings on larger and more diverse datasets. Continued exploration of ML techniques and their clinical applications in predicting and preventing heart disease will undoubtedly strengthen the field's knowledge base and enable better-informed decision-making for medical practitioners.


the corresponding author upon request. [15] Sajja TK, Kalluri HK. A deep learning method for prediction of cardiovascular
disease using convolutional neural network. Rev. d’Intelligence Artif. 2020;34(5):
601–6.
[16] Miao KH, Miao JH. Coronary heart disease diagnosis using deep neural networks.
Declaration of competing interest Int J Adv Comput Sci Appl 2018;9(10).
[17] Junejo A, et al. Notice of retraction: molecular diagnostic and using deep learning
techniques for predict functional recovery of patients treated of cardiovascular
The authors declare that they have no known competing financial disease. IEEE Access 2019;7:120315–25.
interests or personal relationships that could have appeared to influence [18] Farzadfar F. Cardiovascular disease risk prediction models: challenges and
the work reported in this paper. perspectives. Lancet Global Health 2019;7(10):e1288–9.
[19] Thabtah F, et al. Data imbalance in classification: experimental evaluation. Inf Sci
2020;513:429–41.
Acknowledgement [20] Islam MK, et al. Brain tumor detection in MR image using superpixels, principal
component analysis and template based K-means clustering algorithm. Machine
Learning with Applications 2021;5:100044.
We would like to acknowledge the support provided by the Bio- [21] Ahsan MM, et al. Deep transfer learning approaches for Monkeypox disease
Imaging Research Lab, Department of Biomedical Engineering, Islamic diagnosis. Expert Syst Appl 2023;216:119483.
[22] Al-Rawahnaa ASM, Al Hadid AYB. Data mining for Education Sector, a proposed
University, Kushtia 7003, Bangladesh, in carrying out our research concept. Journal of Applied Data Sciences 2020;1(1):1–10.
successfully. [23] Ahsan MM, et al. Monkeypox diagnosis with interpretable deep learning. IEEE
Access; 2023.
[24] Henderi H, Wahyuningsih T, Rahwanto E. Comparison of min-max normalization
References and Z-score normalization in the K-nearest neighbor (kNN) algorithm to test the
accuracy of types of breast cancer. Int J Intell Inf Syst 2021;4(1):13–20.
[1] Kedia V, et al. Time efficient IOS application for CardioVascular disease prediction [25] Lou R, et al. Automated detection of radiology reports that require follow-up
using machine learning. In: 2021 5th international conference on computing imaging using natural language processing feature engineering and machine
methodologies and communication (ICCMC). IEEE; 2021. learning classification. J Digit Imag 2020;33:131–6.
[2] Omar S, Mohamed N, Elbendary N. A cardiovascular disease prediction using [26] Dai D, et al. Using machine learning and feature engineering to characterize
machine learning algorithms. In: The international undergraduate research limited material datasets of high-entropy alloys. Comput Mater Sci 2020;175:
conference. The Military Technical College; 2021. 109618.
[3] Ali MM, et al. Heart disease prediction using supervised machine learning [27] Amin A, et al. Customer churn prediction in telecommunication industry using data
algorithms: performance analysis and comparison. Comput Biol Med 2021;136: certainty. J Bus Res 2019;94:290–301.
104672. [28] Amin A, et al. Cross-company customer churn prediction in telecommunication: a
[4] Arunachalam S. Cardiovascular disease prediction model using machine learning comparison of data transformation methods. Int J Inf Manag 2019;46:304–19.
algorithms. Int J Res Appl Sci Eng Technol 2020;8:1006–19. [29] Ali MS, et al. Alzheimer’s disease detection using m-random forest algorithm with
[5] Ali MS, et al. An enhanced technique of skin cancer classification using deep optimum features extraction. In: 2021 1st international conference on artificial
convolutional neural network with transfer learning models. Machine Learning intelligence and data analytics (CAIDA). IEEE; 2021.
with Applications 2021;5:100036. [30] Géron A. Hands-on machine learning with scikit-learn, keras, and TensorFlow.
[6] Rubini P, et al. A cardiovascular disease prediction using machine learning O’Reilly Media, Inc.; 2022.
algorithms. Annals of the Romanian Society for Cell Biology 2021:904–12. [31] Dahouda MK, Joe I. A deep-learned embedding technique for categorical features
[7] Shah D, Patel S, Bharti SK. Heart disease prediction using machine learning encoding. IEEE Access 2021;9:114381–91.
techniques. SN Computer Science 2020;1:1–6. [32] Baughman A, et al. Study of feature importance for quantum machine learning
[8] Jagtap A, et al. Heart disease prediction using machine learning. International models. 2022. arXiv preprint arXiv:2202.11204.
Journal of Research in Engineering, Science and Management 2019;2(2):352–5. [33] Hind M, et al. TED: teaching AI to explain its decisions. In: Proceedings of the 2019
[9] Motarwar P, et al. Cognitive approach for heart disease prediction using machine AAAI/ACM conference on AI. Ethics, and Society; 2019.
learning. In: 2020 international conference on emerging trends in information [34] Molnar C, et al. Model-agnostic feature importance and effects with dependent
technology and engineering (ic-ETITE). IEEE; 2020. features: a conditional subgroup approach. Data Min Knowl Discov 2023:1–39.
[10] Kavitha M, et al. Heart disease prediction using hybrid machine learning model. In: [35] Torres JF, et al. Deep learning for time series forecasting: a survey. Big Data 2021;9
2021 6th international conference on inventive computation technologies (ICICT). (1):3–21.
IEEE; 2021. [36] Khorram A, Khalooei M, Rezghi M. End-to-end CNN+ LSTM deep learning
[11] Katarya R, Meena SK. Machine learning techniques for heart disease prediction: a approach for bearing fault diagnosis. Appl Intell 2021;51:736–51.
comparative study and analysis. Health Technol 2021;11:87–97. [37] Muruganandam NS, Arumugam U. Seminal stacked long short-term memory (SS-
[12] Krittanawong C, et al. Machine learning prediction in cardiovascular diseases: a LSTM) model for forecasting particulate matter (PM2. 5 and PM10). Atmosphere
meta-analysis. Sci Rep 2020;10(1):16057. 2022;13(10):1726.
[13] Kumar NK, et al. Analysis and prediction of cardio vascular disease using machine [38] Das A, Rad P. Opportunities and challenges in explainable artificial intelligence
learning classifiers. In: 2020 6th international conference on advanced computing (xai): a survey. 2020. arXiv preprint arXiv:2006.11371.
and communication systems (ICACCS). IEEE; 2020. [39] Islam MK, et al. Enhancing lung abnormalities detection and classification using a
[14] Nikhar S, Karandikar A. Prediction of heart disease using machine learning Deep Convolutional Neural Network and GRU with explainable AI: a promising
algorithms. International Journal of Advanced Engineering, Management and approach for accurate diagnosis. Machine Learning with Applications; 2023,
Science 2016;2(6):239484. 100492.

