
Manuscript

This study compares five machine learning algorithms—Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Logistic Regression (LR), Decision Tree (DT), and Naive Bayes (NB)—for classifying cardiovascular disease using a dataset from Kaggle. The research highlights the challenges posed by high-dimensional and imbalanced data in accurately diagnosing heart disease and concludes that SVM and LR are the most effective methods for this classification task. The paper emphasizes the importance of machine learning in improving medical decision-making and patient diagnosis in healthcare.


Machine Learning Algorithms for The Classification

of Cardiovascular Disease: A Comparative Study

Wada Mohammed Jinjri, *Pantea Keikhosrokiani, Nasuha Lee Abdullah
School of Computer Sciences, Universiti Sains Malaysia
Minden, 11800 Penang, Malaysia

Abstract: Heart disease (cardiovascular disease) is a major human disorder that significantly affects many people's lives. Diagnosing heart disease at an early stage is an important step toward reducing its burden. Machine learning methods remain the most widely used for the classification and detection processes. This work aims to identify the model that best classifies cardiovascular disease and most accurately predicts the presence or absence of the disease in patients. To that end, this paper compares five widely used machine learning classifiers on cardiovascular disease (CVD) data: support vector machine (SVM), K-nearest neighbor (K-NN), logistic regression (LR), decision tree (DT), and Naive Bayes (NB). To validate the work, the dataset was obtained from the online Kaggle repository. The algorithms' performance is analyzed, evaluated, and compared using several performance measures. Results indicate that the support vector machine (SVM) and logistic regression (LR) methods are the most efficient for diagnosing cardiovascular disease.

Keywords: heart disease, cardiovascular disease, machine learning, classification, comparative analysis

I. INTRODUCTION

Cardiovascular disease (CVD), also known as heart disease, is a well-known and critical global health problem that significantly affects people. Recent studies estimate that millions of deaths worldwide are due to heart disease [1]–[5], representing 31% of all deaths globally. Medical evidence has revealed that certain risk factors increase an individual's likelihood of developing CVD. Some of these factors, as stated in [6], [7], are unhealthy diet, tobacco use, depression, stress, excessive alcohol use, physical inactivity, heredity, overweight, and age. Several reports by the World Health Organization (WHO) have shown a rise in deaths due to CVD, mainly attributed to insufficient preventive measures despite increasing risk factors.

The cumulative morbidity and mortality due to heart disease worldwide have drawn researchers' attention, prompting numerous investigations aimed at curtailing these rates. Machine learning techniques play a very active role in medical data mining for information extraction and analysis. These systems have been broadly used to implement medical decision systems for prediction, improved health policy-making, prevention of clinical errors, early detection and prevention of diseases, and avoidance of preventable hospital deaths [8]. As intelligent systems, machine learning methods can be used to interpret a data set consistently and to produce suitable output from raw data for various purposes. They provide a way of analyzing a large dataset to find patterns and relations among diverse entities that are not detectable without advanced analysis techniques [9].

Because data are often high-dimensional and imbalanced, standard statistical approaches face a significant challenge, and several classification techniques become infeasible. Researchers [10]–[12] have therefore proposed several techniques to handle the inherent difficulties of high-dimensional data. Precise classification is difficult in part because of noisy features that are irrelevant to classification but lead to an accumulation of errors and, frequently, to inaccurate analysis. Standard classification methods do not handle these data inadequacies, outliers, and noise, which reduces the accuracy of their results. For this reason, there is a need to identify and apply a technique capable of addressing these issues.

Classification is a persistent problem with various applications. Many healthcare institutions face a significant challenge in delivering quality services, such as diagnosing patients correctly and managing treatment at reasonable cost. Classification approaches are therefore widely used in the medical field to assign health data to different classes according to some constraints, relative to a specific classifier [13]. In general, a classification algorithm is a function that evaluates the input features so that its output separates one class into positive values and the other into negative values. Classifier training is thus essential to identify the weights that separate the classes in the data most precisely [14].

Motivated by advances in machine learning approaches to forecasting cardiovascular disease risk and by improved classification performance, this paper contributes a comparative study of machine learning algorithms and identifies the most effective algorithm, with reasonable accuracy, for classifying cardiovascular disease data. It also aims to establish the performance of different algorithms on large and small datasets, to classify them appropriately, and to offer guidance on building supervised machine learning models. We evaluate five machine learning algorithms, support vector machine (SVM), K-nearest neighbor (KNN), logistic regression (LR), decision tree (DT), and Naïve Bayes (NB), using clinical data obtained from cardiovascular disease patients. The rest of the paper is organized as follows: Section 2 discusses related works by other researchers. Section 3 presents the proposed



methodology, dataset, and methods. The performance measures chosen are briefly discussed in Section 4. The results are presented and discussed in Section 5, while Section 6 concludes the paper and outlines future work.

II. RELATED WORKS

In medical healthcare, classification is one of the most important and popular decision-making tools. Numerous computational intelligence methods support the medical healthcare domain [15]–[17]. Some of the works by other researchers related to this area are reviewed below.

H. Ayatollahi, L. Gholamhosseini, and M. Salehi [18] performed a comparative study between the artificial neural network (ANN) and support vector machine (SVM) approaches for classification based on the positive predictive value of cardiovascular disease. They used medical records of coronary artery disease patients from various hospitals. The data comprise 1324 instances and 25 attributes and were split into training and testing sets at a ratio of 70% to 30%, respectively. Their experimental results show that SVM achieves higher accuracy and better performance than the ANN.

M. F. Rabbi et al. [19] evaluated the most common classification models used in data mining. They used k-nearest neighbor (K-NN), artificial neural network (ANN), and support vector machine (SVM), with a multi-layered feed-forward back-propagation network in MATLAB. Their work was analyzed using the Cleveland heart disease dataset from the UCI machine learning repository, which contains 303 instances and 76 attributes. After pre-processing the dataset and performing the experiments, their results showed that the SVM algorithm outperformed K-NN and ANN with a classification accuracy of 85%, compared with 82% for K-NN and approximately 73% for ANN.

A. S. Ebenezer, S. J. Priya, D. Narmadha, and G. N. Sundar [20] selected ten different algorithms for coronary artery disease risk assessment. The algorithms chosen for the classification task include artificial neural network (ANN), decision tree (DT), support vector machine (SVM), random forest (RF), CHAID, rule induction, naïve Bayes (NB), k-nearest neighbor (KNN) [21], and decision stump (DS). The PIMA dataset was obtained from an online repository for their analysis. The dataset contains 303 patient records with 14 attributes. Their results revealed that NB and SVM performed well in predicting heart disease.

H. Mansoor, I. Y. Elgendy, R. Segal, A. A. Bavry, and J. Bian [22] compared the performance of machine learning classification algorithms. They selected random forest (RF) and logistic regression (LR) for predicting the risk level of heart disease in patients, using United States National Inpatient Sample (NIS) data for 2011-2013. In their experimental analysis, LR delivered a better accuracy of 89%, compared with 88% for RF.

S. D. Desai, S. Giraddi, P. Narayankar, N. R. Pudakalakatti, and S. Sulegaon [23] experimented with logistic regression (LR) and a back-propagation neural network (BPNN) to assess the accuracy of classification techniques for heart disease prediction. They performed a comparative study of parametric and non-parametric methods for classifying heart disease. They used a Cleveland dataset consisting of 270 records and 13 features, of which 11 were used to validate their analysis. 10-fold cross-validation was used to obtain an unbiased estimate of the performance of their classification models. Their results showed that LR achieved an accuracy of 0.91, compared with 0.88 for the BPNN.

S. I. Ayon, M. M. Islam, and M. R. Hossain [24] compared the performance of computational intelligence techniques for classifying heart disease. They selected seven techniques: decision tree (DT), k-nearest neighbor (KNN), support vector machine (SVM), deep neural network (DNN), random forest (RF), logistic regression (LR), and naïve Bayes (NB). The Cleveland and Statlog heart disease datasets from the UCI repository were used to evaluate each technique's performance. Their work showed that the DNN achieved 97% accuracy, comparatively better than the other algorithms.

III. METHODOLOGY

The methodology for this study starts with collecting the dataset for classification. Classification is one of the essential tasks in data mining; its aim is to assign an instance to one or more classes or categories. This study therefore seeks an efficient algorithm for classifying a cardiovascular disease dataset. To achieve this, we assume that there are only two classes: the positive class (presence of the disease) and the negative class (absence of the disease). The methodological framework for the cardiovascular disease classification system is illustrated in Fig. 1.

Fig. 1. Methodology framework

We use Python 3 in an Anaconda Jupyter Notebook to implement the algorithms. The dataset is pre-processed and then split into training and test sets. According to K. Korjus, M. N. Hebart, and R. Vicente [25], most researchers use a 70:30 ratio (70% for training and 30% for testing), because allocating more data to training tends to produce more optimal and accurate
outcomes. Therefore, a 70:30 partitioning ratio is used in this study.

A. Dataset

The dataset used to analyze and compare the algorithms chosen for this study is obtained from the online Kaggle repository [26]. The dataset consists of 70,000 clinical records of patient data collected by hospitals for cardiovascular-related diseases. It contains three types of input features: Objective (factual information), Examination (results of medical examination), and Subjective (information given by the patient). The dataset has 11 input attributes, of which 4 are objective features, 4 are examination features, and 3 are subjective features, plus 1 target variable labeled as absence or presence for diagnosis. Table I provides a brief description of the cardiovascular disease dataset collected for the analysis in this study.

TABLE I. DATASET DESCRIPTION

No.  Attribute                 Input feature type   Data type / description
1    Age                       Objective            Int / days
2    Height                    Objective            Int / centimeters
3    Weight                    Objective            Float / kilograms
4    Gender                    Objective            Categorical code 1: male, 2: female
5    Systolic blood pressure   Examination          Int
6    Diastolic blood pressure  Examination          Int
7    Cholesterol               Examination          1: normal, 2: above normal, 3: well above normal
8    Glucose                   Examination          1: normal, 2: above normal, 3: well above normal
9    Smoking                   Subjective           Binary
10   Alcohol                   Subjective           Binary
11   Physical activity         Subjective           Binary
12   Cardiovascular            Target               Presence or absence of CVD / target variable

B. Classification approaches

Next, we explain the machine learning approaches used in this study. Five popular classification models (decision tree, K-nearest neighbor, logistic regression, Naïve Bayes, and support vector machine) are built and compared based on their predictive accuracy. Many studies have compared data mining methods under different parameter settings. Most of these previous studies found these methods superior to their statistical counterparts, being less constrained by assumptions and producing better classification results. The methods are briefly discussed below.

a) Logistic Regression (LR): LR is one of the most widely used machine learning models for analyzing multivariate regression problems in medical healthcare. It is used to predict the outcome of a dependent variable from continuous independent variables, which helps in diagnosing and predicting diseases [27]. It is a discriminative approach that operates on the input vector and extracts significant statistical information from the model or predicts trends in the data. The dependent variable in LR is binary, coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). The main task in LR analysis is to estimate the log odds of an event. LR fits a linear function of the inputs to these log odds, as defined in (1):

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{k=1}^{n} \beta_k x_k \tag{1}$$

where $k = 1, 2, \ldots, n$ indexes the input features $x_k$, $p$ is the probability of the event, and $\beta_0, \beta_k$ are the regression coefficients.

b) Support Vector Machine (SVM): SVM was introduced in 1992 by Boser, Guyon, and Vapnik. Initially developed as a binary classification method, it has since been extended to multi-class classification and regression problems. Owing to its generalization performance, it is considered a good classifier that requires no prior knowledge, even when the input dimension is very high [28]. Data containing separable classes can be classified by finding the optimal hyperplane that maximizes the margin between the classes [29]. An SVM model is defined over a finite-dimensional vector space in which each dimension represents a feature of an object. This strategy has proven practical for high-dimensional problems. In recent years, SVM has shown excellent performance for disease prediction in medical health due to its computational effectiveness on large datasets. Its primary purpose is to serve as a supervised learning method for regression and classification tasks that minimizes generalization error [30]. The SVM is represented mathematically as:

$$w \cdot x + b = 0 \tag{2}$$
$$w \cdot x + b > 0 \tag{3}$$
$$w \cdot x + b < 0 \tag{4}$$

Here $x$ is a data point represented as a vector, $w$ is the weight vector, and $b$ is the bias term. The hyperplane in (2) separates the data when the points of one class always satisfy (3), with a value greater than zero, and the points of the other class always satisfy (4), with a value less than zero. Among all candidate hyperplanes, SVM selects the one whose distance to the nearest data points is as large as possible.

c) K-Nearest Neighbour (K-NN): K-NN is a classification method based on the similarity between one case and another. When a new case is presented, its distance from every case in the model is computed, and the closest cases are taken as its nearest neighbours, that is, the cases most similar to it. The new case is then assigned to the class containing most of its nearest neighbours. In other words, K-NN predicts the class label of a new input from the similarity between that input and the samples in the training set. If the new input is the same as the samples in the training set, the K-NN classification output is not sound [31]. Let $(x, y)$ be a training observation and $h: X \to Y$ the learned function, so that for an observation $x$, $h(x)$ determines the value of $y$.

Mathematically, the Euclidean distance is calculated using the following formula, where $\rho: X \times X \to \mathbb{R}$ is a function that returns the distance between two points $x$ and $x'$ with $n$ features each:

$$\rho(x, x') = \sqrt{\sum_{i=1}^{n} (x_i - x'_i)^2}$$
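The K-NN procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the helper names `euclidean` and `knn_predict`, the choice k = 3, and the toy data (systolic pressure, cholesterol level) are all hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by their distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    # Count the class labels of the k closest points and take the majority.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up toy data: (systolic pressure, cholesterol level);
# label 0 = absence, 1 = presence. Not taken from the Kaggle dataset.
train_x = [(110, 1), (120, 1), (115, 2), (160, 3), (150, 2), (170, 3)]
train_y = [0, 0, 0, 1, 1, 1]

label = knn_predict(train_x, train_y, (155, 3), k=3)
print(label)
```

Because the distance sums raw feature differences, features with large numeric ranges (blood pressure here) dominate small ones (cholesterol level), so in practice the features would usually be scaled before K-NN is applied.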
d) Decision Tree (DT): DT is another supervised learning algorithm, mostly used to solve classification problems. DT classifies the data using decision rules derived from the training data by calculating entropy and information gain. A tree structure is built for classification, in which each node represents an attribute. The root node is the primary node, followed by its child nodes; the leaf nodes represent the result of the decision [32]. DT performs efficiently with both continuous and categorical attributes. The population is separated into two or more homogeneous sets based on the most significant predictors. The first stage of DT is to calculate the entropy of each feature. Next, the dataset is split on the variable/predictor with the highest information gain, that is, the lowest entropy. These two steps are then repeated for the remaining attributes.

$$Entropy = -\sum_{k=1}^{l} q_k \log_2 q_k$$

where $l$ is the number of classes of the response variable and $q_k$ is the ratio of the number of samples in the $k$th class to the total number of samples.

e) Naïve Bayes (NB): NB is one of the most popular data mining algorithms used for classification. It is a probabilistic model built on Bayes' theorem with a strong assumption of independence between features. The NB algorithm assumes that the effect of a specific feature on a class is independent of the other features. Despite its simplicity, the NB classifier often does surprisingly well and is widely used because it often outperforms more sophisticated classification methods [33].

Bayes' theorem provides a way of calculating the posterior probability $P(C \mid X)$ from $P(C)$, $P(X)$, and $P(X \mid C)$:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$

NB assumes that the effect of the value of a predictor ($X$) on a given class ($C$) is independent of the values of the other predictors. This assumption is known as conditional independence.

IV. EVALUATION MEASURES

The evaluation criteria used to measure the algorithms' performance are based on specific metrics: F1_score, precision, recall, and accuracy. We also measure the training time taken by each algorithm. The confusion matrix summarizes a classification algorithm's performance and forms the basis from which the various measures are calculated. It assesses a model by comparing predicted values with actual values and reveals whether a classification algorithm commonly mislabels one class as another [34], [35]. The values of the confusion matrix are briefly described below, and its layout is shown in Table II.

• True positive (TP): the actual class of the data point is 1 and the predicted class is also 1.
• False positive (FP): the actual class of the data point is 0 and the predicted class is 1.
• False negative (FN): the actual class of the data point is 1 and the predicted class is 0.
• True negative (TN): the actual class of the data point is 0 and the predicted class is also 0.

TABLE II. CONFUSION MATRIX

                   Classified as absence   Classified as presence
Actual absence     TP                      FN
Actual presence    FP                      TN

A. F1_score

The F1-score can be interpreted as a weighted average (the harmonic mean) of precision and recall. It reaches its best value at 1 and its worst at 0. The formula for the F1-score is:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{10}$$

B. Recall

Recall measures how much of the relevant data is retrieved by a machine learning algorithm. It focuses on the ability to find all relevant instances in the data. Recall is given by:

$$Recall = \frac{TP}{TP + FN} \tag{11}$$

C. Precision

Precision measures how accurate the positive predictions are. It quantifies the proportion of instances predicted as positive that truly belong to the positive class, that is, the number of true positives out of all predicted positives, and is calculated as:

$$Precision = \frac{TP}{TP + FP} \tag{12}$$

D. Accuracy

Accuracy is an essential measure for describing the performance of an algorithm. It defines the degree to which an algorithm predicts the positive and negative cases correctly and is measured using the formula:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

V. RESULTS AND DISCUSSION
The experiments for this study are performed on an Intel® Core 2 Duo T8300 @ 2.40 GHz computer with 8 GB RAM running Windows 10 and are implemented in Python 3 in an Anaconda Jupyter Notebook. We validate our experiments on a single dataset from the Kaggle repository.

In this paper, five computational intelligence methods, decision tree (DT), logistic regression (LR), K-nearest neighbor (KNN), Naïve Bayes (NB), and support vector machine (SVM), are used to classify the cardiovascular disease dataset. The dataset consists of 70,000 samples and 11 features. There are two classes: absence or presence of the disease. Of the data samples, 35021 are labeled as absence, while the remaining 34979 are labeled as presence of cardiovascular disease. The data is partitioned into training and testing sets at a ratio of 70:30, respectively.

The confusion matrices of the prediction results for DT, KNN, LR, NB, and SVM are shown in Tables III-VII. The F1_score, recall, precision, and accuracy derived from them are illustrated in Figs. 2-5, and the training time in Fig. 6.

TABLE III. CONFUSION MATRIX FOR DECISION TREE CLASSIFIER

                   Predicted absence   Predicted presence   Total actual
Actual absence     6562 (31.35%)       3899 (18.46%)        10461
Actual presence    3775 (18.03%)       6764 (32.16%)        10539
Total predicted    10337               10663                21000

TABLE IV. CONFUSION MATRIX FOR K-NEAREST NEIGHBOR CLASSIFIER

                   Predicted absence   Predicted presence   Total actual
Actual absence     8243 (39.25%)       2296 (10.93%)        10539
Actual presence    4031 (19.20%)       6430 (30.62%)        10461
Total predicted    12274               8726                 21000

TABLE V. CONFUSION MATRIX FOR LOGISTIC REGRESSION CLASSIFIER

                   Predicted absence   Predicted presence   Total actual
Actual absence     5363 (38.31%)       1625 (11.61%)        6988
Actual presence    2244 (16.03%)       4768 (34.06%)        7012
Total predicted    7607                6393                 14000

TABLE VI. CONFUSION MATRIX FOR NAÏVE BAYES CLASSIFIER

                   Predicted absence   Predicted presence   Total actual
Actual absence     9078 (43.23%)       1383 (6.59%)         10461
Actual presence    7134 (33.97%)       3405 (16.21%)        10539
Total predicted    16212               4788                 21000

TABLE VII. CONFUSION MATRIX FOR SUPPORT VECTOR MACHINE CLASSIFIER

                   Predicted absence   Predicted presence   Total actual
Actual absence     5671 (40.51%)       1317 (9.41%)         6988
Actual presence    2509 (17.92%)       4503 (32.16%)        7012
Total predicted    8180                5820                 14000

From the confusion matrix results, naïve Bayes (NB) predicts the highest number of true positives, while logistic regression predicts the highest number of true negatives, as shown in Tables VI and V.

Fig. 2 compares the F1_score of the algorithms DT, KNN, LR, NB, and SVM, which is 63.94%, 67.02%, 71.13%, 44.43%, and 70.17%, respectively. LR therefore outperforms the other algorithms on the F1_score.

Fig. 2. Comparison for F1_score

Fig. 3. Comparison for the precision

Fig. 3 compares precision, the proportion of positive class predictions that actually belong to the positive class. SVM outperforms the other algorithms with 77.35%, followed by LR with 74.58%, KNN with 73.68%, NB with 71.11%, and DT with 63.42%.
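As a consistency check, the measures of Section IV can be recomputed from the confusion matrices. The sketch below does this for the logistic regression and SVM matrices (Tables V and VII). It is a minimal illustration, not the code used in this study: the function name `classification_metrics` is hypothetical, and it assumes that presence is the positive class, an assumption made here because it reproduces the precision values quoted in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Recall, precision, F1 and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}

# Counts read from Table V (logistic regression).
# ASSUMPTION: "presence" is treated as the positive class.
lr = classification_metrics(tp=4768, fp=1625, fn=2244, tn=5363)

# Counts read from Table VII (SVM), under the same assumption.
svm = classification_metrics(tp=4503, fp=1317, fn=2509, tn=5671)

for name, m in (("LR", lr), ("SVM", svm)):
    # Report each measure as a percentage with two decimals.
    print(name, {k: round(100 * v, 2) for k, v in m.items()})
```

Under this assumption, the recomputed values agree with those reported for LR and SVM in Table VIII to within a few hundredths of a percentage point, which is consistent with rounding or truncation in the reported figures.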
Fig. 4. Comparison of Recall

Fig. 4 compares recall across the algorithms. Recall is the proportion of all positive examples in the dataset that are predicted as positive.

Fig. 5. Comparison for the accuracy performance

The accuracy of the algorithms is compared in Fig. 5. SVM achieved the highest accuracy, 72.66%, ahead of DT, KNN, LR, and NB with 63.69%, 69.87%, 72.36%, and 59.44%, respectively.

Fig. 6. Comparison of training time

Fig. 6 shows the time taken by the training process for each algorithm. SVM has the longest training time, about 296.67 seconds, despite its excellent accuracy relative to the other algorithms.

In general, the SVM and logistic regression algorithms have the highest classification accuracy, 72.66% and 72.36%, respectively. In contrast, DT, KNN, and NB have a classification accuracy of 63.69%, 69.87%, and 59.44%, respectively, as shown in Table VIII. Comparing the overall classification results, the support vector machine (SVM) and logistic regression are identified as the most efficient classification algorithms for cardiovascular disease data, with accuracies of 72.66% and 72.36%.

TABLE VIII. COMPARATIVE ANALYSIS FOR THE ALGORITHMS

Algorithm   Recall (%)   F1_score (%)   Precision (%)   Accuracy (%)   Training time (s)
DT          64.40        63.94          63.42           63.69          0.53
K-NN        61.46        67.02          73.68           69.87          5.78
LR          67.99        71.13          74.58           72.36          2.52
NB          32.30        44.43          71.11           59.44          0.63
SVM         64.21        70.17          77.35           72.66          296.67

In this paper, we have therefore discussed the major techniques used in data mining to classify cardiovascular disease data. The study contributes by identifying the support vector machine (SVM) as the best-performing technique for predicting the presence or absence of cardiovascular disease, with better accuracy for early diagnosis, which can help reduce the mortality rate. Such a system helps not only doctors but also patients, by reducing the cost of laboratory examinations and saving time.

VI. CONCLUSION AND FUTURE WORK

Data mining applications are widely used in medical healthcare to detect diseases and to diagnose heart disease patients based on their medical data. In this paper, we have discussed effective machine learning techniques and identified the most efficient ones for classifying cardiovascular disease from patient data. The classification algorithms SVM, KNN, DT, LR, and NB have been compared based on evaluation measures such as precision, recall, F1-score, accuracy, and training time. Our work shows that the support vector machine (SVM) and logistic regression (LR) methods are the most efficient for diagnosing cardiovascular disease. In the future, we intend to enhance the performance of these basic classification techniques by developing a meta-model for predicting cardiovascular disease among people at risk of heart disease.

VII. ACKNOWLEDGMENT

The authors are thankful to the School of Computer Sciences and the Division of Research & Innovation, USM, for providing financial support from the Short-Term Grant (304/PKOMP/6315435).
REFERENCES

[1] P. Keikhosrokiani, Perspectives in the development of mobile medical information systems: Life cycle, management, methodological approach and application. 2019.
[2] WHO, “Cardiovascular diseases (CVDs): key facts,” World Health Organization, 2017.
[3] P. Keikhosrokiani, N. Mustaffa, and N. Zakaria, “Success factors in developing iHeart as a patient-centric healthcare system: A multi-group analysis,” Telematics and Informatics, vol. 35, no. 4, 2018, doi: 10.1016/j.tele.2017.11.006.
[4] P. Keikhosrokiani, “Chapter 5 - Success factors of mobile medical information system (mMIS),” P. B. T.-P. in the D. of M. M. I. S. Keikhosrokiani, Ed. Academic Press, 2020, pp. 75–99.
[5] P. Keikhosrokiani, N. Mustaffa, N. Zakaria, and R. Abdullah, “Assessment of a medical information system: the mediating role of use and user satisfaction on the success of human interaction with the mobile healthcare system (iHeart),” Cognition, Technology & Work, vol. 22, no. 2, pp. 281–305, 2020, doi: 10.1007/s10111-019-00565-4.
[6] A. D’Souza, “Heart disease prediction using data mining techniques,” International Journal of Research in Engineering and Science (IJRES) ISSN (Online), pp. 2320–9364, 2015.
[7] P. Keikhosrokiani, “Chapter 6 - Emotional-persuasive and habit-change assessment of mobile medical information Systems (mMIS),” P. B. T.-P. in the D. of M. M. I. S. Keikhosrokiani, Ed. Academic Press, 2020, pp. 101–109.
[8] J. Patel, D. TejalUpadhyay, and S. Patel, “Heart disease prediction using machine learning and data mining technique,” Heart Disease, vol. 7, no. 1, pp. 129–137, 2015.
[9] V. Abeykoon, N. Kankanamdurage, A. Senevirathna, P. Ranaweera, and R. Udawalpola, “Electrical Devices Identification through Power Consumption using Machine Learning Techniques,” Int. J. Simul. Syst. Sci. Technol, vol. 17, 2016.
[10] W. Xing and Y. Bei, “Medical Health Big Data Classification Based on KNN Classification Algorithm,” IEEE Access, vol. 8, pp. 28808–28819, 2019.
[11] M. J. El-Khatib, B. S. Abu-Nasser, and S. S. Abu-Naser, “Glass Classification Using Artificial Neural Network,” 2019.
[12] A. M. Rajeswari, M. S. Sidhika, M. Kalaivani, and C. Deisy, “Prediction of Prediabetes using Fuzzy Logic based Association Classification,” in 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018, pp. 782–787.
[13] D. Sisodia and D. S. Sisodia, “Prediction of diabetes using classification algorithms,” Procedia computer science, vol. 132, pp. 1578–1585, 2018.
[14] T. I. Netoff, “The Ability to Predict Seizure Onset,” in Engineering in Medicine, Elsevier, 2019, pp. 365–378.
[15] S. S. Sikchi, S. Sikchi, and M. S. Ali, “Fuzzy expert systems (FES) for medical diagnosis,” International Journal of Computer Applications, vol. 63, no. 11, 2013.
[16] A. V. S. Kumar, “Diagnosis of heart disease using Advanced Fuzzy resolution Mechanism,” International Journal of Science and Applied Information Technology, vol. 2, no. 2, pp. 22–30, 2013.
[17] I. Teoh Yi Zhe and P. Keikhosrokiani, “Knowledge workers mental workload prediction using optimised ELANFIS,” Applied Intelligence, 2020, doi: 10.1007/s10489-020-01928-5.
[18] H. Ayatollahi, L. Gholamhosseini, and M. Salehi, “Predicting coronary artery disease: a comparison between two data mining algorithms,” BMC public health, vol. 19, no. 1, pp. 1–9, 2019.
[19] M. F. Rabbi et al., “Performance evaluation of data mining classification techniques for heart disease prediction,” American Journal of Engineering Research, vol. 7, no. 2, pp. 278–283, 2018.
[20] A. S. Ebenezer, S. J. Priya, D. Narmadha, and G. N. Sundar, “A novel scoring system for coronary artery disease risk assessment,” in 2017 International Conference on Intelligent Computing and Control (I2C2), 2017, pp. 1–6.
[21] O. Abdelrahman and P. Keikhosrokiani, “Assembly Line Anomaly Detection and Root Cause Analysis Using Machine Learning,” IEEE Access, vol. 8, pp. 189661–189672, 2020, doi: 10.1109/ACCESS.2020.3029826.
[22] H. Mansoor, I. Y. Elgendy, R. Segal, A. A. Bavry, and J. Bian, “Risk prediction model for in-hospital mortality in women with ST-elevation myocardial infarction: A machine learning approach,” Heart & Lung, vol. 46, no. 6, pp. 405–411, 2017.
[23] S. D. Desai, S. Giraddi, P. Narayankar, N. R. Pudakalakatti, and S. Sulegaon, “Back-Propagation Neural Network Versus Logistic Regression in Heart Disease Classification,” in Advanced Computing and Communication Technologies, 2019, pp. 133–144.
[24] S. I. Ayon, M. M. Islam, and M. R. Hossain, “Coronary artery heart disease prediction: a comparative study of computational intelligence techniques,” IETE Journal of Research, pp. 1–20, 2020.
[25] K. Korjus, M. N. Hebart, and R. Vicente, “An efficient data partitioning to improve classification performance while keeping parameters interpretable,” PloS one, vol. 11, no. 8, p. e0161788, 2016.
[26] S. Ulianova, “Cardiovascular Disease dataset,” Kaggle.com, 2019. https://www.kaggle.com/sulianova/cardiovascular-disease-dataset (accessed Jan. 02, 2021).
[27] L. M. Kemppainen, T. T. Kemppainen, J. A. Reippainen, S. T. Salmenniemi, and P. H. Vuolanto, “Use of complementary and alternative medicine in Europe: Health-related and sociodemographic determinants,” Scandinavian journal of public health, vol. 46, no. 4, pp. 448–455, 2018.
[28] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998, doi: 10.1023/A:1009715923555.
[29] K. P. Soman, R. Loganathan, and V. Ajay, Machine learning with SVM and other kernel methods. PHI Learning Pvt. Ltd., 2009.
[30] J. Zhi, J. Sun, Z. Wang, and W. Ding, “Support vector machine classifier for prediction of the metastasis of colorectal cancer,” International journal of molecular medicine, vol. 41, no. 3, pp. 1419–1426, 2018.
[31] N. Khateeb and M. Usman, “Efficient heart disease prediction system using K-nearest neighbor classification technique,” in Proceedings of the International Conference on Big Data and Internet of Thing, 2017, pp. 21–26.
[32] A. B. Møller, B. V Iversen, A. Beucher, and M. H. Greve, “Prediction of soil drainage classes in Denmark by means of decision tree classification,” Geoderma, vol. 352, pp. 314–329, 2019.
[33] K. Vembandasamy, R. Sasipriya, and E. Deepa, “Heart diseases detection using Naive Bayes algorithm,” International Journal of Innovative Science, Engineering & Technology, vol. 2, no. 9, pp. 441–444, 2015.
[34] B. Lantz, Machine learning with R: expert techniques for predictive modeling. Packt Publishing Ltd, 2019.
[35] T. Mailund, Beginning Data Science in R: Data Analysis, Visualization, and Modelling for the Data Scientist. Apress, 2017.
