
IAES International Journal of Artificial Intelligence (IJ-AI)

Vol. 13, No. 3, September 2024, pp. 3010~3017


ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3010-3017

Enhancing stroke prediction using the waikato environment for knowledge analysis

Muneera Altayeb, Areen Arabiat


Department of Communications and Computer Engineering, Faculty of Engineering, Al-Ahliyya Amman University, Al-Salt, Jordan

Article Info

Article history:
Received Jan 5, 2024
Revised Feb 13, 2024
Accepted Feb 28, 2024

Keywords:
Multi-layer perceptron
Naive Bayes
Random forest
Support vector machine
Waikato environment for knowledge analysis data mining

ABSTRACT

State-of-the-art data mining tools incorporate advanced machine learning (ML) and artificial intelligence (AI) models and are widely used in classification, association rules, clustering, prediction, and sequential models. Data mining is important for diagnosing and predicting diseases in their early stages, which contributes greatly to the development of the health services sector. This study utilized classification to predict stroke in a sample patient dataset taken from Kaggle. The classification model was created using the data mining program waikato environment for knowledge analysis (WEKA). This data mining tool helped identify individuals most at risk of stroke based on analysis of features extracted from the patient dataset. These features were used in classification processes according to the naive Bayes (NB), random forest (RF), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms. Analysis of the classification results showed that SVM outperformed the other algorithms in terms of accuracy (94.4%), sensitivity (100%), and F-measure (97.1%), while the NB algorithm had the best performance in terms of precision (95.7%).

This is an open access article under the CC BY-SA license.

Corresponding Author:
Muneera Altayeb
Department of Communications and Computer Engineering, Faculty of Engineering
Al-Ahliyya Amman University
Al-Saro, Al-Salt, Amman, Jordan
Email: [email protected]

1. INTRODUCTION
Stroke, a potentially fatal consequence of atrial fibrillation, is challenging for doctors to predict because the process is time-consuming and tedious. It primarily affects individuals over the age of 65 and is comparable to a "heart attack" in its damaging effect on the brain. In the United States and agricultural nations, stroke is the third leading cause of death. It occurs when the brain's blood supply is obstructed or reduced. There are two main types of stroke: ischemic stroke, caused by insufficient blood flow, and hemorrhagic stroke, caused by bleeding. Hemorrhagic stroke can be further classified into subarachnoid hemorrhage and intracerebral hemorrhage [1]. Stroke ranks among the world's main causes of mortality and disability and is the second leading cause of death in Korea. The population of Korea is expected to age quickly, with the proportion of people over 60 projected to rise from 13.7% in 2015 to 28.6% by 2050 [2]–[4].
Islam et al. [5] introduced adaptive gradient boosting machine learning (ML) models to classify and predict acute stroke in active states. The study was conducted on electroencephalogram (EEG) recordings of 75 healthy adults without a history of any neurological disease and 48 patients who had been diagnosed with an acute stroke. Results showed that the proposed model was approximately 80% accurate in classifying the stroke group. In a study on stroke prediction, researchers explored the use of three ML models: deep neural network (DNN), random forest (RF), and logistic regression (LR). They evaluated the models' performance with specific parameters and found that the DNN, commonly used for predicting ischemic or acute stroke, showed promise for long-term prediction as well. The DNN model achieved an impressive 88% accuracy when considering the input variables, outperforming the other models. The researchers highlighted the need to enhance the model with automated and precise calculations, reducing the dependence on simpler models [6].
Hadianfard et al. [7] presented a study that aimed to predict stroke patients' survival rates by extracting
decision rules through the use of data mining techniques. The researchers used the multiple imputation method
to handle missing data when analyzing data from 4149 stroke patients that they had obtained from paper
medical records. To balance the target variable, they used methods like under- and oversampling in addition to the synthetic minority oversampling technique (SMOTE). Stroke patients' survival rate was predicted using the LR, decision
tree, and SVM algorithms. The repeated incremental pruning to produce error reduction (RIPPER) algorithm
was also used to extract decision rules. In terms of kappa (33.34), sensitivity (79.06%), and accuracy (76.96%),
LR outperformed the other algorithms. Nonetheless, the specificity (65.35%) and area under the ROC curve
(AUC) (0.77) were lower than those of the other algorithms. The LR algorithm that performed best on the primary dataset was then tested on an independent dataset of 234 records. When this method was used with the external
validation dataset, its accuracy (79.91%), sensitivity (83.94%), kappa (39.26), and AUC (0.8) all improved; its
specificity (60.98%) did not change.
Choi et al. [8] created a new methodology for applying deep learning models to raw EEG data that does not take frequency features into account. The proposed stroke prediction model was developed and trained using real-time EEG sensor data. Several deep learning models specializing in time-series classification and prediction, namely long short-term memory (LSTM), bidirectional LSTM, convolutional neural network (CNN)-LSTM, and CNN-bidirectional LSTM, were created and compared. When using raw EEG data, the CNN-bidirectional LSTM model predicted stroke with 94.0% accuracy and low false positive (6.0%) and false negative (5.7%) rates, demonstrating high confidence in their method.
The waikato environment for knowledge analysis (WEKA) workbench brings together modern ML algorithms and data preprocessing tools in an orderly manner. The primary way of interacting with these methods is through the command line; however, easy-to-use interactive graphical user interfaces are also available for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing stream data processing configurations. Together, these interfaces make up a sophisticated environment for data mining experiments. The Java-written system is distributed under the GNU general public license [9].
The novelty of this work lies in the use of a large dataset to train several ML classifiers supported by the WEKA data mining tool for stroke prediction. The prediction process in the proposed model is divided into four stages: i) choosing the dataset; ii) dataset cleaning and preprocessing; iii) classification using four algorithms, naive Bayes (NB), RF, support vector machine (SVM), and multi-layer perceptron (MLP); and iv) results and performance evaluation. The performance of each classifier is evaluated using the following metrics: accuracy, sensitivity, precision, and F-measure.

2. PROPOSED METHOD
The proposed model aims to detect stroke using ML and deep learning classifiers embedded in the data mining tool WEKA, which allows users to compare classification accuracy across various algorithmic methods based on a set of features [10]–[12]. Before the classification process starts, the dataset is filtered and pre-processed so that it is ready to be fed to the classifiers as features; this is the first and most important step in developing an ML classifier. In the next step, the dataset is divided into training and test sets. To gauge and analyze performance, the proposed model uses 10-fold cross-validation: the test is conducted on each fold independently while the other nine folds are used for learning, and the one-tenth of the dataset that is held out is used to compute the error rate [13], [14]. The classification process is then carried out using four algorithms: NB, RF, SVM, and MLP.
Figure 1 describes the model proposed in this work.
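This pipeline can be reproduced programmatically with the WEKA Java API. The following is a minimal sketch, assuming the cleaned dataset is stored as stroke_cleaned.csv with the class (stroke/no stroke) as the last attribute; the random seed and the choice of reporting metrics for the first class value are illustrative, not details taken from the paper.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.functions.SMO;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class StrokeCrossValidation {
    public static void main(String[] args) throws Exception {
        // Load the cleaned dataset (file name assumed) and mark the last attribute as the class.
        Instances data = DataSource.read("stroke_cleaned.csv");
        data.setClassIndex(data.numAttributes() - 1);

        // The four classifiers used in the paper, here with WEKA's default settings.
        Classifier[] models = { new NaiveBayes(), new RandomForest(),
                                new SMO(), new MultilayerPerceptron() };

        for (Classifier model : models) {
            Evaluation eval = new Evaluation(data);
            // 10-fold cross-validation: each fold is tested once while the other nine train the model.
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.printf("%-22s accuracy=%.1f%% precision=%.3f sensitivity=%.3f F=%.3f%n",
                    model.getClass().getSimpleName(),
                    eval.pctCorrect(),
                    eval.precision(0),   // metrics for the first class value (assumed positive class)
                    eval.recall(0),
                    eval.fMeasure(0));
        }
    }
}
```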

2.1. Dataset
The dataset used in this study was obtained from Kaggle [15]. It consists of 3254 cases, each described by eight attributes: age, gender, heart disease, hypertension, marital status, average blood sugar level, body mass index, and smoking status. The attributes are prepared in a preprocessing (cleaning) step that deletes rows containing redundant, corrupted, incomplete, inaccurate, or incorrectly structured data. The dataset is then converted to the comma-separated values (CSV) file format, which is compatible with WEKA (see the loading and cleaning sketch after Table 1). Table 1 shows the attributes and description of the dataset used in the classification process.
‒ Gender: this attribute indicates a person's gender; the sample comprises 2,117 men (41.4%) and 2,994 women (58.6%). Strokes disproportionately afflict women, with sociocultural gender playing a role in variations in risk factors, evaluation, treatment, and results, and the cited study focuses on the gaps in existing knowledge and research [16].
‒ Age: this feature describes an individual's age, as the occurrence of strokes in young individuals rises as
they age beyond 35 years, and there has been a 23% increase in such cases over ten years, primarily due
to a rise in ischemic stroke [17].
‒ Hypertension: this feature determines whether the individual has hypertension, a condition that affects 9.8% of the participants and raises the risk of stroke [18].
‒ Heart disease: this feature signifies the presence or absence of heart disease in the individual. The
percentage of patients diagnosed with heart disease stands at 5.4% [19].
‒ Ever married: this feature displays the participants' marital status, with married individuals making up
65.6% of the sample [20].
‒ Average glucose level: this feature captures the participant’s average glucose level [21].
‒ BMI: this feature records the participants' body mass index [22].
‒ Smoking: three categories are included in this feature, which tracks the participant's smoking status: formerly smoked (21.2%), never smoked (40.9%), and currently smokes (37.8%) [23].

Figure 1. Architecture phases of the proposed model

Table 1. Attributes and data description [15]


Variable Classification Data type Frequency Percentage (%)
Gender Male Nominal 1260 38.7
Female 1994 61.3
Age >35 Nominal 2484 76.3
<=35 770 23.7
Hypertension Yes Nominal 408 12.5
No 2846 87.5
Heart disease Yes Nominal 205 6.3
No 3049 93.7
Ever married Yes Nominal 2598 79.8
No 656 20.2
Average glucose level >120 Nominal 759 23.3
<=120 2495 76.7
BMI >=25 Nominal 2557 78.6
<25 697 21.4
Smoking Formerly smoking Nominal 814 25.0
Smokes 728 22.4
Never smoke 1712 52.6
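As a concrete illustration of the cleaning step described in section 2.1, the sketch below loads the raw Kaggle CSV with WEKA's CSVLoader, drops rows with missing values, and writes the result back to CSV. The file names are assumptions, and the binning of numeric attributes into the nominal categories of Table 1 (e.g., age >35) is omitted here.

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.CSVLoader;
import weka.core.converters.CSVSaver;

public class StrokeDatasetCleaning {
    public static void main(String[] args) throws Exception {
        // Load the raw Kaggle export (file name assumed).
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("stroke.csv"));
        Instances data = loader.getDataSet();

        // One simple cleaning choice: remove every row that has a missing value in any attribute.
        // (The paper only states that incomplete, corrupted, or redundant rows were removed.)
        for (int att = 0; att < data.numAttributes(); att++) {
            data.deleteWithMissing(att);
        }

        // Save the cleaned instances in the CSV format WEKA works with.
        CSVSaver saver = new CSVSaver();
        saver.setInstances(data);
        saver.setFile(new File("stroke_cleaned.csv"));
        saver.writeBatch();
    }
}
```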

2.2. Machine learning classifiers


ML is the scientific study of algorithms and statistical models that computer systems use to execute tasks without explicit programming. It has improved rapidly in recent years in the context of data analysis and computing, allowing applications to work intelligently. Commonplace applications such as web search engines, data mining, image processing, and predictive analytics rely on these methods, and their main advantage is that the algorithms learn automatically [24], [25]. In this research, several classifiers were tested and compared for stroke detection; they are discussed in the following subsections.

2.2.1. Naive Bayes classifier


One of the most widely used data mining methods is the NB algorithm. Its efficiency rests on the assumption of attribute independence, which may be violated in many real-world datasets; several attempts have been made to relax this assumption, with attribute selection being one major technique. NB calculates the probability that a new sample belongs to a particular class under the assumption that all features are independent of each other given the class [26], [27]. Given P(c), P(x), and P(x|c), one may calculate the posterior probability P(c|x) using Bayes' theorem. The NB classifier assumes that the influence of a predictor value x on a given class c is independent of the values of the other predictors; this assumption, known as class conditional independence, is illustrated in (1) [28].
$P(c|x) = \frac{P(x|c)\,P(c)}{P(x)}$

$P(c|X) = P(x_1|c) \times P(x_2|c) \times \dots \times P(x_n|c) \times P(c)$ (1)
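To connect (1) to the tool used in the paper: WEKA's NaiveBayes classifier exposes the class posteriors through distributionForInstance. A minimal sketch, assuming the cleaned file name from earlier and using the first instance only for illustration:

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesPosterior {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("stroke_cleaned.csv");   // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);

        // distributionForInstance returns the posterior P(c|x) of (1) for each class value,
        // normalized by the evidence P(x).
        double[] posterior = nb.distributionForInstance(data.instance(0));
        for (int c = 0; c < posterior.length; c++) {
            System.out.printf("P(%s | x) = %.3f%n",
                    data.classAttribute().value(c), posterior[c]);
        }
    }
}
```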

2.2.2. Random forest classifiers


The RF algorithm uses bagging to draw subsets of the data, builds a decision tree model on each subset, and combines these small sub-models into a final model. The prediction results are based on voting, with the class receiving the most votes selected [29]. RF is one of the ensemble learning strategies that belongs to the homogeneous base learner group of constructive classifiers, and it can be viewed from two perspectives: the first is computational, while the second is statistical. From a computational standpoint, RF has the potential to cope with both regression and classification problems [30].
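The bagging-and-voting idea described above can be made concrete with a few lines of WEKA code. This is a simplified illustration only: it bags J48 decision trees and takes a majority vote, whereas the RandomForest classifier actually used in the paper additionally randomizes the attributes considered at each split. The file name and forest size are assumptions.

```java
import java.util.Random;

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BaggedTreesVote {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("stroke_cleaned.csv");   // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        int numTrees = 10;                 // illustrative forest size
        J48[] trees = new J48[numTrees];
        Random rng = new Random(1);
        for (int t = 0; t < numTrees; t++) {
            Instances bootstrap = data.resample(rng);   // bootstrap sample drawn with replacement
            trees[t] = new J48();
            trees[t].buildClassifier(bootstrap);
        }

        // Majority vote of the individual trees for a single instance.
        Instance x = data.instance(0);
        int[] votes = new int[data.numClasses()];
        for (J48 tree : trees) {
            votes[(int) tree.classifyInstance(x)]++;
        }
        int winner = 0;
        for (int c = 1; c < votes.length; c++) {
            if (votes[c] > votes[winner]) {
                winner = c;
            }
        }
        System.out.println("Voted class: " + data.classAttribute().value(winner));
    }
}
```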

2.2.3. Support vector machine


The target of SVM training is the optimal hyperplane with respect to the closest data points; during the training phase, the hyperplane parameters are continually adjusted to optimize the separation between the data points and the hyperplane [31]. SVM uses structural risk minimization to solve a constrained quadratic optimization problem that segregates the data across a decision boundary, often known as the hyperplane f(x) = 0. The items of the data input x_i (i = 1, 2, …, N) carry labels corresponding to the positive and negative classes. In the case of linearly separable data, (2) yields the hyperplane dividing the data:

$y = F(x) = W^T x + b = \sum_{i=1}^{N} W_i x_i + b$ (2)

The vector W and scalar b determine the best-separating hyperplane, which maximizes the distance between the plane and the closest data points. Using a kernel function, SVM may also be applied to non-linear classification tasks in which the features are not linearly separable, by mapping them into a high-dimensional feature space [32].
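In WEKA, the SVM described above is available as the SMO classifier, and the kernel mentioned for the non-linear case can be set explicitly. The sketch below is illustrative: the RBF kernel, its gamma value, and the file name are assumptions rather than settings reported in the paper.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SvmStrokeClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("stroke_cleaned.csv");   // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        // SMO is WEKA's SVM implementation; the kernel handles non-linearly separable features.
        SMO svm = new SMO();
        RBFKernel kernel = new RBFKernel();
        kernel.setGamma(0.01);             // illustrative kernel parameter
        svm.setKernel(kernel);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("\nSVM (SMO) 10-fold CV results:", false));
    }
}
```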

2.2.4. Multi-layer perceptron


MLP is a kind of neural network that employs the back-propagation method for supervised learning. The MLP architecture is composed of a three-layer configuration, an input layer, hidden layer(s), and an output layer, in which every neuron is connected to every neuron in the adjacent layer. MLP is reported to perform exceptionally well on non-linear problems [33]. Figure 2 demonstrates the MLP neural network architecture.

Figure 2. MLP neural networks' architecture [34]
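WEKA's MultilayerPerceptron implements the back-propagation training described above; the input and output layer sizes follow from the data, so only the hidden layer(s) and training hyperparameters need to be set. The values below (and the file name) are illustrative assumptions, not the configuration reported in the paper.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MlpStrokeClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("stroke_cleaned.csv");   // assumed file name
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setHiddenLayers("10");     // one hidden layer with 10 neurons (illustrative)
        mlp.setLearningRate(0.3);      // back-propagation step size
        mlp.setMomentum(0.2);
        mlp.setTrainingTime(500);      // number of training epochs

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.println(eval.toSummaryString("\nMLP 10-fold CV results:", false));
    }
}
```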


3. PERFORMANCE EVALUATION AND CLASSIFICATION RESULTS


In this paper, the classification process was implemented using multiple classifiers (NB, RF, SVM, and MLP) for stroke detection on the selected dataset. The performance of the proposed model was evaluated using different metrics, the first of which is accuracy, which has been used in many studies to quantify classification correctness according to (3) [35]. Another measure used in this paper is precision, a measure of data accuracy achieved when limited information is available; in binary classification, precision is equivalent to the positive predictive value and is computed as shown in (4) [36].
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (3)

$\text{Precision} = \frac{TP}{TP + FP}$ (4)

On the other hand, the sensitivity index was used to measure the share of all positive samples (TP+FN) that were assigned to the positive category (TP), i.e., the ratio of true positives to the total number of actual positives, as shown in (5) [37]. The F-measure was also used in this work; it is the harmonic mean of precision and sensitivity, assigning equal weight to each, as shown in (6) [38].
$\text{Sensitivity} = \frac{TP}{TP + FN}$ (5)

$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$ (6)
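Equations (3)-(6) translate directly into code. The small helper below computes the four metrics from confusion matrix counts; the example counts in main are placeholders, not values from this study.

```java
public final class ClassificationMetrics {

    // Equations (3)-(6) expressed as functions of the confusion matrix counts.
    static double accuracy(double tp, double tn, double fp, double fn) {
        return (tp + tn) / (tp + tn + fp + fn);
    }

    static double precision(double tp, double fp) {
        return tp / (tp + fp);
    }

    static double sensitivity(double tp, double fn) {
        return tp / (tp + fn);
    }

    static double fMeasure(double precision, double sensitivity) {
        return 2 * precision * sensitivity / (precision + sensitivity);
    }

    public static void main(String[] args) {
        // Placeholder counts for illustration only; see Tables 3 to 6 for the study's matrices.
        double tp = 90, tn = 50, fp = 10, fn = 5;
        double p = precision(tp, fp);
        double s = sensitivity(tp, fn);
        System.out.printf("accuracy=%.3f precision=%.3f sensitivity=%.3f F-measure=%.3f%n",
                accuracy(tp, tn, fp, fn), p, s, fMeasure(p, s));
    }
}
```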

3.1. Confusion matrix


The confusion matrix is an important tool for assessing the accuracy of classification models, although the concept is often found confusing [39]. Table 2 shows the confusion matrix for a binary classifier, where the predicted values are denoted as positive (1) and negative (0), while the actual values are marked true (1) and false (0). The classification model's performance measures are estimated from the TP, TN, FP, and FN entries of the confusion matrix [40], [41].
In this paper, four classification processes were performed, and the performance of each was evaluated based on its confusion matrix, as shown in Tables 3 to 6. From these results, we note that NB achieved an accuracy of 90.4%, while the MLP and RF classifiers reached 94.0%. Finally, SVM achieved the highest accuracy, 94.4%.

Table 2. Confusion matrix [42]


Predicted
Congested Uncongested
Actual Congested True positive (TP) False negative (FN)
Uncongested False positive (FP) True negative (TN)

Table 3. Confusion matrix for NB


Confusion matrix Normal Sick
Normal 2891 182
Sick 129 51

Table 4. Confusion matrix for RF


Confusion matrix Normal Sick
Normal 3058 15
Sick 180 0

Table 5. Confusion matrix for SVM


Confusion matrix Normal Sick
Normal 3037 0
Sick 180 0


Table 6. Confusion matrix for MLP


Confusion matrix Normal Sick
Normal 3051 22
Sick 174 6
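As a consistency check, applying (3)-(6) to the SVM confusion matrix in Table 5, with the 'Normal' class taken as the positive class (TP = 3037, FN = 0, FP = 180, TN = 0), reproduces the figures reported for SVM in Table 7:

$\text{Accuracy} = \frac{3037 + 0}{3037 + 0 + 180 + 0} \approx 0.944, \qquad \text{Precision} = \frac{3037}{3037 + 180} \approx 0.944$

$\text{Sensitivity} = \frac{3037}{3037 + 0} = 1.000, \qquad \text{F-measure} = \frac{2 \times 0.944 \times 1.000}{0.944 + 1.000} \approx 0.971$

That is, 94.4%, 94.4%, 100%, and 97.1%, respectively.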

Comparing these classification results in Table 7 and Figure 3, we can see that NB has the highest precision score of 95.7%, while SVM has the highest sensitivity and F-measure, at 100% and 97.1%, respectively. These results show the superiority of SVM over the other classifiers, with an accuracy of 94.4% and a sensitivity of 100%, together with good precision (94.4%) and F-measure (97.1%). The accuracy reported in this paper also compares favorably with previous research: Islam et al. [5] reported an accuracy of 80%, Heo et al. [6] reported 88%, and Hadianfard et al. [7] reported 76.96%, whereas this study achieved around 94.4%.

Table 7. Performance comparison of classifiers (10-fold cross-validation)


Classifier NB (%) RF (%) SVM (%) MLP (%)
Accuracy 90.4 94.0 94.4 94.0
Precision 95.7 94.4 94.4 94.6
Sensitivity 94.1 99.5 100 99.3
F-measure 94.9 96.9 97.1 96.9

Figure 3. Performance comparison of classifiers (10-fold cross-validation)

4. CONCLUSION
In this study, several classification algorithms were evaluated to detect stroke based on a set of features such as age, hypertension, heart disease, blood sugar, BMI, marital status, and smoking status. The WEKA data mining software was used to evaluate and analyze the NB, RF, SVM, and MLP algorithms. Classification performance was measured using a variety of evaluation metrics, such as accuracy, precision, sensitivity, and F-measure, on the stroke dataset with 10-fold cross-validation. SVM demonstrated strong generalization ability, achieving reliable results on both training and testing datasets, with values of 94.4%, 100%, and 97.1% for accuracy, sensitivity, and F-measure, respectively. In future work, a combination of other classification methods may be used to enhance the results.

REFERENCES
[1] H. K. V, H. P, G. Gupta, V. P, and P. K B, “Stroke prediction using machine learning algorithms,” International Journal of
Innovative Research in Engineering & Management, vol. 8, no. 4, Jul. 2021, doi: 10.21276/ijirem.2021.8.4.2.
[2] D. Pastore et al., “Sex-genetic interaction in the risk for cerebrovascular disease,” Current Medicinal Chemistry, vol. 24, no. 24, Sep. 2017, doi: 10.2174/0929867324666170417100318.


[3] H. C. Kim, D. P. Choi, S. V. Ahn, C. M. Nam, and I. Suh, “Six-year survival and causes of death among stroke patients in Korea,”
Neuroepidemiology, vol. 32, no. 2, pp. 94–100, Nov. 2009, doi: 10.1159/000177034.
[4] H. Lee, S. H. Oh, H. Cho, H. J. Cho, and H. Y. Kang, “Prevalence and socio-economic burden of heart failure in an aging society
of South Korea,” BMC Cardiovascular Disorders, vol. 16, no. 1, Nov. 2016, doi: 10.1186/s12872-016-0404-2.
[5] M. S. Islam, I. Hussain, M. M. Rahman, S. J. Park, and M. A. Hossain, “Explainable artificial intelligence model for stroke prediction
using EEG signal,” Sensors, vol. 22, no. 24, 2022, doi: 10.3390/s22249859.
[6] J. N. Heo, J. G. Yoon, H. Park, Y. D. Kim, H. S. Nam, and J. H. Heo, “Machine learning-based model for prediction of outcomes
in acute stroke,” Stroke, vol. 50, no. 5, pp. 1263–1265, 2019, doi: 10.1161/STROKEAHA.118.024293.
[7] Z. Hadianfard, H. L. Afshar, S. Nazarbaghi, B. Rahimi, and T. Timpka, “Predicting mortality in patients with stroke using data
mining techniques,” Acta Informatica Pragensia, vol. 11, no. 1, pp. 36–47, 2022, doi: 10.18267/j.aip.163.
[8] Y. A. Choi et al., “Deep learning-based stroke disease prediction system using real-time bio signals,” Sensors, vol. 21, no. 13, 2021,
doi: 10.3390/s21134269.
[9] E. Frank et al., “WEKA-a machine learning workbench for data mining,” in Data Mining and Knowledge Discovery Handbook,
Springer US, 2009, pp. 1269–1277, doi: 10.1007/978-0-387-09823-4_66.
[10] G. Aksu and N. Doğan, “An analysis program used in data mining: WEKA,” Journal of Measurement and Evaluation in Education
and Psychology, vol. 10, no. 1, pp. 80–95, 2019, doi: 10.21031/epod.399832.
[11] Z. B. Zamir, “Can the WEKA data mining tool be used in developing an economic growth model?,” Journal of Accounting, Business
and Management (JABM), vol. 30, no. 2, Nov. 2023, doi: 10.31966/jabminternational.v30i2.919.
[12] N. Nissa, S. Jamwal, and M. Neshat, “A technical comparative heart disease prediction framework using boosting ensemble
techniques,” Computation, vol. 12, no. 1, 2024, doi: 10.3390/computation12010015.
[13] K. A. Shakil, S. Anis, and M. Alam, “Dengue disease prediction using weka data mining tool,” arXiv-Computer Science, pp. 1-26,
Feb. 2015, doi: 10.48550/arXiv.1502.05167.
[14] J. R. M. Navin and R. Pankaja, “Performance analysis of text classification algorithms using confusion matrix,” International
Journal of Engineering and Technical Research (IJETR), vol. 6, no. 4, pp. 75–78, 2013.
[15] Fedesoriano, “Stroke prediction dataset,” Kaggle, 2021. Accessed: Feb. 12, 2024. [Online]. Available: https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
[16] K. M. Rexrode, T. E. Madsen, A. Y. X. Yu, C. Carcel, J. H. Lichtman, and E. C. Miller, “The impact of sex and gender on stroke,”
Circulation Research, vol. 130, no. 4, pp. 512–528, 2022, doi: 10.1161/CIRCRESAHA.121.319915.
[17] M. S. Ekker, J. I. Verhoeven, I. Vaartjes, K. M. V. Nieuwenhuizen, C. J. M. Klijn, and F. E. D. Leeuw, “Stroke incidence in young
adults according to age, subtype, sex, and time trends,” Neurology, vol. 92, no. 21, pp. e2444–e2454, 2019, doi:
10.1212/WNL.0000000000007533.
[18] J. Dubow and M. E. Fink, “Impact of hypertension on stroke,” Current Atherosclerosis Reports, vol. 13, no. 4, pp. 298–305, 2011,
doi: 10.1007/s11883-011-0187-y.
[19] C. W. Tsao et al., “Heart disease and stroke statistics-2022 update: a report from the american heart association,” Circulation, vol.
145, no. 8, pp. E153–E639, 2022, doi: 10.1161/CIR.0000000000001052.
[20] S. Ramazanu, A. Y. Loke, and V. C. L. Chiang, “Couples coping in the community after the stroke of a spouse: a scoping review,”
Nursing Open, vol. 7, no. 2, pp. 472–482, Nov. 2020, doi: 10.1002/nop2.413.
[21] Á. Chamorro et al., “Glucose modifies the effect of endovascular thrombectomy in patients with acute stroke,” Stroke, vol. 50, no.
3, pp. 690–696, Mar. 2019, doi: 10.1161/STROKEAHA.118.023769.
[22] F. Q. Nuttall, “Body mass index: obesity, BMI, and health: a critical review,” Nutrition Today, vol. 50, no. 3, pp. 117–128, 2015,
doi: 10.1097/NT.0000000000000092.
[23] B. Pan, X. Jin, L. Jun, S. Qiu, Q. Zheng, and M. Pan, “The relationship between smoking and stroke a meta-analysis,” Medicine
(United States), vol. 98, no. 12, 2019, doi: 10.1097/MD.0000000000014872.
[24] P. Refaeilzadeh, L. Tang, and H. Liu, “On comparison of feature selection algorithms,” in Proceedings of AAAI workshop on
evaluation methods for machine learning II, pp. 34-39, 2007.
[25] I. H. Sarker, M. H. Furhad, and R. Nowrozy, “AI-driven cybersecurity: an overview, security intelligence modeling and research
directions,” SN Computer Science, vol. 2, no. 3, 2021, doi: 10.1007/s42979-021-00557-0.
[26] S. Chen, G. I. Webb, L. Liu, and X. Ma, “A novel selective naïve Bayes algorithm,” Knowledge-Based Systems, vol. 192, 2020,
doi: 10.1016/j.knosys.2019.105361.
[27] S. Sayad, “Naive bayesian,” Presentation. Accessed: Feb. 12, 2024. [Online]. Available: https://www.saedsayad.com/naive_bayesian.htm
[28] P. Langley and S. Sage, “Induction of selective Bayesian classifiers,” in Uncertainty Proceedings 1994, Elsevier, 1994, pp. 399–
406, doi: 10.1016/b978-1-55860-332-5.50055-9.
[29] B. Mahesh, “Machine learning algorithms-a review,” International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381–
386, 2020, doi: 10.21275/ART20203995.
[30] Q. Xu and J. Yin, “Application of random forest algorithm in physical education,” Scientific Programming, vol. 2021, pp. 1–10,
Sep. 2021, doi: 10.1155/2021/1996904.
[31] M. Savargiv, B. Masoumi, and M. R. Keyvanpour, “A new random forest algorithm based on learning automata,” Computational
Intelligence and Neuroscience, vol. 2021, pp. 1–19, 2021, doi: 10.1155/2021/5572781.
[32] M. Wei, W. Meng, F. Dai, and W. Wu, “Application of machine learning in predicting the rate-dependent compressive strength of
rocks,” Journal of Rock Mechanics and Geotechnical Engineering, vol. 14, no. 5, pp. 1356–1365, 2022, doi:
10.1016/j.jrmge.2022.01.008.
[33] P. F. Orrù, A. Zoccheddu, L. Sassu, C. Mattia, R. Cozza, and S. Arena, “Machine learning approach using MLP and SVM algorithms
for the fault prediction of a centrifugal pump in the oil and gas industry,” Sustainability, vol. 12, no. 11, 2020, doi:
10.3390/su12114776.
[34] A. Pinkus, “Approximation theory of the MLP model in neural networks,” Acta Numerica, vol. 8, pp. 143–195, 1999, doi:
10.1017/S0962492900002919.
[35] Ž. Vujović, “Classification model evaluation metrics,” International Journal of Advanced Computer Science and Applications, vol.
12, no. 6, pp. 599–606, 2021, doi: 10.14569/IJACSA.2021.0120670.
[36] A. Tharwat, “Classification assessment methods,” Applied Computing and Informatics, vol. 17, no. 1, pp. 168–192, 2018, doi:
10.1016/j.aci.2018.08.003.
[37] F. Rahmad, Y. Suryanto, and K. Ramli, “Performance comparison of anti-spam technology using confusion matrix classification,”
IOP Conference Series: Materials Science and Engineering, vol. 879, no. 1, 2020, doi: 10.1088/1757-899X/879/1/012076.


[38] H. Yun, “Prediction model of algal blooms using logistic regression and confusion matrix,” International Journal of Electrical and
Computer Engineering, vol. 11, no. 3, pp. 2407–2413, 2021, doi: 10.11591/ijece.v11i3.pp2407-2413.
[39] D. Li, F. Huang, L. Yan, Z. Cao, J. Chen, and Z. Ye, “Landslide susceptibility prediction using particle-swarm-optimized multilayer
perceptron: Comparisons with multilayer-perceptron-only, BP neural network, and information value models,” Applied Sciences,
vol. 9, no. 18, Sep. 2019, doi: 10.3390/app9183664.
[40] G. Zeng, “On the confusion matrix in credit scoring and its analytical properties,” Communications in Statistics - Theory and
Methods, vol. 49, no. 9, pp. 2080–2093, 2020, doi: 10.1080/03610926.2019.1568485.
[41] R. AlShboul, F. Thabtah, A. J. W. Scott, and Y. Wang, “The application of intelligent data models for dementia classification,”
Applied Sciences, vol. 13, no. 6, 2023, doi: 10.3390/app13063612.
[42] D. Fuqua and T. Razzaghi, “A cost-sensitive convolution neural network learning for control chart pattern recognition,” Expert
Systems with Applications, vol. 150, 2020, doi: 10.1016/j.eswa.2020.113275.

BIOGRAPHIES OF AUTHORS

Muneera Altayeb obtained a bachelor’s degree in computer engineering in 2007,


and a master’s degree in communications engineering from the University of Jordan in 2010.
She has been working as a lecturer in the Department of Communications and Computer
Engineering at Al-Ahliyya Amman University since 2015, in addition to her administrative
experience as assistant dean of the Faculty of Engineering during the period (2020-2023). Her
research interests focus on the following areas: digital signals and image processing, machine
learning, robotics, and artificial intelligence. She can be contacted at email:
[email protected].

Areen Arabiat earned her B.Sc. in Computer Engineering in 2005 from al Balqaa
Applied University, and her M.Sc. in Intelligent Transportation Systems (ITS) from Al Ahliyya
Amman University in 2022. She has been a computer lab supervisor at the Faculty of Engineering, Al-Ahliyya Amman University, since 2013. Her research interests are focused on
the areas: machine learning, data mining, artificial intelligence, and image processing. She can
be contacted at email: [email protected].
