
Hindawi

Journal of Healthcare Engineering


Volume 2021, Article ID 7633381, 12 pages
https://doi.org/10.1155/2021/7633381

Research Article
Stroke Disease Detection and Prediction Using Robust Learning Approaches

Tahia Tazin,1 Md Nur Alam,1 Nahian Nakiba Dola,1 Mohammad Sajibul Bari,1 Sami Bourouis,2 and Mohammad Monirujjaman Khan1

1 Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka 1229, Bangladesh
2 Department of Information Technology, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia

Correspondence should be addressed to Mohammad Monirujjaman Khan; [email protected]

Received 7 October 2021; Revised 4 November 2021; Accepted 9 November 2021; Published 26 November 2021

Academic Editor: Han Wang

Copyright © 2021 Tahia Tazin et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Stroke is a medical disorder in which blood vessels in the brain rupture, causing damage to the brain. Symptoms may develop when the supply of blood and other nutrients to the brain is interrupted. According to the World Health Organization (WHO), stroke is the leading cause of death and disability globally. Early recognition of the warning signs of a stroke can help reduce its severity. Different machine learning (ML) models have been developed to predict the likelihood of a stroke occurring in the brain. This research uses a range of physiological parameters and machine learning algorithms, such as Logistic Regression (LR), Decision Tree (DT) Classification, Random Forest (RF) Classification, and Voting Classifier, to train four different models for reliable prediction. Random Forest was the best-performing algorithm for this task, with an accuracy of approximately 96 percent. The dataset used in the development of the method was the open-access Stroke Prediction dataset. The accuracy of the models used in this investigation is considerably higher than in previous studies, indicating that these models are more reliable, and numerous model comparisons have established the robustness of the proposed scheme.

1. Introduction

Stroke occurs when blood flow to areas of the brain is disrupted or diminished, so that the cells in those areas do not receive the nutrients and oxygen they require and die. A stroke is a medical emergency that requires urgent medical attention. Early detection and appropriate management are required to prevent further damage to the affected area of the brain and complications in other parts of the body. The World Health Organization (WHO) estimates that fifteen million people worldwide suffer a stroke each year, with one person dying every four to five minutes among those affected. Stroke is the sixth leading cause of mortality in the United States according to the Centers for Disease Control and Prevention (CDC) [1]. Stroke is a noncommunicable disease that kills approximately 11% of the population. In the United States, approximately 795,000 people suffer from the disabling effects of strokes on a regular basis [2]. It is India's fourth leading cause of death. Strokes are classified as ischemic or hemorrhagic: in an ischemic stroke, clots obstruct blood flow; in a hemorrhagic stroke, a weak blood vessel bursts and bleeds into the brain. Stroke may be avoided by leading a healthy and balanced lifestyle that includes abstaining from unhealthy behaviors such as smoking and drinking, keeping a healthy body mass index (BMI) and an average glucose level, and maintaining good heart and kidney function. Stroke prediction is essential, and strokes must be treated promptly to avoid irreversible damage or death. With the development of technology in the medical sector, it is now possible to anticipate the onset of a stroke by utilizing ML techniques, whose algorithms allow for accurate prediction and proper analysis. The majority of

previous stroke-related research has focused on, among other things, the prediction of heart attacks; brain stroke has been the subject of very few studies. The main motivation of this paper is to demonstrate how ML may be used to forecast the onset of a brain stroke. The most important aspect of the methods employed and the findings achieved is that, among the four distinct classification algorithms tested, Random Forest fared the best, achieving a higher accuracy metric than the others. One downside of the model is that it is trained on tabular data rather than real-time brain images. The implementation of four ML classification methods is shown in this paper.

Numerous academics have previously utilized machine learning to forecast strokes. Govindarajan et al. [3] used text mining and a machine learning classifier to classify stroke disorders in 507 individuals. They tested a variety of machine learning methods for training, including Artificial Neural Networks (ANN), and found that the SGD algorithm provided the greatest value, 95 percent. Amini et al. [4, 5] performed research to predict stroke occurrence. They classified 50 risk variables for stroke, diabetes, cardiovascular disease, smoking, hyperlipidemia, and alcohol consumption in 807 healthy and unhealthy individuals. They used the two most accurate methods: the C4.5 decision tree algorithm (95 percent accuracy) and the k-nearest neighbor algorithm (94 percent accuracy). Cheng et al. [6] presented a study on estimating the prognosis of ischemic stroke. In their study, they used 82 ischemic stroke patient data sets and two ANN models, with accuracy values of 79 and 95 percent. Cheon et al. [7-9] conducted research on the predictability of stroke patient mortality. They studied stroke incidence in 15,099 individuals and detected strokes using a deep neural network, utilizing PCA to extract information from the medical records; they reported an area under the curve (AUC) of 83 percent. Singh et al. [10] conducted research using artificial intelligence to predict strokes. They employed a new technique on the cardiovascular health study (CHS) dataset, using the decision tree method for feature extraction followed by principal component analysis; the model was built with a neural network classification method and achieved 97 percent accuracy.

Chin et al. [11] conducted research on the accuracy of automated early ischemic stroke detection. The major objective of their research was to create a method for automating primary ischemic stroke detection using a Convolutional Neural Network (CNN). They amassed 256 images for training and testing the CNN model and used data augmentation to enlarge the gathered images during image preparation. Their CNN technique achieved a 90 percent accuracy rate. Sung et al. [12] conducted research to establish a stroke severity index. They gathered data on 3577 patients who had an acute ischemic stroke and utilized a variety of data mining methods, including linear regression, to create their predictive models; the k-nearest neighbor model predicted best (95% confidence interval). Monteiro et al. [13] used machine learning to predict the functional prognosis of ischemic stroke three months after admission and obtained an AUC value greater than 0.90. Kansadub et al. [14] conducted research to determine stroke risk. The authors analyzed the data to predict strokes using naive Bayes, decision trees, and neural networks, and assessed each predictor's accuracy and AUC; among these algorithms, naive Bayes provided the most accurate results. Adam et al. [15] conducted research on the classification of ischemic stroke. They categorized ischemic strokes using two models, the k-nearest neighbor method and the decision tree technique; in their study, the decision tree method was found by medical experts to be more useful for categorizing strokes.

The majority of previous studies had an accuracy rate of around 90%, which was considered to be quite good. The novelty of our research is that we used several well-known machine learning methods to get the best result. Random forest (RF), decision tree (DT), voting classifier (VC), and logistic regression (LR) were the most successful algorithms, with 96, 94, 91, and 87 percent F1-scores, respectively. The accuracy of the models used in this research is much greater than that of the models used in previous investigations, suggesting that the models used here are more trustworthy; they have been shown to be resilient in many model comparisons.

As mentioned earlier, the major contribution of this research is that we have applied different machine learning models to a publicly available dataset. In previous work, most researchers used a single model to predict stroke disease; we used four different models and compared the results with previous work. All the results and comparisons are discussed in the following sections. The rest of this article is set out as follows: the experimental methodology and procedures are described in Section 2; the result analysis is provided in Section 3; and conclusions are discussed in Section 4.

2. Procedure and Experimental Methodology

This section includes a description of the dataset, a block diagram, a flow diagram, and evaluation metrics, as well as the process and methodology used in the study.

2.1. Proposed System. The data becomes available for model construction once it has been processed. A preprocessed dataset and machine learning techniques are needed for model construction. LR, DT classification, RF classification, and the voting classifier are the methods used. After creating the four alternative models, accuracy measures, namely accuracy score, precision score, recall score, and F1-score, are used to compare them. The designed system's block diagram is shown in Figure 1. All the components of the block diagram are discussed in the following subsections.

[Figure 1 block diagram: Dataset, then Data Preprocessing (missing data analysis, handling imbalanced data, label encoding), then Machine Learning Algorithms (random forest, logistic regression, decision tree, voting classifier), then Model Building and Comparing.]

Figure 1: Proposed system's block diagram.
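The preprocessing branch of Figure 1 ends with handling imbalanced data, which the paper does with SMOTE (Section 2.3). The paper gives no code, so the sketch below is our minimal stand-in illustrating SMOTE's core idea, synthesizing minority samples by interpolating toward nearest neighbours; the function name `smote_oversample` and the toy data are assumptions, not from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE-style sketch: create n_new synthetic minority samples
    by interpolating between a minority sample and one of its k neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)              # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))           # pick a random minority sample
        j = idx[i][rng.integers(1, k + 1)]     # one of its k nearest neighbours
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

# toy imbalanced table: 50 "no stroke" rows, 8 "stroke" rows
rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(50, 3))
X_minority = rng.normal(3.0, 1.0, size=(8, 3))

X_new = smote_oversample(X_minority, n_new=42, k=3, seed=1)
X_bal = np.vstack([X_majority, X_minority, X_new])
y_bal = np.array([0] * 50 + [1] * 50)
print(X_bal.shape)  # (100, 3)
```

In practice the `imblearn` library's `SMOTE` class performs this step with more care (neighbour selection per sample, categorical handling) and is the usual choice.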

2.2. Dataset. The stroke prediction dataset [16] was used to perform the study. There were 5110 rows and 12 columns in this dataset. The value of the output column, stroke, is either 1 or 0: 0 indicates that no stroke risk was identified, while 1 indicates that a stroke risk was detected. Zeros far outnumber ones in the output column: only 249 rows in the stroke column have the value 1, whereas 4861 rows have the value 0. To improve accuracy, data preprocessing is used to balance the data. Figure 2 shows the total number of stroke and nonstroke records in the output column before preprocessing. From Figure 2, it is clear that this dataset is imbalanced; the SMOTE technique has been used to balance it.

2.3. Preprocessing. Before building a model, data preprocessing is required to remove unwanted noise and outliers from the dataset that could lead the model to depart from its intended training. This stage addresses everything that prevents the model from functioning more efficiently. Following the collection of the relevant dataset, the data must be cleaned and prepared for model development. As stated before, the dataset has twelve attributes. To begin with, the column id is omitted, since its presence has no bearing on model construction. The dataset is then inspected for null values, which are filled if any are detected; the null values in the column bmi are filled using that column's mean.

Label encoding converts the dataset's string values to integer values that the computer can process. As models are trained on numbers, the strings must be converted to integers. The gathered dataset has five columns of string type; all strings are encoded during label encoding, and the whole dataset is transformed into a collection of numbers. The dataset used for stroke prediction is very imbalanced: of its 5110 rows, 249 indicate the possibility of a stroke and 4861 confirm the lack of one. While training a model on such data may yield high accuracy, other measures such as precision and recall suffer. If such imbalanced data is not dealt with properly, the findings will be inaccurate and the predictions ineffective. As a result, to obtain an efficient model, this imbalanced data must be dealt with first. The SMOTE technique was employed for this purpose. Figure 3 depicts the dataset's balanced output column.

The next stage, after finishing data preparation and managing the imbalanced dataset, is to construct the model. To improve the accuracy and efficiency of this job, the data is divided into training and testing data with a ratio of 80

percent training data and 20 percent testing data. After splitting, the model is trained using a variety of classification methods: random forest, decision tree, voting classifier, and logistic regression are the classification algorithms utilized in this study.

[Figure 2: bar chart of output-class counts before preprocessing (4861 nonstroke vs. 249 stroke records).]

Figure 2: Total number of stroke and normal data.

[Figure 3: bar chart of output-class counts after SMOTE balancing (equal stroke and nonstroke records).]

Figure 3: Output column after preprocessing.

2.4. Proposed Algorithms. The most common disease identified in the medical field is stroke, which is on the rise year after year. Using the publicly accessible stroke prediction dataset, the study measured four commonly used machine learning methods for predicting brain stroke occurrence, which are as follows:

(i) Random forest
(ii) Decision tree
(iii) Voting classifier
(iv) Logistic regression

2.4.1. Random Forest. The classification algorithm chosen was RF classification [17]. RFs are composed of numerous independent decision trees, each trained on a random sample of the data. These trees are created during training, and their outputs are collected. A process termed voting is used to determine the final forecast made by this algorithm: each DT in the forest must vote for one of the two output classes (in this case, stroke or no stroke), and the RF method chooses the class with the most votes as the final prediction. A block diagram of random forest classification is shown in Figure 4.

The flexibility of the random forest is one of its most attractive features. It may be utilized for both regression and classification tasks, and the relative importance it assigns to input features is readily apparent. Additionally, it is a beneficial approach since its default hyperparameters often give clear predictions, and there are relatively few hyperparameters to understand in the first place. Overfitting is a well-known problem in machine learning, although it occurs seldom with the random forest classifier: if there are sufficient trees in the forest, the classifier will not overfit the model.

2.4.2. Decision Tree. Both regression and classification concerns are addressed using classification with DT [18]. Furthermore, as the input variables already have an associated output variable, this methodology is a supervised learning model. It resembles a tree: the data is repeatedly segmented according to a specific parameter. The decision node and the leaf node are the two parts of a decision tree; at the former the data is divided, and the latter produces the result. The DT classifier's basic structure is depicted in Figure 5.

The DT is easy to comprehend, since it replicates the steps a person goes through while making a real-world decision and considers all potential solutions to an issue, so it can be very beneficial in decision-making problems. Cleaning the data is not required as much as with other methods.

2.4.3. Voting Classifier. A voting classifier is a type of classification model that trains on an ensemble of multiple models and predicts an output (class) based on the class that has the greatest chance of being selected as the output [19]. It aggregates the predictions of its base models by a vote. The flowchart for the voting classifier model is shown in Figure 6.

Voting summarizes the methodology we use to combine the various trained models. There are two methods of voting, which are as follows:

(i) Soft voting: the predicted probabilities from each model are added and averaged. The category with the highest average value is deemed the winner and becomes the output. While this is a fair and rational strategy, it is only recommended if the individual classifiers are calibrated correctly. It is similar to computing a weighted average of a set of numbers, with each of the various models contributing proportionally to the final output vector.

(ii) Hard voting: this phase combines the classification outputs of all the various models and specifies the

[Figure 4: the training dataset is split into random training samples 1 to n, one decision tree is fit per sample, each tree predicts on the test dataset, and voting over the trees yields the final prediction.]

Figure 4: Block diagram of Random Forest Classifier.
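A sketch of the random forest setup in scikit-learn, which the paper cites in [17]. The synthetic data is our stand-in for the preprocessed, balanced stroke table; the 80/20 split matches the ratio described in Section 2.3.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in for the preprocessed, SMOTE-balanced stroke table
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=42)

# 80 percent training data, 20 percent testing data, as in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# each of the 100 trees votes for a class; the majority class is the prediction
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
acc = rf.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```

The paper also mentions fine-tuning; with scikit-learn that would typically mean searching over `n_estimators` and `max_depth`, though the exact grid used is not stated.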

[Figure 5: a root node (level 1) splits into an internal node and a leaf node (level 2); the internal node splits into two leaf nodes (level 3).]

Figure 5: Basic structure of a decision tree classifier.
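A minimal decision tree sketch with scikit-learn, which the paper cites in [18]. The breast-cancer dataset is only a stand-in binary task, not the stroke data; `export_text` prints the decision and leaf nodes the section describes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# a public binary-classification dataset stands in for the stroke table
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# max_depth limits how many times the data is segmented on a parameter
dt = DecisionTreeClassifier(max_depth=3, random_state=0)
dt.fit(X_tr, y_tr)

print(export_text(dt, max_depth=2))   # decision nodes and leaf nodes as text
print(f"accuracy: {dt.score(X_te, y_te):.2f}")
```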

final output value as the mode value of the resultant outputs. Because the particular probability values associated with each model are disregarded, this approach is analogous to computing the arithmetic mean of a collection of numbers: only the output of each model is considered.

2.4.4. Logistic Regression. The flowchart for the logistic regression model is shown in Figure 7. LR is one of the most commonly used supervised ML algorithms [20]. It is a forecasting method that uses a collection of independent factors to predict a categorical dependent variable.

Utilizing logistic regression, the output of a categorical dependent variable is predicted; as a result, the output must be discrete or categorical in nature. It may be yes or no, 0 or 1, true or false, etc., but probability values between 0 and 1 are produced. Logistic regression and linear regression are used in very similar ways: classification problems are addressed with LR, and regression problems are addressed with linear regression. Instead of a regression line, we use an S-shaped logistic function that maps predictions toward the two extreme values (0 or 1).

2.5. Evaluation Matrix. Figure 8 depicts the confusion matrix, or evaluation matrix. The confusion matrix is a tool for evaluating the performance of machine learning classification algorithms, and it has been used to test the efficiency of all the models created. The confusion matrix illustrates how often our models predict correctly and how often they predict incorrectly: false positives and false negatives are allocated to wrongly predicted values, whereas true positives and true negatives are assigned to properly predicted values. The model's accuracy, precision-recall trade-off, and AUC were utilized to assess its performance after grouping all predicted values in the matrix.

3. Result Analysis

The models' capacities, model forecasts, investigation, and eventual outcomes are examined in this part.

3.1. Data Visualization. A histogram depicts a frequency distribution with continuous classes. It is an area chart made of rectangles with bases on the class-boundary intervals and areas proportional to the corresponding class frequencies. As the bases fill the spaces between the class boundaries, the square

[Figure 6: the data flows into learners 1 to n; a meta-learner combines their outputs into the prediction result.]

Figure 6: Flowchart of a voting classifier.
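The hard and soft voting modes of Section 2.4.3 map directly onto scikit-learn's `VotingClassifier`; the base learners and synthetic data below are our assumptions, since the paper does not list its exact ensemble members.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=1)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=1))]

# hard voting: take the mode (majority) of the predicted class labels
hard = VotingClassifier(estimators=base, voting="hard").fit(X_tr, y_tr)

# soft voting: average the predicted probabilities, then take the argmax
soft = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)

print(f"hard: {hard.score(X_te, y_te):.2f}  soft: {soft.score(X_te, y_te):.2f}")
```

Soft voting requires every base learner to expose `predict_proba`, which all three of the models above do.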

[Figure 7: features 1 to n feed the logistic regression model, which outputs the prediction.]

Figure 7: Structure of a logistic regression classifier.
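The S-shaped logistic function of Section 2.4.4 can be made concrete: for a fitted scikit-learn `LogisticRegression`, the class-1 probability of each row is the sigmoid of the linear score w·x + b. The synthetic data and variable names are ours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
lr = LogisticRegression(max_iter=1000).fit(X, y)

# the S-shaped logistic (sigmoid) function squeezes any score into (0, 1)
scores = X @ lr.coef_.ravel() + lr.intercept_[0]
probs = 1.0 / (1.0 + np.exp(-scores))

# matches the class-1 probabilities scikit-learn reports
assert np.allclose(probs, lr.predict_proba(X)[:, 1])
print(np.round(probs[:5], 2))
```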

shapes are all joined. The rectangles' heights are proportional to the corresponding class frequencies, or to the frequency densities for classes of different widths. Figure 9 illustrates some important features of the dataset as histograms; a histogram depicts the dataset's proportions.

Figure 9 depicts the dataset's gender, age, hypertension, heart disease, ever-married, average glucose level, and body mass index distributions. For the gender attribute, 0 means male and 1 means female; there are more female samples than male samples in this collection. Based on the age distribution, the sample's average age is in the 40s, and the upper limit is approximately 60. When it comes to hypertension, 0 means the individual does not have it, while 1 means the person has it. Most individuals in this dataset are healthy, with no history of heart disease. With regard to BMI and average glucose levels, Figure 10 shows the relationship between each feature and the target feature: gender and stroke, age and stroke, hypertension and stroke, heart disease and stroke, ever_married and stroke, avg_glucose_level and stroke, and BMI and stroke.

3.2. Visualization of Feature Selection. The process of feature selection is shown in Figure 11. Feature selection aids in comprehending how features are linked to one another. Figure 11 shows that age, hypertension, avg_glucose_level, heart_disease, ever_married, and bmi are positively correlated with the target feature, while gender is negatively correlated with stroke.

3.3. Evaluation of the Model

3.3.1. Random Forest (RF). Figure 12 depicts the classification report for the RF model. In this case, the overall F1-score obtained is 96 percent; the individual F1-scores are 96 percent for healthy people and 96 percent for those who have had a brain stroke. This model achieved the highest accuracy after fine-tuning; prior to fine-tuning, the model had an accuracy of 92 percent.

Figure 13 depicts the random forest model's predictions. The predicted outcomes and the model's calculated performance are shown in the confusion matrix: there are 2707 accurate predictions and 113 erroneous predictions.

             Predicted: NO    Predicted: YES
Actual: NO       TN               FP
Actual: YES      FN               TP

Figure 8: Block diagram of confusion matrix.
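The layout in Figure 8 (rows are actual classes, columns are predicted classes) matches scikit-learn's convention, so the four cells can be unpacked directly; the toy labels below are ours.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 1]

# rows are actual classes, columns are predicted classes, as in Figure 8
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```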

[Figure 9: histograms of gender, age, hypertension, heart_disease, ever_married, avg_glucose_level, and bmi.]

Figure 9: Histogram of some important features of the dataset.

3.3.2. Decision Tree. The classification report for the decision tree classifier is shown in Figure 14. The final F1-score in this case is 94 percent; the individual F1-scores are 94 percent for healthy individuals and 95 percent for those who have had a brain stroke. The precision and recall are also shown in Figure 14. A fine-tuned decision tree model has also been implemented; however, fine-tuning did not improve the accuracy. Figure 15 depicts the DT model's predictions: there were 2664 accurate predictions and 156 erroneous predictions.

3.3.3. Voting Classifier. The classification report for the voting classifier is shown in Figure 16. The overall F1-score obtained in this case is 91 percent; the individual F1-scores are 91 percent for healthy people and 91 percent for those who have had a stroke. The precision and recall are also shown in Figure 16. Without any fine-tuning, this model achieved 91 percent accuracy. The predictions made by the voting classifier are shown in Figure 17: 2565 accurate predictions and 255 erroneous predictions.

[Figure 10: per-feature plots of gender, age, hypertension, heart_disease, ever_married, avg_glucose_level, and bmi, each plotted against the stroke label (0/1).]

Figure 10: Relationship between some important features with the target feature.

[Figure 11: bar chart of each feature's correlation with stroke: stroke 1.00, age 0.23, hypertension 0.14, avg_glucose_level 0.14, heart_disease 0.14, ever_married 0.11, bmi 0.042, gender -0.0069.]

Figure 11: Features correlation with stroke.
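A correlation ranking like Figure 11 comes straight from pandas. The toy frame below is ours and only echoes the sign pattern (age positively related to stroke); it does not reproduce the paper's exact values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 500
age = rng.uniform(20, 80, n)
# hypothetical target loosely driven by age, echoing Figure 11's positive sign
stroke = (age / 100 + rng.normal(0.0, 0.3, n) > 0.6).astype(int)
df = pd.DataFrame({"age": age, "bmi": rng.normal(28, 5, n), "stroke": stroke})

# correlation of every column with the target, ranked as in Figure 11
print(df.corr()["stroke"].sort_values(ascending=False))
```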

Figure 12: Classification report of random forest.

             Predicted 0   Predicted 1
True 0       1366          41
True 1       72            1341

Figure 13: Confusion matrix of random forest.
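The reported 96 percent accuracy follows directly from the Figure 13 counts:

```python
# cell counts read off Figure 13 (rows: actual 0/1, columns: predicted 0/1)
tn, fp, fn, tp = 1366, 41, 72, 1341

accuracy = (tp + tn) / (tp + tn + fp + fn)       # 2707 correct of 2820
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.96 precision=0.97 recall=0.95 f1=0.96
```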

Figure 14: Classification report of decision tree.



             Predicted 0   Predicted 1
True 0       1322          85
True 1       71            1342

Figure 15: Confusion matrix of a decision tree.

Figure 16: Classification report of a voting classifier.

             Predicted 0   Predicted 1
True 0       1280          127
True 1       128           1285

Figure 17: Confusion matrix of a voting classifier.

Table 1: Performance comparison.

This paper (model name)    Accuracy (%)    Reference paper (model name)     Accuracy (%)
Random forest              96              Ref. [21] random forest          73
Decision tree              94              Ref. [21] decision tree          77.6
Voting classifier          91              Ref. [12] k-nearest neighbor     95
Logistic regression        79              Ref. [21] logistic regression    77.6

3.4. Model Comparison. Table 1 shows a comparison of the models with those found in prior studies. The table clearly demonstrates that, of the various models included in the framework, the RF model is the most effective: in addition to having a higher F1-score, it has better precision, recall, and accuracy.

From Table 1, it is clear that all algorithms have an acceptable level of accuracy, but the random forest algorithm is preferable because of its higher accuracy. This paper achieved 96 percent accuracy using the RF algorithm, whereas the authors of [21] achieved only 73 percent. Likewise, using the decision tree algorithm, this paper achieved 94 percent accuracy, while the authors in [21] achieved 77.6 percent. Although the KNN algorithm has not been implemented in this research, [12] achieved 95 percent accuracy with it, which is higher than the voting classifier's accuracy (91 percent). However, in this paper, logistic regression performs poorly.

4. Conclusion

Stroke is a life-threatening medical illness that should be treated as soon as possible to avoid further complications. The development of an ML model could aid in the early detection of stroke and the subsequent mitigation of its severe consequences. The effectiveness of several ML algorithms in properly predicting stroke based on a number of physiological variables is investigated in this study. Random forest classification outperforms the other methods tested, with a classification accuracy of 96 percent. According to the research, the random forest method outperforms other approaches when cross-validation metrics are used in brain stroke forecasting. In future work, the framework may be enhanced with a larger dataset and additional machine learning models, such as AdaBoost, SVM, and bagging, which would improve its dependability. In exchange for providing some basic information, the machine learning architecture may help the general public determine the likelihood of a stroke occurring in an adult patient; ideally, it would help patients obtain early treatment for strokes and rebuild their lives after the event.

Data Availability

The data used to support the findings of this study are accessible online at https://www.kaggle.com/fedesoriano/stroke-prediction-dataset.

Conflicts of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Acknowledgments

The authors are thankful for the support from Taif University Researchers Supporting Project (TURSP-2020/26), Taif University, Taif, Saudi Arabia.

References

[1] "Stroke," Centers for Disease Control and Prevention. [Online]. Available: https://www.cdc.gov/stroke/index.htm.
[2] "Stroke facts," Centers for Disease Control and Prevention. [Online]. Available: https://www.cdc.gov/stroke/facts.htm.
[3] P. Govindarajan, R. K. Soundarapandian, A. H. Gandomi, R. Patan, P. Jayaraman, and R. Manikandan, "Classification of stroke disease using machine learning algorithms," Neural Computing & Applications, vol. 32, no. 3, pp. 817-828, 2020.
[4] L. Amini, R. Azarpazhouh, M. T. Farzadfar et al., "Prediction and control of stroke by data mining," International Journal of Preventive Medicine, vol. 4, no. 2, pp. S245-S249, 2013.
[5] S. M. Reza, M. M. Rahman, and S. A. Mamun, "A new approach for road networks: a vehicle xml device collaboration with big data," in Proceedings of the International Conference on Electrical Engineering and Information and Communication Technology, pp. 1-5, Mirpur, Dhaka, April 2014.
[6] C. A. Cheng, Y. C. Lin, and H. W. Chiu, "Prediction of the prognosis of ischemic stroke patients after intravenous thrombolysis using artificial neural networks," Studies in Health Technology and Informatics, vol. 202, pp. 115-118, 2014.
[7] S. Cheon, J. Kim, and J. Lim, "The use of deep learning to predict stroke patient mortality," International Journal of Environmental Research and Public Health, vol. 16, no. 11, 2019.
[8] M. S. Zulfiker, N. Kabir, A. A. Biswas, P. Chakraborty, and M. M. Rahman, "Predicting students' performance of the private universities of Bangladesh using machine learning approaches," International Journal of Advanced Computer Science and Applications, vol. 11, no. 3, 2020.
[9] S. Rahman, T. Sharma, S. Reza, M. Rahman, M. Kaiser, and Nurjahan, "PSO-NF based vertical handoff decision for ubiquitous heterogeneous wireless network (UHWN)," in Proceedings of the 2016 International Workshop on Computational Intelligence (IWCI), pp. 153-158, Dhaka, Bangladesh, December 2016.
[10] M. S. Singh and P. Choudhary, "Stroke prediction using artificial intelligence," in Proceedings of the 2017 8th Annual Industrial Automation and Electromechanical Engineering Conference (IEMECON), pp. 158-161, Bangkok, Thailand, August 2017.
[11] C.-L. Chin, B.-J. Lin, G.-R. Wu et al., "An automated early ischemic stroke detection system using CNN deep learning algorithm," in Proceedings of the 2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), pp. 368-372, Taichung, Taiwan, November 2017.

[12] S.-F. Sung, C.-Y. Hsieh, Y.-H. Kao Yang et al., "Developing a stroke severity index based on administrative data was feasible using data mining techniques," Journal of Clinical Epidemiology, vol. 68, no. 11, pp. 1292-1300, 2015.
[13] M. Monteiro, A. C. Fonseca, A. T. Freitas et al., "Using machine learning to improve the prediction of functional outcome in ischemic stroke patients," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 15, no. 6, pp. 1953-1959, 2018.
[14] T. Kansadub, S. Thammaboosadee, S. Kiattisin, and C. Jalayondeja, "Stroke risk prediction model based on demographic data," in Proceedings of the 2015 8th Biomedical Engineering International Conference (BMEiCON), pp. 1-3, Pattaya, Thailand, November 2015.
[15] S. Y. Adam, A. Yousif, and M. B. Bashir, "Classification of ischemic stroke using machine learning algorithms," International Journal of Computer Applications, vol. 149, no. 10, pp. 26-31, 2016.
[16] "Stroke prediction dataset," [Online]. Available: https://www.kaggle.com/fedesoriano/stroke-prediction-dataset.
[17] "Documentation for random forest classification from scikit-learn," [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.
[18] "Documentation for decision tree classification from scikit-learn," [Online]. Available: https://scikit-learn.org/stable/modules/tree.html.
[19] "Voting classifier," [Online]. Available: https://towardsdatascience.com/custom-implementation-of-feature-importance-for-your-voting-classifier-model-859b573ce0e0.
[20] "Logistic regression in machine learning," [Online].
[21] G. Sailasya and G. L. A. Kumari, "Analyzing the performance of stroke prediction using ML classification algorithms," International Journal of Advanced Computer Science and Applications, vol. 12, no. 6, pp. 539-545, 2021.
