An Effective Framework For Predicting Stroke Prediction Using Machine Learning Technique
Abstract— Today, adequately trained machine learning algorithms can be used in fields such as surveillance, medicine, and data management to identify and provide solutions to problems that have no answer or whose current solutions are ineffective. A stroke occurs when blood vessels in the brain burst, damaging the brain. It may also occur if the brain's supply of nutrients and blood is interrupted. The severity of a stroke can be lessened by early recognition of its numerous warning symptoms. Our lifestyle has recently changed, especially in the COVID era. Nowadays, millions of people encounter illnesses, disabilities, cardiovascular diseases, overweight, hypertension, and many more conditions. There are more than 10 million cases per year in India alone. When something stops the flow of blood cells, a person suffers a heart attack, and the tissue starts to die because of the lack of blood supply. We therefore propose a machine learning model for early detection. The dataset used is from Kaggle, and various parameters have been considered. In this paper, we have developed several machine learning models for the early detection of the disease, taking into consideration multiple risk factors and physiological parameters that cause stroke. We trained our model using machine learning algorithms such as Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Neural Network (NN), and then checked the testing accuracy. We found the highest accuracy with the XGBoost Classifier (100%), followed by the Random Forest Classifier (99%). We also used ensemble learning and obtained 100% testing accuracy.

Keywords—component; formatting; style; styling; insert (key words)

I. INTRODUCTION
Cardiovascular Diseases (CVDs) are the leading cause of death worldwide. Cardiovascular problems cause heart attacks and strokes. There are seven types of this disease: coronary artery disease (CAD), heart arrhythmias, heart failure, heart valve disease, pericardial disease, cardiomyopathy (heart muscle disease), and congenital heart disease. Around 17.9 million people die every year due to cardiovascular diseases, which is 32% of all deaths worldwide [1]. Like all muscles, the heart needs oxygen, and it cannot get enough oxygen during a heart attack. A heart attack is also known as Myocardial Infarction (MI) [2]. High blood pressure, smoking, and stress are risk factors for heart attack. The coronary arteries supply blood to the heart, and fatty deposits, known as plaques, can develop on the artery walls. These plaques grow day by day and block the supply route [3]. Because of that blockage, blood cells cannot move through the artery, and the oxygen needs of the heart are not met. Even when a plaque is not large, a blood clot can form if the plaque cracks, and that clot does not allow other blood cells to pass. In that case too, the oxygen-starved cells die within minutes. In the worst case, a heart attack may cause sudden death. One person dies every 34 seconds due to a heart attack [4]. These numbers can be reduced through early detection using machine learning algorithms. Although various treatments are available, such as coronary angioplasty and stenting, and coronary artery bypass surgery (CABG) [5], early detection will reduce the percentage of deaths. Machine learning is a domain that allows machines (computers) to learn without being explicitly programmed.

This research paper used several machine learning algorithms: Logistic Regression, K Nearest Neighbor, Random Forest Classifier, Decision Tree Classifier, XGBoost Classifier, and Neural Network. We obtained the maximum accuracy with the XGBoost Classifier (100%), followed by the Random Forest Classifier (99%). We also used feature selection to identify the most significant 50% and 25% of the features. Alongside the traditional machine learning algorithms, we used the max-vote ensemble learning technique. For this model, we used the Python programming language and libraries such as pandas, NumPy, and Scikit-learn. The paper provides a brief overview of the work done, followed by section 2, the Literature Review; section 3, Methodology, with various sub-sections; section 4, Result and Discussion; and section 5, Conclusion.
Authorized licensed use limited to: BMS Institute of Technology. Downloaded on December 15,2023 at 10:07:19 UTC from IEEE Xplore. Restrictions apply.
II. LITERATURE REVIEW

In the research paper [6], “KNN Algorithm used for Heart Attack detection,” the authors employed the KNN (K Nearest Neighbor) algorithm on a Kaggle dataset to predict the possibility of having a heart attack. They used a correlation matrix to select significant features and K-fold cross-validation for optimization, and obtained 72.37% accuracy.

In [7], entitled “An Assessment on Cardio-vascular Disease Prediction and Diagnosis using Machine Learning Algorithms,” Reddy Anuradha used machine learning algorithms such as Support Vector Machine (SVM), Random Forest Classifier, KNN, Decision Tree Classifier, and Naïve Bayes. The main goal of that study was to detect various cardiovascular diseases. She used different classification techniques to predict the presence or absence of disease, and evaluated performance using accuracy, sensitivity, and specificity.

In [8], “An Efficient Early-stage Heart Disease, Risk Detection Using Machine Learning Techniques,” the author noted that early prediction is a big challenge in the medical field. The main goal of that research was to predict heart disease at an early stage using the UCI dataset. PCA (Principal Component Analysis) was used for dimensionality reduction, and the models were evaluated using accuracy, precision, recall, and AUC (Area Under Curve). The maximum accuracy was obtained with the Naïve Bayes and AdaBoost algorithms.

According to the authors of [9], “Heart Diseases Prediction using Logistic Regression Algorithm,” the detection of cardiovascular diseases is crucial since their complications can impact a person's life. Because reports of physical examinations take a long time to prepare, they developed a machine learning model based on Logistic Regression. They also provided a confusion matrix and a correlation matrix of the data, and obtained 85% accuracy with a 0.1406 error rate.

In [10], a paper on heart disease prediction, the authors used the data mining technique WARM (Weighted Association Rule Mining) to obtain the highest confidence score. WARM is used to find relations between features, and weighted scores were assigned to improve prediction. Working on the UCI dataset, they obtained 96% accuracy with all features and 98% accuracy after selecting the significant features. They also generated 20 rules from the dataset features.

The authors of [11] used four machine learning algorithms: Support Vector Machine, Decision Tree Classifier (DTC), Random Forest Classifier (RFC), and K Nearest Neighbor. They obtained 63.4% accuracy with KNN, 71.4% with RFC, 68.4% with DTC, 72.5% with SVM using a linear kernel, and 86.2% with SVM using a Gaussian kernel.

The paper [12], entitled “Detection of Heart Diseases using Machine Learning Techniques,” used machine learning algorithms such as Decision Table, Naïve Bayes, SMO, and Lazy KStar on the UCI dataset. The highest accuracy was obtained with Naïve Bayes and the lowest with the Decision Table algorithm. The paper also provides comparison graphs of the different algorithms using precision, recall, and accuracy.

The authors of [13] note that timely, effective treatment before complications worsen is possible only if an accurate and reliable system diagnoses the patient's heart condition. They used five machine learning algorithms to detect heart attacks: SVM, Decision Tree Classifier, Naïve Bayes, KNN, and Artificial Neural Network (ANN). The highest accuracy was obtained with ANN (86.91%) and the lowest with DTC (74.0%).

The main goal of the research paper [14], “Smart Wearable Model for predicting heart diseases using machine learning,” is to predict the possibility of heart disease using the Random Forest Classifier. An ECG sensor was used to obtain electrocardiogram patterns, which were then given to the RF algorithm for prediction, reaching 88% efficiency. They used accurate data, and the outputs were provided on the person's mobile app. This research is beneficial for countries that have fewer doctors.

In the research paper [15], “Heart Disease Prediction using Machine Learning Algorithms,” the researchers used the UCI dataset to create the model, applying KNN, Logistic Regression, and Random Forest. They also provided a graph of exploratory data analysis based on age, and a table of training/testing ratios with the accuracy of each algorithm. The highest accuracy, 87.88%, was obtained using Logistic Regression with a 0.2 test size.

In [16], “A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine Learning Algorithms,” Amin Ul Haq and his group used machine learning algorithms to address the complex problem of heart disease diagnosis. They used SVM, KNN, ANN, Decision Tree, Logistic Regression, AdaBoost, Naïve Bayes, and fuzzy logic algorithms to create a predictive model on the Cleveland heart disease data, which covers 303 patients. Evaluating with accuracy, specificity, and sensitivity, they obtained 89% accuracy using Logistic Regression with 10-fold cross-validation.

Early detection of heart-related illnesses may prove beneficial in decreasing deaths worldwide. To that end, Abhinav Kulshreshth, Mahima Yadav, and Ganga Sharma proposed a paper [17] entitled “Detecting Cardiac Ailments using Machine Learning” to check whether a person has any heart problems. To select the best hyperparameters, they used a gradient-boosting classifier with grid search, and achieved 93% accuracy with 92.5% precision.
III. METHODOLOGY
The main goal of this research is to detect the chances of having a heart attack at an early stage with more accurate and reliable output. The complete workflow of this research is shown in Fig. 1.
Attribute   Description
AGE         Age in years
SEX         1 = Male; 0 = Female
CP          Chest pain type (Value 1: typical angina; Value 2: atypical angina; Value 3: non-anginal pain; Value 4: asymptomatic)
TRESTBPS    Resting blood pressure (in mm Hg on admission to the hospital)
CHOL        Serum cholesterol in mg/dl, fetched via BMI sensor
FBS         Fasting blood sugar > 120 mg/dl (1 = True; 0 = False)
RESTECG     Resting electrocardiographic results
THALACH     Maximum heart rate achieved
EXANG       Exercise-induced angina (1 = yes; 0 = no)
OLDPEAK     ST depression induced by exercise relative to rest
SLOPE       Slope of the peak exercise ST segment
CA          Number of major vessels (0-3) coloured by fluoroscopy
THAL        A blood disorder called thalassemia (3 = Normal; 6 = Fixed Defect; 7 = Reversible Defect). Value 1: normal blood flow; Value 2: fixed defect (no blood flow in some parts of the heart); Value 3: reversible defect (blood flow is observed, but it is not normal)
TARGET      1 or 0
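With 13 input attributes and TARGET as the label, the "best 50%" and "best 25%" feature subsets used in this study could be selected along the following lines. This is only a minimal sketch: the paper does not state the ranking criterion, so ranking by absolute Pearson correlation with the target is an assumption, as are the helper names `pearson` and `top_features`, the rounding rule, and the toy values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def top_features(columns, target, fraction):
    """Return the names of the top `fraction` of features, ranked by |r| with the target.

    Assumption: the count is rounded down (13 features -> 6 at 50%, 3 at 25%);
    the paper does not say how the fraction is rounded.
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Toy data (hypothetical values, not taken from the dataset).
target = [0, 0, 0, 1, 1, 1]
columns = {
    "feat_a": [0, 0, 0, 1, 1, 1],   # perfectly correlated with the target
    "feat_b": [1, 2, 3, 4, 5, 6],   # strongly correlated
    "feat_c": [1, 0, 1, 0, 1, 0],   # weakly correlated
    "feat_d": [5, 5, 5, 5, 5, 5],   # constant, uninformative
}

best_half = top_features(columns, target, 0.50)
best_quarter = top_features(columns, target, 0.25)
print(best_half, best_quarter)
```

Each subset would then be used to train and evaluate the same classifiers, so that the effect of dropping the weaker features can be compared directly.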
A. Classification Algorithms
traditional machine learning algorithms. Bagging, boosting, and stacking are the three main ensemble learning methods [30].

B. Feature Selection
The feature selection technique reduces the number of input variables to improve model efficiency [31]. Simply put, we separate the significant data columns from the unneeded ones. Feature selection optimizes our model in several ways: it helps prevent overfitting, improves accuracy, and reduces training time. It is an important stage of data pre-processing that picks relevant features to create an accurate model and reduces the dimensionality of the data [32]. In this research work, we have taken three different subsets of features for our classification models. The first feature vector consists of all 13 features; the second contains the best 50% of the features, i.e., the most effective and significant ones; and the third contains the best 25%. These subsets are used to test which combinations of features are appropriate for the proposed model for the prediction of heart attacks. Fig. 2 presents the correlation between the features used in this study.

seaborn, sklearn, and XGBoost. Confusion matrix and accuracy are used to evaluate the best classifier. A confusion matrix is a table that depicts the classifier's performance on the collective test data set. It is a summary of prediction results on a classification problem.
Accuracy: percentage of all cases classified correctly, i.e., stroke patients determined as positive and non-patients as negative [18].
Precision: percentage of cases predicted positive that are actual stroke patients [18].
Recall: percentage of actual stroke patients who are predicted positive [18].
Specificity: percentage of non-stroke patients who are predicted negative [18].
F1-Score: harmonic mean of precision and recall [18].
For the stroke dataset, we apply the LR, KNN, DT, RF, SVM, XGBoost, and ensemble learning techniques and obtain training accuracies of 86.46%, 100%, 75.73%, 100%, 92.19%, 100%, and 100%, and testing accuracies of 86.34%, 98.04%, 77.07%, 99%, 88.78%, 100%, and 98.04%, respectively. Fig. 3(a), 4(a), 5(a), 6(a), 7(a), 8(a), and 9(a) present the reported confusion matrices. Fig. 3(b), 4(b), 5(b), 6(b), 7(b), 8(b), and 9(b) present the ROC curves for the LR, KNN, DT, RF, SVM, XGBoost, and ensemble learning models, respectively. The training and testing accuracies are also shown graphically in Fig. 10.
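The evaluation metrics listed above all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name `binary_metrics` and the example counts are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp: patients predicted positive      fp: non-patients predicted positive
    fn: patients predicted negative      tn: non-patients predicted negative
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting precision, recall, and specificity alongside accuracy matters for medical data, where accuracy alone can hide a model that misses many true patients.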
Fig. 4. Results for KNN: (a) Confusion matrix and (b) ROC curve
Fig. 5. Results for DT: (a) Confusion matrix and (b) ROC curve
Fig. 6. Results for RF: (a) Confusion matrix and (b) ROC curve
Fig. 7. Results for SVM: (a) Confusion matrix and (b) ROC curve
Fig. 8. Results for XGBoost: (a) Confusion matrix and (b) ROC curve
Fig. 9. Results for Ensemble Learning: (a) Confusion matrix and (b) ROC curve
Fig. 10. Achieved Training and Testing Accuracy using different ML models
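The ensemble result compared in Fig. 10 relies on max voting, which the paper names but does not spell out. As a rough illustration, hard (majority) voting over the predictions of several already-trained classifiers can be sketched as follows. The helper names `max_vote` and `accuracy`, the three model outputs, and the labels are all hypothetical; in practice the predictions would come from the fitted LR, KNN, DT, RF, SVM, and XGBoost models.

```python
from collections import Counter

def max_vote(predictions_per_model):
    """Hard (majority) voting: for each sample, pick the most frequent label.

    With an even number of models a tie is possible; Counter.most_common
    then returns the tied label that was seen first.
    """
    n_samples = len(predictions_per_model[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical test labels and per-model predictions, for illustration only.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0, 0, 1, 0]
model_c = [1, 0, 0, 1, 0, 1, 1, 0]

ensemble = max_vote([model_a, model_b, model_c])
for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("ensemble", ensemble)]:
    print(name, accuracy(y_true, preds))
```

In this toy case each model errs on different samples, so the vote corrects every individual mistake; when the models make correlated errors, the ensemble gains much less.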
IV. CONCLUSION
The ability of Machine Learning and Artificial Intelligence to handle healthcare data in a novel way has the potential to upend the medical industry. In recent years, the number of heart attack cases has increased significantly due to changes in our lifestyle. Moreover, one in five heart attacks is silent, which means the damage occurs in the body without the person being aware of it. Early detection is therefore essential. This study puts forth a model for determining whether a person will experience a heart attack from simple inputs such as age, sex, chest pain type, resting blood pressure, and fasting blood sugar, with high accuracy and in a few seconds. The main advantage of this study is that any non-medical employee may use this approach to foresee heart failure, reducing the time required. Future research may focus on estimating the risk levels of cardiovascular disease, as this will aid medical professionals and individuals in assessing the degree of their cardiac disease. This study looks at how well different machine learning algorithms predict strokes based on various physiological parameters, and demonstrates the efficacy of several of them. With a testing accuracy of 100%, the XGBoost Classifier outperforms all the other methods.
The future focus of this research is to see whether the framework models can be improved by using a larger dataset and machine learning models such as Bagging. In exchange for merely providing some basic information, the machine learning architecture may help the general public recognize the potential for a stroke to develop in an adult patient. In a perfect world, it would help patients get early treatment for stroke.

REFERENCES
[1] Govindarajan, Priya, et al. “Classification of stroke disease using machine learning algorithms.” Neural Computing and Applications 32 (2019): 817-828.
[2] Satapathy, S.K., Loganathan, D. “Prognosis of automated sleep staging based on two-layer ensemble learning stacking model using single-channel EEG signal,” Soft Comput 25, 15445–15462 (2021). https://doi.org/10.1007/s00500-021-06218-x
[3] Cheon, Songhee & Kim, Jungyoon & Lim, Jihye. (2019). The Use of Deep Learning to Predict Stroke Patient Mortality. International Journal of Environmental Research and Public Health. 16. 1876. 10.3390/ijerph16111876.
[4] Satapathy, S. K., Bhoi, A. K., Loganathan, D., Khandelwal, B., & Barsocchi, P. “Machine learning with ensemble stacking model for automated sleep staging using dual-channel EEG signal,” Biomedical Signal Processing and Control, 69, 102898. doi: 10.1016/j.bspc.2021.102898
[5] Satapathy, S.K., Loganathan, D. “Multimodal Multiclass Machine Learning Model for Automated Sleep Staging Based on Time Series Data,” SN COMPUT. SCI. 3, 276 (2022). https://doi.org/10.1007/s42979-022-01156-3
[6] Bah, I. “KNN Algorithm Used for Heart Attack Detection,” FES Journal of Engineering Sciences, 11(1), 7-19. 2021. https://doi.org/10.52981/fjes.v11i1.758
[7] Reddy Anuradha, “An Assessment on Cardiovascular Disease Prediction and Diagnosis using Machine Learning Algorithms,” ISSN No: 2350-1146 I.F-5.11, Volume VIII and Issue I
[8] Wesam Shishah, “An Efficient Early Stage Heart Disease Risk Detection Using Machine Learning Techniques,” (2022), INSPEC Accession Number: 21764250, DOI: 10.1109/ICPC2T53885.2022.9777070
[9] Bhagyesh Randhawan, Ritesh Jagtap, Amruta Bhilawade, Durgesh Chaure, “Heart Disease Prediction Using Logistic Regression Algorithm,” (2022), Volume 10 Issue IV, ISSN: 2321-9653, IC Value: 45.98
[10] Armin Yazdani, Kasturi Dewi Varathan, Yin Kia Chiam, Asad Waqar Malik, and Wan Azman Wan Ahmad, “A novel approach for heart disease prediction using strength scores with significant predictors,” Yazdani et al. BMC Med Inform Decis Mak (2021) 21:194, https://doi.org/10.1186/s12911-021-01527-5
[11] Chithambaram T, Logesh Kannan N, Gowsalya M, “Heart Disease Detection Using Machine Learning,” DOI: https://doi.org/10.21203/rs.3.rs-97004/v1
[12] Vishal Dineshkumar Soni, “Detection Of Heart Disease Using Machine Learning Techniques,” VOLUME 9, ISSUE 08, (2020) ISSN 2277-8616
[13] Lubna Riyaz, Muheet Ahmed Butt, Majid Zaman, and Omeera Ayob, “Heart Disease Prediction Using Machine Learning Techniques: A Quantitative Review,” (2021), AISC, Volume 1394
[14] S.V. Jansi Rani, K.R. Sarath Chandran, Akshaya Ranganathan, M. Chandrasekharan, B. Janani, and G. Deepsheka, “Smart wearable model for predicting heart disease using machine learning”, (2022), 4321-4332
[15] Mursal Furqan, Hiba Rajput, Sanam Narejo, Adnan Ashraf, Kanwal Awan, “Heart Disease Prediction using Machine Learning Algorithms”, (2020), ISBN-978-969-23372-1-2
[16] Amin Ul Haq, Jian Ping Li, Muhammad Hammad Memon, Shah Nazir, and Ruinan Sun, “A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine Learning Algorithms”, (2018), Volume 2018, Article ID 3860146, https://doi.org/10.1155/2018/3860146
[17] Abhinav Kulshreshth, Mahima Yadav, Ganga Sharma, "Detecting Cardiac Ailments using Machine Learning", 2022 2nd International Conference on Intelligent Technologies (CONIT), pp.1-5, 2022.
[18] Satapathy, S. K., & Loganathan, D. (2022). Automated Classification of Sleep Stages Using Single-Channel EEG: A Machine Learning-Based Method. International Journal of Information Retrieval Research (IJIRR), 12(2), 1-19. http://doi.org/10.4018/IJIRR.299941
[19] S. K. Satapathy, S. Pattnaik and R. Rath, "Automated Sleep Staging Classification System Based On Convolutional Neural Network Using Polysomnography Signals," 2022 IEEE Delhi Section Conference (DELCON), 2022, pp. 1-10, doi: 10.1109/DELCON54057.2022.9753132.
[20] S. K. Satapathy, H. Madhani, S. Garg, D. Swain and N. Rajput, "AutoSleepNet: A Multi-Signal Framework for Automated Sleep Stage Classification," 2022 IEEE World Conference on Applied Intelligence and Computing (AIC), 2022, pp. 745-750, doi: 10.1109/AIC55036.2022.9848873.
[21] Santosh Kumar Satapathy, Hari Kishan Kondaveeti, S R Sreeja, Hiral Madhani, Nitinsingh Rajput, Debabrata Swain, A Deep Learning Approach to Automated Sleep Stages Classification Using Multi-Modal Signals, Procedia Computer Science, Volume 218, 2023, Pages 867-876, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2023.01.067.