BANASD603

Uploaded by

keshavk1401

Table of Contents
Introduction
Business Problem
Data Used
  Dataset Description
  Data Preprocessing
  Processing and Splitting of the Dataset
Model Building
  Selection of Machine Learning Models
  Training the Models
  Model Training and Validation
Results and Model Comparisons
Conclusion
References
Appendix
  Appendix A: Dataset Link
  Appendix B: ROC Curve
  Appendix C: Precision-Recall Curve

Introduction
Stroke is reported to account for more than 11% of deaths worldwide and is also one of the leading causes of long-term disability. Stroke risk should therefore be assessed as early as possible in a patient's life, so that preventive action can reduce the chance of death or other severe outcomes (Feigin et al., 2017). Predicting stroke is not an easy task, because it depends on many prognostic factors, such as age, medical history, and lifestyle habits, whose combined effect is best captured with advanced analytical methods (Hassan et al., 2024).
This report focuses on building predictive models that estimate the likelihood of stroke in patients from health attributes such as age, hypertension, heart disease, and smoking status. The patient health data are fed into several machine learning models, and their performance is compared to determine which model is best suited for practical use (Zhi et al., 2024). Such models could help identify high-risk patients earlier, so that they receive timely interventions and healthcare resources are allocated more efficiently.
As healthcare places growing emphasis on predictive analytics, this project reflects the promise of data-driven intervention for better patient outcomes. Using models like these, health providers could shift from reactive to proactive care and, going forward, reduce the incidence of stroke while increasing care efficiency.

Business Problem
Stroke is one of the most critical healthcare issues today because of its high mortality and disability rates. Early detection of patients at high risk of stroke is essential, as it enables healthcare personnel to take preventive measures and provide early interventions that can save patients' lives (Chadaga et al., 2023).
Despite the availability of many clinical tools and devices, accurately predicting stroke risk remains a major challenge because of the complex interplay of the factors involved, such as age, hypertension, heart disease, and smoking habits. Healthcare providers need a reliable, data-driven method to identify the people most likely to suffer a stroke, allowing for efficient resource allocation and treatment prioritization.
The business benefits of a model that accurately predicts stroke are substantial. Hospitals and clinics can improve their decision-making by focusing resources on individuals with a higher likelihood of stroke, thereby reducing healthcare costs and improving patient outcomes.

Data Used
Dataset Description
The dataset used for this project is Kaggle's Stroke Prediction Dataset. It holds over 5,100 records of patient health data, with one row per patient. The dataset includes key health attributes such as age, gender, hypertension, heart disease, average glucose level, body mass index (BMI), and smoking status. The target variable is stroke, which takes the value 1 if the patient had a stroke and 0 otherwise. These attributes are well suited to building a predictive machine learning model, since they are already recognized as stroke risk factors by global health standards.

Figure 1: Dataset Demo in Excel

Data Preprocessing
Before building the predictive models, several steps were taken to clean and preprocess the data. The BMI column contained missing values that needed attention; these were filled with the median BMI, so that no records were lost because of missing information. Additionally, the categorical variables gender, ever_married, work_type, Residence_type, and smoking_status were converted into numerical representations using one-hot encoding.
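
The median imputation step can be sketched in plain Python so that it runs without pandas. This is a minimal illustration, not the project's code: it assumes missing BMI values are represented as None, and the function name fill_missing_with_median and the sample values are hypothetical. With pandas, assuming a DataFrame named df with a column named bmi, the usual equivalent is df["bmi"].fillna(df["bmi"].median()).

```python
from statistics import median
from typing import List, Optional


def fill_missing_with_median(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to take a median from")
    fill_value = median(observed)
    return [fill_value if v is None else v for v in values]


# Hypothetical BMI column with two missing entries.
bmi = [28.0, None, 32.5, 24.5, None, 30.0]
print(fill_missing_with_median(bmi))  # → [28.0, 29.0, 32.5, 24.5, 29.0, 30.0]
```

The median is preferred over the mean here because it is less affected by extreme BMI values. Computing it on the training rows only would also keep test-set information out of preprocessing.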

Figure 2: Missing Values in the Dataset

Figure 3: Handling Missing Values

Figure 4: One-Hot Encoding
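
One-hot encoding replaces a categorical column with one 0/1 indicator column per category, so that models do not read an artificial order into category codes. The sketch below is a hypothetical, dependency-free illustration; the source does not say which tool was used, though pd.get_dummies(df, columns=[...]) in pandas or OneHotEncoder in scikit-learn are the usual routes. The sample rows are made up.

```python
from typing import Dict, List


def one_hot_encode(rows: List[Dict[str, object]], column: str) -> List[Dict[str, object]]:
    """Replace `column` with one indicator column per distinct category value."""
    categories = sorted({str(row[column]) for row in rows})
    encoded = []
    for row in rows:
        # Copy every other field unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if str(row[column]) == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical patient rows using smoking_status as the categorical column.
patients = [
    {"age": 67, "smoking_status": "formerly smoked"},
    {"age": 49, "smoking_status": "never smoked"},
    {"age": 80, "smoking_status": "smokes"},
]
for row in one_hot_encode(patients, "smoking_status"):
    print(row)
```

Each encoded row has exactly one indicator set to 1 for the original column.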

Processing and Splitting of the Dataset

One of the major challenges I faced when working with the dataset was the class imbalance between stroke (minority class) and non-stroke (majority class) records. Since stroke cases were far fewer than non-stroke cases, SMOTE (Synthetic Minority Over-sampling Technique) was applied to the training data. SMOTE generates synthetic samples for the minority class, which balances the training set and improves the model's ability to predict stroke cases.
The data were split into training and testing sets in an 80:20 ratio to evaluate model performance. Features were scaled using StandardScaler, which standardizes each feature to zero mean and unit variance so that all attributes are on a comparable scale. This matters most for models such as Logistic Regression and SVM, which are sensitive to differences in feature magnitudes.
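
The split and scaling steps can be sketched without external libraries. The source only states an 80:20 ratio; stratifying by the label, so that the rare stroke class appears in both parts in similar proportion, is an assumption here (in scikit-learn it is typically done with train_test_split(X, y, test_size=0.2, stratify=y)). The scaler mirrors what StandardScaler computes per feature: the mean and population standard deviation, learned on the training rows and then applied to both sets. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test so each class keeps roughly the same share."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(train_column: Sequence[float]) -> Tuple[float, float]:
    """Learn mean and population standard deviation from the training rows only."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    return mu, (sigma if sigma > 0 else 1.0)  # avoid dividing by zero for constant features


def standardize(column: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Apply z = (x - mu) / sigma using parameters learned on the training set."""
    return [(x - mu) / sigma for x in column]


# Hypothetical labels: 95 non-stroke rows and 5 stroke rows.
labels = [0] * 95 + [1] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=1)
print(len(train_idx), len(test_idx))  # → 80 20
```

Fitting the scaler (and SMOTE) on the training portion only keeps information from the test set out of training, so the test scores remain an honest estimate.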

Figure 5: Processing and Splitting of the Dataset

Model Building
Selection of Machine Learning Models
To tackle the stroke prediction problem, several machine learning models were selected and their performance compared. The models used include:
- Logistic Regression
- Random Forest Classifier
- K-Nearest Neighbours (KNN)
- Support Vector Machine (SVM)
- Gradient Boosting Classifier

These models were chosen for their effectiveness in classification tasks and their ability to handle different types of data.

Training the Models

Each model was trained on the training dataset after SMOTE had been applied. For some models, namely Logistic Regression, Random Forest, and SVM, the class_weight='balanced' parameter was also used to further address the class imbalance caused by the small number of stroke-positive records. This parameter adjusts the weight assigned to each class, giving more importance to the minority class (stroke) during training.
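
scikit-learn documents class_weight='balanced' as assigning each class the weight n_samples / (n_classes * count_of_class). The sketch below reproduces that rule in plain Python (the function name is hypothetical) and shows a consequence worth checking: if SMOTE had already equalised the classes, as imbalanced-learn does with its default sampling strategy, both weights become 1.0 and the parameter adds little. The source does not report the SMOTE settings, so whether this applied here is uncertain. The label counts are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical label counts, not the project's actual figures.
before_smote = [0] * 950 + [1] * 50
after_smote = [0] * 950 + [1] * 950
print(balanced_class_weights(before_smote))  # minority class weighted 10.0
print(balanced_class_weights(after_smote))  # → {0: 1.0, 1: 1.0}
```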

Model Training and Validation

After preprocessing the data and addressing the class imbalance, each model was trained on the training dataset and then evaluated on the testing dataset. The performance metrics were accuracy, precision, recall, F1-score, and AUC-ROC. Together, these metrics allowed us to compare the strengths and weaknesses of each model in terms of how accurately and efficiently it predicts stroke cases.
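
The evaluation metrics can be sketched from first principles. Precision, recall, and F1 are computed for the stroke class (label 1) from the confusion-matrix counts, and AUC-ROC is computed as the probability that a randomly chosen stroke case receives a higher score than a randomly chosen non-stroke case (ties count half), which equals the area under the ROC curve. The function names and data are hypothetical; in scikit-learn the corresponding functions are accuracy_score, precision_score, recall_score, f1_score, and roc_auc_score in sklearn.metrics.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (stroke = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical test-set labels, hard predictions and predicted stroke probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.6, 0.7, 0.45]
print(classification_metrics(y_true, y_pred))  # → {'accuracy': 0.8, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(roc_auc(y_true, scores))  # → 0.9375
```

On imbalanced data, accuracy is dominated by the majority class, which is why precision, recall, and F1 for the stroke class carry most of the information when comparing models.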

Figure 6: Models Building

Results and Model Comparisons

Figure 7: Performance Summary Table

Among the machine learning models tested, Gradient Boosting emerged as the best-performing model for predicting stroke cases. Although its accuracy (0.94) was similar to that of many other models, Gradient Boosting achieved a higher precision (0.44) and F1-score (0.11) for stroke predictions than any other model. Its AUC-ROC score of 0.81 indicates a reasonably good ability to rank stroke cases above non-stroke cases across classification thresholds, making it the most effective of the tested models at distinguishing between stroke and non-stroke cases.
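
Because the F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), the reported precision and F1-score together determine the stroke-class recall: rearranging gives R = F1 * P / (2P - F1). The sketch below evaluates this for the reported Gradient Boosting figures, assuming both refer to the stroke (positive) class; since the inputs are rounded to two decimals, the result is approximate. The function name is hypothetical.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1 (requires 2P > F1)."""
    if 2 * precision <= f1:
        raise ValueError("F1 must be smaller than 2 * precision for a finite recall")
    return f1 * precision / (2 * precision - f1)


# Reported Gradient Boosting figures for the stroke class (rounded to two decimals).
recall = implied_recall(precision=0.44, f1=0.11)
print(f"implied recall ≈ {recall:.3f}")  # → implied recall ≈ 0.063
```

A recall of roughly 6% would mean most stroke cases in the test set were missed, which is consistent with the Conclusion's call to improve recall and shows why an accuracy of 0.94 alone is not a reliable guide on this imbalanced dataset.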

Conclusion
In this project, we tested the ability of several machine learning models to predict stroke in patients using health attributes such as age, hypertension, and heart disease. All of the models showed promise, but they struggled to balance precision and recall for the stroke class because of the severe class imbalance. Even though overall accuracy was high, recall needs to improve so that more high-risk patients are detected. Based on the results, predictive analytics holds promise in healthcare and could help providers act proactively. Future work should aim to improve recall and to handle the class imbalance more effectively, so that the model detects stroke cases as early as possible. Based on the current dataset and analysis, however, if one model had to be chosen for making predictions, it should be the Gradient Boosting model.

References
Chadaga, K., Sampathila, N., Prabhu, S., & Chadaga, R. (2023). Multiple explainable approaches to predict the risk of stroke using artificial intelligence. Information, 14(8), 435. https://doi.org/10.3390/info14080435
Feigin, V. L., Norrving, B., & Mensah, G. A. (2017). Global burden of stroke. Circulation Research, 120(3), 439-448. https://doi.org/10.1161/CIRCRESAHA.116.308413
Hassan, A., Gulzar Ahmad, S., Ullah Munir, E., Ali Khan, I., & Ramzan, N. (2024). Predictive modelling and identification of key risk factors for stroke using machine learning. Scientific Reports, 14(1), 11498. https://doi.org/10.1038/s41598-024-61665-4
Zhi, S., Hu, X., Ding, Y., Chen, H., Li, X., Tao, Y., & Li, W. (2024). An exploration on the machine-learning-based stroke prediction model. Frontiers in Neurology, 15, 1372431. https://doi.org/10.3389/fneur.2024.1372431

Appendix
Appendix A: Dataset Link
https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset
The link above points to the dataset used in this project and report.

Appendix B: ROC Curve

Appendix C: Precision-Recall Curve
