Aiml Project
Bachelor of Technology
in
By
Submitted to
CERTIFICATE
This is to certify that the Project Report entitled “Brain Stroke Prediction” is a record of
bonafide work carried out by Kola Vennela bearing Roll No-2203A51210 during the
academic year 2023-2024 in partial fulfillment of the award of the degree of Bachelor of
Technology in Computer Science Engineering by the SR UNIVERSITY, WARANGAL.
ACKNOWLEDGEMENT
We express our thanks to the course coordinator, Mr. D. Ramesh, Asst. Prof., for guiding us from
the beginning through the end of the course project. We express our gratitude to the head of the
department CS&AI, Dr. M. Shashikala, Associate Professor, for encouragement, support and
insightful suggestions. We truly value their consistent feedback on our progress, which was always
constructive and encouraging and ultimately steered us in the right direction.
We wish to take this opportunity to express our sincere gratitude and deep sense of respect to
our beloved Dean, School of Computer Science and Artificial Intelligence, Dr C. V. Guru Rao, for his
continuous support and guidance to complete this project in the institute.
Finally, we express our thanks to all the teaching and non-teaching staff of the department for their
suggestions and timely support.
ABSTRACT
Brain stroke, also known as cerebrovascular accident (CVA), is a critical medical condition with
potentially severe consequences. Early detection of individuals at risk of stroke can significantly aid
in preventive measures and timely medical interventions, thus reducing morbidity and mortality rates
associated with stroke. In this project, we propose an artificial intelligence and machine learning-
based approach for predicting the risk of brain stroke.
The dataset used for training and testing our predictive model consists of various demographic,
clinical, and lifestyle factors such as age, gender, hypertension, diabetes, smoking habits, alcohol
consumption, and physical activity level, among others. We employ state-of-the-art machine learning
algorithms, including logistic regression, random forest, support vector machines, and neural
networks, to analyze and learn patterns from the data.
Feature selection techniques and cross-validation methods are utilized to enhance model performance
and generalization. Additionally, model interpretability techniques are employed to understand the
significant predictors contributing to stroke risk prediction.
The performance of our predictive model is evaluated using metrics such as accuracy, sensitivity,
specificity, and area under the receiver operating characteristic curve (AUC-ROC). Furthermore, we
conduct comparative analyses with existing risk assessment tools and clinical guidelines to validate
the effectiveness and reliability of our proposed approach.
The results demonstrate promising performance in accurately predicting the risk of brain stroke,
thereby providing valuable insights for healthcare professionals to identify high-risk individuals and
initiate appropriate preventive strategies. Our aim is to develop a robust and scalable predictive tool
that can be integrated into clinical practice for proactive management of stroke risk, ultimately
leading to improved patient outcomes and healthcare resource allocation.
Table of Contents
1 Introduction
2 Literature Review
3 Design
4 Methodology
5 Data Pre-processing
6 Results
7 Conclusion
8 Future scope
9 References
1. INTRODUCTION:
Brain stroke, also referred to as cerebrovascular accident (CVA), is a leading cause of mortality and long-term
disability worldwide. It occurs when blood flow to a part of the brain is interrupted or reduced, depriving brain
tissue of oxygen and nutrients. Prompt identification of individuals at risk of stroke is imperative for
implementing preventive strategies and timely interventions to mitigate the potential consequences.
With the advancements in artificial intelligence (AI) and machine learning (ML), predictive modeling has
emerged as a promising approach for assessing stroke risk. By leveraging vast amounts of data encompassing
demographic, clinical, and lifestyle factors, AI-based models can discern patterns and identify individuals
predisposed to stroke. Such predictive tools have the potential to revolutionize stroke prevention by enabling
proactive management strategies tailored to individual risk profiles.
In this project, we aim to develop an AI-powered system for predicting the risk of brain stroke. We utilize a
diverse dataset containing information on key risk factors such as age, gender, hypertension, diabetes, smoking
habits, alcohol consumption, and physical activity levels. By applying machine learning techniques and feature
selection methods, we aim to construct a robust predictive model capable of accurately assessing stroke risk.
The significance of this project lies in its potential to augment current clinical practices by providing healthcare
professionals with a reliable tool for early stroke risk identification. By integrating AI-driven predictive
analytics into routine healthcare protocols, we anticipate a paradigm shift towards proactive stroke prevention
strategies, ultimately leading to improved patient outcomes and reduced burden on healthcare systems.
2. LITERATURE REVIEW
Brain stroke remains a major public health concern globally, with its incidence steadily rising and its
debilitating consequences imposing significant socioeconomic burdens. In recent years, there has
been a growing interest in leveraging artificial intelligence (AI) and machine learning (ML)
techniques to enhance stroke risk prediction and preventive interventions.
Numerous studies have explored the utility of AI and ML algorithms in predicting stroke risk by
analyzing large datasets containing diverse sets of risk factors. For instance, demographic factors such
as age and gender, along with clinical variables including hypertension, diabetes, hyperlipidemia, and
atrial fibrillation, have consistently emerged as significant predictors of stroke risk across various
populations (Wang et al., 2020; Qureshi et al., 2019).
3. DESIGN:
Requirement Specifications
Hardware Requirements
System
RAM
Hard Disk
Input
Output
Software Requirements
OS
Platform
Program Language
4. METHODOLOGY:
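from sklearn.linear_model import LogisticRegression
# Fit a logistic regression classifier on the training split (x_train, y_train)
logistic_regression = LogisticRegression()
logistic_regression.fit(x_train, y_train)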
4.5 Naive Bayes:
4.7 AdaBoost:
AdaBoost (Adaptive Boosting) is an ensemble technique that iteratively combines
multiple weak learners (e.g., decision trees) into a strong learner. It works by
training a sequence of weak learners, where each subsequent learner focuses on
the instances that were misclassified by the previous learner. The final prediction
is a weighted majority vote of the weak learners, with higher weights assigned to
the more accurate learners.
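The reweighting loop described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's implementation: it assumes one-dimensional inputs, labels in {-1, +1}, and simple threshold rules ("decision stumps") as the weak learners. The names `best_stump`, `train_adaboost`, `predict` and the toy data are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # Weighted error = total weight of the samples this rule gets wrong.
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def train_adaboost(xs, ys, rounds):
    """Train `rounds` stumps, reweighting the samples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        # More accurate learners receive a larger vote (alpha).
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, correctly classified ones lose it.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all weak learners."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data (made up): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = train_adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data a single stump always misclassifies some points, while the weighted vote of three stumps separates the classes, which is the effect the paragraph above describes.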
4.9 XGBoost:
XGBoost (Extreme Gradient Boosting) is an optimized implementation of
Gradient Boosting that incorporates several computational and algorithmic
improvements. It uses a more regularized model formalization to control
overfitting, parallel and distributed processing, and a highly optimized gradient
computation technique. XGBoost has become a widely used and effective
algorithm for many machine learning tasks, especially on structured or tabular data.
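The "more regularized model formalization" can be made concrete by writing out the objective that XGBoost minimizes. The notation below follows the XGBoost documentation and is not defined elsewhere in this report: l is the training loss, f_k is the k-th tree, T is the number of leaves in a tree, w_j is the score of leaf j, and gamma and lambda are regularization strengths.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}
```

The first sum rewards fitting the training labels, while the second penalizes trees with many leaves or large leaf scores, which is how overfitting is kept in check.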
5. DATASET PREPROCESSING:
DATASET DESCRIPTION
Attributes:
gender
The gender of the person.
age
The age of the person.
hypertension
Whether the person has hypertension, recorded as 0 (no) or 1 (yes).
heart_disease
Whether the person has heart disease, recorded as 0 (no) or 1 (yes).
ever_married
Whether the person has ever been married.
work_type
The type of work the person does.
Residence_type
The type of area in which the person lives.
avg_glucose_level
The person's average glucose level.
bmi
The person's body mass index (BMI).
smoking_status
The person's smoking status.
stroke
Whether the person has ever had a stroke, recorded as 0 (no) or 1 (yes).
The target variable is stroke. From the values of the other attributes, we predict whether a
person has had a brain stroke or is at risk of having one.
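Several of these attributes hold text categories, so they need numeric codes before most models can use them. A minimal sketch of that step in plain Python follows. It is an assumption about how such preprocessing could look, not the project's code: the helper `label_encode` and the two sample records, including their category values, are made up for illustration.

```python
def label_encode(records, column):
    """Replace text categories in `column` with integer codes; return the mapping."""
    categories = sorted({row[column] for row in records})
    mapping = {category: code for code, category in enumerate(categories)}
    for row in records:
        row[column] = mapping[row[column]]
    return mapping


# Hypothetical records that reuse attribute names from the description above.
records = [
    {"gender": "Male", "heart_disease": 1, "ever_married": "Yes",
     "avg_glucose_level": 228.7, "stroke": 1},
    {"gender": "Female", "heart_disease": 0, "ever_married": "No",
     "avg_glucose_level": 85.3, "stroke": 0},
]

# Encode the text columns in place and keep the mappings for later decoding.
mappings = {col: label_encode(records, col) for col in ("gender", "ever_married")}

# Separate the target variable (stroke) from the input features.
features = [[value for key, value in row.items() if key != "stroke"] for row in records]
target = [row["stroke"] for row in records]

print(mappings)
print(features, target)
```

Keeping the returned mappings makes it possible to apply the same codes to new records at prediction time.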
Dataset:
6. RESULTS:
Logistic Regression:
Accuracy: 0.9458375125376128
Precision: 0.0
Recall: 0.0
F1-score: 0.0
Confusion Matrix:
[[943 0]
[ 54 0]]
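Here the logistic regression accuracy is 0.945, while precision, recall and F1-score are all 0. The confusion matrix
shows why: the model predicts no stroke for every test sample and misses all 54 stroke cases, so it is biased toward
the majority class.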
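To see how a high accuracy can coexist with zero precision and recall, the metrics can be recomputed directly from the confusion matrix. The sketch below assumes the layout [[TN, FP], [FN, TP]] (the scikit-learn convention for labels 0 and 1) and treats an undefined ratio as 0.0; the function name `metrics_from_confusion` is illustrative.

```python
def metrics_from_confusion(matrix):
    """Compute accuracy, precision, recall and F1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # With no positive predictions (tp + fp == 0) precision is undefined; use 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Confusion matrix reported for logistic regression above.
result = metrics_from_confusion([[943, 0], [54, 0]])
print(result)
```

The accuracy here is simply the share of non-stroke cases in the test set (943 of 997), which is what any model that always predicts "no stroke" would score.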
KNN:
Accuracy: 0.9468405215646941
Precision: 0.5714285714285714
Recall: 0.07407407407407407
F1-score: 0.13114754098360656
Confusion Matrix:
[[940 3]
[ 50 4]]
Here KNN accuracy is 0.946, precision is 0.571, recall is 0.074 and F1-score is 0.131.
Decision Tree:
Accuracy: 0.9167502507522568
Precision: 0.24561403508771928
Recall: 0.25925925925925924
F1-score: 0.2522522522522523
Confusion Matrix:
[[900 43]
[ 40 14]]
Here Decision Tree accuracy is 0.916, precision is 0.245, recall is 0.2592 and f1 score is 0.2522.
SVM:
Accuracy: 0.9458375125376128
Precision: 0.0
Recall: 0.0
F1-score: 0.0
Confusion Matrix:
[[943 0]
[ 54 0]]
Here SVM accuracy is 0.945, while precision, recall and F1-score are all 0. Like logistic regression, SVM never predicts a stroke, so it is biased toward the majority class.
Naive Bayes:
Accuracy: 0.7512537612838516
Precision: 0.13257575757575757
Recall: 0.6481481481481481
F1-score: 0.220125786163522
Confusion Matrix:
[[714 229]
[ 19 35]]
Here Naive Bayes accuracy is 0.751, precision is 0.132, recall is 0.648 and F1-score is 0.22.
Random Forest:
Here Random Forest accuracy is 0.939, while precision, recall and F1-score are all 0. This means Random Forest is
biased toward the majority class.
AdaBoost Classifier:
Accuracy: 0.9468405215646941
Precision: 1.0
Recall: 0.018518518518518517
F1 Score: 0.03636363636363636
Confusion Matrix:
[[943 0]
[ 53 1]]
Here AdaBoost accuracy is 0.946, precision is 1, recall is 0.0185 and F1-score is 0.036.
Gradient Boosting Classifier:
Here Gradient Boosting Classifier accuracy is 0.946, precision is 0.66, recall is 0.037 and F1-score is 0.071.
XGBoost Classifier:
Accuracy: 0.9388164493480441
Precision: 0.23076923076923078
Recall: 0.05555555555555555
F1 Score: 0.08955223880597016
Confusion Matrix:
[[933 10]
[ 51 3]]
Here XGBoost Classifier accuracy is 0.938, precision is 0.23, recall is 0.055 and f1 score is 0.089.
Of all the machine learning models, Logistic Regression, SVM and Random Forest are biased toward the
majority class: their precision, recall and F1-score are all 0.
In the graph, the accuracy values of all the machine learning models are shown. Naive
Bayes (0.751) has the lowest accuracy, yet it is the most effective model for predicting
brain strokes in the given dataset because it has the highest recall (0.648): it detects 35
of the 54 stroke cases in the test set. Decision Tree has the next-lowest accuracy (0.917)
but the highest F1-score (0.252), so it can also be used effectively for the prediction of
brain strokes.
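Because accuracy alone hides missed stroke cases, it can help to line the models up by recall and F1-score as well. The sketch below only re-sorts values already reported in this section, rounded to three decimals. Random Forest and Gradient Boosting are left out because their full metric listings do not appear above. The names `reported` and `rank_by` are illustrative.

```python
# Metrics copied from the listings in this section, rounded to three decimals.
reported = {
    "Logistic Regression": {"accuracy": 0.946, "recall": 0.000, "f1": 0.000},
    "KNN": {"accuracy": 0.947, "recall": 0.074, "f1": 0.131},
    "Decision Tree": {"accuracy": 0.917, "recall": 0.259, "f1": 0.252},
    "SVM": {"accuracy": 0.946, "recall": 0.000, "f1": 0.000},
    "Naive Bayes": {"accuracy": 0.751, "recall": 0.648, "f1": 0.220},
    "AdaBoost": {"accuracy": 0.947, "recall": 0.019, "f1": 0.036},
    "XGBoost": {"accuracy": 0.939, "recall": 0.056, "f1": 0.090},
}


def rank_by(metric):
    """Return model names ordered from best to worst on the given metric."""
    return sorted(reported, key=lambda name: reported[name][metric], reverse=True)


by_accuracy = rank_by("accuracy")
by_recall = rank_by("recall")
by_f1 = rank_by("f1")

for metric, ranking in (("accuracy", by_accuracy), ("recall", by_recall), ("f1", by_f1)):
    print(metric, "->", ", ".join(ranking))
```

Ranked this way, the model with the lowest accuracy comes out on top for recall, which illustrates why accuracy is a weak guide on a dataset where stroke cases are rare.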
7. CONCLUSION:
In conclusion, artificial intelligence and machine learning techniques offer promising
advancements in stroke risk prediction. By analyzing diverse datasets, these methods accurately
identify individuals at risk, surpassing traditional approaches. Challenges like interpretability and
implementation remain, requiring ongoing research. Nevertheless, AI-driven predictive analytics have
the potential to enhance proactive stroke management, improving patient outcomes and resource
allocation in healthcare.
8. FUTURE SCOPE:
1. Advanced Algorithms: Refining machine learning models for better accuracy.
2. Personalized Care: Tailoring interventions based on individual risk profiles.
3. Real-time Assessment: Developing systems for instant risk evaluation.
4. Clinical Support: Integrating AI into decision-making tools for healthcare providers.
5. Population Health: Using AI to manage stroke risks at a broader level.
6. Ethics and Regulations: Addressing concerns about data privacy and bias.
7. Validation Studies: Conducting large-scale trials to assess model effectiveness.
9. REFERENCES:
1. https://www.ijert.org/brain-stroke-prediction-using-machine-learning
2. https://www.questjournals.org/jecer/papers/vol8-issue4/F08042530.pdf
3. https://ijarsct.co.in/Paper10422.pdf
4. https://journals.itb.ac.id/index.php/jictra/article/download/18061/6082
5. https://www.kaggle.com/code/reihanenamdari/brain-stroke-prediction-decisiontree