BRAIN STROKE PREDICTION

A Project Report in partial fulfillment of the degree

Bachelor of Technology

in

Computer Science & Engineering

By

2203A51210 KOLA VENNELA

Under the Guidance of


D Ramesh

Assistant Professor, Department of CSE.

Submitted to

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


SR UNIVERSITY, ANANTHASAGAR, WARANGAL
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the Project Report entitled “Brain Stroke Prediction” is a record of
bonafide work carried out by Kola Vennela bearing Roll No. 2203A51210 during the
academic year 2023-2024 in partial fulfillment of the award of the degree of Bachelor of
Technology in Computer Science & Engineering by SR UNIVERSITY, WARANGAL.

Supervisor Head of the Department


Mr. D. Ramesh Dr. M. Shashikala
Asst. Professor Assoc. Prof. & HOD (CSE)
SR University SR University

ACKNOWLEDGEMENT
We express our thanks to our course coordinator, Mr. D. Ramesh, Asst. Prof., for guiding us from
the beginning through the end of the course project. We express our gratitude to the head of the
department CS&AI, Dr. M. Shashikala, Associate Professor, for encouragement, support and
insightful suggestions. We truly value their consistent feedback on our progress, which was always
constructive and encouraging and ultimately drove us in the right direction.

We wish to take this opportunity to express our sincere gratitude and deep sense of respect to
our beloved Dean, School of Computer Science and Artificial Intelligence, Dr C. V. Guru Rao, for his
continuous support and guidance to complete this project in the institute.

Finally, we express our thanks to all teaching and non-teaching staff of the department for their
suggestions and timely support.
ABSTRACT

Brain stroke, also known as cerebrovascular accident (CVA), is a critical medical condition with
potentially severe consequences. Early detection of individuals at risk of stroke can significantly aid
in preventive measures and timely medical interventions, thus reducing morbidity and mortality rates
associated with stroke. In this project, we propose an artificial intelligence and machine learning-
based approach for predicting the risk of brain stroke.

The dataset used for training and testing our predictive models consists of demographic, clinical,
and lifestyle factors such as age, gender, hypertension, heart disease, marital status, work type,
residence type, average glucose level, body mass index (BMI), and smoking status. We apply supervised
machine learning algorithms, including logistic regression, k-nearest neighbors, decision trees, support
vector machines, naive Bayes, random forest, AdaBoost, gradient boosting, and XGBoost, to analyze and
learn patterns from the data.

Feature selection techniques and cross-validation methods are utilized to enhance model performance
and generalization. Additionally, model interpretability techniques are employed to understand the
significant predictors contributing to stroke risk prediction.

The performance of each model is evaluated using accuracy, precision, recall (sensitivity), F1-score,
and the confusion matrix. The models are then compared with one another to identify the approach best
suited to detecting stroke cases.

The results show high overall accuracy for most models. They also show that accuracy alone is misleading
on this imbalanced dataset: several models detect few or no stroke cases, while Naive Bayes detects the
largest share. These findings provide insights for healthcare professionals seeking to identify high-risk
individuals and initiate appropriate preventive strategies. Our aim is to develop a robust and scalable
predictive tool that can be integrated into clinical practice for proactive management of stroke risk,
ultimately leading to improved patient outcomes and healthcare resource allocation.
Table of Contents

S.No  Content

1     Introduction
2     Literature Review
3     Design
4     Methodology
5     Dataset Pre-processing
6     Results
7     Conclusion
8     Future Scope
9     References
1. INTRODUCTION:

Brain stroke, also referred to as cerebrovascular accident (CVA), is a leading cause of mortality and long-term
disability worldwide. It occurs when blood flow to a part of the brain is interrupted or reduced, depriving brain
tissue of oxygen and nutrients. Prompt identification of individuals at risk of stroke is imperative for
implementing preventive strategies and timely interventions to mitigate the potential consequences.

With the advancements in artificial intelligence (AI) and machine learning (ML), predictive modeling has
emerged as a promising approach for assessing stroke risk. By leveraging vast amounts of data encompassing
demographic, clinical, and lifestyle factors, AI-based models can discern patterns and identify individuals
predisposed to stroke. Such predictive tools have the potential to revolutionize stroke prevention by enabling
proactive management strategies tailored to individual risk profiles.

In this project, we aim to develop an AI-powered system for predicting the risk of brain stroke. We use a
dataset containing information on key risk factors such as age, gender, hypertension, heart disease, average
glucose level, body mass index (BMI), and smoking status. By applying machine learning techniques and feature
selection methods, we aim to construct a robust predictive model capable of accurately assessing stroke risk.

The significance of this project lies in its potential to augment current clinical practices by providing healthcare
professionals with a reliable tool for early stroke risk identification. By integrating AI-driven predictive
analytics into routine healthcare protocols, we anticipate a paradigm shift towards proactive stroke prevention
strategies, ultimately leading to improved patient outcomes and reduced burden on healthcare systems.

2. LITERATURE REVIEW
Brain stroke remains a major public health concern globally, with its incidence steadily rising and its
debilitating consequences imposing significant socioeconomic burdens. In recent years, there has
been a growing interest in leveraging artificial intelligence (AI) and machine learning (ML)
techniques to enhance stroke risk prediction and preventive interventions.

Numerous studies have explored the utility of AI and ML algorithms in predicting stroke risk by
analyzing large datasets containing diverse sets of risk factors. For instance, demographic factors such
as age and gender, along with clinical variables including hypertension, diabetes, hyperlipidemia, and
atrial fibrillation, have consistently emerged as significant predictors of stroke risk across various
populations (Wang et al., 2020; Qureshi et al., 2019).

Furthermore, ML techniques, including logistic regression, random forest, support vector machines, and
neural networks, have enabled the development of predictive models capable of capturing complex
interactions among risk factors (Yu et al., 2020; Vidyasagar et al., 2018). These models have been reported
to achieve good sensitivity, specificity, and area under the receiver operating characteristic curve
(AUC-ROC), thereby facilitating more precise risk stratification and personalized interventions.

In summary, the literature underscores the potential of AI and ML approaches in revolutionizing
stroke risk prediction and preventive care. By harnessing the power of big data analytics and
predictive modeling, healthcare systems can move towards a proactive paradigm of stroke
management, ultimately reducing the burden of stroke-related morbidity and mortality on individuals
and society.

3. DESIGN:
Requirement Specifications
Hardware Requirements
- System
- RAM
- Hard Disk
- Input
- Output

Software Requirements
- OS
- Platform
- Program Language

4. METHODOLOGY:

After data pre-processing and data visualization, the next step is to apply the models to the dataset.
Our problem falls under supervised learning because the dataset contains labeled data (a target variable
and feature variables). First, the dataset is split into a training set and a testing set. Each model is
then trained on the training set and evaluated on the testing set.
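The hold-out split described above can be sketched with the Python standard library alone. This is a minimal illustration, not the report's code. The function name `stratified_split`, the 20% test fraction and the seed are assumptions. Stratifying by class, so that the rare stroke class appears in both sets in the same proportion, is also our choice rather than something the report states. In scikit-learn, `train_test_split(x, y, test_size=0.2, stratify=y)` from `sklearn.model_selection` provides the same behavior.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_fraction=0.2, seed=42):
    """Split rows/labels into train and test sets while keeping class proportions.

    test_fraction and seed are illustrative defaults, not values taken from the report.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Draw the same fraction of test rows from every class.
    test_indices = set()
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_indices.update(indices[:n_test])

    x_train, x_test, y_train, y_test = [], [], [], []
    for index, (row, label) in enumerate(zip(rows, labels)):
        if index in test_indices:
            x_test.append(row)
            y_test.append(label)
        else:
            x_train.append(row)
            y_train.append(label)
    return x_train, x_test, y_train, y_test


# Hypothetical imbalanced data: 95 "no stroke" rows and 5 "stroke" rows.
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5
x_train, x_test, y_train, y_test = stratified_split(rows, labels)
print(len(x_train), len(x_test), sum(y_test))
```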

4.1 Logistic regression algorithm:

Logistic regression is a supervised machine learning algorithm. It is a parametric method: it learns
the coefficients of a linear equation whose output is a continuous value. Because these continuous values
must be converted into categorical values, the output is passed through the “sigmoid” activation function,
and the coefficients are fitted by minimizing the log-loss (cross-entropy) error function.

    from sklearn.linear_model import LogisticRegression

    # x_train and y_train are the training features and labels from the train/test split
    logistic_regression = LogisticRegression()
    logistic_regression.fit(x_train, y_train)
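A small plain-Python sketch can illustrate the sigmoid and log-loss steps described above. The weights `w` and `b` below are hypothetical values for a single scaled feature, not coefficients learned from the stroke dataset. The 0.5 threshold is the usual default for turning a probability into a class label.

```python
import math


def sigmoid(z):
    """Map a continuous score z = w*x + b to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, probabilities, eps=1e-15):
    """Average binary cross-entropy, the error function minimized during fitting."""
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def predict_class(z, threshold=0.5):
    """Convert the continuous score into a 0/1 class label."""
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical weights for one feature (e.g. scaled age): z = w * x + b
w, b = 2.0, -1.0
scores = [w * x + b for x in (0.0, 0.5, 1.5)]
probs = [sigmoid(z) for z in scores]
print([round(p, 3) for p in probs])        # probabilities
print([predict_class(z) for z in scores])  # class labels
print(round(log_loss([0, 1, 1], probs), 3))
```

Fitting the model means searching for the `w` and `b` that minimize `log_loss` over the training set. `LogisticRegression` does this with a numerical optimizer and, by default, L2 regularization.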

4.2 K-Nearest Neighbor algorithm:

The K-Nearest Neighbor algorithm is a supervised machine learning algorithm that can be used for both
classification and regression. It is non-parametric. It is also called a lazy learning algorithm,
because it builds no model during training and defers all work to prediction time. The algorithm first
selects a value of k, an integer smaller than the number of rows in the training data. When a new data
point is given, KNN finds its k nearest neighbors using a distance measure such as Euclidean or
Manhattan distance. It then assigns the data point to the class held by the majority of those neighbors.

    from sklearn.neighbors import KNeighborsClassifier

    # defaults: n_neighbors=5 and Minkowski distance with p=2 (Euclidean)
    knn = KNeighborsClassifier()
    knn.fit(x_train, y_train)
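The prediction rule described above fits in a few lines of plain Python: choose k, measure distances, then take a majority vote. This is a hedged sketch with hypothetical two-feature points. It is not the scikit-learn implementation, which also uses efficient neighbor-search structures.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(x_train, y_train, query, k=3, distance=euclidean):
    """Label `query` with the majority class among its k nearest training points."""
    if not 0 < k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training rows")
    # Lazy learning: nothing is precomputed; all the work happens at prediction time.
    nearest = sorted(zip(x_train, y_train), key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-feature points (e.g. scaled age, scaled glucose level)
x_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 0, 1, 1]
print(knn_predict(x_train, y_train, [0.85, 0.85], k=3))
print(knn_predict(x_train, y_train, [0.2, 0.2], k=3, distance=manhattan))
```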

4.3 Decision Tree algorithm:


The decision tree algorithm is a supervised machine learning algorithm used for both classification
and regression problems. It is a non-parametric method that builds a tree from the given dataset. ID3 is
one classic algorithm for doing so, while scikit-learn's DecisionTreeClassifier uses an optimized version
of CART. The tree has two kinds of nodes: decision nodes and leaf nodes. Decision nodes test an attribute
to make a decision, and leaf nodes hold the outcome of those decisions. The attribute used at each split
is selected with a measure such as entropy (information gain) or the Gini index.
    from sklearn.tree import DecisionTreeClassifier

    # criterion defaults to "gini"; criterion="entropy" uses information gain instead
    decision_tree = DecisionTreeClassifier()
    decision_tree.fit(x_train, y_train)
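The entropy and Gini criteria mentioned above can be computed directly. The sketch below scores one hypothetical candidate split (for example, on hypertension) of a node that holds four stroke and four non-stroke samples. The labels are illustrative and are not taken from the dataset. A tree-building algorithm evaluates every candidate split this way and keeps the one with the largest impurity decrease.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_c^2); 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_c * log2(p_c))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_score(parent, left, right, impurity=entropy):
    """Impurity decrease of a split (information gain when impurity=entropy)."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node: 4 stroke (1) and 4 no-stroke (0) samples.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical candidate split, e.g. hypertension == 1 vs hypertension == 0
left, right = [1, 1, 1, 0], [1, 0, 0, 0]
print(gini(parent), entropy(parent))
print(round(split_score(parent, left, right), 4))
print(round(split_score(parent, left, right, impurity=gini), 4))
```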
4.4 Support vector machine algorithm:
The support vector machine algorithm is a supervised machine learning algorithm used for both
classification and regression problems. SVM works by constructing a hyperplane (a line, in two
dimensions) that separates the different classes of data points. The data points closest to the
hyperplane are called support vectors. The positive and negative hyperplanes pass through the support
vectors of each class, and the distance between them is called the margin. SVM chooses the separating
hyperplane that maximizes this margin.
    from sklearn.svm import SVC

    # SVC defaults to an RBF kernel; kernel="linear" gives the flat hyperplane described above
    svm = SVC()
    svm.fit(x_train, y_train)
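The sketch below makes the hyperplane, support-vector and margin terms concrete. It uses a fixed, hypothetical linear hyperplane rather than learning one. Training an SVM means finding the `w` and `b` that maximize this margin while separating the classes. The sketch covers the linear case only, whereas `SVC()` with its default RBF kernel separates the classes in a transformed feature space.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class (+1 or -1)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, points, labels, tol=1e-9):
    """Points lying exactly on the positive or negative hyperplane."""
    return [x for x, y in zip(points, labels)
            if abs(y * decision_value(w, b, x) - 1.0) <= tol]


# Hypothetical separating hyperplane x1 + x2 - 3 = 0 and four labelled points
w, b = [1.0, 1.0], -3.0
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
labels = [-1, -1, 1, 1]
print([1 if decision_value(w, b, x) >= 0 else -1 for x in points])
print(round(margin_width(w), 4))
print(support_vectors(w, b, points, labels))
```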

4.5 Naive Bayes:

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem. It is widely used
for classification tasks because of its simplicity, speed, accuracy, and reliability. It assumes that each
feature makes an independent and equal contribution to the outcome, which makes it especially
effective in natural language processing and text classification tasks.
    from sklearn.naive_bayes import GaussianNB

    # GaussianNB models each feature as normally distributed within each class
    naive_bayes = GaussianNB()
    naive_bayes.fit(x_train, y_train)
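The independence assumption described above shows up directly in the Gaussian variant used by `GaussianNB`: the per-class log likelihoods of each feature are simply added together. The sketch below is a simplified standard-library version under stated assumptions. It uses hypothetical age and glucose values and adds a small constant `eps` to every variance for numerical stability. It is not scikit-learn's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(x_train, y_train, eps=1e-9):
    """Estimate class priors and per-feature mean/variance for each class.

    eps is a small constant added to each variance (a simplification, not
    scikit-learn's var_smoothing rule).
    """
    grouped = defaultdict(list)
    for row, label in zip(x_train, y_train):
        grouped[label].append(row)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(x_train), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Pick the class with the highest log prior plus summed log likelihoods.

    Summing per-feature log likelihoods is the 'naive' independence assumption.
    """
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Hypothetical data: [age, avg_glucose_level]
x_train = [[30, 85], [35, 90], [40, 95], [70, 200], [75, 210], [80, 190]]
y_train = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(x_train, y_train)
print(predict_gaussian_nb(model, [33, 88]), predict_gaussian_nb(model, [78, 205]))
```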

4.6 Random Forest:

Random Forest is an ensemble learning algorithm that combines multiple decision
trees to improve predictive accuracy and control overfitting. It works by
constructing multiple decision trees from different subsamples of the training data
and combining their predictions through majority voting (for classification) or
averaging (for regression). The key aspects of Random Forest are:

- Bootstrap Aggregating (Bagging): Each tree is trained on a random subset of the
training data, sampled with replacement.

- Feature Randomness: During node splitting, only a random subset of features is
considered.
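A toy forest can illustrate bagging and feature randomness. To keep it short, this sketch uses one-split trees (decision stumps) instead of full decision trees, and the data points are hypothetical. A real Random Forest, such as scikit-learn's `RandomForestClassifier`, grows deeper trees and re-samples the candidate features at every split.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(x, y, features):
    """Best single split (feature, threshold) among the allowed features."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in x}):
            left = [label for row, label in zip(x, y) if row[f] <= t]
            right = [label for row, label in zip(x, y) if row[f] > t]
            left_pred = majority(left)
            right_pred = majority(right) if right else left_pred
            errors = (sum(label != left_pred for label in left)
                      + sum(label != right_pred for label in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_pred, right_pred)
    return best[1:]  # (feature, threshold, left_pred, right_pred)


def fit_forest(x, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(x), len(x[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw n rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Feature randomness: each stump only sees a random subset of features.
        features = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = [lp if row[f] <= t else rp for f, t, lp, rp in forest]
    return majority(votes)


x = [[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(x, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```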

4.7 AdaBoost:
AdaBoost (Adaptive Boosting) is an ensemble technique that iteratively combines
multiple weak learners (e.g., decision trees) into a strong learner. It works by
training a sequence of weak learners, where each subsequent learner focuses on
the instances that were misclassified by the previous learner. The final prediction
is a weighted majority vote of the weak learners, with higher weights assigned to
the more accurate learners.
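One round of the re-weighting described above can be written out for classic discrete AdaBoost with labels in {-1, +1}. This is a sketch of the textbook update, not necessarily the exact variant used by scikit-learn's `AdaBoostClassifier`. The four labels and the weak learner's predictions are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round for labels in {-1, +1}.

    Returns the learner's vote weight alpha and the re-normalized sample weights.
    """
    error = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # keep the log finite
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (y * p = -1) are up-weighted, correct ones down-weighted.
    new = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


def weighted_vote(alphas, predictions):
    """Final classifier: sign of the alpha-weighted sum of weak-learner outputs."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1


y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]  # hypothetical weak learner misses the last sample
alpha, weights = adaboost_round([0.25] * 4, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in weights])
```

With a weighted error of 0.25, the learner receives alpha = 0.5 * ln(3) ≈ 0.549. After normalization, the single misclassified sample carries half of the total weight, so the next learner is pushed to classify it correctly.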

4.8 Gradient Boosting:


Gradient Boosting is an ensemble technique that iteratively adds new weak
learners (e.g., decision trees) to the ensemble, where each new learner is trained to
predict the residuals (errors) of the existing ensemble. The predictions of the weak
learners are combined through additive modeling, and the ensemble is updated in a
stage-wise fashion to minimize a loss function.
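The residual-fitting loop described above can be sketched with squared-error loss, where the negative gradient of the loss is exactly the residual y - F(x). The one-feature regression stumps, the learning rate of 0.5 and the data are illustrative assumptions. For classification, `GradientBoostingClassifier` uses log-loss instead, whose negative gradient is y - p for a predicted probability p.

```python
def fit_stump(x, residuals):
    """Regression stump on one feature: the threshold that minimizes squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean


def gradient_boost(x, y, n_rounds=10, learning_rate=0.5):
    base = sum(y) / len(y)  # stage 0: a constant prediction
    learners = []

    def predict(xi):
        return base + sum(learning_rate * h(xi) for h in learners)

    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual y - F(x).
        residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
        learners.append(fit_stump(x, residuals))
    return predict


x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
model = gradient_boost(x, y)
mse = sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(model(1), 3), round(model(6), 3), mse < 1e-3)
```

Here each round halves the remaining error (learning rate 0.5), which is the stage-wise additive update in miniature.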

4.9 XGBoost:
XGBoost (Extreme Gradient Boosting) is an optimized implementation of
Gradient Boosting that incorporates several computational and algorithmic
improvements. It uses a more regularized model formalization to control
overfitting, parallel and distributed processing, and a highly optimized gradient
computation technique. XGBoost has become a widely used and effective
algorithm for many machine learning tasks, especially structured or tabular data.
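The "more regularized" objective mentioned above can be made concrete with the two formulas XGBoost uses when growing a tree. The optimal leaf value is w* = -G / (H + lambda). The split gain is 0.5 * [G_L^2 / (H_L + lambda) + G_R^2 / (H_R + lambda) - (G_L + G_R)^2 / (H_L + H_R + lambda)] - gamma. Here G and H are the sums of the first and second derivatives of the loss over the samples in a node. The sketch below evaluates both formulas for a hypothetical four-sample node under squared-error loss. The arguments `reg_lambda` and `gamma` play the role of XGBoost's lambda and gamma parameters.

```python
def leaf_weight(grad, hess, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda) from the regularized objective."""
    return -sum(grad) / (sum(hess) + reg_lambda)


def split_gain(grad_left, hess_left, grad_right, hess_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split; gamma penalizes adding a new leaf."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + reg_lambda)
    return 0.5 * (score(grad_left, hess_left) + score(grad_right, hess_right)
                  - score(grad_left + grad_right, hess_left + hess_right)) - gamma


# Squared error 0.5 * (pred - y)^2: gradient = pred - y, hessian = 1.
y = [1.0, 1.0, 5.0, 5.0]
pred = [3.0, 3.0, 3.0, 3.0]  # hypothetical current ensemble prediction
grad = [p - t for p, t in zip(pred, y)]
hess = [1.0] * len(y)
print(leaf_weight(grad[:2], hess[:2]), leaf_weight(grad[2:], hess[2:]))
print(split_gain(grad[:2], hess[:2], grad[2:], hess[2:]))
```

With lambda = 1, the leaf values are about -1.333 and +1.333 rather than the raw residuals of -2 and +2. This shrinkage is how the lambda term limits overfitting.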

5. DATASET PRE-PROCESSING:

DATASET DESCRIPTION

Attributes:
- gender
The gender of the person.
- age
The age of the person.
- hypertension
Whether the person has hypertension (1) or not (0).
- heart_disease
Whether the person has heart disease (1) or not (0).
- ever_married
Whether the person has ever been married.
- work_type
The type of work the person does.
- Residence_type
The area where the person lives.
- avg_glucose_level
The person's average glucose level.
- bmi
The person's body mass index (BMI).
- smoking_status
The person's smoking habit.
- stroke
Whether the person has had a stroke: 0 for no and 1 for yes.
Here the target variable is stroke. Using the values of all the other attributes, the models predict whether a
person has had a brain stroke or is at risk of having one.
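The report does not show how the text-valued attributes (gender, ever_married, work_type, Residence_type, smoking_status) are converted for the models. However, scikit-learn estimators require numeric input, so some encoding step is needed. One common option is one-hot encoding, sketched below in plain Python. The records and category strings are hypothetical. In practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job.

```python
def one_hot_encode(records, categorical_columns):
    """Replace each categorical column with 0/1 indicator columns.

    Numeric columns are passed through unchanged.
    """
    categories = {col: sorted({r[col] for r in records}) for col in categorical_columns}
    encoded = []
    for r in records:
        row = {}
        for col, value in r.items():
            if col in categories:
                # One indicator column per category seen in the data.
                for cat in categories[col]:
                    row[f"{col}_{cat}"] = 1 if value == cat else 0
            else:
                row[col] = value
        encoded.append(row)
    return encoded


# Hypothetical records using three of the dataset's attribute names
records = [
    {"age": 67, "smoking_status": "formerly smoked", "Residence_type": "Urban"},
    {"age": 49, "smoking_status": "smokes", "Residence_type": "Rural"},
    {"age": 80, "smoking_status": "never smoked", "Residence_type": "Rural"},
]
rows = one_hot_encode(records, ["smoking_status", "Residence_type"])
print(rows[0])
```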

Dataset:

6. RESULTS:
Logistic Regression:
Accuracy: 0.9458375125376128
Precision: 0.0
Recall: 0.0
F1-score: 0.0
Confusion Matrix:
[[943 0]
[ 54 0]]

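Here the logistic regression accuracy is 0.945, but precision, recall and F1-score are all 0. The confusion
matrix shows that the model predicts "no stroke" for every test sample, so its accuracy simply equals the
share of non-stroke cases (943 of 997). This means logistic regression is biased towards the majority class.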
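All of the metrics in this section follow from the confusion matrix, which scikit-learn lays out as [[TN, FP], [FN, TP]]. The sketch below recomputes them for two of the matrices reported here. It shows why logistic regression scores 0.945 accuracy while detecting no strokes (943 / 997 ≈ 0.9458). Reporting precision as 0 when there are no positive predictions mirrors scikit-learn's default behavior, which also emits a warning.

```python
def classification_metrics(confusion):
    """Accuracy, precision, recall and F1 from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = confusion
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # With no positive predictions precision is 0/0; report it as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, precision, recall, f1


logistic = [[943, 0], [54, 0]]
knn = [[940, 3], [50, 4]]
print(classification_metrics(logistic))
print(classification_metrics(knn))
```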

KNN:
Accuracy: 0.9468405215646941
Precision: 0.5714285714285714
Recall: 0.07407407407407407
F1-score: 0.13114754098360656
Confusion Matrix:
[[940 3]
[ 50 4]]

Here knn accuracy is 0.946 , precision is 0.571, recall is 0.074 and f1 score is 0.131.

Decision Tree:
Accuracy: 0.9167502507522568
Precision: 0.24561403508771928
Recall: 0.25925925925925924
F1-score: 0.2522522522522523
Confusion Matrix:
[[900 43]
[ 40 14]]

Here Decision Tree accuracy is 0.916, precision is 0.245, recall is 0.2592 and f1 score is 0.2522.

SVM:
Accuracy: 0.9458375125376128
Precision: 0.0
Recall: 0.0
F1-score: 0.0
Confusion Matrix:
[[943 0]
[ 54 0]]

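Here SVM accuracy is 0.945, and precision, recall and F1-score are all 0: like logistic regression, SVM predicts
"no stroke" for every test sample. This means SVM is biased towards the majority class.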

Naive Bayes:
Accuracy: 0.7512537612838516
Precision: 0.13257575757575757
Recall: 0.6481481481481481
F1-score: 0.220125786163522
Confusion Matrix:
[[714 229]
[ 19 35]]

Here Naive Bayes accuracy is 0.751, precision is 0.132, recall is 0.648, and F1-score is 0.220.

Random Forest Classifier:
Accuracy: 0.9398194583751254
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
Confusion Matrix:
[[937 6]
[ 54 0]]

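Here Random Forest accuracy is 0.939, and precision, recall and F1-score are all 0. The model flags 6 test
samples as stroke, all of them false positives, and misses all 54 actual stroke cases. This means Random
Forest is biased towards the majority class.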

AdaBoost Classifier:
Accuracy: 0.9468405215646941
Precision: 1.0
Recall: 0.018518518518518517
F1 Score: 0.03636363636363636
Confusion Matrix:
[[943 0]
[ 53 1]]

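Here AdaBoost accuracy is 0.946, precision is 1.0, recall is 0.018 and F1-score is 0.036. The precision of 1.0
comes from a single stroke prediction, which was correct (1 of the 54 stroke cases detected).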

Gradient Boosting Classifier:
Accuracy: 0.9468405215646941
Precision: 0.6666666666666666
Recall: 0.037037037037037035
F1 Score: 0.07017543859649122
Confusion Matrix:
[[942 1]
[ 52 2]]

Here Gradient Boosting Classifier accuracy is 0.946, precision is 0.666, recall is 0.037 and F1-score is 0.070.

XGBoost Classifier:
Accuracy: 0.9388164493480441
Precision: 0.23076923076923078
Recall: 0.05555555555555555
F1 Score: 0.08955223880597016
Confusion Matrix:
[[933 10]
[ 51 3]]

Here XGBoost Classifier accuracy is 0.938, precision is 0.23, recall is 0.055 and f1 score is 0.089.

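Among all the machine learning models, Logistic Regression, SVM and Random Forest detect none of the
54 stroke cases in the test set (precision, recall and F1-score are all 0). Their high accuracy therefore
reflects the class imbalance of the dataset rather than any ability to predict strokes.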

In the graph, the accuracy values of all the machine learning models are shown. Naive Bayes has the
lowest accuracy (0.751), yet it is the most effective of all the models at predicting brain strokes in the
given dataset, because it has by far the highest recall (0.648): it correctly identifies 35 of the 54 stroke
cases. After this model, Decision Tree has the next-lowest accuracy (0.916) and can also be used effectively
for the prediction of brain strokes, with the second-highest recall (0.259) and the highest F1-score (0.252).

S.No  Machine Learning Model  Accuracy  Precision  Recall  F1-Score
1     Logistic Regression     0.945     0.0        0.0     0.0
2     KNN                     0.9468    0.571      0.0740  0.1311
3     SVM                     0.945     0.0        0.0     0.0
4     Decision Tree           0.916     0.245      0.259   0.252
5     Naive Bayes             0.751     0.132      0.648   0.220
6     Random Forest           0.939     0.0        0.0     0.0
7     AdaBoost                0.946     1.0        0.018   0.0363
8     Gradient Boosting       0.946     0.666      0.0370  0.0701
9     XGBoost                 0.9388    0.2307     0.0555  0.0895
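Stroke cases make up only 54 of the 997 test samples, so a metric that weighs both classes equally helps rank the models. The sketch below uses balanced accuracy, the mean of recall and specificity. This metric is not one of those reported above, but scikit-learn provides it as `balanced_accuracy_score`. The confusion matrices are the ones listed in this section.

```python
confusion_matrices = {  # [[TN, FP], [FN, TP]] as reported in this section
    "Logistic Regression": [[943, 0], [54, 0]],
    "KNN": [[940, 3], [50, 4]],
    "Decision Tree": [[900, 43], [40, 14]],
    "SVM": [[943, 0], [54, 0]],
    "Naive Bayes": [[714, 229], [19, 35]],
    "Random Forest": [[937, 6], [54, 0]],
    "AdaBoost": [[943, 0], [53, 1]],
    "Gradient Boosting": [[942, 1], [52, 2]],
    "XGBoost": [[933, 10], [51, 3]],
}


def summarize(confusion):
    (tn, fp), (fn, tp) = confusion
    recall = tp / (tp + fn)       # sensitivity: share of strokes detected
    specificity = tn / (tn + fp)  # share of non-strokes correctly cleared
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return recall, f1, balanced_accuracy


ranking = sorted(confusion_matrices,
                 key=lambda m: summarize(confusion_matrices[m])[2], reverse=True)
for model in ranking:
    recall, f1, bal_acc = summarize(confusion_matrices[model])
    print(f"{model:20s} recall={recall:.3f} F1={f1:.3f} balanced_acc={bal_acc:.3f}")
```

Ranked this way, Naive Bayes comes first and Decision Tree second, which agrees with the discussion of the graph. Logistic Regression, SVM and Random Forest score about 0.5, no better than always predicting "no stroke".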

7. CONCLUSION:

In conclusion, artificial intelligence and machine learning techniques offer promising
advancements in stroke risk prediction. By analyzing diverse datasets, these methods can help identify
individuals at risk. In this project, however, the high accuracy of most models largely reflected the class
imbalance of the dataset, and Naive Bayes detected the largest share of stroke cases. Challenges such as
class imbalance, interpretability and clinical implementation remain, requiring ongoing research. Nevertheless,
AI-driven predictive analytics have the potential to enhance proactive stroke management, improving patient
outcomes and resource allocation in healthcare.

8. FUTURE SCOPE:
1. Advanced Algorithms: Refining machine learning models for better accuracy.
2. Personalized Care: Tailoring interventions based on individual risk profiles.
3. Real-time Assessment: Developing systems for instant risk evaluation.
4. Clinical Support: Integrating AI into decision-making tools for healthcare providers.
5. Population Health: Using AI to manage stroke risks at a broader level.
6. Ethics and Regulations: Addressing concerns about data privacy and bias.
7. Validation Studies: Conducting large-scale trials to assess model effectiveness.

9. REFERENCES:

1. https://www.ijert.org/brain-stroke-prediction-using-machine-learning
2. https://www.questjournals.org/jecer/papers/vol8-issue4/F08042530.pdf
3. https://ijarsct.co.in/Paper10422.pdf
4. https://journals.itb.ac.id/index.php/jictra/article/download/18061/6082
5. https://www.kaggle.com/code/reihanenamdari/brain-stroke-prediction-decisiontree
