
International Journal of Scientific Research in Engineering and Management (IJSREM)

Volume: 09 Issue: 01 | Jan - 2025 SJIF Rating: 8.448 ISSN: 2582-3930

Stroke Prediction Using Linear Regression

Nahala M A¹, Sooraj Subhash², Kishore Xavier³, Rahul Manoj⁴, Sreehari V V⁵

¹ Asst. Prof., Dept. of CSE, Sree Narayana Gurukulam College of Engineering, Kochi, India
² Student, Dept. of CSE, Sree Narayana Gurukulam College of Engineering, Kochi, India
³ Student, Dept. of CSE, Sree Narayana Gurukulam College of Engineering, Kochi, India
⁴ Student, Dept. of CSE, Sree Narayana Gurukulam College of Engineering, Kochi, India
⁵ Student, Dept. of CSE, Sree Narayana Gurukulam College of Engineering, Kochi, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Stroke is one of the leading causes of death and disability worldwide, and early prediction can significantly improve patient outcomes through timely interventions. This study explores the potential of using linear regression models to predict the likelihood of stroke in individuals based on a set of clinical and demographic factors. The data came from a publicly available stroke dataset; the features used include age, gender, hypertension, heart disease, marital status, type of work, and smoking habits, among others. The goal of this study is to find important predictors and then establish a linear regression model capable of approximating stroke risk with reasonable accuracy. Feature selection and preprocessing aided the choice of relevant variables with which to build the predictive model. The data are split into training and testing subsets, and performance is assessed with metrics such as the mean squared error and R-squared. The outcomes show that linear regression on relevant features can be used for predictions related to stroke risk, resulting in a simple but readable early risk identification model. However, the accuracy of the model suggests that other algorithms and additional data could increase reliability in this field. The paper concludes that linear regression is a viable foundation for predicting stroke, while further refinement and more sophisticated models would be necessary for clinical applications.

Key Words: Stroke prediction, linear regression, feature selection, data preprocessing, machine learning.

1.INTRODUCTION

Stroke is one of the leading causes of death and disability, and outcomes can be significantly improved through early prediction and timely interventions. The aim of this study is to determine whether a linear regression model can be used to predict the probability of stroke in a person given a set of clinical and demographic factors. The publicly available stroke dataset used here includes variables such as age, gender, high blood pressure, heart problems, marital status, employment status, and smoking habit, among others. The study attempts to find the strong predictors for a linear regression model capable of giving reasonable stroke prediction accuracy.

Stroke is a medical condition characterized by the sudden interruption of blood flow to the brain, resulting in loss of brain function. It is one of the leading causes of death and long-term disability worldwide, affecting millions of people annually. The ability to predict stroke risk is important for early intervention, prevention, and personalized treatment strategies that may reduce the burden of this debilitating disease. As healthcare systems shift towards making decisions based on data, predictive modeling has emerged as a promising tool for predicting medical conditions, including stroke.

Traditional stroke risk assessment is based on clinical guidelines and risk factors such as age, hypertension, diabetes, heart disease, smoking, and family history. These are all well-known risk factors, but the interaction between them makes it challenging to quantify and predict stroke risk in individual patients with a reasonable degree of accuracy. Recent advances in machine learning and statistical modeling offer new opportunities for improving predictive accuracy. Linear regression is a popular and interpretable statistical method that offers a straightforward approach to modeling the relationship between stroke risk and various demographic, medical, and behavioral factors.
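
As a minimal sketch of the kind of model described here, the following Python fits an ordinary least-squares linear regression to a handful of made-up patient records and reports the mean squared error and R-squared on a small held-out set. The feature choice, the toy values, and the helper names (`solve_linear_system`, `fit_linear_regression`, `predict`, `mse`, `r_squared`) are illustrative assumptions, not the dataset or code used in the study; only the standard library is used.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y.

    Returns [intercept, coefficient_1, ..., coefficient_p].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # constant column for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Linear prediction: intercept plus the weighted sum of the features."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy records: (age in years, hypertension flag, smoker flag).
X_train = [(45, 0, 0), (67, 1, 0), (52, 0, 1), (79, 1, 1), (31, 0, 0),
           (60, 1, 1), (72, 0, 0), (58, 1, 0), (81, 0, 1), (38, 0, 1)]
y_train = [0, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # 1 = stroke recorded (made-up labels)
X_test = [(70, 1, 0), (40, 0, 0), (55, 0, 1)]
y_test = [1, 0, 0]

beta = fit_linear_regression(X_train, y_train)
test_pred = predict(beta, X_test)
print("intercept and coefficients:", [round(b, 4) for b in beta])
print("test MSE:", round(mse(y_test, test_pred), 4))
print("test R-squared:", round(r_squared(y_test, test_pred), 4))
```

Each fitted coefficient can be read as the change in the predicted risk score per unit change in that feature, which is the interpretability the text refers to. Note that a plain linear model does not constrain its output to the range 0 to 1, so the prediction is better treated as a risk score than as a calibrated probability.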

© 2025, IJSREM | www.ijsrem.com DOI: 10.55041/IJSREM39741 | Page 1



This study aims to evaluate the utility of linear regression for stroke prediction. Using a dataset of individuals with various demographic and health characteristics, the goal is to develop a model that can estimate the likelihood of stroke based on these factors. Specifically, the study focuses on identifying which variables have the most significant impact on stroke risk and how well linear regression can predict stroke occurrence from these inputs.

The primary impetus for this research is to investigate whether a simple, interpretable linear model can provide useful insights into stroke risk and so become a potential tool for early screening in clinical settings. Although more complex models such as logistic regression or other machine learning techniques may achieve better accuracy, the study focuses on linear regression because of its simplicity, interpretability, and potential for real-time clinical applications. Ultimately, this research contributes to stroke prevention efforts by showing the feasibility of predictive modeling in healthcare and forming a basis for further research with more complex algorithms.

2.STROKE PREDICTION

Stroke is a medical condition resulting from an interruption of the blood supply to the brain, leading to cell death. Accurate prediction of stroke can be vital to the early diagnosis and prevention of this disease, thus reducing mortality and disability rates. Stroke prediction often depends on the analysis of risk factors such as age, hypertension, diabetes mellitus, smoking habits, and family medical history. Because of their capacity to detect patterns and inter-relations within a large dataset, machine learning techniques such as linear regression are extensively applied for this purpose. Linear regression is a statistical model that relates a dependent variable to one or more independent variables. In the context of stroke prediction, it estimates the probability of having a stroke as a function of the input features, which represent the risk factors. The method is simple, computationally efficient, and interpretable, making it a good choice for understanding how individual factors influence stroke risk. The dataset typically contains records of patients' demographic information, medical history, lifestyle factors, and stroke outcomes. Preprocessing steps such as handling missing values, normalizing data, and encoding categorical variables are crucial to ensure the dataset is ready for analysis. Feature selection is also important for identifying the most relevant predictors and removing noise from the model. Linear regression computes a coefficient for every predictor variable, indicating how much that predictor contributes to stroke risk. Metrics such as Mean Squared Error (MSE) and R-squared, together with residual plots, are used to determine the accuracy and reliability of the model. While linear regression is useful for identifying risk factors, its simplicity may limit accuracy in predicting strokes, especially when relationships between variables are non-linear or complex. To address this, linear regression can be combined with more advanced methods such as logistic regression, decision trees, or neural networks to improve predictive performance. Another promising area is the integration of real-time health monitoring data and personalized medicine.

3.LITERATURE REVIEW

The paper “Stroke Prediction Using Machine Learning Classification Methods” by Hamza Al-Zubaidi, Mohammed Dweik, and Amjed Al-Mousa explores the use of machine learning techniques to predict stroke risk. Stroke is a major global health concern, and early identification of at-risk individuals is critical for effective prevention. The study examines classification models such as Random Forest, Decision Tree, Logistic Regression, and Support Vector Machines (SVM), trained on features such as age, glucose levels, smoking habits, and medical history, to predict stroke occurrence. One of the challenges addressed in this study is the problem of imbalanced datasets, where stroke cases are significantly fewer than non-stroke cases. To address this, the authors use the Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic examples for the minority class. This helps the models identify stroke cases and reduces bias toward the majority class. Among the models tested, the Random Forest classifier showed the best performance, achieving an accuracy of 94-95% along with high precision, recall, and F1-score. The model’s ensemble approach, which combines multiple decision trees, contributes to its robustness and ability to outperform other methods such as Decision Tree, Logistic Regression, and SVM. This study demonstrates that machine learning, particularly Random Forest, can serve as a powerful tool for stroke prediction. By integrating these predictive models into healthcare systems, medical professionals can identify high-risk individuals and take preventive measures effectively. This research also emphasizes the importance of addressing data imbalances to ensure reliable real-world applications.[1]

The paper "Early Stroke Prediction Using Machine Learning" by Chetan Sharma, Shamneesh Sharma, Mukesh Kumar, and Ankur Sodhi discusses the use of machine learning techniques for predicting stroke, with a focus on health and lifestyle factors. The classifiers evaluated in the study were Random Forest, Decision Tree, and Naïve Bayes. Random Forest showed the best accuracy, 98.94%, indicating how well it classified at-risk individuals. Feature selection from the
study indicated that features such as "gender" and "residence type" had minimal impact on the predictions, whereas health-related factors such as glucose level and smoking habits mattered more. This is consistent with previous literature that focuses on the key medical variables for accurate stroke prediction. Random Forest outperformed the other methods because of its robust ensemble approach; it handles complex patterns better than simpler models such as Decision Tree and Naïve Bayes. This reliability makes it a promising tool for integrating machine learning into proactive healthcare systems. The study underlines the transformative role of machine learning, especially Random Forest, in advancing early stroke prediction. By adopting these models in healthcare, practitioners can enhance early detection and implement timely interventions, reducing the burden of stroke-related complications.[2]

The paper "Stroke Risk Prediction Model Using Machine Learning" by Nugroho Sinung Adi, Richas Farhany, Rafidah Ghina, and Herlina Napitupulu presents the application of machine learning algorithms to predicting stroke risk. Because stroke is a leading global cause of death, the study seeks to improve early detection by identifying high-risk persons from their historical health data. Machine learning is used to analyze patterns in the data and enables the development of models for effective and accurate stroke prediction. The study evaluates three algorithms (Naïve Bayes, Decision Tree, and Random Forest) and compares their performance in predicting stroke risk. Random Forest was the most accurate model at 94.78%, followed by Decision Tree at 91.91% and Naïve Bayes at 89.98%. Because Random Forest is an ensemble method that combines several decision trees, it is better suited to datasets containing complex information. The results suggest that Random Forest is better able to handle large volumes of information and is not easily corrupted by noisy and incomplete inputs. The main strength of this study is that it provides a detailed comparison of different models, allowing insight into their strengths and weaknesses. Naïve Bayes and Decision Tree do reasonably well, but they are outperformed by Random Forest in both accuracy and robustness. Thus, the study emphasizes choosing appropriate algorithms when developing machine learning-based diagnostic tools, particularly for high-stakes applications like stroke prediction. The authors also point out potential avenues for improvement: more patient attributes, such as genetic factors, lifestyle habits, and real-time health monitoring data, may be included to enhance the performance of the Random Forest model. They recommend testing other advanced algorithms to check whether even higher accuracy rates are possible in further studies. This research reinforces the potential of machine learning in healthcare, particularly in addressing life-threatening conditions like stroke. By integrating Random Forest models into medical systems, healthcare providers can achieve more reliable early detection of stroke risks, enabling timely interventions and significantly reducing stroke-related mortality and complications.[3]

The paper "Stroke Prediction Using Machine Learning" by Abinandhini D. M., Aman Kumar, Gudi Vishnu Teja, Divya S., Naman Chauhan, I. R. Oviya, and Kalpana Raja examines stroke risk prediction by machine learning algorithms using patient health data. Five models were analyzed on a Kaggle dataset with attributes including age, gender, BMI, and smoking status: Random Forest, Gaussian Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). Random Forest achieved an accuracy of 94.81%, outperforming all other models. In contrast, KNN delivered the lowest accuracy at 76.32%. Random Forest, because of its ensemble approach, handles large datasets and complex relationships well, making it the most reliable tool for the early detection of stroke. Other models, such as Gaussian Naïve Bayes and Logistic Regression, did fairly well but are less robust than Random Forest. The analysis shows the strengths and limitations of each algorithm. While SVM and Logistic Regression have moderate accuracy, Random Forest has robust performance. KNN is lower in accuracy, an important reminder that an algorithm poorly matched to a particular dataset and prediction goal may not be suitable for medical applications. The authors suggest that machine learning models such as Random Forest should be included in healthcare systems to make early diagnosis and intervention for stroke possible. This can improve patient outcomes considerably by facilitating early prevention measures. The paper further recommends studies that consider other patient-related information, such as real-time monitoring and genetic factors, to fine-tune the models further. This study validates the role of machine learning in healthcare, suggesting its capacity to transform early stroke prediction and improve health outcomes.[4]

The paper “Stroke Prediction Using Machine Learning Methods” by Saumya Gupta and Supriya Raheja explores the use of machine learning algorithms to predict stroke risk, highlighting the importance of early detection for better outcomes. Using health and demographic data, the study evaluates algorithms such as Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, Naïve Bayes,
and Random Forest to identify the most effective model for stroke prediction. Results show that Random Forest outperforms other models, delivering the highest accuracy due to its robust ensemble approach, which handles complex data and variability more effectively than simpler algorithms. Other methods, including SVM and KNN, show moderate accuracy but fall short of Random Forest's reliability. This study emphasizes the potential of machine learning in improving stroke diagnosis and enabling timely interventions. Integrating models like Random Forest into healthcare systems can aid in early detection, significantly reducing stroke-related complications. The authors encourage further research to enhance these models by including additional data such as real-time health monitoring and genetic information, which could improve predictive accuracy and make these tools even more effective in clinical settings.[5]

The paper "A Predictive Analytics Approach for Stroke Prediction Using Machine Learning and Neural Networks" by Soumyabrata Dev, Hewei Wang, Chidozie Shamrock Nwosu, and Nishtha Jain offers a technique for stroke prediction by processing 29,072 patient electronic health records (EHR). The research highlights age, heart disease, hypertension, and average glucose level as the most predictive features for stroke. The authors use Principal Component Analysis (PCA) to reduce the dimensionality of the data and make the model more efficient. They find that a neural network trained on four features gives the highest accuracy, which shows the significance of proper feature selection in improving prediction. Compared with other machine learning models, the neural network consistently outperforms the others, showing superior results in stroke prediction. This underscores the value of deep learning for analyzing complex, large datasets like EHR. The paper points out that this approach enhances clinical decision-making, because more accurate stroke risk assessments allow for more timely intervention. The method should also simplify EHR management and help healthcare professionals make appropriate use of patient data. The study emphasizes the scope for stroke prediction and early diagnosis through neural networks and machine learning in healthcare systems, and supports further research to strengthen these models.[6]

The paper "Prediction of Brain Stroke Using Machine Learning Algorithms and Deep Neural Network Techniques" by Senjuti Rahman, Mehedi Hasan, and Ajay Krishno Sarkar presents a machine learning approach to predict brain stroke using a Kaggle dataset. The study evaluates various machine learning algorithms such as Random Forest, XGBoost, AdaBoost, LightGBM, SVM, KNN, Naive Bayes, and Logistic Regression, along with deep neural networks (3-layer and 4-layer ANN) to assess their predictive capabilities. Random Forest achieves the highest accuracy, 99%, and outperforms the other algorithms in terms of classification accuracy, F1-score, and AUC. This reflects that Random Forest is good at dealing with complex data and surpasses most other models. Among the deep learning models, the 4-layer ANN performs well, with an accuracy of 92.39%, while the 3-layer ANN performs worse. Although the deep learning models are promising, they do not outperform Random Forest in this application, which suggests that traditional ML methods may be better suited to stroke prediction. The study focuses on the effectiveness of Random Forest in stroke prediction, noting its high accuracy and robustness. The paper concludes by recommending Random Forest as the most reliable model for clinical stroke prediction, while also recognizing the potential of deep neural networks for future research. In summary, this study compares a wide range of machine learning and deep learning methods, providing valuable insights into selecting the best predictive model for stroke detection, ultimately contributing to more accurate and timely clinical decision-making.[7]

The paper "A Predictive Analytics Approach for Stroke Prediction Using Machine Learning and Neural Networks," authored by Soumyabrata Dev, Hewei Wang, Chidozie Shamrock Nwosu, Nishtha Jain, Bharadwaj Veeravalli, and Deepu John, aims to predict the risk of stroke using an EHR dataset of 29,072 patients. The paper identifies age, heart disease, average glucose level, and hypertension as the most important predictive features for stroke prediction. To optimize the model, the authors apply Principal Component Analysis (PCA) for dimensionality reduction, showing that using only these four features leads to better accuracy than using all features. This shows how important feature selection is in improving model performance. The study compares several models, such as decision trees, random forests, and convolutional neural networks (CNNs). The neural network model outperformed the others, showing its effectiveness in stroke prediction because it can identify complex patterns in large datasets. The research highlights the importance of optimized feature selection in improving the accuracy of prediction. The neural network model significantly improved stroke risk prediction by focusing on the most relevant features. In conclusion, the study suggests that the combination of machine learning and neural networks could improve clinical decision-making in predicting stroke. Optimized models will help identify high-risk patients earlier, resulting in better outcomes. Further research should be conducted to refine the models and enhance their predictive power.[8]
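
Most of the studies reviewed above report accuracy on data where stroke cases are far rarer than non-stroke cases, which is why [1] also reports precision, recall, and F1-score. The sketch below, using made-up labels rather than any of the cited datasets, shows how these metrics are computed and why accuracy alone can mislead on imbalanced data. The function name `classification_metrics`, the 5% stroke rate, and both example "models" are assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a model never predicts the positive class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 5 stroke cases among 100 patients.
y_true = [1] * 5 + [0] * 95

# A trivial model that always predicts "no stroke".
always_negative = [0] * 100

# A model that catches 4 of the 5 stroke cases at the cost of 6 false alarms.
screening_model = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

for name, preds in [("always negative", always_negative), ("screening model", screening_model)]:
    metrics = classification_metrics(y_true, preds)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In this toy setup the trivial model scores higher on accuracy than the screening model while detecting no stroke cases at all, which is the failure mode that resampling techniques such as SMOTE and recall-oriented metrics are meant to expose.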


4.ARCHITECTURE

The architecture of the stroke prediction system using linear regression is:

The image represents a Data Flow Diagram (DFD) for a stroke prediction system. It illustrates how data flows between the system's entities and processes. The diagram is divided into Level 0 (a high-level overview) and Level 1 (a more detailed breakdown of system functionalities). A detailed explanation follows.

The Level 0 diagram, also referred to as the context diagram, is a high-level view of the stroke prediction system, with a focus on its interaction with external entities and the main process. It identifies the flow of data between the system and its key stakeholders, and gives a general description of how the system's central process interacts with the major elements involved. The two central actors in the stroke prediction system are the user and the admin. Each plays a separate yet interrelated part in the functioning of the system. Users are people who interact with the system to determine their likelihood of having a stroke. They input personal information and health-related data, for example medical history, age, or lifestyle details, which the system then processes to make a prediction. The system feeds this data to predictive models, and the result is returned to the user, normally as a probability of having a stroke. This process empowers users by providing them with valuable insights that can help them make health-related decisions or prompt them to seek medical advice. In contrast, the admin oversees the system to keep it running smoothly and to maintain the integrity of the process. The admin logs in to the system to monitor user details and review the stroke prediction results generated for each individual. This oversight allows the admin to monitor system performance, address any anomalies, and potentially use the data for research or reporting purposes. At the core of the system is the stroke prediction process, which acts as the central mechanism facilitating these interactions. This process takes the input provided by the user, applies algorithms or machine learning models to analyze the data, and generates stroke risk predictions. The data flow starts with users submitting their health data, which is processed by the central stroke prediction mechanism to yield results. At the same time, the admin can access those results and user information for management or analysis purposes. The Level 0 diagram thus captures the basic structure of the stroke prediction system, highlighting the flow of data between users, administrators, and the prediction process. It provides a basis for understanding at a general level how the system operates, which makes it useful to stakeholders and developers.

The Level 1 (detailed) diagram explains the procedures for both users and administrators in the stroke prediction system. On the user's side, interaction starts with the register function, which creates an account and requires the user to enter personal credentials in order to log into the system. The user then logs in with those credentials to gain access to the features offered by the system. After login, users input their data, namely important health information such as age, blood pressure, and medical history. Next comes the stroke prediction process, in which the system uses predictive models to analyze the data input by the user and determine the likelihood of a stroke. After the analysis, logged-in users can view their specific prediction results, which show their probability of stroke occurrence. After checking the results, users can log out safely from the system. On the admin side, the admin logs into the system using the login function with administrative credentials. Once logged in, admins can access the view users feature, where they can view and manage all registered users' details. Admins can also use the view results feature to see the stroke prediction results for all users, which is helpful for monitoring the system, research, or reports. Once done with these administrative activities, the admin securely logs out of the system using the logout function. This detailed diagram highlights how the structured user and administrator workflows look, giving a
reflection of how the system manages data entry, prediction processing, and result management in order to make the experience smooth and operational for every stakeholder.

The stroke prediction system is designed to help users assess their risk of stroke based on health-related inputs, giving them valuable insights into their potential risk. It also offers administrators the ability to oversee and manage the system, including access to user data and prediction results. The structured flow of the system ensures smooth interactions for users and efficient administrative management. The Level 0 diagram represents a general view of the system, showing how it relates to external entities such as users and administrators. The Level 1 diagram is more specific, showing the functions and processes for both users and admins and demonstrating the logical sequence of actions within the system. This approach helps developers and stakeholders understand the system's design and functionality in the planning and development stages.

5.COMPARISON WITH EXISTING SYSTEM

Current stroke prediction systems rely on a broad range of statistical models and machine learning algorithms that examine health information to estimate stroke risk. These range from traditional methods, such as linear regression and logistic regression, which provided a foundation for stroke forecasting, to more sophisticated machine learning techniques such as decision trees, random forests, neural networks, SVMs, and KNNs. Each of these methods brings its own strengths: classic models fit simpler tasks, while advanced models are better at dealing with vast amounts of information and capturing more intricate, non-linear relationships among risk factors. Linear and logistic regression have been widely used for many decades because they are easy and straightforward to interpret. They enable medical practitioners to understand the correlation between a patient's individual risk factors, such as age, blood pressure, and cholesterol levels, and their chance of having a stroke. These models are also more practical when working with small data sets because they do not require large volumes of data to produce reasonable predictions. More advanced techniques, including decision trees and random forests, are able to manage larger and more complex data sets. These algorithms function by making

advanced technique has developed neural networks, which would recognize rather complex patterns while processing voluminous data via multiple levels of interconnected nodes. These are very useful for large-scale health records or for imaging such as CT and MRI scans, but they are computationally expensive to run and therefore require good hardware resources. Moreover, neural networks are often regarded as "black box" models because they are not very interpretable, so it is hard for healthcare providers to understand why a prediction was made. SVM and KNN are also commonly used for stroke prediction. SVMs are particularly good for classification problems where the classes are well separated, such as distinguishing between patients at high risk for stroke and those not at risk. KNN is a non-parametric method that predicts outcomes based on the proximity of a given data point to its neighbors. While KNN can be intuitive, it struggles with large datasets, as the algorithm becomes computationally expensive when dealing with high-dimensional data. While these machine learning models and statistical techniques have proven useful in stroke prediction, they all come with certain disadvantages. These include issues related to the complexity of the models, the need for large datasets, high computational demands, challenges with interpretability, and difficulties with generalization to new populations.

Current stroke prediction systems rely on complex models such as neural networks and random forests, whose output in most cases remains opaque and difficult for healthcare providers to understand and trust. This is a major issue in clinical practice, where clear insights into risk factors are required. In contrast, the proposed system uses linear regression, which is highly interpretable and provides transparent insights into how factors like age and blood pressure impact stroke risk, making it more suitable for clinical environments. In addition, advanced models like neural networks and SVMs are computationally expensive and may not be feasible in resource-poor healthcare environments, and their slow training and long implementation time further limit their practical use. The proposed system uses linear regression, which is computationally efficient and can run on less powerful hardware for real-time predictions in smaller or resource-constrained healthcare environments. Large volumes of data are also required to train complex models effectively, and such datasets may not always be accessible, especially in smaller institutions. The proposed system addresses this by using linear regression, since it can work well with smaller datasets and ensures wider
a series of decision rules that split the data into various accessibility. Moreover, though such advanced models
branches, aiding in the discovery of interactions between are likely to overfit, especially when there is unbalanced
multiple risk factors. Though these models are highly or inadequate data, the simple nature of linear regression
efficient at identifying patterns, they are prone to deters the probability of overfitting if data is relatively
overfitting if not tuned properly. Yet, another very well-balanced. Another challenge with advanced models


is that their high computational demands make them hard to scale; the proposed system, however, can easily be retrained or recalibrated as more data becomes available, and so grows with the expanding dataset. In addition, complex models require expensive hardware for training and deployment, which is financially unfeasible for many healthcare providers. Employing linear regression minimizes these costs, which makes the solution especially valuable in areas where resources are limited. Existing models may also lack generalizability to new populations because of demographic differences or regional variability, while linear regression, being less rigid, can easily be updated with newer data or risk factors in order to remain relevant for diverse clinical environments.

The proposed stroke prediction model has several advantages over complex machine learning models, chief among them interpretability and efficiency. In contrast to complex models that are hard to trace, the system uses linear regression, which can explain how a risk factor such as age or high blood pressure affects the predicted stroke risk; this makes its output more acceptable and reliable to patients and healthcare providers. In addition, the linear regression model is computationally efficient and performs well on less powerful hardware, which is critical in real-time clinical applications where resources may be scarce, and which gives it a significant advantage over more resource-intensive models. The design is also simpler, so it saves the cost of high-end machine learning infrastructure and is a much more affordable solution for healthcare providers. The system will enable healthcare providers to flag high-risk patients early, target preventive interventions at those patients, and ultimately reduce the rate of strokes and improve patient outcomes. It will also be easy to update the system by adding new risk factors or data sources as medical knowledge changes, whereas more complex systems, once built, may require major reconstruction to accommodate new data.

The proposed system offers several distinct advantages over those in existence today, including simplicity, interpretability, computational efficiency, and lower cost. Existing systems depend heavily on advanced machine learning models, for example neural networks, random forests, or support vector machines, which, despite their power, pose serious challenges. These systems are highly complex, usually operating as "black box" models, and their decision-making processes are thus hard to interpret, even though clinical acceptance requires that the decision-making process be interpretable. They also need substantial computational resources that are unavailable in most health settings, especially low-resource ones. In contrast, the proposed system uses a much more straightforward and interpretable technique, linear regression, with which it is easy to understand how different risk factors, for example age, blood pressure, and lifestyle, affect the probability of stroke. Interpretability is also critical for gaining the trust of healthcare professionals and patients. Further, linear regression is computationally efficient and can be run on less powerful hardware, which makes it suitable for real-time clinical applications, even in resource-constrained environments. The system's simpler design also significantly reduces the costs associated with high-end machine learning infrastructure, which makes it an affordable solution for healthcare providers with limited budgets. It is also scalable, capable of handling growing datasets and adapting to new risk factors as medical knowledge advances. By focusing on simplicity, interpretability, efficiency, and cost-effectiveness, the proposed system would be a practical, affordable, and scalable solution for stroke prediction, especially in healthcare facilities in resource-limited settings, overcoming most of the challenges posed by more complex existing systems.

6.CONCLUSION

In conclusion, the stroke prediction model developed with linear regression has significant potential for improving stroke prevention and health care outcomes. It can identify individuals at risk using various health and lifestyle factors and is a practical and interpretable solution for clinicians. The methodology followed in creating such a model begins with clearly defining the problem and identifying the key factors influencing stroke risk. Data collection and preprocessing are two crucial stages in ensuring that the health data used for training the model is clean, relevant, and accurate. Once the data is processed, the model is built, trained, and evaluated to ensure that it accurately reflects the relationships between stroke and risk factors like age, blood pressure, cholesterol levels, and comorbidities such as diabetes or heart disease. This approach ensures that healthcare professionals can make informed decisions about which patients need further attention and preventive measures.

One of the major advantages of employing linear regression in stroke prediction is its interpretability. Linear regression does not produce models as complex as those of more advanced machine learning algorithms. Instead, it enables clinicians to understand how a risk factor contributes to a particular prediction. This enhances clinical acceptance and trust, given that the results can easily be explained to patients. Success therefore relies greatly on the quality and relevance of the data


used. Including relevant predictors, such as lifestyle factors, age, and a more detailed medical history, may improve the accuracy of model predictions. Further feature engineering and hyperparameter tuning can improve the model's performance in capturing relationships between the variables. The downside is that linear regression models cannot, on their own, capture complicated, nonlinear relationships between variables. So while linear regression works in many situations, complex interactions among risk factors might limit its use in other cases.
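
The limitation described above, and the way feature engineering can partly offset it, can be illustrated with a minimal sketch. The data are synthetic and invented purely for illustration (a U-shaped relationship between one hypothetical predictor and the outcome), and the helper names fit_line and r_squared are assumptions, not part of the proposed system.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (assumption): the outcome is U-shaped in the predictor.
xs = list(range(11))
ys = [(x - 5) ** 2 for x in xs]

# A straight line on the raw predictor explains none of the variation here.
b0, b1 = fit_line(xs, ys)
r2_raw = r_squared(xs, ys, b0, b1)

# Feature engineering: regress on a transformed predictor instead.
zs = [(x - 5) ** 2 for x in xs]
c0, c1 = fit_line(zs, ys)
r2_engineered = r_squared(zs, ys, c0, c1)

print(f"R^2 on raw predictor:        {r2_raw:.3f}")
print(f"R^2 on engineered predictor: {r2_engineered:.3f}")
```

The model stays linear in its coefficients in both fits; only the input changes. The transformation has to be known or guessed in advance, which is where tree-based or kernel methods may have an edge.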

Beyond that, to improve the current model's forecasting accuracy, other models may need to be introduced, especially logistic regression, which in some medical fields has been replaced by machine learning techniques such as decision trees, random forests, or even support vector machines. Such techniques can reveal more complex relationships and may therefore predict strokes more accurately. Despite this, the simplicity and interpretability of linear regression make it a good starting point for stroke prediction systems, especially in resource-limited settings. This model can give healthcare providers an effective, scalable, and real-time way to determine who is at risk, allowing for early intervention. Ultimately, while linear regression can be a powerful predictor for stroke, combining it with more capable models and continuously refining it against new data and research can result in highly effective and personalized healthcare interventions, with better long-term outcomes for the patient.
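
To make the interpretability argument concrete, the following sketch shows how a linear model's prediction decomposes into one additive term per risk factor. The coefficient values, feature names, and example patients are hypothetical and chosen only for illustration; they are not fitted to the study's dataset. The sketch also hints at why logistic regression is often considered for risk prediction: a raw linear score is not confined to the range 0 to 1, whereas the logistic (sigmoid) function always returns a value strictly between 0 and 1.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to any data).
INTERCEPT = -0.20
COEFFS = {"age": 0.004, "systolic_bp": 0.001, "has_diabetes": 0.05}


def contributions(patient):
    """Additive contribution of each risk factor to the linear score."""
    return {name: coef * patient[name] for name, coef in COEFFS.items()}


def linear_score(patient):
    """Intercept plus the sum of the per-factor contributions."""
    return INTERCEPT + sum(contributions(patient).values())


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patients (assumption).
older = {"age": 70, "systolic_bp": 150, "has_diabetes": 1}
younger = {"age": 20, "systolic_bp": 100, "has_diabetes": 0}

for name, value in contributions(older).items():
    print(f"{name}: {value:+.3f}")
print(f"linear score (older):     {linear_score(older):.3f}")
print(f"linear score (younger):   {linear_score(younger):.3f}")
print(f"sigmoid of younger score: {sigmoid(linear_score(younger)):.3f}")
```

With these made-up numbers the younger patient's linear score falls below zero, which cannot be read as a probability. The sigmoid call only illustrates the bounded output range; an actual logistic regression would fit its own coefficients rather than reuse those of a linear model.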
