
Final - Proj AZRA Merged

The project report details a machine learning-based approach for predicting multiple diseases, including diabetes, heart disease, and Parkinson's disease, utilizing algorithms like Support Vector Machine and Logistic Regression. It emphasizes the importance of early detection and personalized healthcare management through a user-friendly interface developed with Streamlit. The study aims to enhance healthcare outcomes by providing accurate predictions based on patient data, thereby facilitating timely interventions.


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI, KARNATAKA

A Project Work Report


on
DISEASE PREDICTION USING MACHINE LEARNING
TECHNIQUES
Submitted in the partial fulfillment for the award of
BACHELOR OF ENGINEERING
in
INFORMATION SCIENCE AND ENGINEERING

By
Ms. ANANYA DIXIT 1BY20IS027
Ms. AZRA RUMANA 1BY20IS036
Ms. HARSHINI K 1BY20IS060

Under the guidance of

Dr. N. Rakesh
Associate Professor
Dept. of ISE, BMSIT&M

2023-2024
BMS INSTITUTE OF TECHNOLOGY & MANAGEMENT
YELAHANKA, BENGALURU-560064

DEPARTMENT OF INFORMATION SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the Final Year Project (Seventh Semester) entitled “DISEASE PREDICTION
USING MACHINE LEARNING TECHNIQUES” is a bonafide work carried out by Ms.
ANANYA DIXIT (1BY20IS027), Ms. AZRA RUMANA (1BY20IS036), Ms. HARSHINI K
(1BY20IS060) in partial fulfillment for the award of Bachelor of Engineering Degree in Information
Science and Engineering of the Visvesvaraya Technological University, Belagavi during the year 2023-24. It
is certified that all corrections/suggestions indicated for internal assessment have been incorporated in this
report. The project report has been approved as it satisfies the academic requirements with respect to mini
project work for the B.E Degree.

Signature of the Guide Signature of the Coordinator Signature of the HOD


Dr. N. Rakesh Dr. N. Rakesh Dr. Pushpa S K

Signature of Principal
Dr. Sanjay H A
ABSTRACT

Electronic data has accumulated due to the rising incidence of chronic illnesses, the complexity of
the relationships between various diseases, and the widespread use of computer-based technologies
in the health care sector. Doctors are encountering challenges in accurately diagnosing illnesses and
analysing symptoms because of the large volumes of data. Predicting the occurrence of multiple
diseases is crucial for early intervention and personalized healthcare management. In this study, we
propose a machine learning-based approach utilizing Machine Learning Algorithms to predict the
likelihood of various diseases simultaneously. We develop a user-friendly interface using Streamlit,
allowing users to input relevant health data and obtain predictive outcomes in real-time.

The models are trained on a diverse dataset comprising features associated with different diseases.
Through comprehensive experimentation and evaluation, we demonstrate the efficacy of our
approach in accurately predicting multiple diseases, thereby facilitating timely preventive measures
and tailored healthcare interventions. Our proposed system provides a valuable tool for healthcare
professionals and individuals to proactively manage their health and mitigate the risks associated
with various medical conditions.

(i)
ACKNOWLEDGEMENT

We are happy to present this project after completing it successfully. This project would not have
been possible without the guidance, assistance and suggestions of many individuals. We would like
to express our deep sense of gratitude and indebtedness to each and everyone who has helped us
make this project a success.

We heartily thank Principal, Dr. Sanjay H A, BMS Institute of Technology & Management for
his constant encouragement and inspiration in taking up this project.

We heartily thank Dr. Pushpa S K, Professor and HoD, Dept. of Information Science and
Engineering, BMS Institute of Technology & Management for his constant encouragement and
inspiration in taking up this project.

We heartily thank project coordinator Dr. N. Rakesh, Associate Professor, Dept. of Information
Science and Engineering, for his constant follow up and advice throughout the course of the project
work.

We gratefully thank our project guide, Dr. N. Rakesh, Associate Professor, Dept. of
Information Science and Engineering, for his encouragement and advice throughout the course of
the project work.

Special thanks to all the staff members of Information Science Department for their help and kind
cooperation.

Lastly, we thank our parents and friends for their encouragement and support given to us in order to
finish this precious work.

By,
Ananya Dixit
Azra Rumana
Harshini K
BMS INSTITUTE OF TECHNOLOGY & MANAGEMENT
YELAHANKA, BANGALORE-64
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING

Declaration

We hereby declare that the project (18CSP77) titled “DISEASE PREDICTION USING
MACHINE LEARNING TECHNIQUES” is a record of original project phase-2 work undertaken
for the award of the degree Bachelor of Engineering in Information Science and Engineering of the
Visvesvaraya Technological University, Belagavi during the year 2023-24. We have completed this
project phase-2 work under the guidance of Dr. N. Rakesh, Associate Professor, Dept. of ISE,
BMSIT&M.
We also declare that this project phase-2 report has not been submitted for the award of any degree,
diploma, associateship, fellowship or other titles anywhere else.

Student Photo Photo1 Photo2 Photo3

USN 1BY20IS027 1BY20IS036 1BY20IS060

Name ANANYA DIXIT AZRA RUMANA HARSHINI K

Signature

(iii)
INDEX

ABSTRACT i
ACKNOWLEDGEMENT ii
DECLARATION iii
LIST OF FIGURES v

1. Introduction 1-2
1.1 Problem Statement
1.2 Existing System
1.3 Proposed System

2. Literature Survey 3-9


2.1 Critical Analysis of the Literature
2.2 Summary Table

3. System Requirement Specification 10


3.1 Functional Requirements
3.2 Hardware Requirements and Software Requirements

4. System Design 11-13


4.1 Methodology
4.2 Design

5. Implementation 14-17

6. Testing 18-19

7. Results and Discussion 20-23

8. Conclusion and Future work 24

References 25-26
LIST OF FIGURES

Figure No. Figure Name Page No.

4.1 Architecture Diagram 12
4.2 Data Flow Diagram 13
5.1 Heart Disease Dataset 14
5.2 Diabetes Disease Dataset 15
5.3 Parkinson’s Disease Dataset 15
5.4 Decision Boundary Hyperplane of SVM 16
5.5 Logistic Regression 16
5.6 Streamlit Interface 17
5.7 System Integration 17
7.1 Result showing positive for Diabetes 20
7.2 Result showing negative for Diabetes 20
7.3 Result showing person has Heart Disease 21
7.4 Result showing person does not have Heart Disease 21
7.5 Result showing person has Parkinson’s Disease 22
7.6 Result showing person does not have Parkinson’s Disease 22
LIST OF TABLES

Table No. Table Name Page No.

6.1 Unit Testing for Heart Disease Prediction 18
6.2 Unit Testing for Parkinson’s Disease Prediction 18
6.3 Unit Testing for Diabetes Disease Prediction 19
6.4 Integration Testing 19
6.5 System Testing 19
7.1 Comparison of Accuracy 23
Disease Prediction Using Machine Learning Algorithms Introduction

CHAPTER 1

INTRODUCTION

The capacity to accurately and efficiently forecast and diagnose diseases is still a major concern in
the rapidly changing field of healthcare. Conventional diagnosis techniques often rely heavily on
clinical expertise and subjective assessment, which makes them time-consuming and error-prone, and
they sometimes fail to capture the intricacies of many illness situations. With its capacity to analyze
massive volumes of data and spot patterns, machine learning (ML) has become a potent tool for
overcoming these constraints. ML-powered multiple disease prediction holds immense potential to
revolutionize healthcare by enabling early identification: ML models can analyze diverse data
sources, such as medical records, genetic information, and environmental factors, to pinpoint
individuals who are prone to developing various health conditions. This advance notice allows for
proactive interventions, potentially preventing disease onset or delaying its progression, leading to
improved patient outcomes and reduced healthcare costs.

ML also supports personalized treatment. By considering individual risk profiles and specific disease
combinations, ML models can help tailor treatment plans to optimize effectiveness and minimize adverse
effects. This personalized approach to healthcare ensures that patients obtain the care best suited to
their unique situations, leading to better health outcomes and improved quality of life.

The objective of the multiple disease prediction system is to transform the healthcare landscape by
offering individuals precise forecasts regarding various medical conditions. Utilizing sophisticated
machine learning algorithms like Support Vector Machine (SVM) and logistic regression, this system
scrutinizes health data to anticipate the probability of encountering diverse ailments. The
methodology encompasses pivotal phases such as data preprocessing, model training, and
incorporation of a user-friendly interface via Streamlit. By elaborating on each step comprehensively,
this approach empowers both healthcare practitioners and individuals to proactively monitor their
well-being and make well-informed choices. This introduction lays the groundwork for an in-depth
examination of the system's design and execution.

The significance of such a predictive healthcare system lies in its potential to revolutionize disease
management and prevention strategies. By harnessing the power of machine learning, individuals can
gain insights into their health risks and take proactive measures to mitigate them. Healthcare
providers can also benefit from early detection and intervention, leading to improved patient

Dept. of ISE, BMSIT 2023-24 1



outcomes and reduced healthcare costs. Moreover, the user-friendly interface provided by Streamlit
enhances accessibility and encourages widespread adoption of the system among both professionals
and lay users. Ultimately, this system represents a significant step towards personalized and data-
driven healthcare, paving the way for a healthier and more informed society.

1.1 Problem Statement


The problem is to build a predictive model that predicts multiple diseases from relevant parameters
entered by patients, using machine learning algorithms, without requiring a visit to a hospital or
physician. The project aims to improve healthcare outcomes by enabling early detection and prediction of
diseases using machine learning algorithms and streamlining the prediction process through an
intuitive user interface.

1.2 Existing System


The current approach to disease prediction primarily involves manual diagnosis by healthcare
professionals, which relies on patient symptoms, medical history, and diagnostic tests. However, this
method has limitations in terms of scalability, efficiency, and susceptibility to human error.
Furthermore, existing systems often lack a user-friendly interface for patients or healthcare providers
to input data and obtain real-time predictions. This results in potential delays in diagnosis and
treatment, highlighting the need for advanced machine learning algorithms and intuitive interfaces
like Streamlit for efficient disease prediction and decision-making.

1.3 Proposed System


The proposed system is a comprehensive disease prediction project that utilizes machine learning
algorithms, including Support Vector Machine (SVM) and Logistic Regression to predict multiple
diseases such as diabetes, heart disease and Parkinson's disease. The system aims to provide accurate
disease predictions based on input parameters and a user-friendly interface developed using Streamlit
and deployed on Streamlit Cloud. Data for the models is collected from the Kaggle platform, and is
preprocessed to ensure its quality and suitability for training the models. The preprocessed data is
then used to train the respective machine learning algorithms specific to each disease. The trained
models are tested to evaluate their accuracy in disease prediction.
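
The train-then-test flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the Kaggle datasets are not reproduced here, so a synthetic dataset from scikit-learn stands in for one disease dataset, and the feature count, split ratio, linear SVM kernel, and scaling step are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one disease dataset (assumption: 8 numeric
# features and a binary label; the real data would be loaded from a CSV).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=42,
)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Each model is wrapped in a pipeline so that scaling is learned on the
# training split only and then reused on the test split.
models = {
    "SVM (linear kernel)": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {accuracies[name]:.3f}")
```

The accuracies printed for synthetic data say nothing about the accuracies achieved on the real disease datasets; the sketch only shows the shape of the workflow.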


CHAPTER 2
LITERATURE SURVEY
A literature review is a survey of scholarly sources on a specific topic. It provides an overview of
current knowledge, allowing you to identify relevant theories, methods, and gaps in the existing
research.

2.1 Critical Analysis of the Literature


The reviewed studies illustrate the utilisation of various ML algorithms, including Logistic Regression,
K-Nearest Neighbours, Support Vector Machines (SVM), and Deep Learning models, for predicting an
extensive array of ailments such as diabetes, cancer, cardiovascular disease, and Parkinson’s disease.
These studies have demonstrated the promise of ML in achieving high prediction accuracy, with some
models achieving performance exceeding conventional diagnostic methods. However, challenges remain,
including data quality and accessibility.

The first study focuses on the integration of machine learning in the healthcare domain. It utilizes four
classifiers: Simple CART, SVM, Naive Bayes and Random Forest. The research conducts experiments using
WEKA to predict the occurrence of diabetes. The classifiers are compared based on the time taken for
training and testing, and on their accuracy values. The dataset comprises 9 features and 768 instances,
resulting in accuracies of 77% for Naive Bayes, 79% for SVM, 76.9% for Random Forest, and 76.5% for
Simple CART. Notably, the analysis of overall performance suggests that the Support Vector Machine
outperforms Naive Bayes, Random Forest, and Simple CART in forecasting diabetes [1]. Challenges
encountered during the use of SVM and Simple CART included sensitivity to high dimensionality,
difficulties with non-linear datasets, the black box nature of the models, hyperparameter tuning,
overfitting, high variance, limited feature importance interpretation, and unsuitability for continuous
data. These challenges also encompassed computational expenses, underscoring the various factors that need
to be considered while applying these algorithms in the sector of health care.
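
A comparison of this kind (training time, testing time, and accuracy per classifier) can be sketched in Python. This is only an illustration under stated assumptions: the study in [1] used WEKA, whereas this sketch uses scikit-learn, whose `DecisionTreeClassifier` is a CART-style tree and not WEKA's Simple CART implementation. The data is synthetic, sized to 768 instances with 8 predictors and a label, so the numbers it prints are unrelated to those reported in [1].

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (assumption): 768 instances, 8 predictors, binary label.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "CART-style tree": DecisionTreeClassifier(random_state=0),
}

results = []
for name, clf in classifiers.items():
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    accuracy = clf.score(X_test, y_test)  # predicts on X_test, returns accuracy
    test_seconds = time.perf_counter() - start

    results.append(
        {
            "name": name,
            "train_seconds": train_seconds,
            "test_seconds": test_seconds,
            "accuracy": accuracy,
        }
    )

# Report the classifiers from most to least accurate.
for row in sorted(results, key=lambda r: r["accuracy"], reverse=True):
    print(
        f"{row['name']:<16} accuracy={row['accuracy']:.3f} "
        f"train={row['train_seconds']:.4f}s test={row['test_seconds']:.4f}s"
    )
```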

In addressing the challenges previously mentioned, an exhaustive examination of Disease Prediction using
ML Algorithms was undertaken [2]. This paper proposes the training of a machine learning model using
Decision Trees, a technique that involves systematically partitioning the dataset into progressively smaller
subsets to predict the target value, i.e., the disease. Furthermore, the study incorporates the application of the
Naive Bayes algorithm, recognized for its simplicity in implementation, speed, efficiency, and versatility in
handling both continuous and discrete data. In this study, the dataset consists of 132 symptoms and 4920
instances, covering 41 diseases. Impressively, employing this approach resulted in an accuracy of 95.12%
for Decision Trees, 95% for Random Forest, and 95.12% for Naive Bayes. This highlights the efficacy of the
methodology in predicting diseases.

A study akin to [2], titled "Comparison of Machine Learning Models for Parkinson’s Disease Prediction" [3],
was critically examined. This research introduced an additional methodology involving Logistic Regression,
recognized as a potent and versatile tool for disease prediction, offering advantages regarding interpretability,
efficiency, and robustness. The dataset utilised in this investigation comprised 31 features and 4290 instances.
The outcomes demonstrated notable accuracies, with Logistic Regression achieving 89.83%, Naive Bayes
also at 89.83%, Decision Tree at 93.22%, and Random Forest, which gave an accuracy rate of 94.92%[3].
This comparison underscores the diverse strengths and performance nuances in the context of machine
learning models predicting Parkinson's disease.

Researchers have delved into data mining applications to predict heart diseases, as evidenced by their work
on "Heart Disease Prediction Using Machine Learning Algorithms" [4]. This investigation elucidates the
extraction of intriguing patterns and knowledge from extensive datasets. The study meticulously compares
and evaluates the precision of various data mining and machine learning approaches to find the most efficient
one. Ultimately, the results favour the KNN (K-Nearest Neighbors) method as the optimal choice for
predicting heart diseases. The dataset included 14 features, and the reported accuracies were 87% for KNN,
79% for Decision Tree, 78% for linear regression and 83% for SVM.

The paper [5] focuses on the relatively underexplored realm of healthcare—specifically, the early-stage risk
prediction of Non-Communicable Disease (NCD) through wearable technology within the Healthcare
Common Procedure Coding System. Employing methodologies akin to those in reference [3], the dataset
incorporated 17 features and comprised 520 instances. Notably, the Naive Bayes algorithm demonstrated a
remarkable accuracy of 81%.

Researchers conducted an examination of Parkinson's disease, the topic also addressed in [3], focusing on
L1-Norm SVM and an Effective Recognition System utilising voice recordings [6]. The system's development
involved employing an SVM as a machine learning classifier to differentiate between individuals with
Parkinson's disease (PD) and those who are healthy. Feature selection utilised the L1-Norm SVM to identify
pertinent, strongly correlated features essential for accurate classification of PD and healthy subjects.
The dataset encompassed 22 features, yielding an accuracy of 94% for DBN (Deep Belief Network), 99% for L1
Norm, and 99% for SVM.
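
The two-stage idea summarised above, an L1-penalised SVM to select features followed by an SVM classifier, can be sketched with scikit-learn. This is not the implementation from [6]: the data is synthetic (22 features, as in the reviewed dataset, of which 6 are made informative by assumption), and the regularisation strength `C=0.05`, the RBF kernel, and the split ratio are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for voice-recording features (assumption: 22 features,
# only 6 of which carry signal).
X, y = make_classification(
    n_samples=300, n_features=22, n_informative=6, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Stage 1: a linear SVM with an L1 penalty drives the weights of weak
# features to zero; SelectFromModel keeps the features with non-zero weight.
selector = SelectFromModel(
    LinearSVC(penalty="l1", dual=False, C=0.05, max_iter=5000)
)

# Stage 2: an SVM classifier is trained on the selected features only.
model = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
model.fit(X_train, y_train)

kept = model.named_steps["selectfrommodel"].get_support()
n_kept = int(kept.sum())
accuracy = model.score(X_test, y_test)
print(f"kept {n_kept} of {X.shape[1]} features, test accuracy = {accuracy:.3f}")
```

A smaller `C` makes the L1 penalty stronger and keeps fewer features, so in practice it would be tuned rather than fixed.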


The study titled "Multiple Disease Prediction Using Machine Learning Algorithms" [7] incorporates the
KNN approach [4], Random Forest [5], and XGBoost. While XGBoost stands out as a resilient and widely
applied algorithm in supervised learning, it does come with certain drawbacks, such as computational
complexity, susceptibility to imbalanced datasets, and a reliance on meticulous pre-processing. The dataset
used in the study comprises 2359 features and 1683 instances. The predictive accuracies achieved were 85%
for Diabetes, 82% for Heart disease, 78% for Parkinson’s, and 75% for Liver disease.

In the study documented in [8], the authors directed their attention towards predicting Parkinson's disease,
extending the groundwork established in a preceding investigation [6]. Their exploration delved into
assessing the effectiveness of two ML algorithms—specifically, the SVM and Decision Tree—previously
examined in another study [4]. The dataset employed for Parkinson's disease analysis in this research
encompassed 44 distinct features, drawing from an extensive collection of 240 speech measurements.
Notably, the achieved predictive accuracy was remarkable, with the SVM algorithm reaching an impressive
90%, and the Tree classifiers achieving 84%. This research offers useful perspectives on applying ML to
predict Parkinson's disease, highlighting the effectiveness of SVM and Decision Tree models in this
particular domain.

In another comprehensive review, the primary objective was to develop a user-friendly web application
enabling accurate and simultaneous prediction of multiple diseases. The approach ingeniously integrates
various disease detection techniques, eliminating the necessity for additional websites or software [9].
Employing Random Forest, Naive Bayes, and SVM [1][2][3][4][5], the dataset encompassed varying
features, ranging from 9 to 24, and instances ranging from 195 to 5110. The researchers demonstrated notable
success, achieving an accuracy of 88% for Diabetes, 85% for Heart disease, and 82% for Kidney disease.

For diagnosing Parkinson's disease, specific tests like blood tests or ECGs are typically required. To address
the complexity of classifying Parkinson's disease, the recommended approach involves utilising an ML-based
algorithm called Logistic Decision Regression (LDR) [10]. LDR amalgamates the advantages of Logistic
Regression (LR) and Decision Trees, offering interpretability, the ability to handle both continuous and
discrete data, and efficiency for large datasets. The researchers used a dataset consisting of 22 features
and 31 instances to train the LDR model. Notably, the study revealed that the LDR model achieved a
remarkable accuracy of 93.5% in predicting Parkinson’s disease.

In our extensive literature review [11], we explored various algorithms such as SVM, LR, and KNN, noting
a limitation in their application owing to the absence of a varied dataset. To address this, we enhanced
prediction capabilities by incorporating two additional algorithms, Naïve Bayes and K-Nearest Neighbors,
and conducted an evaluation based on their success factors to determine which algorithm exhibited superior
accuracy. This study involved the analysis of 768 instances, with 268 classified as positive and the rest as
negative. The SVM algorithm demonstrated an impressive accuracy of 92.13%, while the LR, KNN, and DT
algorithms achieved accuracies of 89.55%, 87.92%, and 85.05%, respectively.
The study [12] investigates the utilisation of logistic regression (LR), a statistical technique also examined in
previous works [3][5][11], for predicting binary outcomes in the context of determining the existence or
nonexistence of cardiovascular disease (CVD). The dataset comprised 303 instances. LR demonstrated an
accuracy of 87.10% in successfully predicting the presence or absence of CVD in the subjects.
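
How logistic regression turns risk factors into a yes/no prediction can be shown with a short worked sketch: a weighted sum of the inputs is passed through the sigmoid function to give a probability, which is then compared with a threshold. The weights, bias, and patient values below are hypothetical numbers chosen for illustration; they are not taken from [12] or from any fitted model.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float,
                  features: Sequence[float]) -> float:
    """Probability of the positive class (disease present)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights: Sequence[float], bias: float,
            features: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (present) if the probability reaches the threshold, else 0."""
    return int(predict_proba(weights, bias, features) >= threshold)


# Hypothetical coefficients for two standardised risk factors.
weights = [0.8, 1.2]
bias = -0.5
patient = [1.0, 0.5]

# z = -0.5 + 0.8 * 1.0 + 1.2 * 0.5 = 0.9, and sigmoid(0.9) is about 0.711
probability = predict_proba(weights, bias, patient)
label = predict(weights, bias, patient)
print(f"probability = {probability:.3f}, predicted label = {label}")
```

In a trained model the weights and bias are estimated from the data; only the prediction step is shown here.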

In the contemporary healthcare sector, machine learning is commonly employed to detect diseases and predict
their occurrences through data modelling. In studies focusing on risk assessment in intricate scenarios, such
as heart disease, the Logistic Regression ML algorithm is notably prevalent [13]. The authors utilised the
Cleveland Heart Disease Dataset, encompassing 303 cases with diverse factors contributing to the risk of
heart disease. The achieved accuracy in their analysis was 85.47%.

Evidence has demonstrated that simpler systems, such as Logistic Regression and SVM, have the potential to
generate more precise outcomes than their more complex counterparts. Another recent review [14] parallels
the investigation in [11], incorporating the widely recognized Pima Indian Diabetes Dataset comprising 768
instances involving diverse risk factors for diabetes. In the realm of diabetes prediction, researchers found
that SVM outperformed LR, achieving an accuracy rate of 79% compared to LR's 75%.

2.2 Summary Table

Each entry lists the Sl No., title of the paper, description, methodology, and observation.

[1] Title of the paper: Diabetes Disease Prediction Using Machine Learning on Big Data of Healthcare (IEEE)
    Description: The goal of the paper is to create a model that uses machine learning techniques to predict diabetes.
    Methodology: Naive Bayes; SVM; Random Forest; Simple CART
    Observation: Compared to the other algorithms that are employed, the SVM method produced the best results, according to the authors.

[2] Title of the paper: Disease Prediction using Machine Learning Algorithms (IEEE)
    Description: The paper demonstrates the prediction of diseases such as Malaria, Dengue, Impetigo, etc.
    Methodology: Decision Tree; Random Forest; Naive Bayes
    Observation: The description of patient's symptoms may lack accuracy, indicating the presence of overfitting.

[3] Title of the paper: Comparison of Machine Learning Models for Parkinson’s Disease Prediction
    Description: A performance analysis of five models designed for Parkinson's disease.
    Methodology: Logistic Regression; Naive Bayes; Decision Tree; Random Forest
    Observation: The bagging classifier shows signs of overfitting, as it achieves a high training accuracy of 98.5%, but its test accuracy is notably lower at 91.5%.

[4] Title of the paper: Heart Disease Prediction using Machine Learning
    Description: The precision of ML techniques in forecasting heart disease was demonstrated in this paper.
    Methodology: KNN; Decision Tree; Linear Regression; SVM
    Observation: Scope limited to heart disease.

[5] Title of the paper: Early-Stage Risk Prediction of Non-Communicable Disease Using Machine Learning in Health CPS
    Description: This paper examined health sensor data using EPS and AI techniques, concentrating on the early-stage prediction of risks associated with heart and diabetes diseases.
    Methodology: Naive Bayes; Random Forest; Decision Tree; Logistic Regression
    Observation: As SVM showed lower accuracy, Regression and Random Forest were considered to give the precise results.

[6] Title of the paper: Feature Selection Based on L1-Norm Support Vector Machine and Effective Recognition System for Parkinson’s Disease Using Voice Recordings
    Description: This paper presents an accurate diagnosis of PD using an ML-based prediction system. It also uses feature selection and classification using voice recording data.
    Methodology: L1-norm; SVM
    Observation: The analysis of experimental results effectively demonstrates the system's ability to classify between individuals with Parkinson's disease and those who are healthy.

[7] Title of the paper: Multiple Disease Prediction
    Description: A web-based application where different machine learning algorithms are utilized for predicting the probability of multiple diseases based on user-provided medical information. Their scope of diseases included diabetes, Parkinson's, heart and liver.
    Methodology: KNN; Random Forest; XGBoost
    Observation: The application offers a user-friendly interface, multiple disease prediction, and accessibility.

[8] Title of the paper: Early Warning Signs Of Parkinson’s Disease Prediction Using Machine Learning Technique
    Description: The paper proposes an ML-based approach for timely identification of Parkinson's disease (PD) using non-motor symptoms.
    Methodology: SVM; Decision Tree
    Observation: The proposed approach offers early detection of PD.

[9] Title of the paper: Multiple Disease Prediction using Machine Learning and Deep Learning with the implementation of Web Technology
    Description: The paper proposes a web application that utilises machine learning and deep learning techniques to forecast the probability of various illnesses based on user-provided medical information. The system predicted for diabetes, heart and kidney.
    Methodology: Random Forest; Naive Bayes; SVM
    Observation: User-friendly interface, multiple disease prediction, and accessibility.

[10] Title of the paper: An effective Parkinson’s disease prediction using logistic decision regression and machine learning with big data
    Description: The paper discusses the anticipation of Parkinson’s disease. The authors proposed a method that uses a dataset to train the model.
    Methodology: LDR
    Observation: The authors found that the LDR model was able to predict PD and also suggest that the method has the potential to be used to accurately diagnose the disease.

[11] Title of the paper: Diabetes Disease Prediction Using Machine Learning
    Description: This research introduces the approach for diabetes prediction using ML algorithms. Using a patient dataset, these algorithms' performance was assessed.
    Methodology: SVM; Logistic Regression; KNN
    Observation: It was found that the SVM algorithm performed better than the alternative algorithms in use.

[12] Title of the paper: Logistic regression technique for prediction of cardiovascular disease
    Description: The research examines the application of logistic regression (LR), a statistical approach used to forecast outcomes that are either yes or no, to the presence or absence of cardiovascular disease (CVD).
    Methodology: Logistic Regression
    Observation: The authors used LR to examine the connections between a number of risk factors (blood pressure, cholesterol, smoking status, age, gender) and the existence or non-existence of CVD.

[13] Title of the paper: Heart Disease Prediction Using Logistic Regression
    Description: To assess how well logistic regression (LR), a statistical technique for forecasting binary outcomes, predicts the probability of heart disease with different heart disease risk variables.
    Methodology: Logistic Regression
    Observation: The authors used LR to examine the connections between a number of risk factors encompassing blood pressure, cholesterol, blood sugar, age, gender, and chest discomfort.

[14] Title of the paper: Logistic Regression and SVM-based Diabetes Prediction System
    Description: In this study, researchers propose the use of machine learning algorithms to assess how accurately certain risk factors predict the likelihood of an individual developing diabetes.
    Methodology: Logistic Regression; SVM
    Observation: They attribute this difference to SVM's ability to handle non-linear relationships between risk factors and diabetes.

CHAPTER 3
REQUIREMENT ANALYSIS
3.1 Functional requirements:
● Streamlit App: Create a Streamlit web application with different sections for data input,
model prediction, and result display. Include user-friendly forms for inputting relevant
patient data.
● Documentation: Provide clear documentation for users, including how to use the
application and any relevant information about the machine learning models.
● Testing: Implement thorough testing to ensure the application functions as expected and
provides accurate predictions.
● Scalability: If there's a potential for increased usage, ensure that the application is scalable
and can handle additional users.
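
The first requirement (a form for patient data, a prediction step, and a result display) could look roughly like the sketch below. It is an assumption-laden outline, not the project's application: the field names in `FEATURES`, the `ThresholdModel` stand-in, and the helper names `diagnose` and `run_app` are all hypothetical. In a real app the stand-in would be replaced by a trained classifier loaded from disk. Streamlit is imported inside `run_app` so that the prediction logic can be tested without it.

```python
from typing import List, Sequence

# Hypothetical input fields for one disease form.
FEATURES = ["Glucose", "BMI", "Age"]


class ThresholdModel:
    """Stand-in for a trained classifier; mimics scikit-learn's predict()."""

    def predict(self, rows: Sequence[Sequence[float]]) -> List[int]:
        # Illustrative rule only: flag rows whose first value exceeds 140.
        return [1 if row[0] > 140 else 0 for row in rows]


def diagnose(model, values: Sequence[float]) -> str:
    """Run the model on one patient's values and return a readable result."""
    label = model.predict([list(values)])[0]
    return "Positive" if label == 1 else "Negative"


def run_app(model) -> None:
    """Render the input form and show the prediction (needs Streamlit)."""
    import streamlit as st

    st.title("Disease Prediction (sketch)")
    with st.form("patient_data"):
        values = [st.number_input(name, min_value=0.0) for name in FEATURES]
        submitted = st.form_submit_button("Predict")
    if submitted:
        st.success(f"Prediction: {diagnose(model, values)}")
```

To try it, one would add a call such as `run_app(ThresholdModel())` at the bottom of the script and launch it with `streamlit run` followed by the script's file name.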

3.2 Hardware requirements:


● Computer/Server: A computer or server with sufficient processing power to run machine
learning models.
● RAM: Having a minimum of 8GB RAM is recommended, especially if dealing with
moderately sized datasets.

Software requirements:
● Python: Install Python on your system. Streamlit and machine learning libraries are typically
used with Python.
● Streamlit: Streamlit is the main framework for creating interactive web applications.
● Machine Learning Libraries: Install machine learning libraries such as Scikit-Learn for
SVM and Logistic Regression.
● Data Analysis Libraries: Install libraries like Pandas and NumPy for data manipulation and
analysis.


CHAPTER 4
SYSTEM DESIGN

4.1 Methodology:

The methodology described here involves constructing and deploying a reliable predictive model
capable of assessing multiple diseases. This model utilizes machine learning techniques, namely
Support Vector Machine (SVM) and logistic regression, to examine health data and predict the
probability of different medical ailments. The procedure consists of fundamental phases such as data
preprocessing, model training, and incorporation into a user-friendly interface via Streamlit. Through
a comprehensive exploration of these steps, this methodology aims to offer valuable insights into
building and utilizing an efficient predictive healthcare framework as follows:

1. Define the problem: First, you need to clearly define the problem you are trying to solve. This
includes understanding the domain, the data you have, and the expected output.
2. Collect data: Once you have defined the problem, you need to collect relevant data that can be
used to train the machine learning algorithm. This could involve gathering data from different sources
or generating synthetic data.
3. Data Pre-processing: Data pre-processing involves cleaning, transforming, and normalizing the
data so that it can be used for training the machine learning algorithm. This step is crucial as the
quality of the data can significantly impact the performance of the model.
4. Dividing the data: Two sets of data must be created: a training set and a testing set. The model is
trained on the training set, and its performance is assessed on the testing set.
5. Select a model: Once the data is pre-processed and split, you need to select an appropriate machine
learning model for the problem. This involves understanding the strengths and weaknesses of
different models and selecting the one that is best suited for the problem.
6. Train the model: After choosing the model, train it on the training set. Training applies an
optimization technique to find the set of parameters that minimizes the discrepancy between the
model's predictions and the actual output.
7. Test the model: A model needs to be evaluated using the testing set after it has been trained. To
evaluate how effectively the model is doing, metrics like accuracy, precision, and recall must be
calculated.
8. Tune the model: Depending on the evaluation's findings, you might need to make changes to the
model's architecture or its parameters.


9. Deploy the model: Once the model performs well, it can be deployed and used to make
predictions on new data. This could involve retraining the model with new data or updating its
parameters as needed. The figure below shows the architecture diagram for the system:

Fig 4.1: Architecture diagram

The depicted architecture diagram outlines the model's workflow, comprising three primary
components: data preprocessing, model training, and integration with the Streamlit user interface
(UI). Initially, collected data containing pertinent features for each disease undergoes preprocessing,
addressing missing values, feature scaling, and categorical variable encoding. Subsequent to
preprocessing, SVM and logistic regression models are trained on this processed data to discern
disease patterns. Following model training, Streamlit, a user-friendly web application framework for
Python, is utilized to construct an interactive UI. Through the Streamlit UI, users can input their
health parameters, prompting the machine learning models to generate predictions. The UI exhibits
the outcomes, including the likelihood of specific diseases based on the input provided. This cohesive
system not only harnesses potent machine learning algorithms for precise predictions but also
furnishes an accessible and user-friendly interface for users to effortlessly evaluate their health risks.
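
The preprocessing and data-splitting steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's code: missing values are assumed to be encoded as `None`, imputation uses the column mean, scaling is standardization, and the function names and toy values are hypothetical. In the project itself these steps would more likely be done with Pandas and Scikit-Learn.

```python
import random
from statistics import mean, pstdev


def impute_missing(rows):
    """Replace None entries with the mean of the values present in that column."""
    n_cols = len(rows[0])
    col_means = [mean([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant column: avoid dividing by zero
    return means, stds


def scale(rows, means, stds):
    """Standardize each value: subtract the column mean, divide by the column std."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def split_train_test(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the indices reproducibly and split rows and labels into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


if __name__ == "__main__":
    # Toy two-feature rows with one missing value (illustrative values only).
    rows = [[148.0, 33.6], [85.0, None], [183.0, 23.3], [89.0, 28.1], [137.0, 43.1]]
    labels = [1, 0, 1, 0, 1]
    rows = impute_missing(rows)
    train_x, test_x, train_y, test_y = split_train_test(rows, labels)
    means, stds = fit_scaler(train_x)
    print(scale(train_x, means, stds))
    print(scale(test_x, means, stds))
```

The scaler statistics are computed on the training rows only and then reused for the test rows, so that no information from the test set leaks into training.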


4.2 Design

Data Flow Diagram:

The Data Flow Diagram (DFD) is a vital tool in our healthcare diagnostic project, providing a visual
representation of how data circulates within the system. This graphical illustration delineates various
processes, data repositories, data movement pathways, and external interfaces, offering a broad
perspective on the flow of information. Processes depicted in the DFD, such as data input, application
of machine learning algorithms, and presentation of results, are mapped out to illustrate their
interactions and interdependencies. Data stores, symbolizing repositories for patient data and trained
models, pinpoint where information is housed within the system. Data flows visually depict the
trajectory of data as it traverses between processes and data repositories, showcasing the journey of
input data as it undergoes processing, culminating in the output of disease predictions. External
entities, including users or external systems, are incorporated into the diagram to elucidate how data
is exchanged with the external environment. The DFD helps identify potential bottlenecks,
redundancies, and optimization opportunities within the healthcare diagnostic framework, giving a
clear understanding of how data moves and helping keep data processing efficient across the
system.


Fig 4.2: Data Flow Diagram


CHAPTER 5
IMPLEMENTATION

The implementation of our healthcare diagnostic system involves the utilization of tools and packages to
facilitate data preprocessing, model training, user interface development, and system integration.

1. Data Preprocessing:
• Data preprocessing is a crucial step that involves cleaning and transforming raw data into a format suitable
for machine learning algorithms.

• Tasks in data preprocessing may include handling missing values, scaling numerical features, encoding
categorical variables, and splitting the dataset into training and testing sets. The dataset is described as
follows:

(i) Diabetes Prediction:


We collected the dataset from Kaggle, a public data repository. The dataset contains 769 instances and 9
features, including pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree
function, and age.

Fig 5.1 Diabetes disease dataset

(ii) Heart Disease Prediction:


The heart disease dataset contains 304 instances and 14 features: age, sex, chest pain,
trest BPS, cholesterol, FBS, rest ECG, thalach, exang, old peak, slope, ca, thal, and target.


Fig 5.2: Heart disease dataset

(iii) Parkinson’s Disease Prediction:


The Parkinson's dataset contains 196 instances and 23 features, including name, MDVP:Fo(Hz),
MDVP:Fhi(Hz), MDVP:Flo(Hz), MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP,
MDVP:PPQ, Jitter:DDP, MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3,
Shimmer:APQ5, MDVP:APQ, Shimmer:DDA, NHR, HNR, status, RPDE, DFA, spread1,
spread2, D2, and PPE.

Fig 5.3: Parkinson’s disease dataset

2. Model Training:
• Model training involves feeding preprocessed data into machine learning algorithms to train predictive
models.

• For our healthcare diagnostic system, we would utilize machine learning algorithms such as Support Vector
Machine (SVM) and logistic regression.


Support Vector Machine (SVM): A Support Vector Machine (SVM) identifies the decision boundary
that best divides an n-dimensional feature space into classes, so that new data points can be
categorized quickly. This optimal decision boundary is called a hyperplane.

Fig 5.4: Decision Boundary Hyperplane of SVM
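
To make the idea of a separating hyperplane concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is a swapped-in illustration, not the project's implementation (the requirements suggest Scikit-Learn is used for SVM): the function names, hyperparameters, and toy data are assumptions, labels are taken to be -1 and +1, and no kernel is used.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b so that sign(w.x + b) separates the classes (labels must be -1 or +1).

    Each step follows a subgradient of lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularization term contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Classify by which side of the hyperplane w.x + b = 0 the point lies on."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data (made up for illustration).
    X = [[-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0], [2.0, 2.0], [3.0, 1.0], [1.0, 3.0]]
    y = [-1, -1, -1, 1, 1, 1]
    w, b = train_linear_svm(X, y)
    print("weights:", w, "bias:", b)
    print([predict(w, b, x) for x in X])
```

The learned weights `w` and bias `b` define the hyperplane; a new point is assigned to a class according to the side of the hyperplane on which it falls.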

Logistic Regression: Logistic regression is a highly popular algorithm in the field of machine
learning and falls under the category of supervised learning techniques. Its primary purpose is to
predict the outcome of a categorical dependent variable based on a given set of independent variables.
The dependent variable should have discrete or categorical values such as Yes or No, 0 or 1, or true
or false. However, instead of providing exact 0 or 1 values, logistic regression produces probabilistic
values ranging between 0 and 1.

Fig 5.5: Logistic Regression
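
The probabilistic output described above comes from passing a linear combination of the features through the sigmoid function. The sketch below trains a small logistic regression model with batch gradient descent; it is an illustration under assumptions (hypothetical function names, learning rate, and toy one-feature data), not the project's Scikit-Learn model.

```python
import math


def sigmoid(z):
    """Map any real number to (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Minimize the mean log-loss with batch gradient descent (labels must be 0 or 1)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the score
            for j in range(d):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Toy data with a single feature (made up for illustration).
    X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic_regression(X, y)
    for x in ([-1.5], [0.0], [1.5]):
        print(x, round(predict_proba(w, b, x), 3))
```

A class label is obtained by thresholding the probability, commonly at 0.5.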

3. User Interface Development:


• User interface development entails creating an interactive interface that allows users to input their health
parameters and receive disease predictions.

• We would use Streamlit, a Python library for building web applications, to develop the user interface.


• The interface would include input fields for users to enter their health parameters and display the predicted
diseases based on the input.

Fig 5.6: Streamlit Interface
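
A minimal sketch of such an interface for the diabetes page is shown below. It is an assumption-laden illustration, not the project's actual app: `StubModel` is a hypothetical stand-in for the trained classifier (its glucose cut-off is arbitrary and chosen only so the example runs), and the field labels and result messages are guesses based on the dataset description. The `streamlit` import is guarded so that the helper functions can be used even where Streamlit is not installed.

```python
try:
    import streamlit as st  # third-party package; only needed for the UI itself
except ImportError:
    st = None

# Input fields, following the diabetes dataset features listed earlier.
DIABETES_FIELDS = [
    "Pregnancies", "Glucose", "Blood pressure", "Skin thickness",
    "Insulin", "BMI", "Diabetes pedigree function", "Age",
]


class StubModel:
    """Hypothetical placeholder with a scikit-learn-style predict(); not a real classifier."""

    def predict(self, rows):
        # Arbitrary rule on the glucose column (index 1), for demonstration only.
        return [1 if row[1] >= 126 else 0 for row in rows]


def describe(prediction):
    """Turn the 0/1 model output into the message shown to the user."""
    return "The person is diabetic" if prediction == 1 else "The person is not diabetic"


def render(model):
    """Draw the input form and show the prediction when the button is pressed."""
    st.title("Diabetes Prediction")
    values = [st.number_input(label, min_value=0.0, value=0.0) for label in DIABETES_FIELDS]
    if st.button("Predict"):
        st.success(describe(model.predict([values])[0]))


if __name__ == "__main__" and st is not None:
    render(StubModel())
```

Saved as, for example, `app.py`, the sketch would be started with `streamlit run app.py`; in the real system the stub would be replaced by the trained model.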

4. System Integration:

• This includes connecting the data preprocessing pipeline, trained machine learning models, and user
interface.

• We would integrate the data preprocessing and model training components with the user interface
developed using Streamlit. The integrated system would allow users to input their health parameters,
preprocess the data, feed it into the trained models, and display the predicted diseases through the user
interface.

Fig 5.7: System Integration
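
One common way to connect a trained model to the user interface is to save the fitted parameters to a file after training and load them when the application starts. The sketch below illustrates that flow with Python's `pickle` module; the report does not state how the models are stored, so the file name, the parameter dictionary, and all numeric values here are hypothetical.

```python
import math
import pickle
import tempfile
from pathlib import Path


def save_params(params, path):
    """Training side: write the fitted parameters to disk."""
    Path(path).write_bytes(pickle.dumps(params))


def load_params(path):
    """Application side: read the parameters back (only unpickle files you trust)."""
    return pickle.loads(Path(path).read_bytes())


def predict(params, raw_values):
    """Preprocess the raw user input, apply the model, and return (label, probability)."""
    scaled = [(v - m) / s for v, m, s in zip(raw_values, params["means"], params["stds"])]
    score = sum(w * x for w, x in zip(params["weights"], scaled)) + params["bias"]
    probability = 1.0 / (1.0 + math.exp(-score))
    return (1 if probability >= 0.5 else 0), probability


if __name__ == "__main__":
    # Made-up parameters for a one-feature logistic model, for illustration only.
    params = {"weights": [0.8], "bias": 0.0, "means": [100.0], "stds": [20.0]}
    with tempfile.TemporaryDirectory() as folder:
        model_file = Path(folder) / "model.pkl"
        save_params(params, model_file)
        loaded = load_params(model_file)
    print(predict(loaded, [140.0]))
    print(predict(loaded, [60.0]))
```

Storing the scaling statistics together with the model weights keeps the preprocessing applied at prediction time identical to the one used during training.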


CHAPTER 6

TESTING

Unit Testing:

Heart Disease Prediction (Logistic Regression):

Table 6.1: Unit Testing for Heart Disease Prediction

| Test Case | Input Features | Expected Output | Actual Output |
|---|---|---|---|
| 1 | [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1] | 1 (Positive for heart disease) | 1 |
| 2 | [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2] | 0 (Negative for heart disease) | 0 |

Parkinson's Disease Prediction (SVM):

Table 6.2: Unit Testing for Parkinson’s Disease Prediction

| Test Case | Input Features | Expected Output | Actual Output |
|---|---|---|---|
| 1 | [119.99200, 157.30200, 74.99700, 0.00784, 0.00007, 0.00370, 0.00554, 0.01109, 0.04374, 0.426, 0.02182, 0.03130, 0.02971, 0.06545, 0.02211, 21.03300, 0.414783, 0.815285, -4.813031, 0.266482, 2.301442, 0.284654, 0.171088] | 1 (Positive for Parkinson's disease) | 1 |
| 2 | [122.40000, 148.65000, 65.47600, 0.00459, 0.00003, 0.00263, 0.00333, 0.00790, 0.02751, 0.418, 0.01910, 0.02949, 0.02701, 0.04004, 0.01929, 19.08500, 0.499452, 0.819521, -4.075192, 0.285695, 2.486855, 0.368674, 0.163755] | 0 (Negative for Parkinson's disease) | 0 |


Diabetes Prediction (SVM):

Table 6.3: Unit Testing for Diabetes Prediction


| Test Case | Input Features | Expected Output | Actual Output |
|---|---|---|---|
| 1 | [6, 148, 72, 35, 0, 33.6, 0.627, 50] | 1 (Positive for diabetes) | 1 |
| 2 | [1, 85, 66, 29, 0, 26.6, 0.351, 31] | 0 (Negative for diabetes) | 0 |
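
Unit tests of this kind can be automated with a small table-driven harness, sketched below using the two diabetes cases from Table 6.3. The harness name `run_cases` and the `stub_predict` function are hypothetical: the stub is an arbitrary rule standing in for the trained SVM so that the example is runnable, and in practice the real model's prediction function would be passed in instead.

```python
# Test cases from Table 6.3: (case id, input features, expected output).
DIABETES_CASES = [
    (1, [6, 148, 72, 35, 0, 33.6, 0.627, 50], 1),
    (2, [1, 85, 66, 29, 0, 26.6, 0.351, 31], 0),
]


def stub_predict(features):
    """Hypothetical stand-in for the trained model: arbitrary rule on glucose (index 1)."""
    return 1 if features[1] >= 126 else 0


def run_cases(predict_fn, cases):
    """Run every case and return (case id, expected, actual, passed) for each."""
    results = []
    for case_id, features, expected in cases:
        actual = predict_fn(features)
        results.append((case_id, expected, actual, actual == expected))
    return results


if __name__ == "__main__":
    for case_id, expected, actual, passed in run_cases(stub_predict, DIABETES_CASES):
        print(f"case {case_id}: expected={expected} actual={actual} {'Pass' if passed else 'Fail'}")
```

The same harness could be reused for the heart disease and Parkinson's disease tables by supplying their cases and prediction functions.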

Integration Testing
Table 6.4: Integration Testing

Test Components to be Input Expected Actual Output Pass/Fail


Cases Integrated Output
1 Data Preprocessing & Model Sample Trained model Model object Pass
Training dataset
2 Model Training & User Preprocessed Prediction UI UI with Pass
Interface data prediction
feature
3 User Interface & System User input Disease Predicted Pass
Integration (age, sex, prediction disease
etc.) displayed on UI

System Testing
Table 6.5: System Testing
| Test Case | Input | Expected Output | Actual Output | Pass/Fail |
|---|---|---|---|---|
| 1 | User inputs (age, sex, etc.) | Disease prediction | Predicted disease displayed on UI | Pass |
| 2 | Updated health data from external source | Updated disease prediction | Predicted disease updated on UI | Pass |
| 3 | Incorrect input format | Error message | Error message displayed on UI | Pass |
| 4 | Concurrent user requests | Smooth performance, no system crashes | System handles concurrent requests effectively | Pass |
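
Test case 3 above relies on the application rejecting badly formatted input with an error message. A minimal sketch of such a check is given below; the function name `parse_features` and the wording of the messages are assumptions, not taken from the project.

```python
import math


def parse_features(raw_values, expected_count):
    """Convert user-entered values to floats.

    Returns (values, None) on success or (None, error_message) on failure.
    """
    if len(raw_values) != expected_count:
        return None, f"Expected {expected_count} values but received {len(raw_values)}."
    values = []
    for position, raw in enumerate(raw_values, start=1):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            return None, f"Value {position} ({raw!r}) is not a number."
        if not math.isfinite(value):
            return None, f"Value {position} ({raw!r}) is not a finite number."
        values.append(value)
    return values, None


if __name__ == "__main__":
    print(parse_features(["6", "148", "72"], 3))
    print(parse_features(["6", "abc", "72"], 3))
    print(parse_features(["6", "148"], 3))
```

The user interface would display the returned message instead of calling the model whenever the second element is not `None`.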


CHAPTER 7
RESULTS AND DISCUSSION

After completion of the project, the application interfaces look as follows:
1. Diabetes Prediction:
SVM works by finding the optimal hyperplane that separates data points into different classes with
maximum margin. By representing patient data with various features, SVM can effectively classify
individuals as diabetic or non-diabetic based on their feature values.

Fig 7.1: Result showing positive for Diabetes

Fig 7.2: Result showing negative for Diabetes


2. Heart Disease Prediction:


Logistic regression works by modeling the probability of a binary outcome (heart disease) using a
sigmoid function applied to a linear combination of input features. By estimating model parameters and
calculating decision boundaries, logistic regression can effectively classify patients as having or not
having a heart disease based on their feature values.

Fig 7.3: Result showing person has Heart Disease

Fig 7.4: Result showing person Does not have Heart Disease


3. Parkinson’s Disease Prediction:


SVM works by transforming the data into a higher-dimensional space where it becomes linearly
separable, finding the optimal hyperplane that maximizes the margin between data points of different
classes, and making predictions based on the position of new data points relative to the hyperplane. This
allows SVM to effectively classify patients as having or not having Parkinson's disease based on their
health data.

Fig 7.5: Result showing person has Parkinson’s Disease

Fig 7.6: Result showing person Does not have Parkinson’s Disease


The results underscore the effectiveness of SVM and logistic regression algorithms in disease prediction when
integrated with a streamlined user interface. The combination of accurate predictive models and an intuitive
interface holds significant potential for empowering individuals to make informed decisions about their health
and seek timely medical intervention. The table below shows the accuracy of each:

Table 7.1: Comparison of Accuracy

| Sl. No. | Disease Name | Algorithm Name | Existing System Accuracy | Proposed System Accuracy |
|---|---|---|---|---|
| 1 | Diabetes | SVM Classifier | 76% | 78% |
| 2 | Heart Disease | Logistic Regression | 80% | 85% |
| 3 | Parkinson’s | SVM Classifier | 71% | 89% |
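
For reference, accuracy figures like those in the table are computed by comparing predicted labels with the true labels of the test set. The sketch below shows that calculation together with precision and recall, the other metrics named in the methodology; the function name and the example labels are made up for illustration and are not the project's results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = disease present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Made-up labels for eight test samples.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

In a medical setting recall is worth reporting alongside accuracy, since a missed positive case (false negative) is usually more costly than a false alarm.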


CHAPTER 8
CONCLUSION AND FUTURE WORK

This study developed a disease prediction system for heart disease, diabetes, and Parkinson's
disease using Streamlit and machine learning algorithms, specifically Support Vector Machine
(SVM) and Logistic Regression, as a tool to support early diagnosis. Through a user-friendly
interface created with Streamlit, users can input relevant health parameters, and the system provides
predictions based on the trained models. The models take the relevant parameters as input and
predict whether the disease is likely to be present. The SVM and Logistic Regression models
exhibited promising accuracy in predicting these diseases. As reported in Table 7.1, the accuracy
with the SVM algorithm is 78% for diabetes prediction and 89% for Parkinson's disease prediction,
while Logistic Regression, employed for heart disease prediction, gives an accuracy of 85%.

In the future, the disease prediction system can be enhanced by incorporating more sophisticated
machine learning algorithms and increasing the size and diversity of the training datasets. Integration
with real-time health monitoring devices would enable continuous data input, improving the accuracy
of predictions. Furthermore, the system's usability can be refined through additional features such as
personalized health recommendations based on individual risk factors. Collaborations with
healthcare professionals for validation and refinement of the models could enhance the system's
reliability and relevance in clinical settings. Additionally, addressing privacy and security concerns
to ensure the ethical use of health data remains a crucial aspect for future development.



REFERENCES
[1] Ayman Mir, Sudhir N Dhage “Diabetes Disease Prediction Using Machine Learning on Big Data
of Healthcare” (IEEE) 2019

[2] Amin Ul Haq, Jian Ping Li, Mohammad Hammad Memon, Jalaluddin Khan, Asad Malik, Tanvir
Ahmed “Feature Selection Based on L1-Norm Support Vector Machine and Effective Recognition
System for Parkinson’s Disease Using Voice Recordings” (IEEE) 2019

[3] Sneha Gram Purohit, Chetan Sagarnal “Disease Prediction using Machine Learning Algorithms”
(IEEE) 2020

[4] Tapan Kumar, Pradyumn Sharma, Nupur Prakash “Comparison of Machine learning models for
Parkinson’s Disease prediction” (IEEE) 2020

[5] Archana Singh, Rakesh Kumar “Heart Disease Prediction using Machine Learning” (IEEE) 2020

[6] Mohammed Juned Shaikh, Soham Manjrekar “Multiple Disease Prediction” (IEEE) 2020

[7] Kranthi Kumar Singamaneni, Dr. G. Putlibai, Dr. P. Sagaya Aurelia, P Gopala Krishna,
Dr. D. Stalin David “An effective Parkinson’s disease prediction using logistic decision regression and
machine learning with big data” 2021

[8] Rahatara Ferdousi, M Anwar Hossain, AbdulMotaleb El Saddik “Early-Stage Risk Prediction of
Non-Communicable Disease Using Machine Learning in Health CPS” (IEEE) 2021

[9] Pawan Kumar Mall , Rajesh Kumar Yadav, Arun Kumar Rai , Vipul Narayan , Swapnita
Srivastava “Early Warning Signs Of Parkinson’s Disease Prediction Using Machine Learning
Technique” 2022

[10] Mostafizur Rahman; Saiful Islam; Sadia Binta Sarowar; Meem Tasfia Zaman “Multiple Disease
Prediction using Machine Learning and Deep Learning with the Implementation of Web
Technology” (IEEE) 2023



[11] D. Bertsimas, L. Mingardi and B. Stellato “Machine Learning for Real-Time Heart Disease
Prediction” (IEEE) 2021

[12] M. A. Sarwar, N. Kamal, W. Hamid and M. A. Shah “Prediction of Diabetes Using Machine
Learning Algorithms in Healthcare” (IEEE) 2019

[13] T. J. Wroge, Y. Özkanca, C. Demiroglu, D. Si, D. C. Atkins and R. H. Ghomi “Parkinson’s
Disease Diagnosis Using Machine Learning and Voice” (IEEE) 2019

[14] J. P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan and A. Saboor “Heart Disease Identification
Method Using Machine Learning Classification in E-Healthcare” (IEEE) 2020

[15] K. G. Dinesh, K. Arumugaraj, K. D. Santhosh and V. Mareeswari “Prediction of Cardiovascular
Disease Using Machine Learning Algorithms” (IEEE) 2019

[16] Mehrbakhsh Nilashi, Othman bin Ibrahim, Hossein Ahmadi, Leila Shahmoradi “An analytical
method for diseases prediction using machine learning techniques” 2019

[17] Prashant Kumbharkar, Deepak Mane, Santosh Borde, Sunil Sangve “Diabetes Disease
Prediction Using Machine Learning Algorithms” 2022

[18] A. L. Yadav, K. Soni and S. Khare, "Heart Diseases Prediction using Machine Learning" (IEEE)
2023

[19] Rubini P. E., Dr. C. A. Subasini, Dr. A. Vanitha Katharine, V. Kumaresan, S. Gowdham Kumar,
T. M. Nithya “A Cardiovascular Disease Prediction using Machine Learning Algorithms” 2021

[20] Richa Mathur, Vibhakar Pathak, Devesh Bandil “Parkinson Disease Prediction Using Machine
Learning Algorithm” 2019



The Report is Generated by DrillBit Plagiarism Detection Software

Submission Information

Author Name Azra Rumana


Title Disease Prediction using Machine Learning Techniques
Paper/Submission ID 1697273
Submitted by [email protected]
Submission Date 2024-04-25 10:07:36
Total Pages 35
Document type Project Work

Result Information

Similarity: 14 %
Sources Type: Student Paper 0.58%, Internet 1.43%, Journal/Publication 11.99%
Report Content: Quotes 3.03%, Words < 14: 6.3%, Ref/Bib 6.15%

Exclude Information

Quotes: Excluded
References/Bibliography: Excluded
Sources: Less than 14 Words %: Excluded
Excluded Source: 0%
Excluded Phrases: Excluded

Database Selection

Language: English
Student Papers: Yes
Journals & publishers: Yes
Internet or Web: Yes
Institution Repository: Yes

DrillBit Similarity Report

Similarity %: 14
Matched Sources: 21
Grade: B

Grade scale: A-Satisfactory (0-10%), B-Upgrade (11-40%), C-Poor (41-60%), D-Unacceptable (61-100%)
| Location | Matched Domain | % | Source Type |
|---|---|---|---|
| 1 | www.irjmets.com | 2 | Publication |
| 2 | arxiv.org | 2 | Publication |
| 3 | bmsit.ac.in | 1 | Publication |
| 4 | information-science-engineering.newhorizoncollegeofengineering.in | 1 | Publication |
| 5 | arxiv.org | 1 | Publication |
| 6 | drttit.gvet.edu.in | 1 | Publication |
| 7 | inpressco.com | <1 | Internet Data |
| 8 | www.irjmets.com | 1 | Publication |
| 9 | www.mdpi.com | 1 | Internet Data |
| 10 | sjcit.ac.in | <1 | Publication |
| 11 | Thesis Submitted to Shodhganga Repository | <1 | Publication |
| 12 | Deep learning approach for diabetes prediction using PIMA Indian dataset by Naz-2020 | <1 | Publication |
| 13 | www.diva-portal.org | <1 | Publication |
| 14 | annamalaiuniversity.ac.in | <1 | Publication |
