Main Project
A PROJECT ON
CERTIFICATE
Certified that the project work entitled “MULTIPLE DISEASE PREDICTION USING
MACHINE LEARNING AND STREAMLIT” is a bona fide work carried out by
MOHAMMAD AMAAN, RIFA MARYAM SHEIKH, SAYYED FARAZ bearing
USNs 4SH20CS028, 4SH20CS052, 4SH20CS056 respectively in partial fulfilment of the eighth
semester project, for the subject “Major Project”, for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering of Visvesvaraya Technological University,
Belagavi during the year 2023-2024. It is certified that all corrections/suggestions indicated by the
guide have been incorporated in the report. The project report has been approved as it satisfies the
academic requirements in respect of the seminar work prescribed for the said degree.
ABSTRACT
In today’s world, deep learning techniques play a vital role in all areas. They have already made a
massive impact in almost every field, including self-driving cars, cancer diagnosis, predictive forecasting,
precision medicine and speech recognition. Deep learning techniques overcome the limitations of
traditional learning techniques.
Healthcare is among the essential services a society must provide. Many current AI models
for healthcare analysis predict only one disease per analysis. Our aim is to
predict several diseases on a single platform built with the Python library Streamlit. In this
project, the Naïve Bayes, Logistic Regression, Random Forest and SVM classifiers, together with
TensorFlow and Keras, are used to predict each particular disease. For each disease, the algorithm that
gives the highest accuracy is selected for training before deployment. Multiple-disease analysis is
implemented with machine learning algorithms and Streamlit, and Python pickling is used to save the
trained models. In this report we analyse diabetes, heart disease, Parkinson's disease, malaria and intestine
disease using basic parameters such as pulse rate, cholesterol, blood pressure and heart
rate, as well as images. The risk factors associated with each disease can also be identified using the
prediction model with good accuracy and precision. Other chronic diseases, skin
diseases and many others can be included in future. This work demonstrates that many diseases
can be predicted using only core health parameters. The significance of this analysis is that it covers as
many diseases as possible in order to screen the patient's condition and warn patients ahead of time,
reducing the mortality rate.
ACKNOWLEDGEMENT
A successful project is a fruitful combination of the efforts of many people, some directly
involved and others who have quietly encouraged and extended their invaluable support
throughout its progress.
We would also like to convey our heartfelt thanks to our Management for providing the
good infrastructure, laboratory facility, qualified and inspiring staff whose guidance was of
great help in successful completion of this project.
We are extremely grateful and thankful to our beloved Principal Dr. K E Prakash for
providing the congenial atmosphere and necessary facilities for achieving the cherished
goal.
We feel delighted to have this page to express our sincere thanks and deep appreciation to
Prof. Anand S Uppar, Head of the Department, Computer Science and Engineering for
his valuable guidance, keen interest and constant encouragement throughout the entire period
of this project work.
We also thank all other teaching staff and non-teaching staff for allowing us to carry out the
project work.
Finally, we would like to thank our family for their support and understanding, to whom
we owe so much.
MOHAMMAD AMAAN
RIFA MARYAM SHEIKH
SAYYED FARAZ
SHREE DEVI INSTITUTE OF TECHNOLOGY
KENJAR, MANGALURU – 574142
Department of Computer Science and Engineering
DECLARATION
Date:
Place: Mangalore
MOHAMMAD AMAAN [4SH20CS028]
RIFA MARYAM SHEIKH [4SH20CS052]
SAYYED FARAZ [4SH20CS056]
TABLE OF CONTENTS
CHAPTERS
1 INTRODUCTION
3.3 DISADVANTAGE
4 PROPOSED SYSTEM
6 SYSTEM DESIGN
6.1 DESCRIPTION
7.2 TESTING
REFERENCES
LIST OF FIGURES
CHAPTER 1
INTRODUCTION
The healthcare industry can make effective decisions by "mining" the huge
data sets it holds, that is, by extracting the hidden relationships and connections in the
data. Algorithms such as Random Forest, Logistic Regression, SVM and Naïve Bayes, together
with TensorFlow and Keras, can provide a solution in this situation.
We have therefore developed a computerized framework that can discover and extract hidden
knowledge associated with diseases from a historical (diseases-side effects) data set by
applying the particular algorithm. The healthcare and clinical sectors need data mining
today more than ever.
When data mining techniques are applied correctly,
valuable information can be extracted from large data sets, which can help clinicians
make early decisions and improve healthcare services. The intent is
to use classification to assist the physician. A review of
existing healthcare systems shows that most analyses consider only a single disease at a
time, and most articles focus on one specific disease. When an organization
needs to analyse its patients' health reports, it has to deploy many models. The
approach in the existing systems is useful for analysing only specific diseases.
These days mortality has increased because the exact disease is often not identified.
Even a patient who has been cured of one disease may be suffering from another,
such as a heart problem that has gone undetected. Many such
cases are seen in people's medical histories. In a multiple disease prediction
system, a user can analyse more than one disease on a single website.
The user does not need to visit different places to predict whether he/she has a particular
disease. Instead, the user selects the name of the disease, enters its
parameters and clicks on submit. The corresponding machine learning model is then invoked;
it predicts the result and displays it on the screen.
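The select, enter and submit flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry MODEL_REGISTRY, the function predict and the two threshold rules that stand in for trained models are hypothetical and are not taken from the project's code.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-ins for trained models: each maps a feature vector to 0 or 1.
def _diabetes_model(features: Sequence[float]) -> int:
    # Placeholder rule, NOT a trained classifier: flags a value above 140 at index 1.
    return int(features[1] > 140)

def _heart_model(features: Sequence[float]) -> int:
    # Placeholder rule, NOT a trained classifier: flags a value above 240 at index 2.
    return int(features[2] > 240)

# One entry per disease offered in the menu (assumed names).
MODEL_REGISTRY: Dict[str, Callable[[Sequence[float]], int]] = {
    "Diabetes": _diabetes_model,
    "Heart Disease": _heart_model,
}

def predict(disease: str, features: Sequence[float]) -> int:
    """Look up the model for the selected disease and run it on the entered values."""
    try:
        model = MODEL_REGISTRY[disease]
    except KeyError:
        raise ValueError(f"No model registered for {disease!r}") from None
    return model(features)
```

Keeping the models behind a single lookup means a new disease can be added by registering one more entry, without touching the code that handles the user's input.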
CHAPTER 2
LITERATURE SURVEY
There have been various studies on predicting disease using
different techniques and algorithms that can be used by healthcare centers. This chapter
reviews the methods and results reported in the research papers:
Sateesh Ambesange [1] detected the health parameters using various sensors. Arduino boards
processed the data received from the sensors and demonstrated the prediction of diabetes using
only core health parameters; the results were compared with the complete PIDD data set, resulting
in 81.91% precision for the KNN algorithm 81.81%
Chetan Sagarnal [3]: in this work the algorithms are selected, the symptoms are processed, and the
disease is predicted, with a result of 95.12%.
Nuzhat F. Shaikh [4] visualized the modules using different techniques for better
understanding, and selected algorithms for comparison on the basis of accuracy and time taken for the
class labels; the best accuracy, 98.12, was obtained with the J48 algorithm.
Rashmi G Saboji et al, [5] tried to find a scalable solution that can predict heart disease utilizing
classification mining and used the Random Forest algorithm. This system presents a comparison
against Naïve Bayes classifiers, but Random Forest gives more accurate results, with an accuracy
of 98%.
Pahulpreet Singh Kohli et al, [6] suggested disease prediction by using applications and methods
of machine learning and used techniques like Logistic Regression, Decision Tree, Support
Vector Machine, Random Forest and Adaptive Boosting. This paper focuses on predicting Heart
disease, Breast cancer, and Diabetes. The highest accuracies are obtained using Logistic
Regression that is 95.71% for Breast cancer, 84.42% for Diabetes, and 87.12% for Heart
disease.
Lambodar Jena et al, [7] focused on risk prediction for chronic diseases by taking advantage of
distributed machine learning classifiers and used techniques like Naive Bayes and Multilayer
Perceptron. This paper tries to predict Chronic-Kidney-Disease and the accuracy of Naïve Bayes
and Multilayer Perceptron is 95% and 99.7% respectively.
Naganna Chetty et al, [8] developed a system that gives improved results for disease prediction
using a fuzzy approach, with techniques like the KNN classifier, Fuzzy c-means clustering,
and the Fuzzy KNN classifier. In this paper diabetes and liver disorder prediction is done;
the accuracy for diabetes is 97.02% and for liver disorder is 96.13.
Sayali Ambekar et al, [9] recommended disease risk prediction and used a convolutional neural
network to perform the task. In this paper machine learning techniques like the CNN-UDRP
algorithm, Naive Bayes, and the KNN algorithm are used. The system is trained on structured
data, and its accuracy reaches 82%, achieved using Naïve Bayes.
Min Chen et al, [10] proposed a disease prediction system using machine
learning algorithms. For the prediction of disease, techniques like the CNN-UDRP
algorithm, CNN-MDRP algorithm, Naive Bayes, K-Nearest Neighbor, and Decision Tree were used. This
proposed system had an accuracy of 94.8%.
CHAPTER 3
Many of the existing machine learning models for health care analysis concentrate
on one disease per analysis: for example, one model for liver analysis, one for cancer analysis, one
for lung diseases, and so on. If a user wants to predict more than one disease, he/she has to go
through different sites.
There is no common system where one analysis can perform more than one disease
prediction. Some of the models have low accuracy, which can seriously affect patients’ health.
When an organization wants to analyse its patients’ health reports, it has to deploy many
models, which in turn increases the cost as well as the time. Some of the existing systems consider
very few parameters, which can yield false results.
Multiple Disease Prediction using Machine Learning, Deep Learning and Streamlit
The existing system is a project that focuses on predicting diabetes, heart disease, and Parkinson's
disease using various machine learning algorithms. The algorithms employed in this project
include Naive Bayes classifier, Decision Trees classifier, Random Forest classifier, Support
Vector Machine (SVM), and Logistic Regression. To deploy the models, Streamlit Cloud and
Streamlit library are utilized, providing a user-friendly interface for disease prediction.
The system collects data from various sources, preprocesses it, trains the models with
the processed data, and tests their performance. One of the algorithms used in the system is
SVM, which achieved a prediction accuracy of 76% for diabetes. This means that the SVM
model correctly predicted diabetes in 76% of the cases it was tested on. The performance of
the SVM algorithm indicates its effectiveness in distinguishing between diabetic and non-
diabetic individuals. Similarly, for Parkinson's disease prediction, the SVM algorithm achieved
a prediction accuracy of 71%. This means that the SVM model accurately predicted the
presence or absence of Parkinson's disease in 71% of the cases.
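To make these accuracy figures concrete: accuracy is the number of correct predictions divided by the number of test cases. The helper below is a minimal sketch; the function name accuracy and the toy labels are illustrative assumptions, not data from the existing system. A model that is right on 76 of 100 test cases scores 0.76.

```python
from typing import Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of test cases where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one test case is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: 100 cases, of which 76 are predicted correctly.
y_true = [1] * 100
y_pred = [1] * 76 + [0] * 24
print(accuracy(y_true, y_pred))  # 0.76
```

Accuracy alone does not separate false positives from false negatives, so in a medical setting it is usually read alongside precision and recall.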
The performance of the SVM algorithm in Parkinson's disease prediction indicates its
potential in assisting with early detection and intervention. The system incorporates other
machine learning algorithms such as Naive Bayes, Decision Trees, and Random Forest, which
may have varying performance metrics for different diseases.
These algorithms are designed to leverage different characteristics of the data and make
predictions based on distinct methodologies. Overall, the existing system demonstrates the
effectiveness of machine learning algorithms in predicting diabetes, heart disease, and
Parkinson's disease. The use of Streamlit Cloud and Streamlit library allows for easy
deployment and provides a user-friendly interface for interacting with the prediction models.
Further enhancements and optimizations can be made to improve the accuracy and
performance of the models for better disease prediction and early intervention.
Data bias: One of the biggest concerns with machine learning systems is data bias. If the
training data used to develop the system is biased or incomplete, it can lead to inaccurate
predictions and misdiagnosis. This is especially problematic when it comes to
underrepresented populations, as their data may not be well-represented in the training
set.
Overfitting: Overfitting occurs when a machine learning model is trained too closely to a
particular dataset and becomes overly specialized in predicting it. This can result in poor
generalization to new data and lower accuracy.
Lack of interpretability: Many machine learning algorithms are "black boxes," meaning that
it is difficult to understand how they arrive at their predictions. This can be problematic
in healthcare, where it is important to be able to explain how a diagnosis was made.
To address the identified issues in the existing system and create a more comprehensive and
accurate machine learning model for health care analysis, the proposed solution involves the
development of a multi-disease prediction system with improved accuracy, reduced bias, and
enhanced interpretability. The strategy includes the following key components:
Integrated Multi-Disease Prediction Model: Develop a unified machine learning model
capable of predicting multiple diseases simultaneously. Integrate diverse datasets related to
various diseases to create a comprehensive and holistic analysis system.
Data Quality Assurance: Implement rigorous data preprocessing techniques to address data
bias and incompleteness. Ensure the inclusion of diverse and representative datasets,
especially focusing on underrepresented populations, to reduce bias.
Regularization Techniques to Mitigate Overfitting: Apply regularization techniques such
as dropout in neural networks to prevent overfitting. Use cross-validation strategies during
model development to assess generalization performance.
Interpretable Machine Learning Models: Choose machine learning algorithms with
inherent interpretability, such as decision trees or rule-based models. Implement model-
agnostic interpretability tools to enhance understanding of complex models.
Continuous Model Monitoring and Improvement: Establish a system for ongoing model
monitoring to identify and address performance degradation. Implement mechanisms for
continuous learning, allowing the model to adapt to evolving healthcare trends and data
characteristics.
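The cross-validation strategy mentioned among the components above can be illustrated with a small k-fold splitter. This is a sketch using only the standard library; the helper name k_fold_indices is an assumption made for illustration. Libraries such as scikit-learn provide an equivalent KFold class in sklearn.model_selection.

```python
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        # Everything outside [start, stop) is used for training in this round.
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop

for train_idx, val_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(val_idx))
```

Each sample is used for validation exactly once, so the average validation score over the k rounds gives a steadier estimate of generalization than a single split. The sketch does not shuffle; in practice the data is usually shuffled first.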
CHAPTER 4
PROPOSED SYSTEM
The proposed system is a comprehensive disease prediction project that utilizes machine
learning algorithms, including Support Vector Machine (SVM), Logistic Regression,
TensorFlow with Keras, to predict multiple diseases such as diabetes, heart disease, Parkinson's
disease, malaria disease and intestine disease. The system aims to provide accurate disease
predictions based on input parameters and a user-friendly interface developed using Streamlit
and deployed on Streamlit Cloud. Data for the models is collected from the Kaggle platform, a
popular data science community, and is preprocessed to ensure its quality and suitability for
training the models. The preprocessed data is then used to train the respective machine learning
algorithms specific to each disease. The trained models are tested to evaluate their accuracy in
disease prediction.
The system employs the SVM algorithm to predict diabetes, achieving an accuracy of
78%. This indicates that the SVM model can accurately identify the presence or absence of
diabetes in patients, aiding in early detection and effective management. For Parkinson's
disease prediction, the system uses the SVM algorithm with an accuracy of 87%. This high
accuracy demonstrates the capability of the SVM model to distinguish individuals with
Parkinson's disease from healthy individuals.
Heart disease prediction is performed using the Logistic Regression algorithm, which
achieves an accuracy of 85%. This model effectively identifies the likelihood of heart disease
in patients, supporting timely intervention and appropriate treatment. For malaria disease
prediction, the system utilizes TensorFlow with Keras, achieving an impressive accuracy of
96%. This high accuracy demonstrates the power of deep learning models in accurately
predicting malaria disease, enabling early detection and proactive care. Intestine disease
prediction is also included in the system, utilizing TensorFlow with Keras and achieving an
accuracy of 95%.
The deep learning model developed using these technologies can effectively detect the
presence of intestine disease, enabling early diagnosis and intervention.
CHAPTER 5
5.1 REQUIREMENTS
All computer software needs certain hardware components or other software resources
to be present on a computer. These prerequisites are known as (computer) system requirements
and are often used as a guideline as opposed to an absolute rule. Most software defines two sets
of system requirements: minimum and recommended. With increasing demand for higher
processing power and resources in newer versions of software, system requirements tend to
increase over time.
Back-End: Python 3.12
CHAPTER 6
SYSTEM DESIGN
This chapter describes the software development life cycle, the design model (i.e., various
UML diagrams) and the process specification.
6.1 DESCRIPTION
Systems design is the process or art of defining the architecture, components, modules,
interfaces, and data for a system to satisfy specified requirements. One could see it as the
application of systems theory to product development. There is some overlap and synergy
with the disciplines of systems analysis, systems architecture and systems engineering.
The System Design Document describes the system requirements, operating
environment, system and subsystem architecture, files and database design, input formats,
output layouts, human-machine interfaces, detailed design, processing logic, and external
interfaces.
This design activity describes the system in narrative form using non-technical terms.
It should provide a high-level system architecture diagram showing a subsystem breakout of
the system, if applicable. The high-level system architecture or subsystem diagrams should, if
applicable, show interfaces to external systems. Supply a high-level context diagram for the
system and subsystems, if applicable. Refer to the requirements traceability matrix (RTM)
in the Functional Requirements Document (FRD), to identify the allocation of the functional
requirements into this design document.
This section describes any constraints in the system design (reference any trade-off
analyses conducted, such as resource use versus productivity, or conflicts with other systems)
and includes any assumptions made by the project team in developing the system design. This
section describes any contingencies that might arise in the design of the system that may change
the development direction. Possibilities include lack of interface agreements with outside
agencies or unstable architectures at the time this document is produced. Address any possible
workarounds or alternative plans.
To design a system for multiple disease prediction based on lab reports using machine
learning, we follow these steps:
Data Collection: Data is collected from Kaggle.com, a popular platform for accessing
datasets. The data is obtained specifically for diabetes, heart disease, Parkinson's disease,
malaria disease and intestine disease.
Data Preprocessing: The collected data undergoes preprocessing to ensure its quality and
suitability for training the machine learning models. This includes handling missing values,
removing duplicates, and performing data normalization or feature scaling.
Model Selection: Different machine learning algorithms are chosen for each disease
prediction task. Support Vector Machine (SVM), Logistic Regression, and TensorFlow with
Keras are selected as the algorithms for various diseases based on their performance and
suitability for the specific prediction tasks.
Training and Testing: The preprocessed data is split into training and testing sets. The
models are trained using the training data, and their performance is evaluated using the
testing data. Accuracy is used as the evaluation metric to measure the performance of each
model.
Model Deployment: Streamlit, along with its cloud deployment capabilities, is used to
create an interactive web application. The application offers a user-friendly interface with
five options for disease prediction: heart disease, diabetes, Parkinson's disease, malaria
disease and intestine disease. When a specific disease is selected, the application prompts
the user to enter the required parameters for the prediction.
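The preprocessing step listed above (handling missing values, removing duplicates and feature scaling) can be sketched as follows. This is a minimal illustration, not the project's pipeline: the function name preprocess is hypothetical, missing values are assumed to be represented by None, and incomplete rows are simply dropped, although imputation is an equally valid choice.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]

def preprocess(rows: Sequence[Row]) -> List[List[float]]:
    """Drop incomplete rows, remove exact duplicates, then min-max scale each column."""
    # 1. Missing values: keep only rows where every field is present.
    complete = [tuple(r) for r in rows if all(v is not None for v in r)]
    # 2. Duplicates: dict keys preserve insertion order, so first occurrences win.
    unique = list(dict.fromkeys(complete))
    if not unique:
        return []
    # 3. Min-max scaling: map each column to the range [0, 1].
    columns = list(zip(*unique))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in unique
    ]

raw = [(1.0, 10.0), (3.0, None), (1.0, 10.0), (5.0, 30.0), (3.0, 20.0)]
print(preprocess(raw))  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```

When the data is later split into training and testing sets, the column minimum and maximum are normally computed on the training set only and then reused for the testing set, so that no information from the test data leaks into training.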
Use case diagrams model behavior within a system and help the developers understand
what the users require.
A use case diagram can be useful for getting an overall view of the system and clarifying
who can do what and, more importantly, what they can’t do.
A use case diagram consists of use cases and actors and shows the interaction between
the use cases and the actors.
The use case diagram in Figure 6.3 consists of two actors, the user and the system.
The user can perform actions such as selecting the entity, which means selecting the disease,
and entering the patient details. The system then loads the dataset,
classifies the data and finally predicts the disease.
One of the primary uses of sequence diagrams is in the transition from requirements
expressed as use cases to the next and more formal level of refinement. Use cases are often
refined into one or more sequence diagrams.
In the sequence diagram of Fig. 6.4, the prediction system collects the data from the
actor and stores it in the dataset. The prediction system processes the training data and accesses the
data from the dataset; it then uses the training and test data, applies the ML algorithms,
checks the user status value and grand status values, and produces the output.
CHAPTER 7
7.1 IMPLEMENTATION:
7.1.1 MODULES
• The aim of this module is to perform early prediction of diabetes in a patient.
• It uses data on affected and unaffected people to determine whether a person
is affected by a particular disease.
Attribute Information:
Pregnancies
Glucose
Blood pressure
SkinThickness
Insulin
BMI
DiabetesPedigreeFunction
Age
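As an illustration of how the eight attributes listed above might be passed to a model, the sketch below converts text entered in a form into a numeric feature vector in a fixed order. The names DIABETES_FEATURES and to_feature_vector are hypothetical, and the non-negativity check is an assumption about reasonable input, not a rule taken from the project.

```python
from typing import Dict, List

# Column order follows the attribute list above; a trained model expects the
# same order at prediction time as it saw during training.
DIABETES_FEATURES = [
    "Pregnancies", "Glucose", "Blood pressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

def to_feature_vector(form_values: Dict[str, str]) -> List[float]:
    """Convert text inputs, as a web form would supply them, to floats in model order."""
    missing = [name for name in DIABETES_FEATURES if name not in form_values]
    if missing:
        raise ValueError(f"Missing inputs: {', '.join(missing)}")
    vector = []
    for name in DIABETES_FEATURES:
        value = float(form_values[name])  # raises ValueError on non-numeric text
        if value < 0:
            raise ValueError(f"{name} must be non-negative")
        vector.append(value)
    return vector
```

Validating the inputs before calling the model keeps malformed values from producing a misleading prediction.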
Code:
It uses data on affected and unaffected people to generate the
result for the patient.
It applies different machine learning algorithms such as
KNN, XGBoost, SVM, Random Forest and Logistic Regression.
This module aims to predict via different supervised machine learning methods.
Attribute Information:
Age
Sex
Serum cholesterol
Code:
The Parkinson's disease prediction module is one of the core modules of the multiple disease
prediction system.
It uses data on affected and unaffected people to generate the
result for the patient.
It applies different machine learning algorithms such as KNN, XGBoost, SVM, Random
Forest and Logistic Regression.
Attribute Information:
Code:
For Malaria Disease Prediction, the system utilizes TensorFlow with Keras, achieving
an impressive accuracy of 96%. This high accuracy demonstrates the power of deep
learning models in accurately predicting malaria disease, enabling early detection and
proactive care.
It uses data on affected and unaffected people to generate the
result for the patient.
It applies techniques such as CNN, using TensorFlow with Keras.
Code:
Intestine Disease Prediction is also included in the system, utilizing TensorFlow with
Keras and achieving an accuracy of 95%. The deep learning model developed using
these technologies can effectively detect the presence of intestine disease, enabling
early diagnosis and intervention.
It uses data on affected and unaffected people to generate the
result for the patient.
It applies techniques such as CNN, using TensorFlow with Keras.
Code:
7.2 TESTING
The Multiple Disease Prediction system requires user input in the form of parameters specific
to each disease. When the user selects a particular disease from the options menu, the system
prompts for the relevant parameters. The input design should ensure that the user can easily
provide the required information. The application provides a user interface with a menu
containing five disease options: heart disease, diabetes, Parkinson's disease, malaria disease and
intestine disease. When the user clicks on a specific disease, the application prompts for the
required parameters for that particular disease prediction. The input design should ensure that
the parameters requested are relevant and necessary for accurate disease prediction. The user
should be able to enter the parameters in a user-friendly and intuitive manner.
The Multiple Disease Prediction system provides the predicted result of whether the person is
affected by the selected disease or not. The output design should present the result in a clear and
understandable format. The system should display the output after the user has entered the
parameters. The output could be presented as:
"Prediction: The person is affected by [Disease Name]." (If the prediction is positive)
"Prediction: The person is not affected by [Disease Name]." (If the prediction is negative)
The output should be displayed on the user interface, allowing the user to easily interpret the
prediction result. Overall, the input design ensures that the user can enter the necessary
parameters for disease prediction, while the output design presents the prediction result clearly
on the user interface.
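The two output messages quoted above can be produced by a single helper. This is a minimal sketch; the function name format_prediction is an assumption made for illustration.

```python
def format_prediction(disease_name: str, is_positive: bool) -> str:
    """Build the message shown to the user for a positive or negative prediction."""
    if is_positive:
        return f"Prediction: The person is affected by {disease_name}."
    return f"Prediction: The person is not affected by {disease_name}."

print(format_prediction("Diabetes", True))
print(format_prediction("Heart Disease", False))
```

Generating both messages in one place keeps the wording identical across all five disease options.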
CHAPTER 8
RESULTS AND DISCUSSIONS
We used a large dataset, split into 70% training data and 30% testing data. The
algorithms used for comparison were Naive Bayes, Decision Tree, SVM and Random Forest. The
algorithms were selected for comparison based on their accuracy and the time taken to predict the
class label. The accuracy analysis of the algorithms on the dataset can be seen in the table.
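A 70%/30% split of this kind can be sketched as below. The function name train_test_split_70_30 and the fixed seed are illustrative assumptions; scikit-learn's train_test_split in sklearn.model_selection, called with test_size=0.3, is a common library alternative.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def train_test_split_70_30(samples: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and split it 70% training / 30% testing."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = (len(shuffled) * 7) // 10
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split_70_30(list(range(10)))
print(len(train), len(test))  # 7 3
```

Shuffling before the cut matters when the source file is ordered, for example with all positive cases first; otherwise the testing set could contain only one class.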
The existing system does not include kidney disease or breast cancer prediction. That is
why we leave “-” in the existing system accuracy for kidney disease and breast cancer.
CHAPTER 9
CONCLUSION AND FUTURE SCOPE
In conclusion, our project utilized machine learning algorithms, including Support Vector
Machine (SVM), Logistic Regression, and TensorFlow with Keras, to develop a disease
prediction system. The system focused on five diseases: diabetes, heart disease, Parkinson's
disease, malaria disease and intestine disease. We collected data from Kaggle.com and performed
preprocessing to ensure data quality. For diabetes prediction, we achieved an accuracy of 78%
using the SVM algorithm. Similarly, for Parkinson's disease prediction, we achieved an accuracy
of 89% with SVM. Logistic Regression was employed for heart disease prediction, resulting in
an accuracy of 85%. For malaria disease and intestine disease, we utilized TensorFlow with
Keras, achieving accuracy rates of 96% and 95% respectively. The system is designed as a user-
friendly application with a menu offering options for each disease. When a specific disease is
selected, the user is prompted to enter the relevant parameters for the prediction model. Once
the parameters are provided, the system displays the predicted disease result.
The project "Multiple Disease Prediction using Machine Learning, Deep Learning and
Streamlit" has shown promising results in predicting various diseases with respectable
accuracies. Moving forward, there are several potential areas for future development and
enhancement:
Expansion of Disease Prediction: The current project focuses on diabetes, heart disease,
Parkinson's disease, malaria disease and intestine disease. In the future, additional diseases can
be included to create a more comprehensive and diverse disease prediction system.
Integration of More Machine Learning Algorithms: While the project already employs
Support Vector Machines (SVM), Logistic Regression, and TensorFlow with Keras, there are
many other machine learning algorithms that can be explored. Incorporating algorithms such
as Random Forest, Gradient Boosting, or Neural Networks may further improve the accuracy
and performance of the disease prediction models.
Integration of Advanced Feature Engineering Techniques: Feature engineering plays a crucial
role in extracting meaningful information from the input data. Exploring advanced feature
engineering techniques like dimensionality reduction, feature selection, and feature extraction
can potentially enhance the prediction models and their interpretability.
REFERENCES