Final Project Report
TABLE OF CONTENTS
1 INTRODUCTION
3.3 DISADVANTAGE
4 PROPOSED SYSTEM
6 SYSTEM DESIGN
6.1 DESCRIPTION
7 SYSTEM IMPLEMENTATION AND TESTING
7.1 IMPLEMENTATION
7.2 TESTING
REFERENCES
CHAPTER 1
INTRODUCTION
The healthcare industry can make better decisions by "mining" the large data sets it
holds, for example by extracting the hidden relationships and connections in the data.
Algorithms such as Random Forest, Logistic Regression, SVM, and Naïve Bayes, together with
deep learning models built with TensorFlow and Keras, can address this need. We have
therefore developed a computerized framework that discovers and extracts hidden knowledge
associated with diseases from a historical (disease and symptom) data set, following the
standard procedure of the chosen algorithm. The healthcare and clinical domain needs
data mining today more than ever.
When data mining techniques are applied correctly, valuable information can be
extracted from large data sets, which can help clinicians make early decisions and
improve healthcare services. The aim is to use classification to assist the physician.
Many studies of existing healthcare systems considered only a single disease at a time,
and most articles focus on one particular disease. When an organization wants to analyze
its patients' health reports, it has to deploy many models. The approach in the existing
system is useful for analyzing only specific diseases.
Mortality has risen in part because diseases are not identified accurately. Even a
patient who has recovered from one disease may be suffering from another, for example an
underlying heart problem that has gone undetected. Such cases are seen in many people's
medical histories. In a multiple disease prediction system, a user can analyze more than
one disease on a single site.
The user does not need to visit different sites to predict whether he/she has a
particular disease. The user selects the name of the disease, enters its parameters, and
clicks submit. The corresponding machine learning model is then invoked, predicts the
result, and displays it on the screen.
CHAPTER 2
LITERATURE SURVEY
Various studies have addressed disease prediction using different techniques and
algorithms that can be used by healthcare centers. This chapter reviews the strategies and
results reported in those research papers:
Sateesh Ambesange [1] measured health parameters with various sensors. Arduino boards
processed the data received from the sensors and demonstrated the prediction of diabetes using
only core health parameters. The results were compared with those on the complete PIDD data
set; the reported figures for the KNN algorithm are 81.91% (precision) and 81.81%.
Chetan Sagarnal [3] selected the algorithms, processed the symptoms, and predicted the
disease, reporting a result of 95.12%.
Nuzhat F. Shaikh [4] visualized the modules using different techniques and compared the
selected algorithms on accuracy and on the time taken to predict the class labels. The best
accuracy, 98.12%, was obtained with the J48 algorithm.
Rashmi G Saboji et al. [5] sought a scalable solution that can predict heart disease using
classification mining and used the Random Forest algorithm. The system is compared against
Naïve Bayes classifiers, and Random Forest gives more accurate results, with an accuracy of
98%.
Pahulpreet Singh Kohli et al, [6] suggested disease prediction by using applications and
methods of machine learning and used techniques like Logistic Regression, Decision Tree,
Support Vector Machine, Random Forest and Adaptive Boosting. This paper focuses on
predicting heart disease, breast cancer, and diabetes. The highest accuracies are obtained using
Logistic Regression: 95.71% for breast cancer, 84.42% for diabetes, and 87.12% for
heart disease.
Lambodar Jena et al, [7] focused on risk prediction for chronic diseases by taking advantage of
distributed machine learning classifiers and used techniques like Naive Bayes and Multilayer
Perceptron. This paper tries to predict Chronic-Kidney-Disease and the accuracy of Naïve
Bayes and Multilayer Perceptron is 95% and 99.7% respectively.
Naganna Chetty et al. [8] developed a system that gives improved results for disease prediction
using a fuzzy approach, with techniques such as the KNN classifier, Fuzzy c-means clustering,
and the Fuzzy KNN classifier. The paper predicts diabetes and liver disorder, with an accuracy
of 97.02% for diabetes and 96.13% for liver disorder.
Sayali Ambekar et al. [9] proposed disease risk prediction using a convolutional neural
network. The paper uses machine learning techniques such as the CNN-UDRP algorithm, Naive
Bayes, and the KNN algorithm. The system is trained on structured data, and its accuracy
reaches 82%, achieved with Naïve Bayes.
Min Chen et al. [10] proposed a disease prediction system based on machine learning
algorithms. The techniques used include the CNN-UDRP algorithm, the CNN-MDRP algorithm,
Naive Bayes, K-Nearest Neighbor, and Decision Tree. The proposed system had an accuracy of
94.8%.
CHAPTER 3
Many existing machine learning models for healthcare analysis concentrate on one
disease per analysis: for example, one model for liver analysis, one for cancer analysis,
and one for lung diseases. A user who wants to predict more than one disease has to go
through different sites.
There is no common system in which one analysis can perform more than one disease
prediction. Some of the models have low accuracy, which can seriously affect patients'
health. When an organization wants to analyse its patients' health reports, it has to
deploy many models, which increases both cost and time. Some of the existing systems
consider very few parameters, which can yield false results.
Multiple Disease Prediction using Machine Learning, Deep Learning and Streamlit
The existing system is a project that focuses on predicting diabetes, heart disease, and Parkinson's
disease using various machine learning algorithms. The algorithms employed in this project
include Naive Bayes classifier, Decision Trees classifier, Random Forest classifier, Support
Vector Machine (SVM), and Logistic Regression. To deploy the models, Streamlit Cloud and
Streamlit library are utilized, providing a user-friendly interface for disease prediction.
The system collects data from various sources, preprocesses it, trains the models with
the processed data, and tests their performance. One of the algorithms used in the system is
SVM, which achieved a prediction accuracy of 76% for diabetes. This means that the SVM
model correctly predicted diabetes in 76% of the cases it was tested on. The performance of
the SVM algorithm indicates its effectiveness in distinguishing between diabetic and non-
diabetic individuals. Similarly, for Parkinson's disease prediction, the SVM algorithm
achieved a prediction accuracy of 71%. This means that the SVM model accurately predicted
the presence or absence of Parkinson's disease in 71% of the cases.
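To make the accuracy figures above concrete: accuracy is the number of correct predictions
divided by the number of test cases. The following is a minimal sketch in plain Python; the
function name and the label lists are hypothetical and are not the existing system's test data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical test-set labels (1 = diabetic, 0 = non-diabetic).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # 6 of 8 correct
```

Accuracy counts correct positive and correct negative predictions alike, so it does not
show which kind of error a model makes.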
The performance of the SVM algorithm in Parkinson's disease prediction indicates its
potential in assisting with early detection and intervention. The system incorporates other
machine learning algorithms such as Naive Bayes, Decision Trees, and Random Forest,
which may have varying performance metrics for different diseases.
These algorithms are designed to leverage different characteristics of the data and
make predictions based on distinct methodologies. Overall, the existing system demonstrates
the effectiveness of machine learning algorithms in predicting diabetes, heart disease, and
Parkinson's disease. The use of Streamlit Cloud and Streamlit library allows for easy
deployment and provides a user-friendly interface for interacting with the prediction models.
Further enhancements and optimizations can be made to improve the accuracy and
performance of the models for better disease prediction and early intervention.
Data bias: One of the biggest concerns with machine learning systems is data bias. If the
training data used to develop the system is biased or incomplete, it can lead to inaccurate
predictions and misdiagnosis. This is especially problematic when it comes to
underrepresented populations, as their data may not be well-represented in the
training set.
Overfitting: Overfitting occurs when a machine learning model is trained too closely to a
particular dataset and becomes overly specialized in predicting it. This can result in poor
generalization to new data and lower accuracy.
Lack of interpretability: Many machine learning algorithms are "black boxes," meaning
that it is difficult to understand how they arrive at their predictions. This can be
problematic in healthcare, where it is important to be able to explain how a diagnosis
was made.
To address the identified issues in the existing system and create a more comprehensive and
accurate machine learning model for health care analysis, the proposed solution involves the
development of a multi-disease prediction system with improved accuracy, reduced bias, and
enhanced interpretability. The strategy includes the following key components:
Integrated Multi-Disease Prediction Model: Develop a unified machine learning model
capable of predicting multiple diseases simultaneously. Integrate diverse datasets related to
various diseases to create a comprehensive and holistic analysis system.
Data Quality Assurance: Implement rigorous data preprocessing techniques to address data
bias and incompleteness. Ensure the inclusion of diverse and representative datasets,
especially focusing on underrepresented populations, to reduce bias.
Regularization Techniques to Mitigate Overfitting: Apply regularization techniques such
as dropout in neural networks to prevent overfitting. Use cross-validation strategies during
model development to assess generalization performance.
Interpretable Machine Learning Models: Choose machine learning algorithms with
inherent interpretability, such as decision trees or rule-based models. Implement model-
agnostic interpretability tools to enhance understanding of complex models.
Continuous Model Monitoring and Improvement: Establish a system for ongoing model
monitoring to identify and address performance degradation. Implement mechanisms for
continuous learning, allowing the model to adapt to evolving healthcare trends and data
characteristics.
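The cross-validation strategy mentioned above can be sketched as follows. This is a minimal,
library-free illustration of k-fold index splitting; the function name `k_fold_indices`, the
seed, and the sample count are assumptions made for illustration and are not part of the
proposed system's code.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for fold_no, (train_idx, val_idx) in enumerate(folds, start=1):
    print(fold_no, sorted(val_idx), len(train_idx))
```

Each sample is used for validation exactly once, so averaging the k validation scores gives
an estimate of how well a model generalizes to unseen data.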
CHAPTER 4
PROPOSED SYSTEM
The system employs the SVM algorithm to predict diabetes, achieving an accuracy of
78%. This indicates that the SVM model can accurately identify the presence or absence of
diabetes in patients, aiding in early detection and effective management. For Parkinson's
disease prediction, the system uses the SVM algorithm with an accuracy of 87%. This high
accuracy demonstrates the capability of the SVM model to distinguish individuals with
Parkinson's disease from healthy individuals.
Heart disease prediction is performed using the Logistic Regression algorithm, which
achieves an accuracy of 85%. This model effectively identifies the likelihood of heart disease
in patients, supporting timely intervention and appropriate treatment. For malaria disease
prediction, the system utilizes TensorFlow with Keras, achieving an impressive accuracy of
96%. This high accuracy demonstrates the power of deep learning models in accurately
predicting malaria disease, enabling early detection and proactive care. Intestine disease
prediction is also included in the system, utilizing TensorFlow with Keras and achieving an
accuracy of 95%.
The deep learning model developed using these technologies can effectively detect the
presence of intestine disease, enabling early diagnosis and intervention.
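The idea behind the logistic regression model used for heart disease prediction can be
sketched as follows. This is a from-scratch illustration trained by batch gradient descent on
made-up, pre-scaled data. The feature choice, learning rate, and epoch count are assumptions,
and the project's own model and dataset may differ.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the estimated probability reaches the threshold, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Hypothetical, already-scaled features: [age, cholesterol]; 1 = heart disease.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

The model outputs a probability between 0 and 1, and the 0.5 threshold turns it into the
affected / not affected decision reported to the user.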
CHAPTER 5
5.1 REQUIREMENTS
All computer software needs certain hardware components or other software resources
to be present on a computer. These prerequisites are known as (computer) system
requirements and are often used as a guideline as opposed to an absolute rule. Most software
defines two sets of system requirements: minimum and recommended. With increasing
demand for higher processing power and resources in newer versions of software, system
requirements tend to increase over time.
Back-End: Python 3.12
CHAPTER 6
SYSTEM DESIGN
This chapter covers the software development life cycle, the design model (i.e., various
UML diagrams), and the process specification.
6.1 DESCRIPTION
To design a system for multiple disease prediction based on lab reports using machine
learning, we follow these steps:
Data Collection: Data is collected from Kaggle.com, a popular platform for accessing
datasets. The data is obtained specifically for diabetes, heart disease, Parkinson's disease,
malaria disease and intestine disease.
Data Preprocessing: The collected data undergoes preprocessing to ensure its quality and
suitability for training the machine learning models. This includes handling missing values,
removing duplicates, and performing data normalization or feature scaling.
Model Selection: Different machine learning algorithms are chosen for each disease
prediction task. Support Vector Machine (SVM), Logistic Regression, and TensorFlow
with Keras are selected as the algorithms for various diseases based on their performance
and suitability for the specific prediction tasks.
Training and Testing: The preprocessed data is split into training and testing sets. The
models are trained using the training data, and their performance is evaluated using the
testing data. Accuracy is used as the evaluation metric to measure the performance of each
model.
Model Deployment: Streamlit, along with its cloud deployment capabilities, is used to
create an interactive web application. The application offers a user-friendly interface with
five options for disease prediction: heart disease, diabetes, Parkinson's disease, malaria
disease and intestine disease. When a specific disease is selected, the application prompts
the user to enter the required parameters for the prediction.
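The preprocessing and splitting steps above can be sketched in plain Python. This is an
illustrative sketch only: the function names, the mean-imputation choice, the seed, and the
sample records are assumptions, not the project's actual code or data. The 30% test fraction
follows the split reported in Chapter 8.

```python
import random


def preprocess(rows):
    """Drop duplicate rows, fill missing features with the column mean, min-max scale.

    Each row is (features, label); a missing feature is represented by None.
    """
    # Remove exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys((tuple(f), y) for f, y in rows))
    features = [list(f) for f, _ in unique]
    labels = [y for _, y in unique]
    for col in range(len(features[0])):
        present = [f[col] for f in features if f[col] is not None]
        if not present:
            raise ValueError(f"column {col} has no observed values")
        mean = sum(present) / len(present)
        low, high = min(present), max(present)
        span = (high - low) or 1.0  # constant column: avoid division by zero
        for f in features:
            value = mean if f[col] is None else f[col]
            f[col] = (value - low) / span
    return features, labels


def train_test_split(features, labels, test_fraction=0.3, seed=42):
    """Shuffle the samples and hold out `test_fraction` of them for testing."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_fraction)

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(order[n_test:]), pick(order[:n_test])


# Hypothetical records: ([glucose, bmi], label); None marks a missing value.
raw = [
    ([148.0, 33.6], 1), ([85.0, 26.6], 0),
    ([85.0, 26.6], 0),    # duplicate of the previous row
    ([183.0, None], 1),   # missing BMI
    ([89.0, 28.1], 0), ([137.0, 43.1], 1), ([116.0, 25.6], 0),
    ([78.0, 31.0], 1), ([115.0, 35.3], 0), ([197.0, 30.5], 1),
    ([125.0, 30.0], 1),
]
X, y = preprocess(raw)
(X_train, y_train), (X_test, y_test) = train_test_split(X, y)
print(len(X), len(X_train), len(X_test))
```

In practice the scaling statistics are often computed on the training split only, so that no
information from the test set leaks into training; the sketch scales first simply to follow
the order of the steps above.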
Use case diagrams model the behavior of a system and help developers understand
what the user requires.
A use case diagram is useful for getting an overall view of the system and for
clarifying what each actor can and, more importantly, cannot do.
A use case diagram consists of use cases and actors and shows the interactions between
them.
The use case diagram in Figure 6.3 has two actors, the user and the system. The user
selects the entity (that is, the disease) and enters the patient details. The system then
loads the dataset, classifies the data, and finally predicts the disease.
One of the primary uses of sequence diagrams is in the transition from requirements
expressed as use cases to the next and more formal level of refinement. Use cases are often
refined into one or more sequence diagrams.
In the sequence diagram of Figure 6.4, the prediction system collects data from the
actor and stores it in the dataset. It then processes the training data, accesses the data
from the dataset, applies the ML algorithms to the training and test data, checks the user
status value and the grand status values, and produces the output.
CHAPTER 7
7.1 IMPLEMENTATION:
7.1.1 MODULES
• The aim of this module is to perform early prediction of diabetes in a patient.
• It uses data from affected and unaffected people to determine whether a person is
affected by the disease.
Attribute Information:
Pregnancies
Glucose
Blood pressure
SkinThickness
Insulin
BMI
DiabetesPedigreeFunction
Age
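The eight attributes listed above can be collected into one record and turned into a feature
vector before being passed to a model. The following is a minimal sketch; the class name,
field names, and sample values are hypothetical and are not taken from the project's code.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DiabetesRecord:
    """One patient record holding the eight attributes listed above."""
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    diabetes_pedigree_function: float
    age: int

    def to_features(self):
        """Return the attributes as a flat list of floats, in the listed order."""
        return [float(v) for v in astuple(self)]


# Hypothetical input values, as a user might type them into the form.
record = DiabetesRecord(2, 120.0, 70.0, 25.0, 80.0, 28.5, 0.45, 35)
print(record.to_features())
```

Keeping the field order fixed matters because a trained model expects the features in the
same column order as the data it was trained on.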
Code:
It uses data from affected and unaffected people to generate the result for the
patient.
It applies different machine learning algorithms such as KNN, XGBoost, SVM,
Random Forest, and Logistic Regression.
The aim is to predict the disease using different supervised machine learning methods.
Attribute Information:
Age
Sex
Serum cholesterol
Code:
The Parkinson's disease prediction module is one of the core modules of the multiple
disease prediction system.
It uses data from affected and unaffected people to generate the result for the
patient.
It applies different machine learning algorithms such as KNN, XGBoost, SVM, Random
Forest, and Logistic Regression.
Attribute Information:
status - The health status of the subject: (one) Parkinson's, (zero) healthy.
Code:
For Malaria Disease Prediction, the system utilizes TensorFlow with Keras, achieving
an impressive accuracy of 96%. This high accuracy demonstrates the power of deep
learning models in accurately predicting malaria disease, enabling early detection and
proactive care.
It uses data from affected and unaffected people to generate the result for the
patient.
It uses deep learning methods such as CNNs, implemented with TensorFlow and Keras.
Code:
Intestine Disease Prediction is also included in the system, utilizing TensorFlow with
Keras and achieving an accuracy of 95%. The deep learning model developed using
these technologies can effectively detect the presence of intestine disease, enabling
early diagnosis and intervention.
It uses data from affected and unaffected people to generate the result for the
patient.
It uses deep learning methods such as CNNs, implemented with TensorFlow and Keras.
Code:
7.2 TESTING
The Multiple Disease Prediction system requires user input in the form of parameters specific
to each disease. When the user selects a particular disease from the options menu, the system
prompts for the relevant parameters. The input design should ensure that the user can easily
provide the required information. The application provides a user interface with a menu
containing five disease options: heart disease, diabetes, Parkinson's disease, malaria disease,
and intestine disease. When the user clicks on a specific disease, the application prompts for
the required parameters for that particular disease prediction. The input design should ensure
that the parameters requested are relevant and necessary for accurate disease prediction. The
user should be able to enter the parameters in a user-friendly and intuitive manner.
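One way to check the input design is to validate the submitted form before any model is
called. The sketch below is an assumption about how such a check could look, not the
project's code: the `REQUIRED_PARAMS` table and `validate_input` function are hypothetical,
and the parameter names follow the attribute lists in Section 7.1.1 (the heart disease list
is given there only in part).

```python
# Parameters each disease form asks for (hypothetical table; the names follow
# the attribute lists in Section 7.1.1).
REQUIRED_PARAMS = {
    "diabetes": ["Pregnancies", "Glucose", "Blood pressure", "SkinThickness",
                 "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"],
    "heart disease": ["Age", "Sex", "Serum cholesterol"],
}


def validate_input(disease, form_values):
    """Check that every required parameter is present and numeric."""
    if disease not in REQUIRED_PARAMS:
        raise KeyError(f"unknown disease: {disease!r}")
    cleaned = {}
    for name in REQUIRED_PARAMS[disease]:
        raw = str(form_values.get(name, "")).strip()
        if not raw:
            raise ValueError(f"missing value for {name}")
        try:
            cleaned[name] = float(raw)
        except ValueError:
            raise ValueError(f"{name} must be a number, got {raw!r}") from None
    return cleaned


# Hypothetical form submission: text fields arrive as strings.
form = {"Age": "54", "Sex": "1", "Serum cholesterol": " 230 "}
print(validate_input("heart disease", form))
```

Rejecting incomplete or non-numeric input at this point lets the interface show a specific
message for the offending field instead of failing inside the model.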
The Multiple Disease Prediction system provides the predicted result of whether the person is
affected by the selected disease or not. The output design should present the result in a clear
and understandable format. The system should display the output after the user has entered the
parameters. The output could be presented as:
"Prediction: The person is affected by [Disease Name]." (If the prediction is positive)
"Prediction: The person is not affected by [Disease Name]." (If the prediction is
negative)
The output should be displayed on the user interface, allowing the user to easily interpret the
prediction result. Overall, the input design ensures that the user can enter the necessary
parameters for disease prediction, while the output design presents the prediction result clearly
on the user interface.
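The output format described above can be sketched as follows. The function names and the
stand-in models are hypothetical: any callable that maps a feature list to 0 or 1 could take
the place of the placeholders, which are arbitrary and are not the project's trained models.

```python
def format_prediction(disease_name, is_affected):
    """Build the result message in the format given above."""
    if is_affected:
        return f"Prediction: The person is affected by {disease_name}."
    return f"Prediction: The person is not affected by {disease_name}."


def run_prediction(disease_name, features, models):
    """Look up the model for the selected disease and format its output."""
    try:
        model = models[disease_name]
    except KeyError:
        raise ValueError(f"no model registered for {disease_name!r}") from None
    return format_prediction(disease_name, bool(model(features)))


# Stand-in models with fixed outputs, used only to exercise both messages.
models = {
    "diabetes": lambda features: 1,
    "heart disease": lambda features: 0,
}
print(run_prediction("diabetes", [2, 120, 70], models))
print(run_prediction("heart disease", [54, 1, 230], models))
```

Keeping the message formatting separate from the model lookup means that adding a new
disease only requires registering one more entry in the `models` table.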
CHAPTER 8
RESULTS AND DISCUSSIONS
We used a dataset split into 70% training data and 30% testing data. The algorithms
compared were Naive Bayes, Decision Tree, SVM and Random Forest. They were compared on
accuracy and on the time taken to predict the class label. The accuracy analysis of the
algorithms on the dataset can be seen in the table.
The existing system does not include kidney disease or breast cancer prediction, which is
why the existing-system accuracy for kidney disease and breast cancer is shown as "-".
CHAPTER 9
CONCLUSION AND FUTURE SCOPE
The project "Multiple Disease Prediction using Machine Learning, Deep Learning and
Streamlit" has shown promising results in predicting various diseases with respectable
accuracies. Moving forward, there are several potential areas for future development and
enhancement:
Expansion of Disease Prediction: The current project focuses on diabetes, heart disease,
Parkinson's disease, malaria disease, and intestine disease. In the future, additional diseases
can be included to create a more comprehensive and diverse disease prediction system.
Integration of More Machine Learning Algorithms: While the project already employs
Support Vector Machines (SVM), Logistic Regression, and TensorFlow with Keras, there are
many other machine learning algorithms that can be explored. Incorporating algorithms such
as Random Forest, Gradient Boosting, or Neural Networks may further improve the accuracy
and performance of the disease prediction models.
Integration of Advanced Feature Engineering Techniques: Feature engineering plays a crucial
role in extracting meaningful information from the input data. Exploring advanced feature
engineering techniques like dimensionality reduction, feature selection, and feature extraction
REFERENCES
8879, 2017.