Main Project

Uploaded by

Rifa Sheikh

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Belagavi-590014, Karnataka, India

A PROJECT ON

“MULTIPLE DISEASE PREDICTION USING MACHINE LEARNING AND STREAMLIT”
As prescribed by VTU for the eighth semester project, with regard to the subject
MAJOR PROJECT
FOR
Bachelor of Engineering
In
COMPUTER SCIENCE AND ENGINEERING
For the Academic year 2023-2024
Submitted by
MOHAMMAD AMAAN 4SH20CS028
RIFA MARYAM SHEIKH 4SH20CS052
SAYYED FARAZ 4SH20CS056

Under the Guidance of


Ms. Tejakshi N S
Asst. Professor
Department of CSE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SHREE DEVI INSTITUTE OF TECHNOLOGY, MANGALURU-574 142
SHREE DEVI INSTITUTE OF TECHNOLOGY
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

Certified that the project work entitled “MULTIPLE DISEASE PREDICTION USING
MACHINE LEARNING AND STREAMLIT” is a bonafide work carried out by
MOHAMMAD AMAAN, RIFA MARYAM SHEIKH and SAYYED FARAZ, bearing
USNs 4SH20CS028, 4SH20CS052 and 4SH20CS056 respectively, in partial fulfilment of the eighth
semester project in the subject “Major Project” for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering of Visvesvaraya Technological University,
Belagavi, during the year 2023-2024. It is certified that all corrections/suggestions indicated by the
guide have been incorporated in the report. The project report has been approved as it satisfies the
academic requirements in respect of the seminar work prescribed for the said degree.

Signature of the Guide      Signature of the HOD      Signature of Principal


Ms. Tejakshi N S            Prof. Anand S Uppar       Dr. K. E. Prakash
Asst. Professor             HOD, Dept. of CSE         Principal and Director
Dept. of CSE
EXTERNAL VIVA

Name of the Examiners Signature with Date


1.
2.

ABSTRACT

In today’s world, deep learning techniques play a vital role in many areas. They have already made a
major impact in fields such as self-driving cars, cancer diagnosis, predictive forecasting, precision
medicine and speech recognition, and they overcome several limitations of traditional learning
techniques.

Healthcare is one of the essential services that must be provided to society. Many current AI models
for healthcare analysis predict only one disease per analysis. Our aim is to predict several diseases on a
single platform built with Streamlit, a Python library for web applications. In this project, Naïve Bayes,
Logistic Regression, Random Forest and SVM classifiers, together with TensorFlow and Keras, are used
for the prediction of a particular disease. The algorithm that gives the highest accuracy is used to train
on the data set before implementation. The multiple disease analysis is implemented with machine
learning algorithms and Streamlit, and Python pickling is used to save the trained models. In this report
we analyse diabetes, heart disease, Parkinson's disease, malaria and intestine disease using basic
parameters such as pulse rate, cholesterol, blood pressure and heart rate, as well as images. The risk
factors associated with each disease can also be found using the prediction model with good accuracy and
precision. Other chronic diseases, skin diseases and many more can be included in future. This work
demonstrates that many diseases can be predicted using only core health parameters. The significance of
this analysis is that screening a patient for as many diseases as possible makes it possible to monitor the
patient's condition and warn the patient ahead of time, reducing the mortality rate.
ACKNOWLEDGEMENT

A successful project is a fruitful combination of the efforts of many people, some directly
involved and others who have quietly encouraged and extended their invaluable support
throughout its progress.

We would also like to convey our heartfelt thanks to our Management for providing the
good infrastructure, laboratory facility, qualified and inspiring staff whose guidance was of
great help in successful completion of this project.

We are extremely grateful and thankful to our beloved Principal Dr. K E Prakash for
providing the congenial atmosphere and necessary facilities for achieving the cherished
goal.

We feel delighted to have this page to express our sincere thanks and deep appreciation to
Prof. Anand S Uppar, Head of the Department, Computer Science and Engineering for
his valuable guidance, keen interest and constant encouragement throughout the entire period
of this project work.

We are profoundly indebted to Ms. Tejakshi N S, Guide, Assistant Professor, Department
of Computer Science and Engineering, for her guidance throughout the project work by
innumerable acts of timely advice and encouragement.

We also thank all other teaching staff and non-teaching staff for allowing us to carry out the
project work.

Finally, we would like to thank our family for their support and understanding, to whom
we owe so much.

MOHAMMAD AMAAN
RIFA MARYAM SHEIKH
SAYYED FARAZ
SHREE DEVI INSTITUTE OF TECHNOLOGY
KENJAR, MANGALURU – 574142
Department of Computer Science and Engineering

DECLARATION

We, MOHAMMAD AMAAN, RIFA MARYAM SHEIKH and SAYYED FARAZ, bearing USNs
4SH20CS028, 4SH20CS052 and 4SH20CS056 respectively, students of eighth semester Bachelor of
Engineering, Computer Science and Engineering, Shree Devi Institute of Technology, Mangalore,
declare that the major project work entitled “MULTIPLE DISEASE PREDICTION USING
MACHINE LEARNING AND STREAMLIT” has been duly executed by us under the guidance
of Ms. Tejakshi N S, Asst. Professor, Department of Computer Science and Engineering, Shree
Devi Institute of Technology, Mangalore, and submitted for the requirements of the 8th semester
major project of Bachelor of Engineering in Computer Science and Engineering during the year
2023-2024.

Date:

Place: Mangalore
MOHAMMAD AMAAN [4SH20CS028]
RIFA MARYAM SHEIKH [4SH20CS052]
SAYYED FARAZ [4SH20CS056]
TABLE OF CONTENTS

CHAPTERS

1 INTRODUCTION
2 LITERATURE SURVEY
3 PROBLEM STATEMENT AND SOLUTION STRATEGY
    3.1 PROBLEM STATEMENT
    3.2 EXISTING SYSTEM
    3.3 DISADVANTAGE
    3.4 SOLUTION STRATEGY
4 PROPOSED SYSTEM
5 REQUIREMENT ANALYSIS AND SPECIFICATION
    5.1 REQUIREMENTS
    5.2 HARDWARE AND SOFTWARE REQUIREMENTS
        5.2.1 HARDWARE REQUIREMENTS
        5.2.2 SOFTWARE REQUIREMENTS
6 SYSTEM DESIGN
    6.1 DESCRIPTION
    6.2 SYSTEM ARCHITECTURE
    6.3 USE CASE DIAGRAM
    6.4 SEQUENCE DIAGRAM
    6.5 DEPLOYMENT DIAGRAM
7 SYSTEM IMPLEMENTATION AND TESTING
    7.1 IMPLEMENTATION
        7.1.1 MODULES
    7.2 TESTING
        7.2.1 INPUT DESIGN
        7.2.2 OUTPUT DESIGN
8 RESULTS AND DISCUSSIONS
    8.1 DIABETES PREDICTION
    8.2 HEART DISEASE PREDICTION
    8.3 PARKINSON’S DISEASE PREDICTION
    8.4 MALARIA DISEASE PREDICTION
    8.5 INTESTINE DISEASE PREDICTION
9 CONCLUSION AND FUTURE SCOPE
    9.1 FUTURE SCOPE
REFERENCES

LIST OF FIGURES

FIGURE NO    TITLE

6.1    SYSTEM DESIGN
6.2    SYSTEM ARCHITECTURE
6.3    USE CASE DIAGRAM
6.4    SEQUENCE DIAGRAM
6.5    DEPLOYMENT DIAGRAM
8.1    DIABETES PREDICTION
8.2    HEART DISEASE PREDICTION
8.3    PARKINSON’S DISEASE PREDICTION
8.4    MALARIA DISEASE PREDICTION
8.5    INTESTINE DISEASE PREDICTION


Multiple Disease Prediction Using ML and Streamlit 2023-2024

CHAPTER 1
INTRODUCTION

The healthcare industry can make effective decisions by "mining" the huge data sets it
holds, that is, by extracting the hidden relationships and patterns in the data. Data mining
algorithms such as Random Forest, Logistic Regression, SVM and Naïve Bayes, along with
TensorFlow and Keras, can provide a solution in this situation. We have therefore developed
an automated system that can discover and extract hidden knowledge associated with diseases
from a historical (disease-symptom) data set according to the rules of the chosen algorithm.
The healthcare and clinical sector needs data mining today more than ever.

When data mining techniques are applied correctly, valuable information can be
extracted from large data sets, which can help the clinician make early decisions and improve
healthcare services. The idea is to use classification to assist the physician. Our review of
existing healthcare systems showed that most analyses consider only one disease at a time, and
most articles focus on a specific disease. When an organization wants to analyse its patients'
health reports, it has to deploy many models. The approach used in existing systems is useful
for analysing only particular diseases.

Nowadays mortality has increased because the exact disease is often not identified.
Even a patient who has been cured of one disease may be suffering from another, for example
an underlying heart problem that has gone undetected. Many such cases are seen in people's
medical histories. In a multiple disease prediction system, a user can analyse more than one
disease on a single website.

The user does not need to visit different places to predict whether he/she has a particular
disease. The user only has to select the name of the disease, enter its parameters and click on
submit. The corresponding machine learning model is then invoked, and it predicts the result
and displays it on the screen.
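
The select, enter and submit flow described above amounts to a lookup from the disease name to
its model. A minimal sketch in plain Python is given here for illustration. The registry, the
function names and the cut-off rules are hypothetical stand-ins for the trained models and are
not taken from the project.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for trained models: each takes the entered
# parameters and returns True when the disease is predicted.
def predict_diabetes(params: List[float]) -> bool:
    glucose = params[0]
    return glucose >= 140.0  # arbitrary illustrative cut-off

def predict_heart_disease(params: List[float]) -> bool:
    cholesterol = params[0]
    return cholesterol >= 250.0  # arbitrary illustrative cut-off

# Registry that maps the disease name chosen by the user to its predictor.
PREDICTORS: Dict[str, Callable[[List[float]], bool]] = {
    "Diabetes": predict_diabetes,
    "Heart Disease": predict_heart_disease,
}

def run_prediction(disease: str, params: List[float]) -> str:
    """Invoke the model registered for `disease` and format the result."""
    if disease not in PREDICTORS:
        raise ValueError(f"No model registered for {disease!r}")
    positive = PREDICTORS[disease](params)
    return f"{disease}: {'positive' if positive else 'negative'}"

print(run_prediction("Diabetes", [148.0]))
```

Adding another disease under this design only requires registering one more entry in the
dictionary; the user-facing flow stays the same.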

Dept of CSE, SDIT 1



CHAPTER 2

LITERATURE SURVEY
Various studies have been carried out on predicting disease using different techniques and
algorithms that can be used by healthcare centres. This chapter reviews the strategies and
results reported in those research papers:

Sateesh Ambesange [1] detected the health parameters using various sensors. Arduino boards
processed the data received from the sensors and demonstrated the prediction of diabetes using
only core health parameters. The results were compared with the complete PIDD data set, which
resulted in 81.91% precision for the KNN algorithm 81.81%

Akkem Yaganteeswarudu [2] conducted a comparative study on the effectiveness of Decision
Tree, Random Forest and Logistic Regression algorithms in predicting multiple diseases, in
which Logistic Regression gave 92% accuracy, Random Forest yielded 95% accuracy for heart
disease classification, and SVM yielded 96% accuracy for cancer detection.

Chetan Sagarnal [3]: in this work the algorithms are selected, the symptoms are processed, and
the disease is predicted, with a reported result of 95.12%.

Nuzhat F. Shaikh [4] visualized the modules using different techniques for better understanding
and compared the selected algorithms on the basis of accuracy and time taken for the class
labels. The best accuracy, 98.12%, was obtained by the J48 algorithm.

Rashmi G Saboji et al. [5] tried to find a scalable solution that can predict heart disease using
classification mining and used the Random Forest algorithm. This system presents a comparison
against Naïve Bayes classifiers, but Random Forest gives more accurate results, with an accuracy
of 98%.

Pahulpreet Singh Kohli et al. [6] suggested disease prediction using applications and methods
of machine learning and used techniques like Logistic Regression, Decision Tree, Support
Vector Machine, Random Forest and Adaptive Boosting. This paper focuses on predicting heart
disease, breast cancer and diabetes. The highest accuracies are obtained using Logistic
Regression: 95.71% for breast cancer, 84.42% for diabetes, and 87.12% for heart disease.

Lambodar Jena et al, [7] focused on risk prediction for chronic diseases by taking advantage of
distributed machine learning classifiers and used techniques like Naive Bayes and Multilayer
Perceptron. This paper tries to predict Chronic-Kidney-Disease and the accuracy of Naïve Bayes
and Multilayer Perceptron is 95% and 99.7% respectively.

Naganna Chetty et al. [8] developed a system that gives improved results for disease prediction
using a fuzzy approach, with techniques like the KNN classifier, Fuzzy c-means clustering and
the Fuzzy KNN classifier. In this paper diabetes and liver disorder prediction is done; the
accuracy for diabetes is 97.02% and for liver disorder is 96.13%.

Sayali Ambekar et al. [9] recommended disease risk prediction and used a convolutional neural
network to perform the task. In this paper machine learning techniques like the CNN-UDRP
algorithm, Naive Bayes and the KNN algorithm are used. The system is trained on structured
data, and its accuracy reaches 82%, achieved using Naïve Bayes.

Min Chen et al. [10] proposed a disease prediction system based on machine learning
algorithms. For the prediction of disease, techniques like the CNN-UDRP algorithm, the
CNN-MDRP algorithm, Naive Bayes, K-Nearest Neighbor and Decision Tree were used. The
proposed system had an accuracy of 94.8%.

CHAPTER 3

PROBLEM STATEMENT AND SOLUTION STRATEGY

3.1 PROBLEM STATEMENT

Many of the existing machine learning models for healthcare analysis concentrate
on one disease per analysis: for example, one for liver analysis, one for cancer analysis, one
for lung diseases, and so on. If a user wants to predict more than one disease, he/she has to go
through different sites.
There is no common system where one analysis can perform more than one disease
prediction. Some of the models have low accuracy, which can seriously affect patients’ health.
When an organization wants to analyse its patients’ health reports, it has to deploy many
models, which in turn increases both cost and time. Some of the existing systems consider
very few parameters, which can yield false results.

3.2 EXISTING SYSTEM

The existing system, "Multiple Disease Prediction using Machine Learning, Deep Learning
and Streamlit", is a project that focuses on predicting diabetes, heart disease, and Parkinson's
disease using various machine learning algorithms. The algorithms employed in this project
include the Naive Bayes classifier, Decision Tree classifier, Random Forest classifier, Support
Vector Machine (SVM), and Logistic Regression. To deploy the models, Streamlit Cloud and
the Streamlit library are used, providing a user-friendly interface for disease prediction.

The system collects data from various sources, preprocesses it, trains the models with
the processed data, and tests their performance. One of the algorithms used in the system is
SVM, which achieved a prediction accuracy of 76% for diabetes. This means that the SVM
model correctly classified 76% of the cases it was tested on as diabetic or non-diabetic.
Similarly, for Parkinson's disease prediction, the SVM algorithm achieved a prediction
accuracy of 71%, meaning that the SVM model correctly predicted the presence or absence of
Parkinson's disease in 71% of the cases.
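
Accuracy in this sense is the fraction of test cases whose predicted label matches the true
label. The short sketch below computes it for a made-up set of labels; the function name and
the data are illustrative assumptions and do not come from the existing system.

```python
from typing import Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.2%}")  # 6 of 8 predictions match
```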

The performance of the SVM algorithm in Parkinson's disease prediction indicates its
potential in assisting with early detection and intervention. The system incorporates other
machine learning algorithms such as Naive Bayes, Decision Trees, and Random Forest, which
may have varying performance metrics for different diseases.

These algorithms are designed to leverage different characteristics of the data and make
predictions based on distinct methodologies. Overall, the existing system demonstrates the
effectiveness of machine learning algorithms in predicting diabetes, heart disease, and
Parkinson's disease. The use of Streamlit Cloud and Streamlit library allows for easy
deployment and provides a user-friendly interface for interacting with the prediction models.
Further enhancements and optimizations can be made to improve the accuracy and
performance of the models for better disease prediction and early intervention.

3.3 DISADVANTAGES OF EXISTING SYSTEM

• Data bias: One of the biggest concerns with machine learning systems is data bias. If the
  training data used to develop the system is biased or incomplete, it can lead to inaccurate
  predictions and misdiagnosis. This is especially problematic when it comes to
  underrepresented populations, as their data may not be well-represented in the training
  set.

• Overfitting: Overfitting occurs when a machine learning model is trained too closely to a
  particular dataset and becomes overly specialized in predicting it. This can result in poor
  generalization to new data and lower accuracy.

• Lack of interpretability: Many machine learning algorithms are "black boxes," meaning that
  it is difficult to understand how they arrive at their predictions. This can be problematic
  in healthcare, where it is important to be able to explain how a diagnosis was made.

3.4 SOLUTION STRATEGY

To address the identified issues in the existing system and create a more comprehensive and
accurate machine learning model for health care analysis, the proposed solution involves the
development of a multi-disease prediction system with improved accuracy, reduced bias, and
enhanced interpretability. The strategy includes the following key components:
• Integrated Multi-Disease Prediction Model: Develop a unified machine learning model
  capable of predicting multiple diseases simultaneously. Integrate diverse datasets related to
  various diseases to create a comprehensive and holistic analysis system.
• Data Quality Assurance: Implement rigorous data preprocessing techniques to address data
  bias and incompleteness. Ensure the inclusion of diverse and representative datasets,
  especially focusing on underrepresented populations, to reduce bias.
• Regularization Techniques to Mitigate Overfitting: Apply regularization techniques such
  as dropout in neural networks to prevent overfitting. Use cross-validation strategies during
  model development to assess generalization performance.
• Interpretable Machine Learning Models: Choose machine learning algorithms with
  inherent interpretability, such as decision trees or rule-based models. Implement
  model-agnostic interpretability tools to enhance understanding of complex models.
• Continuous Model Monitoring and Improvement: Establish a system for ongoing model
  monitoring to identify and address performance degradation. Implement mechanisms for
  continuous learning, allowing the model to adapt to evolving healthcare trends and data
  characteristics.
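
The cross-validation strategy mentioned in the list above can be sketched as follows. This is a
minimal illustration of k-fold splitting in plain Python, assuming the samples are addressed by
index; the function name is hypothetical, and the report does not state which cross-validation
scheme, if any, was actually used.

```python
from typing import Iterator, List, Tuple

def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for fold, (train, val) in enumerate(k_fold_indices(10, 5), start=1):
    print(f"fold {fold}: train={train} validation={val}")
```

Each sample is used for validation exactly once, so the average validation score over the folds
gives a steadier estimate of generalization than a single split. The sketch does not shuffle;
in practice the indices would normally be shuffled first.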

CHAPTER 4

PROPOSED SYSTEM

The proposed system is a comprehensive disease prediction project that utilizes machine
learning algorithms, including Support Vector Machine (SVM), Logistic Regression,
TensorFlow with Keras, to predict multiple diseases such as diabetes, heart disease, Parkinson's
disease, malaria disease and intestine disease. The system aims to provide accurate disease
predictions based on input parameters and a user-friendly interface developed using Streamlit
and deployed on Streamlit Cloud. Data for the models is collected from the Kaggle platform, a
popular data science community, and is preprocessed to ensure its quality and suitability for
training the models. The preprocessed data is then used to train the respective machine learning
algorithms specific to each disease. The trained models are tested to evaluate their accuracy in
disease prediction.

The system employs the SVM algorithm to predict diabetes, achieving an accuracy of
78%. This indicates that the SVM model can accurately identify the presence or absence of
diabetes in patients, aiding in early detection and effective management. For Parkinson's
disease prediction, the system uses the SVM algorithm with an accuracy of 87%. This high
accuracy demonstrates the capability of the SVM model to distinguish individuals with
Parkinson's disease from healthy individuals.

Heart disease prediction is performed using the Logistic Regression algorithm, which
achieves an accuracy of 85%. This model effectively identifies the likelihood of heart disease
in patients, supporting timely intervention and appropriate treatment. For malaria disease
prediction, the system utilizes TensorFlow with Keras, achieving an impressive accuracy of
96%. This high accuracy demonstrates the power of deep learning models in accurately
predicting malaria disease, enabling early detection and proactive care. Intestine disease
prediction is also included in the system, utilizing TensorFlow with Keras and achieving an
accuracy of 95%.

The deep learning model developed using these technologies can effectively detect the
presence of intestine disease, enabling early diagnosis and intervention.

CHAPTER 5

SYSTEM REQUIREMENTS ANALYSIS AND SPECIFICATION

5.1 REQUIREMENTS

A software requirements specification (SRS) is a description of a software system to
be developed. It is modelled after the business requirements specification (CONOPS), also
called the stakeholder requirements specification (StRS). A related document is the system
requirements specification (SyRS).

5.2 HARDWARE AND SOFTWARE REQUIREMENTS

All computer software needs certain hardware components or other software resources
to be present on a computer. These prerequisites are known as (computer) system requirements
and are often used as a guideline as opposed to an absolute rule. Most software defines two sets
of system requirements: minimum and recommended. With increasing demand for higher
processing power and resources in newer versions of software, system requirements tend to
increase over time.

5.2.1 HARDWARE REQUIREMENTS

System processor : Intel Core i7.
Hard Disk        : 512 GB SSD.
Monitor          : 15" LED.
Mouse            : Optical Mouse.
RAM              : 8.0 GB.
Keyboard         : Standard Windows Keyboard.

5.2.2 SOFTWARE REQUIREMENTS

Operating system : Windows 10.
Coding Language  : Python 3.9.
Front-End        : Streamlit 3.7, Python.
Back-End         : Python 3.12.
Python Modules   : Pickle 1.2.3.

CHAPTER 6

SYSTEM DESIGN

This chapter covers the software development life cycle, the design model (i.e. various
UML diagrams) and the process specification.

6.1 DESCRIPTION

Systems design is the process or art of defining the architecture, components, modules,
interfaces, and data for a system to satisfy specified requirements. One could see it as the
application of systems theory to product development. There is some overlap and synergy
with the disciplines of systems analysis, systems architecture and systems engineering.
The System Design Document describes the system requirements, operating
environment, system and subsystem architecture, files and database design, input formats,
output layouts, human-machine interfaces, detailed design, processing logic, and external
interfaces.
This design activity describes the system in narrative form using non-technical terms.
It should provide a high-level system architecture diagram showing a subsystem breakout of
the system, if applicable. The high-level system architecture or subsystem diagrams should, if
applicable, show interfaces to external systems. Supply a high-level context diagram for the
system and subsystems, if applicable. Refer to the requirements traceability matrix (RTM)
in the Functional Requirements Document (FRD), to identify the allocation of the functional
requirements into this design document.
This section describes any constraints in the system design (reference any trade-off
analyses conducted, such as resource use versus productivity, or conflicts with other systems)
and includes any assumptions made by the project team in developing the system design. This
section describes any contingencies that might arise in the design of the system that may change
the development direction. Possibilities include lack of interface agreements with outside
agencies or unstable architectures at the time this document is produced. Address any possible
workarounds or alternative plans.

To design a system for Multiple Disease prediction based on lab reports using machine
learning, we can follow the following steps:

• Data Collection: Data is collected from Kaggle.com, a popular platform for accessing
  datasets. The data is obtained specifically for diabetes, heart disease, Parkinson's disease,
  malaria disease and intestine disease.
• Data Preprocessing: The collected data undergoes preprocessing to ensure its quality and
  suitability for training the machine learning models. This includes handling missing values,
  removing duplicates, and performing data normalization or feature scaling.
• Model Selection: Different machine learning algorithms are chosen for each disease
  prediction task. Support Vector Machine (SVM), Logistic Regression, and TensorFlow with
  Keras are selected as the algorithms for various diseases based on their performance and
  suitability for the specific prediction tasks.
• Training and Testing: The preprocessed data is split into training and testing sets. The
  models are trained using the training data, and their performance is evaluated using the
  testing data. Accuracy is used as the evaluation metric to measure the performance of each
  model.

Fig. 6.1 System Design

• Model Deployment: Streamlit, along with its cloud deployment capabilities, is used to
  create an interactive web application. The application offers a user-friendly interface with
  five options for disease prediction: heart disease, diabetes, Parkinson's disease, malaria
  disease and intestine disease. When a specific disease is selected, the application prompts
  the user to enter the required parameters for the prediction.
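
The preprocessing and splitting steps listed above can be sketched in plain Python as follows.
This is only an illustration under stated assumptions: rows are lists of numeric feature values
with `None` marking a missing value, every column has at least one known value, missing values
are filled with the column mean, and scaling is min-max. The function names and the sample data
are hypothetical, and the report does not say which imputation or scaling method was used.

```python
import random
from typing import List, Optional, Tuple

Row = List[Optional[float]]

def preprocess(rows: List[Row]) -> List[List[float]]:
    """Drop duplicate rows, fill missing values with the column mean,
    then min-max scale every column to the range [0, 1]."""
    # Remove exact duplicates while keeping the original order.
    unique: List[Row] = []
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(list(row))
    for col in range(len(unique[0])):
        present = [r[col] for r in unique if r[col] is not None]
        mean = sum(present) / len(present)
        lo, hi = min(present), max(present)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in unique:
            value = mean if r[col] is None else r[col]
            r[col] = (value - lo) / span
    return unique

def train_test_split(rows: List[List[float]], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[List[float]], List[List[float]]]:
    """Shuffle the rows reproducibly and split them into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up rows of (glucose, blood pressure); the third row is a duplicate.
raw = [[148.0, 72.0], [85.0, None], [148.0, 72.0],
       [183.0, 64.0], [89.0, 66.0], [137.0, 40.0]]
clean = preprocess(raw)
train, test = train_test_split(clean)
print(len(clean), len(train), len(test))
```

In a real pipeline the scaling statistics would be computed on the training set only and then
applied to the test set, so that no information from the test data leaks into training.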

6.2 SYSTEM ARCHITECTURE


A system architecture is the conceptual model that defines the structure, behaviour, and more
views of a system. An architecture description is a formal description and representation of a
system, organized in a way that supports reasoning about the structures and behaviours of the
system.
A system architecture can consist of system components and the sub-systems
developed, that will work together to implement the overall system. There have been efforts to
formalize languages to describe system architecture; collectively these are called architecture
description languages.
Machine learning has given computer systems the ability to learn automatically without
being explicitly programmed. In this, the author has used three machine learning algorithms
(Logistic Regression, KNN, and Naïve Bayes). The architecture diagram describes the high-level
overview of major system components and important working relationships.

Fig. 6.2 System Architecture

6.3 USE CASE DIAGRAM

Use case diagrams model behaviour within a system and help the developers understand
what the users require.

A use case diagram is useful for getting an overall view of the system and clarifying
who can do what and, more importantly, what they cannot do.

A use case diagram consists of use cases and actors and shows the interaction between
the use cases and the actors.

Fig. 6.3 Use Case Diagram

The use case diagram in Fig. 6.3 consists of two actors, the user and the system.
The user can select the entity (that is, the disease) and enter the patient details. The system
then loads the dataset, classifies the data and finally predicts the disease.

6.4 SEQUENCE DIAGRAM


A sequence diagram is a type of interaction diagram because it describes how and in
what order a group of objects works together. These diagrams are used by software developers
and business professionals to understand requirements for a new system or to document an
existing process. Sequence diagrams are sometimes known as event diagrams or event
scenarios.

One of the primary uses of sequence diagrams is in the transition from requirements
expressed as use cases to the next and more formal level of refinement. Use cases are often
refined into one or more sequence diagrams.

Fig. 6.4 Sequence Diagram

In the sequence diagram of Fig. 6.4, the prediction system collects the data from the
actor and stores it in the dataset. The prediction system processes the training data and
accesses the data from the dataset. It then uses the training and test data, applies the ML
algorithms, checks the user status value and the grand status values, and produces the output.

6.5 DEPLOYMENT DIAGRAM


The deployment diagram visualizes the physical hardware on which the software will be
deployed. It portrays the static deployment view of a system. It involves the nodes and their
relationships. It ascertains how software is deployed on the hardware. It maps the software
architecture created in design to the physical system architecture, where the software will be
executed as a node. Since it involves many nodes, the relationship is shown by utilizing
communication paths.

Fig. 6.5 Deployment Diagram

A deployment diagram for multiple disease prediction includes components such as
the disease dataset, data preprocessing, ML algorithms and the predictive model. The user
interface collects input data from the disease dataset, processes it using the ML algorithms
and then predicts the disease using the predictive model.

CHAPTER 7

SYSTEM IMPLEMENTATION AND TESTING

7.1 IMPLEMENTATION:

An implementation is a realization of a technical specification or algorithm as a program,
software component, or other computer system through computer programming and
deployment. Many implementations may exist for a given specification or standard. A special
case occurs in object-oriented programming, when a concrete class implements an interface.

7.1.1 MODULES

1. DIABETES DISEASE PREDICTION

• The aim of this module is to perform early prediction of diabetes in a patient.
• It aims to predict via different supervised machine learning methods.
• It uses data about affected and healthy people to determine whether a person is affected
  by the disease or not.

Attribute Information:

• Pregnancies
• Glucose
• Blood pressure
• SkinThickness
• Insulin
• BMI
• DiabetesPedigreeFunction
• Age

Code:
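
A minimal sketch of how the entered values could be arranged and passed to a saved model is
given below. It is illustrative only and is not the project's listing: the feature keys, the
parameter dictionary, its weights and the linear decision rule are assumptions standing in for
the trained classifier, and `pickle.dumps`/`pickle.loads` stand in for saving the model to a
file and loading it back.

```python
import pickle
from typing import Any, Dict, List

# Feature order follows the attribute list of this module.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

def to_feature_vector(form: Dict[str, float]) -> List[float]:
    """Order the entered values as the model expects; reject missing fields."""
    missing = [name for name in FEATURES if name not in form]
    if missing:
        raise KeyError(f"missing inputs: {missing}")
    return [float(form[name]) for name in FEATURES]

def predict(model: Dict[str, Any], x: List[float]) -> int:
    """Linear decision rule: 1 (diabetic) if w.x + b > 0, else 0."""
    score = sum(w * v for w, v in zip(model["weights"], x)) + model["bias"]
    return 1 if score > 0 else 0

# Hypothetical parameters standing in for a trained model.
model = {"weights": [0.0, 0.03, 0.0, 0.0, 0.0, 0.05, 0.5, 0.01], "bias": -6.0}

# Pickling serializes the model; unpickling restores it for prediction.
restored = pickle.loads(pickle.dumps(model))

patient = {"Pregnancies": 2, "Glucose": 150, "BloodPressure": 70, "SkinThickness": 30,
           "Insulin": 100, "BMI": 33.0, "DiabetesPedigreeFunction": 0.6, "Age": 45}
print(predict(restored, to_feature_vector(patient)))
```

Keeping the feature order in one list avoids a silent mismatch between the order in which the
form collects values and the order the model was trained on.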

2. HEART DISEASE PREDICTION

 It uses data about the Effected and normal people data preferences to generate the
result of the patient.
 It performs the Different machine algorithms like
KNN,XGBoost,SVM,RANDOM FOREST, Logistic Regression etc
 This aims to predict via different supervised machine learning methods.

Attribute Information:

• Age
• Sex
• Chest pain type
• Resting blood pressure
• Serum cholesterol
• Fasting blood sugar
• Resting electrocardiographic result
• Maximum heart rate achieved
• Exercise-induced angina
• Vessels coloured by fluoroscopy


Code:
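The results table names Logistic Regression for heart disease, but the listing is not available in this text. As a hedged illustration, the sketch below fits a logistic regression model by plain gradient descent on the log-loss. The function names, learning rate, epoch count and the toy two-feature data are assumptions for this example only and do not reproduce the project's code or dataset.

```python
import math

# Toy logistic regression trained by batch gradient descent.
# The data below is invented for illustration; it is not the report's dataset.

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, lr=0.1, epochs=500):
    """Fit (w, b); labels are 1 (affected) or 0 (not affected)."""
    n = len(rows)
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            error = p - y  # derivative of the log-loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_probability(w, b, x):
    """Estimated probability that the person is affected."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized (age, serum cholesterol) values for four people.
heart_rows = [[-1.5, -0.5], [-0.5, -1.0], [0.5, 1.0], [1.5, 0.5]]
heart_labels = [0, 0, 1, 1]

heart_w, heart_b = train_logistic_regression(heart_rows, heart_labels)
heart_predictions = [
    1 if predict_probability(heart_w, heart_b, x) >= 0.5 else 0
    for x in heart_rows
]
print(heart_predictions)
```

Returning a probability rather than only a class label leaves the decision threshold (0.5 here) open to adjustment, which can be useful when false negatives are more costly than false positives.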

3. PARKINSON'S DISEASE PREDICTION

• The Parkinson's disease prediction module is one of the core modules of the multiple
disease prediction system.
• It uses data from affected and healthy people to generate the result for the patient.
• It applies several machine learning algorithms, such as KNN, XGBoost, SVM, Random
Forest and Logistic Regression.

Attribute Information:

• name - ASCII subject name and recording number.
• MDVP:Fo(Hz) - Average vocal fundamental frequency.
• MDVP:Fhi(Hz) - Maximum vocal fundamental frequency.
• MDVP:Flo(Hz) - Minimum vocal fundamental frequency.
• MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP - Several
measures of variation in fundamental frequency.
• MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ,
Shimmer:DDA - Several measures of variation in amplitude.
• NHR, HNR - Two measures of the ratio of noise to tonal components in the voice.
• status - The health status of the subject: (one) Parkinson's, (zero) healthy.
• RPDE, D2 - Two nonlinear dynamical complexity measures.
• DFA - Signal fractal scaling exponent.
• spread1, spread2, PPE - Three nonlinear measures of fundamental frequency variation.

Code:
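The listing for this module is not available in this text. KNN is one of the algorithms the module description mentions (the results table reports an SVM classifier for the final model), so the sketch below shows, as an illustration only, how a k-nearest-neighbours vote over voice features could work. The function name, the choice of k = 3 and the toy two-feature data are assumptions for this example; the labels follow the `status` convention above (1 for Parkinson's, 0 for healthy).

```python
import math
from collections import Counter

# Toy k-nearest-neighbours classifier using Euclidean distance.
# The data below is invented for illustration; it is not the report's dataset.

def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical scaled (jitter, shimmer) values; status 1 = Parkinson's, 0 = healthy.
voice_rows = [
    [0.2, 0.3], [0.3, 0.1], [0.1, 0.2],   # healthy subjects
    [1.1, 1.2], [1.3, 0.9], [1.0, 1.1],   # subjects with Parkinson's
]
voice_status = [0, 0, 0, 1, 1, 1]

print(knn_predict(voice_rows, voice_status, [1.2, 1.0]))
print(knn_predict(voice_rows, voice_status, [0.25, 0.2]))
```

Because KNN relies on raw distances, features measured on very different scales (for example Hz values next to percentage jitter) would normally be rescaled before use.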

4. MALARIA DISEASE PREDICTION

• For malaria disease prediction, the system uses a deep learning model built with
TensorFlow and Keras, which achieves an accuracy of 96%. This supports early detection
and proactive care.
• It uses data from affected and healthy people to generate the result for the patient.
• It uses a convolutional neural network (CNN) implemented with TensorFlow and Keras.

Code:
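The report does not show the network architecture, and the listing is not available in this text. As a rough illustration of what a convolutional layer in a CNN computes, the sketch below slides a small kernel over a 2D image in plain Python and applies a ReLU activation. The function names, the 4x4 image and the hand-picked edge kernel are assumptions for this example; in a Keras model the kernels are learned from the training images rather than fixed by hand.

```python
# Plain-Python sketch of the sliding-window operation inside a convolutional layer.
# Image and kernel values are invented for illustration.

def conv2d_valid(image, kernel):
    """Slide kernel over image without padding and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# A 4x4 image that is dark on the left half and bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(conv2d_valid(image, vertical_edge_kernel))
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The middle column of the output is the only one that responds, because that is where the kernel straddles the dark-to-bright boundary.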


5. INTESTINE DISEASE PREDICTION

• Intestine disease prediction is also included in the system. It uses a deep learning model
built with TensorFlow and Keras, which achieves an accuracy of 95% and can detect the
presence of intestine disease, supporting early diagnosis and intervention.
• It uses data from affected and healthy people to generate the result for the patient.
• It uses a convolutional neural network (CNN) implemented with TensorFlow and Keras.

Code:


7.2 TESTING

7.2.1 Input Design:

The Multiple Disease Prediction system requires user input in the form of parameters specific
to each disease. The application provides a user interface with a menu containing five disease
options: heart disease, diabetes, Parkinson's disease, malaria disease and intestine disease.
When the user selects a disease, the application prompts for the parameters required for that
particular prediction. The input design should ensure that the requested parameters are relevant
and necessary for accurate disease prediction, and that the user can enter them in a user-friendly
and intuitive manner.
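The input handling described here can be sketched as a lookup from the selected disease to its required parameters, followed by a validation pass over what the user typed. This is a minimal sketch under stated assumptions: the dictionary layout and the names `REQUIRED_PARAMETERS` and `parse_inputs` are hypothetical, only the diabetes entry is filled in (using the attribute names listed in Section 7.1.1), and the actual Streamlit widgets are left out.

```python
# Hypothetical mapping from a menu option to the parameters it must collect.
# Only diabetes is shown; the names come from the attribute list in 7.1.1.
REQUIRED_PARAMETERS = {
    "Diabetes": [
        "Pregnancies", "Glucose", "Blood pressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
    ],
}


def parse_inputs(disease, raw_values):
    """Convert raw text inputs to floats.

    Returns (values, errors), where errors lists every required
    parameter that is missing or not a valid number.
    """
    values, errors = [], []
    for name in REQUIRED_PARAMETERS[disease]:
        text = raw_values.get(name, "").strip()
        try:
            values.append(float(text))
        except ValueError:
            errors.append(name)
    return values, errors


# Example form contents: BMI is not numeric and Age was left out.
raw_form = {
    "Pregnancies": "2", "Glucose": "120", "Blood pressure": "70",
    "SkinThickness": "20", "Insulin": "79", "BMI": "abc",
    "DiabetesPedigreeFunction": "0.5",
}
parsed_values, invalid_fields = parse_inputs("Diabetes", raw_form)
print(invalid_fields)  # ['BMI', 'Age']
```

Collecting all invalid fields in one pass, instead of stopping at the first, lets the interface point out every problem to the user at once.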

7.2.2 Output Design:

The Multiple Disease Prediction system provides the predicted result of whether the person is
affected by the selected disease or not. The output design should present the result in a clear and
understandable format. The system should display the output after the user has entered the
parameters. The output could be presented as:
• "Prediction: The person is affected by [Disease Name]." (If the prediction is positive)
• "Prediction: The person is not affected by [Disease Name]." (If the prediction is negative)

The output should be displayed on the user interface, allowing the user to easily interpret the
prediction result. Overall, the input design ensures that the user can enter the necessary
parameters for disease prediction, while the output design presents the prediction result clearly
on the user interface.
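The two output messages above can be produced by a small helper that takes the disease name and the model's binary result. The function name `format_prediction` is an assumption for this sketch; the message wording is taken from the output design.

```python
def format_prediction(disease_name, is_affected):
    """Build the result message shown to the user for a binary prediction."""
    if is_affected:
        return f"Prediction: The person is affected by {disease_name}."
    return f"Prediction: The person is not affected by {disease_name}."


print(format_prediction("Diabetes", True))
print(format_prediction("Heart Disease", False))
```

Keeping the wording in one place means all five disease pages display results in the same format.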


CHAPTER 8
RESULTS AND DISCUSSIONS

We used a large dataset, split into 70% training data and 30% testing data. The algorithms
used for comparison were Naive Bayes, Decision Tree, SVM and Random Forest. They were
selected for comparison on the basis of accuracy and the time taken to predict the class label.
The accuracy of the algorithms on the dataset is shown in Table 1.

SN.  Disease Name          Algorithm Name        Existing system accuracy  Proposed system accuracy
1    Diabetes              SVM Classifier        76%                       78%
2    Heart disease         Logistic Regression   80%                       85%
3    Parkinson’s disease   SVM Classifier        71%                       87%
4    Malaria disease       TensorFlow and Keras  -                         96%
5    Intestine disease     TensorFlow and Keras  -                         95%

Table 1. Comparison of Accuracy of all 5 models

The existing system does not include malaria disease or intestine disease prediction, which is
why the existing system accuracy for these two diseases is shown as “-”.
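The 70/30 split and the accuracy figures reported in Table 1 follow a standard evaluation procedure, which can be sketched as below. The function names, the fixed random seed and the toy rows are assumptions for this example; the sketch shows how such numbers are computed and does not reproduce the project's own evaluation.

```python
import random

def split_70_30(rows, labels, test_fraction=0.3, seed=42):
    """Shuffle the row indices and hold out test_fraction of the rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Ten invented rows, just to show the sizes of the two parts.
rows = [[i] for i in range(10)]
labels = [i % 2 for i in range(10)]
(train_rows, train_labels), (test_rows, test_labels) = split_70_30(rows, labels)
print(len(train_rows), len(test_rows))      # 7 3
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Accuracy is measured on the held-out 30% only, so that the reported figure reflects performance on data the model has not seen during training.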

8.1 DIABETES PREDICTION:

Fig. 8.1 Diabetes Prediction


8.2 HEART DISEASE PREDICTION:

Fig. 8.2 Heart Disease Prediction

8.3 PARKINSON’S DISEASE PREDICTION:

Fig. 8.3 Parkinson’s Disease Prediction


8.4 MALARIA DISEASE PREDICTION:

Fig. 8.4 Malaria Disease Prediction

8.5 INTESTINE DISEASE PREDICTION:

Fig. 8.5 Intestine Disease Prediction


CHAPTER 9
CONCLUSION AND FUTURE SCOPE

In conclusion, our project utilized machine learning algorithms, including Support Vector
Machine (SVM), Logistic Regression, and TensorFlow with Keras, to develop a disease
prediction system. The system focused on five diseases: diabetes, heart disease, Parkinson's
disease, malaria disease and intestine disease. We collected data from Kaggle.com and performed
preprocessing to ensure data quality. For diabetes prediction, we achieved an accuracy of 78%
using the SVM algorithm. Similarly, for Parkinson's disease prediction, we achieved an accuracy
of 89% with SVM. Logistic Regression was employed for heart disease prediction, resulting in
an accuracy of 85%. For malaria disease and intestine disease, we utilized TensorFlow with
Keras, achieving accuracy rates of 96% and 95% respectively. The system is designed as a user-
friendly application with a menu offering options for each disease. When a specific disease is
selected, the user is prompted to enter the relevant parameters for the prediction model. Once
the parameters are provided, the system displays the predicted disease result.

9.1 FUTURE SCOPE

The project "Multiple Disease Prediction using Machine Learning, Deep Learning and
Streamlit" has shown promising results in predicting various diseases with respectable
accuracies. Moving forward, there are several potential areas for future development and
enhancement:
• Expansion of Disease Prediction: The current project focuses on diabetes, heart disease,
Parkinson's disease, malaria disease and intestine disease. In the future, additional diseases can
be included to create a more comprehensive and diverse disease prediction system.
• Integration of More Machine Learning Algorithms: While the project already employs
Support Vector Machines (SVM), Logistic Regression, and TensorFlow with Keras, there are
many other machine learning algorithms that can be explored. Incorporating algorithms such
as Random Forest, Gradient Boosting, or Neural Networks may further improve the accuracy
and performance of the disease prediction models.
• Integration of Advanced Feature Engineering Techniques: Feature engineering plays a crucial
role in extracting meaningful information from the input data. Exploring advanced feature
engineering techniques like dimensionality reduction, feature selection, and feature extraction
can potentially enhance the prediction models and their interpretability.


