Disease Prediction Research Report

Integral University, Lucknow, India (IUL)
Multiple Disease Prediction System Using ML

Ahsan Ahmad Beg, Fazla Maqsood, Assistant Prof. Sifatullah Siddiqi
Student, Department of Computer Science and Engineering, Integral University, India
Student, Department of Computer Science and Engineering, Integral University, India
Assistant Professor, Sifatullah Siddiqi, Department of Computer Science and Engineering, Integral University,
India
[email protected] , [email protected]
---------------------------------------------------------------------***---------------------------------------------------------------------
ABSTRACT - Machine learning and Artificial Intelligence have become integral components of
numerous industries. From self-driving cars to medicine, they can be found everywhere. In the
medical industry, the abundance of patient data presents an opportunity to apply machine
learning techniques to enhance disease detection and diagnosis. In this project, we present a
comprehensive Prediction System capable of detecting multiple diseases simultaneously,
addressing the limitations of existing systems that often offer lower accuracy and focus on
individual diseases. Our system currently focuses on five major diseases: Heart, Thyroid,
Diabetes, Lung Cancer, and Parkinson's disease, with the potential for expansion to include
more diseases in the future.

By incorporating various parameters specific to each disease, users can input their data and
receive reliable predictions regarding disease presence. The implications of this project are
significant, as it enables individuals to monitor their health conditions and take proactive
measures, which can ultimately improve life expectancy. By harnessing the power of machine
learning, we aim to contribute to the well-being of countless individuals, providing accurate
disease predictions that can potentially save lives.

Key Words: Supervised Learning, Hypothesis Generation, Exploratory Data Analysis, Feature
Engineering, Pre-processing Data, Modelling, Logistic Regression, Predictive System, Support
Vector Machine (SVM), k-nearest neighbours Algorithm (KNN), Deployment, Streamlit Cloud.

1. INTRODUCTION

Multiple Disease Prediction System is an end-to-end machine learning project. This system is
designed to analyse various medical parameters and predict the likelihood of a patient
acquiring a particular disease. We aim to revolutionize healthcare by harnessing the power of
machine learning to accurately predict and diagnose diseases. Leveraging cutting-edge
technology, we develop a predictive model that can accurately forecast the likelihood of
various diseases in individuals. By analysing comprehensive medical data and employing advanced
machine learning algorithms, we can provide timely and precise predictions. Our research
focuses on developing a robust model that can analyse various patient parameters and historical
records to forecast the likelihood of specific diseases. Through this project, we aspire to
enhance healthcare outcomes, empower medical professionals, and improve the overall well-being
of individuals around the world.

1.1 DESCRIPTION

Most existing systems in the healthcare industry analyse only one disease at a time. For
example, one system is used to analyse diabetes, another is used to analyse diabetic
retinopathy, and another is used to predict heart disease. Most systems focus on a single
disease. When an organization wants to analyse its patients' health reports, it has to deploy
many models.

The approach in the existing systems is useful for analysing only particular diseases. In a
multiple disease prediction system, a user can analyse more than one disease on a single
website. The user does not need to visit different places in order to predict whether he/she
has a particular disease or not. The user selects the name of the disease, enters its
parameters and clicks on submit. The corresponding machine learning model is then invoked, and
its prediction is displayed on the screen.

1.2 PROBLEM SYSTEM

The current landscape of machine learning models in healthcare analysis predominantly focuses
on individual diseases, necessitating separate analyses for each
condition. Liver analysis, cancer analysis, and lung disease analysis are typically treated as
isolated entities.

This fragmented approach poses a challenge for users seeking to predict multiple diseases, as
they are forced to navigate through various platforms. Regrettably, there is no unified system
capable of conducting comprehensive disease predictions across multiple conditions. Moreover,
some existing models exhibit suboptimal accuracy, thereby compromising patient well-being.
Organizational efforts to analyse patient health reports require the deployment of numerous
models, resulting in increased costs and time consumption. Furthermore, several prevailing
systems rely on limited parameters, leading to potentially erroneous outcomes.

1.3 PROPOSED SYSTEM

We present a solution that enables the prediction of multiple diseases from a single unified
platform, rather than one disease per system. By consolidating diverse analyses into one
platform, users can efficiently access accurate predictions for various conditions. With a
focus on enhancing both accuracy and efficiency, our model considers a comprehensive set of
parameters, ensuring reliable results. By removing the need to deploy and visit multiple
separate systems and streamlining the prediction process, our system holds the potential to
significantly improve healthcare outcomes while optimizing resource allocation.

To implement multiple disease analyses, we utilize machine learning algorithms and the
Streamlit framework. When accessing the web application, users can select the specific disease
they wish to predict and input the corresponding parameters. Streamlit will then invoke the
appropriate model and provide the patient's status as the output. This research contributes to
the advancement of healthcare analytics, providing a holistic approach to disease prediction
that has the capacity to transform patient care on a global scale.

2. BACKGROUND

The field of healthcare has witnessed significant advancements in recent years, thanks to the
emergence of machine learning techniques. With the growing availability of health data and
increasing computing power, machine learning has become a powerful tool in predicting and
diagnosing various diseases.

The benefits of employing machine learning for multiple disease prediction are numerous.
Firstly, it enables healthcare professionals to identify individuals who are at a higher risk
of developing multiple diseases, facilitating early intervention and preventive measures.
Secondly, it aids in optimizing healthcare resource allocation by prioritizing high-risk
patients and ensuring timely interventions. Furthermore, machine learning algorithms can assist
in the identification of disease patterns and risk factors, contributing to the development of
targeted public health strategies.

While significant progress has been made in the field of multiple disease prediction using
machine learning, there are still challenges that need to be addressed. These include the
availability and quality of health data, ensuring patient privacy and data security, and the
interpretability and explainability of the predictive models. Additionally, the integration of
machine learning algorithms into existing healthcare systems requires careful consideration of
regulatory frameworks, ethical guidelines, and healthcare workflows.

In conclusion, the application of machine learning in multiple disease prediction holds immense
potential for revolutionizing healthcare. By harnessing the power of these algorithms,
healthcare providers can proactively identify individuals at risk, enhance diagnosis accuracy,
and optimize treatment strategies. However, it is crucial to address the challenges associated
with data quality, privacy, interpretability, and regulatory compliance to ensure the
successful implementation of machine learning-based predictive models in healthcare settings.

Several tools and technologies have been used to develop this project.

TOOLS USED:
1: Kaggle - Kaggle is a platform that provides access to diverse datasets.
2: Google Colaboratory - Colaboratory is a data analysis and machine learning tool.
3: Anaconda - Anaconda aims to simplify package management and deployment.
4: Spyder IDE - An open-source cross-platform integrated development environment.
5: Streamlit Cloud - Deploy, manage, and share your apps with the world, directly from Streamlit.

TECHNOLOGIES USED:
1: Python - Python is a dynamically typed, high-level, general-purpose programming language.
2: NumPy - A Python library adding support for large, multi-dimensional arrays and matrices.
3: Pandas - A software library written for the Python programming language for data
manipulation and analysis.
4: Sklearn - A free software machine learning library for the Python programming language.
5: Machine Learning Algorithms - Supervised learning is the type of machine learning in which
machines are trained using well-labelled training data and, on the basis of that data, predict
the output.
6: Pickle - The Python pickle module is used for serializing and de-serializing a Python object
structure.
7: Streamlit - A free, open-source framework to rapidly build and share machine learning web
apps.

3. SYSTEM ANALYSIS

3.1 FUNCTIONAL REQUIREMENT

• The system allows the patient to predict the disease.
• The user enters the input for the selected disease, and the output of the trained model for
that input is displayed.

3.2 NON-FUNCTIONAL REQUIREMENTS

• The website will provide the range of values for each parameter during the prediction of the
disease.
• The website should be reliable and consistent.

4. SYSTEM MODEL

For developing this project, we have used a supervised machine learning model. Supervised
learning is the simplest machine learning approach to understand: the input data, called
training data, has a known label or result as its output, so it works on the principle of
input-output pairs. It requires creating a function that can be trained using a training data
set and is then applied to unseen data to make predictions. Supervised learning is task-based
and tested on labelled data sets.

We can implement a supervised learning model on simple real-life problems. We have employed
machine learning models to predict the likelihood of different diseases based on user-input
symptoms. To ensure accurate predictions, we selected different machine learning algorithms for
each disease, considering their accuracy.

For each disease, we carefully chose the machine learning algorithm that best captures the
patterns and relationships between symptoms and diseases. These algorithms include logistic
regression, support vector machines (SVM) and K-Nearest Neighbours (KNN). The selection was
based on their ability to effectively analyse the dataset and provide accurate predictions.

By leveraging different machine learning algorithms for different diseases and selecting the
most accurate models, our project aims to provide reliable disease predictions, supporting
healthcare professionals and users in early detection and intervention.

5. EXPERIMENT

5.1 HYPOTHESIS GENERATION

The hypothesis for the Multiple Disease Prediction System is that by analysing comprehensive
medical data and employing advanced machine learning algorithms, it is possible to accurately
predict the likelihood of individuals acquiring specific diseases. The hypothesis assumes that
there are underlying patterns and relationships within the medical data that can be leveraged
to develop a robust predictive model.

5.2 COLLECTION OF DATA

To initiate this project, we began by collecting data from various sources. We utilized Kaggle
as a platform to import relevant datasets, which serve as valuable resources for practice,
research, and as a foundation for constructing machine learning models. These curated datasets
provide a solid starting point, offering a diverse range of information that can be leveraged
to train and validate our prediction system.

5.3 DATA PRE-PROCESSING / REMOVAL OF UNWANTED DATA

The collected data serves various purposes, and as it is sourced from diverse platforms, it can
contain a substantial amount of information. However, this imported data may also include
unwanted or noisy elements that require pre-processing. The primary objective of data
pre-processing is to refine the dataset by removing irrelevant or redundant information,
addressing missing values, and handling outliers or noise. This step ensures that only the
necessary and high-quality data is retained for further analysis.

5.4 FEATURE SELECTION

Feature selection is a critical step in the data analysis process, aimed at identifying and
selecting the most relevant and informative features from a dataset. With the abundance of
available data, feature selection plays a crucial role in enhancing the performance and
efficiency of machine learning models. We have used statistical measures and correlation
analysis techniques to assess the importance and relevance of each feature.

By performing feature selection, we aim to streamline the input data and retain only the most
valuable features for our predictive models. This process helps us focus on the most
influential factors and ensures that our model's predictions are based on the most meaningful
variables. Ultimately, feature selection enables us to improve the accuracy and efficiency of
our multiple disease prediction system, contributing to better healthcare outcomes and
empowering medical professionals with valuable insights.

5.5 MODEL BUILDING

For our multiple disease prediction system, we have utilized
supervised machine learning algorithms, namely Logistic Regression, Support Vector Machines
(SVM), and K-Nearest Neighbours (KNN). These algorithms have been chosen for their
effectiveness in classification tasks and their ability to handle multi-class prediction
scenarios.

By leveraging these supervised machine learning algorithms, we aim to develop robust and
accurate models for multiple disease prediction. Each algorithm brings its unique strengths and
considerations, allowing us to explore different approaches and select the most suitable model
for each disease category. Through this model-building process, we aim to provide reliable and
timely predictions.

5.6 DEPLOYMENT

Our multiple disease prediction system has been deployed using the Streamlit Cloud server,
providing a user-friendly web interface for easy access and interaction. With Streamlit Cloud,
users can input disease parameters and receive predictions for multiple diseases from a single
application. This deployment ensures accessibility and scalability, empowering healthcare
professionals and individuals to make informed decisions about their health.

6. DESIGN

6.1 ARCHITECTURE DESIGN

As shown in Figure 6.1, we have experimented on five different diseases, namely Heart,
Diabetes, Lung Cancer, Parkinson's and Thyroid, as these are correlated to each other. The
first step is to obtain the datasets, which we imported from Kaggle. Once a dataset has been
imported, its data is visualized. After visualization, the data is pre-processed: we check for
outliers and missing values and scale the dataset, and then split the updated dataset into
training and testing sets. Next, we trained different machine learning algorithms on the
training set and evaluated each trained model on the testing set. After evaluation, we chose
among Logistic Regression, SVM, and KNN the algorithm with the best accuracy for each of the
diseases. We then built a pickle file for each disease model and integrated the pickle files
with the Streamlit framework to display the model output on the webpage.

6.2 USER INTERFACE DESIGN

7. PRELIMINARIES

7.1 MACHINE LEARNING ALGORITHM

7.1.1 LOGISTIC REGRESSION ALGORITHM

Logistic regression is a statistical model typically used to model a binary dependent variable
with the help of a logistic function. Another name for the logistic function is the sigmoid
function, and it is given by:

\[ F(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + 1} \]

This function allows the logistic regression model to squeeze any real value, from (-k, k) for
arbitrarily large k, into (0, 1). Logistic regression is mainly used for binary classification
tasks; however, it can also be used for multiclass classification.

Logistic regression starts from a linear equation. The output of this equation is interpreted
as log-odds and is passed through the sigmoid function, which squeezes it to a probability
between 0 and 1. We can then choose a decision boundary and use this probability to carry out
the classification task.

7.1.2 SUPPORT VECTOR MACHINE ALGORITHM

Support Vector Machine (SVM) is a supervised machine learning algorithm that is usually used in
solving binary classification problems. It can also be applied to multi-class classification
problems and regression problems.

Assume we have n training points, where each observation i has p features (i.e., x_i has p
dimensions) and belongs to one of two classes, y_i = -1 or y_i = 1. Suppose the two classes of
observations are linearly separable. That means we can draw a hyperplane through our feature
space such that all instances of one class are on one side of the hyperplane, and all instances
of the other class are on the opposite side. (A hyperplane in p dimensions is a
(p-1)-dimensional subspace. In the two-dimensional example
that follows, a hyperplane is just a line.)

We define a hyperplane as:

\[ x \cdot \tilde{w} + \tilde{b} = 0 \]

where $\tilde{w}$ is a p-vector and $\tilde{b}$ is a real number. For convenience, we require
that $\|\tilde{w}\| = 1$, so the quantity $x \cdot \tilde{w} + \tilde{b}$ is the signed
distance from point x to the hyperplane.

Thus we can label our classes with y = +1/-1, and the requirement that the hyperplane divides
the classes becomes:

\[ y_i \left( x_i \cdot \tilde{w} + \tilde{b} \right) \ge 0 \]

7.1.3 K-NN ALGORITHM

The working of K-NN can be explained on the basis of the algorithm below:

• Step-1: Select the number K of neighbours.
• Step-2: Calculate the Euclidean distance from the new data point to the training data points.
• Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
• Step-4: Among these K neighbours, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbours is
maximum.
• Step-6: Our model is ready.

For example, to classify a new data point:

• Firstly, we choose the number of neighbours; here we choose k = 5.
• Next, we calculate the Euclidean distance between the data points. The Euclidean distance is
the straight-line distance between two points, as studied in geometry.
• By calculating the Euclidean distance we obtain the nearest neighbours: three nearest
neighbours in category A and two nearest neighbours in category B.
• Since three of the five nearest neighbours are from category A, the new data point is
assigned to category A.

8. RESULT

In our system, the Diabetes and Parkinson's disease prediction models utilize the Support
Vector Machine (SVM) algorithm, while the Heart Disease and Lung Cancer prediction models
employ the Logistic Regression algorithm. For the Thyroid disease prediction model, we have
utilized the K-NN algorithm. These algorithms demonstrated the best accuracy for their
respective diseases.

When a patient enters the relevant parameters based on the selected disease, the system will
determine whether
the patient is likely to have the disease or not. The system guides the patient by indicating
the expected range of values for each parameter. If a value is outside the specified range,
invalid, or left empty, the system will display a warning sign, prompting the patient to input
a correct and valid value.

By utilizing these specific algorithms and providing clear parameter requirements, we aim to
enhance the accuracy and reliability of our disease prediction system. This approach empowers
users to receive timely and accurate predictions while ensuring that the input data meets the
necessary criteria for analysis.

9. CONCLUSION

The primary aim of this project was to develop a system capable of accurately predicting
multiple diseases. By achieving this objective, we have eliminated the need for users to visit
multiple websites, saving them valuable time. Timely disease prediction can increase life
expectancy and reduce financial burdens. To accomplish this, we utilized several machine
learning algorithms, including Logistic Regression, Support Vector Machines (SVM), and
K-Nearest Neighbours (KNN), in order to achieve the highest possible accuracy.

By harnessing the power of these algorithms, our system can provide accurate disease
predictions, empowering individuals to take proactive measures for their health. Early
detection and intervention are crucial in managing and treating various diseases effectively.
Through this project, we have created a valuable tool that can contribute to improved
healthcare outcomes and overall well-being.

In conclusion, our project represents a step towards revolutionizing healthcare by leveraging
machine learning techniques to predict multiple diseases accurately. The system's ability to
provide efficient and accurate predictions has the potential to enhance healthcare access,
improve patient outcomes, and ultimately save lives.

10. FUTURE SCOPE

There are several avenues for future development and expansion of our multiple disease
prediction system:

• Addition of More Diseases: In the future, we can expand the system by incorporating
additional diseases into the existing web application. This would enable users to predict a
broader range of diseases and further enhance the system's usefulness in healthcare.
• Accuracy Improvement: As part of ongoing research and development, we can strive to improve
the accuracy of disease predictions. By refining the machine learning algorithms, optimizing
feature selection, and incorporating more comprehensive datasets, we can reduce false
predictions and increase the overall accuracy of the system. This would ultimately contribute
to lowering the mortality rate by enabling timely interventions and treatments.
• Integration with Electronic Health Records (EHR): Integrating our multiple disease
prediction system with electronic health records can provide a more comprehensive and
personalized healthcare experience. By leveraging patient data from EHR systems, we can enhance
the accuracy of predictions and enable healthcare professionals to make informed decisions
based on the patient's medical history.
• Mobile Application Development: Developing a mobile application version of the multiple
disease prediction system would enhance accessibility and convenience for users. It would allow
individuals to access the system on their smartphones, providing real-time disease predictions
and empowering them to take proactive measures for their health anytime, anywhere.

By focusing on these future directions, we can further advance the field of disease
prediction, improve healthcare outcomes, and make a positive impact on individuals' lives.
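
As an illustration of the logistic function described in Section 7.1.1 and its use for
classification, a minimal Python sketch is given here. It is not the code of the deployed
models: the weights, bias, feature values, and the 0.5 decision threshold are hypothetical
values chosen only to show how log-odds are turned into a probability and then into a class.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function F(x) = 1 / (1 + e^(-x)) = e^x / (e^x + 1)."""
    # Use whichever of the two equivalent forms avoids overflow in exp().
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (ex + 1.0)


def predict_proba(features, weights, bias):
    """Linear equation (log-odds) passed through the sigmoid."""
    log_odds = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(log_odds)


def classify(features, weights, bias, threshold=0.5):
    """Return 1 (disease likely) if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model parameters and inputs, for illustration only.
weights = [0.8, -0.4]
bias = -0.1

print(predict_proba([2.0, 1.0], weights, bias))  # log-odds 1.1, probability about 0.75
print(classify([2.0, 1.0], weights, bias))       # positive class
print(classify([-2.0, 1.0], weights, bias))      # negative class
```

With these illustrative numbers, the first sample has log-odds of 1.1, so its probability lies
above the 0.5 decision boundary, while the second sample has log-odds of -2.1 and falls below
it.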

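The K-NN steps of Section 7.1.3, including the worked example with k = 5, can be sketched in
plain Python as follows. The training points and the new data point are made-up coordinates,
chosen so that three of the five nearest neighbours fall in category A and two in category B;
they are not drawn from the project's datasets, and the function names are illustrative.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points.

    train is a list of (features, label) pairs.
    """
    # Steps 2-3: sort by distance to the new point and keep the k closest.
    nearest = sorted(train, key=lambda item: euclidean(item[0], new_point))[:k]
    # Step 4: count the labels among those neighbours.
    votes = Counter(label for _, label in nearest)
    # Step 5: the most frequent label wins.
    return votes.most_common(1)[0][0], dict(votes)


# Hypothetical two-feature training data, for illustration only.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((3.0, 3.0), "B"), ((3.5, 2.5), "B"), ((6.0, 6.0), "B"), ((7.0, 5.0), "B"),
]

label, votes = knn_predict(train, (2.2, 2.0), k=5)
print(label, votes)
```

An odd value of k, such as the 5 used in the example, avoids tied votes when there are only
two categories.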