Final Report
REPORT
Submitted for Partial fulfillment of the Requirement
For the award of the degree of
BACHELOR OF TECHNOLOGY
In
Computer Engineering
Submitted by
Shreyash Chaukhande (21UF16150CM070)
Aniket Jadhao (21UF16149CM144)
Akshay Oza (21UF16479CM036)
Shivraj Ugale (21UF16455CM060)
CERTIFICATE
Karuna Borhade
Assistant Professor
Department of Computer Engineering
Shah & Anchor Kutchhi Engineering College
Mumbai – 400 088, India
CANDIDATE’S DECLARATION
I hereby declare that the work presented in the Project Report entitled “MediXpert: Multi disease prediction”, in partial fulfillment of the requirement for the degree of B. Tech. in Computer Engineering and submitted to the Department of Computer Engineering at Shah & Anchor Kutchhi Engineering College, Mumbai, is an authentic record of my own work/cited work carried out during the period from ................... 2024 to ................... 2025 under the supervision of Karuna Borhade.
The matter presented in this Project Report has not been submitted elsewhere in part or fully to
any other University or Institute for the award of any other degree.
Date:
Place:
Shah & Anchor Kutchhi Engineering College
(An Autonomous Institute Affiliated to University of Mumbai)
Mumbai-400 088, MAHARASHTRA (India)
ACKNOWLEDGEMENT
We would like to express our special thanks and gratitude to our mentor Ms. Karuna Borhade, HOD Dr. Vidyullata Devmane, as well as our Principal Dr. Bhavesh Patel, who gave us the golden opportunity to do this wonderful project on the topic MediXpert: Multi disease prediction, which also helped us do a lot of research and learn many new things. We are really thankful to them.
Secondly, we would also like to thank our parents and friends, who helped us a lot in finalizing this project within the limited time frame.
Thanks to all our teachers for having inculcated in us the values and work habits that have allowed us to achieve the level of success we have reached today as a team.
Shreyash Chaukhande
Aniket Jadhao
Akshay Oza
Shivraj Ugale
ABSTRACT
Our aim is to predict several kinds of disease on a single platform built with the Python library Streamlit. In this work, Naïve Bayes, random forest, decision tree, and SVM classifiers are used to predict a particular disease. The algorithm that gives the highest accuracy is selected to train on the dataset before deployment. To implement multiple disease analysis, machine learning algorithms and Streamlit are used, and Python pickling saves the trained model's behavior. In this report we analyze diabetes, heart disease, and Parkinson’s disease using basic parameters such as pulse rate, cholesterol, blood pressure, and heart rate. The risk factors associated with each disease can also be identified using a prediction model with good accuracy and precision. Other chronic diseases, skin diseases, and many more can be included later. This work demonstrates that many diseases can be predicted using only core health parameters.
The significance of this analysis is that it covers as many diseases as possible in order to screen the patient's condition and warn patients ahead of time, reducing the mortality rate. We have considered three diseases for now, namely heart disease, liver disease, and diabetes, and many more can be added in the future. The user enters the parameters of the disease and the system displays whether he/she has the disease or not. This project can help many people, as one can monitor a person's condition and take the necessary precautions, thus increasing life expectancy.
Contents
Certificate
Candidate’s Declaration
Certificate of Plagiarism Check
Acknowledgement
Abstract
1 Introduction
1.1 Background
1.2 Motivation
1.3 Organization of the Report
2 Literature Review
2.1 Survey of Existing system
2.2 Limitation of Existing system or research gap
2.3 Problem Statement and Objectives
2.4 Scope
5 Proposed System
5.1 Feasibility Study
5.2 Analysis/Framework/Algorithm
5.3 Details of Hardware & Software
5.4 Design Details
5.5 Methodology
7 Summary
Bibliography
Appendices
A Appendix
A.1 Plagiarism Report
A.2 Publication by Candidate
A.3 Project Competition
List of Figures
Chapter 1
Introduction
In this digital world, data is an asset, and enormous amounts of data are generated in every field. Data in the healthcare industry consists of all the information related to patients. Here, a general architecture is proposed for predicting disease in the healthcare industry. Many existing models concentrate on one disease per analysis: one for diabetes, one for cancer, one for skin diseases, and so on. There is no common system that can analyze more than one disease at a time. We therefore concentrate on giving users immediate and accurate disease predictions based on the symptoms they enter. We propose a system that predicts multiple diseases using Django. In this system, we analyze diabetes, heart disease, and malaria; many more diseases can be included later.
With multiple disease prediction, it is possible to predict more than one disease at a time, so the user does not need to visit different sites to predict different diseases. We take three diseases, namely liver disease, diabetes, and heart disease, because all three are correlated with each other. To implement multiple disease analysis we use machine learning algorithms and Streamlit. When accessing the API, the user sends the parameters of the disease along with the disease name. The system invokes the corresponding model and returns the status of the patient.
Our basic idea is to develop a system that predicts a disease and reports its details and severity from the symptoms the user gives as input. The system compares the symptoms with the datasets in the database. If a symptom matches the datasets, the system asks for other relevant symptoms by name. If not, the user is notified that the symptom entered is not recognized. A prompt then asks whether the symptom should still be saved in the database. If the user clicks yes, it is saved in the database; if not, it goes to the recycle bin.
The main feature is the machine learning component, which uses algorithms such as Naïve Bayes, K-Nearest Neighbors, Decision Tree, Random Forest, and Support Vector Machine to predict the disease accurately and, by comparing them, to find which algorithm gives the fastest and most efficient result. The importance of this analysis is that all the parameters that cause a disease are included, so the disease can be detected more efficiently and accurately. The final model's behavior is saved as a Python pickle file.
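As an illustration of the pickling step, the following is a minimal sketch using Python's standard `pickle` module. It assumes, purely for demonstration, that the trained model can be stood in for by a plain dictionary of parameters; the field names, the threshold value, and the `predict` helper are hypothetical and are not taken from the project code.

```python
import pickle

# Hypothetical stand-in for a trained model: a plain dict of learned parameters.
model = {"disease": "diabetes", "feature": "glucose", "threshold": 126.0}

# Serialize the model to bytes; on disk these bytes would be written to a .pkl file.
blob = pickle.dumps(model)

# Later (for example, when the web app starts), restore an equal object.
restored = pickle.loads(blob)


def predict(params, patient):
    """Return True when the patient's value meets or exceeds the stored threshold."""
    return patient[params["feature"]] >= params["threshold"]
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.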
1.1 Background
1.2 Motivation
Many existing machine learning models for healthcare analysis concentrate on one disease per analysis: for example, one for liver analysis, one for cancer analysis, and one for lung diseases. If a user wants to predict more than one disease, he/she has to go through different sites. There is no common system in which one analysis can perform more than one disease prediction. Some of the models have low accuracy, which can seriously affect patients’ health. When an organization wants to analyze its patients’ health reports, it has to deploy many models, which increases both cost and time. Some existing systems consider very few parameters, which can yield false results.
Literature Review
2. The paper starts from the observation that the heart plays a vital role in living organisms, so the diagnosis and prediction of heart-related disease must be accurate, since errors can lead to death. Machine learning and artificial intelligence support the prediction of such events. The authors measure the accuracy of machine learning algorithms for predicting heart disease, using k-nearest neighbor, decision tree, linear regression, and SVM with the UCI repository dataset for training and testing. The reported accuracies are: SVM 83%, decision tree 79%, linear regression 78%, and k-nearest neighbor 87%.
3. The paper notes that liver disease causes a high number of deaths in India and is considered a life-threatening disease worldwide. Because liver disease is difficult to detect at an early stage, automated programs based on machine learning algorithms can help detect it accurately. The authors used and compared SVM, Decision Tree, and Random Forest algorithms, measuring precision, accuracy, and recall. The reported accuracies are 95%, 87%, and 92%, respectively.
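The accuracy, precision, and recall figures quoted in these surveys are derived from counts of correct and incorrect predictions. The sketch below shows the standard definitions for a binary problem; the labels in `y_true` and `y_pred` are made-up values for illustration and do not come from any of the surveyed papers.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for the given positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels: 1 = disease present, 0 = disease absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```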
2.2 Limitation of Existing system or research gap
Current systems for disease prediction face significant limitations. Many models
excel on specific datasets but struggle with generalization, leading to notable
accuracy drops when applied to different populations or conditions. While some
methods handle low-quality data effectively, others require enhancements for
higher-quality inputs. Certain architectures often fail to adapt to emerging
disease trends, necessitating frequent retraining. Additionally, high
computational costs limit their real-time and large-scale application. These
challenges underscore the need for more adaptable, efficient, and scalable
solutions in disease prediction.
Problem Statement
Many existing machine learning models for healthcare analysis concentrate on one disease per analysis: for example, one for liver analysis, one for cancer analysis, and one for lung diseases. If a user wants to predict more than one disease, he/she has to go through different sites. There is no common system in which one analysis can perform more than one disease prediction. Some of the models have low accuracy, which can seriously affect patients’ health. When an organization wants to analyze its patients’ health reports, it has to deploy many models, which increases both cost and time. Some existing systems consider very few parameters, which can yield false results.
Objectives
2.4 Scope
Chapter 3
3.1 Introduction
Purpose
Much of the analysis done by existing systems in the healthcare industry considers only one disease at a time. For example, one system is used to analyse diabetes and another is used to predict heart disease. Most systems focus on a particular disease. When an organization wants to analyse its patients’ health reports, it has to deploy many models. The approach in the existing systems is useful for analysing only particular diseases. In the multiple disease prediction system, a user can analyse more than one disease on a single website. The user doesn’t need to visit different places in order to predict whether he/she has a particular disease or not. The user selects the name of the particular disease, enters its parameters, and clicks submit. The corresponding machine learning model is invoked, and its prediction is displayed on the screen.
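The select-a-disease, enter-parameters, invoke-the-model flow described above can be sketched as a small registry that maps each disease name to its model. This is only an illustrative sketch: the two rule-based functions stand in for trained models, and their parameter names and cut-off values are assumptions made for the example, not part of the system.

```python
from typing import Callable, Dict


# Hypothetical rule-based stand-ins for the trained per-disease models.
def predict_diabetes(params: Dict[str, float]) -> bool:
    return params["glucose"] >= 126.0


def predict_heart(params: Dict[str, float]) -> bool:
    return params["cholesterol"] >= 240.0 and params["resting_bp"] >= 140.0


# Registry mapping the disease name chosen by the user to its model.
MODELS: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "diabetes": predict_diabetes,
    "heart": predict_heart,
}


def predict(disease: str, params: Dict[str, float]) -> str:
    """Invoke the model registered for `disease` and describe the result."""
    try:
        model = MODELS[disease.lower()]
    except KeyError:
        raise ValueError(f"unsupported disease: {disease!r}") from None
    has_disease = model(params)
    return f"{disease}: {'positive' if has_disease else 'negative'}"
```

With this layout, adding a further disease only requires registering one more entry in `MODELS`.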
Document Conventions
The document begins with an overview of the product and its features,
followed by detailed requirements and design considerations.
Product Scope
Many existing machine learning models for healthcare analysis concentrate on one disease per analysis: for example, one for liver analysis, one for cancer analysis, and one for lung diseases. If a user wants to predict more than one disease, he/she has to go through different sites. There is no common system in which one analysis can perform more than one disease prediction. Some of the models have low accuracy, which can seriously affect patients’ health. When an organization wants to analyse its patients’ health reports, it has to deploy many models, which increases both cost and time. Some existing systems consider very few parameters, which can yield false results.
References
3.2 Overall Description
Product Perspective
Machine learning is the domain that uses past data to make predictions. In machine learning, a computer system builds a model that learns from data and experience. A machine learning algorithm has two phases: 1) training and 2) testing. Predicting disease from a patient’s symptoms and history has been a challenge for machine learning technology for decades. Healthcare issues can be addressed efficiently using machine learning. We apply machine learning concepts to keep track of patients’ health. Machine learning allows us to build models that quickly clean and process data and deliver results faster. Using this system, doctors can make better decisions about patient diagnoses and give appropriate treatment accordingly, which improves patient healthcare services. Healthcare is a prime example of how machine learning can be introduced in the medical field. To improve accuracy on large data, the existing work will be done on unstructured or textual data.
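The two phases mentioned above, training and testing, require the data to be divided into two disjoint parts. A minimal sketch of such a split is given below using only the standard library; the 80/20 ratio, the fixed seed, and the `records` list are assumptions chosen for illustration.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up patient records used only to demonstrate the split.
records = [{"id": i, "glucose": 90 + 5 * i} for i in range(10)]
train, test = train_test_split(records, test_ratio=0.2)
```

The model is fitted on `train` only, and `test` is held back to estimate how well it generalizes to unseen patients.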
Product Functions
➢ The user should be familiar with medical report terminology such as BP, diabetic, etc.
➢ The user should be familiar with the Internet.
Operating Environment
The product will operate in a Windows environment. The Multiple Disease Prediction system is a website and shall operate in all popular browsers, for example Microsoft Internet Explorer, Google Chrome, and Mozilla Firefox. It will also be compatible with IE 6.0. Most of the features will be compatible with Mozilla Firefox and Opera 7.0 or higher. The only requirement for using this online product is an internet connection. The hardware configuration includes a 40 GB hard disk, a 15-inch colour monitor, and a 122-key keyboard. The basic input devices required are a keyboard and mouse; the output device is a monitor.
User Documentation
The product will include a user manual. The user manual will include a product overview, the complete configuration of the software used (such as SQL server), technical details, the backup procedure, and contact information, including an email address. There will be no online help for the product at this time. The product will be compatible with Internet Explorer 6.0 or higher. The databases will be created in MySQL.
User Interfaces
Hardware Interfaces
Software Interfaces
➢ XAMPP
➢ VS Code
➢ Jupyter Notebook
➢ Front end: HTML, CSS, JavaScript, Bootstrap, React.js
➢ Back end: Django Python framework
➢ Database: MySQL
Communications Interfaces
➢ The website authority should ensure that customers receive maximum accuracy.
➢ Customer support is available from the authority.
➢ The security of customer information is ensured.
➢ The efficiency of managing the authority's work is increased.
3.5 Other Nonfunctional Requirements
5.1. PERFORMANCE REQUIREMENTS
There is no performance requirement in this system because server request and response times depend on the end user's internet connection.
5.2. SAFETY REQUIREMENTS
The database may crash at any time due to a virus or an operating system failure. Therefore, database backups must be taken so that the data is not lost. A proper UPS/inverter facility should be available in case of power supply failure. The system must be secure enough that personal health data is not disclosed inappropriately or without authorization.
5.3. SECURITY REQUIREMENTS
➢ The system will use a secured database.
➢ Normal users can only read information; they cannot edit or modify anything except their personal and some other information.
➢ The system will have different types of users, and every user has access constraints.
5.4. SOFTWARE QUALITY ATTRIBUTES
➢ There may be multiple admins creating the project, and all of them will have the right to make changes to the system. Members and other users cannot make changes.
➢ The project should be open source.
➢ The quality of the database is maintained in such a way that it is user friendly for all users of the database.
➢ The user should be able to easily download and install the system.
Chapter 4
➢ Phase 6: Classifier Integration
– Develop and integrate an ensemble of classifiers such as SVM, Random Forest,
and k-Nearest Neighbors.
– Evaluate classifier combinations to optimize prediction accuracy.
Chapter 5
Proposed System
• Technical Feasibility
• Operational Feasibility
5.2 Analysis/Framework/Algorithm
• Analysis/Framework/Algorithm
➢ Naïve Bayes
The Naïve Bayes algorithm applies Bayes’ theorem to predict the
likelihood of various diseases based on symptom probabilities. Its
efficiency and effectiveness in handling categorical data make it ideal
for analyzing patient symptoms and providing quick disease predictions.
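To make the Naïve Bayes step concrete, the sketch below implements a small categorical Naïve Bayes classifier from scratch with add-one (Laplace) smoothing. It is an illustrative stand-in rather than the project's implementation, which the report says relies on Scikit-learn; the symptom values and condition labels are invented toy data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class symptom value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(samples, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict_naive_bayes(model, row):
    """Return the label with the highest posterior, computed in log space."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior P(label)
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up data: (fever, cough) -> condition.
samples = [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no"), ("yes", "yes")]
labels = ["flu", "flu", "cold", "healthy", "flu"]
model = train_naive_bayes(samples, labels)
prediction = predict_naive_bayes(model, ("yes", "yes"))
```

Working in log space avoids numerical underflow when many symptom probabilities are multiplied together.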
• Ensemble Methods
➢ Combining KNN, SVM, and Naïve Bayes through ensemble methods
enhances classification accuracy. By aggregating the predictions from
these diverse algorithms, the system can achieve more reliable disease
predictions, leveraging the strengths of each approach.
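One simple way to aggregate the predictions of several classifiers is hard (majority) voting, sketched below. The three votes are hypothetical outputs for a single patient, and the rule that ties go to the label seen first is a choice made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of the three base classifiers for one patient.
votes = {"knn": "diabetes", "svm": "diabetes", "naive_bayes": "healthy"}
final = majority_vote(list(votes.values()))
```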
• Hardware
– Processor: High-performance CPU (e.g., Intel Core i7) and a
GPU (e.g., NVIDIA GTX 1080) for efficient model training.
– Memory: Minimum 16GB RAM for handling large datasets and
complex computations.
– Storage: At least 512GB SSD for fast read/write operations and
dataset storage.
• Software
– Programming Language: Python, for implementing machine learning
models using libraries such as Scikit-learn.
– Libraries/Frameworks
∗ Scikit-learn for implementing ensemble classifiers.
– IDE/Editor: Visual Studio Code.
– Operating System: Linux (Ubuntu) or Windows
- Input Processing
Input data, including patient symptoms, demographic information, and
medical history, is preprocessed to ensure uniformity. This may involve
normalizing values and encoding categorical variables for consistent model
input.
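The two preprocessing operations named above can be sketched as follows: min-max normalization rescales a numeric column to the range 0 to 1, and one-hot encoding turns a categorical column into 0/1 indicator vectors. The column contents (`ages`, sex values) are made-up examples.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values):
    """Return the sorted categories and one indicator vector per input value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Made-up columns used only for demonstration.
ages = [20, 40, 60]
scaled_ages = min_max_normalize(ages)
categories, sexes = one_hot_encode(["male", "female", "female"])
```

In practice the minimum, maximum, and category list should be computed on the training data only and reused unchanged when new patient input arrives.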
- Feature Extraction
- KNN Layer: The K-Nearest Neighbors algorithm analyzes the feature space,
identifying symptom patterns based on historical patient data for accurate
classification.
- SVM Layer: Support Vector Machines create decision boundaries in high-
dimensional space to effectively separate different disease classes based on
input features.
- Naïve Bayes Layer: The Naïve Bayes algorithm calculates the probabilities
of each disease given the symptoms, providing insight into the likelihood of
various conditions.
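Of the three layers listed above, K-Nearest Neighbors is the simplest to show in full. The sketch below classifies a query point by a majority vote among its `k` closest training points under Euclidean distance; the (glucose, BMI) readings and labels are invented, and the features are left unscaled to keep the example short.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Made-up (glucose, BMI) readings with hypothetical diagnoses.
points = [(85, 22.0), (90, 24.5), (150, 31.0), (160, 33.5), (155, 30.0)]
labels = ["negative", "negative", "positive", "positive", "positive"]
result = knn_predict(points, labels, query=(148, 29.0), k=3)
```

Because the distance is dominated by the feature with the largest numeric range, inputs are normally normalized before KNN is applied.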
- Anomaly Detection
Although anomaly detection does not apply here as directly as it does in
image processing, the system can use statistical methods to identify outlier
symptoms or patterns that deviate from expected norms, signaling potential
misdiagnosis or rare conditions.
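One statistical method of this kind is the z-score test, sketched below: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0 and the heart-rate readings are assumptions made for the example, not values used by the system.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the indices of values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Made-up resting heart rates; the last reading is deliberately extreme.
heart_rates = [72, 75, 70, 68, 74, 71, 73, 69, 140]
outliers = find_outliers(heart_rates)
```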
- Classification
The combined features from KNN, SVM, and Naïve Bayes are fed into an
adaptive ensemble classifier, which aggregates predictions from each model
to determine the most likely diseases based on the input data.
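The report does not specify how the adaptive ensemble weighs its members. One possible reading, shown here strictly as an assumption, is accuracy-weighted voting, in which each model's vote counts in proportion to a weight such as its validation accuracy. The weights and predictions below are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Sum each model's weight behind its predicted label and return the top label."""
    scores = defaultdict(float)
    for model_name, label in predictions.items():
        scores[label] += weights[model_name]
    return max(scores, key=scores.get)


# Hypothetical validation accuracies used as voting weights.
weights = {"knn": 0.87, "svm": 0.83, "naive_bayes": 0.80}
# Hypothetical predictions of the three models for one patient.
predictions = {"knn": "heart disease", "svm": "healthy", "naive_bayes": "healthy"}
final = weighted_vote(predictions, weights)
```

Here two weaker models outvote one stronger model; with sufficiently unequal weights, a single accurate model can override the other two.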
- Interpretability
Techniques such as feature importance scores and decision boundaries
visualizations help illustrate which symptoms and features significantly
influence the predictions, enhancing the interpretability of the model's
decisions.
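A model-agnostic way to obtain feature importance scores is permutation importance: shuffle one feature column and measure how much accuracy drops. The sketch below assumes rows are tuples and that the model is any callable returning a label; the toy model, which looks only at the first feature, and the data are invented for illustration.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's output equals the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Return, per feature, the accuracy lost when that feature's column is shuffled."""
    rng = random.Random(seed)
    rows = [tuple(r) for r in rows]
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in position j.
        permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


def toy_model(row):
    """Hypothetical model that uses only feature 0 (glucose)."""
    return 1 if row[0] >= 126 else 0


# Made-up rows: (glucose, unrelated flag).
rows = [(150, 1), (90, 0), (160, 0), (85, 1), (155, 1), (95, 0)]
labels = [1, 0, 1, 0, 1, 0]
importances = permutation_importance(toy_model, rows, labels, n_features=2)
```

A feature the model ignores, such as the second column here, receives an importance of zero because shuffling it cannot change any prediction.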
5.5 Methodology
➢ Data Collection & Preprocessing
➢ Model Training
• Train the ensemble model combining KNN, SVM, and Naïve Bayes on the
prepared dataset, optimizing parameters using techniques like grid search
and cross-validation to enhance predictive performance.
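Grid search with cross-validation can be sketched as follows: every candidate parameter value is scored by its average validation accuracy over k folds, and the best-scoring value is kept. The example tunes the neighbor count of a one-dimensional KNN classifier on made-up glucose readings; all names and data are illustrative, and Scikit-learn's own tooling would normally replace this hand-written version.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def grid_search(param_grid, score_fn):
    """Return (best_param, best_score); earlier candidates win ties."""
    best_param, best_score = None, float("-inf")
    for param in param_grid:
        score = score_fn(param)
        if score > best_score:
            best_param, best_score = param, score
    return best_param, best_score


def knn_1d(train, query, k):
    """Majority label among the k training readings closest to `query`."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k_neighbors, n_folds=5):
    """Average validation accuracy of knn_1d across the folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), n_folds):
        train = [data[i] for i in train_idx]
        correct = sum(knn_1d(train, data[i][0], k_neighbors) == data[i][1]
                      for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Made-up (glucose, label) pairs: 0 = negative, 1 = positive.
data = [(80, 0), (85, 0), (90, 0), (95, 0), (100, 0),
        (140, 1), (145, 1), (150, 1), (155, 1), (160, 1)]
best_k, best_score = grid_search([1, 3, 5], lambda k: cv_accuracy(data, k))
```

Real data should be shuffled before being split into folds so that each fold contains a mix of classes.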
➢ Model Evaluation
➢ Implementation of Interpretability Techniques
➢ Result Output
Chapter 6
• Flowchart Reference
• Model Development
➢ Implement the proposed hybrid architecture for disease
prediction.
➢ Build components to extract relevant features from
medical data (e.g., symptoms, lab results).
➢ Integrate techniques to capture global relationships
among features.
➢ Include methods for detecting anomalies in patient data.
➢ Use multi-scale feature extraction techniques to capture
variations in data.
• Classifier Integration
➢ Implement an adaptive ensemble classifier combining
various models (e.g., SVM, Random Forest, Gradient
Boosting).
➢ Experiment with dynamic classifier selection based on
extracted features to enhance prediction accuracy.
• Training & Hyperparameter Tuning
➢ Train the hybrid model on the dataset and fine-tune
hyperparameters (e.g., learning rate, batch size, optimizer)
for optimal performance.
➢ Apply cross-validation to avoid overfitting and ensure
generalization across datasets.
Chapter 7
Summary
This project aims to develop a robust system for predicting multiple diseases
using advanced machine learning techniques. Current prediction methods
often struggle with subtle patterns in patient data, so a hybrid architecture
combining local feature extraction and global relationship analysis was
designed. Local features, such as symptoms and lab results, are captured using
specialized models, while global patterns are identified through advanced
algorithms. Techniques like anomaly detection and multi-scale feature
extraction enhance prediction accuracy. An adaptive ensemble classifier and
attention mechanisms ensure both precision and interpretability, making the
system effective across diverse patient datasets.
Appendices
Appendix A
Appendix
A.2 Publication by Candidate
A.3 Project Competition