Project Report
on
Bachelor of Technology
in
Submitted by:
Chaman(22015004022)
Ankit Malik(22015004009)
Ansh Tewatia(22015004014)
Under the guidance of
DECEMBER 2024
CERTIFICATE
We hereby certify that the work which is being presented in the B.Tech Project Report entitled
‘Disease Prediction System Using ML’, in partial fulfillment of the requirements for the award
of the Bachelor of Technology in Computer Science & Engineering and submitted to the
Department of Computer Science & Engineering of Echelon Institute of Technology,
Faridabad, is an authentic record of our own work carried out during the period from August
2024 to December 2024.
The matter presented in this report has not been submitted by us for the award of any other
degree elsewhere.
Signature of Candidate
Chaman(22015004022)
Ankit Malik(22015004009)
Ansh Tewatia(22015004014)
TO WHOM IT MAY CONCERN
This is to certify that the Project entitled ‘Disease Prediction System Using ML’, submitted
by Chaman (22015004022), Ankit Malik (22015004009), and Ansh Tewatia
(22015004014) of the Department of Computer Science and Engineering, Echelon Institute
of Technology, under J.C. Bose University of Science and Technology, YMCA (formerly
YMCA UST), Faridabad, in partial fulfillment of the requirements for the degree of
Bachelor of Technology in Computer Science & Engineering, is a bona fide record of the
work and investigations carried out by them under my supervision and guidance.
Ms. Tanya
Signature of HOD
Head of Department
ACKNOWLEDGEMENT
We take this opportunity to thank all those who have helped us in completing the project
successfully.
We would like to express our gratitude to Ms. Tanya, who, as our guide and mentor,
provided us with every possible support and guidance throughout the
development of the project. This project would never have been completed without
her encouragement and support.
We would also like to thank Dr. Manisha Vashisht, Head of
Department, for providing us with the required resources and a healthy environment
for carrying out our project work.
Chaman(22015004022)
Ankit Malik(22015004009)
Ansh Tewatia(22015004014)
ABSTRACT
With the increasing prevalence of diseases, early detection and prediction play a crucial
role in effective treatment and prevention. This project presents a Disease Prediction
System utilizing Machine Learning (ML) to predict diseases based on patient symptoms
and medical history. The system applies classification algorithms such as Decision
Trees, Random Forest, Support Vector Machines (SVM), and Neural Networks to
analyse input data and provide accurate predictions.
The core methodology of this project revolves around supervised learning techniques,
where labelled datasets containing symptoms and corresponding diseases are used to
train predictive models. The system follows a structured workflow: data collection,
preprocessing, feature selection, model training, and evaluation. The dataset used
consists of patient medical history and symptom-disease correlations, ensuring
comprehensive coverage of common illnesses. Various machine learning models are
implemented, and their performance is compared using evaluation metrics such as
accuracy, precision, recall, and F1-score to determine the most effective approach.
Random Forest and Neural Networks emerge as the most effective models due to their
ability to capture complex patterns in symptom-disease relationships.
The Decision Tree algorithm provides a simple yet effective approach for
understanding symptom classifications, whereas Support Vector Machines (SVM)
offer better performance when working with high-dimensional data. The K-Nearest
Neighbours (KNN) algorithm is also explored, but it exhibits limitations when dealing
with large datasets due to computational overhead.
One of the key features of this system is its user-friendly interface, designed to allow
individuals to input symptoms and receive real-time disease predictions. The interface
is built using React for web applications and React Native for mobile applications,
ensuring a seamless user experience. The system aims to provide not only disease
predictions but also recommendations for seeking medical consultation, making it a
valuable tool for both patients and healthcare providers.
The Disease Prediction System presents a range of benefits, including:
INTRODUCTION
1.1 Overview
Healthcare is one of the most critical sectors where technological advancements have
led to significant improvements in disease diagnosis, treatment, and prevention.
Traditional healthcare systems often rely on manual medical examinations, laboratory
tests, and physician expertise, which may sometimes result in delays, errors, or
inconsistencies in diagnosing diseases. With the growing availability of electronic
health records (EHRs) and vast amounts of patient data, the need for automated, data-
driven healthcare solutions has become essential.
The proposed system can be deployed on web applications, mobile healthcare apps, and
telemedicine platforms, making disease prediction accessible to people worldwide. By
integrating this AI-powered diagnostic tool into existing healthcare infrastructure, we
can bridge the gap between technology and medical science, creating a cost-effective,
efficient, and scalable solution for disease prevention.
This project aims to develop an ML-based Disease Prediction System that provides fast,
accurate, and scalable diagnostic capabilities, ensuring better patient care and early
intervention.
• Improve early disease detection by leveraging supervised learning algorithms
for predictive healthcare analysis.
• Enhance diagnostic accuracy by reducing human errors and inconsistencies in
disease diagnosis.
• Integrate real-time healthcare monitoring by connecting the ML system with
wearable medical devices and electronic health records (EHRs).
• Increase healthcare accessibility by deploying the system on cloud-based
platforms, mobile apps, and telemedicine portals.
• Enable personalized treatment recommendations by analysing patient history,
lifestyle, and genetic data.
• Ensure data security and privacy by implementing secure encryption protocols
and complying with HIPAA and GDPR regulations.
With continuous improvements in AI, deep learning, and cloud computing, this system
has the potential to revolutionize medical diagnostics, personalized treatment plans, and
remote healthcare services. Future developments will focus on expanding the system’s
capabilities, integrating wearable health devices, and ensuring ethical AI
implementation for widespread healthcare adoption.
OBJECTIVES OF PROJECT
The primary objective of this project is to create a robust Machine Learning (ML) model
that can accurately predict diseases based on patient symptoms and medical history. The
model should leverage advanced classification techniques, such as Random Forest,
Decision Trees, Support Vector Machines (SVM), and Neural Networks, to achieve high
precision and recall. By integrating data-driven approaches, the system aims to enhance
diagnostic accuracy and reduce human errors.
The accuracy of any ML-based disease prediction system depends on the quality of the
dataset. This project focuses on:
Gathering comprehensive medical datasets from trusted sources such as healthcare
institutions and online repositories.
Feature selection techniques to identify the most relevant symptoms that contribute to
disease prediction.
Accuracy
Precision
Recall
F1-score
By implementing multiple algorithms, the system can evaluate their strengths
and weaknesses and choose the most effective one for disease prediction.
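The four metrics named above can all be derived from the counts of true positives, false positives, false negatives, and true negatives. The following is a minimal sketch, assuming a single disease is treated as the positive class; the `Counts` class, the `metrics` function, and the example counts are illustrative and are not taken from the project's code or results.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one disease treated as the positive class."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: Counts) -> dict:
    """Compute accuracy, precision, recall and F1-score from the counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Illustrative counts only; they are not results from this project.
example = metrics(Counts(tp=40, fp=10, fn=5, tn=45))
```

For a multi-class problem, the same calculation would typically be repeated per disease and then averaged; which averaging scheme suits this system is left open here.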
The project aims to rigorously evaluate the ML model using statistical techniques such
as:
Confusion Matrix to analyse true positives, false positives, false negatives, and true
negatives.
Cross-validation techniques to ensure that the model generalizes well to unseen data.
The evaluation process will help optimize the model and identify areas for
improvement.
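As an illustration of the cross-validation idea mentioned above, the sketch below splits sample indices into k folds so that every record is used for testing exactly once. It is a from-scratch example under stated assumptions (simple shuffled k-fold, no stratification by disease); the function name `k_fold_indices` is hypothetical, and a real implementation would more likely rely on an established ML library.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Striding by k keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 samples split into 5 folds of 2 test samples each.
splits = list(k_fold_indices(n_samples=10, k=5))
```

The model would be trained on each `train` split and scored on the matching `test` split, and the per-fold scores averaged to estimate performance on unseen data.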
Optimization techniques will be applied to improve the performance of the ML model.
This includes:
Hyperparameter tuning to select the best model parameters.
Since healthcare data is sensitive, the system will incorporate data encryption and
anonymization techniques to protect user information. Additionally, compliance with
HIPAA and GDPR guidelines will be considered to maintain privacy and security.
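One possible building block for the anonymization step is pseudonymization: replacing the patient identifier with a keyed hash and dropping direct identifiers before the data reaches the model. The sketch below assumes hypothetical field names (`patient_id`, `name`, `phone`, `address`) and a placeholder key. It does not cover encryption, which would require a vetted cryptography library, and pseudonymization alone is not sufficient for regulatory compliance.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace a patient identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and keep only a pseudonym plus clinical fields."""
    direct_identifiers = {"patient_id", "name", "phone", "address"}  # hypothetical field names
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


key = b"placeholder-key-load-from-secure-storage"  # placeholder, never hard-code in practice
record = {"patient_id": "P-001", "name": "Example Patient", "symptoms": ["fever", "cough"]}
safe_record = anonymize_record(record, key)
```

Because the hash is keyed, the same patient always maps to the same pseudonym, while someone without the key cannot recompute it from a known identifier.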
2.8 Integration with Electronic Health Records (EHRs)
For enhanced usability, the project will explore ways to integrate with EHR systems
used by hospitals and clinics. This would allow:
Beyond predicting diseases, the system will suggest personalized healthcare tips based
on:
Lifestyle habits (e.g., diet, exercise recommendations).
2.10 Extend Disease Coverage to Rare and Chronic Illnesses
Most ML-based disease prediction systems focus on common diseases. This project
aims to expand the model's capabilities to cover rare and chronic illnesses by
incorporating diverse datasets and expert medical knowledge.
2.11 Deploy the System on Cloud for Scalability
To ensure global accessibility and high availability, the system will be deployed on cloud
platforms like AWS, Google Cloud, or Microsoft Azure. Benefits include:
2.14 Enhance Model Generalization for Real-World Application
One of the key challenges in ML-based disease prediction is ensuring that the model
generalizes well across different populations. This project will:
With these objectives, the Disease Prediction System Using Machine Learning aims to
revolutionize early disease detection, improve accessibility to healthcare, and contribute
to AI-driven medical advancements.
DESIGN AND APPROACH/METHODOLOGY
Decision Tree: A rule-based classification model for symptom-based disease
identification.
Random Forest: An ensemble model that improves prediction accuracy.
Support Vector Machine (SVM): Separates disease categories based on symptom
patterns.
K-Nearest Neighbours (KNN): Classifies diseases by comparing new cases with
existing ones.
Neural Networks: A deep learning model for complex disease prediction scenarios.
Logistic Regression: Used for binary classification problems.
Boosting Algorithms: XGBoost (gradient boosting) and AdaBoost for enhancing model
performance.
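To make the idea of classifying by comparison with existing cases concrete, the following is a minimal K-Nearest Neighbours sketch. It assumes symptoms are encoded as binary vectors and uses Hamming distance; the symptom columns, disease labels, and function names are made up for illustration and do not come from the project's dataset.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where two binary symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(
    train: List[Tuple[Sequence[int], str]], query: Sequence[int], k: int = 3
) -> str:
    """Return the majority label among the k training cases nearest to query."""
    nearest = sorted(train, key=lambda case: hamming(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: columns are [fever, cough, rash, headache]; labels are made up.
cases = [
    ([1, 1, 0, 0], "flu"),
    ([1, 1, 0, 1], "flu"),
    ([1, 0, 0, 1], "flu"),
    ([0, 0, 1, 0], "allergy"),
    ([0, 0, 1, 1], "allergy"),
]
prediction = knn_predict(cases, [1, 1, 0, 1], k=3)
```

Every prediction scans all stored cases, which is why the approach becomes costly as the dataset grows.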
The Design and Methodology outlined above provide a structured approach to
developing a robust and efficient Disease Prediction System Using Machine Learning.
• Data Preprocessing
1. Handle missing values by using mean/mode substitution.
2. Convert categorical data (e.g., symptoms) into numerical format using
One-Hot Encoding.
3. Normalize numerical values (e.g., blood pressure, cholesterol levels) for
better model performance.
• Model Evaluation
1. Measure model performance using metrics such as Precision, Recall, F1-score, and
ROC-AUC.
2. Tune hyperparameters to improve performance.
• User Interface (UI) Development
1. Design a web or mobile app where users can enter symptoms.
2. The system predicts the possible disease and suggests next steps (e.g.,
consult a doctor, take tests, etc.).
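The preprocessing steps listed above (mean substitution for missing values, one-hot encoding of categorical data, and normalization of numerical values) can be sketched with the standard library alone. The example below is illustrative: the helper names are hypothetical, z-score standardization is assumed as the normalization method, and the sample values are not from the project's dataset.

```python
from statistics import mean, pstdev
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if every value is missing
    return [fill if v is None else v for v in values]


def one_hot(values: List[str]) -> List[List[int]]:
    """Encode each category as a 0/1 vector, with columns in sorted category order."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


def standardize(values: List[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


blood_pressure = impute_mean([120.0, None, 140.0])
symptom_encoded = one_hot(["cough", "fever", "cough"])
blood_pressure_scaled = standardize(blood_pressure)
```

In practice the fill value, category order, and scaling statistics would be computed on the training split only and then reused for new patients, so that test data does not leak into preprocessing.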
RESULT AND DISCUSSION
4.1 Overview
The Disease Prediction System using Machine Learning aims to improve early-stage
disease detection, diagnostic accuracy, and personalized treatment plans. The system
utilizes various machine learning models, including Decision Trees, Random Forest,
Naïve Bayes, Support Vector Machine (SVM), and Deep Learning, to analyse patient
symptoms and medical history and provide predictive insights into potential diseases.
This chapter presents the results obtained from the model evaluations, a detailed
discussion of their performance metrics, comparative analysis, real-world applications,
and limitations. It also explores challenges, ethical considerations, and future
advancements that can enhance the efficiency and accuracy of AI-driven healthcare
solutions.
Neural Network (ANN)   94.1   0.93   0.94   0.93   0.96   Very Slow
4.3.2 Key Observations
• Neural Networks outperformed all models, achieving 94.1% accuracy and a high
ROC-AUC score of 0.96, but required longer training times.
• Random Forest offered the best balance of accuracy and speed, making it well suited to
real-time applications.
• SVM performed well but was computationally expensive, making it less
practical for large-scale datasets.
• Naïve Bayes was the fastest model, but its accuracy suffered due to its
assumption of feature independence.
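For readers unfamiliar with the ROC-AUC score quoted in these observations, it can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that definition for a binary problem; the function name and sample scores are illustrative, and the quadratic pairwise loop is only suitable for small examples.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up labels and predicted scores, for illustration only.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1])
```

A value of 1.0 means every positive case outranks every negative case, while 0.5 corresponds to random ranking.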
• Governments can use the system to allocate healthcare resources effectively in
high-risk areas.
• Creating an API service for hospital integration with EHR systems.
4.7 The Disease Prediction System using Machine Learning demonstrated high
accuracy, efficiency, and real-world applicability. The best-performing models, Neural
Networks (94.1%) and Random Forest (91.5%), showed promising results for medical
diagnostics and telehealth applications.
However, future work should focus on:
• Expanding datasets for better generalization across diverse populations.
• Improving AI model interpretability to increase trust in ML-driven diagnostics.
• Enhancing security with blockchain-based patient data storage.
• Integrating wearable device monitoring for real-time patient health tracking.
With continuous advancements in AI, cloud computing, and IoT, machine learning-
based disease prediction can revolutionize global healthcare, reducing diagnostic errors
and improving patient outcomes.
CONCLUSIONS AND FUTURE SCOPE
5.1 Conclusion
The Disease Prediction System using Machine Learning (ML) represents a significant
step towards revolutionizing healthcare diagnostics by enabling fast, accurate, and
scalable disease detection. By leveraging AI-powered models, this system provides
data-driven insights into potential illnesses, allowing individuals and healthcare
professionals to take proactive measures for disease management and prevention.
The results of our study confirm that machine learning can significantly enhance
diagnostic accuracy, reduce reliance on costly medical tests, and bridge gaps in remote
and underdeveloped healthcare systems.
With Neural Networks achieving 94.1% accuracy and Random Forest models excelling
in real-time predictions, it is evident that AI-driven disease prediction can outperform
traditional diagnostic methods in both efficiency and precision.
This reduces waiting times for medical consultations, enabling early
intervention and better patient care.
must include blockchain-based secure data storage for enhanced patient data
protection.
• These virtual assistants can ask intelligent follow-up questions to refine
predictions, making them more patient-centric.
• Decentralized AI models can be trained using Federated Learning, ensuring data
privacy without sharing patient records.
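The core aggregation step of Federated Learning can be illustrated with federated averaging: each site trains locally and sends only its model weights and sample count, and a coordinator combines them in a weighted average. The sketch below is a simplified illustration, assuming model weights are flat lists of numbers; the function name and the two-hospital example are hypothetical, and a real deployment would also need secure communication and repeated training rounds.

```python
from typing import List, Tuple


def federated_average(client_updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average client weight vectors, weighting each by its local sample count."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    size = len(client_updates[0][0])
    averaged = [0.0] * size
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Hypothetical updates from two hospitals: (local model weights, local sample count).
global_weights = federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
```

Only the weight vectors and counts leave each site in this scheme; the underlying patient records stay where they were collected.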
✔ Integrating AI with IoT and wearable health devices for real-time monitoring.