Project Report

The document proposes using machine learning techniques to predict diabetes by analyzing health data. Diabetes is a growing global issue that requires early detection for better management. The proposed solution involves collecting data, preprocessing it, selecting relevant features, and splitting the data to train and test machine learning classifiers such as random forest, decision trees, and logistic regression. Random forest achieved the highest accuracy, 98%, for predicting diabetes. Future work could integrate more diverse datasets and explore other advanced algorithms to improve predictive accuracy.

Uploaded by

BT21EE013 Pratima


Problem Statement

The challenge is to predict diabetes accurately using machine learning techniques. This involves
analyzing a complex set of health data, including various clinical parameters, to develop a
reliable predictive model. The goal is to enable early detection of and intervention for diabetes,
a condition with significant health implications.
Growing Prevalence of Diabetes

Diabetes is a global health issue, with a growing prevalence across the world, placing significant strain on health systems and affecting the quality of life for individuals.

The rise of diabetes presents a critical need for early detection and effective management to mitigate its impact on public health and healthcare systems.

Python-based machine learning tools offer a promising solution, enabling early detection and personalized management of diabetes, potentially transforming healthcare outcomes.
Proposed Solution
● Data Collection and Pre-processing
○ Gathered a dataset comprising 1405 instances and 10 features, including Glucose, BMI, and Age. Pre-processing removed non-predictive columns such as 'Id', imputed the zero values recorded for biologically critical attributes, and scaled the data using StandardScaler.
● Feature Selection and Normalization
○ Used Pearson's correlation to retain the most relevant features for model training. Numerical data was normalized to the range 0 to 1, which improves the efficiency of distance-based algorithms.
● Data Splitting and Model Training
○ Split the pre-processed data into 1600 training samples and 400 testing samples, so that each model's predictive power could be evaluated on unseen data.
● Machine Learning Classifiers
○ Decision Trees (DT), K-Nearest Neighbors (KNN), Random Forests (RF), Naive Bayes (NB), Logistic Regression (LR), and Support Vector Machines (SVM) were trained to build a prediction model. Each classifier was implemented using Python's scikit-learn library.
● Evaluation and Results
○ Each classifier was evaluated on accuracy, with Random Forest achieving the highest accuracy of 98%, as shown in the results table. This comparison guides the choice of classifier for predicting diabetes.
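The pre-processing and feature-selection steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the sample values are made up, and filling zeros with the median of the non-zero entries is an assumption, since the report does not say which imputation rule was used.

```python
from statistics import mean, median, pstdev

def impute_zeros(values):
    """Replace zeros with the median of the non-zero entries (assumed rule)."""
    fill = median(v for v in values if v != 0)
    return [fill if v == 0 else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit variance, as StandardScaler does per column."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale to the range 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative values only; a glucose reading of 0 is treated as missing.
glucose = [148, 85, 0, 89, 137, 116, 0, 197]
outcome = [1, 0, 1, 0, 1, 0, 0, 1]

glucose_clean = impute_zeros(glucose)
glucose_std = standardize(glucose_clean)   # StandardScaler-style scaling
glucose_01 = min_max(glucose_clean)        # 0-to-1 normalization
r = pearson(glucose_clean, outcome)        # relevance of the feature to the target
```

A feature whose absolute correlation with the target falls below some chosen threshold would be dropped; the report does not state the threshold it used.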
System Approach
Hardware
Processor: Intel Core i5 or equivalent
RAM: 8GB or higher
Storage: 256GB SSD or higher for faster data processing

Software
Python 3.x and compatible OS, along with Jupyter Notebook or PyCharm for development.

Libraries Required
pandas, numpy, scikit-learn, matplotlib, seaborn for data analysis, numerical operations, machine learning, visualization, and model evaluation.

Model Evaluation
Using specific scikit-learn modules for data splitting, cross-validation, and performance metrics.
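Cross-validation, mentioned under model evaluation, can be illustrated with a small index generator. This is a hedged sketch of the k-fold idea in plain Python; the function name `k_fold_indices` is hypothetical, and in practice scikit-learn's own cross-validation utilities would be used instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds.

    Each sample lands in exactly one test fold; the first
    n_samples % k folds get one extra sample.
    """
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: 10 samples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Training on each `train` list and scoring on the matching `test` list gives k accuracy estimates, whose average is less sensitive to one lucky or unlucky split than a single hold-out score.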
Algorithm & Deployment
Algorithm Selection
Random Forest stands out as the chosen algorithm for its robustness in handling both numerical and categorical
data, an essential feature given the varied nature of the Pima Indians Diabetes Database. Its ability to manage
missing values and maintain accuracy across large datasets makes it particularly suited for medical datasets,
which often contain incomplete records. Random Forest's methodology, which builds multiple decision trees and
merges them to get a more accurate and stable prediction, offers a significant advantage in predicting complex
outcomes like diabetes.

Data Input
The dataset originates from the Pima Indians Diabetes Database, accessible on Kaggle, and is aimed specifically
at predicting the onset of diabetes based on various medical predictors. It has 2000 data points and 8
independent variables: Number of Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI,
Diabetes Pedigree Function, and Age. The target variable, 'Outcome', categorizes the patients into two
groups: 0 for those without diabetes and 1 for those diagnosed with the condition, making this a binary
classification problem.
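As a rough illustration of this layout, the sketch below parses a few rows into a feature matrix and a target vector. The column names follow the common Kaggle CSV export and should be treated as an assumption, and the three rows are illustrative stand-ins rather than the project's data.

```python
import csv
import io

# Assumed column names for the eight predictors and the target.
FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]
TARGET = "Outcome"

# Illustrative rows only; a real run would open the downloaded CSV file.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

def load_rows(handle):
    """Return (X, y): feature rows as floats and outcomes as 0/1 integers."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y

X, y = load_rows(io.StringIO(SAMPLE_CSV))
```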
Training Process
The training process begins with preprocessing, which involves handling missing data and potentially encoding
categorical variables to prepare the dataset for analysis. Following this, the dataset is split, usually
allocating 70% for training and 30% for testing, to ensure that the model can be trained on a substantial
portion of the data while still being validated on an independent set. The Random Forest model is then
initialized with specific parameters, such as the number of trees and their depth, to fit the model to the
training set. This process involves constructing multiple decision trees on various sub-samples of the dataset
and using averaging to improve the predictive accuracy and control over-fitting.
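The training steps described here might look like the following in scikit-learn. This is a sketch under stated assumptions: synthetic data from `make_classification` stands in for the real dataset, and `n_estimators=100`, `max_depth=None`, and `random_state=0` are illustrative choices, since the report does not list the parameters it used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 2000 samples, 8 numeric features, binary target.
X, y = make_classification(n_samples=2000, n_features=8,
                           n_informative=5, random_state=0)

# 70% for training, 30% held out for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Number of trees and tree depth are the parameters named in the text.
model = RandomForestClassifier(n_estimators=100, max_depth=None,
                               random_state=0)
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
```

The accuracy obtained on synthetic data says nothing about the figures reported for the diabetes dataset; the sketch only shows the order of operations.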

Prediction Process
Upon completion of the training, the Random Forest model uses the learned patterns to predict the outcome on new
or unseen data, effectively determining the probability of diabetes for each patient. The model outputs a
classification of 0 or 1, representing non-diabetic and diabetic outcomes, respectively. This prediction is
based on the majority vote from all trees in the forest, or in the case of regression tasks, an average
prediction, thereby leveraging the collective insight of multiple decision models for a more accurate and
reliable prediction.
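The majority-vote rule can be sketched in a few lines of plain Python. The helper names are hypothetical, and the five votes are made up for illustration. Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than counting hard votes, so the second helper is closer in spirit to what the library reports as a probability.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def positive_fraction(tree_predictions):
    """Share of trees voting 1, a simple probability-like score."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical votes from five trees for one patient (1 = diabetic).
votes = [1, 0, 1, 1, 0]

label = majority_vote(votes)      # 1
score = positive_fraction(votes)  # 0.6
```

With an even number of trees a tie is possible, which is one practical reason to prefer the averaged probability with an explicit decision threshold.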
Results
Comparison of Model Accuracy

Machine Learning Algorithm    Accuracy (%)
---------------------------------------
Logistic Regression           79.0
K-Nearest Neighbors           80.5
SVM                           84.5
Naive Bayes                   76.83
Decision Tree                 96.0
Random Forest                 98.0
AdaBoost Classifier           81.16

Insightful Model Comparison

• Machine learning classification algorithms were developed to predict diabetes at an early stage. We used 70% of the data for training and 30% for testing. With this split, the Random Forest Classifier achieved the highest accuracy on the dataset, 98%.
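For reference, the snippet below shows how an accuracy figure is computed and how the reported values rank. The dictionary simply restates the accuracies from the comparison table; the `accuracy` helper and its example labels are illustrative and are not taken from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Accuracies (%) as reported in the comparison table.
reported_accuracy = {
    "Logistic Regression": 79.0,
    "K-Nearest Neighbors": 80.5,
    "SVM": 84.5,
    "Naive Bayes": 76.83,
    "Decision Tree": 96.0,
    "Random Forest": 98.0,
    "AdaBoost Classifier": 81.16,
}

# Sort from best to worst.
ranked = sorted(reported_accuracy.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranked[0]

# Toy example: three of four predictions are correct.
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Accuracy alone can hide poor performance on the minority class, so metrics such as precision and recall are often reported alongside it for medical screening tasks.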
Conclusion
The project aimed to create a model identifying diabetes patients at high risk of hospital admission,
addressing the complexity of this prediction. Given the need for improved understanding of admission
risk, the project contributes by proposing an assistive tool. It analyzes factors such as blood glucose
level and body mass index using various machine learning models and retrospective analysis of medical
records. The system predicts diabetes onset based on relevant medical details collected through a web
application. The trained artificial neural network, comprising six dense layers, achieves a reliable 98%
accuracy in predicting whether a person is diabetic or not.
Future Work and Applications
Research Directions
● Integrate more diverse datasets for comprehensive analysis.
● Explore other advanced machine learning algorithms for enhanced predictive accuracy.

Potential Applications
● Develop automated screening tools for early diabetes detection in clinical settings.
● Create mobile applications for personalized diabetes risk assessment and management.
