241410

The document discusses using machine learning algorithms to predict diabetes at an early stage. It proposes a smart healthcare framework called HealthEdge that uses medical sensors and IoT devices to analyze diabetes risk factors and predict type 2 diabetes. The framework trains models in the cloud and uses them for prediction at the edge level to address diabetes in an integrated IoT-edge-cloud system.

Uploaded by

Aqsa Aqqa


Name : Muhammad Haseeb
Roll No : 241410
Class : MSCS 1st (2024-26)
Department : Computer Science
Assignment : Data Mining
Submitted to : Dr. Uzma Jameel

Assignment # 1
1. Dataset name:
Sylhet dataset

2. Link to dataset:

Early Stage Diabetes Risk Prediction - UCI Machine Learning Repository

3. Domain:

Healthcare
4. What is data about:

Patients with diabetes.

5. No. of features in dataset, d=?

16
6. Which data mining technique is performed:
Random Forest (RF) and Logistic Regression (LR)
7. Size of dataset, n=?

520 instances
8. No. of classes:
2 (Positive, Negative)

9. Source of dataset:

This dataset was collected using direct questionnaires from patients at the Sylhet
Diabetes Hospital in Sylhet, Bangladesh, and was approved by a doctor. It was introduced in
“Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques” by M. M.
F. Islam, Rahatara Ferdousi, Sadikur Rahman, and Humayra Yasmin Bushra (2019).

Sr. No : 1
Reference : World Health Organization. Top 10 causes of death globally 2020. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death#:~:text=The%20top%20global%20causes%20of,birth%20asphyxia%20and%20birth%20trauma%2C (accessed November 12, 2022).
Problem Addressed : A machine learning-based smart healthcare framework for diabetes prediction in an integrated IoT-edge-cloud computing system.
Techniques/Models : 1. Random Forest (RF) 2. Logistic Regression (LR)
Results : 0.97 to 0.98
Limitations / Future Works : A large spectrum of machine learning and deep learning algorithms will be considered to evaluate the proposed system.
Link to the article : 1st

Sr. No : 2
Reference : Ismail L, Materwala H. IDMPF: intelligent diabetes mellitus prediction framework using machine learning. Applied Computing and Informatics 2021. https://doi.org/10.1108/ACI-10-2020-0094.
Problem Addressed : To address this challenge by leveraging big data mining techniques to develop predictive models that can effectively identify individuals at high risk of diabetes based on comprehensive health data.
Techniques/Models : 1. Feature Selection 2. Machine Learning Models
Results : Approximately 0.8768
Limitations / Future Works : We aim to focus on the progress of algorithms in related fields and to apply more novel and efficient algorithms such as deep neural networks to solve current problems. We intend to collect more data, such as lifestyle and image data, improve the quality of data collection, update the system and build more reliable models.
Link to the article : 2nd

Sr. No : 3
Reference : Types of diabetes. https://www.idf.org/aboutdiabetes/what-isdiabetes.html. Accessed 23 Mar 2021
Problem Addressed : Type 2 diabetes using machine learning algorithms
Techniques/Models : 1. Feature Selection Algorithms 2. Machine Learning Algorithms 3. Unified Setup 4. Evaluation Metrics
Results : 0.76 to 0.79
Limitations / Future Works : To capture complex patterns and dependencies within large-scale healthcare datasets. Deep learning techniques have shown promise in various medical applications and may offer improved performance in diabetes risk prediction tasks.
Link to the article : 3rd

Sr. No : 4
Reference : Shaw, J.; Sicree, R.; Zimmet, P. Global estimates of ...
Problem Addressed : To address the challenge of early detection and prediction ...
Techniques/Models : 1. Machine learning 2. SVM 3. XGBoost
Results : 78%
Limitations / Future Works : predicting whether an individual will develop T2D in the following year (Y + ...
Link to the article : 4th
Assignment # 3

INTRODUCTION:
Diabetes, one of the top 10 causes of death across the world, is a disease characterized by
increased blood sugar levels. Based on a report by the International Diabetes Federation, in
2021, 537 million adults globally were suffering from diabetes, causing 6.7 million deaths.
Furthermore, the number of diabetics is projected to reach 643 million by 2030 and 783
million by 2045. Diabetes develops in an individual through a dynamic interaction between
different risk factors such as sleep duration, alcohol consumption, dyslipidemia, physical
inactivity, serum uric acid, obesity, hypertension, cardiovascular disease, family history of
diabetes, ethnicity, depression, age, and gender. If not treated at an early stage, diabetes can
lead to severe complications. The use of machine learning has thus gained wide attention for
the prediction of diabetes based on risk factor data. However, these works focus on stand-
alone diabetes prediction. To the best of our knowledge, no work proposes a smart healthcare
framework for diabetes prediction. We address this void by proposing HealthEdge, a machine
learning-based smart healthcare framework for the prediction of type 2 diabetes in an
integrated IoT-edge-cloud computing system. The proposed system analyzes diabetes risk
factors using medical sensors/devices and predicts the incidence of diabetes in an individual.
The machine learning model is trained in the cloud and then the developed model is used by
edge servers for diabetes prediction.
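The cloud-train, edge-predict split described above can be sketched as follows. This is only an illustration under stated assumptions, not the HealthEdge implementation: the function names, the JSON payload format, the toy data, and the two symptom flags are all hypothetical, and a small hand-written logistic regression stands in for whatever model the framework actually trains.

```python
import json
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_in_cloud(rows, labels, epochs=2000, lr=0.1):
    """Cloud side (hypothetical): fit logistic-regression weights by gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return {"weights": weights, "bias": bias}


def export_model(model) -> str:
    """Serialize the trained parameters so they can be shipped to an edge server."""
    return json.dumps(model)


def predict_at_edge(payload: str, x) -> int:
    """Edge side (hypothetical): load the shipped parameters and classify one reading."""
    model = json.loads(payload)
    z = model["bias"] + sum(w * xi for w, xi in zip(model["weights"], x))
    return int(sigmoid(z) >= 0.5)


# Toy data, invented for illustration: two 0/1 symptom flags per patient.
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 1, 0, 1, 0]

payload = export_model(train_in_cloud(X, y))
print(predict_at_edge(payload, [1, 1]), predict_at_edge(payload, [0, 0]))
```

The point of the split is that only the small parameter payload travels to the edge, so prediction does not need a round trip to the cloud.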
The pathophysiology of diabetes involves complex interactions between genetic,
environmental, and lifestyle factors. The chronic hyperglycemia of diabetes is associated with
long-term damage, dysfunction, and failure of various organs, particularly the eyes, kidneys,
nerves, heart, and blood vessels. Complications such as diabetic retinopathy, nephropathy,
neuropathy, and cardiovascular diseases significantly impact the quality of life and mortality
rates of individuals with diabetes.
Early diagnosis and effective management are crucial in preventing or delaying the onset of
complications. Management strategies include lifestyle modifications, glucose monitoring,
pharmacotherapy, and, in some cases, insulin therapy. Recent advancements in diabetes
research have led to improved treatment modalities and a better understanding of the disease
mechanisms, offering hope for more effective interventions in the future.

PROBLEM BACKGROUND:
Diabetes is a fast-growing health problem worldwide, mainly due to changes in lifestyle,
more people living in cities, and an aging population. The increase in obesity, especially belly
fat, is a big reason for the rise in Type 2 diabetes. This is connected to less physical activity
and unhealthy eating habits. Genetics also play a part, but lifestyle and environment are key
factors. Poorer communities face higher risks because they have less access to healthy food,
healthcare, and places to exercise. Complications from diabetes, like eye, kidney, and heart
problems, greatly affect people's lives and put pressure on healthcare systems. Solving this
issue needs a mix of public health efforts, policy changes, and better access to healthcare to
catch and manage diabetes early.

PROBLEM STATEMENT:

Diabetes is becoming a major health problem worldwide due to rising obesity, less physical
activity, and unhealthy diets. Many people, especially those with less money, struggle to get
good care, making the problem worse. This research aims to find out what causes diabetes
to increase and suggest ways to better prevent and manage it.

RESEARCH QUESTIONS:

1. How can machine learning algorithms predict the onset of diabetes using patient health
records and lifestyle data?
2. What are the most effective data analytics techniques for identifying high-risk populations
for diabetes?
3. How can computer-based models be used to simulate the impact of various lifestyle
interventions on diabetes prevention?
4. What role can artificial intelligence play in personalizing diabetes management plans for
individuals based on their health data?

RESEARCH OBJECTIVES:

➢ Data Collection and Integration: Gather comprehensive health records and lifestyle
data from diverse sources, ensuring a robust dataset for analysis.
➢ Advanced Machine Learning Models: Develop and train machine learning models
using various algorithms (e.g., XGBoost, Logistic Regression, Random Forest) to
predict diabetes onset and identify high-risk individuals.
➢ Data Analytics Techniques: Employ sophisticated data analytics methods (e.g.,
clustering, regression analysis, pattern recognition) to uncover key risk factors and
high-risk populations.
➢ Simulation Tools: Build computer-based simulation models to assess the impact of
lifestyle interventions on diabetes prevention, providing insights into effective public
health strategies.
➢ AI for Personalization: Utilize AI techniques (e.g., neural networks, reinforcement
learning) to design and optimize personalized diabetes management plans,
improving patient care and outcomes.

SCOPE:
This research will look at why more people are getting diabetes, especially those with less
money, and find ways to prevent and manage it better. It will also see how computers and
new technologies can help, like using data to predict who might get diabetes or how to treat
it best. The focus will be on simple, practical solutions that can work for everyone, no matter
where they live. While we won't go into very technical details, we'll explore how these ideas
can make a big difference in fighting diabetes around the world.

MOTIVATION:

The motivation behind this research stems from the urgent need to address the escalating
diabetes epidemic, which disproportionately affects individuals from disadvantaged
backgrounds. By understanding the root causes of diabetes and exploring innovative
approaches to prevention and management, we aim to reduce the burden of this chronic
disease on individuals, families, and healthcare systems. Moreover, harnessing the potential
of computer systems and predictive analytics offers promising avenues to revolutionize
diabetes care, making it more accessible, personalized, and effective for everyone.
Ultimately, this research is driven by the desire to improve health outcomes and promote
equity in healthcare, ensuring that all individuals, regardless of their socioeconomic status,
have the opportunity to live healthier lives free from the burden of diabetes.

PROPOSED METHODOLOGY:
1. Dataset Collection
➢ Sources
   • Physical Examination Data
   • Follow-up Data
2. Data Preprocessing
➢ Physical Examination Data:
   • Dealing with Missing Values: Handle missing data points using appropriate
     imputation methods.
   • Encoded Text Features: Convert categorical text data into numerical format using
     encoding techniques such as one-hot encoding or label encoding.
   • Remove Abnormal Values: Identify and remove outliers and abnormal values to
     ensure data quality.
   • Delete Duplicate Samples: Remove any duplicate entries to maintain data integrity.
➢ Follow-up Data:
   • Delete Duplicate Samples: Remove duplicate records to ensure consistency.

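The preprocessing steps listed under step 2 can be sketched with pandas as below. This is a minimal illustration only: the column names, the sample values, the median imputation, and the plausible-age range of 0 to 120 are assumptions made for the example, not choices stated in this proposal.

```python
import pandas as pd

# Hypothetical physical-examination records; columns and values are illustrative.
raw = pd.DataFrame({
    "age":    [45, 52, None, 45, 230, 61],
    "gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "fbg":    [5.6, 7.9, 6.1, 5.6, 6.0, None],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Dealing with missing values: median imputation (one of several possible methods).
    for col in ["age", "fbg"]:
        out[col] = out[col].fillna(out[col].median())
    # Encoded text features: label-encode the categorical column.
    out["gender"] = out["gender"].map({"Female": 0, "Male": 1})
    # Remove abnormal values: keep only ages inside an assumed plausible range.
    out = out[out["age"].between(0, 120)]
    # Delete duplicate samples.
    out = out.drop_duplicates()
    return out.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

In this toy frame the record with age 230 is dropped as abnormal and the repeated first record is removed as a duplicate, leaving four rows with no missing values.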
3. Feature Fusion
➢ Combine Different Types of Data:
   • Demographics: Include demographic information such as age, gender, and ethnicity.
   • Vital Signs: Integrate vital signs data like blood pressure, heart rate, etc.
   • Laboratory Values: Incorporate laboratory test results (e.g., blood glucose levels,
     cholesterol levels).
   • Other Features: Include MSP (Medical Symptom Profile), MDP (Medical Diagnosis
     Profile), BMI (Body Mass Index), FBG (Fasting Blood Glucose), PS (Physical
     Status), and MA (Medical Assessment).
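One simple way to read feature fusion is as a join of the separate data tables on a shared patient key. The sketch below assumes a hypothetical `patient_id` column and invented values. It uses an inner join, so only patients present in every table are kept, and that is a design choice for the example, not something this proposal specifies.

```python
import pandas as pd

# Hypothetical per-source tables keyed by an assumed patient_id column.
demographics = pd.DataFrame({"patient_id": [1, 2, 3], "age": [45, 52, 61]})
vitals = pd.DataFrame({"patient_id": [1, 2, 3], "systolic_bp": [120, 145, 138]})
labs = pd.DataFrame({"patient_id": [1, 2], "fbg": [5.6, 7.9]})

# Fuse the sources into one feature table, one row per patient.
fused = (
    demographics
    .merge(vitals, on="patient_id", how="inner")
    .merge(labs, on="patient_id", how="inner")
)
print(fused)
```

Patient 3 has no laboratory record here, so the inner join drops that row. A left join followed by imputation would keep it instead.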
4. Feature Selection
➢ Methods:
   • MI (Mutual Information): Measure the mutual dependence between variables.
   • ANOVA (Analysis of Variance): Use statistical analysis to identify significant
     features.
   • GI (Gini Index): Employ Gini index to assess the purity of splits in the data.
➢ Strategy:
   • IFS (Incremental Feature Selection): Gradually add features based on their importance
     and performance to build an optimal feature set.
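The three ranking methods and the incremental strategy above can be sketched with scikit-learn as follows. The data is synthetic (`make_classification`), the helper name `incremental_feature_selection` is hypothetical, and using cross-validated logistic-regression accuracy as the selection criterion is an assumption. The Gini ranking here is taken from random forest impurity-based importances, which is one common reading of "Gini index" for feature ranking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the fused feature table.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Three ways to score each feature.
mi_scores = mutual_info_classif(X, y, random_state=0)   # MI
f_scores, _ = f_classif(X, y)                           # ANOVA F-statistic
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gini_scores = forest.feature_importances_               # Gini-based importance


def incremental_feature_selection(X, y, scores):
    """Add features in descending score order and keep the best-performing prefix."""
    order = np.argsort(scores)[::-1]
    best_k, best_acc = 0, 0.0
    for k in range(1, len(order) + 1):
        model = LogisticRegression(max_iter=1000)
        acc = cross_val_score(model, X[:, order[:k]], y, cv=5).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    return order[:best_k], best_acc


selected, acc = incremental_feature_selection(X, y, mi_scores)
print("selected feature indices:", selected.tolist())
print("cross-validated accuracy:", round(acc, 3))
```

The same helper can be called with `f_scores` or `gini_scores` to compare the feature sets each ranking produces.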
5. Classification
➢ Algorithms:
   • XGBoost (Extreme Gradient Boosting): Use this ensemble learning method for
     classification.
   • LR (Logistic Regression): Apply logistic regression for predicting the probability of
     diabetes.
   • RF (Random Forest): Utilize random forest for robust classification using multiple
     decision trees.
➢ Models:
   • Diabetes Risk Assessment Model: Use XGBoost, LR, and RF to develop the risk
     assessment model.
   • Diabetes Risk Score Card: Develop a scoring system using logistic regression for easy
     interpretation.
   • Follow-up Record-Based Model: Implement logistic regression to predict outcomes
     based on follow-up data.
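A minimal comparison of the classifiers named above might look like the sketch below. It runs on synthetic data, and scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost so the example depends on a single library. The train/test split, the hyperparameters, and the choice of ROC AUC as the metric are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected feature set.
X, y = make_classification(n_samples=400, n_features=10, n_informative=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stand-in for XGBoost, not the XGBoost library itself.
    "GB": GradientBoostingClassifier(random_state=42),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Score on held-out data using the predicted probability of the positive class.
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(name, round(auc[name], 3))
```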
Model Outputs
➢ Diabetes Risk Assessment Model: Provides a comprehensive risk assessment using
advanced machine learning algorithms.
➢ Diabetes Risk Score Card: Offers a simplified scoring method for quick risk
evaluation.
➢ Follow-up Record-Based Model: Tracks and predicts patient outcomes based on
follow-up data.
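One common way to turn a fitted logistic regression into a score card is to rescale its coefficients to integer points. The sketch below assumes this construction. The coefficient values and risk-factor names are invented for illustration and are not results from this work.

```python
# Hypothetical logistic-regression coefficients (log-odds per binary risk factor).
coefficients = {
    "age_over_45": 0.70,
    "bmi_over_30": 1.05,
    "family_history": 0.35,
    "hypertension": 0.70,
}


def build_score_card(coefs):
    """Scale coefficients so the smallest effect is worth one point."""
    unit = min(abs(b) for b in coefs.values())
    return {name: round(b / unit) for name, b in coefs.items()}


def risk_score(card, patient):
    """Sum the points of every risk factor present in the patient record."""
    return sum(points for name, points in card.items() if patient.get(name))


score_card = build_score_card(coefficients)
patient = {"age_over_45": True, "bmi_over_30": True, "family_history": False}
print(score_card)
print(risk_score(score_card, patient))
```

Integer points are less precise than the model's predicted probability, but they can be added up by hand, which is what makes a score card suitable for quick risk evaluation.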

REFERENCES:

1. American Diabetes Association. (2021). Standards of Medical Care in Diabetes—

2021. Diabetes Care, 44(Supplement 1), S1-S232.
2. International Diabetes Federation. (2019). IDF Diabetes Atlas, 9th Edition. Brussels,
Belgium: International Diabetes Federation.
3. World Health Organization. (2016). Global Report on Diabetes. Geneva, Switzerland:
World Health Organization.
4. C. Cristelo, C. Azevedo, J.M. Marques, R. Nunes, B. Sarmento, SARS-CoV-2 and diabetes:
new challenges for the disease, Diabetes Res. Clin. Pract. 164 (2020), 108228.
5. R.L. Thomas, S. Halim, S. Gurudas, S. Sivaprasad, D.R. Owens, IDF Diabetes Atlas: a review
of studies utilising retinal photography on the global prevalence of diabetes related
retinopathy between 2015 and 2018, Diabetes Res. Clin. Pract. 157 (2019), 107840.
