
241410

The document discusses using machine learning algorithms to predict diabetes at an early stage. It proposes a smart healthcare framework called HealthEdge that uses medical sensors and IoT devices to analyze diabetes risk factors and predict type 2 diabetes. The framework trains models in the cloud and uses them for prediction at the edge level to address diabetes in an integrated IoT-edge-cloud system.

Uploaded by

Aqsa Aqqa

Name : Muhammad Haseeb
Roll No : 241410
Class : MSCS 1st(2024-26)
Department : Computer Science
Assignment : Data Mining
Submitted to : Dr. Uzma Jameel

Assignment # 1
1. Dataset name:
Sylhet dataset

2. Link to dataset:

Early Stage Diabetes Risk Prediction - UCI Machine Learning Repository

3. Domain:

Healthcare
4. What is data about:

Signs and symptoms of newly diabetic or would-be diabetic patients.


5. No. of features in dataset d?

16
6. Which data mining technique is performed:
Random Forest (RF) and Logistic Regression (LR)
7. Size of dataset, n=?

520 instances
8. No. of classes:
2 (Positive and Negative)

9. Source of dataset:

This dataset was collected using direct questionnaires from patients at the Sylhet
Diabetes Hospital in Sylhet, Bangladesh and was approved by a doctor. It was introduced in
“Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques” by M. M.
F. Islam, Rahatara Ferdousi, Sadikur Rahman, and Humayra Yasmin Bushra (2019).

Sr. No.: 1
Reference: World Health Organization. Top 10 causes of death globally 2020. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death#:~:text=The%20top%20global%20causes%20of,birth%20asphyxia%20and%20birth%20trauma%2C (accessed November 12, 2022).
Problem Addressed: A machine learning-based smart healthcare framework for diabetes prediction in an integrated IoT-edge-cloud computing system.
Techniques/Models: 1. Random Forest (RF); 2. Logistic Regression (LR)
Results: 0.97 to 0.98
Limitations / Future Works: A large spectrum of machine learning and deep learning algorithms will be considered to evaluate the proposed system.
Link to the article: 1st

Sr. No.: 2
Reference: Ismail L, Materwala H. IDMPF: intelligent diabetes mellitus prediction framework using machine learning. Applied Computing and Informatics 2021. https://doi.org/10.1108/ACI-10-2020-0094.
Problem Addressed: To address this challenge by leveraging big data mining techniques to develop predictive models that can effectively identify individuals at high risk of diabetes based on comprehensive health data.
Techniques/Models: 1. Feature Selection; 2. Machine Learning Models
Results: Approximately 0.8768
Limitations / Future Works: We aim to focus on the progress of algorithms in related fields and to apply more novel and efficient algorithms such as deep neural networks to solve current problems. We intend to collect more data, such as lifestyle and image data, improve the quality of data collection, update the system and build more reliable models.
Link to the article: 2nd

Sr. No.: 3
Reference: Types of diabetes. https://www.idf.org/aboutdiabetes/what-is-diabetes.html. Accessed 23 Mar 2021.
Problem Addressed: Type 2 diabetes using machine learning algorithms
Techniques/Models: 1. Feature Selection Algorithm; 2. Machine Learning Algorithm; 3. Unified Setup; 4. Evaluation Metrics
Results: 0.76 to 0.79
Limitations / Future Works: To capture complex patterns and dependencies within large-scale healthcare datasets. Deep learning techniques have shown promise in various medical applications and may offer improved performance in diabetes risk prediction tasks.
Link to the article: 3rd

Sr. No.: 4
Reference: Shaw, J.; Sicree, R.; Zimmet, P. Global estimates of
Problem Addressed: To address the challenge of early detection and prediction
Techniques/Models: 1. Machine learning; 2. SVM; 3. XGBoost
Results: 78%
Limitations / Future Works: predicting whether an individual will develop T2D in the following year (Y +
Link to the article: 4th
Assignment # 3

INTRODUCTION:
Diabetes, one of the top 10 causes of death worldwide, is a disease characterized by
increased blood sugar levels. According to the International Diabetes Federation, 537 million
adults worldwide were living with diabetes in 2021, and the disease caused 6.7 million deaths.
The number of people with diabetes is projected to reach 643 million by 2030 and 783
million by 2045. Diabetes develops in an individual through a dynamic interaction between
different risk factors such as sleep duration, alcohol consumption, dyslipidemia, physical
inactivity, serum uric acid, obesity, hypertension, cardiovascular disease, family history of
diabetes, ethnicity, depression, age, and gender. If not treated at an early stage, diabetes can
lead to severe complications. Machine learning has therefore gained wide attention for
predicting diabetes from risk factor data. However, these works focus on stand-alone
diabetes prediction. To the best of our knowledge, no work proposes a smart healthcare
framework for diabetes prediction. To address this gap, we propose HealthEdge, a machine
learning-based smart healthcare framework for the prediction of type 2 diabetes in an
integrated IoT-edge-cloud computing system. The proposed system analyzes diabetes risk
factors using medical sensors and devices and predicts the incidence of diabetes in an
individual. The machine learning model is trained in the cloud, and the trained model is then
used by edge servers for diabetes prediction.
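The cloud-train, edge-predict split described above can be illustrated with a small sketch. This is not the HealthEdge implementation: the function names (train_in_cloud, predict_at_edge), the JSON hand-off, the plain gradient-descent logistic regression, and the six toy records are all assumptions made for illustration, using only the Python standard library.

```python
import json
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_in_cloud(rows, labels, epochs=2000, lr=0.1):
    """'Cloud' side (hypothetical): fit logistic regression by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    # Serialise the trained parameters so they can be shipped to edge servers.
    return json.dumps({"weights": weights, "bias": bias})


def predict_at_edge(model_json, reading):
    """'Edge' side (hypothetical): load the shipped model and score one reading."""
    model = json.loads(model_json)
    z = sum(w * v for w, v in zip(model["weights"], reading)) + model["bias"]
    return sigmoid(z)


# Made-up toy data: [polyuria, polydipsia] as 0/1 flags; label 1 = diabetic.
ROWS = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
LABELS = [1, 1, 1, 0, 1, 0]

MODEL_JSON = train_in_cloud(ROWS, LABELS)
RISK_HIGH = predict_at_edge(MODEL_JSON, [1, 1])  # both symptoms reported
RISK_LOW = predict_at_edge(MODEL_JSON, [0, 0])   # no symptoms reported
```

Only the serialized parameters cross from cloud to edge in this sketch, so the edge server needs no training code and can score a reading with a single dot product.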
The pathophysiology of diabetes involves complex interactions between genetic,
environmental, and lifestyle factors. The chronic hyperglycemia of diabetes is associated with
long-term damage, dysfunction, and failure of various organs, particularly the eyes, kidneys,
nerves, heart, and blood vessels. Complications such as diabetic retinopathy, nephropathy,
neuropathy, and cardiovascular diseases significantly impact the quality of life and mortality
rates of individuals with diabetes.
Early diagnosis and effective management are crucial in preventing or delaying the onset of
complications. Management strategies include lifestyle modifications, glucose monitoring,
pharmacotherapy, and, in some cases, insulin therapy. Recent advancements in diabetes
research have led to improved treatment modalities and a better understanding of the disease
mechanisms, offering hope for more effective interventions in the future.

PROBLEM BACKGROUND:
Diabetes is a fast-growing health problem worldwide, mainly due to changes in lifestyle,
more people living in cities, and an aging population. The increase in obesity, especially belly
fat, is a big reason for the rise in Type 2 diabetes. This is connected to less physical activity
and unhealthy eating habits. Genetics also play a part, but lifestyle and environment are key
factors. Poorer communities face higher risks because they have less access to healthy food,
healthcare, and places to exercise. Complications from diabetes, like eye, kidney, and heart
problems, greatly affect people's lives and put pressure on healthcare systems. Solving this
issue needs a mix of public health efforts, policy changes, and better access to healthcare to
catch and manage diabetes early.

PROBLEM STATEMENT:

Diabetes is becoming a major health problem worldwide due to rising obesity, less physical
activity, and unhealthy diets. Many people, especially those on low incomes, struggle to get
good care, which makes the problem worse. This research aims to identify the factors driving
the rise in diabetes and to suggest ways to prevent and manage it better.

RESEARCH QUESTION:

1. How can machine learning algorithms predict the onset of diabetes using patient health
records and lifestyle data?
2. What are the most effective data analytics techniques for identifying high-risk populations
for diabetes?
3. How can computer-based models be used to simulate the impact of various lifestyle
interventions on diabetes prevention?
4. What role can artificial intelligence play in personalizing diabetes management plans for
individuals based on their health data?

RESEARCH OBJECTIVE:

• Data Collection and Integration: Gather comprehensive health records and lifestyle
data from diverse sources, ensuring a robust dataset for analysis.
• Advanced Machine Learning Models: Develop and train machine learning models
using various algorithms (e.g., XGBoost, Logistic Regression, Random Forest) to
predict diabetes onset and identify high-risk individuals.
• Data Analytics Techniques: Employ sophisticated data analytics methods (e.g.,
clustering, regression analysis, pattern recognition) to uncover key risk factors and
high-risk populations.
• Simulation Tools: Build computer-based simulation models to assess the impact of
lifestyle interventions on diabetes prevention, providing insights into effective public
health strategies.
• AI for Personalization: Utilize AI techniques (e.g., neural networks, reinforcement
learning) to design and optimize personalized diabetes management plans,
improving patient care and outcomes.

SCOPE:
This research will look at why more people are getting diabetes, especially those with less
money, and find ways to prevent and manage it better. It will also see how computers and
new technologies can help, like using data to predict who might get diabetes or how to treat
it best. The focus will be on simple, practical solutions that can work for everyone, no matter
where they live. While we won't go into very technical details, we'll explore how these ideas
can make a big difference in fighting diabetes around the world.

MOTIVATION:

The motivation behind this research stems from the urgent need to address the escalating
diabetes epidemic, which disproportionately affects individuals from disadvantaged
backgrounds. By understanding the root causes of diabetes and exploring innovative
approaches to prevention and management, we aim to reduce the burden of this chronic
disease on individuals, families, and healthcare systems. Moreover, harnessing the potential
of computer systems and predictive analytics offers promising avenues to revolutionize
diabetes care, making it more accessible, personalized, and effective for everyone.
Ultimately, this research is driven by the desire to improve health outcomes and promote
equity in healthcare, ensuring that all individuals, regardless of their socioeconomic status,
have the opportunity to live healthier lives free from the burden of diabetes.

PROPOSED METHODOLOGY:
1. Dataset Collection
• Sources
  - Physical Examination Data
  - Follow-up Data
2. Data Preprocessing
• Physical Examination Data:
  - Dealing with Missing Values: Handle missing data points using appropriate
    imputation methods.
  - Encoded Text Features: Convert categorical text data into numerical format using
    encoding techniques such as one-hot encoding or label encoding.
  - Remove Abnormal Values: Identify and remove outliers and abnormal values to
    ensure data quality.
  - Delete Duplicate Samples: Remove any duplicate entries to maintain data integrity.
• Follow-up Data:
  - Delete Duplicate Samples: Remove duplicate records to ensure consistency.
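The preprocessing steps listed above can be sketched as follows. This is a minimal illustration with assumptions throughout: the column names, the valid age range, median imputation, label encoding, and the five toy records are hypothetical choices, and the steps run in a different order from the list (duplicates and abnormal values are removed first so that they do not distort the imputed median).

```python
from statistics import median


def preprocess(records, numeric_cols, categorical_cols, valid_ranges):
    """Deduplicate, drop abnormal values, impute, then label-encode."""
    # 1. Delete duplicate samples (keep the first occurrence, preserve order).
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # 2. Remove abnormal values: drop records outside the assumed valid ranges.
    #    Missing values (None) are left for the imputation step.
    cleaned = [
        r for r in unique
        if all(r[c] is None or lo <= r[c] <= hi
               for c, (lo, hi) in valid_ranges.items())
    ]

    # 3. Deal with missing values: fill None with the column median.
    for col in numeric_cols:
        fill = median(r[col] for r in cleaned if r[col] is not None)
        for r in cleaned:
            if r[col] is None:
                r[col] = fill

    # 4. Encode text features: map each category to an integer code.
    encoders = {}
    for col in categorical_cols:
        categories = sorted({r[col] for r in cleaned})
        encoders[col] = {cat: code for code, cat in enumerate(categories)}
        for r in cleaned:
            r[col] = encoders[col][r[col]]
    return cleaned, encoders


# Made-up records for illustration only.
RAW = [
    {"age": 40, "gender": "Male", "polyuria": "No"},
    {"age": 40, "gender": "Male", "polyuria": "No"},       # duplicate
    {"age": None, "gender": "Female", "polyuria": "Yes"},  # missing age
    {"age": 58, "gender": "Female", "polyuria": "Yes"},
    {"age": 430, "gender": "Male", "polyuria": "Yes"},     # abnormal age
]

CLEAN, ENCODERS = preprocess(
    RAW,
    numeric_cols=["age"],
    categorical_cols=["gender", "polyuria"],
    valid_ranges={"age": (0, 120)},  # assumed plausibility bound
)
```

Label encoding is enough for the binary Yes/No symptom columns shown here; for categorical features with more than two unordered values, one-hot encoding would avoid implying an order.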

3. Feature Fusion
• Combine Different Types of Data:
  - Demographics: Include demographic information such as age, gender, and ethnicity.
  - Vital Signs: Integrate vital signs data like blood pressure, heart rate, etc.
  - Laboratory Values: Incorporate laboratory test results (e.g., blood glucose levels,
    cholesterol levels).
  - Other Features: Include MSP (Medical Symptom Profile), MDP (Medical Diagnosis
    Profile), BMI (Body Mass Index), FBG (Fasting Blood Glucose), PS (Physical
    Status), and MA (Medical Assessment).
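One simple way to picture feature fusion is to merge per-patient records from each source on a shared patient identifier. The sketch below assumes such an identifier exists; the function name fuse_features, the field names, and the values are hypothetical, and patients missing from any source are dropped, which is only one of several possible policies.

```python
def fuse_features(*sources):
    """Merge per-patient feature dicts from several sources by patient ID.

    Only patients that appear in every source are kept.
    """
    fused = {}
    for source in sources:
        for patient_id, features in source.items():
            fused.setdefault(patient_id, {}).update(features)
    return {
        pid: feats for pid, feats in fused.items()
        if all(pid in source for source in sources)
    }


# Made-up example records keyed by a hypothetical patient ID.
DEMOGRAPHICS = {"P1": {"age": 45, "gender": "F"}, "P2": {"age": 61, "gender": "M"}}
VITAL_SIGNS = {"P1": {"systolic_bp": 130}}
LAB_VALUES = {"P1": {"fbg": 6.1}, "P2": {"fbg": 5.2}}

FUSED = fuse_features(DEMOGRAPHICS, VITAL_SIGNS, LAB_VALUES)
```

Here P2 is excluded because no vital signs were recorded; keeping such patients and imputing the missing block instead is an equally valid design choice.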
4. Feature Selection
• Methods:
  - MI (Mutual Information): Measure the mutual dependence between variables.
  - ANOVA (Analysis of Variance): Use statistical analysis to identify significant
    features.
  - GI (Gini Index): Employ Gini index to assess the purity of splits in the data.
• Strategy:
  - IFS (Incremental Feature Selection): Gradually add features based on their importance
    and performance to build an optimal feature set.
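The ranking-then-IFS idea above can be sketched for discrete features as follows. The sketch covers mutual information and the Gini index only (ANOVA is omitted for brevity), and the two toy features, the target, and the lookup-table evaluator are stand-ins invented for illustration; in practice the evaluator would be cross-validated performance of the chosen classifier.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_after_split(xs, ys):
    """Weighted Gini impurity after splitting on each value of a feature."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / len(xs) * gini(g) for g in groups.values())


def incremental_feature_selection(ranked, evaluate):
    """Add features in ranked order; keep the shortest prefix with the best score."""
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_score, best_subset = score, ranked[:k]
    return best_subset, best_score


# Made-up toy data: feature "a" determines the target, "b" is unrelated to it.
TARGET = [1, 1, 0, 0]
FEATURES = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0]}


def lookup_accuracy(subset):
    """Stand-in evaluator: training accuracy of a majority-vote lookup table."""
    groups = {}
    for i, y in enumerate(TARGET):
        key = tuple(FEATURES[name][i] for name in subset)
        groups.setdefault(key, []).append(y)
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return correct / len(TARGET)


# Rank by mutual information (highest first), then run IFS over that ranking.
RANKED = sorted(FEATURES, key=lambda f: mutual_information(FEATURES[f], TARGET),
                reverse=True)
SELECTED, SCORE = incremental_feature_selection(RANKED, lookup_accuracy)
```

Because the comparison in incremental_feature_selection is strict, a feature that does not improve the score is not added, so the selected set stays as small as possible.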
5. Classification
• Algorithms:
  - XGBoost (Extreme Gradient Boosting): Use this ensemble learning method for
    classification.
  - LR (Logistic Regression): Apply logistic regression for predicting the probability of
    diabetes.
  - RF (Random Forest): Utilize random forest for robust classification using multiple
    decision trees.
• Models:
  - Diabetes Risk Assessment Model: Use XGBoost, LR, and RF to develop the risk
    assessment model.
  - Diabetes Risk Score Card: Develop a scoring system using logistic regression for easy
    interpretation.
  - Follow-up Record-Based Model: Implement logistic regression to predict outcomes
    based on follow-up data.
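A logistic-regression score card is typically obtained by scaling each fitted coefficient to an integer number of points. The sketch below shows that conversion under loud assumptions: the coefficients are invented for illustration and are not fitted results, and the scaling factor of 10 points per unit of log-odds and the function names are hypothetical.

```python
def build_score_card(coefficients, points_per_unit=10):
    """Turn log-odds coefficients into rounded integer points."""
    return {
        name: round(coef * points_per_unit)
        for name, coef in coefficients.items()
    }


def risk_score(card, patient):
    """Sum the points of every risk factor the patient reports (flag == 1)."""
    return sum(points for name, points in card.items() if patient.get(name) == 1)


# Hypothetical coefficients, NOT estimated from any dataset.
COEFFICIENTS = {
    "polyuria": 2.1,
    "polydipsia": 1.8,
    "sudden_weight_loss": 0.9,
    "alopecia": -0.4,
}

CARD = build_score_card(COEFFICIENTS)
PATIENT = {"polyuria": 1, "polydipsia": 0, "sudden_weight_loss": 1, "alopecia": 1}
SCORE = risk_score(CARD, PATIENT)
```

The integer points keep the ordering of the underlying coefficients, so a clinician can add them by hand; any cut-off separating low risk from high risk would still have to be chosen from validation data.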
Model Outputs
• Diabetes Risk Assessment Model: Provides a comprehensive risk assessment using
advanced machine learning algorithms.
• Diabetes Risk Score Card: Offers a simplified scoring method for quick risk
evaluation.
• Follow-up Record-Based Model: Tracks and predicts patient outcomes based on
follow-up data.

REFERENCES:

1. American Diabetes Association. (2021). Standards of Medical Care in Diabetes-2021.
Diabetes Care, 44(Supplement 1), S1-S232.
2. International Diabetes Federation. (2019). IDF Diabetes Atlas, 9th Edition. Brussels,
Belgium: International Diabetes Federation.
3. World Health Organization. (2016). Global Report on Diabetes. Geneva, Switzerland:
World Health Organization.
4. Cristelo, C., Azevedo, C., Marques, J. M., Nunes, R., & Sarmento, B. (2020). SARS-CoV-2
and diabetes: New challenges for the disease. Diabetes Research and Clinical Practice, 164,
108228.
5. Thomas, R. L., Halim, S., Gurudas, S., Sivaprasad, S., & Owens, D. R. (2019). IDF Diabetes
Atlas: A review of studies utilising retinal photography on the global prevalence of diabetes
related retinopathy between 2015 and 2018. Diabetes Research and Clinical Practice, 157,
107840.
