A Survey On Diabetes Risk Prediction Using Machine.50

This document summarizes a survey on using machine learning approaches to predict diabetes risk. It finds that machine learning classification algorithms like support vector machine, K-nearest neighbor, and random forest show the best accuracy for early diabetes prediction. The study addresses assessing various supervised and unsupervised machine learning methods on diabetes datasets to identify the most precise predictive model. It concludes that early diabetes detection is important for effective treatment, and machine learning can help identify people who may have diabetes but are unaware.

Original Article

A survey on diabetes risk prediction using machine learning approaches
Downloaded from http://journals.lww.com/jfmpc by BhDMf5ePHKav1zEoum1tQfN4a+kJLhEZgbsIHo4XMi0hCywCX1A

Shimoo Firdous1, Gowher A. Wagai2, Kalpana Sharma1


WnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8KKGKV0Ymy+78= on 02/08/2024

1Department of Computer Science, Bhagwant University, Ajmer, Rajasthan, 2Department of Medicine‑Associated Hospital GMC, Anantnag, Jammu and Kashmir, India

Abstract
Background: Diabetes mellitus (DM) is a chronic condition that can lead to a variety of complications. Contributing factors include age, lack of exercise, a sedentary lifestyle, family history of diabetes, high blood pressure, depression and stress, and poor diet. Diabetics are at a higher risk of developing diseases such as heart disease, nerve damage (diabetic neuropathy), eye problems (diabetic retinopathy), kidney disease (diabetic nephropathy), and stroke. According to the International Diabetes Federation, 382 million people worldwide suffer from diabetes, and this number is projected to rise to 592 million by 2035. Every day, a large number of people develop the disease, and many are unaware that they have it. It primarily affects individuals between the ages of 25 and 74 years. If diabetes is left undiagnosed and untreated, it can lead to a slew of complications. The emergence of machine learning approaches can help address this crucial issue.
Aims and Objectives: The aim was to study DM, one of the most serious metabolic disorders in the world today, and to analyze how machine learning algorithms are used to identify it at an early stage.
Methods and Materials: Data were obtained from databases such as PubMed, IEEE Xplore, and INSPEC, and from other primary and secondary sources that report machine learning methods used in healthcare to predict diabetes at an early stage.
Results: After surveying various research papers, it was found that machine learning classification algorithms such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF) show the best accuracy for predicting diabetes at an early stage.
Conclusion: Early detection of diabetes is critical for effective therapy. Many people have no idea whether or not they have it. This paper provides an assessment of machine learning approaches for early diabetes prediction and of how a variety of supervised and unsupervised machine learning algorithms can be applied to a data set to achieve the best accuracy. Furthermore, the work will be expanded and refined to create a more precise and general predictive model for diabetes risk prediction at an early stage. Different metrics can be used to assess performance and to support accurate diabetes diagnosis.

Keywords: Accuracy, classification, diabetes mellitus, machine learning algorithm

Address for correspondence: Dr. Gowher A. Wagai, Al‑Jowher Diabetes Care and Scan Centre, Nazuk Mohalla, Cheeni Chowk, Anantnag ‑ 192 101, Jammu and Kashmir, India.
E‑mail: [email protected]

Received: 02‑03‑2022 Revised: 24‑06‑2022 Accepted: 26‑06‑2022 Published: 16-12-2022

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution‑NonCommercial‑ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non‑commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

For reprints contact: [email protected]

Access this article online
Website: www.jfmpc.com
DOI: 10.4103/jfmpc.jfmpc_502_22

How to cite this article: Firdous S, Wagai GA, Sharma K. A survey on diabetes risk prediction using machine learning approaches. J Family Med Prim Care 2022;11:6929-34.

© 2022 Journal of Family Medicine and Primary Care | Published by Wolters Kluwer ‑ Medknow 6929
Firdous, et al.: Survey on diabetes risk prediction using ML approaches

Introduction

Diabetes mellitus is a metabolic disorder defined by abnormally high blood sugar levels due to a lack of insulin secretion or a combination of insulin resistance and insufficient insulin synthesis to compensate.[1] It is a progressive metabolic ailment that affects all parts of the patient’s life, including physical and mental well‑being, and no therapy technique can produce spectacular improvements or stop the disease from progressing.[2] In the year 2000, India had the greatest number of diabetics in the world (31.7 million), which increased to 62.4 million in 2011 and is anticipated to reach 69.9 million by 2025.[3,4] Rapid urbanization and economic development are to blame for India’s high prevalence. Indians are more likely to develop diabetes as a result of their low BMI combined with high upper‑body adiposity, high body fat percentage, and high insulin resistance.[5] Blurred vision, weight loss, fatigue, increased hunger and thirst, confusion, frequent urination, poor healing, frequent infections, and difficulty concentrating are all signs or symptoms of diabetes. “Diabetes means you have too much sugar in your blood. High blood sugar problems start when your body no longer makes enough of a chemical, or hormone, called insulin.”[6] The direct translation of the term is “sweet urine.” Normal urine does not contain sugar. There is sugar (or more precisely glucose) in the urine because the amount of glucose in the blood has increased to the point where it spills over into the urine. Because the body is unable to metabolize glucose properly, it accumulates in the bloodstream. As a result, diabetes is a disease in which the body is unable to use glucose properly.

This paper reviews machine learning algorithms used for the early prediction of diabetes. The remainder of the paper is organized as follows: section (1) is the introduction; section (2) is diabetes and its types; section (3) is machine learning algorithms; section (4) is the literature survey for prediction of diabetes; and section (5) is the conclusion.

Diabetes and its Types

Diabetes mellitus (DM) is a metabolic disease with a variety of causes. It is characterized by persistent hyperglycemia and alterations in carbohydrate, lipid, and protein metabolism caused by insulin deficiency, impaired insulin action, or both. Diabetes is a chronic disease. If not effectively treated, it can damage nerves and blood vessels in the eyes, kidneys, heart, and lower legs. Problems may emerge if blood glucose levels remain high for an extended period. Gum disease and tooth decay are examples of mouth issues. Diabetic retinopathy is a condition that causes vision loss and, in severe cases, blindness. Heart and blood vessel illnesses (cardiovascular diseases, or CVD) include heart attacks, strokes, and peripheral artery disease (insufficient blood supply to the feet and legs). Kidney disease (diabetic nephropathy) is a condition in which the kidneys do not function properly or at all.[7] The three kinds of diabetes are type 1 diabetes, type 2 diabetes, and gestational diabetes.

Type 1 diabetes (T1D)
The body does not produce enough insulin in type 1 diabetes. Body cells can’t absorb glucose from the bloodstream without insulin, so they have to rely on other sources of energy. An excess of glucose in the blood causes diabetes and its complications. This type of diabetes is also known as insulin‑dependent diabetes mellitus (IDDM). Although it affects adolescents and teenagers more frequently, it can affect anyone at any age. It requires a delicate balancing act of insulin injections (and, in some circumstances, oral medicines), exercise, nutrition planning, and lifestyle changes. Frequent urination, unusual thirst, unusual hunger, rapid weight loss, weariness and weakness, nausea, and irritability are some of the symptoms of type 1 diabetes.

Type 2 diabetes (T2D)
Type 2 diabetes can range from primarily insulin resistance to mostly secretory dysfunction with or without insulin resistance. The pancreas produces insulin, but it may not be enough to keep blood glucose levels normal, or the cells may be resistant to the insulin produced. The illness is most prevalent in those over the age of 40, but it is also becoming more prevalent in teenagers and young children. Type 2 diabetes is characterized by drowsiness, dry, itchy skin, unintended weight gain or loss, blurred vision, tingling, numbness, pain in the lower legs, easy weariness, sluggish healing of cuts or scratches, and frequent infections (e.g., vaginal infections). Diet, activity, lifestyle control, and, in some situations, oral medicines or insulin are all necessary for type 2 diabetes.[6]

Gestational diabetes
Pregnant women who have never had diabetes before but have high blood glucose (sugar) levels during pregnancy are diagnosed with gestational diabetes. It is a temporary condition that affects 2%–4% of all pregnant women and usually disappears after the baby is born. Women who have had gestational diabetes in the past are more likely to develop type 2 diabetes later in life. The exact etiology of this type of diabetes is not known. The placenta supports the infant’s development; placental hormones help the baby develop, but they also prevent the mother’s insulin from working properly in her body, resulting in insulin resistance. When a mother’s body is unable to create and use all of the insulin required during pregnancy, gestational diabetes develops. The majority of women are unaware of any signs or symptoms of gestational diabetes. Increased thirst and more frequent urination are two symptoms.

Diabetes mellitus is a deadly disorder if not treated early; however, early detection can reduce the risk significantly. A range of medical diagnostic procedures is already in use for early diagnosis. Early risk forecasts can be made using machine learning techniques. Recent research has given promising results in terms of forecasting the risk of diabetes mellitus. Machine learning is a field of study in which algorithms are used to teach machines without explicit human instruction. Without having to explicitly program them, we can train them to do a given job and then use that training to handle similar tasks. Accuracy is always a major concern in medical science, and different algorithms might yield varying degrees of accuracy on the same data set. To design a better classifier, it is vital to determine which algorithm delivers the best results. Machine learning can now be found in almost every industry. Its application in medical science has the potential to improve healthcare dramatically.

Decision trees, random forests, support vector machines, naive Bayes classifiers, and artificial neural networks are examples of machine learning and classification algorithms that work well in risk prediction. This is possible because of the algorithms’ computing and data management capabilities. Classification accuracy can be used to compare algorithms and select the best one. This statistic alone, however, is insufficient to properly and efficiently determine the best method.
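
The point that accuracy alone is insufficient can be illustrated with a small sketch. The Python below is not taken from any of the surveyed studies; the function names and the toy labels are hypothetical, chosen only to show how accuracy, precision, recall, F‑score, and the ROC value (area under the ROC curve) can be computed for a binary diabetic/non‑diabetic classifier.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, precision, recall and F-score for binary labels (1 = diabetic)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diabetic, predicted diabetic
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, predicted healthy
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted diabetic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diabetic, predicted healthy
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of (positive, negative)
    pairs in which the positive case receives the higher score (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 diabetic (1) and 8 non-diabetic (0) patients.
Y_TRUE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ALWAYS_NEGATIVE = [0] * 10                    # predicts "non-diabetic" for everyone
MODEL_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # finds both cases, with one false alarm
MODEL_SCORE = [0.9, 0.6, 0.7, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.3]

if __name__ == "__main__":
    print(classification_metrics(Y_TRUE, ALWAYS_NEGATIVE))
    print(classification_metrics(Y_TRUE, MODEL_PRED))
    print(roc_auc(Y_TRUE, MODEL_SCORE))
```

On these toy labels, the classifier that always predicts "non‑diabetic" reaches 80% accuracy while detecting no diabetic patient (recall and F‑score of 0), whereas the second classifier reaches 90% accuracy with an F‑score of 0.8 and an ROC value of 0.9375. On imbalanced data, accuracy by itself can therefore hide exactly the failures that matter clinically.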

Journal of Family Medicine and Primary Care 6930 Volume 11 : Issue 11 : November 2022

When determining the best method, other measures such as the receiver operating characteristic (ROC) value, F‑score, and computation time should be taken into account alongside classification accuracy. The findings of this study will help future researchers construct a baseline strategy for DM classification.

Machine Learning Algorithms

Machine learning (ML) is a rapidly developing field that is being applied in a variety of medical applications. ML models all learn from the past and make predictions based on a data set. Diabetes detection will become much easier and less expensive thanks to recent advances in ML. Numerous diabetes data sets are available, which makes ML well suited to medical diagnostics. The goal is to forecast a patient’s probability of developing diabetes using machine learning algorithms. There are two types of learning:
1) Supervised Learning
2) Unsupervised Learning.

The goal of a supervised learning algorithm is to predict a variable’s value from labeled data; the process resembles what a student learns from an instructor. The data are represented by a set of features, and the outcome of supervised learning is predetermined. Decision trees (DT), random forests, linear regression, logistic regression, naive Bayes classifiers, k‑nearest neighbors (k‑NN), support vector machine (SVM), and artificial neural networks (ANN) are some of the most commonly used techniques.

In unsupervised learning, on the other hand, the data are made up of values without labels, and the outcome is not predetermined. It is more like self‑learning based on previous experience, and the model makes its predictions on that basis. Forecasting, classifying, detecting, segmenting, and categorizing data are the key goals of these models. Machine learning applications include analysis, recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.

Literature Survey for Prediction of Diabetes using Machine Learning Approaches

Birjais et al.[8] experimented on the PIMA Indian Diabetes (PID) data set. It has 768 instances and 8 attributes and is available in the UCI machine learning repository. They focused on the diagnosis of diabetes, which, according to the World Health Organization (WHO) in 2014, is one of the world’s fastest‑growing chronic diseases. Gradient boosting, logistic regression, and naive Bayes classifiers were used to predict whether a person is diabetic or not, with gradient boosting having an accuracy of 86%, logistic regression 79%, and naive Bayes 77%.

Sadhu and Jadli[9] experimented on a diabetes data set taken from the UCI repository. There were 520 instances and 16 attributes in all. They concentrated on predicting diabetes at an early stage. On the validation set of this data set, seven classification techniques were implemented: k‑NN, logistic regression, SVM, naive Bayes, decision tree, random forests, and multilayer perceptron. The random forests classifier proved to be the best model for this data set, with an accuracy score of 98%; the multilayer perceptron also reached 98%, SVM and decision tree 94%, logistic regression 93%, and naive Bayes 91%.

Xue et al.[10] experimented on the diabetes data set taken from the UCI repository; there were 520 patients and 17 attributes in it. They concentrated on early detection of diabetes. They trained on the actual data of 520 diabetic patients and probable diabetic patients aged 16–90 using supervised ML techniques such as SVM, naive Bayes classifiers, and LightGBM. SVM performed best when comparing classification and recognition accuracy, with the highest accuracy rate of 96.54%. The naive Bayes classifier, described as the most widely used classification algorithm, had an accuracy of 93.27%, and LightGBM had an accuracy of only 88.46%. This suggests that SVM was the best of the three classification algorithms for diabetes prediction on this data set.

Le et al.[11] experimented on early‑stage diabetes risk prediction; the data set used in this research was taken from the UCI repository and consisted of 520 patients and 16 variables. They suggested an ML approach for predicting the early onset of diabetes. It was a new wrapper‑based feature selection method that employed grey wolf optimizer (GWO) and adaptive particle swarm optimization (APSO) to optimize the multilayer perceptron (MLP) and reduce the number of needed input attributes. They also compared the results obtained with this method to those obtained via a variety of traditional machine learning algorithms, including SVM, DT, k‑NN, naive Bayes classifier (NBC), random forest classifier (RFC), and logistic regression (LR). LR achieved a 95% accuracy rate, k‑NN 96%, SVM 95%, NBC 93%, DT 95%, and RFC 96%. The computational findings for the suggested methods show that not only are fewer features required but also that higher prediction accuracy may be attained (96% for GWO–MLP and 97% for APSO–MLP). This research has the potential to be applied in clinical practice and used as a tool to assist doctors and physicians.

Julius et al.[12] used the Waikato Environment for Knowledge Analysis (Weka) application platform to test a data set collected from the UCI repository. There were 520 samples in the data set, each with a collection of 17 attributes. The goal of this study was to use machine learning classification approaches based on observable sample attributes to predict diabetes at an early stage.
The k‑NN, SVM, functional tree (FT), and RFCs were employed as classifiers. k‑NN had the highest accuracy of 98%, followed by RF at 97%, SVM at 94%, and FT at 93%.

Shafi et al.[13] reported that because diabetes is a serious illness, early detection is always a struggle. This study used machine learning classification methods to develop a model that could address this problem and be used to identify diabetes development early on. The authors of this research made concerted efforts to develop a framework that could accurately predict the likelihood of diabetes in patients. As part of this study, three ML classification algorithms (DT, SVM, and NBC) were studied and assessed on various measures. In the study, the PID data set acquired from the UCI repository was used to save time and produce precise findings. The experimental results suggested that the NBC approach was adequate, with a 74% accuracy, followed by the DT with a 72% accuracy and SVM with a 63% accuracy. In the future, the built framework, as well as the ML classifiers used, could be used to identify or diagnose other diseases. The study, as well as several other ML methodologies, could be extended and improved for diabetes research, and the authors intended to classify other algorithms with missing data.

Khanam et al.[14] experimented with diabetes illness prediction. Diabetes is a condition with no known cure; therefore early detection is essential. In this study, data mining, ML techniques, and neural network (NN) methodologies were utilized to predict diabetes. They used data from the UCI repository’s PID data set, which included information on 768 patients and their 9 attributes. On the data set, they utilized seven ML methods to predict diabetes: DT, k‑NN, RFC, NBC, AdaBoost (AB), LR, and SVM. They used the Weka tool to preprocess the data. They discovered that a model combining LR and SVM is effective at predicting diabetes. They created a NN model with two hidden layers and varied epochs and found that the NN with two hidden layers gave 88.6% accuracy, compared with 78.85% for LR, 78.28% for NBC, and 77.34% for RFC.

Sisodia et al.[15] used the PID data set available on the UCI repository. This data set contained 768 patients and 8 attributes. They employed three ML classifiers to identify diabetic patients: DT, SVM, and NBC. NBC had the highest accuracy (76.30%) when compared to the other models.

Agrawal et al.[16] also used the PID data set, with 738 patients, in their study. To analyze the effectiveness of this data set for identifying diabetic patients, the authors applied models such as SVM, k‑NN, NBC, ID3, C4.5, and CART. The SVM and LDA algorithms were the most accurate, with an accuracy of 88%.

Rathore et al.[17] employed classification techniques like SVM and DTs to predict diabetes mellitus. The PID data set provided the data for this investigation. The PID data set contains records of female patients only, so the study focuses on women’s health. The SVM had an accuracy of 82%.

To predict diabetes mellitus, Hassan et al.[18] employed classification approaches such as the DT, k‑NN, and SVM. The SVM outperformed the DT and k‑NN methods with a maximum accuracy of 90.23%.

Kandhasamy and Balamurali[19] investigated the prediction accuracy of J48, k‑NN, RFC, and SVM on the diabetes data set. Before preprocessing the data, the authors found that the J48 method had a higher accuracy than the others, at 73.82%. After preprocessing, k‑NN and RFC demonstrated improved accuracy.

Meng et al.[20] examined J48, LR, and k‑NN algorithms on the diabetes data set. J48 was found to be the most accurate, with a classification accuracy of 78.27%.

Nai‑Arun and Moungmai[21] created a web application for diabetes risk prediction. They compared prediction methods such as DTs, NNs, LR, NBC, and RFC, as well as bagging and boosting. They discovered that RFC performed best in terms of accuracy and ROC score, with an accuracy of 85.558% and an ROC value of 0.912.

Saravananathan and Velmurugan[22] looked at J48, CART, SVM, and k‑NN on a medical data set in their research. They compared them based on accuracy, specificity, sensitivity, precision, and error rate. They found that J48 was the most accurate, with a score of 67.15%, followed by SVM (65.04%), CART (62.28%), and k‑NN (53.39%).

Kumari and Chitra[23] used SVM, RFC, DT, MLP, and LR, as well as four k‑fold cross‑validations (k = 2, 4, 5, 10) in their research. According to the researchers, MLP with four‑fold cross‑validation achieved the best accuracy, at 78.7%, and outscored all other algorithms.

To predict diabetes, Kavakiotis et al.[24] employed NBC, RFC, k‑NN, SVM, DT, and LR methods. The algorithms were applied using a ten‑fold cross‑validation technique. SVM had the best accuracy of all the approaches, measuring 84%, according to the study.

The work on the classification of “Diabetes Prediction” based on eight attributes was done by Rawat et al.[25] In this study, five ML algorithms for the analysis and prediction of diabetic patients were described: AdaBoost, LogicBoost, RobustBoost, naive Bayes, and bagging. A group of diabetic PIMA Indians was used to test the proposed strategies. The computed results were found to be quite accurate, with a classification accuracy of 81.77% and 79.69% for the bagging and AdaBoost techniques, respectively. The authors therefore regarded the proposed DM prediction algorithms as effective and efficient.

Using disease classifiers and an actual data set, Nai‑Arun and Moungmai[21] suggested a web application. The data for this component was collected from 30,122 people at Sawanpracharak Regional Hospital’s twenty‑six primary care units between 2012 and 2013. To identify a predictive model, thirteen classification
models were investigated before the web application was created. These models comprised the DT, NN, LR, and NBC algorithms, each on its own and combined with bagging and with boosting, together with the RFC method. Each model’s accuracy and ROC curves were calculated and compared to others to see how robust they were. According to the findings, RFC won in both accuracy and ROC curve. This could be owing to the variety the method introduces: in the RFC approach, not only were data and input variables chosen at random, but important variables were also taken into account, which raised the precision values. This algorithm was therefore chosen to represent diabetes risk prediction and was employed in the development of the application.

Perveen et al.[26] used a data set from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) database to do their research. The study employed the AdaBoost and bagging ensemble techniques using the J48 (C4.5) DT as a base learner, as well as standalone J48, to categorize patients with diabetes mellitus based on diabetes risk indicators. This categorization was done across three separate ordinal adult groups in the CPCSSN. In terms of overall performance, the AdaBoost ensemble method surpassed both bagging and a single J48 DT, according to the findings.

Mujumdar and Vaidehi[27] presented a diabetes prediction model for better diabetes classification that included a few extrinsic factors that caused diabetes, as well as regular components such as glucose, BMI, age, insulin, and so on. The new data set enhanced classification accuracy when compared to the old data set. Multiple ML approaches were used on the data set, and classification was done with a variety of algorithms, with LR yielding 96% accuracy. The AdaBoost classifier was found to be the most accurate, with a 98.8% accuracy rate. They used two separate data sets to compare the accuracy of ML techniques. When compared to the existing data set, it was clear that the model improved diabetes prediction accuracy and precision.

Mercaldo et al.[28] offered a strategy for classifying diabetic patients based on a set of features chosen according to the WHO criteria. They evaluated real-world data using state-of-the-art machine learning algorithms. The model was trained using six alternative classification approaches, with the Hoeffding Tree method scoring 0.770 in precision and 0.775 in recall. They used data from the PIMA Indian community in Phoenix, Arizona, to evaluate the method.

Conclusion

Early detection of diabetes is critical for effective therapy. Many people have no idea whether or not they have it. This paper provides an assessment of machine learning approaches for early diabetes prediction and of how a variety of supervised and unsupervised machine learning algorithms can be applied to a data set to achieve the best accuracy. Furthermore, the work will be expanded and refined to create a more precise and general predictive model for diabetes risk prediction at an early stage. Different metrics can be used to assess performance and to support accurate diabetes diagnosis.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

References

1. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care 1997;20:1183‑97.
2. Norris SL, Lau J, Smith SJ, Schmid CH, Engelgau MM. Self‑management education for adults with type 2 diabetes: A meta‑analysis of the effect on glycemic control. Diabetes Care 2002;25:1159‑71.
3. Shaw JE, Sicree RA, Zimmet PZ. Global estimates of the prevalence of diabetes for 2010 and 2030. Diabetes Res Clin Pract 2010;87:4‑14.
4. Anjana RM, Pradeepa R, Deepa M, Datta M, Sudha V, Unnikrishnan R, et al. Prevalence of diabetes and prediabetes (impaired fasting glucose and/or impaired glucose tolerance) in urban and rural India: Phase I results of the Indian Council of Medical Research India Diabetes (ICMRINDIAB) study. Diabetologia 2011;54:3022‑7.
5. Ramachandran A, Snehalatha C, Salini J, Vijay V. Use of glimepiride and insulin sensitizers in the treatment of type 2 diabetes‑a study in Indians. J Assoc Physicians India 2004;52:459‑63.
6. Wagai GA, Romshoo GJ. Adiposity contributes to poor glycemic control in people with diabetes mellitus, a randomized case study, in South Kashmir, India. J Family Med Prim Care 2020;:4623‑6.
7. AACE/ACE Position Statement on the Prevention, Diagnosis and Treatment of Obesity (1998 Revision). Endoc Practice 1998;4:297‑330.
8. Birjais R, Mourya AK, Chauhan R, Kaur H. Prediction and diagnosis of future diabetes risk: A machine learning approach. SN Appl Sci 2019;1:1‑8.
9. Sadhu A, Jadli A. Early‑stage diabetes risk prediction: A comparative analysis of classification algorithms. Int Adv Res J Sci Eng Technol (IARJSET) 2021;8:193‑201.
10. Xue J, Min F, Ma F. Research on diabetes prediction method based on machine learning. J Phys Conf Ser 2020;1684:1-6.
11. Le TM, Vo TM, Pham TN, Dao SV. A novel wrapper–based feature selection for early diabetes prediction enhanced with a metaheuristic. IEEE Access 2020;9:7869‑84.
12. Julius AO, Ayokunle AO, Ibrahim FO. Early diabetic risk prediction using machine learning classification techniques. Available from: https://ijisrt.com/early‑diabetic‑risk‑prediction‑using‑machine‑learning‑classification‑techniques.
13. Shafi S, Ansari GA. Early prediction of diabetes disease & classification of algorithms using machine learning approach. In Proceedings of the International Conference on Smart Data Intelligence (ICSMDI 2021). Available from: SSRN 3852590 (2021).
14. Khanam JJ, Foo SY. A comparison of machine learning algorithms for diabetes prediction. ICT Express 2021;7:432‑9.
15. Sisodia D, Sisodia DS. Prediction of diabetes using classification algorithms. Procedia Comput Sci 2018;132:1578‑85.
16. Agrawal P, Dewangan AK. A brief survey on the techniques used for the diagnosis of diabetes‑mellitus. Int Res J Eng Tech IRJET 2015;2:1039‑43.
17. Rathore A, Chauhan S, Gujral S. Detecting and predicting diabetes using supervised learning: An approach towards better healthcare for women. Int J Adv Res Comput Sci 2017;8:1192‑4.
18. Hassan AS, Malaserene I, Leema AA. Diabetes mellitus prediction using classification techniques. Int J Innov Technol Explor Eng 2020;9:2080‑4.
19. Kandhasamy JP, Balamurali S. Performance analysis of classifier models to predict diabetes mellitus. Procedia Comput Sci 2015;47:45‑51.
20. Meng XH, Huang YX, Rao DP, Zhang Q, Liu Q. Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. Kaohsiung J Med Sci 2013;29:93‑9.
21. Nai‑Arun N, Moungmai R. Comparison of classifiers for the risk of diabetes prediction. Procedia Comput Sci 2015;69:132‑42.
22. Saravananathan K, Velmurugan T. Analyzing diabetic data using classification algorithms in data mining. Indian J Sci Technol 2016;9:1‑6.
23. Kumari VA, Chitra R. Classification of diabetes disease using support vector machine. Int J Eng Res Appl 2013;3:1797‑801.
24. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods in diabetes research. Comput Struct Biotechnol J 2017;15:104‑16.
25. Rawat V, Suryakant S. A classification system for diabetic patients with machine learning techniques. Int J Math Eng Manag Sci 2019;4:729‑44.
26. Perveen S, Shahbaz M, Guergachi A, Keshavjee K. Performance analysis of data mining classification techniques to predict diabetes. Procedia Comput Sci 2016;82:115‑21.
27. Mujumdar A, Vaidehi V. Diabetes prediction using machine learning algorithms. Procedia Comput Sci 2019;165:292‑9.
28. Diabetes mellitus affected patients classification and diagnosis through machine learning techniques. Procedia Comput Sci 2017;112:2519‑28.