A Survey on Diabetes Risk Prediction Using Machine Learning Approaches
1Department of Computer Science, Bhagwant University, Ajmer, Rajasthan, 2Department of Medicine‑Associated Hospital GMC, Anantnag, Jammu and Kashmir, India
Abstract
Background: Diabetes mellitus (DM) is a chronic condition that can lead to a variety of complications. Risk factors include age, lack of exercise, a sedentary lifestyle, family history of diabetes, high blood pressure, depression and stress, and poor diet. Diabetics are at a higher risk of developing heart disease, nerve damage (diabetic neuropathy), eye problems (diabetic retinopathy), kidney disease (diabetic nephropathy), and stroke. According to the International Diabetes Federation, 382 million people worldwide suffer from diabetes, and this number is projected to rise to 592 million by 2035. Every day, a large number of people develop the disease, and many are unaware that they have it. It primarily affects individuals between the ages of 25 and 74 years. If diabetes is left undiagnosed and untreated, it can lead to a range of complications. Machine learning approaches offer a way to address this problem.
Aims and Objectives: The aim was to study DM and analyze how machine learning algorithms are used to identify diabetes mellitus, one of the most serious metabolic disorders in the world today, at an early stage.
Methods and Materials: Data were obtained from databases such as PubMed, IEEE Xplore, and INSPEC, and from other primary and secondary sources that report machine learning methods used in healthcare to predict diabetes at an early stage.
Results: After surveying various research papers, it was found that machine learning classification algorithms such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF) show the best accuracy for predicting diabetes at an early stage.
Conclusion: Early detection of diabetes is critical for effective therapy, yet many people do not know whether they have it. This paper reviews machine learning approaches for early diabetes prediction and how a variety of supervised and unsupervised machine learning algorithms can be applied to a dataset to achieve the best accuracy. Furthermore, the work will be expanded and refined to create a more precise and general predictive model for diabetes risk prediction at an early stage. Different metrics can be used to assess performance and for accurate diabetic diagnosis.
Address for correspondence: Dr. Gowher A. Wagai, Al‑Jowher Diabetes Care and Scan Centre, Nazuk Mohalla, Cheeni Chowk, Anantnag ‑ 192 101, Jammu and Kashmir, India.
E‑mail: [email protected]
Received: 02‑03‑2022 Revised: 24‑06‑2022 Accepted: 26‑06‑2022 Published: 16-12-2022

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution‑NonCommercial‑ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non‑commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

Website: www.jfmpc.com
DOI: 10.4103/jfmpc.jfmpc_502_22
For reprints contact: [email protected]
How to cite this article: Firdous S, Wagai GA, Sharma K. A survey on diabetes risk prediction using machine learning approaches. J Family Med Prim Care 2022;11:6929-34.
© 2022 Journal of Family Medicine and Primary Care | Published by Wolters Kluwer ‑ Medknow
Firdous, et al.: Survey on diabetes risk prediction using ML approaches

Introduction

Diabetes mellitus is a metabolic disorder defined by abnormally high blood sugar levels due to a lack of insulin secretion or a combination of insulin resistance and insufficient insulin synthesis to compensate.[1] It is a progressive metabolic ailment that affects all parts of the patient’s life, including physical and mental well‑being, and no therapy technique can produce spectacular improvements or stop the disease from progressing.[2] In the year 2000, India had the greatest number of diabetics in the world (31.7 million), which increased to 62.4 million in 2011 and is anticipated to reach 69.9 million by 2025.[3,4] Rapid urbanization and economic development are to blame for India’s high prevalence. Indians are more likely to develop diabetes as a result of their low BMI combined with high upper‑body adiposity, high body fat percentage, and high insulin resistance.[5] Blurred vision, weight loss, fatigue, increased hunger and thirst, confusion, frequent urination, poor healing, frequent
infections, and difficulty concentrating are all signs or symptoms of diabetes. “Diabetes means you have too much sugar in your blood. High blood sugar problems start when your body no longer makes enough of a chemical, or hormone, called insulin.”[6] “Sweet urine” is the direct translation of the term diabetes mellitus. Normal urine does not contain sugar. There is sugar (or more precisely glucose) in the urine because the amount of glucose in the blood has increased to the point where it spills over into the urine. Because the body is unable to metabolize glucose properly, it accumulates in the bloodstream. As a result, diabetes is a disease in which the body

This paper surveys machine learning algorithms used for the early prediction of diabetes. The remainder of the paper is organized as follows: section (1) is the introduction; section (2) covers diabetes and its types; section (3) covers machine learning algorithms; section (4) is the literature survey for prediction of diabetes; and section (5) is the conclusion.

Diabetes and its Types

Diabetes mellitus (DM) is a metabolic disease with a variety of causes. It is characterized by persistent hyperglycemia and alterations in carbohydrate, lipid, and protein metabolism caused by insulin deficiency, impaired insulin action, or both. Diabetes is a chronic disease. If not effectively treated, it can damage nerves and blood vessels in the eyes, kidneys, heart, and lower legs. Problems may emerge if blood glucose levels remain high for an extended period. Oral problems include gum disease and tooth decay. Diabetic retinopathy is a condition that causes vision loss and, in severe cases, blindness. Heart and blood vessel diseases (cardiovascular diseases or CVD) include heart attacks, strokes, and peripheral artery disease (insufficient blood supply to the feet and legs). Kidney disease (diabetic nephropathy) is a condition in which the kidneys do not function properly or at all.[7] The three kinds of diabetes are type 1 diabetes, type 2 diabetes, and gestational diabetes.

Type 1 diabetes (T1D)
The body does not produce enough insulin in type 1 diabetes. Body cells can’t absorb glucose from the bloodstream without insulin, so they have to rely on other sources of energy. An excess of glucose in the blood causes diabetes and its complications. This type of diabetes is also known as insulin‑dependent diabetes mellitus (IDDM). Although it affects adolescents and teenagers more frequently, it can affect anyone at any age. It requires a delicate balancing act of insulin injections (and, in some circumstances, oral medicines), exercise, nutrition planning, and lifestyle changes. Frequent urination, unusual thirst, unusual hunger, rapid weight loss, weariness and weakness, nausea, and irritability are some of the symptoms of type 1 diabetes.

Type 2 diabetes (T2D)
Type 2 diabetes can range from primarily insulin resistance to mostly secretory dysfunction with or without insulin resistance. The pancreas produces insulin, but it may not be enough to keep blood glucose levels normal, or the cells may be resistant to the insulin produced. The illness is most prevalent in those over the age of 40, but it is also becoming more prevalent in teenagers and young children. Type 2 diabetes is characterized by drowsiness, dry, itchy skin, unintended weight gain or loss, blurred vision, tingling, numbness, pain in the lower legs, easy weariness, sluggish healing of cuts or scratches, and frequent infections (e.g., vaginal infections). Food, activity, lifestyle control, and, in some situations, oral medicines or insulin are all necessary

Gestational diabetes
Pregnant women who have never had diabetes before but have high blood glucose (sugar) levels during pregnancy are diagnosed with gestational diabetes. It is a temporary condition that affects 2%–4% of all pregnant women and usually disappears after the baby is born. Women who have had gestational diabetes in the past are more likely to develop type 2 diabetes later in life. There is no known etiology for this type of diabetes. The placenta supports the infant’s development; placental hormones help the baby develop, but they also prevent the mother’s insulin from working properly in her body, resulting in insulin resistance. When a mother’s body is unable to create and use all of the insulin required during pregnancy, gestational diabetes develops. The majority of women are unaware of any signs or symptoms of gestational diabetes. Increased thirst and more frequent urination are two symptoms.

Diabetes mellitus can be a deadly disorder if not treated early; however, early detection can reduce the risk significantly. A range of medical diagnostic procedures is already in use for early diagnosis. Early risk forecasts can be made using machine learning techniques. Recent research has given promising results in terms of forecasting the risk of diabetes mellitus. Machine learning is a field of study in which algorithms learn from data rather than being explicitly programmed: a model is trained to do a given job and then uses that training to handle similar tasks. Accuracy is always a major concern in medical science, and different algorithms might yield varying degrees of accuracy on the same data set. To design a better classifier, it is vital to figure out which algorithm delivers the best results. Machine learning can now be found in almost every industry. Its application in medical science has the potential to improve healthcare dramatically.

Decision trees, random forests, support vector machines, naive Bayes classifiers, and artificial neural networks are examples of machine learning and classification algorithms that work well in risk prediction. This is possible because of the algorithms’ computing and data‑handling capabilities. Classification accuracy can be used to compare algorithms and select the best one. This measure alone, however, is insufficient to properly and efficiently determine the best method.
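To illustrate why accuracy alone can be insufficient, the following is a minimal sketch in plain Python, not taken from any of the surveyed studies. The toy labels, the risk scores, and the function names `accuracy`, `f_score`, and `roc_auc` are all assumptions made for illustration. It compares a classifier that always predicts "non-diabetic" with one that ranks patients by a made-up risk score, and reports accuracy, F‑score, ROC value (area under the curve), and computation time.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the probability that a random
    positive case receives a higher score than a random negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical, imbalanced toy data: 8 non-diabetic (0) and 2 diabetic (1) cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Classifier A (assumed): always predicts "non-diabetic" with a constant score.
score_a = [0.1] * 10
pred_a = [0] * 10

# Classifier B (assumed): made-up risk scores, thresholded at 0.5.
score_b = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.9, 0.7]
pred_b = [1 if s >= 0.5 else 0 for s in score_b]

for name, pred, score in (("A", pred_a, score_a), ("B", pred_b, score_b)):
    start = time.perf_counter()
    acc = accuracy(y_true, pred)
    f1 = f_score(y_true, pred)
    auc = roc_auc(y_true, score)
    elapsed = time.perf_counter() - start
    print(f"{name}: accuracy={acc:.2f}  F-score={f1:.2f}  "
          f"ROC AUC={auc:.2f}  time={elapsed:.6f}s")
```

On this toy data, classifier A reaches 80% accuracy without detecting a single diabetic case, which only the F‑score and ROC value reveal.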
When determining the best method, other measures such as the receiver operating characteristic (ROC) value, F‑score, and computation time should be taken into account alongside classification accuracy. Future researchers will be aided by the findings of this study in constructing a baseline strategy for DM classification.

Machine Learning Algorithms

Machine learning (ML) is a rapidly developing field that is being

Sadhu and Jadli[9] experimented on a diabetes data set taken from the UCI repository. There were 520 instances and 16 attributes in all. They concentrated their efforts on predicting diabetes at an early stage. On the validation set of the employed data set, seven classification techniques were implemented: k‑NN, logistic regression, SVM, naive Bayes, decision tree, random forests, and multilayer perceptron. The random forests classifier proved to be the best model for the data set, with an accuracy score of 98%, compared with logistic regression at 93%, SVM at 94%, naive Bayes at 91%, decision tree at 94%, and multilayer
The k‑NN, SVM, functional tree (FT), and RFCs were employed as classifiers. k‑NN had the highest accuracy of 98%, followed by RF at 97%, SVM at 94%, and FT at 93%.

Shafi et al.[13] noted that because diabetes is a serious illness, early detection is always a challenge. The study used machine learning classification methods to develop a model for identifying the development of diabetes early on, and the authors made concerted efforts to develop a framework that could accurately predict the likelihood of diabetes in patients. Three ML classification algorithms (DT, SVM, and NBC) were studied and assessed on various measures. The PID data set acquired from the UCI repository was used to save time and produce precise findings. The experimental results suggested that the NBC approach performed best, with 74% accuracy, followed by the DT with 72% and SVM with 63%. In the future, the framework, as well as the ML classifiers used, could be applied to identify or diagnose other diseases. The study could also be extended and improved with other ML methodologies for diabetes research, and the authors intended to evaluate other algorithms on data with missing values.

Khanam et al.[14] experimented with diabetes prediction. Diabetes is a condition with no known cure; therefore, early detection is essential. In this study, data mining, ML techniques, and neural network (NN) methodologies were utilized to predict diabetes. They used the UCI repository’s PID data set, which included information on 768 patients and 9 attributes. On the data set, they applied seven ML methods to predict diabetes: DT, k‑NN, RFC, NBC, AB, LR, and SVM. They used the Weka tool to preprocess the data. They discovered that a model combining LR and SVM is effective at predicting diabetes. They also created a NN model with two hidden layers and varied epochs, and found that the NN with two hidden layers gave 88.6% accuracy. ANN scored 88.57%, LR scored 78.85%, NBC scored 78.28%, and RFC scored 77.34%.

Sisodia et al.[15] used the PID data set available on the UCI repository. This data set contained 768 patients and 8 attributes. They employed three ML classifiers to identify diabetic patients: DT, SVM, and NBC. NBC had the highest accuracy (76.30%) when compared to the other models.

Agarwal et al.[16] also used the PID data set, with 738 patients, in their study. To analyze the effectiveness of this data set for identifying diabetic patients, the authors applied models such as SVM, k‑NN, NBC, ID3, C4.5, and CART. The SVM and LDA algorithms were the most accurate, with an accuracy of 88%.

To predict diabetes mellitus, Hassan et al.[18] employed classification approaches such as the DT, k‑NN, and SVM. The SVM outperformed the DT and k‑NN methods with a maximum accuracy of 90.23%.

Kandhasamy and Balamurali[19] investigated the prediction accuracy of J48, k‑NN, RFC, and SVM on the diabetes data set. Before preprocessing the data, the authors found that the J48 method had a higher accuracy than the others, at 73.82%. After preprocessing, k‑NN and RFC demonstrated improved accuracy.

Meng et al.[20] examined J48, LR, and k‑NN algorithms on the diabetes data set. J48 was found to be the most accurate, with a classification accuracy of 78.27%.

Nai‑Arun and Moungmai[21] created a web application for diabetes prediction based on prediction accuracy. They compared prediction methods such as DTs, NNs, LR, NBC, and RFC, as well as bagging and boosting. They found that RFC performed best in terms of accuracy and ROC score, with an accuracy of 85.558% and an ROC value of 0.912.

Saravananathan and Velmurugan[22] compared J48, CART, SVM, and k‑NN on a medical data set based on accuracy, specificity, sensitivity, precision, and error rate. J48 was the most accurate at 67.15%, followed by SVM (65.04%), CART (62.28%), and k‑NN (53.39%).

Kumari and Chitra[23] used SVM, RFC, DT, MLP, and LR with four k‑fold cross‑validation settings (k = 2, 4, 5, 10) in their research. According to the researchers, MLP with four‑fold cross‑validation achieved the best accuracy, at 78.7%, and MLP outscored all other algorithms.

To predict diabetes, Kavakiotis et al.[24] employed NBC, RFC, k‑NN, SVM, DT, and LR methods. The algorithms were applied using a ten‑fold cross‑validation technique. SVM had the best accuracy of all the approaches, measuring 84%, according to the study.

Rawat et al.[25] worked on the classification task of diabetes prediction based on eight attributes. The study described five ML algorithms for the analysis and prediction of diabetic patients: AdaBoost, LogicBoost, RobustBoost, naive Bayes, and bagging. Data from a group of PIMA Indians were used to test the proposed strategies. The computed results were found to be quite accurate, with a classification accuracy of 81.77% and 79.69% for the bagging and AdaBoost techniques, respectively. The authors therefore regarded the proposed DM prediction algorithms as effective and efficient.
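Several of the surveyed studies report accuracy under k‑fold cross‑validation (for example k = 2, 4, 5, 10, or ten‑fold). The following is a minimal sketch of that evaluation protocol in plain Python, not the procedure of any particular study. The toy (glucose, BMI) records, the small k‑NN classifier, and the function names `k_fold_indices`, `knn_predict`, and `cross_validate` are assumptions made for illustration. For determinism the folds are contiguous and the toy records alternate between classes; real experiments would normally shuffle or stratify the data first.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x, train_y, point, n_neighbors=3):
    """Majority vote among the n_neighbors training points closest to `point`."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    votes = [train_y[i] for i in by_distance[:n_neighbors]]
    return max(set(votes), key=votes.count)


def cross_validate(features, labels, k):
    """Return the held-out accuracy of the k-NN classifier on each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(features), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        correct = sum(
            knn_predict(train_x, train_y, features[i]) == labels[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical toy records (glucose, BMI), alternating non-diabetic (0)
# and diabetic (1) so that every contiguous fold contains both classes.
features, labels = [], []
for j in range(10):
    features.append((85 + j, 20 + 0.3 * j))
    labels.append(0)
    features.append((150 + 2 * j, 30 + 0.4 * j))
    labels.append(1)

for k in (2, 4, 5, 10):
    fold_scores = cross_validate(features, labels, k)
    mean_accuracy = sum(fold_scores) / len(fold_scores)
    print(f"k={k:2d}  folds={len(fold_scores)}  mean accuracy={mean_accuracy:.3f}")
```

Because the two toy classes are widely separated, every fold is classified perfectly here; on real data such as PID the per‑fold scores vary, which is why the mean over folds is reported.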
Rathore et al.[17] employed classification techniques like SVM and DTs to predict diabetes mellitus. The PID data set, which contains records of female patients only, provided the data for this investigation. The SVM had an accuracy of 82%.

Using disease classifiers and an actual data set, Nai‑Arun and Moungmai[21] suggested a web application. The data for this component was collected from 30,122 people at Sawanpracharak Regional Hospital’s twenty‑six primary care units between 2012 and 2013. To identify a predictive model, thirteen classification models were investigated before the web application was created. These models included the DT, NN, LR, NBC, and RFC algorithms; all except RFC were also combined with bagging and boosting techniques. Each model’s accuracy and ROC curves were calculated and compared to the others to see how robust they were. According to the findings, RFC won in both accuracy and ROC curve. This could be owing to a wide range of options. Not only were data and input factors chosen at random in the RFC approach, but crucial variables were also taken into account. As a result, the precision values rose. As a result, this

prediction at an early stage. Different metrics can be used to assess performance and for accurate diabetic diagnosis.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

References
prediction using classification techniques. Int J Innov Technol Explor Eng 2020;9:2080‑4.
19. Kandhasamy JP, Balamurali S. Performance analysis of classifier models to predict diabetes mellitus. Procedia Comput Sci 2015;47:45‑51.
20. Meng XH, Huang YX, Rao DP, Zhang Q, Liu Q. Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. Kaohsiung J Med Sci 2013;29:93‑9.
21. Nai‑Arun N, Moungmai R. Comparison of classifiers for the risk of diabetes prediction. Procedia Comput Sci
patients with machine learning techniques. Int J Math Eng Manag Sci 2019;4:729‑44.
26. Perveen S, Shahbaz M, Guergachi A, Keshavjee K. Performance analysis of data mining classification techniques to predict diabetes. Procedia Comput Sci 2016;82:115‑21.
27. Mujumdar A, Vaidehi V. Diabetes prediction using machine learning algorithms. Procedia Comput Sci 2019;165:292‑9.
28. Diabetes mellitus affected patients classification and diagnosis through machine learning techniques. Procedia Comput Sci 2017;112:2519‑28.