2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE)

Diabetes Prediction using Different Machine Learning Techniques
978-1-6654-3789-9/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICACITE53722.2022.9823640

Naresh Kumar Trivedi
Institute of Engineering and Technology, Chitkara University, Punjab, India

Vinay Gautam
Institute of Engineering and Technology, Chitkara University, Punjab, India

Himanshu Sharma
Sobhasaria Group of Institutions, Sikar, Rajasthan, India

Sumit Agarwal
Chitkara Business School, Chitkara University, Punjab, India

Abhineet Anand
Chitkara Business School, Chitkara University, Punjab, India

Abstract— Even though diabetes is a worldwide epidemic, there is no cure for it. Furthermore, healthcare for people with diabetes costs a great deal of money every year. As a result, the most critical considerations are the accuracy of the prediction and the selection of an appropriate methodology. Artificial neural networks (ANNs) and other artificial intelligence systems are examples of such techniques. Artificial neural networks were therefore employed in this study to determine whether a subject has diabetes, and a neural network error function was used as a criterion during training. After training, the neural network was 99.6% accurate in predicting whether a person had diabetes, with an average error of 0.01.

Keywords—Machine Learning, Diabetes, Classification, Neural Network

I. INTRODUCTION

Diabetes results when the body does not process blood sugar, or glucose, correctly. Different forms of diabetes have different treatment options. An estimated 39.2 million Americans and 81.2 million Indians of all ages are diagnosed with diabetes. If it is not continuously and meticulously managed, diabetes can lead to hazardous consequences such as high blood sugar, stroke, and heart disease. Diabetes results from either inadequate production of insulin by the pancreas or an insufficient response of the body to the insulin produced. To assist medical professionals, various data mining algorithms have been proposed, and prediction accuracy measures the effectiveness of such a decision support system. The goal is therefore to create a decision-making system that can accurately predict and identify the status of a particular patient [1].

Types of diabetes have varying management requirements, so it is essential to understand the differences between them. Diabetes does not always result from weight or lack of physical activity. Some forms are present from childhood.

A. Diabetic type one
Type 1 diabetes occurs when the body cannot produce insulin. Insulin enables the body to take up sugar from the blood, which regulates blood sugar levels. Type 1 diabetes can be diagnosed in children. People with type 1 diabetes must take insulin regularly, either by injection or with an insulin pump. Type 1 diabetes has no cure. Those with the condition must monitor their blood sugar levels, use insulin, and modify their lifestyle. People with type 1 diabetes who control their blood sugar levels have fewer life-threatening complications. High blood pressure, numbness and pain in the feet, stroke, and nerve damage are common concerns.

B. Diabetic condition type two
People with type 2 diabetes cannot produce or use insulin properly. According to NIDDK estimates, type 2 diabetes is the most common type of diabetes and is closely associated with obesity. Many people with type 2 diabetes do not need insulin; symptoms can often be controlled with medication and lifestyle changes, including increased physical activity and a healthier diet. Type 2 diabetes affects people of all ages and genders. It is linked to several risk factors, including age over 45, obesity, and a family history of the disease.

C. Effects of diabetes during pregnancy
Gestational diabetes develops when a woman's insulin sensitivity decreases during pregnancy. According to the CDC, gestational diabetes affects 2–10% of all pregnancies. Obese pregnant women have an increased chance of developing gestational diabetes.

The CDC estimates that a person with gestational diabetes has a 50% probability of developing type 2 diabetes in the future. However, the condition can be managed throughout pregnancy by staying physically active, monitoring the growth and development of the fetus, and monitoring blood sugar levels and weight gain.

High blood pressure is more likely during pregnancy if a woman has gestational diabetes. In addition, there is a greater chance of preterm birth, of blood sugar problems in the infant, and of the child developing type 2 diabetes later in life [2].

D. Prediabetes
People with prediabetes have blood sugar levels that are high but not high enough to warrant a diagnosis of type 2 diabetes. A physician diagnoses prediabetes only if the patient meets criteria such as a glucose tolerance level of 140–199 mg/dL, an A1C test result in the range of 5.7–6.4%, or a fasting blood sugar level of 100–125 mg/dL. People with prediabetes are more likely to develop type 2 diabetes, but the symptoms of full-blown diabetes are not yet present.
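The prediabetes ranges quoted above can be written as a simple range check. The Python sketch below is only an illustration: the function name and parameter names are hypothetical, and treating a single in-range measurement as sufficient is an assumption of the sketch, since the text does not say how the criteria combine.

```python
def in_prediabetes_range(fasting_mg_dl=None, ogtt_mg_dl=None, a1c_percent=None):
    """Return True if any supplied measurement lies in a prediabetes range.

    The ranges are the ones quoted in the text. Treating one matching
    test as sufficient is an assumption of this sketch.
    """
    checks = []
    if fasting_mg_dl is not None:
        # Fasting blood sugar: 100-125 mg/dL
        checks.append(100 <= fasting_mg_dl <= 125)
    if ogtt_mg_dl is not None:
        # Glucose tolerance test: 140-199 mg/dL
        checks.append(140 <= ogtt_mg_dl <= 199)
    if a1c_percent is not None:
        # A1C test: 5.7-6.4 %
        checks.append(5.7 <= a1c_percent <= 6.4)
    if not checks:
        raise ValueError("at least one measurement is required")
    return any(checks)


print(in_prediabetes_range(fasting_mg_dl=110))  # inside the fasting range
print(in_prediabetes_range(a1c_percent=5.2))    # below the A1C range
```

Values above the upper bounds fall outside these ranges as well; the sketch does not try to separate normal readings from diabetic ones.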

The aims of this research paper are i) to predict and categorize a person's current health condition, ii) to identify the factors that affect a given health condition, and iii) to use a machine learning algorithm with pre-defined data to forecast the outcomes of a specific health condition.

The study in [3] predicts that, by 2030 and 2045, the number of people affected by diabetes will rise from 50 crore to 1 billion. Diabetes has no cure, although it can be managed and prevented if a reliable prediction is made early enough.

II. RELATED WORK

Diabetes, medically known as diabetes mellitus, is a metabolic disorder that affects the human body in many ways. Patients with this condition cannot produce insulin, or their bodies develop insulin resistance, which prevents the insulin they produce from working correctly. The primary role of insulin is to control blood sugar levels through various mechanisms. Diabetes can be classified into type 1 and type 2. In type 1 diabetes, destruction of the pancreatic beta cells impairs insulin production. Rising insulin resistance eventually leads to beta-cell loss and defects in insulin production. Type 2 diabetes has been linked to obesity, genetic disorders, and a lack of physical activity [1].

In [4], the researchers devised a new system for the early prediction of diabetes. The proposed system is based on machine learning and was built with decision trees. Because the technique was designed to reliably predict the onset of diabetes at a certain age, the results were satisfactory [5,6].

The authors also experimented extensively with outlier rejection and missing-value imputation to improve the ML model's performance. Their AUC (area under the ROC curve) was 0.930. In [7], decision tree (DT), support vector machine (SVM), and Naive Bayes (NB) classifiers were used to predict the likelihood of diabetes in participants. Naive Bayes was found to be the most effective, with an AUC of 0.819. Data mining approaches based on the J48 (C4.5) decision tree have been used to categorise diabetes mellitus, and AdaBoost (AB) and bagging ensemble techniques have also been applied [8]. The AB ensemble method outperformed both bagging and the standalone J48 decision tree. In [9], genetic programming was developed for diabetes prediction, and the framework outperformed the other implemented techniques. In [10], the researchers used artificial neural networks (ANNs), logistic regression (LR), and Naive Bayes (NB), among other classifiers, to determine the risk of diabetes. Their tests showed that random forest (RF) exceeded all other algorithms in speed and accuracy. In [11], a GP-based classification method was shown to be superior to the usual LDA, QDA, and NB classifiers when used with three different kernel types: linear, polynomial, and radial basis. In addition, a large number of experiments were conducted to identify the most effective cross-validation method. According to the study's findings, diabetes is predicted most accurately by the GP-based classifier with the K20 cross-validation approach. Although many frameworks have been established, there is still room for improvement in diabetes prediction.

Ashiquzzaman et al. [12] used the dropout strategy to reduce overfitting in a deep learning system for diabetes mellitus prediction. Each of the two fully connected layers is followed by a dropout layer, and a single node in the output layer determines the outcome. The model was evaluated on the Pima Indian diabetes dataset, where it reached an accuracy of 88.51%. Zhu et al. [13] improved the prediction of complex illnesses such as diabetes by using multiple classifiers combined through a dynamic voting system. The system was tested on T2DM and diabetes datasets. Its highest accuracy was 93.45% when using MFWC with k=10.

To predict diabetes mellitus, [14] employed data mining techniques. Using a Bayesian network, the authors analysed the data and provided an easy-to-understand account of the patterns found. The system achieved a 99.51% success rate. In [15], a system based on a genetic algorithm, K-Means, and a support vector machine (SVM) was developed to forecast the diagnosis of diabetes. The system proceeds in steps. First, missing values are filled in with means. Next, K-Means clustering and genetic algorithms are used to remove outliers from the dataset and then to select the best features, reducing the number of attributes. Finally, the SVM algorithm gives the model a maximum accuracy of 98.82%.

In [16], recursive feature elimination and PCA (principal component analysis) were employed to make predictions about diabetes. People with diabetes were classified using neural networks (NN) and deep neural networks (DNN). In both cases, the accuracy was 82.67%.

Using the hierarchical neuro-fuzzy BSP technique, [17] devised a system for predicting diabetes. Pattern classification and rule extraction models are proposed within a hierarchical neuro-fuzzy binary space partitioning (BSP) model. The accuracy was 80.08% on the training set and 78.26% on the testing set. This was followed by the pairwise and size-constrained K-means technique [18] for screening the population at high risk of diabetes.

The PIMA Indians Diabetes dataset was employed to construct a new diabetes prediction pipeline. To generate a high-quality outcome, the proposed pipeline relies heavily on pre-processing, which includes data standardisation, feature selection, and cross-validation. For example, missing values are replaced with the mean rather than the median of the attribute, to better reflect the attribute's distribution. In addition, for cross-validation, the dataset is split into folds with great care so that each fold preserves the class percentages of the original dataset.

The presented pipeline contains classifiers such as kNN, RF, DT, NB, AB, and XGBoost (XB). Neuron initializers, batch sizes, learning rates, and the number of epochs all play a role in selecting the hidden layers. The hyperparameters of the MLP and ML models can also be tuned using the grid search technique. Diabetes prediction accuracy can be improved by combining pre-processing and ML classifiers in various ways while still using the same experimental setup and dataset.

III. DATA SET AND METHODOLOGY USED

This section first describes the dataset and then the proposed methodology.

A. Data set
The dataset includes 768 instances and nine attributes. It has the following characteristics:
• Total number of pregnancies
• Glucose/sugar level
• Diastolic blood pressure (DBP)
• Two-hour insulin level
• Skin fold thickness in millimetres (mm)
• Body mass index (BMI)
• Hereditary factor (pedigree function)
• Age of the patient in years

Training and testing can be done with a percentage split option. Of the 768 instances, 80 percent are used for training and 20 percent for testing [19].

B. Proposed Methodology
The block diagram below shows the combination of algorithms used by the proposed system. Random forest, logistic regression, neural network, kNN, and stochastic gradient descent are among the most widely used methods for classifying data accurately.

Fig. 1. Block diagram of diabetes prediction system

C. Random forest
Random forest is a machine learning method commonly used for classification and regression tasks. It builds multiple decision trees on different samples of the data and combines them by majority vote for classification or by averaging for regression [20].

D. Logistic regression
Logistic regression is a statistical analysis technique for forecasting a yes-or-no outcome. A logistic regression model predicts a dependent variable by examining its relationships with the existing independent variables.

E. Neural network
By simulating how the human brain works, a neural network attempts to find hidden patterns in data. A neural network is a collection of neurons, whether natural or artificial [21].

F. k-nearest neighbour
Rather than building an explicit model, the kNN algorithm uses instance-based learning. It rests on the simple assumption that examples of the same class lie close together in the input space. An example is assigned to the class that occurs most often among its k nearest neighbours. A drawback is that the value of k must be chosen in advance.

G. Stochastic gradient descent
SGD is an iterative method for optimizing an objective function with suitable smoothness properties (e.g., differentiable or subdifferentiable). It approximates gradient descent by replacing the true gradient, computed from the entire dataset, with an estimate computed from a small, randomly selected subset of the data. In high-dimensional optimization problems, this yields faster iterations at the cost of a lower convergence rate.

IV. RESULT AND DISCUSSION

Tables I to V give the confusion matrices for the random forest, logistic regression, neural network, k-nearest neighbour (kNN), and stochastic gradient descent (SGD) prediction models.

Fig. 2 depicts the outcomes of applying these five machine learning models. According to the findings (Table III; Fig. 2), the neural network correctly identifies the most true positives (absence of diabetes predicted as absent) and true negatives (presence of diabetes predicted as present). Table III (the NN confusion matrix) shows the highest number of true positives (Fig. 2).

In this convention, true positives (TP) are samples without diabetes that are predicted to be free of diabetes. False positives (FP) are samples with diabetes that are predicted to be free of it. True negatives (TN) are samples with diabetes that are predicted to have the condition. False negatives (FN) are samples that are predicted to be diabetic but are not.

The confusion matrix in Table V shows that SGD is the worst performer in testing (TN).

With a classification accuracy of 99.6%, the neural network outperforms all the other classification algorithms.

The confusion matrices for the various algorithms are shown below.

TABLE I. RANDOM FOREST

                 Predicted
                 0       1       ∑
Actual    0      1304    12      1316
          1      26      658     684
          ∑      1330    670     2000

TABLE II. LOGISTIC REGRESSION

                 Predicted
                 0       1       ∑
Actual    0      1177    139     1316
          1      300     384     684
          ∑      1477    523     2000
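The confusion matrices in this section can be reduced to summary metrics. The following Python sketch is an illustration, not the authors' code. It assumes the layout used in the tables (rows are actual classes, columns are predicted classes) and computes accuracy together with per-class precision, recall, and F1. The function name `confusion_metrics` is hypothetical; the counts are those printed in Tables I and III.

```python
def confusion_metrics(matrix):
    """Return accuracy plus per-class precision, recall and F1.

    matrix[i][j] counts samples whose actual class is i and whose
    predicted class is j (rows = actual, columns = predicted).
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n_classes))
    per_class = {}
    for c in range(n_classes):
        predicted_c = sum(matrix[i][c] for i in range(n_classes))  # column sum
        actual_c = sum(matrix[c])                                  # row sum
        precision = matrix[c][c] / predicted_c if predicted_c else 0.0
        recall = matrix[c][c] / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    return {"accuracy": correct / total, "per_class": per_class}


random_forest = [[1304, 12], [26, 658]]  # counts from Table I
neural_network = [[1310, 6], [1, 683]]   # counts from Table III

rf = confusion_metrics(random_forest)
nn = confusion_metrics(neural_network)
print(f"Random forest accuracy:  {rf['accuracy']:.4f}")
print(f"Neural network accuracy: {nn['accuracy']:.4f}")
```

For the Table III counts, the accuracy is 1993/2000 = 0.9965, in line with the 99.6% reported for the neural network.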

TABLE III. NEURAL NETWORK

                 Predicted
                 0       1       ∑
Actual    0      1310    6       1316
          1      1       683     684
          ∑      1311    689     2000

TABLE IV. KNN

                 Predicted
                 0       1       ∑
Actual    0      1173    143     1316
          1      297     387     684
          ∑      1470    530     2000

TABLE V. SGD

                 Predicted
                 0       1       ∑
Actual    0      1157    159     1316
          1      48      478     684
          ∑      1448    552     2000

[Bar chart "Classification Results using Machine Learning": one bar each for Random Forest, Logistic Regression, Neural Network, kNN, and SGD; vertical axis from 0 to 120.]

Fig. 2. Comparison of different machine learning models

TABLE VI. COMPARISON OF AUC, F1, PRECISION AND RECALL OF DIFFERENT MACHINE LEARNING ALGORITHMS

Classifier             AUC      F1       Precision   Recall
SGD                    0.717    0.761    0.762       0.769
kNN                    0.848    0.772    0.774       0.780
Logistic Regression    0.834    0.772    0.775       0.780
Random Forest          0.999    0.985    0.985       0.986
Neural Network         0.999    0.996    0.996       0.996

V. PERFORMANCE MEASUREMENT

The ROC (receiver operating characteristic) graph plots the false-positive rate (x-axis) against the true-positive rate (y-axis). ROC analysis is useful when the proportion of samples from the two classes varies during training. For the best classifier, the area under the ROC curve must be close to 1. The neural network outperforms all the other methods of forecasting whether a patient would develop diabetes, as shown in Figs. 3 and 4.

Fig. 3. ROC curve for the absence of diabetes

Fig. 4. ROC curve for the presence of diabetes

VI. CONCLUSION

This study employed a neural network and other classification algorithms to predict diabetes. Using a model, we can create and implement sophisticated software processes in the medical field. Prediction, diagnosis, treatment, and support for the general public are all improved by using software systems in various medical disciplines. There are a variety of ways to distribute and deploy these systems. A neural network is a parallel processing system that can identify complicated patterns in data. The study's goal was to determine the essential characteristics and relate them to diabetes. We can save people's lives and change the course of their treatment if we can predict diabetic illnesses. This work shows that machine learning techniques can be used to determine whether a person has diabetes. The classification accuracy of the neural network model is 99.6%. In this study, algorithms for predicting diabetes based on various parameters were
employed. They were also tested and evaluated on performance indicators using twenty-fold cross-validation.

REFERENCES

[1] World Health Organization (WHO), "Definition, Diagnosis, and classification of diabetes mellitus and its complications", part 1. WHO/NCD/NCS/2016.2, (2016).
[2] H. Temurtas, N. Yumusak and F. Temurtas, "A comparative study on diabetes disease diagnosis using neural networks", Expert System, vol. 36, (2009), pp. 8610–15.
[3] P. Saeedi, I. Petersohn, P. Salpea, B. Malanda, S. Karuranga, N. Unwin, S. Colagiuri, L. Guariguata, A. A. Motala, K. Ogurtsova, J. E. Shaw, D. Bright, and R. Williams, "Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9th edition," Diabetes Res. Clin. Pract., vol. 157, Nov. 2019, Art. no. 107843.
[4] Orabi, K. M., Kamal, Y. M., Rabah, T. M., 2016. Early Predictive System for Diabetes Mellitus Disease, in: Industrial Conference on Data Mining, Springer. pp. 420–427.
[5] Priyam, A., Gupta, R., Rathee, A., Srivastava, S., 2013. Comparative Analysis of Decision Tree Classification Algorithms. International Journal of Current Engineering and Technology Vol. 3, 334–337. doi:JUNE2013, arXiv:ISSN2277-4106.
[6] Trivedi NK, Simaiya S, Lilhore UK, Sharma SK, (2022) "COVID-19 Pandemic Role of Machine Learning & Deep Learning Methods in Diagnosis". International Journal of Current Research and Review. DOI: http://dx.doi.org/10.31782/IJCRR.2021.SP192.
[7] Kaur A, Guleria K and Trivedi NK, 2021 "Rice Leaf Disease Detection: A Review," 6th International Conference on Signal Processing, Computing and Control (ISPCC), 2021, pp. 418-422.
[8] Sarangi, P. K., Guleria, K., Prasad, D., & Verma, D. K. (2021). Stock movement prediction using neuro genetic hybrid approach and impact on growth trend due to COVID-19. International Journal of Networking and Virtual Organisations, 25(3-4), 333-352.
[9] Trivedi NK, Gautam V, Anand A, Aljahdali HM, Villar SG, Anand D, Goyal N, Kadry S (2021) "Early Detection and Classification of Tomato Leaf Disease Using High-Performance Deep Neural Network". Sensors. 21(23):7987.
[10] N. Nai-arun and R. Moungmai, "Comparison of classifiers for the risk of diabetes prediction," Procedia Comput. Sci., vol. 69, pp. 132–142, Dec. 2015.
[11] M. Maniruzzaman, N. Kumar, M. M. Abedin, M. S. Islam, H. S. Suri, A. S. El-Baz, and J. S. Suri, "Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm," Comput. Methods Programs Biomed., vol. 152, pp. 23–34, Dec. 2017.
[12] A. Ashiquzzaman, A. K. Tushar, M. Islam, J.-M. Kim et al., "Reduction of overfitting in diabetes prediction using deep learning neural network," arXiv preprint arXiv:1707.08386, 2017.
[13] J. Zhu, Q. Xie, K. Zheng. "An Improved Early Detection Method of Type-2 Diabetes Mellitus Using Multiple Classifier Systems". Information Sciences, volume 292, pages 1-14, 2015.
[14] M. Kumari, Dr. R. Vohra, and A. Arora, "Prediction of Diabetes using Bayesian Network," International Journal of Computer Science and Information Technologies, vol. 5, pp. 5174-5178, 2014.
[15] T. Santhanam and M.S Padmavathi, "Application of K-Means and Genetic Algorithms for Dimension Reduction by Integrating SVM for Diabetes Diagnosis," Procedia Computer Science, vol. 47, pp. 76-83, 2015.
[16] J. Vijayashree and J. Jayashree, "An Expert System for the Diagnosis of Diabetic Patients using Deep Neural Networks and Recursive Feature Elimination," International Journal of Civil Engineering and Technology, vol. 8, pp. 633-641, Dec. 2017.
[17] L. B. Goncalves and M. M. Bernardes, "Inverted Hierarchical Neuro-Fuzzy BSP System: A Novel Neuro-Fuzzy Model for Pattern Classification and Rule Extraction in Databases," in IEEE Transactions on Systems, Man, and Cybernetics, vol. 36, no. 2, pp. 236-248, Mar. 2006.
[18] L. Han, S. Luo, H. Wang, L. Pan, X. Ma and T. Zhang, "An Intelligible Risk Stratification Model Based on Pairwise and Size Constrained Kmeans," in IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 5, pp. 1288-1296, Sept. 2017.
[19] Pima Indians Diabetes DataBase, Data Obtained From: http://www.liacc.up.pt/ML/statlog/datasets/diabetes/diabetes.doc.html
[20] Trivedi N.K., Kumar S., Jain S., Maheshwari S. (2021) KFCM-Based Direct Marketing. In: Rathore V.S., Dey N., Piuri V., Babo R., Polkowski Z., Tavares J.M.R.S. (eds) Rising Threats in Expert Applications and Solutions. Advances in Intelligent Systems and Computing, vol 1187. Springer, Singapore. https://doi.org/10.1007/978-981-15-6014-9_57
[21] A. Kaur, K. Guleria and N. Kumar Trivedi, "Feature Selection in Machine Learning: Methods and Comparison," 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), 2021, pp. 789-795, doi: 10.1109/ICACITE51222.2021.9404623.
