Performance Analysis of Deep Neural Network and Machine Learning Algorithms For Diabetes Prediction
Abstract— Diabetes mellitus is characterised by chronic hyperglycaemia, which can lead to many complications. According to present rates of morbidity, there are expected to be 642 million diabetic patients worldwide by 2040, or one in every ten individuals. This alarming statistic demands more attention. The huge and highly sensitive data that the healthcare industry produces must be handled with care. Diabetes mellitus is one of the numerous fatal diseases spreading around the globe, and medical professionals want a reliable diabetes prediction system. Machine learning techniques are used in a range of areas for predictive modelling over large data and, owing to their rapid progress, are now applied in many fields of medical science. Although it is challenging to use analytics to predict outcomes in healthcare, in the long run it may help practitioners make quick decisions on the treatment and well-being of individuals based on vast volumes of data. This work uses the Pima Indians diabetes dataset, whose original repository is the National Institute of Diabetes and Digestive and Kidney Diseases. Seven distinct machine learning algorithms are employed and compared with a deep neural network model for diabetes prediction. The deep neural network model gives a better accuracy score than the conventional machine learning models.

Keywords—artificial neural network, machine learning, precision, recall, deep learning, diabetes dataset, prediction

I. INTRODUCTION
Diabetes is a common chronic illness that poses a major threat to people's health. It is characterised by elevated blood glucose levels, which can be brought on by problems with insulin secretion, the physiological effects of insulin, or both. Several tissues, particularly the cardiovascular system, kidneys, liver, nerves, blood vessels, and eyes, can suffer persistent damage and malfunction as a result of diabetes. Type 1 diabetes (T1D) and type 2 diabetes (T2D) are the two distinct kinds of diabetes. Patients with type 1 diabetes are typically younger than 30 [1], [2]. Typical clinical indications include high blood sugar levels, increased thirst, and frequent urination. Since oral medications alone cannot treat this kind of diabetes, patients must have insulin therapy. Type 2 diabetes, which is frequently associated with overweight, hypertension, lipid problems, arteriosclerosis, and other illnesses, is more likely to occur in middle-aged and older adults [3].

Diabetes is becoming more prevalent in daily life as living standards rise. It is therefore imperative to study the simplest and most precise methods of identifying and evaluating diabetes. Fasting blood glucose, glucose tolerance, and random blood glucose levels are all used in the medical diagnosis of diabetes. The earlier the diagnosis is made, the easier the disease is to manage. Based on data from routine physical examinations, users and experts can use machine learning to establish an initial diagnosis of diabetes mellitus. The two most significant issues in machine learning are selecting the appropriate classifier and valid features [4]. Classification, one of the most often used machine learning techniques, uses training data to generate an inferred function that can be used to map new or unseen instances. The present investigation uses classification to create a more accurate prediction model. Predictive machine learning approaches are widely used to predict diabetes and give better results. A decision tree is a popular machine learning technique in the medical field since it is good at classifying data. Random forest builds a large number of decision trees. Other machine learning techniques have lately gained popularity, but the neural network performs better in many respects. So, in this study, we used a variety of classification techniques, regression methods, and neural networks to predict diabetes [5].

II. MATERIALS AND METHOD
The proposed methodology consists of a few steps that are crucial to achieving our objective: collecting a diabetes dataset with the relevant patient characteristics, pre-processing the numerical values of the attributes, applying a variety of machine learning techniques, and using the outcome for predictive analysis [6]. The stages are briefly covered in the following sections. The overall workflow is shown in Figure 1: we first take the diabetes dataset, then pre-process the data, and after pre-processing we perform correlation analysis of the features. Then the different
Authorized licensed use limited to: Siksha O Anusandhan University. Downloaded on February 17,2024 at 03:52:30 UTC from IEEE Xplore. Restrictions apply.
machine learning models are used for classification, and model evaluation is then performed with the test dataset. Model prediction occurs on the basis of model accuracy [7].

Figure 1. Process Flow Diagram

A. Dataset and Attributes
The National Institute of Diabetes and Digestive and Kidney Diseases is the original repository for the dataset used in this study. The dataset contains several diagnostic measurements that are used to estimate a patient's chance of having diabetes. The instances were selected from a larger database under a number of constraints. In particular, all patients in the dataset are Pima Indian women at least 21 years old. The dataset has 768 instances with a total of 9 attributes. It consists of several medical predictor (independent) variables and one target (dependent) variable, called Outcome. The independent variables include the patient's BMI, glucose level, age, number of prior pregnancies, and others. Information about the dataset columns and their data types is given in Table I.

B. Data Pre-processing
The diabetes dataset has undergone some data pre-processing in order to meet the objectives of this study. For example, the precise numerical value of a feature is not needed to predict diabetes. As a result, we convert the numerical attributes into nominal values. The patient's age, for instance, is divided into three groups: minor (10–25 years), adult (26–50 years), and old (above 50 years). Patients' weights are classified similarly: underweight (less than or equal to 40 kg), normal (41–60 kg), and overweight (more than 60 kg). Lastly, blood pressure is divided into three groups: low (less than 80 mmHg), normal (120/80 mmHg), and high (more than 120 mmHg).

C. Machine Learning Techniques
After the data is ready for modelling, we apply several well-established machine learning classification techniques to predict diabetes mellitus. A summary of these approaches follows.

1. Logistic regression
Logistic regression is a supervised machine learning technique used mainly in classification tasks, where it calculates the likelihood that a particular instance belongs to a specific class. It is called regression because it uses the output of a linear regression function as input for determining the probability of the given class [8]. In contrast to linear regression, which produces a continuous number that might be anything, logistic regression predicts whether a given instance belongs to a certain class or not.
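As a rough illustration of the logistic regression description, the sketch below passes the output of a linear function through a sigmoid to obtain a class probability. The sigmoid link is the usual choice but is an assumption here, since the text does not name it; the weights, bias, and feature values are hypothetical and are not fitted to the Pima dataset.

```python
import math


def sigmoid(z):
    """Squash a real number into [0, 1] without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(weights, bias, features):
    """Feed the linear regression output into the sigmoid."""
    linear_output = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(linear_output)


def predict_class(weights, bias, features, threshold=0.5):
    """Turn the probability into a class label (1 = diabetic, 0 = not)."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical coefficients for two scaled features (illustrative values only).
weights = [1.2, 0.8]
bias = -0.5
patient = [1.0, 0.5]

probability = predict_proba(weights, bias, patient)
label = predict_class(weights, bias, patient)
print(f"P(diabetic) = {probability:.3f}, predicted class = {label}")
```

The 0.5 threshold is a common default rather than a value taken from the study; lowering it trades more false positives for fewer missed cases.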
Table I. Dataset information
Column No.   Column Name                  Data type
1            Pregnancies                  int64
2            Glucose                      int64
3            Blood Pressure               int64
4            Skin Thickness               int64
5            Insulin                      int64
6            BMI                          float64
7            Diabetes Pedigree Function   float64
8            Age                          int64
9            Outcome                      int64

2. K Nearest Neighbor
K-nearest neighbour (KNN) is a simple, non-parametric technique for classification and regression. The algorithm stores all available instances and classifies new instances based on how similar they are to the stored ones. It employs a tree-like data structure to calculate the distance between the point of interest and the points in the training dataset. An instance is classified by its neighbours. In classification, k is a positive integer giving the number of nearest neighbours considered. The nearest neighbours are selected from a set of instances whose class or attribute values are known [9].

3. Classification and Regression Tree (CART)
The CART machine learning technique predicts the values of the target variable based on other characteristics. The predicted value of the target variable is found at the terminal node of each branch. The decision tree is divided into components
that predict at each fork. In the decision tree, sub-nodes are created based on the threshold value of an attribute. The training set, which serves as the root node, is split into two groups using the most advantageous attribute and threshold value. The subgroups are then split using the same logic. This process is repeated until the tree has produced all of the leaves available to it or reaches the final pure subset [10].

4. Random Forest
The random forest algorithm, created by Dr. Breiman, is a versatile classification and regression method. The technique has been shown to perform well when there are many more variables than observations. It averages the predictions of several randomised decision trees. It is an algorithm based on statistical learning theory that uses the bootstrap randomised resampling technique to extract numerous sample sets from the initial training dataset [11]. After building a decision tree for each sample set, the algorithm combines the outputs of all the decision trees and predicts the classification outcome using a pre-established voting mechanism.

5. Support Vector Machine
This is one of the most widely used classification methods. A Support Vector Machine (SVM) is a discriminative classifier that is formally defined by a separating hyperplane. SVM separates instances into classes. Instances that are not supported by the data can also be recognised and classified [12]. The order in which the data for each class are acquired is unimportant to SVM. The approach can be extended in two ways: by using regression analysis to produce a linear function, and by learning to rank elements so as to provide a classification for every element.

6. XGBoost
XGBoost is a gradient boosting library with which machine learning models can be trained quickly and flexibly. Following the ensemble learning approach, the predictions of a number of weak models are combined to produce a stronger prediction. One of XGBoost's important characteristics is its effective handling of missing values, which allows it to work with real-world data containing missing values without time-consuming pre-processing. Additionally, XGBoost supports parallel processing, which enables rapid model training on large datasets. Click-through rate forecasting, recommendation systems, and Kaggle contests are just a few uses for XGBoost. It is also quite adaptable and makes performance optimisation easier because it allows a number of model parameters to be fine-tuned [13].

7. LightGBM
Efficient implementations of the widely used Gradient Boosting Decision Tree (GBDT) technique include XGBoost and parallel Gradient Boosted Regression Trees (pGBRT). Even though both make extensive use of engineering optimisations, their efficiency and scalability are comparatively poor for high-dimensional feature spaces and large datasets. One key factor is the very long computation time needed to scan all the data instances for each feature in order to assess the information gain of all potential split points [14]. LightGBM is a gradient boosting framework that uses tree-based learning techniques. It is designed to be distributed and efficient, and relies on two novel strategies: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS excludes a sizeable part of the data instances with small gradients and uses only the remaining instances to estimate the information gain. Since instances with larger gradients contribute more to the information gain computation, GOSS can estimate the information gain quite accurately with a considerably smaller dataset. EFB bundles mutually exclusive features to reduce the number of features [15], [16].

D. Deep Neural Network (DNN)
Deep neural networks (DNNs) are mathematical models intended to mirror the learning and generalisation capabilities of human neurons. A highly nonlinear system with uncertain or complicated relationships between variables can be modelled using a DNN.
A neural network is made up of layers and different types of neurons. Nodes represent the dendrite and axon portions of the human neuron anatomy, and the connections between nodes represent the weighted axon. The overall structure of the neural network consists of the input layer, one or more hidden layers, and the output layer. Here, the ith neuron is connected to the jth neuron of the structure, and $w_{ij}$ denotes the strength of the link between them [17]. The nodes of a DNN receive inputs (features), process them, and pass the processed data to the next hidden layer through weighted connections: the ith nodes send data to the jth nodes for processing, which consists of computing the weighted sum and adding a bias term ($\theta_j$). This can be expressed mathematically as follows:

$$net_j = \sum_{i} x_i w_{ij} + \theta_j, \quad j = 1, 2, \ldots, n \qquad (1)$$

where $x_i$ is the input received from the ith node.

E. Evaluation
This is the last phase of the prediction model. Here, we assess the quality of the predictions using a variety of measures, such as classification accuracy, precision, recall, and the F1 score.

1. Classification Accuracy
It measures the ratio of the number of correct predictions to the total number of input samples, as given in equation (2):

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \qquad (2)$$

2. Confusion Matrix
A confusion matrix summarises how well a machine learning algorithm performs on a set of test data. It is commonly used to evaluate classification models, which assign a category label to each input instance. The matrix shows how many true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) the model produced on the test data.

                     Actual positive   Actual negative
Predicted positive   TP                FP
Predicted negative   FN                TN
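The confusion matrix and the accuracy of equation (2) can be sketched in a few lines. The snippet below counts TP, FP, FN and TN for a binary problem and derives accuracy from them; the label lists are made-up values for illustration, not results from the study, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["TP"] += 1
        elif predicted == positive:
            counts["FP"] += 1
        elif actual == positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def accuracy(counts):
    """Correct predictions (TP + TN) divided by all predictions."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["TP"] + counts["TN"]) / total


# Made-up labels: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(counts)
print(counts, acc)
```

Writing accuracy in terms of the four counts makes it clear that the number of correct predictions in equation (2) is TP + TN.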
3. Precision
Precision is the proportion of true positive results among all results that the model predicted as positive. It is calculated as TP/(TP+FP), as given in equation (3). This statistic indicates how often the model's positive predictions are correct.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (3)$$

4. Recall
Recall is the proportion of actual positive cases that the model identifies as positive. It is calculated as TP/(TP+FN), as given in equation (4). It indicates how well the model can identify the real positive cases.

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (4)$$

5. F1 Score
The F1 score combines a model's precision and recall into one metric, which has led to its frequent use in recent research [18]. In contrast to accuracy, which measures a model's overall effectiveness, the F1 score measures predictive capacity by concentrating on how well the model performs within each class.

$$F1 = 2 \times \frac{1}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \qquad (5)$$

Figure 2 shows the correlation matrix of the dataset as a heatmap. Heatmaps make it very easy to understand the association between one feature (variable) and every other feature. In other words, a correlation matrix is a table that shows the correlations between pairs of variables in a dataset.

Figure 2. Correlation matrix graph of the data set
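To make the idea of a correlation matrix concrete, here is a minimal sketch that computes pairwise Pearson correlations between columns. Pearson's coefficient is assumed because the text does not say which correlation measure was used, and the column values are toy numbers, not rows from the Pima dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy columns named after Table I attributes (values are invented).
columns = {
    "Glucose": [85, 90, 120, 150, 170],
    "BMI": [22.0, 25.5, 28.1, 31.0, 35.2],
    "Outcome": [0, 0, 0, 1, 1],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A heatmap such as the one in Figure 2 is simply this table drawn with one coloured cell per pair of variables.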
Figure 3. Visualization of missing observations
Lack of an appropriate diagnosis plan, a lack of funding, and a general lack of information are the main contributors to these negative effects. Therefore, avoiding the illness through early identification may reduce a substantial economic load and assist patients in managing their diabetes. The advantages of the suggested system include, but are not limited to, the following:

• For hospital management, it would serve as a Decision Support System (DSS), which would greatly aid them in making quick, high-quality decisions.

• With this approach, it is hoped that more effective measures may be implemented to lessen the negative effects that diabetes has on the patient and to provide recommendations that would enable patients to manage their health properly.

• Since most operations would be automated under the proposed system, it saves the hospital administration the time and effort used to determine patients' diseases in the current system.

IV. RESULT ANALYSIS
Seven machine learning algorithms were employed in this experiment and compared with a DNN model. The machine learning models are LR, KNN, CART, RF, SVM, XGB and LightGBM. All of these methods were applied to the PIMA Indian diabetes dataset. The data were divided into training and testing sets containing 70% and 30% of the total data, respectively. Prediction accuracy, the algorithm's overall success rate, was our main criterion for evaluation in this study. Accuracy, precision, recall and F1 score were evaluated for each algorithm and are shown in Table II. The accuracy and parameter values given in the table are plotted as a bar diagram in Figure 4 for easy visualisation. The box plot of the accuracy values of the classifiers is shown in Figure 5. The results indicate that the accuracy of the DNN is 0.92, which is better than all other models. This DNN model has one input layer, seven hidden layers and one output layer. The ReLU and sigmoid activation functions have been used, with binary cross-entropy as the loss function.
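The network described above (one input layer, seven hidden layers, one output layer, binary cross-entropy loss) can be sketched as a plain forward pass built on the weighted sum of equation (1). Several details below are assumptions rather than facts from the paper: the hidden-layer width of 16, ReLU on the hidden layers with sigmoid on the output, the random untrained weights, and the sample feature values are all hypothetical.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def dense(inputs, weights, biases, activation):
    """One layer: net_j = sum_i x_i * w_ij + theta_j, then the activation."""
    outputs = []
    for j, theta_j in enumerate(biases):
        net_j = sum(x_i * weights[i][j] for i, x_i in enumerate(inputs)) + theta_j
        outputs.append(activation(net_j))
    return outputs


def build_network(n_features, hidden_sizes, rng):
    """Create (weights, biases) pairs for every hidden layer plus the output."""
    sizes = [n_features] + list(hidden_sizes) + [1]
    network = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
        network.append((weights, [0.0] * n_out))
    return network


def forward(network, features):
    """ReLU on hidden layers, sigmoid on the single output neuron (assumed)."""
    activations = features
    for weights, biases in network[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = network[-1]
    return dense(activations, weights, biases, sigmoid)[0]


def binary_cross_entropy(y_true, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


rng = random.Random(0)
HIDDEN_SIZES = [16] * 7  # seven hidden layers; the width 16 is hypothetical
network = build_network(8, HIDDEN_SIZES, rng)  # 8 predictors, as in Table I

sample = [0.35, 0.74, 0.59, 0.35, 0.0, 0.50, 0.23, 0.48]  # invented scaled features
probability = forward(network, sample)
loss = binary_cross_entropy(1, probability)
print(f"untrained output = {probability:.3f}, loss = {loss:.3f}")
```

Because the weights are random, the output carries no predictive meaning; training would adjust the weights and biases to reduce the loss over the 70% training split.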
Table II. Model evaluation on the basis of accuracy and different parameters
Figure: Performance Comparison of Models (chart; vertical axis from 0 to 1)
V. CONCLUSION
In this study, a variety of machine learning techniques were applied to the dataset, and numerous algorithms were used to classify the data. Compared with traditional machine learning models, the deep neural network gives a better result. This can be attributed to the fact that, unlike traditional programming, a neural network does not store information in a database but across the whole network, so the network still works even if some information is briefly lost at one point. There are several possible negative outcomes associated with diabetes mellitus, so it could be beneficial to apply machine learning to predict and diagnose this illness accurately. The results imply that diabetes may be predicted using machine learning, but it is critical to choose the appropriate features, classifier, and data mining approach. Because the type of diabetes cannot be identified from the present data alone, in future work we want to determine the type of diabetes and analyse the relative significance of each indicator, which may improve the precision with which diabetes is predicted. Additionally, it is possible for non-diabetic persons to develop diabetes in the coming years.

REFERENCES
[1] J. Andreu-Perez, C. C. Y. Poon, R. D. Merrifield, S. T. C. Wong and G.-Z. Yang, "Big Data for Health", IEEE J. Biomed. Heal. Informatics, vol. 19, no. 4, pp. 1193-1208, 2015.
[2] M. Chen, Y. Hao, K. Hwang, L. Wang and L. Wang, "Disease Prediction by Machine Learning over Big Data from Healthcare Communities", IEEE Access, vol. 5, no. c, pp. 8869-8879, 2017.
[3] J. B. Heaton, N. G. Polson and J. H. Witte, "Deep learning for finance: deep portfolios", Appl. Stoch. Model. Bus. Ind., vol. 33, no. 1, pp. 3-12.
[4] M. Chen, Y. Hao, K. Hwang, L. Wang and L. Wang, "Disease Prediction by Machine Learning Over Big Data from Healthcare Communities", IEEE Access, vol. 5, pp. 8869-8879, 2017.
[5] N. Tripathy, S. Hota, S. Prusty and S. K. Nayak, "Performance Analysis of Deep Learning Techniques for Time Series Forecasting", in 2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT), pp. 639-644, IEEE, 2023.
[6] D. M. Renuka and J. M. Shyla, "Analysis of Various Data Mining Techniques to Predict Diabetes Mellitus", Int. J. Appl. Eng. Res. ISSN, vol. 11, no. 1, pp. 973-4562, 2016.
[7] K. Kayaer and T. Yildirim, "Medical Diagnosis on Pima Indian Diabetes Using General Regression Neural Networks", International Conf. Artif. Neural Networks Neural Inf. Process., pp. 181-184, 2003.
[8] M. A. Abdul-Ghani and R. A. DeFronzo, "Plasma Glucose Concentration and Prediction of Future Risk of Type 2 Diabetes", Diabetes Care, vol. 32, no. suppl_2, pp. S194-S198, Nov. 2009.
[9] Z. Punthakee, R. Goldenberg and P. Katz, "Definition Classification and Diagnosis of Diabetes Prediabetes and Metabolic Syndrome", Can. J. Diabetes, vol. 42, pp. S10-S15, 2018.
[10] G. Swapna, R. Vinayakumar and K. P. Soman, "Diabetes detection using deep learning algorithms", ICT Express, vol. 4, no. 4, pp. 243-246, 2018.
[11] S. K. Nayak, A. K. Nayak, S. Mishra, P. Mohanty, N. Tripathy, A. Pati and A. Panigrahi, "Speech data collection system for KUI, a Low resourced tribal", Journal of Autonomous Intelligence, 7(1).
[12] M. T. P. Kamble, "Diabetes Detection using Deep Learning Approach", vol. 2, no. 12, pp. 342-349, 2016.
[13] Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju and H. Tang, "Predicting Diabetes Mellitus with Machine Learning Techniques", Front. Genet., vol. 9, pp. 1-10, 2018.
[14] N. Tripathy, S. Hota and D. Mishra, "Performance analysis of bitcoin forecasting using deep learning techniques", Indonesian Journal of Electrical Engineering and Computer Science, vol. 31, no. 3, pp. 1515-1522, 2023.
[15] V. A. K. and R. C., "Classification of Diabetes Disease Using Support Vector Machine", International Journal of Engineering Research and Applications, vol. 3, pp. 1797-1801, April 2013.
[16] N. Tripathy, S. Parida and S. K. Nayak, "Forecasting Stock Market Indices Using Gated Recurrent Unit (GRU) Based Ensemble Models: LSTM-GRU", International Journal of Computer and Communication Technology, vol. 9(1), 2023.
[17] X. H. Meng, Y. X. Huang, D. P. Rao, Q. Zhang and Q. Liu, "Comparison of three data mining models for predicting diabetes or prediabetes by risk factors", The Kaohsiung Journal of Medical Sciences, vol. 29, no. 2, pp. 93-99, 2013.
[18] A. Swarupa Rani and S. Jyothi, "Performance analysis of classification algorithms under different datasets", Computing for Sustainable Global Development (INDIACom) 2016 3rd International Conference on, pp. 1584-1589, 2016.
[19] N. Tripathy, S. K. Nayak, J. F. Godslove, I. K. Friday and S. S. Dalai, "Credit Card Fraud Detection Using Logistic Regression and Synthetic Minority Oversampling Technique (SMOTE) Approach", International Journal of Computer and Communication Technology, 8(4), p. 4, 2022.