
Predicting Diabetes Mellitus in Healthcare: A Comparative Analysis of Machine Learning Algorithms on Big Dataset

*Corresponding author:

Chandrakanth Rao Madhavaram

Infosys, Technology Lead

And

Eswar Prasad Galla 2, Mohit Surender Reddy 3, Manikanth Sarisa 4, Venkata Nagesh Boddapati 5, Siddharth Konkimalla 6

2 Infosys, Senior Support Engineer
3 Motorola Solutions, Sr Network Engineer
4 Sr Application Developer, Bank of America
5 Microsoft, Support Escalation Engineer
6 Amazon Com LLC, Network Development Engineer

SPECIAL EDITION
2021

Published in the Journal of

Global Journal of Research in Engineering & Computer Sciences

ISSN: 2583-2727 (Online)
Volume: 1 Issue: 1 (2021)

Published by GJR PUBLICATION

Global Journal of Research in Engineering & Computer Sciences
ISSN: 2583-2727 (Online)
Volume 01| Issue 01 | 2021
Journal homepage: https://gjrpublication.com/gjrecs/
Research Article
Predicting Diabetes Mellitus in Healthcare: A Comparative Analysis of Machine Learning
Algorithms on Big Dataset
*Chandrakanth Rao Madhavaram 1, Eswar Prasad Galla 2, Mohit Surender Reddy 3, Manikanth Sarisa 4, Venkata Nagesh Boddapati 5, Siddharth Konkimalla 6
1 Infosys, Technology Lead
2 Infosys, Senior Support Engineer
3 Motorola Solutions, Sr Network Engineer
4 Sr Application Developer, Bank of America
5 Microsoft, Support Escalation Engineer
6 Amazon Com LLC, Network Development Engineer
DOI: 10.5281/zenodo.14010835

*Corresponding author: Chandrakanth Rao Madhavaram

Infosys, Technology Lead

Abstract
Diabetes mellitus, commonly referred to as diabetes, is a leading cause of death and morbidity among non-communicable diseases and a significant public health issue that affects millions of people worldwide. According to the International Diabetes Federation, 642 million individuals are projected to be living with the disease by 2040, up from 415 million now. Because this chronic condition impairs the body's capacity to absorb glucose, early risk prediction is crucial for its diagnosis and prevention. This study presents a comprehensive evaluation of machine learning techniques for diabetes outcome prediction using data from the UCI Machine Learning Repository. The dataset is split into training (80%) and testing (20%) subsets, which are used to evaluate several classifiers: Support Vector Machines (SVM), Multi-Layer Perceptrons (MLP), and Gradient Boosting Machines (GBM). When performance is measured using accuracy and F1-score, the results show that the GBM model outperforms the MLP and SVM by a significant margin, with an accuracy of 96.92% compared to 76% and 77.73%, respectively. This study highlights the superior predictive capability of the GBM model, emphasizing its potential to enhance diabetes management and support healthcare professionals in making informed clinical decisions. These findings contribute to the growing body of evidence supporting the integration of machine learning in healthcare settings for improved patient outcomes.

Keywords: Healthcare, prediction, Machine learning, diagnosis, Diabetes mellitus, Data mining, diabetes
dataset.

INTRODUCTION
Healthcare data are produced in a variety of forms and from a variety of sources. Integrating health data and bringing it to a shared platform for further analysis, in order to generate useful information, calls for sophisticated tools and procedures.

The variability, inconsistency, and incompleteness of healthcare data prevent healthcare workers from gaining useful knowledge for usable clinical intelligence. Healthcare providers use a variety of methods to predict diabetes mellitus [1]. Diabetes mellitus is a chronic illness whose hallmark is hyperglycemia [2][3], and numerous complications can result from it. Given the rising morbidity of recent years, there will be 642 million diabetic patients worldwide in 2040, meaning that one in ten people will have the disease. This concerning statistic requires careful consideration. Numerous facets of medical health have benefited from the rapid growth of machine learning [4].

1 @ 2021 | PUBLISHED BY GJR PUBLICATION, INDIA

Global J Res Eng Comput Sci. 2021; 1(1), 1-11

Diabetes mellitus, often referred to simply as diabetes, is a chronic metabolic illness that impairs the body's ability to use food as fuel, which raises blood sugar levels. The problems that diabetes produces in the human body include hyperglycemia toxicity, in which the bloodstream is saturated with sugar, as well as heart disease, gum disease and tooth decay, renal failure, and various other diseases that can be fatal. Diabetes has no known cure; however, early identification allows patients to prevent or halt the progression of the condition [5].

This paper examines the most popular machine learning methods for identifying people with diabetes mellitus. AI and ML [6] contribute significantly to the control of diabetes by empowering patients to make wise dietary and physical activity choices. Given that diabetes can strike anybody and that its symptoms are difficult to identify, early identification is crucial, which confirms the necessity of routine checkups [7]. The diabetes prediction system uses machine learning to identify diabetic patients, and it is trained with a supervised learning algorithm [8].

1.1 Motivation and Contribution of the Paper

The rising incidence of diabetes mellitus throughout the world presents serious health issues, making the creation of efficient prediction models for early detection and treatment necessary. Since millions are affected, rapidly identifying those who are at risk is essential to improving patient outcomes and reducing the strain on healthcare systems. This study aims to harness machine learning algorithms to enhance diabetes prediction capabilities, addressing the urgent need for data-driven approaches in healthcare that facilitate proactive management and targeted prevention strategies. This paper makes several key contributions to the field of healthcare analytics and ML:
• Utilizes the diabetes dataset for predicting diabetes mellitus.
• Implements label encoding to convert categorical variables into numerical format, facilitating the application of ML algorithms.
• Employs the Extra Trees Classifier (ETC) for feature importance analysis, identifying key predictors of diabetes, which helps in reducing dimensionality and improving model performance.
• Applies normalization, specifically Standard Scaler, to standardize the dataset, enhancing model performance.
• Conducts a comparative analysis of various ML models, including GBM, MLP, and SVM, to evaluate their effectiveness in predicting diabetes outcomes.
• Utilizes performance metrics such as AUC-ROC, accuracy, and F1-score.
1.2 Structure of the Paper
The remainder of the paper is organised as follows. Prior research on diabetes mellitus prediction is presented in Section 2. The approach is described in depth in Section 3. The findings are analysed, compared, and discussed in Section 4. The study's conclusions and recommendations for further research are presented in Section 5.

LITERATURE REVIEW
Researchers have recently shown increasing interest in predicting diabetes mellitus. Some background studies are summarised below.
Agarwal and Saxena (2019) build a model on the Pima Indians Diabetes Dataset, a well-known diabetes dataset that contains information on Pima women, who are disproportionately affected by diabetes. A key property of this dataset is that its features are physical factors rather than factors that depend on the women's region. To predict and diagnose diabetes successfully, the authors searched for the best-suited algorithm; their primary objective was to find the best accuracy by comparing various algorithms. The algorithms compared are DT, LR, Naïve Bayes, SVM, and KNN. With K-fold cross-validation, an accuracy of 81.1% was achieved [9].

Yahyaoui et al. (2019) present a Decision Support System (DSS) for diabetes prediction based on machine learning (ML). They contrasted traditional machine learning techniques with deep learning (DL) methods. For the conventional ML techniques, they examined the two most often used classifiers, RF and SVM. For DL, a convolutional neural network (CNN) was used to predict and identify diabetic individuals. The proposed process was evaluated on the Pima Indians Diabetes dataset, which is accessible through the Dew Media public repository and comprises 768 samples with 8 characteristics each. Of these, 268 samples are classified as diabetic, while the remaining 500 samples are placed in the non-diabetic category. Accuracy was 76.81% for DL, 65.38% for SVM, and 83.67% for RF. According to the experimental results, RF was more effective in predicting diabetes than the deep learning and SVM methods [7].

Islam et al. (2019) gathered 340 cases, each with 26 characteristics, from individuals with diabetes who exhibit a range of symptoms, divided into typical and non-typical groups. The models were trained using the cross-validation approach, and three ML algorithms (RF, LR, and Bagging) were applied for classification. Random Forest, Logistic Regression, and Bagging achieved accuracy rates of 90.29%, 83.24%, and 89.12%, respectively [10].

Kowsher et al. (2019) compare seven ML classifiers and an ANN approach in order to recognise and treat diabetes patients as early as possible. Their training and test dataset consists of data from 9483 diabetic people. According to the authors, the size of the training dataset prevents overfitting and yields very precise test results. Using performance indicators such as accuracy and precision, they select deep ANN as the best approach; with an accuracy of 95.14%, it outperforms all the other ML classifiers studied. The authors anticipate that hospitals will be able to predict diabetes using this method, which should also stimulate research into more accurate prediction models [11].

To help with patient categorisation for intensive case management among individuals with type 2 diabetes, Seng et al. (2016) examine the use of predictive analytics on EHR data in a Singaporean academic health system. They created a risk score for high healthcare users using a multidisciplinary team approach. To forecast the top 10% of healthcare spenders in 2011, the risk score was obtained by combining a multiple logistic regression model with a backward stepwise variable selection approach based on the Akaike Information Criterion. The variables of the risk score included sociodemographic, biochemical, comorbidity, and healthcare usage parameters. Compared to using the 2010 total cost as the sole predictor, the risk score's Area under the Curve (AUC) was greater, at 0.708. Where routine biochemistry measures are part of clinical practice for T2DM, their absence may be seen either as a sign that the patient's condition is being positively perceived or as a sign that the patient is not receiving frequent follow-up for the disease. Close cooperation across several disciplines is therefore essential for a comprehensive interpretation of a risk score [12].

The background studies on predicting diabetes mellitus, with their datasets, models, performance, and limitations, are summarised in Table I.
Table I. Comparative Study on Predicting Diabetes Mellitus using multiple approaches
Author | Methods | Data | Performance | Limitation/future work
Agarwal and Saxena | SVM, KNN, Naïve Bayes, Decision Trees, and Logistic Regression | Pima Indians Diabetes Dataset (768 instances, 8 features) | Accuracy with K-Fold Cross Validation: 81.1% | Limited to a specific demographic; further testing on diverse populations needed.
Yahyaoui et al. | SVM, Random Forest, Convolutional Neural Network (CNN) | Pima Indians Diabetes Dataset (768 instances, 8 features) | SVM: 65.38%, RF: 83.67%, DL: 76.81% | RF outperformed; explore other deep learning architectures for better accuracy.
Islam et al. | Bagging, Logistic Regression, Random Forest | Custom dataset (340 instances, 26 features) | Bagging: 89.12%, LR: 83.24%, RF: 90.29% | Limited dataset size; consider larger and more diverse datasets for generalization.
Kowsher et al. | Various ML classifiers, Deep Artificial Neural Network (ANN) | Dataset of 9483 diabetes patients | Accuracy: 95.14% | High accuracy may not translate to clinical settings; explore real-world applicability.
Seng et al. | Multiple Logistic Regression | EHR data for T2DM patients (not specified) | AUC: 0.708 | Lack of biochemistry data; future work could integrate more clinical parameters.

RESEARCH METHODOLOGY
The steps of the methodology workflow for the machine learning diabetes mellitus prediction model are presented in Figure 1. The methodology focuses on predicting diabetes mellitus by analysing a dataset extracted from the UCI Machine Learning Repository that contains diabetes-related symptoms from 520 individuals. The initial stage involves comprehensive data preprocessing, including handling missing values through imputation and removing redundant data to ensure high-quality inputs. The Standard Scaler technique is applied to normalize features, ensuring all attributes are on a common scale and thus improving model accuracy. Categorical variables are transformed into numerical formats using label encoding to make them compatible with machine learning algorithms. Next, feature importance is assessed using the Extra Trees Classifier (ETC), which assigns importance scores according to each feature's contribution to prediction accuracy. The features found to have the greatest influence on diabetes outcome prediction are polyuria and polydipsia. To train and assess the different machine learning models, the data is then separated into training (80%) and testing (20%) sets. The performance of the GBM, MLP, and SVM classifiers is assessed using confusion matrices and performance measures including accuracy and F1-score.

[Figure 1: flowchart with the following boxes: diabetic dataset from UCI; data preprocessing (handle missing values, remove redundancy); Label Encoder for labeling; normalization with Standard Scaler; feature selection with ETC; data splitting into training and testing sets; apply models like GBM, MLP, and SVM; model evaluation with accuracy, F1-score, and ROC scores; results.]

Fig. 1. Flowchart for Predicting Diabetes Mellitus

Each phase of the flowchart in Figure 1 is briefly described below.

3.1 Data Collection

The dataset, which comprises reports of diabetes-related symptoms from 520 people, was obtained from the UCI Machine Learning Repository. It includes personal information about each person, such as signs of diabetes. Extensive data quality checks were carried out on the dataset during the preparation step. Visualizations of the dataset are presented below:

Fig. 2. Count plot for gender

The count plot in Figure 2 depicts the distribution of the two classes (Positive and Negative) by gender (Male and Female). For females, the Positive class far outnumbers the Negative class, which has extremely few instances, whereas for males the Negative class is more common than the Positive class. This suggests a potential gender-based disparity in the data: males have a higher representation in the negative class, while females are predominantly in the positive class.

Fig. 3. Count plot for Polydipsia

In Figure 3, the count plot shows the relationship between Polydipsia (excessive thirst) and the two classes (Positive and
Negative). For individuals with Polydipsia ("Yes"), the Positive class is significantly higher, indicating a strong
association between Polydipsia and the Positive class. On the other hand, for those without Polydipsia ("No"), the
Negative class has a higher count, suggesting that the absence of Polydipsia is more commonly associated with the
Negative class.

Fig. 4. Correlations of attributes with output class

The bar plot in Figure 4 shows that attributes like polyuria, polydipsia, and sudden weight loss have strong positive
correlations with diabetes, while age, alopecia, and obesity have weaker or negative correlations. Strongly correlated
attributes are likely more important for predicting diabetes, whereas weakly correlated ones may be excluded to enhance
model performance. However, correlation doesn't imply causation, so further analysis is needed to confirm any causal
relationships.
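The attribute-to-class correlations plotted in Figure 4 can be illustrated with a small sketch. The snippet below is a minimal example in plain Python, assuming symptoms are coded 1 for "Yes" and 0 for "No" and the class is coded 1 for Positive. The toy values and the helper name `pearson` are hypothetical and are not taken from the study's data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: 1 = symptom present / Positive class, 0 = absent / Negative.
polyuria = [1, 1, 1, 0, 0, 1, 0, 0]
obesity = [0, 1, 0, 1, 0, 0, 1, 0]
outcome = [1, 1, 1, 0, 0, 1, 1, 0]

correlations = {
    "polyuria": pearson(polyuria, outcome),
    "obesity": pearson(obesity, outcome),
}
```

In this toy data the polyuria column tracks the outcome much more closely than the obesity column, which is the kind of contrast the bar plot summarises.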

3.2 Data Preprocessing

Preprocessing is the process by which raw, unstructured data is transformed into representations suitable for machine-learning models [13]. It is largely used to improve the quality of the input data by reducing noise, redundant data, and unnecessary data [14]. This phase of the model deals with noise in the original dataset in order to arrive at better results. The dataset also contains some missing values. These are imputed on the basis of a few chosen attributes such as age, BMI, skin thickness, blood pressure, and glucose level, because some attribute values cannot be zero. The dataset is then scaled so that all features share a common scale. The main preprocessing steps are:
• Handle missing values: Deletion eliminates the rows that contain missing data, whereas imputation substitutes a statistical measure such as the mean or median, or a model-based estimate, for the missing values.
• Remove redundancy: Redundant data requires extra storage space, which matters particularly when storage is costly, so all redundant records are removed from the dataset.
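The two preprocessing steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming missing values are marked with `None` and that mean imputation is the chosen strategy; the record layout and the helper names `impute_mean` and `drop_duplicates` are hypothetical, and the paper does not state which imputation statistic was actually used.

```python
from statistics import mean

def impute_mean(rows, column):
    """Replace None in `column` with the mean of the observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical records; None marks a missing value.
records = [
    {"age": 40, "polyuria": "Yes"},
    {"age": None, "polyuria": "No"},
    {"age": 60, "polyuria": "Yes"},
    {"age": 40, "polyuria": "Yes"},  # exact duplicate of the first record
]

cleaned = drop_duplicates(impute_mean(records, "age"))
```

Here imputation runs before de-duplication, so the duplicate row still contributes to the mean; reversing the order is an equally reasonable choice.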

3.3 Normalization with Standard Scaler

Normalisation is a data preparation method that shifts a dataset's features to a common scale to improve the accuracy and effectiveness of machine learning algorithms [15]. The Standard Scaler approach uses Z-score normalisation to standardise attributes: subtracting the attribute's mean from each value and dividing the result by the attribute's standard deviation yields a distribution with zero mean and unit variance. A value $x_i$ is transformed into $x_i'$ using Equation 1, where $\bar{x}$ is the mean of the variable $x$ and $s$ is its standard deviation.

$$x_i' = \frac{x_i - \bar{x}}{s} \qquad (1)$$

The sample mean of the attribute serves as the translational term in this instance, while the standard deviation acts as the scaling factor.
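As a worked illustration of Equation 1, the sketch below standardises a small list of values in plain Python. The ages are hypothetical, and the function name `standard_scale` is an assumption made for this example. It uses the sample standard deviation; scikit-learn's `StandardScaler` divides by the population standard deviation instead, so its output differs by a constant factor.

```python
from statistics import mean, stdev

def standard_scale(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in values]

ages = [30, 40, 50, 60, 70]  # hypothetical ages
scaled = standard_scale(ages)
```

After scaling, the values have zero mean and unit sample standard deviation, whatever the original units were.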

3.4 Label Encoder

Label encoding is a technique for converting categorical information into a numerical representation in data analysis and machine learning. The Label Encoder is used during data preparation to transform categorical variables into numerical form, which it does by assigning each category a distinct number [16].
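A minimal sketch of label encoding in plain Python follows. The helper name `fit_label_encoder` and the sample values are hypothetical; codes are assigned in sorted order of the category names, which is also how scikit-learn's `LabelEncoder` orders its classes.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

gender = ["Male", "Female", "Male", "Female"]
polyuria = ["Yes", "No", "No", "Yes"]

gender_codes = fit_label_encoder(gender)      # {"Female": 0, "Male": 1}
polyuria_codes = fit_label_encoder(polyuria)  # {"No": 0, "Yes": 1}

encoded_gender = [gender_codes[g] for g in gender]
encoded_polyuria = [polyuria_codes[p] for p in polyuria]
```

For binary symptom columns this yields a 0/1 coding, so no artificial ordering is introduced between categories.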

3.5 Feature Importance using ETC

Feature importance using the Extra Trees Classifier (ETC) measures how the accuracy of the model's predictions is
influenced by each feature. Following dataset training, the classifier gives each feature a significance score that indicates
how relevant it is to the classification objective [17]. Higher-scoring features are thought to have more predictive power,
whereas lower-scoring features could have less of an effect. This helps in identifying key predictors and potentially
reducing the dataset's dimensionality by removing less important features, thereby improving model performance and
interpretability.
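The idea can be sketched with scikit-learn's `ExtraTreesClassifier` and its `feature_importances_` attribute. The example below is illustrative only: it assumes NumPy and scikit-learn are installed, it uses synthetic data rather than the study's dataset, and the feature names and hyperparameters are assumptions rather than the settings used in the paper.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic binary features (hypothetical, not the study's data).
informative = rng.integers(0, 2, size=n_samples)
noise_a = rng.integers(0, 2, size=n_samples)
noise_b = rng.integers(0, 2, size=n_samples)

# The label copies the informative feature, with about 10% of labels flipped.
flip = rng.random(n_samples) < 0.10
y = np.where(flip, 1 - informative, informative)

X = np.column_stack([informative, noise_a, noise_b])
feature_names = ["informative", "noise_a", "noise_b"]

model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Importance scores are normalised to sum to 1; higher means more influential.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Features near the bottom of such a ranking are candidates for removal when reducing dimensionality.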

Fig. 5. Important features plot using extra trees classifier

In Figure 5, the bar plot illustrates the relative importance of the variables in predicting the target variable, namely the presence or absence of the disease. The y-axis represents the relative importance score, which indicates each variable's impact on the predictive ability of the model. Based on the plot, polyuria and polydipsia emerge as the most important variables, followed by gender and sudden weight loss. Other variables, such as age, partial paresis, and itching, exhibit moderate importance, while variables like obesity and weakness demonstrate relatively lower importance. These findings suggest that polyuria, polydipsia, gender, and sudden weight loss are key factors in determining the disease outcome.

3.6 Data Splitting

The preprocessed data is split into two sets: a training set and a testing set. The model is developed on the training set, which holds 80 percent of the data, and its accuracy is tested on the testing set, which holds the remaining 20 percent.
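A shuffled 80/20 split can be sketched as follows in plain Python. The function name `train_test_split_indices` and the seed are assumptions for this example; the paper does not say whether the split was stratified or which random seed was used.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 520 is the number of individuals in the dataset described above.
train_idx, test_idx = train_test_split_indices(520)
```

With 520 records this gives 416 training and 104 testing indices, and fixing the seed makes the split reproducible.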

3.7 Classification with the GBM Model

Gradient boosting machines, or GBMs, use gradients to determine the shortcomings of weak models [18][19]. This is accomplished by an iterative process that combines decision trees in an additive model, with the ultimate goal of linking base learners to reduce prediction errors [20] while using gradient descent to lower the loss function. The gradient boosting tree, or GBT, $F_n(x_t)$ is the sum of $n$ regression trees, as given in Equation 2.

$$F_n(x_t) = \sum_{i=1}^{n} f_i(x_t) \qquad (2)$$

Here each $f_i(x_t)$ is a regression tree (decision tree). To strengthen the ensemble of trees sequentially, the new decision tree $f_{n+1}(x_t)$ is estimated using Equation 3:

$$\arg\min_{f_{n+1}} \sum_{t} L\big(y_t,\ F_n(x_t) + f_{n+1}(x_t)\big) \qquad (3)$$

where the loss function $L$ is differentiable. The steepest descent approach is used to carry out this optimisation [21].
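The additive model and its stage-wise fitting can be illustrated with a toy implementation in plain Python. This is a simplified sketch under stated assumptions, not the model used in the study: it uses one-split trees (stumps) on a single feature, a squared-error loss whose negative gradient is the residual, and a `learning_rate` shrinkage factor that does not appear in Equations 2 and 3. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the one-split regression tree that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_trees, learning_rate=0.5):
    """Build F_n as a sum of stumps fitted to negative gradients."""
    trees = []

    def predict(x):
        return sum(learning_rate * tree(x) for tree in trees)

    for _ in range(n_trees):
        # For L = (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]  # hypothetical binary labels treated as targets

error_1 = mse(gradient_boost(xs, ys, n_trees=1), xs, ys)
error_20 = mse(gradient_boost(xs, ys, n_trees=20), xs, ys)
```

Each added stump fits what the current ensemble still gets wrong, so the training error after 20 trees is lower than after one.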

3.8 Evaluation metrics

When evaluating any data mining classification system, it is crucial to present the confusion matrix and examine certain effectiveness indicators. The confusion matrix is the cross-tabulation of the predicted and actual classes. It displays how many samples fall into each of its four quadrants, which makes it easier to interpret correctly and incorrectly predicted results: TP, TN, FP, and FN. It is therefore crucial for assessing how effectively the model performed the classification. A confusion matrix is shown in Figure 6.

Fig. 6. Representation of confusion matrix

Comparing predicted and actual values produces four different outcomes: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). For example, if an instance is predicted to have diabetes but does not have diabetes, it is classified as a false positive.

a) Accuracy
It is calculated by dividing the number of correct predictions by the total number of input samples, as given in Equation 4.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$

b) F1 score
It is employed to gauge the accuracy of a test. The F1 score is the harmonic mean of precision and recall, and its range is [0, 1]. It indicates how precise and how robust the classifier is. It is expressed mathematically in Equation 5.

$$F1 = \frac{2 \cdot (\text{precision} \cdot \text{recall})}{\text{precision} + \text{recall}} \qquad (5)$$
c) ROC and AUC Score
ROC stands for receiver operating characteristic. It is a probability curve plotted with the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis, and it summarises the quality of a binary classifier across decision thresholds. The ROC curve is summarised by the area under it, abbreviated as AUC. A higher AUC is desired for the model: a perfect classifier has an AUC of 1, while a random classifier has an AUC of 0.5.
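The three metrics above can be computed directly from confusion-matrix counts and classifier scores. The sketch below is a minimal plain-Python illustration; the counts, labels, and scores are hypothetical and are not the study's results. The AUC is computed through its pairwise interpretation, the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, which equals the area under the ROC curve.

```python
def accuracy(tp, fp, tn, fn):
    """Equation 4: share of correct predictions."""
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(tp, fp, fn):
    """Equation 5: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)

def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical confusion-matrix counts and scores, not the study's results.
acc = accuracy(tp=40, fp=5, tn=45, fn=10)
f1 = f1_score(tp=40, fp=5, fn=10)

labels = [0, 0, 1, 1]
auc_perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])
auc_uninformative = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])
```

The perfectly separated scores give an AUC of 1, and the constant scores give 0.5, matching the two reference values stated above.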

RESULTS AND DISCUSSION

This section presents the experimental results of the models, measured by F1-score, accuracy, and ROC-AUC score. The comparative analysis uses the machine learning models MLP [22], SVM [23], and GBM. Table II gives the performance of the GBM model, followed by graphical results including the confusion matrix and the ROC-AUC curve.

Table II. GBM model performance for predicting Diabetes Mellitus on diabetes dataset
Performance metric | Gradient Boosting Machine (%)
Accuracy | 96.92
F1-score | 99.37
ROC-AUC | 99.92

[Figure 7: bar chart titled "Performance of GBM for Prediction of Diabetes Mellitus"; x-axis: metric (Accuracy, F1-score, ROC-AUC); y-axis: value in %; bars: 96.92, 99.37, 99.92.]

Fig. 7. GBM model performance on diabetes dataset

Table II and Figure 7 show the performance of the GBM model. The GBM model for diabetes prediction achieves an accuracy of 96.92%. With an F1-score of 99.37%, it balances precision and recall, keeping false positives and false negatives low. In addition, a ROC-AUC score of 99.92% highlights the model's ability to differentiate between diabetic and non-diabetic cases, supporting its use for early diabetes detection in real-world applications.

Fig. 8. Confusion Matrix for GBM model

Figure 8 shows the confusion matrix for the GBM model, revealing that the model correctly classified 50 instances of class 0 (TP) and 79 instances of class 1 (TN) while making only 1 FP and no FN. With a total of 129 correct predictions out of 130 instances, the model demonstrates strong performance. However, a more comprehensive understanding of its efficacy requires further evaluation using additional metrics such as precision, recall, and F1-score.

Fig. 9. ROC-AUC curve for the GBM model

The ROC curve for the GBM model in Figure 9 illustrates its binary classification performance by plotting the TPR
against the FPR at different thresholds. Improved discrimination is shown by a curve towards the upper-left corner, while
the AUC quantifies this performance, with values close to 1 reflecting strong classification ability. In this case, the ROC
curve suggests the GBM model performs well, likely achieving an AUC near 1, indicating effective identification of both
positive and negative instances with a low FPR.

Table III. Accuracy Comparison between machine learning models on diabetes dataset
Model | Accuracy (%)
Gradient Boosting Machine (GBM) | 96.92
Multi-layer perceptron (MLP) | 76
Support Vector Machine (SVM) | 77.73

[Figure 10: bar chart titled "Accuracy Comparison Between GBM and Other Models Performance"; x-axis: models (GBM, MLP, SVM); y-axis: accuracy in %; bars: 96.92, 76, 77.73.]

Fig. 10. Accuracy comparison between model performance

Figure 10 compares the accuracy of the models. The GBM outperforms both the MLP and the SVM in diabetes prediction accuracy: it achieves 96.92%, significantly higher than the MLP at 76% and the SVM at 77.73%. This comparison highlights the superior performance of GBM on this classification task, making it more reliable for accurate predictions than the MLP and SVM models.

CONCLUSION AND FUTURE STUDY

Diabetes mellitus (DM) is a severe condition that affects many individuals worldwide. Given its high incidence, its detrimental effects on health, and the rising costs of care and treatment, prevention, early identification, and better disease management are imperative. This study uses a dataset taken from the UCI Machine Learning Repository to illustrate the effectiveness of machine learning algorithms in diabetes mellitus prediction, underscoring the importance of careful data preparation and feature selection. According to the investigation, the Gradient Boosting Machine (GBM) performs noticeably better than the other classifiers, MLP and SVM, with an accuracy of 96.92% and an F1-score of 99.37. The identification of key predictors, including polyuria and polydipsia, underscores the relevance of these symptoms in diabetes risk assessment. The robust performance of the GBM model, coupled with its interpretability through feature importance analysis, emphasizes its potential as a reliable tool for clinical decision-making in diabetes management. The findings advocate for the integration of machine learning methodologies in healthcare settings to facilitate early diagnosis and individualised therapeutic approaches. To further improve predictive accuracy, future studies should examine how well these models work across a variety of population groups and consider adding other clinical characteristics.

REFERENCES
1. M. H. Tanrıverdi, T. Çelepkolu, and H. Aslanhan, “Diabetes mellitus and primary healthcare,” J. Clin. Exp.
Investig., vol. 4, no. 4, Dec. 2013, doi: 10.5799/ahinjs.01.2013.04.0347.
2. A. Petersmann et al., “Definition, classification and diagnostics of diabetes mellitus,” J. Lab. Med., 2018, doi:
10.1515/labmed-2018-0016.
3. S. C. R. Vennapusa, T. Fadziso, K. Sachani, V. K. Yarlagadda, and S. K. R. Anumandla, “Cryptocurrency-Based
Loyalty Programs for Enhanced Customer Engagement,” Technol. Manag. Rev., vol. 3, no. 1, pp. 46–62, 2018.
4. Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, and H. Tang, “Predicting Diabetes Mellitus With Machine Learning
Techniques,” Front. Genet., 2018, doi: 10.3389/fgene.2018.00515.
5. A. K. Steck et al., “Predictors of slow progression to diabetes in children with multiple islet autoantibodies,” J.
Autoimmun., 2016, doi: 10.1016/j.jaut.2016.05.010.
6. K. Mullangi, N. D. Vamsi Krishna Yarlagadda, and M. Rodriguez, “Integrating AI and Reciprocal Symmetry in
Financial Management: A Pathway to Enhanced Decision-Making,” Int. J. Reciprocal Symmetry Theor. Phys., vol.
5, no. 1, pp. 42–52, 2018.

7. A. Yahyaoui, A. Jamil, J. Rasheed, and M. Yesiltepe, “A Decision Support System for Diabetes Prediction Using
Machine Learning and Deep Learning Techniques,” in 1st International Informatics and Software Engineering
Conference: Innovative Technologies for Digital Transformation, IISEC 2019 - Proceedings, 2019.
doi: 10.1109/UBMYK48245.2019.8965556.
8. A. Gnana, E. Leavline, and B. Baig, “Diabetes Prediction Using Medical Data,” J. Comput. Intell. Bioinforma.,
2017.
9. A. Agarwal and A. Saxena, “Analysis of machine learning algorithms and obtaining highest accuracy for prediction
of diabetes in women,” in Proceedings of the 2019 6th International Conference on Computing for Sustainable
Global Development, INDIACom 2019, 2019.
10. M. T. Islam, M. Raihan, F. Farzana, M. G. M. Raju, and M. B. Hossain, “An Empirical Study on Diabetes Mellitus
Prediction for Typical and Non-Typical Cases using Machine Learning Approaches,” in 2019 10th International
Conference on Computing, Communication and Networking Technologies, ICCCNT 2019, 2019. doi:
10.1109/ICCCNT45670.2019.8944528.
11. M. H. Tanrıverdi, T. Çelepkolu, and H. Aslanhan, “Diabetes mellitus and primary healthcare,” J. Clin. Exp.
Investig., vol. 4, no. 4, Dec. 2013, doi: 10.5799/ahinjs.01.2013.04.0347.
12. A. Petersmann et al., “Definition, classification and diagnostics of diabetes mellitus,” J. Lab. Med., 2018, doi:
10.1515/labmed-2018-0016.
13. S. C. R. Vennapusa, T. Fadziso, K. Sachani, V. K. Yarlagadda, and S. K. R. Anumandla, “Cryptocurrency-Based
Loyalty Programs for Enhanced Customer Engagement,” Technol. Manag. Rev., vol. 3, no. 1, pp. 46–62, 2018.
14. Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, and H. Tang, “Predicting Diabetes Mellitus With Machine Learning
Techniques,” Front. Genet., 2018, doi: 10.3389/fgene.2018.00515.
15. A. K. Steck et al., “Predictors of slow progression to diabetes in children with multiple islet autoantibodies,” J.
Autoimmun., 2016, doi: 10.1016/j.jaut.2016.05.010.
16. K. Mullangi, N. D. Vamsi Krishna Yarlagadda, and M. Rodriguez, “Integrating AI and Reciprocal Symmetry in
Financial Management: A Pathway to Enhanced Decision-Making,” Int. J. Reciprocal Symmetry Theor. Phys., vol.
5, no. 1, pp. 42–52, 2018.
17. A. Yahyaoui, A. Jamil, J. Rasheed, and M. Yesiltepe, “A Decision Support System for Diabetes Prediction Using
Machine Learning and Deep Learning Techniques,” in 1st International Informatics and Software Engineering
Conference: Innovative Technologies for Digital Transformation, IISEC 2019 - Proceedings, 2019.
doi: 10.1109/UBMYK48245.2019.8965556.
18. A. Gnana, E. Leavline, and B. Baig, “Diabetes Prediction Using Medical Data,” J. Comput. Intell. Bioinforma.,
2017.
19. A. Agarwal and A. Saxena, “Analysis of machine learning algorithms and obtaining highest accuracy for prediction
of diabetes in women,” in Proceedings of the 2019 6th International Conference on Computing for Sustainable
Global Development, INDIACom 2019, 2019.
20. M. T. Islam, M. Raihan, F. Farzana, M. G. M. Raju, and M. B. Hossain, “An Empirical Study on Diabetes Mellitus
Prediction for Typical and Non-Typical Cases using Machine Learning Approaches,” in 2019 10th International
Conference on Computing, Communication and Networking Technologies, ICCCNT 2019, 2019.
doi: 10.1109/ICCCNT45670.2019.8944528.
21. M. Kowsher, M. Y. Turaba, T. Sajed, and M. M. Mahabubur Rahman, “Prognosis and treatment prediction of type-2
diabetes using deep neural network and machine learning classifiers,” in 2019 22nd International Conference on
Computer and Information Technology, ICCIT 2019, 2019. doi: 10.1109/ICCIT48885.2019.9038574.
22. T. C. Seng et al., “Predicting high cost patients with type 2 diabetes mellitus using hospital databases in a multi-
ethnic Asian population,” in 3rd IEEE EMBS International Conference on Biomedical and Health Informatics, BHI
2016, 2016. doi: 10.1109/BHI.2016.7455879.
23. S. Vijayarani, M. Ilamathi, and M. Nithya, “Preprocessing Techniques for Text Mining-An Overview Privacy
Preserving Data Mining View project,” Int. J. Comput. Sci. Commun. Networks, 2015.
24. V. K. Y. Nicholas Richardson, Rajani Pydipalli, Sai Sirisha Maddula, Sunil Kumar Reddy Anumandla, “Role-Based
Access Control in SAS Programming: Enhancing Security and Authorization,” Int. J. Reciprocal Symmetry Theor.
Phys., vol. 6, no. 1, pp. 31–42, 2019.
25. R. P. Vamsi Krishna Yarlagadda, “Secure Programming with SAS: Mitigating Risks and Protecting Data Integrity,”
Eng. Int., vol. 6, no. 2, pp. 211–222, 2018.
26. Z. Lin, G. Ding, J. Han, and L. Shao, “End-to-End Feature-Aware Label Space Encoding for Multilabel
Classification with Many Classes,” IEEE Trans. Neural Networks Learn. Syst., 2018,
doi: 10.1109/TNNLS.2017.2691545.
27. A. Altmann, L. Toloşi, O. Sander, and T. Lengauer, “Permutation importance: A corrected feature importance
measure,” Bioinformatics, 2010, doi: 10.1093/bioinformatics/btq134.
28. V. V. Kumar, A. Sahoo, and F. W. Liou, “Cyber-enabled product lifecycle management: A multi-agent framework,”
in Procedia Manufacturing, 2019. doi: 10.1016/j.promfg.2020.01.247.

@ 2021 | PUBLISHED BY GJR PUBLICATION, INDIA

10
Global J Res Eng Comput Sci. 2021; 1(1), 1-11

29. V. V. Kumar, F. T. S. Chan, N. Mishra, and V. Kumar, “Environmental integrated closed loop logistics model: An
artificial bee colony approach,” in SCMIS 2010 - Proceedings of 2010 8th International Conference on Supply Chain
Management and Information Systems: Logistics Systems and Engineering, 2010.
30. V. V. Kumar and F. T. S. Chan, “A superiority search and optimisation algorithm to solve RFID and an
environmental factor embedded closed loop logistics model,” Int. J. Prod. Res., 2011,
doi: 10.1080/00207543.2010.503201.
31. T. Chen et al., “Prediction of Extubation Failure for Intensive Care Unit Patients Using Light Gradient Boosting
Machine,” IEEE Access, 2019, doi: 10.1109/ACCESS.2019.2946980.
32. A. Mujumdar and V. Vaidehi, “Diabetes Prediction using Machine Learning Algorithms,” in Procedia Computer
Science, 2019. doi: 10.1016/j.procs.2020.01.047.
33. N. Sneha and T. Gangil, “Analysis of diabetes mellitus for early prediction using optimal features selection,” J. Big
Data, 2019, doi: 10.1186/s40537-019-0175-6.

CITATION
Chandrakanth R. M., Eswar P. G., Mohit S. R., Manikanth S., Venkata N. B., & Siddharth K. (2021). Predicting
Diabetes Mellitus in Healthcare: A Comparative Analysis of Machine Learning Algorithms on Big Dataset. In
Global Journal of Research in Engineering & Computer Sciences (Vol. 1, Number 1, pp. 1–11).
https://doi.org/10.5281/zenodo.14010835

Copyright © 2021 The Author(s): This is an open-access article distributed under the terms of the Creative Commons
Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits unrestricted use, distribution, and
reproduction in any medium for non-commercial use, provided the original author and source are credited.
