A Comparative Analysis Using Machine Learning Algorithm on
• Diabetes mellitus, a chronic metabolic disorder characterized by elevated blood glucose levels, has
become a global health concern with profound implications for public health systems and
individual well-being.
• According to the International Diabetes Federation (IDF), 382 million people worldwide are living
with diabetes. By 2035, this number is projected to rise to 592 million.
• The escalating prevalence of diabetes underscores the urgent need for effective and proactive
strategies to identify individuals at risk of developing the condition at an early stage. Early
prediction holds the key to implementing timely interventions, thereby preventing or mitigating
the progression of diabetes-related complications.
Statement Of The Problem
General Objective
• The aim is to develop and assess the efficacy of machine learning models for early
diabetes prediction, employing a logistic regression model and an artificial neural network
(ANN).
Specific Objectives
• To fit the Logistic Regression model on the prediction of diabetes onset.
• To fit the Artificial Neural Network model on the prediction of diabetes onset.
• There is an imperative need for innovative and effective approaches to early detection and management.
• Machine learning algorithms (MLAs) have the potential to offer higher accuracy and reliability in predicting diabetes onset.
• Extensive research has explored machine learning for diabetes prediction, utilizing various
classifiers and data preprocessing techniques. Studies indicate machine learning algorithms like
ANN and Logistic Regression significantly enhance prediction accuracy and performance over
traditional methods. Critical analysis identifies the need for personalized, accurate predictive
models due to diabetes' multifaceted nature.
• Kandhasamy and Balamurali (2015) used multiple classifiers: J48, SVM, RF, and K-Nearest
Neighbors (KNN). The dataset was taken from the UCI repository. The metrics compared were
specificity, sensitivity, and accuracy.
Methodology
Variable Count Mean Std Min 25% 50% 75% Max
HbA1c Level 100000.00 5.53 1.07 3.50 4.80 5.80 6.20 9.00
• The target variable was distributed as 8.8 percent Diabetic, labelled one (1), and 91.2 percent
non-Diabetic, labelled zero (0). This uneven class distribution makes the dataset
imbalanced.
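The class split described above can be checked with a few lines of code. The following is a minimal sketch, assuming the target is available as a plain list of 0/1 labels; the toy `labels` list is hypothetical and only mirrors the reported 8.8 / 91.2 percent split, it is not the study's data.

```python
from collections import Counter


def class_distribution(labels):
    """Return the percentage share of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical toy labels: 88 diabetic (1) and 912 non-diabetic (0) per 1,000 rows.
labels = [1] * 88 + [0] * 912

shares = class_distribution(labels)

# Ratio of majority to minority class; values far above 1 signal imbalance.
imbalance_ratio = shares[0] / shares[1]

print(shares)
print(round(imbalance_ratio, 2))
```

With a ratio this far above 1, accuracy alone can look good even for a model that rarely predicts the minority class, which is one reason precision and recall are worth reporting alongside it.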
Bivariate Analysis
• The colour reflects the strength of the correlation: yellow indicates a stronger
correlation between two features, while deep blue indicates a weaker correlation between
variables.
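The pairwise correlations behind such a colour map can be sketched as follows. This is an illustrative example only, assuming Pearson correlation was the measure used; the column names and values in `data` are hypothetical and do not come from the study's dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns, for illustration only.
data = {
    "hba1c_level": [4.8, 5.8, 6.2, 6.6, 9.0],
    "blood_glucose": [85, 100, 140, 160, 280],
    "diabetes": [0, 0, 0, 1, 1],
}

corr = correlation_matrix(data)

for name, row in corr.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each cell of the resulting matrix lies between -1 and 1, and it is these values that a heatmap maps onto its colour scale.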
Logistic regression model
• Logistic regression is a statistical method used to build machine learning models in which the
dependent variable is binary (0 or 1).
• The equation below is the final equation for Logistic Regression:
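In its standard textbook form, logistic regression models the probability of the positive class as a sigmoid of a linear combination of the predictors. The notation here is an assumption: $x_1, \dots, x_k$ denote the predictor variables, $\beta_0, \dots, \beta_k$ the fitted coefficients, and $p$ the probability that the outcome is 1 (Diabetic).

```latex
\[
p = P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\ln\!\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```

An observation is typically classified as 1 when $p$ exceeds a chosen threshold, commonly 0.5.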
Model performance LR
Metric Score
Precision 0.95
Recall 0.93
F1 score 0.94
Accuracy 0.94
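The four metrics in the table are all derived from the counts of a confusion matrix. The sketch below shows the usual definitions; the counts passed in are hypothetical, chosen only so that the rounded scores resemble the table, and are not the study's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)           # share of predicted positives that are correct
    recall = tp / (tp + fn)              # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # share of all predictions that are correct
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=93, fp=5, fn=7, tn=95)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```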
Artificial neural network
• Artificial neural networks (ANN), also known as neural networks (NNs), are a type of machine
learning model inspired by the organization of animal brains. They consist of interconnected
artificial neurons, which receive and process signals before sending them to other neurons.
Neurons are organized into layers and have adjustable weights that affect signal strength.
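The idea of layered neurons with adjustable weights can be illustrated with a tiny forward pass. This is a minimal sketch, not the network used in the study: the layer sizes, the sigmoid activation, and every weight and bias value below are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2], [0.1, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.2, -0.7]]
output_biases = [0.05]

# Two hypothetical, already-scaled input features for one patient.
features = [0.6, 0.3]

hidden = dense_layer(features, hidden_weights, hidden_biases)
probability = dense_layer(hidden, output_weights, output_biases)[0]

print(hidden)
print(probability)
```

Training adjusts the weights and biases so that the output probability moves closer to the true label; the sketch covers only the forward signal flow described above.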
Model performance ANN
Metric Score
Precision 0.97
Recall 0.94
F1 score 0.95
Accuracy 0.95
Model performance comparison
• The study used the accuracy score to compare the performance of the two models. The performance
of the two models is shown in the table below:
Metric LR ANN
Accuracy 0.94 0.95
• The study concluded that the Artificial Neural Network model had a slightly higher accuracy than
the Logistic Regression model (0.95 versus 0.94), making it marginally more effective in predicting
diabetes onset. Given that both models performed well, the nature of the data and the specifics of
the task should be considered when choosing a model for diabetes prediction. Future studies could
explore integrating more complex machine learning models and a wider range of variables for even
more accurate predictions.
Budget
Proposal Development
Proposal Presentation
Data Analysis
Report Writing
Project Presentation and Submission
References
Birjais, R., Mourya, A. K., Chauhan, R., & Kaur, H. (2019). Prediction and diagnosis of future
diabetes risk: A machine learning approach. SN Applied Sciences, 1, 1–8.
Dalakleidi, K., Zarkogianni, K., Thanopoulou, A., & Nikita, K. (2017). Comparative assessment of
statistical and machine learning techniques towards estimating the risk of developing type 2 diabetes
and cardiovascular complications. Expert Systems, 34(6), e12214. https://doi.org/10.1111/exsy.12214
Hasan, M. K., Alam, M. A., Das, D., Hossain, E., & Hasan, M. (2020). Diabetes prediction using
ensembling of different machine learning classifiers. IEEE Access, 8, 76516–76531.
https://doi.org/10.1109/ACCESS.2020.2989857