A Comparative Analysis Using Machine Learning Algorithm on - Copy

The document presents a comparative analysis of machine learning algorithms, specifically Logistic Regression and Artificial Neural Networks (ANN), for predicting diabetes onset. The study highlights the urgent need for accurate predictive models due to the increasing global burden of diabetes and concludes that the ANN model outperforms Logistic Regression in accuracy. Recommendations for future research include exploring more complex models and a broader range of variables for improved predictions.

A Comparative Analysis Using Machine Learning Algorithm on the Prediction of Diabetes Onset

Kelvin Mburu SCM223-0087/2020
Atieno Beatrice SCM223-0161/2020
Alex Nyaga SCM223-0083/2020
Fabian Getugi SCM223-0712/2018

Introduction

• Diabetes mellitus, a chronic metabolic disorder characterized by elevated blood glucose levels, has
become a global health concern with profound implications for public health systems and
individual well-being.
• According to the International Diabetes Federation (IDF), 382 million people worldwide are living
with diabetes. By 2035, this number is projected to rise to 592 million.
• The escalating prevalence of diabetes underscores the urgent need for effective and proactive
strategies to identify individuals at risk of developing the condition at an early stage. Early
prediction holds the key to implementing timely interventions, thereby preventing or mitigating
the progression of diabetes-related complications.
Statement Of The Problem

• Diabetes places an increased burden on healthcare systems and individual well-being.
• Early identification is key to implementing timely interventions.
• Existing risk assessment methods often lack the precision and adaptability required for proactive
healthcare.
• There is a pressing need for advanced and accurate predictive models that harness the capabilities
of machine learning to enable early and personalized diabetes risk assessment.
Objectives

General Objective
• The aim is to develop and assess the efficacy of machine learning models for early
diabetes prediction, using a logistic regression model and an artificial neural network
(ANN).

Specific Objectives
• To fit the Logistic Regression model on the prediction of diabetes onset.

• To fit the Artificial Neural Network model on the prediction of diabetes onset.

• To compare the performance of the models


Justification

• Urgent need to address the escalating global burden of diabetes

• Imperative for innovative and effective approaches to early detection and management

• Machine learning algorithms have the potential to offer higher accuracy and reliability in predicting diabetes onset

• The research aims to provide valuable insights

• Contribute to proactive and personalized healthcare strategies


Literature review

• Extensive research has explored machine learning for diabetes prediction, utilizing various
classifiers and data preprocessing techniques. Studies indicate machine learning algorithms like
ANN and Logistic Regression significantly enhance prediction accuracy and performance over
traditional methods. Critical analysis identifies the need for personalized, accurate predictive
models due to diabetes' multifaceted nature.
• Kandhasamy and Balamurali (2015) used multiple classifiers: J48, SVM, RF, and K-Nearest
Neighbors (KNN). The dataset was taken from the UCI repository. The metrics compared were
specificity, sensitivity, and accuracy.
Methodology

Exploratory Data Analysis


• Initial data analysis to understand dataset characteristics, address missing values, and assess
variables' distribution.
Model Selection
• Focused on Logistic Regression and ANN for predictive modeling, considering their suitability for
handling diabetes-related data.
Performance Metrics
• Utilized accuracy, precision, recall, F1 score, and confusion matrix to evaluate the predictive
performance of models.
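
As a minimal sketch of how the listed metrics relate to the confusion matrix, the function below derives accuracy, precision, recall, and F1 score from the four cell counts. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "positive" meaning the diabetic class (label 1).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=60, fp=10, fn=20, tn=910)
```

With these illustrative counts, accuracy is 970/1000 while recall is only 60/80, which shows why the study reports several metrics rather than accuracy alone.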
Descriptive statistics of the variables

Variable             Count      Mean    Std    Min    25%     50%     75%     Max
Age                  100000.00  41.89   22.52  0.08   24.00   43.00   60.00   80.00
Hypertension         100000.00  0.07    0.26   0.00   0.00    0.00    0.00    1.00
Heart Disease        100000.00  0.04    0.19   0.00   0.00    0.00    0.00    1.00
BMI                  100000.00  27.32   6.64   10.01  23.63   27.32   29.58   95.69
HbA1c Level          100000.00  5.53    1.07   3.50   4.80    5.80    6.20    9.00
Blood Glucose Level  100000.00  138.06  40.71  80.00  100.00  140.00  159.00  300.00
Diabetes             100000.00  0.09    0.28   0.00   0.00    0.00    0.00    1.00


Visualizing the Distribution of the Target Variable - Diabetes

• The target variable was distributed as 8.8 percent diabetic (labelled 1) and 91.2 percent
non-diabetic (labelled 0). This uneven class distribution makes the dataset imbalanced.
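
The practical consequence of this imbalance can be sketched as follows. The labels are synthetic and only mirror the 8.8/91.2 split reported above, and both helper names are hypothetical. A rule that always predicts the majority class already reaches about 91.2 percent accuracy without detecting a single diabetic case.

```python
from collections import Counter


def class_distribution(labels):
    """Return the percentage share of each class label."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: 100.0 * count / n for cls, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a trivial rule that always predicts the most common class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Synthetic labels mirroring the reported split (not the study's data).
labels = [1] * 88 + [0] * 912
dist = class_distribution(labels)
baseline = majority_baseline_accuracy(labels)
```

Any model evaluated on such data should therefore be judged against this baseline, and with precision and recall alongside accuracy.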
Bivariate Analysis
• In the correlation heatmap, the colour reflects the strength of the correlation: yellow indicates
a stronger correlation between two features, while deep blue indicates a weaker correlation.
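
The values behind such a heatmap are pairwise correlation coefficients. The sketch below assumes Pearson correlation, which the slides do not state explicitly, and uses made-up toy columns. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations from {name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Toy values for illustration only (not the study's data).
toy = {
    "age": [25, 40, 55, 70],
    "bmi": [22.0, 27.0, 29.0, 31.0],
    "blood_glucose": [90, 130, 150, 200],
}
corr = correlation_matrix(toy)
```

Each cell of the matrix lies between -1 and 1, and a plotting library would map those values to the colour scale described above.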
Logistic regression model

• Logistic regression is a statistical method used to build machine learning models in which the
dependent variable is binary (0 or 1).
• The equation below is the final equation for Logistic Regression:
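
For reference, the general form of a logistic regression model is written out here. The symbols $\beta_0, \dots, \beta_k$ (coefficients) and $x_1, \dots, x_k$ (predictors such as age, BMI, or HbA1c level) are generic placeholders. The study's fitted coefficient values are not reproduced in this text, so none are assumed.

```latex
P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\quad\Longleftrightarrow\quad
\ln\!\left(\frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)}\right)
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```

The left-hand form gives the predicted probability of diabetes. The right-hand form shows that the log-odds are linear in the predictors.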
Model performance LR

Metric Score

Precision 0.95

Recall 0.93

F1 score 0.94

Accuracy 0.94
Artificial neural network

• Artificial neural networks (ANN), also known as neural networks (NNs), are a type of machine
learning model inspired by the organization of animal brains. They consist of interconnected
artificial neurons, which receive and process signals before sending them to other neurons.
Neurons are organized into layers and have adjustable weights that affect signal strength.
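
The description above can be made concrete with a minimal forward pass through one hidden layer. The layer sizes, weights, and the choice of ReLU and sigmoid activations are assumptions made for illustration. The slides do not specify the architecture used in the study.

```python
import math


def sigmoid(z):
    """Squash a real number into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Pass positive signals through and block negative ones."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer.

    Each neuron computes a weighted sum of its inputs plus a bias,
    then applies the activation function.
    """
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer (ReLU) -> single output neuron (sigmoid)."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]


# Hypothetical weights for a 2-input, 2-hidden-neuron, 1-output network.
hidden_w = [[0.5, -0.2], [0.1, 0.4]]
hidden_b = [0.0, -0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]
p = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
```

Training adjusts the weights and biases so that the output probability matches the observed labels. This sketch covers only the prediction step.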
Model performance ANN

Metric Score

Precision 0.97

Recall 0.94

F1 score 0.95

Accuracy 0.95
Model performance comparison

• The study used the accuracy score to compare the two models. Their performance is shown in the
table below:

Metric LR ANN

Precision 0.95 0.97

Recall 0.93 0.94

F1 score 0.94 0.95

Accuracy 0.94 0.95
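
As a quick consistency check on the table, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the figures from the table. The variable and function names are hypothetical.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the comparison table.
reported = {
    "LR": {"precision": 0.95, "recall": 0.93, "f1": 0.94},
    "ANN": {"precision": 0.97, "recall": 0.94, "f1": 0.95},
}

# Recompute F1 to two decimals, the precision used in the table.
recomputed = {
    name: round(f1_from(m["precision"], m["recall"]), 2)
    for name, m in reported.items()
}
```

For both models the recomputed value agrees with the reported F1 score at two decimal places.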


Conclusion and Recommendation

• The study concluded that the Artificial Neural Network model had a slightly higher accuracy than
the Logistic Regression model (0.95 versus 0.94), making it marginally more effective in
predicting diabetes onset. Since both models performed well, the nature of the data and the
specifics of the task should guide the choice of model for diabetes prediction. Future studies
could explore integrating more complex machine learning models and a wider range of variables
for even more accurate predictions.
Budget

ITEM                  DESCRIPTION    QUANTITY  COST
INTERNET              SAFARICOM      1         2500
STATIONERY            NOTEBOOK, PEN  4         500
ACCESSORIES           FLASH DISK     1         1200
PRINTING AND BINDING                 4 COPIES  1400
MISCELLANEOUS                                  1400
TOTAL COST                                     7000
Workplan

Time and Activity Oct Nov Jan Feb Mar

Proposal Development
Proposal Presentation
Data Analysis
Report Writing
Project Presentation and Submission
References

Birjais, R., Mourya, A. K., Chauhan, R., & Kaur, H. (2019). Prediction and diagnosis of future
diabetes risk: A machine learning approach. SN Applied Sciences, 1, 1–8.

Dalakleidi, K., Zarkogianni, K., Thanopoulou, A., & Nikita, K. (2017). Comparative assessment of
statistical and machine learning techniques towards estimating the risk of developing type 2 diabetes
and cardiovascular complications. Expert Systems, 34(6), e12214. https://doi.org/10.1111/exsy.12214

Hasan, M. K., Alam, M. A., Das, D., Hossain, E., & Hasan, M. (2020). Diabetes prediction using
ensembling of different machine learning classifiers. IEEE Access, 8, 76516-76531.
https://doi.org/10.1109/ACCESS.2020.2989857
