TechnologyName Phase1

Uploaded by

hariprabu1734
AI Based Diabetes Prediction System

Abstract
Many factors can contribute to diabetes, such as excessive body weight, abnormal
cholesterol levels, family history, physical inactivity, and poor eating habits.
People who have had diabetes for a long time can develop complications such as
heart disease, kidney disease, nerve damage, and diabetic retinopathy. The risk
can be reduced if the disease is predicted early. We use machine learning
classification methods, namely decision tree, SVM, random forest, logistic
regression, KNN, and various ensemble techniques, to determine which algorithm
produces the best prediction results. An explainable AI approach with the LIME
and SHAP frameworks is implemented to understand how the model arrives at its
final predictions.
Introduction

• Diabetes is a chronic disease in which the pancreas does not produce enough
insulin or the body cannot effectively use the insulin it produces.

• Insulin is the hormone mainly responsible for regulating the blood glucose
level.

• Many factors, such as excessive body weight, physical inactivity, high blood
pressure, and abnormal cholesterol levels, can increase a person's risk of
developing diabetes.

• In this paper, we employ machine learning and explainable AI techniques to
detect diabetes.
Proposed System

• This section describes the working procedures and implementation of the
machine learning techniques used to design the proposed automatic diabetes
prediction system.

• First, the dataset was collected and preprocessed to remove discrepancies.

• Then the dataset was separated into a training set and a test set using the
holdout validation technique.

• Next, different classification algorithms were applied to find the best
classification algorithm for this dataset.
Dataset
Several constraints were placed on the selection of these instances
from a larger database. In particular, all patients here are females at
least 21 years old of Pima Indian heritage.

• Pregnancies: Number of times pregnant
• Glucose: Plasma glucose concentration at 2 hours in an oral glucose
tolerance test
• Blood Pressure: Diastolic blood pressure (mm Hg)
• Skin Thickness: Triceps skin fold thickness (mm)
• Insulin: 2-hour serum insulin (μU/mL)
• BMI: Body mass index (weight in kg/(height in m)^2)
• Diabetes Pedigree Function: Diabetes pedigree function
• Age: Age (years)
• Outcome: Class variable (0 or 1)
Dataset Preprocessing

• In the merged dataset, we found a few anomalous zero values.

• Each zero value was replaced by the mean value of its feature.

• The dataset was split into training and test sets using the holdout
validation technique, with 80% used as training data and 20% as test data.
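
The two preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the records are made-up toy values, the helper names `replace_zeros_with_mean` and `holdout_split` are hypothetical, and the column mean is assumed to be computed over the non-zero entries only (the slides do not say which mean was used).

```python
import random
from statistics import mean

def replace_zeros_with_mean(rows, columns):
    """Replace zeros in the given columns with the mean of that column's non-zero values."""
    cleaned = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for col in columns:
        # Assumption: the mean is taken over non-zero entries only.
        col_mean = mean(row[col] for row in rows if row[col] != 0)
        for row in cleaned:
            if row[col] == 0:
                row[col] = col_mean
    return cleaned

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into a training set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy records, invented for illustration only.
records = [
    {"Glucose": 120, "BMI": 30.0, "Outcome": 1},
    {"Glucose": 80, "BMI": 22.0, "Outcome": 0},
    {"Glucose": 0, "BMI": 28.0, "Outcome": 1},
    {"Glucose": 100, "BMI": 0, "Outcome": 0},
    {"Glucose": 140, "BMI": 36.0, "Outcome": 1},
    {"Glucose": 90, "BMI": 24.0, "Outcome": 0},
    {"Glucose": 160, "BMI": 32.0, "Outcome": 1},
    {"Glucose": 110, "BMI": 26.0, "Outcome": 0},
    {"Glucose": 130, "BMI": 34.0, "Outcome": 1},
    {"Glucose": 70, "BMI": 20.0, "Outcome": 0},
]

cleaned = replace_zeros_with_mean(records, ["Glucose", "BMI"])
train, test = holdout_split(cleaned, test_fraction=0.2)
print(len(train), len(test))
```

The sketch follows the order given on the slide (impute first, then split). Computing the mean on the training portion only would be a stricter alternative that avoids leaking test information.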

Mutual Information: Mutual information measures the interdependence of two
variables, here a feature and the outcome. It corresponds to the information
gain, and higher values indicate greater dependency.
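
As a rough illustration of the quantity described above, the sketch below estimates the mutual information between two discrete variables directly from the definition I(X; Y) = sum over (x, y) of p(x, y) log(p(x, y) / (p(x) p(y))). It assumes the feature has already been discretized. The function name is hypothetical, and the slides do not say which estimator was actually used; scikit-learn's `mutual_info_classif` is one option for continuous features.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
        mi += (count / n) * log(count * n / (count_x[x] * count_y[y]))
    return mi

# A feature that fully determines the outcome carries log(2) nats about it.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# A feature that is independent of the outcome carries no information.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(dependent, independent)
```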
Machine learning classifiers

The GridSearchCV framework was used in this research to find the optimal
hyperparameter values for all the machine learning models and to reduce the
risk of overfitting.
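
A minimal sketch of such a search with scikit-learn's `GridSearchCV` is shown below. The synthetic data, the choice of a decision tree, the parameter grid, and the 5-fold setting are all assumptions made for illustration; the slides do not list the grids that were actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 8 features (not the diabetes dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hypothetical grid: every combination is scored with 5-fold cross-validation.
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)             # best combination found on the training folds
test_accuracy = search.score(X_test, y_test)  # accuracy of the refitted best model
print(test_accuracy)
```

Because each candidate is scored on held-out folds rather than on the data it was fitted to, overly complex settings that only memorize the training data tend to score worse and are not selected.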

Decision tree: A decision tree represents the learned function as a set of
rules. Decision tree learning is a method for approximating discrete-valued
target functions.
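
To make the "set of rules" view concrete, the sketch below fits a small tree with scikit-learn and prints it as if/else rules using `export_text`. The six training rows are toy values invented for illustration, and the depth limit is an arbitrary choice.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy rows of [Glucose, BMI], invented for illustration only.
X = [[90, 22], [100, 25], [110, 31], [150, 28], [160, 35], [170, 24]]
y = [0, 0, 0, 1, 1, 1]  # discrete-valued target: 0 = no diabetes, 1 = diabetes

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Print the learned function as human-readable threshold rules.
rules = export_text(tree, feature_names=["Glucose", "BMI"])
print(rules)

# Classify two unseen rows by following the rules from the root to a leaf.
predictions = tree.predict([[95, 30], [165, 30]])
print(predictions)
```

In this toy data only Glucose separates the two classes cleanly, so the printed rules reduce to a single threshold test on that feature.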
Conclusions

• In this paper, an automatic diabetes prediction system using various
machine learning approaches has been proposed. The open-source Pima Indian
dataset and a private dataset of female Bangladeshi patients were used in
this work.

• This research paper reported different performance metrics, namely
precision, recall, accuracy, F1 score, and AUC, for various machine
learning and ensemble techniques.

• This work can be extended in several ways. For example, we recommend
collecting additional private data from a larger cohort of patients to
obtain better results.
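
The performance metrics named above can be computed as in the following sketch. The labels and scores are made-up values, the function names are hypothetical, and the 0.5 decision threshold is an assumption; none of these numbers are results from this work.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, accuracy and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, accuracy, and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.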
