Cloud Base Diabetic Predication
№ 9433
December 8, 2022
Cloud-based Diabetic Prediction Framework: Deep
Learning Approach
Monther Tarawneh, Computer Science department, Isra University, Amman, Jordan, [email protected]
Fiasal AlZyoud, Computer Science department, Isra University, Amman, Jordan, [email protected]
Yousef AlSharrab, Computer Science department, Isra University, Amman, Jordan, [email protected]
Khalaf Khatatneh, Cyber security department, Isra University, Amman, Jordan, [email protected]
Abstract— Diabetes is one of the most common and fastest-growing diseases in the world, and it opens the door to other conditions such as blindness, kidney problems, heart disease and more. Therefore, we need a system that predicts diabetes before it develops and advises people on how to avoid it. The system goes beyond early detection because it performs prediction. We propose a cloud-based secure framework that integrates traditional machine learning methods with deep neural networks. The system collects patient readings using IoT devices and sensors and moves them securely to cloud storage using public key encryption. The prediction algorithm then performs on-time prediction on the data to determine whether the patient is expected to become diabetic in the future. The prediction techniques were tested on the Pima Indian diabetes dataset from UCI. The results show that the approach outperforms traditional ML methods, with an accuracy of 98%.

Keywords— E-Health, deep learning, deep neural network, machine learning, diabetic prediction, IoT

I. INTRODUCTION

The proliferation of new technologies and smart devices facilitates human life and decreases human effort, which in turn increases the risk factors for developing diseases such as high blood pressure and diabetes. The number of diabetes patients is expected to reach 625 million by 2045[1].

Diabetes is a common chronic disease that affects the health of people all over the world. It destabilizes the sugar level in the blood. The normal range of sugar in the blood is 70 to 180 mg/dl. There are several types of diabetes, the most common being type 1 and type 2. Type 1 diabetes appears in children, and type 2 diabetes in middle-aged and older people[2]. Diabetes exposes the body to other diseases such as kidney problems, heart disease, nerve problems and disabilities[3], and it has become a major health problem. The traditional way of self-monitoring the sugar level is finger-stick sampling[4]. Modern devices have been invented to control the sugar level by recording the patient's state every minute, for example using electromagnetic radiation[5]. With the emergence of IoT and e-health frameworks, patients can be monitored remotely and regularly. Health organizations are concentrating on using new technologies and artificial intelligence techniques to diagnose diabetes and predict the probability of its occurrence, as it has undesired effects when it is not discovered and treated in its early stages. Diabetes Mellitus (DM) is a long-term disease that has an adverse impact on social life.

Extensive research has been carried out on applying the Internet of Things (IoT) in smart healthcare. It helps doctors diagnose patients without direct contact by analyzing the huge amount of patient data collected by IoT devices; this reduces the time needed and protects medical staff from infection. IoT is a collection of connected objects that employ technology and algorithms on data to develop solutions or services for smart applications[6]. Integrating IoT with artificial intelligence and cloud computing gives an advantage to all environments[7], mainly the health sector. It reduces cost and protects doctors and nurses from infectious diseases. It can be used to give patients an initial prediction of their illness and to send an alarm in case of a critical situation[8], while previous data helps to diagnose patients and prescribe medicine and suitable treatment. All kinds of readings can be collected, such as heart rate, EEG, blood pressure, temperature, glucose level, x-rays, depression, mood and other readings. Data is usually collected using boards such as Raspberry Pi and Arduino and then stored in the cloud for analysis[9]. Data is processed in the cloud or in fog computing for real-time response. Security needs more attention here because data travels through the internet[10]. Nevertheless, smart healthcare systems are moving to cloud technology faster than before[11, 12]. For example, a personal health dashboard that uses a cloud-based system was developed to monitor patients, analyze their data and detect COVID-19[13, 14]. Such a system must be deployed with a high level of security to protect patients' information.

The focus of smart healthcare is to measure the vital parameters of the human body: temperature, pulse rate, respiration rate, oxygen level and blood pressure. Vital signs are useful in detecting or monitoring a patient's health. Other useful parameters can also be measured, such as gestures, facial expression, consciousness level and body language. Remote monitoring systems have been developed to measure PPG, ECG and temperature to check the patient's status[15-17]; most of them measure these vitals through a chair, such as a car seat, wheelchair or smart chair. Wearable devices have been developed over the years to measure temperature and other vital signs[18-20]. Smart mobile devices can now measure vital signs as well. Mobile health (m-health) uses mobile devices to collect real-time data and store it on a server[21, 22]; the collected data can then be analyzed and processed using AI to give an initial diagnosis. M-health reduces costs and improves the quality of healthcare, but it comes with challenges in terms of interoperability and security[4, 22]. Further research has been done to develop secure frameworks for m-health that focus on authentication and encryption[23-25]. Any remote healthcare system brings benefits and great transformation to healthcare, but it also raises security challenges. Security and privacy are becoming more important in healthcare. Attackers can use any method to find sensitive data and release it to the public[26]. A secure framework must be used to prevent information leaks and threats to patients. Data transmitted over
The use of deep learning has grown quickly because it simulates the human mind. Deep learning techniques are used in different forms in the health field[52, 53] and have proven to be better at reducing the error rate and more robust against data noise[54, 55]. A deep neural network has hidden layers between the input and output layers. It is developed to make predictions by exploring the patterns in a data set. The accuracy is much better when the network is trained well. The architecture of a deep neural network contains many hidden layers and several neurons in every hidden layer, as shown in Figure 1.

I. METHODOLOGY

The proposed framework contains four phases distributed over three layers, as shown in Figure 2. The first phase is data collection, where data is collected using smart devices and IoT technology. The second phase is data pre-processing, where we clean up the data and remove unnecessary features. The third phase is the prediction phase, where the implemented hybrid algorithm is used to analyze the collected data and draw a diabetes prediction using medical facts and historical records. The fourth phase is evaluation. These phases are distributed over three layers: the sensor layer, where IoT devices and sensors are used to collect data; the security layer, where data is encrypted using public key encryption; and the cloud layer, where the computation takes place.

The collection layer will employ IoT devices to collect patients' data. Sensor devices can be wearable, embedded in the body, or apps on smart devices. The communication technology on the sensor device enables it to transfer the data to a health gateway, which in turn forwards the accumulated data to cloud storage. The data will be encrypted using the patient's private key before it is transmitted to the cloud. There are two operations in the security process:

Key management: the RSA algorithm will be used to generate a pair of keys for each patient and specialist. The public key will be stored in a directory on the cloud and used to encrypt and decrypt data to and from the patient gateway or specialist. In addition, key management grants authorized users access to patient records.

Data encryption/decryption: patient data will be encrypted using the patient's own private key and decrypted using the patient's public key. Requests to access data are encrypted using the requester's private key, and replies are encrypted using the requester's public key.

Fig. 2. Proposed framework

The cloud layer will employ deep learning techniques to analyze the data and predict whether a current patient will become diabetic in the future. Healthcare specialists and organizations can view patient records over a secure channel based on access privileges. The main focus of this paper is the accuracy of diabetes prediction; the security layer will be discussed in future work.

II. DATASET

The framework depends on real-time data collected by IoT devices. Since we have not implemented this in practice, we use a data set commonly used for diabetes prediction: the Pima Indian Diabetes dataset, retrieved from the UCI machine learning repository[56]. The data set consists of 768 rows and 9 columns. The attributes are glucose, pregnancies, skin thickness, blood pressure, BMI, insulin, diabetes pedigree function and age. The last column is the test result: positive or negative for diabetes. Table 1 below lists the features and their properties.

Table 1. Dataset Features

Feature   Description                                                                        Type
Preg      Number of times pregnant                                                           Numeric
Gluc      Plasma glucose concentration at 2 hours in an oral glucose tolerance test (GTIT)   Numeric
BP        Diastolic blood pressure (mm Hg)                                                   Numeric
Skin      Triceps skin fold thickness (mm)                                                   Numeric
Insulin   2-hour serum insulin (µU/ml)                                                       Numeric
BMI       Body mass index [weight in kg/(height in m)^2]                                     Numeric
DPF       Diabetes pedigree function                                                         Numeric
Age       Age (years)                                                                        Numeric
Outcome   Binary value indicating non-diabetic/diabetic                                      Factor

III. PROPOSED FRAMEWORK

The framework is designed for real-time analysis of data collected remotely and on time over years in order to predict diabetes. Due to network disconnections or device faults (downtime), some readings may not reach the cloud storage. Therefore, pre-processing is an essential step to avoid inaccurate prediction. The algorithm treats the readings as the person's DNA over time, and any missing values are replaced based on previous and current readings.

In this research, the Pima data set is used to check the prediction accuracy. Therefore, we focus on data cleaning and feature selection at this stage. We use Python to check the data set for missing or None values. The check shows that the data has no missing or None values, but the values of some features are inconsistent: glucose, blood pressure, skin fold thickness, insulin and BMI. This may affect the accuracy of the algorithm. The statistics of the data set in Table 2 show many zeros, because all null values were replaced by zero. The real-time dataset will not contain many zeros, because a genetic algorithm will replace each missing value with a value based on the previous and future values of that feature. However, the zeros in the data set we used cannot be replaced, because we have no history and no on-time data. Also, dropping one of the eight features is not a good idea: it may reduce the noise, but it reduces the accuracy as well.

Machine learning classifiers were applied to the dataset. The data was split into two sets: training (90%) and test (10%). KNN, SVM and random forest (RF) are supervised ML methods: KNN is used to classify the dataset, SVM is suitable for small datasets, and the RF classifier is an ensemble of several decision trees. The ML methods are used more as feature selection techniques.

The prediction system was implemented using a Python library. A deep neural network with four hidden layers was implemented, with 12, 16, 16 and 14 neurons in layers 1, 2, 3 and 4 respectively. The algorithm achieved its best performance for diabetes prediction with this four-layer architecture. The input layer has eight neurons and the output layer has one, because the output can be interpreted as positive or negative. To evaluate the supervised machine learning algorithms, the confusion matrix is used, which makes evaluating the accuracy easy.

Table 3: Correlation between features
Feature                    Preg       Gluc         BP       Skin    Insulin        BMI        DPF        Age    Outcome
Pregnancies            1.000000   0.129459   0.141282  -0.081672  -0.073535   0.017683  -0.033523   0.544341   0.221898
Glucose                0.129459   1.000000   0.152590   0.057328   0.331357   0.221071   0.137337   0.263514   0.466581
Blood Pressure         0.141282   0.152590   1.000000   0.207371   0.088933   0.281805   0.041265   0.239528   0.065068
Skin Thickness        -0.081672   0.057328   0.207371   1.000000   0.436783   0.392573   0.183928  -0.113970   0.074752
Insulin               -0.073535   0.331357   0.088933   0.436783   1.000000   0.197859   0.185071  -0.042163   0.130548
BMI                    0.017683   0.221071   0.281805   0.392573   0.197859   1.000000   0.140647   0.036242   0.292695
Pedigree Function     -0.033523   0.137337   0.041265   0.183928   0.185071   0.140647   1.000000   0.033561   0.173844
Age                    0.544341   0.263514   0.239528  -0.113970  -0.042163   0.036242   0.033561   1.000000   0.238356
Outcome                0.221898   0.466581   0.065068   0.074752   0.130548   0.292695   0.173844   0.238356   1.000000

Table 2: Data set features statistics.

Stat           Preg       Gluc         BP       Skin    Insulin        BMI        DPF        Age     Result
mean        3.84505  120.89453   69.10547   20.53646   79.79948   31.99258    0.47188       33.2       0.35
std         3.36958   31.97262   19.35581   15.95222  115.24400    7.88416    0.33133       11.8       0.48
min             0.0        0.0        0.0        0.0        0.0        0.0      0.078       21.0        0.0
25%            1.00      99.00      62.00       0.00       0.00      27.30    0.24375       24.0       0.00
50%            3.00     117.00      72.00      23.00      30.50      32.00     0.3725       29.0       0.00
75%            6.00     140.25      80.00      32.00     127.25      36.60    0.62625       41.0       1.00
max           17.00     199.00     122.00      99.00     846.00      67.10    2.42000       81.0       1.00
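
The zero minimums reported in Table 2 and the pairwise correlations in Table 3 are the kind of values a short script can reproduce. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names (`count_zeros`, `pearson`, `correlation_with_outcome`), the choice of columns in which a zero is read as a missing value, and the five illustrative rows are all assumptions made for this example.

```python
import math

# Column order follows Table 1.
COLUMNS = ["Preg", "Gluc", "BP", "Skin", "Insulin", "BMI", "DPF", "Age", "Outcome"]

# Assumption: in these features a zero is not a plausible measurement,
# so it is read as a placeholder for a missing value.
ZERO_AS_MISSING = ["Gluc", "BP", "Skin", "Insulin", "BMI"]

# Illustrative rows in the dataset's column layout (not the full 768-row file).
SAMPLE_ROWS = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
]


def column(rows, name):
    """Return all values of one named column."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in rows]


def count_zeros(rows, names=ZERO_AS_MISSING):
    """Count zero entries per feature, i.e. likely missing readings."""
    return {name: sum(1 for v in column(rows, name) if v == 0) for name in names}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_with_outcome(rows):
    """Correlation of every feature with the Outcome column."""
    outcome = column(rows, "Outcome")
    return {name: pearson(column(rows, name), outcome) for name in COLUMNS[:-1]}


if __name__ == "__main__":
    print("zeros per feature:", count_zeros(SAMPLE_ROWS))
    for name, r in correlation_with_outcome(SAMPLE_ROWS).items():
        print(f"{name:8s} {r:+.3f}")
```

Run on the full data set, `count_zeros` would show how many readings per feature are affected by the zero placeholders discussed above, and `correlation_with_outcome` corresponds to the Outcome row of Table 3. With only five rows the printed correlations are illustrative and should not be compared with the table.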