An Effective Approach For Detecting Diabetes Using Deep Learning Techniques Based On Convolutional LSTM Networks
Abstract—Diabetes, one of the most common disorders affecting millions of people worldwide, results from insufficient release of insulin by the pancreas. Early detection and precaution are necessary; otherwise, diabetes leads to many complications. When diabetes is predicted at an early stage and treated appropriately, individuals can maintain a happy life. Since the conventional diabetes detection method is tedious, identifying diabetes from clinical and physical data requires an automated system. This paper proposes an approach to enhance diabetes prediction using deep learning techniques. Based on Convolutional Long Short-term Memory (CLSTM), we developed a diabetes classification model and compared it with existing methods on the Pima Indians Diabetes Database (PIDD). We assessed the findings of various classification approaches in this study. The proposed approach is further improved by an efficient pre-processing mechanism called multivariate imputation by chained equations. The outcomes are promising compared to existing machine learning approaches and other research models.
Keywords—Convolutional long short-term memory; diabetes prediction; machine learning; pre-processing
I. INTRODUCTION
Diabetes is affecting the world's elderly population drastically [1]. By 2019, 463 million individuals around the globe had diabetes. The International Diabetes Federation (IDF) expects the number of patients to rise to 700 million in the near future.
Diabetes occurs due to inconsistent glucose levels in the blood. It is usually classified into type 1 and type 2. Type 1 diabetes is due to low insulin production, and type 2 occurs when the body's cells become insulin resistant. The fundamental cause of diabetes remains unclear, but scientists agree that both genetic factors and environmental and lifestyle factors play a significant role. Although it is incurable, therapy and medicine can manage it by keeping glucose levels in check.
Diabetes slowly causes other diseases in the long run. It mainly affects the heart, nervous system, retina, kidneys, and other internal organs. Care taken in the early stages of diabetes helps avoid damage to these organs. Although it is a chronic problem, researchers have addressed it by developing various prediction systems using machine learning algorithms [2]. The most popular algorithms are Support Vector Machine (SVM), Decision Trees, and Random Forest.
Another popular model for predicting diabetes is the Artificial Neural Network (ANN) [2], which is well known for its high precision and performance. Current research uses Deep Learning (DL) for prediction because of the increasing size and complexity of data.
Recent studies [3] using DL have improved prediction and classification metrics such as accuracy and precision. The PIMA diabetes dataset [4] is used by many researchers to test their models.
Diabetes occurs when the body is unable to metabolize glucose. In diabetes, the body is unable to produce insulin or to react to the insulin it produces. Once diabetes sets in, it is difficult to cure. Hence, knowing how diabetes occurs helps individuals prevent it. Early diagnosis helps reduce the risk for the patient.
Practitioners require a large amount of data. The healthcare industry collects a large amount of health-related data, but hidden patterns that could support good decision-making often go undetected in it [5]. Processing such a volume of data is tedious for any individual. As a result, researchers have developed various machine learning and classification techniques to handle the data.
This paper uses traditional LSTM (TLSTM) and convolutional LSTM (CLSTM) models for prediction on the PIMA dataset. We performed extensive experiments using data mining algorithms such as decision trees (DT), Naïve Bayes classification, ANN, and DL to provide insight into how different algorithms work for diabetes prediction. The comparison of the algorithms is presented in a logical and well-organized way, with DL providing the more efficient and prominent results. DL is a self-learning framework that has been used successfully to predict diabetes.
II. RELATED WORK
A. Diabetes Prediction using Machine Learning (ML) Algorithms
ML algorithms are used by researchers to predict diabetes. The most widely used approaches are SVM, J48, K-Nearest Neighbours (KNN), and Random Forest classifiers [8]. Kavakiotis
*Corresponding Author
www.ijacsa.thesai.org
(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 12, No. 4, 2021
et al. [7] applied ML and data mining (DM) techniques to diabetes prediction. This work [7] mainly focused on analysing the existing techniques in ML and DM. The authors carried out extensive research on different databases containing diabetic data.
Zhu et al. [9] used a logistic regression-based model to predict diabetes. The authors used principal component analysis and the k-means algorithm so that the developed model classifies the data correctly. The authors of [10] developed a prediction model on diabetes data using classifiers based on decision trees, naïve Bayes, and random forest.
The authors of [11] also used various ML algorithms to classify diabetes data. This work [11] mainly focused on using decision trees and SVM to classify the PIMA diabetes data. The dataset was partitioned using 10-fold cross-validation. The authors did not perform data pre-processing.
Negi and Jaiswal [12] also applied SVM to diabetes prediction on the PIMA and Diabetes 130-US datasets.
The authors above tested existing ML algorithms on various datasets to predict diabetes. However, the data contains missing values and requires pre-processing. We use data pre-processing techniques to enhance diabetes classification. The next part of this section covers various deep neural network models for diabetes prediction.
B. Deep Neural Networks
In the analysis of large datasets, researchers have begun to realize the capabilities of DL techniques [6]. Consequently, DL techniques have also been applied to diabetes prediction.
The authors of [13] used a Deep Neural Network (DNN) for diabetes prediction. This approach was tested on the PIMA dataset. Because a DNN can filter the data and learn biases itself, the authors deliberately did not pre-process the dataset. The dataset was split into one subset of 192 samples and a second subset containing the remainder. The authors reported an accuracy of 88.41 percent.
Another approach [14], based on CNN and CNN-LSTM, was developed and tested on an electrocardiogram dataset.
The authors of [15] used a logistic regression model as a baseline for a multilayer neural network and a CNN. The dataset used in [15] consists of nine patients, with nine features gathered for each patient. Moreover, each patient had data for 10,800 days, resulting in a total of 97,200 simulated days. The attributes used in this analysis were not properly discussed.
Miotto et al. [16] proposed the Deep Patient model, which is an unsupervised DNN. This model is used to classify electronic health records. The model was tested on a database of 704,857 patients.
The authors of [17] tested various deep learning methods on Australian hospital health records and developed a dataset.
The authors of [18] used an RNN model to predict both type 1 and type 2 diabetes. Using the PIMA dataset, they found that the attribute “Glucose” has the highest significance, followed by BMI, age, number of births, diabetes pedigree function, blood pressure, skin thickness, and insulin. The dataset was split into 80 percent for training and 20 percent for testing to validate the analysis.
This paper proposes a convolutional LSTM network with enhanced feature selection and data pre-processing mechanisms for diabetes prediction. The next section provides the proposed methodology.
The existing models fail to extract the features properly and do not properly incorporate data pre-processing techniques. They do not fill in missing values. Moreover, neural networks and error propagation are not implemented by the existing models. The proposed model overcomes all the above-mentioned problems and enhances the diabetes prediction task. The remainder of the paper is organized as follows. Section III gives the proposed methodology. Section IV gives the dataset description and selected results of the existing and proposed methods.
III. MATERIALS AND METHODS
Type 2 diabetes accounts for 90 percent of all diabetes cases. This disorder causes insulin resistance or insulin deficiency in the patient. Type 2 diabetes typically appears at around 40 years of age. With current eating habits and lifestyles, people under the age of 30 are also at risk. Routine checks and screening allow the disease to be diagnosed early and precautions to be taken.
Various research attempts have been made to enhance the accuracy, applicability, and interpretability of Clinical Decision Support Systems (CDSS). However, further optimization is still needed. In the medical domain, where interpretability is essential, fluid rules are relevant.
Many healthcare systems gain valuable information and produce a huge amount of clinical data. Machine learning techniques allow practitioners to process this data and make quick decisions [9]. These decisions reduce the risk of diabetes severely affecting the person and help prevent damage to other organs. Multiple machine learning techniques for disease prediction and for extracting information from medical data have been developed.
Long short-term memory (LSTM) [21] is a form of RNN with feedback connections. LSTM models can process long input sequences with ease.
A standard LSTM unit consists of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.
LSTM networks are well suited to classifying, processing, and making predictions from time series data because there can be lags of unknown duration between important events in a time series. LSTMs were developed to address the vanishing gradient problem that can be encountered when training conventional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models, and other sequence learning methods in many applications.
Compared with a standard recurrent unit, an LSTM cell has the benefit of its cell memory unit. The cell vector can forget part of its previously stored memory and add part of the new information. To demonstrate this, the cell equations and the way sequences are processed under the hood must be inspected.
A. Traditional LSTM
An LSTM network comprises a memory cell and four gates: a) forget gate (f), b) input gate (i), c) control gate (c), and d) output gate (o) [19].
The underlying data pattern can be extracted and remembered, which addresses the long-term dependency problem of classic RNN algorithms [19]. Fig. 1 shows the TLSTM architecture [20]. The inputs of the architecture are $h_{t-1}$, $x_t$, and $b$. The term $h_{t-1}$ represents the previous cell state, $x_t$ the current input vector, and $b$ the bias. One output of the architecture is $c_t$, which represents the present memory content. The other output is $h_t$, which represents the present cell state. The four gates listed above influence the data in the memory cell. The forget gate gives a value in the range 0 to 1, which defines how much of the previous memory cell should be ignored. A value close to 0 means that, at the new timestamp, much of the previous timestamp's memory will be discarded, and the reverse holds for a value close to 1. The gates in TLSTM are represented by the following equations.
Equation 1 represents the forget gate of TLSTM [20]:
$$f_t = \sigma_g(w_f x_t + u_f h_{t-1} + b_f) \tag{1}$$
Equation 2 represents the input gate of TLSTM:
$$i_t = \sigma_g(w_i x_t + u_i h_{t-1} + b_i) \tag{2}$$
Equation 3 represents the control gate of TLSTM:
$$c_t = f_t \times c_{t-1} + i_t \times \sigma_h(w_c x_t + u_c h_{t-1} + b_c) \tag{3}$$
Equations 4 and 5 represent the output of TLSTM:
$$o_t = \sigma_g(w_o x_t + u_o h_{t-1} + b_o) \tag{4}$$
$$h_t = o_t \times \sigma_h(c_t) \tag{5}$$
Here, $\sigma_g$ denotes the sigmoid function and $\sigma_h$ the hyperbolic tangent function. The symbols $w$ and $u$ represent weights. These weights usually help prevent the vanishing gradient problem.
We used 50 TLSTM units in each layer. In each layer, an attention value is calculated for every input. The attention value gives the significance of the input and is helpful in the final prediction. With the aid of the attention vector, the dense layer makes the final prediction of whether a patient has diabetes.
It can be seen in Fig. 1 that there is no connection between the previous memory content and any of the gates in the network. This results in an abnormal situation if the output gate is closed, which reduces the efficiency of prediction and classification tasks. Hence, the primary goal of this work is to apply CLSTM to the classification of patients with diabetes and to illustrate how CLSTM overcomes the limitations faced by TLSTM.
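The gate equations (1) to (5) can be sketched in plain Python as a single time step. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the dictionary layout of the weights, and the all-zero `zero_params` helper are hypothetical, and the attention layer and dense layer described in the text are not included.

```python
import math


def sigmoid(z):
    """Logistic function, written sigma_g in Eqs. (1), (2) and (4)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, x, u, h, b):
    """Return w.x + u.h + b, with w and u as lists of rows and x, h, b as lists."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(w, u, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One TLSTM time step following Eqs. (1)-(5); p is a dict of weights (assumed layout)."""
    f_t = [sigmoid(z) for z in affine(p["w_f"], x_t, p["u_f"], h_prev, p["b_f"])]  # Eq. (1)
    i_t = [sigmoid(z) for z in affine(p["w_i"], x_t, p["u_i"], h_prev, p["b_i"])]  # Eq. (2)
    # Candidate memory: the tanh (sigma_h) term inside Eq. (3).
    g_t = [math.tanh(z) for z in affine(p["w_c"], x_t, p["u_c"], h_prev, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # Eq. (3)
    o_t = [sigmoid(z) for z in affine(p["w_o"], x_t, p["u_o"], h_prev, p["b_o"])]  # Eq. (4)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (5)
    return h_t, c_t


def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights and biases for the four gates."""
    p = {}
    for gate in "fico":
        p["w_" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["u_" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b_" + gate] = [0.0] * n_hidden
    return p


# Two input features, one hidden unit, previous memory content c_{t-1} = 2.0.
params = zero_params(n_in=2, n_hidden=1)
h_t, c_t = lstm_step([1.0, -1.0], [0.0], [2.0], params)
```

With all-zero weights every sigmoid gate outputs 0.5 and the tanh term is 0, so the forget gate simply halves the previous memory content. This makes the role of $f_t$ in Equation 3 easy to check by hand before real weights are substituted.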
D. Dataset Description
The first step of our strategy is to apply dataset pre-processing techniques to the PIMA dataset. The dataset contains information about 768 patients, with nine attributes recorded for each patient. All patients are female, aged between 21 and 81.
In each row, six attributes represent physical examination details and the remaining attributes represent chemical examination information. The last attribute in each row indicates whether the patient is diabetic: 1 means the patient is diabetic and 0 means the patient is not.
The first column gives the number of times the woman has been pregnant, and the second column gives the plasma glucose concentration. The third column gives the diastolic blood pressure and the fourth the triceps skin fold thickness. The fifth column gives the two-hour serum insulin, and the sixth the body mass index (BMI). The diabetes pedigree function is in the seventh column, the eighth column gives the individual's age, and the last column records the incidence of diabetes (1/0). Fig. 4 shows the information of various attributes of the PIMA dataset. Fig. 5 shows the correlation of various attributes in the PIMA dataset.
Fig. 4. Correlation of Various Attributes in the PIMA Dataset.
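The column layout described in this subsection can be sketched as a small parser. This is a minimal illustration under assumptions: the identifier names in `COLUMNS` are hypothetical labels for the nine attributes in the order given in the text, the input is assumed to be header-less comma-separated text, and the two sample rows are made-up values for demonstration only, not records from the PIMA dataset.

```python
import csv
import io

# Column order as described in the text; the identifier names are hypothetical.
COLUMNS = (
    "pregnancies",      # number of pregnancies
    "glucose",          # plasma glucose concentration
    "blood_pressure",   # diastolic blood pressure
    "skin_thickness",   # triceps skin fold thickness
    "insulin",          # two-hour serum insulin
    "bmi",              # body mass index
    "pedigree",         # diabetes pedigree function
    "age",              # age in years
    "outcome",          # 1 = diabetic, 0 = not diabetic
)


def parse_rows(text):
    """Parse header-less CSV text into a list of dicts keyed by COLUMNS."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        row = dict(zip(COLUMNS, (float(v) for v in fields)))
        row["outcome"] = int(row["outcome"])
        rows.append(row)
    return rows


def split_features_label(row):
    """Return the eight input attributes and the 1/0 label of one row."""
    features = [row[name] for name in COLUMNS[:-1]]
    return features, row["outcome"]


# Made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE = """\
2,120,70,30,100,28.5,0.35,33,0
5,160,80,35,0,34.1,0.60,50,1
"""

rows = parse_rows(SAMPLE)
features, label = split_features_label(rows[1])
```

Keeping the label separate from the eight input attributes at parse time makes later steps, such as imputation of missing values or feature selection, operate on the inputs only.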
IV. RESULTS AND DISCUSSION
This section presents a comparison of various models, including neural networks, machine learning, and deep learning systems.
A. Experimental Setup
The TLSTM and CLSTM models are used in this paper to predict diabetes on the selected features. We first pre-processed the dataset with the techniques described in the previous section. The Random Forest algorithm is used for feature selection. From our observations, five features (Glucose, Age, BMI, BP, Insulin) are important.
We set the hyperparameters of the TLSTM and CLSTM models as listed in Table I.
The values in Table I are the hyperparameter values at which we obtained the highest accuracy. We used built-in Python packages to develop our model. Pre-processing and feature selection of the dataset were also carried out in Python.
Table II presents the results of different models on the PIMA dataset. Naïve Bayes, SVM, DT, and K-means have similar accuracy. The TLSTM and CLSTM models outperformed the other existing models in accuracy. The machine learning algorithms reported in this section are traditional ones.
All results in Table II were obtained from our own experiments. The results show that the TLSTM and CLSTM models outperformed all the existing machine learning models.
TABLE I. HYPERPARAMETERS OF TLSTM AND CLSTM
Parameter        TLSTM    CLSTM
Learning rate    0.02     0.01
Batch size       32       32
Hidden layers    50       50
Epochs           50       50
TABLE II. COMPARISON OF VARIOUS MODELS ON PIMA DATASET
Model            Accuracy (test set 10%)    Accuracy (test set 20%)
Naïve Bayes      79.6%                      78.6%
SVM              79.2%                      78%
Decision Trees   78.4%                      77.2%
MLP              80%                        82%
K-means          77%                        72%
TLSTM            92.5%                      93.7%
CLSTM            96.8%                      95.6%
The results presented in this section indicate that the proposed model outperforms all the existing models. The TLSTM and CLSTM models obtained higher accuracy than all the existing machine learning models. The machine learning models do not capture the features properly, and hence their results are lower than those of the proposed model. Moreover, the proposed model handles data pre-processing and feature selection properly, which accounts for its higher results.
V. CONCLUSION
This paper introduces CLSTM and TLSTM prediction models for diabetes. As diabetes is becoming a serious disorder, prediction models are urgently needed. The proposed approach enhances diabetes prediction using deep learning techniques. Moreover, the proposed approach uses an efficient pre-processing mechanism called multivariate imputation by chained equations. This paper examines various classification approaches on the PIMA dataset, on which existing ML and DL approaches are also tested. As shown in Table II, the accuracy achieved by the CLSTM model is higher than that of the other methodologies. In the future, we plan to build a comprehensive framework based on the CLSTM algorithm, in the form of an application or a website, which will help practitioners predict diabetes at early stages and reduce the risk of various diseases.
REFERENCES
[1] N. Cho, J. Shaw, S. Karuranga, Y. Huang, J. D. R. Fernandes, A. Ohlrogge and B. Malanda, “IDF Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045,” Diabetes Res. Clin. Pr., vol. 138, pp. 271–281, 2018.
[2] Y. L. Sun and D. L. Zhang, “Machine Learning Techniques for Screening and Diagnosis of Diabetes: A Survey,” Teh. Vjesn., vol. 26, pp. 872–880, 2019.
[3] H. Naz and S. Ahuja, “Deep learning approach for diabetes prediction using PIMA Indian dataset,” Journal of Diabetes & Metabolic Disorders, vol. 19(1), pp. 391–403, 2020.
[4] D. Liccardo, A. Cannavo, G. Spagnuolo, N. Ferrara, A. Cittadini, C. Rengo and G. Rengo, “Periodontal disease: A risk factor for diabetes and cardiovascular disease,” International journal of molecular sciences, vol. 20(6), pp. 1414, 2019.
[5] B. P. Nguyen, H. N. Pham, H. Tran, N. Nghiem, Q. H. Nguyen, T. T. Do, C. T. Tran and C. R. Simpson, “Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records,” Computer methods and programs in biomedicine, vol. 182, 2019.
[6] S. Spanig, A. Emberger-Klein, J. P. Sowa, A. Canbay, K. Menrad, and D. Heider, “The virtual doctor: An interactive clinical-decision-support system based on deep learning for non-invasive prediction of diabetes,” Artificial intelligence in medicine, vol. 100, 2019.
[7] I. Kavakiotis, O. Tsave, A. Salifoglou, N. Maglaveras, I. Vlahavas, and I. Chouvarda, “Machine learning and data mining methods in diabetes research,” Computational and structural biotechnology journal, vol. 15, pp. 104–116, 2017.
[8] J. P. Kandhasamy and S. Balamurali, “Performance Analysis of Classifier Models to Predict Diabetes Mellitus,” Procedia Comput. Sci., vol. 47, pp. 45–51, 2015.
[9] C. Zhu, C. U. Idemudia, and W. Feng, “Improved logistic regression model for diabetes prediction by integrating PCA and K-means techniques,” Informatics in medicine Unlocked, vol. 17, 2019.
[10] Z. Tafa, N. Pervetica, and B. Karahoda, “An intelligent system for diabetes prediction,” In Proceedings of the 2015 4th Mediterranean Conference on Embedded Computing (MECO), pp. 378–382, 2015.
[11] D. Sisodia and D. S. Sisodia, “Prediction of Diabetes using Classification Algorithms,” Procedia Comput. Sci., vol. 132, pp. 1578–1585, 2018.
[12] A. Negi and V. Jaiswal, “A first attempt to develop a diabetes prediction method based on different global datasets,” In Proceedings of the 2016 Fourth International Conference on Parallel, Distributed and Grid Computing, pp. 237–241, 2016.
[13] A. Ashiquzzaman, A. Kawsar Tushar, M. D. Rashedul Islam, D. Shon, L. M. Kichang, P. Jeong-Ho, L. Dong-Sun and K. Jongmyon, “Reduction of overfitting in diabetes prediction using deep learning neural network,” In IT Convergence and Security; Lecture Notes in Electrical Engineering; Springer, vol. 449, 2017.
[14] G. Swapna, K. P. Soman and R. Vinayakumar, “Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals,” Procedia Comput. Sci., vol. 132, pp. 1253–1262, 2018.
[15] A. Mohebbi, T. B. Aradóttir, A. R. Johansen, H. Bengtsson, M. Fraccaro and M. Mørup, “A deep learning approach to adherence detection for type 2 diabetics,” IEEE Engineering in Medicine and Biology Society, pp. 2896–2899, 2017.
[16] R. Miotto, L. Li, B. A. Kidd and J. T. Dudley, “Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records,” Appl. Sci., vol. 6, pp. 4604–4612, 2019.
[17] T. Pham, T. Tran, D. Phung and S. Venkatesh, “Predicting healthcare trajectories from medical records: A deep learning approach,” J. Biomed. Inform., vol. 69, pp. 218–229, 2017.
[18] H. Balaji, N. Iyengar and R. D. Caytiles, “Optimal Predictive analytics of Pima Diabetics using Deep Learning,” Int. J. Database Theory Appl., vol. 10, pp. 47–62, 2017.
[19] G. Zhu et al., “Redundancy and Attention in Convolutional LSTM for Gesture Recognition,” IEEE Trans. neural networks Learn. Syst., Jun. 2019.
[20] G. Zhu, L. Zhang, L. Yang, L. Mei, S. A. A. Shah, M. Bennamoun, and P. Shen, “Redundancy and attention in convolutional LSTM for gesture recognition,” IEEE transactions on neural networks and learning systems, vol. 31(4), pp. 1323–1335, 2019.
[21] Rahman and Siddiqui, “An Optimized Abstractive Text Summarization Model Using Peephole Convolutional LSTM,” Symmetry (Basel), vol. 11, 2019.