Hybrid Deep Learning CNN-LSTM Model For Diabetes Prediction
Abstract- Diabetes mellitus is a high-risk disease in which blood sugar levels are too high. It is a major cause of death worldwide. Given the rising morbidity of recent years, the world's diabetic population is projected to reach approximately 642 million by 2040, which means that one in ten adults will suffer from diabetes. This troubling figure has, without doubt, drawn considerable attention. With its exponential advancement, machine learning has been extended to many areas of medical health, and many data mining and machine learning techniques have been applied to diabetes datasets for disease risk prediction. The purpose of this paper is to review these machine learning techniques based on the performance measures and characteristics of the methods. The Pima Indian Diabetes dataset used in the study includes 768 patients, of whom 268 are diabetic and 500 are non-diabetic controls.
Keywords- Machine Learning, Decision tree, Random forest, PIMA diabetes dataset, Diabetes Mellitus.
© 2022 IJSRET
444
International Journal of Scientific Research & Engineering Trends
Volume 8, Issue 1, Jan-Feb-2022, ISSN (Online): 2395-566X
In recent times, several algorithms have been used to forecast diabetes, including conventional machine learning methods [6] such as support vector machine (SVM), decision tree (DT) and logistic regression. [7] evaluated three algorithms, i.e. logistic regression, naive Bayes and SVM, using 10-fold cross-validation, and SVM obtained higher performance and accuracy than the other algorithms. [8] constructed logistic-regression-based prediction models for different onsets of type 2 diabetes in order to deal with high-dimensional datasets.

In [9], the authors concentrated on glucose and used support vector regression (SVR) to predict diabetes, treated as a multivariate regression problem. In addition, a growing number of studies have used ensemble techniques to improve prediction accuracy [6]. A new ensemble method, Rotation Forest, which combines 30 machine learning techniques, was proposed in [10]. In [11], the authors suggested a machine learning method that modified the prediction rules of SVM. In [12], the authors proposed computer-assisted diabetes diagnosis based on digital image processing of retinal images to detect diabetic retinopathy, employing the SVM technique.

Machine learning approaches are commonly used to predict diabetes and produce good results. The decision tree is one of the common machine learning methods in the medical field and has strong classification power. Random Forest builds many decision trees. The neural network is a common machine learning method whose performance has improved in many respects recently. We therefore used decision trees, random forest (RF) and neural networks to predict diabetes in this research.

Principal component analysis (PCA) is a technique that brings out strong patterns in a dataset by retaining the directions of greatest variation and suppressing the rest. We applied this algorithm for feature extraction from our Pima dataset, and also used it to clean the dataset so that it is easy to explore and analyse. In this PCA, we ordered the feature vectors according to their eigenvalues.

Step 2: CNN-based Feature Selection
In the proposed model, a CNN is used for automatic feature selection from the PCA-extracted features. CNN is a subset of deep learning that has received a lot of attention in recent years and is applied in signal processing. The 'Softmax' activation function, which expresses a probability distribution over n discrete values, is used, and the convolution uses kernel size 1 with 'valid' padding to extract specific features from the input signals. With 'valid' padding, no zero-padding is added, so the filter only visits positions where it is completely covered by the input. A max-pooling filter, a window from which only the maximum value is passed to the output, is applied between the first and second CNN layers and after the second CNN layer, with a batch size of 1.

Step 3: Classification
The Long Short-Term Memory (LSTM) layer consists of a fully connected unit with 64 neurons. To avoid overfitting during the learning process, a dropout layer with a rate of 0.5 is used. Finally, the output is fed into a last dense layer with 1 neuron and a sigmoid activation function. All parameter settings were chosen by trial and error. The proposed model's architecture is described in figure 1.
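The CNN feature-selection stage and the LSTM classification stage (Steps 2 and 3) can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' model: the weights are random and untrained, each of the n input features is treated as one time step with a single channel, the convolution is followed by a ReLU, and the 16 convolution filters and pooling size of 2 are hypothetical choices. The kernel size is left as a parameter because the text mentions kernel size 1 while the procedure lists kernel=2. The 64-unit LSTM, the 0.5 dropout (inactive at inference) and the single sigmoid output neuron follow the description; in practice such a model would be built and trained with a framework such as Keras (`Conv1D`, `MaxPooling1D`, `LSTM`, `Dropout`, `Dense`).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_valid(x, w, b):
    """1-D convolution with 'valid' padding: the kernel only visits positions
    where it fits entirely inside the input. x: (T, C_in), w: (K, C_in, C_out)."""
    k = w.shape[0]
    steps = [np.einsum("kc,kco->o", x[t:t + k], w) for t in range(x.shape[0] - k + 1)]
    return np.stack(steps) + b


def max_pool1d(x, pool=2):
    """Keep only the maximum of each non-overlapping window (leftover steps dropped)."""
    usable = (x.shape[0] // pool) * pool
    return x[:usable].reshape(-1, pool, x.shape[1]).max(axis=1)


def lstm_last_hidden(x, wx, wh, b):
    """Run a single LSTM layer over x: (T, D) and return the final hidden state."""
    units = wh.shape[0]
    h, c = np.zeros(units), np.zeros(units)
    for x_t in x:
        # Split pre-activations into input, forget, candidate-cell and output gates.
        i, f, g, o = np.split(x_t @ wx + h @ wh + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def init_params(kernel=2, filters=16, units=64, seed=0):
    """Random, untrained weights; filters=16 is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    return {
        "conv_w": rng.normal(0.0, 0.1, (kernel, 1, filters)),
        "conv_b": np.zeros(filters),
        "wx": rng.normal(0.0, 0.1, (filters, 4 * units)),
        "wh": rng.normal(0.0, 0.1, (units, 4 * units)),
        "lstm_b": np.zeros(4 * units),
        "dense_w": rng.normal(0.0, 0.1, units),
        "dense_b": 0.0,
    }


def cnn_lstm_forward(features, params, pool=2):
    """One record: Conv1D -> ReLU -> max pooling -> LSTM(64) -> Dense(1, sigmoid)."""
    x = np.asarray(features, dtype=float).reshape(-1, 1)  # n features as n steps, 1 channel
    x = np.maximum(conv1d_valid(x, params["conv_w"], params["conv_b"]), 0.0)  # ReLU (assumed)
    x = max_pool1d(x, pool)
    h = lstm_last_hidden(x, params["wx"], params["wh"], params["lstm_b"])
    # Dropout(0.5) only acts during training, so at inference it is the identity.
    return float(sigmoid(h @ params["dense_w"] + params["dense_b"]))


params = init_params(kernel=2)
record = np.linspace(-1.0, 1.0, 8)  # hypothetical standardized 8-feature record
prob = cnn_lstm_forward(record, params)
print(0.0 < prob < 1.0)  # True
```

With 8 input features and kernel 2, the 'valid' convolution yields 7 steps and pooling reduces them to 3 before the LSTM; with kernel 1 the same record gives 8 and then 4 steps.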
Procedure:
Step 1: Data Pre-processing of the Pima Dataset:
1.1 Remove missing and NaN values;
Step 2: Apply feature extraction using Principal Component Analysis (PCA), a machine learning module;
Step 3: Apply the CNN Model for Automatic Feature Selection;
Step 4: Apply the LSTM Model for Classification:
4.1 Input layer with padding valid and kernel=2
4.2 Batch Normalization layer
4.3 Max Pooling layer
4.4 Sigmoid activation layer
4.5 Flatten layer
4.6 Batch Normalization layer
4.7 ReLU layer
4.8 Dense layer
4.9 Compile the model with the Adam optimizer
Step 5: Compute and store the obtained classification accuracy, precision, recall and F1-score;
Step 6: Repeat Step 3, Step 4 and Step 5.

Fig 2. Accuracy Vs Algorithm.
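Steps 1 and 2 of the procedure can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: the column order follows the commonly distributed Pima CSV (Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome), treating a zero in Glucose through BMI as a missing value is an optional assumption, z-score standardization before PCA is an assumed default, and the helper names `remove_missing` and `pca_fit_transform` are hypothetical. PCA is computed by eigen-decomposition of the covariance matrix with the eigenvectors ordered by decreasing eigenvalue, matching the description of ordering the feature vectors by eigenvalue.

```python
import numpy as np

# Assumed column order of the common Pima CSV:
# 0 Pregnancies, 1 Glucose, 2 BloodPressure, 3 SkinThickness, 4 Insulin,
# 5 BMI, 6 DiabetesPedigreeFunction, 7 Age, 8 Outcome
ZERO_AS_MISSING = [1, 2, 3, 4, 5]  # assumption: 0 in these columns means "not recorded"


def remove_missing(data, zero_as_missing=()):
    """Step 1.1: drop every row that contains NaN and, optionally,
    a zero in any of the columns listed in `zero_as_missing`."""
    data = np.asarray(data, dtype=float)
    bad = np.isnan(data).any(axis=1)
    if zero_as_missing:
        bad |= (data[:, list(zero_as_missing)] == 0).any(axis=1)
    return data[~bad]


def pca_fit_transform(X, n_components, standardize=True):
    """Step 2: PCA via eigen-decomposition of the covariance matrix.
    Returns the projected data and the explained-variance ratio of the
    kept components, ordered by decreasing eigenvalue."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)  # z-scores; assumes no constant column
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    return Xc @ components, eigvals[:n_components] / eigvals.sum()
```

With a loaded array `raw`, one would call `remove_missing(raw, ZERO_AS_MISSING)`, split off column 8 as the label, and pass the remaining eight columns to `pca_fit_transform`; the number of components to keep is not stated in the text and has to be chosen.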
V. RESULTS ANALYSIS
The applied algorithms were compared with respect to prediction accuracy, precision, recall and F1-score. It is observed that the decision tree performed best on the Pima Indian dataset.

Fig 3. Precision Vs Algorithm.
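The four measures compared here (and stored in Step 5) all derive from confusion-matrix counts. The following pure-Python sketch shows the computation; the function names are hypothetical, the 0.5 cutoff for turning sigmoid outputs into labels is an assumption, and scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` provide the same quantities.

```python
def threshold(probabilities, cutoff=0.5):
    """Turn sigmoid outputs into 0/1 labels (the 0.5 cutoff is an assumption)."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Usage: a trivial classifier that labels all 768 Pima records "non-diabetic".
baseline = classification_metrics([1] * 268 + [0] * 500, [0] * 768)
print(round(baseline["accuracy"], 3), baseline["recall"])  # 0.651 0.0
```

The baseline shows why accuracy alone can mislead on this dataset: with 500 of 768 records non-diabetic, always predicting the majority class already reaches about 65.1% accuracy while detecting no diabetic patient, which is why precision, recall and F1-score are reported alongside it.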
VI. CONCLUSION AND FUTURE WORK

Diabetes mellitus is a high-risk disease in which blood sugar levels are too high. It is a major cause of death worldwide. Given the growing morbidity of recent years, the global diabetic population is projected to reach around 642 million in 2040, implying that one in every ten persons will suffer from diabetes. This worrisome figure has, without doubt, attracted a great deal of attention.

With its exponential growth, machine learning has been extended to many fields of medical health, and many data mining and machine learning techniques have been applied to diabetes datasets for disease risk prediction.

Machine learning is used to teach machines how to handle data more efficiently. Sometimes, after viewing the data, we cannot interpret the pattern or extract information from it; in that case, we apply machine learning, given the abundance of datasets available. The purpose of machine learning is to learn from data, and many studies have investigated how to make machines learn by themselves.

The purpose of this report is to review these machine learning techniques based on the performance measures and characteristics of the methods. The Pima Indian Diabetes dataset used in the study includes 768 patients, of whom 268 are diabetic and 500 are non-diabetic controls.

REFERENCES

[1] Lonappan, G. Bindu, V. Thomas, J. Jacob, C. Rajasekaran, and K. Mathew, "Diagnosis of diabetes mellitus using microwaves," Journal of Electromagnetic Waves and Applications, vol. 21, pp. 1393-1401, 2007.
[2] Krasteva, V. Panov, A. Krasteva, A. Kisselova, and Z. Krastev, "Oral cavity and systemic diseases—diabetes mellitus," Biotechnology & Biotechnological Equipment, vol. 25, pp. 2183-2186, 2011.
[3] M. I. N. Logical, S. Buzura, V. Dadarlat, B. Iancu, A. Peculea, E. Cebuc, et al., "2020 IEEE International Conference on Automation, Quality and Testing, Robotics."
[4] M. E. Cox and D. Edelman, "Tests for screening and diagnosis of type 2 diabetes," Clinical Diabetes, vol. 27, pp. 132-138, 2009.
[5] K. Polat and S. Güneş, "An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease," Digital Signal Processing, vol. 17, pp. 702-710, 2007.
[6] D. Çalişir and E. Doğantekin, "An automatic diabetes diagnosis system based on LDA-Wavelet Support Vector Machine Classifier," Expert Systems with Applications, vol. 38, pp. 8311-8315, 2011.
[7] I. Kavakiotis, O. Tsave, A. Salifoglou, N. Maglaveras, I. Vlahavas, and I. Chouvarda, "Machine learning and data mining methods in diabetes research," Computational and Structural Biotechnology Journal, vol. 15, pp. 104-116, 2017.
[8] B. J. Lee and J. Y. Kim, "Identification of type 2 diabetes risk factors using phenotypes consisting of anthropometry and triglycerides based on machine learning," IEEE Journal of Biomedical and Health Informatics, vol. 20, no. 1, pp. 39-46, 2016.
[9] J. Lee and J. Y. Kim, "Identification of type 2 diabetes risk factors using phenotypes consisting of anthropometry and triglycerides based on machine learning," IEEE Journal of Biomedical and Health Informatics, vol. 20, pp. 39-46, 2015.
[10] A. Ozcift and A. Gulten, "Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms," Computer Methods and Programs in Biomedicine, vol. 104, pp. 443-451, 2011.
[11] L. Han, S. Luo, J. Yu, L. Pan, and S. Chen, "Rule extraction from support vector machines using ensemble learning approach: an application for diagnosis of diabetes," IEEE Journal of Biomedical and Health Informatics, vol. 19, pp. 728-734, 2014.
[12] E. V. Carrera, A. González, and R. Carrera, "Automated detection of diabetic retinopathy using SVM," in 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), IEEE, 2017.
[13] V. A. Kumari and R. Chitra, "Classification of diabetes disease using support vector machine," International Journal of Engineering Research and Applications, vol. 3, pp. 1797-1801, 2013.
[14] A. Mujumdar and V. Vaidehi, "Diabetes Prediction using Machine Learning Algorithms," Procedia Computer Science, vol. 165, pp. 292-299, 2019.
[15] Nagesh Singh Chauhan, Data Science Enthusiast, Blog: "Decision tree," https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html.
[16] Niklas Donges, "A Complete Guide To The Random Forest Algorithm," https://builtin.com/data-science/random-forest-algorithm.
[17] Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, and H. Tang, "Predicting Diabetes Mellitus With Machine Learning Techniques," https://www.frontiersin.org/articles/10.3389/fgene.2018.00515/full.