International Journal of Scientific Research & Engineering Trends

Volume 8, Issue 1, Jan-Feb-2022, ISSN (Online): 2395-566X

Hybrid Deep Learning CNN-LSTM Model for Diabetes Prediction
Mahadeo Bhopte, Manish Rai
Department of Computer Science & Engineering,
Bhabha University Bhopal, India

Abstract- Diabetes mellitus is a high-risk medical disease in which blood sugar levels are too high. It is a major cause of death worldwide. Given the increased morbidity in recent years, the number of people with diabetes worldwide is projected to reach approximately 642 million by 2040, which means that one in ten adults will suffer from diabetes. This troubling figure undoubtedly demands considerable attention. With its rapid advancement, machine learning has been extended to many areas of medical health. Many data mining and machine learning techniques have been applied to diabetes datasets for disease risk prediction. The purpose of this paper is to review these machine learning techniques based on the performance measures and characteristics of the methods. The Pima Indian Diabetes dataset, taken as part of the study, includes 768 patients, of whom 268 are diabetic and 500 are non-diabetic controls.

Keywords- Machine Learning, Decision tree, Random forest, PIMA diabetes dataset, Diabetes Mellitus.

I. INTRODUCTION

Diabetes mellitus (DM), commonly called diabetes, is a severe and complex medical condition. The pancreas does not produce enough insulin, so blood sugar rises and affects various organs, in particular the eyes, kidneys, and nerves [1].

It is for this reason that diabetes is referred to as the silent killer. Three kinds of diabetes exist: type I diabetes, type II diabetes, and gestational diabetes [2]. In type I diabetes, the pancreas produces very little insulin or none at all. Roughly 5 to 10% of all diabetes is type I, and it can occur at any stage of life, including infancy [3]. Type II diabetes occurs if insulin is not adequately released by the body. Approximately 90% of diabetic patients worldwide have type II diabetes. Type II is similar in many ways to the third type of diabetes, gestational diabetes mellitus (GDM), since it also involves comparatively inadequate secretion of insulin. Approximately 2-10% of all pregnant women are affected by gestational diabetes; after delivery, it can progress or disappear.

Diagnosing diabetes and interpreting diabetes data is a difficult problem. Various machine learning methods are used for dealing with healthcare problems of this kind. Most medical data exhibits non-linearity, non-normality, and an inherent correlation structure. Therefore, conventional and widely used classification techniques such as naive Bayes, random forest, and decision tree cannot always classify the data properly. In this paper, a review of various machine learning methods is presented and their accuracy is compared on the Pima Indian dataset.

II. RELATED WORK

Diabetes is a common chronic disease that poses a great risk to an individual's physical condition. Its main characteristic is a blood glucose level higher than normal, caused by defective insulin secretion and the associated biological effects [1].

Diabetes can lead to persistent damage and dysfunction of various tissues, especially the kidneys, eyes, heart, blood vessels, and nerves [2]. The characteristic symptoms are increased thirst, frequent urination, and high blood glucose levels [3]. Diabetes cannot be treated successfully with medications alone, and patients require insulin therapy. With rising living standards, diabetes is becoming more and more prevalent in everyday life. A subject worth researching, therefore, is how to diagnose and evaluate diabetes easily and reliably. In medicine, diabetes diagnosis is based on fasting blood glucose, glucose tolerance, and random blood glucose levels [3] [4].

The sooner diabetes is diagnosed, the easier it is to control. Machine learning can help people make a preliminary judgment about diabetes mellitus from their routine physical examination data, and it can serve as a reference for doctors [5]. The most important problems are how to select the right features and the right classifier for the machine learning process.

© 2022 IJSRET

In recent times, several algorithms have been used to predict diabetes, including conventional machine learning methods [6] such as support vector machine (SVM), decision tree (DT), and logistic regression. The authors of [7] applied 10-fold cross validation to three algorithms (logistic regression, naive Bayes, and SVM), where SVM obtained higher performance and accuracy than the other algorithms. The authors of [8] constructed prediction models based on logistic regression for different onsets of type 2 diabetes in order to deal with high-dimensional datasets.

In [9], the authors concentrated on glucose and used support vector regression (SVR) to predict diabetes, which is a multivariate regression problem. In addition, more and more studies have used ensemble techniques to improve accuracy [6]. A new ensemble method, Rotation Forest, which incorporates 30 machine learning techniques, was proposed in [10]. In [11], the authors suggested a machine learning method that modified the prediction rules of SVM. In [12], the authors proposed computer-assisted diabetes diagnosis based on digital image processing of retinal images to detect diabetic retinopathy, employing the SVM technique.

Machine learning approaches are commonly used to predict diabetes and produce good results. The decision tree is one of the common machine learning methods in the medical field and has strong classification power. Random Forest builds many decision trees. The neural network is a common machine learning method that has recently improved performance in many areas. We therefore used algorithms such as decision trees, random forest (RF), and neural networks to predict diabetes in this research.

III. PIMA INDIAN DATASET

The review of machine learning methods is performed on the Pima Indian dataset [13]. The dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and is generally used for diagnosing diabetes in patients based on certain factors.

In particular, all patients are females of Pima Indian heritage who are at least 21 years old. The dataset comprises 8 features: number of pregnancies, plasma glucose concentration after a 2-h oral glucose tolerance test, diastolic blood pressure, triceps skin fold thickness, 2-h serum insulin, body mass index, diabetes pedigree function, and age. The dataset contains 768 initial records, including records with missing values; after these are removed, 392 records remain.

IV. PROPOSED MODEL

Step 1: Feature extraction using PCA
Principal component analysis (PCA) is a technique that brings out strong patterns in a dataset by suppressing variations. We applied it for feature extraction from the Pima dataset and also to clean the data so that it is easier to explore and analyse. The feature vectors are ordered according to their eigenvalues.

Step 2: CNN-based feature selection
In the proposed model, a CNN is used for automatic feature selection from the features extracted in Step 1. CNN is a deep learning technique that has received a lot of attention in recent years and is applied in signal processing. The 'Softmax' activation function is used to express a probability distribution over n discrete values. The convolution uses kernel size 1 to extract specific features from the input signals, with 'valid' padding, meaning that no padding is added and the filter is applied only where it completely covers the input. A max-pooling filter serves as a window through which only the maximum value is passed to the output; it is applied between the first and second CNN layers and after the second layer, with batch size 1.

Step 3: Classification
The Long Short-Term Memory (LSTM) layer consists of a fully connected unit with 64 neurons. To avoid overfitting during the learning process, a dropout layer with rate 0.5 is used. The output is finally fed into the last dense layer, which has 1 neuron with a sigmoid activation function. All parameter settings were chosen by trial and error. The proposed model's architecture is shown in Fig 1.

Fig 1. Flow chart of proposed algorithm.

Algorithm 1: Proposed algorithm for diabetes prediction
Results: Prediction performance measured by precision, F1-score, recall, and classification accuracy;
Input: Pima dataset [18]


Output: Predict Diabetes

Procedure:
Step 1: Data pre-processing of the Pima dataset:
    1.1 Remove missing and NaN values;
Step 2: Apply feature extraction using Principal Component Analysis (PCA), a machine learning module
Step 3: Apply CNN model for automatic feature selection:
Step 4: Apply LSTM model for classification:
    4.1 Input layer with padding valid and kernel=2
    4.2 Batch Normalization layer
    4.3 Max Pooling layer
    4.4 Sigmoid activation layer
    4.5 Flatten layer
    4.6 Batch Normalization layer
    4.7 ReLU layer
    4.8 Dense layer
    4.9 Compile model with Adam optimizer
Step 5: Generate and store the obtained classification accuracy, precision, F1-score, and recall;
Step 6: Repeat Step 3, Step 4 and Step 5

Fig 2. Accuracy vs. algorithm.
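Steps 1 and 2 of Algorithm 1 can be illustrated with a minimal sketch. It assumes NumPy and uses a small made-up array in place of the Pima data. The helper names drop_missing_rows and pca_by_eigenvalue are hypothetical, and the column standardisation and the number of retained components are assumptions that the paper does not specify. This is not the authors' implementation.

```python
import numpy as np


def drop_missing_rows(X):
    """Step 1.1: keep only the rows that contain no NaN entries."""
    X = np.asarray(X, dtype=float)
    keep = ~np.isnan(X).any(axis=1)
    return X[keep]


def pca_by_eigenvalue(X, n_components):
    """Step 2: project X onto the eigenvectors of its covariance matrix,
    ordered from the largest eigenvalue to the smallest."""
    # Standardise each column (an assumption; the paper does not say).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs[:, :n_components], eigvals


# Toy data, NOT the Pima dataset: 6 rows, 3 features, one missing value.
toy = np.array([
    [1.0, 85.0, 26.6],
    [8.0, 183.0, 23.3],
    [1.0, np.nan, 28.1],
    [0.0, 137.0, 43.1],
    [5.0, 116.0, 25.6],
    [3.0, 78.0, 31.0],
])
clean = drop_missing_rows(toy)
scores, eigvals = pca_by_eigenvalue(clean, n_components=2)
print(clean.shape, scores.shape)
```

In the commonly distributed Pima CSV, missing measurements typically appear as zeros in fields such as glucose or body mass index, so they would need to be converted to NaN before a filter like this one is applied.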

V. RESULTS ANALYSIS
Table 1 compares the models with respect to accuracy, precision, recall and F1 score for all the applied algorithms. Among the conventional classifiers, Extra Tree achieved the highest accuracy (80.16%), and the proposed hybrid CNN-LSTM model performed best overall on the Pima Indian dataset.

Evaluated using confusion matrix parameters, the hybrid CNN-LSTM model achieves accuracy 89.30%, precision 87.80%, recall 84.10% and F1-score 85.58%, which still leaves scope for improvement. In future work, it can be improved by tuning the hyper-parameters.

Fig 3. Precision vs. algorithm.

Table 1. Comparison of different models on the basis of accuracy, precision, recall and F1 score.

Model                      Accuracy %   Precision %   Recall %   F1-Score %
DT                         76.27        74.9          72.03      73.95
RF                         75.67        71.44         76.62      74.53
KNN                        72.25        70.5          66.07      61.84
SVM                        74.37        71.01         67.59      71.09
Extra Tree                 80.16        83.8          76.05      77.38
Proposed Hybrid CNN-LSTM   89.30        87.80         84.10      85.58

Fig 4. Recall vs. algorithm.

Fig 5. F1-score vs. algorithm.
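The four measures in Table 1 are all derived from a binary confusion matrix. As a rough sketch, the function below computes them from true/false positive and negative counts. The name confusion_metrics and the counts passed to it are hypothetical, chosen only for illustration; the paper does not report its confusion matrices.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only (not taken from the paper).
acc, prec, rec, f1 = confusion_metrics(tp=40, fp=10, fn=10, tn=40)
print(acc, prec, rec, f1)

# Harmonic mean of the precision and recall reported for the proposed model.
reported_p, reported_r = 87.80, 84.10
f1_from_reported = 2 * reported_p * reported_r / (reported_p + reported_r)
print(round(f1_from_reported, 2))
```

The harmonic mean of the reported precision and recall comes to about 85.91, slightly above the reported F1-score of 85.58. One possible explanation is that the reported figures are averaged across classes or runs, which the paper does not state.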


VI. CONCLUSION AND FUTURE WORK

Diabetes mellitus is a high-risk medical disease in which blood sugar levels are too high. It is a major cause of death worldwide. Given the growing morbidity in recent years, the global diabetic population is projected to reach around 642 million in 2040, implying that one in every ten adults will suffer from diabetes. This worrisome figure undoubtedly demands attention.

With its rapid growth, machine learning has been extended to many fields of medical health. Many data mining and machine learning techniques have been applied to diabetes datasets for disease risk prediction.

Machine learning is used to teach machines how to handle data more efficiently. Sometimes, after viewing the data, we cannot interpret the pattern or extract information from it. In that case, we apply machine learning, making use of the abundance of datasets available. The purpose of machine learning is to learn from the data. Many studies have been done on how to make machines learn by themselves.

The purpose of this report is to review these machine learning techniques based on the performance measures and characteristics of the methods. The Pima Indian Diabetes dataset, taken as part of the study, includes 768 patients, of whom 268 are diabetic and 500 are non-diabetic controls.

REFERENCES

[1] Lonappan, G. Bindu, V. Thomas, J. Jacob, C. Rajasekaran, and K. Mathew, "Diagnosis of diabetes mellitus using microwaves," Journal of Electromagnetic Waves and Applications, vol. 21, pp. 1393-1401, 2007.
[2] Krasteva, V. Panov, A. Krasteva, A. Kisselova, and Z. Krastev, "Oral cavity and systemic diseases—diabetes mellitus," Biotechnology & Biotechnological Equipment, vol. 25, pp. 2183-2186, 2011.
[3] M. I. N. Logical, S. Buzura, V. Dadarlat, B. Iancu, A. Peculea, E. Cebuc, et al., "2020 IEEE International Conference on Automation, Quality and Testing, Robotics."
[4] M. E. Cox and D. Edelman, "Tests for screening and diagnosis of type 2 diabetes," Clinical Diabetes, vol. 27, pp. 132-138, 2009.
[5] K. Polat and S. Güneş, "An expert system approach based on principal component analysis and adaptive neuro-fuzzy inference system to diagnosis of diabetes disease," Digital Signal Processing, vol. 17, pp. 702-710, 2007.
[6] D. Çalişir and E. Doğantekin, "An automatic diabetes diagnosis system based on LDA-Wavelet Support Vector Machine Classifier," Expert Systems with Applications, vol. 38, pp. 8311-8315, 2011.
[7] I. Kavakiotis, O. Tsave, A. Salifoglou, N. Maglaveras, I. Vlahavas, and I. Chouvarda, "Machine learning and data mining methods in diabetes research," Computational and Structural Biotechnology Journal, vol. 15, pp. 104-116, 2017.
[8] B. J. Lee and J. Y. Kim, "Identification of type 2 diabetes risk factors using phenotypes consisting of anthropometry and triglycerides based on machine learning," IEEE J. Biomed. Health Inform., vol. 20, no. 1, pp. 39-46, 2016.
[9] J. Lee and J. Y. Kim, "Identification of type 2 diabetes risk factors using phenotypes consisting of anthropometry and triglycerides based on machine learning," IEEE Journal of Biomedical and Health Informatics, vol. 20, pp. 39-46, 2015.
[10] A. Ozcift and A. Gulten, "Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms," Computer Methods and Programs in Biomedicine, vol. 104, pp. 443-451, 2011.
[11] L. Han, S. Luo, J. Yu, L. Pan, and S. Chen, "Rule extraction from support vector machines using ensemble learning approach: an application for diagnosis of diabetes," IEEE Journal of Biomedical and Health Informatics, vol. 19, pp. 728-734, 2014.
[12] E. V. Carrera, A. González, and R. Carrera, "Automated detection of diabetic retinopathy using SVM," in 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), IEEE, 2017.
[13] V. A. Kumari and R. Chitra, "Classification of diabetes disease using support vector machine," International Journal of Engineering Research and Applications, vol. 3, pp. 1797-1801, 2013.
[14] A. Mujumdar and V. Vaidehi, "Diabetes Prediction using Machine Learning Algorithms," Procedia Computer Science, vol. 165, pp. 292-299, 2019.
[15] Nagesh Singh Chauhan, Data Science Enthusiast, Blog: Decision tree, https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html
[16] Niklas Donges, A Complete Guide To The Random Forest Algorithm, https://builtin.com/data-science/random-forest-algorithm
[17] Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, H. Tang, "Predicting Diabetes Mellitus With Machine Learning Techniques", https://www.frontiersin.org/articles/10.3389/fgene.2018.00515/full
