A Deep Learning Model For Identification of Diabetes Type 2 Based On Nucleotide Signals
https://doi.org/10.1007/s00521-022-07121-8
ORIGINAL ARTICLE
Received: 24 July 2021 / Accepted: 21 February 2022 / Published online: 12 March 2022
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2022
Abstract
In Genome-Wide Association Studies (GWAS), detecting variants related to Type 2 Diabetes (T2D) in genome sequences and accurately modeling the complex structure of the relevant genes are of great importance for the diagnosis of diabetes. To this end, this paper presents a novel algorithm that identifies T2D risk variants accurately and efficiently. The proposed algorithm has five phases. The first phase collects T2D-associated DNA sequences and digitizes them with the Entropy-based technique. The second transforms the digitized DNA sequences into spectrum images of 224 × 224 pixels. The third extracts a distinctive feature set from these spectrum images using the ResNet and VGG19 architectures. The fourth classifies the feature set with the SVM and k-NN methods. The last evaluates the system with k-fold cross-validation. The performances of the Convolutional Neural Network (CNN) models, the Entropy-based technique and the classifiers were compared. The combination of the proposed Entropy-based technique, ResNet and a Support Vector Machine (SVM) achieved the highest accuracy, 99.09%. The study also investigated how well the system extracts epigenetic features and predicts T2D from spectrogram images. The results suggest that the system can contribute to the identification of all genes in diabetes-related tissue and to studies on new drug targets.
Keywords Convolutional neural network · Entropy-based technique · DNA sequences · Signal processing · Type 2 diabetes
Bihter Das
Department of Software Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey
Neural Computing and Applications (2022) 34:12587–12599

1 Introduction

GWAS have brought significant advances in understanding the genetic makeup of complex human diseases [1]. Their main purpose is to model accurately the complex structure of disease-related gene regulation [2]. Diabetes is one of these genetic diseases. It develops and lasts a lifetime when the pancreas does not produce enough insulin, or when the body cannot use insulin effectively [3]. The full name of the disease is diabetes mellitus. People with diabetes cannot use the glucose that enters the blood from the food they eat, so their blood sugar levels rise. Type 2 diabetes (T2D) occurs when the body does not produce enough insulin to function properly, or when body cells do not respond to insulin [4, 5]. The latter condition is known as insulin resistance. Type 2 diabetes is also called type 2 diabetes mellitus (T2DM). T2D is much more common than type 1 diabetes (T1D), in which the body does not produce any insulin. T2D is caused by both genetic and environmental factors [6–8]. Its basis is insulin resistance together with abnormal insulin secretion.

Some patients with diabetes can be misdiagnosed. Not every patient with diabetes has type 1 or type 2. Around the world there are also patients whose diabetes is caused by a genetic mutation [9–11]. Most of these people are unnecessarily treated with insulin rather than with low-dose medication.

Recently, studies on the correct diagnosis of T2D, which is very common, have continued intensively. However, studies that diagnose the disease from DNA data sets are limited. To our knowledge, this study is the first to convert DNA sequences into digital signals, and then into spectrum images, in order to detect diabetes. Therefore, this paper aims to develop a novel deep learning-based algorithm to discover the genetic basis and risk mechanisms of T2D. It is thus expected to contribute to the identification of all genes in diabetes-related tissue and to studies on new drug targets.

1.1 Motivation

The literature contains some genome studies on the genetic architecture of diabetes. These studies use microarray-based techniques, deep learning-based models, machine learning methods and statistical analysis to diagnose T2D. However, these methods either require a laboratory environment or have not clearly shown whether the variants in the genome sequences are transcripts that cause T2D. The most important point that distinguishes this study from others is that it offers a low-cost, high-accuracy, deep learning-based hybrid algorithm. This algorithm detects T2D from nucleotide sequences without the need for a laboratory environment. It achieves satisfactory sensitivity and precision, together with strong robustness in classification.

1.2 Contributions of the paper

2 Related work

Genome studies that aim to understand the genetic architecture of various diseases continue unabated. There is still a significant gap between genetic discoveries and T2D risk mechanisms, so current research focuses on learning the biological mechanisms of T2D. Ongoing studies aim to reveal the molecular mechanisms involved in the emergence of the disease. They also aim to use the resulting genetic information to predict the risk of developing T2D. The literature contains many studies on T2D and its genetics. Some of them have identified more than 400 genomic signals associated with T2DM [8, 12–14]. These signals are often named after their closest genes, but it is not known whether a given variant is a transcript that changes the risk of diabetes. These studies have used T2D-associated genes, ATAC sequences, T2D variants and mitochondrial DNA (mtDNA) sequences. Other studies have used several approaches to diagnose T2D from genomic signals. These include deep learning-based models [12–15], pathway analysis [16], CNN models, statistical analysis [18], the Support Vector Machine Recursive Feature Elimination (SVM-RFE) approach [17] and other machine learning methods [19–23]. Ensemble-based methods have also been used to predict diabetes [24–28]. Table 1 lists current studies on the detection of T2D-related genes.
Table 1 The studies for the identification of T2D-associated genes in the literature
References | Methods | Datasets | Results
Abdulaimma et al. [13] | Deep learning | SNPs associated with T2D | AUC = 96.53%, Sens = 93.91%
Rai et al. [14] | Deep learning model based on the U-Net architecture | ATAC sequences with T2D | –
Mattis et al. [15] | Convolutional Neural Network (CNN) | T2D gene ATAC sequences | –
Wang et al. [16] | Ingenuity Pathway Analysis | Diabetic PDAC patients | n = 18, P = 2.6964E-08
Kumar et al. [17] | SVM-RFE approach | 37 samples of normal humans, 34 of diabetic humans | AUC = 83.9%
Lalrohlui et al. [18] | Statistical analysis | mtDNA sequences of 28 diabetics from Northeast India | ND3 variant 10398A>G was found to be associated with T2D (OR = 9.489, 95% CI = 1.161–77.54, P value = 0.03)
Liang et al. [19] | SimPo algorithm | Sequences of 24 patients with T2D and 47 healthy controls | AUC = 0.902, Sensitivity = 0.837, Specificity = 0.944
Cai et al. [20] | Logistic regression (LR), linear discriminant analysis (LDA), Naive Bayes (NB) and support vector machine (SVM) | Dataset A (Chinese) and Dataset B (European) | SVM in several different experiments; the best AUC is 0.97
Malik et al. [21] | LR, SVM, artificial neural network (ANN) | 175 samples, half from healthy subjects and half from diabetic patients | SVM ACC = 84.09%
Nilamyani et al. [22] | Recursive feature extraction, Random Forest RFE, SVM | Microarray-based GSE18732 gene data for T2D | –
Liu et al. [23] | Independent sample t-test | TRs of diabetes genes and non-diabetes disease genes | P-values of the t-test are 0.557 and 0.422
were converted to digital signals by the Entropy-based mapping technique.

3.1.1 The entropy-based mapping technique

The Entropy-based technique used in the proposed approach better reflects the complex structure of DNA sequences. It digitizes the sequences according to how often codons are repeated. It also provides a wide range of correlation information on the gene sequence and performs better than existing numerical mapping techniques in the literature. In the implementation, its performance was compared with the Electron–Ion Interaction Pseudo Potential (EIIP) and Integer techniques, which are widely used in the literature. Eq. (1) gives the formula of the mapping technique, which is based on fractional Shannon entropy [30, 31].

$$S_f = \sum_i \left[ \left(p(x_i)\right)^{\alpha_i} \, p(x_i) \log\left(p(x_i)\right) \right] \tag{1}$$

Many studies choose the α value randomly. In the Entropy-based technique, α is newly defined as 1 divided by the logarithm of p(x_i), as shown in Eq. (2).

$$\alpha_i = \frac{1}{\log\left(p(x_i)\right)} \tag{2}$$

Figure 2 shows the numerical representation of DNA gene sequences produced by the Entropy-based mapping technique.

3.1.2 Electron ion interaction potential (EIIP) mapping technique

This technique defines each base by the average energy of its delocalized electrons [32, 33]. The bases are represented by the following values: A = 0.1260, G = 0.0806, C = 0.1340, T = 0.1335 [32–34]. Figure 3 shows the numerical representation of DNA gene sequences produced by the EIIP mapping technique.
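The two mappings above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. For the Entropy-based technique it assumes the following: the sequence is split into non-overlapping codons; p(x_i) is the relative frequency of codon x_i in the sequence; log is the natural logarithm; each codon is replaced by its bracketed term of Eq. (1), with α_i from Eq. (2). The function names are hypothetical. Only the EIIP values come from the text.

```python
import math
from collections import Counter

# EIIP value of each nucleotide, as listed in Section 3.1.2.
EIIP = {"A": 0.1260, "G": 0.0806, "C": 0.1340, "T": 0.1335}


def eiip_map(seq: str) -> list[float]:
    """Replace every nucleotide with its EIIP value."""
    return [EIIP[base] for base in seq.upper()]


def codon_probabilities(seq: str) -> tuple[list[str], dict[str, float]]:
    """Split seq into non-overlapping codons (assumption) and return the
    codon list plus the relative frequency p(x_i) of each distinct codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3            # drop a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return codons, {c: n / len(codons) for c, n in counts.items()}


def entropy_term(p: float) -> float:
    """Bracketed term of Eq. (1) for one codon, with alpha_i from Eq. (2)."""
    if p >= 1.0:                                 # log(1) = 0, so alpha is undefined
        return 0.0                               # hypothetical choice: map to 0
    alpha = 1.0 / math.log(p)                    # Eq. (2), natural log assumed
    return (p ** alpha) * p * math.log(p)        # term inside the sum of Eq. (1)


def entropy_map(seq: str) -> list[float]:
    """Digitize seq: each codon occurrence becomes its Eq. (1) term."""
    codons, probs = codon_probabilities(seq)
    return [entropy_term(probs[c]) for c in codons]


def fractional_entropy(seq: str) -> float:
    """S_f of Eq. (1): sum of the terms over the distinct codons."""
    _, probs = codon_probabilities(seq)
    return sum(entropy_term(p) for p in probs.values())


if __name__ == "__main__":
    dna = "ATGATGCCCTTA"
    print(eiip_map(dna)[:4])
    print(entropy_map(dna))
    print(fractional_entropy(dna))
```

Either `entropy_map(seq)` or `eiip_map(seq)` yields the one-dimensional signal that the next phase turns into a spectrogram. Two behaviors are our own choices, not the paper's: a trailing partial codon is dropped, and a sequence made of a single repeated codon maps to zeros.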
Fig. 5 Graphical representation of the digitized T2DM gene sequence according to Entropy-based technique and others
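Fig. 5 shows the digitized signals. In Phase 2 these signals are converted into spectrogram images. This excerpt gives only the image sizes (875 × 656 pixels, later resized to 224 × 224), not the transform parameters. The following is therefore a generic sketch of a magnitude spectrogram computed with a short-time Fourier transform and a periodic Hann window. The window length, the hop size and the function names are our assumptions. The direct DFT is used for readability. A practical pipeline would more likely call an optimized routine such as `scipy.signal.spectrogram` and then render and resize the matrix as an image.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectrogram(signal: list[float], win: int = 64, hop: int = 32) -> list[list[float]]:
    """Magnitude STFT of a 1-D signal.

    Returns one row per frame and one column per frequency bin 0..win//2.
    Frames that would run past the end of the signal are skipped.
    """
    window = hann(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        segment = [signal[start + k] * window[k] for k in range(win)]
        row = []
        for f in range(win // 2 + 1):            # non-negative frequencies only
            coeff = sum(segment[k] * cmath.exp(-2j * math.pi * f * k / win)
                        for k in range(win))
            row.append(abs(coeff))
        frames.append(row)
    return frames


if __name__ == "__main__":
    # A pure tone at bin 8 should concentrate its energy in column 8.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = spectrogram(tone)
    print(len(spec), len(spec[0]))
```

Each row is one time frame, so transposing the matrix and mapping the magnitudes (often on a log scale) to colors gives the time–frequency image that the CNNs receive.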
The pre-trained VGG19 and ResNet models were used.

(A) VGG19: VGG19 is a convolutional neural network that is 19 layers deep. It was developed by the Visual Geometry Group (VGG) at the University of Oxford [40]. It consists of 16 convolutional layers and 3 fully connected layers, together with 5 max-pooling layers and a SoftMax layer as the last layer. The model contains about 144 million parameters [41, 42].
(B) Residual Neural Network (ResNet): In recent years, deep learning studies have gained great importance and momentum. LeNet, AlexNet, GoogleNet, VGGNet and ResNet, in chronological order, have been the most important studies in this field. These studies share the view that the number of layers in a CNN is a very important parameter [43, 44]. In theory, adding layers increases the representational capacity of a CNN and is expected to yield more successful network architectures. However, it also causes problems such as vanishing gradients and optimization difficulty. One of the most important contributions of ResNet is that it mitigates these problems while increasing the depth of the network [45].

3.4 Phase – 4: classification of deep features

This phase of the proposed approach uses the SVM and k-NN classifiers. Hundreds of studies and dozens of books cover these classifiers, so this section presents only summary information.

3.4.1 Support vector machine classifier

Support Vector Machines (SVMs) are supervised machine learning algorithms that can be used for classification or regression problems [46]. SVM offers a way to follow a path among
many possible classifiers that will increase the chances of labeling the test data accurately. Besides linear classification, SVMs can efficiently perform non-linear classification with the so-called kernel trick, which implicitly maps their inputs into high-dimensional feature spaces. SVMs work in any number of dimensions. In each space they find the counterpart of a separating line in two dimensions [47]. In other words, they find a separating hyperplane in higher dimensions, which generalizes the two-dimensional line and the three-dimensional plane to any number of dimensions. Because the hyperplane acts as a linear classifier, the SVM classifier was used to classify the spectrogram images of DNA sequences.

3.4.2 k-nearest neighbor (k-NN) classifier

k-Nearest Neighbor (k-NN) is a supervised machine learning algorithm for both classification and regression problems [48]. In industry, however, it is mainly used for classification. k-NN is a nonparametric algorithm, which means it makes no assumptions about the underlying data. For the k-NN classifier, it is very important that the training set is large and that k is chosen appropriately [49]. Since the DNA datasets are very large, this study chose the k-NN classifier.

3.5 Phase – 5: evaluation with k-fold cross-validation

In this last phase, the numerical results were evaluated with k-fold cross-validation. The data set was divided into 5 parts. In each round, 80% of the data was used for training and 20% for testing. This was repeated so that each part served once as the test set. Fig. 6 shows the k-fold validation diagram for the dataset. The performance of the proposed approach was calculated by averaging over the 5 parts. The parameters used are defined as follows:
- true positive (TP): the number of correctly identified T2D genes;
- false negative (FN): the number of T2D genes incorrectly identified as normal;
- true negative (TN): the number of correctly identified normal genes;
- false positive (FP): the number of normal genes incorrectly identified as T2D genes.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$$

$$\text{Precision} = \frac{TP}{TP + FP} \times 100$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100$$

4 Experimental results and discussion

The study used normal and T2D-associated gene sequences, 2470 bases in length. The gene sequences were digitized with the Entropy-based, EIIP and Integer techniques and then converted into spectrogram images of 875 × 656 pixels. These images were resized to 224 × 224 pixels so that the ResNet and VGG19 architectures could process them. The resized images were then given to ResNet and VGG19 to extract deep features. ResNet produces a 2048-dimensional feature vector from each spectrogram image, and VGG19 a 4096-dimensional one. The feature vectors were classified with SVM and k-NN. To evaluate the classification results more objectively, k-fold cross-validation was used with k = 5.

Table 2 shows the accuracy rates of the classifiers. With ResNet features, the Entropy-based mapping technique outperforms the other numerical techniques for both classifiers. We attribute this to two properties of the Entropy-based technique: it better reflects the complex structure of the gene, and it digitizes the sequences according to how often codons are repeated. Besides, the Entropy-based technique provides a wide range of correlation information on the
gene sequence. As Table 2 shows, the k-NN results are very close across the three mapping techniques. Between the two classifiers, SVM achieved the highest overall classification accuracy. 80% of the data set was used for training and the rest for testing. Accordingly, 3952 of the 4940 spectrogram images were used for training and the remaining 988 for testing.

Table 2 The accuracy rates (%, mean ± standard deviation) of the Entropy-based, EIIP and Integer techniques for the VGG19 and ResNet models
Classifier | Model | Entropy | EIIP | Integer
SVM | ResNet | 99.09 ± 0.59 | 98.72 ± 0.54 | 98.52 ± 0.92
SVM | VGG19 | 98.16 ± 0.76 | 98.38 ± 0.71 | 97.81 ± 0.73
k-NN | ResNet | 98.58 ± 0.33 | 98.38 ± 0.30 | 98.25 ± 0.33
k-NN | VGG19 | 98.48 ± 0.32 | 98.62 ± 0.41 | 98.38 ± 0.18

With the SVM classifier, both CNN models perform worse with the Integer technique than with the Entropy-based technique, and ResNet also performs worse with EIIP. The Entropy-based technique with SVM and ResNet gives the highest accuracy overall, and its standard deviation is within acceptable limits. With ResNet features, the Entropy-based technique also gives the highest k-NN accuracy of the three mapping techniques. That accuracy is still lower than the SVM result. Table 3 shows the sensitivity, specificity, precision and F1 score of all three mapping techniques for both CNN models with the SVM classifier.

Table 3 The sensitivity, specificity, precision and F1 score values (%, mean ± standard deviation) of SVM
Model | Metric | Entropy | EIIP | Integer
ResNet | Sensitivity | 99.95 ± 0.79 | 98.62 ± 0.13 | 98.58 ± 1.38
ResNet | Specificity | 99.23 ± 1.21 | 98.83 ± 0.70 | 98.46 ± 0.55
ResNet | Precision | 99.23 ± 1.21 | 98.83 ± 0.68 | 98.46 ± 0.56
ResNet | F1 score | 99.09 ± 0.58 | 98.72 ± 0.54 | 98.52 ± 0.93
VGG19 | Sensitivity | 98.22 ± 0.12 | 97.94 ± 0.12 | 97.33 ± 1.36
VGG19 | Specificity | 98.10 ± 0.98 | 98.83 ± 0.79 | 98.30 ± 0.98
VGG19 | Precision | 98.10 ± 0.98 | 98.82 ± 0.78 | 98.29 ± 0.97
VGG19 | F1 score | 98.16 ± 0.76 | 98.37 ± 0.72 | 97.80 ± 0.74

The best classification performance, 99.09%, was obtained with the Entropy-based technique and the ResNet model. ResNet is deeper than VGG19, although a deeper model does not always give a better result. VGG19 produces a larger feature vector than ResNet (4096 vs. 2048 dimensions), yet it did not give a better result. This is because a deep learning model's performance depends on the size of the data, the distinguishing features of the data, the quality of the data and the parameters of the model.

Nonparametric statistical significance tests were also applied to the results. First, the data were checked against the homogeneity-of-variance assumption. Fig. 7 shows the extreme values of Levene's test and Fig. 8 its descriptive results. The test showed that the variances were not homogeneous, so nonparametric tests were required. Because more than two groups (the three mapping techniques) were compared, the Kruskal–Wallis test was applied. Fig. 9 shows its results.

A confusion matrix is a table layout that visualizes the performance of an algorithm, typically a supervised learning one. It is an effective tool for measuring classification performance. Figure 10 shows the confusion matrices of the SVM classifier for ResNet. The Entropy-based technique has better TN and TP values than the other techniques. These results show that, with the Entropy-based technique, the ResNet architecture extracts features more successfully than the other model.

Fig. 11 shows the Receiver Operating Characteristic (ROC) curves of the two best-performing techniques for identifying the T2D gene. The AUC is the area under the ROC curve. Its maximum value is 1, and an AUC close to 1 indicates correct classification. The maximum AUC value, 99.09%, was obtained with the ResNet model and the Entropy-based technique. The classification results of the proposed method are promising and suggest that it has high applicability in practice.

The proposed approach performed well in modeling T2D. Its model that extracts features with ResNet and classifies them with SVM achieved an average accuracy of 99.09%. This result indicates high performance in modeling T2D risk variants.

Some studies address the identification of T2D from DNA sequences. They have used methods such as deep learning [13–15], statistical analysis [18] and machine learning [19–21]. Table 4 compares the performance of our method with that of methods proposed in the literature for the classification of T2D DNA sequences. The proposed framework performs better than the existing techniques. The proposed study differs in two aspects from the other studies in the literature about the
identification of T2D-associated DNA sequences. The first is the use of the Entropy-based mapping technique, a new method for digitizing DNA sequences. The second is the conversion of the digitized signals into spectrograms, that is, into a 2-dimensional space, followed by feature extraction with pre-trained CNN models. The proposed study performed well compared with other studies in the literature and achieved an average accuracy of 99.09 ± 0.59%.

It is difficult to understand the complexity of the T2D-associated gene and to detect disease-causing variants without a laboratory environment. The proposed approach offers a low-cost system with a satisfactory accuracy rate. It is therefore a successful and effective alternative solution to this problem. In addition, pre-trained CNN models can extract features effectively from small datasets because they have been pre-trained on large image datasets. The study had two limitations: the size of the gene sequences used, and the number of spectrogram images, which was not enough to provide a training set for designing a new CNN model.
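The kernel trick from Section 3.4.1 can be checked numerically. Take the degree-2 polynomial kernel k(x, z) = (x · z)² on two-dimensional inputs. The explicit map φ(x) = (x₁², √2·x₁x₂, x₂²) satisfies φ(x) · φ(z) = k(x, z). An SVM that only evaluates k therefore separates the data with a hyperplane in the three-dimensional feature space without ever computing φ. The excerpt does not say which kernel the study's SVM used. The polynomial kernel and the function names are illustrative assumptions. `decision` is the standard kernel-SVM decision function, not code from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit 3-D feature map whose inner product equals poly2_kernel
    for 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def decision(support, alphas, labels, bias, x, kernel=poly2_kernel):
    """Kernel SVM decision value sum_i alpha_i * y_i * k(s_i, x) + b;
    its sign is the predicted class (+1 or -1)."""
    return sum(a * y * kernel(s, x)
               for s, a, y in zip(support, alphas, labels)) + bias


if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(poly2_kernel(x, z))      # kernel value, computed in 2-D
    print(dot(phi(x), phi(z)))     # same value (up to rounding) via the 3-D map
```

The two printed numbers agree up to floating-point rounding, which is the point: the classifier only ever needs kernel values, however large the implicit feature space is.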
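Phases 4 and 5 (Sections 3.4.2 and 3.5) can be illustrated end to end in plain Python. This is a hedged sketch, not the author's code. It assumes the deep features are already available as lists of floats, with label 1 for T2D and 0 for normal. k-NN uses Euclidean distance with majority voting, and folds come from a seeded shuffle. The choice k = 3 and all helper names are hypothetical. Table 3 reports specificity and F1, but the excerpt does not give their formulas, so the sketch uses their standard definitions. It also reports a population standard deviation, because the excerpt does not say which form the ± values use.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def metrics(y_true, y_pred):
    """Percentages from the confusion counts (positive class 1 = T2D)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sens = 100 * tp / (tp + fn) if tp + fn else 0.0
    spec = 100 * tn / (tn + fp) if tn + fp else 0.0
    prec = 100 * tp / (tp + fp) if tp + fp else 0.0
    acc = 100 * (tp + tn) / len(pairs)
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec,
            "accuracy": acc, "f1": f1}


def k_fold_indices(n, folds=5, seed=0):
    """Shuffle 0..n-1 and split it into `folds` nearly equal parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def cross_validate(x, y, folds=5, k=3):
    """Each part is the test set once (about 20% of the data); the rest trains.
    Returns the mean and population standard deviation of fold accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(x), folds):
        held_out = set(test_idx)
        tr_x = [x[i] for i in range(len(x)) if i not in held_out]
        tr_y = [y[i] for i in range(len(x)) if i not in held_out]
        preds = [knn_predict(tr_x, tr_y, x[i], k) for i in test_idx]
        scores.append(metrics([y[i] for i in test_idx], preds)["accuracy"])
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean, std
```

This pure-Python version costs O(n²) distance computations per fold and only makes the steps explicit. For 4940 feature vectors of 2048 or 4096 dimensions, an optimized library such as scikit-learn (`KNeighborsClassifier`, `StratifiedKFold`) would be the natural choice.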
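The Kruskal–Wallis test used in Section 4 compares the three mapping techniques through the ranks of the pooled observations. Below is a minimal sketch of its H statistic. It gives tied values their average rank and omits the tie-correction factor that statistical packages usually apply; `scipy.stats.kruskal`, for example, applies the correction and also returns the p-value. The inputs could be the per-fold accuracies of each technique, which the excerpt does not list. The function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the run of ties
        mean_rank = (i + j) / 2 + 1                  # positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction.

    R_j is the rank sum of group j in the pooled sample and n_j its size.
    """
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    weighted = sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


if __name__ == "__main__":
    print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Under the null hypothesis that all groups come from the same distribution, H approximately follows a χ² distribution with (number of groups − 1) degrees of freedom. The p-value comes from that distribution; the sketch stops at H.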
Fig. 10 Confusion matrices of all three techniques (Entropy, EIIP, Integer, respectively) for ResNet model in SVM
Table 4 Comparison of methods used for T2D DNA sequence classification
Authors | Methods | Dataset | Performance
ResNet and classified with SVM. It also contributes to GWAS on the genetic development of T2D.

Funding There is no funding source for this article.

Declarations

Conflict of interest The author declares that there are no known competing financial interests or personal relationships that could appear to influence the work reported in this paper.

References

1. Ho DSW, Schierding W, Wake M, Saffery R, O’Sullivan J (2019) Machine learning SNP based prediction for precision medicine. Front Genet. https://doi.org/10.3389/fgene.2019.00267
2. Imani M, Ghoreishi SF (2020) Optimal finite-horizon perturbation policy for inference of gene regulatory networks. IEEE Intell Syst. https://doi.org/10.1109/MIS.2020.3017155
3. Guariguata L, Whiting DR, Hambleton I, Beagley J, Linnenkamp U, Shaw JE (2014) Global estimates of diabetes prevalence for 2013 and projections for 2035. Diabetes Res Clin Pract 103:137–149
4. Arikoglu H, Kaya DE (2015) Tip 2 diyabetin moleküler genetik temeli; Son gelişmeler [Molecular genetic basis of type 2 diabetes: recent developments]. Genel Tıp Dergisi 25:147–159
5. Defronzo RA, Ferrannini E, Groop L, Henry RR, Herman WH, Holst JJ et al (2015) Type 2 diabetes mellitus. Nat Rev Dis Primers 1:15019. https://doi.org/10.1038/nrdp.2015.19
6. Morris AP (2018) Progress in defining the genetic contribution to type 2 diabetes susceptibility. Curr Opin Genet Dev 50:41–51
7. Das KW, Elbein SC (2006) The Genetic basis of type 2 diabetes. Cell Sci 2:100–131
8. Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW et al (2018) Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. https://doi.org/10.1038/s41588-018-0241-6
9. Vinuela A, Varshney A, van de Bunt M, Prasad RB, Asplund OB, Bennett A et al (2019) Influence of genetic variants on gene expression in human pancreatic islets-implications for type 2 diabetes. BioRxiv. https://doi.org/10.1101/655670
10. Varshney A, Scott LJ, Welch RP, Erdos MR, Chines PS, Narisu N et al (2017) Genetic regulatory signatures underlying islet gene expression and type 2 diabetes. Proc Natl Acad Sci 114:2301–2306. https://doi.org/10.1073/pnas.162119214
11. Kleinberger JW, Pollin TI (2015) Personalized medicine in diabetes mellitus: current opportunities and future prospects. Ann N Y Acad Sci 1346:45–56. https://doi.org/10.1111/nyas.12757
12. Awotunde JB et al (2021) Chapter Nine: Prediction and classification of diabetes mellitus using genomic data. In: Sangaiah AK, Mukhopadhyay S (eds) Intelligent IoT systems in personalized health care. Academic Press, pp 235–292
13. Abdulaimma B, Fergus P, Chalmers C, Montañez C (2020) Deep learning and genome-wide association studies for the classification of type 2 diabetes. In: 2020 international joint conference on neural networks (IJCNN), July, pp 1–8. https://doi.org/10.1109/IJCNN48605.2020.9206999
14. Rai V et al (2020) Single-cell ATAC-Seq in human pancreatic islets and deep learning upscaling of rare cells reveals cell-specific type 2 diabetes regulatory signatures. Mol Metab 32:109–121. https://doi.org/10.1016/j.molmet.2019.12.006
15. Mattis KK, Gloyn LA (2020) From Genetic association to molecular mechanisms for Islet-cell dysfunction in type 2 diabetes. J Mol Biol 432:1551–1578. https://doi.org/10.1016/j.jmb.2019.12.045
16. Wang K, Zhou W, Meng P, Wang P, Zhou C, Yao Y, Wu S, Wang Y, Zhao J, Zou D, Jin G (2019) Immune-related somatic mutation genes are enriched in PDAGs with diabetes. Transl Oncol 12(9):1147–1154
17. Kumar A, JeyaSundaraSharmila D, Singh S (2017) SVMRFE based approach for prediction of most discriminatory gene target for type II diabetes. Genom Data 12:28–37. https://doi.org/10.1016/j.gdata.2017.02.008
18. Lalrohlui F, Zohmingthanga J, Hruaii V, Kumar NS (2020) Genomic profiling of mitochondrial DNA reveals novel complex gene mutations in familial type 2 diabetes mellitus individuals from Mizo ethnic population, Northeast India. Mitochondrion. https://doi.org/10.1016/j.mito.2019.12.001
19. Liang F et al (2020) Insulin-resistance and depression cohort data mining to identify nutraceutical related DNA methylation biomarker for type 2 diabetes. Genes Dis. https://doi.org/10.1016/j.gendis.2020.01.013
20. Cai L, Wu H, Li D, Zhou K, Zou F (2015) Type 2 diabetes biomarkers of human gut microbiota selected via iterative sure independent screening method. PLoS ONE. https://doi.org/10.1371/journal.pone.0140827
21. Malik S, Khadgawat R, Anand S et al (2016) Non-invasive detection of fasting blood glucose level via electrochemical measurement of saliva. Springerplus 5:701. https://doi.org/10.1186/s40064-016-2339-6
22. Nilamyani N, Lawi A, Thamrin SA (2018) A preliminary study on identifying probable biomarker of type 2 diabetes using recursive feature extraction. In: 2018 2nd East Indonesia conference on computer and information technology (EIConCIT), pp 267–270. https://doi.org/10.1109/EIConCIT.2018.8878565
23. Liu ZY, Ding XP, Bian HJ (2008) Comparisons of properties of tandem repeats associated with between diabetes genes and non-diabetes disease genes. In: 2nd international conference on bioinformatics and biomedical engineering, iCBBE 2008, pp 436–440. https://doi.org/10.1109/ICBBE.2008.107
24. Reddy SS, Sethi N, Rajender R, Mahesh G (2020) Extensive analysis of machine learning algorithms to early detection of diabetic retinopathy. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.10.894
25. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I (2017) Machine learning and data mining methods in diabetes research. Comput Struct Biotechnol J 15:104–116. https://doi.org/10.1016/j.csbj.2016.12.005
26. Sikder N, Masud M, Bairagi AK, Arif ASM, Nahid A-A, Alhumyani HA (2021) Severity classification of diabetic retinopathy using an ensemble learning algorithm through analyzing retinal images. Symmetry 13:670
27. Islam MT, Raihan M, Aktar N, Alam MS, Ema RR, Islam T (2020) Diabetes mellitus prediction using different ensemble machine learning approaches. In: 2020 11th international conference on computing, communication and networking technologies (ICCCNT), pp 1–7
28. Islam MT, Raihan M, Farzana F, Aktar N, Ghosh P, Kabiraj S (2020) Typical and non-typical diabetes disease prediction using random forest algorithm. In: 2020 11th International conference on computing, communication and networking technologies (ICCCNT), pp 1–6
29. Ensembl Genbank. Available: https://www.ensembl.org/index.html. Accessed 04 Apr 2020
30. Das B, Turkoglu I (2018) A novel numerical mapping method based on entropy for digitizing DNA sequences. Neural Comput Appl 29:207–215. https://doi.org/10.1007/s00521-017-2871-5
31. Daş B (2018) Development of new approaches based on signal processing for disease diagnosis from DNA sequences. PhD Thesis, Fırat University
32. Grandhi DG, Kumar CV (2007) 2-Simplex mapping for identifying the protein coding regions in DNA. In: TENCON 2007-2007 IEEE reg. 10 conf., pp 1–3. IEEE
33. Chakraborty S, Gupta V (2016) DWT Based cancer identification using EIIP. In: 2016 second international conference on computational intelligence communication technology (CICT), pp 718–723. https://doi.org/10.1109/CICT.2016.148
34. Akhtar M, Epps J, Ambikairajah E (2007) On DNA numerical representations for period-3 based exon prediction. In: 2007 IEEE international workshop on genomic signal processing and statistics, pp 1–4. IEEE
35. Cristea PD (2002) Conversion of nucleotides sequences into genomic signals. J Cell Mol Med 6:279–303. https://doi.org/10.1111/j.1582-4934.2002.tb00196.x
36. Cristea PD (2005) Representation and Analysis of DNA sequences. Genomic signal processing and statistics. Eurasip B Ser Signal Process Commun 15–66
37. Yosinski J, Clune Y, Lipson BH (2014) How transferable are features in deep neural networks? Adv Neural Inf Process Syst. http://arxiv.org/abs/1411.1792
38. Ozcan T, Basturk A (2019) Transfer learning-based convolutional neural networks with heuristic optimization for hand gesture recognition. Neural Comput Appl 31:8955–8970. https://doi.org/10.1007/s00521-019-04427-y
39. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), pp 818–833
40. Ullah I, Hussain M, Qazi E-H, Aboalsamh H (2018) An automated system for epilepsy detection using EEG brain signals based on deep learning approach. Expert Syst Appl 107:61–71. https://doi.org/10.1016/j.eswa.2018.04.021
41. Gopalakrishnan K, Khaitan SK, Choudhary A, Agrawal A (2017) Deep Convolutional Neural Networks with transfer learning for computer vision-based data-driven pavement distress detection. Constr Build Mater 157:322–330. https://doi.org/10.1016/j.conbuildmat.2017.09.110
42. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 [cs]
43. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition
44. Reddy N, Rattani A, Derakhshani R (2018) Comparison of deep learning models for biometric-based mobile user authentication. In: 2018 IEEE 9th international conference on biometrics theory, applications and systems (BTAS), pp 1–6. https://doi.org/10.1109/BTAS.2018.8698586
45. Chen Z, Cen J, Xiong J (2020) Rolling bearing fault diagnosis using time-frequency analysis and deep transfer convolutional neural network. IEEE Access 8:150248–150261. https://doi.org/10.1109/ACCESS.2020.3016888
46. Dilmen E, Beyhan S (2017) A novel online LS-SVM approach for regression and classification. IFAC-PapersOnLine 50(1):8642–8647. https://doi.org/10.1016/j.ifacol.2017.08.1521
47. Khairandish MO, Sharma M, Jain V, Chatterjee JM, Jhanjhi NZ (2021) A Hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM. https://doi.org/10.1016/j.irbm.2021.06.003
48. Baby Saral G, Priya R (2021) Digital screen addiction with KNN and Logistic regression classification. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.11.360
49. Wang Y, Pan Z, Dong J. A new two-layer nearest neighbor selection method for kNN classifier. ScienceDirect. https://www.sciencedirect.com/science/article/pii/S0950705121008662. Accessed 07 Feb 2022

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.