Deep Learning in Bioinformatics
Seonwoo Min, Byunghan Lee and Sungroh Yoon
doi: 10.1093/bib/bbw068
Advance Access Publication Date: 25 July 2016
Paper
Abstract
In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.
Key words: deep learning; neural network; machine learning; bioinformatics; omics; biomedical imaging; biomedical signal processing.
Seonwoo Min is an M.S./Ph.D. candidate at the Department of Electrical and Computer Engineering, Seoul National University, Korea. His research areas include high-performance bioinformatics, machine learning for biomedical big data, and deep learning.

Byunghan Lee is a Ph.D. candidate at the Department of Electrical and Computer Engineering, Seoul National University, Korea. His research areas include high-performance bioinformatics, machine learning for biomedical big data, and data mining.

Sungroh Yoon is an associate professor at the Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea. He received his Ph.D. and postdoctoral training from Stanford University, Stanford, USA. His research interests include machine learning and deep learning for bioinformatics, and high-performance bioinformatics.

Submitted: 20 March 2016; Received (in revised form): 16 June 2016

© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

Introduction

In the era of 'big data,' transformation of large quantities of data into valuable knowledge has become increasingly important in various domains [1], and bioinformatics is no exception. Significant amounts of biomedical data, including omics, image and signal data, have been accumulated, and the resulting potential for applications in biological and healthcare research has caught the attention of both industry and academia. For instance, IBM developed Watson for Oncology, a platform analyzing patients' medical information and assisting clinicians with treatment options [2, 3]. In addition, Google DeepMind, having achieved great success with AlphaGo in the game of Go, recently launched DeepMind Health to develop effective healthcare technologies [4, 5].

To extract knowledge from big data in bioinformatics, machine learning has been a widely used and successful methodology. Machine learning algorithms use training data to uncover underlying patterns, build models, and make predictions based on the best-fit model. Indeed, some well-known algorithms (i.e. support vector machines, random forests, hidden Markov models, Bayesian networks, Gaussian networks) have been applied in genomics, proteomics, systems biology and numerous other domains [6].

The proper performance of conventional machine learning algorithms relies heavily on data representations called features [7]. However, features are typically designed by human engineers with extensive domain expertise, and identifying which features are more appropriate for the given task remains difficult. Deep learning, a branch of machine learning, has recently emerged based on big data, the power of parallel and distributed computing, and sophisticated algorithms. Deep learning has overcome previous limitations, and academic interest has increased rapidly since the early 2000s (Figure 1). Furthermore, deep learning is responsible for major advances in diverse fields
where the artificial intelligence (AI) community has struggled for many years [8]. One of the most important advancements thus far has been in image and speech recognition [9–15], although promising results have been disseminated in natural language processing [16, 17] and language translation [18, 19]. Certainly, bioinformatics can also benefit from deep learning (Figure 2): splice junctions can be discovered from DNA sequences, finger joints can be recognized from X-ray images, lapses can be detected from electroencephalography (EEG) signals, and so on.

For learning data-driven features, representation learning, particularly deep learning, has shown great promise. Representation learning can discover effective features as well as their mappings from data for given tasks. Furthermore, deep learning can learn complex features by combining simpler features learned from data. In other words, with artificial neural networks of multiple non-linear layers, referred to as deep learning architectures, hierarchical representations of data can be discovered with increasing levels of abstraction [25].
Previous reviews have addressed machine learning in bioinformatics [6, 20] and the fundamentals of deep learning [7, 8, 21]. In addition, recently published reviews by Leung et al. [22], Mamoshina et al. [23], and Greenspan et al. [24] have discussed deep learning in bioinformatics.

Figure 1. Approximate number of published deep learning articles by year. The number of articles is based on the search results on http://www.scopus.com with the two queries: 'Deep learning' and 'Deep learning' AND 'bio*'.

Figure 3. Relationships and high-level schematics of artificial intelligence, machine learning, representation learning, and deep learning [7].

Key elements of deep learning

The successes of deep learning are built on a foundation of significant algorithmic details.
Deep learning combines simpler features into complex features so that the most suitable hierarchical representations can be learned from data. A single cycle of the optimization process is organized as follows [8]. First, given a training dataset, the forward pass sequentially computes the output in each layer and propagates the function signals forward through the network. In the final output layer, an objective loss function measures the error between the inferenced outputs and the given labels. To minimize the training error, the backward pass uses the chain rule to backpropagate error signals and compute gradients with respect to all weights throughout the neural network [46]. Finally, the weight parameters are updated using optimization algorithms based on stochastic gradient descent (SGD) [47]. Whereas batch gradient descent performs parameter updates over each complete dataset, SGD performs the updates over small sets of data examples.
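To make this cycle concrete, the following is a minimal NumPy sketch of one forward pass, backward pass and SGD update for a small two-layer network; the layer sizes, loss function and data are our illustrative choices, not from the paper.

```python
# A minimal sketch of one SGD training cycle for a two-layer network:
# forward pass, loss, backward pass via the chain rule, mini-batch update.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (8, 4)), np.zeros(4)   # layer 1 parameters
W2, b2 = rng.normal(0, 0.1, (4, 1)), np.zeros(1)   # layer 2 parameters

def train_step(X, y, lr=0.01):
    """One forward/backward/update cycle on a mini-batch (SGD)."""
    global W1, b1, W2, b2
    # Forward pass: propagate function signals layer by layer.
    h = np.tanh(X @ W1 + b1)                    # hidden activations
    y_hat = 1 / (1 + np.exp(-(h @ W2 + b2)))    # sigmoid output
    loss = np.mean((y_hat - y) ** 2)            # objective loss (MSE here)
    # Backward pass: chain rule gives gradients w.r.t. all weights.
    n = X.shape[0]
    d_out = 2 * (y_hat - y) * y_hat * (1 - y_hat) / n
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)         # backprop through tanh
    dW1, db1 = X.T @ d_h, d_h.sum(0)
    # SGD update on this mini-batch (vs. batch GD over the full dataset).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss

X = rng.normal(size=(32, 8))                    # a mini-batch of 32 examples
y = (X.sum(1, keepdims=True) > 0).astype(float)
print(train_step(X, y))
```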
Table 1. Abbreviations in alphabetical order

AE: Auto-encoder
AI: Artificial intelligence
AUC: Area under the receiver operating characteristic curve
AUC-PR: Area under the precision–recall curve
BRNN: Bidirectional recurrent neural network
CAE: Convolutional auto-encoder
CNN: Convolutional neural network
DBN: Deep belief network
DNN: Deep neural network
DST-NN: Deep spatio-temporal neural network
ECG: Electrocardiography

Neon [58] shows an advantage in processing speed. C++-based Caffe [59] and Lua-based Torch [60] offer great advantages in terms of pre-trained models and functional extensibility, respectively. Python-based Theano [61, 62] provides a low-level library to define and optimize mathematical expressions; moreover, numerous higher-level wrappers such as Keras [63], Lasagne [64] and Blocks [65] have been developed on top of Theano to provide more intuitive interfaces. Google recently released the C++-based TensorFlow [66] with a Python interface. This library currently shows limited performance but is undergoing continuous improvement, as heterogeneous distributed computing is now supported. In addition, TensorFlow can also take advantage of Keras, which provides an additional model-level interface.

Note: processing speed comparisons are based on the averaged processing times for AlexNet [33] with a batch size of 256 on a single GPU [57]; Caffe, Neon, Theano and Torch were utilized with cuDNN v.3, while TensorFlow was utilized with cuDNN v.2.
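As an illustration of the model-level interface mentioned above, here is a minimal sketch in the style of the Keras [63] API; the layer sizes, optimizer and training call are placeholders of our choosing, not a prescribed configuration.

```python
# Illustrative sketch of a model-level interface in the style of Keras;
# all sizes and hyperparameters are placeholders.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=100))  # hidden layer
model.add(Dense(1, activation='sigmoid'))               # binary output
model.compile(optimizer='sgd', loss='binary_crossentropy',
              metrics=['accuracy'])
# model.fit(X_train, y_train, epochs=10, batch_size=32)  # training call
```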
Research coverage by deep learning architecture and bioinformatics domain:

Deep neural networks
  Omics: Protein structure [84–87]; Gene expression regulation [93–98]; Protein classification [108]; Anomaly classification [111]
  Biomedical imaging: Anomaly classification [122–124]; Segmentation [133]; Recognition [142, 143]; Brain decoding [149, 150]
  Biomedical signal processing: Brain decoding [158–163]; Anomaly classification [171–175]

Convolutional neural networks
  Omics: Gene expression regulation [99–104]
  Biomedical imaging: Anomaly classification [125–132]; Segmentation [134–140]; Recognition [144–147]
  Biomedical signal processing: Brain decoding [164–167]; Anomaly classification [176]

Emergent architectures
  Omics: Protein structure [91, 92]
  Biomedical imaging: Segmentation [141]
  Biomedical signal processing: Brain decoding [169, 170]
Figure 6. Basic structure of CNNs consisting of a convolution layer, a non-linear layer and a pooling layer [32]. The convolution layer of CNNs uses multiple learned filters to obtain multiple filter maps detecting low-level features, and then the pooling layer combines them into higher-level features.
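To make the convolution, non-linearity and pooling pipeline concrete, here is a minimal NumPy sketch of a single pass over a one-dimensional input; the filter weights and signal values are illustrative.

```python
# Sketch of one convolution + non-linearity + pooling pass (1-D input).
import numpy as np

x = np.array([0.1, 0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.2])  # 1-D input signal
w = np.array([0.5, 1.0, 0.5])                            # one learned filter

# Convolution layer: slide the filter over the input to get a filter map.
conv = np.array([x[i:i + 3] @ w for i in range(len(x) - 2)])
relu = np.maximum(conv, 0)        # non-linear layer (ReLU)
# Pooling layer: max over non-overlapping windows combines local detections.
pool = relu[:len(relu) // 2 * 2].reshape(-1, 2).max(axis=1)
print(conv, relu, pool, sep='\n')
```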
The three key ideas of CNNs can be applied not only in a one-dimensional grid to discover meaningful recurring patterns with small variance, such as genomic sequence motifs, but also in two-dimensional grids, such as interactions within omics data and in time–frequency matrices of biomedical signals. Thus, we believe that the popularity and promise of CNNs in bioinformatics applications will continue in the years ahead.

Recurrent neural networks

RNNs, which are designed to utilize sequential information, have a basic structure with a cyclic connection (Figure 7). Since input data are processed sequentially, recurrent computation is performed in the hidden units where the cyclic connection exists. Therefore, past information is implicitly stored in the hidden units, called state vectors, and output for the current input is computed considering all previous inputs using these state vectors [8]. Since there are many cases where both past and future inputs affect the output for the current input (e.g. in speech recognition), bidirectional recurrent neural networks (BRNNs) [70] have also been designed and used widely (Figure 8).

Although RNNs do not seem as deep as DNNs or CNNs in terms of the number of layers, they can be regarded as an even deeper structure if unrolled in time (Figure 7). Therefore, for a long time, researchers struggled against vanishing gradient problems while training RNNs, and learning long-term dependencies among data was difficult [35]. Fortunately, substituting the simple perceptron hidden units with more complex units such as LSTM [36, 37] or GRU [19], which function as memory cells, significantly helps to prevent the problem. More recently, RNNs have been used successfully in many areas including natural language processing [16, 17] and language translation [18, 19].

Even though RNNs have been explored less than DNNs and CNNs, they still provide very powerful analysis methods for sequential information. Since omics data and biomedical signals are typically sequential and often considered languages of nature, the capabilities of RNNs for mapping a variable-length input sequence to another sequence or a fixed-size prediction are promising for bioinformatics research. With regard to biomedical imaging, RNNs are currently not the first choice of many researchers. Nevertheless, we believe that dissemination of dynamic CT and MRI [71, 72] would lead to the incorporation of RNNs and CNNs and elevate their importance in the long term. Furthermore, we expect that their successes in natural language processing will lead RNNs to be applied in biomedical text analysis [73] and that employing an attention mechanism [74–77] will improve performance and extract more relevant information from bioinformatics data.
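As a brief illustration of the memory-cell units described above, the following hedged sketch defines an LSTM over one-hot encoded sequence input in the style of Keras [63]; the sequence length, alphabet size and unit counts are our assumptions.

```python
# Sketch: an LSTM mapping a variable-content sequence to a fixed-size
# prediction; sizes are illustrative (e.g. one-hot DNA of length 100).
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(100, 4)))   # memory-cell hidden units
model.add(Dense(1, activation='sigmoid'))   # fixed-size prediction
model.compile(optimizer='adam', loss='binary_crossentropy')
```

Wrapping the recurrent layer in a bidirectional wrapper would give the BRNN variant discussed above, which also consults future inputs.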
Figure 7. Basic structure of RNNs with an input unit x, a hidden unit h and an output unit y [8]. A cyclic connection exists so that the computation in the hidden unit receives inputs from the hidden unit at the previous time step and from the input unit at the current time step. The recurrent computation can be expressed more explicitly if the RNNs are unrolled in time. The index of each symbol represents the time step. In this way, ht receives input from xt and ht−1.

Figure 8. Basic structure of BRNNs unrolled in time [70]. There are two hidden units, the forward unit h→t and the backward unit h←t, for each time step. h→t receives input from xt and h→(t−1) to reflect past information; h←t receives input from xt and h←(t+1) to reflect future information. The information from both hidden units is propagated to yt.

Emergent architectures

Emergent architectures refer to deep learning architectures besides DNNs, CNNs and RNNs. In this review, we introduce three emergent architectures (i.e. DST-NNs, MD-RNNs and CAEs) and their applications in bioinformatics.

DST-NNs [38] are designed to learn multi-dimensional output targets through progressive refinement. The basic structure of DST-NNs consists of multi-dimensional hidden layers (Figure 9). The key aspect of the structure, progressive refinement, considers local correlations and is performed via input feature compositions in each layer: spatial features and temporal features. Spatial features refer to the original inputs for the whole DST-NN and are used identically in every layer. However, temporal features are gradually altered so as to progress to the upper layers. Except for the first layer, to compute each hidden unit in the current layer, only the adjacent hidden units of the same coordinate in the layer below are used so that local correlations are reflected progressively.

Figure 9. Basic structure of DST-NNs [38]. The notation h^k_{i,j} represents the hidden unit at the (i, j) coordinate of the kth hidden layer. To conduct the progressive refinement, the neighborhood units of h^k_{i,j} and the input units x are used in the computation of h^{k+1}_{i,j}.

MD-RNNs [39] are designed to apply the capabilities of RNNs to non-sequential multi-dimensional data by treating them as groups of sequential data. For instance, two-dimensional data are treated as groups of horizontal and vertical sequence data. Similar to BRNNs, which use contexts in both directions in one-dimensional data, MD-RNNs use contexts in all possible directions in the multi-dimensional data (Figure 10). In the example of a two-dimensional dataset, four contexts that vary with the order of data processing are reflected in the computation of four hidden units for each position in the hidden layer. The hidden units are connected to a single output layer, and the final results are computed with consideration of all possible contexts.

CAEs [40, 41] are designed to utilize the advantages of both AEs and CNNs so that they can learn good hierarchical representations of data reflecting spatial information and be well regularized by unsupervised training (Figure 11). In training of AEs, reconstruction error is minimized using an encoder and a decoder, which extract feature vectors from input data and recreate the data from the feature vectors, respectively. In CNNs, convolution and pooling layers can be regarded as a type of encoder. Therefore, the CNN encoder and a decoder consisting of deconvolution and unpooling layers are integrated to form a CAE, which is trained in the same manner as an AE.

Figure 11. Basic structure of CAEs consisting of a convolution layer and a pooling layer working as an encoder, and a deconvolution layer and an unpooling layer working as a decoder [41]. The basic idea is similar to the AE, which learns hierarchical representations through reconstructing its input data, but the CAE additionally utilizes spatial information by integrating convolutions.

Deep learning is a rapidly growing research area, and a plethora of new deep learning architectures is being proposed but awaits wide applications in bioinformatics. Newly proposed architectures have different advantages from existing architectures, so we expect them to produce promising results in various research areas. For example, the progressive refinement of DST-NNs fits the dynamic folding process of proteins and can be effectively utilized in protein structure prediction [38]; the capabilities of MD-RNNs are suitable for segmentation of biomedical images, since segmentation requires interpretation of local and global contexts; the unsupervised representation learning with consideration of spatial information in CAEs can provide great advantages in discovering recurring patterns in limited and imbalanced bioinformatics data.

Omics

In omics research, genetic information such as genome, transcriptome and proteome data is used to approach problems in bioinformatics. Some of the most common input data in omics are raw biological sequences (i.e. DNA, RNA, amino acid sequences), which have become relatively affordable and easy to obtain with next-generation sequencing technology. In addition, features extracted from sequences, such as position-specific scoring matrices (PSSM) [78], physicochemical properties [79, 80], Atchley factors [81] and one-dimensional structural properties [82, 83], are often used as inputs for deep learning algorithms to alleviate difficulties from complex biological data and improve results. In addition, protein contact maps, which present distances of amino acid pairs in their three-dimensional structure, and microarray gene expression data are also used according to the characteristics of interest. We categorized the topics of interest in omics into four groups (Table 4). One of the most researched problems is protein structure prediction, which aims to predict the secondary structure or contact map of a protein [84–92]. Gene expression regulation [93–107], including splice junctions or RNA-binding proteins, and protein classification [108–110], including super family or subcellular localization, are also actively investigated. Furthermore, anomaly classification [111] approaches have been used with omics data to detect cancer.
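Since raw sequences must be converted into numerical form before being fed to deep learning algorithms, the following minimal sketch (our illustration, not from the paper) one-hot encodes a DNA string, the typical input format for the sequence-based methods discussed in this section.

```python
# Sketch: encode a raw DNA sequence as a one-hot matrix for model input.
import numpy as np

def one_hot_dna(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix (A, C, G, T)."""
    index = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    mat = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        if base in index:            # unknown bases (e.g. N) stay all-zero
            mat[i, index[base]] = 1.0
    return mat

print(one_hot_dna("ACGTN"))
```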
Table 4. Deep learning applied bioinformatics research avenues and input data (omics)

Input data:
- Sequencing data (DNA-seq, RNA-seq, ChIP-seq, DNase-seq)
- Features from genomic sequence: position-specific scoring matrix (PSSM), physicochemical properties (steric parameter, volume), Atchley factors (FAC), 1-dimensional structural properties
- Contact map (distance of amino acid pairs in 3D structure)
- Microarray gene expression

Research avenues:
- Protein structure prediction [84–92]: 1-dimensional structural properties; contact map; structure model quality assessment
- Gene expression regulation [93–107]: splice junction; genetic variants affecting splicing; sequence specificity
- Protein classification [108–110]: super family; subcellular localization
Deep neural networks

DNNs have been widely applied in protein structure prediction [84–87] research. Since complete prediction in three-dimensional space is complex and challenging, several studies have used simpler approaches, such as predicting the secondary structure or torsion angles of a protein. For instance, Heffernan et al. [85] applied SAE to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle and accessible surface area. In another study, Spencer et al. [86] applied DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. DNNs have also shown great capabilities in the area of gene expression regulation [93–98]. For example, Lee et al. [94] utilized DBN in splice junction prediction, a major research avenue in understanding gene expression [112], and proposed a new DBN training method called boosted contrastive divergence for imbalanced data and a new regularization term for sparsity of DNA sequences; their work showed not only significantly improved performance but also the ability to detect subtle non-canonical splicing signals. Moreover, Chen et al. [96] applied MLP to both microarray and RNA-seq expression data to infer expression of up to 21 000 target genes from only 1000 landmark genes. In terms of protein classification, Asgari et al. [108] adopted the skip-gram model, a widely known method in natural language processing that can be considered a variant of MLP, and showed that it could effectively learn a distributed representation of biological sequences with general use for many omics applications, including protein family classification. For anomaly classification, Fakoor et al. [111] used principal component analysis (PCA) [113] to reduce the dimensionality of microarray gene expression data and applied SAE to classify various cancers, including acute myeloid leukemia, breast cancer and ovarian cancer.

Convolutional neural networks

Relatively few studies have used CNNs to solve problems involving biological sequences, specifically gene expression regulation problems [99–104]; nevertheless, those have introduced the strong advantages of CNNs, showing their great promise for future research. First, an initial convolution layer can powerfully capture local sequence patterns and can be considered a motif detector for which PSSMs are solely learned from data instead of hard-coded. The depth of CNNs enables learning more complex patterns: they can capture longer motifs, integrate cumulative effects of observed motifs, and eventually learn sophisticated regulatory codes [114]. Moreover, CNNs are suited to exploit the benefits of multitask joint learning. By training CNNs to simultaneously predict closely related factors, features with predictive strengths are more efficiently learned and shared across different tasks. For example, as an early approach, Denas et al. [99] preprocessed ChIP-seq data into a two-dimensional matrix with the rows as transcription factor activity profiles for each gene and exploited a two-dimensional CNN similar to its use in image processing. Recently, more studies have focused on directly using one-dimensional CNNs with biological sequence data. Alipanahi et al. [100] and Kelley et al. [103] proposed CNN-based approaches for transcription factor binding site prediction and 164 cell-specific DNA accessibility multitask prediction, respectively; both groups presented downstream applications for disease-associated genetic variant identification. Furthermore, Zeng et al. [102] performed a systematic exploration of CNN architectures for transcription factor-binding site prediction and showed that the number of convolutional filters is more important than the number of layers for motif-based tasks. Zhou et al. [104] developed a CNN-based algorithmic framework, DeepSEA, that performs multitask joint learning of chromatin factors (i.e. transcription factor binding, DNase I sensitivity, histone-mark profile) and prioritizes expression quantitative trait loci and disease-associated genetic variants based on the predictions.
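As a sketch of the design pattern described in this subsection, a motif-detecting first convolution layer followed by pooling and a multitask output can be written in the style of Keras as follows; all sizes are illustrative and are not those of the cited models (DeepBind, Basset or DeepSEA).

```python
# Hedged sketch: 1-D CNN over one-hot DNA with a multitask output head.
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

model = Sequential()
# 16 filters of width 8: each filter plays the role of a PSSM-like motif
# detector whose weights are learned from data instead of hard-coded.
model.add(Conv1D(16, 8, activation='relu', input_shape=(200, 4)))
model.add(GlobalMaxPooling1D())          # strongest motif match per filter
# Multitask joint learning: one sigmoid unit per related prediction task.
model.add(Dense(10, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
```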
An et al. [159] applied DBN to the frequency components of EEG signals to classify left- and right-hand motor imagery skills. Moreover, Jia et al. [161] and Jirayucharoensak et al. [163] used DBN and SAE, respectively, for emotion classification. In anomaly classification [171–175], Huanhuan et al. [171] published one of the few studies applying DBN to ECG signals and classified each beat into either a normal or abnormal beat. A few studies have used raw EEG signals. Wulsin et al. [172] analyzed individual second-long waveform abnormalities using DBN with both raw EEG signals and extracted features as inputs, whereas Zhao et al. [174] used only raw EEG signals as inputs for DBN to diagnose Alzheimer's disease.

A few assessment metrics have been used to clearly observe how limited and imbalanced data might compromise the performance of deep learning [181]. While accuracy often gives misleading results, the F-measure, the harmonic mean of precision and recall, provides more insightful performance scores. To measure performance over different class distributions, the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUC-PR) are commonly used. These two measures are strongly correlated, such that a curve dominates in one measure if and only if it dominates in the other. Nevertheless, in contrast with AUC-PR, AUC might present a more optimistic view of performance, since false positive rates in the receiver operating characteristic curve fail to capture large changes in false positives if classes are highly imbalanced.
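The metrics discussed above can be computed with scikit-learn as in the following sketch; the labels and scores are toy values for illustration.

```python
# Sketch: F-measure, AUC and AUC-PR on an imbalanced toy dataset.
from sklearn.metrics import f1_score, roc_auc_score, average_precision_score

y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]      # imbalanced labels
y_score = [0.1, 0.2, 0.1, 0.3, 0.4, 0.2, 0.6, 0.5, 0.7, 0.4]
y_pred  = [int(s >= 0.5) for s in y_score]

print(f1_score(y_true, y_pred))                  # F-measure
print(roc_auc_score(y_true, y_score))            # AUC (ROC)
print(average_precision_score(y_true, y_score))  # AUC-PR approximation
```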
Interpretation

Although deep learning often achieves impressive results, the models are frequently criticized as black-boxes: we know very little about how such results are derived internally. In bioinformatics, particularly in biomedical domains, it is not enough to simply produce good outcomes. Since many studies are connected to patients' health, it is crucial to change the black-box into the white-box providing logical reasoning, just as clinicians do for medical treatments.

Transformation of deep learning from the black-box into the white-box is still in the early stages. One of the most widely used approaches is interpretation through visualizing a trained deep learning model. In terms of image input, a deconvolutional network has been proposed to reconstruct and visualize hierarchical representations for a specific input of CNNs [190]. In addition, to visualize a generalized class representative image rather than being dependent on a particular input, gradient-based visualization approaches have been proposed.

Hyperparameter optimization

Machine learning research that aims to automatically optimize hyperparameters is growing constantly [196]. A few algorithms have been proposed, including sequential model-based global optimization [197], Bayesian optimization with Gaussian process priors [198] and random search approaches [199].

Multimodal deep learning

Multimodal deep learning [200], which exploits information from multiple input sources, is a promising avenue for the future of deep learning research. In particular, bioinformatics is expected to benefit greatly, as it is a field where various types of data can be assimilated naturally [201]. For example, omics data, images, signals, drug responses and electronic medical records can all serve as input sources.
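As a concrete illustration of the random search strategy [199] discussed under hyperparameter optimization above, the following minimal sketch samples configurations from an assumed search space; the space, trial budget and scoring function are placeholders, not a recommended setup.

```python
# Sketch: random search over an assumed hyperparameter space.
import random

space = {
    'learning_rate': lambda: 10 ** random.uniform(-4, -1),
    'hidden_units':  lambda: random.choice([32, 64, 128, 256]),
    'dropout':       lambda: random.uniform(0.0, 0.5),
}

def validation_score(config):
    # Placeholder: train a model with `config`, return validation accuracy.
    return random.random()

best, best_score = None, -1.0
for _ in range(20):                      # fixed budget of 20 random trials
    config = {k: draw() for k, draw in space.items()}
    score = validation_score(config)
    if score > best_score:
        best, best_score = config, score
print(best, best_score)
```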
Adversarial examples, which degrade performance with small human-imperceptible perturbations, have received increased attention from the machine learning community [219, 220]. Since adversarial training of neural networks can result in regularization to provide higher performance, we expect additional studies in this area, including those involving generative adversarial networks [221] and manifold regularized networks [222].

In terms of learning methodology, semi-supervised learning and reinforcement learning are also receiving attention. Semi-supervised learning exploits both unlabeled and labeled data, and a few algorithms have been proposed. For example, ladder networks [223] add skip connections to MLP or CNNs and simultaneously minimize the sum of supervised and unsupervised cost functions to denoise representations at every level of the model.

Key Points

• We review deep learning in bioinformatics, categorizing research both by bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and by deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures).
• Furthermore, we discuss the theoretical and practical issues plaguing the applications of deep learning in bioinformatics, including imbalanced data, interpretation, hyperparameter optimization, multimodal deep learning, and training acceleration.
• As a comprehensive review of existing works, we believe that this paper will provide valuable insight and serve as a launching point for researchers to apply deep learning approaches in their bioinformatics studies.
References

11. Tompson JJ, Jain A, LeCun Y, et al. Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in Neural Information Processing Systems, 2014, 1799–807.
12. Liu N, Han J, Zhang D, et al. Predicting eye fixations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 362–70.
13. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 2012;29(6):82–97.
14. Sainath TN, Mohamed A-R, Kingsbury B, et al. Deep convolutional neural networks for LVCSR. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, 8614–8. IEEE, New York.
32. Lawrence S, Giles CL, Tsoi AC, et al. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw 1997;8(1):98–113.
33. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, 1097–105.
34. Williams RJ, Zipser D. A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1989;1(2):270–80.
35. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 1994;5(2):157–66.
36. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
55. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv Preprint arXiv:1502.03167, 2015.
56. Deeplearning4j Development Team. Deeplearning4j: open-source distributed deep learning for the JVM. Apache Software Foundation License 2.0. http://deeplearning4j.org, 2016.
57. Bahrampour S, Ramakrishnan N, Schott L, et al. Comparative study of deep learning software frameworks. arXiv Preprint arXiv:1511.06435, 2015.
58. Nervana Systems. Neon. https://github.com/NervanaSystems/neon, 2016.
59. Jia Y. Caffe: an open source convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia, 2014.
77. Mnih V, Heess N, Graves A. Recurrent models of visual attention. In: Advances in Neural Information Processing Systems, 2014, 2204–12.
78. Jones DT. Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 1999;292(2):195–202.
79. Ponomarenko JV, Ponomarenko MP, Frolov AS, et al. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 1999;15(7):654–68.
80. Cai Y-D, Lin SL. Support vector machines for predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochim Biophys Acta (BBA) – Proteins Proteomics 2003;1648(1):127–33.
81. Atchley WR, Zhao J, Fernandes AD, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci USA 2005;102(18):6395–400.
97. Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. bioRxiv 2016;041616.
98. Liu F, Ren C, Li H, et al. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics 2015;btv643.
99. Denas O, Taylor J. Deep modeling of gene expression regulation in an erythropoiesis model. In: International Conference on Machine Learning Workshop on Representation Learning. Atlanta, Georgia, USA, 2013.
100. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33(8):825–6.
101. Lanchantin J, Singh R, Lin Z, et al. Deep motif: visualizing genomic sequence classifications. arXiv Preprint, 2016.
120. Bailey DL, Townsend DW, Valk PE, et al. Positron Emission Tomography. Springer, London, 2005.
121. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.
122. Plis SM, Hjelm DR, Salakhutdinov R, et al. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
123. Hua K-L, Hsu C-H, Hidayati SC, et al. Computer-aided classification of lung nodules on computed tomography images via deep learning technique. Onco Targets Ther 2015;8:2015–22.
124. Suk H-I, Shen D. Deep learning-based feature representation for AD/MCI classification. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, New York, 2013, 583–90.
138. Prasoon A, Petersen K, Igel C, et al. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. Springer, Heidelberg, 2013, 246–53.
139. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. arXiv Preprint arXiv:1505.03540, 2015.
140. Roth HR, Lu L, Farag A, et al. DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, Heidelberg, 2015, 556–64.
141. Stollenga MF, Byeon W, Liwicki M, et al. Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. In: Advances in Neural Information Processing Systems, 2015.
158. Freudenburg ZV, Ramsey NF, Wronkeiwicz M, et al. Real-time naive learning of neural correlates in ECoG electrophysiology. Int J Mach Learn Comput 2011.
159. An X, Kuang D, Guo X, et al. A deep learning method for classification of EEG data based on motor imagery. In: Intelligent Computing in Bioinformatics. Springer, Heidelberg, 2014, 203–10.
160. Li K, Li X, Zhang Y, et al. Affective state recognition from EEG with deep belief networks. In: 2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2013, 305–10. IEEE, New York.
161. Jia X, Li K, Li X, et al. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. In: 2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014. IEEE, New York.
175. Längkvist M, Karlsson L, Loutfi A. Sleep stage classification using unsupervised feature learning. Adv Artif Neural Syst 2012;2012:5.
176. Mirowski P, Madhavan D, LeCun Y, et al. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol 2009;120(11):1927–40.
177. Petrosian A, Prokhorov D, Homan R, et al. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing 2000;30(1):201–18.
178. Davidson PR, Jones RD, Peiris MT. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng 2007;54(5):832–9.
179. Oh S, Lee MS, Zhang B-T. Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Trans Comput Biol Bioinform 2011;8(2):316–25.
197. Hutter F, Hoos HH, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Learning and Intelligent Optimization. Springer, Berlin, 2011, 507–23.
198. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 2012, 2951–9.
199. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(1):281–305.
200. Ngiam J, Khosla A, Kim M, et al. Multimodal deep learning. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, 689–96.
201. Cao Y, Steffey S, He J, et al. Medical image retrieval: a multimodal approach. Cancer Inform 2014;13(Suppl 3):125.
219. Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. arXiv Preprint arXiv:1312.6199, 2013.
220. Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. arXiv Preprint arXiv:1412.6572, 2014.
221. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014, 2672–80.
222. Lee T, Choi M, Yoon S. Manifold regularized deep neural networks using adversarial examples. arXiv Preprint arXiv:1511.06381, 2015.
223. Rasmus A, Berglund M, Honkala M, et al. Semi-supervised learning with ladder networks. In: Advances in Neural Information Processing Systems, 2015, 3532–40.
224. Arel I. Deep reinforcement learning as foundation for artificial general intelligence. In: Theoretical Foundations of Artificial General Intelligence. Springer, Berlin, 2012, 89–102.
225. Cutler M, How JP. Efficient reinforcement learning for robots using informative simulated priors. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, 2605–12. IEEE, New York.