Survey on deep learning in DNA/RNA motif mining
https://doi.org/10.1093/bib/bbaa229
Method Review
Abstract
DNA/RNA motif mining is the foundation of gene function research. It plays an extremely important role in identifying
DNA- or RNA-protein binding sites, which helps to elucidate the mechanisms of gene regulation. For the past few decades,
researchers have been designing new, efficient and accurate algorithms for motif mining. These algorithms can be
roughly divided into two categories: enumeration approaches and probabilistic methods. In recent years, machine
learning methods have made great progress, and deep learning algorithms in particular have achieved good performance.
Existing deep learning methods in motif mining can be roughly divided into three types of models: convolutional neural
network (CNN) based models, recurrent neural network (RNN) based models and hybrid CNN–RNN based models. We introduce
the application of deep learning in the field of motif mining in terms of data preprocessing and the features of existing
deep learning architectures, and we compare the differences between the basic deep learning models. Through this analysis
and comparison, we found that more complex models tend to perform better than simple ones when data are sufficient, and
that current methods are relatively simple compared with those in other fields such as computer vision, natural language
processing (NLP) and computer games. Therefore, it is necessary to summarize deep learning in motif mining to help
researchers understand this field.
Key words: motif mining; deep learning; protein binding site; recurrent neural networks; convolutional neural network
Ying He is pursuing a Ph.D. degree in computer science and technology at Tongji University, China. His research interests include bioinformatics, machine
learning and deep learning.
Zhen Shen is pursuing a Ph.D. degree in computer science and technology at Tongji University, China. His research interests include bioinformatics,
machine learning and deep learning.
Qinhu Zhang received a Ph.D. degree in computer science and technology at Tongji University, China, in 2019. He is currently working at Tongji University
as a postdoctoral researcher. His research interests include bioinformatics, machine learning and deep learning.
Siguo Wang is working toward the Ph.D. degree in computer science and technology, Tongji University, China. Her research interests include bioinformatics,
machine learning and deep learning.
De-Shuang Huang is a chaired professor at Tongji University. At present, he is the Director of the Institute of Machine Learning and Systems Biology, Tongji
University. Dr. Huang is an IAPR Fellow and a senior member of the IEEE. His current research interests include bioinformatics, pattern recognition
and machine learning.
Submitted: 18 July 2020; Received (in revised form): 19 August 2020
© The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/
licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact [email protected]
2 He et al.
of motif research, various motif mining algorithms emerge [9]. Early motif mining methods are mainly divided into two principal types: the enumeration approach and the probabilistic method [10].
The first class is based on simple word enumeration. The Yeast Motif Finder (YMF) algorithm, developed by Sinha et al. [11], used a consensus representation to detect short motifs with a small number of degenerate positions in the yeast genome. YMF is mainly divided into two steps: the first step enumerates all motifs in the search space, and the second step calculates the z-score of all motifs to find the greatest one. Bailey proposed the discriminative regular expression motif elicitation algorithm, which calculated the significance of motifs using Fisher's exact test [12].
To accelerate the running speed of word enumeration-based motif mining methods, some special techniques were used, such as suffix trees and parallel processing [13]. Besides, motif mining algorithms such as LMMO [14], DirectFS [9], ABC [15], DiscMLA [16], CisFinder [12], Weeder [17], Fmotif [18] and MCES [19] all used this idea in the model.
In probabilistic-based motif mining methods, a probabilistic model that needs only a few parameters is constructed [20]. These methods provide a distribution of bases for each site in the binding region to determine whether a motif exists or not [21]. These methods usually build the distribution as a position-specific scoring matrix (PSSM/PWM) or motif matrix [22]. A PWM is an m-by-n matrix (m represents the length of a specific protein binding site, and n represents the number of nucleotide base types), which is used to indicate the degree of preference of a specific protein binding motif at each position [23]. Just as Figure 1 shows, a PWM can intuitively express the binding preference of a specific protein with few parameters, so if a set of specific protein binding site data is given, the parameters of the PWM can be learned from these binding site data. Some methods are based on PWM approaches, such as MEME [11], STEME [24], EXTREME [25], AlignACE [26] and BioProspector [27].
ChIP-seq and high-throughput sequencing have tremendously increased the amount of in vivo data available [28], which makes it possible to study motif mining by deep learning [29]. In bioinformatics, although deep learning methods are not yet numerous, they are on the rise [30]. Known applications include DNA methylation [31, 32], protein classification [33–35], splicing regulation and gene expression [36–38] and biological image analysis tasks [39–42]. Of particular relevance to our work is the development of applications for motif mining, such as DNA-/RNA-protein binding sites [43], chromatin accessibility [36, 44–46], enhancers [47–49] and DNA shape [50, 51].
DeepBind [43] is the first study to apply deep learning in motif mining. Just as Figure 2 shows, DeepBind attempted to describe the method by CNN and predict DNA-protein/RNA-protein binding sites in a way that machine learning or genomics researchers can easily understand. It treated a genome sequence window as a picture. Unlike an image composed of pixels with three color channels (R, G, B), it treated the genomic sequence as a fixed-length sequence window composed of four channels (A, C, G, T) or (A, C, G, U). Therefore, the problem of DNA-protein binding site prediction is similar to the problem of binary classification of pictures.
After this, a series of studies on deep learning in motif mining appeared. Some researchers focused on the impact of various parameters in deep learning, such as the number of layers, on motif mining [52]. Some researchers made more attempts at deep learning frameworks, adding a long short-term memory (LSTM) layer to DeepBind and obtaining a new model combining CNN and RNN for motif mining [53]. Besides, there are methods such as iDeepS that combine CNN and RNN to target specific RNA-binding proteins (RBPs) [54]. The advantage of the combined model of RNN and CNN is that the newly added RNN layer can capture the long-term dependency between sequence features by learning the features extracted by the CNN layer, improving the accuracy of prediction. Other researchers used a pure RNN-based method: the KEGRU method [55] created an internal state of the network by using a k-mer representation and embedding layer, and it captures long-term dependencies by combining with a layer of bidirectional gated recurrent units (bi-GRUs). Besides, many researchers have done much work based on the three basic models, for example, Xiaoyong Pan [56], Qinhu Zhang [51, 57], Wenxuan Xu [58], Dailun Wang [59] and Wenbo Yu [60].
Although there are currently many deep learning methods in motif mining, these methods are relatively primitive and simple compared with the deep learning methods in computer vision and NLP, such as the image [61, 62], video [63] and question answering [64] fields. Therefore, it is necessary to summarize motif mining through deep learning to help researchers better understand the field. In this paper, we introduce the basic biological background knowledge about motif mining, provide insights into the differences between the basic deep learning models, CNN and RNN, and discuss some new trends in the development of deep learning. This article hopes to help researchers who do not have basic deep learning or biology background knowledge to quickly understand motif mining.
The remainder of this paper is organized as follows: the second section describes the basic biological background knowledge, several common databases and the basic knowledge of motifs. Then, the third section describes different models of deep learning algorithms for DNA/RNA motif mining. Finally, we further discuss some new developments and challenges in motif mining with deep learning and possible future directions in the fourth section.
Basic Knowledge of Motif
In this section, we introduce some basic knowledge of motif mining. Motif mining (or motif discovery) in biological sequences can be defined as the problem of finding a set of short, similar, conserved sequence elements ('motifs') with common biological functions [65]. Motif mining has been one of the most widely studied problems in bioinformatics, such as transcription factor binding site (TFBS) discovery, because of its high biological and bioinformatics significance [66, 67].
Figure 3 shows how multiple sequences are recognized by the same transcription factor (CREB). Their 'consensus' means that each position has its own nucleic acid that is preferred by the transcription factor. Since transcription factor binding can tolerate approximate matches, all oligos that differ from the consensus sequence by up to a maximum number of nucleotide substitutions can be considered as valid instances of the same TFBS.
After understanding the basic concept of a motif, we introduce common databases and data preprocessing methods. The commonly used motif mining databases are as follows: the TCGA database [68], the NCBI database [69] and the ENCODE database [70]. Generally speaking, the two data preprocessing methods are the following, as shown in Figure 4, bottom left.
The simple method is to use one-hot encoding. One-hot encoding is often used for indicating the state of a state machine [71]. For example, using one-hot codes to encode DNA sequences
Survey on deep learning in DNA/RNA motif mining 3
Figure 1. The process of generating the PSSM, position frequency matrix (PFM) and logo of SPI1 [104]. The process is as follows. First, generate a PFM based on the number of times each type of nucleotide appears in each position of the alignment. Then, convert the PFM into a logarithmic-scale PSSM/PWM. By adding the corresponding nucleotide values of the PSSM, the score of any DNA sequence window with the same length as the matrix can be calculated, and the matrix can be drawn as a logo map.
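The caption's three steps (count a PFM, convert it to a log-scale PSSM/PWM, score a window by summing per-position values) can be sketched in a few lines of Python. The aligned binding sites, the pseudocount of 1 and the uniform 0.25 background below are illustrative assumptions, not the actual SPI1 data.

```python
import math

# Hypothetical aligned binding sites (illustrative, not real SPI1 data).
sites = ["ACGTG", "ACGTT", "AAGTG", "ACGAG", "ACGTG", "TCGTG"]
bases = "ACGT"
m = len(sites[0])

# Step 1: position frequency matrix (PFM) -- counts per base per position.
pfm = {b: [sum(s[i] == b for s in sites) for i in range(m)] for b in bases}

# Step 2: convert to a log-odds PWM/PSSM with a pseudocount of 1,
# assuming a uniform 0.25 background distribution.
n = len(sites)
pwm = {b: [math.log2((pfm[b][i] + 1) / (n + 4) / 0.25) for i in range(m)]
       for b in bases}

# Step 3: score any window of the same length by summing per-position values.
def score(window):
    return sum(pwm[base][i] for i, base in enumerate(window))

# A consensus-like window outscores a dissimilar one.
print(score("ACGTG") > score("TTTAA"))  # True
```

This mirrors how PWM-based tools evaluate every fixed-length window along a sequence: the score is additive across positions, which is what makes scanning long sequences cheap.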
as binary vectors: A = (1,0,0,0), G = (0,1,0,0), C = (0,0,1,0) and T = (0,0,0,1). RNA sequences can also be encoded similarly by simply changing T to U. One-hot encoding is easy to design and modify, and it is easy to detect illegal states. However, it is sparse and context-free.
Another method is to label with k-mers and vectorize by embedding [44]. For example, we can tokenize the DNA sequence 'ATCGCGTACGATCCG' as different k-mers, as shown in Table 1. Different k-mers can be vectorized using the embedding methods widely used in the NLP field [72], such as word2vec [73]. RNA sequences can be represented similarly.
Deep Learning in Motif Mining
In recent years, deep learning has achieved great success in various application scenarios, which makes researchers try to apply it to DNA or RNA motif mining. Next, we introduce these models in detail. There are three main types of deep learning frameworks in motif mining: CNN-based models (Figure 4, left), RNN-based models (Figure 4, center) and hybrid CNN–RNN-based models (Figure 4, right). We summarize several classic deep learning methods in motif mining, as shown in Table 2.
DeepBind [43] is the first attempt to use CNN to predict DNA or RNA motifs from original DNA or RNA sequences. DeepBind used a single CNN layer, which consists of one convolutional layer, followed by rectification and pooling operations, and one fully connected network (FCN) augmented at the end to transform feature vectors into a scalar binding score. It also opened up a precedent for deep learning in motif mining and provides a basic framework for other deep learning methods. It mapped each base to four channels, similar to the RGB channels of a color image, and used one-hot encoding to complete vectorization. Many subsequent methods use this to build their models.
DeepSEA [38] was a deep learning method based on CNN, which used three convolution layers with 320, 480 and 960 kernels, respectively. Higher-level convolutional layers receive input from a larger spatial range and can represent more complex features. DeepSEA added an FCN layer on top of the third convolutional layer, in which all neurons receive input from all outputs of the previous layer so that the information of the entire sequence can be completely captured. The convolution step of the DeepSEA model consisted of three convolutional layers and two max pooling layers, and the motif was learned in alternating order.
DeepSNR [74] was a deep learning method based on CNN. The convolution part of the DeepSNR model had the same structure as the DeepBind network, but DeepSNR added a deconvolution network that is a mirrored version of the convolution network, which reduces the size of the activations and then enlarges them through combinations of unpooling and deconvolution operations.
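In miniature, the pipeline shared by these CNN models (convolution over a one-hot window, rectification, max pooling, then a fully connected scoring layer) looks like the following sketch. The GATA filter and the dense weights are toy assumptions, not trained parameters from any of the models above.

```python
# Minimal sketch of the CNN pipeline: convolution -> rectification (ReLU)
# -> global max pooling -> fully connected scalar score.
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

def scan(seq, motif_filter):
    """Convolve one filter (length k, 4 channels) across a one-hot sequence."""
    x = [ONE_HOT[b] for b in seq]
    k = len(motif_filter)
    acts = []
    for i in range(len(x) - k + 1):
        a = sum(w * c for row, wrow in zip(x[i:i + k], motif_filter)
                for c, w in zip(row, wrow))
        acts.append(max(a, 0.0))       # rectification (ReLU)
    return max(acts)                   # global max pooling

# A toy filter that "prefers" the motif GATA: weight 1 on the matching channel.
gata_filter = [ONE_HOT[b] for b in "GATA"]

dense_weight, dense_bias = 2.0, -3.0   # toy fully connected scoring layer
def binding_score(seq):
    return dense_weight * scan(seq, gata_filter) + dense_bias

print(binding_score("CCGATACC") > binding_score("CCCCCCCC"))  # True
```

In a real model the filter and dense weights are learned by backpropagation, and there are many filters in parallel, one per candidate motif.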
Figure 2. The parallel training process of DeepBind [43]. (A) The DeepBind model processes five independent sequences in parallel. The data first pass through the convolutional layer to extract features, then pass through the pooling layer to optimize the features. Finally, the features go through the activation function to output the prediction result, which is compared with the target to calculate the loss and update the weights to improve prediction accuracy. (B) It is shown in detail that the dataset is divided into a validation set, training set and test set, which are used to calculate the validation AUC (area under the curve), training AUC and test AUC, respectively, to select the optimal parameters.
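The AUC used throughout these comparisons can be computed directly from labels and prediction scores via the rank-sum identity. A minimal stdlib-only sketch with toy scores (not actual DeepBind outputs):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability that a random positive outscores a random negative,
    counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: positives mostly, but not always, ranked above negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(auc(labels, scores))  # 0.888... (8 of 9 positive-negative pairs correct)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the figures report AUC distributions rather than raw accuracy on these heavily imbalanced binding-site datasets.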
Table 1. It shows how the DNA sequence 'ATCGCGTACGATCCG' is cut into multiple different k-mers and their vectors when the k-mer length is (3,4,5,4,4) and the window is (3,4,5,2,3).
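The two preprocessing schemes, per-base one-hot encoding and k-mer tokenization with a chosen length and window (stride), can be sketched as follows; the learned embedding step itself (e.g. word2vec) is omitted here.

```python
seq = "ATCGCGTACGATCCG"

# Scheme 1: one-hot encoding, one 4-channel vector per base
# (same mapping as in the text: A, G, C, T).
ONE_HOT = {"A": (1, 0, 0, 0), "G": (0, 1, 0, 0), "C": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
one_hot = [ONE_HOT[b] for b in seq]

# Scheme 2: k-mer tokenization with length k and window (stride) s;
# each token would then be mapped to a dense vector by an embedding layer.
def kmers(sequence, k, stride):
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

print(one_hot[0])            # (1, 0, 0, 0)  -- 'A'
print(kmers(seq, 3, 3))      # ['ATC', 'GCG', 'TAC', 'GAT', 'CCG']
print(kmers(seq, 5, 2)[:3])  # ['ATCGC', 'CGCGT', 'CGTAC']
```

A stride smaller than k yields overlapping tokens, which preserves more local context at the cost of a longer token sequence.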
Table 2. It shows the architecture, embedding and input of eight classic deep learning models in motif mining.
Architecture: CNN, CNN, CNN, CNN, CNN + RNN, CNN + RNN, RNN, CNN + RNN
Embedding: NO, NO, NO, NO, NO, NO, YES, NO
Input: One-hot, One-hot, One-hot, One-hot, One-hot, k-mer, k-mer, One-hot
Dilated [75] was a deep learning method based on a dilated multilayer CNN. This method learns the mapping from the nucleotide sequence of a DNA region to the positions of regulatory markers in that region. Dilated convolutions can capture a hierarchical representation of an input space that is larger than that of standard convolutions, so that they can be scaled to longer surrounding sequences.
DanQ [53] used a single-layer CNN followed by a bidirectional LSTM (BLSTM). The first layer of the DanQ model aimed to scan the positions of motifs in the sequence through convolution filtering. The convolution step of the DanQ model was much simpler than DeepSEA's. It contained a convolutional layer and a max pooling layer to learn the motif. After the max pooling layer was the BLSTM layer. Motifs can follow the
Figure 4. Sequence representation of motif mining [78]. It shows two data preprocessing methods (bottom left) and three architectures, including CNN-only (left), RNN-only (center) and hybrid CNN–RNN (right) models.
Recently, since adversarial training of neural networks can provide regularization and higher performance, this field has developed rapidly, including generative adversarial networks [89] and a series of related research such as Wasserstein GAN [90], MolGAN [91] and NetGAN [92]. In motif mining, GANs may be used to automatically generate negative examples instead of simple random generation or shuffling of the positive sequences. Besides, pretraining models [93] have achieved significant results in the NLP field, from word2vec [73, 94] to now BERT [95] and GPT [96]. In motif mining, pretraining can be used to enhance the robustness and generalization ability of models. The great success of AlphaGo [97] has set off an unprecedented change in the Go world, and it has made deep reinforcement learning familiar to the public. In particular, AlphaGo Zero does not require any history of human games and uses only deep reinforcement learning [98]. Its achievement, training from scratch within 3 days, has far exceeded the knowledge of Go that humans have accumulated over thousands of years. In motif mining, reinforcement learning may enable people to learn more motifs beyond human knowledge.
As we enter the era of big data, whether in academia or industry, deep learning is already a very important development direction. In bioinformatics, which has made great progress with traditional machine learning, deep learning is expected to produce encouraging results [99]. In this review, we conducted a comprehensive review of the application of deep learning in the field of motif mining. We hope that this review will help researchers understand this field and promote the application of motif mining in research.
Of course, we also need to recognize the limitations of deep learning methods and the promising directions of future research. Although deep learning is promising, it is not a panacea. In many applications of motif mining, there are still many potential challenges, including unbalanced or limited data, interpretation of deep learning results [71] and the choice of appropriate architecture and hyperparameters. For unbalanced or limited data, the common remedies are augmented datasets [48] or few-shot learning [100]. For interpretation of deep learning results, common methods are the interpretability of the model itself [101] or interpretation after prediction [71]. For
Figure 5. Comparison results of nine deep learning models [78]. It compares the performance of these models on DNA and RNA motif mining tasks. (A) The AUC distribution of the nine models on 83 ChIP-seq datasets. (B) P-value-annotated heat maps of pairwise comparisons of the nine models on 83 ChIP-seq datasets. (C) The AUC distribution of the nine models on 31 CLIP-seq datasets. (D) P-value-annotated heat maps of pairwise comparisons of the nine models on 31 CLIP-seq datasets.
Key Points
• Motif mining (or motif discovery) in biological sequences can be defined as the problem of finding a set of short, similar, conserved sequence elements ('motifs') with common biological functions. Motifs play a key role in regulating gene expression at both the transcriptional and posttranscriptional levels.
• In recent years, deep learning has achieved great success in various application scenarios, which makes researchers try to apply it to DNA or RNA motif mining. There are three main types of deep learning frameworks in motif mining: CNN-based models, RNN-based models and hybrid CNN–RNN-based models.
Acknowledgement
This work was supported by the grant of the National Key R&D Program of China (Nos. 2018AAA0100100 & 2018YFA0902600), partly supported by the National Natural Science Foundation of China (Grant nos. 61861146002, 61520106006, 61732012, 61932008, 61772370, 61672382, 61702371, 61532008, 61772357 and 61672203), the China Postdoctoral Science Foundation (Grant no. 2017M611619), the "BAGUI Scholar" Program and the Scientific & Technological Base and Talent Special Program, GuiKe AD18126015, of the Guangxi Zhuang Autonomous Region of China, and the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01), LCNBI and ZJLab.
Conference on Computer Vision and Pattern Recognition Workshops 2016, 77–85.
41. Mahmud M, Kaiser MS, Hussain A, et al. Applications of deep learning and reinforcement learning to biological data. IEEE Trans Neural Netw Learn Syst 2018;29:2063–79.
42. Affonso C, Rossi ALD, Vieira FHA, et al. Deep learning for biological image classification. Expert Syst Appl 2017;85:114–22.
43. Alipanahi B, Delong A, Weirauch MT, et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831–8.
44. Min X, Zeng W, Chen N, et al. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding. Bioinformatics 2017;33:i92–101.
45. Nair S, Kim DS, Perricone J, et al. Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts. Bioinformatics 2019;35:i108–16.
46. Liu Q, Xia F, Yin Q, et al. Chromatin accessibility prediction via a hybrid deep convolutional neural network. Bioinformatics 2018;34:732–8.
47. Kleftogiannis D, Kalnis P, Bajic VB. DEEP: a general computational framework for predicting enhancers. Nucleic Acids Res 2015;43:e6.
48. Cohn D, Zuk O, Kaplan T. Enhancer identification using transfer and adversarial deep learning of DNA sequences. BioRxiv 2018;264200.
49. Yang B, Liu F, Ren C, et al. BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 2017;33:1930–6.
50. Yang J, Ma A, Hoppe AD, et al. Prediction of regulatory motifs from human ChIP-sequencing data using a deep learning framework. Nucleic Acids Res 2019;47:7809–24.
51. Zhang Q, Shen Z, Huang D-S. Predicting in-vitro transcription factor binding sites using DNA sequence + shape. IEEE/ACM Trans Comput Biol Bioinform 2019.
52. Zhang S, Zhou J, Hu H, et al. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 2016;44:e32.
53. Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 2016;44:e107.
54. Pan X, Rijnbeek P, Yan J, et al. Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks. BMC Genomics 2018;19:511.
55. Shen Z, Bao W, Huang D-S. Recurrent neural network for predicting transcription factor binding sites. Sci Rep 2018;8:1–10.
56. Pan X, Shen H-B. Predicting RNA–protein binding sites and motifs through combining local and global deep convolutional neural networks. Bioinformatics 2018;34:3427–36.
57. Zhang Q, Zhu L, Huang D-S. High-order convolutional neural network architecture for predicting DNA-protein binding sites. IEEE/ACM Trans Comput Biol Bioinform 2018;16:1184–92.
58. Xu W, Zhu L, Huang D-S. DCDE: an efficient deep convolutional divergence encoding method for human promoter recognition. IEEE Trans Nanobioscience 2019;18:136–45.
59. Wang D, Zhang Q, Yuan C-A, et al. Motif discovery via convolutional networks with K-mer embedding. In: International Conference on Intelligent Computing. Berlin: Springer, 2019, 374–82.
60. Yu W, Yuan C-A, Qin X, et al. Hierarchical attention network for predicting DNA-protein binding sites. In: International Conference on Intelligent Computing. Berlin: Springer, 2019, 366–73.
61. Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning. 2015, 2048–57.
62. Tang P, Wang H, Kwong S. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing 2017;225:188–97.
63. Yao L, Torabi A, Cho K, et al. Describing videos by exploiting temporal structure. In: Proceedings of the IEEE International Conference on Computer Vision. 2015, 4507–15.
64. Noh H, Hongsuck Seo P, Han B. Image question answering using convolutional neural network with dynamic parameter prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, 30–8.
65. Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2013;14:225–37.
66. Pavesi G, Mauri G, Pesole G. In silico representation and discovery of transcription factor binding sites. Brief Bioinform 2004;5:217–36.
67. Sandve GK, Drabløs F. A survey of motif discovery methods in an integrated framework. Biol Direct 2006;1:1–16.
68. Tomczak K, Czerwińska P, Wiznerowicz M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemporary Oncol 2015;19:A68.
69. Sherry ST, Ward M-H, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.
70. Consortium EP. The ENCODE (ENCyclopedia of DNA Elements) project. Science 2004;306:636–40.
71. Lanchantin J, Singh R, Wang B, et al. Deep motif dashboard: visualizing and understanding genomic sequences using deep neural networks. In: Pacific Symposium on Biocomputing, Vol. 2017. Singapore: World Scientific, 2017, 254–65.
72. Koren S, Walenz BP, Berlin K, et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 2017;27:722–36.
73. Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722. 2014.
74. Salekin S, Zhang JM, Huang Y. A deep learning model for predicting transcription factor binding location at single nucleotide resolution. In: 2017 IEEE EMBS International Conference on Biomedical & Health Informatics. 2017, 57–60.
75. Gupta A, Rush AM. Dilated convolutions for modeling long-distance genomic dependencies. arXiv:1710.01278. 2017.
76. Visel A, Minovitsky S, Dubchak I, et al. VISTA Enhancer Browser—a database of tissue-specific human enhancers. Nucleic Acids Res 2007;35:D88–92.
77. Lipton ZC, Steinhardt J. Troubling trends in machine learning scholarship. arXiv:1807.03341. 2018.
78. Trabelsi A, Chaabane M, Ben-Hur A. Comprehensive evaluation of deep learning architectures for prediction of DNA/RNA sequence binding specificities. Bioinformatics 2019;35:i269–77.
79. Blin K, Dieterich C, Wurmus R, et al. DoRiNA 2.0—upgrading the doRiNA database of RNA interactions in post-transcriptional regulation. Nucleic Acids Res 2015;43:D160–7.
80. iCount. http://icount.biolab.si/.
81. Stražar M, Žitnik M, Zupan B, et al. Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins. Bioinformatics 2016;32:1527–35.
82. Cawley GC, Talbot NL. On over-fitting in model selection and subsequent selection bias in performance evaluation. J Mach Learn Res 2010;11:2079–107.
83. Hong Z, Zeng X, Wei L, et al. Identifying enhancer–promoter interactions with neural network based on pre-trained DNA vectors and attention mechanism. Bioinformatics 2020;36:1037–43.
84. Shen Z, Zhang Q, Kyungsook H, et al. A deep learning model for RNA-protein binding preference prediction based on hierarchical LSTM and attention network. IEEE/ACM Trans Comput Biol Bioinform 2020.
85. Shen Z, Deng S-P, Huang D-S. Capsule network for predicting RNA-protein binding preferences using hybrid feature. IEEE/ACM Trans Comput Biol Bioinform 2019.
86. Shen Z, Deng S-P, Huang D-S. RNA-protein binding sites prediction via multi-scale convolutional gated recurrent unit networks. IEEE/ACM Trans Comput Biol Bioinform 2019.
87. Zhang Q, Zhu L, Bao W, et al. Weakly-supervised convolutional neural network architecture for predicting protein-DNA binding. IEEE/ACM Trans Comput Biol Bioinform 2018.
88. Zhang Q, Shen Z, Huang D-S. Modeling in-vivo protein-DNA binding by combining multiple-instance learning with a hybrid deep neural network. Sci Rep 2019;9:1–12.
89. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems. 2014, 2672–80.
90. Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv:1701.07875. 2017.
91. De Cao N, Kipf T. MolGAN: an implicit generative model for small molecular graphs. arXiv:1805.11973. 2018.
92. Bojchevski A, Shchur O, Zügner D, et al. NetGAN: generating graphs via random walks. arXiv:1803.00816. 2018.
93. Mikolov T, Grave E, Bojanowski P, et al. Advances in pre-training distributed word representations. arXiv:1712.09405. 2017.
94. Rong X. word2vec parameter learning explained. arXiv:1411.2738. 2014.
95. Devlin J, Chang M-W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. 2018.
96. Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training. 2018.
97. Silver D, Hassabis D. AlphaGo: mastering the ancient game of Go with machine learning. Res Blog 2016;9. https://ai.googleblog.com/2016/01/alphago-mastering-ancient-game-of-go.html.
98. Silver D, Schrittwieser J, Simonyan K, et al. Mastering the game of Go without human knowledge. Nature 2017;550:354–9.
99. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2017;18:851–69.
100. Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems. Long Beach, CA, USA: NIPS Foundation, 2017, 4077–87.
101. Hu H-J, Wang H, Harrison R, et al. Understanding the prediction of transmembrane proteins by support vector machine using association rule mining. In: 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology. 2007, 418–25.
102. Snoek J, Larochelle H. Spearmint. https://github.com/JasperSnoek/spearmint. 2012.
103. Bergstra J, Yamins D, Cox DD. Hyperopt: a Python library for optimizing the hyperparameters of machine learning algorithms. In: Proceedings of the 12th Python in Science Conference. 2013, 20.
104. Worsley-Hunt R, Bernard V, Wasserman WW. Identification of cis-regulatory sequence variations in individual genome sequences. Genome Med 2011;3:65.
105. Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res 1985;13:3021.
106. Crooks GE, Hon G, Chandonia J-M, et al. WebLogo: a sequence logo generator. Genome Res 2004;14:1188–90.