DL Protein

This review discusses the application of deep learning techniques in mining protein data, highlighting their potential to transform complex protein big data into valuable insights. It categorizes various deep learning architectures used for residue-level, sequence-level, structural, interaction predictions, and mass spectrometry data analysis, while also addressing their advantages and limitations. The document emphasizes future challenges such as optimizing architectures for specific tasks and improving interpretability in deep learning for protein studies.

Uploaded by

Mohanan V.p
Copyright
© All Rights Reserved

Briefings in Bioinformatics, 00(00), 2019, 1–25

doi: 10.1093/bib/bbz156
Advance Access Publication Date:
Review Article

Downloaded from https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbz156/5681782 by guest on 23 December 2019


Deep learning for mining protein data
Qiang Shi, Weiya Chen, Siqi Huang, Yan Wang and Zhidong Xue
Corresponding author: Zhidong Xue, School of Software Engineering and College of Life Science Technology, Huazhong University of Science and
Technology, Wuhan, 430074, China. Tel.: +86 130 9889 6226 ; Fax: +86-27-87541114; E-mail: [email protected]

Abstract
The recent emergence of deep learning to characterize complex patterns of protein big data reveals its potential to address
the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a
powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions.
In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data.
The application architectures of these methods include multilayer perceptrons, stacked autoencoders, deep belief networks,
two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex
neural networks and are described from five perspectives: residue-level prediction, sequence-level prediction,
three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and
deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some
practical issues and their future directions are discussed, such as robust deep learning for protein noisy data, architecture
optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous
protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives
on general deep learning techniques for protein data analysis.

Key words: deep learning; protein big data; residue-level prediction; sequence-level prediction; 3D-structure prediction;
interaction prediction; protein mass spectrometry

Introduction

Deep learning has seen success in the fields of vision and speech recognition [1]. Since deep learning approaches can automatically learn the representations of data with multiple levels of abstraction, they impact almost every discipline of science and engineering, including the physical [2], chemical [3], medical [4], and biological sciences [5, 6]. Deep learning plays a particularly important role in knowledge discovery and practical solutions from biological/biomedical big data [7]. Recently, the applications of deep learning in bioinformatics [7], biomedicine [8–10], biomedical data [11, 12], drug discovery [13], and healthcare [14, 15] have been discussed in detail.

Protein data analysis is an important branch of bioinformatics, the computational methods of which have been greatly improved with the rapid growth of sequential and structural protein data and the continual development of deep learning technology. The richness of protein data provides a solid foundation for data-driven hypothesis generation and biological knowledge discovery. Deep learning can automatically extract nonlinear, intrinsic, abstract, and complex patterns from large-scale data

Qiang Shi is a postdoctoral fellow at the School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine
learning especially deep learning, protein data analysis, and big data mining.
Weiya Chen is an assistant professor at School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests
cover bioinformatics, virtual reality, and data visualization.
Siqi Huang is a master’s student of Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining.
Yan Wang is an associate professor at School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining.
Zhidong Xue is professor at School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover
bioinformatics, machine learning, and image processing.
Submitted: 16 August 2019; Received (in revised form): 21 October 2019

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]


without prior knowledge [16] and is suitable for the analysis of large-scale protein data [17]. Therefore, the analysis of protein solubility [18], secondary structures [19–23], sequence profiles [5], protein–protein interactions (PPIs) [24, 25], protein threading [26], protein design [27–29], posttranslational modifications [30], function annotation [31–34], and other applications [35–41] has benefited from deep learning.

According to the inputs for deep learning models, these approaches can be categorized into four groups: sequence, structure, interaction, and mass spectrometry (MS) data. As summarized in Table 1, sequence-based approaches can be classified into residue- and sequence-level categories. The architectures utilized by residue-level approaches include deep neural networks (DNNs), consisting of multilayer perceptrons (MLPs), deep belief networks (DBNs), and stacked autoencoders (SAEs) [42–59]; convolutional neural networks (CNNs) [18, 36, 60–83, 87–97]; recurrent neural networks (RNNs) [40, 41, 98–103]; and hybrid CNNs and RNNs [104–115]. Sequence-level approaches also use the architectures of DNNs [116–124], CNNs [31–34, 125–128], RNNs [32, 33, 129–133], and hybrids of CNNs and RNNs [134–140]. These architectures have already been applied to residue-level tasks such as secondary structure (SS) [53–56, 66–72, 104–108], backbone angles [41–45, 73, 74, 99], solvent accessibility (SA) [18, 40, 46, 76, 100, 101, 111], and posttranslational modifications (PTMs) [47–52, 90–97, 114]. They have also been applied to sequence-level tasks such as fold analysis [116, 125, 126, 128, 132], function prediction [31–34, 117–120, 140], subcellular localization [121, 122, 127, 134, 135], and remote homology detection [129–131]. Structure-based approaches utilize low-dimensional mapping [141–143], voxel-based [144–152], and graph neural network (GNN) [153–157] methods for model quality assessment [141, 152], pocket prediction [151, 157], binding site prediction [146–150], protein classification [144, 153, 154], amino acid environment analysis [145], and molecule interpretation [155, 156]. Deep models for interactions between proteins and other molecules utilize DNNs [24, 25, 158–167], CNNs [168–173], GNNs [174–177], and hybrids of CNNs and other models (RNNs, GNNs, or DNNs) [178–184] to analyze protein–RNA interactions [168, 169, 178], noncoding RNA (ncRNA)–protein interactions [158–160, 170, 179], compound–protein interactions [161–163, 171–173, 180–182], and PPIs [24, 25, 164–167, 177, 183, 184]. Additionally, DNNs [185–187], CNNs [188, 189], RNNs [190, 191], and hybrids of CNNs and RNNs [192, 193] are adopted for MS interpretation. In recent years, these approaches have achieved superior predictive performance and made great progress in big protein data analysis.

Considering that deep learning architectures play an important role in modeling proteins, in this review, we present various deep learning architectures for protein sequences, protein 3D structures, protein interactions, and MS data. After comparing several architectures with respect to one property, or several properties of one architecture, we present the advantages and disadvantages of deep learning architectures. The remaining challenges of deep learning on big protein data mainly lie in optimal feature analysis, robust deep learning, network architecture optimization, efficient deep learning with limited protein data, multimodal deep learning, and interpretable deep learning. We summarize deep learning techniques in protein data analysis and provide insights to facilitate the application of deep learning in protein studies.

Shallow features fed into deep learning

Although deep learning eliminates the need for manual dimensionality uniformization and for building complex features by simultaneous feature reconstruction and classifier training, shallow features directly extracted from raw data are still needed to represent protein data. This is because protein data must be converted to numerical vectors that the algorithm can process directly [40]. This is an important step for machine-learning methods, since effective mathematical representations can describe the intrinsic correlation with the corresponding structural and functional attributes [194–196]. Considering that MS techniques reflect protein structures [197], only some preprocessing, such as noise filtering, data normalization, and data transformation, is needed [198, 199]. Conversely, it is necessary to extract meaningful features from protein sequences and 3D structures to improve their representations. The descriptors can be categorized as residue-, sequence-, evolution-, and structure-related features (see details in Table 2).

Features of individual types are rarely used for residue- or sequence-level prediction; combinations of feature types are more often adopted. For instance, DeepGO [31], DeepGOPlus [34], DeepLoc [134], and SignalP 5.0 [110] adopt the vector of amino acid component (ACC) as input for deep learning models. ACC is usually encoded using one-hot encoding [109, 120, 123, 128]. Although various approaches, including n-gram and physicochemical property-based extraction methods, can be used to encode amino acid sequences, they need prior knowledge based on expert experience [115]. Because deep learning approaches can learn features automatically, natural number encoding can also be used [138]. Furthermore, various combinations of residue-, sequence-, evolution-, and structure-related features are used to predict SS [39, 54, 72, 106], gamma-turns [88], backbone angles [73, 99], SA [46, 100], and PTM sites [48, 52, 114]. Among these combinations, the position-specific scoring matrix (PSSM) is often combined with other types of features, and improved versions of it are also often used. For instance, in addition to PSSM, AUCpreD adopts a hidden Markov model (HMM) profile, which complements PSSM to some degree [63]. The deep convolutional neural field (DCNF) proposed by Wang [66] used not only the PSSM generated by three iterations but also the PSSM generated by five iterations as input features. Besides PSSM, the pseudo-Zernike moment (PZM) extracted from PSSM is also used [170, 229]; PZM is more robust and has less information redundancy. Sequence-related features that describe characteristics of whole sequences are rarely utilized, although DeepSol combines these features with other features for solubility prediction [18]. Residue-related features are used to represent sequences in other sequence-level predictors for target protein identification [123], protein function prediction [120], subcellular localization [134], and remote homology detection [129].

Voxel-based features are often utilized for 3D structure-based analysis [145, 146, 150, 151]. Procedures to extract voxel features mainly include local box extraction, local box featurization, and various channel combinations. Different atom types are used depending on the task. For instance, 14 atom types were adopted to calculate the representation of a voxel to predict ligand-binding pockets [150], while 8 and 4 atom types were used to represent voxels to predict binding pockets and amino acid environment similarity, respectively [145, 146, 151].

Specifically, for residue pair prediction at the PPI interface, three new geometric features are introduced to describe residue pairs: interior contact (IC) area, exterior contact (EC) area with other residues, and exterior void (EV) area [166]. These features are helpful for understanding PPI mechanisms and for guiding biological experiments.
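A minimal Python sketch of three sequence encodings named in this section: one-hot encoding, natural-number encoding, and amino acid composition (AAC, Table 2). The alphabet ordering, the handling of non-standard residues such as 'X', and the function names are assumptions made for illustration; they are not the encodings used by DeepGO, DeepLoc, SignalP 5.0, or any other cited predictor.

```python
from collections import Counter

# The 20 standard amino acids; this alphabetical ordering is an arbitrary
# choice for the sketch, not the ordering used by any cited tool.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> list[list[int]]:
    """Return an L x 20 binary matrix; non-standard residues (e.g. 'X') become all-zero rows."""
    rows = []
    for aa in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows


def natural_number(seq: str) -> list[int]:
    """Map each residue to an integer 1..20 (0 for unknown), e.g. as input to an embedding layer."""
    return [INDEX[aa] + 1 if aa in INDEX else 0 for aa in seq.upper()]


def aac(seq: str) -> list[float]:
    """Amino acid composition: the fraction of each standard residue in the sequence."""
    counts = Counter(aa for aa in seq.upper() if aa in INDEX)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


if __name__ == "__main__":
    seq = "MKVLAX"
    print(natural_number(seq))                      # one integer per residue
    print(len(one_hot(seq)), len(one_hot(seq)[0]))  # matrix shape: L x 20
    print(round(sum(aac(seq)), 6))                  # composition sums to 1 over standard residues
```

One-hot and natural-number encodings keep per-residue resolution (an L x 20 matrix or L integers that a model can map through a learned embedding), whereas AAC collapses any sequence to a fixed 20-dimensional vector, which suits sequence-level predictors that need a fixed-size input.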

Table 1. Various deep learning approaches based on input data types for protein property prediction

Data type | Category | Deep model | Protein properties
Sequences | Residue-level prediction | DNN | Backbone dihedral angles [42–44]; torsion angle [45]; solvent accessibility [46]; PTM site [47–52]; secondary structure [53–56]; contact prediction [57, 58, 83]; disorder [59]
Sequences | Residue-level prediction | CNN | DNA-binding site [60, 61]; signal peptide [62]; disorder [63–65]; secondary structure [66–72]; dihedral angles [73, 74]; torsion angle [75]; solubility [18, 76]; binding site [36, 77, 78]; residue–residue contact [79–82]; ligand binding [84–86]; turn prediction [87–89]; PTM site [90–97]
Sequences | Residue-level prediction | RNN | DNA-binding site [98]; torsion angle [41, 99]; solvent accessibility [40, 100, 101]; disorder [102]; contact prediction [103]
Sequences | Residue-level prediction | Hybrid of CNN and RNN | Secondary structure [104–108]; antibody paratope [109]; signal peptide [110]; solvent accessibility [111]; domain boundary [112]; turns [113]; PTM site [114]; DNA-binding site [115]
Sequences | Sequence-level prediction | DNN | Fold recognition [116]; function prediction [117–120]; subcellular localization [121, 122]; target identification [123, 124]
Sequences | Sequence-level prediction | CNN | Fold classification [125]; fold quality assessment [126]; subcellular localization [127]; function prediction [31–34]; family prediction [128]
Sequences | Sequence-level prediction | RNN | Remote homology detection [129–131]; fold recognition [132]; anticancer peptide [133]; function prediction [32, 33]
Sequences | Sequence-level prediction | Hybrid of CNN and RNN | Subcellular localization [134, 135]; enzyme classification [136, 137]; antimicrobial peptide recognition [138, 139]; function prediction [140]
Structures | Low-dimensional mapping | 1D or 2D CNN | Model quality assessment [141]; property predictions [142]; protein function [143]
Structures | Voxel-based approach | 3D CNN | Enzyme classification [144]; amino acid environment analysis [145]; binding site prediction [146–150]; inpainting binding pockets [151]; model quality assessment [152]
Structures | Graph-based method | 3D GNN | Protein classification [153, 154]; molecule interpretation [155, 156]; pocket prediction [157]
Interactions | Interactions between protein and molecule | DNN | ncRNA–protein interactions [158–160]; compound–protein [161–163]; PPIs [24, 25, 164–167]
Interactions | Interactions between protein and molecule | CNN | Protein–RNA interactions [168, 169]; ncRNA–protein interactions [170]; compound–protein [171–173]
Interactions | Interactions between protein and molecule | GNN | Compound–protein interaction [174–176]; PPIs [177]
Interactions | Interactions between protein and molecule | Hybrid of CNN and RNN (or GNN or DNN) | Protein–RNA interaction [178]; ncRNA–protein interactions [179]; compound–protein interaction [180–182]; PPIs [183, 184]
MS data | MS | DNN | Neoantigen identification [185]; dimensionality reduction [186]; peptide identification [187]
MS data | MS | CNN | Tumor classification [188]; proteome inference [189]
MS data | MS | RNN | MS/MS spectra prediction of peptide [190, 191]
MS data | MS | Hybrid of CNN and RNN | Peptide sequencing [192, 193]
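The voxel featurization described in the shallow-features section (local box extraction, then one channel per atom type) can be sketched as below. This is an illustration under stated assumptions, not the featurization of any cited method: the element-based four-channel typing, the 20 Å box, the 1 Å resolution, and plain occupancy counts are all placeholders, whereas the cited works use their own 4, 8, or 14 atom types and may use smoothed atom densities instead of counts.

```python
import math

# Hypothetical element-based atom typing with 4 channels, chosen only for
# illustration; the cited methods use their own task-specific atom types.
ATOM_TYPES = {"C": 0, "N": 1, "O": 2, "S": 3}


def voxelize(atoms, center, box_size=20.0, resolution=1.0):
    """Count atoms of each type inside a cubic box centered on `center`.

    atoms: iterable of (element, x, y, z) tuples with coordinates in angstroms.
    Returns grid[channel][i][j][k], with n = box_size / resolution voxels per side.
    """
    n = int(round(box_size / resolution))
    grid = [[[[0] * n for _ in range(n)] for _ in range(n)] for _ in ATOM_TYPES]
    half = box_size / 2.0
    for element, x, y, z in atoms:
        channel = ATOM_TYPES.get(element)
        if channel is None:
            continue  # element has no channel (e.g. hydrogen) and is ignored
        # Shift coordinates so the box spans [0, box_size) along each axis.
        shifted = (x - center[0] + half, y - center[1] + half, z - center[2] + half)
        i, j, k = (math.floor(c / resolution) for c in shifted)
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:
            grid[channel][i][j][k] += 1  # atoms outside the local box are dropped
    return grid


if __name__ == "__main__":
    atoms = [("C", 0.0, 0.0, 0.0), ("O", 1.2, 0.0, 0.0), ("H", 0.5, 0.5, 0.5)]
    grid = voxelize(atoms, center=(0.0, 0.0, 0.0), box_size=4.0, resolution=1.0)
    occupied = sum(v for channel in grid for plane in channel for row in plane for v in row)
    print(occupied)  # number of typed atoms that fell inside the box
```

Keeping atom types in separate channels lets a 3D CNN treat them like the color channels of an image, so changing the atom-type set only changes the number of input channels, not the network's spatial layout.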

Residue-level prediction from protein sequence

Residue-level prediction means that the properties are associated with specific residues, such as secondary structure [53–56, 66–72], disorder [63–65, 102], solvent accessibility [46, 100, 101], protein–ligand sites [61, 84, 85], PTM sites [30, 47, 48, 51, 90, 93, 94, 97, 114], residue contact [57, 79–82], signal peptides [62, 110], backbone angles [42, 73, 74, 99], and so on [39, 41, 78]. These properties are affected by neighbors that are close in the primary sequence or in the 3D structure. However, residues that are neighbors in 3D might be far apart in the primary sequence. These local or nonlocal dependencies are essential to property prediction at the residue level. To model these dependencies and improve predictive performance, DNNs, CNNs [87], RNNs [including long short-term memory (LSTM) and gated recurrent units (GRUs)] [19], and hybrids of CNNs and RNNs are utilized for residue-level prediction.

DNN-based approaches

MLP- [48, 52, 55], SAE- [38, 44, 46, 56], and DBN-based [47, 54] approaches have been widely utilized for various tasks in residue-level prediction. MLP-based methods have been applied to predict secondary structures [55], lysine acetylation sites [52], and nitration and nitrosylation sites [48]. SAE-based approaches have been used to predict secondary structure [56], solvent accessibility and contact number [46], and backbone Cα angles and dihedrals [44]. DBN-based methods have been adopted to predict secondary structures [54]

Table 2. Descriptors of shallow features

Descriptor group | Descriptor | Description | Tools or reference
Residue-related features | AAC | Amino acid composition (AAC) | [75]
Residue-related features | AAC | Pseudo-amino acid composition (PseAAC) | [200, 201]
Residue-related features | Information gain | Window-wise entropy | [202]
Residue-related features | Physical | A steric parameter, hydrophobicity, volume, polarizability, and isoelectric point | [203]
Residue-related features | Physicochemical | AAindex/reduced AAindex | [203–205]
Residue-related features | Contact potential | Measures interactions between residues | CCMpred [206]
Residue-related features | Contact number | The number of residues that each residue may be in contact with | AcconPred [207, 208]
Residue-related features | Conservation score | Conservation score derived from MSAs | [209]
Residue-related features | Conjoint triad | Conjoint k-spaced triad or conjoint triad (CTriad) | [210]
Residue-related features | Autocovariance | Describes how variables at different positions are correlated and interact | [24, 211, 212]
Sequence-related features | Sequence length | The length of the sequence | [18, 213, 214]
Sequence-related features | Molecular weight | Molecular weight | [18, 213]
Sequence-related features | Turn-forming residue fraction | - | [213, 215]
Sequence-related features | Aliphatic index | The relative volume of a protein occupied by aliphatic side chains | [216]
Sequence-related features | Average hydropathicity | The average hydropathy of the amino acids in the sequence | [217]
Sequence-related features | Absolute charge | - | [18, 213, 214]
Evolution-related features | PSSM | Position-specific scoring matrix | PSI-BLAST [218]
Evolution-related features | HMM profile | Hidden Markov model sequence profiles | HHblits [219]
Evolution-related features | PZM | Pseudo-Zernike moment extracted from PSSM | [220]
Structure-related features | SS | Three-state SS/eight-state SS | SCRATCH [221], PSIPRED [222], RaptorX-Property [37]
Structure-related features | ACC | Solvent accessibility state | SCRATCH [221], RaptorX-Property [37]
Structure-related features | FERs | 0–95% cutoffs of relative solvent accessibility | SCRATCH [221]
Structure-related features | ASA | Accessible surface area | SPIDER2 [35], SPIDER3 [40], deep learning [38]
Structure-related features | HSE | Half-sphere exposure | [223]
Structure-related features | Disorder | Disorder probability of each residue | PreDisorder [224], DISORDER server [225]
Structure-related features | Backbone angles | Backbone torsion and dihedral angles | SPIDER2 [35], SPIDER3 [40]
Structure-related features | Functional domain | One or several domains contained in the sequence | HMMER [226]
Structure-related features | Distance matrix | Distance between the Cα atoms of two residues | [141]
Structure-related features | Fingerprints | 1D biomolecular persistent barcodes | [142, 227, 228]
Structure-related features | Voxel featurization | Voxels featurized by various atom types | [145, 146, 150, 151]
Structure-related features | New geometric features | IC area; EC area with other residues; EV area | [166]

and S-sulfenylation sites [47]. Although these approaches outperform traditional methods, there is still a long way to go before practical application. In particular, to further improve predictive performance, iterative deep learning was introduced to improve the prediction of secondary structures, local backbone angles, and solvent-accessible surface area by using previously predicted results as input for the next iteration of deep learning training [38]. Similar to the two-level strategy, the DeepConPred
model of Xiong [83] improves long-range residue–residue contact prediction based on a hierarchical strategy. DeepCCon is first adopted to predict the probabilities of parallel contact, anti-parallel contact, and no contact. Then the coarse contact predicted by DeepCCon, the smoothed PSSM, the natural vector of the intervening sequence, the contact propensity of the residue pair, and the coevolutionary information of the residue pair are combined and fed into DeepRCon to predict the final contact map.

Considering that previous DNN-based approaches can only capture local dependence, a deep generative model that can describe higher-order, context-dependent constraints in biological sequences was introduced to capture the effects of mutations [230]. In this model, a deep latent variable model is used to capture higher-order correlations in biological sequence families.

CNN-based approaches

A deep convolutional neural network (DCNN) consists of multiple convolutional blocks and implements a composite of linear convolutions and nonlinear activation transformations. A DCNN can not only capture local dependence but also extract high-level, nonlinear, and abstract features. DCNN-based predictors are applied to predict properties that strongly depend on neighboring residues, such as secondary structure, disorder, backbone dihedral/torsion angles, metal-binding sites, signal peptides, and PTM sites. Figure 1 shows a typical framework of a DCNN for residue-level prediction. Deep convolutional neural fields (DeepCNF) [66] and area under the curve (AUC)-maximized DeepCNF (AUCpreD) [63] are two important optimization strategies. DeepCNF was proposed for SS prediction, and it combines the advantages of DCNN and conditional neural fields (CNFs), which capture complex sequence–structure relationships and SS label correlations, respectively. DeepCNF outperforms traditional methods, especially for high-curvature regions (S), beta loops (T), and irregular loops (L). However, DeepCNF does not address the class-imbalance problem, which is a common issue in residue-level prediction. Therefore, AUCpreD uses AUC as an imbalance-insensitive measure to deal with the imbalance issue and was introduced to predict intrinsically disordered regions.

To improve sequence-based contact prediction, Eickholt [58] introduced DNCON, which combines boosted ensembles and deep belief networks and achieves state-of-the-art performance on top-, medium-, and long-range contact predictions. Adhikari [81] presented DNCON2 to predict full contact maps. Since the residue–residue coevolution features captured by five convolutional neural networks are integrated with other features such as SS, SA, and pairwise contact potentials, the DCNN in DNCON2 can achieve more accurate predictions. Jones [80] proposed DeepCov for contact prediction based on amino acid pair frequencies or covariance derived directly from sequence alignments. DeepCov is a fully convolutional neural network, and it shows that, with a CNN, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. The deep learning models for contact prediction are listed in Table 3.

As shown in the last row of Table 3, Wang [79] took advantage of the deep residual network (ResNet) [231], whose accuracy does not saturate or degrade with increasing depth, to improve contact prediction, and solved the problem that the predicted contacts of proteins with few homologs are insufficient for accurate contact-assisted protein folding.

Gao [73] proposed RaptorX-Angle, which employs ResNet to construct a much deeper DCNN to predict backbone dihedral angles from sequence alone. RaptorX-Angle uses PSSM, the position-specific frequency matrix (PSFM), ACC, SA, and SS probabilities as input features and adopts six ResNets with different numbers of layers to extract deep features. The deep features are fed into a logistic regression layer to obtain the probabilities of 20 labels, and the final output is the mean of the probabilities output by the ResNets. Although RaptorX-Angle outperforms the state-of-the-art method SPIDER2 [38] in terms of Pearson correlation coefficient (PCC) and mean absolute error (MAE), this approach only uses 1D CNN and cannot extract long-range interaction information. Fang [75] proposed deep residual inception neural networks to predict protein backbone torsion angles, which enable effective encoding of local and global interactions between amino acids. Although CNN-based approaches achieve better performance than traditional machine-learning approaches, their accuracy is hampered by their inability to effectively capture long-range dependencies.

Considering that the main building blocks of DCNNs are scalar neurons, which may not characterize hierarchical relationships between simple and complex features using scant training data, a novel deep learning architecture called the capsule network (CapsNet) [232] was introduced for protein property prediction. Taking the prediction of PTM sites as an example, although some approaches, including DCNN, have been utilized [90, 92], challenges remain, especially in small-sample training and model interpretation. Therefore, a CapsNet with a multilayer CNN was proposed for protein PTM site prediction [93]. In addition, motivated by their excellent performance, capsule networks were also introduced for protein structure prediction [232], such as gamma-turn prediction [88]. With the benefits of the deep inception network, the deep inception capsule network achieved excellent results, significantly outperforming the previous best method [233].

Since many contemporary methods are still limited in capturing complex spatial dependencies, an approach that combines a supervised generative stochastic network and a convolutional architecture was proposed for SS prediction [72]. By extending the success of generative stochastic networks in capturing complex dependencies in proteins, this supervised generative stochastic network was demonstrated to be effective for structured prediction.

RNN-based approaches

Residues involved in long-range interactions are structural, but not sequence, neighbors. LSTM [234, 235] cells are introduced to learn such nonlocal relationships more efficiently. Bidirectional LSTM (BLSTM) [236] is commonly adopted in bioinformatics studies [102, 237] to capture dependencies in both the forward and backward directions. BLSTM layers are typically stacked to obtain more abstract, complex, and discriminative features. As shown in Figure 2, a deep BLSTM consists of input generation, deep feature extraction, and classification. In Figure 2, the BLSTM layer representations are taken directly from Hanson et al. [102].

Various approaches based on different numbers of BLSTM layers have been proposed for residue-level property prediction. The single BLSTM-based framework, MUscADEL, was proposed for lysine PTM prediction [30]. MUscADEL contains full-sequence and sequence-fragment models, which are both BLSTM-based
Figure 1. The framework of DCNN for residue-level prediction and its two optimization strategies, DeepCNF and AUCpreD. (a) DCNN consists of the input layer, several
hidden layers, and output layer. The local correspondences of residues are computed by convolutional operators. (b) Based on the deep features from DCNN, DeepCNF
considers the correspondence of local labels. (c) To solve the imbalance problem, AUCpreD adopts AUC for classification evaluation.

Figure 2. The architecture of deep RNN based on BLSTM for residue-level prediction. Shallow features extracted from protein sequences are combined as input for
deep learning. Then several BLSTM layers are stacked to extract long-range dependencies and achieve more abstract and distinguishable features for classification.
Finally, MLPs or CRFs are used as classifiers for results.

approaches. This is because the glutarylation PTM site is only sensitive to local motif patterns, whereas other PTM sites are sensitive to the combination of long-range and local information. A two-stacked BLSTM approach is used to capture nonlocal interactions for SS, SA, backbone angles, and contact numbers [40]. This work highlights the importance of capturing nonlocal interactions to predict one-dimensional structural properties. Similarly, Zhang [100] introduced a three-stacked BLSTM, called a stacked deep bidirectional recurrent neural network (SDBRNN), to predict solvent accessibility. Besides the PSSM and physicochemical properties, they used the conservation score and protein encodings as inputs. They redesigned the BLSTM using three types of merging operators (concatenation, sum, and weighted sum) and used logistic activation as the predictor. Compared with a BLSTM using a single merging operator, SDBRNN can capture more protein features and generalizes better.

Hybrid of CNN and RNN approaches

The motivation for combining CNN and RNN is that residues are influenced not only by their sequential neighbors but also by structurally adjacent residues. A suitable prediction model can exploit this phenomenon by learning useful local patterns with a CNN and then using an RNN to learn aggregate features of the entire sequence. A simple form of hybrid deep learning is the concatenation of CNN and BLSTM models. The framework of this hybrid model, as shown in Figure 3, consists of an input layer of encoded sequences, a convolutional layer, BLSTM layers, and classifier layers (MLP or CRF). This hybrid model has been used to predict antibody paratopes [109] and protein hydroxylation sites [114]. However, this simple model cannot handle prediction problems with small samples. To tackle this issue, a complex hybrid model, SignalP 5.0, was constructed for signal peptide prediction [110]. In this

Table 3. Several deep learning models for residue–residue contact prediction

Methods | Deep model | Features | Performance | Web site | Note
DNCON [58] | DBN based on RBM | Length of the protein; SS and SA; PSSM sums; PSSM sum cosines; Atchley factors; statistical potentials | CASP10, long range: Top L/10 0.663, Top L/5 0.615; medium range: Top L/10 0.749, Top L/5 0.720 | http://iris.rnet.missouri.edu/dncon/ | First deep learning model for residue–residue contact prediction
DNCON2 [81] | CNN | Length of the protein; SS and SA; PSSM sums; PSSM sum cosines; Atchley factors; statistical potentials; contact probabilities; 3-SS; alignment statistics | Long range, Top L/5: CASP10 (free-modeling) 0.35; CASP11 (free-modeling) 0.50; CASP12 (free-modeling) 0.534 | https://github.com/multicom-toolbox/DNCON2/; http://sysbio.rnet.missouri.edu/dncon2/ | Using CNN for residue–residue contact prediction
DeepConPred [83] | DBN and two-stage prediction | Coevolutionary information; contact propensity; natural vector of intervening sequence; statistics of SS | Long range, CASP10: Top L/5 59.33%, Top L/10 64.39%, Top 5 predictions 70.00%; long range, CASP11: Top L/5 49.97%, Top L/10 54.01%, Top 5 predictions 59.81% | http://166.111.152.91/Downloads.html | A two-stage strategy based on DBN
DeepCov [80] | Fully CNN | Pair frequencies; covariance | Mean long range, CASP12: Top-L 0.406, Top-L/2 0.523, Top-L/5 0.611, Top-L/10 0.642 | https://github.com/psipred/DeepCov | Fully CNN is adopted
DeepResNet [79] | Deep residual neural network | Protein sequence profile; 3-state SS; 3-state SA; coevolutionary information; mutual information; pairwise potential | CASP11, short range: Top-L 0.28, Top-L/2 0.46, Top-L/5 0.70, Top-L/10 0.82; medium range: Top-L 0.35, Top-L/2 0.55, Top-L/5 0.76, Top-L/10 0.85; long range: Top-L 0.55, Top-L/2 0.68, Top-L/5 0.77, Top-L/10 0.81 | http://raptorx.uchicago.edu/ContactMap/ | Higher CNN used for residue–residue contact prediction

approach, SignalP 5.0 applies one-dimensional convolutions to obtain learnable nonlinear PSSMs before combining CNN and BLSTM. In addition, SignalP 5.0 adopts transfer learning to improve predictive performance for organism groups with little data (notably Archaea): a deep model pretrained on data from the other taxonomic groups is fine-tuned for Archaea, Gram-positive bacteria, Gram-negative bacteria, and Eukarya. In these hybrid models, only the global features from the BLSTM are used for prediction, and the local features extracted by the CNN are ignored. To use local and global features simultaneously, Shi et al. [112] proposed the DNN-Dom architecture for protein domain boundary prediction, in which the combined local and global features are fed into parallel balanced random forests. Furthermore, because traditional convolutional layers ignore features along the feature-vector dimension, the CNN can be replaced by asymmetric convolutional neural networks (ACNNs) when constructing hybrid models. For example, the DeepACLSTM consisting of ACNN and
8 Shi et al.



Figure 3. The flowchart of hybrid deep models for residue-level prediction. Convolutional layers are used to extract local dependencies, and BLSTM layers are adopted to capture long-range dependencies. MLP or CRF is used as the classifier; CRF can additionally model dependencies between labels.

BLSTM is introduced to predict eight-category SS [106]. In this method, the ACNN learns phrase-level features that serve as inputs to the BLSTM. Because most current approaches solve a single problem and cannot address several tasks simultaneously, web servers based on integrated deep learning, NetSurfP-2.0 and MUFold-SSW [39, 41], have been developed to predict several properties at once.

Sequence-level property prediction

In sequence-level prediction, the property to be predicted is determined by the whole protein sequence. Because informative shallow features are difficult to extract from sequences and are often not representative enough for the classification task, traditional algorithms do not perform satisfactorily. In addition, they cannot model the complex relationships between sequences and properties. Deep learning, with its ability to automatically learn representations from data at multiple levels of abstraction, is therefore used for property prediction at the sequence level. Various deep learning approaches, including DNN, CNN, RNN, and hybrids of CNN and RNN, have been applied to fold analysis [116, 125, 126, 132], function prediction [31–34, 117–120, 140], target identification [123, 124], subcellular localization [121, 122, 127, 134, 135], remote homology detection [129–131], antimicrobial peptide (AMP) recognition [138, 139], and enzyme EC classification [136, 137].

DNN-based approaches

Inspired by the representation power of deep learning models, DNN-based approaches have been introduced to improve the prediction of sequence-level properties such as subcellular localization [121], protein function [120], target protein identification [123], and fold recognition [116]. Given the multiple labels and hierarchical structure of gene ontology (GO) functions for proteins, a hierarchical stack of multitask feed-forward deep neural networks, named DEEPred, was proposed for automatic protein function annotation [120]. DEEPred uses multitask feed-forward networks to build a practical, large-scale protein function prediction pipeline. However, DEEPred does not consider optimized network initialization. Therefore, a new deep model that uses a restricted Boltzmann machine (RBM) for network initialization was introduced for fold recognition [116].

The above approaches achieve state-of-the-art performance by using hidden layers to obtain nonlinear and abstract features. However, these hidden layers cannot describe the probability distribution of the raw data. To solve this problem, a deep generative model was proposed for T-cell receptor (TCR) protein sequences [238], in which variational autoencoder (VAE) models parameterized by deep neural networks are fitted to TCR repertoires.

CNN-based approaches

Because deep learning can extract information from unstructured data efficiently, often more effectively than features hand-crafted by human experts, DCNN-based approaches have been proposed for function prediction [31, 34], fold recognition [40], and family prediction [128] from protein sequences. As shown in Figure 4, the typical DCNN framework for protein sequence prediction consists of an input layer of protein sequences with shallow features, several convolutional layers, and max pooling, fully connected, and softmax layers.

For more accurate and faster family prediction, Seo [128] proposed DeepFam, which consists of one convolution layer, a 1-max pooling layer, and fully connected and softmax layers. Combinations of various hyperparameters, including the number and length of convolution kernels, the number of perceptrons in the fully connected layer, the regularization coefficient, dropout rate, learning rate, and batch size, were


Figure 4. The framework of DCNN-based approaches for sequence-level prediction (fold classification and function prediction). First, several convolutional layers are stacked to extract local dependencies and abstract patterns from the whole protein sequence. Second, max pooling layers aggregate features and keep the number of parameters, and hence the computational complexity, from growing unnecessarily. Third, fully connected layers and a softmax layer produce the classification results.

tested in experiments. Hou [40] similarly proposed DeepSF to classify any protein sequence directly into one of 1195 known folds. DeepSF adopts a 1D DCNN for fold classification, consisting of 10 convolutional layers, 1 max pooling layer, 1 flattening layer, 1 fully connected hidden layer, and an output layer, with softmax as the classifier for fold recognition. Kulmanov [31] proposed DeepGO to predict functions from sequences. DeepGO combines deep features learned from sequences with a feature vector extracted from a cross-species PPI network, and these combined features are fed into a hierarchical classifier to make predictions. However, for novel or uncharacterized proteins, no additional information from the protein's interactions is available. DeepGO has been extended and improved by DeepGOPlus [34], which overcomes its main limitations related to sequence length, missing features, and the number of predicted classes.

RNN-based approaches

Li [129] proposed ProDec-BLSTM as a predictor to improve remote homology detection. As shown in Figure 5, ProDec-BLSTM can capture both long- and short-range dependencies. The protein sequence is one-hot encoded as the input. The BLSTM extracts more comprehensive dependence information, which is held in the intermediate hidden units. The values of these hidden units are fed into a time-distributed dense layer, which reweights the dependence relationships extracted from different cells. Finally, the outputs of the time-distributed dense layer are concatenated into one feature vector, which is fed into an SVM classifier for decision-making. Thanks to the time-distributed dense layer, the fused features, which contain complex dependencies, are more discriminative; hence, ProDec-BLSTM achieves higher performance than various related methods, including kernel-, SVM-, and LSTM-based approaches.

Hybrid of CNN and RNN approaches

To obtain a more comprehensive sequence representation, CNN and RNN are combined for AMP recognition [138], subcellular localization prediction [134], and enzyme EC classification [144]. As shown in Figure 6, there are simple and complex strategies to construct hybrid models. The simple model consists of one CNN and one RNN, while the complex model includes a multichannel CNN and a BLSTM with an attention mechanism to capture features for protein prediction.

To improve AMP recognition, expert-free features are extracted by a deep learning approach consisting of a CNN and an RNN [138]. The outputs of the convolutional layer are fed into an LSTM layer, a common design in bioinformatics. Armenteros [134] similarly presented DeepLoc to predict subcellular localization; in DeepLoc, the CNN is also followed by an RNN. However, there are several differences: (1) DeepLoc uses convolutional filters of different sizes to extract meaningful motif information; (2) bidirectional LSTMs are adopted to capture long-range features in both the forward and backward directions; and (3) attention mechanisms [239] are used to improve prediction. In addition, because subcellular localization categories are hierarchical, a hierarchical tree [240] with multiple nodes is developed.

To avoid limitations such as homology requirements, feature design, and nonuniform feature dimensionality, Li [136] introduced the DEEPre model to improve enzyme EC number prediction. This method uses both deep features from a hybrid of CNN and RNN and shallow features, such as sequence length-independent features, for classification. The combination of sequence one-hot encoding, PSSM, SA, and SS is fed into the hybrid model. This differs from DeepLoc and the deep learning-based AMP predictor, which use one-hot encoding as inputs.

In summary, various deep learning architectures for protein function prediction are listed in Table 4. Complex models that combine deep features from hybrid deep learning architectures with shallow features are commonly used for sequence-level tasks.

Three-dimensional structural data mining

Central to protein biology is understanding how the structural arrangement of amino acids creates functional characteristics within protein sites. The surfeit of protein structural data enables the development of computational methods to systematically derive rules governing structural–functional relationships. However, the performance of these methods depends critically on the choice of protein structural representation. Good


Figure 5. Procedure of BLSTM for remote homology detection. BLSTM captures long-range dependencies in the backward and forward directions. Time-distributed dense layers assign weights to the hidden values generated by different memory cells, combining different levels of dependence relationships.

Figure 6. The architecture of hybrid models for sequence-level prediction. (a) A simple hybrid strategy consisting of one CNN and one LSTM, used for AMP prediction. (b) A complex hybrid model with a multichannel CNN and a BLSTM with an attention mechanism, adopted for subcellular localization prediction.

representations efficiently capture the most critical information, while poor representations create a noisy distribution with no underlying patterns. Most current methods rely on features that are manually selected based on knowledge of protein structures. In addition, designing hand-engineered features is labor-intensive, time-consuming, and suboptimal for some tasks. Fortunately, the surfeit of protein structures and the recent success of deep learning algorithms provide an opportunity to develop tools that automatically extract task-specific representations of protein structures. Using voxel-, projection-, or graph-based representations of protein 3D structures (Figure 7), several deep learning approaches automatically extract features from protein 3D structures and are applied to predict functions [143], binding pockets [151, 157], ligand-binding pockets [146–150], enzyme classification [144], amino acid environment similarity analyses [145], model quality assessment [141, 152], and other tasks [142, 155, 156].

Low-dimensional mapping methods

The idea of projection-based approaches is to reduce the data dimension from 3D to 2D or 1D using geometric and topological relations within the 3D structure and then to employ deep learning methods to extract deep features for prediction. These approaches include distance matrix- and topology-based DL methods. Nguyen [141] proposed a distance matrix-based deep learning model (DL-PRO) for 3D structure quality assessment. DL-PRO first calculates the pairwise distance matrix of the C-α atoms of residues. These distance matrices, and their corresponding labels indicating good or bad models, are fed into

Table 4. Deep architectures for protein function prediction

Methods | Deep model | Shallow features | Performance | Web site | Note
DEEPred [120] | Multitask feed-forward DNNs | Conjoint triad; pseudo-amino acid composition (PAAC); subsequence profile map (SPMap) | CAFA2 F-max: molecular function (MF) 0.49; biological process (BP) 0.26; cellular component (CC) 0.43 | https://github.com/cansyl/DEEPred | Multitask DNN algorithms inherently extract the relationships between multiple classes by building complex features from the raw input data at each layer in a hierarchical manner
DeepGO [31] | 1D CNN | One-hot encoding | CAFA3 Fmax: MF 0.47; BP 0.34; CC 0.52 | https://github.com/bio-ontology-research-group/deepgo | It combines a CNN model with PPI network features
DeepGOPlus [34] | 1D CNN | One-hot encoded representation | CAFA3 Fmax: BP 0.47; CC 0.70 | https://github.com/bio-ontology-research-group/deepgoplus | It combines a CNN model with sequence similarity-based predictions
ProLanGO [32] | Three layers of RNN | K-mers of the 20 amino acid letters | CAFA3 area under curve (AUC): 0.39 | N/A | A neural machine translation model based on recurrent neural networks that translates the “ProLan” language into the “GOLan” language
DEEPre [136] | Hybrid of CNN and LSTM | One-hot encoding; PSSM; ACC; SS; functional domain | All sub-subclasses: accuracy 0.9415; kappa 0.8918; macro-precision 0.8942; macro-recall 0.8578; macro-F1 0.8665 | http://www.cbrc.kaust.edu.sa/DEEPre | Sequence length-dependent features are fed into a hybrid model to extract deep features; sequence length-independent features are used directly for classification

a stacked autoencoder network for training. DL-PRO is a purely geometric method that can extract effective features representing good models. Because DL-PRO uses only a distance matrix and therefore loses some information, the method proposed in [143] combines local shape features with features characterizing the interactions of amino acids to form a multichannel image, which is fed into a 2D CNN for function prediction.

To capture the geometric and biological complexity of biomolecules and improve predictive performance, Cang [142] proposed TopologyNet to predict protein–ligand binding affinities. An element-specific topological fingerprint (ESTF), which provides a sufficient and structured low-level representation, is computed first. A 1D CNN then learns high-level representations, and the shallow and deep features are combined and fed into a multitask learning framework for prediction.

Because information is lost when 3D data are mapped to lower dimensions, these approaches are expected to incorporate additional information extracted directly from 3D structures.

Voxel-based methods

Protein atoms do not lie on a regular grid, unlike the pixels of 2D images. Voxelizing the protein structure allows it to be fed directly into a CNN. Several voxelization methods map irregular atomic coordinates to regular 3D grid representations, including the occupancy grid [144], the multiple atom-channel grid [145], and the multiple atom-type grid [146, 148, 151]. To avoid the less reliable function predictions obtained from sequences, Amidi [144] proposed EnzyNet for enzyme classification based on voxelized spatial structure. Enzymes are represented as binary volumetric shapes: a voxel takes the value 1 if the backbone of the enzyme passes through it, and 0 otherwise. Although this occupancy grid can be fed directly into a 3D CNN, it ignores the physicochemical properties of atoms. Because amino acid microenvironments are characterized by the 3D spatial distributions of oxygen, carbon, nitrogen, and sulfur atoms in a local box, Torng and Altman [145] proposed a 3D CNN for residue microenvironment analysis. The voxelization process consists of local box sampling, local box extraction, and local box featurization, producing four channels (oxygen, carbon, nitrogen, and sulfur) that serve as input samples to the 3D CNN. Motivated by the surfeit of structural data, this approach can systematically derive rules governing structural–functional relationships. However, Torng's approach [145] uses only four atom channels and ignores other atom types. Jiménez et al. [146] introduced a new 3D CNN, DeepSite, for ligand-binding site prediction with seven atom categories:


Figure 7. Representations of 3D protein data for DL models. (a) In projection-based approaches, the 3D structure is mapped into one- or two-dimensional data, which are input to a 1D or 2D DCNN. (b) In voxel-based methods, the 3D structure is voxelized using 3D grids; the representation of each grid cell is calculated and fed into a 3D DCNN for property prediction. (c) In graph-based methods, the 3D structure is modeled as a 3D graph, which is fed into a GNN for property prediction.

hydrophobic, aromatic, hydrogen bond acceptor or donor, positive or negative ionizable, and metallic. Skalic et al. [151] similarly proposed the LigVoxel model to predict ligand chemical properties such as occupancy, aromaticity, and donor–acceptor character. Additionally, for accurate classification of ligand-binding pockets, DeepDrug3D uses 14 atom types to calculate the features of a voxel [150].

Graph-based methods

Although voxel-based methods achieve state-of-the-art performance, they ignore the intrinsic irregular topology that directly governs protein properties. To describe 3D topology, such as the spatial distances and directions between atoms, protein structures are represented as 3D molecular graphs. Based on this representation, a three-dimensional graph convolutional network (3DGCN) was introduced to efficiently handle these irregular topologies for molecule interpretation [155]. In 3DGCN, a convolutional layer has two phases: the first combines the features of each node and generates intermediate features, and the second collects and sums these intermediate features over each neighborhood to generate higher-level features. Experiments on four datasets from the chemical and biological fields demonstrate that 3DGCN achieves state-of-the-art performance in virtual drug screening, protein–ligand interactions, and protein docking.

In summary, the deep architectures used for protein 3D structure analysis are listed in Table 5. Although low-dimensional mapping, voxel-based, and 3D graph-based methods have all been adopted for protein property prediction, graph-based deep learning, which can directly model the intrinsic irregular topology of protein structures, may be particularly promising.

Interaction prediction of proteins and other molecules

The interactions between proteins and other molecules [241], such as RNA [168, 169, 178, 242, 243], noncoding RNA (ncRNA) [158–160, 170, 179], other proteins [24, 25, 164–167, 177, 183, 184], and compounds [161–163, 171–173, 180–182], play an important role in many cellular processes, such as signal transduction, immune response, cellular organization, protein synthesis, and viral infectivity. Furthermore, protein–compound interactions facilitate network pharmacology and drug discovery. Because interaction prediction methods must process two inputs simultaneously, they can be classified into early- and late-fusion strategies, as shown in Figure 8. The former consists of representation calculation for the two kinds of biomolecules, representation stitching, deep feature extraction, and classification. The latter consists of representation calculation, deep feature extraction for each of the two kinds of biomolecules, deep feature fusion, and classification. L. Wang [168] adopted the early-fusion strategy for protein–RNA interaction prediction: the PSSM of the protein sequence and the order-preserving transformation (OPT) of the RNA sequence are stitched together, and deep features are extracted from these stitched representations by a DCNN. An extreme learning machine (ELM) [244] classifier, which trains quickly while achieving good learning accuracy, then predicts interactions. Similarly, H. Yi [170] used an SAE to predict ncRNA–protein interactions, K. Tian [171] adopted an ELM-based DNN to boost compound–protein interaction prediction, and T. Sun [24] used an SAE for PPIs. In contrast to early-fusion approaches, which first combine shallow representations, Hashemifar [245] presented DPPI to predict PPIs. PSSMs of the interacting sequences are fed into a DCNN to detect various patterns, and a random projection module generates a representation that models the paired sequences. PPIs are then predicted by maximizing the log-likelihood of the interaction. Similarly, Lei [167] introduced a multimodal deep polynomial network (MDPN) for PPI prediction, in which a two-stage DPN extracts high-level and complex features from paired sequences. The first stage feeds multiple protein features into the DPN encoding to obtain high-level features, while the second stage fuses and learns features by cascading three types of high-level features in the DPN encoding. A regularized extreme learning machine (RELM) [246] then predicts PPIs.

Specifically, because analyzing interactions requires handling rich relational information among elements,

Table 5. Deep architectures for 3D structure analysis

Methods | Deep model | Shallow features | Performance | Web service | Note
TopologyNet [142] | 1D-CNN | Fingerprints | PCC: 0.82; root mean square error of PCC: 0.92 | weilab.math.msu.edu/TDL | It represents 3D complex geometry by 1D topological invariants
DeepSite [146] | 3D-CNN | Physicochemical properties of 8 atom types | Average DVO (discretized volumetric overlap): 0.652 | N/A | It outperforms other existing state-of-the-art strategies while learning purely from examples and without encoding any problem-specific knowledge
LigVoxel [151] | 3D-CNN | Physicochemical properties of 8 atom types | 70 out of 85 cases within a threshold of 2 Å RMSD (root mean square deviation) | Part of PlayMolecule.org | It constructs end-to-end DCNNs that can generate ligand fields given the structure of a protein-binding site
DeepDrug3D [150] | 3D-CNN | Physicochemical properties of 14 atom types | High accuracy of 95% | https://github.com/pulimeng/DeepDrug3D | It not only achieves a high accuracy of 95% but also generalizes to unseen data
3DGCN [155] | 3D-GNN | 3D molecular graph | Protein-binding motifs: area under curve (AUC) 0.857; receiver operating characteristic (ROC) 0.793 | N/A | It can generalize a given conformer to targeted features regardless of its rotation in 3D space

Figure 8. Two strategies for interaction prediction between proteins and other biomolecules. (a) In the early-fusion strategy, the fused shallow features of the two biomolecules are fed into deep neural networks to extract deep features, which are passed to classifiers such as ELM and logistic regression. (b) In the late-fusion strategy, the shallow features of each biomolecule are fed into deep networks to obtain deep features, which are then combined and fed into classifiers.

a graph model is used to describe proteins, and graph neural networks (GNNs) can learn from such graph inputs [247]. This approach has been used to predict protein interfaces [177], where interface prediction is cast as classifying pairs of nodes drawn from two protein graphs. Following the late-fusion strategy, the features from the two GNNs are combined for classification. However, this GNN handles only one type of relationship between proteins and cannot handle the multimodal relationships arising from protein–protein, drug–protein, and drug–drug interactions. Therefore, a new graph convolutional neural network, Decagon, was introduced for multi-relational link prediction [176]. Decagon first constructs a multimodal graph encoding drug, protein, and side-effect relationships, and then operates directly on this graph with a graph convolutional encoder and a tensor factorization decoder. This approach can also be classified as a late-fusion strategy. Because parameters are shared across multiple edge types, Decagon achieves better performance.
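To make the pairwise formulation of interface prediction concrete, the sketch below applies one graph convolution layer with the two phases described above for 3DGCN (transform each node's features, then sum them over the node's neighborhood) to two toy protein graphs, and then scores every cross-protein residue pair by concatenating the two residues' embeddings (late fusion) and applying a logistic classifier. It is a minimal illustration under stated assumptions, not the PAIRPred or Decagon architecture: the function names (`graph_conv`, `pair_scores`), the weights shared by both proteins, the inclusion of each node in its own neighborhood, the ReLU activation, and all toy features and weights are hypothetical choices made for clarity.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output dimension) by a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def graph_conv(features: List[Vector], adjacency: Dict[int, List[int]],
               weights: Matrix) -> List[Vector]:
    """One illustrative graph convolution layer (hypothetical, not a published model).

    Phase 1: transform every node's features with weights shared by all nodes.
    Phase 2: sum the transformed features over the node and its neighbors, then ReLU.
    """
    transformed = [matvec(weights, f) for f in features]
    output = []
    for node in range(len(features)):
        neighborhood = [node] + adjacency.get(node, [])
        summed = [sum(transformed[n][d] for n in neighborhood)
                  for d in range(len(weights))]
        output.append([max(0.0, value) for value in summed])
    return output


def pair_scores(emb_a: List[Vector], emb_b: List[Vector],
                classifier: Vector, bias: float = 0.0) -> List[List[float]]:
    """Late fusion: concatenate the embeddings of residue i (protein A) and
    residue j (protein B), then score the pair with a logistic classifier."""
    scores = []
    for ea in emb_a:
        row = []
        for eb in emb_b:
            z = sum(w * x for w, x in zip(classifier, ea + eb)) + bias
            row.append(1.0 / (1.0 + math.exp(-z)))
        scores.append(row)
    return scores


# Toy example: two features per residue; edges link spatially close residues.
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights, shared by both proteins
protein_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency_a = {0: [1], 1: [0, 2], 2: [1]}
protein_b = [[0.5, 0.5], [1.0, 0.0]]
adjacency_b = {0: [1], 1: [0]}

emb_a = graph_conv(protein_a, adjacency_a, W)
emb_b = graph_conv(protein_b, adjacency_b, W)
probs = pair_scores(emb_a, emb_b, classifier=[0.1, 0.1, 0.1, 0.1])
for i, row in enumerate(probs):
    print(f"residue A{i}:", [round(p, 3) for p in row])
```

Sharing `W` between both proteins mirrors the weight sharing that late-fusion pair models often use; a trained model would stack several such layers and learn the weights and classifier from labeled interfaces rather than fixing them by hand.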

Table 6. Deep architectures for PPI prediction

Method | Deep model | Shallow features | Performance | Web site | Note
DeepPPI [25] | MLP with late-fusion strategy | ACC; dipeptide composition; composition, transition, and distribution; amphiphilic pseudo-amino acid composition | Accuracy 92.50%; precision 94.38%; recall 90.56%; specificity 94.49%; Matthews correlation coefficient (MCC) 85.08%; AUC 97.43% | http://ailab.ahu.edu.cn:8087/DeepPPI/index.html | It employs deep neural networks to effectively learn protein representations from common protein descriptors
MDPN (multimodal deep polynomial network) [167] | MDPN with early-fusion strategy | Mutation-rate feature based on BLOSUM62; hydrophobicity feature based on the AAindex matrix; hydrophilicity feature based on the AAindex matrix | Average accuracies: H. pylori 97.87%; human 99.90%; yeast 98.11% | N/A | MDPN consists of a two-stage DPN: the first stage feeds features into a DPN to obtain high-level features, while the second stage fuses and learns features by cascading the high-level features in the DPN
PAIRPred [177] | GNN with late-fusion strategy | Proteins as graphs | AUC: 86.3% | https://github.com/fouticus/pipgcn | Uses a graph representation of the underlying protein structure to predict interfaces between pairs of proteins
DNN-PPI [183] | Hybrid of CNN and LSTM with late-fusion strategy | Each amino acid encoded by a natural number, assigned randomly | Accuracy 98.78%; MCC 97.57% | N/A | The three layers of CNNs and LSTM networks allow mining the relationships between amino acid fragments in terms of local connectivity and long-term dependence
PIPR [184] | Residual RCNN with late-fusion strategy | Each amino acid encoded by a natural number | Yeast dataset, 5-fold cross-validation: accuracy 97.09%; precision 97.00%; sensitivity 97.17%; specificity 97.00%; F1-score 97.09%; MCC 94.17% | https://github.com/muhaochen/seq_ppi.git | PIPR employs a residual RCNN, which provides an automatic multi-granular feature selection mechanism to capture both local significant features and sequential features from the primary protein sequences
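The early- and late-fusion strategies in Figure 8 differ mainly in where the two molecules' representations are combined. The sketch below, a minimal illustration rather than any published model, contrasts them using a single ReLU dense layer as a stand-in for a deep feature extractor and logistic regression as the classifier (the text also mentions ELM and RELM classifiers; logistic regression is used here only for simplicity). All function names, feature vectors, and weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dense_relu(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One fully connected layer with ReLU; a stand-in for a deep feature extractor."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def logistic(x: Vector, weights: Vector, bias: float) -> float:
    """Logistic-regression classifier returning an interaction probability."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def early_fusion(a: Vector, b: Vector, enc_w: Matrix, enc_b: Vector,
                 clf_w: Vector, clf_b: float) -> float:
    """Stitch the two shallow representations, then extract deep features once."""
    stitched = a + b
    deep = dense_relu(stitched, enc_w, enc_b)
    return logistic(deep, clf_w, clf_b)


def late_fusion(a: Vector, b: Vector,
                enc_a_w: Matrix, enc_a_b: Vector,
                enc_b_w: Matrix, enc_b_b: Vector,
                clf_w: Vector, clf_b: float) -> float:
    """Extract deep features from each molecule separately, then fuse them."""
    deep_a = dense_relu(a, enc_a_w, enc_a_b)
    deep_b = dense_relu(b, enc_b_w, enc_b_b)
    return logistic(deep_a + deep_b, clf_w, clf_b)


# Hypothetical shallow features for one protein and one RNA molecule.
protein = [1.0, 0.0, 1.0]
rna = [0.0, 1.0]

p_early = early_fusion(
    protein, rna,
    enc_w=[[0.2, 0.1, 0.2, 0.3, 0.1], [0.1, 0.4, 0.0, 0.2, 0.3]],  # 2 x (3 + 2)
    enc_b=[0.0, 0.0],
    clf_w=[0.5, -0.5], clf_b=0.0)

p_late = late_fusion(
    protein, rna,
    enc_a_w=[[0.2, 0.1, 0.2], [0.1, 0.4, 0.0]], enc_a_b=[0.0, 0.0],  # protein branch
    enc_b_w=[[0.3, 0.1], [0.2, 0.3]], enc_b_b=[0.0, 0.0],            # RNA branch
    clf_w=[0.5, -0.5, 0.5, -0.5], clf_b=0.0)

print(f"early fusion: {p_early:.3f}  late fusion: {p_late:.3f}")
```

In early fusion the encoder sees both molecules at once, so its input width is the sum of the two feature lengths; in late fusion each molecule has its own encoder, which lets each branch specialize in one input type before the features are fused.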

In summary, the deep architectures used for PPI prediction are listed in Table 6. Both early- and late-fusion strategies are used to predict PPIs, and the deep models involved have become increasingly complex. In particular, to improve prediction performance, late-fusion strategies adopt complex architectures, including GNNs [177], residual recurrent convolutional neural networks (RCNNs) [184], and combinations of CNN and LSTM [183].

Protein MS data interpretation

MS-based technologies are powerful tools for studying the ensemble of proteins in cells or organs under different circumstances and for gaining insight into protein functionalities [197]. Because MS spectra contain considerable noise and ambiguity, computational proteomics remains challenging [199]. Motivated by the breakthroughs of deep learning on such problems, deep learning models such as MLP, SAE, DCNN, BLSTM, and hybrids of CNN and LSTM have been applied to interpret MS data.

Benefiting from the highly nonlinear modeling of MLPs, the MLP-based EDGE improves neoantigen identification using tumor human leukocyte antigen (HLA) peptide MS datasets [185]. EDGE can facilitate the development of neoantigen-targeted immunotherapies for cancer patients. However, because MLPs cannot be used for unsupervised dimension reduction, SAE-based approaches have been proposed to compress MS imaging data [186]. These approaches can not only nonlinearly project unseen high-dimensional data into a low-dimensional space but also enhance the stability of the initial parameters used during fine-tuning across different runs. For fully supervised learning, CNN-based approaches are usually adopted for tumor classification and protein inference. IsotopeNet, a specialized architecture for tumor classification by imaging MS, was constructed [188]; compared with ResNet, IsotopeNet is sensitive to a large number of peaks. In addition, a DCNN method called DeepPep was built for protein inference, predicting the protein set of a proteomics mixture from peptide profiles [189]. Comparison with leading methods shows that DeepPep has the most robust performance across various instruments and datasets.

Motivated by the ability of BLSTMs to model the influence of both the N- and C-terminal amino acids at each cleavage position, pDeep, built on a two-layer BLSTM, was introduced to predict the MS/MS spectra of peptides [190]. Although pDeep predicts peptide MS/MS spectra with high accuracy, it cannot sequence peptides. To realize de novo peptide sequencing given an MS/MS spectrum
Deep learning for mining protein data 15

Table 7. Deep architectures for protein MS interpretation

Methods Deep model Inputs Performance Web site Note

EDGE [185] MLP and rectified Peptide MS data The average PPV — Benefited from deep
linear unit (ReLU) (short for positive learning, EDGE
predictive values) at achieved an
40% recall was 0.54 improved
performance
DeepPep [189] DCNN with each Peptide pairs AUC: 0.80 https://deeppep. DeepPep uses

Downloaded from https://academic.oup.com/bib/advance-article-abstract/doi/10.1093/bib/bbz156/5681782 by guest on 23 December 2019


layer followed by a AUPR (short for the github.io/DeepPep/ convolution layers to
pooling layer and area under the capture features of
dropout precision–recall proteins and
curve): 0.84 peptides, allowing for
more complex
nonlinear
relationships
Prosit [187] Bidirectional GRU Peptide, precursor More identifications https://github.com/ The learned internal
with attention layer charge, and at >10× lower false kusterlab/prosit/ representation of
normalized collision discovery rates Prosit approximates
energy (NCE) a chemo-physical
model for peptide
fragmentation and
chromatographic
retention time
DeepNovo [192] Spectrum-CNN and MS/MS spectrum, Reconstruct the https://github.com/ DeepNovo achieves
LSTM network peptide mass, amino complete sequences nh2tran/DeepNovo major improvement
acid sequence of antibody light and of sequencing
heavy chains of accuracy over
mouse: state-of-the-art
97.5–100% coverage methods and
97.2–99.5% accuracy subsequently enables
complete assembly
of protein sequences
without assisting
databases
DeepNovo-DIA [193] Ion-CNN, Precursor and its Amino acid level: https://github.com/ The extension of
spectrum-CNN, and associated 63.8–68.1% nh2tran/DeepNovo- DeepNov can deal
LSTM MS/MS spectra, Peptide level: DIA with the DIA
peptide 37.4–52.4% challenges

and the peptide mass, DeepNovo was presented following the protein MS interpretation. However, as shown in Table 8, several
recently trending topic of “automatically generating a descrip- challenges should be addressed in the future. These include
tion for an image” [192]. DeepNovo learns amino acid sequence optimal feature analysis in protein big data, robust deep learning
patterns of the peptide in association with the feature’s spectra for protein noisy data, network architecture optimization for
by designing the model of spectrum-CNN coupled with LSTM protein data mining, efficient deep learning with limited protein
and provides a complete end-to-end training and prediction data, multimodal deep learning for heterogeneous protein data,
solution. Furthermore, DeepNovo is extended to DeepNovo-DIA and interpretable deep learning for protein understanding.
for data-independent acquisition (DIA) of MS data [193]. The key
idea of this extended model is to learn features of fragment ions
Optimal feature analysis in protein big data
and peptide sequences from DIA MS data.
In summary, deep architectures used for MS interpretation Various types of shallow features extracted from proteins have
are listed in Table 7. Although these approaches have achieved been adopted by deep learning approaches. Protein data are
state-of-the-art performance in neoantigen identification [187], becoming bigger not only in terms of the abundance of pat-
peptide inference [189], peptide MS prediction [185], and peptide terns (data instances or tuples) but also in the dimensionality
sequencing [192, 193], other mechanisms [248, 249] are needed to of features. Irrelevant or redundant features may significantly
fuse protein heterogeneous data for protein understanding and degrade the accuracy and efficiency of machine learning algo-
scientific studies. rithms. Selecting the optimal feature subset from protein big
data becomes an urgent task [250, 251].
Due to the properties of protein big data, existing feature
selection methods face demanding challenges in a variety of
Discussion and future trends
phases, for example, the speed of data processing, imbalanced
Deep learning has achieved state-of-the-art performance in pro- data, and dealing with structural features. Traditional feature-
tein data mining from residue-level prediction, sequence-level selection methods face three challenges with respect to big
prediction, 3D structure data mining, interaction prediction, and data: (1) existing methods usually require large amounts of
16 Shi et al.

learning time, so processing speed struggles to keep up with changes in big data; (2) traditional methods are mainly influenced by instances from the majority classes, and this bias makes the selected features unsuitable for predicting rare classes; and (3) most algorithms are designed for generic data and completely ignore the intrinsic structures among features. Current techniques, such as distributed computing [252], graphics processing unit (GPU)-accelerated methods [253], cost-sensitive learning [254], and the least absolute shrinkage and selection operator (lasso) [255], can address these feature selection issues. However, these methods are highly specialized, and how to extract valuable information from protein big data is still an open issue. Additionally, from a systems perspective, it is valuable to construct practical tools or systems for feature selection in the context of protein big data.

Table 8. Issues and future directions of deep learning-based protein analysis

Issues | Description | Future directions
Shallow feature selection | High-dimensional features degrade the accuracy and efficiency of deep learning. However, traditional feature selection approaches cannot solve the problems of large-scale instances, high-dimensional features, imbalanced classes, and structured data caused by protein big data | 1) Large-scale feature selection; 2) Feature selection for imbalanced data; 3) Feature selection for structured data
Protein data with noisy labels | Labels with experimental evidence and labels predicted by machine learning algorithms are usually available at the same time. These labels are very noisy and unlikely to help train deep networks without additional tricks. Label noise may handicap the generalization and efficiency of classifiers | Two strategies for solving this issue: 1) Robust loss functions; 2) Modeling the latent labels
Network architecture optimization | The architecture and topology of a neural network strongly impact its predictions and computational complexity, and its performance is very sensitive to the choice of architecture. However, rich expertise and many laborious trials are usually required to identify a suitable neural network architecture | 1) Various automatic NAS algorithms; 2) Multi-objective NAS; 3) NAS for other networks
Limited amounts of protein data | Although deep learning models trained on a narrower taxonomical scope have higher performance, some species categories have limited amounts of data available, whereas training deep models requires large amounts of data | 1) Unsupervised transfer learning; 2) Supervised transfer learning; 3) Semi-supervised transfer learning
Multimodal protein data | Protein data have multiple modalities, such as sequence, structure, interaction network, and MS. Some special tasks need to process and relate information from these modalities, and using these data in a complementary manner can help in learning a complex task | 1) Deep multimodal fusion at various depths; 2) Optimization of deep fusion architectures; 3) End-to-end models for multimodal translation
Interpretable deep learning | In health-related fields, the output of deep learning determines crucial decisions, so understanding the underlying mechanism is very important. Not only is quantitative algorithmic performance important, but the reason why the algorithms work is also relevant | 1) Intrinsic interpretability approaches; 2) Post hoc interpretability approaches; 3) Prior knowledge-driven mechanisms

Robust deep learning for noisy protein data

In databases of protein properties, labels with experimental evidence and labels without direct experimental evidence are usually available at the same time [256], and the number of predicted labels is much higher than the number of manual labels. To ensure the reliability of deep learning models, existing approaches use only manually annotated samples. However, these methods are not scalable and risk removing crucial examples that may be significant for small datasets. In addition, removing samples with noisy labels works against the need for large-scale data in deep learning approaches.

To guarantee the convergence and high performance of complex deep models, valuable samples with noisy labels are also used for model training, so a mechanism for dealing with noisy labels must be introduced to achieve robust deep learning. There are two strategies for this: robust loss functions and modeling of latent labels [257]. The former designs a loss function that alleviates noise effects, whereas the latter models the latent labels to train the classifier and builds a transition from the latent labels to the observed noisy labels. For instance, in the robust-loss strategy, the predicted labels in the cross-entropy loss can be rectified by a label-correction network trained on an additional clean dataset [258]. In the latent-label strategy, a linear adaptation layer can be added on top of a DNN to model asymmetric label noise [259]. This encourages the network to learn a "pessimistic" noise model that denoises the corrupted labels during learning. However, these approaches risk misestimating some labels when they attempt to correct noisy labels or reweight the terms of all the data points. Therefore, other approaches that represent the trustworthiness of noisy labels [257] or adopt semi-supervised learning by concealing the labels of the noisy set [260] are needed to achieve robust deep learning for protein data mining with noisy labels.

Network architecture optimization for mining protein data

State-of-the-art deep models applied to protein data mining mostly rely on human expertise. Given a specific scenario, rich expertise and many laborious trials are usually required to identify a suitable neural network architecture; for example, how many convolutional layers are optimal is a recurring question in current approaches. Because the architecture and topology of a neural network strongly impact its predictions and computational complexity, its performance is sensitive to the architecture [99].

As manually finding an architecture is arduous and requires exploring many candidate networks, increasing effort has gone into automating the search. To this end, several automated neural architecture search (NAS) algorithms have been developed, mainly based on evolutionary algorithms or reinforcement learning [261]. One remaining challenge for these algorithms is the computational effort needed to find the best network. To reduce this cost, approaches including Auto-Keras based on network morphism [262], deep active learning [263], deep graph Bayesian optimization [264], and multi-agent methods [265] have been proposed for neural architecture optimization. Most of this work has focused on architecture optimization for image classification, with little attention paid to networks in other fields. For protein data mining, considering that existing methods with similar architectures are each applied to only one task, a promising direction is to develop NAS for multitask and multi-objective problems. In addition, given the important role of RNNs and GNNs, it is worth studying architecture optimization for these models.

Efficient deep learning with limited protein data

Bioinformaticians have noticed that predictive performance can be improved by training several deep learning models, each with a narrower taxonomical scope, instead of treating all species with one general model. For instance, in subcellular location prediction, some false predictions can be avoided by disallowing plastid predictions for groups that do not have plastids, such as animals and fungi [110, 134]. However, some categories have limited amounts of available data, which fall into three situations: fully unknown labels, fully known labels, and few experimentally known labels. How to handle such limited data efficiently with deep learning is an open problem.

One common approach is transfer learning, which reuses knowledge from a source domain to solve a new task in a target domain. Given unlabeled samples, unsupervised transfer learning is adopted to improve predictive performance on small datasets [266]. However, most of these approaches focus on homogeneous domain adaptation, where the source and target domains have the same or very similar feature spaces. For small labeled samples from target domains, supervised transfer learning constructs a new deep model by reusing, as initial values, the hidden layers of a deep model trained on large-scale labeled data. Although this method can perform satisfactorily, performance may be poor when the amount of target-domain data is small [267]. In practice, the target domain usually has scant labeled data. To solve this issue, recent semi-supervised approaches build on DNNs to construct semi-supervised transfer learning methods, with promising results on several benchmarks [268]. However, most of their experiments are based on models trained from scratch. Further study is required to construct semi-supervised transfer learning methods that start from pretrained models under varying conditions, including training strategies and architecture choices. Although many challenges remain, the above supervised, unsupervised, and semi-supervised transfer learning methods are promising for dealing with the issues caused by limited protein data.

Multimodal deep learning for heterogeneous protein data

Protein data have multiple modalities, such as sequence, structure, interaction network, and MS, and some special tasks require processing and relating information from several of them. For instance, fusing information from multiple heterogeneous interaction networks can assist in protein function prediction [119]. In addition, de novo peptide sequencing benefits from simultaneously considering protein MS data and residue sequence information [192]. Therefore, using these data in a complementary manner can help in learning a complex task. Although multimodal deep learning can offer improved performance for many practical problems, how to build optimal deep multimodal architectures through search, optimization, and multimodal regularization remains a challenge.

From a fusion perspective, deep multimodal learning techniques can be classified as early-, late-, or intermediate-fusion approaches. The intermediate-fusion approach makes it simpler to fuse modality-wise representations and learn a joint representation, and it allows multimodal fusion at various depths in the architecture [91, 119, 249, 269]. Deep learning still involves much manual design, and experts cannot explore the full space of possible fusion architectures. A promising way forward is to extend learning to the fusion architecture itself and so construct a truly generic learning method for a specific task. In addition, studying deep feature selection may be valuable, as it can improve the generalization and accuracy of deep models [270]. From a translation perspective, end-to-end trained neural networks are currently the most popular choice for multimodal translation, such as image captioning and de novo peptide sequencing from MS [192, 193]. However, generating long sequences with an RNN is especially difficult, since RNN models tend to forget the initial input. This has been partly addressed by neural attention models.

Interpretable deep learning for protein understanding

In bioinformatics and health-related fields, the output of deep learning determines crucial decisions, which are subject to legal consequences and/or administrative audits. Comprehending a model and understanding its underlying mechanism are therefore important to decision-makers. Taking the prediction of adverse drug reactions (ADRs) as an example [271], introducing the attention mechanism into deep learning
models enables the identification of substructures within drug molecules related to a particular ADR. This can help identify risky substructures and may help improve the safety evaluation of pipeline drugs. Not only is quantitative algorithmic performance important; the reason why the algorithms work is also relevant. Hence, it is urgent to develop predictable and explainable deep learning models, especially in health-related fields [14, 15].

From a methodological perspective, existing techniques for interpretable machine learning can be classified as providing either intrinsic or post hoc interpretability [272]. The former is achieved by constructing self-explanatory models that directly incorporate interpretability; the latter requires a second model to provide explanations for an existing model. Promising work on intrinsic interpretability includes attention mechanisms for interpreting unified RNN-CNN models and GNN models. Representative work on post hoc interpretability includes deep k-nearest neighbors (DkNNs) [273] and understanding deep models via influence functions [274]. Since previous studies do not use biological knowledge, other mechanisms are needed to directly incorporate prior knowledge [264].

Key Points

• Deep learning-based approaches for protein big data mining can be classified into sequence, structure, interaction, and MS categories according to the inputs of the deep models.
• Sequence-based predictors can be classified as residue- or sequence-level methods, which adopt DNN, CNN, RNN, or the combination of CNN and RNN to model protein sequences.
• Structure-based approaches can be classified as low-dimensional mapping-, voxel-, and graph-based methods. Low-dimensional mapping methods use 1D and 2D CNNs to model information projected from a 3D structure. Voxel-based approaches adopt 3D CNNs to model the 3D structure directly. Graph-based methods utilize graph convolutional networks to extract high-level features from 3D protein graphs.
• Interaction-based predictors usually utilize early- or late-fusion strategies to realize interaction prediction. The former consists of representation calculation for the two kinds of biomolecules, representation stitching, deep feature extraction, and classification. The latter includes representation calculation, deep feature extraction for the two kinds of biomolecules, deep feature fusion, and classification. Various deep architectures, such as GNN, CNN, and DNN, can be applied to this task.
• Architectures of DNN, CNN, RNN, and hybrids of CNN and RNN are utilized to interpret MS data. Tasks include peptide identification, proteome inference, and peptide sequencing from MS data.
• Future challenges include optimal feature analysis, robust deep learning, network architecture optimization, efficient deep learning with limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning. The combination of deep learning and protein big data points to a prosperous future on a new frontier.

Supplementary Data

Supplementary data are available online at https://academic.oup.com/bib.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 61772217 and Grant 71771098 and by the Fundamental Research Funds for the Central Universities under Grant 2016YXMS104 and Grant 2017KFYXJJ225.

References

1. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436.
2. Alexander R, et al. Machine learning at the energy and intensity frontiers of particle physics. Nature 2018;560(7716):41–8.
3. Segler MHS, Preuss M, Waller MP. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 2018;555(7698):604.
4. Coudray N, et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat Med 2018;24(10):1559.
5. O'Connell J, et al. SPIN2: predicting sequence profiles from protein structures using deep neural networks. Proteins Struct Funct Bioinf 2018;86(6):629–633.
6. Angermueller C, et al. Deep learning for computational biology. Mol Syst Biol 2016;12(7):878.
7. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinf 2017;18(5):851–69.
8. Wainberg M, et al. Deep learning in biomedicine. Nat Biotechnol 2018;36(9):829.
9. Mamoshina P, et al. Applications of deep learning in biomedicine. Mol Pharm 2016;13(5):1445–54.
10. Cao C, et al. Deep learning and its applications in biomedicine. Genomics Proteomics Bioinf 2018;16(1):17–32.
11. Baldi P. Deep learning in biomedical data science. Annu Rev Biomed Data Sci 2018;1:181–205.
12. Greenspan H, Van Ginneken B, Summers RM. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans Med Imaging 2016;35(5):1153–9.
13. Sun M, et al. Graph convolutional networks for computational drug development and discovery. Brief Bioinf 2019. https://doi.org/10.1093/bib/bbz042.
14. Miotto R, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinf 2017;19(6).
15. Kwak GH-J, Hui P. DeepHealth: deep learning for health informatics. arXiv preprint arXiv:1909.00384, 2019.
16. Zhang L, et al. From machine learning to deep learning: progress in machine intelligence for rational drug discovery. Drug Discov Today 2017;22(11):1680–5.
17. Klausen MS, et al. NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning. Proteins: Struct Funct Bioinf 2019;87(6):520–527.
18. Khurana S, et al. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2018;34(15):2605–2613.
19. Zhang B, Li J, Qiang L. Prediction of 8-state protein secondary structures by a novel deep learning architecture. BMC Bioinf 2018;19(1):293.
20. Hou J, Guo Z, Cheng J. DNSS2: improved ab initio protein secondary structure prediction using advanced deep learning architectures. bioRxiv 2019:639021.
21. Yang Y, et al. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinf 2016;19(3):482–94.
22. Jiang Q, et al. Protein secondary structure prediction: a survey of the state of the art. J Mol Graph Model 2017;76:379–402.
23. Wardah W, et al. Protein secondary structure prediction using neural networks and deep learning: a review. Comput Biol Chem 2019;81:1–8.
24. Sun T, et al. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinf 2017;18(1):277.
25. Du X, et al. DeepPPI: boosting prediction of protein–protein interactions with deep neural networks. J Chem Inf Model 2017;57(6):1499–510.
26. Zhu J, et al. Protein threading using residue co-variation and deep learning. Bioinformatics 2018;34(13):i263–73.
27. Wang J, et al. Computational protein design with deep learning neural networks. Sci Rep 2018;8(1):6349.
28. Müller AT, Hiss JA, Schneider G. Recurrent neural network model for constructive peptide design. J Chem Inf Model 2018;58(2):472–9.
29. Paladino A, et al. Protein design: from computer models to artificial intelligence. Wiley Interdiscip Rev: Comput Mol Sci 2017;7(5):e1318.
30. Chen Z, et al. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinf 2018. https://doi.org/10.1093/bib/bby089.
31. Kulmanov M, Khan MA, Hoehndorf R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2017;34(4):660–8.
32. Cao R, et al. ProLanGO: protein function prediction using neural machine translation based on a recurrent neural network. Molecules 2017;22(10):1732.
33. Liu X. Deep recurrent neural network for protein function prediction from sequence. bioRxiv 2017:103994.
34. Kulmanov M, Hoehndorf R. DeepGOPlus: improved protein function prediction from sequence. bioRxiv 2019:615260.
35. Yang Y, et al. SPIDER2: A Package to Predict Secondary Structure, Accessible Surface Area, and Main-Chain Torsional Angles by Deep Neural Networks, 2017.
36. Jurtz VI, et al. An introduction to deep learning on biological sequence data: examples and solutions. Bioinformatics 2017;33(22):3685–90.
37. Wang S, et al. RaptorX-property: a web server for protein structure property prediction. Nucleic Acids Res 2016;44(W1):W430–W435.
38. Heffernan R, et al. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep 2015;5(1):11476–6.
39. Klausen MS, et al. NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning. Proteins 2019;87(6):520–7.
40. Heffernan R, et al. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 2017;33(18):2842–9.
41. Fang C, et al. MUFold-SSW: a new web server for predicting protein secondary structures, torsion angles, and turns. Bioinformatics 2019.
42. Gao J, Yang Y, Zhou Y. Predicting the errors of predicted local backbone angles and non-local solvent-accessibilities of proteins by deep neural networks. Bioinformatics 2016;32(24):3768–73.
43. Zimmermann O. Backbone dihedral angle prediction. In: Prediction of Protein Secondary Structure. New York: Humana Press, Springer, 2017, 65–82.
44. Lyons J, et al. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem 2014;35(28):2040–6.
45. Gao J, Yang Y, Zhou Y. Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures. BMC Bioinf 2018;19(1):29.
46. Deng LL, Fan C, Zeng Z. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction. BMC Bioinf 2017;18(16):569.
47. Nie L, et al. Prediction of protein S-sulfenylation sites using a deep belief network. Curr Bioinforma 2017;12(5):461–7.
48. Xie Y, et al. DeepNitro: prediction of protein nitration and nitrosylation sites by deep learning. Genomics Proteomics Bioinf 2018;16(4):294–306.
49. Chandra A, et al. PhoglyStruct: prediction of phosphoglycerylated lysine residues using structural properties of amino acids. Sci Rep 2018;8(1):17923.
50. Le NQK, Sandag GA, Ou Y-Y. Incorporating post translational modification information for enhancing the predictive performance of membrane transport proteins. Comput Biol Chem 2018;77:251–60.
51. Lumbanraja FR, et al. An evaluation of deep neural network performance on limited protein phosphorylation site prediction data. Procedia Comput Sci 2019;157:25–30.
52. Wu M, et al. A deep learning method to more accurately recall known lysine acetylation sites. BMC Bioinf 2019;20(1):49.
53. Wang Y, Hua M, Zhang Y. Protein secondary structure prediction by using deep learning method 73. Knowl-Based Syst 2016;118:S0950705116304713.
54. Spencer M, Eickholt J, Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 2015;12(1):103–12.
55. Yavuz BC, Yurtay N, Ozkan O. Prediction of protein secondary structure with clonal selection algorithm and multilayer perceptron. IEEE Access 2018;6:45256–61.
56. Shuaiyan Z, Yihui L, Jinyong C. The prediction of protein secondary structure based on auto encoder. In: International Conference on Natural Computation, 2017.
57. Stahl K, Schneider M, Brock O. EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinf 2017;18(1):303.
58. Eickholt J, Cheng J. Predicting protein residue–residue contacts using deep networks and boosting. Bioinformatics 2012;28(23):3066–72.
59. Eickholt J, Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinf 2013;14(1):88.
60. Zhou J, et al. CNNsite: prediction of DNA-binding residues in proteins using convolutional neural network with sequence features. In: IEEE International Conference on Bioinformatics & Biomedicine, 2017.
61. Zhang Q, Zhu L, Huang D-S. High-order convolutional neural network architecture for predicting DNA-protein binding sites. IEEE/ACM Trans Comput Biol Bioinform 2018.
62. Savojardo C, et al. DeepSig: deep learning improves signal peptide detection in proteins. Bioinformatics 2017;34(10).
63. Wang S, Ma J, Xu J. AUCpreD: proteome-level protein disorder prediction by AUC-maximized deep convolutional neural fields. Bioinformatics 2016;32(17):i672–9.
64. Wang S, Sun S, Xu J. AUC-Maximized deep convolutional neural fields for protein sequence labeling. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Cham: Springer, 2016.
65. Wang S, et al. DeepCNF-D: predicting protein order/disorder regions by weighted deep convolutional neural fields. Int J Mol Sci 2015;16(8):17315–30.
66. Wang S, et al. Protein secondary structure prediction using deep convolutional neural fields. Sci Rep 2016;6:18962.
67. Fang C, Shang Y, Xu D. MUFOLD-SS: new deep inception-inside-inception networks for protein secondary structure prediction. Proteins: Struct Funct Bioinf 2018;86(5):592–8.
68. Fang C, Shang Y, Xu D. A new deep neighbor residual network for protein secondary structure prediction. In: 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2017.
69. Busia A, Collins J, Jaitly N. Protein secondary structure prediction using deep multi-scale convolutional neural networks and next-step conditioning. arXiv preprint arXiv:1611.01503, 2016.
70. Zhou J, et al. CNNH_PSS: protein 8-class secondary structure prediction by convolutional neural network with highway. BMC Bioinf 2018;19(4):60.
71. Busia A, Jaitly N. Next-step conditioned deep convolutional neural networks improve protein secondary structure pre-
80. Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018;34(19):3308–15.
81. Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2017;34(9):1466–72.
82. Schaarschmidt J, et al. Assessment of contact predictions in CASP12: co-evolution and deep learning coming of age. Proteins: Struct Funct Bioinf 2018;86:51–66.
83. Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017;33(17):2675–83.
84. Cui Y, et al. Predicting protein-ligand binding residues with deep convolutional neural networks. BMC Bioinf 2019;20(1).
85. Ragoza M, et al. Protein–ligand scoring with convolutional neural networks. J Chem Inf Model 2017;57(4):942–57.
86. Zeng H, Gifford DK. DeepLigand: accurate prediction of MHC class I ligands using peptide embedding. Bioinformatics 2019;35(14):i278–83.
87. Fang C, Shang Y, Xu D. MUFold-BetaTurn. In: A Deep Dense Inception Network for Protein Beta-Turn Prediction, 2018.
88. Fang C, Shang Y, Xu D. Improving protein gamma-turn prediction using inception capsule networks. Sci Rep 2018;8(1):15741.
89. Fang C, Shang Y, Xu D. A deep dense inception network for protein beta-turn prediction. Proteins: Struct Funct Bioinf 2019.
90. Fu H, et al. DeepUbi: a deep learning framework for prediction of ubiquitination sites in proteins. BMC Bioinf 2019;20(1).
91. Fei H, et al. A multimodal deep architecture for large-scale protein ubiquitylation site prediction. In: IEEE International
diction. arXiv preprintarXiv:1702.03865 2017. Conference on Bioinformatics & Biomedicine, 2017.
72. Zhou J, Troyanskaya OG. Deep supervised and convo- 92. Wang D, et al. MusiteDeep: a deep-learning framework for
lutional generative stochastic network for protein sec- general and kinase-specific phosphorylation site predic-
ondary structure prediction. arXiv: Quantitative Methods tion. Bioinformatics 2017;33(24).
2014. 93. Wang D, Liang Y, Xu D. Capsule network for protein post-
73. Gao Y, et al. RaptorX-angle: real-value prediction of protein translational modification site prediction. Bioinformatics
backbone dihedral angles through a hybrid method of clus- 2019;35(14):2386–2394.
tering and deep learning. BMC Bioinf 2018;19(4):100. 94. Luo F, et al. DeepPhos: prediction of protein
74. Gao Y, et al. Real-value and confidence prediction of protein phosphorylation sites with deep learning. Bioinformatics
backbone dihedral angles through a hybrid method of clus- 2019;35(16):2766–2773.
tering and deep learning. arXiv preprintarXiv:1712.07244 95. He F, et al. Large-scale prediction of protein ubiquitination
2017. sites using a multimodal deep architecture. BMC Syst Biol
75. Fang C, Shang Y, Xu D. Prediction of protein backbone tor- 2018;12(6):109.
sion angles using deep residual inception neural networks. 96. Li F, et al. DeepCleave: a deep learning predictor for caspase
IEEE/ACM Trans Comput Biol Bioinform 2018. and matrix metalloprotease substrates and cleavage sites.
76. Lin Z, Lanchantin J, Qi Y. MUST-CNN: a multilayer shift- Bioinformatics 2019. doi: 10.1093/bioinformatics/btz721.
and-stitch deep convolutional architecture for sequence- 97. Long H, Wang M, Fu H. Deep convolutional neural networks
based protein structure prediction. In: Thirtieth AAAI Con- for predicting hydroxyproline in proteins. Curr Bioinforma
ference on Artificial Intelligence, 2016. 2017;12(3):233–8238.
77. Haberal I, Ogul H. DeepMBS: prediction of protein metal 98. Zhou J, et al. EL_LSTM: prediction of DNA-binding
binding-site using deep learning networks. In: International residue from protein sequence by combining long
Conference on Mathematics & Computers in Sciences & in Indus- short-term memory and ensemble learning. IEEE/ACM
try, 2017. Trans Comput Biol Bioinform 2018; Early Access, doi:
78. Zheng J, et al. Deep-RBPPred: predicting RNA binding pro- 10.1109/TCBB.2018.2858806.
teins in the proteome scale based on deep learning. Sci Rep 99. Li H, et al. Deep learning methods for protein torsion angle
2018;8(1):15264. prediction. BMC Bioinf 2017;18(1):417.
79. Wang S, et al. Accurate de novo prediction of protein con- 100. Zhang B, Li L, Lü Q. Protein solvent-accessibility prediction
tact map by ultra-deep learning model. PLoS Comput Biol by a stacked deep bidirectional recurrent neural network.
2017;13(1):e1005324. Biomolecules 2018;8(2):33.
101. Heffernan R, et al. Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning. J Comput Chem 2018;39(26):2210–6.
102. Hanson J, et al. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks. Bioinformatics 2016;33(5):685–92.
103. Hanson J, et al. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 2018;34(23):4039–45.
104. Johansen AR, et al. Deep recurrent conditional random field network for protein secondary prediction. In: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM, 2017.
105. Li Z, Yu Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks. arXiv preprint arXiv:1604.07176 2016.
106. Guo Y, et al. DeepACLSTM: deep asymmetric convolutional long short-term memory neural models for protein secondary structure prediction. BMC Bioinf 2019;20(1):341.
107. Drori I, et al. High quality prediction of protein Q8 secondary structure by diverse neural network architectures. arXiv preprint arXiv:1811.07143 2018.
108. Uddin MR, et al. SAINT: self-attention augmented inception-inside-inception network improves protein secondary structure prediction. bioRxiv 2019:786921.
109. Liberis E, et al. Parapred: antibody paratope prediction using convolutional and recurrent neural networks. Bioinformatics 2018;34(17):2944–50.
110. Armenteros JJA, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol 2019;1.
111. Kaleel M, et al. PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning. Amino Acids 2019;1–8.
112. Shi Q, et al. DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network. Bioinformatics 2019.
113. Ludwiczak J, et al. PiPred–a deep-learning method for prediction of π-helices in protein sequences. Sci Rep 2019;9(1):6888.
114. Long H, et al. A hybrid deep learning model for predicting protein hydroxylation sites. Int J Mol Sci 2018;19(9):2817.
115. Qu YH, et al. On the prediction of DNA-binding proteins only from primary sequences: a deep learning approach. PLoS One 2017;12(12):e0188129.
116. Jo T, et al. Improving protein fold recognition by deep learning networks. Sci Rep 2015;5(1):17573.
117. Fa R, et al. Predicting human protein function with multi-task deep neural networks. PLoS One 2018;13(6).
118. Gao R, et al. Prediction of enzyme function based on three parallel deep CNN and amino acid mutation. Int J Mol Sci 2019;20(11):2845.
119. Gligorijevic V, Barot M, Bonneau R. deepNF: deep network fusion for protein function prediction. Bioinformatics 2017;34(22):3873–81.
120. Rifaioglu AS, et al. DEEPred: automated protein function prediction with multi-task feed-forward deep neural networks. Sci Rep 2019;9(1):7344.
121. Wei L, et al. Prediction of human protein subcellular localization using deep learning. J Parallel Distrib Comput 2017;117:212–7.
122. Ali M, et al. Prediction of bacteriophage protein locations using deep neural networks. In: Emerging Technologies in Data Mining and Information Security. Singapore: Springer, 2019, 29–38.
123. Wang Q, et al. A novel framework for the identification of drug target proteins: combining stacked auto-encoders with a biased support vector machine. PLoS One 2017;12(4):e0176486.
124. Mayr A, et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 2018;9(24):5441–51.
125. Hou J, Adhikari B, Cheng J. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 2017;34(8):1295–303.
126. Derevyanko G, et al. Deep convolutional networks for quality assessment of protein folds. Bioinformatics 2018;34(23):4046–53.
127. Sønderby SK, et al. Convolutional LSTM networks for subcellular localization of proteins. In: International Conference on Algorithms for Computational Biology. Springer, 2015.
128. Seo S, et al. DeepFam: deep learning based alignment-free method for protein family modeling and prediction. Bioinformatics 2018;34(13):i254–62.
129. Li S, Chen J, Liu B. Protein remote homology detection based on bidirectional long short-term memory. BMC Bioinf 2017;18(1):443.
130. Liu B, Li S. ProtDet-CCH: protein remote homology detection by combining long short-term memory and ranking methods. IEEE/ACM Trans Comput Biol Bioinform 2018.
131. Chen J, et al. A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief Bioinform 2016;19(2):231–44.
132. Tsubaki M, Shimbo M, Matsumoto Y. Protein fold recognition with representation learning and long short-term memory. IPSJ Trans Bioinf 2017;10:2–8.
133. Yi H-C, et al. ACP-DL: a deep learning long short-term memory model to predict anticancer peptides using high-efficiency feature representation. Molecular Therapy-Nucleic Acids 2019;17:1–9.
134. Almagro Armenteros JJ, et al. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 2017;33(21):3387–95.
135. Savojardo C, et al. BUSCA: an integrative web server to predict subcellular localization of proteins. Nucleic Acids Res 2018;46(W1):W459–66.
136. Li Y, et al. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018;34(5):760–9.
137. Zou Z, et al. mlDEEPre: multi-functional enzyme function prediction with hierarchical multi-label deep learning. Front Genet 2018;9:714.
138. Veltri D, Kamath U, Shehu A. Deep learning improves antimicrobial peptide recognition. Bioinformatics 2018;34(16):2740–7.
139. Schneider P, et al. Hybrid network model for “deep learning” of chemical data: application to antimicrobial peptides. Mol Inf 2017;36(1–2):1600011.
140. Chen H, et al. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Bioinformatics 2019;35(14):i284–94.
141. Nguyen SP, Shang Y, Xu D. DL-PRO: a novel deep learning method for protein model quality assessment. Proc Int Jt Conf Neural Netw 2014;2014:2071–8.
142. Cang Z, Wei GW. TopologyNet: topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Comput Biol 2017;13(7):e1005690.
143. Zacharaki EI. Prediction of protein function using a deep convolutional neural network ensemble. PeerJ 2017;3:1–17.
144. Amidi A, et al. EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation. PeerJ 2018;6:e4750.
145. Torng W, Altman RB. 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinf 2017;18(1):302.
146. Jiménez J, et al. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017;33(19):3036–42.
147. Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Development and evaluation of a deep learning model for protein-ligand binding affinity prediction. Bioinformatics 2018;34(21).
148. Jiménez J, et al. KDEEP: protein-ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model 2018.
149. Gomes J, et al. Atomic convolutional networks for predicting protein-ligand binding affinity. arXiv preprint arXiv:1703.10603 2017.
150. Pu L, et al. DeepDrug3D: classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput Biol 2019;15(2):e1006718.
151. Skalic M, et al. LigVoxel: inpainting binding pockets using 3D-convolutional neural networks. Bioinformatics 2018;35(2):243–50.
152. Pagès G, Charmettant B, Grudinin S. Protein model quality assessment using 3D oriented convolutional neural networks. Bioinformatics 2019;35(18):3313–9.
153. Cantoni V, et al. A Supervised Approach to 3D Structural Classification of Proteins, 2013.
154. Wu B, et al. DGCNN: disordered graph convolutional neural network based on the Gaussian mixture model. Neurocomputing 2018;321:346–56.
155. Cho H, Choi IS. Three-Dimensionally Embedded Graph Convolutional Network (3DGCN) for Molecule Interpretation, 2018.
156. Liu K, et al. Chemi-net: a molecular graph convolutional network for accurate drug property prediction. Int J Mol Sci 2019;20(14):3389.
157. Bianchini M, et al. Deep Neural Networks for Structured Data, 2018.
158. Pan X, et al. IPMiner: hidden ncRNA-protein interaction sequential pattern mining with stacked autoencoder for accurate computational prediction. BMC Genomics 2016;17(1):582.
159. Zhan Z, et al. BGFE: a deep learning model for ncRNA-protein interaction predictions based on improved sequence information. Int J Mol Sci 2019;20(4):978.
160. Zhan Z, et al. Efficient framework for predicting ncRNA-protein interactions based on sequence information by deep learning. In: International Conference on Intelligent Computing, 2018.
161. Wang L, et al. A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network. J Comput Biol 2017;25(3):361–73.
162. Wan F, Zeng J. Deep learning with feature embedding for compound-protein interaction prediction. bioRxiv 2016:086033.
163. Hamanaka M, et al. CGBVS-DNN: prediction of compound-protein interactions based on deep learning. Mol Inf 2017;36:1600045.
164. Wang Y-B, et al. Predicting protein–protein interactions from protein sequences by a stacked sparse autoencoder deep neural network. Mol Biosyst 2017;13(7):1336–44.
165. Patel S, et al. DeepInteract: deep neural network based protein-protein interaction prediction tool. Curr Bioinforma 2017;12(6):551–7.
166. Zhao Z, Gong X. Protein-protein interaction interface residue pair prediction based on deep learning architecture. IEEE/ACM Trans Comput Biol Bioinform 2017;1–1.
167. Lei H, et al. Protein-protein interactions prediction via multimodal deep polynomial network and regularized extreme learning machine. IEEE J Biomed Health Inf 2018.
168. Wang L, et al. Combining high speed ELM learning with a deep convolutional neural network feature encoding for predicting protein-RNA interactions. IEEE/ACM Trans Comput Biol Bioinform 2018.
169. Wang L, et al. Prediction of RNA-protein interactions by combining deep convolutional neural network with feature selection ensemble method. J Theor Biol 2019;461:230–8.
170. Yi H-C, et al. A deep learning framework for robust and accurate prediction of ncRNA-protein interactions using evolutionary information. Molecular Therapy-Nucleic Acids 2018;11:337–44.
171. Tian K, et al. Boosting compound-protein interaction prediction by deep learning. Methods 2016;110:64–72.
172. Lee I, Keum J, Nam H. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol 2019;15(6):e1007129.
173. Öztürk H, Özgür A, Ozkirimli E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 2018;34(17):i821–9.
174. Feng Q, et al. PADME: a deep learning-based framework for drug-target interaction prediction. arXiv preprint arXiv:1807.09741 2018.
175. Lim J, et al. Predicting drug-target interaction using a novel graph neural network with 3D structure-embedded graph representation. J Chem Inf Model 2019.
176. Zitnik M, Agrawal M, Leskovec J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018;34(13):i457–66.
177. Fout A, et al. Protein interface prediction using graph convolutional networks. In: Advances in Neural Information Processing Systems, 2017.
178. Ben-Bassat I, Chor B, Orenstein Y. A deep neural network approach for learning intrinsic protein-RNA binding preferences. Bioinformatics 2018;34(17):i638–46.
179. Peng C, et al. RPITER: a hierarchical deep learning framework for ncRNA–protein interaction prediction. Int J Mol Sci 2019;20(5):1070.
180. Richoux F, et al. Comparing two deep learning sequence-based models for protein-protein interaction prediction. arXiv: Learning 2019.
181. Karimi M, et al. DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 2019;35(18):3329–38.
182. Tsubaki M, Tomii K, Sese J. Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 2018;35(2):309–18.
183. Li H, et al. Deep neural network based predictions of protein interactions using primary sequences. Molecules 2018;23(8):1923.
184. Chen M, et al. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019;35(14):i305–14.
185. Bulik-Sullivan B, et al. Deep learning using tumor HLA peptide mass spectrometry datasets improves neoantigen identification. Nat Biotechnol 2019;37(1):55.
186. Thomas SA, et al. Dimensionality reduction of mass spectrometry imaging data using autoencoders. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, Athens, Greece 2016:1–7.
187. Gessulat S, et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods 2019;16(6):509.
188. Behrmann J, et al. Deep learning for tumor classification in imaging mass spectrometry. Bioinformatics 2017;34(7):1215–23.
189. Kim M, Eetemadi A, Tagkopoulos I. DeepPep: deep proteome inference from peptide profiles. PLoS Comput Biol 2017;13(9):e1005661.
190. Zhou XX, et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal Chem 2017;89(23):12690–7.
191. Zeng W-F, et al. MS/MS spectrum prediction for modified peptides using pDeep2 trained by transfer learning. Anal Chem 2019;91(15):9724.
192. Tran NH, et al. De novo peptide sequencing by deep learning. Proc Natl Acad Sci U S A 2017;114(31):201705691.
193. Tran NH, et al. Deep learning enables de novo peptide sequencing from data-independent-acquisition mass spectrometry. Nat Methods 2019;16(1):63–6.
194. Ofer D, Linial M. ProFET: feature engineering captures high-level protein functions. Bioinformatics 2015;31:btv345.
195. Chen Z, et al. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 2018;34(14):2499–502.
196. Zhang P, et al. PROFEAT update: a protein features web server with added facility to compute network descriptors for studying omics-derived networks. J Mol Biol 2017;429(3):416–25.
197. Aebersold R, Mann M. Mass-spectrometric exploration of proteome structure and function. Nature 2016;537(7620):347.
198. Ma C. DeepQuality: mass spectra quality assessment via compressed sensing and deep learning. arXiv: Quantitative Methods 2017.
199. Sinitcyn P, Rudolph JD, Cox J. Computational methods for understanding mass spectrometry–based shotgun proteomics data. Annu Rev Biomed Data Sci 2018;1:207–34.
200. Liu B, Chen J, Wang X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics 2015;290(5):1919–31.
201. Chou KC. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 2005;21(1):10–9.
202. Ismail HD, et al. RF-Phos: a novel general phosphorylation site prediction tool based on random forest. Biomed Res Int 2016;2016:3281590.
203. Meiler J, et al. Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks. J Mol Model 2001;7(9):360–9.
204. Kawashima S, Ogata H, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res 1999;27(1):368–9.
205. Atchley WR, et al. Solving the protein sequence metric problem. Proc Natl Acad Sci U S A 2005;102(18):6395–400.
206. Seemayer S, Gruber M, Söding J. CCMpred–fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 2014;30(21):3128.
207. Kabakçıoğlu A, et al. Statistical properties of contact vectors. Phys Rev E Stat Nonlinear Soft Matter Phys 2002;65:041904.
208. Kinjo AR, Horimoto K, Nishikawa K. Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins: Struct Funct Bioinf 2005;58(1):158–65.
209. Quan L, Lv Q, Zhang Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics 2016;32(19):2936.
210. Shen J, et al. Predicting protein–protein interactions based only on sequences information. Proc Natl Acad Sci U S A 2007;104(11):4337–41.
211. You Z, et al. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinf 2013;14(8):1–11.
212. Zhao Y, Chen Y, Jiang M. Predicting protein-protein interactions from protein sequences using probabilistic neural network and feature combination. J Inf Comput Sci 2014;11(7):2397–406.
213. Rawi R, et al. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 2018;34(7):1092–8.
214. Hebditch M, et al. Protein-sol: a web tool for predicting protein solubility from sequence. Bioinformatics 2017;33(19):3098.
215. Magnan CN, Randall A, Baldi P. SOLpro: accurate sequence-based prediction of protein solubility. Bioinformatics 2009;25(17):2200.
216. Ikai A. Thermostability and aliphatic index of globular proteins. J Biochem 1980;88(6):1895–8.
217. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157(1):105–32.
218. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinf 2009;10(1):421.
219. Remmert M, et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 2012;9(2):173–5.
220. Gorji HT, Haddadnia J. A novel method for early diagnosis of Alzheimer’s disease based on pseudo Zernike moment from structural MRI. Neuroscience 2015;305:361–71.
221. Magnan CN, Baldi P. SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity. Bioinformatics 2014;30(18):2592–7.
222. Buchan DWA, et al. Scalable web services for the PSIPRED protein analysis workbench. Nucleic Acids Res 2013;41(Web Server issue):W349–57.
223. Heffernan R, et al. Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins. Bioinformatics 2015;32(6):843–9.
224. Deng X, Eickholt J, Cheng J. PreDisorder: ab initio sequence-based prediction of protein disordered regions. BMC Bioinf 2009;10(1):436.
225. Ward JJ, et al. Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 2004;337(3):635–45.
226. Finn RD, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res 2016;44(Database issue):D279–85.
227. Xia K, Wei GW. Persistent homology analysis of protein structure, flexibility, and folding. Int J Numer Methods Biomed Eng 2014;30(8):814–44.
228. Xia K, et al. Persistent homology for the quantitative prediction of fullerene stability. J Comput Chem 2015;36(6):408–22.
229. Haddadnia J, Ahmadi M, Faez K. An efficient feature extraction method with pseudo-Zernike moment in RBF neural network-based human face recognition system. EURASIP J Adv Signal Process 2003;2003(9):1–12.
230. Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations. Nat Methods 2018;15(10):816–22.
231. He K, et al. Deep Residual Learning for Image Recognition, 2015.
232. de Jesus DR, et al. Capsule networks for protein structure classification and prediction. arXiv preprint arXiv:1808.07475 2018.
233. Zhu Y, et al. Using predicted shape string to enhance the accuracy of γ-turn prediction. Amino Acids 2012;42(5):1749–55.
234. Gers FA, Schmidhuber J. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Trans Neural Netw 2001;12(6):1333–40.
235. Greff K, et al. LSTM: a search space odyssey. IEEE Trans Neural Netw Learn Syst 2016;28(10):2222–32.
236. Graves A, Fernández S, Schmidhuber J. Bidirectional LSTM networks for improved phoneme classification and recognition. In: International Conference on Artificial Neural Networks. Springer, 2005.
237. Min X, et al. Chromatin accessibility prediction via convolutional long short-term memory networks with k-mer embedding. Bioinformatics 2017;33(14):i92–101.
238. Davidsen K, et al. Deep generative models for T cell receptor protein sequences. Elife 2019;8.
239. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 2014.
240. Freitas A, Carvalho A. A tutorial on hierarchical classification with applications in bioinformatics. In: Research and Trends in Data Mining Technologies and Applications. Hershey, Pennsylvania: IGI Global, 2007, 175–208.
241. Shi C, et al. Deep learning in the study of protein-related interactions: Review. Protein Pept Lett 2019;26:1–11.
242. Pan X, et al. Recent methodology progress of deep learning for RNA–protein interaction prediction. Wiley Interdisciplinary Reviews—RNA 2019:e1544.
243. Moore KS, ’t Hoen PAC. Computational approaches for the analysis of RNA–protein interactions: a primer for biologists. J Biol Chem 2019;294(1):1–9.
244. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing 2006;70(1–3):489–501.
245. Hashemifar S, et al. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics 2018;34(17):i802–10.
246. Martínez-Martínez JM, et al. Regularized extreme learning machine for regression problems. Neurocomputing 2011;74(17):3716–21.
247. Zhou J, et al. Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434 2018.
248. Kanezaki A, et al. Deep learning for multimodal data fusion. In: Multimodal Scene Understanding. Elsevier, Pittsburgh: Academic Press, 2019, 9–39.
249. Ramachandram D, Taylor GW. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process Mag 2017;34(6):96–108.
250. Rong M, Gong D, Gao X. Feature selection and its use in big data: challenges, methods, and trends. IEEE Access 2019;7:19709–25.
251. Wang L, Wang Y, Chang Q. Feature selection methods for big data bioinformatics: a survey from the search perspective. Methods 2016;111:21–31.
252. Peralta D, et al. Evolutionary feature selection for big data classification: a MapReduce approach. Math Probl Eng 2015;2015:1–11.
253. Escobar JJ, et al. Issues on GPU Parallel Implementation of Evolutionary High-Dimensional Multi-objective Feature Selection, 2017.
254. Hamidi H, Daraei A. A novel two-step feature selection based cost sensitive myocardial infarction prediction model. Int J Comput Intell Syst 2018;11(1):861–72.
255. Kim S, Xing EP. Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity, 2009.
256. Buza TJ, McCarthy FM, Burgess SC. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome. BMC Genomics 2007;8(1):425.
257. Yao J, et al. Deep Learning from Noisy Image Labels with Quality Embedding, 2017.
258. Veit A, et al. Learning from Noisy Large-Scale Datasets with Minimal Supervision, 2017.
259. Bekker AJ, Goldberger J. Training deep neural-networks based on unreliable labels. In: IEEE International Conference on Acoustics, 2016.
260. Ding Y, et al. A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels, 2018.
261. Balaprakash P, et al. Scalable reinforcement-learning-based neural architecture search for cancer deep learning research. arXiv preprint arXiv:1909.00311 2019.
262. Jin H, Song Q, Hu X. Auto-Keras: an efficient neural architecture search system. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 2019.
263. Geifman Y, El-Yaniv R. Deep active learning with a neural architecture search. arXiv preprint arXiv:1811.07579 2018.
264. Ma L, Cui J, Yang B. Deep neural architecture search with deep graph Bayesian optimization. arXiv preprint arXiv:1905.06159 2019.
265. Carlucci FM, et al. MANAS: multi-agent neural architecture search. arXiv preprint arXiv:1909.01051 2019.
266. Gopalan R, Li R, Chellappa R. Unsupervised adaptation across domain shifts by generating intermediate data representations. IEEE Trans Pattern Anal Machine Intell 2014;36(11):2288–302.
267. Sawada Y, et al. All-Transfer Learning for Deep Neural Networks and its Application to Sepsis Classification, 2017.
268. Papernot N, et al. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, 2017.
269. Zhao X, et al. General and species-specific lysine acetylation site prediction using a bi-modal deep architecture. IEEE Access 2018;6:63560–9.
270. Li Y, Chen C-Y, Wasserman WW. Deep feature selection: theory and application to identify enhancers and promoters. J Comput Biol 2016;23(5):322–36.
271. Dey S, et al. Predicting adverse drug reactions through interpretable deep learning framework. BMC Bioinf 2018;19(21):476.
272. Murdoch WJ, et al. Interpretable machine learning: definitions, methods, and applications. arXiv preprint arXiv:1901.04592 2019.
273. Papernot N, McDaniel P. Deep k-nearest neighbors: towards confident, interpretable and robust deep learning. arXiv preprint arXiv:1803.04765 2018.
274. Koh PW, Liang P. Understanding black-box predictions via influence functions. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, 2017, JMLR.org.