Structure-Based Protein Function Prediction Using Graph Convolutional Networks
https://doi.org/10.1038/s41467-021-23303-9
Ramnik J. Xavier 5,9,10,11, Rob Knight 2,12,13, Kyunghyun Cho14,15 & Richard Bonneau 1,4,14,16 ✉
The rapid increase in the number of proteins in sequence databases and the diversity of their
functions challenge computational approaches for automated function prediction. Here, we
introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by
leveraging sequence features extracted from a protein language model and protein struc-
tures. It outperforms current leading methods and sequence-based Convolutional Neural
Networks and scales to the size of current sequence repositories. Augmenting the training set
of experimental structures with homology models allows us to significantly expand the
number of predictable functions. DeepFRI has significant de-noising capability, with only a
minor drop in performance when experimental structures are replaced by protein models.
Class activation mapping allows function predictions at an unprecedented resolution,
allowing site-specific annotations at the residue-level in an automated manner. We show the
utility and high performance of our method by annotating structures from the PDB and
SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a
webserver at https://beta.deepfri.flatironinstitute.org/.
1 Center for Computational Biology, Flatiron Institute, New York, NY, USA. 2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
3 Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland. 4 Courant Institute of Mathematical Sciences, Department of Computer
Science, New York University, New York, NY, USA. 5 Broad Institute of MIT and Harvard, Cambridge, MA, USA. 6 The Liggins Institute, University of
Auckland, Auckland, New Zealand. 7 Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA. 8 Scientific Computing
Core, Flatiron Institute, Simons Foundation, New York, NY, USA. 9 Center for Computational and Integrative Biology, Massachusetts General Hospital and
Harvard Medical School, Boston, MA, USA. 10 Gastrointestinal Unit, and Center for the Study of Inflammatory Bowel Disease, Massachusetts General
Hospital and Harvard Medical School, Boston, MA, USA. 11 Center for Microbiome Informatics and Therapeutics, MIT, Cambridge, MA, USA. 12 Center for
Microbiome Innovation, University of California San Diego, La Jolla, CA, USA. 13 Department of Computer Science and Engineering, University of California
San Diego, La Jolla, CA, USA. 14 Center for Data Science, New York University, New York, NY, USA. 15 CIFAR Azrieli Global Scholar, New York, NY, USA.
16 Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA. ✉email: vgligorijevic@flatironinstitute.org;
Proteins fold into 3-dimensional structures to carry out a wide variety of functions within the cell1. Even though many functional regions of proteins are disordered, the majority of domains fold into specific and ordered three-dimensional conformations2–6. In turn, the structural features of proteins determine a wide range of functions: from binding specificity and conferring mechanical stability, to catalysis of biochemical reactions, transport, and signal transduction. There are several widely used classification schemes that organize these myriad protein functions, including the Gene Ontology (GO) Consortium7, Enzyme Commission (EC) numbers8, the Kyoto Encyclopedia of Genes and Genomes (KEGG)9, and others. For example, GO classifies proteins into hierarchically related functional classes organized into three different ontologies: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC), to describe different aspects of protein functions.

The advent of efficient low-cost sequencing technologies and advances in computational methods (e.g., gene prediction) have resulted in a massive growth in the number of sequences available in key protein sequence databases like the UniProt Knowledgebase (UniProtKB)10. UniProt currently contains over 100 million sequences, only about 0.5% of which are manually annotated (UniProtKB/Swiss-Prot). Due to considerations of scale, design, and costs of experiments to verify a function, it is safe to posit that most proteins with unknown function (i.e., hypothetical proteins) are unlikely to be experimentally characterized. Understanding the functional roles and studying the mechanisms of newly discovered proteins is one of the most important biological problems in the post-genomic era. In parallel to the growth of sequence data, advances in experimental and computational techniques in structural biology have made the three-dimensional structures of many proteins available11–18. The Protein Data Bank (PDB)19, a repository of three-dimensional structures of proteins, nucleic acids, and complex assemblies, has experienced significant recent growth, reaching almost 170,000 entries. Large databases of comparative models such as SWISS-MODEL also provide valuable resources for studying structure–function relationships13,20.

To address the sequence-function gap, many computational methods have been developed with the goal of automatically predicting protein function. Further, related work is directed at predicting function in a site- or domain-specific manner21–24. Traditional machine learning classifiers, such as support vector machines, random forests, and logistic regression, have been used extensively for protein function prediction. They have established that integrative prediction schemes outperform homology-based function transfer25,26 and that integration of multiple gene- and protein-network features typically outperforms sequence-based features, even though network features are often incomplete or unavailable. Systematic blind prediction challenges, such as the Critical Assessment of Functional Annotation (CAFA127, CAFA228, and CAFA329) and MouseFunc30, are critical in the development of these methods and have shown that integrative machine learning and statistical methods outperform traditional sequence alignment-based methods (e.g., BLAST)26. However, the top-performing CAFA methods typically rely strongly on manually engineered features constructed from either text, sequence, biological networks, or protein structure31. In most cases, for newly sequenced proteins or proteins of poorly studied organisms, these features are difficult to obtain because of limited information (e.g., no text features or biological network available). Here, we focus on methods that take sequence and sequence-based features (such as predicted structure) as inputs and do not focus on, or compare to, the many methods that rely on protein networks like GeneMANIA32, Mashup33, DeepNF34, and other integrative network prediction methods. As a result, we present a method applicable to hundreds of thousands of sequences of proteins from unknown organisms, lacking the required network data.

In the last decade, deep learning has led to unprecedented improvements in the performance of methods tackling a broad spectrum of problems, ranging from learning protein sequence embeddings for contact map prediction35 to predicting protein structure36,37 and function38. In particular, convolutional neural networks (CNNs)39, the state of the art in computer vision, have shown tremendous success in addressing problems in computational biology. They have enabled task-specific feature extraction directly from protein sequence (or the corresponding 3D structure), overcoming the limitations of standard feature-based machine learning (ML) methods. The majority of sequence-based protein function prediction methods use 1D CNNs, or variations thereof, that search for recurring spatial patterns within a given sequence and convert them hierarchically into complex features using multiple convolutional layers. Recent work has employed 3D CNNs to extract features from protein structural data40,41. Although these works demonstrate the utility of structural features, storing and processing explicit 3D representations of protein structure at high resolution is not memory efficient, since most of the 3D space is unoccupied by protein structure. In contrast, geometric deep learning methods42,43, and more specifically graph convolutional networks (GCNs)44, overcome these limitations by generalizing convolutional operations to more efficient graph-like molecular representations. GCNs have shown tremendous success in various problems, ranging from learning features for quantitative structure-activity relationship (QSAR) models45, to predicting the biochemical activity of drugs46, to predicting interfaces between pairs of proteins47.

Here, we describe a method based on GCNs for functionally annotating proteins and detecting functional regions in proteins, termed Deep Functional Residue Identification (DeepFRI), that outperforms current methods and scales to the size of current repositories of sequence information. Our model has a two-stage architecture that takes as input a protein structure, represented as a graph derived from amino acid interactions in the 3D structure, and a sequence representation from a pre-trained, task-agnostic language model. The model outputs probabilities for each function (see Fig. 1) and identifies residues important for function prediction by using the gradient-weighted Class Activation Map (grad-CAM)48 approach, which we adapted for post-training analysis of GCNs. We provide several examples where we automatically and correctly identify functional sites for various functions where binding and catalytic sites are known.
Fig. 1 Schematic method overview. a LSTM language model, pre-trained on ~10 million Pfam protein sequences, used for extracting residue-level features of a PDB sequence. b Our GCN with three graph convolutional layers for learning complex structure–function relationships.
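As a rough illustration of the first stage, the sketch below builds a frozen LSTM that turns a one-hot protein sequence into per-residue feature vectors, mirroring the role of the pre-trained LSTM-LM; the layer size and 26-letter alphabet here are assumptions for the sketch, not the published configuration.

```python
# Illustrative sketch only: a frozen LSTM used as a residue-level feature
# extractor. Layer sizes are assumed, not the published DeepFRI settings.
import numpy as np
import tensorflow as tf

ALPHABET = 26          # 20 standard + 5 non-standard amino acids + gap symbol
LM_UNITS = 512         # hidden size of the language model (assumed)

def build_frozen_lm_extractor():
    inp = tf.keras.Input(shape=(None, ALPHABET))           # one-hot sequence (L x 26)
    h = tf.keras.layers.LSTM(LM_UNITS, return_sequences=True)(inp)
    lm = tf.keras.Model(inp, h)
    lm.trainable = False   # LM weights stay fixed while the GCN is trained
    return lm

# Example: extract residue-level features for a random 120-residue "sequence".
lm = build_frozen_lm_extractor()
x = tf.one_hot(np.random.randint(0, ALPHABET, size=(1, 120)), ALPHABET)
features = lm(x)           # shape (1, 120, 512): one feature vector per residue
```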
We train the LSTM-LM on a corpus of around 10 million protein domain sequences from Pfam51. Our LSTM-LM is trained to predict an amino acid residue in the context of its position in a protein sequence (see the “Methods” section for details). During the training of the GCN, the parameters of the LSTM-LM are fixed; i.e., the LSTM-LM stage is only used as a sequence feature extractor. The residue-level features constructed for sequences, together with contact maps, are used as an input for the second stage of our method. Each layer of the graph convolution stage takes both an adjacency matrix and the residue-level features described above, and outputs the residue-level features of the next layer. We explore different types of graph convolutions, including the most widely used Kipf & Welling graph convolutional layer (GraphConv)44, Chebyshev spectral graph convolutions (ChebConv)52, SAmple and aggreGatE convolutions (SAGEConv)53, Graph Attention (GAT)54, and a combination of different graph convolutional layers with different propagation rules (MultiGraphConv)55. Our comparison between different graph convolution formulations is shown in the “Methods” section and Supplementary Fig. 1. Three layers of MultiGraphConv or GAT often result in the best performance across many of our experiments. The GCN protein representation is obtained by concatenating features from all layers of this GCN into a single feature matrix, which is subsequently fed into two fully connected layers to produce the final protein function predictions for all terms (see “Methods” for details on the GCN architecture).

We train different models to predict GO terms (one model for each branch of the GO: molecular function, cellular component, biological process) and EC numbers. The GO terms are selected to have at least 50 and not more than 5000 training examples, whereas EC numbers are selected from levels 3 and 4 of the EC tree, as they are the most specific descriptors of the enzymatic functions. We evaluate the function prediction performance by two measures commonly used in the CAFA challenges27 (see “Methods”): (1) the protein-centric maximum F-score (Fmax), which measures the accuracy of assigning GO terms/EC numbers to a protein and is computed as the harmonic mean of the precision and recall; and (2) the term-centric area under the precision-recall (AUPR) curve, which measures the accuracy of assigning proteins to different GO terms/EC numbers. When reporting the overall performance of a method, the AUPR and Fmax scores are averaged over all GO terms and all proteins in the test set, respectively. To compare different methods, we also report precision-recall curves representing the average precision and recall at different values of the decision threshold t ∈ [0, 1].

This architecture leads to the main advantage of our method: it convolves features over residues that are distant in the primary sequence but close to each other in 3D space, without having to learn these functionally relevant proximities from the data. Such an operation, implemented here using graph convolution, leads to better protein feature representations and ultimately to more accurate function predictions, as shown in Supplementary Fig. 2. These results illustrate the importance of both graph convolutions and protein language model features as components of DeepFRI. Specifically, DeepFRI outperforms a baseline model which only takes into account contact maps in combination with a simple one-hot sequence encoding, indicating that the LSTM-LM features significantly boost the predictive power compared to a simplified residue feature representation. Moreover, by comparing DeepFRI with a baseline model that takes only language model features into account, we show the importance of protein structures and the effect of long-range connections on the predictive performance of DeepFRI.
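To make the graph convolution stage concrete, the sketch below builds a thresholded Cα–Cα contact map from coordinates and applies a single Kipf & Welling-style propagation step to residue features. This is a minimal sketch, not the released DeepFRI code; the 10 Å cutoff follows the Methods, while the feature dimensions and random inputs are illustrative.

```python
# Minimal sketch: one graph convolution over a thresholded C-alpha contact map.
import numpy as np

def contact_map(ca_coords, threshold=10.0):
    """Binary L x L contact map from C-alpha coordinates (L x 3)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    cmap = (dist < threshold).astype(float)
    np.fill_diagonal(cmap, 0.0)   # self-connections are added in the conv step
    return cmap

def graph_conv(A, H, W):
    """Propagate residue features H (L x c_in) over the contact graph A."""
    A_tilde = A + np.eye(len(A))                # add self-connections
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(1)))
    return np.maximum(d_inv_sqrt @ A_tilde @ d_inv_sqrt @ H @ W, 0.0)  # ReLU

L, c_in, c_out = 120, 512, 128                           # illustrative sizes
coords = np.cumsum(np.random.randn(L, 3), axis=0) * 2.0  # fake C-alpha trace
H0 = np.random.randn(L, c_in)                            # stand-in for LM features
W0 = np.random.randn(c_in, c_out) * 0.01
H1 = graph_conv(contact_map(coords), H0, W0)             # residue features, next layer
```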
Fig. 2 Performance of DeepFRI in predicting MF-GO terms of experimental structures and protein models. a Precision-recall curves showing the
performance of DeepFRI on ~700 protein contact maps (PDB700 dataset) from NATIVE PDB structures (CMAP_NATIVE, black), their corresponding
Rosetta-predicted lowest energy (LE) models (CMAP-Rosetta_LE, orange) and DMPfold lowest energy (LE) models (CMAP-DMPFold_LE, red), in
comparison to the sequence-only CNN-based method (SEQUENCE, blue). All DeepFRI models are trained only on experimental PDB structures.
b Distribution of protein-centric Fmax score over 1500 different Rosetta models from the PDB700 dataset grouped by their TM-score computed against the
native structures. Data are represented as boxplots with the center line representing the median, upper and lower edges of the boxes representing the
interquartile range, and whiskers representing the data range (0.5 × interquartile range). c An example of DeepFRI predictions for Rosetta models of a lipid-
binding protein (PDB id: 1IFC) with different TM-scores computed against its native structure. The DeepFRI output score >0.5 is considered as a significant
prediction. Precision-recall curves showing the: d performance of our method, trained only on PDB experimental structures, and evaluated on homology
models from SWISS-MODEL (red), in comparison to the CNN-based method (DeepGO) trained only on PDB sequences, and BLAST baselines are shown in
blue and gray, respectively; e performance of DeepFRI trained on PDB (blue), SWISS-MODEL (orange) and both PDB and SWISS-MODEL (red) structures
in comparison to the BLAST baseline (gray). The dot on the curve indicates where the maximum F-score is achieved (the perfect prediction should have
Fmax = 1 at the top right corner of the plot).
DeepFRI improves performance when protein models are included in the training. We investigate the performance of DeepFRI trained only on experimentally determined, high-quality structures from the PDB. Further, to explore the possibility of including a large number of available protein models into the training, we examine the performance when homology models from SWISS-MODEL are included in the training procedure. This significantly increases the number of training samples per function and reduces the imbalance between positive and negative examples. GO term and EC number annotations for PDB and SWISS-MODEL chains are retrieved from the SIFTS56 and UniProtKB/Swiss-Prot repositories, respectively. We report all our results on a test set consisting of only experimental PDB structures with varying degrees of sequence identity to the training set. For each annotated chain in PDB and SWISS-MODEL, we extract its sequence and construct its Cα–Cα contact map (see “Methods” for data collection and pre-processing). We systematically explore the effect of different Cα–Cα distance thresholds and different types of contact maps on the predictive power of DeepFRI (see Supplementary Fig. 3). We further explore different structure prediction methods for both training and prediction of newly observed sequences and find that using models from SWISS-MODEL during training greatly improves model comprehension and accuracy.

First, we explore how DeepFRI trained on PDB structures tolerates modeling errors, by comparing its performance on models obtained from SWISS-MODEL13 and other de novo structure prediction protocols (see Fig. 2a, d). We extract the sequences from about 700 experimentally annotated PDB chains (we refer to this dataset as PDB700), carry out structure prediction using both the Rosetta macro-molecular modeling suite57 and the contact predictor DMPfold12, and obtain the lowest energy model for each chain and method (see “Methods” section). We construct two kinds of Cα–Cα contact maps for each PDB chain—one from its experimental (i.e., NATIVE) structure and one from the lowest-energy (i.e., LE) model. DeepFRI exhibits higher performance (with Fmax = 0.657/0.633/0.619 for native structures and models from DMPfold and Rosetta, respectively) than that of the CNN-based method DeepGO (Fmax = 0.525), even when accounting for errors in predicted contact maps (Fig. 2a). To further test the robustness in predicting GO terms with degrading quality of predicted models, we compute the Fmax score on a set of Rosetta models with different template modeling scores (TM-scores)58 and compare them to the results from the sequence-only CNN model (see Fig. 2b). Specifically, for each sequence in the PDB700 dataset, we obtain 1500 Rosetta models with different TM-scores computed against their corresponding native structure. Even for low TM-scores, we obtain better performance in GO term classification than the sequence-only CNN-based method (Fig. 2c). For example, Fig. 2c shows the output of DeepFRI with varying quality (TM-score) of Rosetta models of rat intestinal lipid-binding apoprotein (PDB id: 1IFC). For models with TM-scores >0.58, DeepFRI correctly predicts four GO terms including lipid binding (GO:0008289), whereas for a TM-score >0.73, DeepFRI correctly predicts an even more specific function (i.e., fatty acid binding, GO:0005504, a child term of lipid binding). Here, we consider DeepFRI scores above 0.5 to be significant.

Even though Rosetta models often result in noisy contact maps, the performance of our method on the lowest energy models is not drastically impaired (Fig. 2a), which is due to the high denoising ability of the GCN, implied by a high correlation between GCN features extracted from NATIVE and LE contact maps (see Supplementary Fig. 4). Moreover, the high tolerance for predicting functions from low-quality models is due to the powerful language model features, on which the model mainly relies when making those predictions.

Second, we examine the inclusion of homology models into the DeepFRI training procedure. A large number of diverse structures in the training set is an important prerequisite for more accurate and robust performance of our deep learning-based method. To this end, we combine ~30k non-redundant experimental structures from the PDB and ~220k non-redundant homology models from the SWISS-MODEL repository. Inclusion of SWISS-MODEL models not only results in more training examples and consequently in more accurate performance (Fmax = 0.455/0.545 on structures from the PDB/PDB & SWISS-MODEL, see Fig. 2e), but it also results in larger GO term coverage, especially in the number of very specific, rarely occurring GO terms (information content, IC >10; Supplementary Fig. 5). Comparing the performance of our model with the CNN-based method DeepGO38, which operates only on sequences, and the BLAST baseline, we observe that our method benefits greatly from homology models (Fig. 2e).

DeepFRI outperforms other state-of-the-art methods. To compare the performance of our method with previously published methods, we use a test set of PDB chains with experimentally confirmed functional annotations, comprising subsets of PDB chains with varying degrees of sequence identity to the training set. We compare our method to two sequence-based annotation transfer methods (i.e., BLAST27 and FunFams24), one state-of-the-art deep learning method (DeepGO38), and one feature engineering-based machine learning method (FFPred31). CAFA challenges commonly use the BLAST baseline, in which every test sequence receives GO terms transferred from the sequence in the training set, with the score being the pairwise sequence identity. FunFams is one of the top-performing methods in CAFA challenges, in which test sequences are scanned against a library of HMMs of CATH superfamilies. A test sequence is first mapped to the most likely FunFam (i.e., the one with the highest HMM score); then GO terms and EC numbers of that FunFam are transferred to the test sequence. The confidence score for each predicted GO term is computed as the annotation frequency of that GO term among the seed sequences of the FunFam24. DeepGO is a state-of-the-art CNN-based method trained on the same number of protein sequences as DeepFRI. DeepGO uses 1D convolution layers with varying sizes of convolutional filters to extract hierarchical features from the protein sequences (see “Methods” for the architecture details).

The performance of our method in comparison to state-of-the-art and baseline methods is shown in Fig. 3. In terms of protein-centric Fmax, our method outperforms other methods on MF- and BP-GO terms (Fig. 3a, e). Moreover, DeepFRI learns general structure–function relationships more robustly than other methods, predicting MF-GO terms of proteins with low sequence identity to the training set. To investigate this, we partitioned our test set into groups based on maximum sequence identity to the training set and computed the protein-centric Fmax score within each group (Fig. 3b). DeepFRI robustly predicts MF-GO terms of proteins with ≤30% sequence identity to the training set (with a median Fmax = 0.545, compared to a median Fmax = 0.514 for FunFams and Fmax = 0.491 for DeepGO), and outperforms both FunFams and DeepGO at other sequence identity cutoffs. Even though DeepFRI achieves somewhat higher precision in the low-recall region in predicting EC numbers at 30% sequence identity (see Fig. 3c), FunFams outperforms both DeepFRI and DeepGO with a higher Fmax score across different sequence identity thresholds (Fig. 3c, d); this is especially the case for PDB chains in our test set from underrepresented protein families. However, this is not the case for PDB chains belonging to protein families well represented in our training set, on which DeepFRI outperforms or has comparable performance to FunFams (see Supplementary Fig. 19). DeepFRI outperforms the sequence-only CNN (DeepGO) and the BLAST baseline for more specific MF-GO terms (IC > 5) with fewer training examples (see Fig. 3f).

In addition to testing the robustness of DeepFRI in the case when a certain level of homology between the training and the test set is allowed (Fig. 3b, d), we also test its robustness when the test set is comprised of non-homologous PDB chains, that is, PDB chains belonging to protein families (i.e., Pfam51 IDs) and structural/fold classes (i.e., CATH4 IDs) different from the ones in the training set. To do this, we remove PDB chains belonging to the 23 largest protein families, covering 3224 PDB chains, from our training set, train the model on the rest, and report the results on the held-out (i.e., unseen) Pfams. See Supplementary Fig. 21 for the performance results and the list of Pfam IDs in the test set. Similarly, we perform another train/test split by composing a test set of PDB chains associated with the 4 most common (and largest in our set of) folds obtained from the CATH database: TIM barrel, Immunoglobulin-like, Jelly Rolls, and Alpha-Beta plaits, covering in total 4759 PDB chains. We trained the model on the rest of the PDB chains, covering other structural/fold classes, and report the performance results on the test set (see Supplementary Fig. 22). In the first case, we observe higher performance of DeepFRI (Fmax = 0.6) than in the second case (Fmax < 0.3 across all 4 CATH folds), which can be explained by the fact that DeepFRI’s LM, pre-trained on the entire Pfam database, helps the model generalize well to the unseen Pfams. Thus, the second case is a much more reliable setting for testing the robustness of DeepFRI. In the second case, a much lower performance of DeepFRI is observed, indicating the difficulty DeepFRI has in generalizing to unseen fold classes. However, it still generalizes to these folds better than the sequence-based DeepGO and the BLAST baseline, as indicated by the higher Fmax score (Supplementary Fig. 22).

It is important to note that different methods encompass different subsets of the GO-term vocabulary and that a key advantage of using comparative models (for instance from SWISS-MODEL) in training is the increase in the size of the vocabulary encompassed by our method. A comparison to the standard feature engineering-based, SVM-based method FFPred is shown in Supplementary Fig. 6. Given that FFPred is limited in the number of GO terms for which it makes predictions (131 MF-GO, 379 BP-GO, and 76 CC-GO on our test set), and also cannot predict EC numbers, we only show the result averaged over a subset of GO terms common to all methods. Moreover, different methods have different coverages, i.e., the number of proteins in our test set for which they make predictions (see legend in Fig. 3a–d). For example, FunFams is not able to predict MF-GO terms/EC numbers for 28%/14% of proteins in our test set (the total coverage for the entire test set is shown in the legends in Fig. 3b, d).
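For reference, the sketch below shows a minimal version of the protein-centric Fmax used throughout this comparison: precision and recall are averaged over proteins at each decision threshold, and the maximum harmonic mean over thresholds is reported. This is our own illustration, not the official CAFA evaluation code.

```python
# Minimal sketch of the protein-centric maximum F-score (Fmax); illustrative.
import numpy as np

def fmax(y_true, y_score, thresholds=np.linspace(0.01, 0.99, 99)):
    """y_true, y_score: (n_proteins x n_terms) binary labels and predicted scores."""
    best = 0.0
    for t in thresholds:
        y_pred = y_score >= t
        covered = y_pred.sum(1) > 0                  # proteins with >=1 prediction
        if not covered.any():
            continue
        tp = (y_pred & (y_true > 0)).sum(1).astype(float)
        # precision averaged over proteins with at least one predicted term
        prec = (tp[covered] / y_pred[covered].sum(1)).mean()
        # recall averaged over all proteins
        rec = (tp / np.maximum(y_true.sum(1), 1)).mean()
        if prec + rec > 0:
            best = max(best, 2 * prec * rec / (prec + rec))
    return best

# Example with random scores for 100 proteins and 50 GO terms:
rng = np.random.default_rng(0)
print(fmax(rng.integers(0, 2, (100, 50)), rng.random((100, 50))))
```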
Fig. 3 Performance over GO terms in different ontologies and EC numbers. Precision-recall curves showing the performance of different methods on (a)
MF-GO terms and (c) EC numbers on the test set comprised of PDB chains chosen to have ≤30% sequence identity to the chains in the training set.
Coverage of the methods is shown in the legend. Distribution of the Fmax score under 100 bootstrap iterations for the top three best-performing methods
applied on (b) MF-GO terms and (d) EC numbers computed on the test PDB chains and grouped by maximum % sequence identity to the training set.
e Distribution of protein-centric Fmax score and function-centric AUPR score under 10 bootstrap iterations summarized over all test proteins and GO
terms/EC numbers, respectively. f Distribution of AUPR score on MF-GO terms of different levels of specificities under 10 bootstrap iterations. Every figure
illustrates the performance of DeepFRI (red) in comparison to sequence-based annotation transfer from protein families, FunFams (blue), the CNN-based
method DeepGO (orange), SVM-based method, FFPred (black), and BLAST baseline (gray). Error bars on the bar plots (e and f) represent standard
deviation of the mean. In panels b and d, data are represented as boxplots with the center line representing the median, upper and lower edges of the boxes
representing the interquartile range, and whiskers representing the data range (0.5 × interquartile range).
We explored the performance of our method on individual GO terms. We observe that, for the majority of MF-GO terms, DeepFRI outperforms the sequence-only CNN method, indicating the importance of structural features in improving performance (see also Supplementary Fig. 7). DeepFRI outperforms the CNN on almost all GO terms with an average PDB chain length ≥400 (see Supplementary Fig. 7), illustrating the importance of encoding distant amino acid contacts via the structure graph. This demonstrates the superiority of graph convolutions over sequence convolutions in constructing more accurate protein features when key functional sites are composed of distal sequence elements (as is the case for more complex folds with higher contact order)59. Specifically, in the case of long protein sequences (e.g., >400 residues), a CNN with reasonable filter lengths would most likely fail to convolve over residues at different ends of the long sequence, even after applying multiple consecutive CNN layers; whereas graph convolutions applied on contact maps would, in 3 layers or less, access feature information from the complete structure.

Class activation maps increase the resolution from protein-level to region-level predictions. Many proteins carry out their functions through spatially clustered sets of important residues (e.g., active sites on an enzyme, ligand-binding sites on a protein, or protein–protein interaction sites). This is particularly relevant in the Molecular Function branch of the GO hierarchy, or for EC numbers, and less so for terms encoded in the Biological Process branch. Designing ML methods for identifying such functional residues has been the subject of many recent studies21,22,24,60. They exploit features from sequence or structure to train classifiers on existing functional sites in order to predict new ones. Even though DeepFRI was not designed or trained explicitly to predict residue-level annotations, we show how this is achieved by post-processing methods.

To better interpret decisions made by neural networks, recent work in ML has provided several new approaches for localizing the signal to regions of the input feature space that lead to a given positive prediction61–64. In computer vision, these methods determine the regions of images that lead to positive object classifications48; in NLP, these methods identify sub-regions of documents65. Recent work in computer vision uses gradient-weighted Class Activation Maps (grad-CAMs) on trained CNN-based architectures to localize the most important regions in images relevant for making correct classification decisions48. We use grad-CAMs, adapted for post-training analysis of GCNs. For each protein, DeepFRI detects function-specific structural sites by identifying residues relevant for making an accurate GO term prediction (for the DeepFRI model trained on MF-GO terms) or EC prediction (for the DeepFRI model trained on EC numbers); see an example of a grad-CAM and its corresponding heatmap over the sequence in Fig. 4a, right. It does so by first computing the contribution of each graph convolutional feature map of the model (trained on the MF-GO dataset) to the GO term prediction, and then by summing the feature maps with positive contributions to obtain a final residue-level activation map (see “Methods”).

For site-specific MF-GO terms (i.e., GO terms describing different types of ligand binding), we provide four examples where we automatically and correctly identify functional sites for several functions where binding sites are known (see Fig. 4).
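A minimal sketch of this post-processing step is shown below, under the assumption that the class score for one GO term has already been backpropagated to the last graph-convolution feature maps; the array and function names are hypothetical, not DeepFRI’s released API.

```python
# Minimal grad-CAM sketch for a GCN (illustrative; names are hypothetical).
# Given the last graph-conv feature maps H (L residues x K feature maps) and
# the gradient dS/dH of the class score S for one GO term, the per-feature-map
# weights are the gradients averaged over residues; the residue-level saliency
# is the ReLU of the weighted sum of feature maps.
import numpy as np

def grad_cam(H, dS_dH):
    alpha = dS_dH.mean(axis=0)           # (K,) importance of each feature map
    cam = np.maximum(H @ alpha, 0.0)     # (L,) keep positive contributions only
    return cam / cam.max() if cam.max() > 0 else cam

L, K = 120, 256                          # illustrative sizes
H = np.random.rand(L, K)                 # stand-in feature maps
dS_dH = np.random.randn(L, K)            # stand-in gradients from backprop
saliency = grad_cam(H, dS_dH)            # peaks mark putative functional residues
```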
Fig. 4 Automatic mapping of function prediction to sites on protein structures. a An example of the gradient-weighted class activation map for ‘Ca Ion
Binding’ (right) mapped onto the 3D structure of rat α-parvalbumin (PDB Id: 1S3P), chain A (left), annotated with calcium ion binding. The two highest
peaks in the grad-CAM activation profile correspond to calcium-binding residues. b ROC curves showing the overlap between grad-CAM activation profiles
and binding sites, retrieved from the BioLiP database, computed for the PDB chains shown in panel (c). c Examples of other PDB chains annotated with
DNA binding, GTP binding, and glutathione transferase activity. All residues are colored using a gradient color scheme matching the grad-CAM activity
profile, with more salient residues highlighted in red and less salient residues highlighted in blue. No information about co-factors, active sites, or site-
specificity was used during training of the model.
Figure 4a shows the grad-CAM-identified residues for calcium ion binding (GO:0005509) of the α-parvalbumin protein (PDB id: 1S3P). The two highest peaks in the profile correspond to the calcium-binding residues in the structure of the protein (Fig. 4a, left). Indices of the calcium-binding residues in 1S3P were retrieved from the BioLiP database66 and compared to the residues identified by our method using receiver operating characteristic (ROC) curves. The ROC curve shows the relation between sensitivity, or true positive rate (the ratio of functional residues identified as salient), and 1-specificity, or false positive rate (the ratio of non-functional residues identified as salient). A high area under the ROC curve indicates high correspondence between annotated binding sites and our predictions, meaning high accuracy in residue-level predictions. Sample ROC curves for other functions, including DNA binding (GO:0003677), GTP binding (GO:0005525), and glutathione transferase activity (GO:0004364), computed between the binary profile representing binding sites from BioLiP and the grad-CAM profile, are depicted in Fig. 4b, with structural visualizations in Fig. 4c. Our study of grad-CAMs against the BioLiP database reveals that the highest-performing group of GO terms are related to functions with known site-specific mechanisms or site-specific underpinnings. We depict examples (with high AUROC scores) for which grad-CAMs correctly identify binding regions in Supplementary Figs. 8–15. For various GO terms, the functional sites correspond to known binding sites or conserved functional regions (see Supplementary Figs. 8–15). Interestingly, our model is not explicitly trained to predict functional sites; instead, such predictions stem solely from the grad-CAM analysis of the graph convolution parameters of the trained model. Thus, the ability of the method to correctly map functional sites supports our argument that the method is general and capable of predicting functions in a manner that transcends sequence alignment.

A similar approach can be used for predicting catalytic residues and active sites of proteins. Specifically, we apply the grad-CAM approach to the DeepFRI model trained on EC numbers. To evaluate our predictions, we use a dataset composed of enzymes available in the Catalytic Site Atlas (CSA)67, a database that provides enzyme annotations specifying catalytic residues that have been experimentally validated and published in the primary literature. We use a manually curated dataset of 100 evolutionarily divergent enzymes from the CSA, provided by Alterovitz et al.60 and used for training their method ResBoost. Figure 5 shows results for a subset of PDB chains in this dataset, covering different EC numbers. Using the CSA as ground truth, we compute a ROC curve quantifying the accuracy of DeepFRI in predicting catalytic residues (see Supplementary Fig. 16). This result is not directly comparable to the performance results of ResBoost because we computed it only on a subset of 38 enzymes (out of the 100 enzymes used for training ResBoost) for which EC numbers were in our training set. Moreover, DeepFRI is not designed to perform training on existing catalytic residues in a cross-validation manner (i.e., by hiding some catalytic residues during training of the model and then predicting on them), as ResBoost does, and it cannot control the trade-off between sensitivity and specificity in predicting catalytic residues. DeepFRI is also not explicitly trained to predict catalytic residues using a set of enzymes with known catalytic residues and information about their positions in the structure. Surprisingly, a high AUROC score of 0.81 (Supplementary Fig. 16) stems solely from the grad-CAM analysis of our DeepFRI model trained on EC numbers.
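This overlap evaluation reduces to a standard ROC computation between a binary residue profile and the grad-CAM scores; a minimal sketch using scikit-learn, with stand-in arrays in place of real BioLiP or CSA annotations:

```python
# Minimal sketch: ROC/AUROC between known functional residues (binary profile,
# e.g., BioLiP binding sites or CSA catalytic residues) and grad-CAM saliency.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
is_functional = rng.integers(0, 2, 120)   # stand-in binary residue profile
saliency = rng.random(120)                # stand-in grad-CAM scores

fpr, tpr, _ = roc_curve(is_functional, saliency)
print("AUROC:", roc_auc_score(is_functional, saliency))
```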
Fig. 5 Identifying catalytic residues in enzymes using grad-CAM applied on the DeepFRI model trained on EC numbers. All residues are colored using a
gradient color scheme matching the grad-CAM activity score, with more salient residues highlighted in red and less salient residues highlighted in blue. The
PDB chains (shown in panels a–i) are annotated with all of their known catalytic residues (available in the Catalytic Site Atlas), with a residue number and a
pointer to the location on the structure. Residues correctly identified by our method are highlighted in red.
Performing functional site identification is also very efficient, as it does not require any further training or modification of the model’s architecture. The site-specificity afforded by our function predictions is especially valuable for poorly studied, unannotated proteins. Site-specific predictions provide first insights into the correctness of predictions and frame follow-up validation experiments, for example, using genetics or mutagenesis to test site-specific predictions.

Temporal holdout evaluation emphasizes DeepFRI’s performance in a realistic scenario. We also evaluate the performance of our method in a more realistic scenario using a temporal holdout strategy similar to the one used in CAFA27–29. That is, we composed a test set of PDB chains by looking at the difference in GO annotations of the PDB chains in the SIFTS56 database between two releases separated by ~6 months—the releases of 18 June 2019 and 04 January 2020. We identified ~3000 PDB chains that did not have annotations in the 2019 SIFTS release and gained new annotations in the 2020 SIFTS release (see “Methods”). We evaluated the performance of DeepFRI on the newly annotated PDB chains from the 2020 SIFTS release. DeepFRI significantly outperforms both BLAST and DeepGO (see Supplementary Fig. 17). Furthermore, we highlight examples of PDB chains with correctly predicted GO terms for which both BLAST and DeepGO fail to produce any meaningful predictions, indicating again the importance of structural information (see Supplementary Fig. 17).

DeepFRI makes reliable predictions on unannotated PDB and SWISS-MODEL chains. A large number of high-quality protein structures in both the PDB and SWISS-MODEL lack or have incomplete functional annotations in the databases we used for training and testing our models. For example, analysis of the SIFTS June 2019 release56 reveals that around 20,000 non-redundant, high-quality PDB chains currently lack GO term annotations. Similarly, around 13,000 SWISS-MODEL chains lack Swiss-Prot GO term annotations. Interestingly, even though these PDB chains lack GO term annotations, many have additional site-specific functional information present in their PDB files, for instance through ligands, co-factors, metals, DNA, and RNA. We use these cases to verify their function and discuss them in depth. A set of predictions, including many for truly unknown PDB chains, is provided in Supplementary File 1. For example, there are a number of PDB chains binding metal ions that have known binding residues in BioLiP66, but are missing the GO term annotation (GO:0046872). In other cases, the function, albeit missing in SIFTS, is directly implied in the name of the protein (e.g., a zinc finger protein without a zinc ion binding (GO:0008270) annotation). Here, we apply our method to these unannotated PDB chains, as part of a blind experiment, to evaluate our predictions at the chain level and the residue level through the grad-CAM approach. We also make predictions on SWISS-MODEL chains. Supplementary Data Files 1 and 2 contain all DeepFRI high-confidence predictions for the PDB and SWISS-MODEL chains.
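The temporal holdout construction described above reduces to a set difference between two annotation snapshots. A minimal sketch follows; the dictionaries mapping PDB chain to GO-term sets stand in for parsed SIFTS tables and are hypothetical, not a SIFTS API.

```python
# Minimal sketch of the temporal-holdout test set: keep chains that had no GO
# annotations in the older SIFTS release but gained some in the newer one.
def temporal_holdout(annos_2019, annos_2020):
    return {
        chain: terms
        for chain, terms in annos_2020.items()
        if terms and not annos_2019.get(chain)
    }

annos_2019 = {"1abcA": set(), "2xyzB": {"GO:0005509"}}
annos_2020 = {"1abcA": {"GO:0003677"}, "2xyzB": {"GO:0005509"}, "3pqrC": {"GO:0005525"}}
print(temporal_holdout(annos_2019, annos_2020))  # chains 1abcA and 3pqrC
```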
Fig. 6 Predicting and mapping function to unannotated PDB & SWISS-MODEL chains. Percentage/number of PDB chains (a) and SWISS-MODEL chains
(b) with MF-, BP-, and CC-GO terms predicted by our method; the number of specific GO term predictions (with IC >5) are shown in blue and red for PDB
and SWISS-MODEL chains, respectively. c An example of a Fe–S-cluster-containing hydrogenase (PDB Id: 6F0K), found in Rhodothermus marinus, with
missing GO term annotations in SIFTS (unannotated). The PDB chain lacks annotations in the databases used for training our model, and DeepFRI predicts it to
bind a 4Fe–4S iron–sulfur cluster with a high confidence score. The predicted grad-CAM profile significantly overlaps with the ligand-binding sites of 4Fe–4S
obtained from BioLiP, as shown by the ROC curve. d grad-CAM profiles for predicted DNA binding and metal ion binding functions mapped onto the
structure of an unannotated zinc finger protein (PDB Id: 1MEY) found in Escherichia coli; the corresponding ROC curves show significant overlap between
the grad-CAM profile and the binding sites obtained from BioLiP.
specific (Information Content, IC >5) GO terms. Some interesting sequence alignment is often sufficient to transfer folds or struc-
unannotated PDB chains with known ligand-binding information tural information68, sequence alignments are challenging to use to
include 4-iron, 4-sulfur cluster binding (GO:0051539) of a Fe–S- transfer function (as evidenced by the poor performance of the
cluster-containing hydrogenase (PDB id: 6F0K), shown in Fig. 6c. CAFA-like BLAST benchmark) due to the need for different
Iron–sulfur clusters are important in oxidation-reduction reac- thresholds for different functions, partial alignments, and domain
tions for electron transport and DeepFRI accurately predicts their structures, protein moonlighting, and neofunctionalization27,29,69.
binding sites as shown by the corresponding ROC curve, Thus, one important advantage of DeepFRI is its ability to make
computed between the predicted grad-CAM profile and the function predictions beyond homology-based transfer by extract-
known 4Fe–4S cluster binary binding profile retrieved from ing local sequence and global structural features27.
BioLiP. Another example includes DNA binding (GO:0003677) By comparing function prediction performance on DMPFold
and metal ion binding (GO:0046872) of the zinc finger protein and Rosetta models and their corresponding experimentally
(PDB Id: 1MEY) with predicted grad-CAM activity mapped onto determined structures, we demonstrate that DeepFRI has a high
the same structure and validated experimentally for both DNA denoising power. Our method’s robustness to structure prediction
and metal (Fig. 6d). errors indicates that it can reliably predict functions of proteins
with computationally inferred structures. The ability to use pro-
Discussion tein models opens the door for characterizing many proteins
Here we describe a deep learning method for predicting protein lacking experimentally determined structures. Further, databases
function from both sequences and contact map representations of with available protein models (e.g., homology models from
3D structures. Our method DeepFRI is trained on protein struc- SWISS-MODEL13 and ModBase20) can expand the training set
tures from the PDB and SWISS-MODEL and rapidly predicts and improve the predictive power of the model. The more
both GO terms and EC numbers of proteins and improves over extensive use of homology models will be the subject of a
state-of-the-art sequence-based methods on the majority of future study.
function terms. Features learned from protein sequences by the While this paper mainly focuses on introducing efficient and
LSTM-LM and from contact maps by the GCN lead to substantial accurate function prediction models, it also provides a means of
improvements in protein function prediction, therefore enabling interpreting prediction results. We demonstrate on multiple dif-
novel protein function discoveries. Although high-quality ferent GO terms that the DeepFRI grad-CAM identifies
structurally meaningful site-specific prediction, for instance from phenotype), IGI (inferred from genetic interaction), IEP (inferred from expression
ligand-binding sites. For some PDB chains, the accuracy of the pattern), TAS (traceable author statement), and IC (inferred by curator), and (2)
electronically inferred (in figure captions/legends, we refer to those as IEA—
DeepFRI grad-CAM in identifying binding residues is quite inferred from electronic annotation). Furthermore, we focus only on specific MF-,
remarkable, especially since the model is not designed to predict BP-, and CC-GO terms that have enough training examples from the non-
functional residues and the ligand-binding information was not redundant training set (see the section above). That is, we select only GO terms
given to the model a priori. However, the main disadvantage of that annotate >50 non-redundant PDB/SWISS-MODEL chains. We retrieved
considering this to be a site-specific function prediction method enzyme classes for sequences and PDB structures from the levels 3 and 4 (most
specific levels) of the EC tree. The number of GO terms and EC classes in each
lies in the multiple meanings of grad-CAMs. Specifically, for ontology is represented in Supplementary Table 1.
some GO terms related to binding, grad-CAMs do not necessarily In our analyses, we differentiate GO terms based on their specificity, expressed
identify binding residues/regions; instead, they identify regions as Shannon information content (IC)73:
that are conserved among the sequences annotated with the same
function. This can be explained with the fact that any neural ICðGOi Þ ¼ log2 ProbðGOi Þ; ð1Þ
network, including ours, would always tend to learn the most
trivial features that lead to the highest accuracy70,71. where, Prob(GOi) is the probability of observing GO term i in the UniProt-GOA
In conclusion, here we describe a method that connects two database (ni/n, where ni—number of proteins annotated with GO term i and n—
total number of proteins in UniProt-GOA). Infrequent GO terms (i.e., more
key problems in computational biology, protein structure pre- specific) have higher IC values.
diction and protein function prediction. Our method linking deep
learning with an increasing amount of available sequence and Training and test set construction. We partition the non-redundant set com-
structural data has the potential to meet the annotation challenges posed of PDB and SWISS-MODEL sequences into training, validation, and test
posed by ever-increasing volumes of genomic sequence data, sets, with approximate ratios 80/10/10%. The test set, comprising of only experi-
offering new insights for interpreting protein biodiversity across mentally determined PDB structures and experimentally determined annotations is
chosen to have PDB chains with varying degrees of sequence identity (i.e., 30%,
our expanding molecular view of the tree of life. 40%, 50%, 70%, and 95% sequence identity) to the training set. Furthermore, each
PDB chain in the test set is chosen to have at least one experimentally confirmed
Methods GO term in each branch of GO. See Supplementary Table 2.
Construction of contact maps. We collect 3D atomic coordinates of proteins from We use the CD-HIT clustering tool74 to select SWISS-MODEL sequences that
the Protein Data Bank (PDB)19. As the PDB contains extensive redundancy in are dissimilar to the test set and to split them into training and validation sets. We
terms of both sequence and structure, we remove identical and similar sequences examine the performance of our method when trained only on PDB, only on
from our set of annotated PDB chains. We create a non-redundant set by clustering SWISS-MODEL and both PDB & SWISS-MODEL contact maps; we also
all PDB chains (for which we were able to retrieve contact maps) by blastclust at investigate training on only EXP and both EXP & IEA function labels (see
95% sequence identity (i.e., number of identical residues out of the total number of Supplementary Fig. 18A). In all our experiments we trained the model using both
residues in the sequence alignment). Then, from each cluster we select a repre- EXP and IEA GO annotations), but the test set, composed of only experimentally
sentative PDB chain that is annotated (i.e., has at least one GO term in at least one annotated PDB chains (EXP), is always kept fixed. See Supplementary Table 1. The
of the three ontologies) and which is of high quality (i.e., has a high-resolution final results are averaged over 100 bootstraps of the test set, in all our experiments.
structure). In addition to PDB structures, we also obtained homology models from
the SWISS-MODEL repository13. We include only annotated SWISS-MODEL Preparation of a benchmark set of protein models. The initial set of benchmark
chains (i.e., having at least one GO term in at least one of the three GO ontologies) structures used here was Jane and Dave Richardson’s Top 500 dataset75. It is a set
in our training procedure. We remove similar SWISS-MODEL sequences again at of hand curated, high-resolution, and high quality (the top 500 best), protein
95% sequence identity. Including SWISS-MODEL models leads to a 5-fold increase structures that were chosen for their fit to their completeness, how well they fit the
in the number of training samples (see Supplementary Table 1) and also in a larger experimental data, and lack of high energy structural outliers (bond angle and bond
coverage of more specific GO terms (see Supplementary Fig. 5). length deviations76). This set has been used in the past for fitting Rosetta energy/
To construct contact maps, we consider two resides to be in contact if the score terms and numerous other structural-bioinformatics validation tasks.
distance between their corresponding Cα atoms is <10 Å. We refer to this type of Unfortunately, the structures in this set lacked sufficient annotations (many of
contact maps as CA-CA. We have also considered two other criteria for contact these structures were the results of structural genomics efforts and had no, or only
map construction. Two residues are in contact if (1) the distance between any of high level, annotations in GO and EC). Accordingly, we choose an additional
their atoms is <6.5 Å (we refer to this type of contact maps as ANY-ANY) and (2) 200 sequences from the PDB. These additional high-quality benchmark structures
if the distance between their Rosetta neighbor atoms is less than sum of the were chosen by taking 119K chains with functional annotations and filtering them
neighbor radii of the amino acid pair (we refer to this type of contact maps as NBR- with the PISCES Protein Sequence Culling Server77 with the following criteria:
NBR). Rosetta neighbor atoms are defined as Cβ atoms for all amino acids except sequence percentage identity: ≤25, resolution: 0.0–2.0, R-factor: 0.2, sequence
glycine where Cα is used. An amino acid neighbor-radius describes a potential length: 40–500, non-X-ray entries: Excluded, CA-only entries: Excluded, Cull PDB
interaction sphere that would be covered by the amino acid side chain as it samples by chain.
all possible conformations. Neighbor–neighbor contact maps are therefore more This left us with 1606 SIFTS annotated chains from which we randomly selected
indicative of side-chain–side-chain interactions than Cα–Cα maps. To conserve the 200 chains. These PDB chains together with the Top500 PDB chains (we refer to
memory avoid training the model on protein chains with long sequences, we only this combined set as PDB700) were then excluded from all phases of model
construct contact maps for chains between 60 and 1000 residues. We have also training. The performance of our method on this set of PDB chains is shown in
experimented with different distance thresholds for CA-CA and ANY-ANY Fig. 2a. In Supplementary Fig. 4, we demonstrate the denoising capabilities of our
contact maps. We found that our method produced similar results when trained on method on this set of structures.
these contact maps with a Cα–Cα distance of 10 Å, producing slightly better results
(see Supplementary Fig. 3).
Functional annotations of PDB & SWISS-MODEL chains. For training our models we use two sets of function labels: (1) Gene Ontology (GO)7 terms and (2) Enzyme Commission (EC) numbers72. GO terms are hierarchically organized into three different ontologies—molecular function (MF), biological process (BP), and cellular component (CC). We train our models to predict GO terms separately for each ontology. The GO identifiers as well as the EC numbers for each PDB and SWISS-MODEL chain were retrieved from SIFTS56 (structure integration with function, taxonomy, and sequence) and the UniProt Knowledgebase, respectively.

SIFTS transfers annotations to PDB chains via a residue-level mapping between UniProtKB and PDB entries. All annotation files were retrieved from the SIFTS database (2019/06/18) with PDB release 24.19 and UniProtKB release 2019.06. We consider annotations that are (1) not electronically inferred (in figure captions/legends, we refer to these as EXP); specifically, we consider GO terms with the following evidence codes: EXP (inferred from experiment), IDA (inferred from direct assay), IPI (inferred from physical interaction), and IMP (inferred from mutant phenotype); and (2) electronically inferred annotations (IEA).

Comparison with existing methods
CNNs. CNNs have shown tremendous success in extracting information from sequence data and making highly accurate predictive models. Their success can be attributed to convolutional layers with a highly reduced number of learnable parameters, which allow multi-level and hierarchical feature extraction. In the last few years, a large body of work has been published covering various applications of CNNs, such as the prediction of protein functions38 and subcellular localization78, the prediction of the effects of noncoding variants79, and protein fold recognition80. Here we use the CNN-based DeepGO tool38 in our comparison study. We describe this architecture in more detail in the Supplementary Material.

We represent a protein sequence with L amino acid residues as a feature matrix X = [x1, …, xL] ∈ {0, 1}^{L×c}, where c = 26 dimensions (20 standard and 5 non-standard amino acids, plus the gap symbol) are used as a one-hot indicator, x_i ∈ {0, 1}^c, of the amino acid residue at position i in the sequence. This representation is fed into a convolutional layer, which applies a one-dimensional convolution operation with a specified number of kernels (weight matrices or filters), fn, of a certain length, fl. The output is then transformed by the rectified linear activation function (ReLU), which sets values below 0 to 0, i.e., ReLU(x) = max(x, 0). This is followed by a global max-pooling layer and a fully connected layer with a sigmoid activation function for predicting the probabilities of GO terms or EC enzyme classes.

In the first stage, we use 16 parallel convolutional layers, each with fn = 512 filters of a different length (see Supplementary Material). After concatenating the outputs of these layers, we obtain an L × 8192-dimensional feature map for each sequence. Using filters of variable lengths ensures the extraction of complementary information from protein sequences. The second layer has ∣GO∣ units for GO term (or ∣EC∣ for EC) classification.
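As an illustration of this kind of architecture, here is a minimal Keras sketch of a multi-filter-length sequence CNN; the input length, the four filter lengths, and the output size are placeholders rather than the exact configuration given in the Supplementary Material:

import tensorflow as tf
from tensorflow.keras import layers

L_MAX, N_CHANNELS, N_LABELS = 1000, 26, 500  # illustrative sizes

inputs = layers.Input(shape=(L_MAX, N_CHANNELS))
# Parallel 1-D convolutions, each with 512 filters of a different length;
# the paper uses 16 such layers, four are shown here for brevity
branches = [
    layers.Conv1D(filters=512, kernel_size=k, padding='same',
                  activation='relu')(inputs)
    for k in (5, 10, 15, 20)
]
x = layers.Concatenate()(branches)        # (L, n_branches * 512) feature map
x = layers.GlobalMaxPooling1D()(x)        # fixed-size vector per sequence
outputs = layers.Dense(N_LABELS, activation='sigmoid')(x)  # GO/EC probabilities
model = tf.keras.Model(inputs, outputs)

With the full set of 16 filter lengths, the concatenated feature map recovers the L × 8192 dimensionality described above.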
BLAST. The BLAST baseline is used in the same way as described in CAFA127: a sequence in our test set receives GO/EC annotations from all annotated sequences in our training set (comprised of SWISS-PROT sequences), with the prediction scores equal to the sequence identity scores (divided by 100) between the test and the training sequences. Prior to this, we remove all sequences from our training set that are similar to our test sequences, using an E-value threshold of 1e−3, to prevent annotation transfer from homologous sequences, as previously described by Cozzetto et al.31.
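The transfer rule can be summarized by the following sketch (our paraphrase of the baseline; `hits` and `train_annotations` are hypothetical data structures, and taking the maximum identity when several hits share a term is our assumption):

def blast_baseline_scores(hits, train_annotations):
    """Score GO/EC terms for one test sequence from its BLAST hits.

    hits: list of (train_seq_id, percent_identity) pairs.
    train_annotations: dict mapping train_seq_id -> set of terms.
    Returns a dict mapping term -> prediction score in [0, 1].
    """
    scores = {}
    for seq_id, identity in hits:
        for term in train_annotations.get(seq_id, ()):
            # Prediction score is the sequence identity divided by 100
            scores[term] = max(scores.get(term, 0.0), identity / 100.0)
    return scores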
FFPred. FFPred is a support vector machine (SVM)-based classifier built on manually designed features derived from sequences, such as transmembrane regions, secondary structures, and sequence motifs31.

FunFam. FunFam is a domain-based method that uses the functional classification of CATH superfamilies for annotation transfer. The method takes each sequence and scans it against the CATH FunFams using HMMER381. It then transfers all GO terms/EC numbers from the FunFams with the highest HMM score to the test sequence. We followed the procedure described at https://github.com/UCLOrengoGroup/cath-tools-genomescan to obtain GO terms and EC numbers for our test sequences. The GO term assignment score is computed as the frequency of the GO term among the seed sequences of the matched FunFam and is propagated up the GO hierarchy as described in Das et al.24.
LSTM language model for learning residue-level features. We use an approach similar to Bepler and Berger35 for training our language model. We train an LSTM language model on ~10 M sequences sampled from the entire set of sequences from Pfam51. The sequences are represented using one-hot encoding (see above). The language model architecture is comprised of two stacked forward LSTM layers with 512 units each (see Fig. 1). The LSTM-LM is trained for 5 epochs using the ADAM optimizer82 with a learning rate of 0.001 and a batch size of 128. All hyper-parameters are determined through a grid search based on the model's performance on the validation set.

The residue-level features extracted from the final LSTM layer's hidden states, H_LM, are combined with the one-hot representation of sequences, X, through a learnable non-linear mapping:

H_input = ReLU(H_LM W_LM + X W_X + b)    (2)

where H_input is the final residue-level feature representation passed to the first GCN layer, H^(0) = H_input (see the equations below). We refer to this stage of our method as the feature extraction stage. The parameters W_LM, W_X, and b are trained together with the parameters of the GCN, while all parameters of the LSTM-LM are frozen during training. We choose this strategy because it is more efficient: instead of fine-tuning the huge number of LSTM-LM parameters together with the GCN parameters, we only tune W_LM, W_X, and b while keeping the parameters of the LSTM-LM fixed.
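A sketch of this fusion step (Eq. (2)) in Keras follows; the layer names and the 512-dimensional output are our choices for illustration, and the frozen LSTM-LM is represented here by its precomputed hidden states:

import tensorflow as tf
from tensorflow.keras import layers

# Precomputed, frozen LSTM-LM hidden states H_LM and one-hot sequences X
h_lm = layers.Input(shape=(None, 512))      # H_LM from the language model
x_onehot = layers.Input(shape=(None, 26))   # X, one-hot encoded sequence

# Trainable maps W_LM and W_X of Eq. (2); the bias b sits in the second Dense
h = layers.Dense(512, use_bias=False)(h_lm)      # H_LM @ W_LM
x = layers.Dense(512, use_bias=True)(x_onehot)   # X @ W_X + b
h_input = layers.ReLU()(layers.Add()([h, x]))    # Eq. (2); feeds the first GCN layer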
Graph convolutional network. GCNs have proven to be powerful for extracting features from data that are naturally represented as one or more graphs42. Here we experiment with the notion that GCNs are a suitable method for extracting features from proteins by taking into account their graph-based structure of interconnected residues, represented by contact maps. We propose our model based on the work of Kipf and Welling44. A protein graph can be represented by a contact map, A ∈ R^{L×L}, encoding connections between its L residues, and a residue-level feature matrix, X ∈ R^{L×c}.

We explore different residue-level feature representations, including one-hot encoding of residues as in the CNN (c = 26), the LSTM language model (c = 512, i.e., the output of the LSTM layers), and no sequence features (in this case, to be able to run the GCN, the feature matrix is substituted with the identity matrix, i.e., X = I_L).

The graph convolution takes both the adjacency matrix A and the residue-level embeddings from the previous layer, H^(l) ∈ R^{L×c_l}, and outputs the residue-level embeddings of the next layer, H^(l+1) ∈ R^{L×c_{l+1}}:

H^(l+1) = GC(A, H^(l))    (3)

where H^(0) = H_input, and c_l and c_{l+1} are the residue embedding dimensions of layers l and l + 1, respectively. Concretely, we use the formulation of Kipf and Welling44:

H^(l+1) = ReLU(D̃^{-1/2} Ã D̃^{-1/2} H^(l) W^(l))    (4)

where Ã = A + I_L is the adjacency matrix with added self-connections represented by the identity matrix I_L ∈ R^{L×L}, D̃ is the diagonal degree matrix with entries D̃_ii = ∑_{j=1}^{L} Ã_ij, and W^(l) ∈ R^{c_l×c_{l+1}} is the trainable weight matrix of layer l + 1. To normalize residue features after each convolutional layer, the adjacency matrix is first symmetrically normalized, hence the term D̃^{-1/2} Ã D̃^{-1/2}. Equation (4) updates the features of each residue by a weighted sum of the features of the directly connected residues in the graph (adding self-connections ensures that the residue's own features are also included in the sum).

We also explore other types of graph convolutional layers previously proposed in the machine learning literature. Specifically, we tested the performance of DeepFRI on all branches of GO, as well as on EC classes, with SAmple and aggreGatE convolutions (SAGEConv)53, Chebyshev spectral graph convolutions (ChebConv)52, Graph Attention (GAT)54, and a combination of different graph convolutions with different propagation rules (MultiGraphConv)55, in comparison to the plain Kipf & Welling graph convolution (GraphConv)44. These convolutions differ in the way the features of the neighboring residues are aggregated. The performance of DeepFRI in predicting MF-GO and EC labels with these graph convolution layers is shown in Supplementary Fig. 1.

Given that we are classifying individual protein graphs with different numbers of residues, we use several layers, N_l = 3, of graph convolutions. The final protein representation is obtained by first concatenating the features from all layers into a single feature matrix, i.e., H = [H^(1); …; H^(N_l)] ∈ R^{L×∑_l c_l}, and then performing global pooling, after which we obtain a fixed-length vector representation of a protein structure, h_pool ∈ R^{∑_l c_l}. The global pooling is a sum over the L residues:

h_pool = ∑_{i=1}^{L} H_{i,:}    (5)

We then use a fully connected layer with a ReLU activation function to compute a hidden representation from the pooled representation. This is followed by a fully connected layer that maps the hidden representation to a ∣GO∣ × 2 output, that is, two activations for each GO term. These activations are transformed by a softmax activation function, outputting the positive and negative probability for each GO term/EC number (i.e., the final layer outputs a probability vector ŷ of dimension ∣GO∣ × 2 (∣EC∣ × 2 for EC numbers), predicting the positive and negative probability of each GO term (EC number)).
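A minimal TensorFlow sketch of the propagation rule of Eq. (4) and the sum-pooling of Eq. (5) is shown below (our illustration; the released implementation may differ in details such as batching and the masking of padded residues):

import tensorflow as tf

def normalize_adjacency(a):
    """Symmetrically normalized adjacency of Eq. (4): D̃^{-1/2} Ã D̃^{-1/2}."""
    a_tilde = a + tf.eye(tf.shape(a)[-1])   # Ã = A + I_L (self-connections)
    d = tf.reduce_sum(a_tilde, axis=-1)     # degrees; >= 1 after adding I_L
    d_inv_sqrt = tf.math.rsqrt(d)
    return a_tilde * d_inv_sqrt[..., None] * d_inv_sqrt[..., None, :]

class GraphConv(tf.keras.layers.Layer):
    """One Kipf & Welling propagation step, Eq. (4)."""
    def __init__(self, units):
        super().__init__()
        self.dense = tf.keras.layers.Dense(units, use_bias=False)  # W^(l)

    def call(self, inputs):
        a_norm, h = inputs    # normalized adjacency and residue features
        return tf.nn.relu(tf.matmul(a_norm, self.dense(h)))

def global_sum_pool(h_concat):
    """Sum over the L residues, Eq. (5)."""
    return tf.reduce_sum(h_concat, axis=-2)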
Model training and hyper-parameter tuning. To account for imbalanced labels, both the CNN and the GCN are trained to minimize a weighted binary cross-entropy cost function that gives higher weights to GO terms with fewer training examples:

L(Θ) = −(1/N) ∑_{i=1}^{N} ∑_{j=1}^{|GO|} ∑_{k=1}^{2} w_j y_{ijk} log(ŷ_{ijk})    (6)

where Θ is the set of all parameters in all layers to be learned; w_j = N/N_j^+ is the class weight for function j, with N_j^+ being the number of positive examples associated with function j; N is the total number of samples and ∣GO∣ is the total number of functions (i.e., GO terms); y_{ijk} is the true binary indicator for sample i and function j (i.e., y_{ij1} = 1 if sample i is annotated with function j, and y_{ij2} = 0 otherwise); and ŷ_{ij1} is the predicted probability that sample i is annotated with function j. In the inference phase, we predict a GO term/EC number if its positive probability is >0.5.

All hyper-parameters are determined through a grid search based on the model's performance on the validation set. The validation set is comprised of ~10% randomly chosen samples from the training set. To avoid overfitting, we use an early stopping criterion with patience = 5 (i.e., we stop training if the validation loss does not improve in 5 epochs). We use the ADAM optimizer82 with a learning rate lr = 0.0001, β1 = 0.95, and β2 = 0.95, and a batch size of 64. The default number of epochs is 200. Both the GCN and the CNN are implemented to deal with variable-length sequences by padding sequences/contact maps. The entire method is implemented using the TensorFlow/Keras deep learning library (see Supplementary Note).
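The cost function of Eq. (6) can be sketched as a custom Keras-style loss as follows (our illustration; it averages rather than sums over functions, which differs from Eq. (6) only by a constant factor):

import tensorflow as tf

def weighted_cross_entropy(class_weights):
    """Loss of Eq. (6); y_true/y_pred have shape (batch, n_functions, 2)."""
    w = tf.constant(class_weights, dtype=tf.float32)  # w_j = N / N_j^+

    def loss(y_true, y_pred):
        eps = 1e-7  # guard against log(0)
        ce = -tf.reduce_sum(y_true * tf.math.log(y_pred + eps), axis=-1)  # sum over k
        return tf.reduce_mean(w[tf.newaxis, :] * ce)  # weighted, averaged over i and j
    return loss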
Temporal holdout validation. We also evaluate the performance of our method using temporal holdout validation, similar to CAFA27. The temporal holdout approach ensures a more "realistic" scenario where function predictions are evaluated based on recent experimental annotations34. We used GO annotations retrieved from SIFTS56 at two time points, version 2019/06/18 (we refer to this as SIFTS-2019) and version 2020/01/04 (we refer to this as SIFTS-2020), to construct our temporal holdout test set. We form the test set from the PDB chains that did not have any annotations in SIFTS-2019 but gained annotations in SIFTS-2020. To increase the GO term coverage, we focus on the PDB chains with both EXP and IEA evidence codes. We obtain 4072 PDB chains (out of which 3115 have sequences <1200 residues). We use our model (trained on SIFTS-2019 GO annotations) to predict functions of these newly annotated PDB chains and evaluate our predictions against the annotations from SIFTS-2020. The results for MF-, BP-, and CC-GO terms are shown in Supplementary Fig. 17. We also show a few examples of PDB chains with correctly predicted MF-GO terms by our method, for which both BLAST and DeepGO are not able to make any significant predictions.
Residue-level annotations. We use a method based on Gradient-weighted Class Activation Mapping (grad-CAM)48 to localize function predictions on a protein structure (i.e., to find the residues with the highest contribution to a specific function). Grad-CAM is a class-discriminative localization technique that provides visual explanations for predictions made by CNN-based models. Motivated by its success in image analysis, we use grad-CAM to identify residues in a protein structure that are important for the prediction of a particular function.

In grad-CAM, we first compute the contribution of each filter, k, in the last convolutional layer to the prediction of function label l by taking the derivative of the output of the model for function l, y^l, with respect to the feature map F^k ∈ R^L, summed over the whole sequence of length L:

w_k^l = ∑_{i=1}^{L} ∂y^l/∂F_{k,i}    (7)

where w_k^l represents the importance of feature map k for predicting function l, obtained by summing the contributions from the individual residues. Finally, we obtain the function-specific heatmap in residue space by taking the weighted sum over all feature maps in the last convolutional layer:

CAM^l[i] = ReLU(∑_k w_k^l F_{k,i})    (8)

where the ReLU function ensures that only features with a positive influence on the functional label are preserved; CAM^l[i] indicates the relative importance of residue i to function l. The advantage of grad-CAM is that it does not require re-training or changes in the architecture of the model, which makes it computationally efficient and directly applicable to our models. See Supplementary Figs. 8–15 for grad-CAM profiles mapped onto the 3D structures of PDB chains with known ligand-binding information and Fig. 4 for grad-CAM profiles mapped onto the 3D structures of PDB chains with known active sites.
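A sketch of this computation with tf.GradientTape follows (our illustration; it assumes a model built to expose the last convolutional layer's output alongside the predictions, e.g., via tf.keras.Model(inputs, [conv_layer.output, predictions])):

import tensorflow as tf

def grad_cam_profile(model, x, function_index):
    """Residue-level grad-CAM profile, Eqs. (7) and (8), for one function."""
    with tf.GradientTape() as tape:
        feature_maps, preds = model(x)    # (1, L, n_filters), (1, n_functions)
        y_l = preds[:, function_index]    # model output for function l
    grads = tape.gradient(y_l, feature_maps)  # dy^l/dF over all residues
    weights = tf.reduce_sum(grads, axis=1)    # Eq. (7): w_k^l, shape (1, n_filters)
    cam = tf.nn.relu(
        tf.reduce_sum(weights[:, tf.newaxis, :] * feature_maps, axis=-1))  # Eq. (8)
    return cam[0]  # length-L vector of residue importances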
Residue-level evaluation: for each individual protein and its predicted MF-GO term/EC number, we measure the ability of our method to predict binding or active sites. This measure can only be computed for the minority of proteins with detailed site-specific annotations; here we rely on the site-specific annotations available in the BioLiP database66 for ligand-binding proteins and in the Catalytic Site Atlas (CSA)67 for enzymes.

For example, for a given protein of L residues, we construct a ligand-binding binary profile (retrieved from BioLiP), s ∈ {0, 1}^L, indicating the residues known to bind a specific ligand (e.g., ATP); i.e., s_i = 1 if residue i is a ligand-binding residue, and s_i = 0 otherwise. For the same protein and its corresponding predicted function (e.g., ATP binding (GO:0005524)), we compute a real-valued grad-CAM profile from our pre-trained DeepFRI method, ŝ ∈ [0, 1]^L, indicating the functional importance of each residue. To quantify how well the grad-CAM profile recovers known binding sites, we compute the area under the ROC curve (AUROC), i.e., sensitivity as a function of 1−specificity (false positive rate) over a sliding threshold on ŝ, with the area computed using the trapezoid rule83. See Supplementary Figs. 8–15 for examples of ROC curves for different MF-GO terms and Supplementary Fig. 16 for a ROC curve showing aggregate performance over different EC numbers.
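Given the two profiles, the residue-level score reduces to a standard AUROC computation; a toy sketch using scikit-learn (the profile values here are made up for illustration):

import numpy as np
from sklearn.metrics import roc_auc_score

# Toy profiles for a protein of L = 6 residues
s = np.array([0, 1, 1, 0, 0, 0])                  # binding profile from BioLiP
s_hat = np.array([0.1, 0.9, 0.7, 0.3, 0.2, 0.1])  # grad-CAM profile
print(f"residue-level AUROC: {roc_auc_score(s, s_hat):.2f}")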
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Our training, validation, and test data splits are available from our GitHub page at https://github.com/flatironinstitute/DeepFRI. All other relevant data are available from the authors upon reasonable request. Source data are provided with this paper.
Code availability
The source code for training the DeepFRI model, together with the neural network weights, is available for research and non-commercial use at https://github.com/flatironinstitute/DeepFRI and can be cited using https://doi.org/10.5281/zenodo.4650027. A web service of our method is available at https://beta.deepfri.flatironinstitute.org/.

Received: 21 September 2020; Accepted: 22 April 2021;
References
1. Goodsell, D. S. The Machinery of Life (Springer Science & Business Media, 2009).
2. Mitchell, A. L. et al. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res. 47, D351–D360 (2018).
3. Jones, D. T. & Cozzetto, D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2014).
4. Dawson, N. L. et al. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res. 45, D289–D295 (2016).
5. Gerstein, M. How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Fold. Des. 3, 497–512 (1998).
6. Vogel, C., Berzuini, C., Bashton, M., Gough, J. & Teichmann, S. A. Supra-domains: evolutionary units larger than single protein domains. J. Mol. Biol. 336, 809–823 (2004).
7. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
8. Bairoch, A. The ENZYME database in 2000. Nucleic Acids Res. 28, 304–305 (2000).
9. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2016).
10. Boutet, E., Lieberherr, D., Tognolli, M., Schneider, M. & Bairoch, A. UniProtKB/Swiss-Prot 89–112 (Humana Press, 2007).
11. Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).
12. Greener, J. G., Kandathil, S. M. & Jones, D. T. Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints. Nat. Commun. 10, 1–13 (2019).
13. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
14. Vallat, B., Webb, B., Westbrook, J., Sali, A. & Berman, H. M. Archiving and disseminating integrative structure models. J. Biomol. NMR 73, 385–398 (2019).
15. Webb, B. & Sali, A. Protein Structure Modeling with MODELLER 1–15 (Springer New York, 2014).
16. Shigematsu, H. Electron cryo-microscopy for elucidating the dynamic nature of live-protein complexes. Biochim. Biophys. Acta Gen. Subj. 1864, 129436 (2019).
17. García-Nafría, J. & Tate, C. G. Cryo-electron microscopy: moving beyond X-ray crystal structures for drug receptors and drug development. Annu. Rev. Pharmacol. Toxicol. 60, 51–71 (2020).
18. Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 1–5 (2020).
19. Gilliland, G. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
20. Pieper, U. et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 42, D336–D346 (2013).
21. Koo, D. C. E. & Bonneau, R. Towards region-specific propagation of protein functions. Bioinformatics 35, 1737–1744 (2018).
22. Torng, W. & Altman, R. B. High precision protein functional site detection using 3D convolutional neural networks. Bioinformatics 35, 1503–1512 (2018).
23. Schug, J., Diskin, S., Mazzarelli, J., Brunk, B. P. & Stoeckert, C. J. Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res. 12, 648–655 (2002).
24. Das, S. et al. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 31, 3460–3467 (2015).
25. Guan, Y. et al. Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biol. 9, S3 (2008).
26. Wass, M. N., Barton, G. & Sternberg, M. J. E. CombFunc: predicting protein function using heterogeneous data sources. Nucleic Acids Res. 40, W466–W470 (2012).
27. Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–227 (2013).
28. Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17, 184 (2016).
29. Zhou, N. et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol. 20, 244 (2019).
30. Peña-Castillo, L. et al. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol. 9, S2 (2008).
31. Cozzetto, D., Minneci, F., Currant, H. & Jones, D. T. FFPred 3: feature-based function prediction for all Gene Ontology domains. Sci. Rep. 6, 31865 (2016).
32. Mostafavi, S. et al. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9, S4 (2008).
33. Cho, H., Berger, B. & Peng, J. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 3, 540–548 (2016).
34. Barot, M., Gligorijević, V. & Bonneau, R. deepNF: deep network fusion for protein function prediction. Bioinformatics 34, 3873–3881 (2018).
35. Bepler, T. & Berger, B. Learning protein sequence embeddings using information from structure. In International Conference on Learning Representations (2019).
36. AlQuraishi, M. End-to-end differentiable learning of protein structure. Cell Syst. 8, 292–301.e3 (2019).
37. Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, 1–34 (2017).
38. Kulmanov, M., Khan, M. A. & Hoehndorf, R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 34, 660–668 (2017).
39. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436 (2015).
40. Jiménez, J., Doerr, S., Martínez-Rosell, G., Rose, A. S. & De Fabritiis, G. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 33, 3036–3042 (2017).
41. Amidi, A. et al. EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation. PeerJ 6, e4750 (2018).
42. Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A. & Vandergheynst, P. Geometric deep learning: going beyond Euclidean data. IEEE Signal Process. Mag. 34, 18–42 (2017).
43. Henaff, M., Bruna, J. & LeCun, Y. Deep convolutional networks on graph-structured data. Preprint at https://arxiv.org/abs/1506.05163 (2015).
44. Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations (ICLR) (2017).
45. Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems Vol. 2, NIPS'15, 2224–2232 (MIT Press, 2015).
46. Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inform. Model. 57, 1757–1772 (2017).
47. Fout, A., Byrd, J., Shariat, B. & Ben-Hur, A. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) 6530–6539 (Curran Associates, Inc., 2017).
48. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) 618–626 (2017).
49. Peters, M. et al. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) 2227–2237 (Association for Computational Linguistics, 2018).
50. Graves, A. Generating sequences with recurrent neural networks. Preprint at https://arxiv.org/abs/1308.0850 (2013).
51. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2013).
52. Defferrard, M., Bresson, X. & Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems Vol. 29 (eds Lee, D. et al.) 3844–3852 (Curran Associates, Inc., 2016).
53. Hamilton, W., Ying, Z. & Leskovec, J. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) 1024–1034 (Curran Associates, Inc., 2017).
54. Velickovic, P. et al. Graph attention networks. In International Conference on Learning Representations (2018).
55. Dehmamy, N., Barabasi, A.-L. & Yu, R. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) 15413–15423 (Curran Associates, Inc., 2019).
56. Gutmanas, A. et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 47, D482–D489 (2018).
57. Leaver-Fay, A. et al. Rosetta3: an object-oriented software suite for the simulation and design of macromolecules. In Methods in Enzymology Vol. 487, 545–574 (Elsevier, 2011).
58. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33 (2005).
59. Bonneau, R., Ruczinski, I., Tsai, J. & Baker, D. Contact order and ab initio protein structure prediction. Protein Sci. 11, 1937–1944 (2002).
60. Alterovitz, R. et al. ResBoost: characterizing and predicting catalytic residues in enzymes. BMC Bioinform. 10, 197 (2009).
61. Pope, P. E., Kolouri, S., Rostami, M., Martin, C. E. & Hoffmann, H. Explainability methods for graph convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
62. Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1–15 (2018).
63. Zołna, K., Geras, K. J. & Cho, K. Classifier-agnostic saliency map extraction. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 33, 10087–10088 (2019).
64. Adebayo, J. et al. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 9505–9515 (Curran Associates, Inc., 2018).
65. Denil, M., Demiraj, A., Kalchbrenner, N., Blunsom, P. & de Freitas, N. Modelling, visualising and summarising documents with a single convolutional neural network. Preprint at https://arxiv.org/abs/1406.3830 (2014).
66. Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2012).
67. Porter, C. T., Bartlett, G. J. & Thornton, J. M. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res. 32, D129–D133 (2004).
68. Schneider, R., de Daruvar, A. & Sander, C. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 25, 226–230 (1997).
69. Huberts, D. H. & van der Klei, I. J. Moonlighting proteins: an intriguing mode of multitasking. Biochim. Biophys. Acta Mol. Cell Res. 1803, 520–525 (2010).
70. Geirhos, R. et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations (2019).
71. Ilyas, A. et al. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, Inc., 2019).
72. Chang, A., Schomburg, I., Jeske, L., Placzek, S. & Schomburg, D. BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Res. 47, D542–D549 (2018).
73. The Reference Genome Group of the Gene Ontology Consortium. The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput. Biol. 5, 1–8 (2009).
74. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
75. Lovell, S. C. et al. Structure validation by Cα geometry: ϕ, ψ and Cβ deviation. Proteins 50, 437–450 (2003).
76. Rhodes, G. Complementary Science: Crystallography Made Crystal Clear 3rd edn (Academic Press, Burlington, US, 2014).
77. Wang, G. & Dunbrack, R. L. Jr. PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003).
78. Nielsen, H., Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K. & Winther, O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395 (2017).
79. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
80. Hou, J., Adhikari, B. & Cheng, J. DeepSF: deep convolutional neural network for mapping protein sequences to folds. Bioinformatics 34, 1295–1303 (2017).
81. Eddy, S. R. A new generation of homology search tools based on probabilistic inference. In Genome Informatics: International Conference on Genome Informatics Vol. 23, 205–211 (2009).
82. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015).
83. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006).

Acknowledgements
R.J.X. is funded by NIH (DK043351), JDRF, and the Center for Microbiome Informatics and Therapeutics. R.B. is funded by NSF 1728858-DMREF and NSF 1505214 - Engineered Proteins. T.K. is partly funded by the Polish National Agency for Academic Exchange grant PPN/PPO/2018/1/00014. R.B., V.G., P.D.R., D.B., C.C., and J.K.L. are supported by Simons Foundation funding to the Flatiron Institute. K.C. is partly supported by Samsung AI and the Samsung Advanced Institute of Technology. We thank IBM for access to the World Community Grid (WCG).

Author contributions
V.G. wrote the manuscript with input from all the authors. V.G., R.B., and K.C. conceived the study. V.G. designed the experiments, oversaw all method development, conducted the benchmarks, and ran all of the analyses. P.D.R. performed the protein structure prediction and structure comparison experiments and, together with D.B., collected and curated all the contact maps used for training the models. P.D.F., T.K., J.K.L., D.B., T.V., C.C., B.C.T., I.M.F., H.V., R.J.X., R.K., K.C., and R.B. contributed to analysis and discussion of the data. C.C. developed the DeepFRI webserver. J.K.L. helped with visualizations and figure design. R.B. supervised the research. All authors reviewed the manuscript and approved it for submission.

Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-021-23303-9.

Correspondence and requests for materials should be addressed to V.G. or R.B.

Peer review information Nature Communications thanks Lucas Bleicher and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2021