Computational and Structural Biotechnology Journal 21 (2023) 630–643

Contents lists available at ScienceDirect

Computational and Structural Biotechnology Journal


journal homepage: www.elsevier.com/locate/csbj

Beyond sequence: Structure-based machine learning


Janani Durairaj a,b, Dick de Ridder b, Aalt D.J. van Dijk b,⁎

a Biozentrum, University of Basel, Basel, Switzerland
b Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands

Article info

Article history:
Received 26 September 2022
Received in revised form 21 December 2022
Accepted 21 December 2022
Available online 29 December 2022

Keywords:
Protein structures
Machine learning
Deep learning

Abstract

Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.

© 2023 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Contents

1. Introduction
2. Machine learning in the protein field
    2.1. Protein family based ML
    2.2. Protein universe based ML
3. Computational representations of protein structures
    3.1. Generating structure feature matrices
        3.1.1. Residue level
        3.1.2. Structural environment level
    3.2. Learning protein embeddings
4. Challenges and future directions
    4.1. Structure-based approaches are computationally expensive
    4.2. End-to-end learning on structures
    4.3. Dynamic representations of structure
    4.4. Probing underlying protein mechanisms
    4.5. A unified approach to function
5. Conclusion
CRediT authorship contribution statement
Conflicts of Interest
Acknowledgements
References

https://doi.org/10.1016/j.csbj.2022.12.039
2001-0370/© 2023 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

⁎ Corresponding author.
E-mail addresses: [email protected] (J. Durairaj), [email protected] (D. de Ridder), [email protected] (A.D.J. van Dijk).

1. Introduction

Protein bioinformatics is a thriving and fast-growing field dealing with algorithms and data structures to explore, analyse and compare (groups of) proteins in order to better understand their various biological, physicochemical and molecular properties and functions. With the increase in protein sequence data obtained from large-scale high-throughput sequencing technology, machine learning (ML) has become a key methodology in protein bioinformatics. In protein structure prediction, be it secondary structure, backbone angles, contacts, folds, or full-atom structure, ML has become indispensable and forms the basis of a number of popular tools and algorithms. ML has also successfully been applied to predict protein function, protein-protein interactions, drug-target binding, enzyme substrate specificity, thermostability, catalytic rates, binding affinity, variant and mutant effects and more. ML is data-driven and attempts to identify patterns in existing data to predict properties of new, unseen data. Given ML’s requirement of large amounts of diverse data, the overwhelming majority of ML applications on proteins use sequences as input, some of which are powering different aspects of popular resources such as Ensembl [1], Pfam [2] and UniProt [3]. However, numerous protein families have divergent protein sequences yet share highly similar three-dimensional structures, topologies and folds, since structure tends to evolve slower than sequence [4]. Furthermore, protein tertiary structure typically provides a wealth of information not found in sequence - spatial topology, residue interactions, solvent accessibility, residue dynamics and electrostatics, and more.

Historically, structural biology has depended primarily on experimental structure determination methods including X-ray crystallography, nuclear magnetic resonance (NMR), small-angle scattering, and cryo-electron microscopy (cryo-EM). The Protein Data Bank (PDB) [5], established in 1971, stores these experimentally determined structures and its size has been steadily increasing over the years. At the time of writing the PDB consists of 195,325 structures and grows by an average of 13,723 structures a year (calculated over 2017–2021). However, these numbers pale in comparison to the growing deluge of protein sequence data, with the UniProt protein database containing 226,771,949 sequence entries at the time of writing, over 771,752 more than the previous release, with a release cycle of 8 weeks. This phenomenon is often referred to as the sequence-structure knowledge gap [6]. Fortunately, experimental approaches are not the only way to obtain structural information, and computational structure prediction techniques are fast closing this gap. A protein’s structure can be modelled from its sequence either using the experimental structures of one or more homologous proteins (template-based, comparative or homology modelling), or using de novo prediction techniques (template-free or de novo modelling). Given that homology modelling performs well when using templates with > 30% sequence identity to the protein of interest, accurate structural models can be obtained for over 60% of the genes in the top 12 most accessed genomes on UniProt [7,8]. Template-free modelling, on the other hand, does not rely on global similarity to a known structure and hence can be applied to proteins with rarer folds. A recent breakthrough, the highly accurate deep-learning based AlphaFold2 model from DeepMind [9], trained on experimental structures to predict the structure for an input sequence, has allowed structural modelling to reach, in many cases, an accuracy and resolution as high as those of the best experimentally resolved structures. In collaboration with EBI, DeepMind has released the AlphaFold Protein Structure Database [10], currently containing over 200 million structural models. This increases high quality structural coverage by an average of 25% compared to homology modelling across 11 proteomes [11], reaching over 76% for the human genome and reducing the fraction of the human “dark proteome” from 26% to 10% [12]. Thus, we can theoretically obtain high resolution protein structural information for a large number of available protein sequences. In addition, computationally predicted models can help better resolve experimental structures [13–15].

With these advances, we are on the brink of a structural revolution with millions of newly modelled structures at our disposal. Thus ML applications in protein bioinformatics, already shown to be very powerful in shedding light on biological problems, now have a wealth of structural information to exploit as input instead of, or along with, the typically used protein sequences. These sequence- and structure-based ML methods (hereafter referred to as “structure-based”) can greatly outperform purely sequence-based approaches, as demonstrated in studies where the same ML architecture is validated once using only sequence and once using both sequence and structure information [16–19], though sometimes data biases have prevented useful training of structure-based methods [20]. The past years have already seen movement in the direction of protein structure-based ML and its role is sure to increase drastically in future research. In this review, we describe the space of machine learning on protein structures in terms of the kinds of tasks that structures can help solve and the kinds of algorithms applicable to these tasks. We outline the various structural features and representations currently obtainable. Finally, we look at open problems and challenges, as well as promising opportunities in this exciting field.

2. Machine learning in the protein field

Machine learning (ML) is defined as “the study of computer algorithms that improve automatically through experience and by the use of data” [21]. Typically, these algorithms find patterns in datasets and link such patterns to specific outcomes or groupings. Deep learning (DL) is a sub-field of ML which uses artificial neural networks with multiple stacked layers of network connections, enabling them to learn increasingly complex information from huge amounts of data compared to the more “classical” ML approaches. In this work, we use ML to refer to both DL and classical ML.

Supervised ML attempts to predict a certain response by learning patterns from labelled data. In the case of classification, this response is the membership of the data point in a particular grouping or class. Regression, on the other hand, predicts a real-valued numeric outcome. Unsupervised ML attempts to find clusters or learn reduced representations from data without any labels. See [22] for an in-depth introduction to these topics.

ML has been used widely across biology for decades, with reviews outlining its usage in the fields of omics [23], synthetic biology [24], biomedicine [25], and drug discovery [26]. In the context of proteins, ML approaches, both supervised and unsupervised, can broadly be divided into protein family based and protein universe based techniques. These two categories differ in the kinds of prediction problems they are applied to, the kinds of algorithms used, and the kinds of representations used as input.

2.1. Protein family based ML

Protein family based ML is used to predict properties of the members of individual protein families or sub-families, usually consisting of hundreds to thousands of experimentally characterised training proteins. Some of the questions in protein family supervised ML include specificity prediction of substrates, intermediates, products, and inhibitors; state prediction in the context of engineering thermostability, binding affinity and activity; and prediction of the effects of mutations. In many cases, such as the immensely diverse lipocalins [27] and the fast-evolving enzyme families involved in specialised metabolism [28], the sequence diversity within a family makes it impossible for sequence-based techniques to predict family properties. Even very similar sequences can have mutations in key structural regions resulting in completely different activities, which is easier to ascertain from structure than from sequence alone. In addition, insights from computational prediction methods which also use structure as input can better drive experimental studies due to the generally higher accuracy of structure-based prediction, and better enable exploration of the protein family space with structural stability and activity taken into account. We give examples of supervised ML tasks for some well-known protein families below.

The superfamily of G protein-coupled receptors (GPCRs) is the largest family of targets for approved drugs in modern drug discovery, and hence also a popular target for ML approaches to drive exploration and understanding. GPCRs play an essential role in physiological processes such as vision, olfaction, neuronal signal transmission, cell differentiation, pain, muscle contraction, and hormone secretion [29]. Recent ML studies on GPCRs have started incorporating structural information to improve prediction performance, and to derive biological insight into the residues and mechanisms involved. As commonly used ML models for structure, interaction and interface prediction are trained on soluble proteins, specialised GPCR-specific oligomerization and interface predictors were developed [30,31], able to handle their long transmembrane regions. Recent work even modified the existing AlphaFold2 algorithm to generate rarer GPCR conformations [32]. GPCRs often display high conformational flexibility and low thermostability, making their structural, biophysical, and biochemical characterisation in the laboratory challenging. Given that experimental identification of thermostabilizing mutations is very resource intensive and must be repeated for each individual receptor, computational prediction of GPCR mutant stability is a crucial task in this field [33]. Finally, GPCRs bind to a very diverse range of ligands and ML is used to identify biologically active ligands and binding inhibitors, estimate affinity and other binding properties, and probe ligand-specific binding mechanisms [34].

Another important class of drug targets is the kinases [35], with over 500,000 publications, 20,000 patents, inhibition assays for the majority of the human kinome and 115,000 kinase inhibitors covering 20% of the kinome [36]. With over 7000 structures solved covering 308 kinases across 8 groups and complexed with over 3000 unique ligands and inhibitors, structure-based ML approaches are widely used for addressing challenges within this superfamily. These include methods to predict inhibition [37] and binding affinity [38] in specific kinase families. Another common kinase challenge is predicting conformational change between the so-called active and inactive conformations [39,40]. For drug targets, predicting the effects of mutation of a single protein could also be considered a protein family ML task, as the inputs are still proteins sharing the same structural fold with key differences caused by changes in the sequence. PremPLI [41] uses features from modelled protein-ligand complexes to predict the effect of mutation on binding affinity to a number of inhibitors for a kinase cancer target.

In the field of natural products and specialised metabolism in plants, bacteria, and fungi, ML has slowly been gaining popularity over more traditional approaches involving similarity search or analysis of a few, closely related proteins. ML has been used for successful prediction of substrate [42,43] and product [44] specificity in various natural product enzyme families. In 2013, a structure-informed approach was used to engineer highly thermostable cytochrome P450s [19].

Though computationally predicted structures are shown to be highly accurate at the backbone level, tasks such as the ones described above which involve small molecule binding may need further family-specific processing and ML-based approaches to harness the structural information specifically related to ligand interaction. For example, [45] show that AlphaFold-predicted GPCR structures differ in crucial features such as domain assembly, ligand-binding pockets, and interface conformation, thus impeding their direct use in functional studies.

Unsupervised ML in the protein family space hosts a new sub-field of structural bioinformatics, dubbed “comparative structuromics” by Mohammed AlQuraishi. This is concerned with tools, algorithms, and techniques to compare and contrast assorted datasets of protein structures to answer a variety of biological questions - the evolutionary relationships between structural orthologs, interaction networks and how they are affected by structural changes, folding and changes within different cellular contexts and organisms, and how structure and folding are coupled with different functional characteristics. Zebra3D [46] is an example of such a technique. It provides a systematic analysis of 3D protein structure alignments combined with the identification of subfamily-specific regions using unsupervised ML clustering algorithms - these regions represent patterns of local 3D structure similar within subfamilies, but differing between them, thus likely to be associated with functional diversity and function-related conformational plasticity. The work of de Lima et al. [47] is another example of unsupervised protein family ML concerned with the detection of subfamilies and simultaneous identification of differentiating residues. Clustering and dimensionality reduction techniques have been used to describe the conformational landscape of proteins and identify binding-induced conformational change [48,49].

Protein family ML often has to deal with sparsely populated datasets and relies on algorithms which can handle a large number of features measured across a small number of data points. A wide range of algorithms is at our disposal for these tasks, including but not limited to k-nearest neighbours algorithms (k-NNs) [50], support vector machines (SVMs) [51], Gaussian processes [52], and ensemble methods such as Random Forests [53] and gradient boosting trees [54]. In addition, many approaches in this field aim to interpret prediction results to derive insights about underlying mechanisms and residues which may be important for function. Such predictions and insights obtained from protein family ML are often used to drive experimental research to explore and characterise novel, interesting or relevant proteins.

2.2. Protein universe based ML

The larger-scale protein universe based ML typically uses tens of thousands of proteins from diverse superfamilies to learn global properties of proteins, such as secondary and tertiary structure and folding, interactions, disorder, broad function classes etc. DL is a common choice for such problems, as it is known to drastically outperform other techniques in the presence of large amounts of data. In fact, protein structure prediction is in itself a protein universe task in which the use of DL has in many cases eclipsed other ML or statistical methods. This is true for prediction of secondary structure, solvent accessibility [55], backbone torsion angles [56,57], residue-residue contacts or distance matrices from co-evolution [58–62], and in de novo all-atom structure modelling. Indeed, all the top-performing Critical Assessment of Structure Prediction (CASP13 [63], CASP14 [64]) methods for de novo modelling rely on deep convolutional neural networks for predicting residue contacts or distances, predicting backbone torsion angles and/or ranking the final models. For recent reviews on the underlying techniques used, including those in AlphaFold2 and related approaches, see [65,66].


Table 1
Supervised protein universe tasks, inputs and examples.

Prediction of | Input | Examples | Datasets
Protein function | Protein | [67,68] | SIFTS [69]
Mutant stability | Protein + Mutation | [70–73] | ProThermDB [74], ATOM3D [75]
Cavity and pocket | Protein, Residue | [76,77] | TOUGH-C1 [77], SOIPPA [78]
Model quality | Protein, Residue | [79–82] | CASP [83]
PPI-Interface | Residue | [84–89] | ProtCID [90], Docking benchmark v5 [91], DockGround [92], DIPS-Plus [93]
Ligand binding site | Residue | [94–96] | sc-PDB [97], COACH420 [98], HOLO4K [99]
Intrinsic disorder | Residue | [11,100,101] | DIBS [102], DisProt [103]
Interaction | Protein-protein complex, Protein + Protein | [104–106] | DIP [107], STRING [108], HPRD [109], BioGRID [110], HPIDB [111]
Protein binding affinity | Protein-protein complex, Protein + Protein | [112–114] | Affinity benchmark [91], SKEMPI2 [115]
Ligand screening and binding affinity | Protein-ligand complex, Protein + Ligand | [38,79,116–124] | PDBBind [125], Binding MOAD [126], DUD-E [127]

The Input column describes the typical form of input given to the algorithms used. Multiple input format possibilities are comma-separated. All inputs refer to the structural context, i.e. “Protein” refers to the 3D protein structure, “Residue” to aspects associated with each individual residue - its physicochemical, electrostatic, geometric properties etc. (similarly for “Mutation”), “Ligand” to the 2D and/or 3D structure of a small molecule ligand.

With the availability of protein structures, a number of additional tasks can make use of structure-based ML instead of sequence. These are listed in Table 1, grouped by the kinds of inputs used. Recent examples as well as common datasets used to validate and benchmark novel algorithms created for each task are also listed.

In the 2020 CASP14 competition, the breakthrough results of AlphaFold2 prompted a press release declaring the protein structure problem for single protein chains solved [64]. This emphasis on “single protein chains” revealed the new frontier for structural bioinformatics - complex structures are yet to be successfully predicted at the same breakthrough levels. Thus the related yet distinct tasks of predicting whether two proteins interact, and predicting the interface of two interacting proteins are common protein universe problems with a number of solutions, based on docking [87,104], templates [105], end-to-end learning [84] and, most recently, protein complex prediction approaches building upon AlphaFold2 [128–130]. The latter generation combines the AlphaFold2 DL architecture with a modified paired MSA generation approach which encapsulates co-evolutionary information across the subunits of the desired complex. This yielded success rates for complex prediction up to double that of previous template-based and docking methods, marking significant progress in the field. However, these success rates are still only around 50% and vary drastically across species, protein families, types of complexes, and stoichiometries considered [129,131]. Similarly, the popular de novo protein structure prediction algorithm RoseTTAFold has been extended to the prediction of nucleic acid and protein-nucleic acid complexes [132], though again only around half of the tested complexes could be successfully modelled.

Structure-based drug discovery also hosts some significant applications of protein universe ML [133], starting from the computational modelling of putative receptor targets. Subsequently, binding sites in the target structure and putative drug candidates are identified using cavity/pocket prediction techniques [76], prediction of “druggable” regions, and protein-ligand binding site [134] prediction. This is typically followed by molecular docking to evaluate protein-ligand interaction and affinity between the target and a variety of drug candidates. In the case of unknown target proteins or to identify off-target binding candidates, reverse/inverse docking [135–138] is used to create embeddings of drugs and search across protein structure databases for good docking solutions. In these contexts, ML approaches are used to improve scoring functions of binding affinity and plausible docking poses [81,116,121,138,139]. Indeed, [140] show that computationally predicted structures perform on par with experimental structures at reverse docking tasks - although the docking and scoring methods themselves could use major improvements to further drug discovery and design.

Predicting the effects of variants and mutations, especially those involved in diseases, is another common task. Sen et al. [141] took advantage of the latest de novo structure prediction techniques to model human disease-associated proteins, many of which do not have existing structures or even close homologues. Afterwards, they compared disease-associated mutations to ligand binding sites, protein-protein interfaces and conserved regions predicted from the models, in order to provide some rationale for most of the mutations. However, the current DL-based structure predictors are not yet able to successfully predict the structural effects of mutations, as their training procedure is designed to be robust to small changes in sequence. This has been practically demonstrated in studies aiming to predict stability effects of mutations using predicted structures [142,143], and it indicates an under-explored area of structure prediction.

Approaches building upon AlphaFold2 and its underlying architectures have been used successfully in design tasks [144–147], indicating that the AlphaFold2 breakthrough may also cause a leap in protein design prediction. The process of constructing idealised folds during protein design can reveal new information about the physical and structural constraints that dictate which conformations a protein can adopt [148,149]. Such insights could be of vital importance to solving fundamental biological questions behind the evolution of proteins, as well as for further improvement of protein engineering and design [150]. See [151] for a recent review of DL approaches in the protein design field.

Intrinsically disordered proteins (IDPs) lack a fixed or ordered three-dimensional structure. This widespread phenomenon, thought to occur in over 33% of eukaryotic proteins, has been linked with allosteric regulation, enzyme catalysis, and a variety of diseases [152]. While structure-based prediction of intrinsic disorder may seem contradictory, energy scores obtained from existing structures [100] as well as residue-level computational modelling scores [11,101] contain information correlating with disorder and are effective for prediction. Structure-based ML has also been used to sample the very diverse conformational ensembles of IDPs [153].

Unsupervised techniques in the protein universe support tasks such as structure query and retrieval, clustering for motif and hotspot discovery, and structure-based fold annotation. For the first of these tasks, an array of fast techniques allows near-instant retrieval of structures matching an input structure [154–158]. Recent approaches for structure-based clustering allow pinpointing novel or rare folds [11,159], as well as residues and structural regions associated with function [160]. Another common task is the generation of fixed-dimensional unsupervised embeddings which capture global and local protein characteristics. These can be used in downstream ML algorithms, as discussed in the next section.
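The structure query and retrieval task mentioned above can be sketched using fixed-dimensional embeddings. The example below ranks database entries by cosine similarity to a query embedding. The embedding values and protein names are hypothetical placeholders, and cosine similarity is assumed here as one simple choice of similarity measure, not necessarily what the cited retrieval tools use.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query, database, top_n=2):
    """Return the top_n (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical 4-dimensional embeddings; real structure embeddings are much larger.
database = {
    "protein_A": [1.0, 0.0, 0.0, 0.5],
    "protein_B": [0.9, 0.1, 0.0, 0.4],
    "protein_C": [0.0, 1.0, 0.8, 0.0],
}

# The query is protein_A's embedding scaled by 2: cosine similarity ignores magnitude.
query = [2.0, 0.0, 0.0, 1.0]
hits = retrieve(query, database, top_n=2)
```

This brute-force scan compares the query against every database entry. Searching databases with millions of structures would in practice need an indexing or approximate nearest neighbour scheme, which is outside the scope of this sketch.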


Fig. 1. Common steps in structure-based machine learning. A) Starting from a set of protein sequences, structural models can either be retrieved from the PDB or constructed
using computational approaches. B) A number of different feature extraction, feature engineering, or pre-trained embedding approaches can then be used C) to extract a matrix
representation of the input, with the rows as data points and columns representing features or embedding values. D) This matrix forms the input for ML models resulting in
predictions of classes, regression values, or unsupervised clustering and dimensionality reduction. E) Prediction results, combined with the trained model, can be used to inspect
and interpret regions of the protein structure relevant for the task at hand.
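Step C of Fig. 1, the matrix representation with data points as rows and features as columns, can be illustrated with a minimal sketch. It assumes each protein is already described by per-residue annotations: a secondary structure code (`"H"` helix, `"E"` strand, `"C"` other) and a relative solvent accessibility value. The input format, the feature names, and the toy proteins are hypothetical choices for illustration only.

```python
FEATURE_NAMES = ["length", "helix_fraction", "strand_fraction", "mean_rsa"]


def protein_features(residues):
    """Summarise per-residue annotations into one fixed-length feature row."""
    n = len(residues)
    if n == 0:
        raise ValueError("a protein needs at least one residue")
    helix = sum(1 for r in residues if r["ss"] == "H")
    strand = sum(1 for r in residues if r["ss"] == "E")
    mean_rsa = sum(r["rsa"] for r in residues) / n
    return [float(n), helix / n, strand / n, mean_rsa]


def build_feature_matrix(proteins):
    """Rows are proteins (sorted by name), columns follow FEATURE_NAMES."""
    names = sorted(proteins)
    matrix = [protein_features(proteins[name]) for name in names]
    return names, matrix


# Hypothetical toy proteins with made-up annotations.
proteins = {
    "toy_helical": [
        {"ss": "H", "rsa": 0.2},
        {"ss": "H", "rsa": 0.4},
        {"ss": "C", "rsa": 0.9},
        {"ss": "H", "rsa": 0.1},
    ],
    "toy_sheet": [
        {"ss": "E", "rsa": 0.3},
        {"ss": "E", "rsa": 0.5},
        {"ss": "C", "rsa": 0.7},
    ],
}

row_names, feature_matrix = build_feature_matrix(proteins)
```

Whatever the protein length, every row has the same number of columns, which is what lets the matrix feed the ML models of step D. The same layout applies when the data points are residues rather than whole proteins.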

3. Computational representations of protein structures

Protein structures contain interconnected high-dimensional information about the amino acids involved, their positions and relative orientations, and the varying physicochemical and electrostatic effects they have on each other. Fig. 1 shows an overview of the most common steps taken in structure-based ML. Once a set of structures with or without associated labels has been collected (Fig. 1A), the next step typically consists of choosing a format to represent this information that can be understood by computers (Fig. 1C). One way to do this is by explicitly extracting a set of attributes or features from proteins to create a tabular feature matrix. Another approach is to generate reduced fixed-dimensional protein representations, referred to as embeddings. Both these approaches (Fig. 1B) are followed by the use of ML algorithms that take the feature matrix or embedding as input and return various results (Fig. 1D) and insights (Fig. 1E) for user interpretation.

A number of studies have demonstrated that high-confidence predicted structural models (both homology-based and DL-based) have predictive power and can even perform as well as experimental structures on specific tasks [11,16,33,161]. However, this is unlikely to hold in general, as it depends strongly on both the types of proteins and the task at hand. For example, membrane proteins, intrinsically disordered proteins, and proteins with high conformational

634
J. Durairaj, D. de Ridder and A.D.J. van Dijk Computational and Structural Biotechnology Journal 21 (2023) 630–643

flexibility would still benefit from experimental structures solved in different conditions to increase the diversity of structures available and thus our knowledge of them. In addition, side-chain modelling accuracy, crucial for tasks involving side-chain interactions, tends to lag behind main chain accuracy. Finally, in a significant number of cases, AlphaFold2 and related approaches do not produce high-confidence structures. It was recently shown that while residues predicted by AlphaFold2 with high confidence (> 90 pLDDT) have a very low prediction error (median 0.6 Å), this quickly increases to over 3 Å error for low confidence residues (< 70 pLDDT) [162]. For such cases with only low confidence structure information present, we may still have to fall back on sequence-based approaches or utilise embedding techniques as described in Section 3.2.

3.1. Generating structure feature matrices

Broadly, protein structures are compared at the residue level, where features are extracted from each individual residue in the structure, or at a structural environment level, where features are extracted from well-defined portions of the structure (or the entire structure) containing relevant and localised properties. The former approach is commonly used in structurally conserved protein family ML tasks involving the entire protein, and the latter is used for more divergent proteins or for more specific tasks involving the corresponding structural environments. Both approaches use a range of techniques to align or arrange the extracted features into the fixed-dimensional feature matrix format.

3.1.1. Residue level

Many different features can be extracted from each residue in a protein structure using a plethora of computational tools, as listed in Table 2.

When the proteins under consideration are evolutionarily closely related, multiple protein alignment is commonly used to generate the input feature matrix. While sequence alignment has generally been much more popular than structure alignment, the existence of protein families which share the same structural fold despite having little sequence similarity necessitates the use of structure-based alignment methods. This has driven the development of fast multiple structure aligners capable of scaling to the numbers of proteins required to train ML algorithms [178–180].

An alternative to the tabular format is a (dis)similarity matrix, often used as input to kernel-based methods such as SVMs or in unsupervised dimensionality reduction. For instance, de Lima et al. [47] calculate protein-protein similarity by combining similarities calculated from, among other features, structural alignment, alignment-free structural comparisons, putative active sites, and instability indices.

Table 2
Structural features and tools used to extract them. Apart from DISOPRED, all tools use protein structures as input.

Residue feature                      Tools
Accessible surface area              NACCESS [163], PSAIA [164], FreeSASA [165], DSSP [166], ProtDCal [167]
Half sphere exposure                 BioPython (Bio.PDB.HSExposure) [168]
Residue depth                        MSMS [169], PSAIA [164]
Hydrogen bonding patterns            DSSP [166]
Bond angles                          DSSP [166], MDAnalysis [170]
Secondary structure                  DSSP [166]
Energy                               FoldX [171], Rosetta [172]
Electrostatics                       APBS [173]
Disorder                             DISOPRED [174]
Residue flexibility and stiffness    ProDy [175], MechStiff [176]
Perturbation response                PRS [177]
Thermodynamics                       ProtDCal [167]

3.1.2. Structural environment level

Fig. 2 depicts some structural environments commonly used in computational representations. For tasks such as hotspot prediction or interface residue prediction, each input data point could be a single residue. In such situations, including aggregate features with weighted neighbour averages over the spatial nearest neighbouring residues, as shown in Fig. 2A, often improves the discriminatory power of predictors [181]. Some environment representations were born out of the ease of adapting approaches from other fields to protein structures - for example, viewing the three-dimensional coordinates of atoms in a structure as a 3D image grid (Fig. 2B) allows the application of voxelization followed by the use of 3D convolutional neural networks often applied in the field of computer vision. Whereas in the case of images the red, green and blue values are often encoded as different channels, for proteins these channels have been used to encode different atom types [77,95]. Another approach that can also take into account atomic density and radii is the use of geometric tessellations to define a set of polyhedra around atoms or residues in a structure [182–185] (Fig. 2C).

Representations of the molecular surface (Fig. 2D) are useful for tasks related to protein interactions and protein-solvent interactions. For example, MaSIF [86] depicts the surface as a series of overlapping radial patches with associated geometric features such as shape index and distance-dependent curvature, as well as chemical features such as hydropathy index, continuum electrostatics and the location of free electrons and proton donors. A geometric deep neural network is applied to these input features to spatially localise features and optimise them towards particular tasks. Other approaches have used 3D Zernike or similar descriptors of surfaces which are invariant to rotation, thus allowing structures and surfaces of different proteins to be compared [186–188]. In fact, one of the main problems to solve when representing entire protein structures is this rotational and translational invariance. Fig. 2E depicts one way to address this, namely by using a 2D residue-residue distance or contact map [189,190]. Another approach gaining popularity is the representation of a protein structure as a graph (Fig. 2F) with rotation and translation invariant properties attached to the nodes and/or edges [17,191–194]. These graphs form the ideal input for geometric deep learning approaches and have the capacity to encode most of the information contained in the protein structure [195,196].

Proteins often interact with other molecules - other proteins, peptides, nucleic acids and small molecule ligands - so computational representations of these binding regions or interfaces are necessary for a number of tasks. Graph [122,197,198] and voxel-based [79,116,199] approaches can be used on experimentally solved or computationally docked protein-ligand complexes, usually by zooming in to the ligand binding pocket. In addition, there are specialised approaches to take into account explicit protein-ligand interactions within the ligand binding pocket in a complex [124,200]; see [201] for more examples of protein-ligand feature representations. In cases where data about the complex is absent but unbound structures are present, some approaches concatenate features of the individual entities as their representation [117,119,120].

3.2. Learning protein embeddings

A complementary approach to generate the tabular input required for ML is by using end-to-end or pre-trained embedding algorithms. These typically make use of unsupervised DL methods trained on a large dataset of proteins to produce a series of values representing a given protein in a fixed high-dimensional space, often without the need for explicitly handcrafted features. Due to the training process, these values place similar proteins closer together in this space, thus capturing overall protein variation and relationships between individual proteins. For example, recent global sequence embeddings have been shown to capture amino acid


Fig. 2. Different approaches for computational representation of a protein structure which go beyond features of individual residues. For A-D, features or representations
calculated across individual blocks (respectively: spheres, grids, polyhedra, surface patches) are used as input to ML, while for E-F, the entire matrix or graph is often used in
methods specifically designed for these kinds of inputs. A) Overlapping spheres. B) 3D voxel grids. C) Geometric tessellations. D) Molecular surface representations. E) Distance/contact
maps. F) Graph representations.
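
To make panels E and F of Fig. 2 more concrete, the following sketch computes a residue-residue distance map, a binary contact map, and the edge list of a simple residue graph, and then checks numerically that the distance map is unchanged when the coordinates are rotated and translated. The coordinates, the 8 Å cutoff, and all function names are hypothetical and chosen only for illustration; they do not reproduce any specific method cited in this review.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates (Fig. 2E)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(coords, cutoff=8.0):
    """Binary contacts; the 8 Å cutoff is a common but arbitrary choice."""
    return [[int(d <= cutoff) for d in row] for row in distance_map(coords)]


def contact_graph(coords, cutoff=8.0):
    """Edge list (i, j, distance) of the corresponding residue graph (Fig. 2F)."""
    dmap = distance_map(coords)
    n = len(coords)
    return [(i, j, dmap[i][j]) for i in range(n) for j in range(i + 1, n)
            if dmap[i][j] <= cutoff]


def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body motion: rotate about the z axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


# Hypothetical C-alpha coordinates in Å, for illustration only.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (9.0, 9.0, 4.0)]
moved = rotate_z_and_shift(ca, angle=1.1, shift=(10.0, -4.0, 2.5))

original, transformed = distance_map(ca), distance_map(moved)
# Largest change in any pairwise distance after the rigid-body motion.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(original, transformed)
               for a, b in zip(row_a, row_b))
edges = contact_graph(ca)
```

The raw coordinates differ between `ca` and `moved`, but `max_diff` stays at the level of floating-point rounding, which is why distance maps and graphs with distance-based edge attributes are convenient inputs when the orientation of a structure in space carries no meaning.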

characteristics and other physiological properties of proteins as a whole [202–205]. These have recently been extended to include structural information as well [206,207]. Unlike protein family ML, alignment is generally not an option in such techniques since most proteins used for training are evolutionarily remote, thus most described embedding techniques depend on learning alignment-free patterns across diverse proteins or on generating on-the-fly alignments of sub-groups of data during the learning process.

End-to-end learning is popular in this area, covering techniques which start from the raw protein structure with minimal processing and automatically extract features based on optimising prediction accuracy in a given end task - thus the intermediate feature representations or embeddings learned are more applicable to the task at hand and can be retrained to adapt better to different tasks. ContactLib-ATT [208] applies this concept to predict the SCOP (Structural Classification Of Proteins) classification of an input structure, using attention-based learning [209] on vectors of hydrogen bond properties extracted from the structure. SASNet [84] is an example of such an approach applied to interface prediction. Local atomic environments of each surface residue are voxelized and a 3D convolutional neural network is applied to the resulting grids of each pair of residues to learn their interaction propensity. Interestingly, this method was trained based only on residues within bound structures of interacting partners and yet performs exceedingly well also on unbound counterparts, indicating that complex features beyond simple shape complementarity can be learned in this end-to-end fashion. dMaSIF [210], the successor to MaSIF (mentioned above), performs end-to-end learning of molecular surface representations directly from 3D point cloud data, optimised to each prediction task. Removing the reliance on handcrafted features improved the running time of dMaSIF by many orders of magnitude compared to MaSIF while maintaining and even improving accuracy. Recent DL approaches use the concept of “equivariance” (i.e. rotation and translation of coordinates do not affect the learning process) in sequence, graph-based, and diffusion architectures for end-to-end predictive and generative learning [211–213].

GeoPPI [113] is an unsupervised approach that operates on the graph of a protein complex and uses a message passing neural network to reconstruct the structure of a perturbed complex, i.e. one in which a random residue is modified. This enables learning of intrinsic binding interactions, optimal for the prediction of protein-protein binding affinity. An advantage of such “self”-supervised approaches is that they are not specific to a single task while still encoding more global protein context; i.e. GeoPPI embeddings could easily be used as input for any prediction task. This kind of repurposing of unsupervised or pretrained embeddings is quite popular in the sequence world [214,215], and likely the same will hold true for structure-based ML in the future. Pretrained embeddings can also be used in a transfer learning context, where they are further fine-tuned to a more specific case of a general protein problem, such as the prediction of antibody-antigen interfaces from an embedding trained across all protein-protein interfaces [17].

Another interesting and relevant approach is structure-guided sequence embeddings [203,216,217] - these make use of structural information only in the training stage while the input to the embedding algorithm from the perspective of the end user is just the sequence. This provides a compromise between the use of structure data, which may be computationally expensive to produce, and more easily accessible sequence data while still making use of implicit structural information. Some recent work [194,218] has even made use of the intermediate representations generated by AlphaFold2 during the structure prediction process instead of, or along with, the predicted structure itself - these representations contain
information about homologous sequences and structures, especially useful for predicting the effects of mutations or ligand binding, most of which is lost on generation of the final structure.

4. Challenges and future directions

Despite rapid progress in the direction of structure-based ML, there are challenges to address before it can become as ubiquitously used as sequence-based ML. Just as there exists a wide variety of tools for answering questions from a sequence perspective, there need to be tools in structural bioinformatics that are as easy to use, as intuitive to interpret, as optimised, and as feature-rich.

4.1. Structure-based approaches are computationally expensive

The universal and widespread use of protein sequence data, combined with its one-dimensional nature, has resulted in a diverse landscape of highly optimised sequence-based tools and algorithms. Many of these, including clustering algorithms, aligners, feature extractors etc., scale to hundreds of thousands of sequences with ease. This cannot be said for structure-based approaches yet, both due to their relative newness and to structural data being much more complex than sequence data.

Often this resource intensiveness starts from the very first step - i.e. generating structural models. Template-based or homology modelling approaches take a matter of minutes to hours for generating a single model, often exacerbated by the need to infer multiple models for better robustness and expensive additions such as loop modelling for special cases. Recent template-free methods such as AlphaFold2 and RoseTTAFold run in minutes, though scaling very poorly with the number of residues, and require GPUs and high amounts of memory and disk space. Memory and space requirements for both are somewhat alleviated by the presence of servers such as SWISS-MODEL [219] for template-based modelling and the recently released ColabFold [220] for template-free modelling, both of which allow running these resource-intensive modelling steps on shared external servers. In addition, the growth of the AlphaFold protein structure database [9] will eventually reduce the need for remodelling from scratch for a large number of sequenced proteins. Mutants, designed and novel proteins will still need computational modelling however, indicating that speeding up the modelling process is still a relevant problem in the field. Recent approaches that use protein language model embeddings as input instead of calculating time-intensive multiple sequence alignments (MSAs) provide a step in this direction [221]. With the growth of exascale computing resources, modelling structural dynamics via molecular simulations is increasingly accessible, though there is a long way to go for this to become commonplace.

Once a dataset of structures is gathered or generated, the next steps often involve structural comparison and feature extraction. Alignment-free structural comparison techniques are relatively fast already, but structural aligners that scale to the sizes of datasets required for ML have only recently started to appear. These are still a far cry from the highly optimised sequence aligners, but many of these optimisation techniques can be transferred to structure-based approaches and represent a logical next step as ML on structures grows in popularity. Extraction of many of the features detailed in Table 2 is time-consuming as well. While some improvements can be made with parallelisation and making better use of modern hardware, this is unlikely to scale to hundreds of thousands of proteins in a similar timescale as sequence feature extraction.

4.2. End-to-end learning on structures

End-to-end learning, where a DL model learns a mathematical function to map an input to a complex output [222], with minimal handcrafting of intermediate features and tasks, was seen to be highly successful for the extremely complex task of mapping an input sequence to a 3D structure [66]. This has been followed by a boom in end-to-end learning approaches on protein sequences for function prediction, as well as on protein structures for generating designed protein sequences. See [223] for a recent review.

End-to-end learning is becoming popular for a number of tasks as large models trained once on huge datasets of structures can then be reused for smaller sets of proteins and adapted to similar tasks with much less resource consumption and, at the same time, a great increase in performance for even sparse amounts of data [16,212,213,224,225]. In addition, these approaches can learn to make use of relevant intermediate information from proteins that may not be required or prioritised for the structure prediction task but are crucial for other downstream tasks - for example, residue masking in the AlphaFold2 learning procedure increases its robustness and improves overall structure prediction but makes it impossible to predict the structural changes caused by mutations, while much of this information is still present in the intermediate representations and useful for mutant effect prediction [218].

However, these learners do need huge initial training sets of diverse data and careful architecture engineering to avoid overfitting as well as large amounts of computational resources for training and inference. In addition, results from such approaches are difficult to interpret in terms of which kinds of protein properties are being used to make certain decisions, whereas such interpretability is a useful property of more handcrafted ML techniques for hypothesising about the underlying biology.

4.3. Dynamic representations of structure

Since proteins are inherently dynamic in nature, their true “structure” is much more than the rigid three-dimensional coordinates which serve as the basis for many of the approaches detailed in the previous sections. Instead, a protein is an ensemble of possible conformations, with some areas displaying more flexibility than others. This is further influenced by the constant interaction of proteins with the surrounding solvent, small molecules, nucleic acids, peptides and of course other proteins, all of which drive conformational changes within the protein. Protein biological activity often involves adopting specific conformations, contributions from local fluctuations, and even large-scale structural transitions between different conformations. In fact, the old paradigm that sequence encodes structure, and structure determines function can now be rephrased as sequence encodes structure, structure determines dynamics, and dynamics encodes function [226].

Protein flexibility and conformational diversity can be modelled in multiple ways. One of the most common approaches is using molecular dynamics (MD) simulations, which calculate the force exerted on each atom by all other atoms as a function of time using a molecular mechanics force field [227]. However, MD simulations, which are already computationally extremely expensive, do not address covalent bond formation or breakage, both crucial in a number of enzyme families. This sometimes leads to the need for the even more expensive and challenging setup of quantum mechanics/molecular mechanics (QM/MM) simulations [228]. Coarse-grained modelling with Monte Carlo simulations (CG-MC) and elastic network models (ENM, a.k.a. normal mode analysis) both provide simplified protein representations that still allow for understanding some aspects of protein flexibility while greatly reducing computational time [226,229]. [...] structures resolved by cryo-EM, a fast-growing number.

Together, these computational techniques can provide information about globular protein flexibility and mutations [230,231], large-scale structural transitions (e.g. from active to inactive conformations) [232–235], and conformations involved in the formation
of protein complexes [236]. They have also been used to assess and refine 3D models [237–239], improve ligand positioning [240,241], and to create receptor ensembles for ensemble docking [242,243]. The faster and cruder CG-MC and ENM approaches can be combined with atomistic-level MD, providing efficient strategies and starting points for multiscale simulations of proteins and complexes [244]. While ML is becoming more prevalent in the MD and CG-MC fields, to construct force field models, model energy surfaces, and perform conformational sampling [245–247], future efforts will likely also utilise the flexibility information obtained from these techniques as input in ML-based predictors of protein function, with a few early examples already doing this in unsupervised [248,249] and supervised settings [250,251]. There is some evidence that this can improve over static structure-based prediction [252].

4.4. Probing underlying protein mechanisms

A major limitation of DL-based structure prediction techniques, where prediction acts merely as an alternative to an experimental technique, is that they do not immediately provide us with a deeper understanding of the processes behind the folding of proteins as this is not their aim [253]. In contrast, many approaches using structural data to predict protein properties, especially those in protein family ML, have tried to make more explicit use of the rich feature sources provided to extract mechanistic insights and interpret the residues, causes and processes involved behind specific predictions, as well as guide experimental design in the most relevant directions. Interpretable ML is a crucial concept in bioinformatics, as often we are as interested in the how and why of a prediction as we are in the what. Thus an important next step in structure-based ML is to couple predictions with an understanding of protein biology in terms of folding, interaction, function, and the interplay between the three. From a protein universe perspective, interpretation becomes dependent on the model inspection techniques specific to DL approaches. While this is a nascent field, techniques such as integrated gradients, saliency and class activation maps exist for this purpose, though they are rarely used yet in structure-based ML tasks [254]. Large-scale unsupervised techniques exploring the protein structural space can also be helpful to pinpoint folds, pockets, and interfaces upon which evolutionary and function-specific analyses can be conducted and for which ML representations and techniques that lend themselves well to linking of prediction to cause can be used. Most importantly, a tight coupling of computational prediction with experimental setup is required, creating a feedback loop that improves prediction and experimentally characterizes relevant functional space.

4.5. A unified approach to function

Biological function is only partly determined by an individual protein – its genomic and cellular contexts also play a big role. Each protein is determined by an underlying gene sequence, but the mapping from gene to protein is not so straightforward, complicated by the existence of alternatively spliced transcript variants [255], pre-protein sequences in need of further processing [256], and moonlighting pseudoenzymes [257]. In addition, post-translational modifications, the developmental stage of an organism’s life, the protein’s subcellular localisation and environment in the cell, and even the extra-cellular conditions all have an effect on protein expression and function [258]. More often than not, proteins also work in concert with a wide variety of other entities, ranging from metal ions and cofactors to water and other solvent molecules, small molecule ligands, peptides, nucleic acids, and other proteins.

One area of study focused on integrating these different contexts of proteins and their complex interactions is network biology. This field is crucial for the accurate modelling of biological systems, and given the influx of data from high-throughput interaction assays and large-scale multi-omics studies, a great target for ML and DL methods. The future holds an increasing number of opportunities for this combination of network biology and ML [259] – in understanding and fighting diseases by inspecting protein and gene interaction networks, in locating off-target effects of drugs and concocting valuable drug combination therapies based on chemical networks and multi-omics data from drug treatments [260], in understanding microbial interactions through metabolic networks, in finding biosynthetic gene clusters through gene neighbourhoods, transcriptomics, and expression profiling, and in designing synthetic gene circuits combining interconnected genes, promoters, and ribosome binding sites. Apart from a few examples [261], structural data has rarely been used in such large-scale integrative approaches due to its scarcity and complexity. With the former being solved, the future holds promise in finding and using algorithms and approaches to link protein structures with all of their interlinked data in a unified approach to model function [262].

5. Conclusion

Protein structure is central to understanding biological processes, and thus a great addition to ML approaches in the protein bioinformatics field. In this review we described the space of structure-based ML in terms of the tasks it can be applied to, and the kinds of input representations and algorithms used, with a number of examples demonstrating the powerful predictions that can be obtained. Mainly due to the recent breakthroughs in computational structure prediction, the field of structure-based ML is expanding very rapidly, with a high number of actively cited preprints in this review attesting to this. At the moment, sequence-based features, aligners, representations, and ML approaches still far outnumber structure-based ones and they are generally much faster as well. However, the power of structural information to improve computational prediction of protein biology is alluring, and the growth of structural databases, algorithms for alignment and representation, and increasing accessibility of relevant DL approaches and architectures will foster a new generation of protein bioinformatics in which structure will play a starring role.

CRediT authorship contribution statement

Janani Durairaj: Conceptualization, Investigation, Visualization, Writing – original draft, Writing – review & editing. Dick de Ridder: Writing – review & editing, Supervision. Aalt D.J. van Dijk: Writing – review & editing, Supervision, Project administration.

Conflicts of Interest

We have no conflicts of interest to disclose.

Acknowledgements

This work was supported by the Netherlands Organization for Scientific Research (NWO).

References

[1] Zerbino DR, Wilder SP, Johnson N, Juettemann T, Flicek PR. The ensembl regulatory build. Genome Biol 2015;16(1):56. https://doi.org/10.1186/s13059-015-0621-5
[2] Bileschi ML, Belanger D, Bryant DH, Sanderson T, Carter B, Sculley D, Bateman A, DePristo MA, Colwell LJ. Using deep learning to annotate the protein universe. Nat Biotechnol 2022;40(6):932–7. https://doi.org/10.1038/s41587-021-01179-w
[3] Gane A., Bileschi, M.L., Dohan D., Speretta E., Héliou A., Meng-Papaxanthos L., Zellner H., Brevdo E., Parikh A., Orchard S. ProtNLM: model-based natural language protein annotation.
[4] Illergård K, Ardell DH, Elofsson A. Structure is three to ten times more conserved than sequence–a study of structural response in protein cores. Proteins Struct Funct Bioinform 2009;77(3):499–508. https://doi.org/10.1002/prot.22458


[5] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28(1):235–42.
[6] Schwede T. Protein modeling: what happened to the “protein structure gap”? Structure 2013;21(9):1531–40. https://doi.org/10.1016/j.str.2013.08.007
[7] Somody JC, MacKinnon SS, Windemuth A. Structural coverage of the proteome for pharmaceutical applications. Drug Discov Today 2017;22(12):1792–9. https://doi.org/10.1016/j.drudis.2017.08.004
[8] Bienert S, Waterhouse A, de Beer TAP, Tauriello G, Studer G, Bordoli L, Schwede T. The SWISS-MODEL Repository–new features and functionality. Nucleic Acids Res 2017;45(D1):D313–9. https://doi.org/10.1093/nar/gkw1132
[9] Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596(7873):583–9. https://doi.org/10.1038/s41586-021-03819-2
[10] Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, et al. AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 2022;50(D1):D439–44.
[11] M. Akdel, D.E.V. Pires, E.P. Pardo, J. Jänes, A.O. Zalevsky, B. Mészáros, P. Bryant, L.L. Good, R.A. Laskowski, G. Pozzati, A. Shenoy, W. Zhu, P. Kundrotas, V.R. Serra, C.H.M. Rodrigues, A.S. Dunham, D. Burke, N. Borkakoti, S. Velankar, A. Frost, J. Basquin, K. Lindorff-Larsen, A. Bateman, A.V. Kajava, A. Valencia, S. Ovchinnikov, J. Durairaj, D.B. Ascher, J.M. Thornton, N.E. Davey, A. Stein, A. Elofsson, T.I. Croll, P. Beltrao, A structural biology community assessment of AlphaFold2 applications, Nat Struct Mol Biol 29(11) (2022) 1056–1067. 10.1038/s41594-022-00849-w.
[12] Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol 2022;18(1). https://doi.org/10.1371/journal.pcbi.1009818

J, Ciruela F, editors. Progress in molecular biology and translational science, Vol. 169 of oligomerization in health and disease: from enzymes to G protein-coupled receptors Academic Press; 2020. p. 105–49. https://doi.org/10.1016/bs.pmbts.2019.11.007. (pp).
[31] Bordner AJ. Predicting protein-protein binding sites in membrane proteins. BMC Bioinform 2009;10(1):312. https://doi.org/10.1186/1471-2105-10-312
[32] L. Heo, M. Feig, Multi-state modeling of G-protein Coupled Receptors at experimental accuracy, bioRxiv Preprint (Nov. 2021). 10.1101/2021.11.26.470086.
[33] Popov P, Peng Y, Shen L, Stevens RC, Cherezov V, Liu Z-J, Katritch V. Computational design of thermostabilizing point mutations for G Protein-Coupled Receptors. eLife 2018;7:e34729. https://doi.org/10.7554/eLife.34729
[34] Raschka S, Kaufman B. Machine learning and AI-based approaches for bioactive ligand discovery and GPCR-ligand recognition. Methods 2020;180:89–110. https://doi.org/10.1016/j.ymeth.2020.06.016
[35] Cohen P. Protein Kinases – the major drug targets of the twenty-first century? Nat Rev Drug Discov 2002;1(4):309–15. https://doi.org/10.1038/nrd773
[36] Laufer S, Bajorath J. New frontiers in kinases: second generation inhibitors. J Med Chem 2014;57(6):2167–8.
[37] Afanasyeva A, Nagao C, Mizuguchi K. Developing a kinase-specific target selection method using a structure-based machine learning approach. Adv Appl Bioinform Chem AABC 2020;13:27–40. https://doi.org/10.2147/AABC.S278900
[38] de Ávila MB, Xavier MM, Pintro VO, de Azevedo WF. Supervised machine learning techniques to predict binding affinity A study for Cyclin-Dependent Kinase 2. Biochem Biophys Res Commun 2017;494(1):305–10. https://doi.org/10.1016/j.bbrc.2017.10.035
[39] McSkimming DI, Rasheed K, Kannan N. Classifying kinase conformations using a machine learning approach. BMC Bioinform 2017;18(1):86. https://doi.org/10.1186/s12859-017-1506-2
[40] Ung PM-U, Rahman R, Schlessinger A. Redefining the protein kinase conformational space with machine learning. Cell Chem Biol 2018;25(7):916–24.e2. https://doi.org/10.1016/j.chembiol.2018.05.002
[13] Pfab J, Phan NM, Si D. Deeptracer for fast de novo cryo-em protein structure [41] Sun T, Chen Y, Wen Y, Zhu Z, Li M. PremPLI: a machine learning model for
modeling and special studies on cov-related complexes. Proc Natl Acad Sci USA predicting the effects of missense mutations on protein-ligand interactions.
2021;118(2):e2017525118. (Nov.). Commun Biol 2021;4(1). https://doi.org/10.1038/s42003-021-02826-3.
[14] Jin S, Miller MD, Chen M, Schafer NP, Lin X, Chen X, Phillips GN, Wolynes PG. (Nov.).
Molecular-replacement phasing using predicted protein structures from [42] Mou Z, Eakes J, Cooper CJ, Foster CM, Standaert RF, Podar M, Doktycz MJ, Parks
awsem-suite. IUCrJ 2020;7(6):1168–78. JM. Machine learning-based prediction of enzyme substrate scope: application
[15] Chai L, Zhu P, Chai J, Pang C, Andi B, McSweeney S, Shanklin J, Liu Q. Alphafold to bacterial nitrilases. Proteins Struct Funct Bioinform 2021;89(3):336–47.
protein structure database for sequence-independent molecular replacement. https://doi.org/10.1002/prot.26019
Crystals 2021;11(10):1227. [43] Robinson SL, Smith MD, Richman JE, Aukema KG, Wackett LP. Machine
[16] Abdin O, Nim S, Wen H, Kim PM. PepNN: a deep attention model for the learning-based prediction of activity and substrate specificity for OleA enzymes
identification of peptide binding sites. Commun Biol 2022;5(1):1–10. https:// in the Thiolase superfamily. Synth Biol 2020;5(1). https://doi.org/10.1093/
doi.org/10.1038/s42003-022-03445-2 synbio/ysaa004. (Jan.).
[17] Pittala S, Bailey-Kellogg C. Learning context-aware structural representations to [44] Durairaj J, Melillo E, Bouwmeester HJ, Beekwilder J, de Ridder D, van Dijk ADJ.
predict antigen and antibody binding interfaces. Bioinformatics Integrating structure-based machine learning and co-evolution to investigate
2020;36(13):3996–4003. https://doi.org/10.1093/bioinformatics/btaa263 specificity in plant sesquiterpene synthases. PLoS Comput Biol
[18] Liu R, Hu J. DNABind: a hybrid algorithm for structure-based prediction of DNA- 2021;17(3):e1008197https://doi.org/10.1371/journal.pcbi.1008197
binding residues by combining machine learning- and template-based ap­ [45] He X-h, You C-z, Jiang H-l, Jiang Y, Xu HE, Cheng X. Alphafold2 versus experi­
proaches. Proteins Struct Funct Bioinform 2013;81(11):1885–99. https://doi. mental structures: evaluation on g protein-coupled receptors. Acta Pharmacol
org/10.1002/prot.24330 Sin 2022:1–7.
[19] Romero PA, Krause A, Arnold FH. Navigating the protein fitness landscape with [46] Timonina D, Sharapova Y, Švedas V, Suplatov D. Bioinformatic analysis of
Gaussian Processes. Proc Natl Acad Sci USA 2013;110(3):E193–201. https://doi. subfamily-specific regions in 3D-structures of homologs to study functional
org/10.1073/pnas.1215251110 diversity and conformational plasticity in protein superfamilies. Comput Struct
[20] Volkov M, Turk J-A, Drizard N, Martin N, Hoffmann B, Gaston-Mathé Y, Rognan Biotechnol J 2021;19:1302–11.
D. On the frustration to predict binding affinities from protein-ligand structures [47] de Lima EB, Júnior WM, de Melo-Minardi RC. Isofunctional protein subfamily
with deep neural networks. J Med Chem 2022;65(11):7946–58. https://doi.org/ detection using data integration and spectral clustering. PLoS Comput Biol
10.1021/acs.jmedchem.2c00487 2016;12(6):e1005001https://doi.org/10.1371/journal.pcbi.1005001
[21] Mitchell TM, et al. Machine learning vol. 45 McGraw Hill; 1997. [48] N. Ahalawat, J. Mondal, Resolving protein conformational plasticity and sub­
[22] Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for strate binding through the lens of machine-learning, bioRxiv Preprint (Jan.
biologists. Nat Rev Mol Cell Biol 2022;23(1):40–55. https://doi.org/10.1038/ 2022). 10.1101/2022.01.07.475334.
s41580-021-00407-0 [49] A. Joshi, N. Haspel, E. González, Characterizing protein conformational spaces
[23] Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and per­ using dimensionality reduction and algebraic topology, bioRxiv Preprint (Nov.
spectives. bbab460 Brief Bioinform 2022;23(1). https://doi.org/10.1093/bib/ 2021). 10.1101/2021.11.16.468545.
bbab460 [50] Peterson LE. K-Nearest neighbor. Scholarpedia 2009;4(2):1883. https://doi.org/
[24] Sieow BF-L, De Sotto R, Seet ZRD, Hwang IY, Chang MW. Synthetic biology 10.4249/scholarpedia.1883
meets machine learning. In: Selvarajoo K, editor. Computational biology and [51] Noble WS. What is a support vector machine? Nat Biotechnol
machine learning for metabolic engineering and synthetic biology, methods in 2006;24(12):1565–7. https://doi.org/10.1038/nbt1206-1565
molecular biology US, New York, NY: Springer; 2023. p. 21–39. https://doi.org/ [52] Rasmussen CE. Gaussian processes in machine learning. In: Bousquet O, von
10.1007/978-1-0716-2617-7_2. (pp). Luxburg U, Rätsch G, editors. Advanced lectures on machine learning: ML
[25] Wainberg M, Merico D, Delong A, Frey BJ. Deep learning in biomedicine. Nat summer schools 2003, Canberra, Australia, February 2 - 14, 2003, Tübingen,
Biotechnol 2018;36(9):829–38. https://doi.org/10.1038/nbt.4233 Germany, August 4 - 16, 2003, Revised Lectures, Lecture Notes in Computer
[26] Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Science Berlin, Heidelberg: Springer; 2004. p. 63–71. https://doi.org/10.1007/
Madabhushi A, Shah P, Spitzer M, Zhao S. Applications of machine learning in 978-3-540-28650-9_4
drug discovery and development. Nat Rev Drug Discov 2019;18(6):463–77. [53] Breiman L. Random forests. Mach Learn 2001;45(1):5–32. https://doi.org/10.
https://doi.org/10.1038/s41573-019-0024-5 1023/A:1010933404324
[27] Flower DR, North ACT, Sansom CE. The lipocalin protein family: structural and [54] Friedman JH. Greedy function approximation: a gradient boosting machine.
sequence overview. Biochim Biophys Acta ((BBA)) Protein Struct Mol Enzymol Ann Stat 2001;29(5):1189–232.
2000;1482(1):9–24. https://doi.org/10.1016/S0167-4838(00)00148-5 [55] Cheng J, Randall AZ, Sweredoski MJ, Baldi P. SCRATCH: a protein structure and
[28] Durairaj J, DiGirolamo A, Bouwmeester HJ, de Ridder D, Beekwilder J, van Dijk structural feature prediction server. Nucleic Acids Res 2005;33(Web Server
AD. An analysis of characterized plant sesquiterpene synthases. Phytochemistry issue):W72–6. https://doi.org/10.1093/nar/gki396
2019;158:157–65. https://doi.org/10.1016/j.phytochem.2018.10.020 [56] Shen Y, Bax A. Protein backbone and sidechain torsion angles predicted from
[29] Böhme I, Beck-Sickinger AG. Illuminating the life of GPCRs. Cell Commun Signal NMR chemical shifts using artificial neural networks. J Biomol NMR
2009;7(1):1–22. 2013;56(3):227–41. https://doi.org/10.1007/s10858-013-9741-y
[30] Barreto CAV, Baptista SJ, Preto AJ, Matos-Filipe P, Mourão J, Melo R, Moreira I. [57] Mataeimoghadam F, Newton MAH, Dehzangi A, Karim A, Jayaram B,
Chapter Four - Prediction and targeting of GPCR oligomer interfaces. In: Giraldo Ranganathan S, Sattar A. Enhancing protein backbone angle prediction by using

simpler models of deep neural networks. Sci Rep 2020;10(1):19430. https://doi. [83] Kryshtafovych A, Monastyrskyy B, Fidelis K. CASP11 statistics and the predic­
org/10.1038/s41598-020-76317-6 tion center evaluation system. Proteins Struct Funct Bioinform
[58] Xu J. Distance-based protein folding powered by deep learning. Proc Natl Acad 2016;84(S1):15–9. https://doi.org/10.1002/prot.25005
Sci USA 2019;116(34):16856–65. https://doi.org/10.1073/pnas.1821309116 [84] Townshend R, Bedi R, Suriana P, Dror R. End-to-end Learning on 3D protein
[59] Jones DT, Kandathil SM. High precision in protein contact prediction using fully structure for interface prediction. Adv Neural Inf Process Syst 2019;32.
convolutional neural networks and minimal sequence features. Bioinformatics [85] Sanchez-Garcia R, Sorzano COS, Carazo JM. A method for the prediction of
2018;34(19):3308–15. https://doi.org/10.1093/bioinformatics/bty341 partner-specific protein-protein interfaces. Bioinformatics 2019;35(3):470–7.
[60] Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein https://doi.org/10.1093/bioinformatics/bty647
contact map by ultra-deep learning model. PLoS Comput Biol [86] Gainza P, Sverrisson F, Monti F, Rodolà E, Boscaini D, Bronstein MM, Correia BE.
2017;13(1):e1005324https://doi.org/10.1371/journal.pcbi.1005324 Deciphering interaction fingerprints from protein molecular surfaces using
[61] Liu Y, Palmedo P, Ye Q, Berger B, Peng J. Enhancing evolutionary couplings with geometric deep learning. Nat Methods 2020;17(2):184–92. https://doi.org/10.
deep convolutional neural networks. e3 Cell Syst 2018;6(1):65–74. https://doi. 1038/s41592-019-0666-6
org/10.1016/j.cels.2017.11.014 [87] U. Ghani, I. Desta, A. Jindal, O. Khan, G. Jones, S. Kotelnikov, D. Padhorny, S.
[62] Ovchinnikov S, Park H, Varghese N, Huang P-S, Pavlopoulos GA, Kim DE, Vajda, D. Kozakov, Improved docking of protein models by a combination of
Kamisetty H, Kyrpides NC, Baker D. Protein structure determination using alphafold2 and cluspro, bioRxiv Preprint (Sep. 2021). 10.1101/2021.09.07.
metagenome sequence data. Science 2017;355(6322):294–8. https://doi.org/10. 459290.
1126/science.aah4043 [88] Bendell CJ, Liu S, Aumentado-Armstrong T, Istrate B, Cernek PT, Khan S,
[63] Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of Picioreanu S, Zhao M, Murgita RA. Transient protein-protein interface predic­
methods of protein structure prediction (CASP)—round XIII. Proteins Struct tion: datasets, features, algorithms, and the rad-t predictor. BMC Bioinform
Funct Bioinform 2019;87(12):1011–20. https://doi.org/10.1002/prot.25823 2014;15(1):1–12.
[64] Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of [89] Das S, Chakrabarti S. Classification and prediction of protein-protein interaction
methods of protein structure prediction (CASP)–round XIV. Proteins Struct interface using machine learning algorithm. Sci Rep 2021;11(1):1–12.
Funct Bioinform 2021;89(12):1607–17. https://doi.org/10.1002/prot.26237 [90] Xu Q, Dunbrack RL. Protcid: a data resource for structural information on
[65] Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat protein interactions. Nat Commun 2020;11(1):1–16.
Rev Mol Cell Biol 2019;20(11):681–97. https://doi.org/10.1038/s41580-019- [91] Vreven T, Moal IH, Vangone A, Pierce BG, Kastritis PL, Torchala M, Chaleil R,
0163-x Jiménez-García B, Bates PA, Fernandez-Recio J, Bonvin AMJJ, Weng Z. Updates to
[66] AlQuraishi M. Machine learning in protein structure prediction. Curr Opin the integrated protein-protein interaction benchmarks: Docking Benchmark
Chem Biol 2021;65:1–8. https://doi.org/10.1016/j.cbpa.2021.04.005 Version 5 and Affinity Benchmark Version 2. J Mol Biol 2015;427(19):3031–41.
[67] Gligorijević V, Renfrew PD, Kosciolek T, Leman JK, Berenberg D, Vatanen T, https://doi.org/10.1016/j.jmb.2015.07.016
Chandler C, Taylor BC, Fisk IM, Vlamakis H, Xavier RJ, Knight R, Cho K, Bonneau [92] Kundrotas PJ, Anishchenko I, Dauzhenka T, Kotthoff I, Mnevets D, Copeland
R. Structure-based protein function prediction using graph convolutional net­ MM, Vakser IA. Dockground: a comprehensive data resource for modeling of
works. Nat Commun 2021;12:3168. https://doi.org/10.1038/s41467-021- protein complexes. Protein Sci 2018;27(1):172–81.
23303-9 [93] A. Morehead, C. Chen, A. Sedova, Dips-plus: The enhanced database of inter­
[68] Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches acting protein structures for interface prediction, arXiv preprint arXiv:2106.
to predict protein functional families and functional sites. Curr Opin Struct Biol 04362 (2021).
2021;70:108–22. https://doi.org/10.1016/j.sbi.2021.05.012 [94] Jiménez J, Doerr S, Martínez-Rosell G, Rose AS, Fabritiis GDe. DeepSite: Protein-
[69] Dana JM, Gutmanas A, Tyagi N, Qi G, O’Donovan C, Martin M, Velankar S. SIFTS: binding site predictor using 3D-convolutional neural networks. Bioinformatics
updated Structure Integration with Function, Taxonomy and Sequences re­ 2017;33(19):3036–42. https://doi.org/10.1093/bioinformatics/btx350
source allows 40-fold increase in coverage of structure-based annotations for [95] Kozlovskii I, Popov P. Spatiotemporal identification of druggable binding sites
proteins. Nucleic Acids Res 2019;47(D1):D482–9. https://doi.org/10.1093/nar/ using deep learning. Commun Biol 2020;3(1):1–12. https://doi.org/10.1038/
gky1114 s42003-020-01350-0
[70] Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein sta­ [96] Krivák R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate
bility upon point mutations. Nucleic Acids Res 2006;34(suppl_2):W239–42. prediction of ligand binding sites from protein structure. J Cheminfor
https://doi.org/10.1093/nar/gkl190 2018;10(1):39. https://doi.org/10.1186/s13321-018-0285-8
[71] Li B, Yang YT, Capra JA, Gerstein MB. Predicting changes in protein thermo­ [97] Desaphy J, Bret G, Rognan D, Kellenberger E. sc-PDB: a 3D-database of ligand­
dynamic stability upon point mutation with deep 3D convolutional neural able binding sites—10 years on. Nucleic Acids Res 2015;43(D1):D399–404.
networks. PLoS Comput Biol 2020;16(11):e1008291https://doi.org/10.1371/ https://doi.org/10.1093/nar/gku928
journal.pcbi.1008291 [98] Roy A, Yang J, Zhang Y. COFACTOR: an accurate comparative algorithm for
[72] Masso M, Vaisman II. Accurate prediction of stability changes in protein mu­ structure-based protein function annotation. Nucleic Acids Res
tants by combining machine learning with structure based computational 2012;40(W1):W471–7. https://doi.org/10.1093/nar/gks372
mutagenesis. Bioinformatics 2008;24(18):2002–9. https://doi.org/10.1093/ [99] Schmidtke P, Souaille C, Estienne F, Baurin N, Kroemer RT. Large-scale com­
bioinformatics/btn353 parison of four binding site detection algorithms. J Chem Inf Model
[73] Quan L, Lv Q, Zhang Y. STRUM: Structure-based prediction of protein stability 2010;50(12):2191–200. https://doi.org/10.1021/ci1000289
changes upon single-point mutation. Bioinformatics 2016;32(19):2936–46. [100] Mészáros B, Erdős G, Dosztányi Z. IUPred2A: Context-dependent prediction of
https://doi.org/10.1093/bioinformatics/btw361 protein disorder as a function of redox State and protein binding. Nucleic Acids
[74] Nikam R, Kulandaisamy A, Harini K, Sharma D, Gromiha MM. ProThermDB: Res 2018;46(W1):W329–37. https://doi.org/10.1093/nar/gky384
thermodynamic database for proteins and mutants revisited after [101] McGuffin LJ. Intrinsic disorder prediction from the analysis of multiple protein
15 years. Nucleic Acids Res 2021;49(D1):D420–4. https://doi.org/10.1093/nar/ fold recognition models. Bioinformatics 2008;24(16):1798–804. https://doi.org/
gkaa1035 10.1093/bioinformatics/btn326
[75] R.J. Townshend, M. Vögele, P. Suriana, A. Derry, A. Powers, Y. Laloudakis, S. [102] Schad E, Fichó E, Pancsa R, Simon I, Dosztányi Z, Mészáros B. DIBS: a repository of
Balachandar, B. Jing, B. Anderson, S. Eismann, et al., Atom3d: Tasks on mole­ disordered binding sites mediating interactions with ordered proteins.
cules in three dimensions, arXiv preprint arXiv:2012.04035 (2020). Bioinformatics 2018;34(3):535–7. https://doi.org/10.1093/bioinformatics/btx640
[76] Naderi M, Lemoine JM, Govindaraj RG, Kana OZ, Feinstein WP, Brylinski M. [103] Piovesan D, Tabaro F, Mičetić I, Necci M, Quaglia F, Oldfield CJ, Aspromonte MC,
Binding site matching in rational drug design: algorithms and applications. Davey NE, Davidović R, Dosztányi Z, Elofsson A, Gasparini A, Hatos A, Kajava AV,
Brief Bioinform 2019;20(6):2167–84. https://doi.org/10.1093/bib/bby078 Kalmar L, Leonardi E, Lazar T, Macedo-Ribeiro S, Macossay-Castillo M, Meszaros
[77] Pu L, Govindaraj RG, Lemoine JM, Wu H-C, Brylinski M. DeepDrug3D: classifi­ A, Minervini G, Murvai N, Pujols J, Roche DB, Salladini E, Schad E, Schramm A,
cation of ligand-binding pockets in proteins with a convolutional neural net­ Szabo B, Tantos A, Tonello F, Tsirigos KD, Veljković N, Ventura S, Vranken W,
work. PLoS Comput Biol 2019;15(2):e1006718https://doi.org/10.1371/journal. Warholm P, Uversky VN, Dunker A, Longhi S, Tompa P, Tosatto SC. DisProt 7.0: a
pcbi.1006718 major update of the database of disordered proteins. Nucleic Acids Res
[78] Brylinski M. eMatchSite: Sequence order-independent structure alignments of 2017;45(D1):D219–27. https://doi.org/10.1093/nar/gkw1056
ligand binding pockets in protein models. PLoS Comput Biol [104] Wass MN, Fuentes G, Pons C, Pazos F, Valencia A. Towards the prediction of
2014;10(9):e1003829https://doi.org/10.1371/journal.pcbi.1003829 protein interaction partners using physical docking. Mol Syst Biol
[79] Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR. Protein-ligand scoring with 2011;7(1):469. https://doi.org/10.1038/msb.2011.3
convolutional neural networks. J Chem Inf Model 2017;57(4):942–57. https:// [105] Zhang QC, Petrey D, Deng L, Qiang L, Shi Y, Thu CA, Bisikirska B, Lefebvre C,
doi.org/10.1021/acs.jcim.6b00740 Accili D, Hunter T, Maniatis T, Califano A, Honig B. Structure-based prediction of
[80] Pagès G, Charmettant B, Grudinin S. Protein model quality assessment using 3D protein-protein interactions on a genome-wide scale. Nature
oriented convolutional neural networks. Bioinformatics 2019;35(18):3313–9. 2012;490(7421):556–60. https://doi.org/10.1038/nature11503
https://doi.org/10.1093/bioinformatics/btz122 [106] I.R. Humphreys, J. Pei, M. Baek, A. Krishnakumar, I. Anishchenko, S.
[81] Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep Ovchinnikov, J. Zhang, T.J. Ness, S. Banjade, S. Bagde, V.G. Stancheva, X.-H. Li, K.
learning: advances in scoring functions for protein-ligand docking. WIREs Liu, Z. Zheng, D.J. Barrero, U. Roy, I.S. Fernández, B. Szakal, D. Branzei, E.C.
Comput Mol Sci 2020;10(1):e1429https://doi.org/10.1002/wcms.1429 Greene, S. Biggins, S. Keeney, E.A. Miller, J.C. Fromme, T.L. Hendrickson, Q. Cong,
[82] Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved D. Baker, Structures of core eukaryotic protein complexes, bioRxiv Preprint
protein structure refinement guided by deep learning based accuracy estima­ (Sep. 2021). 10.1101/2021.09.30.462231.
tion. Nat Commun 2021;12(1):1340. https://doi.org/10.1038/s41467-021- [107] Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. The database
21511-x of interacting proteins: 2004 update. Nucleic Acids Res
2004;32(suppl_1):D449–51.

[108] Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, [130] P. Bryant, G. Pozzati, W. Zhu, A. Shenoy, P. Kundrotas, A. Elofsson, Predicting the
Legeay M, Fang T, Bork P, Jensen LJ, von Mering C. The STRING database in 2021: structure of large protein complexes using AlphaFold and Monte Carlo tree
customizable protein-protein networks, and functional characterization of search, Nat Commun 13(1) (2022) 6028.10.1038/s41467–022-33729–4.
user-uploaded gene/measurement sets. Nucleic Acids Res [131] Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein
2021;49(D1):D605–12. https://doi.org/10.1093/nar/gkaa1074 complex modeling reveals accuracy determinants. Protein Sci 2022;31(8):e4379
[109] Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath https://doi.org/10.1002/pro.4379
V, Niranjan V, Muthusamy B, Gandhi TKB, Gronborg M, Ibarrola N, Deshpande [132] M. Baek, R. McHugh, I. Anishchenko, D. Baker, F. DiMaio, Accurate prediction of
N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, nucleic acid and protein-nucleic acid complexes using rosettafoldna, bioRxiv
Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh (2022). 10.1101/2022.09.09.507333.
S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, [133] Lima AN, Philot EA, Trossini GHG, Scott LPB, Maltarollo VG, Honorio KM.
Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far Use of machine learning approaches for novel drug discovery. Expert
R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JGN, Pevsner J, Opin Drug Discov 2016;11(3):225–39. https://doi.org/10.1517/17460441.2016.
Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti 1146250
A, Pandey A. Development of human protein reference database as an initial [134] Zhao J, Cao Y, Zhang L. Exploring the computational methods for protein-ligand
platform for approaching systems biology in humans. Genome Res binding site prediction. Comput Struct Biotechnol J 2020;18:417–26. https://
2003;13(10):2363–71. https://doi.org/10.1101/gr.1680803 doi.org/10.1016/j.csbj.2020.02.008
[110] Oughtred R, Rust J, Chang C, Breitkreutz B-J, Stark C, Willems A, Boucher L, [135] Lee M, Kim D. Large-scale reverse docking profiles and their applications. BMC
Leung G, Kolas N, Zhang F, Dolma S, Coulombe-Huntington J, Chatr-Aryamontri Bioinform 2012;13(17):S6. https://doi.org/10.1186/1471-2105-13-S17-S6
A, Dolinski K, Tyers M. The BioGRID database: a comprehensive biomedical [136] Grinter SZ, Liang Y, Huang S-Y, Hyder SM, Zou X. An inverse docking approach
resource of curated protein, genetic, and chemical interactions. Protein Sci for identifying new potential anti-cancer targets. J Mol Graph Model
2021;30(1):187–200. https://doi.org/10.1002/pro.3978 2011;29(6):795–9. https://doi.org/10.1016/j.jmgm.2011.01.002
[111] Kumar R, Nanduri B. HPIDB - a unified resource for host-pathogen interactions. [137] Fernández A. Artificial intelligence teaches drugs to target proteins by tackling
BMC Bioinform 2010;11(6):S16. https://doi.org/10.1186/1471-2105-11-S6-S16 the induced folding problem. Mol Pharm 2020;17(8):2761–7. https://doi.org/
[112] Zhang N, Chen Y, Lu H, Zhao F, Alvarez RV, Goncearenco A, Panchenko AR, Li M. 10.1021/acs.molpharmaceut.0c00470
MutaBind2: Predicting the impacts of single and multiple mutations on pro­ [138] Z. Xu, O.R. Wauchope, A.T. Frank, Navigating chemical space by interfacing
tein-protein interactions. iScience 2020;23(3). https://doi.org/10.1016/j.isci. generative artificial intelligence and molecular docking, J Chem Inf Model
2020.100939. (Mar.). 61(11) (2021) 5589–5600. 10.1021/acs.jcim.1c00746.
[113] Liu X, Luo Y, Li P, Song S, Peng J. Deep geometric representations for modeling [139] P. Drotár, A.R. Jamasb, B. Day, C. Cangea, P. Liò, Structure-aware generation of
effects of mutations on protein-protein binding affinity. PLoS Comput Biol drug-like molecules, arXiv Preprint (Nov. 2021.
2021;17(8):e1009284https://doi.org/10.1371/journal.pcbi.1009284 [140] Wong F, Krishnan A, Zheng EJ, Stärk H, Manson AL, Earl AM, Jaakkola T, Collins
[114] Geng C, Vangone A, Folkers GE, Xue LC, Bonvin AMJJ. iSEE: Interface structure, JJ. Benchmarking alphafold-enabled molecular docking predictions for anti­
evolution, and energy-based machine learning predictor of binding affinity biotic discovery. Mol Syst Biol 2022;18(9):e11081.
changes upon mutations. Proteins Struct Funct Bioinform 2019;87(2):110–9. [141] N. Sen, I. Anishchenko, N. Bordin, I. Sillitoe, S. Velankar, D. Baker, C. Orengo,
https://doi.org/10.1002/prot.25630 Characterizing disease-associated human proteins without available protein
[115] Jankauskaitė J, Jiménez-García B, Dapkunas J, Fernández-Recio J, Moal IH. structures or homologues, bioRxiv Preprint (Nov. 2021). 10.1101/2021.11.17.
SKEMPI 2.0: an updated benchmark of changes in protein-protein binding en­ 468998.
ergy, kinetics and thermodynamics upon mutation. Bioinformatics [142] Pak MA, Ivankov DN. Best templates outperform homology models in pre­
2019;35(3):462–9. https://doi.org/10.1093/bioinformatics/bty635 dicting the impact of mutations on protein stability. 07 Bioinform Btac
[116] Jiménez J, Škalič M, Martínez-Rosell G, Fabritiis GDe. KDEEP: Protein-ligand 2022;515. https://doi.org/10.1093/bioinformatics/btac515. 07.
absolute binding affinity prediction via 3D-convolutional neural networks. J [143] M.A. Pak, K.A. Markhieva, M.S. Novikova, D.S. Petrov, I.S. Vorobyev, E.S.
Chem Inf Model 2018;58(2):287–96. https://doi.org/10.1021/acs.jcim.7b00650 Maksimova, F.A. Kondrashov, D.N. Ivankov, Using alphafold to predict the im­
[117] Ahmed A, Mam B, Sowdhamini R. DEELIG: A deep learning approach to predict pact of single mutations on protein stability and function, BioRxiv (2021).
protein-ligand binding affinity. Bioinform Biol Insights 2021;15:11779322211030364 [144] C. Norn, B.I.M. Wicky, D. Juergens, S. Liu, D. Kim, B. Koepnick, I. Anishchenko, F.
https://doi.org/10.1177/11779322211030364 Players, D. Baker, S. Ovchinnikov, Protein sequence design by explicit energy
[118] Ballester PJ, Mitchell JBO. A machine learning approach to predicting protein- landscape optimization, bioRxiv (2020). 10.1101/2020.07.23.218917.
ligand binding affinity with applications to molecular docking. Bioinformatics [145] D. Tischer, S. Lisanza, J. Wang, R. Dong, I. Anishchenko, L.F. Milles, S.
2010;26(9):1169–75. https://doi.org/10.1093/bioinformatics/btq112 Ovchinnikov, D. Baker, Design of proteins presenting discontinuous functional
[119] Boyles F, Deane CM, Morris GM. Learning from docked ligands: Ligand-based sites using deep learning, bioRxiv (2020). 10.1101/2020.11.29.402743.
features rescue structure-based scoring functions when trained on docked [146] J. Wang, S. Lisanza, D. Juergens, D. Tischer, I. Anishchenko, M. Baek, J.L. Watson,
poses. J Chem Inf Model 2021. https://doi.org/10.1021/acs.jcim.1c00096. (Sep.). J.H. Chun, L.F. Milles, J. Dauparas, M. Expòsit, W. Yang, A. Saragovi, S.
[120] Kundu I, Paul G, Banerjee R. A machine learning approach towards the prediction Ovchinnikov, D. Baker, Deep learning methods for designing proteins scaf­
of protein- ligand binding affinity based on fundamental molecular properties. folding functional sites, bioRxiv Preprint (Nov. 2021). 10.1101/2021.11.10.
RSC Adv 2018;8(22):12127–37. https://doi.org/10.1039/C8RA00003D 468128.
[121] Li H, Leung K-S, Wong M-H, Ballester PJ. Improving AutoDock Vina using [147] Anishchenko I, Pellock SJ, Chidyausiku TM, Ramelot TA, Ovchinnikov S, Hao J,
Random Forest: the growing accuracy of binding affinity prediction by the ef­ Bafna K, Norn C, Kang A, Bera AK, DiMaio F, Carter L, Chow CM, Montelione GT,
fective exploitation of larger data sets. Mol Inf 2015;34(2–3):115–26. https:// Baker D. De novo protein design by deep network hallucination. Nature
doi.org/10.1002/minf.201400132 2021;600(7889):547–52. https://doi.org/10.1038/s41586-021-04184-w
[122] S. Li, J. Zhou, T. Xu, L. Huang, F. Wang, H. Xiong, W. Huang, D. Dou, H. Xiong, [148] Lin Y-R, Koga N, Tatsumi-Koga R, Liu G, Clouser AF, Montelione GT, Baker D.
Structure-aware interactive graph neural networks for the prediction of pro­ Control over overall shape and size in de novo designed proteins. Proc Natl
tein-ligand binding affinity, in: Proceedings of the 27th ACM SIGKDD Acad Sci USA 2015;112(40):E5478–85. https://doi.org/10.1073/pnas.
Conference on Knowledge Discovery & Data Mining, ACM, Virtual Event 1509508112
Singapore, 2021, pp.975–985.10.1145/3447548.3467311. [149] Marcos E, Chidyausiku TM, McShan AC, Evangelidis T, Nerli S, Carter L, Nivón
[123] Stepniewska-Dziubinska MM, Zielenkiewicz P, Siedlecki P. Development and eva­ LG, Davis A, Oberdorfer G, Tripsianes K, Sgourakis NG, Baker D. De novo design
luation of a deep learning model for protein- ligand binding affinity prediction. of a non-local β-sheet protein with high stability and accuracy. Nat Struct Mol
Bioinformatics 2018;34(21):3666–74. https://doi.org/10.1093/bioinformatics/bty374 Biol 2018;25(11):1028–34. https://doi.org/10.1038/s41594-018-0141-6
[124] Wójcikowski M, Kukiełka M, Stepniewska-Dziubinska MM, Siedlecki P. [150] Baker D. What has de novo protein design taught us about protein folding and
Development of a Protein-Ligand Extended Connectivity (PLEC) fingerprint and biophysics? Protein Sci 2019;28(4):678–83. https://doi.org/10.1002/pro.3588
its application for binding affinity predictions. Bioinformatics [151] N. Ferruz, M. Heinzinger, M. Akdel, A. Goncearenco, L. Naef, C. Dallago, From
2019;35(8):1334–41. https://doi.org/10.1093/bioinformatics/bty757 sequence to function through structure: deep learning for protein design,
[125] Liu Z, Li Y, Han L, Li J, Liu J, Zhao Z, Nie W, Liu Y, Wang R. PDB-wide collection of bioRxiv (2022).
binding data: current status of the PDBbind database. Bioinformatics [152] Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and functional
2015;31(3):405–12. https://doi.org/10.1093/bioinformatics/btu626 analysis of native disorder in proteins from the three kingdoms of life. J Mol
[126] Hu L, Benson ML, Smith RD, Lerner MG, Carlson HA. Binding MOAD (Mother Of Biol 2004;337(3):635–45. https://doi.org/10.1016/j.jmb.2004.02.002
All Databases). Proteins Struct Funct Bioinform 2005;60(3):333–40. https://doi. [153] A. Gupta, S. Dey, H.-X. Zhou, Artificial Intelligence Guided Conformational
org/10.1002/prot.20512 Mining of Intrinsically Disordered Proteins, bioRxiv Preprint(Nov. 2021). 10.
[127] Mysinger MM, Carchia M, Irwin JJ, Shoichet BK. Directory of useful decoys, 1101/2021.11.21.469457.
enhanced (dud-e): better ligands and decoys for better benchmarking. J Med [154] Budowski-Tal I, Nov Y, Kolodny R. FragBag, an accurate representation of pro­
Chem 2012;55(14):6582–94. tein structure, retrieves structural neighbors from the entire PDB quickly and
[128] R. Evans, M. O’Neill, A. Pritzel, N. Antropova, A. Senior, T. Green, A. Žídek, R. accurately. Proc Natl Acad Sci USA 2010;107(8):3481–6. https://doi.org/10.
Bates, S. Blackwell, J. Yim, O. Ronneberger, S. Bodenstein, M. Zielinski, A. 1073/pnas.0914097107
Bridgland, A. Potapenko, A. Cowie, K. Tunyasuvunakool, R. Jain, E. Clancy, P. [155] Liu Y, Ye Q, Wang L, Peng J. Learning structural motif representations for effi­
Kohli, J. Jumper, D. Hassabis, Protein complex prediction with AlphaFold- cient protein structure search. Bioinformatics 2018;34(17):i773–80. https://doi.
Multimer, bioRxiv Preprint (Oct. 2021). 10.1101/2021.10.04.463034. org/10.1093/bioinformatics/bty585
[129] Bryant P, Pozzati G, Elofsson A. Improved prediction of protein-protein inter­ [156] Guzenko D, Burley SK, Duarte JM. Real time structural search of the protein data
actions using AlphaFold2. Nat Commun 2022;13(1):1265. https://doi.org/10. bank. PLoS Comput Biol 2020;16(7):e1007970https://doi.org/10.1371/journal.
1038/s41467-022-28865-w pcbi.1007970

[157] T. Aderinwale, V. Bharadwaj, C. Christoffer, G. Terashi, Z. Zhang, R. Jahandideh, Y. structures. Bioinformatics 2021;37(16):2332–9. https://doi.org/10.1093/
Kagaya, D. Kihara, Real-Time Structure Search and Structure Classification for bioinformatics/btab118
AlphaFold Protein Models, bioRxiv Preprint (Oct. 2021). 10.1101/2021.10.21. [185] Bernauer J, Bahadur RP, Rodier F, Janin J, Poupon A. DiMoVo: A voronoi tes­
465371. sellation-based method for discriminating crystallographic and biological
[158] Foldseek: fast and accurate protein structure search bioRxiv 10.1101/2022.02. protein– protein interactions. Bioinformatics 2008;24(5):652–8. https://doi.
07.479398v4 org/10.1093/bioinformatics/btn022
[159] N. Bordin, I. Sillitoe, V. Nallapareddy, C. Rauer, S.D. Lam, V.P. Waman, N. Sen, M. [186] Durairaj J, Akdel M, de Ridder D, van Dijk ADJ. Geometricus represents protein
Heinzinger, M. Littmann, S. Kim, S. Velankar, M. Steinegger, B. Rost, C. Orengo, structures as shape-mers derived from moment invariants. Bioinformatics
AlphaFold2 reveals commonalities and novelties in protein structure space for 2020;36(Supplement_2):i718–25. https://doi.org/10.1093/bioinformatics/btaa839
21 model organisms, pages: 2022.06.02.494367 Section: New Results (Jun. [187] Kihara D, Sael L, Chikhi R, Esquivel-Rodriguez J. Molecular surface re­
2022). 10.1101/2022.06.02.494367. presentation Using 3D Zernike descriptors for protein shape comparison and
[160] Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, Wyczalkowski MA, Liang docking. Curr Protein Peptide Sci 2011;12(6):520–30. https://doi.org/10.2174/
W-W, Zhang Q, McLellan MD, Sun SQ, Tripathi P, Lou C, Ye K, Mashl RJ, Wallis J, 138920311796957612
Wendl MC, Chen F, Ding L. Protein-structure-guided discovery of functional [188] Yin S, Proctor EA, Lugovskoy AA, Dokholyan NV. Fast screening of protein sur­
mutations across 19 cancer types. Nat Genet 2016;48(8):827–37. https://doi. faces using geometric invariant fingerprints. Proc Natl Acad Sci USA
org/10.1038/ng.3586 2009;106(39):16622–6. https://doi.org/10.1073/pnas.0906146106
[161] Berliner N, Teyra J, Çolak R, Lopez SG, Kim PM. Combining structural modeling [189] Namrata A, Po-Ssu H. Generative modeling for protein structures. Adv Neural
with ensemble machine learning to accurately predict protein fold stability and Inf Process Syst 2018:7494–505.
binding affinity effects upon mutation. PLoS One 2014;9(9):e107353https://doi. [190] Jiang M, Li Z, Zhang S, Wang S, Wang X, Yuan Q, Wei Z. Drug- target affinity
org/10.1371/journal.pone.0107353 prediction using graph neural network and contact maps. RSC Adv
[162] Terwilliger TC, Liebschner D, Croll TI, Williams CJ, McCoy AJ, Poon BK, Afonine 2020;10(35):20701–12. https://doi.org/10.1039/D0RA02297G
PV, Oeffner RD, Richardson JS, Read RJ, Adams PD. AlphaFold predictions: great [191] Wang X, Flannery ST, Kihara D. Protein docking model evaluation by graph neural
hypotheses but no match for experiment, preprint. Biochemistry 2022. https:// networks. Front Mol Biosci 2021;8:402. https://doi.org/10.3389/fmolb.2021.647915
doi.org/10.1101/2022.11.21.517405. (Nov.). [192] Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible
[163] Hubbard SJ, Thornton JM, et al. naccess, computer program, department of protein design using deep graph neural networks. e4 Cell Syst
biochemistry and molecular biology. Univ Coll Lond 1993;2(1). 2020;11(4):402–11. https://doi.org/10.1016/j.cels.2020.08.016
[164] Mihel J, Šikić M, Tomić S, Jeren B, Vlahoviček K. Psaia-protein structure and [193] Ingraham J, Garg V, Barzilay R, Jaakkola T. Generative models for graph-based
interaction analyzer. BMC Struct Biol 2008;8(1):1–11. protein design. Adv Neural Inf Process Syst 2019;32:15820–31.
[165] Mitternacht S. Freesasa: An open source c library for solvent accessible surface [194] Q. Yuan, S. Chen, J. Rao, S. Zheng, H. Zhao, Y. Yang, AlphaFold2-aware protein-
area calculations. F1000Research 2016;5. DNA binding site prediction using graph transformer, bioRxiv Preprint (Dec.
[166] Touw WG, Baakman C, Black J, Te Beek TA, Krieger E, Joosten RP, Vriend G. A 2021). 10.1101/2021.08.25.457661.
series of pdb-related databanks for everyday needs. Nucleic Acids Res [195] A.R. Jamasb, R. Viñas, E.J. Ma, C. Harris, K. Huang, D. Hall, P. Lió, T.L. Blundell,
2015;43(D1):D364–8. Graphein - a Python library for geometric deep learning and network analysis
[167] Ruiz-Blanco YB, Paz W, Green J, Marrero-Ponce Y. Protdcal: A program to on protein structures and interaction networks, bioRxiv Preprint (Oct. 2021).
compute general-purpose-numerical descriptors for sequences and 3d-struc­ 10.1101/2020.07.15.204701.
tures of proteins. BMC Bioinform 2015;16(1):1–15. [196] Somnath VR, Bunne C, Krause A. Multi-scale representation learning on pro­
[168] Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck teins. Adv Neural Inf Process Syst 2021;34.
T, Kauff F, Wilczynski B, et al. Biopython: freely available python tools for [197] Lim J, Ryu S, Park K, Choe YJ, Ham J, Kim WY. Predicting drug-target interaction
computational molecular biology and bioinformatics. Bioinformatics using a novel graph neural network with 3D structure-embedded graph re­
2009;25(11):1422–3. presentation. J Chem Inf Model 2019;59(9):3981–8. https://doi.org/10.1021/acs.
[169] Sanner MF, Olson AJ, Spehner J-C. Reduced surface: an efficient way to compute jcim.9b00387
molecular surfaces. Biopolymers 1996;38(3):305–20. [198] Morrone JA, Weber JK, Huynh T, Luo H, Cornell WD. Combining docking pose
[170] R.J. Gowers, M. Linke, J. Barnoud, T.J.E. Reddy, M.N. Melo, S.L. Seyler, J. rank and structure with deep learning improves protein-ligand binding mode
Domanski, D.L. Dotson, S. Buchoux, I.M. Kenney, et al., Mdanalysis: a python prediction over a baseline docking approach. J Chem Inf Model
package for the rapid analysis of molecular dynamics simulations, Tech. rep., 2020;60(9):4170–9. https://doi.org/10.1021/acs.jcim.9b00927
Los Alamos National Lab. (LANL), Los Alamos, NM (United States) (2019). [199] Sunseri J, King JE, Francoeur PG, Koes DR. Convolutional neural network scoring
[171] Buß O, Rudat J, Ochsenreither K. Foldx as protein engineering tool: better than and minimization in the D3R 2017 community challenge. J Comput Aided Mol
random based approaches? Comput Struct Biotechnol J 2018;16:25–33. Des 2019;33(1):19–34. https://doi.org/10.1007/s10822-018-0133-y
[172] Alford RF, Leaver-Fay A, Jeliazkov JR, O’Meara MJ, DiMaio FP, Park H, Shapovalov [200] Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K,
MV, Renfrew PD, Mulligan VK, Kappel K, et al. The rosetta all-atom energy Pande V. MoleculeNet: A benchmark for molecular machine learning. Chem Sci
function for macromolecular modeling and design. J Chem Theory Comput 2018;9(2):513–30. https://doi.org/10.1039/C7SC02664A
2017;13(6):3031–48. [201] Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of
[173] Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nano­ protein- ligand interfaces for structure-based virtual screening. Expert Opin
systems: application to microtubules and the ribosome. Proc Natl Acad Sci USA Drug Discov 2021;16(10):1175–92. https://doi.org/10.1080/17460441.2021.
2001;98(18):10037–41. 1929921
[174] Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT. The disopred server for the [202] Alley EC, Khimulya G, Biswas S, AlQuraishi M, Church GM. Unified rational
prediction of protein disorder. Bioinformatics 2004;20(13):2138–9. protein engineering with sequence-based deep representation learning. Nat
[175] Bakan A, Meireles LM, Bahar I. Prody: protein dynamics inferred from theory Methods 2019;16(12):1315–22. https://doi.org/10.1038/s41592-019-0598-1
and experiments. Bioinformatics 2011;27(11):1575–7. [203] T. Bepler, B. Berger, Learning protein sequence embeddings using information
[176] Mikulska-Ruminska K, Kulik AJ, Kaya C, BenAdiba C, Dietler G, Nowak W, Bahar from structure, arXiv Preprint (Oct. 2019). arXiv:1902.08661.
I. Mechstiff: A new tool for evaluating stress-induced dynamics and application [204] Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B.
to cell adhesion proteins. Biophys J 2017;112(3):45a–6a. Modeling aspects of the language of life through transfer-learning protein sequences.
[177] Atilgan C, Atilgan AR. Perturbation-response scanning reveals ligand entry-exit BMC Bioinform 2019;20(1):723. https://doi.org/10.1186/s12859-019-3220-8
mechanisms of ferric binding protein. PLoS Comput Biol 2009;5(10):e1000544. [205] Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, Guo D, Ott M, Zitnick CL, Ma J,
[178] Shegay MV, Suplatov DA, Popova NN, Švedas VK, Voevodin VV. parMATT: Fergus R. Biological structure and function emerge from scaling unsupervised
Parallel multiple alignment of protein 3D-structures with translations and learning to 250 million protein sequences. Proc Natl Acad Sci USA 2021;118(15).
twists for distributed-memory systems. Bioinformatics 2019;35(21):4456–8. https://doi.org/10.1073/pnas.2016239118. (Apr.).
https://doi.org/10.1093/bioinformatics/btz224 [206] Mansoor S, Baek M, Madan U, Horvitz E. Toward more general embeddings for
[179] J. Durairaj, M. Akdel, D. de Ridder, A.D. van Dijk, Fast and adaptive protein protein design: harnessing joint representations of sequence and structure.
structure representations for machine learning, bioRxiv Preprint (Apr. 2021). bioRxiv Preprint 2021. https://doi.org/10.1101/2021.09.01.458592. (Sep.).
10.1101/2021.04.07.438777. [207] P. Hermosilla, T. Ropinski, Contrastive representation learning for 3d protein
[180] Shegay MV, Švedas VK, Voevodin VV, Suplatov DA, Popova NN. Guide tree opti­ structures, arXiv preprint arXiv:2205.15675 (2022).
mization with genetic algorithm to improve multiple protein 3D-structure align­ [208] C. Chen, Y. Zha, D. Zhu, K. Ning, X. Cui, Hydrogen bonds meet self-attention: all
ment. Bioinformatics 2021. https://doi.org/10.1093/bioinformatics/btab798 you need for general-purpose protein structure embedding, bioRxiv Preprint
[181] Ezkurdia I, Bartoli L, Fariselli P, Casadio R, Valencia A, Tress ML. Progress and (Aug. 2021). 10.1101/2021.01.31.428935.
challenges in predicting protein- protein interaction sites. Brief Bioinform [209] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I.
2009;10(3):233–46. https://doi.org/10.1093/bib/bbp021 Polosukhin, Attention is all you need, in: Advances in neural information
[182] Poupon A. Voronoi and voronoi-related tessellations in studies of protein processing systems, 2017, pp.5998–6008.
structure and interaction. Curr Opin Struct Biol 2004;14(2):233–41. https://doi. [210] F. Sverrisson, J. Feydy, B.E. Correia, M.M. Bronstein, Fast end-to-end learning on
org/10.1016/j.sbi.2004.03.010 protein surfaces, bioRxiv Preprint (Dec. 2020). 10.1101/2020.12.28.424589.
[183] Pan Y, Wang Z, Zhan W, Deng L. Computational identification of binding energy [211] G. Corso, H. Stärk, B. Jing, R. Barzilay, T. Jaakkola, DiffDock:Diffusion Steps,
hot spots in protein-RNA complexes using an ensemble approach. Bioinformatics Twists, and Turns for Molecular Docking, arXiv:2210.01776 [physics, q-bio](Oct.
2018;34(9):1473–80. https://doi.org/10.1093/bioinformatics/btx822 2022). 10.48550/arXiv.2210.01776.
[184] Igashov I, Olechnovič K, Kadukova M, Venclovas Č, Grudinin S. VoroCNN: Deep [212] O.-E. Ganea, X. Huang, C. Bunne, Y. Bian, R. Barzilay, T. Jaakkola, A. Krause,
convolutional neural network built on 3D voronoi tessellation of protein Independent SE(3)-equivariant models for end-to-end rigid protein docking,
arXiv:2111.07786 [cs] (Mar. 2022). 10.48550/arXiv.2111.07786.

[213] A. Schneuing, Y. Du, C. Harris, A. Jamasb, I. Igashov, W. Du, T. Blundell, P. Lió, C. [237] Singharoy A, Teo I, McGreevy R, Stone JE, Zhao J, Schulten K. Molecular dy­
Gomes, M. Welling, M. Bronstein, B. Correia, Structure-based drug design with namics-based refinement and validation for sub-5 Å cryo-electron microscopy
equivariant diffusion models, arXiv:2210.13695 [cs, q-bio](Oct. 2022). 10. maps. eLife 2016;5. https://doi.org/10.7554/eLife.16105. (Jul.).
48550/arXiv.2210.13695. [238] Mirjalili V, Noyes K, Feig M. Physics-based protein structure refinement
[214] Kim PT, Winter R, Clevert D-A. Unsupervised representation learning for pro­ through multiple molecular dynamics trajectories and structure averaging.
teochemometric modeling. Int J Mol Sci 2021;22(23):12882https://doi.org/10. Proteins Struct Funct Genet 2014;82(Suppl 2):196–207. https://doi.org/10.
3390/ijms222312882 1002/prot.24336
[215] Villegas-Morcillo A, Makrodimitris S, van Ham RCHJ, Gomez AM, Sanchez V, [239] Gniewek P, Kolinski A, Jernigan RL, Kloczkowski A. Elastic network normal
Reinders MJT. Unsupervised protein embeddings outperform hand-crafted se­ modes provide a basis for protein structure refinement. J Chem Phys
quence and structure features at predicting molecular function. Bioinformatics 2012;136(19):195101https://doi.org/10.1063/1.4710986
2021;37(2):162–70. https://doi.org/10.1093/bioinformatics/btaa701 [240] Schneider J, Korshunova K, SiChaib Z, Giorgetti A, Alfonso-Prieto M, Carloni P.
[216] S. Sledzieski, R. Singh, L. Cowen, B. Berger, Sequence-based prediction of pro­ Ligand pose predictions for human G Protein-Coupled Receptors: insights from
tein-protein interactions: a structure-aware interpretable deep learning model, the Amber-based hybrid molecular mechanics/coarse-grained approach. J
bioRxiv (2021). 10.1101/2021.01.22.427866. Chem Inf Model 2020;60(10):5103–16. https://doi.org/10.1021/acs.jcim.
[217] M. Heinzinger, M. Littmann, I. Sillitoe, N. Bordin, C. Orengo, B. Rost, Contrastive 0c00661
learning on protein embeddings enlightens midnight zone at lightning speed, [241] Wang A, Zhang Y, Chu H, Liao C, Zhang Z, Li G. Higher accuracy achieved for
bioRxiv Preprint (Nov. 2021). 10.1101/2021.11.14.468528. protein-ligand binding pose prediction by Elastic Network Model-based en­
[218] Y. Zhang, P. Li, F. Pan, H. Liu, P. Hong, X. Liu, J. Zhang, Applications of AlphaFold semble docking. J Chem Inf Model 2020;60(6):2939–50. https://doi.org/10.
beyond protein structure prediction, bioRxiv Preprint (Dec. 2021). 10.1101/2021. 1021/acs.jcim.9b01168
11.03.467194. [242] Cavasotto CN. Normal mode-based approaches in receptor ensemble docking.
[219] Waterhouse A, Bertoni M, Bienert S, Studer G, Tauriello G, Gumienny R, Heer FT, In: Baron R, editor. Computational drug discovery and design, methods in
de Beer TAP, Rempfer C, Bordoli L, Lepore R, Schwede T. SWISS-MODEL: molecular biology New York, NY: Springer; 2012. p. 157–68. https://doi.org/10.
homology modelling of protein structures and complexes. Nucleic Acids Res 1007/978-1-61779-465-0_11. (pp).
2018;46(W1):W296–303. https://doi.org/10.1093/nar/gky427 [243] Evangelista Falcon W, Ellingson SR, Smith JC, Baudry J. Ensemble docking in
[220] M. Mirdita, S. Ovchinnikov, M. Steinegger, ColabFold - Making protein folding drug discovery: how many protein configurations from molecular dynamics
accessible to all, bioRxiv Preprint (Aug. 2021). 10.1101/2021.08.15.456425. simulations are needed to reproduce known ligand binding? J Phys Chem B
[221] Weißenow K, Heinzinger M, Rost B. Protein language-model embeddings for 2019;123(25):5189–95. https://doi.org/10.1021/acs.jpcb.8b11491
fast, accurate, and alignment-free protein structure prediction. Structure 2022. [244] Stansfeld PJ, Sansom MSP. From coarse grained to atomistic: a serial multiscale
[222] AlQuraishi M, Sorger PK. Differentiable biology: using deep learning for bio­ approach to membrane protein simulations. J Chem Theory Comput
physics-based and data-driven modeling of molecular mechanisms. Nat 2011;7(4):1157–66. https://doi.org/10.1021/ct100569y
Methods 2021;18(10):1169–80. https://doi.org/10.1038/s41592-021-01283-4 [245] Noé F, Tkatchenko A, Müller K-R, Clementi C. Machine learning for molecular
[223] Ferruz N, Heinzinger M, Akdel M, Goncearenco A, Naef L, Dallago C. From se­ simulation. Annu Rev Phys Chem 2020;71(1):361–90. https://doi.org/10.1146/
quence to function through structure: deep learning for protein design. annurev-physchem-042018-052331
Comput Struct Biotechnol J 2023;21:238–50. https://doi.org/10.1016/j.csbj. [246] Noé F, De Fabritiis G, Clementi C. Machine learning for protein folding and
2022.11.014 dynamics. Curr Opin Struct Biol 2020;60:77–84. https://doi.org/10.1016/j.sbi.
[224] Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, Wicky BIM, 2019.12.005
Courbet A, de Haas RJ, Bethel N, Leung PJY, Huddy TF, Pellock S, Tischer D, Chan [247] Jin Y, Johannissen LO, Hay S. Predicting new protein conformations from mo­
F, Koepnick B, Nguyen H, Kang A, Sankaran B, Bera AK, King NP, Baker D. Robust lecular dynamics simulation conformational landscapes and machine learning.
deep learning-based protein sequence design using ProteinMPNN. Science Proteins Struct Funct Bioinform 2021;89(8):915–21.
2022;378(6615):49–56. https://doi.org/10.1126/science.add2187 [248] Karamzadeh R, Karimi-Jafari MH, Sharifi-Zarchi A, Chitsaz H, Salekdeh GH,
[225] J.L. Watson, D. Juergens, N.R. Bennett, B.L. Trippe, J. Yim, H.E. Eisenach, W. Moosavi-Movahedi AA. Machine learning and network analysis of molecular
Ahern, A.J. Borst, R.J. Ragotte, L.F. Milles, B.I.M. Wicky, N. Hanikel, S.J. Pellock, A. dynamics trajectories reveal two chains of red/ox-specific residue interactions
Courbet, W. Sheffler, J. Wang, P. Venkatesh, I. Sappington, S.V. Torres, A. Lauko, in human protein Disulfide Isomerase. Sci Rep 2017;7(1):3666. https://doi.org/
V.D. Bortoli, E. Mathieu, R. Barzilay, T.S. Jaakkola, F. DiMaio, M. Baek, D. Baker, 10.1038/s41598-017-03966-5
Broadly applicable and accurate protein design by integrating structure pre­ [249] Spiwok V, Kr^íž P. Time-lagged t-Distributed Stochastic Neighbor Embedding
diction networks and diffusion generative models, pages: 2022.12.09.519842 (t-SNE) of molecular simulation trajectories. Front Mol Biosci 2020;7.
Section: New Results (Dec. 2022). 10.1101/2022.12.09.519842. [250] Wang DD, Ou-Yang L, Xie H, Zhu M, Yan H. Predicting the impacts of mutations
[226] Kmiecik S, Kouza M, Badaczewska-Dawid AE, Kloczkowski A, Kolinski A. on protein-ligand binding affinity based on molecular dynamics simulations
Modeling of protein structural flexibility and large-scale dynamics: coarse- and machine learning methods. Comput Struct Biotechnol J 2020;18:439–54.
grained simulations and Elastic Network Models. Int J Mol Sci https://doi.org/10.1016/j.csbj.2020.02.007
2018;19(11):3496. https://doi.org/10.3390/ijms19113496 [251] Marchetti F, Moroni E, Pandini A, Colombo G. Machine learning prediction of
[227] Hollingsworth SA, Dror RO. Molecular dynamics simulation for all. Neuron allosteric drug activity from molecular dynamics. J Phys Chem Lett
2018;99(6):1129–43. https://doi.org/10.1016/j.neuron.2018.08.011 2021;12(15):3724–32. https://doi.org/10.1021/acs.jpclett.1c00045
[228] Quesne MG, Borowski T, de Visser SP. Quantum mechanics/molecular me­ [252] Glazer DS, Radmer RJ, Altman RB. Improving structure-based function predic­
chanics modeling of enzymatic processes: caveats and breakthroughs. Chem tion using molecular dynamics. Structure 2009;17(7):919–29. https://doi.org/
Eur J 2016;22(8):2562–81. https://doi.org/10.1002/chem.201503802 10.1016/j.str.2009.05.010
[229] Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Keskin O, Bahar I. Anisotropy of [253] C. Outeiral, D.A. Nissley, C.M. Deane, Current protein structure predictors do not
fluctuation dynamics of proteins with an Elastic Network Model. Biophys J produce meaningful folding pathways, bioRxiv Preprint (Sep. 2021). 10.1101/
2001;80(1):505–15. https://doi.org/10.1016/S0006-3495(01)76033-X 2021.09.20.461137.
[230] Jamroz M, Orozco M, Kolinski A, Kmiecik S. Consistent view of protein fluc­ [254] Hochuli J, Helbling A, Skaist T, Ragoza M, Koes DR. Visualizing convolutional
tuations from all-atom molecular dynamics and coarse-grained dynamics with neural network protein-ligand scoring. J Mol Graph Model 2018;84:96–108.
knowledge-based force-field. J Chem Theory Comput 2013;9(1):119–25. https://doi.org/10.1016/j.jmgm.2018.06.005
https://doi.org/10.1021/ct300854w [255] Kim E, Goren A, Ast G. Alternative splicing: current perspectives. BioEssays
[231] Frappier V, Najmanovich RJ. A coarse-grained elastic network atom contact 2008;30(1):38–47. https://doi.org/10.1002/bies.20692
model and its use in the simulation of protein dynamics and the prediction of [256] Owji H, Nezafat N, Negahdaripour M, Hajiebrahimi A, Ghasemi Y. A compre­
the effect of mutations. PLoS Comput Biol 2014;10(4):e1003569https://doi.org/ hensive review of signal peptides: structure, roles, and applications. Eur J Cell
10.1371/journal.pcbi.1003569 Biol 2018;97(6):422–41. https://doi.org/10.1016/j.ejcb.2018.06.003
[232] Tekpinar M, Zheng W. Predicting order of conformational changes during [257] Ribeiro AJM, Das S, Dawson N, Zaru R, Orchard S, Thornton JM, Orengo C,
protein conformational transitions using an interpolated Elastic Network Zeqiraj E, Murphy JM, Eyers PA. Emerging concepts in pseudoenzyme classifi­
Model. Proteins Struct Funct Genet 2010;78(11):2469–81. https://doi.org/10. cation, evolution, and signaling. Sci Signal 2019;12(594). https://doi.org/10.
1002/prot.22755 1126/scisignal.aat9797. (Aug.).
[233] Kmiecik S, Gront D, Kouza M, Kolinski A. From coarse-grained to atomic-level [258] Smith LM, Kelleher NL. Proteoforms as the next proteomics currency. Science
characterization of protein dynamics: transition state for the folding of B do­ 2018;359(6380):1106–7.
main of protein A. J Phys Chem B 2012;116(23):7026–32. https://doi.org/10. [259] Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ. Next-generation
1021/jp301720w machine learning for biological networks. Cell 2018;173(7):1581–92. https://
[234] Mahajan S, Sanejouand Y-H. On the relationship between low-frequency doi.org/10.1016/j.cell.2018.05.015
normal modes and the large-scale conformational changes of proteins. Arch [260] Fuentealba M, Dönertas HM, Williams R, Labbadia J, Thornton JM, Partridge L.
Biochem Biophys 2015;567:59–65. https://doi.org/10.1016/j.abb.2014.12.020 Using the drug-protein interactome to identify anti-ageing compounds for
[235] Yang L, Song G, Jernigan RL. How well can we understand large-scale protein humans. PLoS Comput Biol 2019;15(1):e1006639https://doi.org/10.1371/
motions using normal modes of Elastic Network Models? Biophys J journal.pcbi.1006639
2007;93(3):920–9. https://doi.org/10.1529/biophysj.106.095927 [261] Murray D, Petrey D, Honig B. Integrating 3D structural information into
[236] Takada S, Kanada R, Tan C, Terakawa T, Li W, Kenzaki H. Modeling structural systems biology. J Biol Chem 2021;296:100562https://doi.org/10.1016/j.jbc.
dynamics of biomolecular complexes by coarse-grained molecular simulations. 2021.100562
Acc Chem Res 2015;48(12):3026–35. https://doi.org/10.1021/acs.accounts. [262] Aloy P, Russell RB. Structural systems biology: modelling protein interactions.
5b00338 Nat Rev Mol Cell Biol 2006;7(3):188–97. https://doi.org/10.1038/nrm1859
