2021 09 08 AlphaFold Webinar Slides

The document provides an overview of AlphaFold, a model developed by DeepMind to predict the 3D structure of proteins from their amino acid sequences. It discusses how AlphaFold works, how to interpret its predictions using metrics like predicted LDDT and predicted aligned error, and how to access AlphaFold predictions. The document also outlines DeepMind's goals of solving fundamental scientific problems with AI, and how AlphaFold was the top method in the CASP14 assessment for protein structure prediction. It notes that future work includes extending AlphaFold's approach to predicting protein complexes.


Introduction to AlphaFold

Presenter: Kathryn Tunyasuvunakool
Research Scientist at DeepMind

Agenda Private & Conﬁdential

● Introduction

● How AlphaFold works

● How to interpret predictions

● Accessing AlphaFold

● Future work

DeepMind and protein folding

A central part of DeepMind’s mission is to solve fundamental scientific problems with AI.

Predicting the 3D structure of a protein from its amino acid sequence is one such challenge.

AlphaFold is our model that aims to solve this problem.

AlphaFold at CASP

[Figure: experimental structure compared with the AlphaFold prediction]

When working on these problems, a clear success metric is crucial.

Fortunately, the protein structure prediction community had established CASP.

The CASP assessment involves predicting recently solved structures that aren’t yet public.

At CASP14, AlphaFold was the top-ranked method, achieving consistently high accuracy.

We committed to publishing the method and making AlphaFold broadly accessible.

How does it work? (the short version)

See Jumper et al. 2021 (especially the SI) for details.

Inputs

A key AlphaFold input is the multiple sequence alignment (MSA), containing sequences evolutionarily related to the target. Related sequences are found using standard tools and public databases.

The input sequence is used to create an array of numbers representing all residue pairs.

AlphaFold can also use template structures from the PDB, found using standard tools. However, it often produces accurate predictions without a template.

Network

The Evoformer blocks extract information about the relationships between residues. The MSA representation can update the pair representation and vice versa.

The Structure Module predicts a rotation + translation to place each residue. A small network predicts side chain chi angles. The final structure is run through a relaxation process.

Feeding certain outputs back through the network again improves performance.

Other outputs

As well as a predicted structure, AlphaFold produces two confidence estimates.

Interpreting predictions

The short version: use both confidence metrics.

Predicted LDDT: definition

pLDDT is AlphaFold’s per-residue prediction of its lDDT-Cα score*.

Roughly, lDDT measures the percentage of correctly predicted interatomic distances, not how well the predicted and true structures can be superimposed.

It rewards locally correct structures, and getting individual domains right.

pLDDT behaves similarly, as a measure of local confidence.

*Mariani et al. 2013

[Figure: an alignment-based metric compared with lDDT]
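To make "percentage of correctly predicted interatomic distances" concrete, here is a minimal sketch of a Cα-only lDDT. It assumes the settings described by Mariani et al. 2013 (a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2 and 4 Å); the function name `lddt_ca` is made up for illustration and this is not AlphaFold's own implementation.

```python
import numpy as np

def lddt_ca(pred, true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-Calpha (0-100) for two (N, 3) arrays of Calpha coordinates in Angstrom."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    # All-against-all Calpha distance matrices for each structure.
    d_pred = np.linalg.norm(pred[:, None, :] - pred[None, :, :], axis=-1)
    d_true = np.linalg.norm(true[:, None, :] - true[None, :, :], axis=-1)
    # Only score pairs that are close in the true structure; skip self-pairs.
    scored = (d_true < cutoff) & ~np.eye(len(true), dtype=bool)
    # For each pair, the fraction of tolerance thresholds its distance error meets.
    error = np.abs(d_pred - d_true)
    preserved = np.mean([error < t for t in thresholds], axis=0)
    n_scored = scored.sum(axis=1)
    total = (preserved * scored).sum(axis=1)
    # Residues with no neighbours inside the cutoff get a score of 0.
    return 100.0 * np.divide(total, n_scored, out=np.zeros(len(true)), where=n_scored > 0)
```

Because only distances are compared, rotating or translating the whole prediction leaves the score unchanged, which is why no superposition step appears in the code.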


Predicted LDDT: format

pLDDT ranges from 0 to 100 (100 is most confident).

We use a consistent “confidence bands” color scheme when displaying predictions:

● Very high (>90)
● Confident (70-90)
● Low (50-70)
● Very low (<50)

A pLDDT plot is also displayed by some of our tools.

Prediction files always contain pLDDT in the B-factors. Therefore a higher B-factor is better!
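Since pLDDT is stored in the B-factor field, it can be read back with a few lines of code. The sketch below assumes a standard fixed-width PDB file and reads the value from each Cα atom; the function names are illustrative, and the handling of scores that fall exactly on a band boundary (70 or 50) is an assumption, since the slide does not specify it.

```python
def plddt_from_pdb(pdb_text):
    """Return {(chain, residue_number): pLDDT} read from the B-factor of each CA atom."""
    plddt = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[(line[21], int(line[22:26]))] = float(line[60:66])
    return plddt

def confidence_band(score):
    """Map a pLDDT score to the confidence bands listed on the slide."""
    if score > 90:
        return "Very high"
    if score >= 70:  # boundary handling is an assumption
        return "Confident"
    if score >= 50:  # boundary handling is an assumption
        return "Low"
    return "Very low"
```

A typical use is `{key: confidence_band(v) for key, v in plddt_from_pdb(text).items()}` to colour or filter residues by band.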

Predicted LDDT: usage

Identifying domains & possible disordered regions

● pLDDT > 70: residues 65-342 and 418-784 form a confident domain
● pLDDT < 50: a disorder prediction, not a structure prediction

Assessing confidence within a domain

● pLDDT > 90: reasonable to investigate side chains / active site details
● pLDDT > 70
● Lower confidence on these specific parts

Predicted LDDT: pitfalls

High pLDDT on all domains does not imply AlphaFold is confident of their relative positions.

Assessing inter-domain confidence requires a different metric...

Predicted Aligned Error: definition

PAE is AlphaFold’s prediction of its position error at residue x, if the predicted and the true structures were aligned on residue y.

PAE aims to measure confidence in the relative positions of pairs of residues.

It is mainly used to assess relative domain positions, but is applicable whenever pairwise confidence is relevant.

Predicted Aligned Error: format

PAE is displayed as a 2D plot.

Suppose residue y were aligned to the true structure and we measured the position error at residue x. The color at (x, y) is AlphaFold’s prediction of that error.

In this case the squares correspond to two domains: residues 1-375 and residues 400-722.

AlphaFold is confident in relative positions within each domain (example residue pair: 272 and 163)...

...but not between domains (example residue pair: 272 and 429).
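The "two squares" reading of a PAE plot can also be done numerically by averaging PAE inside and between domain ranges. The sketch below assumes the PAE values are already loaded into a square NumPy array; the matrix here is synthetic (2 Å within domains, 25 Å between them, values invented purely for illustration), and `mean_pae_between` is a hypothetical helper name.

```python
import numpy as np

def mean_pae_between(pae, range_a, range_b):
    """Mean PAE over residue pairs with one residue in each 1-based inclusive range."""
    a = np.arange(range_a[0] - 1, range_a[1])
    b = np.arange(range_b[0] - 1, range_b[1])
    # PAE is not symmetric, so average both orientations of the block.
    return float((pae[np.ix_(a, b)].mean() + pae[np.ix_(b, a)].mean()) / 2.0)

# Synthetic stand-in for a real PAE matrix with the two domains from the slide.
pae = np.full((722, 722), 25.0)
pae[:375, :375] = 2.0
pae[399:, 399:] = 2.0

domain_1, domain_2 = (1, 375), (400, 722)
print("within domain 1:", mean_pae_between(pae, domain_1, domain_1))
print("between domains:", mean_pae_between(pae, domain_1, domain_2))
```

A low within-domain mean together with a high between-domain mean corresponds to the situation on the slide: confident domains whose relative placement is not to be trusted.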

Predicted Aligned Error: usage

[Figure: two example PAE plots. Left: no confidence in relative domain positions. Right: predicted domain packing.]

Things to be aware of

Uncertain domain placement

● If AlphaFold is uncertain, it won’t necessarily place domains sensibly relative to each other
○ Membrane proteins won’t leave space for the cell membrane
○ Clashes can occur

Complexes

● For proteins that exist in a complex, AlphaFold is missing context about their binding partners
○ Heteromers more problematic than homomers
○ Worst case: the protein is flexible in isolation

● Some have had success predicting complexes by joining 2 sequences with a linker
○ We think it is possible to extend the ideas in AlphaFold to complexes
○ However, this linker setup remains to be benchmarked

Benchmarking AlphaFold

If you aim to measure AlphaFold’s accuracy, it’s important to ensure the model can’t “cheat” by making use of the true structure.

So it’s important to:

1. Evaluate accuracy on PDB structures released after April 30th 2018

AlphaFold isn’t trained on these recent chains, so it hasn’t had access to them before.

2. Check that the PDB entry you’re testing on wasn’t used as a template

Our code includes a max_template_date flag that can be used to limit the templates the model can use.

AlphaFold DB predictions may use any template released before February 15th 2021.
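The two checks above can be written as a small date filter. This is a sketch only: the two cut-off dates come from the slide, while the function name and the choice to treat a structure released exactly on the template cut-off as unusable are assumptions made to stay on the conservative side.

```python
from datetime import date

TRAINING_CUTOFF = date(2018, 4, 30)               # from the slide
ALPHAFOLD_DB_TEMPLATE_CUTOFF = date(2021, 2, 15)  # from the slide

def is_fair_benchmark_target(release_date, max_template_date):
    """True if a structure was neither seen in training nor available as a template."""
    unseen_in_training = release_date > TRAINING_CUTOFF
    # Strict comparison: a structure released on the cut-off day is rejected.
    not_a_template = release_date > max_template_date
    return unseen_in_training and not_a_template
```

For example, a chain released in mid-2019 is a fair target for a run with `max_template_date` set to the training cut-off, but not for an AlphaFold DB prediction, which may have used it as a template.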

Accessing AlphaFold

1. EMBL-EBI database

2. AlphaFold Colab

3. Open source code

AlphaFold Protein Structure Database

Website developed and hosted by EMBL-EBI

Contains pre-run predictions for 21 organisms, with plans to expand

Pros
● Easiest way to access a prediction - no code, no wait times
● Bulk download available
● Data is CC-BY 4.0

Cons
● Your protein of interest may be missing
● Can’t play around with how the prediction is generated

AlphaFold Colab

A Colab is a website hosting a pre-written Python program

When run, the code executes on a machine in the Cloud

Most similar option to a structure prediction server:

● Enter your sequence

● Hit “play” on each step

Pros
● Can run an arbitrary sequence of interest
● No coding or installation required

Cons
● Not suitable for large prediction jobs
● May be unreliable for long sequences
● Wait times
● Limited ability to influence the prediction

Note: community-made Colabs are also available that use AlphaFold models! Sergey will talk about one later. These may support a wider range of options for customising the prediction.

Please cite our methods paper (Jumper et al. 2021) if you use an AlphaFold prediction.

Open source code

You can also download the code and run AlphaFold on your own machine.

Most dependencies are provided in a Docker container, but you need to download genetics / template databases and trained models separately.

See https://github.com/deepmind/alphafold for setup instructions

Script for generating a prediction: python3 docker/run_docker.py --fasta_paths=T1050.fasta

Pros
● Can run an arbitrary sequence of interest
● Can run large numbers of predictions
● Full freedom to edit the code and change how the prediction is generated

Cons
● Requires sufficient space for the databases (~2.2 TB) and a GPU
● A little technical expertise required
● Wait times
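For batch runs, the run_docker.py command shown on this slide can be assembled programmatically. The sketch below only builds the argument list and does not execute anything; `--fasta_paths` and `--max_template_date` are the two options mentioned in these slides, the helper name is hypothetical, and the date in the example is purely illustrative. Other options and the exact behaviour should be checked against the repository's setup instructions.

```python
import shlex
from datetime import date

def run_docker_command(fasta_paths, max_template_date=None):
    """Build (but do not run) a docker/run_docker.py invocation as an argument list."""
    command = [
        "python3",
        "docker/run_docker.py",
        # Several FASTA files are assumed to be passed as one comma-separated value.
        "--fasta_paths=" + ",".join(fasta_paths),
    ]
    if max_template_date is not None:
        # Dates are written as YYYY-MM-DD.
        command.append("--max_template_date=" + max_template_date.isoformat())
    return command

print(shlex.join(run_docker_command(["T1050.fasta"], date(2020, 5, 14))))
```

Keeping the command as a list makes it straightforward to hand to `subprocess.run` later without shell-quoting problems.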

Future work

Thank you to everyone who made AlphaFold possible!

Agata Laydon Clemens Meyer Koray Kavukcuoglu Russ Bates

Alex Bateman David Reiman Martin Steinegger Sameer Velankar
Alex Bridgland David Silver Michael Figurnov Sebastian Bodenstein
Alexander Pritzel Demis Hassabis Michal Zielinski Simon A. A. Kohl
Andrew Cowie Ellen Clancy Michalina Pacholska Stanislav Nikolov
Andrew J. Ballard Ewan Birney Olaf Ronneberger Stig Petersen
Andrew W. Senior Gerard J. Kleywegt Oriol Vinyals Tamas Berghammer
Anna Potapenko John Jumper Pushmeet Kohli Tim Green
Augustin Žídek Jonas Adler Richard Evans Trevor Back
Bernardino Romera-Paredes Kathryn Tunyasuvunakool Rishub Jain Zachary Wu

The wider team at DeepMind and EMBL-EBI

The CASP community The experimental biology community

AlphaFold database

Sameer Velankar
EMBL-EBI
AlphaFold Database

• Collaboration between EMBL and DeepMind
• Provides open access (CC-BY-4.0 license) to all structure models predicted using AlphaFold
• Hosted on the Google Cloud Platform
• Data available for bulk download via FTP (ftp.ebi.ac.uk/pub/database/alphafold)
• >150K unique users since launch
• Many examples of how models can be used on Twitter/emails

AlphaFold Database

• ~365K predicted structures for proteins from 21 model organisms
• For the organisms currently covered, predicted structures are available for sequences in the UniProt reference proteome that are between 16 and 2700 amino acids long and contain only standard amino acids
• Only one prediction (out of 5 independent predictions) is made available in the release
• Accession: AF-P12345-F[1-N]
• Files
    • AF-P12345-F[1-N]-model_V[1-N].[pdb, mmcif]
    • AF-P12345-F[1-N]-predicted-aligned-error_V[1-N].json
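The coverage criteria and accession pattern above are easy to check in code before searching the database. This is a minimal sketch: the length limits and the AF-P12345-F[1-N] pattern are taken from the slide, the 20-letter alphabet is the usual set of standard amino acids, and the function names are hypothetical.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

def is_covered_sequence(sequence, min_length=16, max_length=2700):
    """True if a sequence meets the length and alphabet criteria quoted on the slide."""
    return (min_length <= len(sequence) <= max_length
            and set(sequence) <= STANDARD_AMINO_ACIDS)

def entry_accession(uniprot_accession, fragment=1):
    """Build an accession of the form AF-P12345-F1."""
    return f"AF-{uniprot_accession}-F{fragment}"

print(entry_accession("P12345"))
print(is_covered_sequence("MKT" * 10))
```

A sequence that fails `is_covered_sequence` (too short, too long, or containing a non-standard residue code such as X) would not be expected in this release even if its organism is covered.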

PDB format coordinate file

• Title
• Molecule description
    • Name, source
    • COMPND, SOURCE records
• Citation
• Cross-reference to UniProt
    • DBREF record
• Sequence
    • SEQRES record

Model archive extension

• Entry details
    • Accession, version, authors, version history
• Molecule description
    • Name, source, sequence, sequence version
• Cross-reference
    • UniProt, Taxonomy id
• Quality measure
    • Per residue quality, Global quality
• Possibility to add protocols, MSA details
• Can be extended to include more metadata

AlphaFold web pages

• Basic search system
    • Allows search using UniProt accession, UniProt id, protein name, gene name and organism
• Clear indication that the structure shown is a prediction
• Allows easy download of structure data
• Basic information about protein
    • Clearly indicates if there are experimental structures available
• Display residue-quality information in 3D viewer (pLDDT: predicted Local Distance Difference Test)
• Predicted Aligned Error (PAE viewer)

Models available across EBI resources

• UniProt
• Pfam
• InterPro
• PDBe-KB

AlphaFold Database - limitations

• No information on complexes with other proteins, nucleic acids (DNA or RNA) or ligands. In some cases, the single-chain prediction may correspond to the structure adopted in a complex. In others, the missing context from surrounding molecules may lead to an uninformative prediction
• AlphaFold does not make any predictions about non-protein components such as cofactors, metals, ligands including drug-like molecules, ions, carbohydrates and other post-translational modifications
• Protein dynamics: AlphaFold will usually only produce one of multiple conformations
• AlphaFold has not been trained or validated for predicting the effect of mutations
• May (or may not) lead to hypotheses about protein function; any hypotheses have to be tested by further experimentation

What’s next - under discussion

• Remove signal peptides from predictions
• Making 5 independent predictions available for each protein
• Additional metadata
    • MSA (need to consider data size)
    • Information on templates
    • Quality criteria, e.g. predicted TM score
• Updating database to UniRef90 dataset (~130 million structures)
• Design pages for UniProt accessions with multiple fragment structures
• Design pages for individual fragments for a given UniProt accession
• Continue adding more functionality to the web pages
• Integrate data of experimentally determined structures, e.g. display superpositions
• Map known annotations onto the predicted structures
    • Variants
    • Ligand-binding sites
    • Interfaces
    • ….

Impact of AlphaFold database on life science research

Structural bioinformatics (structure/function)

• Predicting complexes between macromolecules
    • Homo- and hetero- protein-protein; protein-nucleic acid complexes
    • Intrinsically disordered proteins
• Provide information on protein dynamics
    • Relevant conformational states
• Functionally important residues
    • Impact of mutation; binding sites; conformationally important residues
    • Interfaces
• Ligand prediction: what binds?
    • What might bind in a pocket

Structural biology

• Accelerating structure studies
    • Improved construct design
    • Starting model for structure determination
    • Fitting models in low resolution EM maps
    • Time resolved studies to understand mechanism
• Integrative/hybrid (I/H) methods
    • Models for individual components

[Figure: I/H methods structure of the 552-protein yeast Nuclear Pore Complex. Kim et al. (2018) Nature 555, 475-82. PDBDEV_00000010; PDBDEV_00000011; PDBDEV_00000012]

• Combination of sparse experimental data and a predicted model may lead to actionable data to test hypotheses
    • Chemical footprinting
    • Hydrogen-deuterium exchange
    • smFRET (single-molecule fluorescence resonance energy transfer)

Structural biology across scales

[Figure: scales of structural biology from organism to atom (organism, infected cell, virus, assembly, complex, chains, molecule, chemical entity, atom), annotated with examples: species, strain/variant, tissue/cell type, cellular organelle, proteins that bind and recognise the cell surface, membrane-bound protein, tertiary protein assembly, protein-protein interface, interface residues, immune-system evasion, carbohydrate chain, enzyme co-factor, substrate and product, ligand-binding site, residue interactions]

Acknowledgements

• Mihaly Varadi • Galabina Yordanova • John Jumper

• Sreenath Nair • Cindy Natassia • Kathryn Tunyasuvunakool
• Mandar Deshpande • Richard Green • Augustin Žídek
• Stephen Anyango • Stig Petersen
• Gerard Kleywegt • Agata Laydon
• Sameer Velankar • Demis Hassabis
• AlphaFold team
ColabFold
Making Protein folding accessible to all via Google Colab
(and the unintended uses of AlphaFold)

github.com/sokrypton/ColabFold

ColabFold - Advanced options

● Modify MSA input
○ Custom or MMseqs2 (much faster)
○ Trim
● Complexes
○ Homo-oligomers
○ Hetero-oligomers
● Fine control
○ Number of recycles
● Sample (output more than 5 models)
○ Generate ensembles by iterating through random seeds, enabling dropout.

[Figure: example complex with chains labelled A-E]

Can predict protein-protein/peptide interactions

Don't actually need a G-linker!

[Figure: protein-peptide interaction predicted with a G-linker and with an UNK-linker]

[Figure: dimer-swap and intertwined dimer predictions, consistent with biochemical data]

Preprints rolling in...

Cross-species
Predicting homo-oligomers

Modeling protein given MSA

Residue index: [1,2,3,4,5,6]

[Figure: multiple sequence alignment. MSA image borrowed from Kathryn T. (DeepMind)]

Modeling homo-oligomeric interactions by duplicating, padding and concatenating the MSAs.

Residue index: [1,2,3,4,5,6] becomes [1,2,3,4,5,6,207,208,209,210,211,212]

[Figure: duplicated, padded and concatenated multiple sequence alignment]
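One way to read "duplicating, padding and concatenating" is sketched below for a toy alignment. This is an illustration of the idea on the slide, not ColabFold's actual code: the function names are hypothetical, and the chain-break offset of 200 is inferred from the slide's example, where a 6-residue chain is followed by indices starting at 207.

```python
def pad_and_concatenate_msa(msa, copies=2, gap="-"):
    """Build an oligomer MSA from a monomer MSA (msa[0] is the query, rows have equal length)."""
    length = len(msa[0])
    # Query row: the monomer sequence repeated once per chain.
    rows = [msa[0] * copies]
    # Every other sequence appears once per chain, gap-padded everywhere else.
    for chain in range(copies):
        for sequence in msa[1:]:
            left = gap * (length * chain)
            right = gap * (length * (copies - 1 - chain))
            rows.append(left + sequence + right)
    return rows

def oligomer_residue_index(length, copies=2, chain_break=200):
    """Residue numbering with a large jump between consecutive chains."""
    return [chain * (length + chain_break) + i
            for chain in range(copies)
            for i in range(1, length + 1)]

print(pad_and_concatenate_msa(["ACDEFG", "ACDEYG"]))
print(oligomer_residue_index(6))
```

The padding keeps each homologue confined to one chain's columns, while the jump in residue index tells the model that the two copies are not joined end to end.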

Residue_index in AlphaFold is used to create a relative positional encoding.

Residue index: [1,2,3,4,5,6]

The residue index difference is capped at 32.

Just duplicating often does not work.

[Figure: multiple sequence alignment]
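The effect of capping the residue index difference can be seen in a few lines. The sketch below computes clipped pairwise offsets and a one-hot encoding over the resulting 65 bins; it is a simplified illustration of the idea, with a hypothetical function name, rather than the network's actual input pipeline.

```python
import numpy as np

def relative_position_one_hot(residue_index, cap=32):
    """Return clipped pairwise index offsets and their one-hot encoding."""
    index = np.asarray(residue_index)
    # Pairwise differences i - j, clipped to the range [-cap, cap].
    offsets = np.clip(index[:, None] - index[None, :], -cap, cap)
    # Shift to 0..2*cap and one-hot encode into 2*cap + 1 bins.
    one_hot = np.eye(2 * cap + 1)[offsets + cap]
    return offsets, one_hot

# Homodimer numbering from the slide: a jump of 200 between the two chains.
offsets, one_hot = relative_position_one_hot([1, 2, 3, 4, 5, 6, 207, 208, 209, 210, 211, 212])
print(offsets[0, 5], offsets[6, 5], one_hot.shape)
```

Because every offset beyond 32 lands in the same bin, any index jump larger than the cap makes the two chains look "far apart in sequence" to the model, whatever the exact size of the jump.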

Complexes - monomer

[Figure: PAE (Å) plot by position]

Complexes - homodimer

[Figure: PAE (Å) plot by position, chains A and B]

Complexes - homo-6mer

[Figure: PAE (Å) plot by position, chains A-F]

Complexes - homo-8mer?

[Figure: PAE (Å) plot by position and chain]

BENCHMARK* - Can AlphaFold predict homo-oligomers, and is inter-PAE a good metric for this?

* Technically in the training set, but AlphaFold was only trained on single chains.

[Figure: example predictions with chains A and B]

Homo-oligomeric dataset: Ponstingl, H., Kabir, T. and Thornton, J.M., 2003. Automatic inference of protein quaternary structure from crystals. Journal of Applied Crystallography, 36(5), pp.1116-1122.

BENCHMARK* - Lower inter-PAE scores appear predictive of homo-dimer formation

[Figure: PAE plots for chains A and B]

pTMscore integrates PAE info. We recommend using it instead of pLDDT for ranking predicted complexes.

How about hetero-oligomers (or a mixture)?

Hetero-dimer (1:1) - CASP target H1065

[Figure: PAE (Å) plot by position and chain for the 1:1 complex]

Paired MSA (currently only works for prokaryotic operons)

Sometimes unpaired MSA works (example: CASP target H1065)

● Unpaired MSA: success 1/5
● Paired MSA: success ~3/5

Combining paired+unpaired helps (example: CASP target H1065)

● Unpaired+Paired MSA: success 5/5

Homo/hetero-oligomer

[Figure: PAE (Å) plot by position and chain for a 2:2 complex]

Homo/hetero-oligomer - D-methionine transport system

[Figure: 2:1:2 complex with chains A-E, and its PAE (Å) plot by position and chain]

What if your protein/complex is too big to fit into memory? - Trim it!

[Figure: trimmed regions A1-A36,A92-A197 and A47-A242, B65-B477]

Put it together in PyMOL!
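Range labels like the ones on this slide can be turned into residue selections with a small parser. The sketch below is hypothetical: it assumes each range is written as chain letter plus residue number on both ends (for example A92-A197), separated by commas or spaces, and it is not taken from the ColabFold code.

```python
import re

def parse_trim(spec):
    """Parse 'A1-A36,A92-A197' into {'A': [(1, 36), (92, 197)]} (1-based, inclusive)."""
    selection = {}
    for part in re.split(r"[,\s]+", spec.strip()):
        match = re.fullmatch(r"([A-Za-z])(\d+)-([A-Za-z])(\d+)", part)
        # Both ends of a range must name the same chain.
        if match is None or match.group(1) != match.group(3):
            raise ValueError(f"cannot parse range {part!r}")
        chain = match.group(1)
        selection.setdefault(chain, []).append((int(match.group(2)), int(match.group(4))))
    return selection

def trim_sequence(sequence, ranges):
    """Keep only the residues inside the given 1-based inclusive ranges."""
    return "".join(sequence[start - 1:end] for start, end in ranges)

print(parse_trim("A1-A36,A92-A197"))
print(trim_sequence("ABCDEFGHIJ", [(1, 3), (8, 10)]))
```

When a trimmed sequence is fed to a predictor, the original residue numbers of the kept ranges should be carried along so the pieces can be placed back into the full-length numbering afterwards.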
Conclusions
● AlphaFold can be “hacked” into predicting protein-protein complexes.
● This is an unintended use and should only be used as a hypothesis-generation tool whose results need further experimental validation.

Acknowledgments

● Minkyung Baek, Ivan A., Justas D., Frank D. (RoseTTAFold support)
● John Jumper & Tim Green (AlphaFold support)
● David Koes (py3Dmol)
● Enzo Guerrero-Araya (submitting bug fixes)
● Lim Heo (initial analysis)

Coauthors

● Milot Mirdita
● Martin Steinegger
● Sergey

Some AlphaFold use cases

Alex Bateman
Pfam use case: Calmodulin_bind (PF07887)

• Found in plants
• Involved in stress responses
• Built family in 2004
• No PDB structure

[Figure: seed alignment]

Structural model for Calmodulin_bind (PF07887)

[Figure: predicted structure with three regions labelled 1, 2 and 3]

1) Similar to RUNT DNA-binding domain
2) Similar to BAF domain (new Pfam PF20451)
3) Similarity to POLO box (new Pfam PF20452)

Consider how accurate your prediction needs to be

• For some purposes you don’t need highly accurate models
• For Pfam we are interested in identifying domains and comparing them to PDB
• Can greatly speed up predictions in Colab by making just a single model
• Can greatly speed up predictions with ColabFold, which uses MMseqs2
• Use pLDDT and PAE to judge if the model is suitable for purpose

Just because AlphaFold can fold it doesn’t mean nature can

• Pfam use case 2: CPB_BcsS family (PF17036)

• AlphaFold prediction of region matched by Pfam identifies incomplete domain
• Structurally similar to MBB clan

• This will not be stable in vitro!

Run whole sequence in Colab
• >tr|A0A3S2XL24|A0A3S2XL24_9RHIZ Cellulose biosynthesis protein BcsS
MRRRAGLRRAGRLVGLGGLVLADLAWADERADARTVLFGSLDAGSSSFVTVGAKHAVDPVGRDGAVVLGSLG
YGGRSEIRGAWEGSPKRVRRHTVKASVVGGRQWFADWGVVALFVGPEIDFDQREAAKAVGPSPGLRLHGEVW
TRPAPTTLLTLTAIAGSARGDVWTRGSFGVRALGAYLGPEIALYADRTGYRKWSLGLHATEWTGWGVSLRLS
GGWLYEERERRPGAYGALTLWRDLD

AlphaFold model is good quality

Beads on a string proteins

[Figure: predictions with the natural linker, a +GG linker and a +GGGG linker]

Negative control: pLDDT values for a spurious protein

• Random sequence is not a good negative control for AlphaFold
    • No homologues
• Predictions for the spurious protein from AntiFam resources
    • A0A0A8WY33.1/7-182
• Some regions have pLDDT ~60-70
• No consistent structure predicted across the models
• Looks a lot like a disordered protein

Another spurious protein
We used AlphaFold to spot a false positive in AntiFam
AlphaFold edge case

• Collagen triple helices are not predicted

• Collagens are obligate trimers and are heavily modified
• ColabFold notebook offers option for trimeric oligomers (Thanks Sergey!)

• Doesn’t seem to help in this case

What about non-natural sequences?

• KKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDD

Acknowledgements

Pfam & InterPro
• Jaina Mistry
• Sara Chuguransky
• Gustavo Salazar
• Matthias Blum
• Typhaine Paysan-Lafosse
• Matloob Qureshi
• Swaathi Kandasaamy
• Aleix Lafita
• Alex Bateman

AlphaFold DB
• Mihaly Varadi
• Sameer Velankar
• AlphaFold DB team

trRosetta & RoseTTAfold models
• David Baker
• Ivan Anishchenko

AlphaFold models
• John Jumper
• Kathryn Tunyasuvunakool
• Demis Hassabis
• The AlphaFold team

Application of AF2 structures for variant effect prediction and pocket detection

Pedro Beltrao, Group Leader

www.ebi.ac.uk/beltrao
@pedrobeltrao

Comparing experimental vs structure-based prediction of missense mutations

● Several groups have used Deep Mutational Scanning experiments to measure the impact of each possible mutation on the function of a protein.
● The impact of the mutations should be consistent with the protein structure.
● We can indirectly evaluate the accuracy of the AF2 structures by measuring their consistency with mutational impacts.

● We used FoldX to predict the impact of each possible mutation on the stability of the protein.
● FoldX takes a protein structure as input and can be used to estimate the impact of a mutation purely from empirical energy terms (e.g. clashes, hydrogen bonds etc).
● It has been shown to correlate reasonably well with experimentally measured stability changes (not trained).

Guerois et al. JMB 2002

● Experimental impact of mutations from deep mutational scanning experiments (30 proteins)
● Predicted ddG of mutation using FoldX on AlphaFold structures or experimental structures
● AlphaFold structures give equal or better predictions, and this holds for regions with no templates

Example: EFEMP2, C267 → Y in ARCL1B; loss of protein expression in patient dermal fibroblasts. Pathogenic; predicted ddG 5.51.

Pocket detection and how to filter the models

[Figure: EGFR example showing the extracellular region, TM region, kinase domain, long disordered C-terminus and the top pocket prediction]

Pocket detection on the full models is likely to result in false positives and false negatives due to modelling of low confidence regions and interactions. We tested this on a benchmark dataset.

We retained 230 of 304 proteins from a dataset by Clark et al., 2020. Pocket detection was performed using ghecom (Kawabata, 2010), as done previously in Clark et al., 2020.

Filtering the pocket residues by confidence (likely also predicted aligned error) improves pocket detection.
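Filtering pocket residues by confidence can be sketched as below. This is an illustration only: the pLDDT threshold of 70 (the "Confident" band) and the minimum pocket size are assumptions rather than the values used in the benchmark, the function names are hypothetical, and the pocket residue lists are expected to come from a separate pocket-detection tool.

```python
def filter_pocket(pocket_residues, plddt, min_plddt=70.0):
    """Keep pocket residues (1-based numbers) whose pLDDT is at least min_plddt."""
    return [r for r in pocket_residues if plddt[r - 1] >= min_plddt]

def confident_pockets(pockets, plddt, min_plddt=70.0, min_size=3):
    """Drop low-confidence residues, then drop pockets that become too small."""
    kept = {}
    for name, residues in pockets.items():
        confident = filter_pocket(residues, plddt, min_plddt)
        if len(confident) >= min_size:
            kept[name] = confident
    return kept

# Toy per-residue pLDDT values and pockets, invented for illustration.
plddt = [95, 91, 88, 72, 40, 35, 30, 90, 93, 45]
pockets = {"p1": [1, 2, 3, 4], "p2": [5, 6, 7, 10], "p3": [3, 4, 5, 8]}
print(confident_pockets(pockets, plddt))
```

A pocket that sits entirely in a low-confidence region (such as `p2` in the toy data) is discarded, which is the intended way of reducing false positives from poorly modelled regions.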

AlphaFold2 for detecting intrinsically disordered protein regions

Bálint Mészáros
EMBL Heidelberg
08/09/2021

AlphaFold2 indicates the presence of IDRs

AF2 generates coordinates for every residue, even ones that have no fixed structure.

[Figure: human p53]

Two interpretations for low confidence:
• AF2 isn’t good enough to predict the structure
• There is no structure to predict

pLDDT is a good indicator of disordered regions (in this case).

Let’s test the generic case: binary disordered prediction.
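The simplest binary disorder call from pLDDT is a threshold followed by grouping consecutive residues into segments. The sketch below assumes a cut-off of pLDDT < 50, matching the "Very low" band from the earlier slides; it is not the predictor evaluated in this talk, and the function name and minimum segment length are illustrative.

```python
def disordered_segments(plddt, threshold=50.0, min_length=1):
    """Return 1-based inclusive (start, end) runs of residues with pLDDT below threshold."""
    segments = []
    start = None
    for position, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = position  # a new low-confidence run begins here
        elif start is not None:
            if position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(plddt) - start + 1 >= min_length:
        segments.append((start, len(plddt)))
    return segments

print(disordered_segments([30, 40, 95, 92, 45, 48, 20, 88]))
```

Raising `min_length` suppresses isolated low-confidence residues inside otherwise confident domains, which are more likely loops than genuinely disordered regions.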

AlphaFold2 as a disorder prediction method

[Figure: pLDDT score distribution on the human proteome (x-axis: pLDDT, 0-100)]

Structural parameters derived from AF2 serve as excellent predictors of disorder, outperforming our current workhorse.

Download data for human + documentation: https://tinyurl.com/AF2-disorder

AF2 disorder scores in ProViz @ DaveyLab

https://tinyurl.com/AF2-ProViz

AF2 identifies functional sites in IDRs

[Figure: predicted structure annotated with the binding site for the MDM2 ubiquitin ligase and the homo-tetramerization region]

AF2 can build complex structures for certain IDRs

[Figure: predicted complexes for the binding site for the MDM2 ubiquitin ligase and for the homo-tetramerization region]

In what cases does AF2 work for IDR complex predictions?

All complex structures were generated using ColabFold by Sergey.

• KID:KIX: right bound structure, wrong binding site
• E2F1:DP1:Rb: three IDRs as a single folding unit
• RelA:CBP: right binding site, wrong orientation
• Collagen trimer: highly specialized homotrimeric folding

AF2 can build complex structures for certain IDRs

• AF2 can predict IDR complexes but success depends on several factors
• Helps prediction:
    • Helical / β bound IDP conformation
    • Well defined, hydrophobic binding groove
    • Asymmetric bound IDP structure (secondary structural elements along the IDR sequence)
• Hinders prediction:
    • Short IDRs
    • Irregular bound structure
    • Phosphorylation-dependent binding
    • Presence of ions in the interface
    • Symmetric bound structures (long helices or arrays of short similar structural elements)
• Examples + documentation: https://tinyurl.com/AF2-IDRcomplex

Acknowledgements
Thank you!

Norman Davey | Short Linear Motif Team

@DaveyLab

Bálint Mészáros | Structural and Computational Biology Unit

@_BalintMeszaros
No ratings yet
Computational Molecular Biology - An Introduction Volume in Wiley Series in Mathematical and Computational Biology - Wiley (PDFDrive)
308 pages
Gene Finding
No ratings yet
Gene Finding
31 pages