
2021 09 08 AlphaFold Webinar Slides

The document provides an overview of AlphaFold, a model developed by DeepMind to predict the 3D structure of proteins from their amino acid sequences. It discusses how AlphaFold works, how to interpret its predictions using metrics like predicted LDDT and predicted aligned error, and how to access AlphaFold predictions. The document also outlines DeepMind's goals of solving fundamental scientific problems with AI, and how AlphaFold was the top method in the CASP14 assessment for protein structure prediction. It notes that future work includes extending AlphaFold's approach to predicting protein complexes.

Uploaded by

Rajkumar Soni

Introduction to AlphaFold
Presenter: Kathryn Tunyasuvunakool
Research Scientist at DeepMind
Agenda Private & Confidential

● Introduction

● How AlphaFold works

● How to interpret predictions

● Accessing AlphaFold

● Future work
DeepMind and protein folding

A central part of DeepMind’s mission is to solve fundamental scientific problems with AI.

Predicting the 3D structure of a protein from its amino acid sequence is one such challenge.

AlphaFold is our model that aims to solve this problem.


AlphaFold at CASP

[Figure legend: Experiment, Prediction]

When working on these problems, a clear success metric is crucial.

Fortunately, the protein structure prediction community had established CASP.

The CASP assessment involves predicting recently solved structures that aren’t yet public.

At CASP14, AlphaFold was the top ranked method, achieving consistently high accuracy.

We committed to publishing the method and making AlphaFold broadly accessible.
How does it work? (the short version)

See Jumper et al. 2021 (especially the SI) for details


Inputs

A key AlphaFold input is the multiple sequence alignment (MSA), containing sequences evolutionarily related to the target. Related sequences are found using standard tools and public databases.

The input sequence is used to create an array of numbers representing all residue pairs.

AlphaFold can also use template structures from the PDB, found using standard tools. However, it often produces accurate predictions without a template.
Network

The Evoformer blocks extract information about the relationship between residues. The MSA representation can update the pair representation and vice versa.

The Structure Module predicts a rotation + translation to place each residue. A small network predicts side chain chi angles. The final structure is run through a relaxation process.

Feeding certain outputs back through the network again improves performance.
Other outputs

As well as a predicted structure, AlphaFold produces two confidence estimates

Interpreting predictions

The short version: use both confidence metrics


Predicted LDDT: definition

pLDDT is AlphaFold’s per-residue prediction of its lDDT-Cα score*.

Roughly, lDDT measures the percentage of correctly predicted interatomic distances, not how well the predicted and true structures can be superimposed.

It rewards locally correct structures, and getting individual domains right.

pLDDT behaves similarly, as a measure of local confidence.

*Mariani et al. 2013

[Figure: an alignment-based metric compared with lDDT]
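To make the lDDT idea concrete, here is a minimal sketch of a Cα-only lDDT calculation. It follows the published definition in Mariani et al. 2013 (a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2 and 4 Å). The function name `lddt_ca` and the toy coordinates are illustrative assumptions; this is not AlphaFold's code or the reference lDDT implementation.

```python
import numpy as np

def lddt_ca(pred, ref, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Per-residue lDDT-Calpha (0-100) from two (N, 3) coordinate arrays."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # All-against-all distance matrices; no superposition is needed.
    d_pred = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    d_ref = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)
    # Score only pairs that are close in the reference, excluding self-pairs.
    mask = (d_ref < cutoff) & ~np.eye(len(ref), dtype=bool)
    diff = np.abs(d_pred - d_ref)
    # Fraction of the tolerance thresholds under which each distance is preserved.
    preserved = np.mean([diff < t for t in thresholds], axis=0)
    counts = mask.sum(axis=1)
    return 100.0 * (preserved * mask).sum(axis=1) / np.maximum(counts, 1)

# Toy example: a straight 10-residue chain with 3.8 A spacing (made-up data).
ref = np.column_stack([np.arange(10) * 3.8, np.zeros(10), np.zeros(10)])
moved = ref + np.array([5.0, -2.0, 1.0])   # rigid translation only
broken = ref.copy()
broken[-1] += np.array([0.0, 50.0, 0.0])   # last residue badly misplaced
```

Because only internal distances are compared, the translated copy scores 100 everywhere, while the misplaced residue in `broken` scores 0 without dragging down residues far from it.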


Predicted LDDT: format

pLDDT ranges from 0 to 100 (100 is most confident).

We use a consistent “confidence bands” color scheme when displaying predictions:

● Very high (>90)
● Confident (70-90)
● Low (50-70)
● Very low (<50)

A pLDDT plot is also displayed by some of our tools.

Prediction files always contain pLDDT in the B-factor field. Therefore a higher B-factor is better!
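Since pLDDT is stored in the B-factor field, it can be read back with a few lines of code. The sketch below assumes standard fixed-column PDB `ATOM` records and takes one value per residue from the Cα atom. The function names are illustrative, and the handling of scores that fall exactly on a band boundary (70 or 90) is an assumption, since the slide does not specify it.

```python
def plddt_per_residue(pdb_text):
    """Map (chain, residue number) -> pLDDT read from the B-factor column."""
    plddt = {}
    for line in pdb_text.splitlines():
        # Atom name is in columns 13-16; use one atom (CA) per residue.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]               # column 22
            resnum = int(line[22:26])      # columns 23-26
            plddt[(chain, resnum)] = float(line[60:66])  # columns 61-66
    return plddt

def confidence_band(score):
    """Name of the confidence band for a pLDDT score (boundary handling assumed)."""
    if score > 90:
        return "Very high"
    if score > 70:
        return "Confident"
    if score > 50:
        return "Low"
    return "Very low"
```

For example, `confidence_band(95.5)` falls in the "Very high" band and `confidence_band(42.0)` in "Very low".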
Predicted LDDT: usage

Identifying domains & possible disordered regions
● pLDDT > 70: residues 65-342 and 418-784 form a confident domain
● pLDDT < 50: a disorder prediction, not a structure prediction

Assessing confidence within a domain
● pLDDT > 90: reasonable to investigate side chains / active site details
● pLDDT > 70: lower confidence on these specific parts
Predicted LDDT: pitfalls

High pLDDT on all domains does not imply AlphaFold is confident of their relative positions

Assessing inter-domain confidence requires a different metric...


Predicted Aligned Error: definition

PAE is AlphaFold’s prediction of its position error at residue x, if the predicted and the true structures were aligned on residue y.

PAE aims to measure confidence in the relative positions of pairs of residues

Mainly used to assess relative domain positions, but applicable whenever pairwise confidence is relevant
Predicted Aligned Error: format

PAE is displayed as a 2D plot.

Suppose residue y were aligned to the true structure and we measured the position error at residue x. The color at (x, y) is AlphaFold’s prediction of that error.

In this case the squares correspond to two domains (residues 1-375 and residues 400-722).

AlphaFold is confident in relative positions within each domain (example pair: residues 272 and 163)...

...but not between domains (example pair: residues 272 and 429).
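The "squares" in a PAE plot can also be summarised numerically by averaging the matrix over blocks of residues. The sketch below is a minimal illustration: the helper `mean_pae` is hypothetical, and the PAE values in the toy matrix are made up to mimic the two-domain pattern described on the slide (domain boundaries 1-375 and 400-722 are taken from the slide).

```python
import numpy as np

def mean_pae(pae, rows, cols):
    """Mean PAE over a block given by two 1-indexed, inclusive residue ranges."""
    (r0, r1), (c0, c1) = rows, cols
    return float(np.asarray(pae)[r0 - 1:r1, c0 - 1:c1].mean())

# Toy matrix (made-up values, in Angstrom): low error inside each domain,
# high error everywhere else.
n = 722
pae = np.full((n, n), 25.0)
pae[0:375, 0:375] = 3.0        # residues 1-375
pae[399:722, 399:722] = 4.0    # residues 400-722

domain_1 = (1, 375)
domain_2 = (400, 722)
within_1 = mean_pae(pae, domain_1, domain_1)
between = mean_pae(pae, domain_1, domain_2)
```

A low within-domain mean combined with a high between-domain mean is the numerical counterpart of two dark squares on a light background: each domain is confidently predicted, but their relative placement is not.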
Predicted Aligned Error: usage

[Figure: two example PAE plots, one showing no confidence in relative domain positions and one showing predicted domain packing]
Things to be aware of

Uncertain domain placement

● If AlphaFold is uncertain, it won’t necessarily place domains sensibly relative to each other
○ Membrane proteins won’t leave space for the cell membrane
○ Clashes can occur

Complexes

● For proteins that exist in complex, AlphaFold is missing context about their binding partners
○ Heteromers more problematic than homomers
○ Worst case: the protein is flexible in isolation

● Some have had success predicting complexes by joining 2 sequences with a linker
○ We think it is possible to extend the ideas in AlphaFold to complexes
○ However, this linker setup remains to be benchmarked
Benchmarking AlphaFold

If you aim to measure AlphaFold’s accuracy, it’s important to ensure the model can’t “cheat” by making use of the true structure.

So it’s important to:

1. Evaluate accuracy on PDB structures released after April 30th 2018

AlphaFold isn’t trained on these recent chains, so it hasn’t had access to them before

2. Check that the PDB you’re testing on wasn’t used as a template

Our code includes a max_template_date flag that can be used to limit the templates the model can use.

AlphaFold DB predictions may use any template released before February 15th 2021
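The first benchmarking rule amounts to a simple date filter over candidate test structures. The sketch below assumes you already have a mapping from entry identifiers to release dates (how that mapping is obtained is outside this sketch). The function name and the placeholder identifiers are hypothetical.

```python
from datetime import date

# Training cutoff quoted on the slide.
TRAINING_CUTOFF = date(2018, 4, 30)

def eligible_for_benchmark(release_dates, cutoff=TRAINING_CUTOFF):
    """Return the entries released strictly after the cutoff, sorted by id."""
    return sorted(entry for entry, released in release_dates.items()
                  if released > cutoff)

# Placeholder identifiers and dates, for illustration only.
releases = {
    "entry_a": date(2017, 6, 1),
    "entry_b": date(2018, 4, 30),
    "entry_c": date(2019, 1, 15),
}
```

This covers only the training-set half of the check; template leakage still has to be controlled separately, for example with the max_template_date flag mentioned above.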
Accessing AlphaFold

1. EMBL-EBI database
2. AlphaFold Colab
3. Open source code


AlphaFold Protein Structure Database

Website developed and hosted by EMBL-EBI

Contains pre-run predictions for 21 organisms, with plans to expand

Pros
● Easiest way to access a prediction - no code, no wait times
● Bulk download available
● Data is CC-BY 4.0

Cons
● Your protein of interest may be missing
● Can’t play around with how the prediction is generated
AlphaFold Colab

A Colab is a website hosting a pre-written Python program

When run, the code executes on a machine in the Cloud

Most similar option to a structure prediction server:

● Enter your sequence


● Hit “play” on each step
AlphaFold Colab

Pros
● Can run an arbitrary sequence of interest
● No coding or installation required

Cons
● Not suitable for large prediction jobs
● May be unreliable for long sequences
● Wait times
● Limited ability to influence the prediction

Note: community-made Colabs are also available that use AlphaFold models! Sergey will talk about one later.
These may support a wider range of options for customising the prediction.

Please cite our methods paper (Jumper et al. 2021) if you use an AlphaFold prediction.
Open source code

You can also download the code and run AlphaFold on your own machine

Most dependencies are provided in a Docker container, but you need to download genetics / template databases and trained models separately.

See https://github.com/deepmind/alphafold for setup instructions

Script for generating a prediction: python3 docker/run_docker.py --fasta_paths=T1050.fasta

Pros
● Can run an arbitrary sequence of interest
● Can run large numbers of predictions
● Full freedom to edit the code and change how the prediction is generated

Cons
● Requires sufficient space for the databases (~2.2 TB) and a GPU
● A little technical expertise required
● Wait times
Future work
Thank you to everyone who made AlphaFold possible!

Agata Laydon Clemens Meyer Koray Kavukcuoglu Russ Bates


Alex Bateman David Reiman Martin Steinegger Sameer Velankar
Alex Bridgland David Silver Michael Figurnov Sebastian Bodenstein
Alexander Pritzel Demis Hassabis Michal Zielinski Simon A. A. Kohl
Andrew Cowie Ellen Clancy Michalina Pacholska Stanislav Nikolov
Andrew J. Ballard Ewan Birney Olaf Ronneberger Stig Petersen
Andrew W. Senior Gerard J. Kleywegt Oriol Vinyals Tamas Berghammer
Anna Potapenko John Jumper Pushmeet Kohli Tim Green
Augustin Žídek Jonas Adler Richard Evans Trevor Back
Bernardino Romera-Paredes Kathryn Tunyasuvunakool Rishub Jain Zachary Wu

The wider team at DeepMind and EMBL-EBI

The CASP community The experimental biology community


AlphaFold database

Sameer Velankar
EMBL-EBI
AlphaFold Database

• Collaboration between EMBL and DeepMind
• Provides open access (CC-BY-4.0 license) to all structure models predicted using AlphaFold
• Hosted on the Google Cloud Platform
• Data available for bulk download via FTP (ftp.ebi.ac.uk/pub/databases/alphafold)
• >150K unique users since launch
• Many examples of how models can be used on Twitter/emails
AlphaFold Database

• ~365K predicted structures for proteins from 21 model organisms
• For the organisms currently covered, predicted structures are available for sequences in the UniProt reference proteome that are between 16 and 2700 amino acids long and contain only standard amino acids
• Only one prediction (out of 5 independent predictions) is made available in the release
• Accession – AF-P12345-F[1-N]
• Files
  • AF-P12345-F[1-N]-model_V[1-N].[pdb, mmcif]
  • AF-P12345-F[1-N]-predicted-aligned-error_V[1-N].json
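The naming scheme above is regular enough to build and parse programmatically. The sketch below is a hedged illustration: the helper names are hypothetical, the lowercase `v` in the output and the accepted extensions (`pdb`, `cif`, `mmcif`) are assumptions, and the parser accepts either case of `v` because the slide writes it in upper case.

```python
import re

# Pattern derived from the scheme on the slide: AF-<UniProt accession>-F<fragment>-model_v<version>.<ext>
_MODEL_NAME = re.compile(
    r"^AF-(?P<accession>[A-Z0-9]+)-F(?P<fragment>\d+)"
    r"-model_[vV](?P<version>\d+)\.(?P<ext>pdb|cif|mmcif)$"
)

def model_filename(accession, fragment=1, version=1, ext="pdb"):
    """Build a coordinate file name for one fragment of a UniProt accession."""
    return f"AF-{accession}-F{fragment}-model_v{version}.{ext}"

def parse_model_filename(name):
    """Return (accession, fragment, version) or raise ValueError."""
    match = _MODEL_NAME.match(name)
    if match is None:
        raise ValueError(f"not an AlphaFold DB model file name: {name!r}")
    return match["accession"], int(match["fragment"]), int(match["version"])
```

Proteins split into several fragments simply yield several file names with increasing fragment numbers for the same accession.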
PDB format coordinate file

• Title
• Molecule description
  • Name, source
  • COMPND, SOURCE records
• Citation
• Cross-reference to UniProt
  • DBREF record
• Sequence
  • SEQRES record

Model archive extension

• Entry details
  • Accession, version, authors, version history
• Molecule description
  • Name, source, sequence, sequence version
• Cross-reference
  • UniProt, Taxonomy id
• Quality measure
  • Per residue quality, Global quality
• Possibility to add protocols, MSA details
• Can be extended to include more metadata
AlphaFold web pages
• Basic search system
• Allows search using UniProt accession, UniProt id,
protein name, gene name and organism

• Clear indication that the structure shown is a prediction
• Allow easy download of structure data
• Basic information about protein
• Clearly indicates if there are experimental structures
available
• Display residue-quality information in 3D viewer
(pLDDT – predicted Local Distance Difference Test)
• Predicted Aligned Error (PAE viewer)
Models available across EBI resources

UniProt, Pfam, InterPro, PDBe-KB
AlphaFold Database – limitations

• No information on complexes with other proteins, nucleic acids (DNA or RNA) or ligands. In some cases, the single-chain prediction may correspond to the structure adopted in a complex. In others, the missing context from surrounding molecules may lead to an uninformative prediction
• AlphaFold does not make any predictions about any of the non-protein components such as
cofactors, metals, ligands including drug-like molecules, ions, carbohydrates and other
post-translational modifications
• Protein dynamics - AlphaFold will usually only produce one of multiple conformations
• AlphaFold has not been trained or validated for predicting the effect of mutations
• May (or may not) lead to hypotheses about protein function – any hypotheses have to be
tested by further experimentation
What’s next – under discussion

• Remove signal peptides from predictions

• Making 5 independent predictions available for each protein

• Additional metadata
• MSA – need to consider data size
• information on templates
• quality criteria e.g. predicted TM score

• Updating database to UniRef90 dataset (~130 million structures)


What’s next – under discussion

• Design pages for UniProt accessions with multiple fragment structures
• Design pages for individual fragments for a given UniProt accession
• Continue adding more functionality to the web pages
• Integrate data of experimentally determined structures – e.g. display
superpositions
• Map known annotations onto the predicted structures
• Variants
• Ligand-binding sites
• Interfaces
• ….
Impact of AlphaFold database on life science research
Structural bioinformatics – (structure/function)

• Predicting complexes between macromolecules


• Homo- and Hetero- Protein-protein; Protein-nucleic acid complexes
• Intrinsically disordered proteins

• Provide information on protein dynamics


• Relevant conformational states

• Functionally important residues


• Impact of mutation; Binding sites; Conformationally important residues
• Interfaces

• Ligand prediction – What binds?


• What might bind in a pocket
Structural biology

• Accelerating structure studies


• Improved construct design
• Starting model for structure determination
• Fitting models in low resolution EM maps
• Time resolved studies to understand mechanism
Structural biology

• Integrative/hybrid methods
  • Models for individual components
• Combination of sparse experimental data and predicted model may lead to actionable data to test hypothesis
  • Chemical footprinting
  • Hydrogen-Deuterium exchange
  • smFRET - Single molecule fluorescence resonance energy transfer

[Figure: I/H methods structures. 552-protein yeast Nuclear Pore Complex, Kim et al. (2018) Nature 555, 475-82; PDBDEV_00000010; PDBDEV_00000011; PDBDEV_00000012]
Structural biology across scales

[Figure: scales from organism to atom: Organism, Infected Cell, Virus, Complex Assembly, Molecule Chains, Chemical Entity, Atom, annotated with the examples below]

• Organism
• Species
• Strain/variant
• Tissue/cell type
• Cellular organelle
• Membrane-bound protein
• Proteins bind and recognise cell surface
• Tertiary protein assembly
• Protein-protein interface
• Interface residues
• Residue interactions
• Carbohydrate chain
• Immune-system evasion
• Enzyme co-factor, substrate and product
• Ligand-binding site
Acknowledgements

• Mihaly Varadi • Galabina Yordanova • John Jumper


• Sreenath Nair • Cindy Natassia • Kathryn Tunyasuvunakool
• Mandar Deshpande • Richard Green • Augustin Žídek
• Stephen Anyango • Stig Petersen
• Gerard Kleywegt • Agata Laydon
• Sameer Velankar • Demis Hassabis
• AlphaFold team
ColabFold
Making Protein folding accessible to all via Google Colab
(and the unintended uses of AlphaFold)

github.com/sokrypton/ColabFold
ColabFold - Advanced options

● Modify MSA input
  ○ Custom or MMseqs2 (much faster)
  ○ Trim
● Complexes
  ○ Homo-oligomers
  ○ Hetero-oligomers
● Fine control
  ○ Number of recycles
● Sample (Output more than 5 models)
  ○ Generate ensembles by iterating through random seeds, enabling dropout.
Can predict protein-protein/peptide interactions

Don't actually need a G-linker!

[Figure panels: G-linker; UNK-linker; protein-peptide interaction]

Can predict protein-protein/peptide interactions

[Figure panels: dimer-swap; intertwined dimer; consistent w/ biochem data]
Preprints rolling in...

Cross-species
Predicting homo-oligomers

Modeling protein given MSA
Residue index [1,2,3,4,5,6]

[Figure: multiple sequence alignment. MSA image borrowed from Kathryn T. (DeepMind)]

Modeling homo-oligomeric interactions by duplicating, padding and concatenating the MSAs.
Residue index [1,2,3,4,5,6] [1,2,3,4,5,6,207,208,209,210,211,212]

Residue_index in AlphaFold is used to create a relative positional encoding. The residue index difference is capped at 32.

Just duplicating often does not work.
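The residue-index trick can be sketched in a few lines. The code below reproduces the index vector shown on the slide (a gap of 200 between copies, inferred from 6 → 207) and a clipped relative offset with the cap of 32 quoted above. The function names, the sign convention of the offset, and the use of raw clipped offsets rather than AlphaFold's actual feature encoding are all assumptions of this sketch.

```python
import numpy as np

def concat_residue_index(chain_len, n_copies, gap=200):
    """Residue indices for n_copies of a chain, with a jump of `gap` between copies."""
    index = []
    start = 1
    for _ in range(n_copies):
        index.extend(range(start, start + chain_len))
        start += chain_len + gap
    return np.array(index)

def relative_position(residue_index, max_offset=32):
    """Pairwise residue-index differences, clipped to [-max_offset, max_offset]."""
    offset = residue_index[:, None] - residue_index[None, :]
    return np.clip(offset, -max_offset, max_offset)

index = concat_residue_index(chain_len=6, n_copies=2)
rel = relative_position(index)
```

With the gap in place, every pair of residues from different copies saturates at the cap (±32), so the positional signal for such pairs no longer says they are neighbours along one chain. Within a copy, the offsets are unchanged.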
Complexes - monomer
[Figure: PAE (Å) plot by position]

Complexes - homodimer
[Figure: PAE (Å) plot by position, chains A and B]

Complexes - homo-6mer
[Figure: PAE (Å) plot by position, chains A to F]

Complexes - homo-8mer?
[Figure: PAE (Å) plot by position and chain]
BENCHMARK* - Can AlphaFold predict homo-oligomers and is inter-PAE a good metric for this?

* Technically in the training set, but AlphaFold was only trained on single chains.

[Figure: example PAE plots for chains A and B]

Homo-oligomeric dataset: Ponstingl, H., Kabir, T. and Thornton, J.M., 2003. Automatic inference of protein quaternary structure from crystals. Journal of Applied Crystallography, 36(5), pp.1116-1122.
BENCHMARK* - Lower inter-PAE scores appear predictive of homo-dimeric formation

[Figure: PAE plots for chains A and B]

pTMscore integrates PAE info. We recommend using it instead of pLDDT for ranking predicted complexes.
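As an illustration of the inter-PAE idea, the sketch below averages the PAE over residue pairs that sit in different chains and uses that number to order several models. It assumes a homo-oligomer whose chains all have the same length, and the function names are hypothetical. It is not the pTM score recommended above, which needs more than the expected-error matrix; it is only a simple stand-in for comparing models.

```python
import numpy as np

def inter_chain_pae(pae, chain_len):
    """Mean PAE over all residue pairs belonging to different (equal-length) chains."""
    pae = np.asarray(pae, dtype=float)
    chain_id = np.arange(pae.shape[0]) // chain_len
    different_chain = chain_id[:, None] != chain_id[None, :]
    return float(pae[different_chain].mean())

def rank_by_inter_pae(paes, chain_len):
    """Model indices ordered from lowest to highest inter-chain PAE."""
    return sorted(range(len(paes)),
                  key=lambda i: inter_chain_pae(paes[i], chain_len))

def toy_dimer_pae(inter_value, chain_len=2, intra_value=2.0):
    """Made-up PAE matrix for a dimer: constant inside chains, constant between them."""
    n = 2 * chain_len
    pae = np.full((n, n), inter_value, dtype=float)
    pae[:chain_len, :chain_len] = intra_value
    pae[chain_len:, chain_len:] = intra_value
    return pae
```

For two toy models with inter-chain values of 20 Å and 5 Å, `rank_by_inter_pae` places the second model first, which matches the slide's observation that lower inter-PAE goes with a more credible interface.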
How about hetero-oligomers (or a mixture)?

Hetero-dimer (1:1) - CASP target H1065
Paired MSA (currently only works for prokaryotic operons)
[Figure: PAE (Å) plot by position and chain]

Sometimes unpaired MSA works (example: CASP target H1065)
● Unpaired MSA: success 1/5
● Paired MSA: success ~3/5

Combining paired+unpaired helps (example: CASP target H1065)
● Unpaired+Paired MSA: success 5/5

Homo/hetero-oligomer (2:2)
[Figure: PAE plot by position and chain]

Homo/hetero-oligomer - D-methionine transport system (2:1:2)
[Figure: PAE (Å) plot by position, chains A to E]
What if your protein/complex is too big to fit into memory? - Trim it!

[Figure: trimmed regions A1-A36,A92-A197 and A47-A242 B65-B477]

Put it together in PyMOL!
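The trim labels on the slide follow a simple chain-plus-residue range notation. The sketch below parses that notation into (chain, start, end) tuples. The parser is hypothetical and is based only on the labels shown here, so it should not be read as ColabFold's own input handling.

```python
import re

_RANGE = re.compile(r"([A-Za-z])(\d+)-([A-Za-z])(\d+)")

def parse_trim(spec):
    """Parse 'A1-A36,A92-A197' into [('A', 1, 36), ('A', 92, 197)]."""
    ranges = []
    # Accept commas or spaces between ranges, as both appear on the slide.
    for part in spec.replace(" ", ",").split(","):
        if not part:
            continue
        match = _RANGE.fullmatch(part)
        if match is None or match[1] != match[3]:
            raise ValueError(f"cannot parse range: {part!r}")
        start, end = int(match[2]), int(match[4])
        if start > end:
            raise ValueError(f"range is reversed: {part!r}")
        ranges.append((match[1], start, end))
    return ranges
```

Each tuple identifies one stretch of a chain to keep, which can then be predicted separately and combined afterwards in a molecular viewer.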
Conclusions
● AlphaFold can be “hacked” into predicting protein-protein complexes.
● This is an unintended use and should only be used as a hypothesis-generation tool whose output needs further experimental validation.
Acknowledgments

Minkyung Baek, Ivan A., Justas D., Frank D. (RoseTTAFold support)
John Jumper & Tim Green (AlphaFold support)
David Koes (py3Dmol)
Enzo Guerrero-Araya (Submitting bug fixes)
Lim Heo (initial analysis)

Coauthors

Milot Mirdita, Martin Steinegger, Sergey
Some AlphaFold use cases

Alex Bateman
Pfam use case: Calmodulin_bind (PF07887)
• Found in plants
• Involved in stress responses
• Built family in 2004
• No PDB structure
[Figure: seed alignment]

Structural model for Calmodulin_bind (PF07887)
[Figure: predicted structure with three regions labelled 1, 2 and 3]
1) Similar to RUNT DNA-binding domain
2) Similar to BAF domain (New Pfam PF20451)
3) Similarity to POLO box (New Pfam PF20452)
Consider how accurate your prediction needs to be

• For some purposes you don’t need highly accurate models
• For Pfam we are interested in identifying domains and comparing them to PDB
• Can greatly speed up predictions in Colab by making just a single model
• Can greatly speed up predictions with ColabFold, which uses MMseqs2
• Use pLDDT and PAE to judge if model is suitable for purpose
Just because AlphaFold can fold it doesn’t mean nature can

• Pfam use case 2: CPB_BcsS family (PF17036)


• AlphaFold prediction of region matched by Pfam identifies incomplete domain
• Structurally similar to MBB clan

• This will not be stable in vitro!


Run whole sequence in Colab
• >tr|A0A3S2XL24|A0A3S2XL24_9RHIZ Cellulose biosynthesis protein BcsS
MRRRAGLRRAGRLVGLGGLVLADLAWADERADARTVLFGSLDAGSSSFVTVGAKHAVDPVGRDGAVVLGSLG
YGGRSEIRGAWEGSPKRVRRHTVKASVVGGRQWFADWGVVALFVGPEIDFDQREAAKAVGPSPGLRLHGEVW
TRPAPTTLLTLTAIAGSARGDVWTRGSFGVRALGAYLGPEIALYADRTGYRKWSLGLHATEWTGWGVSLRLS
GGWLYEERERRPGAYGALTLWRDLD
AlphaFold model is good quality
Beads on a string proteins

[Figure panels: natural linker; +GG linker; +GGGG linker]
Negative control: pLDDT values for a spurious protein

• Random sequence is not a good negative control for AlphaFold
  • No homologues
• Predictions for the spurious protein from AntiFam resources
  • A0A0A8WY33.1/7-182
• Some regions have pLDDT ~60-70
• No consistent structure predicted across the models
• Looks a lot like a disordered protein
Another spurious protein
We used AlphaFold to spot a false positive in AntiFam
AlphaFold edge case

• Collagen triple helices are not predicted
• Collagens are obligate trimers and are heavily modified
• ColabFold notebook offers option for trimeric oligomers (Thanks Sergey!)
• Doesn’t seem to help in this case


What about non-natural sequences?

• KKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDDKKKKDDDD
Acknowledgements

Pfam & InterPro
• Jaina Mistry
• Sara Chuguransky
• Gustavo Salazar
• Matthias Blum
• Typhaine Paysan-Lafosse
• Matloob Qureshi
• Swaathi Kandasaamy
• Aleix Lafita
• Alex Bateman

AlphaFold DB
• Mihaly Varadi
• Sameer Velankar
• AlphaFold DB team

trRosetta & RoseTTAfold models
• David Baker
• Ivan Anishchenko

AlphaFold models
• John Jumper
• Kathryn Tunyasuvunakool
• Demis Hassabis
• The AlphaFold team
Application of AF2 structures for variant effect prediction and pocket detection

Pedro Beltrao, Group Leader


www.ebi.ac.uk/beltrao
@pedrobeltrao
Comparing experimental vs structure based prediction of missense mutations

● Several groups have used Deep Mutational Scanning experiments to measure the impact of each possible mutation on the function of a protein.
● The impact of the mutations should be consistent with the protein structure.
● We can indirectly evaluate the accuracy of the AF2 structures by measuring their consistency with mutational impacts.

● We used FoldX to predict the impact of each possible mutation on the stability of the protein.
● FoldX takes a protein structure as input and can be used to estimate the impact of a mutation purely from empirical energy terms (e.g. clashes, hydrogen bonds etc).
● It has been shown to correlate reasonably well with experimentally measured stability changes (not trained). Guerois et al. JMB 2002

● Experimental impact of mutations from deep mutational scanning experiments (30 proteins)
● Predicted ddG of mutation using FoldX on AlphaFold structures or experimental structures
● AlphaFold structures give equal or better predictions, and this holds for regions with no templates
Comparing experimental vs structure based prediction of
missense mutations

EFEMP2

C267 → Y in ARCL1B; loss of protein expression in patient dermal fibroblasts

Pathogenic; predicted ddG 5.51


Pocket detection and how to filter the models

[Figure: EGFR example showing the extracellular region, TM region, kinase domain, long disordered C-term and the top pocket prediction]

Pocket detection on the full models is likely to result in false positives and false negatives due to modelling of low confidence regions and interactions. We tested this on a benchmark dataset.

We retained 230 of 304 proteins from a dataset by Clark et al., 2020. Pocket detection was performed using ghecom (Kawabata, 2010), as done previously in Clark et al., 2020.

Filtering the pocket residues by confidence (likely also predicted aligned error) improves pocket detection.
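One simple way to filter pocket residues by confidence is sketched below. The talk does not give the filtering rule that was used, so the function name, the pLDDT threshold of 70 and the minimum confident fraction of 0.5 are all assumptions chosen for illustration. The input is a list of residue numbers from any pocket detector plus a per-residue pLDDT mapping.

```python
def filter_pocket(pocket_residues, plddt, min_plddt=70.0, min_fraction=0.5):
    """Keep confident pocket residues; drop the pocket if too few are confident.

    pocket_residues: residue numbers reported for one pocket
    plddt: mapping from residue number to pLDDT
    """
    if not pocket_residues:
        return []
    confident = [r for r in pocket_residues if plddt.get(r, 0.0) >= min_plddt]
    if len(confident) / len(pocket_residues) < min_fraction:
        return []  # pocket is mostly built from low-confidence residues
    return confident

# Made-up per-residue scores for illustration.
example_plddt = {1: 95.0, 2: 88.0, 3: 40.0, 4: 30.0}
```

With these toy scores, a pocket made of residues 1, 2 and 3 keeps residues 1 and 2, while a pocket made of residues 3, 4 and 1 is discarded because only one of its three residues is confident.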
AlphaFold2 for detecting intrinsically disordered protein regions

Bálint Mészáros
EMBL Heidelberg
08/09/2021
AlphaFold2 indicates the presence of IDRs

AF2 generates coordinates for every residue, even ones that have no fixed structure

[Figure: human p53]

Two interpretations for low confidence:
• AF2 isn’t good enough to predict the structure
• There is no structure to predict

pLDDT is a good indicator of disordered regions (in this case)

Let’s test the generic case – binary disordered prediction
AlphaFold2 as a disorder prediction method

[Figure: pLDDT score distribution on the human proteome]

Structural parameters derived from AF2 serve as excellent predictors of disorder, outperforming our current workhorse

Download data for human + documentation: https://tinyurl.com/AF2-disorder

AF2 disorder scores in ProViz @ DaveyLab

https://tinyurl.com/AF2-ProViz
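A very simple form of binary disorder prediction is to threshold pLDDT and report the contiguous low-confidence stretches. The sketch below does that. The threshold of 50 echoes the "very low" band quoted earlier in the webinar, while the minimum segment length and the function name are assumptions. The structural parameters evaluated in the talk are not specified here, so this should be read as a baseline illustration rather than the presenters' method.

```python
def low_confidence_segments(plddt, threshold=50.0, min_length=5):
    """Return 1-indexed, inclusive (start, end) runs where pLDDT stays below threshold."""
    segments = []
    start = None
    for position, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = position          # a new low-confidence run begins
        else:
            if start is not None and position - start >= min_length:
                segments.append((start, position - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(plddt) + 1 - start >= min_length:
        segments.append((start, len(plddt)))
    return segments

# Made-up profile: ordered, an 8-residue low stretch, ordered, a short low tail.
profile = [90.0] * 10 + [30.0] * 8 + [85.0] * 5 + [40.0] * 3
```

The minimum length suppresses isolated dips, so the three-residue tail in the toy profile is only reported when `min_length` is lowered.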
AF2 identifies functional sites in IDRs

[Figure labels: binding site for the MDM2 ubiquitin ligase; homo-tetramerization region]

AF2 can build complex structures for certain IDRs

[Figure labels: binding site for the MDM2 ubiquitin ligase; homo-tetramerization region]

In what cases does AF2 work for IDR complex predictions?

All complex structures were generated using ColabFold by Sergey

AF2 can build complex structures for certain IDRs
• KID:KIX - right bound structure, wrong binding site
• E2F1:DP1:Rb - three IDRs as a single folding unit
• RelA:CBP - right binding site, wrong orientation
• Collagen trimer - highly specialized homotrimeric folding
AF2 can build complex structures for certain IDRs
• AF2 can predict IDR complexes but success depends on several factors
• Helps prediction:
• Helical / β bound IDP conformation
• Well defined, hydrophobic binding groove
• Asymmetric bound IDP structure (secondary structural elements along the IDR sequence)
• Hinders prediction:
• Short IDRs
• Irregular bound structure
• Phosphorylation-dependent binding
• Presence of ions in the interface
• Symmetric bound structures (long helices or arrays of short similar structural elements)
• Examples + documentation: https://tinyurl.com/AF2-IDRcomplex
Acknowledgements
Thank you!

Norman Davey | Short Linear Motif Team


@DaveyLab

Bálint Mészáros | Structural and Computational Biology Unit


@_BalintMeszaros
