Protein Structural Motifs
Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)
Homework 5: Phylogenies
• Class
o Similar secondary structure content
o All α, all β, alternating α/β, etc.
• Fold (Architecture)
o Major structural similarity
o SSEs (secondary structure elements) in similar arrangement
• Superfamily (Topology)
o Probable common ancestry
o HMM family membership
• Family
o Clear evolutionary relationship
o Pairwise sequence similarity > 25%
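The family-level threshold can be checked on a pairwise alignment. The sketch below is a minimal illustration, assuming that "similarity" is read as percent identity over aligned (non-gap) columns; the slides do not specify the exact measure, and the function names and the default threshold argument are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns do not count as aligned positions
        aligned += 1
        if x == y:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def passes_family_threshold(aligned_a: str, aligned_b: str,
                            threshold: float = 25.0) -> bool:
    """True when identity is strictly above the threshold (assumed 25%)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

For example, `passes_family_threshold("ACDEFG", "ACDQFG")` compares two toy six-residue sequences that differ at one position.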
Classes of Protein Structures
• Mainly α
• Mainly β
• α/β (alternating)
o Parallel β sheets, β-α-β units
• α+β
o Anti-parallel β sheets, segregated α and β regions
o Helices mostly on one side of sheet
Classes of Protein Structures
• Others
o Multi-domain, membrane and cell surface, small proteins, peptides and fragments, designed proteins
Folds / Architectures
• Mainly α
o Bundle
o Non-Bundle
• Mainly β
o Single sheet
o Roll
o Barrel
o Clam
o Sandwich
o Prism
o 4/6/7/8 Propeller
o Solenoid
• α/β and α+β
o Closed: Barrel, Roll, ...
o Open: Sandwich, Clam, ...
The TIM Barrel Fold
A Conceptual Problem ...
Fold versus Topology
Another example: Globin vs. Colicin
PDB Protein Database
http://www.rcsb.org/pdb/
• Protein Data Bank
o Multiple Structure Viewers
o Sequence & Structure Comparison Tools
o Derived Data: SCOP, CATH, Pfam, GO terms
o Education on Protein Structure
o Download Structures and Entire Database
PDB Advanced Search for UniProt Entry
http://www.rcsb.org/pdb/
PDB Search Results
http://www.rcsb.org/pdb/
PDB E. coli Hu Entry
http://www.rcsb.org/pdb/explore/explore.do?structureId=2O97
PDB SimpleViewer
http://www.rcsb.org/pdb/
PDB Protein Workshop View
http://www.rcsb.org/pdb/
PDB Derived Data
http://www.rcsb.org/pdb/
Molecule of the Month: Enhanceosome
http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/current_month.html
NCBI Structure Database
http://www.ncbi.nlm.nih.gov/Structure/
• Macromolecular Structures
• Related Structures
• View Aligned Structures & Sequences
• Cn3D: Downloadable Structure & Sequence Viewer
• CDD: Conserved Domain Database
o CD-Search: Protein Sequence Queries
o CD-TREE: Protein Classification Downloadable Application
o CDART: Conserved Domain Architecture Tool
• PubChem: Small Molecules and Biological Activity
• Biological Systems: BioCyc, KEGG and Reactome Pathways
• MMDB: Molecular Modeling Database
• CBLAST: BLAST sequence against PDB and Related Structure Database
• IBIS: Inferred Biomolecular Interaction Server
• VAST Search: Structure Alignment Tool
NCBI Cn3D Viewer
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
PyMOL PDB Structure Viewer
http://www.pymol.org/
Databases of Protein Folds
• SCOP (http://scop.berkeley.edu/)
o Structural Classification of Proteins
o Class-Fold-Superfamily-Family
o Manual assembly by inspection
• Superfamily (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/)
o HMM models for each SCOP fold
o Fold assignments to all genome ORFs
o Assessment of specificity/sensitivity of structure prediction
o Search by sequence, genome and keywords
• CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
o Class - Architecture - Topology - Homologous Superfamily
o Manual classification at Architecture level
o Automated topology classification using SSAP (Orengo & Taylor)
• FSSP (http://www2.embl-ebi.ac.uk/dali/fssp/)
o Fully automated using the DALI algorithm (Holm & Sander)
o No internal node annotations
o Structural similarity search using DALI
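The Class-Fold-Superfamily-Family hierarchy maps naturally onto a four-part identifier. The sketch below assumes SCOP's dotted concise classification strings of the form `class.fold.superfamily.family` (for example `c.1.2.3`); the identifiers used, the `ScopId` type and the helper names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional

# SCOP class letters for the four main structural classes.
SCOP_CLASSES = {"a": "all α", "b": "all β", "c": "α/β", "d": "α+β"}
LEVELS = ("class", "fold", "superfamily", "family")


class ScopId(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> ScopId:
    """Split a dotted identifier such as 'c.1.2.3' into its four levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return ScopId(parts[0], int(parts[1]), int(parts[2]), int(parts[3]))


def deepest_shared_level(x: ScopId, y: ScopId) -> Optional[str]:
    """Name of the deepest level two entries share, or None if classes differ."""
    shared = None
    for level, u, v in zip(LEVELS, x, y):
        if u != v:
            break
        shared = level
    return shared
```

Two entries that agree on class and fold but not on superfamily share structure at the fold level only, which is the distinction the hierarchy is meant to capture.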
SCOP Database of Protein Folds
http://scop.berkeley.edu/
SCOP Hierarchy
http://scop.berkeley.edu/data/scop.b.html
SCOP Alpha and Beta Proteins
http://scop.berkeley.edu/data/scop.b.d.html
SCOP TIM Barrels
http://scop.berkeley.edu/data/scop.b.d.b.html
SCOP Thiamin Phosphate Synthase
http://scop.berkeley.edu/data/scop.b.d.b.d.A.html
SCOP Thiamin Phosphate Synthase Entry
http://scop.berkeley.edu/
SuperFamily HMM Fold Library
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
SuperFamily Major Features
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
Genome Assignments by Superfamily
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
CATH Protein Structure Classification
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Structure Hierarchy
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Class Level
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Orthogonal Bundle
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Summary
http://www.biochem.ucl.ac.uk/bsm/cath/
FSSP Database
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+FSSP
Dali Server
http://www.ebi.ac.uk/dali/
DALI Database (Liisa Holm)
http://ekhidna.biocenter.helsinki.fi/dali/start
Protein Fold Prediction: Swiss Model
http://swissmodel.expasy.org/
[Figure: two 6 × 6 matrices, rows and columns numbered 1 to 6]
VAST - Vector Alignment Search Tool
• Aligns only secondary structure elements (SSE)
• Finds all possible pairs of vectors from the two structures that are similar
$$S(d) = \frac{2M}{1 + (d/d_0)^2} - M$$
[Figure: plot of S(d) against d, with M, -M and d0 marked on the axes]
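The plot labels M, -M and d0 are consistent with a score of the form S(d) = 2M / (1 + (d/d0)^2) - M, which equals M at d = 0, crosses zero at d = d0, and approaches -M for large d. The following is a minimal sketch under that assumption; the function and parameter names are hypothetical.

```python
def vector_score(d: float, d0: float, max_score: float) -> float:
    """Assumed score S(d) = 2M / (1 + (d/d0)^2) - M.

    d         -- distance (or other difference) between two compared vectors
    d0        -- distance at which the score crosses zero
    max_score -- M, the score of a perfect match (d = 0)
    """
    if d0 <= 0:
        raise ValueError("d0 must be positive")
    return 2.0 * max_score / (1.0 + (d / d0) ** 2) - max_score
```

The bounded shape means that a single badly matched pair can cost at most M, so one outlier cannot dominate the total score of an alignment.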
Aligning Secondary Structure Vectors
[Figure: alignment matrix of the SSE strings HHSS (columns) and SHSSH (rows); H = helix, S = strand]
Best local alignment:
HHSS
SHSSH
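A best local alignment of two SSE strings such as HHSS and SHSSH can be found by dynamic programming. The sketch below uses a Smith-Waterman style recurrence on the element letters alone; the scoring values (match 2, mismatch -1, gap -1) are assumptions for illustration, and a real structural aligner would score pairs of vectors by their geometry rather than by letter identity.

```python
def local_align_score(a: str, b: str, match: int = 2,
                      mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of two SSE strings (H = helix, S = strand)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                0,                           # start a new local alignment
                table[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
            best = max(best, table[i][j])
    return best


score = local_align_score("HHSS", "SHSSH")
```

With these assumed scores, the best local alignment of HHSS and SHSSH pairs the shared run H, S, S.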
Three Step Algorithm
• Local Secondary Structure Superposition
o Superimpose the two structures using aligned secondary structure element (SSE) vectors
• Atomic Superposition
o Apply a greedy nearest-neighbor method to minimize the RMSD between the C-α atoms of the query and the target (i.e., find the nearest local minimum in the alignment space)
• Core Superposition
o Find the best sequential core of aligned C-α atoms and minimize the RMSD between them
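One pass of the atomic superposition step can be sketched as greedy nearest-neighbor pairing followed by an RMSD calculation. This is a simplified illustration, not the lecture's implementation: it assumes the two coordinate sets are already roughly superimposed, omits the rigid-body fitting that would follow each pairing, and uses a hypothetical 3.0 Å cutoff.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def greedy_pairs(query: Sequence[Point], target: Sequence[Point],
                 cutoff: float = 3.0) -> List[Tuple[int, int]]:
    """Pair C-alpha atoms greedily, closest pairs first, each atom used once."""
    candidates = sorted(
        (math.dist(q, t), i, j)
        for i, q in enumerate(query)
        for j, t in enumerate(target)
    )
    used_q, used_t, pairs = set(), set(), []
    for dist, i, j in candidates:
        if dist > cutoff:
            break  # all remaining candidates are farther apart
        if i in used_q or j in used_t:
            continue
        used_q.add(i)
        used_t.add(j)
        pairs.append((i, j))
    return sorted(pairs)


def rmsd(query: Sequence[Point], target: Sequence[Point],
         pairs: Sequence[Tuple[int, int]]) -> float:
    """Root-mean-square deviation over the paired atoms."""
    if not pairs:
        raise ValueError("no paired atoms")
    total = sum(math.dist(query[i], target[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))
```

In a full procedure the pairing and a least-squares refit would alternate until the set of pairs stops changing, which is what reaching the nearest local minimum amounts to.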
Step 1: Local Secondary Structure Superposition
[Figure: SSE vectors H1, S2, H3 and S4 from each of two structures]
Step 1: Local Secondary Structure Superposition
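[Figure: SSE vectors A1 to A4 of structure A and B1 to B4 of structure B]
Vector pair in A | Vector pair in B | Values
A1,A2 | B2,B3 | 2, 32
A3,A4 | B3,B4 | 3, 71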
Step 1: Local Secondary Structure Superposition
Step 2: Atomic Superposition
Step 3: Core Superposition
LOCK 2: Secondary Structure Element Alignment
[Figure: geometry of a pair of SSE vectors, labeled with angles φ, ψ, Φ and distance d]
EEKSAVTALWGKV--
GDKKAINKIWPKIYK
• Naïve approach: nearest-neighbor alpha carbons
Beta Carbons Encode Directional Information
[Figure: residue with the Cβ atom labeled]
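The direction from a residue's alpha carbon to its beta carbon can be compared between two candidate residues, so that nearby residues whose side chains point in opposite directions are not paired. The sketch below is one way to express that idea, assuming a simple cosine test; the thresholds and function names are hypothetical and not taken from the slides.

```python
import math
from typing import Sequence, Tuple


def unit_direction(ca: Sequence[float], cb: Sequence[float]) -> Tuple[float, ...]:
    """Unit vector pointing from the alpha carbon to the beta carbon."""
    v = [b - a for a, b in zip(ca, cb)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("alpha and beta carbon coincide")
    return tuple(x / norm for x in v)


def direction_agreement(ca1, cb1, ca2, cb2) -> float:
    """Cosine of the angle between two CA-to-CB vectors (1.0 = same direction)."""
    u = unit_direction(ca1, cb1)
    w = unit_direction(ca2, cb2)
    return sum(a * b for a, b in zip(u, w))


def compatible(ca1, cb1, ca2, cb2,
               max_dist: float = 3.0, min_cos: float = 0.5) -> bool:
    """Pair residues only if they are close AND their side chains point alike."""
    close = math.dist(ca1, ca2) <= max_dist
    return close and direction_agreement(ca1, cb1, ca2, cb2) >= min_cos
```

A distance-only test would accept two residues 1 Å apart even when their side chains point in opposite directions; the cosine term rejects that pairing.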
Improvements in Consistency
• Consistency: measures the adherence to the transitivity property
among all triples of protein structures in a given superfamily
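Transitivity over triples can be made concrete as follows: if residue i of A aligns to residue j of B, and j aligns to residue k of C, then a consistent set of alignments also aligns i to k in the direct A-to-C alignment. The sketch below computes the fraction of such chained residues that agree. The exact definition used in the lecture is not given, so this formulation, the dictionary layout and the names are assumptions.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

# Maps residue index in the first structure to residue index in the second.
Alignment = Dict[int, int]


def consistency(alignments: Dict[Tuple[str, str], Alignment],
                structures: Sequence[str]) -> float:
    """Fraction of chained residue mappings A->B->C that agree with A->C."""
    agree = 0
    total = 0
    for a, b, c in permutations(structures, 3):
        ab = alignments.get((a, b))
        bc = alignments.get((b, c))
        ac = alignments.get((a, c))
        if ab is None or bc is None or ac is None:
            continue  # this ordered triple lacks one of its three alignments
        for i, j in ab.items():
            if j in bc:
                total += 1
                if ac.get(i) == bc[j]:
                    agree += 1
    if total == 0:
        raise ValueError("no chained residue mappings to evaluate")
    return agree / total
```

A value of 1.0 means every chained mapping is reproduced by the direct alignment; lower values indicate pairwise alignments that contradict one another.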
[Figure: panels for the Globin Superfamily and the Immunoglobulin Superfamily]
[Figure: plot with y-axis "Number of True Positives" (0 to 16) and x-axis ticks 0 to 15]
Rank | Fold
1 | Immunoglobulin
2 | Immunoglobulin
... | ...
3 | p53
• Gold standard: Structural Classification of Proteins (SCOP)
o SCOP folds: similar arrangement and connectivity of secondary structure elements
Comparing ROC Curves
[Figure: ROC curves; y-axis "Number of True Positives"; panels: Families, Superfamilies]
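A ROC curve of this kind can be traced from a ranked hit list by counting true and false positives against the gold-standard classification. The sketch below is a minimal illustration, assuming each hit is labeled with its SCOP fold and a hit counts as a true positive when its fold matches the query's; the function name and the example list are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(ranked_folds: Sequence[str], query_fold: str) -> List[Tuple[int, int]]:
    """Cumulative (false positives, true positives) after each ranked hit."""
    true_pos = 0
    false_pos = 0
    points = []
    for fold in ranked_folds:
        if fold == query_fold:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos, true_pos))
    return points


# Hits ranked best-first for an immunoglobulin query.
hits = ["Immunoglobulin", "Immunoglobulin", "p53"]
curve = roc_points(hits, "Immunoglobulin")
```

Plotting the second coordinate against the first gives the curve; a method whose curve rises higher before moving right retrieves more true positives at the same number of false positives.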