
Protein Structural Motifs: Doug Brutlag Professor Emeritus Biochemistry & Medicine (By Courtesy)

This document discusses protein structure databases including SCOP, Superfamily, and CATH. SCOP and CATH classify protein structures manually based on class, fold, superfamily, and family. Superfamily assigns folds to genomes using HMM models for each SCOP fold. The databases provide hierarchical classification of structures and tools for sequence and structure searches.
Copyright
© Attribution Non-Commercial (BY-NC)

Computational Molecular Biology

Biochem 218 – BioMedical Informatics 231


http://biochem218.stanford.edu/

Protein Structural Motifs

Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)
Homework 5: Phylogenies

• For this homework assignment, take 20 to 30 protein sequences that are at least 30% similar and:
o 1) make a multiple sequence alignment of them using ClustalW, and
o 2) make two phylogenies, one using the UPGMA method and the other using the Neighbor Joining method
• Describe the resulting alignments and include graphic images of the
phylogenies in a message to [email protected]
• Mention if the trees seem reasonable biologically or taxonomically by
comparison with standard taxonomies
• Do the two trees have the same topology?
• Do the trees have the same branch lengths?
• If the two trees do not have the same topology or branch lengths,
describe the differences and indicate why you think the two trees
differ. Are the differences significant?
• Do the trees show evidence of paralogous evolution? Which nodes
are orthologous and which are paralogous bifurcations?
• Do the trees show evidence of either gene conversion or horizontal
gene transfer?
Final Projects Due March 12

• Examples of Previous Final Projects
o http://biochem218.stanford.edu/Projects.html
• Critical review of any area of computational molecular biology.
o Area from the lectures but in more depth
o Any other area of bioinformatics or genomics focused on
computational approaches
• Proposed improvement or novel approach
• Can be a combined experimental/computational method.
• Could be an implementation or just pseudocode.
• Please do a MeSH literature search for Reviews on your topic.
Some useful MeSH terms include:
o Algorithms
o Statistics
o Molecular Sequence Data
o Molecular Structure etc.
• Please send a proposed final project topic to
[email protected] by next Friday
Protein Structure Computational Goals

• Compare all known structures to each other
• Compute distances between protein structures
• Classify and organize all structures in a biologically meaningful way
• Discover conserved substructural domains
• Discover conserved substructural motifs
• Find common folding patterns and structural/functional
motifs
• Discover relationship between structure and function.
• Study interactions between proteins and other proteins,
ligands and DNA (Protein Docking)
• Use known structures and folds to infer structure from
sequence (Protein Threading)
• Use known structural motifs to infer function from structure
• Many more…
Structural Classification of Proteins (SCOP)
http://scop.berkeley.edu/

• Class
o Similar secondary structure content
o All α, all β, alternating α/β, etc.

• Fold (Architecture)
o Major structural similarity
o SSEs in similar arrangement

• Superfamily (Topology)
o Probable common ancestry
o HMM family membership

• Family
o Clear evolutionary relationship
o Pairwise sequence similarity > 25%
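
The four levels above can be read off SCOP's dotted classification strings (sccs), such as c.1.3.1. The following is a minimal Python sketch, not SCOP's own software: the function names parse_sccs and deepest_shared_level are hypothetical, the class labels are abbreviated, and the example identifiers are used for illustration only.

```python
# Main SCOP class letters (abbreviated labels); other letters are grouped as "Other".
SCOP_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
    "e": "Multi-domain proteins",
    "f": "Membrane and cell surface proteins",
    "g": "Small proteins",
}

LEVELS = ["class", "fold", "superfamily", "family"]


def parse_sccs(sccs):
    """Split a string such as 'c.1.3.1' into its four hierarchy levels."""
    parts = sccs.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected class.fold.superfamily.family, got {sccs!r}")
    return {
        "class": parts[0],
        "class_name": SCOP_CLASSES.get(parts[0], "Other"),
        "fold": ".".join(parts[:2]),
        "superfamily": ".".join(parts[:3]),
        "family": sccs,
    }


def deepest_shared_level(sccs_a, sccs_b):
    """Return the deepest level two domains share, or None if classes differ."""
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(parse_sccs("c.1.3.1"))
    print(deepest_shared_level("c.1.3.1", "c.1.8.4"))
```

Two domains that share only the first two fields belong to the same fold but not necessarily to the same superfamily, which is the distinction the hierarchy is designed to capture.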
Classes of Protein Structures
• Mainly α
• Mainly β
• α/β (alternating)
o Parallel β sheets, β-α-β units

• α+β
o Anti-parallel β sheets, segregated α and β regions
o Helices mostly on one side of sheet
Classes of Protein Structures

• Others
o Multi-domain, membrane and cell surface,
small proteins, peptides and fragments,
designed proteins
Folds / Architectures
• Mainly α
o Bundle
o Non-Bundle
• Mainly β
o Single sheet
o Roll
o Barrel
o Clam
o Sandwich
o Prism
o 4/6/7/8 Propeller
o Solenoid
• α/β and α+β
o Closed: Barrel, Roll, ...
o Open: Sandwich, Clam, ...
The TIM Barrel Fold
A Conceptual Problem ...
Fold versus Topology

Another example: Globin vs. Colicin
PDB Protein Database
http://www.rcsb.org/pdb/

• Protein Data Bank
o Multiple Structure Viewers
o Sequence & Structure Comparison Tools
o Derived Data: SCOP, CATH, Pfam, GO Terms
o Education on Protein Structure
o Download Structures and Entire Database
PDB Advanced Search for UniProt Entry
http://www.rcsb.org/pdb/
PDB Search Results
http://www.rcsb.org/pdb/
PDB E. coli Hu Entry
http://www.rcsb.org/pdb/explore/explore.do?structureId=2O97
PDB SimpleViewer
http://www.rcsb.org/pdb/
PDB Protein Workshop View
http://www.rcsb.org/pdb/
PDB Derived Data
http://www.rcsb.org/pdb/
Molecule of the Month: Enhanceosome
http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/current_month.html
NCBI Structure Database
http://www.ncbi.nlm.nih.gov/Structure/

• Macromolecular Structures
• Related Structures
• View Aligned Structures & Sequences
• Cn3D: Downloadable Structure & Sequence Viewer
• CDD: Conserved Domain Database
o CD-Search: Protein Sequence Queries
o CD-TREE: Protein Classification Downloadable Application
o CDART: Conserved Domain Architecture Tool
• PubChem: Small Molecules and Biological Activity
• Biological Systems: BioCyc, KEGG and Reactome Pathways
• MMDB: Molecular Modeling Database
• CBLAST: BLAST sequence against PDB and Related Structure Database
• IBIS: Inferred Biomolecular Interaction Server
• VAST Search: Structure Alignment Tool
NCBI Cn3D Viewer
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
PyMOL PDB Structure Viewer
http://www.pymol.org/
Databases of Protein Folds

• SCOP (http://scop.berkeley.edu/)
o Structural Classification of Proteins
o Class-Fold-Superfamily-Family
o Manual assembly by inspection
• Superfamily (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/)
o HMM models for each SCOP fold
o Fold assignments to all genome ORFs
o Assessment of specificity/sensitivity of structure prediction
o Search by sequence, genome and keywords
• CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
o Class - Architecture - Topology - Homologous Superfamily
o Manual classification at Architecture level
o Automated topology classification using SSAP (Orengo & Taylor)
• FSSP (http://www2.embl-ebi.ac.uk/dali/fssp/)
o Fully automated using the DALI algorithm (Holm & Sander)
o No internal node annotations
o Structural similarity search using DALI
SCOP Database of Protein Folds
http://scop.berkeley.edu/
SCOP Hierarchy
http://scop.berkeley.edu/data/scop.b.html
SCOP Alpha and Beta Proteins
http://scop.berkeley.edu/data/scop.b.d.html
SCOP TIM Barrels
http://scop.berkeley.edu/data/scop.b.d.b.html
SCOP Thiamin Phosphate Synthase
http://scop.berkeley.edu/data/scop.b.d.b.d.A.html
SCOP Thiamin Phosphate Synthase Entry
http://scop.berkeley.edu/
SuperFamily HMM Fold Library
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
SuperFamily Major Features
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
Genome Assignments by Superfamily
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
CATH Protein Structure Classification
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Structure Hierarchy
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Class Level
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Orthogonal Bundle
http://www.biochem.ucl.ac.uk/bsm/cath/
CATH Protein Summary
http://www.biochem.ucl.ac.uk/bsm/cath/
FSSP Database
http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+LibInfo+-lib+FSSP
Dali Server
http://www.ebi.ac.uk/dali/
DALI Database (Liisa Holm)
http://ekhidna.biocenter.helsinki.fi/dali/start
Protein Fold Prediction: Swiss Model
http://swissmodel.expasy.org/

• Amos Bairoch, Swiss Institute of Bioinformatics (SIB)
• Threading and Template Discovery
• Workspace for saving Template Results
• Domain Annotation
• Structure Assessment
• Template Library
• Structures & Models
• Documentation and Tutorials
Automatic Protein Fold Prediction
http://swissmodel.expasy.org/
Automatic Protein Fold Prediction Results
http://swissmodel.expasy.org/
Protein Fold Prediction: phyre
http://www.sbg.bio.ic.ac.uk/~phyre/

• Michael Sternberg, Structural Bioinformatics Group, Imperial College London
• Kelley LA and Sternberg MJE. Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 4, 363-371 (2009)
• Protein Homology/analogY Recognition Engine
Protein Fold Prediction: PsiPred
http://bioinf4.cs.ucl.ac.uk:3000/psipred/

• Kevin Bryson and David Jones, University College London
• Predicts Secondary Structure of single molecules
• Predicts Transmembrane Topology
• Three Fold Recognition methods
Protein Fold Prediction: Predict Protein
http://www.predictprotein.org/

• Burkhard Rost, Columbia
• Methods
o MaxHom : multiple alignment
o PSI-BLAST : iterated profile search
o ProSite : functional motifs
o SEG : composition-bias
o ProDom : domain assignment
o PredictNLS : nuclear localisation signal
o PHDsec : secondary structure
o PHDacc : solvent accessibility
o Globe : globularity of proteins
o PHDhtm : transmembrane helices
o PROFsec : secondary structure
o PROFacc : solvent accessibility
o Coils : coiled-coil regions
o CYSPRED : cysteine bridges
o Topits : fold recognition by threading
Automating Structure Classification, Fold & Function Detection

• Growth of PDB demands automated techniques for classification and fold detection
• Protein Structure Comparison
o computing structure similarity based on metrics (distances)
o identifying protein function
o understanding functional mechanism
o identifying structurally conserved regions in the protein
o finding binding sites or other functionally important regions of the protein
Structure Superposition

• Find the transformation matrix that best overlaps the table and the chair
• i.e. find the transformation matrix that minimizes the root mean square deviation between corresponding points of the table and the chair
• Correspondences:
o Top of chair to top of table
o Front of chair to front of table, etc.
Absolute Orientation Algorithm
http://www-mtl.mit.edu/researchgroups/itrc/ITRC_publication/horn_publications.html

Closed-form solution of absolute orientation using unit quaternions
Berthold K.P. Horn, J. Opt. Soc. Am. A, April 1987, Vol. 4, No. 4

The key is finding corresponding points between the two structures
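
A minimal pure-Python sketch of the quaternion-based closed-form solution, assuming the correspondences are already known (point i of one structure matches point i of the other). The function names are hypothetical, and the shifted power iteration used to find the dominant eigenvector is a simplification chosen to avoid external libraries; a production implementation would more likely call a linear algebra package.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def horn_rotation(P, Q, iterations=500):
    """Rotation matrix R minimizing sum |R p_i - q_i|^2 for centered P and Q."""
    # S[a][b] = sum over matched points of P[a] * Q[b]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = S
    # Symmetric 4x4 matrix whose top eigenvector is the optimal unit quaternion.
    N = [
        [Sxx + Syy + Szz, Syz - Szy, Szx - Sxz, Sxy - Syx],
        [Syz - Szy, Sxx - Syy - Szz, Sxy + Syx, Szx + Sxz],
        [Szx - Sxz, Sxy + Syx, -Sxx + Syy - Szz, Syz + Szy],
        [Sxy - Syx, Szx + Sxz, Syz + Szy, -Sxx - Syy + Szz],
    ]
    # Shift so all eigenvalues are non-negative, then run power iteration.
    shift = max(sum(abs(v) for v in row) for row in N) or 1.0
    q = [1.0, 1.0, 1.0, 1.0]  # assumed not orthogonal to the solution
    for _ in range(iterations):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i]
             for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    return [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]


def superpose(P, Q):
    """Move P onto Q; return (rotation, moved copy of P, RMSD)."""
    cp, cq = centroid(P), centroid(Q)
    Pc = [tuple(p[k] - cp[k] for k in range(3)) for p in P]
    Qc = [tuple(q[k] - cq[k] for k in range(3)) for q in Q]
    R = horn_rotation(Pc, Qc)
    moved = [tuple(sum(R[r][k] * p[k] for k in range(3)) + cq[r]
                   for r in range(3)) for p in Pc]
    rmsd = math.sqrt(sum(math.dist(m, q) ** 2 for m, q in zip(moved, Q)) / len(P))
    return R, moved, rmsd
```

The sketch only solves the easy half of the problem. As the slide notes, the hard part for proteins is deciding which residues correspond in the first place.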
Algorithms for Structure Superposition
• Distance based methods:
o DALI (Holm & Sander): Aligning scalar distance plots
o STRUCTAL (Gerstein & Levitt): Dynamic programming using pair-wise inter-molecular distances
o SSAP (Orengo & Taylor): Dynamic programming using intra-molecular vector distances
o MINAREA (Falicov & Cohen): Minimizing soap-bubble surface area
o CE (Shindyalov & Bourne)
• Vector based methods:
o VAST (Bryant): Graph theory based secondary structure alignment
o 3D Search (Singh & Brutlag) & 3D Lookup (Holm & Sander): Fast secondary structure index lookup
• Both:
o LOCK (Singh & Brutlag), LOCK2 (Ebert & Brutlag): Hierarchically uses both secondary structure vectors and atomic distances
DALI

An intra-molecular distance plot for myoglobin


DALI

• Based on aligning 2-D intra-molecular distance matrices
• Computes the best subset of corresponding residues from the two proteins such that the similarity between the 2-D distance matrices is maximized
• Searches through all possible alignments of residues using Monte-Carlo and Branch-and-Bound algorithms

Score(i, j) = 1.5 - |distanceA(i, j) - distanceB(i, j)|
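
To make the score concrete, here is a small Python sketch that builds the intra-molecular distance matrices and sums the slide's simplified Score(i, j) over a given set of residue equivalences. The function names are hypothetical, the search over alignments (Monte-Carlo, Branch-and-Bound) is not shown, and the scoring in the real DALI program may differ from this simplified form.

```python
import math


def distance_matrix(coords):
    """All-against-all distances within one structure (list of (x, y, z))."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def dali_like_score(coords_a, coords_b, pairs, threshold=1.5):
    """Sum threshold - |dA(i, j) - dB(i, j)| over all pairs of equivalences.

    pairs is a list of (index_in_A, index_in_B) residue equivalences.
    """
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total = 0.0
    for x, (ia, ib) in enumerate(pairs):
        for ja, jb in pairs[x + 1:]:  # count each unordered pair once
            total += threshold - abs(da[ia][ja] - db[ib][jb])
    return total


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    b = [(x + 10.0, y, z) for x, y, z in a]  # same shape, translated
    print(dali_like_score(a, b, [(0, 0), (1, 1), (2, 2)]))
```

Because only distances within each molecule enter the score, it does not change when either structure is rotated or translated, so no superposition is needed before scoring.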


STRUCTAL
• Based on Iterative Dynamic Programming to
align inter-molecular distances
• Pair-wise alignment score in each square of
the matrix is inversely proportional to
distance between the two atoms

[Figure: dynamic programming matrices with rows and columns numbered 1 to 6 for the residues of the two structures]
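
One round of the dynamic programming step can be sketched as follows, assuming the two structures have already been superimposed. The score form M / (1 + (d/d0)^2) and the values M = 20, d0 = 2.24 Å and gap = 10 are illustrative assumptions rather than values taken from the slide, and the function names are hypothetical. The full method would re-superimpose on the aligned pairs and repeat until the alignment stops changing.

```python
import math


def structal_score(d, M=20.0, d0=2.24):
    """Score that falls off as the inter-molecular distance d grows."""
    return M / (1.0 + (d / d0) ** 2)


def align_by_distance(A, B, gap=10.0):
    """Global alignment of two superimposed coordinate lists.

    Returns (total score, list of aligned (index_in_A, index_in_B) pairs).
    """
    n, m = len(A), len(B)
    S = [[structal_score(math.dist(a, b)) for b in B] for a in A]
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + S[i - 1][j - 1],
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + S[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs
```

In the test case used while writing this sketch, a four-residue chain is aligned against a copy missing its first residue, and the alignment skips that residue at the cost of one gap.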
VAST - Vector Alignment Search Tool
• Aligns only secondary structure elements (SSEs)
• Represents each SSE as a vector
• Finds all possible pairs of vectors from the two structures that are similar
• Uses a graph theory algorithm to find the maximal subset of similar vector pairs
• Overall alignment score is based on the number of similar pairs of vectors between the two structures
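
The idea can be sketched in Python as a search for the largest mutually consistent set of SSE matches. This is only an illustration of the graph formulation, not VAST itself: the angle and distance tolerances, the use of midpoint distances, and all function names are assumptions, and the exhaustive search below is practical only for a handful of SSEs.

```python
import math


def direction(sse):
    """Vector from the start point to the end point of an SSE."""
    start, end = sse
    return tuple(e - s for s, e in zip(start, end))


def midpoint(sse):
    start, end = sse
    return tuple((s + e) / 2.0 for s, e in zip(start, end))


def angle_deg(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def pair_geometry(s1, s2):
    """Angle between two SSE vectors and distance between their midpoints."""
    return (angle_deg(direction(s1), direction(s2)),
            math.dist(midpoint(s1), midpoint(s2)))


def compatible(A, B, i, k, p, r, max_angle=20.0, max_dist=3.0):
    """True if SSE pair (i, k) in A looks like SSE pair (p, r) in B."""
    a1, d1 = pair_geometry(A[i], A[k])
    a2, d2 = pair_geometry(B[p], B[r])
    return abs(a1 - a2) <= max_angle and abs(d1 - d2) <= max_dist


def largest_consistent_matching(A, B):
    """Largest set of matches (i, p) in which every two matches are compatible."""
    nodes = [(i, p) for i in range(len(A)) for p in range(len(B))]
    best = []

    def extend(chosen, candidates):
        nonlocal best
        if len(chosen) > len(best):
            best = list(chosen)
        for idx, (k, r) in enumerate(candidates):
            if all(k != i and r != p and compatible(A, B, i, k, p, r)
                   for i, p in chosen):
                extend(chosen + [(k, r)], candidates[idx + 1:])

    extend([], nodes)
    return best
```

Each match (i, p) is a node, two nodes are joined when the corresponding vector pairs are similar, and the answer is a maximum clique in that graph.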
LOCK - Creating Secondary Structure Vectors
Comparing Secondary Structure Vectors

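Vectors i and k belong to one structure (angle θ between them); vectors p and r belong to the other (angle φ between them).

Orientation Independent Scores:
S = S(|angle θ(i,k) - angle φ(p,r)|)
S = S(|distance(i,k) - distance(p,r)|)
S = S(|length(i) - length(p)|) + S(|length(k) - length(r)|)

Orientation Dependent Scores:
S = S(angle(k,r))
S = S(distance(k,r))

Scoring function:
$$S(d) = \frac{2M}{1 + (d/d_0)^2} - M$$
S(d) equals M at d = 0, crosses zero at d = d0, and approaches -M as d grows.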
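
Reading the scoring function on this slide as S(d) = 2M / (1 + (d/d0)^2) - M (an interpretation of the plot labels M, -M and d0), a short Python sketch of the function and of the orientation independent angle score might look like this. The default values of M and d0 and the function names are hypothetical.

```python
import math


def lock_score(d, M=1.0, d0=1.0):
    """S(d): M for a perfect match, 0 at d = d0, tending to -M for large d."""
    return 2.0 * M / (1.0 + (d / d0) ** 2) - M


def angle_between(u, v):
    """Angle in degrees between two 3-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def angle_pair_score(i, k, p, r, M=1.0, d0=30.0):
    """Orientation independent score S(|angle(i, k) - angle(p, r)|).

    i and k are SSE vectors from one structure, p and r from the other;
    d0 is given in degrees here (assumed value).
    """
    return lock_score(abs(angle_between(i, k) - angle_between(p, r)), M, d0)


if __name__ == "__main__":
    print(lock_score(0.0, M=5.0, d0=2.0))   # best possible score
    print(lock_score(2.0, M=5.0, d0=2.0))   # zero crossing at d = d0
    print(angle_pair_score((1, 0, 0), (0, 1, 0), (0, 0, 2), (3, 0, 0)))
```

Because the angle score depends only on angles within each structure, it can be evaluated before any superposition, which is what makes it orientation independent.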
Aligning Secondary Structure Vectors

[Figure: dynamic programming matrix for the secondary structure element strings HHSS and SHSSH, showing the best local alignment]
Three Step Algorithm

• Local Secondary Structure Superposition
o Find an initial superposition of the two proteins by using dynamic programming to align the secondary structure vectors

• Atomic Superposition
o Apply a greedy nearest neighbor method to minimize the RMSD between the C-α atoms from the query and the target (i.e. find the nearest local minimum in the alignment space)

• Core Superposition
o Find the best sequential core of aligned C-α atoms and minimize the RMSD between them
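
The pairing part of the atomic superposition step could be sketched in Python as below. This is a simplified illustration, not the LOCK code: the 3 Å cutoff and the function names are assumptions, and the re-superposition that would follow each pairing round is not shown.

```python
import math


def greedy_nearest_pairs(query, target, cutoff=3.0):
    """Pair C-alpha atoms greedily, closest pairs first, each atom used once.

    query and target are lists of (x, y, z) coordinates after an initial
    superposition; returns a sorted list of (query_index, target_index).
    """
    candidates = sorted(
        (math.dist(q, t), i, j)
        for i, q in enumerate(query)
        for j, t in enumerate(target)
        if math.dist(q, t) <= cutoff
    )
    used_q, used_t, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_q and j not in used_t:
            pairs.append((i, j))
            used_q.add(i)
            used_t.add(j)
    return sorted(pairs)


def rmsd(query, target, pairs):
    """Root mean square deviation over the paired atoms."""
    total = sum(math.dist(query[i], target[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))
```

In a full implementation one would alternate between this pairing and a least-squares superposition on the current pairs until the pairing stops changing, which corresponds to settling into the nearest local minimum mentioned above.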
Step 1: Local Secondary Structure Superposition
[Figure: two structures drawn as secondary structure vectors labeled H1, S2, H3 and S4]
Step 1: Local Secondary Structure Superposition
[Figure: secondary structure vectors A1 to A4 of one structure and B1 to B4 of the other]

Pair (A / B)       # of aligned vectors   Total alignment score
A1,A2 / B2,B3      2                      32
A3,A4 / B3,B4      3                      71

Step 2: Atomic Superposition
Step 3: Core Superposition
LOCK 2: Secondary Structure Element Alignment

• Represent secondary structure elements as vectors
• Compare internal distances in order to find equivalent secondary structure elements
• Superimpose vectors and score alignment using both orientation independent and orientation dependent scores
• Restore secondary structure element representation

[Figure: vector pairs annotated with angles φ, ψ, θ and distance d]
Residue Alignment

EEKSAVTALWGKV--
GDKKAINKIWPKIYK

superposition residue registration

• Naïve approach: nearest neighbor alpha carbons
Beta Carbons Encode Directional Information

θ = angle between Cα and Cβ vectors
d = distance between Cβ atoms (maximum 6 Å)
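
A short Python sketch of these two quantities, assuming θ is the angle between the Cα→Cβ direction vectors of the two residues being compared. The 6 Å limit comes from the slide; the 90° angle limit and the function names are hypothetical.

```python
import math


def cb_comparison(ca1, cb1, ca2, cb2):
    """Return (theta in degrees, d) for two residues.

    theta: angle between the two C-alpha -> C-beta direction vectors.
    d: distance between the two C-beta atoms.
    """
    u = tuple(b - a for a, b in zip(ca1, cb1))
    v = tuple(b - a for a, b in zip(ca2, cb2))
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    theta = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return theta, math.dist(cb1, cb2)


def may_pair(ca1, cb1, ca2, cb2, max_cb_distance=6.0, max_angle=90.0):
    """Accept a residue pair only if side chains are close and point alike."""
    theta, d = cb_comparison(ca1, cb1, ca2, cb2)
    return d <= max_cb_distance and theta <= max_angle
```

Two residues whose alpha carbons are nearest neighbors can still have side chains pointing in opposite directions; a check of this kind rejects such pairs, which nearest neighbor alpha carbon matching alone cannot do.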
New Residue Alignment


Improvements in Consistency
• Consistency: measures the adherence to the transitivity property among all triples of protein structures in a given superfamily

                                  Globin Superfamily   Immunoglobulin Superfamily
Alpha carbon distances            74.3%                58.6%
Beta carbon positions             80%                  59.9%
% increase in aligned residues    37.0%                77.8%

(less than 10% pairwise sequence identity)
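
One possible way to formalize the consistency measure is sketched below in Python. The exact definition behind the percentages above is not given on the slide, so the counting scheme, the data layout (each pairwise alignment as a dictionary of residue indices) and the function names are all assumptions.

```python
from itertools import permutations


def with_inverses(alignments):
    """Add the reverse mapping (Y, X) for every alignment (X, Y)."""
    full = dict(alignments)
    for (x, y), mapping in alignments.items():
        full[(y, x)] = {v: k for k, v in mapping.items()}
    return full


def triple_consistency(alignments, names):
    """Fraction of residue chains a -> b -> c that agree with a -> c.

    alignments[(X, Y)] maps residue indices of X to residue indices of Y
    and must be present for every ordered pair of names.
    """
    consistent = total = 0
    for a, b, c in permutations(names, 3):
        ab, bc, ac = alignments[(a, b)], alignments[(b, c)], alignments[(a, c)]
        for res_a, res_b in ab.items():
            if res_b in bc:
                total += 1
                if ac.get(res_a) == bc[res_b]:
                    consistent += 1
    return consistent / total if total else 0.0


if __name__ == "__main__":
    alignments = with_inverses({
        ("A", "B"): {0: 0, 1: 1, 2: 2},
        ("B", "C"): {0: 0, 1: 1, 2: 2},
        ("A", "C"): {0: 0, 1: 1, 2: 3},  # residue 2 breaks transitivity
    })
    print(triple_consistency(alignments, ["A", "B", "C"]))
```

A value of 1.0 means every alignment implied by going through a third structure matches the direct alignment.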


New LOCK 2 Properties

• Changes to secondary structure element alignment phase allow for recognition of more distant structural relationships
• Metric scoring function (triangle inequality):
(1 - score(A,B)) + (1 - score(B,C)) ≥ 1 - score(A,C)
• Biologically relevant residue alignment
• Highly consistent alignments
• Symmetric
• Assessment of statistical significance
FoldMiner: Structure Similarity Search Based on LOCK2 Alignment

• FoldMiner aligns query structure with all database structures using LOCK2
• FoldMiner upweights secondary structure elements in query that are aligned more often
• FoldMiner outperforms CE and VAST in searches for structure similarity
Receiver-Operating Characteristic (ROC) Curves

[Figure: ranked list of search hits by fold (1: Immunoglobulin, 2: Immunoglobulin, 3: p53, ...) beside the resulting ROC curve; x-axis: Number of False Positives, y-axis: Number of True Positives]

• Gold standard: Structural Classification of Proteins (SCOP)
o SCOP folds: similar arrangement and connectivity of secondary structure elements
Comparing ROC Curves

• Area under the ROC curve correlates with the property of ranking true positives ahead of false positives
• Curves may terminate at different numbers of true and false positives
• Areas can only be directly compared if calculated at points where the two curves cross over one another

[Figure: two ROC plots comparing CE, VAST and FoldMiner; x-axis: Number of False Positives, y-axis: Number of True Positives]
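
A count-based ROC curve of this kind can be computed from a ranked hit list as in the following Python sketch. The function names and the max_fp truncation parameter are hypothetical; truncating at a common number of false positives is shown only as one simple way to put curves of different lengths on the same footing, not as the procedure used in the lecture.

```python
def roc_points(ranked_hits):
    """Curve of (false positives, true positives) counts down a ranked list.

    ranked_hits is a list of booleans, best-scoring hit first; True means
    the hit shares the query's fold according to the gold standard.
    """
    tp = fp = 0
    points = [(0, 0)]
    for is_true in ranked_hits:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points


def roc_area(ranked_hits, max_fp=None):
    """Area under the count-based curve, optionally up to max_fp false positives."""
    tp = fp = area = 0
    for is_true in ranked_hits:
        if is_true:
            tp += 1
        else:
            if max_fp is not None and fp >= max_fp:
                break
            fp += 1
            area += tp  # each false positive adds a column of height tp
    return area


if __name__ == "__main__":
    hits = [True, True, False, True, False]
    print(roc_points(hits))
    print(roc_area(hits), roc_area(hits, max_fp=1))
```

A method that ranks every true positive ahead of every false positive gets the largest possible area for a given list, which is the property the first bullet describes.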
Comprehensive Analysis of ROC Curves
Motif Alignment Results

                   Families   Superfamilies
eMOTIFs            96.4%      91.6%
Prosite patterns   97.4%      92.6%
LOCK2 Superposition Web Site
http://brutlag.stanford.edu/lock2/
PyMOL Display of LOCK2 Superposition
FoldMiner Structure Search
http://brutlag.stanford.edu/foldminer/
FoldMiner Myoglobin Structure Search
http://brutlag.stanford.edu/foldminer/
ModLink+
http://sbi.imim.es/modlink/
