Protein Structure: Predictive Methods and Experimental Methodologies
Topics Covered
Methods for solving protein structures experimentally
Overview of protein structure: primary, secondary, tertiary, and quaternary
Overview of protein folding
Protein structure classification resources
CATH
SCOP
Primary, Secondary, Tertiary and Quaternary Structure
Secondary structure
Amphipathic alpha helix
Beta strand
Beta sheet
http://www.web-books.com/MoBio/Free/Ch2C4.htm
Super-secondary structures
Somewhat short structural segments
These may play key functional roles
http://chemistry.umeche.maine.edu/CHY431/Proteins8.html
Supersecondary structure:
Helix-turn-helix
4-helical bundles
More about 4-helical bundles
Coiled Coil
http://www.mhl.soton.ac.uk/public/research/projects/current/rhodopsin/index.html
Flavodoxin
Thioredoxin
TIM Barrel
Rossmann fold
LRR structures across the Tree of Life
CATH
CATH classification
Structural Classification of Proteins (SCOP) and the ASTRAL datasets
Anti-fungal defensin (Radish)
Drosomycin (Drosophila)
Scorpion toxin
Experimental determination of protein structure
X-ray crystallography
NMR spectroscopy
Many experimental issues:
Some proteins are hard to solve (e.g., membrane proteins)
Long proteins are normally hard to solve and are often divided into individual domains
Getting the domain boundaries correct is hard
Numerous other problems arise (which is why predicting protein structure is useful)
X-ray crystallography
Most accurate; can be applied to larger proteins
Oldest method; the first structure (myoglobin) was determined in the late 1950s (Kendrew et al., 1958). More than 20K structures solved to date
Method:
Small protein crystals (measuring <1 mm) are exposed to X-ray beams
X-rays (which have a wavelength of 1-2 Å) are scattered, or diffracted, by the protein atoms in the crystal
The diffraction pattern appears as tens of thousands of tiny spots arrayed in complex circular patterns
XYZ coordinates are determined from these diffraction patterns, along with intensity and phase information
Molecular replacement (using comparative models based on homologous structures) can facilitate that process.
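As a rough illustration of why X-ray wavelengths of 1-2 Å suit atomic-scale measurement, the sketch below applies Bragg's law (n·λ = 2·d·sin θ, first order, n = 1) to convert a scattering angle into the lattice-plane spacing that produced the spot. The function name and the example values are illustrative assumptions, not taken from the slides.

```python
import math


def bragg_spacing(wavelength_angstrom: float, two_theta_deg: float) -> float:
    """Return the plane spacing d (in Angstroms) for a first-order reflection.

    Uses Bragg's law with n = 1: lambda = 2 * d * sin(theta), where
    two_theta_deg is the angle between the incident and diffracted beams.
    """
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    if not 0.0 < two_theta_deg <= 180.0:
        raise ValueError("two_theta_deg must be in (0, 180]")
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength_angstrom / (2.0 * math.sin(theta))


# Illustrative values only: a 1.5 Angstrom beam and a spot at 2-theta = 60 degrees.
print(f"d = {bragg_spacing(1.5, 60.0):.2f} Angstroms")
```

Spots recorded at wider angles correspond to smaller spacings, which is why high-angle reflections carry the finest structural detail.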
NMR spectroscopy
Structural superposition
Superposing protein structures can reveal startling structural similarities not apparent from primary sequence comparison
Structural superposition is the basis for evaluating many alignment methods
Superposition results vary considerably across methods
Also: proteins can be solved in a bound or unbound conformation
The two conformations can differ significantly
Flexible superposition is important
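A minimal sketch of the simplest case, rigid-body superposition, using the Kabsch algorithm with NumPy. It assumes the two structures have already been reduced to equal-length lists of matched atoms (for example, C-alpha atoms paired by an alignment); choosing those pairs is where methods differ, and flexible superposition goes beyond this sketch. The function name is an illustrative choice.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition.

    P is translated and rotated (no reflection) onto Q so that the
    root-mean-square deviation over the matched atoms is minimal.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

A large RMSD from this rigid fit between bound and unbound forms of the same protein is one sign that a flexible method, which allows hinge motions between domains, may be more appropriate.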
Protein Data Bank (PDB)
Currently 22K structures (as of October 23, 2005) (~46K as of October 2007)
Coordinate distribution and deposition is electronic (via the World Wide Web)
Moved to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998
Primary archival center for experimentally determined 3D structures of proteins, nucleic acids, carbohydrates and complexes
Separate repository for theoretical models
http://www.usm.maine.edu/~rhodes/ModQual/index.html
[Figure: sequences versus structures, year 2005 (2,300,000 sequences; 29,000 structures). Labels: ab initio prediction, fold recognition, comparative modeling, folding, evolution. Example sequence: GFCHIKAYTRLIMVG. Species shown: Anacystis nidulans, Anabaena 7120, Chondrus crispus, Desulfovibrio vulgaris.]
Andras Fiser, Albert Einstein College of Medicine
Ab initio prediction
Comparative Modeling
Structural Genomics
Characterize most protein sequences (red) based on related known structures (green).
The number of families is much smaller than the number of proteins
Structural Genomics
Definition: The aim of structural genomics is to put every protein sequence within modeling distance of a known protein structure.
Size of the problem:
There are a few thousand domain fold families.
There are ~20,000 sequence families (at 30% sequence identity).
Solution:
Determine protein structures for as many different families as possible.
Model the rest of the family members using comparative modeling.
Andras Fiser, Albert Einstein College of Medicine
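To make "modeling distance" concrete, the sketch below computes percent identity from a pairwise alignment and checks it against the 30% figure quoted for sequence families. Treating 30% identity as the cutoff for comparative modeling is an assumption made here for illustration, as are the function names; identity can also be defined with other denominators (for example, the length of the shorter sequence), which changes the numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def within_modeling_distance(aligned_query: str, aligned_template: str,
                             threshold: float = 30.0) -> bool:
    """True if the query/template pair meets the (assumed) identity threshold."""
    return percent_identity(aligned_query, aligned_template) >= threshold


# Hypothetical aligned pair: 3 identical residues out of 4 ungapped columns.
print(percent_identity("ACD-E", "ACDKF"))
```

In practice the alignment itself would come from a dedicated alignment tool, and the choice of template would also weigh coverage and alignment quality, not identity alone.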
Protein folding
Folding pathways
Evidence that local structure segments form first, and then pack against each other to form the 3D fold
Exploited in protein fold prediction, e.g., the Rosetta method
Simons, Bonneau, Ruczinski & Baker (1999). Ab initio Protein Structure Prediction of CASP III Targets Using ROSETTA. Proteins
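The idea of assembling a fold from local segments can be illustrated with a toy Monte Carlo fragment-insertion loop. This is not the Rosetta implementation: the fragment list, the idealized torsion values, and above all `toy_score` (which simply counts residues that differ from a hypothetical target state) are stand-ins invented for this sketch, where a real method would use fragments taken from known structures and a physically motivated scoring function.

```python
import math
import random

# Idealized backbone torsions (phi, psi) in degrees; approximate textbook values.
HELIX = (-60.0, -45.0)
STRAND = (-120.0, 130.0)


def toy_score(torsions, target):
    """Hypothetical score: number of residues whose torsions differ from the target."""
    return sum(1 for t, want in zip(torsions, target) if t != want)


def fragment_assembly(target, fragments, steps=2000, temperature=1.0, seed=0):
    """Insert random fragments at random positions, accepting by the Metropolis rule.

    All fragments must have the same length. Returns the best conformation
    found and its score.
    """
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    n = len(target)
    current = [STRAND] * n  # start from a fully extended chain
    current_score = toy_score(current, target)
    best, best_score = list(current), current_score

    for _ in range(steps):
        start = rng.randrange(n - frag_len + 1)
        frag = rng.choice(fragments)
        trial = current[:start] + list(frag) + current[start + frag_len:]
        delta = toy_score(trial, target) - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, current_score + delta
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

For example, calling `fragment_assembly([HELIX] * 6 + [STRAND] * 6, [(HELIX,) * 3, (STRAND,) * 3])` searches for a helix followed by a strand using only three-residue pieces.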
Beta turns
3/10 helices
Omega (Ω) loops
http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm
Structural domains
Folded structures of most small proteins are roughly spherical and remarkably compact
Proteins with >200 aa tend to consist of two or more structural units, called domains
Domains interact to varying extents, but less extensively than do structural elements within domains
Some domain detection tools make use of this pattern, looking for covariation between positions as evidence of interaction, and lack of covariation as evidence of domain boundaries
Nagarajan and Yona, Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics 2004
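One simple way to quantify covariation between two positions is the mutual information of the corresponding columns of a multiple sequence alignment. The sketch below is an illustration of that general idea only; it is not the hybrid learning system of Nagarajan and Yona, and the function names are assumptions made here.

```python
import math
from collections import Counter


def alignment_columns(alignment):
    """Turn a list of equal-length aligned sequences into a list of column strings."""
    return ["".join(col) for col in zip(*alignment)]


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    High values mean the residues at the two positions vary together across
    sequences; values near zero mean they vary independently or not at all.
    """
    n = len(col_i)
    if n == 0 or n != len(col_j):
        raise ValueError("columns must be non-empty and of equal length")
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


# Hypothetical four-sequence alignment with two perfectly covarying positions.
cols = alignment_columns(["AL", "AL", "VI", "VI"])
print(mutual_information(cols[0], cols[1]))
```

Real alignments need many sequences for such estimates to be meaningful, and raw mutual information is usually corrected for background effects such as shared ancestry before being interpreted.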
Stronger interactions
Hydrogen bonds, disulfide bridges
Weak interactions
Van der Waals, electrostatic, etc