
Protein Structure: Predictive Methods and Experimental Methodologies

The document provides an overview of protein structure prediction and experimental methods for determining protein structures. It lists sources and topics that will be covered, including experimental methods like X-ray crystallography and NMR spectroscopy. It also discusses protein structure classification databases like SCOP and CATH and highlights the importance of determining protein structure, as structure provides more information about function than sequence alone.

Uploaded by

marylaranjo

Protein structure: Predictive methods and experimental methodologies

Sources for this lecture


Bioinformatics (course text: Baxevanis and Ouellette)
Chapter 8: Predictive methods using protein sequences (Ofran and Rost), pp. 198-219
Chapter 9: Protein structure prediction and analysis (Wishart), pp. 224-247
Chapter 12: Creation and analysis of protein multiple sequence alignments (Barton), pp. 333-336 (required reading during weeks 3-4 on multiple sequence alignment)

Proteins: Structures and molecular properties (Creighton)


Introduction to Protein Structure (Branden & Tooze)
Much of the text in the slides that follow is either drawn verbatim or paraphrased from these texts.

Topics Covered
Methods for solving protein structures experimentally
Overview of protein structure: primary, secondary, tertiary,
and quaternary
Overview of protein folding
Protein structure classification resources
CATH
SCOP

The Structural Genomics Initiative

The importance of protein structure


Bioinformatics is much more than just sequence analysis; many of the most interesting and exciting applications in bioinformatics today actually are concerned with structure analysis.
The origins of bioinformatics actually lie in the field of structural biology.
Proteins are perhaps the most complex chemical entities in nature. No other class of molecule exhibits the variety and irregularity in shape, size, texture and mobility that can be found in proteins.

Baxevanis & Ouellette (Ch. 9, p.224, Wishart)

Primary, Secondary, Tertiary and Quaternary Structure

Secondary structure

α helix first described by Linus Pauling in 1951
Avg length: 10 residues (3 turns)
Range: from 4 to over 40 residues
Good helix formers: A, E, L, M
Very poor formers: P, G, Y, S
Most common location: on surface, one side buried, one side exposed
Transmembrane helices have almost entirely hydrophobic side chains
(Branden & Tooze)

http://www.web-books.com/MoBio/Free/Ch2C4.htm

Amphipathic alpha helix

http://www.web-books.com/MoBio/Free/Ch2C4.htm
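
One way to make "one side buried, one side exposed" concrete is a hydrophobic moment calculation: place each residue on an ideal helical wheel (100 degrees per residue, i.e. 3.6 residues per turn) and add up hydrophobicity as vectors. The sketch below is an illustration only. The binary hydrophobic/polar split and the name `hydrophobic_moment` are assumptions made here, not something taken from the lecture or its source texts.

```python
import math

# Assumption: a crude binary hydrophobic/polar split, chosen only for illustration.
HYDROPHOBIC = set("AVLIMFWC")
DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn in an ideal alpha helix


def hydrophobic_moment(sequence):
    """Length-normalised hydrophobic moment of a sequence modelled as an ideal alpha helix."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    x = y = 0.0
    for i, residue in enumerate(sequence.upper()):
        # +1 for hydrophobic residues, -1 for everything else.
        score = 1.0 if residue in HYDROPHOBIC else -1.0
        angle = math.radians(i * DEGREES_PER_RESIDUE)
        x += score * math.cos(angle)
        y += score * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


# A designed amphipathic pattern versus a uniformly hydrophobic stretch.
amphipathic = hydrophobic_moment("LKKLLKKLLKLLKKLLKK")
uniform = hydrophobic_moment("L" * 18)
```

With this scoring, the alternating pattern puts all leucines on one face of the wheel and gives a large moment, while the uniformly hydrophobic stretch (more like a transmembrane helix) gives a moment close to zero.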

Beta strand

http://www.web-books.com/MoBio/Free/Ch2C4.htm

Beta sheet

http://www.web-books.com/MoBio/Free/Ch2C4.htm

Super-secondary structures
Somewhat short structural segments
These may play key functional roles

Helix-turn-helix and EF-hand

http://chemistry.umeche.maine.edu/CHY431/Proteins8.html

Supersecondary structure:
Helix-turn-helix

4-helical bundles

More about 4-helical bundles

Coiled Coil

A small selection of common folds

TM proteins: Porin & Rhodopsin

http://www.mhl.soton.ac.uk/public/research/projects/current/rhodopsin/index.html

Flavodoxin

Thioredoxin

TIM Barrel

Rossmann fold

Repeat structures: LRR

LRR structures across the Tree of Life

Repeat structures: TPR

Structure Classification Databases
SCOP and CATH

CATH

CATH classification

Structural Classification of Proteins (SCOP) and the ASTRAL datasets

Evolution of structure and function

Extend function prediction through inclusion of structure prediction and analysis

Anti-fungal defensin (Radish)
Drosomycin (Drosophila)
Scorpion toxin

SCOP and ASTRAL

SCOP is a database providing a hierarchy of protein structural domains divided into Class, Fold, Superfamily and Family

Domains are defined by SCOP primarily by evolutionary evidence (found in proteins of different domain architectures)

Domains are placed into classes based on similar secondary structure content (e.g., all alpha, all beta, alpha+beta, etc.)

Domains in the same SCOP fold are asserted to have a similar topology, but their actual homology (evolutionary relationship) may not be known

Domains in the same SCOP superfamily are asserted by SCOP to have a common ancestor (evidence may appear obscure to some biologists, and is based on structural and sequence analyses)

Domains in the same SCOP family generally have a high functional similarity

ASTRAL is a subset of SCOP, used to assess protein structure prediction methods
See, e.g., PDB40: a subset of SCOP domains such that no pair has greater than 40% sequence identity
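
The idea behind a set like PDB40 (no pair above 40% identity) can be sketched as a greedy filter. This is only an illustration of the concept: the slides do not describe how ASTRAL actually builds its subsets, and the function names, the pre-aligned input, and the toy sequences are assumptions made here.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Only compare columns where neither sequence has a gap.
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def nonredundant_subset(domains, max_identity=40.0):
    """Greedily keep (name, sequence) pairs so that no kept pair exceeds max_identity."""
    kept = []
    for name, seq in domains:
        if all(percent_identity(seq, kept_seq) <= max_identity for _, kept_seq in kept):
            kept.append((name, seq))
    return kept


# Hypothetical toy domains: d2 is 90% identical to d1 and is therefore dropped.
toy_domains = [
    ("d1", "ACDEFGHIKL"),
    ("d2", "ACDEFGHIKV"),
    ("d3", "WWWWWYYYYY"),
]
subset_names = [name for name, _ in nonredundant_subset(toy_domains)]
```

A greedy pass depends on input order, so a real pipeline would normally rank the domains first (for example by structure quality) before filtering.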


The complexity of protein structure
John Kendrew, upon solving a low-resolution structure of myoglobin in 1958:
"Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates, and it is more complicated than has been predicted by any theory of protein structure."
Reported in Introduction to Protein Structure (Branden & Tooze)

Experimental methods for solving protein 3D structure

Experimental determination of protein structure
X-ray crystallography
NMR spectroscopy
Many experimental issues:
Some proteins are hard to solve (e.g., membrane proteins)
Long proteins are normally hard to solve, and are often divided into individual domains
Getting the domain boundaries correct is hard
Numerous other problems (which is why predicting protein structure is useful)

X-ray crystallography
Most accurate; can be applied to larger proteins
Oldest method; first structure (myoglobin) determined in late 1950s (Kendrew et al., 1958). More than 20K structures solved to date
Method:
Small protein crystals (measuring <1 mm) are exposed to X-ray beams
X-rays (which have a wavelength of 1-2 Angstroms) are scattered or diffracted by the protein atoms in the crystal
Diffraction pattern appears as tens of thousands of tiny spots arrayed in complex circular patterns
XYZ coordinates are determined from these diffraction patterns, along with intensity and phase information
Molecular replacement (comparative models based on homologous structures) can facilitate that process.

Baxevanis & Ouellette (Ch. 9, Wishart)

Limitations of X-ray crystallography

Structures are solved in an artificial solid-state (crystalline) environment, unlike the natural (liquid) environment of the cell.
Structures can be altered by crystal packing and solvent exclusion effects.

Some regions cannot be resolved properly (especially very mobile regions)
These can be left as gaps in the 3D structure, or simply be fuzzy

R factor: measure of agreement (actually divergence: smaller is better) between calculated structure and experimental data
0.25 (for good protein structures of normal size)
0.05 (for small molecules)
Completely wrong structures can have an R factor of 0.59

It is not unusual for many protein structures to have some errors, ambiguities or inaccuracies in atomic positions (+/- 0.5 Angstroms), or to be missing atoms and residues (e.g., TrwB, PDB 1E9RA)
Baxevanis & Ouellette (Ch. 9, Wishart)
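
The R factor mentioned above is conventionally computed from observed and calculated structure-factor amplitudes as R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|). A minimal sketch follows. The function name and the toy amplitudes are assumptions for illustration, and the scale factor that is normally applied between observed and calculated data is left out.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor from paired observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists must have the same length")
    denominator = sum(abs(f) for f in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes must not all be zero")
    # Sum of absolute amplitude differences, normalised by the observed total.
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical amplitudes for two reflections.
example_r = r_factor([10.0, 20.0], [8.0, 25.0])
```

A model that reproduces the data exactly gives R = 0, which is why smaller values indicate better agreement.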


NMR spectroscopy

Much newer: first NMR structure in 1983

Allows biologists to study structure and dynamics of molecules in liquid state (or near-physiological environment)

Structures solved by measuring how radio waves are absorbed by atomic nuclei

Absorption measurement allows the determination of how much nuclear magnetism is transferred from one atom (or nucleus) to another
Magnetization transfer measured through chemical shifts, J-couplings and nuclear Overhauser effects
Measured parameters define a set of approximate structural constraints that are fed into a constraint minimization calculation (distance geometry or simulated annealing)
Result is an ensemble of 15-50 structures that satisfy the experimental constraints
These multiple structures are overlaid/superimposed on each other to produce "blurrograms"

NMR result is potentially more reflective of true solution behavior of proteins; most proteins seem to exist in an ensemble of slightly different configurations
Baxevanis & Ouellette (Ch. 9, Wishart)

Limitations of NMR spectroscopy


Size limitations: maximum of 30 kD (~250 aa)
Solubility of molecule
Cannot be applied to membrane proteins

Expensive: requires special isotopically labeled molecules
Inherently less precise

Baxevanis & Ouellette (Ch. 9, Wishart)


Structural superposition
Superposing protein structures can reveal startling structural similarities not apparent from primary sequence comparison
Structural superposition is the basis for evaluating many alignment methods
Structural superposition has a great degree of variability across methods
Also: proteins can be solved in a bound or unbound conformation
The two conformations can be significantly different
Flexible superposition is important

Popular methods include: DALI, CE, VAST, Structal
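
Superposition quality is usually reported as a root-mean-square deviation (RMSD) over matched atoms, the same measure quoted later for model accuracy. The sketch below computes RMSD for two coordinate sets that are assumed to be already superposed and matched atom by atom. Finding the optimal rotation (for example with the Kabsch algorithm) and choosing which residues to pair are the hard parts that tools such as DALI or CE address, and they are deliberately left out here. The function name and coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions in Angstroms; the second set is shifted by 3 along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 3.0, y, z) for x, y, z in reference]
example_rmsd = rmsd(reference, shifted)
```

Because RMSD averages over all paired atoms, a flexible loop or a bound/unbound conformational change can dominate the value, which is one reason flexible superposition methods exist.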

Storing and retrieving protein structures
The Protein Data Bank (PDB)
First electronic database in bioinformatics
Set up at Brookhaven National Laboratory by Walter Hamilton in 1971
7 protein structures at database initiation
Coordinates stored and distributed on punch cards and computer tape

Currently
22K structures as of October 23, 2005 (~46K as of October 2007)
Coordinate distribution and deposition is electronic (via the World Wide Web)
Moved to the Research Collaboratory for Structural Bioinformatics (RCSB) in 1998
Primary archival center for experimentally determined 3D structures of proteins, nucleic acids, carbohydrates and complexes
Separate repository for theoretical models

Baxevanis & Ouellette (Ch. 9, Wishart)


http://www.usm.maine.edu/~rhodes/ModQual/index.html

Structural Genomics Initiative
From Andras Fiser, Albert Einstein College of Medicine

Why is it useful to know the structure of a protein, not only its sequence?
The 3D structure is more informative than sequence because patterns in space are frequently more recognizable than patterns in sequence

Evolution tends to conserve function, and function depends more directly on structure than on sequence; structure is therefore more conserved in evolution than sequence.

Andras Fiser, Albert Einstein College of Medicine


Why Protein Structure Prediction?

Year 2005
Sequences: 2,300,000
Structures: 29,000

We know the experimental 3D structure for ~1% of the protein sequences
Andras Fiser, Albert Einstein College of Medicine

Principles of Protein Structure

[Figure labels: sequence GFCHIKAYTRLIMVG; folding (ab initio prediction); evolution (fold recognition, comparative modeling); structures from Anacystis nidulans, Anabaena 7120, Chondrus crispus and Desulfovibrio vulgaris]
Andras Fiser, Albert Einstein College of Medicine


Protein structure modeling

Ab initio prediction
Applicable to any sequence
Not very accurate (>4 Angstrom RMSD)
Attempted for proteins of <100 residues
Accuracy and applicability are limited by our understanding of the protein folding problem

Comparative modeling
Applicable only to those sequences that share recognizable similarity to a template structure
Fairly accurate (<3 Angstrom RMSD), typically comparable to a low-resolution X-ray experiment
Not limited by size
Accuracy and applicability are rather limited by the number of known folds
Andras Fiser, Albert Einstein College of Medicine

What makes comparative modeling possible

I. A small difference in the sequence makes a small difference in the structure

II. Protein structures are clustered into fold families

Andras Fiser, Albert Einstein College of Medicine


Structural Genomics
Characterize most protein sequences (red) based on related known structures (green).
The number of families is much smaller than the number of proteins

Andras Fiser, Albert Einstein College of Medicine

Structural Genomics
Definition: The aim of structural genomics is to put every protein sequence within a modeling distance of a known protein structure.
Size of the problem:
There are a few thousand domain fold families.
There are ~20,000 sequence families (30% sequence identity).
Solution:
Determine protein structures for as many different families as possible.
Model the rest of the family members using comparative modeling
Andras Fiser, Albert Einstein College of Medicine


Protein folding

Information required for folding is (mostly) contained in the primary sequence
Early on, proteins were shown to fold into their native structures in isolation
This led to the belief that structure is determined by sequence alone (Anfinsen, 1973)
Over the last decade, a significant number of proteins have been shown to not fold properly in the test tube (e.g., requiring the assistance of chaperonins)
Nevertheless, the native 3D structure is assumed to be in some energetic minimum
This led to the development of ab initio folding methods

Baxevanis & Ouellette (Ch. 9, Wishart)


Folding pathways
Evidence that local structure segments form first, and then pack against each other to form 3D fold
Exploited in protein fold prediction, Rosetta method
Simons, Bonneau, Ruczinski & Baker (1999). Ab initio Protein Structure Prediction of CASP III Targets Using ROSETTA. Proteins

Semi-stable structural intermediates on folding pathway to lowest-energy conformation
Prof. Susan Marqusee, Berkeley

Baxevanis & Ouellette (Ch. 9, Wishart)

Structural studies have provided insights into protein folding

When high-resolution studies of myoglobin became available, Kendrew noticed that amino acids in the interior had almost exclusively hydrophobic side chains

Over time, structural studies have shown the following:
The main driving force for folding water-soluble globular proteins is to pack hydrophobic side chains into the interior, thus creating a hydrophobic core and a hydrophilic surface
Bringing hydrophobic side chains into the interior requires that the highly polar main chain (with one hydrogen bond donor, NH, and one hydrogen bond acceptor, C=O) is brought along
In a hydrophobic environment, these main-chain polar groups must be neutralized by the formation of hydrogen bonds
This problem is solved elegantly by the formation of regular secondary structure in the interior of the molecule
Internal cavities are usually occupied by water molecules that hydrogen-bond to internal polar groups

From Introduction to Protein Structure (Branden & Tooze)


Hierarchical descriptions of proteins (follows the folding process)

Primary structure: the amino acid sequence

Secondary structure: regular local structure of linear segments of polypeptide chains (Creighton)
Helix (~35% of residues): subtypes: α, π and 3₁₀
Beta sheet (~25% of residues)
Both types predicted by Linus Pauling (Corey and Pauling, 1953; α helix first described by Pauling in 1951)
Other less common structures:
Beta turns
3₁₀ helices
Ω loops
Remaining unclassifiable regions termed random coil or unstructured regions

http://www.chembio.uoguelph.ca/educmat/phy456/456lec01.htm

Tertiary structure: overall topology of the folded polypeptide chain (Creighton)
Mediated by hydrophobic interactions between distant parts of protein

Quaternary structure: aggregation of the separate polypeptide chains of a protein (Creighton)

Baxevanis & Ouellette (Ch. 9, p.224, Wishart)

Folded conformations of globular proteins
Most proteins are globular: natural proteins in solution are much smaller in their dimensions than comparable polypeptides with random or repetitive conformations and have roughly spherical shapes
Denaturation: Most proteins are robust to changes in their environment, until they (somewhat literally) fall apart:
Most proteins are robust to changes in temperature, pH and pressure, exhibiting little or no change until a point is reached at which there is a sudden change and loss of biological function
Denaturing proteins has been used to explore folding pathways
e.g., Dobson CM, Evans PA, Radford SE. Understanding how proteins fold: the lysozyme story so far. Trends Biochem Sci. 1994

Creighton, Proteins Ch. 6


Structural domains
Folded structures of most small proteins are roughly spherical and remarkably compact
Proteins with >200 aa tend to consist of two or more structural units, called domains
Domains interact to varying extents, but less extensively than do structural elements within domains
Some domain detection tools make use of this pattern, looking for covariation between positions as evidence of interaction, and lack of covariation as evidence of domain boundaries
Nagarajan and Yona, Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics 2004

Domains may not always be well segregated; some proteins have multiple domains with two or three polypeptide connections between domains
See, for example, the SCOP interleaved domains

Domains may also be connected by flexible linker regions


Creighton, Proteins Ch. 6

Structural domains (cont'd)

Definition of a domain is a subjective process done in different ways by different people
Domains are most evident by their compactness
Expressed quantitatively as the ratio of the surface area of a domain to the surface area of a sphere with the same volume
Observed values are 1.65 +/- 0.08

Course of polypeptide backbone through domain is irregular, but generally follows a moderately straight course through the domain and then makes a U-turn to re-cross the domain
Overall impression: segments of somewhat stiff polypeptide chain interspersed with relatively tight turns or bends (almost always on the molecule's surface)
Compared to behavior of a fire hose dropped in one spot

Creighton, Proteins Ch. 6
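
The compactness measure described above can be written down directly: a sphere of volume V has radius r = (3V / 4π)^(1/3) and surface area 4πr², and the domain's surface area is divided by that value. The sketch below assumes the surface area and volume have already been computed by some other tool; the function name and the test shapes are illustrative only.

```python
import math


def compactness_ratio(surface_area, volume):
    """Ratio of a body's surface area to that of a sphere with the same volume."""
    if surface_area <= 0 or volume <= 0:
        raise ValueError("surface area and volume must be positive")
    # Radius of the sphere that encloses the same volume.
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    sphere_area = 4.0 * math.pi * radius ** 2
    return surface_area / sphere_area


# Sanity checks on simple shapes: a sphere of radius 10 and a unit cube.
sphere_ratio = compactness_ratio(4.0 * math.pi * 10.0 ** 2, (4.0 / 3.0) * math.pi * 10.0 ** 3)
cube_ratio = compactness_ratio(6.0, 1.0)
```

A perfect sphere gives exactly 1 and a cube about 1.24, so the observed value of 1.65 +/- 0.08 for domains corresponds to a surface noticeably rougher than either simple shape.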


Driving forces in protein folding
Complex combination of local and global forces
Local forces drive secondary structure formation
Repulsion between hydrophobic side chains of some amino acids and hydrophilic backbone of protein chain (intra-molecular)
Interaction between side chains and surrounding solvent
Subcellular environment (e.g., membrane, secreted, etc.)
Pauling et al., 1951

Baxevanis & Ouellette (Ch. 9, Wishart)

Summary of driving forces in protein folding
Hydrophobicity
Hydrophobic residues need to be shielded from solvent
Polar residues to the outside, hydrophobic to the inside

Stronger interactions
Hydrogen bonds, disulfide bridges

Weak interactions
Van der Waals, electrostatic, etc
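
The "polar outside, hydrophobic inside" rule is often explored with a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle scale, which is an assumption swapped in here because the lecture does not name a scale; the function name and window size are likewise illustrative.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic). The lecture does not
# name a scale; this choice is an assumption made for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy of each full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Example: the peptide shown on the "Principles of Protein Structure" slide.
profile = hydropathy_profile("GFCHIKAYTRLIMVG", window=5)
```

Windows with high mean hydropathy are candidates for buried or membrane-spanning segments, while strongly negative windows are more likely to be solvent-exposed. This is a rough heuristic, not a structure prediction.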

Recommended reading: Proteins (Thomas Creighton).


Global effects on protein fold
Long-range interactions (repulsive or attractive) between distant parts of structure
These can override local effects
E.g., "chameleon" protein:
11 amino acids adopt helical structure in one region, and the same 11 amino acids adopt beta strand in another.
Minor & Kim, 1996

Baxevanis & Ouellette (Ch. 9, Wishart)
