Protein 3D Structure Database

The document provides an overview of protein 3D structures, including primary, secondary, tertiary, and quaternary classifications. It discusses methods for determining 3D structures, such as X-ray crystallography and NMR, and introduces protein structural databases like PDB, SCOP, and CATH. Additionally, it outlines the hierarchical classification of protein domains and the significance of structural and evolutionary relationships in protein classification.

Protein 3D structure database

PDB, CATH, SCOP


Three-dimensional (3D) structures
• The three-dimensional (3D) structure of a protein is also called its tertiary structure.
• If a protein molecule consists of more than one polypeptide chain, it also has a quaternary structure, which specifies the relative positions of the polypeptides (subunits) in the protein.
Protein Structures
• Primary: amino acid sequence.
• Secondary: alpha helices, beta sheets, loops.
• Tertiary: arrangement of secondary elements in 3D space.
• Quaternary: packing of several polypeptide chains.
Given an amino acid sequence, we are interested in its
secondary structures, and how they are arranged in higher
structures.
How is a 3D structure determined?

1. Experimental methods (best approach):
• X-ray crystallography - requires a stable fold and good-quality crystals.
• NMR - requires a stable fold; not suitable for large molecules.

2. In-silico methods (partial solutions, based on similarity):
• Sequence or profile alignment - uses similar sequences; makes limited use of 3D information.
• Threading - needs a known 3D structure.
• Ab-initio structure prediction - not always successful.
Protein Structural Databases
• PDB: Protein Data Bank
• SCOP: Structural Classification of Proteins
• CATH: Class, Architecture, Topology, Homologous superfamily
PDB
• The Protein Data Bank is a repository for 3D structural data of proteins and nucleic acids.
• These data are typically obtained by X-ray crystallography or NMR spectroscopy and are submitted by biologists and biochemists from around the world.
• The PDB was established in 1971 at Brookhaven National Laboratory and originally contained just 7 protein structures.
• In 1998, the Research Collaboratory for Structural Bioinformatics (RCSB) became responsible for the management of the PDB.
PDB Statistics: 142000 Biological Macromolecular Structures
Protein Structure in PDB
• Entries are stored as text files.
• Each entry is specified by a unique 4-character code (PDB code), e.g. 1HUY for a variant of GFP and 1BGK for a 37-residue toxin protein isolated from a sea anemone.
• The 1HUY and 1BGK files each contain:
– Header information
– Atomic coordinates in Å
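
A minimal Python sketch of how a PDB code such as 1HUY or 1BGK might be checked and turned into a file location. Two things here are assumptions not stated in the slides: the pattern "one digit followed by three letters or digits", and the RCSB download address used in `pdb_file_url`. The function names are hypothetical.

```python
import re

# Assumption: a PDB code is one digit followed by three letters or digits.
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_code(code: str) -> bool:
    """Return True if the text looks like a 4-character PDB code."""
    return PDB_CODE.fullmatch(code) is not None


def pdb_file_url(code: str) -> str:
    """Build a download address for the text file of one entry.

    Assumption: the RCSB file server layout files.rcsb.org/download/<CODE>.pdb.
    """
    if not is_pdb_code(code):
        raise ValueError(f"not a 4-character PDB code: {code!r}")
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


print(is_pdb_code("1HUY"))   # a code from the slides
print(is_pdb_code("GFP"))    # a protein name, not a PDB code
print(pdb_file_url("1bgk"))
```

The check is deliberately strict so that protein names or abbreviations are not mistaken for entry codes.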
Header Details
• Identifies the molecule, modifications, date of release
• Host organism, keywords, method of study
• Authors, reference, resolution for an X-ray structure
– The smaller the number, the better the structure.
• Sequence, reference
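
As an illustration of the header fields listed above, the following Python sketch reads the method of study and the resolution from header lines. It assumes the usual PDB text layout, in which the method is on an EXPDTA line and the resolution is on a "REMARK 2 RESOLUTION." line; the record names come from the PDB file format, not from the slides, and the sample lines are made up.

```python
def read_method_and_resolution(lines):
    """Return (method of study, resolution in Å or None) from header lines."""
    method = None
    resolution = None
    for line in lines:
        fields = line.split()
        if line.startswith("EXPDTA"):
            # Everything after the record name is the experimental method.
            method = line[6:].strip()
        elif fields[:3] == ["REMARK", "2", "RESOLUTION."]:
            try:
                resolution = float(fields[3])
            except (IndexError, ValueError):
                # NMR entries carry text such as "NOT APPLICABLE." here.
                resolution = None
    return method, resolution


# Made-up header lines, for illustration only.
xray_header = [
    "EXPDTA    X-RAY DIFFRACTION",
    "REMARK   2 RESOLUTION.    2.20 ANGSTROMS.",
]
nmr_header = [
    "EXPDTA    SOLUTION NMR",
    "REMARK   2 RESOLUTION. NOT APPLICABLE.",
]

print(read_method_and_resolution(xray_header))
print(read_method_and_resolution(nmr_header))
```

A resolution of `None` signals that the "smaller is better" rule does not apply to that entry.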
The Atomic Coordinates
• XYZ coordinates for each atom of the macromolecule (lines starting with ATOM; only heavy atoms for an X-ray structure), from the first residue to the last.
• XYZ coordinates for any ligands complexed to the bio-macromolecule (lines starting with HETATM).
• O atoms of water molecules (lines starting with HETATM, normally at the end of the coordinate section).
• For an X-ray structure, the resolution is usually not high enough to locate H atoms, hence only heavy atoms are listed in the PDB file.
• For an NMR structure, all atoms (including hydrogen atoms) are specified in the PDB file.
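
The coordinate records described above sit on fixed text columns, which the following Python sketch reads back. The column positions follow the published PDB file format and are an assumption relative to the slides. `make_line` is a hypothetical helper that only builds made-up example records, and water is assumed to be named HOH.

```python
def make_line(record, serial, atom, residue, chain, res_seq, x, y, z, element):
    """Lay out one made-up coordinate record on the fixed PDB columns."""
    return (
        f"{record:<6}{serial:>5} {atom:<4} {residue:>3} {chain}{res_seq:>4}"
        f"{'':4}{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10}{element:>2}"
    )


def parse_line(line):
    """Read the fields of one ATOM or HETATM record by column position."""
    return {
        "record": line[0:6].strip(),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "element": line[76:78].strip(),
    }


# Made-up records: two heavy atoms, one hydrogen, one water oxygen.
lines = [
    make_line("ATOM", 1, "N", "MET", "A", 1, 27.340, 24.430, 2.614, "N"),
    make_line("ATOM", 2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842, "C"),
    make_line("ATOM", 3, "H", "MET", "A", 1, 27.900, 24.100, 3.300, "H"),
    make_line("HETATM", 4, "O", "HOH", "A", 301, 10.000, 20.000, 30.000, "O"),
]

atoms = [parse_line(l) for l in lines if l.startswith(("ATOM", "HETATM"))]
waters = [a for a in atoms if a["record"] == "HETATM" and a["residue"] == "HOH"]
hydrogens = [a for a in atoms if a["element"] == "H"]

print(len(atoms), len(waters), len(hydrogens))
```

Counting hydrogens this way is a quick check of the X-ray versus NMR difference: an X-ray file would normally give zero.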
X-ray structure 1HUY
NMR structure 1BGK
2. Free Software for Protein Structure Visualization

• RasMol: available for all platforms
http://www.openrasmol.org
• Swiss PDB Viewer: from Swiss-Prot
http://www.expasy.ch/spdbv/
• Chemscape Chime Plug-in: for PC and Mac
http://www.mdl.com/downloads/downloadable/index.jsp
• YASARA: http://www.yasara.org/
• MOLMOL: MOLecule analysis and MOLecule display
http://129.132.45.141/wuthrich/software/molmol/index.html
Ribbon representation by RasMol
1HUY: An Improved Yellow Variant of Green Fluorescent Protein
From Tsien's group, J. Biol. Chem. 276, 29188 (2001)
Ribbon representation by YASARA
Ribbon representation by MOLMOL
An ensemble of 15 structures (NMR, toxin Bgk); hydrogen atoms are also included
• 15 backbone structures of the sea anemone toxin Bgk
• 15 all-atom structures of the sea anemone toxin Bgk
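
An NMR ensemble such as the one above is stored as several coordinate sets in one file. The Python sketch below separates them, assuming the PDB convention that each set is wrapped in MODEL and ENDMDL records (record names taken from the PDB file format, not from the slides). The example lines are made up and truncated.

```python
def split_models(lines):
    """Group ATOM/HETATM lines by the MODEL ... ENDMDL block they belong to."""
    models = []
    current = None
    for line in lines:
        if line.startswith("MODEL"):
            current = []                      # a new structure begins
        elif line.startswith("ENDMDL"):
            if current is not None:
                models.append(current)        # the structure is complete
            current = None
        elif current is not None and line.startswith(("ATOM", "HETATM")):
            current.append(line)
    return models


# Made-up, truncated records: two models with two atoms each.
example = [
    "MODEL        1",
    "ATOM      1  N   VAL A   1",
    "ATOM      2  CA  VAL A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   VAL A   1",
    "ATOM      2  CA  VAL A   1",
    "ENDMDL",
]

models = split_models(example)
print(len(models), [len(m) for m in models])
```

Applied to a file holding an ensemble of 15 structures, the returned list would be expected to have 15 entries.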

Line representation
Ribbon representation
Space-filling representation
3. Hierarchical classification of protein domains: SCOP & CATH

• SCOP: Structural Classification of Proteins
University of Cambridge, UK
http://scop.mrc-lmb.cam.ac.uk/scop/

• CATH: Class - Architecture - Topology - Homologous Superfamily - Sequence family
University College London, UK
http://www.biochem.ucl.ac.uk/bsm/cath/
Basis for protein classification
• Proteins adopt a limited number of topologies.
More than 50,000 sequences fold into ~1,000 unique folds.
• Homologous sequences have similar structures.
Usually, when sequence identity is above 30%, proteins adopt the same fold. Even in the absence of sequence homology, some folds are preferred by vastly different sequences.
• The "active site" is highly conserved.
A subset of functionally critical residues is found to be conserved even when the folds vary.
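
The 30% rule of thumb above can be sketched in Python as follows. This assumes the two sequences are already aligned, that gaps are written as "-", and that identity is counted over the positions where both sequences have a residue; other definitions of percent identity exist. The sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the gap-free aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue                  # skip positions with a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def likely_same_fold(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """Apply the rule of thumb: identity above the threshold suggests one fold."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Made-up aligned sequences: 8 identical residues out of 10 compared positions.
seq_a = "MKTAYIAK-QR"
seq_b = "MKSAYLAKEQR"
print(percent_identity(seq_a, seq_b))
print(likely_same_fold(seq_a, seq_b))
```

A result below the threshold does not rule out a shared fold, since some folds are adopted by very different sequences.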
How many unique folds do organisms use to express functions?
• Sequence space: > 50,000 sequences
• Conformational space: ~1,000 folds (?)
• Many sequences form one unique fold.
SCOP - Introduction
• SCOP: Structural Classification of Proteins
• URL: http://scop.mrc-lmb.cam.ac.uk/scop/
• Maintained by the MRC Laboratory of Molecular Biology and the Centre for Protein Engineering, Cambridge, UK.
• Author: Alexei G. Murzin, 1995
• Co-workers: L. Lo Conte, B. G. Ailey, S. E. Brenner, T. J. P. Hubbard, C. Chothia
Features:
- Its purpose is to classify protein 3D structures in a hierarchical scheme of structural classes.
- All protein structures are classified, and the database is updated as new structures are deposited in the PDB.
- It adds information through analysis and organization into a hierarchical scheme of Folds, Superfamilies and Families.
Definition
• SCOP describes the structural and evolutionary relationships between all proteins whose structure is known.
• Proteins are classified to reflect both structural and evolutionary relatedness.
• SCOP has been constructed using a combination of manual inspection and automated methods.

Hierarchy levels
• The four classification levels are:
– Class: a very broad description of the structural content of the protein.
– Fold: indicative of a broad structural similarity, but with no evidence of a homologous relationship.
– Superfamily: sufficient structural similarity to infer a divergent evolutionary relationship, but no detectable sequence similarity.
– Family: significant sequence similarity, which can be detected either directly or through a transitive search.
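
The four levels above can be sketched as a small data structure in Python. The sketch assumes SCOP's dotted identifier of the form class.fold.superfamily.family (for example "a.1.1.1"), which is not shown in the slides. The identifiers below are illustrative only, and `ScopId`, `parse_scop_id` and `deepest_shared_level` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ScopId(NamedTuple):
    scop_class: str
    fold: int
    superfamily: int
    family: int


LEVELS = ("class", "fold", "superfamily", "family")


def parse_scop_id(text: str) -> ScopId:
    """Split an identifier such as 'a.1.1.1' into its four levels."""
    scop_class, fold, superfamily, family = text.split(".")
    return ScopId(scop_class, int(fold), int(superfamily), int(family))


def deepest_shared_level(a: ScopId, b: ScopId) -> Optional[str]:
    """Name the lowest level at which two domains are still grouped together."""
    shared = None
    for level, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = level
    return shared


# Illustrative identifiers, not taken from the slides.
d1 = parse_scop_id("a.1.1.1")
d2 = parse_scop_id("a.1.2.3")
d3 = parse_scop_id("a.1.1.2")

print(deepest_shared_level(d1, d2))  # same fold, different superfamily
print(deepest_shared_level(d1, d3))  # same superfamily, different family
```

Read against the definitions above, two domains that share only the fold are structurally similar without evidence of homology, while two that share the superfamily are inferred to have diverged from a common ancestor.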
SCOP2
• SCOP2 is a successor to the Structural Classification of
Proteins (SCOP) database.
• Similarly to SCOP, the main focus of SCOP2 is to organize
structurally characterized proteins according to their
structural and evolutionary relationships.
• The relationships in SCOP2 fall into four major categories:
– Protein types,
– Evolutionary events,
– Structural classes and
– Protein relationships.
• The first two categories do not have counterparts in SCOP.
CATH
• CATH: Class, Architecture, Topology and Homologous superfamily
• URL: www.cathdb.info
• Maintained by the Orengo group at University College London
• Created in 1997 by Christine Orengo, Janet Thornton and their colleagues at University College London.
Features:
• The CATH database (Class, Architecture, Topology, Homologous superfamily) is a hierarchical classification of protein domain structures, which clusters proteins at four major structural levels.
• The aim of the database is similar to that of SCOP, but the scheme, the philosophy and the practical details of producing the classification are different.
• Four main levels:
– Class, C-level
– Architecture, A-level
– Topology (fold family), T-level
– Homologous superfamily, H-level
Class
• Class is determined according to the
secondary structure composition and packing
within the structure.
• Three major classes: mainly-alpha, mainly-beta and alpha-beta (covering both α/β and α+β).
Architecture, A-level
• This describes the overall shape of the domain structure as determined by the orientations of the secondary structures, but ignores the connectivity between them.
• Examples: barrel or 3-layer sandwich.
Topology (Fold family), T-level
• Structures are grouped into fold groups at this level
depending on both the overall shape and
connectivity of the secondary structures.
• This is done using the structure comparison
algorithm SSAP (sequential structure alignment
program) and CATHEDRAL (a fast and effective
algorithm to predict folds and domain boundaries
from multidomain protein structures).
• Equivalent to a fold in SCOP
Homologous Superfamily, H-level
• This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous.
• Similarities are identified either by high sequence identity or by structure comparison using SSAP.
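
The C, A, T and H levels described above can be sketched in Python with a dotted numeric code of the form C.A.T.H. Both the code format and the class numbering (1 = mainly-alpha, 2 = mainly-beta, 3 = alpha-beta) are assumptions not given in the slides. The codes used are illustrative, and the function names are hypothetical.

```python
# Assumption: class numbers as used in CATH codes.
CATH_CLASSES = {1: "mainly-alpha", 2: "mainly-beta", 3: "alpha-beta"}


def describe_cath(code: str) -> dict:
    """Expand a four-part code into the identifier of each level."""
    c, a, t, h = (int(part) for part in code.split("."))
    return {
        "class": CATH_CLASSES.get(c, "other"),
        "architecture": f"{c}.{a}",
        "topology": f"{c}.{a}.{t}",
        "homologous_superfamily": f"{c}.{a}.{t}.{h}",
    }


def same_fold_different_superfamily(code_x: str, code_y: str) -> bool:
    """True if two domains share the T-level but not the H-level."""
    x, y = describe_cath(code_x), describe_cath(code_y)
    return (
        x["topology"] == y["topology"]
        and x["homologous_superfamily"] != y["homologous_superfamily"]
    )


# Illustrative codes only.
print(describe_cath("3.40.50.720"))
print(same_fold_different_superfamily("3.40.50.720", "3.40.50.300"))
```

Two domains that agree down to the T-level but differ at the H-level share a fold without being grouped as descendants of a common ancestor.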
