
Unit 1: Structure Determination: Protein Structure Database PDB PDB File Format Ramachandran Plot

The document discusses the Protein Data Bank (PDB) which is a comprehensive database of 3D protein and nucleic acid structures. It is managed by the Worldwide Protein Data Bank organization and includes several member databases. The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) is one of the members and has been archiving structures since 1971. The PDB file format contains coordinate and metadata information for biological macromolecules.

Uploaded by

Saran.S.Menon

Unit 1: Structure Determination

Protein Structure Database


PDB
PDB File format
Ramachandran Plot
Worldwide PDB (wwPDB)
• The wwPDB organization manages the PDB archive, which stores
structure data and metadata for biological macromolecules, to
promote basic and applied research and education across the
sciences.
• Missions:
– Manage the wwPDB Core Archives according to the FAIR Principles.
– Provide expert deposition, validation, biocuration, and remediation
services at no charge to Data Depositors worldwide.
– Ensure universal open access to public domain structural biology data with
no limitations on usage.
– Develop and promote community-endorsed data standards for archiving
and exchange of global structural biology data.
wwPDB Members
• Protein Data Bank in
Europe
• Biological Magnetic
Resonance Data Bank
• Protein Data Bank Japan
• Research Collaboratory for
Structural Bioinformatics
Protein Data Bank
RCSB PDB
• The Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data
Bank (PDB) is a comprehensive database of three-dimensional protein and
nucleic acid structures determined by X-ray crystallography, NMR, and cryo-
electron microscopy.
• The PDB was established in 1971 at Brookhaven National Laboratory (BNL)
with the deposition of seven structures.
• Management of the PDB was transferred to the RCSB in fall 1998, and
about 154,735 structures had been deposited by August 2019.
• The PDB has become an international resource for macromolecular structural
coordinates.
• Most peer-reviewed journals now require the deposition of coordinates in this
database prior to publishing structural data.
• PDB structures can be easily accessed through the main web site
(http://www.rcsb.org/pdb) or six other international mirror sites by searching
key words or PDB entry codes. An advanced search form is also available.
Information about structures
1. Download/Display File
– Allows downloading or displaying the coordinate file
2. Medline
– Provides abstract of the primary publication describing the structure
3. View Structure
– Permits interactive viewing of the structure through interactive
graphics programs such as RasMol, Chime, Swiss PdbViewer, VRML, or
MICE
4. Structural Neighbors
– Provides links for that particular structure within the CATH, CE, FSSP,
SCOP, or VAST databases
5. Geometry
– Provides structural analysis of the structure, such as the
Ramachandran plot
6. Sequence Details
– Provides the sequence(s) for that particular structure in FASTA format
Other Existing Structural Databases
MSD:
• The Macromolecular Structure Database (MSD;
http://www.ebi.ac.uk/msd/index.html) at the European
Bioinformatics Institute (EBI) manages and distributes
macromolecular structural data.
MMDB:
• The Molecular Modeling Database
(MMDB; http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml)
is maintained by the National Center for
Biotechnology Information (NCBI), part of the National
Institutes of Health (NIH).
Understanding Structure File
• The primary information stored in the structure file
basically consists of coordinate information for biological
molecules.
• These files list the atoms in each protein, and their 3D
location in space (coordinates).
• These files are available in several formats (PDB, mmCIF,
XML).
• A typical PDB formatted file includes a large "header"
section of text that summarizes the protein, citation
information, and the details of the structure solution,
followed by the sequence and a long list of the atoms and
their coordinates.
• The archive also contains the experimental observations
that are used to determine these atomic coordinates.
Interpreting Coordinates
• A Protein Data Bank (PDB) data file for a protein
structure contains only the x, y, and z coordinates of atoms,
so the most basic requirement for a visualization program is
to build connectivity between atoms to make a view of a
molecule.
• The visualization program should also be able to produce
molecular structures in different styles, which include
wire frames, balls and sticks, space-filling spheres, and
ribbons.
• The main feature of computer visualization programs is
interactivity, which allows users to visually manipulate
the structural images through a graphical user interface.
Protein Data Bank (PDB) File

• A file that describes a structure.


• PDB format is a standard for files containing atomic
coordinates.
• It is used for structures in the Protein Data Bank and is
read and written by many programs.
• The complete PDB file specification provides
information, including authors, literature references,
and the method of structure determination.
• PDB format consists of lines of information in a text
file.
(A) Wireframes. (B) Balls and sticks. (C) Space-filling spheres. (D) Ribbons
Example: 1gcn
[Figure: annotated records from the 1gcn PDB file]
• Record types shown: ATOM, HETATM, TER, HELIX, SHEET, SSBOND
• Fields labelled in an ATOM record:
– Atom serial number
– Atom name, including the remoteness indicator code
(α-A β-B γ-G δ-D ε-E ζ-Z η-H) and the branch indicator
– Residue type
– Chain identifier
– Residue number
– X, Y, and Z coordinate values
– Occupancy
– Temperature factor (B-factor)
– Element symbol
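Because these fields sit at fixed column positions, an ATOM or HETATM line can be read by slicing. The sketch below is a minimal illustration, assuming the column layout of the wwPDB PDB format (version 3.x); the `Atom` tuple and `parse_atom_line` are hypothetical names, and the sample line is written in the style of the first record of 1gcn for illustration only.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical container for the fields of one ATOM/HETATM record."""
    record: str
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Slice one record by column; PDB columns are 1-based, Python slices 0-based."""
    line = line.ljust(80)  # pad short lines so every slice is valid
    return Atom(
        record=line[0:6].strip(),      # columns 1-6: record type
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue type
        chain_id=line[21],             # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38: X coordinate
        y=float(line[38:46]),          # columns 39-46: Y coordinate
        z=float(line[46:54]),          # columns 47-54: Z coordinate
        occupancy=float(line[54:60]),  # columns 55-60: occupancy
        b_factor=float(line[60:66]),   # columns 61-66: temperature factor
        element=line[76:78].strip(),   # columns 77-78: element symbol
    )


# Illustrative record; check values against the downloaded file before relying on them.
line = "ATOM      1  N   HIS A   1      49.668  24.248  10.436  1.00 25.00           N  "
atom = parse_atom_line(line)
print(atom)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields can touch when values are wide, for example large negative coordinates.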
Example: 1gcn

• OXT - extra oxygen atom on the terminal carboxyl group.
• HXT - extra hydrogen atom - rarely seen
• TER - terminates the amino acid chain
• The last residue in the chain is THR (threonine).
• The extra oxygen atom OXT appears in the terminal carboxyl group.
• The TER record indicates the end of the peptide chain.
• It is important to have TER records at the end of peptide chains so a bond
is not drawn from the end of one chain to the start of another.
Example: 3hhb

• At the end of chain A, the heme group records appear


• The last residue in the alpha chain is an ARG (arginine).
• Again, the extra oxygen atom OXT appears in the terminal carboxyl group.
• The TER record indicates the end of the peptide chain.
• It is important to have TER records at the end of peptide chains so a bond is not
drawn from the end of one chain to the start of another.
• In this example, the TER record is correct and should be present, but the
molecule chain would still be terminated at that point even without a TER record,
because HETATM residues are not connected to other residues or to each other.
• The heme group is a single residue made up of HETATM records.
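The role of TER and HETATM records can be sketched in a few lines of Python. This is only an illustration of the rule stated above (a chain ends at a TER record or at a change of chain ID, and HETATM residues are kept apart); `record_line` and `split_chains` are hypothetical helpers, and the sample records are abbreviated, made-up lines in the style of 3hhb with coordinates omitted.

```python
def record_line(record, serial, name, res, chain, seq):
    """Build an abbreviated record with the chain ID in column 22 (illustration only)."""
    return f"{record:<6}{serial:>5}  {name:<3} {res:>3} {chain}{seq:>4}"


def split_chains(lines):
    """Group ATOM records into chains; a chain closes at TER or on a chain-ID change."""
    chains, hetero = [], []
    current, current_id = [], None
    for line in lines:
        record = line[0:6].strip()
        if record == "ATOM":
            chain_id = line[21]
            if current and chain_id != current_id:
                chains.append(current)  # chain ID changed without a TER record
                current = []
            current_id = chain_id
            current.append(line)
        elif record == "TER":
            if current:
                chains.append(current)  # explicit end of the peptide chain
                current = []
        elif record == "HETATM":
            hetero.append(line)         # not bonded to the polymer chain
    if current:
        chains.append(current)
    return chains, hetero


lines = [
    record_line("ATOM", 1068, "O", "ARG", "A", 141),
    record_line("ATOM", 1069, "OXT", "ARG", "A", 141),
    record_line("TER", 1070, "", "ARG", "A", 141),
    record_line("HETATM", 1071, "FE", "HEM", "A", 142),
    record_line("ATOM", 1072, "N", "VAL", "B", 1),
]
chains, hetero = split_chains(lines)
print(len(chains), "chains,", len(hetero), "HETATM record(s)")
```

A viewer that follows this rule never draws a bond from the OXT of one chain to the N of the next.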
Hydrogen Atoms Example: 1vm3

• Hydrogen atom records follow the records of all other atoms of a particular residue.
• A hydrogen atom name starts with H. The next part of the name is based on the name of the
connected nonhydrogen atom.
• For example, in amino acid residues, H is followed by the remoteness indicator (if any) of the
connected atom, followed by the branch indicator (if any) of the connected atom.
• If more than one hydrogen is connected to the same atom, an additional digit is appended so
that each hydrogen atom has a unique name.
Common Errors in PDB Format Files
• Spurious Long Bonds
– Missing TER cards: either a TER card or a change in the
chain ID is needed to mark the end of a chain
– Improper use of ATOM records instead of HETATM records
• Misaligned Atom Names
– Incorrectly aligned atom names in PDB records can cause
problems
• Duplicate Atom Names
– Failure to uniquely name all atoms within a given residue
• Residues Out of Sequence
– For example, the second residue in the file is erroneously
numbered
• Common Typos
– Sometimes the letter l is accidentally substituted for the
number 1
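One of the errors listed above, duplicate atom names within a residue, is easy to check mechanically. The following is a minimal sketch, not a full validator; `record_line` and `duplicate_atom_names` are hypothetical helpers, and the sample records are abbreviated, made-up lines. The alternate-location column is included in the key on the assumption that alternate conformations may legitimately repeat a name.

```python
from collections import Counter


def record_line(record, serial, name, res, chain, seq):
    """Build an abbreviated record with fields in their PDB columns (illustration only)."""
    return f"{record:<6}{serial:>5}  {name:<3} {res:>3} {chain}{seq:>4}"


def duplicate_atom_names(lines):
    """Return (chain, residue number, residue type, atom name) for repeated names."""
    counts = Counter()
    for raw in lines:
        line = raw.ljust(80)
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        key = (
            line[21],              # chain identifier
            line[22:27].strip(),   # residue number plus insertion code
            line[17:20].strip(),   # residue type
            line[12:16].strip(),   # atom name
            line[16],              # alternate location indicator
        )
        counts[key] += 1
    return [key[:4] for key, n in counts.items() if n > 1]


lines = [
    record_line("ATOM", 1, "N", "GLY", "A", 1),
    record_line("ATOM", 2, "CA", "GLY", "A", 1),
    record_line("ATOM", 3, "CA", "GLY", "A", 1),  # duplicate name, should be "C"
    record_line("ATOM", 4, "O", "GLY", "A", 1),
]
duplicates = duplicate_atom_names(lines)
print(duplicates)
```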
Missing Coordinates and Biological Assemblies
• Due to the limitations of structure determination methods,
most entries do not include coordinates for every single
atom in the identified molecule.
• In some cases, the experimental method may not observe
certain atoms. For example, flexible regions and hydrogen
atoms are usually not observed in X-ray crystallographic
experiments and are therefore not included in the PDB
coordinate files.
• A few of the common situations you might encounter are
– Asymmetric and Biological Assemblies (PDB ID:1hho)
– Alpha-Carbon Coordinate Files (PDB ID:1f6g)
– Missing Loops and Tails (PDB ID:1az5)
– Fragments and Domains (PDB ID:2a7u)
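Missing loops and tails often show up as jumps in the residue numbering of a chain. The sketch below looks for such jumps; it is a heuristic only, since numbering can skip for other reasons, and `record_line` and `find_numbering_gaps` are hypothetical helpers working on abbreviated, made-up records.

```python
def record_line(record, serial, name, res, chain, seq):
    """Build an abbreviated record with fields in their PDB columns (illustration only)."""
    return f"{record:<6}{serial:>5}  {name:<3} {res:>3} {chain}{seq:>4}"


def find_numbering_gaps(lines, chain_id):
    """Return pairs of residue numbers that flank a jump in one chain's numbering."""
    numbers = sorted({
        int(line[22:26])
        for line in lines
        if line.startswith("ATOM") and line[21] == chain_id
    })
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


# Residues 4-6 are absent from this made-up chain.
lines = [record_line("ATOM", i, "CA", "ALA", "A", seq)
         for i, seq in enumerate([1, 2, 3, 7, 8], start=1)]
gaps = find_numbering_gaps(lines, "A")
print(gaps)
```

Any gap reported this way is worth confirming against the entry's own remarks about missing residues before drawing conclusions.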
Exercise: Understanding PDB Files
• Go to www.rcsb.org
• or search PDB in Google
• Search and download 1gcn
• Search and download 3hhb
• Search and download 1vm3
• Do not double click to open the file but right click the file and
choose ‘Open with’ option.
• Choose program ‘WordPad’ to open.
Alternate version of the exercise

• Go to www.rcsb.org
• or search PDB in Google
• Search 1gcn
• Search 3hhb
• Search 1vm3
• Click ‘Display Files’
• Explore ‘PDB Format’
Need More Info?
• Check the following links…
• Introduction to PDB Data
• http://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/introduction
Ramachandran Plot
• Ramachandran and co-authors
introduced a special way of
plotting protein torsion angles,
which was subsequently named
the Ramachandran plot.
• The Ramachandran plot
provides an easy way to view
the distribution of torsion
angles in a protein structure.
• The two torsion angles describe
the rotations of the
polypeptide backbone around
the bonds between N-Cα
(called Phi, φ) and Cα-C (called
Psi, ψ).
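A torsion angle such as φ or ψ is computed from the coordinates of four consecutive backbone atoms: C(i-1), N, Cα, C for φ, and N, Cα, C, N(i+1) for ψ. The sketch below uses a standard vector formula with the usual sign convention; `dihedral` is a hypothetical helper name and the points are made-up coordinates chosen so the answers are easy to check by hand.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to +180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up points: the fourth atom is rotated about the central bond.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(quarter_turn, trans, cis)
```

Applying the function to every residue of a chain gives the (φ, ψ) pairs that the plot displays; the first residue has no φ and the last has no ψ.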
• Torsion angles are among the most important
local structural parameters that control protein
folding. In essence, if we had a way to
predict the Ramachandran angles for a particular
protein, we would be able to predict its fold.
• The torsion angles phi and psi provide the
flexibility required for the polypeptide backbone
to adopt a certain fold, since the third possible
torsion angle within the protein backbone (called
omega, ω) is essentially fixed: the peptide bond is
planar, with ω close to 180 degrees in most cases.
• The horizontal axis shows φ values, while the vertical shows ψ
values.
• Notice that both the horizontal and vertical axes run from -180 degrees
in the lower left-hand corner to +180 degrees.
• Each dot on the plot shows the angles for an amino acid.
• This allows clear distinction of the characteristic regions of α-
helices and β-sheets.
• The regions on the plot with the highest density of dots are the
so-called “allowed” regions, also called low-energy regions.
• Some values of φ and ψ are forbidden since the involved atoms
will come too close to each other, resulting in a steric clash.
• For a high-quality and high resolution experimental structure
these regions (generously allowed and disallowed) are usually
empty or almost empty - very few amino acid residues in
proteins have their torsion angles within these regions.
• There are sometimes exceptions to this rule: such values can be
found, and they will most probably result in some strain in the polypeptide
chain.
• In such cases additional interactions will be present to stabilize such
structures. They may have functional significance and may be conserved
within a protein family.
• Another exception to the principle is the torsion angle distribution for one
single residue, glycine.
• Glycine does not have a side chain, which allows high flexibility in the
polypeptide chain, making otherwise forbidden rotation angles accessible.
• That is why glycine is often found in loop regions, where the polypeptide
chain needs to make a sharp turn.
• This is also the reason for the high conservation of glycine residues in
protein families, since the presence of turns at certain positions is a
characteristic of a particular fold of a structure.
• Another residue with special properties is proline, which in contrast to
glycine restricts the φ torsion angle to a narrow range (roughly -65 degrees),
because its side chain is bonded back to the backbone nitrogen.
• Proline is often found at the end of helices and functions as a “helix
disruptor”.
Structure Quality Assessment
• When a protein X-ray structure was not properly
refined, and especially for bad or wrong homology models,
we may find torsion angles in disallowed regions of the
Ramachandran plot; this type of deviation usually
indicates problems with the structure.
• Based on this, the Ramachandran plot is usually used in
assessing the quality of experimental structures or
homology models.
• Torsion angles outside the low-energy regions, whenever
observed, should be carefully examined.
• They may indicate problems in the structure, but they may
also be true and may provide some interesting insights into
the function of the protein.
Figure: red indicates the low-energy (most favoured) regions, yellow the
allowed regions, pale yellow the so-called generously allowed
regions, and white the disallowed regions. A: Good, B: Bad.
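The quality check described above can be sketched as a simple filter over (φ, ψ) pairs. The rectangular regions below are hypothetical, very coarse stand-ins for the α-helix and β-sheet areas; tools such as PROCHECK and RAMPAGE use empirically derived contours instead, and other allowed areas (for example the left-handed helix region) are deliberately left out. Glycine is skipped because, as noted earlier, it can occupy otherwise forbidden angles. `classify` and `flag_for_inspection` are illustrative names, and the residue list is made up.

```python
# Hypothetical coarse boxes in degrees: (phi range, psi range). Not a published standard.
ROUGH_REGIONS = {
    "alpha": ((-100, -30), (-80, -10)),
    "beta": ((-180, -45), (90, 180)),
}


def classify(phi, psi):
    """Return the name of the first rough region containing (phi, psi), else 'other'."""
    for name, ((phi_lo, phi_hi), (psi_lo, psi_hi)) in ROUGH_REGIONS.items():
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return "other"


def flag_for_inspection(residues):
    """List non-glycine residues whose angles fall outside the rough regions."""
    return [
        (res_name, res_seq)
        for res_name, res_seq, phi, psi in residues
        if res_name != "GLY" and classify(phi, psi) == "other"
    ]


# Made-up residues: (residue type, residue number, phi, psi)
residues = [
    ("ALA", 10, -60, -45),
    ("VAL", 11, -120, 130),
    ("GLY", 12, 80, 10),
    ("SER", 13, 60, -120),
]
flagged = flag_for_inspection(residues)
print(flagged)
```

A flagged residue is a prompt to look closer, not proof of an error, since strained conformations can be real and functionally important.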
Exercise – Ramachandran Plot
RAMPAGE can be accessed from
http://mordred.bioc.cam.ac.uk/~rapper/rampage.php
Procheck can be accessed through PDBSum
http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=index.html
Upload a PDB file by clicking the ‘Browse’ button.
Provide an email ID to receive the results by email.
