PDB
The PDB files contain experimentally determined 3D structures of biological macromolecules.
➢ The structural information of a protein can be determined by X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy.
➢ In X-ray crystallography, X-rays, whose wavelength is comparable to the size of an atom, are diffracted by the electrons of the atoms, producing patterns that appear as small spots on an X-ray film. These patterns are used to calculate the coordinates of the atoms in a protein.
➢ NMR spectroscopy is also used for determining the structure of molecules. The nucleus of an atom placed in a strong magnetic field can absorb electromagnetic radiation of a particular frequency.
➢ Electromagnetic radiation is a form of energy that consists of both electric and magnetic fields. This type of radiation includes X-rays, gamma rays, radio waves, visible light, etc. PDB files also contain information on the data collected, the molecule name, primary and secondary structure, ligands, atomic coordinates, crystallographic structure factors, NMR experimental data, etc.
➢ The data are submitted by scientists from all over the world. The PDB is maintained by the Worldwide Protein Data Bank (wwPDB).
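The atomic coordinates mentioned above are stored, in the text-based PDB format, as fixed-width ATOM and HETATM records. The following is a minimal sketch of reading them in Python. The column positions follow the wwPDB format description; the names `Atom` and `parse_atom_line` and the sample lines (with made-up values) are assumptions for illustration, not data from a real entry.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    serial: int     # atom serial number
    name: str       # atom name, e.g. "CA"
    res_name: str   # residue name, e.g. "VAL"
    chain_id: str   # chain identifier
    res_seq: int    # residue sequence number
    x: float        # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse one ATOM/HETATM record; return None for any other record."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    # The format is column-based, so fields are read by position, not by
    # splitting on whitespace (slices are 0-based, columns are 1-based).
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Hypothetical sample records with made-up values, laid out in PDB columns.
SAMPLE = [
    "HEADER    OXYGEN TRANSPORT",
    "ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05           N",
    "ATOM      2  CA  VAL A   1       6.913  17.759   4.607  1.00 43.14           C",
]

atoms = [a for a in map(parse_atom_line, SAMPLE) if a is not None]
for atom in atoms:
    print(atom.serial, atom.name, atom.res_name, atom.chain_id, atom.x, atom.y, atom.z)
```

Reading by column position rather than splitting on spaces matters because adjacent fields in a real file may touch with no space between them.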
➢ Each entry in the PDB is assigned a unique identifier called the PDB ID. It is a 4-character identifier made up of letters and digits. All data in the PDB are accessible to the public.
➢ There are databases that contain data derived from the PDB, for example: Structural Classification of Proteins (SCOP), which groups different protein structures; HSSP (Homology-Derived Secondary Structure of Proteins), which relates the 3D structure and 1D sequence of a protein; and CATH, which classifies protein structures according to their evolution.
➢ The PDB allows users to search for information on structure, sequence and function, and to visualize, download and assess molecules.
PDB File Format
➢ The PDB file format is the standard file format for protein structure files. It describes how molecules are held together in the 3D structure of a protein. The file contains hundreds or thousands of lines called records, which describe the protein. Figure 1 shows certain parts of a PDB-formatted file for deoxyhemoglobin.
STUDY MATERIAL FOR B.SC MB BIOINFORMATICS SEMESTER - V, ACADEMIC YEAR 2020-21 Page 13 of 41
Sequence Data
➢ For guidance on the submission process for your sequence(s), please see How To: Submit sequence data to NCBI.
➢ Your data will be submitted to one of the following databases:
➢ GenBank
➢ Sequence Read Archive (SRA)
➢ dbSNP
➢ dbVar
➢ GEO
Microarray Data
➢ If you have microarray data from clinical studies that require controlled access, you should submit your data to dbGaP.
➢ For all other microarray data, you should submit your data to GEO via GEO's Submission page.
BioAssay Data, Substance or Sequence-Based Reagents
➢ BioAssay data and chemical substance information should be submitted to PubChem via the PubChem Upload Service.
➢ Sequence-based reagents should be submitted to the Probe database via the Probe submission protocol.
Human clinical data and genetic tests
➢ If you have data from clinical studies, you should submit your data to dbGaP.
➢ If you are a genetic test provider, you may submit your information to GTR.
A Manuscript
➢ If you have a manuscript or publication that needs to be deposited into the PubMed Central database in order to comply with the NIH Public Access Policy, you should follow the steps described.
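The routing guidance above can be summarized as a small lookup table. The sketch below is only an illustration: the category keys and the function name `suggest_database` are hypothetical labels invented here, and the mapping simply restates the destinations named in these notes. Sequence data maps to several candidates because the notes list the possible databases without saying how to choose among them.

```python
from typing import Dict, List

# Hypothetical category labels mapped to the destinations named in the notes.
SUBMISSION_TARGETS: Dict[str, List[str]] = {
    "sequence": ["GenBank", "Sequence Read Archive (SRA)", "dbSNP", "dbVar", "GEO"],
    "microarray_controlled_access": ["dbGaP"],
    "microarray": ["GEO"],
    "bioassay_or_substance": ["PubChem"],
    "sequence_based_reagent": ["Probe"],
    "clinical_study": ["dbGaP"],
    "genetic_test": ["GTR"],
    "manuscript": ["PubMed Central"],
}


def suggest_database(data_type: str) -> List[str]:
    """Return the candidate database(s) for a data type, per the notes above."""
    try:
        return SUBMISSION_TARGETS[data_type]
    except KeyError:
        raise ValueError(f"unknown data type: {data_type!r}") from None


print(suggest_database("microarray"))
print(suggest_database("genetic_test"))
```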
Submitting data to the UniProt Knowledgebase
➢ We provide accession numbers for proteins that have been directly sequenced. We do not provide, in advance, accession numbers for protein sequences that result from translation of nucleic acid sequences.
➢ These translations are automatically forwarded to us from the DDBJ/EMBL-Bank/GenBank nucleotide sequence databases and are processed into UniProtKB/TrEMBL. All the information you need to submit sequence or annotation updates is available at www.uniprot.org/support/submissions.shtm
Retrieving data from UniProt databases
Browsing:
➢ UniProt offers a range of services that allow you to browse and analyze the data (www.uniprot.org/search/SearchTools.shtml). Depending on the complexity of your query, you can choose from three different types of text-based search. You can perform sequence-based searches of any of the UniProt databases and some of their component data sets using a variety of sequence comparison tools, and you can search for families, domains and motifs.
➢ You can perform multiple sequence alignments, retrieve multiple entries, identify proteins from proteomics experiments and perform bibliographic searches.
Downloading:
➢ If you need to download entire databases, the UniProt Knowledgebase and UniRef databases are available at www.uniprot.org/database/download.shtml.
CD-ROM:
➢ The UniProt Knowledgebase full releases are distributed on CD-ROM. If you would like to receive them, please send us an e-mail using the query form at www.ebi.ac.uk/support/.
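Once entries have been retrieved or downloaded, they usually need to be read by a program. The sketch below assumes the sequences were saved in FASTA format with UniProtKB-style headers (`db|accession|entry_name description`, where `sp` denotes Swiss-Prot and `tr` denotes TrEMBL); the notes do not name a file format, so this is an assumption. The function names and the example record, whose sequence is shortened, are illustrative only.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_uniprot_header(header: str) -> Dict[str, str]:
    """Split a 'db|accession|entry_name description' header into its parts."""
    ident, _, description = header.partition(" ")
    db, accession, entry_name = ident.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# Illustrative record; the sequence is shortened and is not a complete entry.
EXAMPLE = """\
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2
MVLSPADKTNVKAAWGKVGA
HAGEYGAEALERMFLSFPTT
""".splitlines()

for header, sequence in read_fasta(EXAMPLE):
    info = split_uniprot_header(header)
    print(info["db"], info["accession"], info["entry_name"], len(sequence))
```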