Lec 01
• Expression Analysis
– Human genome
– Selected model organisms
– First contig
– First gene
Protein Data Bank
• PDB homepage: www.rcsb.org/pdb/
• Search for protein
glyceraldehyde-3-phosphate dehydrogenase
– Result 1A7K
• View
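Once a search has produced an entry ID such as 1A7K, the entry can be addressed directly. The sketch below is illustrative only: the function name pdb_entry_urls is hypothetical, the ID check assumes the classic four-character form (a digit followed by three letters or digits), and the two URL patterns are the ones the RCSB site serves at the time of writing, which may change.

```python
import re

def pdb_entry_urls(pdb_id):
    """Return the entry page and coordinate-file URLs for a PDB ID."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic IDs are one digit plus three alphanumerics.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return {
        "page": f"https://www.rcsb.org/structure/{pdb_id}",
        "file": f"https://files.rcsb.org/download/{pdb_id}.pdb",
    }

urls = pdb_entry_urls("1a7k")
print(urls["page"])
print(urls["file"])
```

The function only builds the addresses; opening them in a browser or downloading the file is left to the reader.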
LINKS
• Human Genome Project -
www.nhgri.nih.gov/HGP/
• The three main DNA banks:
– GenBank - www.ncbi.nlm.nih.gov
– EMBL - www.embl-heidelberg.de
– DDBJ – www.ddbj.nig.ac.jp
• Important protein database (Swiss-Prot) -
www.expasy.ch/sprot/sprot-top.html
• PDB: Protein Data Bank (USA) - www.rcsb.org/pdb/
Definition
• Index: An index is a set of pointers to information
in a database.
In searching the entire World Wide Web, or a
specialized database in molecular biology, you
submit one or more search terms, and a program
checks for them in its tables of indices.
• Information retrieval software identifies entries
with contents relevant to your interest.
• Example: if you submit the term 'horse', the
program returns a list of entries that contain the
term 'horse'.
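The index idea above can be sketched as a small inverted index: a table that maps each word to the entries containing it, so a search term is looked up in the table instead of scanning every entry. This is a minimal illustration, not how any particular databank implements its search; the entry IDs and texts are invented, and build_index and search are hypothetical names.

```python
from collections import defaultdict

def build_index(entries):
    """Map each lowercase word to the set of entry IDs that contain it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(entry_id)
    return index

def search(index, term):
    """Return the sorted IDs of entries whose text contains the term."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical entries, invented for illustration only.
entries = {
    "E1": "Horse liver alcohol dehydrogenase",
    "E2": "Human hemoglobin alpha chain",
    "E3": "Myoglobin from horse heart",
}
index = build_index(entries)
print(search(index, "horse"))  # ['E1', 'E3']
```

The pointers here are just entry IDs; a real retrieval system would also rank the results by relevance.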
Contents of Databank
Primary data collections related to biological
macromolecules include:
• Nucleic acid sequences, including whole-genome
projects
• Amino acid sequences of proteins
• Protein and nucleic acid structures
• Small-molecule crystal structures
• Protein functions
• Expression patterns of genes
• Publications
Data Bank
• NCBI: National Center for Biotechnology Information
(USA)
• EMBL: EMBL Data Library (European Bioinformatics
Institute, UK)
• DDBJ: DNA Data Bank of Japan (National Institute of
Genetics, Japan).
• The groups exchange data daily. As a result the raw data are
identical, although the format in which they are stored, and
the nature of the annotation, vary slightly among them.
• http://swissmodel.expasy.org/
Documentation
• All files documented and indexed
• Documentation kept up-to-date
• Applications of Swiss-Prot:
– Provides highly organized data and information on a
wide variety of proteins
– Can be used as a starting point for protein research
– Allows searches to be conducted starting with various
search strings
– Biochemical encyclopedia
SWISS-PROT
UniProt
• Mission: Provide the scientific community with a
comprehensive, high quality and freely accessible
resource of protein sequence and functional
information.
UniProt comprises four components:
• UniProt Knowledgebase (UniProtKB)
• UniProt Reference Clusters (UniRef)
• UniProt Archive (UniParc)
• UniProt Metagenomic and Environmental
Sequences (UniMES)
UniProt Knowledgebase
• UniProt Knowledgebase (UniProtKB) - Central
access point for extensive curated protein
information, including function, classification, and
cross-reference.
– UniProtKB/Swiss-Prot - manually annotated and
reviewed
– UniProtKB/TrEMBL - automatically annotated and
not reviewed
UniProt Reference Clusters
• UniProt Reference Clusters (UniRef) - databases
that provide clustered sets of sequences from
UniProtKB and selected UniProt Archive records, to
obtain complete coverage of sequence space at
several resolutions while hiding redundant
sequences.
• UniProt Archive (UniParc) - comprehensive
repository, used to keep track of sequences and their
identifiers.
• UniProt Metagenomic and Environmental
Sequences (UniMES) - a repository
specifically developed for metagenomic and
environmental data.
• http://pir.georgetown.edu/
• http://www.uniprot.org/uniprot/P00974
• http://www.rcsb.org
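The two ideas described for UniRef and UniParc, hiding redundant sequences and keeping track of every identifier attached to a sequence, can be sketched by storing each distinct sequence once and listing all identifiers that point to it. This is a toy illustration and not UniProt's actual procedure: it merges exact matches only, whereas clustering at lower resolutions needs sequence comparison. The identifiers and sequences are invented, and collapse_identical is a hypothetical name.

```python
from collections import defaultdict

def collapse_identical(records):
    """Group identifiers by sequence so each distinct sequence is stored once."""
    clusters = defaultdict(list)
    for identifier, sequence in records:
        clusters[sequence.upper()].append(identifier)
    return dict(clusters)

# Hypothetical (identifier, sequence) pairs, invented for illustration only.
records = [
    ("dbA:0001", "MKTAYIAKQR"),
    ("dbB:X17", "MKTAYIAKQR"),
    ("dbA:0002", "MSLLTEVETP"),
]
clusters = collapse_identical(records)
for sequence, identifiers in clusters.items():
    print(sequence, identifiers)
```

Three records collapse to two stored sequences, and no identifier is lost.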
Protein Data Bank (PDB)
The best-established database for biological macromolecular
structures.
• Contains:
– Structures of proteins
– Nucleic acids
– a few carbohydrates
• Founder: Walter Hamilton
Brookhaven National Laboratory
Long Island, New York, USA
• Founded: 1971
Protein Data Bank (PDB)
• Current Manager: