Bioinformatic Databases 2

Biological databases are libraries of biological information collected from scientific experiments and literature. They provide structured, indexed data including sequences, functions, structures, and related references that is periodically updated. Major databases include sequence databases like NCBI and UniProt, and structure databases like PDB, CATH, and SCOP. Sequence databases provide nucleotide and protein sequences along with annotation, while structure databases classify protein domains and structures hierarchically based on structural and evolutionary relationships. Tools on these databases allow users to search, analyze, and visualize biological data.

Uploaded by

vivian1899190
Copyright
© All Rights Reserved

Biological Databases

Dr. Sajid
Department of Biotechnology
Biological Databases
• Biological databases are libraries of
biological information, collected from scientific
experiments, published literature, high-
throughput experiment technology, and
computational analysis.
What we expect from a database
• Sequence, functional, structural information,
related bibliography
• Well Structured and Indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization
Biological Databases
• Sequence databases
• Structure databases
Sequence databases
• Nucleotide databases
• Protein databases
Sequence databases
Nucleotide databases
• International Nucleotide Sequence
Database Collaboration (INSDC)
– NCBI (GenBank)
– EMBL (European Nucleotide Archive, at EMBL-EBI)
– DDBJ (DNA Data Bank of Japan)
Standard contents of a sequence database

• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation
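The contents listed above can be pictured as one record type. This is a minimal sketch only: the class name, field names, and sample values are hypothetical and do not follow any real database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceRecord:
    """Toy model of one sequence database entry (hypothetical layout)."""
    accession: str                      # stable identifier of the entry
    sequence: str                       # nucleotide or protein sequence
    organism: str = ""                  # taxonomic data
    references: List[str] = field(default_factory=list)
    annotations: Dict[str, str] = field(default_factory=dict)
    keywords: List[str] = field(default_factory=list)
    cross_references: Dict[str, str] = field(default_factory=dict)

    def length(self) -> int:
        """Number of residues in the stored sequence."""
        return len(self.sequence)


# Made-up values for illustration; not a real database entry.
record = SequenceRecord(
    accession="EXAMPLE0001",
    sequence="ATGGCCATTGTAATGGGCCGC",
    organism="Homo sapiens",
    keywords=["toy example"],
    cross_references={"PubMed": "placeholder-id"},
)
print(record.accession, record.length())
```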
NCBI
• Very comprehensive biological database
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple and easy-to-use web
interface

http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or
Sequin
• Search engine for data retrieval: Entrez
• Entrez retrieves information across all the
resources under NCBI
Example: PubMed, Taxonomy, SNP, PubChem,
etc.
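Entrez can also be queried programmatically through the NCBI E-utilities. The sketch below only builds an ESearch URL and makes no network request. The helper name build_esearch_url and the search terms are illustrative assumptions; db, term, retmax and retmode are ESearch query parameters.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that would search one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# The same kind of query can target different Entrez resources.
pubmed_url = build_esearch_url("pubmed", "biological databases", retmax=5)
taxonomy_url = build_esearch_url("taxonomy", "Homo sapiens")
print(pubmed_url)
print(taxonomy_url)
```

Fetching the URL (for example with urllib.request) would return the matching record identifiers; NCBI's usage guidelines on request rates apply.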
Tools for analysis
• BLAST
• Primer-BLAST
• BLink
• ORF Finder
• Genome Workbench
Protein Sequence databases

• UniProt
• Pfam
• Gene Index project
UniProt
• Universal Protein Resource
• Formed through the merger of:
– Swiss-Prot (SIB and EBI)
– TrEMBL (EBI)
– PIR-PSD (PIR)
• Entry names are often the name of the gene
followed by a species code, separated by an
underscore, e.g. PAX6_HUMAN
• Accession numbers are short alphanumeric
identifiers:
• e.g. P26367 (the accession for PAX6_HUMAN)
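The two identifier styles above can be told apart in code. This is a minimal sketch: the regular expression follows the accession pattern given in the UniProt help pages, while the function names are hypothetical.

```python
import re

# Accession pattern as documented by UniProt (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if the whole string looks like a UniProt accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(entry_name: str) -> tuple:
    """Split an entry name such as PAX6_HUMAN into (mnemonic, species code)."""
    mnemonic, sep, species = entry_name.partition("_")
    if not (sep and mnemonic and species):
        raise ValueError(f"not an entry name: {entry_name!r}")
    return mnemonic, species


print(is_uniprot_accession("P26367"))      # accession from the slide
print(is_uniprot_accession("PAX6_HUMAN"))  # an entry name, not an accession
print(split_entry_name("PAX6_HUMAN"))
```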
Uniprot features

• BLAST
• Align
• Retrieve
• ID mapping
Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are
classified into families
• Provides links to external databases like PDB,
SCOP, CATH etc.
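The idea of grouping proteins by conserved regions can be illustrated with a toy example. This is not how Pfam works internally: Pfam builds profile hidden Markov models from multiple sequence alignments, whereas the sketch below swaps in simple regular-expression motifs. The sequences are made up, and the group names and function name are hypothetical.

```python
import re
from typing import Dict, List

# Two well-known short motifs, used here only as stand-ins for
# "conserved regions"; they are not Pfam family definitions.
MOTIFS = {
    "walker_a": re.compile(r"[AG].{4}GK[ST]"),  # P-loop style motif
    "hexxh": re.compile(r"HE..H"),              # zinc-binding style motif
}

# Made-up toy sequences, not real proteins.
SEQUENCES = {
    "protA": "MSTAGPSGSGKSTLLRA",
    "protB": "MKLVHEAAHRTT",
    "protC": "MQQWERTYLLNP",
    "protD": "MAVTGSGKTHELLHQ",
}


def group_by_motif(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Assign each sequence to every motif group whose pattern it contains."""
    groups: Dict[str, List[str]] = {name: [] for name in MOTIFS}
    for seq_id, seq in seqs.items():
        for name, pattern in MOTIFS.items():
            if pattern.search(seq):
                groups[name].append(seq_id)
    return groups


families = group_by_motif(SEQUENCES)
print(families)
```

A sequence carrying more than one conserved region, like protD here, ends up in more than one group.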
Pfam: Features
• Sequence search
• View Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search
Gene Indices
• Project aimed at indexing genes and their
variants in the various genome sequences.
• Creating a catalogue of genes in a wide range
of organisms
• Reducing redundancy
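Redundancy reduction can be sketched in a simplified form: drop exact duplicates and any sequence fully contained in a longer one. This is an illustrative assumption, not the Gene Indices method itself, which clusters and assembles overlapping sequences. The function name and input sequences are hypothetical.

```python
from typing import Iterable, List


def reduce_redundancy(sequences: Iterable[str]) -> List[str]:
    """Keep only sequences that are not contained in a longer kept sequence."""
    # Normalise case and drop exact duplicates, longest sequences first.
    unique = sorted({s.upper() for s in sequences}, key=lambda s: (-len(s), s))
    kept: List[str] = []
    for seq in unique:
        # Skip a sequence that is a substring of one already kept.
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Made-up toy input.
toy = ["ATGGCC", "atggcc", "GGC", "TTTAAA", "ATGGCCA"]
non_redundant = reduce_redundancy(toy)
print(non_redundant)
```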
Gene Indices Software Tools
• TGI Clustering tools
• Clview
• SeqClean
• Cdbfasta/cdbyank
Structural databases
• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins
wwPDB
• Contains information about experimentally
determined structures of proteins, nucleic
acids, and complex assemblies
• RCSB-PDB, PDBe, PDBj, BMRB – repositories of
protein structure data
• Files in PDB, mmCIF, PDBML/XML formats
• Advanced search – provides comprehensive
information about a protein.
• Sequence info, domain info, sequence
similarity, literature, apart from the details of
the structure.
• Cross-referenced to SCOP and CATH
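Of the file formats above, the legacy PDB format is fixed-width, so an ATOM record can be read by column position. The sketch below follows the column layout of the published PDB format description. The function name is hypothetical and the sample record is built by hand with illustrative values, not taken from a real entry.

```python
def parse_atom_line(line: str) -> dict:
    """Extract a few fields from one ATOM/HETATM record by column position."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "res_name": line[17:20].strip(), # columns 18-20
        "chain": line[21],               # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
        "element": line[76:78].strip(),  # columns 77-78
    }


# Hand-built sample record (illustrative values), one piece per field.
SAMPLE_LINE = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  atom serial number
    " "           # 12    blank
    " N  "        # 13-16 atom name
    " "           # 17    alternate location indicator
    "MET"         # 18-20 residue name
    " "           # 21    blank
    "A"           # 22    chain identifier
    "   1"        # 23-26 residue sequence number
    "    "        # 27-30 insertion code and blanks
    "  27.340"    # 31-38 x coordinate
    "  24.430"    # 39-46 y coordinate
    "   2.614"    # 47-54 z coordinate
    "  1.00"      # 55-60 occupancy
    "  9.67"      # 61-66 temperature factor
    "          "  # 67-76 blanks
    " N"          # 77-78 element symbol
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom)
```

For real work, mmCIF is the current archive format, and an established parser is a safer choice than column slicing.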


CATH
• Classification of proteins based on domain
structures
• Each protein chopped into individual domains
and assigned into homologous superfamilies.
• Hierarchical domain classification of PDB
entries.
CATH hierarchy
• Class – derived from secondary structure content;
assigned automatically
• Architecture – describes the gross orientation of
secondary structures, independent of connectivity
• Topology – clusters structures according to their
topological connections and numbers of secondary
structures
• Homologous superfamily – groups together
protein domains that are thought to share a
common ancestor and can therefore be
described as homologous
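The four levels above appear directly in CATH identifiers, which are written as four dot-separated numbers (C.A.T.H). This is a minimal sketch of reading such a code; the function and type names are hypothetical, and only the four main class names are listed.

```python
from typing import List, NamedTuple

# Names of the main CATH classes (level C).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_code(code: str) -> CathCode:
    """Split a code such as 3.40.50.300 into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers: {code!r}")
    return CathCode(*(int(p) for p in parts))


def lineage(code: str) -> List[str]:
    """Return the identifiers of every level from class down to superfamily."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


parsed = parse_cath_code("3.40.50.300")
print(parsed, CATH_CLASSES[parsed.cath_class])
print(lineage("3.40.50.300"))
```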
SCOP
• Description of structural and evolutionary
relationships between all the proteins with
known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers
Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Protein
• Species
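The upper levels of this hierarchy are encoded in SCOP's concise classification string (sccs), e.g. a.1.1.2, read as class.fold.superfamily.family. The sketch below splits such a string; the function name is hypothetical and only the four main structural classes are listed.

```python
# Letters used for the four main SCOP structural classes.
SCOP_CLASSES = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
}


def parse_sccs(sccs: str) -> dict:
    """Return the identifier of each level encoded in an sccs string."""
    parts = sccs.split(".")
    if len(parts) != 4 or not parts[0].isalpha():
        raise ValueError(f"expected class.fold.superfamily.family: {sccs!r}")
    if not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"fold, superfamily and family must be numbers: {sccs!r}")
    return {
        "class": parts[0],
        "fold": ".".join(parts[:2]),
        "superfamily": ".".join(parts[:3]),
        "family": sccs,
    }


levels = parse_sccs("a.1.1.2")
print(levels, SCOP_CLASSES[levels["class"]])
```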
Thank you
