CS427 Bioinformatics
Biological Databases and Data Formats
Biological Databases Classification
1. Based on Data Type
2. Based on Scope
3. Based on Data Source
4. Based on Access and Usage
5. Based on Organizational Structure

Based on Data Type
• Nucleotide Sequence Databases: These databases store DNA and RNA sequences.
  • Examples: GenBank, EMBL, DDBJ.
• Protein Sequence Databases: These databases store protein sequences.
  • Examples: UniProt, Swiss-Prot, PIR, PRF, TrEMBL.
• Structure Databases: These contain three-dimensional structures of biomolecules.
  • Examples: Protein Data Bank (PDB), SCOP, CATH.
• Genomic Databases: These store whole-genome sequences and their annotations.
  • Examples: Ensembl, UCSC Genome Browser.
• Expression Databases: These store gene expression data, such as microarray data.
  • Examples: Gene Expression Omnibus (GEO), ArrayExpress.
• Pathway Databases: These focus on metabolic and signaling pathways.
  • Examples: KEGG, Reactome.
• Interaction Databases: These store protein-protein interactions, gene regulatory networks, and similar interaction data.
  • Examples: BioGRID, STRING.
• Phenotype and Mutation Databases: These contain data on genetic mutations and their phenotypic consequences.
  • Examples: OMIM, ClinVar, HGMD.

Based on Scope
• Primary Databases: These contain raw data submitted directly by researchers.
  • Examples: GenBank, EMBL, DDBJ, PDB, Swiss-Prot, PIR, PRF.
• Secondary Databases: These contain data derived from primary databases, usually through analysis, annotation, or curation.
  • Examples: UniProtKB (a combination of Swiss-Prot and TrEMBL), RefSeq.
• Specialized Databases: These focus on a specific organism, biological system, or type of data.
  • Examples: FlyBase (Drosophila), WormBase (C. elegans).

Based on Data Source
• Curated Databases: These are manually curated by experts to ensure high accuracy.
  • Examples: Swiss-Prot, RefSeq.
• Non-curated Databases: These rely on automated data submission and processing with minimal human intervention.
  • Examples: TrEMBL, EST databases.

Based on Access and Usage
• Public Databases: These are freely accessible to everyone.
  • Examples: NCBI, EMBL-EBI, DDBJ.
• Proprietary Databases: Access is restricted, often requiring a subscription or payment.
  • Examples: COSMIC, BioBase.

Based on Organizational Structure
• Flat File Databases: Store data in plain text files with a standardized format.
  • Example: GenBank flat files.
• Relational Databases: Organize data in tables that are linked through shared key fields.
  • Example: UniProtKB.
• Object-oriented Databases: Store data as objects, similar to object-oriented programming.
  • Example: Ensembl.

Bibliographic Database
• PubMed

Most Commonly Used Databases
• GenBank (NCBI)
  • Go to the NCBI website and search the Nucleotide database for a gene (e.g., HBA1 or TP53).
  • The information shown is extracted from GenBank.
• EMBL
  • Go to ebi.ac.uk and search.
• DDBJ
• UniProt
• PDB

Expasy
• The Translate tool (translates a nucleotide sequence into a protein sequence)

Entrez
• Entrez, the Global Query Cross-Database Search System, is a federated search engine (web portal) that lets users search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website.
• Text-based search
• All information is integrated: structure, sequence, literature, etc.
• Cross-references across databases
• The name is the French word for "come in".
• Supports Boolean operators (AND, OR, NOT)

Ensembl
• Genome browser
• BioMart in Ensembl (demo)
  • Used to convert between gene IDs and gene names

Motif
• A sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to the biological function of the macromolecule.
• Explore motif databases.
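A motif can be written as a pattern and searched for directly. The sketch below assumes PROSITE-style pattern syntax (the notation used by the PROSITE motif database) and supports only a subset of it: single residues, x for any residue, [..] for allowed residues, {..} for forbidden residues, repeat counts such as x(2) or x(2,4), and the < and > end anchors. The example pattern N-{P}-[ST]-{P} is the N-glycosylation site pattern; the protein sequence is made up, and the helper names prosite_to_regex and find_motif are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a Python regex.

    Supported (sketch only): residues, x, [..], {..}, (n) / (n,m) repeats,
    and the < / > terminal anchors. Other PROSITE features are not handled.
    """
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern must start at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern must end at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off a repeat count such as x(2) or x(2,4).
        count = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, count = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."  # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element  # single residue or [..] class, same syntax in regex
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every, possibly overlapping, hit."""
    # A capturing group inside a lookahead lets finditer report overlaps.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
    print(find_motif("MANLTGNPSKNASA", "N-{P}-[ST]-{P}"))  # [(3, 'NLTG'), (11, 'NASA')]
```

The N at position 7 is not reported because it is followed by P, which the {P} element forbids. Real motif databases also use profiles and hidden Markov models, which a regular expression cannot express.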
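The Entrez text searches with Boolean operators described above can also be run programmatically through NCBI's E-utilities, where the ESearch endpoint takes a database name (db) and an Entrez query (term). This sketch only builds the request URL and does not contact NCBI. The helper names build_query and esearch_url are hypothetical, and the field tags [Gene Name] and [Organism] are assumed to be valid for the Nucleotide database. Entrez expects the Boolean operators in upper case.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns IDs of records matching a query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(all_of, none_of=()):
    """Join required terms with AND, then exclude each unwanted term with NOT."""
    query = " AND ".join(all_of)
    for term in none_of:
        query += f" NOT {term}"
    return query


def esearch_url(db, term, retmax=20):
    """Build (but do not fetch) an ESearch URL; retmax caps the number of IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical example query: the TP53 gene in human nucleotide records.
    query = build_query(["TP53[Gene Name]", "Homo sapiens[Organism]"])
    print(query)
    print(esearch_url("nucleotide", query))
```

Fetching the printed URL (for example with urllib.request) would return an XML result listing the IDs of matching records, which other E-utilities can then use to retrieve the full entries.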
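GenBank flat files, listed above as the flat-file database example, are plain text in which each top-level keyword (LOCUS, DEFINITION, ACCESSION, and so on) starts in column 1 and its value starts in column 13, with continuation lines indented by 12 spaces. The minimal parser below assumes that layout, reads only the top-level header fields, skips indented sub-keywords such as ORGANISM, and stops at FEATURES. The record is an abbreviated, made-up example rather than a real GenBank entry, and parse_header is a hypothetical helper.

```python
# Abbreviated, made-up record that mimics the GenBank flat-file layout.
TOY_RECORD = "\n".join([
    "LOCUS       TOY0001  60 bp  DNA  linear  SYN",
    "DEFINITION  Made-up record that only illustrates the",
    "            flat-file layout.",
    "ACCESSION   TOY0001",
    "  ORGANISM  synthetic construct",
    "FEATURES             Location/Qualifiers",
    "//",
])


def parse_header(text):
    """Collect top-level keyword fields from a GenBank-style flat file."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break  # the header section is over
        if not line.strip():
            continue  # ignore blank lines
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword and not line.startswith(" "):
            current = keyword  # a new top-level field starts in column 1
            fields[current] = value
        elif not keyword and current:
            fields[current] += " " + value  # continuation of the previous field
        # Indented sub-keywords (e.g. ORGANISM) fall through and are skipped.
    return fields


if __name__ == "__main__":
    for key, value in parse_header(TOY_RECORD).items():
        print(f"{key}: {value}")
```

For real records, a maintained parser such as the one in Biopython is a safer choice than hand-written code, since the full format has many more sections and rules.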
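To illustrate the relational model listed under organizational structure, the sketch below uses Python's built-in sqlite3 module with two hypothetical tables, gene and protein, linked by a shared gene_id key. A join like this is one way to map between identifiers and names, similar in purpose to the BioMart ID conversion mentioned above. The schema, identifiers, and lengths are invented for illustration and do not reflect the actual schema of UniProtKB or Ensembl; build_demo_db and proteins_for_symbol are hypothetical helpers.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked by gene_id."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE gene (
            gene_id TEXT PRIMARY KEY,
            symbol  TEXT NOT NULL
        );
        CREATE TABLE protein (
            accession TEXT PRIMARY KEY,
            gene_id   TEXT NOT NULL REFERENCES gene(gene_id),
            length    INTEGER
        );
    """)
    # Hypothetical identifiers and lengths; not real database records.
    con.executemany("INSERT INTO gene VALUES (?, ?)",
                    [("G001", "GENE_A"), ("G002", "GENE_B")])
    con.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                    [("P001", "G001", 100), ("P002", "G001", 150),
                     ("P003", "G002", 200)])
    return con


def proteins_for_symbol(con, symbol):
    """Follow the gene_id key from a gene symbol to its protein records."""
    rows = con.execute(
        """SELECT p.accession, p.length
           FROM gene AS g JOIN protein AS p ON p.gene_id = g.gene_id
           WHERE g.symbol = ?
           ORDER BY p.accession""",
        (symbol,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    print(proteins_for_symbol(con, "GENE_A"))  # [('P001', 100), ('P002', 150)]
    con.close()
```

Storing each gene once and pointing to it by key avoids repeating gene information in every protein row, which is the main design advantage over a single flat table.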