Biological Databases

Uploaded by

Sickdan
Copyright
© All Rights Reserved

CS427 Bioinformatics

Biological Databases and Data Formats


Biological Databases Classification
1. Based on Data Type
2. Based on Scope
3. Based on Data Source
4. Based on Access and Usage
5. Based on Organizational Structure
Based on Data Type
• Nucleotide Sequence Databases: These databases store DNA and RNA
sequences.
• Examples: GenBank, EMBL, DDBJ.
• Protein Sequence Databases: These databases store protein
sequences.
• Examples: UniProt, Swiss-Prot, PIR, PRF, TrEMBL.
• Structure Databases: These contain three-dimensional structures of
biomolecules.
• Examples: Protein Data Bank (PDB), SCOP, CATH.
• Genomic Databases: These store whole-genome sequences and
annotations.
• Examples: Ensembl, UCSC Genome Browser.
• Expression Databases: These include data on gene expression, such as
microarray data.
• Examples: Gene Expression Omnibus (GEO), ArrayExpress.
• Pathway Databases: These focus on metabolic and signaling
pathways.
• Examples: KEGG, Reactome.
• Interaction Databases: These include protein-protein interactions,
gene regulatory networks, etc.
• Examples: BioGRID, STRING.
• Phenotype and Mutation Databases: These contain data related to
genetic mutations and their phenotypic consequences.
• Examples: OMIM, ClinVar, HGMD.
Based on Scope
• Primary Databases: These contain raw, experimentally derived data
submitted directly by researchers.
• Examples: GenBank, EMBL, DDBJ, Swiss-Prot, PIR, PRF, PDB.
• Secondary Databases: These contain data derived from primary
databases, usually through analysis, annotation, or curation.
• Examples: UniProtKB (a combination of Swiss-Prot and TrEMBL), RefSeq, SCOP, CATH.
• Specialized Databases: These focus on a specific organism, biological
system, or type of data.
• Examples: FlyBase (Drosophila), WormBase (C. elegans).
Based on Data Source
• Curated Databases: These are manually curated by experts to ensure
high accuracy.
• Examples: Swiss-Prot, RefSeq.
• Non-curated Databases: These rely on automated submission and
annotation with minimal human intervention.
• Examples: TrEMBL, EST databases.
Based on Access and Usage
• Public Databases: These are freely accessible to everyone.
• Examples: databases hosted by NCBI, EMBL-EBI, and DDBJ.
• Proprietary Databases: Access is restricted, often requiring a
subscription or payment.
• Examples: COSMIC (license required for commercial use), BIOBASE.
Based on Organizational Structure
• Flat File Databases: Store data in simple text files with a standardized
format.
• Example: GenBank flat files.
• Relational Databases: Organize data in tables that can be linked based
on relationships.
• Example: UniProtKB.
• Object-oriented Databases: Store data as objects, similar to object-
oriented programming.
• Example: Ensembl (accessed through an object-oriented API over a relational back end).
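The three organizational styles above can be compared on a single record. The following is a minimal sketch, not how GenBank, UniProtKB, or Ensembl are actually implemented: the record `DEMO0001` is a hypothetical, simplified GenBank-style entry, and the names `parse_flat_file`, `SequenceRecord`, and the `records` table are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical, simplified GenBank-style record (not a real entry).
FLAT_FILE = """\
LOCUS       DEMO0001       24 bp    DNA     linear   SYN
DEFINITION  Hypothetical demo record.
ACCESSION   DEMO0001
ORIGIN
        1 atggccctga ccaagtgaat gcat
//
"""


def parse_flat_file(text):
    """Flat file: read keyword lines, then the sequence block after ORIGIN."""
    fields, in_sequence, seq = {}, False, []
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Drop position numbers and spaces, keep only the bases.
            seq.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line[:1].strip():  # keyword starts in the first column
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    fields["SEQUENCE"] = "".join(seq).upper()
    return fields


@dataclass
class SequenceRecord:
    """Object-oriented view: data and behaviour travel together."""
    accession: str
    definition: str
    sequence: str

    def gc_content(self) -> float:
        return sum(self.sequence.count(b) for b in "GC") / len(self.sequence)


fields = parse_flat_file(FLAT_FILE)
record = SequenceRecord(fields["ACCESSION"], fields["DEFINITION"], fields["SEQUENCE"])

# Relational view: the same record as a row in a table that can be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (accession TEXT PRIMARY KEY, definition TEXT, sequence TEXT)"
)
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?)",
    (record.accession, record.definition, record.sequence),
)
row = conn.execute(
    "SELECT accession, length(sequence) FROM records WHERE accession = ?",
    ("DEMO0001",),
).fetchone()

print(record.accession, record.gc_content(), row)
```

The flat file is easy to read and exchange but must be re-parsed on every use; the table supports indexed queries and joins; the object bundles derived values such as GC content with the data.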
Bibliographic Databases
• PubMed
Most Commonly Used Databases
• GenBank by NCBI
• Go to the NCBI website and search the Nucleotide database for a gene (e.g., HBA1 or TP53)
• The information is extracted from GenBank
• EMBL
• Go to ebi.ac.uk and search
• DDBJ
• UniProt
• PDB
Expasy
• The Translate tool: translates a nucleotide (DNA or RNA) sequence into a protein sequence
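To illustrate what a translation tool of this kind computes, here is a minimal sketch that translates a nucleotide sequence in all six reading frames using the standard genetic code. It is not Expasy's implementation; the function names `translate`, `reverse_complement`, and `six_frames` are assumptions, and unknown codons are shown as `X` by choice.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate one reading frame; '*' marks a stop codon, 'X' an unknown codon."""
    dna = seq.upper().replace("U", "T")  # accept RNA input as well
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(seq):
    """Return the reverse complement of a DNA (or RNA) sequence."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq):
    """Translate the three forward and three reverse reading frames."""
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(reverse_complement(seq), offset)
    return frames


frames = six_frames("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

For the input `ATGGCCTAA`, frame +1 reads ATG, GCC, TAA and gives `MA*` (methionine, alanine, stop).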
Entrez
• Entrez (Global Query Cross-Database Search System) is a federated search
engine, or web portal, that allows users to search many discrete
health sciences databases at the National Center for Biotechnology
Information (NCBI) website.
• Text-based search
• All information is integrated: structure, sequence, literature, etc.
• Cross-references across databases
• The name is French for “come in”
• Supports Boolean operators (AND, OR, NOT)
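A Boolean Entrez query can also be sent programmatically through NCBI's E-utilities. The sketch below only builds the ESearch request URL and does not contact the server. The helper name `build_esearch_url` and the example search term are assumptions for illustration; NCBI's usage guidelines (rate limits, API keys) apply to any real request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# ESearch endpoint of the NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build an ESearch URL; 'term' may combine field tags with AND, OR, NOT."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example term (an assumption): restrict by gene name AND organism.
term = "TP53[Gene Name] AND Homo sapiens[Organism]"
url = build_esearch_url("nucleotide", term)
print(url)

# Decode the query string again to check that the term survived URL encoding.
query = parse_qs(urlparse(url).query)
print(query["term"][0])
```

Changing `db` (for example to `pubmed` or `protein`) sends the same Boolean term to a different Entrez database, which mirrors the cross-database search described above.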
Ensembl
• Genome browser
• Demo
BioMart in Ensembl
• For interconversion of gene IDs and gene names
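Once a mapping table has been exported, ID interconversion is a simple lookup. The sketch below assumes a small tab-separated export with the column headers `Gene stable ID` and `Gene name`; the headers, the two-row table, and the helper `load_mapping` are illustrative assumptions, not a call to the BioMart service itself.

```python
import csv
import io

# Assumed tab-separated export with two genes, for illustration only.
EXPORT = (
    "Gene stable ID\tGene name\n"
    "ENSG00000141510\tTP53\n"
    "ENSG00000012048\tBRCA1\n"
)


def load_mapping(tsv_text):
    """Return two dictionaries: stable ID -> gene name, and gene name -> stable ID."""
    id_to_name, name_to_id = {}, {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        gene_id, name = row["Gene stable ID"], row["Gene name"]
        id_to_name[gene_id] = name
        name_to_id[name] = gene_id
    return id_to_name, name_to_id


id_to_name, name_to_id = load_mapping(EXPORT)
print(name_to_id["TP53"])
print(id_to_name["ENSG00000012048"])
```

In a real export one gene name can map to several IDs (and vice versa), so a production version would likely store lists of matches rather than a single value per key.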
Motif
• A sequence motif is a nucleotide or amino-acid sequence pattern that
is widespread and is usually assumed to be related to the biological
function of the macromolecule.
• Explore motif databases (e.g., PROSITE, JASPAR)
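Motif databases such as PROSITE describe many motifs as short patterns. The sketch below converts a PROSITE-style pattern into a regular expression and scans a protein sequence for it, using the N-glycosylation site pattern `N-{P}-[ST]-{P}` as the example. The helper names and the test sequence are assumptions, and only the basic pattern elements (`x`, `[...]`, `{...}`, and repeat counts) are handled.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                 # x = any residue
            core = "."
        elif core.startswith("{"):      # {P} = any residue except P
            core = "[^" + core[1:-1] + "]"
        # [ST] and single letters are already valid regex syntax.
        if low:                         # x(2) or x(2,4) = repeat counts
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start index, matched text) for every match, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


regex_text = prosite_to_regex("N-{P}-[ST]-{P}")
hits = find_motif("N-{P}-[ST]-{P}", "MNGTANPSA")  # hypothetical test sequence
print(regex_text, hits)
```

In the test sequence, `NGTA` at index 1 matches, while `NPSA` is rejected because proline follows the asparagine. The lookahead in `find_motif` is a design choice so that overlapping sites are all reported.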