Nucleic Acid Databases

The document provides an overview of biological databases, specifically focusing on nucleic acid databases and their classifications, including primary and secondary databases based on data types and technical design. It highlights major nucleotide sequence databases such as EMBL, GenBank, and DDBJ, along with additional NCBI databases and resources. The document also offers tips for database searching and exercises for further exploration of the topic.

Uploaded by

Mohamed Hasan
Copyright
© All Rights Reserved

Introduction to Bioinformatics databases: Nucleic Acid Databases

04/04/25 15:52
Biological databases: why?
• The need to store and communicate large datasets has grown.
• Databases make biological data available to scientists.
• Databases make biological data available in computer-readable form.

Different classifications of databases
• Type of data
– nucleotide sequences
– protein sequences
– protein sequence patterns or motifs
– macromolecular 3D structures
– gene expression data
– metabolic pathways

Different classifications of databases….

• Primary or derived databases
– Primary databases: experimental results are deposited directly into the database
– Secondary databases: results of analysis of primary databases
– Aggregates of many databases: links to other data items, combination of data, consolidation of data

Different classifications of databases….

• Technical design
– Flat-files
– Relational database (SQL)
– Exchange/publication technologies (FTP,
HTML, CORBA, XML,...)
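To make the flat-file design concrete, here is a minimal Python sketch of reading one record from an EMBL-style flat file, where each line starts with a two-letter line code and a record ends with `//`. The sample record and the `parse_flat_record` helper are invented for illustration and cover only a few line types; real entries carry many more.

```python
from collections import defaultdict

# Hypothetical, heavily simplified record in the spirit of an EMBL flat file.
SAMPLE_RECORD = """\
ID   DEMO0001; SV 1; linear; mRNA; STD; PLN; 12 BP.
AC   DEMO0001;
DE   Illustrative record, not a real database entry
SQ   Sequence 12 BP;
     acgtacgtac gt        12
//
"""


def parse_flat_record(text):
    """Group the lines of one record by their two-letter line code."""
    fields = defaultdict(list)
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # record terminator
            break
        code, content = line[:2], line[5:].strip()
        if in_sequence and code == "  ":
            # Sequence lines: keep the bases, drop the trailing position counter.
            sequence_parts.extend(t for t in content.split() if not t.isdigit())
        elif code == "SQ":
            in_sequence = True
            fields[code].append(content)
        else:
            fields[code].append(content)
    record = {code: " ".join(parts) for code, parts in fields.items()}
    record["sequence"] = "".join(sequence_parts)
    return record


record = parse_flat_record(SAMPLE_RECORD)
print(record["AC"], record["DE"], record["sequence"])
```

A relational design would instead store the same fields as columns and rows, which makes cross-record queries easier at the cost of a fixed schema.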

Different classifications of databases….

• Availability
– Publicly available, no restrictions
– Available, but with copyright
– Accessible, but not downloadable
– Academic, but not freely available
– Proprietary, commercial; possibly free for
academics

Where do I find a database of interest?

• Nucleic Acids Research (NAR) database index: http://www3.oup.co.uk/nar/database/c/

Nucleotide sequence databases
• EMBL, GenBank, and DDBJ are the three
primary nucleotide sequence
databases
• EMBL www.ebi.ac.uk/embl/
• GenBank www.ncbi.nlm.nih.gov/Genbank/
• DDBJ www.ddbj.nig.ac.jp
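Besides the web pages listed above, records can usually be retrieved programmatically. As a hedged sketch that is not part of the original slides, the Python snippet below builds a request URL for NCBI's E-utilities `efetch` service to obtain a GenBank flat-file record. The helper name `genbank_fetch_url` is made up for this example, and the accession is used only as a sample; check the current NCBI documentation before relying on the parameters.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service (assumption: not from the slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession):
    """Build a URL asking for one nucleotide record in GenBank flat-file format."""
    params = {
        "db": "nuccore",     # nucleotide collection
        "id": accession,     # accession number of the record
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


url = genbank_fetch_url("U49845")
print(url)
```

The URL can then be opened with any HTTP client. The sketch stops short of the network call so that it runs offline.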

GenBank
• An annotated collection of all publicly available nucleotide and protein sequences.
• Set up in 1979 at LANL (Los Alamos).
• Maintained since 1992 by the NCBI (Bethesda).
• http://www.ncbi.nlm.nih.gov

EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide and protein sequences.
• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
• Maintained since 1994 by the EBI (Cambridge).
• http://www.ebi.ac.uk/embl.html
• Database statistics: http://www3.ebi.ac.uk/Services/DBStats/

DDBJ (DNA Data Bank of Japan)
• An annotated collection of all publicly available nucleotide and protein sequences.
• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
• Still maintained at this institute by a team led by Takashi Gojobori.
• http://www.ddbj.nig.ac.jp

Other NCBI nucleic acid databases
• EST database: A collection of expressed sequence tags, i.e. short, single-pass sequence reads from mRNA (cDNA).
• GSS database: A database of genome survey sequences, i.e. short, single-pass genomic sequences.
• HomoloGene: A gene homology tool that compares nucleotide sequences between pairs of organisms in order to identify putative orthologs.
• HTG database: A collection of high-throughput genome sequences from large-scale genome sequencing centers, including unfinished and finished sequences.
• SNP database: A central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.
• RefSeq: A database of non-redundant reference sequence standards, including genomic DNA contigs, mRNAs, and proteins for known genes. Multiple collaborations, both within NCBI and with external groups, support data-gathering efforts.
• STS database: A database of sequence tagged sites, i.e. short sequences that are operationally unique in the genome.
• UniSTS: A unified, non-redundant view of sequence tagged sites (STSs).
• UniGene: A collection of ESTs and full-length mRNA sequences organized into clusters, each representing a unique known or putative human gene annotated with mapping and expression information and cross-references to other sources.

Sequence submission
• Data come mainly from direct submissions by the authors.
• Submissions are made through the Internet:
– Web forms.
– Email.
• Sequences are shared/exchanged between the 3 centers on a daily basis:
– The sequence content of the banks is identical.
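Because the exchange happens daily, a very recent submission may not yet appear in all three banks. The Python sketch below shows one way to check this on sets of accession numbers. The accessions and the `compare_banks` helper are hypothetical and stand in for lists you would obtain from each database.

```python
def compare_banks(banks):
    """Return the accessions shared by all banks and those missing from each bank."""
    all_ids = set().union(*banks.values())
    shared = set.intersection(*banks.values())
    missing = {name: sorted(all_ids - ids) for name, ids in banks.items()}
    return shared, missing


# Hypothetical accession sets, invented for illustration only.
banks = {
    "EMBL": {"DEMO0001", "DEMO0002", "DEMO0003"},
    "GenBank": {"DEMO0001", "DEMO0002", "DEMO0003"},
    "DDBJ": {"DEMO0001", "DEMO0002"},
}

shared, missing = compare_banks(banks)
print(sorted(shared))
print(missing)
```

An empty `missing` list for every bank would indicate that the compared sets are fully synchronized.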
Derived databases
• CUTG: Codon usage tabulated from GenBank
  http://www.kazusa.or.jp/codon/
• Genetic Codes: Deviations from the standard genetic code in various organisms and organelles
  http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
• TIGR Gene Indices: Organism-specific databases of EST and gene sequences
  http://www.tigr.org/tdb/tgi.shtml
• UniGene: Unified clusters of ESTs and full-length mRNA sequences
  http://www.ncbi.nlm.nih.gov/UniGene/
• ASAP: Alternatively spliced isoforms
  http://www.bioinformatics.ucla.edu/ASAP
• Intronerator: Introns and alternative splicing in C. elegans and C. briggsae
  http://www.cse.ucsc.edu/~kent/intronerator/
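A derived database such as CUTG is computed from primary sequence records. As a rough illustration of the idea, and not the actual CUTG pipeline, the Python sketch below tabulates codon usage for one coding sequence. The `codon_usage` helper and the toy sequence are invented for this example; reporting frequencies per thousand codons is a common convention and is assumed here.

```python
from collections import Counter


def codon_usage(cds):
    """Count codons in a coding sequence and report (count, frequency per thousand)."""
    seq = cds.upper().replace("U", "T")   # accept RNA or DNA letters
    usable = len(seq) - len(seq) % 3      # ignore an incomplete trailing codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: (n, 1000.0 * n / total) for codon, n in counts.items()}


# Toy coding sequence: ATG GCT GCT TAA
usage = codon_usage("ATGGCTGCTTAA")
for codon, (count, per_thousand) in sorted(usage.items()):
    print(codon, count, per_thousand)
```

Summing such tables over all coding sequences of an organism gives the kind of per-organism codon usage table that a derived database can publish.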

Nucleic acid structure databases
• NDB: Nucleic acid-containing structures
  http://ndbserver.rutgers.edu/
• NTDB: Thermodynamic data for nucleic acids
  http://ntdb.chem.cuhk.edu.hk/
• RNABase: RNA-containing structures from PDB and NDB
  http://www.rnabase.org/
• SCOR: Structural classification of RNA: RNA motifs by structure, function and tertiary interactions
  http://scor.lbl.gov/

Database searching tips
• Look for links to Help or Examples
• Try Boolean searches
• Be careful with UK/US spelling differences
– leukaemia vs leukemia
– haemoglobin vs hemoglobin
– colour vs color
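The Boolean and spelling tips can be combined: join both spellings of a term with OR, then link the terms with AND. The Python sketch below shows the idea. The `expand_term` and `boolean_query` helpers are hypothetical, and the exact Boolean syntax accepted varies between databases, so treat the output as a template.

```python
# UK/US spelling pairs taken from the examples above.
SPELLING_VARIANTS = {
    "leukaemia": "leukemia",
    "haemoglobin": "hemoglobin",
    "colour": "color",
}


def expand_term(term):
    """Replace a term by an OR group when a UK/US spelling variant is known."""
    lower = term.lower()
    for uk, us in SPELLING_VARIANTS.items():
        if lower in (uk, us):
            return f"({uk} OR {us})"
    return term


def boolean_query(terms, operator="AND"):
    """Join the (possibly expanded) terms with a Boolean operator."""
    return f" {operator} ".join(expand_term(t) for t in terms)


query = boolean_query(["leukaemia", "human"])
print(query)  # (leukaemia OR leukemia) AND human
```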

Exercises
• Study the statistics of the three primary nucleic acid databases: do they match?
• Look for a gene of interest in the three primary nucleic acid databases and compare the information given in each of them.
• Read the NAR database paper and browse the NAR database index site: search for different nucleic acid databases using different search terms.
• Self study:
– http://www3.oup.co.uk/nar/database/c/
– Download the NAR database paper (NARDB2004) from:
ftp://cbag.sc.mahidol.ac.th/pub/Course_Materials/dinesh
