
Genome Database & Information System for Daphnia

This document discusses requirements for developing a genome database and information system for Daphnia. It outlines the key components needed, including data types, standardized schemas, software architecture, analysis tools, publication interfaces, and data management interfaces. Examples of existing genome databases like FlyBase and euGenes are provided to illustrate the anatomy and components of a successful genome information system.


Genome database & information system for Daphnia
• Don Gilbert, [email protected], October 2002
• Talk doc at http://iubio.bio.indiana.edu/daphnia/docs/genome-dbs-talk.doc, .ppt
Genome database examples
• Drosophila: FlyBase, http://flybase.net/ (Indiana Univ.)
• C. elegans: WormBase, http://www.wormbase.org/
• Mouse: MGD, http://www.informatics.jax.org/
• Saccharomyces: SGD, http://genome-www.stanford.edu/Saccharomyces/
• Human: LocusLink, http://www.ncbi.nlm.nih.gov/LocusLink/
• Human: GeneCards, http://bioinfo.weizmann.ac.il/cards/
• Various eukaryotes: Ensembl, http://www.ensembl.org/
• Various eukaryotes: euGenes, http://eugenes.org/ (Indiana Univ.)
• Many newly developing organism genome systems for Daphnia, insects, vertebrates, new full-genome organisms
Anatomy of genome database & info system
Anatomy of Genome DB/IS
• Structure
– Complex document structure; tabular data; etc.
– Organize: Table of contents, Reports, Indexing
– Browse contents; Search / retrieve from biological questions
– Bulk data search / retrieve for bioinformatics

• Content
– Literature (abstracted and curated), sequence and feature analyses, maps, controlled vocabulary/ontologies, people, biologics, contacts, etc.
– Metadata describing primary data, along with protocols, notes, sources
Anatomy of Genome DB/IS, 2
• Data exchange
– Data definitions & schema (XML)
– Controlled vocabularies of science terms, ontologies
– Minimal information for collaboration, sharing

• Informatics / software
– Backend database, data collection, management, analyses
– Front-end services (hypertext web, search/retrieval); ease of understanding and usage (HCI)
– Middleware software, interfaces
– Genome specialized: maps, BLAST searches, ontologies
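The data exchange items above (an XML data definition plus a controlled vocabulary) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not part of the talk: the element and attribute names, the vocabulary terms in FEATURE_TYPES, and the sample identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary of feature types (illustrative terms only).
FEATURE_TYPES = {"gene", "microsatellite", "mRNA"}


def make_feature_record(feature_id, feature_type, organism, source):
    """Build a small XML record, rejecting types outside the vocabulary."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type!r}")
    record = ET.Element("feature", id=feature_id, type=feature_type)
    ET.SubElement(record, "organism").text = organism
    ET.SubElement(record, "source").text = source
    return record


def parse_feature_record(xml_text):
    """Read a record back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "type": root.get("type"),
        "organism": root.findtext("organism"),
        "source": root.findtext("source"),
    }


# Made-up sample values, serialized and parsed again as an exchange round trip.
record = make_feature_record("demo0001", "microsatellite", "Daphnia", "example-lab")
xml_text = ET.tostring(record, encoding="unicode")
parsed = parse_feature_record(xml_text)
print(xml_text)
```

Checking terms against a shared vocabulary before a record is written is one simple way to keep collaborating sites consistent; a real project would more likely publish the vocabulary and schema as separate shared files.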
GMOD - Generic genome database tools
• Generic Model Organism Database Construction Set, http://www.gmod.org/
• Database schemas
• Literature curation tools
• Gene ontology management tools
• Visualization tools
• Data processing pipelines
FlyBase and euGenes
FlyBase.net
• Distributed project (4 sites, ~6 PIs, ~15 curators, ~15 informaticians); 10 years old
• Multiple databases; project data flow and exchange critical
• Curated and computed data, from experimental literature and genome sequence
• Integrated database modules (for generic use with GMOD)
– Genetics, Sequences, Maps, Expression
– Controlled vocabularies & Ontologies
– Computational analyses
– Organism, taxonomy, phylogenetic/comparative
– Publications, General
euGenes.org
• Automated genome summaries for Human, Fruitfly, Mouse, Mosquito, Arabidopsis, C. elegans, Saccharomyces, Zebrafish
• 3-year computational DB project, 1 part-time informatician (dgg)
• Genome maps, sequences, gene reports, external database links
• Cross-species comparisons: similar genes, genome features, gene function
A genome web db for Daphnia
Preliminary example
• http://iubio.bio.indiana.edu/daphnia/
• Sample data include microsatellite DNA of J. Colbourne, GenBank Daphnia seqs, Medline abstracts
• BLAST searches, reports
• Text data searches
Requirements for a genome db/info system
• Data components??
– biosequence types, literature, external data (insects, others), expression info, pathways, maps, anatomy, populations, species, ecology, organismal, stocks, people
– Standard data structure and exchange schema (sequences, XML)

• Architecture
– Internet-shared, standards-based, open-source preferred
– Relational database for data management
– Search and retrieval software for flat file data
– Flexible – data schema changes common
– Performance constraints
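One way to reconcile "relational database for data management" with "data schema changes common" is to keep a small fixed core table and put changeable attributes in a key/value property table. The sketch below uses Python's built-in sqlite3 module only to stay self-contained; the table names, column names, and sample rows are hypothetical and not taken from the talk or from any existing schema.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    type       TEXT NOT NULL
);
CREATE TABLE feature_property (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    prop_key   TEXT NOT NULL,
    prop_value TEXT NOT NULL
);
""")


def add_feature(conn, name, ftype, **props):
    """Insert a feature plus any number of free-form properties."""
    cur = conn.execute(
        "INSERT INTO feature (name, type) VALUES (?, ?)", (name, ftype)
    )
    fid = cur.lastrowid
    conn.executemany(
        "INSERT INTO feature_property (feature_id, prop_key, prop_value) "
        "VALUES (?, ?, ?)",
        [(fid, key, str(value)) for key, value in props.items()],
    )
    return fid


def get_feature(conn, name):
    """Return a feature and its properties as one dictionary, or None."""
    row = conn.execute(
        "SELECT feature_id, type FROM feature WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    fid, ftype = row
    props = dict(conn.execute(
        "SELECT prop_key, prop_value FROM feature_property WHERE feature_id = ?",
        (fid,),
    ))
    return {"name": name, "type": ftype, **props}


# A new kind of attribute needs no ALTER TABLE, only a new property key.
add_feature(conn, "demo-msat-1", "microsatellite", repeat_unit="CA", length=24)
feature = get_feature(conn, "demo-msat-1")
print(feature)
```

The trade-off is that property values are stored as untyped text, so this layout buys flexibility at some cost in query performance and validation, which is where the "performance constraints" item comes in.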
Requirements for genome system, cont.
• Analysis software
– Project uses: sequence analyses, external database comparisons
– One-time analyses, publishing results
– Pipeline for automated analyses, rerun as needed
– Public uses (e.g. BLAST search)

• Publication interface
– Detailed biological object views (sequences, genes, etc.)
– Queries: simple-common, ad-hoc/general
– Graphic viewers

• Editing / data management interface
– Interactive – document editing
– Batch data updates
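The "pipeline for automated analyses, rerun as needed" item can be sketched as a list of steps that are re-executed only when their input has changed. This is a minimal illustration under stated assumptions: the Pipeline class, the two step functions, and the sample sequences are all hypothetical, and a production pipeline would persist its cache and call real analysis programs.

```python
import hashlib


class Pipeline:
    """Run named steps in order, skipping any step whose input is unchanged."""

    def __init__(self):
        self.steps = []   # list of (name, function)
        self.cache = {}   # step name -> (input digest, result)
        self.log = []     # (step name, "ran" or "cached") for each call

    def add_step(self, name, func):
        self.steps.append((name, func))

    def run(self, data):
        for name, func in self.steps:
            # Digest of the step's input; repr() is adequate for small dicts.
            digest = hashlib.sha256(repr(data).encode()).hexdigest()
            cached = self.cache.get(name)
            if cached is not None and cached[0] == digest:
                data = cached[1]
                self.log.append((name, "cached"))
            else:
                data = func(data)
                self.cache[name] = (digest, data)
                self.log.append((name, "ran"))
        return data


def clean(seqs):
    """Strip whitespace and uppercase each sequence."""
    return {name: s.strip().upper() for name, s in seqs.items()}


def gc_content(seqs):
    """Fraction of G and C bases per sequence (0.0 for an empty sequence)."""
    return {
        name: round((s.count("G") + s.count("C")) / max(len(s), 1), 3)
        for name, s in seqs.items()
    }


pipeline = Pipeline()
pipeline.add_step("clean", clean)
pipeline.add_step("gc_content", gc_content)

seqs = {"seq1": "acgt", "seq2": "ggcc "}   # made-up sample input
first = pipeline.run(seqs)                  # both steps execute
second = pipeline.run(seqs)                 # both steps come from the cache
print(first, pipeline.log)
```

Rerunning with a changed input invalidates the first step, and later steps rerun only if the output of the step before them actually differs.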
Compute parts of system
• Web server (Apache) and modules
• FTP server for bulk data exchange
• Relational DBMS: PostgreSQL.org, MySQL.com, Oracle, etc.
• Analysis programs: BLAST, various bioinformatics tools
• Perl, Java middleware for data access & analysis, search and report
• Limited, secure access for project data management
• Public access for released data (web, ftp)
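The middleware role listed above (data access, search and report over flat-file data) can be sketched in a few functions. The talk names Perl and Java for this layer; Python is used here only to keep the example short, and the function names, the tab-separated report layout, and the two sample records are made up for illustration.

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (identifier, description, sequence) from FASTA-formatted lines."""
    ident, desc, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if ident is not None:
                yield ident, desc, "".join(chunks)
            header = line[1:].split(None, 1) or [""]
            ident = header[0]
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif line:
            chunks.append(line)
    if ident is not None:
        yield ident, desc, "".join(chunks)


def build_index(records):
    """Map each lowercase description word to the set of record identifiers."""
    index = defaultdict(set)
    for ident, desc, _seq in records:
        for word in desc.lower().split():
            index[word].add(ident)
    return index


def search(index, query):
    """Return identifiers whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


def report(records, hits):
    """Format matching records as tab-separated lines: id, length, description."""
    return [
        f"{ident}\t{len(seq)}\t{desc}"
        for ident, desc, seq in records
        if ident in hits
    ]


# Made-up flat-file content standing in for a released sequence file.
flat_file = io.StringIO(
    ">demo1 Daphnia microsatellite clone A\nACGTACGT\nACGT\n"
    ">demo2 Daphnia mitochondrial fragment\nGGCC\n"
)
records = list(read_fasta(flat_file))
index = build_index(records)
hits = search(index, "daphnia microsatellite")
lines = report(records, hits)
print("\n".join(lines))
```

In a deployed system the same three stages (parse, index, report) would sit behind the web server, with the index stored on disk rather than rebuilt on each request.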
