LO4 Access To Sequenced Data and Related Information


Bio16 Computational Biology

Access to Sequenced Data and Related Information

Prepared by:
Joseph Martin Q. Paet
Biology Department, College of Science
Bicol University

Sources of Data and Databases


Researchers collect lots of data
▪ Diverse living organisms
▪ Hundreds of thousands of species
▪ Scientifically published experiments producing millions of articles
▪ High-throughput technologies like PCR, sequencing, and molecular assays
▪ Data quality assessment data (even before analysis)

BIOLOGICAL DATABASES = Library of related information
Goals: collection and preservation, easy access, standardized data presentation, minimized redundancy, data independence, management, updating, and organizing data into knowledge

Figure: Growth of DNA sequence in repositories
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Centralized Databases Store DNA Sequences

Three Main Nucleotide Sequence Databases

GenBank = National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) in Bethesda

European Nucleotide Archive (ENA) = European Molecular Biology Laboratory (EMBL)-Bank Nucleotide Sequence Database at the European Bioinformatics Institute (EBI) in Hinxton, England

DNA Data Bank of Japan (DDBJ) = National Institute of Genetics in Mishima

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Other Common Biological Databases

Integration of Biological Databases
Figure: Tier 1, Tier 2, and Tier 3

Challenges
▪ Database architecture = similar structure
▪ How to access and what can be accessed = data surfing
▪ Naming system (S. cerevisiae RAD24 = rad17 in S. pombe)
▪ Clash of concepts = definitions of terms (for example, the definition of GENE)

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065

Integration of Biological Databases


Approaches

Link Integration
▪ Researchers begin their query with one data source and then follow hypertext links to related information in other data sources
▪ Vulnerable to naming clashes, ambiguities, and updates; depends on the researcher to follow the links

View Integration
▪ Leaves the information in its source databases but builds an environment around the databases that makes them all seem to be part of one large system
▪ Does not perform as well as the source databases

Data Warehousing
▪ Brings all the data under one roof in a single database
▪ Issue: keeping the data warehouse up to date

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065
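
The difference between link integration and data warehousing can be sketched with two toy in-memory sources. This is only an illustration under stated assumptions: the record identifiers (`SEQ1`, `LIT1`), the titles, and the function names are hypothetical and do not come from any real database.

```python
# Toy in-memory "databases"; every record here is made up for illustration.
SEQUENCE_DB = {
    "SEQ1": {"gene": "RAD24", "organism": "S. cerevisiae", "link": "LIT1"},
}
LITERATURE_DB = {
    "LIT1": {"title": "Hypothetical checkpoint paper"},
}


def follow_link(seq_id):
    """Link integration: start in one source, then follow its link to another."""
    record = SEQUENCE_DB[seq_id]
    return LITERATURE_DB[record["link"]]


def build_warehouse():
    """Data warehousing: copy everything into one combined table."""
    warehouse = {}
    for seq_id, record in SEQUENCE_DB.items():
        merged = dict(record)
        merged["literature"] = LITERATURE_DB[record["link"]]
        warehouse[seq_id] = merged
    return warehouse


warehouse = build_warehouse()

# A source is updated after the warehouse was built.
LITERATURE_DB["LIT1"] = {"title": "Updated title"}

print(follow_link("SEQ1")["title"])              # follows the live link
print(warehouse["SEQ1"]["literature"]["title"])  # still the old snapshot
```

The last two lines show the trade-off listed above: the link always reaches the current source record, while the warehouse copy has to be rebuilt to stay up to date.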

Contents of Databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data

Genomic Databases

Sequence-Tagged Sites (STSs) = short (typically 500 base pairs long) genomic landmark sequences

Genome Survey Sequences (GSSs) = sequences that are genomic in origin

High-Throughput Genomic Sequence (HTGS) = unfinished DNA sequences from sequencing centers

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data

RNA Databases

Expressed Sequence Tags (ESTs) = sequence data from “single-pass” cDNA sequences

UniGene = (unique gene) gene-oriented clusters created by making nonredundant sets of ESTs

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data
Protein Databases

UniProt = Universal Protein Resource; the most comprehensive, centralized protein sequence catalog

Key Databases:
▪ Swiss-Prot = considered the best-annotated protein database (structure and function)
▪ TrEMBL = Translated EMBL Nucleotide Sequence Data Library; provides automated annotations of proteins
▪ PIR = Protein Information Resource; maintains the Protein Sequence Database, which is also curated by experts

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


Types of Biological Data

Protein Databases

UniProt = Universal Protein Resource; the most comprehensive, centralized protein sequence catalog

Key Database Layers:
▪ UniProtKB = UniProt Knowledgebase; the central database, holding both manual and automated annotations
▪ UniRef = UniProt Reference Clusters; nonredundant reference clusters based on UniProtKB
▪ UniParc = UniProt Archive; a stable, nonredundant archive of protein sequences

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
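
The idea of a nonredundant archive can be sketched as follows: identical sequences from different sources collapse into a single entry that remembers where each copy came from. This is a minimal sketch, not how UniParc is actually implemented; the use of SHA-256 as the checksum, the function name, and the source identifiers (`dbA:1` and so on) are assumptions made for illustration.

```python
import hashlib


def build_nonredundant_archive(records):
    """Collapse identical sequences into one entry keyed by a checksum.

    records: iterable of (source_id, sequence) pairs.
    Returns {checksum: {"sequence": str, "sources": [source_id, ...]}}.
    """
    archive = {}
    for source_id, sequence in records:
        # Normalize so that case and surrounding whitespace do not create duplicates.
        normalized = sequence.strip().upper()
        checksum = hashlib.sha256(normalized.encode("ascii")).hexdigest()
        entry = archive.setdefault(checksum, {"sequence": normalized, "sources": []})
        entry["sources"].append(source_id)
    return archive


# Hypothetical records: the first two carry the same protein sequence.
records = [("dbA:1", "MKTAYIAK"), ("dbB:7", "mktayiak"), ("dbA:2", "MKVLAAGI")]
archive = build_nonredundant_archive(records)

for entry in archive.values():
    print(entry["sequence"], entry["sources"])
```

Three input records yield two archive entries, and the shared sequence lists both of its sources.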

Central Bioinformatics Resources

National Center for Biotechnology Information (NCBI)
▪ Creates public databases
▪ Conducts research in computational biology
▪ Develops software tools for analyzing genome data
▪ Disseminates biomedical information

European Bioinformatics Institute (EBI)
▪ Comparable to NCBI in its scope and mission
▪ Represents a complementary, independent resource
▪ Has six (6) core molecular databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


NCBI Prominent Resources

National Center for Biotechnology Information (NCBI) (n.d.). Guide. https://ncbi.nlm.nih.gov/guide/all/

Access to Information

Entrez = a molecular biology database system that provides integrated access to the NCBI databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
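
Entrez can also be reached programmatically through the NCBI E-utilities web service, which the slides do not cover. The sketch below only builds a search URL and sends no request. The `esearch.fcgi` endpoint and its `db`, `term`, and `retmax` parameters belong to E-utilities; the function name and the example search term are assumptions chosen for illustration.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, term, retmax=20):
    """Build an ESearch URL for one Entrez database (no network access)."""
    params = {"db": database, "term": term, "retmax": retmax}
    # urlencode escapes spaces and brackets so the term is safe inside a URL.
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = esearch_url("protein", "hemoglobin AND human[ORGN]")
print(url)
```

Changing the `database` argument (for example to `nucleotide`) points the same query at a different Entrez database, which is the integrated access described above.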


Access to Information

Searching Databases

Boolean Operators: AND, OR, NOT
Quotation Marks “ ” = search for a specific phrase
Parentheses ( ) = process the enclosed terms as a unit rather than sequentially
Use TaxBrowser to find the taxonomy identifier (human = txid9606), or restrict by organism with human[ORGN]
Limiters (characteristics of the query)
Asterisk * = truncates the query to match terms that begin/end with a particular text string
Identify the database to search

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
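
The search rules above can be combined into a single query string. The helper functions below are hypothetical names introduced only for this sketch; they assemble text and do not contact any database, and the search terms are illustrative.

```python
def phrase(text):
    """Quotation marks: search for the exact phrase."""
    return f'"{text}"'


def any_of(*terms):
    """Parentheses plus OR: the group is processed as one unit."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """AND: every term must match."""
    return " AND ".join(terms)


def organism(name):
    """Restrict the search to one organism with the [ORGN] field tag."""
    return f"{name}[ORGN]"


def excluding(query, term):
    """NOT: drop records that match the term."""
    return f"{query} NOT {term}"


query = excluding(
    all_of(any_of(phrase("beta globin"), "hemoglobin"), organism("human")),
    "partial",
)
print(query)  # ("beta globin" OR hemoglobin) AND human[ORGN] NOT partial
```

Without the parentheses, the operators would be applied in sequence, so the OR group would no longer be treated as one unit.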

Access to Information

Accession Numbers = a string of about 4–12 numbers and/or alphabetic characters that is associated with a molecular sequence record, expression record, or structure

GenInfo Identifier (GI number, e.g., GI:12345678) = assigned consecutively to each sequence that is processed

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
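
Sequence records are often cited as an accession number followed by a version suffix, such as `NM_000518.5`. A minimal sketch of separating the two parts is shown below; the regular expression and the function name are assumptions for illustration, not an official NCBI parser.

```python
import re

# Accession, a dot, then a numeric version (for example "NM_000518.5").
ACCESSION_VERSION = re.compile(r"^([A-Za-z0-9_]+)\.(\d+)$")


def split_accession(identifier):
    """Return (accession, version); version is None when no suffix is present."""
    text = identifier.strip()
    match = ACCESSION_VERSION.match(text)
    if match is None:
        return text, None
    return match.group(1), int(match.group(2))


print(split_accession("NM_000518.5"))  # ('NM_000518', 5)
print(split_accession("NM_000518"))    # ('NM_000518', None)
```

The accession stays the same when a record is revised, while the version number increases, so storing both parts separately makes it easier to detect an updated sequence.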


Access to Information

Reference Sequence (RefSeq) Project
▪ Provides the best representative sequence for each normal (i.e., nonmutated) transcript produced by a gene and for each normal protein product
▪ Entries are curated by the staff at NCBI and are nearly nonredundant
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
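
RefSeq accessions carry a two-letter prefix and an underscore that indicate the kind of molecule. The table below lists several commonly documented prefixes; it is not taken from the slides and is not exhaustive, and the function name is a hypothetical helper.

```python
# A partial table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (for example, a chromosome)",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA model",
    "XR_": "predicted non-coding RNA model",
    "XP_": "predicted protein model",
}


def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq accession, or None if unrecognized."""
    prefix = accession.strip()[:3].upper()
    return REFSEQ_PREFIXES.get(prefix)


print(refseq_molecule_type("NM_000518.5"))  # mRNA
print(refseq_molecule_type("NP_000509.1"))  # protein
print(refseq_molecule_type("U12345"))       # None (not in this table)
```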

Access to Information
Format Display
Figure: annotated sequence record with callouts for General Description, View Options, Accession # & Version, and Related Literature

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
National Center for Biotechnology Information (NCBI) (n.d.). Nucleotide. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/nuccore


Access to Information

Locus Reference Genomic (LRG) Project
▪ Defines genomic sequences that can be used as reference standards for genes, representing a standard allele

Consensus Coding Sequence (CCDS) Project
▪ Established to identify a core set of protein-coding sequences that provide a basis for a standard set of gene annotations; the “gold standards” of best-supported gene and protein annotations

Vertebrate Genome Annotation (VEGA) Project
▪ Offers high-quality, manual (expert) annotation of the human and mouse genomes, as well as selected other vertebrate genomes

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Access to Information
Genome Browsers
Databases with a graphical interface representing sequence information and other data as a function of position across the chromosomes. Principal genome browsers are:

University of California, Santa Cruz (UCSC) Genome Browser
▪ Supports the analysis of dozens of vertebrate and invertebrate genomes
▪ Provides graphical views of chromosomal locations at various levels of resolution
▪ Each chromosomal view is accompanied by horizontally oriented annotation tracks

Ensembl Genome Browser
▪ Offers a series of comprehensive websites emphasizing a variety of eukaryotic organisms

Map Viewer at NCBI
▪ Includes chromosomal maps for a variety of organisms
▪ (1) Home page, (2) genome view, (3) map view, (4) sequence view

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
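
Presenting data "as a function of position across the chromosomes" amounts to storing each annotation with coordinates and retrieving everything that overlaps the region on screen. The sketch below shows that lookup under stated assumptions: the track names, feature names, and coordinates are hypothetical, and real browsers use indexed storage rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    track: str   # annotation track the feature belongs to
    chrom: str   # chromosome name
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    name: str


def features_in_region(features, chrom, start, end):
    """Return features on `chrom` that overlap the window [start, end]."""
    return [
        f for f in features
        if f.chrom == chrom and f.start <= end and f.end >= start
    ]


# Hypothetical annotations on two tracks.
features = [
    Feature("genes", "chr1", 1000, 5000, "geneA"),
    Feature("genes", "chr1", 8000, 9000, "geneB"),
    Feature("variants", "chr1", 4500, 4500, "snp1"),
    Feature("genes", "chr2", 1000, 2000, "geneC"),
]

visible = features_in_region(features, "chr1", 4000, 8500)
for f in visible:
    print(f.track, f.name, f.start, f.end)
```

Zooming in or out corresponds to calling the same function with a narrower or wider window, and each distinct `track` value would be drawn as its own horizontal row.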


References:
National Center for Biotechnology Information (NCBI) (n.d.). Bookshelf. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/books/

National Center for Biotechnology Information (NCBI) (n.d.). Guide. Retrieved July 16, 2023 from https://ncbi.nlm.nih.gov/guide/all/

National Center for Biotechnology Information (NCBI) (n.d.). Nucleotide. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/nuccore

National Center for Biotechnology Information (NCBI) (n.d.). OMIM. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/omim

National Center for Biotechnology Information (NCBI) (n.d.). Structure. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/structure

National Center for Biotechnology Information (NCBI) (n.d.). Taxonomy. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/taxonomy

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065
