LO4 Access To Sequenced Data and Related Information

Bio16 Computational Biology

Access to Sequenced Data and Related Information

Prepared by:
Joseph Martin Q. Paet
Biology Department, College of Science
Bicol University

Sources of Data and Databases

Researchers collect lots of data
Diverse living organisms = hundreds of thousands of species
Scientifically published experiments producing millions of articles
High-throughput technologies like PCR, sequencing, and molecular assays
Data quality assessment (even before analysis)

BIOLOGICAL DATABASES = Library of related information
Goals: collection and preservation, easy access, standardized data presentation, minimized redundancy, data independence, management, updating, and organizing data into knowledge

[Figure: Growth of DNA sequence in repositories]
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Centralized Databases Store DNA Sequences

3 Main Nucleotide Sequence Databases

GenBank = National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH) in Bethesda, Maryland

European Nucleotide Archive (ENA) = European Molecular Biology Laboratory (EMBL)-Bank Nucleotide Sequence Database at the European Bioinformatics Institute (EBI) in Hinxton, England

DNA Data Bank of Japan (DDBJ) = National Institute of Genetics in Mishima, Japan

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Other Common Biological Databases

Integration of Biological Databases
[Figure: Tier 1, Tier 2, Tier 3]

Challenges
Database architecture = similar structure
How to access & what can be accessed = data surfing
Naming system (e.g., RAD24 in S. cerevisiae = rad17 in S. pombe)
Clash of concepts = definitions of terms (definition of GENE)

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065
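
The naming-system challenge can be illustrated with a minimal sketch. Only the RAD24/rad17 pair comes from the slide; the lookup table, the group label, and both function names are hypothetical and exist only for illustration.

```python
# Hypothetical, hand-made cross-reference table.
# Only the RAD24 / rad17 pair is taken from the slide; "group_1" is a made-up label.
EQUIVALENT_GENES = {
    ("S. cerevisiae", "RAD24"): "group_1",
    ("S. pombe", "rad17"): "group_1",
}


def naive_match(a, b):
    """Compare two (organism, symbol) pairs by symbol only, ignoring organism."""
    return a[1].lower() == b[1].lower()


def curated_match(a, b):
    """Compare two (organism, symbol) pairs through the curated table."""
    group_a = EQUIVALENT_GENES.get(a)
    group_b = EQUIVALENT_GENES.get(b)
    return group_a is not None and group_a == group_b


budding = ("S. cerevisiae", "RAD24")
fission = ("S. pombe", "rad17")

print(naive_match(budding, fission))    # name-only linking misses the pair
print(curated_match(budding, fission))  # a curated mapping recovers it
```

A link between two databases that relies on gene names alone would behave like `naive_match`, which is one reason integration needs a shared identifier or a curated mapping.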

Integration of Biological Databases

Approaches

Link Integration
▪ Researchers begin their query with one data source and then follow hypertext links to related information in other data sources
▪ Vulnerable to naming clashes, ambiguities, and updates; researcher-dependent

View Integration
▪ Leaves the information in its source databases but builds an environment around the databases that makes them all seem to be part of one large system
▪ Has not performed as well as the source databases

Data Warehousing
▪ Brings all the data under one roof in a single database
▪ Keeping the data warehouse up to date is an issue

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065

Contents of Databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Contents of Databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data

Genomic Databases

Sequence-Tagged Sites (STSs) = short (typically 500 base pairs long) genomic landmark sequences

Genome Survey Sequences (GSSs) = sequences that are genomic in origin

High-Throughput Genomic Sequence (HTGS) = unfinished DNA sequences from sequencing centers

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data

RNA Databases

Expressed Sequence Tags (ESTs) = sequence data from “single-pass” cDNA sequences

UniGene = (unique gene) gene-oriented clusters created by making nonredundant sets of ESTs

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data
Protein Databases

UniProt = aka Universal Protein Resource; the most comprehensive, centralized protein sequence catalog
Key Databases:
▪ Swiss-Prot = considered the best-annotated protein database (structure and function)
▪ TrEMBL = translated EMBL Nucleotide Sequence Data Library (EMBL-NSDL); provides automated annotations of proteins
▪ PIR = aka Protein Information Resource; maintains the Protein Sequence Database, also curated by experts

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Types of Biological Data

Protein Databases

UniProt = aka Universal Protein Resource; the most comprehensive, centralized protein sequence catalog
Key Database Layers:
▪ UniProtKB = UniProt Knowledgebase; the central database, with either manual or automated annotations
▪ UniRef = UniProt Reference Clusters; offer nonredundant reference clusters based on UniProtKB
▪ UniParc = UniProt Archive; a stable, nonredundant archive of protein sequences

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Central Bioinformatics Resource

National Center for Biotechnology Information (NCBI)
▪ creates public databases
▪ conducts research in computational biology
▪ develops software tools for analyzing genome data
▪ disseminates biomedical information

European Bioinformatics Institute (EBI)
▪ comparable to NCBI in its scope and mission
▪ represents a complementary, independent resource
▪ has six (6) core molecular databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

NCBI Prominent Resources

National Center for Biotechnology Information (NCBI) (n.d.). Guide. https://ncbi.nlm.nih.gov/guide/all/

Access to Information

Entrez = a molecular biology database system that provides integrated access to NCBI databases

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Access to Information

Searching Databases

Boolean Operators: AND, OR, NOT

Quotation Marks “ ” = specific phrase
Parentheses ( ) = process as a unit rather than sequentially
Use TaxBrowser for the taxonomy identifier (human = txid9606) or human[ORGN]
Limiters (characteristics of the query)
Asterisk * = truncates a query to match terms that begin/end with a particular text string
Identify the database

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
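
The search rules above can be combined in a short sketch that assembles an Entrez query string and the corresponding E-utilities `esearch` URL without sending any request. The search phrase, the excluded word, and the helper names `build_term` and `esearch_url` are assumptions made for illustration; only the operators and the `[ORGN]` limiter come from the slide.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(phrase, organism=None, exclude=None):
    """Build a query: quoted phrase, optional organism limiter, optional NOT clause."""
    term = f'"{phrase}"'                          # quotation marks = exact phrase
    if organism:
        term = f"({term}) AND {organism}[ORGN]"   # parentheses group, AND narrows
    if exclude:
        term = f"{term} NOT {exclude}"            # NOT removes unwanted records
    return term


def esearch_url(db, term):
    """Return the esearch URL for a database and query (no network access here)."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


# Hypothetical example query; "beta globin" and "pseudogene" are illustrative only.
term = build_term("beta globin", organism="human", exclude="pseudogene")
url = esearch_url("nuccore", term)
print(term)
print(url)
```

Naming the database (`nuccore` here, the Nucleotide database) corresponds to the "Identify the database" step; the same term could be pasted into the Entrez search box instead.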

Access to Information

Accession Numbers = a string of about 4–12 numbers and/or alphabetic characters associated with a molecular sequence record, an expression record, or a structure

GenInfo Number (GI:12345678) = assigned consecutively to each sequence that is processed

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
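
Sequence records are commonly written as an accession followed by a version suffix (for example `NM_000518.5`). A minimal sketch of separating the two parts is shown here; the function name `split_accession` and the sample identifier are illustrative assumptions, not part of the lecture.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version); version is None if absent."""
    text = identifier.strip()
    accession, dot, version = text.rpartition(".")
    if dot and version.isdigit():
        return accession, int(version)
    # No numeric suffix after a dot: treat the whole string as the accession.
    return text, None


print(split_accession("NM_000518.5"))
print(split_accession("NM_000518"))
```

The accession stays the same across updates of a record, while the version number increases when the underlying sequence changes.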

Access to Information

Reference Sequence (RefSeq) Project

Provides the best representative sequence for each normal (i.e., nonmutated) transcript produced by a gene and for each normal protein product
RefSeq entries are curated by the staff at NCBI and are nearly nonredundant
Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
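
RefSeq accessions carry a two-letter prefix and an underscore that indicate the molecule type. The sketch below maps a few common prefixes to a description; the table is a partial list chosen for illustration, and the function name `refseq_molecule_type` is an assumption.

```python
# Partial table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA (model)",
    "XR_": "predicted non-coding RNA (model)",
    "XP_": "predicted protein (model)",
}


def refseq_molecule_type(accession):
    """Return a description for a RefSeq-style accession, or None if unrecognized."""
    return REFSEQ_PREFIXES.get(accession.strip()[:3].upper())


print(refseq_molecule_type("NM_000518.5"))
print(refseq_molecule_type("U12345"))
```

An accession without one of these prefixes (such as a GenBank-style identifier) returns `None`, which is a quick way to tell RefSeq records apart from archival submissions.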

Access to Information
Format Display
[Figure: annotated NCBI Nucleotide record with callouts for General Description, View Options, Accession # & Version, and Related Literature]

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.
National Center for Biotechnology Information (NCBI) (n.d.). Nucleotide. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/nuccore

Access to Information

Locus Reference Genomic (LRG) Project

Defines genomic sequences that can be used as reference standards for genes, representing a standard allele

Consensus Coding Sequence (CCDS) Project

Established to identify a core set of protein-coding sequences that provide a basis for a standard set of gene annotations; the “gold standards” of best-supported gene and protein annotations

Vertebrate Genome Annotation (VEGA) Project

Offers high-quality, manual (expert) annotation of the human and mouse genomes, as well as selected other vertebrate genomes

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Access to Information
Genome Browsers
Databases with a graphical interface representing sequence information and other data as a function of position across the chromosomes. Principal genome browsers are:

University of California, Santa Cruz (UCSC) Genome Browser
▪ supports the analysis of dozens of vertebrate and invertebrate genomes
▪ provides graphical views of chromosomal locations at various levels of resolution
▪ each chromosomal view is accompanied by horizontally oriented annotation tracks

Ensembl Genome Browser
▪ offers a series of comprehensive websites emphasizing a variety of eukaryotic organisms

Map Viewer at NCBI
▪ includes chromosomal maps for a variety of organisms
▪ (1) home page, (2) genome view, (3) map view, (4) sequence view

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.


References:
National Center for Biotechnology Information (NCBI) (n.d.). Bookshelf. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/books/

National Center for Biotechnology Information (NCBI) (n.d.). Guide. Retrieved July 16, 2023 from https://ncbi.nlm.nih.gov/guide/all/

National Center for Biotechnology Information (NCBI) (n.d.). Nucleotide. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/nuccore

National Center for Biotechnology Information (NCBI) (n.d.). OMIM. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/omim

National Center for Biotechnology Information (NCBI) (n.d.). Structure. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/structure

National Center for Biotechnology Information (NCBI) (n.d.). Taxonomy. Retrieved July 16, 2023 from https://www.ncbi.nlm.nih.gov/taxonomy

Pevsner, J. (2015). Bioinformatics and Functional Genomics (3rd ed.). John Wiley & Sons Inc.

Stein, L. D., et al. (2003). Integrating Biological Databases. Nature Reviews Genetics, 4, 337-345. doi: https://doi.org/10.1038/nrg1065

