99 - Bioinformatics - Searching - Handouts

Bioinformatics is the interdisciplinary science focused on storing, retrieving, and analyzing biological data, utilizing primary databases for experimental data and secondary databases for curated information. It involves various specialists and is distinct from medical informatics, which focuses on IT innovations in healthcare. Public databases facilitate data sharing and collaboration, ensuring that research data is accessible, especially when publicly funded.
BIOINFORMATICS

WHAT IS BIOINFORMATICS?
Bioinformatics is the science of storing, retrieving and analyzing large amounts of biological information.
It is a highly interdisciplinary field involving many different types of specialists, including biologists, molecular life scientists, computer scientists and mathematicians.

DIFFERENCE BETWEEN BIOINFORMATICS AND MEDICAL INFORMATICS
MEDICAL INFORMATICS: The interdisciplinary study of the design, development, adoption and application of IT-based innovations in healthcare services delivery, management and planning.
BIOMEDICAL INFORMATICS: The interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific enquiry, problem solving and decision making, motivated by efforts to improve human health.

WHO CAN USE BIOINFORMATICS?
• Researchers
• Clinician-scientists
• Geneticists
• Anyone who works on data-driven and data-reliant research.

PUBLIC DATABASES
They collect, catalogue and provide open access to published biological data.
• The EMBL-European Bioinformatics Institute (EMBL-EBI)
• The US National Center for Biotechnology Information (NCBI)
• The National Institute of Genetics in Japan (NIG)

DATA SHARING COLLABORATIONS
• BEFORE PUBLISHING RESEARCH PAPER
• UPLOAD RESEARCH DATA (SEQUENCES) TO A DATABASE
• GET ACCESSION NUMBER
• CITE ACCESSION NUMBER IN THE RESEARCH PAPER
• RESEARCH PAPER IS PUBLISHED
Some examples of global collaborations established to manage the public record of different biological data types.

DATA TYPE                    COLLABORATIONS
Nucleotide sequences         International Nucleotide Sequence Database Collaboration
Protein sequences            UniProt Consortium
Macromolecular structures    Worldwide Protein Data Bank
Molecular interactions       The International Molecular Exchange Consortium
Protein identifications      The ProteomeXchange Consortium
Genomic and clinical data    Global Alliance for Genomics and Health

• OPEN DATA SHARING
• IF PUBLIC MONEY WAS USED AS FUNDS, THE RESEARCH DATA SHOULD BE MADE PUBLIC

PRIMARY VERSUS SECONDARY DATABASES

PRIMARY DATABASES:
Populated with experimentally derived data such as nucleotide sequence, protein sequence or macromolecular structure.
Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature.
Once given a database accession number, the data in primary databases are never changed: they form part of the scientific record.

SECONDARY DATABASES:
Comprise data derived from the results of analyzing primary data.
Often draw upon information from numerous sources, including other databases (primary and secondary), controlled vocabularies and the scientific literature.
They are highly curated, often using a complex combination of computational algorithms and manual analysis and interpretation to derive new knowledge from the public record of science.

Essential aspects of primary and secondary databases

Synonyms
  Primary database: Archival database
  Secondary database: Curated database; knowledgebase
Source of data
  Primary database: Direct submission of experimentally-derived data from researchers
  Secondary database: Results of analysis, literature research and interpretation, often of data in primary databases
Examples
  Primary database: ENA, GenBank and DDBJ (nucleotide sequence); ArrayExpress and GEO (functional genomics data); Protein Data Bank (PDB; coordinates of three-dimensional macromolecular structures)
  Secondary database: InterPro (protein families, motifs and domains); UniProt Knowledgebase (sequence and functional information on proteins); Ensembl (variation, function, regulation and more layered onto whole genome sequences)

CONTROLLED VOCABULARIES
Controlled vocabularies are a vital ingredient for annotating data stored in databases.
✓ Non-hierarchical list (simplest controlled vocabulary), ex. Language

TAXONOMY
• Classification scheme.
• Life, domain, kingdom, phylum, class, order, family, genus, species (Linnaean classification)

THESAURUS
• In the field of information retrieval, a thesaurus is defined as a controlled and structured vocabulary in which concepts are represented by terms.
• Terms are organized so that relationships between concepts are made explicit.
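
The difference between a flat controlled vocabulary and a taxonomy, as described in this handout, can be sketched in a few lines of Python. This is only an illustration: the language terms, the names LANGUAGES, LINNAEAN_RANKS, HUMAN, validate_term and lineage, and the choice of human as the example organism are all assumptions made for the sketch and do not come from the handout.

```python
# Flat controlled vocabulary: a fixed set of allowed terms with no hierarchy.
# The terms below are illustrative assumptions.
LANGUAGES = {"English", "Filipino", "Japanese"}


def validate_term(term, vocabulary):
    """Return True only if the term is one of the allowed vocabulary terms."""
    return term in vocabulary


# Taxonomy: terms are arranged in ranked levels, from broad to specific.
# The root "life" is implied and left out of the rank list here.
LINNAEAN_RANKS = [
    "domain", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
]

# Example classification (human), used only to show the structure.
HUMAN = {
    "domain": "Eukaryota",
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Mammalia",
    "order": "Primates",
    "family": "Hominidae",
    "genus": "Homo",
    "species": "Homo sapiens",
}


def lineage(classification):
    """List the taxa from the broadest rank to the most specific one."""
    return [classification[rank] for rank in LINNAEAN_RANKS if rank in classification]


if __name__ == "__main__":
    print(validate_term("English", LANGUAGES))
    print(" > ".join(lineage(HUMAN)))
```

The flat list can only answer "is this term allowed?", while the taxonomy also records where each term sits relative to the others.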
ONTOLOGIES
In informatics and computer science, an ontology is a representation of the shared background knowledge for a community.
An ontology describes the categories of objects described in a body of data, the relationships between those objects, and the relationships between those categories.
In doing so, an ontology describes the objects themselves and sometimes defines what you need to know to recognise one of those objects. The labels used to describe the objects can be used to deliver a controlled vocabulary, but an ontology is much more than a controlled vocabulary.
https://www.ebi.ac.uk/training/events/ontologies-biocuration/

GENE ONTOLOGIES
The archetypal example of an ontology in the molecular life sciences is the Gene Ontology (GO), created and maintained by the Gene Ontology Consortium.
GO describes the function and cellular localization of gene products across all species.

(3) ONTOLOGIES OF THE GENE ONTOLOGY
Biological process: terms represent a series of molecular events or functions.
Molecular function: activities performed by individual gene products at the molecular level (ex. catalytic activity).
Cellular component: describes the parts of the cell (subcellular structures and macromolecular complexes) and the extracellular environment in which a gene product may be localized (ex. cytoplasm, ribosome, etc.).

BIOINFORMATICS AS AN EXPERIMENTAL SCIENCE: SEARCHING, COMPARING, MODELING, DATA INTEGRATION
1. SEARCHING: DATABASES (NCBI, EMBL-EBI, etc.)
2. COMPARING: MULTIPLE SEQUENCE ALIGNMENT (CLUSTAL OMEGA TOOL)
3. MODELING (OF PROTEIN): SWISS-MODEL
4. INTEGRATION (OF DATA): UNIPROT, ID MAPPING

MEGA X
• PHYLOGENY INFERENCES
• MODEL SELECTION
• DATING & CLOCKS
• ANCESTRAL STATES
• SELECTION & TESTS
• SEQUENCE ALIGNMENT

PROTEIN SEQUENCE ALIGNMENT - CLUSTAL
Symbols in the conservation line:
*         CONSERVED (IDENTICAL)
:         CONSERVATIVE MUTATION
.         SEMI-CONSERVATIVE MUTATION
(blank)   NON-CONSERVATIVE MUTATION
-         GAP

[Figure: protein sequence alignment (Clustal)]
[Figure: Clustal X default coloring]
[Figure: modeling]
biological information - much more about DNA, not just eye color
computer scientist, mathematician - much more on algorithms

supercomputers - used to facilitate storage of information

medical informatics
- e.g. LIS (laboratory information system), HIS (hospital information system)

biomedical informatics
- when a gene mutates, check the resulting amino acid mutation
- then check whether the same mutation is also present in others

EMBL-EBI -
NCBI -

accession number - cite it in the paper to refer to the deposited sequence

you cannot publish unless you obtain an accession number
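
Once a sequence has an accession number, anyone can use that number to look the record up again. A minimal sketch, assuming the NCBI E-utilities efetch endpoint and its db, id, rettype and retmode parameters: the function name build_fasta_url and the accession used in the usage line are placeholders chosen for illustration, not part of these notes. The sketch only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities "efetch" service (assumed here).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, db="nucleotide"):
    """Build a URL that requests the FASTA record for one accession number."""
    params = {
        "db": db,            # which database to search
        "id": accession,     # the accession number cited in the paper
        "rettype": "fasta",  # ask for FASTA-formatted sequence
        "retmode": "text",   # plain text rather than XML
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder accession; substitute the one cited in the paper of interest.
    print(build_fasta_url("NM_000518"))
```

Opening the printed URL in a browser, or passing it to urllib.request.urlopen, should return the record if the accession exists in that database.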

primary database - newly discovered data


database - the results are archived
primary - what has been discovered
that is exactly it

secondary -

there is also taxonomy -


but not line by line
ours is different

ontology - somewhat like a workflow
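
One way to picture what makes an ontology more than a list of terms is to store it as terms joined by named relationships and then follow those relationships. The sketch below does this in Python. The term names echo the examples in the handout, but the specific relationships are simplified assumptions made for illustration and are not copied from the Gene Ontology; the names RELATIONS and ancestors are also hypothetical.

```python
# Each entry is (child term, relationship type, parent term).
# These relationships are simplified, illustrative assumptions.
RELATIONS = [
    ("catalytic activity", "is_a", "molecular function"),
    ("cytoplasm", "is_a", "cellular component"),
    ("ribosome", "is_a", "cellular component"),
    ("ribosome", "part_of", "cytoplasm"),
]


def ancestors(term, relations=RELATIONS):
    """Collect every term reachable from `term` by following any relationship."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, _relationship, parent in relations:
            if child == current and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(ancestors("ribosome")))
```

Because every link carries a relationship type, the same structure can answer both "what kind of thing is this?" (is_a) and "where does it belong?" (part_of), which a plain controlled vocabulary cannot do.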

------------------------------------
conserved -
conservative mutation -

e.g. the residues are all polar, or all acidic


for example, take a sequence with glutamate
one residue mutates
so it becomes aspartate
but both are acidic
there is a mutation, but its effect on function or expression is not that severe
non-conservative mutation
- still a missense mutation
glutamate -> mutates to -> valine
valine is very different from glutamate
- valine is nonpolar (aliphatic, not aromatic); glutamate is acidic
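
The Clustal symbols (*, :, . and blank) can be reproduced for a small alignment with a short Python sketch. The "strong" and "weak" residue groups below are the ones commonly listed in Clustal documentation; treat the exact group lists as an assumption to check against the version of the tool in use. The function names column_symbol and conservation_line are hypothetical.

```python
# Residue groups assumed from Clustal documentation.
# All residues in one "strong" group -> ':' (conservative mutation).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
# All residues in one "weak" group -> '.' (semi-conservative mutation).
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(residues):
    """Return the conservation symbol for one alignment column."""
    column = set(residues.upper())
    if "-" in column:
        return " "  # a gap in the column means no conservation symbol
    if len(column) == 1:
        return "*"  # identical residue in every sequence
    if any(column <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(column <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "      # non-conservative mutation


def conservation_line(sequences):
    """Build the symbol line for aligned sequences of equal length."""
    return "".join(column_symbol("".join(col)) for col in zip(*sequences))


if __name__ == "__main__":
    print(repr(column_symbol("ED")))  # glutamate vs aspartate
    print(repr(column_symbol("EV")))  # glutamate vs valine
```

With these groups, glutamate (E) against aspartate (D) falls inside the strong group NDEQ and is marked as conservative, while glutamate against valine (V) shares no group and is left blank.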
white mice - about 80% of the genome is the same as the human genome

MEGA -
