Bioinformatics: Nadiya Akmal Binti Baharum (PHD)
BIOINFORMATICS
REFERENCES & TEXTBOOKS
• Bioinformatics websites.
Introduction to bioinformatics
• What is bioinformatics?
www.menti.com
Code:1163 7806
https://www.menti.com/al2ac9orq9fo
A. What is bioinformatics?
A field of study that uses computation to
extract knowledge from biological data.
Computer
Databases
(software)
http://www.youtube.com/watch?v=dJrpSvsFXFI
A. What is Bioinformatics?
Another definition adopted by Luscombe et al.:
Scopes of Bioinformatics
1. Development of computational tools and databases.
[Figure: levels of bioinformatics analysis. Labels: DNA, RNA, protein; the cell; the organism; the tree of life. Panels: central dogma of molecular biology; genome-wide analysis of RNA and protein; genome analysis.]
• DNA sequence data are accumulating in bioinformatics databases for over 150,000
different organisms.
• Complete genome sequences: help categorize organisms into the three major branches of the
Tree of Life: bacteria, archaea, and eukaryotes.
• Fundamental unity of life and comparative genomics : learn how chromosomes evolved through
duplications, deletions, and rearrangements.
Food for thought...
The cow genome consists of a sequence of 2.86 billion
letters (2,860,000,000 bases), enough to fill millions of pages of
a normal book.
How can you detect anomalies (at the gene level) between
cows, or between cows possessing defective phenotypes?
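One simple way to approach this question is to compare aligned sequences position by position and report where they differ. The sketch below assumes two DNA strings that are already aligned and of equal length; the sequences and the function name `find_mismatches` are made up for illustration, and real genome comparisons rely on dedicated alignment tools rather than this naive scan.

```python
def find_mismatches(reference, sample):
    """Return (position, ref_base, sample_base) for each differing site.

    Assumes both sequences are already aligned and of equal length.
    Positions are 1-based, as is conventional for sequence coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(i, r, s) for i, (r, s) in enumerate(pairs, start=1) if r != s]


# Hypothetical short fragments standing in for the same gene in two cows.
healthy = "ATGGCCATTGTAATGGGCCGC"
affected = "ATGGCCATTGTTATGGGCCGC"

for position, ref_base, alt_base in find_mismatches(healthy, affected):
    print(f"position {position}: {ref_base} -> {alt_base}")
```

At genome scale the same idea applies, but the comparison is done on aligned reads or assemblies, which is why computational tools and databases are needed.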
E. Application of bioinformatics
The right medicine can be tailored to the right patient using biomarker-based
diagnosis.
Forensic DNA analysis
DNA sequencing for legal and investigative purposes.
Molecular phylogenetic analysis as evidence in criminal courts.
The deployment of genomic selection breeding will help in achieving higher genetic
gains in less time.
Pandey et al. (2016). Front. Plant Sci. 7:455.
Access to Sequence Data and Literature Information
• Publicly accessible databases
• Database operators
• Access to information
• Access to biomedical literature
Lecture 01-B
Learning Outcomes (LO)
WEBSITES FOR BIOINFORMATIC-BASED
REPOSITORIES
August 2018
Note: Bacteria, archaea and viruses are absent from the list
because of their relatively small genomes.
Types of data in GenBank
Genomic DNA databases
• Part of a large fragment of DNA:
- bacterial artificial chromosomes (BACs)
- yeast artificial chromosomes (YACs)
• A gene:
- Prokaryote: non-coding and coding regions
- Eukaryote: regulatory regions, protein-coding exons and introns
[Figure: DNA, RNA, protein, with the related resources cDNA, ESTs and UniGene; *non-redundant (NR)]
• cDNA databases
- RNA converted to more stable cDNA.
• Expressed Sequence Tags (ESTs)
- Partial DNA sequence of a cDNA clone.
- Assuming about 22,000 human genes, there are roughly 300 ESTs for
each gene.
National Center for Biotechnology Information
(NCBI)
https://www.ncbi.nlm.nih.gov/
NCBI key features: ①PubMed
• National Library of
Medicine’s search service.
• >20 million citations
in MEDLINE (Medical Literature
Analysis and Retrieval
System Online).
• Links to participating
online journals.
https://pubmed.ncbi.nlm.nih.gov
NCBI key features: ②Entrez
Also known as the Entrez Global Query Cross-Database
Search System. It searches across:
• Scientific literature;
• DNA and protein sequence databases;
• 3D protein structure data;
• Population study datasets; and
• Assemblies of complete genomes.
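The same cross-database idea can be tried programmatically through NCBI's E-utilities. The sketch below only builds ESearch request URLs and does not send them; the helper name `build_esearch_url` and the search term are illustrative assumptions, and NCBI's usage guidelines should be checked before issuing real requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities search endpoint (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Build an ESearch URL for one Entrez database without sending it."""
    params = {
        "db": database,          # which Entrez database to search
        "term": term,            # the query text
        "retmax": max_results,   # maximum number of identifiers to return
        "retmode": "json",       # ask for a JSON reply
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# The same search term aimed at three of the data types listed above.
for db in ("pubmed", "protein", "structure"):
    print(build_esearch_url(db, "lipocalin", max_results=5))
```

Opening one of the printed URLs in a browser returns the matching record identifiers for that database.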
NCBI key features: ③BLAST
• A single molecule can be associated with many accession numbers: the ESTs and DNA
fragments that match that particular molecule.
A sequence file in GenBank/GenPept format
[Figure: annotated record with the FEATURE and SEQUENCE sections marked]
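To make the layout of such a file concrete, here is a minimal sketch that reads the accession, the feature keys, and the sequence from a GenBank-style flat file. The record is entirely made up for illustration (it is not a real GenBank entry), `parse_genbank` is a hypothetical helper, and only the basic layout is handled: a `FEATURES` table, an `ORIGIN` sequence block, and a closing `//`.

```python
SAMPLE_RECORD = """LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example record, not a real entry.
ACCESSION   EXAMPLE01
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="synthetic construct"
     CDS             1..24
                     /product="hypothetical protein"
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank(text):
    """Extract the accession, feature keys, and sequence from one record."""
    record = {"accession": None, "features": [], "sequence": ""}
    section = "header"
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
        elif section == "features":
            # Feature keys start in column 6; qualifier lines are indented further.
            if len(line) > 5 and line[:5].isspace() and not line[5].isspace():
                record["features"].append(line.split()[0])
        elif section == "sequence":
            # Drop the leading position number and the spaces between blocks.
            record["sequence"] += "".join(line.split()[1:]).upper()
    return record


print(parse_genbank(SAMPLE_RECORD))
```

For real work, an established parser such as the one in Biopython is a safer choice, since actual records contain multi-line fields and many more feature types.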
The Reference Sequence (RefSeq) Project
(Accessible through NCBI main page)
• Goal: to provide the best representative sequence for each normal (non-mutated) transcript
produced by a gene, and for each normal protein product.
• A given gene or gene product may have one RefSeq entry, or several RefSeq entries when
there are splice variants or distinct loci.
• RefSeq best representative sequences: provide an expertly curated accession number
that corresponds to the most stable, agreed-upon “reference” version of a sequence.
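RefSeq accession numbers carry a short prefix that indicates the kind of molecule, followed by an optional version suffix. The sketch below splits an accession into those parts; the prefix table covers only some common prefixes, `parse_refseq` is a hypothetical helper, and the accession strings are used purely as format examples.

```python
# A partial table of RefSeq accession prefixes and what they denote.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA transcript",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA transcript",
    "XR_": "predicted non-coding RNA",
    "XP_": "predicted protein",
}


def parse_refseq(accession):
    """Split an accession into (base, version, molecule type)."""
    base, _, version = accession.partition(".")
    kind = REFSEQ_PREFIXES.get(base[:3].upper(), "not a recognised RefSeq prefix")
    return base, (int(version) if version else None), kind


for acc in ("NM_000518.5", "NP_000509", "AB123456.1"):
    print(acc, "->", parse_refseq(acc))
```

The version number increases when the sequence of an entry is updated, while the base accession stays the same.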
https://www.ncbi.nlm.nih.gov/homologene/?term=68066
Access to Biomedical Literature
[Figure: Venn diagram of a Boolean search, 1 OR 2: lipocalin OR disease (2,500,000 results)]
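Boolean operators in a literature search behave like set operations on the lists of matching articles. The sketch below uses two small, made-up sets of article identifiers to show how OR, AND, and NOT change the result; the identifiers and the `combine` helper are hypothetical and are not PubMed output.

```python
# Hypothetical article identifiers returned by two separate searches.
search_1 = {101, 102, 103, 104}   # articles matching "lipocalin"
search_2 = {103, 104, 105}        # articles matching "disease"


def combine(first, second, operator):
    """Combine two result sets the way a Boolean search operator would."""
    operator = operator.upper()
    if operator == "OR":      # articles found by either search
        return first | second
    if operator == "AND":     # articles found by both searches
        return first & second
    if operator == "NOT":     # articles in the first search but not the second
        return first - second
    raise ValueError(f"unsupported operator: {operator}")


for op in ("OR", "AND", "NOT"):
    print(f"1 {op} 2:", sorted(combine(search_1, search_2, op)))
```

OR always gives the largest result set, which is why a query such as lipocalin OR disease returns millions of records, whereas AND narrows the search.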