Bioinformatics: Intended Learning Outcomes
Introduction
- Millions of nucleic acid sequences have been stored in data banks
-- where they’re organized by applying computer and information technology.
- Thus, the field of Bioinformatics emerged.
- biological informatics
a) application of information technology to the field of molecular biology.
b) storing, retrieving, and analyzing large amounts of biological information.
- Like an archive of biological information
- involves the establishment of databases, algorithms, and computational and statistical techniques.
- highly interdisciplinary field
- strives to advance our knowledge of the biological system
-- enable us to interpret biological processes
-- utilize such knowledge in various applications.
Applications of Bioinformatics
- various applications and fields:
FAR EASTERN UNIVERSITY - NICANOR REYES MEDICAL FOUNDATION
SCHOOL OF MEDICAL LABORATORY SCIENCE
Joeperl C. Verdadero
MOL BIO
1. Sequence alignment and analysis
2. Mapping and analyzing DNA, RNA, Protein, Amino Acid, and Lipid sequences
3. Creation and visualization of 3-D structure models for biological molecules of significance, e.g., proteins
4. Genome annotation
5. Genetic diseases
6. Designer / Personalized Medicine
7. Drug development
8. Microbial genome applications
9. Molecular medicine
10. Gene therapy
11. Antibiotic resistance
12. Evolutionary studies
13. Waste cleanup
14. Biotechnology
15. Climate change studies
16. Alternative energy sources
17. Crop improvement
18. Forensic analysis
19. Bio-weapon creation
20. Insect resistance
21. Improve nutritional quality
22. Veterinary science
4. Genome annotation
- identifying locations of genes & all coding regions in a genome
- determine gene function
Gene consists of enough DNA to code for 1 protein
Genome is the sum total of an organism’s DNA.
Annotation
-- note added, explanation or commentary
- feature information:
1. CDS - coding sequence
2. Coding region intervals, includes start & stop codon (if present)
3. Protein name
4. Gene name
5. Amino acid sequence
Eg. Escherichia coli CFT073 (organism & gene of interest)
1. CDS
2. Coding region intervals: 190…255
3. Protein name: Threonine, leader peptide
4. Gene name: thrL
5. Amino acid sequence: “MKRISTTITTTITITTGNGAG”
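The annotation fields in this example can be cross-checked against each other. The following is a minimal Python sketch, assuming the interval 190…255 is inclusive at both ends and that the stop codon is counted inside the interval. The `CdsFeature` class and its field names are illustrative only and are not part of any database API.

```python
from dataclasses import dataclass


@dataclass
class CdsFeature:
    """Illustrative container for the annotation fields listed above."""
    start: int          # first base of the coding region (1-based, inclusive)
    end: int            # last base of the coding region (1-based, inclusive)
    protein_name: str
    gene_name: str
    translation: str    # amino acid sequence, one letter per residue

    def nucleotide_length(self) -> int:
        # Both ends are assumed inclusive, hence the +1.
        return self.end - self.start + 1

    def codon_count(self) -> int:
        return self.nucleotide_length() // 3

    def is_consistent(self) -> bool:
        """True if the interval holds whole codons: one per residue plus a stop."""
        whole_codons = self.nucleotide_length() % 3 == 0
        return whole_codons and self.codon_count() == len(self.translation) + 1


thrL = CdsFeature(190, 255, "Threonine, leader peptide", "thrL",
                  "MKRISTTITTTITITTGNGAG")
print(thrL.nucleotide_length())  # 66
print(thrL.codon_count())        # 22
print(thrL.is_consistent())      # True
```

Under these assumptions the interval spans 66 bases, that is 22 codons: 21 for the amino acids shown plus one stop codon.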
4. You may search for your gene of interest and click on the gene link; a new page will appear that
contains the gene sequence and other information.
BLAST
- (Basic Local Alignment Search Tool)
- used if nucleotide sequence is available but without annotation
- finds regions of similarity between biological sequences
- compares sequences (query sequence to database sequence)
-- calculates the statistical significance
- designed to identify local regions of sequence similarity.
- may report multiple discrete regions of sequence similarity
1. Go to NCBI
2. Click on “Resources” and choose “DNA & RNA” at drop-down menu and choose “BLAST”.
3. On the page that appears, click on “Nucleotide BLAST”.
4. Paste your sequence at the box for Enter Query Sequence. FASTA sequence or Accession number may be entered.
FASTA: text-based format for a nucleic acid or protein sequence
- nucleotides or amino acids are represented using single-letter codes.
Accession number: unique identifier given to a DNA or protein sequence record
- allows tracking of different versions of that sequence record and the associated sequence over time in a single data repository
5. Click ‘BLAST’ box below
-- Some parameters may be modified or specified.
Default BLAST Report
A. Graphical Summary
- color of the boxes corresponds to the score (S) of the alignment
- Red bar: colored boxes, represent alignments in the database that match to the Query sequence
-- highest alignment scores
-- the higher the alignment score, the more significant the hit.
B. Table of BLAST hits
- summary table where all the sequences in the Refseq (reference sequence) database that show significant sequence homology
C. Corresponding alignments
- shows all of the alignment blocks for each BLAST hit.
- sequence alignments show how well the query sequence matches with the subject sequence
Important Features of a Typical BLAST alignment
A) Score:
- number used to assess the biological relevance of a finding
- numerical value that describes the overall quality of an alignment.
-- higher numbers correspond to higher similarity
B) Expect value / E-value
- describes the number of hits one can “expect” to see by chance when searching a database of a particular size.
-- The lower the E-value, or the closer it is to “0”, the higher is the “significance” of the match.
C) Gap.
- A space introduced into an alignment to compensate for insertions and deletions
D) Identity.
- The extent to which two (nucleotide or amino acid) sequences have the same residues at the same positions in an alignment, often expressed as a percentage.
-- line between the bases of the two sequences indicate identity.
E) Query sequence
- input sequence to which all of the entries in a database are to be compared.
F) Subject or Matching sequence.
- A subject or matching sequence is the sequence present in the database.
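To make the Identity and Gap features concrete, here is a small Python sketch that takes two sequences that are already aligned (same length, with "-" marking a gap) and counts identical positions and gapped positions. The function name `alignment_stats`, the example strings, and the choice to divide by the full alignment length are assumptions for illustration; they are not taken from BLAST itself.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Summarize two already-aligned sequences of equal length ('-' marks a gap)."""
    if not query or len(query) != len(subject):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    match_line = []
    identities = 0
    gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1               # gap column: insertion or deletion
            match_line.append(" ")
        elif q == s:
            identities += 1         # same residue at the same position
            match_line.append("|")  # the line drawn between identical bases
        else:
            match_line.append(" ")  # mismatch
    length = len(query)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length,
        "match_line": "".join(match_line),
    }


query_row = "ACGT-ACGTA"     # made-up aligned query
subject_row = "ACGTTACCTA"   # made-up aligned subject
stats = alignment_stats(query_row, subject_row)
print(query_row)
print(stats["match_line"])
print(subject_row)
print(stats["identities"], stats["gaps"], stats["percent_identity"])  # 8 1 80.0
```

The middle printed row reproduces the vertical lines that mark identical bases in an alignment block.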
-- eg. our unknown hemoglobin beta genomic DNA has sequence homology to the hemoglobin
beta gene in Homo sapiens.
-- Homo sapiens hit has an accession number that begins with ‘NM’
-- other hits (Gorilla gorilla, Pan troglodytes and Pan paniscus) begin with ‘XM’.
- primary difference between the two prefixes is the type of information available to
support each of the Refseq mRNAs.
-- NM: confirmed by experimental evidence
-- XM: based solely on computational predictions
- NM > XM: more favorable as it's based on experimental evidence
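The NM versus XM distinction can be applied mechanically when scanning a table of hits. A minimal Python sketch follows, covering only the two prefixes discussed here. The function name and the accession strings are placeholders invented for illustration, not real records.

```python
# Only the two RefSeq mRNA prefixes described in these notes are covered.
REFSEQ_MRNA_PREFIXES = {
    "NM": "mRNA record confirmed by experimental evidence",
    "XM": "mRNA record based solely on computational predictions",
}


def describe_accession(accession: str) -> str:
    """Return the level of support implied by the accession prefix."""
    prefix = accession.split("_", 1)[0].upper()
    return REFSEQ_MRNA_PREFIXES.get(prefix, "prefix not covered in these notes")


# Placeholder accessions (hypothetical, for demonstration only).
for hit in ["NM_000000.0", "XM_000000.0"]:
    print(hit, "->", describe_accession(hit))
```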
OTHER SEQUENCE ALIGNMENT TOOLS
Pairwise Sequence Alignment
- identify regions of similarity between 2 biological sequences
EMBOSS Water
- eg. compare the DNA sequence of normal beta hemoglobin gene (Hgb A) with the sickle cell
hemoglobin beta gene (Hgb S)
1. Access EMBOSS Water Pairwise Sequence Alignment using a browser.
Note:
-- Enter the gi number of normal Homo sapiens hemoglobin beta which is preceded by the
character “>”, then paste the DNA sequence on the next line.
- GI number is a simple series of digits that are assigned consecutively to each sequence record
processed by NCBI. (GenInfo Identifier)
-- Next, enter the gi number of Sickle cell Hgb-beta: >Anemic Homo sapiens hemoglobin, beta
(HBB), mRNA in another Notepad file.
- “>” precedes the name of your sequence
-- the sequence is available in FASTA format
2. Enter the pair of sequences in the boxes indicated.
3. Set your pairwise alignment options.
4. Submit your job.
5. The result will show the characteristics of the alignment, such as:
- length of the sequences (626 bp)
- percent identity (99.8%)
- percent similarity (99.8%)
- gaps (0.0%)
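EMBOSS Water performs local alignment with the Smith-Waterman algorithm. The sketch below, in Python, shows the two ideas used in this exercise: reading FASTA records (a ">" header line followed by sequence lines) and computing a Smith-Waterman score. The two sequences are made-up toy strings that differ at one base, standing in for the Hgb A and Hgb S records. The scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for simplicity and are not the defaults of the EMBOSS tool.

```python
def parse_fasta(text: str) -> dict:
    """Return {header: sequence}; a line starting with '>' opens a new record."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score, using a simple linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(0,                      # start a new local alignment
                             previous[j - 1] + step,  # align the two residues
                             previous[j] + gap,       # gap in b
                             current[j - 1] + gap)    # gap in a
            best = max(best, current[j])
        previous = current
    return best


# Toy FASTA input (hypothetical sequences, not the real HBB records).
fasta_text = """>normal (toy sequence)
ACGTTGCAAG
CTTGACCGTA
>variant (toy sequence)
ACGTTGCAAG
GTTGACCGTA
"""

records = parse_fasta(fasta_text)
normal, variant = records.values()
differences = sum(1 for x, y in zip(normal, variant) if x != y)
print(differences)                            # 1
print(smith_waterman_score(normal, variant))  # 37
```

For comparison with the result listed above: if the percentages are computed over the alignment length, a 626 bp alignment with no gaps and a single mismatch would give 625/626, or about 99.8% identity.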
MUSCLE