Introduction to Bioinformatics Using Action Labs
By Jean-Louis Lassez, Ryan Rossi and Stephen Sheel
9781257694891
Preface
Bioinformatics is the application of computational techniques and tools to the analysis and management of biological data. This book introduces bioinformatics through Action Labs, which give students experience using real data and tools to solve difficult problems. The book comes with supplementary software tools and papers. The labs use data on breast cancer, liver disease, diabetes, SARS, HIV, extinct organisms, and many other subjects. The book is written for first- or second-year computer science, mathematics, and biology students. The supplementary software and papers can be found at http://www.kibazen.com/binf
Figure: Jean-Louis Lassez, Life is Pachinko, at the Kinsey Institute Museum
Table of Contents
Copyright Page
Title Page
Preface
Chapter 1 - Introduction to Bioinformatics
Chapter 2 - Introduction to BLAST and FASTA
Chapter 3 - BLAST Analysis and Applications
Chapter 4 - Advanced Bioinformatics Tools
Chapter 5 - Classification and Pattern Recognition
Chapter 6 - Advanced Topics
Appendix - Supplementary Papers
Glossary
Index
Chapter 1
Introduction to Bioinformatics
What is Bioinformatics
Background:
What is Bioinformatics? It depends on who you are talking to. A geneticist, a biologist, a mathematician, a CEO of a pharmaceutical company and a computer scientist all would have related, but different, opinions as to what Bioinformatics is.
Purpose:
This lab introduces various aspects of Bioinformatics, its scientific basis, its techniques and its applications.
Resources:
There are many excellent resources on Bioinformatics that can be found on the web. Visit, for instance, the tutorial located at:
http://www.ebi.ac.uk/2can/bioinformatics/bioinf_what_1.html
Key Terms:
Genome
Gene
Protein
Amino Acid
DNA
Codon
Prokaryote
Eukaryote
Archaea
RNA
Directions:
Read the tutorial in the resources (or equivalent) thoroughly.
Exercises:
1. Give a concise, yet precise, definition of Bioinformatics.
2. What are the biggest challenges facing Bioinformatics? Why do you think this is the case?
3. Give a list of the main biological databases that can be accessed on the internet.
4. What are the differences in the functions of the various biological databases?
5. Name the categories of the major data analysis tools.
6. How are the sequence analysis tools used in Bioinformatics?
7. Make a list of the most important real-world applications for Bioinformatics. Rank your choices from 1 to 10 and justify why, in your view, each application received its ranking. (As the ranking is subjective and tied to your taste or expertise, what matters most is not the ranking you choose but the justifications you give.)
References:
European Molecular Biology Laboratory (EMBL). What is Bioinformatics? <http://www.ebi.ac.uk/2can/bioinformatics/bioinf_what_1.html>.
Genetics Home Reference: Your Guide to Understanding Genetic Conditions [Internet]. Bethesda, MD: United States National Library of Medicine, National Institutes of Health [modified: 2009 July 31]. [Illustration], DNA is a double helix formed by base pairs attached to a sugar-phosphate backbone; [cited 2007 July] [about 3 screens]. Available from: http://ghr.nlm.nih.gov/handbook/basics/dna.
Exploring Frameshifts
Background:
A frameshift mutation (also called a frameshift or a framing error) is a genetic mutation that inserts into or deletes from a DNA sequence a number of nucleotides that is not evenly divisible by three. Because codons are read as triplets during gene expression, the insertion or deletion disrupts the reading frame (the grouping of nucleotides into codons), so every codon downstream of the mutation is read differently and the translation can differ completely from the original. The earlier in the gene the deletion or insertion occurs, the more altered the gene product will be.
Frameshift mutations frequently result in severe genetic diseases.
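The effect of a frameshift can be sketched in a few lines of Python. This is a minimal illustration, not the Transeq program used in the lab: it assumes the standard genetic code, reads only the forward strand, and uses a short made-up sequence (`original`) so that the lab exercises remain to be worked out. The names `CODON_TABLE` and `translate` are chosen here for illustration.

```python
# Standard genetic code, with codons enumerated in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate DNA into one-letter amino acids, starting at offset `frame`.

    '*' marks a stop codon. Trailing bases that do not fill a codon are ignored.
    Only the letters A, C, G and T are supported in this sketch.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical toy sequence (not taken from the lab).
original = "ATGGCCAAGTTT"
# Insert a single base after the first codon: every later codon shifts.
inserted = original[:3] + "T" + original[3:]

print(translate(original))  # MAKF
print(translate(inserted))  # MCQV

# The same DNA read in each of the three forward reading frames.
for frame in range(3):
    print(frame, translate(original, frame))
```

A single inserted base leaves the first amino acid unchanged but alters every one after it, which is the behavior the lab asks you to observe on the longer sequences.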
Purpose:
This lab is intended to analyze how different mutations affect sequences.
Resources:
BLAST: http://www.ebi.ac.uk/blast
Transeq: http://www.ebi.ac.uk/emboss/transeq/
Key Terms:
Frameshift mutation
Codon
Insertion
Deletion
Directions:
Make sure you understand the key terms above, and then complete the exercises below.
Exercises:
1. How many ways can we parse this DNA subsequence into a potential coding frame?
………TACGGAAGTTCACTGCAATCAGTTGACTGAGGACTG……
2. Assume that the coding frame for the subsequence is in fact:
TAC/GGA/AGT/TCA/CTG/CAA/TCA/GTT/GAC/TGA/GGA/CTG
Translate this subsequence into a sequence of amino acids. (You can do it by hand using the table for the genetic code, but using the Transeq program will be easier and faster.)
3. Now an insertion mutation has happened, resulting in the following sequence:
TACGGTAAGTTCACTGCAATCAGTTGACTGAGGACTG
Translate this new sequence into a sequence of amino acids.
4. Next, divide the following sequence, which has a deletion mutation, into codons:
TACGAAGTTCACTGCAATCAGTTGACTGAGGACTG
Translate this new sequence into a sequence of amino acids.
5. Are there significant changes in the translation? Explain the reason for the differences in the translations from questions 3 and 4.
6. Run the BLAST program on the three DNA sequences above. Do the frameshifts cause a misclassification in the organisms identified by BLAST when compared to the original DNA sequence?
7. Visit the site: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=442341. Read the abstract for the article. Summarize the authors’ main point.
References:
Schach, B.G., Yoshitake, S. and Davie, E.W., Hemophilia B (factor IX Seattle 2) due to a single nucleotide deletion in the gene for factor IX, The Journal of Clinical Investigation, no. 4 (1987), <http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=442341> (12 September 2006).
The European Bioinformatics Institute (EBI). Blast @ EBI. <http://www.ebi.ac.uk/blast>.
The European Bioinformatics Institute (EBI). EMBOSS Transeq. <http://www.ebi.ac.uk/emboss/transeq/>.
Genetics Home Reference: Your Guide to Understanding Genetic Conditions [Internet]. Bethesda, MD: United States National Library of Medicine, National Institutes of Health [modified: 2009 July 31]. [Illustration], Frameshift mutation; [cited 2007 July] [about 3 screens]. Available from: http://ghr.nlm.nih.gov/handbook/basics/dna.
Bioinformatics Tools
Background:
Molecular Sequence Alignment Tool
Sequence similarity is assessed, in the first instance, by comparing the sequences letter by letter (the first letters, then the second, then the third, and so on), scoring positive points for each match and negative points for each mismatch. The problem becomes more complex when there are gaps, which occur when one sequence has been subjected to one or more insertion or deletion mutations. This lab provides an introduction to sequence alignment, the first fundamental tool in the study of biosequences.
Here is an example of alignment:
410 AANCGTGATCGATGCTAGCTATATA 434
410 AATCGTTATCGATGCTAGCTATATA 434
The numbers at each end of the sequences correspond to the nucleotide number in the original sequence. The (|) means a match, while (:) means a gap and no connector means a substitution, as we see on the seventh pair.
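The scoring idea described in the background can be sketched as follows. This is an illustrative sketch, not the algorithm of the online ALIGN tool: the point values (+1 for a match, -1 for a mismatch, -2 per gap position) are assumptions chosen for the example, the ambiguity code N is treated as an ordinary letter, and the gapped version is a plain Needleman-Wunsch global alignment that returns only the score.

```python
def ungapped_score(a, b, match=1, mismatch=-1):
    """Compare two equal-length sequences position by position."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def percent_identity(a, b):
    """Percentage of positions that carry the same letter (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score when gaps are allowed (Needleman-Wunsch).

    Each gap position costs `gap`. Only two rows of the table are kept,
    so the alignment itself is not reconstructed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# The example pair shown above (positions 410 to 434).
first = "AANCGTGATCGATGCTAGCTATATA"
second = "AATCGTTATCGATGCTAGCTATATA"
print(ungapped_score(first, second))    # 21 (23 matches, 2 mismatches)
print(percent_identity(first, second))  # 92.0

# With a gap allowed, sequences of different lengths can be compared.
print(global_alignment_score("ACGT", "AGT"))  # 1
```

Changing the `gap` value changes where, and whether, the best alignment places gaps, which is why alignment tools report the gap penalty alongside the score.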
Purpose:
This lab introduces Molecular Sequence Alignment tools.
Resources:
For this exercise use the software located at:
http://xylian.igh.cnrs.fr/bin/align-guess.cgi.
Key Terms:
Genome
Sequence Alignment
Mutation
Insertion/deletion/substitution
Gap Penalty
E-score
Nucleotide
Directions:
As often happens with online bioinformatics resources, links such as the one in the resources may or may not work. Part of this lab is to train you to search the net until you find the appropriate information. Once you are at the website, or an equivalent one, run the alignment tool with the sequences below.
First Sequence:
AACGCCCAGGGTTTCCCAGTCACGACGTTGTAAAAGCGACGGCCAGTGCCA
Second Sequence:
AACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGCCA
Exercises:
1. What percentage identity do these two sequences have?
2. What is the gap penalty and where is/are the gap(s) in the alignment?
3. What is the score of the alignment?
The next exercises make use of an ORF finder and the sequence below.
The link to ORF Finder is: http://www.ncbi.nlm.nih.gov/gorf/gorf.html
Give the following sequence as input to the program:
[Figure: input sequence for ORF Finder]
4. What do the colored bars represent in the frames?
5. Which frame does not contain an open reading frame?
6. Which frame has the longest open reading frame?
7. Which of these ORFs, if any, correspond to a known gene?
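To make the idea of frames and open reading frames concrete, here is a small sketch of a six-frame scan. It is not the NCBI ORF Finder: it assumes an ORF runs from an ATG start codon to the first in-frame stop codon (TAA, TAG or TGA), it ignores ORFs that run off the end of the sequence, and the input used below is a made-up toy sequence because the lab's input sequence is not reproduced in the text. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def orfs_in_frame(dna, offset):
    """Yield (start, end) positions of ATG...stop ORFs in one reading frame."""
    start = None
    for i in range(offset, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def six_frame_orfs(dna):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to the ORFs found in each."""
    dna = dna.upper()
    result = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            label = f"{strand}{offset + 1}"
            result[label] = [seq[s:e] for s, e in orfs_in_frame(seq, offset)]
    return result


# Hypothetical toy input, chosen so that only frame +1 contains an ORF.
frames = six_frame_orfs("ATGAAATAG")
for label, orfs in frames.items():
    print(label, orfs)

longest = max((orf for orfs in frames.values() for orf in orfs), key=len, default="")
print("longest ORF:", longest)
```

Online ORF finders may apply a minimum length or accept alternative start codons, so their output on a real sequence can differ from this simplified scan.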
References:
Institut de Génétique Humaine. ALIGN Query using sequence data. <http://xylian.igh.cnrs.fr/bin/align-guess.cgi>.
National Center for Biotechnology Information (NCBI). ORF Finder (Open Reading Frame Finder). <http://www.ncbi.nlm.nih.gov/gorf/gorf.html>.
Chapter 2
Introduction to BLAST and FASTA
Introduction to Sequence Analysis
Database Searching Options
Statistical matrices allow a query sequence to be aligned with matching sequences in the database. The less complex, faster matrices sacrifice a certain degree of match significance. The matrix, together with the choice of program, essentially determines the sensitivity and speed of the search.
Filtering masks regions of the query sequence that contain repeats or other areas of low compositional complexity. Masking is achieved by replacing the repeats with N’s, the IUB code for any base.
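As a rough illustration of masking, the sketch below replaces long runs of a single repeated base with N’s. This is a deliberately simplified stand-in: the filters used by real search programs measure compositional complexity in more elaborate ways, and the function name and the `min_run` threshold are assumptions made for this example.

```python
import re


def mask_homopolymer_runs(dna, min_run=6):
    """Replace every run of at least `min_run` identical bases with N's."""
    # ([ACGT]) captures one base; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), dna.upper())


query = "ACGT" + "A" * 8 + "CGT"
print(query)
print(mask_homopolymer_runs(query))
```

The masked sequence keeps its original length, so positions reported by the search still refer to the unmasked query.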
The three main public molecular databases are EMBL (Europe), GenBank (US), and DDBJ (Japan). Every 24 hours, these three databases update one another with the new sequences collected in each region.
Every entry in the database requires a version number and a unique identifier that never changes.
A redundant database is a database where more than one copy of each variant of a sequence may be found. The advantage of a redundant database is that it’s much more likely to contain recently discovered sequences. The disadvantage is that the biologically significant results are more likely to be hidden among the large number of reported matches.
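The difference between a redundant and a non-redundant collection can be sketched with a few records. The identifiers and sequences below are invented for illustration, and grouping by exact sequence identity is an assumption of this sketch; it is not how any particular public database is built.

```python
from collections import defaultdict


def collapse_redundant(records):
    """Group record identifiers by sequence so each sequence appears once."""
    by_sequence = defaultdict(list)
    for identifier, sequence in records:
        by_sequence[sequence.upper()].append(identifier)
    return dict(by_sequence)


# Hypothetical records: (identifier, sequence). Two entries share a sequence.
records = [
    ("entry1", "ACGTACGT"),
    ("entry2", "acgtacgt"),
    ("entry3", "TTGACCA"),
]

non_redundant = collapse_redundant(records)
for sequence, identifiers in non_redundant.items():
    print(sequence, identifiers)
```

A search against the collapsed collection reports each distinct sequence once, while the list of identifiers preserves which original entries it came from.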
Sequence Alignment Programs
BLAST – BLAST is the fastest, but compromises some degree of sensitivity for speed.
FASTA – FASTA is slower, but more sensitive than BLAST.
BLITZ – BLITZ also provides a very sensitive search but is very slow to run.
BLAST and FASTA