Model Questions: MID, Spring 2024

Uploaded by shuvojk
Copyright © All Rights Reserved

Pairwise alignment:

1. Define homology and differentiate between orthologs and paralogs. Why is this
distinction important in evolutionary biology and functional genomics?
2. Explain the concept of pairwise sequence alignment and its fundamental role in bioinformatics.
What are the practical applications of this process in genomic analysis and molecular biology?

3. Explain the principles of dynamic programming as applied to global (Needleman-Wunsch)
and local (Smith-Waterman) sequence alignments. How do these algorithms differ, and what
are the advantages and limitations of each?
4. Describe the process and challenges of aligning DNA sequences versus protein
sequences. In what scenarios would one prefer to align DNA sequences over
protein sequences, or vice versa?
5. Discuss the "twilight zone" of pairwise alignment and its implications for
detecting evolutionary relationships between sequences. How does this concept
challenge the limits of sequence alignment methods?
6. Evaluate the statistical significance of pairwise alignments. How do researchers determine
whether an alignment reflects a true evolutionary relationship or is merely a coincidental
match?

7. Detail the steps involved in the Needleman-Wunsch algorithm for global alignment.
What does this algorithm reveal about the evolutionary relationship between two sequences?
8. Explain the significance of the Smith-Waterman algorithm for local sequence alignment. How
does it differ from the global alignment approach, and why is it particularly useful for database
searches?
9. What are dot plots, and how can they be used to visualize sequence alignments and identify
regions of similarity?
10. How does one interpret the statistical significance of an alignment score? Explain the concept of
bit-scores and E-values in the context of sequence alignment.
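Questions 3, 7 and 8 can be made concrete with a short dynamic-programming sketch. The Python below is a minimal illustration, not the lecture's reference implementation: it assumes a linear gap penalty and hypothetical scores (match +1, mismatch -1, gap -2), whereas practical tools use substitution matrices such as BLOSUM62 and affine gap costs. The fill step is shared; only the initialisation, the zero floor and where the traceback starts and stops differ between the global and local versions. The function name `align` is made up for this example.

```python
def align(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Dynamic-programming pairwise alignment with a linear gap penalty.

    local=False gives Needleman-Wunsch (global); local=True gives
    Smith-Waterman (local). Returns (score, aligned_a, aligned_b).
    """
    def s(x, y):
        # Hypothetical match/mismatch scheme standing in for a substitution matrix.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Initialisation: global alignment pays a gap for every leading residue;
    # local alignment may start anywhere, so row 0 and column 0 stay at 0.
    if not local:
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap

    # Fill: best of a diagonal step (pair two residues), a vertical step
    # (gap in b) or a horizontal step (gap in a).
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                       F[i - 1][j] + gap,
                       F[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)        # Smith-Waterman floors scores at zero
                if cell > best:            # remember the highest-scoring cell
                    best, end = cell, (i, j)
            F[i][j] = cell
    if not local:
        best, end = F[n][m], (n, m)        # global score sits in the bottom-right corner

    # Traceback: global runs back to (0, 0); local stops at the first zero cell.
    out_a, out_b = [], []
    i, j = end
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("ACGT", "AGT"))                          # global: every residue is accounted for
print(align("TTTACGTAAA", "GGACGTCC", local=True))   # local: only the best-matching core
```

The local call isolates the shared ACGT segment and ignores the unrelated flanks, which is why the local formulation suits database searches where only part of a hit is related to the query.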
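For question 9, a dot plot is a matrix with one sequence along each axis and a mark wherever the residues (or short windows of residues) agree; runs of marks along a diagonal reveal similar regions, and parallel diagonals reveal repeats. The sketch below assumes a hypothetical window size and identity threshold as a simple noise filter, and prints text instead of drawing a figure.

```python
def dot_plot(seq1, seq2, window=1, threshold=1):
    """Return rows of a text dot plot: '*' where the window starting at
    seq1[i] and seq2[j] has at least `threshold` identical positions."""
    rows = []
    for i in range(len(seq1) - window + 1):
        row = []
        for j in range(len(seq2) - window + 1):
            identical = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append("*" if identical >= threshold else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    # seq1 runs down the rows, seq2 across the columns.
    for line in dot_plot("ACGTTACGT", "ACGTACGT", window=3, threshold=3):
        print(line)
```

With window=1 every single-residue match is marked, which is noisy for DNA; a longer window with a strict threshold keeps only runs of agreement.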
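For question 10, bit scores and E-values come from Karlin-Altschul statistics: for a raw score $S$, $E = K m n e^{-\lambda S}$ and the bit score is $S' = (\lambda S - \ln K)/\ln 2$, so $E = m n 2^{-S'}$. Here $\lambda$ and $K$ depend on the scoring matrix and gap costs, and $m$, $n$ are the query and database lengths. In the sketch, the values of `lam`, `K`, `m` and `n` are illustrative placeholders only, and the edge-effect length corrections applied by search programs such as BLAST are ignored.

```python
import math


def bit_score(raw_score, lam, K):
    """Normalised bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(K)) / math.log(2)


def e_value(raw_score, lam, K, m, n):
    """Expected number of chance hits scoring >= S: E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * raw_score)


def e_value_from_bits(bits, m, n):
    """The same E-value expressed through the bit score: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative placeholders: lam and K depend on the scoring system;
# m and n are a hypothetical query length and database size.
lam, K = 0.267, 0.041
m, n = 300, 1_000_000
for S in (40, 60, 80):
    bits = bit_score(S, lam, K)
    print(f"raw {S}: {bits:.1f} bits, E = {e_value(S, lam, K, m, n):.2e}")
```

Because bit scores already absorb $\lambda$ and $K$, they can be compared across searches that used different scoring systems, while the E-value still depends on the search space $m n$.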
GenBank:
1. GenBank Overview: Describe GenBank and its role in bioinformatics. What
makes GenBank an essential resource for genetic and genomic research?
2. Database Collaboration: Who are the international collaborators involved with GenBank, and
how do they contribute to the database’s comprehensiveness?
3. Database Types: Differentiate between archival and curated data in the context of genetic
databases. Provide examples of each from the lecture.
4. Reference Sequences (RefSeq): What is RefSeq, and how does it differ from GenBank in
terms of data curation and sequence representation?
5. Features of RefSeq Records: Describe the unique features and information that a RefSeq
record provides compared to a standard GenBank entry.
6. GenBank vs. RefSeq: Based on the lecture content, discuss the main differences between
GenBank and RefSeq databases. Why might a researcher choose to use one database over the
other?
7. Sequence Submission: Explain the process of submitting sequence data to GenBank. What
tools are available for submission, and what types of data can be submitted?
8. Accession Numbers: How are accession numbers structured in GenBank, and what information
can they provide about a sequence record?
9. Sequence Analysis Tools: Identify and describe the tools mentioned in the lecture for analyzing
sequence data directly or finding additional related data within GenBank.
10. Accession Numbers and Versioning: How are GenBank accession numbers structured, and what
does the version number signify? Provide an example to illustrate the accession.version system
and explain how it is used to track changes to sequence records.
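Questions 8 and 10 revolve around the accession.version convention: the accession stays fixed for a record, and the version suffix increases each time the sequence itself is revised. The sketch below splits an identifier into its two parts and compares versions; the helper names are hypothetical, the identifiers are illustrative only, and the underscore test relies on RefSeq accessions using a prefix plus underscore (NM_, NP_, NC_, ...) while GenBank/INSDC accessions do not.

```python
import re

# "<accession>.<version>"; \w covers letters, digits and the RefSeq underscore.
ACC_VERSION = re.compile(r"^(\w+)\.(\d+)$")


def parse_accession_version(text):
    """Split e.g. 'AB123456.2' into ('AB123456', 2)."""
    match = ACC_VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"not in accession.version form: {text!r}")
    return match.group(1), int(match.group(2))


def is_refseq(accession):
    """RefSeq accessions have an underscore after the prefix; GenBank ones do not."""
    return "_" in accession


def is_newer_version(candidate, reference):
    """True if both strings name the same record and `candidate` is a later version."""
    acc_c, ver_c = parse_accession_version(candidate)
    acc_r, ver_r = parse_accession_version(reference)
    return acc_c == acc_r and ver_c > ver_r


# Identifiers below are illustrative only.
print(parse_accession_version("AB123456.2"))         # ('AB123456', 2)
print(is_refseq("NM_000546"), is_refseq("AB123456"))  # True False
print(is_newer_version("AB123456.3", "AB123456.2"))  # True
```

Citing the full accession.version in a methods section pins down exactly which sequence was analysed, even after the record is later updated.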
Multiple Sequence Alignment:
1. Conceptual Understanding of MSA:
• Define multiple sequence alignment (MSA) and explain its importance in
bioinformatics. What are the main criteria for building a successful MSA?
2. Applications of MSA:
• List and describe at least three main applications of multiple sequence alignment in
bioinformatics research.
3. Evolutionary Considerations in MSA:
• Discuss why functionally important amino acids or nucleotides are less likely to change
over evolutionary time than less important residues, in the context of multiple sequence alignment.
4. Sequence Selection for MSA:
• What are the guidelines for selecting sequences for multiple sequence alignment, and
why is it advised to avoid sequences that are too similar or too different?
5. DNA vs. Protein Sequences:
• Under what circumstances should you choose to align protein sequences over DNA
sequences for phylogenetic analysis?
6. Optimal Number of Sequences:
• Explain the challenges associated with computing large alignments and the reasons for
choosing the right number of sequences for an MSA.
7. Common Issues with MSA:
• Identify and explain two types of sequences that multiple sequence alignment programs
typically struggle with.
8. Naming Conventions for Sequences:
• Describe the best practices for naming sequences when preparing them for multiple
sequence alignment.
9. Gathering Sequences Using BLAST:
• What is the significance of characterizing sequences with good annotations and
experimental information when gathering sequences for MSA using BLAST?
10. Interpreting MSA Results:
• How can one recognize the "good parts" of a multiple sequence alignment, and what do
the symbols (*), (:), and (.) indicate in the context of MSA interpretation?
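For question 10, Clustal-style aligners print a symbol under each column: '*' when every sequence has the identical residue, ':' when the residues differ but all fall within one "strong" group of similar amino acids, and '.' when they all fall within one "weak" group. The sketch below assumes the strong and weak groups as listed in the ClustalW/Clustal X documentation (verify them against the aligner version you use) and, as a simplification, leaves any gapped column blank. The function names are hypothetical.

```python
# Residue groups as listed in the ClustalW / Clustal X documentation
# (an assumption to verify against the aligner version you use).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column (a string of residues)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # gapped columns get no symbol in this sketch
    if len(residues) == 1:
        return "*"                      # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                      # conservative substitutions only
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."                      # semi-conservative substitutions
    return " "


def conservation_line(aligned):
    """Build the symbol line under a set of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


aligned = ["MKVLS-", "MRILSG", "MKVLAG"]
for seq in aligned:
    print(seq)
print(conservation_line(aligned))
```

Stretches dense in '*' and ':' are the well-conserved "good parts"; long runs of blanks usually mark variable or poorly aligned regions.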
Phylogenetic Trees:
1. Basic Phylogenetics:
• Describe what a bifurcating phylogenetic tree represents and identify the components of
a typical phylogenetic tree including the root, internal nodes, terminal nodes, and
branches.
2. Clades and Taxonomy:
• Define a clade or monophyletic group in the context of phylogenetic trees. How do sister
taxa relate to each other within a monophyletic group?
3. Tree Topology and Evolution:
• Explain the difference between bifurcation and multifurcation in a phylogenetic tree.
What does multifurcation indicate about the evolutionary relationships among the taxa
involved?
4. Rooting Phylogenetic Trees:
• Discuss the two ways to root a phylogenetic tree, including the use of an outgroup and
midpoint rooting. What assumptions underlie the midpoint rooting approach?
5. Gene Phylogeny vs. Species Phylogeny:
• Why might the evolution of a particular gene sequence not correlate directly with the
evolutionary path of the species? How can researchers obtain a comprehensive species
phylogeny?
6. Tree Representation Forms:
• Compare and contrast cladograms and phylograms in terms of branch length scaling.
How do angled and squared forms of tree representation differ?
7. Newick Format:
• Describe the Newick format for representing phylogenetic trees. Why is this format
important for computational analysis of phylogenies?
8. Challenges in Finding True Phylogenetic Trees:
• Why is deriving a consensus tree based on individual inferred trees necessary, and what
does a multifurcating node in a consensus tree represent?
9. Sequence Choice for Phylogenetic Analysis:
• Discuss the considerations for choosing between nucleotide and protein sequences for
phylogenetic analysis. Under what circumstances would one type be preferable over the
other?
10. Alignment and Phylogenetic Analysis:
• Why are protein sequences often considered more reliable than nucleotide sequences for
phylogenetic analysis, especially when considering evolutionary rates and codon usage
bias?
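For question 7, Newick writes a tree as nested parentheses: sister groups are comma-separated inside a pair of parentheses, an optional label follows each node, a colon introduces a branch length, and the string ends with a semicolon. Because the whole tree is one line of text, programs can exchange and parse it easily. The minimal parser below is a sketch that assumes no whitespace, quoted labels or bracketed comments (all of which the full format allows); `parse_newick` and `leaves` are hypothetical helper names.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts with keys
    'name', 'length' (float or None) and 'children' (list)."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if s[pos] == "(":                     # internal node: (child,child,...)
            pos += 1
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos                           # optional node label
        while s[pos] not in ",():;":
            pos += 1
        name = s[start:pos]
        length = None
        if s[pos] == ":":                     # optional branch length
            pos += 1
            start = pos
            while s[pos] not in ",();":
                pos += 1
            length = float(s[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(node):
    """Terminal-node (taxon) names, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
print(leaves(tree))   # ['A', 'B', 'C']
```

In this example A and B are sister taxa forming a clade whose stem branch has length 0.3; dropping the `:length` parts gives the cladogram form of the same topology.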
Problem 1: Jukes-Cantor Model
The Jukes-Cantor model provides a simple formula to estimate the evolutionary distance (number of
substitutions per site) between two sequences. The formula is given by:
$$D_{JC} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$
where $D_{JC}$ is the evolutionary distance and $p$ is the proportion of nucleotide sites that differ between
the two sequences.
Given: Two DNA sequences of equal length have been compared, and it was found that 150 out of 600
nucleotides are different between them.
Task: Calculate the evolutionary distance between these two sequences using the Jukes-Cantor model.
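The arithmetic for Problem 1 can be checked in a few lines. Here $p = 150/600 = 0.25$, so $D_{JC} = -\frac{3}{4}\ln\left(1 - \frac{1}{3}\right) = -\frac{3}{4}\ln\frac{2}{3} \approx 0.304$ substitutions per site, somewhat more than the raw 0.25 because multiple hits at the same site are corrected for. The function name is illustrative; the guard reflects the fact that the logarithm is undefined once $p$ reaches 0.75.

```python
import math


def jukes_cantor_distance(differences, length):
    """JC69 distance D = -(3/4) ln(1 - (4/3) p), with p = differences / length."""
    p = differences / length
    if p >= 0.75:
        # 1 - (4/3)p <= 0: the sequences are saturated and the distance is undefined.
        raise ValueError("p must be below 0.75 for the Jukes-Cantor distance")
    return -0.75 * math.log(1 - (4 / 3) * p)


d = jukes_cantor_distance(150, 600)
print(f"p = {150 / 600:.2f}, D_JC = {d:.4f}")   # p = 0.25, D_JC = 0.3041
```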

Problem 2: Kimura Two-Parameter Model


The Kimura two-parameter model takes into account the difference in rates between transitions
(substitutions between purines or between pyrimidines) and transversions (substitutions between purine
and pyrimidine) to estimate evolutionary distance. The formula is:
$$D_{K2P} = \frac{1}{2}\ln\left(\frac{1}{1 - 2P - Q}\right) + \frac{1}{4}\ln\left(\frac{1}{1 - 2Q}\right)$$
where $D_{K2P}$ is the evolutionary distance, $P$ is the proportion of transitional differences, and $Q$ is the
proportion of transversional differences.
Given: In a comparison between two sequences, it was determined that 100 out of 1000 sites show
transitional differences and 50 out of 1000 sites show transversional differences.
Task: Calculate the evolutionary distance between these sequences using the Kimura two-parameter
model.
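For Problem 2: $P = 100/1000 = 0.10$ and $Q = 50/1000 = 0.05$, so $1 - 2P - Q = 0.75$ and $1 - 2Q = 0.90$, giving $D_{K2P} = \frac{1}{2}\ln\frac{1}{0.75} + \frac{1}{4}\ln\frac{1}{0.90} \approx 0.1438 + 0.0263 \approx 0.170$ substitutions per site. The sketch below reproduces this; the function name is illustrative.

```python
import math


def kimura_2p_distance(transitions, transversions, length):
    """K2P distance D = (1/2) ln(1/(1-2P-Q)) + (1/4) ln(1/(1-2Q))."""
    P = transitions / length        # proportion of transitional differences
    Q = transversions / length      # proportion of transversional differences
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("too many differences: the K2P distance is undefined")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


d = kimura_2p_distance(100, 50, 1000)
print(f"D_K2P = {d:.4f}")   # D_K2P = 0.1702
```

As a sanity check, setting $P = p/3$ and $Q = 2p/3$ (no transition/transversion bias) makes both logarithms equal to $\ln\frac{1}{1 - 4p/3}$, and the formula collapses to the Jukes-Cantor distance.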
