Example Questions Bioinformatics 1
Note
• The questions will be based on material from the exercises (Übungen) and the lectures.
Example questions
Write down the best alignment for the same two DNA sequences.
Z:\summer_11_teaching\bioinformatic\exercises\example_question_bioinformatics.doc 16.05.2011 [1 / 4]
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5  -2  -2   0
W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11   2  -3
Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7  -1
V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
6. You have calculated a score matrix for a pair of DNA sequences. You have performed the traceback
calculation and found a result like:
ACACCTTA
Write down the corresponding sequence alignment with gaps in the correct positions.
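As background for question 6, the sketch below fills a score matrix and then walks the traceback to produce a gapped alignment. It is a minimal illustration of a Needleman-Wunsch style calculation, not the scheme used in the lectures: the scoring values (match +1, mismatch -1, linear gap penalty -2), the function name and the second sequence are all assumptions made here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (scoring values are assumed)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a

    # Traceback: start at the bottom-right corner and follow the moves
    # that reproduce each cell's score until the top-left corner is reached.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


# "ACCTA" is a hypothetical second sequence chosen only for illustration.
aligned_a, aligned_b, total = needleman_wunsch("ACACCTTA", "ACCTA")
print(aligned_a)
print(aligned_b)
print(total)
```

When several paths give the same score, this sketch prefers the diagonal move, so it reports only one of the possibly many optimal alignments.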
7. Outline the steps used to find values for a BLOSUM amino acid similarity matrix.
9. What is the advantage of a seeded method like BLAST compared to a Needleman-Wunsch alignment ?
10. Name an application where you would use a method like BLAST and not a Needleman-Wunsch
alignment.
11. Name an application where you would use a slow method like Needleman and Wunsch, rather than a
BLAST-like method.
12. You have two proteins with weak, remote similarity. You know their sequences. You also know the
sequences of the original DNA. Why would you expect a better alignment using the protein sequences ?
13. In protein alignments, we do not just look at match/mismatches. We look at the similarities between
14. There are several different substitution matrices used in protein alignments. Why would you prefer one
over another ?
15. What is the difference between an iterated BLAST (PSI-BLAST) search and a simple BLAST search ?
16. What is the advantage of an iterated BLAST search compared to a simple BLAST search ?
17. I have written a program that generates random DNA sequences. I expect to see 25 % sequence identity
between pairs of random sequences in gapped alignments. When I try the calculation, I usually see more
18. If I take random biological sequences from a data bank, I see even more sequence similarity. Why ?
19. I have two proteins with 20 % sequence identity. I ask you if this is likely to be significant. What other
20. If you use the program "chimera", what representation would you pick in order to see the secondary
structure ?
21. I want to calculate a multiple sequence alignment for Nseq sequences. How many pair-wise alignments will
I have to calculate ?
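To make the counting in question 21 concrete, the following sketch lists every unordered pair of sequences once and compares the count with N(N-1)/2. It assumes that each pair is aligned exactly once and that no sequence is aligned with itself; the sequence labels are hypothetical.

```python
from itertools import combinations


def pairwise_jobs(names):
    """Return every unordered pair of sequences, each pair listed once."""
    return list(combinations(names, 2))


names = ["seqA", "seqB", "seqC", "seqD"]  # hypothetical sequence labels
pairs = pairwise_jobs(names)
n_seq = len(names)

# The enumerated count should agree with the closed form N(N-1)/2.
print(len(pairs), n_seq * (n_seq - 1) // 2)
```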
22. … $\sum_{a=1}^{N_{\text{seq}}} \sum_{b>a}^{N_{\text{seq}}}$ … words ?
23. In a multiple sequence alignment, I want to build a "guide tree". What determines the order in which I join
24. I have 3 sequences, A, B, C. The sequences B and C are both related to A, but I cannot get a good
alignment score when I align B and C. What could be a reason ? Draw a diagram if it is easier to explain.
25. I have calculated a multiple sequence alignment. I want to find which sites in the alignment are conserved
and which vary. I would like to make a plot of variability/conservation as a function of sequence position.
$\sum_{i=1}^{N_{\text{states}}} \ldots$
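One common way to quantify conservation per site, as asked in question 25, is the Shannon entropy of each alignment column. The sketch below assumes this measure, the natural logarithm, and that a gap character counts as just another state; the toy alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def entropy_profile(alignment):
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


alignment = ["ACDA", "ACEA", "ACFG"]  # hypothetical toy alignment
profile = entropy_profile(alignment)
for position, s in enumerate(profile, start=1):
    print(position, round(s, 3))
```

A fully conserved column gives an entropy of zero, and the value grows as more residue types share the column, so plotting `profile` against position gives a variability plot of the kind described.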
26. From a multiple sequence alignment, I have calculated variability/conservation as a function of sequence
position:
[Figure: S plotted against residue number (0 to 100); S axis from 0 to 3.5]
Residues 37 and 43 seem to be very conserved. Why might they be important residues ?
27. In the picture above, some sites are not very conserved. I say these residues cannot be important to the
function of the protein. Why may I be wrong ?
28. I have calculated a sequence alignment of 400 tyrosine kinases and I find that very few sites seem to be
conserved in evolution. How could I change my results, so that more sites seem to be conserved ?
29. We have a family of sequences and all pair-wise alignments. I can count the number of differences
(mutations) between any two sequences and calculate the fraction of residues that have changed :
$p_{\text{mut}} = \frac{N_{\text{diff}}}{N_{\text{length}}}$.
I would like to estimate evolutionary time by saying $t = k \, p_{\text{mut}}$ for some constant $k$.
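The fraction of changed residues from question 29 can be computed from an aligned pair as in the sketch below. Skipping columns that contain a gap is an assumption made here, as are the function name, the example sequences and the value of the constant k.

```python
def p_distance(seq1, seq2):
    """Fraction of compared positions that differ in an aligned pair.

    Columns containing a gap ('-') are skipped (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    n_diff = sum(1 for x, y in compared if x != y)
    return n_diff / len(compared)


k = 1.0  # hypothetical constant relating the fraction to time
p = p_distance("ACGTACGT", "ACGTTCGA")  # hypothetical aligned pair
t = k * p
print(p, t)
```

Because the fraction can never exceed 1, the time estimate built from it is bounded as well, which is worth keeping in mind when judging the linear relation.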
32. It is believed that protein sequence evolves and changes faster than protein structure. What could be an
evolutionary explanation for this ?
33. Which graphical representation would you use in order to emphasize the secondary structure content of
a protein: all-atom, chain trace, ribbon, ... ?
34. You have a protein of unknown function from a bacterium. You have made a knock-out mutant, but the
bacteria die immediately without the corresponding gene. You have sequenced the protein. What steps
would you take to guess the function of the protein ? What kind of information would you look for ?
35. How would you use a multiple sequence alignment to identify potentially catalytically important
side chains in the active site ?