Tutorial Note 7 Midterm Exam Review (Again!)
[Figure: suffix tree of GTAACTGTAGTG$ with nodes labelled A (root) to U, drawn twice: once with edge labels written out as substrings, and once with each edge label stored as a start-end index pair into the text, e.g. 3-3, 12-13, 4-13, 10-13, 13-13, 8-13.]
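The suffix tree figure labels edges with start-end pairs such as 4-13 and 10-13 instead of the substrings themselves. This is what keeps a suffix tree linear in size: every edge costs two integers no matter how long its label is. A minimal sketch of decoding such pairs, assuming 1-based inclusive positions into GTAACTGTAGTG$ (the helper `edge_label` is hypothetical):

```python
# Assumed: the text indexed by the figure, with '$' already appended.
TEXT = "GTAACTGTAGTG$"


def edge_label(text: str, start: int, end: int) -> str:
    """Return the substring for an edge stored as a 1-based, inclusive (start, end) pair."""
    return text[start - 1:end]


# The pairs that appear in the figure, decoded back into edge labels.
for start, end in [(3, 3), (12, 13), (4, 13), (10, 13), (13, 13), (8, 13)]:
    print(f"{start}-{end}: {edge_label(TEXT, start, end)}")
```

This prints the labels A, G$, ACTGTAGTG$, GTG$, $ and TAGTG$ in that order.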
• Euler tour (DFS, children visited in lexicographic order): ABACDCECFCAGAHIHJKLKMKJNJHAOPQPRPOSTSUSOA
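• Upon arriving at a leaf, output the starting position of the corresponding suffix (you can record the current string “depth” as the suffix length: with n = 13 characters including ‘$’, a leaf at depth d is the suffix starting at position n - d + 1).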
• SA = [13,3,4,9,5,12,1,7,10,2,8,11,6]
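The leaf-order idea above can be sketched in Python. For brevity this sketch swaps the compressed suffix tree for an uncompressed suffix trie (one character per edge, so node depth equals string depth); the DFS visits children in lexicographic order, relying on ‘$’ sorting before A, C, G, T as it does in ASCII, and each leaf at depth d yields start position n - d + 1. The function name `suffix_array_via_trie` is hypothetical.

```python
def suffix_array_via_trie(text: str) -> list[int]:
    """1-based suffix array of `text`, which must end with a unique '$'."""
    # Build an uncompressed suffix trie as nested dicts (one node per character).
    root: dict = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})

    n = len(text)
    sa: list[int] = []
    # Iterative DFS carrying the current depth (= length of the spelled string).
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if not node:
            # Leaf: the root-to-leaf path spells a suffix of length `depth`.
            sa.append(n - depth + 1)
            continue
        # Push children in reverse order so the smallest one is popped first.
        for ch in sorted(node, reverse=True):
            stack.append((node[ch], depth + 1))
    return sa


print(suffix_array_via_trie("GTAACTGTAGTG$"))
# → [13, 3, 4, 9, 5, 12, 1, 7, 10, 2, 8, 11, 6]
```

The trie has O(n^2) nodes, so it only suits tiny examples like this one; a real suffix tree compresses unary paths into the start-end edges shown in the figure.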
Sorted rotations of GTAACTGTAGTG$ (BWT rotation matrix):
1 $GTAACTGTAGTG
2 AACTGTAGTG$GT
3 ACTGTAGTG$GTA
4 AGTG$GTAACTGT
5 CTGTAGTG$GTAA
6 G$GTAACTGTAGT
7 GTAACTGTAGTG$
8 GTAGTG$GTAACT
9 GTG$GTAACTGTA
10 TAACTGTAGTG$G
11 TAGTG$GTAACTG
12 TG$GTAACTGTAG
13 TGTAGTG$GTAAC
F(x) = index of the first row starting with character x:
x A C G T
F(x) 2 5 6 10
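A minimal sketch of how the matrix and the F(x) table can be produced, assuming naive sorting of all rotations (fine for a 13-character example, not for real genomes). F(x) is computed as 1 plus the number of characters in the text that are smaller than x; the helper names are hypothetical.

```python
from collections import Counter


def bwt_matrix(text: str) -> list[str]:
    """All cyclic rotations of `text`, sorted ('$' sorts first in ASCII)."""
    return sorted(text[i:] + text[:i] for i in range(len(text)))


def first_rows(text: str) -> dict[str, int]:
    """F(x): 1-based index of the first matrix row that starts with x."""
    counts = Counter(text)
    table: dict[str, int] = {}
    row = 1
    for ch in sorted(counts):
        table[ch] = row  # rows starting with smaller characters come first
        row += counts[ch]
    return table


text = "GTAACTGTAGTG$"
matrix = bwt_matrix(text)
bwt = "".join(r[-1] for r in matrix)  # BWT output = last column of the matrix
print(bwt)               # → GTATAT$TAGGGC
print(first_rows(text))  # → {'$': 1, 'A': 2, 'C': 5, 'G': 6, 'T': 10}
```

Computing F(x) from character counts avoids building the matrix at all, which is how the table is usually obtained in practice.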
• Observation: ‘$’ is the smallest character, so the rotation starting with ‘$’ is always row 1; ‘$’ therefore lies in the top-left corner of the BWT rotation matrix.
• Thus, ‘$’ can never be BWT[1] (the last character of row 1), unless the DNA sequence is empty.
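The observation can be checked by brute force. ‘$’ ends up in the row whose rotation is the original sequence itself, so its BWT position is the rank of the full text among all rotations. The sketch below (hypothetical helper names; naive rotation sorting, fine only for tiny n) tallies that position over every DNA sequence of length n. Indices are 0-based, so index 0 corresponds to BWT[1].

```python
from itertools import product


def bwt_of(text: str) -> str:
    """Last column of the sorted rotation matrix (naive, for small inputs)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def dollar_position_counts(n: int) -> list[int]:
    """How many of the 4**n DNA sequences of length n put '$' at each BWT index."""
    counts = [0] * (n + 1)
    for letters in product("ACGT", repeat=n):
        text = "".join(letters) + "$"
        counts[bwt_of(text).index("$")] += 1
    return counts


print(dollar_position_counts(2))  # → [0, 6, 10]
```

Index 0 (BWT[1]) is never hit, as the observation predicts.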
CSCI3220 Algorithms for Bioinformatics Tutorial Notes | Fall 2018 13
Assignment 2. Q1. h)
• Suppose the given DNA sequence is randomly generated. If we append ‘$’ and perform BWT on it, will ‘$’ appear in each of the n+1 positions of the BWT output with equal probability?
• Or, consider how you calculate probabilities: