Alignment of Whole Genomes
Presented by: Hisham Adel Mohamed
Outline
Genome
Objective
MUMmer
MUMmer 1.0
Algorithms
Results
MUMmer 2
Algorithms
NUCmer
PROmer
Results
Genome
The total DNA in a cell.
Whole genomes are millions of nucleotides long.
Identifying regions of similarity and difference between genomes is important.
http://www.scq.ubc.ca/wp-content/uploads/2006/08/applications.gif
MUMmer
A system for aligning DNA and protein sequences in linear time.
Output the results.
MUMmer 1.0: Suffix Tree
Suffixes:
1 gaaccgacct
2 aaccgacct
3 accgacct
4 ccgacct
5 cgacct
6 gacct
7 acct
8 cct
9 ct
10 t
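As a minimal sketch of where the numbered list above comes from, the following Python enumerates every suffix of the example string together with its 1-based start position. The function name `suffixes` is hypothetical and chosen only for illustration. A real suffix tree stores these suffixes as root-to-leaf paths in compressed form rather than as an explicit list, which this sketch does not attempt.

```python
def suffixes(s):
    """Return (1-based start position, suffix) pairs for every suffix of s."""
    return [(i + 1, s[i:]) for i in range(len(s))]


if __name__ == "__main__":
    # Prints the ten suffixes of the slide's example string, one per line.
    for pos, suf in suffixes("gaaccgacct"):
        print(pos, suf)
```

Listing the suffixes explicitly takes quadratic space in the length of the string, which is why a suffix tree is used for genome-sized inputs.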
MUMmer 1.0: Maximal Unique Matching Subsequence (MUM)
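Figure: MUMs numbered by their order in genome A, shown in their order in genome B.
A: 1 2 3 4 5 6 7
B: 1 3 2 6 4 5 7
MUMs that occur in the same order in both genomes:
A: 1 2 4 5 7
B: 1 2 4 5 7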
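The reduction from the ordering 1 3 2 6 4 5 7 to 1 2 4 5 7 is consistent with taking a longest increasing subsequence (LIS) of the genome B ordering, which is how MUMmer 1.0 selects a set of MUMs that appear in the same order in both genomes. The Python below is a minimal sketch of that idea, not MUMmer's own code; the function name `longest_increasing_subsequence` is hypothetical, and it works on plain MUM labels rather than on genome coordinates and match lengths.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tails = []      # tails[k]: smallest tail value of an increasing run of length k + 1
    tails_idx = []  # index in seq of the element stored in tails[k]
    prev = [-1] * len(seq)  # back-pointers used to rebuild the subsequence

    for i, x in enumerate(seq):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1

    # Walk the back-pointers from the end of the longest run.
    out = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        out.append(seq[i])
        i = prev[i]
    return out[::-1]


if __name__ == "__main__":
    print(longest_increasing_subsequence([1, 3, 2, 6, 4, 5, 7]))
```

The LIS is not always unique: for this input, 1 3 4 5 7 has the same length as 1 2 4 5 7, and this sketch happens to return the latter.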
MUMmer 1.0: Closing the gaps
A gap is an interruption in the MUM alignment.
Long gaps: repeat the procedure using a shorter minimum length for MUMs.
Short gaps: use Smith-Waterman alignment.
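For the short-gap case, a minimal sketch of Smith-Waterman local alignment scoring is given below. The function name `smith_waterman_score` and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions made for illustration, not parameters taken from MUMmer. The sketch returns only the best local score and omits the traceback that a full aligner would need.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between a and b with a linear gap penalty."""
    prev_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend diagonally with a match or mismatch.
            diag = prev_row[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can restart anywhere.
            row[j] = max(0, diag, prev_row[j] + gap, row[j - 1] + gap)
            best = max(best, row[j])
        prev_row = row
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

The running time grows with the product of the two sequence lengths, which is why this step suits short gaps between MUMs rather than whole genomes.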
Results
Aligned two highly homologous strains of M. tuberculosis (4.4 million bp).
Time: 5 s for suffix tree construction, 45 s for sorting MUMs, 5 s for Smith-Waterman alignments.
Longest MUM: 24,563 bp; 249 MUMs > 5,000 bp; > 90% identical.
Figure: Suffix tree for the string atgtgtgtc$ (leaves numbered 1 to 10 by suffix start position).
MUMmer 2.0: Cluster MUMs
MUMmer 1 presumes that the two inputs to be aligned are complete sequences.
It computes a single longest alignment between the sequences.
Uses MUMmer 2
A contig (from contiguous) is a set of overlapping DNA segments derived from a single genetic
source.