
Alignment of Whole Genomes

This document discusses genome alignment using MUMmer. MUMmer is a system that can align whole genomes in linear time using suffix trees, longest increasing subsequence algorithms, and Smith-Waterman alignment. The document outlines MUMmer version 1.0 and 2.0, describing their algorithms and improvements. MUMmer 1.0 finds maximal unique matches between genomes and uses longest increasing subsequence to output alignments, while MUMmer 2.0 clusters matches and includes tools like NUCmer for contig alignment and PROmer for protein alignment. The document provides examples of MUMmer's speed and ability to align diverse genomes.

Uploaded by Samir Sabry



ALIGNMENT OF WHOLE GENOMES
Presented by: Hisham Adel Mohamed
Outline
- Genome
- Objective
- MUMmer
- MUMmer 1.0
  - Algorithms
  - Results
- MUMmer 2.0
  - Algorithms
  - NUCmer
  - PROmer
  - Results
Genome
- The genome is the total DNA in a cell.
- Whole genomes are millions of nucleotides long.
- Identifying regions of similarity and difference between genomes is important:
  - Similarity suggests a similar function in the two organisms.
  - Difference suggests a function that is needed in only one of the organisms.
Image source: http://www.scq.ubc.ca/wp-content/uploads/2006/08/molecular-machine.gif
Objective
- Align the whole genomes of two related organisms.
- Understand the functions of genomes.
- Apply this in healthcare, e.g. to find drugs for diseases.
Image source: http://www.scq.ubc.ca/wp-content/uploads/2006/08/applications.gif
MUMmer
- A system for aligning DNA and protein sequences in linear time.
- There are two versions:
  - MUMmer 1.0, 1999.
  - MUMmer 2.0, 2002.
MUMmer
- Mainly based on three algorithms:
  - Suffix trees.
  - The longest increasing subsequence (LIS).
  - Smith-Waterman alignment.
- The novelty of the system is integrating the three algorithms into one coherent system.
- Alignment using dynamic programming:
  - Naïve version: O(n²) time and space.
  - Hashing: faster, but still O(n²).
  - Some versions reduce the space to O(n) at the cost of more time.
- MUMmer:
  - Suffix trees: construction takes O(n) time and space, and searching for a substring takes O(m), where n is the sequence length and m is the substring length.
  - LIS: O(k log k), where k is the number of maximal unique matches (MUMs).
MUMmer 1.0, 1999
- Assumption: the sequences are closely related.
- Inputs: two DNA sequences and the minimum MUM length.
- Output: a base-to-base alignment of the whole sequences, highlighting the exact differences between the genomes.
MUMmer 1.0, 1999
- Find the maximal unique matches (MUMs) using a suffix tree.
- Sort the matches found and extract the longest set of matches using the longest increasing subsequence (LIS) algorithm.
- Close the gaps in the alignment, performing local identification of large inserts, repeats, small mutated regions, tandem repeats, and SNPs.
- Output the results.
MUMmer 1.0: Suffix Tree
Suffixes of the example string gaaccgacct:

1 gaaccgacct
2 aaccgacct
3 accgacct
4 ccgacct
5 cgacct
6 gacct
7 acct
8 cct
9 ct
10 t
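A suffix tree is built over exactly the suffixes listed above. As a minimal sketch (plain Python, function names hypothetical), the block below sorts the same suffixes into a suffix array, a simpler relative of the suffix tree, and finds a pattern by binary search, because all suffixes that start with the pattern sit next to each other in sorted order. The naive sort is far slower than the O(n) suffix-tree construction and is not how MUMmer builds its index.

```python
from bisect import bisect_left


def suffix_array(text):
    """Return the 1-based start positions of text's suffixes in lexicographic order.

    Naive construction: sorting n suffixes costs O(n^2 log n) in the worst case.
    """
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def occurrences(text, pattern):
    """Return the sorted 1-based positions where pattern occurs in text."""
    sa = suffix_array(text)
    suffixes = [text[i - 1:] for i in sa]
    # Suffixes beginning with `pattern` form one contiguous block of the sorted list.
    lo = bisect_left(suffixes, pattern)
    hi = bisect_left(suffixes, pattern + "\U0010ffff")  # sentinel above any real character
    return sorted(sa[lo:hi])


text = "gaaccgacct"
print(suffix_array(text))        # [2, 3, 7, 4, 8, 5, 9, 1, 6, 10]
print(occurrences(text, "acc"))  # [3, 7]
```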
MUMmer 1.0: Maximal Unique Matching Subsequence (MUM)
- A MUM is a subsequence that occurs exactly once in genome A and exactly once in genome B, and is not contained in any longer such subsequence.
- A single scan of the suffix tree finds all MUMs.
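To make the MUM definition concrete, here is a brute-force Python sketch that checks it directly: a match must be left-maximal and right-maximal, and its string must occur exactly once in each sequence. It is a teaching aid under stated assumptions (hypothetical function names, 1-based positions to match the suffix numbering above, roughly O(|A|·|B|·L) time), not the single suffix-tree scan MUMmer performs.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len):
    """Return MUMs as (pos_a, pos_b, length) tuples with 1-based positions."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Left-maximal: the match cannot be extended one character to the left.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            # Extend to the right as far as the two sequences agree (right-maximal).
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            match = a[i:i + length]
            # Unique: the matched string occurs exactly once in each sequence.
            if (length >= min_len
                    and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i + 1, j + 1, length))
    return mums


print(find_mums("gattaca", "tacagatt", min_len=3))  # [(1, 5, 4), (4, 1, 4)]
```

In the example the two MUMs appear in opposite orders in the two sequences, which is the kind of rearrangement the next step has to resolve.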
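MUMmer 1.0: LIS
- Sort the MUMs by their position in genome A.
- Select the longest set of MUMs that occur in ascending order in both genome A and genome B.
Figure: seven MUMs numbered 1 2 3 4 5 6 7 along genome A appear in the order 1 3 2 6 4 5 7 along genome B; the LIS keeps MUMs 1 2 4 5 7, which appear in the same order in both genomes.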
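The LIS step can be sketched with the standard O(k log k) patience-sorting method. Below, the MUMs are numbered by their order along genome A and listed in their order along genome B, as in the figure. This plain version ignores MUM lengths; whether and how MUMmer weights matches is not covered here, so treat the function (a hypothetical name) as an illustration only.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq in O(k log k)."""
    tail_values = []               # tail_values[L] = smallest tail of an increasing run of length L+1
    tail_indices = []              # index in seq of that tail
    predecessor = [-1] * len(seq)  # back-pointers for reconstruction
    for idx, value in enumerate(seq):
        pos = bisect_left(tail_values, value)
        if pos > 0:
            predecessor[idx] = tail_indices[pos - 1]
        if pos == len(tail_values):
            tail_values.append(value)
            tail_indices.append(idx)
        else:
            tail_values[pos] = value
            tail_indices[pos] = idx
    # Walk the back-pointers from the tail of the longest run.
    result = []
    idx = tail_indices[-1] if tail_indices else -1
    while idx != -1:
        result.append(seq[idx])
        idx = predecessor[idx]
    return result[::-1]


# MUMs numbered by their order along genome A, listed in their order along genome B.
print(longest_increasing_subsequence([1, 3, 2, 6, 4, 5, 7]))  # [1, 2, 4, 5, 7]
```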
MUMmer 1.0: Closing the gaps
- A gap is an interruption in the MUM alignment.
- Long gaps: repeat the procedure using a shorter minimum MUM length.
- Short gaps: use Smith-Waterman alignment.
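For short gaps, a Smith-Waterman local alignment fills in the bases between neighboring MUMs. A minimal score-only sketch follows; the scoring scheme (match +2, mismatch -1, gap -2) is an assumption, not MUMmer's actual parameters, and the traceback that would produce the aligned bases is omitted. Keeping only the previous row reduces memory to O(m), but the time stays O(nm), which fits its use on short gaps only.

```python
def smith_waterman_score(x, y, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of x and y (linear gap penalty)."""
    prev = [0] * (len(y) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero (the alignment restarts).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTACGT", "ACGTTCGT"))     # 13 (7 matches, 1 mismatch)
print(smith_waterman_score("TTTTACGTTTTT", "GGACGGG"))  # 6 (local match "ACG")
```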
Results
- Aligning 2 highly homologous strains of M. tuberculosis (4.4 million bp):
  - Time: 5 s suffix tree construction, 45 s sorting MUMs, 5 s Smith-Waterman alignments.
  - Longest MUM: 24,563 bp; 249 MUMs > 5,000 bp; > 90% identical.
- Aligning 2 cousin bacteria, M. genitalium (580 kbp) and M. pneumoniae (816 kbp):
  - Time: 6.5 s suffix tree; 0.02 s finding the LIS; 116 s alignments.
  - Longest MUM: 281 bp; 16 MUMs > 100 bp; < 50% identical.
- Aligning 2 syntenic sequences from human chromosome 12 and mouse chromosome 6 (225 kbp):
  - Time: 29 s in total, 1.6 s for the suffix tree.
  - Longest MUM: 117 bp; 10 MUMs > 50 bp.
MUMmer 2.0, 2002
- Algorithm improvements:
  - Lower memory use
  - Streaming query
  - New model to cluster matches
- Able to align not only simple DNA sequences but also human chromosomes.
- Able to align incomplete genomes and protein sequences.

MUMmer 2.0: Streaming query
- A suffix link points from node u to node v if the string label from the root to v equals the label from the root to u with its first character removed.
Figure: suffix tree for the string atgtgtgtc$ (positions 1 to 10), with suffix links, used to stream the query string ...atgtcc... through the tree.
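The suffix link rests on one property: if query[i:i+L] occurs in the reference, then query[i+1:i+L] does too, so after matching at position i the search at i+1 can resume from length L-1 instead of restarting at zero. The Python sketch below computes, for each position of a streamed query, the longest prefix found in the reference (often called matching statistics) using only that shortcut. It tests substrings with Python's `in` on the raw reference string rather than following suffix links in a tree, so it does not have MUMmer's linear running time; the function name is hypothetical and the example reuses the figure's strings without the $ terminator.

```python
def matching_statistics(reference, query):
    """For each query position i, the length of the longest prefix of query[i:] found in reference."""
    stats = []
    length = 0
    for i in range(len(query)):
        # Suffix-link shortcut: dropping the first character of a match leaves a match.
        length = max(length - 1, 0)
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        stats.append(length)
    return stats


print(matching_statistics("atgtgtgtc", "atgtcc"))  # [4, 4, 3, 2, 1, 1]
```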
MUMmer 2.0: Cluster MUMs
- MUMmer 1 presumed that two complete sequences were being aligned; it computed a single longest alignment between them.
- The new version can align unfinished assemblies, which may need rearrangement:
  - First, the system outputs a series of separate MUMs.
  - Clustering is performed by finding pairs of matches that are sufficiently close.
  - Finally, an LIS is computed.
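A minimal clustering sketch, assuming matches are (pos_a, pos_b, length) triples and that "sufficiently close" means the next match starts within a hypothetical `max_gap` bases after the previous one ends, in both genomes. Close pairs are merged with a union-find structure. The criterion MUMmer 2 actually uses may differ, and the final LIS over each cluster is left out.

```python
def cluster_matches(matches, max_gap):
    """Group (pos_a, pos_b, length) matches whose gaps in both genomes are <= max_gap."""
    matches = sorted(matches)  # order by position in genome A
    parent = list(range(len(matches)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (ai, bi, li) in enumerate(matches):
        for j in range(i + 1, len(matches)):
            aj, bj, _ = matches[j]
            gap_a = aj - (ai + li)
            if gap_a > max_gap:
                break  # sorted by pos_a, so every later match is even farther away
            gap_b = bj - (bi + li)
            if gap_a >= 0 and 0 <= gap_b <= max_gap:
                parent[find(j)] = find(i)

    clusters = {}
    for i, m in enumerate(matches):
        clusters.setdefault(find(i), []).append(m)
    return sorted(clusters.values())


mums = [(1, 1, 10), (15, 14, 10), (100, 500, 10), (112, 513, 5)]
print(cluster_matches(mums, max_gap=10))
# [[(1, 1, 10), (15, 14, 10)], [(100, 500, 10), (112, 513, 5)]]
```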

MUMmer 2.0: NUCmer
- A multiple-contig alignment program that uses MUMmer 2.
- A contig (from "contiguous") is a set of overlapping DNA segments derived from a single genetic source.
- NUCmer input: two multi-FASTA files containing partial or complete assemblies.
- Steps:
  - Create a map of all contig positions within each file.
  - Concatenate the contigs in each file.
  - Run MUMmer to find MUMs.
  - Map the matches back to their contigs.
  - Cluster the MUMs.
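NUCmer's bookkeeping (map contig positions, concatenate, map matches back) can be sketched as below. The separator character and function names are assumptions. A separator that is not a nucleotide keeps a match from running through it into the next contig, provided each file uses a different separator. Positions are 0-based.

```python
from bisect import bisect_right


def concatenate_contigs(contigs, separator="x"):
    """Join (name, sequence) contigs into one string and record where each starts (0-based)."""
    names, starts, parts = [], [], []
    offset = 0
    for name, seq in contigs:
        names.append(name)
        starts.append(offset)
        parts.append(seq)
        offset += len(seq) + len(separator)
    return separator.join(parts), names, starts


def map_back(position, names, starts):
    """Convert a 0-based position in the concatenated string to (contig name, offset in contig)."""
    k = bisect_right(starts, position) - 1  # last contig starting at or before position
    return names[k], position - starts[k]


seq, names, starts = concatenate_contigs([("contig1", "acgt"), ("contig2", "ggcc")])
print(seq, starts)                 # acgtxggcc [0, 5]
print(map_back(6, names, starts))  # ('contig2', 1)
```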

MUMmer 2.0: PROmer
- Protein-based alignment program.
- Input: 2 multi-FASTA files (DNA).
- Technique:
  - Translate the DNA into amino acids.
  - Create an index that maps each protein to its source DNA.
  - Filter the amino acid sequences to remove stop codons.
  - Pass the amino acid sequences to MUMmer.
  - Use the index to translate the matches back to DNA.
  - Cluster the matches.
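The translate-and-index part of PROmer might look like the sketch below, which uses the standard genetic code. It handles a single forward reading frame; a full treatment would also translate the other frames, including those of the reverse complement. "Removing stop codons" is interpreted here as splitting the protein at each stop, so no match can run through one; that reading, and all names, are assumptions. Each residue keeps the DNA position of its codon, so protein matches can be mapped back to DNA.

```python
BASES = "tcag"
# Standard genetic code, codons ordered tcag x tcag x tcag; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame(dna, frame):
    """Translate one forward reading frame (0, 1 or 2).

    Returns (peptide, dna_index) pairs: each peptide is a stop-free stretch of
    amino acids, and dna_index[n] is the 0-based DNA position of residue n's codon.
    """
    dna = dna.lower()
    peptides, residues, index = [], [], []
    for start in range(frame, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[start:start + 3], "X")  # "X" for codons with non-acgt letters
        if amino == "*":
            # Drop the stop codon and start a new peptide after it.
            if residues:
                peptides.append(("".join(residues), index))
            residues, index = [], []
        else:
            residues.append(amino)
            index.append(start)
    if residues:
        peptides.append(("".join(residues), index))
    return peptides


print(translate_frame("atggcctaaggg", 0))  # [('MA', [0, 3]), ('G', [9])]
```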
Results
- Aligning P. yoelii and P. falciparum (25 Mb):
  - PROmer: < 1 h
  - BLAST: ~ weeks
- Aligning E. coli (4.7 Mb) and V. cholerae (3 Mb) on a 1 GHz desktop computer:
  - MUMmer 1: 74 s, 293 MB
  - MUMmer 2: 27 s, 100 MB
References
- Delcher et al. (2002). Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res., 2478-2483.
- Delcher et al. (1999). Alignment of whole genomes. Nucleic Acids Res., 27, 2369-2376.
- http://www.iro.umontreal.ca/~csuros/IFT6299/A2004/materiel/Shakiba.ppt
Questions?
