
Lecture-7 Gene Duplication and Read Mapping

The document covers topics related to DNA mutation, gene duplication, and read mapping in genomics. It explains the mechanisms of mutation, types of gene duplication, and the concepts of homologs, orthologs, and paralogs. Additionally, it details various genome indexing techniques such as keyword trees, suffix trees, suffix arrays, and the Burrows-Wheeler transform for short read mapping.

Uploaded by

deadghostbusters

Gene Duplication and Read Mapping
Week 7

Department of CSE, DIU


CONTENTS

1. Mutation

2. Gene Duplication

3. Read Mapping
- Keyword Tree
- Suffix Tree
- Suffix Array
- Burrows Wheeler Transform
1. DNA Mutation
What mutation is, how it occurs, and its common forms
Mutation
DNA mutation refers to sudden, random changes in a DNA sequence, which can lead to different phenotypic expressions.
Example (Insertion): ATCCGA → ATGCCGA
Common Mutation Types
Substitution: AATTCGCA → AATGCGCA
Insertion: AATCGCA → AATTCGCA
Deletion: AATTCGCA → AATCGCA
Duplication: AATCGCA → AATCATCGCA
Inversion: AATCGCA → AACGGCA, AGCATCG → ACTATCG
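The mutation types above can be sketched as simple string operations. This is only an illustration: the function names and 0-based positions are assumptions of this sketch, and `invert` simply reverses a segment without complementing it, which is a simplification of a biological inversion.

```python
def substitute(seq, pos, base):
    """Replace the base at 0-based index pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Insert one or more bases before 0-based index pos."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos, length=1):
    """Remove `length` bases starting at 0-based index pos."""
    return seq[:pos] + seq[pos + length:]


def duplicate(seq, start, end):
    """Repeat the segment seq[start:end] immediately after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


def invert(seq, start, end):
    """Reverse the segment seq[start:end] (simplified: no complementing)."""
    return seq[:start] + seq[start:end][::-1] + seq[end:]


print(substitute("AATTCGCA", 3, "G"))  # AATGCGCA
print(insert("AATCGCA", 3, "T"))       # AATTCGCA
print(delete("AATTCGCA", 3))           # AATCGCA
print(duplicate("AATCGCA", 1, 4))      # AATCATCGCA
print(invert("AATCGCA", 2, 5))         # AAGCTCA
```

The first four calls reproduce sequence pairs from the slide; the inversion input is a made-up example.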
2. Gene Duplication
Duplication of Genes, Homologs, Orthologs, Paralogs
Gene Duplication
Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene.
Homolog, Ortholog, Paralog and Speciation
• Homolog - A gene related to a second gene by descent from a common ancestral DNA sequence

• Ortholog - Orthologs are genes in different species that evolved from a common ancestral gene by speciation*

• Paralog - Paralogs are genes related by duplication within a genome

• Speciation* - Speciation is the origin of a new species capable of making a living in a new way from the species from which it arose
3. Read Mapping
Short Read Mapping, Genome Indexing
Read Mapping
Mapping refers to the process of aligning short reads to a reference sequence (typically a genome) and finding the starting position of each read in that reference.

Short read generally are
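As a baseline for the indexing techniques that follow, exact read mapping can be sketched as a plain scan of the reference. The function name `map_read_naive` and the 1-based positions are assumptions of this sketch; real mappers also tolerate mismatches, which is not shown here.

```python
def map_read_naive(reference, read):
    """Return all 1-based start positions where `read` occurs exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start + 1)               # convert to 1-based
        start = reference.find(read, start + 1)   # allow overlapping hits
    return positions


reference = "ATCATG"
for read in ["AT", "CAT", "GG"]:
    print(read, map_read_naive(reference, read))
# AT [1, 4]
# CAT [3]
# GG []
```

Each query rescans the whole reference, which is why an index built once over the genome pays off when there are millions of reads.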


Genome Indexing (Keyword Tree)
▹ Stores a set of keywords in a rooted labeled tree.
▹ Each edge is labeled with a letter from an alphabet.
▹ Any two edges coming out of the same vertex have distinct labels.
▹ Every keyword stored can be spelled on a path from root to some leaf.
▹ Furthermore, every path from root to leaf gives a keyword.
Keywords
▹ Apple
▹ Apropos
▹ Banana
▹ Bandana
▹ Orange
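A keyword tree over the five keywords above can be sketched with nested dictionaries, one dictionary per vertex and one key per outgoing edge. The `"$"` end-of-keyword marker and the function names are assumptions of this sketch, not something the slide prescribes.

```python
def build_keyword_tree(keywords):
    """Build a keyword tree (trie) as nested dicts keyed by edge letter."""
    root = {}
    for word in keywords:
        node = root
        for letter in word:
            # Reuse the edge if it exists, otherwise create a new vertex.
            node = node.setdefault(letter, {})
        node["$"] = {}  # assumed marker: a keyword ends at this vertex
    return root


def contains(tree, word):
    """Return True if `word` was stored as a complete keyword."""
    node = tree
    for letter in word:
        if letter not in node:
            return False
        node = node[letter]
    return "$" in node


tree = build_keyword_tree(["Apple", "Apropos", "Banana", "Bandana", "Orange"])
print(contains(tree, "Banana"))  # True
print(contains(tree, "Ban"))     # False (a prefix, not a stored keyword)
print(sorted(tree))              # ['A', 'B', 'O']
```

"Banana" and "Bandana" share the path B-a-n and only branch afterwards, which is the space saving the tree offers.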
Genome Indexing (Suffix Tree)
▹ Similar to a Keyword Tree
▹ The suffixes of the text are the keywords
▹ Chains of edges without branching are collapsed into a single edge
▹ Each edge is labeled with a substring of the text
▹ All internal nodes have at least two outgoing edges.
▹ Leaves are labeled by the starting index of the suffix they spell.

Suffix tree of ATCATG
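A suffix tree for ATCATG can be sketched by inserting the suffixes one at a time and splitting an edge wherever a new suffix diverges. This is a naive quadratic-time construction chosen for readability, not one of the linear-time algorithms; the class and function names, the appended `$` terminator, and the 1-based leaf labels are assumptions of this sketch.

```python
class Node:
    def __init__(self):
        self.children = {}        # first letter of edge -> (edge label, child)
        self.suffix_start = None  # 1-based suffix index, set on leaves only


def build_suffix_tree(text):
    """Naive construction; `text` must end with a unique terminator."""
    root = Node()
    for start in range(len(text)):
        node, rest = root, text[start:]
        while True:
            if rest[0] not in node.children:
                leaf = Node()
                leaf.suffix_start = start + 1
                node.children[rest[0]] = (rest, leaf)
                break
            label, child = node.children[rest[0]]
            k = 0  # length of the common prefix of label and rest
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):
                # Split the edge at the point of divergence.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[rest[0]] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]
    return root


def occurrences(root, pattern):
    """Return sorted 1-based positions where `pattern` occurs in the text."""
    node, rest = root, pattern
    while rest:
        entry = node.children.get(rest[0])
        if entry is None:
            return []
        label, child = entry
        n = min(len(label), len(rest))
        if label[:n] != rest[:n]:
            return []
        node, rest = child, rest[n:]
    # Every leaf below the matched point is one occurrence.
    stack, found = [node], []
    while stack:
        current = stack.pop()
        if current.suffix_start is not None:
            found.append(current.suffix_start)
        stack.extend(child for _, child in current.children.values())
    return sorted(found)


tree = build_suffix_tree("ATCATG$")
print(occurrences(tree, "AT"))  # [1, 4]
print(occurrences(tree, "T"))   # [2, 5]
print(occurrences(tree, "GG"))  # []
```

Once the tree exists, the search cost depends on the pattern length and the number of hits rather than on the length of the text.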


Genome Indexing (Suffix Array)
▹ More space efficient than a suffix tree
▹ A suffix tree index for the human genome is about 47 GB
▹ Lexicographically sort all the suffixes
▹ Store the starting indices of the suffixes along with the original string

Generate Suffix Array of ATCATG

Suffixes of ATCATG$      Sorted lexicographically
1  ATCATG$               7  $
2  TCATG$                1  ATCATG$
3  CATG$                 4  ATG$
4  ATG$                  3  CATG$
5  TG$                   6  G$
6  G$                    2  TCATG$
7  $                     5  TG$
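The suffix array of ATCATG can be reproduced with a short sketch that sorts the suffix start positions by the suffix they begin, then uses binary search to look up a pattern. Sorting whole suffixes like this is simple but slow for genome-sized input; the function names and 1-based indices are assumptions chosen to match the slide.

```python
def suffix_array(text):
    """Return 1-based suffix start positions in lexicographic order."""
    return sorted(range(1, len(text) + 1), key=lambda i: text[i - 1:])


def find(text, sa, pattern):
    """Return sorted 1-based positions of `pattern` using binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # leftmost suffix whose first m letters are >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid] - 1:sa[mid] - 1 + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and text[sa[lo] - 1:].startswith(pattern):
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


text = "ATCATG$"
sa = suffix_array(text)
print(sa)                    # [7, 1, 4, 3, 6, 2, 5]
print(find(text, sa, "AT"))  # [1, 4]
```

Only the integer array and the original string are stored, which is where the space saving over a suffix tree comes from.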
Genome Indexing (Burrows Wheeler Transform)
▹ Given sequence: abaaba

▹ Append $ as the end marker: abaaba$

▹ Generate all rotations by cyclically shifting the string one character at a time

▹ Lexicographically sort all the rotations

▹ The last column of the sorted rotations is denoted BWT (T)
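The steps above can be sketched directly: append the end marker, build every rotation, sort them, and read off the last column. The function names are assumptions of this sketch, and building all rotations explicitly is only practical for short strings.

```python
def sorted_rotations(text):
    """Return the lexicographically sorted rotations of text + '$'."""
    t = text + "$"
    return sorted(t[i:] + t[:i] for i in range(len(t)))


def bwt(text):
    """Return the last column of the sorted rotations."""
    return "".join(row[-1] for row in sorted_rotations(text))


for row in sorted_rotations("abaaba"):
    print(row)
print(bwt("abaaba"))  # abba$aa
```

The `$` character sorts before the letters, so the rotation starting with `$` is always the first row.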
Genome Indexing (Burrows Wheeler Transform)

▹ Given sequence: abaaba

▹ Append $ as the end marker: abaaba$

▹ The lexicographically sorted rotations form the Burrows-Wheeler Matrix, denoted BWM (T)

▹ The suffix array generated from the rotations is called SA (T)

▹ BWM can be derived
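One way to see the link between SA (T) and BWM (T) is that row i of the matrix is the rotation starting at position SA[i], so BWT (T) is the character just before each suffix. The sketch below assumes 0-based positions and hypothetical function names, and is a small illustration rather than the slide's own derivation.

```python
def suffix_array(t):
    """Return 0-based suffix start positions of t in sorted order."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def bwm_from_sa(t, sa):
    """Row i of the matrix is the rotation of t that starts at sa[i]."""
    return [t[i:] + t[:i] for i in sa]


def bwt_from_sa(t, sa):
    """BWT character for row i is the one preceding suffix sa[i]."""
    # When i == 0, t[-1] wraps around to the '$' end marker.
    return "".join(t[i - 1] for i in sa)


t = "abaaba$"
sa = suffix_array(t)
print(sa)                  # [6, 5, 2, 3, 0, 4, 1]
print(bwt_from_sa(t, sa))  # abba$aa
```

This avoids materialising the rotations at all when only the last column is needed.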


Genome Indexing (Burrows Wheeler Transform)

LF (Last to First) Mapping

▹ Generate the Burrows Wheeler Matrix for a given sequence

▹ Assign numbers to distinguish occurrences of the same character

▹ Assign the numbers in ascending order for each character
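The numbering step can be sketched by labelling each character with how many times it has already appeared in its column. With this numbering, a character in the Last column and the same-numbered character in the First column refer to the same position in the text, which is what LF mapping relies on. The helper name `rank_column` and the (character, number) tuples are assumptions of this sketch; note that it also gives `$` the number 0.

```python
def rank_column(column):
    """Label each character with its occurrence number, counting from 0."""
    seen = {}
    labeled = []
    for ch in column:
        labeled.append((ch, seen.get(ch, 0)))
        seen[ch] = seen.get(ch, 0) + 1
    return labeled


last = "abba$aa"             # BWT of abaaba$
first = "".join(sorted(last))  # First column is the sorted Last column

L = rank_column(last)
F = rank_column(first)
# LF mapping: for each row, the row where its Last-column character
# appears in the First column.
lf = [F.index(item) for item in L]

print(first)  # $aaaabb
print(["%s%d" % item for item in L])
print(lf)     # [1, 5, 6, 2, 0, 3, 4]
```

Using `F.index` keeps the sketch short; real FM-index implementations compute the same mapping from character counts instead of searching.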
Genome Indexing (Burrows Wheeler Transform)
Find the row starting with b1 using LF Mapping
1. Start from the row containing $ in the First Column

2. Find out what is in the Last Column of that row (here it is a0)

3. Compare it with the query (b1)

4. If MATCH, then
- Find b1 in the First Column
- Print the row number
- Terminate

5. If No MATCH, then
- Find the row with that element in the First Column and repeat from step 2
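The walk described in these steps can be sketched as a loop over rows of the First and Last columns. The sketch assumes the no-match step continues from the row where the Last-column element appears in the First column; the function names, the (character, number) query format, and 0-based row numbers are likewise assumptions.

```python
def rank_column(column):
    """Label each character with its occurrence number, counting from 0."""
    seen = {}
    labeled = []
    for ch in column:
        labeled.append((ch, seen.get(ch, 0)))
        seen[ch] = seen.get(ch, 0) + 1
    return labeled


def find_row(bwt, query):
    """Follow LF mapping from the '$' row until `query` shows up in Last."""
    L = rank_column(bwt)
    F = rank_column(sorted(bwt))
    row = F.index(("$", 0))       # step 1: row with $ in the First column
    for _ in range(len(bwt)):     # at most one visit per row
        if L[row] == query:       # steps 2-4: match in the Last column
            return F.index(query)
        row = F.index(L[row])     # step 5: jump to that element in First
    return None                   # query does not occur


print(find_row("abba$aa", ("b", 1)))  # 6
```

For the slide's example the walk visits rows 0, 1, 5, 3 and 2 before b1 appears in the Last column, and b1 starts row 6 (the seventh row when counting from 1).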
Genome Indexing (Burrows Wheeler Transform)

Find Original Gene using LF Mapping if BWT (T) is Given

F     L
$     a0   Start
a0    b0
a1    b1
a2    a1
a3    $    FINISH
b0    a2
b1    a3

Recovered sequence: a b a a b a $

1. Original Gene = abaaba (Not Given)
2. Given BWT (T) = abba$aa
3. Store it as Last Column
4. Draw the First Column by sorting the elements of Last Column Lexicographically
5. Assign numbers to distinguish characters in an ascending manner
6. Start LF Mapping from Starting Element ($)
7. For each element found in
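The reconstruction can be sketched as follows: starting from the `$` row, each Last-column character is placed in front of the text recovered so far, and LF mapping gives the next row. How the slide words its final step is not recoverable here, so the loop below is an assumption based on the standard procedure; the function names are hypothetical.

```python
def rank_column(column):
    """Label each character with its occurrence number, counting from 0."""
    seen = {}
    labeled = []
    for ch in column:
        labeled.append((ch, seen.get(ch, 0)))
        seen[ch] = seen.get(ch, 0) + 1
    return labeled


def inverse_bwt(bwt):
    """Recover the original text (including '$') from its BWT."""
    L = rank_column(bwt)            # Last column with numbers
    F = rank_column(sorted(bwt))    # First column with numbers
    row = F.index(("$", 0))         # start at the row beginning with $
    text = "$"
    while L[row][0] != "$":         # finish when $ appears in Last
        text = L[row][0] + text     # text is rebuilt right to left
        row = F.index(L[row])       # LF mapping to the next row
    return text


print(inverse_bwt("abba$aa"))  # abaaba$
```

The text is recovered from its last character to its first, so each new character is added at the front.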
Whales and Dolphins
Their ancestors had back legs once, they could walk

Birds came from Dinosaurs
And they both descended from Reptiles

Humans have tails
While they are inside the womb! It dissolves eventually.

Bacterium
All living beings can be traced back to a bacterium
