Lecture 4 - Gene Prediction Problem - Similarity-Based Method
Similarity-based gene prediction uses sequence alignment techniques to identify potential genes in a given DNA sequence by comparing it with the sequences of known genes.
Example Problem
You have a newly sequenced DNA segment from an organism and need to identify possible
gene regions based on similarity to known gene sequences. The known gene sequences are
from a closely related species.
DNA Segment to Analyze:
5'-ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC-3'
Known Gene Sequences:
1. Gene A: 5'-ATGCGTACCTGATTAGCGGACTTCCAG-3'
2. Gene B: 5'-GATTAGCGGACTTCCAGTACGTAAGGATGG-3'
3. Gene C: 5'-AATGGAGTTGGGAGTCC-3'
Task: Predict the gene regions in the DNA segment by identifying which known gene
sequences are present.
Solution
Step 1: Identify Matches
Compare the DNA segment with each known gene sequence to find the best matches.
1. Compare with Gene A:
o Gene A: 5'-ATGCGTACCTGATTAGCGGACTTCCAG-3'
o DNA Segment: 5'-ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC-3'
Matching region:
o Sequence: 5'-ATGCGTACCTGATTAGCGGACTTCCAG-3' (exact match, 27 nucleotides)
o Position: 1 to 27 in the DNA segment
2. Compare with Gene B:
o Gene B: 5'-GATTAGCGGACTTCCAGTACGTAAGGATGG-3'
o DNA Segment: 5'-ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC-3'
Matching region:
o Sequence: 5'-GATTAGCGGACTTCCAGTACGTAAGGATGG-3' (exact match, 30 nucleotides)
o Position: 11 to 40 in the DNA segment
3. Compare with Gene C:
o Gene C: 5'-AATGGAGTTGGGAGTCC-3'
o DNA Segment: 5'-ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC-3'
Matching region:
o Sequence: 5'-AATGGAGTTGGGAGTCC-3' (exact match, 17 nucleotides)
o Position: 46 to 62 in the DNA segment
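Because all three comparisons above are exact string matches, they can be checked with a simple substring scan. The sketch below is a minimal illustration, not a real alignment tool; the helper name `find_exact_matches` is hypothetical, and it reports 1-based, inclusive positions to follow the convention used in this lecture.

```python
def find_exact_matches(segment: str, gene: str) -> list[tuple[int, int]]:
    """Return every exact occurrence of gene in segment as (start, end),
    using 1-based, inclusive coordinates."""
    hits = []
    index = segment.find(gene)  # 0-based index of the first occurrence, or -1
    while index != -1:
        hits.append((index + 1, index + len(gene)))  # convert to 1-based, inclusive
        index = segment.find(gene, index + 1)  # keep searching; overlapping hits are allowed
    return hits


segment = "ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC"
known_genes = {
    "Gene A": "ATGCGTACCTGATTAGCGGACTTCCAG",
    "Gene B": "GATTAGCGGACTTCCAGTACGTAAGGATGG",
    "Gene C": "AATGGAGTTGGGAGTCC",
}

for name, gene in known_genes.items():
    print(name, find_exact_matches(segment, gene))
# Gene A [(1, 27)]
# Gene B [(11, 40)]
# Gene C [(46, 62)]
```

Because the search restarts one position after each hit, it would also report repeated or overlapping copies of the same gene.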
Step 2: Annotate the Gene Regions
Based on the identified matches, annotate the DNA segment to highlight potential gene
regions.
Identified Gene Regions:
1. Gene A Region:
o Sequence: 5'-ATGCGTACCTGATTAGCGGACTTCCAG-3'
o Positions: 1 to 27
2. Gene B Region:
o Sequence: 5'-GATTAGCGGACTTCCAGTACGTAAGGATGG-3'
o Positions: 11 to 40
3. Gene C Region:
o Sequence: 5'-AATGGAGTTGGGAGTCC-3'
o Positions: 46 to 62
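One simple way to annotate the segment is to print a text track under the sequence, with one row per gene marking the positions that gene covers. This is a minimal sketch that assumes exact matches and uses only each gene's first occurrence; `annotation_track` is a hypothetical helper name.

```python
def annotation_track(length: int, start: int, end: int, label: str) -> str:
    """Return a string of `length` characters with `label` at positions
    start..end (1-based, inclusive) and '.' everywhere else."""
    return "".join(label if start <= pos <= end else "." for pos in range(1, length + 1))


segment = "ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC"
known_genes = {
    "A": "ATGCGTACCTGATTAGCGGACTTCCAG",
    "B": "GATTAGCGGACTTCCAGTACGTAAGGATGG",
    "C": "AATGGAGTTGGGAGTCC",
}

print(segment)
for label, gene in known_genes.items():
    index = segment.find(gene)  # 0-based index, or -1 if the gene is absent
    if index == -1:
        continue  # skip genes with no exact match
    start, end = index + 1, index + len(gene)  # convert to 1-based, inclusive
    print(annotation_track(len(segment), start, end, label))
```

Printed under the sequence, the rows make the shared A/B stretch and the short unannotated stretch between B and C visible at a glance.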
Summary
For the given DNA segment:
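• Gene A corresponds to positions 1-27.
• Gene B corresponds to positions 11-40 and overlaps Gene A at positions 11-27.
• Gene C corresponds to positions 46-62 and overlaps neither Gene A nor Gene B; positions 41-45 (CTTAG) match none of the known genes.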
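The overlap and coverage statements in the summary can be computed directly from the match positions. In this sketch the overlap of two intervals runs from the larger start to the smaller end (empty if that range is empty), and positions covered by no gene are reported as gaps. Exact matching is assumed, and the helper names `overlap` and `uncovered` are hypothetical.

```python
from itertools import combinations
from typing import Optional

Interval = tuple[int, int]  # (start, end), 1-based and inclusive


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the positions shared by intervals a and b, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def uncovered(length: int, intervals: list[Interval]) -> list[Interval]:
    """Return the maximal runs of positions 1..length covered by no interval."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    gaps: list[Interval] = []
    run_start = None
    for pos in range(1, length + 2):  # position length + 1 acts as a sentinel
        if pos <= length and pos not in covered:
            if run_start is None:
                run_start = pos  # a new gap begins
        elif run_start is not None:
            gaps.append((run_start, pos - 1))  # the current gap ended at pos - 1
            run_start = None
    return gaps


segment = "ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC"
known_genes = {
    "Gene A": "ATGCGTACCTGATTAGCGGACTTCCAG",
    "Gene B": "GATTAGCGGACTTCCAGTACGTAAGGATGG",
    "Gene C": "AATGGAGTTGGGAGTCC",
}
regions = {}
for name, gene in known_genes.items():
    index = segment.find(gene)  # first exact occurrence only
    if index != -1:
        regions[name] = (index + 1, index + len(gene))

for (name_a, a), (name_b, b) in combinations(regions.items(), 2):
    print(name_a, "/", name_b, "overlap:", overlap(a, b))
print("Not covered:", uncovered(len(segment), list(regions.values())))
```

For this segment it reports that Gene A and Gene B share positions 11-27, that Gene C overlaps neither of them, and that positions 41-45 are not covered by any known gene.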
Prediction:
• The DNA segment contains gene regions that match known sequences for Gene A, Gene B, and Gene C. The overlap between Gene A and Gene B suggests that the segment may include multiple gene features from different genes.
This similarity-based approach involves finding known gene sequences within a DNA
segment to predict gene regions. Tools like BLAST can be used to perform such sequence
comparisons in a real-world scenario.
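BLAST is a heuristic that searches for high-scoring local alignments; the exact dynamic-programming method it approximates is Smith-Waterman local alignment. The sketch below is a minimal Smith-Waterman, not BLAST itself, with an assumed scoring scheme (match +2, mismatch -1, linear gap penalty -2) chosen for illustration rather than taken from BLAST. Unlike the exact string search used in the examples, it still locates a gene whose copy in the segment differs by a substitution; the mutated query is hypothetical.

```python
def smith_waterman(query: str, target: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, int, int]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, target_start, target_end), where the target
    coordinates are 1-based and inclusive.
    """
    def pair_score(a: str, b: str) -> int:
        return match if a == b else mismatch

    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,  # start a new local alignment here
                score[i - 1][j - 1] + pair_score(query[i - 1], target[j - 1]),
                score[i - 1][j] + gap,  # query base aligned to a gap
                score[i][j - 1] + gap,  # target base aligned to a gap
            )
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j

    # Trace back from the best cell until a zero cell marks the alignment start.
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(query[i - 1], target[j - 1]):
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j + 1, best_j


segment = "ATGCGTACCTGATTAGCGGACTTCCAGTACGTAAGGATGGCTTAGAATGGAGTTGGGAGTCC"
# Hypothetical query: Gene C with its 10th base changed from G to C,
# standing in for a homologous gene that is similar but not identical.
query = "AATGGAGTTCGGAGTCC"
print(smith_waterman(query, segment))  # (31, 46, 62): 16 matches and 1 mismatch
```

The table costs time and memory proportional to the product of the two sequence lengths, which is why tools such as BLAST use faster heuristics on large databases.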
Example Problem
You have a DNA sequence from a newly sequenced genome and need to predict gene regions
by comparing it to known gene sequences from a reference database.
DNA Segment to Analyze:
5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG-3'
Known Gene Sequences:
1. Gene X: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG-3'
2. Gene Y: 5'-GAGCCGATGACTTAGGGATTCCGATCCG-3'
3. Gene Z: 5'-TACAGGACCTTAGGCGGAGAGCCGAT-3'
Task: Predict the gene regions in the DNA segment by identifying which known gene
sequences are present.
Solution
Step 1: Identify Matches
Compare the DNA segment with each known gene sequence to find matches.
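Since the reference genes come from other genomes, real matches are often similar rather than identical. A minimal ungapped variant of the comparison slides each gene along the segment and counts the mismatches (the Hamming distance) in every window. The helper name `find_approximate_matches` and the default threshold of 2 mismatches are illustrative assumptions, and the sketch does not handle insertions or deletions.

```python
def find_approximate_matches(segment: str, gene: str,
                             max_mismatches: int = 2) -> list[tuple[int, int, int]]:
    """Slide gene along segment without gaps.

    Returns (start, end, mismatches) for every window with at most
    max_mismatches differences; coordinates are 1-based and inclusive.
    """
    hits = []
    for i in range(len(segment) - len(gene) + 1):
        window = segment[i:i + len(gene)]
        mismatches = sum(1 for a, b in zip(window, gene) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + len(gene), mismatches))
    return hits


segment = "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG"
known_genes = {
    "Gene X": "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG",
    "Gene Y": "GAGCCGATGACTTAGGGATTCCGATCCG",
    "Gene Z": "TACAGGACCTTAGGCGGAGAGCCGAT",
}

for name, gene in known_genes.items():
    print(name, find_approximate_matches(segment, gene))
```

For this segment each gene is reported at its exact-match position with 0 mismatches; raising `max_mismatches` trades sensitivity for more spurious hits.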
1. Compare with Gene X:
o Gene X: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG-3'
o DNA Segment: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG-3'
Matching region:
o Sequence: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG-3' (exact match, 39 nucleotides)
o Position: 1 to 39 (matches the beginning of the DNA segment)
2. Compare with Gene Y:
o Gene Y: 5'-GAGCCGATGACTTAGGGATTCCGATCCG-3'
o DNA Segment: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG-3'
Matching region:
o Sequence: 5'-GAGCCGATGACTTAGGGATTCCGATCCG-3' (exact match, 28 nucleotides)
o Position: 44 to 71 (matches the end of the DNA segment)
3. Compare with Gene Z:
o Gene Z: 5'-TACAGGACCTTAGGCGGAGAGCCGAT-3'
o DNA Segment: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG-3'
Matching region:
o Sequence: 5'-TACAGGACCTTAGGCGGAGAGCCGAT-3' (exact match, 26 nucleotides)
o Position: 26 to 51 (matches the middle portion of the DNA segment)
Step 2: Annotate the Gene Regions
Based on the matches, annotate the DNA segment to identify the potential gene regions.
Identified Gene Regions:
1. Gene X Region:
o Sequence: 5'-ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG-3'
o Positions: 1 to 39
2. Gene Y Region:
o Sequence: 5'-GAGCCGATGACTTAGGGATTCCGATCCG-3'
o Positions: 44 to 71
3. Gene Z Region:
o Sequence: 5'-TACAGGACCTTAGGCGGAGAGCCGAT-3'
o Positions: 26 to 51
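To view predicted regions in a genome browser or pass them to annotation tools, they are often written in BED format, which uses 0-based, half-open coordinates: a 1-based inclusive region start..end becomes chromStart = start - 1 and chromEnd = end. The sketch below writes only the first four BED columns (chrom, chromStart, chromEnd, name); the sequence name `segment1`, the underscore-style region names, and the helper `to_bed` are hypothetical.

```python
def to_bed(seq_name: str, regions: dict[str, tuple[int, int]]) -> list[str]:
    """Convert 1-based, inclusive regions to 4-column BED lines sorted by position."""
    lines = []
    for name, (start, end) in sorted(regions.items(), key=lambda item: item[1]):
        # BED chromStart is 0-based; chromEnd is exclusive, which equals the
        # 1-based inclusive end, so only the start needs shifting.
        lines.append(f"{seq_name}\t{start - 1}\t{end}\t{name}")
    return lines


segment = "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG"
known_genes = {
    "Gene_X": "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG",
    "Gene_Y": "GAGCCGATGACTTAGGGATTCCGATCCG",
    "Gene_Z": "TACAGGACCTTAGGCGGAGAGCCGAT",
}
regions = {}
for name, gene in known_genes.items():
    index = segment.find(gene)  # first exact occurrence only
    if index != -1:
        regions[name] = (index + 1, index + len(gene))

for line in to_bed("segment1", regions):  # "segment1" is a hypothetical sequence name
    print(line)
```

Sorted by position, the lines come out as Gene_X (0 to 39), Gene_Z (25 to 51), and Gene_Y (43 to 71) in BED coordinates.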
Overlap Analysis:
• Gene X: Positions 1 to 39 (matches the beginning of the segment)
• Gene Y: Positions 44 to 71 (matches the end of the segment; it does not overlap Gene X)
• Gene Z: Positions 26 to 51 (overlaps the end of Gene X at positions 26-39 and the start of Gene Y at positions 44-51)
Overlap between Gene X and Gene Z: Gene Z starts at position 26, before Gene X ends at position 39, so the two share 14 nucleotides. Together, Genes X, Z, and Y cover the whole 71-nucleotide segment, suggesting a potential combined gene structure or multiple gene features within the segment.
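To check whether the overlapping matches form one continuous candidate region, the intervals can be merged with the standard interval-merging approach: sort by start position and extend the current region whenever the next interval begins at or before its end. A minimal sketch assuming exact matches; `merge_intervals` is a hypothetical name, and intervals that merely touch without overlapping are kept separate.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping 1-based, inclusive intervals into disjoint regions."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged region, so extend that region if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))  # disjoint: start a new region
    return merged


segment = "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGGCGGAGAGCCGATGACTTAGGGATTCCGATCCG"
known_genes = {
    "Gene X": "ATGGCCCTAGGTGACAGTGGATGCTTACAGGACCTTAGG",
    "Gene Y": "GAGCCGATGACTTAGGGATTCCGATCCG",
    "Gene Z": "TACAGGACCTTAGGCGGAGAGCCGAT",
}
intervals = []
for gene in known_genes.values():
    index = segment.find(gene)  # first exact occurrence only
    if index != -1:
        intervals.append((index + 1, index + len(gene)))

print(merge_intervals(intervals))  # [(1, 71)]
```

Applied to the first example's regions (1-27, 11-40, 46-62), the same function returns two separate regions, because Gene C does not overlap Gene B.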
Summary
For the given DNA segment:
• Gene X: Present in positions 1-39
• Gene Y: Present in positions 44-71
• Gene Z: Present in positions 26-51
Prediction:
• The DNA segment contains overlapping gene regions corresponding to Gene X, Gene Y, and Gene Z: Gene Z overlaps both Gene X and Gene Y, while Gene X and Gene Y do not overlap each other. The overlaps suggest that the DNA segment might include multiple genes or gene features.
This similarity-based approach involves aligning the DNA segment with known gene
sequences to identify which gene sequences are present and how they might overlap,
providing insights into the potential gene structure in the DNA segment.