Introduction to Bioinformatics for Medical Research

Gideon Greenspan

Lecture 3
Genomic Sequence Alignment
Genomic Sequence Alignment
• Homology
• Sequence similarity
• Dot plots
• Needleman-Wunsch alignment
• Alignment Variations
• Complexity
• BLASTN tool
Homology
• Similarity between species
• Before Darwin, based on morphology
– Similar bones in bat’s wing and whale’s flipper
• Darwin’s evolutionary theory
– Common ancestry
• Genetic sequences
– Evolution made visible
• Beware of incidental similarity
Types of Homology

• Orthologs generated by speciation

• Paralogs by gene duplication
Homology Motivation
• Study evolution
– Compare different species
• Discover function
– Compare against known species
• Find crucial features
– Compare many examples
• Identify cause of disease
– Compare against healthy individuals
Homology via Sequences
• Study evolution
– Multiple alignment and phylogenetic trees
• Discover function
– BLAST searches against database
• Find crucial features
– Motif finding
• Identify cause of disease
– Detect variable sites in alignment
Sequence Similarity
• Molecular genetic evidence of homology
• Requires quantification
– Length of match
– Strength of match

GTCTTCGACGTCAGTCGGCACGACCTG
ACCGTACGTCTCAGTTAGCGTGATCAG

Strong but short match | Long but weak match

Sequence Modifications
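• Three types of mutation
– Substitution (point mutation)
– Insertion
– Deletion
• Insertions and deletions are jointly called indels (replication slippage)
• Example: starting from TCAGT
– Substitution: TCCGT
– Insertion: TCGAGT
– Deletion: TCGT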
Scoring Similarity
• Assume independent mutation model
– Each site considered separately
• Score at each site
– Positive if the same
– Negative if different
• Sum to make final score
– Can be positive or negative
– Significance depends on sequence length
Substitutions Only
• Pretend there are no indels
– Sequences compared base-by-base
– Count the number of matches and mismatches

TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT

9 matches: +18
14 mismatches: -14
Total score: +4 (a weak match)
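
The tally above can be reproduced with a short sketch. The function name is hypothetical, and the +2 per match and -1 per mismatch are inferred from the slide's totals (9 matches giving +18, 14 mismatches giving -14).

```python
def score_ungapped(seq_a, seq_b, match=2, mismatch=-1):
    """Score two equal-length sequences base by base, pretending there are no indels."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    mismatches = len(seq_a) - matches
    total = matches * match + mismatches * mismatch
    return matches, mismatches, total


# The two sequences from the slide.
print(score_ungapped("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
```

On the slide's pair this gives 9 matches, 14 mismatches and a total of +4.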
Including Indels
• Create an ‘alignment’
– Count matches within alignment
– Required if sequences are different length

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
17 matches: +34
2 mismatches: -2
8 indels: -8
Total score: +24 (a strong match)
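
A minimal sketch of scoring a given alignment, assuming the same +2 / -1 scheme plus -1 per indel column as in the slide's tally. The function name is hypothetical, and "-" marks a gap in either row.

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, indel=-1):
    """Count matches, mismatches and indel columns in a fixed alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            indels += 1          # a gap in either row is one indel column
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + indels * indel
    return matches, mismatches, indels, total


# The alignment from the slide.
print(score_alignment("TT-CGTCGTAGTCG-GC-TCGACC-TG",
                      "GTACGTC-TAG-CGAGCGT-GATCCT-"))
```

On the slide's alignment this reports 17 matches, 2 mismatches, 8 indels and a total of +24.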
Choosing an Alignment
• Many different alignments are possible
– Should consider all possible
– Take the best score found

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
Score: +24

-TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T
Score: 0
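
Enumerating every alignment explicitly is impractical, so the best score is usually found with a dynamic-programming table, as in the Needleman-Wunsch method listed in the lecture outline. The sketch below is one common formulation, not necessarily the lecturer's. It assumes the same +2 / -1 / -1 scoring, returns only the best global score rather than the alignment itself, and uses a hypothetical function name.

```python
def needleman_wunsch_score(seq_a, seq_b, match=2, mismatch=-1, indel=-1):
    """Best global alignment score via a (len(a)+1) x (len(b)+1) table."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs one indel per base.
    for i in range(1, rows):
        table[i][0] = i * indel
    for j in range(1, cols):
        table[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diag = table[i - 1][j - 1] + (match if same else mismatch)
            up = table[i - 1][j] + indel      # gap in seq_b
            left = table[i][j - 1] + indel    # gap in seq_a
            table[i][j] = max(diag, up, left)
    return table[-1][-1]


print(needleman_wunsch_score("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
```

Because the table implicitly covers all alignments, the result for the slide's pair is at least the +24 shown above.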
Dot Plots
• Early method
• Sequences at top and left
• Dots indicate matched bases
• Diagonal series show matched regions

  G T A G T C G G
T . * . . * . . .
A . . * . . . . .
G * . . * . . * *
C . . . . . * . .
G * . . * . . * *
A . . * . . . . .
G * . . * . . * *
C . . . . . * . .
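
A dot plot for the slide's two sequences can be drawn with a few lines of code. The function name and the "*" / "." characters are illustrative choices, not part of the original slide.

```python
def dot_plot(top, left, dot="*", blank="."):
    """Return the lines of a text dot plot: a dot wherever the two bases match."""
    lines = ["  " + " ".join(top)]            # header row: the top sequence
    for base in left:
        cells = [dot if base == t else blank for t in top]
        lines.append(base + " " + " ".join(cells))
    return lines


# The sequences at the top and left of the slide's plot.
for line in dot_plot("GTAGTCGG", "TAGCGAGC"):
    print(line)
```

Runs of dots along a diagonal, such as T-A-G starting in the second column, correspond to matched regions.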
Local Alignment
• Best score for aligning part of sequences
– Often beats global alignment score
• Similar algorithm: Smith-Waterman
– Table cells never score below zero

Global (+2):
-ACGCTG
CATG-T-

Local (+4):
ACGCT
ATG-T
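
A minimal sketch of the Smith-Waterman idea, assuming the same +2 / -1 / -1 scoring as the earlier examples. It returns only the best local score, and the function name is hypothetical. The one change from a global table is that no cell is allowed to fall below zero.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, indel=-1):
    """Best local alignment score; cells are clamped at zero."""
    cols = len(seq_b) + 1
    prev = [0] * cols          # row above the current one
    best = 0
    for a in seq_a:
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a == seq_b[j - 1] else mismatch)
            up = prev[j] + indel
            left = curr[j - 1] + indel
            curr[j] = max(0, diag, up, left)   # never score below zero
            best = max(best, curr[j])          # the best cell can be anywhere
        prev = curr
    return best


print(smith_waterman_score("ACGCTG", "CATGT"))
```

For the slide's pair, the result is at least the +4 of the local alignment illustrated above, since the table considers every pair of subsequences.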

Gap Scores
• Example showed -1 score per indel
– So gap cost is proportional to its length
• Biologically, indels occur in groups
– We want our gap score to reflect this
• Standard solution: affine gap model
– Once-off cost for opening a gap
– Lower cost for extending the gap
– Changes required to algorithm
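
To illustrate the affine model, the sketch below scores a fixed alignment so that the first column of each gap costs more than the columns that extend it. The penalty values (-3 to open, -1 to extend) and the function name are assumptions for illustration. The slides do not give specific numbers, and conventions differ on whether the opening cost already includes the first gap position (here it does).

```python
def affine_alignment_score(row_a, row_b, match=2, mismatch=-1,
                           gap_open=-3, gap_extend=-1):
    """Score an alignment where a gap of length k costs gap_open + (k - 1) * gap_extend."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    gap_in = None                      # which row currently holds a gap: "a", "b" or None
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening a new one.
            total += gap_extend if side == gap_in else gap_open
            gap_in = side
        else:
            total += match if a == b else mismatch
            gap_in = None
    return total


print(affine_alignment_score("ACGTTTAC", "AC---TAC"))   # one gap of length 3
print(affine_alignment_score("ACGTTTAC", "A-G-T-AC"))   # three gaps of length 1
```

With these assumed penalties, the single grouped gap scores higher than the three scattered ones, which is the behaviour the affine model is meant to capture.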
Complexity
• Complexity is determined by size of table
– Aligning a sequence of length m against one of length n requires calculating (m × n) cells
• Estimate: we calculate 10^8 cells per second
– Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells → 0.64 seconds
– Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells → 1,000 secs ≈ 15 minutes

Complexity for GenBank
• GenBank contains 3 × 10^10 base pairs
– Searching an mRNA against GenBank requires ~2.5 × 10^14 cells → 2.5 × 10^6 secs ≈ 1 month!
– So each computer could support just one GenBank search per month
• We need to cut down on alignment
– Use a heuristic method to narrow down the part of GenBank that could be of interest
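
The running-time estimates on the complexity slides follow from one multiplication and one division. The sketch below redoes that arithmetic using the slides' own figures (10^8 cells per second, an 8,000 bp mRNA, a 10^7 bp chromosome, 3 × 10^10 bp in GenBank). The function and constant names are hypothetical.

```python
CELLS_PER_SECOND = 1e8     # the slides' estimate of table cells computed per second
SECONDS_PER_DAY = 86_400


def alignment_seconds(m, n, cells_per_second=CELLS_PER_SECOND):
    """Time to fill an m x n alignment table at the given rate."""
    return (m * n) / cells_per_second


mrna = 8_000
print(alignment_seconds(mrna, mrna))                              # mRNA vs mRNA, in seconds
print(alignment_seconds(mrna, 10**7))                             # mRNA vs chromosome, in seconds
print(alignment_seconds(mrna, 3 * 10**10) / SECONDS_PER_DAY)      # mRNA vs GenBank, in days
```

The unrounded chromosome figure is 8 × 10^10 cells, or about 800 seconds. The slides round this up to ~10^11 cells and 1,000 seconds. The GenBank search comes to roughly four weeks either way.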

Genomic Indexing
• Indexing can improve any kind of search
– Index need only be built once
– Looking up a key in the index is very fast compared with alignment
• Keys in genomic index are short sequences
– Each key points to many sequences
• Search uses index to isolate candidates
– Alignment is applied only to likely matches
• FASTA and BLAST implementations
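
The indexing idea can be sketched with a dictionary keyed by short subsequences (k-mers). This is a toy illustration, not how FASTA or BLAST are actually implemented. The function names, the value k = 4 and the three-sequence database are all made up for the example.

```python
from collections import defaultdict


def build_index(database, k=4):
    """Map every k-mer to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def candidates(index, query, k=4):
    """Rank database sequences by how many query k-mers they share."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for name, _ in index.get(query[pos:pos + k], ()):
            hits[name] += 1
    return sorted(hits, key=hits.get, reverse=True)


# Hypothetical miniature database; the index is built once and reused per query.
database = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "GGACGTGG"}
index = build_index(database)
print(candidates(index, "ACGT"))
```

Only the sequences returned by `candidates` would then be passed to the slower alignment step. The same k has to be used when building and when querying the index.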
FASTA and BLAST
• FASTA (Lipman & Pearson 1985)
– First fast sequence searching algorithm
• Gapped BLAST (Altschul et al 1997)
– Originally without gaps (1990)
• BLAST benefits
– Search speed
– Ease of use
– Statistical rigor
BLAST Variations
Name      Query type           Database
blastn    Genomic              Genomic
blastp    Protein              Protein
blastx    Translated genomic   Protein
tblastn   Protein              Translated genomic
tblastx   Translated genomic   Translated genomic

• Genomic translations test all 6 possibilities

– 3x for codon frames, 2x for reverse complement
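
The six possibilities can be made concrete with a short sketch that translates a DNA sequence in three frames on each strand. It assumes uppercase A/C/G/T input and the standard genetic code, with "*" marking stop codons. The function names are hypothetical and this is not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Three codon frames on the given strand, then three on its reverse complement."""
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


print(six_frame_translations("ATGGCC"))
```

For the toy input ATGGCC the forward frames read ATG GCC, TGG and GGC, and the reverse-complement strand GGCCAT supplies the other three.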
BLASTN Parameters
• Query: bare sequence, FASTA format, GenBank accession or identifier
• Subsequence: only use part of query
• Database: nr = all available, month = in last month, chromosome = complete genomes, more…
BLASTN Results (1)
• Query sequence representation
• Matched areas of database sequences
• Multiple matches on sequence
BLASTN Results (2)
• Sequence identifier
• Sequence description
• Score and E value
BLASTN Results (3)
• Database sequence
• Score breakdown
• Sequence positions
• Matching sites
BLASTN Scores
• Bit score
– Similar to alignment score
– Normalized to account for different schemes
– Higher means more significant
• E value
– Based on random database of similar size
– Number of hits of given score expected
– Lower means more significant
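
The link between the two scores can be sketched with the standard relation E = m × n × 2^(-S'), where S' is the bit score and m and n are the query and database lengths. This is a simplified illustration. Real BLAST reports use corrected "effective" lengths, so the numbers will not match its output exactly, and the function name is hypothetical.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E value: hits of at least this bit score expected by chance."""
    return query_len * db_len * 2.0 ** (-bit_score)


# An 8,000 bp query against a 3 x 10^10 bp database, at two bit scores.
for bits in (40, 60):
    print(bits, expected_hits(bits, 8_000, 3 * 10**10))
```

Each additional bit halves the expected number of chance hits, which is why a higher bit score and a lower E value both indicate a more significant match.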