Sequence Alignment

Chapter 2 discusses sequence alignments, which are used to infer relatedness, predict functions, and assemble sequences. It covers concepts such as homology, similarity, identity, and the methods for alignment including global and local alignments, dynamic programming, and algorithms like BLAST. Additionally, it emphasizes the importance of scoring matrices and the statistical significance of alignment results.

Uploaded by

CT Hương


Chapter 2.

Sequence Alignments

5/3/2025

1
Sequence alignment: Overview

• Sequence alignment provides a basis for inferring the relatedness of the two sequences under study.

seq1: CATTTATTTTC
seq2: AATTTGTA
(Figure: seq1 and seq2 aligned, with a match, a mismatch, and an indel labeled.)

• Match vs. mismatch.
• A gap (added to increase the number of matches) represents an insertion or deletion (indel).

2
Sequence alignment: Purpose

• Predict the function of a sequence by inference from a well-characterized sequence.

seq1: CATTTATTTTC
seq2: AATTTGTA

• Infer the evolutionary relationship between sequences: if two sequences share significant similarity, they are likely to have derived from a common evolutionary origin.
Note: sequences can be compared to infer function because the primary structure determines the secondary, tertiary, and quaternary structure (for example, similar active-site regions).
• Predict structural and functional motifs: active site, receptor site.

3
Sequence alignment: Purpose

• Assembly of sequence reads into larger units such as contigs or genomes

Seq 1 The more that

Seq 2 that you read,
Seq 3 you read, the more things
Seq 4 things you will
Seq 5 will know.

Assembled sequence:
The more that you read, the more things you will know.
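The reads above can be merged programmatically. The following is a minimal sketch, assuming the reads are already in order and that each read overlaps the next by an exact suffix-prefix match; `overlap_length` and `merge_reads` are hypothetical helper names, and a real assembler also has to cope with sequencing errors, unordered reads, and repeats.

```python
def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads: list[str]) -> str:
    """Merge ordered reads by collapsing each exact suffix-prefix overlap."""
    assembled = reads[0]
    for read in reads[1:]:
        # Append only the part of the read that extends past the overlap.
        assembled += read[overlap_length(assembled, read):]
    return assembled


reads = [
    "The more that",
    "that you read,",
    "you read, the more things",
    "things you will",
    "will know.",
]
print(merge_reads(reads))
```

Each overlap ("that", "you read,", "things", "will") is counted once, which is why the assembled sentence contains no repeated words.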

5
Sequence homology, similarity and identity

• Two sequences share homology when they share a common ancestor. Homology is not a quantitative term.
Note: two homologs are two sequences that share one ancestor.
Paralogs arise from gene duplication: they are duplicated sequences within the same organism.
Orthologs are homologs in different species, separated by a speciation event.

• Sequence similarity is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. Similarity is a quantitative term.

• Sequence identity is the percentage of aligned residues that match exactly.

• For DNA, sequence identity can be the same as similarity; for proteins, identity differs from similarity.
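Percent identity can be computed directly from a finished alignment. This is a small sketch, assuming the denominator is the full alignment length including gap columns (other conventions exist, such as dividing by the length of the shorter sequence); `percent_identity` is a hypothetical helper, and the example alignment is the gapped alignment of CATGTTA and CACTGGA used later in this chapter.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    # Assumption: divide by the alignment length, gap columns included.
    return 100.0 * matches / len(aligned_a)


print(percent_identity("CA-TGTTA", "CACTGG-A"))  # 5 of 8 columns match: 62.5
```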

6
Sequence evolution

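• Major changes:
• Substitution
• Insertion
• Deletion

Example (figure), starting from the ancestral sequence GACTGGA:
Substitution G -> C: CACTGGA
Deletion of C: CATGGA
Substitution G -> T: CATGTA
Insertion of T: CATGTTA
A speciation event separates the two lineages, which end as CATGTTA and CACTGGA.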

7
Sequence alignment: which alignment is the best?

Scoring: match +1, mismatch -1, gap -2 (gap penalty).

Alignment 1 (score -1):
CATGTTA
||    |
CACTGGA

Alignment 2 (score 0):
CA-TGTTA
|| ||  |
CACTGG-A

Alignment 3 (score -2):
CAT-GTTA
||  |  |
CACTGG-A
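The scoring scheme on this slide (match +1, mismatch -1, gap -2) can be applied column by column to compare candidate alignments. The sketch below assumes a linear gap penalty and uses a hypothetical helper, `score_alignment`, on the three alignments of CATGTTA and CACTGGA shown on the slide.

```python
def score_alignment(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two already-aligned sequences."""
    score = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            score += gap        # insertion or deletion
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


candidates = [
    ("CATGTTA", "CACTGGA"),
    ("CA-TGTTA", "CACTGG-A"),
    ("CAT-GTTA", "CACTGG-A"),
]
for top, bottom in candidates:
    print(top, bottom, score_alignment(top, bottom))
```

Under this scheme the second alignment scores highest (0, against -1 and -2), so it is the best of the three.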

8
Pairwise alignment: Global vs. local
Global comparison (which assumes the two sequences are of similar size) versus local comparison.

• In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length.
• Global alignment is suited to closely related sequences.
• Local alignment does not assume that the aligned sequences are similar in length; it finds the local regions that share the highest level of similarity.
• Local alignment is used to search for conserved regions within sequences.
9
Sequence alignment: dynamic programming method

• Global alignment: Needleman-Wunsch algorithm

Match: +1, mismatch: -1, gap: -2

• Step 1: set up a matrix
• Step 2: fill in (score) the matrix
• Step 3: trace back and identify the alignment
Note: a diagonal step in the matrix aligns two residues (match or mismatch).
Sequences: CACTGGA and CATGTTA
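The three steps can be sketched as follows. This is an illustrative implementation rather than a reference one: it assumes a linear gap penalty, returns only one of possibly several optimal alignments, and the function name `needleman_wunsch` is chosen here for convenience.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    n, m = len(seq1), len(seq2)

    # Step 1: set up an (n+1) x (m+1) matrix; the first row and column
    # hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Step 2: fill each cell with the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align two residues
                score[i - 1][j] + gap,       # gap in seq2
                score[i][j - 1] + gap,       # gap in seq1
            )

    # Step 3: trace back from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                top.append(seq1[i - 1])
                bottom.append(seq2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(seq1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = needleman_wunsch("CACTGGA", "CATGTTA")
print(best)
print(top)
print(bottom)
```

Because several alignments can share the optimal score, the traceback order (diagonal first, then up, then left) decides which one is reported.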

10
Sequence alignment: dynamic programming method

(Figure: scoring matrix with Sequence 1 (length n) along one axis and Sequence 2 (length m) along the other.)

Resulting alignment:
C A - T G T T A
C A C T G G - A
11
Sequence alignment: dynamic programming method

• Match: +1, mismatch: -1, gap: -3

12
Scoring matrix
• A substitution matrix is a set of values that quantify the likelihood of one residue being substituted by another in an alignment.

• Substitution matrices are derived from statistical analysis of residue substitution data from sets of reliable alignments of highly related sequences.

• Scoring matrices for nucleotide sequences are relatively simple: a positive value or high score is given for a match, and a negative value or low score for a mismatch.

• Scoring matrices for amino acids are more complicated because the scores reflect the physicochemical properties of the amino acid residues, as well as the likelihood of certain residues being substituted for one another among truly homologous sequences.
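A simple nucleotide scoring matrix can be represented as a lookup table. The sketch below assumes a uniform +1 for matches and -1 for mismatches (the values are illustrative, not taken from a published matrix) and uses hypothetical helpers, `simple_nucleotide_matrix` and `ungapped_score`. An amino acid matrix would be used the same way, with a 20 x 20 table of empirically derived values.

```python
NUCLEOTIDES = "ACGT"


def simple_nucleotide_matrix(match=1, mismatch=-1):
    """Build a 4 x 4 scoring table keyed by (residue, residue) pairs."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in NUCLEOTIDES
        for b in NUCLEOTIDES
    }


def ungapped_score(seq1, seq2, matrix):
    """Score two equal-length sequences position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))


matrix = simple_nucleotide_matrix()
print(matrix[("A", "A")], matrix[("A", "G")])
print(ungapped_score("CATGTTA", "CACTGGA", matrix))  # 3 matches, 4 mismatches: -1
```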

13
Scoring matrix
Residues with different properties are less likely to substitute for one another, so such substitutions are penalized more heavily.

14
Local alignment: Smith and Waterman algorithm

• Negative scores are replaced by 0.

• Traceback through the scoring matrix starts from the cell with the highest score and ends when a cell with score 0 is reached.
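The two rules above are the main changes relative to global alignment. A minimal sketch follows, assuming a linear gap penalty and reporting only the first highest-scoring cell when there are ties; the function name `smith_waterman` and the default scores (+1, -1, -2) are choices made for this example.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-2):
    n, m = len(seq1), len(seq2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; any negative score is replaced by 0.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + pair,
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a 0 is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            top.append(seq1[i - 1])
            bottom.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(seq1[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("CACTGGA", "CATGTTA"))
```

Unlike the global case, the result covers only the best-scoring local region, so residues outside that region are simply left out of the alignment.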

15
Sequence alignment: dot plots
• Seq1: GATTCTATCTAACTA
• Seq2: GTTCTATTCTAAC

• Place a dot wherever a match is found.

• Connect the dots along the diagonals.
• Drawback: high noise.
• Solution: a sliding window with a threshold.
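The effect of the sliding window can be seen on the two sequences from this slide. This is a sketch under simple assumptions: a dot is recorded at a window's start position when at least `threshold` of its `window` positions match, and `dot_plot` and `render` are hypothetical helper names.

```python
def dot_plot(seq1, seq2, window=1, threshold=1):
    """Return the set of (i, j) window starts whose match count meets the threshold."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Draw the plot as text: seq2 across the top, seq1 down the side."""
    lines = ["  " + seq2]
    for i, base in enumerate(seq1):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq2)))
        lines.append(base + " " + row)
    return "\n".join(lines)


seq1 = "GATTCTATCTAACTA"
seq2 = "GTTCTATTCTAAC"
raw = dot_plot(seq1, seq2)                             # every single match
filtered = dot_plot(seq1, seq2, window=3, threshold=3)  # 3 of 3 must match
print(render(seq1, seq2, raw))
print()
print(render(seq1, seq2, filtered))
```

With a four-letter alphabet, single-residue matches occur often by chance; requiring a run of matches removes most isolated dots and leaves the diagonals easier to see.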

16
Database similarity searching: pairwise alignment on large scale

• Database searching: a means of assigning putative functions to newly determined sequences.

• How: by pairwise alignment on a large scale, comparing a query sequence (the input sequence) against thousands of sequences in the database.

17
Database similarity searching: pairwise alignment on large scale

• Requirements:
• Sensitivity: the ability to find as many correct hits as possible

• Selectivity (specificity): the ability to find as few unrelated hits as possible

• Speed: the time it takes to get results

• Approaches:
• Exhaustive type: dynamic programming (Smith-Waterman algorithm)
• Heuristic type: takes a shortcut by reducing the search space

18
Basic Local Alignment Search Tool (BLAST)

• Developed by Stephen Altschul of NCBI and colleagues in 1990

• Became one of the most popular programs for sequence analysis
• Uses a heuristic approach to align a query sequence with all sequences in the database
• Objective: find high-scoring ungapped segments shared by related sequences

19
BLAST steps
1. Break the query sequence into words (e.g., 3 amino acids or 11 nucleotides).
2. Scan the query every 3 residues to produce the words used to search the word database.
3. Assume one of the words finds matches in the database.
4. Calculate the sums of match scores based on a scoring matrix.
5. Find the database sequence corresponding to the best word match and extend the alignment in both directions.
6. Determine the high-scoring segment above a threshold (e.g., 22).
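The seed-and-extend idea behind these steps can be sketched in simplified form. This is not the BLAST implementation: it assumes exact word matches only (no scoring-matrix neighborhood), a single database sequence, match/mismatch scores of +1/-1, and an extension that stops once the running score falls more than `x_drop` below its best value. All function names and parameters here are hypothetical.

```python
def find_seeds(query, subject, word_size):
    """Index the subject's words, then look up each query word (exact matches only)."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_one_way(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Best ungapped extension from (qi, sj), moving by step (+1 or -1)."""
    running = best_gain = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        running += match if query[qi] == subject[sj] else mismatch
        length += 1
        if running > best_gain:
            best_gain, best_len = running, length
        elif best_gain - running > x_drop:
            break  # score has dropped too far; stop extending
        qi += step
        sj += step
    return best_gain, best_len


def seed_and_extend(query, subject, word_size=3, match=1, mismatch=-1, x_drop=2):
    """Return (score, query_start, subject_start, length) of the best segment, or None."""
    best_hit = None
    for i, j in find_seeds(query, subject, word_size):
        right_gain, right_len = extend_one_way(
            query, subject, i + word_size, j + word_size, 1, match, mismatch, x_drop)
        left_gain, left_len = extend_one_way(
            query, subject, i - 1, j - 1, -1, match, mismatch, x_drop)
        score = word_size * match + left_gain + right_gain
        hit = (score, i - left_len, j - left_len, left_len + word_size + right_len)
        if best_hit is None or score > best_hit[0]:
            best_hit = hit
    return best_hit


print(seed_and_extend("ACGTACGT", "TTTACGTACGTTTT"))
```

The speed gain comes from the word index: only positions that share an exact word with the query are examined, instead of every cell of a dynamic programming matrix.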

20
BLAST results

21
Statistical significance of BLAST search results

• The E-value (expectation value) estimates how many alignments with a score at least as good as the one observed would be expected from a database search by random chance. The lower the E-value, the more significant the hit.

E-value = m × n × P
m: total number of residues in the database
n: number of residues in the query sequence
P: probability that an alignment is the result of random chance

E.g., E-value = 10^12 × 100 × 10^-20 = 10^-6
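The arithmetic in this example can be checked with a few lines of code. The function below simply encodes the formula from this slide (E-value = m × n × P); `e_value` is a hypothetical helper, and actual BLAST statistics are computed from alignment scores in a more involved way.

```python
import math


def e_value(m, n, p):
    """E-value = m x n x P, using the symbols defined on the slide."""
    return m * n * p


e = e_value(m=10**12, n=100, p=1e-20)
print(f"{e:.1e}")                # 1.0e-06
print(math.isclose(e, 1e-6))     # True
```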

22
BLAST results

23
BLAST results

24
Problems

1. Obtain the human HBA and HBB protein sequences. Perform pairwise alignment on the NCBI and EBI websites.
2. You have isolated a novel bacterial strain from a soil sample and submitted the PCR product of its 16S rRNA gene for Sanger sequencing. Now that you have the 16S rRNA gene sequence, use BLASTN on NCBI to identify your isolate.
Note: identity > 97%: same species; identity > 94%: same genus.
