Bioinformatics Alignment

The document discusses global pairwise alignment of nucleotide and amino-acid sequences, emphasizing the importance of homologous sequences and their evolutionary relationships. It outlines the concepts of orthology, paralogy, and xenology, and explains the methods and algorithms used for sequence alignment, including manual, dot matrix, and dynamic programming approaches. The document also highlights the significance of scoring matrices and gap penalties in determining optimal alignments between sequences.

Uploaded by

almondnathan400

GLOBAL PAIRWISE ALIGNMENT

GLOBAL ALIGNMENT OF:
2 NUCLEOTIDE SEQUENCES
OR
2 AMINO-ACID SEQUENCES

Assumptions:

Life is monophyletic.
Biological entities (sequences, taxa) share common ancestry.

Any two organisms share a common ancestor in their past.

[Figure: an ancestor and its two descendants, descendant 1 and descendant 2]
[Figures: three pairs of organisms whose common ancestors lived ~5 MYA, ~120 MYA, and ~1,500 MYA]

Homologous sequences arise through:
(1) Speciation events
(2) Gene duplication
(3) Duplicative transposition

Homology: A term coined by Richard Owen in 1843.

Definition: Similarity resulting from common ancestry.

Homology

There are three main types of molecular homology: orthology, paralogy (including ohnology), and xenology.

Homology: General Definition

• Homology designates a qualitative relationship of common descent between entities.
• Two genes are either homologous or they are not!
– It doesn’t make sense to say “two genes are 43% homologous.”
– It doesn’t make sense to say “Linda is 43% pregnant.”

Orthology & Paralogy
• Two genes are orthologs if they originated from a single ancestral gene in the most recent common ancestor of their respective genomes.
• Two genes are paralogs if they are related by gene duplication. Two genes are ohnologs if they are related by gene duplication due to genome duplication.

[Figure legend: symbol = gene death]

Xenology is due to horizontal (lateral) gene transfer (HGT or LGT).
XA and XB are xenologs.
Distinguishing orthologs from xenologs is impossible in pairwise genomic comparisons, but possible when multiple genomes are compared.

Orthology, Paralogy, Xenology
(Fitch, Trends in Genetics, 2000. 16(5):227-231)

Homology

By comparing homologous characters, we can reconstruct the evolutionary events that have led to the formation of the extant sequences from the common ancestor.

Homology

When comparing sequences, we are interested in POSITIONAL HOMOLOGY.
We identify POSITIONAL HOMOLOGY through SEQUENCE ALIGNMENT.

Alignment: A hypothesis concerning positional homology among residues from two or more sequences.

Positional homology = In pairwise alignment, a pair of nucleotides from two homologous sequences that have descended from one nucleotide in the ancestor of the two sequences.

Sequence alignment involves the identification of the correct location of deletions and insertions that have occurred in either of the two lineages since their divergence from a common ancestor.

[Figure: an unknown ancestral sequence gives rise to two descendant sequences; each lineage has undergone unknown events in an unknown order]

The true alignment is unknown.

There are two modes of alignment.

Global alignment: each residue of sequence A is compared with each residue in sequence B. Global alignment algorithms are used in comparative and evolutionary studies.

Local alignment: determining whether sub-segments of one sequence are present in another. Local alignment methods have their greatest utility in database searching and retrieval (e.g., BLAST).

For reasons of computational complexity, sequence alignment is divided into two categories:

Pairwise alignment (i.e., the alignment of two sequences).

Multiple-sequence alignment (i.e., the alignment of three or more sequences).

Pairwise alignment problems have exact solutions.

Multiple-sequence alignment problems, in practice, have only approximate (heuristic) solutions.

A pairwise alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs:
(1) matches = the same nucleotide appears in both sequences.
(2) mismatches = different nucleotides are found in the two sequences.
(3) gaps = a base in one sequence and a null base in the other.

GCGGCCCATCAGGTAGTTGGTG-G
GCGTTCCATC--CTGGTTGGTGTG

- Two DNA sequences: A and B.
- Lengths are m and n, respectively.
- The number of matched pairs is x.
- The number of mismatched pairs is y.
- Total number of bases in gaps is z.

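
As a small illustration of these quantities, the sketch below counts x, y, and z for the example alignment GCGGCCCATCAGGTAGTTGGTG-G / GCGTTCCATC--CTGGTTGGTGTG. The function name `summarize_alignment` is an assumption made for this example, not something from the slides. The final check relies on the bookkeeping identity m + n = 2(x + y) + z, which holds because every matched or mismatched pair uses one base from each sequence and every gap position uses exactly one base.

```python
def summarize_alignment(row_a, row_b, gap="-"):
    """Return (m, n, x, y, z) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    x = y = z = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            z += 1          # a base paired with a null base
        elif a == b:
            x += 1          # match
        else:
            y += 1          # mismatch
    m = len(row_a.replace(gap, ""))  # length of sequence A without gaps
    n = len(row_b.replace(gap, ""))  # length of sequence B without gaps
    return m, n, x, y, z


m, n, x, y, z = summarize_alignment(
    "GCGGCCCATCAGGTAGTTGGTG-G",
    "GCGTTCCATC--CTGGTTGGTGTG",
)
print(m, n, x, y, z)  # 23 22 17 4 3
assert m + n == 2 * (x + y) + z
```
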
There are internal and terminal gaps.

GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG

A terminal gap may indicate missing data.

GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG

An internal gap indicates that a deletion or an insertion has occurred in one of the two lineages.

GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG

When sequences are compared through alignment, it is impossible to tell whether a deletion has occurred in one sequence or an insertion has occurred in the other. Thus, deletions and insertions are collectively referred to as indels (short for insertion or deletion).

GCGG-CCATCAGGTAGTTGGTG--
GCGTTCCATC--CTGGTTGGTGTG

The alignment is the first step in many functional and evolutionary studies.

Errors in alignment tend to amplify in later stages of the study.

Motivation for sequence alignment

Function
– Similarity may be indicative of similar function.

Evolution
– Similarity may be indicative of common ancestry.

Some definitions

Methods of alignment:

1. Manual
2. Dot matrix
3. Distance Matrix
4. Combined (Distance + Manual)

Manual alignment. When there are few gaps and the two sequences are not too different from each other, a reasonable alignment can be obtained by visual inspection.

GCG-TCCATCAGGTAGTTGGTGTG
GCGATCCATCAGGTGGTTGGTGTG

Advantages of manual alignment:

(1) use of a powerful and trainable tool (the brain, well… some brains).

(2) ability to integrate additional data, e.g., domain structure, biological function.

Protein Alignment may be guided by Secondary and Tertiary Structures

[Figure: structures of the Escherichia coli DjlA protein and the Homo sapiens DjlA protein]

Disadvantages of manual alignment:

subjectivity (the algorithm is unspecified)

irreproducibility (the results cannot be independently reproduced)

unscalability (inapplicable to long sequences)

incommensurability (the results cannot be compared to those obtained by other methods)

The dot-matrix method (Gibbs and McIntyre, 1970): The two sequences are written out as column and row headings of a two-dimensional matrix. A dot is put in the dot-matrix plot at a position where the nucleotides in the two sequences are identical.

The alignment is defined by a path from the upper-left element to the lower-right element.

There are 4 possible steps in the path:
(1) a diagonal step through a dot = match.
(2) a diagonal step through an empty element of the matrix = mismatch.
(3) a horizontal step = a gap in the sequence on the left of the matrix.
(4) a vertical step = a gap in the sequence on the top of the matrix.

A dot matrix may become cluttered. With DNA sequences in which the four nucleotides are equally frequent, ~25% of the elements will be occupied by dots by chance alone.

[Figure: dot matrix with window size = 1, stringency = 1, alphabet size = 4]

The number of spurious matches is determined by: window size (how many residues are compared), stringency (the minimum number of matches for a hit), and alphabet size (number of character states). Window size must be an odd number.
[Figure: two dot matrices, one with window size = 1, stringency = 1, alphabet size = 4, the other with window size = 3, stringency = 2, alphabet size = 4]
[Figure: dot matrix with window size = 1, stringency = 1, alphabet size = 20]

Dot-matrix methods:
Advantages: By being a visual representation, and humans being visual animals, the method may unravel information on the evolution of sequences that cannot easily be gleaned from a line alignment.
Disadvantages: May not identify the best possible alignment.

Window size = 60 amino acids; Stringency = 24 matches

Advantages: Highlighting Information

The vertical gap indicates that a coding region corresponding to ~75 amino acids has either been deleted from the human gene or inserted into the bacterial gene.

Window size = 60 amino acids; Stringency = 24 matches

Advantages: Highlighting Information

The two pairs of diagonally oriented parallel lines most probably indicate that two small internal duplications occurred in the bacterial gene.

Disadvantages:

Not possible to identify the best alignment.

Scoring Matrices & Gap Penalties

The true alignment between two sequences is the one that accurately reflects the evolutionary relationships between the sequences.

Since the true alignment is unknown, in practice we look for the optimal alignment, which is the one in which the numbers of mismatches and gaps are minimized according to certain criteria.

Unfortunately, reducing the number of mismatches results in an increase in the number of gaps, and vice versa.

[Figure legend: symbols marking matches, mismatches, nucleotides in gaps, and gaps]

The scoring scheme comprises a gap penalty and a scoring matrix, M(a,b), that specifies the score for each type of match (a = b) or mismatch (a ≠ b).

The units in a scoring matrix may be the nucleotides in the DNA or RNA sequences, the codons in protein-coding regions, or the amino acids in protein sequences.

DNA scoring matrices are usually simple. In the simplest scheme all mismatches are given the same penalty.

M(a,b) is positive if a = b and negative otherwise.

\[
M(a,b)
\begin{cases}
> 0 & \text{if } a = b \\
< 0 & \text{if } a \neq b
\end{cases}
\]

In more complicated matrices a distinction may be made between transition and transversion mismatches, or each type of mismatch may be penalized differently.
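
A scoring function that separates transitions from transversions might look like the sketch below. The function name `dna_score` and the numerical values are illustrative assumptions (the +5 and -3 are borrowed from the worked example later in the slides; the milder transition penalty of -1 is hypothetical).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_score(a, b, match=5, transition=-1, transversion=-3):
    """Score one aligned pair of nucleotides.

    Transitions (purine <-> purine or pyrimidine <-> pyrimidine) receive
    a smaller penalty than transversions (purine <-> pyrimidine).
    """
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


print(dna_score("A", "A"))  # match
print(dna_score("A", "G"))  # transition
print(dna_score("A", "C"))  # transversion
```

Setting `transition` equal to `transversion` recovers the simplest scheme, in which all mismatches are given the same penalty.
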
Further complications: Distinguishing among different matches and mismatches.

For example, a mismatched pair consisting of Leu & Ile, which are very similar biochemically to each other, may be given a lesser penalty than a mismatched pair consisting of Arg & Glu, which are very dissimilar from each other.

[Figure: Leu & Ile receive a lesser penalty than Arg & Glu]

BLOSUM62 (BLOcks of amino acid SUbstitution Matrix)

[Figure: the BLOSUM62 matrix]

B = asx (asp or asn); Z = glx (glu or gln); X = unknown; * = termination codon

The matrix is symmetrical.
Positive numbers on the diagonal.
Mismatches are usually penalized.
Some mismatches are not penalized.
A few mismatches are even rewarded.

Gap penalty (or cost) is a factor (or a set of factors) by which the gap values (numbers and lengths of gaps) are mathematically manipulated to make the gaps equivalent in value to the mismatches.

The gap penalties are based on our assessment of how frequently different types of insertions and deletions occur in evolution in comparison with the frequency of occurrence of point substitutions.

[Figure: mismatches versus gaps]

The gap penalty has two components: a gap-opening penalty and a gap-extension penalty.

Three main gap-penalty systems:
(1) Fixed gap-penalty system = 0 gap-extension costs.
(2) Linear gap-penalty system = the gap-extension cost is calculated by multiplying the gap length minus 1 by a constant representing the gap-extension penalty for increasing the gap by 1.
(3) Logarithmic gap-penalty system = the gap-extension penalty increases with the logarithm of the gap length, i.e., more slowly.
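
The three systems can be compared side by side with a short sketch. Penalties are written here as positive costs, and the function names and default values (an opening cost of 4, an extension cost of 1, and the natural logarithm) are assumptions chosen only to make the comparison concrete.

```python
import math


def fixed_gap_cost(length, opening=4.0):
    """Fixed system: only the opening is charged, whatever the gap length."""
    return opening if length > 0 else 0.0


def linear_gap_cost(length, opening=4.0, extension=1.0):
    """Linear system: opening plus (length - 1) times the extension cost."""
    return opening + (length - 1) * extension if length > 0 else 0.0


def log_gap_cost(length, opening=4.0, extension=1.0):
    """Logarithmic system: the extension part grows with log(length)."""
    return opening + extension * math.log(length) if length > 0 else 0.0


for k in (1, 2, 5, 20):
    print(k, fixed_gap_cost(k), linear_gap_cost(k), round(log_gap_cost(k), 2))
```

For a gap of length 1 all three systems charge only the opening cost; they diverge as the gap grows, with the logarithmic cost rising more slowly than the linear one.
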
Alignment algorithms

Aim: Given a predetermined set of criteria, find the alignment associated with the best score from among all possible alignments.

The OPTIMAL ALIGNMENT

The number of possible alignments may be astronomical.

\[
\binom{n+m}{\min(n,m)} = \frac{(n+m)!}{n!\,m!} \approx \sqrt{\frac{n+m}{2\pi nm}}\;\frac{(n+m)^{n+m}}{n^{n}m^{m}}
\]

where n and m are the lengths of the two sequences to be aligned.
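
The binomial expression and its Stirling-type approximation can be evaluated directly, which gives a feel for how quickly the count grows. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the exact count depends on which alignments are treated as distinct, so other counting conventions give different (larger) totals.

```python
import math


def alignment_count_exact(n, m):
    """Binomial coefficient (n + m choose min(n, m)), as an exact integer."""
    return math.comb(n + m, min(n, m))


def alignment_count_stirling(n, m):
    """Stirling-type approximation of the same binomial coefficient."""
    prefactor = math.sqrt((n + m) / (2 * math.pi * n * m))
    return prefactor * (n + m) ** (n + m) / (n ** n * m ** m)


for length in (5, 10, 20):
    exact = alignment_count_exact(length, length)
    approx = alignment_count_stirling(length, length)
    print(length, exact, f"{approx:.3e}")
```

Even for two sequences of 20 residues the count already exceeds 10^11, which is why exhaustive enumeration is not a practical strategy.
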
For example, when two DNA sequences 200 residues long each are compared, there are more than 10^153 possible alignments.

In comparison, the number of protons in the universe is only ~10^80.

FORTUNATELY:

There are computer algorithms for finding the optimal alignment between two sequences that do not require an exhaustive search of all the possibilities.

The Needleman-Wunsch (1970) algorithm uses Dynamic Programming

Dynamic programming = a computational technique. It is applicable when large searches can be divided into a succession of small stages, such that (1) the solution of the initial search stage is trivial, (2) each partial solution in a later stage can be calculated by reference to only a small number of solutions in an earlier stage, and (3) the last stage contains the overall solution.

Dynamic programming can be applied to problems of alignment because ALIGNMENT SCORES obey the following rules:

\[
S_{1 \to x,\; 1 \to y} + S_{x+1,\; y+1} = S_{1 \to x+1,\; 1 \to y+1}
\]

Path Graph for aligning two sequences

[Figures: path graph, with examples of allowed and not allowed paths]

Scoring scheme

match = +5
mismatch = –3
gap-opening penalty = –4
gap-extension penalty = 0
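
To make the scheme concrete, the sketch below scores a finished alignment under it. The function name `score_alignment` is an assumption for this example. A gap is treated as a run of consecutive gap positions in the same sequence: the first position is charged the gap-opening penalty and each further position the gap-extension penalty.

```python
def score_alignment(row_a, row_b, match=5, mismatch=-3,
                    gap_open=-4, gap_extend=0):
    """Sum the column scores of an alignment given as two equal-length rows."""
    score = 0
    prev_gap = None  # which row ("a" or "b") the previous column's gap was in
    for a, b in zip(row_a, row_b):
        if a == "-":
            cur = "a"
        elif b == "-":
            cur = "b"
        else:
            cur = None
        if cur is None:
            score += match if a == b else mismatch
        elif cur == prev_gap:
            score += gap_extend   # continuing an existing gap
        else:
            score += gap_open     # starting a new gap
        prev_gap = cur
    return score


print(score_alignment("GAATTCAGT", "GGA-TC-GA"))  # 11
print(score_alignment("GCGGCCCATCAGGTAGTTGGTG-G",
                      "GCGTTCCATC--CTGGTTGGTGTG"))  # 65
```

In the second alignment the two-position gap costs the same as a one-position gap, because the gap-extension penalty is 0.
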

Matrix initialization
0 + match = 5
0 + gap = –4
0 + gap = –4

Matrix fill
0 + match = 5
5 + gap = 1
0 + gap = –4
… and so on and so forth

Complete matrix fill

Trace back
The alignment is produced by starting either at the highest score in the rightmost column or the bottom row (if terminal gaps are allowed) or at the bottom rightmost cell (if they are not), and proceeding from right to left by following the best pointers.

This stage is called the traceback. The graph of pointers in the traceback is also referred to as the path graph because it defines the paths through the matrix that correspond to the optimal alignment or alignments.

Trace back (if we DO allow terminal gaps)

Trace back (if we DO NOT allow terminal gaps)
10 + gap ≠ 11, 10 + gap ≠ 11, 14 + mismatch = 11
10 + gap ≠ 14, 5 + gap ≠ 14, 9 + match = 14
4 + mismatch ≠ 9, 0 + gap ≠ 9, 13 + gap = 9
8 + match = 13, 9 + gap ≠ 13, 4 + gap ≠ 13
12 + gap = 8, 3 + match = 8, –1 + gap ≠ 8
7 + gap ≠ 12, 3 + gap ≠ 12, 7 + match = 12
7 + gap = 3, –2 + mismatch ≠ 3, –6 + gap ≠ 3

high road/low road/middle road

Trace back (complete)


Two possible alignments:

GAATTCAGT
GGA-TC-GA
* * ** *

GAATTCAGT
GGAT-C-GA
* ** * *
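
The matrix fill and traceback for this example can be reproduced with the sketch below. It is a minimal Needleman-Wunsch illustration, not the code behind the slides, and the function names are assumptions. It also assumes that every gap position is charged –4, which is what the step values shown above imply (e.g., 5 + gap = 1) and which coincides with the stated scheme whenever gaps are one residue long, as they are in both alignments.

```python
def needleman_wunsch(a, b, match=5, mismatch=-3, gap=-4):
    """Fill the global-alignment score matrix F for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # first column: leading gaps in b
        F[i][0] = i * gap
    for j in range(1, cols):          # first row: leading gaps in a
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # diagonal step
                          F[i - 1][j] + gap,     # gap in b
                          F[i][j - 1] + gap)     # gap in a
    return F


def tracebacks(a, b, F, match=5, mismatch=-3, gap=-4):
    """Return every optimal alignment that ends in the bottom-right cell."""
    results = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                walk(i - 1, j - 1, top + a[i - 1], bottom + b[j - 1])
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            walk(i - 1, j, top + a[i - 1], bottom + "-")
        if j > 0 and F[i][j] == F[i][j - 1] + gap:
            walk(i, j - 1, top + "-", bottom + b[j - 1])

    walk(len(a), len(b), "", "")
    return results


seq_a, seq_b = "GAATTCAGT", "GGATCGA"
F = needleman_wunsch(seq_a, seq_b)
print(F[-1][-1])  # 11, the score in the bottom rightmost cell
for top, bottom in tracebacks(seq_a, seq_b, F):
    print(top)
    print(bottom)
    print()
```

Starting the traceback from the bottom rightmost cell corresponds to not allowing terminal gaps; allowing them would instead start from the highest score in the last row or column of `F`.
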
Scoring Matrices

Mismatch and gap penalties should be inversely proportional to the frequencies with which changes occur.

Transitions (68%) occur more frequently than transversions (32%). Mismatch penalties for transitions should be smaller than those for transversions.

          To A          To T          To C          To G          Row totals
From A    -             3.4 ± 0.7     4.5 ± 0.8     12.5 ± 1.1    20.3
                        (3.6 ± 0.7)   (4.8 ± 0.9)   (13.3 ± 1.1)  (21.6)
From T    3.3 ± 0.6     -             13.8 ± 1.9    3.3 ± 0.6     20.4
          (3.5 ± 0.6)                 (14.7 ± 2.0)  (3.5 ± 0.6)   (21.7)
From C    4.2 ± 0.5     20.7 ± 1.3    -             4.6 ± 0.6     29.5
          (4.2 ± 0.5)   (16.4 ± 1.3)                (4.4 ± 0.6)   (25.1)
From G    20.4 ± 1.4    4.4 ± 0.6     4.9 ± 0.7     -             29.7
          (21.9 ± 1.5)  (4.6 ± 0.6)   (5.2 ± 0.8)                 (31.6)
Column    27.9          28.5          23.2          20.5
totals    (29.5)        (24.6)        (23.2)        (21.3)

Empirical substitution matrices

PAM (Percent/Point Accepted Mutation)

BLOSUM (BLOcks SUbstitution Matrix)

PAM
• Developed by Margaret Dayhoff in 1978.
• Based on comparisons of very similar protein sequences.

Log-odds ratios

• A scoring matrix is a table of values that describe the probability of a residue (amino acid or base) pair occurring in an alignment.

• The values in a scoring matrix are log ratios of two probabilities. One is the probability that the pair occurs at random. The other is the empirically observed probability of the pair.

• Because the scores are logarithms of probability ratios, they can be added to give a meaningful score for the entire alignment. The more positive the score, the better the alignment!

The PAM matrices
(Percent accepted mutations)

• Align sequences that are at least 85% identical.
– Minimizes ambiguity in alignments and the number of coincident mutations.
• Reconstruct phylogenetic trees and infer ancestral sequences.
• Tally replacements "accepted" by natural selection, in all pairwise comparisons.
– Meaning, the number of times j was replaced by i in all comparisons.
• Compute amino acid mutability (i.e., the propensity of a given amino acid, j, to be replaced).

The PAM matrices

• Combine data to produce a Mutation Probability Matrix for one PAM of evolutionary distance, which is used to calculate the Log Odds Matrix for similarity scoring.

• Thus, depending on the protein family used, various PAM matrices result, some of which are “good” at locating evolutionarily distant conserved mutations and some of which are good at locating evolutionarily close conserved mutations.

More on log-odds ratios

In PAM, log-odds scores are multiplied by 10 to avoid decimals. Therefore, a PAM score of 2 actually corresponds to a log-odds ratio of 0.2.

0.2 = score for the substitution of i by j = log10 { (observed i-to-j mutation rate) / (expected rate) }

The value 0.2 is log10 of the relative expectation value of the mutation. Therefore, the expectation value is 10^0.2 ≈ 1.6.

So, a PAM score of 2 indicates that (in related sequences) the mutation would be expected to occur 1.6 times more frequently than random.
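
The conversion between a PAM score and the underlying odds ratio is a one-line calculation in each direction. The function names below are assumptions made for this sketch; the scale factor of 10 and the base-10 logarithm follow the description above.

```python
import math


def log_odds_score(observed, expected, scale=10):
    """Scaled, rounded log-odds score: round(scale * log10(observed / expected))."""
    return round(scale * math.log10(observed / expected))


def score_to_odds(score, scale=10):
    """Invert the scaling: how many times more frequent than random?"""
    return 10 ** (score / scale)


print(round(score_to_odds(2), 2))   # 1.58, i.e. about 1.6 times random
print(log_odds_score(1.6, 1.0))     # 2
print(score_to_odds(0))             # 1.0, exactly as frequent as random
```

Negative scores work the same way: a score of –10 corresponds to a pair observed at one tenth of the frequency expected at random.
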
PAM250
– Calculated for families of related proteins (>85% identity)
– 1 PAM is the amount of evolutionary change that yields, on average, one substitution in 100 amino acid residues
– A positive score signifies a common replacement whereas a negative score signifies an unlikely replacement
– PAM250 matrix assumes/is optimized for sequences separated by 250 PAM, i.e. 250 substitutions per 100 amino acids (longer evolutionary time)
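
Matrices for larger PAM distances are conventionally obtained by multiplying the 1-PAM mutation probability matrix by itself, once per PAM unit. The sketch below illustrates the idea with a hypothetical three-letter alphabet in which each residue has a 1% chance of being replaced per PAM; the matrix `toy_pam1` and the helper names are assumptions for this example and are not Dayhoff's data.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(M, power):
    """Raise a square matrix to a non-negative integer power."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, M)
    return result


# Hypothetical 1-PAM matrix: 99% chance of no change, 0.5% for each alternative.
toy_pam1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(value, 3) for value in row])
```

Because repeated substitutions at the same site overwrite one another, 250 substitutions per 100 residues does not mean that every residue differs: in this toy model the probability of finding the original residue after 250 PAM is still a little above one third.
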
PAM250
Sequence alignment matrix that allows 250 accepted point mutations per 100 amino acids. PAM250 is suitable for comparing distantly related sequences, while a lower PAM is suitable for comparing more closely related sequences.

Selecting a PAM Matrix
• Low PAM numbers: short sequences, strong local similarities.
• High PAM numbers: long sequences, weak similarities.
– PAM60 for close relations (60% identity)
– PAM120 recommended for general use (40% identity)
– PAM250 for distant relations (20% identity)
• If uncertain, try several different matrices
– PAM40, PAM120, PAM250 recommended.

BLOSUM
• Blocks Substitution Matrix
– Steven Henikoff and Jorja G. Henikoff (1992).
• Based on BLOCKS database (www.blocks.fhcrc.org)
– Families of proteins with identical function.
– Highly conserved protein domains.
• Ungapped local alignment to identify motifs
– Each motif is a block of local alignment.
– Counts amino acids observed in same column.
– Symmetrical model of substitution.

BLOSUM62
• BLOSUM matrices are based on local alignments (“blocks” or conserved amino acid patterns).
• BLOSUM 62 is a matrix calculated from comparisons of sequences with no more than 62% identity.
• All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
• BLOSUM 62 is the default matrix in BLAST 2.0.

BLOSUM Matrices

• Different BLOSUMn matrices are calculated independently from BLOCKS
• BLOSUMn is based on sequences that are at most n percent identical.

BLOSUM62
The procedure for calculating a BLOSUM matrix is based on a likelihood method estimating the occurrence of each possible pairwise substitution. Only aligned blocks are used to calculate the BLOSUMs.

The higher the score, the more closely related the sequences.
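
The core of the calculation, turning pair counts from an ungapped block into log-odds scores, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the two-sequence `toy_block` are hypothetical, scores are expressed in half-bit units (2 × log2 of observed over expected, rounded), and the clustering of similar sequences that real BLOSUM matrices apply before counting is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def pair_frequencies(block):
    """Count unordered residue pairs, column by column, in an ungapped block."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_scores(block):
    """Half-bit log-odds score for every residue pair observed in the block."""
    counts = pair_frequencies(block)
    total = sum(counts.values())
    q = {pair: count / total for pair, count in counts.items()}
    # Background frequency of each residue implied by the observed pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


toy_block = ["AAAAGGGGA",
             "AAAAGGGGG"]
for pair, score in sorted(log_odds_scores(toy_block).items()):
    print(pair, score)
```

In this toy block, identical pairs are observed more often than chance predicts and receive positive scores, while the rarely observed A/G pair receives a negative score.
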
Why is BLOSUM62 called BLOSUM62?

Because all sequences within a block that shared at least 62% identity with ANY other member of that block were averaged and represented as 1 sequence.

Selecting a BLOSUM Matrix

• For BLOSUMn, higher n is suitable for sequences that are more similar
– BLOSUM62 recommended for general use
– BLOSUM80 for close relations
– BLOSUM45 for distant relations

Equivalent PAM and Blosum matrices
The following matrices are roughly equivalent...

• PAM100 ==> Blosum90 (less divergent)
• PAM120 ==> Blosum80
• PAM160 ==> Blosum60
• PAM200 ==> Blosum52
• PAM250 ==> Blosum45 (more divergent)

Generally speaking...
• The Blosum matrices are best for detecting local alignments.
• The Blosum62 matrix is the best for detecting the majority of weak protein similarities.
• The Blosum45 matrix is the best for detecting long and weak alignments.

Comparison of PAM250 and BLOSUM62
The relationship between BLOSUM and PAM substitution matrices:

BLOSUM matrices with higher numbers and PAM matrices with low numbers are both designed for comparisons of closely related sequences.

BLOSUM matrices with low numbers and PAM matrices with high numbers are designed for comparisons of distantly related proteins.

If distant relatives of the query sequence are specifically being sought, the matrix can be tailored to that type of search.

Scoring matrices commonly used
• PAM250
– Shown to be appropriate for searching for sequences of 17-27% identity.

• BLOSUM62
– Though it is tailored for comparisons of moderately distant proteins, it performs well in detecting closer relationships.

• BLOSUM50
– Shown to be better for FASTA searches.

Effect of gap penalties on amino-acid alignment
Human pancreatic hormone precursor versus chicken pancreatic hormone

(a) Penalty for gaps is 0
(b) Penalty for a gap of size k nucleotides is w_k = 1 + 0.1k
(c) The same alignment as in (b), only the similarity between the two sequences is further enhanced by showing pairs of biochemically similar amino acids

Alignments: things to keep in mind
“Optimal alignment” means “having the highest possible score, given a substitution matrix and a set of gap penalties”.
This is NOT necessarily the most meaningful alignment.

The assumptions of the algorithm are often wrong:
- substitutions are not equally frequent at all positions,
- it is very difficult to realistically model insertions and deletions.

Pairwise alignment programs ALWAYS produce an alignment (even when it does not make sense to align sequences).