Sequence Alignment
• For DNA and RNA nucleotide sequences, the two terms "sequence
identity" and "sequence similarity" have the same meaning, because
both kinds of sequence have similar base pairs.
Calculating Sequence Identity (seq.ID) and Sequence Similarity (seq.Sim) in DNA and RNA Strands
Calculate the percent similarity and percent identity between all of the taxa's sequences.
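The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: it assumes the sequences are already aligned to the same length with "-" marking gaps, it divides by the number of aligned columns (other denominators are also in use), and the taxon names are hypothetical. Because identity and similarity coincide for nucleotides, one number serves for both.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns in which both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns where both sequences have a gap are ignored.
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def pairwise_identity(aligned):
    """Percent identity for every pair of named, pre-aligned sequences."""
    return {(n1, n2): percent_identity(aligned[n1], aligned[n2])
            for n1, n2 in combinations(aligned, 2)}


if __name__ == "__main__":
    # Hypothetical taxa, used only to show the call.
    taxa = {"taxon_a": "ATTGTAC", "taxon_b": "ATGTTAC", "taxon_c": "ATTGTAC"}
    for pair, value in pairwise_identity(taxa).items():
        print(pair, round(value, 1))
```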
• There are two different alignment strategies that are often used:
1. Global Alignment
2. Local Alignment
Global Alignment
• A global sequence-alignment method aligns and compares two sequences
along their entire length, and comes up with the best alignment that
displays the maximum number of nucleotides or amino acids aligned.
• The algorithm that drives global alignment is the Needleman-Wunsch
algorithm, which is used in bioinformatics to align protein or
nucleotide sequences. It was one of the first applications of dynamic
programming to the comparison of biological sequences.
• The global alignment algorithm starts at the beginning of the two
sequences and adds gaps to each as needed until the end of one is reached.
• Global alignment works best when the sequences are similar in
character and length. Because global alignment reports the best alignment
over the entire length of both sequences, it may miss a small
region of biological importance.
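Global alignment as described above can be sketched as follows. This is a minimal illustration of the Needleman-Wunsch recurrence, assuming a simple scoring scheme (match +1, mismatch -1, a flat gap cost of -2 per position) chosen only for the example; real tools use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align two residues
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Because the traceback always starts at the final cell and runs back to the origin, every residue of both sequences appears in the result, which is what makes the alignment global.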
Local Alignment
• The most basic sequence alignment method is the dot matrix (dot plot)
method. In this method, the two sequences being compared are written
along the vertical and horizontal axes of a matrix.
• Each pair of residues is then compared, and each match is given a dot;
mismatches are left blank. When enough dots line up along a diagonal,
they are connected.
[Figure: dot plots (panels A1, A2, A3) comparing the sequences ATTGTAC and ATGTTAC.]
• A downward shift in the diagonal means a gap must be added in sequence 1.
• A rightward shift in the diagonal means a gap must be added in sequence 2.
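The dot matrix method can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical, a match is drawn as "*", and a mismatch is drawn as "." to stand in for a blank cell so that the grid stays readable in plain text.

```python
def dot_matrix(seq1, seq2):
    """Rows follow seq1 (vertical axis); columns follow seq2 (horizontal axis)."""
    return [[a == b for b in seq2] for a in seq1]


def render(seq1, seq2):
    """Draw the matrix as text: '*' marks a match, '.' marks a mismatch."""
    lines = ["  " + " ".join(seq2)]
    for residue, row in zip(seq1, dot_matrix(seq1, seq2)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("ATTGTAC", "ATGTTAC"))
```

Runs of "*" along a diagonal correspond to stretches where the two sequences agree; a break or shift in the diagonal marks a position where a mismatch or gap is needed.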
Alignment Algorithms, Gaps, and Gap Penalties
• An algorithm is a step-by-step procedure that utilizes a finite number
of instructions for automated reasoning and the calculation of a
function.
• The algorithm that drives global alignment is the Needleman-Wunsch
algorithm, and the algorithm that drives local alignment is the
Smith-Waterman algorithm.
• Both these algorithms are examples of dynamic programming.
Dynamic programming is a method for solving complex problems by
breaking them down into simpler sub-problems.
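As a sketch of how dynamic programming drives local alignment, the following Python function fills a Smith-Waterman matrix, where each cell is a sub-problem (the best local alignment ending at that pair of residues). The scoring values (match +2, mismatch -1, flat gap cost -2) are assumptions chosen for the example, not values from any particular tool.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 lets an alignment restart anywhere; this is what makes it local.
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

The only differences from the global recurrence are the floor of zero in each cell and the traceback that starts at the best cell rather than the corner, so only the best-matching region of the two sequences is reported.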
Alignment Scoring Function
• In both global and local alignment, the final output is given an
alignment score.
• Gaps have to be introduced to improve the alignment. Gaps are
introduced because one of the sequences may have gained or lost
residues (insertions or deletions) during evolution in a way that the
other sequence did not.
• The gap penalty value is subtracted from the gross alignment score
to obtain the final alignment score. Inserting no more than 1
gap per 20 amino acid residues is ideal, but that is not possible in
most cases.
• For each gap opened, a gap-opening penalty value is assigned, and
for each gap extended, a gap-extension penalty value is assigned.
• A gap-opening penalty is typically much higher than a gap-extension
penalty. Default values of -10 for the gap-opening penalty and -1 for
the gap-extension penalty are often used.
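The gap-penalty arithmetic above can be sketched as follows. This is a minimal illustration under one stated assumption: the first position of a gap is charged the opening penalty and each further position is charged the extension penalty (some tools instead charge the extension penalty on every position, including the first). The function names and the gross score of 30 used in the example are hypothetical.

```python
import re


def gap_penalty(length, gap_open=-10, gap_extend=-1):
    """Penalty for one gap of the given length (assumed convention, see above)."""
    if length <= 0:
        return 0
    return gap_open + (length - 1) * gap_extend


def total_gap_penalty(aligned_seq, gap_open=-10, gap_extend=-1):
    """Sum the penalties of every run of '-' in one aligned sequence."""
    return sum(gap_penalty(len(run), gap_open, gap_extend)
               for run in re.findall(r"-+", aligned_seq))


def final_score(gross_score, aligned_a, aligned_b, gap_open=-10, gap_extend=-1):
    """Gross score with gap penalties applied.

    The penalties are negative numbers, so adding them is the same as
    subtracting their magnitude from the gross score.
    """
    return (gross_score
            + total_gap_penalty(aligned_a, gap_open, gap_extend)
            + total_gap_penalty(aligned_b, gap_open, gap_extend))


if __name__ == "__main__":
    # One gap of length 3 in the first sequence: -10 + 2 * (-1) = -12.
    print(final_score(30, "AC---GT", "ACTTTGT"))
```

With these defaults, one gap of length three costs less than three separate gaps of length one, which reflects the idea that a single insertion or deletion event is more plausible than several independent ones.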
Alignment Score