03 - Sequence Alignment
The simplest way to compare two sequences of the same length is to count the
number of matching symbols. The value that measures the degree of similarity
between two sequences is called their alignment score. The opposite value,
which reflects the level of dissimilarity between the sequences, is usually
referred to as the distance between them. For sequences of equal length, the
number of non-matching characters is called the Hamming distance. Fig. 1 shows
an example of two sequences with a Hamming distance of 3.
Fig. 1. Example of two sequences with a Hamming distance of 3.
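Both measures described above can be computed in a single pass over two
equal-length sequences. The following is a minimal Python sketch. The function
names and the example sequences are hypothetical (Fig. 1 itself is not
reproduced here), and the score is taken to be just the count of matching
positions, as in the simple scheme described in the text.

```python
def match_count(a: str, b: str) -> int:
    """Simple alignment score: number of positions with the same symbol."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def hamming_distance(a: str, b: str) -> int:
    """Hamming distance: number of positions with different symbols."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example sequences (not the pair shown in Fig. 1)
s1 = "GATTACA"
s2 = "GACTATG"
print(match_count(s1, s2))       # 4
print(hamming_distance(s1, s2))  # 3
```

For equal-length sequences, the two values always add up to the sequence
length, so under this simple scheme score and distance carry the same
information.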
Sequence alignments can be further divided into global alignments, which align
the complete sequences, and local alignments, which identify only the most
similar segments or sequence patterns (motifs). Global alignment algorithms
tend to produce more accurate alignments for proteins of similar length,
whereas local alignment algorithms are better at identifying similar regions
when the sequences are not related over their entire length.
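To make the difference concrete, the sketch below computes both kinds of score
with one dynamic-programming table. The scoring values (match +1, mismatch -1,
gap -2) and the function name `alignment_score` are illustrative assumptions.
The local variant clamps every cell at zero and reports the best cell anywhere
in the table, which is the idea behind the Smith–Waterman algorithm; the
global variant must account for every letter and reports the bottom-right
cell.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best global (default) or local alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global: leading gaps are penalized, since every letter must be aligned
        for i in range(1, n + 1):
            dp[i][0] = i * gap
        for j in range(1, m + 1):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                # Local: an alignment may start fresh anywhere, so never go below 0
                score = max(score, 0)
                best = max(best, score)
            dp[i][j] = score
    return best if local else dp[n][m]


long_seq = "TTTTGATTACATTTT"
motif = "GATTACA"
print(alignment_score(long_seq, motif))              # -9
print(alignment_score(long_seq, motif, local=True))  # 7
```

The short motif occurs intact inside the longer sequence, so the local score
is a perfect 7. The global score is negative because the eight unrelated
flanking letters must be covered by gaps.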
Facts About Global Sequence Alignment
1. In global alignment, an attempt is made to align the entire sequence
(end-to-end alignment).
2. A global alignment contains all letters from both the query and target
sequences.
3. If two sequences have approximately the same length and are quite similar,
they are suitable for global alignment.
4. It is suitable for aligning two closely related sequences.
5. Global alignment is typically used to compare homologous genes, for
example two genes with the same function (in human vs. mouse), or two
proteins with similar function.
6. A widely used global alignment technique is the Needleman–Wunsch algorithm.
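The Needleman–Wunsch algorithm fills a score table in which each cell holds
the best score for aligning two prefixes. It then traces back from the
bottom-right cell to recover one optimal end-to-end alignment. The following
is a minimal Python sketch, assuming a linear gap penalty and illustrative
scores (match +1, mismatch -1, gap -1). When several alignments tie, the
tie-breaking order in the traceback (diagonal, then up, then left) decides
which one is returned.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment: returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to the top-left cell
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, row1, row2 = needleman_wunsch("GATTACA", "GATACA")
print(score)  # 5
print(row1)
print(row2)
```

Every letter of both input sequences appears in the output rows, which is
exactly the property of global alignment listed in item 2 above.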