Sequence Comparison Part 1
BIOINFORMATICS
DR. UROOJ AINUDDIN
PAIRWISE SEQUENCE ALIGNMENT
CHAPTER 3
Sequence alignment
• Sequence alignment is the procedure of comparing two sequences
(pairwise alignment) or more than two (multiple sequence alignment) by
searching for a series of individual characters or character patterns that
occur in the same order in the sequences.
• Two sequences are aligned by writing them across a page in two rows.
• Identical or similar characters are placed in the same column.
• Non-identical characters can either be placed in the same column as a
mismatch or opposite a gap in the other sequence.
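The column rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `classify_columns`, the use of `-` as the gap symbol, and the toy rows are all assumptions made for the example.

```python
def classify_columns(row_x, row_y, gap="-"):
    """Label each column of a pairwise alignment as match, mismatch, or gap.

    Both rows are assumed to be already aligned, so they must have equal length.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    labels = []
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            labels.append("gap")        # character opposite a gap
        elif a == b:
            labels.append("match")      # identical characters in one column
        else:
            labels.append("mismatch")   # non-identical characters in one column
    return labels


# Hypothetical toy rows, written one above the other as on a page.
row_x = "GATTACA"
row_y = "GAT-ACG"
labels = classify_columns(row_x, row_y)
print(labels.count("match"), labels.count("mismatch"), labels.count("gap"))
```

For the toy rows, five columns are matches, one is a mismatch, and one places a character opposite a gap.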
Optimal sequence alignment
• Orthologs
• Paralogs
Orthologs
• Orthologous genes are genes in different species that originated from
a single gene of a common ancestor.
• Orthologs are generated by speciation (creation of a new species).
Speciation occurs when a group within a species separates from the other
members of its species and develops its own unique characteristics.
• Homologous proteins from different species that possess the same
function are called orthologs.
• Example: the beta-hemoglobin genes of humans and chimpanzees.
Paralogs
Explaining similarity for amino acids
• Amino acids are not all exchanged for one another with equal
probability, as one might assume in theory.
• For example, an exchange of aspartic acid for glutamic acid is
frequently observed; however, a change from aspartic acid to
tryptophan is rarely seen.
• Several factors favor the exchange of aspartic acid for glutamic acid
over the exchange of aspartic acid for tryptophan.
Explaining similarity for amino acids
• One reason for this is the triplet-based genetic code.
• For an exchange of aspartic acid to glutamic acid, only a mutation
of the last nucleotide in the codon is required (GAT/GAC to
GAA/GAG).
• In contrast, all three nucleotides of the triplet must mutate for
aspartic acid to be exchanged for tryptophan (GAT/GAC to TGG).
• Replacing all three nucleotides is far less probable than a single
substitution and takes much longer to occur.
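The codon argument above can be checked by counting how many nucleotide positions differ between codons. The sketch below uses only the codons quoted in the text; the helper names `nucleotide_differences` and `min_substitutions` are hypothetical and chosen for illustration.

```python
from itertools import product

# Codons as listed in the text (DNA alphabet).
CODONS = {
    "Asp": ["GAT", "GAC"],
    "Glu": ["GAA", "GAG"],
    "Trp": ["TGG"],
}


def nucleotide_differences(codon_a, codon_b):
    """Count the positions at which two codons differ."""
    return sum(1 for a, b in zip(codon_a, codon_b) if a != b)


def min_substitutions(aa_from, aa_to):
    """Smallest number of point mutations that turns one amino acid's codon into the other's."""
    return min(
        nucleotide_differences(a, b)
        for a, b in product(CODONS[aa_from], CODONS[aa_to])
    )


print(min_substitutions("Asp", "Glu"))  # 1: only the third position changes
print(min_substitutions("Asp", "Trp"))  # 3: every position changes
```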
Explaining similarity for amino acids
• A second reason the exchange of aspartic acid for glutamic acid occurs
more often is that the two amino acids have similar chemical properties
(both are acidic and hydrophilic).
• In contrast, aspartic acid and tryptophan are chemically different – the
hydrophobic tryptophan is frequently found in the center of proteins,
whereas the hydrophilic aspartic acid occurs more often at the surface.
• An exchange of aspartic acid for tryptophan, therefore, could greatly
alter the structure of a protein and, consequently, its function.
• Such striking amino acid exchanges accompanied by a loss of function
rarely happen.
Alignment strategies
X: CCAGTGTGGCCGATACCCCAGGTTGGCACGCATCGTTGCCTTGGTAAGC
Y: CCAGTGTGGCCGATGCCC--G-T-GT-ACGCATCGTTGCCTTGGTAAGC
Example of global alignment
X: EARDFNQYYSSIKRSGSIQ (length 19)
Y: LPKLFIDQYYSSIKRTMGH (length 19)
X: EARDFN-QYYSSIKRS-GSIQ
Y: LPKLFIDQYYSSIKRTMGH--
• Identity(X,Y) = [10*2]/[19+19] = 52.63% (10 identical columns; each
sequence has 19 residues)
• Treat K and R, D and N, and H and Q as similar pairs. Also treat G, S
and T as similar to one another.
X: EARDF-NQYYSSIKRS-GSIQ
Y: LPKLFIDQYYSSIKRTMG--H
• Similarity(X,Y) = [14*2]/[19+19] = 73.68% (10 identical plus 4 similar
columns)
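The identity and similarity percentages of the global alignment can be reproduced with a short script. This is a sketch under stated assumptions: the score is taken to be 2 × (number of counted columns) / (length of X + length of Y), as in the formulas above, and the similarity groups are the ones given in the slide. The names `SIMILAR_GROUPS`, `is_similar`, and `percent_score` are hypothetical.

```python
# Similarity groups from the slide: K/R, D/N, H/Q, and G/S/T.
SIMILAR_GROUPS = [set("KR"), set("DN"), set("HQ"), set("GST")]


def is_similar(a, b):
    """True if both residues fall into the same similarity group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_score(row_x, row_y, count_similar=False, gap="-"):
    """Return 100 * 2 * hits / (len X + len Y) for two aligned rows.

    hits counts identical columns, plus similar columns if count_similar is set.
    Sequence lengths are measured without gap characters.
    """
    len_x = sum(1 for c in row_x if c != gap)
    len_y = sum(1 for c in row_y if c != gap)
    hits = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            continue
        if a == b or (count_similar and is_similar(a, b)):
            hits += 1
    return 100.0 * 2 * hits / (len_x + len_y)


# First alignment from the slide, used for identity.
identity = percent_score("EARDFN-QYYSSIKRS-GSIQ", "LPKLFIDQYYSSIKRTMGH--")
# Second alignment from the slide, used for similarity.
similarity = percent_score("EARDF-NQYYSSIKRS-GSIQ", "LPKLFIDQYYSSIKRTMG--H",
                           count_similar=True)
print(round(identity, 2), round(similarity, 2))  # 52.63 73.68
```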
Example of local alignment
X: EARDFNQYYSSIKRSGSIQ (length 19)
Y: LPKLFIDQYYSSIKRTMGH (length 19)
X: EARDFN-QYYSSIKRS-GSIQ
Y: LPKLFIDQYYSSIKRTMGH--
• Identity(X,Y) = [8*2]/[8+8] = 100% (computed over the local region
QYYSSIKR only, 8 residues in each sequence)
• Treat K and R, D and N, and H and Q as similar pairs. Also treat G, S
and T as similar to one another.
X: EARDF-NQYYSSIKRS-GSIQ
Y: LPKLFIDQYYSSIKRTMG--H
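• Similarity(X,Y) = [10*2]/[10+10] = 100% (computed over the local region
NQYYSSIKRS aligned with DQYYSSIKRT: 8 identical plus 2 similar columns)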
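The local regions used in these two calculations can be recovered from the aligned rows by looking for the longest uninterrupted run of agreeing columns. The sketch below does only that; it is a simplification for this example, not a general local alignment algorithm, and the names `columns_agree` and `best_conserved_block` are hypothetical.

```python
# Similarity groups from the slide: K/R, D/N, H/Q, and G/S/T.
SIMILAR_GROUPS = [set("KR"), set("DN"), set("HQ"), set("GST")]


def columns_agree(a, b, allow_similar):
    """True if a column holds identical residues, or similar ones when allowed."""
    if a == "-" or b == "-":
        return False
    if a == b:
        return True
    return allow_similar and any(a in g and b in g for g in SIMILAR_GROUPS)


def best_conserved_block(row_x, row_y, allow_similar=False):
    """Return (start, end) of the longest run of agreeing columns, end exclusive."""
    best = (0, 0)
    start = None
    for i, (a, b) in enumerate(zip(row_x, row_y)):
        if columns_agree(a, b, allow_similar):
            if start is None:
                start = i                      # a new run begins here
            if i + 1 - start > best[1] - best[0]:
                best = (start, i + 1)          # longest run seen so far
        else:
            start = None                       # gap or disagreement ends the run
    return best


row_x = "EARDF-NQYYSSIKRS-GSIQ"
row_y = "LPKLFIDQYYSSIKRTMG--H"

s, e = best_conserved_block(row_x, row_y)
identical_block = (row_x[s:e], row_y[s:e])
print(identical_block)  # ('QYYSSIKR', 'QYYSSIKR')

s, e = best_conserved_block(row_x, row_y, allow_similar=True)
similar_block = (row_x[s:e], row_y[s:e])
print(similar_block)  # ('NQYYSSIKRS', 'DQYYSSIKRT')
```

Because every column in each block agrees by construction, the identity of the first block and the similarity of the second are both 100%, which matches the figures above.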