BioInfor Assignment
COURSE: BIOINFORMATICS
3. Substitution: Replacing one character with another (e.g., transform kitten to sitten by
replacing k with s).
4. Transposition: Swapping two adjacent characters (e.g., transform abc to acb by swapping b
and c).
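The two operations above can be written as small string helpers. This is a minimal Python sketch: the function names `substitute` and `transpose` are hypothetical, and 0-based indexing is an assumption of the example.

```python
def substitute(s: str, i: int, c: str) -> str:
    """Replace the character at 0-based index i with c (one substitution)."""
    return s[:i] + c + s[i + 1:]


def transpose(s: str, i: int) -> str:
    """Swap the adjacent characters at indices i and i + 1 (one transposition)."""
    if not 0 <= i < len(s) - 1:
        raise IndexError("a transposition needs two adjacent characters")
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


print(substitute("kitten", 0, "s"))  # sitten
print(transpose("abc", 1))           # acb
```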
Use Cases
DNA Sequencing: Comparing DNA sequences with small mutations, including swaps (transpositions) of
adjacent bases.
Natural Language Processing (NLP): For tasks such as autocorrect or fuzzy string matching.
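Both use cases reduce to finding the candidate with the smallest edit distance. The sketch below assumes that the distance described here is the Damerau-Levenshtein distance (insertion, deletion, substitution, transposition) with unit costs, and it uses the optimal string alignment (OSA) variant, which edits a transposed pair at most once. The helper `closest_match` and the word list are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance with unit cost per operation."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete i characters
    for j in range(n + 1):
        d[0][j] = j  # insert j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent swap: "xy" in a lines up with "yx" in b
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def closest_match(query: str, candidates: list[str]) -> str:
    """Return the candidate nearest to query (the first one wins ties)."""
    return min(candidates, key=lambda c: osa_distance(query, c))


# Autocorrect: "kitetn" is one transposition away from "kitten".
print(closest_match("kitetn", ["kitten", "mitten", "sitting"]))  # kitten
# DNA: the two reads differ by one swap of adjacent bases (A/T at positions 2-3).
print(osa_distance("GATTACA", "GTATACA"))  # 1
```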
Complexity
The algorithm typically runs in $O(m \times n)$ time, where $m$ and $n$ are the
lengths of the two strings. It uses a matrix to calculate the distance iteratively, filling it based
on the costs of the operations.
Example Calculation
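As a worked example, the sketch below fills the distance matrix for `abc` and `acb` (the transposition example above). It assumes unit costs and the optimal string alignment form of the recurrence; `edit_matrix` and its `allow_transposition` flag are hypothetical names, and switching the flag off reduces the recurrence to plain Levenshtein distance for comparison. The two nested loops make the $O(m \times n)$ cost visible.

```python
def edit_matrix(a: str, b: str, allow_transposition: bool = True) -> list[list[int]]:
    """Fill the (len(a)+1) x (len(b)+1) edit-distance matrix with unit costs.

    d[i][j] is the distance between the first i characters of a and the
    first j characters of b, so the full distance is d[-1][-1].
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):          # m rows ...
        for j in range(1, n + 1):      # ... times n columns: O(m x n) cells
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free if characters match)
            )
            if (allow_transposition and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d


for row in edit_matrix("abc", "acb"):
    print(row)
# [0, 1, 2, 3]
# [1, 0, 1, 2]
# [2, 1, 1, 1]
# [3, 2, 1, 1]
print(edit_matrix("abc", "acb")[-1][-1])                             # 1
print(edit_matrix("abc", "acb", allow_transposition=False)[-1][-1])  # 2
```

With transposition allowed, the bottom-right cell is 1 (a single swap of b and c); without it, the same pair costs two substitutions. The OSA variant is simpler than the unrestricted Damerau-Levenshtein distance but can differ from it: for `CA` and `ABC`, OSA gives 3 while the unrestricted distance is 2.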
2. The Hamming distance is a metric used to measure the number of positions at which the
corresponding elements in two strings (or sequences) differ. In bioinformatics, it is commonly
applied to analyze DNA, RNA, or protein sequences.
How It Works
1. Definition: The Hamming distance is applicable only to strings of the same length. It
calculates the total number of mismatches between two strings.
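Formula:
$$d_H(s_1, s_2) = \sum_{i=1}^{n} \delta(s_1[i], s_2[i])$$
Where:
o $s_1[i]$ and $s_2[i]$ are the characters at position $i$ in strings $s_1$ and $s_2$, respectively.
o $n$ is the common length of the two strings.
o $\delta(a, b) = 1$ if $a \neq b$, and $0$ otherwise.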
2. Example:
o Sequence 1: GATTACA
o Sequence 2: GACTATA
o Hamming distance: 2 (the sequences differ at positions 3 and 6)
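The definition translates directly into code: add 1 for every position where the characters differ. A minimal Python sketch (the names `hamming_distance` and `mismatch_positions` are our own) that rejects unequal lengths, as the definition requires, and also reports the 1-based mismatch positions for the example pair:

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Count positions where the two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def mismatch_positions(s1: str, s2: str) -> list[int]:
    """Return the 1-based positions at which the sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return [i for i, (c1, c2) in enumerate(zip(s1, s2), start=1) if c1 != c2]


print(hamming_distance("GATTACA", "GACTATA"))    # 2
print(mismatch_positions("GATTACA", "GACTATA"))  # [3, 6]
```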
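Applications
1. Sequence Alignment: Counting mismatches between aligned sequences of equal length, for example a
short read and the reference region it is compared against.
2. Phylogenetics: Pairwise Hamming distances between aligned sequences give a simple measure of
divergence that can be used as input to distance-based tree-building methods.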
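For phylogenetics, the usual starting point is a matrix of pairwise distances between aligned sequences. The sketch below builds one from Hamming distances; the three sequences are illustrative only, and `distance_matrix` is a hypothetical helper. Dividing each entry by the sequence length gives the proportion of differing sites (the p-distance) if a normalized value is preferred.

```python
from itertools import combinations


def hamming_distance(s1: str, s2: str) -> int:
    """Count positions where the two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


def distance_matrix(seqs: list[str]) -> list[list[int]]:
    """Symmetric matrix of pairwise Hamming distances (zeros on the diagonal)."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming_distance(seqs[i], seqs[j])
    return d


seqs = ["GATTACA", "GACTATA", "GATTATA"]  # illustrative sequences only
for row in distance_matrix(seqs):
    print(row)
# [0, 2, 1]
# [2, 0, 1]
# [1, 1, 0]
```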
Limitations
Length Requirement: The Hamming distance requires sequences of equal length, making it
unsuitable for comparing sequences with insertions or deletions.
Context Blind: It does not account for the biological or evolutionary significance of
mismatches.
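The length requirement can be seen on a small, made-up example. Shifting a sequence by one base (deleting the first base and appending one at the end) makes every aligned position disagree, so the Hamming distance is at its maximum even though just two edits (one deletion, one insertion) explain the change; and sequences of different lengths cannot be compared at all.

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Count positions where the two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


original = "ACGTACGT"
shifted = original[1:] + "A"  # delete the first base, insert one at the end
print(shifted)                              # CGTACGTA
print(hamming_distance(original, shifted))  # 8: every position now mismatches

try:
    hamming_distance("GATTACA", "GATACA")   # one deletion: lengths 7 and 6
except ValueError as err:
    print(err)  # Hamming distance requires sequences of equal length
```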
Conclusion
The Hamming distance is a fast and effective tool for comparing biological sequences of the same
length. While limited in its scope, it serves as a foundational measure in bioinformatics for tasks like
SNP detection and error correction in DNA sequencing.