Note 4
algorithms
Madhurima Nath, PhD
Jan 8, 2024
Algorithms used
The most commonly used fuzzy matching algorithms compute an edit distance between two strings. An edit distance quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. Some of the well-known distance metrics are:
Levenshtein distance
Damerau–Levenshtein distance
Hamming distance
Jaro distance
Implementation in Python
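As an illustration of the first metric in the list, here is a minimal sketch of the Levenshtein distance using the standard dynamic-programming recurrence, plus the Hamming distance for comparison. The function names `levenshtein` and `hamming` are chosen for this example and are not taken from the original post or from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("kitten", "sitting"))  # 3
print(hamming("karolin", "kathrin"))     # 3
```

The Hamming distance only counts substitutions, so it is defined for strings of equal length, while the Levenshtein distance also allows insertions and deletions.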
Bitap algorithm
Bitap is an on-line search method (i.e., it searches the text directly, without building an index) and uses the Levenshtein distance to decide whether part of the text is approximately equal to the given pattern. It relies on bitwise operations on bitmasks, which are extremely fast. (A bitmask is a value used in a bitwise operation to set, clear, or invert several bits of a byte, word, etc. in a single step.) It performs best on short patterns, because the search state is typically packed into the bits of a single machine word.
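To make the bitmask idea concrete, the short sketch below sets, clears, and inverts several bits at once. The helper names are hypothetical and exist only for this illustration.

```python
def set_bits(value: int, mask: int) -> int:
    """Turn on every bit that is 1 in mask."""
    return value | mask


def clear_bits(value: int, mask: int) -> int:
    """Turn off every bit that is 1 in mask."""
    return value & ~mask


def toggle_bits(value: int, mask: int) -> int:
    """Invert every bit that is 1 in mask."""
    return value ^ mask


flags = 0b0000
flags = set_bits(flags, 0b0101)     # bits 0 and 2 on
print(format(flags, "04b"))         # 0101
flags = clear_bits(flags, 0b0001)   # bit 0 off
print(format(flags, "04b"))         # 0100
flags = toggle_bits(flags, 0b0110)  # bits 1 and 2 inverted
print(format(flags, "04b"))         # 0010
```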
Example 1:
input text: womenwhocode, pattern: code
output: Pattern found at index: 8
Example 2:
input text: youareawesome, pattern: youareamazing
output: No Match
Implementation in Python
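The following is a minimal sketch of Bitap, not the author's implementation. `bitap_exact` is the exact-match (shift-and) variant and reproduces the two examples above. `bitap_fuzzy_end` is a sketch of the Wu-Manber style extension that tolerates up to `k` edits and, as a simplifying assumption, reports the index where a match ends rather than where it starts. Both function names are made up for this example. Python integers are unbounded, so the word-size limit on pattern length does not apply here.

```python
def _char_masks(pattern: str) -> dict:
    """Map each character to a bitmask with bit i set where pattern[i] == char."""
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def bitap_exact(text: str, pattern: str) -> int:
    """Return the start index of the first exact match, or -1 if there is none."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    masks = _char_masks(pattern)
    accept = 1 << (m - 1)  # bit m-1 set means the whole pattern has matched
    state = 0              # bit i set: pattern[:i+1] matches the text ending here
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            return j - m + 1
    return -1


def bitap_fuzzy_end(text: str, pattern: str, k: int) -> int:
    """Return the index where a match with at most k edits ends, or -1."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    masks = _char_masks(pattern)
    accept = 1 << (m - 1)
    # states[d]: prefixes matched with at most d edits; d deletions come for free.
    states = [(1 << d) - 1 for d in range(k + 1)]
    for j, ch in enumerate(text):
        cm = masks.get(ch, 0)
        prev_old = states[0]
        states[0] = ((states[0] << 1) | 1) & cm
        for d in range(1, k + 1):
            cur_old = states[d]
            states[d] = (
                (((cur_old << 1) | 1) & cm)     # characters match
                | prev_old                      # extra character in the text
                | ((prev_old << 1) | 1)         # substituted character
                | ((states[d - 1] << 1) | 1)    # pattern character missing in text
            )
            prev_old = cur_old
        if states[k] & accept:
            return j
    return -1


print(bitap_exact("womenwhocode", "code"))            # 8
print(bitap_exact("youareawesome", "youareamazing"))  # -1
print(bitap_fuzzy_end("womenwhokode", "code", 1))     # 11
```

With `k = 0` the fuzzy variant behaves like the exact search, apart from returning the end index of the match.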
n-gram algorithm
An n-gram model predicts the next item in a sequence of text in the form of a Markov model. Used for matching, it is an off-line search, i.e., the search is performed on a prebuilt index, which makes it much more computationally efficient for large data. Currently, n-gram techniques are widely used in natural language processing.
n-gram is a set of values generated from a
string by pairing sequentially occurring n
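
The off-line, index-based search described above can be sketched as follows. This example assumes character-level n-grams with n = 2, since the text does not say whether characters or words are meant, and it ranks candidates by the number of n-grams they share with the query. The names `char_ngrams`, `build_index`, and `search` are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 2) -> list:
    """Slide a window of n characters over the string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(words, n: int = 2) -> dict:
    """Off-line step: map every n-gram to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for gram in set(char_ngrams(word, n)):
            index[gram].add(word)
    return index


def search(query: str, index: dict, n: int = 2) -> list:
    """Count shared n-grams per indexed word, highest count first."""
    counts = Counter()
    for gram in set(char_ngrams(query, n)):
        for word in index.get(gram, ()):
            counts[word] += 1
    return counts.most_common()


index = build_index(["code", "coder", "mode", "women"])
print(char_ngrams("code"))     # ['co', 'od', 'de']
print(dict(search("codr", index)))
```

Because the index is built once, each query only touches the words that share at least one n-gram with it instead of scanning every string.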
References
1. Levenshtein, Vladimir I. “Binary codes capable of correcting deletions, insertions, and reversals.” In Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710. 1966.