UNIT 5.3 (String Matching)
Prepared by
S Durga Devi
Assistant Professor
Department of Computer Science and Engineering
Chaitanya Bharathi Institute of Technology
-When you type a URL in a web browser, a list of possible matching URLs is displayed.
The browser does some internal processing to produce this list of matching URLs. This
technique is called auto-completion.
-Similarly, when you type a partial directory name at the command-line prompt and press the
Tab key, a list of all matching directory names is shown. This also uses the auto-completion
technique.
-To perform such operations, special data structures are used to store string data
efficiently. The goal is to design data structures that support string algorithms.
String operations: several string operations are used to process text.
1. Substring: a substring is a contiguous part of a string; breaking a string into smaller
strings produces substrings.
2. Pattern matching problem: we are given a text string T of length n and a pattern string P
of length m, and want to find whether P is a substring of T. The notion of a “match” is
that there is a substring of T starting at some index i that matches P, character by
character, so that T[i] = P[0], T[i + 1] = P[1], ..., T[i + m − 1] = P[m − 1].
Example
Suppose we are given the text string T = "abacaabaccabacabaabb"
and the pattern string
P = "abacab".
Then P is a substring of T. Namely
P = T[10..15].
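The example above can be checked with a few lines of Python. This is only a quick sketch using the built-in str.find method; the variable names start and end are chosen here for illustration.

```python
T = "abacaabaccabacabaabb"
P = "abacab"

start = T.find(P)           # index of the first occurrence of P in T, or -1
end = start + len(P) - 1    # inclusive end index, as in the notation T[10..15]

print(start, end)
print(T[start:end + 1] == P)
```

Python slices exclude the end index, so T[10..15] in the notation of these notes is written T[10:16] in Python.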
The brute force approach solves a problem by trying out all the possible solutions and
picking the best one.
Example:
There are two boys and one girl (B1, B2, G1). In how many ways can they be arranged in
three chairs? Brute force lists every arrangement; here there are 3! = 6 of them.
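The seating example can be enumerated directly. The sketch below uses itertools.permutations from the Python standard library to generate every ordering of the three people, which is the brute force idea of listing all candidate solutions.

```python
from itertools import permutations

people = ["B1", "B2", "G1"]

# Each permutation is one way to seat the three people in chairs 1, 2, 3.
arrangements = list(permutations(people))

for seating in arrangements:
    print(seating)
print("number of arrangements:", len(arrangements))
```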
The brute force pattern matching algorithm compares the pattern P with the text T at each
starting index i, where i ranges from 0 to n-m, n is the length of the text T and m is the
length of the pattern P. The search continues until
1. A match is found, or
2. All starting positions have been tried and no match has been found.
In the brute force pattern matching algorithm the scan is made from left to right.
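A minimal Python sketch of the brute force scan described above. The function name brute_force_match is an assumption made for this illustration; it returns the first index where the pattern matches, or -1.

```python
def brute_force_match(text, pattern):
    """Return the lowest index i with text[i:i+m] == pattern, or -1."""
    n, m = len(text), len(pattern)
    # Try every possible starting position 0 .. n-m, left to right.
    for i in range(n - m + 1):
        j = 0
        # Compare character by character until a mismatch or the pattern ends.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i        # all m characters matched
    return -1               # no starting position produced a match


print(brute_force_match("abacaabaccabacabaabb", "abacab"))
```

In the worst case every one of the n-m+1 starting positions compares up to m characters, so the running time is O(nm).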
Algorithm
Boyer-Moore(T[0..n-1], P[0..m-1])
// last[c] = index of the last occurrence of character c in P, or -1 if c is not in P
// set i and j to the last index of P
i = m-1
j = m-1
// loop to the end of the text string
while i < n
    if P[j] = T[i] then
        if j = 0 then
            return i    // match found, starting at index i
        else
            // go to the previous char
            i = i-1
            j = j-1
    else
        // skip over the whole word or shift to the last occurrence
        i = i + m - min(j, 1 + last[T[i]])
        j = m-1
return -1 // no match
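The pseudocode can be turned into runnable Python roughly as follows. This is a sketch of the simplified Boyer-Moore variant that uses only the last-occurrence table; the names build_last and boyer_moore_match are assumptions for this illustration, and a dictionary stands in for the last[] table.

```python
def build_last(pattern):
    """Map each character of the pattern to the index of its last occurrence."""
    last = {}
    for k, ch in enumerate(pattern):
        last[ch] = k        # later occurrences overwrite earlier ones
    return last


def boyer_moore_match(text, pattern):
    """Return the first index where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = build_last(pattern)
    i = j = m - 1           # compare from the right end of the pattern
    while i < n:
        if text[i] == pattern[j]:
            if j == 0:
                return i    # whole pattern matched, starting at i
            i -= 1
            j -= 1
        else:
            l = last.get(text[i], -1)   # -1 if the character is not in pattern
            i += m - min(j, 1 + l)      # jump step
            j = m - 1                   # restart at the end of the pattern
    return -1


print(boyer_moore_match("abacaabaccabacabaabb", "abacab"))
```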
Case-2
The LPS table above is used to decide how many characters to skip when a mismatch
occurs.
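As an illustration of how such a table drives the skips, here is a sketch that assumes LPS has its usual meaning from the Knuth-Morris-Pratt (KMP) algorithm: lps[k] is the length of the longest proper prefix of P[0..k] that is also a suffix of it. The function names compute_lps and kmp_search are chosen for this example.

```python
def compute_lps(pattern):
    """lps[k] = length of the longest proper prefix of pattern[:k+1]
    that is also a suffix of it (assumed KMP definition)."""
    lps = [0] * len(pattern)
    length = 0              # length of the current matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]    # fall back to a shorter prefix
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return the first index where pattern occurs in text, or -1."""
    if not pattern:
        return 0
    lps = compute_lps(pattern)
    j = 0                   # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, the LPS table says how much of the match to keep.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1
    return -1


print(compute_lps("abacab"))
print(kmp_search("abacaabaccabacabaabb", "abacab"))
```

Because the text index i never moves backwards, the search runs in O(n + m) time.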
Standard Trie
The standard trie for a set of strings S is an ordered tree such that
1. Each node except the root is labelled with a character of the alphabet.
2. The children of a node are alphabetically ordered.
3. The path from the root to each external node spells out a string of S.
4. No string in S is a prefix of another string.
The worst-case time to search for (or insert) a string of length n is O(n), treating the
alphabet size as a constant.
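A minimal Python sketch of a standard trie with insert, search and prefix completion. The class and method names (TrieNode, Trie, insert, contains, complete) are assumptions for this illustration. One simplification is labelled in the code: an is_end flag marks where a string ends, so the sketch also works when one string is a prefix of another, which the definition above rules out.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # character -> child TrieNode
        self.is_end = False     # assumption: flag marking the end of a stored string


class Trie:
    def __init__(self):
        self.root = TrieNode()  # the root carries no character

    def insert(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    def contains(self, word):
        node = self.root
        for ch in word:         # one step per character of the word
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_end

    def complete(self, prefix):
        """Return all stored strings starting with prefix, in sorted order."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results, stack = [], [(node, prefix)]
        while stack:
            cur, word = stack.pop()
            if cur.is_end:
                results.append(word)
            for ch, child in cur.children.items():
                stack.append((child, word + ch))
        return sorted(results)


trie = Trie()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    trie.insert(w)
print(trie.contains("bell"), trie.contains("be"))
print(trie.complete("st"))
```

The complete method is the auto-completion operation: it walks down the prefix and then collects every string stored below that node.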
Compressed Trie
1. Each internal node has at least two children.