SOU Lecture Handout ADA Unit-8
1. Introduction
String matching, also called string searching, is an important class of string algorithms. The
task is to find the places where one or several strings (the patterns) occur within a larger
string (the text).
We are given a text array T [1.....n] of n characters and a pattern array P [1......m] of m
characters. The problem is to find an integer s, called a valid shift, where 0 ≤ s ≤ n-m and
T [s+1......s+m] = P [1......m]. In other words, we want to find whether P occurs in T, i.e.,
whether P is a sub-string of T. The elements of P and T are characters drawn from some finite
alphabet such as {0, 1} or {A, B, ....., Z, a, b, ....., z}.
Given a string T [1......n], its sub-strings are represented as T [i......j] for some 1 ≤ i ≤ j ≤ n,
that is, the string formed by the characters in T from index i to index j, inclusive. This means
that a string is a sub-string of itself (take i = 1 and j = n).
A proper sub-string of string T [1......n] is T [i......j] for some 1 ≤ i ≤ j ≤ n with either i > 1
or j < n, so that it is shorter than T itself.
Using these descriptions, we can say given any string T [1......n], the sub-strings are
We assume that the text is an array T [1…n] of length n and that the pattern is an array P
[1…m] of length m ≤ n.
We further assume that the elements of P and T are characters drawn from a finite
alphabet Σ. For example, we may have Σ = {0, 1} or Σ = {a, b ..., z}.
DEPARTMENT OF COMPUTER ENGINEERING Page | 1
*Proprietary material of SILVER OAK UNIVERSITY
1010043316
(ANALYSIS & DESIGN OF
ALGORITHM)
LECTURE COMPANION SEMESTER: 5 PREPARED BY: MONALI SUTHAR
We say that pattern P occurs with shift s in text T (or, equivalently, that pattern P occurs
beginning at position s + 1 in text T) if 0 ≤ s ≤ n - m and T [s + 1…s + m] = P [1…m]
(that is, if T [s + j] = P[j], for 1 ≤ j ≤ m).
If P occurs with shift s in T, then we call s a valid shift; otherwise, we call s an invalid
shift. The string-matching problem is the problem of finding all valid shifts with which a
given pattern P occurs in a given text T.
The naive algorithm finds all valid shifts using a loop that checks the condition P [1…m]
= T [s + 1…s + m] for each of the n - m + 1 possible values of s.
NAIVE-STRING-MATCHER(T, P)
1 n ← length[T]
2 m ← length[P]
3 for s ← 0 to n – m do
4     if P [1…m] = T [s + 1…s + m] then
5         print "Pattern occurs with shift" s
The for loop beginning on line 3 considers each possible shift explicitly.
The test on line 4 determines whether the current shift is valid or not; this test involves an
implicit loop to check corresponding character positions until all positions match
successfully or a mismatch is found.
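The naive procedure can be sketched in C++ as follows. This is a minimal illustration, not part of the original handout: the function name naiveStringMatcher is an assumption, and indices are 0-based as usual in C++, so shift s means that text[s .. s+m-1] equals the pattern.

```cpp
#include <string>
#include <vector>

// Return every valid shift s (0-based) at which `pattern` occurs in `text`.
// Hypothetical helper name; the handout only gives pseudocode.
std::vector<int> naiveStringMatcher(const std::string& text, const std::string& pattern) {
    std::vector<int> shifts;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    for (int s = 0; s + m <= n; ++s) {          // the n - m + 1 candidate shifts
        int j = 0;
        // Implicit loop of the shift test: stop at the first mismatch.
        while (j < m && text[s + j] == pattern[j]) {
            ++j;
        }
        if (j == m) {                            // all m positions matched
            shifts.push_back(s);
        }
    }
    return shifts;
}
```

For instance, naiveStringMatcher("acaabc", "aab") returns the single shift 2. The nested loops make the worst-case cost proportional to (n - m + 1) m character comparisons.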
2.1. Example:
In the above example, valid shift is s = 3 for which we found the occurrence of pattern P
in text T.
Let us assume that Σ = {0, 1, 2… 9}, so that each character is a decimal digit. (In the
general case, we can assume that each character is a digit in radix-d notation, where d =
|Σ|).
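Given a pattern P [1…m], let p denote its corresponding decimal value. In a similar manner,
given a text T [1…n], let t_s denote the decimal value of the length-m substring
T [s + 1…s + m], for s = 0, 1, . . . , n - m. Shift s is valid exactly when t_s = p.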
To compute the remaining values t_1, t_2, . . . , t_{n-m} in time θ(n - m), it suffices to observe
that t_{s+1} can be computed from t_s in constant time, since
t_{s+1} = 10 (t_s - 10^{m-1} T [s + 1]) + T [s + m + 1].
Subtracting 10^{m-1} T [s + 1] removes the high-order digit from t_s, multiplying the result by
10 shifts the number left one position, and adding T [s + m + 1] brings in the appropriate
low-order digit.
For example, if m = 5 and t_s = 31415, then we wish to remove the high-order digit T [s + 1] = 3
and bring in the new low-order digit (suppose it is T [s + 5 + 1] = 2) to obtain
t_{s+1} = 10 (31415 - 10000 * 3) + 2 = 14152.
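The constant-time update of the window value can be sketched in C++ as follows. This is an illustration under stated assumptions: the function name windowValues is hypothetical, the input is a string of decimal digits, and m is small enough (at most about 18) for every window value to fit in a 64-bit integer.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decimal values of all length-m windows of a digit string, in order of shift.
// Hypothetical helper; assumes `digits` holds only '0'..'9' and m <= 18.
std::vector<std::uint64_t> windowValues(const std::string& digits, int m) {
    std::vector<std::uint64_t> t;
    const int n = static_cast<int>(digits.size());
    if (m <= 0 || m > n) return t;

    std::uint64_t high = 1;                      // 10^(m-1)
    for (int i = 1; i < m; ++i) high *= 10;

    std::uint64_t cur = 0;                       // value of the first window
    for (int i = 0; i < m; ++i) cur = 10 * cur + (digits[i] - '0');
    t.push_back(cur);

    for (int s = 0; s + m < n; ++s) {
        // Drop the high-order digit, shift left, bring in the next digit.
        cur = 10 * (cur - high * (digits[s] - '0')) + (digits[s + m] - '0');
        t.push_back(cur);
    }
    return t;
}
```

With the digits "314152" and m = 5, the function returns 31415 followed by 14152, matching the worked example.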
The only difficulty with this procedure is that p and t_s may be too large to work with
conveniently.
There is a simple cure for this problem: compute p and the t_s values modulo a suitable
modulus q. For a d-ary alphabet, the recurrence becomes
t_{s+1} = (d (t_s - T [s + 1] h) + T [s + m + 1]) mod q,
where h = d^{m-1} (mod q) is the value of the digit "1" in the high-order position of an
m-digit text window.
The solution of working modulo q is not perfect, however: t_s ≡ p (mod q) does not imply
that t_s = p. On the other hand, if t_s ≢ p (mod q), then we definitely have t_s ≠ p, so that
shift s is invalid.
Any shift s for which t_s ≡ p (mod q) must be tested further to see whether s is really valid
or whether it is just a spurious hit.
Algorithm RABIN-KARP-MATCHER(T, P, d, q)
n ← length[T];
m ← length[P];
h ← d^{m-1} mod q;
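p ← 0;
t_0 ← 0;
for i ← 1 to m do
    p ← (d p + P [i]) mod q;
    t_0 ← (d t_0 + T [i]) mod q;
for s ← 0 to n – m do
    if p == t_s then
        if P [1…m] == T [s + 1…s + m] then
            print "Pattern occurs with shift" s;
    if s < n – m then
        t_{s+1} ← (d (t_s - T [s + 1] h) + T [s + m + 1]) mod q;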
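The RABIN-KARP-MATCHER procedure can be sketched in C++ as follows. This is a minimal sketch, not a definitive implementation: the function name rabinKarpMatcher and the default values d = 256 and q = 1000003 are assumptions chosen for illustration, and characters are hashed by their byte values rather than by digit values.

```cpp
#include <string>
#include <vector>

// Rabin-Karp: compare hashes modulo q, then verify hits character by character.
// d (radix) and q (modulus) are caller-chosen; the defaults are illustrative only.
std::vector<int> rabinKarpMatcher(const std::string& text, const std::string& pattern,
                                  long long d = 256, long long q = 1000003) {
    std::vector<int> shifts;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || m > n) return shifts;

    long long h = 1;                             // d^(m-1) mod q
    for (int i = 1; i < m; ++i) h = (h * d) % q;

    long long p = 0, t = 0;                      // pattern hash and first window hash
    for (int i = 0; i < m; ++i) {
        p = (d * p + static_cast<unsigned char>(pattern[i])) % q;
        t = (d * t + static_cast<unsigned char>(text[i])) % q;
    }

    for (int s = 0; s + m <= n; ++s) {
        // Equal hashes may still be a spurious hit, so confirm explicitly.
        if (p == t && text.compare(s, m, pattern) == 0) {
            shifts.push_back(s);
        }
        if (s + m < n) {                         // roll the hash to the next window
            t = (d * (t - static_cast<unsigned char>(text[s]) * h)
                 + static_cast<unsigned char>(text[s + m])) % q;
            if (t < 0) t += q;                   // C++ % can yield a negative remainder
        }
    }
    return shifts;
}
```

For instance, rabinKarpMatcher("abracadabra", "abra") returns the shifts 0 and 7. A small modulus such as q = 13 still gives correct results, only with more spurious hits to verify.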
3.1. Analysis
3.2. Example
We choose q = 11
P mod q = 26 mod 11 = 4
Many string-matching algorithms build a finite automaton that scans the text string T for
all occurrences of the pattern P.
We begin with the definition of a finite automaton. We then examine a special string-
matching automaton and show how it can be used to find occurrences of a pattern in a
text.
Finally, we shall show how to construct the string-matching automaton for a given input
pattern.
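A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where Q is a finite set of states, q0 ∈ Q is
the start state, A ⊆ Q is the set of accepting states, Σ is a finite input alphabet, and δ is
the transition function from Q × Σ into Q.
The finite automaton begins in state q0 and reads the characters of its input string one at
a time.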
If the automaton is in state q and reads input character a, it moves ("makes a transition")
from state q to state δ(q, a).
Whenever its current state q is a member of A, the machine M is said to have accepted
the string read so far. An input that is not accepted is said to be rejected.
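To search for the pattern, the automaton relies on a suffix function σ, which measures how
much of the text read so far matches the beginning of the pattern at any given moment.
Formally, σ(x) is the length of the longest prefix of P that is a suffix of x:
σ(x) = max {k : Pk ⊐ x}
Here Pk denotes the length-k prefix P [1…k] of the pattern, and Pk ⊐ x means that Pk is a
suffix of x.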
P= ababa
P1=a
P2=ab
P3=aba
P4=abab
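The suffix function can be evaluated directly from its definition, as in the following C++ sketch. The function name suffixFunction is an assumption made for illustration; it tries the largest candidate k first and shrinks it until the length-k prefix of the pattern is a suffix of x.

```cpp
#include <algorithm>
#include <string>

// sigma(x): length of the longest prefix of `pattern` that is a suffix of `x`.
// Hypothetical helper written straight from the definition (not optimized).
int suffixFunction(const std::string& pattern, const std::string& x) {
    int k = static_cast<int>(std::min(pattern.size(), x.size()));
    // Compare the last k characters of x with the first k characters of pattern.
    while (k > 0 && x.compare(x.size() - k, k, pattern, 0, k) != 0) {
        --k;
    }
    return k;   // k = 0 always works: the empty prefix is a suffix of every x
}
```

With P = ababa, suffixFunction("ababa", "ccaca") gives 1 (only the prefix a matches) and suffixFunction("ababa", "ababab") gives 4 (the prefix abab is a suffix).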
The state set Q is {0, 1 . . . m}. The start state q0 is state 0, and state m is the only
accepting state.
The transition function δ is defined by the following equation, for any state q and
character a:
δ (q, a) = σ(Pq a)
ALGORITHM FINITE-AUTOMATON-MATCHER(T, δ, m)
n ← length[T]
q ← 0
for i ← 1 to n do
    q ← δ(q, T[i])
    if q == m then
        print "Pattern occurs with shift" i - m
ALGORITHM COMPUTE-TRANSITION-FUNCTION(P, Σ)
m ← length[P]
for q ← 0 to m do
    for each character a ∈ Σ do
        k ← min(m + 1, q + 2)
        repeat k ← k - 1
        until Pk ⊐ Pq a
        δ(q, a) ← k
return δ
The nested loops beginning on lines 2 and 3 consider all states q and characters a, and
lines 4-7 set δ(q, a) to be the largest k such that Pk ⊐ Pq a. The code starts with the
largest conceivable value of k, which is min (m, q + 1), and decreases k until Pk ⊐ Pq a.
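The two procedures can be sketched together in C++ as follows. This is an illustration under stated assumptions: the names TransitionTable, computeTransitionFunction and finiteAutomatonMatcher are hypothetical, the alphabet is passed as a string of its characters, shifts are 0-based, and a text character outside the alphabet sends the automaton back to state 0.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// delta[q][a] holds the next state for state q (0..m) and character a.
using TransitionTable = std::vector<std::map<char, int>>;

TransitionTable computeTransitionFunction(const std::string& pattern,
                                          const std::string& alphabet) {
    const int m = static_cast<int>(pattern.size());
    TransitionTable delta(m + 1);
    for (int q = 0; q <= m; ++q) {
        for (char a : alphabet) {
            const std::string pqa = pattern.substr(0, q) + a;   // Pq followed by a
            int k = std::min(m, q + 1);                          // largest conceivable k
            // Shrink k until the length-k prefix of the pattern is a suffix of pqa.
            while (k > 0 && pqa.compare(pqa.size() - k, k, pattern, 0, k) != 0) {
                --k;
            }
            delta[q][a] = k;
        }
    }
    return delta;
}

// Return the 0-based shifts at which the automaton reaches the accepting state m.
std::vector<int> finiteAutomatonMatcher(const std::string& text,
                                        const TransitionTable& delta, int m) {
    std::vector<int> shifts;
    int q = 0;
    for (int i = 0; i < static_cast<int>(text.size()); ++i) {
        const auto it = delta[q].find(text[i]);
        q = (it == delta[q].end()) ? 0 : it->second;   // unknown character: restart
        if (q == m) {
            shifts.push_back(i - m + 1);
        }
    }
    return shifts;
}
```

For the pattern ababaca over the alphabet {a, b, c}, the table has δ(5, b) = 4 and δ(5, c) = 6, and scanning the text abababacaba reports the single shift 2. Building the table this way is slow (it re-checks suffixes for every state and character), but matching afterwards takes one table lookup per text character.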
4.3. Example
Repeating this procedure for q = 0 to 7 gives the transition function table; the pattern can
then be matched using this table.
Given a text txt[0 . . . N-1] and a pattern pat[0 . . . M-1], write a function search(char pat[],
char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that N > M.
Examples:
pat[] = “AABA”
Output: Pattern found at index 0, Pattern found at index 9, Pattern found at index 12
The worst case complexity of the Naive algorithm is O(m(n-m+1)). The time complexity of
the KMP algorithm is O(n+m) in the worst case.
Knuth-Morris-Pratt (KMP) is an algorithm that checks the characters from left to right.
When the pattern contains a sub-pattern that appears more than once in it, KMP uses that
property to avoid re-examining text characters, which improves the time complexity even in
the worst case.
Input:
Output:
5.2. Algorithm
findPrefix(pattern, m, prefArray)
Input − The pattern, the length of pattern and an array to store prefix location
Begin
   length := 0
   prefArray[0] := 0
   for all character index 'i' of pattern starting from 1, do
      if pattern[i] = pattern[length], then
         increase length by 1
         prefArray[i] := length
      else
         if length ≠ 0 then
            length := prefArray[length - 1]
            decrease i by 1
         else
            prefArray[i] := 0
   done
End
kmpAlgorithm(text, pattern)
Input: The main text, and the pattern, which will be searched
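Begin
   n := size of text
   m := size of pattern
   call findPrefix(pattern, m, prefArray)
   i := 0, j := 0
   while i < n, do
      if text[i] = pattern[j], then
         increase i and j by 1
      if j = m, then
         print the location (i - j) as there is the pattern
         j := prefArray[j-1]
      else if i < n AND pattern[j] ≠ text[i] then
         if j ≠ 0 then
            j := prefArray[j - 1]
         else
            increase i by 1
   done
End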
Example
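#include<iostream>
#include<string>
#include<vector>
using namespace std;
// prefArray[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix
void findPrefix(const string& pattern, int m, vector<int>& prefArray) {
   int length = 0;
   prefArray[0] = 0;
   for(int i = 1; i<m; i++) {
      if(pattern[i] == pattern[length]) {
         length++;
         prefArray[i] = length;
      }else {
         if(length != 0) {
            length = prefArray[length - 1];
            i--;   // retry the same i against the shorter prefix
         }else
            prefArray[i] = 0;
      }
   }
}
// Stores the start index of each match in locArray; loc receives the number of matches
void kmpPattSearch(const string& mainString, const string& pattern, vector<int>& locArray, int& loc) {
   int n = mainString.size(), m = pattern.size(), i = 0, j = 0;
   loc = 0;
   if(m == 0) return;   // nothing to search for
   vector<int> prefixArray(m);
   findPrefix(pattern, m, prefixArray);
   while(i < n) {
      if(mainString[i] == pattern[j]) {
         i++; j++;
      }
      if(j == m) {
         locArray[loc++] = i-j;   // the match starts m characters before i
         j = prefixArray[j-1];
      }else if(i < n && pattern[j] != mainString[i]) {
         if(j != 0)
            j = prefixArray[j-1];
         else
            i++;
      }
   }
}
int main() {
   string str = "AABAACAADAABAABA";   // illustrative text (an assumption; the handout's input is missing)
   string patt = "AABA";
   vector<int> locationArray(str.size());
   int index;
   kmpPattSearch(str, patt, locationArray, index);
   for(int i = 0; i<index; i++)
      cout << "Pattern found at index " << locationArray[i] << endl;
}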
Output