Exact String Matching Algorithms: Presented by Dr. Shazzad Hosain Asst. Prof. EECS, NSU
Exact Matching: What’s the Problem
Position: 1 2 3 4 5 6 7 8 9 10 11 12
T =       b b a b a x a b a b  a  y
P = aba
P occurs in T starting at positions 3, 7, and 9.
Occurrences of P may overlap, as found at 7 and 9.
The Naive Method
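• The problem is to find whether, and where, a pattern P[1..m] occurs
within a text T[1..n]
• The naive method aligns the left end of P with each position of T in turn, compares characters left to right until a mismatch occurs or P is exhausted, and then shifts P one place to the right
• Let P = abxyabxz and T = xabxyabxyabxz
• Where m = 8 and n = 13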
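The naive method can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function name `naive_match` is hypothetical, and it reports 1-based start positions to match the slide numbering.

```python
def naive_match(P, T):
    """Return the 1-based start positions of every occurrence of P in T.

    Hypothetical helper: tries every alignment of P against T, so the
    worst case costs O(m * n) character comparisons.
    """
    m, n = len(P), len(T)
    hits = []
    for s in range(n - m + 1):          # s = 0-based left end of the alignment
        j = 0
        while j < m and T[s + j] == P[j]:
            j += 1                      # characters agree, keep scanning right
        if j == m:                      # all of P matched at this alignment
            hits.append(s + 1)          # convert to 1-based position
    return hits


print(naive_match("aba", "bbabaxababay"))        # [3, 7, 9]
print(naive_match("abxyabxz", "xabxyabxyabxz"))  # [6]
```

After each failed alignment the pattern moves only one place to the right, even when the characters already compared would justify a longer shift. That wasted work is what the preprocessing in the following slides avoids.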
Preprocessing
Zi(S) = the length of the longest substring of S that starts at position i and matches a prefix of S,
where i > 1

Position: 1 2 3 4 5 6 7 8 9 10 11
S =       a a b c a a b x a a  z

Z5(S) = 3 (aabc…aabx…)
Z6(S) = 1 (aa…ab…)
Z7(S) = Z8(S) = 0
Z9(S) = 2 (aab…aaz)
We will use Zi in place of Zi(S)
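The definition can be checked directly with a brute-force computation. This is a sketch under assumptions: the name `z_naive` is hypothetical, the list is 1-based (indices 0 and 1 are unused and left at 0), and the running time is quadratic in the worst case, unlike the Z-Algorithm presented later.

```python
def z_naive(S):
    """Compute Zi straight from the definition.

    Returns a list Z where Z[i] is the value for 1-based position i;
    Z[0] and Z[1] are placeholders set to 0.
    """
    n = len(S)
    Z = [0] * (n + 1)
    for i in range(2, n + 1):
        q = 0
        # S[i - 1 + q] is the character at 1-based position i + q
        while i + q <= n and S[i - 1 + q] == S[q]:
            q += 1
        Z[i] = q
    return Z


Z = z_naive("aabcaabxaaz")
print(Z[5], Z[6], Z[7], Z[8], Z[9])   # 3 1 0 0 2
```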
Z Box
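For i > 1, where Zi is greater than zero, the Z-box at i is the interval of S that starts at i and ends at i + Zi - 1.
For every i > 1, ri is the right-most end point of the Z-boxes that begin at or before position i, and li is the left end of the Z-box that ends at ri.
[Figure: overlapping Z-boxes on a string, with end points marked at positions 40, 50, 55, 62, 70, 78, 82, 85, 89, 95]
r78 = 95, l78 = 78
r82 = 95, l82 = 78
r52 = 50, l52 = 40
r75 = 85, l75 = 70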
Preprocessing
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:   0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
ri:  0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
li:  0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
Z-boxes: [2,2], [4,6], [5,5], [8,8], [10,16], [11,11], [13,15], [14,14]
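The ri and li rows follow mechanically from the Z row. The sketch below assumes a 1-based list of Z values (index 0 unused); the name `z_boxes` is hypothetical. When several Z-boxes end at the same right-most position it keeps the earliest one, which is one of several valid choices for li.

```python
def z_boxes(Z):
    """Given 1-based Z values (Z[0] unused), return 1-based lists (r, l).

    r[i] is the right-most end point of any Z-box starting at or before i;
    l[i] is the left end of the box that ends at r[i].
    """
    n = len(Z) - 1
    r = [0] * (n + 1)
    l = [0] * (n + 1)
    best_r = best_l = 0
    for i in range(2, n + 1):
        end = i + Z[i] - 1              # right end of the Z-box at i, if any
        if Z[i] > 0 and end > best_r:
            best_r, best_l = end, i
        r[i], l[i] = best_r, best_l
    return r, l


# Z row for S = aabaabcaxaabaabcy, padded so that Z[i] is 1-based
Z = [0, 0, 1, 0, 3, 1, 0, 0, 1, 0, 7, 1, 0, 3, 1, 0, 0, 0]
r, l = z_boxes(Z)
print(r[1:])
print(l[1:])
```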
Z-Algorithm
Goal: To calculate Zi for every position i > 1 of an input string S in linear time
Z-Algorithm
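In iteration k:
(I) if k <= r
Position k lies inside the Z-box α = S[l..r], which matches the prefix α’ = S[l’..r’] with l’ = 1.
k’ = k-l+1; r’ = r-l+1; α = α’; β = β’
Here β = S[k..r] and β’ = S[k’..r’].
[Figure: α’ = S[l’..r’] containing β’ = S[k’..r’], and α = S[l..r] containing β = S[k..r]]
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y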
Z-Algorithm
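A) If |γ’| < |β’|, that is, Zk’ < r-k+1, then Zk = Zk’
Here γ’ is the Z-box at k’, γ’’ is the matching prefix of S, and γ is the copy of γ’ that starts at k.
γ = γ’ = γ’’; x ≠ y, where x follows γ’’ and y follows both γ’ and γ
[Figure: γ’’ x at the start of α’, γ’ y inside β’, and γ y inside β]
Example: with l = 10 and r = 16, at k = 13 we have k’ = 4 and Z4 = 3 < r-k+1 = 4, so Z13 = Z4 = 3.
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:   0  1  0  3  1  0  0  1  0  7  1  0  3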
Z-Algorithm
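B) If |γ’| > |β’|, that is, Zk’ > r-k+1, then Zk = |β|, i.e., r-k+1
β = β’ = β’’, where β’’ is the prefix of S of length |β|
γ’ = γ’’
x ≠ y (because α is a Z-box), where x follows both β’’ and β’ and y follows β
[Figure: β’’ x at the start of γ’’, β’ x at the start of γ’, and β y at the end of α]
Example: with l = 10 and r = 14, at k = 13 we have k’ = 4 and Z4 = 3 > r-k+1 = 2, so Z13 = 2.
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  c  d
Z:   0  1  0  3  1  0  0  1  0  5  1  0  2  1  0  0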
Z-Algorithm
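C) If |γ’| = |β’|, that is, Zk’ = r-k+1, then Zk ≥ |β|, i.e., Zk ≥ r-k+1
β = β’ = β’’
γ = γ’ = γ’’
x ≠ y (because α is a Z-box)
z ≠ x (because γ’ is a Z-box)
z ?? y (the relation between z and y is not yet known)
Compare S[r+1,…] with S[|β|+1,…] until a mismatch occurs. Update Zk, r, and l.
[Figure: β’’ z at the start of S, β’ x at the end of α’, and β y at the end of α]
Example: with l = 10 and r = 14, at k = 13 we have k’ = 4 and Z4 = 2 = r-k+1. Comparing S[15,…] with S[3,…] gives one more match (b = b) before the mismatch (d ≠ a), so Z13 = 3, r = 15, and l = 13.
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
S:   a  a  b  a  a  e  c  a  x  a  a  b  a  a  b  d
Z:   0  1  0  2  1  0  0  1  0  5  1  0  3  1  0  0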
Z-Algorithm
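(II) if k > r
Position k lies to the right of every Z-box found so far.
Compare the characters starting at k with those
starting at 1 until a mismatch occurs.
Set Zk to the number of matches. If Zk > 0, update r = k + Zk - 1 and l = k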
Z-Algorithm
Input: String S of length n (for example, a pattern P)
Output: Zk for k = 2, …, n
Z Algorithm
Calculate Z2, r2 and l2 explicitly by comparisons. Set r = r2 and l = l2
for k = 3; k <= n; k++
    if k <= r
        let k’ = k-l+1 and |β| = r-k+1
        if Zk’ < r-k+1, then Zk = Zk’
        else if Zk’ > r-k+1, then Zk = r-k+1
        else compare the characters starting at r+1 with those starting at |β|+1
            until a mismatch occurs. Set Zk; update r, and l if necessary
    else compare the characters starting at k with those starting at 1
        until a mismatch occurs. Set Zk; update r, and l if necessary
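The pseudocode translates into a short runnable sketch. Assumptions made here, none of which come from the slides: the function name `z_algorithm`, a padding character so that string positions are 1-based as in the slides, and handling k = 2 through the general k > r branch (with r = 0 initially) rather than as a separate first step.

```python
def z_algorithm(S):
    """Linear-time Z values; Z[k] is for 1-based position k (Z[0], Z[1] unused)."""
    n = len(S)
    s = " " + S                  # pad so that s[1..n] is the string
    Z = [0] * (n + 1)
    l = r = 0                    # current Z-box S[l..r]; empty at the start
    for k in range(2, n + 1):
        if k > r:
            # Case II: compare from position k against the prefix
            q = 0
            while k + q <= n and s[k + q] == s[1 + q]:
                q += 1
            Z[k] = q
            if q > 0:
                l, r = k, k + q - 1
        else:
            kp = k - l + 1       # k' in the slides
            beta = r - k + 1     # |beta| = length of S[k..r]
            if Z[kp] < beta:     # Case A: the match ends strictly inside the box
                Z[k] = Z[kp]
            elif Z[kp] > beta:   # Case B: the match stops exactly at r
                Z[k] = beta
            else:                # Case C: extend by comparing beyond r
                q = r + 1
                while q <= n and s[q] == s[q - k + 1]:
                    q += 1
                Z[k] = q - k
                l, r = k, q - 1
    return Z


print(z_algorithm("aabaabcaxaabaabcy")[1:])
print(z_algorithm("aabaabcaxaabaacd")[1:])   # exercises Case B at k = 13
print(z_algorithm("aabaaecaxaabaabd")[1:])   # exercises Case C at k = 13
```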
Preprocessing
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:   0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
r:   0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
l:   0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
Z-Algorithm
Time complexity
#mismatches <= number of iterations <= n, because each iteration stops at its first mismatch
#matches
• Let q be the number of matches at iteration k; then r increases by at least q
• r <= n
• Thus total #matches <= n
Time = O(#matches + #mismatches + #iterations) = O(n)
i:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:     a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:     0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
r:     0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
l:     0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
#m:    0  1  0  3  0  0  0  1  0  7  0  0  0  0  0  0  0
#mis:  0  1  1  1  0  0  1  1  1  1  0  0  0  0  0  0  1
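The #m and #mis rows can be reproduced by counting explicit character comparisons per iteration. The sketch below is an assumption-laden variant rather than the slides' code: the name `z_with_counts` is hypothetical, and the cases that need comparisons (k > r, and Zk’ = r-k+1) are merged into one loop because both simply continue matching from the first position not yet known to match.

```python
def z_with_counts(S):
    """Return (Z, matches, mismatches) as 1-based lists (index 0 unused)."""
    n = len(S)
    s = " " + S                          # pad so that s[1..n] is the string
    Z = [0] * (n + 1)
    matches = [0] * (n + 1)
    mismatches = [0] * (n + 1)
    l = r = 0
    for k in range(2, n + 1):
        kp, beta = k - l + 1, r - k + 1
        if k <= r and Z[kp] != beta:
            Z[k] = min(Z[kp], beta)      # cases A and B: no comparisons at all
            continue
        q = max(r + 1, k)                # first position not yet known to match
        while q <= n:
            if s[q] == s[q - k + 1]:
                matches[k] += 1
                q += 1
            else:
                mismatches[k] += 1       # at most one mismatch per iteration
                break
        Z[k] = q - k
        if q > k:                        # a non-empty Z-box was found or extended
            l, r = k, q - 1
    return Z, matches, mismatches


Z, m_row, mis_row = z_with_counts("aabaabcaxaabaabcy")
print(m_row[1:])
print(mis_row[1:])
print(sum(m_row) + sum(mis_row))         # total comparisons, at most 2n
```

Every match moves r to the right, and r never exceeds n, so the match counts sum to at most n. Each iteration contributes at most one mismatch, which gives the 2n bound on total comparisons.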
Simplest Linear Time Exact Matching Algorithm
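Build S = P$T, where $ is a character that appears in neither P nor T, with |P| = m and |T| = n.
Compute Zi for S. Because $ occurs only once in S, Zi <= m for every i > 1.
Every position i > m+1 with Zi = m marks an occurrence of P in T starting at position i-(m+1) of T.
Time = O(|P|+|T|+1) = O(m+n)
[Figure: S = P$T, with α’ and β’ lying inside P and α and β lying inside T]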
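The P$T construction can be sketched end to end as follows. The names `z_array` and `find_all` are hypothetical. Unlike the slides, `z_array` uses the common 0-based formulation, so an index i of the concatenated string maps to the 1-based position i - m in T. The separator is assumed to be absent from both strings, and the sketch checks that assumption.

```python
def z_array(s):
    """0-based Z array: z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    l = r = 0                           # s[l:r] is the right-most match window so far
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])  # reuse the value inside the window
        while i + z[i] < n and s[i + z[i]] == s[z[i]]:
            z[i] += 1                    # extend by explicit comparison
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def find_all(P, T, sep="$"):
    """Return the 1-based positions in T where P occurs, using S = P + sep + T."""
    if sep in P or sep in T:
        raise ValueError("separator must not occur in P or T")
    m = len(P)
    z = z_array(P + sep + T)
    # index i in S corresponds to 1-based position i - m in T
    return [i - m for i in range(m + 1, len(z)) if z[i] == m]


print(find_all("aba", "bbabaxababay"))        # [3, 7, 9]
print(find_all("abxyabxz", "xabxyabxyabxz"))  # [6]
```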
Reference
• Chapter 1, 2: Exact Matching: Fundamental
Preprocessing and First Algorithms