AAD-String Matching
Outline
Introduction
The Naive String Matching Algorithm
The Knuth-Morris-Pratt Algorithm
Introduction
Text-editing programs frequently need to find all occurrences of a pattern in a text.
Efficient algorithms for this problem are called string-matching algorithms.
Among its many applications, string matching is widely used in searching for patterns in DNA
sequences and in Internet search engines.
Assume that the text is represented in the form of an array T[1..n] and the pattern is an array
P[1..m].
Text T[1..13]:   a b c a b a a b c a b a c
Pattern P[1..4]: a b a a
Naive String Matching - Example
The naive algorithm finds all valid shifts using a loop that checks the condition P[1..m] =
T[s+1..s+m] for each possible shift s = 0, 1, ..., n-m.
T      a c a a b c
s = 0  a a b          no match
s = 1    a a b        no match
s = 2      a a b      P[1..m] = T[s+1..s+m]: pattern matched with shift 2
s = 3        a a b    no match
Naive String Matching - Algorithm
NAIVE-STRING-MATCHER(T, P)
n = T.length
m = P.length
for s = 0 to n-m
    if P[1..m] == T[s+1..s+m]
        print “Pattern occurs with shift” s
Example: for T[1..6] = a c a a b c and P[1..3] = a a b, the loop tries s = 0, 1, 2, 3 and prints
“Pattern occurs with shift 2”.
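The pseudocode above could be sketched in Python roughly as follows. The function name `naive_string_matcher` and the choice to return a list of shifts instead of printing them are assumptions made for illustration. Python strings are 0-indexed, so T[s+1..s+m] becomes the slice `text[s:s + m]`.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s for which pattern equals text[s:s+m]."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible shifts, s = 0 .. n - m.
    for s in range(n - m + 1):
        # Slice comparison plays the role of P[1..m] == T[s+1..s+m].
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))  # [2]
```

Each of the n-m+1 shifts may compare up to m characters, so this sketch takes O((n-m+1)·m) time in the worst case.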
Introduction
The KMP algorithm relies on the prefix function (π).
Proper prefix: All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and
“Snap” are all the proper prefixes of “Snape”.
Proper suffix: All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”,
“rid”, “id”, and “d” are all proper suffixes of “Hagrid”.
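As a small illustration of the two definitions, the following sketch lists the non-empty proper prefixes and proper suffixes of a string. The helper names `proper_prefixes` and `proper_suffixes` are hypothetical and chosen only for this example.

```python
def proper_prefixes(s: str) -> list[str]:
    """All non-empty prefixes of s with at least one character cut off the end."""
    return [s[:i] for i in range(1, len(s))]


def proper_suffixes(s: str) -> list[str]:
    """All non-empty suffixes of s with at least one character cut off the beginning."""
    return [s[i:] for i in range(1, len(s))]


print(proper_prefixes("Snape"))   # ['S', 'Sn', 'Sna', 'Snap']
print(proper_suffixes("Hagrid"))  # ['agrid', 'grid', 'rid', 'id', 'd']
```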
The KMP algorithm works in two steps:
Step-1: Calculate Prefix Function
Step-2: Match Pattern with Text
Longest Common Prefix and Suffix
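            1 2 3 4 5 6 7
Pattern     a b a b a c a
Prefix(π)   0 0 1 2 3 0 1
π[q] is the length of the longest proper prefix of P[1..q] that is also a proper suffix of P[1..q].
P[1..q]   Possible prefixes    Possible suffixes    Longest common   π[q]
a         none                 none                 none             0
ab        a                    b                    none             0
aba       a, ab                a, ba                a                1
abab      a, ab, aba           b, ab, bab           ab               2
ababa     a, ab, aba, abab     a, ba, aba, baba     aba              3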
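The table can be checked with a brute-force sketch that applies the definition directly: for each prefix P[1..q], try every length k < q and keep the largest k for which the first k characters equal the last k characters. The function name `prefix_function_by_definition` is an assumption for illustration, and this is deliberately the slow, definitional version, not the KMP computation itself.

```python
def prefix_function_by_definition(pattern: str) -> list[int]:
    """pi[q-1] = length of the longest proper prefix of pattern[:q] that is also its suffix."""
    pi = []
    for q in range(1, len(pattern) + 1):
        head = pattern[:q]
        best = 0
        # k < q keeps both the prefix and the suffix proper.
        for k in range(1, q):
            if head[:k] == head[-k:]:
                best = k
        pi.append(best)
    return pi


print(prefix_function_by_definition("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```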
Calculate Prefix Function - Example
            1 2 3 4 5 6 7
P           a c a c a g t
π           0 0 1 2 3 0 0
Initially set π[1] = 0 and k = 0.
k is the length of the longest prefix found so far.
q is the current index of the pattern.
For each q from 2 to m:
    While k > 0 and P[k+1] ≠ P[q], set k = π[k].
    If P[k+1] == P[q], set k = k+1.
    Set π[q] = k.
KMP- Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
m ← length[P]
π[1] ← 0
k ← 0
for q ← 2 to m
    while k > 0 and P[k + 1] ≠ P[q]
        k ← π[k]
    end while
    if P[k + 1] == P[q] then
        k ← k + 1
    end if
    π[q] ← k
end for
return π
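A possible Python rendering of COMPUTE-PREFIX-FUNCTION is sketched below. It assumes 0-indexed storage, so `pi[q]` in the code holds the value the slides call π[q+1], and P[k+1] becomes `pattern[k]`. The function name `compute_prefix_function` simply mirrors the pseudocode.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return the prefix table, where pi[q] is the slides' π[q+1] (0-indexed storage)."""
    m = len(pattern)
    pi = [0] * m              # π[1] = 0 is covered by the initial zeros
    k = 0                     # length of the longest prefix matched so far
    for q in range(1, m):     # slides: q = 2 .. m
        # Fall back through shorter borders until the next character fits.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("acacagt"))  # [0, 0, 1, 2, 3, 0, 0]
```

The variable k increases by at most one per iteration of the for loop and every pass through the while loop decreases it, so the whole computation takes O(m) time.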
KMP String Matching
            1 2 3 4 5 6 7
Pattern     a c a c a g t
Prefix(π)   0 0 1 2 3 0 0
On a mismatch after q matched characters, check the value π[q] in the prefix table and move the
pattern forward by q − π[q] positions (skip unnecessary shifts). If q = 0, move forward by one position.
T          a c a t a c g a c a c a g t
shift 0    a c a c a g t                 Mismatch at T[4], q = 3, π[3] = 1: move forward 2 positions
shift 2        a c a c a g t             Mismatch at T[4], q = 1, π[1] = 0: move forward 1 position
shift 3          a c a c a g t           Mismatch at T[4], q = 0: move forward 1 position
shift 4            a c a c a g t         Mismatch at T[7], q = 2, π[2] = 0: move forward 2 positions
shift 6                a c a c a g t     Mismatch at T[7], q = 0: move forward 1 position
shift 7                  a c a c a g t   All 7 characters match
Pattern matches with shift i − m = 14 − 7 = 7
KMP-MATCHER
KMP-MATCHER(T, P)
n ← length[T]
m ← length[P]
π ← COMPUTE-PREFIX-FUNCTION(P)
q ← 0                                  //Number of characters matched.
for i ← 1 to n                         //Scan the text from left to right.
    while q > 0 and P[q + 1] ≠ T[i]
        q ← π[q]                       //Next character does not match.
    if P[q + 1] == T[i] then
        q ← q + 1                      //Next character matches.
    if q == m then                     //Is all of P matched?
        print "Pattern occurs with shift" i - m
        q ← π[q]                       //Look for the next match.
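KMP-MATCHER could be sketched in Python as below. The prefix-table computation is inlined so the block runs on its own. Returning a list of shifts instead of printing, and returning an empty list for an empty pattern, are assumptions made for this example. With 0-indexed strings the shift i − m from the pseudocode becomes `i - m + 1`.

```python
def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift at which pattern occurs in text, using KMP."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches

    # Step 1: prefix table, pi[q] = slides' π[q+1] (0-indexed storage).
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k

    # Step 2: scan the text from left to right.
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i in range(n):
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]          # next character does not match
        if pattern[q] == text[i]:
            q += 1                 # next character matches
        if q == m:                 # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]          # look for the next match
    return shifts


print(kmp_matcher("acatacgacacagt", "acacagt"))  # [7]
```

The text index i never moves backwards, so the scan takes O(n) time after the O(m) preprocessing step.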