AAD-String Matching

AAD-String Matching thapar adsa

Uploaded by

tsingh6be21

String Matching

Outline
• Introduction
• The Naive String Matching Algorithm
• The Knuth-Morris-Pratt Algorithm
Introduction
• Text-editing programs frequently need to find all occurrences of a pattern in a text.
• Efficient algorithms for this problem are called string-matching algorithms.
• String matching has many applications, including searching for patterns in DNA sequences and Internet search engines.
• Assume that the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m, with m ≤ n.

Text T[1..13] a b c a b a a b c a b a c

Pattern P[1..4] a b a a
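
As a quick illustration of the problem statement, the sketch below lists every position at which the example pattern starts in the example text. It uses Python's built-in `str.find`; the helper name `find_all` is an assumption made for this example, and positions are 0-based, so the reported value equals the number of text characters that precede the occurrence.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return every 0-based index at which pattern starts in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character (not len(pattern)) so that
        # overlapping occurrences are also reported.
        start = text.find(pattern, start + 1)
    return positions


# Text and pattern from the example above.
print(find_all("abcabaabcabac", "abaa"))  # [3]
```

Here the pattern starts after 3 text characters, that is, it matches T[4..7] in the 1-based notation used above.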

Naive String Matching - Example
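• The pattern occurs with shift s (s is a valid shift) if P[1..m] = T[s+1..s+m].
• The naive algorithm finds all valid shifts using a loop that checks this condition for each of the n-m+1 possible values of s.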

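Text T[1..6] = a c a a b c, Pattern P[1..3] = a a b

s=0   T: a c a a b c
      P: a a b            mismatch

s=1   T: a c a a b c
      P:   a a b          mismatch

s=2   T: a c a a b c
      P:     a a b        match

s=3   T: a c a a b c
      P:       a a b      mismatch

Pattern matched with shift 2, because P[1..m] = T[s+1..s+m] holds for s = 2.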

Naive String Matching - Algorithm
NAIVE-STRING MATCHER (T,P)
n = T.length
m = P.length
for s = 0 to n-m
    if P[1..m] == T[s+1..s+m]
        print “Pattern occurs with shift” s

Example: T[1..6] = a c a a b c, P[1..3] = a a b. The loop tries s = 0, 1, 2, 3 and prints: Pattern occurs with shift 2

Naive String Matcher takes time O((n-m+1)m) in the worst case, since each of the n-m+1 shifts may require up to m character comparisons.
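
The naive matcher can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `naive_string_matcher` is chosen for this example, strings are 0-indexed, and the slice comparison stands in for the character-by-character check of P[1..m] against T[s+1..s+m].

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):        # s = 0 .. n-m, one iteration per shift
        if text[s:s + m] == pattern:  # compares up to m characters
            shifts.append(s)
    return shifts


# Text and pattern from the example above.
print(naive_string_matcher("acaabc", "aab"))  # [2]
```

The loop runs n-m+1 times and each comparison costs up to m steps, which is where the O((n-m+1)m) bound comes from.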

The Knuth-Morris-Pratt Algorithm - Introduction
• The KMP algorithm relies on a prefix function (π). For each position q of the pattern, π[q] is the length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q].
• Proper prefix: All the characters in a string, with one or more cut off the end. “S”, “Sn”, “Sna”, and “Snap” are all the proper prefixes of “Snape”.
• Proper suffix: All the characters in a string, with one or more cut off the beginning. “agrid”, “grid”, “rid”, “id”, and “d” are all the proper suffixes of “Hagrid”.
• The KMP algorithm works as follows:
    • Step-1: Calculate the prefix function
    • Step-2: Match the pattern with the text

Longest Common Prefix and Suffix

            1 2 3 4 5 6 7
Pattern     a b a b a c a
Prefix(π)   0 0 1 2 3 0 1

q   P[1..q]   Possible prefixes    Possible suffixes    Longest common   π[q]
1   a         (none)               (none)               (none)           0
2   ab        a                    b                    (none)           0
3   aba       a, ab                a, ba                a                1
4   abab      a, ab, aba           b, ab, bab           ab               2
5   ababa     a, ab, aba, abab     a, ba, aba, baba     aba              3
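
The prefix values can be checked directly from the definition by comparing every proper prefix with the proper suffix of the same length. The sketch below does exactly that; it is a brute-force check written for this example (the name `prefix_function_by_definition` is an assumption), not the efficient computation used by KMP.

```python
def prefix_function_by_definition(pattern: str) -> list[int]:
    """For each q, find the longest proper prefix of pattern[:q]
    that is also a suffix of pattern[:q], by trying every length."""
    pi = []
    for q in range(1, len(pattern) + 1):
        head = pattern[:q]
        best = 0
        for k in range(1, q):            # proper: length k < q
            if head[:k] == head[q - k:]:  # prefix of length k == suffix of length k
                best = k
        pi.append(best)
    return pi


# Pattern from the table above.
print(prefix_function_by_definition("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

This version takes cubic time in the pattern length in the worst case, so it is only useful for verifying small examples by hand.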

Calculate Prefix Function - Example
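     1 2 3 4 5 6 7
P    a c a c a g t
π    0 0 1 2 3 0 0

• Initially set π[1] = 0 and k = 0.
• k is the length of the longest prefix found so far.
• q is the current index of the pattern (q = 2, 3, ..., m).
• For each q, test P[k+1] == P[q]:
    • true: set k = k+1
    • false: if k > 0, set k = π[k] and test again; if k = 0, stop testing
• Then set π[q] = k.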
KMP- Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
m ← length[P]
π[1] ← 0
k ← 0
for q ← 2 to m
    while k > 0 and P[k + 1] ≠ P[q]
        k ← π[k]
    end while
    if P[k + 1] == P[q] then
        k ← k + 1
    end if
    π[q] ← k
end for
return π
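
A Python rendering of the pseudocode above might look like the following sketch. It assumes 0-based indexing, so `pi[q]` here corresponds to π[q+1] in the slides and `pattern[k]` plays the role of P[k+1]; the function name is chosen for this example.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi, where pi[q] is the length of the longest proper prefix
    of pattern[:q + 1] that is also a suffix of it."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                  # length of the current matched prefix
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                  # fall back to a shorter prefix
        if pattern[k] == pattern[q]:
            k += 1                         # extend the matched prefix
        pi[q] = k
    return pi


# Pattern from the example above.
print(compute_prefix_function("acacagt"))  # [0, 0, 1, 2, 3, 0, 0]
```

Because k increases by at most one per iteration and every fallback decreases it, the total work is O(m).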

KMP String Matching
            1 2 3 4 5 6 7
Pattern     a c a c a g t
Prefix(π)   0 0 1 2 3 0 0

Step 1
T    a c a t a c g a c a c a g t
P    a c a c a g t
Mismatch at T[4] after 3 matched characters. Check the value in the prefix table: π[3] = 1, so the pattern moves forward by 3 - 1 = 2 positions and the shift in between is skipped as unnecessary. T[4] = t still does not match P[2] or P[1], so matching restarts at T[5].

Step 2
T    a c a t a c g a c a c a g t
P            a c a c a g t
Mismatch at T[7] after 2 matched characters. Check the value in the prefix table: π[2] = 0, and T[7] = g does not match P[1], so matching restarts at T[8].

Step 3
T    a c a t a c g a c a c a g t
P                  a c a c a g t
All m = 7 characters match, ending at i = 14.
Pattern matches with shift i − m = 14 − 7 = 7
KMP-MATCHER
KMP-MATCHER(T, P)
n ← length[T]
m ← length[P]
π ← COMPUTE-PREFIX-FUNCTION(P)
q ← 0                                  //Number of characters matched.
for i ← 1 to n                         //Scan the text from left to right.
    while q > 0 and P[q + 1] ≠ T[i]
        q ← π[q]                       //Next character does not match.
    if P[q + 1] == T[i] then
        q ← q + 1                      //Next character matches.
    if q == m then                     //Is all of P matched?
        print "Pattern occurs with shift" i - m
        q ← π[q]                       //Look for the next match.
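
The matcher can be sketched in Python along the same lines. This is an illustrative version under a few assumptions: indexing is 0-based (so a match ending at index i has shift i - m + 1 rather than i - m), an empty pattern returns no shifts, and the function names are chosen for this example. The prefix function is repeated so the block runs on its own.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of it (0-based version of the prefix table)."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []                      # assumption: empty pattern yields no shifts
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0                              # number of characters matched
    for i, ch in enumerate(text):      # scan the text from left to right
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]              # next character does not match
        if pattern[q] == ch:
            q += 1                     # next character matches
        if q == m:                     # all of the pattern matched
            shifts.append(i - m + 1)
            q = pi[q - 1]              # look for the next match
    return shifts


# Text and pattern from the KMP example above.
print(kmp_matcher("acatacgacacagt", "acacagt"))  # [7]
```

The text index i never moves backwards, so matching takes O(n) time after the O(m) prefix-function step.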