5 1 Stringsearch

Uploaded by

rgn12
Copyright
© All Rights Reserved
String Searching

1
String Search
A common word processor facility is to search
for a given word in a document. Generally, the
problem is to search for occurrences of a short
string in a long string.

word:     the
document: Do the first then do the other one

2
History of String Search
The brute force algorithm:
  • invented in the dawn of computer history
  • re-invented many times, still common
Knuth & Pratt invented a better one in 1970
  • invented independently by Morris
  • published 1977 as “Knuth-Morris-Pratt”

3
• The obvious algorithm is to try the word at each possible
  place, and compare all the characters:

    for i := 0 to n-m do              (doc length n)
        for j := 0 to m-1 do          (word length m)
            compare word[j] with doc[i+j]
            if not equal, exit the inner loop

The complexity is at worst O(m*n) and at best O(n).
4
Improving String Search
Surprisingly, there is a faster algorithm
where you compare the last characters first:

Do the first then do the other one
the
compare ‘e’ with ‘ ’, fail so move along 3 places

Do the first then do the other one
         the
compare ‘e’ with ‘t’: can only move along 2 places

5
Improved string search, continued
In every case where the document
character is not one of the characters in
the word, we can move along m places.
Sometimes, it is less.
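
The shifting rule above can be sketched in Python. This is a simplified variant in the style of Horspool's algorithm, which may not be exactly the method the slides have in mind: the shift is chosen from the document character lying under the word's last position. The names last_first_search and shift are illustrative, not from the slides.

```python
def last_first_search(doc, word):
    """Return the index of the first occurrence of word in doc, or -1."""
    n, m = len(doc), len(word)
    if m == 0:
        return 0
    # For each character of the word (except its last position), how far the
    # word may move when that character lies under the word's last position.
    # Later entries overwrite earlier ones, so the rightmost occurrence wins.
    shift = {c: m - 1 - j for j, c in enumerate(word[:-1])}
    i = 0
    while i <= n - m:
        j = m - 1
        # Compare from the last character backwards.
        while j >= 0 and doc[i + j] == word[j]:
            j -= 1
        if j < 0:
            return i
        # Characters that do not occur in the word allow a full move of m places.
        i += shift.get(doc[i + m - 1], m)
    return -1


doc = "Do the first then do the other one"
print(last_first_search(doc, "the"))
```

For the word "the", the table holds a move of 2 for 't' and 1 for 'h'; every other character, such as the space in the first comparison, allows the full move of 3.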

6
Problem Definition, terminology
Let p be the pattern string
Let t be the target string (document)
Let k be the index of the character in the target
string that “lies over” the first character of the
pattern
Given two strings, p and t, over the alphabet Σ,
determine whether p occurs as a substring of t.
That is, determine whether there exists k such
that p = Substring(t, k, |p|).
7
Straightforward string searching
function SimpleStringSearch(string p,t): integer
{Find p in t; return its location or -1 if p is not a substring of t}
    for k from 0 to Length(t) - Length(p) do
        i := 0
        while i < Length(p) and p[i] = t[k+i] do
            i := i+1
        if i = Length(p) then return k
    return -1
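
A direct Python rendering of SimpleStringSearch, given as a sketch. The function name simple_string_search is an assumption; the argument order (p, t) and the return convention follow the pseudocode.

```python
def simple_string_search(p: str, t: str) -> int:
    """Find p in t; return its location, or -1 if p is not a substring of t."""
    for k in range(len(t) - len(p) + 1):   # k from 0 to Length(t) - Length(p)
        i = 0
        while i < len(p) and p[i] == t[k + i]:
            i += 1
        if i == len(p):                    # every character of p matched
            return k
    return -1


print(simple_string_search("ABCD", "ABCEFGABCDE"))  # 6
```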

8
SimpleStringSearch
t[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t[10]
 A    B    C    E    F    G    A    B    C    D    E

p[0] p[1] p[2] p[3]
 A    B    C    D
 Y    Y    Y    N

9
SimpleStringSearch
For k = 1 to 5, the pattern moves one box to the right each time and
the first comparison fails (p[0] = A against t[k] = B, C, E, F, G).

15
SimpleStringSearch
t[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t[10]
 A    B    C    E    F    G    A    B    C    D    E

                              p[0] p[1] p[2] p[3]
                               A    B    C    D
                               Y    Y    Y    Y

16
Straightforward string searching
Worst case:
  • Pattern string always matches completely except for last character
  • Example: search for XXXXXXY in target string of XXXXXXXXXXXXXXXXXXXX
  • Outer loop executed once for every character in target string
  • Inner loop executed once for every character in pattern
  • O(|p| * |t|)
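
The worst case can be checked by counting character comparisons. The sketch below adds a counter to the straightforward search; count_comparisons is a hypothetical helper, not part of the slides. For the example above there are 20 - 7 + 1 = 14 alignments of 7 comparisons each, which is within the |p| * |t| = 140 bound.

```python
def count_comparisons(p, t):
    """Straightforward search that also counts character comparisons.

    Returns (location or -1, number of comparisons made).
    """
    comparisons = 0
    for k in range(len(t) - len(p) + 1):
        i = 0
        while i < len(p):
            comparisons += 1
            if p[i] != t[k + i]:
                break
            i += 1
        if i == len(p):
            return k, comparisons
    return -1, comparisons


print(count_comparisons("XXXXXXY", "X" * 20))      # (-1, 98)
print(count_comparisons("ABCD", "ABCEFGABCDE"))    # (6, 13)
```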

17
Knuth-Morris-Pratt
t[0] t[1] t[2] t[3] t[4] t[5] t[6] t[7] t[8] t[9] t[10]
 X    Y    X    Y    X    Y    c

p[0] p[1] p[2] p[3] p[4]
 X    Y    X    Y    Z
 Y    Y    Y    Y    N

           X    Y    X    Y    Z
           Y    Y    Y    Y    ?

18
Knuth-Morris-Pratt
Worst case of the straightforward search: O(|p| * |t|)
Key idea:
  • if pattern fails to match, slide pattern to right by
    as many boxes as possible without permitting a
    match to go unnoticed

19
Knuth-Morris-Pratt
Correct motion of pattern depends on both
location of mismatch and the mismatching
character
If c == X : move 2 boxes to right
If c == E : move 5 boxes to right
If c == Z : target found; algorithm terminates

20
Knuth-Morris-Pratt
Goal: determine d, the number of boxes the
pattern should move to the right: the smallest
d such that:
p[0] = t[k+d]
p[1] = t[k+d+1]
p[2] = t[k+d+2]
…
p[i-d] = t[k+i]
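
The goal can be sketched directly from this definition by trying d = 0, 1, 2, ... against the target. The function name smallest_shift is hypothetical. The sketch starts at d = 0 so that a matching character gives 0, and d = i + 1 always succeeds because no conditions remain to check.

```python
def smallest_shift(p, t, k, i):
    """Smallest d with p[0..i-d] equal to t[k+d..k+i].

    k is the position in t under p[0]; i is the pattern position being compared.
    """
    for d in range(i + 2):
        # p[:i - d + 1] is p[0..i-d]; t[k + d:k + i + 1] is t[k+d..k+i].
        if p[:i - d + 1] == t[k + d:k + i + 1]:
            return d


# Pattern XYXYZ over target XYXYXYc, compared at pattern position i = 4.
print(smallest_shift("XYXYZ", "XYXYXYX", 2, 4))  # c == X: 2
print(smallest_shift("XYXYZ", "XYXYXYE", 2, 4))  # c == E: 5
print(smallest_shift("XYXYZ", "XYXYXYZ", 2, 4))  # c == Z: 0, target found
```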

21
Knuth-Morris-Pratt
Note: can be stated largely in terms of
pattern alone.
Value of d depends only on:
The pattern
The value of i
The mismatching character c (at t[k+i])

22
Knuth-Morris-Pratt
algorithm kmp_search:
    input:
        an array of characters, t (the text to be searched)
        an array of characters, p (the word sought)
    output:
        an integer (the zero-based position in t at which p is found,
        or the length of t if p is not found)
    define variables:
        an integer, m ← 0 (the beginning of the current match in t)
        an integer, i ← 0 (the position of the current character in p)
        an array of integers, T (the table, computed elsewhere)
    while (m + i) is less than the length of t, do:
        if p[i] = t[m + i],
            let i ← i + 1
            if i equals the length of p,
                return m
        otherwise,
            let m ← m + i - T[i],
            if i is greater than 0,
                let i ← T[i]
    (if we reach here, we have searched all of t unsuccessfully)
    return the length of t
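
A minimal Python sketch of kmp_search. The slide leaves the table T "computed elsewhere", so the kmp_table helper is an assumption: it uses the common convention T[0] = -1, with T[i] the length of the longest proper prefix of p that is also a suffix of p[0..i-1]. A negative T[0] is needed for m to advance when i is 0. The example strings are illustrative.

```python
def kmp_table(p):
    """Assumed helper: failure table with T[0] = -1."""
    T = [0] * len(p)
    if p:
        T[0] = -1
    pos, cnd = 2, 0                      # cnd: length of the current candidate prefix
    while pos < len(p):
        if p[pos - 1] == p[cnd]:         # candidate prefix extends by one character
            cnd += 1
            T[pos] = cnd
            pos += 1
        elif cnd > 0:                    # fall back to a shorter candidate
            cnd = T[cnd]
        else:                            # no candidate prefix at all
            T[pos] = 0
            pos += 1
    return T


def kmp_search(t, p):
    """Return the zero-based position of p in t, or len(t) if p is not found."""
    if not p:
        return 0
    T = kmp_table(p)
    m = i = 0                            # m: start of current match in t; i: position in p
    while m + i < len(t):
        if p[i] == t[m + i]:
            i += 1
            if i == len(p):
                return m
        else:
            m = m + i - T[i]
            if i > 0:
                i = T[i]
    return len(t)


print(kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD"))  # 15
```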
23
Knuth-Morris-Pratt

m = 0, i = 0: no “A” occurs between t(0) and t(3). Go to m = 4

m = 4, i = 0: A match can occur at t(8) and t(9)

m = 8, i = 0: Fails

m = 11, i = 0: t(17) does not equal p(6)

m = 15, i = 0

24
Knuth-Morris-Pratt
For pattern ABCD:

If the mismatching          and the position in the pattern is
character in the target     this character:
is this:                      A    B    C    D
  A                           0    1    2    3
  B                           1    0    3    4
  C                           1    2    0    4
  D                           1    2    3    0
  other                       1    2    3    4

then skip this many spaces.
25
Knuth-Morris-Pratt
For pattern XYXYZ:

If the mismatching          and the position in the pattern is
character in the target     this character:
is this:                      X    Y    X    Y    Z
  X                           0    1    0    3    2
  Y                           1    0    3    0    5
  Z                           1    2    3    4    0
  other                       1    2    3    4    5

then skip this many spaces.
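
Because the target characters before the mismatch equal the pattern, skip tables like these can be computed from the pattern alone. The sketch below is one way to do that; skip_table is a hypothetical name, and "?" stands in for any character that does not occur in the pattern (the "other" row).

```python
def skip_table(p, alphabet):
    """table[c][i] = skip when pattern position i lies under target character c."""
    table = {}
    for c in alphabet:
        row = []
        for i in range(len(p)):
            seen = p[:i] + c             # target characters t[k..k+i] at this point
            d = 0
            # Smallest d such that p[0..i-d] lines up with the end of `seen`;
            # d = i + 1 always succeeds because both slices are then empty.
            while p[:i - d + 1] != seen[d:]:
                d += 1
            row.append(d)
        table[c] = row
    return table


for c, row in skip_table("XYXYZ", "XYZ?").items():
    print(c, row)
```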
26
