Lecture 05

The document discusses string matching algorithms like Horspool, Boyer–Moore, and Backward Nondeterministic DAWG Matching (BNDM). It explains how these algorithms work and analyzes their time complexities in the worst, best, and average cases.


Horspool

The algorithms we have seen so far access every character of the text. If we
start the comparison between the pattern and the current text position from
the end, we can often skip some text characters completely.

There are many algorithms that start from the end. The simplest are the
Horspool-type algorithms.

The Horspool algorithm first checks the text character aligned with the last
pattern character. If it does not match, the pattern is moved (shifted)
forward until there is a match at that position.

Example 2.10: Horspool

T = ainaisesti-ainainen, P = ainainen.

ainaisesti-ainainen
ainainen                 mismatch at T[7] = s, shift 8
        ainainen         mismatch at T[15] = i, shift 3
           ainainen      occurrence at position 11
More precisely, suppose we are currently comparing P against T[j..j + m).
Start by comparing P[m − 1] to T[k], where k = j + m − 1.
• If P[m − 1] ≠ T[k], shift the pattern forward until the pattern character
aligned with T[k] matches, or until the whole pattern is past T[k].
• If P[m − 1] = T[k], compare the rest in a brute-force manner. Then shift
the pattern forward to the next position where the pattern character
aligned with T[k] matches.

Algorithm 2.11: Horspool

Input: text T = T[0..n), pattern P = P[0..m)
Output: position of the first occurrence of P in T
Preprocess:
(1) for c ∈ Σ do shift[c] ← m
(2) for i ← 0 to m − 2 do shift[P[i]] ← m − 1 − i
Search:
(3) j ← 0
(4) while j + m ≤ n do
(5)     if P[m − 1] = T[j + m − 1] then
(6)         i ← m − 2
(7)         while i ≥ 0 and P[i] = T[j + i] do i ← i − 1
(8)         if i = −1 then return j
(9)     j ← j + shift[T[j + m − 1]]
(10) return n
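
A direct transcription of Algorithm 2.11 into Python might look as follows. This is a sketch, not part of the lecture; the function name horspool and the use of a dictionary with a default value in place of the table over Σ are implementation choices.

def horspool(T, P):
    """Return the position of the first occurrence of P in T, or len(T) if there is none."""
    n, m = len(T), len(P)
    # Preprocess: shift[c] = m - 1 - i for the last occurrence P[i] = c with i < m - 1;
    # characters not occurring in P[0..m-2] get the default shift m (via dict.get below).
    shift = {}
    for i in range(m - 1):
        shift[P[i]] = m - 1 - i
    # Search
    j = 0
    while j + m <= n:
        if P[m - 1] == T[j + m - 1]:
            i = m - 2
            while i >= 0 and P[i] == T[j + i]:
                i -= 1
            if i == -1:
                return j
        # Shift based on the text character aligned with the last pattern character.
        j += shift.get(T[j + m - 1], m)
    return n

For example, horspool("ainaisesti-ainainen", "ainainen") returns 11, as in Example 2.10.
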
The length of the shift is determined by the shift table. shift[c] is defined
for all c ∈ Σ:
• If c does not occur in P[0..m − 2], shift[c] = m.
• Otherwise, shift[c] = m − 1 − i, where P[i] = c is the last occurrence of
c in P[0..m − 2].

Example 2.12: P = ainainen.

c                last occurrence in P[0..6]   shift[c]
a                position 3                   4
e                position 6                   1
i                position 4                   3
n                position 5                   2
Σ \ {a,e,i,n}    —                            8
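
The shift table above can be reproduced with the preprocessing step alone. A minimal sketch (only the characters occurring in P[0..m − 2] are stored; every other character implicitly gets the default shift m):

def horspool_shift_table(P):
    """shift[c] for the characters occurring in P[0..m-2]; all others default to m."""
    m = len(P)
    shift = {}
    for i in range(m - 1):        # the last pattern character P[m-1] is excluded
        shift[P[i]] = m - 1 - i   # a later occurrence overwrites an earlier one
    return shift

print(horspool_shift_table("ainainen"))   # {'a': 4, 'i': 3, 'n': 2, 'e': 1}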

On an integer alphabet:
• Preprocessing time is O(σ + m).
• In the worst case, the search time is O(mn).
  For example, P = ba^(m−1) and T = a^n.
• In the best case, the search time is O(n/m).
  For example, P = b^m and T = a^n.
• In the average case, the search time is O(n/min(m, σ)).
  This assumes that each pattern and text character is picked
  independently and uniformly at random.

In practice, a tuned implementation of Horspool is very fast when the
alphabet is not too small.

BNDM
Starting matching from the end enables long shifts.
• The Horspool algorithm bases the shift on a single character.
• The Boyer–Moore algorithm uses the matching suffix and the
mismatching character.
• Factor-based algorithms continue matching until no pattern factor
matches. This may require more comparisons, but it enables longer
shifts.

Example 2.13: T = varmasti-aikaisen-ainainen, P = ainaisen-ainainen.
In the first window T[0..17), the comparison from the end matches the suffix "en"
and then fails at T[14] = s; the factor-based backward scan reads "kaisen" and
fails at T[11] = k.

Horspool shift: 2 (determined by the last window character T[16] = n)
varmasti-aikaisen-ainainen
ainaisen-ainainen
  ainaisen-ainainen

Boyer–Moore shift: 9 (determined by the matched suffix "en" and the mismatching character s)
varmasti-aikaisen-ainainen
ainaisen-ainainen
         ainaisen-ainainen

Factor shift: 17 (no prefix of P is found before the scan fails, so the pattern is shifted past the whole window)
varmasti-aikaisen-ainainen
ainaisen-ainainen
                 ainaisen-ainainen
Factor-based algorithms use an automaton that accepts the suffixes of the
reverse pattern P^R (or equivalently the reversed prefixes of the pattern P).
• BDM (Backward DAWG Matching) uses a deterministic automaton
  that accepts exactly the suffixes of P^R.
  DAWG (Directed Acyclic Word Graph) is also known as a suffix automaton.

• BNDM (Backward Nondeterministic DAWG Matching) simulates a
  nondeterministic automaton.

Example 2.14: P = assi.

[Automaton diagram: states 3, 2, 1, 0, −1 in a row, with consecutive states
connected by transitions labeled a, s, s, i, and ε-transitions from the
initial state to the other states.]

• BOM (Backward Oracle Matching) uses a much simpler deterministic
  automaton that accepts all suffixes of P^R but may also accept some
  other strings. This can cause shorter shifts but not incorrect behaviour.

Suppose we are currently comparing P against T[j..j + m). We use the
automaton to scan the text backwards from T[j + m − 1]. When the
automaton has scanned T[j + i..j + m), there are three cases (a code
sketch of the procedure follows the list):

• If the automaton is in an accept state, then T[j + i..j + m) is a prefix of P.
  ⇒ If i = 0, we have found an occurrence.
  ⇒ Otherwise, mark the prefix match by setting shift = i. This is the
    length of the shift that would achieve a matching alignment.

• If the automaton can still reach an accept state, then T[j + i..j + m) is
  a factor of P.
  ⇒ Continue scanning.

• When the automaton can no longer reach an accept state:
  ⇒ Stop scanning and shift: j ← j + shift.
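
The sketch below spells out this scan-and-shift scheme in Python. It is not from the lecture: the automaton is replaced by a naive substring containment test, which is far slower than BDM/BNDM/BOM but makes the shift logic explicit, and the name backward_factor_search is an illustrative choice.

def backward_factor_search(T, P):
    """Find the first occurrence of P in T by backward scanning,
    using a naive 'is this a factor of P' test instead of an automaton."""
    n, m = len(T), len(P)
    j = 0
    while j + m <= n:
        i = m          # nothing scanned yet
        shift = m      # default shift if no prefix of P is found
        while i > 0 and T[j + i - 1 : j + m] in P:    # extending the scan still gives a factor of P
            i -= 1
            if P.startswith(T[j + i : j + m]):        # the scanned part is a prefix of P
                if i == 0:
                    return j                          # the whole window matches
                shift = i                             # remember the prefix match
        j += shift
    return n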

BNDM does a bit-parallel simulation of the nondeterministic automaton,
which is quite similar to Shift-And.

The state of the automaton is stored in a bitvector D. When the
automaton has scanned T[j + i..j + m):
• D.i = 1 if and only if there is a path from the initial state to state i
  with the string (T[j + i..j + m))^R.
• If D.(m − 1) = 1, then T[j + i..j + m) is a prefix of the pattern.
• If D = 0, then the automaton can no longer reach an accept state.

Updating D uses precomputed bitvectors B[c], for all c ∈ Σ:
• B[c].i = 1 if and only if P[m − 1 − i] = P^R[i] = c.

The update when reading T[j + i] is familiar: D ← (D << 1) & B[T[j + i]]
• Note that there is no “+1”. This is because D.(−1) = 0 always, so the
  shift brings the right bit to D.0. With Shift-And, D.(−1) = 1 always.
• The exception is the beginning, before anything has been read, when
  D.(−1) = 1. This is handled by starting the computation with the first
  shift already performed. Because of this, the shift is done at the end of
  the loop.
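
As a small illustration of the precomputation and the update, the following sketch computes the B table and performs two update steps; bndm_masks is an illustrative helper, and the comments show the bit patterns for P = assi (cf. Example 2.16 below).

def bndm_masks(P):
    """B[c] has bit i set if and only if P[m - 1 - i] == c."""
    m = len(P)
    B = {}
    for i in range(m):
        c = P[m - 1 - i]
        B[c] = B.get(c, 0) | (1 << i)
    return B

B = bndm_masks("assi")            # B['a'] = 0b1000, B['s'] = 0b0110, B['i'] = 0b0001
D = (1 << 4) - 1                  # D = 1111: nothing scanned, the first shift is already "done"
D = D & B.get('s', 0)             # read text character 's': D = 0110
D = (D << 1) & B.get('a', 0)      # read 'a': D = 1000, the prefix bit D.(m-1) is now set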

Algorithm 2.15: BNDM

Input: text T = T[0..n), pattern P = P[0..m)
Output: position of the first occurrence of P in T
Preprocess:
(1) for c ∈ Σ do B[c] ← 0
(2) for i ← 0 to m − 1 do B[P[m − 1 − i]] ← B[P[m − 1 − i]] + 2^i
Search:
(3) j ← 0
(4) while j + m ≤ n do
(5)     i ← m; shift ← m
(6)     D ← 2^m − 1   // D ← 1^m
(7)     while D ≠ 0 do
            // Now T[j + i..j + m) is a pattern factor
(8)         i ← i − 1
(9)         D ← D & B[T[j + i]]
(10)        if D & 2^(m−1) ≠ 0 then
                // Now T[j + i..j + m) is a pattern prefix
(11)            if i = 0 then return j
(12)            else shift ← i
(13)        D ← D << 1
(14)    j ← j + shift
(15) return n
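
A Python transcription of Algorithm 2.15 might look like this. It is a sketch; the algorithm assumes m ≤ w, although in Python the restriction is immaterial because integers are unbounded.

def bndm(T, P):
    """Return the position of the first occurrence of P in T, or len(T) if there is none."""
    n, m = len(T), len(P)
    # Preprocess: B[c] has bit i set iff P[m - 1 - i] == c.
    B = {}
    for i in range(m):
        B[P[m - 1 - i]] = B.get(P[m - 1 - i], 0) | (1 << i)
    accept = 1 << (m - 1)          # mask for the prefix bit D.(m-1)
    # Search
    j = 0
    while j + m <= n:
        i, shift = m, m
        D = (1 << m) - 1           # D = 1^m
        while D != 0:
            # now T[j+i..j+m) is a pattern factor
            i -= 1
            D &= B.get(T[j + i], 0)
            if D & accept != 0:
                # now T[j+i..j+m) is a pattern prefix
                if i == 0:
                    return j
                shift = i
            D <<= 1
        j += shift
    return n

With the strings of Example 2.16, bndm("apassi", "assi") returns 2.
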
Example 2.16: P = assi, T = apassi.

The bitvectors are B[a] = 1000, B[i] = 0001, B[p] = 0000 and B[s] = 0110
(bit i corresponds to P[m − 1 − i]; bit m − 1 is written leftmost).

D when scanning the first window T[0..4) = apas backwards
(D is shown after the & with B[c]):
  initially   D = 1111
  read s:     D = 0110
  read a:     D = 1000   prefix bit set, i = 2  ⇒ shift = 2
  read p:     D = 0000   ⇒ stop and shift by 2

D when scanning the second window T[2..6) = assi backwards:
  initially   D = 1111
  read i:     D = 0001
  read s:     D = 0010
  read s:     D = 0100
  read a:     D = 1000   prefix bit set, i = 0  ⇒ occurrence at position 2
On an integer alphabet, when m ≤ w:
• Preprocessing time is O(σ + m).
• In the worst case, the search time is O(mn).
  For example, P = a^(m−1)b and T = a^n.
• In the best case, the search time is O(n/m).
  For example, P = b^m and T = a^n.
• In the average case, the search time is O(n(log_σ m)/m).
  This is optimal! It has been proven that any algorithm needs to inspect
  Ω(n(log_σ m)/m) text characters on average.

When m > w, there are several options:
• Use multi-word bitvectors.
• Search for a pattern prefix of length w and check the rest when the
  prefix is found (see the sketch after this list).
• Use BDM or BOM.
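
For the second option, here is a minimal sketch of the combination, assuming the bndm function from the previous sketch; the name bndm_long_pattern and the parameter w are illustrative, and in Python the word-length restriction is artificial because integers are unbounded.

def bndm_long_pattern(T, P, w=64):
    """Search for a long pattern: run BNDM on the length-w prefix of P
    and verify the remaining characters at every candidate position."""
    n, m = len(T), len(P)
    if m <= w:
        return bndm(T, P)
    prefix = P[:w]
    j = 0
    while j + m <= n:
        k = bndm(T[j:], prefix)      # next occurrence of the prefix at or after j
        if k == n - j:               # the prefix does not occur again
            return n
        j += k
        if j + m <= n and T[j:j + m] == P:
            return j                 # the rest of the pattern matches too
        j += 1                       # otherwise continue after this candidate
    return n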

• The search time of BDM and BOM is O(n(log_σ m)/m), which is
  optimal on average. (BNDM is optimal only when m ≤ w.)

• MP and KMP are optimal in the worst case.

• There are also algorithms that are optimal in both cases. They are
based on similar techniques, but we will not describe them here.

