CS 240 Tutorial 11 Notes

Pattern Matching/String Search: Find the location(s) of a short string (the pattern) in a longer string
(the text).
Example: Find the pattern P = needle in the text T = haystackneedlehaystack.
Answer: The pattern occurs at position 8.
Examples: For simplicity, return only the first position. (If necessary, repeat on remaining portion.)
search(a, abc) = 0
search(c, abcabc) = 2
search(d, abc) = FAIL
An elegant, real-world problem, solved by, e.g., word processors, strstr in C, strpos in PHP, etc.
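For comparison, Python's built-in str.find behaves like strstr and strpos. It returns the first index where the pattern starts, or -1 where these notes write FAIL. The wrapper name search, which takes the pattern first as in the examples above, is an illustrative assumption and not part of any course library.

```python
def search(pattern: str, text: str) -> int:
    """Hypothetical wrapper: first index of pattern in text, or -1 for FAIL."""
    # str.find returns the lowest index where pattern starts, or -1 if absent.
    return text.find(pattern)


print(search("a", "abc"))                           # 0
print(search("c", "abcabc"))                        # 2
print(search("d", "abc"))                           # -1 (FAIL)
print(search("needle", "haystackneedlehaystack"))   # 8
```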
Brute force: Check every possible location in T to see if P is there.
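Example: Search for P = abc in T = abdacabcd.
Try each possible location 0, 1, ... in T, and check whether abc starts there. Once a mismatch occurs, move on to the next location.

T:      a b d a c a b c d
pos 0:  a b c
pos 1:    a
pos 2:      a
pos 3:        a b
pos 4:          a
pos 5:            a b c

Each row lists the pattern characters compared at that position. Every character but the last matched the text; the last one is the mismatch, except at position 5, where all three match.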
Loop over two indices: the position (T-index) at which P is aligned, and the P-index, which advances until a mismatch occurs.
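A minimal Python sketch of this two-index loop, assuming the same argument order as search(P, T) above and -1 for FAIL. The function name is illustrative: pos plays the role of the position (T-index) and i the P-index.

```python
def brute_force_search(pattern: str, text: str) -> int:
    """Return the first position where pattern occurs in text, or -1 (FAIL)."""
    n, m = len(text), len(pattern)
    for pos in range(n - m + 1):          # every candidate position (T-index)
        i = 0                             # P-index
        while i < m and text[pos + i] == pattern[i]:
            i += 1                        # characters agree: keep going
        if i == m:                        # all m characters matched
            return pos
    return -1                             # no position worked: FAIL


print(brute_force_search("abc", "abdacabcd"))  # 5
```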
Analysis: Say $|T| = n$ and $|P| = m$.
There are $n - m + 1$ positions to check, and each position can take up to $m$ character comparisons.
Total number of comparisons: $O((n - m + 1)m) = O(nm) = O(n^2)$.
Behaviour this bad really can occur: e.g., with $T = a^n$ and $P = a^{m-1}b$, every possible comparison is made.
When $m = n/2$ this is $\Theta(n^2)$ comparisons.
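To see the $(n - m + 1)m$ bound actually being reached, the sketch below counts character comparisons for $T = a^n$ and $P = a^{m-1}b$ with $n = 20$ and $m = 10$. The counter is a hypothetical helper written for this illustration. Because P never occurs in T, scanning every position is exactly what the first-match search does.

```python
def brute_force_comparisons(pattern: str, text: str) -> int:
    """Hypothetical helper: count character comparisons over all n - m + 1 positions."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for pos in range(n - m + 1):
        for i in range(m):
            comparisons += 1                      # one character comparison
            if text[pos + i] != pattern[i]:
                break                             # mismatch: try the next position
    return comparisons


n, m = 20, 10                       # m = n/2
text = "a" * n                      # T = a^n
pattern = "a" * (m - 1) + "b"       # P = a^(m-1) b, never occurs in T
print(brute_force_comparisons(pattern, text))  # 110
print((n - m + 1) * m)                          # 110
```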
Knuth-Morris-Pratt: More efficient string search. Don't check positions where it's impossible to find P.
When a mismatch occurs, we can often rule out possible locations for P by looking at the last characters seen.
Example: P = abcd, T = abcabcd

T:      a b c a b c d
pos 0:  a b c d
pos 3:        a b c d

At position 0, abc matches and d mismatches against a; at position 3, P matches.
We can rule out positions 1 and 2 because P[0] = a does not appear in P[1..2], so it also doesn't appear in the text T[1..2], which was just matched to P[1..2].
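Example: P = abcabc..., T = abcabd...
On the first mismatch (P-index 5, where c meets d), consider the possible shifts of P before the mismatch:

T:        a b c a b d ...
shift 0:  a b c a b c
shift 1:    a
shift 2:      a
shift 3:        a b
shift 4:          a

Here abcab is P truncated at the mismatch. Shifts 1, 2 and 4 are ruled out because P[0] = a differs from the text character under it; shift 3 is the valid shift, since the prefix ab of P lines up with the ab just seen.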

Key observation: A shift is valid when a prefix of P equals a suffix of P truncated at the mismatch.
Before we even see T, we can calculate the possible shifts for each truncation of P! This preprocessing of P allows efficient string search.
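The key observation can be checked directly. This sketch (the helper name is hypothetical) lists every shift s that is consistent with a truncation P[0..k-1] already matched against the text: s is valid when the prefix of length k - s equals the suffix starting at s. Shift s = k, which moves P entirely past the matched characters, is always valid.

```python
def valid_shifts(truncated: str) -> list:
    """Hypothetical helper: shifts s (1 <= s <= k) consistent with the text seen.

    truncated is P[0..k-1], the part of P matched before the mismatch. Shifting
    P right by s lines P[0..k-s-1] up with truncated[s:], so the shift survives
    exactly when that prefix equals that suffix.
    """
    k = len(truncated)
    # truncated.startswith(suffix) tests truncated[:k - s] == truncated[s:]
    return [s for s in range(1, k + 1) if truncated.startswith(truncated[s:])]


print(valid_shifts("abcab"))  # [3, 5]: shift 3 keeps "ab", shift 5 clears the window
print(valid_shifts("abc"))    # [3]: positions 1 and 2 are ruled out (P = abcd example)
```

The smallest entry is the one KMP actually uses, since any larger shift could jump over an occurrence.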
Actually, we only need to store the minimal valid shift for each truncation of P (equivalently, the maximal valid prefix length), since shifting any further could skip over an occurrence. This is known as the KMP failure array F:
F[i] := length of the largest prefix of P that is a proper suffix of P[0..i]
      = length of the largest prefix of P that is a suffix of P[1..i]
Note the word proper: otherwise the largest prefix would always be P[0..i] itself, leading to a shift of 0.
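A direct translation of the definition into Python, useful for checking small examples by hand. It tries every candidate length from longest to shortest, so it takes roughly O(m^3) time rather than the O(m) of the real preprocessing; the function name is illustrative.

```python
def failure_array_naive(P: str) -> list:
    """F[i] = length of the largest prefix of P that is a proper suffix of P[0..i].

    Straight from the definition: for each i, try prefix lengths k = i, i-1, ..., 0
    (k = i + 1 is excluded because the suffix must be proper). k = 0 always works,
    since the empty prefix is a suffix of everything.
    """
    F = []
    for i in range(len(P)):
        window = P[: i + 1]  # P[0..i]
        F.append(next(k for k in range(i, -1, -1) if window.endswith(P[:k])))
    return F


print(failure_array_naive("ababac"))  # [0, 0, 1, 2, 3, 0]
```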
Example: Give the KMP failure array F for P = ababac.
Answer:

i         0   1    2     3      4       5
P[0..i]   a   ab   aba   abab   ababa   ababac
F[i]      0   0    1     2      3       0

Once F is computed (this can be done in O(m) time), KMP is really just brute force, except that on a mismatch at P-index $i$ you shift the T-index by $i - F[i-1]$ (or 1 if $i = 0$) and set the P-index to $F[i-1]$ (or 0 if $i = 0$).
A naive analysis says there are O(n) changes of T -index and for each there are O(m) changes of P -index,
the same as brute-force. But it actually performs better, as we shall see.
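A Python sketch of KMP that follows the formulation above: it keeps a position plus a P-index and, on a mismatch, shifts by i - F[i-1]. The O(m) failure-array routine is the standard incremental construction, which the notes mention but do not spell out. Function names are illustrative, and -1 stands for FAIL.

```python
def failure_array(P: str) -> list:
    """KMP failure array in O(m) time.

    F[i] = length of the longest prefix of P that is a proper suffix of P[0..i].
    """
    m = len(P)
    F = [0] * m
    k = 0                                  # length of the current matched prefix
    for i in range(1, m):
        while k > 0 and P[i] != P[k]:
            k = F[k - 1]                   # fall back to the next shorter candidate
        if P[i] == P[k]:
            k += 1
        F[i] = k
    return F


def kmp_search(P: str, T: str) -> int:
    """Return the first position of P in T, or -1 (FAIL)."""
    n, m = len(T), len(P)
    if m == 0:
        return 0
    F = failure_array(P)
    pos, i = 0, 0                          # alignment position (T-index), P-index
    while pos + m <= n:
        if T[pos + i] == P[i]:
            i += 1
            if i == m:
                return pos                 # full match
        elif i == 0:
            pos += 1                       # nothing matched: slide by one
        else:
            pos += i - F[i - 1]            # skip shifts ruled out by F
            i = F[i - 1]                   # this prefix is already known to match
    return -1


print(failure_array("ababac"))                  # [0, 0, 1, 2, 3, 0]
print(kmp_search("ababac", "abcaabaabababac"))  # 9
```

Because F[i-1] <= i - 1, every mismatch moves the position forward by at least one, so the loop always terminates.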
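Example: Run KMP with P = ababac and T = abcaabaabababac.

T:          a b c a a b a a b a b a b a c
pos 0:      a b a
pos 2:          a
pos 3:            a b
pos 4:              a b a b
pos 6:                  . b
pos 7:                    a b a b a c
pos 9:                        . . . b a c

A dot marks a pattern character that is not compared because F guarantees it matches. In every other row the last character shown is the mismatch, and the row for position 9 is a full match.

P-index at each mismatch (i) and the P-index KMP resumes from (inew):

Position   i         inew
0          2         F[1] = 0
2          0         0
3          1         F[0] = 0
4          3         F[2] = 1
6          1         F[0] = 0
7          5         F[4] = 3
9          (full match)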

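Notice the staircase pattern that KMP generates. In general, the number of comparisons made will be O(staircase width + staircase height): every successful comparison moves one step right, and the text character being compared (position + P-index) never moves backwards, so there are at most n of these; every failed comparison starts a new row at a strictly larger position, so there are at most n of those as well.
Since both the width and the height of the staircase are O(n), the search costs O(n), plus O(m) to compute F.

Footnote: If you wanted to continue the search, i would be 6 and inew would be F[5] = 0.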
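To check the staircase argument on the example, this sketch records each mismatch as (position, i, inew) and counts every character comparison. It repeats the O(m) failure-array routine so that it runs on its own; helper names are hypothetical. On P = ababac and T = abcaabaabababac it should reproduce the (i, inew) pairs from the KMP example and stay within 2n comparisons.

```python
def failure_array(P: str) -> list:
    """F[i] = length of the longest prefix of P that is a proper suffix of P[0..i]."""
    F = [0] * len(P)
    k = 0
    for i in range(1, len(P)):
        while k > 0 and P[i] != P[k]:
            k = F[k - 1]
        if P[i] == P[k]:
            k += 1
        F[i] = k
    return F


def kmp_trace(P: str, T: str) -> tuple:
    """Run KMP, returning (match position or -1, mismatch rows, comparison count).

    Each mismatch row is (position, i, inew), as in the example's table.
    """
    n, m = len(T), len(P)
    if m == 0:
        return 0, [], 0
    F = failure_array(P)
    pos, i, comparisons, rows = 0, 0, 0, []
    while pos + m <= n:
        comparisons += 1                   # one character comparison per iteration
        if T[pos + i] == P[i]:
            i += 1                         # step right along the staircase
            if i == m:
                return pos, rows, comparisons
        elif i == 0:
            rows.append((pos, 0, 0))
            pos += 1
        else:
            inew = F[i - 1]
            rows.append((pos, i, inew))
            pos += i - inew                # step down to a new row
            i = inew
    return -1, rows, comparisons


P, T = "ababac", "abcaabaabababac"
match, rows, comparisons = kmp_trace(P, T)
print(match)                       # 9
print(rows)                        # [(0, 2, 0), (2, 0, 0), (3, 1, 0), (4, 3, 1), (6, 1, 0), (7, 5, 3)]
print(comparisons)                 # 20
print(comparisons <= 2 * len(T))   # True
```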
