12 StringMatching

This document discusses string matching algorithms. It introduces the brute force algorithm and its O(MN) complexity. It then describes the Knuth-Morris-Pratt (KMP) algorithm which improves efficiency by preprocessing the pattern string to determine how to match more efficiently. The KMP algorithm runs in O(N+M) time where N and M are the lengths of the text and pattern strings, respectively. It provides examples of building the KMP table and tracing the algorithm.

Uploaded by

huy

Fast String Matching

String Matching

Reference: lecture 15-211, Fundamental Data Structures and Algorithms, CMU
In this lecture
• String Matching Problem
– Concept
– brute force algorithm
– complexity
• Knuth-Morris-Pratt (KMP) Algorithm
– Pre-processing
– complexity

Pattern Matching Algorithms
String Matching
• Text string T[0..N-1]
T = “abacaabaccabacabaabb”
• Pattern string P[0..M-1]
P = “abacab”
• Where is the first instance of P in T?
T[10..15] = P[0..5]
• Typically N >> M
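As a quick check of the example above, here is a minimal Python sketch (Python is an assumption; the slides do not name a language) that uses the built-in `str.find` to locate the first occurrence of P in T:

```python
T = "abacaabaccabacabaabb"
P = "abacab"

# str.find returns the lowest index where P occurs in T, or -1 if it is absent.
first = T.find(P)
print(first)                          # 10
print(T[first:first + len(P)] == P)   # True: T[10..15] equals P[0..5]
```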
Why String Matching?
• Applications in Computational Biology
– DNA sequence is a long word (or text) over a 4-letter alphabet
– GTTTGAGTGGTCAGTCTTTTCGTTTCGACGGAGCCCCCAATTAATAAACTCATAAGCAGACCTCAGTTCGCTTAGAGCAGCCGAAA…..
– Find a specific pattern W
• Finding patterns in documents formed using a large alphabet
– Word processing
– Web searching
– Desktop search (Google, MSN)
• Matching strings of bytes containing
– Graphical data
– Machine code
• grep in Unix/Linux
– grep searches for lines matching a pattern.
Naïve Algorithm (or Brute Force)
• Assume |T| = n and |P| = m
(Figure: pattern P placed under text T at successive shifts.)
• At each shift, compare P with T character by character until a match is found. If a match is found, return the index where it occurs; otherwise return -1.
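The naïve procedure described above can be sketched in Python as follows. The function name `brute_force` is hypothetical, and returning 0 for an empty pattern is an assumption the slides do not address.

```python
def brute_force(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):          # try every shift of P along T
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                      # characters agree, keep going
        if j == m:                      # all m characters matched at shift i
            return i
    return -1                           # no shift produced a full match


print(brute_force("abacaabaccabacabaabb", "abacab"))  # 10
```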
String Matching
• The brute force algorithm
• 22+6=28 comparisons.

abacaabaccabacabaabb
abacab
 abacab
  abacab
   abacab
    abacab
     abacab
      abacab
       abacab
        abacab
         abacab
          abacab
A bad case
• 60+5 = 65 comparisons are needed
• How many of them could be avoided?

00000000000000001
0000-
 0000-
  0000-
   0000-
    0000-
     0000-
      0000-
       0000-
        0000-
         0000-
          0000-
           0000-
            00001
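The comparison counts in the two traces above can be reproduced with a small counting variant of the brute-force search. This is a sketch, not the lecture's code: `count_comparisons` is a hypothetical helper that counts one comparison per character pair examined, including the comparison that fails.

```python
def count_comparisons(text, pattern):
    """Brute-force search returning (index or -1, character comparisons made)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1            # one comparison of T[i+j] with P[j]
            if text[i + j] != pattern[j]:
                break                   # mismatch ends this shift
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


print(count_comparisons("abacaabaccabacabaabb", "abacab"))  # (10, 28)
print(count_comparisons("0" * 16 + "1", "00001"))           # (12, 65)
```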
String Matching
• Brute force worst case
– O(MN)
– Expensive for long patterns in repetitive text
• How to improve on this?
• Intuition:
– Remember what is learned from previous matches
Knuth-Morris-Pratt (KMP) Algorithm
KMP – The Big Idea
• Retain information from prior attempts.
• Compute in advance how far to jump in P when a match fails.
– Suppose the match fails at P[j] ≠ T[i+j].
– Then we know P[0 .. j-1] = T[i .. i+j-1].
• We must next try the shift i+1, starting with the comparison of P[0] and T[i+1].
– But we know T[i+1] = P[1].
– So compare P[1] with P[0] instead.
• If they are equal, increment j by 1. No need to look at T.
– What if P[1] = P[0] and P[2] = P[1]?
• Then increment j by 2. Again, no need to look at T.
• In general, we can determine how far to jump without any knowledge of T!
Implementing KMP
• Never decrement i, ever.
– Comparing T[i] with P[j].
• Compute a table f of how far to jump j forward when a match fails.
– The next match will compare T[i] with P[ f[j-1] ]
• Do this by matching P against itself in all positions.
Building the Table for f
• P = 1010011
• Find self-overlaps
Prefix     Overlap   j   f
1          .         1   0
10         .         2   0
101        1         3   1
1010       10        4   2
10100      .         5   0
101001     1         6   1
1010011    1         7   1
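The f column can be checked directly from its definition: for each prefix of length j, f is the length of the longest proper prefix that is also a suffix (the "overlap"). The sketch below is a deliberately simple, quadratic-or-worse check written for clarity, not the linear-time construction; `overlap_table` is a hypothetical name.

```python
def overlap_table(pattern):
    """For each prefix length j = 1..m, length of the longest self-overlap."""
    table = []
    for j in range(1, len(pattern) + 1):
        prefix = pattern[:j]
        k = j - 1                       # longest candidate: a proper overlap
        while k > 0 and prefix[:k] != prefix[-k:]:
            k -= 1                      # shrink until prefix and suffix agree
        table.append(k)
    return table


print(overlap_table("1010011"))  # [0, 0, 1, 2, 0, 1, 1]
```

Entry number j (counting from 1) of the returned list corresponds to row j of the table.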
What f means
(Table for P = 1010011, as on the previous slide.)
• f non-zero implies there is a self-match.
E.g., f=2 means P[0..1] = P[j-2..j-1]
• Hence must start new comparison at j-2, since we know T[i-2..i-1] = P[0..1]
• In general:
– Set j=f[j-1]
– Do not change i.
• The next comparison is T[i] with P[f[j-1]]
• If f is zero, there is no self-match.
– Set j=0
– Do not change i.
• The next comparison is T[i] with P[0]
Favorable conditions
• P = 1234567
• Find self-overlaps
Prefix     Overlap   j   f
1          .         1   0
12         .         2   0
123        .         3   0
1234       .         4   0
12345      .         5   0
123456     .         6   0
1234567    .         7   0
Mixed conditions
• P = 1231234
• Find self-overlaps
Prefix     Overlap   j   f
1          .         1   0
12         .         2   0
123        .         3   0
1231       1         4   1
12312      12        5   2
123123     123       6   3
1231234    .         7   0
Poor conditions
• P = 1111110
• Find self-overlaps
Prefix     Overlap   j   f
1          .         1   0
11         1         2   1
111        11        3   2
1111       111       4   3
11111      1111      5   4
111111     11111     6   5
1111110    .         7   0
KMP pre-process
Algorithm
m = |P|;
Define a table F of size m
F[0] = 0;
i = 1; j = 0;
while (i < m) {
    // compare P[i] and P[j]
    if (P[j] == P[i]) {
        F[i] = j + 1;
        i++; j++;
    }
    else if (j > 0) j = F[j-1];    // use previous values of F
    else { F[i] = 0; i++; }
}
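A runnable version of the pre-processing pseudocode above, as a minimal Python sketch. The function name `build_table` and the handling of an empty pattern (an empty table) are assumptions that the slides do not specify.

```python
def build_table(pattern):
    """Return F, where F[i] is the longest self-overlap of pattern[0..i]."""
    m = len(pattern)
    F = [0] * m                 # F[0] = 0; other entries are filled in below
    i, j = 1, 0
    while i < m:
        if pattern[i] == pattern[j]:
            F[i] = j + 1        # overlap of length j extends by one character
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]        # fall back to the next shorter overlap
        else:
            F[i] = 0            # no overlap ends at position i
            i += 1
    return F


print(build_table("1010011"))  # [0, 0, 1, 2, 0, 1, 1]
```

Note that F is indexed from 0, so F[j-1] holds the f value listed in row j of the tables.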
KMP Algorithm
input: Text T and Pattern P
|T| = n
|P| = m
Compute Table F for Pattern P
i = j = 0
while (i < n) {
    if (P[j] == T[i]) {
        if (j == m-1) return i-m+1;    // full match ends at T[i]
        i++; j++;
    }
    else if (j > 0) j = F[j-1];        // use F to determine next value for j
    else i++;
}
return -1;                             // P does not occur in T

output: first occurrence of P in T
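The search loop above translates to Python almost line for line. This is a sketch under stated assumptions: the names `build_table` and `kmp_search` are hypothetical, an empty pattern is treated as matching at index 0, and -1 is returned when P does not occur in T.

```python
def build_table(pattern):
    """Failure table F: F[i] is the longest self-overlap of pattern[0..i]."""
    F = [0] * len(pattern)
    i, j = 1, 0
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            F[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]
        else:
            i += 1
    return F


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0                        # assumption: empty pattern matches at 0
    F = build_table(pattern)
    i = j = 0
    while i < n:                        # i is never decremented
        if pattern[j] == text[i]:
            if j == m - 1:
                return i - m + 1        # full match ends at text[i]
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]                # reuse the overlap, keep i fixed
        else:
            i += 1
    return -1


print(kmp_search("abacaabaccabacabaabb", "abacab"))  # 10
```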

Brute Force vs. KMP
• A worse case example: T is 27 zeros followed by 1, P is 13 zeros followed by 1. A "-" marks a failed comparison.

Brute force: 196 + 14 = 210 comparisons

0000000000000000000000000001
0000000000000-
 0000000000000-
  0000000000000-
   0000000000000-
    0000000000000-
     0000000000000-
      0000000000000-
       0000000000000-
        0000000000000-
         0000000000000-
          0000000000000-
           0000000000000-
            0000000000000-
             0000000000000-
              00000000000001

KMP: 28 + 14 = 42 comparisons

0000000000000000000000000001
0000000000000-
             0-
              0-
               0-
                0-
                 0-
                  0-
                   0-
                    0-
                     0-
                      0-
                       0-
                        0-
                         0-
                          01
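The two totals can be reproduced by counting character comparisons in both algorithms. The sketch below is illustrative only: the function names are hypothetical, the pattern is assumed to be non-empty, and the comparisons made while building the table F are not counted, which matches the totals quoted above.

```python
def build_table(pattern):
    """Failure table F: F[i] is the longest self-overlap of pattern[0..i]."""
    F = [0] * len(pattern)
    i, j = 1, 0
    while i < len(pattern):
        if pattern[i] == pattern[j]:
            F[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]
        else:
            i += 1
    return F


def brute_force_count(text, pattern):
    """Return (match index or -1, text/pattern comparisons made)."""
    comparisons = 0
    for i in range(len(text) - len(pattern) + 1):
        j = 0
        while j < len(pattern):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == len(pattern):
            return i, comparisons
    return -1, comparisons


def kmp_count(text, pattern):
    """Return (match index or -1, text/pattern comparisons made)."""
    F = build_table(pattern)
    comparisons = 0
    i = j = 0
    while i < len(text):
        comparisons += 1                # one comparison of P[j] with T[i]
        if pattern[j] == text[i]:
            if j == len(pattern) - 1:
                return i - len(pattern) + 1, comparisons
            i += 1
            j += 1
        elif j > 0:
            j = F[j - 1]
        else:
            i += 1
    return -1, comparisons


text = "0" * 27 + "1"
pattern = "0" * 13 + "1"
print(brute_force_count(text, pattern))  # (14, 210)
print(kmp_count(text, pattern))          # (14, 42)
```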
KMP Performance
• Pre-processing needs O(M) operations.
• At each iteration, one of three cases:
– T[i] = P[j]
• i increases
– T[i] ≠ P[j] and j>0
• i-j increases
– T[i] ≠ P[j] and j=0
• i increases and i-j increases
• Neither i nor i-j ever decreases, and both are at most N.
• Hence, maximum of 2N iterations.
• Thus worst case performance is O(N+M).
Exercises
• E1
– Construct the KMP table for P = 10010001
– Trace the KMP algorithm with T = 000100100100010111
• E2
– Construct the KMP table for pattern P = ababaca
– Trace the KMP algorithm with T = bacbababaabcbab
