String Searching Algorithm

This document discusses several string searching algorithms: Naive, Knuth-Morris-Pratt, Shift-OR, Boyer-Moore, Boyer-Moore-Horspool, and Karp-Rabin. It explains the basic ideas and provides examples for each algorithm.

Uploaded by

Mohan Krishna Mannava

String Searching Algorithm

 Advisor: Professor 黃三益
 Team members: 9142639 蔡嘉文
9142642 高振元
9142635 丁康迪
String Searching Algorithm
 Outline:
 The Naive Algorithm
 The Knuth-Morris-Pratt Algorithm
 The SHIFT-OR Algorithm
 The Boyer-Moore Algorithm
 The Boyer-Moore-Horspool Algorithm
 The Karp-Rabin Algorithm
 Conclusion
String Searching Algorithm
 Preliminaries:
 n: the length of the text
 m: the length of the pattern (string)
 c: the size of the alphabet
 C_n: the expected number of comparisons performed by an algorithm while searching for the pattern in a text of length n
The Naive Algorithm
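naivesearch( text, n, pat, m ) /* Search pat[1..m] in text[1..n] */
char text[], pat[];
int n, m;
{
int i, j, k, lim;
lim = n-m+1; /* last possible starting position */
for (i=1; i<=lim; i++) /* try each starting position i */
{
k = i;
for (j=1; j<=m && text[k]==pat[j]; j++) k++;
if (j>m) Report_match_at_position(i); /* all m characters matched */
}
}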
The Naive Algorithm(cont.)
 The idea is to try to match the pattern against every substring of length m in the text.
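The idea above can be sketched in modern C as follows. This is a minimal illustration, not the slide's code: it uses 0-based, NUL-terminated strings, and the function name `naive_search` and its return convention (index of the first match, or -1) are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns the 0-based index of the first occurrence of pat in text,
   or -1 if pat does not occur. An empty pattern matches at index 0. */
long naive_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;
    for (size_t i = 0; i + m <= n; i++) {      /* every starting position */
        size_t j = 0;
        while (j < m && text[i + j] == pat[j]) /* compare left to right */
            j++;
        if (j == m) return (long)i;            /* all m characters matched */
    }
    return -1;
}
```

In the worst case this performs on the order of n*m character comparisons, which is what the later algorithms try to avoid.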
The Knuth-Morris-Pratt Algorithm
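kmpsearch( text, n, pat, m ) /* Search pat[1..m] in text[1..n] */
char text[], pat[];
int n, m;
{
int j, k;
int next[Max_Pattern_Size];
initnext(pat, m+1, next); /* preprocess pattern: build the next table */
j = k = 1;
do { /* search */
if (j==0 || text[k]==pat[j]) { k++; j++; }
else j = next[j];
if (j>m) {
Report_match_at_position(k-m);
j = next[m+1]; /* resume after a match; assumes initnext fills next[1..m+1] */
}
} while (k<=n);
}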
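The Knuth-Morris-Pratt Algorithm(cont.)
 To avoid re-examining text characters, the pattern is preprocessed to obtain a table that gives the next position in the pattern to be processed after a mismatch.
 Ex:
position: 1 2 3 4 5 6 7 8 9 10 11
pattern:  a b r a c a d a b r  a
next[j]:  0 1 1 0 2 0 2 0 1 1  0
text:     a b r a c a f ……………
 Here the mismatch occurs at position 7 (d against f). Since next[7]=2, the search resumes by comparing pattern position 2 with the same text character.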
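The preprocessing step can be sketched as below. Note the assumptions: this uses the common 0-based failure-function formulation of KMP rather than the 1-based `next` table shown in the slide, so the table values differ from the example above. The names `build_failure`, `kmp_search` and `MAX_PATTERN` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PATTERN 256  /* assumed upper bound on the pattern length */

/* fail[j] = length of the longest proper prefix of pat[0..j]
   that is also a suffix of pat[0..j]. */
static void build_failure(const char *pat, size_t m, size_t *fail)
{
    size_t len = 0;
    fail[0] = 0;
    for (size_t j = 1; j < m; j++) {
        while (len > 0 && pat[j] != pat[len]) len = fail[len - 1];
        if (pat[j] == pat[len]) len++;
        fail[j] = len;
    }
}

/* Returns the 0-based index of the first match, or -1 if there is none
   (or if the pattern is longer than MAX_PATTERN). */
long kmp_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    size_t fail[MAX_PATTERN];
    if (m == 0) return 0;
    if (m > n || m > MAX_PATTERN) return -1;
    build_failure(pat, m, fail);
    size_t j = 0;                       /* number of pattern chars matched */
    for (size_t k = 0; k < n; k++) {    /* k never moves backwards */
        while (j > 0 && text[k] != pat[j]) j = fail[j - 1];
        if (text[k] == pat[j]) j++;
        if (j == m) return (long)(k + 1 - m);
    }
    return -1;
}
```

The text index `k` only ever advances, which is why the search phase takes time linear in n.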
The Shift-Or Algorithm
 The main idea is to represent the state of the search as a number.
 $State = S_1 \cdot 2^0 + S_2 \cdot 2^1 + \dots + S_m \cdot 2^{m-1}$
 $T_x = \delta(pat_1 = x) \cdot 2^0 + \delta(pat_2 = x) \cdot 2^1 + \dots + \delta(pat_m = x) \cdot 2^{m-1}$
 $T_x$ is defined for every symbol x of the alphabet, where $\delta(C)$ is 0 if the condition C is true, and 1 otherwise.
The Shift-Or Algorithm(cont.)
 Ex: let {a, b, c, d} be the alphabet and ababc the pattern.
T[a]=11010, T[b]=10101, T[c]=01111, T[d]=11111
The initial state is 11111.
The Shift-Or Algorithm(cont.)
 Pattern: ababc
 Text:  a     b     d     a     b     a     b     c
 T[x]:  11010 10101 11111 11010 10101 11010 10101 01111
 State: 11110 11101 11111 11110 11101 11010 10101 01111
 For example, the state 10101 means that in the current position we have two partial matches to the left, of lengths two and four, respectively.
 The match at the end of the text is indicated by the value 0 in the leftmost bit of the state of the search.
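The trace above is consistent with the update rule state = (state << 1) | T[x], applied once per text character. A minimal sketch under that assumption follows; the function name `shift_or_search` is hypothetical, strings are 0-based, and the pattern is assumed to fit in one `unsigned long`.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns the 0-based index of the first match, or -1 if there is none
   (or if the pattern has more symbols than an unsigned long has bits). */
long shift_or_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    unsigned long T[UCHAR_MAX + 1];
    if (m == 0) return 0;
    if (m > sizeof(unsigned long) * CHAR_BIT) return -1;

    /* T[x] has bit i cleared exactly when pat[i] == x. */
    for (size_t c = 0; c <= UCHAR_MAX; c++) T[c] = ~0UL;
    for (size_t i = 0; i < m; i++)
        T[(unsigned char)pat[i]] &= ~(1UL << i);

    unsigned long state = ~0UL;               /* all ones: no partial match */
    unsigned long match_bit = 1UL << (m - 1); /* the leftmost state bit */
    for (size_t k = 0; k < n; k++) {
        state = (state << 1) | T[(unsigned char)text[k]];
        if ((state & match_bit) == 0)         /* 0 there means a full match */
            return (long)(k + 1 - m);
    }
    return -1;
}
```

For the slide's example, searching for ababc in abdababc reports index 3 (0-based), which corresponds to the final state 01111 in the trace.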
The Boyer-Moore Algorithm
 Search from right to left in the pattern
 Shift method:
 match heuristic: compute the dd table for the pattern
 occurrence heuristic: compute the d table for the pattern
The Boyer-Moore Algorithm (cont.)
[Figure: match shift]
The Boyer-Moore Algorithm (cont.)
[Figure: occurrence shift]
The Boyer-Moore Algorithm
(cont.)
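k = m;
while (k <= n) {
j = m;
while (j > 0 && text[k] == pat[j]) /* compare right to left */
{ j--; k--; }
if (j == 0)
{ report_match_at_position(k+1); k += m+1; } /* shift by one position after a match */
else k += max(d[text[k]], dd[j]); /* take the larger of the two shifts */
}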
The Boyer-Moore Algorithm
(cont.)
 Example

T : xyxabraxyzabracadabra
P : abracadabra

mismatch, compute a shift
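As a rough illustration of the right-to-left scan and the occurrence heuristic, here is a simplified sketch that deliberately omits the match heuristic (the dd table), so it is not the full Boyer-Moore algorithm. The function name `bm_occurrence_search` and the 0-based `last` table are assumptions of this example, not the slide's d table.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Boyer-Moore restricted to the occurrence heuristic.
   Returns the 0-based index of the first match, or -1 if there is none. */
long bm_occurrence_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    long last[UCHAR_MAX + 1];
    if (m == 0) return 0;
    if (m > n) return -1;

    /* last[x] = rightmost index of x in the pattern, or -1 if absent. */
    for (size_t c = 0; c <= UCHAR_MAX; c++) last[c] = -1;
    for (size_t i = 0; i < m; i++) last[(unsigned char)pat[i]] = (long)i;

    size_t s = 0;                               /* current alignment */
    while (s + m <= n) {
        long j = (long)m - 1;
        while (j >= 0 && text[s + (size_t)j] == pat[j]) j--; /* right to left */
        if (j < 0) return (long)s;
        /* Align the mismatched text character with its rightmost
           occurrence in the pattern; always move at least one position. */
        long shift = j - last[(unsigned char)text[s + (size_t)j]];
        s += (size_t)(shift > 0 ? shift : 1);
    }
    return -1;
}
```

On the slide's example text the first mismatch is against z, which does not occur in the pattern, so this sketch shifts the alignment past it in a single step.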

The Boyer-Moore-Horspool
Algorithm
 A simplification of the Boyer-Moore algorithm

 Compares the pattern from left to right

The Boyer-Moore-Horspool
Algorithm(cont.)
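for (k=0; k<MAXCHAR; k++) d[k] = m+1; /* default shift; MAXCHAR (alphabet size) is an assumed constant */
for (k=1; k<=m; k++) d[pat[k]] = m+1-k;
pat[m+1] = CHARACTER_NOT_IN_THE_TEXT; /* sentinel that stops the inner loop */
lim = n-m+1;
for (k=1; k<=lim; k+=d[text[k+m]]) /* search */
{
i = k;
for (j=1; text[i]==pat[j]; j++) i++;
if (j==m+1) report_match_at_position(k);
}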
The Boyer-Moore-Horspool
Algorithm(cont.)
 Example:

T:xyzabraxyzabracadabra
P:abracadabra
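For comparison, here is a sketch of the commonly taught Horspool formulation, which shifts on the last character of the current window. This differs from the slide's code, which indexes the table with the character just past the window (`text[k+m]`). The function name `horspool_search` and the 0-based indexing are assumptions of this example.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Returns the 0-based index of the first match, or -1 if there is none. */
long horspool_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    size_t shift[UCHAR_MAX + 1];
    if (m == 0) return 0;
    if (m > n) return -1;

    /* Characters absent from pat[0..m-2] allow a full shift of m. */
    for (size_t c = 0; c <= UCHAR_MAX; c++) shift[c] = m;
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;

    size_t s = 0;                               /* current alignment */
    while (s + m <= n) {
        if (memcmp(text + s, pat, m) == 0) return (long)s;
        s += shift[(unsigned char)text[s + m - 1]]; /* last window char */
    }
    return -1;
}
```

Only one table, indexed by character, is needed here, which is the sense in which the method simplifies Boyer-Moore.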
The Karp-Rabin Algorithm
 Use hashing
 Compute the signature function of each possible m-character substring
 Check whether it is equal to the signature function of the pattern
 Signature function: h(k) = k mod q, where q is a large prime
The Karp-Rabin
Algorithm(cont.)
rksearch( text, n, pat, m ) /* Search pat[1..m] in text[1..n] */
char text[], pat[]; /* (0 < m <= n) */
int n, m;
{
int h1, h2, dM, i, j; /* D (shift width) and Q (prime modulus) are constants defined elsewhere */
dM = 1;
for( i=1; i<m; i++ ) dM = (dM << D) % Q; /* dM = 2^(D*(m-1)) mod Q */
h1 = h2 = 0; /* Compute the signature of the pattern */
for( i=1; i<=m; i++ ) /* and of the beginning of the text */
{
h1 = ((h1 << D) + pat[i] ) % Q;
h2 = ((h2 << D) + text[i] ) % Q;
}
The Karp-Rabin
Algorithm(cont.)
for( i = 1; i <= n-m+1; i++ ) /* Search */
{
if( h1 == h2 ) /* Potential match */
{
for(j=1; j<=m && text[i-1+j] == pat[j]; j++ ); /* check */
if( j > m ) /* true match */
Report_match_at_position( i );
}
h2 = (h2 + (Q << D) - text[i]*dM ) % Q; /* update the signature */
h2 = ((h2 << D) + text[i+m] ) % Q; /* of the text */
}
}
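The same rolling-signature scheme can be sketched with 0-based strings and 64-bit arithmetic. The constants `KR_BASE` and `KR_MOD` and the function name `kr_search` are assumptions chosen for this example; the slide's D and Q are not specified in the source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KR_BASE 256u         /* assumed radix: one byte per symbol */
#define KR_MOD  1000000007u  /* assumed prime modulus */

/* Returns the 0-based index of the first match, or -1 if there is none. */
long kr_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;

    uint64_t high = 1, hp = 0, ht = 0;
    for (size_t i = 1; i < m; i++)             /* high = KR_BASE^(m-1) mod KR_MOD */
        high = (high * KR_BASE) % KR_MOD;
    for (size_t i = 0; i < m; i++) {           /* signatures of pat and text[0..m-1] */
        hp = (hp * KR_BASE + (unsigned char)pat[i]) % KR_MOD;
        ht = (ht * KR_BASE + (unsigned char)text[i]) % KR_MOD;
    }
    for (size_t i = 0; ; i++) {
        /* Equal signatures are only a potential match: verify it. */
        if (hp == ht && memcmp(text + i, pat, m) == 0) return (long)i;
        if (i + m >= n) break;                 /* no further window */
        /* Drop text[i], then append text[i+m]. */
        ht = (ht + KR_MOD - ((unsigned char)text[i] * high) % KR_MOD) % KR_MOD;
        ht = (ht * KR_BASE + (unsigned char)text[i + m]) % KR_MOD;
    }
    return -1;
}
```

Adding `KR_MOD` before the subtraction keeps the intermediate value non-negative, which plays the same role as the `(Q << D)` term in the slide's update step.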
Conclusions
 Tests: random patterns, random text, and English text
 Best: the Boyer-Moore-Horspool algorithm
 Drawback: preprocessing time and space (both depend on alphabet/pattern size)
 Small pattern: the Shift-Or algorithm
 Large alphabet: the Knuth-Morris-Pratt algorithm
 Others: the Boyer-Moore algorithm
 "Don't care" symbols: the Shift-Or algorithm
