
String Matching Algorithms

String matching or pattern matching is the task of finding a substring (also known as a "pattern")
within another string (the "text"). The problem is central to various applications in computer
science, including text search engines, bioinformatics, and data processing.

1.1 Introduction to String Matching


String matching involves identifying whether a given pattern exists within a larger string and, if it
does, determining the position of the match. The problem of searching for a pattern in a text can be
generalized as: given a string T of length n and a pattern P of length m, find all occurrences of P in
T.
Applications:
• Search Engines: Search for keywords in documents.
• Bioinformatics: Match DNA sequences or protein patterns.
• Text Processing: Tasks such as text indexing, spell checkers, and plagiarism detection.
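Python's built-in string methods already solve the basic form of this problem; the sketch below (the helper name find_all is illustrative, not from any library) collects every occurrence of a pattern P in a text T, including overlapping ones:

```python
def find_all(text, pattern):
    """Return the starting indices of every occurrence of pattern in text."""
    positions = []
    start = text.find(pattern)  # -1 when no (further) occurrence exists
    while start != -1:
        positions.append(start)
        # Resume one character past the last hit, so overlaps are found too
        start = text.find(pattern, start + 1)
    return positions

print(find_all("abracadabra", "abra"))  # [0, 7]
```

The algorithms below do the same job by hand, trading this simplicity for control over how many character comparisons are performed.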

1.2 Naive String Matching Algorithm


The naive approach is the most straightforward but least efficient for string matching. It involves
checking all possible positions in the text where the pattern can fit, and at each position, checking if
the substring of the text matches the pattern.
Algorithm Steps:
1. Start from the first character of the text.
2. Compare the substring of the text starting at that position with the pattern.
3. If a match is found, record the position.
4. Move one character forward and repeat until the end of the text.
Time Complexity: The worst-case time complexity is O(n×m), where n is the length of the text and
m is the length of the pattern.
Python Example:

def naive_search(text, pattern):
    n = len(text)
    m = len(pattern)
    # Try every alignment of the pattern against the text
    for i in range(n - m + 1):
        if text[i:i+m] == pattern:
            print(f"Pattern found at index {i}")

Pros:
• Simple to implement.
• Works well for small texts or patterns.
Cons:
• Inefficient for larger texts, as it checks all possible positions.
1.3 Knuth-Morris-Pratt (KMP) Algorithm
The KMP algorithm improves upon the naive approach by using information from previous
comparisons to avoid unnecessary re-checking of characters. The core idea is to preprocess the
pattern to create a partial match table (also known as the "prefix table"), which helps to skip over
portions of the text that have already been checked.
Key Idea: The idea is to avoid rechecking characters that are known to match. When a mismatch
occurs, the algorithm uses the prefix table to shift the pattern appropriately.
Algorithm Steps:
1. Preprocess the pattern: Create a table that stores the length of the longest proper prefix of
the pattern that is also a suffix.
2. Search: Compare the pattern with the text. If a mismatch occurs and the current matched
length is greater than 0, shift the pattern according to the prefix table, otherwise move the
pattern one step forward.
Time Complexity: The time complexity of the KMP algorithm is O(n+m), which is more efficient
than the naive approach.
Python Example:

def KMP_search(text, pattern):
    def compute_prefix_table(pattern):
        m = len(pattern)
        lps = [0] * m
        length = 0
        i = 1
        while i < m:
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            else:
                if length != 0:
                    length = lps[length - 1]
                else:
                    lps[i] = 0
                    i += 1
        return lps

    n = len(text)
    m = len(pattern)
    lps = compute_prefix_table(pattern)
    i = 0  # index into text
    j = 0  # index into pattern
    while i < n:
        if pattern[j] == text[i]:
            i += 1
            j += 1
        if j == m:
            print(f"Pattern found at index {i - j}")
            j = lps[j - 1]
        elif i < n and pattern[j] != text[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1

Pros:
• Much faster than the naive approach.
• Avoids redundant comparisons.
Cons:
• Requires preprocessing, which takes O(m) time.
• Slightly more complex to implement.
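To make the prefix table concrete, the standalone sketch below recomputes it for a small pattern (the logic mirrors the preprocessing step of the KMP example above):

```python
def compute_prefix_table(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]  # fall back; do not advance i
        else:
            i += 1  # lps[i] stays 0
    return lps

print(compute_prefix_table("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

For "ababaca", the 3 at index 4 records that "aba" is both a prefix and a suffix of "ababa", so after a mismatch there the search can resume with three characters already matched.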

1.4 Boyer-Moore Algorithm


The Boyer-Moore algorithm is one of the most efficient string matching algorithms, especially for
large texts. It improves searching by utilizing two heuristics:
1. Bad Character Heuristic: If a mismatch occurs, the algorithm shifts the pattern so that the
mismatched character aligns with its rightmost occurrence in the pattern.
2. Good Suffix Heuristic: If a mismatch occurs, the algorithm shifts the pattern based on
previously matched suffixes.
Key Idea: Within each alignment, Boyer-Moore compares the pattern against the text from right to
left, unlike the other algorithms here, which compare from left to right. The idea is to skip as many
characters as possible when a mismatch occurs by using the two heuristics above.
Time Complexity: The best-case time complexity is O(n/m), where n is the length of the text and
m is the length of the pattern, and in practice the search is often sublinear; the worst case for the
basic algorithm, however, is O(n×m).
Python Example:

def Boyer_Moore(text, pattern):
    m = len(pattern)
    n = len(text)

    # Preprocessing for the bad character rule:
    # rightmost index of each character in the pattern
    bad_char = {}
    for i in range(m):
        bad_char[pattern[i]] = i

    i = 0
    while i <= n - m:
        j = m - 1
        # Compare the pattern against the text from right to left
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            print(f"Pattern found at index {i}")
            i += (m - bad_char.get(text[i + m], -1)) if i + m < n else 1
        else:
            i += max(1, j - bad_char.get(text[i + j], -1))

Pros:
• Very efficient in practice, especially for large patterns.
• Preprocessing takes linear time, and the search is very fast.
Cons:
• More complicated than other algorithms.
• Can have poor worst-case performance, though rare.
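The bad character table can be inspected on its own; the sketch below (the helper name and the worked mismatch are illustrative) builds the rightmost-occurrence table for a small pattern and shows the shift it implies:

```python
def bad_char_table(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    return {ch: i for i, ch in enumerate(pattern)}

table = bad_char_table("abcab")
print(table)  # {'a': 3, 'b': 4, 'c': 2}

# If a mismatch happens at pattern index j against text character c,
# the pattern may be shifted right by max(1, j - rightmost_index_of_c);
# characters absent from the pattern default to -1, giving the largest shift.
j, c = 4, 'c'
shift = max(1, j - table.get(c, -1))
print(shift)  # 2
```

Aligning the rightmost 'c' of the pattern under the mismatched text character is exactly what the `bad_char.get(...)` expressions in the search loop above compute.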

1.5 Rabin-Karp Algorithm


The Rabin-Karp algorithm uses hashing to quickly identify possible matches. It computes the hash
of the pattern and the hash of every substring of the text with the same length as the pattern. If a
hash matches, a direct comparison is done to verify the match.
Key Idea: By comparing hash values, Rabin-Karp avoids direct string comparison in many cases,
which can lead to faster results. However, there can be hash collisions, so further verification is
necessary when a hash match occurs.
Time Complexity: The average time complexity is O(n+m), but in the worst case, it can degrade to
O(n×m) due to hash collisions.
Python Example:

def Rabin_Karp(text, pattern):
    d = 256  # number of characters in the input alphabet
    q = 101  # prime number to mod hash values

    m = len(pattern)
    n = len(text)
    pattern_hash = 0
    text_hash = 0
    h = 1

    # Calculate the value of d^(m-1) % q
    for i in range(m - 1):
        h = (h * d) % q

    # Calculate initial hash values for the pattern and text
    for i in range(m):
        pattern_hash = (d * pattern_hash + ord(pattern[i])) % q
        text_hash = (d * text_hash + ord(text[i])) % q

    # Slide the window over the text and search for the pattern
    for i in range(n - m + 1):
        if pattern_hash == text_hash:
            # Hashes match; verify directly to rule out a collision
            if text[i:i+m] == pattern:
                print(f"Pattern found at index {i}")
        if i < n - m:
            # Roll the hash: drop text[i], bring in text[i+m]
            text_hash = (d * (text_hash - ord(text[i]) * h) + ord(text[i + m])) % q
            if text_hash < 0:  # never negative in Python; kept for clarity
                text_hash += q

Pros:
• Naturally extends to searching for multiple patterns at once.
• Hashing speeds up the search.
Cons:
• Can suffer from hash collisions, leading to additional comparisons.
• Requires careful handling of hashing and collisions.
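The multiple-pattern extension mentioned above can be sketched by hashing every pattern into a set and sliding a single rolling hash over the text; this simplified illustration (function name is illustrative) assumes all patterns share the same length:

```python
def multi_rabin_karp(text, patterns):
    """Find occurrences of several equal-length patterns in one pass."""
    d, q = 256, 101
    m = len(patterns[0])  # assumed length of every pattern
    n = len(text)
    if n < m:
        return []

    def h(s):
        value = 0
        for ch in s:
            value = (d * value + ord(ch)) % q
        return value

    wanted = {h(p) for p in patterns}  # hash set of all patterns
    pattern_set = set(patterns)        # for collision verification
    high = pow(d, m - 1, q)            # d^(m-1) % q for the rolling update

    matches = []
    text_hash = h(text[:m])
    for i in range(n - m + 1):
        # One set lookup replaces one hash comparison per pattern
        if text_hash in wanted and text[i:i+m] in pattern_set:
            matches.append((i, text[i:i+m]))
        if i < n - m:
            text_hash = (d * (text_hash - ord(text[i]) * high) + ord(text[i + m])) % q
    return matches

print(multi_rabin_karp("abcabd", ["abc", "abd"]))  # [(0, 'abc'), (3, 'abd')]
```

Because the per-window work stays constant regardless of how many patterns are in the set, this is the idea behind using Rabin-Karp for tasks such as plagiarism detection.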
