String Matching Algorithms

The document discusses various string matching algorithms including Naive, KMP, Rabin-Karp, Boyer-Moore, and Aho-Corasick, highlighting their time complexities and use cases. It also covers classical problems in computer science such as sorting, greedy algorithms, dynamic programming, graph problems, and string problems, providing examples and code snippets for concepts like Longest Common Subsequence and Minimum Edit Distance. Additionally, it introduces Kadane’s Algorithm for finding the maximum sum subarray, detailing its implementation and complexities.

Uploaded by

Cloud Computing

String Matching Algorithms

1. Naive String Matching

• Checks for the pattern at every position in the text.
• Simple but inefficient: O(m * n) worst-case time complexity (m = pattern length, n = text length).
• Good for small inputs or simple cases.
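As a minimal sketch of the naive approach described above (the function name naive_search is illustrative, not from the original notes), every window of the text is compared against the pattern:

```python
def naive_search(text, pattern):
    """Return every index at which pattern starts in text."""
    n, m = len(text), len(pattern)
    matches = []
    for start in range(n - m + 1):
        # Compare the pattern against the window text[start:start + m].
        if text[start:start + m] == pattern:
            matches.append(start)
    return matches


print(naive_search("abracadabra", "abra"))  # [0, 7]
```

Each of the n - m + 1 windows costs up to m character comparisons, which is where the O(m * n) bound comes from.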

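2. Knuth-Morris-Pratt (KMP) Algorithm

• Improves on the naive approach by never re-examining text characters that have already been matched.
• Preprocesses the pattern to create a Longest Prefix Suffix (LPS) array, where lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it.
• Time complexity: O(m + n).
• Great for large texts with repetitive patterns.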

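3. Rabin-Karp Algorithm

• Uses a rolling hash to find the pattern in the text.
• Hashes the pattern and each length-m substring of the text; when the hashes match, verifies the characters.
• Average time complexity: O(m + n); worst case O(m * n) (due to hash collisions).
• Good for multiple pattern searching, typically with patterns of equal length whose hashes can be kept in a set.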
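The hash-then-verify idea above can be sketched as follows. The base and modulus are arbitrary illustrative choices (assumptions, not values from the original notes), and rabin_karp is a hypothetical name:

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return every index at which pattern starts in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = w_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        w_hash = (w_hash * base + ord(text[k])) % mod
    matches = []
    for start in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character.
        if p_hash == w_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Rolling the hash costs O(1) per window, so the character-by-character check only runs when the hashes agree.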

4. Boyer-Moore Algorithm

• Uses bad character and good suffix heuristics to skip sections of text.
• Usually very fast in practice.
• Time complexity: best case O(n/m), worst case O(m * n).
• Efficient for large alphabets and long patterns.
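To illustrate how skipping works, here is a simplified sketch that uses only a bad-character style shift table (the Boyer-Moore-Horspool simplification) and omits the good suffix heuristic, so it is not the full Boyer-Moore algorithm. The name horspool_search is illustrative:

```python
def horspool_search(text, pattern):
    """Return every index at which pattern starts in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # shift[c]: distance from the last occurrence of c in pattern[:-1]
    # to the end of the pattern. Characters not in the table shift by m.
    shift = {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}
    matches = []
    start = 0
    while start <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[start + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(start)
        # Skip ahead based on the text character under the pattern's last cell.
        start += shift.get(text[start + m - 1], m)
    return matches


print(horspool_search("abracadabra", "abra"))  # [0, 7]
```

When the character under the last pattern position does not occur in the pattern at all, the window jumps a full m positions, which is the source of the O(n/m) best case.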

5. Aho-Corasick Algorithm

• For searching multiple patterns simultaneously.
• Builds an automaton from a trie of the patterns plus failure links.
• Time complexity: O(n + total pattern length + number of matches).
• Used in spam filtering, DNA sequence analysis.
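A compact sketch of the trie plus failure links construction is shown below. The list-based node layout and the names build_automaton and aho_corasick_search are assumptions made for illustration; empty patterns are skipped:

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-node output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        if not pat:
            continue  # ignore empty patterns
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first traversal: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def aho_corasick_search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(aho_corasick_search("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once, and failure links replace restarts from the root, so all patterns are matched in a single pass.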

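Quick Example: KMP LPS Array Construction

```python
def compute_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    lps = [0] * len(pattern)
    length = 0  # length of the current longest prefix-suffix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Fall back to the next shorter candidate; do not advance i.
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps
```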
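The LPS array is only the preprocessing step. As a hedged sketch of how the search phase might use it, the block below repeats compute_lps so that it runs on its own and adds a hypothetical kmp_search function:

```python
def compute_lps(pattern):
    lps = [0] * len(pattern)
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]
        else:
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return every index at which pattern starts in text."""
    if not pattern:
        return []
    lps = compute_lps(pattern)
    matches = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never moves back.
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping matches
    return matches


print(compute_lps("aabaaab"))         # [0, 1, 0, 1, 2, 2, 3]
print(kmp_search("abababca", "aba"))  # [0, 2]
```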

Classical Problems

1. Sorting and Searching

• Merge Sort, Quick Sort
• Binary Search and Variants
• Counting Sort, Radix Sort

2. Greedy Problems

• Activity Selection (Job Scheduling)
• Huffman Coding (Data Compression)
• Fractional Knapsack

3. Dynamic Programming

• Fibonacci Numbers
• 0/1 Knapsack Problem
• Longest Common Subsequence (LCS)
• Coin Change Problem
• Matrix Chain Multiplication
• Edit Distance (Levenshtein Distance)

4. Graph Problems

• Depth-First Search (DFS) / Breadth-First Search (BFS)
• Detect Cycle in Graph (Directed & Undirected)
• Minimum Spanning Tree (Prim’s, Kruskal’s)
• Shortest Path (Dijkstra’s, Bellman-Ford)
• Topological Sorting
• Strongly Connected Components (Kosaraju’s Algorithm)

5. Backtracking

• N-Queens Problem
• Sudoku Solver
• Hamiltonian Path
• Subset Sum Problem

6. Mathematical Problems

• Sieve of Eratosthenes (Prime Numbers)
• Greatest Common Divisor (Euclid’s Algorithm)
• Modular Exponentiation
• Fast Fourier Transform (FFT)

7. String Problems

• Pattern Matching (KMP, Rabin-Karp)
• Longest Palindromic Substring
• Trie Construction and Usage
• Anagram Detection

8. Others

• Tower of Hanoi
• Convex Hull (Computational Geometry)
• Sliding Window Maximum
• Union-Find (Disjoint Set Union)

LCS

Longest Common Subsequence (LCS)

Problem:
Given two sequences (strings or arrays), find the length of their Longest Common Subsequence — a
subsequence that appears in the same relative order (not necessarily contiguous) in both sequences.

🔎 Example:

```text
X = "ABCBDAB"
Y = "BDCAB"
Longest Common Subsequence: "BCAB" or "BDAB"
Length = 4
```

🧠 Approach: Dynamic Programming (Bottom-Up)

1. Create a 2D table dp where dp[i][j] represents the length of the LCS of X[0..i-1] and Y[0..j-1]. Row 0 and column 0 stay 0 because an empty prefix has no common subsequence with anything.

2. If the characters match (X[i-1] == Y[j-1]), then:

   dp[i][j] = 1 + dp[i-1][j-1]

3. Otherwise:

   dp[i][j] = max(dp[i-1][j], dp[i][j-1])

4. The answer is in dp[m][n], where m and n are the lengths of X and Y.

📝 Code Example (Python):

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    # dp[i][j] = LCS length of X[:i] and Y[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example usage
X = "ABCBDAB"
Y = "BDCAB"
print(lcs(X, Y))  # Output: 4
```

Recovering the LCS String

To find the actual subsequence, you can backtrack from dp[m][n]:

```python
def get_lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtracking: walk from dp[m][n] back toward row 0 or column 0.
    i, j = m, n
    lcs_str = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            lcs_str.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1  # ties also move left
    return ''.join(reversed(lcs_str))

# X and Y as defined in the previous example
print(get_lcs(X, Y))  # Output: BDAB ("BCAB" is an equally long LCS; this tie-breaking finds "BDAB")
```

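⏳ Time & Space Complexity

• Time: O(m * n)
• Space: O(m * n). If only the length is needed, this can be reduced to O(min(m, n)) by keeping just the previous and current rows; recovering the string by backtracking needs the full table.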
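One way the reduced-space variant might look is sketched below. It returns only the length and keeps two rows at a time; the name lcs_length_two_rows is illustrative:

```python
def lcs_length_two_rows(X, Y):
    """LCS length using O(min(m, n)) extra space."""
    # Put the shorter sequence along the row so each row has min(m, n) + 1 cells.
    if len(Y) > len(X):
        X, Y = Y, X
    prev = [0] * (len(Y) + 1)  # corresponds to dp[i - 1][...]
    for x in X:
        curr = [0]             # corresponds to dp[i][0]
        for j, y in enumerate(Y, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length_two_rows("ABCBDAB", "BDCAB"))  # 4
```

Swapping the arguments is safe because the LCS length does not depend on the order of the two sequences.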

Longest Increasing Subsequence (LIS)

Problem:
Given an array of numbers, find the length of the Longest Increasing Subsequence — a subsequence
where elements are strictly increasing, but not necessarily contiguous.

🔎 Example:

```text
Input: [10, 9, 2, 5, 3, 7, 101, 18]
Longest Increasing Subsequence: [2, 3, 7, 101]
Length = 4
```

🧠 Approach 1: Dynamic Programming (O(n²))

• Let dp[i] = length of the LIS ending at index i.
• For each i, check all j < i:
  ◦ If arr[j] < arr[i], update dp[i] = max(dp[i], dp[j] + 1).
• The answer is max(dp).

📝 Code (O(n²)):

```python
def length_of_lis(arr):
    n = len(arr)
    dp = [1] * n  # Each element is an LIS of length 1 by itself
    for i in range(1, n):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)  # default=0 handles an empty array

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4
```

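🧠 Approach 2: Efficient Binary Search Method (O(n log n))

• Maintain an array sub where sub[i] is the smallest possible tail of an increasing subsequence of length i+1.
• For each number x:
  ◦ Use binary search to find the first position in sub whose value is >= x.
  ◦ If there is no such position, append x; otherwise replace the value at that position with x.
• The length of sub is the length of the LIS. The contents of sub are not necessarily a valid LIS themselves.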

📝 Code (O(n log n)):

```python
import bisect

def length_of_lis(arr):
    sub = []
    for x in arr:
        i = bisect.bisect_left(sub, x)  # First position with sub[i] >= x
        if i == len(sub):
            sub.append(x)  # x extends the longest subsequence found so far
        else:
            sub[i] = x     # x is a smaller tail for length i + 1
    return len(sub)

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4
```

⏳ Complexity

• DP approach: O(n²) time, O(n) space
• Binary search approach: O(n log n) time, O(n) space
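Both functions above return only the length. As a sketch of how the O(n²) DP could also recover one actual subsequence, the version below records a predecessor index for each position; the prev array and the function name are additions for illustration:

```python
def longest_increasing_subsequence(arr):
    """Return one longest strictly increasing subsequence of arr."""
    n = len(arr)
    if n == 0:
        return []
    dp = [1] * n     # dp[i] = length of the LIS ending at index i
    prev = [-1] * n  # prev[i] = index of the previous element in that LIS
    for i in range(1, n):
        for j in range(i):
            if arr[j] < arr[i] and dp[j] + 1 > dp[i]:
                dp[i] = dp[j] + 1
                prev[i] = j
    # Start from the first index holding the maximum length and walk back.
    end = max(range(n), key=dp.__getitem__)
    seq = []
    while end != -1:
        seq.append(arr[end])
        end = prev[end]
    return seq[::-1]


print(longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18]))
# [2, 5, 7, 101]
```

The result here is [2, 5, 7, 101] rather than [2, 3, 7, 101]; both have length 4, and which one is returned depends on how ties are broken.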

Minimum Edit Distance (Levenshtein Distance)

Problem:
Given two strings, find the minimum number of operations required to convert one string into the
other.
Allowed operations:

1. Insert

2. Delete

3. Replace

🔎 Example:

```text
word1 = "kitten"
word2 = "sitting"

Operations:
- kitten → sitten (replace 'k' with 's')
- sitten → sittin (replace 'e' with 'i')
- sittin → sitting (insert 'g')

Minimum edit distance = 3
```

🧠 Dynamic Programming Approach (Bottom-Up)

Let dp[i][j] = minimum operations to convert word1[0..i-1] to word2[0..j-1].

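✅ Recurrence:

• If the characters match (word1[i-1] == word2[j-1]):

  dp[i][j] = dp[i-1][j-1]

• If the characters don't match:

  dp[i][j] = 1 + min(
      dp[i-1][j],    # delete
      dp[i][j-1],    # insert
      dp[i-1][j-1]   # replace
  )

• Base cases: dp[i][0] = i (delete all i characters) and dp[0][j] = j (insert all j characters).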
📝 Python Code:

```python
def min_edit_distance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i  # Deletion
    for j in range(n + 1):
        dp[0][j] = j  # Insertion

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # delete
                    dp[i][j - 1],      # insert
                    dp[i - 1][j - 1],  # replace
                )
    return dp[m][n]

# Example
print(min_edit_distance("kitten", "sitting"))  # Output: 3
```

⏳ Time and Space Complexity

• Time: O(m * n)
• Space: O(m * n) (can be optimized to O(min(m, n)))
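The O(min(m, n)) space bound can be reached by keeping only two rows of the table. The following is a sketch under that assumption, with an illustrative function name:

```python
def min_edit_distance_two_rows(word1, word2):
    """Levenshtein distance using O(min(m, n)) extra space."""
    # With unit costs the distance is symmetric, so the shorter word
    # can be placed along the row.
    if len(word2) > len(word1):
        word1, word2 = word2, word1
    prev = list(range(len(word2) + 1))  # distances from "" to word2[:j]
    for i, c1 in enumerate(word1, start=1):
        curr = [i]                      # distance from word1[:i] to ""
        for j, c2 in enumerate(word2, start=1):
            if c1 == c2:
                curr.append(prev[j - 1])
            else:
                curr.append(1 + min(prev[j],       # delete
                                    curr[j - 1],   # insert
                                    prev[j - 1]))  # replace
        prev = curr
    return prev[-1]


print(min_edit_distance_two_rows("kitten", "sitting"))  # 3
```

This keeps the distance but not the sequence of operations; recovering the operations needs the full table or a more involved technique.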

Applications

• Spell checkers
• DNA sequence comparison
• Natural Language Processing (NLP)

Max Sum Subarray

Maximum Sum Subarray (Kadane’s Algorithm)

Problem:
Given an array of integers, find the contiguous subarray (containing at least one number) which has
the largest sum.

🔎 Example:

```text
Input: [-2, 1, -3, 4, -1, 2, 1, -5, 4]
Output: 6
Explanation: [4, -1, 2, 1] has the largest sum = 6
```

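✅ Optimal Solution: Kadane’s Algorithm

Idea:

• Keep track of the best sum of a subarray ending at the current element (current_sum).
• Update the maximum sum (max_sum) seen so far after each element.
• If current_sum drops below 0, reset it to 0: a negative running sum can only lower the sum of any subarray that extends it. Because max_sum is updated before the reset, an all-negative array still yields its largest element.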

📝 Code (Python):

```python
def max_subarray_sum(arr):
    max_sum = float('-inf')
    current_sum = 0
    for num in arr:
        # Either extend the running subarray or start fresh at num.
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
    return max_sum

# Example
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray_sum(arr))  # Output: 6
```

🧠 Optional: Track the Subarray Too

```python
def max_subarray_with_indices(arr):
    max_sum = float('-inf')
    current_sum = 0
    start = end = s = 0  # s = start of the current candidate subarray
    for i in range(len(arr)):
        current_sum += arr[i]
        if current_sum > max_sum:
            max_sum = current_sum
            start = s
            end = i
        if current_sum < 0:
            # A negative running sum cannot help; restart after index i.
            current_sum = 0
            s = i + 1
    return max_sum, arr[start:end + 1]

# Output: (6, [4, -1, 2, 1])
print(max_subarray_with_indices(arr))
```

⏳ Time and Space Complexity

• Time: O(n)
• Space: O(1)

📌 Variants

• Max sum circular subarray
• Max product subarray
• 2D max sum submatrix
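As an illustration of the first variant, one common way to handle a circular array is to run Kadane’s idea twice in the same pass: once for the maximum subarray and once for the minimum subarray, since a wrapping subarray is the total minus a non-wrapping minimum. This is a sketch, not part of the original notes, and the function name is hypothetical:

```python
def max_circular_subarray_sum(arr):
    """Largest sum of a non-empty subarray that may wrap around the end."""
    if not arr:
        raise ValueError("arr must be non-empty")
    total = 0
    cur_max = cur_min = 0
    best_max = float('-inf')
    best_min = float('inf')
    for num in arr:
        total += num
        cur_max = max(num, cur_max + num)  # Kadane for the maximum
        best_max = max(best_max, cur_max)
        cur_min = min(num, cur_min + num)  # Kadane for the minimum
        best_min = min(best_min, cur_min)
    if best_max < 0:
        # All elements are negative; total - best_min would be the empty subarray.
        return best_max
    return max(best_max, total - best_min)


print(max_circular_subarray_sum([5, -3, 5]))  # 10
```

In the example, the best subarray wraps around: it takes both 5s and skips the -3 in the middle.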
