String Matching Algorithms

The document discusses various string matching algorithms including Naive, KMP, Rabin-Karp, Boyer-Moore, and Aho-Corasick, highlighting their time complexities and use cases. It also covers classical problems in computer science such as sorting, greedy algorithms, dynamic programming, graph problems, and string problems, providing examples and code snippets for concepts like Longest Common Subsequence and Minimum Edit Distance. Additionally, it introduces Kadane’s Algorithm for finding the maximum sum subarray, detailing its implementation and complexities.

Uploaded by

Cloud Computing
String Matching Algorithms

1. Naive String Matching

• Checks for the pattern at every position in the text.
• Simple but inefficient: O(m * n) worst-case time (m = pattern length, n = text length).
• Good for small inputs or simple cases.
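
As a rough illustration of the position-by-position check described above, here is a minimal sketch. The function name `naive_search` and the choice to return all match positions are assumptions made for this example, not part of the original notes.

```python
def naive_search(text, pattern):
    """Return every index at which pattern occurs in text (hypothetical helper)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try each possible alignment of the pattern against the text.
    for start in range(n - m + 1):
        # Each slice comparison costs up to O(m), giving O(m * n) overall.
        if text[start:start + m] == pattern:
            matches.append(start)
    return matches


print(naive_search("abracadabra", "abra"))  # [0, 7]
```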

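2. Knuth-Morris-Pratt (KMP) Algorithm

• Improves on the naive approach by never re-examining text characters that have already been matched.
• Preprocesses the pattern into a Longest Prefix Suffix (LPS) array: lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it.
• Time complexity: O(m + n).
• Great for large texts with repetitive patterns.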

3. Rabin-Karp Algorithm

• Uses a rolling hash to find the pattern in the text.
• Hashes the pattern and each length-m substring of the text; when the hashes match, verifies the characters to rule out a collision.
• Average time complexity: O(m + n); worst case O(m * n) (when many hashes collide).
• Good for multiple-pattern searching.
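
The hash-then-verify idea above can be sketched as follows. The name `rabin_karp` and the parameters `base` and `mod` are illustrative assumptions; any reasonably large modulus would do.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start indices of pattern in text using a rolling hash.

    base and mod are illustrative choices, not prescribed values.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        t_hash = (t_hash * base + ord(text[k])) % mod
    matches = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree (guards against collisions).
        if p_hash == t_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            t_hash = ((t_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```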

4. Boyer-Moore Algorithm

• Compares the pattern right to left and uses the bad-character and good-suffix heuristics to skip sections of the text.
• Usually very fast in practice.
• Time complexity: best case O(n/m), worst case O(m * n).
• Efficient for large alphabets and long patterns.
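
To show how skipping works, the sketch below implements a simplified variant that uses only the bad-character rule; the good-suffix heuristic is deliberately omitted, so this is not the full Boyer-Moore algorithm. The name `bm_bad_char_search` is an assumption for this example.

```python
def bm_bad_char_search(text, pattern):
    """Simplified Boyer-Moore: bad-character rule only (no good-suffix rule)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of each character in the pattern.
    last = {ch: idx for idx, ch in enumerate(pattern)}
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # conservative shift; the full algorithm can shift further
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern, always moving at least one step.
            s += max(1, j - last.get(text[s + j], -1))
    return matches


print(bm_bad_char_search("abracadabra", "abra"))  # [0, 7]
```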

5. Aho-Corasick Algorithm

• Searches for multiple patterns simultaneously in a single pass over the text.
• Builds an automaton from a trie of the patterns plus failure links.
• Time complexity: O(n + total pattern length + number of matches).
• Used in spam filtering and DNA sequence analysis.
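
The trie-plus-failure-links construction can be sketched roughly as below. The function names and the list-of-dicts representation are assumptions chosen for brevity, not a reference implementation.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on reaching each state
    for pat in patterns:
        if not pat:
            continue  # ignore empty patterns
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass; depth-1 states keep the root as their failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def aho_corasick_search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


print(sorted(aho_corasick_search("ushers", ["he", "she", "his", "hers"])))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```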


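Quick Example: KMP LPS Array Construction (Python)

def compute_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    lps = [0] * len(pattern)
    length = 0  # length of the previous longest prefix-suffix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Fall back to the next shorter candidate; do not advance i
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps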
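
The LPS array is only the preprocessing step. A sketch of how it might drive the search itself follows; `kmp_search` is a hypothetical name, and `compute_lps` is restated in a compact but equivalent form so the block runs on its own.

```python
def compute_lps(pattern):
    """Compact restatement of the LPS construction."""
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        while length and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    return lps


def kmp_search(text, pattern):
    """Return all start indices of pattern in text in O(m + n) time."""
    if not pattern:
        return []
    lps = compute_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never moves back.
        while j and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping matches
    return matches


print(compute_lps("aabaaab"))          # [0, 1, 0, 1, 2, 2, 3]
print(kmp_search("abababca", "aba"))   # [0, 2]
```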

Classical Problems

1. Sorting and Searching

• Merge Sort, Quick Sort
• Binary Search and Variants
• Counting Sort, Radix Sort

2. Greedy Problems

• Activity Selection (Job Scheduling)
• Huffman Coding (Data Compression)
• Fractional Knapsack

3. Dynamic Programming

• Fibonacci Numbers
• 0/1 Knapsack Problem
• Longest Common Subsequence (LCS)
• Coin Change Problem
• Matrix Chain Multiplication
• Edit Distance (Levenshtein Distance)

4. Graph Problems

• Depth-First Search (DFS) / Breadth-First Search (BFS)
• Detect Cycle in Graph (Directed & Undirected)
• Minimum Spanning Tree (Prim’s, Kruskal’s)
• Shortest Path (Dijkstra’s, Bellman-Ford)
• Topological Sorting
• Strongly Connected Components (Kosaraju’s Algorithm)

5. Backtracking

• N-Queens Problem
• Sudoku Solver
• Hamiltonian Path
• Subset Sum Problem

6. Mathematical Problems

• Sieve of Eratosthenes (Prime Numbers)
• Greatest Common Divisor (Euclid’s Algorithm)
• Modular Exponentiation
• Fast Fourier Transform (FFT)

7. String Problems

• Pattern Matching (KMP, Rabin-Karp)
• Longest Palindromic Substring
• Trie Construction and Usage
• Anagram Detection

8. Others

• Tower of Hanoi
• Convex Hull (Computational Geometry)
• Sliding Window Maximum
• Union-Find (Disjoint Set Union)

Longest Common Subsequence (LCS)

Problem:
Given two sequences (strings or arrays), find the length of their Longest Common Subsequence — a
subsequence that appears in the same relative order (not necessarily contiguous) in both sequences.

🔎 Example:

X = "ABCBDAB"
Y = "BDCAB"
Longest Common Subsequence: "BCAB" or "BDAB"
Length = 4

🧠 Approach: Dynamic Programming (Bottom-Up)

1. Create a 2D table dp where dp[i][j] is the length of the LCS of X[0..i-1] and Y[0..j-1]. Row 0 and column 0 are all 0, since an empty prefix has no common subsequence.
2. If the characters match (X[i-1] == Y[j-1]), then:
   dp[i][j] = 1 + dp[i-1][j-1]
3. Otherwise:
   dp[i][j] = max(dp[i-1][j], dp[i][j-1])
4. The answer is dp[m][n], where m and n are the lengths of X and Y.

📝 Code Example (Python):

def lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example usage
X = "ABCBDAB"
Y = "BDCAB"
print(lcs(X, Y))  # Output: 4

Recovering the LCS String

To find the actual subsequence, you can backtrack from dp[m][n]:

def get_lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtracking from the bottom-right corner of the table
    i, j = m, n
    lcs_str = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            # Matching character is part of the LCS
            lcs_str.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1  # ties move left, which decides which LCS is returned
    return ''.join(reversed(lcs_str))

X = "ABCBDAB"
Y = "BDCAB"
print(get_lcs(X, Y))  # Output: "BDAB" ("BCAB" is an equally valid LCS)

⏳ Time & Space Complexity

• Time: O(m * n)
• Space: O(m * n) for the full table. If only the length is needed, two rows suffice, which gives O(min(m, n)); the backtracking shown above needs the full table.
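
One way the space reduction might look, as a sketch that returns only the length: it keeps just the previous and current rows and puts the shorter sequence along the row. The name `lcs_length_two_rows` is an assumption for this example.

```python
def lcs_length_two_rows(X, Y):
    """LCS length using two rows of size min(m, n) + 1 (hypothetical helper)."""
    # Put the shorter sequence along the row to minimise memory.
    if len(Y) > len(X):
        X, Y = Y, X
    prev = [0] * (len(Y) + 1)
    for x in X:
        curr = [0]  # column 0: LCS with an empty prefix is 0
        for j, y in enumerate(Y, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(Y)]


print(lcs_length_two_rows("ABCBDAB", "BDCAB"))  # 4
```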

Longest Increasing Subsequence (LIS)

Problem:
Given an array of numbers, find the length of the Longest Increasing Subsequence — a subsequence
where elements are strictly increasing, but not necessarily contiguous.

🔎 Example:

Input: [10, 9, 2, 5, 3, 7, 101, 18]
Longest Increasing Subsequence: [2, 3, 7, 101] (one of several with this length)
Length = 4

🧠 Approach 1: Dynamic Programming (O(n²))

• Let dp[i] = length of the LIS ending at index i.
• For each i, check all j < i:
  ◦ If arr[j] < arr[i], update dp[i] = max(dp[i], dp[j] + 1).
• The answer is max(dp).

📝 Code (O(n²)):

def length_of_lis(arr):
    n = len(arr)
    dp = [1] * n  # Each element is an LIS of length 1 by itself
    for i in range(1, n):
        for j in range(i):
            if arr[j] < arr[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp, default=0)  # default=0 handles an empty array

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4

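🧠 Approach 2: Efficient Binary Search Method (O(n log n))

• Maintain an array sub where sub[i] is the smallest possible tail of an increasing subsequence of length i+1.
• For each number:
  ◦ Use binary search to find the first position in sub whose value is >= the number.
  ◦ If there is no such position, append the number; otherwise overwrite that entry with it.
• The length of sub is the length of the LIS (sub itself is not necessarily a valid subsequence of the input).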

📝 Code (O(n log n)):

import bisect

def length_of_lis(arr):
    sub = []
    for x in arr:
        i = bisect.bisect_left(sub, x)  # Find position to insert x
        if i == len(sub):
            sub.append(x)
        else:
            sub[i] = x
    return len(sub)

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4

⏳ Complexity

• DP approach: O(n²) time, O(n) space
• Binary search approach: O(n log n) time, O(n) space
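
Both versions above return only the length. A sketch of how the binary search method could be extended to recover one actual subsequence is shown below; the names `lis_sequence`, `tails_idx`, and `parent` are assumptions for this example, and the subsequence it returns may differ from the one listed in the example while having the same length.

```python
import bisect


def lis_sequence(arr):
    """Return one longest strictly increasing subsequence of arr."""
    tails = []      # tails[k] = smallest tail of an increasing subsequence of length k + 1
    tails_idx = []  # index in arr of each tail
    parent = [-1] * len(arr)  # predecessor index of each element in its subsequence
    for i, x in enumerate(arr):
        k = bisect.bisect_left(tails, x)
        if k > 0:
            parent[i] = tails_idx[k - 1]
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
    # Walk the predecessor links back from the tail of the longest subsequence.
    seq = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        seq.append(arr[i])
        i = parent[i]
    return seq[::-1]


print(lis_sequence([10, 9, 2, 5, 3, 7, 101, 18]))  # [2, 3, 7, 18]
```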

Minimum Edit Distance (Levenshtein Distance)


Problem:
Given two strings, find the minimum number of operations required to convert one string into the other.
Allowed operations (each counts as one step):

1. Insert a character
2. Delete a character
3. Replace a character

🔎 Example:

word1 = "kitten"
word2 = "sitting"

Operations:
- kitten → sitten (replace 'k' with 's')
- sitten → sittin (replace 'e' with 'i')
- sittin → sitting (insert 'g')

Minimum edit distance = 3

🧠 Dynamic Programming Approach (Bottom-Up)

Let dp[i][j] = minimum operations to convert word1[0..i-1] to word2[0..j-1]. The base cases are dp[i][0] = i (delete all i characters) and dp[0][j] = j (insert all j characters).

✅ Recurrence:

• If the characters match (word1[i-1] == word2[j-1]):
    dp[i][j] = dp[i-1][j-1]
• If the characters don't match:
    dp[i][j] = 1 + min(
        dp[i-1][j],    # delete
        dp[i][j-1],    # insert
        dp[i-1][j-1]   # replace
    )

📝 Python Code:

def min_edit_distance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(m + 1):
        dp[i][0] = i  # Deletion
    for j in range(n + 1):
        dp[0][j] = j  # Insertion

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],     # delete
                    dp[i][j - 1],     # insert
                    dp[i - 1][j - 1]  # replace
                )
    return dp[m][n]

# Example
print(min_edit_distance("kitten", "sitting"))  # Output: 3


⏳ Time and Space Complexity

• Time: O(m * n)
• Space: O(m * n) (can be reduced to O(min(m, n)) by keeping only two rows of the table)
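
A sketch of the reduced-space version follows. It relies on the fact that edit distance is symmetric, so the shorter string can be placed along the row; the name `min_edit_distance_two_rows` is an assumption for this example.

```python
def min_edit_distance_two_rows(word1, word2):
    """Edit distance using two rows of size min(m, n) + 1 (hypothetical helper)."""
    # The distance is symmetric, so keep the shorter word along the row.
    if len(word2) > len(word1):
        word1, word2 = word2, word1
    prev = list(range(len(word2) + 1))  # row 0: insert j characters
    for i, c1 in enumerate(word1, start=1):
        curr = [i]  # column 0: delete i characters
        for j, c2 in enumerate(word2, start=1):
            if c1 == c2:
                curr.append(prev[j - 1])
            else:
                curr.append(1 + min(prev[j],       # delete
                                    curr[j - 1],   # insert
                                    prev[j - 1]))  # replace
        prev = curr
    return prev[-1]


print(min_edit_distance_two_rows("kitten", "sitting"))  # 3
```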

Applications

• Spell checkers
• DNA sequence comparison
• Natural Language Processing (NLP)

Maximum Sum Subarray (Kadane’s Algorithm)

Problem:
Given an array of integers, find the contiguous subarray (containing at least one number) which has
the largest sum.

🔎 Example:

Input: [-2, 1, -3, 4, -1, 2, 1, -5, 4]
Output: 6
Explanation: [4, -1, 2, 1] has the largest sum = 6

✅ Optimal Solution: Kadane’s Algorithm

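Idea:

• Keep track of the best sum of a subarray ending at the current element (current_sum).
• Update the maximum sum (max_sum) seen so far.
• If the previous current_sum is negative, extending it would only lower the total, so start a new subarray at the current element.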

📝 Code (Python):

def max_subarray_sum(arr):
    max_sum = float('-inf')
    current_sum = 0
    for num in arr:
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
    return max_sum

# Example
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray_sum(arr))  # Output: 6

🧠 Optional: Track the Subarray Too

def max_subarray_with_indices(arr):
    max_sum = float('-inf')
    current_sum = 0
    start = end = s = 0  # s = start of the current candidate subarray
    for i in range(len(arr)):
        current_sum += arr[i]
        if current_sum > max_sum:
            max_sum = current_sum
            start = s
            end = i
        if current_sum < 0:
            # A negative running sum cannot help; restart after index i
            current_sum = 0
            s = i + 1
    return max_sum, arr[start:end + 1]

arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray_with_indices(arr))  # Output: (6, [4, -1, 2, 1])

⏳ Time and Space Complexity

• Time: O(n)
• Space: O(1)

📌 Variants

• Max sum circular subarray
• Max product subarray
• 2D max sum submatrix
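
As an illustration of the first variant, here is one common way to adapt Kadane's idea to a circular array: the best wrapping subarray is the total sum minus the minimum-sum subarray. This is a sketch under that assumption, and the name `max_circular_subarray_sum` is hypothetical.

```python
def max_circular_subarray_sum(arr):
    """Maximum sum of a non-empty subarray of a circular array."""
    if not arr:
        raise ValueError("arr must contain at least one number")
    total = 0
    cur_max = cur_min = 0
    best_max = float('-inf')
    best_min = float('inf')
    for num in arr:
        # Standard Kadane pass for the maximum-sum subarray.
        cur_max = max(num, cur_max + num)
        best_max = max(best_max, cur_max)
        # Mirror pass for the minimum-sum subarray.
        cur_min = min(num, cur_min + num)
        best_min = min(best_min, cur_min)
        total += num
    if best_max < 0:
        # All numbers are negative: wrapping would leave an empty subarray.
        return best_max
    # Either the best subarray does not wrap, or it is everything except
    # the minimum-sum subarray.
    return max(best_max, total - best_min)


print(max_circular_subarray_sum([5, -3, 5]))  # 10
```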
