String Matching Algorithms
1. Naive String Matching
Checks for the pattern at every position in the text.
Simple but inefficient: O(m * n) time complexity (m = pattern length, n = text length).
Good for small inputs or simple cases.
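A minimal sketch of the naive approach. It assumes the goal is to report every 0-based starting index where the pattern occurs. The function name `naive_search` is illustrative and does not come from a library:

```python
def naive_search(text, pattern):
    """Return all 0-based start indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment of the pattern against the text.
    for s in range(n - m + 1):
        # Compare character by character; stop at the first mismatch.
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(s)
    return matches


print(naive_search("abracadabra", "abra"))  # [0, 7]
```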
2. Knuth-Morris-Pratt (KMP) Algorithm
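Improves on the naive approach by never moving backward in the text. After a mismatch, it reuses what it already knows about the matched characters to decide how far the pattern can shift.
Preprocesses the pattern into a Longest Prefix Suffix (LPS) array, where lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it.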
Time complexity: O(m + n).
Great for large texts with repetitive patterns.
3. Rabin-Karp Algorithm
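Uses a rolling hash to find the pattern in the text.
Hashes the pattern and each length-m window of the text. Sliding the window updates its hash in O(1). When the hashes match, the characters are compared to rule out a collision.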
Average time complexity: O(m + n); worst-case O(m * n) (due to collisions).
Good for multiple-pattern searching: when the patterns share a length, each window hash can be looked up in a set of pattern hashes.
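A sketch of the rolling-hash search for a single pattern. It assumes a polynomial hash of character code points modulo a large prime. The name `rabin_karp` and the `base`/`mod` defaults are illustrative choices that the algorithm does not fix. In this sketch, an empty pattern returns no matches.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all 0-based start indices where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # high = base^(m-1) mod p: the weight of the window's leading character.
    high = pow(base, m - 1, mod)
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Equal hashes may be a collision, so verify the characters.
        if p_hash == w_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s + m < n:
            # Roll the window: drop text[s], shift, add text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```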
4. Boyer-Moore Algorithm
Compares the pattern against the text from right to left and uses the bad-character and good-suffix heuristics to skip sections of the text.
Usually very fast in practice.
Time complexity: best case O(n/m), worst case O(m * n).
Efficient for large alphabets and long patterns.
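The complete algorithm combines both heuristics. The sketch below implements only the simpler Boyer-Moore-Horspool variant, which keeps a single bad-character shift table. It shows the right-to-left comparison and the skipping, but it is not full Boyer-Moore because it has no good-suffix rule. The name `horspool_search` is illustrative.

```python
def horspool_search(text, pattern):
    """Boyer-Moore-Horspool (bad-character rule only). Returns 0-based start indices."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift for each character: distance from its last occurrence
    # (ignoring the final position) to the end of the pattern.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    matches = []
    s = 0
    while s <= n - m:
        # Compare right to left.
        j = m - 1
        while j >= 0 and text[s + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(s)
        # Shift by the entry for the text character under the pattern's
        # last position; characters not in the table allow a full jump of m.
        s += shift.get(text[s + m - 1], m)
    return matches


print(horspool_search("abracadabra", "abra"))  # [0, 7]
```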
5. Aho-Corasick Algorithm
For searching multiple patterns simultaneously.
Builds a trie of all patterns, then adds failure links that turn it into an automaton, so the text is scanned in a single pass.
Time complexity: O(n + total pattern length + number of matches).
Used in spam filtering and DNA sequence analysis.
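Here is a compact sketch of the trie, the failure links, and the single-pass scan. It assumes non-empty patterns and reports each match as a (start index, pattern) pair. The name `aho_corasick` and the list-based node layout are illustrative choices.

```python
from collections import deque


def aho_corasick(text, patterns):
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    # Trie stored as parallel lists; node 0 is the root.
    goto = [{}]    # goto[node][ch] -> child node
    fail = [0]     # failure link: longest proper suffix that is also a trie path
    output = [[]]  # patterns recognized at this node (merged via failure links)

    # 1. Build the trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        output[node].append(pat)

    # 2. Compute failure links breadth-first, so shallower nodes are done first.
    queue = deque(goto[0].values())  # depth-1 nodes fail to the root
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit the matches reachable through the failure link.
            output[child] += output[fail[child]]

    # 3. Scan the text once, following failure links on a mismatch.
    matches = []
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in output[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(aho_corasick("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```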
Quick Example: KMP LPS Array Construction
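```python
def compute_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    lps = [0] * len(pattern)
    length = 0  # length of the prefix currently matched
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Fall back to the next shorter candidate; i does not advance
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps
```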
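The search phase uses the LPS array to avoid re-reading text characters. The following is a minimal sketch of that phase, and the name `kmp_search` is illustrative. The block repeats the LPS construction so that it runs on its own, and it reports overlapping matches.

```python
def compute_lps(pattern):
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return all 0-based start indices where pattern occurs in text."""
    if not pattern:
        return []
    lps = compute_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # the text index never moves backward
        # On a mismatch, fall back in the pattern using lps instead of rescanning text.
        while j and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going so overlapping matches are found
    return matches


print(compute_lps("ababaca"))        # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```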
Classical Problems
1. Sorting and Searching
Merge Sort, Quick Sort
Binary Search and Variants
Counting Sort, Radix Sort
2. Greedy Problems
Activity Selection (Job Scheduling)
Huffman Coding (Data Compression)
Fractional Knapsack
3. Dynamic Programming
Fibonacci Numbers
0/1 Knapsack Problem
Longest Common Subsequence (LCS)
Coin Change Problem
Matrix Chain Multiplication
Edit Distance (Levenshtein Distance)
4. Graph Problems
Depth-First Search (DFS) / Breadth-First Search (BFS)
Detect Cycle in Graph (Directed & Undirected)
Minimum Spanning Tree (Prim’s, Kruskal’s)
Shortest Path (Dijkstra’s, Bellman-Ford)
Topological Sorting
Strongly Connected Components (Kosaraju’s Algorithm)
5. Backtracking
N-Queens Problem
Sudoku Solver
Hamiltonian Path
Subset Sum Problem
6. Mathematical Problems
Sieve of Eratosthenes (Prime Numbers)
Greatest Common Divisor (Euclid’s Algorithm)
Modular Exponentiation
Fast Fourier Transform (FFT)
7. String Problems
Pattern Matching (KMP, Rabin-Karp)
Longest Palindromic Substring
Trie Construction and Usage
Anagram Detection
8. Others
Tower of Hanoi
Convex Hull (Computational Geometry)
Sliding Window Maximum
Union-Find (Disjoint Set Union)
Longest Common Subsequence (LCS)
Problem:
Given two sequences (strings or arrays), find the length of their Longest Common Subsequence — a
subsequence that appears in the same relative order (not necessarily contiguous) in both sequences.
🔎 Example:
```text
X = "ABCBDAB"
Y = "BDCAB"
Longest Common Subsequence: "BCAB" or "BDAB"
Length = 4
```
🧠 Approach: Dynamic Programming (Bottom-Up)
1. Create a 2D table dp of size (m + 1) by (n + 1), where dp[i][j] is the length of the LCS of X[0..i-1] and Y[0..j-1]. Row 0 and column 0 stay 0, because an empty prefix has an empty LCS.
2. If the characters match (X[i-1] == Y[j-1]), then:
   dp[i][j] = 1 + dp[i-1][j-1]
3. Otherwise:
   dp[i][j] = max(dp[i-1][j], dp[i][j-1])
4. The answer is dp[m][n], where m and n are the lengths of X and Y.
📝 Code Example (Python):
```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example usage
X = "ABCBDAB"
Y = "BDCAB"
print(lcs(X, Y))  # Output: 4
```
Recovering the LCS String
To find the actual subsequence, you can backtrack from dp[m][n]:
```python
def get_lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Backtracking: walk from dp[m][n] toward dp[0][0]
    i, j = m, n
    lcs_str = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            # Matching characters belong to the LCS
            lcs_str.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] > dp[i][j - 1]:
            i -= 1
        else:
            j -= 1  # ties move left, which decides which LCS is returned
    return ''.join(reversed(lcs_str))

print(get_lcs(X, Y))  # Output: BDAB (BCAB has the same length)
```
⏳ Time & Space Complexity
Time: O(m * n)
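Space: O(m * n). Computing only the length needs O(min(m, n)) space by keeping two rows of the table. Recovering the subsequence itself in linear space is possible with Hirschberg's algorithm.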
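Each row of the LCS table depends only on the row above it, so two rows are enough to compute the length. The following is a sketch that assumes only the length is needed. The name `lcs_length_two_rows` is illustrative.

```python
def lcs_length_two_rows(X, Y):
    """Length of the LCS of X and Y using O(min(len(X), len(Y))) extra space."""
    # Make Y the shorter sequence so each row has min(m, n) + 1 entries.
    if len(Y) > len(X):
        X, Y = Y, X
    prev = [0] * (len(Y) + 1)  # row i - 1 of the full table
    for x in X:
        curr = [0] * (len(Y) + 1)  # row i; curr[0] = 0 is the base case
        for j in range(1, len(Y) + 1):
            if x == Y[j - 1]:
                curr[j] = 1 + prev[j - 1]
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


print(lcs_length_two_rows("ABCBDAB", "BDCAB"))  # 4
```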
Longest Increasing Subsequence (LIS)
Problem:
Given an array of numbers, find the length of the Longest Increasing Subsequence — a subsequence
where elements are strictly increasing, but not necessarily contiguous.
🔎 Example:
```text
Input: [10, 9, 2, 5, 3, 7, 101, 18]
Longest Increasing Subsequence: [2, 3, 7, 101] (one of several; [2, 5, 7, 101] and [2, 3, 7, 18] also have length 4)
Length = 4
```
🧠 Approach 1: Dynamic Programming (O(n²))
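Let dp[i] = length of the LIS ending at index i. Every dp[i] starts at 1, because each element alone is an increasing subsequence.
For each i, check all j < i:
  - If arr[j] < arr[i], update dp[i] = max(dp[i], dp[j] + 1).
The answer is max(dp), or 0 for an empty array.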
📝 Code (O(n²)):
```python
def length_of_lis(arr):
    n = len(arr)
    if n == 0:
        return 0  # an empty array has no increasing subsequence
    dp = [1] * n  # Each element is LIS of length 1 by itself
    for i in range(1, n):
        for j in range(i):
            if arr[j] < arr[i]:  # arr[i] can extend the LIS ending at j
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp)

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4
```
🧠 Approach 2: Efficient Binary Search Method (O(n log n))
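Maintain an array sub where sub[i] is the smallest possible tail of an increasing subsequence of length i+1 seen so far. sub always stays sorted.
For each number x:
  - Use binary search (bisect_left) to find the first position in sub whose value is >= x.
  - If no such position exists, append x. Otherwise, replace the element at that position with x.
The length of sub is the length of the LIS. sub itself is not necessarily a valid subsequence of the input; only its length is meaningful.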
📝 Code (O(n log n)):
```python
import bisect

def length_of_lis(arr):
    sub = []
    for x in arr:
        i = bisect.bisect_left(sub, x)  # Find position to insert x
        if i == len(sub):
            sub.append(x)
        else:
            sub[i] = x
    return len(sub)

# Example
arr = [10, 9, 2, 5, 3, 7, 101, 18]
print(length_of_lis(arr))  # Output: 4
```
⏳ Complexity
DP approach: O(n²)
Binary search approach: O(n log n)
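The binary search method can also return an actual subsequence. To do this, it records indices rather than values, together with a predecessor link for each element. The following is a sketch under that assumption. The names `lis_sequence`, `tails`, and `parent` are illustrative.

```python
import bisect


def lis_sequence(arr):
    """Return one longest strictly increasing subsequence of arr in O(n log n)."""
    tails = []        # tails[k] = index of the smallest tail of a length-(k + 1) subsequence
    tail_values = []  # arr[tails[k]], kept separately so bisect can search it
    parent = [-1] * len(arr)  # parent[i] = index of the element before arr[i]
    for i, x in enumerate(arr):
        k = bisect.bisect_left(tail_values, x)
        if k > 0:
            parent[i] = tails[k - 1]  # x extends the best subsequence of length k
        if k == len(tails):
            tails.append(i)
            tail_values.append(x)
        else:
            tails[k] = i
            tail_values[k] = x
    # Follow predecessor links back from the tail of the longest subsequence.
    result = []
    i = tails[-1] if tails else -1
    while i != -1:
        result.append(arr[i])
        i = parent[i]
    return result[::-1]


print(lis_sequence([10, 9, 2, 5, 3, 7, 101, 18]))  # [2, 3, 7, 18]
```

For the example input, this sketch returns [2, 3, 7, 18], which is another valid LIS of length 4.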
Minimum Edit Distance (Levenshtein Distance)
Problem:
Given two strings, find the minimum number of operations required to convert one string into the
other.
Allowed operations:
1. Insert
2. Delete
3. Replace
🔎 Example:
```text
word1 = "kitten"
word2 = "sitting"
Operations:
- kitten → sitten (replace 'k' with 's')
- sitten → sittin (replace 'e' with 'i')
- sittin → sitting (insert 'g')
Minimum edit distance = 3
```
🧠 Dynamic Programming Approach (Bottom-Up)
Let dp[i][j] = minimum number of operations to convert word1[0..i-1] to word2[0..j-1]. The base cases are dp[i][0] = i (delete all i characters) and dp[0][j] = j (insert all j characters).
✅ Recurrence:
If the characters match (word1[i-1] == word2[j-1]):
    dp[i][j] = dp[i-1][j-1]
If they don't match:
    dp[i][j] = 1 + min(
        dp[i-1][j],    # delete
        dp[i][j-1],    # insert
        dp[i-1][j-1]   # replace
    )
📝 Python Code:
```python
def min_edit_distance(word1, word2):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # Deletion: remove all i characters of word1
    for j in range(n + 1):
        dp[0][j] = j  # Insertion: add all j characters of word2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # no operation needed
            else:
                dp[i][j] = 1 + min(
                    dp[i - 1][j],      # delete
                    dp[i][j - 1],      # insert
                    dp[i - 1][j - 1],  # replace
                )
    return dp[m][n]

# Example
print(min_edit_distance("kitten", "sitting"))  # Output: 3
```
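You can also walk the filled table back from dp[m][n] to list one optimal sequence of operations, much like the LCS backtracking. The following is a sketch. The name `edit_operations` and the message format are illustrative. When several moves tie, the order of the checks decides which optimal sequence is returned.

```python
def edit_operations(word1, word2):
    """Return one minimal list of edit operations turning word1 into word2."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    # Walk back from dp[m][n], choosing a move consistent with the table.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1]:
            i, j = i - 1, j - 1  # characters match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {word1[i - 1]!r} with {word2[j - 1]!r}")
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(f"insert {word2[j - 1]!r}")
            j -= 1
        else:
            ops.append(f"delete {word1[i - 1]!r}")
            i -= 1
    return ops[::-1]  # collected back to front, so reverse


print(edit_operations("kitten", "sitting"))
# ["replace 'k' with 's'", "replace 'e' with 'i'", "insert 'g'"]
```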
⏳ Time and Space Complexity
Time: O(m * n)
Space: O(m * n) (can be reduced to O(min(m, n)) by keeping only two rows of the table)
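The following is a sketch of the two-row optimization. It assumes unit costs for all three operations. Unit costs make the distance symmetric, so the sketch can swap the words to keep the rows over the shorter one. The name `edit_distance_two_rows` is illustrative.

```python
def edit_distance_two_rows(word1, word2):
    """Levenshtein distance using O(min(m, n)) extra space."""
    # Unit-cost edit distance is symmetric, so make word2 the shorter word.
    if len(word2) > len(word1):
        word1, word2 = word2, word1
    prev = list(range(len(word2) + 1))  # row 0: j insertions
    for i in range(1, len(word1) + 1):
        curr = [i] + [0] * len(word2)  # column 0: i deletions
        for j in range(1, len(word2) + 1):
            if word1[i - 1] == word2[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j],      # delete
                                  curr[j - 1],  # insert
                                  prev[j - 1])  # replace
        prev = curr
    return prev[-1]


print(edit_distance_two_rows("kitten", "sitting"))  # 3
```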
Applications
Spell checkers
DNA sequence comparison
Natural Language Processing (NLP)
Maximum Sum Subarray (Kadane’s Algorithm)
Problem:
Given an array of integers, find the contiguous subarray (containing at least one number) which has
the largest sum.
🔎 Example:
```text
Input: [-2, 1, -3, 4, -1, 2, 1, -5, 4]
Output: 6
Explanation: [4, -1, 2, 1] has the largest sum = 6
```
✅ Optimal Solution: Kadane’s Algorithm
Idea:
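Keep track of the sum of the best subarray ending at the current element (current_sum).
Update the maximum sum (max_sum) seen so far.
If current_sum drops below 0, reset it to 0. A negative running sum can only lower whatever follows, so the next subarray should start fresh. The expression current_sum = max(num, current_sum + num) encodes the same rule.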
📝 Code (Python):
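```python
def max_subarray_sum(arr):
    max_sum = float('-inf')  # best sum found so far
    current_sum = 0          # best sum of a subarray ending at the current element
    for num in arr:
        # Either extend the previous subarray or start a new one at num
        current_sum = max(num, current_sum + num)
        max_sum = max(max_sum, current_sum)
    return max_sum

# Example
arr = [-2, 1, -3, 4, -1, 2, 1, -5, 4]
print(max_subarray_sum(arr))  # Output: 6
```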
🧠 Optional: Track the Subarray Too
```python
def max_subarray_with_indices(arr):
    max_sum = float('-inf')
    current_sum = 0
    start = end = s = 0  # s = start index of the current candidate subarray
    for i in range(len(arr)):
        current_sum += arr[i]
        if current_sum > max_sum:
            # New best: record the current candidate's bounds
            max_sum = current_sum
            start = s
            end = i
        if current_sum < 0:
            # A negative running sum cannot help; restart after i
            current_sum = 0
            s = i + 1
    return max_sum, arr[start:end + 1]

print(max_subarray_with_indices(arr))  # Output: (6, [4, -1, 2, 1])
```
⏳ Time and Space Complexity
Time: O(n)
Space: O(1)
📌 Variants
Max sum circular subarray
Max product subarray
2D max sum submatrix
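For the circular variant, one common approach runs Kadane's algorithm twice. The best subarray either does not wrap, which is ordinary Kadane, or it wraps around the end. A wrapping subarray equals the total sum minus a minimum-sum subarray. The following is a sketch of that approach, and the name `max_circular_subarray_sum` is illustrative. If every element is negative, the wrapping case would leave an empty subarray, so the sketch returns the plain maximum instead.

```python
def max_circular_subarray_sum(arr):
    """Maximum sum of a non-empty subarray when arr is treated as circular."""
    total = 0
    cur_max, best_max = 0, float('-inf')
    cur_min, best_min = 0, float('inf')
    for num in arr:
        total += num
        # Kadane for the maximum-sum subarray (no wrap-around).
        cur_max = max(num, cur_max + num)
        best_max = max(best_max, cur_max)
        # Kadane for the minimum-sum subarray; removing it leaves a wrapping subarray.
        cur_min = min(num, cur_min + num)
        best_min = min(best_min, cur_min)
    if best_max < 0:
        # All elements are negative: the wrapping case would take nothing.
        return best_max
    return max(best_max, total - best_min)


print(max_circular_subarray_sum([5, -3, 5]))    # 10
print(max_circular_subarray_sum([-3, -2, -3]))  # -2
```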