Module V

Uploaded by

nayankonar
Copyright
© All Rights Reserved

**PGCSE104: Advanced Algorithms

Module V - (4L)
Set and String Problems**

In this module, we explore important problems related to sets and strings, focusing on
optimization techniques and algorithms.

1. Set Cover Problem


The Set Cover Problem is a classical optimization problem where we aim to cover all elements of a
universal set with the minimum number of subsets from a given collection.

Problem Statement

Given a universe $U$ and a collection $S = \{S_1, S_2, \ldots, S_m\}$ of subsets of $U$, find the minimum number of subsets from $S$ whose union equals $U$.

Approach
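The problem is NP-hard, so no polynomial-time exact algorithm is known, but a greedy algorithm provides an approximate solution. At each step, the greedy approach selects the subset that covers the most still-uncovered elements of U. The resulting cover is at most H(n) ≤ ln n + 1 times larger than an optimal one, where n = |U| and H(n) is the n-th harmonic number.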

Greedy Algorithm
- Initialize the set of covered elements as empty.
- While there are uncovered elements, select the subset that covers the maximum number of uncovered elements and mark its elements as covered.
- Return the selected subsets once all elements are covered.

Code Example (Python)

```python
def set_cover(universe, subsets):
    uncovered = set(universe)  # elements still waiting to be covered
    selected_subsets = []
    while uncovered:
        # Choose the subset that covers the most uncovered elements
        subset = max(subsets, key=lambda s: len(s & uncovered))
        if not subset & uncovered:
            # No subset makes progress, so the collection cannot cover the universe
            raise ValueError("subsets do not cover the universe")
        selected_subsets.append(subset)
        uncovered -= subset
    return selected_subsets


# Example usage
universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5}]
solution = set_cover(universe, subsets)
print("Selected subsets:", solution)
```

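
Because the greedy rule is only an approximation, it can pick more subsets than necessary. The sketch below compares it with an exhaustive search on a small instance chosen for illustration. The function names `greedy_cover` and `optimal_cover` and the instance itself are assumptions made for this example, and the exhaustive search is exponential in the number of subsets, so it is only usable on tiny inputs.

```python
from itertools import combinations


def greedy_cover(universe, subsets):
    """Greedy heuristic: repeatedly take the subset covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not best & uncovered:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen


def optimal_cover(universe, subsets):
    """Exhaustive search: try every combination of k subsets for k = 0, 1, 2, ..."""
    target = set(universe)
    for k in range(len(subsets) + 1):
        for combo in combinations(subsets, k):
            if set().union(*combo) >= target:
                return list(combo)
    raise ValueError("subsets do not cover the universe")


# Illustrative instance: the largest subset is a trap for the greedy rule.
universe = {1, 2, 3, 4, 5, 6}
subsets = [{1, 2, 3}, {4, 5, 6}, {2, 3, 4, 5}]

print("Greedy picks", len(greedy_cover(universe, subsets)), "subsets")
print("Optimum uses", len(optimal_cover(universe, subsets)), "subsets")
```

On this instance the greedy rule first takes {2, 3, 4, 5} and then needs both remaining subsets, while {1, 2, 3} and {4, 5, 6} alone already cover the universe.
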
2. String Matching
String Matching refers to the problem of finding one or more occurrences of a pattern string
within a larger text string.

Naive String Matching Algorithm


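The simplest way to solve this problem is the naive algorithm, which slides the pattern over the
text one character at a time and checks for a match at each position. For a text of length n and a
pattern of length m, this takes O((n − m + 1) × m) character comparisons in the worst case.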

Code Example (Python)

```python
def naive_string_matching(text, pattern):
    n = len(text)
    m = len(pattern)
    occurrences = []
    for i in range(n - m + 1):
        if text[i:i+m] == pattern:
            occurrences.append(i)
    return occurrences


# Example usage
text = "abracadabra"
pattern = "abra"
print("Pattern found at positions:", naive_string_matching(text, pattern))
```

KMP Algorithm (Knuth-Morris-Pratt)


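The KMP algorithm is an efficient string matching algorithm that preprocesses the pattern to
avoid unnecessary comparisons. It uses a partial match table (also called the "lps" array), where
lps[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it. After a
mismatch, the table tells the algorithm how far to shift the pattern without re-examining text
characters that have already been matched, which gives a total running time of O(n + m).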
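
A minimal sketch of KMP is shown below, in the same style as the naive example. The function names `compute_lps` and `kmp_search` are chosen here for illustration, and treating an empty pattern as having no occurrences is an assumption of this sketch rather than part of the algorithm.

```python
def compute_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    lps = [0] * len(pattern)
    length = 0  # length of the current matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter prefix that is also a suffix
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return the 0-based start positions of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption of this sketch: an empty pattern has no occurrences
    lps = compute_lps(pattern)
    occurrences = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]  # shift the pattern without moving back in the text
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            occurrences.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping occurrences
    return occurrences


# Example usage
print("lps table:", compute_lps("abra"))
print("Pattern found at positions:", kmp_search("abracadabra", "abra"))
```

The text index `i` never moves backwards; only `j` falls back through the lps table, which is where the linear running time comes from.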

3. Approximate String Matching


Approximate String Matching (also known as fuzzy string matching) is the problem of finding
substrings that match a pattern approximately, allowing for some mismatches or errors (insertions,
deletions, or substitutions).

Dynamic Programming Approach


The most common way to solve this problem is to use dynamic programming to compute
the edit distance (Levenshtein distance), which is the minimum number of operations (insertions,
deletions, or substitutions) required to convert one string into another.
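
The definition above can be written as a recurrence. In this sketch, `dp[i][j]` denotes the edit distance between the first i characters of s1 and the first j characters of s2; the strings are indexed from 0, so the i-th character of s1 is `s1[i-1]`.

```latex
dp[i][0] = i, \qquad dp[0][j] = j,
\qquad
dp[i][j] =
\begin{cases}
dp[i-1][j-1] & \text{if } s_1[i-1] = s_2[j-1] \\
1 + \min\bigl(dp[i-1][j],\ dp[i][j-1],\ dp[i-1][j-1]\bigr) & \text{otherwise}
\end{cases}
```

The three terms inside the minimum correspond to deleting a character from s1, inserting a character into s1, and substituting one character for another.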

Code Example (Python)

```python
def edit_distance(s1, s2):
    n = len(s1)
    m = len(s2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1]
            else:
                dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
    return dp[n][m]


# Example usage
s1 = "kitten"
s2 = "sitting"
print("Edit distance:", edit_distance(s1, s2))
```

This algorithm runs in O(n × m) time and uses O(n × m) space, where n and m are the lengths of the two strings.
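
The edit distance above compares two whole strings. To find substrings of a text that approximately match a pattern, one common variant of the same table (often attributed to Sellers) lets a match start at any text position at no cost. The sketch below illustrates that variant; the function name `approximate_matches` and the error threshold `k` are assumptions introduced for this example.

```python
def approximate_matches(text, pattern, k):
    """Return end positions j such that some substring of text ending just
    before index j is within edit distance k of pattern."""
    m = len(pattern)
    # prev[i] = best edit distance between pattern[:i] and a substring of text
    # ending at the previous text position (initially: the empty text prefix)
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] = 0: a match may start anywhere for free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


# Example usage: with k = 0 this reduces to exact matching
print("Exact match end positions:", approximate_matches("abracadabra", "abra", 0))
print("End positions with one error allowed:", approximate_matches("abxa", "abra", 1))
```

Only two rows of the table are kept at a time, so the sketch uses O(m) extra space while still taking O(n × m) time.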

4. Longest Common Subsequence (LCS)


The Longest Common Subsequence (LCS) problem is a classic dynamic programming problem
where we seek to find the longest subsequence common to two sequences. Unlike substrings,
subsequences are not required to occupy consecutive positions.

Problem Statement

Given two sequences X and Y , find the longest subsequence that appears in both sequences in
the same order (but not necessarily consecutively).

Dynamic Programming Approach

Let dp[i][j] represent the length of the LCS of the first i characters of X and the first j characters
of Y, with dp[i][0] = dp[0][j] = 0 because an empty prefix has an empty LCS. For i, j ≥ 1, the
recurrence relation is:

$$
dp[i][j] =
\begin{cases}
dp[i-1][j-1] + 1 & \text{if } X[i-1] = Y[j-1] \\
\max(dp[i-1][j],\ dp[i][j-1]) & \text{if } X[i-1] \neq Y[j-1]
\end{cases}
$$

Code Example (Python)

```python
def lcs(X, Y):
    m = len(X)
    n = len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]


# Example usage
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS:", lcs(X, Y))
```

Time Complexity

The time complexity of this algorithm is O(m × n), where m and n are the lengths of the two
sequences.
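
The table holds enough information to recover an actual longest common subsequence, not just its length. A minimal sketch follows: it fills the same table and then walks back from dp[m][n]. The function name `lcs_string` is chosen here for illustration, and when several subsequences share the maximum length it returns just one of them.

```python
def lcs_string(X, Y):
    """Return one longest common subsequence of X and Y as a string."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters
    i, j = m, n
    chars = []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # the LCS length came from dropping X[i-1]
        else:
            j -= 1  # the LCS length came from dropping Y[j-1]
    return "".join(reversed(chars))


# Example usage
print("LCS:", lcs_string("AGGTAB", "GXTXAYB"))
```

The backward walk takes at most m + n steps, so the overall time complexity remains O(m × n).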

Summary
In this module, we covered several key problems related to sets and strings:

- Set Cover: An NP-hard optimization problem, approximated using a greedy approach.
- String Matching: Finding exact occurrences of a pattern in a text using naive and efficient algorithms like KMP.
- Approximate String Matching: Finding close matches between strings using dynamic programming to compute edit distances.
- Longest Common Subsequence: A dynamic programming problem that finds the longest subsequence common to two sequences.

These problems have wide-ranging applications in optimization, data analysis, and computational
biology.
