
Proficient Computer Network Assignment Help

This document discusses algorithms for solving string matching problems. It begins by describing a naive O(nm) algorithm to check for matches between a pattern string P and source string S. It then presents a more efficient algorithm that converts the strings to polynomials, multiplies them using FFT-based polynomial multiplication, and extracts the match positions from the product in O(n log n) time. For the DNA string matching problem, it applies the same approach after encoding each DNA symbol as a pair of binary symbols. It also designs a COMBINE operation for B-trees that runs in O(|h1 − h2| + 1) time.

For any assignment-related queries, call us at +1 678 648 4277.

You can email us at [email protected] or
reach us at www.computernetworkassignmenthelp.com.


Problem
Problem 1.
Suppose you are given a source string S[0 . . n − 1] of length n, consisting
of symbols a and b. Suppose further that you are given a pattern string
P[0 . . m − 1] of length m ≪ n, consisting of symbols a, b, and ∗, representing a
pattern to be found in string S. The symbol ∗ is a “wild card” symbol,
which matches a single symbol, either a or b. The other symbols must
match exactly.
The problem is to output a sorted list M of valid “match positions”,
which are positions j in S such that pattern P matches the substring S[j . .
j + |P| − 1]. For example, if S = a b a b b a b and P = a b ∗, then the
output M should be [0, 2].
(a) Describe a straightforward, naïve algorithm to solve the problem. Your
algorithm should run in time O(nm).

(b) Give an algorithm to solve the problem by reducing it to the problem of
polynomial multiplication. Specifically, describe how to convert strings
S and P into polynomials such that the product of the polynomials allows
you to determine the answer M. Give examples to illustrate your
polynomial representation of the inputs and your way of determining
outputs from the product, based on the example S and P strings given
above.
(c) Suppose you combine your solution to Part (b) with an FFT
algorithm for polynomial multiplication, as presented in Lecture 3. What
is the time complexity of the resulting solution to the string matching
problem?
(d) Now consider the same problem but with a larger symbol alphabet.
Specifically, suppose you are given a representation of a DNA strand as a
string D[0 . . n−1] of length n, consisting of symbols A, C, G, and T; and
you are given a pattern string P[0 . . m − 1] of length m ≪ n, consisting of
symbols A, C, G, T, and ∗. The problem is, again, to output a sorted list
M of valid “match positions”, which are positions j in D such that pattern
P matches the substring D[j . . j + |P| − 1]. For example, if D = A C G A C
C A T and P = A C ∗ A, then the output M should be [0, 3].

Based on your solutions to Parts (b) and (c), give an efficient algorithm
for this setting. Illustrate your algorithm on the example above.

Problem 2. Combining B-trees

Consider a new B-tree operation COMBINE(T1, T2, k). This operation
takes as input two B-trees T1 and T2 with the same minimum degree
parameter t, plus a new key k that does not appear in either T1 or T2. We
assume that all the keys in T1 are strictly smaller than k and all the keys
in T2 are strictly larger than k. The COMBINE operation produces a
new B-tree T, with the same minimum degree t, whose keys are those in
T1, those in T2, plus k. In the process, it destroys the original trees T1
and T2.

In this problem, you will design an algorithm to implement the
COMBINE operation. Your algorithm should run in time O(|h1 − h2| +
1), where h1 and h2 are the heights of trees T1 and T2 respectively. In
analyzing the costs, you should regard t as a constant.

(a) First consider the special case of the problem in which h1 is
assumed to be equal to h2. Give an algorithm to combine the trees
that runs in constant time.

(b) Consider another special case, in which h1 is assumed to be
exactly equal to h2 + 1. Give a constant-time algorithm to
combine the trees.

(c) Now consider the more general case in which h1 and h2 are
arbitrary. Because the algorithm must work in such a small
amount of time, and must work for arbitrary heights, a first step is
to develop a new kind of augmented B-tree data
structure in which each node x always carries information about the
height of the subtree below x. Describe how to augment the common
B-tree insertion and deletion operations to maintain this information,
while still maintaining the asymptotic time complexity of all
operations.

(d) Now give an algorithm for combining two B-trees T1 and T2, in
the general case where h1 and h2 are arbitrary. Your algorithm should
run in time O(|h1−h2|+1).

Solution
Problem 1. Pattern Matching

Suppose you are given a source string S[0 . . n − 1] of length n,
consisting of symbols a and b. Suppose further that you are given a
pattern string P[0 . . m − 1] of length m ≪ n, consisting of symbols a, b,
and ∗, representing a pattern to be found in string S. The symbol ∗ is a
“wild card” symbol, which matches a single symbol, either a or b. The
other symbols must match exactly.

The problem is to output a sorted list M of valid “match positions”,
which are positions j in S such that pattern P matches the substring
S[j . . j + |P| − 1]. For example, if S = a b a b b a b and P = a b ∗,
then the output M should be [0, 2].

(a) Describe a straightforward, naïve algorithm to solve the problem.
Your algorithm should run in time O(nm).

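Solution: One can explicitly check every possible starting position
s ∈ {0, 1, . . . , n − m} by checking whether each entry of P matches the
corresponding entry of S from s to s + m − 1, where ∗ matches either
symbol. There are n − m + 1 starting positions and each check takes O(m)
time, so the total running time is O(nm).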
NAIVE-ALGORITHM(S, P)
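A minimal Python sketch of the naive algorithm, assuming the strings are ordinary Python strings written without spaces and that ∗ is typed as the ASCII character '*'. The function name naive_match is illustrative, not taken from the original.

```python
def naive_match(s: str, p: str):
    """Return every j such that p matches s[j : j + len(p)]; '*' matches any symbol."""
    n, m = len(s), len(p)
    matches = []
    for j in range(n - m + 1):  # the n - m + 1 candidate starting positions
        # O(m) comparison of the pattern against the window starting at j
        if all(pc == "*" or pc == s[j + i] for i, pc in enumerate(p)):
            matches.append(j)
    return matches


print(naive_match("ababbab", "ab*"))  # [0, 2]
```

The same function works unchanged for the DNA alphabet of Part (d), since it only compares characters for equality.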

(b) Give an algorithm to solve the problem by reducing it to the
problem of polynomial multiplication. Specifically, describe how to
convert strings S and P into polynomials such that the product of the
polynomials allows you to determine the answer M. Give examples to
illustrate your polynomial representation of the inputs and your way of
determining outputs from the product, based on the example S and P
strings given above.
Solution: Let’s represent a by 1, b by −1, and ∗ by 0. We will use these
representations instead of the original symbols in this solution. Notice
that P matches S starting from position j, 0 ≤ j ≤ n − m, if and only if
for every i, 0 ≤ i ≤ m − 1, either P[i] = 0 or S[j + i]P[i] = 1. Each
product S[j + i]P[i] is −1, 0, or 1, and it is 0 exactly when P[i] = 0,
so this is true if and only if

$$\sum_{i=0}^{m-1} S[j+i]\,P[i] = k,$$

where k is the number of non-∗ symbols in P.

We would like to express these summations as coefficients of a product
of polynomials. Let x be a variable. Represent S as

$$\sum_{i=0}^{n-1} S[i]\,x^i$$

and represent P as

$$\sum_{i=0}^{m-1} Q[i]\,x^i,$$

where each Q[i] = P[m − 1 − i]. Thus, we have reversed the order of the
coefficients in this last representation.

Suppose C is the product of the S polynomial and the P polynomial. Then
the coefficient of $x^{m-1+j}$ in C is

$$\sum_{i=0}^{m-1} S[j+i]\,Q[m-1-i] = \sum_{i=0}^{m-1} S[j+i]\,P[i].$$

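This is the same as the sum above. To obtain the output M, we simply
examine the coefficients of C, outputting position number j, 0 ≤ j ≤ n −
m, exactly if the coefficient of $x^{m-1+j}$ is equal to k, the total number of
non-∗ symbols in P. We output these in order of increasing j, as required. In
the example above, S = a b a b b a b is represented by

$$1 - x + x^2 - x^3 - x^4 + x^5 - x^6,$$

P = a b ∗ is represented (after reversal) by

$$-x + x^2,$$

and their product is

$$C = -x + 2x^2 - 2x^3 + 2x^4 - 2x^6 + 2x^7 - x^8.$$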

The number k is equal to 2, so the terms of interest are $2x^2$, $2x^4$, and
$2x^7$. These would yield j = 0, 2, 5, but 5 is ruled out because we are
only considering j ≤ n − m = 7 − 3 = 4.
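The reduction can be checked on the worked example with the Python sketch below. It deliberately uses a plain quadratic-time product (not FFT) so that the coefficients of C are easy to inspect; the helper names poly_mul and poly_match are hypothetical, and coefficient lists are indexed by the power of x.

```python
CODE = {"a": 1, "b": -1, "*": 0}  # encoding used in the solution


def poly_mul(p, q):
    """Schoolbook product of two coefficient lists (index = power of x)."""
    c = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            c[i + j] += pi * qj
    return c


def poly_match(s: str, p: str):
    """Return (coefficients of C, match positions M)."""
    n, m = len(s), len(p)
    s_poly = [CODE[ch] for ch in s]            # S(x) = sum of S[i] x^i
    q_poly = [CODE[ch] for ch in reversed(p)]  # Q[i] = P[m - 1 - i]
    k = sum(ch != "*" for ch in p)             # number of non-* symbols
    c = poly_mul(s_poly, q_poly)
    # j matches iff the coefficient of x^(m - 1 + j) equals k, for 0 <= j <= n - m
    return c, [j for j in range(n - m + 1) if c[m - 1 + j] == k]


c, matches = poly_match("ababbab", "ab*")
print(c)        # [0, -1, 2, -2, 2, 0, -2, 2, -1]
print(matches)  # [0, 2]
```

The coefficient 2 at index 7 is the spurious j = 5 case discussed above; the range bound on j excludes it.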

(c) Suppose you combine your solution to Part (b) with an FFT algorithm
for polynomial multiplication, as presented in Lecture 3. What is the time
complexity of the resulting solution to the string matching problem?

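Solution: It’s O(n lg n). After padding both coefficient vectors to a power
of two at least n + m − 1 < 2n, it takes O(n lg n) time to compute the two
forward DFTs and the inverse DFT of their pointwise product, and O(n) time
to produce the inputs for the DFT and to extract M from the output.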
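To make the O(n lg n) bound concrete, here is a self-contained Python sketch that swaps the quadratic product for a textbook recursive radix-2 FFT. It is not necessarily the FFT variant from Lecture 3; the names fft, multiply, and fft_match are hypothetical, and the final rounding assumes the coefficients are small integers so floating-point error stays well below 0.5.

```python
import cmath

CODE = {"a": 1, "b": -1, "*": 0}


def fft(a, invert=False):
    """Recursive radix-2 DFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for i in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * i / n)  # twiddle factor
        out[i] = even[i] + w * odd[i]
        out[i + n // 2] = even[i] - w * odd[i]
    return out


def multiply(p, q):
    """Integer polynomial product via FFT in O(N log N), N = len(p) + len(q)."""
    need = len(p) + len(q) - 1
    size = 1
    while size < need:  # pad to a power of two
        size *= 2
    fp = fft([complex(x) for x in p] + [0j] * (size - len(p)))
    fq = fft([complex(x) for x in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The inverse DFT needs a 1/size factor; rounding removes floating-point noise.
    return [round(v.real / size) for v in prod[:need]]


def fft_match(s: str, p: str):
    n, m = len(s), len(p)
    if m > n:
        return []
    c = multiply([CODE[ch] for ch in s], [CODE[ch] for ch in reversed(p)])
    k = sum(ch != "*" for ch in p)
    return [j for j in range(n - m + 1) if c[m - 1 + j] == k]


print(fft_match("ababbab", "ab*"))  # [0, 2]
```

In practice a library FFT would replace the recursive one; the reduction and the extraction of M stay the same.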

(d) Now consider the same problem but with a larger symbol alphabet.
Specifically, suppose you are given a representation of a DNA strand as
a string D[0 . . n−1] of length n, consisting of symbols A, C, G, and T;
and you are given a pattern string P[0 . . m − 1] of length m ≪ n,
consisting of symbols A, C, G, T, and ∗.

The problem is, again, to output a sorted list M of valid “match
positions”, which are positions j in D such that pattern P matches the
substring D[j . . j + |P| − 1]. For example, if D = A C G A C C A T and P
= A C ∗ A, then the output M should be [0, 3].

Based on your solutions to Parts (b) and (c), give an efficient algorithm
for this setting. Illustrate your algorithm on the example above.

Solution: Use a reduction. Encode A as a a, C as a b, G as b a, T as b b,
and ∗ as ∗ ∗. Use our previous solution to solve the problem on the
resulting string, obtaining a list M' of positions.

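The final output list M will consist of just the even numbers from the
list M′, all divided by 2. Odd positions in M′ are discarded because a
match there straddles the two-symbol codewords of D. On the example, D
becomes aa ab ba aa ab ab aa bb and P becomes aa ab ∗∗ aa; the binary
solution yields M′ = [0, 6], so M = [0, 3].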

This takes the time to convert the strings, plus the time to solve the
original problem on arrays of length 2n and 2m respectively:

$$O(n) + O(2n \lg(2n)) = O(n \lg n).$$
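The DNA reduction can be sketched in Python as follows. For brevity, the binary solver repeats the polynomial encoding from Part (b) with a quadratic product; substituting an FFT-based product gives the O(n lg n) bound. The names ENCODE, binary_match, and dna_match are hypothetical.

```python
# Two-symbol codewords from the solution; '*' maps to two wildcards.
ENCODE = {"A": "aa", "C": "ab", "G": "ba", "T": "bb", "*": "**"}
CODE = {"a": 1, "b": -1, "*": 0}


def binary_match(s: str, p: str):
    """Part (b) solver over {a, b, *}; a quadratic product stands in for the FFT."""
    n, m = len(s), len(p)
    if m > n:
        return []
    sv = [CODE[ch] for ch in s]
    qv = [CODE[ch] for ch in reversed(p)]
    k = sum(ch != "*" for ch in p)
    c = [0] * (n + m - 1)
    for i, x in enumerate(sv):
        for j, y in enumerate(qv):
            c[i + j] += x * y
    return [j for j in range(n - m + 1) if c[m - 1 + j] == k]


def dna_match(d: str, p: str):
    s2 = "".join(ENCODE[ch] for ch in d)  # length 2n
    p2 = "".join(ENCODE[ch] for ch in p)  # length 2m
    m_prime = binary_match(s2, p2)
    # Odd positions straddle two codewords: keep only even ones and halve them.
    return [j // 2 for j in m_prime if j % 2 == 0]


print(binary_match("aaabbaaaababaabb", "aaab**aa"))  # [0, 6]
print(dna_match("ACGACCAT", "AC*A"))                 # [0, 3]
```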

Problem 2. Combining B-trees

Consider a new B-tree operation COMBINE(T1, T2, k). This operation
takes as input two B-trees T1 and T2 with the same minimum degree
parameter t, plus a new key k that does not appear in either T1 or T2.
We assume that all the keys in T1 are strictly smaller than k and all the
keys in T2 are strictly larger than k. The COMBINE operation
produces a new B-tree T, with the same minimum degree t, whose keys
are those in T1, those in T2, plus k. In the process, it destroys the
original trees T1 and T2.

In this problem, you will design an algorithm to implement the
COMBINE operation. Your algorithm should run in time O(|h1 − h2|
+ 1), where h1 and h2 are the heights of trees T1 and T2 respectively.
In analyzing the costs, you should regard t as a constant.

(a) First consider the special case of the problem in which h1 is
assumed to be equal to h2. Give an algorithm to combine the trees
that runs in constant time.

Solution: Construct a new root node for T by concatenating the keys and
child pointers of the roots of T1 and T2, with the root of T1 at the left,
and with k inserted between the keys of the two original roots. If the
resulting root node has at least 2t − 1 keys, split it around its median,
forming a new root node containing one key and two child nodes
containing at least t − 1 keys apiece. The merged root holds at most
4t − 1 keys, so all of this takes O(t) = O(1) time.

COMBINE(T1, T2, k)
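A minimal Python sketch of this equal-height case, assuming an in-memory node with a sorted `keys` list, a `children` list (empty for leaves), and the `height` field from Part (c). The names Node, split, combine_equal, and inorder are hypothetical; "destroying" T1 and T2 here simply means their child subtrees are reused.

```python
class Node:
    """B-tree node; height is 0 for leaves."""
    def __init__(self, keys, children=None, height=0):
        self.keys = list(keys)
        self.children = list(children or [])
        self.height = height


def split(node):
    """Split a node around its median key; returns (left, median, right)."""
    mid = len(node.keys) // 2
    left = Node(node.keys[:mid], node.children[:mid + 1], node.height)
    right = Node(node.keys[mid + 1:], node.children[mid + 1:], node.height)
    return left, node.keys[mid], right


def combine_equal(r1, r2, k, t):
    """COMBINE when both roots have the same height: O(t) = O(1) work."""
    root = Node(r1.keys + [k] + r2.keys, r1.children + r2.children, r1.height)
    if len(root.keys) >= 2 * t - 1:
        left, median, right = split(root)
        root = Node([median], [left, right], root.height + 1)
    return root


def inorder(node):
    """All keys of the subtree in sorted order (for checking the result)."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i]) + [key]
    return out + inorder(node.children[-1])


t1 = Node([2], [Node([1]), Node([3])], height=1)
t2 = Node([6], [Node([5]), Node([7])], height=1)
root = combine_equal(t1, t2, 4, t=2)
print(root.keys, root.height, inorder(root))  # [4] 2 [1, 2, 3, 4, 5, 6, 7]
```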

(b) Consider another special case, in which h1 is assumed to be
exactly equal to h2 + 1. Give a constant-time algorithm to
combine the trees.
Solution: Append k to the right end of the rightmost child x of the root
of T1, and append the keys and children of the root of T2 after that.
Clearly this preserves sorted order. Now x may have anywhere from t + 1
to 4t − 1 keys. If it has 2t − 1 or more keys, then split it around its
median key, moving the median up into the root. If that causes the root
node to have 2t − 1 or more keys, then split the root around its median
key, thus adding another level to the tree.

COMBINE(T1, T2, k)
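A minimal Python sketch of the h1 = h2 + 1 case, using the same hypothetical in-memory representation as before (sorted `keys`, `children` empty for leaves, stored `height`). Only the root of T1 and its rightmost child are touched, so the work is O(t) = O(1).

```python
class Node:
    def __init__(self, keys, children=None, height=0):
        self.keys = list(keys)
        self.children = list(children or [])
        self.height = height


def split(node):
    """Split a node around its median key; returns (left, median, right)."""
    mid = len(node.keys) // 2
    left = Node(node.keys[:mid], node.children[:mid + 1], node.height)
    right = Node(node.keys[mid + 1:], node.children[mid + 1:], node.height)
    return left, node.keys[mid], right


def combine_one_taller(r1, r2, k, t):
    """COMBINE when r1.height == r2.height + 1."""
    x = r1.children[-1]        # rightmost child of T1's root, same height as r2
    x.keys += [k] + r2.keys    # k, then the keys of T2's root
    x.children += r2.children  # T2's subtrees hang to the right
    if len(x.keys) >= 2 * t - 1:
        left, median, right = split(x)
        r1.keys.append(median)
        r1.children[-1:] = [left, right]
        if len(r1.keys) >= 2 * t - 1:  # root overflow: grow by one level
            left, median, right = split(r1)
            r1 = Node([median], [left, right], r1.height + 1)
    return r1


def inorder(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i]) + [key]
    return out + inorder(node.children[-1])


t1 = Node([3], [Node([1, 2]), Node([4, 5])], height=1)
root = combine_one_taller(t1, Node([9]), 7, t=2)
print(root.keys, inorder(root))  # [3, 7] [1, 2, 3, 4, 5, 7, 9]
```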

(c) Now consider the more general case in which h1 and h2 are arbitrary.
Because the algorithm must work in such a small amount of time, and
must work for arbitrary heights, a first step is to develop a new kind of
augmented B-tree data structure in which each node x always carries
information about the height of the subtree below x. Describe how to
augment the common B-tree insertion and deletion operations to maintain
this information, while still maintaining the asymptotic time complexity of
all operations.

Solution: Augment the tree by adding a height attribute for each node.
The height of a leaf node is 0. For internal nodes, HEIGHT(x) =
HEIGHT(x.c1) + 1.
Insertion: New nodes are added during splitting. The newly allocated
node in a split has the same height as the original node. The only other
time a node is added is when the root is split. This is done by making the
root the child of a dummy node and then splitting it. The height of the
new root is set to one greater than the height of the old root.

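Deletion: The only node removed from the top of the tree is the root:
when it becomes empty, its single child becomes the new root, and that
child already stores its correct height. Every other step either moves
keys between siblings or merges two siblings, which lie at the same
height, into one node of that height. Since heights are measured upward
from the leaves, none of these steps changes the height stored at any
remaining node.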

With these additions, the asymptotic running time for insertion and
deletion is the same as before, O(lg n).
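A minimal Python sketch of the insertion half of this augmentation, assuming the proactive-splitting insertion procedure from CLRS. The class and function names (Node, BTree, _split_child, heights_ok) are hypothetical, and deletion is not shown, since it only needs to leave the stored heights unchanged.

```python
import bisect


class Node:
    def __init__(self, leaf=True, height=0):
        self.keys = []
        self.children = []
        self.leaf = leaf
        self.height = height  # 0 for leaves; HEIGHT(x) = HEIGHT(child) + 1


class BTree:
    """B-tree insertion (splitting full nodes on the way down) with heights."""
    def __init__(self, t):
        self.t = t
        self.root = Node()

    def _split_child(self, x, i):
        t, y = self.t, x.children[i]
        z = Node(leaf=y.leaf, height=y.height)  # new sibling: same height as y
        median = y.keys[t - 1]
        z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
        if not y.leaf:
            z.children, y.children = y.children[t:], y.children[:t]
        x.keys.insert(i, median)
        x.children.insert(i + 1, z)

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:  # full root: new root one level higher
            s = Node(leaf=False, height=r.height + 1)
            s.children.append(r)
            self.root = s
            self._split_child(s, 0)
        x = self.root
        while not x.leaf:
            i = bisect.bisect_left(x.keys, k)
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)


def heights_ok(node):
    """Check HEIGHT(leaf) = 0 and HEIGHT(x) = HEIGHT(child) + 1 everywhere."""
    if node.leaf:
        return node.height == 0
    return all(c.height == node.height - 1 and heights_ok(c) for c in node.children)


tree = BTree(t=2)
for key in range(1, 101):
    tree.insert(key)
print(heights_ok(tree.root))  # True
```

Only the two places where nodes are allocated set a height; every other step of insertion leaves heights alone.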

(d) Now give an algorithm for combining two B-trees T1 and T2, in the
general case where h1 and h2 are arbitrary. Your algorithm should run in
time O(|h1−h2|+1).

Solution:

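If |h1 − h2| < 2, use part (a) or (b). Otherwise, assume that h1 > h2 +
1 (the case h2 > h1 + 1 is symmetric, using the leftmost node of T2 at
height h1). Let x be the rightmost node of T1 at height h2, found by
following rightmost child pointers down from the root and using the
stored heights to stop at the right level. Add k at the right end of x
and append the keys and children of the root of T2 after that. Now node
x may have anywhere from t + 1 to 4t − 1 keys. If it has 2t − 1 or more
keys, then split it around its median key, moving the median into the
parent of x. The split may propagate upwards, possibly as far as the
root. Both the descent and the chain of splits visit O(h1 − h2) nodes at
constant cost each, so the total time is O(|h1 − h2| + 1).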

COMBINE(T1, T2, k)
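A minimal Python sketch of the general case, again assuming a hypothetical in-memory Node with sorted `keys`, `children` (empty for leaves), and the stored `height` from Part (c); both trees are assumed non-empty. The descent keeps a stack of ancestors so that splits can propagate upward in O(|h1 − h2| + 1) steps, and the equal-height and one-level cases of Parts (a) and (b) fall out as special cases.

```python
class Node:
    def __init__(self, keys, children=None, height=0):
        self.keys = list(keys)
        self.children = list(children or [])
        self.height = height


def split(node):
    """Split a node around its median key; returns (left, median, right)."""
    mid = len(node.keys) // 2
    left = Node(node.keys[:mid], node.children[:mid + 1], node.height)
    right = Node(node.keys[mid + 1:], node.children[mid + 1:], node.height)
    return left, node.keys[mid], right


def combine(r1, r2, k, t):
    """COMBINE(T1, T2, k): all keys of T1 < k < all keys of T2."""
    path = []  # ancestors of the merge point x
    if r1.height == r2.height:  # Part (a): merge the two roots
        root = x = Node(r1.keys + [k] + r2.keys, r1.children + r2.children, r1.height)
    elif r1.height > r2.height:  # walk down T1's right spine to height h2
        root = x = r1
        while x.height > r2.height:
            path.append(x)
            x = x.children[-1]
        x.keys += [k] + r2.keys
        x.children += r2.children
    else:  # symmetric: walk down T2's left spine to height h1
        root = x = r2
        while x.height > r1.height:
            path.append(x)
            x = x.children[0]
        x.keys = r1.keys + [k] + x.keys
        x.children = r1.children + x.children
    while len(x.keys) >= 2 * t - 1:  # split and propagate upward
        left, median, right = split(x)
        if not path:  # x is the root: grow by one level
            return Node([median], [left, right], x.height + 1)
        parent = path.pop()
        i = parent.children.index(x)  # first or last child, O(t) scan
        parent.keys.insert(i, median)
        parent.children[i:i + 1] = [left, right]
        x = parent
    return root


def inorder(node):
    if not node.children:
        return list(node.keys)
    out = []
    for i, key in enumerate(node.keys):
        out += inorder(node.children[i]) + [key]
    return out + inorder(node.children[-1])


t1 = Node([8], [Node([4], [Node([2]), Node([6])], 1),
                Node([12, 15], [Node([10]), Node([13]), Node([16, 17])], 1)], 2)
root = combine(t1, Node([20]), 18, t=2)
print(root.keys, inorder(root))
# [8, 15] [2, 4, 6, 8, 10, 12, 13, 15, 16, 17, 18, 20]
```

In this example the split at the leaf propagates one level up and stops at the root of T1, so the tree keeps height 2.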
