Longest Common Subsequence Using LTDP and Rank Convergence: Course Code - CSE: 371
Abstract
1. Introduction
2. Hirschberg’s Algorithm
3. Linear Tropical Dynamic Programming
4. Parallel Implementation
5. Enhancement
6. Evaluation
7. Theoretical Analysis
8. References
Abstract
1. Introduction
The LCS is not necessarily unique; for example, the LCS of "ABC" and
"ACB" is both "AB" and "AC". Indeed, the LCS problem is often defined
as finding all common subsequences of maximum length. This
problem inherently has higher complexity, as the number of such
subsequences is exponential in the worst case, even for only two input
strings.
2. Hirschberg’s Algorithm
It is a linear-space recursive algorithm that uses the divide-and-conquer
approach to solve the LCS problem.
The main highlights of this algorithm are –
1. The key idea is to split one of the input strings, X, into two halves,
Xpart1 and Xpart2.
2. We then find the LCS lengths of (Xpart1, Y) going forward and of
(Xpart2rev, Yrev) going backward.
3. Then we find a position k in Y that maximizes the sum of these
forward and backward LCS lengths.
4. This k gives a suitable split position for the second string as well.
5. Thus our problem reduces to finding the answer for the two
smaller subproblems, as in the sketch below.
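A minimal sketch of this divide step, assuming 0-indexed std::string
inputs; the helper names lcsLengths and bestSplit are illustrative, not
from the original implementation:

    #include <algorithm>
    #include <string>
    #include <vector>

    // Last row of the LCS length table for x vs. y, in O(|y|) space.
    std::vector<int> lcsLengths(const std::string& x, const std::string& y) {
        std::vector<int> prev(y.size() + 1, 0), cur(y.size() + 1, 0);
        for (char cx : x) {
            for (size_t j = 1; j <= y.size(); ++j)
                cur[j] = (cx == y[j - 1]) ? prev[j - 1] + 1
                                          : std::max(prev[j], cur[j - 1]);
            std::swap(prev, cur);
        }
        return prev;
    }

    // Split position k in y that maximizes forward + backward LCS lengths
    // when x is cut at mid (Hirschberg's divide step).
    size_t bestSplit(const std::string& x, const std::string& y, size_t mid) {
        std::string xl = x.substr(0, mid), xr = x.substr(mid);
        std::reverse(xr.begin(), xr.end());
        std::string yr(y.rbegin(), y.rend());
        std::vector<int> fwd = lcsLengths(xl, y);   // forward pass
        std::vector<int> bwd = lcsLengths(xr, yr);  // backward pass
        size_t best = 0;
        int bestVal = -1;
        for (size_t k = 0; k <= y.size(); ++k)
            if (fwd[k] + bwd[y.size() - k] > bestVal) {
                bestVal = fwd[k] + bwd[y.size() - k];
                best = k;
            }
        return best; // recurse on (xl, y[0..k)) and (x[mid..), y[k..))
    }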
3. Linear Tropical Dynamic Programming
Each stage of LTDP can be viewed as a matrix-vector product in the
tropical semiring (max, +). In other words, the solution to subproblem j
in stage i of LTDP is given by the recurrence equation

    s_i[j] = max_k ( A_i[j][k] + s_{i-1}[k] )

where A_i is the transition matrix of stage i and s_{i-1} is the solution
vector of the previous stage.
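A minimal illustration of this recurrence (the type aliases and the
NEG_INF sentinel are assumptions for the sketch): one LTDP stage is
ordinary matrix-vector multiplication with + replaced by max and *
replaced by +.

    #include <algorithm>
    #include <climits>
    #include <vector>

    using Vec = std::vector<long long>;
    using Mat = std::vector<Vec>;
    const long long NEG_INF = LLONG_MIN / 4;  // semiring "zero", overflow-safe

    // One LTDP stage: s_i[j] = max_k (A_i[j][k] + s_{i-1}[k]).
    Vec tropicalMatVec(const Mat& A, const Vec& s) {
        Vec out(A.size(), NEG_INF);
        for (size_t j = 0; j < A.size(); ++j)
            for (size_t k = 0; k < s.size(); ++k)
                if (A[j][k] != NEG_INF && s[k] != NEG_INF)
                    out[j] = std::max(out[j], A[j][k] + s[k]);
        return out;
    }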
4. Parallel Implementation
In the parallel algorithm, as shown in Figure 2(b), processors Pa and Pb
assume arbitrary solution vectors and compute their parts of the matrix
based on these vectors. Hence, there is no interdependency between
the processors as in the serial version, so these processors can work in
parallel. Of course, the solutions for the stages computed by Pa and Pb
will start out completely wrong (shaded dark in the figure). However, if
rank convergence occurs, then these erroneous solution vectors will
eventually become parallel to the actual solution vectors (shaded gray
in the figure). Thus, Pa will generate some solution vector śa parallel to
sa, and Pb will generate some solution vector śb parallel to sb. In a
subsequent fix-up phase, shown in Figure 2(c), Pa uses the solution
vector computed by P0 and Pb uses śa computed by Pa to fix the stages
that are not parallel to the actual solution vector at that stage. After the
fix-up, the solution vectors at each stage are either the same as or
parallel to the actual solution vectors at those respective stages.
1. Forward Phase (Parallel)
The goal of the parallel forward phase is to compute a solution
vector s[i] at stage i that is parallel to the actual solution vector si,
as shown in the figure. During the execution of the algorithm, we
say that a stage i has converged if the s[i] computed by the algorithm
is parallel to its actual solution vector si.
When compared to the sequential algorithm, the parallel
algorithm must additionally store s[i] for each stage, which is required
to test for convergence in the fix-up loop. If space is a constraint, then
the fix-up loop can be modified to recompute s[i] in each iteration,
trading compute for space.
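A minimal sketch of the convergence test and the fix-up loop described
above, reusing Vec, Mat, and tropicalMatVec from the earlier sketch;
the function names are hypothetical:

    #include <vector>

    using Vec = std::vector<long long>;
    using Mat = std::vector<Vec>;
    Vec tropicalMatVec(const Mat& A, const Vec& s);  // from the earlier sketch

    // Two vectors are parallel in the (max, +) semiring when they differ
    // component-wise by a single additive constant.
    bool isParallel(const Vec& a, const Vec& b) {
        long long off = b[0] - a[0];
        for (size_t j = 1; j < a.size(); ++j)
            if (b[j] - a[j] != off) return false;
        return true;
    }

    // Fix-up loop run by one processor: replay its stages with the true
    // incoming vector from the previous processor. Once the stored vector
    // is parallel to the recomputed one, all later stages are parallel too,
    // so the loop can stop early.
    void fixUp(const std::vector<Mat>& stages, std::vector<Vec>& s, Vec incoming) {
        for (size_t i = 0; i < stages.size(); ++i) {
            Vec corrected = tropicalMatVec(stages[i], incoming);
            if (isParallel(corrected, s[i])) break;  // stage i has converged
            s[i] = corrected;
            incoming = s[i];
        }
    }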
The LCS matrix is computed in an anti-diagonal manner using the
following recurrence relation –

    L[i][j] = max( L[i-1][j-1] + m(i, j), L[i-1][j], L[i][j-1] )

where m(i, j) is 1 if the characters at the respective positions are the
same and 0 otherwise.
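A minimal sketch of this anti-diagonal sweep, assuming 0-indexed
strings; every cell on one anti-diagonal depends only on the two
previous diagonals, so the inner loop is the part that can run in parallel:

    #include <algorithm>
    #include <string>
    #include <vector>

    // Fills the LCS length table by sweeping anti-diagonals d = i + j.
    // Cells on one diagonal are mutually independent.
    std::vector<std::vector<int>> lcsAntiDiagonal(const std::string& x,
                                                  const std::string& y) {
        size_t n = x.size(), m = y.size();
        std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
        for (size_t d = 2; d <= n + m; ++d) {
            size_t iLo = d > m ? d - m : 1;
            size_t iHi = std::min(d - 1, n);
            for (size_t i = iLo; i <= iHi; ++i) {  // parallelizable loop
                size_t j = d - i;
                int match = (x[i - 1] == y[j - 1]) ? 1 : 0;
                L[i][j] = std::max({L[i - 1][j - 1] + match,
                                    L[i - 1][j], L[i][j - 1]});
            }
        }
        return L;
    }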
5. Enhancement
Further, the program is also implemented with OpenMPI, on both a
single-node and a multi-node cluster. The multi-node cluster results
were better than the single-node results.
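A minimal sketch of how the stages could be split across OpenMPI
ranks (the decomposition and the elided routine names here are
assumptions, not the report's actual code): each rank runs the forward
phase on its block of stages from an arbitrary starting vector, then the
fix-up phase passes corrected boundary vectors from rank to rank.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 1024;                     // solution-vector length (example)
        std::vector<long long> boundary(N, 0);  // arbitrary start is fine

        // Forward phase: every rank computes its block of stages independently.
        // ... forwardPhase(myStages, boundary) ...

        // Fix-up phase: receive the corrected vector from the previous rank,
        // repair local stages, then forward the corrected boundary onward.
        if (rank > 0)
            MPI_Recv(boundary.data(), N, MPI_LONG_LONG, rank - 1, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // ... fixUp(myStages, storedVectors, boundary) ...
        if (rank < size - 1)
            MPI_Send(boundary.data(), N, MPI_LONG_LONG, rank + 1, 0,
                     MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }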
6. Evaluation
The above graph (time vs. input size) shows the comparison between
the three different implementations.
It depicts the running time of the same algorithm when run serially on
a single machine, when run in a single-machine multi-threaded
environment, and when run in a multi-node environment. We can
clearly see that the serial implementation is the slowest, whereas the
other two have similar run times for input sizes up to 9000.
However, the multi-node implementation is expected to perform better
as the input size increases: in a multi-threaded environment, a larger
input requires more threads, which leads to poor performance because
of the limit on the maximum number of threads and the overhead
incurred in creating them.
7. Theoretical Analysis
Serial time: Ts = O(n²), since the full n × n LCS table must be filled.
Isoefficiency: O(p²)
8. References