Finite Automata Based Regular Expression Constrained Longest Common Subsequence
CSCI-589
Abstract. The paper Finite Automata Based Algorithms for the Generalized Constrained Longest Common Subsequence Problems [i] solves four problems: STR-IC-LCS, SEQ-IC-LCS, STR-EC-LCS and SEQ-EC-LCS. For the generalized constrained longest common subsequence (GC-LCS) of strings S1 and S2 with respect to P, its solutions run in $O(r(n+m) + nm)$ time for a fixed-size alphabet, where r, n and m are the lengths of P, S1 and S2, respectively. These problems and their solutions extend to the Regular Expression Constrained Longest Common Subsequence problem, and I show how the finite automata based approach applies to it.
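To make the four problem variants concrete, the sketch below uses the standard GC-LCS definitions (IC: the answer must include P; EC: it must exclude P; SEQ: as a subsequence; STR: as a contiguous substring) and solves them by brute-force enumeration. It is exponential and meant only as a reference oracle for tiny inputs, not the automata-based method of [i]; the function names are hypothetical.

```python
from itertools import combinations

def subsequences(s):
    """All distinct subsequences of s (exponential: tiny inputs only)."""
    return {"".join(s[k] for k in idx)
            for r in range(len(s) + 1)
            for idx in combinations(range(len(s)), r)}

def is_subsequence(p, t):
    """True if p is a subsequence of t (each 'in' consumes the iterator)."""
    it = iter(t)
    return all(ch in it for ch in p)

# Standard GC-LCS constraint variants: z is the candidate, p the constraint string.
CONSTRAINTS = {
    "SEQ-IC-LCS": lambda z, p: is_subsequence(p, z),      # includes P as a subsequence
    "STR-IC-LCS": lambda z, p: p in z,                    # includes P as a substring
    "SEQ-EC-LCS": lambda z, p: not is_subsequence(p, z),  # excludes P as a subsequence
    "STR-EC-LCS": lambda z, p: p not in z,                # excludes P as a substring
}

def gc_lcs_bruteforce(s1, s2, p, variant):
    """A longest common subsequence of s1, s2 meeting the variant's constraint, or None."""
    ok = CONSTRAINTS[variant]
    common = subsequences(s1) & subsequences(s2)
    feasible = [z for z in common if ok(z, p)]
    return max(feasible, key=len, default=None)
```

For S1 = abcd, S2 = acbd and P = ad, the answers have lengths 3 (SEQ-IC), 2 (STR-IC, the string ad), 2 (SEQ-EC) and 3 (STR-EC), which shows that the four constraints genuinely differ.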
$=$ all $2^m$ possible subsequences of $S_2$.
Step 4. Construct the intersection automaton $M_{1R}$ of $M_1$ and $M_R$. $L(M_{1R}) = L(M_1) \cap L(M_R)$ contains all the subsequences of $S_1$ that satisfy the regular expression constraint R. This step takes $O(nr)$ time and at most $O(nr)$ space.
Step 5. Construct the intersection automaton $M_{2R}$ of $M_2$ and $M_R$. $L(M_{2R}) = L(M_2) \cap L(M_R)$ contains all the subsequences of $S_2$ that satisfy the regular expression constraint R. This step takes $O(mr)$ time and at most $O(mr)$ space.
Step 6. Construct the intersection automaton $M_{12R}$ of $M_{1R}$ and $M_{2R}$. $L(M_{12R}) = L(M_{1R}) \cap L(M_{2R})$ contains all the common subsequences of $S_1$ and $S_2$ that satisfy the regular expression constraint R. This step takes $O(nmr^2)$ time and at most $O(nmr^2)$ space.
Step 7. Find the maximum-score path in $M_{12R}$. Following Dijkstra's algorithm for finding maximum paths, where each edge of the transition function for an edit operation $x \to y$ carries that operation's score as its weight, we obtain the maximum alignment, that is, the longest common subsequence with the maximum alignment score. The simplest implementation of Dijkstra's algorithm, backed by a binary heap, takes $O(nmr^2 \log(nmr^2))$ time. The total running time is therefore dominated by Step 7, at $O(nmr^2 \log(nmr^2))$.
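Steps 4 to 7 can be sketched in Python as follows, assuming R has already been compiled into a DFA (the regex-to-DFA construction of [iii] is not shown; the hand-written DFAs ENDS_WITH_C, for R = (a|b|c|d)*c, and ANY_STRING are hypothetical). $M_1$ and $M_2$ are subsequence automata, as in [iv]. Rather than materializing $M_{1R}$, $M_{2R}$ and $M_{12R}$ one after another, the sketch explores product states (i, j, q) lazily and gives every edge weight 1, so the best path is a longest constrained common subsequence rather than a general alignment score. Because every transition strictly advances in both $S_1$ and $S_2$, $M_{12R}$ is acyclic; the sketch therefore swaps Dijkstra's algorithm for memoized longest-path dynamic programming over that DAG.

```python
from functools import lru_cache

def subsequence_automaton(s):
    """State i means 'first i characters of s consumed'; nxt[i][c] is the
    smallest j > i with s[j-1] == c. Every state is accepting."""
    nxt = [dict() for _ in range(len(s) + 1)]
    for i in range(len(s) - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit the transitions of state i+1
        nxt[i][s[i]] = i + 1       # reading s[i] from state i jumps to state i+1
    return nxt

# Hypothetical DFA for R = (a|b|c|d)*c: (start, accepting states, transitions).
# State 1 means "the last symbol read was c" and is the only accepting state.
ENDS_WITH_C = (0, {1}, {(q, x): (1 if x == "c" else 0) for q in (0, 1) for x in "abcd"})
# Hypothetical one-state DFA accepting every string over {a, b, c, d}.
ANY_STRING = (0, {0}, {(0, x): 0 for x in "abcd"})

def rec_lcs(s1, s2, dfa):
    """Longest common subsequence of s1 and s2 accepted by dfa, or None."""
    nxt1, nxt2 = subsequence_automaton(s1), subsequence_automaton(s2)
    start, accept, delta = dfa

    @lru_cache(maxsize=None)
    def best(i, j, q):
        # Longest string leading from product state (i, j, q) to an accepting state.
        result = "" if q in accept else None  # may stop only in an accepting DFA state
        for c, i2 in nxt1[i].items():
            j2 = nxt2[j].get(c)     # c must also be readable in S2 ...
            q2 = delta.get((q, c))  # ... and must have a DFA transition
            if j2 is None or q2 is None:
                continue
            tail = best(i2, j2, q2)  # i2 > i and j2 > j, so the recursion terminates
            if tail is not None and (result is None or len(tail) + 1 > len(result)):
                result = c + tail    # unit edge weight: one more matched character
        return result

    return best(0, 0, start)
```

With S1 = abcbdab and S2 = bdcaba, rec_lcs returns a common subsequence of length 4 under ANY_STRING, while ENDS_WITH_C shrinks the answer to bc. The recursion depth grows with min(n, m), so long inputs would need an iterative topological-order loop instead.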
i. Effat Farhana, Jannatul Ferdous, Tanaeem Moosa and M. Sohel Rahman, Finite Automata Based Algorithms for the Generalized Constrained Longest Common Subsequence Problems.
ii. A. N. Arslan, Regular Expression Constrained Sequence Alignment, Journal of Discrete Algorithms.
iii. Sanjay Bhargava and G. N. Purohit, Construction of a Minimal Deterministic Finite Automaton from a Regular Expression.
iv. Ricardo A. Baeza-Yates, Searching Subsequences.