LCS Notes
LCS Notes
Suppose, X and Y are two sequences over a finite set of elements. We can say that Z is a
common subsequence of X and Y, if Z is a subsequence of both X and Y.
Naive Method
Let X be a sequence of length m and Y a sequence of length n. Check for every subsequence
of X whether it is a subsequence of Y, and return the longest common subsequence found.
Step 1 − Construct an empty adjacency table with the size, n × m, where n = size of
sequence X and m = size of sequence Y. The rows in the table represent the elements in
sequence X and columns represent the elements in sequence Y.
Step 2 − The zeroeth rows and columns must be filled with zeroes. And the remaining values
are filled in based on different cases, by maintaining a counter value.
Step 3 − Once the table is filled, backtrack from the last value in the table. Backtracking here
is done by tracing the path where the counter incremented first.
Step 4 − The longest common subseqence obtained by noting the elements in the traced path.
Pseudocode
In this procedure, table C[m, n] is computed in row major order and another table B[m,n] is
computed to construct optimal solution.
Algorithm: LCS-Length-Table-Formulation (X, Y)
m := length(X)
n := length(Y)
for i = 1 to m do
C[i, 0] := 0
for j = 1 to n do
C[0, j] := 0
for i = 1 to m do
for j = 1 to n do
if xi = yj
C[i, j] := C[i - 1, j - 1] + 1
B[i, j] := ‘D’
else
if C[i -1, j] ≥ C[i, j -1]
C[i, j] := C[i - 1, j] + 1
B[i, j] := ‘U’
else
C[i, j] := C[i, j - 1] + 1
B[i, j] := ‘L’
return C and B
Algorithm: Print-LCS (B, X, i, j)
if i=0 and j=0
return
if B[i, j] = ‘D’
Print-LCS(B, X, i-1, j-1)
Print(xi)
else if B[i, j] = ‘U’
Print-LCS(B, X, i-1, j)
else
Print-LCS(B, X, i, j-1)
Analysis
To populate the table, the outer for loop iterates m times and the inner for loop
iterates n times. Hence, the complexity of the algorithm is O(m,n), where m and n are the
length of two strings.
Example
In this example, we have two strings X=BACDB and Y=BDCB to find the longest common
subsequence.
Following the algorithm, we need to calculate two tables 1 and 2.
Given n = length of X, m = length of Y
X = BDCB, Y = BACDB
Constructing the LCS Tables
In the table below, the zeroeth rows and columns are filled with zeroes. Remianing values are
filled by incrementing and choosing the maximum values according to the algorithm.
Once the values are filled, the path is traced back from the last value in the table at T[4, 5].
From the traced path, the longest common subsequence is found by choosing the values
where the counter is first incremented.
In this example, the final count is 3 so the counter is incremented at 3 places, i.e., B, C, B.
Therefore, the longest common subsequence of sequences X and Y is BCB.
Implementation
Following is the final implementation to find the Longest Common Subsequence using
Dynamic Programming Approach −
#include <stdio.h>
#include <string.h>
int max(int a, int b);
int lcs(char* X, char* Y, int m, int n){
int L[m + 1][n + 1];
int i, j, index;
for (i = 0; i <= m; i++) {
for (j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (X[i - 1] == Y[j - 1]) {
L[i][j] = L[i - 1][j - 1] + 1;
} else
L[i][j] = max(L[i - 1][j], L[i][j - 1]);
}
}
index = L[m][n];
char LCS[index + 1];
LCS[index] = '\0';
i = m, j = n;
while (i > 0 && j > 0) {
if (X[i - 1] == Y[j - 1]) {
LCS[index - 1] = X[i - 1];
i--;
j--;
index--;
} else if (L[i - 1][j] > L[i][j - 1])
i--;
else
j--;
}
printf("LCS: %s\n", LCS);
return L[m][n];
}
int max(int a, int b){
return (a > b) ? a : b;
}
int main(){
char X[] = "ABSDHS";
char Y[] = "ABDHSP";
int m = strlen(X);
int n = strlen(Y);
printf("Length of LCS is %d\n", lcs(X, Y, m, n));
return 0;
}
Output
LCS: ABDHS
Length of LCS is 5
Applications
The longest common subsequence problem is a classic computer science problem, the basis
of data comparison programs such as the diff-utility, and has applications in bioinformatics. It
is also widely used by revision control systems, such as SVN and Git, for reconciling
multiple changes made to a revision-controlled collection of files.