CYK Algorithm
Example
The CYK algorithm for testing whether a string is in the language generated by a context-free grammar can be represented succinctly as a triangular matrix. As an example, consider the following grammar in Chomsky Normal Form (CNF), which CYK requires:

    S => XY
    X => XA | a | b
    Y => AY | a
    A => a

To determine whether the string babaa is generated by this grammar, begin by writing its letters on the steps of a triangular matrix, one letter per step. The string reads from the bottom of the matrix upward, so the lowest step holds the first letter, b, and the highest step holds the last letter, a:

    5 a
    4 a
    3 b
    2 a
    1 b
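To follow the example in code, the grammar first needs a concrete representation. The Python sketch below assumes a hypothetical encoding (not part of the original) in which each non-terminal maps to a list of right-hand sides stored as tuples. It checks the CNF precondition that every rule has the form N => BC or N => a. The names GRAMMAR and is_cnf are illustrative only.

```python
# Hypothetical encoding of the example grammar: each non-terminal maps to a
# list of right-hand sides, each written as a tuple of symbols.
GRAMMAR = {
    "S": [("X", "Y")],
    "X": [("X", "A"), ("a",), ("b",)],
    "Y": [("A", "Y"), ("a",)],
    "A": [("a",)],
}


def is_cnf(grammar):
    """Return True if every rule has the form N -> B C or N -> a.

    Keys of `grammar` are the non-terminals; any other symbol is treated as
    a terminal. The optional S -> epsilon rule is not considered here.
    """
    nonterminals = set(grammar)
    for rhss in grammar.values():
        for rhs in rhss:
            if len(rhs) == 1:
                # N -> a is allowed; a unit rule N -> B is not.
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # N -> B C requires both symbols to be non-terminals.
                if not all(symbol in nonterminals for symbol in rhs):
                    return False
            else:
                return False
    return True


print(is_cnf(GRAMMAR))  # True
```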
Next, for each terminal we determine the non-terminals that can produce it directly (through a rule such as A => a), and place these in the first diagonal:

    5 a                                   X,Y,A
    4 a                           X,Y,A
    3 b                   X
    2 a           X,Y,A
    1 b   X
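Here is a minimal Python sketch of this first step. It assumes the grammar is stored as a hypothetical dict from each non-terminal to a list of right-hand-side tuples. That representation is illustrative, as is the helper name first_diagonal. For each letter, the sketch collects every non-terminal N that has a rule N => letter, which reproduces the first diagonal for babaa.

```python
# Hypothetical encoding of the example grammar: non-terminal -> right-hand sides.
GRAMMAR = {
    "S": [("X", "Y")],
    "X": [("X", "A"), ("a",), ("b",)],
    "Y": [("A", "Y"), ("a",)],
    "A": [("a",)],
}


def first_diagonal(word, grammar):
    """Return one set per letter: the non-terminals N with a rule N -> letter."""
    return [
        {head for head, rhss in grammar.items() if (letter,) in rhss}
        for letter in word
    ]


for position, heads in enumerate(first_diagonal("babaa", GRAMMAR), start=1):
    print(position, sorted(heads))
# 1 ['X']
# 2 ['A', 'X', 'Y']
# 3 ['X']
# 4 ['A', 'X', 'Y']
# 5 ['A', 'X', 'Y']
```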
We now repeat the process for the next diagonal inward. For example, consider how the length-2 string ba, in the lower left, can be formed. The two lowest boxes on the first diagonal hold X and X,Y,A, so the possible right-hand-side pairs are XX, XY, and XA. Two of these appear in the grammar, S => XY and X => XA, so we place S and X in the lowest box on the second diagonal to show that ba can be derived from S or X. Moving up the diagonal, the next box covers ab, whose possible pairs are XX, YX, and AX. None of these occurs in the grammar, so this box is empty (we put ∅ there to indicate the empty set). Continuing in this fashion, we fill the second diagonal as follows:

    5 a                                   X,Y,A
    4 a                           X,Y,A   S,X,Y
    3 b                   X       S,X
    2 a           X,Y,A   ∅
    1 b   X       S,X
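The second-diagonal step boils down to one operation. Given the sets in two adjacent squares, find every non-terminal whose two-symbol right-hand side pairs a member of the first set with a member of the second. The Python sketch below assumes a hypothetical dict encoding of the grammar and uses an illustrative helper named combine. It starts from the first diagonal for babaa and recomputes the second.

```python
from itertools import product

# Hypothetical encoding of the example grammar: non-terminal -> right-hand sides.
GRAMMAR = {
    "S": [("X", "Y")],
    "X": [("X", "A"), ("a",), ("b",)],
    "Y": [("A", "Y"), ("a",)],
    "A": [("a",)],
}


def combine(left, right, grammar):
    """Non-terminals N with a rule N -> B C, where B is in `left` and C in `right`."""
    pairs = set(product(left, right))
    return {
        head
        for head, rhss in grammar.items()
        for rhs in rhss
        if len(rhs) == 2 and rhs in pairs
    }


# Length-1 squares for "babaa" (positions 1 to 5).
first = [{"X"}, {"X", "Y", "A"}, {"X"}, {"X", "Y", "A"}, {"X", "Y", "A"}]

# Square (i, 2) combines the neighbouring length-1 squares (i, 1) and (i + 1, 1).
second = [combine(first[k], first[k + 1], GRAMMAR) for k in range(len(first) - 1)]
for i, square in enumerate(second, start=1):
    print((i, 2), sorted(square))
# (1, 2) ['S', 'X']
# (2, 2) []
# (3, 2) ['S', 'X']
# (4, 2) ['S', 'X', 'Y']
```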
Filling the third diagonal is a little more involved. First, notice that if you extend horizontally and vertically from the lowest square in that diagonal, the letters it subtends are the initial bab. So this square will contain the non-terminals that can expand into the string bab (if there are any; it turns out that there are not). Call this square (1,3). In general, (i,j) denotes the square for the substring of length j that starts at position i. To fill it, we consider all the ways to divide bab into two concatenated substrings, since every CNF rule that does not produce a terminal expands into exactly two non-terminals. One way is b + ab, which, according to squares (1,1) and (2,2), would have to come from X followed by something in ∅. Since there is nothing in (2,2), no such combination is possible in the grammar. We now consider the split ba + b, which would have to come from combinations of squares (1,2) and (3,1); the possibilities are SX and XX. Neither of these is in the grammar, so we must put ∅ in square (1,3). Moving up the diagonal to (2,3), we seek possible pairings of (2,1) with (3,2), and of (2,2) with (4,1) (these are the two ways of splitting aba). Again we find none, so we place ∅ there. Finally, the last square on the diagonal, (3,3), covers baa. Pairing (3,1) with (4,2) gives XY, which comes from S. Pairing (3,2) with (5,1) gives XY and XA, which come from S and X. So (3,3) holds S,X. The complete third diagonal looks like this:

    5 a                                   X,Y,A
    4 a                           X,Y,A   S,X,Y
    3 b                   X       S,X     S,X
    2 a           X,Y,A   ∅       ∅
    1 b   X       S,X     ∅
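The split-point reasoning generalizes to any square. Square (i,j) is the union, over every split k from 1 to j - 1, of the combinations of (i,k) with (i+k,j-k). The Python sketch below assumes a hypothetical dict encoding of the grammar and a table keyed by (start, length). The names combine and fill_square are illustrative. Seeded with the first two diagonals for babaa, it recomputes the third diagonal.

```python
from itertools import product

# Hypothetical encoding of the example grammar: non-terminal -> right-hand sides.
GRAMMAR = {
    "S": [("X", "Y")],
    "X": [("X", "A"), ("a",), ("b",)],
    "Y": [("A", "Y"), ("a",)],
    "A": [("a",)],
}


def combine(left, right, grammar):
    """Non-terminals N with a rule N -> B C, where B is in `left` and C in `right`."""
    pairs = set(product(left, right))
    return {h for h, rhss in grammar.items() for rhs in rhss
            if len(rhs) == 2 and rhs in pairs}


def fill_square(table, i, j, grammar):
    """Square (i, j): non-terminals deriving the length-j substring at position i.

    Each split point k = 1 .. j - 1 pairs square (i, k) with (i + k, j - k).
    """
    result = set()
    for k in range(1, j):
        result |= combine(table[(i, k)], table[(i + k, j - k)], grammar)
    return result


# Squares of "babaa" from the first two diagonals, keyed by (start, length).
table = {
    (1, 1): {"X"}, (2, 1): {"X", "Y", "A"}, (3, 1): {"X"},
    (4, 1): {"X", "Y", "A"}, (5, 1): {"X", "Y", "A"},
    (1, 2): {"S", "X"}, (2, 2): set(), (3, 2): {"S", "X"}, (4, 2): {"S", "X", "Y"},
}

# Third diagonal: squares (1, 3), (2, 3) and (3, 3).
for i in range(1, 4):
    table[(i, 3)] = fill_square(table, i, 3, GRAMMAR)
    print((i, 3), sorted(table[(i, 3)]))
# (1, 3) []
# (2, 3) []
# (3, 3) ['S', 'X']
```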
We now seek to populate the fourth diagonal. The lowest square, (1,4), will contain the valid grammar combinations from concatenating (1,1)(2,3), (1,2)(3,2), and (1,3)(4,1). None of those combinations is in the grammar, so (1,4) is ∅. In fact, the remaining entries, (2,4) and (1,5), are also ∅. Since S does not appear in (1,5), babaa is not generated by the grammar.

Now consider the string baaaa. We start by filling in the singleton rules:

    5 a                                   X,Y,A
    4 a                           X,Y,A
    3 a                   X,Y,A
    2 a           X,Y,A
    1 b   X
Now we fill in (1,2) with the valid combinations that come from concatenating (1,1) and (2,1), which are XY and XA. These come from S and X, respectively, so we put S,X in (1,2). Pairing (2,1) with (3,1) matches the right-hand sides XY, XA, and AY, so we place S, X, and Y in (2,2). Hopefully you now get the idea: the remaining squares are filled the same way. Here is the final table. It shows that baaaa is generated by the grammar, since S is found in (1,5) (the lower right-hand corner):

    5 a                                   X,Y,A
    4 a                           X,Y,A   S,X,Y
    3 a                   X,Y,A   S,X,Y   S,X,Y
    2 a           X,Y,A   S,X,Y   S,X,Y   S,X,Y
    1 b   X       S,X     S,X     S,X     S,X
FYI: because X also appears in (1,5), baaaa can also be derived from X.
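Putting the diagonals together gives a complete recognizer. The following is a minimal Python sketch, not the original author's code. The dict encoding of the grammar and the names cyk_table and generates are assumptions. The table is keyed by (start, length), like the squares in the walkthrough, and a string is accepted when S appears in (1,n).

```python
from itertools import product

# Hypothetical encoding of the example grammar: non-terminal -> right-hand sides.
GRAMMAR = {
    "S": [("X", "Y")],
    "X": [("X", "A"), ("a",), ("b",)],
    "Y": [("A", "Y"), ("a",)],
    "A": [("a",)],
}


def cyk_table(word, grammar):
    """Fill the CYK table for `word`; keys are 1-based (start, length) pairs."""
    n = len(word)
    table = {}
    # First diagonal: terminal rules N -> a.
    for i, letter in enumerate(word, start=1):
        table[(i, 1)] = {h for h, rhss in grammar.items() if (letter,) in rhss}
    # Remaining diagonals, shortest substrings first, so every square a split
    # refers to has already been filled.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            square = set()
            for k in range(1, length):  # split into lengths k and length - k
                pairs = set(product(table[(i, k)], table[(i + k, length - k)]))
                square |= {
                    h
                    for h, rhss in grammar.items()
                    for rhs in rhss
                    if len(rhs) == 2 and rhs in pairs
                }
            table[(i, length)] = square
    return table


def generates(word, grammar, start="S"):
    """True if `start` appears in the lower right-hand square (1, n)."""
    if not word:
        return False  # CNF without S -> epsilon cannot derive the empty string
    return start in cyk_table(word, grammar)[(1, len(word))]


print(generates("babaa", GRAMMAR))  # False
print(generates("baaaa", GRAMMAR))  # True
print(sorted(cyk_table("baaaa", GRAMMAR)[(1, 5)]))  # ['S', 'X']
```

Because lengths are processed in increasing order, each split only looks up squares for shorter substrings, which are already filled. The three nested loops over length, start position, and split point make the running time cubic in the length of the string, multiplied by a factor that depends on the size of the grammar.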