Chomsky Notes
work and can find all the pieces of an exponential number of trees in polynomial time.

Two popular methods:
  - CKY
  - Earley

CKY Algorithm

The Cocke-Younger-Kasami (CYK) algorithm (alternatively called CKY) determines whether a string can be generated by a given context-free grammar and, if so, how it can be generated. This is known as parsing the string. The algorithm employs bottom-up parsing and dynamic programming.

The standard version of CYK operates on context-free grammars given in Chomsky normal form (CNF). Any context-free grammar may be transformed to a CNF grammar expressing the same language.

In the theory of computation, the importance of the CYK algorithm stems from two facts: it constructively proves that it is decidable whether a given string belongs to the formal language described by a given context-free grammar, and it does so quite efficiently. The worst-case running time of CYK is O(n^3 · |G|), where n is the length of the parsed string and |G| is the size of the CNF grammar G. This makes it one of the most efficient algorithms for recognizing general context-free languages.

The algorithm in pseudocode is as follows:

    Input: string input of size n
    Create a 2D table chart of size (n+1) x (n+1); each cell holds a set of non-terminals
    For i = 0 to n-1
        Add A to chart[i][i+1] if there is a rule A -> a and input[i] = a
    For j = 2 to n
        For i = j-2 down to 0
            For k = i+1 to j-1
                Add A to chart[i][j] if there is a rule A -> B C
                    and B is in chart[i][k] and C is in chart[k][j]
    Return yes if chart[0][n] contains the start symbol
    Else return no
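The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference implementation: the function name `cky_recognize`, the `lexical`/`binary` dictionary layout, and the toy grammar for a^n b^n are all assumptions made for the example.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """Return True if `words` can be derived from `start` under a CNF grammar.

    lexical: dict mapping a terminal to the set of A with a rule A -> terminal
    binary:  dict mapping (B, C) to the set of A with a rule A -> B C
    (Both layouts are assumptions of this sketch.)
    """
    n = len(words)
    # chart[i][j] holds the non-terminals that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans come straight from the lexical rules A -> a
    for i, word in enumerate(words):
        chart[i][i + 1] |= lexical.get(word, set())

    # Wider spans: try every split point k with i < k < j
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= binary.get((b, c), set())

    return start in chart[0][n]


# Hypothetical toy grammar for a^n b^n (n >= 1), already in CNF:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
LEXICAL = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

accepted = cky_recognize(list("aabb"), LEXICAL, BINARY)
rejected = cky_recognize(list("aab"), LEXICAL, BINARY)
```

The three nested loops over j, i and k are what give the n^3 factor in the running time; the inner loops over cell contents depend on the grammar size.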
The CKY (Cocke-Kasami-Younger) algorithm requires the grammar to be in Chomsky Normal Form (CNF). All rules must be in one of the following forms:

    A -> B C
    A -> w

Any grammar can be converted automatically to Chomsky Normal Form.

Converting to CNF

  - Rules that mix terminals and non-terminals: introduce a new dummy non-terminal that covers the terminal. For example, INFVP -> to VP is replaced by:
        INFVP -> TO VP
        TO -> to
  - Rules that have a single non-terminal on the right (unit productions): rewrite each unit production with the right-hand sides of the expansions of that non-terminal.
  - Rules whose right-hand side is longer than 2: introduce dummy non-terminals that spread the right-hand side over several binary rules.

Automatic Conversion to CNF
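The three conversion steps can be sketched in Python as below. This is a simplified illustration under stated assumptions, not a complete converter: rules are (lhs, rhs-tuple) pairs, a symbol counts as a non-terminal only if it appears on some left-hand side, there are no empty right-hand sides, dummy names such as `TO` or `X1` are assumed not to clash with existing symbols, and the sample rules (apart from INFVP -> to VP) are made up for the example.

```python
def to_cnf(rules):
    """Convert a list of (lhs, rhs_tuple) rules to Chomsky Normal Form.

    Assumes no empty right-hand sides and no name clashes with the
    dummy symbols this sketch invents (upper-cased terminals, X1, X2, ...).
    """
    nonterminals = {lhs for lhs, _ in rules}
    result = []
    counter = 0

    for lhs, rhs in rules:
        # Step 1: in longer rules, replace each terminal by a dummy non-terminal
        if len(rhs) >= 2:
            fixed = []
            for sym in rhs:
                if sym in nonterminals:
                    fixed.append(sym)
                else:
                    dummy = sym.upper()          # e.g. to -> TO
                    fixed.append(dummy)
                    if (dummy, (sym,)) not in result:
                        result.append((dummy, (sym,)))
            rhs = tuple(fixed)
        # Step 2: spread right-hand sides longer than 2 over binary rules
        while len(rhs) > 2:
            counter += 1
            dummy = f"X{counter}"
            result.append((dummy, rhs[:2]))
            rhs = (dummy,) + rhs[2:]
        result.append((lhs, rhs))

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in nonterminals

    # Step 3: reach[A] = non-terminals reachable from A via unit productions
    reach = {a: {a} for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in result:
            if is_unit(rhs):
                for a in reach:
                    if lhs in reach[a] and rhs[0] not in reach[a]:
                        reach[a].add(rhs[0])
                        changed = True

    # Copy every non-unit rule B -> rhs up to each A that reaches B
    cnf = []
    for lhs, rhs in result:
        if is_unit(rhs):
            continue
        owners = [a for a in sorted(nonterminals) if lhs in reach[a]] or [lhs]
        for a in owners:
            if (a, rhs) not in cnf:
                cnf.append((a, rhs))
    return cnf


# Illustrative input: only INFVP -> to VP comes from the notes
RULES = [
    ("INFVP", ("to", "VP")),
    ("VP", ("V",)),
    ("VP", ("V", "NP", "PP")),
    ("V", ("called",)),
    ("NP", ("john",)),
    ("PP", ("P", "NP")),
    ("P", ("from",)),
]
cnf_rules = to_cnf(RULES)
```

On this input the unit production VP -> V disappears and VP -> called takes its place, while VP -> V NP PP becomes X1 -> V NP plus VP -> X1 PP.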
Sample Grammar
CKY Parsing

Given rules in CNF, consider the rule A -> B C. If there is an A in the input, then there must be a B followed by a C in the input. If the A goes from i to j in the input, then there must be some k such that i < k < j; that is, B splits from C someplace.

So let's build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table. A non-terminal spanning the entire string will then sit in cell [0,n].

If we build the table bottom-up, we'll know that the parts of the A must go from i to k and from k to j. This means that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j]. In other words, if we think there might be an A spanning i,j in the input AND A -> B C is a rule in the grammar, THEN there must be a B in [i,k] and a C in [k,j] for some i < k < j. So just loop over the possible k values.

CKY Example

    S  -> NP VP
    VP -> V NP
    NP -> NP PP
    VP -> VP PP
    PP -> P NP
    NP -> John, Mary, Denver
    V  -> called
    P  -> from
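To see the table filling on the example grammar, here is a small Python sketch. The sentence "John called Mary from Denver", the function name `fill_chart`, and the dictionary encoding of the rules are assumptions chosen for illustration. The sketch fills cells by increasing span width, which visits the same cells as a bottom-up fill, just in a different order.

```python
# Example grammar from the notes, encoded as lookup tables (encoding is an assumption)
LEXICAL = {
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}


def fill_chart(words):
    """Return a dict mapping (i, j) to the set of non-terminals spanning words[i:j]."""
    n = len(words)
    # Width-1 cells come from the lexical rules
    chart = {(i, i + 1): set(LEXICAL.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            cell = set()
            # Loop over every split point k with i < k < j
            for k in range(i + 1, j):
                for b in chart[(i, k)]:
                    for c in chart[(k, j)]:
                        cell |= BINARY.get((b, c), set())
            chart[(i, j)] = cell
    return chart


words = "John called Mary from Denver".split()
chart = fill_chart(words)
for (i, j), cell in sorted(chart.items()):
    if cell:
        print(f"[{i},{j}] {' '.join(words[i:j])}: {sorted(cell)}")
```

For this sentence, cell [0,5] ends up containing S, so the string is recognized; cell [1,5] contains VP, which can be built with either split k = 2 (V NP) or k = 3 (VP PP).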
Ambiguity

Both CKY and Earley will result in multiple S structures for the [0,n] table entry. They both efficiently store the sub-parts that are shared between multiple parses. But neither can tell us which one is right.

Not a parser, a recognizer

The presence of an S state with the right attributes in the right place indicates a successful recognition. But no parse tree means no parser. That's how we solve (or rather, do not solve) an exponential problem in polynomial time.

Converting CKY from Recognizer to Parser

With the addition of a few pointers we have a parser. Augment each new entry in the chart to point to where it came from, that is, to the two smaller entries it was built from.
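One way to add such pointers is sketched below in Python. This is an illustration under assumptions, not the notes' own implementation: the `back` dictionary keyed by (i, j, A), the tuple encoding of trees, and the test sentence are all hypothetical choices. Filling `back` is still polynomial; only enumerating every tree at the end can take exponential time.

```python
def cky_parse(words, lexical, binary, start="S"):
    """Return all parse trees for `words` as nested tuples.

    A leaf is (A, word); an internal node is (A, left_subtree, right_subtree).
    """
    n = len(words)
    # back[(i, j, A)] lists every way A was built over words[i:j]:
    # either the word itself, or a backpointer (k, B, C)
    back = {}
    for i, w in enumerate(words):
        for a in lexical.get(w, ()):
            back.setdefault((i, i + 1, a), []).append(w)

    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for (b, c), parents in binary.items():
                    if (i, k, b) in back and (k, j, c) in back:
                        for a in parents:
                            back.setdefault((i, j, a), []).append((k, b, c))

    def trees(i, j, a):
        # Follow the backpointers to rebuild every tree rooted in A over [i, j]
        for entry in back.get((i, j, a), []):
            if isinstance(entry, str):
                yield (a, entry)
            else:
                k, b, c = entry
                for left in trees(i, k, b):
                    for right in trees(k, j, c):
                        yield (a, left, right)

    return list(trees(0, n, start))


# Example grammar from the notes; the dictionary encoding is an assumption
LEXICAL = {
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}
BINARY = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}

parses = cky_parse("John called Mary from Denver".split(), LEXICAL, BINARY)
```

For this sentence the sketch yields two trees: one attaches "from Denver" to the NP "Mary", the other attaches it to the VP "called Mary". The chart shares the sub-parts (for example the PP) between both, but nothing here says which reading is right.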