Ambiguity in Grammar

A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.

Example:

E → E + E | E * E | ( E ) | id

The sentence id + id * id has two distinct leftmost derivations:

E ⇒ E + E                      E ⇒ E * E
  ⇒ id + E                       ⇒ E + E * E
  ⇒ id + E * E                   ⇒ id + E * E
  ⇒ id + id * E                  ⇒ id + id * E
  ⇒ id + id * id                 ⇒ id + id * id

For most parsers, it is desirable that the grammar be made unambiguous, for if it is not, we cannot uniquely determine which parse tree to select for a sentence. In other cases, it is convenient to use carefully chosen ambiguous grammars, together with disambiguating rules that "throw away" undesirable parse trees, leaving only one tree for each sentence.

Left Recursion

A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate left recursion. A left-recursive pair of productions

A → Aα | β

is replaced by

A → βA'
A' → αA' | ε

Immediate left recursion can be eliminated by the following technique, which works for any number of A-productions. First, group the productions as

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

where no βi begins with an A. Then, replace the A-productions by

A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε

Algorithm: eliminating left recursion
INPUT: Grammar G with no cycles or ε-productions.
OUTPUT: An equivalent grammar with no left recursion.
1) arrange the nonterminals in some order A1, A2, …, An
2) for ( each i from 1 to n ) {
3)   for ( each j from 1 to i−1 ) {
4)     replace each production of the form Ai → Aj γ by the productions
       Ai → δ1 γ | δ2 γ | … | δk γ, where Aj → δ1 | δ2 | … | δk are all current Aj-productions
5)   }
6)   eliminate the immediate left recursion among the Ai-productions
7) }

Example: Left-Recursive Grammar

E → E + T | T
T → T * F | F
F → ( E ) | id

After Elimination

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing. When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions to defer the decision until enough of the input has been seen that we can make the right choice. For example, if we have the two productions

stmt → if expr then stmt else stmt
     | if expr then stmt

then on seeing the input if, we cannot immediately tell which production to choose. In general, two productions

A → αβ1 | αβ2

are left-factored into

A → αA'
A' → β1 | β2

Top-Down Parsing

Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder. Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string. At each step of a top-down parse, the key problem is that of determining the production to be applied for a nonterminal, say A. Once an A-production is chosen, the rest of the parsing process consists of "matching" the terminal symbols in the production body with the input string.

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

(Steps in a top-down parse of the string id + id * id using this grammar.)

Recursive Descent Parsing

void A() {
  1) Choose an A-production, A → X1 X2 … Xk;
  2) for ( i = 1 to k ) {
  3)   if ( Xi is a nonterminal )
  4)     call procedure Xi();
  5)   else if ( Xi equals the current input symbol a )
  6)     advance the input to the next symbol;
  7)   else /* an error has occurred */;
     }
}

A recursive-descent parsing program consists of a set of procedures, one for each nonterminal. Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string.
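To make the recursive-descent scheme concrete, here is a minimal sketch in C (not taken from the original text) of a predictive recursive-descent parser for the transformed expression grammar above. The single-character tokens, the use of 'i' to stand for id, and the helper names lookahead, match, and error are assumptions made for the sketch.

#include <stdio.h>
#include <stdlib.h>

/* Input string: 'i' stands for the token id, '$' marks the end of input. */
static const char *input = "i+i*i$";
static int pos = 0;

static char lookahead(void) { return input[pos]; }

static void error(const char *msg) {
    fprintf(stderr, "syntax error at position %d: %s\n", pos, msg);
    exit(1);
}

static void match(char t) {
    if (lookahead() == t) pos++;          /* consume the expected terminal */
    else error("unexpected symbol");
}

/* One procedure per nonterminal of the grammar. */
static void E(void);
static void Eprime(void);
static void T(void);
static void Tprime(void);
static void F(void);

static void E(void)      { T(); Eprime(); }                   /* E  -> T E'        */
static void Eprime(void) {                                    /* E' -> + T E' | ε  */
    if (lookahead() == '+') { match('+'); T(); Eprime(); }    /* otherwise take ε  */
}
static void T(void)      { F(); Tprime(); }                   /* T  -> F T'        */
static void Tprime(void) {                                    /* T' -> * F T' | ε  */
    if (lookahead() == '*') { match('*'); F(); Tprime(); }
}
static void F(void) {                                         /* F -> ( E ) | id   */
    if (lookahead() == '(') { match('('); E(); match(')'); }
    else if (lookahead() == 'i') { match('i'); }
    else error("expected id or '('");
}

int main(void) {
    E();                                  /* start symbol */
    if (lookahead() == '$') puts("input accepted");
    else error("trailing input");
    return 0;
}

Because the grammar has been left-factored and contains no left recursion, each procedure can select its production from the single lookahead symbol, so this sketch never needs to back up the input.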
General recursive descent may require backtracking; that is, it may require repeated scans over the input. To allow backtracking, the code given above needs to be modified. First, we cannot choose a unique A-production at line (1), so we must try each of several productions in some order. Then, failure at line (7) is not ultimate failure, but suggests only that we need to return to line (1) and try another A-production. Only if there are no more A-productions to try do we declare that an input error has been found. In order to try another A-production, we need to be able to reset the input pointer to where it was when we first reached line (1). Thus, a local variable is needed to store this input pointer for future use.

Consider the grammar

S → c A d
A → a b | a

For an input such as cad, a backtracking parser first tries A → ab, fails to match the remaining input, resets the input pointer, and then succeeds with A → a.

LL(1) Grammars

Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to make parsing action decisions. The class of LL(1) grammars is rich enough to cover most programming constructs, although care is needed in writing a suitable grammar for the source language. For example, no left-recursive or ambiguous grammar can be LL(1).

A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A). Likewise, if α ⇒* ε, then β does not derive any string beginning with a terminal in FOLLOW(A).

The first two conditions are equivalent to the statement that FIRST(α) and FIRST(β) are disjoint sets. The third condition is equivalent to stating that if ε is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets, and likewise if ε is in FIRST(α).

E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Predictive parsers can be constructed for LL(1) grammars, since the proper production to apply for a nonterminal can be selected by looking only at the current input symbol. The algorithm is based on the following idea: the production A → α is chosen if the next input symbol a is in FIRST(α). The only complication occurs when α = ε or, more generally, α ⇒* ε. In this case, we should again choose A → α if the current input symbol is in FOLLOW(A), or if the $ on the input has been reached and $ is in FOLLOW(A).

Algorithm 4.31: Construction of a predictive parsing table.
INPUT: Grammar G.
OUTPUT: Parsing table M.
METHOD: For each production A → α of the grammar, do the following:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (normally represented by an empty entry in the table).

Parsing table M for the expression grammar:

NONTERMINAL |   id     |     +      |     *      |    (     |    )    |    $
E           | E → TE'  |            |            | E → TE'  |         |
E'          |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T           | T → FT'  |            |            | T → FT'  |         |
T'          |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F           | F → id   |            |            | F → (E)  |         |

Consider production E → TE'. Since FIRST(TE') = FIRST(T) = {(, id}, this production is added to M[E, (] and M[E, id]. Production E' → +TE' is added to M[E', +] since FIRST(+TE') = {+}. Since FOLLOW(E') = {), $}, production E' → ε is added to M[E', )] and M[E', $].
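The following is a small sketch in C (not from the original text) of Algorithm 4.31 applied to the expression grammar: the FIRST sets of the production bodies and the FOLLOW sets of the nonterminals are written down by hand as bit arrays, and the two rules of the algorithm fill the table M. The integer and string encodings are assumptions of the sketch; a full implementation would compute FIRST and FOLLOW instead of hard-coding them.

#include <stdio.h>

/* Terminals, in the same order as the columns of M. */
static const char *terms[] = { "id", "+", "*", "(", ")", "$" };
enum { NTERM = 6 };

/* Nonterminals E, E', T, T', F. */
static const char *nonterms[] = { "E", "E'", "T", "T'", "F" };
enum { NNT = 5 };

/* Productions A -> alpha, with FIRST(alpha) as one bit per terminal and an eps flag. */
struct prod {
    int lhs;                 /* index into nonterms */
    const char *text;        /* printable form      */
    int first[NTERM];        /* FIRST(alpha)        */
    int eps;                 /* 1 if eps is in FIRST(alpha) */
};

static struct prod prods[] = {
    { 0, "E->TE'",   {1,0,0,1,0,0}, 0 },  /* FIRST(TE')  = {id, (} */
    { 1, "E'->+TE'", {0,1,0,0,0,0}, 0 },  /* FIRST(+TE') = {+}     */
    { 1, "E'->eps",  {0,0,0,0,0,0}, 1 },
    { 2, "T->FT'",   {1,0,0,1,0,0}, 0 },
    { 3, "T'->*FT'", {0,0,1,0,0,0}, 0 },
    { 3, "T'->eps",  {0,0,0,0,0,0}, 1 },
    { 4, "F->(E)",   {0,0,0,1,0,0}, 0 },
    { 4, "F->id",    {1,0,0,0,0,0}, 0 },
};

/* FOLLOW sets of the nonterminals, one bit per terminal ($ included). */
static int follow[NNT][NTERM] = {
    {0,0,0,0,1,1},   /* FOLLOW(E)  = {), $}       */
    {0,0,0,0,1,1},   /* FOLLOW(E') = {), $}       */
    {0,1,0,0,1,1},   /* FOLLOW(T)  = {+, ), $}    */
    {0,1,0,0,1,1},   /* FOLLOW(T') = {+, ), $}    */
    {0,1,1,0,1,1},   /* FOLLOW(F)  = {+, *, ), $} */
};

static const char *M[NNT][NTERM];   /* NULL represents an error (blank) entry */

int main(void) {
    int p, a;
    for (p = 0; p < (int)(sizeof prods / sizeof prods[0]); p++) {
        for (a = 0; a < NTERM; a++) {
            /* Rule 1: a is in FIRST(alpha).  Rule 2: eps in FIRST(alpha) and a in FOLLOW(A). */
            if (prods[p].first[a] || (prods[p].eps && follow[prods[p].lhs][a]))
                M[prods[p].lhs][a] = prods[p].text;
        }
    }
    printf("%-4s", "");
    for (a = 0; a < NTERM; a++) printf(" %-10s", terms[a]);
    printf("\n");
    for (p = 0; p < NNT; p++) {
        printf("%-4s", nonterms[p]);
        for (a = 0; a < NTERM; a++)
            printf(" %-10s", M[p][a] ? M[p][a] : "-");
        printf("\n");
    }
    return 0;
}

Running it reproduces the table shown above, with "-" in place of the blank (error) entries.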
As another example, consider the dangling-else grammar

S  → i E t S S' | a
S' → e S | ε
E  → b

The grammar is ambiguous, and the ambiguity is manifested by a choice in what production to use when an e (else) is seen. We can resolve this ambiguity by choosing S' → eS. This choice corresponds to associating an else with the closest previous then. Note that the choice S' → ε would prevent e from ever being put on the stack or removed from the input, and is surely wrong.

NONTERMINAL |   a    |   b    |        e         |      i       |   t   |    $
S           | S → a  |        |                  | S → iEtSS'   |       |
S'          |        |        | S' → ε, S' → eS  |              |       | S' → ε
E           |        | E → b  |                  |              |       |

Non-recursive Predictive Parsing

A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that

S ⇒*lm wα

The input buffer contains the string to be parsed, followed by the end marker $. We reuse the symbol $ to mark the bottom of the stack, which initially contains the start symbol of the grammar on top of $.

(Model of a table-driven predictive parser: input buffer, stack, predictive parsing program, parsing table M, and output stream.)

The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. If X is a nonterminal, the parser chooses an X-production by consulting entry M[X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and the current input symbol a.

set the input pointer to the first symbol of w;
set X to the top stack symbol;
while ( X ≠ $ ) {  /* stack is not empty */
  if ( X = a ) { pop the stack and advance the input pointer; }
  else if ( X is a terminal ) error();
  else if ( M[X, a] is an error entry ) error();
  else if ( M[X, a] = X → Y1 Y2 … Yk ) {
    output the production X → Y1 Y2 … Yk;
    pop the stack;
    push Yk, Yk−1, …, Y1 onto the stack, with Y1 on top;
  }
  set X to the top stack symbol;
}

On input id + id * id the parser produces the leftmost derivation

E ⇒ TE' ⇒ FT'E' ⇒ id T'E' ⇒ id E' ⇒ id + TE' ⇒ …

Moves made by the predictive parser on input id + id * id:

MATCHED       | STACK        | INPUT           | ACTION
              | E $          | id + id * id $  |
              | T E' $       | id + id * id $  | output E → TE'
              | F T' E' $    | id + id * id $  | output T → FT'
              | id T' E' $   | id + id * id $  | output F → id
id            | T' E' $      | + id * id $     | match id
id            | E' $         | + id * id $     | output T' → ε
id            | + T E' $     | + id * id $     | output E' → +TE'
id +          | T E' $       | id * id $       | match +
id +          | F T' E' $    | id * id $       | output T → FT'
id +          | id T' E' $   | id * id $       | output F → id
id + id       | T' E' $      | * id $          | match id
id + id       | * F T' E' $  | * id $          | output T' → *FT'
id + id *     | F T' E' $    | id $            | match *
id + id *     | id T' E' $   | id $            | output F → id
id + id * id  | T' E' $      | $               | match id
id + id * id  | E' $         | $               | output T' → ε
id + id * id  | $            | $               | output E' → ε
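As a minimal sketch (not from the original text) of the table-driven algorithm above, the C program below hard-codes table M for the expression grammar and parses id + id * id, printing the productions it outputs. Single characters encode the symbols: 'i' stands for id, and lowercase 'e' and 't' stand for E' and T'; these encodings and the helper names are assumptions of the sketch.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stack symbols: nonterminals E, T, F plus 'e' for E' and 't' for T';
   terminals: 'i' (id), '+', '*', '(', ')', '$'. */
static char stack[64];
static int top = 0;

static void push(char c) { stack[top++] = c; }
static void pop(void)    { top--; }

static int is_nonterminal(char c) { return strchr("EeTtF", c) != NULL; }

/* M[X][a]: the body of the chosen production, "" for an eps-production, NULL for error. */
static const char *lookup(char X, char a) {
    switch (X) {
    case 'E': return strchr("i(", a) ? "Te" : NULL;                 /* E  -> T E'   */
    case 'e': return a == '+' ? "+Te"                               /* E' -> + T E' */
                   : strchr(")$", a) ? "" : NULL;                   /* E' -> eps    */
    case 'T': return strchr("i(", a) ? "Ft" : NULL;                 /* T  -> F T'   */
    case 't': return a == '*' ? "*Ft"                               /* T' -> * F T' */
                   : strchr("+)$", a) ? "" : NULL;                  /* T' -> eps    */
    case 'F': return a == 'i' ? "i"                                 /* F  -> id     */
                   : a == '(' ? "(E)" : NULL;                       /* F  -> ( E )  */
    }
    return NULL;
}

int main(void) {
    const char *w = "i+i*i$";
    int pos = 0;

    push('$');                 /* bottom-of-stack marker */
    push('E');                 /* start symbol           */

    while (stack[top - 1] != '$') {
        char X = stack[top - 1], a = w[pos];
        if (X == a) {                                   /* match a terminal */
            pop(); pos++;
        } else if (!is_nonterminal(X)) {
            fprintf(stderr, "error: expected %c\n", X); exit(1);
        } else {
            const char *body = lookup(X, a);
            if (!body) { fprintf(stderr, "error at %c\n", a); exit(1); }
            printf("output %c -> %s\n", X, *body ? body : "eps");
            pop();
            for (int i = (int)strlen(body) - 1; i >= 0; i--)
                push(body[i]);                          /* leftmost symbol ends up on top */
        }
    }
    puts(w[pos] == '$' ? "input accepted" : "error: trailing input");
    return 0;
}

Running it prints, in the encoded notation, the same sequence of productions as the ACTION column of the trace above, and ends with "input accepted".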
Error Recovery in Predictive Parsing

An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is error (i.e., the parsing-table entry is empty).

Panic Mode

Panic-mode error recovery is based on the idea of skipping over symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of synchronizing set. The sets should be chosen so that the parser recovers quickly from errors that are likely to occur in practice. Some heuristics are as follows:

As a starting point, place all symbols in FOLLOW(A) into the synchronizing set for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.

It is not enough to use FOLLOW(A) as the synchronizing set for A. We can add to the synchronizing set of a lower-level construct the symbols that begin higher-level constructs. For example, we might add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.

If we add symbols in FIRST(A) to the synchronizing set for nonterminal A, then it may be possible to resume parsing according to A if a symbol in FIRST(A) appears in the input.

If a nonterminal can generate the empty string, then the production deriving ε can be used as a default. Doing so may postpone some error detection, but cannot cause an error to be missed. This approach reduces the number of nonterminals that have to be considered during error recovery.

If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal, issue a message saying that the terminal was inserted, and continue parsing. In effect, this approach takes the synchronizing set of a token to consist of all other tokens.

Synchronizing tokens ("synch") added to the parsing table for the expression grammar, taken from the FOLLOW sets of the nonterminals:

NONTERMINAL |   id     |     +      |     *      |    (     |    )    |    $
E           | E → TE'  |            |            | E → TE'  | synch   | synch
E'          |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T           | T → FT'  | synch      |            | T → FT'  | synch   | synch
T'          |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F           | F → id   | synch      | synch      | F → (E)  | synch   | synch

Parsing and error recovery moves on the erroneous input ) id * + id:

STACK        | INPUT          | REMARK
E $          | ) id * + id $  | error, skip ); id is in FIRST(E)
E $          | id * + id $    |
T E' $       | id * + id $    |
F T' E' $    | id * + id $    |
id T' E' $   | id * + id $    |
T' E' $      | * + id $       |
* F T' E' $  | * + id $       |
F T' E' $    | + id $         | error, M[F, +] = synch; F has been popped
T' E' $      | + id $         |
E' $         | + id $         |
+ T E' $     | + id $         |
T E' $       | id $           |
F T' E' $    | id $           |
id T' E' $   | id $           |
T' E' $      | $              |
E' $         | $              |
$            | $              |

Phrase-level Recovery

Phrase-level error recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines. These routines may change, insert, or delete symbols on the input and issue appropriate error messages. They may also pop from the stack. Alteration of stack symbols or the pushing of new symbols onto the stack is questionable for several reasons. First, the steps carried out by the parser might then not correspond to the derivation of any word in the language at all. Second, we must ensure that there is no possibility of an infinite loop. Checking that any recovery action eventually results in an input symbol being consumed (or the stack being shortened if the end of the input has been reached) is a good way to protect against such loops.
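As a minimal illustration (not from the original text) of the phrase-level idea, a blank table cell can hold a pointer to an error routine instead of simply signalling failure. The sketch below shows only the table row for F, with two hypothetical routines: one reports a missing operand and behaves as if an id had been inserted, the other reports and deletes the offending symbol. The routine names, token encoding, and row layout are all assumptions of the sketch.

#include <stdio.h>

/* Columns of the parsing table: id  +  *  (  )  $ */
enum { T_ID, T_PLUS, T_STAR, T_LP, T_RP, T_END, NTOK };

/* A table cell is either a production or a phrase-level error routine. */
typedef int (*err_fn)(int *lookahead);   /* returns the (possibly repaired) lookahead */

struct cell {
    const char *production;   /* non-NULL: normal predictive-parse entry           */
    err_fn      recover;      /* non-NULL: error routine for a blank (error) entry */
};

/* Hypothetical routine: an operand was expected; act as if an id had been inserted. */
static int insert_id(int *lookahead) {
    fprintf(stderr, "error: missing operand, id inserted\n");
    (void)lookahead;          /* the real input is left alone in this sketch */
    return T_ID;
}

/* Hypothetical routine: report and delete the offending input symbol. */
static int delete_symbol(int *lookahead) {
    fprintf(stderr, "error: unexpected symbol deleted\n");
    *lookahead = T_END;       /* a real parser would fetch the next input token here */
    return *lookahead;
}

/* Row of the table for nonterminal F: F -> id and F -> (E) are the real entries;
   every blank entry points at an error routine instead of being a hard failure. */
static struct cell F_row[NTOK] = {
    [T_ID]   = { "F->id",  NULL },
    [T_PLUS] = { NULL,     insert_id },
    [T_STAR] = { NULL,     insert_id },
    [T_LP]   = { "F->(E)", NULL },
    [T_RP]   = { NULL,     delete_symbol },
    [T_END]  = { NULL,     delete_symbol },
};

int main(void) {
    int a = T_PLUS;                  /* erroneous lookahead while F is on top of the stack */
    if (!F_row[a].production)
        a = F_row[a].recover(&a);    /* phrase-level recovery: report and repair */
    printf("expand %s\n", F_row[a].production);
    return 0;
}

A real parser would attach such routines to every blank entry and, as noted above, would check that every recovery action eventually consumes an input symbol or shortens the stack, so that recovery cannot loop forever.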
