Unit-II: Top Down Parsing
Top down parsing is classified into two variants: one that uses back tracking and one that is non back tracking in nature.
Non Back Tracking Parsing: There are two variants of this parser as given below.
1. Table Driven Predictive Parsing:
i. LL (1) Parsing
Back Tracking Parsing:
1. Brute Force method
A table driven predictive parser has the following components:
An input buffer that contains the string to be parsed, followed by a $ symbol that indicates the end of the input.
A stack containing a sequence of grammar symbols with $ at the bottom. Initially the stack holds the start symbol of the grammar on top of $.
A parsing table containing the production rules to be applied. This is a two dimensional array M [Non terminal, Terminal].
A context free grammar is described by four components (V, T, P, S):
V is a finite set of non terminals. Non terminals are syntactic variables that denote sets of strings. The sets of strings denoted by non terminals help define the language generated by the grammar. Non terminals impose a hierarchical structure on the language that is key to syntax analysis and translation.
T is a finite set of terminals. Terminals are the basic symbols from which strings are formed. The term "token name" is a synonym for "terminal", and frequently we will use the word "token" for terminal when it is clear that we are talking about just the token name. We assume that the terminals are the first components of the tokens output by the lexical analyzer.
S is the starting symbol of the grammar. One non terminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar.
P is a finite set of productions. The productions of a grammar specify the manner in which the terminals and non terminals can be combined to form strings. Each production has the form α -> β, where α is a single non terminal and β is in (V U T)*. Each production consists of:
(a) A non terminal called the head or left side of the production; this production defines some of the strings denoted by the head.
(b) The symbol ->. Sometimes ::= is used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and non terminals. The components of the body describe one way in which strings of the non terminal at the head can be constructed.
Conventionally, the productions for the start symbol are listed first.
Example: Context free grammar to accept arithmetic expressions.
The terminals are +, *, -, (, ), id.
The non terminal symbols are expression, term and factor, and expression is the starting symbol.
DERIVATIONS:
The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules. Beginning with the start symbol, each rewriting step replaces a non terminal by the body of one of its productions. This derivational view corresponds to the top-down construction of a parse tree and, read in reverse, to the bottom-up construction of the parse tree.
Derivations are classified into Leftmost Derivations and Rightmost Derivations.
Leftmost derivation of id + id * id, using the grammar E -> E+E | E*E | -E | (E) | id:
E => E + E
=> id + E
=> id + E * E
=> id + id * E
=> id + id * id
NOTE: Every derivation starts from the start symbol. At each step of a leftmost derivation, the non terminal chosen for rewriting is the leftmost one in the current string.
Rightmost derivation of the same string:
E => E + E
=> E + E * E
=> E + E * id
=> E + id * id
=> id + id * id
NOTE: Every derivation starts from the start symbol. At each step of a rightmost derivation, the non terminal chosen for rewriting is the rightmost one in the current string.
What is a Parse Tree?
A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace non terminals.
Each interior node of a parse tree represents the application of a production.
All the interior nodes are non terminals and all the leaf nodes are terminals (or ε).
The leaf nodes, read from left to right, form the yield of the parse tree, i.e. the derived string.
If a node n is labeled X and has children n1, n2, …, nk with labels X1, X2, …, Xk respectively, then there must be a production X -> X1X2…Xk in the grammar.
Example 1: The parse tree for the input string -(id + id) using the above context free grammar is shown in Figure 2.4.
Figure 2.4: Parse tree for the input string -(id + id)
The following figure shows the step by step construction of the parse tree for the input string -(id + id).
Figure 2.5: Sequence of outputs of the parse tree construction process for the input string -(id + id)
Example 2: The parse tree for the input string id + id * id using the above context free grammar is shown in Figure 2.6.
Figure 2.6: Parse tree for the input string id + id * id
AMBIGUITY in CFGs:
Definition: A grammar that produces more than one parse tree for some sentence (input string) is said to be ambiguous.
In other words, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
A common symptom (a rule of thumb, not a definition): if the right hand side of a production contains two occurrences of the non terminal on its left hand side, as in E -> E+E, the grammar is usually ambiguous.
Example: If the grammar is E -> E+E | E*E | -E | (E) | id and the input string is id + id * id, then there are two parse trees, (a) and (b), for the given input string.
The two leftmost derivations for the given input string are:
(a)                      (b)
E => E + E               E => E * E
  => id + E                => E + E * E
  => id + E * E            => id + E * E
  => id + id * E           => id + id * E
  => id + id * id          => id + id * id
The above grammar gives two parse trees (two leftmost derivations) for the given input string, so it is an ambiguous grammar.
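The ambiguity can also be checked mechanically. The following is a minimal sketch, assuming the reduced grammar E -> E+E | E*E | id and a token list as input; the function name count_parse_trees is hypothetical and not part of these notes. It counts the parse trees of a token sequence by trying every operator as the root of the tree.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root production E -> E op E.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # prints 2
```

A count greater than 1 for any sentence is enough to show that the grammar is ambiguous.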
Note: An LL (1) parser cannot be constructed for an ambiguous grammar, because the parsing table of such a grammar contains multiply defined entries and the parser cannot choose a unique production. If necessary, we must remove all ambiguity from the grammar and then construct the parser.
ELIMINATING AMBIGUITY: Ambiguous grammars make a top down parser try several alternatives, which consumes more time during parsing, and the left recursive productions they usually contain can send the parser into an infinite loop.
Therefore, an ambiguous grammar can sometimes be rewritten to eliminate the ambiguity. The general form of the productions that cause the problem in such grammars is
A -> Aα | β
This can be written as (introducing one new non terminal A′ in place of the recursive non terminal)
A -> βA′
A′ -> αA′ | ε
Example: Let the grammar be E -> E+E | E*E | -E | (E) | id. It was shown above to be ambiguous. It can be written as
E -> E+E
E -> E-E
E -> E*E
E -> -E
E -> (E)
E -> id
In the above grammar the productions E -> E+E and E -> E*E cause the ambiguity. They can be written as
E -> E+E | E*E
and this production can again be written as
E -> E+E | β, where β is E*E
The above production has the same shape as the general form, so it can be written as
E -> E+T | T
T -> β
LEFT RECURSION:
Another feature of CFGs that is not desirable in top down parsers is left recursion. A grammar is left recursive if it has a non terminal A such that there is a derivation A =>+ Aα for some string α in (T U V)*. LL (1) and other top down parsers cannot handle left recursive grammars, because the parser keeps expanding A without consuming any input. So we need to remove the left recursion from the grammar before it is used in top down parsing.
The general form of (immediate) left recursion is
A -> Aα | β
The above left recursive production can be written as the non left recursive equivalent:
A -> βA′
A′ -> αA′ | ε
Example: Is the following grammar left recursive? If so, find a non left recursive grammar equivalent to it.
E -> E+T | T
T -> T*F | F
F -> -E | (E) | id
Yes, the grammar is left recursive because the first two productions match the general form of left recursion. After removing the left recursion from E -> E + T and T -> T * F, the grammar is
E -> TE′
E′ -> +TE′ | ε
T -> FT′
T′ -> *FT′ | ε
F -> -E | (E) | id
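The rewriting A -> Aα | β into A -> βA′, A′ -> αA′ | ε can be sketched in a few lines. This is only an illustration for immediate left recursion; the function name and the representation of a production body as a list of symbols (with the empty list standing for ε) are assumptions made here.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | ε.

    `bodies` is a list of production bodies, each a list of symbols.
    The empty list [] stands for ε. Only immediate left recursion is handled.
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E -> E+T | T and T -> T*F | F, the sketch produces the E, E′, T and T′ productions listed above.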
LEFT FACTORING:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing. A grammar in which more than one production for the same non terminal has a common prefix is rewritten by factoring out the prefix.
For example, in the following grammar there are n A productions that have the common prefix α, which should be factored out without changing the language defined for A.
A -> αA1 | αA2 | αA3 | αA4 | … | αAn
We can factor out α from all n productions by adding a new non terminal A′ and rewriting the grammar as
A -> αA′
A′ -> A1 | A2 | A3 | A4 | … | An
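A minimal sketch of this transformation follows, assuming that all alternatives of the non terminal share the prefix (the case described above). The function name left_factor, the list-of-symbols representation, and the sample grammar S -> iEtS | iEtSeS are illustrative assumptions, not taken from these notes.

```python
def left_factor(head, bodies):
    """Factor the longest prefix common to ALL alternatives of `head`.

    Each body is a list of symbols; [] stands for ε.
    """
    prefix = []
    for column in zip(*bodies):            # walk the alternatives in lockstep
        if all(symbol == column[0] for symbol in column):
            prefix.append(column[0])
        else:
            break
    if len(bodies) < 2 or not prefix:
        return {head: bodies}              # nothing to factor
    new_head = head + "'"
    return {
        head: [prefix + [new_head]],
        new_head: [body[len(prefix):] for body in bodies],
    }


print(left_factor("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"]]))
```

In the sample, the common prefix iEtS is factored out and S′ receives the alternatives ε and eS.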
Computation of FIRST:
The FIRST function computes the set of terminal symbols with which the strings derived from the right hand side of the productions begin. To compute FIRST (A) for all grammar symbols A, apply the following rules until no more terminals or ε can be added to any FIRST set.
1. If A is a terminal, then FIRST (A) = {A}.
2. If A is a non terminal and A -> X1X2…Xk is a production, add everything in FIRST (X1) except ε to FIRST (A). If X1 can derive ε, also add FIRST (X2) except ε to FIRST (A); if X2 can derive ε as well, add FIRST (X3) except ε, and so on. If all the Xi for i = 1..k can derive ε, add ε to FIRST (A).
3. If A -> ε is a production, then add ε to FIRST (A).
Computation Of FOLLOW:
FOLLOW (A) is the set of terminal symbols of the grammar that can appear immediately to the right of the non terminal A in some sentential form. For example, if the terminal a is to the immediate right of A in a production body, then a is in FOLLOW (A). To compute FOLLOW (A) for all non terminals A, apply the following rules until no more symbols can be added to any FOLLOW set.
1. Place $ in FOLLOW (S), where S is the start symbol and $ is the input right end marker.
2. If there is a production A -> αBβ, then everything in FIRST (β) except ε is in FOLLOW (B).
3. If there is a production A -> αB, or a production A -> αBβ where FIRST (β) contains ε, then everything in FOLLOW (A) is in FOLLOW (B).
Example: Compute the FIRST and FOLLOW values of the expression grammar
1. E -> TE′
2. E′ -> +TE′ | ε
3. T -> FT′
4. T′ -> *FT′ | ε
5. F -> (E) | id
Non terminal    FIRST         FOLLOW
E               { (, id }     { $, ) }
E′              { +, ε }      { $, ) }
T               { (, id }     { +, $, ) }
T′              { *, ε }      { +, $, ) }
F               { (, id }     { *, +, $, ) }
Table 2.1: FIRST and FOLLOW values
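The FIRST and FOLLOW rules are fixed point computations, so they can be sketched as loops that repeat until nothing changes. The code below is an illustrative sketch, not a definitive implementation: the dictionary encoding of the grammar, the marker "ε", and the function names are assumptions made here.

```python
EPS = "ε"

# Expression grammar of the example; [] is the ε-production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of the non terminals."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in first else {symbol}  # terminal
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {head: set() for head in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {head: set() for head in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is defined for non terminals only
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}  # rule 2
                    if EPS in rest:
                        new |= follow[head]  # rule 3
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

For the expression grammar the computed sets agree with Table 2.1.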
Constructing Predictive Or LL (1) Parse Table:
It is the process of placing all the productions of the grammar in the parse table based on the FIRST and FOLLOW values of the productions.
The rules to be followed to construct the parsing table (M) are:
1. For each production A -> α of the grammar, do steps 2 and 3 below.
2. For each terminal symbol a in FIRST (α), add the production A -> α to M [A, a].
3. i. If ε is in FIRST (α), add the production A -> α to M [A, b] for every terminal b in FOLLOW (A).
ii. If ε is in FIRST (α) and $ is in FOLLOW (A), then add the production A -> α to M [A, $].
4. Mark all other entries in the parsing table as error.
NON-TERMINALS    INPUT SYMBOLS
                 +             *             (            )           id           $
E                                            E -> TE′                 E -> TE′
E′               E′ -> +TE′                               E′ -> ε                  E′ -> ε
T                                            T -> FT′                 T -> FT′
T′               T′ -> ε       T′ -> *FT′                 T′ -> ε                  T′ -> ε
F                                            F -> (E)                 F -> id
Table 2.2: LL (1) Parsing Table for the Expressions Grammar
Note: If no entry of the table contains more than one production, then the grammar is LL (1) and is accepted by the LL (1) parser.
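The table construction rules can be sketched as follows. To keep the block self-contained, the FIRST and FOLLOW sets of Table 2.1 are written in by hand; the names PRODUCTIONS, first_of_body and build_ll1_table are assumptions of this sketch. Each table entry is kept as a list so that a multiply defined entry (a non LL (1) grammar) would show up as a list with more than one production.

```python
EPS = "ε"

# Productions of the expression grammar; [] is the ε body.
PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
# Values copied from Table 2.1.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}


def first_of_body(body):
    """FIRST of a production body (a list of symbols)."""
    result = set()
    for symbol in body:
        symbol_first = FIRST.get(symbol, {symbol})  # terminal: FIRST(a) = {a}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    return result | {EPS}


def build_ll1_table():
    table = {}
    for head, body in PRODUCTIONS:
        firsts = first_of_body(body)
        targets = firsts - {EPS}             # rule 2
        if EPS in firsts:
            targets |= FOLLOW[head]          # rule 3 (FOLLOW already holds $)
        for terminal in targets:
            table.setdefault((head, terminal), []).append(body)
    return table                             # missing keys are error entries


TABLE = build_ll1_table()
conflicts = {key: bodies for key, bodies in TABLE.items() if len(bodies) > 1}
print(len(TABLE), "entries,", len(conflicts), "conflicts")
```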
LL (1) Parsing Algorithm:
The parser acts on the basis of two symbols
i. A, the symbol on the top of the stack
ii. a, the current input symbol
There are three conditions on A and a that are used by the parsing program.
1. If A = a = $, then parsing is successful.
2. If A = a ≠ $, then the parser pops A off the stack and advances the current input pointer to the next symbol.
3. If A is a non terminal, the parser consults the entry M [A, a] in the parsing table. If M [A, a] is a production A -> X1X2..Xn, then the program replaces the A on the top of the stack by X1X2..Xn in such a way that X1 comes on the top. If M [A, a] is an error entry, the parser reports a syntax error.
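The three conditions translate directly into a small driver loop. The sketch below hard-codes Table 2.2 as a Python dictionary; the names TABLE, NONTERMINALS and ll1_parse are assumptions made for illustration, and the function only answers accept or reject.

```python
# Table 2.2 written as a dictionary: (non terminal, input symbol) -> body.
TABLE = {
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return True if the token list is accepted by the table driven parser."""
    stack = ["$", start]                 # start symbol on top of $
    tokens = list(tokens) + ["$"]        # input followed by the end marker
    pos = 0
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "$" and current == "$":
            return True                  # condition 1: success
        if top in NONTERMINALS:
            body = TABLE.get((top, current))
            if body is None:
                return False             # error entry
            stack.pop()
            stack.extend(reversed(body))  # condition 3: X1 ends up on top
        elif top == current:
            stack.pop()                  # condition 2: match and advance
            pos += 1
        else:
            return False                 # terminal mismatch


print(ll1_parse(["id", "+", "id", "*", "id"]))  # prints True
```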
advance( );
return true;
}
else return error;
}
advance()
{
input = next token;
}
BACK TRACKING: This parsing method uses the technique called the Brute Force method during the parse tree construction process. It allows the process to go back (back track) and redo the steps by undoing the work done so far at that point of processing.
Brute force method: It is a top down parsing technique that is used when there is more than one alternative in the productions to be tried while parsing the input string. It selects the alternatives in the order in which they appear, and when it realizes that something has gone wrong it tries the next alternative.
For example, consider the grammar below.
S -> cAd
A -> ab | a
To generate the input string "cad", the first parse tree (1) is generated initially, using the alternative A -> ab. As the string generated is not "cad", the input pointer is back tracked to the position of A to examine the next alternative of A. Now a match to the input string occurs, as shown in the second parse tree (2).
[Figure: parse trees (1) and (2) for the input string cad]
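The brute force method can be sketched with a generator that tries the alternatives in order and falls back to the next one when a match fails. This is a minimal sketch for the grammar above; the names GRAMMAR, derive and accepts are hypothetical, and the approach assumes the grammar is not left recursive (otherwise the recursion would never terminate).

```python
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}


def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for body in GRAMMAR[first]:              # alternatives in listed order
            for middle in derive(body, text, pos):
                # If `rest` fails here, the loop back tracks to the next
                # way of matching `first` (or to the next alternative).
                yield from derive(rest, text, middle)
    elif pos < len(text) and text[pos] == first:
        yield from derive(rest, text, pos + 1)   # terminal matched


def accepts(text, start="S"):
    return any(end == len(text) for end in derive([start], text, 0))


print(accepts("cad"))  # prints True
```

For "cad" the alternative A -> ab is tried first and fails at b, after which A -> a succeeds, which mirrors the two parse trees of the example.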
IMPORTANT AND EXPECTED QUESTIONS
1. Explain the components and working of a Predictive Parser with an example?
2. What do the FIRST and FOLLOW values represent? Give the algorithm for computing FIRST and FOLLOW of grammar symbols with an example?
3. Construct the LL (1) Parsing table for the following grammar?
E -> E+T | T
T -> T*F
F -> (E) | id
4. For the above grammar, construct and explain the Recursive Descent Parser?
5. What happens if multiple entries occur in your LL (1) Parsing table? Justify your answer? How does the Parser
ASSIGNMENT QUESTIONS
4. Will the Predictive parser accept an ambiguous Grammar? Justify your answer?
Operator Precedence Parsing
[Figure: model of the operator precedence parser: stack with $ at the bottom, parsing algorithm, output]
Example: For the grammar
E -> E+E
E -> E-E
E -> E*E
E -> E/E
E -> E^E
E -> -E
E -> (E)
E -> id
construct the operator precedence table and accept the input string "id+id*id".
The first handle is 'id', and the match for 'id' in the grammar is E -> id. So, id is replaced with the non terminal E, and the given input string can be written as
2. $ <• E •> * <• id •> $
The parser does not consider a non terminal as an input symbol, so non terminals are dropped from the input string. The string becomes
3. $ <• * <• id •> $
The next handle is 'id', and the match for 'id' in the grammar is E -> id. So, id is replaced with the non terminal E, and the given input string can be written as
4. $ <• * <• E •> $
The parser does not consider a non terminal as an input symbol, so non terminals are dropped from the input string. The string becomes
5. $ <• * •> $
The next handle is '*', and the match for it in the grammar is E -> E*E. So, E*E is replaced with the non terminal E, and the given input string can be written as
6. $ E $
The parser does not consider a non terminal as an input symbol, so non terminals are dropped from the input string. The string becomes
7. $ $
$ on $ means that parsing is successful.
Operator Parsing Algorithm:
The operator precedence parsing program determines the action of the parser depending on
1. a, the top most terminal symbol on the stack
2. b, the current input symbol
There are 3 conditions on a and b that are important for the parsing program
1. a = b = $: the parsing is successful.
2. a <• b or a = b (equal precedence): the parser shifts the input symbol on to the stack and advances the input pointer to the next input symbol.
3. a •> b: the parser performs the reduce action. The parser pops elements one by one from the stack until the element on the top of the stack has lower precedence than the most recently popped terminal.
Example: the sequence of actions taken by the parser using the stack for the input string "id * id" and the corresponding parse tree are as under.
[Figure: parse tree for id * id, with root E, children E * E, and each child E deriving id]
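The shift and reduce conditions of the operator parsing algorithm can be sketched as below. The precedence relations in PREC (for id, + and * only, with * binding tighter than + and both left associative) are the usual ones but are written in here as an assumption, since the notes do not list the table. Following the description above, only terminals are kept on the stack, so this sketch records the order of reductions and does not check that every operator has its operands.

```python
# PREC[a][b] is the relation between stack terminal a and input terminal b.
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}


def operator_precedence_parse(tokens):
    """Return the terminals in the order in which their handles are reduced."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        a, b = stack[-1], tokens[pos]
        if a == "$" and b == "$":
            return reductions                    # condition 1: success
        relation = PREC[a].get(b)
        if relation in ("<", "="):
            stack.append(b)                      # condition 2: shift
            pos += 1
        elif relation == ">":
            # condition 3: pop until the new top has lower precedence
            # than the most recently popped terminal
            while True:
                popped = stack.pop()
                if PREC[stack[-1]].get(popped) == "<":
                    break
            reductions.append(popped)
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


print(operator_precedence_parse(["id", "*", "id"]))
print(operator_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id + id * id the operator * is reduced before +, which reflects its higher precedence.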
Advantages and Disadvantages of Operator Precedence Parsing:
The following are the advantages of operator precedence parsing:
1. It is a simple parsing technique that is easy to implement.
2. The operator precedence parser can be constructed by hand after understanding the grammar. It is simple to debug.
The following are the disadvantages of operator precedence parsing:
1. It is difficult to handle an operator like '-', which can be either unary or binary and hence has different precedences and associativities.
2. It can parse only a small class of grammars.
3. Addition or deletion of rules requires the parser to be rewritten.
4. There are too many error entries in the parsing tables.
LR Parsing:
The most prevalent type of bottom up parsing is LR (k) parsing, where L stands for the left to right scan of the given input string, R stands for a rightmost derivation in reverse, and k is the number of input symbols used as look ahead.
It is the most general non back tracking shift reduce parsing method.
The class of grammars that can be parsed using the LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
[Figure: LR parser model: stack and LR parsing table with its shift (ACTION) and GOTO parts]
2. Simple LR ( 1 )
3. Canonical LR ( 1 )
4. Look ahead LR ( 1 )
For example, if the grammar G is
E -> E+T | T
T -> T*F
F -> (E) | id
then the augmented grammar G′ is represented by
E′ -> E
E -> E+T | T
T -> T*F
F -> (E) | id
NOTE: An augmented grammar is obtained simply by adding one extra production, which preserves the actual meaning of the given grammar G.
Canonical collection of LR (0) items
LR (0) items
An LR (0) item of a grammar G is a production of G with a dot at some position on the right side of the production. An item indicates how much of the input has been scanned up to a given point in the process of parsing. For example, if the production is X -> AB, then the LR (0) items are:
1. X -> •AB, indicating that the parser expects a string derivable from AB.
2. X -> A•B, indicating that the parser has scanned a string derivable from A and expects a string derivable from B.
3. X -> AB•, indicating that the parser has scanned a string derivable from AB.
If the production is X -> ε, then the LR (0) item is
X -> •, indicating that the production is a reduced one.
Canonical collection of LR(0) Items:
This is the process of grouping the LR (0) items together based on the Closure and Go to operations.
Closure operation
If I is an initial state, then Closure (I) is constructed as follows:
1. Initially, add the augmented production to the state and check the '•' symbol on the right hand side of the production. If '•' is followed by a non terminal, then add the productions which start with that non terminal to the state I.
2. If an item X -> α•Aβ is in I, then add the productions which start with A to the state I, each with '•' at the first position. Rule 2 is applied until no more productions can be added to the state I (meaning that every newly added '•' is followed by a terminal symbol or by a non terminal whose productions are already in I).
Example: The LR (0) items (with '•' at the first position) for the grammar are
Production        LR (0) item
0. E′ -> E        E′ -> •E
1. E -> E+T       E -> •E+T
2. T -> F         T -> •F
3. T -> T*F       T -> •T*F
4. F -> (E)       F -> •(E)
5. F -> id        F -> •id
GO TO Operation
Go to (I0, X), where I0 is a set of items and X is a grammar symbol, is computed by moving the '•' symbol over X in every item of I0 in which '•' is immediately followed by X. It is like finding the next state of an NFA for a given state I0 and input symbol X. For example, if I0 contains the item E -> •E+T, then moving the dot over E gives E -> E•+T.
Note: Once we complete the Go to operation, we need to compute the closure of the resulting items.
Go to (I0, E) = Closure ({E′ -> E•, E -> E•+T})
I0                       Go to (I0, E)
E′ -> •E        E        E′ -> E•
E -> •E+T     ---->      E -> E•+T
T -> •T*F
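If there is a transition from one state (Ii) to another state (Ij) on a terminal, then we write the shift entry in the ACTION part as shown below:
Ii contains A -> α•aβ, Ij contains A -> αa•β, and the transition from Ii to Ij is on a.
States    ACTION          GO TO
          a       $       A
Ii        Sj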
If there is a transition from one state (Ii) to another state (Ij) on a non terminal, then we write the subscript of the target state Ij, i.e. j, in the GO TO part as shown below:
Ii contains A -> α•Aβ, Ij contains A -> αA•β, and the transition from Ii to Ij is on A.
States    ACTION          GO TO
          a       $       A
Ii                        j
If a state (Ii) contains a production whose dot is at the right end, so that it has no further transitions, the production is said to be a reduced production. Such productions get a reduce entry in the ACTION part along with their production numbers. If the augmented production is the one being reduced, write accept in the ACTION part.
Ii contains the reduced item 1. A -> αβ•
States    ACTION          GO TO
          a       $       A
Ii        r1      r1
For example, construct the LR (0) parsing table for the given grammar (G)
S -> aB
B -> bB | b
Sol: 1. Add the augmented production and insert the '•' symbol at the first position of every production in G
0. S′ -> •S
1. S -> •aB
2. B -> •bB
3. B -> •b
I0 State:
1. Add the augmented production to the I0 state and compute the closure
I0 = Closure (S′ -> •S)
Since '•' is followed by the non terminal S, add all productions starting with S to the I0 state. So, the I0 state becomes
I0 = S′ -> •S
     S -> •aB
Here, in the S production the '•' symbol is followed by a terminal, so close the state.
I1 = Go to (I0, S)
   = Closure (S′ -> S•) = S′ -> S•
Here, the production is reduced, so close the state.
I1 = S′ -> S•
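I2 = Go to (I0, a)
   = Closure (S -> a•B)
Since '•' is followed by the non terminal B, add all productions starting with B.
I2 = S -> a•B
     B -> •bB
     B -> •b
The dot symbol is followed by a terminal, so close the state.
I3 = Go to (I2, B) = S -> aB•
I4 = Go to (I2, b)
   = B -> b•B
     B -> •bB
     B -> •b
     B -> b•
I5 = Go to (I4, B) = B -> bB•
Go to (I4, b) = I4, so no new state is created.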
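The Closure and Go to operations and the collection of states can be sketched as follows for the grammar S -> aB, B -> bB | b. An item is represented here as a (head, body, dot position) triple; this representation and the function names are assumptions made for illustration. Symbols are processed in sorted order only to make the state numbering deterministic.

```python
GRAMMAR = {"S'": [("S",)], "S": [("a", "B")], "B": [("b", "B"), ("b",)]}


def closure(items):
    """Closure of a set of LR (0) items, each a (head, body, dot) triple."""
    result = set(items)
    work = list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:   # dot before a non terminal
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, then take the closure."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection(start="S'"):
    states = [closure({(start, GRAMMAR[start][0], 0)})]
    transitions = {}
    for index, state in enumerate(states):     # `states` grows during the loop
        symbols = sorted({b[d] for (_, b, d) in state if d < len(b)})
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
    return states, transitions


STATES, TRANSITIONS = canonical_collection()
print(len(STATES), "states")
for key in sorted(TRANSITIONS):
    print(key, "->", TRANSITIONS[key])
```

With this processing order the sketch yields six states, numbered I0 to I5 as in the construction above, and the transition on b from I4 leads back to I4.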
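Drawing Finite State diagram DFA: The following DFA gives the state transitions of the parser and is useful in constructing the LR parsing table.
[Figure: DFA of the LR (0) item sets]
I0 = { S′->•S, S->•aB }
I1 = { S′->S• }
I2 = { S->a•B, B->•bB, B->•b }
I3 = { S->aB• }
I4 = { B->b•B, B->•bB, B->•b, B->b• }
I5 = { B->bB• }
Transitions: I0 --S--> I1, I0 --a--> I2, I2 --B--> I3, I2 --b--> I4, I4 --B--> I5, I4 --b--> I4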
LR Parsing Table:
States    ACTION                      GOTO
          a       b        $          S     B
I0        S2                          1
I1                         ACC
I2                S4                        3
I3        R1      R1       R1
I4        R3      S4/R3    R3               5
I5        R2      R2       R2
Note: If there are multiple entries in the LR (0) parsing table, then the grammar is not accepted by the LR (0) parser. In the above table the I4 row has two entries for the single terminal 'b'; this is called a shift-reduce conflict.
Shift-Reduce Conflict in LR (0) Parsing: A shift-reduce conflict in LR (0) parsing occurs when a state has
1. A reduced item of the form A -> α• and
2. An incomplete item of the form A -> β•aα as shown below:
Ii contains 1. A -> α• and A -> β•aα, with a transition on a from Ii to Ij.
States    ACTION          GO TO
          a       $       A     B
Ii        Sj/r1   r1
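The condition can be tested directly on a set of items. The sketch below uses the same (head, body, dot position) representation of items as an assumption; the function name lr0_conflicts and the constants I2 and I4 (copied from the worked example for S -> aB, B -> bB | b) are illustrative only.

```python
def lr0_conflicts(state, nonterminals):
    """Report LR (0) conflicts of one state of (head, body, dot) items."""
    reduced = [(h, b) for (h, b, d) in state if d == len(b)]
    shift_symbols = {b[d] for (h, b, d) in state
                     if d < len(b) and b[d] not in nonterminals}
    conflicts = []
    if reduced and shift_symbols:
        conflicts.append("shift-reduce")     # reduced item next to a shift item
    if len(reduced) > 1:
        conflicts.append("reduce-reduce")    # two different reduced items
    return conflicts


NONTERMINALS = {"S'", "S", "B"}
# I4 = { B -> b•B, B -> •bB, B -> •b, B -> b• }
I4 = {("B", ("b", "B"), 1), ("B", ("b", "B"), 0), ("B", ("b",), 0), ("B", ("b",), 1)}
# I2 = { S -> a•B, B -> •bB, B -> •b }
I2 = {("S", ("a", "B"), 1), ("B", ("b", "B"), 0), ("B", ("b",), 0)}

print(lr0_conflicts(I4, NONTERMINALS))  # prints ['shift-reduce']
print(lr0_conflicts(I2, NONTERMINALS))  # prints []
```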
SLR (1) Parsing: Various steps involved in the SLR (1) Parsing:
1. Write the context free grammar for the given input string
2. Check for ambiguity
3. Add the augmented production
If there is a transition from one state (Ii) to another state (Ij) on a terminal, then we write the shift entry in the ACTION part as shown below:
Ii contains A -> α•aβ, Ij contains A -> αa•β, and the transition from Ii to Ij is on a.
States    ACTION          GO TO
          a       $       A
Ii        Sj
If there is a transition from one state (Ii) to another state (Ij) on a non terminal, then we write the subscript of the target state Ij, i.e. j, in the GO TO part as shown below:
Ii contains A -> α•Aβ, Ij contains A -> αA•β, and the transition from Ii to Ij is on A.
States    ACTION          GO TO
          a       $       A
Ii                        j
If a state (Ii) contains a production (A -> αβ•) that has no transitions to a next state, the production is said to be a reduced production. For all terminals X in FOLLOW (A), write the reduce entry along with the production number. If the augmented production is the one being reduced, write accept.
For example, with the items
1. S -> •aAb
2. A -> αβ•
Follow (S) = {$}
Follow (A) = {b}
a state Ii containing item 2 gets the reduce entry only under b:
States    ACTION                  GO TO
          a       b       $       S     A
Ii                r2
For the grammar
S -> aB
B -> bB | b
the SLR (1) parsing table is
States    ACTION                      GOTO
          a       b       $           S     B
I0        S2                          1
I1                        ACCEPT
I2                S4                        3
I3                        R1
I4                S4      R3                5
I5                        R2
Note: When multiple entries occur in the SLR table, the grammar is not accepted by the SLR (1) parser.
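The difference from LR (0) is that a reduced item A -> α• contributes a reduce entry only under the terminals in FOLLOW (A). A minimal sketch of the ACTION entries of one state is given below; the function name slr_actions, the item representation and the hand-written FOLLOW sets for S -> aB, B -> bB | b are assumptions of this sketch.

```python
def slr_actions(state, follow, nonterminals):
    """ACTION entries of one state: terminal -> set of possible actions."""
    actions = {}
    for head, body, dot in state:
        if dot < len(body):
            if body[dot] not in nonterminals:          # dot before a terminal
                actions.setdefault(body[dot], set()).add("shift")
        else:                                           # reduced item
            for terminal in follow[head]:
                actions.setdefault(terminal, set()).add(("reduce", head, body))
    return actions


NONTERMINALS = {"S'", "S", "B"}
FOLLOW = {"S": {"$"}, "B": {"$"}}        # for S -> aB, B -> bB | b
# I4 = { B -> b•B, B -> •bB, B -> •b, B -> b• }
I4 = {("B", ("b", "B"), 1), ("B", ("b", "B"), 0), ("B", ("b",), 0), ("B", ("b",), 1)}

ACTIONS = slr_actions(I4, FOLLOW, NONTERMINALS)
has_conflict = any(len(entries) > 1 for entries in ACTIONS.values())
print(ACTIONS, has_conflict)
```

State I4, which has a shift-reduce conflict under LR (0), gets a single shift entry under b and a single reduce entry under $, matching the I4 row of the SLR table.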
Conflicts in the SLR (1) Parsing:
When multiple entries occur in the table, the situation is said to be a conflict.
Shift-Reduce Conflict in SLR (1) Parsing: A shift-reduce conflict in SLR (1) parsing occurs when a state has
1. A reduced item of the form A -> α•, where Follow (A) includes the terminal 'a'.
2. An incomplete item of the form A -> β•aα, as shown below:
Ii contains 1. A -> β•aα and 2. B -> b•, where Follow (B) includes a, with a transition on a from Ii to Ij.
States    Action          GOTO
          a       $       A     B
Ii        Sj/r2
A reduce-reduce conflict occurs when a state has two reduced items, 1. A -> α• and 2. B -> β•, whose Follow sets share a terminal 'a':
States    Action          GOTO
          a       $       A     B
Ii        r1/r2
Canonical LR (1) Parsing: Various steps involved in the CLR (1) Parsing:
1. Write the context free grammar for the given input string
2. Check for ambiguity
5. Draw the DFA
7. Based on the information from the table, with the help of the stack and the parsing algorithm, generate the output.
LR (1) items:
An LR (1) item is defined by a production, the position of the dot, and a terminal symbol. The terminal is called the look ahead symbol.
The general form of an LR (1) item is S->α•Aβ, $
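Example: Consider the grammar 1. S->CC, 2. C->cC, 3. C->d with the augmented production S′->S.
I0 State: Add the augmented production and compute the closure; the look ahead symbol for the augmented production is $.
I0 = Closure (S′->•S, $)
The dot symbol is followed by the non terminal S. So, add the productions starting with S to the I0 state.
S->•CC, $
The dot symbol is followed by the non terminal C. So, add the productions starting with C to the I0 state. Their look ahead is FIRST of whatever follows that C, followed by the look ahead of the item, i.e. FIRST (C$).
C->•cC, FIRST (C$)
C->•d, FIRST (C$)
Since FIRST (C$) = {c, d}, these items are
C->•cC, c/d
C->•d, c/d
The dot symbol is followed by a terminal. So, close the I0 state. The productions in I0 are
S′->•S, $
S->•CC, $
C->•cC, c/d
C->•d, c/d
I2 = Go to (I0, C) = Closure (S->C•C, $)
The dot symbol is followed by the non terminal C and nothing follows that C, so the look ahead of the added productions is FIRST ($) = {$}:
C->•cC, $
C->•d, $
So, the I2 state is
S->C•C, $
C->•cC, $
C->•d, $
I3 = Go to (I0, c) =
C->c•C, c/d
C->•cC, c/d
C->•d, c/d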
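The look ahead computation in the closure can be sketched as follows. An LR (1) item is represented here as a (head, body, dot position, look ahead) tuple, with one look ahead per item, so "c/d" appears as two items. The FIRST sets are written in by hand and first_of assumes a grammar without ε-productions, which holds for S -> CC, C -> cC | d; all names are assumptions of this sketch.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # written in by hand


def first_of(symbols):
    """FIRST of a non empty symbol sequence (no ε-productions assumed)."""
    head = symbols[0]
    return FIRST.get(head, {head})           # a terminal is its own FIRST


def closure_lr1(items):
    result = set(items)
    work = list(items)
    while work:
        _, body, dot, look = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            # look aheads of the new items: FIRST(beta followed by look ahead)
            for new_look in first_of(body[dot + 1:] + (look,)):
                for production in GRAMMAR[body[dot]]:
                    item = (body[dot], production, 0, new_look)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto_lr1(items, symbol):
    moved = {(h, b, d + 1, la) for (h, b, d, la) in items
             if d < len(b) and b[d] == symbol}
    return closure_lr1(moved)


I0 = closure_lr1({("S'", ("S",), 0, "$")})
I2 = goto_lr1(I0, "C")
for item in sorted(I0):
    print(item)
```

I0 comes out with six tuples (S′->•S, $; S->•CC, $; and the two C items with look aheads c and d each), and I2 contains the three items with look ahead $ shown above.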
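Drawing the Finite State Machine DFA for the above LR (1) items
[Figure: DFA of the LR (1) item sets]
I0 = { S′->•S, $ ; S->•CC, $ ; C->•cC, c/d ; C->•d, c/d }
I1 = { S′->S•, $ }
I2 = { S->C•C, $ ; C->•cC, $ ; C->•d, $ }
I3 = { C->c•C, c/d ; C->•cC, c/d ; C->•d, c/d }
I4 = { C->d•, c/d }
I5 = { S->CC•, $ }
I6 = { C->c•C, $ ; C->•cC, $ ; C->•d, $ }
I7 = { C->d•, $ }
I8 = { C->cC•, c/d }
I9 = { C->cC•, $ }
Transitions: I0 --S--> I1, I0 --C--> I2, I0 --c--> I3, I0 --d--> I4, I2 --C--> I5, I2 --c--> I6, I2 --d--> I7, I3 --C--> I8, I3 --c--> I3, I3 --d--> I4, I6 --C--> I9, I6 --c--> I6, I6 --d--> I7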
Construction of CLR (1) Table
Rule 1: If there is an item [A->α•Xβ, b] in Ii and goto (Ii, X) = Ij, then set action [Ii][X] = Shift j, where X is a terminal.
Rule 2: If there is an item [A->α•, b] in Ii and A ≠ S′, then set action [Ii][b] = reduce, along with the production number.
Rule 3: If there is an item [S′->S•, $] in Ii, then set action [Ii][$] = Accept.
Rule 4: If there is an item [A->α•Xβ, b] in Ii and goto (Ii, X) = Ij, then set goto [Ii][X] = j, where X is a non terminal.
States    ACTION                      GOTO
          c       d       $           S     C
I0        S3      S4                  1     2
I1                        ACCEPT
I2        S6      S7                        5
I3        S3      S4                        8
I4        R3      R3
I5                        R1
I6        S6      S7                        9
I7                        R3
I8        R2      R2
I9                        R2
Table: LR (1) Table
LALR (1) parsing merges the CLR (1) states that have the same productions. The states I4 and I7 differ only in their look-aheads. They have the same productions. Hence these states are combined to form a single state called I47.
Similarly, the states I3 and I6 differ only in their look-aheads, as given below:
I3 = Goto (I0, c) =
C->c•C, c/d
C->•cC, c/d
C->•d, c/d
I6 = Goto (I2, c) =
C->c•C, $
C->•cC, $
C->•d, $
These states differ only in their look-aheads. They have the same productions. Hence these states are combined to form a single state called I36.
Similarly, the states I8 and I9 differ only in their look-aheads. Hence they are combined to form the state I89.
States    ACTION                      GOTO
          c       d       $           S     C
I0        S36     S47                 1     2
I1                        ACCEPT
I2        S36     S47                       5
I36       S36     S47                       89
I47       R3      R3      R3
I5                        R1
I89       R2      R2      R2
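The merging step can be sketched by grouping the LR (1) states on their core, i.e. on their items with the look aheads removed. The state contents below are copied from the example for a few states only; the tuple representation (head, body, dot position, look ahead) and the function names are assumptions of this sketch.

```python
def core(state):
    """The LR (0) core of an LR (1) state: its items without look aheads."""
    return frozenset((head, body, dot) for (head, body, dot, _) in state)


def merge_by_core(states):
    """Group the names of the states that share the same core."""
    groups = {}
    for name, items in states.items():
        groups.setdefault(core(items), []).append(name)
    return sorted(groups.values())


# A few CLR (1) states of the example, items as (head, body, dot, look ahead).
CLR_STATES = {
    "I4": {("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")},
    "I5": {("S", ("C", "C"), 2, "$")},
    "I7": {("C", ("d",), 1, "$")},
    "I8": {("C", ("c", "C"), 2, "c"), ("C", ("c", "C"), 2, "d")},
    "I9": {("C", ("c", "C"), 2, "$")},
}

print(merge_by_core(CLR_STATES))  # prints [['I4', 'I7'], ['I5'], ['I8', 'I9']]
```

The merged state takes the union of the look aheads, which is why the I47 and I89 rows of the LALR table have reduce entries under c, d and $.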
Shift / Reduce Conflict in CLR (1) Parsing
A shift-reduce conflict in CLR (1) parsing occurs when a state has
1. A reduced item of the form A -> α•, a and
2. An incomplete item of the form A -> β•aα, as shown below:
Ii contains 1. A -> β•aα, $ and 2. B -> b•, a, with a transition on a from Ii to Ij.
States    Action          GOTO
          a       $       A     B
Ii        Sj/r2
Reduce / Reduce Conflict in CLR (1) Parsing
A reduce-reduce conflict in CLR (1) parsing occurs when a state has two or more reduced items of the form
1. A -> α•
2. B -> β•
that reduce on the same look ahead symbol, as shown below:
Ii contains 1. A -> α•, a and 2. B -> β•, a.
States    Action          GOTO
          a       $       A     B
Ii        r1/r2
String Acceptance using LR Parsing:
Consider the above example. If the input string is cdd, the CLR (1) table is used as follows.
States    ACTION                      GOTO
          c       d       $           S     C
I0        S3      S4                  1     2
I1                        ACCEPT
I2        S6      S7                        5
I3        S3      S4                        8
I4        R3      R3
I5                        R1
I6        S6      S7                        9
I7                        R3
I8        R2      R2
I9                        R2
Stack        Input    Action
$0           cdd$     Shift S3
$0c3         dd$      Shift S4
$0c3d4       d$       Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0c3C        d$       Goto (I3, C) = 8
$0c3C8       d$       Reduce with R2, C->cC, pop 2*|β| symbols from the stack
$0C          d$       Goto (I0, C) = 2
$0C2         d$       Shift S7
$0C2d7       $        Reduce with R3, C->d, pop 2*|β| symbols from the stack
$0C2C        $        Goto (I2, C) = 5
$0C2C5       $        Reduce with R1, S->CC, pop 2*|β| symbols from the stack
$0S          $        Goto (I0, S) = 1
$0S1         $        Accept
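The trace can be reproduced with a small table driven loop. The sketch below writes the CLR (1) table above as Python dictionaries; the names ACTION, GOTO, PRODUCTIONS and lr_parse are assumptions. As a simplification it keeps only the state numbers on the stack, so a reduction pops |β| entries instead of the 2*|β| symbols of the trace, where grammar symbols and states are interleaved.

```python
# Production number -> (head, length of the body): 1. S->CC  2. C->cC  3. C->d
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}
ACTION = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4),
    (1, "$"): ("acc",),
    (2, "c"): ("s", 6), (2, "d"): ("s", 7),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", 3), (4, "d"): ("r", 3),
    (5, "$"): ("r", 1),
    (6, "c"): ("s", 6), (6, "d"): ("s", 7),
    (7, "$"): ("r", 3),
    (8, "c"): ("r", 2), (8, "d"): ("r", 2),
    (9, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}


def lr_parse(tokens):
    """Return the list of production numbers used, or None on a syntax error."""
    stack = [0]                           # state numbers only
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            return None                   # blank entry: error
        if action[0] == "s":              # shift: push the state, advance input
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":            # reduce: pop |body| states, then goto
            head, length = PRODUCTIONS[action[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(action[1])
        else:
            return reductions             # accept


print(lr_parse("cdd"))  # prints [3, 2, 3, 1]
```

The printed production numbers are the reductions R3, R2, R3, R1 of the trace, i.e. a rightmost derivation of cdd in reverse.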
Ambiguity: A grammar can have more than one parse tree for a string. For example, consider grammar.
A grammar is said to be an ambiguous grammar if there is some string that it can generate in more than one way (i.e., the string has more than one parse tree or more than one leftmost derivation). A language is inherently ambiguous if it can only be generated by ambiguous grammars.
In this grammar, the string 9-5+2 has two possible parse trees.
The two trees for 9-5+2 correspond to the two ways of parenthesizing the expression: (9-5)+2 and 9-(5+2). The second parenthesization gives the expression the value 2 instead of 6.
Ambiguity is problematic because the meaning of a program can become incorrect.
Ambiguity is harmful to the intent of the program. The input might be interpreted in a way that was not the intention of the programmer, as shown above in the 9-5+2 example. There is no general technique to handle ambiguity, i.e., it is not possible to develop a feature that automatically identifies and removes ambiguity from any grammar. However, it can be removed, broadly speaking, in the following possible ways:
2) Implementing precedence and associativity rules in the grammar. We shall discuss this technique later.
If an operand has an operator on both sides, the side on which the operator takes this operand is the associativity of that operator.
Grammar to generate strings with right associative operators:
right -> letter = right | letter
letter -> a | b | … | z
A binary operation * on a set S that does not satisfy the associative law is called non-associative. A left-associative operation is a non-associative operation that is conventionally evaluated from left to right, i.e., the operand is taken by the operator on its left side.
For example,
6*5*4 = (6*5)*4 and not 6*(5*4)
6/5/4 = (6/5)/4 and not 6/(5/4)
A right-associative operation is conventionally evaluated from right to left, i.e., the operand is taken by the operator on its right side.
For example,
6^5^4 => 6^(5^4) and not (6^5)^4
x=y=z=5 => x=(y=(z=5))
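The effect of associativity on the value of an expression can be sketched by evaluating the same flat token list in both directions. The token list format (operands and operator strings alternating) and the function names are assumptions made for illustration; no precedence is applied, only the grouping direction.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}


def eval_left(tokens):
    """Group from the left: a op b op c = (a op b) op c."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


def eval_right(tokens):
    """Group from the right: a op b op c = a op (b op c)."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[1]](tokens[0], eval_right(tokens[2:]))


print(eval_left([9, "-", 5, "+", 2]))    # (9-5)+2, prints 6
print(eval_right([9, "-", 5, "+", 2]))   # 9-(5+2), prints 2
print(eval_right([2, "^", 3, "^", 2]))   # 2^(3^2), prints 512
```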
Following is the grammar to generate strings with left associative operators. (Note that this grammar is left recursive and may send a top down parser into an infinite loop. We will handle this problem later on by making it right recursive.)
IMPORTANT QUESTIONS
1. Discuss the working of Bottom up parsing and specifically the Operator Precedence Parsing with an example?
2. What do you mean by an LR parser? Explain the LR (1) Parsing technique?
3. Write the differences between the canonical collection of LR (0) items and LR (1) items?
4. Write the difference between CLR (1) and LALR (1) parsing?
5. What is YACC? Explain how you use it in constructing a parser.
ASSIGNMENT QUESTIONS
5. E -> E+T | T
T -> T*F
F -> (E) | id, construct the LALR (1) Parsing table? And explain the Conflicts?