CD - R16 - UNIT III - Notes
BOTTOM-UP PARSING
HANDLE A handle of a string is a substring that matches the right side of a production,
and whose reduction to the non-terminal on the left side of the production
represents one step along the reverse of a rightmost derivation
SHIFT The next input symbol is shifted onto the top of the stack.
REDUCE The parser replaces the handle within a stack with a non-terminal.
ERROR The parser discovers that a syntax error has occurred and calls an error
recovery routine.
LR PARSERS An efficient bottom-up syntax analysis technique that can be used to parse a
large class of CFGs is called LR(k) parsing. The ‘L’ is for left-to-right scanning
of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the
‘k’ for the number of input symbols of look-ahead used in making parsing decisions.
TYPES OF LR PARSING METHODS
1. SLR - Simple LR
▪ Easiest to implement, least powerful.
2. CLR - Canonical LR
▪ Most powerful, most expensive.
3. LALR - Look-Ahead LR
▪ Intermediate in size and cost between the other two methods.
LR (1) ITEM An LR (1) item is defined by a production, the position of the dot in its
right side, and a terminal symbol.
The terminal is called the look-ahead symbol.
CONCEPTS
Constructing a parse tree for an input string beginning at the leaves and going towards
the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
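Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)        RIGHTMOST DERIVATION
abbcde   (A → b)            S ⇒ aABe
aAbcde   (A → Abc)            ⇒ aAde
aAde     (B → d)              ⇒ aAbcde
aABe     (S → aABe)           ⇒ abbcde
S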
The reductions trace out the right-most derivation in reverse.
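The reduction sequence above can be replayed mechanically. The following is a minimal sketch, assuming the grammar S → aABe, A → Abc | b, B → d; the helper name `replay` and the (left side, right side, position) encoding of each step are illustrative choices, not part of the notes.

```python
# Replay the leftmost reductions of "abbcde" for the grammar
#   S -> aABe,  A -> Abc | b,  B -> d
# Each step gives (left side, right side, index where the handle starts).
steps = [
    ("A", "b", 1),      # abbcde -> aAbcde
    ("A", "Abc", 1),    # aAbcde -> aAde
    ("B", "d", 2),      # aAde   -> aABe
    ("S", "aABe", 0),   # aABe   -> S
]


def replay(sentence, steps):
    """Apply each reduction in turn and return every sentential form."""
    forms = [sentence]
    for lhs, rhs, pos in steps:
        current = forms[-1]
        # The handle must sit exactly at the stated position.
        if current[pos:pos + len(rhs)] != rhs:
            raise ValueError(f"{rhs!r} is not at position {pos} of {current!r}")
        forms.append(current[:pos] + lhs + current[pos + len(rhs):])
    return forms


forms = replay("abbcde", steps)
# Reading the forms backwards gives the rightmost derivation.
print(" => ".join(reversed(forms)))
```

Reversing the list of sentential forms yields S, aABe, aAde, aAbcde, abbcde, which is the rightmost derivation in the right-hand column.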
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the reverse
of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
And the input string id1+id2*id3
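The rightmost derivation is:
E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3
In the above derivation, the handle of each right-sentential form is the substring introduced by
the derivation step that produced it: E+E, E*E, id3, id2 and id1, respectively.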
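Handle pruning:
A rightmost derivation in reverse can be obtained by handle pruning: starting from the input
string, the handle of each right-sentential form is located and replaced by the left side of its
production.
Stack implementation of shift-reduce parsing for the input id1+id2*id3:
Stack          Input             Action
$              id1+id2*id3 $     shift
$ id1          +id2*id3 $        reduce by E → id
$ E            +id2*id3 $        shift
$ E+           id2*id3 $         shift
$ E+id2        *id3 $            reduce by E → id
$ E+E          *id3 $            shift
$ E+E*         id3 $             shift
$ E+E*id3      $                 reduce by E → id
$ E+E*E        $                 reduce by E → E*E
$ E+E          $                 reduce by E → E+E
$ E            $                 accept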
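Conflicts in shift-reduce parsing:
There are two conflicts that occur in shift-reduce parsing:
1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.
Example of a reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c
Stack      Input     Action
$          c+c $     Shift
$ c        +c $      Reduce by R → c
$ R        +c $      Shift
$ R+       c $       Shift
$ R+c      $         Reduce by R → c or by M → R+c
With R+c on the stack, both R → c and M → R+c match the top of the stack, so the parser cannot
decide which reduction to make.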
OPERATOR-PRECEDENCE PARSING
An operator-precedence parser can be constructed for a grammar called an operator grammar. These
grammars have the property that no production has a right side that is ε or that has two adjacent
non-terminals.
Example:
Consider the grammar:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑
Since the right side EAE has three consecutive non-terminals, the grammar can be written as
follows:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
For every operator θ, make
) ·> θ ,  θ ·> )
θ ·> $ ,  $ <· θ
Also make
( ≐ ) ,  ( <· ( ,  ) ·> ) ,  ( <· id ,  id ·> ) ,  $ <· id ,  id ·> $ ,  $ <· ( ,  ) ·> $
Example:
Operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id: the rows for ( , ) and $ are given in the
following table. Blank entries are errors.
      +    -    *    /    ↑    id   (    )    $
(     <·   <·   <·   <·   <·   <·   <·   ≐
)     ·>   ·>   ·>   ·>   ·>             ·>   ·>
$     <·   <·   <·   <·   <·   <·   <·
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id and the input string id+id*id.
The parser starts with $ on the stack and w$ in the input buffer, where w is the input string
to be parsed.
Advantages of LR parsing:
• It recognizes virtually all programming language constructs for which CFG can be
written.
• It is an efficient non-backtracking shift-reduce parsing method.
• The class of grammars that can be parsed using LR methods is a proper superset of the class
of grammars that can be parsed with predictive parsers.
• It detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Drawbacks of LR method:
• It is too much work to construct an LR parser by hand for a programming language
grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
• The parsing program reads characters from an input buffer one at a time.
• The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on
top. Each Xi is a grammar symbol and each si is a state.
• The parsing table consists of two parts : action and goto functions.
Action : The parsing program determines sm, the state currently on top of stack, and ai, the current
input symbol. It then consults action[sm,ai] in the action table which can have one of four values :
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input
buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack and a the symbol pointed to by ip;
    if action[s, a] = shift s’ then begin
        push a then s’ on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s’ be the state now on top of the stack;
        push A then goto[s’, A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error( )
end
CONSTRUCTING SLR(1) PARSING TABLE:
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the
right side. For example, production A → XYZ yields the four items :
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from
I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ
to closure(I), if it is not already there. We apply this rule until no more new items can be
added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such
that [A→ α . Xβ] is in I.
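The closure and goto operations defined above can be sketched in a few lines. This is an illustrative sketch only, assuming the expression grammar E → E+T | T, T → T*F | F, F → (E) | id used in the SLR example of these notes; representing an item as a (production index, dot position) pair and the helper name `canonical_collection` are assumptions made here.

```python
# Augmented expression grammar; E' is the added start symbol.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        # Dot in front of a non-terminal B: add every item B -> . gamma.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    """All item sets reachable from closure({E' -> . E})."""
    states = [closure({(0, 0)})]
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    for state in states:            # the list grows while it is scanned
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt and nxt not in states:
                states.append(nxt)
    return states


I0 = closure({(0, 0)})
states = canonical_collection()
print(len(I0), "items in I0;", len(states), "item sets in total")
```

The order in which this sketch discovers the item sets need not match the I0 to I11 numbering used in the notes; only the sets themselves coincide.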
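Steps to construct SLR parsing table for grammar G are:
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for the augmented
grammar G’.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α . aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”. Here a must
be a terminal.
(b) If [A → α .] is in Ii, then set action[i, a] to “reduce A → α” for all a in FOLLOW(A); here
A may not be S’.
(c) If [S’ → S .] is in Ii, then set action[i, $] to “accept”.
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: If
goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items containing
[S’ → . S].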
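Example:
Consider the grammar (productions numbered as in the parsing table):
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id
With the augmented production E’ → E, the canonical collection of sets of LR(0) items is:
I0 : E’ → . E
     E → . E + T
     E → . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id
GOTO ( I0 , E )
I1 : E’ → E .
     E → E . + T
GOTO ( I0 , T )
I2 : E → T .
     T → T . * F
GOTO ( I0 , F )
I3 : T → F .
GOTO ( I0 , ( )
I4 : F → ( . E )
     E → . E + T
     E → . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id
GOTO ( I0 , id )
I5 : F → id .
GOTO ( I1 , + )
I6 : E → E + . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id
GOTO ( I2 , * )
I7 : T → T * . F
     F → . ( E )
     F → . id
GOTO ( I4 , E )
I8 : F → ( E . )
     E → E . + T
GOTO ( I4 , T ) = I2 , GOTO ( I4 , F ) = I3 , GOTO ( I4 , ( ) = I4 , GOTO ( I4 , id ) = I5
GOTO ( I6 , T )
I9 : E → E + T .
     T → T . * F
GOTO ( I6 , F ) = I3 , GOTO ( I6 , ( ) = I4 , GOTO ( I6 , id ) = I5
GOTO ( I7 , F )
I10 : T → T * F .
GOTO ( I7 , ( ) = I4 , GOTO ( I7 , id ) = I5
GOTO ( I8 , ) )
I11 : F → ( E ) .
GOTO ( I8 , + ) = I6 , GOTO ( I9 , * ) = I7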
FOLLOW (E) = { $ , ) , + }
FOLLOW (T) = { $ , + , ) , * }
FOLLOW (F) = { * , + , ) , $ }
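The FOLLOW sets listed above can be checked with a small fixed-point computation. This is a sketch under a stated assumption: the expression grammar E → E+T | T, T → T*F | F, F → (E) | id has no ε-productions, so the code does not handle ε at all; the function names `first_sets` and `follow_sets` are illustrative.

```python
# Expression grammar; no production derives the empty string, so FIRST of a
# right side is simply FIRST of its first symbol (assumption of this sketch).
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
START = "E"


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")          # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue    # FOLLOW is defined for non-terminals only
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]   # sym ends the right side
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, START, first)
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```

A grammar with ε-productions would need the usual extra cases for nullable symbols, which are deliberately left out here.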
        ACTION                                GOTO
State   id    +     *     (     )     $       E    T    F
I0      s5                s4                  1    2    3
I1            s6                      ACC
I2            r2    s7          r2    r2
I3            r4    r4          r4    r4
I4      s5                s4                  8    2    3
I5            r6    r6          r6    r6
I6      s5                s4                       9    3
I7      s5                s4                            10
I8            s6                s11
I9            r1    s7          r1    r1
I10           r3    r3          r3    r3
I11           r5    r5          r5    r5
Blank entries are error entries.
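The table can be driven by the LR parsing algorithm given earlier. The sketch below is one possible rendering, assuming the production numbering 1. E → E+T, 2. E → T, 3. T → T*F, 4. T → F, 5. F → (E), 6. F → id; it keeps only states on the stack (grammar symbols are implied), so a reduction pops |β| entries rather than 2*|β|. The function name `lr_parse` is hypothetical.

```python
# Production number -> (left side, length of right side).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

# SLR(1) table from the notes; missing entries are errors.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    stack = [0]                       # stack of states only
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:               # blank entry: syntax error
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return output
        if act[0] == "s":             # shift: push the new state, advance
            stack.append(int(act[1:]))
            pos += 1
        else:                         # reduce by production `number`
            number = int(act[1:])
            lhs, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            output.append(number)


result = lr_parse(["id", "*", "id", "+", "id"])
print(result)
```

The returned list is the sequence of reductions, that is, the rightmost derivation of the input in reverse.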
Stack implementation:
1. Write the context-free grammar for the given input string.
5. Draw the DFA.
7. Based on the information from the table, with the help of the stack and the
parsing algorithm, generate the output.
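LR (1) item
An LR (1) item is defined by a production, the position of the dot in its right side, and a
terminal symbol. The terminal is called the look-ahead symbol.
General form of an LR (1) item is
S → α•Aβ , $
where the part before the comma is a production with a dot and the part after it is the look-ahead.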
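Consider the grammar (productions numbered as used by the reduce actions):
1. S → CC
2. C → cC
3. C → d
I0 State:
Add the augmented production S’ → S and compute the closure; the look-ahead symbol for the
augmented production is $.
I0 = Closure(S’ → •S, $)
The dot is followed by the non-terminal S. So, add the productions starting with S to the
I0 state. By the second closure rule, the look-ahead is FIRST($) = {$}:
S → •CC, $
The dot is followed by the non-terminal C. So, add the productions starting with C to the
I0 state, with look-ahead FIRST(C$):
C → •cC, FIRST(C$)
C → •d, FIRST(C$)
FIRST(C$) = FIRST(C) = {c, d} so, the items are
C → •cC, c/d
C → •d, c/d
In each new item the dot is followed by a terminal. So, the I0 state is closed, and the items in
I0 are
S’ → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d
I1 = Goto(I0, S) = S’ → S•, $
I2 = Goto(I0, C) = Closure(S → C•C, $)
The dot is followed by the non-terminal C, so the productions starting with C are added with
look-ahead FIRST($) = {$}. So, the I2 state is
S → C•C, $
C → •cC, $
C → •d, $
I3 = Goto(I0, c) = Closure(C → c•C, c/d). So, the I3 state is
C → c•C, c/d
C → •cC, c/d
C → •d, c/d
I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d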
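The LR (1) closure and goto steps used to build I0, I2 and I4 can be sketched as follows. This is an illustration only, assuming the grammar S → CC, C → cC | d augmented with S’ → S; because the grammar has no ε-productions, FIRST of a symbol string is taken from its first symbol, and the FIRST table is written out by hand. The item encoding (production index, dot position, look-ahead) is a choice made for this sketch.

```python
GRAMMAR = [
    ("S'", ("S",)),      # 0: augmented production
    ("S", ("C", "C")),   # 1
    ("C", ("c", "C")),   # 2
    ("C", ("d",)),       # 3
]
NONTERMINALS = {"S'", "S", "C"}
# FIRST sets written out by hand; valid because nothing derives epsilon.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}


def first_of(symbols):
    """FIRST of a non-empty symbol string under the no-epsilon assumption."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}


def closure(items):
    """LR(1) closure; an item is (production index, dot, look-ahead)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # For [A -> alpha . B beta, a] add [B -> . gamma, b]
            # for every b in FIRST(beta a).
            for b in first_of(rhs[dot + 1:] + (la,)):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


I0 = closure({(0, 0, "$")})
I2 = goto(I0, "C")
I4 = goto(I0, "d")
print(sorted(I0))
```

Items that share a production and dot position appear here once per look-ahead, so C → •cC, c/d in the notes corresponds to the two tuples (2, 0, "c") and (2, 0, "d").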
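[Figure: DFA over the LR (1) item sets I0 to I9, with transitions on S, C, c and d.]
The remaining states shown in the figure are:
I5 = Goto(I2, C) = S → CC•, $
I6 = Goto(I2, c) =
C → c•C, $
C → •cC, $
C → •d, $
I7 = Goto(I2, d) = C → d•, $
I8 = Goto(I3, C) = C → cC•, c/d
I9 = Goto(I6, C) = C → cC•, $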
Construction of CLR (1) Table
Rule 1: If there is an item [A → α•Xβ, b] in Ii and goto(Ii, X) = Ij, then set action[Ii][X] = shift
j, where X is a terminal.
Rule 2: If there is an item [A → α•, b] in Ii and A ≠ S’, then set action[Ii][b] = reduce by
A → α, recorded by its production number.
Rule 3: If there is an item [S’ → S•, $] in Ii, then set action[Ii][$] = accept.
Rule 4: If there is an item [A → α•Xβ, b] in Ii and goto(Ii, X) = Ij, then set goto[Ii][X] = j,
where X is a non-terminal.
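LR (1) Table
For the grammar 1. S → CC, 2. C → cC, 3. C → d, the four rules give the following table.
        ACTION              GOTO
State   c     d     $       S    C
I0      s3    s4            1    2
I1                  acc
I2      s6    s7                 5
I3      s3    s4                 8
I4      r3    r3
I5                  r1
I6      s6    s7                 9
I7                  r3
I8      r2    r2
I9                  r2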
LALR (1) Parsing
The CLR parser avoids conflicts that can appear in the SLR parse table, but it produces many more
states than the SLR parser, so its table occupies more space in memory. LALR parsing can be used
instead. Its tables are smaller than the CLR parse table (they have the same number of states as
the SLR table) and the parser uses the same driver, although it handles a slightly smaller class
of grammars than CLR. Here, sets of LR (1) items that have the same productions and dot positions
but different look-aheads are combined to form a single set of items.
For example, consider the grammar in the previous example. Consider the states I4 and I7
as given below:
I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d
I7 = Goto(I2, d) = Closure(C → d•, $) = C → d•, $
These states differ only in their look-aheads; they have the same productions.
Hence these states are combined to form a single state called I47.
Similarly, the states I3 and I6 differ only in their look-aheads, as given below:
I3 = Goto(I0, c) =
C → c•C, c/d
C → •cC, c/d
C → •d, c/d
I6 = Goto(I2, c) =
C → c•C, $
C → •cC, $
C → •d, $
These states differ only in their look-aheads; they have the same productions.
Hence these states are combined to form a single state called I36.
Similarly, the states I8 and I9 differ only in their look-aheads. Hence they are combined to form
the state I89.
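The merging step can be sketched as grouping item sets by their core, that is, by their items with the look-aheads removed. The snippet below is a minimal illustration, assuming the six states I3, I4, I6, I7, I8 and I9 of the grammar S → CC, C → cC | d written out by hand; the item encoding and the names `core` and `merge_by_core` are hypothetical.

```python
from collections import defaultdict

# LR(1) items as (production label, dot position, look-ahead).
STATES = {
    3: {("C->cC", 1, "c"), ("C->cC", 1, "d"),
        ("C->cC", 0, "c"), ("C->cC", 0, "d"),
        ("C->d", 0, "c"), ("C->d", 0, "d")},
    6: {("C->cC", 1, "$"), ("C->cC", 0, "$"), ("C->d", 0, "$")},
    4: {("C->d", 1, "c"), ("C->d", 1, "d")},
    7: {("C->d", 1, "$")},
    8: {("C->cC", 2, "c"), ("C->cC", 2, "d")},
    9: {("C->cC", 2, "$")},
}


def core(items):
    """The item set with look-aheads dropped."""
    return frozenset((prod, dot) for prod, dot, _ in items)


def merge_by_core(states):
    """Union all states that share a core; name them like I36, I47, I89."""
    groups = defaultdict(list)
    for number in sorted(states):
        groups[core(states[number])].append(number)
    merged = {}
    for numbers in groups.values():
        name = "I" + "".join(str(n) for n in numbers)
        merged[name] = set().union(*(states[n] for n in numbers))
    return merged


merged = merge_by_core(STATES)
for name in sorted(merged):
    print(name, sorted(merged[name]))
```

In the merged state I47 the single item C → d• carries the look-aheads c, d and $, which is why the LALR table reduces by C → d on all three symbols in that state.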
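LALR Table
With the merged states I36, I47 and I89, the table for the grammar 1. S → CC, 2. C → cC,
3. C → d becomes:
        ACTION              GOTO
State   c     d     $       S    C
I0      s36   s47           1    2
I1                  acc
I2      s36   s47                5
I36     s36   s47                89
I47     r3    r3    r3
I5                  r1
I89     r2    r2    r2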
Conflicts in the CLR (1) Parsing
When multiple entries occur in the same cell of the table, the situation is said to be a conflict.
Shift-reduce conflict in CLR (1) parsing occurs when a state has
1. A reduced item of the form A → α•, a and
2. An incomplete item of the form A → β•aα
as shown below for a state Ii that contains the items
1  A → β•aα , $
2  B → b• , a
States      Action          GOTO
            a       $       A    B
Ii          Sj/r2
Reduce-reduce conflict in CLR (1) parsing occurs when a state has two or more
reduced items of the form
1. A → α•, a
2. B → β•, a
that is, when two productions in a state Ii reduce on the same look-ahead symbol,
as shown below:
States      Action          GOTO
            a       $       A    B
Ii          r1/r2
String Acceptance using LR Parsing:
Consider the above example. If the input string is cdd, the parser proceeds as follows:
Stack        Input    Action
$0           cdd$     Shift S3
$0c3         dd$      Shift S4
$0c3d4       d$       Reduce with R3, C → d; pop 2*|β| = 2 symbols from the stack
$0c3C        d$       Goto(I3, C) = 8
$0c3C8       d$       Reduce with R2, C → cC; pop 2*|β| = 4 symbols from the stack
$0C          d$       Goto(I0, C) = 2
$0C2         d$       Shift S7
$0C2d7       $        Reduce with R3, C → d; pop 2*|β| = 2 symbols from the stack
$0C2C        $        Goto(I2, C) = 5
$0C2C5       $        Reduce with R1, S → CC; pop 2*|β| = 4 symbols from the stack
$0S          $        Goto(I0, S) = 1
$0S1         $        Accept
LL Parsers vs LR Parsers:
• LL starts with only the root nonterminal on the stack, LR ends with only the root nonterminal
on the stack.
• LL ends when the stack is empty. But, LR starts with an empty stack.
• LL uses the stack for designating what is still to be expected, LR uses the stack for
designating what is already seen.
• LL builds the parse tree top down. But, LR builds the parse tree bottom up.
• LL continuously pops a nonterminal off the stack, and pushes a corresponding right hand side.
But, LR tries to recognize a right hand side on the stack, pops it, and pushes the corresponding
nonterminal.
• LL reads terminal when it pops one off the stack, LR reads terminals while it pushes them on
the stack.
• LL uses grammar rules in an order which corresponds to pre-order traversal of the parse tree,
LR does a post-order traversal.
IMPORTANT QUESTIONS
1. Write the steps for the efficient construction of LALR parsing table. Explain with an example.
2. Write about SR conflicts and RR conflicts of shift-reduce parsers.
3. Explain the way to implement a shift-reduce parser using a stack by taking an input string for a
grammar
Shift- Reduce Parsing
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree
for an input string beginning at the leaves (the bottom) and working up towards the root (the
top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)        RIGHTMOST DERIVATION
abbcde   (A → b)            S ⇒ aABe
aAbcde   (A → Abc)            ⇒ aAde
aAde     (B → d)              ⇒ aAbcde
aABe     (S → aABe)           ⇒ abbcde
S
The reductions trace out the right-most derivation in reverse.
4. Consider the following grammar and construct the CLR parsing table.
S → CC
C → cC
C → d
T ->id
Augmented grammar - E’ -> E
E -> T+E | T
T -> id