CS346 Bottom Up Parser
Resource: Textbook
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
“Compilers: Principles, Techniques, and Tools”,
Addison-Wesley, 1986.
Bottom-Up Parsing
Bottom-up parser:
builds the parse tree for the given input starting from the leaves and working towards the root
tries to find the right-most derivation of the given input in reverse order
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most derivation in the reverse order)
Bottom-up parsing: also known as shift-reduce parsing because its two main actions
are shift and reduce
At each shift action, the current symbol in the input string is pushed onto a stack
At each reduction step, the symbols at the top of the stack (this symbol sequence is the right
side of a production) are replaced by the non-terminal on the left side of that production
Two more actions: accept and error
Shift-Reduce Parsing
A shift-reduce parser tries to reduce the given input string to the start symbol
At each reduction step, a substring of the input matching the right side of a production
rule is replaced by the non-terminal on the left side of that production rule
If the substring is chosen correctly, the right-most derivation of that string is created in
reverse order
Rightmost derivation: S ⇒*rm ω
Handle: if S ⇒*rm αAω ⇒rm αβω, then A → β in the position following α is a handle of αβω
ω is a string of terminals
If the grammar is unambiguous, then every right-sentential form of the
grammar has exactly one handle
Handle Pruning
A right-most derivation in reverse can be obtained by handle-pruning
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)
Start from γn, find a handle An→βn in γn, and replace βn by An to get γn-1
Then find a handle An-1→βn-1 in γn-1, and replace βn-1 by An-1 to get γn-2
1. Shift: The next input symbol is shifted onto the top of the stack
2. Reduce: Replace the handle on the top of the stack by the non-terminal
3. Accept: Successful completion of parsing
4. Error: Parser discovers a syntax error, and calls an error recovery routine
1. Operator-Precedence Parser
simple, but handles only a small class of grammars
2. LR-Parsers
cover a wide range of grammars
SLR – simple LR parser
LR – most general LR parser
LALR – intermediate LR parser (lookahead LR parser)
SLR, LR and LALR parsers work the same way; only their parsing tables are different
[Figure: nested grammar classes, SLR inside LALR inside LR inside CFG]
LR Parsers
The most powerful (yet efficient) shift-reduce parsing method is:
LR(k) parsing
sm (the state on top of the stack) and ai (the current input symbol) decide the parser action by consulting the parsing action table
(The initial stack contains just s0)
s0: does not represent any grammar symbol
4. Error -- Parser detected an error (an empty entry in the action table)
Reduce Action
pop 2r items from the stack (r grammar symbols and r states), where r = |β|; let us assume that β =
Y1Y2...Yr
then push A and s where s = goto[sm-r, A]
In fact, Y1Y2...Yr is a handle
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id

state    id    +     *     (     )     $    E    T    F
0        s5                s4               1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4               8    2    3
5              r6    r6          r6    r6
6        s5                s4                    9    3
7        s5                s4                         10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5
Actions of A (S)LR-Parser -- Example
stack         input        action                output
0             id*id+id$    shift 5
0id5          *id+id$      reduce by F→id        F→id
0F3           *id+id$      reduce by T→F         T→F
0T2           *id+id$      shift 7
0T2*7         id+id$       shift 5
0T2*7id5      +id$         reduce by F→id        F→id
0T2*7F10      +id$         reduce by T→T*F       T→T*F
0T2           +id$         reduce by E→T         E→T
0E1           +id$         shift 6
0E1+6         id$          shift 5
0E1+6id5      $            reduce by F→id        F→id
0E1+6F3       $            reduce by T→F         T→F
0E1+6T9       $            reduce by E→E+T       E→E+T
0E1           $            accept
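The trace above can be reproduced with a small table-driven driver. The following is a minimal sketch, not a definitive implementation: the names `PRODUCTIONS`, `ACTION`, `GOTO` and `lr_parse` are illustrative, and the tables are the SLR tables of the expression grammar typed in by hand. As a simplifying assumption, the stack holds states only, so a reduction by A → β pops |β| entries instead of the 2|β| entries popped when grammar symbols are stored alongside states.

```python
# Productions of the expression grammar, numbered as in the parsing table.
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}

# Action table: "sN" = shift and go to state N, "rN" = reduce by production N.
# A missing entry is an error entry.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the productions used, in reduction order; raise SyntaxError on an error entry."""
    stack = [0]                       # states only (assumption, see lead-in)
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:               # empty entry in the action table
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act == "acc":
            return output
        if act[0] == "s":             # shift: push the new state, advance the input
            stack.append(int(act[1:]))
            pos += 1
        else:                         # reduce by A -> beta: pop |beta| states, push goto
            head, body = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - len(body):]
            stack.append(GOTO[stack[-1]][head])
            output.append(f"{head}→{''.join(body)}")


if __name__ == "__main__":
    for production in lr_parse(["id", "*", "id", "+", "id"]):
        print(production)
```

The printed productions form the right-most derivation of id*id+id in reverse, in the same order as the output column of the trace.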
Constructing SLR Parsing Tables – LR(0) Item
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then
B → .γ will be in closure(I)
We apply this rule until no more new LR(0) items can be added to
closure(I)
The Closure Operation -- Example
Grammar:
E’ → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
closure({E’ → .E}) =
{ E’ → .E        (kernel item)
  E → .E+T
  E → .T
  T → .T*F
  T → .F
  F → .(E)
  F → .id }
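The closure example can be checked mechanically. This is a minimal sketch under stated assumptions: an LR(0) item is represented as a hypothetical `(head, body, dot)` triple, the grammar is stored in a dictionary named `GRAMMAR`, and `closure` and `show` are illustrative helper names.

```python
# Augmented expression grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Close a set of LR(0) items (head, body, dot) under the rule for A -> alpha . B beta."""
    result = set(items)
    work = list(result)
    while work:
        _head, body, dot = work.pop()
        # The dot stands in front of a non-terminal B: add B -> . gamma for every production.
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def show(item):
    """Render an item with its dot, e.g. 'E → .E+T'."""
    head, body, dot = item
    return f"{head} → {''.join(body[:dot])}.{''.join(body[dot:])}"


if __name__ == "__main__":
    for item in sorted(closure({("E'", ("E",), 0)}), key=show):
        print(show(item))
```

Starting from the single kernel item E’ → .E, the result contains the seven items listed in the example.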
Kernel and Non-kernel Items
Each set of items is formed by taking the closure of a set of kernel items
If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal),
then goto(I,X) is defined as follows:
If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in
goto(I,X)
Example:
I = { E’ → .E, E → .E+T, E → .T,
      T → .T*F, T → .F,
      F → .(E), F → .id }
goto(I,E) = { E’ → E., E → E.+T }
goto(I,T) = { E → T., T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
              F → .(E), F → .id }
goto(I,id) = { F → id. }
Construction of The Canonical LR(0) Collection
Algorithm:
C is { closure({S’ → .S}) }
repeat the following until no more sets of LR(0) items can be added to C:
  for each I in C and each grammar symbol X
    if goto(I,X) is not empty and not in C
      add goto(I,X) to C
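The algorithm can be sketched in a few lines. This is an illustrative sketch, not a definitive implementation: items are hypothetical `(head, body, dot)` triples, `GRAMMAR`, `SYMBOLS`, `goto` and `canonical_collection` are assumed names, and a compact `closure` is included so that the block runs on its own. States are numbered in discovery order, which need not match the I0 to I11 numbering used in the slides.

```python
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
SYMBOLS = ["E", "T", "F", "id", "+", "*", "(", ")"]


def closure(items):
    """Closure of a set of LR(0) items (head, body, dot)."""
    result = set(items)
    work = list(result)
    while work:
        _head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], production, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close the result."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved) if moved else frozenset()


def canonical_collection():
    """Canonical collection of sets of LR(0) items for the augmented grammar."""
    collection = [closure({("E'", ("E",), 0)})]
    for item_set in collection:          # the list grows while it is being scanned
        for symbol in SYMBOLS:
            target = goto(item_set, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


if __name__ == "__main__":
    print(len(canonical_collection()), "sets of LR(0) items")
```

For the expression grammar the loop stops with twelve sets, corresponding to I0 through I11.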
I5: F → id.
Transition Diagram (DFA)
[Figure: transition diagram of the DFA whose states are the sets of LR(0) items I0 to I11 of the expression grammar; edges are labeled with grammar symbols]
Constructing SLR Parsing Table
(of an augmented grammar G’)
Parsing Tables of Expression Grammar
         Action Table                       Goto Table
state    id    +     *     (     )     $    E    T    F
0        s5                s4               1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4               8    2    3
5              r6    r6          r6    r6
6        s5                s4                    9    3
7        s5                s4                         10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5
SLR(1) Grammar
An LR parser using SLR(1) parsing tables for a grammar G is called the
SLR(1) parser for G
Every SLR grammar is unambiguous, but not every unambiguous grammar is an SLR
grammar
Shift/Reduce and Reduce/Reduce Conflicts
If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not an SLR grammar
Conflict Example
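Grammar:
S → L=R
S → R
L → *R
L → id
R → L
Sets of LR(0) items:
I0: S’ → .S
    S → .L=R
    S → .R
    L → .*R
    L → .id
    R → .L
I1: S’ → S.
I2: S → L.=R
    R → L.
I3: S → R.
I6: S → L=.R
    R → .L
    L → .*R
    L → .id
I9: S → L=R.
In I2, on input =, the item S → L.=R calls for a shift while R → L. calls for a reduction by R → L, because = is in FOLLOW(R): a shift/reduce conflict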
Problem
FOLLOW(A) = {a, b}
FOLLOW(B) = {a, b}
on a: reduce by A → ε or reduce by B → ε (reduce/reduce conflict)
on b: reduce by A → ε or reduce by B → ε (reduce/reduce conflict)
Constructing Canonical LR(1) Parsing Tables
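In the SLR method, state i makes a reduction by A → α when the
current token is a:
if A → α. is in Ii and a is in FOLLOW(A)
In some situations, however, A cannot be followed by a in a right-sentential form when α and state i are on top of the stack; making the reduction is then not correct
Example grammar:
S → AaAb
S → BbBa
A → ε
B → ε
Aab ⇒ εab        Bba ⇒ εba
AaAb ⇒ Aaεb      BbBa ⇒ Bbεa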
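The FOLLOW sets behind the reduce/reduce conflict can be recomputed with a standard fixed-point iteration. The sketch below is only illustrative: `GRAMMAR`, `first_of`, `compute_follow`, `FOLLOW` and `CONFLICT_TOKENS` are assumed names, ε-productions are written as empty tuples, and `$` is used as the end marker. Because the SLR method reduces by A → ε on every token in FOLLOW(A) and by B → ε on every token in FOLLOW(B), any token in both sets produces a reduce/reduce conflict in a state that contains both A → . and B → . items.

```python
GRAMMAR = {
    "S": [("A", "a", "A", "b"), ("B", "b", "B", "a")],
    "A": [()],   # A → ε
    "B": [()],   # B → ε
}
START = "S"


def first_of(symbols, first, nullable):
    """FIRST of a string of symbols: (set of terminals, True if the string derives ε)."""
    out = set()
    for sym in symbols:
        if sym not in GRAMMAR:          # terminal: it starts the string, stop here
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True


def compute_follow():
    """Iterate the FIRST / nullable / FOLLOW rules until nothing changes."""
    first = {n: set() for n in GRAMMAR}
    follow = {n: set() for n in GRAMMAR}
    nullable = set()
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                terms, eps = first_of(body, first, nullable)
                if not terms <= first[head] or (eps and head not in nullable):
                    first[head] |= terms
                    if eps:
                        nullable.add(head)
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    # Whatever can start the rest of the body can follow sym.
                    terms, eps = first_of(body[i + 1:], first, nullable)
                    new = terms | (follow[head] if eps else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow()
# Tokens on which SLR would reduce by both A → ε and B → ε.
CONFLICT_TOKENS = sorted(FOLLOW["A"] & FOLLOW["B"])

if __name__ == "__main__":
    print(FOLLOW["A"], FOLLOW["B"], CONFLICT_TOKENS)
```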
can be written as
A → α.β, a1/a2/.../an
Canonical LR(1) Collection -- Example
Grammar:
S’ → S
S → AaAb
S → BbBa
A → ε
B → ε
Sets of LR(1) items (transitions in brackets):
I0: S’ → .S, $        [S to I1, A to I2, B to I3]
    S → .AaAb, $
    S → .BbBa, $
    A → ., a
    B → ., b
I1: S’ → S., $
I2: S → A.aAb, $      [a to I4]
I3: S → B.bBa, $      [b to I5]
I4: S → Aa.Ab, $      [A to I6]
    A → ., b
I5: S → Bb.Ba, $      [B to I7]
    B → ., a
I6: S → AaA.b, $      [b to I8]
I7: S → BbB.a, $      [a to I9]
I8: S → AaAb., $
I9: S → BbBa., $
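The lookaheads in this example come from the LR(1) closure rule: for an item A → α.Bβ, a, every production B → γ is added with each terminal b in FIRST(βa) as lookahead. The sketch below illustrates that rule for this one grammar only. The names are assumptions, an LR(1) item is a hypothetical `(head, body, dot, lookahead)` tuple, and `NULLABLE` and `FIRST` are filled in by hand rather than computed.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "a", "A", "b"), ("B", "b", "B", "a")],
    "A": [()],   # A → ε
    "B": [()],   # B → ε
}
# Worked out by hand for this small grammar (assumption of the sketch).
NULLABLE = {"A", "B"}
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "A": set(), "B": set()}


def first_of(symbols):
    """FIRST of a symbol string that ends with a lookahead terminal."""
    out = set()
    for sym in symbols:
        if sym not in GRAMMAR:          # terminal (including "$")
            out.add(sym)
            return out
        out |= FIRST[sym]
        if sym not in NULLABLE:
            return out
    return out


def closure(items):
    """LR(1) closure of a set of items (head, body, dot, lookahead)."""
    result = set(items)
    work = list(result)
    while work:
        _head, body, dot, lookahead = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            # Lookaheads of the new items: FIRST(beta a)
            for b in first_of(body[dot + 1:] + (lookahead,)):
                for production in GRAMMAR[body[dot]]:
                    item = (body[dot], production, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, keep the lookaheads, then close."""
    moved = {(h, b, d + 1, la) for (h, b, d, la) in items if d < len(b) and b[d] == symbol}
    return closure(moved) if moved else frozenset()


I0 = closure({("S'", ("S",), 0, "$")})
I4 = goto(goto(I0, "A"), "a")

if __name__ == "__main__":
    print(sorted(I0))
    print(sorted(I4))
```

In I0 the item A → . carries only the lookahead a and B → . carries only b, so the two reductions no longer compete on the same token, unlike in the SLR table.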
Canonical LR(1) Collection – Example 2
Grammar:
   S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L
Sets of LR(1) items (transitions in brackets):
I0:  S’ → .S, $        [S to I1, L to I2, R to I3, * to I4, id to I5]
     S → .L=R, $
     S → .R, $
     L → .*R, =/$
     L → .id, =/$
     R → .L, $
I1:  S’ → S., $
I2:  S → L.=R, $       [= to I6]
     R → L., $
I3:  S → R., $
I4:  L → *.R, =/$      [R to I7, L to I8, * to I4, id to I5]
     R → .L, =/$
     L → .*R, =/$
     L → .id, =/$
I5:  L → id., =/$
I6:  S → L=.R, $       [R to I9, L to I10, * to I11, id to I12]
     R → .L, $
     L → .*R, $
     L → .id, $
I7:  L → *R., =/$
I8:  R → L., =/$
I9:  S → L=R., $
I10: R → L., $
I11: L → *.R, $        [R to I13, L to I10, * to I11, id to I12]
     R → .L, $
     L → .*R, $
     L → .id, $
I12: L → id., $
I13: L → *R., $
Sets with the same core: I4 and I11, I5 and I12, I7 and I13, I8 and I10
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’.
C ← {I0,...,In}
2. Create the parsing action table as follows:
• If a is a terminal, A → α.aβ, b is in Ii and goto(Ii,a) = Ij, then action[i,a] is shift j.
• If A → α., a is in Ii, then action[i,a] is reduce A → α, where A ≠ S’.
• If S’ → S., $ is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not LR(1)
LR(1) Parsing Tables – (for Example2)
state    id    *     =     $     S    L    R
0        s5    s4                1    2    3
1                          acc
2                    s6    r5
3                          r2
4        s5    s4                     8    7
5                    r4    r4
6        s12   s11                    10   9
7                    r3    r3
8                    r5    r5
9                          r1
10                         r5
11       s12   s11                    10   13
12                         r4
13                         r3
no shift/reduce or reduce/reduce conflict
⇓
so, it is an LR(1) grammar
LALR Parsing Tables
The core of a set of LR(1) items is the set of its first components (the items without their lookaheads)
Ex: the set { S → L.=R, $ ; R → L., $ } has the core { S → L.=R ; R → L. }
Find the states (sets of LR(1) items) in a canonical LR(1) parser with the same cores. Merge
them into a single state
Ex: I1: L → id., = and I2: L → id., $ have the same core; merge them into a new state
    I12: L → id., =
         L → id., $
Do this for all states of a canonical LR(1) parser to get the states of the LALR parser
For any grammar, the number of states of the LALR parser equals the number of states of the SLR parser
Creation of LALR Parsing Tables
Create the canonical LR(1) collection of the sets of LR(1) items for the
given grammar
Find each core; find all sets having that same core; replace those sets
with a single set which is their union.
C = {I0,...,In} → C’ = {J1,...,Jm} where m ≤ n
Create the parsing tables (action and goto tables) in the same way as
the parsing tables of the LR(1) parser
Note that: if J = I1 ∪ ... ∪ Ik, then since I1,...,Ik have the same core,
the cores of goto(I1,X),...,goto(Ik,X) must also be the same
So, goto(J,X) = K, where K is the union of all sets of items having the same core
as goto(I1,X)
If no conflict is introduced, the grammar is an LALR(1) grammar
Merging may introduce reduce/reduce conflicts
Merging cannot introduce a shift/reduce conflict
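The merging step can be sketched as a grouping of item sets by core. This is a minimal illustration under stated assumptions: an LR(1) item is a hypothetical `(head, body, dot, lookahead)` tuple, states are given as a dictionary from a state name to its item set, and `core` and `merge_by_core` are illustrative names. The sketch only merges item sets; rebuilding the action and goto tables for the merged states is not shown.

```python
def core(item_set):
    """The core of a set of LR(1) items: the items with their lookaheads dropped."""
    return frozenset((head, body, dot) for head, body, dot, _lookahead in item_set)


def merge_by_core(states):
    """Union all item sets that share a core.

    `states` maps a state name to a set of LR(1) items. The result maps the
    joined names of the merged states (e.g. "I1+I2") to the union of their items.
    """
    groups = {}
    for name, items in states.items():
        names, union = groups.setdefault(core(items), ([], set()))
        names.append(name)
        union |= items                  # in-place union into the group's set
    return {"+".join(names): frozenset(union) for names, union in groups.values()}


# The two states from the merging example, plus one state with a different core.
STATES = {
    "I1": {("L", ("id",), 1, "=")},
    "I2": {("L", ("id",), 1, "$")},
    "I3": {("S", ("R",), 1, "$")},
}
MERGED = merge_by_core(STATES)

if __name__ == "__main__":
    for name, items in MERGED.items():
        print(name, sorted(items))
```

Here I1 and I2 collapse into one state holding L → id. with lookaheads = and $, while I3 is left alone, so the number of states drops from three to two.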
Shift/Reduce Conflict
Assume that we can introduce a shift/reduce conflict. In this case, a state of
the LALR parser must have:
A → α., a   and   B → β.aγ, b
This means that a state of the canonical LR(1) parser must have:
A → α., a   and   B → β.aγ, c
But this state also has a shift/reduce conflict, i.e. the original canonical LR(1)
parser already has a conflict
(Reason: the shift operation does not depend on lookaheads)
Reduce/Reduce Conflict
For a reduce/reduce conflict:
I1: A → α., a        I2: A → α., b
    B → β., b            B → β., c
⇓
I12: A → α., a/b     reduce/reduce conflict
     B → β., b/c
Canonical LALR(1) Collection – Example2
Grammar:
   S’ → S
1) S → L=R
2) S → R
3) L → *R
4) L → id
5) R → L
Sets of LALR(1) items (transitions in brackets):
I0:   S’ → .S, $        [S to I1, L to I2, R to I3, * to I411, id to I512]
      S → .L=R, $
      S → .R, $
      L → .*R, =/$
      L → .id, =/$
      R → .L, $
I1:   S’ → S., $
I2:   S → L.=R, $       [= to I6]
      R → L., $
I3:   S → R., $
I411: L → *.R, =/$      [R to I713, L to I810, * to I411, id to I512]
      R → .L, =/$
      L → .*R, =/$
      L → .id, =/$
I512: L → id., =/$
I6:   S → L=.R, $       [R to I9, L to I810, * to I411, id to I512]
      R → .L, $
      L → .*R, $
      L → .id, $
I713: L → *R., =/$
I810: R → L., =/$
I9:   S → L=R., $
Same Cores
I4 and I11 (merged into I411)
I5 and I12 (merged into I512)
I7 and I13 (merged into I713)
I8 and I10 (merged into I810)
LALR(1) Parsing Tables – (for Example2)
state    id    *     =     $     S    L    R
0        s5    s4                1    2    3
1                          acc
2                    s6    r5
3                          r2
4        s5    s4                     8    7
5                    r4    r4
6        s5    s4                     8    9
7                    r3    r3
8                    r5    r5
9                          r1
States 4, 5, 7 and 8 are the merged states I411, I512, I713 and I810
no shift/reduce or reduce/reduce conflict
⇓
so, it is an LALR(1) grammar
Using Ambiguous Grammars
All grammars used in the construction of LR-parsing tables must be unambiguous
Can we create LR-parsing tables for ambiguous grammars?
Yes, but they will have conflicts
We can resolve each of these conflicts in favor of one of the conflicting actions to disambiguate the grammar
In the end, we will again have an unambiguous grammar
Ex.
Ambiguous grammar:      E → E+E | E*E | (E) | id
Unambiguous grammar:    E → E+T | T
                        T → T*F | F
                        F → (E) | id
Sets of LR(0) Items for Ambiguous Grammar
I0: E’ → .E           [E to I1, ( to I2, id to I3]
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I1: E’ → E.           [+ to I4, * to I5]
    E → E.+E
    E → E.*E
I2: E → (.E)          [E to I6, ( to I2, id to I3]
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I3: E → id.
I4: E → E+.E          [E to I7, ( to I2, id to I3]
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I5: E → E*.E          [E to I8, ( to I2, id to I3]
    E → .E+E
    E → .E*E
    E → .(E)
    E → .id
I6: E → (E.)          [) to I9, + to I4, * to I5]
    E → E.+E
    E → E.*E
I7: E → E+E.          [+ to I4, * to I5]
    E → E.+E
    E → E.*E
I8: E → E*E.          [+ to I4, * to I5]
    E → E.+E
    E → E.*E
I9: E → (E).
SLR-Parsing Tables for Ambiguous Grammar
FOLLOW(E) = { $,+,*,) }
I0 --E--> I1 --+--> I4 --E--> I7
I0 --E--> I1 --*--> I5 --E--> I8