Bottom Up Parser
• A bottom-up parser creates the parse tree of the given input starting
from leaves towards the root.
• A bottom-up parser tries to find the right-most derivation of the given
input in the reverse order.
S ⇒ ... ⇒ ω (the right-most derivation of ω)
← (the bottom-up parser finds the right-most derivation in the reverse order)
• Bottom-up parsing is also known as shift-reduce parsing because its
two main actions are shift and reduce.
– At each shift action, the current symbol in the input string is pushed to a stack.
– At each reduction step, the symbols at the top of the stack (this symbol sequence is the right
side of a production) will be replaced by the non-terminal at the left side of that production.
– There are also two more actions: accept and error.
• At each reduction step, a substring of the input matching the right side of a
production rule is replaced by the non-terminal at the left side of that production rule.
• If the substring is chosen correctly, the right-most derivation of that string is created in
the reverse order.
Rightmost Derivation: S ⇒*rm ω
Example: S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω (the input string)
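The reverse order of a right-most derivation can be replayed by hand. The sketch below reduces aaabb back to S one handle at a time. The productions (A → a, A → aA, B → b, S → aABb) are an assumption inferred from the sentential forms of the example, and the handle positions are supplied by hand rather than found by a parser; `reduce_at` and `handle_pruning` are hypothetical helper names.

```python
def reduce_at(form, pos, lhs, rhs):
    """Replace the handle rhs located at index pos of form by lhs."""
    assert form[pos:pos + len(rhs)] == rhs, "handle not found at this position"
    return form[:pos] + lhs + form[pos + len(rhs):]


def handle_pruning(sentence, steps):
    """Apply the given reductions in order and collect every sentential form."""
    forms = [sentence]
    for pos, lhs, rhs in steps:
        forms.append(reduce_at(forms[-1], pos, lhs, rhs))
    return forms


# (position of the handle, left side, right side); productions are inferred
# from the example's sentential forms, not taken from a stated grammar.
STEPS = [
    (2, "A", "a"),     # aaabb -> aaAbb
    (1, "A", "aA"),    # aaAbb -> aAbb
    (2, "B", "b"),     # aAbb  -> aABb
    (0, "S", "aABb"),  # aABb  -> S
]

FORMS = handle_pruning("aaabb", STEPS)
print(" <= ".join(FORMS))  # aaabb <= aaAbb <= aAbb <= aABb <= S
```

Reading FORMS from right to left gives the right-most derivation of aaabb.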
1. Shift : The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: Successful completion of parsing.
4. Error: Parser discovers a syntax error, and calls an error recovery
routine.
1. Operator-Precedence Parser
– simple, but handles only a small class of grammars.
2. LR-Parsers
– cover a wide range of grammars.
• SLR – simple LR parser
• LR – most general LR parser
• LALR – intermediate LR parser (lookahead LR parser)
– SLR, LR and LALR parsers work the same way; only their parsing tables are different.
[Figure: nested grammar classes, SLR inside LALR inside LR inside CFG]
LR(k) parsing.
4. Error -- Parser detected an error (an empty entry in the action table)
CS416 Compiler Design 15
Reduce Action
• pop 2|β| (=r) items from the stack (for a reduction by A → β); let us assume that β = Y1Y2...Yr
• then push A and s where s = goto[s_{m-r}, A]
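A minimal sketch of this reduce step, assuming the stack is a Python list that alternates states and grammar symbols (state at the bottom and on top). The function name `reduce_action` and the dictionary form of the goto table are illustrative choices, not part of the slides.

```python
def reduce_action(stack, lhs, rhs_len, goto_table):
    """Reduce by lhs -> beta, where rhs_len = |beta|.

    stack alternates states and symbols: [s0, X1, s1, ..., Xm, sm].
    """
    if rhs_len:                       # an empty right side pops nothing
        del stack[-2 * rhs_len:]      # pop 2*|beta| items (symbols and states)
    exposed = stack[-1]               # this is s_{m-r}
    stack += [lhs, goto_table[exposed, lhs]]
    return stack


# Illustrative values: state 5 sits on top after shifting id, goto[0, L] = 2.
GOTO = {(0, "L"): 2}
print(reduce_action([0, "id", 5], "L", 1, GOTO))  # [0, 'L', 2]
```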
LR(0) items of A → aBb (four different possibilities):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
• Sets of LR(0) items will be the states of the action and goto tables of the SLR
parser.
• A collection of sets of LR(0) items (the canonical LR(0) collection) is
the basis for constructing SLR parsers.
• Augmented Grammar:
G’ is G with a new production rule S’ → S, where S’ is the new starting
symbol.
The Closure Operation
• If I is a set of LR(0) items for a grammar G, then closure(I) is the
set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G,
then B → .γ will be in closure(I).
We will apply this rule until no more new LR(0) items can be
added to closure(I).
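The two rules can be sketched directly as a fixed-point loop. This is an illustration under assumptions: an item is represented as a pair (production index, dot position), and the grammar is the augmented expression grammar read off the items of the example (E’ → E, E → E+T | T, T → T*F | F, F → (E) | id). The names `GRAMMAR`, `closure` and `show` are hypothetical.

```python
# Production 0 is the augmenting rule E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Closure of a set of LR(0) items given as (production, dot) pairs."""
    result = set(items)                       # rule 1: start with I itself
    changed = True
    while changed:                            # rule 2, until nothing new
        changed = False
        for prod, dot in list(result):
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for q, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))    # add B -> .gamma
                        changed = True
    return frozenset(result)


def show(item):
    """Render an item such as (1, 1) as 'E -> E . + T'."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(list(rhs[:dot]) + ['.'] + list(rhs[dot:]))}"


I0 = closure({(0, 0)})
for item in sorted(I0):
    print(show(item))
```

Starting from E’ → .E alone, the loop pulls in every production of E, then T, then F, which gives the seven items of the initial state.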
Example:
I = { E’ → .E, E → .E+T, E → .T,
      T → .T*F, T → .F,
      F → .(E), F → .id }
goto(I,E) = { E’ → E. , E → E.+T }
goto(I,T) = { E → T. , T → T.*F }
goto(I,F) = { T → F. }
goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F,
              F → .(E), F → .id }
goto(I,id) = { F → id. }
• Algorithm:
C is { closure({S’ → .S}) }
repeat the following until no more sets of LR(0) items can be added to C:
    for each I in C and each grammar symbol X
        if goto(I,X) is not empty and not in C
            add goto(I,X) to C
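The algorithm can be sketched as follows, again for the expression grammar of the goto example and with items represented as (production index, dot position) pairs. The representation, the worklist form of closure and all names are assumptions made for this illustration.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({s for _, rhs in GRAMMAR for s in rhs})


def closure(items):
    """Worklist version of closure for LR(0) items."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(items, x):
    """Move the dot over x in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved)


def canonical_collection():
    collection = [closure({(0, 0)})]          # C = { closure({S' -> .S}) }
    for state in collection:                  # the list grows while scanned
        for x in SYMBOLS:
            target = goto(state, x)
            if target and target not in collection:
                collection.append(target)
    return collection


C = canonical_collection()
print(len(C))  # 12 sets of items, matching the states I0 to I11
```

The state numbers produced here depend on the order in which symbols are tried, so they need not match the numbering used in the slides.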
I5: F → id.
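[Figure: transition diagram (DFA) of the goto function over the states I0 to I11]
I0: E to I1, T to I2, F to I3, ( to I4, id to I5
I1: + to I6
I2: * to I7
I4: E to I8, T to I2, F to I3, ( to I4, id to I5
I6: T to I9, F to I3, ( to I4, id to I5
I7: F to I10, ( to I4, id to I5
I8: ) to I11, + to I6
I9: * to I7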
Constructing SLR Parsing Table
(of an augmented grammar G’)
• If the SLR parsing table of a grammar G has a conflict, we say that the
grammar is not an SLR grammar.
Problem
FOLLOW(A) = {a,b}
FOLLOW(B) = {a,b}
on a: reduce by A → ε          on b: reduce by A → ε
      reduce by B → ε                reduce by B → ε
      reduce/reduce conflict         reduce/reduce conflict
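The problem can be replayed in a few lines. The sketch assumes that one state contains the two complete items A → . and B → . (an assumption: the state itself is not shown here) and fills the SLR reduce entries from the FOLLOW sets listed above. The function names are hypothetical.

```python
FOLLOW = {"A": {"a", "b"}, "B": {"a", "b"}}

# Complete items of the assumed state: A -> . and B -> . (empty right sides).
COMPLETE_ITEMS = [("A", ()), ("B", ())]


def slr_reduce_entries(complete_items, follow):
    """SLR rule: reduce by A -> alpha on every terminal in FOLLOW(A)."""
    entries = {}
    for lhs, rhs in complete_items:
        for terminal in follow[lhs]:
            entries.setdefault(terminal, []).append(("reduce", lhs, rhs))
    return entries


def conflicts(entries):
    """Terminals whose table cell would hold more than one action."""
    return {t: acts for t, acts in entries.items() if len(acts) > 1}


CONFLICTS = conflicts(slr_reduce_entries(COMPLETE_ITEMS, FOLLOW))
print(sorted(CONFLICTS))  # ['a', 'b']
```

Both cells receive two reductions, so the grammar is not SLR, even though a parser that tracks lookaheads per item can tell the two reductions apart.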
A → α. ,a1
...
A → α. ,an
can be written as
A → α. ,a1/a2/.../an
I4: S → Aa.Ab ,$     I6: S → AaA.b ,$     I8: S → AaAb. ,$
    A → . ,b
I5: S → Bb.Ba ,$     I7: S → BbB.a ,$     I9: S → BbBa. ,$
    B → . ,a
(transitions: I4 on A to I6, I6 on b to I8; I5 on B to I7, I7 on a to I9)
I6: S → L=.R ,$        R to I9
    R → .L ,$          L to I10
    L → .*R ,$         * to I11
    L → .id ,$         id to I12
I7: L → *R. ,$/=
I8: R → L. ,$/=
I9: S → L=R. ,$
I10: R → L. ,$
I11: L → *.R ,$        R to I13
     R → .L ,$         L to I10
     L → .*R ,$        * to I11
     L → .id ,$        id to I12
I12: L → id. ,$
I13: L → *R. ,$
Sets with the same cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
Construction of LR(1) Parsing Tables
1. Construct the canonical collection of sets of LR(1) items for G’.
C ← {I0,...,In}
2. Create the parsing action table as follows:
• If a is a terminal, A → α.aβ ,b is in Ii and goto(Ii,a)=Ij, then action[i,a] is shift j.
• If A → α. ,a is in Ii, then action[i,a] is reduce A → α, where A ≠ S’.
• If S’ → S. ,$ is in Ii, then action[i,$] is accept.
• If any conflicting actions are generated by these rules, the grammar is not LR(1).
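The construction can be sketched end to end for the L/R grammar used in the examples (S → L=R | R, L → *R | id, R → L). This is a sketch under stated assumptions, not a definitive implementation: an item is a triple (production index, dot position, lookahead), and FIRST is computed in a simplified way that is valid only because this grammar has no empty productions. All function and variable names are hypothetical, and the state numbers it produces need not match the slides.

```python
from collections import defaultdict

GRAMMAR = [                      # production 0 augments the grammar
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST of each nonterminal (assumes no empty productions)."""
    first = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def lookaheads(rest, a):
    """FIRST(rest a), where rest follows the nonterminal after the dot."""
    if not rest:
        return {a}
    return set(FIRST[rest[0]]) if rest[0] in NONTERMINALS else {rest[0]}


def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot, a = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs != rhs[dot]:
                    continue
                for b in lookaheads(rhs[dot + 1:], a):
                    if (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)


def goto(items, x):
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved)


def build_lr1_tables():
    states = [closure({(0, 0, "$")})]
    number = {states[0]: 0}
    action, goto_table, conflicts = {}, {}, []

    def set_action(i, a, entry):
        # A second, different entry for the same cell is a conflict.
        if action.setdefault((i, a), entry) != entry:
            conflicts.append((i, a))

    i = 0
    while i < len(states):
        state = states[i]
        after_dot = {GRAMMAR[p][1][d] for p, d, _ in state
                     if d < len(GRAMMAR[p][1])}
        for x in sorted(after_dot):
            target = goto(state, x)
            if target not in number:
                number[target] = len(states)
                states.append(target)
            if x in NONTERMINALS:
                goto_table[i, x] = number[target]
            else:
                set_action(i, x, ("shift", number[target]))
        for p, d, a in state:
            if d == len(GRAMMAR[p][1]):      # complete item
                set_action(i, a, ("accept",) if p == 0 else ("reduce", p))
        i += 1
    return states, action, goto_table, conflicts


STATES, ACTION, GOTO, CONFLICTS = build_lr1_tables()
print(len(STATES), CONFLICTS)  # 14 []
```

The 14 states correspond to I0 through I13 of the canonical LR(1) collection, and the empty conflict list means the grammar is LR(1).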
• LALR parsers are often used in practice because LALR parsing tables
are smaller than LR(1) parsing tables.
• The number of states in the SLR and LALR parsing tables for a grammar G
is equal.
• But LALR parsers recognize more grammars than SLR parsers.
• yacc creates an LALR parser for the given grammar.
• A state of an LALR parser is again a set of LR(1) items.
• The core of a set of LR(1) items is the set of its first components.
Ex:  S → L.=R ,$        S → L.=R     (Core)
     R → L. ,$          R → L.
• We will find the states (sets of LR(1) items) in a canonical LR(1) parser with the same
cores. Then we will merge them into a single state.
I1: L → id. ,=               A new state:  I12: L → id. ,=
                                                L → id. ,$
I2: L → id. ,$               (I1 and I2 have the same core, so merge them)
• We will do this for all states of a canonical LR(1) parser to get the states of the LALR
parser.
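Merging by core can be sketched as grouping item sets by their first components and taking the union of the items in each group. The item representation (left side, right side, dot position, lookahead) and the function names are assumptions made for this illustration.

```python
from collections import defaultdict


def core(state):
    """The core of a set of LR(1) items: the items without lookaheads."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_same_cores(states):
    """Union all states that share a core into one LALR state."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return [frozenset(items) for items in merged.values()]


# I1: L -> id. ,=   and   I2: L -> id. ,$
I1 = frozenset({("L", ("id",), 1, "=")})
I2 = frozenset({("L", ("id",), 1, "$")})
# A state with a different core stays separate: R -> L. ,$
I3 = frozenset({("R", ("L",), 1, "$")})

MERGED = merge_same_cores([I1, I2, I3])
print(len(MERGED))  # 2
```

The first merged state holds L → id. with lookaheads = and $, which is the new state I12 described above.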
• In fact, the number of the states of the LALR parser for a grammar will be equal to the
number of states of the SLR parser for that grammar.
• Suppose that merging could introduce a shift/reduce conflict. In this case, a
state of the LALR parser must have:
A → α. ,a   and   B → β.aγ ,b
• This means that a state of the canonical LR(1) parser must have:
A → α. ,a   and   B → β.aγ ,c
But this state also has a shift/reduce conflict, i.e. the original canonical
LR(1) parser has a conflict.
(The reason is that the shift operation does not depend on lookaheads.)
I1: A → α. ,a          I2: A → α. ,b
    B → β. ,b              B → β. ,c
After merging:
I12: A → α. ,a/b       reduce/reduce conflict
     B → β. ,b/c
4) L → id
5) R → L
I3: S → R. ,$
I411: L → *.R ,$/=       R to I713
      R → .L ,$/=        L to I810
      L → .*R ,$/=       * to I411
      L → .id ,$/=       id to I512
I512: L → id. ,$/=
I6: S → L=.R ,$          R to I9
    R → .L ,$            L to I810
    L → .*R ,$           * to I411
    L → .id ,$           id to I512
I713: L → *R. ,$/=
I810: R → L. ,$/=
I9: S → L=R. ,$
Same Cores: I4 and I11, I5 and I12, I7 and I13, I8 and I10
LALR(1) Parsing Tables – (for Example2)
state  id   *    =    $    S   L   R
0      s5   s4             1   2   3
1                     acc
2                s6   r5
3                     r2
4      s5   s4                 8   7
5                r4   r4
6      s5   s4                 8   9
7                r3   r3
8                r5   r5
9                     r1
(States 4, 5, 7 and 8 are the merged states I411, I512, I713 and I810.)
There is no shift/reduce or reduce/reduce conflict, so it is an LALR(1) grammar.
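A table-driven parser loop over this table can be sketched as below. The entries are transcribed from the table, with row 6 written in merged state numbers (s5, s4, goto 8 and 9), which is an assumption about the intended table. Productions 1 to 3 (S → L=R, S → R, L → *R) are read off the item sets, and 4 and 5 are as listed. The function `parse` and the dictionary encoding are illustrative choices.

```python
# production number -> (left side, length of right side)
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1), 3: ("L", 2), 4: ("L", 1), 5: ("R", 1)}

ACTION = {
    (0, "id"): ("s", 5), (0, "*"): ("s", 4),
    (1, "$"): ("acc", None),
    (2, "="): ("s", 6), (2, "$"): ("r", 5),
    (3, "$"): ("r", 2),
    (4, "id"): ("s", 5), (4, "*"): ("s", 4),
    (5, "="): ("r", 4), (5, "$"): ("r", 4),
    (6, "id"): ("s", 5), (6, "*"): ("s", 4),
    (7, "="): ("r", 3), (7, "$"): ("r", 3),
    (8, "="): ("r", 5), (8, "$"): ("r", 5),
    (9, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (0, "L"): 2, (0, "R"): 3,
        (4, "L"): 8, (4, "R"): 7, (6, "L"): 8, (6, "R"): 9}


def parse(tokens):
    """Return the production numbers used, in the order they are reduced."""
    stack = [0]                          # alternating states and symbols
    tokens = list(tokens) + ["$"]
    pos, reductions = 0, []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:                # empty table cell: error
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, arg = entry
        if kind == "s":                  # shift: push symbol and new state
            stack += [tokens[pos], arg]
            pos += 1
        elif kind == "r":                # reduce: pop 2*|rhs|, push lhs, goto
            lhs, length = PRODUCTIONS[arg]
            del stack[-2 * length:]
            stack += [lhs, GOTO[stack[-1], lhs]]
            reductions.append(arg)
        else:                            # accept
            return reductions


print(parse(["id", "=", "*", "id"]))  # [4, 4, 5, 3, 5, 1]
```

Read backwards, the list of reductions is the right-most derivation of the input: S → L=R, R → L, L → *R, R → L, L → id, L → id.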
[Figure: sets of LR(0) items and their transitions for the ambiguous grammar E → E+E | E*E | (E) | id]
Legible items (partial): I2: E → (.E)   I3: E → id.   I5: E → E*.E   I6: E → (E.)   I8: E → E*E.   I9: E → (E).
I0 --E--> I1 --+--> I4 --E--> I7
I0 --E--> I1 --*--> I5 --E--> I8