
LR Parsing Table Construction

Lecture 6
Syntax Analysis

1
LR parsing example

Grammar:
1. E -> E + T
2. E -> T
3. T -> T * F
4. T -> F
5. F -> ( E )
6. F -> id

2
LR parsing example

CONFIGURATIONS

     STACK     INPUT            ACTION
     0         id * id + id $   shift 5

3
        STACK               INPUT            ACTION

(1)     0                   id * id + id $   shift
(2)     0 id 5              * id + id $      reduce by F → id
(3)     0 F 3               * id + id $      reduce by T → F
(4)     0 T 2               * id + id $      shift
(5)     0 T 2 * 7           id + id $        shift
(6)     0 T 2 * 7 id 5      + id $           reduce by F → id
(7)     0 T 2 * 7 F 10      + id $           reduce by T → T * F
(8)     0 T 2               + id $           reduce by E → T
(9)     0 E 1               + id $           shift
(10)    0 E 1 + 6           id $             shift
(11)    0 E 1 + 6 id 5      $                reduce by F → id
(12)    0 E 1 + 6 F 3       $                reduce by T → F
(13)    0 E 1 + 6 T 9       $                reduce by E → E + T
(14)    0 E 1               $                accept

Fig. 4.32. Moves of LR parser on id * id + id.


4
LR grammars
If it is possible to construct an LR parse table for G, we
say “G is an LR grammar”.
LR parsers DO NOT need to scan the entire stack to
decide what to do (other shift-reduce parsers might).
Instead, the STATE symbol on top of the stack summarizes all the
information needed to decide what to do next.
The GOTO function corresponds to a DFA that knows how
to find the HANDLE by reading the top of the stack
downwards.
In the example, we only looked at 1 input symbol at a
time. This means the grammar is LR(1).
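
To make the mechanics concrete, here is a minimal sketch (in Python) of the
table-driven LR parsing loop that produces the moves of Fig. 4.32. It assumes
the textbook's SLR table for this grammar (Fig. 4.31), transcribed by hand
below, so treat those table entries as an assumption rather than part of these
slides.

    # Hand-transcribed SLR(1) table for the expression grammar (textbook Fig. 4.31);
    # sN = shift and go to state N, rN = reduce by production N, acc = accept.
    PRODUCTIONS = {                 # production number -> (head, length of RHS)
        1: ("E", 3),  # E -> E + T
        2: ("E", 1),  # E -> T
        3: ("T", 3),  # T -> T * F
        4: ("T", 1),  # T -> F
        5: ("F", 3),  # F -> ( E )
        6: ("F", 1),  # F -> id
    }
    ACTION = {
        (0, "id"): "s5", (0, "("): "s4",
        (1, "+"): "s6", (1, "$"): "acc",
        (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
        (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
        (4, "id"): "s5", (4, "("): "s4",
        (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
        (6, "id"): "s5", (6, "("): "s4",
        (7, "id"): "s5", (7, "("): "s4",
        (8, "+"): "s6", (8, ")"): "s11",
        (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
        (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
        (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
    }
    GOTO = {
        (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10,
    }

    def lr_parse(tokens):
        """Run the LR driver loop; the stack holds states only."""
        stack, tokens, i = [0], tokens + ["$"], 0
        while True:
            act = ACTION.get((stack[-1], tokens[i]))
            if act is None:
                return False                       # error entry
            if act == "acc":
                return True
            if act[0] == "s":                      # shift: push the new state
                stack.append(int(act[1:]))
                i += 1
            else:                                  # reduce by production act[1:]
                head, rhs_len = PRODUCTIONS[int(act[1:])]
                del stack[len(stack) - rhs_len:]   # pop one state per RHS symbol
                stack.append(GOTO[(stack[-1], head)])  # then take GOTO on the head

    print(lr_parse(["id", "*", "id", "+", "id"]))  # True, mirroring the 14 moves above

The stack holds only states; the state on top plus the next input symbol
determine the action, exactly as described above.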

5
How to construct an LR parse table?

We will look at 3 methods:


 Simple LR (SLR): simple but not very powerful

 Canonical LR: very powerful but too many states

 LALR: almost as powerful with many fewer states

yacc uses the LALR algorithm.

6
SLR (Simple LR) Parse Table Construction

7
SLR parse tables
The SLR parse table is easy to construct, but the resulting parser is
a little weak.
The table is based on LR(0) ITEMS, or just plain ITEMS.
An LR(0) item is a production of G with a dot at some position in the
RHS.
The production A -> XYZ could generate the following LR(0) items:
 A -> .XYZ

 A -> X.YZ

 A -> XY.Z

 A -> XYZ.

The production A -> ε only generates 1 LR(0) item:


 A -> .
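
As a small illustrative sketch (Python, with names invented here), the items
of a single production can be enumerated by sliding the dot over every
position of the right-hand side:

    def items_of(head, body):
        """All LR(0) items of one production: one dot position per index 0..len(body)."""
        return [(head, body, dot) for dot in range(len(body) + 1)]

    def show(head, body, dot):
        return head + " -> " + " ".join(body[:dot]) + " . " + " ".join(body[dot:])

    # A -> XYZ yields four items; A -> ε (empty body) yields just "A -> ."
    for item in items_of("A", ("X", "Y", "Z")) + items_of("A", ()):
        print(show(*item))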

8
LR(0) items

An item indicates how far we are in parsing the RHS.

A -> .XYZ means we think we're at the beginning of an A production,
but haven't seen an X yet.

A -> X.YZ means we think we're in the middle of an A production,
have seen an X, and should see a Y soon.

9
Augmenting the grammar G

Before we can produce an SLR parse table, we have to AUGMENT the
input grammar, G.
Given G, we produce G', the AUGMENTED GRAMMAR for G:
 Add a new symbol S'
 Add a new production S' -> S (where S is the old start symbol)
 Make S' the new start symbol

10
Item set closure

We need a new concept: the CLOSURE of a set of LR(0) items.
If I is a set of items for grammar G', then the CLOSURE of I is
defined recursively:
 Initially, every item in I is added to closure(I)
 If A -> α . B β is in closure(I) and B -> γ is a production, then
   add the item B -> . γ to closure(I), if not already there.
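
A minimal Python sketch of closure(), assuming the grammar is represented as a
list of (head, body) productions and an item as a (head, body, dot) triple;
these names are illustrative, not from the slides:

    GRAMMAR = [
        ("E'", ("E",)),
        ("E", ("E", "+", "T")), ("E", ("T",)),
        ("T", ("T", "*", "F")), ("T", ("F",)),
        ("F", ("(", "E", ")")), ("F", ("id",)),
    ]
    NONTERMINALS = {head for head, _ in GRAMMAR}

    def closure(items):
        """Add B -> .γ for every B right after a dot, until nothing changes."""
        result = set(items)
        changed = True
        while changed:
            changed = False
            for head, body, dot in list(result):
                if dot < len(body) and body[dot] in NONTERMINALS:
                    for h, prod in GRAMMAR:
                        if h == body[dot] and (h, prod, 0) not in result:
                            result.add((h, prod, 0))
                            changed = True
        return frozenset(result)

    # closure({ E' -> .E }) gives the seven items shown on the next slide.
    print(len(closure({("E'", ("E",), 0)})))   # 7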

11
Itemset closure example

E’ -> E Closure(I) = { E’ -> . E


E -> E + T | T E -> . E + T
T -> T * F | F E -> . T
F -> ( E ) | id T -> . T * F
T -> . F
Initial itemset I is { E’ -> .E } F -> . ( E )
F -> . id }

12
The goto table

We also need the function goto(I,X), which takes an itemset I and a
grammar symbol X, and returns the closure of the set of all items
[ A -> α X . β ] such that [ A -> α . X β ] is in I.

Example: I = { [E' -> E.], [E -> E. + T] }

goto(I,+) = closure({ [E -> E + . T] })
          = { E -> E + . T,  T -> . T * F,  T -> . F,
              F -> . ( E ),  F -> . id }
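
Continuing the sketch from the closure slide (it reuses GRAMMAR and closure()
defined there), goto just moves the dot over X in every item that allows it
and then takes the closure:

    def goto(items, X):
        """goto(I, X): advance the dot over X, then close the resulting kernel."""
        moved = {(head, body, dot + 1)
                 for head, body, dot in items
                 if dot < len(body) and body[dot] == X}
        return closure(moved)

    I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}   # { [E' -> E.], [E -> E. + T] }
    print(len(goto(I, "+")))   # 5 items: E -> E+.T plus the closure items listed above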

13
I0:  E' → ·E,  E → ·E + T,  E → ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·id
I1:  E' → E·,  E → E· + T
I2:  E → T·,   T → T· * F
I3:  T → F·
I4:  F → (·E), E → ·E + T,  E → ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·id
I5:  F → id·
I6:  E → E + ·T,  T → ·T * F,  T → ·F,  F → ·(E),  F → ·id
I7:  T → T * ·F,  F → ·(E),  F → ·id
I8:  F → (E·),  E → E· + T
I9:  E → E + T·,  T → T· * F
I10: T → T * F·
I11: F → (E)·

Fig. 4.35. Canonical LR(0) collection for grammar (4.19).

14
[Transition diagram of the DFA D for viable prefixes; its goto transitions are:
 I0 --E--> I1, I0 --T--> I2, I0 --F--> I3, I0 --(--> I4, I0 --id--> I5,
 I1 --+--> I6, I2 --*--> I7,
 I4 --E--> I8, I4 --T--> I2, I4 --F--> I3, I4 --(--> I4, I4 --id--> I5,
 I6 --T--> I9, I6 --F--> I3, I6 --(--> I4, I6 --id--> I5,
 I7 --F--> I10, I7 --(--> I4, I7 --id--> I5,
 I8 --)--> I11, I8 --+--> I6, I9 --*--> I7]

Fig. 4.36. Transition diagram of DFA D for viable prefixes.


15
Canonical LR(0) itemsets

The CANONICAL LR(0) ITEMSETS can be used to create the states in the
SLR parse table.
We begin with an initial set C = { closure({ [S' -> .S] }) }.
Then, for each I in C and each grammar symbol X such that goto(I,X)
is not empty and not in C already, do
 Add goto(I,X) to C

Example: the canonical LR(0) itemsets for the same grammar.

Each set in C corresponds to a state in a DFA.
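
A sketch of this loop, building on the closure() and goto() helpers from the
earlier sketches (same illustrative item representation):

    def grammar_symbols(grammar):
        return {sym for _, body in grammar for sym in body}

    def canonical_collection(grammar, start_item):
        """C = { closure({start_item}) }, then keep adding non-empty goto sets."""
        C = [closure({start_item})]
        changed = True
        while changed:
            changed = False
            for I in list(C):
                for X in grammar_symbols(grammar):
                    J = goto(I, X)
                    if J and J not in C:
                        C.append(J)
                        changed = True
        return C

    C = canonical_collection(GRAMMAR, ("E'", ("E",), 0))
    print(len(C))   # 12 itemsets, matching I0..I11 of Fig. 4.35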

16
How to build the SLR parse table
1. Take the augmented grammar G’
2. Construct the canonical LR(0) itemsets C for G’
3. Associate a state with each itemset Ii in C
4. Construct the parse table as follows:
1. If A -> α . a β is in Ii and goto(Ii,a) = Ij, then set action[i,a]
to “shift j” (“a” here is a terminal)
2. If A -> α . is in Ii then set action[i,a] to “reduce A -> α” for
all a in FOLLOW(A)
3. If S’ -> S . is in Ii then set action[i,$] to “accept”

If any of the actions in the table conflict, then G is NOT SLR.
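
A sketch of step 4 above, reusing C and goto() from the previous sketches; the
FOLLOW sets are written out by hand for this grammar, and a conflicting entry
simply raises an error to signal "not SLR":

    FOLLOW = {"E": {"+", ")", "$"},
              "T": {"+", "*", ")", "$"},
              "F": {"+", "*", ")", "$"}}

    def slr_table(C, grammar, start="E'"):
        nonterminals = {h for h, _ in grammar}
        action, goto_tab = {}, {}

        def set_action(i, a, act):
            if action.setdefault((i, a), act) != act:
                raise ValueError(f"state {i}, symbol {a}: conflict, G is not SLR")

        for i, I in enumerate(C):
            for head, body, dot in I:
                if dot < len(body) and body[dot] not in nonterminals:
                    a = body[dot]                                  # terminal after the dot
                    set_action(i, a, ("shift", C.index(goto(I, a))))
                elif dot == len(body) and head == start:
                    set_action(i, "$", ("accept",))
                elif dot == len(body):
                    for a in FOLLOW[head]:                         # reduce on FOLLOW(head)
                        set_action(i, a, ("reduce", head, body))
            for X in nonterminals:
                J = goto(I, X)
                if J:
                    goto_tab[(i, X)] = C.index(J)
        return action, goto_tab

    action, goto_tab = slr_table(C, GRAMMAR)
    # action[(0, "(")] and action[(0, "id")] are shift actions, as on the next slide;
    # the state numbering may differ from I0..I11 since C is built in set order.
    print(action[(0, "(")][0], action[(0, "id")][0])   # shift shift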

17
Example SLR table construction
For the first LR(0) itemset in our favorite grammar:

I0: E’ -> .E
E -> .E + T
E -> .T
T -> .T * F
T -> .F
F -> .(E) This gives us action[0,(] = shift 4
F -> .id This gives us action[0,id] = shift 5

18
Using Ambiguous Grammars

19
What to do with ambiguity?

Sometimes it is convenient to leave ambiguity in G


For instance, G1:          is simpler than G2:
  E -> E + E                 E -> E + T | T
     | E * E                 T -> T * F | F
     | ( E )                 F -> ( E ) | id
     | id
But SLR(1), LR(1), and LALR(1) parsers will all have a
shift/reduce conflict for G1.

20
LR(0) itemsets for G1

22
Ambiguity leads to conflicts

G1 is ambiguous, so we are guaranteed to get conflicts.

For example, in I7:


 We will add rules to “shift 4” on ‘+’ and “shift 5” on
‘*’.
 For the item E -> E+E. we will add the rule
“reduce E->E+E” to the parse table for each terminal
in FOLLOW(E).
 But! FOLLOW(E) contains + and * -- shift/reduce
conflict.
LR(1) and LALR(1) tables will have the same problems.

23
Resolving the conflicts

Knowing about operator precedence and associativity, we can
resolve the conflicts.
Example: for input "id + id * id", we will be in state 7 after
processing "id + id":
        STACK          INPUT
        0 E 1 + 4 E 7  * id $
Since * has higher precedence than +, we should really shift,
not reduce.
With a + next in the input, we should instead reduce, to enforce
left-associativity.
See Fig. 4.47 in the text for a complete SLR(1) table.
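
A small sketch (Python, not yacc's actual code) of the precedence and
associativity rule applied to such a conflict: compare the precedence of the
lookahead with that of the operator in the production we could reduce by.

    PREC  = {"+": 1, "*": 2}            # * binds tighter than +
    ASSOC = {"+": "left", "*": "left"}  # both operators are left-associative

    def resolve(reduce_op, lookahead):
        """Decide a shift/reduce conflict on E -> E reduce_op E with `lookahead` next."""
        if PREC[lookahead] > PREC[reduce_op]:
            return "shift"      # e.g. stack "0 E 1 + 4 E 7", next "*": shift
        if PREC[lookahead] < PREC[reduce_op]:
            return "reduce"
        return "reduce" if ASSOC[lookahead] == "left" else "shift"

    print(resolve("+", "*"))   # shift  (the case described above)
    print(resolve("+", "+"))   # reduce (enforces left-associativity)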

24
If-else ambiguity

The ambiguity of the "dangling else" creates a shift-reduce
conflict in parsers for most languages.

Since the else is normally associated with the nearest if, we
resolve the conflict by shifting, instead of reducing, when we
see "else" in the input.

See the LR(0) states and parse table on page 251.

This method is much simpler than writing an unambiguous grammar.

25
Non-SLR grammars

Consider the assignment grammar

1. S’ -> S generating, e.g. S =*> id = * id


2. S -> L = R
3. S -> R
4. L -> * R
5. L -> id
6. R -> L

26
Non-SLR grammars
Construct the initial canonical LR(0) itemset I0.
Compute I2 = goto(I0,L) and I6 = goto(I2,=).
Compute FOLLOW(L): FOLLOW(L) = { =, $ }.
Compute the parse table entries for I2: shift/reduce conflict!

This means that in state I2, with '=' in the input, we do not know
whether to shift and go to state I6 or reduce by R -> L, since '='
is in FOLLOW(L), and therefore also in FOLLOW(R).
To correct this, we need to know more about the context of the L
we just parsed.
"Canonical LR(1)" and "LALR(1)" are powerful enough to do this.

27
Canonical LR Parse Table Construction

28
I0: S' → ·S,  S → ·L = R,  S → ·R,  L → ·* R,  L → ·id,  R → ·L
I1: S' → S·
I2: S → L· = R,  R → L·
I3: S → R·
I4: L → *·R,  R → ·L,  L → ·* R,  L → ·id
I5: L → id·
I6: S → L = ·R,  R → ·L,  L → ·* R,  L → ·id
I7: L → * R·
I8: R → L·
I9: S → L = R·

Fig. 4.37. Canonical LR(0) collection for grammar (4.20).


29
More states means more memory

In SLR, we said that in state i we should reduce by A -> α if the
itemset contains the item [A -> α .] and a is in FOLLOW(A).
However, sometimes when state i is on top of the stack and a is
next in the input, what comes BEFORE α on the stack might
invalidate the reduction A -> α.
Example from the previous grammar: the sentential form "R = ..."
is impossible, but "* R = ..." is possible.
So with '=' next in the input, we really want to reduce by
L -> * R when "* R" is on the stack, not by R -> L in state I2.

30
LR(1) idea

Our parser needs to keep track of more state information.
How can it?
Idea: use the canonical LR(0) states, but split states as needed by
adding a terminal symbol to each item.
LR(1) ITEMS take the form [A -> α.β, a], where A -> αβ is a
production in G and a is a terminal symbol or $.
The "1" refers to the length of a, the LOOKAHEAD of the item. If
the length were k, we would have an LR(k) item.
In parsing, we will now only reduce by A -> αβ (for an item
[A -> αβ., a]) if the item's lookahead symbol a agrees with the
next input symbol.

31
LR(1) parse table construction

We need to redefine closure(I) for a set of LR(1) items:

repeat
   for each item [A -> α.Bβ, a] in I,
       each production B -> γ in G',
       and each terminal b in FIRST(βa)
       such that [B -> .γ, b] is not already in I:
           add [B -> .γ, b] to I
until no more items can be added to I

goto(I,X) is the same as for SLR(1).
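
A sketch of the LR(1) closure for the grammar on the next slide (S' -> S,
S -> CC, C -> cC | d), with items written as (head, body, dot, lookahead)
tuples and the FIRST sets filled in by hand, since nothing here derives ε:

    GRAMMAR_CD = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
    NONTERMS_CD = {h for h, _ in GRAMMAR_CD}
    FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}

    def closure1(items):
        """LR(1) closure: add [B -> .γ, b] for every b in FIRST(βa)."""
        result = set(items)
        changed = True
        while changed:
            changed = False
            for head, body, dot, a in list(result):
                if dot < len(body) and body[dot] in NONTERMS_CD:
                    B, beta = body[dot], body[dot + 1:]
                    lookaheads = FIRST[beta[0]] if beta else FIRST[a]
                    for b in lookaheads:
                        for h, prod in GRAMMAR_CD:
                            if h == B and (h, prod, 0, b) not in result:
                                result.add((h, prod, 0, b))
                                changed = True
        return frozenset(result)

    I0 = closure1({("S'", ("S",), 0, "$")})
    print(len(I0))   # 6 items: the I0 of the next slide, with c/d written as separate items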

32
Example LR(1) parser construction
Begin with augmented grammar G’:
S’ -> S
S -> C C [ what is L(G’)?? ]
C -> c C | d

The first itemset I0 = closure({S’->.S,$}) = {


S’ -> .S,$
S -> .CC,$ [ from S’->.S,$ and S->CC, B=S, α=ε, β= ε ]
C -> .cC,c/d [ from S’->.CC,$ and C->cC, B=C, α= ε, β=C ]
C -> .d,c/d [ from S’->.CC,$ and C->d, B=C, α= ε,
β=C ]
}

33
I0: S' → ·S, $    S → ·CC, $    C → ·cC, c/d    C → ·d, c/d
I1: S' → S·, $
I2: S → C·C, $    C → ·cC, $    C → ·d, $
I3: C → c·C, c/d  C → ·cC, c/d  C → ·d, c/d
I4: C → d·, c/d
I5: S → CC·, $
I6: C → c·C, $    C → ·cC, $    C → ·d, $
I7: C → d·, $
I8: C → cC·, c/d
I9: C → cC·, $

Transitions: I0 --S--> I1, I0 --C--> I2, I0 --c--> I3, I0 --d--> I4,
I2 --C--> I5, I2 --c--> I6, I2 --d--> I7, I3 --C--> I8, I3 --c--> I3,
I3 --d--> I4, I6 --C--> I9, I6 --c--> I6, I6 --d--> I7.

Fig. 4.39. The goto graph for grammar (4.21).


34
LR(1) parsers: the good news

LR(1) is quite similar to SLR(1), with one main difference:


 We only add reduce rules to the parse table when
the input matches the LOOKAHEAD for the item
 SLR(1) adds reduce rules for any terminal in the
FOLLOW set.

This means LR(1) will have fewer shift/reduce and


reduce/reduce conflicts, because it tries to reduce in
fewer situations.

35
LR(1) parsers: the bad news

LR(1) parsers are powerful, able to parse almost any


unambiguous CFG used for real programming
languages.

But there is a price: the number of states is huge.


For the very simple c*dc*d language with 4 productions,
we already needed 10 LR(1) states.
For a typical PL like Pascal, the LR(1) table would
contain a few THOUSAND states!
Is there a technique as powerful with fewer states?

36
           action               goto
STATE   c      d      $       S     C
  0     s3     s4             1     2
  1                  acc
  2     s6     s7                   5
  3     s3     s4                   8
  4     r3     r3
  5                  r1
  6     s6     s7                   9
  7                  r3
  8     r2     r2
  9                  r2

Fig. 4.40. Canonical parsing table for grammar (4.21).


37
LALR Parse Table Construction

38
LALR parse tables

LALR makes smaller parse tables than canonical LR, but still covers
the most common programming language constructs.
LALR has the same number of states as the SLR parser for the same
grammar, but is more picky about when to reduce, so fewer conflicts
come up.
yacc actually constructs an LALR(1) table, not a canonical LR(1)
table.

39
LALR idea

Usually, in an LR(1) parser, there will be many states that are
identical except for the lookahead symbols.
LALR takes these identical states and MERGES them, forming the
UNION of the lookahead symbols for the merged items.

Algorithm: build the LR(1) itemsets, then merge itemsets with the
same CORES, as sketched below.
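
A minimal merging sketch (Python), assuming LR(1) itemsets represented as
frozensets of (head, body, dot, lookahead) tuples, e.g. as produced by the
closure1() sketch a few slides back:

    from collections import defaultdict

    def core(itemset):
        """The LR(0) core: the items with their lookaheads stripped off."""
        return frozenset((h, body, dot) for h, body, dot, _ in itemset)

    def merge_by_core(lr1_sets):
        """Union all LR(1) itemsets that share a core; lookaheads become sets."""
        by_core = defaultdict(set)
        for I in lr1_sets:
            by_core[core(I)].update(I)
        merged = []
        for items in by_core.values():
            lookaheads = defaultdict(set)
            for h, body, dot, a in items:
                lookaheads[(h, body, dot)].add(a)
            merged.append({(h, body, dot, frozenset(las))
                           for (h, body, dot), las in lookaheads.items()})
        return merged

    # For the c/d grammar this merges I3/I6, I4/I7 and I8/I9: 10 LR(1) states become 7.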

40
LALR example: which LR(1) itemsets can be merged?

I0: S' -> .S,$    S -> .CC,$    C -> .cC,c/d    C -> .d,c/d
I1: S' -> S.,$
I2: S -> C.C,$    C -> .cC,$    C -> .d,$
I3: C -> c.C,c/d  C -> .cC,c/d  C -> .d,c/d
I4: C -> d.,c/d
I5: S -> CC.,$
I6: C -> c.C,$    C -> .cC,$    C -> .d,$
I7: C -> d.,$
I8: C -> cC.,c/d
I9: C -> cC.,$

I3/I6, I4/I7, and I8/I9 have the same cores, so they can be merged
(giving states 36, 47, and 89 of the table on the next slide).

41
           action               goto
STATE   c      d      $       S     C
  0     s36    s47            1     2
  1                  acc
  2     s36    s47                  5
 36     s36    s47                  89
 47     r3     r3     r3
  5                  r1
 89     r2     r2     r2

Fig. 4.41. LALR parsing table for grammar (4.21).

42
Efficient Construction of LALR Parsing Tables

Example 4.46. Let us again consider the augmented grammar

   S' → S
   S  → L = R | R
   L  → * R | id
   R  → L

The kernels of the sets of LR(0) items for this grammar are shown in Fig. 4.42.

I0: S' → ·S          I5: L → id·
I1: S' → S·          I6: S → L = ·R
I2: S → L· = R       I7: L → * R·
    R → L·           I8: R → L·
I3: S → R·           I9: S → L = R·
I4: L → *·R

Fig. 4.42. Kernels of the sets of LR(0) items for grammar (4.20).
43
Efficient Construction of LALR Parsing Tables
Example 4.47. Let us construct the kernels of the LALR(1) items for the grammar in
the previous example. The kernels of the LR(0) items were shown in Fig. 4.42.
When we apply Algorithm 4.12 to the kernel of the set of items I0, we compute
closure({ [S' → ·S, #] }), which is

   S' → ·S, #
   S  → ·L = R, #
   S  → ·R, #
   L  → ·* R, #/=
   L  → ·id, #/=
   R  → ·L, #
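
The role of '#' can be sketched as follows (Python; the closure above is
written out by hand as data). For each symbol X, items with the dot before X
contribute to the kernel of goto(I0, X): a lookahead other than '#' is
generated SPONTANEOUSLY for that kernel, while '#' means the kernel item's own
lookaheads PROPAGATE there.

    # The closure of [S' -> ·S, #] listed above, as (head, body, dot, lookahead) tuples.
    CLOSURE_I0_SHARP = {
        ("S'", ("S",), 0, "#"),
        ("S", ("L", "=", "R"), 0, "#"), ("S", ("R",), 0, "#"),
        ("L", ("*", "R"), 0, "#"), ("L", ("*", "R"), 0, "="),
        ("L", ("id",), 0, "#"), ("L", ("id",), 0, "="),
        ("R", ("L",), 0, "#"),
    }

    def lookaheads_sent(closure_items, X):
        """(spontaneous lookaheads, does the kernel propagate?) for goto(I, X)."""
        spontaneous, propagates = set(), False
        for head, body, dot, a in closure_items:
            if dot < len(body) and body[dot] == X:
                if a == "#":
                    propagates = True
                else:
                    spontaneous.add(a)
        return spontaneous, propagates

    # goto(I0, *) = I4 and goto(I0, id) = I5 each get '=' spontaneously and also
    # inherit whatever I0's kernel acquires (cf. Fig. 4.44 and Fig. 4.45).
    print(lookaheads_sent(CLOSURE_I0_SHARP, "*"))    # ({'='}, True)
    print(lookaheads_sent(CLOSURE_I0_SHARP, "id"))   # ({'='}, True)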

44
FROM                   TO
I0: S' → ·S            I1: S' → S·
                       I2: S → L· = R
                       I2: R → L·
                       I3: S → R·
                       I4: L → *·R
                       I5: L → id·

I2: S → L· = R         I6: S → L = ·R

I4: L → *·R            I4: L → *·R
                       I5: L → id·
                       I7: L → * R·
                       I8: R → L·

I6: S → L = ·R         I4: L → *·R
                       I5: L → id·
                       I8: R → L·
                       I9: S → L = R·

Fig. 4.44. Propagation of lookaheads.

45
                                     LOOKAHEADS
SET   ITEM              INIT     PASS 1     PASS 2     PASS 3
I0:   S' → ·S            $         $          $          $
I1:   S' → S·                      $          $          $
I2:   S → L· = R                   $          $          $
I2:   R → L·                       $          $          $
I3:   S → R·                       $          $          $
I4:   L → *·R            =        =/$        =/$        =/$
I5:   L → id·            =        =/$        =/$        =/$
I6:   S → L = ·R                              $          $
I7:   L → * R·                     =         =/$        =/$
I8:   R → L·                       =         =/$        =/$
I9:   S → L = R·                                         $

Fig. 4.45. Computation of lookaheads.


46
Next time

- The TA will explain how to use Yacc

- Semantic processing (how to implement what you learned, in Yacc)

47
