Lecture4 Java
CS414-20034-06
LR Parsing
David Galles
(0) E' → E $
(1) E → E + T
(2) E → T
(3) T → T ∗ num
(4) T → num
3 + 4 * 5$
3 * 4 + 5$
06-3: LR Parsing
How do we know when to shift, and when to
reduce?
Use a Deterministic Finite Automaton
Combination of DFA and a stack is called a
Push-down automaton
We will put both states and symbols on the stack
When the end-of-file marker is shifted, accept the
string
06-4: LR Parsing Example
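(0) E' → E $
(1) E → E + T
(2) E → T
(3) T → T ∗ num
(4) T → num

State | num | +    | *    | $    | E  | T
1     | s2  |      |      |      | g3 | g4
2     |     | r(4) | r(4) | r(4) |    |
3     |     | s7   |      | a    |    |
4     |     | r(2) | s5   | r(2) |    |
5     | s6  |      |      |      |    |
6     |     | r(3) | r(3) | r(3) |    |
7     | s2  |      |      |      |    | g8
8     |     | r(1) | s5   | r(1) |    |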
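The table above can drive a small shift-reduce loop. The Python sketch below is only an illustration, not the course's implementation: the names RULES, ACTION, GOTO and parse are invented here, the stack holds states only (the slides put both states and symbols on it), and every number in the input is written as the token num.

```python
# Rule number -> (left-hand side, length of right-hand side), from the grammar above.
RULES = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * num
    4: ("T", 1),  # T -> num
}

# ACTION[state][terminal] is ("s", state), ("r", rule) or ("a", None).
ACTION = {
    1: {"num": ("s", 2)},
    2: {"+": ("r", 4), "*": ("r", 4), "$": ("r", 4)},
    3: {"+": ("s", 7), "$": ("a", None)},
    4: {"+": ("r", 2), "*": ("s", 5), "$": ("r", 2)},
    5: {"num": ("s", 6)},
    6: {"+": ("r", 3), "*": ("r", 3), "$": ("r", 3)},
    7: {"num": ("s", 2)},
    8: {"+": ("r", 1), "*": ("s", 5), "$": ("r", 1)},
}

# GOTO[state][non-terminal] is the state entered after a reduction.
GOTO = {1: {"E": 3, "T": 4}, 7: {"T": 8}}


def parse(tokens):
    """Return the rule numbers used, in order, or None on a syntax error.

    tokens must end with the end-of-file marker "$".
    """
    stack = [1]  # state 1 is the start state
    reductions = []
    pos = 0
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:
            return None  # empty table cell: syntax error
        kind, arg = entry
        if kind == "s":
            stack.append(arg)  # shift: push the new state, consume the token
            pos += 1
        elif kind == "r":
            lhs, length = RULES[arg]
            del stack[-length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(arg)
        else:
            return reductions  # accept
```

For 3 + 4 * 5$, `parse(["num", "+", "num", "*", "num", "$"])` returns `[4, 2, 4, 3, 1]`: the multiplication (rule 3) is reduced before the addition (rule 1).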
06-5: LR Parsing
LR(0) Parsers. Reduce as soon as the top of the
stack is the same as the right-hand side of a rule
SLR(1) Parsers. More powerful than LR(0) – adds
some lookahead information
LR(1) Parsers. More powerful than SLR(1) – adds
more sophisticated lookahead information
LALR Parsers. Almost as powerful as LR(1), but
uses much less memory (smaller table sizes)
06-6: LR(0) Parsing
L: reads the input file Left-to-right
R: creates a Rightmost derivation (in reverse)
(0): no lookahead (0-symbol lookahead)
(0) S' → S $
(1) S → AA
(2) S → bc
(3) A → baA
(4) A → c
06-8: LR Parsing Example
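S' → S$
S → AA
S → bc
A → baA
A → c

LL(1) parse table for this grammar:

   | a | b              | c
S' |   | S' → S$        | S' → S$
S  |   | S → bc, S → AA | S → AA
A  |   | A → baA        | A → c

Not LL(1)! (The entry for S on b holds two rules.)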
06-9: LR(0) Items
An LR(0) item consists of
A rule from the CFG
A “.” in the rule, which indicates where we
currently are in the rule
S → ab . c
Trying to parse the rule S → abc
Already seen “ab”, looking for a “c”
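One possible way to represent an item in code is a rule plus an integer dot position. The Python sketch below is an assumption made for illustration; the class name Item and its methods are not from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    lhs: str       # left-hand side of the rule
    rhs: tuple     # right-hand side symbols
    dot: int = 0   # number of right-hand-side symbols already seen

    def next_symbol(self):
        """Symbol just after the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def advance(self):
        """The same rule with the dot moved one symbol to the right."""
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} -> {' '.join(symbols)}"


item = Item("S", ("a", "b", "c"), 2)
print(item)                # S -> a b . c
print(item.next_symbol())  # c
```

Because the class is frozen, items are hashable and can be stored in sets, which is convenient when a state is a set of items.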
06-10: LR(0) States & Transitions
(0) S' → S $
(1) S → AA
(2) S → bc
(3) A → baA
(4) A → c
06-11: LR(0) States & Transitions
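State 1:
  S' -> . S $
  S -> . AA
  S -> . bc
  A -> . baA
  A -> . c
  Transitions: S to State 2, A to State 3, b to State 4, c to State 5
State 2:
  S' -> S . $
  Transitions: $ to Accept
State 3:
  S -> A . A
  A -> . baA
  A -> . c
  Transitions: A to State 6, b to State 7, c to State 5
State 4:
  S -> b . c
  A -> b . aA
  Transitions: a to State 8, c to State 9
State 5:
  A -> c .
State 6:
  S -> AA .
State 7:
  A -> b . aA
  Transitions: a to State 8
State 8:
  A -> ba . A
  A -> . baA
  A -> . c
  Transitions: A to State 10, b to State 7, c to State 5
State 9:
  S -> bc .
State 10:
  A -> baA .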
06-12: LR(0) Parse Table
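State | a    | b    | c    | $      | S  | A
1     |      | s4   | s5   |        | g2 | g3
2     |      |      |      | accept |    |
3     |      | s7   | s5   |        |    | g6
4     | s8   |      | s9   |        |    |
5     | r(4) | r(4) | r(4) | r(4)   |    |
6     | r(1) | r(1) | r(1) | r(1)   |    |
7     | s8   |      |      |        |    |
8     |      | s7   | s5   |        |    | g10
9     | r(2) | r(2) | r(2) | r(2)   |    |
10    | r(3) | r(3) | r(3) | r(3)   |    |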
06-13: Closure & Transitions
Two basic operations for creating LR(0) parsers:
Finding the closure of a state
Finding the transitions out of a state
06-14: Closure
1. For each item in the state of the form S → α . S1 β ,
where α and β are (possibly empty) strings of
terminals and non-terminals, and S1 is a
non-terminal:
For each rule of the form S1 → γ add the item
S1 → . γ if it is not already there
2. If any items were added in step 1, go back to step
1 and repeat
06-15: Closure
If a “.” appears right before the non-terminal S in
an item
Add items for all S rules to the state, with the “.”
at the beginning of the rule
Repeat until no more items can be added
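The closure procedure can be sketched in Python as a loop that runs until no item is added. This is a minimal illustration under assumptions made here: items are written as (lhs, rhs, dot) triples, the grammar is the S → AA example used earlier, and the names GRAMMAR, NONTERMINALS and closure are not from the slides.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("A", "A")),
    ("S", ("b", "c")),
    ("A", ("b", "a", "A")),
    ("A", ("c",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Return the closure of a set of (lhs, rhs, dot) items."""
    result = set(items)
    changed = True
    while changed:  # step 2: repeat while step 1 added something
        changed = False
        for lhs, rhs, dot in list(result):
            # step 1: the dot sits right before a non-terminal
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for rule_lhs, rule_rhs in GRAMMAR:
                    new_item = (rule_lhs, rule_rhs, 0)
                    if rule_lhs == rhs[dot] and new_item not in result:
                        result.add(new_item)
                        changed = True
    return result


# Closure of { S' -> . S $ } gives the five items of State 1.
for item in sorted(closure({("S'", ("S", "$"), 0)})):
    print(item)
```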
06-16: Finding Transitions
1. If the end-of-file terminal $ appears just after the “.” in
some item in the original state, create a transition
from the original state to an “accept” state,
transitioning on $.
2. For each symbol a (a terminal other than $, or a
non-terminal) that appears just after the “.” in some
item in the original state:
Create a new empty state.
For each item in the original state of the form
S → α . a γ , where α and γ are (possibly
empty) strings of terminals and non-terminals,
add the item S → α a . γ to the new state.
Find the closure of the new state.
Add a transition from the original state to the
new state, labeled with a.
06-17: Finding Transitions
If a “.” appears just before a symbol a (terminal or
non-terminal) in at least one item:
Create a new state
Add all items where the “.” is just before an a
Move the “.” past the a in the new state
Find the closure of the new state
Add a transition to the new state, labeled with
an a.
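Closure and transitions together are enough to build every state. The Python sketch below is an illustration under assumptions made here: items are (lhs, rhs, dot) triples, states are numbered from 0 in discovery order (so the numbers differ from the slides), and the names goto and build_states are invented for this example.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("A", "A")),
    ("S", ("b", "c")),
    ("A", ("b", "a", "A")),
    ("A", ("c",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Closure of a set of (lhs, rhs, dot) items, as a frozenset."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for rule_lhs, rule_rhs in GRAMMAR:
                    new_item = (rule_lhs, rule_rhs, 0)
                    if rule_lhs == rhs[dot] and new_item not in result:
                        result.add(new_item)
                        changed = True
    return frozenset(result)


def goto(state, symbol):
    """Move the dot past symbol in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


def build_states():
    """Return (states, transitions) for GRAMMAR; rule 0 is the start rule."""
    start = closure({(GRAMMAR[0][0], GRAMMAR[0][1], 0)})
    states = [start]
    transitions = {}  # (state index, symbol) -> state index or "accept"
    work = [start]
    while work:
        state = work.pop()
        source = states.index(state)
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            if symbol == "$":
                transitions[(source, "$")] = "accept"
                continue
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(source, symbol)] = states.index(target)
    return states, transitions


states, transitions = build_states()
print(len(states), "states,", len(transitions), "transitions")
```

For this grammar the sketch finds 10 states, the same count as the state diagram, plus the transition on $ to accept.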
06-18: Another LR(0) Example
(0) E' → E $
(1) E → E + T
(2) E → T
(3) T → T * id
(4) T → id
06-19: LR(0) States & Transitions
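State 1:
  E' -> . E $
  E -> . E + T
  E -> . T
  T -> . T * id
  T -> . id
  Transitions: E to State 2, T to State 3, id to State 4
State 2:
  E' -> E . $
  E -> E . + T
  Transitions: $ to accept, + to State 5
State 3:
  E -> T .
  T -> T . * id
  Transitions: * to State 6
State 4:
  T -> id .
State 5:
  E -> E + . T
  T -> . T * id
  T -> . id
  Transitions: T to State 7, id to State 4
State 6:
  T -> T * . id
  Transitions: id to State 8
State 7:
  E -> E + T .
  T -> T . * id
  Transitions: * to State 6
State 8:
  T -> T * id .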
06-20: LR(0) Parse Table
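State | id   | +    | *       | $      | E  | T
1     | s4   |      |         |        | g2 | g3
2     |      | s5   |         | accept |    |
3     | r(2) | r(2) | r(2),s6 | r(2)   |    |
4     | r(4) | r(4) | r(4)    | r(4)   |    |
5     | s4   |      |         |        |    | g7
6     | s8   |      |         |        |    |
7     | r(1) | r(1) | r(1),s6 | r(1)   |    |
8     | r(3) | r(3) | r(3)    | r(3)   |    |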
06-22: Shift-Reduce Conflict
In state 3, on a *, should we shift, or reduce? Why?
If we reduce, then we’re stuck – since the top of
the stack will contain E , the next symbol in the
input stream is *, and * cannot follow E in any
partial derivation!
If a state contains the item:
S→γ.
we should only reduce if the next terminal can
follow S
06-23: SLR(1)
Add simple lookahead (the S in SLR(1) is for
simple)
In LR(0) parsers, if state k contains the item
“S → γ .” (where S → γ is rule (n))
Put r(n) in state k, in all columns
In SLR(1) parsers, if state k contains the item
“S → γ .” (where S → γ is rule (n))
Put r(n) in state k, in all columns in the follow
set of S
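The follow sets that decide where r(n) goes can be computed with a fixed-point loop. The Python sketch below is an illustration under stated assumptions: it only handles grammars with no empty right-hand sides (true of the expression grammar used here), and the names first_sets, follow_sets and reduce_columns are invented for this example.

```python
GRAMMAR = [
    ("E'", ("E", "$")),        # rule 0
    ("E", ("E", "+", "T")),    # rule 1
    ("E", ("T",)),             # rule 2
    ("T", ("T", "*", "id")),   # rule 3
    ("T", ("id",)),            # rule 4
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets, assuming no rule has an empty right-hand side."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets, under the same no-empty-rule assumption."""
    first = first_sets()
    follow = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]  # symbol ends the rule
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow


def reduce_columns(rule_number):
    """Columns that receive r(rule_number) in an SLR(1) table."""
    lhs, _ = GRAMMAR[rule_number]
    return follow_sets()[lhs]


print(sorted(reduce_columns(2)))  # ['$', '+']
```

Since * is not in the follow set of E, a state containing E → T . gets r(2) only under + and $, which leaves the * column free for a shift.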
06-24: SLR(1) Parse Table
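State | id | +    | *    | $      | E  | T
1     | s4 |      |      |        | g2 | g3
2     |    | s5   |      | accept |    |
3     |    | r(2) | s6   | r(2)   |    |
4     |    | r(4) | r(4) | r(4)   |    |
5     | s4 |      |      |        |    | g7
6     | s8 |      |      |        |    |
7     |    | r(1) | s6   | r(1)   |    |
8     |    | r(3) | r(3) | r(3)   |    |

Example inputs: x+y*z, w+x+y*z, x*y+z, w*x*y+z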
06-25: Yet Another Example
(0) S' → S $
(1) S → L = R
(2) S → R
(3) L → *R
(4) L → id
(5) R → L
06-26: LR(0) States & Transitions
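State 1:
  S' -> . S $
  S -> . L = R
  S -> . R
  L -> . *R
  L -> . id
  R -> . L
  Transitions: S to State 2, L to State 3, R to State 4, * to State 5, id to State 6
State 2:
  S' -> S . $
  Transitions: $ to accept
State 3:
  S -> L . = R
  R -> L .
  Transitions: = to State 7
State 4:
  S -> R .
State 5:
  L -> * . R
  R -> . L
  L -> . *R
  L -> . id
  Transitions: R to State 8, L to State 9, * to State 5, id to State 6
State 6:
  L -> id .
State 7:
  S -> L = . R
  R -> . L
  L -> . *R
  L -> . id
  Transitions: R to State 10, L to State 9, * to State 5, id to State 6
State 8:
  L -> * R .
State 9:
  R -> L .
State 10:
  S -> L = R .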
06-27: SLR(1) Parse Table
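State | id | =       | *  | $      | S  | L  | R
1     | s6 |         | s5 |        | g2 | g3 | g4
2     |    |         |    | accept |    |    |
3     |    | r(5),s7 |    | r(5)   |    |    |
4     |    |         |    | r(2)   |    |    |
5     | s6 |         | s5 |        |    | g9 | g8
6     |    | r(4)    |    | r(4)   |    |    |
7     | s6 |         | s5 |        |    | g9 | g10
8     |    | r(3)    |    | r(3)   |    |    |
9     |    | r(5)    |    | r(5)   |    |    |
10    |    | r(1)    |    | r(1)   |    |    |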
06-28: Why SLR(1) Fails
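State 3 contains the items
S → L . = R
R → L .
The follow set of R contains =, so SLR(1) puts both s7
and r(5) in the = column of state 3. But in state 3 the
only valid lookahead for reducing by R → L is $: an =
can follow R only after a * (as in *R = ...).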
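LR(1) states (lookahead set in braces):
State 1:
  S' -> . S $      {}
  S -> . L = R     {$}
  S -> . R         {$}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  R -> . L         {$}
  Transitions: S to State 2, L to State 3, R to State 4, * to State 5, id to State 6
State 2:
  S' -> S . $      {}
  Transitions: $ to accept
State 3:
  S -> L . = R     {$}
  R -> L .         {$}
  Transitions: = to State 7
State 4:
  S -> R .         {$}
State 5:
  L -> * . R       {=, $}
  R -> . L         {=, $}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  Transitions: R to State 8, L to State 9, * to State 5, id to State 6
State 6:
  L -> id .        {=, $}
State 7:
  S -> L = . R     {$}
  R -> . L         {$}
  L -> . *R        {$}
  L -> . id        {$}
  Transitions: R to State 10, L to State 11, * to State 12, id to State 13
State 8:
  L -> * R .       {=, $}
State 9:
  R -> L .         {=, $}
State 10:
  S -> L = R .     {$}
State 11:
  R -> L .         {$}
State 12:
  L -> * . R       {$}
  R -> . L         {$}
  L -> . *R        {$}
  L -> . id        {$}
  Transitions: R to State 14, L to State 11, * to State 12, id to State 13
State 13:
  L -> id .        {$}
State 14:
  L -> * R .       {$}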
06-32: LR(1) Parse Table
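State | id  | =    | *   | $      | S  | L   | R
1     | s6  |      | s5  |        | g2 | g3  | g4
2     |     |      |     | accept |    |     |
3     |     | s7   |     | r(5)   |    |     |
4     |     |      |     | r(2)   |    |     |
5     | s6  |      | s5  |        |    | g9  | g8
6     |     | r(4) |     | r(4)   |    |     |
7     | s13 |      | s12 |        |    | g11 | g10
8     |     | r(3) |     | r(3)   |    |     |
9     |     | r(5) |     | r(5)   |    |     |
10    |     |      |     | r(1)   |    |     |
11    |     |      |     | r(5)   |    |     |
12    | s13 |      | s12 |        |    | g11 | g14
13    |     |      |     | r(4)   |    |     |
14    |     |      |     | r(3)   |    |     |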
06-33: More LR(1) Examples
(0) S' → S $
(1) S → BC
(2) S → b
(3) B → bB
(4) B → a
(5) C → ε
(6) C → cC
06-34: LR(1) States & Transitions
State 2:
  S' -> S . $      <none>
  Transitions: $ to accept
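State | a  | b  | c    | $      | S  | B  | C
1     | s5 | s3 |      |        | g2 | g4 |
2     |    |    |      | accept |    |    |
3     | s5 | s7 |      | r(2)   |    | g6 |
4     |    |    | s9   | r(5)   |    |    | g8
5     |    |    | r(4) | r(4)   |    |    |
6     |    |    | r(3) | r(3)   |    |    |
7     | s5 | s7 |      |        |    | g6 |
8     |    |    |      | r(1)   |    |    |
9     |    |    | s9   | r(5)   |    |    | g10
10    |    |    |      | r(6)   |    |    |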
06-36: LALR Parsers
LR(1) Parsers are more powerful than LR(0) or
SLR(1) parsers
LR(1) Parsers can have many more states than
LR(0) or SLR(1) parsers
My simpleJava implementation has 139 LR(0)
states, and thousands of LR(1) states
We’d like nearly the power of LR(1), with the
memory requirements of LR(0)
06-37: LALR Parsers
LR(1) parsers can have large numbers of states
Many of the states will be nearly the same – they
will differ only in Lookahead
IDEA – Combine states that differ only in
lookahead values
Set lookahead of combined state to union of
lookahead values from combining states
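The merging idea can be sketched in Python by grouping LR(1) states on their items with the lookahead stripped off. This is an illustration under assumptions made here: items are plain strings, only a handful of states from the S → L = R example are included, and the names LR1_STATES and merge_by_core are invented.

```python
from collections import defaultdict

# State number -> set of (item, lookahead set) pairs.
LR1_STATES = {
    6: {("L -> id .", frozenset({"=", "$"}))},
    13: {("L -> id .", frozenset({"$"}))},
    9: {("R -> L .", frozenset({"=", "$"}))},
    11: {("R -> L .", frozenset({"$"}))},
    10: {("S -> L = R .", frozenset({"$"}))},
}


def merge_by_core(states):
    """Combine states whose items are identical once lookahead is ignored."""
    groups = defaultdict(list)
    for number, items in states.items():
        core = frozenset(item for item, _ in items)
        groups[core].append(number)

    merged = {}
    for numbers in groups.values():
        name = "-".join(str(n) for n in sorted(numbers))
        lookahead = defaultdict(set)
        for n in numbers:
            for item, la in states[n]:
                lookahead[item] |= la  # union of the lookahead values
        merged[name] = {item: frozenset(la) for item, la in lookahead.items()}
    return merged


for name, items in sorted(merge_by_core(LR1_STATES).items()):
    print(name, {item: sorted(la) for item, la in items.items()})
```

Here states 6 and 13 collapse into 6-13 and states 9 and 11 into 9-11, while state 10 has no partner and stays as it is.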
06-38: LALR Parser Example
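LR(1) states before merging (lookahead set in braces):
State 1:
  S' -> . S $      {}
  S -> . L = R     {$}
  S -> . R         {$}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  R -> . L         {$}
  Transitions: S to State 2, L to State 3, R to State 4, * to State 5, id to State 6
State 2:
  S' -> S . $      {}
  Transitions: $ to Accept
State 3:
  S -> L . = R     {$}
  R -> L .         {$}
  Transitions: = to State 7
State 4:
  S -> R .         {$}
State 5:
  L -> * . R       {=, $}
  R -> . L         {=, $}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  Transitions: R to State 8, L to State 9, * to State 5, id to State 6
State 6:
  L -> id .        {=, $}
State 7:
  S -> L = . R     {$}
  R -> . L         {$}
  L -> . *R        {$}
  L -> . id        {$}
  Transitions: R to State 10, L to State 11, * to State 12, id to State 13
State 8:
  L -> * R .       {=, $}
State 9:
  R -> L .         {=, $}
State 10:
  S -> L = R .     {$}
State 11:
  R -> L .         {$}
State 12:
  L -> * . R       {$}
  R -> . L         {$}
  L -> . *R        {$}
  L -> . id        {$}
  Transitions: R to State 14, L to State 11, * to State 12, id to State 13
State 13:
  L -> id .        {$}
State 14:
  L -> * R .       {$}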
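After combining the states that differ only in lookahead (5 with 12, 6 with 13, 8 with 14, 9 with 11):
State 1:
  S' -> . S $      {}
  S -> . L = R     {$}
  S -> . R         {$}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  R -> . L         {$}
  Transitions: S to State 2, L to State 3, R to State 4, * to State 5-12, id to State 6-13
State 3:
  S -> L . = R     {$}
  R -> L .         {$}
  Transitions: = to State 7
State 4:
  S -> R .         {$}
State 5-12:
  L -> * . R       {=, $}
  R -> . L         {=, $}
  L -> . *R        {=, $}
  L -> . id        {=, $}
  Transitions: R to State 8-14, L to State 9-11, * to State 5-12, id to State 6-13
State 6-13:
  L -> id .        {=, $}
State 7:
  S -> L = . R     {$}
  R -> . L         {$}
  L -> . *R        {$}
  L -> . id        {$}
  Transitions: R to State 10, L to State 9-11, * to State 5-12, id to State 6-13
State 8-14:
  L -> * R .       {=, $}
State 9-11:
  R -> L .         {=, $}
State 10:
  S -> L = R .     {$}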
06-40: LALR Parser Example
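State | id    | =    | *     | $      | S  | L     | R
1     | s6-13 |      | s5-12 |        | g2 | g3    | g4
2     |       |      |       | accept |    |       |
3     |       | s7   |       | r(5)   |    |       |
4     |       |      |       | r(2)   |    |       |
5-12  | s6-13 |      | s5-12 |        |    | g9-11 | g8-14
6-13  |       | r(4) |       | r(4)   |    |       |
7     | s6-13 |      | s5-12 |        |    | g9-11 | g10
8-14  |       | r(3) |       | r(3)   |    |       |
9-11  |       | r(5) |       | r(5)   |    |       |
10    |       |      |       | r(1)   |    |       |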
06-41: More LALR Examples
(0) S' → S $
(1) S → Aa
(2) S → bAc
(3) S → Bc
(4) S → bBa
(5) A → d
(6) B → d
06-42: More LALR Examples
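LR(1) states (lookahead set in braces):
State 1:
  S' -> . S $      {}
  S -> . Aa        {$}
  S -> . bAc       {$}
  S -> . Bc        {$}
  S -> . bBa       {$}
  A -> . d         {a}
  B -> . d         {c}
  Transitions: S to State 4, A to State 5, B to State 2, b to State 6, d to State 3
State 2:
  S -> B . c       {$}
State 3:
  A -> d .         {a}
  B -> d .         {c}
State 4:
  S' -> S . $      {}
State 5:
  S -> A . a       {$}
  Transitions: a to State 8
State 6:
  S -> b . Ac      {$}
  S -> b . Ba      {$}
  A -> . d         {c}
  B -> . d         {a}
  Transitions: A to State 9, B to State 10, d to State 7
State 7:
  A -> d .         {c}
  B -> d .         {a}
State 8:
  S -> Aa .        {$}
State 9:
  S -> bA . c      {$}
  Transitions: c to State 11
State 10:
  S -> bB . a      {$}
  Transitions: a to State 12
State 11:
  S -> bAc .       {$}
State 12:
  S -> bBa .       {$}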
06-43: More LALR Examples
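States after combining 3 and 7 (lookahead set in braces):
State 1:
  S' -> . S $      {}
  S -> . Aa        {$}
  S -> . bAc       {$}
  S -> . Bc        {$}
  S -> . bBa       {$}
  A -> . d         {a}
  B -> . d         {c}
  Transitions: S to State 4, A to State 5, B to State 2, b to State 6, d to State 3-7
State 2:
  S -> B . c       {$}
State 3-7:
  A -> d .         {a, c}
  B -> d .         {a, c}
State 4:
  S' -> S . $      {}
State 5:
  S -> A . a       {$}
  Transitions: a to State 8
State 6:
  S -> b . Ac      {$}
  S -> b . Ba      {$}
  A -> . d         {c}
  B -> . d         {a}
  Transitions: A to State 9, B to State 10, d to State 3-7
State 8:
  S -> Aa .        {$}
State 9:
  S -> bA . c      {$}
  Transitions: c to State 11
State 10:
  S -> bB . a      {$}
  Transitions: a to State 12
State 11:
  S -> bAc .       {$}
State 12:
  S -> bBa .       {$}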
06-44: More LALR Examples
After merging, State 3-7 contains
  A -> d .         lookahead a, c
  B -> d .         lookahead a, c
On either lookahead, a or c, both r(5) and r(6) apply:
a reduce-reduce conflict.
States 3 and 7 had no conflict before they were combined,
so this grammar is LR(1) but not LALR(1).