Bottom Up Parsing
[Figure: rightmost derivation vs. leftmost derivation, with steps (2) int * T and (3) int]
Constructing an LR(0) automaton
1. Add a dummy start symbol [S’ → S $]
   – Distinguishes accepting reductions
2. Make an automaton for each production
3. For transitions on non-terminals, add ε-edges to the corresponding automaton
4. Apply NFA to DFA conversion
LR(0) items
• Alternate construction of LR(0) automaton
  – It is actually easy to construct the deterministic automaton directly, rather than constructing the non-deterministic automaton first and then converting it to a deterministic one
• An LR(0) item is
  – A production, A → X Y Z, with a dot in the body, e.g., A → . X Y Z
  – Represents a state of the parser
• A → X . Y Z means that the parser has seen X so far and is looking for a string derivable from Y Z
Deterministic automaton (LR(0))
[Figure: deterministic LR(0) automaton for E’ → E$, E → int | (E+E) | (E-E). The start state holds E’ → .E$, E → .int, E → .(E+E), E → .(E-E); E’ → E$. is the accept state; the reduce states are E → int. (Reduce 2), E → (E+E). (Reduce 3) and E → (E-E). (Reduce 4).]
LR(0) items
• CLOSURE(I : item set)
  – I ⊆ CLOSURE(I)
  – If
    • [A → α . B β] ∈ CLOSURE(I) and
    • [B → γ]
  – Then
    • [B → . γ] ∈ CLOSURE(I)
  – “If we’re looking for B β and B → γ then we should also be looking for γ”
• GOTO(I, X : symbol)
  – If [A → α . X β] ∈ I then [A → α X . β] ∈ GOTO(I,X)
  – CLOSURE(GOTO(I, X)) ⊆ GOTO(I, X)
  – “If we’re in state I and see symbol X, we are now in state I’”
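The CLOSURE and GOTO definitions above map directly onto a worklist algorithm. The Java sketch below is one possible reading. It assumes:

- Java 17 or later, because it uses records.
- The grammar E’ → E$, E → int | (E+E) | (E-E) from the automaton.
- Illustrative class and method names (Lr0Items, closure, gotoSet). The name gotoSet is used because goto is a reserved word in Java.

```java
import java.util.*;

// Sketch (assumed names): LR(0) CLOSURE and GOTO for
//   E' -> E $ ;  E -> int | ( E + E ) | ( E - E )
// A symbol is treated as a non-terminal iff it is the left-hand side of some production.
class Lr0Items {
    record Prod(String lhs, List<String> rhs) {}
    record Item(int prod, int dot) {}   // production index + dot position

    static final List<Prod> G = List.of(
        new Prod("E'", List.of("E", "$")),
        new Prod("E", List.of("int")),
        new Prod("E", List.of("(", "E", "+", "E", ")")),
        new Prod("E", List.of("(", "E", "-", "E", ")")));

    static boolean isNonTerminal(String s) {
        return G.stream().anyMatch(p -> p.lhs().equals(s));
    }

    // Symbol right after the dot, or null when the dot is at the end.
    static String next(Item it) {
        List<String> rhs = G.get(it.prod()).rhs();
        return it.dot() < rhs.size() ? rhs.get(it.dot()) : null;
    }

    // CLOSURE(I): for each [A -> alpha . B beta] and production B -> gamma, add [B -> . gamma].
    static Set<Item> closure(Set<Item> items) {
        Set<Item> result = new LinkedHashSet<>(items);
        Deque<Item> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            String b = next(work.pop());
            if (b == null || !isNonTerminal(b)) continue;
            for (int p = 0; p < G.size(); p++) {
                Item fresh = new Item(p, 0);
                if (G.get(p).lhs().equals(b) && result.add(fresh)) work.push(fresh);
            }
        }
        return result;
    }

    // GOTO(I, X): advance the dot over X in every item expecting X, then close the result.
    static Set<Item> gotoSet(Set<Item> items, String x) {
        Set<Item> moved = new LinkedHashSet<>();
        for (Item it : items)
            if (x.equals(next(it))) moved.add(new Item(it.prod(), it.dot() + 1));
        return closure(moved);
    }

    static String show(Item it) {
        Prod p = G.get(it.prod());
        List<String> syms = new ArrayList<>(p.rhs());
        syms.add(it.dot(), ".");
        return p.lhs() + " -> " + String.join(" ", syms);
    }

    public static void main(String[] args) {
        Set<Item> start = closure(Set.of(new Item(0, 0)));
        start.forEach(it -> System.out.println(show(it)));
        System.out.println("--- after '(' ---");
        gotoSet(start, "(").forEach(it -> System.out.println(show(it)));
    }
}
```

Run as is, it prints the four items of the start state and then the five items reached on “(”. These match the corresponding states in the figure.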
[Figure: LR parser model. The parser reads input a1 … ai … $ and keeps a stack of states Sm, Sm-1, …. It is driven by two tables: Action[S,a] ∈ {Shift, Reduce, Accept} and Goto[S,A] = next state.]
LR(0) automaton
• Add ε-transitions to indicate possible parsing states
• Apply NFA to DFA conversion
Grammar: E → TX, X → +E | ε, T → ( E ) | int Y, Y → *T | ε
[Figure: NFA with one automaton per non-terminal (E, T, X, Y) joined by ε-transitions, and the DFA obtained by NFA to DFA conversion; reduce states are labelled E → TX, T → (E), T → int Y, X → +E, X → ε, Y → *T and Y → ε]
Shift-reduce conflicts
• When do we apply ε reductions?
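One standard answer, used by the SLR(1) construction that follows, is to reduce by an ε-production such as X → ε only when the next token is in FOLLOW(X). The Java sketch below computes FIRST and FOLLOW for this grammar by iterating until no set changes. It assumes:

- Java 17.
- ε-productions are encoded as empty right-hand sides.
- The string "eps" is an illustrative ε marker inside FIRST sets.

```java
import java.util.*;

// Sketch: fixpoint computation of FIRST and FOLLOW for
//   E -> T X    T -> ( E ) | int Y    X -> + E | eps    Y -> * T | eps
// An empty right-hand side (List.of()) encodes an eps-production.
class FirstFollow {
    static final String EPS = "eps";   // assumed marker for "derives the empty string"
    static final Map<String, List<List<String>>> G = new LinkedHashMap<>();
    static {
        G.put("E", List.of(List.of("T", "X")));
        G.put("T", List.of(List.of("(", "E", ")"), List.of("int", "Y")));
        G.put("X", List.of(List.of("+", "E"), List.of()));
        G.put("Y", List.of(List.of("*", "T"), List.of()));
    }
    static final Map<String, Set<String>> FIRST = new HashMap<>();
    static final Map<String, Set<String>> FOLLOW = new HashMap<>();

    // FIRST of a symbol string; contains EPS iff every symbol in it can vanish.
    static Set<String> firstOf(List<String> symbols) {
        Set<String> out = new HashSet<>();
        for (String s : symbols) {
            Set<String> f = G.containsKey(s) ? FIRST.get(s) : Set.of(s);
            for (String t : f) if (!t.equals(EPS)) out.add(t);
            if (!f.contains(EPS)) return out;
        }
        out.add(EPS);
        return out;
    }

    static void compute(String start) {
        for (String nt : G.keySet()) { FIRST.put(nt, new HashSet<>()); FOLLOW.put(nt, new HashSet<>()); }
        FOLLOW.get(start).add("$");
        boolean changed = true;
        while (changed) {   // repeat until no FIRST or FOLLOW set grows
            changed = false;
            for (var e : G.entrySet())
                for (List<String> rhs : e.getValue()) {
                    changed |= FIRST.get(e.getKey()).addAll(firstOf(rhs));
                    for (int i = 0; i < rhs.size(); i++) {
                        String b = rhs.get(i);
                        if (!G.containsKey(b)) continue;   // FOLLOW is only defined for non-terminals
                        Set<String> rest = firstOf(rhs.subList(i + 1, rhs.size()));
                        for (String t : rest)
                            if (!t.equals(EPS)) changed |= FOLLOW.get(b).add(t);
                        // If everything after B can vanish, FOLLOW(lhs) flows into FOLLOW(B).
                        if (rest.contains(EPS)) changed |= FOLLOW.get(b).addAll(FOLLOW.get(e.getKey()));
                    }
                }
        }
    }

    public static void main(String[] args) {
        compute("E");
        for (String nt : G.keySet())
            System.out.println("FOLLOW(" + nt + ") = " + new TreeSet<>(FOLLOW.get(nt)));
    }
}
```

For this grammar it yields:

- FOLLOW(E) = FOLLOW(X) = {$, )}
- FOLLOW(T) = FOLLOW(Y) = {+, ), $}

So X → ε is reduced on ) and $, while Y → ε is also reduced on +.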
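SLR(1) parser
• Generate Action table from automaton
  – For each edge Si ==a==> Sj in LR(0) automaton, Action[Si,a] = shift Sj
  – For each “reduce” node Si with reduction [A → β], Action[Si,a] = reduce A → β where a ∈ FOLLOW(A)
    • Exception: If the node corresponds to the reduction S’ → S, Action[Si, $] = accept
  – All other actions are errors
  – Conflict between actions ⇒ grammar not SLR
• Generate Goto table from automaton
  – For each edge Si ==A==> Sj in LR(0) automaton, Goto[Si,A] = Sj
SLR(1) parsing example
Productions: (1) E → TX, (2) T → (E), (3) T → int Y, (4) X → +E, (5) X → ε, (6) Y → *T, (7) Y → ε
Action columns: int ( ) + * $; Goto columns: E T X Y
State | int | (  | )  | +  | *   | $   | E  | T  | X | Y
1     | S9  | S6 |    |    |     |     | 13 | 2  |   |
2     |     |    | R5 | S4 |     | R5  |    |    | 3 |
3     |     |    | R1 |    |     | R1  |    |    |   |
4     | S9  | S6 |    |    |     |     | 5  | 2  |   |
5     |     |    | R4 |    |     | R4  |    |    |   |
6     | S9  | S6 |    |    |     |     | 7  | 2  |   |
7     |     |    | S8 |    |     |     |    |    |   |
8     |     |    | R2 | R2 |     | R2  |    |    |   |
9     |     |    | R7 | R7 | S11 | R7  |    |    |   | 10
10    |     |    | R3 | R3 |     | R3  |    |    |   |
11    | S9  | S6 |    |    |     |     |    | 12 |   |
12    |     |    | R6 | R6 |     | R6  |    |    |   |
13    |     |    |    |    |     | acc |    |    |   |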
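To see how the table drives parsing, here is a minimal Java sketch of the shift-reduce loop over a stack of states. It assumes:

- Java 17.
- The Action and Goto entries of the example, with productions numbered as in the code comment. That numbering is inferred from the table's reduce entries.
- R2 and R3 are entered for every token in FOLLOW(T) = {+, ), $}.
- Illustrative choices for the class name SlrParser and for the string encoding of actions ("s9", "r5", "acc").

```java
import java.util.*;

// Sketch of a table-driven SLR(1) parser for
//   (1) E -> T X   (2) T -> ( E )   (3) T -> int Y   (4) X -> + E
//   (5) X -> eps   (6) Y -> * T     (7) Y -> eps
// Actions are encoded as "sN" (shift, go to state N), "rN" (reduce by production N) or "acc".
class SlrParser {
    static final String[] LHS = {null, "E", "T", "T", "X", "X", "Y", "Y"};
    static final int[] RHS_LEN = {0, 2, 3, 2, 2, 0, 2, 0};
    static final Map<Integer, Map<String, String>> ACTION = new HashMap<>();
    static final Map<Integer, Map<String, Integer>> GOTO = new HashMap<>();

    static void act(int s, String tok, String a) { ACTION.computeIfAbsent(s, k -> new HashMap<>()).put(tok, a); }
    static void go(int s, String nt, int t) { GOTO.computeIfAbsent(s, k -> new HashMap<>()).put(nt, t); }

    static {
        for (int s : new int[] {1, 4, 6, 11}) { act(s, "int", "s9"); act(s, "(", "s6"); go(s, "T", s == 11 ? 12 : 2); }
        go(1, "E", 13); go(4, "E", 5); go(6, "E", 7);
        act(2, ")", "r5"); act(2, "+", "s4"); act(2, "$", "r5"); go(2, "X", 3);
        act(3, ")", "r1"); act(3, "$", "r1");
        act(5, ")", "r4"); act(5, "$", "r4");
        act(7, ")", "s8");
        for (String t : new String[] {")", "+", "$"}) { act(8, t, "r2"); act(9, t, "r7"); act(10, t, "r3"); act(12, t, "r6"); }
        act(9, "*", "s11"); go(9, "Y", 10);
        act(13, "$", "acc");
    }

    // Returns the productions used, in reduction order (a rightmost derivation in reverse).
    static List<Integer> parse(List<String> tokens) {
        Deque<Integer> stack = new ArrayDeque<>(List.of(1));   // state stack; peek() is the top
        List<Integer> reductions = new ArrayList<>();
        int pos = 0;
        while (true) {
            String tok = tokens.get(pos);
            String a = ACTION.getOrDefault(stack.peek(), Map.of()).get(tok);
            if (a == null) throw new IllegalArgumentException("syntax error at token " + pos + ": " + tok);
            if (a.equals("acc")) return reductions;
            int n = Integer.parseInt(a.substring(1));
            if (a.charAt(0) == 's') { stack.push(n); pos++; continue; }
            for (int i = 0; i < RHS_LEN[n]; i++) stack.pop();   // pop one state per body symbol
            stack.push(GOTO.get(stack.peek()).get(LHS[n]));      // then follow Goto[top, A]
            reductions.add(n);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("int", "*", "int", "+", "int", "$")));
    }
}
```

For int * int + int $ it returns [7, 3, 6, 3, 7, 3, 5, 1, 4, 1]. The reductions on + in states 9, 10 and 12 are exactly the FOLLOW(T)-driven entries. An input such as int int $ reaches an empty Action entry and is rejected.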
Example
Rules for LALR(1) constraints
[Figure: LR(0) automaton for S’ → S$, S → L=R | R, L → *R | id, R → L, with the look-ahead set of each item labelled by a variable a–m]
• Look-ahead(q,A) ⊆ Look-ahead(p, A → ω.)
• FIRST(γ) − {ε} ⊆ Look-ahead(q,A)
• If γ ⇒* ε: Look-ahead(r, B → β A.γ) ⊆ Look-ahead(q,A)
Constraints for the example:
FIRST($) ⊆ g, FIRST(=R) ⊆ h
a⊆h, b⊆i, c⊆j, d⊆k, d⊆m, f⊆l, g⊆b, g⊆c, h⊆e, h⊆f, i⊆a, j⊆d, k⊆e, k⊆f, l⊆d, m⊆e
Solution: g = b = c = j = i = a = m = {$}; f = l = d = k = e = h = {=, $}
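Constraints like these are solved by starting from the FIRST seeds and propagating along the ⊆ edges until nothing changes, which gives a least fixpoint. The Java sketch below does this for a subset of the listed constraints: FIRST($) ⊆ g, FIRST(=R) ⊆ h, g⊆b, b⊆i, i⊆a, a⊆h, h⊆e. The class name and the encoding of a constraint x ⊆ y as a two-element array are illustrative. Java 17 is assumed.

```java
import java.util.*;

// Sketch: least-fixpoint solution of look-ahead constraints "x ⊆ y".
// seeds: initial sets from FIRST(...) constraints; subsets: pairs {x, y} meaning x ⊆ y.
class LookaheadSolver {
    static Map<String, Set<String>> solve(Map<String, Set<String>> seeds, List<String[]> subsets) {
        Map<String, Set<String>> la = new HashMap<>();
        seeds.forEach((v, s) -> la.computeIfAbsent(v, k -> new TreeSet<>()).addAll(s));
        boolean changed = true;
        while (changed) {                    // repeat until no set grows
            changed = false;
            for (String[] c : subsets) {     // c[0] ⊆ c[1]: copy c[0] into c[1]
                Set<String> from = la.computeIfAbsent(c[0], k -> new TreeSet<>());
                changed |= la.computeIfAbsent(c[1], k -> new TreeSet<>()).addAll(from);
            }
        }
        return la;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> seeds = Map.of("g", Set.of("$"), "h", Set.of("="));
        List<String[]> subsets = List.of(
            new String[] {"g", "b"}, new String[] {"b", "i"}, new String[] {"i", "a"},
            new String[] {"a", "h"}, new String[] {"h", "e"});
        new TreeMap<>(solve(seeds, subsets)).forEach((v, s) -> System.out.println(v + " = " + s));
    }
}
```

It yields g = b = i = a = {$} and h = e = {=, $}, which agrees with the solution above for those variables. The remaining constraints can be appended to the list in the same form.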
LALR(1) grammars in practice
• yacc, CUP: parser generators for LALR(1) grammars
• Most PL constructs can be expressed reasonably in an LALR(1) grammar
• That fact and the availability of tools like yacc and CUP have made LALR(1) grammars pretty much the default
• More powerful grammar classes such as LR(1) are described in textbooks but rarely used in practice
Dealing with ambiguity
• Commonly, parser generators provide ways of dealing with ambiguous grammars
  – Precedence and associativity rules for operators
  – Implemented by generating a suitable parser table
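Precedence and associativity rules are applied when the parser table is built. A shift-reduce conflict between two actions is settled by comparing their precedence:

- reducing by a rule, whose precedence is taken from its operator token, as yacc does with the last terminal of the rule;
- shifting a look-ahead operator.

When the levels are equal, associativity breaks the tie. The Java sketch below shows that yacc-style decision rule. The operator set is an assumption chosen for illustration: + left-associative, and * right-associative and binding tighter. Java 17 is assumed.

```java
import java.util.*;

// Sketch of yacc/CUP-style shift-reduce resolution. The precedence declarations
// modelled here (assumed for illustration) are:
//   precedence left +;   precedence right *;   (later declarations bind tighter)
class PrecedenceResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    record Prec(int level, Assoc assoc) {}

    static final Map<String, Prec> PREC = Map.of(
        "+", new Prec(1, Assoc.LEFT),
        "*", new Prec(2, Assoc.RIGHT));

    // Decides between reducing by a rule whose operator is ruleOp and shifting lookahead.
    // Both operators must have declared precedence. Returns "reduce", "shift" or "error".
    static String resolve(String ruleOp, String lookahead) {
        Prec r = PREC.get(ruleOp), t = PREC.get(lookahead);
        if (r.level() > t.level()) return "reduce";
        if (r.level() < t.level()) return "shift";
        return switch (r.assoc()) {   // same level: associativity decides
            case LEFT -> "reduce";
            case RIGHT -> "shift";
            case NONASSOC -> "error";
        };
    }

    public static void main(String[] args) {
        for (String rule : List.of("+", "*"))
            for (String la : List.of("+", "*"))
                System.out.println("E -> E " + rule + " E . on " + la + ": " + resolve(rule, la));
    }
}
```

The four cases reproduce how the conflicts in the expression example that follows are resolved:

- E → E + E is reduced on + but * is shifted.
- E → E * E is reduced on +, and * is shifted because * is right-associative.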
Example
Grammar: E → int | E*E | E+E   // * has higher priority
  precedence left +
  precedence right *
[Figure: LR(0) automaton for this grammar. State 0: E → .int, E → .E+E, E → .E*E; state 1: E → int.; state 2: E → E.+E, E → E.*E; state 3: E → E*.E and state 4: E → E+.E, each with its closure; state 5: E → E+E., E → E.+E, E → E.*E (shift-reduce conflict); state 6: E → E*E., E → E.+E, E → E.*E (shift-reduce conflict)]
Parse table (rules: 1 E → int, 2 E → E+E, 3 E → E*E):
State | int | +  | *  | $  | E
0     | s1  |    |    |    | g2
1     |     | r1 | r1 | r1 |
2     |     | s4 | s3 |    |
3     | s1  |    |    |    | g6
4     | s1  |    |    |    | g5
5     |     | r2 | s3 |    |
6     |     | r3 | s3 |    |
The grammar is ambiguous, but the ambiguity is resolved using operator precedence and associativity.
JFlex and CUP
• JFlex, a lexer generator for Java
• CUP, a parser generator for Java
• Both take specifications and generate Java code
JFlex
import java_cup.runtime.Symbol;
%%
%class Lexer
%cup
%line
%column
%{
  private Symbol symbol(int sym) { return new Symbol(sym, yyline+1, yycolumn+1); }
  private Symbol symbol(int sym, Object val) { return new Symbol(sym, yyline+1, yycolumn+1, val); }
%}
IntLiteral = 0 | [1-9][0-9]*
new_line = \r|\n|\r\n
white_space = {new_line} | [ \t\f]
%%
{IntLiteral} { return symbol(sym.INT, Integer.valueOf(yytext())); }
"(" { return symbol(sym.LPAREN); }
")" { return symbol(sym.RPAREN); }
"+" { return symbol(sym.PLUS); }
...
{white_space} { /* ignore */ }
.|\n { throw new Error("Illegal character <" + yytext() + ">"); }
CUP
/* Terminals (tokens returned by lexer). */
terminal PLUS, MINUS, SLASH, STAR, QUESTION, COLON, LPAREN, RPAREN;
terminal Integer INT;
non terminal Integer Exp;
/* Later precedence declarations bind tighter. */
precedence left QUESTION;
precedence left PLUS, MINUS;
precedence left STAR, SLASH;
Exp ::= INT:i {: RESULT = i; :}
      | Exp:e1 PLUS Exp:e2 {: RESULT = e1 + e2; :}
      | Exp:e1 MINUS Exp:e2 {: RESULT = e1 - e2; :}
      | Exp:e1 SLASH Exp:e2 {: RESULT = e1 / e2; :}
      | Exp:e1 STAR Exp:e2 {: RESULT = e1 * e2; :}
      | Exp:e1 QUESTION Exp:e2 COLON Exp:e3 {: RESULT = e1 == 0 ? e3 : e2; :}
      | LPAREN Exp:e1 RPAREN {: RESULT = e1; :}
      ;
LR(1) example
[Figure: grammar S’ → S $, S → L=R | R, L → *R | id, R → L, with reductions annotated by look-ahead sets: S → L = R, {$}; S → R, {$}; R → L, {$}; and R → L, {=}. Annotation: (1) FIRST($) ⊆ FOLLOW(S)]
Recap
LR(1) items
• Equivalence between LR(1) automaton and LR(1) item sets
• An LR(1) item is
  – An LR(0) item augmented with a lookahead symbol (terminal), e.g., [A → . X Y Z, a]
  – The item [A → X Y Z ., a] calls for a reduction only if the next input symbol is a
• CLOSURE(I : item set)
  – I ⊆ CLOSURE(I)
  – If
    • [A → α . B β, a] ∈ CLOSURE(I),
    • [B → γ], and
    • b ∈ FIRST(β a)
  – Then
    • [B → . γ, b] ∈ CLOSURE(I)
  – FIRST(β a) is like FOLLOW(B) but takes the current production into account; the “a” term handles the case when β = ε (more generally, β ⇒* ε); FIRST(a) ⊆ FOLLOW(A)
  – “If we’re looking for B β and B → γ then we should also be looking for γ”
• GOTO(I, X : symbol)
  – If [A → α . X β, a] ∈ I then [A → α X . β, a] ∈ GOTO(I,X)
  – CLOSURE(GOTO(I, X)) ⊆ GOTO(I, X)
  – “If we’re in state I and see symbol X, we are now in state I’”
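Here is a sketch of the LR(1) CLOSURE and GOTO in Java, applied to S → L=R | R, L → *R | id, R → L from the LR(1) example. It assumes:

- Java 17 or later.
- The augmented form S’ → S with start item [S’ → . S, $], as in the accept rule for LR(1) parsers.
- The grammar has no ε-productions, so FIRST(β a) is just FIRST of the first symbol of β a.
- Illustrative class and method names.

```java
import java.util.*;

// Sketch: LR(1) CLOSURE and GOTO for
//   S' -> S    S -> L = R | R    L -> * R | id    R -> L
// Assumption: no eps-productions, so FIRST(beta a) = FIRST(first symbol of beta a).
class Lr1Items {
    record Prod(String lhs, List<String> rhs) {}
    record Item(int prod, int dot, String la) {}   // [A -> alpha . beta, la]

    static final List<Prod> G = List.of(
        new Prod("S'", List.of("S")),
        new Prod("S", List.of("L", "=", "R")), new Prod("S", List.of("R")),
        new Prod("L", List.of("*", "R")), new Prod("L", List.of("id")),
        new Prod("R", List.of("L")));

    static boolean isNt(String s) { return G.stream().anyMatch(p -> p.lhs().equals(s)); }

    // FIRST of one symbol. Terminates here because no chain of first symbols loops back
    // (no left recursion); a general grammar would need a fixpoint computation instead.
    static Set<String> first(String x) {
        if (!isNt(x)) return Set.of(x);
        Set<String> out = new TreeSet<>();
        for (Prod p : G) if (p.lhs().equals(x)) out.addAll(first(p.rhs().get(0)));
        return out;
    }

    static Set<Item> closure(Set<Item> items) {
        Set<Item> result = new LinkedHashSet<>(items);
        Deque<Item> work = new ArrayDeque<>(items);
        while (!work.isEmpty()) {
            Item it = work.pop();
            List<String> rhs = G.get(it.prod()).rhs();
            if (it.dot() >= rhs.size() || !isNt(rhs.get(it.dot()))) continue;
            String b = rhs.get(it.dot());
            // Look-aheads for the new B-items: FIRST(beta a), beta being the rest after B.
            Set<String> las = it.dot() + 1 < rhs.size() ? first(rhs.get(it.dot() + 1)) : Set.of(it.la());
            for (int p = 0; p < G.size(); p++)
                if (G.get(p).lhs().equals(b))
                    for (String la : las) {
                        Item fresh = new Item(p, 0, la);
                        if (result.add(fresh)) work.push(fresh);
                    }
        }
        return result;
    }

    // GOTO(I, X): advance the dot over X, keeping each item's look-ahead, then close.
    static Set<Item> gotoSet(Set<Item> items, String x) {
        Set<Item> moved = new LinkedHashSet<>();
        for (Item it : items) {
            List<String> rhs = G.get(it.prod()).rhs();
            if (it.dot() < rhs.size() && rhs.get(it.dot()).equals(x))
                moved.add(new Item(it.prod(), it.dot() + 1, it.la()));
        }
        return closure(moved);
    }

    static String show(Item it) {
        Prod p = G.get(it.prod());
        List<String> syms = new ArrayList<>(p.rhs());
        syms.add(it.dot(), ".");
        return "[" + p.lhs() + " -> " + String.join(" ", syms) + ", " + it.la() + "]";
    }

    public static void main(String[] args) {
        Set<Item> i0 = closure(Set.of(new Item(0, 0, "$")));
        i0.forEach(it -> System.out.println(show(it)));
        System.out.println("--- GOTO(I0, L) ---");
        gotoSet(i0, "L").forEach(it -> System.out.println(show(it)));
    }
}
```

The start state has eight items, and L → . * R and L → . id appear with both = and $ as look-ahead. GOTO(I0, L) contains only [S → L . = R, $] and [R → L ., $]. So R → L is reduced only on $, and there is no clash with the shift on =.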
LR(1) parser
• Generate Action table from automaton
  – For each edge Si ==a==> Sj in LR(1) automaton, Action[Si,a] = shift Sj
  – For each “reduce” node Si with reduction [A → β, a], Action[Si,a] = reduce A → β
    • Exception: If the node corresponds to the reduction [S’ → S, $], then Action[Si, $] = accept
  – All other actions are errors
  – If there is a conflict between actions, the grammar is not LR(1)
• Goto table generated as in SLR(1)
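The difference between the SLR(1) and LR(1) reduce rules shows up in a single row of the Action table. The Java sketch below fills the row for the state reached on L from the start state of S → L=R | R, L → *R | id, R → L. It does this twice:

- with the SLR(1) reduce look-aheads FOLLOW(R) = {=, $};
- with the LR(1) look-ahead {$}.

The shift target 6 and the class name ActionRow are hypothetical. Java 17 is assumed.

```java
import java.util.*;

// Sketch: filling one Action row and reporting conflicts, for the state holding
//   S -> L . = R   and the completed item   R -> L .
// The shift target "6" is a hypothetical state number chosen for illustration.
class ActionRow {
    // Adds an action; if the cell already holds a different action, records a conflict.
    static void put(Map<String, String> row, List<String> conflicts, String tok, String act) {
        String old = row.putIfAbsent(tok, act);
        if (old != null && !old.equals(act)) conflicts.add(tok + ": " + old + " vs " + act);
    }

    // Fills the row and returns the conflicts found, given the look-aheads for reducing R -> L.
    static List<String> fill(Set<String> reduceLookaheads) {
        Map<String, String> row = new TreeMap<>();
        List<String> conflicts = new ArrayList<>();
        put(row, conflicts, "=", "shift 6");              // edge on '=' from S -> L . = R
        for (String a : reduceLookaheads)                   // completed item R -> L .
            put(row, conflicts, a, "reduce R -> L");
        return conflicts;
    }

    public static void main(String[] args) {
        // SLR(1) reduces on FOLLOW(R) = {=, $}; LR(1) reduces only on the item's look-ahead {$}.
        System.out.println("SLR(1) conflicts: " + fill(new TreeSet<>(Set.of("=", "$"))));
        System.out.println("LR(1) conflicts:  " + fill(Set.of("$")));
    }
}
```

The SLR(1) row reports one shift-reduce conflict on =, while the LR(1) row has none. This is why the grammar is LR(1) but not SLR(1).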