
Bottom Up Parsing

This document discusses bottom-up parsing and constructing LR(0) parsing automata. It begins by explaining bottom-up parsing builds a parse tree from the leaves to the root by applying a series of reductions based on the grammar productions. It then discusses constructing the LR(0) parsing automaton by adding epsilon transitions between states corresponding to non-terminal symbols. The automaton represents all possible parsing configurations and is then converted to a deterministic automaton for parsing input strings.

Uploaded by

Nimra

Bottom-up parsing

• Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol

Grammar:
E → T + E | T
T → int * T | int

[Figure: parse tree for int * int + int, built from the bottom up; its levels are labeled (1) int * int + int, (2) int * T, (3) int, (4) T + E, (5) E]

Bottom-up parsing

• Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol

[Figure: the same parse tree for int * int + int, levels (1) to (5)]

Rightmost Derivation:
E => T + E
  => T + T
  => T + int
  => int * T + int
  => int * int + int

Leftmost Derivation:
E => T + E
  => int * T + E
  => int * int + E
  => int * int + T
  => int * int + int


Bottom-up parsing II

• Bottom-up parsing is a series of reductions (inverses of productions), the reverse of which is the rightmost derivation

Grammar:
E → T + E | T
T → int * T | int

Reductions:
(1) int --> T
(2) int * T --> T
(3) int --> T
(4) T --> E
(5) T + E --> E

Rightmost Derivation:
E => T + E
  => T + T
  => T + int
  => int * T + int
  => int * int + int

[Figure: parse tree for int * int + int with the reduction steps (1) to (5) marked]

Intuition

• Explanation in terms of reverse rightmost derivations is correct but not terribly intuitive
• In particular, the connection between top-down and bottom-up parsing becomes obscured
• More intuitive explanation
– in terms of the transition diagram of the grammar

Example

• Consider the grammar
E’ → E $ (1)
E → int (2)
E → (E + E) (3)
E → (E - E) (4)
• This grammar is not LL(1)
– although it can be massaged into a LL(1) grammar
• Key problem with recursive-descent parser:
– when we are trying to parse an “E” from the input, parser needs to decide whether to apply production (2), (3) or (4) by just looking at next terminal symbol
– if next terminal is ‘(‘, not clear whether to apply (3) or (4)
• Bottom-up parsing
– instead of making this decision at the start of the RHS of E productions, delay this decision till we get to end of RHS of productions
– in this case, it is clear that delaying the decision will let us figure out the right rule unambiguously when we encounter either ‘+’ or ‘-’ somewhere along the way

Transition diagram for recursive-descent parser

[Figure: one path per production, each edge labeled with the symbol consumed:
E’ → .E$ --E--> E’ → E.$ --$--> E’ → E$. (Accept)
E → .int --int--> E → int.
E → .(E+E) --(--> E → (.E+E) --E--> E → (E.+E) --+--> E → (E+.E) --E--> E → (E+E.) --)--> E → (E+E).
E → .(E-E) --(--> E → (.E-E) --E--> E → (E.-E) -- - --> E → (E-.E) --E--> E → (E-E.) --)--> E → (E-E).
Annotation at the two ‘(‘ edges: Not clear which transition to take]
Bottom-up parsing

• Idea: think of transition diagram as the state diagram for a non-deterministic automaton
• Automaton follows “all possible paths” till it either accepts or rejects, thereby delaying decision of which production to apply
• Implementation:
– make the dashed lines in the transition diagram into ε transitions for the automaton

Transition diagram with ε transitions

[Figure: the transition diagram for the grammar E’ → E$, E → int | (E+E) | (E-E), with ε transitions added from states whose dot precedes E to the start of each E production. The final states are labeled Accept (E’ → E$.), Reduce 2 (E → int.), Reduce 3 (E → (E+E).) and Reduce 4 (E → (E-E).)]

Constructing an LR(0) automaton

1. Add a dummy start symbol [S’ → S $]
– Distinguishes accepting reductions
2. Make an automaton for each production
3. For transitions on non-terminals, add ε-edges to the corresponding automaton
4. Apply NFA to DFA conversion

LR(0) items

• Alternate construction of LR(0) automaton
– it is actually easy to construct the deterministic automaton directly rather than construct the non-deterministic automaton first, and then convert it to a deterministic one
• A LR(0) item is
– A production, A → X Y Z, with a dot in the body, e.g., A → . X Y Z
– Represents state of parser
• A → X . Y Z means that the parser has seen X so far and is looking for a string derivable from Y Z
LR(0) items

• CLOSURE(I : item set)
– I ⊆ CLOSURE(I)
– If
• [A → α . B β] ∈ CLOSURE(I) and
• [B → γ]
– Then
• [B → . γ] ∈ CLOSURE(I)
– “If we’re looking for B β and B → γ then we should also be looking for γ”
• GOTO(I, X : symbol)
– If [A → α . X β] ∈ I then [A → α X . β] ∈ GOTO(I,X)
– CLOSURE(GOTO(I, X)) ⊆ GOTO(I, X)
– “If we’re in state I and see symbol X, we are now in state I’”

Deterministic automaton (LR(0))

[Figure: LR(0) DFA with the following states and edges
State 1: E’ → .E$, E → .int, E → .(E+E), E → .(E-E); edges E to 2, int to 3, ( to 4
State 2: E’ → E.$; edge $ to 12
State 12: E’ → E$. (Accept)
State 3: E → int. (Reduce 2)
State 4: E → (.E+E), E → (.E-E), E → .int, E → .(E+E), E → .(E-E); edges E to 5, int to 3, ( to 4
State 5: E → (E.+E), E → (E.-E); edges + to 6, - to 9
State 6: E → (E+.E), E → .int, E → .(E+E), E → .(E-E); edges E to 7, int to 3, ( to 4
State 7: E → (E+E.); edge ) to 8
State 8: E → (E+E). (Reduce 3)
State 9: E → (E-.E), E → .int, E → .(E+E), E → .(E-E); edges E to 10, int to 3, ( to 4
State 10: E → (E-E.); edge ) to 11
State 11: E → (E-E). (Reduce 4)]
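The CLOSURE and GOTO rules above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: an item is represented as a (head, body, dot) tuple, and the names GRAMMAR, closure and goto are hypothetical.

```python
# Example grammar from the slides: E' -> E $, E -> int | (E+E) | (E-E).
GRAMMAR = {
    "E'": [("E", "$")],
    "E": [("int",), ("(", "E", "+", "E", ")"), ("(", "E", "-", "E", ")")],
}

def closure(items):
    """Close a set of (head, body, dot) items under the CLOSURE rule."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        # Dot in front of a non-terminal B: add [B -> . gamma] for every B -> gamma.
        if dot < len(body) and body[dot] in GRAMMAR:
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for (h, b, d) in items if d < len(b) and b[d] == symbol}
    return closure(moved)

state1 = closure({("E'", ("E", "$"), 0)})
state4 = goto(state1, "(")
```

With this representation, state1 holds the four items of state 1 in the deterministic automaton, and goto(state4, "(") gives state4 back, which corresponds to the ( self-loop on state 4.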

Shift-reduce parsing

• Shift Sm : Push Sm on stack; increment input position
• Reduce A → β : Pop |β| symbols; push Goto[Sm-|β|,A] on stack

[Figure: parser reading the input a1 … ai … $ with a stack holding Sm, Sm-1, … and two tables:
Action[S,a] = {Shift, Reduce, Accept}
Goto[S,A] = S’]

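LR(0) parsing table

State | int | (   | )   | +   | -   | $   | E
1     | S3  | S4  |     |     |     |     | 2
2     |     |     |     |     |     | acc |
3     | R2  | R2  | R2  | R2  | R2  | R2  |
4     | S3  | S4  |     |     |     |     | 5
5     |     |     |     | S6  | S9  |     |
6     | S3  | S4  |     |     |     |     | 7
7     |     |     | S8  |     |     |     |
8     | R3  | R3  | R3  | R3  | R3  | R3  |
9     | S3  | S4  |     |     |     |     | 10
10    |     |     | S11 |     |     |     |
11    | R4  | R4  | R4  | R4  | R4  | R4  |
12    |     |     |     |     |     |     |

(The columns int to $ are the Action part; the column E is the Goto part.)

Example

Stack                                    | Input                   | Action
1                                        | ( int + ( int - int ))$ | s4
1 ( 4                                    | int + ( int - int ))$   | s3
1 ( 4 int 3                              | + ( int - int ))$       | r2
1 ( 4 E 5                                | + ( int - int ))$       | s6
1 ( 4 E 5 + 6                            | ( int - int ))$         | s4
1 ( 4 E 5 + 6 ( 4                        | int - int ))$           | s3
1 ( 4 E 5 + 6 ( 4 int 3                  | - int ))$               | r2
1 ( 4 E 5 + 6 ( 4 E 5                    | - int ))$               | s9
1 ( 4 E 5 + 6 ( 4 E 5 - 9                | int ))$                 | s3
1 ( 4 E 5 + 6 ( 4 E 5 - 9 int 3          | ))$                     | r2
1 ( 4 E 5 + 6 ( 4 E 5 - 9 E 10           | ))$                     | s11
1 ( 4 E 5 + 6 ( 4 E 5 - 9 E 10 ) 11      | )$                      | r4
1 ( 4 E 5 + 6 E 7                        | )$                      | s8
1 ( 4 E 5 + 6 E 7 ) 8                    | $                       | r3
1 E 2                                    | $                       | acc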
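The trace above can be reproduced with a small table-driven driver. The sketch below is an illustration, not the slides' own code: the dictionaries SHIFT, REDUCE, GOTO and PRODUCTIONS are hypothetical names that simply transcribe the table, and the stack holds states only (the grammar symbols shown in the trace are implied by the states).

```python
# Productions (2)-(4) of the example grammar: number -> (head, body length).
PRODUCTIONS = {2: ("E", 1), 3: ("E", 5), 4: ("E", 5)}

# Shift entries copied from the Action part of the table.
SHIFT = {
    (1, "int"): 3, (1, "("): 4,
    (4, "int"): 3, (4, "("): 4,
    (5, "+"): 6, (5, "-"): 9,
    (6, "int"): 3, (6, "("): 4,
    (7, ")"): 8,
    (9, "int"): 3, (9, "("): 4,
    (10, ")"): 11,
}
# States 3, 8 and 11 reduce regardless of the next token (LR(0)).
REDUCE = {3: 2, 8: 3, 11: 4}
GOTO = {(1, "E"): 2, (4, "E"): 5, (6, "E"): 7, (9, "E"): 10}

def parse(tokens):
    """Return the production numbers in reduction order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [1]
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        if state == 2 and tok == "$":        # Action[2, $] = acc
            return reductions
        if state in REDUCE:
            rule = REDUCE[state]
            head, length = PRODUCTIONS[rule]
            del stack[-length:]              # pop |beta| states
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(rule)
        elif (state, tok) in SHIFT:
            stack.append(SHIFT[(state, tok)])
            pos += 1
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")

trace = parse("( int + ( int - int ) )".split())
```

For the input of the example, trace lists the reductions in the same order as the r2, r2, r2, r4, r3 steps of the trace table.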

Conflicts in LR(0) table

• When the grammar is not LR(0), there are conflicts in the LR(0) parsing table
– shift-reduce conflicts: in some state, it is possible to perform both a shift and a reduce
– reduce-reduce conflicts: in some state, two or more reductions can be applied
• Example:
S → aB | Ac
A → ab
B → b

Example

E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε

[Figure: one transition diagram per non-terminal
E: T X
T: ( E ) or int Y
X: + E or ε
Y: * T or ε]
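LR(0) automaton

• Add ε-transitions to indicate possible parsing states
• Apply NFA to DFA conversion

[Figure: the transition diagrams for E, T, X and Y with their states numbered 1 to 15 and ε-edges added in front of each non-terminal edge]

Shift-reduce conflicts

• When do we apply ε reductions?

E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε

[Figure: LR(0) DFA for this grammar. Its reduce states are labeled E → T X, T → ( E ), T → int Y, X → + E, X → ε, Y → * T and Y → ε. The states that can reduce X → ε or Y → ε also have an outgoing shift edge on + or *, respectively]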
LR(k) grammars

• Recognize most programming language constructs
– LR(k) recognizes the body of a production in right-sentential form with k symbols of lookahead
• Determine when to apply reductions, A → β, given string δβa1…akw
– LL(k) recognizes the use of a production after seeing the first k symbols of what the body derives
• Determine when to apply productions, A → a1…akβ, given string wa1…akβδ
• LR(k) is a proper superset of LL(k)
• However, tables can get very large, so we usually use tricks to add some look-ahead to LR(0) automaton

SLR(1) parser

• When do we apply ε reductions?
• when look-ahead token is in FOLLOW set of non-terminal

E → T X
T → ( E ) | int Y
X → + E | ε
Y → * T | ε

FOLLOW(X) = { ), $ }
FOLLOW(Y) = { +, ), $ }

[Figure: the LR(0) DFA of the previous slide, with the reductions X → ε and Y → ε restricted to these look-ahead tokens]
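The two FOLLOW sets quoted above can be checked with the usual fixpoint computation. The sketch below is an illustration only: the names GRAMMAR, first_of, compute_first and compute_follow are hypothetical, and it assumes the grammar is augmented with a dummy production E’ → E $, so that $ is placed in FOLLOW(E).

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "X": [["+", "E"], [EPS]],
    "Y": [["*", "T"], [EPS]],
}

def first_of(symbols, first):
    """FIRST of a symbol string; contains EPS when the whole string can vanish."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                       # iterate until no set grows
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(start="E"):
    first = compute_first()
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")               # assumed dummy production E' -> E $
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:      # sym can end the body: inherit FOLLOW(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow()
```

Under these assumptions FOLLOW["X"] and FOLLOW["Y"] come out as the sets shown on the slide, and FOLLOW["T"] equals FOLLOW["Y"] because of the production Y → * T.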
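SLR(1) parser

• Generate Action table from automaton
– For each edge Si ==a==> Sj in LR(0) automaton, Action[Si,a] = shift Sj
– For each “reduce” node Si with reduction [A → β], Action[Si,a] = reduce A → β where a ∈ FOLLOW(A)
• Exception: If the node corresponds to the reduction S’ → S, Action[Si, $] = accept
– All other actions are error
– Conflict between actions ⇒ grammar not SLR
• Generate Goto table from automaton
– For each edge Si ==A==> Sj in LR(0) automaton, Goto[Si,A] = Sj

SLR(1) parsing example

Productions are numbered in grammar order: (1) E → T X, (2) T → ( E ), (3) T → int Y, (4) X → + E, (5) X → ε, (6) Y → * T, (7) Y → ε.

State | int | (   | )   | +   | *   | $   | E   | T   | X   | Y
1     | S9  | S6  |     |     |     |     | 13  | 2   |     |
2     |     |     | R5  | S4  |     | R5  |     |     | 3   |
3     |     |     | R1  |     |     | R1  |     |     |     |
4     | S9  | S6  |     |     |     |     | 5   | 2   |     |
5     |     |     | R4  |     |     | R4  |     |     |     |
6     | S9  | S6  |     |     |     |     | 7   | 2   |     |
7     |     |     | S8  |     |     |     |     |     |     |
8     |     |     | R2  | R2  |     | R2  |     |     |     |
9     |     |     | R7  | R7  | S11 | R7  |     |     |     | 10
10    |     |     | R3  | R3  |     | R3  |     |     |     |
11    | S9  | S6  |     |     |     |     |     | 12  |     |
12    |     |     | R6  | R6  |     | R6  |     |     |     |
13    |     |     |     |     |     | acc |     |     |     |

(The columns int to $ are the Action part; the columns E to Y are the Goto part. Reductions by T productions are entered for every token in FOLLOW(T) = { +, ), $ }.)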
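The Action-table rules above can be written down almost literally. The following is a minimal sketch under stated assumptions: the function name slr_actions and its argument layout are hypothetical, the accept exception is left out, and the sample data covers only states 2 and 9 of the example table.

```python
def slr_actions(shifts, reduces, follow):
    """Build Action entries from shift edges and reduce items.

    shifts:  {(state, terminal): target state}
    reduces: {state: [(rule number, head non-terminal)]}
    follow:  {non-terminal: set of terminals}
    """
    action = {}
    for (state, term), target in shifts.items():
        action[(state, term)] = ("shift", target)
    for state, rules in reduces.items():
        for rule, head in rules:
            # Reduce A -> beta only on tokens in FOLLOW(A).
            for term in follow[head]:
                if (state, term) in action:
                    raise ValueError(
                        f"conflict in state {state} on {term!r}: grammar is not SLR(1)")
                action[(state, term)] = ("reduce", rule)
    return action

follow = {"X": {")", "$"}, "Y": {"+", ")", "$"}}
shifts = {(2, "+"): 4, (9, "*"): 11}
reduces = {2: [(5, "X")], 9: [(7, "Y")]}
ACTION = slr_actions(shifts, reduces, follow)
```

Because + is not in FOLLOW(X) and * is not in FOLLOW(Y), the shift and the ε-reduction in states 2 and 9 land in different columns and no conflict is reported.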

Problem with SLR(1)

S’ → S $
S → L = R | R
L → *R | id
R → L

[Figure: transition diagrams for S’, S, L and R with ε-edges; the state reached after L, where both S → L . = R and R → L . apply, is marked “shift-reduce conflict”]

Constraints:
(1) FIRST($) ⊆ FOLLOW(S)
(2) FIRST(=) ⊆ FOLLOW(L)
(3) FOLLOW(R) ⊆ FOLLOW(L)
(4) FOLLOW(S) ⊆ FOLLOW(R)
(5) FOLLOW(S) ⊆ FOLLOW(R)
(6) FOLLOW(L) ⊆ FOLLOW(R)

Solutions:
FOLLOW(S) = { $ }
FOLLOW(R) = { =, $ }
FOLLOW(L) = { =, $ }

LR(0) automaton

S’ → S $
S → L = R | R
L → *R | id
R → L

[Figure: LR(0) DFA with the item sets
{ S’ → .S$, S → .L=R, S → .R, L → .*R, L → .id, R → .L }
{ S’ → S.$ }
{ S → L.=R, R → L. }
{ S → R. }
{ L → id. }
{ L → *.R, R → .L, L → .*R, L → .id }
{ L → *R. }
{ R → L. }
{ S → L=.R, R → .L, L → .*R, L → .id }
{ S → L=R. }]

FOLLOW(S) = { $ }
FOLLOW(R) = { =, $ }
FOLLOW(L) = { =, $ }
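The FOLLOW constraints (1) to (6) for this grammar form a system of set inclusions whose least solution can be found by iterating until nothing changes. The sketch below illustrates that idea; the function name solve and the way constraints are encoded as (small, big) pairs are assumptions made for the example.

```python
def solve(seeds, subset_constraints):
    """Least solution of a system of set inclusions.

    seeds: {name: initial set}
    subset_constraints: list of (small, big) pairs meaning small ⊆ big
    """
    sets = {name: set(vals) for name, vals in seeds.items()}
    for small, big in subset_constraints:
        sets.setdefault(small, set())
        sets.setdefault(big, set())
    changed = True
    while changed:
        changed = False
        for small, big in subset_constraints:
            if not sets[small] <= sets[big]:
                sets[big] |= sets[small]
                changed = True
    return sets

FOLLOW = solve(
    seeds={"S": {"$"}, "L": {"="}},   # constraints (1) and (2)
    subset_constraints=[
        ("R", "L"),                   # (3) FOLLOW(R) ⊆ FOLLOW(L)
        ("S", "R"),                   # (4), (5) FOLLOW(S) ⊆ FOLLOW(R)
        ("L", "R"),                   # (6) FOLLOW(L) ⊆ FOLLOW(R)
    ],
)
```

The result agrees with the listed solutions. Since = is in FOLLOW(R), an SLR(1) table would enter both a shift on = and the reduction R → L in the state containing S → L . = R and R → L ., which is the shift-reduce conflict marked in the figure.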
Finding look-aheads in LR(0) graph

[Figure: the LR(0) DFA for S’ → S $, S → L = R | R, L → *R | id, R → L, with four states numbered:
State 0: S’ → .S$, S → .L=R, S → .R, L → .*R, L → .id, R → .L
State 1: S → L.=R, R → L.
State 2: S → R.
State 3: S’ → S.$]

Easy to see that if we are in state 1 and reduction R → L. is valid, look-ahead symbol cannot be =.
Parser moves:
- Pop state 1 from stack
- Topmost state must be state 0
- Take R transition: push state 2
- Pop state 2 and take S transition to 3
- Only look-ahead symbol is $.
You can figure this out by tracing paths in graph ⇒ intuition behind LALR(1) grammars

LALR(1) grammars

• Generate LR(0) automaton.
• If there are no conflicts, done.
• Otherwise, figure out look-ahead sets for reductions by tracing paths backwards (and forwards) in the automaton graph.
• Doing this systematically:
– set up a system of constraints as shown in next slide
• reason is that these sets depend on each other as in the example
– look-ahead set for R → L. depended on look-ahead set for S → R. which in turn depended on look-ahead set for 0 → 3 transition
– solve them

Rules for LALR(1) constraints

[Figure: fragment of an automaton with states p, q and r illustrating the rules]

• Look-ahead(q,A) ⊆ Look-ahead(p, A → ω.)
• FIRST(γ) - ε ⊆ Look-ahead(q,A)
• If γ →* ε: Look-ahead(r, B → β A . γ) ⊆ Look-ahead(q,A)

Example

[Figure: the LR(0) DFA of the previous slide (states 0 to 3 numbered as before), with the look-ahead sets of its transitions and reductions named a to m]

Constraints:
FIRST($) ⊆ g
FIRST(=R) ⊆ h
a ⊆ h    j ⊆ d
         l ⊆ d
         h ⊆ e
b ⊆ i    k ⊆ e
i ⊆ a    m ⊆ e
g ⊆ b    k ⊆ f
c ⊆ j    h ⊆ f
d ⊆ k    d ⊆ m
f ⊆ l
g ⊆ c

Solutions:
g = b = c = j = i = a = m = {$}
f = l = d = k = e = h = {=,$}
LALR(1) grammars in practice

• yacc, CUP: parser-generators for LALR(1) grammars
• Most PL constructs can be expressed reasonably in an LALR(1) grammar
• That fact and the availability of tools like yacc and CUP have made LALR(1) grammars pretty much the default
• More powerful grammars like LR(1) are described in textbooks but rarely used in practice

Dealing with ambiguity

• Commonly, parser generators allow methods for dealing with ambiguous grammars
– Precedence and associativity rules for operators
– Implemented by generating suitable parser table
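How precedence and associativity rules settle a shift-reduce conflict can be sketched as a small decision function. This is an illustration of the convention used by yacc-style generators, not the code of any particular tool; the table PRECEDENCE and the function resolve are hypothetical, and the sample declarations assume that + is left-associative and that * is right-associative with higher priority.

```python
# operator -> (priority level, associativity); a higher level binds tighter.
PRECEDENCE = {"+": (1, "left"), "*": (2, "right")}

def resolve(rule_op, lookahead):
    """Choose between reducing E -> E rule_op E and shifting `lookahead`."""
    rule_level, rule_assoc = PRECEDENCE[rule_op]
    look_level, _ = PRECEDENCE[lookahead]
    if look_level > rule_level:
        return "shift"      # the incoming operator binds tighter
    if look_level < rule_level:
        return "reduce"     # the completed rule binds tighter
    # Same level: left-associative operators reduce, right-associative ones shift.
    return "reduce" if rule_assoc == "left" else "shift"

decisions = {(r, l): resolve(r, l) for r in "+*" for l in "+*"}
```

The parser generator applies such a decision once, when it builds the table, so each conflicting entry ends up holding a single action.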

Example

E → int | E*E | E+E    //* has higher priority

precedence left +
precedence right *

[Figure: LR(0) DFA with the states
State 0: E → .int, E → .E+E, E → .E*E
State 1: E → int.
State 2: E → E.+E, E → E.*E
State 3: E → E*.E, E → .int, E → .E+E, E → .E*E
State 4: E → E+.E, E → .int, E → .E+E, E → .E*E
State 5: E → E+E., E → E.+E, E → E.*E (shift-reduce conflict)
State 6: E → E*E., E → E.+E, E → E.*E (shift-reduce conflict)]

State | int | +   | *   | E
0     | s1  |     |     | g2
1     | r1  | r1  | r1  |
2     |     | s4  | s3  |
3     | s1  |     |     | g6
4     | s1  |     |     | g5
5     |     | r2  | s3  |
6     |     | r3  | s3  |

Grammar is ambiguous but ambiguity is resolved using operator priority and associativity

JFlex and CUP

• JFlex, a lexer generator for Java
• CUP, a parser generator for Java
• Both take specifications and generate Java code
JFlex

import java_cup.runtime.Symbol;
%%
%class Lexer
%cup
%{
private Symbol symbol(int sym) { return new Symbol(sym, yyline+1, yycolumn+1); }
private Symbol symbol(int sym, Object val) { return new Symbol(sym, val); }
%}
IntLiteral = 0 | [1-9][0-9]*
new_line = \r|\n|\r\n;
white_space = {new_line} | [ \t\f]
%%
{IntLiteral} { return symbol(sym.INT, new Integer(Integer.parseInt(yytext()))); }
"(" { return symbol(sym.LPAREN); }
")" { return symbol(sym.RPAREN); }
"+" { return symbol(sym.PLUS); }
...
{white_space} { /* ignore */ }
.|\n { error("Illegal character <"+ yytext()+">"); }

CUP

/* Terminals (tokens returned by lexer). */
terminal PLUS, MINUS, SLASH, STAR, QUESTION, COLON, LPAREN, RPAREN;
terminal Integer INT;

non terminal Integer Exp;

precedence left QUESTION;
precedence left PLUS, MINUS;
precedence left STAR, SLASH;

Exp ::= INT:i {: RESULT = i; :}
    | Exp:e1 PLUS Exp:e2 {: RESULT = e1 + e2; :}
    | Exp:e1 MINUS Exp:e2 {: RESULT = e1 - e2; :}
    | Exp:e1 SLASH Exp:e2 {: RESULT = e1 / e2; :}
    | Exp:e1 STAR Exp:e2 {: RESULT = e1 * e2; :}
    | Exp:e1 QUESTION Exp:e2 COLON Exp:e3 {: RESULT = e1 == 0 ? e3 : e2; :}
    | LPAREN Exp:e1 RPAREN {: RESULT = e1; :}
    ;

LR(1)

• Use k = 1 lookahead symbols to determine when to shift rather than reduce
– Reduce only when we have a matching lookahead
– The set of lookahead symbols for A is some subset of FOLLOW(A)
• Use LR(0) automaton to give intuition about LR(1)

LR(1) example

S’ → S $
S → L = R | R
L → *R | id
R → L

[Figure: transition diagrams for S’, S, L and R, with the non-terminal edges numbered (1) to (6) to match the constraints]

Constraints:
(1) FIRST($) ⊆ FOLLOW(S)
(2) FIRST(=) ⊆ FOLLOW(L)
(3) FOLLOW(R) ⊆ FOLLOW(L)
(4) FOLLOW(S) ⊆ FOLLOW(R)
(5) FOLLOW(S) ⊆ FOLLOW(R)
(6) FOLLOW(L) ⊆ FOLLOW(R)

Solutions:
FOLLOW(S) = { $ }
FOLLOW(R) = { =, $ }
FOLLOW(L) = { =, $ }
LR(1) example

[Figure: the same transition diagrams, FOLLOW constraints (1) to (6) and solutions as on the previous slide, with the non-terminal edges annotated by look-ahead sets ($ and =,$)]

LR(1) example

[Figure: the transition diagrams for S’ → S $ with every production annotated by its look-ahead set:
S → L = R, {$}
S → R, {$}
R → L, {$}
R → L, {=, $}
L → * R, {=, $}
L → * R, {=}
L → id, {=, $}
L → id, {=}
The ε-edges are labeled with the non-terminal and the look-ahead it carries: S$, R$, L=$, L=, R=$]

LR(1) example

[Figure: the transition diagrams for S’ → S $ with one copy of each production per look-ahead:
S → L = R, {$}
S → R, {$}
R → L, {$}
R → L, {=}
L → * R, {$}
L → * R, {=}
L → id, {$}
L → id, {=}
The ε-edges are labeled S$, R$, L$, L= and R=]

Recap

1. Use “context” of ε-moves to introduce states corresponding to the terminal(s) we expect to see after non-terminal
– State dependent FOLLOW
– Subset of FOLLOW
2. Propagate lookahead to reduction rules
3. Perform NFA to DFA conversion
LR(1) items

• Equivalence between LR(1) automaton and LR(1) item sets
• A LR(1) item is
– An LR(0) item augmented with a lookahead symbol (terminal), e.g., [A → . X Y Z, a]
– The item [A → X Y Z ., a] calls for a reduction only if the next input symbol is a

LR(1) items

• CLOSURE(I : item set)
– I ⊆ CLOSURE(I)
– If
• [A → α . B β, a] ∈ CLOSURE(I),
• [B → γ], and
• b ∈ FIRST(β a)
– Then
• [B → . γ, b] ∈ CLOSURE(I)
– “If we’re looking for B β and B → γ then we should also be looking for γ”
– FIRST(β a) is like FOLLOW(B) but takes the current production into account; the “a” term handles the case when β = ε; FIRST(a) ⊆ FOLLOW(A)
• GOTO(I, X : symbol)
– If [A → α . X β, a] ∈ I then [A → α X . β, a] ∈ GOTO(I,X)
– CLOSURE(GOTO(I, X)) ⊆ GOTO(I, X)
– “If we’re in state I and see symbol X, we are now in state I’”
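The LR(1) CLOSURE and GOTO rules differ from the LR(0) ones only in the lookahead component. The sketch below shows this for the grammar S → L = R | R, L → *R | id, R → L. It is an illustration under assumptions not stated in the slides: items are (head, body, dot, lookahead) tuples, the start item is taken to be [S’ → . S, $], and first_of relies on the fact that this particular grammar has no ε-productions and no left recursion.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}

def first_of(symbols):
    """FIRST of a non-empty symbol string (no ε-productions, no left recursion)."""
    sym = symbols[0]
    if sym not in GRAMMAR:
        return {sym}
    out = set()
    for body in GRAMMAR[sym]:
        out |= first_of(body)
    return out

def closure(items):
    """Close a set of (head, body, dot, lookahead) items under the LR(1) rule."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, look = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        b_sym = body[dot]
        # b ranges over FIRST(beta a), where beta is what follows B in the item.
        for b in first_of(body[dot + 1:] + (look,)):
            for gamma in GRAMMAR[b_sym]:
                item = (b_sym, gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, symbol):
    moved = {(h, b, d + 1, a) for (h, b, d, a) in items if d < len(b) and b[d] == symbol}
    return closure(moved)

start = closure({("S'", ("S",), 0, "$")})
after_L = goto(start, "L")
```

In after_L the item R → L . carries only the lookahead $, so the reduction is never attempted when the next token is =, and the shift-reduce conflict of the SLR(1) table does not arise.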

LR(1) parser
• Generate Action table from automaton
– For each edge Si ==a==> Sj in LR(1) automaton,
Action[Si,a] = shift Sj
– For each “reduce” node Si with reduction [A →
β, a], Action[Si,a] = reduce A → β
• Exception: If the node corresponds to the reduction
[S’ → S, $], then Action[Si, $] = accept
– All other actions are error
– If there is a conflict between actions, grammar
is not in LR(1)
• Goto table generated as in SLR(1)
