Lecture 9: Bottom-Up Parsing

This document discusses bottom-up parsing. Bottom-up parsing starts at the leaves of a parse tree and works up to the root by finding substrings called handles that match grammar productions. LR(1) grammars allow handles to be found efficiently through left-to-right scanning of the input string. An LR(1) parser uses shift-reduce operations guided by a parsing table to construct a rightmost derivation and parse tree without backtracking. The parser adds state information to the stack and handles shift-reduce conflicts and errors.

Uploaded by

Aayush Kaushal

Lecture 9: Bottom-Up Parsing

[Figure: compiler pipeline: Source code → Front-End (Lexical Analysis, Syntax Analysis) → IR → Back-End → Object code]
(from last lecture) Top-Down Parsing:
• Start at the root of the tree and grow towards leaves.
• Pick a production and try to match the input.
• We may need to backtrack if a bad choice is made.
• Some grammars are backtrack-free (predictive parsing).

Today’s lecture:
Bottom-Up parsing
18 Aug 2022 COMP36512 Lecture 9 1
Bottom-Up Parsing: What is it all about?
Goal: Given a grammar, G, construct a parse tree for a string (i.e.,
sentence) by starting at the leaves and working to the root (i.e., by
working from the input sentence back toward the start symbol S).
Recall: the point of parsing is to construct a derivation:
S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ sentence
To derive γi-1 from γi, we match some rhs β in γi, then replace β with its
corresponding lhs, A. This is called a reduction (it assumes A → β).
The parse tree is the result of the tokens and the reductions.
Example: Consider the grammar below and the input string abbcde.
1. Goal → aABe
2. A → Abc
3.   | b
4. B → d

Sentential Form   Production   Position
abbcde            3            2
a A bcde          2            4
a A de            4            3
aABe              1            4
Goal              -            -
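The reductions in the table above can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the names PRODUCTIONS, reduce_at and replay are hypothetical, and positions are taken to be 1-based, as in the table.

```python
# Grammar from the slide: production number -> (lhs, rhs as a list of symbols).
PRODUCTIONS = {
    1: ("Goal", ["a", "A", "B", "e"]),
    2: ("A", ["A", "b", "c"]),
    3: ("A", ["b"]),
    4: ("B", ["d"]),
}

def reduce_at(form, prod, pos):
    """Replace the rhs of production `prod`, whose rightmost symbol sits at
    1-based position `pos` in `form`, with the production's lhs."""
    lhs, rhs = PRODUCTIONS[prod]
    start = pos - len(rhs)          # 0-based index of the handle's left end
    if start < 0 or form[start:pos] != rhs:
        raise ValueError(f"no handle for production {prod} ending at {pos}")
    return form[:start] + [lhs] + form[pos:]

def replay(sentence, handles):
    """Apply the (production, position) pairs in order; return every form."""
    forms = [list(sentence)]
    for prod, pos in handles:
        forms.append(reduce_at(forms[-1], prod, pos))
    return forms

# The (production, position) pairs from the table, top to bottom.
HANDLES = [(3, 2), (2, 4), (4, 3), (1, 4)]
FORMS = replay("abbcde", HANDLES)

if __name__ == "__main__":
    for form in FORMS:
        print(" ".join(form))
```

Each call to reduce_at checks that the claimed handle really is at the claimed position, so a wrong (production, position) pair is reported rather than silently applied.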
18 Aug 2022 COMP36512 Lecture 9 2
Finding Reductions
• What are we trying to find?
– A substring β that matches the right-side of a production that occurs as one
step in the rightmost derivation. Informally, this substring is called a handle.
• Formally, a handle of a right-sentential form γ is a pair <A→β,k>
where A→β ∈ P and k is the position in γ of β’s rightmost symbol.
(right-sentential form: a sentential form that occurs in some rightmost derivation).
– Because γ is a right-sentential form, the substring to the right of a handle
contains only terminal symbols. Therefore, the parser doesn’t need to scan past
the handle.
– If a grammar is unambiguous, then every right-sentential form has a unique
handle (sketch of proof by definition: if unambiguous then rightmost
derivation is unique; then there is unique production at each step to produce a
sentential form; then there is a unique position at which the rule is applied;
hence, unique handle).
If we can find those handles, we can build a derivation!
18 Aug 2022 COMP36512 Lecture 9 3
Motivating Example
Given the grammar below, find a rightmost derivation for x – 2*y
(starting from Goal there is only one: the grammar is not ambiguous!).
In each step, identify the handle.
1. Goal → Expr
2. Expr → Expr + Term
3.   | Expr – Term
4.   | Term
5. Term → Term * Factor
6.   | Term / Factor
7.   | Factor
8. Factor → number
9.   | id

Production   Sentential Form   Handle
-            Goal              -
1            Expr              1,1
3            Expr – Term       3,3

Problem: given the sentence x – 2*y, find the handles!


18 Aug 2022 COMP36512 Lecture 9 4
A basic bottom-up parser
• The process of discovering a handle is called handle pruning.
• To construct a rightmost derivation, apply the simple algorithm:
    for i=n to 1, step -1
        find the handle <A→β,k>i in γi
        replace β with A to generate γi-1
(needs 2n steps, where n is the length of the derivation)
• One implementation is based on using a stack to hold grammar
symbols and an input buffer to hold the string to be parsed. Four
operations apply:
– shift: next input is shifted (pushed) onto the top of the stack
– reduce: right-end of the handle is on the top of the stack; locate
left-end of the handle within the stack; pop handle off stack and
push appropriate non-terminal left-hand-side symbol.
– accept: terminate parsing and signal success.
– error: call an error recovery routine.
18 Aug 2022 COMP36512 Lecture 9 5
Implementing a shift-reduce parser
push $ onto the stack
token = next_token()
repeat
    if the top of the stack is a handle A→β
        then /* reduce β to A */
            pop the symbols of β off the stack
            push A onto the stack
    elseif (token != eof) /* eof: end-of-file = end-of-input */
        then /* shift */
            push token
            token = next_token()
    else /* error */
        call error_handling()
until (top_of_stack == Goal && token == eof)
Errors show up: a) when we fail to find a handle, or b) when we hit EOF and
we need to shift. The parser needs to recognise syntax errors.
18 Aug 2022 COMP36512 Lecture 9 6
Example: x–2*y
Stack                    Input            Handle   Action
$                        id – num * id    None     Shift
$ id                     – num * id       9,1      Reduce 9
$ Factor                 – num * id       7,1      Reduce 7
$ Term                   – num * id       4,1      Reduce 4
$ Expr                   – num * id       None     Shift !!
$ Expr –                 num * id         None     Shift
$ Expr – num             * id             8,3      Reduce 8
$ Expr – Factor          * id             7,3      Reduce 7
$ Expr – Term            * id             None     Shift !!
$ Expr – Term *          id               None     Shift
$ Expr – Term * id                        9,5      Reduce 9
$ Expr – Term * Factor                    5,5      Reduce 5
$ Expr – Term                             3,3      Reduce 3
$ Expr                                    1,1      Reduce 1
$ Goal                                    none     Accept

– 1. Shift until top of stack is the right end of the handle
– 2. Find the left end of the handle and reduce
(5 shifts, 9 reduces, 1 accept)
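The trace above can be checked by replaying its Action column on a plain symbol stack. This is only a sketch under stated assumptions: replay and ACTIONS are hypothetical names, the token for production 8 is spelled num as in the trace (the grammar slide writes number), and the handle-finding decision is taken from the table rather than computed.

```python
# Expression grammar: production number -> (lhs, rhs).
PRODUCTIONS = {
    1: ("Goal", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),     # "number" in the grammar, "num" in the trace
    9: ("Factor", ["id"]),
}

def replay(tokens, actions):
    """Replay 'shift', 'accept' or a production number (reduce) on a stack."""
    stack, rest = ["$"], list(tokens)
    counts = {"shift": 0, "reduce": 0, "accept": 0}
    for act in actions:
        if act == "shift":
            stack.append(rest.pop(0))        # next input symbol goes on top
            counts["shift"] += 1
        elif act == "accept":
            if stack != ["$", "Goal"] or rest:
                raise ValueError("accept in a non-final configuration")
            counts["accept"] += 1
        else:
            lhs, rhs = PRODUCTIONS[act]
            if stack[-len(rhs):] != rhs:     # right end of handle must be on top
                raise ValueError(f"handle for production {act} not on top of stack")
            del stack[-len(rhs):]
            stack.append(lhs)
            counts["reduce"] += 1
    return stack, counts

# The Action column of the trace for x - 2*y, i.e. id - num * id.
ACTIONS = ["shift", 9, 7, 4, "shift", "shift", 8, 7,
           "shift", "shift", 9, 5, 3, 1, "accept"]
STACK, COUNTS = replay(["id", "-", "num", "*", "id"], ACTIONS)

if __name__ == "__main__":
    print(STACK, COUNTS)
```

Replacing either of the two steps marked !! with a reduction (for example, reducing by production 4 when Expr – Term is on the stack) still passes the local handle check but leads to a configuration from which accept is impossible, which is the problem the next slide discusses.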
18 Aug 2022 COMP36512 Lecture 9 7
What can go wrong?
(think about the steps with an exclamation mark in the previous slide)
• Shift/reduce conflicts: the parser cannot decide whether to
shift or to reduce.
Example: the dangling-else grammar; usually due to ambiguous
grammars.
Solution: a) modify the grammar; b) resolve in favour of a shift.
• Reduce/reduce conflicts: the parser cannot decide which of
several reductions to make.
Example: id(id,id); reduction is dependent on whether the
first id refers to array or function.
May be difficult to tackle.

Key to efficient bottom-up parsing: the handle-finding mechanism.


18 Aug 2022 COMP36512 Lecture 9 8
LR(1) grammars
(a beautiful example of applying theory to solve a complex problem in practice)
A grammar is LR(1) if, given a rightmost derivation, we can (I) isolate
the handle of each right-sentential form, and (II) determine the
production by which to reduce, by scanning the sentential form from
left-to-right, going at most 1 symbol beyond the right-end of the
handle.
• LR(1) grammars are widely used to construct (automatically) efficient
and flexible parsers:
– Virtually all context-free programming language constructs can be expressed in
an LR(1) form.
– LR grammars are the most general grammars parsable by a non-backtracking,
shift-reduce parser (deterministic CFGs).
– Parsers can be implemented in time proportional to tokens+reductions.
– LR parsers detect an error as soon as possible in a left-to-right scan of the input.

L stands for left-to-right scanning of the input; R for constructing a rightmost derivation in reverse; 1 for the
number of input symbols for lookahead.
18 Aug 2022 COMP36512 Lecture 9 9
LR Parsing: Background
• Read tokens from an input buffer (same as with shift-reduce
parsers)
• Add extra state information after each symbol in the
stack. The state summarises the information contained in
the stack below it. The stack would look like:
$ S0 Expr S1 - S2 num S3
• Use a table that consists of two parts:
– action[state_on_top_of_stack, input_symbol]: returns one of: shift
s (push a symbol and a state); reduce by a rule; accept; error.
– goto[state_on_top_of_stack,non_terminal_symbol]: returns a new
state to push onto the stack after a reduction.
18 Aug 2022 COMP36512 Lecture 9 10
Skeleton code for an LR Parser
push $ onto the stack
push s0
token = next_token()
repeat
    s = top_of_the_stack /* not pop! */
    if ACTION[s,token] == ‘reduce A→β’
        then pop 2*|β| entries off the stack /* a symbol and a state per symbol of β */
            s = top_of_the_stack /* not pop! */
            push A; push GOTO[s,A]
    elseif ACTION[s,token] == ‘shift sx’
        then push token; push sx
            token = next_token()
    elseif ACTION[s,token] == ‘accept’
        then break
    else report_error
end repeat
report_success

18 Aug 2022 COMP36512 Lecture 9 11


The Big Picture: Prelude to what follows
• LR(1) parsers are table-driven, shift-reduce parsers that
use a limited right context for handle recognition.
• They can be built by hand; perfect to automate too!
• Summary: Bottom-up parsing is more powerful!

[Figure: source code → Scanner → tokens → Table-driven Parser → I.R.; grammar → Parser Generator → Table, which drives the parser]
• The table encodes grammatical knowledge.
• It is used to determine the shift-reduce parsing decision.
Next: we will automate table construction!
Reading: Aho2 Section 4.5; Aho1 pp.195-202; Hunter pp.100-103;
Grune pp.150-152
18 Aug 2022 COMP36512 Lecture 9 12
Example
Consider the following grammar and tables:
1. Goal → CatNoise
2. CatNoise → CatNoise miau
3.   | miau

STATE   ACTION                  GOTO
        eof        miau         CatNoise
0       -          Shift 2      1
1       accept     Shift 3
2       Reduce 3   Reduce 3
3       Reduce 2   Reduce 2

Example 1: (input string miau)
Stack                       Input           Action
$ s0                        miau eof        Shift 2
$ s0 miau s2                eof             Reduce 3
$ s0 CatNoise s1            eof             Accept

Example 2: (input string miau miau)
Stack                       Input           Action
$ s0                        miau miau eof   Shift 2
$ s0 miau s2                miau eof        Reduce 3
$ s0 CatNoise s1            miau eof        Shift 3
$ s0 CatNoise s1 miau s3    eof             Reduce 2
$ s0 CatNoise s1            eof             accept

Note that there cannot be a syntax error with CatNoise, because it has
only 1 terminal symbol. “miau woof” is a lexical problem, not a syntax error!
eof is a convention for end-of-file (=end of input)
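The two traces above can be reproduced with a small table-driven driver. The sketch below is an illustration only: lr_parse and the dictionary encoding of ACTION and GOTO are assumptions, not the lecture's implementation. The stack alternates symbols and states, so a reduction pops two entries per rhs symbol.

```python
# CatNoise grammar and tables, transcribed from the slide.
PRODUCTIONS = {
    1: ("Goal", ["CatNoise"]),
    2: ("CatNoise", ["CatNoise", "miau"]),
    3: ("CatNoise", ["miau"]),
}
ACTION = {
    (0, "miau"): ("shift", 2),
    (1, "eof"): ("accept", None),
    (1, "miau"): ("shift", 3),
    (2, "eof"): ("reduce", 3),
    (2, "miau"): ("reduce", 3),
    (3, "eof"): ("reduce", 2),
    (3, "miau"): ("reduce", 2),
}
GOTO = {(0, "CatNoise"): 1}

def lr_parse(tokens):
    """Return the list of actions taken; raise SyntaxError on an empty table entry."""
    stack = ["$", 0]                      # symbols and states alternate
    stream = list(tokens) + ["eof"]
    pos = 0
    trace = []
    while True:
        state, token = stack[-1], stream[pos]
        kind, arg = ACTION.get((state, token), ("error", None))
        if kind == "shift":
            stack += [token, arg]         # push the token, then the new state
            pos += 1
            trace.append(f"Shift {arg}")
        elif kind == "reduce":
            lhs, rhs = PRODUCTIONS[arg]
            del stack[-2 * len(rhs):]     # one symbol and one state per rhs symbol
            exposed = stack[-1]           # state uncovered by the pop
            stack += [lhs, GOTO[(exposed, lhs)]]
            trace.append(f"Reduce {arg}")
        elif kind == "accept":
            trace.append("Accept")
            return trace
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")

if __name__ == "__main__":
    print(lr_parse(["miau"]))
    print(lr_parse(["miau", "miau"]))
```

With these tables the only empty ACTION entry is (0, eof), so the one input the driver rejects is the empty token sequence.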
18 Aug 2022 COMP36512 Lecture 9 13
Example: the expression grammar (slide 4)
1. Goal → Expr
2. Expr → Expr + Term
3.   | Expr – Term
4.   | Term
5. Term → Term * Factor
6.   | Term / Factor
7.   | Factor
8. Factor → number
9.   | id

STATE  ACTION                             GOTO
       eof  +    –    *    /    num  id   Expr Term Factor
0                                S4   S5   1    2    3
1      Acc  S6   S7
2      R4   R4   R4   S8   S9
3      R7   R7   R7   R7   R7
4      R8   R8   R8   R8   R8
5      R9   R9   R9   R9   R9
6                                S4   S5        10   3
7                                S4   S5        11   3
8                                S4   S5             12
9                                S4   S5             13
10     R2   R2   R2   S8   S9
11     R3   R3   R3   S8   S9
12     R5   R5   R5   R5   R5
13     R6   R6   R6   R6   R6

Apply the algorithm in slide 11 (the LR parser skeleton) to the expression x-2*y


The result is the rightmost derivation (as in Lect.8, slide 7), but …
…no conflicts now: state information makes it fully deterministic!
18 Aug 2022 COMP36512 Lecture 9 14
Summary
• Top-Down Recursive Descent: Pros: Fast, Good locality, Simple,
good error-handling. Cons: Hand-coded, high-maintenance.
• LR(1): Pros: Fast, deterministic languages, automatable. Cons:
large working sets, poor error messages.
• What is left to study?
– Checking for context-sensitive properties
– Laying out the abstractions for programs & procedures.
– Generating code for the target machine.
– Generating good code for the target machine.
• Reading: Aho2 Sections 4.7, 4.10; Aho1 pp.215-220 & 230-236;
Cooper 3.4, 3.5; Grune pp.165-170; Hunter 5.1-5.5 (too general).

18 Aug 2022 COMP36512 Lecture 9 15


LR(1) – Table Generation

18 Aug 2022 COMP36512 Lecture 9 16


LR Parsers: How do they work?
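• Key: language of handles is regular
– build a handle-recognising DFA
– Action and Goto tables encode the DFA
[Figure: handle-recognising DFA for the CatNoise grammar: state 0 goes to state 1 on CatNoise and to state 2 on miau; state 1 goes to state 3 on miau; states 2 and 3 carry the reduce actions]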
• How do we generate the Action and Goto tables?
– Use the grammar to build a model of the DFA
– Use the model to build Action and Goto tables
– If construction succeeds, the grammar is LR(1).
• Three commonly used algorithms to build tables:
– LR(1): full set of LR(1) grammars; large tables; slow, large construction.
– SLR(1): smallest class of grammars; smallest tables; simple, fast construction.
– LALR(1): intermediate sized set of grammars; smallest tables; very common.
(Space used to be an obsession; now it is only a concern)

18 Aug 2022 COMP36512 Lecture 9 17


LR(1) Items
• An LR(1) item is a pair [A,B], where:
– A is a production α→βγδ with a • at some position in the rhs.
– B is a lookahead symbol.
• The • indicates the position of the top of the stack:
– [α→βγ•δ,a]: the input seen so far (i.e., what is in the stack) is
consistent with the use of α→βγδ, and the parser has recognised βγ.
– [α→βγδ•,a]: the parser has seen βγδ, and a lookahead symbol of a
is consistent with reducing to α.
• The production α→βγδ with lookahead a, generates:
– [α→•βγδ,a], [α→β•γδ,a], [α→βγ•δ,a], [α→βγδ•,a]
• The set of LR(1) items is finite.
– Sets of LR(1) items represent LR(1) parser states.
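One possible way to represent LR(1) items in code is a tuple of production, dot position and lookahead. The sketch below is illustrative only; Item and items_of are hypothetical names, and the dot is stored as the number of rhs symbols already recognised.

```python
from typing import NamedTuple, Tuple

class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int          # number of rhs symbols already on the stack
    lookahead: str

    def __str__(self):
        syms = list(self.rhs)
        syms.insert(self.dot, "•")       # show the dot at its position in the rhs
        return f"[{self.lhs} → {' '.join(syms)}, {self.lookahead}]"

def items_of(lhs, rhs, lookahead):
    """All LR(1) items of one production for one lookahead symbol."""
    return [Item(lhs, tuple(rhs), dot, lookahead) for dot in range(len(rhs) + 1)]

# A production with n rhs symbols yields n + 1 items per lookahead symbol.
ITEMS = items_of("CatNoise", ["CatNoise", "miau"], "eof")

if __name__ == "__main__":
    for item in ITEMS:
        print(item)
```

Because each production has finitely many dot positions and the grammar has finitely many terminals to use as lookahead, the total number of items is finite, which is what makes the state construction terminate.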
18 Aug 2022 COMP36512 Lecture 9 18
The Table Construction Algorithm
• Table construction:
– 1. Build the canonical collection of sets of LR(1) items, S:
• I) Begin in S0 with [Goal→•α, eof] and find all equivalent items
as closure(S0).
• II) Repeatedly compute, for each Sk and each symbol x (both
terminal and non-terminal), goto(Sk,x). If the set is not in the
collection add it. This eventually reaches a fixed point.
– 2. Fill in the table from the collection of sets of LR(1) items.
• The canonical collection completely encodes the transition
diagram for the handle-finding DFA.
• The lookahead is the key in choosing an action:
Remember Expr-Term from Lecture 8 slide 7, when we chose to shift rather than reduce to Expr?
18 Aug 2022 COMP36512 Lecture 9 19
Closure(state)
Closure(s) // s is the state
    while (s is still changing)
        for each item [α→β•γδ,a] in s
            for each production γ→τ
                for each terminal b in FIRST(δa)
                    if [γ→•τ,b] is not in s, then add it.

Recall (Lecture 7, Slide 7): FIRST(A) is defined as the set of terminal
symbols that appear as the first symbol in strings derived from A.
E.g.: FIRST(Goal) = FIRST(CatNoise) = FIRST(miau) = miau
Example: (using the CatNoise Grammar) S0: {[Goal→•CatNoise,eof],
[CatNoise→•CatNoise miau, eof], [CatNoise→•miau, eof],
[CatNoise→•CatNoise miau, miau], [CatNoise→•miau, miau]}
(the 1st item by definition; 2nd,3rd are derived from the 1st; 4th,5th are derived from the 2nd)
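The closure computation for the CatNoise grammar can be sketched as follows. This is a worklist version of the loop above, not the lecture's code: items are encoded as (lhs, rhs, dot, lookahead) tuples, the names first_sets, first_of and closure are hypothetical, and FIRST is computed under the assumption that no production has an empty right-hand side.

```python
GRAMMAR = [
    ("Goal", ("CatNoise",)),
    ("CatNoise", ("CatNoise", "miau")),
    ("CatNoise", ("miau",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """FIRST for each non-terminal; assumes no production has an empty rhs."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def first_of(symbols):
    """FIRST of a non-empty symbol string (epsilon-free grammar)."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    """Add [C -> . tau, b] for every item with the dot before non-terminal C."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue                      # dot at the end or before a terminal
        for b in first_of(rhs[dot + 1:] + (la,)):   # FIRST(delta a)
            for plhs, prhs in GRAMMAR:
                if plhs == rhs[dot]:
                    item = (plhs, prhs, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

S0 = closure({("Goal", ("CatNoise",), 0, "eof")})

if __name__ == "__main__":
    for item in sorted(S0):
        print(item)
```

Starting from the single item for Goal with lookahead eof, the result contains the five items listed for S0 in the example.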

18 Aug 2022 COMP36512 Lecture 9 20


Goto(s,x)
Goto(s,x)
    new=∅
    for each item [α→β•xδ,a] in s
        add [α→βx•δ,a] to new
    return closure(new)

Computes the state that the parser would reach if it recognised an x while
in state s.

Example:
S1 (x=CatNoise): [Goal→CatNoise•,eof], [CatNoise→CatNoise•miau, eof],
[CatNoise→CatNoise•miau, miau]
S2 (x=miau): [CatNoise→miau•, eof], [CatNoise→miau•, miau]
S3 (from S1): [CatNoise→CatNoise miau•, eof], [CatNoise→CatNoise miau•, miau]
18 Aug 2022 COMP36512 Lecture 9 21
Example (slide 1 of 4)
Simplified expression grammar:
Goal → Expr
Expr → Term - Expr
Expr → Term
Term → Factor * Term
Term → Factor
Factor → id

FIRST(Goal)=FIRST(Expr)=FIRST(Term)=FIRST(Factor)=FIRST(id)=id
FIRST(-)=-
FIRST(*)=*

18 Aug 2022 COMP36512 Lecture 9 22


Example: first step (slide 2 of 4)
• S0: closure({[Goal→•Expr,eof]})
{[Goal→•Expr,eof], [Expr→•Term-Expr,eof],
[Expr→•Term,eof], [Term→•Factor*Term,eof],
[Term→•Factor*Term,-], [Term→•Factor,eof],
[Term→•Factor,-], [Factor→•id, eof], [Factor→•id,-],
[Factor→•id,*]}
• Next states:
– Iteration 1:
• S1: goto(S0,Expr), S2: goto(S0,Term), S3: goto(S0, Factor), S4:
goto(S0, id)
– Iteration 2:
• S5: goto(S2,-), S6: goto(S3,*)
– Iteration 3:
• S7: goto(S5, Expr), S8: goto(S6, Term)
18 Aug 2022 COMP36512 Lecture 9 23
Example: the states (slide 3 of 4)
S1: {[Goal→Expr•,eof]}
S2: {[Expr→Term•-Expr,eof], [Expr→Term•,eof]}
S3: {[Term→Factor•*Term,eof], [Term→Factor•*Term,-],
[Term→Factor•,eof], [Term→Factor•,-]}
S4: {[Factor→id•,eof], [Factor→id•,-], [Factor→id•,*]}
S5: {[Expr→Term-•Expr,eof], [Expr→•Term-Expr,eof], [Expr→•Term,eof],
[Term→•Factor*Term,eof], [Term→•Factor*Term,-],
[Term→•Factor,eof], [Term→•Factor,-], [Factor→•id,eof],
[Factor→•id,-], [Factor→•id,*]}
S6: {[Term→Factor*•Term,eof], [Term→Factor*•Term,-],
[Term→•Factor*Term,eof], [Term→•Factor*Term,-],
[Term→•Factor,eof], [Term→•Factor,-], [Factor→•id,eof],
[Factor→•id,-], [Factor→•id,*]}
S7: {[Expr→Term-Expr•,eof]}
S8: {[Term→Factor*Term•,eof], [Term→Factor*Term•,-]}
18 Aug 2022 COMP36512 Lecture 9 24
Table Construction
• 1. Construct the collection of sets of LR(1) items.
• 2. State i of the parser is constructed from set Si of the collection.
– If [A→α•aβ,b] in state i (a is a terminal), and goto(i,a)=j, then set
action[i,a] to “shift j”.
– If [A→α•,a] in state i, then set action[i,a] to “reduce
A→α”.
– If [Goal→A•,eof] in state i, then set action[i,eof] to
“accept”.
– If goto(i,A)=j for a non-terminal A, then set goto[i,A] to j.
• 3. All other entries in action and goto are set to “error”.
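The three construction steps can be sketched end to end for the simplified expression grammar of the running example. This is an illustration under assumptions, not the lecture's code: all function names are hypothetical, FIRST is hard-coded from the example slide, items are (lhs, rhs, dot, lookahead) tuples, and states are numbered in order of discovery with symbols tried in the order given by SYMBOLS.

```python
GRAMMAR = [                      # index + 1 is the production number
    ("Goal", ("Expr",)),
    ("Expr", ("Term", "-", "Expr")),
    ("Expr", ("Term",)),
    ("Term", ("Factor", "*", "Term")),
    ("Term", ("Factor",)),
    ("Factor", ("id",)),
]
SYMBOLS = ["Expr", "Term", "Factor", "id", "-", "*"]   # order fixes state numbering
# FIRST sets as given on the example slide; its keys are the non-terminals.
FIRST = {"Goal": {"id"}, "Expr": {"id"}, "Term": {"id"}, "Factor": {"id"}}

def first_of(symbols):
    head = symbols[0]
    return FIRST.get(head, {head})       # a terminal is its own FIRST set

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in FIRST:
            for b in first_of(rhs[dot + 1:] + (la,)):
                for plhs, prhs in GRAMMAR:
                    item = (plhs, prhs, 0, b)
                    if plhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(state, x):
    moved = {(l, r, d + 1, a) for l, r, d, a in state if d < len(r) and r[d] == x}
    return closure(moved) if moved else None

def build_tables():
    states = [closure({("Goal", ("Expr",), 0, "eof")})]
    action, goto_table = {}, {}

    def set_action(key, value):
        # An entry defined twice with different values means a conflict.
        if action.setdefault(key, value) != value:
            raise ValueError(f"conflict at {key}: grammar is not LR(1)")

    i = 0
    while i < len(states):       # the list grows until a fixed point is reached
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if x in FIRST:
                goto_table[(i, x)] = j
            else:
                set_action((i, x), ("shift", j))
        for lhs, rhs, dot, la in states[i]:
            if dot == len(rhs):
                if lhs == "Goal":
                    set_action((i, la), ("accept", None))
                else:
                    set_action((i, la), ("reduce", GRAMMAR.index((lhs, rhs)) + 1))
        i += 1
    return states, action, goto_table    # missing entries mean "error"

STATES, ACTION, GOTO = build_tables()

if __name__ == "__main__":
    for key in sorted(ACTION):
        print(key, ACTION[key])
```

With this symbol order the discovery order gives nine states, S0 to S8, numbered as in the example, so the entries can be compared directly with the table for this grammar. The set_action helper raises an error if an ACTION entry would be defined twice with different values.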
18 Aug 2022 COMP36512 Lecture 9 25
Example: The Table (slide 4 of 4)
1. Goal → Expr
2. Expr → Term - Expr
3. Expr → Term
4. Term → Factor * Term
5. Term → Factor
6. Factor → id

STATE  ACTION                  GOTO
       id   -    *    eof     Expr Term Factor
0      S4                     1    2    3
1                     Accept
2           S5        R3
3           R5   S6   R5
4           R6   R6   R6
5      S4                     7    2    3
6      S4                          8    3
7                     R2
8           R4        R4

18 Aug 2022 COMP36512 Lecture 9 26


Further remarks
• If the algorithm defines an entry more than once in the
ACTION table, then the grammar is not LR(1).
• Other table construction algorithms, such as LALR(1)
or SLR(1), produce smaller tables, but at the cost of
handling a smaller class of grammars.
• yacc can be used to convert a context-free grammar
into a set of tables using LALR(1) (see % man yacc )
• In practice: “…the compiler-writer does not really want to
concern himself with how parsing is done. So long as the parse is
done correctly, …, he can live with almost any reliable
technique…” [J.J.Horning from “Compiler Construction: An
Advanced Course”, Springer-Verlag, 1976]
18 Aug 2022 COMP36512 Lecture 9 27