slides08-lr-parsing

The document discusses Bottom-Up LR Parsing, contrasting it with Top-Down parsing methods. It explains the mechanics of LR parsing, including the use of LR items and the handling of shift-reduce conflicts, particularly in the context of if-then-else statements. Additionally, it touches on error reporting and recovery strategies, as well as other parsing tools like GLR and PEG parsers.

Bottom-Up LR Parsing

17-363/17-663: Programming Language Pragmatics

Reading: PLP section 2.3

Copyright © 2016 Elsevier


Prof. Jonathan Aldrich
Top-Down vs. Bottom-Up Parsing

• Top-Down/LL Parsing Intuition

  program              Start trying to parse a program
  stmt_list $$         Based on lookahead, refine to stmt_list,
  stmt stmt_list $$    then to stmt stmt_list
  ...                  Stack tracks predicted future parsing

• Bottom-Up/LR Parsing Intuition

  read A               Start by shifting a few tokens
  stmt                 Reduce tokens to a stmt, then to a stmt_list
  stmt_list            Continue to shift and reduce tokens
  stmt_list read B     to recognize another stmt
  stmt_list stmt       Stack shows what constructs have been recognized so far
Example Program and SLR(1) Grammar

read A
read B
sum := A + B
write sum
write sum / 2
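
The grammar itself appears as a figure on this slide. For reference, here is a reconstruction based on PLP's bottom-up calculator-language grammar (section 2.3); treat it as an approximation of the slide, not a copy:

program → stmt_list $$
stmt_list → stmt_list stmt | stmt
stmt → id := expr | read id | write expr
expr → term | expr add_op term
term → factor | term mult_op factor
factor → ( expr ) | id | number
add_op → + | -
mult_op → * | /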
Modeling a Parse with LR Items

• Initial parse state captured by an item

– includes start symbol, production, and current location

• What we see next might be inside stmt_list


– So we expand stmt_list and get a set of items:
Modeling a Parse with LR Items

• We can likewise expand stmt to get the item set:

• This is an SLR parser state


– We’ll call it state 0
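
The item set itself is a figure on the slide; assuming the calculator grammar reconstructed above, state 0 would contain roughly:

program → . stmt_list $$
stmt_list → . stmt_list stmt
stmt_list → . stmt
stmt → . id := expr
stmt → . read id
stmt → . write expr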
Modeling a Parse with LR Items

• Our starting stack has state 0 on it:


0
• Input: read A read B …

• From state 0, we shift read onto the stack and
  move to state 1:
0 read 1

• State 1 represents the following item:
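
The item is shown as a figure on the slide; assuming the grammar above, it would be stmt → read . id, with the dot marking how far into the production we have shifted.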


Modeling a Parse with LR Items

• stack / item: 0 read 1


• input: A read B …

• From state 1, we shift id onto the stack


• stack / item: 0 read 1 id 1’
• input: read B …

• Now we reduce to stmt, and put stmt into the input


• stack / item: 0
• input: stmt read B …
Modeling a Parse with LR Items

• stack / item: 0
• input: stmt read B …

• We now shift stmt


• stack / item: 0 stmt 0’
• input: read B …

• Next we reduce to stmt_list


• stack / item: 0
• input: stmt_list read B …
Modeling a Parse with LR Items

• stack / item: 0
• input: stmt_list read B …

• Now we shift stmt_list


• stack / item: 0 stmt_list 2
• input: read B …
The Characteristic Finite State Machine (CFSM)

There are also shift-reduce actions, so our states 0’ and 1’ aren’t shown
here: they are “in between” states within a shift-reduce action
The CFSM as a Table
A Detailed Explanation of the CFSM
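
The table and its explanation are diagrams on the slides. To make the mechanics concrete, here is a minimal Python sketch of the table-driven loop an LR parser runs; the action/goto tables and production encoding are placeholders standing in for the CFSM, not the actual tables from the slides.

def lr_parse(tokens, action, goto_table, productions):
    """Minimal LR driver sketch (illustrative only).

    action[(state, terminal)] is ('shift', next_state), ('reduce', prod_index),
    or ('accept',); goto_table[(state, nonterminal)] gives the state to enter
    after a reduce; productions[prod_index] is (lhs_nonterminal, rhs_length).
    """
    stack = [0]                        # stack of CFSM states; 0 is the start state
    tokens = list(tokens) + ["$$"]     # append the end-of-input marker
    i = 0
    while True:
        state, tok = stack[-1], tokens[i]
        act = action.get((state, tok))
        if act is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        if act[0] == "shift":
            stack.append(act[1])       # consume the token and enter the new state
            i += 1
        elif act[0] == "reduce":
            lhs, rhs_len = productions[act[1]]
            del stack[len(stack) - rhs_len:]            # pop one state per RHS symbol
            stack.append(goto_table[(stack[-1], lhs)])  # then follow the GOTO edge
        else:                          # ('accept',)
            return True

The "shift-reduce" actions mentioned above correspond to a shift immediately followed by a reduce collapsed into a single table entry, which is why the in-between states 0’ and 1’ never appear explicitly.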
Exercise: LR Parsing

• Assume you are in parsing state 0
  and the token stream is write sum / 2
• Show how the parse stack changes as the token
stream is consumed
• We’ll do the first action together
Parsing if-then-else Statements

• A famous parsing challenge (from Algol) involves if-then-else,
  where the else is optional:

  stmt ::= if exp then stmt
         | if exp then stmt else stmt

• Consider the phrase:

if exp then if exp then stmt else stmt

• Which then does the else belong to?


Shift/Reduce Conflicts

• This is a shift-reduce conflict

  if exp then if exp then stmt . else stmt

• When the else appears
  • we can shift, treating it as part of the inner if statement, or
  • we can reduce the inner if statement,
    treating the else as part of the outer if statement
• How to solve?
  – Many existing tools prioritize shift over reduce
    • This corresponds to the traditional solution to the if problem
  – You can declare productions with precedence
    • E.g. giving the if-then-else production higher precedence
      than the if-then production
  – Rewrite the grammar to make it LR(1)
An LR(0) If-Then-Else Grammar
stmt → balanced_stmt | unbalanced_stmt
balanced_stmt → if cond then balanced_stmt else balanced_stmt
              | other_stuff
unbalanced_stmt → if cond then stmt
                | if cond then balanced_stmt else unbalanced_stmt

Invariant: balanced_stmts may appear inside unbalanced_stmts
– but not vice versa
Unfortunately this grammar is LR(0) but not LL(1)
– Have to use precedence in LL parsers,
  or custom code in a recursive-descent parser (sketched below)
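
As an illustration of that custom code, here is a sketch (in Python, with invented token and tree shapes) of a hand-written recursive-descent parser that attaches each else to the nearest unmatched then, which is exactly what shifting instead of reducing does:

def parse_stmt(toks):
    # toks is a mutable list of tokens; conditions and "other" statements
    # are simplified to single tokens for this sketch.
    if toks and toks[0] == "if":
        toks.pop(0)                       # consume 'if'
        cond = toks.pop(0)
        assert toks.pop(0) == "then"
        then_branch = parse_stmt(toks)
        else_branch = None
        if toks and toks[0] == "else":    # greedily claim the else here,
            toks.pop(0)                   # i.e. for the innermost open if
            else_branch = parse_stmt(toks)
        return ("if", cond, then_branch, else_branch)
    return ("other", toks.pop(0))

For example, parse_stmt(["if", "c1", "then", "if", "c2", "then", "s1", "else", "s2"]) returns ('if', 'c1', ('if', 'c2', ('other', 's1'), ('other', 's2')), None): the else ends up attached to the inner if.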
Connections to Theory
• A scanner is a Deterministic Finite Automaton (DFA)
– it can be specified with a state diagram

• An LL or LR parser is a Pushdown Automaton (PDA)


– a PDA can be specified with a state diagram and a stack
• the state diagram looks just like a DFA state diagram, except the arcs
are labeled with <input symbol, top-of-stack symbol> pairs, and in
addition to moving to a new state the PDA has the option of pushing
or popping a finite number of symbols onto/off the stack
• For LL(1) parsers the state machine has only two states:
processing and accepted
• All the action is in the input symbol and top of stack
• LR(1) parsers are richer (and more expressive)
Error Reporting
• Error reporting is relatively simple
• If the next token has no entry for the current parsing state (LR)
  or top-of-stack element (LL), signal an error
• You can tell the user which tokens would be OK here (see the sketch below)
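
With the driver sketched earlier, that "which tokens would be OK" list can be read straight off the action table; the helper below is hypothetical, not part of any particular tool.

def expected_tokens(action, state):
    # Every terminal with an ACTION entry in this state is acceptable here.
    return sorted(tok for (s, tok) in action if s == state)

# In the driver's error branch, one might then report:
#   raise SyntaxError(f"unexpected {tok!r} in state {state}; "
#                     f"expected one of {expected_tokens(action, state)}")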
Error Recovery
• Nice to report more than one error to the user
• Rather than stopping after the first one
• Simple idea: Panic mode
• In C-like languages, semicolons are good recovery spots
• So on an error:
• read tokens until you get to a semicolon
• discard the parser’s stack (predictions in an LL parser, states in an LR
parser) until you come to a production that has a semicolon
• assume you’ve parsed the semicolon-containing construct,
and continue parsing
• There are ways to do substantially better – see the online
supplement to the textbook
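
A minimal sketch of the panic-mode idea above, phrased against the LR driver sketched earlier (the helper and its name are hypothetical; real generators such as yacc/bison expose an error token for this instead):

def panic_recover(stack, tokens, i, goto_table):
    # 1. Discard input up to and including the next semicolon.
    while i < len(tokens) and tokens[i] != ";":
        i += 1
    if i < len(tokens):
        i += 1                            # skip the semicolon itself
    # 2. Pop states until one of them can continue after a completed stmt.
    while stack and (stack[-1], "stmt") not in goto_table:
        stack.pop()
    if not stack:
        raise SyntaxError("unable to recover from syntax error")
    # 3. Pretend the erroneous text was a complete stmt and resume parsing.
    stack.append(goto_table[(stack[-1], "stmt")])
    return i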
Other Parsing Tools
• Generalized LR (GLR) parser generators
• Accept any grammar – even ambiguous ones!
• This can be good if you have grammars written by nonexperts, as in
SASyLF
• But for a compiler-writer it is dangerous—you may not even know
your grammar is ambiguous, and then your poor users get ambiguity
errors when the parser runs
• Works like an LR parser, but on ambiguity considers all
possible parses in parallel
• Still O(n) if the grammar is LR (or “close”)
Other Parsing Tools
• Parsing Expression Grammar (PEG) parser generators
• Sidestep ambiguity by always favoring the first production
• Same danger as GLR parsers – you may not know your
grammar is ambiguous
• Still used in practice (e.g. Python's own parser since version 3.9)
• About as efficient as LL or LR in practice
• Like LR, PEG grammars can be cleaner than LL grammars
• Requires extreme care to get right – must think algorithmically
instead of declaratively
• Guido van Rossum, the developer of Python, saw this as an advantage
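
The "always favor the first production" rule is PEG's ordered choice. Here is a minimal sketch of its semantics as a Python parser combinator (the combinator name and interface are invented, not taken from any particular PEG library):

def ordered_choice(*alternatives):
    """Try each alternative parser in order and commit to the first success.
    Later alternatives are never reconsidered once one succeeds, so the
    grammar can never be ambiguous -- but rule order silently matters."""
    def parse(tokens, pos):
        for alt in alternatives:
            result = alt(tokens, pos)      # each alt returns (value, new_pos) or None
            if result is not None:
                return result              # commit to this alternative
        return None
    return parse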
