05 Parsingbottomup PDF

This document discusses parsing and bottom-up parsing techniques. It introduces LR(0), SLR(1), and LALR(1) parsing and explains that they are typically implemented using tool-supported parsing tables. It provides an example grammar and walks through the bottom-up construction of a parse tree for an input based on that grammar. It also explains how bottom-up reduction of a parse tree implicitly performs a right-most derivation in reverse order.

Uploaded by

Arwa Khallouf
INF5110 – Compiler Construction

Parsing

Spring 2016

Outline

1. Parsing
Bottom-up parsing
Bibs

Bottom-up parsing: intro

"R" stands for right-most derivation.

LR(0)
• only for very simple grammars
• approx. 300 states for standard programming languages
• only as intro to SLR(1) and LALR(1)
SLR(1)
• expressive enough for most grammars for standard PLs
• same number of states as LR(0)
• main focus here
LALR(1)
• slightly more expressive than SLR(1)
• same number of states as LR(0)
• we look at ideas behind that method as well
LR(1)
• covers all grammars which can in principle be parsed by looking at the next token
Grammar classes overview (again)

[Figure: diagram of grammar classes, split into unambiguous and ambiguous grammars; labelled regions: LL(k), LL(1), LL(0), LR(k), LR(1), LALR(1), SLR, LR(0).]
LR-parsing and its subclasses
• right-most derivation (but left-to-right parsing)
• in general: bottom-up parsing more powerful than top-down
• typically: tool-supported (unlike recursive descent, which may well be hand-coded)
• based on parsing tables + explicit stack
• thankfully: left-recursion no longer problematic
• typical tools: yacc and its descendants (like bison, CUP, etc.)
• another name: shift-reduce parser

[Figure: LR parsing table, with rows indexed by states and columns by tokens + non-terms.]
Example grammar

S′ → S
S → A B t7 ∣ . . .
A → t4 t5 ∣ t1 B ∣ . . .
B → t2 t3 ∣ A t6 ∣ . . .

• assume: grammar unambiguous
• assume word of terminals t1 t2 . . . t7 and its (unique) parse-tree
• general agreement for bottom-up parsing:
  • start symbol never on the right-hand side of a production
  • routinely add another “extra” start-symbol (here S′)¹

¹ That will later be used when constructing a DFA for “scanning” the stack, to control the reactions of the stack machine. This restriction leads to a unique, well-defined initial state.
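Parse tree for t1 . . . t7

[Figure: parse tree with root S′ over the leaves t1 t2 t3 t4 t5 t6 t7; the left A covers t1 and a B (over t2 t3), the right B covers an A (over t4 t5) and t6.]

Remember: parse tree independent from left- or right-most derivation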
LR: left-to-right scan, right-most derivation?
Potentially puzzling question at first sight:
How can the parser do a right-most derivation, when it parses from left-to-right?

• short answer: parser builds the parse tree bottom-up
• derivation:
  • replacement of nonterminals by right-hand sides
  • derivation: builds (implicitly) a parse-tree top-down
• sentential form: word from Σ∗ derivable from start-symbol

Right-sentential form: right-most derivation

S ⇒∗r α

Slightly longer answer

LR parser parses from left-to-right and builds the parse tree bottom-up. When doing the parse, the parser (implicitly) builds a right-most derivation in reverse (because of bottom-up).
Example expression grammar (from before)

exp → exp addop term ∣ term    (1)
addop → + ∣ −
term → term mulop term ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

[Figure: parse tree for number ∗ number, with inner nodes exp, term, term, factor, factor over the leaves number ∗ number.]
Bottom-up parse: Growing the parse tree

[Figure: the parse tree for number ∗ number (nodes exp, term, term, factor, factor), grown bottom-up step by step.]

number ∗ number ↪ factor ∗ number
                ↪ term ∗ number
                ↪ term ∗ factor
                ↪ term
                ↪ exp
Reduction in reverse = right derivation

Reduction                   Right derivation
n ∗ n ↪ factor ∗ n          n ∗ n ⇐r factor ∗ n
      ↪ term ∗ n                  ⇐r term ∗ n
      ↪ term ∗ factor             ⇐r term ∗ factor
      ↪ term                      ⇐r term
      ↪ exp                       ⇐r exp

• underlined part:
  • different in reduction vs. derivation
  • represents the “part being replaced”
    • for derivation: right-most non-terminal
    • for reduction: indicates the so-called handle
• note: all intermediate words are right-sentential forms
Handle

Definition (Handle)
Assume S ⇒∗r αAw ⇒r αβw. A production A → β at position k following α is a handle of αβw. We write ⟨A → β, k⟩ for such a handle.
Note:
• w (right of a handle) contains only terminals
• w: corresponds to the future input still to be parsed!
• αβ will correspond to the stack content.
• the ⇒r-derivation-step in reverse:
  • one reduce-step in the LR-parser-machine
  • adding (implicitly in the LR-machine) a new parent to children β (= bottom-up!)
• “handle” β can be empty (= ε)
Schematic picture of parser machine (again)

[Figure: parser machine with a finite control (states q0, q1, q2, q3, . . . , qn), a reading “head” that moves left-to-right over the token stream “. . . if 1 + 2 ∗ ( 3 + 4 ) . . .”, and unbounded extra memory (stack).]
General LR “parser machine” configuration

• Stack:
  • contains: terminals + non-terminals (+ $)
  • containing: what has been read already but not yet “processed”
• position on the “tape” (= token stream)
  • represented here as word of terminals not yet read
  • end of “rest of token stream”: $, as usual
• state of the machine
  • in the following schematic illustrations: not yet part of the discussion
  • later: part of the parser table, currently we explain without referring to the state of the parser-engine
• currently we assume: tree and rest of the input given
• the trick will be: how to achieve the same without that tree already given (just parsing left-to-right)
Schematic run (reduction: from top to bottom)

stack          input
$              t1 t2 t3 t4 t5 t6 t7 $
$ t1           t2 t3 t4 t5 t6 t7 $
$ t1 t2        t3 t4 t5 t6 t7 $
$ t1 t2 t3     t4 t5 t6 t7 $
$ t1 B         t4 t5 t6 t7 $
$ A            t4 t5 t6 t7 $
$ A t4         t5 t6 t7 $
$ A t4 t5      t6 t7 $
$ A A          t6 t7 $
$ A A t6       t7 $
$ A B          t7 $
$ A B t7       $
$ S            $
$ S′           $
2 basic steps: shift and reduce

• parser reads input and uses stack as intermediate storage
• so far: no mention of look-ahead (i.e., action depending on the value of the next token(s)), but that may play a role, as well

Shift
Move the next input symbol (terminal) over to the top of the stack (“push”).

Reduce
Remove the symbols of the right-most subtree from the stack and replace them by the non-terminal at the root of the subtree (replace = “pop + push”).

• easy to do if one has the parse tree already!
Example: LR parsing for addition (given the tree)

E′ → E
E → E + n ∣ n

[Figure: parse tree for n + n with root E′ and two nested E nodes.]

     parse stack    input      action
1    $              n + n $    shift
2    $ n            + n $      reduce E → n
3    $ E            + n $      shift
4    $ E +          n $        shift
5    $ E + n        $          reduce E → E + n
6    $ E            $          reduce E′ → E
7    $ E′           $          accept

note: line 3 vs line 6! Both contain E on top of stack

(right) derivation: reduce-steps “in reverse”
E′ ⇒ E ⇒ E + n ⇒ n + n
Example with ε-transitions: parentheses

S′ → S
S → ( S ) S ∣ ε
side remark: unlike previous grammar, here:
• production with two non-terminals on the right-hand side
⇒ difference between left-most and right-most derivations (and mixed ones)
Parentheses: tree, run, and right-most derivation

[Figure: parse tree for ( ) with root S′; S has the children (, S, ), S, and both inner S nodes derive ε.]

     parse stack    input    action
1    $              ( ) $    shift
2    $ (            ) $      reduce S → ε
3    $ ( S          ) $      shift
4    $ ( S )        $        reduce S → ε
5    $ ( S ) S      $        reduce S → ( S ) S
6    $ S            $        reduce S′ → S
7    $ S′           $        accept

Note: the 2 reduction steps for the ε-productions

Right-most derivation and right-sentential forms
S′ ⇒r S ⇒r ( S ) S ⇒r ( S ) ⇒r ( )
Right-sentential forms & the stack
• sentential form: word from Σ∗ derivable from start-symbol

Right-sentential form: right-most derivation

S ⇒∗r α

• right-sentential forms:
  • part of the “run”
  • but: split between stack and input

     parse stack    input      action
1    $              n + n $    shift
2    $ n            + n $      reduce E → n
3    $ E            + n $      shift
4    $ E +          n $        shift
5    $ E + n        $          reduce E → E + n
6    $ E            $          reduce E′ → E
7    $ E′           $          accept

E′ ⇒r E ⇒r E + n ⇒r n + n
n + n ↪ E + n ↪ E ↪ E′

With ∥ marking the split between stack and input:
E′ ⇒r E ⇒r E + n ∥ ∼ E + ∥ n ∼ E ∥ + n ⇒r n ∥ + n ∼ ∥ n + n
Viable prefixes of right-sentential forms and handles

• right-sentential form: E + n
• viable prefixes of RSF
• prefixes of that RSF on the stack
• here: 3 viable prefixes of that RSF: E , E +, E + n
• handle: remember the definition earlier
• here: for instance in the sentential form n + n
• handle is production E → n on the left occurrence of n in n + n
(let’s write n 1 + n 2 for now)
• note: in the stack machine:
• the left n 1 on the stack
• rest + n 2 on the input (unread, because of LR(0))
• if the parser engine detects handle n 1 on the stack, it does a
reduce-step
• However (later): depends on current “state” of the parser
engine

A typical situation during LR-parsing

General design for an LR-engine
• some ingredients clarified up to now:
  • bottom-up tree building as reverse right-most derivation,
  • stack vs. input,
  • shift and reduce steps
• however, one ingredient missing: next step of the engine may depend on
  • top of the stack (“handle”)
  • look ahead on the input (but not for LR(0))
  • and: current state of the machine (same stack-content, but different reactions at different stages of the parse)

General idea:
Construct an NFA (and ultimately DFA) which works on the stack (not the input). The alphabet consists of terminals and non-terminals ΣT ∪ ΣN. The language

Stacks(G) = {α ∣ α may occur on the stack during LR-parsing of a sentence in L(G)}

is regular!
LR(0) parsing as easy pre-stage
• LR(0): in practice too simple, but easy conceptual step
towards LR(1), SLR(1) etc.
• LR(1): in practice good enough, LR(k) not used for k > 1

LR(0) item
production with specific “parser position” . in its right-hand side

• . is, of course, a “meta-symbol” (not part of the production)

• For instance: production A → α, where α = βγ, then

LR(0) item
A → β.γ

• item with dot at the beginning: initial item


• item with dot at the end: complete item
Example: items of LR-grammar
Grammar for parentheses: 3 productions
S′ → S
S → ( S ) S ∣ ε

8 items
S′ → .S
S′ → S.
S → .( S ) S
S → ( .S ) S
S → ( S. ) S
S → ( S ) .S
S → ( S ) S.
S → .

• note: S → ε gives S → . as item (not S → ε. and S → .ε)
• side remark: see later, it will turn out: grammar not LR(0)
Another example: items for addition grammar
Grammar for addition: 3 productions
E′ → E
E → E + number ∣ number

(coincidentally also:) 8 items


E′ → .E
E′ → E.
E → .E + number
E → E . + number
E → E + .number
E → E + number.
E → .number
E → number.

• also here: it will turn out: not LR(0) grammar
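
As a small illustration (a Python sketch, not part of the slides): the items of a grammar can be enumerated by placing the dot at every position of every right-hand side. The dictionary encoding of the grammars and the names `lr0_items` and `show` are assumptions made for this example.

```python
def lr0_items(grammar):
    """Enumerate all LR(0) items as (lhs, rhs, dot_position) triples."""
    items = []
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            # A right-hand side of length n has n + 1 dot positions;
            # the empty right-hand side (epsilon) yields the single item A -> .
            for dot in range(len(rhs) + 1):
                items.append((lhs, tuple(rhs), dot))
    return items


def show(item):
    """Render an item in the dotted notation used on the slides."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot])}.{' '.join(rhs[dot:])}"


# Hypothetical encoding: non-terminal -> list of right-hand sides.
PARENTHESES = {"S'": [["S"]], "S": [["(", "S", ")", "S"], []]}
ADDITION = {"E'": [["E"]], "E": [["E", "+", "number"], ["number"]]}

if __name__ == "__main__":
    for item in lr0_items(PARENTHESES):
        print(show(item))
```

Both example grammars give 8 items under this encoding, matching the counts on the slides.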


Finite automata of items
• general set-up: items as states in an automaton
• automaton: “operates” not on the input, but the stack
• automaton either
• first NFA, afterwards made deterministic (subset construction),
or
• directly DFA

States formed of sets of items


In a state marked by/containing item

A → β.γ

• β on the stack
• γ: to be treated next (terminals on the input, but can contain
also non-terminals)
State transitions of the NFA

• X ∈ Σ
• two kinds of transitions

Terminal or non-terminal:
A → α.Xη  --X-->  A → αX.η

Epsilon (X: non-terminal here):
A → α.Xη  --ε-->  X → .β

• In case X = terminal (i.e. token):
  • the X-labelled step corresponds to a shift step²
• for non-terminals (see next slide):
  • interpretation more complex: non-terminals are officially never on the input
  • note: in that case, item A → α.Xη has two (kinds of) outgoing transitions

² We have explained shift steps so far as: parser eats one terminal (= input token) and pushes it on the stack.
Transitions for non-terminals and ε
• so far: we never pushed a non-terminal from the input to the stack, we replace in a reduce-step the right-hand side by a left-hand side
• however: the replacement in a reduce step can be seen as
  1. pop right-hand side off the stack,
  2. instead, “assume” corresponding non-terminal on input &
  3. eat the non-terminal and push it on the stack.
• two kinds of transitions
  1. the ε-transition corresponds to the “pop” half
  2. the X-transition (for non-terminals) corresponds to the “eat-and-push” part
• assume production X → β and initial item X → .β

Terminal or non-terminal:
A → α.Xη  --X-->  A → αX.η

Epsilon (X: non-terminal here), given production X → β:
A → α.Xη  --ε-->  X → .β
Initial and final states
initial states:
• we make our lives easier
• we assume (as said): one extra start symbol say S ′
(augmented grammar)
⇒ initial item S ′ → .S as (only) initial state

final states:
• NFA has a specific task, scanning the stack, not scanning the
input
• acceptance condition of the overall machine a bit more
complex
• input must be empty
• stack must be empty except the (new) start symbol
• NFA has a word to say about acceptance
• but not in form of being in an accepting state
• so: no accepting states
• but: accepting action (see later)
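NFA: parentheses

[Figure: NFA of LR(0) items for the parentheses grammar; start state S′ → .S. Transitions:]

S′ → .S        --S-->  S′ → S.
S′ → .S        --ε-->  S → .( S ) S
S′ → .S        --ε-->  S → .
S → .( S ) S   --(-->  S → ( .S ) S
S → ( .S ) S   --S-->  S → ( S. ) S
S → ( .S ) S   --ε-->  S → .( S ) S
S → ( .S ) S   --ε-->  S → .
S → ( S. ) S   --)-->  S → ( S ) .S
S → ( S ) .S   --S-->  S → ( S ) S.
S → ( S ) .S   --ε-->  S → .( S ) S
S → ( S ) .S   --ε-->  S → .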
Remarks on the NFA

• colors for illustration
  • “reddish”: complete items
  • “blueish”: init-item (less important)
  • “violet’tish”: both
• init-items
  • one per production of the grammar
  • that’s where the ε-transitions go into, but
  • with exception of the initial state (with S′-production)
• no outgoing edges from the complete items
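NFA: addition

[Figure: NFA of LR(0) items for the addition grammar; start state E′ → .E. Transitions:]

E′ → .E       --E-->  E′ → E.
E′ → .E       --ε-->  E → .E + n
E′ → .E       --ε-->  E → .n
E → .E + n    --E-->  E → E. + n
E → .E + n    --ε-->  E → .E + n
E → .E + n    --ε-->  E → .n
E → E. + n    --+-->  E → E + .n
E → E + .n    --n-->  E → E + n.
E → .n        --n-->  E → n.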
Determinizing: from NFA to DFA

• standard subset-construction³
• states then contain sets of items
• especially important: ε-closure
• also: direct construction of the DFA possible

³ Technically, we don’t require here a total transition function; we leave out any error state.
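DFA: parentheses

[Figure: LR(0)-DFA for the parentheses grammar; start state 0.]

state 0: S′ → .S,  S → .( S ) S,  S → .
state 1: S′ → S.
state 2: S → ( .S ) S,  S → .( S ) S,  S → .
state 3: S → ( S. ) S
state 4: S → ( S ) .S,  S → .( S ) S,  S → .
state 5: S → ( S ) S.

transitions: 0 --S--> 1, 0 --(--> 2, 2 --(--> 2, 2 --S--> 3, 3 --)--> 4, 4 --(--> 2, 4 --S--> 5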
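DFA: addition

[Figure: LR(0)-DFA for the addition grammar; start state 0.]

state 0: E′ → .E,  E → .E + n,  E → .n
state 1: E′ → E.,  E → E. + n
state 2: E → n.
state 3: E → E + .n
state 4: E → E + n.

transitions: 0 --E--> 1, 0 --n--> 2, 1 --+--> 3, 3 --n--> 4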
Direct construction of an LR(0)-DFA

• quite easy: simply build in the closure already

ε-closure
• if A → α.Bγ is an item in a state
• and there are productions B → β1 ∣ β2 . . .
• add items B → .β1, B → .β2 . . . to the state
• continue that process, until saturation

initial state
start: S′ → .S plus closure
Direct DFA construction: transitions

pre-state: { . . . , A1 → α1.Xβ1, . . . , A2 → α2.Xβ2, . . . }
   --X-->
post-state: { A1 → α1X.β1, A2 → α2X.β2 } plus closure

• X: terminal or non-terminal, both treated uniformly
• for every item of the form A → α.Xβ in the pre-state, the item A → αX.β must be included in the post-state
• all other items of the pre-state (indicated by ". . .") are not included
• re-check the previous examples: outcome is the same
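
The direct construction can be sketched in Python as follows. This is a minimal sketch under assumptions, not the slides' own formulation: items are encoded as (lhs, rhs, dot) triples, grammars as dictionaries, and the function names `closure`, `goto` and `build_dfa` are chosen for this example. The state numbering depends on the exploration order and need not coincide with the numbering in the figures.

```python
from collections import deque


def closure(items, grammar):
    """Saturate a set of items: for A -> alpha.B gamma add all B -> .beta."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:  # non-terminal after the dot
            for beta in grammar[rhs[dot]]:
                item = (rhs[dot], tuple(beta), 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol, grammar):
    """Post-state for symbol X: move the dot over X, then close."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved, grammar)


def build_dfa(grammar, start):
    """Return (list of states, edges) with edges[(state_no, symbol)] = state_no."""
    initial = closure({(start, tuple(grammar[start][0]), 0)}, grammar)
    states, index, edges = [initial], {initial: 0}, {}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        # terminals and non-terminals are treated uniformly
        for x in sorted({r[d] for (_, r, d) in s if d < len(r)}):
            t = goto(s, x, grammar)
            if t not in index:
                index[t] = len(states)
                states.append(t)
                queue.append(t)
            edges[(index[s], x)] = index[t]
    return states, edges


PARENTHESES = {"S'": [["S"]], "S": [["(", "S", ")", "S"], []]}
ADDITION = {"E'": [["E"]], "E": [["E", "+", "n"], ["n"]]}

if __name__ == "__main__":
    for name, g, start in [("parentheses", PARENTHESES, "S'"), ("addition", ADDITION, "E'")]:
        states, edges = build_dfa(g, start)
        print(name, len(states), "states,", len(edges), "transitions")
```

For the two example grammars this yields 6 and 5 states, the same sizes as the DFAs for parentheses and addition in the earlier examples.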
How does the DFA do the shift/reduce and the rest?

• we have seen: bottom-up parse tree generation


• we have seen: shift-reduce and the stack vs. input
• we have seen: the construction of the DFA

But: how does it hang together?


We need to interpret the “set-of-item-states” in the light of the
stack content and figure out the reaction in terms of
• transitions in the automaton
• stack manipulations (shift/reduce)
• acceptance
• input (apart from shifting) not relevant when doing LR(0)

and the reaction better be uniquely determined . . . .

Stack contents and state of the automaton

• remember: at any given intermediate configuration of stack/input in a run
1. stack contains words from Σ∗
2. DFA operates deterministically on such words
• the stack contains the “past”: read input (and potentially
partially reduced)
• when feeding that “past” on the stack into the automaton
• starting with the oldest symbol (not in a LIFO manner)
• starting with the DFA’s initial state
⇒ stack content determines state of the DFA
• actually: each prefix also determines uniquely a state
• top state:
• state after the complete stack content
• corresponds to the current state of the stack-machine
⇒ crucial when determining reaction
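
To make this concrete, here is a small Python sketch (an illustration, not from the slides) that feeds a stack content into the addition DFA, oldest symbol first. The edge table is transcribed from the DFA for the addition grammar; the names `EDGES`, `states_along` and `top_state` are assumptions of this example.

```python
# Transitions of the LR(0)-DFA for E' -> E, E -> E + n | n (states 0..4).
EDGES = {(0, "E"): 1, (0, "n"): 2, (1, "+"): 3, (3, "n"): 4}


def states_along(stack_symbols, edges=EDGES, initial=0):
    """Run the DFA over the stack content, starting with the oldest symbol.

    Returns one state per prefix of the stack (including the empty prefix).
    """
    states = [initial]
    for symbol in stack_symbols:
        states.append(edges[(states[-1], symbol)])
    return states


def top_state(stack_symbols):
    """State reached after the complete stack content."""
    return states_along(stack_symbols)[-1]


if __name__ == "__main__":
    print(states_along(["E", "+", "n"]))  # one state per prefix
    print(top_state(["E"]))
```

Storing these per-prefix states on the stack next to the symbols avoids re-running the automaton after every step.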

State transition allowing a shift

• assume: top-state (= current state) contains item X → α.aβ
• construction thus has transition as follows

state s: { . . . , X → α.aβ, . . . }  --a-->  state t: { . . . , X → αa.β, . . . }

• shift is possible;
• if shift is the correct operation and a is terminal symbol corresponding to the current token: state afterwards = t
State transition: analogous for non-terminals

state s: { . . . , X → α.Bβ, . . . }  --B-->  state t: { . . . , X → αB.β, . . . }

• same as before, now with non-terminal B
• note: we never read non-term from input
• not officially called a shift
• corresponds to the reaction following a reduce step, it’s not the reduce step itself
• think of it as follows: reduce and subsequent step
  • not as: replace on top of the stack the handle (right-hand side) by non-term B,
  • but instead as:
    1. pop off the handle from the top of the stack
    2. put the non-term B “back onto the input” (corresponding to the above state s)
    3. eat the B and shift it to the stack
• later: a goto reaction in the parse table
State (not transition) where a reduce is possible
• remember: complete items (those with a dot . at the end)
• assume top state s containing complete item A → γ.

state s: { . . . , A → γ. }

• a complete right-hand side (“handle”) γ on the stack and thus done
• may be replaced by left-hand side A
⇒ reduce step
• builds up (implicitly) new parent node A in the bottom-up procedure
• Note: A on top of the stack instead of γ:⁴
  • new top state!
  • remember the “goto-transition” (shift of a non-terminal)

⁴ Indirectly only: as said, we remove the handle from the stack, and pretend as if the A is next on the input, and thus we “shift” it on top of the stack, doing the corresponding A-transition.
Remarks: states, transitions, and reduce steps
• ignoring the ε-transitions (for the NFA)
• there are 2 “kinds” of transitions in the DFA
  1. terminals: real shifts
  2. non-terminals: “following a reduce step”

No edges to represent (all of) a reduce step!

• if a reduce happens, parser engine changes state!
• however: this state change is not represented by a transition in the DFA (or NFA for that matter)
• especially not by outgoing arrows of completed items

• if the handle is removed from top of the stack: ⇒
  • “go back to the (top) state before that handle had been added”: no edge for that
• later: stack notation simply remembers the state as part of its configuration
Example: LR parsing for addition (given the tree)

E′ → E
E → E + n ∣ n

     parse stack    input      action
1    $              n + n $    shift
2    $ n            + n $      reduce E → n
3    $ E            + n $      shift
4    $ E +          n $        shift
5    $ E + n        $          reduce E → E + n
6    $ E            $          reduce E′ → E
7    $ E′           $          accept

note: line 3 vs line 6! Both contain E on top of stack
DFA of addition example

[Figure: LR(0)-DFA for the addition grammar with states 0 to 4, as on the slide “DFA: addition”.]

• note line 3 vs. line 6
• both stacks = E ⇒ same (top) state in the DFA (state 1)
LR(0) grammars

LR(0) grammar
The top-state alone determines the next step.

• especially: no shift/reduce conflicts in the form shown


• thus: the number-grammar is not LR(0)

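Simple parentheses

A → ( A ) ∣ a

[Figure: LR(0)-DFA for the simple parentheses grammar; start state 0.]

state 0: A′ → .A,  A → .( A ),  A → .a
state 1: A′ → A.
state 2: A → a.
state 3: A → ( .A ),  A → .( A ),  A → .a
state 4: A → ( A. )
state 5: A → ( A ).

transitions: 0 --A--> 1, 0 --a--> 2, 0 --(--> 3, 3 --(--> 3, 3 --a--> 2, 3 --A--> 4, 4 --)--> 5

• for shift:
  • many shift transitions in one state allowed
  • shift counts as one action (including “shifts” on non-terms)
• but for reduction: also the production must be clear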
Simple parentheses is LR(0)

[Figure: LR(0)-DFA for the simple parentheses grammar with states 0 to 5, as on the previous slide.]

state   possible action
0       only shift
1       only red: (with A′ → A)
2       only red: (with A → a)
3       only shift
4       only shift
5       only red: (with A → ( A ))
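NFA for simple parentheses (bonus slide)

[Figure: NFA of LR(0) items for the simple parentheses grammar; start state A′ → .A. Transitions:]

A′ → .A      --A-->  A′ → A.
A′ → .A      --ε-->  A → .( A )
A′ → .A      --ε-->  A → .a
A → .( A )   --(-->  A → ( .A )
A → ( .A )   --A-->  A → ( A. )
A → ( .A )   --ε-->  A → .( A )
A → ( .A )   --ε-->  A → .a
A → ( A. )   --)-->  A → ( A ).
A → .a       --a-->  A → a.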
Parsing table for an LR(0) grammar
• table structure slightly different for SLR(1), LALR(1), and LR(1) (see later)
• note: the “goto” part: “shift” on non-terminals (only one non-terminal here)
• corresponding to the A-labelled transitions
• see the parser run on the next slide

state   action   rule          input          goto
                               (    a    )    A
0       shift                  3    2         1
1       reduce   A′ → A
2       reduce   A → a
3       shift                  3    2         4
4       shift                            5
5       reduce   A → ( A )
Parsing of ( ( a ) )

stage   parsing stack        input          action
1       $0                   ( ( a ) ) $    shift
2       $0 (3                ( a ) ) $      shift
3       $0 (3 (3             a ) ) $        shift
4       $0 (3 (3 a2          ) ) $          reduce A → a
5       $0 (3 (3 A4          ) ) $          shift
6       $0 (3 (3 A4 )5       ) $            reduce A → ( A )
7       $0 (3 A4             ) $            shift
8       $0 (3 A4 )5          $              reduce A → ( A )
9       $0 A1                $              accept

• note: stack on the left
  • contains top state information
  • in particular: overall top state on the right-most end
• note also: accept action
  • reduce wrt. A′ → A and
  • empty stack (apart from $, A, and the state annotation)
  ⇒ accept
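
The run above can be replayed with a small table-driven driver. The following Python sketch is an illustration under assumptions: the three dictionaries are a hand transcription of the LR(0) table for A → ( A ) ∣ a, the stack holds only states (the symbols are implied by them), and the names `ACTION`, `SHIFT`, `GOTO` and `parse` are chosen for this example.

```python
# Per-state action: either "shift" or ("reduce", lhs, length of right-hand side).
ACTION = {
    0: "shift",
    1: ("reduce", "A'", 1),  # A' -> A
    2: ("reduce", "A", 1),   # A  -> a
    3: "shift",
    4: "shift",
    5: ("reduce", "A", 3),   # A  -> ( A )
}
SHIFT = {(0, "("): 3, (0, "a"): 2, (3, "("): 3, (3, "a"): 2, (4, ")"): 5}
GOTO = {(0, "A"): 1, (3, "A"): 4}


def parse(tokens):
    """Return True if the token sequence is accepted, False on error."""
    stack = [0]  # stack of states
    pos = 0
    while True:
        action = ACTION[stack[-1]]  # LR(0): the top state alone decides
        if action == "shift":
            if pos == len(tokens) or (stack[-1], tokens[pos]) not in SHIFT:
                return False  # empty table slot: error
            stack.append(SHIFT[(stack[-1], tokens[pos])])
            pos += 1
        else:
            _, lhs, length = action
            if lhs == "A'":
                return pos == len(tokens)  # accept only if the input is used up
            del stack[-length:]                        # pop the handle
            stack.append(GOTO[(stack[-1], lhs)])       # "shift" the non-terminal


if __name__ == "__main__":
    for word in ["((a))", "a", "((a)", "()"]:
        print(repr(word), parse(list(word)))
```

The two failing inputs in the demo correspond to typical error situations: running out of input in a shift state, and meeting a token with an empty table slot.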
Parse tree of the parse

[Figure: parse tree for ( ( a ) ) with root A′ and nested A nodes.]

• As said:
  • the reduction “contains” the parse-tree
  • reduction: builds it bottom up
  • reduction in reverse: contains a right-most derivation (which is “top-down”)
• accept action: corresponds to the parent-child edge A′ → A of the tree
Parsing of erroneous input
• empty slots in the table: “errors”

stage   parsing stack        input        action
1       $0                   ( ( a ) $    shift
2       $0 (3                ( a ) $      shift
3       $0 (3 (3             a ) $        shift
4       $0 (3 (3 a2          ) $          reduce A → a
5       $0 (3 (3 A4          ) $          shift
6       $0 (3 (3 A4 )5       $            reduce A → ( A )
7       $0 (3 A4             $            ????

stage   parsing stack        input        action
1       $0                   ( ) $        shift
2       $0 (3                ) $          ?????

Invariant
important general invariant for LR-parsing: never shift something “illegal” onto the stack
LR(0) parsing algo, given DFA

let s be the current state, on top of the parse stack

1. s contains A → α.Xβ, where X is a terminal
  • shift X from input to top of stack. The new state pushed on the stack: state t where s --X--> t
  • else: if s does not have such a transition: error
2. s contains a complete item (say A → γ.): reduce by rule A → γ:
  • a reduction by S′ → S: accept if input is empty, else error
  • else:
    pop: remove γ (including “its” states) from the stack
    back up: assume to be in state u which is now head state
    push: push A to the stack, new head state t where u --A--> t
LR(0) parsing algo remarks

• in [Louden, 1997]: slightly differently formulated
• instead of requiring (in the first case):
  • push state t where s --X--> t or similar, book formulates
  • push state containing item A → αX.β
• analogous in the second case
• algo (= deterministic) only if LR(0) grammar
  • in particular: cannot have states with complete item and item of form A → α.Xβ (otherwise shift-reduce conflict)
  • cannot have states with two different complete items (known as reduce-reduce conflict)
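
The two kinds of LR(0) conflicts can be checked mechanically on the item sets. The sketch below is an illustration in Python, with items encoded as (lhs, rhs, dot) triples; the state sets are a hand transcription of the addition DFA, and the name `lr0_conflicts` is an assumption of this example.

```python
def lr0_conflicts(states, terminals):
    """Map state number -> list of conflict kinds found in that state."""
    conflicts = {}
    for number, items in states.items():
        complete = [(l, r) for (l, r, d) in items if d == len(r)]
        shifts = [(l, r) for (l, r, d) in items if d < len(r) and r[d] in terminals]
        kinds = []
        if complete and shifts:
            kinds.append("shift/reduce")   # complete item next to A -> alpha.X beta
        if len(complete) > 1:
            kinds.append("reduce/reduce")  # two different complete items
        if kinds:
            conflicts[number] = kinds
    return conflicts


# States of the LR(0)-DFA for E' -> E, E -> E + n | n.
ADDITION_STATES = {
    0: [("E'", ("E",), 0), ("E", ("E", "+", "n"), 0), ("E", ("n",), 0)],
    1: [("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)],
    2: [("E", ("n",), 1)],
    3: [("E", ("E", "+", "n"), 2)],
    4: [("E", ("E", "+", "n"), 3)],
}

if __name__ == "__main__":
    print(lr0_conflicts(ADDITION_STATES, {"+", "n"}))
```

For the addition grammar the check reports a shift/reduce conflict in state 1 only, which is why that grammar is not LR(0).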
DFA parentheses again: LR(0)?

S′ → S
S → ( S ) S ∣ ε

[Figure: LR(0)-DFA for the parentheses grammar with states 0 to 5, as on the slide “DFA: parentheses”.]

Look at states 0, 2, and 4: each contains the complete item S → . together with the item S → .( S ) S, i.e., a shift/reduce conflict.
DFA addition again: LR(0)?

E′ → E
E → E + number ∣ number

[Figure: LR(0)-DFA for the addition grammar with states 0 to 4, as on the slide “DFA: addition”.]

How to make a decision in state 1?
Decision? If only we knew the ultimate tree already . . .
. . . especially the parts still to come

[Figure: parse tree for n + n with root E′ and two nested E nodes.]

     parse stack    input      action
1    $              n + n $    shift
2    $ n            + n $      reduce E → n
3    $ E            + n $      shift
4    $ E +          n $        shift
5    $ E + n        $          reduce E → E + n
6    $ E            $          reduce E′ → E
7    $ E′           $          accept

• current stack: representation of the already known part of the parse tree
• since we don’t have the future parts of the tree yet:
⇒ look-ahead on the input (without building the tree as yet)
• LR(1) and its variants: look-ahead of 1 (= look at the current value of token)
Addition grammar (again)

[Figure: LR(0)-DFA for the addition grammar with states 0 to 4, as on the slide “DFA: addition”.]

• How to make a decision in state 1? (here: shift vs. reduce)
⇒ look at the next input symbol (in the token variable)
One look-ahead

• LR(0): not very useful, much too weak
• add look-ahead, here of 1 input symbol (= token)
• different variations of that idea (with slight difference in expressiveness)
• tables slightly changed (compared to LR(0))
• but: still can use the LR(0)-DFAs
Resolving LR(0) reduce/reduce conflicts

LR(0) reduce/reduce conflict:

state: { . . . , A → α. , . . . , B → β. }

SLR(1) solution: use follow sets of non-terms

• If Follow(A) ∩ Follow(B) = ∅
⇒ next symbol (in token) decides!
  • if token ∈ Follow(A) then reduce using A → α
  • if token ∈ Follow(B) then reduce using B → β
  • ...
Resolving LR(0) shift/reduce conflicts

LR(0) shift/reduce conflict:

state: { . . . , A → α. , . . . , B1 → β1.b1γ1, B2 → β2.b2γ2 } with outgoing transitions labelled b1, b2

SLR(1) solution: again: use follow sets of non-terms

• If Follow(A) ∩ {b1, b2, . . .} = ∅
⇒ next symbol (in token) decides!
  • if token ∈ Follow(A) then reduce using A → α, non-terminal A determines new top state
  • if token ∈ {b1, b2, . . .} then shift. Input symbol bi determines new top state
  • ...
SLR(1) requirement on states (as in the book)

• formulated as conditions on the states (of LR(0)-items)
• given the LR(0)-item DFA as defined

SLR(1) condition, on all states s

1. For any item A → α.Xβ in s with X a terminal, there is no complete item B → γ. in s with X ∈ Follow(B).
2. For any two complete items A → α. and B → β. in s, Follow(A) ∩ Follow(B) = ∅
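
The two conditions translate directly into a check on a single state. The following Python sketch is an illustration, not the book's formulation: items are (lhs, rhs, dot) triples, the Follow sets are passed in as a dictionary, and the name `slr1_violations` as well as the two hand-transcribed example states are assumptions of this example.

```python
def slr1_violations(state, follow, terminals):
    """Return the violations of the two SLR(1) conditions in one state."""
    complete = [(l, r) for (l, r, d) in state if d == len(r)]
    problems = []
    # Condition 1: a terminal X after the dot must not lie in Follow(B)
    # for any complete item B -> gamma. in the same state.
    for (lhs, rhs, dot) in state:
        if dot < len(rhs) and rhs[dot] in terminals:
            for (b, _) in complete:
                if rhs[dot] in follow[b]:
                    problems.append(("shift/reduce", rhs[dot], b))
    # Condition 2: two complete items need disjoint Follow sets.
    for i, (a, _) in enumerate(complete):
        for (b, _) in complete[i + 1:]:
            if follow[a] & follow[b]:
                problems.append(("reduce/reduce", a, b))
    return problems


# State 1 of the addition DFA, with Follow(E') = {$} and Follow(E) = {+, $}.
ADDITION_STATE_1 = [("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)]
ADDITION_FOLLOW = {"E'": {"$"}, "E": {"+", "$"}}

# A state of an if-then-else grammar containing I -> if S. and I -> if S . else S.
IF_STATE = [("I", ("if", "S"), 2), ("I", ("if", "S", "else", "S"), 2)]
IF_FOLLOW = {"I": {"$", "else"}}

if __name__ == "__main__":
    print(slr1_violations(ADDITION_STATE_1, ADDITION_FOLLOW, {"+", "n"}))
    print(slr1_violations(IF_STATE, IF_FOLLOW, {"if", "else", "other"}))
```

The first state passes both conditions; the second one violates condition 1 on the token else, the well-known dangling-else situation.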
Revisit addition one more time

[Figure: LR(0)-DFA for the addition grammar with states 0 to 4, as on the slide “DFA: addition”.]

• Follow(E′) = {$}
⇒ in state 1:
  • shift for +
  • reduce with E′ → E for $ (which corresponds to accept, in case the input is empty)
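
Follow sets such as Follow(E′) = {$} can be computed by a standard fixed-point iteration. The Python sketch below is one common way of doing it, given as an illustration and not taken from the slides; the grammar encoding and the name `first_follow` are assumptions of this example.

```python
def first_follow(grammar, start):
    """Compute First and Follow sets of all non-terminals by fixed-point iteration."""
    nonterms = set(grammar)
    nullable = set()
    first = {a: set() for a in nonterms}
    follow = {a: set() for a in nonterms}
    follow[start].add("$")  # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                # First: walk the right-hand side while its symbols are nullable.
                all_nullable = True
                for x in rhs:
                    add = first[x] if x in nonterms else {x}
                    if not add <= first[lhs]:
                        first[lhs] |= add
                        changed = True
                    if x not in nullable:  # terminals are never nullable
                        all_nullable = False
                        break
                if all_nullable and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                # Follow: walk the right-hand side from right to left;
                # 'trailer' is what may follow the current position.
                trailer = set(follow[lhs])
                for x in reversed(rhs):
                    if x in nonterms:
                        if not trailer <= follow[x]:
                            follow[x] |= trailer
                            changed = True
                        trailer = (trailer | first[x]) if x in nullable else set(first[x])
                    else:
                        trailer = {x}
    return first, follow


ADDITION = {"E'": [["E"]], "E": [["E", "+", "n"], ["n"]]}
PARENTHESES = {"S'": [["S"]], "S": [["(", "S", ")", "S"], []]}

if __name__ == "__main__":
    print(first_follow(ADDITION, "E'")[1])
    print(first_follow(PARENTHESES, "S'")[1])
```

For the addition grammar this gives Follow(E) = {+, $}, and for the parentheses grammar Follow(S) = {), $}.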
SLR(1) algo
let s be the current state, on top of the parse stack
1. s contains A → α.Xβ, where X is a terminal and X is the next token on the input, then
  • shift X from input to top of stack. The new state pushed on the stack: state t where s --X--> t⁵
2. s contains a complete item (say A → γ.) and the next token in the input is in Follow(A): reduce by rule A → γ:
  • a reduction by S′ → S: accept, if input is empty⁶
  • else:
    pop: remove γ (including “its” states) from the stack
    back up: assume to be in state u which is now head state
    push: push A to the stack, new head state t where u --A--> t
3. if next token is such that neither 1. nor 2. applies: error

⁵ Cf. the LR(0) algo: since we checked the existence of the transition before, the else-part is missing now.
⁶ Cf. the LR(0) algo: this happens now only if next token is $. Note that the follow set of S′ in the augmented grammar is always only $.
Parsing table for SLR(1)

[Figure: LR(0)-DFA for the addition grammar with states 0 to 4, as on the slide “DFA: addition”.]

state   input                                      goto
        n       +                $                 E
0       s:2                                        1
1               s:3              accept
2               r:(E → n)        r:(E → n)
3       s:4
4               r:(E → E + n)    r:(E → E + n)

for state 2 and 4: n ∉ Follow(E)
Parsing table: remarks

• SLR(1) parsing table: rather similar-looking to the LR(0) one
• differences: reflect the differences in: LR(0)-algo vs. SLR(1)-algo
• same number of rows in the table (= same number of states in the DFA)
• only: columns “arranged differently”
  • LR(0): each state uniformly: either shift or else reduce (with given rule)
  • now: non-uniform, dependent on the input
• it should be obvious:
  • SLR(1) may resolve LR(0) conflicts
  • but: if the follow-set conditions are not met: SLR(1) reduce-reduce and/or SLR(1) shift-reduce conflicts
  • would result in non-unique entries in SLR(1)-table⁷

⁷ by which it, strictly speaking, would no longer be an SLR(1)-table :-)
SLR(1) parser run (= “reduction”)

state   input                                      goto
        n       +                $                 E
0       s:2                                        1
1               s:3              accept
2               r:(E → n)        r:(E → n)
3       s:4
4               r:(E → E + n)    r:(E → E + n)

stage   parsing stack      input          action
1       $0                 n + n + n $    shift: 2
2       $0 n2              + n + n $      reduce: E → n
3       $0 E1              + n + n $      shift: 3
4       $0 E1 +3           n + n $        shift: 4
5       $0 E1 +3 n4        + n $          reduce: E → E + n
6       $0 E1              + n $          shift: 3
7       $0 E1 +3           n $            shift: 4
8       $0 E1 +3 n4        $              reduce: E → E + n
9       $0 E1              $              accept
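
Such a run can be reproduced with a driver that consults the table using the top state and the next token. The Python sketch below is an illustration under assumptions: `ACTION` and `GOTO` are a hand transcription of the SLR(1) table for the addition grammar (with the reduce entries of states 2 and 4 placed in the columns + and $, i.e. Follow(E)), the stack holds only states, and the names are chosen for this example.

```python
# ("s", t): shift and go to state t; ("r", lhs, rhs): reduce; ("acc",): accept.
ACTION = {
    (0, "n"): ("s", 2),
    (1, "+"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "+"): ("r", "E", ("n",)),
    (2, "$"): ("r", "E", ("n",)),
    (3, "n"): ("s", 4),
    (4, "+"): ("r", "E", ("E", "+", "n")),
    (4, "$"): ("r", "E", ("E", "+", "n")),
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Return the list of reductions performed, or None on a syntax error."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))  # top state + one look-ahead
        if action is None:
            return None  # empty table slot: error
        if action[0] == "s":
            stack.append(action[1])
            pos += 1
        elif action[0] == "r":
            _, lhs, rhs = action
            del stack[-len(rhs):]                 # pop the handle
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the non-terminal
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            return reductions  # accept


if __name__ == "__main__":
    print(parse(["n", "+", "n", "+", "n"]))
    print(parse(["n", "n"]))
```

Read backwards, the returned list of reductions (preceded by E′ → E) is the right-most derivation of the input.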
Corresponding parse tree

[Figure: parse tree with root E′ over the leaves number + number + number.]
Revisit the parentheses again: SLR(1)?

Grammar: parentheses (from before)
S′ → S
S → ( S ) S ∣ ε

Follow set
Follow(S) = {), $}

[Figure: LR(0)-DFA for the parentheses grammar with states 0 to 5, as on the slide “DFA: parentheses”.]
SLR(1) parse table

state   input                                        goto
        (       )                  $                 S
0       s:2     r: S → ε           r: S → ε          1
1                                  accept
2       s:2     r: S → ε           r: S → ε          3
3               s:4
4       s:2     r: S → ε           r: S → ε          5
5               r: S → ( S ) S     r: S → ( S ) S
Parentheses: SLR(1) parser run (= “reduction”)

state   input                                        goto
        (       )                  $                 S
0       s:2     r: S → ε           r: S → ε          1
1                                  accept
2       s:2     r: S → ε           r: S → ε          3
3               s:4
4       s:2     r: S → ε           r: S → ε          5
5               r: S → ( S ) S     r: S → ( S ) S

stage   parsing stack                input        action
1       $0                           ( ) ( ) $    shift: 2
2       $0 (2                        ) ( ) $      reduce: S → ε
3       $0 (2 S3                     ) ( ) $      shift: 4
4       $0 (2 S3 )4                  ( ) $        shift: 2
5       $0 (2 S3 )4 (2               ) $          reduce: S → ε
6       $0 (2 S3 )4 (2 S3            ) $          shift: 4
7       $0 (2 S3 )4 (2 S3 )4         $            reduce: S → ε
8       $0 (2 S3 )4 (2 S3 )4 S5      $            reduce: S → ( S ) S
9       $0 (2 S3 )4 S5               $            reduce: S → ( S ) S
10      $0 S1                        $            accept
SLR(k)

• in principle: straightforward: k look-ahead, instead of 1
• rarely used in practice, using First_k and Follow_k instead of the k = 1 versions
• tables grow exponentially with k!
Ambiguity & LR-parsing
• in principle: LR(k) (and LL(k)) grammars: unambiguous
• definition/construction: free of shift/reduce and reduce/reduce conflict (given the chosen level of look-ahead)
• However: ambiguous grammar tolerable, if (remaining) conflicts can be solved meaningfully otherwise:

Additional means of disambiguation:

1. by specifying associativity / precedence “outside” the grammar
2. by “living with the fact” that LR parser (commonly) prioritizes shifts over reduces

• for the second point (“let the parser decide according to its preferences”):
  • use sparingly and cautiously
  • typical example: dangling-else
  • even if the parser makes a decision, the programmer may or may not “understand intuitively” the resulting parse tree (and thus AST)
  • a grammar with many S/R-conflicts: go back to the drawing board
Example of an ambiguous grammar

stmt    → if-stmt ∣ other
if-stmt → if ( exp ) stmt
        ∣ if ( exp ) stmt else stmt
exp     → 0 ∣ 1

In the following: E for exp, etc.

Simplified conditionals

Simplified “schematic” if-then-else

S → I ∣ other
I → if S ∣ if S else S

Follow-sets

      Follow
S′    {$}
S     {$, else}
I     {$, else}

• since the grammar is ambiguous: there must be at least one conflict somewhere

DFA of LR(0) items

state 0 (start):  S′ → .S    S → .I    S → .other    I → .if S    I → .if S else S
state 1:          S′ → S.
state 2:          S → I.
state 3:          S → other.
state 4:          I → if .S    I → if .S else S    S → .I    S → .other    I → .if S    I → .if S else S
state 5:          I → if S.    I → if S. else S
state 6:          I → if S else .S    S → .I    S → .other    I → .if S    I → .if S else S
state 7:          I → if S else S.

from   on      to
0      S       1
0      I       2
0      other   3
0      if      4
4      S       5
4      I       2
4      other   3
4      if      4
5      else    6
6      S       7
6      I       2
6      other   3
6      if      4

Simple conditionals: parse table

Grammar

S → I              (1)
  ∣ other          (2)
I → if S           (3)
  ∣ if S else S    (4)

SLR(1)-parse-table, conflict resolved

state   if     else   other   $        goto S   goto I
0       s∶4    −      s∶3     −        1        2
1       −      −      −       accept
2       −      r∶1    −       r∶1
3       −      r∶2    −       r∶2
4       s∶4    −      s∶3     −        5        2
5       −      s∶6    −       r∶3
6       s∶4    −      s∶3     −        7        2
7       −      r∶4    −       r∶4

• shift-reduce conflict in state 5: reduce with rule 3 vs. shift (to state 6)
• conflict there: resolved in favor of shift to 6
• note: extra start state left out of the table

Parser run (= reduction)

stage   parsing stack                 input                       action
1       $0                            if if other else other $    shift: 4
2       $0 if4                        if other else other $       shift: 4
3       $0 if4 if4                    other else other $          shift: 3
4       $0 if4 if4 other3             else other $                reduce: 2
5       $0 if4 if4 S5                 else other $                shift: 6
6       $0 if4 if4 S5 else6           other $                     shift: 3
7       $0 if4 if4 S5 else6 other3    $                           reduce: 2
8       $0 if4 if4 S5 else6 S7        $                           reduce: 4
9       $0 if4 I2                     $                           reduce: 1
10      $0 if4 S5                     $                           reduce: 3
11      $0 I2                         $                           reduce: 1
12      $0 S1                         $                           accept

Parser run, different choice

stage   parsing stack                 input                       action
1       $0                            if if other else other $    shift: 4
2       $0 if4                        if other else other $       shift: 4
3       $0 if4 if4                    other else other $          shift: 3
4       $0 if4 if4 other3             else other $                reduce: 2
5       $0 if4 if4 S5                 else other $                reduce: 3
6       $0 if4 I2                     else other $                reduce: 1
7       $0 if4 S5                     else other $                shift: 6
8       $0 if4 S5 else6               other $                     shift: 3
9       $0 if4 S5 else6 other3        $                           reduce: 2
10      $0 if4 S5 else6 S7            $                           reduce: 4
11      $0 I2                         $                           reduce: 1
12      $0 S1                         $                           accept

Parse trees: simple conditions

Both trees are for the input if if other else other, written in bracket notation.

shift-precedence (conventional tree):
S[ I[ if S[ I[ if S[other] else S[other] ] ] ] ]

“wrong” tree:
S[ I[ if S[ I[ if S[other] ] ] else S[other] ] ]

“dangling else”
“an else belongs to the last previous, still open (= dangling)
if-clause”

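The two runs differ only in what happens in state 5 on `else`. The sketch below replays both; it assumes a dictionary encoding of the conditional table, and all names (`RULES`, `ACTION`, `GOTO`, `parse`, the `decisions` parameter) are illustrative. At each visit to the conflicting entry the parser consumes one decision from `decisions` and falls back to shift when none is left.

```python
# Rule numbers as on the slide: number -> (left-hand side, right-hand side).
RULES = {
    1: ("S", ("I",)),
    2: ("S", ("other",)),
    3: ("I", ("if", "S")),
    4: ("I", ("if", "S", "else", "S")),
}
# Conflict-free part of the table; entry (5, "else") is decided at run time.
ACTION = {
    (0, "if"): "s4", (0, "other"): "s3",
    (1, "$"): "acc",
    (2, "else"): "r1", (2, "$"): "r1",
    (3, "else"): "r2", (3, "$"): "r2",
    (4, "if"): "s4", (4, "other"): "s3",
    (5, "$"): "r3",
    (6, "if"): "s4", (6, "other"): "s3",
    (7, "else"): "r4", (7, "$"): "r4",
}
GOTO = {(0, "S"): 1, (0, "I"): 2, (4, "S"): 5, (4, "I"): 2,
        (6, "S"): 7, (6, "I"): 2}


def parse(tokens, decisions=()):
    """Return the parse tree as nested tuples (label, child, ...)."""
    decisions = iter(decisions)
    tokens = list(tokens) + ["$"]
    states, trees, pos = [0], [], 0
    while True:
        state, tok = states[-1], tokens[pos]
        if (state, tok) == (5, "else"):           # the shift/reduce conflict
            act = "r3" if next(decisions, "shift") == "reduce" else "s6"
        else:
            act = ACTION.get((state, tok))
        if act is None:
            raise SyntaxError(f"no action in state {state} on {tok!r}")
        if act == "acc":
            return trees[0]
        number = int(act[1:])
        if act[0] == "s":                         # shift
            states.append(number)
            trees.append(tok)
            pos += 1
        else:                                     # reduce by rule `number`
            lhs, rhs = RULES[number]
            children = tuple(trees[-len(rhs):])
            del trees[-len(rhs):]
            del states[-len(rhs):]
            trees.append((lhs,) + children)
            states.append(GOTO[(states[-1], lhs)])


tokens = ["if", "if", "other", "else", "other"]
print(parse(tokens))               # always shift: else attached to the inner if
print(parse(tokens, ["reduce"]))   # reduce once, then shift: else attached to the outer if
```

Note that a table which always reduces in state 5 on `else` could never shift an `else` at all; the second run only works because the choice differs between the two visits to that entry.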
Use of ambiguous grammars

• advantage of ambiguous grammars: often simpler


• if ambiguous: grammar guaranteed to have conflicts
• can be (often) resolved by specifying precedence and
associativity
• supported by tools like yacc and CUP . . .

E′ → E
E → E + E ∣ E ∗ E ∣ number

DFA for + and ×

state 0 (start):  E′ → .E    E → .E + E    E → .E ∗ E    E → .n
state 1:          E′ → E.    E → E. + E    E → E. ∗ E
state 2:          E → n.
state 3:          E → E + .E    E → .E + E    E → .E ∗ E    E → .n
state 4:          E → E ∗ .E    E → .E + E    E → .E ∗ E    E → .n
state 5:          E → E + E.    E → E. + E    E → E. ∗ E
state 6:          E → E ∗ E.    E → E. + E    E → E. ∗ E

from   on   to
0      E    1
0      n    2
1      +    3
1      ∗    4
3      E    5
3      n    2
4      E    6
4      n    2
5      +    3
5      ∗    4
6      +    3
6      ∗    4

States with conflicts

• state 5
  • stack contains ...E + E
  • for input $: reduce, since $ cannot be shifted
  • for input +: reduce, as + is left-associative
  • for input ∗: shift, as ∗ has precedence over +
• state 6:
  • stack contains ...E ∗ E
  • for input $: reduce, since $ cannot be shifted
  • for input +: reduce, as ∗ has precedence over +
  • for input ∗: reduce, as ∗ is left-associative
• see also the table on the next slide

Parse table + and ×

state   n      +               ∗               $               goto E
0       s∶2    −               −               −               1
1       −      s∶3             s∶4             accept
2       −      r∶E → n         r∶E → n         r∶E → n
3       s∶2    −               −               −               5
4       s∶2    −               −               −               6
5       −      r∶E → E + E     s∶4             r∶E → E + E
6       −      r∶E → E ∗ E     r∶E → E ∗ E     r∶E → E ∗ E

How about exponentiation (written ↑ or ∗∗)?

Defined as right-associative. See exercise.

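The decisions in states 5 and 6 follow one uniform rule. Here is a minimal sketch of that rule, in the spirit of precedence declarations in tools like yacc; the table `PREC`, the function `resolve`, and the ASCII spellings `*` and `^` (for ∗ and ↑) are assumptions for this illustration, not the API of any tool.

```python
# Hypothetical precedence/associativity declarations: higher level binds tighter.
PREC = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),   # exponentiation, assumed right-associative
}


def resolve(rule_op, lookahead):
    """Decide a shift/reduce conflict.

    rule_op   -- operator of the complete item, e.g. '+' for E -> E + E.
    lookahead -- the next input token.
    """
    if lookahead == "$":
        return "reduce"                    # end of input cannot be shifted
    rule_level, _ = PREC[rule_op]
    look_level, assoc = PREC[lookahead]
    if look_level > rule_level:
        return "shift"                     # look-ahead operator binds tighter
    if look_level < rule_level:
        return "reduce"                    # operator on the stack binds tighter
    return "reduce" if assoc == "left" else "shift"


# State 5 (stack ... E + E) and state 6 (stack ... E * E):
for op in "+*":
    print(op, {tok: resolve(op, tok) for tok in ["+", "*", "$"]})
```

With equal precedence, left-associativity means reduce and right-associativity means shift, so `resolve("^", "^")` returns `"shift"`.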
For comparison: unambiguous grammar for + and ∗

Unambiguous grammar: precedence and left-assoc built in


E′ → E
E → E +T ∣ T
T → T ∗n ∣ n

Follow
E′ {$} (as always for start symbol)
E {$, +}
T {$, +, ∗}

DFA for unambiguous + and ×

state 0 (start):  E′ → .E    E → .E + T    E → .T    T → .T ∗ n    T → .n
state 1:          E′ → E.    E → E. + T
state 2:          E → E + .T    T → .T ∗ n    T → .n
state 3:          T → n.
state 4:          E → T.    T → T. ∗ n
state 5:          T → T ∗ .n
state 6:          E → E + T.    T → T. ∗ n
state 7:          T → T ∗ n.

from   on   to
0      E    1
0      T    4
0      n    3
1      +    2
2      T    6
2      n    3
4      ∗    5
6      ∗    5
5      n    7

DFA remarks

• the DFA now is SLR(1)
• check in particular the states with complete items
  state 1: Follow(E′) = {$}
  state 4: Follow(E) = {$, +}
  state 6: Follow(E) = {$, +}
  state 7: Follow(T) = {$, +, ∗}
• in no case is there a shift/reduce conflict (check the outgoing
edges vs. the follow set)
• there is no reduce/reduce conflict either

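The check described in the remarks can be written down directly. The following sketch assumes each state is summarised by the left-hand sides of its complete items and the terminals on its outgoing edges; the encoding and the name `slr1_conflicts` are illustrative only.

```python
FOLLOW = {"E'": {"$"}, "E": {"$", "+"}, "T": {"$", "+", "*"}}

# state -> (left-hand sides of complete items, terminals on outgoing edges)
STATES = {
    0: ([], {"n"}),
    1: (["E'"], {"+"}),
    2: ([], {"n"}),
    3: (["T"], set()),
    4: (["E"], {"*"}),
    5: ([], {"n"}),
    6: (["E"], {"*"}),
    7: (["T"], set()),
}


def slr1_conflicts(states, follow):
    """List the SLR(1) conflicts: a reduce look-ahead clashing with a shift
    or with the reduce look-ahead of another complete item."""
    conflicts = []
    for state, (complete, shifts) in states.items():
        for i, lhs in enumerate(complete):
            if follow[lhs] & shifts:
                conflicts.append((state, "shift/reduce", lhs))
            for other in complete[i + 1:]:
                if follow[lhs] & follow[other]:
                    conflicts.append((state, "reduce/reduce", lhs, other))
    return conflicts


print(slr1_conflicts(STATES, FOLLOW))

# For comparison: state 5 of the ambiguous grammar (complete item E -> E + E.,
# outgoing edges on + and *), where Follow(E) = {$, +, *}.
print(slr1_conflicts({5: (["E"], {"+", "*"})}, {"E": {"$", "+", "*"}}))
```

The first call reports no conflicts for the unambiguous grammar, while the second reports a shift/reduce conflict for the ambiguous one.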
LR(1) parsing

• most general form of LR(1) parsing
• aka: canonical LR(1) parsing
• usually considered unnecessarily “complex” (i.e., LALR(1) or
similar is good enough)
• “stepping stone” towards LALR(1)

Basic restriction of SLR(1)

Uses look-ahead, yes, but only after it has built a non-look-ahead
DFA (based on LR(0)-items)

Limits of SLR(1) grammars

Assignment grammar fragment (inspired by Pascal, analogous problems in C . . . )

stmt        → call-stmt ∣ assign-stmt
call-stmt   → identifier
assign-stmt → var ∶= exp
var         → [ exp ] ∣ identifier
exp         → var ∣ number

Assignment grammar fragment, simplified

S → id ∣ V ∶= E
V → id
E → V ∣ n

non-SLR(1): Reduce/reduce conflict

Situation can be saved

LALR(1) (and LR(1)): Being more precise with the follow-sets

• LR(0)-items: too “indiscriminate” wrt. the follow sets
• remember the definition of SLR(1) conflicts
• LR(0)/SLR(1)-states:
  • sets of items, due to subset construction (that won’t change in principle, but the items get more complex)
  • the items are LR(0)-items
  • follow-sets as an after-thought

Adding precision in the states of the automaton already

Instead of using LR(0)-items and, when the LR(0) DFA is done, trying
to disambiguate with the help of the follow sets for states
containing complete items: make more fine-grained items:
• LR(1) items
• each item with “specific follow information”: look-ahead

LR(1) items

• main idea: simply make the look-ahead part of the item
• obviously: proliferation of states (not to mention if we wanted look-ahead of k > 1, which in practice is not done, though)

LR(1) items
[A → α.β, a] (2)

• a: terminal/token, including $

LALR(1)-DFA (or LR(1)-DFA)

Remarks on the DFA

• Cf. state 2 (seen before)
• in SLR(1): problematic (reduce/reduce), as Follow(V) = {∶=, $}
• now: disambiguation, by the added information
• LR(1) would give the same DFA

Full LR(1) parsing

• AKA: canonical LR(1) parsing
• the best you can do with 1 look-ahead
• unfortunately: big tables
• pre-stage to LALR(1)-parsing

SLR(1)
LR(0)-item-based parsing, with afterwards adding some extra “pre-compiled” info (about follow-sets) to increase expressivity

LALR(1)
LR(1)-item-based parsing, but afterwards throwing away precision by collapsing states, to save space

LR(1) transitions: arbitrary symbol

• transitions of the NFA (not DFA)

X-transition

[A → α.Xβ, a]  −X→  [A → αX.β, a]

LR(1) transitions: ε

ε-transition
for all B → β1 ∣ β2 . . . and all b ∈ First(γa)

[A → α.Bγ, a]  −ε→  [B → .β, b]

Special case (γ = ε)
for all B → β1 ∣ β2 . . . (here First(γa) = {a}, so b = a)

[A → α.B, a]  −ε→  [B → .β, a]

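Applied exhaustively, the ε-transitions amount to a closure operation on sets of LR(1) items, and the X-transitions to a goto operation. The sketch below applies both to the simplified assignment grammar (with an added start production S′ → S). Items are encoded as tuples `(lhs, rhs, dot, lookahead)`; this encoding and the names `first_of`, `closure` and `goto` are assumptions for the illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("id",), ("V", ":=", "E")],
    "V": [("id",)],
    "E": [("V",), ("n",)],
}


def first_of(seq):
    """First set of a non-empty symbol sequence.
    Sufficient here because the grammar has no ε-productions and no left recursion."""
    head = seq[0]
    if head not in GRAMMAR:
        return {head}
    out = set()
    for rhs in GRAMMAR[head]:
        out |= first_of(rhs)
    return out


def closure(items):
    """Follow ε-transitions: from [A -> α.Bγ, a] add [B -> .β, b] for b in First(γa)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for b in first_of(rhs[dot + 1:] + (a,)):
                for beta in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], beta, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)


def goto(items, symbol):
    """Follow the X-transition: move the dot over `symbol`, keep the look-ahead."""
    moved = {(lhs, rhs, dot + 1, a) for (lhs, rhs, dot, a) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)


start = closure({("S'", ("S",), 0, "$")})
after_id = goto(start, "id")
for item in sorted(after_id):
    print(item)
```

The state reached on `id` contains the two complete items [S → id., $] and [V → id., ∶=]. Their look-aheads are disjoint, so the reduce/reduce conflict that SLR(1) sees via Follow(V) = {∶=, $} does not arise.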
LALR(1) vs LR(1)

LR(1)

LALR(1)

Core of LR(1)-states

• actually: not done that way in practice


• main idea: collapse states with the same core

Core of an LR(1) state


= set of LR(0)-items (i.e., ignoring the look-ahead)

• observation: core of the LR(1) item = LR(0) item


• 2 LR(1) states with the same core have same outgoing edges,
and those lead to states with the same core

LALR(1)-DFA by collapse

• collapse all states with the same core
• based on the above observations: edges are also consistent
• result: almost like an LR(0)-DFA, but additionally
  • each individual item still has look-ahead attached: the
  union of the look-aheads of the “collapsed” items
  • especially for states with complete items: the look-ahead set in
  [A → α., a, b, . . .] is a subset of (and often smaller than) the follow set of A
• ⇒ fewer unresolved conflicts compared to SLR(1)

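The collapse step itself is short. As a sketch, assume an LR(1) state is a set of item tuples `(lhs, rhs, dot, lookahead)`; the function names `core` and `collapse` and the three example states are hypothetical and not taken from any grammar on the slides.

```python
from collections import defaultdict


def core(state):
    """Core of an LR(1) state: its items with the look-ahead dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def collapse(lr1_states):
    """Merge LR(1) states with equal cores; each LR(0) item keeps the
    union of the look-aheads of the items that were merged into it."""
    merged = defaultdict(lambda: defaultdict(set))
    for state in lr1_states:
        for lhs, rhs, dot, look in state:
            merged[core(state)][(lhs, rhs, dot)].add(look)
    return [{item: frozenset(looks) for item, looks in items.items()}
            for items in merged.values()]


# Hypothetical LR(1) states: s1 and s2 share the core {A -> a.}, s3 does not.
s1 = frozenset({("A", ("a",), 1, "c")})
s2 = frozenset({("A", ("a",), 1, "d")})
s3 = frozenset({("B", ("a",), 1, "c")})

lalr = collapse([s1, s2, s3])
print(len(lalr))
print(lalr[0])
```

Here `s1` and `s2` become one state whose item A → a. carries the look-ahead set {c, d}. The sketch merges states only; redirecting the edges is omitted, which is unproblematic because states with the same core have the same outgoing edges.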
Concluding remarks on LR / bottom-up parsing

• all constructions (here) based on BNF (not EBNF)
• conflicts (for instance due to ambiguity) can be solved by
  • reformulating the grammar, while generating the same language (if designing a new language, there’s also the option to massage the language itself)
  • using directives in parser generator tools like yacc, CUP, bison
  (precedence, assoc.)
  • or (not yet discussed): solving them later via semantic analysis
• NB: not all conflicts are solvable, not even in LR(1) (remember
ambiguous languages: there are inherently ambiguous languages for which
there is no unambiguous grammar)

LR/bottom-up parsing overview

LR(0)
  advantages: defines states also used by SLR and LALR
  remarks: not really used, many conflicts, very weak
SLR(1)
  advantages: clear improvement over LR(0) in expressiveness, even if using the same number of states. Table typically with 50K entries
  remarks: weaker than LALR(1), but often good enough. OK for hand-made parsers for small grammars
LALR(1)
  advantages: almost as expressive as LR(1), but number of states as LR(0)!
  remarks: method of choice for most generated LR-parsers
LR(1)
  advantages: the method covering all bottom-up, one-look-ahead parseable grammars
  remarks: large number of states (typically 11M of entries), mostly LALR(1) preferred

Remember: once the table specific for LR(0), . . . is set up, the parsing
algorithms all work the same

Error handling

Minimal requirement
Upon “stumbling over” an error (= deviation from the grammar):
give a reasonable & understandable error message, indicating also the
error location. Potentially stop parsing.

• for parse error recovery
  • one cannot really recover from the fact that the program has
  an error (a syntax error is a syntax error), but
  • after giving a decent error message:
    • move on, potentially jumping over some subsequent code,
    • until the parser can pick up normal parsing again
    • so: meaningful checking of code even following a first error
  • avoid: reporting an avalanche of subsequent spurious errors
  (those just “caused” by the first error)
  • “pick up” again after semantic errors: easier than for syntactic
  errors

Error messages
• important:
  • try to avoid error messages that only occur because of an
  already reported error!
  • report an error as early as possible, if possible at the first point
  where the program cannot be extended to a correct program.
  • make sure that, after an error, one doesn’t end up in an infinite
  loop without reading any input symbols.
• What’s a good error message?
  • assume that the method factor() chooses the alternative
  ( exp ) but that, when control returns from method exp(),
  it does not find a )
  • one could report: right parenthesis missing
  • but this may often be confusing, e.g. if the program text
  is: ( a + b c )
  • here the exp() method will terminate after ( a + b, as c
  cannot extend the expression. You should therefore rather
  give the message: error in expression or right
  parenthesis missing.

Error recovery in bottom-up parsing
• panic recovery in LR-parsing
  • simple form
  • the only one we briefly look at
• upon error: recovery ⇒
  • pops parts of the stack
  • ignores parts of the input
  • until “on track again”
  • but: how to do that
• additional problem: non-determinism
  • table: constructed conflict-free under normal operation
  • upon error (and clearing parts of the stack + input): no
  guarantee it’s clear how to continue
  ⇒ heuristic needed (like panic mode recovery)

Panic mode idea

• try a fresh start,
• promising “fresh start” is: a possible goto action
• thus: back off and take the next such goto-opportunity

Possible error situation

stage   parse stack               input              action
1       $0 a1 b2 c3 (4 d5 e6      f ) g h . . . $    no entry for f
2       $0 a1 b2 c3 Bv            g h . . . $        back to normal
3       $0 a1 b2 c3 Bv g7         h . . . $          . . .

Relevant part of the parse table (input columns ), f, g; goto columns A, B):

state   )    f    g               goto A   goto B
. . .
3                                 u        v
4       −    −    −
5       −    −    −
6       −    −    −    −
. . .
u       −    −    reduce . . .
v       −    −    shift∶7
. . .

Panic mode recovery

Algo
1. Pop states from the stack until a state is found with non-empty
goto entries
2. • If there’s a legal action on the current input token from one of
the goto-states, push that state on the stack and restart the parse.
• If there are several such states: prefer a shift to a reduce
• Among possible reduce actions: prefer one whose associated
non-terminal is least general
3. If there is no legal action on the current input token from one of the
goto-states: advance the input until there is a legal action (or until
the end of the input is reached)

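The three steps can be sketched as a single function. The code below is an illustration under simplifying assumptions: the stack holds states only, `goto` maps a state to its non-empty goto row, `action` maps (state, token) pairs to entries, and the tie-break “least general non-terminal” among several reduces is left out. All names are hypothetical, and the example tables contain only the entries visible in the error situation with states u and v.

```python
def panic_recover(stack, tokens, pos, action, goto):
    """Return (new_stack, new_pos) after panic-mode recovery, or None on failure."""
    stack = list(stack)
    # Step 1: pop until the top state has at least one goto entry.
    while stack and not goto.get(stack[-1]):
        stack.pop()
    if not stack:
        return None
    targets = list(goto[stack[-1]].values())
    # Steps 2 and 3: skip input until some goto-state has a legal action.
    while pos < len(tokens):
        legal = [(t, action[(t, tokens[pos])])
                 for t in targets if (t, tokens[pos]) in action]
        if legal:
            # Prefer a goto-state that can shift over one that can only reduce.
            legal.sort(key=lambda pair: pair[1][0] != "shift")
            return stack + [legal[0][0]], pos
        pos += 1
    return None


# The error situation: states 0..6 on the stack, input f ) g h $.
GOTO = {3: {"A": "u", "B": "v"}}
ACTION = {("u", "g"): ("reduce",), ("v", "g"): ("shift", 7)}

print(panic_recover([0, 1, 2, 3, 4, 5, 6], ["f", ")", "g", "h", "$"], 0, ACTION, GOTO))
```

In the example, states 6, 5 and 4 are popped, the tokens `f` and `)` are skipped, and state v is pushed because it can shift `g`.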
Example again

stage   parse stack               input              action
1       $0 a1 b2 c3 (4 d5 e6      f ) g h . . . $    no entry for f
2       $0 a1 b2 c3 Bv            g h . . . $        back to normal
3       $0 a1 b2 c3 Bv g7         h . . . $          . . .

• first pop, until in state 3
• then jump over input
  • until next input g
  • since f and ) cannot be treated
• choose to goto v (shift in that state)

Panic mode may loop forever

stage   parse stack          input    action
1       $0                   (n n)$
2       $0 (6                n n)$
3       $0 (6 n5             n)$
4       $0 (6 factor4        n)$
6       $0 (6 term3          n)$
7       $0 (6 exp10          n)$      panic!
8       $0 (6 factor4        n)$      been there before: stage 4!

Typical yacc parser table
some variant of the expression grammar again
command → exp
exp → exp + term ∣ exp − term ∣ term
term → term ∗ factor ∣ factor
factor → number ∣ ( exp )

Panicking and looping

• error raised in stage 7, no action possible
• panic:
  1. pop-off exp10
  2. state 6: 3 goto’s

                                   exp    term         factor
     goto to                       10     3            4
     with n next: action there     −      reduce r4    reduce r6

  3. no shift, so we need to decide between the two reduces
  4. factor: less general, we take that one

How to deal with looping panic?

• make sure to detect loops (i.e. previous “configurations”)
• if a loop is detected: don’t repeat but do something special, for
instance
  • pop-off more from the stack, and try again
  • pop-off and insist that a shift is part of the options
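
Detecting a repeated configuration needs very little machinery. As a sketch, assume a configuration is the pair of parse stack and input position; the helper `make_loop_guard` is hypothetical and not part of yacc or CUP.

```python
def make_loop_guard():
    """Return a function that reports whether a configuration was seen before."""
    seen = set()

    def been_here(stack, pos):
        config = (tuple(stack), pos)   # states on the stack + position in the input
        if config in seen:
            return True
        seen.add(config)
        return False

    return been_here


# The looping run: states after "( n" with the second n (position 2) as next token.
been_here = make_loop_guard()
for stack in ([0, 6, 4], [0, 6, 3], [0, 6, 10], [0, 6, 4]):
    print(stack, been_here(stack, 2))
```

The last call returns `True`: the stack with state 4 on top and the same remaining input was already seen, so the recovery routine should pop more or insist on a shift instead of repeating the reduce.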

Left out (from the book and the pensum)

• more info on error recovery
• especially: more on yacc error recovery
• it’s not pensum, and for the oblig: one needs to deal with
CUP-specifics (not classic yacc specifics, even if similar)
anyhow, and error recovery is not part of the oblig (halfway
decent error handling is).

References I

[Louden, 1997] Louden, K. (1997). Compiler Construction, Principles and Practice. PWS Publishing.
