04 Parsing

This document provides an outline for the topic of parsing in the course INF5110 - Compiler Construction. It discusses first and follow sets, which are concepts used for grammars that provide information about possible derivations of words. It then covers top-down and bottom-up parsing techniques. The document gives definitions and examples of first and follow sets, including recursive definitions and pseudocode for calculating these sets. It also provides examples using a sample expression grammar.


INF5110 – Compiler Construction

Spring 2017

1 / 330
Outline

1. Parsing
First and follow sets
Top-down parsing
Bottom-up parsing
References

2 / 330
INF5110 – Compiler Construction

Parsing

Spring 2017

3 / 330
Overview

• First and Follow set: general concepts for grammars


• textbook looks at one parsing technique (top-down)
[Louden, 1997, Chap. 4] before studying First/Follow sets
• here: we cover First/Follow sets before any parsing technique
• two transformation techniques for grammars, both preserving
the accepted language
1. removal of left-recursion
2. left factoring

6 / 330
First and Follow sets
• general concept for grammars
• certain types of analyses (e.g. parsing):
• info needed about possible “forms” of derivable words,

First-set of A
which terminal symbols can appear at the start of strings derived
from a given nonterminal A

Follow-set of A
Which terminals can follow A in some sentential form.

• sentential form: word derived from grammar’s starting symbol


• later: different algos for First and Follow sets, for all
non-terminals of a given grammar
• mostly straightforward
• one complication: nullable symbols (non-terminals)
• Note: those sets depend on the grammar, not the language
7 / 330
First sets

Definition (First set)


Given a grammar G and a non-terminal A. The First-set of A,
written First G (A), is defined as

First G (A) = {a ∣ A ⇒∗G aα, a ∈ ΣT } + {ε ∣ A ⇒∗G ε} . (1)

Definition (Nullable)
Given a grammar G . A non-terminal A ∈ ΣN is nullable, if A ⇒∗ ε.

8 / 330
Examples

• Cf. the Tiny grammar


• in Tiny, as in most languages

First(if -stmt) = {”if ”}


• in many languages:

First(assign-stmt) = {identifier, ”(”}

• typical Follow (see later) for statements:

Follow (stmt) = {”; ”, ”end”, ”else”, ”until”}

9 / 330
Remarks

• note: special treatment of the empty word ε


• in the following: if grammar G clear from the context
• ⇒∗ for ⇒∗G
• First for First G
• ...
• definition so far: “top-level” for start-symbol, only
• next: a more general definition
• definition of First set of arbitrary symbols (and even words)
• and also: definition of First for a symbol in terms of First for
“other symbols” (connected by productions)
⇒ recursive definition

10 / 330
A more algorithmic/recursive definition

• grammar symbol X : terminal or non-terminal or ε

Definition (First set of a symbol)


Given a grammar G and grammar symbol X . The First-set of X ,
written First(X ), is defined as follows:
1. If X ∈ ΣT + {ε}, then First(X ) = {X }.
2. If X ∈ ΣN : For each production

X → X1 X2 . . . Xn

2.1 First(X ) contains First(X1 ) ∖ {ε}
2.2 If, for some i < n, all First(X1 ), . . . , First(Xi ) contain ε, then
First(X ) contains First(Xi+1 ) ∖ {ε}.
2.3 If all First(X1 ), . . . , First(Xn ) contain ε, then First(X )
contains {ε}.
11 / 330
For words

Definition (First set of a word)


Given a grammar G and a word α. The First-set of

α = X1 . . . Xn ,

written First(α), is defined inductively as follows:

1. First(α) contains First(X1 ) ∖ {ε}
2. for each i = 2, . . . , n, if First(Xk ) contains ε for all
k = 1, . . . , i − 1, then First(α) contains First(Xi ) ∖ {ε}
3. If all First(X1 ), . . . , First(Xn ) contain ε, then First(α)
contains {ε}.

12 / 330
Pseudo code

for all non-terminals A do
  First[A] := {}
end
while there are changes to any First[A] do
  for each production A → X1 . . . Xn do
    k := 1
    continue := true
    while continue = true and k ≤ n do
      First[A] := First[A] ∪ First[Xk] ∖ {ε}
      if ε ∉ First[Xk] then continue := false
      k := k + 1
    end;
    if continue = true
    then First[A] := First[A] ∪ {ε}
  end;
end

13 / 330
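The fixpoint pseudocode above can be sketched in Python. The grammar encoding (a dict mapping each non-terminal to a list of right-hand sides, with the empty tuple encoding an ε-production) and all names are illustrative assumptions, not part of the slides.

```python
EPS = "ε"  # marker for the empty word inside First sets

def first_sets(grammar, terminals):
    """Fixpoint computation of First[A] for all non-terminals.

    grammar: dict mapping non-terminal -> list of productions,
    each production a tuple of symbols; () encodes A -> ε.
    """
    first = {A: set() for A in grammar}

    def first_of(sym):
        # First of a terminal is the terminal itself
        return {sym} if sym in terminals else first[sym]

    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                new = set(first[A])
                all_nullable = True
                for X in rhs:
                    new |= first_of(X) - {EPS}
                    if EPS not in first_of(X):
                        all_nullable = False
                        break
                if all_nullable:  # every X_i nullable (or rhs empty)
                    new.add(EPS)
                if new != first[A]:
                    first[A] = new
                    changed = True
    return first

# the expression grammar from the slides
g = {
    "exp":    [("exp", "addop", "term"), ("term",)],
    "addop":  [("+",), ("-",)],
    "term":   [("term", "mulop", "factor"), ("factor",)],
    "mulop":  [("*",)],
    "factor": [("(", "exp", ")"), ("n",)],
}
terminals = {"+", "-", "*", "(", ")", "n"}
fs = first_sets(g, terminals)
```

Running this on the expression grammar reproduces the table on the "final result" slide: First[exp] = First[term] = First[factor] = {(, n}, First[addop] = {+, −}, First[mulop] = {∗}.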
If only we could do away with special cases for the empty
words . . .

for grammars without ε-productions.¹

for all non-terminals A do
  First[A] := {}   // counts as change
end
while there are changes to any First[A] do
  for each production A → X1 . . . Xn do
    First[A] := First[A] ∪ First[X1]
  end;
end

¹ A production of the form A → ε.
14 / 330
Example expression grammar (from before)

exp → exp addop term ∣ term (2)


addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

15 / 330
Example expression grammar (expanded)

exp → exp addop term (3)


exp → term
addop → +
addop → −
term → term mulop factor
term → factor
mulop → ∗
factor → ( exp )
factor → n

16 / 330
nr pass 1 pass 2 pass 3

1 exp → exp addop term

2 exp → term

3 addop → +

4 addop → −

5 term → term mulop factor

6 term → factor

7 mulop → ∗

8 factor → ( exp )

9 factor → n

17 / 330
“Run” of the algo

18 / 330
Collapsing the rows & final result
• results per pass:
1 2 3
exp {(, n}
addop {+, −}
term {(, n}
mulop {∗}
factor {(, n}

• final results (at the end of pass 3):

First[_]
exp {(, n}
addop {+, −}
term {(, n}
mulop {∗}
factor {(, n}
19 / 330
Work-list formulation

for all non-terminals A do
  First[A] := {}
  WL := P   // all productions
end
while WL ≠ ∅ do
  remove one (A → X1 . . . Xn) from WL
  if First[A] ≠ First[A] ∪ First[X1]
  then First[A] := First[A] ∪ First[X1]
       add all productions (A′ → X1′ . . . Xm′) to WL
  else skip
end

• worklist here: “collection” of productions


• alternatively, with slight reformulation: “collection” of
non-terminals instead also possible

20 / 330
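For ε-free grammars, the work-list formulation can be sketched as follows. The re-insertion strategy (re-adding every production whose right-hand side starts with the changed non-terminal) is an assumption about what the slide's (A′ → X1′ . . . Xm′) means, and the grammar encoding is the same illustrative dict as before.

```python
from collections import deque

def first_sets_worklist(grammar, terminals):
    """Work-list First computation, for grammars without ε-productions."""
    first = {A: set() for A in grammar}
    # work list initialised with all productions, as pairs (lhs, rhs)
    wl = deque((A, rhs) for A, prods in grammar.items() for rhs in prods)
    while wl:
        A, rhs = wl.popleft()
        X1 = rhs[0]
        contrib = {X1} if X1 in terminals else first[X1]
        if not contrib <= first[A]:
            first[A] |= contrib
            # re-add every production whose right-hand side starts with A:
            # its left-hand side may now gain new First symbols
            for B, prods in grammar.items():
                for r in prods:
                    if r[0] == A:
                        wl.append((B, r))
    return first

g = {
    "exp":    [("exp", "addop", "term"), ("term",)],
    "addop":  [("+",), ("-",)],
    "term":   [("term", "mulop", "factor"), ("factor",)],
    "mulop":  [("*",)],
    "factor": [("(", "exp", ")"), ("n",)],
}
terminals = {"+", "-", "*", "(", ")", "n"}
fw = first_sets_worklist(g, terminals)
```

The result coincides with the fixpoint version on this grammar; the work list just avoids re-scanning productions that cannot have changed.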
Follow sets

Definition (Follow set (ignoring $))


Given a grammar G with start symbol S, and a non-terminal A.
The Follow-set of A, written Follow G (A), is

Follow G (A) = {a ∣ S ⇒∗G α1 Aaα2 , a ∈ ΣT } . (4)

• More generally: $ as special end-marker

S $ ⇒∗G α1 Aaα2 , a ∈ ΣT + { $ } .

• typically: the start symbol does not occur on the right-hand side
of any production

21 / 330
Follow sets, recursively

Definition (Follow set of a non-terminal)


Given a grammar G and nonterminal A. The Follow-set of A,
written Follow (A) is defined as follows:
1. If A is the start symbol, then Follow (A) contains $.
2. If there is a production B → αAβ, then Follow (A) contains
First(β) ∖ {ε}.
3. If there is a production B → αAβ such that ε ∈ First(β), then
Follow (A) contains Follow (B).

• $: “end marker”, a special symbol that only appears in Follow
sets

22 / 330
More imperative representation in pseudo code

Follow[S] := {$}
for all non-terminals A ≠ S do
  Follow[A] := {}
end
while there are changes to any Follow-set do
  for each production A → X1 . . . Xn do
    for each Xi which is a non-terminal do
      Follow[Xi] := Follow[Xi] ∪ (First(Xi+1 . . . Xn) ∖ {ε})
      if ε ∈ First(Xi+1 Xi+2 . . . Xn)
      then Follow[Xi] := Follow[Xi] ∪ Follow[A]
    end
  end
end

Note! First(ε) = {ε}

23 / 330
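The Follow pseudocode can be sketched in Python on top of a First computation; First(Xi+1 . . . Xn) is computed by a word-level helper, with First(ε) = {ε} as the note says. The grammar encoding is the same illustrative dict used for First.

```python
EPS = "ε"

def first_of_word(word, first, terminals):
    """First(X1 ... Xn) for a word; First of the empty word is {ε}."""
    out = set()
    for X in word:
        fx = {X} if X in terminals else first[X]
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)  # all symbols nullable (or the word is empty)
    return out

def first_sets(grammar, terminals):
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                new = first[A] | first_of_word(rhs, first, terminals)
                if new != first[A]:
                    first[A], changed = new, True
    return first

def follow_sets(grammar, terminals, start):
    first = first_sets(grammar, terminals)
    follow = {A: set() for A in grammar}
    follow[start].add("$")       # rule 1: start symbol gets the end marker
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for i, X in enumerate(rhs):
                    if X not in grammar:
                        continue  # only non-terminals get Follow sets
                    tail = first_of_word(rhs[i+1:], first, terminals)
                    new = follow[X] | (tail - {EPS})   # rule 2
                    if EPS in tail:                    # rule 3
                        new |= follow[A]
                    if new != follow[X]:
                        follow[X], changed = new, True
    return follow

g = {
    "exp":    [("exp", "addop", "term"), ("term",)],
    "addop":  [("+",), ("-",)],
    "term":   [("term", "mulop", "factor"), ("factor",)],
    "mulop":  [("*",)],
    "factor": [("(", "exp", ")"), ("n",)],
}
terminals = {"+", "-", "*", "(", ")", "n"}
fo = follow_sets(g, terminals, "exp")
```

On the expression grammar this yields Follow(exp) = {$, +, −, )}, Follow(term) = Follow(factor) = {$, +, −, ), ∗}, and Follow(addop) = Follow(mulop) = {(, n}.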
Example expression grammar (expanded)

exp → exp addop term (3)


exp → term
addop → +
addop → −
term → term mulop factor
term → factor
mulop → ∗
factor → ( exp )
factor → n

24 / 330
nr pass 1 pass 2

1 exp → exp addop term

2 exp → term

5 term → term mulop factor

6 term → factor

8 factor → ( exp )


25 / 330
“Run” of the algo

26 / 330
Illustration of first/follow sets

[Figure: two derivation trees — left panel illustrates a ∈ First(A), right panel a ∈ Follow (A)]

• red arrows: illustration of information flow in the algos


• run of Follow :
• relies on First
• in particular a ∈ First(E ) (right tree)
• $ ∈ Follow (B)
27 / 330
More complex situation (nullability)

[Figure: two derivation trees with nullable non-terminals — left panel illustrates a ∈ First(A), right panel a ∈ Follow (A)]

28 / 330
Some forms of grammars are less desirable than others

• left-recursive production:

A → Aα
more precisely: example of immediate left-recursion
• 2 productions with common “left factor”:

A → αβ1 ∣ αβ2 where α ≠ ε

29 / 330
Some simple examples for both

• left-recursion

exp → exp + term

• classical example for common left factor: rules for conditionals

if -stmt → if ( exp ) stmt end


∣ if ( exp ) stmt else stmt end

30 / 330
Transforming the expression grammar

exp → exp addop term ∣ term


addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

• obviously left-recursive
• remember: this variant used for proper associativity!

31 / 330
After removing left recursion

exp → term exp ′

exp ′ → addop term exp ′ ∣ ε
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ ε
mulop → ∗
factor → ( exp ) ∣ n

• still unambiguous
• unfortunate: associativity now different!
• note also: ε-productions & nullability

32 / 330
Left-recursion removal

Left-recursion removal
A transformation process to turn a CFG into one without left
recursion

• price: ε-productions
• 3 cases to consider
• immediate (or direct) recursion
• simple
• general
• indirect (or mutual) recursion

33 / 330
Left-recursion removal: simplest case

Before After

A → Aα ∣ β            A → βA′
                      A′ → αA′ ∣ ε

34 / 330
Schematic representation

A → Aα ∣ β            A → βA′
                      A′ → αA′ ∣ ε

[Figure: two derivation trees — with the left-recursive grammar, A grows a left spine A → Aα → Aαα → . . . that finally ends in β (β ends up leftmost); with the transformed grammar, A derives β followed by a right spine of A′ → αA′ steps that ends in ε]

35 / 330
Remarks

• both grammars generate the same (context-free) language (=


set of words over terminals)
• in EBNF:

A → β{α}
• two negative aspects of the transformation
1. generated language unchanged, but: change in the resulting
structure (parse tree), in other words a change in associativity,
which may result in a change of meaning
2. introduction of ε-productions
• more concrete example for such a production: grammar for
expressions

36 / 330
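The simplest-case transformation (A → Aα ∣ β becomes A → βA′, A′ → αA′ ∣ ε) can be sketched as a small function on productions. The tuple encoding, the primed-name convention, and the use of the empty tuple for ε are illustrative assumptions.

```python
def remove_simple_left_recursion(A, prods):
    """A → Aα | β  becomes  A → βA' and A' → αA' | ε.

    prods: list of right-hand sides for A, each a tuple of symbols;
    the empty tuple () encodes ε.
    """
    A1 = A + "'"                                                  # fresh primed name
    alphas = [rhs[1:] for rhs in prods if rhs and rhs[0] == A]    # the α of A → Aα
    betas  = [rhs for rhs in prods if not (rhs and rhs[0] == A)]  # the β alternatives
    if not alphas:
        return {A: prods}                                         # no immediate left recursion
    return {
        A:  [beta + (A1,) for beta in betas],                     # A  → β A'
        A1: [alpha + (A1,) for alpha in alphas] + [()],           # A' → α A' | ε
    }

# exp → exp addop term | term   becomes   exp → term exp' ; exp' → addop term exp' | ε
new = remove_simple_left_recursion("exp", [("exp", "addop", "term"), ("term",)])
```

Applied to the expression grammar's exp rule this reproduces the transformed grammar from the earlier slide.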
Left-recursion removal: immediate recursion (multiple)

Before After

A → Aα1 ∣ . . . ∣ Aαn            A → β1 A′ ∣ . . . ∣ βm A′
  ∣ β1 ∣ . . . ∣ βm              A′ → α1 A′ ∣ . . . ∣ αn A′ ∣ ε

Note: can be written in EBNF as:

A → (β1 ∣ . . . ∣ βm )(α1 ∣ . . . ∣ αn )∗

37 / 330
Removal of: general left recursion

Assume non-terminals A1 , . . . , Am
for i := 1 to m do
  for j := 1 to i − 1 do
    replace each grammar rule of the form Ai → Aj β   // j < i
    by the rule Ai → α1 β ∣ α2 β ∣ . . . ∣ αk β
    where Aj → α1 ∣ α2 ∣ . . . ∣ αk
    are the current rules for Aj
  end
  { corresponds to i = j }
  remove, if necessary, immediate left recursion for Ai
end

“current” = the rules for Aj at the current stage of the algo

38 / 330
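The general (Paull-style) procedure can be sketched directly from the pseudocode, reusing the immediate-recursion step from the multiple-alternatives slide. Encoding and names are illustrative; the empty tuple encodes ε.

```python
def remove_immediate(A, prods):
    """A → Aα1|...|Aαn | β1|...|βm  becomes  A → βiA',  A' → αjA' | ε."""
    A1 = A + "'"
    alphas = [r[1:] for r in prods if r and r[0] == A]
    betas  = [r for r in prods if not (r and r[0] == A)]
    if not alphas:
        return {A: prods}
    return {A:  [b + (A1,) for b in betas],
            A1: [a + (A1,) for a in alphas] + [()]}   # () encodes ε

def remove_left_recursion(grammar, order):
    """order: the fixed enumeration A1, ..., Am of the non-terminals."""
    g = {A: list(ps) for A, ps in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            expanded = []
            for rhs in g[Ai]:
                if rhs and rhs[0] == Aj:        # Ai → Aj β  with j < i:
                    # substitute the current alternatives of Aj
                    expanded += [alt + rhs[1:] for alt in g[Aj]]
                else:
                    expanded.append(rhs)
            g[Ai] = expanded
        # corresponds to i = j: clean up immediate left recursion for Ai
        g.update(remove_immediate(Ai, g.pop(Ai)))
    return g

# worked example from the next slide: A = A1, B = A2
g = remove_left_recursion(
    {"A": [("B", "a"), ("A", "a"), ("c",)],
     "B": [("B", "b"), ("A", "b"), ("d",)]},
    ["A", "B"])
```

On the A/B example this reproduces the slide's final grammar: A → BaA′ ∣ cA′, A′ → aA′ ∣ ε, B → cA′bB′ ∣ dB′, B′ → bB′ ∣ aA′bB′ ∣ ε.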
Example (for the general case)
let A = A1 , B = A2

A → Ba ∣ Aa ∣ c
B → Bb ∣ Ab ∣ d

A → BaA′ ∣ cA′
A′ → aA′ ∣ ε
B → Bb ∣ Ab ∣ d

A → BaA′ ∣ cA′
A′ → aA′ ∣ ε
B → Bb ∣ BaA′b ∣ cA′b ∣ d

A → BaA′ ∣ cA′
A′ → aA′ ∣ ε
B → cA′bB′ ∣ dB′
B′ → bB′ ∣ aA′bB′ ∣ ε
39 / 330
Left factor removal

• a CFG does not just describe a context-free language

• it is also an intended (indirect) description of a parser for that
language
⇒ common left factor undesirable
• cf.: determinization of automata for the lexer

Simple situation

Before                     After
A → αβ ∣ αγ ∣ . . .        A → αA′ ∣ . . .
                           A′ → β ∣ γ

43 / 330
Example: sequence of statements

Before
stmt-seq → stmt ; stmt-seq ∣ stmt

After
stmt-seq → stmt stmt-seq ′
stmt-seq ′ → ; stmt-seq ∣ ε

44 / 330
Example: conditionals

Before
if -stmt → if ( exp ) stmt-seq end
∣ if ( exp ) stmt-seq else stmt-seq end

After
if -stmt → if ( exp ) stmt-seq else-or -end
else-or -end → else stmt-seq end ∣ end

45 / 330
Example: conditionals (without else)

Before
if -stmt → if ( exp ) stmt-seq
∣ if ( exp ) stmt-seq else stmt-seq

After
if -stmt → if ( exp ) stmt-seq else-or -empty
else-or -empty → else stmt-seq ∣ ε

46 / 330
Not all factorization doable in “one step”
Starting point
A → abcB ∣ abC ∣ aE

After 1 step
A → abA′ ∣ aE
A′ → cB ∣ C

After 2 steps
A → aA′′
A′′ → bA′ ∣ E
A′ → cB ∣ C

• note: we choose the longest common prefix (= longest left
factor), not just any common prefix

47 / 330
Left factorization

while there are changes to the grammar do
  for each nonterminal A do
    let α be a prefix of maximal length that is shared
        by two or more productions for A
    if α ≠ ε
    then
      let A → α1 ∣ . . . ∣ αn be all productions for A
      and suppose that α1 , . . . , αk share α,
      so that A → αβ1 ∣ . . . ∣ αβk ∣ αk+1 ∣ . . . ∣ αn ,
      that the βj ’s share no common prefix, and
      that αk+1 , . . . , αn do not share α.
      replace the rule A → α1 ∣ . . . ∣ αn by the rules
        A → αA′ ∣ αk+1 ∣ . . . ∣ αn
        A′ → β1 ∣ . . . ∣ βk
    end
  end
end

48 / 330
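The left-factorization loop can likewise be sketched in Python; the prefix selection follows the pseudocode (prefix of maximal length shared by two or more productions). The encoding and the fresh-name scheme are illustrative assumptions.

```python
def longest_shared_prefix(prods):
    """Longest prefix shared by two or more of the given right-hand sides."""
    best = ()
    for i, r in enumerate(prods):
        for s in prods[i + 1:]:
            k = 0
            while k < len(r) and k < len(s) and r[k] == s[k]:
                k += 1
            if k > len(best):
                best = r[:k]
    return best

def left_factor(grammar):
    g = {A: list(ps) for A, ps in grammar.items()}
    changed = True
    while changed:
        changed = False
        for A in list(g):
            alpha = longest_shared_prefix(g[A])
            if not alpha:            # no common left factor for A
                continue
            A1 = A + "'"
            while A1 in g:           # pick a fresh primed name
                A1 += "'"
            shared = [r for r in g[A] if r[:len(alpha)] == alpha]
            rest   = [r for r in g[A] if r[:len(alpha)] != alpha]
            g[A]  = [alpha + (A1,)] + rest
            g[A1] = [r[len(alpha):] for r in shared]   # () encodes ε
            changed = True
    return g

# the two-step example: A → abcB | abC | aE
g = left_factor({"A": [("a", "b", "c", "B"), ("a", "b", "C"), ("a", "E")]})
```

Two iterations reproduce the result of the previous slide: A → aA′′, A′′ → bA′ ∣ E, A′ → cB ∣ C.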
Outline

1. Parsing
First and follow sets
Top-down parsing
Bottom-up parsing
References

49 / 330
What’s a parser generally doing

task of parser = syntax analysis


• input: stream of tokens from lexer
• output:
• abstract syntax tree
• or meaningful diagnosis of source of syntax error

• the full “power” (i.e., expressiveness) of CFGs not used


• thus:
• consider restrictions of CFGs, i.e., a specific subclass, and/or
• represented in specific ways (no left-recursion, left-factored
...)

50 / 330
Lexer, parser, and the rest

[Diagram: source program → lexer → parser (parser requests tokens via “get next token”) → parse tree → rest of the front end → interm. rep.; lexer and parser both access the symbol table]

51 / 330
Top-down vs. bottom-up

• all parsers (together with lexers): left-to-right


• remember: parsers operate with trees
• parse tree (concrete syntax tree): representing grammatical
derivation
• abstract syntax tree: data structure
• 2 fundamental classes
• while parser eats through the token stream, it grows, i.e.,
builds up (at least conceptually) the parse tree:

Bottom-up: the parse tree is grown from the leaves to the root.
Top-down: the parse tree is grown from the root to the leaves.
• while the parse tree is mostly conceptual: parsing builds up the
concrete data structure of the AST bottom-up vs. top-down.

52 / 330
Parsing restricted classes of CFGs
• parser: better be “efficient”
• full complexity of CFLs: not really needed in practice2
• classification of CF languages vs. CF grammars, e.g.:
• left-recursion-freedom: condition on a grammar
• ambiguous language vs. ambiguous grammar
• classification of grammars ⇒ classification of languages
• a CF language is (inherently) ambiguous, if there’s no
unambiguous grammar for it
• a CF language is top-down parseable, if there exists a grammar
that allows top-down parsing . . .

• in practice: classification of parser generating tools:


• based on accepted notation for grammars: (BNF or some form
of EBNF etc.)
2
Perhaps: if a parser has trouble figuring out whether a program has a
syntax error or not (perhaps using back-tracking), humans will probably
have similar problems. So better keep it simple. And time in a compiler
may be better spent elsewhere (optimization, semantic analysis).
53 / 330
Classes of CFG grammars/languages

• maaaany have been proposed & studied, including their


relationships
• lecture concentrates on
• top-down parsing, in particular
• LL(1)
• recursive descent
• bottom-up parsing
• LR(1)
• SLR
• LALR(1) (the class covered by yacc-style tools)
• grammars typically written in pure BNF

54 / 330
Relationship of some grammar (not language) classes

[Diagram, taken from [Appel, 1998]: nested grammar classes — inside “unambiguous”: LL(0) ⊆ LL(1) ⊆ LL(k) and LR(0) ⊆ SLR ⊆ LALR(1) ⊆ LR(1) ⊆ LR(k); ambiguous grammars lie outside]

55 / 330
General task (once more)

• Given: a CFG (but appropriately restricted)


• Goal: “systematic method” s.t.
1. for every given word w : check syntactic correctness
2. [build AST/representation of the parse tree as side effect]
3. [do reasonable error handling]

56 / 330
Schematic view on “parser machine”

[Diagram: a “parser machine” — finite control with states q0 , . . . , qn , a reading head moving left-to-right over the input . . . if 1 + 2 ∗ ( 3 + 4 ) . . . , and unbounded extra memory (a stack)]

Note: sequence of tokens (not characters)

57 / 330
Derivation of an expression

Input: . . . 1 + 2 ∗ ( 3 + 4 ) . . .

Grammar (factors and terms):

exp → term exp ′ (5)
exp ′ → addop term exp ′ ∣ ε
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ ε
mulop → ∗
factor → ( exp ) ∣ n

Step by step, one sentential form per slide in the original animation (the leftmost non-terminal is expanded in each step; matched terminals are then crossed out as the parser moves on):

exp
⇒ term exp ′
⇒ factor term′ exp ′
⇒ number term′ exp ′
⇒ number exp ′
⇒ number addop term exp ′
⇒ number + term exp ′
⇒ number + factor term′ exp ′
⇒ number + number term′ exp ′
⇒ number + number mulop factor term′ exp ′
⇒ number + number ∗ factor term′ exp ′
⇒ number + number ∗ ( exp ) term′ exp ′
⇒ number + number ∗ ( term exp ′ ) term′ exp ′
⇒ number + number ∗ ( factor term′ exp ′ ) term′ exp ′
⇒ number + number ∗ ( number term′ exp ′ ) term′ exp ′
⇒ number + number ∗ ( number exp ′ ) term′ exp ′
⇒ number + number ∗ ( number addop term exp ′ ) term′ exp ′
⇒ number + number ∗ ( number + term exp ′ ) term′ exp ′
⇒ number + number ∗ ( number + factor term′ exp ′ ) term′ exp ′
⇒ number + number ∗ ( number + number term′ exp ′ ) term′ exp ′
⇒ number + number ∗ ( number + number exp ′ ) term′ exp ′
⇒ number + number ∗ ( number + number ) term′ exp ′
⇒ number + number ∗ ( number + number ) exp ′
⇒ number + number ∗ ( number + number )

96 / 330
Remarks concerning the derivation

Note:
• input = stream of tokens
• there: 1 . . . stands for token class number (for
readability/concreteness), in the grammar: just number
• in full detail: pair of token class and token value ⟨number, 1⟩
Notation:
• underline: the place (occurrence of the non-terminal) where a
production is applied
• crossed out:
• terminal = token is considered treated
• the parser “moves on”
• later implemented as a match or eat procedure

97 / 330
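The match/eat idea can be made concrete as a minimal recursive-descent recognizer for grammar (5); this anticipates the top-down parsing section. The code is an illustrative sketch (a token stream as a list of token-class strings, "n" standing for number), not the lecture's implementation.

```python
class Parser:
    """Recursive-descent recognizer for the LL(1) expression grammar (5)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # "$" plays the role of the end marker
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, t):                 # the "match"/"eat" procedure
        if self.peek() != t:
            raise SyntaxError(f"expected {t}, got {self.peek()}")
        self.pos += 1

    def exp(self):                      # exp → term exp'
        self.term(); self.exp_()

    def exp_(self):                     # exp' → addop term exp' | ε
        if self.peek() in ("+", "-"):
            self.match(self.peek()); self.term(); self.exp_()

    def term(self):                     # term → factor term'
        self.factor(); self.term_()

    def term_(self):                    # term' → mulop factor term' | ε
        if self.peek() == "*":
            self.match("*"); self.factor(); self.term_()

    def factor(self):                   # factor → ( exp ) | n
        if self.peek() == "(":
            self.match("("); self.exp(); self.match(")")
        else:
            self.match("n")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.exp()
        return p.peek() == "$"          # all input consumed
    except SyntaxError:
        return False

# token stream for  1 + 2 ∗ ( 3 + 4 )
ok = accepts(["n", "+", "n", "*", "(", "n", "+", "n", ")"])
```

Each non-terminal becomes one procedure; the ε-alternatives of exp′ and term′ are taken implicitly when the lookahead token is not in the First set of the other alternative.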
Not as a “film” but at a glance: reduction sequence

exp ⇒
term exp ′ ⇒
factor term′ exp ′ ⇒
number term′ exp ′ ⇒
number exp ′ ⇒
number addop term exp ′ ⇒
number + term exp ′ ⇒
number + factor term′ exp ′ ⇒
number + number term′ exp ′ ⇒
number + number mulop factor term′ exp ′ ⇒
number + number ∗ factor term′ exp ′ ⇒
number + number ∗ ( exp ) term′ exp ′ ⇒
number + number ∗ ( term exp ′ ) term′ exp ′ ⇒
...

98 / 330
Best viewed as a tree

exp
├── term
│   ├── factor ── Nr
│   └── term′ ── ε
└── exp ′
    ├── addop ── +
    ├── term
    │   ├── factor ── Nr
    │   └── term′
    │       ├── mulop ── ∗
    │       ├── factor
    │       │   ├── (
    │       │   ├── exp
    │       │   │   ├── term
    │       │   │   │   ├── factor ── Nr
    │       │   │   │   └── term′ ── ε
    │       │   │   └── exp ′
    │       │   │       ├── addop ── +
    │       │   │       ├── term
    │       │   │       │   ├── factor ── Nr
    │       │   │       │   └── term′ ── ε
    │       │   │       └── exp ′ ── ε
    │       │   └── )
    │       └── term′ ── ε
    └── exp ′ ── ε
99 / 330
Best viewed as a tree

exp

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr mulop factor term′

∗ ( exp ) 

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr 

100 / 330
Best viewed as a tree

exp

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr mulop factor term′

∗ ( exp ) 

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr 

101 / 330
Best viewed as a tree

exp

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr mulop factor term′

∗ ( exp ) 

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr 

102 / 330
Best viewed as a tree

exp

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr mulop factor term′

∗ ( exp ) 

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr 

103 / 330
Best viewed as a tree

exp

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr mulop factor term′

∗ ( exp ) 

term exp ′

factor term′ addop term exp ′

Nr  + factor term′ 

Nr 

104 / 330
Non-determinism?

• not a “free” expansion/reduction/generation of some word, but

• reduction of start symbol towards the target word of terminals

exp ⇒∗ 1 + 2 ∗ (3 + 4)
• i.e.: input stream of tokens “guides” the derivation process (at
least it fixes the target)
• but: how much “guidance” does the target word (in general)
give?

137 / 330
Two principal sources of non-determinism here

Using production A → β
S ⇒∗ α1 A α2 ⇒ α1 β α2 ⇒∗ w

• α1 , α2 , β: word of terminals and nonterminals


• w : word of terminals, only
• A: one non-terminal

2 choices to make
1. where, i.e., on which occurrence of a non-terminal in α1 Aα2 to
apply a productiona
2. which production to apply (for the chosen non-terminal).
a
Note that α1 and α2 may contain non-terminals, including further
occurrences of A.

138 / 330
Left-most derivation

• that’s the easy part of non-determinism


• taking care of “where-to-reduce” non-determinism: left-most
derivation
• notation ⇒l
• the example derivation earlier used that

139 / 330
Non-determinism vs. ambiguity
• Note: the “where-to-reduce” non-determinism ≠ ambiguity of
a grammar3
• in a way (“theoretically”): where to reduce next is irrelevant:
• the order in the sequence of derivations does not matter
• what does matter: the derivation tree (aka the parse tree)

Lemma (Left or right, who cares)


S ⇒∗l w iff S ⇒∗r w iff S ⇒∗ w .

• however (“practically”): a (deterministic) parser


implementation: must make a choice

Using production A → β
S ⇒∗l w1 A α2 ⇒ w1 β α2 ⇒∗l w

3
A CFG is ambiguous, if there exists a word (of terminals) with 2 different
parse trees.
141 / 330
What about the “which-right-hand side” non-determinism?

A→β ∣ γ

Is that the correct choice?


S ⇒∗l w1 A α2 ⇒ w1 β α2 ⇒∗l w

• reduction with “guidance”: don’t lose sight of the target w


• “past” is fixed: w = w1 w2
• “future” is not:

Aα2 ⇒l βα2 ⇒∗l w2 or else Aα2 ⇒l γα2 ⇒∗l w2 ?

Needed (minimal requirement):


In such a situation, “future target” w2 must determine which of the
rules to take!
142 / 330
Deterministic, yes, but still impractical

Aα2 ⇒l βα2 ⇒∗l w2 or else Aα2 ⇒l γα2 ⇒∗l w2 ?

• the “target” w2 is of unbounded length!


⇒ impractical, therefore:

Look-ahead of length k
resolve the “which-right-hand-side” non-determinism inspecting only
fixed-length prefix of w2 (for all situations as above)

LL(k) grammars
CF-grammars which can be parsed doing that.a
a
Of course, one can always write a parser that “just makes some decision”
based on looking ahead k symbols. The question is: will that make the parser
accept exactly the words of the grammar, and only those.

143 / 330
Parsing LL(1) grammars
• this lecture: we don’t do LL(k) with k > 1
• LL(1): particularly easy to understand and to implement
(efficiently)
• not as expressive as LR(1) (see later), but still kind of decent

LL(1) parsing principle


Parse from 1) left-to-right (as always anyway), do a 2) left-most
derivation and resolve the “which-right-hand-side” non-determinism
by looking 3) 1 symbol ahead.

• two flavors for LL(1) parsing here (both are top-down parsers)
• recursive descent4
• table-based LL(1) parser
• predictive parsers
4
If one wants to be very precise: it’s recursive descent with one look-ahead
and without back-tracking. It’s the single most common case for recursive
descent parsers. Longer look-aheads are possible, but less common.
Technically, even allowing back-tracking can be done using recursive descent as
principle (even if not done in practice). 144 / 330
Sample expression grammar again

factors and terms


exp → term exp ′ (6)
exp ′ → addop term exp ′ ∣ 
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ 
mulop → ∗
factor → ( exp ) ∣ n

145 / 330
Look-ahead of 1: straightforward, but not trivial

• look-ahead of 1:
• not much of a look-ahead, anyhow
• just the “current token”
⇒ read the next token, and, based on that, decide
• but: what if there are no more symbols?
⇒ read the next token if there is, and decide based on the token
or else the fact that there’s none left5

Example: 2 productions for non-terminal factor


factor → ( exp ) ∣ number

that situation is trivial, but that’s not all to LL(1) . . .

5
Sometimes “special terminal” $ used to mark the end (as mentioned).
146 / 330
Recursive descent: general set-up

1. global variable, say tok, representing the “current token” (or


pointer to current token)
2. parser has a way to advance that to the next token (if there’s
one)

Idea
For each non-terminal nonterm, write one procedure which:
• succeeds, if starting at the current token position, the “rest” of
the token stream starts with a syntactically correct word of
terminals representing nonterm
• fails otherwise

• ignored (for right now): when doing the above successfully,


build the AST for the accepted nonterminal.

147 / 330
Recursive descent

method factor for nonterminal factor


final int LPAREN=1, RPAREN=2, NUMBER=3,
          PLUS=4, MINUS=5, TIMES=6;

void factor() {
  switch (tok) {
    case LPAREN: eat(LPAREN); expr(); eat(RPAREN); break;
    case NUMBER: eat(NUMBER); break;
  }
}

148 / 330
Recursive descent

type token = LPAREN | RPAREN | NUMBER
           | PLUS | MINUS | TIMES

let factor () = (* function for factors *)
  match !tok with
    LPAREN -> eat (LPAREN); expr (); eat (RPAREN)
  | NUMBER -> eat (NUMBER)

149 / 330
Slightly more complex
• previous 2 rules for factor : situation not always as immediate
as that
LL(1) principle (again)
given a non-terminal, the next token must determine the choice of
right-hand sidea
a
It must be the next token/terminal in the sense of First, but it need not be
a token directly mentioned on the right-hand sides of the corresponding rules.

⇒ definition of the First set


Lemma (LL(1) (without nullable symbols))
A reduced context-free grammar without nullable non-terminals is
an LL(1)-grammar iff for all non-terminals A and for all pairs of
productions A → α1 and A → α2 with α1 =/ α2 :

First 1 (α1 ) ∩ First 1 (α2 ) = ∅ .


150 / 330
Common problematic situation

• often: common left factors problematic

if -stmt → if ( exp ) stmt


∣ if ( exp ) stmt else stmt

• requires a look-ahead of (at least) 2


• ⇒ try to rearrange the grammar
1. Extended BNF ([Louden, 1997] suggests that)
if -stmt → if ( exp ) stmt [ else stmt ]

2. left-factoring:

if -stmt → if ( exp ) stmt else−part


else−part →  ∣ else stmt

151 / 330
Recursive descent for left-factored if -stmt

procedure ifstmt()
begin
  match("if");
  match("(");
  exp();
  match(")");
  stmt();
  if token = "else"
  then match("else");
       stmt()
  end
end;

152 / 330
Left recursion is a no-go

factors and terms


exp → exp addop term ∣ term (7)
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

• consider treatment of exp: First(exp)?


• whatever is in First(term), is in First(exp)6
• even if only one (left-recursive) production ⇒ infinite recursion.

Left-recursion
Left-recursive grammar never works for recursive descent.

6
And it would not help to look-ahead more than 1 token either.
153 / 330
Removing left recursion may help

exp → term exp ′
exp ′ → addop term exp ′ ∣ ε
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ ε
mulop → ∗
factor → ( exp ) ∣ n

procedure exp()
begin
  term();
  exp′()
end

procedure exp′()
begin
  case token of
    "+": match("+");
         term();
         exp′()
    "-": match("-");
         term();
         exp′()
  end
end

154 / 330
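The pseudocode above transcribes almost mechanically into a working recognizer. A minimal Java sketch (class and helper names like `tok`/`eat` are our own choices, single characters serve as tokens; it only accepts or rejects, no values or trees yet):

```java
// Recursive-descent recognizer for the right-recursive expression grammar.
// Tokens are single characters: digits, '+', '-', '*', '(' and ')';
// a sentinel '$' marks the end of the input.
public class RDRecognizer {
    private final String input;   // token stream, one char per token
    private int pos = 0;

    private RDRecognizer(String s) { this.input = s + "$"; }

    private char tok() { return input.charAt(pos); }

    private void eat(char expected) {                // "match" in the slides
        if (tok() != expected)
            throw new RuntimeException("expected " + expected + " at " + pos);
        pos++;
    }

    private void exp() { term(); expPrime(); }       // exp -> term exp'

    private void expPrime() {                        // exp' -> addop term exp' | eps
        if (tok() == '+' || tok() == '-') {
            eat(tok()); term(); expPrime();
        }                                            // otherwise: epsilon
    }

    private void term() { factor(); termPrime(); }   // term -> factor term'

    private void termPrime() {                       // term' -> mulop factor term' | eps
        if (tok() == '*') { eat('*'); factor(); termPrime(); }
    }

    private void factor() {                          // factor -> ( exp ) | number
        if (tok() == '(') { eat('('); exp(); eat(')'); }
        else if (Character.isDigit(tok())) eat(tok());
        else throw new RuntimeException("unexpected " + tok());
    }

    public static boolean accepts(String s) {
        try {
            RDRecognizer p = new RDRecognizer(s);
            p.exp();
            p.eat('$');                              // all input consumed?
            return true;
        } catch (RuntimeException e) { return false; }
    }
}
```

Each procedure corresponds to one non-terminal; the ε-alternatives of exp′ and term′ are taken silently whenever the look-ahead token cannot start another addop/mulop.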
Recursive descent works, alright, but . . .

exp
  term
    factor: Nr (1)
    term′: ε
  exp′
    addop: +
    term
      factor: Nr (2)
      term′
        mulop: ∗
        factor: ( exp )
          exp
            term
              factor: Nr (3)
              term′: ε
            exp′
              addop: +
              term
                factor: Nr (4)
                term′: ε
              exp′: ε
        term′: ε
    exp′: ε

. . . who wants this form of trees?


155 / 330
The two expression grammars again

Precedence & assoc.

exp → exp addop term ∣ term
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

• assoc. / precedence ok
• clean and straightforward rules
• left-recursive

No left-rec.

exp → term exp ′
exp ′ → addop term exp ′ ∣ ε
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ ε
mulop → ∗
factor → ( exp ) ∣ n

• no left-recursion
• rec. descent parsing ok
• but: just “unnatural”
• non-straightforward parse-trees
156 / 330
Left-recursive grammar with nicer parse trees

1 + 2 ∗ (3 + 4)
exp
  exp
    term
      factor: Nr (1)
  addop: +
  term
    term
      factor: Nr (2)
    mulop: ∗
    factor: ( exp )
      exp
        exp
          term
            factor: Nr (3)
        addop: +
        term
          factor: Nr (4)

157 / 330
The simple “original” expression grammar (even nicer)
Flat expression grammar
exp → exp op exp ∣ ( exp ) ∣ number
op → + ∣ − ∣ ∗

1 + 2 ∗ (3 + 4)
exp
  exp: Nr (1)
  op: +
  exp
    exp: Nr (2)
    op: ∗
    exp: ( exp )
      exp
        exp: Nr (3)
        op: +
        exp: Nr (4)

158 / 330
Associativity problematic

Precedence & assoc.


exp → exp addop term ∣ term
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

3 + 4 + 5 parsed “as” (3 + 4) + 5

exp
  exp
    exp
      term
        factor: number (3)
    addop: +
    term
      factor: number (4)
  addop: +
  term
    factor: number (5)

159 / 330
Associativity problematic

Precedence & assoc.


exp → exp addop term ∣ term
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

3 − 4 − 5 parsed “as” (3 − 4) − 5

exp
  exp
    exp
      term
        factor: number (3)
    addop: −
    term
      factor: number (4)
  addop: −
  term
    factor: number (5)

160 / 330
Now use the grammar without left-rec (but right-rec
instead)
No left-rec.
exp → term exp ′
exp ′ → addop term exp ′ ∣ 
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ 
mulop → ∗
factor → ( exp ) ∣ n

3 − 4 − 5 parsed “as” 3 − (4 − 5)

exp
  term
    factor: number (3)
    term′: ε
  exp′
    addop: −
    term
      factor: number (4)
      term′: ε
    exp′
      addop: −
      term
        factor: number (5)
        term′: ε
      exp′: ε

162 / 330
But if we need a “left-associative” AST?

• we want (3 − 4) − 5, not 3 − (4 − 5)

exp                              value: −6
  term: number (3)
  exp′                           valsofar = 3
    addop: −
    term: number (4)
    exp′                         valsofar = 3 − 4 = −1
      addop: −
      term: number (5)
      exp′: ε                    valsofar = −1 − 5 = −6

163 / 330
Code to “evaluate” ill-associated such trees correctly

function exp′(valsofar : int) : int;
begin
  if token = '+' or token = '-'
  then
    case token of
      '+': match('+');
           valsofar := valsofar + term();
      '-': match('-');
           valsofar := valsofar - term();
    end case;
    return exp′(valsofar);
  else return valsofar
end;

• extra “accumulator” argument valsofar


• instead of evaluating the expression, one could build the AST
with the appropriate associativity instead:
• instead of valueSoFar, one had rootOfTreeSoFar

164 / 330
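The accumulator idea in runnable form — a hedged Java sketch (names are our own; single digits are the only numbers, and multiplication is omitted for brevity) that evaluates left-associatively even though the procedures follow the right-recursive grammar:

```java
// Left-associative evaluation during recursive descent via an accumulator,
// mirroring the pseudocode above. Tokens are single characters (digits,
// '+', '-', '(', ')'); '$' ends the input.
public class AccumEval {
    private final String input;
    private int pos = 0;

    private AccumEval(String s) { this.input = s + "$"; }

    private char tok() { return input.charAt(pos); }

    private void match(char c) {
        if (tok() != c) throw new RuntimeException("expected " + c);
        pos++;
    }

    private int exp() { return expPrime(term()); }   // exp -> term exp'

    // exp'(valsofar): extend the value computed so far, left to right
    private int expPrime(int valsofar) {
        if (tok() == '+') { match('+'); return expPrime(valsofar + term()); }
        if (tok() == '-') { match('-'); return expPrime(valsofar - term()); }
        return valsofar;                             // epsilon case
    }

    private int term() {                             // here just: number | ( exp )
        if (tok() == '(') { match('('); int v = exp(); match(')'); return v; }
        if (!Character.isDigit(tok())) throw new RuntimeException("number expected");
        int v = Character.digit(tok(), 10);
        pos++;
        return v;
    }

    public static int eval(String s) {
        AccumEval e = new AccumEval(s);
        int v = e.exp();
        e.match('$');
        return v;
    }
}
```

Note how `expPrime` never looks to its right: each new operand is folded into `valsofar` immediately, which is exactly what gives left associativity.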
“Designing” the syntax, its parsing, & its AST

• trade offs:
1. starting from: design of the language, how much of the syntax
is left “implicit” 7
2. which language class? Is LL(1) good enough, or something
stronger wanted?
3. how to parse? (top-down, bottom-up, etc.)
4. parse-tree/concrete syntax trees vs. ASTs

7
Lisp is famous/notorious in that its surface syntax is more or less an
explicit notation for the ASTs. Not that it was originally planned like this . . .
165 / 330
AST vs. CST

• once steps 1.–3. are fixed: parse-trees fixed!


• parse-trees = essence of grammatical derivation process
• often: parse trees only “conceptually” present in a parser
• AST:
• abstractions of the parse trees
• essence of the parse tree
• actual tree data structure, as output of the parser
• typically on-the fly: AST built while the parser parses, i.e.
while it executes a derivation in the grammar

AST vs. CST/parse tree


Parser "builds" the AST data structure while "doing" the parse tree

166 / 330
AST: How “far away” from the CST?
• AST: only thing relevant for later phases ⇒ better be clean . . .
• AST “=” CST?
• building AST becomes straightforward
• possible choice, if the grammar is not designed “weirdly”,

exp                              value: −6
  term: number (3)
  exp′                           valsofar = 3
    addop: −
    term: number (4)
    exp′                         valsofar = 3 − 4 = −1
      addop: −
      term: number (5)
      exp′: ε                    valsofar = −1 − 5 = −6

167 / 330


AST: How “far away” from the CST?

• AST: only thing relevant for later phases ⇒ better be clean . . .


• AST “=” CST?
• building AST becomes straightforward
• possible choice, if the grammar is not designed “weirdly”,

exp
  exp
    exp
      term
        factor: number
    addop: −
    term
      factor: number
  addop: −
  term
    factor: number

slightly more reasonable-looking as an AST (but the underlying grammar
is not directly useful for recursive descent)
168 / 330
AST: How “far away” from the CST?

• AST: only thing relevant for later phases ⇒ better be clean . . .


• AST “=” CST?
• building AST becomes straightforward
• possible choice, if the grammar is not designed “weirdly”,

exp
  exp: number
  op: −
  exp
    exp: number
    op: −
    exp: number

That parse tree looks reasonably clear and intuitive

169 / 330
AST: How “far away” from the CST?
• AST: only thing relevant for later phases ⇒ better be clean . . .
• AST “=” CST?
• building AST becomes straightforward
• possible choice, if the grammar is not designed “weirdly”,

−
  number
  −
    number
    number

Wouldn’t that be the best AST here?


Certainly a minimal number of nodes, which is nice as such.
However, what is missing (which might be interesting) is the fact
that the 2 nodes labelled “−” are expressions!
171 / 330
AST: How “far away” from the CST?
• AST: only thing relevant for later phases ⇒ better be clean . . .
• AST “=” CST?
• building AST becomes straightforward
• possible choice, if the grammar is not designed “weirdly”,
exp: −
  exp: number
  exp: −
    exp: number
    exp: number

Wouldn’t that be the best AST here?


Certainly a minimal number of nodes, which is nice as such.
However, what is missing (which might be interesting) is the fact
that the 2 nodes labelled “−” are expressions!
172 / 330
This is how it’s done (a recipe)

Assume, one has a “non-weird” grammar


exp → exp op exp ∣ ( exp ) ∣ number
op → + ∣ − ∣ ∗

• typically that means: assoc. and precedences etc. are fixed


outside the non-weird grammar
• by massaging it to an equivalent one (no left recursion etc.)
• or (better): use a parser-generator that allows to specify
assoc. etc. without cluttering the grammar.
• if grammar for parsing is not as clear: do a second one
describing the ASTs

Remember (independent from parsing)


BNF describe trees
173 / 330
This is how it’s done (recipe for OO data structures)

Recipe
• turn each non-terminal to an abstract class
• turn each right-hand side of a given non-terminal as
(non-abstract) subclass of the class for considered non-terminal
• chose fields & constructors of concrete classes appropriately
• terminal: concrete class as well, field/constructor for token’s
value

174 / 330
Example in Java

exp → exp op exp ∣ ( exp ) ∣ number


op → + ∣ − ∣ ∗
abstract public class Exp {
}

public class BinExp extends Exp {          // exp -> exp op exp
  public Exp left, right;
  public Op op;
  public BinExp(Exp l, Op o, Exp r) {
    left = l; op = o; right = r; }
}

public class ParentheticExp extends Exp {  // exp -> ( exp )
  public Exp exp;
  public ParentheticExp(Exp e) { exp = e; }
}

public class NumberExp extends Exp {       // exp -> NUMBER
  public int number;                       // token value
  public NumberExp(int i) { number = i; }
}
175 / 330
Example in Java

exp → exp op exp ∣ ( exp ) ∣ number


op → + ∣ − ∣ ∗
abstract public class Op {       // non-terminal = abstract
}

public class Plus extends Op {   // op -> "+"
}

public class Minus extends Op {  // op -> "-"
}

public class Times extends Op {  // op -> "*"
}

176 / 330
3 − (4 − 5)

Exp e = new BinExp(
          new NumberExp(3),
          new Minus(),
          new ParentheticExp(
            new BinExp(new NumberExp(4),
                       new Minus(),
                       new NumberExp(5))));

177 / 330
Pragmatic deviations from the recipe

• it’s nice to have a guiding principle, but no need to carry it too


far . . .
• To the very least: the ParentheticExpr is completely
without purpose: grouping is captured by the tree structure
⇒ that class is not needed
• some might prefer an implementation of

op → + ∣ − ∣ ∗
as simply integers, for instance arranged like
public class BinExp extends Exp {   // exp -> exp op exp
  public Exp left, right;
  public int op;
  public BinExp(Exp l, int o, Exp r) { left = l; op = o; right = r; }
  public final static int PLUS = 0, MINUS = 1, TIMES = 2;
}

and used as BinExp.PLUS etc.


178 / 330
Recipe for ASTs, final words:
• space considerations for AST representations are irrelevant in
most cases
• clarity and cleanness trumps “quick hacks” and “squeezing bits”
• some deviation from the recipe or not, the advice still holds:

Do it systematically
A clean grammar is the specification of the syntax of the language
and thus the parser. It is also a means of communicating with
humans (at least with pros who (of course) can read BNF) what
the syntax is. A clean grammar is a very systematic and structured
thing which consequently can and should be systematically and
cleanly represented in an AST, including judicious and systematic
choice of names and conventions (nonterminal exp represented by
class Exp, non-terminal stmt by class Stmt etc)

• a word on [Louden, 1997]: His C-based representation of the


AST is a bit on the “bit-squeezing” side of things . . .
179 / 330
Extended BNF may help alleviate the pain

BNF:
exp → exp addop term ∣ term
term → term mulop factor ∣ factor

EBNF:
exp → term { addop term }
term → factor { mulop factor }

but remember:
• EBNF just a notation, just because we do not see (left or
right) recursion in { . . . }, does not mean there is no recursion.
• not all parser generators support EBNF
• however: often easy to translate into loops8
• does not offer a general solution if associativity etc. is
problematic
8
That results in a parser which is somehow not “pure recursive descent”. It’s
“recursive descent, but sometimes, let’s use a while-loop, if more convenient
concerning, for instance, associativity”
180 / 330
Pseudo-code representing the EBNF productions

procedure exp;
begin
  term; { recursive call }
  while token = "+" or token = "-"
  do
    match(token);
    term; { recursive call }
  end
end

procedure term;
begin
  factor; { recursive call }
  while token = "*"
  do
    match(token);
    factor; { recursive call }
  end
end

181 / 330
How to produce “something” during RD parsing?

Recursive descent
So far: RD = top-down (parse-)tree traversal via recursive
procedure.a Possible outcome: termination or failure.
a
Modulo the fact that the tree being traversed is “conceptual” and not the
input of the traversal procedure; instead, the traversal is “steered” by stream of
tokens.

• Now: instead of returning “nothing” (return type void or


similar), return some meaningful, and build that up during
traversal
• for illustration: procedure for expressions:
• return type int,
• while traversing: evaluate the expression

182 / 330
Evaluating an exp during RD parsing

function exp() : int;
var temp : int
begin
  temp := term(); { recursive call }
  while token = "+" or token = "-"
    case token of
      "+": match("+");
           temp := temp + term();
      "-": match("-");
           temp := temp - term();
    end
  end
  return temp;
end

183 / 330
Building an AST: expression

function exp() : syntaxTree;
var temp, newtemp : syntaxTree
begin
  temp := term(); { recursive call }
  while token = "+" or token = "-"
    case token of
      "+": match("+");
           newtemp := makeOpNode("+");
           leftChild(newtemp) := temp;
           rightChild(newtemp) := term();
           temp := newtemp;
      "-": match("-");
           newtemp := makeOpNode("-");
           leftChild(newtemp) := temp;
           rightChild(newtemp) := term();
           temp := newtemp;
    end
  end
  return temp;
end

• note: the use of temp and the while loop


184 / 330
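A compact way to see the temp/newtemp pattern at work is to build the tree as a parenthesized string instead of node objects — a sketch under our own naming (digits, `+`, `-`, parentheses only), not the book's code:

```java
// Building a left-associated AST during recursive descent, following the
// temp/newtemp pattern: 'temp' always holds the tree built so far, and each
// new operator node takes that tree as its LEFT child.
public class AstBuild {
    private final String input;
    private int pos = 0;

    private AstBuild(String s) { this.input = s + "$"; }

    private char tok() { return input.charAt(pos); }

    private void match(char c) {
        if (tok() != c) throw new RuntimeException("expected " + c);
        pos++;
    }

    // exp -> term { addop term }
    private String exp() {
        String temp = term();
        while (tok() == '+' || tok() == '-') {
            char op = tok();
            match(op);
            // newtemp: operator node, old tree as left child, next term as right
            temp = "(" + temp + op + term() + ")";
        }
        return temp;
    }

    private String term() {                       // here just: number | ( exp )
        if (tok() == '(') { match('('); String t = exp(); match(')'); return t; }
        if (!Character.isDigit(tok())) throw new RuntimeException("number expected");
        String t = String.valueOf(tok());
        pos++;
        return t;
    }

    public static String parse(String s) {
        AstBuild p = new AstBuild(s);
        String t = p.exp();
        p.match('$');
        return t;
    }
}
```

The nesting of the result string makes the associativity directly visible: the loop produces left-leaning trees even though the grammar it implements is not left-recursive.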
Building an AST: factor

factor → ( exp ) ∣ number

function factor() : syntaxTree;
var fact : syntaxTree
begin
  case token of
    "(": match("(");
         fact := exp();
         match(")");
    number:
         match(number)
         fact := makeNumberNode(number);
    else: error ... { fall through }
  end
  return fact;
end

185 / 330
Building an AST: conditionals

if -stmt → if ( exp ) stmt [else stmt]

function ifStmt() : syntaxTree;
var temp : syntaxTree
begin
  match("if");
  match("(");
  temp := makeStmtNode("if")
  testChild(temp) := exp();
  match(")");
  thenChild(temp) := stmt();
  if token = "else"
  then match("else");
       elseChild(temp) := stmt();
  else elseChild(temp) := nil;
  end
  return temp;
end

186 / 330
Building an AST: remarks and “invariant”

• LL(1) requirement: each procedure/function/method


(covering one specific non-terminal) decides on alternatives,
looking only at the current token
• call of function A for non-terminal A:
• upon entry: first terminal symbol for A in token
• upon exit: first terminal symbol after the unit derived from A
in token
• match("a") : checks for "a" in token and eats the token (if
matched).

187 / 330
LL(1) parsing
• remember LL(1) grammars & LL(1) parsing principle:

LL(1) parsing principle


1 look-ahead enough to resolve “which-right-hand-side”
non-determinism.

• instead of recursion (as in RD): explicit stack


• decision making: collated into the LL(1) parsing table
• LL(1) parsing table:
• finite data structure M (for instance 2 dimensional array)9
M ∶ ΣN × ΣT → ((ΣN × Σ∗ ) + error)

• M[A, a] = (A → w): the production to apply for A when seeing a
• we assume: pure BNF
9
Often, the entry in the parse table does not contain a full rule as here,
needed is only the right-hand-side. In that case the table is of type
ΣN × ΣT → (Σ∗ +error). We follow the convention of this book.
188 / 330
Construction of the parsing table

Table recipe
1. If A → α ∈ P and α ⇒∗ aβ, then add A → α to table entry
M[A, a]
2. If A → α ∈ P and α ⇒∗ ε and S $ ⇒∗ βAaγ (where a is a
token (= terminal) or $), then add A → α to table entry
M[A, a]

Table recipe (again, now using our old friends First and
Follow )
Assume A → α ∈ P.
1. If a ∈ First(α), then add A → α to M[A, a].
2. If α is nullable and a ∈ Follow (A), then add A → α to M[A, a].

189 / 330
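The two rules can be turned into code directly. A Java sketch that fills the table for the if-statement example (First/Follow sets hard-coded as on the corresponding slide; `""` encodes ε; simplified in that First of a right-hand side is taken from its first symbol only, which suffices for this grammar):

```java
import java.util.*;

// Sketch: fill the LL(1) table M[A][a] from given First/Follow sets.
// A table entry is a LIST of right-hand sides, so conflicts show up as
// entries with more than one element.
public class LL1Table {
    static final String EPS = "";
    static final String[][] PRODS = {
        {"stmt", "if-stmt"}, {"stmt", "other"},
        {"if-stmt", "if ( exp ) stmt else-part"},
        {"else-part", "else stmt"}, {"else-part", EPS},
        {"exp", "0"}, {"exp", "1"},
    };
    static final Map<String, Set<String>> FIRST = Map.of(
        "stmt", Set.of("other", "if"),
        "if-stmt", Set.of("if"),
        "else-part", Set.of("else", EPS),
        "exp", Set.of("0", "1"));
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "stmt", Set.of("$", "else"),
        "if-stmt", Set.of("$", "else"),
        "else-part", Set.of("$", "else"),
        "exp", Set.of(")"));

    // First of a right-hand side: First of its first symbol (simplification)
    static Set<String> firstOf(String rhs) {
        if (rhs.equals(EPS)) return Set.of(EPS);
        String x = rhs.split(" ")[0];
        return FIRST.getOrDefault(x, Set.of(x));    // terminal: just itself
    }

    public static Map<String, Map<String, List<String>>> build() {
        Map<String, Map<String, List<String>>> m = new HashMap<>();
        for (String[] p : PRODS) {
            String lhs = p[0], rhs = p[1];
            Set<String> fs = firstOf(rhs);
            for (String a : fs)                     // rule 1: a in First(alpha)
                if (!a.equals(EPS)) entry(m, lhs, a).add(rhs);
            if (fs.contains(EPS))                   // rule 2: alpha nullable
                for (String a : FOLLOW.get(lhs)) entry(m, lhs, a).add(rhs);
        }
        return m;
    }

    static List<String> entry(Map<String, Map<String, List<String>>> m,
                              String A, String a) {
        return m.computeIfAbsent(A, k -> new HashMap<>())
                .computeIfAbsent(a, k -> new ArrayList<>());
    }
}
```

Running this exposes the dangling-else problem as data: the entry M[else-part][else] receives two productions (rule 1 puts else-part → else stmt there, rule 2 the ε-production, since else ∈ Follow(else-part)).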
Example: if-statements

• grammars is left-factored and not left recursive

stmt → if -stmt ∣ other


if -stmt → if ( exp ) stmt else−part
else−part → else stmt ∣ 
exp → 0 ∣ 1

            First       Follow
stmt        other, if   $, else
if -stmt    if          $, else
else−part   else, ε     $, else
exp         0, 1        )

190 / 330
Example: if statement: “LL(1) parse table”

• 2 productions in the “red table entry” M[else−part, else]


• thus: it’s technically not an LL(1) table (and it’s not an LL(1)
grammar)
• note: removing left-recursion and left-factoring did not help!
191 / 330
LL(1) table based algo

while the top of the parsing stack ≠ $
  if the top of the parsing stack is terminal a
     and the next input token = a
  then
    pop the parsing stack;
    advance the input;   // ‘‘match’’
  else if the top of the parsing stack is non-terminal A
     and the next input token is a terminal or $
     and parsing table M[A, a] contains
         production A → X1 X2 . . . Xn
  then (* generate *)
    pop the parsing stack
    for i := n to 1 do
      push Xi onto the stack;
  else error
if the top of the stack = $
then accept
end
192 / 330
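The algorithm above, instantiated for the expression grammar: a Java sketch with the parse table hard-coded as a function (symbols are strings, `n` stands for a number token, `$` is the end marker; an empty right-hand side encodes an ε-production, `null` an error entry — all names are our own):

```java
import java.util.*;

// Table-driven LL(1) parsing, following the algorithm above: an explicit
// stack instead of recursion, and "match" vs. "generate" steps.
public class LL1Parser {

    static String[] rule(String A, char a) {       // the table M[A, a]
        switch (A) {
            case "exp":
                return (a == '(' || a == 'n') ? new String[]{"term", "exp'"} : null;
            case "exp'":
                if (a == '+' || a == '-') return new String[]{"addop", "term", "exp'"};
                if (a == '$' || a == ')') return new String[]{};      // epsilon
                return null;
            case "addop":
                return (a == '+' || a == '-') ? new String[]{String.valueOf(a)} : null;
            case "term":
                return (a == '(' || a == 'n') ? new String[]{"factor", "term'"} : null;
            case "term'":
                if (a == '*') return new String[]{"mulop", "factor", "term'"};
                if (a == '$' || a == ')' || a == '+' || a == '-') return new String[]{};
                return null;
            case "mulop":
                return (a == '*') ? new String[]{"*"} : null;
            case "factor":
                if (a == '(') return new String[]{"(", "exp", ")"};
                if (a == 'n') return new String[]{"n"};
                return null;
        }
        return null;
    }

    public static boolean parse(String input) {
        String w = input + "$";
        int i = 0;
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("exp");
        while (!stack.peek().equals("$")) {
            String top = stack.peek();
            char a = w.charAt(i);
            if ("n+-*()".contains(top)) {           // terminal on top: match
                if (top.charAt(0) != a) return false;
                stack.pop();
                i++;
            } else {                                // non-terminal: generate
                String[] rhs = rule(top, a);
                if (rhs == null) return false;      // error entry
                stack.pop();
                for (int k = rhs.length - 1; k >= 0; k--)
                    stack.push(rhs[k]);             // right-hand side, reversed
            }
        }
        return w.charAt(i) == '$';                  // accept iff input used up
    }
}
```

The ε-entries of exp′ and term′ are exactly their Follow sets, matching the second table-construction rule.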
LL(1): illustration of run of the algo

193 / 330
Expressions
Original grammar
exp → exp addop term ∣ term
addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

left-recursive ⇒ not LL(k)

         First        Follow
exp      (, number    $, )
exp′     +, −, ε      $, )
addop    +, −         (, number
term     (, number    $, ), +, −
term′    ∗, ε         $, ), +, −
mulop    ∗            (, number
factor   (, number    $, ), +, −, ∗

195 / 330
Expressions
Left-rec removed
exp → term exp ′
exp ′ → addop term exp ′ ∣ 
addop → + ∣ −
term → factor term′
term′ → mulop factor term′ ∣ 
mulop → ∗
factor → ( exp ) ∣ n

         First        Follow
exp      (, number    $, )
exp′     +, −, ε      $, )
addop    +, −         (, number
term     (, number    $, ), +, −
term′    ∗, ε         $, ), +, −
mulop    ∗            (, number
factor   (, number    $, ), +, −, ∗
196 / 330
Expressions: LL(1) parse table

197 / 330
Error handling

• at the least: give an understandable error message


• give indication of line / character or region responsible for the
error in the source file
• potentially stop the parsing
• some compilers do error recovery
• give an understandable error message (as minimum)
• continue reading, until it’s plausible to resume parsing ⇒ find
more errors
• however: when finding at least 1 error: no code generation
• observation: resuming after syntax error is not easy

198 / 330
Error messages
• important:
• try to avoid error messages that only occur because of an
already reported error!
• report error as early as possible, if possible at the first point
where the program cannot be extended to a correct program.
• make sure that, after an error, one doesn’t end up in a infinite
loop without reading any input symbols.
• What’s a good error message?
• assume: that the method factor() chooses the alternative
( exp ) but that it, when control returns from method exp(),
does not find a )
• one could report: right parenthesis missing
• But this may often be confusing, e.g. if the program text
is: ( a + b c )
• here the exp() method will terminate after ( a + b, as c
cannot extend the expression. You should therefore rather
give the message error in expression or right
parenthesis missing.
199 / 330
Handling of syntax errors using recursive descent

200 / 330
Syntax errors with sync stack

201 / 330
Procedures for expression with "error recovery"

202 / 330
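The sync-set idea for recursive descent can be sketched as follows (a hedged sketch of the general technique only; the procedure names scan_to/check_input and the exact recovery policy are our own, not necessarily those of the original procedures):

```python
# Panic-mode recovery sketch: a parsing procedure is given the First
# set of its construct and a set of synchronizing tokens; on a
# mismatch it reports once and skips input until it can resync.
def scan_to(tokens, pos, synch):
    """Advance until a synchronizing token (or the end marker $)."""
    while tokens[pos] not in synch and tokens[pos] != "$":
        pos += 1
    return pos

def check_input(tokens, pos, first, follow, errors):
    """Report and skip if the next token cannot start the construct."""
    if tokens[pos] not in first:
        errors.append(f"unexpected {tokens[pos]!r} at position {pos}")
        pos = scan_to(tokens, pos, first | follow)
    return pos
```

For example, at the start of factor() with First = {(, number} and synchronizing tokens {), $}, the parser reports once and resynchronizes instead of looping forever without reading input.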
Outline

1. Parsing
First and follow sets
Top-down parsing
Bottom-up parsing
References

203 / 330
Bottom-up parsing: intro

"R" stands for right-most derivation.

LR(0) • only for very simple grammars


• approx. 300 states for standard programming
languages
• only as intro to SLR(1) and LALR(1)
SLR(1) • expressive enough for most grammars for
standard PLs
• same number of states as LR(0)
• main focus here
LALR(1) • slightly more expressive than SLR(1)
• same number of states as LR(0)
• we look at ideas behind that method as well
LR(1) covers all grammars which can in principle be parsed
by looking at the next token
204 / 330
Grammar classes overview (again)

[Diagram: hierarchy of grammar classes; all classes below are
unambiguous, ambiguous grammars lie outside:
LR(0) ⊂ SLR ⊂ LALR(1) ⊂ LR(1) ⊂ LR(k), and LL(0) ⊂ LL(1) ⊂ LL(k),
with LL(1) contained in LR(1) and LL(k) in LR(k)]

205 / 330
LR-parsing and its subclasses
• right-most derivation (but left-to-right parsing)
• in general: bottom-up parsing more powerful than top-down
• typically: tool-supported (unlike recursive descent, which may
well be hand-coded)
• based on parsing tables + explicit stack
• thankfully: left-recursion no longer problematic
• typical tools: yacc and its descendants (like bison, CUP, etc)
• another name: shift-reduce parser
[Schema: LR parsing table — rows indexed by states, columns by
tokens + non-terminals]
206 / 330
Example grammar

S′ → S
S → ABt7 ∣ . . .
A → t4 t5 ∣ t1 B ∣ . . .
B → t2 t3 ∣ At6 ∣ . . .

• assume: grammar unambiguous


• assume word of terminals t1 t2 . . . t7 and its (unique)
parse-tree

• general agreement for bottom-up parsing:


• start symbol never on the right-hand side of a production
• routinely add another “extra” start-symbol (here S ′ )10

10
That will later be relied upon when constructing a DFA for “scanning” the
stack, to control the reactions of the stack machine. This restriction leads to a
unique, well-defined initial state.
207 / 330
Parse tree for t1 . . . t7

S′

A B

B A

t 1 t2 t3 t4 t5 t6 t7

Remember: parse tree independent from left- or


right-most-derivation
208 / 330
LR: left-to right scan, right-most derivation?
Potentially puzzling question at first sight:
How does the parser produce a right-most derivation, when parsing
left-to-right?

• short answer: parser builds the parse tree bottom-up


• derivation:
• replacement of nonterminals by right-hand sides
• derivation: builds (implicitly) a parse-tree top-down

Right-sentential form: right-most derivation


S ⇒∗r α

Slightly longer answer


LR parser parses from left-to-right and builds the parse tree
bottom-up. When doing the parse, the parser (implicitly) builds a
right-most derivation in reverse (because of bottom-up). 209 / 330
Example expression grammar (from before)

exp → exp addop term ∣ term (8)


addop → + ∣ −
term → term mulop factor ∣ factor
mulop → ∗
factor → ( exp ) ∣ number

exp

term

term factor

factor

number ∗ number

210 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number

211 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number ↪ factor ∗ number

212 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number ↪ factor ∗ number


↪ term ∗ number

213 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number ↪ factor ∗ number


↪ term ∗ number
↪ term ∗ factor

214 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number ↪ factor ∗ number


↪ term ∗ number
↪ term ∗ factor
↪ term

215 / 330
Bottom-up parse: Growing the parse tree

exp

term

term factor

factor

number ∗ number

number ∗ number ↪ factor ∗ number


↪ term ∗ number
↪ term ∗ factor
↪ term
↪ exp

216 / 330
Reduction in reverse = right derivation

Reduction Right derivation

n∗n ↪ factor ∗ n n ∗ n ⇐r factor ∗ n


↪ term ∗ n ⇐r term ∗ n
↪ term ∗ factor ⇐r term ∗ factor
↪ term ⇐r term
↪ exp ⇐r exp

• underlined part:
• different in reduction vs. derivation
• represents the “part being replaced”
• for derivation: right-most non-terminal
• for reduction: indicates the so-called handle (or part of it)
• consequently: all intermediate words are right-sentential forms

217 / 330
Handle

Definition (Handle)
Assume S ⇒∗r αAw ⇒r αβw . A production A → β at position k
following α is a handle of αβw . We write ⟨A → β, k⟩ for such a
handle.
Note:
• w (right of a handle) contains only terminals
• w : corresponds to the future input still to be parsed!
• αβ will correspond to the stack content (β the part touched
by reduction step).
• the ⇒r -derivation-step in reverse:
• one reduce-step in the LR-parser-machine
• adding (implicitly in the LR-machine) a new parent to children
β (= bottom-up!)
• “handle”-part β can be empty (= )
218 / 330
Schematic picture of parser machine (again)

... if 1 + 2 ∗ ( 3 + 4 ) ...

q2

Reading “head”
(moves left-to-right)

q3 ⋱

q2 qn ...

q1 q0
unbounded extra memory (stack)
Finite control

219 / 330
General LR “parser machine” configuration

• Stack:
• contains: terminals + non-terminals (+ $)
• containing: what has been read already but not yet “processed”
• position on the “tape” (= token stream)
• represented here as word of terminals not yet read
• end of “rest of token stream”: $, as usual
• state of the machine
• in the following schematic illustrations: not yet part of the
discussion
• later: part of the parser table, currently we explain without
referring to the state of the parser-engine
• currently we assume: tree and rest of the input given
• the trick ultimately will be: how to achieve the same without
that tree already given (just parsing left-to-right)

220 / 330
Schematic run (reduction: from top to bottom)

$ t1 t 2 t3 t4 t5 t6 t7 $
$ t1 t 2 t3 t4 t5 t6 t7 $
$ t1 t2 t3 t4 t5 t6 t7 $
$ t1 t2 t3 t4 t5 t6 t7 $
$ t1 B t4 t5 t6 t7 $
$A t4 t5 t6 t7 $
$ At4 t5 t6 t7 $
$ At4 t5 t6 t7 $
$ AA t6 t7 $
$ AAt6 t7 $
$ AB t7 $
$ ABt7 $
$S $
$ S′ $

221 / 330
2 basic steps: shift and reduce

• parsers reads input and uses stack as intermediate storage


• so far: no mention of look-ahead (i.e., action depending on the
value of the next token(s)), but that may play a role, as well

Shift Reduce
Move the next input Remove the symbols of the
symbol (terminal) over to right-most subtree from the stack
the top of the stack and replace it by the non-terminal
(“push”) at the root of the subtree
(replace = “pop + push”).
• easy to do if one has the parse tree already!
• reduce step: popped resp. pushed part = right- resp. left-hand
side of handle

222 / 330
Example: LR parsing for addition (given the tree)

E′ → E
E → E +n ∣ n

E′
parse stack input action
1 $ n + n $ shift
E 2 $n + n $ red.: E → n
3 $E + n $ shift
4 $E + n $ shift
5 $E +n $ reduce E → E + n
E 6 $E $ red.: E ′ → E
7 $E ′
$ accept

n + n

note: line 3 vs line 6!; both contain E on top of stack

(right) derivation: reduce-steps “in reverse”
E ′ ⇒ E ⇒ E + n ⇒ n + n
223 / 330
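The run above can be written as plain stack operations. A minimal sketch (Python), with the shift/reduce choices supplied by hand; choosing them automatically is exactly what the LR machinery of the later slides is for:

```python
# Shift and reduce as stack operations, replaying the run for
# E' -> E, E -> E + n | n on input n + n.
def shift(stack, inp):
    stack.append(inp.pop(0))          # move next token onto the stack

def reduce(stack, lhs, rhs):
    assert stack[-len(rhs):] == rhs   # the handle must be on top
    del stack[-len(rhs):]             # pop the right-hand side ...
    stack.append(lhs)                 # ... push the left-hand side

stack, inp = [], ["n", "+", "n"]
shift(stack, inp)                     # $ n      | + n $
reduce(stack, "E", ["n"])             # $ E      | + n $
shift(stack, inp)                     # $ E +    | n $
shift(stack, inp)                     # $ E + n  | $
reduce(stack, "E", ["E", "+", "n"])   # $ E      | $
reduce(stack, "E'", ["E"])            # $ E'     | $   -> accept
```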
Example with -transitions: parentheses

S′ → S
S → (S )S ∣ 
side remark: unlike previous grammar, here:
• production with two non-terminals on the right-hand side
⇒ difference between left-most and right-most derivations (and
mixed ones)

224 / 330
Parentheses: tree, run, and right-most derivation

S′
parse stack input action
1 $ ( ) $ shift
2 $( ) $ reduce S →
S
3 $(S ) $ shift
4 $(S ) $ reduce S →
S S 5 $(S )S $ reduce S → (S )S
6 $S $ reduce S′ → S
7 $ S′ $ accept
(  ) 
Note: the 2 reduction steps for the 
productions
Right-most derivation and right-sentential forms
S ′ ⇒r S ⇒r ( S ) S ⇒r ( S ) ⇒r ( )
225 / 330
Right-sentential forms & the stack
Right-sentential form: right-most derivation
S ⇒∗r α

• right-sentential forms:
• part of the “run”
• but: split between stack and input
parse stack input action
1 $ n+n$ shift E ′ ⇒r E ⇒r E + n ⇒r n + n
2 $n +n$ red.: E → n
+n$
n+n ↪ E +n ↪ E ↪ E′
3 $E shift
4 $E + n$ shift
5 $E +n $ reduce E → E + n
6 $E $ red.: E ′ → E
7 $E′ $ accept

E ′ ⇒r E ⇒r E + n ∥ ∼ E + ∥ n ∼ E ∥ + n ⇒r n ∥ + n ∼∥ n + n
226 / 330
Viable prefixes of right-sentential forms and handles

• right-sentential form: E + n
• viable prefixes of RSF
• prefixes of that RSF on the stack
• here: 3 viable prefixes of that RSF: E , E +, E + n
• handle: remember the definition earlier
• here: for instance in the sentential form n + n
• handle is production E → n on the left occurrence of n in
n + n (let’s write n1 + n2 for now)
• note: in the stack machine:
• the left n1 on the stack
• rest + n2 on the input (unread, because of LR(0))
• if the parser engine detects handle n1 on the stack, it does a
reduce-step
• However (later): reaction depends on current state of the
parser engine

227 / 330
A typical situation during LR-parsing

228 / 330
General design for an LR-engine

• some ingredients clarified up-to now:


• bottom-up tree building as reverse right-most derivation,
• stack vs. input,
• shift and reduce steps
• however: 1 ingredient missing: next step of the engine may
depend on
• top of the stack (“handle”)
• look ahead on the input (but not for LL(0))
• and: current state of the machine

229 / 330
But what are the states of an LR-parser?

General idea:
Construct an NFA (and ultimately DFA) which works on the stack
(not the input). The alphabet consists of terminals and
non-terminals ΣT ∪ ΣN . The language

Stacks(G ) = {α ∣ α may occur on the stack during LR-parsing
              of a sentence in L(G )}

is regular!

230 / 330
LR(0) parsing as easy pre-stage
• LR(0): in practice too simple, but easy conceptual step
towards LR(1), SLR(1) etc.
• LR(1): in practice good enough, LR(k) not used for k > 1

LR(0) item
production with specific “parser position” . in its right-hand side

• . is, of course, a “meta-symbol” (not part of the production)


• For instance: production A → α, where α = βγ, then

LR(0) item
A → β.γ

• item with dot at the beginning: initial item


• item with dot at the end: complete item
231 / 330
Grammar for parentheses: 3 productions
S′ → S
S → (S )S ∣ 

8 items
S′ → .S
S′ → S.
S → .(S )S
S → ( .S ) S
S → ( S. ) S
S → ( S ) .S
S → ( S ) S.
S → .

• note: S →  gives S → . as item (not S → . and S → .)


• side remark: see later, it will turn out: grammar not LR(0)
232 / 330
Grammar for addition: 3 productions
E′ → E
E → E + number ∣ number

(coincidentally also:) 8 items


E′ → .E
E′ → E.
E → .E + number
E → E . + number
E → E + .number
E → E + number.
E → .number
E → number.

• also here: it will turn out: not LR(0) grammar


233 / 330
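Both “8 items” counts can be checked mechanically: a production with a right-hand side of length n contributes n + 1 items, and an ε-production contributes exactly one. A minimal sketch (Python; the tuple encoding of items is our own):

```python
# Enumerate all LR(0) items of a grammar: one item per dot position
# in each production; the empty right-hand side () gives one item.
def items(grammar):
    return [(lhs, rhs, dot)
            for lhs, prods in grammar.items()
            for rhs in prods
            for dot in range(len(rhs) + 1)]

PARENS = {"S'": [("S",)], "S": [("(", "S", ")", "S"), ()]}
ADDITION = {"E'": [("E",)], "E": [("E", "+", "n"), ("n",)]}
```

For the parentheses grammar this yields 2 + 5 + 1 = 8 items, and (coincidentally) also 8 for the addition grammar.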
Finite automata of items
• general set-up: items as states in an automaton
• automaton: “operates” not on the input, but the stack
• automaton either
• first NFA, afterwards made deterministic (subset construction),
or
• directly DFA

States formed of sets of items


In a state marked by/containing item

A → β.γ

• β on the stack
• γ: to be treated next (terminals on the input, but can contain
also non-terminals)
234 / 330
State transitions of the NFA

• X ∈Σ
• two kind of transitions

Terminal or non-terminal Epsilon (X : non-terminal here)


X 
A → α.X η A → αX .η A → α.X η X → .β

• In case X = terminal (i.e. token):


• the left step corresponds to a shift step11
• for non-terminals (see next slide):
• interpretation more complex: non-terminals are officially never
on the input
• note: in that case, item A → α.X η has two (kinds of) outgoing
transitions

11
We have explained shift steps so far as: parser eats one terminal (= input
token) and pushes it on the stack.
235 / 330
Transitions for non-terminals and 
• so far: we never pushed a non-terminal from the input to the
stack, we replace in a reduce-step the right-hand side by a
left-hand side
• however: the replacement in a reduce steps can be seen as
1. pop right-hand side off the stack,
2. instead, “assume” corresponding non-terminal on input &
3. eat the non-terminal and push it on the stack.
• two kind of transitions
1. the -transition correspond to the “pop” half
2. that X transition (for non-terminals) corresponds to that
“eat-and-push” part
• assume production X → β and initial item X → .β

Epsilon (X : non-terminal here)


Terminal or non-terminal
Given production X → β:
X
A → α.X η A → αX .η

A → α.X η X → .β

236 / 330
Initial and final states
initial states:
• we make our lives easier
• we assume (as said): one extra start symbol say S ′
(augmented grammar)
⇒ initial item S ′ → .S as (only) initial state

final states:
• NFA has a specific task, “scanning” the stack, not scanning
the input
• acceptance condition of the overall machine: a bit more
complex
• input must be empty
• stack must be empty except the (new) start symbol
• NFA has a word to say about acceptance
• but not in form of being in an accepting state
• so: no accepting states
• but: accepting action (see later)
237 / 330
NFA: parentheses

S
start S′ → .S S′ → S.




S→ .(S )S S→ . S→ ( S ) S.


( 

S→ ( .S ) S S→ ( S. ) S S
S


)

S→ ( S ) .S

238 / 330
Remarks on the NFA

• colors for illustration


• “reddish”: complete items
• “blueish”: init-item (less important)
• “violet’tish”: both
• init-items
• one per production of the grammar
• that’s where the -transitions go into, but
• with exception of the initial state (with S ′ -production)
no outgoing edges from the complete items

239 / 330
NFA: addition

E
start E′ → .E E′ → E.




 n
 E→ .E + n E→ .n E→ n.

E→ E. + n E→ E + .n E→ E + n.
+ n

240 / 330
Determinizing: from NFA to DFA

• standard subset-construction12
• states then contains sets of items
• especially important: -closure
• also: direct construction of the DFA possible

12
Technically, we don’t require here a total transition function, we leave out
any error state.
241 / 330
DFA: parentheses

0
S′ → .S 1
S
start S→ .(S )S S′ → S.
S→ .

( 2
S→ ( .S ) S 3
S
( S→ .(S )S S→ ( S. ) S
S→ .
)
4
(
S→ ( S ) .S 5
S
S→ .(S )S S→ ( S ) S.
S→ .

242 / 330
DFA: addition

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

243 / 330
Direct construction of an LR(0)-DFA

• quite easy: simply build in the closure already

-closure
• if A → α.Bγ is an item in a state where
• there are productions B → β1 ∣ β2 . . . ⇒
• add items B → .β1 , B → .β2 . . . to the state
• continue that process, until saturation

initial state
S ′ → .S
start
plus closure

244 / 330
Direct DFA construction: transitions

...
A1 → α1 .X β1 A1 → α1 X .β1
X
... A2 → α2 X .β2
A2 → α2 .X β2 plus closure
...

• X : terminal or non-terminal, both treated uniformly


• All items of the form A → α.X β must be included in the
post-state
• and all others (indicated by ". . . ") in the pre-state: not
included
• re-check the previous examples: outcome is the same

245 / 330
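The closure rule and the transition rule can be sketched directly. A minimal version (Python) for the parentheses grammar S′ → S, S → ( S ) S ∣ ε from the earlier slides; the (lhs, rhs, dot) item encoding is our own:

```python
# closure/goto construction for LR(0) item sets.
# An item (lhs, rhs, dot) stands for  lhs -> alpha . beta  with
# dot = |alpha|; the empty tuple () encodes the epsilon production.
GRAMMAR = {"S'": [("S",)], "S": [("(", "S", ")", "S"), ()]}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:  # dot before non-terminal
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)          # add its initial items
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(items, x):
    """Move the dot over x in every item that allows it, then close."""
    return closure({(l, r, d + 1)
                    for (l, r, d) in items if d < len(r) and r[d] == x})

state0 = closure({("S'", ("S",), 0)})   # = state 0 of the DFA
```

state0 contains the three items S′ → .S, S → .( S ) S, and S → ., matching state 0 of the parentheses DFA, and goto(state0, "(") reproduces state 2.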
How does the DFA do the shift/reduce and the rest?

• we have seen: bottom-up parse tree generation


• we have seen: shift-reduce and the stack vs. input
• we have seen: the construction of the DFA

But: how does it hang together?


We need to interpret the “set-of-item-states” in the light of the
stack content and figure out the reaction in terms of
• transitions in the automaton
• stack manipulations (shift/reduce)
• acceptance
• input (apart from shifting) not relevant when doing LR(0)

and the reaction had better be uniquely determined . . .

246 / 330
Stack contents and state of the automaton

• remember: at any given intermediate configuration of


stack/input in a run
1. stack contains words from Σ∗
2. DFA operates deterministically on such words
• the stack contains the “past”: read input (potentially partially
reduced)
• when feeding that “past” on the stack into the automaton
• starting with the oldest symbol (not in a LIFO manner)
• starting with the DFA’s initial state
⇒ stack content determines state of the DFA
• actually: each prefix also determines uniquely a state
• top state:
• state after the complete stack content
• corresponds to the current state of the stack-machine
⇒ crucial when determining reaction

247 / 330
State transition allowing a shift

• assume: top-state (= current state) contains item

X → α.aβ
• construction thus has transition as follows

s t
... ...
a
X→ α.aβ X→ αa.β
... ...

• shift is possible
• if shift is the correct operation and a is terminal symbol
corresponding to the current token: state afterwards = t

248 / 330
State transition: analogous for non-terminals

s t
X → α.Bβ ... B ...
X→ α.Bβ X→ αB.β

249 / 330
State (not transition) where a reduce is possible
• remember: complete items (those with a dot . at the end)
• assume top state s containing complete item A → γ.
s
...
A→ γ.

• a complete right-hand side (“handle”) γ on the stack and thus


done
• may be replaced by the left-hand side A
⇒ reduce step
• builds up (implicitly) new parent node A in the bottom-up
procedure
• Note: A on top of the stack instead of γ:13
• new top state!
• remember the “goto-transition” (shift of a non-terminal)
13
Indirectly only: as said, we remove the handle from the stack, and pretend,
as if the A is next on the input, and thus we “shift” it on top of the stack,
doing the corresponding A-transition.
250 / 330
Remarks: states, transitions, and reduce steps
• ignoring the -transitions (for the NFA)
• there are 2 “kinds” of transitions in the DFA
1. terminals: real shifts
2. non-terminals: “following a reduce step”

No edges to represent (all of) a reduce step!


• if a reduce happens, parser engine changes state!
• however: this state change is not represented by a transition in
the DFA (or NFA for that matter)
• especially not by outgoing edges of complete items

• if the (rhs of the) handle is removed from top stack: ⇒


• “go back to the (top) state before that handle had been
added”: no edge for that
• later: stack notation simply remembers the state as part of its
configuration
251 / 330
Example: LR parsing for addition (given the tree)

E′ → E
E → E +n ∣ n

E′
parse stack input action
1 $ n + n $ shift
E 2 $n + n $ red.: E → n
3 $E + n $ shift
4 $E + n $ shift
5 $E +n $ reduce E → E + n
E 6 $E $ red.: E ′ → E
7 $E ′
$ accept

n + n

note: line 3 vs line 6!; both contain E on top of stack

252 / 330
DFA of addition example

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

• note line 3 vs. line 6


• both stacks = E ⇒ same (top) state in the DFA (state 1)

253 / 330
LR(0) grammars

LR(0) grammar
The top-state alone determines the next step.

• especially: no shift/reduce conflicts in the form shown


• thus: previous number-grammar is not LR(0)

254 / 330
Simple parentheses

A → (A) ∣ a

0
A′ → .A 1
A
start A→ .(A) A′ → A.
A→ .a

( a
3
A→ ( .A ) 2
a
( A→ .(A) A→ a.
A→ .a

A
4 5
A→ ( A. ) A→ (A).
)

• for shift:
• many shift transitions in 1 state allowed
• shift counts as one action (including “shifts” on non-terms)
• but for reduction: also the production must be clear
255 / 330
Simple parentheses is LR(0)

0
A′ → .A 1
A
start A→ .(A) A′ → A.
A→ .a

( a
3
A→ ( .A ) 2
a
( A→ .(A) A→ a.
A→ .a

A
4 5
A→ ( A. ) A→ (A).
)

state possible action
0 only shift
1 only red. (with A′ → A)
2 only red. (with A → a)
3 only shift
4 only shift
5 only red. (with A → ( A ))

256 / 330
NFA for simple parentheses (bonus slide)

A
start A′ → .A A′ → A.




a
A→ .(A) A→ .a A→ a.


 (

A→ ( .A ) A→ ( A. ) A→ (A).
A )

257 / 330
Parsing table for an LR(0) grammar
• table structure: slightly different for SLR(1), LALR(1), and
LR(1) (see later)
• note: the “goto” part: “shift” on non-terminals (only 1
non-terminal A here)
• corresponding to the A-labelled transitions
• see the parser run on the next slide

state action rule input goto


( a ) A
0 shift 3 2 1
1 reduce A′ → A
2 reduce A → a
3 shift 3 2 4
4 shift 5
5 reduce A → ( A )
258 / 330
Parsing of ( ( a ) )

stage parsing stack input action

1 $0 ((a))$ shift
2 $ 0 (3 (a))$ shift
3 $ 0 (3 (3 a))$ shift
4 $ 0 (3 (3 a 2 ))$ reduce A → a
5 $ 0 (3 (3 A 4 ))$ shift
6 $ 0 (3 (3 A 4 )5 )$ reduce A → ( A )
7 $ 0 (3 A 4 )$ shift
8 $ 0 (3 A 4 )5 $ reduce A → ( A )
9 $0 A 1 $ accept

• note: stack on the left


• contains top state information
• in particular: overall top state on the right-most end
• note also: accept action
• reduce wrt. to A′ → A and
• empty stack (apart from $, A, and the state annotation)
⇒ accept
259 / 330
Parse tree of the parse

A′

A
( ( a ) )

• As said:
• the reduction “contains” the parse-tree
• reduction: builds it bottom up
• reduction in reverse: contains a right-most derivation (which is
“top-down”)
• accept action: corresponds to the parent-child edge A′ → A of
the tree
260 / 330
Parsing of erroneous input
• empty slots in the table: “errors”

stage parsing stack input action


1 $0 ((a)$ shift
2 $ 0 (3 (a)$ shift
3 $ 0 (3 (3 a)$ shift
4 $ 0 (3 (3 a 2 )$ reduce A → a
5 $ 0 (3 (3 A 4 )$ shift
6 $ 0 (3 (3 A 4 )5 $ reduce A → ( A )
7 $ 0 (3 A 4 $ ????

stage parsing stack input action


1 $0 ()$ shift
2 $0 (3 )$ ?????

Invariant
important general invariant for LR-parsing: never shift something
“illegal” onto the stack
261 / 330
LR(0) parsing algo, given DFA

let s be the current state, on top of the parse stack


1. s contains A → α.X β, where X is a terminal
• shift X from input to top of stack. the new state pushed on
X
the stack: state t where s Ð→t
• else: if s does not have such a transition: error
2. s contains a complete item (say A → γ.): reduce by rule
A → γ:
• A reduction by S ′ → S: accept if input is empty, error otherwise
• else:
pop: remove γ (including “its” states from the stack)
back up: assume to be in state u which is now head state
push: push A to the stack, new head state t where
A

→ t (in the DFA)

262 / 330
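The algorithm transcribes almost literally into a table-driven loop. A sketch (Python) using the earlier parse table for A′ → A, A → ( A ) ∣ a; a missing table entry surfaces as a KeyError, playing the role of the error case:

```python
# Table-driven LR(0) driver for A' -> A, A -> ( A ) | a.
ACTION = {0: ("shift",), 3: ("shift",), 4: ("shift",),
          1: ("reduce", "A'", ["A"]),
          2: ("reduce", "A", ["a"]),
          5: ("reduce", "A", ["(", "A", ")"])}
GOTO = {(0, "("): 3, (0, "a"): 2, (0, "A"): 1,
        (3, "("): 3, (3, "a"): 2, (3, "A"): 4,
        (4, ")"): 5}

def lr0_parse(tokens):
    states, inp = [0], list(tokens) + ["$"]
    while True:
        act = ACTION[states[-1]]
        if act[0] == "shift":
            sym = inp.pop(0)
            states.append(GOTO[(states[-1], sym)])  # KeyError = error
        else:
            _, lhs, rhs = act
            if lhs == "A'":                # reduce by start production:
                return inp == ["$"]        # accept iff input consumed
            del states[-len(rhs):]         # pop the handle's states
            states.append(GOTO[(states[-1], lhs)])  # "shift" the non-term
```

The run lr0_parse("((a))") goes through exactly the configurations of the ( ( a ) ) table on the earlier slide.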
DFA parentheses again: LR(0)?

S′ → S
S → (S )S ∣ 

0
S′ → .S 1
S
start S→ .(S )S S′ → S.
S→ .

( 2
S→ ( .S ) S 3
S
( S→ .(S )S S→ ( S. ) S
S→ .
)
4
(
S→ ( S ) .S 5
S
S→ .(S )S S→ ( S ) S.
S→ .

263 / 330
DFA parentheses again: LR(0)?

S′ → S
S → (S )S ∣ 

0
S′ → .S 1
S
start S→ .(S )S S′ → S.
S→ .

( 2
S→ ( .S ) S 3
S
( S→ .(S )S S→ ( S. ) S
S→ .
)
4
(
S→ ( S ) .S 5
S
S→ .(S )S S→ ( S ) S.
S→ .

Look at states 0, 2, and 4


264 / 330
DFA addition again: LR(0)?

E′ → E
E → E + number ∣ number

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

265 / 330
DFA addition again: LR(0)?

E′ → E
E → E + number ∣ number

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

How to make a decision in state 1?

266 / 330
Decision? If only we knew the ultimate tree already . . .
. . . especially the parts still to come

E′

parse stack input action


E 1 $ n + n $ shift
2 $n + n $ red.: E → n
3 $E + n $ shift
E 4 $E + n $ shift
5 $E +n $ reduce E → E + n
6 $E $ red.: E ′ → E
7 $E ′
$ accept
n + n
• current stack: represents already known part of the parse tree
• since we don’t have the future parts of the tree yet:
⇒ look-ahead on the input (without building the tree as yet)
• LR(1) and its variants: look-ahead of 1

267 / 330
Addition grammar (again)

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

• How to make a decision in state 1? (here: shift vs. reduce)


⇒ look at the next input symbol (in the token)

268 / 330
One look-ahead

• LR(0), not useful, too weak


• add look-ahead, here of 1 input symbol (= token)
• different variations of that idea (with slight difference in
expresiveness)
• tables slightly changed (compared to LR(0))
• but: still can use the LR(0)-DFAs

269 / 330
Resolving LR(0) reduce/reduce conflicts

LR(0) reduce/reduce conflict:


...
A → α.
...
B → β.

SLR(1) solution: use follow sets of non-terms


• If Follow (A) ∩ Follow (B) = ∅
⇒ next symbol (in token) decides!
• if token ∈ Follow (A) then reduce using A → α
• if token ∈ Follow (B) then reduce using B → β
• ...

270 / 330
Resolving LR(0) shift/reduce conflicts

LR(0) shift/reduce conflict:


...
A → α. b1
... b2
B1 → β1 .b1 γ1
B2 → β2 .b2 γ2

SLR(1) solution: again: use follow sets of non-terms


• If Follow (A) ∩ {b1 , b2 , . . .} = ∅
⇒ next symbol (in token) decides!
• if token ∈ Follow (A) then reduce using A → α, non-terminal A
determines new top state
• if token ∈ {b1 , b2 , . . .} then shift. Input symbol bi determines
new top state
• ...

272 / 330
Revisit addition one more time

0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

• Follow (E ′ ) = {$}
⇒ • shift for +
• reduce with E ′ → E for $ (which corresponds to accept, in
case the input is empty)

274 / 330
SLR(1) algo
let s be the current state, on top of the parse stack
1. s contains A → α.X β, where X is a terminal and X is the next
token on the input, then
• shift X from input to top of stack. the new state pushed on
X
the stack: state t where s Ð
→ t 14
2. s contains a complete item (say A → γ.) and the next token in
the input is in Follow (A): reduce by rule A → γ:
• A reduction by S ′ → S: accept, if input is empty15
• else:
pop: remove γ (including “its” states from the stack)
back up: assume to be in state u which is now head state
push: push A to the stack, new head state t where
A

→t
3. if next token is such that neither 1. or 2. applies: error
14
Cf. to the LR(0) algo: since we checked the existence of the transition
before, the else-part is missing now.
15
Cf. to the LR(0) algo: This happens now only if next token is $. Note that
the follow set of S ′ in the augmented grammar is always only $
275 / 330
LR(0) parsing algo, given DFA

let s be the current state, on top of the parse stack


1. s contains A → α.X β, where X is a terminal
• shift X from input to top of stack. the new state pushed on
X
the stack: state t where s Ð→t
• else: if s does not have such a transition: error
2. s contains a complete item (say A → γ.): reduce by rule
A → γ:
• A reduction by S ′ → S: accept if input is empty, error otherwise
• else:
pop: remove γ (including “its” states from the stack)
back up: assume to be in state u which is now head state
push: push A to the stack, new head state t where
A

→ t (in the DFA)

276 / 330
Parsing table for SLR(1)
0
1
E′ → .E
E E′ → E.
start E→ .E + n
E→ E. + n
E→ .n

n +
2 3 4
n
E→ n. E→ E + .n E→ E + n.

state input goto


n + $ E
0 s ∶2 1
1 s ∶3 accept
2 r ∶ (E → n)
3 s ∶4
4 r ∶ (E → E + n) r ∶ (E → E + n)
for state 2 and 4: n ∉ Follow (E ) 277 / 330
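The table above can be executed by a small driver. A sketch (Python) with the entries encoded by hand; empty slots become absent keys, i.e. errors:

```python
# SLR(1) driver for E' -> E, E -> E + n | n, using the table above.
# ("s", t) = shift to state t; ("r", A, k) = reduce by A -> rhs of
# length k; ("acc",) = accept.
ACTION = {(0, "n"): ("s", 2), (1, "+"): ("s", 3), (1, "$"): ("acc",),
          (2, "+"): ("r", "E", 1), (2, "$"): ("r", "E", 1),
          (3, "n"): ("s", 4),
          (4, "+"): ("r", "E", 3), (4, "$"): ("r", "E", 3)}
GOTO = {(0, "E"): 1}

def slr_parse(tokens):
    states, inp = [0], list(tokens) + ["$"]
    while True:
        act = ACTION.get((states[-1], inp[0]))
        if act is None:                    # empty table slot: error
            return False
        if act[0] == "acc":
            return True
        if act[0] == "s":                  # shift: consume the token
            states.append(act[1])
            inp.pop(0)
        else:                              # reduce: pop k states, goto
            _, lhs, k = act
            del states[-k:]
            states.append(GOTO[(states[-1], lhs)])
```

On n + n + n this driver goes through the same state sequence as the SLR(1) run on the next slide.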
Parsing table: remarks

• SLR(1) parsing table: rather similar-looking to the LR(0) one


• differences: reflect the differences in: LR(0)-algo vs.
SLR(1)-algo
• same number of rows in the table ( = same number of states
in the DFA)
• only: columns “arranged differently”
• LR(0): each state uniformly: either shift or else reduce (with
given rule)
• now: non-uniform, dependent on the input
• it should be obvious:
• SLR(1) may resolve LR(0) conflicts
• but: if the follow-set conditions are not met: SLR(1)
reduce-reduce and/or SLR(1) shift-reduce conflicts
• would result in non-unique entries in SLR(1)-table16

16
by which it, strictly speaking, would no longer be an SLR(1)-table :-)
278 / 330
SLR(1) parser run (= “reduction”)

state input goto


n + $ E
0 s ∶2 1
1 s ∶3 accept
2 r ∶ (E → n)
3 s ∶4
4 r ∶ (E → E + n) r ∶ (E → E + n)

stage parsing stack input action

1 $0 n+n+n$ shift: 2
2 $ 0 n2 +n+n$ reduce: E → n
3 $0 E1 +n+n$ shift: 3
4 $0 E1 +3 n+n$ shift: 4
5 $0 E1 +3 n4 +n$ reduce: E → E + n
6 $0 E1 +n$ shift 3
7 $0 E1 +3 n$ shift 4
8 $0 E1 +3 n4 $ reduce: E → E + n
9 $0 E1 $ accept
279 / 330
Corresponding parse tree

E′
[Parse tree: E′ → E, with E → E + n applied twice along the left
spine and E → n at the left; leaves:]
number + number + number

280 / 330
Revisit the parentheses again: SLR(1)?

Grammar: parentheses (from before)


Follow set
S′ → S
Follow (S) = {), $}
S → (S )S ∣ 

0
S′ → .S 1
S
start S→ .(S )S S′ → S.
S→ .

( 2
S→ ( .S ) S 3
S
( S→ .(S )S S→ ( S. ) S
S→ .
)
4
(
S→ ( S ) .S 5
S
S→ .(S )S S→ ( S ) S.
S→ . 281 / 330
SLR(1) parse table

state input goto


( ) $ S
0 s ∶2 r ∶S → r ∶S → 1
1 accept
2 s ∶2 r ∶S → r ∶S → 3
3 s ∶4
4 s ∶2 r ∶S → r ∶S → 5
5 r ∶ S → (S )S r ∶ S → (S )S

282 / 330
Parentheses: SLR(1) parser run (= “reduction”)

state input goto


( ) $ S
0 s∶2 r ∶S → r ∶S → 1
1 accept
2 s∶2 r ∶S → r ∶S → 3
3 s∶4
4 s∶2 r ∶S → r ∶S → 5
5 r ∶ S → (S )S r ∶ S → (S )S

stage parsing stack input action


1 $0 ()()$ shift: 2
2 $ 0 (2 )()$ reduce: S →
3 $ 0 (2 S 3 )()$ shift: 4
4 $ 0 (2 S 3 )4 ()$ shift: 2
5 $ 0 (2 S 3 )4 (2 )$ reduce: S →
6 $ 0 (2 S 3 )4 (2 S 3 )$ shift: 4
7 $ 0 (2 S 3 )4 (2 S 3 )4 $ reduce: S →
8 $0 (2 S3 )4 (2 S3 )4 S5 $ reduce: S → (S )S
9 $ 0 (2 S 3 )4 S 5 $ reduce: S → (S )S
10 $ 0 S1 $ accept
283 / 330
SLR(k)

• in principle: straightforward: k look-ahead, instead of 1


• rarely used in practice, using First k and Follow k instead of the
k = 1 versions
• tables grow exponentially with k!

284 / 330
Ambiguity & LR-parsing
• in principle: LR(k) (and LL(k)) grammars: unambiguous
• definition/construction: free of shift/reduce and reduce/reduce
conflict (given the chosen level of look-ahead)
• However: ambiguous grammar tolerable, if (remaining)
conflicts can be solved “meaningfully” otherwise:

Additional means of disambiguation:


1. by specifying associativity / precedence “outside” the grammar
2. by “living with the fact” that LR parser (commonly) prioritizes
shifts over reduces
• for the second point (“let the parser decide according to its
preferences”):
• use sparingly and cautiously
• typical example: dangling-else
• even if the parser makes a decision, the programmer may or may
not “understand intuitively” the resulting parse tree (and thus AST)
• grammar with many S/R-conflicts: go back to the drawing
board
285 / 330
Example of an ambiguous grammar

stmt → if -stmt ∣ other


if -stmt → if ( exp ) stmt
∣ if ( exp ) stmt else stmt
exp → 0 ∣ 1

In the following, E for exp, etc.

286 / 330
Simplified conditionals

Simplified “schematic” if-then-else


S → I ∣ other
I → if S ∣ if S else S

Follow-sets
Follow

S {$}
S {$, else}
I {$, else}

• since ambiguous: at least one conflict must be somewhere

287 / 330
DFA of LR(0) items
0 1
S ′ → .S S ′ → S.
S
S → .I 2
I
start S → .other S → I.
I → .if S I I
I → .if S else S
if 4
6
I → if .S
I → if S else .S
I → if .S else S
other
3 S → .I
S → .I
S → other. S → .other
other S → .other
I → .if S
if
I → .if S
I → .if S else S
I → .if S else S

if
S else S
other
5
7
I → if S .
I → if S else S.
I → if S .else S

288 / 330
Simple conditionals: parse table

SLR(1)-parse-table, conflict resolved

Grammar

S → I (1)
∣ other (2)
I → if S (3)
∣ if S else S (4)

state input goto
if else other $ S I
0 s∶4 s∶3 1 2
1 accept
2 r ∶1 r ∶1
3 r ∶2 r ∶2
4 s∶4 s∶3 5 2
5 s∶6 r ∶3
6 s∶4 s∶3 7 2
7 r ∶4 r ∶4

• shift-reduce conflict in state 5: reduce with rule 3 vs. shift (to


state 6)
• conflict there: resolved in favor of shift to 6
• note: extra start state left out from the table

289 / 330
Parser run (= reduction)

state   if    else   other   $        S   I
0       s:4          s:3              1   2
1                            accept
2             r:1            r:1
3             r:2            r:2
4       s:4          s:3              5   2
5             s:6            r:3
6       s:4          s:3              7   2
7             r:4            r:4

stage   parsing stack                  input                     action
1       $₀                             if if other else other $  shift: 4
2       $₀ if₄                         if other else other $     shift: 4
3       $₀ if₄ if₄                     other else other $        shift: 3
4       $₀ if₄ if₄ other₃              else other $              reduce: 2
5       $₀ if₄ if₄ S₅                  else other $              shift: 6
6       $₀ if₄ if₄ S₅ else₆            other $                   shift: 3
7       $₀ if₄ if₄ S₅ else₆ other₃     $                         reduce: 2
8       $₀ if₄ if₄ S₅ else₆ S₇         $                         reduce: 4
9       $₀ if₄ I₂                      $                         reduce: 1
10      $₀ if₄ S₅                      $                         reduce: 3
11      $₀ I₂                          $                         reduce: 1
12      $₀ S₁                          $                         accept
290 / 330
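The table-driven run above is mechanical enough to sketch in a few lines. The following is an illustrative Python driver hard-coding the SLR table from this slide (state and rule numbers as above, the shift/reduce conflict in state 5 already resolved in favor of shift); a minimal sketch, not a production parser.

```python
# Rules: (1) S -> I, (2) S -> other, (3) I -> if S, (4) I -> if S else S.
# For each rule number: left-hand side and length of the right-hand side.
RULES = {1: ('S', 1), 2: ('S', 1), 3: ('I', 2), 4: ('I', 4)}

# Action table from the slide; state 5 on 'else' is the resolved conflict.
ACTION = {
    (0, 'if'): ('s', 4), (0, 'other'): ('s', 3),
    (1, '$'): ('acc',),
    (2, 'else'): ('r', 1), (2, '$'): ('r', 1),
    (3, 'else'): ('r', 2), (3, '$'): ('r', 2),
    (4, 'if'): ('s', 4), (4, 'other'): ('s', 3),
    (5, 'else'): ('s', 6), (5, '$'): ('r', 3),
    (6, 'if'): ('s', 4), (6, 'other'): ('s', 3),
    (7, 'else'): ('r', 4), (7, '$'): ('r', 4),
}
GOTO = {(0, 'S'): 1, (0, 'I'): 2, (4, 'S'): 5, (4, 'I'): 2,
        (6, 'S'): 7, (6, 'I'): 2}

def parse(tokens):
    """Return True iff the token list is accepted."""
    stack, rest = [0], tokens + ['$']
    while True:
        act = ACTION.get((stack[-1], rest[0]))
        if act is None:
            return False                      # parse error
        if act[0] == 's':                     # shift: push state, consume token
            stack.append(act[1]); rest = rest[1:]
        elif act[0] == 'r':                   # reduce: pop |rhs| states, follow goto
            lhs, n = RULES[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                                 # accept
            return True
```

With the shift preference built into state 5, `parse(['if', 'if', 'other', 'else', 'other'])` accepts, replaying exactly the run shown above.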
Parser run, different choice

state   if    else   other   $        S   I
0       s:4          s:3              1   2
1                            accept
2             r:1            r:1
3             r:2            r:2
4       s:4          s:3              5   2
5             s:6            r:3
6       s:4          s:3              7   2
7             r:4            r:4

At stage 5, the parser takes the other resolution of the conflict: reduce with rule 3 instead of shifting.

stage   parsing stack                  input                     action
1       $₀                             if if other else other $  shift: 4
2       $₀ if₄                         if other else other $     shift: 4
3       $₀ if₄ if₄                     other else other $        shift: 3
4       $₀ if₄ if₄ other₃              else other $              reduce: 2
5       $₀ if₄ if₄ S₅                  else other $              reduce: 3
6       $₀ if₄ I₂                      else other $              reduce: 1
7       $₀ if₄ S₅                      else other $              shift: 6
8       $₀ if₄ S₅ else₆                other $                   shift: 3
9       $₀ if₄ S₅ else₆ other₃         $                         reduce: 2
10      $₀ if₄ S₅ else₆ S₇             $                         reduce: 4
11      $₀ I₂                          $                         reduce: 1
12      $₀ S₁                          $                         accept
291 / 330
Parse trees: simple conditions

[Figure: the two parse trees for “if if other else other”. With shift preferred (the conventional resolution), the else attaches to the inner if; in the “wrong” tree it attaches to the outer if.]

standard “dangling else” convention

“an else belongs to the last previous, still open (= dangling) if-clause”
292 / 330
Use of ambiguous grammars

• advantage of ambiguous grammars: often simpler
• if ambiguous: grammar guaranteed to have conflicts
• conflicts can (often) be resolved by specifying precedence and associativity
• supported by tools like yacc and CUP . . .

E′ → E
E → E + E ∣ E ∗ E ∣ number
293 / 330
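With such tools, the ambiguous grammar is kept as-is and the conflicts are resolved by declarations. A hypothetical bison-style fragment for the grammar above might look as follows (directive names are bison's; CUP uses `precedence left` declarations similarly — this is an illustration, not taken from the slides):

```yacc
%token NUMBER
%left '+'          /* lower precedence, left-associative  */
%left '*'          /* higher precedence (declared later)  */
%%
exp : exp '+' exp
    | exp '*' exp
    | NUMBER
    ;
```

Later `%left` lines bind tighter, so `E + E * E` groups as `E + (E * E)`, and a conflict between two operators of equal precedence is resolved as a reduce (left associativity).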
DFA for + and ×

(DFA given as a state listing; transitions in parentheses)

State 0 (start):         E′ → .E   E → .E + E   E → .E ∗ E   E → .n
State 1 (0 —E→ 1):       E′ → E.   E → E. + E   E → E. ∗ E
State 2 (0, 3, 4 —n→ 2): E → n.
State 3 (1, 5, 6 —+→ 3): E → E + .E   E → .E + E   E → .E ∗ E   E → .n
State 4 (1, 5, 6 —∗→ 4): E → E ∗ .E   E → .E + E   E → .E ∗ E   E → .n
State 5 (3 —E→ 5):       E → E + E.   E → E. + E   E → E. ∗ E
State 6 (4 —E→ 6):       E → E ∗ E.   E → E. + E   E → E. ∗ E

294 / 330
States with conflicts

• state 5
  • stack contains . . . E + E
  • for input $: reduce, since shift not allowed from $
  • for input +: reduce, as + is left-associative
  • for input ∗: shift, as ∗ has precedence over +
• state 6
  • stack contains . . . E ∗ E
  • for input $: reduce, since shift not allowed from $
  • for input +: reduce, as ∗ has precedence over +
  • for input ∗: shift, as ∗ is left-associative
• see also the table on the next slide
295 / 330
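The case analysis for states 5 and 6 is exactly the yacc-style precedence/associativity rule, which can be sketched as a small decision function (a simplification: real generators also handle operators without declared precedence):

```python
PREC = {'+': 1, '*': 2}             # higher number binds tighter
ASSOC = {'+': 'left', '*': 'left'}

def resolve_sr(op_on_stack, op_in_input):
    """Resolve a shift/reduce conflict between two operators."""
    if PREC[op_on_stack] != PREC[op_in_input]:
        # higher precedence on the stack: finish that operation first
        return 'reduce' if PREC[op_on_stack] > PREC[op_in_input] else 'shift'
    # equal precedence: associativity decides
    return 'reduce' if ASSOC[op_on_stack] == 'left' else 'shift'
```

For example, `resolve_sr('+', '*')` yields `'shift'` (state 5 on input ∗) and `resolve_sr('*', '+')` yields `'reduce'` (state 6 on input +); a right-associative operator like exponentiation would be entered with `'right'` and resolve equal-precedence conflicts as a shift.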
Parse table + and ×

state   n     +              ∗              $              E
0       s:2                                                1
1             s:3            s:4            accept
2             r: E → n       r: E → n       r: E → n
3       s:2                                                5
4       s:2                                                6
5             r: E → E + E   s:4            r: E → E + E
6             r: E → E ∗ E   r: E → E ∗ E   r: E → E ∗ E

How about exponentiation (written ↑ or ∗∗)?
Defined as right-associative. See exercise.
296 / 330
For comparison: unambiguous grammar for + and ∗

Unambiguous grammar: precedence and left-associativity built in

E′ → E
E → E + T ∣ T
T → T ∗ n ∣ n

     Follow
E′   {$} (as always for the start symbol)
E    {$, +}
T    {$, +, ∗}
297 / 330
DFA for unambiguous + and ×

(DFA given as a state listing; transitions in parentheses)

State 0 (start):      E′ → .E   E → .E + T   E → .T   T → .T ∗ n   T → .n
State 1 (0 —E→ 1):    E′ → E.   E → E. + T
State 2 (1 —+→ 2):    E → E + .T   T → .T ∗ n   T → .n
State 3 (0, 2 —n→ 3): T → n.
State 4 (0 —T→ 4):    E → T.   T → T. ∗ n
State 5 (4, 6 —∗→ 5): T → T ∗ .n
State 6 (2 —T→ 6):    E → E + T.   T → T. ∗ n
State 7 (5 —n→ 7):    T → T ∗ n.
298 / 330
DFA remarks

• the DFA now is SLR(1)
• check states with complete items:
  state 1: Follow(E′) = {$}
  state 4: Follow(E) = {$, +}
  state 6: Follow(E) = {$, +}
  states 3/7: Follow(T) = {$, +, ∗}
• in no case is there a shift/reduce conflict (check the outgoing edges vs. the follow set)
• there’s no reduce/reduce conflict either

299 / 330
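The per-state check just described is easy to automate. A sketch (items given as (lhs, rhs, dot) triples; for simplicity the caller passes the set of terminals on the state's outgoing edges):

```python
def slr_conflicts(items, follow, shift_symbols):
    """Report SLR(1) conflicts in one LR(0) state.

    items: set of (lhs, rhs, dot); an item is complete if dot == len(rhs).
    follow: dict mapping nonterminal -> set of follow terminals.
    shift_symbols: terminals labelling outgoing edges of this state.
    """
    complete = [it for it in items if it[2] == len(it[1])]
    conflicts = []
    # shift/reduce: a complete item whose follow set meets an outgoing terminal
    for lhs, _, _ in complete:
        overlap = follow[lhs] & shift_symbols
        if overlap:
            conflicts.append(('shift/reduce', lhs, overlap))
    # reduce/reduce: two complete items with overlapping follow sets
    for i in range(len(complete)):
        for j in range(i + 1, len(complete)):
            a, b = complete[i][0], complete[j][0]
            overlap = follow[a] & follow[b]
            if overlap:
                conflicts.append(('reduce/reduce', (a, b), overlap))
    return conflicts
```

State 4 above (E → T. together with T → T.∗n) comes out clean, since Follow(E) = {$, +} misses the outgoing ∗; state 5 of the ambiguous if-grammar earlier reports its shift/reduce conflict on else.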
LR(1) parsing

• most general form of LR(1) parsing
• aka: canonical LR(1) parsing
• usually considered unnecessarily “complex” (i.e., LALR(1) or similar is good enough)
• “stepping stone” towards LALR(1)

Basic restriction of SLR(1)
Uses look-ahead, yes, but only after it has built a non-look-ahead DFA (based on LR(0)-items)

A help to remember
SLR(1) is “improved” LR(0) parsing; LALR(1) is “crippled” LR(1) parsing.

300 / 330
Limits of SLR(1) grammars

Assignment grammar fragment (inspired by Pascal; analogous problems in C . . . )

stmt → call-stmt ∣ assign-stmt
call-stmt → identifier
assign-stmt → var := exp
var → var [ exp ] ∣ identifier
exp → var ∣ number

Assignment grammar fragment, simplified

S → id ∣ V := E
V → id
E → V ∣ n
301 / 330
non-SLR(1): Reduce/reduce conflict

302 / 330
Situation can be saved: more look-ahead

303 / 330
LALR(1) (and LR(1)): Being more precise with the follow-sets

• LR(0)-items: too “indiscriminate” wrt. the follow sets
• remember the definition of SLR(1) conflicts
• LR(0)/SLR(1)-states:
  • sets of items,¹⁷ due to the subset construction
  • the items are LR(0)-items
  • follow-sets as an afterthought

Add precision in the states of the automaton already
Instead of using LR(0)-items and, once the LR(0) DFA is done, trying to disambiguate with the help of the follow sets for states containing complete items: make more fine-grained items:
• LR(1) items
• each item with “specific follow information”: look-ahead

¹⁷ That won’t change in principle (but the items get more complex).
304 / 330
LR(1) items

• main idea: simply make the look-ahead part of the item
• obviously: proliferation of states¹⁸

LR(1) items

[A → α.β, a]   (9)

• a: terminal/token, including $

¹⁸ Not to mention if we wanted look-ahead of k > 1, which in practice is not done, though.
305 / 330
LALR(1)-DFA (or LR(1)-DFA)

306 / 330
Remarks on the DFA

• cf. state 2 (seen before)
• in SLR(1): problematic (reduce/reduce), as Follow(V) = {:=, $}
• now: disambiguation, by the added information
• LR(1) would give the same DFA
307 / 330
Full LR(1) parsing

• aka: canonical LR(1) parsing
• the best you can do with 1 look-ahead
• unfortunately: big tables
• pre-stage to LALR(1)-parsing

SLR(1): LR(0)-item-based parsing, with afterwards adding some extra “pre-compiled” info (about follow-sets) to increase expressivity.

LALR(1): LR(1)-item-based parsing, but afterwards throwing away precision by collapsing states, to save space.
308 / 330
LR(1) transitions: arbitrary symbol

• transitions of the NFA (not DFA)

X-transition

[A → α.Xβ, a]  —X→  [A → αX.β, a]
309 / 330
LR(1) transitions: ε

ε-transition

for all B → β₁ ∣ β₂ . . . and all b ∈ First(γa):

[A → α.Bγ, a]  —ε→  [B → .β, b]

including the special case (γ = ε): for all B → β₁ ∣ β₂ . . .

[A → α.B, a]  —ε→  [B → .β, a]
310 / 330
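These two transition rules are what the LR(1) closure computation implements. A sketch (assuming, for brevity, no nullable symbols, so FIRST of a sequence is just FIRST of its first symbol; the grammar and FIRST sets below are the classic illustrative example, not from the slides):

```python
def first_of_seq(seq, first):
    """FIRST of a symbol sequence; assumes no symbol derives epsilon."""
    return first[seq[0]] if seq else set()

def closure(items, prods, first):
    """LR(1) closure: items are (lhs, rhs, dot, lookahead) tuples."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (A, rhs, dot, a) in list(items):
            if dot < len(rhs) and rhs[dot] in prods:  # next symbol: nonterminal B
                B, gamma = rhs[dot], rhs[dot + 1:]
                for b in first_of_seq(gamma + (a,), first):  # b in First(gamma a)
                    for beta in prods[B]:
                        item = (B, beta, 0, b)       # add [B -> .beta, b]
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

# Classic example grammar: S' -> S, S -> C C, C -> c C | d
PRODS = {'S': [('C', 'C')], 'C': [('c', 'C'), ('d',)]}
FIRST = {'c': {'c'}, 'd': {'d'}, '$': {'$'}, 'C': {'c', 'd'}, 'S': {'c', 'd'}}
```

Here `closure({("S'", ('S',), 0, '$')}, PRODS, FIRST)` yields six items: S → .C C carries look-ahead $ (γ is empty), while the C-productions appear with look-aheads c and d, via First(C$).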
LALR(1) vs LR(1)

LR(1)

LALR(1)

311 / 330
Core of LR(1)-states

• actually: not done that way in practice
• main idea: collapse states with the same core

Core of an LR(1) state
= set of LR(0)-items (i.e., ignoring the look-ahead)

• observation: the core of an LR(1) item is an LR(0) item
• 2 LR(1) states with the same core have the same outgoing edges, and those lead to states with the same core
312 / 330
LALR(1)-DFA by collapse

• collapse all states with the same core
• based on the above observations: edges are also consistent
• result: almost like an LR(0)-DFA, but additionally:
  • each individual item still has look-ahead attached: the union of the look-aheads of the “collapsed” items
  • especially for states with complete items: [A → α., {a, b, . . .}] is smaller than the follow set of A
  • ⇒ fewer unresolved conflicts compared to SLR(1)
313 / 330
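The collapse itself is just a grouping by core, with look-aheads unioned per item. A minimal sketch (state representation and names are made up for illustration):

```python
from collections import defaultdict

def lalr_collapse(lr1_states):
    """Merge LR(1) states that share the same core.

    lr1_states: dict state_id -> set of (lhs, rhs, dot, lookahead).
    Returns: dict core -> {(lhs, rhs, dot): union of look-aheads},
    i.e., one merged state per distinct LR(0) core.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for items in lr1_states.values():
        # the core ignores the look-ahead component
        core = frozenset((A, rhs, dot) for (A, rhs, dot, _) in items)
        for (A, rhs, dot, la) in items:
            merged[core][(A, rhs, dot)].add(la)   # union of collapsed look-aheads
    return {core: dict(las) for core, las in merged.items()}
```

For example, two LR(1) states whose common core is {C → d.}, one with look-aheads {c, d} and one with {$}, collapse into a single state whose complete item carries {c, d, $}.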
Concluding remarks on LR / bottom-up parsing

• all constructions (here) based on BNF (not EBNF)
• conflicts (for instance due to ambiguity) can be solved by
  • reformulating the grammar, while generating the same language¹⁹
  • using directives in parser generator tools like yacc, CUP, bison (precedence, associativity)
  • or (not yet discussed): solving them later via semantic analysis
• NB: not all conflicts are solvable, also not in LR(1) (remember: ambiguous languages)

¹⁹ If designing a new language, there’s also the option to massage the language itself. Note also: there are inherently ambiguous languages, for which there is no unambiguous grammar.
314 / 330
LR/bottom-up parsing overview

          advantages                               remarks
LR(0)     defines states also used by SLR          not really used, many conflicts,
          and LALR                                 very weak
SLR(1)    clear improvement over LR(0) in          weaker than LALR(1), but often
          expressiveness, even if using the        good enough. OK for hand-made
          same number of states. Table             parsers for small grammars
          typically with 50K entries
LALR(1)   almost as expressive as LR(1),           method of choice for most
          but number of states as LR(0)!           generated LR-parsers
LR(1)     the method covering all bottom-up,       large number of states (typically
          one-look-ahead parseable grammars        11M of entries), mostly LALR(1)
                                                   preferred

Remember: once the table specific for LR(0), . . . is set up, the parsing algorithms all work the same.

315 / 330
Error handling

• at the least: give an understandable error message
• give an indication of the line / character or region responsible for the error in the source file
• potentially stop the parsing
• some compilers do error recovery:
  • give an understandable error message (as minimum)
  • continue reading, until it’s plausible to resume parsing ⇒ find more errors
  • however: when finding at least 1 error: no code generation
• observation: resuming after a syntax error is not easy

316 / 330
Error handling

Minimal requirement
Upon “stumbling over” an error (= deviation from the grammar): give a reasonable & understandable error message, indicating also the error location. Potentially stop parsing.

• for parse error recovery:
  • one cannot really recover from the fact that the program has an error (a syntax error is a syntax error), but
  • after giving a decent error message:
    • move on, potentially jump over some subsequent code,
    • until the parser can pick up normal parsing again
  • so: meaningful checking of code even following a first error
  • avoid: reporting an avalanche of subsequent spurious errors (those just “caused” by the first error)
  • “picking up” again after semantic errors: easier than after syntactic errors

317 / 330
Error messages
• important:
  • avoid error messages that only occur because of an already reported error!
  • report an error as early as possible, if possible at the first point where the program cannot be extended to a correct program
  • make sure that, after an error, one doesn’t end up in an infinite loop without reading any input symbols
• What’s a good error message?
  • assume that the method factor() chooses the alternative ( exp ), but, when control returns from method exp(), it does not find a )
  • one could report: left parenthesis missing
  • but this may often be confusing, e.g. if the program text is: ( a + b c )
  • here the exp() method will terminate after ( a + b, as c cannot extend the expression; you should therefore rather give the message error in expression or left parenthesis missing
318 / 330
Error recovery in bottom-up parsing
• panic recovery in LR-parsing
  • simple form
  • the only one we briefly look at
• upon error: recovery ⇒
  • pop parts of the stack
  • ignore parts of the input
  • until “on track again”
• but: how to do that?
• additional problem: non-determinism
  • table: constructed conflict-free under normal operation
  • upon error (and clearing parts of the stack + input): no guarantee it’s clear how to continue
  ⇒ heuristic needed (like panic mode recovery)

Panic mode idea
• try a fresh start
• a promising “fresh start” is: a possible goto action
• thus: back off and take the next such goto-opportunity
319 / 330
Possible error situation

    parse stack                input           action
1   $₀ a₁ b₂ c₃ (₄ d₅ e₆       f ) gh . . . $  no entry for f
2   $₀ a₁ b₂ c₃ Bᵥ             gh . . . $      back to normal
3   $₀ a₁ b₂ c₃ Bᵥ g₇          h . . . $       . . .

         input:                    goto:
state    . . .  )   f   g  . . .   . . .  A   B  . . .
. . .
3                                         u   v
4               −   −   −
5               −   −   −
6               −   −   −
. . .
u               −   −   reduce . . .
v               −   −   shift: 7
. . .
321 / 330
Panic mode recovery

Algorithm
1. Pop states from the stack until a state is found with non-empty goto entries.
2. • If there’s a legal action on the current input token from one of the goto-states, push the token on the stack, restart the parse.
   • If there are several such states: prefer a shift to a reduce.
   • Among possible reduce actions: prefer one whose associated non-terminal is least general.
3. If there’s no legal action on the current input token from one of the goto-states: advance the input until there is a legal action (or until the end of input is reached).
322 / 330
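Steps 1 and 3 of the algorithm can be sketched directly; the sketch below deliberately omits the tie-breaking of step 2 (shift over reduce, least general non-terminal), and ACTION/GOTO are whatever tables the parser runs on:

```python
def panic_recover(stack, tokens, pos, action, goto):
    """Panic-mode recovery sketch: pop to a state with goto entries,
    then skip input until some goto target has a legal action.
    Returns the repaired (stack, pos), or None if input runs out.
    Assumes the bottom state of the stack has goto entries."""
    # 1. pop states until the top state has at least one goto entry
    while stack and not any(s == stack[-1] for (s, _) in goto):
        stack.pop()
    targets = [t for (s, _), t in goto.items() if s == stack[-1]]
    # 3. advance the input until one of the goto targets can act on it
    while pos < len(tokens):
        for t in targets:
            if (t, tokens[pos]) in action:
                stack.append(t)      # pretend we arrived in state t
                return stack, pos
        pos += 1
    return None                      # end of input reached, give up
```

On the error situation above this replays the heuristic: pop until a state with goto entries is on top, skip the untreatable tokens, and resume in the chosen goto state.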
Example again

    parse stack                input           action
1   $₀ a₁ b₂ c₃ (₄ d₅ e₆       f ) gh . . . $  no entry for f
2   $₀ a₁ b₂ c₃ Bᵥ             gh . . . $      back to normal
3   $₀ a₁ b₂ c₃ Bᵥ g₇          h . . . $       . . .

• first pop, until in state 3
• then jump over input
  • until the next input g
  • since f and ) cannot be treated
• choose to goto v (shift in that state)
324 / 330
Panic mode may loop forever

stage   parse stack        input      action
1       $₀                 ( n n ) $
2       $₀ (₆              n n ) $
3       $₀ (₆ n₅           n ) $
4       $₀ (₆ factor₄      n ) $
6       $₀ (₆ term₃        n ) $
7       $₀ (₆ exp₁₀        n ) $     panic!
8       $₀ (₆ factor₄      n ) $     been there before: stage 4!
325 / 330
Typical yacc parser table

some variant of the expression grammar again:

command → exp
exp → exp + term ∣ term
term → term ∗ factor ∣ factor
factor → number ∣ ( exp )

326 / 330
Panicking and looping

stage   parse stack        input      action
1       $₀                 ( n n ) $
2       $₀ (₆              n n ) $
3       $₀ (₆ n₅           n ) $
4       $₀ (₆ factor₄      n ) $
6       $₀ (₆ term₃        n ) $
7       $₀ (₆ exp₁₀        n ) $     panic!
8       $₀ (₆ factor₄      n ) $     been there before: stage 4!

• error raised in stage 7, no action possible
• panic:
  1. pop off exp₁₀
  2. state 6 has 3 goto entries:

                     exp   term        factor
     goto to         10    3           4
     action on n     —     reduce r4   reduce r6

  3. no shift, so we need to decide between the two reduces
  4. factor: less general, we take that one
327 / 330
How to deal with looping panic?

• make sure to detect loops (i.e., previously seen “configurations”)
• if a loop is detected: don’t repeat, but do something special, for instance:
  • pop off more from the stack, and try again
  • pop off and insist that a shift is part of the options

Left out (from the book and the pensum)

• more info on error recovery
• especially: more on yacc error recovery
• it’s not pensum, and for the oblig one needs to deal with CUP-specifics (not classic yacc specifics, even if similar) anyhow; error recovery is not part of the oblig (halfway decent error handling is).

328 / 330
Outline

1. Parsing
First and follow sets
Top-down parsing
Bottom-up parsing
References

329 / 330
References I

[Appel, 1998] Appel, A. W. (1998). Modern Compiler Implementation in ML/Java/C. Cambridge University Press.

[Louden, 1997] Louden, K. (1997). Compiler Construction, Principles and Practice. PWS Publishing.

330 / 330
