Parsing Notes
Rupesh Nasre.

[Figure: Compiler phases. Frontend: Lexical Analyzer → (token stream) → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → (intermediate representation). Backend: Machine-Independent Code Optimizer → Code Generator → Machine-Dependent Code Optimizer. A symbol table is shared across the phases.]
Jobs of a Parser
● Read specification given by the language
implementor.
● Get help from lexer to collect tokens.
● Check if the sequence of tokens matches the
specification.
● Declare successful program structure or report
errors in a useful manner.
● Later: Also identify some semantic errors.
Parsing Specification
● In general, one can write a string manipulation
program to recognize program structures (e.g.,
Lab 2).
● However, the string manipulation / recognition
can be generated from a higher level
description.
● We use Context-Free Grammars to specify.
– Precise, easy to understand + modify, correct
translation + error detection, incremental language
development.
CFG
1. A set of terminals called tokens.
   – Terminals are elementary symbols of the parsing language.
2. A set of non-terminals called variables.
   – A non-terminal represents a set of strings of terminals.
3. A set of productions.
   – They define the syntactic rules.
4. A start symbol designated by a non-terminal.

list → list + digit
list → list – digit
list → digit
digit → 0 | 1 | ... | 8 | 9
Productions, Derivations and Languages
list → list + digit
list → list – digit
list → digit
digit → 0 | 1 | ... | 8 | 9

A production has a head (the left side) and a body (the right side).
Parse Tree
list → list + digit
list → list – digit
list → digit
digit → 0 | 1 | ... | 8 | 9

[Figure: parse tree for 3+1-0+8-2+0+1+5.]
● A parse tree is a pictorial representation of operator evaluation.
Precedence
● x#y@z
  – How does a compiler know whether to execute # first or @ first?
  – Think about x+y*z vs. x/y-z.
  – A similar situation arises in nested if-else.
● Humans and compilers may “see” different parse trees.

#define MULT(x) x*x
int main() {
    printf("%d", MULT(3 + 1)); /* expands to 3 + 1*3 + 1 and prints 7, not 16 */
}

What if both the operators are the same?
Same Precedence
● x+y+z: Order of evaluation doesn't matter.
● x-y-z: Order of evaluation matters.
[Figure: left-associated vs. right-associated parse trees for the two expressions.]
Associativity
● Associativity decides the order in which multiple instances of same-precedence operators are applied.
E → E + T | E – T | T
T → T * F | T / F | F
F → (E) | number | name
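Both points can be illustrated with a real parser. A small sketch (not part of the lecture) using Python's stdlib ast module shows which operator ends up at the root of the parse tree:

```python
import ast

def top_op(expr):
    """Name of the operator at the root of the expression's parse tree."""
    return type(ast.parse(expr, mode="eval").body.op).__name__

assert top_op("x+y*z") == "Add"      # * binds tighter: y*z is a subtree of +
assert top_op("(x+y)*z") == "Mult"   # parentheses override precedence

# Left associativity: x-y-z parses as (x-y)-z,
# so the left child of the root is itself a BinOp.
tree = ast.parse("x-y-z", mode="eval").body
assert isinstance(tree.left, ast.BinOp) and isinstance(tree.right, ast.Name)
```

The operator with lower precedence sits higher in the tree, because it is applied last.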
Ambiguous / Unambiguous Grammars
Grammar for simple arithmetic expressions:

E → E + E | E * E | E – E | E / E | (E) | number | name
    Precedence not encoded: a+b*c

E → E + E | E – E | T
T → T * T | T / T | F
F → (E) | number | name
    Associativity not encoded: a–b–c

E → E + T | E – T | T
T → T * F | T / F | F
F → (E) | number | name
    Unambiguous grammar; left recursive, not suitable for top-down parsing

E → T E'
E' → + T E' | - T E' | ϵ
T → F T'
T' → * F T' | / F T' | ϵ
F → (E) | number | name
    Non-left-recursive grammar; can be used for top-down parsing
Sentential Forms
● Example grammar: E → E + E | E * E | – E | (E) | id
● Sentence / string: - (id + id)
● Derivation (leftmost):
  E => - E => - (E) => - (E + E) => - (id + E) => - (id + id)
● Sentential forms: E, -E, -(E), ..., - (id + id)
  – Rightmost:
    E => - E => - (E) => - (E + E) => - (E + id) => - (id + id)

[Figure: step-by-step construction of the parse tree for - (id + id).]
Parse Trees
● Given a parse tree, it is unclear which order was used to derive it.
  – Thus, a parse tree is a pictorial representation of future operator order.
  – It is oblivious to a specific derivation order.
● Every parse tree has a unique leftmost derivation and a unique rightmost derivation.
  – We will use them in uniquely identifying a parse tree.

[Figure: parse tree for - (id + id).]
Context-Free vs Regular
● We can write grammars for regular
expressions.
– Consider our regular expression (a|b)*abb.
– We can write a grammar for it.
A → aA | bA | aB
B → bC
C → bD
D → ϵ
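The claim that this grammar generates exactly the strings matched by (a|b)*abb can be checked mechanically. Below is a small sketch (not from the lecture; the helper names are made up) that enumerates short derivable strings and compares them against Python's re engine:

```python
import re
from itertools import product

GRAMMAR = {
    "A": ["aA", "bA", "aB"],
    "B": ["bC"],
    "C": ["bD"],
    "D": [""],                      # "" stands for epsilon
}

def derives(s, max_len=6):
    """True if the grammar derives terminal string s (breadth-first search
    over sentential forms; uppercase = non-terminal, lowercase = terminal)."""
    seen, frontier = set(), {"A"}
    while frontier:
        nxt = set()
        for form in frontier:
            i = next((k for k, c in enumerate(form) if c.isupper()), None)
            if i is None:                       # all-terminal form
                if form == s:
                    return True
                continue
            for body in GRAMMAR[form[i]]:       # expand leftmost non-terminal
                new = form[:i] + body + form[i + 1:]
                if sum(c.islower() for c in new) <= max_len and new not in seen:
                    seen.add(new)
                    nxt.add(new)
        frontier = nxt
    return False

# The grammar and the regex agree on all strings over {a, b} up to length 6.
for n in range(7):
    for tup in product("ab", repeat=n):
        s = "".join(tup)
        assert derives(s) == bool(re.fullmatch(r"(a|b)*abb", s))
```

The length bound is safe because every production step only adds terminals, never removes them.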
stmt -> matched_stmt | open_stmt
matched_stmt -> if expr then matched_stmt else matched_stmt
             | otherstmt
open_stmt -> if expr then stmt
          | if expr then matched_stmt else open_stmt
if-else Ambiguity
[Figure: two parse trees for a nested if-then-else (condition E2, statements S1, S2), showing the else attaching to either if.]
Classwork: Write an unambiguous grammar for associating else with the first if.
Left Recursion
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some string α.
● Top-down parsing methods cannot handle left-recursive grammars.

A → Aα | β
[Figure: left-recursive derivation tree; the yield is β α α ... α.]
Can we eliminate left recursion?
Left Recursion
The same language can also be generated right-recursively.

A → Aα | β
[Figure: the left-recursive tree for β α α ... α next to the equivalent right-recursive tree, whose spine of R nodes ends in ϵ.]
Left Recursion
A → Aα | β        becomes        A → βB
(left recursive)                 B → αB | ϵ
                                 (right recursive)
[Figure: the two derivation trees, both yielding β α α ... α.]
Left Recursion
A → Aα | β   becomes   A → βB
                       B → αB | ϵ
In general:
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
becomes
A → β1B | β2B | ... | βnB
B → α1B | α2B | ... | αmB | ϵ
Algorithm for
Eliminating Left Recursion
arrange non-terminals in some order A1, ..., An
for i = 1 to n {
    for j = 1 to i-1 {
        replace each production Ai → Ajα by Ai → β1α | ... | βkα,
        where Aj → β1 | ... | βk are the current Aj-productions
    }
    eliminate immediate left recursion among the Ai-productions
}

E → E + T | T                    E → T E'
T → T * F | F        becomes     E' → + T E' | ϵ
F → (E) | name | number          T → F T'
                                 T' → * F T' | ϵ
                                 F → (E) | name | number
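The immediate-elimination step used inside the loop can be sketched in a few lines of Python (a hypothetical helper, not the lecture's code); here a production body is a tuple of symbols and () stands for ϵ:

```python
# A -> A a1 | ... | A am | b1 | ... | bn   becomes
# A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | epsilon

def eliminate_immediate_left_recursion(head, bodies):
    """bodies: list of symbol tuples. Returns a dict of new productions."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the alphas
    others = [b for b in bodies if not b or b[0] != head]       # the betas
    if not recursive:
        return {head: bodies}               # nothing to do
    new = head + "'"
    return {
        head: [b + (new,) for b in others],
        new: [a + (new,) for a in recursive] + [()],            # () is epsilon
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
prods = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
assert prods == {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()]}
```

Applying the same step to T reproduces the T' productions shown above.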
Classwork
● Remove left recursion from the following grammar.

S → Aa | b                 S → Aa | b
A → Ac | Sd | ϵ            A → Ac | Aad | bd | ϵ   (after substituting S in A → Sd)

S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ϵ
Left Factoring
● When the choice between two alternative productions is unclear, rewrite the grammar to defer the decision until enough input is seen.
  – Useful for predictive or top-down parsing.
● A → αβ1 | αβ2
  – Here, the common prefix α can be left factored:
  – A → αA'
  – A' → β1 | β2
● Left factoring doesn't remove ambiguity, e.g., in the dangling if-else grammar.
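One factoring step can be sketched as follows (assumed function names, not the lecture's code; a full algorithm would first group alternatives by their longest common prefix):

```python
def common_prefix(bodies):
    """Longest common prefix of a list of symbol tuples."""
    prefix = []
    for syms in zip(*bodies):
        if all(s == syms[0] for s in syms):
            prefix.append(syms[0])
        else:
            break
    return tuple(prefix)

def left_factor(head, bodies):
    """One left-factoring step over all alternatives of `head`:
    A -> alpha b1 | alpha b2  becomes  A -> alpha A', A' -> b1 | b2."""
    alpha = common_prefix(bodies)
    if not alpha:
        return {head: bodies}               # nothing to factor
    new = head + "'"
    return {
        head: [alpha + (new,)],
        new: [b[len(alpha):] for b in bodies],   # may include () == epsilon
    }

# stmt -> if E then S | if E then S else S
prods = left_factor("stmt",
                    [("if", "E", "then", "S"),
                     ("if", "E", "then", "S", "else", "S")])
assert prods == {
    "stmt": [("if", "E", "then", "S", "stmt'")],
    "stmt'": [(), ("else", "S")],
}
```

Note how the dangling-else grammar stays ambiguous: stmt' still offers both ϵ and "else S" on the same lookahead.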
Non-Context-Free
Language Constructs
● wcw is an example of a language that is not CF.
● In the context of C, what does this language
indicate?
● It indicates that declarations of variables (w)
followed by arbitrary program text (c), and then
use of the declared variable (w) cannot be
specified in general by a CFG.
● Additional rules or passes (semantic phase) are
required to identify declare-before-use cases.
What does the language a^n b^m c^n d^m indicate in C?
Q1 Paper Discussion
● And attendance.
● And assignment marks.
Top-Down Parsing
● Constructs parse-tree for the input string,
starting from root and creating nodes.
● Follows preorder (depth-first).
● Finds leftmost derivation.
● General method: recursive descent.
– Backtracks
● Special case: Predictive (also called LL(k))
– Does not backtrack
– Fixed lookahead
Recursive Descent Parsing
void A() {                                        // Nonterminal A, e.g., A -> BC | Aa | b
    saved = current input position;
    for each A-production A -> X1 X2 ... Xk {
        for i = 1 to k {
            if (Xi is a nonterminal) call Xi();
            else if (Xi matches the current input symbol) advance input;
            else { reset input to saved; break; }     // backtrack, try next production
        }
        if (A matched) break;
    }
}

[Figure: successive steps of parsing id * id with the non-left-recursive expression grammar E → T E', T → F T', T' → * F T' | ϵ.]
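The scheme above can be made concrete. This Python sketch (my own rendering, not the lecture's) parses the textbook grammar S → cAd, A → ab | a with backtracking:

```python
GRAMMAR = {
    "S": [("c", "A", "d")],
    "A": [("a", "b"), ("a",)],      # ordering matters with backtracking
}

def parse(nonterm, inp, pos):
    """Try each production in turn; on failure restore the saved position."""
    for body in GRAMMAR[nonterm]:
        cur, ok = pos, True         # `pos` plays the role of `saved`
        for sym in body:
            if sym in GRAMMAR:                      # nonterminal: recurse
                cur, ok = parse(sym, inp, cur)
            elif cur < len(inp) and inp[cur] == sym:
                cur, ok = cur + 1, True             # terminal matched
            else:
                ok = False
            if not ok:
                break                               # backtrack, next production
        if ok:
            return cur, True
    return pos, False

def accepts(s):
    end, ok = parse("S", s, 0)
    return ok and end == len(s)

assert accepts("cad") and accepts("cabd") and not accepts("cd")
```

On input cad, the alternative A → ab consumes a, fails on b vs. d, and the parser backtracks to try A → a.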
FIRST and FOLLOW
● Top-down (as well as bottom-up) parsing is aided
by FIRST and FOLLOW sets.
– Recall firstpos, followpos from lexing.
● FIRST and FOLLOW allow a parser to choose which production to apply, based on the lookahead.
● FOLLOW can be used in error recovery.
  – While matching a production for A → α, if the input doesn't match FIRST(α), use FOLLOW(A) as the synchronizing token set.
FIRST and FOLLOW
●
FIRST(α) is the set of terminals that begin strings
derived from α, where α is any string of symbols
– If α =>* ϵ, ϵ is also in FIRST(α)
– If A → α | β and FIRST(α) and FIRST(β) are disjoint sets,
then the lookahead decides the production to be applied.
● FOLLOW(A) is the set of terminals that can appear
immediately to the right of A in some sentential form,
where A is a nonterminal.
– If S =>* αAaβ, then FOLLOW(A) contains a.
– If S =>* αABaβ and B =>* ϵ then FOLLOW(A) contains a.
– If A can be the rightmost symbol, we add $ to FOLLOW(A).
This means FOLLOW(S) always contains $.
FIRST and FOLLOW
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

● FIRST(E) = {(, id}      ● FOLLOW(E) = {), $}
● FIRST(T) = {(, id}      ● FOLLOW(T) = {+, ), $}
● FIRST(F) = {(, id}      ● FOLLOW(F) = {+, *, ), $}
● FIRST(E') = {+, ϵ}      ● FOLLOW(E') = {), $}
● FIRST(T') = {*, ϵ}      ● FOLLOW(T') = {+, ), $}
First and Follow
Non-terminal | FIRST  | FOLLOW
E            | (, id  | ), $
E'           | +, ϵ   | ), $
T            | (, id  | +, ), $
T'           | *, ϵ   | +, ), $
F            | (, id  | +, *, ), $

E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id
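The sets in this table can be computed by iterating the defining rules to a fixed point. A sketch (assumed representation: productions as (head, body) tuples, with ϵ written explicitly):

```python
EPS = "ϵ"
GRAMMAR = [
    ("E", ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {h for h, _ in GRAMMAR}

def first_of(seq, FIRST):
    """FIRST of a string of grammar symbols, given current FIRST sets."""
    out = set()
    for sym in seq:
        if sym not in NONTERMS:         # terminal begins the string
            out.add(sym)
            return out
        out |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return out
    out.add(EPS)                        # every symbol could vanish
    return out

FIRST = {n: set() for n in NONTERMS}
FOLLOW = {n: set() for n in NONTERMS}
FOLLOW["E"].add("$")                    # start symbol is followed by $
changed = True
while changed:
    changed = False
    for head, body in GRAMMAR:
        f = first_of(body, FIRST)
        if not f <= FIRST[head]:
            FIRST[head] |= f
            changed = True
        for i, sym in enumerate(body):
            if sym in NONTERMS:
                rest = first_of(body[i + 1:], FIRST)
                add = (rest - {EPS}) | (FOLLOW[head] if EPS in rest else set())
                if not add <= FOLLOW[sym]:
                    FOLLOW[sym] |= add
                    changed = True

assert FIRST["E"] == {"(", "id"} and FOLLOW["E'"] == {")", "$"}
```

The loop terminates because the sets only grow and are bounded by the set of terminals.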
Predictive Parsing Table
Non-terminal | id | + | * | ( | ) | $
E            |    |   |   |   |   |
E'           |    |   |   |   |   |
T            |    |   |   |   |   |
T'           |    |   |   |   |   |
F            |    |   |   |   |   |
Predictive Parsing Table
Non-terminal | id       | +          | *          | (        | )       | $
E            | E → T E' |            |            | E → T E' |         |
E'           |          | E' → +TE'  |            |          | E' → ϵ  | E' → ϵ
T            | T → F T' |            |            | T → F T' |         |
T'           |          | T' → ϵ     | T' → *FT'  |          | T' → ϵ  | T' → ϵ
F            | F → id   |            |            | F → (E)  |         |

E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id
Predictive Parsing Table
for each production A → α
    for each terminal a in FIRST(α)            // process terminals using FIRST
        Table[A][a].add(A → α)
    if ϵ is in FIRST(α) then
        for each terminal b in FOLLOW(A)       // process nullable using FOLLOW
            Table[A][b].add(A → α)
        if $ is in FOLLOW(A) then
            Table[A][$].add(A → α)             // process $ on nullable using FOLLOW
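This construction transcribes almost directly into Python (assumed data layout; FIRST/FOLLOW are hard-coded from the earlier table, with $ already a member of the FOLLOW sets so the third branch folds into the second):

```python
EPS = "ϵ"
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
GRAMMAR = [("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
           ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
           ("F", ("(", "E", ")")), ("F", ("id",))]

def first_of(body):
    """FIRST of a production body; a terminal's FIRST is itself."""
    out = set()
    for sym in body:
        f = FIRST.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

table = {}
for head, body in GRAMMAR:
    f = first_of(body)
    for a in f - {EPS}:                     # terminals via FIRST
        table.setdefault((head, a), []).append((head, body))
    if EPS in f:                            # nullable bodies via FOLLOW
        for b in FOLLOW[head]:              # includes $ already
            table.setdefault((head, b), []).append((head, body))

assert table[("E", "id")] == [("E", ("T", "E'"))]
assert table[("E'", ")")] == [("E'", ())]
assert all(len(v) == 1 for v in table.values())   # grammar is LL(1)
```

The final assertion is exactly the LL(1) test of the next slide: no table cell holds more than one production.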
LL(1) Grammars
● Predictive parsers needing no backtracking
can be constructed for LL(1) grammars.
– First L is left-to-right input scanning.
– Second L is leftmost derivation.
– 1 is the maximum lookahead.
– In general, LL(k) grammars.
– LL(1) covers most programming constructs.
– No left-recursive grammar can be LL(1).
– No ambiguous grammar can be LL(1).
Any example of RR grammar?
LL(1) Grammars
● A grammar is LL(1) iff whenever A → α | β are two distinct productions, the following hold:
  – FIRST(α) and FIRST(β) are disjoint sets.
  – If ϵ is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets; likewise, if ϵ is in FIRST(α), then FIRST(β) and FOLLOW(A) are disjoint sets.
Predictive Parsing Table
Non-terminal | id       | +          | *          | (        | )       | $
E            | E → T E' |            |            | E → T E' |         |
E'           |          | E' → +TE'  |            |          | E' → ϵ  | E' → ϵ
T            | T → F T' |            |            | T → F T' |         |
T'           |          | T' → ϵ     | T' → *FT'  |          | T' → ϵ  | T' → ϵ
F            | F → id   |            |            | F → (E)  |         |

● Each entry contains a single production.
● Empty entries correspond to error states.
● For an LL(1) grammar, each entry uniquely identifies a production or signals an error.
● If there are multiple productions in an entry, then that grammar is not LL(1). However, that does not mean the language is not LL(1): we may be able to transform the grammar into an LL(1) grammar (by eliminating left recursion and by left factoring).
● There exist languages for which no LL(1) grammar exists.
Classwork: Parsing Table
Non-terminal | i              | t | a     | e                 | b     | $
S            | S → i E t S S' |   | S → a |                   |       |
S'           |                |   |       | S' → e S, S' → ϵ  |       | S' → ϵ
E            |                |   |       |                   | E → b |
for (i = 0; i < 10; ++i)
    a[i+1] = a[i];
Bottom-Up Parsing
● A reduction is the reverse of a derivation.
● Therefore, the goal of bottom-up parsing is to construct a derivation in reverse.

id * id  →  F * id  →  T * id  →  T * F  →  T  →  E
(each step reduces a substring matching a production body; read in reverse, this is a rightmost derivation)
E → E + T | T
T → T * F | F
F → (E) | id

I0 (initial state):
E' → . E          (kernel item)
E → . E + T       (non-kernel items, added by item closure:)
E → . T
T → . T * F
T → . F
F → . (E)
F → . id

Classwork:
Find the closure set for T → T * . F
Find the closure set for F → ( E ) .

Augmented start production: E' → E
LR(0) Automaton
1. Find sets of LR(0) items.
2. Build the canonical LR(0) collection.
   – Grammar augmentation (new start symbol)
   – CLOSURE (similar in concept to ϵ-closure in FA)
   – GOTO (similar to state transitions in FA)
3. Construct the FA.

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
I0:                          I1 = GOTO(I0, E):
E' → . E                     E' → E .
E → . E + T                  E → E . + T
E → . T
T → . T * F
T → . F
F → . (E)
F → . id

Classwork:
● Find GOTO(I, +) where I contains {E' → E ., E → E . + T}

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
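CLOSURE and GOTO can be sketched directly from their definitions (my representation, not the lecture's: an item is a (head, body, dot) triple):

```python
# Augmented grammar: E' -> E; E -> E+T | T; T -> T*F | F; F -> (E) | id
GRAMMAR = [("E'", ("E",)),
           ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMS = {h for h, _ in GRAMMAR}

def closure(items):
    """Add B -> . gamma for every B right after a dot, until stable."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for h, b in GRAMMAR:
                    if h == body[dot] and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return items

def goto(items, sym):
    """Advance the dot over sym, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == sym}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
assert len(I0) == 7                  # the kernel item plus 6 non-kernel items
I1 = goto(I0, "E")
assert I1 == {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
```

For the classwork, goto(I1, "+") yields the five items of I6: E → E + . T plus the closure over T and F.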
LR(0) Automaton
[Figure: complete LR(0) automaton for the expression grammar, states I0 to I11, with transitions on E, T, F, +, *, (, ), and id; state I1 accepts on $.]
Is the automaton complete?

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
[Figure: the same LR(0) automaton, tracing the input id *.]
● Initially, the state is 0 (for I0).
● On seeing input symbol id, the state changes to 5 (for I5).
● On seeing input *, there is no action out of state 5.
SLR Parsing using Automaton
[Figure: model of an LR parser: input buffer, a stack containing states like I0, I1, ..., and the parsing table driving the actions.]

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
LR Parsing
let a be the first symbol of w$
push state 0 onto the stack
while (true) {
    let s be the state on top of the stack
    if ACTION[s, a] == shift t {
        push t onto the stack
        let a be the next input symbol
    } else if ACTION[s, a] == reduce A → β {
        pop |β| states off the stack
        let state t now be on top of the stack
        push GOTO[t, A] onto the stack
        output the production A → β
    } else if ACTION[s, a] == accept { break }
    else yyerror()
}
Classwork
● Construct the LR(0) automaton and SLR(1) parsing table for the following grammar.
S → AS | b
A → SA | a

Why do we not have a transition out of state 5 on (?

E' → E
E → E + T | T
T → T * F | F
F → (E) | id
Reduce Entries in the Parsing Table
● Columns for reduce entries are lookaheads.
● Therefore, they need to be in the FOLLOW of the head of the production.
● Thus, if A → α . is the item being applied (that is, α is being reduced to A), then the lookahead (next input symbol) should be in FOLLOW(A).

I5: F → id .
The reduction F → id should be applied only if the next input symbol is in FOLLOW(F), which is {+, *, ), $}.

State | id | +  | *  | ( | )  | $
5     |    | r6 | r6 |   | r6 | r6
l-values and r-values
S → L = R | R
L → *R | id
R → L
l-values and r-values
[Figure: SLR automaton for S' → S; S → L = R | R; L → *R | id; R → L, states I0 to I9.]

Consider state I2 = {S → L . = R, R → L .}.
● Due to the first item (S → L . = R), ACTION[2, =] is shift 6.
● Due to the second item (R → L .), and because FOLLOW(R) contains =, ACTION[2, =] is reduce R → L.
Thus, there is a shift-reduce conflict.

Does that mean the grammar is ambiguous?
Not necessarily; in this case, no. However, our SLR parser is not able to handle it.

S' → S
S → L = R | R
L → *R | id
R → L
LR(0) Automaton and
Shift-Reduce Parsing
● Why can LR(0) automaton be used to make
shift-reduce decisions?
● LR(0) automaton characterizes the strings of
grammar symbols that can appear on the
stack of a shift-reduce parser.
● The stack contents must be a prefix of a right-sentential form [but not all such prefixes are valid].
● If the stack holds β and the rest of the input is x, then a sequence of reductions will take βx to S. Thus, S =>* βx.
Viable Prefixes
● Example
– E =>* F * id => (E) * id
– At various times during the parse, the stack holds
(, (E and (E).
– However, it must not hold (E)*. Why?
– Because (E) is a handle, which must be reduced.
– Thus, (E) is reduced to F before shifting *.
● Thus, not all prefixes of right-sentential forms
can appear on the stack.
● Only those that can appear are viable.
Viable Prefixes
● SLR parsing is based on the fact that LR(0)
automata recognize viable prefixes.
● Item A → β1.β2 is valid for a viable prefix αβ1 if there is a derivation S =>* αAw => αβ1β2w.
● Thus, when αβ1 is on the parsing stack, it suggests we have not yet shifted the whole handle, so we shift (not reduce).
  – Assuming β2 ≠ ϵ.
Homework
● Exercises in Section 4.6.6.
LR(1) Parsing
● Lookahead of 1 symbol.
● We will use similar construction (automaton),
but with lookahead.
● This should increase the power of the parser.
S'→ S
S→L=R|R
L → *R | id
R→L
LR(1) Parsing
● Example grammar:
S' → S
S → C C
C → c C | d
LR(1) Automaton
I0: S' → . S, $            I1: S' → S ., $    ($: accept)
    S → . C C, $
    C → . c C, c/d
    C → . d, c/d
I2: S → C . C, $           I5: S → C C ., $
    C → . c C, $
    C → . d, $
I3: C → c . C, c/d         I8: C → c C ., c/d
    C → . c C, c/d
    C → . d, c/d
I4: C → d ., c/d
I6: C → c . C, $           I9: C → c C ., $
    C → . c C, $
    C → . d, $
I7: C → d ., $

Transitions: GOTO(I0,S)=I1, GOTO(I0,C)=I2, GOTO(I0,c)=I3, GOTO(I0,d)=I4,
GOTO(I2,C)=I5, GOTO(I2,c)=I6, GOTO(I2,d)=I7, GOTO(I3,C)=I8, GOTO(I3,c)=I3,
GOTO(I3,d)=I4, GOTO(I6,C)=I9, GOTO(I6,c)=I6, GOTO(I6,d)=I7.

Same LR(0) item, but different LR(1) items (e.g., C → d . in I4 and I7).

S' → S
S → C C
C → c C | d
LR(1) Grammars
● Using LR(1) items and GOTO functions, we can build the canonical LR(1) parsing table.
● An LR parser using this parsing table is a canonical LR(1) parser.
● If the parsing table does not have multiple actions in any entry, then the given grammar is an LR(1) grammar.
● Every SLR(1) grammar is also LR(1).
  – SLR(1) < LR(1)
  – The corresponding CLR parser may have more states.
CLR(1) Parsing Table
State | c  | d  | $      | S | C
0     | s3 | s4 |        | 1 | 2
1     |    |    | accept |   |
2     | s6 | s7 |        |   | 5
3     | s3 | s4 |        |   | 8
4     | r3 | r3 |        |   |
5     |    |    | r1     |   |
6     | s6 | s7 |        |   | 9
7     |    |    | r3     |   |
8     | r2 | r2 |        |   |
9     |    |    | r2     |   |

S' → S
S → C C          (productions: 1: S → C C, 2: C → c C, 3: C → d)
C → c C | d
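Running the LR driver pseudocode from earlier against this table can be sketched in Python (the action encoding is mine, the table entries are the ones above):

```python
# CLR(1) table for S' -> S; S -> C C (r1); C -> c C (r2); C -> d (r3).
ACTION = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4),
    (1, "$"): ("acc",),
    (2, "c"): ("s", 6), (2, "d"): ("s", 7),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", "C", 1), (4, "d"): ("r", "C", 1),   # C -> d
    (5, "$"): ("r", "S", 2),                            # S -> C C
    (6, "c"): ("s", 6), (6, "d"): ("s", 7),
    (7, "$"): ("r", "C", 1),                            # C -> d
    (8, "c"): ("r", "C", 2), (8, "d"): ("r", "C", 2),   # C -> c C
    (9, "$"): ("r", "C", 2),                            # C -> c C
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def parse(tokens):
    """Table-driven LR driver: the stack holds states only."""
    stack, i = [0], 0
    tokens = tokens + ["$"]
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                  # error entry
        if act[0] == "s":                 # shift state t, consume input
            stack.append(act[1]); i += 1
        elif act[0] == "r":               # reduce A -> beta: pop |beta| states
            _, head, size = act
            del stack[len(stack) - size:]
            stack.append(GOTO[(stack[-1], head)])
        else:                             # accept
            return True

assert parse(list("cdcd")) and parse(list("dd")) and not parse(list("cdc"))
```

The language here is c*d c*d, and the lookahead column decides between the two C → d reduce states (4 vs. 7).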
LR(1) Automaton
[Figure: the same LR(1) automaton, highlighting state pairs with identical LR(0) items: I8 and I9, I4 and I7, I3 and I6.]
● The corresponding SLR parser has seven states.
● Lookahead makes parsing precise.

S' → S
S → C C
C → c C | d
LALR Parsing
● Can we have the memory efficiency of SLR and the precision of LR(1)?
● For C, SLR would have a few hundred states.
● For C, LR(1) would have a few thousand states.
● How about merging states with the same LR(0) items?
● Knuth invented LR in 1965, but it was considered impractical due to memory requirements.
● Frank DeRemer invented SLR and LALR in 1969 (LALR as part of his PhD thesis).
● YACC generates an LALR parser.

Merging I8 with I9, I4 with I7, and I3 with I6 gives a parser with seven states.
● The LALR parser mimics the LR parser on correct inputs.
● On erroneous inputs, LALR may proceed with some reductions while LR has already declared an error.
● However, eventually, LALR is guaranteed to report the error.
LALR(1) Parsing Table (merged from the CLR(1) table)
State | c   | d   | $      | S | C
0     | s36 | s47 |        | 1 | 2
1     |     |     | accept |   |
2     | s36 | s47 |        |   | 5
36    | s36 | s47 |        |   | 89
47    | r3  | r3  | r3     |   |
5     |     |     | r1     |   |
89    | r2  | r2  | r2     |   |

S' → S
S → C C
C → c C | d
State Merging in LALR
● State merging with common kernel items does not produce shift-reduce conflicts.
● A merge may produce a reduce-reduce conflict.

S' → S
S → aAd | bBd | aBe | bAe       Merged state: {A → c., d/e and B → c., d/e}
A → c                           (reduce-reduce conflict)
B → c
Using Ambiguous Grammars
S' → S
S → iSeS | iS | a       (productions: 1: S → iSeS, 2: S → iS, 3: S → a)

[Figure: LR(0) automaton for the grammar, states I0 to I6; state 4 contains both S → iS . and S → iS . eS.]

State | i  | e     | a  | $      | S
0     | s2 |       | s3 |        | 1
1     |    |       |    | accept |
2     | s2 |       | s3 |        | 4
3     |    | r3    |    | r3     |
4     |    | s5/r2 |    | r2     |
5     | s2 |       | s3 |        | 6
6     |    | r1    |    | r1     |
Summary
● Precedence / Associativity
● Parse Trees
● Left Recursion
● Left Factoring
● Top-Down Parsing
● LL(1) Grammars
● Bottom-Up Parsing
● Shift-Reduce Parsers
● LR(0), SLR
● LR(1), LALR