Compiler Construction: Parsing: Mandar Mitra

The document discusses parsing in compiler construction: context-free grammars and ambiguity; top-down parsing (recursive descent and predictive parsing), with FIRST and FOLLOW sets used to construct LL(1) parsing tables; bottom-up parsing with SLR, canonical LR and LALR tables; error recovery; and Yacc.

Uploaded by

Arwa Khallouf
Copyright
© © All Rights Reserved
Compiler Construction: Parsing

Mandar Mitra

Indian Statistical Institute

M. Mitra (ISI) Parsing 1 / 33


Context-free grammars
Reference: Section 4.2
Formal way of specifying rules about the structure/syntax of a
program
terminals - tokens
non-terminals - represent higher-level structures of a program
start symbol, productions

Example:
E → E op E | (E) | − E | id
op → + | − | ∗ | / | %
NOTE : recall token vs. lexeme difference

Derivation: starting from the start symbol, use productions to
generate a string (sequence of tokens)
Parse tree: pictorial representation of a derivation



Ambiguity
Reference: Section 4.2, 4.3
Left-most derivation: at each step, replace the left-most non-terminal
Ambiguous grammar: G is ambiguous if some string has more than one
left-most (or right-most) derivation
ALT: G is ambiguous if more than one parse tree can be constructed for some string

Examples:
1. E → E + E | E ∗ E
2. stmt → if expr then stmt |
          if expr then stmt else stmt

[ SOLUTION: stmt → matched | unmatched
  matched → if expr then matched else matched | other
  unmatched → if expr then stmt |
              if expr then matched else unmatched
  (other: any statement that is not a conditional) ]



Top-down parsing - I
Reference: Section 4.4
Recursive descent parsing:
Corresponds to finding a leftmost derivation for an input string
Equivalent to constructing parse tree in pre-order
Example:
Grammar: S → cAd A → ab | a
Input: cad

Problems:
1. backtracking involved (⇒ buffering of tokens required)
2. left recursion will lead to infinite looping
3. left factors may cause several backtracking steps
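
A minimal sketch of recursive descent with backtracking for the example grammar above. It is only a recognizer, and it assumes one character per token; the function names are illustrative. Generators are used so that, when the caller rejects one alternative for A, the next alternative is tried from the same input position.

```python
# Toy backtracking recursive-descent recognizer for the slide's grammar:
#   S -> c A d        A -> a b | a
# Each function yields every input position at which its non-terminal can
# end; asking a generator for its next value is the backtracking step.

def parse_A(tokens, pos):
    if tokens[pos:pos + 2] == ["a", "b"]:   # first alternative: A -> a b
        yield pos + 2
    if tokens[pos:pos + 1] == ["a"]:        # fall back to: A -> a
        yield pos + 1

def parse_S(tokens, pos):
    if tokens[pos:pos + 1] == ["c"]:        # S -> c A d
        for after_a in parse_A(tokens, pos + 1):
            if tokens[after_a:after_a + 1] == ["d"]:
                yield after_a + 1

def accepts(text):
    tokens = list(text)                     # assumption: one character per token
    return any(end == len(tokens) for end in parse_S(tokens, 0))
```

For instance, `accepts("cad")` and `accepts("cabd")` both succeed, while `accepts("cab")` tries both alternatives for A and then fails.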



Top-down parsing - I
Reference: Section 4.3
Left recursion: G is said to be left recursive if, for some non-terminal A, A ⇒⁺ Aα
Simple case I:
    A → Aα | β      ⇒      A → βA′
                            A′ → αA′ | ϵ
Simple case II:
    A → Aα1 | Aα2 | . . . | Aαm | β1 | . . . | βn
    ⇒
    A → β1 A′ | . . . | βn A′
    A′ → α1 A′ | α2 A′ | . . . | αm A′ | ϵ
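
The two simple cases can be sketched as one small function. The representation is an assumption made for illustration: each alternative is a tuple of symbols, ϵ is the empty tuple, and the new non-terminal is named by appending a prime (assumed not to clash with an existing name).

```python
EPSILON = ()  # the empty right-hand side, written as an empty tuple

def eliminate_immediate(nt, alternatives):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn  as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | epsilon
    Returns a dict mapping each non-terminal to its new alternatives."""
    # alphas: tails of the left-recursive alternatives (A alpha)
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]
    # betas: the alternatives that do not start with A
    betas = [alt for alt in alternatives if alt[:1] != (nt,)]
    if not alphas:
        return {nt: list(alternatives)}     # nothing to do
    new_nt = nt + "'"                        # hypothetical fresh name
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
result = eliminate_immediate("E", [("E", "+", "T"), ("T",)])
```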



Top-down parsing - I

General case:
Input: G without any cycles (A ⇒⁺ A) or ε-productions
Output: equivalent grammar with no left recursion
Algorithm:
    Let the non-terminals be A1, . . . , An.
    for i = 1 to n do
        for j = 1 to i − 1 do
            replace Ai → Aj γ by Ai → δ1 γ | δ2 γ | . . . | δk γ
                where Aj → δ1 | δ2 | . . . | δk are the current Aj productions
        end for
        Eliminate immediate left recursion for Ai.
    end for
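
A sketch of the general algorithm under the same illustrative representation (alternatives as tuples of symbols, ϵ as the empty tuple, primed names for new non-terminals). The sample grammar S → A a | b, A → A c | S d | e is an assumption chosen so that the left recursion is indirect; it has no cycles and no ε-productions, as the algorithm requires.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict non-terminal -> list of alternatives (tuples of symbols).
    order: the non-terminals A1..An in the order the algorithm visits them."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[:1] == (aj,):
                    # replace Ai -> Aj gamma by Ai -> delta gamma for every
                    # current production Aj -> delta
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # eliminate immediate left recursion for Ai
        alphas = [alt[1:] for alt in g[ai] if alt[:1] == (ai,)]
        betas = [alt for alt in g[ai] if alt[:1] != (ai,)]
        if alphas:
            new_nt = ai + "'"                # hypothetical fresh name
            g[ai] = [beta + (new_nt,) for beta in betas]
            g[new_nt] = [alpha + (new_nt,) for alpha in alphas] + [()]
    return g

sample = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("e",)],
}
result = eliminate_left_recursion(sample, ["S", "A"])
```

Here the S d alternative of A is first expanded to A a d | b d, after which the immediate left recursion on A is removed.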



Top-down parsing - I

Left factoring:
Example: stmt → if ( expr ) stmt |
                if ( expr ) stmt else stmt

Algorithm:
    while left factors exist do
        for each non-terminal A do
            Find longest prefix α common to ≥ 2 rules
            Replace A → αβ1 | . . . | αβn | . . .
                 by A → αA′ | . . .
                    A′ → β1 | . . . | βn
        end for
    end while
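
The loop above might be sketched as follows. The grammar representation (dict of tuples, ϵ as the empty tuple) and the primed naming scheme for A′ are illustrative assumptions, and the sketch factors one prefix per non-terminal per pass.

```python
def common_prefix(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(grammar):
    g = {nt: list(alts) for nt, alts in grammar.items()}
    changed = True
    while changed:                           # while left factors exist
        changed = False
        for nt in list(g):                   # snapshot: g grows inside the loop
            alts = g[nt]
            best = ()                        # longest prefix shared by >= 2 rules
            for i in range(len(alts)):
                for j in range(i + 1, len(alts)):
                    p = common_prefix(alts[i], alts[j])
                    if len(p) > len(best):
                        best = p
            if not best:
                continue
            new_nt = nt + "'"
            while new_nt in g:               # keep the new name unique
                new_nt += "'"
            factored = [a for a in alts if a[:len(best)] == best]
            rest = [a for a in alts if a[:len(best)] != best]
            g[nt] = [best + (new_nt,)] + rest
            g[new_nt] = [a[len(best):] for a in factored]
            changed = True
    return g

IF = ("if", "(", "expr", ")", "stmt")
result = left_factor({"stmt": [IF, IF + ("else", "stmt")]})
```

On the slide's example this yields stmt → if ( expr ) stmt stmt′ and stmt′ → ϵ | else stmt.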



Top-down parsing - II
Reference: Section 4.4
Predictive parsing: recursive descent parsing without backtracking
Principle: Given current input symbol and non-terminal, we should be
able to determine which production is to be used
Example: stmt → if ( expr ) . . . |
while . . . |
for . . .



Top-down parsing - II

Implementation: use transition diagrams (1 per non-terminal)

A → X1 X2 . . . Xn
[Diagram: start state s, then one edge per symbol labelled X1, X2, . . . , Xn, ending in final state F]

1. If Xi is a terminal, match with next input token and advance to next
   state.
2. If Xi is a non-terminal, go to the transition diagram for Xi, and
   continue. On reaching the final state of that transition diagram,
   advance to next state of current transition diagram.
Example: E → E+T |T
T → T ∗F |F
F → (E) | id



Top-down parsing - II

Non-recursive implementation:
[Diagram: input buffer with current symbol a; stack with X on top; parsing table M]
Table: 2-d array s.t. M[A, a] specifies the A-production to be used
if the input symbol is a

Algorithm (X: symbol on top of stack, a: current input symbol):
0. Initially: stack contains ⟨EOF, S⟩ with S on top, input pointer is at start of input
1. if X = a = EOF, done
2. if X = a ≠ EOF, pop stack and advance input pointer
3. if X is a non-terminal, look up M[X, a]; if the entry is X → U V W,
   pop X, push W, V, U (so that U ends up on top)
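
A minimal sketch of this driver loop. The table is written out by hand for the usual non-left-recursive form of the expression grammar (E → T E′, E′ → + T E′ | ϵ, T → F T′, T′ → ∗ F T′ | ϵ, F → ( E ) | id), which is an assumption of this example; `$` stands for EOF, and an empty list is an ϵ right-hand side.

```python
EOF = "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
# M[A, a] -> right-hand side of the A-production to use (hand-written table)
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", EOF): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", EOF): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + [EOF]
    stack = [EOF, start]                 # top of stack is the end of the list
    pos = 0
    used = []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == EOF and a == EOF:      # step 1: X = a = EOF
            return used
        if top not in NONTERMINALS:      # step 2: terminal on top must match
            if top != a:
                raise SyntaxError(f"expected {top!r}, got {a!r}")
            stack.pop()
            pos += 1
        else:                            # step 3: consult M[X, a]
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push W, V, U so that U is on top
            used.append((top, tuple(rhs)))

derivation = ll1_parse(["id", "+", "id", "*", "id"])
```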



FIRST and FOLLOW

FIRST(α): set of terminals that begin strings derived from α
    if α ⇒* ϵ, then ϵ ∈ FIRST(α)

FOLLOW(A): set of terminals that can appear immediately to the right of
A in some sentential form
    FOLLOW(A) = {a | S ⇒* αAaβ}
    if A is the rightmost symbol in any sentential form, then
    EOF ∈ FOLLOW(A)



FIRST and FOLLOW

FIRST:
1. if X is a terminal, then FIRST (X) = {X}
2. if X → ϵ is a production, then add ϵ to FIRST (X)
3. if X → Y1 Y2 . . . Yk is a production:
if a ∈ FIRST (Yi ) and ϵ ∈ FIRST (Y1 ), . . . , FIRST (Yi−1 ),
add a to FIRST (X)
if ϵ ∈ FIRST (Yi ) ∀i, add ϵ to FIRST (X)

FOLLOW:
1. Add EOF to FOLLOW (S)
2. For each production of the form A → αBβ
(a) add FIRST (β)\{ϵ} to FOLLOW (B)
(b) if β = ϵ or ϵ ∈ FIRST (β), then add everything in FOLLOW (A) to
FOLLOW (B)
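
The rules above can be run as a fixed-point iteration: keep applying them until no set grows. The following is a sketch under stated assumptions: the grammar is the non-left-recursive expression grammar (not shown on this slide), alternatives are tuples with ϵ as the empty tuple, and `$` stands for EOF.

```python
EPS, EOF = "ε", "$"

def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols (rule 3 for FIRST)."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)              # every symbol can vanish, or the sequence is empty
    return result

def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}             # rule 1
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)   # rules 2 and 3
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                          # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:            # only non-terminals have FOLLOW
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}        # rule 2(a)
                    if EPS in beta_first:
                        new |= follow[a]            # rule 2(b)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

GRAMMAR = {
    "E": [("T", "E'")], "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")], "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = compute_first(GRAMMAR, {"+", "*", "(", ")", "id"})
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```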



Table construction

Algorithm:
1. For each production A → α
(a) for each terminal a ∈ FIRST (α), add A → α to M [A, a]
(b) if ϵ ∈ FIRST (α), add A → α to M [A, b] for each terminal
b ∈ FOLLOW (A)

2. Mark all other entries “error”

Example:
Grammar: E → E + T | T        Input: id + id * id
         T → T ∗ F | F
         F → (E) | id
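
The example grammar is left recursive, so a table built from it directly would have multiply defined entries. The sketch below therefore assumes the grammar has first been rewritten as E → T E′, E′ → + T E′ | ϵ, T → F T′, T′ → ∗ F T′ | ϵ, F → ( E ) | id, and that its FIRST and FOLLOW sets have been worked out by hand (they are hard-coded here, with `$` for EOF).

```python
EPS, EOF = "ε", "$"
GRAMMAR = [
    ("E", ("T", "E'")),
    ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")),
    ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
# Hand-computed sets for the transformed grammar (assumption of this sketch)
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}
FOLLOW = {"E": {")", EOF}, "E'": {")", EOF},
          "T": {"+", ")", EOF}, "T'": {"+", ")", EOF},
          "F": {"+", "*", ")", EOF}}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar):
    """M[A, a] -> list of right-hand sides; more than one means a conflict."""
    table = {}
    for lhs, rhs in grammar:
        f = first_of_string(rhs)
        lookaheads = f - {EPS}               # step 1(a)
        if EPS in f:
            lookaheads |= FOLLOW[lhs]        # step 1(b)
        for a in lookaheads:
            table.setdefault((lhs, a), []).append(rhs)
    return table                             # missing keys are the error entries

TABLE = build_table(GRAMMAR)
```

Every entry of `TABLE` holds exactly one production here, which is the LL(1) condition.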



LL(1) grammars

If the table has no multiply defined entries, the grammar is said to be LL(1)
    L - left-to-right scan of the input    L - leftmost derivation    1 - one symbol of lookahead

If G is LL(1), then G cannot be left-recursive or ambiguous

Example: S → i E t S S′ | a
         S′ → e S | ϵ
         E → b
         M[S′, e] = {S′ → ϵ, S′ → e S}    (multiply defined, so not LL(1))
Some non-LL(1) grammars may be transformed into equivalent
LL(1) grammars



LL(1) parsing

Error recovery:
1. if X is a terminal, but X ≠ a, pop X
2. if M[X, a] is blank, skip a
3. if M[X, a] = synch, pop X, but do not advance input pointer
Synch sets:
use FOLLOW (A)
add the FIRST set of a higher-level non-terminal to the synch set of a
lower-level non-terminal



Bottom-up parsing
Reference: Section 4.5
Example: Grammar: E → E + T | T        Input: id + id * id
                  T → T ∗ F | F
                  F → (E) | id

Sentential form: any string α s.t. S ⇒* α
Handle: for a sentential form γ , handle is a production A → β and a
position of β in γ , s.t. β may be replaced by A to produce the previous
right-sentential form in a rightmost derivation of γ
Properties:
1. string to right of handle must consist of terminals only
2. if G is unambiguous, every right-sentential form has a unique handle
Advantages:
1. No backtracking
2. More powerful than LL(1) / predictive parsing

Bottom-up parsing
Reference: Section 4.7
Implementation scheme:
0. Use input buffer, stack, and parsing table.
1. Shift ≥ 0 input symbols onto stack until a handle β is on top of stack.
2. Reduce β to A (i.e. pop symbols of β and push A).
3. Stop when stack = ⟨EOF, S⟩, and input pointer is at EOF.
Stack: s0 X1 s1 . . . Xm sm, where each si represents a “state” (current
situation in the parsing process)

Table:
used to guide steps 2 and 3
2-d array indexed by ⟨state, input symbol⟩ pairs
consists of two parts (action + goto)



Bottom-up parsing

Algorithm:
1. Initially, stack = ⟨s0⟩ (initial state)
2. Let s - state on top of stack
       a - current input symbol
   if action[s, a] = shift s′
       push a, s′ on stack, advance input pointer
   else if action[s, a] = reduce A → β
       pop 2 ∗ |β| symbols
       let s′ be the new top of stack
       push A, goto[s′, A] on stack
   else if action[s, a] = accept, done
   else error
   Repeat step 2 until accept or error.
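
A sketch of this driver in code. To keep the tables small enough to write by hand, it assumes a toy grammar that is not from the slides, S → ( S ) | x, with six SLR states numbered 0 to 5; `$` stands for EOF.

```python
EOF = "$"
# Productions, numbered for use in reduce actions (toy grammar, an assumption)
PRODUCTIONS = {1: ("S", ("(", "S", ")")), 2: ("S", ("x",))}
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, EOF): ("accept",),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, EOF): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, EOF): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """Return the production numbers in the order they are reduced."""
    tokens = list(tokens) + [EOF]
    stack = [0]                          # s0 X1 s1 ... Xm sm
    pos = 0
    reductions = []
    while True:
        s, a = stack[-1], tokens[pos]
        act = ACTION.get((s, a))
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act[0] == "shift":
            stack += [a, act[1]]         # push a, s'
            pos += 1
        elif act[0] == "reduce":
            lhs, rhs = PRODUCTIONS[act[1]]
            del stack[len(stack) - 2 * len(rhs):]     # pop 2 * |beta| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]    # push A, goto[s', A]
            reductions.append(act[1])
        else:                            # accept
            return reductions

order = lr_parse(["(", "(", "x", ")", ")"])
```

The reductions come out in the reverse order of a rightmost derivation: first S → x, then S → ( S ) twice.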



SLR parsing

Grammar augmentation:
Create new start symbol S ′ ; add S ′ → S to productions

Item (LR(0) item): production of G with a dot at some position in the
RHS, representing how much of the RHS has already been seen at a
given point in the parse
Example: A → ϵ gives the single item A → ·

Closure:
    Let I be a set of items
    closure(I) ← I
    repeat until no more changes
        for each A → α · Bβ in closure(I)
            for each production B → γ s.t. B → ·γ ∉ closure(I)
                add B → ·γ to closure(I)
Example: closure({E′ → ·E})

SLR parsing

Goto construction:
    goto(I, X) = closure({A → αX · β | A → α · Xβ ∈ I})
Example: Let I = {E′ → E·, E → E · +T}
    goto(I, +) = closure({E → E + ·T})

Canonical collection construction:
1. C ← {closure({S′ → ·S})}
2. repeat until no more changes:
       for each I ∈ C, for each grammar symbol X
           if goto(I, X) is not empty and not in C
               add goto(I, X) to C
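
The closure, goto and canonical-collection constructions can be sketched together for the augmented expression grammar. The item encoding is an assumption of this sketch: an item is a tuple (lhs, rhs, dot), where dot is the number of right-hand-side symbols already seen.

```python
GRAMMAR = [
    ("E'", ("E",)),                              # augmentation: E' -> E
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    result = set(items)
    work = list(items)
    while work:                                  # repeat until no more changes
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l2, r2 in GRAMMAR:               # each production B -> gamma
                item = (l2, r2, 0)
                if l2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, x):
    # move the dot over x in every item that has x right after the dot
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved) if moved else frozenset()

def canonical_collection(start_item):
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    collection = [closure({start_item})]
    for state in collection:                     # the list grows while we scan it
        for x in symbols:
            target = goto(state, x)
            if target and target not in collection:
                collection.append(target)
    return collection

C = canonical_collection(("E'", ("E",), 0))
I = frozenset({("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)})
after_plus = goto(I, "+")                        # the slide's goto example
```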



SLR parsing

Table construction:
1. Let C = {I0, . . . , In} be the canonical collection of LR(0) items for G.
2. Create a state si corresponding to each Ii. The set containing
   S′ → ·S corresponds to the initial state.
3. If A → α · aβ ∈ Ii for a terminal a and goto(Ii, a) = Ij, then action(si, a) = shift sj.
4. If A → α· ∈ Ii (A ≠ S′), then action(si, a) = reduce A → α for all
   a ∈ FOLLOW(A).
5. If S′ → S· ∈ Ii, then action(si, EOF) = accept.
6. If goto(Ii, A) = Ij for a non-terminal A, then goto(si, A) = sj.
7. Mark all blank entries error.



Conflicts

1. Shift-reduce conflict: stmt → if ( expr ) stmt |
                                 if ( expr ) stmt else stmt

2. Reduce-reduce conflict: stmt → id ( param list ) ;
                           expr → id ( expr list )
                           . . .
                           param → id
                           expr → id
Example:
Grammar: S → L = R    S → R    L → ∗R    L → id    R → L
Canonical collection:
    I0 = {S′ → ·S, . . .}    I2 = {S → L· = R, R → L·}    . . .
Table: action(2, =) = shift . . .    action(2, =) = reduce . . .
NOTE: SLR(1) grammars are unambiguous, but not vice versa.
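
The conflict in I2 can be reproduced with a short sketch. It rebuilds I2 as goto(I0, L) from LR(0) items encoded as (lhs, rhs, dot), and uses FOLLOW sets worked out by hand (an assumption of this sketch; `$` stands for EOF).

```python
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# Hand-computed FOLLOW sets for this grammar
FOLLOW = {"S": {"$"}, "L": {"=", "$"}, "R": {"=", "$"}}

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l2, r2 in GRAMMAR:
                item = (l2, r2, 0)
                if l2 == rhs[dot] and item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(items, x):
    moved = {(l, r, d + 1) for l, r, d in items if d < len(r) and r[d] == x}
    return closure(moved) if moved else frozenset()

def slr_actions(state):
    """Collect the SLR actions of one item set, keyed by input symbol."""
    actions = {}
    for lhs, rhs, dot in state:
        if dot < len(rhs) and rhs[dot] not in NONTERMINALS:
            actions.setdefault(rhs[dot], set()).add("shift")
        elif dot == len(rhs) and lhs != "S'":
            for a in FOLLOW[lhs]:        # reduce on every symbol in FOLLOW(lhs)
                actions.setdefault(a, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions

I0 = closure({("S'", ("S",), 0)})
I2 = goto(I0, "L")
actions = slr_actions(I2)
```

`actions["="]` ends up holding both a shift and the reduction by R → L, which is the shift-reduce conflict noted in the table line above.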

Canonical LR parsers

Motivation: Reduction by A → α· is not necessarily proper even if
a ∈ FOLLOW(A)
⇒ explicitly indicate tokens for which reduction is acceptable

LR(1) item: pair of the form ⟨A → α · β, a⟩, where A → αβ is a
production, a is a terminal or EOF
Properties:
1. ⟨A → α · β, a⟩ with β ≠ ϵ - lookahead has no effect
   ⟨A → α·, a⟩ - reduce only if input symbol is a
2. {a | ⟨A → α·, a⟩ ∈ canonical collection} ⊆ FOLLOW(A)



Canonical LR parsers

Closure:
    Let I be a set of items
    closure(I) ← I
    repeat until no more changes
        for each item ⟨A → α · Bβ, a⟩ in closure(I)
            for each production B → γ and each terminal b ∈ FIRST(βa)
                if ⟨B → ·γ, b⟩ ∉ closure(I)
                    add ⟨B → ·γ, b⟩ to closure(I)

Goto:
    goto(I, X) = closure({⟨A → αX · β, a⟩ | ⟨A → α · Xβ, a⟩ ∈ I})

Canonical collection construction:
1. C ← {closure({⟨S′ → ·S, EOF⟩})}
2. /* Similar to SLR algorithm */
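
A sketch of the LR(1) closure and goto for the grammar S → L = R | R, L → ∗R | id, R → L. Items are encoded as (lhs, rhs, dot, lookahead). Two simplifying assumptions are made: the FIRST sets are written out by hand, and because this grammar has no ϵ-productions, FIRST(βa) is taken to be FIRST of the first symbol of β, or {a} when β is empty.

```python
EOF = "$"
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")), ("S", ("R",)),
    ("L", ("*", "R")), ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
FIRST = {"S": {"*", "id"}, "L": {"*", "id"}, "R": {"*", "id"}}  # by hand

def first_of(beta, a):
    # Valid only because no symbol of this grammar derives the empty string
    if not beta:
        return {a}
    return FIRST.get(beta[0], {beta[0]})     # a terminal's FIRST set is itself

def closure(items):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:], a):     # b in FIRST(beta a)
                for l2, r2 in GRAMMAR:
                    item = (l2, r2, 0, b)
                    if l2 == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, x):
    moved = {(l, r, d + 1, a) for l, r, d, a in items
             if d < len(r) and r[d] == x}
    return closure(moved) if moved else frozenset()

I0 = closure({("S'", ("S",), 0, EOF)})
I2 = goto(I0, "L")
# lookaheads on which the state reached on L may reduce by R -> L
reduce_on = {a for lhs, rhs, dot, a in I2 if lhs == "R" and dot == len(rhs)}
```

In this state the reduction by R → L is allowed only on EOF, so it does not clash with the shift on =.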

Canonical LR parsers

Table construction:
1. If ⟨A → α · aβ, b⟩ ∈ Ii and goto(Ii, a) = Ij, then action(i, a) = shift j.
2. If ⟨A → α·, b⟩ ∈ Ii (A ≠ S′), then action(i, b) = reduce A → α.
3. If ⟨S′ → S·, EOF⟩ ∈ Ii, then action(i, EOF) = accept.



LALR parsers

Motivation: try to combine efficiency of SLR parser with power of
canonical method
Core: set of LR(0) items corresponding to a set of LR(1) items

Method:
1. Construct canonical collection of LR(1) items, C = {I0 , . . . , In }.
2. Merge all sets with the same core. Let the new collection be
C ′ = {J0 , . . . , Jm }.
3. Construct the action table as before.
4. If J = I1 ∪ . . . ∪ Ik , then goto(J, X) = union of all sets with the same
core as goto(I1 , X).



SLR vs LR(1) vs LALR

No. of states: SLR = LALR ≤ LR(1) (cf. Pascal)
Power: SLR < LALR < LR(1)
SLR vs. LALR: LALR items can be regarded as SLR items, with the
core augmented by appropriate subsets of FOLLOW (A) explicitly
specified
LALR vs. LR(1):
1. If there were no shift-reduce conflicts in the LR(1) table, there will be
no shift-reduce conflicts in the LALR table.
2. Step 2 may generate reduce-reduce conflicts.
Example: I1 = {⟨A → α·, a⟩, ⟨B → β·, b⟩}
I2 = {⟨A → α·, b⟩, ⟨B → β·, a⟩}
3. Correct inputs: LALR parser mimics LR(1) parser
Incorrect inputs: incorrect reductions may occur on a lookahead a ⇒
parser goes back to a state Ii in which A has just been recognized.
But a cannot follow A in this state ⇒ error

Error recovery
Reference: Section 4.8
Detection:
Canonical LR - errors are immediately detected (no unnecessary
shift/reduce actions)
SLR/LALR - no symbols are shifted onto stack, but reductions may
occur before error is detected

Panic mode recovery:
1. Scan down the stack until a state s with a goto on a “significant”
   non-terminal A (e.g. expr, stmt, etc.) is found.
2. Discard input until a symbol a which can follow A is found.
3. Push A, goto(s, A) and resume parsing.
Expln: s ≡ α · Aaβ ⇒ α γ aβ, where γ (the string derived from A) is the location of the error



Error recovery

Phrase-level error recovery: recovery by local correction on remaining
input, e.g. insertion, deletion, substitution, etc.
Scheme:
1. Consider each blank entry, and decide what the error is likely to be.
2. Call appropriate recovery method
   - should consume input (to avoid infinite loop)
   - avoid popping a “significant” non-terminal from stack
Examples:
1. State: E′ → ·E
   Input: + or ∗
   Action: push id, goto appropriate state
   Message: missing operand
2. State: E → E + ·T
   Input: )
   Action: skip ')' from input
   Message: extra ')'



Yacc
Reference: Section 4.9
Usage: $ yacc myfile.y (generates y.tab.c)
File format:
declarations
%%
grammar rules (terminals, non-terminals, start symbol)
%%
auxiliary procedures (C functions)
Semantic actions:
- $$ - attribute value associated with LHS non-terminal
  $i - attribute value associated with i-th grammar symbol on RHS
- specified action is executed on reduction by corresponding production
- default action: $$ = $1

Yacc

Lexical analyzer: yylex() must be provided
    should return integer code for a token
    should set yylval to attribute value
Usage with lex:
    % lex scanner.l
    % yacc parser.y
    % cc y.tab.c -ly -ll
    Add #include "lex.yy.c" to third section of parser.y
Declared tokens can be used as return values in scanner.l



Yacc

Implicit conflict resolution:
1. Shift-reduce: choose shift
2. Reduce-reduce: choose the production listed earlier
Explicit conflict resolution:
    Precedence: tokens are assigned precedence according to the order
    in which they are declared (lowest first)
    Associativity: left, right, or nonassoc
    Precedence/assoc. of a production = precedence/assoc. of rightmost
    terminal or explicitly specified using %prec
    Given item ⟨A → α·, a⟩ (choice between reducing by A → α and shifting a):
        if precedence of production is higher than that of a, reduce
        if precedences are the same and associativity is left, reduce
        if precedences are the same and associativity is nonassoc, error
        else shift
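
The explicit resolution rule can be sketched as a small decision function. This is an illustration of the rule only, not Yacc's own code: the declaration list mimics `%left '+' '-'`, `%left '*' '/'`, `%right '^'` and `%nonassoc '<'` in some order chosen for the example, %prec is not modelled, and the names are hypothetical.

```python
# Declarations in increasing precedence (lowest first), as in a Yacc file
DECLS = [("nonassoc", ["<"]), ("left", ["+", "-"]),
         ("left", ["*", "/"]), ("right", ["^"])]
PREC = {tok: (level, assoc)
        for level, (assoc, toks) in enumerate(DECLS, 1) for tok in toks}
NONTERMINALS = {"E"}

def production_prec(rhs):
    """Precedence/assoc. of a production = that of its rightmost terminal."""
    for sym in reversed(rhs):
        if sym not in NONTERMINALS:
            return PREC.get(sym)         # None if that terminal has no precedence
    return None

def resolve(rhs, lookahead):
    """Decide a shift-reduce conflict between reducing by rhs and shifting."""
    prod = production_prec(rhs)
    tok = PREC.get(lookahead)
    if prod is None or tok is None:
        return "shift"                   # implicit rule: prefer shift
    if prod[0] > tok[0]:
        return "reduce"
    if prod[0] == tok[0]:
        if prod[1] == "left":
            return "reduce"
        if prod[1] == "nonassoc":
            return "error"
    return "shift"
```

For example, with E → E + E on the stack and ∗ as lookahead the function picks shift, so ∗ binds tighter; with + as lookahead it picks reduce, giving left associativity.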
