Top Down Parsing


Växjö University

DAC718 - Compiler Design


Top-Down Parsing

Jonas Lundberg

Jonas.Lundberg@msi.vxu.se

http://w3.msi.vxu.se/users/jonasl/dac718

5 February 2007

The Software Technology Group


Frontend Overview

Compiler Frontend

program source → lexical analysis → token sequence → syntax analysis → parse tree → semantic analysis → abstract syntax tree

Symbol Table

- Lexical Analysis: Identifies atomic language constructs.
  Each type of construct is represented by a token
  (e.g. 3.14 ↦ FLOAT, if ↦ IF, a ↦ ID).
- Syntax Analysis: Checks whether the token sequence is correct with respect
  to the language specification.
- Semantic Analysis: Checks type relations and consistency rules
  (e.g. whether type(lhs) = type(rhs) in an assignment lhs = rhs).

Each step transforms one program representation into another.


Lexical Analysis Overview

lexical specification (regular expressions) → Scanner/Tokenizer (finite automata)

Input (characters)      Output (tokens)
if ( x > 1 ) {          IF, LP, ID, RelOp, RP, LB
  x = x + 1;            ID, ASSGN, ID, AddOp, ID, SC
} else {                RB, ELSE, LB
  x = x - 1;            ID, ASSGN, ID, AddOp, ID, SC
}                       RB

- Input program representation: Character sequence
- Output program representation: Token sequence
- Analysis specification: Regular expressions
- Recognizing (abstract) machine: Finite Automata
- Implementation: Finite Automata
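
A minimal sketch of such a scanner, assuming Python's `re` module stands in for the finite automaton. The token names follow the slide; `NUM` (for integer literals) and `SKIP` (for whitespace) are names assumed here, and the regular expressions are illustrative rather than the course's actual lexical specification.

```python
import re

# One (token name, regular expression) pair per token class.
# Order matters: keywords must be tried before the general ID pattern.
TOKEN_SPEC = [
    ("IF", r"if\b"),
    ("ELSE", r"else\b"),
    ("NUM", r"\d+"),              # assumed name, not on the slide
    ("ID", r"[A-Za-z_]\w*"),
    ("RelOp", r"[<>]=?|==|!="),
    ("ASSGN", r"="),
    ("AddOp", r"[+-]"),
    ("LP", r"\("),
    ("RP", r"\)"),
    ("LB", r"\{"),
    ("RB", r"\}"),
    ("SC", r";"),
    ("SKIP", r"\s+"),             # whitespace, dropped from the output
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character sequence into a list of token names."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(tokenize("x = x + 1;"))
```

Trying the keyword patterns before `ID` is the usual way to keep `if` and `else` from being scanned as identifiers.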


Syntax Analysis Overview

syntax specification (context-free grammars) → Parser (push-down automata, top-down, bottom-up)

Input (tokens)
IF, LP, ID, RelOp, RP, LB
ID, ASSGN, ID, AddOp, ID, SC
RB, ELSE, LB
ID, ASSGN, ID, AddOp, ID, SC
RB

- Input program representation: Token sequence
- Output program representation: Parse (or syntax) tree
- Analysis specification: Context-free grammar
- Recognizing (abstract) machine: Push-down Automata
- Implementation: Top-down or Bottom-up parsers


Top-down Methods
Consider the grammar
(1) S → aABe   (2) A → b   (3) A → Abc   (4) B → d

- Using a left-most derivation we can show that abbcde is in the language
  (the number after each arrow is the production applied):
      S ⇒(1) aABe ⇒(3) aAbcBe ⇒(2) abbcBe ⇒(4) abbcde
  This is a top-down approach since we start from the start symbol S (the syntax
  tree root) and work our way down to the tokens abbcde (the leaves of the
  syntax tree).
- Problem: Which production should be used, given only the next one (or k) tokens?
- Fast and easy when it works.
- JavaCC uses a top-down parsing method.
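
The derivation above can be replayed mechanically. The following is a small sketch, assuming single-character symbols with uppercase letters as non-terminals; the function name `leftmost_derivation` is chosen here for illustration only.

```python
# Productions of the example grammar, keyed by their number on the slide.
PRODUCTIONS = {
    1: ("S", "aABe"),
    2: ("A", "b"),
    3: ("A", "Abc"),
    4: ("B", "d"),
}


def leftmost_derivation(start, rule_numbers):
    """Apply the numbered productions, always rewriting the leftmost non-terminal."""
    form = start
    steps = [form]
    for number in rule_numbers:
        lhs, rhs = PRODUCTIONS[number]
        # The leftmost non-terminal is the first uppercase symbol.
        index = next(i for i, symbol in enumerate(form) if symbol.isupper())
        if form[index] != lhs:
            raise ValueError(f"production {number} does not rewrite {form[index]}")
        form = form[:index] + rhs + form[index + 1:]
        steps.append(form)
    return steps


print(" => ".join(leftmost_derivation("S", [1, 3, 2, 4])))
```

The hard part of top-down parsing is not replaying the sequence 1, 3, 2, 4 but discovering it from the input tokens.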

Agenda
- Recursive Descent
- Table-driven Parsing
- Deriving an LL(1) parse table

A Simple Method: Recursive Descent (RD)

- We associate one procedure pA() with each nonterminal A.
- lookahead = next token to process.
- The procedure pA() is called whenever we want to resolve A → α.
- For example, consider A → bCd | eF where A, C, F ∈ N and b, d, e ∈ T

pA() {
  if lookahead = b then
    eat(b); pC(); eat(d);
  elsif lookahead = e then
    eat(e); pF();
  else
    reportError();
  end if;
}

eat(Token t) {
  if lookahead = t then
    lookahead = nextToken();
  else
    reportError();
  end if;
}

The variable lookahead holds the next input token.
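
A runnable sketch of the same idea in Python. The slide does not define C and F, so the productions C → c and F → f are assumed here purely to make the example complete; the end marker `$` and the helper `accepts` are likewise assumptions of this sketch.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0
        self.lookahead = self.tokens[0]

    def eat(self, expected):
        """Consume the lookahead if it matches, otherwise report an error."""
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1
        self.lookahead = self.tokens[self.pos]

    def pA(self):                 # A -> b C d | e F
        if self.lookahead == "b":
            self.eat("b"); self.pC(); self.eat("d")
        elif self.lookahead == "e":
            self.eat("e"); self.pF()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} while parsing A")

    def pC(self):                 # C -> c   (assumed production)
        self.eat("c")

    def pF(self):                 # F -> f   (assumed production)
        self.eat("f")


def accepts(tokens):
    """True if the whole token sequence can be derived from A."""
    parser = Parser(tokens)
    try:
        parser.pA()
    except ParseError:
        return False
    return parser.lookahead == "$"


print(accepts("bcd"), accepts("ef"), accepts("bd"))
```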


Predictive Parsing

- RD in summary:
  - Given a lookahead a ∈ T ...
  - ... and a non-terminal A ∈ N ...
  - it should decide which production A → α to use.
- The problem with RD (as with any LL(k) method) is that it must be able to
  decide which branch of a production to use just by looking one (or k) token(s)
  ahead.
- These methods are also called Predictive Parsing Methods since
  every production decision implies a prediction of what will follow.

Predictive Parsing Problems
- Ambiguous grammar: Gives a non-deterministic left-most derivation.
- Common prefixes: In A → αβ | αω both branches start with α, so the
  lookahead cannot tell them apart (the remedy is left-factoring).
- Left-recursion: With A → Aα, pA() calls itself without consuming any input,
  which causes an infinite loop.


Arithmetic Expressions (Grammar 3)


A non-ambiguous grammar for arithmetic expressions with correct
operator priorities:

G = {T, N, P, S}
T = {id, +, ∗, (, )}
N = {E, E', T, T', F}
S = E

where P is defined as

(1) E → T E',   E' → + T E' | ε
(2) T → F T',   T' → ∗ F T' | ε
(3) F → id | ( E )

Note: In Grammar 3, ambiguity, common prefixes, and left-recursion
have already been removed.
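
The slides do not show how the left-recursion was removed. As a hedged illustration, the sketch below applies the standard rewrite A → Aα | β ⇒ A → βA', A' → αA' | ε to the usual left-recursive rule E → E + T | T, which is assumed here as the starting point. The function name and the list-of-symbols representation are choices made for this example.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    """
    prime = nonterminal + "'"
    # Tails a_i of the left-recursive alternatives A -> A a_i.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives b_j that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [alt + [prime] for alt in others],
        prime: [alt + [prime] for alt in recursive] + [[]],
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T this yields E → T E' and E' → + T E' | ε, matching rule (1) of Grammar 3.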

Recursive Descent Revisited


RD in summary:
- Given a lookahead a ∈ T ...
- ... and a non-terminal A ∈ N ...
- it should decide which production A → α to use.

The procedure associated with T' → ∗ F T' | ε

Tprime() {
  if lookahead = * then
    eat(*); F(); Tprime();
  elsif lookahead ∈ {+, ), $} then
    ; // Do nothing ($ = end of input)
  else
    reportError();
  end if;
}

The ε-production for T' is the tricky part. Here we must determine on which
inputs T' should do nothing and on which it should report an error. This is a
non-trivial task. Fortunately, we have algorithms that can help us.
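
For reference, a complete recursive-descent recognizer for Grammar 3 might look as follows. This is a sketch under assumptions: the class and method names are invented here, the input is a list of token names, and `$` is appended as an end-of-input marker so that the ε-branches can also fire at the end of the expression.

```python
class ParseError(Exception):
    pass


class ExprParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" = end of input (assumed marker)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def eat(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):                  # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):             # E' -> + T E' | ε
        if self.lookahead == "+":
            self.eat("+"); self.T(); self.Eprime()
        elif self.lookahead in (")", "$"):
            pass                  # ε: the lookahead may follow E'
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in E'")

    def T(self):                  # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):             # T' -> * F T' | ε
        if self.lookahead == "*":
            self.eat("*"); self.F(); self.Tprime()
        elif self.lookahead in ("+", ")", "$"):
            pass                  # ε: the lookahead may follow T'
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in T'")

    def F(self):                  # F -> id | ( E )
        if self.lookahead == "id":
            self.eat("id")
        elif self.lookahead == "(":
            self.eat("("); self.E(); self.eat(")")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in F")


def accepts(tokens):
    parser = ExprParser(tokens)
    try:
        parser.E()
    except ParseError:
        return False
    return parser.lookahead == "$"


print(accepts(["id", "+", "id", "*", "id"]))
```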

Problems with Recursive Descent

- The large number of simultaneous recursive calls makes the compiler
  slow and memory consuming (calls ⇒ new activation records ⇒
  several object creations).
- Grammar updates are often difficult to handle.
- We have no systematic approach to decide which production branch
  to choose given some input token t.

A Parse Table Driven Approach
- Recursive calls are replaced by a stack.
- Which production branch to choose is given by a parse table M[A, t].
- Given a non-terminal A and lookahead t, M[A, t] returns the
  appropriate production to use.
- We have algorithms for constructing parse tables.


A Parse Table for Grammar 3

    id            +             ∗             (             )             $
E   E → T E'                                  E → T E'
E'                E' → + T E'                               E' → ε        E' → ε
T   T → F T'                                  T → F T'
T'                T' → ε        T' → ∗ F T'                 T' → ε        T' → ε
F   F → id                                    F → ( E )

($ marks end of input.)

Parse Tables
- Given a non-terminal A and lookahead t, M[A, t] returns the appropriate
  production to use. An empty slot means a syntax error.
- Using a parse table is easy (next slide).
- Implementing the use of a parse table is a bit more tricky (but not very hard).
- Constructing a parse table is much more difficult
  (but we have algorithms that can help us!).


Using Parse Tables


Parsing id + id$ (where $ symbolizes end-of-file)
- Start:
  - Push start symbol E on stack ⇒ TOP = E
  - Lookahead is first input token ⇒ LA = id, Remains = +id$
- Parse:
  - Rule: match iff the TOP element equals LA, otherwise expand.
  - expand ⇒ replace the top element (a non-terminal) with the right-hand side of M[TOP, LA]
  - match ⇒ pop the top element (a terminal) and set the lookahead to the next input token.
- Success: When the lookahead is end-of-file (LA = $) and the stack has been
  emptied; here the remaining T' and E' are popped via their ε-productions.

Remains   LA    TOP   Stack
+id$      id    E     E
+id$      id    T     T E'
+id$      id    F     F T' E'
+id$      id    id    id T' E'
id$       +     T'    T' E'
id$       +     E'    E'
id$       +     +     + T E'
$         id    T     T E'
$         id    F     F T' E'
$         id    id    id T' E'
          $     T'    T' E'


Algorithm for table driven LL-parsing


(M is assumed to have a column for EOF: M[X, EOF] holds the nullable
production of X whenever EOF can follow X.)

stack.push(EOF)
stack.push(StartSymbol)
LA = input.nextToken()
repeat
  X = stack.top()
  if X ∈ T or X = EOF then
    if X = LA then
      stack.pop()
      if X ≠ EOF then LA = input.nextToken() end if
    else
      error(stack, LA, input)   (Token not in agreement with prediction)
    end if
  else
    if M[X, LA] = X → Y1 ... Yn then
      stack.pop()
      push Yn ... Y1 onto stack, with Y1 on top
      add X → Y1 ... Yn to parse tree
    else
      error(stack, LA, input)   (Can't make a prediction, empty slot in M[X, LA])
    end if
  end if
until stack is empty
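
A compact Python sketch of a table-driven LL(1) parser for Grammar 3. Assumptions made for this example: the table is a dictionary keyed by (non-terminal, lookahead), the empty list stands for ε, the table includes entries for the end marker `$`, and `$` is pushed below the start symbol so that the parser only accepts once the whole input has been consumed.

```python
# Parse table for Grammar 3; [] stands for an ε right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens, start="E"):
    """Return the productions used (a leftmost derivation) or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", start]          # top of the stack is the end of the list
    used = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            used.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                      # terminal (or $) matches: consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return used


for lhs, rhs in parse(["id", "+", "id"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

The list of productions printed is the leftmost derivation of the input, which is enough to build the parse tree.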

Constructing Parse Tables: Introduction

- Given a grammar G we can construct a parse table M[X, t] systematically.
- Ambiguity, left-recursion, and common prefixes (missing left-factorization)
  give multiple entries in M[X, t].
- Before constructing M[X, t], try to eliminate all cases of the above problems.
  (It will save you both time and effort.)
- Basic idea: Construct three functions for each non-terminal X ∈ N.
  - Nullable(X): is true if X can derive the empty string ε.
  - FIRST(X): the set of terminals that can begin strings derived from X.
  - FOLLOW(X): the set of terminals that can immediately follow X.
- Use Algorithm 4 to construct M[X, t] using these functions.

Notations to be used
a, b, ... ∈ T,   A, B, ... ∈ N,   ..., X, Y, Z ∈ (N ∪ T),   α, β, γ, ... ∈ (N ∪ T)∗


Algorithm 1: Nullable(X )

- Nullable(X) is true if X can derive the empty string ε.
- Algorithm for constructing Nullable(X):

  nullable(X) := false for all X ∈ (N ∪ T)
  repeat
    for each production X → Y1 Y2 ... Yn do
      if Y1 Y2 ... Yn are all nullable (or if X → ε) then
        nullable(X) := true
      end if
    end for
  until nullable not changed in this iteration

- Furthermore, a string α = X1 X2 ... Xn is nullable if every Xi is nullable.
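
A possible Python rendering of Algorithm 1, applied to Grammar 3. The dictionary representation (non-terminal ↦ list of right-hand sides, with the empty list for ε) and the function name are assumptions of this sketch.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}


def compute_nullable(grammar):
    """Fixed-point iteration: repeat until no nullable flag changes."""
    nullable = {nonterminal: False for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                # all() over an empty right-hand side is True, which covers X -> ε.
                # Terminals are not keys of `nullable`, so they count as False.
                if not nullable[lhs] and all(nullable.get(s, False) for s in rhs):
                    nullable[lhs] = True
                    changed = True
    return nullable


print(compute_nullable(GRAMMAR))
```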


Algorithm 2: FIRST(α)
- FIRST(X) is the set of terminals that can begin strings derived from X.
- Algorithm for FIRST(X):

  FIRST(a) := {a} for each a ∈ T
  FIRST(A) := {} for each A ∈ N
  repeat
    for each production X → Y1 Y2 ... Yn do
      for i := 1 to n do
        if Y1 ... Yi−1 are all nullable (or if i = 1) then
          add FIRST(Yi) to FIRST(X)
        end if
      end for
    end for
  until FIRST not changed in this iteration

- Given a string α = X1 X2 ... Xn where Xi ∈ N ∪ T, we have
    FIRST(α) = FIRST(X1), if X1 is not nullable
    FIRST(α) = FIRST(X1) ∪ ... ∪ FIRST(Xi), if X1 ... Xi−1 are nullable and Xi is not (or i = n)
  ⇒ given FIRST(X), we can compute FIRST(α) for each string α.
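
Algorithm 2 could be sketched in Python as below, again for Grammar 3. The grammar representation and function name are assumptions; the set of nullable non-terminals ({E', T'}) is passed in rather than recomputed.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
NULLABLE = {"E'", "T'"}


def compute_first(grammar, nullable):
    """Fixed-point computation of FIRST for every non-terminal."""
    first = {nonterminal: set() for nonterminal in grammar}

    def first_of(symbol):
        # FIRST(a) = {a} for a terminal a.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for symbol in rhs:
                    new = first_of(symbol) - first[lhs]
                    if new:
                        first[lhs] |= new
                        changed = True
                    if symbol not in nullable:
                        break     # later symbols cannot begin the string
    return first


for nonterminal, terminals in compute_first(GRAMMAR, NULLABLE).items():
    print(nonterminal, sorted(terminals))
```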


Algorithm 3: FOLLOW(X )
- FOLLOW(X) is the set of terminals that can immediately follow X.
- Example: t ∈ FOLLOW(X) if there is any derivation containing Xt.
  This can also occur if a derivation contains XYZt where both Y and Z
  are nullable.
- Algorithm for FOLLOW(X):

  FOLLOW(A) := {} for each A ∈ N
  add $ (end-of-file) to FOLLOW(S) for the start symbol S
  repeat
    for each nonterminal Y do
      for each production X → αYβ do
        add FIRST(β) to FOLLOW(Y)
        if β is nullable (or ε) then
          add FOLLOW(X) to FOLLOW(Y)
        end if
      end for
    end for
  until FOLLOW not changed in this iteration
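
A hedged Python sketch of Algorithm 3 for Grammar 3. It assumes the nullable set and the FIRST sets are already known and passed in, and it seeds FOLLOW of the start symbol with an end marker `$`. The helper `first_of_string` and all other names are chosen for this example.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
NULLABLE = {"E'", "T'"}
FIRST = {"E": {"id", "("}, "E'": {"+"}, "T": {"id", "("}, "T'": {"*"}, "F": {"id", "("}}


def first_of_string(symbols, first, nullable):
    """Return (FIRST of the symbol string, whether the whole string is nullable)."""
    result = set()
    for symbol in symbols:
        result |= first.get(symbol, {symbol})   # a terminal is its own FIRST set
        if symbol not in nullable:
            return result, False
    return result, True


def compute_follow(grammar, first, nullable, start):
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add("$")                       # end of input can follow the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue                 # FOLLOW is only kept for non-terminals
                    new, rest_nullable = first_of_string(rhs[i + 1:], first, nullable)
                    if rest_nullable:
                        new = new | follow[lhs]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


for nonterminal, terminals in compute_follow(GRAMMAR, FIRST, NULLABLE, "E").items():
    print(nonterminal, sorted(terminals))
```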


Algorithm 4: Parse Table Construction

- M[X, t] gives the production to use when resolving X given lookahead t.
- Basic idea: X → α ∈ M[X, t] iff t ∈ FIRST(α)
- A nullable α requires special treatment: X → α is then also entered
  for every t ∈ FOLLOW(X).
- Algorithm:

  for each production X → α do
    for each terminal t ∈ FIRST(α) do
      add X → α to M[X, t]
    end for
    if α is nullable (or ε) then
      for each t ∈ FOLLOW(X) do
        add X → α to M[X, t]
      end for
    end if
  end for
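
Algorithm 4 might be sketched in Python as follows. The nullable, FIRST, and FOLLOW values for Grammar 3 are supplied as inputs; the FOLLOW sets here include an end marker `$`, which is an assumption of this sketch. Each table slot is a list so that multiple entries (conflicts) stay visible.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
NULLABLE = {"E'", "T'"}
FIRST = {"E": {"id", "("}, "E'": {"+"}, "T": {"id", "("}, "T'": {"*"}, "F": {"id", "("}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_string(symbols, first, nullable):
    """Return (FIRST of the symbol string, whether the whole string is nullable)."""
    result = set()
    for symbol in symbols:
        result |= first.get(symbol, {symbol})
        if symbol not in nullable:
            return result, False
    return result, True


def build_table(grammar, first, follow, nullable):
    """Map (non-terminal, terminal) to the list of applicable right-hand sides."""
    table = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            lookaheads, rhs_nullable = first_of_string(rhs, first, nullable)
            if rhs_nullable:
                lookaheads |= follow[lhs]        # nullable α: also use FOLLOW(X)
            for terminal in lookaheads:
                table.setdefault((lhs, terminal), []).append(rhs)
    return table


table = build_table(GRAMMAR, FIRST, FOLLOW, NULLABLE)
for (lhs, terminal), entries in sorted(table.items()):
    print(f"M[{lhs}, {terminal}] =", [" ".join(rhs) or "ε" for rhs in entries])
```

A grammar is LL(1) exactly when every list in the resulting table has length one.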


Summary: Table Construction for Grammar 3

Auxiliary functions Nullable, FIRST, and FOLLOW ($ marks end of input)

      Nullable   FIRST    FOLLOW
E     No         id, (    ), $
E'    Yes        +        ), $
T     No         id, (    +, ), $
T'    Yes        ∗        +, ), $
F     No         id, (    +, ∗, ), $

Corresponding Parse Table

    id            +             ∗             (             )             $
E   E → T E'                                  E → T E'
E'                E' → + T E'                               E' → ε        E' → ε
T   T → F T'                                  T → F T'
T'                T' → ε        T' → ∗ F T'                 T' → ε        T' → ε
F   F → id                                    F → ( E )


Multiple Entries
Consider the following "dangling else" grammar:
    S → iEtSS' | a,   S' → eS | ε,   E → b
where E = expression, S = statement, S' = elsePart, i = if, t = then, e = else,
a = OtherStatement, and b = someExpression. It has the following parse table

      a        b        e                  i             t     $
S     S → a                                S → iEtSS'
S'                      S' → eS, S' → ε                        S' → ε
E              E → b

- The ambiguity of the grammar manifests itself as a duplicate entry when e (else)
  is seen. We can resolve the ambiguity by always choosing S' → eS (that
  is, by removing S' → ε from that entry). This binds each else to the
  nearest unmatched if.
- Removing S' → ε from that entry is not the same as removing S' → ε
  from the grammar.
- In general, the parse table is a good place to make minor adjustments
  to the parser.
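
The table adjustment described above can be sketched as follows. The table is written out by hand as a Python dictionary (with an assumed `$` column for end of input), each slot holding a list of candidate right-hand sides; the function names are invented for this illustration.

```python
# Parse table of the dangling-else grammar; [] stands for ε.
TABLE = {
    ("S", "a"): [["a"]],
    ("S", "i"): [["i", "E", "t", "S", "S'"]],
    ("S'", "e"): [["e", "S"], []],     # the duplicate entry
    ("S'", "$"): [[]],                 # assumed end-of-input column
    ("E", "b"): [["b"]],
}


def find_conflicts(table):
    """Return all slots that hold more than one production."""
    return {key: entries for key, entries in table.items() if len(entries) > 1}


def resolve(table, preferred):
    """Return a copy of the table where each conflicting slot keeps only
    the production named in `preferred` (every conflict must be listed)."""
    resolved = {}
    for key, entries in table.items():
        if len(entries) > 1:
            entries = [preferred[key]]
        resolved[key] = entries
    return resolved


print(find_conflicts(TABLE))
fixed = resolve(TABLE, {("S'", "e"): ["e", "S"]})
print(find_conflicts(fixed))
```

Note that S' → ε survives in the `$` column: only the conflicting slot is edited, not the grammar.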

LL(1)
- LL(1) stands for Left-to-right parse, Leftmost-derivation, 1-symbol lookahead.
- Left-to-right parse means that we are scanning the input left-to-right.
- A grammar generating a table with no multiple entries is an LL(1) grammar.
  (multiple entries ⇒ the parser is not deterministic; every ambiguous grammar
  gives multiple entries, but so can unambiguous grammars with left-recursion
  or common prefixes)
- An LL(1) table is of size O(|N| ∗ |T|) where |N| and |T| are the numbers of
  non-terminals and terminals.

LL(k)
- LL(k) stands for Left-to-right parse, Leftmost-derivation, k-symbol lookahead.
- Grammars parsable with LL(k) parsers are called LL(k) grammars.
- An LL(3) grammar might require 3 tokens to choose the correct branch.
- An LL(3) table has an entry for every possible triple of tokens ⇒ O(|N| ∗ |T|^3)
- No ambiguous grammar is LL(k) for any k.
- LL(k) parsers can be constructed systematically: FIRST(X) gives all k-tuples
  that can begin a string derived from X, and FOLLOW(X) gives all k-tuples that
  can immediately follow X. It is straightforward but not so fun ...


Written Assignment 2: LL(1) Parsing Tables


Consider the following grammar

    S → uBDz    B → w | Bv    D → EF    E → y | ε    F → x | ε

where S is the start symbol and u, v, w, x, y, z are terminals.

1. Compute Nullable, FIRST, and FOLLOW for the non-terminals in
   the grammar above using the algorithms presented in the lecture slides.
2. Construct the LL(1) parsing table.
3. Give evidence that this grammar is not LL(1).
4. Modify the grammar as little as possible to make it an LL(1)
   grammar that accepts the same language.
5. Recompute the results in 1) and 2) using the modified grammar.
6. Simulate the parsing of the string uwvvyz using the newly
   constructed parsing table.

Deadline: 2007-02-18 (One week before PA step 1)