Top Down Parsing
Jonas Lundberg
Jonas.Lundberg@msi.vxu.se
http://w3.msi.vxu.se/users/jonasl/dac718
5 February 2007
Frontend Overview
[Figure: the compiler frontend and its symbol table. The lexical specification is given as regular expressions, the syntax specification as context-free grammars.]
Top-down Methods
Consider the grammar
(1) S → aABe    (2) A → b    (3) A → Abc    (4) B → d
Agenda
- Recursive Descent
- Table-driven Parsing
- Deriving an LL(1) parse table
Top-Down Parsing The Software Technology Group
pA() {
  if lookahead = b then
    eat(b); pC(); eat(d);
  elsif lookahead = e then
    eat(e); pF();
  else
    reportError();
  end if;
}

eat(Token t) {
  if lookahead = t then
    lookahead = nextToken();
  else
    reportError();
  end if;
}
The variable lookahead holds the next input token.
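The pseudocode above can be tried out with a small Python sketch. The slide does not show the productions for C and F, so the sketch assumes the hypothetical rules C → c and F → f. The end marker "$" and the helper `accepts` are also assumptions made for illustration only.

```python
class ParseError(Exception):
    """Raised where the pseudocode calls reportError()."""


class Parser:
    def __init__(self, tokens):
        # Assumption: "$" marks the end of the input.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.lookahead = self.tokens[0]

    def eat(self, t):
        # Consume the lookahead if it is the expected token t.
        if self.lookahead == t:
            self.pos += 1
            self.lookahead = self.tokens[self.pos]
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def pA(self):
        # A -> b C d | e F, chosen by looking at one token.
        if self.lookahead == "b":
            self.eat("b")
            self.pC()
            self.eat("d")
        elif self.lookahead == "e":
            self.eat("e")
            self.pF()
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def pC(self):
        self.eat("c")  # hypothetical rule C -> c

    def pF(self):
        self.eat("f")  # hypothetical rule F -> f


def accepts(tokens):
    """Return True if the whole token sequence can be derived from A."""
    parser = Parser(tokens)
    try:
        parser.pA()
    except ParseError:
        return False
    return parser.lookahead == "$"
```

Each non-terminal becomes one procedure, and each procedure picks its branch from the lookahead alone, which is exactly the restriction discussed under predictive parsing.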
Predictive Parsing
- RD (recursive descent) in summary:
  - Given a lookahead a ∈ T ...
  - ... and a non-terminal A ∈ N ...
  - the parser must decide which production A → α to use.
- The problem with RD (as with any LL(k) method) is that it must decide which alternative of a production to use by looking only one (or k) token(s) ahead.
- These methods are also called predictive parsing methods, since every choice of production is a prediction of what will follow.
Predictive Parsing Problems
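- Ambiguous grammar: gives a non-deterministic leftmost derivation.
- Common prefixes (need for left-factoring): A → αβ | αω makes prediction impossible, since both alternatives begin with α.
- Left recursion: A → Aα causes an infinite loop, since the procedure for A calls itself without consuming any input.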
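The left-recursion problem is usually removed by rewriting the grammar. The sketch below shows the standard transformation for immediate left recursion: A → Aα | β becomes A → βA′, A′ → αA′ | ε. The function name and the grammar representation (lists of symbols, with the empty list standing for ε) are assumptions chosen for illustration. Indirect left recursion and the degenerate rule A → A are not handled.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | eps.

    Each alternative is a list of symbols; [] stands for epsilon.
    Returns a dict mapping each non-terminal to its new alternatives.
    """
    new_nt = nonterminal + "'"
    # The alpha parts: alternatives that start with the non-terminal itself.
    recursive = [alt[1:] for alt in alternatives
                 if alt and alt[0] == nonterminal]
    # The beta parts: all other alternatives.
    others = [alt for alt in alternatives
              if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
result = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applied to E → E + T | T, this yields the E and E′ productions of the usual non-left-recursive expression grammar.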
G = {T, N, P, S}
T = {id, +, ∗, (, )}
N = {E, E′, T, T′, F}
S = E
where P is defined as
(1) E → TE′,   E′ → +TE′ | ε,
(2) T → FT′,   T′ → ∗FT′ | ε,
(3) F → id | (E)

      id         +            ∗             (          )
E     E → TE′                               E → TE′
E′               E′ → +TE′                             E′ → ε
T     T → FT′                               T → FT′
T′               T′ → ε       T′ → ∗FT′                T′ → ε
F     F → id                                F → (E)
Parse Tables
- Given a non-terminal A and lookahead t, M[A, t] returns the appropriate production to use.
- Using a parse table is easy (next slide).
- Implementing the use of a parse table is a bit more tricky (but not very hard).
- Constructing a parse table is much more difficult (but we have algorithms that can help us!).
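As a rough illustration of how a parse table is used, here is a minimal stack-based driver in Python for the expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → ∗FT′ | ε, F → id | (E). The names `TABLE`, `NONTERMINALS` and `parse` are assumptions. The "$" end marker and its two table entries are also added here so that the driver can detect the end of the input.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[A, t] -> right-hand side of the production to use; () stands for epsilon.
TABLE = {
    ("E", "id"): ("T", "E'"),
    ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", ")"): (),
    ("E'", "$"): (),          # assumption: entry for the end marker
    ("T", "id"): ("F", "T'"),
    ("T", "("): ("F", "T'"),
    ("T'", "+"): (),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): (),
    ("T'", "$"): (),          # assumption: entry for the end marker
    ("F", "id"): ("id",),
    ("F", "("): ("(", "E", ")"),
}


def parse(tokens, table=TABLE, start="E"):
    """Return the productions used, in leftmost-derivation order.

    Raises ValueError on a syntax error.
    """
    tokens = list(tokens) + ["$"]
    stack = ["$", start]      # top of the stack is the end of the list
    pos = 0
    used = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = table.get((top, look))
            if rhs is None:
                raise ValueError(f"no entry M[{top}, {look}]")
            used.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1                      # terminal matches: consume it
        else:
            raise ValueError(f"expected {top!r}, found {look!r}")
    return used


derivation = parse(["id", "+", "id", "*", "id"])
```

The driver is the same for every LL(1) grammar. Only the table changes, which is why constructing the table is the interesting part.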
Notations to be used
a, b, ... ∈ T,   A, B, ... ∈ N,   X, Y, Z ∈ (N ∪ T),   α, β, γ, ... ∈ (N ∪ T)∗
Algorithm 1: Nullable(X)
Algorithm 2: FIRST(α)
- FIRST(X) is the set of terminals that can begin strings derived from X.
- Algorithm for FIRST(X):
    FIRST(a) := {a} for each a ∈ T
    FIRST(A) := {} for each A ∈ N
    repeat
      for each production X → Y1 Y2 ... Yn do
        if Y1 is not nullable then
          add FIRST(Y1) to FIRST(X)
        else if Y1 ... Yi−1 are all nullable and Yi is not (or i = n) then
          add FIRST(Y1) ∪ ... ∪ FIRST(Yi) to FIRST(X)
        end if
      end for
    until FIRST did not change in this iteration
- Given a string α = X1 X2 ... Xn where Xi ∈ N ∪ T, we have
    FIRST(α) = FIRST(X1), if X1 is not nullable
    FIRST(α) = FIRST(X1) ∪ ... ∪ FIRST(Xi), if X1 ... Xi−1 are all nullable and Xi is not (or i = n)
  ⇒ given FIRST(X), we can compute FIRST(α) for each string α.
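The fixed-point iteration can be sketched in Python as follows, using the expression grammar as input. Since the slide for Nullable(X) shows no algorithm, `compute_nullable` is an assumed, standard formulation: a non-terminal is nullable if some alternative consists only of nullable symbols. The grammar encoding (a dict of alternatives, with [] for ε) and all function names are illustrative assumptions.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}


def compute_nullable(grammar):
    """Return the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            if any(all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


def compute_first(grammar, nullable):
    """Return FIRST(A) for every non-terminal A (iterate until no change)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for sym in rhs:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    new = sym_first - first[lhs]
                    if new:
                        first[lhs] |= new
                        changed = True
                    if sym not in nullable:
                        break   # later symbols cannot begin the string
    return first


def first_of_string(symbols, first, nullable):
    """FIRST(alpha) for a string alpha = X1 X2 ... Xn."""
    result = set()
    for sym in symbols:
        result |= first.get(sym, {sym})
        if sym not in nullable:
            break
    return result


NULLABLE = compute_nullable(GRAMMAR)
FIRST = compute_first(GRAMMAR, NULLABLE)
```

For this grammar only E′ and T′ are nullable, and FIRST(E) = FIRST(T) = FIRST(F) = {id, (}.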
Algorithm 3: FOLLOW(X )
- FOLLOW(X) is the set of terminals that can immediately follow X.
- That is, t ∈ FOLLOW(X) if there is any derivation containing Xt. This can also occur if a derivation contains XYZt where both Y and Z are nullable.
- Algorithm for FOLLOW(X):
    repeat
      for each nonterminal Y do
        for each production X → αYβ do
          add FIRST(β) to FOLLOW(Y)
          if β is nullable (or ε) then
            add FOLLOW(X) to FOLLOW(Y)
          end if
        end for
      end for
    until FOLLOW did not change in this iteration
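A possible Python rendering of the FOLLOW iteration is sketched below for the expression grammar. To keep the block self-contained, the nullable set and the FIRST sets are written out by hand. Adding the end marker "$" to FOLLOW of the start symbol is an assumption that the algorithm above does not state, and the function names are illustrative.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
# Hand-computed inputs for this grammar.
NULLABLE = {"E'", "T'"}
FIRST = {
    "E": {"id", "("}, "E'": {"+"},
    "T": {"id", "("}, "T'": {"*"},
    "F": {"id", "("},
}


def first_of_string(symbols):
    """Return (FIRST(beta), whether beta is nullable) for a symbol string."""
    result = set()
    for sym in symbols:
        result |= FIRST.get(sym, {sym})
        if sym not in NULLABLE:
            return result, False
    return result, True


def compute_follow(grammar, start, end_marker="$"):
    follow = {nt: set() for nt in grammar}
    follow[start].add(end_marker)   # assumption: input ends with "$"
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue            # FOLLOW is kept for non-terminals only
                    beta_first, beta_nullable = first_of_string(rhs[i + 1:])
                    new = set(beta_first)
                    if beta_nullable:
                        new |= follow[lhs]  # what follows lhs can follow sym
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```

For example, FOLLOW(T) contains + because E′ → +TE′ can follow T, and it also contains everything in FOLLOW(E) because E′ is nullable.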
Multiple Entries
Consider the following "dangling else" grammar:
S → iEtSS′ | a,   S′ → eS | ε,   E → b
where E = expression, S = statement, S′ = elsePart, i = if, t = then, e = else, a = otherStatement, and b = someExpression. It has the following parse table:

      a         b         e                  i             t
S     S → a                                  S → iEtSS′
S′                        S′ → eS, S′ → ε
E               E → b
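The double entry can be reproduced mechanically. The sketch below uses the usual table-filling rule, stated here as an assumption since these slides do not spell it out: for each production A → α, enter it in M[A, t] for every t ∈ FIRST(α), and if α is nullable also for every t ∈ FOLLOW(A). The nullable, FIRST and FOLLOW sets are written out by hand, "$" is an assumed end marker, and the function names are illustrative.

```python
from collections import defaultdict

# Dangling-else grammar; [] stands for epsilon.
GRAMMAR = {
    "S":  [["i", "E", "t", "S", "S'"], ["a"]],
    "S'": [["e", "S"], []],
    "E":  [["b"]],
}
# Hand-computed inputs for this grammar ("$" is an assumed end marker).
NULLABLE = {"S'"}
FIRST = {"S": {"i", "a"}, "S'": {"e"}, "E": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}


def build_table(grammar):
    """Map (non-terminal, lookahead) to the list of applicable productions."""
    table = defaultdict(list)
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            lookaheads = set()
            rhs_nullable = True
            for sym in rhs:
                lookaheads |= FIRST.get(sym, {sym})
                if sym not in NULLABLE:
                    rhs_nullable = False
                    break
            if rhs_nullable:
                lookaheads |= FOLLOW[lhs]
            for t in lookaheads:
                table[(lhs, t)].append(rhs)
    return dict(table)


def conflicts(table):
    """Return only the cells that hold more than one production."""
    return {cell: prods for cell, prods in table.items() if len(prods) > 1}


TABLE = build_table(GRAMMAR)
CONFLICTS = conflicts(TABLE)
```

Here the only cell with more than one production is M[S′, e]: on seeing else, the parser cannot tell whether to attach it to the innermost if (S′ → eS) or to leave it for an enclosing one (S′ → ε).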
LL(1)
- LL(1) stands for Left-to-right parse, Leftmost derivation, 1-symbol lookahead.
- Left-to-right parse means that the input is scanned from left to right.
- A grammar whose parse table has no multiple entries is an LL(1) grammar.
  (A multiple entry means the parser cannot choose a production deterministically. Every ambiguous grammar produces multiple entries, but so do some unambiguous grammars, e.g. left-recursive ones.)
- An LL(1) table is of size O(|N| ∗ |T|), where |N| and |T| are the numbers of non-terminals and terminals.
LL(k)
- LL(k) stands for Left-to-right parse, Leftmost derivation, k-symbol lookahead.
- Grammars parsable with LL(k) parsers are called LL(k) grammars.
- An LL(3) grammar may require 3 tokens of lookahead to choose the correct alternative.
- An LL(3) table has an entry for every possible triple of tokens ⇒ O(|N| ∗ |T|^3).
- No ambiguous grammar is LL(k) for any k.
- LL(k) parsers can be constructed systematically: FIRST(X) then gives all k-tuples that can begin a string derived from X, and FOLLOW(X) all k-tuples that can immediately follow X. It is straightforward but not much fun ...
S → uBDz,   B → w | Bv,   D → EF,   E → y | ε,   F → x | ε