Predictive Parsing - For A Given Non-Terminal, The Look-Ahead Symbol Uniquely Determines The Production To Apply
CS331 • LL Parsing
LL Parsing
LL(1) Grammars
Definition
A grammar G is LL(1) if and only if, for each set of
productions A ::= α1 | α2 | … | αn:
• FIRST(α1), FIRST(α2), …, FIRST(αn) are all pairwise
disjoint, and
• if αi ⇒* ε, then FIRST(αj) ∩ FOLLOW(A) = ∅ for all 1
≤ j ≤ n, i ≠ j.
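Both conditions can be checked mechanically once the sets are known. The following is a minimal Java sketch for one non-terminal A. It assumes the caller supplies FIRST(αi) for each alternative (terminals only, with ε excluded), a flag saying whether each αi is nullable, and FOLLOW(A). The class name LL1Check is illustrative.

```java
import java.util.*;

// Checks the two LL(1) conditions for one non-terminal A ::= α1 | … | αn.
// firstOfAlt.get(i) = FIRST(αi) (terminals only), altNullable.get(i) = (αi ⇒* ε),
// followA = FOLLOW(A). All sets are supplied by the caller, not computed here.
class LL1Check {
    static boolean isLL1(List<Set<String>> firstOfAlt,
                         List<Boolean> altNullable,
                         Set<String> followA) {
        int n = firstOfAlt.size();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                // Condition 1: FIRST(αi) and FIRST(αj) must be disjoint
                if (!Collections.disjoint(firstOfAlt.get(i), firstOfAlt.get(j))) return false;
                // Condition 2: if αi is nullable, FIRST(αj) ∩ FOLLOW(A) must be empty
                if (altNullable.get(i) && !Collections.disjoint(firstOfAlt.get(j), followA)) return false;
            }
        }
        return true;
    }
}
```

For S’ ::= ε | +S, the check passes: FIRST(ε) = {} and FIRST(+S) = {+}, and + is not in FOLLOW(S’) = {EOF, )}.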
Trace for input (1+2+(3+4))+5:
Sentential form    Lookahead    Remaining input
S                  (            (1+2+(3+4))+5
⇒ ES’              (            (1+2+(3+4))+5
⇒ (S)S’            1            1+2+(3+4))+5
⇒ (ES’)S’          1            1+2+(3+4))+5
⇒ (1S’)S’          +            +2+(3+4))+5
⇒ (1+S)S’          2            2+(3+4))+5
⇒ (1+ES’)S’        2            2+(3+4))+5
⇒ (1+2S’)S’        +            +(3+4))+5
Parse table:
        num      +       (       )      EOF
S       ES’              ES’
S’               +S              ε      ε
E       num              (S)
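The table can also drive a parser directly. In that style an explicit stack replaces recursion. The following is a minimal Java sketch of such a table-driven predictive parser. It assumes the input is already tokenized into strings ("num", "+", "(", ")") and ends with "EOF", and that "S'" is written for S’. The token spellings and the class name are illustrative.

```java
import java.util.*;

// Table-driven LL(1) parser for S → E S', S' → ε | + S, E → num | ( S ).
// TABLE.get(X).get(a) is the right-hand side to expand X with on lookahead a;
// a missing entry is a syntax error. Non-terminals are exactly the table's keys.
class TableDrivenLL1 {
    static final Map<String, Map<String, List<String>>> TABLE = Map.of(
        "S",  Map.of("num", List.of("E", "S'"), "(", List.of("E", "S'")),
        "S'", Map.of("+", List.of("+", "S"), ")", List.of(), "EOF", List.of()),
        "E",  Map.of("num", List.of("num"), "(", List.of("(", "S", ")")));

    // Returns true iff the token list (expected to end with "EOF") is derived from S.
    static boolean parse(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("EOF");                          // bottom marker: matches the final EOF token
        stack.push("S");                            // start symbol
        int pos = 0;
        while (!stack.isEmpty()) {
            if (pos >= tokens.size()) return false; // input ended without EOF
            String top = stack.pop();
            String look = tokens.get(pos);
            Map<String, List<String>> row = TABLE.get(top);
            if (row != null) {                      // non-terminal: consult the table
                List<String> rhs = row.get(look);
                if (rhs == null) return false;      // empty cell: syntax error
                for (int i = rhs.size() - 1; i >= 0; i--) stack.push(rhs.get(i)); // leftmost symbol ends on top
            } else if (top.equals(look)) {          // terminal: must match the lookahead
                pos++;
            } else {
                return false;
            }
        }
        return pos == tokens.size();                // no tokens left after EOF
    }
}
```

Each expansion step corresponds to one line of the trace above: the lookahead selects the table cell, and the chosen right-hand side replaces the non-terminal.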
CS331 • LL Parsing
How to implement?
• The table can be easily converted into a
recursive descent parser
Recursive descent parsing - LL(1)
Recursive descent is one of the simplest parsing techniques
used in practical compilers.
Parsing procedures may contain (call upon) code that performs some useful
“computation” (syntax-directed translation).
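As an illustration of such a computation, the procedures for S → ES’, S’ → ε | +S, E → num | (S) can return the value of the sum they recognize. The following is a minimal Java sketch. It assumes the input is a string of decimal numbers, '+', '(' and ')' with no blanks, and it uses '$' internally to stand for EOF. The class and method names are illustrative.

```java
// Recursive descent for S → E S', S' → ε | + S, E → num | ( S ),
// where each procedure returns the value of the text it parsed.
class SumEvaluator {
    private final String src;     // input, assumed to contain no blanks
    private int pos = 0;

    private SumEvaluator(String src) { this.src = src; }

    static int eval(String text) {
        SumEvaluator p = new SumEvaluator(text);
        int value = p.parseS();
        if (p.pos != p.src.length()) throw p.error();   // input must be fully consumed
        return value;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '$'; } // '$' = EOF

    private IllegalArgumentException error() {
        return new IllegalArgumentException("unexpected '" + peek() + "' at position " + pos);
    }

    // S → E S' : the value of E plus whatever S' contributes
    private int parseS() {
        char c = peek();
        if (Character.isDigit(c) || c == '(') return parseE() + parseSprime();
        throw error();
    }

    // S' → + S contributes the value of S; S' → ε (on ')' or EOF) contributes 0
    private int parseSprime() {
        char c = peek();
        if (c == '+') { pos++; return parseS(); }
        if (c == ')' || c == '$') return 0;
        throw error();
    }

    // E → num | ( S )
    private int parseE() {
        char c = peek();
        if (Character.isDigit(c)) {
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        if (c == '(') {
            pos++;                               // consume '('
            int value = parseS();
            if (peek() != ')') throw error();
            pos++;                               // consume ')'
            return value;
        }
        throw error();
    }
}
```

For example, SumEvaluator.eval("(1+2+(3+4))+5") parses the input and returns its sum, 15, in a single pass.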
Recursive-Descent Parser
Grammar: S → ES’    S’ → ε | +S    E → num | (S)
void parse_S() {
    switch (token) {                                   // token: the current lookahead token
        case num: parse_E(); parse_Sprime(); return;   // S → E S’
        case '(': parse_E(); parse_Sprime(); return;   // S → E S’
        default:  throw new ParseError();
    }
}
Recursive-Descent Parser
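void parse_Sprime() {                                  // S’ → ε | +S
    switch (token) {
        case '+': token = input.read(); parse_S(); return;   // S’ → +S: consume '+', then parse S
        case ')': return;                                     // S’ → ε, since ')' ∈ FOLLOW(S’)
        case EOF: return;                                     // S’ → ε, since EOF ∈ FOLLOW(S’)
        default:  throw new ParseError();
    }
}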
Recursive-Descent Parser
void parse_E() {                                       // E → num | (S)
    switch (token) {
        case num: token = input.read(); return;       // consume the number
        case '(': token = input.read(); parse_S();    // consume '(', then parse S
                  if (token != ')') throw new ParseError();
                  token = input.read(); return;       // consume ')'
        default:  throw new ParseError();
    }
}
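The three procedures can be assembled into one compilable class. The following Java sketch keeps the slides' switch-on-lookahead structure. It assumes several things the slides leave open: tokens are int codes, with the constants num = 256 for a number and EOF = -1, and '+', '(' and ')' use their character codes. It also assumes a hypothetical Lexer class whose read() returns the next token, and a ParseError exception class.

```java
// Hypothetical lexer: skips blanks, maps a run of digits to NUM and any other
// character to its own code, and returns EOF at the end of the string.
class Lexer {
    static final int NUM = 256, EOF = -1;
    private final String s;
    private int i = 0;
    Lexer(String s) { this.s = s; }
    int read() {
        while (i < s.length() && s.charAt(i) == ' ') i++;
        if (i == s.length()) return EOF;
        char c = s.charAt(i);
        if (Character.isDigit(c)) {
            while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            return NUM;
        }
        i++;
        return c;                          // '+', '(', ')' or an invalid character
    }
}

class ParseError extends RuntimeException {}

class RecursiveDescent {
    static final int num = Lexer.NUM, EOF = Lexer.EOF;   // constants usable as case labels
    private final Lexer input;
    private int token;                                   // current lookahead token

    RecursiveDescent(String text) { input = new Lexer(text); token = input.read(); }

    // Accepts iff parse_S succeeds and the whole input has been consumed.
    static boolean accepts(String text) {
        try {
            RecursiveDescent p = new RecursiveDescent(text);
            p.parse_S();
            return p.token == EOF;
        } catch (ParseError e) {
            return false;
        }
    }

    void parse_S() {                                     // S → E S’
        switch (token) {
            case num: parse_E(); parse_Sprime(); return;
            case '(': parse_E(); parse_Sprime(); return;
            default:  throw new ParseError();
        }
    }

    void parse_Sprime() {                                // S’ → ε | +S
        switch (token) {
            case '+': token = input.read(); parse_S(); return;
            case ')': return;
            case EOF: return;
            default:  throw new ParseError();
        }
    }

    void parse_E() {                                     // E → num | (S)
        switch (token) {
            case num: token = input.read(); return;
            case '(': token = input.read(); parse_S();
                      if (token != ')') throw new ParseError();
                      token = input.read(); return;
            default:  throw new ParseError();
        }
    }
}
```

The final `token == EOF` check in accepts() matters. Without it, an input such as "1)" would be accepted, because parse_S legitimately stops at ')'.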
Call tree = Parse tree
Figure: for the input (1+2+(3+4))+5, the tree of calls to parse_S, parse_E and parse_S’ has the same shape as the parse tree; each call corresponds to the S, E or S’ node it recognizes.
Recall: Expression grammar
Expression grammar with precedence
<goal> ::= <expr>
<expr> ::= <term> <expr’>
<expr’> ::= + <expr>
| - <expr>
| ε
<term> ::= <factor> <term’>
<term’> ::= * <term>
| / <term>
| ε
<factor> ::= num
| id
LL(1) parse table
            id         num        +          -          *          /          eof
<goal>      g → e      g → e      -          -          -          -          -
<expr>      e → te’    e → te’    -          -          -          -          -
<expr’>     -          -          e’ → +e    e’ → -e    -          -          e’ → ε
<term>      t → ft’    t → ft’    -          -          -          -          -
<term’>     -          -          t’ → ε     t’ → ε     t’ → *t    t’ → /t    t’ → ε
<factor>    f → id     f → num    -          -          -          -          -
Recursive Descent Parser
For the expression grammar:
goal:
    token ← next_token();
    if (expr() = ERROR or token ≠ EOF) then
        return ERROR;
    else return OK;
expr:
    if (term() = ERROR) then
        return ERROR;
    else return expr_prime();
expr_prime:
    if (token = PLUS) then
        token ← next_token();
        return expr();
    else if (token = MINUS) then
        token ← next_token();
        return expr();
    else return OK;
Recursive Descent Parser (cont.)
term:
    if (factor() = ERROR) then
        return ERROR;
    else return term_prime();
term_prime:
    if (token = MULT) then
        token ← next_token();
        return term();
    else if (token = DIV) then
        token ← next_token();
        return term();
    else return OK;
factor:
    if (token = NUM) then
        token ← next_token();
        return OK;
    else if (token = ID) then
        token ← next_token();
        return OK;
    else return ERROR;
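The pseudocode translates almost line for line into Java. The following is a sketch in which boolean results play the role of OK/ERROR. It also assumes a hypothetical tokenizer, nextToken(), standing in for next_token(): identifiers are runs of letters and digits starting with a letter, numbers are runs of digits, and whitespace is skipped. The class and enum names are illustrative.

```java
// Recursive descent for the expression grammar with precedence.
// true plays the role of OK and false the role of ERROR.
class ExprParser {
    enum Tok { ID, NUM, PLUS, MINUS, MULT, DIV, EOF, BAD }

    private final String s;
    private int i = 0;
    private Tok token;                       // current lookahead token

    private ExprParser(String s) { this.s = s; }

    // Hypothetical stand-in for next_token()
    private Tok nextToken() {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        if (i == s.length()) return Tok.EOF;
        char c = s.charAt(i);
        if (Character.isLetter(c)) {
            while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
            return Tok.ID;
        }
        if (Character.isDigit(c)) {
            while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            return Tok.NUM;
        }
        i++;
        switch (c) {
            case '+': return Tok.PLUS;
            case '-': return Tok.MINUS;
            case '*': return Tok.MULT;
            case '/': return Tok.DIV;
            default:  return Tok.BAD;
        }
    }

    // goal: parse <expr>, then require EOF
    static boolean goal(String text) {
        ExprParser p = new ExprParser(text);
        p.token = p.nextToken();
        return p.expr() && p.token == Tok.EOF;
    }

    private boolean expr() { return term() && exprPrime(); }      // <expr> ::= <term> <expr’>

    private boolean exprPrime() {                                   // <expr’> ::= + <expr> | - <expr> | ε
        if (token == Tok.PLUS || token == Tok.MINUS) { token = nextToken(); return expr(); }
        return true;                                                // ε
    }

    private boolean term() { return factor() && termPrime(); }     // <term> ::= <factor> <term’>

    private boolean termPrime() {                                   // <term’> ::= * <term> | / <term> | ε
        if (token == Tok.MULT || token == Tok.DIV) { token = nextToken(); return term(); }
        return true;                                                // ε
    }

    private boolean factor() {                                      // <factor> ::= num | id
        if (token == Tok.NUM || token == Tok.ID) { token = nextToken(); return true; }
        return false;
    }
}
```

As in the pseudocode, the ε cases return OK without checking the lookahead against FOLLOW. A stray token is then caught by the EOF test in goal.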
Constructing parse tables
Grammar: S → ES’    S’ → ε | +S    E → num | (S)
Question: which production belongs in each cell of the table (rows S, S’, E; columns num, +, (, ), EOF)?
Constructing Parse Tables
• Use FIRST and FOLLOW sets
• Recall:
– FIRST(γ), for an arbitrary string γ of terminals and non-terminals, is the set of symbols that might begin the
fully expanded version of γ
– FOLLOW(X), for a non-terminal X, is the set of symbols
that might follow the derivation of X in the input stream
Parse table entries
Consider a production X → γ
• Add → γ to the X row for each symbol in FIRST(γ)
  Example (S → ES’, S’ → ε | +S, E → num | (S)), entries from this rule:
          num     +      (      )     EOF
  S       ES’            ES’
  S’              +S
  E       num            (S)
• If γ can derive ε (γ is nullable), add → γ for each
symbol in FOLLOW(X)
Algorithm (nullable)
Assume all non-terminals are non-nullable;
apply the rules repeatedly until no non-terminal changes status.
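The slide leaves the nullable rules implicit. The standard rule is that X is nullable if some production X → Y1 … Yk has every Yi nullable; an ε-production is the case k = 0. The following is a minimal Java sketch of that fixpoint. It assumes a grammar encoded as a map from each non-terminal to its list of right-hand sides. Any symbol that is not a key counts as a terminal, and ε is the empty list. This encoding and the class name are illustrative assumptions.

```java
import java.util.*;

// Fixpoint computation of nullable non-terminals: start with none nullable and
// mark X nullable once some right-hand side of X consists only of nullable symbols.
class Nullable {
    static Set<String> compute(Map<String, List<List<String>>> grammar) {
        Set<String> nullable = new HashSet<>();
        boolean changed = true;
        while (changed) {                              // repeat until no status changes
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                if (nullable.contains(e.getKey())) continue;
                for (List<String> rhs : e.getValue()) {
                    // Terminals are never in the set, so containsAll fails if rhs has one;
                    // the empty list (ε) always passes.
                    if (nullable.containsAll(rhs)) {
                        nullable.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return nullable;
    }
}
```

Several passes may be needed. For example, in A → B C, B → ε, C → B, the nullability of A is only discovered after B and C are marked.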
Constructing FIRST sets
• FIRST(X) ⊇ FIRST(γ) if X → γ
• FIRST(a) = {a} for a terminal a
• FIRST(Xγ) ⊇ FIRST(X)
• FIRST(Xγ) ⊇ FIRST(γ) if X is nullable
Algorithm
Assume FIRST(γ) = {} for all γ;
apply the rules repeatedly to build the FIRST sets.
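The following is a minimal Java sketch of these rules, iterated to a fixpoint. It uses the same illustrative grammar encoding as before: a map from non-terminal to right-hand sides, where any symbol that is not a key is a terminal and ε is the empty list. FIRST sets hold terminals only. Nullability is tracked in a separate set and computed in the same loop, because the last rule needs it. The class name FirstSets is an assumption.

```java
import java.util.*;

// FIRST sets (and nullable) for every non-terminal, by repeated rule application.
class FirstSets {
    final Map<String, List<List<String>>> grammar;
    final Set<String> nullable = new HashSet<>();
    final Map<String, Set<String>> first = new HashMap<>();

    FirstSets(Map<String, List<List<String>>> grammar) {
        this.grammar = grammar;
        for (String x : grammar.keySet()) first.put(x, new HashSet<>());  // start from {}
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                String x = e.getKey();
                for (List<String> rhs : e.getValue()) {
                    // FIRST(X) ⊇ FIRST(γ) for X → γ
                    if (first.get(x).addAll(firstOf(rhs))) changed = true;
                    // X is nullable if every symbol of γ is (always true for ε)
                    if (nullable.containsAll(rhs) && nullable.add(x)) changed = true;
                }
            }
        }
    }

    // FIRST of a string Y1 Y2 …: FIRST(a) = {a} for a terminal; move past Yi only if it is nullable
    Set<String> firstOf(List<String> symbols) {
        Set<String> result = new HashSet<>();
        for (String y : symbols) {
            if (!grammar.containsKey(y)) { result.add(y); break; }   // terminal
            result.addAll(first.get(y));
            if (!nullable.contains(y)) break;
        }
        return result;
    }
}
```

On the grammar S → ES’, S’ → ε | +S, E → num | (S), this yields FIRST(S) = FIRST(E) = {num, (} and FIRST(S’) = {+}. The firstOf helper also gives FIRST of any right-hand side, which is what the table construction needs.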
Constructing FOLLOW sets
• FOLLOW(S) ⊇ { EOF }
• if X → αYβ
– FOLLOW(Y) ⊇ FIRST(β)
• if X → αYβ and β is nullable (or non-existent)
– FOLLOW(Y) ⊇ FOLLOW(X)
Algorithm
Assume FOLLOW(X) = { } for all X;
apply the rules repeatedly to build the FOLLOW sets.
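The following is a minimal Java sketch of the FOLLOW rules as a fixpoint. It assumes nullable and the FIRST sets of the non-terminals have already been computed and are passed in. It uses the same illustrative grammar encoding as above (non-terminals are the map's keys) and writes "EOF" for the end-of-input marker. The class name is an assumption.

```java
import java.util.*;

// FOLLOW sets by repeated application of the three rules until nothing changes.
class FollowSets {
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar, String start,
                                            Set<String> nullable, Map<String, Set<String>> first) {
        Map<String, Set<String>> follow = new HashMap<>();
        for (String x : grammar.keySet()) follow.put(x, new HashSet<>());
        follow.get(start).add("EOF");                                  // FOLLOW(S) ⊇ {EOF}
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
                String x = e.getKey();
                for (List<String> rhs : e.getValue()) {
                    for (int i = 0; i < rhs.size(); i++) {
                        String y = rhs.get(i);
                        if (!grammar.containsKey(y)) continue;         // FOLLOW only for non-terminals
                        boolean betaNullable = true;                   // β = symbols after Y
                        for (int j = i + 1; j < rhs.size() && betaNullable; j++) {
                            String b = rhs.get(j);
                            if (grammar.containsKey(b)) {
                                changed |= follow.get(y).addAll(first.get(b));   // FOLLOW(Y) ⊇ FIRST(β)
                                betaNullable = nullable.contains(b);
                            } else {
                                changed |= follow.get(y).add(b);       // terminal: FIRST(b) = {b}
                                betaNullable = false;
                            }
                        }
                        if (betaNullable && !y.equals(x))              // β nullable or empty
                            changed |= follow.get(y).addAll(follow.get(x));  // FOLLOW(Y) ⊇ FOLLOW(X)
                    }
                }
            }
        }
        return follow;
    }
}
```

The `!y.equals(x)` guard only skips adding FOLLOW(X) to itself, which could never change the set.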
Example: S → ES’    S’ → ε | +S    E → num | (S)
Nullable
– Only S’ is nullable
FIRST
– FIRST(ES’) = {num, ( }
– FIRST(+S) = { + }
– FIRST(num) = {num}
– FIRST( (S) ) = { ( }
– FIRST(S’) = { + }
FOLLOW
– FOLLOW(S) = { EOF, ) }
– FOLLOW(S’) = FOLLOW(S) = {EOF, )}
– FOLLOW(E) = FIRST(S’) ∪ FOLLOW(S) = { +, ), EOF }
Creating the parse table
Grammar: S → ES’    S’ → ε | +S    E → num | (S)
For each production X → γ:
• Add → γ to the X row for each symbol in FIRST(γ)
• If γ is nullable, add → γ for each symbol in FOLLOW(X)
• Entry for [S, EOF] is ACCEPT
FIRST(ES’) = {num, ( }       FOLLOW(S) = { EOF, ) }
FIRST(+S) = { + }            FOLLOW(S’) = FOLLOW(S) = {EOF, )}
FIRST(num) = {num}           FOLLOW(E) = { +, ), EOF }
FIRST( (S) ) = { ( }
FIRST(S’) = { + }
        num      +       (       )      EOF
S       ES’              ES’            Accept
S’               +S              ε      ε
E       num              (S)
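The two table rules can be applied mechanically. The following is a minimal Java sketch that takes nullable, FIRST and FOLLOW as inputs and uses the illustrative grammar encoding from the earlier sketches. It stores each cell under the key "X,a" and records a conflict whenever a second production lands in an already-filled cell. The ACCEPT entry is left out, and the class name is an assumption.

```java
import java.util.*;

// Builds the LL(1) table: X → γ goes into [X, a] for a in FIRST(γ), and also for
// a in FOLLOW(X) when γ is nullable. Cells already taken are reported in conflicts.
class TableBuilder {
    static Map<String, List<String>> build(Map<String, List<List<String>>> grammar,
                                           Set<String> nullable,
                                           Map<String, Set<String>> first,
                                           Map<String, Set<String>> follow,
                                           List<String> conflicts) {
        Map<String, List<String>> table = new HashMap<>();
        for (Map.Entry<String, List<List<String>>> e : grammar.entrySet()) {
            String x = e.getKey();
            for (List<String> rhs : e.getValue()) {
                Set<String> lookaheads = new HashSet<>();
                boolean rhsNullable = true;
                for (String y : rhs) {                              // compute FIRST(γ)
                    if (!grammar.containsKey(y)) { lookaheads.add(y); rhsNullable = false; break; }
                    lookaheads.addAll(first.get(y));
                    if (!nullable.contains(y)) { rhsNullable = false; break; }
                }
                if (rhsNullable) lookaheads.addAll(follow.get(x)); // γ nullable: FOLLOW(X) too
                for (String a : lookaheads) {
                    List<String> previous = table.putIfAbsent(x + "," + a, rhs);
                    if (previous != null) conflicts.add(x + "," + a);  // two productions, one cell
                }
            }
        }
        return table;
    }
}
```

For the grammar above, this yields seven filled cells and no conflicts, matching the table (apart from ACCEPT). An ambiguous grammar such as S → S+S | S*S | num instead puts three productions into [S, num].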
Ambiguous grammars
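Constructing a predictive parse table for an ambiguous
grammar results in conflicts (the converse does not hold: a table with conflicts does not mean the grammar is ambiguous)
Example: S → S+S | S*S | num
Since FIRST(S+S) = FIRST(S*S) = {num}, the [S, num] entry of the parse table
contains both S → S+S and S → S*S.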
LL(1) grammars
Provable facts about LL(1) grammars:
– no left recursive grammar is LL(1)
– no ambiguous grammar is LL(1)
– LL(1) parsers operate in linear time
– an ε-free grammar where each alternative expansion for A begins
with a distinct terminal is a simple LL(1) grammar
• S ::= aS | a is not LL(1), because FIRST(aS) = FIRST(a) = {a}
S ::= aS’
S’ ::= aS’ | ε
accepts the same language and is LL(1)
LL grammars
LL(1) grammars
– may need to rewrite grammar (left recursion removal, left factoring)
– resulting grammar larger, less maintainable
LL(k) grammars
– k-token lookahead
– more powerful than LL(1) grammars
– example:
S ::= ac | abc is LL(2)
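Not all grammars are LL(k)
• Example:
– a grammar for the language { aⁱbʲ | i ≥ j }
• Problem:
– the parser must choose a production after only k tokens of lookahead, but whether an a will be matched by a later b cannot be known until all the a's have been read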
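For S ::= ac | abc, both alternatives begin with a, so one token of lookahead cannot choose between them, but two tokens can. The following is a minimal Java sketch of that two-token decision. It assumes single-character tokens given as a string, and it uses '$' for a missing token (EOF). The class name is illustrative.

```java
// LL(2) choice for S ::= ac | abc: the second lookahead token ('c' or 'b') decides.
class LL2Example {
    static boolean parse(String in) {
        char t1 = in.length() > 0 ? in.charAt(0) : '$';   // first lookahead token
        char t2 = in.length() > 1 ? in.charAt(1) : '$';   // second lookahead token
        String rhs;
        if (t1 == 'a' && t2 == 'c') rhs = "ac";           // lookahead "ac" selects S ::= ac
        else if (t1 == 'a' && t2 == 'b') rhs = "abc";     // lookahead "ab" selects S ::= abc
        else return false;                                 // no production for this lookahead
        return in.equals(rhs);                             // match the chosen right-hand side, then EOF
    }
}
```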
Completing the parser
One of the key jobs of the parser is to build an intermediate
representation of the source code.
Creating the AST
AST Representation
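AST for (1 + 2 + (3 + 4)) + 5, written with Add and Num nodes:
Add(Add(Num(1), Add(Num(2), Add(Num(3), Num(4)))), Num(5))
Each + in the parse tree becomes an Add node and each number a Num leaf.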
Creating the AST
Just add code to each parsing routine to create
the appropriate nodes!
Works because parse tree and call tree have the same
shape
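The following is a minimal Java sketch of adding node construction to the parsing routines for S → ES’, S’ → ε | +S, E → num | (S). The node names Add and Num follow the AST representation above. Their fields, the Node base class, the toString format, and the class name AstBuilder are assumptions. The input is assumed to contain no blanks, and '$' stands for EOF.

```java
// AST node classes (shapes assumed for this sketch)
abstract class Node {}

class Num extends Node {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return "Num(" + value + ")"; }
}

class Add extends Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    @Override public String toString() { return "Add(" + left + ", " + right + ")"; }
}

// Each parsing routine returns the subtree for the text it recognized.
class AstBuilder {
    private final String s;
    private int pos = 0;

    private AstBuilder(String s) { this.s = s; }

    static Node parse(String text) {
        AstBuilder p = new AstBuilder(text);
        Node n = p.parseS();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return n;
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '$'; }   // '$' = EOF

    // S → E S’ : E's tree alone if S’ is ε, otherwise Add(E, S)
    private Node parseS() {
        Node left = parseE();
        Node rest = parseSprime();
        return rest == null ? left : new Add(left, rest);
    }

    // S’ → + S returns the tree for S; S’ → ε (on ')' or EOF) returns null
    private Node parseSprime() {
        char c = peek();
        if (c == '+') { pos++; return parseS(); }
        if (c == ')' || c == '$') return null;
        throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
    }

    // E → num gives a Num leaf; E → ( S ) gives the tree for S (parentheses leave no node)
    private Node parseE() {
        char c = peek();
        if (Character.isDigit(c)) {
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            return new Num(Integer.parseInt(s.substring(start, pos)));
        }
        if (c == '(') {
            pos++;
            Node inner = parseS();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner;
        }
        throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
    }
}
```

Parsing "(1+2+(3+4))+5" this way produces Add(Add(Num(1), Add(Num(2), Add(Num(3), Num(4)))), Num(5)). Because S’ → +S is right-recursive, the additions nest to the right.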