0% found this document useful (0 votes)
38 views33 pages

Predictive Parsing - For A Given Non-Terminal, The Look-Ahead Symbol Uniquely Determines The Production To Apply

The document discusses predictive parsing and LL(1) grammars, describing how LL(1) grammars can be used to build table-driven predictive parsers through recursive descent parsing procedures associated with each non-terminal that call upon one another to parse input according to the grammar. An example grammar for arithmetic expressions is provided along with its LL(1) parsing table and the recursive descent parsing procedures that could implement an LL(1) parser for that grammar.

Uploaded by

lumona_gr12
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views33 pages

Predictive Parsing - For A Given Non-Terminal, The Look-Ahead Symbol Uniquely Determines The Production To Apply

The document discusses predictive parsing and LL(1) grammars, describing how LL(1) grammars can be used to build table-driven predictive parsers through recursive descent parsing procedures associated with each non-terminal that call upon one another to parse input according to the grammar. An example grammar for arithmetic expressions is provided along with its LL(1) parsing table and the recursive descent parsing procedures that could implement an LL(1) parser for that grammar.

Uploaded by

lumona_gr12
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 33

Predictive Parsing

• For a given non-terminal, the look-


ahead symbol uniquely determines the
production to apply
• Top-down parsing = predictive
parsing
• Driven by a predictive parsing table
of non-terminals X input symbols 
productions

CS331 • LL Parsing
LL Parsing

• Reads input from left to right and constructs


leftmost derivation (forwards)
• LL Parsing is predictive
• Features
– input parsed from left to right
– leftmost derivation (forward)
– one token lookahead

CS331 • LL Parsing
LL(1) Grammars
Definition
A grammar G is LL(1) if and only if for each set of
productions A ::= 1 | 2| …| n
• FIRST(1), FIRST(2), …FIRST(n) are all pairwise
disjoint, and
• if i *  then FIRST(j)  FOLLOW(A) = , for all 1
≤ j ≤ n, i  j.

What rule to select for a given non-terminal and input


token can be represented in a parse table M.
Algorithm for LL(1) parse table construction must
not result in multiple entries for any M[A, a] or M[A,
eof] (Aho, Sethi, and Ullman, Algorithm 4.4)
Whether a grammar is LL(1) or not is decidable
CS331 • LL Parsing
Table-driven predictive parser - LL(1)
Input: a string w and a parsing table M for G
push eof
push Start Symbol
token  next token()
X  top-of-stack
repeat
if X is a terminal then
if X = token then
pop X
token  next token()
else error()
else /* X is a non-terminal */
if M[X,token] = X  Y1Y2 …Yk then
pop X
push Yk, Yk-1, …, Y1
else error()
X  top-of-stack
until X = eof
if token  eof then error()

Aho, Sethi, and Ullman, Algorithm 4.3


CS331 • LL Parsing
Example grammar and its table
Expression grammar with precedence
<goal> ::= <expr>
<expr> ::= <term> <expr’>
<expr’> ::= + <expr>
| - <expr>
| 
<term> ::= <factor> <term’>
<term’> ::= * <term>
| / <term>
| 
<factor> ::= num
| id
LL(1) parse table
id num + - * / eof
<goal> g→ e g→ e - - - - -
<expr> e → te’ e → te’ - - - - -
<expr’> - - e’ → +e e’ → +e e’ → ε
<term > e → te’ e → te’ - - - - -
<term ’> - - t’ → ε t’ → ε t’ → *t t’ → /t t’ → ε
<factor> f’ → id f→ num - - - - -
CS331 • LL Parsing
S  ES’
Another example S’   | +S
E  num | (S)

S ( (1+2+(3+4))+5
 ES’ ( (1+2+(3+4))+5
 (S)S’ 1 (1+2+(3+4))+5
 (ES’)S’ 1 (1+2+(3+4))+5
 (1S’)S’ + (1+2+(3+4))+5
 (1+S)S’ 2 (1+2+(3+4))+5
 (1+ES’)S’ 2 (1+2+(3+4))+5
 (1+2 S’)S’ + (1+2+(3+4))+5

Parse table:
num + ( ) EOF
S  ES’  ES’
S’  +S   
E  num  (S)
CS331 • LL Parsing
How to implement?
• The table can be easily converted into a
recursive descent parser

num + ( ) EOF
S  ES’  ES’
S’  +S    
E  num  (S)

Three procedures: parse_S, parse_S’, parse_E

CS331 • LL Parsing
Recursive descent parsing - LL(1)
Recursive descent is one of the simplest parsing techniques
used in practical compilers

Each non-terminal has an associated parsing procedure that can


recognize any sequence of tokens generated by that non-terminal
Within a parsing procedure, both non-terminals and terminals can be
matched:
• non-terminal A
– call parsing procedure for A
• token t
– compare t with current input token
– if match, consume input
– otherwise, ERROR

Parsing procedures may contain (call upon) code that performs some useful
“computation" (syntax directed translation)

CS331 • LL Parsing
Recursive-Descent Parser S  ES’
S’   | +S
void parse_S () { Lookahead
E  num | (S)
token
switch (token) {
case num: parse_E(); parse_S’(); return;
case ‘(’: parse_E(); parse_S’(); return;
default: throw new ParseError();
}
}
num + ( ) EOF
S  ES’  ES’
S’  +S    
E  num  (S)

CS331 • LL Parsing
Recursive-Descent Parser
void parse_S’() {
switch (token) {
case ‘+’: token = input.read(); parse_S(); return;
case ‘)’: return;
case EOF: return;
default: throw new ParseError();
}
}
num + ( ) EOF
S  ES’  ES’
S’  +S    
E  num  (S)

CS331 • LL Parsing
Recursive-Descent Parser
void parse_E() {
switch (token) {
case number: token = input.read(); return;
case ‘(‘: token = input.read(); parse_S();
if (token != ‘)’) throw new ParseError();
token = input.read(); return;
default: throw new ParseError(); }
}
num + ( ) EOF
S  ES’  ES’
S’  +S    
E  num  (S)

CS331 • LL Parsing
Call tree = Parse tree (1+2+(3+4))+5

parse_S S
E S’
parse_E parse_S’ ( S ) S
+

parse_S E S’ 5
parse_S
1 + S
parse_E parse_S’
E S’
parse_S S
2 +
parse_E parse_S’ E S’

parse_S
( S )

E S’
parse_E parse_S’ S
3 +
parse_S E
CS331 • LL Parsing
4
Recall: Expression grammar
Expression grammar with precedence
<goal> ::= <expr>
<expr> ::= <term> <expr’>
<expr’> ::= + <expr>
| - <expr>
| 
<term> ::= <factor> <term’>
<term’> ::= * <term>
| / <term>
| 
<factor> ::= num
| id
LL(1) parse table
id num + - * / eof
<goal> g→ e g→ e - - - - -
<expr> e → te’ e → te’ - - - - -
<expr’> - - e’ → +e e’ → +e e’ → ε
<term > e → te’ e → te’ - - - - -
<term ’> - - t’ → ε t’ → ε t’ → *t t’ → /t t’ → ε
<factor> f’ → id f→ num - - - - -
CS331 • LL Parsing
Recursive Descent Parser
For the expression grammar:
goal:
token  next token();
if (expr() = ERROR | token  EOF) then
return ERROR;
else return OK;
expr:
if (term() = ERROR) then
return ERROR;
else return expr_prime();
Expr_prime:
if (token = PLUS) then
token  next_token();
return expr();
else if (token = MINUS) then
token  next_token();
return expr();
else return OK;

CS331 • LL Parsing
Recursive Descent Parser (cont.)
term:
if (factor() = ERROR) then
return ERROR;
else return term_prime();
term_prime:
if (token = MULT) then
token  next token();
return term();
else if (token = DIV) then
token  next token();
return term();
else return OK;
factor:
if (token = NUM) then
token  next token();
return OK;
else if (token = ID) then
token  next token();
return OK;
else return ERROR;
CS331 • LL Parsing
Constructing parse tables

Needed: algorithm for automatically generating


a predictive parse table from a grammar

S  ES’ ? S
num
 ES’
+ (
 ES’
) EOF

S’   | +S
S’  +S    
E  num | (S)
E  num  (S)

CS331 • LL Parsing
Constructing Parse Tables
• Use FIRST and FOLLOW sets
• Recall:
– FIRST() for arbitrary string of terminals and non-
terminals is the set of symbols that might begin the
fully expanded version of 
– FOLLOW(X) for a non-terminal X is the set of symbols
that might follow the derivation of X in the input stream

First Follow
CS331 • LL Parsing
Parse table entries
Consider a production X  
• Add   to the X row for each symbol in FIRST()

num + ( ) EOF
S  ES’ S  ES’  ES’
S’   | +S S’  +S    
E  num | (S)
E  num  (S)
• If  can derive  ( is nullable), add   for each
symbol in FOLLOW(X)

Grammar is LL(1) if there are no conflicting


entries
CS331 • LL Parsing
Computing nullable

X is nullable if it can derive the empty string:


– Directly: X  
– Indirectly: X has a production X  YZ… where
all rhs symbols (Y, Z) are nullable

Algorithm
Assume all non-terminals non-nullable,
apply rules repeatedly until no change in status

CS331 • LL Parsing
Constructing FIRST sets

• FIRST(X)  FIRST() if X  
• FIRST(a) = {a}
• FIRST(X)  FIRST(X)
• FIRST(X)  FIRST() if X is nullable

Algorithm
Assume FIRST() ={} for all ,
apply rules repeatedly to build FIRST sets

CS331 • LL Parsing
Constructing FOLLOW sets
• FOLLOW(S)  { EOF }
• if X  Y
– FOLLOW(Y) = FIRST()
– FIRST(X)  FIRST(X)
• if X  Y and  is nullable (or non-existent)
– FOLLOW(Y)  FOLLOW(X)

Algorithm
Assume FOLLOW(X) = { } for all X,
apply rules repeatedly to build FOLLOW sets

Common theme: iterative analysis.


Start with initial assignment, apply rules until no change
CS331 • LL Parsing
Example

Nullable
– Only S’ is nullable S  ES’
S’   | +S
FIRST E  num | (S)
– FIRST(ES’ ) = {num, ( }
– FIRST(+S) = { + }
– FIRST(num) = {num}
– FIRST( (S) ) = { ( } FOLLOW
– FIRST(S’) = { + } –FOLLOW(S) = { EOF, ) }
–FOLLOW(S’) = = {EOF, )}
–FOLLOW(E) = { +, ), EOF}
CS331 • LL Parsing
Creating the parse table S  ES’
S’   | +S
E  num | (S)
For each production X  
• Add   to the X row for each symbol in FIRST()
• If  is nullable, add   for each symbol in FOLLOW(X)
• Entry for [S, EOF] is ACCEPT
FIRST(ES’ ) = {num, ( }
FIRST(+S) = { + }
FOLLOW(S) = { EOF, ) } FIRST(num) = {num}
FOLLOW(S’) = = {EOF, )} FIRST( (S) ) = { ( }
FOLLOW(E) = { +, ), EOF} FIRST(S’) = { + }

num + ( ) EOF
S  ES’  ES’ Accept

S’  +S    
E  num  (S)
CS331 • LL Parsing
Ambiguous grammars
Construction of predictive parse table for ambiguous
grammar results in conflicts (but converse does not hold)

Grammar and FIRST sets


S  S + S | S * S | num
FIRST(S + S) = FIRST(S * S)
= FIRST(num) = { num }

Parse table
num + -
S S+S
S S*S

CS331 • LL Parsing
LL(1) grammars
Provable facts about LL(1) grammars:
– no left recursive grammar is LL(1)
– no ambiguous grammar is LL(1)
– LL(1) parsers operate in linear time
– an -free grammar where each alternative expansion for A begins
with a distinct terminal is a simple LL(1) grammar

Not all grammars are LL(1)


• S ::= aS | a is not LL(1)
FIRST(aS) = FIRST(a) = {a}

• S ::= aS’
S’ ::= aS | 
accepts the same language and is LL(1)

CS331 • LL Parsing
LL grammars
LL(1) grammars
– may need to rewrite grammar (left recursion removal, left factoring)
– resulting grammar larger, less maintainable
LL(k) grammars
– k-token lookahead
– more powerful than LL(1) grammars
– example:
S ::= ac | abc is LL(2)
Not all grammars are LL(k)
• Example:
– Set of productions of form: S ::= aibj for i ≥ j
• Problem:
– must choose production after k tokens of lookahead

Bottom-up parsers avoid some of these problems

CS331 • LL Parsing
Completing the parser
One of the key jobs of the parser is to build an intermediate
representation of the source code.

To build an abstract syntax tree in the recursive descent


parser, we can simply insert code at the appropriate points

E.g., for expression grammar:


• factor() can stack nodes id, num
• term_prime() can stack nodes *, /
• term() can pop 3, build and push subtree
• expr_prime() can stack nodes +, -
• expr() can pop 3, build and push subtree
• goal() can pop and return tree

CS331 • LL Parsing
Creating the AST

abstract class Expr { }

class Add extends Expr {


Expr left, right;
Add(Expr L, Expr R) { left = L; right = R; } Expr
}

class Num extends Expr { Num Add


int value;
Num (int v) { value = v);
}

CS331 • LL Parsing
AST Representation
(1 + 2 + (3 + 4)) + 5

+ Add

+ 5 Add Num (5)

1 + Num(1) Add

2 + Num(2) Add

3 4 Num(3) Num(4)

How can we generate this structure during recursive-descent


parsing?

CS331 • LL Parsing
Creating the AST
Just add code to each parsing routine to create
the appropriate nodes!
Works because parse tree and call tree have the same
shape

parse_S, parse_S’, parse_E all return an Expr:

void parse_E()  Expr parse_E()


void parse_S()  Expr parse_S()
void parse_S’()  Expr parse_S’()
S  ES’
S’   | +S
E  num | (S)
CS331 • LL Parsing
AST creation code
Expr parse_E() {
switch(token) {
case num: // E  number
Expr result = Num (token.value);
token = input.read(); return result;
case ‘(‘: // E  ( S )
token = input.read();
Expr result = parse_S();
if (token != ‘)’) throw new ParseError();
token = input.read(); return result;
default: throw new ParseError();
}
}
CS331 • LL Parsing
parse_S
Expr parse_S() { S  ES’
switch (token) { S’   | +S
case num: E  num | (S)
case ‘(’:
Expr left = parse_E();
Expr right = parse_S’();
if (right == null) return left;
else return new Add(left, right);
default: throw new ParseError();
}
}
CS331 • LL Parsing
Or…an interpreter!
int parse_S() {
switch (token) {
case number:
case ‘(’:
int left = parse_E();
int right = parse_S’();
if (right == 0) return left;
int parse_E() {
else return left + right;
switch(token) {
default: throw new ParseError(); } }
case number:
int result = token.value;
token = input.read(); return result;
case ‘(‘:
token = input.read();
int result = parse_S();
if (token != ‘)’) throw new ParseError();
token = input.read(); return result;
default: throw new ParseError(); }}
CS331 • LL Parsing

You might also like