Compiler Principle and Technology: Mr. Aruna Malik BIT (Mesra) Ranchi, Off Campus NOIDA
4. Top-Down Parsing
Contents
PART ONE
4.1 Top-Down Parsing by Recursive Descent
4.2 LL(1) Parsing
PART TWO
4.3 First and Follow Sets
4.5 Error Recovery in Top-Down Parsers
Basic Concepts
• Predictive parsers: recursive-descent parsing, and LL(1) parsing (non-recursive)
• First set and Follow set
• Error recovery
4.1 Top-Down Parsing by Recursive-Descent
4.1.1 The Basic Method of Recursive-Descent
The idea of Recursive-Descent Parsing
The grammar rule for a non-terminal A is viewed as a definition for a procedure to recognize an A.
procedure factor
begin
  case token of
  ( : match( ( );
      exp;
      match( ) );
  number :
      match(number);
  else error;
  end case;
end factor
• The token variable keeps the current next token in the input (one symbol of look-ahead)
• The match procedure matches the current next token with its parameter, advances the input if it succeeds, and declares an error if it does not
Match Procedure
Matches the current next token with its parameter
Advances the input if it succeeds, and declares an error if it does not
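The token variable, the match procedure, and the factor procedure can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: the class name Parser, the helper recognizes, the end marker "$", and the simplified stand-in rule exp → factor { + factor } are all assumptions made so that the example runs on its own.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class Parser:
    def __init__(self, tokens):
        # Append an end marker (an assumption) so `token` is always defined.
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def token(self):
        # The current next token: one symbol of look-ahead.
        return self.tokens[self.pos]

    def match(self, expected):
        # Advance the input on success, declare an error otherwise.
        if self.token == expected:
            self.pos += 1
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def factor(self):
        # factor -> ( exp ) | number
        if self.token == "(":
            self.match("(")
            self.exp()
            self.match(")")
        elif self.token.isdigit():
            self.match(self.token)
        else:
            raise ParseError(f"unexpected token {self.token!r}")

    def exp(self):
        # Simplified stand-in rule (an assumption): exp -> factor { + factor }
        self.factor()
        while self.token == "+":
            self.match("+")
            self.factor()


def recognizes(tokens):
    """Return True if the whole token list is an exp, False otherwise."""
    parser = Parser(tokens)
    try:
        parser.exp()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, recognizes(["(", "3", "+", "4", ")"]) succeeds, while recognizes(["(", "3"]) fails inside match because the closing parenthesis is missing.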
(Figure: a parse tree for an arithmetic expression, with nodes term, addop, factor, mulop, ( exp ), and number.)
4.1.2 Repetition and Choice: Using EBNF
An Example
• The grammar rule for an if-statement:
  if-stmt → if ( exp ) statement
          ∣ if ( exp ) statement else statement
procedure ifstmt;
begin
  match( if );
  match( ( );
  exp;
  match( ) );
  statement;
  if token = else then
    match (else);
    statement;
  end if;
end ifstmt;
Issues
• We cannot immediately distinguish the two choices, because they both start with the token if
• We put off the decision until we see the token else in the input
The EBNF of the if-statement
  if-stmt → if ( exp ) statement [ else statement ]
The square brackets of the EBNF are translated into a test in the code for if-stmt:
  if token = else then
    match (else);
    statement;
  end if;
Notes
EBNF notation is designed to mirror closely the actual code of a recursive-descent parser, so a grammar should always be translated into EBNF if recursive descent is to be used.
It is natural to write a parser that matches each else token as soon as it is encountered in the input. This corresponds to the most closely nested disambiguating rule: each else is associated with the nearest preceding if that has no else.
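The effect of matching each else as soon as it is seen can be checked with a small Python sketch of the if-stmt procedure. The function name parse_statement, the nested-tuple result, the end marker "$", and the rules statement → if-stmt | other and exp → 0 | 1 are assumptions made for illustration.

```python
def parse_statement(tokens):
    """Parse a token list and return a nested tuple (an illustrative tree form)."""
    pos = 0

    def token():
        # Current look-ahead token; "$" (an assumption) marks the end of input.
        return tokens[pos] if pos < len(tokens) else "$"

    def match(expected):
        nonlocal pos
        if token() != expected:
            raise SyntaxError(f"expected {expected!r}, found {token()!r}")
        pos += 1

    def statement():
        # statement -> if-stmt | other
        if token() == "if":
            return if_stmt()
        match("other")
        return "other"

    def if_stmt():
        # if-stmt -> if ( exp ) statement [ else statement ]
        match("if")
        match("(")
        test = token()
        if test not in ("0", "1"):
            raise SyntaxError(f"expected 0 or 1, found {test!r}")
        match(test)
        match(")")
        then_part = statement()
        else_part = None
        if token() == "else":  # the test generated by the square brackets
            match("else")
            else_part = statement()
        return ("if", test, then_part, else_part)

    tree = statement()
    match("$")
    return tree
```

On the tokens of "if ( 0 ) if ( 1 ) other else other", the else part ends up attached to the inner if, and the outer if has no else part.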
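EBNF for Simple Arithmetic Grammar (1)
The BNF rule for exp:
  exp → exp addop term ∣ term
The EBNF rule for exp:
  exp → term { addop term }
The rule for term has the same shape (term → factor { mulop factor }), and the curly brackets are translated into a loop:
procedure term;
begin
  factor;
  while token = * do
    match(token);
    factor;
  end while;
end term;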
Left associativity implied by the curly brackets
The left associativity implied by the curly brackets (and explicit in the original BNF) can still be maintained within this code:
function exp: integer;
var temp: integer;
begin
  temp := term;
  while token = + or token = - do
    case token of
    + : match(+);
        temp := temp + term;
    - : match(-);
        temp := temp - term;
    end case;
  end while;
  return temp;
end exp;
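A runnable Python sketch of this evaluation scheme is given below. The function name evaluate, the list-of-strings token format, and the end marker "$" are assumptions. The term and factor functions follow the same pattern as exp.

```python
def evaluate(tokens):
    """Evaluate an arithmetic expression given as a list of token strings."""
    tokens = list(tokens) + ["$"]  # "$" end marker is an assumption
    pos = 0

    def token():
        return tokens[pos]

    def match(expected):
        nonlocal pos
        if tokens[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {tokens[pos]!r}")
        pos += 1

    def exp():
        # exp -> term { addop term }
        temp = term()
        while token() in ("+", "-"):
            op = token()
            match(op)  # consume the operator before parsing the next term
            right = term()
            # Accumulating into temp makes the operators left associative.
            temp = temp + right if op == "+" else temp - right
        return temp

    def term():
        # term -> factor { * factor }
        temp = factor()
        while token() == "*":
            match("*")
            temp = temp * factor()
        return temp

    def factor():
        # factor -> ( exp ) | number
        if token() == "(":
            match("(")
            value = exp()
            match(")")
            return value
        if token().isdigit():
            value = int(token())
            match(token())
            return value
        raise SyntaxError(f"unexpected token {token()!r}")

    result = exp()
    match("$")
    return result
```

Because the loop folds each new term into temp, evaluate("3 - 4 - 5".split()) computes (3 - 4) - 5, not 3 - (4 - 5).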
Some Notes
The method of turning grammar rules in EBNF into code is quite powerful.
There are a few pitfalls, and care must be taken in scheduling the actions within the code.
In the previous pseudo-code for exp:
(1) The match of the operator must come before the repeated call to term, otherwise term would see an operator as its first token and report an error;
(2) The global token variable must be set before the parse begins;
(3) getToken must be called just after a successful test of a token.
Construction of the syntax tree
The expression: 3+4+5
      +
     / \
    +   5
   / \
  3   4
The pseudo-code for constructing the syntax tree
function exp : syntaxTree;
var temp, newtemp: syntaxTree;
begin
  temp := term;
  while token = + or token = - do
    case token of
    + : match(+);
        newtemp := makeOpNode(+);
        leftChild(newtemp) := temp;
        rightChild(newtemp) := term;
        temp := newtemp;
    - : match(-);
        newtemp := makeOpNode(-);
        leftChild(newtemp) := temp;
        rightChild(newtemp) := term;
        temp := newtemp;
    end case;
  end while;
  return temp;
end exp;
A simpler one
function exp : syntaxTree;
var temp, newtemp: syntaxTree;
begin
  temp := term;
  while token = + or token = - do
    newtemp := makeOpNode(token);
    match(token);
    leftChild(newtemp) := temp;
    rightChild(newtemp) := term;
    temp := newtemp;
  end while;
  return temp;
end exp;
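The simpler version translates almost line for line into Python. In this sketch the OpNode dataclass stands in for makeOpNode, leftChild, and rightChild, and term is reduced to a single number; both are assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class OpNode:
    """An operator node with a left and a right subtree."""
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[int, OpNode]


def build_tree(tokens):
    """Build the syntax tree for exp -> term { addop term }."""
    tokens = list(tokens) + ["$"]  # "$" end marker is an assumption
    pos = 0

    def term():
        # Simplification (an assumption): a term is just a number.
        nonlocal pos
        tok = tokens[pos]
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        pos += 1
        return int(tok)

    def exp():
        nonlocal pos
        temp = term()
        while tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1  # match(token)
            # The tree built so far becomes the left child of the new node.
            temp = OpNode(op, temp, term())
        return temp

    tree = exp()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For the tokens of 3 + 4 + 5, the result is OpNode("+", OpNode("+", 3, 4), 5): the earlier addition sits in the left subtree, which is the left-associative shape.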
The pseudo-code for the if-statement procedure
function ifstatement: syntaxTree;
var temp: syntaxTree;
begin
  match(if);
  match(();
  temp := makeStmtNode(if);
  testChild(temp) := exp;
  match());
  thenChild(temp) := statement;
  if token = else then
    match(else);
    elseChild(temp) := statement;
  else
    elseChild(temp) := nil;
  end if;
  return temp;
end ifstatement
4.1.3 Further Decision Problems
Characteristics of recursive-descent
M[N,T] ( ) $
A Parsing Algorithm Using the LL(1) Parsing Table
(* assumes $ marks the bottom of the stack and the end of the input *)
push the start symbol onto the top of the parsing stack;
while the top of the parsing stack ≠ $ do
  if the top of the parsing stack is terminal a and the next input token = a
  then (* match *)
    pop the parsing stack;
    advance the input;
  else if the top of the parsing stack is non-terminal A
      and the next input token is terminal a (or $)
      and parsing table entry M[A,a] contains production A → X1X2…Xn
  then (* generate *)
    pop the parsing stack;
    for i := n downto 1 do
      push Xi onto the parsing stack;
  else error;
if the top of the parsing stack = $ and the next input token = $
then accept
else error;
LL(1) parsing table entries for the if-statement grammar (fragment):
  if-stmt row: if-stmt → if ( exp ) statement else-part
  else-part row: else-part → ε
The parsing actions of the LL(1) parser for the string
  if (0) if (1) other else other
(for conciseness, statement = S, if-stmt = I, else-part = L, exp = E, if = i, else = e, other = o)
Grammar: S → I | o,  I → i(E)SL,  L → eS | ε,  E → 0 | 1

Steps  Parsing Stack  Input           Action
1      $S             i(0)i(1)oeo$    S→I
2      $I             i(0)i(1)oeo$    I→i(E)SL
3      $LS)E(i        i(0)i(1)oeo$    match
4      $LS)E(         (0)i(1)oeo$     match
5      $LS)E          0)i(1)oeo$      E→0
6      $LS)0          0)i(1)oeo$      match
7      $LS)           )i(1)oeo$       match
8      $LS            i(1)oeo$        S→I
9      $LI            i(1)oeo$        I→i(E)SL
10     $LLS)E(i       i(1)oeo$        match
11     $LLS)E(        (1)oeo$         match
12     $LLS)E         1)oeo$          E→1
13     $LLS)1         1)oeo$          match
14     $LLS)          )oeo$           match
15     $LLS           oeo$            S→o
16     $LLo           oeo$            match
17     $LL            eo$             L→eS
18     $LSe           eo$             match
19     $LS            o$              S→o
20     $Lo            o$              match
21     $L             $               L→ε
22     $              $               accept

The last step: the intermediate steps are omitted, and only the final state is shown.
(Figure: the completed parse tree for i(0)i(1)oeo.)
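The table-driven algorithm can be sketched in Python for the abbreviated if-statement grammar (S → I | o, I → i(E)SL, L → eS | ε, E → 0 | 1). The names TABLE, NONTERMINALS, and ll1_parse are assumptions, and so is the choice of L → eS for the entry M[L, e], which is the usual way to resolve the dangling-else conflict. The loop runs until the stack is reduced to $, so that L → ε can still be applied when the input is already at $.

```python
# Parsing table M[A, a]; each entry is the right-hand side of a production.
TABLE = {
    ("S", "i"): ["I"],
    ("S", "o"): ["o"],
    ("I", "i"): ["i", "(", "E", ")", "S", "L"],
    ("L", "e"): ["e", "S"],  # assumed resolution of the dangling else
    ("L", "$"): [],          # L -> ε
    ("E", "0"): ["0"],
    ("E", "1"): ["1"],
}
NONTERMINALS = {"S", "I", "L", "E"}


def ll1_parse(text, start="S"):
    """Parse a string of one-character tokens; return the list of actions."""
    tokens = list(text) + ["$"]
    stack = ["$", start]
    pos = 0
    actions = []
    while stack[-1] != "$":
        top, tok = stack[-1], tokens[pos]
        if top not in NONTERMINALS:
            # Terminal on top of the stack: it must equal the next token.
            if top != tok:
                raise SyntaxError(f"expected {top!r}, found {tok!r}")
            stack.pop()
            pos += 1
            actions.append("match")
        elif (top, tok) in TABLE:
            # Generate: replace A by X1...Xn, pushed in reverse order.
            rhs = TABLE[(top, tok)]
            stack.pop()
            stack.extend(reversed(rhs))
            actions.append(f"{top}→{''.join(rhs) or 'ε'}")
        else:
            raise SyntaxError(f"no table entry for M[{top}, {tok}]")
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected input {tokens[pos]!r}")
    actions.append("accept")
    return actions
```

Calling ll1_parse("i(0)i(1)oeo") returns 22 actions, starting with S→I and ending with L→ε and accept.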
4.2.3 Left Recursion Removal and Left Factoring
Repetition and Choice Problem
Solutions:
1. Apply the same ideas of using EBNF (in recursive-descent
parsing) to LL(1) parsing;
2. Rewrite the grammar within the BNF notation into a form
that the LL(1) parsing algorithm can accept.
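Two standard techniques for Repetition and Choice: left recursion removal and left factoring
Simple immediate left recursion has the form
  A → Aα | β
where α and β are strings of terminals and non-terminals, and β does not begin with A.
We rewrite this grammar rule into two rules:
  A → βA’       (to generate β first)
  A’ → αA’ | ε  (to generate the repetitions of α, using right recursion)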
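The rewriting of an immediately left-recursive rule can be sketched in Python as follows. The function name and the representation of a rule as a list of choices, each a list of symbols with [] standing for ε, are assumptions. The sketch handles several α and β choices at once and assumes there is no choice of the form A → A.

```python
def remove_immediate_left_recursion(nonterminal, choices):
    """Rewrite A -> A α1 | ... | β1 | ... as A -> β A', A' -> α A' | ε."""
    new_nt = nonterminal + "'"
    # α parts: what follows A in each left-recursive choice.
    alphas = [rhs[1:] for rhs in choices if rhs and rhs[0] == nonterminal]
    # β parts: the choices that do not begin with A.
    betas = [rhs for rhs in choices if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: choices}  # nothing to remove
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }
```

Applied to exp → exp addop term | term, it yields exp → term exp' and exp' → addop term exp' | ε.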
Explanation of the algorithm for general left recursion removal:
(1) Pick an arbitrary order for all non-terminals, say, A1,…, Am;
(2) Eliminate all rules of the form Ai → Ajγ with j ≤ i;
(3) Every step in a derivation loop would then only increase the index, and thus the original index cannot be reached again.
Example
Consider the following grammar:
  A → Ba | Aa | c
  B → Bb | Ab | d
where A1 = A, A2 = B and m = 2.
(1) When i = 1, the inner loop does not execute, so we only remove the immediate left recursion of A:
  A → BaA’ | cA’
  A’ → aA’ | ε
  B → Bb | Ab | d
(2) When i = 2, the inner loop executes once, with j = 1. We eliminate the rule B → Ab by replacing A with its choices:
  A → BaA’ | cA’
  A’ → aA’ | ε
  B → Bb | BaA’b | cA’b | d
(3) We remove the immediate left recursion of B to obtain:
  A → BaA’ | cA’
  A’ → aA’ | ε
  B → cA’bB’ | dB’
  B’ → bB’ | aA’bB’ | ε
Now, the grammar has no left recursion.
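The steps of this example can be reproduced with a short Python sketch of general left recursion removal. The function name and the dictionary representation are assumptions. The non-terminal order is the dictionary's insertion order, and the input grammar is assumed to have no ε-productions and no cycles.

```python
def remove_left_recursion(grammar):
    """grammar maps each non-terminal to a list of choices (lists of symbols).

    Returns a new grammar without left recursion; [] stands for ε.
    """
    g = {nt: [list(rhs) for rhs in choices] for nt, choices in grammar.items()}
    order = list(g)  # A1, ..., Am
    for i, ai in enumerate(order):
        # Inner loop: replace leading Aj (j < i) by the choices of Aj.
        for aj in order[:i]:
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded += [delta + rhs[1:] for delta in g[aj]]
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Remove the immediate left recursion of Ai, if there is any.
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new_nt = ai + "'"
            g[ai] = [beta + [new_nt] for beta in betas]
            g[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return g
```

For the grammar A → Ba | Aa | c, B → Bb | Ab | d, the result has B → cA'bB' | dB' and B' → bB' | aA'bB' | ε, matching step (3) of the example.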
Notice
(Figure: parse tree for the expression 3-4-5 under the grammar with left recursion removed; the subtractions now nest to the right through exp’.)
Syntax Tree
Nevertheless, a parser should still construct the appropriate left-associative syntax tree:
      -
     / \
    -   5
   / \
  3   4
• From the given tree, we can see how the value of 3-4-5 is computed.
Left-Recursion Removed Grammar and its Procedures
The grammar with its left recursion removed, for exp and exp’, is as follows:
  exp → term exp’
  exp’ → addop term exp’ ∣ ε
procedure exp;
begin
  term;
  exp’;
end exp;

procedure exp’;
begin
  case token of
  + : match(+);
      term;
      exp’;
  - : match(-);
      term;
      exp’;
  end case;
end exp’;
To compute the value of the expression, exp’ needs a parameter from the exp procedure: the value computed so far is passed into exp’, which applies the next operator to it before calling itself again.
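Passing the value computed so far into exp' can be sketched in Python as follows. The names evaluate_right_recursive, exp_prime, and valsofar are assumptions, as are the end marker "$" and the reduction of term to a single number.

```python
def evaluate_right_recursive(tokens):
    """Evaluate exp -> term exp', exp' -> addop term exp' | ε."""
    tokens = list(tokens) + ["$"]  # "$" end marker is an assumption
    pos = 0

    def term():
        # Simplification (an assumption): a term is just a number.
        nonlocal pos
        tok = tokens[pos]
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        pos += 1
        return int(tok)

    def exp():
        # The value of the first term is handed to exp' as a parameter.
        return exp_prime(term())

    def exp_prime(valsofar):
        nonlocal pos
        if tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1  # match the operator
            operand = term()
            # Apply the operator before recursing: this keeps the
            # computation left associative despite the right recursion.
            valsofar = valsofar + operand if op == "+" else valsofar - operand
            return exp_prime(valsofar)
        return valsofar  # exp' -> ε

    value = exp()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value
```

With this scheme, evaluate_right_recursive("3 - 4 - 5".split()) computes (3 - 4) - 5 even though the grammar is right recursive.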
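Left Factoring
Left factoring applies when two or more choices of a grammar rule share a common prefix string, as in the rule
  A → αβ | αγ
Example:
  stmt-sequence → stmt ; stmt-sequence | stmt
  stmt → s
An LL(1) parser cannot distinguish between the production choices in such a situation. The solution is to factor the common prefix α out into a new rule:
  A → αA’
  A’ → β | γ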
Algorithm for Left Factoring a Grammar
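One step of left factoring, following the rule A → αβ | αγ, can be sketched in Python as below. The function names and the list-of-symbols representation (with [] standing for ε) are assumptions. The sketch factors the longest prefix shared by two or more choices, and a full algorithm would repeat this step until no choices share a prefix.

```python
def common_prefix(a, b):
    """Return the longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(nonterminal, choices):
    """Factor out the longest prefix shared by two or more choices, if any."""
    best = []
    for i, a in enumerate(choices):
        for b in choices[i + 1:]:
            prefix = common_prefix(a, b)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {nonterminal: choices}  # no shared prefix: nothing to do
    new_nt = nonterminal + "'"
    n = len(best)
    # Choices that start with the prefix contribute their remainders to A'.
    shared = [rhs[n:] for rhs in choices if rhs[:n] == best]
    others = [rhs for rhs in choices if rhs[:n] != best]
    return {nonterminal: [best + [new_nt]] + others, new_nt: shared}
```

For stmt-sequence → stmt ; stmt-sequence | stmt, one step gives stmt-sequence → stmt stmt-sequence' and stmt-sequence' → ; stmt-sequence | ε.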
The actions of the parser to compute the value of the expression 3+4+5
Notes: The addition is scheduled just after the next number is seen, but before any more E’ non-terminals are processed. This guarantees left associativity.