Lecture04 Week06 TopDownParsing 1 - Compilers
Lecture04 Week06 TopDownParsing 1 - Compilers
Kenneth C. Louden
4. Top-Down Parsing
PART ONE
The outline of this chapter
Concept of Top-Down Parsing(1)
• It parses an input string of tokens by tracing out
the steps in a leftmost derivation.
– And the implied traversal of the parse tree is a preorder
traversal and, thus, occurs from the root to the leaves.
• The example:
– number + number, and corresponds to the parse tree
exp
exp op exp
number + number
Concept of Top-Down Parsing(2)
The example: number + number, and corresponds to the
parse tree
• The above parse tree is corresponds to the leftmost
derivations:
(1) exp => exp op exp
(2) => number op exp
(3) => number + exp
(4) => number + number
exp
exp op exp
number + number
Two forms of Top-Down Parsers
• Predictive parsers:
– attempts to predict the next construction in the input
string using one or more look-ahead tokens
• Backtracking parsers:
– try different possibilities for a parse of the input,
backing up an arbitrary amount in the input if one
possibility fails.
– It is more powerful but much slower, unsuitable for
practical compilers.
Two kinds of Top-Down parsing
algorithms
• Recursive-descent parsing:
– is quite versatile and suitable for a handwritten parser.
• LL(1) parsing:
– The first “L” refers to the fact that it processes the input
from left to right;
– The second “L” refers to the fact that it traces out a
leftmost derivation for the input string;
– The number “1” means that it uses only one symbol of
input to predict the direction of the parse.
Other Contents
• Look-Ahead Sets
– First and Follow sets: are required by both recursive-
descent parsing and LL(1) parsing.
• A TINY Parser
– It is constructed by recursive-descent parsing algorithm.
• Error recovery methods
– The error recovery methods used in Top-Down parsing
will be described.
Contents
PART ONE
4.1 Top-Down Parsing by Recursive-Descent [More]
4.2 LL(1) Parsing [More]
PART TWO
4.3 First and Follow Sets
4.4 A Recursive-Descent Parser for the TINY
Language
4.5 Error Recovery in Top-Down Parsers
4.1 Top-Down Parsing by
Recursive-Descent
4.1.1 The Basic Method of
Recursive-Descent
The idea of Recursive-Descent
Parsing
• Viewing the grammar rule for a non-terminal A as
a definition for a procedure to recognize an A
• The right-hand side of the grammar for A specifies
the structure of the code for this procedure
• The Expression Grammar:
exp → exp addop term∣term
addop → + ∣-
term → term mulop factor ∣ factor
mulop →*
factor →(exp) ∣ number
A recursive-descent procedure that
recognizes a factor
procedure factor • The token keeps the current
begin next token in the input (one
case token of symbol of look-ahead)
( : match( ( );
exp;
match( )); • The Match procedure
number: matches the current next
match (number); token with its parameters,
else error; advances the input if it
end case; succeeds, and declares error
end factor if it does not
Match Procedure
• The Match procedure matches the current next token with
its parameters,
– advances the input if it succeeds, and declares error if it does not
term addop -
term mulop
factor *
factor mulop
( exp )
factor
number
4.1.2 Repetition and Choice:
Using EBNF
An Example
procedure ifstmt; • The grammar rule for an if-
begin statement:
match( if ); If-stmt → if ( exp ) statement
match( ( ); ∣ if ( exp ) statement else statement
exp;
match( ) );
statement; • Could not immediately
if token = else then distinguish the two choices
match (else); because the both start with the
statement; token if
end if; • Put off the decision until we see
end ifstmt; the token else in the input
The EBNF of the if-statement
• If-stmt → if ( exp ) statement [ else statement]
Square brackets of the EBNF are translated into a test in the code for
ifstmt.
• if token = else then
• match (else);
• statement;
• endif;
• Notes
– EBNF notation is designed to mirror closely the actual code of a
recursive-descent parser,
– So a grammar should always be translated into EBNF if recursive-
descent is to be used.
• It is natural to write a parser that matches each else token
as soon as it is encountered in the input
EBNF for Simple Arithmetic
Grammar(1)
• The EBNF rule for exp → exp addop term∣term
– exp → term {addop term}
– Where, the curly bracket expressing repetition can be
translated into the code for a loop:
procedure exp;
begin
term;
while token = + or token = - do
match(token);
term;
end while;
end exp;
EBNF for Simple Arithmetic
Grammar(2)
• The EBNF rule for term:
– term → factor {mulop factor}
Becomes the code
procedure term;
begin
factor;
while token = * do
match(token);
factor;
end while;
end exp;
Left associatively implied by the
curly bracket
• The left associatively implied by the curly
bracket (and explicit in the original BNF) can
still be maintained within this code
function exp: integer; case token of
+ : match(+);
var temp: integer;
temp:=temp+term;
begin -:match(-);
temp:=term; temp:=temp-term;
while token=+ or token = - do end case;
end while;
return temp;
end exp;
A working simple calculator in C
code(1)
+ 5
3 4
The pseudo-code for constructing
the syntax tree(1)
function exp : syntaxTree;
var temp, newtemp: syntaxTree;
begin
temp:=term;
while token=+ or token = - do
case token of
+ : match(+);
newtemp:=makeOpNode(+);
leftChild(newtemp):=temp;
rightChild(newtemp):=term;
temp=newtemp;
The pseudo-code for constructing
the syntax tree(2)
-:match(-);
newtemp:=makeOpNode(-);
leftChild(newtemp):=temp;
rightChild(newtemp):=term;
temp=newtemp;
end case;
end while;
return temp;
end exp;
A simpler one
function exp : syntaxTree;
var temp, newtemp: syntaxTree;
begin
temp:=term;
while token=+ or token = - do
newtemp:=makeOpNode(token);
match(token);
leftChild(newtemp):=temp;
rightChild(newtemp):=term;
temp=newtemp;
end while;
return temp;
end exp;
The pseudo-code for the if-statement
procedure (1)
function ifstatement: syntaxTree;
var temp:syntaxTree;
begin
match(if);
match(();
temp:= makeStmtNode(if);
testChild(temp):=exp;
match());
thenChild(temp):=statement;
The pseudo-code for the if-statement
procedure (2)
if token= else then
match(else);
elseChild(temp):=statement;
else
ElseChild(temp):=nil;
end if;
end ifstatement
4.1.3 Further Decision Problems
More formal methods to deal with
complex situation
(1) It may be difficult to convert a grammar in
BNF into EBNF form;
(2) It is difficult to decide when to use the
choice A →αand the choice A →β;
if both α andβ begin with non-terminals.
Such a decision problem requires the
computation of the First Sets.
More formal methods to deal with
complex situation
(3) It may be necessary to know what token legally
coming from the non-terminal A, in writing the
code for an ε-production: A→ε. Such tokens
indicate A may disappear at this point in the
parse. This set is called the Follow Set of A.
(4) It requires computing the First and Follow sets in
order to detect the errors as early as possible.
Such as “)3-2)”, the parse will descend from exp
to term to factor before an error is reported.
4.2 LL(1) PARSING
4.2.1 The Basic Method of LL(1)
Parsing
Main idea
• LL(1) Parsing uses an explicit stack rather than
recursive calls to perform a parse
• An example:
– a simple grammar for the strings of balanced
parentheses:
S→(S) S∣ε
• The following table shows the actions of a top-
down parser given this grammar and the string ( )
Table of Actions
Steps Parsing Stack Input Action
1 $S ()$ S→(S) S
2 $S)S( ()$ match
3 $S)S )$ S→ε
4 $S) )$ match
5 $S $ S→ε
6 $ $ accept
General Schematic
• A top-down parser begins by pushing the start symbol onto
the stack
• It accepts an input string if, after a series of actions, the
stack and the input become empty
• A general schematic for a successful top-down parse:
$ StartSymbol Inputstring$
… … //one of the
two actions
… … //one of the two actions
$ $ accept
Two Actions
• The two actions
– Generate: Replace a non-terminal A at the top of the stack by a
string α(in reverse) using a grammar rule A →α, and
– Match: Match a token on top of the stack with the next input token.
• The list of generating actions in the above table:
S => (S)S [S→(S) S]
=> ( )S [S→ε]
=> ( ) [S→ε]
• Which corresponds precisely to the steps in a leftmost
derivation of string ( ).
• This is the characteristic of top-down parsing.
4.2.2 The LL(1) Parsing Table
and Algorithm
4.2.3 Left Recursion Removal
and Left Factoring
4.2.4 Syntax Tree Construction in
LL(1) Parsing
End of Part One
THANKS