Lecture04 TopDownParsing 2
Kenneth C. Louden
4. Top-Down Parsing
PART TWO
Contents
PART ONE
4.1 Top-Down Parsing by Recursive-Descent
4.2 LL(1) Parsing
PART TWO
4.3 First and Follow Sets
4.4 A Recursive-Descent Parser for the TINY Language
4.5 Error Recovery in Top-Down Parsers
4.1 Top-Down Parsing by Recursive-Descent
4.2 LL(1) Parsing
4.2.1 The Basic Method of LL(1) Parsing
Main idea
• LL(1) Parsing uses an explicit stack rather than
recursive calls to perform a parse
• An example:
– a simple grammar for the strings of balanced
parentheses:
S→(S) S∣ε
• The following table shows the actions of a top-down parser given this grammar and the string ( ):
Table of Actions
Steps  Parsing Stack  Input  Action
1      $S             ()$    S→(S) S
2      $S)S(          ()$    match
3      $S)S           )$     S→ε
4      $S)            )$     match
5      $S             $      S→ε
6      $              $      accept
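To make the stack discipline concrete, here is a minimal Python sketch that reproduces this table for S→(S) S∣ε. It hard-codes the choice the grammar implies (use S→(S) S when the lookahead is "(", otherwise S→ε); the function name trace_balanced and the list-based stack (top at the right end, as in the table) are illustrative assumptions.

```python
# Minimal sketch: explicit-stack top-down parse for S -> ( S ) S | ε.
# The stack is a Python list whose LAST element is the top; "$" marks
# the bottom of the stack and the end of the input.

def trace_balanced(tokens):
    stack = ["$", "S"]
    inp = list(tokens) + ["$"]
    pos = 0
    rows = []  # (parsing stack, remaining input, action)
    while True:
        stack_str, input_str = "".join(stack), "".join(inp[pos:])
        top, look = stack[-1], inp[pos]
        if top == "$" and look == "$":
            rows.append((stack_str, input_str, "accept"))
            return rows
        if top == "S":
            stack.pop()
            if look == "(":
                # Generate: push the right-hand side ( S ) S in reverse order
                stack.extend(reversed(["(", "S", ")", "S"]))
                rows.append((stack_str, input_str, "S→(S) S"))
            else:
                # Generate: S -> ε pushes nothing (any other lookahead)
                rows.append((stack_str, input_str, "S→ε"))
        elif top == look:
            # Match: pop the token and advance the input
            stack.pop()
            pos += 1
            rows.append((stack_str, input_str, "match"))
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")


for step, row in enumerate(trace_balanced("()"), start=1):
    print(step, *row)
```

Running it on "()" prints the same six steps as the table.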
General Schematic
• A top-down parser begins by pushing the start symbol onto
the stack
• It accepts an input string if, after a series of actions, the
stack and the input both become empty (only the $ markers remain)
• A general schematic for a successful top-down parse:
$StartSymbol     InputString$
…                …                // one of the two actions
…                …                // one of the two actions
$                $                accept
Two Actions
• The two actions
– Generate: Replace a non-terminal A at the top of the stack by the
string α (pushed in reverse), using a grammar rule A→α; and
– Match: Match a token on top of the stack with the next input token.
• The list of generating actions in the above table:
S => (S)S [S→(S) S]
=> ( )S [S→ε]
=> ( ) [S→ε]
• These correspond precisely to the steps in a leftmost
derivation of the string ( ).
• This is characteristic of top-down parsing.
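The correspondence between generate actions and a leftmost derivation can be checked mechanically. A minimal sketch, assuming each generate action is recorded as a pair (A, right-hand side); leftmost_derivation is a hypothetical helper, not part of the original notes.

```python
# Minimal sketch: replay the generate actions of a top-down parse as a
# leftmost derivation. Each action (A, rhs) must rewrite the leftmost
# non-terminal of the current sentential form.

def leftmost_derivation(start, actions, nonterminals):
    form = [start]
    forms = []
    for lhs, rhs in actions:
        idx = next((i for i, sym in enumerate(form) if sym in nonterminals), None)
        if idx is None or form[idx] != lhs:
            raise ValueError(f"{lhs} is not the leftmost non-terminal of {''.join(form)}")
        form = form[:idx] + list(rhs) + form[idx + 1:]
        forms.append("".join(form))
    return forms


# Generate actions from the parse of ( ): S→(S) S, S→ε, S→ε
actions = [("S", ["(", "S", ")", "S"]), ("S", []), ("S", [])]
print(" => ".join(["S"] + leftmost_derivation("S", actions, {"S"})))
```

This prints S => (S)S => ()S => (), the derivation listed above.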
4.2.2 The LL(1) Parsing Table and Algorithm
Purpose and Example of LL(1) Table
• Purpose of the LL(1) Parsing Table:
– To express the possible rule choices for a non-terminal A
when A is at the top of the parsing stack, based on the
current input token (the lookahead).
• The LL(1) Parsing table for the following simple
grammar:
S→(S) S∣ε
M[N,T]   (          )      $
S        S→(S) S    S→ε    S→ε
The General Definition of Table
• The table, called M[N,T], is a two-dimensional array
indexed by non-terminals and terminals
• It contains the production choices to use at the
appropriate parsing step
• N is the set of non-terminals of the grammar
• T is the set of terminals or tokens (including $)
• Entries that remain empty represent potential errors
Table-Constructing Rule
• The table-constructing rule
– If A→α is a production choice, and there is a
derivation α=>* aβ, where a is a token, then
add A→α to the table entry M[A,a];
– If A→α is a production choice, and there are
derivations α=>* ε and S$=>* βAaγ, where S is
the start symbol and a is a token (or $), then
add A→α to the table entry M[A,a];
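In practice the two derivation conditions are decided with First and Follow sets (Section 4.3): the first rule applies when a is in First(α), the second when ε is in First(α) and a is in Follow(A). The sketch below computes both sets by fixed-point iteration and fills M[N,T]; it also flags non-LL(1) grammars (entries with more than one production), matching the definition given further on. The grammar encoding (a dictionary from non-terminal to a list of right-hand-side tuples, with () for ε) and all function names are assumptions for illustration.

```python
# Minimal sketch: fill M[N,T] from First and Follow sets (Section 4.3).
# A grammar maps each non-terminal to a list of right-hand sides (tuples);
# the empty tuple () is an ε-production. EPS stands for ε inside First sets.
EPS = "ε"


def first_of(symbols, first, grammar):
    """First set of a string of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in grammar:          # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)                     # every symbol can derive ε
    return result


def first_follow(grammar, start):
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for A, choices in grammar.items():
            for rhs in choices:
                f = first_of(rhs, first, grammar)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                for i, B in enumerate(rhs):
                    if B in grammar:
                        rest = first_of(rhs[i + 1:], first, grammar)
                        new = (rest - {EPS}) | (follow[A] if EPS in rest else set())
                        if not new <= follow[B]:
                            follow[B] |= new
                            changed = True
    return first, follow


def ll1_table(grammar, start):
    first, follow = first_follow(grammar, start)
    table = {}
    for A, choices in grammar.items():
        for rhs in choices:
            f = first_of(rhs, first, grammar)
            # Rule 1: a in First(α); rule 2: α =>* ε and a in Follow(A)
            lookaheads = (f - {EPS}) | (follow[A] if EPS in f else set())
            for a in lookaheads:
                table.setdefault((A, a), []).append(rhs)
    return table


def is_ll1(table):
    return all(len(choices) <= 1 for choices in table.values())


parens = {"S": [("(", "S", ")", "S"), ()]}
for (A, a), choices in sorted(ll1_table(parens, "S").items()):
    print(f"M[{A}, {a}] =", choices)
```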
A Table-Constructing Case
• The constructing-process of the following table
– For the production S→(S)S, α=(S)S derives a string beginning
with a=(, so this choice is added to the entry M[S, (].
– Since S$=>(S)S$, rule 2 applies with
A = S and a = ), so the choice S→ε is added to M[S, )].
– Since S$=>* S$, S→ε is also added to M[S, $].
M[N,T]   (          )      $
S        S→(S) S    S→ε    S→ε
Properties of LL(1) Grammar
• Definition of LL(1) Grammar
– A grammar is an LL(1) grammar if the
associated LL(1) parsing table has at most
one production in each table entry
• An LL(1) grammar cannot be ambiguous
A Parsing Algorithm Using the
LL(1) Parsing Table
(* assumes $ marks the bottom of the stack and the end of the input *)
push the start symbol onto the top of the parsing stack;
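The rest of the loop follows from the two actions described earlier: match when a token is on top of the stack, generate from M[A,a] when a non-terminal A is on top, and report an error on an empty entry. A minimal Python sketch of that loop, not the textbook's exact pseudocode; the dictionary encoding of the table (one right-hand-side tuple per entry) and the name ll1_parse are assumptions.

```python
# Minimal sketch of the table-driven LL(1) loop. Assumptions: the table maps
# (non-terminal, lookahead) to ONE right-hand-side tuple (an LL(1) table has
# at most one production per entry), and () is an ε-production.

def ll1_parse(table, start, tokens):
    nonterminals = {A for A, _ in table}
    stack = ["$", start]            # top of stack = last element
    inp = list(tokens) + ["$"]      # $ marks the end of the input
    pos = 0
    actions = []
    while stack[-1] != "$":
        top, look = stack[-1], inp[pos]
        if top in nonterminals:
            rhs = table.get((top, look))
            if rhs is None:                # empty entry: error
                raise SyntaxError(f"no entry M[{top}, {look}]")
            stack.pop()
            stack.extend(reversed(rhs))    # generate: push X1..Xn in reverse
            actions.append(f"{top}→{' '.join(rhs) or 'ε'}")
        elif top == look:                  # match
            stack.pop()
            pos += 1
            actions.append("match")
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    if inp[pos] != "$":
        raise SyntaxError(f"extra input starting at {inp[pos]!r}")
    actions.append("accept")
    return actions


TABLE = {("S", "("): ("(", "S", ")", "S"), ("S", ")"): (), ("S", "$"): ()}
print(ll1_parse(TABLE, "S", "(())()"))
```

The loop keeps running while the stack holds more than $, so an ε-production can still be applied after the input is exhausted (step 5 of the earlier trace).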
Notice for Example: If-Statement
[Table: LL(1) parsing table for the if-statement grammar, with rows for Else-part (Else-part → ε) and Exp (Exp → 0, Exp → 1)]
[Figure: parse tree for 3 - 4 - 5 under the rules exp → term exp’ and exp’ → addop term exp’ | ε]
Syntax Tree
• Nevertheless, the parser should still construct
the appropriate left-associative syntax tree, for example for 3 - 4 - 5:
        -
       / \
      -   5
     / \
    3   4
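One common way to get the left-associative tree from these right-recursive rules, shown here in recursive-descent style rather than with an explicit LL(1) stack, is to pass the tree built so far into exp’ as an argument. A minimal sketch, assuming pre-split tokens and treating a term as a single integer; parse_exp and evaluate are hypothetical names.

```python
# Minimal sketch: build a left-associative tree while following the
# right-recursive rules exp -> term exp' and exp' -> addop term exp' | ε.
# Assumption: tokens are already split, and a term is just an integer.

def parse_exp(tokens):
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"number expected at position {pos}")
        pos += 1
        return tokens[pos - 1]

    def exp_prime(left):
        # 'left' is the tree built so far; it is passed down into exp'
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            return exp_prime((op, left, term()))  # attach to the LEFT operand
        return left                                # exp' -> ε

    tree = exp_prime(term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    if op == "-":
        return evaluate(left) - evaluate(right)
    return evaluate(left) + evaluate(right)


tree = parse_exp([3, "-", 4, "-", 5])
print(tree, evaluate(tree))
```

This prints ('-', ('-', 3, 4), 5) -6, i.e. (3 - 4) - 5 rather than 3 - (4 - 5).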
• Example:
stmt-sequence→stmt; stmt-sequence | stmt
stmt→s
• When two production choices share a common prefix α, as in
A→αβ|αγ, an LL(1) parser cannot distinguish between them
• The solution in this simple case is to “factor” the α out on the left
and rewrite the rule as two rules:
A→αA’
A’→β|γ
Algorithm for Left Factoring a Grammar
While there are changes to the grammar do
  For each non-terminal A do
    Let α be a prefix of maximal length that is shared
      by two or more production choices for A
    If α≠ε then
      Let A→α1|α2|…|αn be all the production choices for A,
      and suppose that α1,α2,…,αk share α, so that
      A→αβ1|αβ2|…|αβk|αk+1|…|αn, the βj’s share
      no common prefix, and αk+1,…,αn do not share α
      Replace the rule A→α1|α2|…|αn by the rules
        A→αA’|αk+1|…|αn
        A’→β1|β2|…|βk
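A direct Python transcription of this loop, as a sketch: grammars are dictionaries from a non-terminal to a list of right-hand-side tuples (() is ε), and each new non-terminal gets a trailing prime. These representation choices and the function names are assumptions, not part of the algorithm as stated.

```python
# Minimal sketch of the left-factoring loop. Grammar format (an
# assumption): non-terminal -> list of right-hand-side tuples; () is ε.
# New non-terminals are named A', A'', ... (a naming choice for this sketch).

def common_prefix(x, y):
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor(grammar):
    g = {A: list(choices) for A, choices in grammar.items()}
    changed = True
    while changed:
        changed = False
        for A in list(g):
            choices = g[A]
            # alpha: a longest prefix shared by two or more choices for A
            alpha = ()
            for i in range(len(choices)):
                for j in range(i + 1, len(choices)):
                    p = common_prefix(choices[i], choices[j])
                    if len(p) > len(alpha):
                        alpha = p
            if not alpha:
                continue
            sharing = [c for c in choices if c[:len(alpha)] == alpha]
            rest = [c for c in choices if c[:len(alpha)] != alpha]
            new = A + "'"
            while new in g:
                new += "'"
            g[A] = [alpha + (new,)] + rest               # A  -> alpha A' | rest
            g[new] = [c[len(alpha):] for c in sharing]   # A' -> beta1 | ... | betak
            changed = True
    return g


seq = {"stmt-sequence": [("stmt", ";", "stmt-sequence"), ("stmt",)], "stmt": [("s",)]}
for lhs, choices in left_factor(seq).items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in choices))
```

Applied to the grammar of Example 4.4, it yields stmt-sequence → stmt stmt-sequence' and stmt-sequence' → ; stmt-sequence | ε, the same rules with a longer name for stmt-seq’.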
Example 4.4
• Consider the grammar for statement
sequences, written in right recursive
form:
Stmt-sequence→stmt; stmt-sequence | stmt
Stmt→s
• Left Factored as follows:
Stmt-sequence→stmt stmt-seq’
Stmt-seq’→; stmt-sequence | ε
Example 4.4
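• Note:
– If we had written the stmt-sequence rule left
recursively,
– Stmt-sequence→stmt-sequence ; stmt | stmt
• then removing the immediate left recursion would
result in essentially the same rules:
Stmt-sequence→stmt stmt-seq’
Stmt-seq’→; stmt stmt-seq’ | ε
• These become identical to the left-factored rules once
stmt stmt-seq’ in the last rule is replaced by stmt-sequence.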
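The usual scheme for removing immediate left recursion (A→Aα|β becomes A→βA’ and A’→αA’|ε) can be sketched the same way. Run on the left-recursive stmt-sequence rule, it produces stmt-sequence' → ; stmt stmt-sequence' | ε, which agrees with the left-factored rule once stmt stmt-sequence' is replaced by stmt-sequence. The grammar encoding and the function name are assumptions for illustration.

```python
# Minimal sketch: remove immediate left recursion from one non-terminal A.
#   A -> A a1 | ... | A am | b1 | ... | bn   becomes
#   A -> b1 A' | ... | bn A'   and   A' -> a1 A' | ... | am A' | ε
# Grammar format (an assumption): non-terminal -> list of right-hand-side tuples.

def remove_immediate_left_recursion(grammar, A):
    recursive = [rhs[1:] for rhs in grammar[A] if rhs[:1] == (A,)]
    others = [rhs for rhs in grammar[A] if rhs[:1] != (A,)]
    if not recursive:
        return dict(grammar)            # nothing to do
    new = A + "'"
    while new in grammar:               # pick a fresh name A', A'', ...
        new += "'"
    result = dict(grammar)
    result[A] = [beta + (new,) for beta in others]
    result[new] = [alpha + (new,) for alpha in recursive] + [()]
    return result


left_rec = {
    "stmt-sequence": [("stmt-sequence", ";", "stmt"), ("stmt",)],
    "stmt": [("s",)],
}
for lhs, choices in remove_immediate_left_recursion(left_rec, "stmt-sequence").items():
    print(lhs, "→", " | ".join(" ".join(rhs) or "ε" for rhs in choices))
```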
Example 4.5
• Consider the following grammar for if-
statements:
If-stmt → if ( exp ) statement
∣ if ( exp ) statement else statement
• The left factored form of this grammar is:
If-stmt → if (exp) statement else-part
Else-part → else statement | ε
Example 4.6
• An arithmetic expression grammar with a right-associative
operation:
exp → term+exp |term
• This grammar needs to be left factored, and we obtain
the rules
exp → term exp’
exp’→ + exp ∣ ε
• If we then substitute term exp’ for exp in the rule for exp’, we
obtain:
exp → term exp’
exp’→ + term exp’ ∣ ε
Example 4.7
• A typical case where a grammar fails to be
LL(1):
Statement → assign-stmt | call-stmt | other
Assign-stmt→identifier:=exp
Call-stmt→identifier(exp-list)
• Here identifier is the first token of
both assign-stmt and call-stmt,
• so it could be the lookahead token for either.
• However, the grammar is not in a form that can be left factored directly.
Example 4.7
• First replace assign-stmt and call-stmt by the
right-hand sides of their defining productions:
Statement → identifier:=exp | identifier(exp-list)
| other
• Then, we left factor to obtain
Statement → identifier statement’ | other
Statement’ →:=exp |(exp-list)
• Note:
– this obscures the semantics of call and assignment
by separating the identifier from the actual call or
assign action.
4.2.4 Syntax Tree Construction in LL(1) Parsing
Difficulty in Construction
• It is more difficult to adapt LL(1) parsing to syntax tree
construction than recursive-descent parsing
M[N,T]          s                                 ;                             $
Stmt-sequence   Stmt-sequence → Stmt Stmt-seq’
Stmt            Stmt → s
Stmt-seq’                                         Stmt-seq’ → ; Stmt-sequence   Stmt-seq’ → ε
4.3.4 Extending the Lookahead: LL(k) Parsers
Definition of LL(k)
• The LL(1) parsing method can be extended to k
symbols of lookahead.
• Definitions:
– Firstk(α)={wk | α=>* w}, where wk is the first k
tokens of the string w if the length of w > k, and
w itself otherwise.
– Followk(A)={wk | S$=>*αAw}, where wk is defined
in the same way.
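Firstk can be computed by the same kind of fixed-point iteration used for LL(1), with concatenation followed by truncation to k tokens playing the role of wk. A minimal sketch for Firstk of non-terminals only (Followk is analogous and omitted); token strings are tuples, () is ε, and the grammar encoding and names are illustrative assumptions.

```python
# Minimal sketch: compute First_k(A) for every non-terminal A by iterating
# to a fixed point. Strings are tuples of tokens; () stands for ε.
# Grammar format (an assumption): non-terminal -> list of right-hand-side tuples.

def concat_k(xs, ys, k):
    """All k-prefixes of x + y for x in xs, y in ys (the wk truncation)."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_sets(grammar, k):
    first = {A: set() for A in grammar}

    def first_of(symbols):
        result = {()}
        for sym in symbols:
            f = first[sym] if sym in grammar else {(sym,)}
            result = concat_k(result, f, k)
        return result

    changed = True
    while changed:
        changed = False
        for A, choices in grammar.items():
            for rhs in choices:
                new = first_of(rhs) - first[A]
                if new:
                    first[A] |= new
                    changed = True
    return first


parens = {"S": [("(", "S", ")", "S"), ()]}
print(sorted(first_k_sets(parens, 2)["S"]))
```

For S→(S) S∣ε and k = 2 this prints [(), ('(', '('), ('(', ')')]: a balanced string is empty or begins with ( ) or ( (.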
• LL(k) parsing table:
– The table can be constructed in the same way as the LL(1) table.
Complications in LL(k)
• The complications in LL(k) parsing:
– The parsing table becomes larger, since the number of
columns increases exponentially with k.
– The parsing table itself does not express the complete
power of LL(k), because not every follow string occurs
in every context.
– Thus parsing with the table as constructed here is
distinguished from LL(k) parsing by calling it Strong
LL(k) parsing, or SLL(k) parsing.
THANKS