Top-Down Parsing: - The Parse Tree Is Created Top To Bottom. - Top-Down Parser
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
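S → aBc
B → bc | b
input: abc
First try:  S ⇒ aBc with B → bc. After a, b and c are matched, no input is left for the final c: fails, backtrack.
Second try: S ⇒ aBc with B → b. The whole input abc is matched: succeeds.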
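The backtracking behaviour can be sketched in a few lines of Python. This is only an illustrative sketch for the grammar S → aBc, B → bc | b; the names `GRAMMAR`, `derive` and `accepts` are assumptions made for this example, not part of the original slides.

```python
from typing import Iterator, List

# Grammar from the example: S -> a B c,  B -> b c | b
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}

def derive(symbols: List[str], text: str, pos: int) -> Iterator[int]:
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try the alternatives in order. If one alternative
        # leads to a dead end, the loop moves on to the next one; that is
        # the backtracking step.
        for alternative in GRAMMAR[head]:
            yield from derive(alternative + rest, text, pos)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text: str) -> bool:
    """True if the whole input can be derived from S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

For the input abc, the alternative B → bc is tried first and fails on the final c, after which B → b succeeds.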
Recursive Predictive Parser
a grammar → eliminate left recursion → left factor → a grammar suitable for predictive parsing (an LL(1) grammar); there is no 100% guarantee.
• The parser chooses the production rule by looking at the current token.
Recursive Predictive Parser (example)
• When we are trying to rewrite the non-terminal stmt and the current token is if, we have to choose the first production rule.
• When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule just by looking at the current token.
• Even after we eliminate the left recursion in the grammar and left factor it, the grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).
Recursive Predictive Parsing
• Recursive Descent parsing without backtracking.
• Each non-terminal corresponds to a procedure.
Example: A → aBb
proc A {
    - match the current token with a, and move to the next token;
    - call B;
    - match the current token with b, and move to the next token;
}
Recursive Predictive Parsing (cont.)
A → aBb | bAB
proc A {
    case of the current token {
        a: - match the current token with a, and move to the next token;
           - call B;
           - match the current token with b, and move to the next token;
        b: - match the current token with b, and move to the next token;
           - call A;
           - call B;
    }
}
Recursive Predictive Parsing (cont.)
• When to apply ε-productions:
A → aA | bB | ε
– We should apply the ε-production for A when the current token is in the follow set of A.
Recursive Predictive Parsing (Example)
A → aBe | cBd | C
B → bB | ε
C → f
proc A {
    case of the current token {
        a: - match the current token with a, and move to the next token;
           - call B;
           - match the current token with e, and move to the next token;
        c: - match the current token with c, and move to the next token;
           - call B;
           - match the current token with d, and move to the next token;
        f: - call C            (f is in the first set of C)
    }
}
proc B {
    case of the current token {
        b: - match the current token with b, and move to the next token;
           - call B
        e,d: do nothing        (e and d are in the follow set of B)
    }
}
proc C {
    match the current token with f, and move to the next token;
}
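The three procedures translate almost line by line into a small Python class. The following is a minimal sketch for the grammar A → aBe | cBd | C, B → bB | ε, C → f, assuming each input character is one token; the class and method names (`PredictiveParser`, `parse_A`, `accepts`, ...) are made up for illustration. Note that e is an ordinary terminal here, not ε.

```python
class ParseError(Exception):
    """Raised when the current token cannot start or continue a rule."""

class PredictiveParser:
    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Match the current token and move to the next token.
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def parse_A(self):
        if self.current == "a":            # A -> a B e
            self.match("a")
            self.parse_B()
            self.match("e")
        elif self.current == "c":          # A -> c B d
            self.match("c")
            self.parse_B()
            self.match("d")
        elif self.current == "f":          # A -> C, because f is in FIRST(C)
            self.parse_C()
        else:
            raise ParseError(f"unexpected {self.current!r} in A")

    def parse_B(self):
        if self.current == "b":            # B -> b B
            self.match("b")
            self.parse_B()
        elif self.current in ("e", "d"):   # B -> ε, token is in FOLLOW(B)
            pass
        else:
            raise ParseError(f"unexpected {self.current!r} in B")

    def parse_C(self):
        self.match("f")                    # C -> f

def accepts(text):
    parser = PredictiveParser(text)
    try:
        parser.parse_A()
        parser.match("$")                  # all input must be consumed
        return True
    except ParseError:
        return False
```

No procedure ever backs up over the input: each choice is made from the current token alone.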
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) Parser.
[Figure: the LL(1) parser reads the input buffer, maintains a stack, consults the parsing table, and produces the output.]
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input
buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack).
– when the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string (say
a) determine the parser action.
• There are four possible parser actions.
1. If X and a are both $, the parser halts (successful completion).
2. If X and a are the same terminal symbol (different from $), the parser pops X from the stack and moves to the next symbol in the input buffer.
3. If X is a non-terminal, the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.
4. None of the above: error.
– All empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
LL(1) Parser – Example1
S → aBa
B → bB | ε
LL(1) Parsing Table
       a           b           $
S    S → aBa
B    B → ε       B → bB
LL(1) Parser – Example1 (cont.)
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
parse tree:
        S
      / | \
     a  B  a
       / \
      b   B
         / \
        b   B
            |
            ε
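The four parser actions fit into a short loop. Below is a minimal sketch of a table-driven parser for the grammar S → aBa, B → bB | ε, assuming single-character tokens; the table is written out by hand, and the names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative assumptions.

```python
# Parsing table M[A, a] for S -> aBa, B -> bB | ε.
# The empty tuple stands for an ε right-hand side; missing keys are errors.
TABLE = {
    ("S", "a"): ("a", "B", "a"),
    ("B", "b"): ("b", "B"),
    ("B", "a"): (),
}
NONTERMINALS = {"S", "B"}

def ll1_parse(text, start="S"):
    """Return the productions of the left-most derivation, or raise ValueError."""
    tokens = list(text) + ["$"]          # input buffer, terminated by $
    stack = ["$", start]                 # initial stack: $S
    pos = 0
    output = []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":    # action 1: accept
            return output
        if top in NONTERMINALS:          # action 3: expand via the table
            rhs = TABLE.get((top, tok))
            if rhs is None:
                raise ValueError(f"no entry M[{top},{tok}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk ... Y1 so Y1 ends up on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
        elif top == tok:                 # action 2: match a terminal
            stack.pop()
            pos += 1
        else:                            # action 4: error
            raise ValueError(f"expected {top!r}, found {tok!r}")
```

Calling `ll1_parse("abba")` yields the productions S → aBa, B → bB, B → bB, B → ε in that order, which is the left-most derivation of abba.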
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST and FOLLOW
• FIRST(α) is the set of the terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols.
• If α derives ε, then ε is also in FIRST(α).
• FOLLOW(A) is the set of the terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.
– a terminal a is in FOLLOW(A) if S ⇒* αAaβ
– $ is in FOLLOW(A) if S ⇒* αA
Benefits of FIRST() and FOLLOW()
– can be used to prove the LL(k) characteristics of the grammar
– can be used to aid in the construction of the predictive parsing table
– provide selection information for recursive descent parsers
Compute FIRST for Any String X
• If X is a terminal symbol, FIRST(X) = {X}.
• If X is a non-terminal symbol and X → ε is a production rule, ε is in FIRST(X).
• If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule:
– if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X).
– if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).
• If X is ε, FIRST(X) = {ε}.
FIRST Example
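E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FIRST(F) = { (, id }
FIRST(T’) = { *, ε }
FIRST(T) = { (, id }
FIRST(E’) = { +, ε }
FIRST(E) = { (, id }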
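The FIRST rules can be applied mechanically by iterating until no set changes. The following is a sketch of that fixed-point computation for the expression grammar; the representation (a dict of alternatives, with `[]` for an ε right-hand side) and the names `compute_first` and `first_of_string` are assumptions made for this example.

```python
EPS = "ε"

# Each non-terminal maps to its alternatives; [] is an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string Y1 Y2 ... Yn, given FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        # A terminal's FIRST set is just the terminal itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result            # Yi cannot vanish, so stop here
    result.add(EPS)                  # every Yi can derive ε (or the string is empty)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                   # repeat until nothing more can be added
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
```

With this grammar, `FIRST["E"]` comes out as { (, id } and `FIRST["E'"]` as { +, ε }.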
Compute FOLLOW (for non-terminals)
• If S is the start symbol, $ is in FOLLOW(S).
• If A → αBβ is a production rule, everything in FIRST(β) except ε is in FOLLOW(B).
• If ( A → αB is a production rule ) or ( A → αBβ is a production rule and ε is in FIRST(β) ), everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
FOLLOW Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
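The FOLLOW sets listed above can be reproduced with the same kind of fixed-point loop. In this sketch the FIRST sets of the non-terminals are assumed as given (hard-coded) to keep the example short, and the names `compute_follow` and `first_of_string` are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets of the non-terminals, assumed to be known already.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST of a string of grammar symbols (terminals map to themselves)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # $ follows the start symbol
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:       # only non-terminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}     # FIRST(β) without ε
                    if EPS in beta_first:        # β is empty or can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
```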
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A → α of a grammar G
– for each terminal a in FIRST(α), add A → α to M[A,a]
– if ε is in FIRST(α), for each terminal a in FOLLOW(A) add A → α to M[A,a]
– if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$]
• All other undefined entries of the parsing table are error entries.
Constructing LL(1) Parsing Table -- Example
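E → TE’       FIRST(TE’) = {(, id}     E → TE’ into M[E,(] and M[E,id]
E’ → +TE’     FIRST(+TE’) = {+}        E’ → +TE’ into M[E’,+]
E’ → ε        FIRST(ε) = {ε}           none, but since ε is in FIRST(ε) and FOLLOW(E’) = {$, )}, E’ → ε into M[E’,$] and M[E’,)]
T → FT’       FIRST(FT’) = {(, id}     T → FT’ into M[T,(] and M[T,id]
T’ → *FT’     FIRST(*FT’) = {*}        T’ → *FT’ into M[T’,*]
T’ → ε        FIRST(ε) = {ε}           none, but since ε is in FIRST(ε) and FOLLOW(T’) = {$, ), +}, T’ → ε into M[T’,$], M[T’,)] and M[T’,+]
F → (E)       FIRST((E)) = {(}         F → (E) into M[F,(]
F → id        FIRST(id) = {id}         F → id into M[F,id]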
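The construction algorithm can be sketched as follows. FIRST and FOLLOW for the expression grammar are assumed as given (hard-coded) here, each table entry is kept as a list so that multiply-defined entries would be visible, and the name `build_table` is an assumption for this example.

```python
from collections import defaultdict

EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# Assumed inputs: FIRST and FOLLOW sets of the non-terminals.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    """FIRST of a right-hand side (terminals map to themselves)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar, follow):
    """Map (non-terminal, terminal) to the list of productions placed there."""
    table = defaultdict(list)
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            rhs_first = first_of_string(rhs)
            for a in rhs_first - {EPS}:          # terminals in FIRST(α)
                table[(lhs, a)].append(rhs)
            if EPS in rhs_first:                 # α can vanish: use FOLLOW(A),
                for a in follow[lhs]:            # which already contains $ if needed
                    table[(lhs, a)].append(rhs)
    return dict(table)

TABLE = build_table(GRAMMAR, FOLLOW)
```

Every key absent from `TABLE` is an error entry, and an entry holding more than one production would indicate a conflict.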
LL(1) Parser – Example2
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
        id          +             *             (           )          $
E     E → TE’                                 E → TE’
E’                E’ → +TE’                               E’ → ε     E’ → ε
T     T → FT’                                 T → FT’
T’                T’ → ε       T’ → *FT’                  T’ → ε     T’ → ε
F     F → id                                  F → (E)
LL(1) Parser – Example2
stack        input     output
$E           id+id$    E → TE’
$E’T         id+id$    T → FT’
$E’T’F       id+id$    F → id
$E’T’id      id+id$
$E’T’        +id$      T’ → ε
$E’          +id$      E’ → +TE’
$E’T+        +id$
$E’T         id$       T → FT’
$E’T’F       id$       F → id
$E’T’id      id$
$E’T’        $         T’ → ε
$E’          $         E’ → ε
$            $         accept
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.
• An entry in the parsing table of a grammar may contain more than one production rule. In this case, we say that the grammar is not an LL(1) grammar.
A Grammar which is not LL(1)
S → iCtSE | a     FOLLOW(S) = { $, e }
E → eS | ε        FOLLOW(E) = { $, e }
C → b             FOLLOW(C) = { t }
FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}
       a         b         e            i              t       $
S    S → a                            S → iCtSE
E                        E → eS                                E → ε
                         E → ε
C              C → b
Problem: ambiguity. The entry M[E,e] holds two production rules.
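The conflict can also be found without drawing the whole table, by comparing the lookahead tokens that select each alternative of a non-terminal. The sketch below does this for S and E using the FIRST and FOLLOW sets listed on this slide; the helper names `lookahead` and `conflicts` are assumptions made for this example.

```python
from itertools import combinations

EPS = "ε"

def lookahead(first, follow):
    """Tokens that select an alternative: FIRST without ε, plus FOLLOW if ε is in FIRST."""
    tokens = first - {EPS}
    if EPS in first:
        tokens = tokens | follow
    return tokens

def conflicts(alternatives, follow):
    """Return (rhs1, rhs2, shared tokens) for every clashing pair of alternatives."""
    found = []
    for (rhs1, first1), (rhs2, first2) in combinations(alternatives, 2):
        clash = lookahead(first1, follow) & lookahead(first2, follow)
        if clash:
            found.append((rhs1, rhs2, clash))
    return found

# Alternatives of each non-terminal, paired with the FIRST set of the right-hand side.
S_ALTS = [("iCtSE", {"i"}), ("a", {"a"})]
E_ALTS = [("eS", {"e"}), ("ε", {EPS})]

S_CONFLICTS = conflicts(S_ALTS, follow={"$", "e"})
E_CONFLICTS = conflicts(E_ALTS, follow={"$", "e"})
```

Here `S_CONFLICTS` is empty, while `E_CONFLICTS` reports that E → eS and E → ε are both selected on the token e, which is the multiply-defined entry M[E,e].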
Testing Whether a Grammar is LL(1)
Q.1 Consider the following grammar, test whether the grammar is LL(1) or not, and construct a predictive parsing table for it.
S → AaAb | BbBa     FIRST(S) = {a, b}     FOLLOW(S) = {$}
A → ε               FIRST(A) = {ε}        FOLLOW(A) = {a, b}
B → ε               FIRST(B) = {ε}        FOLLOW(B) = {a, b}
       a             b             $
S    S → AaAb      S → BbBa
A    A → ε         A → ε
B    B → ε         B → ε
No entry of the table is multiply defined, so this grammar is LL(1).
A Grammar which is not LL(1)
Q.1 Consider the following Grammar, test whether the grammar is LL(1)
or not, and construct a predictive parsing table for it.
S → 1AB | ε      FIRST(S) = {1, ε}     FOLLOW(S) = {$}
A → 1AC | 0C     FIRST(A) = {1, 0}     FOLLOW(A) = {1, 0}
B → 0S           FIRST(B) = {0}        FOLLOW(B) = {$}
C → 1            FIRST(C) = {1}
Parsing table (to be filled in):
       1       0       $
S
A
B
C
A Grammar which is not LL(1) (cont.)
• What do we have to do if the resulting parsing table contains multiply-defined entries?
– If we didn’t eliminate left recursion, eliminate the left recursion in the grammar.
– If the grammar is not left factored, we have to left factor the grammar.
– If the new grammar’s parsing table still contains multiply-defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.
• A left recursive grammar cannot be an LL(1) grammar.
– A → Aα | β
any terminal that appears in FIRST(β) also appears in FIRST(Aα) because Aα ⇒ βα.
If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).
Error Recovery in Predictive Parsing
• An error may occur in the predictive parsing (LL(1) parsing)
– if the terminal symbol on the top of stack does not match with
the current input symbol.
– if the top of stack is a non-terminal A, the current input symbol is a,
and the parsing table entry M[A,a] is empty.
• What should the parser do in an error case?
– The parser should be able to give an error message (as meaningful an error message as possible).
– It should be able to recover from that error case and continue parsing the rest of the input.
Error Recovery Techniques
• Panic-Mode Error Recovery
– Skipping the input symbols until a synchronizing token is found.
• Phrase-Level Error Recovery
– Each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.
• Error-Productions
– If we have a good idea of the common errors that might be encountered, we can augment
the grammar with productions that generate erroneous constructs.
– When an error production is used by the parser, we can generate appropriate error
diagnostics.
– Since it is almost impossible to know all the errors that can be made by programmers, this method is not practical.
• Global-Correction
– Ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs.
– We have to globally analyze the input to find the error.
– This is an expensive method, and it is not used in practice.
Panic-Mode Error Recovery in LL(1) Parsing
• In panic-mode error recovery, we skip all the input symbols until a
synchronizing token is found.
• What is the synchronizing token?
– All the terminal-symbols in the follow set of a non-terminal can be used as a synchronizing
token set for that non-terminal.
• So, a simple panic-mode error recovery for the LL(1) parsing:
– All the empty entries are marked as sync to indicate that the parser will skip all the input symbols until it finds a symbol in the follow set of the non-terminal A which is on the top of the stack. Then the parser will pop that non-terminal A from the stack. The parsing continues from that state.
– To handle unmatched terminal symbols, the parser pops that unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.
Panic-Mode Error Recovery - Example
S → AbS | e | ε     FOLLOW(S) = {$}
A → a | cAd         FOLLOW(A) = {b, d}
       a            b        c            d        e         $
S    S → AbS      sync     S → AbS      sync     S → e     S → ε
A    A → a        sync     A → cAd      sync     sync      sync
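The recovery scheme can be sketched on top of a table-driven parser for this grammar. This is a minimal illustration under some assumptions: tokens are single characters, every missing table entry is treated as sync, skipping stops at $ even when $ is not in the follow set, and leftover input after the stack is exhausted is skipped with a message. The names `TABLE`, `FOLLOW` and `parse_with_recovery` are made up for the example.

```python
NONTERMINALS = {"S", "A"}
FOLLOW = {"S": {"$"}, "A": {"b", "d"}}

# Parsing table for S -> AbS | e | ε, A -> a | cAd.
# The empty tuple is an ε right-hand side; every missing entry counts as sync.
TABLE = {
    ("S", "a"): ("A", "b", "S"),
    ("S", "c"): ("A", "b", "S"),
    ("S", "e"): ("e",),
    ("S", "$"): (),
    ("A", "a"): ("a",),
    ("A", "c"): ("c", "A", "d"),
}

def parse_with_recovery(text):
    """Parse `text` and return the list of error messages (empty if none)."""
    tokens = list(text) + ["$"]
    stack, pos, errors = ["$", "S"], 0, []
    while stack[-1] != "$" or tokens[pos] != "$":
        top, tok = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is not None:                  # normal expansion
                stack.pop()
                stack.extend(reversed(rhs))
                continue
            # sync entry: skip input until a token in FOLLOW(top), then pop top
            errors.append(f"unexpected {tok!r} while parsing {top}")
            while tokens[pos] not in FOLLOW[top] and tokens[pos] != "$":
                pos += 1
            stack.pop()
        elif top == tok:                         # terminal matches
            stack.pop()
            pos += 1
        elif top == "$":                         # stack exhausted, input remains
            errors.append(f"extra input {tok!r} skipped")
            pos += 1
        else:                                    # unmatched terminal on the stack
            errors.append(f"missing {top!r} inserted")
            stack.pop()
    return errors
```

For the input aab the parser reports that a missing b was inserted and then finishes normally; for ceadb it skips e and a until it reaches d, pops A, and continues.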
Phrase-Level Error Recovery
• Each empty entry in the parsing table is filled with a pointer to a special error routine which will take care of that error case.
• These error routines may:
– change, insert, or delete input symbols.
– issue appropriate error messages
– pop items from the stack.
• We should be careful when we design these error routines, because we
may put the parser into an infinite loop.