0% found this document useful (0 votes)
90 views19 pages

Predictive Parsing: Recall The Main Idea of Top-Down Parsing

This document discusses predictive parsing and how it avoids backtracking during top-down parsing. It introduces the concepts of FIRST and FOLLOW sets, which are used to determine which production rule to select based on the lookahead token. FIRST sets contain the first possible tokens that could follow a non-terminal, while FOLLOW sets contain tokens that could follow the non-terminal in a sentential form. Using these sets, a predictive parsing table can be constructed to uniquely determine the production rule without backtracking. The document also discusses constructing LL(1) parsers from grammars and methods for error recovery during LL(1) parsing.

Uploaded by

Suresh Dhanapal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
90 views19 pages

Predictive Parsing: Recall The Main Idea of Top-Down Parsing

This document discusses predictive parsing and how it avoids backtracking during top-down parsing. It introduces the concepts of FIRST and FOLLOW sets, which are used to determine which production rule to select based on the lookahead token. FIRST sets contain the first possible tokens that could follow a non-terminal, while FOLLOW sets contain tokens that could follow the non-terminal in a sentential form. Using these sets, a predictive parsing table can be constructed to uniquely determine the production rule without backtracking. The document also discusses constructing LL(1) parsers from grammars and methods for error recovery during LL(1) parsing.

Uploaded by

Suresh Dhanapal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 19

Predictive parsing

Recall the main idea of top-down parsing:


Start at the root, grow towards leaves
Pick a production and try to match input
May need to backtrack
Can we avoid the backtracking?

Given A | the parser should be able to choose


between and

How?
What if we do some "preprocessing" to answer the
question: Given a non-terminal A and lookahead t,
which (if any) production of A is guaranteed to start
with a t?
1

Predictive parsing
If we have two productions: A

, we want a
distinct way of choosing the correct one.
Define:

for G, x FIRST() iff * x

If FIRST() and FIRST() contain no common

symbols, we will know whether we should


choose A or A by looking at the
lookahead symbol.

Predictive parsing
Compute FIRST(X) as follows:
if X is a terminal, then FIRST(X)={X}
if X is a production, then add to FIRST(X)
if X is a non-terminal and XY1Y2...Yn is a production,
add FIRST(Yi) to FIRST(X) if the preceding Yjs contain
in their FIRSTs

Predictive parsing
What if we have a "candidate" production A

where = or *?
We could expand if we knew that there is some
sentential form where the current input symbol
appears after A.
Define:

for AN, xFOLLOW(A) iff S*Ax

Predictive parsing
Compute FOLLOW as follows:
FOLLOW(S) contains EOF
For productions AB, everything in FIRST()
except goes into FOLLOW(B)
For productions AB or AB where FIRST()
contains , FOLLOW(B) contains everything that is in
FOLLOW(A)

Predictive parsing
Armed with
FIRST
FOLLOW
we can build a parser where no backtracking is

required!

Predictive parsing (w/table)


For each production A do:
For each terminal a FIRST() add A to entry
M[A,a]
If FIRST(), add A to entry M[A,b] for each
terminal b FOLLOW(A).
If FIRST() and EOFFOLLOW(A), add A to
M[A,EOF]
Use table and stack to simulate recursion.

Recursive Descent Parsing


Basic idea:
Write a routine to recognize each lhs
This produces a parser with mutually recursive
routines.
Good for hand-coded parsers.
Example:

AaB | b will correspond to


A() {
if (lookahead == 'a')
match('a');
B();
else if (lookahead == 'b')
match ('b');
else error();
}

Building a parser
Original grammar:

EE+E
EE*E
E(E)
Eid

This grammar is left-recursive, ambiguous and requires

left-factoring. It needs to be modified before we build a


predictive parser for it:

Remove ambiguity:
EE+T
TT*F
F(E)
Fid

Remove left recursion:


ETE'
E'+TE'|
TFT'
T'*FT'|
F(E)
Fid

Building a parser
The grammar:

ETE'
E'+TE'|
TFT'
T'*FT'|
F(E)
Fid

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}


FIRST(E') = {+, }
FIRST(T') = {*, }
FOLLOW(E) = FOLLOW(E') = {$, )}
FOLLOW(T) = FOLLOW(T') = {+, $, )}
FOLLOW(F) = {*, +, $, )}
Now, we can either build a table or design a recursive descend parser.
10

Parsing table
+

E
E' E'+TE'
T
T'
T'
F
match
+
*
(
)
id
$

T'*FT'
match

(
ETE'
TFT'
F(E)
match

)
E'
T'

match

id
ETE'
TFT'
Fid

match

$
E'
T'

accept
11

Parsing table
Parse the input id*id using the parse table and a stack
Step Stack
1
$E
2
$E'T
3
$E'T'F
4
$E'T'id
5
$E'T'
6
$T'F*
7
$T'F
8
$T'id
9
$T'
10
$

Input
id*id$
id*id$
id*id$
id*id$
*id$
*id$
id$
id$
$
$

Next Action
ETE'
TFT'
Fid
match id
T'*FT'
match *
Fid
match id
T'
accept
12

Recursive descend parser


parse() {
token = get_next_token();
if (E() and token == '$')
then return true
else return false
}

E() {
if (T())
then return Eprime()
else return false
}

Eprime() {
if (token == '+')
then token=get_next_token()
if (T())
then return Eprime()
else return false
else if (token==')' or token=='$')
then return true
else return false
}

The remaining procedures are similar.


13

LL(1) parsing
Our parser scans the input Left-to-right,

generates a Leftmost derivation and uses 1


symbol of lookahead.
It is called an LL(1) parser.
If you can build a parsing table with no
multiply-defined entries, then the grammar is
LL(1).
Ambiguous grammars are never LL(1)
Non-ambiguous grammars are not necessarily
LL(1)
14

LL(1) parsing
For example, the following grammar will have

two possible ways to expand S' when the


lookahead is else.
S if E then S S' | other
S' else S |
E id

It may expand S' else S or S'

We can resolve the ambiguity by instructing the parser to


always pick S' else S. This will match each else to the
closest previous then.

15

LL(1) parsing
Here's an example of a grammar that is NOT

LL(k) for any k:


S Ca | Cb
C cC | c

Why? Suppose the grammar was LL(k) for some k.


Consider the input string ck+1a. With only k lookaheads,
the parser would not be able to decide whether to
expand using S Ca or S Cb
Note that the grammar is actually regular: it generates
strings of the form c+(a|b)

16

Error detection in LL(1) parsing


An error is detected whenever an empty table

slot is encountered.
We would like our parser to be able to recover
from an error and continue parsing.
Phase-level recovery

We associate each empty slot with an error handling


procedure.

Panic mode recovery


Modify the stack and/or the input string to try and
reach state from which we can continue.

17

Error recovery in LL(1) parsing


Panic mode recovery
Idea:
Decide on a set of synchronizing tokens.
When an error is found and there's a nonterminal at
the top of the stack, discard input tokens until a
synchronizing token is found.
Synchronizing tokens are chosen so that the parser
can recover quickly after one is found

e.g. a semicolon when parsing statements.

If there is a terminal at the top of the stack, we


could try popping it to see whether we can
continue.

Assume that the input string is actually missing that


terminal.
18

Error recovery in LL(1) parsing


Panic mode recovery
Possible synchronizing tokens for a nonterminal A
the tokens in FOLLOW(A)

the tokens in FIRST(A)

When one is found, pop A of the stack and try to


continue
When one is found, match it and try to continue

tokens such as semicolons that terminate


statements

19

You might also like