BITS Pilani, Hyderabad Campus
Prof. Aruna Malapati
Department of CSIS
Top down parsers
Today’s Agenda
• Types of parsers
• Top down parser
Parsing technique
• Scan the input string from left to right and identify whether the derivation is leftmost or rightmost.
• Use the grammar's productions to choose the appropriate derivation step.
Types of parsers
Parsers
• Top Down
  – Backtracking
  – Predictive: Recursive Descent, LL(1)
• Bottom Up
  – Operator Precedence
  – LR: LR(0), SLR(1), LALR(1), CLR(1)
Two Approaches
• Top-down parsers (LL(1), recursive descent)
  – Start at the root of the parse tree and grow toward the leaves
  – Pick a production and try to match the input
  – Bad “pick” → may need to backtrack
• Bottom-up parsers (LR(1), operator precedence)
  – Start at the leaves and grow toward the root
  – As input is consumed, encode possible parse trees in an internal state
  – Bottom-up parsers handle a large class of grammars
Difference between Top down and bottom up parser
Input string abbcde
S -> aABe
A -> Abc | b
B -> d
• A top-down parser generates the tree from the start symbol (the root node) and keeps finding the right production to derive the string. Its main task is to decide which production to use for deriving the string.
• A bottom-up parser generates the tree from the input by reduction: it looks for substrings of the input that match the RHS of a production, replaces them with the LHS, and continues until the start symbol is derived. Its main task is to decide whether to shift or reduce.
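For the grammar and input abbcde above, the two strategies can be traced by hand (worked out here, not taken from the slide). The top-down parser builds a leftmost derivation from S; the bottom-up parser performs reductions, and read in order these reductions are a rightmost derivation in reverse.

```latex
\begin{aligned}
&\text{Top-down (leftmost derivation):}\\
&S \Rightarrow aABe \Rightarrow aAbcBe \Rightarrow abbcBe \Rightarrow abbcde\\[4pt]
&\text{Bottom-up (reductions, i.e. a rightmost derivation in reverse):}\\
&abbcde \xrightarrow{A \to b} aAbcde \xrightarrow{A \to Abc} aAde
        \xrightarrow{B \to d} aABe \xrightarrow{S \to aABe} S
\end{aligned}
```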
Grammars and Parsers
LL(1) parsers
– Left-to-right input
– Leftmost derivation
– 1 symbol of look-ahead
Grammars that this can handle are called LL(1) grammars.
LR(1) parsers
– Left-to-right input
– Rightmost derivation (constructed in reverse)
– 1 symbol of look-ahead
Grammars that this can handle are called LR(1) grammars.
Top down parser
• Built from the root to the leaves.
• The derivation terminates when the entire input string has been derived.
• A leftmost derivation matches this requirement, because the input is read from left to right.
• The main task is to find the appropriate production rule at each step in order to produce the correct input string.
Example - 1
Grammar
# Production rule
1 S -> x P z
2 P -> yw | y
Input string x y z
• Expand S -> x P z (sentential form x P z). The first input symbol x matches the leftmost node, hence advance the input string pointer.
• Match the next node P with the current character y in the input string. It does not match and P is a non-terminal, hence expand P with its first production, P -> yw (sentential form x y w z).
• y matches, hence advance the input string pointer.
• The next node w does not match z. Mismatch, hence backtrack and use the other production of P, P -> y (sentential form x y z).
• y matches, and then z matches.
• Matching done for the entire string.
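The backtracking steps of Example 1 can be sketched in C as below. This is a minimal illustration, not a general parser: the names `parse`, `match` and `parse_P_alt` are hypothetical, and the two productions of P are hard-coded and tried in order. When the first production fails (w against z), the input position returns to just after x and the next production of P is tried.

```c
#include <stdio.h>

/* Backtracking recognizer for Example 1 (a sketch; names are hypothetical):
     S -> x P z
     P -> y w | y
   Each helper returns the input position reached, or -1 on a mismatch. */

static const char *input;

/* Match the single terminal c at position pos. */
static int match(int pos, char c) {
    return input[pos] == c ? pos + 1 : -1;
}

/* Try alternative k of P starting at position pos. */
static int parse_P_alt(int pos, int k) {
    if (k == 0) {                        /* P -> y w */
        pos = match(pos, 'y');
        return pos < 0 ? -1 : match(pos, 'w');
    }
    if (k == 1)                          /* P -> y */
        return match(pos, 'y');
    return -1;                           /* no more alternatives */
}

/* Returns 1 if s is derived from S, 0 otherwise. */
int parse(const char *s) {
    input = s;
    int pos = match(0, 'x');             /* S -> x P z: match x first */
    if (pos < 0)
        return 0;
    for (int k = 0; k < 2; k++) {        /* try each production of P in order */
        int after_p = parse_P_alt(pos, k);
        if (after_p >= 0) {
            int end = match(after_p, 'z');
            if (end >= 0 && input[end] == '\0')
                return 1;                /* the whole string matched */
        }
        /* Backtrack: restart from pos (just after x) with the next production. */
        if (k == 0)
            printf("P -> y w failed; backtrack and try P -> y\n");
    }
    return 0;
}
```

Calling parse("xyz") prints the backtracking message once and returns 1. Note that the retry also happens when P itself succeeds but the following z fails, which is what full backtracking requires.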
Example - 2
Expression grammar (with precedence)
# Production rule
1 expr → expr + term
2 | expr - term
3 | term
4 term → term * factor
5 | term / factor
6 | factor
7 factor → number
8 | identifier
Input string x - 2 * y
Example - 2
Rule   Sentential form     Input string
-      expr                x - 2 * y
1      expr + term         x - 2 * y
3      term + term         x - 2 * y
6      factor + term       x - 2 * y
8      <id> + term         x - 2 * y
-      <id,x> + term       x - 2 * y
Problem:
– Can't match the next terminal: after x is matched, the current position in the input stream is at -, but the sentential form has +
– We guessed wrong at step 2
Backtracking
Rule   Sentential form     Input string
-      expr                x - 2 * y
1      expr + term         x - 2 * y
3      term + term         x - 2 * y
6      factor + term       x - 2 * y
8      <id> + term         x - 2 * y
?      <id,x> + term       x - 2 * y
Undo all these productions.
• Rollback productions
• Choose a different production for expr
• Continue
Retrying
Rule   Sentential form       Input string
-      expr                  x - 2 * y
2      expr - term           x - 2 * y
3      term - term           x - 2 * y
6      factor - term         x - 2 * y
8      <id> - term           x - 2 * y
-      <id,x> - term         x - 2 * y
6      <id,x> - factor       x - 2 * y
7      <id,x> - <num>        x - 2 * y
Problem:
– More input to read: after 2 is matched, * y remains but the sentential form is exhausted
– Another cause of backtracking
Successful Parse
Rule   Sentential form               Input string
-      expr                          x - 2 * y
2      expr - term                   x - 2 * y
3      term - term                   x - 2 * y
6      factor - term                 x - 2 * y
8      <id> - term                   x - 2 * y
-      <id,x> - term                 x - 2 * y
4      <id,x> - term * factor        x - 2 * y
6      <id,x> - factor * factor      x - 2 * y
7      <id,x> - <num> * factor       x - 2 * y
-      <id,x> - <num,2> * factor     x - 2 * y
8      <id,x> - <num,2> * <id>       x - 2 * y
-      <id,x> - <num,2> * <id,y>     x - 2 * y
All terminals match – we're finished
Other Possible Parses
Rule   Sentential form                         Input string
-      expr                                    x - 2 * y
1      expr + term                             x - 2 * y
1      expr + term + term                      x - 2 * y
1      expr + term + term + term               x - 2 * y
1      expr + term + term + term + term        x - 2 * y
A top-down parser cannot handle a left-recursive grammar.
Problem: termination
– A wrong choice leads to infinite expansion
(More importantly: without consuming any input!)
– May not be as obvious as this
– Our grammar is left recursive
Left Recursion
• Bad news:
– Top-down parsers cannot handle left
recursion
• Good news:
– We can systematically eliminate left
recursion
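The usual rule for eliminating immediate left recursion is shown below, assuming no βᵢ begins with A (indirect left recursion needs an extra substitution step that is not covered here). Applied to the expression grammar of Example 2, it gives a grammar that a top-down parser can expand without looping.

```latex
\begin{aligned}
&A \to A\alpha_1 \mid \dots \mid A\alpha_m \mid \beta_1 \mid \dots \mid \beta_n
\quad\Longrightarrow\quad
\begin{cases}
A \to \beta_1 A' \mid \dots \mid \beta_n A'\\
A' \to \alpha_1 A' \mid \dots \mid \alpha_m A' \mid \varepsilon
\end{cases}\\[6pt]
&\textit{expr} \to \textit{term}\ \textit{expr}' \qquad
 \textit{expr}' \to +\ \textit{term}\ \textit{expr}' \mid -\ \textit{term}\ \textit{expr}' \mid \varepsilon\\
&\textit{term} \to \textit{factor}\ \textit{term}' \qquad
 \textit{term}' \to *\ \textit{factor}\ \textit{term}' \mid /\ \textit{factor}\ \textit{term}' \mid \varepsilon\\
&\textit{factor} \to \textit{number} \mid \textit{identifier}
\end{aligned}
```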
Backtracking parser
• Tries different production rules to find a match for the input string, backtracking each time a choice fails.
• Slow: requires exponential time in general.
• Hence not preferred in practical compilers.
Predictive parsing
• The goal of predictive parsing is to construct a top-down
parser that never backtracks.
• To do so, we must transform a grammar in two ways:
– eliminate left recursion, and
– perform left factoring.
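Left factoring pulls a common prefix out of alternatives so that the parser can postpone its choice until it has seen enough input. As an illustration (not on the original slide), applying it to P -> yw | y from Example 1 removes the need to backtrack there: after y is matched, one look-ahead symbol decides between w and ε.

```latex
\begin{aligned}
&A \to \alpha\beta_1 \mid \alpha\beta_2
\quad\Longrightarrow\quad
A \to \alpha A', \qquad A' \to \beta_1 \mid \beta_2\\[4pt]
&P \to y\,w \mid y
\quad\Longrightarrow\quad
P \to y\,P', \qquad P' \to w \mid \varepsilon
\end{aligned}
```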
Consider this grammar:
A ::= A a | b
This grammar recognizes ba*. Here is an equivalent grammar without left recursion:
A ::= b A'
A' ::= a A' | ε
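A recursive recognizer for the rewritten grammar can be sketched as follows; the function names are hypothetical. Each procedure consumes a symbol before it recurses, so it always terminates, whereas a procedure written directly from A ::= A a would call itself without consuming any input.

```c
/* Recursive recognizer for the rewritten grammar (a sketch):
     A  ::= b A'
     A' ::= a A' | ε
   It accepts exactly the strings of the form b a*. */

static const char *p;            /* next unread input character */

static int A_prime(void) {       /* A' ::= a A' | ε */
    if (*p == 'a') {
        p++;                     /* consume a */
        return A_prime();
    }
    return 1;                    /* A' ::= ε: consume nothing */
}

static int A(void) {             /* A ::= b A' */
    if (*p != 'b')
        return 0;
    p++;                         /* consume b */
    return A_prime();
}

/* Returns 1 if s is in the language b a*, 0 otherwise. */
int recognize(const char *s) {
    p = s;
    return A() && *p == '\0';
}
```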
Recursive Descent Parser
Basic idea
• Given A → α | β, the parser should be able to choose between α and β.
• The parser uses a collection of recursive procedures for parsing the given input string.
• The RHS of each production is directly converted into program code.
• A separate procedure is written for each non-terminal, and the body of the procedure follows the RHS of that non-terminal's productions.
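Recursive Descent parser
Grammar:
E -> id E'
E' -> + id E' | ε
Why this name? For every variable in the grammar we write a function, and these functions call each other recursively.
For every variable we have one function. If a variable has many productions, its function uses if-else cases or switch cases, depending on the number of productions.

    #include <stdio.h>
    #include <stdlib.h>

    /* l holds the look-ahead character; 'i' stands for the token id. */
    int l;

    void match(char t) {
        if (l == t)
            l = getchar();            /* consume t, read the next character */
        else {
            printf("error\n");
            exit(1);
        }
    }

    void Eprime(void) {               /* E' -> + id E' | ε */
        if (l == '+') {
            match('+');
            match('i');
            Eprime();
        }
        else
            return;                   /* E' -> ε: consume nothing */
    }

    void E(void) {                    /* E -> id E' */
        if (l == 'i') {
            match('i');
            Eprime();
        }
        else {
            printf("error\n");
            exit(1);
        }
    }

    int main(void) {
        l = getchar();
        E();
        if (l == '$')
            printf("parsing successful\n");
        else
            printf("error\n");
        return 0;
    }

According to the grammar, id + id $ can be generated (typed as i+i$ for this program).
Recursion stack at its deepest while parsing id + id $ (bottom to top): main(), E(), E'(), E'()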
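The same technique scales to the expression grammar of Example 2 once left recursion is removed (expr → term expr', expr' → + term expr' | - term expr' | ε, and similarly for term). The sketch below rests on stated assumptions: tokens are single characters without spaces, a letter counts as an identifier and a digit as a number, and all names are hypothetical. Each procedure chooses its production from one look-ahead character, so x-2*y is parsed without any backtracking.

```c
#include <ctype.h>

/* Recursive descent recognizer for Example 2's grammar without left
   recursion (a sketch). Assumed tokens: one character each; a letter is
   an identifier, a digit is a number.
     expr   -> term expr'       expr'  -> + term expr' | - term expr' | ε
     term   -> factor term'     term'  -> * factor term' | / factor term' | ε
     factor -> number | identifier                                        */

static const char *p;   /* next unread character */
static int ok;          /* set to 0 at the first syntax error */

static void factor(void) {
    if (isdigit((unsigned char)*p) || isalpha((unsigned char)*p))
        p++;                             /* factor -> number | identifier */
    else
        ok = 0;                          /* no production of factor applies */
}

static void term_rest(void) {            /* term' */
    if (*p == '*' || *p == '/') {
        p++;
        factor();
        if (ok) term_rest();
    }                                    /* otherwise term' -> ε */
}

static void term(void) { factor(); if (ok) term_rest(); }

static void expr_rest(void) {            /* expr' */
    if (*p == '+' || *p == '-') {
        p++;
        term();
        if (ok) expr_rest();
    }                                    /* otherwise expr' -> ε */
}

static void expr(void) { term(); if (ok) expr_rest(); }

/* Returns 1 if the whole of s is derived from expr, 0 otherwise. */
int parse(const char *s) {
    p = s;
    ok = 1;
    expr();
    return ok && *p == '\0';
}
```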
Exercise
S -> ABd | aBc
A -> ε
B -> b | c
LL(1) Parser
[Figure: LL(1) parser model with an input buffer ending in $, a stack with $ at the bottom, and the LL(1) parse table]
First and Follow sets
Grammar: E → TE'   E' → +TE' | ε   T → FT'   T' → *FT' | ε   F → (E) | id

Non-terminal   First set   Follow set
E              (, id       ), $
E'             +, ε        ), $
T              (, id       +, ), $
T'             *, ε        +, ), $
F              (, id       *, +, ), $

Input: Grammar G
Output: Parse table M
Method:
1. For each production A -> α of the grammar, perform steps 2 and 3.
2. For each terminal a in First(α), add A -> α to M[A, a].
3. If ε is in First(α), add A -> α to M[A, b] for each terminal b in Follow(A). If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $].

Input Symbols
NT    id          +           *           (           )           $
E     E → TE'                             E → TE'
E'                E' → +TE'                           E' → ε      E' → ε
T     T → FT'                             T → FT'
T'                T' → ε      T' → *FT'               T' → ε      T' → ε
F     F → id                              F → (E)
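The First/Follow computation and the three-step table construction can be sketched as a fixed-point iteration in C. Assumptions of this sketch: every grammar symbol is a single character, with e standing for E', t for T' and i for id; the production list, the function names and the `table` encoding (production index + 1, with 0 for an empty cell) are hypothetical. `build_table` reports a conflict when a cell would receive two different productions, which is exactly the LL(1) test.

```c
#include <stdbool.h>
#include <string.h>

/* Encoding (an assumption of this sketch): e = E', t = T', i = id,
   '$' = end marker. "A:rhs" means A -> rhs; an empty rhs is A -> ε. */
static const char *prods[] = {
    "E:Te", "e:+Te", "e:", "T:Ft", "t:*Ft", "t:", "F:(E)", "F:i"
};
enum { NPRODS = sizeof prods / sizeof prods[0] };

static bool first[128][128], follow[128][128], nullable[128];
static int table[128][128];      /* production index + 1; 0 = empty cell */

static bool is_nt(char c) { return c != '\0' && strchr("EeTtF", c) != NULL; }

/* dst |= src; returns true if dst gained a new member. */
static bool add_all(bool *dst, const bool *src) {
    bool grew = false;
    for (int c = 0; c < 128; c++)
        if (src[c] && !dst[c]) dst[c] = grew = true;
    return grew;
}

/* Adds First(s) without ε to out; returns true if s can derive ε. */
static bool first_of(const char *s, bool *out) {
    for (; *s; s++) {
        if (!is_nt(*s)) { out[(int)*s] = true; return false; }
        add_all(out, first[(int)*s]);
        if (!nullable[(int)*s]) return false;
    }
    return true;
}

/* Repeat until no First, Follow or nullable entry changes. */
static void compute_sets(char start) {
    follow[(int)start]['$'] = true;
    for (bool changed = true; changed;) {
        changed = false;
        for (int k = 0; k < NPRODS; k++) {
            int A = prods[k][0];
            const char *rhs = prods[k] + 2;
            bool f[128] = {false};
            if (first_of(rhs, f) && !nullable[A]) nullable[A] = changed = true;
            if (add_all(first[A], f)) changed = true;
            for (const char *x = rhs; *x; x++) {   /* A -> ... X rest */
                if (!is_nt(*x)) continue;
                bool rest[128] = {false};
                bool rest_eps = first_of(x + 1, rest);
                if (add_all(follow[(int)*x], rest)) changed = true;
                if (rest_eps && add_all(follow[(int)*x], follow[A])) changed = true;
            }
        }
    }
}

/* Steps 1-3 of the method; returns false if some cell gets two entries. */
static bool build_table(void) {
    bool ll1 = true;
    for (int k = 0; k < NPRODS; k++) {
        int A = prods[k][0];
        bool f[128] = {false};
        bool eps = first_of(prods[k] + 2, f);
        for (int a = 0; a < 128; a++) {
            if (!f[a] && !(eps && follow[A][a])) continue;
            if (table[A][a] != 0 && table[A][a] != k + 1) ll1 = false;
            table[A][a] = k + 1;
        }
    }
    return ll1;
}
```

After compute_sets('E'), First(E) is {(, i} and Follow(F) is {*, +, ), $}, matching the sets listed above; step 3 needs no special case for $ because $ is stored as an ordinary member of the Follow sets.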
Input          Stack          Action
id+id*id$      E$             M[E, id] = E → TE'
id+id*id$      TE'$           M[T, id] = T → FT'
id+id*id$      FT'E'$         M[F, id] = F → id
id+id*id$      idT'E'$        match id
+id*id$        T'E'$          M[T', +] = T' → ε
+id*id$        E'$            M[E', +] = E' → +TE'
+id*id$        +TE'$          match +
id*id$         TE'$           M[T, id] = T → FT'
id*id$         FT'E'$         M[F, id] = F → id
id*id$         idT'E'$        match id
*id$           T'E'$          M[T', *] = T' → *FT'
*id$           *FT'E'$        match *
id$            FT'E'$         M[F, id] = F → id
id$            idT'E'$        match id
$              T'E'$          M[T', $] = T' → ε
$              E'$            M[E', $] = E' → ε
$              $              match $
Since no more symbols remain to be matched in either the input or the stack, the input string is accepted.
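A table-driven driver that reproduces the trace above can be sketched as follows. The encoding (e = E', t = T', i = id) and the function names are assumptions; the cells of `M` are copied from the parse table for this grammar. The stack array keeps its top at the end, so the trace's stack TE'$ is stored as "$eT".

```c
#include <stdbool.h>
#include <string.h>

/* M[A, a]: the right-hand side to push, "" for an ε-production,
   or NULL for an empty (error) cell. Encoding: e = E', t = T', i = id. */
static const char *M(char A, char a) {
    switch (A) {
    case 'E': return (a == 'i' || a == '(') ? "Te" : NULL;
    case 'e': return a == '+' ? "+Te" : (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "Ft" : NULL;
    case 't': return a == '*' ? "*Ft"
                   : (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

/* Returns true if input (which must end with '$') is accepted. */
bool ll1_parse(const char *input) {
    char stack[256] = "$E";            /* $ at the bottom, start symbol on top */
    int top = 1;                       /* index of the top of the stack */
    const char *ip = input;            /* current input symbol */
    for (;;) {
        char X = stack[top], a = *ip;
        if (strchr("EeTtF", X) == NULL) {  /* terminal or $ on top */
            if (X != a) return false;      /* mismatch: syntax error */
            if (X == '$') return true;     /* $ = $: accept */
            top--;                         /* pop the terminal, advance input */
            ip++;
        } else {
            const char *rhs = M(X, a);
            if (rhs == NULL) return false; /* empty cell M[X, a]: error */
            top--;                         /* pop X ... */
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--) {
                if (top + 1 >= (int)sizeof stack) return false;
                stack[++top] = rhs[k];     /* ... and push rhs in reverse */
            }
        }
    }
}
```

ll1_parse("i+i*i$") performs the same expand and match steps as the trace and returns true; ll1_parse("i+*i$") stops at the empty cell M[T, *] and returns false, which is the second error case listed under Error Handling.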
Rules for LL(1) Grammars
• Grammar must not have left-recursion
• Grammar must be predictive / deterministic
• Grammar must not be ambiguous
Error Handling
• Errors
  – Terminal at top of stack ≠ terminal on input
  – Variable A is on top of the stack and M[A, a] contains no production
• The parser has to “recover” or synchronize itself
  – If it just continues: many further senseless errors
• The generated error messages should be
  – Exact in meaning
  – Exact in place (e.g. line, column)
• Find as many errors as possible in one run
• Avoid propagated errors
Take home message
• Top down parsers use leftmost derivations.
• Backtracking parsers take exponential time in the worst case.
• Backtracking parsers cannot handle left recursive grammars.
• Backtracking is overcome by look-ahead in (predictive) recursive descent parsers.
• Recursive descent parsers are implemented by writing recursive procedures.
Take home message
• Top-down parsers derive the string starting from the start symbol, using leftmost derivations.
• Two classes of top-down parsers exist:
  – Backtracking: exponential time, since it tries every suitable production
  – Predictive: predicts the production to use from the parse table
Take home message
• Not every CFG is an LL(1) grammar.
• Before implementing the predictive parser, convert the CFG into LL(1) form (where possible) by eliminating left recursion and performing left factoring.
• Compute the First and Follow sets.
• Fill in the LL(1) parse table; if every cell in the table has at most one entry, the grammar is LL(1).