Top-Down Parsing and Intro To Bottom-Up Parsing: Predictive Parsers

Predictive parsers predict which production to use by looking ahead at the next few tokens, without backtracking. They accept LL(k) grammars, in which the production to use is uniquely determined by the next k tokens; LL(1) parsers predict using a single token of lookahead. Left-factoring a grammar removes common prefixes among productions so that one token of lookahead suffices. An LL(1) parsing table gives the unique production to use for each non-terminal/token pair, and the LL(1) parsing algorithm uses the table to parse the input in a single left-to-right pass, raising errors on invalid inputs.


Top-Down Parsing and Intro to Bottom-Up Parsing
Lecture 7, Prof. Aiken, CS 143

Predictive Parsers

• Like recursive descent, but the parser can "predict" which production to use
  – By looking at the next few tokens
  – No backtracking

• Predictive parsers accept LL(k) grammars
  – L means "left-to-right" scan of input
  – L means "leftmost derivation"
  – k means "predict based on k tokens of lookahead"
  – In practice, LL(1) is used

LL(1) vs. Recursive Descent

• In recursive descent,
  – At each step, many choices of production to use
  – Backtracking used to undo bad choices

• In LL(1),
  – At each step, only one choice of production
  – That is,
    • When a non-terminal A is leftmost in a derivation
    • And the next input symbol is t
    • There is a unique production A → α to use
      – Or no production to use (an error state)

• LL(1) is a recursive descent variant without backtracking

Predictive Parsing and Left Factoring

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )

• Hard to predict because
  – For T, two productions start with int
  – For E, it is not clear how to predict

• We need to left-factor the grammar

Left-Factoring Example

• Recall the grammar
    E → T + E | T
    T → int | int * T | ( E )

• Factor out common prefixes of productions
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

LL(1) Parsing Table Example

• Left-factored grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• The LL(1) parsing table (rows: leftmost non-terminal; columns: next input token;
  entries: right-hand side of the production to use):

          int       *       +       (       )       $
    E     T X                       T X
    X                       + E             ε       ε
    T     int Y                     ( E )
    Y               * T     ε               ε       ε
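As a concrete reference point, here is a minimal sketch (in Python; the names EPS and TABLE are chosen for illustration, not taken from the lecture) of how the table above could be stored as a dictionary keyed by (non-terminal, next token). Blank cells are simply missing keys, i.e. error entries.

    # The LL(1) parsing table above as a Python dictionary.
    # Keys are (non-terminal, next token); values are the production rhs.
    # EPS (the empty tuple) stands for an epsilon production.
    EPS = ()
    TABLE = {
        ('E', 'int'): ('T', 'X'),    ('E', '('): ('T', 'X'),
        ('X', '+'):   ('+', 'E'),    ('X', ')'): EPS,   ('X', '$'): EPS,
        ('T', 'int'): ('int', 'Y'),  ('T', '('): ('(', 'E', ')'),
        ('Y', '*'):   ('*', 'T'),    ('Y', '+'): EPS,   ('Y', ')'): EPS,   ('Y', '$'): EPS,
    }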

LL(1) Parsing Table Example (Cont.)

• Consider the [E, int] entry
  – "When the current non-terminal is E and the next input is int, use production E → T X"
  – This production can generate an int in the first position

• Consider the [Y, +] entry
  – "When the current non-terminal is Y and the current token is +, get rid of Y"
  – Y can be followed by + only if Y → ε

LL(1) Parsing Tables. Errors

• Blank entries indicate error situations

• Consider the [E, *] entry
  – "There is no way to derive a string starting with * from non-terminal E"

Using Parsing Tables

• Method similar to recursive descent, except
  – For the leftmost non-terminal S
  – We look at the next input token a
  – And choose the production shown at [S, a]

• A stack records the frontier of the parse tree
  – Non-terminals that have yet to be expanded
  – Terminals that have yet to be matched against the input
  – Top of stack = leftmost pending terminal or non-terminal

• Reject on reaching an error state
• Accept on end of input & empty stack

LL(1) Parsing Algorithm

    initialize stack = <S $> and next
    repeat
      case stack of
        <X, rest> : if T[X, *next] = Y1…Yn
                       then stack ← <Y1…Yn rest>;
                       else error ();
        <t, rest> : if t == *next++
                       then stack ← <rest>;
                       else error ();
    until stack == < >

LL(1) Parsing Algorithm (annotated)

• $ marks the bottom of the stack
• For a non-terminal X on top of the stack: look up the production T[X, *next],
  pop X, and push the production rhs on the stack.  Note the leftmost symbol
  of the rhs ends up on top of the stack.
• For a terminal t on top of the stack: check that t matches the next input token.

LL(1) Parsing Example

    Stack          Input          Action
    E $            int * int $    T X
    T X $          int * int $    int Y
    int Y X $      int * int $    terminal
    Y X $          * int $        * T
    * T X $        * int $        terminal
    T X $          int $          int Y
    int Y X $      int $          terminal
    Y X $          $              ε
    X $            $              ε
    $              $              ACCEPT
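The pseudocode and the trace above translate fairly directly into code. Below is a minimal sketch in Python (the function name ll1_parse and its parameters are illustrative, not from the lecture); it assumes the TABLE dictionary from the earlier sketch and a token stream that ends with '$'.

    def ll1_parse(tokens, table, nonterminals, start='E'):
        stack = [start, '$']                 # stack = <S $>; index 0 is the top
        pos = 0                              # *next
        while stack:
            top = stack.pop(0)
            if top in nonterminals:          # <X, rest>: expand X via the table
                rhs = table.get((top, tokens[pos]))
                if rhs is None:              # blank entry: error state
                    raise SyntaxError(f'no production for [{top}, {tokens[pos]}]')
                stack[0:0] = list(rhs)       # push rhs; leftmost symbol on top
            else:                            # <t, rest>: match a terminal
                if top != tokens[pos]:
                    raise SyntaxError(f'expected {top}, found {tokens[pos]}')
                pos += 1                     # consume the token (*next++)
        return pos == len(tokens)            # accept on empty stack & end of input

    # Replaying the trace above:
    # ll1_parse(['int', '*', 'int', '$'], TABLE, {'E', 'X', 'T', 'Y'})  ->  True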

Constructing Parsing Tables: The Intuition

• Consider a non-terminal A, a production A → α, and a token t

• T[A, t] = α in two cases:

• If α →* t β
  – α can derive a t in the first position
  – We say that t ∈ First(α)

• If A → α and α →* ε and S →* β A t δ
  – Useful if the stack has A, the input is t, and A cannot derive t
  – In this case the only option is to get rid of A (by deriving ε)
    • This can work only if t can follow A in at least one derivation
  – We say t ∈ Follow(A)

Computing First Sets

Definition
    First(X) = { t | X →* t α }  ∪  { ε | X →* ε }

Algorithm sketch:
1. First(t) = { t }
2. ε ∈ First(X)
   • if X → ε
   • if X → A1 … An and ε ∈ First(Ai) for 1 ≤ i ≤ n
3. First(α) ⊆ First(X) if X → A1 … An α
   – and ε ∈ First(Ai) for 1 ≤ i ≤ n
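The algorithm sketch above is a fixed-point computation. The following Python sketch is one possible rendering (the grammar encoding, the EPSILON marker, and the function name are illustrative assumptions, not lecture code); epsilon productions are written as empty tuples.

    EPSILON = 'eps'

    # The left-factored grammar, with each production rhs as a tuple of symbols
    GRAMMAR = {
        'E': [('T', 'X')],
        'X': [('+', 'E'), ()],               # () is the epsilon production
        'T': [('(', 'E', ')'), ('int', 'Y')],
        'Y': [('*', 'T'), ()],
    }

    def first_sets(grammar, terminals):
        first = {t: {t} for t in terminals}  # rule 1: First(t) = { t }
        first.update({nt: set() for nt in grammar})
        changed = True
        while changed:                       # iterate until nothing changes
            changed = False
            for nt, productions in grammar.items():
                for prod in productions:
                    nullable = True          # does the whole rhs derive epsilon?
                    for sym in prod:
                        before = len(first[nt])
                        first[nt] |= first[sym] - {EPSILON}   # rule 3
                        changed |= len(first[nt]) != before
                        if EPSILON not in first[sym]:
                            nullable = False
                            break
                    if nullable and EPSILON not in first[nt]: # rule 2
                        first[nt].add(EPSILON)
                        changed = True
        return first

    # first_sets(GRAMMAR, {'int', '(', ')', '+', '*'}) gives, e.g.,
    # First(E) = {'int', '('},  First(X) = {'+', 'eps'},  First(Y) = {'*', 'eps'}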

First Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• First sets
    First( ( )   = { ( }        First( T ) = { int, ( }
    First( ) )   = { ) }        First( E ) = { int, ( }
    First( int ) = { int }      First( X ) = { +, ε }
    First( + )   = { + }        First( Y ) = { *, ε }
    First( * )   = { * }

Computing Follow Sets

• Definition:
    Follow(X) = { t | S →* β X t δ }

• Intuition
  – If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
    • if B →* ε then Follow(X) ⊆ Follow(A)
  – If S is the start symbol then $ ∈ Follow(S)

Computing Follow Sets (Cont.)

Algorithm sketch:
1. $ ∈ Follow(S)
2. First(β) - { ε } ⊆ Follow(X)
   – For each production A → α X β
3. Follow(A) ⊆ Follow(X)
   – For each production A → α X β where ε ∈ First(β)

Follow Sets. Example

• Recall the grammar
    E → T X              X → + E | ε
    T → ( E ) | int Y    Y → * T | ε

• Follow sets
    Follow( + )   = { int, ( }      Follow( * ) = { int, ( }
    Follow( ( )   = { int, ( }      Follow( E ) = { ), $ }
    Follow( X )   = { $, ) }        Follow( T ) = { +, ), $ }
    Follow( ) )   = { +, ), $ }     Follow( Y ) = { +, ), $ }
    Follow( int ) = { *, +, ), $ }
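The Follow rules above can be turned into a similar fixed-point loop. A sketch in the same illustrative Python style (it reuses GRAMMAR, EPSILON, and first_sets from the earlier sketch, and only computes Follow for non-terminals, whereas the slide also lists Follow sets for terminals):

    def follow_sets(grammar, first, start='E'):
        follow = {nt: set() for nt in grammar}
        follow[start].add('$')                       # rule 1: $ in Follow(S)
        changed = True
        while changed:
            changed = False
            for a, productions in grammar.items():
                for prod in productions:
                    for i, x in enumerate(prod):
                        if x not in grammar:         # terminals: skip
                            continue
                        # First(beta), where beta = the symbols after this X
                        first_beta, beta_nullable = set(), True
                        for sym in prod[i + 1:]:
                            first_beta |= first[sym] - {EPSILON}
                            if EPSILON not in first[sym]:
                                beta_nullable = False
                                break
                        before = len(follow[x])
                        follow[x] |= first_beta      # rule 2
                        if beta_nullable:
                            follow[x] |= follow[a]   # rule 3
                        changed |= len(follow[x]) != before
        return follow

    # follow_sets(GRAMMAR, first_sets(GRAMMAR, {'int','(',')','+','*'})) gives
    # Follow(E) = {')', '$'},  Follow(X) = {')', '$'},
    # Follow(T) = {'+', ')', '$'},  Follow(Y) = {'+', ')', '$'}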

Constructing LL(1) Parsing Tables

• Construct a parsing table T for CFG G

• For each production A → α in G do:
  – For each terminal t ∈ First(α) do
    • T[A, t] = α
  – If ε ∈ First(α), for each t ∈ Follow(A) do
    • T[A, t] = α
  – If ε ∈ First(α) and $ ∈ Follow(A) do
    • T[A, $] = α

Notes on LL(1) Parsing Tables

• If any entry is multiply defined, then G is not LL(1)
  – If G is ambiguous
  – If G is left recursive
  – If G is not left-factored
  – And in other cases as well

• Most programming language CFGs are not LL(1)
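Putting the pieces together, here is a hedged Python sketch of the construction above, reusing the illustrative GRAMMAR, first_sets, and follow_sets from the earlier sketches. Since $ was already placed in the Follow sets there, the Follow(A) and $ cases collapse into one loop, and a multiply defined entry is reported as a conflict, matching the note that such a grammar is not LL(1).

    def build_ll1_table(grammar, first, follow):
        table = {}

        def enter(a, t, prod):
            # a multiply defined entry means the grammar is not LL(1)
            if table.get((a, t), prod) != prod:
                raise ValueError(f'conflict at [{a}, {t}]: grammar is not LL(1)')
            table[(a, t)] = prod

        for a, productions in grammar.items():
            for prod in productions:
                first_alpha, nullable = set(), True      # First(alpha)
                for sym in prod:
                    first_alpha |= first[sym] - {EPSILON}
                    if EPSILON not in first[sym]:
                        nullable = False
                        break
                for t in first_alpha:                    # t in First(alpha)
                    enter(a, t, prod)
                if nullable:                             # eps in First(alpha)
                    for t in follow[a]:                  # Follow(A), incl. $
                        enter(a, t, prod)
        return table

    # build_ll1_table(GRAMMAR, first, follow) reproduces the table shown earlier,
    # e.g. table[('E', 'int')] == ('T', 'X') and table[('Y', '+')] == ()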

Bottom-Up Parsing

• Bottom-up parsing is more general than top-down parsing
  – And just as efficient
  – Builds on ideas in top-down parsing

• Bottom-up is the preferred method

• Concepts today, algorithms next time

An Introductory Example

• Bottom-up parsers don't need left-factored grammars

• Revert to the "natural" grammar for our example:
    E → T + E | T
    T → int * T | int | ( E )

• Consider the string: int * int + int

The Idea

Bottom-up parsing reduces a string to the start symbol by inverting productions:

    int * int + int        T → int
    int * T + int          T → int * T
    T + int                T → int
    T + T                  E → T
    T + E                  E → T + E
    E

Observation

• Read the productions above in reverse (from bottom to top)
• This is a rightmost derivation!

Important Fact #1

Important Fact #1 about bottom-up parsing:

    A bottom-up parser traces a rightmost derivation in reverse

A Bottom-up Parse

[Figure: the parse tree for int * int + int, built bottom-up alongside the
reduction sequence int * int + int, int * T + int, T + int, T + T, T + E, E]

A Bottom-up Parse in Detail (1)–(6)

[Figures: six snapshots showing the parse tree for int * int + int being built
one reduction at a time, following the sequence above: int * int + int,
int * T + int, T + int, T + T, T + E, E]

A Trivial Bottom-Up Parsing Algorithm

    let I = input string
    repeat
      pick a non-empty substring β of I
        where X → β is a production
      if no such β, backtrack
      replace one β by X in I
    until I = "S" (the start symbol) or all possibilities are exhausted

Questions

• Does this algorithm terminate?
• How fast is the algorithm?
• Does the algorithm handle all cases?
• How do we choose the substring to reduce at each step?
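For concreteness, here is a minimal Python sketch of the trivial algorithm above, written as a recursive backtracking search (the function name and the encoding of the productions are illustrative). As the questions above hint, it is exponential in the worst case, and termination is not guaranteed for arbitrary grammars.

    # The "natural" (non-left-factored) grammar from the running example
    PRODUCTIONS = [
        ('E', ('T', '+', 'E')), ('E', ('T',)),
        ('T', ('int', '*', 'T')), ('T', ('int',)), ('T', ('(', 'E', ')')),
    ]

    def reduce_to_start(symbols, start='E'):
        if symbols == [start]:
            return True                                  # reduced to the start symbol
        for lhs, rhs in PRODUCTIONS:                     # pick a production X -> beta
            n = len(rhs)
            for i in range(len(symbols) - n + 1):        # pick an occurrence of beta
                if tuple(symbols[i:i + n]) == rhs:
                    reduced = symbols[:i] + [lhs] + symbols[i + n:]
                    if reduce_to_start(reduced, start):  # recurse; backtrack on failure
                        return True
        return False                                     # all possibilities exhausted

    # reduce_to_start(['int', '*', 'int', '+', 'int'])  ->  True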

Where Do Reductions Happen?

Important Fact #1 has an interesting consequence:
  – Let αβω be a step of a bottom-up parse
  – Assume the next reduction is by X → β
  – Then ω is a string of terminals

Why? Because αXω → αβω is a step in a rightmost derivation

Notation

• Idea: Split the string into two substrings
  – The right substring is as yet unexamined by parsing (a string of terminals)
  – The left substring has terminals and non-terminals

• The dividing point is marked by a |
  – The | is not part of the string

• Initially, all input is unexamined:  |x1 x2 . . . xn

Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions:
    Shift
    Reduce

Shift

• Shift: Move | one place to the right
  – Shifts a terminal to the left string

    ABC|xyz  ⇒  ABCx|yz

Reduce

• Apply an inverse production at the right end of the left string
  – If A → xy is a production, then

    Cbxy|ijk  ⇒  CbA|ijk

The Example with Reductions Only

    int * int | + int     reduce T → int
    int * T | + int       reduce T → int * T
    T + int |             reduce T → int
    T + T |               reduce E → T
    T + E |               reduce E → T + E
    E |

The Example with Shift-Reduce Parsing

    |int * int + int      shift
    int | * int + int     shift
    int * | int + int     shift
    int * int | + int     reduce T → int
    int * T | + int       reduce T → int * T
    T | + int             shift
    T + | int             shift
    T + int |             reduce T → int
    T + T |               reduce E → T
    T + E |               reduce E → T + E
    E |

A Shift-Reduce Parse in Detail (1)–(11)

[Figures: eleven snapshots replaying the shift-reduce sequence above one action
at a time, with the parse tree for int * int + int growing over the symbols of
the left string]

The Stack

• The left string can be implemented by a stack
  – The top of the stack is the |

• Shift pushes a terminal on the stack

• Reduce pops 0 or more symbols off of the stack (the production rhs) and
  pushes a non-terminal on the stack (the production lhs)

Conflicts

• In a given state, more than one action (shift or reduce) may lead to a valid parse

• If it is legal to shift or reduce, there is a shift-reduce conflict

• If it is legal to reduce by two different productions, there is a reduce-reduce conflict

• You will see such conflicts in your project!
  – More next time . . .
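To make the stack mechanics concrete, here is a small Python sketch that replays the shift-reduce sequence from the worked example on a stack (the run function and the hard-coded ACTIONS list are illustrative; how a real parser chooses between shift and reduce is the subject of the next lecture).

    def run(tokens, actions):
        stack, rest = [], list(tokens)       # stack = left string, top at the end
        for act in actions:
            if act == 'shift':
                stack.append(rest.pop(0))    # shift: move | one place to the right
            else:
                lhs, rhs = act               # reduce by production lhs -> rhs
                assert stack[len(stack) - len(rhs):] == rhs, 'rhs not on top of stack'
                del stack[len(stack) - len(rhs):]        # pop the rhs ...
                stack.append(lhs)                        # ... and push the lhs
            print(' '.join(stack), '|', ' '.join(rest))
        return stack == ['E'] and not rest

    ACTIONS = ['shift', 'shift', 'shift',
               ('T', ['int']), ('T', ['int', '*', 'T']),
               'shift', 'shift',
               ('T', ['int']), ('E', ['T']), ('E', ['T', '+', 'E'])]
    # run(['int', '*', 'int', '+', 'int'], ACTIONS)  ->  True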
