Compiler Design 3

The document covers syntax analysis and parsing in programming languages, focusing on context-free grammar (CFG) and its role in defining syntactic structures. It explains the importance of grammar for language designers and compiler writers, detailing parsing techniques like top-down and bottom-up parsing, as well as the concept of parse trees. Additionally, it addresses issues like ambiguity in grammar and methods for resolving it, such as adding precedence rules.


–3–

Syntax Analysis ( Parsing )


Objectives
 To Understand

1. Context Free Grammar & Writing a Grammar

2. Parsing – Top-Down, Bottom-Up & Operator Precedence

3. The Role of a Parser

4. LR Parsers & Parser Generators

5. Semantic analysis
2
Syntax Analysis & Parsing
 Just to recapitulate the definitions:
1. Syn-tax
The way in which words are strung together to form
phrases, clauses and sentences
2. Syntax analysis:
The task concerned with fitting a sequence of
tokens into a specified syntax.
3. Parsing:
To break a sentence down into its component parts of
speech, with an explanation of the form, function, and
syntactical relationship of each part
3
Syntax Analysis
 Step two of the front end of compilation
 Every programming Language has RULES that prescribe
the syntactic structure (i.e. the way in which words and
phrases can be organized to form sentences.)
 These rules can be formally specified using :
CONTEXT FREE GRAMMARS
( or GRAMMAR for Short)
A.k.a
BACKUS – NAUR FORM ( BNF)
4
Intro to Context Free Grammar (CFG)
 Essential formalism for describing the structure of the
programs in a programming Language
 A recipe for constructing elements of a set of strings of
symbols ( or tokens)
 Consists of a set of production rules and a start symbol
 Each production rule:
– Defines a named syntactic construct using Two parts
1. An LHS (with a syntactic Construct) and
2. An RHS (showing the possible form of the construct)
– Connected by an Arrow symbol ( meaning “ is defined as” )
sentence → subject predicate ‘.’
5
Importance of the Grammar
 Grammar offers significant advantages to both
language Designers and Compiler writers:
1. Gives precise, yet easy-to-understand, syntactic
specification of Programming Languages
2. Helps construct efficient parsers automatically ( not in
all cases – of course)
3. Provides a good structure to the language which in
turn helps generate correct object code
4. Makes the language open-ended, easily amenable to adding
additional constructs at a later date
6
Context Free Grammar – Formal Definition
A context-free grammar G = (T, N, S, P) consists of:
1. T, a set of terminals (scanner tokens – symbols that
may not appear on the left side of a rule).
2. N, a set of nonterminals (syntactic variables generated
by productions – symbols that may appear on the left or right of a rule).
3. S, a designated start nonterminal.
4. P, a set of productions (rules). Each production has
the form A ::= α, where A is a nonterminal and α is a
sentential form, i.e., a string of zero or more grammar
symbols (terminals / nonterminals)
7
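 The 4-tuple maps directly onto plain data structures. Below is a
minimal Python sketch of G = (T, N, S, P) for the expression grammar
used on the next slide; the names (TERMINALS, PRODUCTIONS, and so on)
are illustrative choices, not from any library:

# A CFG as a 4-tuple: terminals, nonterminals, start symbol, productions.
TERMINALS    = {"id", "num", "(", ")", "+", "-", "*", "/"}
NONTERMINALS = {"expr", "op"}
START        = "expr"
PRODUCTIONS  = [                       # each rule: (LHS, RHS as a tuple)
    ("expr", ("expr", "op", "expr")),
    ("expr", ("(", "expr", ")")),
    ("expr", ("-", "expr")),
    ("expr", ("id",)),
    ("expr", ("num",)),
    ("op", ("+",)), ("op", ("-",)), ("op", ("*",)), ("op", ("/",)),
]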
CFG – An Example
 A grammar derives strings by
1. Beginning with the start symbol and
2. Repeatedly replacing a nonterminal by the right side
of a production rule for that nonterminal
Example:
expr → expr op expr
expr → ( expr )
expr → – expr
expr → id | num
op → + | – | * | / | ↑
 The terminal symbols are id, num, (, ), +, –, *, /, ↑
 The nonterminal symbols are expr and op
 The start symbol is expr
8
CFG – Another Example
 What are the token classes of this language?
 It can be shown that the above syntax conforms to the grammar
9
Notational Conventions
1. These Symbols are terminals
1. Lower case letters early in the alphabet ( like a, b, c)
2. Operator symbols such as +, - etc..
3. Punctuation symbols like , (comma), (, ), etc.
4. The digits 0, 1, ………..9
5. Boldface strings such as if, id etc…
2. These Symbols are nonterminals
1. Uppercase letters early in the alphabet ( like A, B, C)
2. The letter S when it appears, is usually the Start Symbol
3. Lower case letters late in the alphabet chiefly u, v, ….
……. z represent string of terminals
………. Contd. 10
Notational Conventions
4. Uppercase letters late in the alphabet such as X, Y, Z
represent grammar symbols that is either terminal or
nonterminal
5. Lower case Greek letters α, β, γ represent strings of
grammar symbols
6. If A → α1, A → α2, ……, A → αn are all productions
with A on the LHS (known as A-productions), then we write
A → α1 | α2 | … | αn (α1, α2, …, αn are the alternatives of A)
7. Unless otherwise stated, the LHS of the first production
is the start nonterminal S
11
Use of Conventional Notation – Example
 Remember the previous Example:
expr  expr op expr
expr  ( expr )
expr  - expr
expr  id | num
op  [ +, -, *, /,  ]
 Using the notations described so far, this grammar can
be written as:
E → E A E | ( E ) | id | num
A→+|–|*|↑
12
Grammar for a Tiny Language
 program ::= statement | program statement
 statement ::= assignStmt | ifStmt
 assignStmt ::= id = expr ;
 ifStmt ::= if ( expr ) statement
 expr ::= id | int | expr + expr
 id ::= a | b | c | i | j | k | n | x | y | z
 int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
13
Regular Expressions and Grammars
 EVERY construct that can be described by an RE can
also be described by a grammar
 Example: the RE ( a | b )* abb can be described as:
A0 → aA0 | bA0 | aA1    A1 → bA2    A2 → bA3    A3 → ε
 Then why use RE for Lex & CFG for syntax analysis ?
– Lexical rules are frequently quite simple & we do not need
powerful grammars to describe them
– RE’s are more concise and easy to understand for tokens
– RE’s provide for automatic construction of efficient Lexical
Analyzers
14
Derivation
 A way of showing how an input sentence is recognized
with a grammar
 Can also be thought of as a type of algebraic substitution
using the grammar
 Remember the grammar:
1. expr → expr op expr
2. expr → ( expr )
3. expr → – expr
4. expr → id | num
5. op → + | – | * | / | ↑
 Now let us derive the structure ctr + ( a – 5 ) / 2 :
expr ⇒ expr op expr
     ⇒ expr + expr
     ⇒ expr + expr op expr
     ⇒ expr + expr / expr
     ⇒ expr + expr / num
     ⇒ expr + ( expr ) / num
     ⇒ expr + ( expr – expr ) / num
     ⇒ id + ( id – num ) / num
15
Derivations – Some definitions
 Uses productions to derive more complex Expressions
 The symbols:
– ⇒ denotes “derives in one step”
– ⇒* denotes “derives in zero or more steps” ( α ⇒* β )
– ⇒+ denotes “derives in one or more steps” ( α ⇒+ β )
 Given a grammar G with start symbol S, a string ω is
part of the language L(G) if and only if S ⇒* ω
 Further, if S ⇒* α and α contains
– Only terminals, then it is a – Sentence of G
– nonterminals too, then it is a – Sentential form of G
16
Derivation – Another Example
 Consider the following grammar for a simple ‘;’-terminated
expression consisting of interspersed numbers and + :
1: stmt → expr ‘;’
2: expr → factor ‘+’ factor
3:       | factor
4: factor → number
 Let us try and come up with a derivation to prove that an
input sentence (like 1 + 4 ;) can be recognized from this
grammar:
stmt ⇒ expr ;
     ⇒ factor + factor ;
     ⇒ number + factor ;
     ⇒ number + number ;
 The process is called a Leftmost Derivation because
the leftmost nonterminal is always replaced
17
Leftmost Derivation
 Start with the topmost symbol in the grammar
 Replace the leftmost nonterminal in the partially parsed
sentence with the equivalent production’s RHS in each
step
 The operator ⇒lm signifies leftmost derivation:
X ⇒lm Y means X derives Y by a leftmost derivation
 Each step (partial derivation) yields what is called a Viable
Prefix
 The portion of the viable prefix that is replaced at each step
is called the Handle
18
Leftmost Derivation Example
Rightmost Derivation
 Replace the rightmost nonterminal in the partially parsed
sentence in each step
 The operator ⇒rm signifies rightmost derivation:
X ⇒rm Y means X derives Y by a rightmost derivation
 It is possible to start with the start symbol and do a
rightmost derivation
21
Rightmost Derivation – Another Example
Parse Tree
 WHAT ?
A graphical representation for a derivation sequence showing
how to derive the string of a language from grammar starting
from Start symbol
 WHY ?
Each stage of the compiler has two purposes:
– Detect and filter out some class of errors
– Compute some new information or translate the
representation of the program to make things easier for later
stages
Recursive structure of tree suits recursive structure of
language definition
25
Parse Tree (Contd.)
 HOW ?
Tree nodes represent symbols of the grammar (nonterminals
or terminals) and tree edges represent derivation steps
 Example:
Given the production:
expr → expr op expr
the corresponding parse tree has root expr with
children expr, op and expr
 Its properties are:
1. The root is labeled by the start symbol
2. Each leaf is labeled by a token or by ε
3. Each interior node is labeled by a nonterminal
26
Parse Tree Example
 Remember the grammar:
expr → expr op expr | ( expr ) | – expr | id | num
op → + | – | * | / | ↑
 And the string derived from it: ctr + ( a – 5 ) / 2
 Using the derivation, the corresponding parse tree is:
expr
 ├ expr → id (ctr)
 ├ op → +
 └ expr
    ├ expr → ( expr ), where the inner expr → expr – expr → id (a) – num (5)
    ├ op → /
    └ expr → num (2)
27
Parse Tree Example (Continued)
 The Yield of the tree
– The string derived or generated from the
nonterminal at the root of the tree
– Obtained by reading the leaves of the parse tree
from left to right
 In this example the yield is: ctr + ( a – 5 ) / 2
28
Another Derivation Example
Given the grammar : E  E + E | E  E | ( E ) | - E | num
Find a derivation for the expression: 6 + 2  4
E E E E

E + E E + E E + E
6 + (2 * 4 ) = 14 E  E 6 E  E

Which derivation tree is 2 4


E E correct?E E

E  E E  E E  E

(6 + 2) * 4 = 32 E + E E + E 4

6 2
According to above grammar BOTH are 30
Another Derivation Example (Contd.)
 A grammar that produces more than one parse tree for
an input sentence is said to be ambiguous
31
Ambiguity in Grammar
A grammar that produces more than one parse tree for
some input sentence
– Happens because the derivation can be either
Leftmost or Rightmost
 Can also happen because of the way the grammar is
defined
 Classic example — the if-then-else problem
– Stmt → if Expr then Stmt
– | if Expr then Stmt else Stmt
– | … other stmts …
32
Ambiguous IF
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts
 This sentential form has two derivations:
if Expr1 then if Expr2 then Stmt1 else Stmt2
(the else can be attached either to the inner or to the outer if)
34
Removing the ambiguity
1. Add precedence ; for example:
 This grammar is slightly larger
 Takes more rewriting to reach
some of the terminal symbols
 Encodes the expected precedence
 Produces the same parse tree
under leftmost & rightmost
derivations
 Let’s see how it parses x – 2 * y
35
Removing the ambiguity (Contd.)
 Both the leftmost and rightmost derivations of the
expression give the same parse tree, because the
grammar directly and explicitly encodes the desired
precedence
37
Removing the ambiguity (Contd.)
2. Rewrite the grammar to avoid generating the problem
3. Example – the case of if then if then else :
Match each else to the innermost unmatched if (common-sense rule)
38
Removing the ambiguity (Contd.)
 Rewriting the grammar to eliminate multiple productions
starting with the same token is called left factoring
41
Ambiguity - A Summary
 Ambiguity arises from two distinct sources
– Confusion in the context-free syntax (if-then-else)
– Confusion that requires context to resolve (overloading)
 Resolving ambiguity
– To remove context-free ambiguity, rewrite the grammar
– To handle context-sensitive ambiguity use precedence
 This is a language design problem
 Sometimes, the compiler writer accepts an ambiguous
grammar
 Parsing techniques must “do the right thing” (i.e.,
always select the same derivation)
42
Recursion In Grammars
 Very Common in grammars
 Generally 3 types of recursions are Possible:
1. A ::= Ab (Left Recursion)
2. B ::= cB (Right Recursion)
3. C ::= dCf (Middle Recursion or Self-embedding)
 Facts:
 If a grammar contains no middle recursion then the
language it generates is regular.
 If there is no recursion at all in a grammar, then it is
finite and therefore regular.
43
Left Recursion
Consider E  E + T | T A top-down parser might loop
the T  T  F | F forever when parsing an
grammar: F  ( E ) | id expression using this grammar
E E E E
E + T E + T E + T
E + T E + T
E + T
 A grammar that has at least one production of the form
A  A is a left recursive grammar
Top Down parsing cannot handle left-recursive grammars
44
Eliminating left Recursion
 First group the A productions as A A1 | A2| …..|
Am|1 |2 |.. n |
Where no i begins with an A.
 Then replace the A-productions by A 1 A’|2 A’|……
n A’|
A’ 1 A’| 2 A’|… n A’| 
 This eliminates all immediate left recursions from A and A’
productions
BUT
 It does not eliminate left recursions involving derivations of
two or more steps
45
Left Recursion ( Contd.)
 Left-recursion can often be eliminated by rewriting the
grammar
EE+T|T
This left-recursive grammar: T  T  F | F
F  ( E ) | id
Can be re-written to eliminate the immediate left recursion:
E  TE’
E’  +TE’ | 
T  FT’
T’  FT’ | 
F  ( E ) | id 46
Properties of Grammars – A Summary
 Epsilon Productions
– Has at least one production of the form “E → ε”
 Right (/ Left) Linear Grammar:
– Grammars where the RHS of every production has at most
one nonterminal, and that nonterminal appears rightmost
(/ leftmost), e.g. “E → x E” or “E → x”
 Left-Recursive Grammar:
– A grammar that can generate “E → E + X” (for example)
 Similarly, “Right-recursive grammars”
 Ambiguous Grammar
– More than one parse tree is possible for a specific sentence
Limitations of CFG
 CFGs cannot express context conditions
 For Example:
1. Every name must be declared before it is used
2. The operands of an expression must have
compatible types
 Possible solutions
 Use context-sensitive grammars
– Too complicated
 Check context conditions during semantic analysis
48
Syntax Analysis
Problem Statement:
 To find a derivation sequence in a grammar G for the
input token stream
or
 To conclude that none exists
Solution
 Build parse tree for the string ( of tokens) based on
grammar rules
 The process is known as P A R S I N G
49
PARSING
 Constructs the syntax tree for a given sequence of
tokens using grammar rules and productions such
that:
1. The top node is labeled with the start symbol of the
grammar
2. Leaf nodes are labeled with terminals and inner nodes
with nonterminals
3. the children of the inner node labeled N correspond to the
members of an alternative of N, in the same order as they
occur in that alternative
4. The terminals labeling the leaf nodes correspond to the
sequence of tokens, in the same order they occur in the
input
………. Contd.
50
The Role of a Parser
Lexical analyzer → PARSER → Rest of the front end
(all of them sharing the Symbol Table)
 The parser continues the process of translation and
validation started by the Lexical Analyzer:
1. Analyzing the phrase structure of the program,
2. Adding hierarchical (tree) structure to the flat stream of
tokens produced by the lexical Analyzer,
3. Outputting a data structure conveying the program's
syntax to subsequent compile phases, AND
4. Reporting errors when the program does not match the
syntactic structure of the source language
51
Parsing Methods
 Three Types of parsing methods in Vogue
1. Universal parsing methods
– Too inefficient to use in any practical compilers (hence not
discussed any further)
– Ex: Cocke–Younger–Kasami (CYK) algorithm
2. Top-Down Parsing
– Can be generated automatically or written manually
– Ex: Left-to-right Top-Down parser (a.k.a. LL parser)
3. Bottom-Up Parsing
– Can only be generated automatically
– Ex: Left-to-right Bottom-Up parser (a.k.a. LR parser)
52
Top – Down & Bottom – Up Methods
 While they are efficient;
 They can work only on restricted classes of Grammars,
( such as LL and LR Grammars )
BUT
 They are expressive enough to describe most syntactic
constructs in Programming Languages.
SO
 Care must be taken while defining the grammar for the
language to make sure that the grammar is
unambiguous
53
TOP – DOWN Parsing
 An attempt to construct a parse tree for an input string
starting from the root and creating the nodes in PRE-ORDER (i.e.
a node of the tree is constructed before its children)
 Can also be viewed as an attempt to find a leftmost
derivation for an input string
 The way it works:
1. The process starts at root node say N
2. Repeat until the fringe of the parse tree matches the input string
1. Determine the correct alternative for N (the key step)
2. Parser then proceeds to construct the left child of N
3. The process of determining the correct alternative for the leftmost
child continues till the left most child is a terminal
54
Top Down parsing example
 String to be parsed: array [ num dotdot num ] of integer
 Consider the grammar:
type → simple | ↑ id | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
 The tree is built from the root downward:
type ⇒ array [ simple ] of type
     ⇒ array [ num dotdot num ] of type
     ⇒ array [ num dotdot num ] of simple
     ⇒ array [ num dotdot num ] of integer
 The token dotdot “..” is used to stress that the character
sequence is treated as a unit
55
Top Down Parsing – Another Case
 Consider the grammar: S → cAd    A → ab | a
 String to be parsed: W = cad
Parse tree construction would be:
1. Start with a single node S
2. Use the first production to expand S into ( c A d )
3. Expand the nonterminal A using the first
alternative from rule 2 (A → ab), as we have a
match for input token a
4. But b does not match d;
5. We have to BACKTRACK to step 3 and
try the second alternative from rule 2 (A → a)
56
Top Down Parsing - Observations
 In general,
1. The selection of a production for a nonterminal may
involve trial – and – error
2. We may have to try a production and backtrack to try
another production if the first is found to be unsuitable.
(A production is unsuitable if, after using the production, we
cannot complete the tree to match the input string)
 If we can make the correct choice by looking at just the
next input symbol, we can build a Predictive Parser that
can perform a top-down parse without backtracking
57
Top Down Parsing - Observations
 Programming language grammars are often suitable for
Predictive Parsing
58
Predictive Parsing
 Most common technique used in parsers (Particularly
TRUE in case of Parsers written manually)
 Use grammars that are carefully written to eliminate
left recursion
 Further, if many alternatives exist for a nonterminal,
the grammar is so defined that the input token will
uniquely define the one and only one alternative and
that will lead to correct parsing or Error.
 No question of backtracking and trying another
alternative
59
Predictive Parsing Example
stmt  if expr then stmt else stmt
Consider the grammar: | while expr do stmt
| begin stmt_list end
A parser for this grammar can be switch(gettoken())
written with the following simple {
case if:
structure: ….
Based only on the first token, the parser break;
knows which rule to use to derive a case while:
statement, because all the three ….
break;
outcomes are unique even at the first case begin:
letter. ….
break;
Therefore this is called a default:
reject input;
Predictive Parser. } 60
Parsing Table
 Represents a grammar as
 A two-dimensional array M[A, α], where:
– A is the nonterminal and
– α is the input symbol
 Used in the parser as the reference table to
decide the choice of production to be used
Grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
Parsing Table:
NONTERMINAL | id       | +         | *         | (        | )       | $
E           | E → TE’  |           |           | E → TE’  |         |
E’          |          | E’ → +TE’ |           |          | E’ → ε  | E’ → ε
T           | T → FT’  |           |           | T → FT’  |         |
T’          |          | T’ → ε    | T’ → *FT’ |          | T’ → ε  | T’ → ε
F           | F → id   |           |           | F → (E)  |         |
61
A table Driven Predictive Parser Program
 An input buffer (containing the string to be parsed, with $ at the end)
 A stack (containing a sequence of grammar symbols, with $ at the
bottom)
 A parsing table and
 An output stream
INPUT: id + id * id $      STACK: $ E      OUTPUT: the derivation
(the parsing table is the one shown on the previous slide)
62
A Predictive Parser
 The working of the table-driven predictive parser, traced on
the input id + id * id $ (the parsing table is the one on the
previous slides; the stack is shown with its top at the right):
STACK           INPUT            ACTION / OUTPUT
$ E             id + id * id $   E → TE’
$ E’ T          id + id * id $   T → FT’
$ E’ T’ F       id + id * id $   F → id
$ E’ T’ id      id + id * id $   match id, advance input
$ E’ T’         + id * id $      T’ → ε
$ E’            + id * id $      E’ → +TE’
$ E’ T +        + id * id $      match +, advance input
$ E’ T          id * id $        T → FT’
$ E’ T’ F       id * id $        F → id
$ E’ T’ id      id * id $        match id, advance input
$ E’ T’         * id $           T’ → *FT’
$ E’ T’ F *     * id $           match *, advance input
$ E’ T’ F       id $             F → id
$ E’ T’ id      id $             match id, advance input
$ E’ T’         $                T’ → ε
$ E’            $                E’ → ε
$               $                accept
 Action when Top(Stack) = input ≠ $ : pop the stack, advance the input.
 Action when the table yields an ε-production : pop the nonterminal, push nothing.
68
A Predictive Parser
The predictive parser proceeds in this fashion, emitting the
following productions:
E → TE’
T → FT’
F → id
T’ → ε
E’ → +TE’
T → FT’
F → id
T’ → *FT’
F → id
T’ → ε
E’ → ε
When Top(Stack) = input = $ the parser halts & accepts the input
string ( id + id * id ) 69
Algorithm for Predictive parser
 The program execution is controlled by TWO inputs
– The input symbol a and the symbol on the top of the stack X
 There are THREE possibilities for the parser
– If X = a = $ : halt & announce the successful completion of
parsing
– If X = a ≠ $ : pop X off the stack and advance the input
pointer to the next input symbol
– If X is a nonterminal : consult entry M[X, a] in the parsing
table M and replace X on top of the stack with the RHS of
the production found there
 If M[X, a] is error, call the error routine
70
Algorithm for Predictive parser
 This parser is known as an LL(1) parser because:
1. It parses the input from left to right, constructing a
leftmost derivation, and
2. It looks ahead only 1 symbol to choose its next action
(a sketch of the driver loop follows)
71
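 As a sketch, the whole algorithm above fits in a short Python loop,
assuming the parsing table is a dict that maps (nonterminal, token) to
the production to apply, and that ‘$’ marks both the end of the input
and the bottom of the stack:

def ll1_parse(table, start, tokens):
    stack, i = ["$", start], 0
    tokens = tokens + ["$"]
    while True:
        x, a = stack[-1], tokens[i]
        if x == a == "$":
            return True                    # successful completion
        if x == a:                         # terminal on top matches input
            stack.pop(); i += 1
        elif (x, a) in table:              # nonterminal: consult M[X, a]
            lhs, rhs = table[(x, a)]
            stack.pop()
            stack.extend(reversed(rhs))    # push RHS, leftmost symbol on top
        else:                              # empty table slot: error routine
            raise SyntaxError(f"unexpected {a!r} with {x!r} on stack")

# e.g. table[("E", "id")] = ("E", ("T", "E'")) for the grammar above.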
FIRST & FOLLOW
 The two functions represent a set of terminal symbols,
that aid us in constructing Predictive parser by helping
us fill the Parsing Table with valid entries
 Set of terminal symbols yielded by FOLLOW function
can also be used in error recovery
 FIRST – if α is a string of grammar symbols, then
FIRST(α) is the set of terminals that begin the strings
derived from α (if α ⇒* ε, then ε is also in FIRST(α))
 FOLLOW(A) – the set of terminals a that can appear
immediately to the right of A, i.e. such that there exists a
derivation of the form S ⇒* αAaβ for some α and β
72
FIRST & FOLLOW (Contd.)
 The set of terminal symbols ( including ε )that can appear
at the far left of any parse tree derived from a particular
nonterminal is the FIRST set of that nonterminal
 The set of terminal symbols that can
follow a nonterminal in some derivation or the other is
called the FOLLOW set of that nonterminal
 A set of rules are followed to compute the FIRST and
FOLLOW set
 These sets will be used in creating Parsing table

73
Rules to Create FIRST
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X → ε, then ε ∈ FIRST(X)
3. If X → aABC, then a ∈ FIRST(X)
4. If X → ABCD, then FIRST(A) ⊆ FIRST(X).
4a. Further, if A ⇒ ε in the above production, then
FIRST(B) ⊆ FIRST(X) …… [& so on recursively]
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
SETS:
FIRST(id) = {id}     FIRST(E’) = {+, ε}
FIRST(ε) = {ε}       FIRST(T’) = {*, ε}
FIRST(+) = {+}       FIRST(F) = {(, id}
FIRST(() = {(}       FIRST(T) = FIRST(F) = {(, id}
FIRST()) = {)}       FIRST(E) = FIRST(T) = {(, id}
(a computable sketch follows)
74
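 The FIRST rules can be computed as a fixed point. A Python sketch
(EPS is our stand-in for ε; the grammar encoding as (lhs, rhs-tuple)
pairs is an assumption made for illustration):

EPS = "eps"                          # stand-in for the empty string

def first_sets(grammar, terminals):
    first = {t: {t} for t in terminals}          # rule 1
    first.update({lhs: set() for lhs, _ in grammar})
    changed = True
    while changed:                   # repeat until no set grows
        changed = False
        for lhs, rhs in grammar:
            before = len(first[lhs])
            nullable = True
            for sym in rhs:          # rules 3, 4 and 4a
                first[lhs] |= first[sym] - {EPS}
                if EPS not in first[sym]:
                    nullable = False
                    break
            if nullable:             # rule 2: the whole RHS can vanish
                first[lhs].add(EPS)
            changed |= len(first[lhs]) != before
    return first

G = [("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
     ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
     ("F", ("(", "E", ")")), ("F", ("id",))]
print(first_sets(G, {"id", "+", "*", "(", ")"}))
# FIRST(E) = FIRST(T) = FIRST(F) = {(, id}; FIRST(E') = {+, eps}; ...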
FIRST(E’) = {+, }
FIRST(T’) = { , } Rules to Create FOLLOW
FIRST(F) = {(, id} FOLLOW rules:
FIRST(T) = {(, id}
FIRST(E) = {(, id} 1. If S is the start symbol, then $  FOLLOW(S)

GRAMMAR: 2. If there is a production A  B, then,


E  TE’ EVERYTHING in FIRST() except  is placed
E’  +TE’ |  in FOLLOW(B)
T  FT’
3. If there is a production A  B then,
T’  FT’ | 
EVERYTHING in FOLLOW(A) is in FOLLOW(B)
F  ( E ) | id
SETS:
FOLLOW(E) = {$} { ), $} 3a. If there is a production A  B where
FOLLOW(E’) = { ), $} FIRST() contains  ( i.e.  
*  ) then,

FOLLOW(T) = { ), $} {+, ), $} EVERYTHING in FOLLOW(A) is in FOLLOW(B)


FOLLOW(T’) = { +, ), $}
FOLLOW(F) = {+, ), $} A and B are non-terminals,
{+, , ), $}  and  are strings of grammar symbols75
Rules to Build Parse Table
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST SETS: FIRST(E’) = {+, ε}, FIRST(T’) = {*, ε},
FIRST(F) = FIRST(T) = FIRST(E) = {(, id}
FOLLOW SETS: FOLLOW(E) = FOLLOW(E’) = {), $},
FOLLOW(T) = FOLLOW(T’) = {+, ), $}, FOLLOW(F) = {+, *, ), $}
Rules:
1. If A → α : for each a ∈ FIRST(α), add A → α to M[A, a]
2. If A → α : if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal
b ∈ FOLLOW(A)
3. If A → α : if ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]
(a sketch that applies these rules follows)
PARSING TABLE:
NONTERMINAL | id       | +         | *         | (        | )       | $
E           | E → TE’  |           |           | E → TE’  |         |
E’          |          | E’ → +TE’ |           |          | E’ → ε  | E’ → ε
T           | T → FT’  |           |           | T → FT’  |         |
T’          |          | T’ → ε    | T’ → *FT’ |          | T’ → ε  | T’ → ε
F           | F → id   |           |           | F → (E)  |         |
76
Predictive Parsing
 A top down method of parsing (Recursive descent
parsing ) in which a set of procedures are executed
recursively to process the input.
 The speciality of predictive parsing is that :
The look ahead symbol unambiguously determines the
procedure selected for each nonterminal
(No Prediction in fact !!!)
 Hence:
The sequence of procedures called implicitly defines a
parse tree for the input.
77
Algorithm for Predictive parsing
procedure match (c : token);
{
    if ( lookahead == c ) then lookahead := nexttoken;
    else error;
}
procedure type;
{
    if ( lookahead is in [ integer, char, num ] ) then simple;
    else if ( lookahead == ‘↑’ ) { match (‘↑’); match (id); }
    else if ( lookahead == array )
        then { match (array); match (‘[‘); simple;
               match (‘]’); match (of); type; }
    else error;
}
………. Contd.
78
Algorithm for Predictive parsing
procedure simple;
{
    if ( lookahead == integer ) match (integer);
    else if ( lookahead == char ) match (char);
    else if ( lookahead == num )
        { match (num); match (dotdot); match (num); }
    else error;
}
(a runnable rendering of these procedures follows)
79
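 A runnable Python rendering of the match / type / simple procedures
above, using a token list and an index in place of the global lookahead;
writing the pointer symbol ↑ as the token "^" is an encoding assumption:

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens + ["$"], 0

    def lookahead(self):
        return self.toks[self.i]

    def match(self, c):                   # procedure match
        if self.lookahead() == c:
            self.i += 1
        else:
            raise SyntaxError(f"expected {c!r}, got {self.lookahead()!r}")

    def type(self):                       # procedure type
        if self.lookahead() in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead() == "^":
            self.match("^"); self.match("id")
        elif self.lookahead() == "array":
            self.match("array"); self.match("[")
            self.simple(); self.match("]")
            self.match("of"); self.type()
        else:
            raise SyntaxError("type expected")

    def simple(self):                     # procedure simple
        if self.lookahead() == "integer":
            self.match("integer")
        elif self.lookahead() == "char":
            self.match("char")
        elif self.lookahead() == "num":
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise SyntaxError("simple type expected")

# Accepts the earlier example: array [ num dotdot num ] of integer
Parser(["array", "[", "num", "dotdot", "num", "]", "of", "integer"]).type()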
Error recovery in Predictive parsing
 Error is encountered by Predictive Parser when:
– The terminal on the top of the stack does not match the next input
symbol, or
– The nonterminal A is on the top of the stack, a is the input
symbol, and the parsing table entry M[A, a] is empty
 Two methods used for recovery:
1. Panic-mode error recovery
– Skip symbols on the input until a token in a selected set of
synchronizing tokens appears
– Effectiveness depends on the set of synchronizing tokens chosen
2. Phrase-level recovery
– Fill the empty slots of the parsing table with pointers to error routines
80
LL (1) Parsing
 Grammars whose predictive parsing tables contain no
multiple entries are called LL(1).
 LL(1)
● Left-to-right parse of input string
● Leftmost derivation of grammar
● 1 symbol look ahead (we know only what the next token is)
 LL (k) : we know what the next k tokens are
 Every LL(1) grammar is an LL(2) grammar and so on
81
Bottom Up Parsing
 Constructs the nodes in the syntax tree in the post order
i.e. bottom up by
Beginning at the leaves and working up till the root
 Suitable for automatic parser generation,
 Less restrictive in the sense that it can postpone the
decision of which production to use till it has seen all the
input tokens
 Hence it can handle a larger class of grammars ( LR
grammars ) than LL parsers
 But, Efficiency is low
82
Working of Bottom Up Parsing
 Repeatedly matches a right-sentential form from the
language against the tree’s upper frontier.
 At each match, it applies a reduction to build on the
frontier:
 Each reduction matches an upper frontier of the
partially built tree to the RHS of some production
 Each reduction adds a node on top of the frontier
 The final result is a rightmost derivation, in reverse.
83
Bottom Up Parsing
 The main task is to find the leftmost node that has not
yet been constructed, BUT all of whose children have
been constructed.
 Such a sequence of symbols whose top needs to be
constructed is called – The Handle
 Creating a parent node N for the handle based on
grammar rules is called Reducing the handle to N
 Hence the problem can be stated as:
1. To find the proper handle that leads to the expression &
2. To locate the right grammar rule to reduce the handle to a
parent node
84
Bottom Up Parsing S  aABe
A  Abc | b
Consider the Grammar: B  d
We want to parse the input string abbcde.
INPUT:
a b b c d e $
Production
S  aABe
A  Abc Bottom-Up Parsing
Ab Program
Bd

85
Bottom Up Parsing S  aABe
A  Abc | b
Consider the Grammar: B  d
We want to parse the input string abbcde.
OUTPUT:
INPUT:
a A
b b c d e $
Production
S  aABe A
A  Abc Bottom-Up Parsing
A b c
Ab Program
Bd b

We are not reducing here in this example.


A parser would reduce, get stuck and then backtrack!
86
Bottom Up Parsing S  aABe
A  Abc | b
Consider the Grammar: B  d
We want to parse the input string abbcde.
OUTPUT:
INPUT:
d S
a A B e $ S
Production
a A B e
S  aABe
A  Abc Bottom-Up Parsing
A b c d
Ab Program
Bd b
This parser is known as an LR Parser because it scans the input
from Left to right, and it constructs a Rightmost derivation in
reverse order
87
Bottom Up Parsing
 Very inefficient because of:
– Too many scans of the productions for matching with handles
in the input string, and
– Backtracking at times
 The bottom Up parsing, is also known as,
 Shift–Reduce Parsing
– Uses a stack and two operations, SHIFT & REDUCE
– SHIFT : move the current input symbol onto the stack and
advance the input
– REDUCE : if the top few items on the stack form the
RHS of a production, pop these items and
replace them with the LHS of that production
88
Shift Reduce Parsing – An Illustration
Grammar:  1. E → E + E   2. E → E * E   3. E → ( E )   4. E → id
STACK           INPUT              ACTION
$               id1 + id2 * id3 $  SHIFT
$ id1           + id2 * id3 $      REDUCE by E → id
$ E             + id2 * id3 $      SHIFT
$ E +           id2 * id3 $        SHIFT
$ E + id2       * id3 $            REDUCE by E → id
$ E + E         * id3 $            SHIFT
$ E + E *       id3 $              SHIFT
$ E + E * id3   $                  REDUCE by E → id
$ E + E * E     $                  REDUCE by E → E * E
$ E + E         $                  REDUCE by E → E + E
$ E             $                  ACCEPT 89
Conflict Situations
Two possible conflict Scenarios
1. Shift – Reduce Conflict:
– One rule implies shift while another implies reduce ( for E.g.)
– Grammar Productions : S → if E then S | if E then S else S
– On stack: if E then S and Input: else
– Should shift trying for 2nd rule or reduce by first rule?
2. Reduce – Reduce Conflict
– Can have two grammar rules with same RHS ( For E.g.)
– Grammar Productions : A → α and B → α
– On stack: α
– Which Production to choose for Reduce A → α and B → α?

90
Conflict Situations (Contd.)
 SOLUTION: Change the grammar to make such grammars
suitable for this kind of parsing
91
Operator Precedence Parsing
 A simplified and efficient version of bottom up parsing
technique that can be applied only to a small but
important class of grammars – Operator Grammars
 Operator Grammars require:
– No right hand side of a rule is empty ( ε )
– No right hand side has 2 adjacent nonterminals
 The following grammars represent the same language;
only the first is an operator grammar:
Operator Grammar:
E → E + E | E – E | E * E | ( E ) | id
Not an Operator Grammar:
E → E A E | ( E ) | id
A → + | – | * | /
92
Operator Precedence Parsing
 Works on the basis of Operator Precedence:
 Uses three precedence relations:
– a <· b : a yields precedence to b
– a ·> b : a takes precedence over b
– a ·= b : a has the same precedence as b
 Find handle as <. .> pattern at top of stack;
 Drawbacks
– Small class of grammars qualify
– Overloaded operators are hard (unary minus)
– Parser correctness hard to prove
93
LR PARSERS
 An efficient Bottom Up parsing technique that can be
used to parse a large class of context free grammars
 An attractive option because it:
– Can be used with virtually any kind of grammar
– Is the most general non-backtracking shift reduce parsing
technique known
– Can be implemented as efficiently as any other technique
– Can detect syntax error as soon as possible to do so on a
left-to-right scan of the input
 But it is hard to construct Manually; However,
 Tools are available to construct them automatically.
94
LR Parser
 Consists of: 1. A Driver Program 2. An Input 3. An Output
4. A Stack and 5. A Parsing Table
 The Stack
– Stores a string of the form s0 x1 s1 x2 s2 x3 … xm sm (with sm at
the top) where each:
1. xi is a grammar symbol &
2. si is a symbol called a State,
summarizing the information in
the stack below it
 The parsing table has 2 parts:
1. A parsing action function – action, and
2. A goto function – goto 95
LR Parser Functioning
 The parsing program:
– Reads input characters (a) from the input buffer one at a time
– Determines the current state on the top of the stack (s)
– Consults the parsing table based on current state s and
current input symbol a & gets action[s, a], which can be either:
1. Shift to state s’
2. Reduce by a grammar production
3. Accept, or 4. Error
– The function goto takes a state and a grammar symbol as
arguments and produces a new state
– The next move of the parser is made by reading input a and state s and
then consulting the entry action[s, a] of the parsing table
– If action[s, a] is accept, parsing is complete
(a sketch of this driver loop follows)
96
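 A Python sketch of this driver loop; the dict-based ACTION / GOTO
encoding, with entries ("shift", s), ("reduce", (A, |RHS|)) and
("accept",), is an assumption made for illustration:

def lr_parse(action, goto, tokens):
    stack = [0]                               # stack of states, s0 on top
    tokens = tokens + ["$"]
    i = 0
    while True:
        s, a = stack[-1], tokens[i]
        entry = action.get((s, a))
        if entry is None:
            raise SyntaxError(f"no action for state {s} on {a!r}")
        if entry[0] == "shift":
            stack.append(entry[1]); i += 1    # push new state, advance input
        elif entry[0] == "reduce":
            lhs, rhs_len = entry[1]
            del stack[len(stack) - rhs_len:]  # pop |RHS| states
            stack.append(goto[(stack[-1], lhs)])
            print("reduce by", lhs)           # emit the production used
        else:                                 # ("accept",)
            return True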
LR Parsing Example
 Let us parse the input string : id + id  id $

 Using the grammar & the parsing table


State action goto
(1) E  E + T 0
id
s5
+ * (
s4
) $ E T F
1 2 3
(2) E  T 1
2
s6
r2 s7 r2
acc
r2
3 r4 r4 r4 r4
(3) T  T  F 4 s5 s4 8 2 3
5 r6 r6 r6 r6
(4) T  F 6 s5 s4 9 3
7 s5 s4 10
(5) F  ( E ) 8 s6 s11
9 r1 s7 r1 r1
(6) F  id 10 r3 r3 r3 r3
11 r5 r5 r5 r5
97
LR Parser Example
GRAMMAR:
(1) E → E + T   (2) E → T   (3) T → T * F
(4) T → F       (5) F → ( E )   (6) F → id
 The working of the LR parser on the input id * id + id $,
using the parsing table on the previous slide (sN = shift and
go to state N, rN = reduce by production N):
STACK             INPUT           ACTION
0                 id * id + id $  s5
0 id 5            * id + id $     r6: F → id
0 F 3             * id + id $     r4: T → F
0 T 2             * id + id $     s7
0 T 2 * 7         id + id $       s5
0 T 2 * 7 id 5    + id $          r6: F → id
0 T 2 * 7 F 10    + id $          r3: T → T * F
0 T 2             + id $          r2: E → T
0 E 1             + id $          s6
0 E 1 + 6         id $            s5
0 E 1 + 6 id 5    $               r6: F → id
0 E 1 + 6 F 3     $               r4: T → F
0 E 1 + 6 T 9     $               r1: E → E + T
0 E 1             $               accept
 The OUTPUT is the parse tree for id * id + id, built bottom-up
as the reductions are emitted.
119
LR Parsing Observations
 All LR parsers use the same parsing program
Independent of the table and hence of the Grammar.
 What differentiates the LR parsers are the action and the
goto tables:
 These tables can be constructed using:
1. Simple LR (SLR): succeeds for the fewest grammars, but is the
easiest to implement
2. Canonical LR: succeeds for the most grammars, but is the
hardest to implement. It splits states when necessary to
prevent reductions that would get the parser stuck
3. Lookahead LR (LALR): succeeds for the most common syntactic
constructions used in programming languages, and produces
LR tables much smaller than canonical LR
120
SLR Parsing table Construction
 Parsing table construction is based on the key idea :
A handle should NOT be reduced to a nonterminal N
if the next token (i.e. look ahead token ) can not
follow N (thereby resulting in dead end and
backtracking)
 So ensure that:
Reduction by the item N → α● is done ONLY if the
look-ahead token is part of FOLLOW(N).
 In order to achieve this, we define a concept called the
Dotted ITEM, or ITEM for short
121
DOTTED ITEM ( A.k.a. ITEM)
 A concept used to summarize the state of our search;
sets of items represent sets of hypotheses about the
next token
 Definition:
– Consider a production N → αβ
– An LR dotted item N → α●β (note the ● in between)
means that we maintain the hypothesis that:
1. αβ is a possible handle and
2. We have already seen the α part
3. When ● reaches the rightmost point, as in N → αβ●, we
have identified the handle
 An item with the rightmost ● is called a REDUCE ITEM;
all others are SHIFT ITEMS (or KERNEL ITEMS)
122
Dotted ITEM – An Example
 Consider the production A → XYZ
 This production yields four items:
1. A → ● XYZ
2. A → X ● YZ Shift Items
3. A → XY ● Z
4. A → XYZ ● Reduce Item
 The production A → ε yields only one item A → ●
 An item can be represented by a pair of integers
– One representing the production number in the grammar &
– Other indicating the position of the dot in the production
123
Dotted ITEM – An Example (Contd.)
 Intuitively, an item indicates how much of a production we
have seen at a given point in the parsing process
124
Closure Operation
 The operation of extending the context with items is
called the closure operation
 Useful in identifying all “ITEMS” belonging to a state
 Example:
 If a state includes: A → ●B
– Include all items that start B, like B → ●X Y Z
– Informally: if I expect to see B next, I expect to start with
anything that B can start with: so also include X → ●a H I
 States are thus built by closure from individual items.
125
Parsing table construction
 The Central Idea:
 To construct from the grammar a DFA to recognize
“viable prefixes”
 The method:
 Use the grammar to build a model of the DFA
– Group items together into sets, which give rise to the states
of the parser
– Use two functions goto( s, X ) and closure( s )
 goto() is analogous to move() in the subset construction
 closure() adds information to round out a state
 Build up the states and transition functions of the DFA
 Use this info to fill in the ACTION & GOTO tables
126
Creating States in Parsing Table Gen
 Start with the initial state
 Keep moving the DOT across the symbols
– Check whether this yields a new state; if yes, mark it
– Otherwise mark the transition to the existing state
– Compute the closures, if any, in that state & include them
– Keep moving the dot and marking the states until all viable
states are identified and the transitions between them are marked.
 The generated DFA is then used to construct the parsing
table with Action and go to items
127
Parsing Table Construction

128
E .E 0 E E. 1 E E+.T 6
E .E+T E E.+T T  .T * F
E  .T T  .F
T  .T * F E T. 2 F  .( E )
T  .F T  T.*F F  . id
F  .( E ) T  T * .F
T  F. 3 7
F  . id F  .( E )
F  id . 5 F  . id
4 F  (. E )
E .E+T E E+T. 9
8
E  .T F (E . ) T  T.*F

T  .T * F E E.+T 10
T  T*F.
T  .F
F  .( E ) F ( E ). 11
F  . id 129
Constructing Parsing Table from DFA
 Use the following rules :
1. An arc between two states labeled with a terminal is a
shift action.
2. An arc between two states labeled with a non-
terminal is a goto action.
3. If a state contains an item A → α● (a reduce item),
the action is to reduce by this production, for all
terminals in FOLLOW(A).
 If there are shift-reduce conflicts or reduce-reduce
conflicts, more elaborate techniques are needed
130
Transition Diagram for Expression Grammar
1. E → E + T    2. E → T    3. T → T * F
4. T → F        5. F → ( E )    6. F → id
State transitions (a flattened view of the DFA):
0 –E→ 1, 0 –T→ 2, 0 –F→ 3, 0 –(→ 4, 0 –id→ 5
1 –+→ 6
2 –*→ 7    (otherwise Reduce 2)
3           Reduce 4
4 –E→ 8, 4 –T→ 2, 4 –F→ 3, 4 –(→ 4, 4 –id→ 5
5           Reduce 6
6 –T→ 9, 6 –F→ 3, 6 –(→ 4, 6 –id→ 5
7 –F→ 10, 7 –(→ 4, 7 –id→ 5
8 –)→ 11, 8 –+→ 6
9 –*→ 7    (otherwise Reduce 1)
10          Reduce 3
11          Reduce 5
131
LR Parsing - A Summary
 The most general parser; can parse a wide variety of
grammars
 The parsing program is one and the same; only the table changes
 Takes longer to execute
 The table generation is tough if done manually;
– Use of automated tools is a necessity
 Error recovery is difficult
132
Comparison of LL & LR Parsers
LL                                   | LR
Left-to-right scan of input          | Left-to-right scan of input
k look-ahead symbols                 | k look-ahead symbols
Leftmost derivation                  | Rightmost derivation in reverse
Grammar with no multiply-defined     | Much wider range of grammars
entries in the parsing tables        | acceptable (less restrictive)
(restrictive)                        |
Shift-Reduce parser                  | Shift-Reduce-Accept-Goto parser
Recognizes the use of a production   | Recognizes the occurrence of the
after seeing only the first k        | right side of a production, having
look-ahead symbols of what its       | seen all of what is derived from
right side derives                   | the right side, with k input
                                     | look-ahead symbols.
133
Comparison of Parsers
PARSER TYPE     | ADVANTAGES               | DISADVANTAGES
LL(1) Parsing   | Fast                     | Hand Coded
(Recursive      | Simple                   | High Maintenance
Descent,        | Good Locality            | Right Associativity
Top-down)       | Good Error Detection     |
LR(1) Parsing   | Fast                     | Large Working Sets
(Bottom-up)     | Deterministic languages  | Poor error Messages
                | Automatable              | Large size tables
                |                          | Less Associative
134
Parser generators
Yet Another Compiler-Compiler ( YACC ):
Yacc specification (translate.y)  --[ Yacc compiler ]-->  y.tab.c
y.tab.c                           --[ C compiler ]----->  a.out
input                             --[ a.out ]----------> output
135
In Summary
 We use CFGs and Backus-Naur form to describe
context-free languages
 Grammars can be ambiguous
 Parsing techniques use these grammars
 The general CYK algorithm, for testing membership in
a context-free language, works for any context-free
grammar BUT its running time is O(n3), much too slow for
practical use (n is the length of the input)
 The practical techniques are
– LL ( recursive Descent) &
– LR ( Shift – Reduce ) Techniques
………. Contd.
136
In Summary
 LL Parsing ( A.k.a Recursive Descent parsing )
 LL stands for Left-to-right scan of input & Leftmost
derivation
 Start by expanding the start symbol and then expand
out nonterminals based on the input, building the parse
tree from the top down
 Works only on grammars with some restrictions. ( LL
Grammars with no left recursion and the first token
uniquely identifies the Production)
 Can be easily built manually
………. Contd.
137
In Summary
 LR Parsing ( A.k.a. Shift – Reduce parsing )
 LR stands for Left-to-right scan of input & Rightmost
derivation
 Bottom – up parsing techniques that starts from leaf
nodes and builds the tree upwards by reducing the
tokens to its LHS (recognize the occurrence of β (the
handle) having seen all of what is derived from β)
 Works on a wide variety of grammars ( LR grammars )
 Too complex to build manually
 Tools are available to build them automatically
138
Any
Questions ????

Thank you
139
Augmented Grammar & its Functions
 As a pre-requisite, “to construct DFA to recognize
viable prefixes from the grammar” we define:
1. Augmented grammar and
2. Two functions with it namely:
1. The closure Operation – Closure (I) and
2. The Goto Operation - goto (I, X)
 Augmented Grammar
– If G is the grammar with start symbol S then the
Augmented grammar for G is
– G’ with new start symbol S’ such that S’ → S
– Used to indicate to the parser when to stop parsing
140
1. The closure Operation
 If I is a set of items for a grammar G, then Closure(I)
is the set of items constructed from the following 2 rules:
1. Initially, every item in I is added to Closure(I)
2. If A → α●Bβ is in Closure(I) & B → γ is a production,
then add the item B → ●γ to Closure(I) if not already included
(This rule needs to be applied repeatedly until no more new items
can be added to Closure(I))
GRAMMAR:
E’ → E
E → E + T | T
T → T * F | F
F → ( E ) | id
If I is the set of one item { E’ → ●E },
then the Closure(I) set would contain:
E’ → ●E
E → ●E + T
E → ●T
T → ●T * F
T → ●F
F → ●( E )
F → ●id 141
2. goto Operation
 goto(I, X) (where I is a set of items and X is a
grammar symbol) is defined as:
– The closure of the set of all items [A → αX●β] such that
[A → α●Xβ] is in I.
 Intuitively, if I is the set of items that are valid for some
viable prefix γ, then goto(I, X) is the set of items that are
valid for the viable prefix γX
 If I is the set of 2 items
– { E’ → E● and E → E●+T },
then goto(I, +) contains:
E → E+●T
T → ●T * F
T → ●F
F → ●( E )
F → ●id 142