Compiler Design 3
Syntax analysis
2
Syntax Analysis & Parsing
Just to recapitulate the definitions:
1. Syntax:
The way in which words are strung together to form
phrases, clauses and sentences
2. Syntax analysis:
The task concerned with fitting a sequence of
tokens into a specified syntax
3. Parsing:
Breaking a sentence down into its component parts of
speech, with an explanation of the form, function, and
syntactic relationship of each part
3
Syntax Analysis
Step two of the front end in compilation
Every programming language has RULES that prescribe
the syntactic structure (i.e. the way in which words and
phrases can be organized to form sentences)
These rules can be formally specified using:
CONTEXT-FREE GRAMMARS
(or GRAMMARS for short)
A.k.a.
BACKUS–NAUR FORM (BNF)
4
Intro to Context Free Grammar (CFG)
An essential formalism for describing the structure of
programs in a programming language
A recipe for constructing elements of a set of strings of
symbols (or tokens)
Consists of a set of production rules and a start symbol
Each production rule:
– Defines a named syntactic construct using two parts:
1. An LHS (naming the syntactic construct) and
2. An RHS (showing a possible form of the construct)
– Connected by an arrow symbol (meaning “is defined as”)
sentence → subject predicate ‘.’
5
Importance of the Grammar
Grammar offers significant advantages to both
language designers and compiler writers:
1. Gives a precise, yet easy-to-understand, syntactic
specification of programming languages
2. Helps construct efficient parsers automatically (not in
all cases, of course)
3. Provides a good structure to the language, which in
turn helps generate correct object code
4. Makes the language open ended, easily amenable to
adding constructs at a later date
6
Context Free Grammar – Formal Definition
A context-free grammar G = (T, N, S, P) consists of:
1. T, a set of terminals (scanner tokens – symbols that
may not appear on the left side of a rule)
2. N, a set of nonterminals (syntactic variables generated
by productions – symbols that may appear on the left or
right of a rule)
3. S, a designated start nonterminal
4. P, a set of productions (rules). Each production has
the form A ::= α, where A is a nonterminal and α is a
sentential form, i.e., a string of zero or more grammar
symbols (terminals / nonterminals)
7
CFG – An Example
A grammar derives strings by
1. Beginning with the start symbol and
2. Repeatedly replacing a nonterminal by the right side
of a production rule for that nonterminal
Example:
expr → expr op expr
expr → ( expr )
expr → – expr
expr → id | num
op → + | – | * | /
The terminal symbols are id, num, (, ), +, –, *, /
The nonterminal symbols are expr and op
The start symbol is expr
8
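The derivation mechanism can be sketched in Python; the dict encoding and the `expand` helper are my own illustration, not from the slides. Each nonterminal maps to its list of alternatives, and a derivation step replaces one nonterminal occurrence by one alternative:

```python
# Sketch: the example grammar as a dict of alternatives (my encoding).
GRAMMAR = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"],
             ["-", "expr"], ["id"], ["num"]],
    "op":   [["+"], ["-"], ["*"], ["/"]],
}
START = "expr"

def expand(form, pos, alt_index):
    """Replace the nonterminal at position `pos` of the sentential
    form with its `alt_index`-th alternative."""
    symbol = form[pos]
    alternative = GRAMMAR[symbol][alt_index]
    return form[:pos] + alternative + form[pos + 1:]

# Derive the sentence "id + num" from the start symbol:
form = [START]             # ['expr']
form = expand(form, 0, 0)  # ['expr', 'op', 'expr']
form = expand(form, 0, 3)  # ['id', 'op', 'expr']
form = expand(form, 1, 0)  # ['id', '+', 'expr']
form = expand(form, 2, 4)  # ['id', '+', 'num']
print(" ".join(form))      # id + num
```

Each call to `expand` is exactly one derivation step; a string is in the language when every remaining symbol is a terminal.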
CFG – Another Example
9
Notational Conventions
1. These symbols are terminals:
1. Lowercase letters early in the alphabet (like a, b, c)
2. Operator symbols such as +, –, etc.
3. Punctuation symbols like , (comma), (, ), etc.
4. The digits 0, 1, …, 9
5. Boldface strings such as if, id, etc.
2. These symbols are nonterminals:
1. Uppercase letters early in the alphabet (like A, B, C)
2. The letter S, when it appears, is usually the start symbol
3. Lowercase letters late in the alphabet, chiefly u, v, …, z,
represent strings of terminals
………. Contd. 10
Notational Conventions
4. Uppercase letters late in the alphabet, such as X, Y, Z,
represent grammar symbols that are either terminals or
nonterminals
5. Lowercase Greek letters α, β, γ represent strings of
grammar symbols
6. If A → α1, A → α2, …, A → αn are all productions
with A on the LHS (known as A-productions), then we write
A → α1 | α2 | … | αn (α1, α2, …, αn are the alternatives of A)
7. Unless otherwise stated, the LHS of the first production
is the start nonterminal S
11
Use of Conventional Notation – Example
Remember the previous example:
expr → expr op expr
expr → ( expr )
expr → – expr
expr → id | num
op → + | – | * | /
Using the notations described so far, this grammar can
be written as:
E → E A E | ( E ) | – E | id | num
A → + | – | * | /
12
Grammar for a Tiny Language
program ::= statement ; statement
statement ::= assignStmt | ifStmt
assignStmt ::= id = expr ;
ifStmt ::= if ( expr ) statement
expr ::= id | int | expr + expr
id ::= a | b | c | i | j | k | n | x | y | z
int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
13
Regular Expressions and Grammars
EVERY construct that can be described by an RE can
also be described by a grammar
Example: the RE ( a | b )* abb can be described as:
A0 → aA0 | bA0 | aA1    A1 → bA2    A2 → bA3    A3 → ε
Then why use REs for lexical analysis & CFGs for syntax analysis?
– Lexical rules are frequently quite simple & we do not need
powerful grammars to describe them
– REs are more concise and easier to understand for tokens
– REs provide for automatic construction of efficient lexical
analyzers
14
Derivation
A way of showing how an input sentence is recognized
with a grammar. Can also be thought of as a type of
algebraic substitution using the grammar.
Remember the grammar:
1. expr → expr op expr
2. expr → ( expr )
3. expr → – expr
4. expr → id | num
5. op → + | – | * | /
Now let us derive a structure for: ctr + ( a – 5 ) / 2
expr ⇒ expr op expr
     ⇒ expr + expr
     ⇒ expr + expr op expr
     ⇒ expr + expr / expr
     ⇒ expr + expr / num
     ⇒ expr + ( expr ) / num
     ⇒ expr + ( expr – expr ) / num
     ⇒ … ⇒ id + ( id – num ) / num
15
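The substitution process above can be sketched as a small driver that always rewrites the leftmost nonterminal, given the sequence of alternative choices; the encoding and the `leftmost_derive` helper are my own illustration, not from the slides:

```python
# Sketch: leftmost derivation as repeated substitution (my encoding).
GRAMMAR = {
    "expr": [["expr", "op", "expr"], ["(", "expr", ")"],
             ["-", "expr"], ["id"], ["num"]],
    "op":   [["+"], ["-"], ["*"], ["/"]],
}

def leftmost_derive(start, choices):
    """Rewrite the leftmost nonterminal at each step, using the given
    alternative indices, and yield every sentential form on the way."""
    form = [start]
    yield form[:]
    for alt in choices:
        # leftmost nonterminal = first symbol that is a grammar key
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[pos:pos + 1] = GRAMMAR[form[pos]][alt]
        yield form[:]

# Leftmost derivation of  id + ( id - num ) / num :
for step in leftmost_derive("expr", [0, 3, 0, 0, 1, 0, 3, 1, 4, 3, 4]):
    print(" ".join(step))
```

The final line printed is `id + ( id - num ) / num`, matching the derivation on the slide (with ctr, a, 5, 2 already tokenized as id and num).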
Derivations – Some definitions
Uses productions to derive more complex expressions
The symbols:
– ⇒ denotes “derives in one step”
– ⇒* denotes “derives in zero or more steps” (α ⇒* β)
– ⇒+ denotes “derives in one or more steps” (α ⇒+ β)
Given a grammar G with start symbol S, a string ω is
part of the language L(G) if and only if S ⇒* ω
Further, if S ⇒* α and α contains
– only terminals, then α is a Sentence of G
– nonterminals too, then α is a Sentential form of G
16
Derivation – Another Example
Consider the following grammar for a simple ‘;’-terminated
expression consisting of interspersed numbers and +:
1: stmt → expr ‘;’
2: expr → factor ‘+’ factor
3:      | factor
4: factor → number
Let us try and come up with a derivation to prove that an
input sentence (like 1 + 4 ;) can be recognized from this
grammar:
stmt ⇒ expr ;
     ⇒ factor + factor ;
     ⇒ number + factor ;
     ⇒ number + number ;
The process is called a Leftmost Derivation because
the leftmost nonterminal is always replaced
17
Leftmost Derivation
Start with the topmost (start) symbol in the grammar
Replace the leftmost nonterminal in the partially parsed
sentence with the corresponding production’s RHS in each
step
The operator ⇒lm signifies leftmost derivation:
X ⇒lm Y means X derives Y by leftmost derivation
Each of the steps (partial derivations) is called a Viable
Prefix
The portion of the viable prefix that is replaced at each step is
called the Handle
18
Leftmost Derivation Example
19
Rightmost Derivation
Replace the rightmost nonterminal in the partially parsed
sentence in each step
The operator ⇒rm signifies rightmost derivation:
X ⇒rm Y means X derives Y by rightmost derivation
It is possible to start with the start symbol and do a
rightmost derivation
21
21
Rightmost Derivation – Another Example
22
Parse Tree
WHAT?
A graphical representation of a derivation sequence, showing
how to derive a string of the language from the grammar,
starting from the start symbol
WHY?
Each stage of the compiler has two purposes:
– Detect and filter out some class of errors
– Compute some new information or translate the
representation of the program to make things easier for later
stages
The recursive structure of the tree suits the recursive
structure of the language definition
25
Parse Tree (Contd.)
HOW?
Tree nodes represent symbols of the grammar (nonterminals
or terminals) and tree edges represent derivation steps
Example:
Given the production: expr → expr op expr
the corresponding parse tree is:
        expr
      /  |   \
  expr   op   expr
Its properties are:
1. The root is labeled by the start symbol
2. Each leaf is labeled by a token or by ε
3. Each interior node is labeled by a nonterminal
26
Parse Tree Example
Remember the grammar:
expr → expr op expr
expr → ( expr )
expr → – expr
expr → id | num
op → + | – | * | /
And the string derived from it: ctr + ( a – 5 ) / 2
Using the derivation, the corresponding parse tree is:
expr
├─ expr ── id (ctr)
├─ op ──── +
└─ expr
   ├─ expr
   │  ├─ (
   │  ├─ expr
   │  │  ├─ expr ── id (a)
   │  │  ├─ op ──── –
   │  │  └─ expr ── num (5)
   │  └─ )
   ├─ op ──── /
   └─ expr ── num (2)
27
Parse Tree Example (Continued)
The Yield of the tree:
– The string derived or generated from the
nonterminal at the root of the tree
– Obtained by reading the leaves of the parse tree
from left to right
In this example the yield is:
ctr + ( a – 5 ) / 2
28
Parse Tree – Another Example
29
Another Derivation Example
Given the grammar: E → E + E | E * E | ( E ) | – E | num
Find a derivation for the expression: 6 + 2 * 4
Two different parse trees are possible: one groups the
expression as 6 + (2 * 4) = 14, the other as (6 + 2) * 4 = 32
According to the above grammar, BOTH are valid
30
Another Derivation Example
Given the grammar: E → E + E | E * E | ( E ) | – E | id
Find a derivation for the expression: 6 + 2 * 4
Which derivation tree is correct –
the one for 6 + (2 * 4) = 14 or the one for (6 + 2) * 4 = 32?
According to the above grammar, BOTH are.
A grammar that has more than one parse tree for an
input sentence is called ambiguous.
31
Ambiguity in Grammar
A grammar that produces more than one parse tree for
some input sentence
– Can happen because of the way the grammar is
defined
Classic example – the if-then-else problem:
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts …
32
Ambiguous IF
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts
This sentential form has two derivations:
if Expr1 then if Expr2 then Stmt1 else Stmt2
33
Removing the ambiguity
1. Add precedence; for example:
– This grammar is slightly larger
– Takes more rewriting to reach
some of the terminal symbols
– Encodes expected precedence
– Produces the same parse tree
under leftmost & rightmost
derivations
Let’s see how it parses x – 2 * y
35
Removing the ambiguity (Contd.)
36
Removing the ambiguity (Contd.)
Both the leftmost and rightmost derivations give the same
expression, because the rewritten grammar directly and
explicitly encodes the expected precedence.
37
Removing the ambiguity (Contd.)
2. Rewrite the grammar to avoid generating the problem
Example – the case of if-then-if-then-else:
Match each else to the innermost unmatched if (the common-sense rule)
38
Removing the ambiguity (Contd.)
40
Removing the ambiguity (Contd.)
Rewriting the grammar to eliminate multiple productions
starting with the same token is called left factoring.
41
Ambiguity - A Summary
Ambiguity arises from two distinct sources
– Confusion in the context-free syntax (if-then-else)
– Confusion that requires context to resolve (overloading)
Resolving ambiguity
– To remove context-free ambiguity, rewrite the grammar
– To handle context-sensitive ambiguity use precedence
This is a language design problem
Sometimes, the compiler writer accepts an ambiguous
grammar
Parsing techniques must “do the right thing” (i.e.,
always select the same derivation)
42
Recursion in Grammars
Very common in grammars
Generally 3 types of recursion are possible:
1. A ::= Ab (Left Recursion)
2. B ::= cB (Right Recursion)
3. C ::= dCf (Middle Recursion or Self-embedding)
Facts:
If a grammar contains no middle recursion, then the
language it generates is regular.
If there is no recursion at all in a grammar, then its
language is finite and therefore regular.
43
Left Recursion
Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
A top-down parser might loop forever when parsing an
expression using this grammar:
E ⇒ E + T ⇒ E + T + T ⇒ E + T + T + T ⇒ …
A grammar that has at least one production of the form
A → Aα is a left-recursive grammar
Top-down parsing cannot handle left-recursive grammars
44
Eliminating Left Recursion
First group the A-productions as
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with an A.
Then replace the A-productions by
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε
This eliminates all immediate left recursion from the A and A’
productions
BUT
It does not eliminate left recursion involving derivations of
two or more steps
45
Left Recursion (Contd.)
Left recursion can often be eliminated by rewriting the
grammar
This left-recursive grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
can be rewritten to eliminate the immediate left recursion:
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
46
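The transformation for a single nonterminal can be sketched as follows; the list-of-lists encoding and the `eliminate_left_recursion` helper are my own illustration, with ε represented explicitly:

```python
def eliminate_left_recursion(nt, productions):
    """Remove immediate left recursion from the productions of `nt`.
    Returns (new productions for nt, productions for the fresh nt')."""
    prime = nt + "'"
    # A -> A alpha  productions contribute the alphas ...
    alphas = [p[1:] for p in productions if p and p[0] == nt]
    # ... everything else contributes the betas
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:
        return productions, []          # nothing to do
    new_nt = [beta + [prime] for beta in betas]          # A  -> beta A'
    new_prime = [alpha + [prime] for alpha in alphas]    # A' -> alpha A'
    new_prime.append(["ε"])                              # A' -> ε
    return new_nt, new_prime

# E -> E + T | T   becomes   E -> T E' ,  E' -> + T E' | ε
e, e_prime = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(e, e_prime)
```

Applying it to T → T * F | F likewise yields T → F T’ and T’ → * F T’ | ε, matching the rewritten grammar on the slide.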
Properties of Grammars – A Summary
Epsilon Productions
– Has at least one production of the form “E → ε”
Right (/ Left) Linear Grammar:
– A grammar where the RHS of every production has at most
one nonterminal, and that nonterminal appears rightmost
(/ leftmost)
– E.g. “E → x E” or “E → x”
Left-Recursive Grammar:
– A grammar that can generate “E → E + X” (for example).
Similarly, “right-recursive grammars”
Ambiguous Grammar
– More than one parse tree is possible for a specific sentence.
47
Limitations of CFG
CFGs cannot express context conditions
For Example:
1. Every name must be declared before it is used
2. The operands of an expression must have
compatible types
Possible solutions
Use context-sensitive grammars
– Too complicated
Check context conditions during semantic analysis
48
Syntax Analysis
Problem Statement:
To find a derivation sequence in a grammar G for the
input token stream
or
To conclude that none exists
Solution:
Build a parse tree for the string (of tokens) based on the
grammar rules
The process is known as P A R S I N G
49
PARSING
Constructs the syntax tree for a given sequence of
tokens using grammar rules and productions such
that:
1. The top node is labeled with the start symbol of the
grammar
2. Leaf nodes are labeled with terminals and inner nodes
with nonterminals
3. The children of an inner node labeled N correspond to the
members of an alternative of N, in the same order as they
occur in that alternative
4. The terminals labeling the leaf nodes correspond to the
sequence of tokens, in the same order as they occur in the
input
………. Contd.
50
The Role of a Parser
[Diagram: lexical analysis → PARSER → rest of the
front end, all sharing the Symbol Table]
The parser continues the process of translation and
validation started by the lexical analyzer:
1. Analyzing the phrase structure of the program,
2. Adding hierarchical (tree) structure to the flat stream of
tokens produced by the lexical analyzer,
3. Outputting a data structure conveying the program’s
syntax to subsequent compiler phases, AND
4. Reporting errors when the program does not match the
syntactic structure of the source language
51
Parsing Methods
Three types of parsing methods are in vogue:
1. Universal parsing methods
– Too inefficient to use in practical compilers (hence not
discussed any further)
– Ex: Cocke–Younger–Kasami (CYK) algorithm
2. Top-down parsing
– Can be generated automatically or written manually
– Ex: left-to-right, top-down parser (a.k.a. LL parser)
3. Bottom-up parsing
– Can only be generated automatically
– Ex: left-to-right, bottom-up parser (a.k.a. LR parser)
52
Top-Down & Bottom-Up Methods
While they are efficient,
they can work only on restricted classes of grammars
(such as LL and LR grammars)
BUT
these classes are expressive enough to describe most syntactic
constructs in programming languages.
SO
Care must be taken while defining the grammar for the
language to make sure that the grammar is
unambiguous
53
TOP-DOWN Parsing
An attempt to construct a parse tree for an input string
starting from the root and creating the nodes in PRE-ORDER
(i.e. the top of the tree is constructed before the nodes below it)
Can also be viewed as an attempt to find a leftmost
derivation for an input string
The way it works:
1. The process starts at the root node, say N
2. Repeat until the fringe of the parse tree matches the input string:
1. Determine the correct alternative for N – the key step
2. The parser then proceeds to construct the left child of N
3. The process of determining the correct alternative for the leftmost
child continues till the leftmost child is a terminal
54
Top Down parsing example
String to be parsed: array [ num dotdot num ] of integer
(for a grammar of type – diagram omitted)

A Predictive Parser
The table-driven predictive parsing program starts with the
start symbol E on the stack (above the end marker $):
INPUT: id + id * id $
STACK (top first): E $  →  T E’ $  (on E → TE’)
PARSING TABLE:
NON-         INPUT SYMBOL
TERMINAL  id         +           *           (         )        $
E         E → TE’                            E → TE’
E’                   E’ → +TE’                         E’ → ε   E’ → ε
T         T → FT’                            T → FT’
T’                   T’ → ε      T’ → *FT’             T’ → ε   T’ → ε
F         F → id                             F → ( E )
63
A Predictive Parser
The working of the table-driven predictive parser on the
input id + id * id $ (stack top shown first):
Stack           Input            Action
E $             id + id * id $   E → TE’
T E’ $          id + id * id $   T → FT’
F T’ E’ $       id + id * id $   F → id
id T’ E’ $      id + id * id $   match id: pop stack, advance input
T’ E’ $         + id * id $      T’ → ε: pop stack
E’ $            + id * id $      E’ → +TE’
+ T E’ $        + id * id $      match +
T E’ $          id * id $        T → FT’
F T’ E’ $       id * id $        F → id
id T’ E’ $      id * id $        match id
T’ E’ $         * id $           T’ → *FT’
* F T’ E’ $     * id $           match *
F T’ E’ $       id $             F → id
id T’ E’ $      id $             match id
T’ E’ $         $                T’ → ε
E’ $            $                E’ → ε
$               $                accept
68
A Predictive Parser
The predictive parser proceeds in this fashion, emitting the
following productions:
E → TE’
T → FT’
F → id
T’ → ε
E’ → +TE’
T → FT’
F → id
T’ → *FT’
F → id
T’ → ε
E’ → ε
When Top(Stack) = input = $, the parser halts & accepts the input
string ( id + id * id )
69
Algorithm for Predictive Parser
The program execution is controlled by TWO inputs:
– The input symbol a and the symbol on the top of the stack X
There are THREE possibilities for the parser:
– If X = a = $ : halt & announce the successful completion of
parsing
– If X = a ≠ $ : pop X off the stack and advance the input
pointer to the next input symbol
– If X is a nonterminal: consult entry M[X, a] in the parsing
table M and replace X on top of the stack with the RHS of
the production
If M[X, a] is error, call the error routine
70
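The three cases above can be sketched as a small driver loop; the table below is the LL(1) table for the expression grammar used in the examples, but the dict encoding and function name are my own illustration:

```python
# Sketch: table-driven predictive parsing (my encoding; [] is ε).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    stack = ["$", "E"]                   # start symbol above end marker
    tokens = tokens + ["$"]
    i = 0
    while stack:
        x, a = stack[-1], tokens[i]
        if x == a == "$":
            return True                  # halt: successful parse
        if x == a:
            stack.pop(); i += 1          # match: pop and advance
        elif x in NONTERMINALS:
            rhs = TABLE.get((x, a))
            if rhs is None:
                return False             # error entry M[X, a]
            stack.pop()
            stack.extend(reversed(rhs))  # push RHS, leftmost on top
        else:
            return False                 # terminal mismatch
    return False

print(predictive_parse(["id", "+", "id", "*", "id"]))  # True
```

Blank entries of the table simply have no key, so `TABLE.get` returning `None` plays the role of the error routine.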
Algorithm for Predictive Parser
This parser is an LL(1) parser:
1. It parses the input from left to right, constructing a
leftmost derivation of the sentence
2. It looks ahead one symbol (token) of input to decide its
next action
71
FIRST & FOLLOW
The two functions represent sets of terminal symbols
that aid us in constructing a predictive parser, by helping
us fill the parsing table with valid entries
The set of terminal symbols yielded by the FOLLOW function
can also be used in error recovery
FIRST – if α is a string of grammar symbols, then
FIRST(α) is the set of terminals that begin the strings
derived from α (if α ⇒* ε, then ε is also in FIRST(α))
FOLLOW(A) – the set of terminals a that can appear
immediately to the right of A, i.e. such that there exists a
derivation of the form S ⇒* αAaβ for some α and β
72
FIRST & FOLLOW (Contd.)
The set of terminal symbols (including ε) that can appear
at the far left of any parse tree derived from a particular
nonterminal is the FIRST set of that nonterminal
The set of terminal symbols (including the end marker $) that can
follow a nonterminal in some derivation or other is
called the FOLLOW set of that nonterminal
A set of rules is followed to compute the FIRST and
FOLLOW sets
These sets will be used in creating the parsing table
73
Rules to Create FIRST
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X → ε, then ε ∈ FIRST(X)
3. If X → aABC (a a terminal), then a ∈ FIRST(X)
4. If X → ABCD, then FIRST(A) ⊆ FIRST(X)
4a. Further, if A ⇒ ε in the above production, then
FIRST(B) ⊆ FIRST(X) …… [& so on recursively]
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
80
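These rules can be sketched as a fixed-point computation over the grammar above; the encoding is my own illustration, with the empty string `""` standing for ε and the ε-production written as an empty alternative:

```python
# Sketch: computing FIRST sets by iterating the rules to a fixed point.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                              # repeat until nothing new
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                before = len(first[nt])
                for sym in alt:
                    if sym not in grammar:      # terminal: rule 1/3
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {""}   # rule 4
                    if "" not in first[sym]:
                        break                   # cannot vanish: stop here
                else:                           # whole RHS can derive ε
                    first[nt].add("")           # rule 2 / 4a
                if len(first[nt]) != before:
                    changed = True
    return first

F = first_sets(GRAMMAR)
print(sorted(F["E"]))   # ['(', 'id']
```

For this grammar the computation gives FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E’) = { +, ε } and FIRST(T’) = { *, ε }.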
LL(1) Parsing
Grammars whose predictive parsing tables contain no
multiple entries are called LL(1).
LL(1):
● Left-to-right parse of the input string
● Leftmost derivation of the grammar
● 1 symbol of lookahead (we know only what the next token is)
LL(k): we know what the next k tokens are
Every LL(1) grammar is an LL(2) grammar, and so on
81
Bottom-Up Parsing
Constructs the nodes in the syntax tree in post order,
i.e. bottom-up, beginning at the leaves and working up to the root
Suitable for automatic parser generation
Less restrictive, in the sense that it can postpone the
decision of which production to use till it has seen all the
relevant input tokens
Hence it can handle a larger class of grammars (LR
grammars) than LL parsers
But its efficiency is lower
82
Working of Bottom Up Parsing
Repeatedly matches a right-sentential form from the
language against the tree’s upper frontier.
At each match, it applies a reduction to build on the
frontier:
Each reduction matches an upper frontier of the
partially built tree to the RHS of some production
Each reduction adds a node on top of the frontier
The final result is a rightmost derivation, in reverse.
83
Bottom-Up Parsing
The main task is to find the leftmost node that has not
yet been constructed, BUT all of whose children have
been constructed.
Such a sequence of symbols whose top needs to be
constructed is called the Handle
Creating a parent node N for the handle based on the
grammar rules is called reducing the handle to N
Hence the problem can be stated as:
1. To find the proper handle that leads to the expression &
2. To locate the right grammar rule to reduce the handle to a
parent node
84
Bottom-Up Parsing
Consider the grammar:
S → aABe
A → Abc | b
B → d
We want to parse the input string abbcde.
The reduction sequence:
abbcde
→ aAbcde   (reduce by A → b)
→ aAde     (reduce by A → Abc)
→ aABe     (reduce by B → d)
→ S        (reduce by S → aABe)
90
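The reduction sequence above can be sketched as a naive search for a substring matching some right-hand side. This illustrates the reductions only, not the real handle-finding machinery: trying the productions in this fixed order happens to pick the correct handle at every step for this input.

```python
# Sketch: trace abbcde back to S by repeated reductions (my helper).
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def reduce_to_start(s):
    steps = [s]
    while s != "S":
        for lhs, rhs in PRODUCTIONS:
            i = s.find(rhs)              # first substring matching an RHS
            if i != -1:
                s = s[:i] + lhs + s[i + len(rhs):]   # reduce it to the LHS
                steps.append(s)
                break
        else:
            raise ValueError("no handle found in " + s)
    return steps

print(reduce_to_start("abbcde"))
# ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
```

Read in reverse, the printed sequence is exactly a rightmost derivation of abbcde, as the previous slide states.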
Conflict Situations
Two possible conflict scenarios in this kind of parsing:
1. Shift–Reduce conflict:
– One rule implies shift while another implies reduce (for e.g.)
– Grammar productions: S → if E then S | if E then S else S
– On stack: if E then S    Input: else
– Should the parser shift (trying for the 2nd rule) or reduce by the first rule?
2. Reduce–Reduce conflict:
– Can have two grammar rules with the same RHS (for e.g.)
– Grammar productions: A → α and B → α
– On stack: α
– Which production to choose for the reduce – A → α or B → α?
SOLUTION: change the grammar to remove such ambiguities.
91
Operator Precedence Parsing
A simplified and efficient version of the bottom-up parsing
technique that can be applied only to a small but
important class of grammars – Operator Grammars
Operator grammars require:
– No right-hand side of a rule is empty (ε)
– No right-hand side has 2 adjacent nonterminals
The following grammars represent the same language;
only the second is an operator grammar:
Grammar:
E → E A E | ( E ) | id
A → + | – | *
Operator Grammar:
E → E + E | E – E | E * E | ( E ) | id
92
Operator Precedence Parsing
Works on the basis of operator precedence:
Uses three precedence relations
– a <. b : a yields in precedence to b
– a .> b : a takes precedence over b
– a .= b : a has the same precedence as b
Finds the handle as a <. … .> pattern at the top of the stack
Drawbacks:
– Only a small class of grammars qualifies
– Overloaded operators are hard (unary minus)
– Parser correctness is hard to prove
93
LR PARSERS
An efficient bottom-up parsing technique that can be
used to parse a large class of context-free grammars
An attractive option because it:
– Can be used with virtually any kind of programming-language grammar
– Is the most general non-backtracking shift-reduce parsing
technique known
– Can be implemented as efficiently as any other technique
– Can detect a syntax error as soon as it is possible to do so on a
left-to-right scan of the input
But it is hard to construct manually; however,
tools are available to construct them automatically.
94
LR Parser
Consists of: 1. A driver program 2. An input 3. An output
4. A stack and 5. A parsing table
[Diagram: Input → LR Parsing Program (with Stack) →
Output, driven by the action/goto TABLE]
The Stack:
– Stores a string of the form
s0 X1 s1 X2 s2 … Xm sm (with sm at the top), where each:
1. Xi is a grammar symbol &
2. si is a symbol called a State,
summarizing the information in the stack below it
The parsing table has 2 parts:
1. A parsing action function – action, and
2. A goto function – goto
95
LR Parser Functioning
The parsing program:
– Reads input symbols (a) from the input buffer one at a time
– Determines the current state on the top of the stack (s)
– Consults the parsing table based on current state s and
current input symbol a & gets action[s, a], which can be either:
1. Shift to state S, OR
2. Reduce by a grammar production, OR
3. Accept, OR 4. Error
– The function goto takes a state and grammar symbol as
arguments and produces a new state
– The next move of the parser is determined by reading input a
and state s and then consulting the action entry of the parsing table
– If action[s, a] is accept, parsing is complete
96
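The driver loop described above can be sketched in Python. The tables here are a hand-picked fragment (states 0, 1, 2, 3 and 5 of the expression grammar, enough to accept the single input "id"); the dict encoding and function name are my own illustration:

```python
# Sketch: the LR driver loop (my encoding of a table fragment).
ACTION = {
    (0, "id"): ("shift", 5),
    (5, "$"): ("reduce", ("F", ["id"])),
    (3, "$"): ("reduce", ("T", ["F"])),
    (2, "$"): ("reduce", ("E", ["T"])),
    (1, "$"): ("accept", None),
}
GOTO = {(0, "F"): 3, (0, "T"): 2, (0, "E"): 1}

def lr_parse(tokens, action, goto):
    stack = [0]                              # stack of states, s0 at bottom
    tokens = tokens + ["$"]
    i, output = 0, []
    while True:
        s, a = stack[-1], tokens[i]
        kind, arg = action.get((s, a), ("error", None))
        if kind == "shift":
            stack.append(arg); i += 1        # push state, advance input
        elif kind == "reduce":
            lhs, rhs = arg
            del stack[len(stack) - len(rhs):]        # pop |rhs| states
            stack.append(goto[(stack[-1], lhs)])     # push goto state
            output.append((lhs, rhs))        # emit the production used
        elif kind == "accept":
            return output                    # reductions, in reverse
        else:
            return None                      # error entry
```

On the input `["id"]` the driver emits F → id, T → F, E → T and accepts, which is exactly a rightmost derivation in reverse.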
LR Parsing Example
GRAMMAR:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id

The action/goto table (sN = shift and go to state N,
rN = reduce by production N, acc = accept):
State |          action           |   goto
      | id   +   *   (   )   $    |  E  T  F
  0   | s5           s4           |  1  2  3
  1   |      s6              acc  |
  2   |      r2  s7      r2  r2   |
  3   |      r4  r4      r4  r4   |
  4   | s5           s4           |  8  2  3
  5   |      r6  r6      r6  r6   |
  6   | s5           s4           |     9  3
  7   | s5           s4           |       10
  8   |      s6          s11      |
  9   |      r1  s7      r1  r1   |
 10   |      r3  r3      r3  r3   |
 11   |      r5  r5      r5  r5   |

Let us parse the input string id * id + id $:
Stack            Input            Action
0                id * id + id $   shift 5
0 id 5           * id + id $      reduce by F → id, goto 3
0 F 3            * id + id $      reduce by T → F, goto 2
0 T 2            * id + id $      shift 7
0 T 2 * 7        id + id $        shift 5
0 T 2 * 7 id 5   + id $           reduce by F → id, goto 10
0 T 2 * 7 F 10   + id $           reduce by T → T * F, goto 2
0 T 2            + id $           reduce by E → T, goto 1
0 E 1            + id $           shift 6
0 E 1 + 6        id $             shift 5
0 E 1 + 6 id 5   $                reduce by F → id, goto 3
0 E 1 + 6 F 3    $                reduce by T → F, goto 9
0 E 1 + 6 T 9    $                reduce by E → E + T, goto 1
0 E 1            $                accept
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5 118
GRAMMAR:
(1) E E + T
(2) E’ T
LR Parser Example
(3) T T F
OUTPUT:
(4) T F
(5) F ( E ) E
INPUT: id id + id $
(6) F id
E + T
LR Parsing
STACK: 1 T F
Program
E
0 T F id
State action goto
id + * ( ) $ E T F F id
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
id
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5 119
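The shift-reduce moves shown in this example can be sketched as a small table-driven parser. This is an illustrative sketch in Python, not the parser generator's actual code; the function and table names (`lr_parse`, `ACTION`, `GOTO`, `PRODS`) are our own.

```python
# Productions, numbered as on the slide.
PRODS = {
    1: ('E', ['E', '+', 'T']),
    2: ('E', ['T']),
    3: ('T', ['T', '*', 'F']),
    4: ('T', ['F']),
    5: ('F', ['(', 'E', ')']),
    6: ('F', ['id']),
}

# ACTION[state][lookahead] = ('s', n) shift to state n,
# ('r', n) reduce by production n, or ('acc',) accept.
ACTION = {
    0:  {'id': ('s', 5), '(': ('s', 4)},
    1:  {'+': ('s', 6), '$': ('acc',)},
    2:  {'+': ('r', 2), '*': ('s', 7), ')': ('r', 2), '$': ('r', 2)},
    3:  {'+': ('r', 4), '*': ('r', 4), ')': ('r', 4), '$': ('r', 4)},
    4:  {'id': ('s', 5), '(': ('s', 4)},
    5:  {'+': ('r', 6), '*': ('r', 6), ')': ('r', 6), '$': ('r', 6)},
    6:  {'id': ('s', 5), '(': ('s', 4)},
    7:  {'id': ('s', 5), '(': ('s', 4)},
    8:  {'+': ('s', 6), ')': ('s', 11)},
    9:  {'+': ('r', 1), '*': ('s', 7), ')': ('r', 1), '$': ('r', 1)},
    10: {'+': ('r', 3), '*': ('r', 3), ')': ('r', 3), '$': ('r', 3)},
    11: {'+': ('r', 5), '*': ('r', 5), ')': ('r', 5), '$': ('r', 5)},
}

# GOTO[state][nonterminal] = next state (blank entries omitted).
GOTO = {0: {'E': 1, 'T': 2, 'F': 3}, 4: {'E': 8, 'T': 2, 'F': 3},
        6: {'T': 9, 'F': 3}, 7: {'F': 10}}

def lr_parse(tokens):
    """Run the LR driver; return the productions used, in reduction order."""
    stack = [0]                      # stack of states; start state 0 on top
    toks = tokens + ['$']
    pos, output = 0, []
    while True:
        act = ACTION[stack[-1]].get(toks[pos])
        if act is None:
            raise SyntaxError(f'unexpected {toks[pos]!r}')
        if act[0] == 's':            # shift: push the new state, advance input
            stack.append(act[1])
            pos += 1
        elif act[0] == 'r':          # reduce: pop |RHS| states, push GOTO state
            lhs, rhs = PRODS[act[1]]
            del stack[-len(rhs):]
            stack.append(GOTO[stack[-1]][lhs])
            output.append(act[1])
        else:                        # accept
            return output

# Reductions for "id * id + id": F→id, T→F, F→id, T→T*F, E→T, F→id, T→F, E→E+T
print(lr_parse('id * id + id'.split()))  # → [6, 4, 6, 3, 2, 6, 4, 1]
```

The same driver accepts any sentence of the grammar; only the two tables encode the language.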
LR Parsing Observations
All LR parsers use the same parsing program,
independent of the table and hence of the grammar.
What differentiates the LR parsers are the action and the
goto tables:
These tables can be constructed using:
1. Simple LR (SLR): succeeds for the fewest grammars, but is the
easiest to implement
2. Canonical LR: succeeds for the most grammars, but is the
hardest to implement. It splits states when necessary to
prevent reductions that would get the parser stuck
3. Lookahead LR (LALR): succeeds for most common syntactic
constructions used in programming languages, but produces
LR tables much smaller than canonical LR
120
SLR Parsing table Construction
Parsing table construction is based on the key idea:
A handle should NOT be reduced to a nonterminal N
if the next token (i.e. the lookahead token) cannot
follow N (thereby resulting in a dead end and
backtracking)
So ensure that:
Reduction by an item N → α● is done ONLY if the
lookahead token is part of FOLLOW (N).
In order to achieve this, we define a concept called:
Dotted ITEM, or ITEM for short
121
DOTTED ITEM ( A.k.a. ITEM)
A concept used to summarize the state of our search;
sets of items represent sets of hypotheses about the
next token.
Definition:
– Consider a production N → αβ
– An LR dotted item N → α●β (note the ● in between)
means that we maintain the hypothesis that:
1. αβ is a possible handle and
2. We have already seen the α part
3. When ● reaches the rightmost point, as in N → αβ●, we
have identified the handle
An item with the rightmost ● is called a REDUCE ITEM;
all others are SHIFT ITEMS (or KERNEL ITEMS).
122
Dotted ITEM – An Example
Consider the production A → XYZ
This production yields four items:
1. A → ● XYZ
2. A → X ● YZ Shift Items
3. A → XY ● Z
4. A → XYZ ● Reduce Item
The production A → ε yields only one item A → ●
An item can be represented by a pair of integers
– One representing the production number in the grammar &
– The other indicating the position of the dot in the production
123
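The integer-pair representation described above can be sketched in a few lines of Python; the helper name `items_of` is our own, not part of any standard tool.

```python
def items_of(prod_no, rhs):
    """All dotted items of one production, each as (production number, dot position).

    Dot position d means the ● sits just before rhs[d]; d == len(rhs) is the
    reduce item. An epsilon production (empty rhs) yields the single item (n, 0).
    """
    return [(prod_no, d) for d in range(len(rhs) + 1)]

print(items_of(1, ['X', 'Y', 'Z']))  # → [(1, 0), (1, 1), (1, 2), (1, 3)]
print(items_of(2, []))               # → [(2, 0)]  (A → ε gives only A → ●)
```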
Dotted ITEM – An Example (contd.)
Intuitively, an item indicates how much of a production
we have seen at a given point in the parsing process.
124
Closure Operation
The operation of extending the context with items is
called the closure operation
Useful in identifying all “ITEMS” belonging to a state
Example:
If a state includes: A → ●B
– Include all items that start a production of B, like B → ●XYZ
– Informally: if I expect to see B next, I expect to start
anything that B can start with; so, recursively, also include X → ●aHI
States are thus built by closure from individual items.
125
Parsing table construction
The Central Idea:
To construct from the grammar a DFA to recognize
“viable prefixes”
The method:
Use the grammar to build a model of the DFA
– Group items together into sets, which give rise to the states
of the parser
– Use two functions goto( s, X ) and closure( s )
goto() is analogous to move() in the subset construction
closure() adds information to round out a state
Build up the states and transition functions of the DFA
Use this info to fill in the ACTION & GOTO tables
126
Creating States in Parsing Table Gen
Start with the initial state
Keep moving the DOT across the symbols
– Check whether the result is a new state; if yes, mark it
– Otherwise map it to an existing state
– Compute the closures, if any, in that state & include them
– Keep moving the dot and marking states until all viable
states are identified and the transitions between them are marked.
The generated DFA is then used to construct the parsing
table with ACTION and GOTO entries
127
Parsing Table Construction
128
The LR(0) item sets (states) for the expression grammar:
State 0: E' → ●E, E → ●E+T, E → ●T, T → ●T*F, T → ●F, F → ●(E), F → ●id
State 1: E' → E●, E → E●+T
State 2: E → T●, T → T●*F
State 3: T → F●
State 4: F → (●E), E → ●E+T, E → ●T, T → ●T*F, T → ●F, F → ●(E), F → ●id
State 5: F → id●
State 6: E → E+●T, T → ●T*F, T → ●F, F → ●(E), F → ●id
State 7: T → T*●F, F → ●(E), F → ●id
State 8: F → (E●), E → E●+T
State 9: E → E+T●, T → T●*F
State 10: T → T*F●
State 11: F → (E)●
129
Constructing Parsing Table from DFA
Use the following rules :
1. An arc between two states labeled with a terminal is a
shift action.
2. An arc between two states labeled with a nonterminal
is a goto action.
3. If a state contains an item A → α● (a reduce item),
the action is to reduce by this production, for all
terminals in FOLLOW (A).
If there are shift-reduce conflicts or reduce-reduce
conflicts, more elaborate techniques are needed
130
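Rule 3 can be illustrated directly: for each state whose item set contains a reduce item A → α●, place the reduce action under exactly the terminals in FOLLOW (A). A sketch for the expression grammar, with the FOLLOW sets precomputed by hand; the names (`FOLLOW`, `REDUCE_ITEMS`, `action`) are our own.

```python
# FOLLOW sets for the expression grammar, computed by hand.
FOLLOW = {'E': {'+', ')', '$'},
          'T': {'+', '*', ')', '$'},
          'F': {'+', '*', ')', '$'}}

# Production number -> its left-hand side.
PROD_LHS = {1: 'E', 2: 'E', 3: 'T', 4: 'T', 5: 'F', 6: 'F'}

# States whose item set contains a reduce item, mapped to that production.
REDUCE_ITEMS = {2: 2, 3: 4, 5: 6, 9: 1, 10: 3, 11: 5}

action = {}
for state, prod in REDUCE_ITEMS.items():
    for terminal in FOLLOW[PROD_LHS[prod]]:      # rule 3: reduce only on FOLLOW
        action[(state, terminal)] = ('r', prod)

print(action[(2, '+')])    # → ('r', 2)
print((2, '*') in action)  # → False: '*' is not in FOLLOW(E), so state 2 shifts on '*'
```

Because state 2's reduce entries avoid '*', the shift s7 on '*' causes no conflict; this is exactly how SLR avoids the dead ends mentioned earlier.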
Transition Diagram for Expression Grammar
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id
Transitions of the DFA:
State 0: on E → 1, on T → 2, on F → 3, on ( → 4, on id → 5
State 1: on + → 6
State 2: on * → 7; otherwise Reduce 2
State 3: Reduce 4
State 4: on E → 8, on T → 2, on F → 3, on ( → 4, on id → 5
State 5: Reduce 6
State 6: on T → 9, on F → 3, on ( → 4, on id → 5
State 7: on F → 10, on ( → 4, on id → 5
State 8: on ) → 11, on + → 6
State 9: on * → 7; otherwise Reduce 1
State 10: Reduce 3
State 11: Reduce 5
131
LR Parsing - A Summary
The most general parser; can parse a wide variety of
grammars
The parsing program is the same for every LR parser; only the table changes
Takes longer to execute
The table generation is tough if done manually;
– Use of automated tools is a necessity:
grammar → yacc → y.tab.c → C compiler → a.out
135
In Summary
We use CFGs and Backus-Naur form to describe
context-free languages
Grammars can be ambiguous
Parsing techniques use these grammars
The general CYK algorithm tests membership in
a context-free language. It works for any context-free
grammar, BUT its running time is O(n³), much too slow for
practical use (n is the length of the input)
The practical techniques are
– LL ( recursive Descent) &
– LR ( Shift – Reduce ) Techniques
………. Contd.
136
In Summary
LL Parsing ( A.k.a Recursive Descent parsing )
LL stands for Left-to-right scan of input & Leftmost
derivation
Start by expanding the start symbol and then expand
out nonterminals based on the input, building the parse
tree from the top down
Works only on grammars with some restrictions (LL
grammars, with no left recursion, where the first token
uniquely identifies the production)
Can be easily built manually
………. Contd.
137
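The LL technique summarized above can be sketched as a tiny recursive-descent parser. This assumes the expression grammar rewritten without left recursion (E → T { + T }, T → F { * F }, F → ( E ) | id); the function names are our own.

```python
def parse(tokens):
    """Recursive-descent parse of the expression grammar; True on success."""
    toks = tokens + ['$']
    pos = 0

    def peek():
        return toks[pos]

    def eat(t):
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f'expected {t!r}, got {toks[pos]!r}')
        pos += 1

    def E():                       # E → T { + T }
        T()
        while peek() == '+':
            eat('+'); T()

    def T():                       # T → F { * F }
        F()
        while peek() == '*':
            eat('*'); F()

    def F():                       # F → ( E ) | id
        if peek() == '(':
            eat('('); E(); eat(')')
        else:
            eat('id')

    E()
    eat('$')                       # the whole input must be consumed
    return True

print(parse('id * id + id'.split()))  # → True
```

Note how each nonterminal becomes one function and the first token of the input decides which branch to take, exactly the LL restriction stated above.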
In Summary
LR Parsing ( A.k.a. Shift – Reduce parsing )
LR stands for Left-to-right scan of input & Rightmost
derivation
Bottom-up parsing technique that starts from the leaf
nodes and builds the tree upwards by reducing the
handle to its LHS (recognize an occurrence of β, the
handle, having seen all of what is derived from β)
Works on a wide variety of grammars (LR grammars;
unlike LL, left recursion poses no problem)
Too complex to build manually
Tools are available to build them automatically
138
Any
Questions ????
Thank you
139
Augmented Grammar & its Functions
As a pre-requisite, “to construct DFA to recognize
viable prefixes from the grammar” we define:
1. Augmented grammar and
2. Two functions with it namely:
1. The closure Operation – Closure (I) and
2. The Goto Operation - goto (I, X)
Augmented Grammar
– If G is a grammar with start symbol S, then the
augmented grammar for G is
– G' with a new start symbol S' and a new production S' → S
– Used to indicate to the parser when to stop parsing
140
1. The closure Operation
If I is the set of items for a grammar G then Closure (I)
is the set of items constructed from the following 2 rules:
1. Initially, every item in I is added to Closure (I)
2. If A → α●Bβ is in Closure (I) & B → γ is a production,
then add the item B → ●γ to Closure (I) if not already included
(This rule needs to be applied repeatedly until no more new items
can be added to Closure (I))
GRAMMAR:
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
If I is the set of one item { E' → ●E },
then the Closure (I) set would contain:
E' → ●E
E → ●E + T
E → ●T
T → ●T * F
T → ●F
F → ●( E )
F → ●id
141
2. goto Operation
goto (I, X) ( where I is the set of items and X is the input
grammar symbol) is defined as:
– The closure of the set of all items [A →αX●β] such that [A
→α ● Xβ] is in I.
Intuitively, if I is the set of items that are valid for some
viable prefix γ, then goto (I,X) is the set of items that are
valid for the viable prefix γX
Example: If I is the set of 2 items { E' → E● and E → E● + T },
then goto (I, +) contains:
E → E + ●T
T → ●T * F
T → ●F
F → ●( E )
F → ●id
142
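Both operations can be sketched directly from their definitions. In this Python sketch an item is a triple (LHS, RHS tuple, dot position); the names (`GRAMMAR`, `closure`, `goto`) are our own.

```python
# Grammar as a dict: nonterminal -> list of right-hand sides (tuples).
GRAMMAR = {
    "E'": [('E',)],
    'E':  [('E', '+', 'T'), ('T',)],
    'T':  [('T', '*', 'F'), ('F',)],
    'F':  [('(', 'E', ')'), ('id',)],
}

def closure(items):
    """Closure (I): repeatedly add B -> ●γ for every nonterminal B right after a ●."""
    result = set(items)
    while True:
        new = set()
        for (lhs, rhs, dot) in result:
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # nonterminal after ●
                for gamma in GRAMMAR[rhs[dot]]:
                    new.add((rhs[dot], gamma, 0))
        if new <= result:                                # nothing new: done
            return result
        result |= new

def goto(items, symbol):
    """goto (I, X): advance the ● over X in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

I0 = closure({("E'", ('E',), 0)})
print(len(I0))                             # → 7: the seven items of state 0
I1 = goto(I0, 'E')
print(sorted(lhs for lhs, _, _ in I1))     # → ['E', "E'"]: E' → E● and E → E●+T
```

Iterating goto over all grammar symbols from I0 onward generates exactly the twelve states listed earlier.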