UNIT – II PPT
Dr.S.Selvi
Professor/CSE
UNIT II
SYNTAX ANALYSIS
Outline
⚫ Role of parser
⚫ Context free grammars
⚫ Top down parsing
⚫ Bottom up parsing
⚫ Parser generators
⚫ Type Checking
⚫ Type Conversions
⚫ Run – Time Environments
⚫ Storage Allocation Strategies
⚫ Activation Records
The role of parser

[Diagram: the source program enters the Lexical Analyzer, which supplies
a token to the Parser on each getNextToken request; the Parser builds a
parse tree for the rest of the front end, which produces the intermediate
representation. Both phases consult the symbol table.]
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id

The same language, after left recursion elimination and left factoring:
E -> TE’
E’ -> +TE’ | Ɛ
T -> FT’
T’ -> *FT’ | Ɛ
F -> (E) | id
Error handling
⚫ Common programming errors
⚫ Lexical errors
⚫ Syntactic errors
⚫ Semantic errors
⚫ Logical errors
⚫ Error handler goals
⚫ Report the presence of errors clearly and accurately
⚫ Recover from each error quickly enough to detect subsequent
errors
⚫ Add minimal overhead to the processing of correct programs
Error-recovery strategies
⚫ Panic mode recovery
⚫ Discard input symbol one at a time until one of designated set of
synchronization tokens is found
⚫ Phrase level recovery
⚫ Replacing a prefix of remaining input by some string that allows
the parser to continue
⚫ Error productions
⚫ Augment the grammar with productions that generate the
erroneous constructs
⚫ Global correction
⚫ Choosing minimal sequence of changes to obtain a globally
least-cost correction
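As a concrete illustration, panic-mode recovery can be sketched in a few lines (the token list, error position, and synchronizing set below are illustrative choices, not from the slides):

```python
# Minimal sketch of panic-mode recovery: on an error, discard input
# tokens one at a time until a designated synchronizing token is found.
SYNC_TOKENS = {';', '}'}

def panic_mode_recover(tokens, pos):
    """Skip tokens until a synchronizing token; return the new position."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos

# An error detected at position 1 skips ahead to the ';' at position 4.
print(panic_mode_recover(['x', '@', '+', 'y', ';', 'z'], 1))  # → 4
```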
Context free grammars
⚫ Terminals
⚫ Nonterminals
⚫ Start symbol
⚫ Productions

expression -> expression + term
expression -> expression – term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id
Derivations
⚫ Productions are treated as rewriting rules to generate a
string
⚫ Rightmost and leftmost derivations
⚫ E -> E + E | E * E | -E | (E) | id
⚫ Derivations for –(id+id)
⚫ E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Parse trees
⚫ -(id+id)
⚫ E => -E => -(E) => -(E+E) => -(id+E)=>-(id+id)
Ambiguity
⚫ For some strings there exist more than one parse tree
⚫ Or more than one leftmost derivation
⚫ Or more than one rightmost derivation
⚫ Example: id+id*id
Elimination of ambiguity
Elimination of ambiguity (cont.)
⚫ Idea:
⚫ A statement appearing between a then and an else must be
matched
Elimination of left recursion
⚫ A grammar is left recursive if it has a non-terminal A
such that there is a derivation A=>+Aα
⚫ Top down parsing methods can't handle left-recursive
grammars
⚫ A simple rule for direct left recursion elimination:
⚫ For a rule like:
⚫ A -> A α|β
⚫ We may replace it with
⚫ A -> β A’
⚫ A’ -> α A’ | ɛ
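The transformation above can be applied mechanically; here is a minimal sketch (the tuple-based grammar representation and the function name are choices made here, not notation from the slides):

```python
def eliminate_direct_left_recursion(head, bodies):
    """For A -> A a1 | ... | b1 | ..., return the equivalent
    right-recursive rules A -> b A' and A' -> a A' | epsilon.
    Each body is a tuple of grammar symbols; () represents epsilon."""
    new_head = head + "'"
    recursive = [b[1:] for b in bodies if b and b[0] == head]         # the alphas
    others    = [b for b in bodies if not (b and b[0] == head)]       # the betas
    a_rules  = [beta + (new_head,) for beta in others]                # A  -> b A'
    ap_rules = [alpha + (new_head,) for alpha in recursive] + [()]    # A' -> a A' | eps
    return {head: a_rules, new_head: ap_rules}

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
print(eliminate_direct_left_recursion('E', [('E', '+', 'T'), ('T',)]))
```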
Recursive descent parsing
void A() {
  choose an A-production, A -> X1 X2 .. Xk;
  for (i = 1 to k) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi equals the current input symbol a)
      advance the input to the next symbol;
    else /* an error has occurred */
      error();
  }
}
Recursive descent parsing (cont)
⚫ General recursive descent may require backtracking
⚫ The previous code needs to be modified to allow
backtracking
⚫ In general form it can't choose an A-production easily.
⚫ So we need to try all alternatives
⚫ If one failed the input pointer needs to be reset and another
alternative should be tried
⚫ Recursive descent parsers can't be used for left-recursive
grammars
Example
S -> cAd
A -> ab | a
Input: cad

[Three snapshots of the top-down parse of cad: S is expanded to c A d;
A is first expanded to ab, which fails to match the input; after
backtracking, A is expanded to a, which succeeds.]
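The backtracking behaviour on this grammar can be sketched as follows (a hypothetical Python sketch: parse_A returns the alternatives in the order tried, and parse_S resets the input position between tries):

```python
def parse_A(s, i):
    """Return every position where an A could end, in the order tried."""
    ends = []
    if s[i:i+2] == 'ab':
        ends.append(i + 2)   # alternative A -> a b
    if s[i:i+1] == 'a':
        ends.append(i + 1)   # alternative A -> a
    return ends

def parse_S(s):
    """S -> c A d: try each alternative for A, backtracking on failure."""
    if not s.startswith('c'):
        return False
    for j in parse_A(s, 1):          # input pointer reset between tries
        if s[j:] == 'd':
            return True
    return False

print(parse_S('cad'))   # → True
print(parse_S('cabd'))  # → True
print(parse_S('cab'))   # → False
```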
First and Follow
⚫ First(α) is the set of terminals that begin strings derived from α
⚫ If α =>* Ɛ then Ɛ is also in First(α)
⚫ In predictive parsing, when we have A -> α | β, if First(α) and
First(β) are disjoint sets then we can select the appropriate
A-production by looking at the next input symbol
⚫ Follow(A), for any nonterminal A, is the set of terminals a that can
appear immediately after A in some sentential form
⚫ If we have S =>* αAaβ for some α and β, then a is in Follow(A)
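The First sets can be computed by the usual fixed-point iteration; the sketch below (names and representation are choices made here) runs it on the left-factored expression grammar from the earlier slide:

```python
EPS = 'ε'

def compute_first(grammar, terminals):
    """Iteratively compute First(X) for every grammar symbol.
    grammar maps each nonterminal to a list of bodies (tuples)."""
    first = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                nullable = True          # does the body derive epsilon so far?
                for sym in body:
                    add = first[sym] - {EPS}
                    if not add <= first[head]:
                        first[head] |= add
                        changed = True
                    if EPS not in first[sym]:
                        nullable = False
                        break
                if nullable and EPS not in first[head]:
                    first[head].add(EPS)
                    changed = True
    return first

g = {
    'E':  [('T', "E'")],
    "E'": [('+', 'T', "E'"), (EPS,)],
    'T':  [('F', "T'")],
    "T'": [('*', 'F', "T'"), (EPS,)],
    'F':  [('(', 'E', ')'), ('id',)],
}
first = compute_first(g, {'+', '*', '(', ')', 'id', EPS})
print(first['E'])   # → {'(', 'id'}
```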
Predictive parsing table (dangling-else grammar):

Non-        Input Symbol
terminal    a        b        e                   i              t    $
S           S -> a                                S -> iEtSS’
S’                            S’ -> Ɛ                                 S’ -> Ɛ
                              S’ -> eS
E                    E -> b
Non-recursive predictive parsing

[Diagram: the input buffer a + b $ feeds the predictive parsing program,
which uses a stack (X Y Z $, with X on top) and the parsing table M to
produce the output.]
Predictive parsing algorithm
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X <> $) { /* stack is not empty */
  if (X is a) pop the stack and advance ip;
  else if (X is a terminal) error();
  else if (M[X,a] is an error entry) error();
  else if (M[X,a] = X -> Y1Y2..Yk) {
    output the production X -> Y1Y2..Yk;
    pop the stack;
    push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
  }
  set X to the top stack symbol;
}
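The algorithm above can be sketched as a table-driven parser. The table fragment below covers only the entries needed to parse id+id with the left-factored expression grammar (a sketch, not the full LL(1) table):

```python
EPS = ()  # the empty body represents an epsilon-production

# Fragment of the LL(1) table M for the left-factored expression grammar.
M = {
    ('E',  'id'): ('T', "E'"),
    ("E'", '+'):  ('+', 'T', "E'"),
    ("E'", '$'):  EPS,
    ('T',  'id'): ('F', "T'"),
    ("T'", '+'):  EPS,
    ("T'", '$'):  EPS,
    ('F',  'id'): ('id',),
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens):
    """Table-driven LL(1) parsing; returns the productions used."""
    stack, out, i = ['$', 'E'], [], 0
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X == a:                      # X matches the current terminal
            stack.pop(); i += 1
        elif X not in NONTERMINALS:     # terminal mismatch
            raise SyntaxError(a)
        elif (X, a) not in M:           # empty (error) table entry
            raise SyntaxError(a)
        else:                           # expand X by the table entry
            body = M[(X, a)]
            out.append((X, body))
            stack.pop()
            stack.extend(reversed(body))  # push body with Y1 on top
    return out

print(len(predictive_parse(['id', '+', 'id', '$'])))  # → 9
```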
Example
⚫ Input: id+id*id$

[Snapshots of the parse tree as it is built top-down for id+id*id.]
Shift-reduce parser
⚫ The general idea is to shift some symbols of input to the
stack until a reduction can be applied
⚫ At each reduction step, a specific substring matching the
body of a production is replaced by the nonterminal at the
head of the production
⚫ The key decisions during bottom-up parsing are about
when to reduce and about what production to apply
⚫ A reduction is the reverse of a step in a derivation
⚫ The goal of a bottom-up parser is to construct a rightmost
derivation in reverse:
Handle pruning
⚫ A Handle is a substring that matches the body of a
production and whose reduction represents one step along
the reverse of a rightmost derivation
Shift/reduce conflict example (the dangling-else grammar):

Stack                        Input
… if expr then stmt          else …$
Reduce/reduce conflict
stmt -> id(parameter_list)
stmt -> expr:=expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr

Stack              Input
… id(id            ,id) …$
LR Parsing
⚫ The most prevalent type of bottom-up parsers
⚫ LR(k); we are mostly interested in parsers with k<=1
⚫ Why LR parsers?
⚫ Table driven
⚫ Can be constructed to recognize all programming language
constructs
⚫ Most general non-backtracking shift-reduce parsing method
⚫ Can detect a syntactic error as soon as it is possible to do so
⚫ The class of grammars for which we can construct LR parsers is a
superset of the class for which we can construct LL parsers
States of an LR parser
⚫ States represent set of items
⚫ An LR(0) item of G is a production of G with the dot at
some position of the body:
⚫ For A->XYZ we have following items
⚫ A->.XYZ
⚫ A->X.YZ
⚫ A->XY.Z
⚫ A->XYZ.
Example
E -> E + T | T
T -> T * F | F
F -> (E) | id

LR(0) automaton (states as item sets; transitions given in parentheses):

I0 = closure({[E’->.E]}):
  E’->.E, E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
I1 (I0 on E): E’->E., E->E.+T   (accept on $)
I2 (I0 on T): E->T., T->T.*F
I3 (I0 on F): T->F.
I4 (I0 on ( ): F->(.E), E->.E+T, E->.T, T->.T*F, T->.F, F->.(E), F->.id
I5 (I0 on id): F->id.
I6 (I1 on +): E->E+.T, T->.T*F, T->.F, F->.(E), F->.id
I7 (I2 on *): T->T*.F, F->.(E), F->.id
I8 (I4 on E): F->(E.), E->E.+T
I9 (I6 on T): E->E+T., T->T.*F
I10 (I7 on F): T->T*F.
I11 (I8 on )): F->(E).
Use of LR(0) automaton
⚫ Example: id*id
Line Stack Symbols Input Action
(1) 0 $ id*id$ Shift to 5
(2) 05 $id *id$ Reduce by F->id
(3) 03 $F *id$ Reduce by T->F
(4) 02 $T *id$ Shift to 7
(5) 027 $T* id$ Shift to 5
(6) 0275 $T*id $ Reduce by F->id
(7) 02710 $T*F $ Reduce by T->T*F
(8) 02 $T $ Reduce by E->T
(9) 01 $E $ accept
LR-Parsing model

[Diagram: the input buffer a1 … ai … an $ is read by the LR parsing
program, which maintains a stack of states (sm on top, $ at the bottom)
and consults the ACTION and GOTO tables to produce the output.]
LR parsing algorithm
let a be the first symbol of w$;
while(1) { /*repeat forever */
let s be the state on top of the stack;
if (ACTION[s,a] = shift t) {
push t onto the stack;
let a be the next input symbol;
} else if (ACTION[s,a] = reduce A->β) {
pop |β| symbols of the stack;
let state t now be on top of the stack;
push GOTO[t,A] onto the stack;
output the production A->β;
} else if (ACTION[s,a]=accept) break; /* parsing is done */
else call error-recovery routine;
}
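The driver loop above can be sketched in Python, using the SLR(1) table of the expression grammar (the dictionary encoding of ACTION and GOTO is a choice made for this sketch):

```python
# ACTION maps (state, terminal) to ('s', t), ('r', prod) or ('acc',).
PRODS = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),   # E->E+T, E->T, T->T*F
         4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}   # T->F, F->(E), F->id
ACTION = {}
for s in (0, 4, 6, 7):                             # states that shift id / (
    ACTION[(s, 'id')] = ('s', 5); ACTION[(s, '(')] = ('s', 4)
for a in ('+', '*', ')', '$'):                     # reduce rows
    for s, p in ((3, 4), (5, 6), (10, 3), (11, 5)):
        ACTION[(s, a)] = ('r', p)
    ACTION[(2, a)] = ('r', 2); ACTION[(9, a)] = ('r', 1)
ACTION[(2, '*')] = ('s', 7); ACTION[(9, '*')] = ('s', 7)   # * binds tighter
ACTION[(1, '+')] = ('s', 6); ACTION[(1, '$')] = ('acc',)
ACTION[(8, '+')] = ('s', 6); ACTION[(8, ')')] = ('s', 11)
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3, (4, 'E'): 8,
        (4, 'T'): 2, (4, 'F'): 3, (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    """The LR driver loop from the slides; returns reductions made."""
    stack, out, i = [0], [], 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(tokens[i])
        if act[0] == 's':                 # shift: push state, advance input
            stack.append(act[1]); i += 1
        elif act[0] == 'r':               # reduce: pop |body| states, push GOTO
            head, size = PRODS[act[1]]
            del stack[-size:]
            stack.append(GOTO[(stack[-1], head)])
            out.append(act[1])
        else:                             # accept
            return out

print(lr_parse(['id', '*', 'id', '+', 'id', '$']))  # → [6, 4, 6, 3, 2, 6, 4, 1]
```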
Example (0) E’->E
(1) E -> E + T
(2) E-> T
STATE ACTON GOTO
(3) T -> T * F
id + * ( ) $ E T F (4) T-> F
0 S5 S4 1 2 3
(5) F -> (E) id*id+id?
(6) F->id
1 S6 Acc
Line Stack Symbols Input Action
2 R2 S7 R2 R2
(1) 0 id*id+id$ Shift to 5
3 R4 R7 R4 R4
(2) 05 id *id+id$ Reduce by F->id
4 S5 S4 8 2 3 (3) 03 F *id+id$ Reduce by T->F
5 R6 R6 R6 R6 (4) 02 T *id+id$ Shift to 7
(5) 027 T* id+id$ Shift to 5
6 S5 S4 9 3 (6) 0275 T*id +id$ Reduce by F->id
7 S5 S4 10 (7) 02710 T*F +id$ Reduce by
T->T*F
8 S6 S11 (8) 02 T +id$ Reduce by E->T
9 R1 S7 R1 R1 (9) 01 E +id$ Shift
(10) 016 E+ id$ Shift
10 R3 R3 R3 R3
(11) 0165 E+id $ Reduce by F->id
11 R5 R5 R5 R5 (12) 0163 E+F $ Reduce by T->F
(13) 0169 E+T` $ Reduce by
E->E+T
(14) 01 E $ accept
Constructing SLR parsing table
⚫ Method
⚫ Construct C={I0,I1, … , In}, the collection of LR(0) items for G’
⚫ State i is constructed from state Ii:
⚫ If [A->α.aβ] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j”
⚫ If [A->α.] is in Ii, then set ACTION[i,a] to “reduce A->α” for all a in
follow(A)
⚫ If [S’->.S] is in Ii, then set ACTION[i,$] to “accept”
⚫ If any conflict appears, then we say that the grammar is not
SLR(1).
⚫ If GOTO(Ii,A) = Ij then GOTO[i,A]=j
⚫ All entries not defined by above rules are made “error”
⚫ The initial state of the parser is the one constructed from the set of
items containing [S’->.S]
Example grammar which is not SLR(1)
S -> L=R | R
L -> *R | id
R -> L

I0: S’->.S, S->.L=R, S->.R, L->.*R, L->.id, R->.L
I1: S’->S.
I2: S->L.=R, R->L.
I3: S->R.
I4: L->*.R, R->.L, L->.*R, L->.id
I5: L->id.
I6: S->L=.R, R->.L, L->.*R, L->.id
I7: L->*R.
I8: R->L.
I9: S->L=R.

In state 2 on input '=', the parser can either shift to 6 or reduce by
R -> L (since '=' is in Follow(R)): a shift/reduce conflict.
More powerful LR parsers
⚫ Canonical-LR or just LR method
⚫ Use lookahead symbols for items: LR(1) items
⚫ Results in a large collection of items
⚫ LALR: lookaheads are introduced in LR(0) items
Canonical LR(1) items
⚫ In LR(1) items each item is of the form: [A->α.β,a]
⚫ An LR(1) item [A->α.β,a] is valid for a viable prefix γ if
there is a rightmost derivation S =>*rm δAw =>rm δαβw, where
⚫ γ = δα
⚫ Either a is the first symbol of w, or w is Ɛ and a is $
⚫ Example:
⚫ S -> BB
⚫ B -> aB | b
⚫ S =>*rm aaBab =>rm aaaBab
⚫ Item [B->a.B,a] is valid for γ=aaa and w=ab
Constructing LR(1) sets of items
SetOfItems Closure(I) {
repeat
for (each item [A->α.Bβ,a] in I)
for (each production B->γ in G’)
for (each terminal b in First(βa))
add [B->.γ, b] to set I;
until no more items are added to I;
return I;
}
SetOfItems Goto(I,X) {
initialize J to be the empty set;
for (each item [A->α.Xβ,a] in I)
add item [A->αX.β,a] to set J;
return closure(J);
}
void items(G’){
initialize C to Closure({[S’->.S,$]});
repeat
for (each set of items I in C)
for (each grammar symbol X)
if (Goto(I,X) is not empty and not in C)
add Goto(I,X) to C;
until no new sets of items are added to C;
}
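The Closure function above can be sketched concretely for the example grammar that follows (S’->S, S->CC, C->cC|d); first_of is specialized to this grammar, in which no nonterminal derives Ɛ:

```python
GRAMMAR = {"S'": [('S',)], 'S': [('C', 'C')], 'C': [('c', 'C'), ('d',)]}
TERMINALS = {'c', 'd', '$'}

def first_of(seq):
    """First(seq), specialized: First(C) = First(S) = {c, d} here."""
    sym = seq[0]
    return {sym} if sym in TERMINALS else {'c', 'd'}

def closure(items):
    """LR(1) closure: items are (head, body, dot, lookahead) tuples."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:  # dot before nonterminal B
                B, beta = body[dot], body[dot + 1:] + (la,)
                for prod in GRAMMAR[B]:
                    for b in first_of(beta):              # lookaheads: First(beta a)
                        new = (B, prod, 0, b)
                        if new not in items:
                            items.add(new); changed = True
    return items

I0 = closure({("S'", ('S',), 0, '$')})
# I0 contains [S'->.S,$], [S->.CC,$], [C->.cC,c/d], [C->.d,c/d]
print(len(I0))  # → 6
```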
Example
S’->S
S->CC
C->cC
C->d
Canonical LR(1) parsing table
⚫ Method
⚫ Construct C={I0,I1, … , In}, the collection of LR(1) items for G’
⚫ State i is constructed from state Ii:
⚫ If [A->α.aβ, b] is in Ii and Goto(Ii,a)=Ij, then set ACTION[i,a] to “shift j”
⚫ If [A->α., a] is in Ii, then set ACTION[i,a] to “reduce A->α”
⚫ If [S’->.S,$] is in Ii, then set ACTION[i,$] to “accept”
⚫ If any conflict appears, then we say that the grammar is not
LR(1).
⚫ If GOTO(Ii,A) = Ij then GOTO[i,A]=j
⚫ All entries not defined by above rules are made “error”
⚫ The initial state of the parser is the one constructed from the set of
items containing [S’->.S,$]
Example
S’->S
S->CC
C->cC
C->d
LALR Parsing Table
⚫ For the previous example we had:
I4: C->d. , c/d
I7: C->d. , $
⚫ These states have the same core, so they are merged into:
I47: C->d. , c/d/$
Example: the ambiguous expression grammar
E -> E + E
E -> E * E
E -> (E)
E -> id

I0: E’->.E     I1: E’->E.     I2: E->(.E)
    E->.E+E        E->E.+E        E->.E+E
    E->.E*E        E->E.*E        E->.E*E
    E->.(E)                       E->.(E)
    E->.id                        E->.id

         ACTION                          GOTO
STATE    id    +     *     (     )    $     E
0        S3                S2               1
1              S4    S5               Acc
2        S3                S2               6
3              R4    R4          R4   R4
4        S3                S2               7
5        S3                S2               8
6              S4    S5          S9
7              R1    S5          R1   R1
8              R2    R2          R2   R2
9              R3    R3          R3   R3
TYPE SYSTEMS
⚫ The type of a language construct is denoted by a type expression.
⚫ A type expression can be:
⚫ A basic type
⚫a primitive data type such as integer, real, char, boolean, …
⚫ type-error to signal a type error
⚫ void : no type
⚫ A type name
⚫ a name can be used to denote a type expression.
⚫ A type constructor applies to other type expressions.
⚫ arrays: If T is a type expression, then array(I,T) is a type expression where I denotes
an index range. Ex: array(0..99,int)
⚫ products: If T1 and T2 are type expressions, then their cartesian product T1 x T2 is a
type expression. Ex: int x int
⚫ pointers: If T is a type expression, then pointer(T) is a type expression. Ex:
pointer(int)
⚫ functions: We may treat functions in a programming language as mappings from a
domain type D to a range type R. So, the type of a function can be denoted by the type
expression D→R where D and R are type expressions. Ex: int→int represents the type of
a function which takes an int value as parameter, and its return type is also int.
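One possible concrete encoding of these type expressions is nested tuples (the representation is a choice made for this sketch; the slides define the constructors only abstractly):

```python
# Basic types as strings, constructed types as tagged tuples.
INT, REAL, CHAR = 'int', 'real', 'char'

def array(index_range, elem):  return ('array', index_range, elem)
def product(t1, t2):           return ('x', t1, t2)
def pointer(t):                return ('pointer', t)
def function(domain, range_):  return ('->', domain, range_)

# Examples from the slide: array(0..99,int) and int x int -> int
t1 = array((0, 99), INT)
t2 = function(product(INT, INT), INT)
print(t2)  # → ('->', ('x', 'int', 'int'), 'int')
```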
SPECIFICATION OF SIMPLE TYPE CHECKER
P → D;E
D → D;D
D → id:T { addtype(id.entry,T.type) }
T → char { T.type=char }
T → int { T.type=int }
T → real { T.type=real }
T → ↑T1 { T.type=pointer(T1.type) }
T → array[intnum] of T1 {T.type=array(1..intnum.val,T1.type) }
Type Checking of Expressions
E → id { E.type=lookup(id.entry) }
E → charliteral { E.type=char }
E → intliteral { E.type=int }
E → realliteral { E.type=real }
E → E1 + E2 { if (E1.type=int and E2.type=int) then E.type=int
else if (E1.type=int and E2.type=real) then E.type=real
else if (E1.type=real and E2.type=int) then E.type=real
else if (E1.type=real and E2.type=real) then E.type=real
else E.type=type-error }
E → E1 [E2] { if (E2.type=int and E1.type=array(s,t)) then E.type=t
else E.type=type-error }
E → E1 ↑ { if (E1.type=pointer(t)) then E.type=t
else E.type=type-error }
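The rules above can be sketched as plain functions over type values (a hypothetical encoding: basic types as strings, constructed types as tuples):

```python
ERR = 'type-error'

def add_type(t1, t2):
    """The E -> E1 + E2 rule: int+int is int, any int/real mix is real."""
    if t1 == t2 == 'int':
        return 'int'
    if t1 in ('int', 'real') and t2 in ('int', 'real'):
        return 'real'
    return ERR

def index_type(t1, t2):
    """The E -> E1[E2] rule: indexing array(s,t) with an int yields t."""
    if t2 == 'int' and isinstance(t1, tuple) and t1[0] == 'array':
        return t1[2]
    return ERR

def deref_type(t1):
    """The E -> E1^ rule: dereferencing pointer(t) yields t."""
    if isinstance(t1, tuple) and t1[0] == 'pointer':
        return t1[1]
    return ERR

print(add_type('int', 'real'))                       # → real
print(index_type(('array', (1, 10), 'int'), 'int'))  # → int
print(deref_type('int'))                             # → type-error
```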
Type Checking of Statements
S → id = E { if (id.type=E.type) then S.type=void
else S.type=type-error }
Type Checking of Functions
E → E1 ( E2 ) { if (E2.type=s and E1.type=s→t) then E.type=t
else E.type=type-error }
Structural Equivalence of Type Expressions
⚫ How do we know that two type expressions are equal?
⚫ As long as type expressions are built from basic types (no
type names), we may use structural equivalence between two
type expressions
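A structural-equivalence test can be sketched as a recursion over a tuple encoding of type expressions (the encoding itself is an assumption of this sketch):

```python
def struct_equiv(s, t):
    """Two type expressions are structurally equivalent if they are the
    same basic type, or the same constructor applied to equivalent parts."""
    if not isinstance(s, tuple) or not isinstance(t, tuple):
        return s == t                          # basic types / atoms
    return (len(s) == len(t) and s[0] == t[0]  # same constructor
            and all(struct_equiv(a, b) for a, b in zip(s[1:], t[1:])))

a = ('pointer', ('array', (0, 99), 'int'))
b = ('pointer', ('array', (0, 99), 'int'))
print(struct_equiv(a, b))                   # → True
print(struct_equiv(a, ('pointer', 'int')))  # → False
```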
Cycles in Type Expressions
type link = ↑ cell;
type cell = record
x : int,
next : link
end;
⚫ We cannot use structural equivalence if there are cycles in
type expressions.
⚫ We have to treat type names as basic types,
but this means that the type expression link is different from the
type expression ↑cell.
Type Conversions
⚫ x + y : what is the type of this expression (int or double)?
⚫ If x is real and y is int, the compiler inserts a conversion before
the addition, e.g. as three-address code:
inttoreal y, , t1
real+ x, t1, t2
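The insertion of the coercion can be sketched as follows (gen_add and the temp-naming scheme are illustrative; the emitted strings mirror the three-address code above):

```python
def gen_add(x, x_type, y, y_type):
    """Emit three-address code for x + y, inserting an int-to-real
    coercion when the operand types differ."""
    code, n = [], [0]
    def new_temp():
        n[0] += 1
        return f't{n[0]}'
    if x_type != y_type:                 # widen the int operand to real
        t = new_temp()
        if y_type == 'int':
            code.append(f'inttoreal {y}, , {t}'); y = t
        else:
            code.append(f'inttoreal {x}, , {t}'); x = t
        x_type = 'real'
    op = 'real+' if x_type == 'real' else 'int+'
    t = new_temp()
    code.append(f'{op} {x}, {y}, {t}')
    return code

print(gen_add('x', 'real', 'y', 'int'))
# → ['inttoreal y, , t1', 'real+ x, t1, t2']
```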
RUN TIME ENVIRONMENTS
SOURCE LANGUAGE ISSUES
⚫ A compiler must implement the abstractions presented in the source language
definition
⚫ Examples include
⚫ Names/id
⚫ Scopes
⚫ Binding
⚫ Data types
⚫ Operators
⚫ Procedures/functions,
⚫ Control flow constructs
STORAGE ORGANIZATION
⚫ A typical organization of run-time storage for Pascal/C includes the
following sub-division of memory (from low memory to high memory)
⚫ Program code
⚫ Static data
⚫ Heap
⚫ Free memory
⚫ Stack
STORAGE ALLOCATION STRATEGIES
Stack
⚫ Stack
⚫ Last-in First-out (LIFO)
⚫ Used to implement the behavior of function calls
⚫ Stack uses the following operations
⚫ Push (or call) the node for an activation onto the stack when
it begins
⚫ Pop (or return) the node when the activation ends
Stack allocation of space
⚫ Stack allocation would not be practical if
procedure calls did not nest in time
⚫ Activation tree
⚫ used to establish the order in which activations begin along a
path from the root; activations return in the reverse of that order
Activation Records: Fields
⚫ the fields of an Activation Record (AR)
⚫ Temporary values: values generated by expression evaluations
that cannot be kept in registers
⚫ Local data: data local to the procedure
⚫ Saved machine status: the context (registers, PC, etc.)
⚫ Access link: points to non-local data in other ARs
⚫ Control link: points to the caller’s activation record
⚫ Returned value: space for the called function’s return value, if any
⚫ Actual parameters: used by the calling procedure to pass
parameters to the called procedure. Registers are often used to
pass this information
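The fields listed above can be sketched as a record type (field names follow the slide; the layout and types are illustrative only, not a real AR layout):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    actual_parameters: list                              # passed by the caller
    returned_value: Any = None                           # space for the result, if any
    control_link: Optional['ActivationRecord'] = None    # caller's AR
    access_link: Optional['ActivationRecord'] = None     # for non-local data
    saved_machine_status: dict = field(default_factory=dict)  # PC, registers
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

# A call pushes a new AR whose control link points at the caller's AR.
main_ar = ActivationRecord(actual_parameters=[])
callee = ActivationRecord(actual_parameters=[42], control_link=main_ar)
print(callee.control_link is main_ar)  # → True
```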
Activation Trees
❑ Activation trees
▪ Represents the activations of procedures during program
execution using a tree
❑ Activation of procedure
▪ Execution of a procedure body
▪ Lifetime of activation
▪ Sequence of steps between first and last statements
▪ Activations are either non-overlapping or nested
▪ Recursive activations are possible
Symbol Tables