
Compiler Design

Contents

Chapter 1: Lexical Analysis and Parsing
Chapter 2: Syntax Directed Translation
Chapter 3: Intermediate Code Generation
Chapter 4: Code Optimization
Chapter 1
Lexical Analysis and Parsing

LEARNING OBJECTIVES

 Language processing system
 Lexical analysis
 Syntax analysis
 Context free grammars and ambiguity
 Types of parsing
 Top down parsing
 Bottom up parsing
 Conflicts
 Operator precedence grammar
 LR parser
 Canonical LR parser (CLR)

Language Processing System

Language Processors

A compiler translates a high-level program into a low-level program and reports errors found along the way:

    High level program                      Low level program
    (source program)  -->  Compiler  -->    (target program)
                              |
                              v
                        Error messages

Interpreter: It is a computer program that executes instructions written in a programming language. It either executes the source code directly, or translates the source code into some efficient intermediate representation and immediately executes that.

    Source program -->
                        Interpreter --> Output
    Input          -->

Example: Early versions of the Lisp programming language, BASIC.

Translator: A software system that converts source code from one form of a language to another form is called a translator. There are two types of translators, namely (1) the compiler and (2) the assembler. A compiler converts source code of a high-level language into a low-level language. An assembler converts assembly-language code into binary code.

Compilers: A compiler is software that translates code written in a high-level language (i.e., the source language) into a target language. Example: source languages like C, Java, etc. Compilers are user friendly. The target language is like machine language, which is efficient for the hardware.

Passes: The number of iterations over the source code required to obtain the executable code is called the number of passes. A typical compiler is two-pass. A single-pass compiler requires more memory, while a multi-pass compiler requires less memory.

Analysis-synthesis model of compilation

There are two parts of compilation:

    Compilation = Analysis (front end) + Synthesis (back end)

Analysis: It breaks up the source program into pieces and creates an intermediate representation of the source program. This part is more language specific.

Synthesis: It constructs the desired target program from the intermediate representation. The target program is more machine specific, dealing with registers and memory locations.

Front end vs back end of a compiler

The front end includes all analysis phases and the intermediate code generator, with part of code optimization:

    Source    -->  Lexical   --tokens-->  Parser  --syntax tree-->  Intermediate     -->  3-address
    program        analyzer                                         code generator        code

All of these phases interact with the symbol table and with the error handler.

The back end includes the code optimization and code generation phases. The back end synthesizes the target program from the intermediate code.

Context of a compiler

In addition to a compiler, several other programs may be required to create an executable target program, like a preprocessor to expand macros. The target program created by a compiler may also require further processing before it can be run. The language processing system looks like this:

    Source program with macros
            |
       Preprocessor
            |
    Modified source program
            |
        Compiler
            |
    Target assembly program
            |
        Assembler
            |
    Relocatable machine code
            |
      Loader/linker  <--  Library files, relocatable object files
            |
    Absolute machine code

Phases

The compilation process is partitioned into subprocesses called phases. In order to translate high-level code to machine code, we go phase by phase, with each phase doing a particular task and passing its output to the next phase.

Lexical analysis or scanning

It is the first phase of a compiler. The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.

Example: Consider the statement: if (a < b)
In this statement the tokens are if, (, a, <, b, ).
Number of tokens = 6
Identifiers: a, b
Keywords: if
Operators: <, (, )

Syntax analyzer or parser

• Tokens are grouped hierarchically into nested collections with collective meaning.
• A context free grammar (CFG) specifies the rules or productions for identifying constructs that are valid in a programming language. The output is a parse/syntax/derivation tree.

Example: Parse tree for -(id + id) using the following grammar (G1):

    E → E + E
    E → E * E
    E → -E
    E → (E)
    E → id

            E
           / \
          -   E
             /|\
            ( E )
             /|\
            E + E
            |   |
           id  id

Semantic analysis

• It checks the source program for semantic errors.
• Type checking is done in this phase, where the compiler checks that each operator has matching operands, for semantic consistency with the language definition.
• It gathers type information for the next phases.

Example 1: The bicycle rides the boy.
This statement has no meaning, but it is syntactically correct.

Example 2:
    int a;
    bool b;
    char c;
    c = a + b;
We cannot add an integer to a Boolean variable and assign the result to a character variable.

Intermediate code generation

The intermediate representation should have two important properties:
(i) It should be easy to produce.
(ii) It should be easy to translate into the target program.

'Three address code' is one of the common forms of intermediate code. Three address code consists of a sequence of instructions, each of which has at most three operands.

Example:
    id1 = id2 + id3 × 10;

    t1 := inttoreal(10)
    t2 := id3 × t1
    t3 := id2 + t2
    id1 := t3
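To make the emission of three address code concrete, here is a minimal sketch in Python that walks an expression tree and emits one instruction per operator. The tree format and the helper names (gen, new_temp) are conventions chosen for this illustration, and the type-conversion (inttoreal) step shown above is ignored.

    temp_count = 0

    def new_temp():
        # return a fresh temporary name t1, t2, ...
        global temp_count
        temp_count += 1
        return f"t{temp_count}"

    def gen(node, code):
        # Leaves are operand names; internal nodes are (op, left, right).
        if isinstance(node, str):
            return node
        op, left, right = node
        l = gen(left, code)
        r = gen(right, code)
        t = new_temp()
        code.append(f"{t} := {l} {op} {r}")
        return t

    code = []
    # id1 = id2 + id3 * 10  (right-hand side as an expression tree)
    result = gen(("+", "id2", ("*", "id3", "10")), code)
    code.append(f"id1 := {result}")
    print("\n".join(code))
    # t1 := id3 * 10
    # t2 := id2 + t1
    # id1 := t2

Each internal node produces exactly one instruction with at most three operands, which is the defining property of three address code.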
Code optimization

The output of this phase results in faster running machine code.

Example: For the above intermediate code, the optimized code is
    t1 := id3 × 10.0
    id1 := id2 + t1
Here we eliminated the temporaries t2 and t3.

Code generation

• In this phase, the target code is generated.
• Generally the target code is either relocatable machine code or assembly code.
• Intermediate instructions are each translated into a sequence of machine instructions.
• Assignment of registers is also done in this phase.

Example:
    MOVF id3, R2
    MULF #10.0, R2
    MOVF id2, R1
    ADDF R2, R1
    MOVF R1, id1

Symbol table management

A symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.

What is the use of a symbol table?
1. To record the identifiers used in the source program.
2. To record the type and scope of each identifier.
3. If an identifier is a procedure name, to record the number of arguments, the types of the arguments, the method of passing (e.g., by reference) and the type returned.

Error detection and reporting

(i) The lexical phase can detect errors where the characters remaining in the input 'do not form any token'.
(ii) Errors of the type 'violation of the syntax' of the language are detected by syntax analysis.
(iii) The semantic phase tries to detect constructs that have the right syntactic structure but no meaning. Example: adding two array names, etc.

Lexical Analysis

Lexical analysis is the first phase in compiler design. The main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a sequence of tokens, one for each lexeme in the source program. The stream of tokens is sent to the parser for syntax analysis. There is interaction with the symbol table as well:

    Source program --> Lexical analyzer --tokens--> Parser
                              ^     (get next token)   |
                              |_____ Symbol table _____|
                   (the error handler serves both phases)

Lexeme: A sequence of characters in the source program that matches the pattern for a token. It is the smallest logical unit of a program.
Example: 10, x, y, <, >, =

Tokens: These are the classes of similar lexemes.
Example: Operators: <, >, =
         Identifiers: x, y
         Constants: 10
         Keywords: if, else, int

Operations performed by the lexical analyzer

1. Identification of lexemes and spelling check.
2. Stripping out comments and white space (blank, newline, tab, etc.).
3. Correlating error messages generated by the compiler with the source program.
4. If the source program uses a macro-preprocessor, the expansion of macros may also be performed by the lexical analyzer.

Example 1: Take the following example from Fortran:
    DO 5 I = 1.25
Number of tokens = 5. The first lexeme is the keyword DO.
Tokens are DO, 5, I, =, 1.25.

Example 2: An example from a C program:
    for (int i = 1; i <= 10; i++)
Here the tokens are for, (, int, i, =, 1, ;, i, <=, 10, ;, i, ++, ).
Number of tokens = 14

LEX compiler

The lexical analyzer divides the source code into tokens. To implement a lexical analyzer we have two techniques: hand coding, and the LEX tool. LEX is an automated tool which generates a lexical analyzer from rules given as regular expressions. These rules are also called pattern recognizing rules.
Syntax Analysis

This is the second phase of the compiler; it checks the syntax and constructs the syntax/parse tree. The input to the parser is the token stream and the output is a parse/syntax tree.

Constructing a parse tree

Construction of a derivation tree for a given input string, using the productions of a grammar, yields the parse tree. Consider the grammar

    S → E + E/E * E
    E → id

The parse tree for the string ω = id + id * id is:

            S
           /|\
          E + E
          |  /|\
         id E * E
            |   |
           id  id

Role of the parser

1. Construct a parse tree.
2. Error reporting and correcting (or recovery). A parser can be modeled using a CFG (context free grammar), recognized by a pushdown automaton/table-driven parser.
3. A CFG only checks the correctness of a sentence with respect to syntax, not its meaning.

    Source    -->  Lexical   --token-->        Parser  --> parse tree
    program        analyzer  <--get next token--
       (lexical errors)              (syntax errors)

How to construct a parse tree?

Parse trees can be constructed in two ways:
(i) Top-down parser: it builds the parse tree from the top (root) to the bottom (leaves).
(ii) Bottom-up parser: it starts from the leaves and works up to the root.
In both cases, the input to the parser is scanned from left to right, one symbol at a time.

Parser generator

A parser generator is a tool which creates a parser. Example: a compiler-compiler such as YACC. The input to these parser generators is the grammar we use, and the output is the parser code. The parser generator is used for construction of the compiler's front end.

Scope of declarations

The scope of a declaration is the portion of the program text in which the language's visibility rules apply to it. Within the defined scope, an entity can legally access the declared entities. The scope of a declaration always contains its immediate scope; the immediate scope is the region of the declarative portion that immediately encloses the declaration. The scope starts at the beginning of the declaration and continues to the end of the declaration, whereas for an overloadable declaration the immediate scope begins once the profile of the callable entity has been determined. The visible part is the portion of the declaration text that is visible from outside.

Syntax Error Handling

1. Report the presence of errors clearly and accurately.
2. Recover from each error quickly.
3. Do not slow down the processing of correct programs.

Error Recovery Strategies

    Panic mode | Phrase level | Error productions | Global correction

Panic mode: On discovering an error, the parser discards input symbols one at a time until one of the synchronizing tokens is found.

Phrase level: The parser may perform local correction on the remaining input; for example, it may replace a prefix of the remaining input.

Error productions: The parser can generate appropriate error messages to indicate the erroneous construct that has been recognized in the input.

Global correction: There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.

Context Free Grammars and Ambiguity

A grammar is a set of rules or productions which generates a collection of finite/infinite strings. It is a 4-tuple defined as G = (V, T, P, S), where
    V = set of variables
    T = set of terminals
    P = set of production rules
    S = start symbol
Example: S → (S)/e

    S → (S)   (1)
    S → e     (2)

Here S is the start symbol and the only variable; ( and ) are terminals, and e denotes the empty string. (1) and (2) are the production rules.

Sentential forms

If S ⇒* α, where α may contain non-terminals, then we say that α is a sentential form of G.
Sentence: A sentence is a sentential form with no non-terminals.
Example: -(id + id) is a sentence of the grammar (G1).

Derivations

    Leftmost derivation:              Rightmost derivation:
    E ⇒ -E ⇒ -(E)                     E ⇒ -E ⇒ -(E)
      ⇒ -(E + E)                        ⇒ -(E + E)
      ⇒ -(id + E)                       ⇒ -(E + id)
      ⇒ -(id + id)                      ⇒ -(id + id)

Rightmost derivations are also known as canonical derivations. Both derivations above correspond to the parse tree:

            E
           / \
          -   E
             /|\
            ( E )
             /|\
            E + E
            |   |
           id  id

Ambiguity

A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, a grammar that produces more than one leftmost derivation, or more than one rightmost derivation, for the same sentence is ambiguous.

For example, consider the following grammar:

    String → String + String/String - String/0/1/2/.../9

The sentence 9 - 5 + 2 has two parse trees, as shown below.

            String                            String
           /  |   \                          /  |   \
      String  +  String                 String  -  String
      /  |  \       |                      |       /  |  \
  String - String   2                      9   String + String
     |       |                                    |        |
     9       5                                    5        2

    Figure 1  Leftmost derivation        Figure 2  Rightmost derivation

• Ambiguity is problematic because the meaning of the program can be interpreted incorrectly.
• Ambiguity can be handled in several ways:
  1. Enforce associativity and precedence.
  2. Rewrite the grammar, eliminating left recursion and left factoring.

Removal of ambiguity

A grammar is ambiguous if there exists more than one derivation tree for some input string. Ambiguity of a grammar is undecidable in general; the ambiguity of a particular grammar can be eliminated by rewriting the grammar.

Example:
    E → E + E/id           (ambiguous grammar)
    E → E + T/T, T → id    (rewritten, unambiguous grammar)

Left recursion

Left recursion can take a (top-down) parser into an infinite loop, so we need to remove it.

Elimination of left recursion

A → Aα/β is left recursive. It can be replaced by the non-recursive grammar:

    A → βA′
    A′ → αA′/e

In general, for

    A → Aα1/Aα2/.../Aαm/β1/β2/.../βn

we can replace the A-productions by

    A → β1A′/β2A′/.../βnA′
    A′ → α1A′/α2A′/.../αmA′/e

Example 3: Eliminate left recursion from
    E → E + T/T
    T → T * F/F
    F → (E)/id

Solution: E → E + T/T is of the form A → Aα/β, so we can write it as
    E → TE′
    E′ → +TE′/e
Similarly the other productions are written as
    T → FT′
    T′ → *FT′/e
    F → (E)/id
Example 4: Eliminate left recursion from the grammar
    S → (L)/a
    L → L, S/b

Solution:
    S → (L)/a
    L → bL′
    L′ → , SL′/e

Left factoring

A grammar with common prefixes is called a non-deterministic grammar. To make it deterministic we need to remove the common prefixes; this process is called left factoring.
The grammar A → αβ1/αβ2 can be transformed into

    A → αA′
    A′ → β1/β2

Example 5: What is the resultant grammar after left factoring the following grammar?
    S → iEtS/iEtSeS/a
    E → b

Solution:
    S → iEtSS′/a
    S′ → eS/e
    E → b

Types of Parsing

    Parsers
      • Top-down parsers (predictive parsers)
          - Recursive descent parsing
          - Non-recursive descent parsing (table-driven parsing)
      • Bottom-up parsers
          - Operator precedence parsing
          - LR parsers: SLR, CLR, LALR

Topdown Parsing

A parse tree is constructed for the input starting from the root, creating the nodes of the parse tree in preorder. It simulates the leftmost derivation.

Backtracking Parsing

If we make a sequence of erroneous expansions and subsequently discover a mismatch, we undo the effects and roll back the input pointer. This method is also known as brute force parsing.

Example: S → cAd
         A → ab/a

Let the string to be generated be w = cad. Trying the first alternative for A:

        S                  The string generated from this parse tree is
      / | \                cabd, but w = cad: the third symbol does not
     c  A  d               match. So report an error and go back to A.
       / \
      a   b

Now consider the other alternative for production A:

        S                  The string generated is cad, and w = cad.
      / | \                Now the parse is successful.
     c  A  d
        |
        a

In this we used backtracking. It is a costly and time-consuming approach, and is therefore considered outdated.

Predictive Parsers

By eliminating left recursion and by left factoring the grammar, we can build the parse tree without backtracking. To construct a predictive parser, we must know:
1. the current input symbol, and
2. the non-terminal which is to be expanded.

A procedure is associated with each non-terminal of the grammar.

Recursive descent parsing

In recursive descent parsing, we execute a set of recursive procedures to process the input. The sequence of procedure calls implicitly defines a parse tree for the input (a sketch follows below).

Non-recursive predictive parsing (table-driven parsing)

• It maintains a stack explicitly, rather than implicitly via recursive calls.
• A table-driven predictive parser has:
    - an input buffer
    - a stack
    - a parsing table
    - an output stream

        Input:  a + b $
                   |
    Stack:  x   +--------------------+
            y   | Predictive parsing |--> Output
            z   |      program       |
            $   +--------------------+
                   |
             Parsing table M
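The procedures of a recursive-descent parser can be written one per non-terminal, directly from the grammar. The sketch below parses the brute-force example S → cAd, A → ab/a, but after left factoring A (to A → aA′, A′ → b/e) so that no backtracking is needed; the class and method names are conventions chosen here.

    # Recursive descent for:  S -> c A d    A -> a A'    A' -> b | epsilon
    class Parser:
        def __init__(self, text):
            self.text, self.pos = text, 0

        def peek(self):
            return self.text[self.pos] if self.pos < len(self.text) else "$"

        def match(self, terminal):
            if self.peek() != terminal:
                raise SyntaxError(f"expected {terminal!r} at {self.pos}")
            self.pos += 1

        def S(self):
            self.match("c"); self.A(); self.match("d")

        def A(self):
            self.match("a"); self.A_prime()

        def A_prime(self):
            if self.peek() == "b":       # A' -> b
                self.match("b")
            # else A' -> epsilon: consume nothing

        def parse(self):
            self.S()
            self.match("$")              # input must be exhausted
            return "accepted"

    print(Parser("cad$").parse())        # accepted
    print(Parser("cabd$").parse())       # accepted

Each procedure consumes exactly the terminals its production demands, and the single lookahead in A_prime is what makes the parser predictive rather than backtracking.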
Constructing a parsing table

To construct a parsing table, we have to learn about two functions:
1. FIRST ( )
2. FOLLOW ( )

FIRST(X): To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or e can be added to any FIRST set.

1. If X is a terminal, then FIRST(X) is {X}.
2. If X → e is a production, then add e to FIRST(X).
3. If X is a non-terminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and e is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 ⇒* e. If e is in FIRST(Yj) for all j = 1, 2, ..., k, then add e to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X); if Y1 does not derive e, then we add nothing more to FIRST(X), but if Y1 ⇒* e, then we also add FIRST(Y2), and so on.

FOLLOW(A): To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set.

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right-end marker.
2. If there is a production A → αBβ, then everything in FIRST(β) except e is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains e, then everything in FOLLOW(A) is in FOLLOW(B).

Example: Consider the grammar
    E → TE′
    E′ → +TE′/e
    T → FT′
    T′ → *FT′/e
    F → (E)/id

Then:
    FIRST (E) = FIRST (T) = FIRST (F) = {(, id}
    FIRST (E′) = {+, e}
    FIRST (T′) = {*, e}
    FOLLOW (E) = FOLLOW (E′) = {), $}
    FOLLOW (T) = FOLLOW (T′) = {+, ), $}
    FOLLOW (F) = {*, +, ), $}

Steps for the construction of the predictive parsing table

1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If e is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If e is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error.

By applying these rules to the above grammar, we get the following parsing table:

    Non-terminal   id         +            *            (          )         $
    E              E → TE′                              E → TE′
    E′                        E′ → +TE′                            E′ → e    E′ → e
    T              T → FT′                              T → FT′
    T′                        T′ → e       T′ → *FT′               T′ → e    T′ → e
    F              F → id                               F → (E)

The parser is controlled by a program. The program considers x, the symbol on top of the stack, and a, the current input symbol.

1. If x = a = $, the parser halts and announces successful completion of parsing.
2. If x = a ≠ $, the parser pops x off the stack and advances the input pointer to the next input symbol.
3. If x is a non-terminal, the program consults entry M[x, a] of the parsing table M. This entry is either an x-production of the grammar or an error entry. If M[x, a] = {x → UVW}, the parser replaces x on top of the stack by WVU (with U on top). If M[x, a] = error, the parser calls an error recovery routine.

For example, the moves made by the predictive parser on input id + id * id are shown below:

    Matched      Stack        Input          Action
                 E$           id+id*id$
                 TE′$         id+id*id$      Output E → TE′
                 FT′E′$       id+id*id$      Output T → FT′
                 idT′E′$      id+id*id$      Output F → id
    id           T′E′$        +id*id$        Match id
    id           E′$          +id*id$        Output T′ → e
    id           +TE′$        +id*id$        Output E′ → +TE′
    id+          TE′$         id*id$         Match +
    id+          FT′E′$       id*id$         Output T → FT′
    id+          idT′E′$      id*id$         Output F → id
    id+id        T′E′$        *id$           Match id
    id+id        *FT′E′$      *id$           Output T′ → *FT′
    id+id*       FT′E′$       id$            Match *
    id+id*       idT′E′$      id$            Output F → id
    id+id*id     T′E′$        $              Match id
    id+id*id     E′$          $              Output T′ → e
    id+id*id     $            $              Output E′ → e
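The control loop just described is short enough to write out in full. The sketch below drives the parsing table constructed above (transcribed by hand into a dictionary); the table and set names are conventions of this sketch. Run on id + id * id, it performs exactly the sequence of moves in the trace above.

    # Table maps (non-terminal, lookahead) -> production body.
    TABLE = {
        ("E", "id"): ("T", "E'"),  ("E", "("): ("T", "E'"),
        ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
        ("T", "id"): ("F", "T'"),  ("T", "("): ("F", "T'"),
        ("T'", "+"): (), ("T'", "*"): ("*", "F", "T'"),
        ("T'", ")"): (), ("T'", "$"): (),
        ("F", "id"): ("id",),      ("F", "("): ("(", "E", ")"),
    }
    NONTERMINALS = {"E", "E'", "T", "T'", "F"}

    def ll1_parse(tokens):
        stack = ["$", "E"]                  # start symbol on top of $
        tokens = tokens + ["$"]
        i = 0
        while stack:
            x, a = stack.pop(), tokens[i]
            if x == a == "$":
                return "accepted"
            if x not in NONTERMINALS:       # terminal: must match input
                if x != a:
                    raise SyntaxError(f"expected {x}, saw {a}")
                i += 1
            else:                           # non-terminal: expand via table
                body = TABLE.get((x, a))
                if body is None:
                    raise SyntaxError(f"no entry M[{x}, {a}]")
                stack.extend(reversed(body))
        raise SyntaxError("stack exhausted before input")

    print(ll1_parse(["id", "+", "id", "*", "id"]))   # accepted

Pushing the production body in reverse keeps its leftmost symbol on top of the stack, which is what makes the expansion simulate a leftmost derivation.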
Bottom up Parsing

• This parsing constructs the parse tree for an input string beginning at the leaves and working up towards the root.
• The general style of bottom-up parsing is shift-reduce parsing.

Shift-Reduce Parsing

Reduce a string to the start symbol of the grammar. It simulates the reverse of a rightmost derivation. In every step a particular substring is matched (in left-to-right fashion) against the right side of some production and is then replaced by the non-terminal on the left-hand side of the production.

For example, consider the grammar

    S → aABe
    A → Abc/b
    B → d

In bottom-up parsing the string 'abbcde' is verified as:

    abbcde
    aAbcde
    aAde        (read upward: the reverse of the
    aABe         rightmost derivation)
    S

Rightmost derivation:

    S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

For bottom-up parsing, we use the rightmost derivation in reverse.

Handle of a string: A substring that matches the right-hand side of some production, and whose reduction to the non-terminal on the left-hand side of that production is a step along the reverse of some rightmost derivation:

    S ⇒*rm αAw ⇒rm αβw

Right sentential forms of an unambiguous grammar have one unique handle.

Example: For the grammar S → aABe; A → Abc/b; B → d:

    S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

The handle at each step of the reverse derivation is, respectively: b (in abbcde), Abc (in aAbcde), d (in aAde) and aABe (in aABe).

Handle pruning: The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning. Handle pruning forms the basis of bottom-up parsing. To reconstruct the rightmost derivation

    S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn = w

apply the following simple algorithm:

    for i ← n down to 1:
        find the handle Ai → βi in γi
        replace βi with Ai to generate γi-1

Consider the cut of a parse tree of a certain right sentential form:

          S
        / | \
       α  A  w
          |
          β

Here A → β is a handle of αβw.

Shift-reduce parsing with a stack: There are two problems with this technique:
(i) locating the handle, and
(ii) deciding which production to use.

Stack implementation of a shift-reduce parser

The shift-reduce parser consists of an input buffer, a stack and a parse table.
The input buffer holds the input string, with each cell containing one input symbol.
The stack contains grammar symbols: symbols are pushed by the shift operation and are replaced by the reduce operation once a handle appears on top of the stack.
The parse table consists of two parts, action and goto, which are constructed from the terminals, the non-terminals and the parser's items.

Let us illustrate the stack implementation. Let the grammar be

    S → AA
    A → aA
    A → b

and let the input string be w = abab$.

    Stack     Input String     Action
    $         abab$            Shift
    $a        bab$             Shift
    $ab       ab$              Reduce (A → b)
    $aA       ab$              Reduce (A → aA)
    $A        ab$              Shift
    $Aa       b$               Shift
    $Aab      $                Reduce (A → b)
    $AaA      $                Reduce (A → aA)
    $AA       $                Reduce (S → AA)
    $S        $                Accept

General construction using a stack

1. 'Shift' input symbols onto the stack until a handle is found on top of it.
2. 'Reduce' the handle to the corresponding non-terminal.
3. 'Accept' when the input is consumed and only the start symbol remains on the stack.
4. Errors - call an error reporting/recovery routine.
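The generic shift-reduce loop above can be sketched in a few lines of Python. One caveat is labelled in the code: reducing greedily whenever the top of the stack matches some right-hand side happens to work for the example grammar S → AA, A → aA/b, but it is not a general parsing method; deciding when to reduce is exactly the problem the LR tables in the following sections solve.

    PRODUCTIONS = [("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]

    def shift_reduce(tokens):
        stack, i = [], 0
        while True:
            reduced = True
            while reduced:              # reduce while a handle is on top
                reduced = False
                for head, body in PRODUCTIONS:
                    n = len(body)
                    if tuple(stack[-n:]) == body:
                        stack[-n:] = [head]
                        print("reduce", head, "->", " ".join(body), stack)
                        reduced = True
                        break
            if i == len(tokens):
                return "accept" if stack == ["S"] else "error"
            stack.append(tokens[i]); i += 1
            print("shift", stack)

    print(shift_reduce(list("abab")))   # accept

The printed shifts and reductions reproduce the abab$ trace shown in the table above.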
Viable prefixes: The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.

Conflicts

    Conflicts
      • Shift/reduce conflict
      • Reduce/reduce conflict

Shift/reduce conflict

Example: stmt → if expr then stmt | if expr then stmt else stmt | any other statement

If 'if expr then stmt' is on the stack, we cannot tell whether it is a handle or whether to keep shifting (to read an 'else'): a 'shift/reduce' conflict.

Reduce/reduce conflict

Example: S → aA/bB
         A → c
         B → c

For w = ac, the c on the stack can be reduced by A → c or by B → c: a reduce/reduce conflict.

Operator Precedence Grammar

In an operator grammar, no production rule can have:
• e on the right side, or
• two adjacent non-terminals on the right side.

Example 1: E → E + E/E - E/id is an operator grammar.
Example 2: E → AB; A → a; B → b is not an operator grammar (adjacent non-terminals).
Example 3: E → EOE/id (with O a non-terminal) is not an operator grammar.

Precedence relations

If a ⋖ b, then b has higher precedence than a.
If a ≐ b, then b has the same precedence as a.
If a ⋗ b, then b has lower precedence than a.

Common ways of determining the precedence relation between a pair of terminals:

1. Traditional notions of associativity and precedence. Example: × has higher precedence than +, so × ⋗ + and + ⋖ ×.
2. First construct an unambiguous grammar for the language which reflects the correct associativity and precedence in its parse trees.

Operator precedence relations from associativity and precedence

Let us use $ to mark the end of each string, and define $ ⋖ b and b ⋗ $ for all terminals b. Consider the grammar:

    E → E + E/E × E/id

Let the operator precedence table for this grammar be:

          id    +    ×    $
    id          ⋗    ⋗    ⋗
    +     ⋖     ⋗    ⋖    ⋗
    ×     ⋖     ⋗    ⋗    ⋗
    $     ⋖     ⋖    ⋖    accept

1. Scan the string from the left until ⋗ is encountered.
2. Then scan backwards (to the left) over any ≐ until ⋖ is encountered.
3. The handle is everything to the left of the first ⋗ and to the right of the ⋖ just encountered.

After inserting the precedence relations, $id + id × id$ becomes

    $ ⋖ id ⋗ + ⋖ id ⋗ × ⋖ id ⋗ $

Precedence functions: Instead of storing the entire table of precedence relations, we can encode it by precedence functions f and g, which map terminal symbols to integers:

1. f(a) < g(b) whenever a ⋖ b
2. f(a) = g(b) whenever a ≐ b
3. f(a) > g(b) whenever a ⋗ b

Finding precedence functions for a table

1. Create symbols f(a) and g(a) for each a that is a terminal or $.
2. Partition the created symbols into as many groups as possible, in such a way that if a ≐ b then f(a) and g(b) are in the same group.
3. Create a directed graph:
   If a ⋖ b, place an edge from g(b) to f(a).
   If a ⋗ b, place an edge from f(a) to g(b).
4. If the constructed graph has a cycle, then no precedence functions exist. If there are no cycles, let f(a) be the length of the longest path beginning at the group of f(a), and let g(a) be the length of the longest path beginning at the group of g(a).

Disadvantages of operator precedence parsing

• It cannot handle unary minus.
• It is difficult to decide which language is recognized by the grammar.

Advantages

1. Simple.
2. Powerful enough for expressions in programming languages.

Error cases

1. No relation holds between the terminal on top of the stack and the next input symbol.
2. A handle is found, but there is no production with this handle as its right side.

Error recovery

1. Each empty entry is filled with a pointer to an error routine.
2. Based on the handle, the parser tries to recover from the situation.

To recover, we must modify (insert/change):
1. the stack, or
2. the input, or
3. both.

We must be careful that we do not get into an infinite loop.
LR Parsers

• In LR(k), L stands for left-to-right scanning of the input, R stands for a rightmost derivation (in reverse), and k stands for the number of lookahead symbols.
• LR parsers are table-driven, much like the non-recursive LL parsers. A grammar for which an LR parser can be constructed is an LR grammar. For a grammar to be LR it is sufficient that a left-to-right shift-reduce parser be able to recognize the handles of right-sentential forms when they appear on top of the stack.
• The time complexity of such parsers is linear, O(n), in the length of the input.
• LR parsers are faster than LL(1) parsers.
• LR parsing is attractive because:
    - it is the most general non-backtracking shift-reduce parsing method;
    - the class of grammars that can be parsed using LR methods is a proper superset of the class usable by predictive parsers: LL(1) grammars ⊂ LR(1) grammars;
    - an LR parser can detect a syntactic error as soon as it is possible to do so in a left-to-right scan of the input.
• LR parsers can be implemented in three ways:
    1. Simple LR (SLR): the easiest to implement, but the least powerful of the three.
    2. Canonical LR (CLR): the most powerful and the most expensive.
    3. Lookahead LR (LALR): intermediate between the other two; it works for most programming-language grammars.

Disadvantages of LR parsers

1. Detecting a handle is an overhead; a parser generator is used.
2. The main problem is finding the handle on the stack and replacing it with the non-terminal on the left-hand side of the production.

The LR parsing algorithm

• It consists of an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
• The driver program is the same for all of these LR parsers; only the parsing table changes from one parser to another.

    Input:  a1 ... ai ... an $
                   |
    Stack:         v
     Sm       +------------+
     xm       | LR parsing |--> Output
     ...      |  program   |
     S1       +------------+
     x1          |      |
     S0       Action  Goto
              table   table

Stack: It stores a string of the form S0x1S1...xmSm, where each Si is a state and each xi is a grammar symbol. Each state symbol summarizes the information contained in the stack below it.

Parsing table: The parsing table consists of two parts:
1. the action part, and
2. the goto part.

ACTION part: Let Sm be the state on top of the stack and ai the current input symbol. Then action[Sm, ai] can have one of four values:
1. shift S, where S is a state;
2. reduce by a grammar production A → β;
3. accept;
4. error.

GOTO part: If goto(S, A) = X, where S is a state and A a non-terminal, then GOTO maps state S and non-terminal A to state X.

Configuration

    (S0x1S1x2S2...xmSm, aiai+1...an$)

The next move of the parser is based on action[Sm, ai]. The resulting configurations are as follows:

1. If action[Sm, ai] = shift S, the parser shifts:
       (S0x1S1...xmSm aiS, ai+1...an$)
2. If action[Sm, ai] = reduce A → β, with r = |β|, the parser pops r symbol-state pairs and pushes A:
       (S0x1S1...xm-rSm-r AS, aiai+1...an$)
   where S = goto[Sm-r, A].
3. If action[Sm, ai] = accept, parsing is complete.
4. If action[Sm, ai] = error, the parser calls an error recovery routine.
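The four configuration moves translate into a short driver loop; a sketch follows, before the worked example below. The ACTION/GOTO tables are assumed to be supplied externally (here hand-built for the tiny grammar 1: S → (S), 2: S → a, so that the demo stays small); entry formats and names are conventions of this sketch.

    def lr_parse(action, goto, productions, tokens):
        tokens = tokens + ["$"]
        stack = [0]                      # state stack; the grammar symbols
        i = 0                            # are implicit in the states
        while True:
            s, a = stack[-1], tokens[i]
            entry = action.get((s, a))
            if entry == "acc":
                return "accepted"
            if entry is None:
                raise SyntaxError(f"error in state {s} on {a!r}")
            kind, arg = entry
            if kind == "s":              # shift: push the new state
                stack.append(arg)
                i += 1
            else:                        # reduce by A -> beta
                head, body_len = productions[arg]
                del stack[len(stack) - body_len:]
                stack.append(goto[(stack[-1], head)])

    # Hand-built tables for: 1: S -> (S)   2: S -> a
    ACTION = {(0, "("): ("s", 2), (0, "a"): ("s", 3), (1, "$"): "acc",
              (2, "("): ("s", 2), (2, "a"): ("s", 3),
              (3, ")"): ("r", 2), (3, "$"): ("r", 2),
              (4, ")"): ("s", 5),
              (5, ")"): ("r", 1), (5, "$"): ("r", 1)}
    GOTO = {(0, "S"): 1, (2, "S"): 4}
    PRODS = {1: ("S", 3), 2: ("S", 1)}   # (head, length of body)

    print(lr_parse(ACTION, GOTO, PRODS, list("((a))")))   # accepted

Given the expression-grammar table of the next example instead, the same loop reproduces the id*id+id trace shown there.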
Example: The parsing table for the following grammar is shown below:

    1. E → E + T    2. E → T
    3. T → T * F    4. T → F
    5. F → (E)      6. F → id

             Action                              Goto
    State    id    +     *     (     )     $    E    T    F
    0        S5                S4               1    2    3
    1              S6                      acc
    2              r2    S7          r2    r2
    3              r4    r4          r4    r4
    4        S5                S4               8    2    3
    5              r6    r6          r6    r6
    6        S5                S4                    9    3
    7        S5                S4                         10
    8              S6                S11
    9              r1    S7          r1    r1
    10             r3    r3          r3    r3
    11             r5    r5          r5    r5

The moves of the LR parser on the input string id*id+id are shown below. (Here 'reduce 6' means reduce by the 6th production, F → id; after each reduction the goto entry gives the state to push.)

    Stack             Input          Action
    0                 id*id+id$      shift 5
    0 id 5            *id+id$        reduce 6 (F → id); goto[0, F] = 3
    0 F 3             *id+id$        reduce 4 (T → F); goto[0, T] = 2
    0 T 2             *id+id$        shift 7
    0 T 2 * 7         id+id$         shift 5
    0 T 2 * 7 id 5    +id$           reduce 6 (F → id); goto[7, F] = 10
    0 T 2 * 7 F 10    +id$           reduce 3 (T → T*F); goto[0, T] = 2
    0 T 2             +id$           reduce 2 (E → T); goto[0, E] = 1
    0 E 1             +id$           shift 6
    0 E 1 + 6         id$            shift 5
    0 E 1 + 6 id 5    $              reduce 6 (F → id); goto[6, F] = 3
    0 E 1 + 6 F 3     $              reduce 4 (T → F); goto[6, T] = 9
    0 E 1 + 6 T 9     $              reduce 1 (E → E + T); goto[0, E] = 1
    0 E 1             $              accept

Constructing the SLR parsing table

LR(0) item: An LR(0) item of a grammar G is a production of G with a dot at some position on the right side.

Example: For A → BCD, the possible LR(0) items are:

    A → .BCD
    A → B.CD
    A → BC.D
    A → BCD.

A → B.CD means we have seen an input string derivable from B and hope to see a string derivable from CD.

The LR(0) items are organized as a DFA over the grammar symbols that recognizes viable prefixes; the individual items can be viewed as the states of an NFA. The LR(0) items (the canonical LR(0) collection) provide the basis for constructing the SLR parser. To construct the LR(0) items, define:
(a) an augmented grammar, and
(b) the closure and goto operations.

Augmented grammar (G′): If G is a grammar with start symbol S, then G′, the augmented grammar for G, is G with a new start symbol S′ and the added production S′ → S. The purpose of G′ is to indicate when to stop parsing and announce acceptance of the input.

Closure operation: Closure(I) is built as follows:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add B → .γ to I.

Goto operation: Goto(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I.

Items

Kernel items: S′ → .S and all items whose dots are not at the left end.
Non-kernel items: items which have their dots at the left end.

Construction of the sets of items

    procedure items(G′)
    begin
        C := closure({[S′ → .S]});
        repeat
            for each set of items I in C and each grammar symbol X
            such that goto(I, X) is not empty and not in C do
                add goto(I, X) to C;
        until no more sets of items can be added to C;
    end
Example: The LR(0) items for the grammar

    E′ → E
    E → E + T/T
    T → T * F/F
    F → (E)/id

are given below:

    I0: E′ → .E              I1: goto(I0, E)         I2: goto(I0, T)
        E → .E + T               E′ → E.                 E → T.
        E → .T                   E → E. + T              T → T. * F
        T → .T * F
        T → .F               I3: goto(I0, F)         I5: goto(I0, id)
        F → .(E)                 T → F.                  F → id.
        F → .id

    I4: goto(I0, ( )         I6: goto(I1, +)         I7: goto(I2, *)
        F → (.E)                 E → E + .T              T → T * .F
        E → .E + T               T → .T * F              F → .(E)
        E → .T                   T → .F                  F → .id
        T → .T * F               F → .(E)
        T → .F                   F → .id
        F → .(E)
        F → .id

    I8: goto(I4, E)          I9: goto(I6, T)         I10: goto(I7, F)
        F → (E.)                 E → E + T.               T → T * F.
        E → E. + T               T → T. * F
                                                     I11: goto(I8, ))
                                                          F → (E).

For viable prefixes, the DFA has the transitions:

    I0 -E-> I1, I0 -T-> I2, I0 -F-> I3, I0 -(-> I4, I0 -id-> I5;
    I1 -+-> I6; I2 -*-> I7;
    I4 -E-> I8, I4 -T-> I2, I4 -F-> I3, I4 -(-> I4, I4 -id-> I5;
    I6 -T-> I9, I6 -F-> I3, I6 -(-> I4, I6 -id-> I5;
    I7 -F-> I10, I7 -(-> I4, I7 -id-> I5;
    I8 -)-> I11, I8 -+-> I6; I9 -*-> I7.

SLR parsing table construction

1. Construct the canonical collection of sets of LR(0) items for G′.
2. Create the parsing action table as follows:
   (a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to 'shift j'. Here a must be a terminal.
   (b) If [A → α.] is in Ii, then set action[i, a] to 'reduce A → α' for all a in FOLLOW(A).
   (c) If [S′ → S.] is in Ii, then set action[i, $] to 'accept'.
3. Create the parsing goto table for all non-terminals A: if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by steps 2 and 3 are made errors.
5. The initial state of the parser is the one containing the item S′ → .S.

The parsing table constructed using the above algorithm is known as the SLR(1) table for G.

Note: Every SLR(1) grammar is unambiguous, but not every unambiguous grammar is SLR(1).
Example 6: Construct the SLR parsing table for the following grammar:

    1. S → L = R
    2. S → R
    3. L → *R
    4. L → id
    5. R → L

Solution: For the construction of the SLR parsing table, add the production S′ → S:

    S′ → S, S → L = R, S → R, L → *R, L → id, R → L

The LR(0) items are:

    I0: S′ → .S              I1: goto(I0, S)         I2: goto(I0, L)
        S → .L = R               S′ → S.                 S → L. = R
        S → .R                                           R → L.
        L → .*R              I3: goto(I0, R)
        L → .id                  S → R.              I5: goto(I0, id)
        R → .L                                           L → id.

    I4: goto(I0, *)          I6: goto(I2, =)         I7: goto(I4, R)
        L → *.R                  S → L = .R              L → *R.
        R → .L                   R → .L
        L → .*R                  L → .*R             I8: goto(I4, L)
        L → .id                  L → .id                 R → L.

    I9: goto(I6, R)
        S → L = R.

DFA transitions: I0 -S-> I1, I0 -L-> I2, I0 -R-> I3, I0 -*-> I4, I0 -id-> I5; I2 -=-> I6; I4 -R-> I7, I4 -L-> I8, I4 -*-> I4, I4 -id-> I5; I6 -R-> I9, I6 -L-> I8, I6 -*-> I4, I6 -id-> I5.

The FOLLOW sets are:

    FOLLOW (S) = {$}
    FOLLOW (L) = {=, $}
    FOLLOW (R) = {=, $}

The SLR table:

             Action                        Goto
    State    =        *     id    $       S    L    R
    0                 S4    S5            1    2    3
    1                             acc
    2        S6, r5               r5
    3                             r2
    4                 S4    S5                 8    7
    5        r4                   r4
    6                 S4    S5                 8    9
    7        r3                   r3
    8        r5                   r5
    9                             r1

For action[2, =] we get both S6 and r5.
∴ There is a shift-reduce conflict, so the grammar is not SLR(1).

Canonical LR Parsing (CLR)

• To avoid some invalid reductions, the states need to carry more information.
• The extra information is put into a state by including a terminal symbol as a second component of each item.
• The general form of an item is

    [A → α.β, a]

where A → αβ is a production and a is a terminal or the right-end marker ($). We call it an LR(1) item.

LR(1) item: It is a combination of an LR(0) item along with the lookahead of the item. Here 1 refers to the one symbol of lookahead.

Construction of the sets of LR(1) items

    function closure(I):
    begin
        repeat
            for each item [A → α.Bβ, a] in I,
                each production B → γ in G′,
                and each terminal b in FIRST(βa)
                such that [B → .γ, b] is not in I do
                    add [B → .γ, b] to I;
        until no more items can be added to I;
    end
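The only new ingredient relative to LR(0) closure is the FIRST(βa) computation that produces the lookaheads. The sketch below implements LR(1) closure with items of the form (head, body, dot, lookahead); the small first_of helper assumes no variable derives the empty string, which holds for the S → CC, C → cC/d grammar worked in the next example.

    GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}

    def first_of(symbols):
        # FIRST of a symbol string; assumes no epsilon-deriving variables
        sym = symbols[0]
        if sym not in GRAMMAR:
            return {sym}
        return set().union(*(first_of(body) for body in GRAMMAR[sym]))

    def closure(items):
        items = set(items)
        work = list(items)
        while work:
            head, body, dot, la = work.pop()
            if dot < len(body) and body[dot] in GRAMMAR:
                for b in first_of(body[dot + 1:] + (la,)):   # FIRST(beta a)
                    for prod in GRAMMAR[body[dot]]:
                        item = (body[dot], prod, 0, b)
                        if item not in items:
                            items.add(item)
                            work.append(item)
        return items

    i0 = closure({("S'", ("S",), 0, "$")})
    for item in sorted(i0):
        print(item)
    # Reproduces I0 of Example 7 below:
    #   S' -> .S, $    S -> .CC, $    C -> .cC, c/d    C -> .d, c/d

Each distinct lookahead is stored as a separate item here; the c/d notation used in the worked example is just shorthand for that pair of items.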
Example 7: Construct the CLR parsing table for the following grammar:

    S′ → S
    S → CC
    C → cC/d

Solution: The initial set of items starts from

    S′ → .S, $
    S → .CC, $

For the item [S → .CC, $], matching [A → α.Bβ, a] gives B = C, β = C and a = $.
FIRST(βa) = FIRST(C$) = FIRST(C) = {c, d}.
So we add the items [C → .cC, c/d] and [C → .d, c/d].

∴ The first set is

    I0: S′ → .S, $           I1: goto(I0, S)         I2: goto(I0, C)
        S → .CC, $               S′ → S., $              S → C.C, $
        C → .cC, c/d                                     C → .cC, $
        C → .d, c/d                                      C → .d, $

    I3: goto(I0, c)          I4: goto(I0, d)         I5: goto(I2, C)
        C → c.C, c/d             C → d., c/d             S → CC., $
        C → .cC, c/d
        C → .d, c/d          I7: goto(I2, d)         I8: goto(I3, C)
                                 C → d., $               C → cC., c/d
    I6: goto(I2, c)
        C → c.C, $           I9: goto(I6, C)
        C → .cC, $               C → cC., $
        C → .d, $

The CLR table is:

             Action                Goto
    State    c     d     $         S    C
    0        S3    S4              1    2
    1                    acc
    2        S6    S7                   5
    3        S3    S4                   8
    4        r3    r3
    5                    r1
    6        S6    S7                   9
    7                    r3
    8        r2    r2
    9                    r2

Consider the derivation of the string 'dcd':

    S ⇒ CC ⇒ CcC ⇒ Ccd ⇒ dcd

    Stack              Input    Action
    0                  dcd$     shift 4
    0 d 4              cd$      reduce C → d; goto[0, C] = 2
    0 C 2              cd$      shift 6
    0 C 2 c 6          d$       shift 7
    0 C 2 c 6 d 7      $        reduce C → d; goto[6, C] = 9
    0 C 2 c 6 C 9      $        reduce C → cC; goto[2, C] = 5
    0 C 2 C 5          $        reduce S → CC; goto[0, S] = 1
    0 S 1              $        accept
Example 8: Construct the CLR parsing table for the grammar:

    S → L = R
    S → R
    L → *R
    L → id
    R → L

Solution: The canonical set of LR(1) items is:

    I0: S′ → .S, $               I1: goto(I0, S)
        S → .L = R, $                S′ → S., $
        S → .R, $
        L → .*R, =/$             I2: goto(I0, L)
        L → .id, =/$                 S → L. = R, $
        R → .L, $                    R → L., $

(for the L-items, FIRST(= R $) = {=} contributes = and the closure of R → .L, $ contributes $)

    I3: goto(I0, R)              I4: goto(I0, *)
        S → R., $                    L → *.R, =/$
                                     R → .L, =/$
    I5: goto(I0, id)                 L → .*R, =/$
        L → id., =/$                 L → .id, =/$

    I6: goto(I2, =)              I7: goto(I4, R)
        S → L = .R, $                L → *R., =/$
        R → .L, $
        L → .*R, $               I8: goto(I4, L)
        L → .id, $                   R → L., =/$

    I9: goto(I6, R)              I10: goto(I6, L)
        S → L = R., $                 R → L., $

    I11: goto(I6, *)             I12: goto(I6, id)
         L → *.R, $                   L → id., $
         R → .L, $
         L → .*R, $              I13: goto(I11, R)
         L → .id, $                   L → *R., $

(also goto(I4, *) = I4, goto(I4, id) = I5, goto(I11, L) = I10, goto(I11, *) = I11, goto(I11, id) = I12)

In this construction we get 14 states, I0 to I13. The shift-reduce conflict of the SLR parser is resolved here: in state 2, action[2, =] = S6 while R → L. is reduced only on $.

With productions numbered 1. S → L = R, 2. S → R, 3. L → *R, 4. L → id, 5. R → L, the CLR parsing table is:

             Action                         Goto
    State    id     *      =     $         S    L    R
    0        S5     S4                     1    2    3
    1                            acc
    2                     S6     r5
    3                            r2
    4        S5     S4                          8    7
    5                     r4     r4
    6        S12    S11                         10   9
    7                     r3     r3
    8                     r5     r5
    9                            r1
    10                           r5
    11       S12    S11                         10   13
    12                           r4
    13                           r3

The moves on input id = id:

    Stack                Input     Action
    0                    id=id$    shift 5
    0 id 5               =id$      reduce L → id; goto[0, L] = 2
    0 L 2                =id$      shift 6
    0 L 2 = 6            id$       shift 12
    0 L 2 = 6 id 12      $         reduce L → id; goto[6, L] = 10
    0 L 2 = 6 L 10       $         reduce R → L; goto[6, R] = 9
    0 L 2 = 6 R 9        $         reduce S → L = R; goto[0, S] = 1
    0 S 1                $         accept

Every SLR(1) grammar is an LR(1) grammar. A CLR(1) parser has more states than the corresponding SLR parser.

LALR Parsing Table

• The tables obtained by this method are considerably smaller than the canonical LR tables.
• LALR stands for Lookahead LR.
• The numbers of states in the SLR and LALR parsing tables for a grammar G are equal.
• But LALR parsers recognize more grammars than SLR parsers.
• YACC creates an LALR parser for the given grammar.
• YACC stands for 'Yet Another Compiler-Compiler'.

An easy, but space-consuming, LALR table construction is the following:

1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
2. Find all sets having a common core and replace these sets by their union.
3. Let C′ = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. If there is a parsing-action conflict, the grammar is not LALR(1).
4. The gotos merge as well: if K is the union of all sets of items having the same core as goto(I, X), then goto(J, X) = K for the merged set J.

• If there are no parsing-action conflicts, the grammar is said to be LALR(1).
• The collection of item sets so constructed is called the LALR(1) collection.
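The "merge states with a common core" step drops lookaheads to compute the core, then unions sets that share it. A minimal sketch, using the (head, body, dot, lookahead) item format of the earlier LR(1) sketch and showing only the kernel items of the states merged in the next example:

    def core(items):
        # the LR(0) core: the items with their lookaheads stripped
        return frozenset((h, b, d) for h, b, d, la in items)

    def merge_by_core(sets_of_items):
        merged = {}
        for items in sets_of_items:
            merged.setdefault(core(items), set()).update(items)
        return list(merged.values())

    # Kernels of I3 and I6 from Example 7: same core, different lookaheads.
    i3 = {("C", ("c", "C"), 1, "c"), ("C", ("c", "C"), 1, "d")}
    i6 = {("C", ("c", "C"), 1, "$")}
    print(merge_by_core([i3, i6]))
    # one merged set: C -> c.C with lookaheads {c, d, $}, i.e., I36 below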
Example 9: Construct the LALR parsing table for the following grammar:

    S′ → S
    S → CC
    C → cC/d

Solution: We already obtained the LR(1) items and the CLR parsing table for this grammar in Example 7. After merging, I3 and I6 are replaced by I36:

    I36: C → c.C, c/d/$
         C → .cC, c/d/$
         C → .d, c/d/$

I47 (merging I4 and I7):

    C → d., c/d/$

I89 (merging I8 and I9):

    C → cC., c/d/$

The LALR parsing table for this grammar is given below:

             Action                  Goto
    State    c      d      $         S    C
    0        S36    S47              1    2
    1                      acc
    2        S36    S47                   5
    36       S36    S47                   89
    47       r3     r3     r3
    5                      r1
    89       r2     r2     r2

Example: Consider the grammar:

    S′ → S
    S → aAd
    S → bBd
    S → aBe
    S → bAe
    A → c
    B → c

which generates the strings acd, bcd, ace and bce. Its LR(1) items include:

    I0: S′ → .S, $               I1: goto(I0, S)
        S → .aAd, $                  S′ → S., $
        S → .bBd, $
        S → .aBe, $              I2: goto(I0, a)
        S → .bAe, $                  S → a.Ad, $
                                     S → a.Be, $
    I3: goto(I0, b)                  A → .c, d
        S → b.Bd, $                  B → .c, e
        S → b.Ae, $
        A → .c, e                I4: goto(I2, A)
        B → .c, d                    S → aA.d, $

    I5: goto(I2, B)              I6: goto(I2, c)
        S → aB.e, $                  A → c., d
                                     B → c., e
    I7: goto(I3, c)
        A → c., e                I8: goto(I4, d)
        B → c., d                    S → aAd., $

    I9: goto(I5, e)
        S → aBe., $

If we union I6 and I7 (they have the same core), we get

    A → c., d/e
    B → c., d/e

which generates a reduce/reduce conflict: on both d and e the parser could reduce by A → c or by B → c. So this grammar is LR(1) but not LALR(1).

Notes:
1. Merging states with common cores can never introduce a new shift/reduce conflict, because shift actions depend only on the core, not on the lookahead.
2. The SLR and LALR tables for a grammar always have the same number of states (several hundred for a typical programming language), whereas the CLR table has thousands of states for the same grammar.

Comparison of parsing methods

    Method      Item used     Goto and closures        Grammar class
    SLR (1)     LR(0) item    Different from LR(1)     SLR(1) ⊂ LR(1)
    CLR (1)     LR(1) item    -                        LR(1): the largest class of LR grammars
    LALR (1)    LR(1) item    Same as LR(1)            LALR(1) ⊂ LR(1)

The grammar classes recognized by these parsers nest as follows:

    LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ CLR(1)

and the LL(1) grammars are also contained within the LR(1) (CLR) class. Every LR(0) grammar is SLR(1), but the converse is not true.

Difference between SLR, LALR and CLR parsers

The differences among SLR, LALR and CLR parsers are summarized below in terms of size, method, error detection, and time and space complexity.
Table 1  Comparison of parsing methods

    Sl. No.  Factor                SLR Parser                LALR Parser                CLR Parser
    1        Size                  Smaller                   Smaller                    Larger
    2        Method                Based on the FOLLOW       Applicable to a wider      The most powerful of the
                                   function                  class than SLR             three
    3        Syntactic features    Fewer expressed than in   Most of them are           Fewer expressed
                                   the other LR parsers      expressed
    4        Error detection       Not immediate             Not immediate              Immediate
    5        Time and space        Less time and space       More time and space        More time and space
             complexity

Exercises

Practice Problems 1

Directions for questions 1 to 15: Select the correct alternative from the given choices.

1. Consider the grammar
   S → a
   S → ab
   The given grammar is:
   (A) LR (1) only
   (B) LL (1) only
   (C) Both LR (1) and LL (1)
   (D) LR (1) but not LL (1)

2. Which of the following is an unambiguous grammar that is not LR (1)?
   (A) S → Uab | Vac
       U → d
       V → d
   (B) S → Uab/Vab/Vac
       U → d
       V → d
   (C) S → AB
       A → a
       B → b
   (D) S → Ab
       A → a/c

Common data for questions 3 and 4: Consider the grammar:
   S → T; S/∈
   T → UR
   U → x/y/[S]
   R → .T/∈

3. Which of the following are correct FIRST and FOLLOW sets for the above grammar?
   (i) FIRST (S) = FIRST (T) = FIRST (U) = {x, y, [, e}
   (ii) FIRST (R) = {., e}
   (iii) FOLLOW (S) = {], $}
   (iv) FOLLOW (T) = FOLLOW (R) = {;}
   (v) FOLLOW (U) = {., ;}
   (A) (i) and (ii) only
   (B) (ii), (iii), (iv) and (v) only
   (C) (ii), (iii) and (iv) only
   (D) All the five

4. If an LL (1) parsing table is constructed for the above grammar, the parsing table entry for M[S, [ ] is
   (A) S → T; S    (B) S → ∈
   (C) T → UR      (D) U → [S]

Common data for questions 5 to 7: Consider the augmented grammar
   S → X
   X → (X)/a

5. If a DFA is constructed for the LR (1) items of the above grammar, then the number of states present in it is:
   (A) 8    (B) 9
   (C) 7    (D) 10

6. The given grammar is
   (A) Only LR (1)
   (B) Only LL (1)
   (C) Both LR (1) and LL (1)
   (D) Neither LR (1) nor LL (1)

7. What is the number of shift-reduce steps for the input (a)?
   (A) 15    (B) 14
   (C) 13    (D) 16

8. Consider the following two sets of LR (1) items of a grammar:
   X → c.X, c/d        X → c.X, $
   X → .cX, c/d        X → .cX, $
   X → .d, c/d         X → .d, $
   Which of the following statements related to merging of the two sets in the corresponding LALR parser is/are FALSE?
   1. Cannot be merged since the lookaheads are different.
   2. Can be merged, but will result in an S-R conflict.
   3. Can be merged, but will result in an R-R conflict.
   4. Cannot be merged since goto on c will lead to two different sets.
   (A) 1 only           (B) 2 only
   (C) 1 and 4 only     (D) 1, 2, 3 and 4

9. Which of the following grammar rules violate the requirements of an operator grammar?
   (i) A → BcC     (ii) A → dBC
   (iii) A → C/∈   (iv) A → cBdC
   (A) (i) only             (B) (i) and
   (C) (ii) and (iii) only  (D) (i) and (iv) only

10. The FIRST and FOLLOW sets for the grammar S → SS + /SS*/a are:
   (A) FIRST (S) = {a}
       FOLLOW (S) = {+, *, $}
   (B) FIRST (S) = {+}
       FOLLOW (S) = {+, *, $}
   (C) FIRST (S) = {a}
       FOLLOW (S) = {+, *}
   (D) FIRST (S) = {+, *}
       FOLLOW (S) = {+, *, $}

11. A shift-reduce parser carries out the actions specified within braces immediately after reducing with the corresponding rule of the grammar:
   S → xxW [print '1']
   S → y [print '2']
   W → Sz [print '3']
   What is the translation of 'x x x x y z z'?
   (A) 1231    (B) 1233
   (C) 2131    (D) 2321

12. After constructing the predictive parsing table for the following grammar:
   Z → d
   Z → XYZ
   Y → c/∈
   X → Y
   X → a
   The entry/entries for [Z, d] is/are
   (A) Z → d
   (B) Z → XYZ
   (C) Both (A) and (B)
   (D) X → Y

13. The following grammar is
   S → AaAb/BbBa
   A → ∈
   B → ∈
   (A) LL (1)       (B) Not LL (1)
   (C) Recursive    (D) Ambiguous

14. Compute FIRST (P) for the grammar below:
   P → AQRbe/mn/DE
   A → ab/∈
   Q → q1q2/∈
   R → r1r2/∈
   D → d
   E → e
   (A) {m, a}                  (B) {m, a, q1, r1, b, d}
   (C) {d, e}                  (D) {m, n, a, b, d, e, q1, r1}

15. After constructing the LR (1) parsing table for the augmented grammar
   S′ → S
   S → BB
   B → aB/c
   What will be the action [I3, a]?
   (A) Accept    (B) S7
   (C) r2        (D) S5

Practice Problems 2

Directions for questions 1 to 19: Select the correct alternative from the given choices.

1. Consider the grammar
   S → aSb
   S → aS
   S → e
   This grammar is ambiguous in generating which of the following strings?
   (A) aa     (B) ∈
   (C) aaa    (D) aab

2. To convert the grammar E → E + T into an LL grammar:
   (A) use left factoring
   (B) convert to CNF form
   (C) eliminate left recursion
   (D) Both (B) and (C)

3. Given the following expressions of a grammar
   E → E × F/F + E/F
   F → F? F/id
   Which of the following is true?
   (A) × has higher precedence than +
   (B) ? has higher precedence than ×
   (C) + and ? have the same precedence
   (D) + has higher precedence than ×

4. The action of parsing the source program into proper syntactic classes is known as
   (A) lexical analysis
   (B) syntax analysis
   (C) interpretation analysis
   (D) parsing

5. Which of the following is not a bottom-up parser?
   (A) LALR    (B) Predictive parser
   (C) CLR     (D) SLR

6. A system program that combines separately compiled modules of a program into a form suitable for execution is
   (A) an assembler
   (B) a linking loader
   (C) a cross compiler
   (D) None of these

7. Resolution of externally defined symbols is performed by
   (A) a linker      (B) a loader
   (C) a compiler    (D) an interpreter

8. LR parsers are attractive because
   (A) they can be constructed to recognize CFGs corresponding to almost all programming constructs
   (B) there is no need for backtracking
   (C) Both (A) and (B)
   (D) None of these

9. YACC builds up a
   (A) SLR parsing table
   (B) canonical LR parsing table
   (C) LALR parsing table
   (D) None of these

10. Languages which have many types, but in which the type of every name and expression must be calculated at compile time, are
   (A) strongly typed languages
   (B) weakly typed languages
   (C) loosely typed languages
   (D) None of these

11. Consider the grammar shown below:
   S → iEtSS′/a/b
   S′ → eS/∈
   In the predictive parse table M of this grammar, the entries M[S′, e] and M[S′, $] respectively are
   (A) {S′ → eS} and {S′ → ∈}
   (B) {S′ → eS} and { }
   (C) {S′ → ∈} and {S′ → ∈}
   (D) {S′ → eS, S′ → ∈} and {S′ → ∈}

12. Consider the grammar S → CC, C → cC/d. The grammar is
   (A) LL (1)
   (B) SLR (1) but not LL (1)
   (C) LALR (1) but not SLR (1)
   (D) LR (1) but not LALR (1)

13. Consider the grammar
   E → E + n/E - n/n
   For the sentence n + n - n, the handles in the right sentential forms of the reduction are
   (A) n, E + n and E + n - n
   (B) n, E + n and E + E - n
   (C) n, n + n and n + n - n
   (D) n, E + n and E - n

14. A top-down parser uses ___ derivation.
   (A) leftmost derivation
   (B) leftmost derivation in reverse
   (C) rightmost derivation
   (D) rightmost derivation in reverse

15. Which of the following statements is false?
   (A) An unambiguous grammar has a single leftmost derivation (for each sentence).
   (B) An LL (1) parser is top-down.
   (C) LALR is more powerful than SLR.
   (D) An ambiguous grammar can never be LR (k) for any k.

16. Merging states with a common core may produce ___ conflicts in an LALR parser.
   (A) reduce-reduce
   (B) shift-reduce
   (C) Both (A) and (B)
   (D) None of these

17. An LL (k) grammar
   (A) has to be a CFG
   (B) has to be unambiguous
   (C) cannot have left recursion
   (D) All of these

18. The I0 state of the LR (0) items for the grammar S → AS/b, A → SA/a is
   (A) S′ → .S
       S → .AS
       S → .b
       A → .SA
       A → .a
   (B) S → .AS
       S → .b
       A → .SA
       A → .a
   (C) S → .AS
       S → .b
   (D) S → A
       A → .SA
       A → .a

19. In the predictive parsing table for the grammar:
   S → FR
   R → ×S/∈
   F → id
   What will be the entry for [S, id]?
   (A) S → FR
   (B) F → id
   (C) Both (A) and (B)
   (D) None of these

Previous Years' Questions

1. Consider the grammar:
   S → (S) | a
   Let the number of states in the SLR (1), LR (1) and LALR (1) parsers for the grammar be n1, n2 and n3 respectively. The following relationship holds good: [2005]
   (A) n1 < n2 < n3    (B) n1 = n3 < n2
   (C) n1 = n2 = n3    (D) n1 ≥ n3 ≥ n2

2. Consider the following grammar:
   S → S * E
   S → E
   E → F + E
   E → F
   F → id
   Consider the following LR (0) items corresponding to the grammar above:
   (i) S → S * .E
   (ii) E → F. + E
   (iii) E → F + .E
   Given the items above, which two of them will appear in the same set in the canonical sets-of-items for the grammar? [2006]
   (A) (i) and (ii)     (B) (ii) and (iii)
   (C) (i) and (iii)    (D) None of the above

3. Consider the following statements about the context-free grammar
   G = {S → SS, S → ab, S → ba, S → ∈}
   (i) G is ambiguous.
   (ii) G produces all strings with an equal number of a's and b's.
   (iii) G can be accepted by a deterministic PDA.
   Which combination below expresses all the true statements about G? [2006]
   (A) (i) only             (B) (i) and (iii) only
   (C) (ii) and (iii) only  (D) (i), (ii) and (iii)

4. Consider the following grammar:
   S → FR
   R → *S | ∈
   F → id
   In the predictive parser table M of the grammar, the entries M[S, id] and M[R, $] are respectively [2006]
   (A) {S → FR} and {R → ∈}
   (B) {S → FR} and { }
   (C) {S → FR} and {R → *S}
   (D) {F → id} and {R → ∈}

5. Which one of the following grammars generates the language L = {a^i b^j | i ≠ j}? [2006]
   (A) S → AC | CB
       C → aCb | a | b
       A → aA | ∈
       B → Bb | ∈
   (B) S → aS | Sb | a | b
   (C) S → AC | CB
       C → aC | b | ∈
       A → aA | ∈
       B → Bb | ∈
   (D) S → AC | CB
       C → aCb | ∈
       A → aA | a
       B → Bb | b

6. In the correct grammar above, what is the length of the derivation (number of steps starting from S) to generate the string a^l b^m with l ≠ m? [2006]
   (A) max (l, m) + 2
   (B) l + m + 2
   (C) l + m + 3
   (D) max (l, m) + 3

7. Which of the following problems is undecidable? [2007]
   (A) Membership problem for CFGs
   (B) Ambiguity problem for CFGs
   (C) Finiteness problem for FSAs
   (D) Equivalence problem for FSAs

8. Which one of the following is a top-down parser? [2007]
   (A) Recursive descent parser
   (B) Operator precedence parser
   (C) An LR (k) parser
   (D) An LALR (k) parser

9. Consider the grammar with non-terminals N = {S, C, S1}, terminals T = {a, b, i, t, e}, with S as the start symbol, and the following set of rules: [2007]
   S → iCtSS1 | a
   S1 → eS | ∈
   C → b
   The grammar is NOT LL (1) because:
   (A) it is left recursive
   (B) it is right recursive
   (C) it is ambiguous
   (D) it is not context-free

10. Consider the following two statements:
   P: Every regular grammar is LL (1).
   Q: Every regular set has an LR (1) grammar.
   Which of the following is TRUE? [2007]
   (A) Both P and Q are true
   (B) P is true and Q is false
   (C) P is false and Q is true
   (D) Both P and Q are false

Common data for questions 11 and 12: Consider the CFG with {S, A, B} as the non-terminal alphabet, {a, b} as the terminal alphabet, S as the start symbol and the following set of production rules:
   S → aB        S → bA
   B → b         A → a
   B → bS        A → aS
   B → aBB       A → bAA

11. Which of the following strings is generated by the grammar? [2007]
   (A) aaaabb    (B) aabbbb
   (C) aabbab    (D) abbbba

12. For the correct answer string to the previous question, how many derivation trees are there? [2007]
   (A) 1    (B) 2
   (C) 3    (D) 4

13. Which of the following describes a handle (as applicable to LR-parsing) appropriately? [2008]
   (A) It is the position in a sentential form where the next shift or reduce operation will occur.
   (B) It is a non-terminal whose production will be used for reduction in the next step.
   (C) It is a production that may be used for reduction in a future step, along with a position in the sentential form where the next shift or reduce operation will occur.
   (D) It is the production p that will be used for reduction in the next step, along with a position in the sentential form where the right-hand side of the production may be found.

14. Which of the following statements are true? [2008]
   (i) Every left-recursive grammar can be converted to a right-recursive grammar, and vice-versa.
   (ii) All ∈-productions can be removed from any context-free grammar by suitable transformations.
   (iii) The language generated by a context-free grammar all of whose productions are of the form X → w or X → wY (where w is a string of terminals and Y is a non-terminal) is always regular.
   (iv) The derivation trees of strings generated by a context-free grammar in Chomsky Normal Form are always binary trees.
   (A) (i), (ii), (iii) and (iv)    (B) (ii), (iii) and (iv) only
   (C) (i), (iii) and (iv) only     (D) (i), (ii) and (iv) only

15. An LALR (1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if [2008]
   (A) the SLR (1) parser for G has S-R conflicts
   (B) the LR (1) parser for G has S-R conflicts
   (C) the LR (0) parser for G has S-R conflicts
   (D) the LALR (1) parser for G has reduce-reduce conflicts

16. S → aSa | bSb | a | b
   The language generated by the above grammar over the alphabet {a, b} is the set of [2009]
   (A) all palindromes
   (B) all odd-length palindromes
   (C) strings that begin and end with the same symbol
   (D) all even-length palindromes

17. Which data structure in a compiler is used for managing information about variables and their attributes? [2010]
   (A) Abstract syntax tree
   (B) Symbol table
   (C) Semantic stack
   (D) Parse table

18. The grammar S → aSa | bS | c is [2010]
   (A) LL (1) but not LR (1)
   (B) LR (1) but not LL (1)
   (C) Both LL (1) and LR (1)
   (D) Neither LL (1) nor LR (1)

19. The lexical analysis for a modern computer language such as Java needs the power of which one of the following machine models in a necessary and sufficient sense? [2011]
   (A) Finite state automata
   (B) Deterministic pushdown automata
   (C) Non-deterministic pushdown automata
   (D) Turing machine

Common data for questions 20 and 21: For the grammar below, a partial LL (1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2, and E3. ∈ is the empty string, $ indicates end of input, and | separates alternate right-hand sides of productions.
   S → aAbB | bAaB | ∈
   A → S
   B → S

          a          b          $
   S      E1         E2         S → ∈
   A      A → S      A → S      error
   B      B → S      B → S      E3

20. The FIRST and FOLLOW sets for the non-terminals A and B are [2012]
   (A) FIRST (A) = {a, b, ∈} = FIRST (B)
       FOLLOW (A) = {a, b}
       FOLLOW (B) = {a, b, $}
   (B) FIRST (A) = {a, b, $}
       FIRST (B) = {a, b, ∈}
       FOLLOW (A) = {a, b}
       FOLLOW (B) = {$}
   (C) FIRST (A) = {a, b, ∈} = FIRST (B)
       FOLLOW (A) = {a, b}
       FOLLOW (B) = ∅
   (D) FIRST (A) = {a, b} = FIRST (B)
       FOLLOW (A) = {a, b}
       FOLLOW (B) = {a, b}

21. The appropriate entries for E1, E2, and E3 are [2012]
   (A) E1: S → aAbB, A → S
       E2: S → bAaB, B → S
       E3: B → S
   (B) E1: S → aAbB, S → ∈
       E2: S → bAaB, S → ∈
       E3: S → ∈
   (C) E1: S → aAbB, S → ∈
       E2: S → bAaB, S → ∈
       E3: B → S
   (D) E1: A → S, S → ∈
       E2: B → S, S → ∈
       E3: B → S

22. What is the maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon- and unit-productions (i.e., of type A → ∈ and A → a) to parse a string with n tokens? [2013]
   (A) n/2      (B) n - 1
   (C) 2n - 1   (D) 2^n

23. Which of the following is/are undecidable? [2013]
   (i) G is a CFG. Is L (G) = φ?
   (ii) G is a CFG. Is L (G) = Σ*?
   (iii) M is a Turing machine. Is L (M) regular?
   (iv) A is a DFA and N is an NFA. Is L (A) = L (N)?
   (A) (iii) only
   (B) (iii) and (iv) only
   (C) (i), (ii) and (iii) only
   (D) (ii) and (iii) only

24. Consider the following two sets of LR (1) items of an LR (1) grammar. [2013]
   X → c.X, c/d        X → c.X, $
   X → .cX, c/d        X → .cX, $
   X → .d, c/d         X → .d, $

26. Consider the grammar defined by the following production rules, with two operators * and +:
   S → T * P
   T → U | T * U
   P → Q + P | Q
   Q → Id
   U → Id
   Which one of the following is TRUE? [2014]
   (A) + is left associative, while * is right associative
   (B) + is right associative, while * is left associative
   (C) Both + and * are right associative
   (D) Both + and * are left associative

27. Which one of the following problems is undecidable? [2014]
   (A) Deciding if a given context-free grammar is ambiguous.
   (B) Deciding if a given string is generated by a given context-free grammar.
   (C) Deciding if the language generated by a given context-free grammar is empty.
   (D) Deciding if the language generated by a given context-free grammar is finite.

28. Which one of the following is TRUE at any valid state in shift-reduce parsing? [2015]
   (A) Viable prefixes appear only at the bottom of the stack and not inside.
   (B) Viable prefixes appear only at the top of the stack and not inside.


(C) The stack contains only a set of viable prefixes.
N

X → .cX, c/d X → .cX, $


(D) The stack never contains viable prefixes.
SE

X → .d, c/d X → .d, $


29. Among simple LR (SLR), canonical LR, and look-
Which of the following statements related to merging
C

ahead LR (LALR), which of the following pairs iden-


of the two sets in the corresponding LALR parser is/
TE

tify the method that is very easy to implement and the


are FALSE?
A

method that is the most powerful, in that order?


(i) Cannot be merged since look - ahead are different.
G

 [2015]
(ii) Can be merged but will result in S-R conflict.
D

(A) SLR, LALR


A

(iii) Can be merged but will result in R-R conflict. (B) Canonical LR, LALR
LO

(iv) Cannot be merged since goto on c will lead to (C) SLR, canonical LR


two different sets.
N

(D) LALR, canonical LR


W

(A) (i) only (B) (ii) only 30. Consider the following grammar G
O

(C) (i) and (iv) only (D) (i), (ii), (iii) and (iv)
S → F | H
D

25. A canonical set of items is given below F → p | c


S → L. > R
H → d | c
Q → R.
Where S, F and H are non-terminal symbols, p, d
On input symbol < the sset has [2014]
and c are terminal symbols. Which of the following
(A) A shift–reduce conflict and a reduce–reduce conflict.
statement(s) is/are correct? [2015]
(B) A shift–reduce conflict but not a reduce–reduce
conflict. S1. LL(1) can parse all strings that are generated
(C) A reduce–reduce conflict but not a shift reduce using grammar G
conflict. S2. LR(1) can parse all strings that are generated using
(D) Neither a shift–reduce nor a reduce–reduce conflict. grammar G
Chapter 1 • Lexical Analysis and Parsing | 6.25

(A) Only S1 (B) Only S2 (A) E − >E − T | T (B) E − > TE


(C) Both S1 and S2 (D) Neither S1 nor S2 T − > T + F | F E′ − > −TE | ∈
31. Match the following: [2016] F − > (E) | id T−>T+F|F
(P) Lexical analysis (i) Leftmost derivation F − > (E) | id
(Q) Top down parsing (ii) Type checking (C) E − > TX (D) E − > TX | (TX)
(R) Semantic analysis (iii) Regular expressions X − > −TX | ∈ X − > −TX | +TX | ∈
(S) Runtime environments (iv) Activation records T − > FY T − > id
(A) P ↔ i, Q ↔ ii, R ↔ iv, S ↔ iii Y − > + FY | ∈
(B) P ↔ iii, Q ↔ i, R ↔ ii, S ↔ iv F − > (E) | id
(C) P ↔ ii, Q ↔ iii, R ↔ i, S ↔ iv 36. Which one of the following statements is FALSE?

JW
(D) P ↔ iv, Q ↔ i, R ↔ ii, S ↔ iii  [2018]
(A) Context-free grammar can be used to specify

Pw
32. A student wrote two context - free grammars G1 and
G2 for generating a single C-like array declaration. both lexical and syntax rules.

Jf
The dimension of the array is at least one. (B) Type checking is done before parsing.

/2
(C) High-level language programs can be translated

ly
For example, int a [10] [3];
to different Intermediate Representations.

it.
The grammars use D as the start symbol, and use six (D) Arguments to a function can be passed using the

//b
terminal symbols int; id [ ] num. [2016] program stack.

s:
Grammar G1 Grammar G2
37. A lexical analyzer uses the following patterns to rec-

tp
D → int L; D → intL; ognize three tokens T1, T2, and T3 over the alphabet {a,

ht
L → id [E L → id E b, c}.

S
E → num ] E → E [num] T1: a?(b|c)*a
TE
E → num ] [E E → [num] T2: b?(a|c)*b
O

Which of the grammars correctly generate the decla- T3: c?(b|a)*c


N

ration mentioned above?


D

Note that ‘x?’ means 0 or 1 occurrence of the symbol


N

(A) Both G1 and G2 x. Note also that the analyzer outputs the token that
A

(B) Only G1 matches the longest possible prefix.


H

(C) Only G2 If the string bbaacabc is processed by the analyzer,


EW

(D) Neither G1 nor G2 which one of the following is the sequence of tokens
N

33. Consider the following grammar: it outputs? [2018]


SE

(A) T1T2T3 (B) T1T1T3


P → xQRS
C

(C) T2T1T3 (D) T3T3


Q → yz | z
TE

38. Consider the following parse tree for the expression


R→w |ε a#b$c$d#e#f, involving two binary operators $ and #.
A
G

S→y #
D
A

a #
What is FOLLOW (Q)? [2017]
LO

(A) {R} (B) {w}


$ #
N

(C) {w, y} (D) {w, $}


W

34. Which of the following statements about parser is/are $ d e f


O

CORRECT? [2017] b c
D

I. Canonical LR is more powerful than SLR.


II. SLR is more powerful than LALR. Which one of the following is correct for the given
III. SLR is more powerful than Canonical LR. parse tree? [2018]
(A) I only (B) II only (A) $ has higher precedence and is left associative; #
(C) III only (D) I and III only is right associative
(B) # has higher precedence and is left associative; $
35. Consider the following expression grammar G :
is right associative
E−>E−T|T
(C) $ has higher precedence and is left associative; #
T−>T+F|F
is left associative
F − > (E) | id
(D) # has higher precedence and is right associative;
Which of the following grammars is not left recur- $ is left associative
sive, but is equivalent to G? [2017]
6.26 | Unit 6 • Compiler Design

Answer Keys
Exercises
Practice Problems 1
1. D 2. A 3. B 4. A 5. D 6. C 7. C 8. D 9. C 10. A
11. C 12. C 13. A 14. B 15. D

Practice Problems 2
1. D 2. C 3. B 4. A 5. B 6. B 7. A 8. C 9. C 10. A
11. D 12. A 13. D 14. A 15. D 16. A 17. C 18. A 19. A

JW
Pw
Previous Years’ Questions
1. B 2. D 3. B 4. A 5. D 6. A 7. B 8. A 9. C 10. A

Jf
/2
11. C 12. B 13. D 14. C 15. B 16. B 17. B 18. C 19. A 20. A

ly
21. C 22. B 23. D 24. D 25. D 26. B 27. A 28. C 29. C 30. D

it.
31. B 32. A 33. C 34. A 35. C 36. B 37. D 38. A

//b
s:
tp
ht
S
TE
O
N
D
N
A
H
EW
N
SE
C
TE
A
G
D
A
LO
N
W
O
D
Chapter 2
Syntax Directed Translation
LEARNING OBJECTIVES

JW
 Syntax directed translation  S-attributed definition

Pw
 Syntax directed definition  L-attributed definitions

Jf
 Dependency graph  Synthesized attributes on the parser

/2
 Constructing syntax trees for expressions  Syntax directed translation schemes

ly
Types of SDD’s Bottom up evaluation of inherited attributes

it.
 

//b
s:
tp
ht
SyntAx directed trAnSlAtion
S
Notes:TE
1. Grammar symbols are associated with attributes.
To translate a programming language construct, a compiler may
O
2. Values of the attributes are evaluated by the semantic rules
need to know the type of construct, the location of the first instruc-
N

associated with production rules.


tion, and the number of instructions generated . . . etc. So, we have
D

to use the term ‘attributes’ associated with constructs.


N

An attribute may represent type, number of arguments, memory


A

Notations for Associating Semantic Rules


H

location, compatibility of variables used in a statement which can-


There are two techniques to associate semantic rules:
EW

not be represented by CFG alone.


So, we need to have one more phase to do this, i.e., ‘semantic
N

analysis’ phase. Syntax directed definition (SDD) It is high level specification for
SE

translation. They hide the implementation details, i.e., the order in


which translation takes place.
C

Semantic analysis Attributes + CFG + Semantic rules = Syntax directed definition


TE

Syntax Semantically checked


tree syntax tree (SDD).
A
G

In this phase, for each production CFG, we will give some seman- Translation schemes These schemes indicate the order in which
D
A

tic rule. semantic rules are to be evaluated. This is an input and output
LO

mapping.
N

Syntax directed translation scheme


W

A CFG in which a program fragment called output action (seman-


O

tic action or semantic rule) is associated with each production is SyntAx directed definitionS
D

known as Syntax Directed Translation Scheme. A SDD is a generalization of a CFG in which each grammar sym-
These semantic rules are used to bol is associated with a set of attributes.
There are two types of set of attributes for a grammar symbol.
1. Generate intermediate code.
2. Put information into symbol table. 1. Synthesized attributes
3. Perform type checking. 2. Inherited attributes
4. Issues error messages. Each production rule is associated with a set of semantic rules.
6.28 | Unit 6 • Compiler Design

Semantic rules setup dependencies between attributes Example: An inherited attribute distributes type informa-
which can be represented by a dependency graph. tion to the various identifiers in a declaration.
The dependency graph determines the evaluation order For the grammar
of these semantic rules. D → TL
Evaluation of a semantic rule defines the value of an
  T → int
attribute. But a semantic rule may also have some side
  T → real
effects such as printing a value.
  L → L1, id
Attribute grammar: An attribute grammar is a syntax   L → id
directed definition in which the functions in semantic rules
‘cannot have side effects’. That is, The keyword int or real followed by a list of

JW
identifiers.
Annotated parse tree: A parse tree showing the values of
In this T has synthesized attribute type: T.type. L has an
attributes at each node is called an annotated parse tree.

Pw
inherited attribute in L.in
The process of computing the attribute values at the
Rules associated with L call for procedure add type to the

Jf
nodes is called annotating (or decorating) of the parse tree.
type of each identifier to its entry in the symbol table.

/2
In a SDD, each production A → ∝ is associated with a

ly
set of semantic rules of the form:

it.
b = f (c1, c2,… cn) where Production Semantic Rule

//b
f : A function D → TL L.in = T.type

s:
b can be one of the following: T → int T.type = integer

tp
b is a ‘synthesized attribute’ of A and c1, c2,…cn are attrib-

ht
utes of the grammar symbols in A → ∝. T → real T.type = real
The value of a ‘synthesized attribute’ at a node is com-
S
L → L1, id
TE addtype L1.in = L.in(id.entry, L.in)
puted from the value of attributes at the children of that
L → id addtype (id.entry, L.in)
node in the parse tree.
O
N

Example: The annotated parse tree for the sentence real id1, id2, id3 is
D

shown below:
N

Production Semantic Rule


A

D
H

expr → expr1 + term expr.t: = expr1.t||term.t||’+’


EW

T ⋅type = real L ⋅in = real


expr → expr1 – term expr.t: = expr1.t||term.t||’-‘
N

L ⋅in = real , id3


expr → term expr.t: = term.t
SE

real
C

term → 0 term.t: = ‘0’ L .in = real , id2


TE

term → 1 term.t: = ‘1’


Id1
A
G
...

...
D

term → 9 term.t: = ‘9’


Synthesized Attribute
A
LO

The value of a synthesized attribute at a node is computed


expr⋅t = 95 − 2+
N

from the value of attributes at the children of that node in a


W

term⋅t = 2 parse tree. Consider the following grammar:


O

expr⋅t = 95 −
L → En
D

expr⋅t = 9 term⋅t = 5 E → E1 + T
E→T
term⋅t = 9
T → T1*F
T→F
9 − 5 + 2 F → (E)
b is an ‘inherited attribute’ of one of the grammar symbols F → digit.
on the right side of the production. Let us consider synthesized attribute value with each of the
An ‘inherited attribute’ is one whose value at a node is non-terminals E, T and F.
defined in terms of attributes at the parent and/or siblings of Token digit has a synthesized attribute lexical supplied
that node. It is used for finding the context in which it appears. by lexical analyzer.
Chapter 2 • Syntax Directed Translation | 6.29

Production Semantic Rule Dependency Graph


L → En print (E.val) The interdependencies among the attributes at the nodes
E → E1 + T E.val: = E1.val + T.val
in a parse tree can be depicted by a directed graph called
dependency graph.
E→T E.val: = T1.val
•• Synthesized attributes have edges pointing upwards.
T → T1*F T.val: = T1.val*F.val •• Inherited attributes have edges pointing downwards and/
T→F T.val: = F.val or sidewise.
F → (E) F.val: = E.val Example 1: A.a:= f (X.x, Y.y) is a semantic rule for A →
XY. For each semantic rule that consists of a procedure call:
F → digit F.val: = digit.lexval A⋅a

JW
Pw
X⋅x Y⋅y
The Annotated parse tree for the expression 5 + 3 * 4 is
shown below: Example 2:

Jf
val

/2
D E

ly
E⋅val = 17 return

it.
E2

//b
E⋅val = 5 + T⋅val = 12 E1 +
val val

s:
T⋅val = 5 T⋅val = 3 F ⋅val = 4 Example 3: real p, q;

tp
*

ht
L⋅in = real
F⋅val = 5 F⋅val = 3 digit⋅lexval = 4

S
TE T ⋅type = real add type (q⋅real)
digit⋅lexval = 5 L1⋅in = real
digit⋅lexval = 3
id⋅entry = q
O
add type (P⋅real)
Example 1: Consider an example, which shows semantic
N

rules for Infix to posfix translation:


D

id⋅entry = p
N
A

Production Semantic Rules


Evaluation order
H

expr → expr1 + term expr.t: = expr1.t||term.t||’+’ A topological sort of directed acyclic graph is an ordering
EW

expr → expr1 – term expr.t: = expr1.t||term.t ||‘-‘ m1, m2, . . . mk of nodes of the graph S. t edges go from nodes
N

expr → term expr.t: = term.t earlier in the ordering to later nodes.


SE

term → 0 term.t: = ‘0’ mi → mj means mi appears before mj in the ordering.


C

If b: = f (c1, c2, …, ck), the dependent attributes c1, c2,...ck are


...

...

TE

available at node before f is evaluated.


term→ 9 term.t := ‘9’
A

Abstract syntax tree


G

Example 2: Write a SDD for the following grammar to It is a condensed form of parse tree useful for representing
D

determine number.val.
A

language constructs.
LO

number → number digit digit.val := ‘0’ Example


digit→ 0|1| . . . 9 digit.val := ‘1’  if-then-else
N

 
W

 
   B S1 S1
O

digit.val = ‘9’ 
D

number.val:=number.val * 10 + digit.val Constructing Syntax Trees


Annotated tree for 131 is for Expressions
131 Each node in a syntax tree can be implemented as a record
with several fields.
number⋅val = 13∗10 + number⋅val = 1
In the node for an operator, one field identifies the opera-
number⋅val∗10 + number⋅val digit⋅val = 1
tor and the remaining fields contain pointers to the nodes for
the operands.
digit⋅val digit⋅val 1. mknode (op, left, right)
2. mkleaf (id, entry). Entry is a pointer to symbol table.
1 3 3. mkleaf (num, val)
6.30 | Unit 6 • Compiler Design

Example: State Val


Production Semantic Rules
E→E1+T E.nptr := mknode (‘+’, E1.nptr, T.nptr) Top → Z Z.z
E→E1 – T E.nptr := mknode (‘-‘, E1.nptr, T.nptr) Y Y.y
E→T E.nptr := T.nptr
X X.x
T→ (E) T.nptr := E.nptr
T→id T.nptr := mkleaf(id, id.entry)
T→num T.nptr := mkleaf(num, num.val) Example: Consider the following grammar:

Construction of a syntax tree for a – 4 + c S→E$ {print(E.val)}

JW
E⋅nptr E→E+E {E.val := E.val + E.val}

Pw
E → E*E {E.val := E.val * E.val}
E⋅nptr +
T⋅nptr E → (E) {E.val := E.val}

Jf
− + E→I {I.val := I.val * 10 + digit}

/2
E⋅nptr T⋅nptr id I → I digit

ly
num

it.
T⋅nptr I → digit {I.val := digit}

//b

id id

s:
Implementation
to entry for c

tp
id num 4
S→E$ print (val [top])

ht
to entry for a E→E+E val[ntop] := val[top] + val[top-2]

S
E → E*E
TE val[ntop] := val[top] * val[top-2]
typeS of Sdd’S E → (E) val[ntop] := val[top-1]
O

Syntax Directed definitions (SDD) are used to specify syn-


N

E→I val[ntop] := val[top]


tax directed translations. There are two types of SDD.
D

I → I digit val[ntop] := 10*val[top] + digit


N

1. S-Attributed Definitions
I → digit val[ntop] := digit
A

2. L-Attributed Definitions.
H
EW

S-attributed definitions L-attributed Definitions


• Only synthesized attributes used in syntax direct definition.
N

A syntax directed definition is L-attributed if each inherited


• S-attributed grammars interact well with LR (K) parsers
SE

attribute of Xj, 1≤ j ≤ n, on the right side of A → X1 X2…Xn,


since the evaluation of attributes is bottom-up. They do
depends only on
C

not permit dependency graphs with cycles.


TE

1. The attributes of symbols X1, X2, . . ., Xj-1 to the left of


L-attributed definitions
A

Xj in the production.
G

• Both inherited and synthesized attribute are used. 2. The inherited attributes of A.
D

• L-attributed grammar support the evaluation of attributes


A

associated with a production body, dependency–graph Every S-attributed definition is L-attributed, because the
LO

edges can go from left to right only. above two rules apply only to the inherited attributes.
N

• Each S-attributed grammar is also a L-attributed grammar.


W

• L-attributed grammars can be incorporated conveniently


O

in top down parsing.


SyntAx directed trAnSlAtion
D

• These grammars interact well with LL (K) parsers (both


table driven and recursive descent). SchemeS
A translation scheme is a CFG in which attributes are asso-
Synthesized Attributes on the ciated with grammar symbols and semantic actions are
Parser Stack enclosed between braces { } are inserted within the right
A translator for an S-attributed definition often be imple- sides of productions.
mented with LR parser generator. Here the stack is imple- Example: E → TR
mented by a pair of array state and val.
R → op T {print (op.lexeme)} R1|∈
• Each state entry is pointed to a LR (1) parsing table.
• Each val[i] holds the value of the attributes associated T → num {print (num.val)}
with the node. For A → xyz, the stack will be: Using this, the parse tree for 9 – 5 + 2 is
Chapter 2 • Syntax Directed Translation | 6.31

E Thus we will evaluate all semantic actions during reductions,


and we find a place to store an inherited attribute. The steps are
T R
1. Remove an embedding semantic action Si, put new
9 −
R1 non-terminal Mi instead of that semantic action.
T {print(‘−’)}
{print(‘9’)} 2. Put Si into the end of a new production rule Mi → ∈.
+ R1
3. Semantic action Si will be evaluated when this new
T {print(‘+’)}
5 ∈
production rule is reduced.
2
{print(‘5’)} {print(‘2’)}
4. Evaluation order of semantic rules is not changed. i.e., if

A → {S1} X1{S2}X2…{Sn}Xn
If we have both inherited and synthesized attributes then we

JW
have to follow the following rules: After removing embedding semantic actions:

Pw
1. An inherited attribute for a symbol on the right side A → M1X1M2X2…MnXn

Jf
of a production must be computed in an action before M1 → ∈{S1}

/2
that symbol.
M2 → ∈{S2}

ly
2. An action must not refer to a synthesized attribute of

...

it.
a symbol on the right side of the action.
Mn→ ∈ {Sn}

//b
3. A synthesized attribute for the non–terminal on the left

s:
can only be computed after all attributes it references, For example,

tp
have been computed.
E → TR

ht
Note: In the implementation of L-attributed definitions dur- R → +T {print (‘+’)} R1

S
ing predictive parsing, instead of syntax directed transla- R→∈ TE
tions, we will work with translation schemes. T → id {print (id.name)}
O

⇓ remove embedding semantic actions


N

Eliminating left recursion from E → TR


D

translation scheme R → +TMR1


N
A

Consider following grammar, which has left recursion R→∈


H

E → E + T {print (‘+’) ;} T → id {print (id.name)}


EW

M → ∈ {print (‘+’)}
When transforming the grammar, treat the actions as if they
N

were terminal symbols. After eliminating recursion from


Translation with inherited attributes
SE

the above grammar.


E → TR Let us assume that every non-terminal A has an inherited
C

R → +T {print (‘+’);} R attribute A.i and every symbol X has a synthesized attribute
TE

R→∈ X.s in our grammar.


A

For every production rule A → X1, X2 . . . Xn, introduce


G

Bottom-up Evaluation new marker non-terminals


D

M1, M2, . . . Mn and replace this production rule with A →


A

of Inherited Attributes M1X1M2X2 . . . MnXn


LO

•• Using a bottom up translation scheme, we can implement The synthesized attribute of Xi will not be changed.
N

any L-attributed definition based on LL (1) grammar. The inherited attribute of Xi will be copied into the syn-
W

•• We can also implement some of L-attributed definitions thesized attribute of Mi by the new semantic action added at
O

based on LR (1) using bottom up translations scheme. the end of the new production rule
D

•• The semantic actions are evaluated during the reductions. Mi → ∈


•• During the bottom up evaluation of S-attributed defi-
nitions, we have a parallel stack to hold synthesized Now, the inherited attribute of Xi can be found in the
attributes. synthesized attribute of Mi.
Where are we going to hold inherited attributes? A → {B.i = f1(. .) B { c.i = f2(. .)} c {A.s = f3(. .)}
We will convert our grammar to an equivalent grammar to

guarantee the following:
•• All embedding semantic actions in our translation scheme A → {M1.i = f1(. .)} M1 {B.i = M1.s} B {M2.i = f2(. .)}M2
will be moved to the end of the production rules. {c.i = M2.S} c {A.s = f3 (. .)}
•• All inherited attributes will be copied into the synthesized M1 → ∈ {M1.s = M1.i}
attributes (may be new non-terminals). M2 → ∈ {M2.s = M2.i}
6.32 | Unit 6 • Compiler Design

exerciSeS
Practice Problems 1 3. Which of the following productions with transla-
Directions for questions 1 to 13: Select the correct alterna- tion rules converts binary number representation into
tive from the given choices. decimal.
1. The annotated tree for input ((a) + (b)), for the rules (A) Production Semantic Rule
given below is
B→0 B.trans = 0
Production Semantic Rule
B→1 B.trans = 1
E→E+T $ $ = mknode (‘+’, $1, $3)
B → B0 B1.trans = B2.trans*2
E → E-T $ $ = mknode (‘-’, $1, $3)

JW
B → B1 B1.trans = B2.trans * 2 + 1
E→T $ $ = $1;

Pw
T → (E) $ $ = $2;
(B) Production Semantic Rule

Jf
T → id $ $ = mkleaf (id, $1)

/2
B→0 B.trans = 0
T → num $ $ = mkleaf (num, $1)

ly
B → B0 B1.trans = B2.trans*4

it.
(A) E (B) E

//b
s:
T (C) Production Semantic Rule
T

tp
( E ) B→1 B.trans = 1

ht
( E )
B → B1 B1.trans = B2.trans*2
E + T

S
E + T
TE
T
( E )
(D) None of these
O
T id = b 4. The grammar given below is
T
N

( E )
D

( E ) Production Semantic Rule


T id = b
N
A

id = a id = a A → LM L.i := l(A. i)
H

M.i := m(L.s)
EW

(C) E (D) None of these A.s := f(M.s)


N

E + T A → QR R.i := r(A.i)
SE

T id = b Q.i := q(R.s)
C
TE

A.s := f(Q.s)
id = a
A

(A) A L-attributed grammar


G

2. Let synthesized attribute val give the value of the binary (B) Non-L-attributed grammar
D

(C) Data insufficient


A

number generated by S in the following grammar.


LO

S→LL (D) None of these


S→L 5. Consider the following syntax directed translation:
N

L → LB
W

S → aS {m := m + 3; print (m);}
L→B
O

|bS {m: = m*2; print (m) ;}


D

B→0 |∈ {m: = 0 ;}
B→1 A shift reduce parser evaluate semantic action of a pro-
Input 101.101, S.val = 5.625 duction whenever the production is reduced.
use synthesized attributes to determine S.val
Which of the following are true? If the string is = a a b a b b then which of the following
(A) S → L1.L2 {S.val = L1.val + L2.val/ (2**L2.bits) is printed?
|L {S.val = L.val; S.bits = L.bits} (A) 0 0 3 6 9 12 (B) 0 0 0 3 6 9 12
(B) L → L1 B {L.val = L1.val*2 + B.val; (C) 0 0 0 3 6 9 12 15 (D) 0 0 3 9 6 12
L.bits = L1.bits + 1} 6. Which attribute can be evaluated by shift reduce parser
|B {L.val = B.val; L.bits = 1} that execute semantic actions only at reduce moves but
(C) B → 0 {B.val = 0} never at shift moves?
|1 {B.val = 1} (A) Synthesized attribute (B) Inherited attribute
(D) All of these (C) Both (a) and (b) (D) None of these
Chapter 2 • Syntax Directed Translation | 6.33

7. Consider the following annotated parse tree: If Input = begin east south west north, after evaluating

A A⋅num = y⋅num + z⋅num this sequence what will be the value of S.x and S.y?
(A) (1, 0) (B) (2, 0)
B⋅num = num B + C C⋅num = num (C) (-1, -1) (D) (0, 0)
11. What will be the values s.x, s.y if input is ‘begin west
num num south west’?
(A) (–2, –1)
Which of the following is true for the given annotated (B) (2, 1)
tree? (C) (2, 2)
(A) There is a specific order for evaluation of attribute (D) (3, 1)

JW
on the parse tree.

Pw
(B) Any evaluation order that computes an attribute 12. Consider the following grammar:
‘A’ after all other attributes which ‘A’ depends on, S→E S.val = E.val

Jf
is acceptable.

/2
E.num = 1
(C) Both (A) and (B)

ly
E → E*T E1.val = 2 * E2.val + 2 * T.val

it.
(D) None of these.

//b
E2.num = E1.num + 1
Common data for questions 8 and 9: Consider the fol-

s:
lowing grammar and syntax directed translation. T.num = E1.num + 1

tp
E→T E.val = T.val
E→E+T E1.val = E2.val + T.val

ht
T.num = E.num + 1
E→T E.val = T.val

S
T→T+P
TE T1.val = T2.val + P.val
T → T*P T1.val = T2.val * P.val *
P.num T2.num = T1.num + 1
O

P.num = T1.num + 1
N

T→P T.val = P.val * P.num


D

T→P T.val = P.val


P → (E) P.val = E.val
N

P.num = T.num + 1
A

P→0 P.num = 1
H

P → (E) P.val = E.val


P.val = 2
EW

P→1 P.num = 2  E .num = P.num 


P→i  
N

P.val = 1  P.val = I | P.num 


SE

Which attributes are inherited and which are synthe-


C

8. What is E.val for string 1*0? sized in the above grammar?


TE

(A) 8 (B) 6
(A) Num attribute is inherited attribute. Val attribute is
A

(C) 4 (D) 12
synthesized attribute.
G

9. What is the E.val for string 0 * 0 + 1?


(B) Num is synthesized attribute. Val is inherited at-
D

(A) 8 (B) 6
A

tribute.
(C) 4 (D) 12
LO

(C) Num and val are inherited attributes.


10. Consider the following syntax directed definition:
N

(D) Num and value are synthesized attributes.


W

13. Consider the grammar with the following translation


O

Production Semantic Rule


D

S→b S.x = 0
rules and E as the start symbol.
S.y = 0 E → E1@T {E.value = E1.value*T.value}
S → S1 I S.x = S1.x + I.dx
S.y = S1.y + I.dy
|T {E.value = T.value}
I → east I.dx = 1 T → T1 and F {T.value = T1.value + F.value}
I.dy = 0
|F {T.value = F.value}
I → north I.dx = 0
I.dy = 1 F → num {F.value = num.value}
I → west I.dx = -1
Compute E.value for the root of the parse tree for the
I.dy = 0
expression: 2 @ 3 and 5 @ 6 and 4
I → south I.dx = 0
I.dy = -1 (A) 200 (B) 180
(C) 160 (D) 40
6.34 | Unit 6 • Compiler Design

Practice Problems 2 (C) Action translating expression represents postfix


Directions for questions 1 to 10: select the correct alterna- notation.
tive from the given choices. (D) None of these
1. Consider the following Tree: 4. In the given problem, what will be the result after eval-
uating 9 – 5 + 2?
Production Meaning
(A) + - 9 5 2 (B) 9 – 5 + 2
E → E1 + T E.t = E1.t*T.t (C) 9 5 – 2+ (D) None of these
E → E1 – T E.t = E1.t + T.t
5. In a syntax directed translation, if the value of an attrib-
E→T E.t = T.t ute node is a function of the values of attributes of chil-
t→0 T.t = ‘0’ dren, then it is called:

JW
(A) Synthesized attribute (B) Inherited attribute
t→5 T.t = ‘5’
(C) Canonical attributes (D) None of these

Pw
t→2 T.t = ‘2’
6. Inherited attribute is a natural choice in:

Jf
t→4 T.t = ‘4’
(A) Keeping track of variable declaration

/2
(B) Checking for the correct use of L-values and R-

ly
E

it.
values.

//b
E + T (C) Both (A) and (B)
(D) None of these

s:
E - T

tp
7. Syntax directed translation scheme is desirable because

ht
(A) It is based on the syntax
T 2
4 (B) Its description is independent of any implementa-

S
5 TE tion.
After evaluation of the tree the value at the root will be: (C) It is easy to modify
O

(D) All of these


N

(A) 28 (B) 32
D

(C) 14 (D) 7 8. A context free grammar in which program fragments,


N

2. The value of an inherited attribute is computed from the called semantic actions are embedded within right side
A

values of attributes at the _______ of the production is called,


H

(A) Sibling nodes (B) Parent of the node (A) Syntax directed translation
EW

(C) Children node (D) Both (A) and (B) (B) Translation schema
(C) Annotated parse tree
N

3. Consider an action translating expression:


(D) None of these
SE

expr → expr + term {print (‘+’)}


9. A syntax directed definition specifies translation of
C

expr → expr - term {print (‘-’)}


construct in terms of:
TE

expr → → term
term → 1 {print (‘1’)} (A) Memory associated with its syntactic component
A

term → 2 (B) Execution time associated with its syntactic com-


G

{print (‘2’)}
term → 3 {print (‘3’)} ponent
D

(C) Attributes associated with its syntactic component


A

Which of the following is true regarding the above


LO

(D) None of these


translation expression?
10. If an error is detected within a statement, the type
N

(A) Action translating expression represents infix


assigned to the Statement is:
W

notation.
(A) Error type (B) Type expression
O

(B) Action translating expression represents prefix


D

notation. (C) Type error (D) Type constructor

Previous Years’ Questions


Common data for questions 1 (A) and 1 (B): Consider 1. (A) The above grammar and the semantic rules are fed
the following expression grammar. The semantic rules for to a yacc tool (which is an LALR (1) parser gener-
expression evaluation are stated next to each grammar pro- ator) for parsing and evaluating arithmetic expres-
duction:[2005] sions. Which one of the following is true about the
E → number E.val = number.val action of yacc for the given grammar?
(A) It detects recursion and eliminates recursion
|E ‘+’ E E (1).val = E (2).val + E (3).val
(B) It detects reduce-reduce conflict, and resolves
|E → E E (1).val = E (2).val × E (3).val
Chapter 2 • Syntax Directed Translation | 6.35

(C) It detects shift-reduce conflict, and resolves the (C) The maximum number of successors of a node
conflict in favor of a shift over a reduce action. in an AST and a CFG depends on the input pro-
(D) It detects shift-reduce conflict, and resolves the gram.
conflict in favor of a reduce over a shift action. (D) Each node in AST and CFG corresponds to at
(B) Assume the conflicts in Part (A) of this question most one statement in the input program.
are resolved and an LALR (1) parser is gener- 3. Consider the following Syntax Directed Translation
ated for parsing arithmetic expressions as per the Scheme (SDTS), with non-terminals {S, A} and ter-
given grammar. Consider an expression 3 × 2 minals {a, b}.[2016]
+ 1. What precedence and associativity proper- S → aA { print 1 }
ties does the generated parser realize?

JW
S → a { print 2 }
(A) Equal precedence and left associativity; expres-
A → Sb { print 3 }

Pw
sion is evaluated to 7
(B) Equal precedence and right associativity; expres- Using the above SDTS, the output printed by a bot-

Jf
sion is evaluated to 9 tom-up parser, for the input aab is:

/2
(C) Precedence of ‘×’ is higher than that of ‘+’, and (A) 1 3 2 (B) 2 2 3

ly
both operators are left associative; expression is (C) 2 3 1 (D) syntax error

it.
evaluated to 7

//b
4. Which one of the following grammars is free from left
(D) Precedence of ‘+’ is higher than that of ‘×’, and recursion?[2016]

s:
both operators are left associative; expression is (A) S → AB

tp
evaluated to 9 A → Aa|b

ht
2. In the context of abstract-syntax-tree (AST) and B → c
(B) S → Ab|Bb|c

S
control-flow-graph (CFG), which one of the follow- TE
A → Bd|ε
ing is TRUE?[2015]
B → e
O
(A) In both AST and CFG, let node N2 be the suc- (C) S → Aa|B
N

cessor of node N1. In the input program, the code A → Bb|Sc|ε


D

corresponding to N2 is present after the code cor- B → d


N

responding to N1. (D) S → Aa|Bb|c


A
H

(B) For any input program, neither AST nor CFG A → Bd|ε
B → Ae|ε
EW

will contain a cycle.


N
SE

Answer Keys
C
TE

Exercises
A

Practice Problems 1
G
D

1. A 2. D 3. A 4. B 5. A 6. A 7. B 8. C 9. B 10. D
A

11. A 12. A 13. C


LO
N

Practice Problems 2
W

1. A 2. D 3. C 4. C 5. A 6. C 7. D 8. B 9. C 10. C
O
D

Previous Years’ Questions


1. (a) C (b) B 2. C 3. C 4. A
Chapter 3
Intermediate Code Generation

JW
LEARNING OBJECTIVES

Pw
 Introduction  Procedure calls

Jf
/2
 Directed Acyclic Graphs (DAG)  Code generation

ly
 Three address code  Next use information

it.
Symbol table operations Run-time storage management

//b
 
Assignment statements DAG representations of basic blocks

s:
 

tp
 Boolean expression  Peephole optimization

ht
 Flow control of statements

S
TE
O
N
D

inTroduCTion
N

Directed acyclic graphs for expression: (DAG)


A

• A DAG for an expression identifies the common sub expressions


H

In the analysis–synthesis model, the front end translates a source


in the given expression.
EW

program into an intermediate representation (IR). From IR the


back end generates target code. • A node N in a DAG has more than one parent if N represents a
N

common sub expression.


• DAG gives the compiler, important clues regarding the genera-
SE

Intermediate Target tion of efficient code to evaluate the expressions.


C

Source Front Back


code end representation end representation
TE

Example 1: DAG for a + a*(b – c) + (b – c)*d


A
G

Target Mostly target Target P7 P 13


independent, dependent, dependent, +
D

P12
source dependent source independent source independent
A

+ P *
6
LO

P5P10
* d P11
N

There are different types of intermediate representations: a −


W

P1P 2
• High level IR, i.e., AST (Abstract Syntax Tree) b c
O


D

Medium level IR, i.e., Three address code P3P8 P4P9


• Low level IR, i.e., DAG (Directed Acyclic Graph)
• Postfix Notation (Reverse Polish Notation, RPN). P1 = makeleaf (id, a)
P2 = makeleaf (id, a) = P1
In the previous sections already we have discussed about AST and
RPN. P3 = makeleaf (id, b)
P4 = makeleaf (id, c)
Benefits of Intermediate code generation: The benefits of ICG
P5 = makenode (-, P3, P4)
are
P6 = makenode (*, P1, P5)
1. We can obtain an optimized code. P7 = makenode (+, P1, P6)
2. Compilers can be created for the different machines by
P8 = makeleaf (id, b) = P3
attaching different backend to existing front end of each
machine. P9 = makeleaf (id, c) = P4
3. Compilers can be created for the different source languages. P10 = makenode (-, P8, P9) = P5
Chapter 3 • Intermediate Code Generation | 6.37

P11 = makeleaf (id, d) The corresponding three address code will be like this:
P12 = makenode (*, P10, P11)
Syntax Tree DAG
P13 = makenode (+, P7, P12)
t1 = -z t1 = -z
Example 2: a: = a – 10 t2 = y * t1 t2 = y * t1
:=
t3 = -z t5 = t2 + t2
− t4 = y * t3 X = t5
a 10 t5 = t4 + t2
X = t5

JW
Three-Address Code The postfix notation for syntax tree is: xyz unaryminus *yz

Pw
In three address codes, each statement usually contains 3 unaryminus *+=.

Jf
addresses, 2 for operands and 1 for the result. •• Three address code is a ‘Linearized representation’ of

/2
Example: -x = y OP z syntax tree.

ly
•• x, y, z are names, constants or complier generated •• Basic data of all variables can be formulated as syntax

it.
temporaries, directed translation. Add attributes whenever necessary.

//b
•• OP stands for any operator. Any arithmetic operator (or) Example: Consider below SDD with following

s:
Logical operator.

tp
specifications:

ht
Example: Consider the statement x = y * - z + y* - z E might have E. place and E.code
E.place: the name that holds the value of E.

S
=
E.code: the sequence of intermediate code starts evaluating E.
TE
+ Let Newtemp: returns a new temporary variable each time
x
O

it is called.
N

*
* New label: returns a new label.
D

Unary-minus
y Then the SDD to produce three–address code for expressions
N

Unary-minus
z y
is given below:
A

z
H
EW

Production Semantic Rules


N

S→ id ASN E S. code = E.code \\ gen (ASN, id.place, E.place )


E. Place = newtemp ();
SE

E→ E1 PLUS E2 E. code = E1. code || E2. code || gen (PLUS, E. place, E1. place, E2. place);
C

E. place = newtemp();
TE

E→ E1MUL E2 E. code = E1. code || E2. code || gen (MUL, E. place, E1. place, E2. place);
A

E. Place = Newtemp();
G

E→ UMINUS E1 E. code = E1 code || gen (NEG, E. Place, E1. place);


D

E. code = E1.code
A

E→ LP E1 RP E. Place = E1. Place


LO

E→ IDENT E.place = id. place


N

E. code = empty.list ();


W
O

Types of Three Address Statement


D

Address and pointer manipulation


Assignment x : = &y Store address of y to x
•• Binary assignment: x: = y OP z Store the result of y OP z x : = *y Store the contents of y to x
to x. *x : = y Store y to location pointed by x .
•• Unary assignment: x: = op y Store the result of unary
operation on y to x. Jump
•• Unconditional jump:- goto L, jumps to L.
Copy •• Conditional:
•• Simple Copy x: = y Store y to x if (x relop y)
•• Indexed Copy x: = y[i] Store the contents of y[i] to x goto L1;
•• x[i]:= y Store y to (x + i)th address. else
6.38 | Unit 6 • Compiler Design

{ Example 1: For the expression x = y * - z + y * - z, the


goto L2; quadruple representation is
}
OP Arg1 Arg2 Result
Where relop is <, < =, >, > = , = or ≠.
(0) Uminus z t1
Procedure call (1) * y t1 t2
(2) Uminus z t3
Param x1; (3) * y t3 t4
(4) + t2 t4 t5
Param x2; (5) = t5 x
.
. Example 2: Read (x)

JW
.

Pw
Op Arg1 Arg2 Result
Param xn; (0) Param x

Jf
Call procedure p with n parameters and (1) Call READ (x)

/2
Call p, n, x;
store the result in x.

ly
return x Use x as result from procedure. Example 3: WRITE (A*B, x +5)

it.
//b
Declarations OP Arg1 Arg2 Result

s:
•• Global x, n1, n2: Declare a global variable named x at off- (0) * A B t1

tp
set n1 having n2 bytes of space. (1) + x 5 t2

ht
•• Proc x, n1, n2: Declare a procedure x with n1 bytes of (2) Param t1
(3) Param t2
parameter space and n2 bytes of local variable space.

S
(4) Call Write 2
•• Local x, m: Declare a local variable named x at offset m
TE
from the procedure frame.
O

•• End: Declare the end of the current procedure. Triples


N

Triples have three fields: OP, arg1, arg2.


D
N

Adaption for object oriented code •• Temporaries are not used and instead references to
A

•• x = y field z: Lookup field named z within y, store address instructions are made.
H

to x •• Triples are also known as two address code.


EW

•• Class x, n1, n2: declare a class named x with n1 bytes of •• Triples takes less space when compared with Quadruples.
N

class variables and n2 bytes of class method pointers. •• Optimization by moving code around is difficult.
•• Field x, n: Declare a field named x at offset n in the class
SE

•• The DAG and triple representations of expressions are


frame. equivalent.
C

•• New x: Create a new instance of class name x. •• For the expression a = y* – z + y*–z the Triple representa-
TE

tion is
A

Implementation of Three
G

Address Statements Op Arg1 Arg2


D

(0) Uminus z
A

Three address statements can be implemented as records (1) * y (0)


LO

with fields for the operator and the operands. There are 3 (2) Uminus z
N

types of representations: (3) * y (2)


W

1. Quadruples (4) + (1) (3)


O

(5) = a (4)
2. Triples
D

3. Indirect triples
Array – references
Quadruples Example: For A [I]: = B, the quadruple representation is
A quadruple has four fields: op, arg1, arg2 and result.
Op Arg1 Arg2 Result
•• Unary operators do not use arg2. (0) []= A I T1
•• Param use neither arg2 nor result. (1) = B T2
•• Jumps put the target label in result.
•• The contents of the fields are pointers to the symbol table The same can be represented by Triple representation also.
entries for the names represented by these fields. [] = is called L-value, specifies the address to an
•• Easier to optimize and move code around. element.
Chapter 3 • Intermediate Code Generation | 6.39

Op Arg1 Arg2 Example:


(0) []= A I Declaration → M1D
(1) = (0) B M1→ ∈ {TOP (Offset): = 0 ;}
Example 2: A: = B [I] D→ D ID
D→ id: T {enter (top (tblptr), id.name, T.type
Op Arg1 Arg2
top (offset)); top (offset): = top (offset)
(0) =[] B I
+ T. width ;}
(1) = A (0)
T→ integer {T.type : = integer; T. width: = 4 :}
= [ ] is called r-value, specifies the value of an element.
T→ double {T.type: = double; T.width = 8 ;}
Indirect Triples T→ * T1 {T. type: = pointer (T. type); T.width

JW
•• In indirect triples, pointers to triples will be there instead = 4;}

Pw
of triples. Need to remember the current offset before entering the
•• Optimization by moving code around is easy.

Jf
block, and to restore it after the block is closed.
•• Indirect triples takes less space when compared with

/2
Example: Block → begin M4 Declarations statements end

ly
Quadruples.
{pop (tblptr); pop (offset) ;}

it.
•• Both indirect triples and Quadruples are almost equally

//b
efficient. M4 → ∈{t: = mktable (top (tblptr); push (t,

s:
Example: Indirect Triple representation of 3-address code tblptr); push (top (offset), offset) ;

tp
Can also use the block number technique to avoid creating

ht
Statement
a new symbol table.
(0) (14)

S
(1) (15)
TE
(2) (16) Field names in records
O
(3) (17) •• A record declaration is treated as entering a block in
N

(4) (18) terms of offset is concerned.


D

(5) (19)
•• Need to use a new symbol table.
N
A

Op Arg1 Arg2 Example: T→ record M5 D end


H

(14) Uminus z {T. type: = (top (tblptr));


EW

(15) * y (14)
T. width = top (offset);
pop (tblptr);
(16) Uminus z
N

pop (offset) ;}
(17) * y (16)
SE

(18) + (15) (17) M5 → ∈ {t: = mktable (null);


push (t, tblptr);
C

(19) = x (18)
push {(o, offset) ;}
TE
A

Symbol Table Operations Assignment Statements


G

Treat symbol tables as objects.


D

Expressions can be of type integer, real, array and record.


A

•• Mktable (previous); As part of translation of assignments into three address


LO

•• create a new symbol table. code, we show how names can be looked up in the symbol
N

•• Link it to the symbol table previous. table and how elements of array can be accessed.
W

•• Enter (table, name, and type, offset)


O

•• insert a new identifier name with type and offset into Code generation for assignment statements gen ([address
D

table # 1], [assignment], [address #2], operator, address # 3);


•• Check for possible duplication. Variable accessing Depending on the type of [address # i],
•• Add width (table, width); generate different codes.
•• increase the size of symbol table by width. Types of [address # i]:
•• Enterproc (table, name, new table) •• Local temp space
•• Enter a procedure name into table. •• Parameter
•• The symbol table of name is new table. •• Local variable
•• Lookup (name, table); •• Non-local variable
•• Check whether name is declared in the symbol table, if •• Global variable
it is in the table then return the entry. •• Registers, constants,…
6.40 | Unit 6 • Compiler Design

Error handling routine error – msg (error information); Start_addr: starting address
The error messages can be written and stored in other 1D Array: A[i]
file. Temp space management:
•• Start_addr + (i – low )* w = i * w + (start_addr - low *w)
•• This is used for generating code for expressions. •• The value called base, (start_addr – low * w) can be com-
•• newtemp (): allocates a temp space. puted at compile time and then stored at the symbol table.
•• freetemp (): free t if it is allocated in the temp space Example: array [-8 …100] of integer.
To declare [-8] [-7] … [100] integer array in Pascal.
Label management 2D Array A [i1, i2]
•• This is needed in generating branching statements. Row major order: row by row. A [i] means the ith row.
•• newlabel (): generate a label in the target code that has 1st row A [1, 1]
never been used.

JW
A [1, 2]

Pw
Names in the symbol table 2nd row A [2, 1]

Jf
S→ id: = E {p: = lookup (id-name, top (tblptr)); A [2, 2]

/2
If p is not null then gen (p, “:=”, A [i, j] = A [i] [j]

ly
E.place); Column major: column by column.

it.
Else error (“var undefined”, id. Name) A [1, 1] : A [1, 2]

//b
;} A [2, 1] A [2, 2]

s:
E→E1+ E2 {E. place = newtemp (); 1st Column 2nd column

tp
gen (E.place, “: = “, E1.place, "+”, Address for A [i1, i2]:

ht
E2.Place); free temp (E1.pace); Start _ addr + ((i, - low1) *n2 + (i2 – low2))*w
freetemp Where low1 and low2 are the lower bounds of i1 and i2. n2

S
(E2. place) ;}
TE
is the number of values that i2 can take. High2 is the upper
E→ –E1 {E. place = newtemp (); bound on the valve of i2. n2 = high2 – low2 + 1
O

gen (E.place, “: =”, “uminus”, We can rewrite address for A [i1, i2] as ((i1 × n2) + i2)
N

E1.place); × w + (start _ addr - ((low1 × n2) + low2) × w). The value


D

Freetemp (E1. place ;)} (start _ addr - low1 × n2 × w – low2 × w) can be computed at
N
A

E→(E1) {E. place = E1. place ;} compiler time and then stored in the symbol table.
H

E→ id {p: = lookup (id.name, top (tblptr); Multi-Dimensional Array A [i1, i2,…ik]


EW

If p ≠ null then E.place = p. place else error Address for A [i1, i2,…ik]
(“var undefined”, id. name) ;}
)
N

n n
= i1 * π ik= 2 i + i2 * π ik=3 i +  + ik * w

SE

Type conversions
(
+ start _ addr − low1 * w * π ik= 2 ni
C

Assume there are only two data types: integer, float.


)
TE

n
For the expression, −low2 * w * π ik=3 i −  − lowk * w
A

E → E1 + E2
It can be computed incrementally in grammar rules:
G

If E1. type = E2. type then f (1) = i1;


D

generate no conversion code f (j) = f (j -1) * nj + ij;


A

E.type = E1. type;


LO

f (k) is the value we wanted to compute.


Else Attributes needed in the translation scheme for addressing
N

E.type = float; array elements:


W

temp1 = newtemp (); Elegize: size of each element in the array


O
D

If E1. type = integer then Array:  a pointer to the symbol table entry containing
gen (temp1,’:=’ int - to - float, E1.place); information about the array declaration.
gen (E,’:=’ temp1, ‘+’, E2.place); Ndim: the current dimension index
Else Base: base address of this array
gen (temp1,’:=’ int - to - float, E2. place); Place: where a variable is stored.
gen (E,’:=’ temp1, ‘+’, E1. place); Limit (array, n) = nm is the number of elements in the mth
Free temp (temp1); coordinate.

Addressing array elements Translation scheme for array elements


Let us assume Consider the grammar
low: lower bound S → L: = E
w: element data width E→L
Chapter 3 • Intermediate Code Generation | 6.41

L→ id Boolean Expressions
L→ [Elist] There are two choices for implementation of Boolean
Elist→ Elist1, E expressions:
Elist→ id [E] 1. Numerical representation
E→ id 2. Flow of control
E→ E + E
Numerical representation
E→ (E)
Encode true and false values.
•• S → L: = E {if L. offset = null then /* L is a Numerically, 1:true 0: false.
simple id */ gen (L. place, “:=”, E.place); Flow of control: Representing the value of a Boolean

JW
Else expression by a position reached in a program.

Pw
gen (L. place, “[“, L. offset, “]”,”:=”,
Short circuit code: Generate the code to evaluate a Boolean
E.place);

Jf
expression in such a way that it is not necessary for the code
•• E → E1 + E2 {E.place = newtemp ();

/2
to evaluate the entire expression.

ly
gen (E. place, “:=”, E1.place, "+”, E2.

it.
place) ;} •• If a1 or a2

//b
•• E → (E1) {E.place= E1.place} a1 is true then a2 is not evaluated.
•• E →L {if L. offset = null then /* L is a •• If a1 and a2

s:
tp
simple id */ E.place:= L .place); a1 is false then a2 is not evaluated.

ht
Else begin
E.place:=newtemp(); Numerical representation
S
gen (E.place, “:=”,L.place, “[“,L.offset, TE
E → id1 relop id2
‘]”);
{B.place:= newtemp ();
O

end }
N

gen (“if”, id1.place, relop.op, id2.


•• L → id {P! = lookup (id.name, top (tblptr));
D

place,”goto”, next stat +3);


If P ≠ null then
N

gen (B.place,”:=”, “0”);


A

Begin
gen (“goto”, nextstat+2);
H

L.place: = P.place:
gen (B.place,”:=”, “1”)’}
EW

L.offset:= null;
End Example 1: Translate the statement (if a < b or c < d and e
N

Else < f) without short circuit evaluation.


SE

Error (“Var underfined”, id. Name) ;} 100: if a < b goto 103


C

•• L → Elist {L. offset: = newtemp (); 101: t1:= 0


TE

gen (L. offset, “:=”, Elist.elesize, 102: goto 104


“*”, Elist.place ); 103: t1:= 1 /* true */
A
G

freetemp (Elist.place); 104: if c < d goto 107


L.Place := Elist . base ;}
D

105: t2:= 0 /* false */


A

•• Elist→ Elist1, E {t: =newtemp (); m: = Elist1. 106: goto 108


LO

ndim+1;
107: t2:= 1
N

gen (t, “:=” Elist1.place, “*”, limit (Elist1.


108: if e < f goto 111
W

array, m));
O

Gen (t, “:=”, t"+”, E.place); freetemp 109: t3:= 0


D

(E.place); 110: goto 112


Elist.array: = Elist.array; 111: t3 := 1
Elist.place:= t; Elist.ndim:= m ;} 112: t4 := t2 and t3
Elist → id [E {Elist.Place:= E.place; Elist. 113: t3:= t1 or t4
ndim:=1;
P! = lookup (id.name, top (tblptr)); check Flow of Control Statements
for id errors;
Elist.elesize:= P.size; Elist.base: = p.base;
B→ id1 relop id2
{
Elist.array:= p.place ;}
B.true: = newlabel ();
•• E → id {P:= lookup (id,name, top (tblptr); B.false:= newlabel ();
Check for id errors; E. Place: = Populace ;} B.code:= gen (“if”, id1. relop, id2, “goto”,
6.42 | Unit 6 • Compiler Design

B.true, “else”, “goto”, B. false) || .


gen (B.true, “:”) .
} .
S→if B then S1 S.code:= B.code || S1 .code ||gen case V [k]: S[k]
(B.false, ‘:’) default: S[d]
|| is the code concatenation operator. }

1. If – then implementation:
S →if B then S1 {gen (Befalls,” :”);} Translation sequence
To B.true •• Evaluate the expression.
B.Code
To B.false •• Find which value in the list matches the value of the

JW
B.true: S1.Code expression, match default only if there is no match.
•• Execute the statement associated with the matched value.

Pw
B.false:

Jf
2. If – then – else How to find the matched value? The matched value can be

/2
P→S {S.next:= newlabel (); found in the following ways:

ly
P.code:= S.code || gen (S.next,” :”)} 1. Sequential test

it.
2. Lookup table

//b
S → if B then S1 else S2 {S1.next:= S.next;
3. Hash table

s:
S2.next:= S.next; 4. Back patching

tp
Secede: = B.code || S1.code ||.
Two different translation schemes for sequential test are

ht
Gen (“goto” S.next) || B. false,” :”) shown below:
||S2.code}

S
1. Code to evaluate E into t
TE
Need to use inherited attributes of S to define the
Goto test
attributes of S1 and S2
O

L[i]: code for S [1]


N

To B. true goto next


D

B.Code
To B.false
N

L[k]: code for S[k]


A

B.true: S1.Code
H

goto next
Goto S.next
EW

L[d]: code for S[d]


S2.Code
B.false: Go to next test:
N

If t = V [1]: goto L [1]


SE

S.next
.
C

3. While loop: .
TE

B→ id1 relop id2 B.true:= newlabel (); .


A

B.false:= newlabel (); goto L[d]


G

B.code:=gen (‘if’, id.relop, Next:


D

id2, ‘goto’, B.true ‘else’, ‘goto’, B. false) || 2. Can easily be converted into look up table
A

If t <> V [i] goto L [1]


LO

gen (B.true ‘:’);


S→ while B do S1 S.begin:= newlabel (); Code for S [1]
N

S.code:=gen (S.begin,’:’)|| goto next


W

B.code||S1.code || gen
O

(‘goto’, S.begin) || gen (B.false, ‘:’);


D

L [1]: if t < > V [2] goto L [2]


Code for S [2]
S.begin B.Code B. true
B.false Goto next
B.true: S1.Code
L [k - 1]: if t < > V [k] goto L[k]
Goto S.next
Code for S[k]
B.false: Goto next
.
4. Switch/case statement: .
The c - like syntax of switch case is .
switch epr { L[k]: code for S[d]
case V [1]: S [1] Next:
Chapter 3 • Intermediate Code Generation | 6.43

Use a table and a loop to find the address to jump r – value: value of the variable, i.e., on the right side of
assignment. Ex: y, in above assignment.
V [1] L [1] l – value: The location/address of the variable, i.e., on the
L[1] : S [1]
V [2] L [2] leftside of assignment. Ex: x, in above assignment.
L [2]: S [2]
There are different modes of parameter passing
V [3] L [3]
1. call-by-value
2. call-by-reference
3. call-by-value-result (copy-restore)
4. call-by-name
3. Hash table: When there are more than two entries
use a hash table to find the correct table entry.

JW
4. Back patching: Call by value

Pw
•• Generate a series of branching statements with the Calling procedure copies the r values of the arguments into
targets of jumps temporarily left unspecified. the called proceduce’s Activation Record.

Jf
•• To determine label table: each entry contains a list Changing a formal parameter has no effect on the actual

/2
of places that need to be back patched. parameter.

ly
it.
•• Can also be used to implement labels and gotos.
Example: void add (int C)

//b
{

s:
Procedure Calls C = C+ 10;

tp
•• Space must be allocated for the activation record of the printf (‘\nc = %d’, &C);

ht
called procedure. }
main ()

S
•• Arguments are evaluated and made available to the called TE
procedure in a known place. {
•• Save current machine status. int a = 5;
O
N

•• When a procedure returns: printf (‘a=%d’, &a);


add (a);
D

•• Place returns value in a known place.


N

•• Restore activation record. printf (‘\na = %d’, &a);


A

}
H

Example: S → call id (Elist) In main a will not be affected by calling add (a)
EW

{for each item P on the queue Elist.


It prints a = 5
Queue do gen (‘PARAM’, q);
a=5
N

gen (‘call:’, id.place) ;}


Only the value of C in add ( ) will be changed to 15.
SE

Elist → Elist, E {append E.place to the end of


Elist.queue} Usage:
C

Elist → E {initialize Elist.queue to contain only 1. Used by PASCAL and C++ if we use non-var
TE

E.place} parameters.
A

Use a queue to hold parameters, then generate codes for 2. The only thing used in C.
G

params. Advantages:
D

Code for E1, store in t1


A

1. No aliasing.
LO

. 2. Easier for static optimization analysis.


. 3. Faster execution because of no need for redirecting.
N

.
W

Code for Ek, store in tk


O

PARAM t1 Call by reference


D

: Calling procedure copies the l-values of the arguments into


. the called procedure’s activation record. i.e., address
. will be passed to the called procedure.
PARAM tk
•• Changing formal parameter affects the corresponding
Call P
actual parameter.
Terminology:
•• It will have some side effects.
Procedure declaration:
Parameters, formal parameters Example: void add (int *c)
Procedure call: {
Arguments, actual parameters. *c = *c + 10;
The values of a variable: x = y printf(‘\nc=%d’, *c);
         }
         void main ()
         {
             int a = 5;
             printf ("\na = %d", a);
             add (&a);
             printf ("\na = %d", a);
         }
Output:  a = 5
         c = 15
         a = 15
That is, here the actual parameter is also modified.
Advantages:
 1. Efficiency in passing large objects.
 2. Only addresses need to be copied.

Call-by-value-result
Equivalent to call-by-reference except when there is aliasing: the program produces the same result, but not the same code will be generated.
Aliasing: Two expressions that have the same l-value are called aliases. They access the same location from different places.
Aliasing happens through pointer manipulation:
 1. Call by reference with a global variable as an argument.
 2. Call by reference with the same expression as argument twice.
    Example: test (x, y, x)
Advantages:
 1. If there is no aliasing, we can implement it by using call-by-reference for large objects.
 2. No implicit side effects if pointers are not passed.

Call by name
Used in Algol.
•• The procedure body is substituted for the call in the calling procedure.
•• Each occurrence of a parameter in the called procedure is replaced with the corresponding argument.
•• Similar to macro expansion.
•• A parameter is not evaluated unless its value is needed during computation.
Example:
    void show (int x)
    {
        for (int y = 0; y < 10; y++)
            x++;
    }
    main ()
    {
        int j;
        j = -1;
        show (j);
    }
Actually it will be like this:
    main ()
    {
        int j;
        j = -1;
        for (int y = 0; y < 10; y++)
            j++;        /* each occurrence of x is replaced by the argument j */
    }
•• Instead of passing values or addresses as arguments, a function is passed for each argument.
•• These functions are called thunks.
•• Each time a parameter is used, the thunk is called, and the address returned by the thunk is used.
   y = 0: use the return value of the thunk for y as the l-value.
Advantages:
•• More efficient when passing parameters that are never used.
•• This saves a lot of time, because evaluating an unused parameter takes a long time.
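Call-by-name is not directly expressible in C, but the thunk mechanism can be imitated with function pointers. The sketch below is an illustration only (the names thunk_j and arg are invented for the example): every use of the parameter calls the thunk to obtain the variable's address.

    #include <stdio.h>

    static int j;                 /* the actual argument's storage     */
    static int *thunk_j(void) {   /* thunk: returns the l-value of j   */
        return &j;
    }

    /* show(x) with x passed "by name": every use of x calls the thunk */
    static void show(int *(*arg)(void)) {
        for (int y = 0; y < 10; y++)
            (*arg())++;           /* x++ becomes: increment *thunk()   */
    }

    int main(void) {
        j = -1;
        show(thunk_j);
        printf("%d\n", j);        /* prints 9: j was updated by name   */
        return 0;
    }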
Code Generation
Code generation is the final phase of the compiler model:

    Source program → [Front end] → Intermediate code → [Code optimization] → Intermediate code → [Code generator] → Target program

The requirements imposed on a code generator are:
 1. Output code must be correct.
 2. Output code must be of high quality.
 3. The code generator should run efficiently.

Issues in the Design of a Code Generator
The generic issues in the design of code generators are:
•• Input to the code generator
•• Target programs
•• Memory management
•• Instruction selection
•• Register allocation
•• Choice of evaluation order

Input to the code generator
An intermediate representation together with the symbol table is the input for the code generator.
•• High-level intermediate representation
   Example: Abstract Syntax Tree (AST)
•• Medium-level intermediate representation
   Example: control flow graph of complex operations
•• Low-level intermediate representation
   Example: Quadruples, DAGs
•• Code for an abstract stack machine, i.e., postfix code.

Target programs
The output of the code generator is the target program. The output may take on a variety of forms:
 1. Absolute machine language
 2. Relocatable machine language
 3. Assembly language

Absolute machine language
•• The final memory area for a program is statically known.
•• Hard-coded addresses.
•• Sufficient for very simple systems.
Advantages:
•• Fast for small programs
•• No separate compilation
Disadvantages: Cannot call modules from other languages/compilers.

Relocatable code It needs
•• A relocation table
•• A relocating linker + loader (or) runtime relocation in the Memory Management Unit (MMU).
Advantage: More flexible.

Assembly language Generates assembly code and uses an assembler tool to convert it into binary (object) code. It needs (i) an assembler, (ii) a linker and loader.
Advantage: Easier to handle and closer to the machine.

Memory management
Mapping names in the source program to addresses of data objects in runtime memory is done by the front end and the code generator.
•• A name in a three-address statement refers to a symbol-table entry for the name.
•• Stack, heap and garbage collection are handled here.

Instruction selection
Instruction selection depends on factors like
•• Uniformity
•• Completeness of the instruction set
•• Instruction speed
•• Machine idioms
•• Choose a set of instructions equivalent to the intermediate representation code.
•• Minimize execution time, used registers and code size.
Example: x = y + z in three-address statements:
    MOV y, R0    /* load y into R0 */
    ADD z, R0
    MOV R0, x    /* store R0 into x */

Register allocation
•• Instructions with register operands are faster, so keep frequently used values in registers.
•• Some registers are reserved.
   Example: SP, PC, etc.
•• Minimize the number of loads and stores.

Evaluation order
•• The order of evaluation can affect the efficiency of the target code.
•• Some orders require fewer registers to hold intermediate results.

Target Machine
Let us assume the target computer is
•• Byte addressable, with 4 bytes per word.
•• It has n general-purpose registers R0, R1, R2, …, Rn-1.
•• It has 2-address instructions of the form
       OP source, destination
   [cost: 1 + cost added by the operand addressing modes]
Example: The OP may be MOV, ADD, MUL.
Generally the cost will be like this:

    Source      Destination     Cost
    Register    Register         1
    Register    Memory           2
    Memory      Register         2
    Memory      Memory           3

Addressing modes:

    Mode                 Form     Address                      Cost
    Absolute             M        M                             2
    Register             R        R                             1
    Indexed              C(R)     C + contents(R)               2
    Indirect register    *R       contents(R)                   1
    Indirect indexed     *C(R)    contents(C + contents(R))     2

Example: x := y - z
    MOV y, R0  → cost = 2
    SUB z, R0  → cost = 2
    MOV R0, x  → cost = 2
Total cost = 6.
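As a small worked variation (not from the text): if the value of y were already available in register R0 from a previous computation, the code generator could omit the first load and emit only

    SUB z, R0  → cost = 2
    MOV R0, x  → cost = 2

for a total cost of 4 instead of 6. Tracking such facts about register contents is exactly the job of the register and address descriptors introduced later in this chapter.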
Runtime Storage Management

Storage Organization
To run a compiled program, the compiler will demand from the operating system a block of memory. This block of memory is called runtime storage.
This runtime storage is subdivided into the generated target code, data objects, and information which keeps track of procedure activations.
The fixed data (generated code) is stored at a statically determined area of memory. The target code is placed at the lower end of memory.
The data objects are stored at a statically determined area, as their size is known at compile time. The compiler stores these data objects at a statically determined area because they are compiled into the target code. This static data area is placed on top of the code area.
The runtime storage also contains the stack and the heap. The stack contains activation records and the program counter; data objects within an activation record are stored in this stack with the relevant information.
The heap area allocates memory for dynamic data (for example, data items allocated under program control).
The sizes of the stack and heap grow or shrink according to the program execution.

Activation Record
Information needed during an execution of a procedure is kept in a block of storage called an activation record.
•• Storage for names local to the procedure appears in the activation record.
•• Each execution of a procedure is referred to as an activation of the procedure.
•• If the procedure is recursive, several of its activations might be alive at a given time.
•• Runtime storage is subdivided into
    1. Generated target code area
    2. Data objects area
    3. Stack
    4. Heap

    Code
    Static data
    Stack
      …
    Heap

•• Sizes of the stack and heap can change during program execution.
For code generation there are two standard storage allocations:
 1. Static allocation: The position of an activation record in memory is fixed at compile time.
 2. Stack allocation: A new activation record is pushed onto the stack for each execution of the procedure. The record is popped when the activation ends.

Control stack The control stack is used for managing active procedures: when a call occurs, the execution of the current activation is interrupted and its status information is saved on the stack.
When control returns from a call, the suspended activation is resumed after restoring the values of the relevant registers; this also includes the program counter, which is set to point immediately after the call.
The size of the stack is not fixed.

Scope of declarations The scope of a declaration is the portion of program text in which the rules defined by the language apply. Within the defined scope, an entity can legally access the declared entities.
The scope of a declaration always contains its immediate scope. The immediate scope is the region of the declarative portion that immediately encloses the declaration.
The scope starts at the beginning of the declaration and continues till the end of the declaration, whereas for an overloadable declaration the immediate scope begins when the callable entity's profile has been determined.
The visible part is the text portion of the declaration that is visible from outside.

Flow Graph
A flow graph is a graph representation of three-address statement sequences.
•• Useful for code generation algorithms.
•• Nodes in the flow graph represent computations.
•• Edges represent flow of control.

Basic Blocks
Basic blocks are sequences of consecutive statements in which flow of control enters at the beginning and leaves at the end without a halt or branching.
 1. First determine the set of leaders:
    •• The first statement is a leader.
    •• Any target of a goto is a leader.
    •• Any statement that immediately follows a goto is a leader.
 2. For each leader, its basic block consists of the leader and all statements up to the next leader.
Initial node: the block whose leader is the first statement.
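The leader-finding rules above mechanize naturally. Below is a small illustrative sketch (not from the text); the type Quad and the fields is_jump and target are invented for the example.

    #include <stdbool.h>

    typedef struct {
        bool is_jump;    /* statement is a (conditional) goto      */
        int  target;     /* index of the jump target, if is_jump   */
    } Quad;

    /* Mark leaders: the first statement, every jump target,
       and every statement immediately following a jump. */
    void find_leaders(const Quad *q, int n, bool *leader) {
        for (int i = 0; i < n; i++) leader[i] = false;
        if (n > 0) leader[0] = true;                   /* rule 1 */
        for (int i = 0; i < n; i++) {
            if (q[i].is_jump) {
                leader[q[i].target] = true;            /* rule 2 */
                if (i + 1 < n) leader[i + 1] = true;   /* rule 3 */
            }
        }
    }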
Example: Consider the following fragment of code that computes the dot product of two vectors x and y of length 10.
    begin
        Prod := 0;
        i := 1;
        repeat
        begin
            Prod := Prod + x[i] * y[i];
            i := i + 1;
        end
        until i <= 10;
    end
The three-address code for this fragment:

    B1   (1)  Prod := 0
         (2)  i := 1

    B2   (3)  t1 := 4 * i
         (4)  t2 := x[t1]
         (5)  t3 := 4 * i
         (6)  t4 := y[t3]
         (7)  t5 := t2 * t4
         (8)  t6 := Prod + t5
         (9)  Prod := t6
         (10) t7 := i + 1
         (11) i := t7
         (12) if i <= 10 goto (3)

The flow graph for this code has the edges b1 → b2 and b2 → b2 (the conditional jump at (12) loops back to b2). Here b1 is the initial node/block.
•• Once the basic blocks have been defined, a number of transformations can be applied to them to improve the quality of the code.
 1. Global: data flow analysis
 2. Local:
    •• Structure-preserving transformations
    •• Algebraic transformations
•• Basic blocks compute a set of expressions. These expressions are the values of the names live on exit from the block.
•• Two basic blocks are equivalent if they compute the same set of expressions.

Structure-preserving transformations:
 1. Common sub-expression elimination:
        a := b + c        a := b + c
        b := a - d        b := a - d
        c := b + c   ⇒    c := b + c
        d := a - d        d := b
 2. Dead-code elimination: code that computes values for names that are dead (i.e., never subsequently used) can be removed.
 3. Renaming of temporary variables.
 4. Interchange of two independent adjacent statements.

Algebraic Transformations
Algebraic identities represent another important class of optimizations on basic blocks. For example, we may apply arithmetic identities such as
    x + 0 = 0 + x = x
    x * 1 = 1 * x = x
    x - 0 = x
    x / 1 = x

Next-Use Information
•• Next-use information is used in code generation and register allocation.
•• Remove variables from registers if they are not used.
•• A statement of the form A = B op C defines A and uses B and C.
•• Scan each basic block backwards.
•• Assume all temporaries are dead on exit and all user variables are live on exit.

Algorithm to compute next-use information
Suppose we are scanning i: x := y op z in a backward scan:
•• attach to i the information in the symbol table about x, y, z;
•• set x to "not live" and "no next use" in the symbol table;
•• set y and z to "live" with next use i in the symbol table.

Consider the following code:
    1: t1 = a * a
    2: t2 = a * b
    3: t3 = 2 * t2
    4: t4 = t1 + t3
    5: t5 = b * b
    6: t6 = t4 + t5
    7: x = t6
Statements (backward scan):
    7: no temporary is live
    6: t6: use (7); t4, t5 not live
    5: t5: use (6)
    4: t4: use (6); t1, t3 not live
    3: t3: use (4); t2 not live
    2: t2: use (3)
    1: t1: use (4)
Symbol table:
    t1   dead   use in 4
    t2   dead   use in 3
    t3   dead   use in 4
    t4   dead   use in 6
    t5   dead   use in 6
    t6   dead   use in 7
The six temporaries in the basic block can be packed into two locations t1 and t2:
    1: t1 = a * a
    2: t2 = a * b
    3: t2 = 2 * t2
    4: t1 = t1 + t2
    5: t2 = b * b
    6: t1 = t1 + t2
    7: x = t1

Code Generator
•• Consider each statement.
•• Remember if an operand is in a register.
•• Descriptors are used to keep track of register contents and addresses for names.
•• There are 2 types of descriptors:
    1. Register descriptor
    2. Address descriptor

Register Descriptor
Keeps track of what is currently in each register. Initially all registers are empty.

Address Descriptor
•• Keeps track of the location where the current value of a name can be found at runtime.
•• The location might be a register, a stack location, a memory address, or a set of all of these.

Issues in the design of code generation The issues in the design of code generation are
 1. Intermediate representation
 2. Target code
 3. Address mapping
 4. Instruction set

Intermediate representation It is represented as postfix, 3-address code (or) quadruples, and syntax tree (or) DAG.

Target code The target code could be absolute code, relocatable machine code (or) assembly language code. Absolute code can execute immediately, as it has fixed addresses; relocatable code requires a linker and loader to place the code at the appropriate location; for assembly code, assemblers are required to convert it into machine-level code before execution.

Address mapping Here a mapping is defined between intermediate representations and target-code addresses. It is based on the runtime environment: static, stack or heap.

Instruction set It should provide a complete set, in such a way that all of its operations can be implemented.

Code Generation Algorithm
For each three-address statement x := y op z do
•• Invoke a function getreg to determine a location L where x must be stored. Usually L is a register.
•• Consult the address descriptor of y to determine y′. Prefer a register for y′. If the value of y is not already in L, generate MOV y′, L.
•• Generate
       OP z′, L
   Again prefer a register for z′. Update the address descriptor of x to indicate that x is in L. If L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors.
•• If the current values of y and/or z have no next use, are dead on exit from the block, and are in registers, then change the register descriptors to indicate that they no longer contain y and/or z.

Function getreg
 1. If y is in a register, and y is not live and has no next use after x := y op z, then return the register of y for L.
 2. Failing (1), return an empty register.
 3. Failing (2), if x has a next use in the block, or op requires a register, then get a register R, store its contents into a memory location M, and use R.
 4. Else select the memory location of x as L.

Example: d := (a - b) + (a - c) + (a - c)

    Stmt         Code generated     Register descriptor     Address descriptor
    t := a - b   MOV a, R0          R0 contains t           t in R0
                 SUB b, R0
    u := a - c   MOV a, R1          R0 contains t           t in R0
                 SUB c, R1          R1 contains u           u in R1
    v := t + u   ADD R1, R0         R0 contains v           v in R0
                                    R1 contains u           u in R1
    d := v + u   ADD R1, R0         R0 contains d           d in R0
                 MOV R0, d                                  d in R0 and memory
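The two descriptors and the first steps of getreg are easy to model in C. This is a toy sketch under invented names (Descriptors, reg_holds), not the algorithm's full implementation: steps 3 and 4 (spilling and memory fallback) are only signalled here.

    #include <string.h>

    #define NREGS 2

    /* Register descriptor: reg_holds[r] is the name currently held in
       register r, "" if empty. A full implementation would pair this
       with an address descriptor mapping each name to the set of
       places (registers/memory) currently holding its value. */
    typedef struct { char reg_holds[NREGS][8]; } Descriptors;

    /* getreg, steps 1 and 2 only: reuse y's register when y is dead
       with no next use; otherwise return any empty register; else
       signal a spill (steps 3-4) with -1. */
    int getreg(const Descriptors *d, const char *y, int y_dead_no_next_use) {
        for (int r = 0; r < NREGS; r++)
            if (y_dead_no_next_use && strcmp(d->reg_holds[r], y) == 0)
                return r;                                  /* step 1 */
        for (int r = 0; r < NREGS; r++)
            if (d->reg_holds[r][0] == '\0')
                return r;                                  /* step 2 */
        return -1;                                         /* steps 3-4 */
    }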
Conditional Statements
Machines implement conditional jumps in 2 ways:
 1. Based on the value of a designated register R: branch if the value of R meets one of six conditions:
    (i) Negative    (ii) Zero    (iii) Positive
    (iv) Non-negative    (v) Non-zero    (vi) Non-positive
    Example: The three-address statement if x < y goto z can be implemented by subtracting y from x in R, then jumping to z if the value of R is negative.
 2. Based on a set of condition codes that indicate whether the last quantity computed or loaded into a location is negative (or) zero (or) positive.
    •• A compare instruction sets the condition codes without actually computing the value.
       Example: CMP x, y
                CJL z
    •• Maintain a condition-code descriptor, which tells which name last set the condition codes.
       Example: x := y + z
                if x < 0 goto z
       is implemented by
                MOV y, R0
                ADD z, R0
                MOV R0, x
                CJN z

DAG Representation of Basic Blocks
•• DAGs are useful data structures for implementing transformations on basic blocks.
•• A DAG tells how a value computed by a statement is used in subsequent statements.
•• It is a good way of determining common sub-expressions.
•• A DAG for a basic block has the following labels on the nodes:
   •• Leaves are labeled by unique identifiers, either variable names or constants.
   •• Interior nodes are labeled by an operator symbol.
   •• Nodes are also optionally given a sequence of identifiers as labels.
Example:  1:  t1 := 4 * i
          2:  t2 := a[t1]
          3:  t3 := 4 * i
          4:  t4 := b[t3]
          5:  t5 := t2 * t4
          6:  t6 := prod + t5
          7:  prod := t6
          8:  t7 := i + 1
          9:  i := t7
          10: if i <= 20 goto (1)

[DAG for this block: a '+' node labeled t6, prod with children prod and a '*' node (t5); the '*' node's children are the two indexing nodes '[ ]' (labeled t2 and t4, over leaves a and b), which share the common sub-expression 4 * i (labels t1, t3) built from leaves 4 and i0; a '+' node labeled t7, i with children i0 and 1; and a '<=' node comparing t7, i with 20.]

Code generation from DAG:

    Original code:                 Code generated from the DAG:
    S1 := 4 * i                    S1 := 4 * i
    S2 := addr(A) - 4              S2 := addr(A) - 4
    S3 := S2[S1]                   S3 := S2[S1]
    S4 := 4 * i
    S5 := addr(B) - 4              S5 := addr(B) - 4
    S6 := S5[S4]                   S6 := S5[S1]
    S7 := S3 * S6                  S7 := S3 * S6
    S8 := prod + S7                prod := prod + S7
    prod := S8
    S9 := i + 1
    i := S9                        i := i + 1
    if i <= 20 goto (1)            if i <= 20 goto (1)

Rearranging the order of the code
Consider the following basic block
    t1 := a + b
    t2 := c + d
    t3 := e - t2
    x  := t1 - t3
and its DAG: the root '-' node (x) has children t1 (a '+' over a, b) and t3 (a '-' over e and t2, where t2 is a '+' over c, d).

Three-address code for the DAG (assuming only two registers are available):
    MOV a, R0
    ADD b, R0
    MOV c, R1
    ADD d, R1
    MOV R0, t1     ← register spilling
    MOV e, R0
    SUB R1, R0
    MOV t1, R1     ← register reloading
    SUB R0, R1
    MOV R1, x

Rearranging the code as
    t2 := c + d
    t3 := e - t2
    t1 := a + b
    x  := t1 - t3
the rearrangement gives the code:
    MOV c, R0
    ADD d, R0
    MOV e, R1
    SUB R0, R1
    MOV a, R0
    ADD b, R0
    SUB R1, R0
    MOV R0, x
The rearranged order needs no spill, since t1 no longer has to be stored and reloaded.

Error detection and recovery The errors that arise while compiling are:
 1. Lexical errors
 2. Syntactic errors
 3. Semantic errors
 4. Run-time errors

Lexical errors If variables (or) constants are declared (or) defined not according to the rules of the language, or special symbols are included which are not part of the language, it is a lexical error.
The lexical analyzer is constructed based on pattern-recognizing rules for forming tokens; when the source code is split into tokens and a token does not match any rule, an error is generated.
Example: Consider the C program statement
    prinf ("Hello World");
Here prinf, (, ", Hello World, ", ), ; are tokens.
prinf is not a recognizable pattern — it should actually be printf — so it generates an error.

Syntactic errors These errors include missing semicolons, missing braces, etc., which violate the grammar rules of the language. The parser reports these errors.

Semantic errors These errors arise when an operation is performed over incompatible types of variables, double declaration, assigning values to undefined variables, etc.

Runtime errors Runtime errors are the ones detected at runtime. These include pointers assigned NULL values, accessing a variable that is out of its bounds, illegal arithmetic operations, etc.
After the detection of errors, the following recovery strategies can be applied:
 1. Panic mode recovery
 2. Phrase level recovery
 3. Error productions
 4. Global correction

Peephole Optimization
•• Target code often contains redundant instructions and suboptimal constructs.
•• Improving the performance of the target program by examining a short sequence of target instructions (the peephole) and replacing these instructions by a shorter or faster sequence is peephole optimization.
•• The peephole is a small, moving window on the target program. Some well-known peephole optimizations are
 1. Eliminating redundant instructions
 2. Eliminating unreachable code
 3. Flow-of-control optimizations, or eliminating jumps over jumps
 4. Algebraic simplifications
 5. Strength reduction
 6. Use of machine idioms

Elimination of redundant loads and stores
Example 1: (1) MOV R0, a
           (2) MOV a, R0
We can delete instruction (2), because the value of a is already in R0.
Example 2: LOAD x, R0
           STORE R0, x
If there are no modifications to R0/x in between, the store instruction can be deleted.
Example 3: (1) LOAD x, R0
           (2) STORE R0, x
Example 4: (1) STORE R0, x
           (2) LOAD x, R0
The second instruction can be deleted in both examples 3 and 4.
Example 5: STORE R0, x
           LOAD x, R0
Here the load instruction can be deleted.

Eliminating unreachable code
An unlabeled instruction immediately following an unconditional jump may be removed.
•• It may be produced by debugging code introduced during development.
•• It may be due to updates in programs made without considering the whole program segment.
Example: Let print = 0.
    if print = 1 goto L1             if print != 1 goto L2
    goto L2                     ⇒    print instructions
    L1: print instructions           L2:
    L2:
and, substituting the value of print,
    if 0 != 1 goto L2                goto L2
    print instructions          ⇒    print instructions
    L2:                              L2:
In all of the above cases the print instructions are unreachable, so they can be eliminated.

Flow-of-control optimizations Unnecessary jumps can be eliminated — jumps to jumps, jumps to conditional jumps, and conditional jumps to jumps.
Example 1: We can replace the jump sequence
    goto L1
    …
    L1: goto L2
by the sequence
    goto L2
    …
    L1: goto L2
If there are no jumps to L1, then it may be possible to eliminate the statement L1: goto L2.

Example 2: The sequence
    goto L1
    …
    L1: if a < b goto L2
    L3:
can be replaced by (provided there is only one jump to L1):
    if a < b goto L2
    goto L3
    …
    L3:
This sometimes skips the unconditional "goto L3".

Reduction in strength
•• x² is cheaper to implement as x * x than as a call to an exponentiation routine.
•• Replacement of multiplication by a left shift.
   Example: x * 2³ ⇒ x << 3
•• Replace division by a right shift.
   Example: x >> 2 is x / 2²

Use of machine idioms
•• Auto-increment and auto-decrement addressing modes can be used wherever possible.
   Example: replace ADD #1, R by INC R
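The redundant load/store rule from the peephole list above is simple to mechanize. A minimal sketch (invented for illustration; the type Instr and its fields are assumptions):

    #include <string.h>

    typedef struct { char op[8]; char src[8]; char dst[8]; } Instr;

    /* Delete a "MOV a,R0" that immediately follows "MOV R0,a":
       the value of a is already in R0. Returns the new count. */
    int peephole(Instr *code, int n) {
        int out = 0;
        for (int i = 0; i < n; i++) {
            if (out > 0 &&
                strcmp(code[i].op, "MOV") == 0 &&
                strcmp(code[out-1].op, "MOV") == 0 &&
                strcmp(code[i].src, code[out-1].dst) == 0 &&
                strcmp(code[i].dst, code[out-1].src) == 0)
                continue;               /* redundant load: skip it */
            code[out++] = code[i];
        }
        return out;
    }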
Exercises

Practice Problems 1
Directions for questions 1 to 15: Select the correct alternative from the given choices.
 1. Consider the following expression tree on a machine with load-store architecture, in which memory can be accessed only through load and store instructions. The variables p, q, r, s and t are initially stored in memory. The binary operators used in this expression tree can be evaluated by the machine only when the operands are in registers, and the instructions produce results only in a register. If no intermediate results can be stored in memory, what is the minimum number of registers needed to evaluate this expression?

    [Expression tree (partially recoverable): root '+', with a '-' subtree over the leaves p and q, and a '+' subtree involving the leaf t and a subtree over the leaves r and s.]

    (A) 2    (B) 9
    (C) 5    (D) 3
 2. Consider the program given below, with lexical scoping and nesting of procedures permitted.
        Program main ( )
        {
            Var …
            Procedure A1 ( )
            {
                Var …
                call A2;
            }
            Procedure A2 ( )
            {
                Var …
                Procedure A21 ( )
                {
                    Var …
                    call A1;
                }
                call A21;
            }
            call A1;
        }
    Consider the calling chain: main ( ) → A1 ( ) → A2 ( ) → A21 ( ) → A1 ( ).
    The correct set of activation records along with their access links is given by

    [Options (A)-(D), partially recoverable: each shows the activation-record stack main, A1, A2, A21, A1 with the frame pointer at the topmost record and a different arrangement of access links.]
 3. Consider the program fragment:
        sum = 0;
        for (i = 1; i <= 20; i++)
            sum = sum + a[i] + b[i];
    How many instructions are there in the three-address code for this?
    (A) 15    (B) 16
    (C) 17    (D) 18
 4. Suppose the instruction set of the processor has only two registers. The only code optimization allowed is code motion. What is the minimum number of spills to memory in the compiled code?
        c = a + b;
        d = c * a;
        e = c + a;
        x = c * c;
        if (x > a)
        {
            y = a * a;
        }
        else
        {
            d = d * d;
            e = e * e;
        }
    (A) 0    (B) 1
    (C) 2    (D) 3
 5. What is the minimum number of registers needed to compile the above problem's code segment without any spill to memory?
    (A) 3    (B) 4
    (C) 5    (D) 6
 6. Convert the following expression into postfix notation:
        a = (-a + 2*b)/a
    (A) aa - 2b* + a/ =    (B) a - 2ba*/ + =
    (C) a2b*a/ +           (D) a2b - *a/ +
 7. In the quadruple representation of the following program, how many temporaries are used?
        int a = 2, b = 8, c = 4, d;
        for (j = 0; j <= 10; j++)
            a = a * (j * (b/c));
        d = a * (j * (b/c));
    (A) 4    (B) 7
    (C) 8    (D) 10
 8. Let A = 2, B = 3, C = 4 and D = 5. What is the final value of the prefix expression + * AB - CD?
    (A) 5    (B) 10
    (C) -10  (D) -5
 9. Which of the following is a valid expression?
    (A) BC*D- +    (B) *ABC-
    (C) BBB***- +  (D) -*/bc
10. What is the final value of the postfix expression B C D A D - + - + where A = 2, B = 3, C = 4, D = 5?
    (A) 5    (B) 4
    (C) 6    (D) 7
11. Consider the expression x = (a + b) * -C/D. In the quadruple representation of this expression, in which instruction is the '/' operation used?
    (A) 3rd    (B) 4th
    (C) 5th    (D) 8th
12. In the triple representation of x = (a + b) * -c/d, in which instruction will the result (a + b) * -c/d be assigned to x?
    (A) 3rd    (B) 4th
    (C) 5th    (D) 8th
13. Consider the three-address code for the following program:
        while (A < C and B > D) do
            if (A == 1) then C = C + 1;
            else
                while (A <= D) do
                    A = A + 3;
    How many temporaries are used?
    (A) 2    (B) 3
    (C) 4    (D) 0
14. Code generation can be done by
    (A) DAG    (B) Labeled tree
    (C) Both (A) and (B)    (D) None of these
15. Live variable analysis is used as a technique for
    (A) Code generation    (B) Code optimization
    (C) Type checking      (D) Run time management

Practice Problems 2
Directions for questions 1 to 19: Select the correct alternative from the given choices.
 1. Match the correct code optimization technique to the corresponding code:
    (i)   i = i * 1             ⇒   j = 2 * i         (p) Reduction in strength
          j = 2 * i
    (ii)  A = B + C             ⇒   A = B + C         (q) Machine idioms
          D = 10 + B + C            D = 10 + A
    (iii) for i = 1 to 10       ⇒   t = B + C         (r) Common sub-expression
              A[i] = B + C              for i = 1 to 10        elimination
                                        A[i] = t;
    (iv)  x = 2 * y             ⇒   x = y << 1        (s) Code motion
    (A) i - r, iii - s, iv - p, ii - q
    (B) i - q, ii - r, iii - s, iv - p
    (C) i - s, ii - p, iii - q, iv - r
    (D) i - q, ii - p, iii - r, iv - s
 2. What will be the optimized code for the following expression represented as a DAG?
        a = q * -r + q * -r
    (A) t1 = -r         (B) t1 = -r
        t2 = q * t1         t2 = q * t1
        t3 = a * t1         t3 = t2 + t2
        t4 = t2 + t3        a = t3
        a = t4
    (C) t1 = -r         (D) All of these
        t2 = q
        t3 = t1 * t2
        t4 = t3 + t3
        a = t4
 3. In static allocation, names are bound to storage at _______ time.
    (A) Compile    (B) Runtime
    (C) Debugging  (D) Both (A) and (B)
 4. The mode in which the actual parameters are evaluated and their r-values are passed to the called procedure is known as
    (A) call-by-reference    (B) call-by-name
    (C) call-by-value        (D) copy-restore
 5. If the expression -(a + b) * (c + d) + (a + b + c) is translated into quadruple representation, how many temporaries are required?
    (A) 5    (B) 6
    (C) 7    (D) 8
 6. If the above expression is translated into triples representation, how many instructions are there?
    (A) 6    (B) 10
    (C) 5    (D) 8
 7. In the indirect triple representation for the expression A = (E/F) * (C - D), the first pointer address refers to
    (A) C - D
    (B) E/F
    (C) Both (A) and (B)
    (D) (E/F) * (C - D)
 8. For the given assembly language, what is its cost?
        MOV b, a
        ADD c, a
    (A) 3    (B) 4
    (C) 6    (D) 2
 9. Consider the expression ((4 + 2 * 3 + 7) + 8 * 5). The Polish postfix notation for this expression is
    (A) 423* + 7 + 85* +    (B) 423* + 7 + 8 + 5*
    (C) 42 + 37 + *85* +    (D) 42 + 37 + 85** +

Common data for questions 10 to 15: Consider the following basic block, in which all variables are integers and ** denotes exponentiation.
        a := b + c
        z := a ** 2
        x := 0 * b
        y := b + c
        w := y * y
        u := x + 3
        v := u + w
Assume that the only variables that are live at the exit of this block are v and z. In order, apply the following optimizations to this basic block.
10. After applying algebraic simplification, how many instructions will be modified?
    (A) 1    (B) 2
    (C) 4    (D) 5
11. After applying common sub-expression elimination to the above code, which of the following are true?
    (A) a := b + c    (B) y := a
    (C) z = a + a     (D) None of these
12. Among the following instructions, which will be modified after applying copy propagation?
    (A) a := b + c    (B) z := a * a
    (C) y := a        (D) w := y * y
13. Which of the following is obtained after constant folding?
    (A) u := 3    (B) v := u + w
    (C) x := 0    (D) Both (A) and (C)
14. In order to apply dead code elimination, which statements are to be eliminated?
    (A) x = 0
    (B) y = b + c
    (C) Both (A) and (B)
    (D) None of these
15. How many instructions will there be after optimizing the above result further?
    (A) 1    (B) 2
    (C) 3    (D) 4
16. Consider the following program:
        L0: e := 0
            b := 1
            d := 2
        L1: a := b + 2
            c := d + 5
            e := e + c
            f := a * a
            if f < c goto L3
        L2: e := e + f
            goto L4
        L3: e := e + 2
        L4: d := d + 4
            b := b - 4
            if b != d goto L4
        L5:
    How many blocks are there in the flow graph for the above code?
    (A) 5
    (B) 6
    (C) 8
    (D) 7
17. A basic block can be analyzed by
    (A) Flow graph
    (B) A graph with cycles
    (C) DAG
    (D) None of these
18. In call by value the actual parameters are evaluated. What type of values is passed to the called procedure?
    (A) l-values
    (B) r-values
    (C) Text of actual parameters
    (D) None of these
19. Which of the following is FALSE regarding a block?
    (A) The first statement is a leader.
    (B) Any statement that is a target of a conditional/unconditional goto is a leader.
    (C) The statement immediately following a goto is a leader.
    (D) The last statement is a leader.

Previous Years' Questions
 1. The least number of temporary variables required to create a three-address code in static single assignment form for the expression q + r/3 + s - t * 5 + u * v/w is ________. [2015]
 2. Consider the intermediate code given below.
        (1)  i = 1
        (2)  j = 1
        (3)  t1 = 5 * i
        (4)  t2 = t1 + j
        (5)  t3 = 4 * t2
        (6)  t4 = t3
        (7)  a[t4] = -1
        (8)  j = j + 1
        (9)  if j <= 5 goto (3)
        (10) i = i + 1
        (11) if i < 5 goto (2)
    The number of nodes and edges in the control-flow graph constructed for the above code, respectively, are [2015]
    (A) 5 and 7    (B) 6 and 7
    (C) 5 and 5    (D) 7 and 8
 3. Consider the following code segment. [2016]
        x = u - t;
        y = x * v;
        x = y + w;
        y = t - z;
        y = x * y;
    The minimum number of total variables required to convert the above code segment to static single assignment form is _____.
 4. What will be the output of the following pseudo-code when parameters are passed by reference and dynamic scoping is assumed? [2016]
        a = 3;
        void n(x) { x = x * a; print (x); }
        void m(y) { a = 1; a = y - a; n(a); print (a); }
        void main( ) { m(a); }
    (A) 6,2    (B) 6,6
    (C) 4,2    (D) 4,4
 5. Consider the following intermediate program in three-address code [2017]
        p = a - b
        q = p * c
        p = u * v
        q = p + q
    Which one of the following corresponds to a static single assignment form of the above code?
    (A) p1 = a - b        (B) p3 = a - b
        q1 = p1 * c           q4 = p3 * c
        p1 = u * v            p4 = u * v
        q1 = p1 + q1          q5 = p4 + q4
    (C) p1 = a - b        (D) p1 = a - b
        q1 = p2 * c           q1 = p * c
        p3 = u * v            p2 = u * v
        q2 = p4 + q3          q2 = p + q
Answer Keys
Exercises
Practice Problems 1
1. D 2. D 3. C 4. C 5. B 6. A 7. B 8. A 9. A 10. A
11. B 12. C 13. A 14. C 15. B

Practice Problems 2
1. B 2. B 3. A 4. B 5. B 6. A 7. B 8. C 9. A 10. A
11. B 12. D 13. A 14. C 15. C 16. A 17. C 18. B 19. D

Previous Years’ Questions

1. 8 2. B 3. 10 4. D 5. B

Chapter 4
Code Optimization

LEARNING OBJECTIVES
•• Code optimization basics
•• Principle sources of optimization
•• Loop invariant code motion
•• Strength reduction on induction variables
•• Loops in flow graphs
•• Pre-header
•• Global data flow analysis
•• Definition and usage of variables
•• Use-definition (u-d) chaining
•• Data flow equations

Code Optimization Basics
The process of improving the intermediate code and the target code in terms of both speed and the amount of memory required for execution is known as code optimization.
Compilers that apply code-improving transformations are called optimizing compilers.

Properties of the transformations of an optimizing compiler are:
 1. A transformation must preserve the meaning of programs.
 2. It must speed up programs by a measurable amount.
 3. A transformation must be worth the effort.

Places for improvements
 1. Source code:
    The user can - profile the program
                 - change the algorithm
                 - transform loops
 2. Intermediate code can be improved by improving
    - Loops
    - Procedure calls
    - Address calculations
 3. Target code can be improved by
    - Using registers
    - Selecting instructions
    - Peephole transformations

Optimizing compiler organization
This applies
 • Control flow analysis
 • Data flow analysis
 • Transformations

Issues in design of code optimization The issues in the design of code optimization are
 1. Target machine characteristics
 2. Target CPU architecture
 3. Functional units

Target machine Optimization is done according to the target machine characteristics. By altering the machine description parameters, one can optimize a single piece of compiler code.

Target CPU architecture The issues to be considered for optimization with respect to the CPU architecture are
 1. Number of CPU registers
 2. RISC instruction set
 3. CISC instruction set
 4. Pipelining

Functional units Optimization is done based on the number of functional units, so that instructions can be executed simultaneously.

Principle Sources of Optimization
Some code-improving transformations are local transformations and some are global transformations.
Local transformations can be performed by looking only at statements within a basic block; otherwise the transformation is global.

Function Preserving Transformations
These transformations improve the program without changing the function it computes. Some of these transformations are:
 1. Common sub-expression elimination
 2. Copy propagation
 3. Dead-code elimination
 4. Loop optimization
    - Code motion
    - Induction variable elimination
    - Reduction in strength

Common sub-expression elimination The process of identifying common sub-expressions and eliminating their computation multiple times is known as common sub-expression elimination.
Example: Consider the following program segment:
    int sum_n, sum_n2, sum_n3;
    int sum (int n)
    {
        sum_n  = ((n)*(n+1))/2;
        sum_n2 = ((n)*(n+1)*(2*n+1))/6;
        sum_n3 = (((n)*(n+1))/2)*(((n)*(n+1))/2);
    }
The three-address code for the above input is:
    (0)  proc-begin sum
    (1)  t0 := n + 1
    (2)  t1 := n * t0
    (3)  t2 := t1 / 2
    (4)  sum_n := t2
    (5)  t3 := n + 1
    (6)  t4 := n * t3
    (7)  t5 := 2 * n
    (8)  t6 := t5 + 1
    (9)  t7 := t4 * t6
    (10) t8 := t7 / 6
    (11) sum_n2 := t8
    (12) t9 := n + 1
    (13) t10 := n * t9
    (14) t11 := t10 / 2
    (15) t12 := n + 1
    (16) t13 := n * t12
    (17) t14 := t13 / 2
    (18) t15 := t11 * t14
    (19) sum_n3 := t15
    (20) label L0
    (21) proc-end sum
The computations made in quadruples (1)-(3), (12)-(14) and (15)-(17) are essentially the same: ((n)*(n+1))/2 is computed. It is a common sub-expression, and it is computed four times in the above example.
It is possible to optimize the code so that common sub-expressions are computed only once and the computed values are then reused. The optimized intermediate code will be:
    (0) proc-begin sum
    (1) t0 := n + 1
    (2) t1 := n * t0
    (3) sum_n := t1 / 2
    (4) t5 := 2 * n
    (5) t6 := t5 + 1
    (6) t7 := t1 * t6
    (7) sum_n2 := t7 / 6
    (8) sum_n3 := sum_n * sum_n
    (9) proc-end sum

Constant folding Constant expressions in the input source are evaluated and replaced by their equivalent values at compile time.
For example, 10*3 and 6 + 101 are constant expressions, and they are replaced by 30 and 107 respectively.
Example: Consider the following 'C' code:
    int arr1[10];
    int main ( )
    {
        arr1[0] = 3;
        arr1[1] = 4;
    }
The unoptimized three-address code equivalent to the above 'C' code is:
    (0) proc-begin main
    (1) t0 := 0 * 4
    (2) t1 := &arr1
    (3) t1[t0] := 3
    (4) t2 := 1 * 4
    (5) t3 := &arr1
    (6) t3[t2] := 4
    (7) label L0
    (8) proc-end main
In the above code, 0*4 is a constant expression with value 0, and 1*4 is a constant expression with value 4.
∴ After applying constant folding, the optimized code will be:
    (0) proc-begin main
    (1) t0 := 0
    (2) t1 := &arr1
    (3) t1[t0] := 3
    (4) t2 := 4
    (5) t3 := &arr1
    (6) t3[t2] := 4
    (7) label L0
    (8) proc-end main

Copy propagation In copy propagation, if there is an assignment x = y, then y is used instead of x in the statements following x = y.
Example: In the previous example, there are two copy statements:
    (1) t0 := 0
    (4) t2 := 4
After applying copy propagation, the optimized code will be:
    (0) proc-begin main
    (1) t0 := 0
    (2) t1 := &arr1
    (3) t1[0] := 3
    (4) t2 := 4
    (5) t3 := &arr1
    (6) t3[4] := 4
    (7) label L0
    (8) proc-end main
In the three-address code shown above, quadruples (1) and (4) are no longer used in any of the following statements, so (1) and (4) can be eliminated.
Three-address code after dead store elimination:
    (0) proc-begin main
    (1) t1 := &arr1
    (2) t1[0] := 3
    (3) t3 := &arr1
    (4) t3[4] := 4
    (5) label L0
    (6) proc-end main
In the above example we propagated constant values; this is also known as constant propagation.

Variable propagation Propagating another variable instead of the existing one is known as variable propagation.
Example: int func (int a, int b, int c)
         {
             int d, e, f;
             d = a;
             if (a > 10)
             {
                 e = d + b;
             }
             else
             {
                 e = d + c;
             }
             f = d * e;
             return (f);
         }
Three-address code (unoptimized):
    (0)  proc-begin func
    (1)  d := a
    (2)  if a > 10 goto L0
    (3)  goto L1
    (4)  label L0
    (5)  e := d + b
    (6)  goto L2
    (7)  label L1
    (8)  e := d + c
    (9)  label L2
    (10) f := d * e
    (11) return f
    (12) goto L3
    (13) label L3
    (14) proc-end func
Three-address code after variable (copy) propagation:
    (0)  proc-begin func
    (1)  d := a
    (2)  if a > 10 goto L0
    (3)  goto L1
    (4)  label L0
    (5)  e := a + b
    (6)  goto L2
    (7)  label L1
    (8)  e := a + c
    (9)  label L2
    (10) f := a * e
    (11) return f
    (12) goto L3
    (13) label L3
    (14) proc-end func
After dead store elimination: in the above code, (1) d := a is no longer used, so the dead store d := a is eliminated:
    (0)  proc-begin func
    (1)  if a > 10 goto L0
    (2)  goto L1
    (3)  label L0
    (4)  e := a + b
    (5)  goto L2
    (6)  label L1
    (7)  e := a + c
    (8)  label L2
    (9)  f := a * e
    (10) return f
    (11) goto L3
    (12) label L3
    (13) proc-end func
Dead code elimination Eliminating code that never gets executed by the program is known as dead code elimination. It reduces the memory required by the program.
Example: Consider the following unoptimized intermediate code:
    (0)  proc-begin func
    (1)  debug := 0
    (2)  if debug == 1 goto L0
    (3)  goto L1
    (4)  label L0
    (5)  param c
    (6)  param b
    (7)  param a
    (8)  param lc1
    (9)  call printf 16
    (10) retrieve t0
    (11) label L1
    (12) t1 := a + b
    (13) t2 := t1 + c
    (14) v1 := t2
    (15) return v1
    (16) goto L2
    (17) label L2
    (18) proc-end func
In copy propagation, debug is replaced with 0 wherever debug is used after that assignment. Statement (2) therefore becomes
    if 0 == 1 goto L0
0 == 1 always returns false, so control can never flow to label L0. This makes statements (4) through (10) dead code. (2) can also be removed as part of dead code elimination. (1) cannot be eliminated, because debug is a global variable. The optimized code after elimination of dead code is shown below:
    (0)  proc-begin func
    (1)  debug := 0
    (2)  goto L1
    (3)  label L1
    (4)  t1 := a + b
    (5)  t2 := t1 + c
    (6)  v1 := t2
    (7)  return v1
    (8)  goto L2
    (9)  label L2
    (10) proc-end func

Algebraic transformations We can use algebraic identities to optimize the code further, for example:
    Additive identity:        a + 0 = a
    Multiplicative identity:  a * 1 = a
    Multiplication with 0:    a * 0 = 0
Example: Consider the following code fragment:
    struct mystruct
    {
        int a[20];
        int b;
    } xyz;
    int func (int i)
    {
        xyz.a[i] = 34;
    }
The unoptimized three-address code:
    (0) proc-begin func
    (1) t0 := &xyz
    (2) t1 := 0
    (3) t2 := i * 4
    (4) t1 := t2 + t1
    (5) t0[t1] := 34
    (6) label L0
    (7) proc-end func
Optimized code after copy propagation and dead code elimination (the statement t1 := 0 is eliminated):
    (0) proc-begin func
    (1) t0 := &xyz
    (2) t2 := i * 4
    (3) t1 := t2 + 0
    (4) t0[t1] := 34
    (5) label L0
    (6) proc-end func
After applying the additive identity:
    (0) proc-begin func
    (1) t0 := &xyz
    (2) t2 := i * 4
    (3) t1 := t2
    (4) t0[t1] := 34
    (5) label L0
    (6) proc-end func
After copy propagation and dead store elimination:
    (0) proc-begin func
    (1) t0 := &xyz
    (2) t2 := i * 4
    (3) t0[t2] := 34
    (4) label L0
    (5) proc-end func
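These identity rewrites are purely local pattern matches on quadruples. A minimal sketch (the type Quad and its layout are invented for illustration):

    #include <string.h>

    typedef struct { char op; char arg1[8]; char arg2[8]; char result[8]; } Quad;

    /* Apply x + 0 -> x and x * 1 -> x to one quadruple.
       Returns 1 if the quad became a plain copy (result := arg1),
       which a later copy-propagation pass can then remove. */
    int simplify(Quad *q) {
        if (q->op == '+' && strcmp(q->arg2, "0") == 0) { q->op = '='; return 1; }
        if (q->op == '*' && strcmp(q->arg2, "1") == 0) { q->op = '='; return 1; }
        return 0;
    }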
Strength reduction transformation This transformation replaces expensive operators by equivalent cheaper ones on the target machine.
For example, y := x * 2 is replaced by y := x + x, as addition is less expensive than multiplication. Similarly:
    Replace y := x * 32 by y := x << 5
    Replace y := x / 8  by y := x >> 3

Loop optimization We can optimize loops by
 1. The loop invariant code motion transformation.
 2. The strength reduction on induction variables transformation.

Loop Invariant Code Motion
The statements within a loop that compute values which do not vary throughout the life of the loop are called loop invariant statements.
Consider the following program fragment:
    int a[100];
    int func (int x, int y)
    {
        int i;
        int n1, n2;
        i = 0;
        n1 = x * y;
        n2 = x - y;
        while (a[i] > (n1 * n2))
            i = i + 1;
        return (i);
    }
The three-address code for the above program is:
    (0)  proc-begin func
    (1)  i := 0
    (2)  n1 := x * y
    (3)  n2 := x - y
    (4)  label L0
    (5)  t2 := i * 4
    (6)  t3 := &arr
    (7)  t4 := t3[t2]
    (8)  t5 := n1 * n2
    (9)  if t4 > t5 goto L1
    (10) goto L2
    (11) label L1
    (12) i := i + 1
    (13) goto L0
    (14) label L2
    (15) return i
    (16) goto L3
    (17) label L3
    (18) proc-end func
In the above code, statements (6) and (8) are loop invariant. After the loop-invariant code motion transformation the code will be:
    (0)  proc-begin func
    (1)  i := 0
    (2)  n1 := x * y
    (3)  n2 := x - y
    (4)  t3 := &arr
    (5)  t5 := n1 * n2
    (6)  label L0
    (7)  t2 := i * 4
    (8)  t4 := t3[t2]
    (9)  if t4 > t5 goto L1
    (10) goto L2
    (11) label L1
    (12) i := i + 1
    (13) goto L0
    (14) label L2
    (15) return i
    (16) goto L3
    (17) label L3
    (18) proc-end func

Strength Reduction on Induction Variables
Induction variable: a variable that changes by a fixed quantity on each iteration of a loop is an induction variable.
Example: Consider the following code fragment:
    int i;
    int a[20];
    int func ( )
    {
        while (i < 20)
        {
            a[i] = 10;
            i = i + 1;
        }
    }
The three-address code will be:
    (0)  proc-begin func
    (1)  label L0
    (2)  if i < 20 goto L1
    (3)  goto L2
    (4)  label L1
    (5)  t0 := i * 4
    (6)  t1 := &a
    (7)  t1[t0] := 10
    (8)  i := i + 1
    (9)  goto L0
    (10) label L2
    (11) label L3
    (12) proc-end func
After reduction of strength, (5) t0 := i * 4 is moved out of the loop and (8) is followed by t0 := t0 + 4:
    (0)  proc-begin func
    (0a) t0 := i * 4
    (1)  label L0
    (2)  if i < 20 goto L1
    (3)  goto L2
    (4)  label L1
    (6)  t1 := &a
    (7)  t1[t0] := 10
    (8)  i := i + 1
    (8a) t0 := t0 + 4
    (9)  goto L0
    (10) label L2
    (11) label L3
    (12) proc-end func

Loops in Flow Graphs
Loops in the code are detected during data flow analysis by using the concept of 'dominators' in the flow graph.

Dominators
A node d of a flow graph dominates node n if every path from the initial node to n goes through d. It is written d dom n.
Notes:
 1. Each and every node dominates itself.
 2. The entry of a loop dominates all nodes in the loop.
Example: Consider the following code fragment:
    int func (int a)
    {
        int x, y;
        x = a;
        y = a;
        while (a < 100)
        {
            y = y * x;
            x = x + 1;
        }
        return (y);
    }
The three-address code after local optimization will be:
    (0)  proc-begin func
    (1)  x := a
    (2)  y := a
    (3)  label L0
    (4)  if a < 100 goto L1
    (5)  goto L2
    (6)  label L1
    (7)  t0 := y * x
    (8)  y := t0
    (9)  t1 := x + 1
    (10) x := t1
    (11) goto L0
    (12) label L2
    (13) return y
    (14) goto L3
    (15) label L3
    (16) proc-end func
The flow graph for this code will be:

    B0: proc-begin func; x := a; y := a
    B1: label L0; if a < 100 goto L1
    B2: goto L2
    B3: label L1; t0 := y * x; y := t0; t1 := x + 1; x := t1; goto L0
    B4: label L2; return y; goto L3
    B5: label L3; proc-end func

    Edges: B0 → B1, B1 → B2, B1 → B3, B3 → B1, B2 → B4, B4 → B5

To reach B2, control must pass through B1, so B1 dominates B2; B0 also dominates B2.
    dominators[B1] = {B0, B1}   (or)   dominators[1] = {0, 1}
The dominators for each of the nodes in the flow graph are:
    dominators[0] = {0}
    dominators[1] = {0, 1}
    dominators[2] = {0, 1, 2}
    dominators[3] = {0, 1, 3}
    dominators[4] = {0, 1, 2, 4}
    dominators[5] = {0, 1, 2, 4, 5}

Edge
An edge in a flow graph represents a possible flow of control. In the flow graph, the B0 to B1 edge is written 0 → 1.
Head and tail: In the edge a → b, the node b is called the head and the node a is called the tail.
Back edges: Some edges have the property that dominators[tail] contains the head.
The presence of a back edge indicates the existence of a loop in the flow graph. In the previous graph, 3 → 1 is a back edge.
Consider the following table:

    Edge     Head   Tail   Dominators[head]   Dominators[tail]
    0 → 1     1      0     {0, 1}             {0}
    1 → 2     2      1     {0, 1, 2}          {0, 1}
    1 → 3     3      1     {0, 1, 3}          {0, 1}
    3 → 1     1      3     {0, 1}             {0, 1, 3}     ← back edge
    2 → 4     4      2     {0, 1, 2, 4}       {0, 1, 2}
    4 → 5     5      4     {0, 1, 2, 4, 5}    {0, 1, 2, 4}

Example: Consider the flow graph with blocks B0-B7 and edges

    B0 → B1, B0 → B2
    B1 → B3, B3 → B1, B3 → B5
    B2 → B4, B4 → B6, B6 → B2
    B5 → B7, B6 → B7

The dominators of each node are:
    dominators[0] = {0}
    dominators[1] = {0, 1}
    dominators[2] = {0, 2}
    dominators[3] = {0, 1, 3}
    dominators[4] = {0, 2, 4}
    dominators[5] = {0, 1, 3, 5}
    dominators[6] = {0, 2, 4, 6}
    dominators[7] = {0, 7}

    Edge     Head   Tail   Dominators[head]   Dominators[tail]
    0 → 1     1      0     {0, 1}             {0}
    0 → 2     2      0     {0, 2}             {0}
    1 → 3     3      1     {0, 1, 3}          {0, 1}
    3 → 1     1      3     {0, 1}             {0, 1, 3}     ← back edge
    3 → 5     5      3     {0, 1, 3, 5}       {0, 1, 3}
    5 → 7     7      5     {0, 7}             {0, 1, 3, 5}
    2 → 4     4      2     {0, 2, 4}          {0, 2}
    6 → 2     2      6     {0, 2}             {0, 2, 4, 6}  ← back edge
    4 → 6     6      4     {0, 2, 4, 6}       {0, 2, 4}
    6 → 7     7      6     {0, 7}             {0, 2, 4, 6}

Here {B6, B2, B4} form a loop (L1) and {B3, B1} form another loop (L2). In a loop, the entry of the loop dominates all nodes in the loop.

Header of the loop The entry of the loop is also called the header of the loop.

Loop exit block Loop L1 can be exited from the basic block B6; it is called the loop exit block. The block B3 is the loop exit block for loop L2. It is possible to have multiple exit blocks in a loop.

Dominator Tree
A tree which represents the dominance information in the form of a tree is a dominator tree. In this,
•• The initial node is the root.
•• Each node d dominates only its descendants in the tree.
Consider the flow graph with nodes 1-10 and edges

    1 → 2, 1 → 3, 2 → 3, 3 → 4,
    4 → 3, 4 → 5, 4 → 6, 5 → 7, 6 → 7,
    7 → 4, 7 → 8, 8 → 3, 8 → 9, 8 → 10,
    9 → 1, 10 → 7

The dominators of each node are:
    dominators[1] = {1}
    dominators[2] = {1, 2}
    dominators[3] = {1, 3}
    dominators[4] = {1, 3, 4}
    dominators[5] = {1, 3, 4, 5}
    dominators[6] = {1, 3, 4, 6}
    dominators[7] = {1, 3, 4, 7}
    dominators[8] = {1, 3, 4, 7, 8}
    dominators[9] = {1, 3, 4, 7, 8, 9}
    dominators[10] = {1, 3, 4, 7, 8, 10}

The dominator tree will be:

    1
    ├─ 2
    └─ 3
       └─ 4
          ├─ 5
          ├─ 6
          └─ 7
             └─ 8
                ├─ 9
                └─ 10
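The dominator sets shown above can be computed by simple iteration: initialize dominators[entry] = {entry} and every other set to "all nodes", then repeatedly set dominators[n] = {n} ∪ ⋂ dominators[p] over all predecessors p of n until nothing changes. A compact sketch using bit sets (an illustration only; the representation is an assumption):

    #include <stdint.h>

    #define N 11                      /* nodes 1..10 of the example graph */

    /* pred[n] = bit set of predecessors of node n; dom[n] = result sets */
    void dominators(const uint32_t pred[N], uint32_t dom[N]) {
        uint32_t all = 0;
        for (int n = 1; n < N; n++) all |= 1u << n;
        dom[1] = 1u << 1;                          /* entry dominates itself */
        for (int n = 2; n < N; n++) dom[n] = all;  /* start from "all nodes" */
        for (int changed = 1; changed; ) {
            changed = 0;
            for (int n = 2; n < N; n++) {
                uint32_t meet = all;
                for (int p = 1; p < N; p++)        /* intersect predecessors */
                    if (pred[n] & (1u << p)) meet &= dom[p];
                uint32_t next = meet | (1u << n);
                if (next != dom[n]) { dom[n] = next; changed = 1; }
            }
        }
    }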
Pre-header
A pre-header is a basic block introduced during loop optimization to hold the statements that are moved from within the loop. It is a predecessor of the header block.

    Before:  … → [Header] ⟲ loop L
    After:   … → [Pre-header] → [Header] ⟲ loop L

Reducible Flow Graphs
A flow graph G is reducible if and only if we can partition the edges into two disjoint groups,
 1. forward edges, and
 2. backward edges,
with the following properties:
 (i)  The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
 (ii) The back edges consist only of edges whose heads dominate their tails.
Example: In the flow graph above there are five back edges: 4 → 3, 7 → 4, 8 → 3, 9 → 1 and 10 → 7. Remove all back edges; the remaining edges must be the forward edges, and the remaining graph is acyclic.
∴ It is reducible.

Global Dataflow Analysis
Point: A point is a place of reference that can be found
 1. before the first statement in a basic block,
 2. after the last statement in a basic block, or
 3. in between two adjacent statements within a basic block.
Example 1:
    B1:  a := 10
         b := 20
         c := a * b
Here, in B1 there are 4 points.

Path: A path is a sequence of points through which control can flow.
A path from P1 to Pn is a sequence of points P1, P2, …, Pn such that for each i between 1 and n-1, either
 (a) Pi is the point immediately preceding a statement and Pi+1 is the point immediately following that statement in the same block,
(or)
 (b) Pi is the end of some block and Pi+1 is the beginning of a successor block.
Example 2: There are 4 points in the basic block b0, and similarly for the other blocks:

    b0: proc-begin func; v3 := v1 + v2; if c > 100 goto L0     points P0–b0 … P3–b0
    b1: v1 := 0                                                points P3–b1, P4–b1
    b2: label L0; v4 := v1 + v2; goto L1                       points P4–b2 … P7–b2
    b3: label L1; v5 := v1 + v2                                points P7–b3 … P9–b3
    b4: label L2; proc-end func                                points P9–b4 … P11–b4

The path between the points P0–b0 and P6–b2 is the sequence of points P0–b0, P1–b0, P2–b0, P3–b0, P4–b2, P5–b2 and P6–b2.
Path between P3–b1 and P6–b2: there is no such sequence of points.
Path between P0–b0 and P7–b3: there are two paths.
 (1) Path 1 consists of the sequence of points P0–b0, P1–b0, P2–b0, P3–b0, P3–b1, P4–b1 and P7–b3.
 (2) Path 2 consists of the sequence of points P0–b0, P1–b0, P2–b0, P3–b0, P4–b2, P5–b2, P6–b2, P7–b2 and P7–b3.

Definition and Usage of Variables

Definitions
A definition is either an assignment to the variable or the reading of a value for the variable.

Use
A use of identifier x means any occurrence of x as an operand.
Example: Consider the statement
    x = y + z;
In this statement some value is assigned to x. It defines x and uses the values of y and z.

Global Data-Flow Analysis
Data Flow Analysis (DFA) is a technique for gathering information about the possible set of values calculated at various points in a program.
•• An example of a data-flow analysis is reaching definitions.
•• A simple way to perform data-flow analysis of a program is to set up data-flow equations for each node of the control flow graph.

Use-definition (U-d) chaining
The use of a value is any point where that variable or constant is used in the right-hand side of an assignment or in evaluating an expression.
The definition of a value occurs implicitly at the beginning of the whole program for a variable.
A point is defined either prior to or immediately after a statement.

Reaching definitions
A definition of a variable A reaches a point P if there is a path in the flow graph from that definition to P, such that no other definitions of A appear on the path.
Example:

    B1: if A = B goto B3
    B2: A := 2               B3: if A = B goto B5
    B4: A := 3
    B5: P:

The definition A := 3 can reach point P in B5.
To determine the definitions that can reach a given program point, first assign distinct numbers to each definition, since each is associated with a unique quadruple.
•• For each simple variable A, make a list of all definitions of A anywhere in the program.
•• Compute two sets for each basic block B:
   GEN[B] is the set of definitions generated within block B that reach the end of the block.
 1. KILL[B], the set of definitions outside of B that define identifiers that also have definitions within B.
 2. IN[B], the set of all definitions reaching the point just before B's first statement.
Once this is known, the definitions reaching any use of A within B are found as follows. Let u be the statement being examined, which uses A.
 1. If there are definitions of A within B before u, the last one is the only one reaching u.
 2. If there is no definition of A within B prior to u, those reaching u are the definitions in IN[B].

Data Flow Equations
 1. For all blocks B,
        OUT[B] = (IN[B] - KILL[B]) ∪ GEN[B]
    A definition d reaches the end of B if
     (a) d ∈ IN[B] and is not killed by B,
    (or)
     (b) d is generated in B and is not subsequently redefined there.
 2. IN[B] = ∪ OUT[P] over all predecessors P of B.
    A definition reaches the beginning of B iff it reaches the end of one of its predecessors.
Computing U-d Chains
If a use of variable a is preceded in its block by a definition of a, that definition is the only one reaching the use. If no such definition precedes the use, all definitions of a in IN[B] are on its chain.

Uses of U-d Chains
 1. If the only definition of a reaching a statement involves a constant, we can substitute that constant for a.
 2. If no definition of a reaches the point, a warning can be given.
 3. If a definition reaches nowhere, it can be eliminated. This is part of dead code elimination.
Exercises

Practice Problems 1
Directions for questions 1 to 15: Select the correct alternative from the given choices.
 1. Replacing the expression 2 * 3.14 by 6.28 is
    (A) Constant folding      (B) Induction variable
    (C) Strength reduction    (D) Code reduction
 2. The expression (a*b)*c op …, where 'op' is one of '+', '*' and '↑' (exponentiation), can be evaluated on a CPU with a single register without storing the value of (a*b) if
    (A) 'op' is '+' or '*'
    (B) 'op' is '↑' or '+'
    (C) 'op' is '↑' or '*'
    (D) not possible to evaluate without storing
 3. Machine-independent code optimization can be applied to
    (A) Source code
    (B) Intermediate representation
    (C) Runtime output
    (D) Object code
 4. In block B, if S occurs in B and there is no subsequent assignment to y within B, then the copy statement S : x = y is
    (A) Generated    (B) Killed
    (C) Blocked      (D) Dead
 5. If E was previously computed and the values of the variables in E have not changed since the previous computation, then an occurrence of the expression E is a
    (A) Copy propagation    (B) Common sub-expression
    (C) Dead code           (D) Constant folding
 6. In block B, if x or y is assigned there and s is not in B, then s : x = y is
    (A) Generated    (B) Killed
    (C) Blocked      (D) Dead
 7. Given the following code
        A = x + y;
        –––––
        B = x + y;
    the corresponding optimized code is
        –––––
        C = x + y;
        –––––
        A = C;
        –––––
        B = C;
    When will the optimized code pose a problem?
    (A) When C is undefined.
    (B) When memory is a consideration.
    (C) C may not remain the same after some statements.
    (D) Both (A) and (C).
 8. Can the loop invariant X = A - B be moved out of the following code?
        For i = 1 to 10
        {
            A = B * C;
            X = A - B;
        }
    (A) No
    (B) Yes
    (C) X = A - B is not invariant
    (D) Data insufficient
 9. If every path from the initial node goes through a particular node, then that node is said to be a
    (A) Header       (B) Dominator
    (C) Parent       (D) Descendant
6.66 | Unit 6 • Compiler Design

Common data for questions 10 and 11: Consider the following statements of a block:
a: = b + c
b: = a – d
c: = b + c
d: = a – d

10. In the above basic block, the value of b in the 3rd statement is
(A) Same as b in 1st statement
(B) Different from b in 1st statement
(C) 0
(D) 1

11. The above basic block contains
(A) Two common sub expressions
(B) Only one common sub expression
(C) Dead code
(D) Temporary variable

12. Find the induction variable in the following code:
A = –0.2;
B = A + 5.0;
(A) A
(B) B
(C) Both A and B are induction variables
(D) No induction variables

13. The analysis that cannot be implemented by a forward-operating data flow equations mechanism is
(A) Interprocedural
(B) Procedural
(C) Live variable analysis
(D) Data

14. Which of the following consists of a definition of a variable and all the uses, U, reachable from that definition without any other intervening definitions?
(A) Ud-chaining (B) Du-chaining
(C) Spanning (D) Searching

15. Consider the graph

      1
    2   3

[Figure: nodes 1, 2 and 3; edges as drawn in the original.]
The graph is
(A) Reducible graph
(B) Non-reducible graph
(C) Data insufficient
(D) None of these

Practice Problems 2

Directions for questions 1 to 15: Select the correct alternative from the given choices.

1. In the labeling algorithm, let n be a binary node whose children have labels L1 and L2. If L1 = L2, then LABEL (n) is:
(A) L1 – 1 (B) L2 + 1
(C) L1 + L1 (D) L1 + 1

2. The input for the code generator is a:
(A) Tree at lexical level
(B) Tree at semantic level
(C) Sequence of assembly language instructions
(D) Sequence of machine idioms

3. In the labeling algorithm, let n be a binary node whose children have labels i1 and i2. If i1 ≠ i2, then LABEL (n) is
(A) Max (i1, i2)
(B) i2 + 1
(C) i2 – 1
(D) i2 – i1

4. The technique that tries to keep a frequently used value in a fixed register throughout a loop is:
(A) Usage counts
(B) Global register allocation
(C) Conditional statement
(D) Pointer assignment

5. Substitute y for x in the copy statement s : x = y if the following condition is met:
(A) Statement s may be the only definition of x reaching u
(B) x is dead
(C) y is dead
(D) x and y are aliases

6. Consider the following code
for (i=0; i<m; i++)
{
for (j=0; j<m; j++)
if (i%2)
{
a = a + (14*j + 5*i);
b = b + (9 + 4*j);
}
}
Which of the following is false?
(A) There is a scope of common reduction in this code
(B) There is a scope of strength reduction in this code.
(C) There is a scope of dead code elimination in this code
(D) Both (A) and (C)

7. S1: In a dominance tree, the initial node is the root.
S2: Each node d dominates only its ancestors in the tree.
S3: If d ≠ n and d dom n, then d dom m.
Which of the statements is/are true?
(A) S1, S2 are true
(B) S1, S2 and S3 are true
(C) Only S3 is true
(D) Only S1 is true

8. The specific tasks the storage manager performs are:
(A) Allocation/Deallocation of storage to programs
(B) Protection of storage area allocated to a program from illegal access by other programs in the system
(C) The status of each program
(D) Both (A) and (B)

9. A concept which can be used to identify loops is:
(A) Dominators
(B) Reducible graphs
(C) Depth first ordering
(D) All of these

10. A point cannot be found:
(A) Between two adjacent statements
(B) Before the first statement
(C) After the last statement
(D) Between any two statements

11. In the statement x = y*10 + z; which variable(s) is/are defined?
(A) x (B) y
(C) z (D) Both (B) and (C)

12. Consider the following program:
void main ( )
{
int x, y;
x = 3; y = 7;
--------
--------
if (x<y)
{
int x;
{
int y;
y = 9;
------
x = 2*y;
}
-------
-------
x = x + y;
printf (“%d”, x);
}
------
printf (“%d”, x);
}
The output is
(A) 3 – 25 (B) 25 – 3
(C) 3 – 3 (D) 25 – 25

13. The evaluation strategy which delays the evaluation of an expression until its value is needed and which avoids repeated evaluations is:
(A) Early evaluation (B) Late evaluation
(C) Lazy evaluation (D) Critical evaluation

14. If two or more expressions denote the same memory address, then the expressions are:
(A) Aliases (B) Definitions
(C) Superiors (D) Inferiors

15. Operations that can be removed completely are called:
(A) Strength reduction
(B) Null sequences
(C) Constant folding
(D) None of these



Previous Years’ Questions

1. In a compiler, keywords of a language are recognized during: [2011]
(A) parsing of the program
(B) the code generation
(C) the lexical analysis of the program
(D) dataflow analysis

2. Consider the program given below, in a block structured pseudo-language with lexical scoping and nesting of procedures permitted. [2012]
Program main;
Var …
Procedure A1;
Var …
Call A2;
End A1
Procedure A2;
Var …
Procedure A21;
Var …
Call A1;
End A21
Call A21;
End A2
Call A1;
End main
Consider the calling chain: Main → A1 → A2 → A21 → A1
The correct set of activation records along with their access links is given by:
[In the original, each option is drawn as a stack of activation records with a frame pointer and access-link arrows; the arrows and frame-pointer positions differ between options and are not reproducible here.]
(A) Stack: Main, A1, A2, A21, A1
(B) Stack: Main, A1, A2, A21, A1
(C) Stack: Main, A1, A2, A21
(D) Stack: Main, A1, A2, A21, A1

Common data for questions 3 and 4: The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.
c = a + b;
d = c * a;
e = c + a;
x = c * c;
if (x > a) {
y = a * a;
}
else {
d = d * d;
e = e * e;
}

3. What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation. [2013]
(A) 3 (B) 4
(C) 5 (D) 6

4. Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code? [2013]
(A) 0 (B) 1
(C) 2 (D) 3

5. Which one of the following is NOT performed during compilation? [2014]
(A) Dynamic memory allocation
(B) Type checking
(C) Symbol table management
(D) Inline expansion

6. Which of the following statements are CORRECT? [2014]
(i) Static allocation of all data areas by a compiler makes it impossible to implement recursion.
(ii) Automatic garbage collection is essential to implement recursion.
(iii) Dynamic allocation of activation records is essential to implement recursion.
(iv) Both heap and stack are essential to implement recursion.
(A) (i) and (ii) only (B) (ii) and (iii) only
(C) (iii) and (iv) only (D) (i) and (iii) only

7. A variable x is said to be live at a statement Si in a program if the following three conditions hold simultaneously: [2015]
1. There exists a statement Sj that uses x
2. There is a path from Si to Sj in the flow graph corresponding to the program.
3. The path has no intervening assignment to x including at Si and Sj.

[Control flow graph: block 1 branches to blocks 2 and 3, which both lead to block 4.]
Block 1: p = q + r; s = p + q; u = s * v
Block 2: v = r + u
Block 3: q = s * u
Block 4: q = v + r

The variables which are live both at the statement in basic block 2 and at the statement in basic block 3 of the above control flow graph are
(A) p, s, u (B) r, s, u
(C) r, u (D) q, v

8. Match the following [2015]
P. Lexical analysis        1. Graph coloring
Q. Parsing                 2. DFA minimization
R. Register allocation     3. Post-order traversal
S. Expression evaluation   4. Production tree
(A) P–2, Q–3, R–1, S–4 (B) P–2, Q–1, R–4, S–3
(C) P–2, Q–4, R–1, S–3 (D) P–2, Q–3, R–4, S–1

9. Consider the following directed graph:

     b → c
   ↗       ↘
  a           f
   ↘       ↗
     d → e

The number of different topological orderings of the vertices of the graph is _______ . [2016]

10. Consider the following grammar:
stmt → if expr then expr else expr; stmt | ∈
expr → term relop term | term
term → id | number
id → a | b | c
number → [0 − 9]
where relop is a relational operator (e.g., <, >, …), ∈ refers to the empty statement, and if, then, else are terminals.
Consider a program P following the above grammar containing ten if terminals. The number of control flow paths in P is __________. For example, the program
if e1 then e2 else e3
has 2 control flow paths, e1 → e2 and e1 → e3. [2017]

11. Consider the expression (a − 1) ∗ (((b + c)/3) + d). Let X be the minimum number of registers required by an optimal code generation (without any register spill) algorithm for a load/store architecture, in which (i) only load and store instructions can have memory operands and (ii) arithmetic instructions can have only register or immediate operands. The value of X is _______. [2017]

12. Match the following according to the input (from the left column) to the compiler phase (in the right column) that processes it: [2017]
(P) Syntax tree                  (i) Code generator
(Q) Character stream             (ii) Syntax analyzer
(R) Intermediate representation  (iii) Semantic analyzer
(S) Token stream                 (iv) Lexical analyzer
(A) P → (ii), Q → (iii), R → (iv), S → (i)
(B) P → (ii), Q → (i), R → (iii), S → (iv)
(C) P → (iii), Q → (iv), R → (i), S → (ii)
(D) P → (i), Q → (iv), R → (ii), S → (iii)

Answer Keys

Exercises

Practice Problems 1

1. A 2. C 3. B 4. A 5. B 6. B 7. C 8. B 9. B 10. B

11. B 12. D 13. C 14. B 15. B



Practice Problems 2

1. D 2. B 3. A 4. B 5. A 6. D 7. D 8. D 9. D 10. D
11. A 12. B 13. C 14. A 15. B

Previous Years’ Questions


1. C 2. D 3. B 4. B 5. A 6. D 7. C 8. C 9. 6 10. 1024
11. 2 12. C

Test

Compiler Design    Time: 45 min.

Directions for questions 1 to 30: Select the correct alternative from the given choices.

1. The most powerful parsing method is
(A) LALR (B) LR
(C) CLR (D) LL (1)

2. In which phase is ‘type checking’ done?
(A) Lexical analysis
(B) Code optimization
(C) Syntax analysis
(D) Semantic analysis

3. A shift-reduce parser carries out the actions specified within braces immediately after reducing the corresponding rule of the grammar, as below:
S → aaD {Print “1”}
S → b {Print “2”}
D → Sc {Print “3”}
What is the translation of ‘aaaabcc’ using the syntax directed translation scheme described by the above rules?
(A) 33211 (B) 11233
(C) 11231 (D) 23131

4. E → TE′
E′ → + TE′/∈
T → FT′
T′ → *FT′/∈
F → (E)/id
For the above grammar, FOLLOW (E) is
(A) { ), $} (B) {$, *, )}
(C) {(, id} (D) {+, ), $}

5. To eliminate backtracking, which one is used?
(A) Left Recursion
(B) Left Factoring
(C) Right Recursion
(D) Right Factoring

6. Consider the grammar
T → (T) | ∈
Let the number of states in SLR (1), LR (1) and LALR (1) parsers for the grammar be n1, n2 and n3 respectively. Which relationship holds?
(A) n1 = n2 = n3
(B) n1 ≥ n3 ≥ n2
(C) n1 = n3 < n2
(D) n1 < n2 < n3

7. If w is a string of terminals and A, B are two non-terminals, then which of the following are left-linear grammars?
(A) A → wB/w
(B) A → Bw/w
(C) A → wB
(D) None of the above

8. The grammar E → E * E/E + E/a is
(A) Ambiguous
(B) Unambiguous
(C) Will not depend on the given sentence
(D) None of these

9. Shift-reduce parsers are
(A) Bottom up parsers
(B) Top down parsers
(C) Both (A) and (B)
(D) None of these

10. Consider the following grammars:
I. E → TE′
   E′ → + TE′/∈
   T → FT′
   T′ → *FT′/∈
   F → (E)/id
II. S → iCtSS′ | a
    S′ → eS | ∈
    C → b
Which of the following is true?
(A) II is LL (1) (B) I is LL (1)
(C) Both (A) and (B) (D) None of these

11. Consider the following grammar:
S → iCtSS′/a
S′ → eS/∈
C → b
FIRST (S′) is
(A) {i, a} (B) {$, e}
(C) {e, ∈} (D) {b}

12. For the above grammar, FOLLOW (S) is
(A) {$, e} (B) {$}
(C) {e} (D) {$, ), e}

13. Find LEADING (S) for the following grammar:
S → a | ^ | (T)
T → T, S / S
(A) {a, ^, ( } (B) {, a,)}
(C) {, a, ( } (D) {, a, ^,)}

14. For the above grammar, find TRAILING (T).
(A) {a,)} (B) {a, ^,)}
(C) {),} (D) {, a,)}

15. Which of the following remarks logically follows?
(A) FIRST (∈) = {∈}.
(B) If FOLLOW (A) contains $, then A may or may not be the start symbol.
(C) If A → w is a production in the given grammar G, then FIRSTk (A) contains FIRSTk (w).
(D) All of the above

16. Consider the following grammar:
S → AB
B → ab
A → aa
A → a
B → b
The grammar is
(A) Ambiguous
(B) Unambiguous
(C) Cannot be predicted
(D) None of these

17. If a handle has been found but there is no production with this handle as a right side, then we discover a
(A) Logical error
(B) Runtime error
(C) Syntactic error
(D) All of the above

18. The function of the syntax phase is
(A) To build a literal table
(B) To build a uniform symbol table
(C) To parse the tokens produced by the lexical analyzer
(D) None of these

19. Which of the following are cousins of compilers?
(A) Pre-processor and Assembler
(B) Assembler and LEX
(C) Pre-processor and YACC
(D) LEX and YACC

20. An error is detected in predictive parsing when ____ hold(s).
(i) Terminal ‘a’ is on top of the stack and the next input symbol is ‘b’.
(ii) Non-terminal A is on top of the stack, ‘a’ is the next input symbol, and the parsing table entry M [A, a] is empty.
(A) Neither (i) nor (ii)
(B) Both (i) and (ii)
(C) only (i)
(D) only (ii)

21. Which one indicates the abstract syntax tree (AST) of “a * b + c” for the following grammar?
E → E * T/T
T → T + F/F
F → id
[The options are drawn as trees in the original; in linear form:]
(A) * over (a, b), with c below (drawing partly garbled in this reproduction)
(B) + over (b, c)
(C) * over (a, + over (b, c))
(D) * over (+ over (b, c), a)

22. The parse tree is constructed and then traversed, and the semantic rules are evaluated in a particular order by a
(A) Recursive evaluator
(B) Bottom up translation
(C) Top down translation
(D) Phase tree method

23. The following grammar indicates
S → a a b | b a c | a b
S → a S | b
S → a b b / a b
S → a b d b / b
(A) LR (0) grammar
(B) SLR grammar
(C) Regular grammar
(D) None of these

24. If the attributes of a child depend on the attributes of its parent node, then it is a(n) ____ attribute.
(A) Inherited
(B) Directed
(C) Synthesised
(D) TAC

25. The semantic rule is evaluated and the intermediate code is generated when the production is expanded in _____
(A) Parse tree method
(B) Bottom up translation
(C) Top down translation
(D) Recursive evaluator model

26. Consider the grammar shown below:
S → CC
C → cC/a
The grammar is
(A) LL (1)
(B) SLR (1) but not LL (1)
(C) LALR (1) but not SLR (1)
(D) LR (1) but not LALR

27. The class of grammars for which we can construct predictive parsers, looking k symbols ahead in the input, is called
(A) LR (k)
(B) CLR (k)
(C) LALR (k)
(D) LL (k)

28. A compiler is a program that
(A) Places programs into memory and prepares them for execution.
(B) Automates the translation of assembly language into machine language.
(C) Accepts a program written in a high level language and produces an object program.
(D) Appears to execute a source program as if it were machine language.

Common data for questions 29 and 30:
Consider the grammar
E → TE′
E′ → + TE′ | ∈
T → FT′
T′ → * FT′ | ∈
F → (E) | id

29. Which one is FOLLOW (F)?
(A) {+, ), $} (B) {+, (, ), *}
(C) {*, ), $} (D) {+, *, ), $}

30. FIRST (E) will be the same as
(A) FIRST (T) (B) FIRST (F)
(C) Both (A) and (B) (D) None of these

Answer Keys

1. A 2. D 3. D 4. A 5. B 6. C 7. B 8. A 9. A 10. B

11. C 12. A 13. A 14. C 15. D 16. A 17. C 18. C 19. A 20. B

21. C 22. A 23. D 24. A 25. C 26. A 27. D 28. C 29. D 30. C
