Gate Compiler Design
Gate Compiler Design
JW
n
Pw
Chapter 1: Lexical Analysis and Parsing 6.3
Jf
/2
Chapter 2: Syntax Directed Translation 6.27
ly
it.
//b
Chapter 3: Intermediate
s:
tp
Code Generation6.36
ht
i
S
TE
Chapter 4: Code Optimization6.56
O
N
D
N
A
H
EW
N
SE
t
C
TE
A
G
D
A
LO
N
W
O
D
6
D
O
W
N
LO
A
D
G
A
TE
C
SE
N
EW
H
A
N
D
N
O
TE
S
ht
This page is intentionally left blank
tp
s:
//b
it.
ly
/2
Jf
Pw
JW
Chapter 1
Lexical Analysis and Parsing
JW
LEARNING OBJECTIVES
Pw
Language processing system Bottom up parsing
Jf
Lexical analysis Conflicts
/2
Syntax analysis Operator precedence grammar
ly
it.
Context free grammars and ambiguity LR parser
//b
Types of parsing Canonical LR parser(CLR)
s:
Top down parsing
tp
ht
S
TE
O
N
program program
Language Processors
N
tation and immediately executes this. The number of iterations to scan the source code, till to get the
executable code is called as a pass.
C
Source program Compiler is two pass. Single pass requires more memory and
TE
Interpreter Output
Input multipass require less memory.
A
G
A software system that converts the source code from one form of
N
Compilers Analysis It breaks up the source program into pieces and creates
A compiler is a software that translates code written in high-level an intermediate representation of the source program. This is more
language (i.e., source language) into target language. language specific.
Example: source languages like C, Java, . . . etc. Compilers are
user friendly. Synthesis It constructs the desired target program from the inter-
The target language is like machine language, which is efficient mediate representation. The target program will be more machine
for hardware. specific, dealing with registers and memory locations.
6.4 | Unit 6 • Compiler Design
JW
E→E*E
The back end includes code optimization and code gen- E → –E
Pw
(G1)
eration phases. The back end synthesizes the target program E → (E)
Jf
from intermediate code. E → id
/2
E
ly
Context of a compiler
it.
In addition to a compiler, several other programs may be − E
//b
required to create an executable target program, like pre-
s:
processor to expand macros. ( E )
tp
The target program created by a compiler may require E + E
ht
further processing before it can be run.
The language processing system will be like this: id id
S
TE
Source program with macros
Semantic analysis
O
N
Compiler
•• Gathers the type information for the next phases.
EW
Loader/linker
Library files, relocatable Example 2:
TE
char c;
D
Phases c = a + b;
A
Compilation process is partitioned into some subproceses We cannot add integer with a Boolean variable and assign it
LO
ticular task and parsing out its output for the next phase. The intermediate representation should have two important
D
properties:
Lexical analysis or scanning (i) It should be easy to produce.
It is the first phase of a compiler. The lexical analyzer reads the (ii) Easy to translate into the target program
stream of characters making up the source program and groups
‘Three address code’ is one of the common forms of
the characters into meaningful sequences called lexemes.
Intermediate code.
Example: Consider the statement: if (a < b) Three address code consists of a sequence of instruc-
In this sentence the tokens are if, (a, <, b,). tions, each of which has at most three operands.
Number of tokens = 6
Identifiers: a, b Example:
Keywords: if id1 = id2 + id3 × 10;
Operators: <, ( , ) t1: = inttoreal(10)
Chapter 1 • Lexical Analysis and Parsing | 6.5
t2:= id3 × t1 in the source program. The stream of tokens is sent to the
t3:= id2 + t2 parser for syntax analysis.
There will be interaction with the symbol table as well.
id1 = t3
Source program
Code optimization
Lexical
The output of this phase will result in faster running analyzer
machine code.
Get
Example: For the above intermediate code the optimized Error handler next Tokens Symbol table
tokens
code will be
t1:= id3 × 10.0
JW
Parser
id1: = id2 + t1
Pw
In this we eliminated t2 and t3 registers.
Jf
Lexeme: Sequence of characters in the source program
Code generation
/2
that matches the pattern for a token. It is the smallest logical
ly
•• In this phase, the target code is generated. unit of a program.
it.
•• Generally the target code can be either a relocatable
//b
Example: 10, x, y, <, >, =
machine code or an assembly code.
s:
•• Intermediate instructions are each translated into a Tokens: These are the classes of similar lexemes.
tp
sequence of machine instructions.
ht
•• Assignment of registers will also be done. Example: Operators: <, >, =
Identifiers: x, y
S
Example: MOVF
id3, R2 TE Constants: 10
MULF ≠ 60.0, R2 Keywords: if, else, int
O
MOVF id2, R1
N
ADDF R2, R1
D
MOVF R1, id1
Operations performed by lexical analyzer
N
A
Symbol table management 2. Stripping out comments and white space (blank, new
EW
What is the use of a symbol table? expansion of macros may also be performed by lexical
1. To record the identifiers used in the source program. analyzer.
TE
3. If it is a procedure name then the number of argu- Example 1: Take the following example from Fortran
G
(i) Lexical phase can detect errors where the characters Example 2: An example from C program
O
remaining in the input ‘do not form any token’. for (int i = 1; i < = 10; i + +)
D
(ii) Errors of the type, ‘violation of syntax’ of the language Here tokens are for, (, int, i, =, 1,;, i, < =, 10,;,
are detected by syntax analysis. i, ++,)
(iii) Semantic phase tries to detect constructs that have the Number of tokens = 13
right syntactic structure but no meaning.
Example: adding two array names etc. LEX compiler
Lexical analyzer divides the source code into tokens. To
Lexical Analysis implement lexical analyzer we have two techniques namely
Lexical Analysis is the first phase in compiler design. The hand code and the other one is LEX tool.
main task of the lexical analyzer is to read the input char- LEX is an automated tool which specifies lexical ana-
acters of the source program, group them into lexemes, and lyzer, from the rules given by the regular expression.
produce as output a sequence of tokens for each lexeme These rules are also called as pattern recognizing rules.
6.6 | Unit 6 • Compiler Design
Syntax Analysis The parser generator is used for construction of the com-
pilers front end.
This is the 2nd phase of the compiler, checks the syntax and
constructs the syntax/parse tree.
Input of parser is token and output is a parse/ syntax tree. Scope of declarations
Declaration scope refers to the certain program text portion,
Constructing parse tree in which rules are defined by the language.
Within the defined scope, entity can access legally to
Construction of derivation tree for a given input string by
declared entities.
using the production of grammar is called parse tree.
The scope of declaration contains immediate scope
Consider the grammar
always. Immediate scope is a region of declarative portion
S → E + E/E * E
JW
with enclosure of declaration immediately.
E → id Scope starts at the beginning of declaration and scope
Pw
The parse tree for the string continues till the end of declaration. Whereas in the over
Jf
ω = id + id * id is loadable declaration, the immediate scope will begin, when
/2
S the callable entity profile was determined.
ly
The visible part refers text portion of declaration, which
it.
E + E is visible from outside.
//b
s:
E * E
Syntax Error Handling
tp
1. Reports the presence of errors clearly and accurately.
ht
id id id
2. Recovers from each error quickly.
ω = id + id * id
S
3. It should not slow down the processing of correct
TE
programs.
Role of the parser
O
N
recognized by using pushdown automata/table driven Panic Phrase level Error Global
H
tokens is found.
Source Token
C
program Lexical Parse Phrase level A parser may perform local correction on the
Parser
TE
analyzer tree
Get next remaining input. It may replace the prefix of the remaining
token
A
Syntax input.
Lexical
G
errors
errors Error productions Parser can generate appropriate error
D
(i) Top-down parser: It builds parse trees from the top minimal sequence of changes to obtain a globally least cost
O
JW
Sentence: A sentence is a sentential form with no •• Ambiguity can be handled in several ways
Pw
non-terminals. 1. Enforce associativity and precedence
2. Rewrite the grammar by eliminating left recursion and
Jf
Example: –(id + id) is a sentence of the grammar (G1).
/2
left factoring.
ly
Derivations
it.
Removal of ambiguity
//b
Left most derivations Right most derivations
The grammar is said to be ambiguous if there exists more
s:
E ⇒ −E ⇒ −(E ) E ⇒ −E ⇒ −(E ) than one derivation tree for the given input string.
tp
⇒ −(E + E ) ⇒ −(E + E ) The ambiguity of grammar is undecidable; ambiguity of
ht
⇒ −(id + E) ⇒ −(E + id)
a grammar can be eliminated by rewriting the grammar.
⇒ −(id + id) ⇒ −(id + id)
S
Example:
TE
Right most derivations are also known as canonical E → E + E/id} → ambiguous grammar
O
derivations. E → E + T/T rewritten grammar
N
E T → id (unambiguous grammar)
D
N
−
A
E Left recursion
H
( E )
need to remove left recursion.
+
N
E E
Elimination of left recursion
SE
id id
A → Aa/b is a left recursive.
C
Ambiguity A → bA′
A′ → aA′/e
A
A grammar that produces more than one parse tree for some
G
Or A → Aa1/Aa2/…/Aam/b1/b2/…/bn
A
LO
A grammar that produces more than one left most or more We can replace A productions by
than one right most derivations is ambiguous. A → b1 A′/b2 A′/–bn A′
N
A′ → a1 A′/a2 A′/–am A′
W
String → String + String/String – String /0/1/2/…/9 Example 3: Eliminate left recursion from
D
E → E + T/T
9 – 5 + 2 has two parse trees as shown below T → T * F/F
String F → (E)/id
Solution E → E + T/T it is in the form
A → Aa/b
String String
+ So, we can write it as E → TE′
E′ → +TE′/e
2
String String
Similarly other productions are written as
−
T → FT ′
9 5 T1 → × FT ′/∈
Figure 1 Leftmost derivation F → (E)/id
6.8 | Unit 6 • Compiler Design
Example 4 Eliminate left recursion from the grammar Let the string w = cad is to generate:
S → (L)/a S
L → L, S/b
c A d
Solution: S → (L)/a
a b
L → bL′
L′ → SL′/∈ The string generated from the above parse tree is cabd.
but, w = cad, the third symbol is not matched.
So, report error and go back to A.
Left factoring Now consider the other alternative for production A.
A grammar with common prefixes is called non-determin-
JW
S
istic grammar. To make it deterministic we need to remove
common prefixes. This process is called as Left Factoring.
Pw
c A d
The grammar: A → ab1/ab2 can be transformed into
Jf
a
A → a A′
/2
String generated ‘cad’ and w = cad. Now, it is successful.
ly
A′ → b1/b2
In this we have used back tracking. It is costly and time
it.
consuming approach. Thus an outdated one.
//b
Example 5: What is the resultant grammar after left
factoring the following grammar?
s:
Predictive Parsers
tp
S → iEtS/iEtSeS/a
ht
E→b By eliminating left recursion and by left factoring the gram-
mar, we can have parse tree without backtracking. To con-
S
Solution: S → iEtSS ′/a struct a predictive parser, we must know,
TE
S ′ → eS/∈
1. Current input symbol
O
E→b
N
grammar.
Types of Parsing
A
H
Parsers
parsing parsing
(table driven parsing)
D
SLR CLR LALR •• It maintains a stack explicitly, rather than implicitly via
A
recursive calls.
LO
→ An input buffer
W
→ A parsing table
root and creating the nodes of the parse tree in preorder. It → Output stream
simulates the left most derivation.
Input a + b $
Backtracking Parsing
If we make a sequence of erroneous expansions and sub-
x
sequently discover a mismatch we undo the effects and roll Predictive parsing
y Output
program
back the input pointer.
z
This method is also known as brute force parsing.
$
Example: S → cAd Parsing table
M
A → ab/a
Chapter 1 • Lexical Analysis and Parsing | 6.9
Constructing a parsing table By applying these rules to the above grammar, we will get
To construct a parsing table, we have to learn about two the following parsing table.
functions:
Input Symbol
1. FIRST ( )
2. FOLLOW ( ) Non-terminal id + * ( ) $
E E → TE ′ E → TE ′
FIRST(X) To compute FIRST(X) for all grammar symbols
E′ E ′ → + TE ′ E′ → e E′ → e
X, apply the following rules until no more terminals or e can
be added to any FIRST set. T T → FT ′ T → FT ′
T′ T′ → e T ′ → * FT ′ T′ → e T′ → e
1. If X is a terminal, then FIRST(X) is {X}.
JW
2. If X → e is a production, then add e to FIRST(X). F F → id F → (E)
Pw
then place ‘a’ in FIRST(X) if for some i, a is an FIRST The parser is controlled by a program. The program con-
Jf
(Yi) and ∈ is in all of FIRST(Y1), …, FIRST(Yi–1); that sider x, the symbol on top of the stack and ‘a’ the current
/2
*
is, Y1, …, Yi – 1 ⇒ ∈. If ∈ is in FIRST (Yj) for all j = 1, input symbol.
ly
2, …, k, then add ∈ to FIRST(X). For example, every-
it.
thing in FIRST (Y1) is surely in FIRST(X). If Y1 does 1. If x = a = $, the parser halts and announces successful
//b
not derive ∈, then add nothing more to FIRST(X), but if completion of parsing.
s:
* 2. If x = a ≠ $, the parser pops x off the stack and advances
Y1 ⇒ ∈, then add FIRST (Y2) and so on.
tp
the input pointer to the next input symbol.
ht
FOLLOW (A): To compute FOLLOW (A) for all non- 3. If x is a non-terminal, the program consults entry M[x,
terminals A, apply the following rules until nothing can be a] of the parsing table M. This entry will be either an
S
added to any FOLLOW set.
TE
x-production of the grammar or an error entry. If M[x,
a] = {x → UVW}, the parser replaces x on top of the
O
1. Place $ in FOLLOW(S), where S is the start symbol
stack by WVU with U on the top.
N
2. If there is a production A → aBb, then everything in If M[x, a] = error, the parser calls an error recovery routine.
N
E$ id+id*id$
E → TE′
C
F → (E)/id. Then
D
E′$
FIRST (E ′) = {+, e}
FIRST (T ′) = {*, e} id +TE′$ +id*id$ Output E′ → +TE′
N
W
JW
it is substituted by the non-terminal in the left hand side of ∗
Pw
the production.
Right sentential forms of a unambiguous grammar have
For example consider the grammar
Jf
one unique handle.
S → aABe
/2
Example: For grammar, S → aABe
ly
A → Abc/b
A → Abc/b
it.
B→d
B→d
//b
In bottomup parsing the string ‘abbcde’ is verified as S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
s:
abbcde
tp
Note: Handles are underlined.
aAbcde
ht
aAde → reverse order Handle pruning The process of discovering a handle and
S
aABe reducing it to the appropriate left hand side is called han-
TE
S dle pruning. Handle pruning forms the basis for a bottomup
O
The shift reduce parser consists of input buffer, Stack and To construct the rightmost derivation:
N
bols are inserted using shift operation and they are reduced
Replace Bi with Ai to generate ri–1
SE
Parse table consists of 2 parts goto and action, which are form:
TE
A
S → AA
A
A → aA
LO
a b w
A→b
N
Let the input string ‘w’ be abab$ Here A → b is a handle for abw.
W
w = abab$
O
Viable prefixes The set of prefixes of a right sentential Let the operator precedence table for this grammar is:
form that can appear on the stack of a shift reduce parser
id + × $
are called viable prefixes.
id ⋗ ⋗ ⋗
+ ⋖ ⋗ ⋖ ⋗
Conflicts × ⋖ ⋗ ⋗ ⋗
Conflicts
$ ⋖ ⋖ ⋖ accept
JW
Example: stmt → if expr then stmt | if expr then stmt else and to the right of the ⋖ is encountered.
Pw
stmt | any other statement
After inserting precedence relation is
If exp then stmt is on the stack, in this case we can’t tell
Jf
$id + id * id $ is
whether it is a handle. i.e., ‘shift/reduce’ conflict.
/2
$ ⋖ id ⋗ + ⋖ id ⋗ * ⋖ id ⋗ $
ly
Reduce/reduce conflict
it.
Precedence functions Instead of storing the entire table of
//b
Example: S → aA/bB precedence relations table, we can encode it by precedence
functions f and g, which map terminal symbols to integers:
s:
A→c
tp
B→c 1. f(a) ⋖ f(b) whenever a ⋖ b
ht
W = ac it gives reduce/reduce conflict. 2. f(a) ⋗ f(b) whenever a ≗ b
3. f(a) > f(b) whenever a ⋗ b
S
Operator Precedence Grammar TE
In operator grammar, no production rule can have: Finding precedence functions for a table
O
1. Create symbols f(a) and g(a) for each ‘a’ that is a ter-
•• two adjacent non-terminals at the right side.
D
minal or $.
N
Example 1: E → E + E /E – E/ id is operator grammar. 2. Partition the created symbols into as many groups as
A
Example 3: E → E0E/id
4. If the graph constructed has a cycle then no precedence
C
grammar If there are no cycles, let f(a) be the length of the long-
A
group of g(a).
a = b then b has same precedence as a
A
LO
1. Traditional notations of associativity and precedence. •• Difficult to decide which language is recognized by
D
JW
e
s
Pw
LR Parsers Stack: To store the string of the form,
•• In LR (K), L stands for Left to Right Scanning, R stands
Jf
So x1 S1 … xmSm where
/2
for Right most derivation, K stands for number of look
Sm: state
ly
ahead symbols.
it.
•• LR parsers are table-driven, much like the non-recursive xm: grammar symbol
//b
LL parsers. A grammar which is used in construction of Each state symbol summarizes the information contained in
s:
LR parser is LR grammar. For a grammar to be LR it is the stack below it.
tp
sufficient that a left-to-right shift-reduce parser be able
Parsing table: Parsing table consists of two parts:
ht
to recognize handles of right-sentential forms when they
appear on the top of the stack. 1. Action part
S
•• The Time complexity for such parsers is O (n3) 2. Goto part
TE
•• LR parsers are faster than LL (1) parser.
O
ACTION Part:
•• LR parsing is attractive because
N
The most general non-backtracking shift reduce parser. Let, Sm → top of the stack
D
A
methods is a proper superset of predictive parsers. LL Then action [Sm, ai] which can have one of four values:
H
LR parser can detect a syntactic error in the left to right 2. Reduce by a grammar production A → b
scan of the input. 3. Accept
N
least powerful of the three. If goto (S, A) = X where S → state, A → non-terminal, then
TE
2. Canonical LR (CLR): most powerful and most GOTO maps state S and non-terminal A to state X.
A
expensive.
G
JW
5 r6 r6 r6 r6
6 S5 S4 9 3 Augmented grammar (G′) If G is a grammar with start
Pw
7 S5 S4 10 symbol S, G′ the augmented grammar for G, with new start
Jf
8 S6 S1 symbol S ′ and production S′ → S.
/2
9 r1 S7 r1 r1 Purpose of G′ is to indicate when to stop parsing and
ly
10 r3 r3 r3 r3 announce acceptance of the input.
it.
//b
11 r5 r5 r5 r5
Closure operation Closure (I) includes
s:
Moves of LR parser on input string id*id+id is shown below: 1. Intially, every item in I is added to closure (I)
tp
2. If A → a.Bb is in closure (I) and b → g is a production
ht
then add B → .g to I.
Stack Input Action
S
0 id * id + id$ Shift 5
TE
Goto operation
O
reduce 6 means reduce with
Goto (I, x) is defined to be the closure of the set of all items
N
reduce 4 i.e T → F
0F 3 * id + id$
A
0T 2 * id + id$ Shift 7
EW
reduce 6 i.e F → id
0T2 * 7 id 5 + id$ dots are not at the dots at the left end.
goto [7, F ] = 10
SE
left end
0T2 * 7 F 10 + id$ reduce 3 i.e T → T *F
C
Begin
G
(I, x) to C;
W
0E1 $ accept
D
JW
T → T. * F ( E )
I4 I8 I 11
+
Pw
I3: goto (I0, F) T to I 6
to I 2
T → F. F
Jf
id to I 3
id
/2
I5
I4: goto (I0, ( )
ly
F → (.E)
it.
//b
E → .E + T SLR parsing table construction
s:
E → .T 1. Construct the canonical collection of sets of LR (0)
tp
E → .T * F items for G′.
ht
T → .F 2. Create the parsing action table as follows:
(a) If a is a terminal and [A → a.ab] is in Ii, goto
S
F → .(E) TE
F → .id (Ii, a) = Ij then action (i, a) to shift j. Here ‘a’ must
be a terminal.
O
F → id.
N
F → .id
TE
F → .(E) grammar:
A
LO
F → .id 1. S → L = R
2. S → R
N
W
F → (E.)
4. L → id
D
JW
I2: goto (I0, L)
7
S → L. = R
Pw
8
R → L.
Jf
9
I3: got (I0, R)
/2
S → R.
ly
FOLLOW (S) = {$}
it.
I4: goto (I0, *)
FOLLOW (L) = {=}
//b
L → *.R
FOLLOW (R) = {$, =}
s:
R → .L For action [2, =] = S6 and r5
tp
L → .*R ∴ Here we are getting shift – reduce conflict, so it is not
ht
L → .id SLR (1).
S
I5: goto(I0, id) TE
L → id. Canonical LR Parsing (CLR)
O
I6: goto(I2, =)
N
L → .id
I7: goto(I4, R) [A → a.b, a]
N
Where A → ab is a production.
L → *R.
SE
R → L.
TE
S → L = R.
G
S
I0 I1
W
Repeat
O
*
For each item [A → a.Bb, a] in I,
D
* I4
id
I5 Each production B → .g in G′,
R And each terminal b in FIRST (b a)
I7
Such that [B → .g, b] is not in I do
L
I8 Add [B → .g, b] to I;
L = R End;
I2 I6 I9
L
Until no more items can be added to I;
I8
* Example 7: Construct CLR parsing table for the following
I4
id
I5
grammar:
S′ → S
R
I3 S → CC
id C → cC/d
I5
6.16 | Unit 6 • Compiler Design
Solution: The initial set of items is Consider the string derivation ‘dcd’:
I0: S′ → .S, $ S ⇒ CC ⇒ CcC ⇒ Ccd ⇒ dcd
S → .CC, $
A → a.Bb, a Stack Input Action
Here A = S, a = ∈, B = C, b = C and a = $ 0 dcd $ shift 4
First (ba) is first (C$) = first (C) = {c, d} 0d4 Cd $ reduce 3 i.e. C → d
So, add items [C → .cC, c]
0C 2 Cd $ shift 6
[C → .cC, d]
0C 2C 6 D$ shift 7
∴ Our first set I0: S′ → .S, $
0C2C 6d 7 $ reduce C → d
S → .CC, $
JW
0C 2C 6C 9 $ reduce C → cC
C → .coca, c/d
Pw
C → .d, c/d. 0C 2C 5 $ reduce S → CC
0S1 $
I1: goto (I0, X) if X = S
Jf
/2
S′ → S., $
ly
Example 8: Construct CLR parsing table for the grammar:
I2 : goto (I0, C)
it.
S→L=R
S → C.C, $
//b
S→R
C → .cC, $
s:
L → *R
C → .d, $
tp
L → id
I3: goto (I0, c)
ht
R→L
C → c.C, c/d
C → .cC, c/d
S
Solution: The canonical set of items is
TE
C → .d c/d I0: S′ → .S, $
O
S → CC., $ R → .L, $
H
EW
C → ..cC, $
SE
C → d. $
I3: goto (I0, R)
A
C → cC., c/d
D
C → cC., $
R → .L, =
N
L → .id, =
O
Action Goto
I5: goto (I0, id)
D
States c 1 $ S C
L → id.,=
I0 S3 S4 1 2
I6: goto (I7, L)
I1 acc
R → L., $
I2 S6 S7 5
I3 S3 S4 8 I7: goto (I2, =)
I4 R3 r3 S → L = .R,
I5 r1 R → .L, $
I6 S6 S7 9
L → .*R, $
L → .id, $
I7 r3
I8 R2 r2 I8: goto (I4, R)
I9 r2 L → *R., =
Chapter 1 • Lexical Analysis and Parsing | 6.17
JW
0S1 (accept) $
I12: goto (I7, id)
Pw
L → id. , $
Every SLR (1) grammar is LR (1) grammar.
Jf
I13: goto (I11, R) CLR (1) will have ‘more number of states’ than SLR Parser.
/2
L → *R., $
ly
LALR Parsing Table
it.
S
//b
I0 I1 •• The tables obtained by it are considerably smaller than
s:
the canonical LR table.
tp
L = L
I2 I7 I6 •• LALR stands for Lookahead LR.
ht
R
I 10 •• The number of states in SLR and LALR parsing tables for
R
S
* I 11 I 13 a grammar G are equal.
TE
R * L
to I 6 •• But LALR parsers recognize more grammars than SLR.
I3
O
id
to I 12 •• YACC creates a LALR parser for the given grammar.
N
id
I 12 •• YACC stands for ‘Yet another Compiler’.
D
*
R •• An easy, but space-consuming LALR table construction
N
* I4 I8
A
L is explained below:
H
I9
id 1. Construct C = {I0, I1, –In}, the collection of sets of LR
EW
I5
(1) items.
N
In this, we are going to have 13 states items. If there is a parsing action conflict then the
TE
The shift –reduce conflict in the SLR parser is reduced grammar is not a LALR (1).
here.
A
collection.
W
2 S7 r5
O
3 r2
Example 9: Construct LALR parsing table for the
D
4 S5 S4 9 8 following grammar:
5 r4
S′ → S
6 r5 S → CC
7 s12 s11 6 10 C → cC/d
8 r3
Solution: We already got LR (1) items and CLR parsing
9 r5
table for this grammar.
10 r1
After merging I3 and I6 are replaced by I36.
11 S12 S11 13
I36: C → c.C, c/d/$
12 r4
C → .cC, c/d/$
13 r3
C → .d, c/d/$
6.18 | Unit 6 • Compiler Design
JW
I8: goto (I4, d)
1 acc S → aAd., c
Pw
2 S36 S47 5 I9: goto (I5, e)
Jf
36 S36 S47 89 S → aBe., c
/2
ly
47 r3 R3 r3 If we union I6 and I7
it.
A → c., d/e
//b
5 r1
B → c., d/e
s:
89 r2 r2 r2
tp
It generates reduce/reduce conflict.
ht
Example: Consider the grammar: Notes:
S′ → S 1. The merging of states with common cores can never
S
TE
S → aAd produce a shift/reduce conflict, because shift action
S → bBd depends only on the core, not on the lookahead.
O
N
S → aBe 2. SLR and LALR tables for a grammar always have the
S → bAe same number of states (several hundreds) whereas
D
N
B → c
H
S → .bBd, $
TE
grammars
S → .bAe, $
G
S′ → S., $
LO
CLR(1)
N
SLR(1)
S → a.Ad, c
O
LR(0)
S → a.Be, c
D
A → .c,d
LL(1)
B → .c,e
I3: goto (I0, b)
S → b.Bd, c
S → b.Ae, c Every LR (0) is SLR (1) but vice versa is not true.
A → .c, e
B → .c, e Difference between SLR, LALR
I4: goto (I2, A)
and CLR parsers
S → aA.d, c Differences among SLR, LALR and CLR are discussed
below in terms of size, efficiency, time and space.
Chapter 1 • Lexical Analysis and Parsing | 6.19
JW
Exercises
Pw
Jf
Practice Problems 1 4. If an LL (1) parsing table is constructed for the above
/2
grammar, the parsing table entry for [S → [ ] is
Directions for questions 1 to 15: Select the correct alterna-
ly
(A) S → T; S (B) S → ∈
tive from the given choices.
it.
(C) T → UR (D) U → [S]
//b
1. Consider the grammar
S→a Common data for questions 5 to 7: Consider the aug-
s:
S → ab mented grammar
tp
The given grammar is: S→X
ht
(A) LR (1) only X → (X)/a
S
(B) LL (1) only 5. If a DFA is constructed for the LR (1) items of the
TE
(C) Both LR (1) and LL (1) above grammar, then the number states present in it
O
(D) LR (1) but not LL (1) are:
N
(C) 13 (D) 16
B → b
A
grammar:
A → a/c
D
Common data for questions 3 and 4: Consider the grammar: X → .cX, c/d X → .cX, $
LO
R → .T/∈
D
FALSE?
3. Which of the following are correct FIRST and 1. Cannot be merged since look ahead are different.
FOLLOW sets for the above grammar? 2. Can be merged but will result in S – R conflict.
(i) FIRST(S) = FIRST (T) = FIRST (U) = {x, y, [, e} 3. Can be merged but will result in R – R conflict.
(ii) FIRST (R) = {,e} 4. Cannot be merged since goto on c will lead to two
(iii) FOLLOW (S) = {], $} different sets.
(iv) FOLLOW (T) = Follow (R) = {;} (A) 1 only (B) 2 only
(v) FOLLOW (U) = {. ;} (C) 1 and 4 only (D) 1, 2, 3 and 4
(A) (i) and (ii) only 9. Which of the following grammar rules violate the
(B) (ii), (iii), (iv) and (v) only requirements of an operator grammar?
(C) (ii), (iii) and (iv) only (i) A → BcC (ii) A → dBC
(D) All the five (iii) A → C/∈ (iv) A → cBdC
6.20 | Unit 6 • Compiler Design
(A) (i) only (B) (i) and The entry/entries for [Z, d] is/are
(C) (ii) and (iii) only (D) (i) and (iv) only (A) Z → d
10. The FIRST and FOLLOW sets for the grammar: (B) Z → XYZ
(C) Both (A) and (B)
S → SS + /SS*/a
(D) X → Y
(A) First (S) = {a}
Follow (S) = {+, *, $} 13. The following grammar is
(B) First (S) = {+} S → AaAb/BbBa
Follow (S) = {+, *, $} A→e
(C) First (S) = {a} B→e
Follow (S) = {+, *} (A) LL (1) (B) Not LL (1)
JW
(D) First (S) = {+, *} (C) Recursive (D) Ambiguous
Follow (S) = {+, *, $} 14. Compute the FIRST (P) for the below grammar:
Pw
11. A shift reduces parser carries out the actions specified P → AQRbe/mn/DE
Jf
within braces immediately after reducing with the cor- A → ab/e
/2
responding rule of the grammar: Q → q1q2/e
ly
S → xxW [print ‘1’] R → r1r2/e
it.
S → y [print ‘2’] D→d
//b
W → Sz [print ‘3’] E→e
s:
What is the translation of ‘x x x x y z z’? (A) {m, a} (B) {m, a, q1, r1, b, d}
tp
(A) 1231 (B) 1233 (C) {d, e} (D) {m, n, a, b, d, e, q1, r1}
ht
(C) 2131 (D) 2321 15. After constructing the LR(1) parsing table for the aug-
S
12. After constructing the predictive parsing table for the mented grammar
TE
following grammar: S′ → S
O
Z→d S → BB
N
Z → XYZ B → aB/c
D
X→Y
A
Practice Problems 2
SE
tive from the given choices. 4. The action of parsing the source program into the
TE
S→e
LO
2. To convert the grammar E → E + T into LL grammar 6. A system program that combines separately compiled
(A) use left factor modules of a program into a form suitable for execu-
(B) CNF form tion is
(C) eliminate left recursion (A) Assembler.
(D) Both (B) and (C) (B) Linking loader.
3. Given the following expressions of a grammar (C) Cross compiler.
E → E × F/F + E/F (D) None of these.
F → F? F/id 7. Resolution of externally defined symbols is performed
Which of the following is true? by a
(A) × has higher precedence than + (A) Linker (B) Loader.
(B) ? has higher precedence than × (C) Compiler. (D) Interpreter.
Chapter 1 • Lexical Analysis and Parsing | 6.21
8. LR parsers are attractive because 15. Which of the following statement is false?
(A) They can be constructed to recognize CFG cor- (A) An unambiguous grammar has single leftmost
responding to almost all programming constructs. derivation.
(B) There is no need of backtracking. (B) An LL (1) parser is topdown.
(C) Both (A) and (B). (C) LALR is more powerful than SLR.
(D) None of these (D) An ambiguous grammar can never be LR (K) for
9. YACC builds up any k.
(A) SLR parsing table
16. Merging states with a common core may produce ___
(B) Canonical LR parsing table
conflicts in an LALR parser.
(C) LALR parsing table
(A) Reduce – reduce
JW
(D) None of these
(B) Shift – reduce
10. Language which have many types, but the type of every
Pw
(C) Both (A) and (B)
name and expression must be calculated at compile (D) None of these
Jf
time are
/2
(A) Strongly typed languages 17. LL (K) grammar
ly
(B) Weakly typed languages (A) Has to be CFG
it.
(C) Loosely typed languages (B) Has to be unambiguous
//b
(D) None of these (C) Cannot have left recursion
s:
11. Consider the grammar shown below: (D) All of these
tp
S → iEtSS′/a/b 18. The I0 state of the LR (0) items for the grammar
ht
S′ → eS/e
S → AS/b
S
In the predictive parse table M, of this grammar, the A → SA/a.
TE
entries M [S′, e] and M [S′, $] respectively are
(A) S′ → .S
O
A → .SA
(D) {S′ → eS, S′ → e}} and {S′ → ∈}
A
A → .a
H
The grammar is S → .b
(A) LL (1) A → .SA
N
A → .SA
G
(A) n, E + n and E + n – n
LO
S → FR
(B) n, E + n and E + E – n
R → ×S/e
N
(C) n, n + n and n + n – n
W
(D) n, E + n and E – n F → id
O
14. A top down parser uses ___ derivation. What will be the entry for [S, id]?
D
JW
S→E (C) l + m + 3
Pw
E→F+E (D) max (l, m) + 3
E→F 7. Which of the following problems is undecidable?
Jf
F → id [2007]
/2
Consider the following LR (0) items corresponding to
ly
(A) Membership problem for CFGs.
the grammar above.
it.
(B) Ambiguity problem for CFGs.
(i) S → S * .E
//b
(C) Finiteness problem for FSAs.
(ii) E → F. + E
s:
(D) Equivalence problem for FSAs.
(iii) E → F + .E
tp
8. Which one of the following is a top-down parser?
ht
Given the items above, which two of them will appear [2007]
in the same set in the canonical sets-of items for the (A) Recursive descent parser.
S
grammar? [2006] TE
(B) Operator precedence parser.
(A) (i) and (ii) (B) (ii) and (iii) (C) An LR (k) parser.
O
3. Consider the following statements about the context- 9. Consider the grammar with non-terminals N = {S, C,
N
C→b
(iii) G can be accepted by a deterministic PDA.
SE
(A) (i) only (B) (i) and (iii) only (C) It is ambiguous
A
(C) (ii) and (iii) only (D) (i), (ii) and (iii) (D) It is not context-free.
G
F → id
W
11. Which of the following strings is generated by the (A) Abstract syntax tree
grammar? [2007] (B) Symbol table
(A) aaaabb (B) aabbbb (C) Semantic stack
(C) aabbab (D) abbbba (D) Parse table
12. For the correct answer strings to Q.78, how many der- 18. The grammar S → aSa|bS|c is [2010]
ivation trees are there? [2007] (A) LL (1) but not LR (1)
(A) 1 (B) 2 (B) LR (1) but not LR (1)
(C) 3 (D) 4 (C) Both LL (1) and LR (1)
(D) Neither LL (1) nor LR (1)
13. Which of the following describes a handle (as applica-
19. The lexical analysis for a modern computer language
JW
ble to LR-parsing) appropriately? [2008]
(A) It is the position in a sentential form where the next such as Java needs the power of which one of the fol-
Pw
shift or reduce operation will occur. lowing machine models in a necessary and sufficient
sense? [2011]
Jf
(B) It is non-terminal whose production will be used
(A) Finite state automata
/2
for reduction in the next step.
ly
(C) It is a production that may be used for reduction (B) Deterministic pushdown automata
it.
in a future step along with a position in the sen- (C) Non-deterministic pushdown automata
//b
tential form where the next shift or reduce opera- (D) Turing machine
s:
tion will occur. Common data for questions 20 and 21: For the grammar
tp
(D) It is the production p that will be used for reduc- below, a partial LL (1) parsing table is also presented
ht
tion in the next step along with a position in the along with the grammar. Entries that need to be filled are
sentential form where the right hand side of the indicated as E1, E2, and E3. Is the empty string, $ indicates
S
production may be found. TE
end of input, and, I separates alternate right hand side of
productions
O
14. Which of the following statements are true?
N
B→S
text-free grammar by suitable transformations
H
always binary trees [2008] 20. The FIRST and FOLLOW sets for the non-terminals
A
(A) (i), (ii), (iii) and (iv) (B) (ii), (iii) and (iv) only A and B are [2012]
G
(C) (i), (iii) and (iv) only (D) (i), (ii) and (iv) only (A) FIRST (A) = {a, b, e} = FIRST (B)
D
(B) E1: S → a A b B, S → e 26. Consider the grammar defined by the following produc-
E2: S → b A a B, S → e tion rules, with two operators * and +
E3: S → ∈ S→T*P
(C) E1: S → a A b B, S → e T → U|T * U
E2: S → b A a B, S → e P → Q + P|Q
E3: B → S Q → Id
(D) E1: A → S, S → e U → Id
E2: B → S, S → e
Which one of the following is TRUE? [2014]
E3: B → S
(A) + is left associative, while * is right associative
22. What is the maximum number of reduce moves that
JW
(B) + is right associative, while * is left associative
can be taken by a bottom-up parser for a grammar with
no epsilon-and unit-production (i.e., of type A → ∈ (C) Both + and * are right associative.
Pw
and A → a) to parse a string with n tokens? [2013] (D) Both + and * are left associative
Jf
(A) n/2 (B) n – 1 27. Which one of the following problems is undecidable?
/2
(C) 2n – 1 (D) 2n
ly
[2014]
it.
23. Which of the following is/are undecidable? (A) Deciding if a given context -free grammar is am-
//b
(i) G is a CFG. Is L (G) = φ? biguous.
s:
(ii) G is a CFG, Is L (G) = Σ*? (B) Deciding if a given string is generated by a given
tp
context-free grammar.
(iii) M is a Turing machine. Is L (M) regular?
ht
(C) Deciding if the language generated by a given
(iv) A is a DFA and N is an NFA. Is L (A) = L (N)?
context-free grammar is empty.
[2013]
S
(D) Deciding if the language generated by a given
TE
(A) (iii) only context free grammar is finite.
O
(B) (iii) and (iv) only
28. Which one of the following is TRUE at any valid state
N
24. Consider the following two sets of LR (1) items of an stack and not inside.
H
LR (1) grammar. [2013] (B) Viable prefixes appear only at the top of the
EW
[2015]
(ii) Can be merged but will result in S-R conflict.
D
(iii) Can be merged but will result in R-R conflict. (B) Canonical LR, LALR
LO
(A) (i) only (B) (ii) only 30. Consider the following grammar G
O
(C) (i) and (iv) only (D) (i), (ii), (iii) and (iv)
S → F | H
D
JW
(D) P ↔ iv, Q ↔ i, R ↔ ii, S ↔ iii [2018]
(A) Context-free grammar can be used to specify
Pw
32. A student wrote two context - free grammars G1 and
G2 for generating a single C-like array declaration. both lexical and syntax rules.
Jf
The dimension of the array is at least one. (B) Type checking is done before parsing.
/2
(C) High-level language programs can be translated
ly
For example, int a [10] [3];
to different Intermediate Representations.
it.
The grammars use D as the start symbol, and use six (D) Arguments to a function can be passed using the
//b
terminal symbols int; id [ ] num. [2016] program stack.
s:
Grammar G1 Grammar G2
37. A lexical analyzer uses the following patterns to rec-
tp
D → int L; D → intL; ognize three tokens T1, T2, and T3 over the alphabet {a,
ht
L → id [E L → id E b, c}.
S
E → num ] E → E [num] T1: a?(b|c)*a
TE
E → num ] [E E → [num] T2: b?(a|c)*b
O
(A) Both G1 and G2 x. Note also that the analyzer outputs the token that
A
(D) Neither G1 nor G2 which one of the following is the sequence of tokens
N
S→y #
D
A
a #
What is FOLLOW (Q)? [2017]
LO
CORRECT? [2017] b c
D
Answer Keys
Exercises
Practice Problems 1
1. D 2. A 3. B 4. A 5. D 6. C 7. C 8. D 9. C 10. A
11. C 12. C 13. A 14. B 15. D
Practice Problems 2
1. D 2. C 3. B 4. A 5. B 6. B 7. A 8. C 9. C 10. A
11. D 12. A 13. D 14. A 15. D 16. A 17. C 18. A 19. A
JW
Pw
Previous Years’ Questions
1. B 2. D 3. B 4. A 5. D 6. A 7. B 8. A 9. C 10. A
Jf
/2
11. C 12. B 13. D 14. C 15. B 16. B 17. B 18. C 19. A 20. A
ly
21. C 22. B 23. D 24. D 25. D 26. B 27. A 28. C 29. C 30. D
it.
31. B 32. A 33. C 34. A 35. C 36. B 37. D 38. A
//b
s:
tp
ht
S
TE
O
N
D
N
A
H
EW
N
SE
C
TE
A
G
D
A
LO
N
W
O
D
Chapter 2
Syntax Directed Translation
LEARNING OBJECTIVES
JW
Syntax directed translation S-attributed definition
Pw
Syntax directed definition L-attributed definitions
Jf
Dependency graph Synthesized attributes on the parser
/2
Constructing syntax trees for expressions Syntax directed translation schemes
ly
Types of SDD’s Bottom up evaluation of inherited attributes
it.
//b
s:
tp
ht
SyntAx directed trAnSlAtion
S
Notes:TE
1. Grammar symbols are associated with attributes.
To translate a programming language construct, a compiler may
O
2. Values of the attributes are evaluated by the semantic rules
need to know the type of construct, the location of the first instruc-
N
analysis’ phase. Syntax directed definition (SDD) It is high level specification for
SE
In this phase, for each production CFG, we will give some seman- Translation schemes These schemes indicate the order in which
D
A
tic rule. semantic rules are to be evaluated. This is an input and output
LO
mapping.
N
tic action or semantic rule) is associated with each production is SyntAx directed definitionS
D
known as Syntax Directed Translation Scheme. A SDD is a generalization of a CFG in which each grammar sym-
These semantic rules are used to bol is associated with a set of attributes.
There are two types of set of attributes for a grammar symbol.
1. Generate intermediate code.
2. Put information into symbol table. 1. Synthesized attributes
3. Perform type checking. 2. Inherited attributes
4. Issues error messages. Each production rule is associated with a set of semantic rules.
6.28 | Unit 6 • Compiler Design
Semantic rules setup dependencies between attributes Example: An inherited attribute distributes type informa-
which can be represented by a dependency graph. tion to the various identifiers in a declaration.
The dependency graph determines the evaluation order For the grammar
of these semantic rules. D → TL
Evaluation of a semantic rule defines the value of an
T → int
attribute. But a semantic rule may also have some side
T → real
effects such as printing a value.
L → L1, id
Attribute grammar: An attribute grammar is a syntax L → id
directed definition in which the functions in semantic rules
‘cannot have side effects’. That is, The keyword int or real followed by a list of
JW
identifiers.
Annotated parse tree: A parse tree showing the values of
In this T has synthesized attribute type: T.type. L has an
attributes at each node is called an annotated parse tree.
Pw
inherited attribute in L.in
The process of computing the attribute values at the
Rules associated with L call for procedure add type to the
Jf
nodes is called annotating (or decorating) of the parse tree.
type of each identifier to its entry in the symbol table.
/2
In a SDD, each production A → ∝ is associated with a
ly
set of semantic rules of the form:
it.
b = f (c1, c2,… cn) where Production Semantic Rule
//b
f : A function D → TL L.in = T.type
s:
b can be one of the following: T → int T.type = integer
tp
b is a ‘synthesized attribute’ of A and c1, c2,…cn are attrib-
ht
utes of the grammar symbols in A → ∝. T → real T.type = real
The value of a ‘synthesized attribute’ at a node is com-
S
L → L1, id
TE addtype L1.in = L.in(id.entry, L.in)
puted from the value of attributes at the children of that
L → id addtype (id.entry, L.in)
node in the parse tree.
O
N
Example: The annotated parse tree for the sentence real id1, id2, id3 is
D
shown below:
N
D
H
real
C
...
D
expr⋅t = 95 −
L → En
D
expr⋅t = 9 term⋅t = 5 E → E1 + T
E→T
term⋅t = 9
T → T1*F
T→F
9 − 5 + 2 F → (E)
b is an ‘inherited attribute’ of one of the grammar symbols F → digit.
on the right side of the production. Let us consider synthesized attribute value with each of the
An ‘inherited attribute’ is one whose value at a node is non-terminals E, T and F.
defined in terms of attributes at the parent and/or siblings of Token digit has a synthesized attribute lexical supplied
that node. It is used for finding the context in which it appears. by lexical analyzer.
Chapter 2 • Syntax Directed Translation | 6.29
JW
Pw
X⋅x Y⋅y
The Annotated parse tree for the expression 5 + 3 * 4 is
shown below: Example 2:
Jf
val
/2
D E
ly
E⋅val = 17 return
it.
E2
//b
E⋅val = 5 + T⋅val = 12 E1 +
val val
s:
T⋅val = 5 T⋅val = 3 F ⋅val = 4 Example 3: real p, q;
tp
*
ht
L⋅in = real
F⋅val = 5 F⋅val = 3 digit⋅lexval = 4
S
TE T ⋅type = real add type (q⋅real)
digit⋅lexval = 5 L1⋅in = real
digit⋅lexval = 3
id⋅entry = q
O
add type (P⋅real)
Example 1: Consider an example, which shows semantic
N
id⋅entry = p
N
A
expr → expr1 + term expr.t: = expr1.t||term.t||’+’ A topological sort of directed acyclic graph is an ordering
EW
expr → expr1 – term expr.t: = expr1.t||term.t ||‘-‘ m1, m2, . . . mk of nodes of the graph S. t edges go from nodes
N
...
TE
Example 2: Write a SDD for the following grammar to It is a condensed form of parse tree useful for representing
D
determine number.val.
A
language constructs.
LO
W
B S1 S1
O
digit.val = ‘9’
D
JW
E⋅nptr E→E+E {E.val := E.val + E.val}
Pw
E → E*E {E.val := E.val * E.val}
E⋅nptr +
T⋅nptr E → (E) {E.val := E.val}
Jf
− + E→I {I.val := I.val * 10 + digit}
/2
E⋅nptr T⋅nptr id I → I digit
ly
num
it.
T⋅nptr I → digit {I.val := digit}
//b
−
id id
s:
Implementation
to entry for c
tp
id num 4
S→E$ print (val [top])
ht
to entry for a E→E+E val[ntop] := val[top] + val[top-2]
S
E → E*E
TE val[ntop] := val[top] * val[top-2]
typeS of Sdd’S E → (E) val[ntop] := val[top-1]
O
1. S-Attributed Definitions
I → digit val[ntop] := digit
A
2. L-Attributed Definitions.
H
EW
Xj in the production.
G
• Both inherited and synthesized attribute are used. 2. The inherited attributes of A.
D
associated with a production body, dependency–graph Every S-attributed definition is L-attributed, because the
LO
edges can go from left to right only. above two rules apply only to the inherited attributes.
N
A → {S1} X1{S2}X2…{Sn}Xn
If we have both inherited and synthesized attributes then we
JW
have to follow the following rules: After removing embedding semantic actions:
Pw
1. An inherited attribute for a symbol on the right side A → M1X1M2X2…MnXn
Jf
of a production must be computed in an action before M1 → ∈{S1}
/2
that symbol.
M2 → ∈{S2}
ly
2. An action must not refer to a synthesized attribute of
...
it.
a symbol on the right side of the action.
Mn→ ∈ {Sn}
//b
3. A synthesized attribute for the non–terminal on the left
s:
can only be computed after all attributes it references, For example,
tp
have been computed.
E → TR
ht
Note: In the implementation of L-attributed definitions dur- R → +T {print (‘+’)} R1
S
ing predictive parsing, instead of syntax directed transla- R→∈ TE
tions, we will work with translation schemes. T → id {print (id.name)}
O
M → ∈ {print (‘+’)}
When transforming the grammar, treat the actions as if they
N
R → +T {print (‘+’);} R attribute A.i and every symbol X has a synthesized attribute
TE
•• Using a bottom up translation scheme, we can implement The synthesized attribute of Xi will not be changed.
N
any L-attributed definition based on LL (1) grammar. The inherited attribute of Xi will be copied into the syn-
W
•• We can also implement some of L-attributed definitions thesized attribute of Mi by the new semantic action added at
O
based on LR (1) using bottom up translations scheme. the end of the new production rule
D
exerciSeS
Practice Problems 1 3. Which of the following productions with transla-
Directions for questions 1 to 13: Select the correct alterna- tion rules converts binary number representation into
tive from the given choices. decimal.
1. The annotated tree for input ((a) + (b)), for the rules (A) Production Semantic Rule
given below is
B→0 B.trans = 0
Production Semantic Rule
B→1 B.trans = 1
E→E+T $ $ = mknode (‘+’, $1, $3)
B → B0 B1.trans = B2.trans*2
E → E-T $ $ = mknode (‘-’, $1, $3)
JW
B → B1 B1.trans = B2.trans * 2 + 1
E→T $ $ = $1;
Pw
T → (E) $ $ = $2;
(B) Production Semantic Rule
Jf
T → id $ $ = mkleaf (id, $1)
/2
B→0 B.trans = 0
T → num $ $ = mkleaf (num, $1)
ly
B → B0 B1.trans = B2.trans*4
it.
(A) E (B) E
//b
s:
T (C) Production Semantic Rule
T
tp
( E ) B→1 B.trans = 1
ht
( E )
B → B1 B1.trans = B2.trans*2
E + T
S
E + T
TE
T
( E )
(D) None of these
O
T id = b 4. The grammar given below is
T
N
( E )
D
id = a id = a A → LM L.i := l(A. i)
H
M.i := m(L.s)
EW
E + T A → QR R.i := r(A.i)
SE
T id = b Q.i := q(R.s)
C
TE
A.s := f(Q.s)
id = a
A
2. Let synthesized attribute val give the value of the binary (B) Non-L-attributed grammar
D
L → LB
W
S → aS {m := m + 3; print (m);}
L→B
O
B→0 |∈ {m: = 0 ;}
B→1 A shift reduce parser evaluate semantic action of a pro-
Input 101.101, S.val = 5.625 duction whenever the production is reduced.
use synthesized attributes to determine S.val
Which of the following are true? If the string is = a a b a b b then which of the following
(A) S → L1.L2 {S.val = L1.val + L2.val/ (2**L2.bits) is printed?
|L {S.val = L.val; S.bits = L.bits} (A) 0 0 3 6 9 12 (B) 0 0 0 3 6 9 12
(B) L → L1 B {L.val = L1.val*2 + B.val; (C) 0 0 0 3 6 9 12 15 (D) 0 0 3 9 6 12
L.bits = L1.bits + 1} 6. Which attribute can be evaluated by shift reduce parser
|B {L.val = B.val; L.bits = 1} that execute semantic actions only at reduce moves but
(C) B → 0 {B.val = 0} never at shift moves?
|1 {B.val = 1} (A) Synthesized attribute (B) Inherited attribute
(D) All of these (C) Both (a) and (b) (D) None of these
Chapter 2 • Syntax Directed Translation | 6.33
7. Consider the following annotated parse tree: If Input = begin east south west north, after evaluating
A A⋅num = y⋅num + z⋅num this sequence what will be the value of S.x and S.y?
(A) (1, 0) (B) (2, 0)
B⋅num = num B + C C⋅num = num (C) (-1, -1) (D) (0, 0)
11. What will be the values s.x, s.y if input is ‘begin west
num num south west’?
(A) (–2, –1)
Which of the following is true for the given annotated (B) (2, 1)
tree? (C) (2, 2)
(A) There is a specific order for evaluation of attribute (D) (3, 1)
JW
on the parse tree.
Pw
(B) Any evaluation order that computes an attribute 12. Consider the following grammar:
‘A’ after all other attributes which ‘A’ depends on, S→E S.val = E.val
Jf
is acceptable.
/2
E.num = 1
(C) Both (A) and (B)
ly
E → E*T E1.val = 2 * E2.val + 2 * T.val
it.
(D) None of these.
//b
E2.num = E1.num + 1
Common data for questions 8 and 9: Consider the fol-
s:
lowing grammar and syntax directed translation. T.num = E1.num + 1
tp
E→T E.val = T.val
E→E+T E1.val = E2.val + T.val
ht
T.num = E.num + 1
E→T E.val = T.val
S
T→T+P
TE T1.val = T2.val + P.val
T → T*P T1.val = T2.val * P.val *
P.num T2.num = T1.num + 1
O
P.num = T1.num + 1
N
P.num = T.num + 1
A
P→0 P.num = 1
H
(A) 8 (B) 6
(A) Num attribute is inherited attribute. Val attribute is
A
(C) 4 (D) 12
synthesized attribute.
G
(A) 8 (B) 6
A
tribute.
(C) 4 (D) 12
LO
S→b S.x = 0
rules and E as the start symbol.
S.y = 0 E → E1@T {E.value = E1.value*T.value}
S → S1 I S.x = S1.x + I.dx
S.y = S1.y + I.dy
|T {E.value = T.value}
I → east I.dx = 1 T → T1 and F {T.value = T1.value + F.value}
I.dy = 0
|F {T.value = F.value}
I → north I.dx = 0
I.dy = 1 F → num {F.value = num.value}
I → west I.dx = -1
Compute E.value for the root of the parse tree for the
I.dy = 0
expression: 2 @ 3 and 5 @ 6 and 4
I → south I.dx = 0
I.dy = -1 (A) 200 (B) 180
(C) 160 (D) 40
6.34 | Unit 6 • Compiler Design
JW
(A) Synthesized attribute (B) Inherited attribute
t→5 T.t = ‘5’
(C) Canonical attributes (D) None of these
Pw
t→2 T.t = ‘2’
6. Inherited attribute is a natural choice in:
Jf
t→4 T.t = ‘4’
(A) Keeping track of variable declaration
/2
(B) Checking for the correct use of L-values and R-
ly
E
it.
values.
//b
E + T (C) Both (A) and (B)
(D) None of these
s:
E - T
tp
7. Syntax directed translation scheme is desirable because
ht
(A) It is based on the syntax
T 2
4 (B) Its description is independent of any implementa-
S
5 TE tion.
After evaluation of the tree the value at the root will be: (C) It is easy to modify
O
(A) 28 (B) 32
D
2. The value of an inherited attribute is computed from the called semantic actions are embedded within right side
A
(A) Sibling nodes (B) Parent of the node (A) Syntax directed translation
EW
(C) Children node (D) Both (A) and (B) (B) Translation schema
(C) Annotated parse tree
N
expr → → term
term → 1 {print (‘1’)} (A) Memory associated with its syntactic component
A
{print (‘2’)}
term → 3 {print (‘3’)} ponent
D
notation.
(A) Error type (B) Type expression
O
(C) It detects shift-reduce conflict, and resolves the (C) The maximum number of successors of a node
conflict in favor of a shift over a reduce action. in an AST and a CFG depends on the input pro-
(D) It detects shift-reduce conflict, and resolves the gram.
conflict in favor of a reduce over a shift action. (D) Each node in AST and CFG corresponds to at
(B) Assume the conflicts in Part (A) of this question most one statement in the input program.
are resolved and an LALR (1) parser is gener- 3. Consider the following Syntax Directed Translation
ated for parsing arithmetic expressions as per the Scheme (SDTS), with non-terminals {S, A} and ter-
given grammar. Consider an expression 3 × 2 minals {a, b}.[2016]
+ 1. What precedence and associativity proper- S → aA { print 1 }
ties does the generated parser realize?
JW
S → a { print 2 }
(A) Equal precedence and left associativity; expres-
A → Sb { print 3 }
Pw
sion is evaluated to 7
(B) Equal precedence and right associativity; expres- Using the above SDTS, the output printed by a bot-
Jf
sion is evaluated to 9 tom-up parser, for the input aab is:
/2
(C) Precedence of ‘×’ is higher than that of ‘+’, and (A) 1 3 2 (B) 2 2 3
ly
both operators are left associative; expression is (C) 2 3 1 (D) syntax error
it.
evaluated to 7
//b
4. Which one of the following grammars is free from left
(D) Precedence of ‘+’ is higher than that of ‘×’, and recursion?[2016]
s:
both operators are left associative; expression is (A) S → AB
tp
evaluated to 9 A → Aa|b
ht
2. In the context of abstract-syntax-tree (AST) and B → c
(B) S → Ab|Bb|c
S
control-flow-graph (CFG), which one of the follow- TE
A → Bd|ε
ing is TRUE?[2015]
B → e
O
(A) In both AST and CFG, let node N2 be the suc- (C) S → Aa|B
N
(B) For any input program, neither AST nor CFG A → Bd|ε
B → Ae|ε
EW
Answer Keys
C
TE
Exercises
A
Practice Problems 1
G
D
1. A 2. D 3. A 4. B 5. A 6. A 7. B 8. C 9. B 10. D
A
Practice Problems 2
W
1. A 2. D 3. C 4. C 5. A 6. C 7. D 8. B 9. C 10. C
O
D
JW
LEARNING OBJECTIVES
Pw
Introduction Procedure calls
Jf
/2
Directed Acyclic Graphs (DAG) Code generation
ly
Three address code Next use information
it.
Symbol table operations Run-time storage management
//b
Assignment statements DAG representations of basic blocks
s:
tp
Boolean expression Peephole optimization
ht
Flow control of statements
S
TE
O
N
D
inTroduCTion
N
P12
source dependent source independent source independent
A
+ P *
6
LO
P5P10
* d P11
N
P1P 2
• High level IR, i.e., AST (Abstract Syntax Tree) b c
O
•
D
P11 = makeleaf (id, d) The corresponding three address code will be like this:
P12 = makenode (*, P10, P11)
Syntax Tree DAG
P13 = makenode (+, P7, P12)
t1 = -z t1 = -z
Example 2: a: = a – 10 t2 = y * t1 t2 = y * t1
:=
t3 = -z t5 = t2 + t2
− t4 = y * t3 X = t5
a 10 t5 = t4 + t2
X = t5
JW
Three-Address Code The postfix notation for syntax tree is: xyz unaryminus *yz
Pw
In three address codes, each statement usually contains 3 unaryminus *+=.
Jf
addresses, 2 for operands and 1 for the result. •• Three address code is a ‘Linearized representation’ of
/2
Example: -x = y OP z syntax tree.
ly
•• x, y, z are names, constants or complier generated •• Basic data of all variables can be formulated as syntax
it.
temporaries, directed translation. Add attributes whenever necessary.
//b
•• OP stands for any operator. Any arithmetic operator (or) Example: Consider below SDD with following
s:
Logical operator.
tp
specifications:
ht
Example: Consider the statement x = y * - z + y* - z E might have E. place and E.code
E.place: the name that holds the value of E.
S
=
E.code: the sequence of intermediate code starts evaluating E.
TE
+ Let Newtemp: returns a new temporary variable each time
x
O
it is called.
N
*
* New label: returns a new label.
D
Unary-minus
y Then the SDD to produce three–address code for expressions
N
Unary-minus
z y
is given below:
A
z
H
EW
E→ E1 PLUS E2 E. code = E1. code || E2. code || gen (PLUS, E. place, E1. place, E2. place);
C
E. place = newtemp();
TE
E→ E1MUL E2 E. code = E1. code || E2. code || gen (MUL, E. place, E1. place, E2. place);
A
E. Place = Newtemp();
G
E. code = E1.code
A
JW
.
Pw
Op Arg1 Arg2 Result
Param xn; (0) Param x
Jf
Call procedure p with n parameters and (1) Call READ (x)
/2
Call p, n, x;
store the result in x.
ly
return x Use x as result from procedure. Example 3: WRITE (A*B, x +5)
it.
//b
Declarations OP Arg1 Arg2 Result
s:
•• Global x, n1, n2: Declare a global variable named x at off- (0) * A B t1
tp
set n1 having n2 bytes of space. (1) + x 5 t2
ht
•• Proc x, n1, n2: Declare a procedure x with n1 bytes of (2) Param t1
(3) Param t2
parameter space and n2 bytes of local variable space.
S
(4) Call Write 2
•• Local x, m: Declare a local variable named x at offset m
TE
from the procedure frame.
O
Adaption for object oriented code •• Temporaries are not used and instead references to
A
•• x = y field z: Lookup field named z within y, store address instructions are made.
H
•• Class x, n1, n2: declare a class named x with n1 bytes of •• Triples takes less space when compared with Quadruples.
N
class variables and n2 bytes of class method pointers. •• Optimization by moving code around is difficult.
•• Field x, n: Declare a field named x at offset n in the class
SE
•• New x: Create a new instance of class name x. •• For the expression a = y* – z + y*–z the Triple representa-
TE
tion is
A
Implementation of Three
G
(0) Uminus z
A
with fields for the operator and the operands. There are 3 (2) Uminus z
N
(5) = a (4)
2. Triples
D
3. Indirect triples
Array – references
Quadruples Example: For A [I]: = B, the quadruple representation is
A quadruple has four fields: op, arg1, arg2 and result.
Op Arg1 Arg2 Result
•• Unary operators do not use arg2. (0) []= A I T1
•• Param use neither arg2 nor result. (1) = B T2
•• Jumps put the target label in result.
•• The contents of the fields are pointers to the symbol table The same can be represented by Triple representation also.
entries for the names represented by these fields. [] = is called L-value, specifies the address to an
•• Easier to optimize and move code around. element.
Chapter 3 • Intermediate Code Generation | 6.39
JW
•• In indirect triples, pointers to triples will be there instead = 4;}
Pw
of triples. Need to remember the current offset before entering the
•• Optimization by moving code around is easy.
Jf
block, and to restore it after the block is closed.
•• Indirect triples takes less space when compared with
/2
Example: Block → begin M4 Declarations statements end
ly
Quadruples.
{pop (tblptr); pop (offset) ;}
it.
•• Both indirect triples and Quadruples are almost equally
//b
efficient. M4 → ∈{t: = mktable (top (tblptr); push (t,
s:
Example: Indirect Triple representation of 3-address code tblptr); push (top (offset), offset) ;
tp
Can also use the block number technique to avoid creating
ht
Statement
a new symbol table.
(0) (14)
S
(1) (15)
TE
(2) (16) Field names in records
O
(3) (17) •• A record declaration is treated as entering a block in
N
(5) (19)
•• Need to use a new symbol table.
N
A
(15) * y (14)
T. width = top (offset);
pop (tblptr);
(16) Uminus z
N
pop (offset) ;}
(17) * y (16)
SE
(19) = x (18)
push {(o, offset) ;}
TE
A
•• create a new symbol table. code, we show how names can be looked up in the symbol
N
•• Link it to the symbol table previous. table and how elements of array can be accessed.
W
•• insert a new identifier name with type and offset into Code generation for assignment statements gen ([address
D
Error handling routine error – msg (error information); Start_addr: starting address
The error messages can be written and stored in other 1D Array: A[i]
file. Temp space management:
•• Start_addr + (i – low )* w = i * w + (start_addr - low *w)
•• This is used for generating code for expressions. •• The value called base, (start_addr – low * w) can be com-
•• newtemp (): allocates a temp space. puted at compile time and then stored at the symbol table.
•• freetemp (): free t if it is allocated in the temp space Example: array [-8 …100] of integer.
To declare [-8] [-7] … [100] integer array in Pascal.
Label management 2D Array A [i1, i2]
•• This is needed in generating branching statements. Row major order: row by row. A [i] means the ith row.
•• newlabel (): generate a label in the target code that has 1st row A [1, 1]
never been used.
Names in the symbol table
Code generation for assignment statements: in the translation scheme below, we show how names can be looked up in the symbol table and how elements of an array can be accessed.

S → id := E   {p := lookup (id.name, top (tblptr));
               if p is not null then gen (p, ':=', E.place)
               else error ('var undefined', id.name);}
E → E1 + E2   {E.place := newtemp ();
               gen (E.place, ':=', E1.place, '+', E2.place);
               freetemp (E1.place); freetemp (E2.place);}
E → –E1       {E.place := newtemp ();
               gen (E.place, ':=', 'uminus', E1.place);
               freetemp (E1.place);}
E → (E1)      {E.place := E1.place;}
E → id        {p := lookup (id.name, top (tblptr));
               if p ≠ null then E.place := p.place
               else error ('var undefined', id.name);}

Type conversions
For the expression E → E1 + E2:
   if E1.type = integer then
      gen (temp1, ':=', int-to-float, E1.place);
      gen (E, ':=', temp1, '+', E2.place);
   else
      gen (temp1, ':=', int-to-float, E2.place);
      gen (E, ':=', temp1, '+', E1.place);
   freetemp (temp1);

Addressing array elements
start_addr: starting address of the array.

1D array A[i]:
   Address of A[i] = start_addr + (i – low) * w = i * w + (start_addr – low * w)
The value called base, (start_addr – low * w), can be computed at compile time and then stored in the symbol table.
Example: array [-8 … 100] of integer declares the integer array with elements [-8], [-7], …, [100] in Pascal.

2D array A[i1, i2]
Row major order stores the array row by row; A[i] means the ith row:
   1st row: A[1, 1], A[1, 2]
   2nd row: A[2, 1], A[2, 2]
   A[i, j] = A[i][j]
Column major order stores the array column by column:
   1st column: A[1, 1], A[2, 1]
   2nd column: A[1, 2], A[2, 2]

Address for A[i1, i2] (row major):
   start_addr + ((i1 – low1) * n2 + (i2 – low2)) * w
where low1 and low2 are the lower bounds of i1 and i2, and n2 is the number of values that i2 can take. If high2 is the upper bound on the value of i2, then n2 = high2 – low2 + 1.
We can rewrite the address for A[i1, i2] as
   ((i1 × n2) + i2) × w + (start_addr – low1 × n2 × w – low2 × w)
where the second term can be computed at compile time and then stored in the symbol table.

Generalizing, the address for A[i1, i2, …, ik] is
   (i1 × n2 × n3 × … × nk + i2 × n3 × … × nk + … + ik) × w
   + (start_addr – low1 × n2 × … × nk × w – low2 × n3 × … × nk × w – … – lowk × w)
It can be computed incrementally in the grammar rules using the attributes:
   array: a pointer to the symbol-table entry containing information about the array declaration.
   ndim: the current dimension index.
   base: base address of this array.
   place: where a variable is stored.
   limit (array, m) = nm, the number of elements in the mth coordinate.
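The row-major formula is easy to check with a small helper. A minimal sketch in C (function and parameter names are illustrative):

   #include <stdio.h>

   /* Byte address of A[i1, i2] for an array declared
      A[low1..high1, low2..high2] with element width w,
      stored in row-major order starting at start_addr.   */
   long rowmajor_addr(long start_addr, int w,
                      int i1, int low1,
                      int i2, int low2, int high2) {
       int n2 = high2 - low2 + 1;                 /* values i2 can take   */
       long base = start_addr - (long)(low1 * n2 + low2) * w;
       return (long)(i1 * n2 + i2) * w + base;   /* base is compile-time */
   }

   int main(void) {
       /* A: array [1..3, 1..4] of 4-byte integers at address 1000.
          A[2,3] lies after one full row (4 elements) plus 2 more:
          1000 + (4 + 2) * 4 = 1024.                                   */
       printf("%ld\n", rowmajor_addr(1000, 4, 2, 1, 3, 1, 4));
       return 0;
   }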
The grammar for array references:
S → L := E
E → E + E
E → (E)
E → L
L → Elist ]
L → id
Elist → Elist1, E
Elist → id [ E

The translation scheme with array offsets:
•• S → L := E   {if L.offset = null then /* L is a simple id */
                    gen (L.place, ':=', E.place);
                 else
                    gen (L.place, '[', L.offset, ']', ':=', E.place);}
•• E → E1 + E2  {E.place := newtemp ();
                 gen (E.place, ':=', E1.place, '+', E2.place);}
•• E → (E1)     {E.place := E1.place}
•• E → L        {if L.offset = null then /* L is a simple id */
                    E.place := L.place;
                 else begin
                    E.place := newtemp ();
                    gen (E.place, ':=', L.place, '[', L.offset, ']');
                 end}
•• L → id       {p := lookup (id.name, top (tblptr));
                 L.place := p.place; L.offset := null;}
•• Elist → Elist1, E   {Elist.ndim := Elist1.ndim + 1; … limit (array, m) …}

Boolean Expressions
There are two choices for implementation of Boolean expressions:
1. Numerical representation
2. Flow of control

Numerical representation Encode true and false values; numerically, 1: true, 0: false.
Flow of control: Represent the value of a Boolean expression by the position reached in the program.
Short circuit code: Generate the code to evaluate a Boolean expression in such a way that it is not necessary for the code to evaluate the entire expression.
•• In a1 or a2: if a1 is true then a2 is not evaluated.
•• In a1 and a2: if a1 is false then a2 is not evaluated.

Numerical representation
E → id1 relop id2   {B.place := newtemp ();
                     gen ('if', id1.place, relop, id2.place, 'goto', nextstat + 3);
                     gen (B.place, ':=', '0');
                     gen ('goto', nextstat + 2);
                     gen (B.place, ':=', '1');}

Example 1: Translate the statement a < b or c < d and e < f using the numerical representation:
100: if a < b goto 103
101: t1 := 0
102: goto 104
103: t1 := 1
104: if c < d goto 107
105: t2 := 0
106: goto 108
107: t2 := 1
108: if e < f goto 111
109: t3 := 0
110: goto 112
111: t3 := 1
112: t4 := t2 and t3
113: t5 := t1 or t4
Flow-of-control statements

1. If-then implementation:
   S → if B then S1   {… gen (B.false, ':');}
   The generated layout is:
            B.code        (jumps to B.true / to B.false)
   B.true:  S1.code
   B.false: …

2. If-then-else:
   P → S                    {S.next := newlabel ();
                             P.code := S.code || gen (S.next, ':');}
   S → if B then S1 else S2 {S1.next := S.next;
                             S2.next := S.next;
                             S.code := B.code || S1.code ||
                                       gen ('goto', S.next) || gen (B.false, ':') || S2.code}
   We need to use the inherited attributes of S to define the attributes of S1 and S2. The layout is:
            B.code        (jumps to B.true / to B.false)
   B.true:  S1.code
            goto S.next
   B.false: S2.code
   S.next:  …

3. While loop:
   S → while B do S1   {S.begin := newlabel (); B.true := newlabel ();
                        B.false := S.next; S1.next := S.begin;
                        S.code := gen (S.begin, ':') || B.code ||
                                  gen (B.true, ':') || S1.code || gen ('goto', S.begin)}
   where B.code contains tests of the form gen ('if', id1, relop, id2, 'goto', B.true, 'else', 'goto', B.false).

Case statements
Translation sequence:
•• Evaluate the expression.
•• Find which value in the list matches the value of the expression; match default only if there is no match.
•• Execute the statement associated with the matched value.

How to find the matched value? The matched value can be found in the following ways:
1. Sequential test
2. Lookup table
3. Hash table
4. Back patching

Two different translation schemes for sequential test are shown below:
1. Code to evaluate E into t
        goto test
   L1:  code for S1
        goto next
   L2:  code for S2
        goto next
        …
   test: if t = V1 goto L1
         if t = V2 goto L2
         …
   next:
2. This can easily be converted into a lookup table: use a table and a loop to find the address to jump to:
   V [1]  L [1]        L [1]: S [1]
   V [2]  L [2]        L [2]: S [2]
   V [3]  L [3]        …
3. Hash table: when there are more than two entries, use a hash table to find the correct table entry.

Parameter passing
Consider an assignment x := y.
r-value: the value of the variable, i.e., on the right side of the assignment; y in the above assignment.
l-value: the location/address of the variable, i.e., on the left side of the assignment; x in the above assignment.

There are different modes of parameter passing:
1. call-by-value
2. call-by-reference
3. call-by-value-result (copy-restore)
4. call-by-name
4. Back patching:
•• Generate a series of branching statements with the targets of the jumps temporarily left unspecified.
•• To determine the targets, keep a label table: each entry contains a list of places that need to be back patched.
•• Back patching can also be used to implement labels and gotos.

Procedure Calls
•• Space must be allocated for the activation record of the called procedure.
•• Arguments are evaluated and made available to the called procedure in a known place.
•• Save the current machine status.

Example: S → call id (Elist)
         Elist → E   {initialize Elist.queue to contain only E.place}
Use a queue to hold the parameters, then generate the code for the params.

Call by value
The calling procedure copies the r-values of the arguments into the called procedure's activation record. Changing a formal parameter has no effect on the actual parameter.
Example: void add (int C)
{
   C = C + 10;
   printf ("\nc = %d", C);
}
main ()
{
   int a = 5;
   printf ("\na = %d", a);
   add (a);
   printf ("\na = %d", a);
}
In main, a will not be affected by calling add (a); the output is a = 5, c = 15, a = 5.
1. Used by PASCAL and C++ if we use non-var parameters.
2. The only mechanism used in C.
Advantages:
1. No aliasing.
…

Call by reference
The calling procedure passes the address (l-value) of each actual parameter.
Example: void add (int *C)
{
   *C = *C + 10;
   printf ("\nc = %d", *C);
}
void main ()
{
   int a = 5;
   printf ("\na = %d", a);
   add (&a);
   printf ("\na = %d", a);
}
output: a = 5
        c = 15
        a = 15
That is, here the actual parameter is also modified.
Advantages:
1. Efficiency in passing large objects.
2. Only need to copy addresses.

Call-by-value-result
Equivalent to call-by-reference except when there is aliasing. That is, the program produces the same result, but not the same code will be generated.
Aliasing: Two expressions that have the same l-value are called aliases; they access the same location from different places. Aliasing happens through pointer manipulation, and also via:
1. Call by reference with a global variable as an argument.
2. Call by reference with the same expression as an argument twice.
Advantages:
1. …
2. No implicit side effect if pointers are not passed.

Call by name
Used in Algol.
•• The procedure body is substituted for the call in the calling procedure.
•• Instead of passing values or addresses as arguments, a function is passed for each argument. These functions are called thunks.
•• Each time a parameter is used, the thunk is called, then the address returned by the thunk is used.
   y = 0: use the return value of the thunk for y as the l-value.
Example: void show (int x)
{
   for (int y = 0; y < 10; y++)
      x++;
}
main ()
{
   int j;
   j = –1;
   show (j);
}
Actually it will behave like this:
main ()
{
   int j;
   j = –1;
   for (int y = 0; y < 10; y++)
      j++;
}
Advantages
•• More efficient when passing parameters that are never used.
•• This saves a lot of time, because evaluating an unused parameter can take a long time.

Code Generation
Code generation is the final phase of the compiler model:

Source program → Front end → Intermediate code → Code optimization → Intermediate code → Code generator → Target program

Issues in the Design of a Code Generator
•• Input to the code generator
•• Target programs
•• Instruction selection
•• Register allocation
•• Evaluation order

Input to the code generator
Intermediate representation together with the symbol table will be the input for the code generator.
•• High-level intermediate representation
   Example: Abstract Syntax Tree (AST)
•• Medium-level intermediate representation
   Example: control flow graph of complex operations
•• Low-level intermediate representation
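Call by name can be emulated in C by passing a thunk explicitly. A minimal sketch (the thunk type and names are illustrative assumptions; C itself only provides call by value):

   #include <stdio.h>

   /* A thunk returns the l-value (address) of the actual parameter. */
   typedef int *(*thunk)(void *env);

   static int *j_thunk(void *env) { return (int *)env; }

   /* show(x) from the text, with x passed by name: every use of x
      re-invokes the thunk to obtain the current l-value.            */
   void show(thunk x, void *env) {
       for (int y = 0; y < 10; y++)
           (*x(env))++;                  /* x++ through the thunk */
   }

   int main(void) {
       int j = -1;
       show(j_thunk, &j);
       printf("%d\n", j);               /* prints 9, as in the expansion */
       return 0;
   }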
Example: code that manipulates machine locations such as SP, PC, etc.

Target program (output) forms
Absolute machine language
•• Final memory area for a program is statically known.
•• Hard-coded addresses.
•• Sufficient for very simple systems.
Advantages:
•• Fast for small programs
•• No separate compilation
Disadvantages: cannot call modules from other languages/compilers.

Relocatable code It needs
•• a relocation table
•• a relocating linker + loader (or) runtime relocation in the loader.

Assembly language Needs an assembler tool to convert it to binary (object) code: it needs (i) an assembler and (ii) a linker and loader.

Register allocation Minimize the number of loads and stores.

Evaluation order
•• The order of evaluation can affect the efficiency of the target code.
•• Some orders require fewer registers to hold intermediate results.

Target Machine
Let us assume the target computer is
•• byte addressable with 4 bytes per word,
•• has n general-purpose registers, and
•• has two-operand instructions of the form OP source, destination.
Example: The op may be MOV, ADD, MUL. Generally the cost will be like this:

Source      Destination    Cost
Register    Register       1
Register    Memory         2
Memory      Register       2
Memory      Memory         3

Addressing modes (the standard set assumed for such a machine):

Mode                 Form     Address                       Added cost
Absolute             M        M                             1
Register             R        R                             0
Indexed              c(R)     c + contents(R)               1
Indirect register    *R       contents(R)                   0
Indirect indexed     *c(R)    contents(c + contents(R))     1
Literal              #c       the constant c                1

Memory management
Mapping names in the source program to addresses of data objects in runtime memory is done by the front end and the code generator.
•• A name in a three-address statement refers to a symbol-table entry.
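As a worked example with this cost table, the statement a := b + c can be compiled as
   MOV b, R0   cost 2 (memory source, register destination)
   ADD c, R0   cost 2
   MOV R0, a   cost 2
for a total cost of 6, while the two-instruction form MOV b, a; ADD c, a costs 3 + 3 = 6 as well; which alternative is better depends on whether any of a, b, c is already in a register and whether the value is reused later.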
Static data area
The data objects whose size is known at compile time are stored in a statically determined area, because they are compiled into the target code. This static data area is placed on top of the code area, at the lower end of the memory.

Stack and heap
The runtime storage contains the stack and the heap. The stack contains activation records; the data objects within an activation record are stored in the stack with the relevant information. When a call returns, the suspended activation is resumed after restoring the values of the relevant registers; this also includes the program counter, which is set to point immediately after the call. The size of the stack is not fixed.
The heap area allocates memory for dynamic data (for example, some data items are allocated under program control).
The sizes of the stack and heap will grow or shrink according to the program's needs.

Scope of declarations
Declaration scope refers to the certain program text portion in which rules are defined by the language. Within the defined scope, an entity can legally access the declared entities.
The scope of a declaration always contains its immediate scope. The immediate scope is the region of the declarative portion enclosing the declaration: scope starts at the beginning of the declaration and continues till the end of the declarative region, whereas in the case of overloadable declarations …
Activation Record
•• If the procedure is recursive, several of its activation records may be on the stack at the same time.

The typical subdivision of runtime memory:
1. Code
2. Static data
3. Stack
   …
4. Heap

Basic Blocks
A basic block is a sequence of consecutive statements entered only at the beginning and left only at the end.
•• Nodes in the flow graph represent computations.

Partitioning into basic blocks:
1. First determine the set of leaders:
   •• The first statement is a leader.
   •• Any target of a goto is a leader.
   •• Any statement that follows a goto is a leader.
2. For each leader, its basic block consists of the leader and all statements up to the next leader.
Initial node: the block whose leader is the first statement.
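A minimal sketch of leader detection over an array of three-address instructions in C (the instruction encoding is an illustrative assumption):

   #include <stdio.h>

   /* target >= 0 marks a conditional or unconditional jump
      to that instruction index; -1 means no jump.            */
   struct instr { const char *text; int target; };

   /* Mark leaders per the three rules above. */
   void find_leaders(const struct instr *code, int n, int leader[]) {
       for (int i = 0; i < n; i++) leader[i] = 0;
       if (n > 0) leader[0] = 1;                  /* rule 1: first statement */
       for (int i = 0; i < n; i++)
           if (code[i].target >= 0) {
               leader[code[i].target] = 1;        /* rule 2: jump target     */
               if (i + 1 < n) leader[i + 1] = 1;  /* rule 3: after a jump    */
           }
   }

   int main(void) {
       struct instr code[] = {
           {"Prod := 0", -1}, {"i := 1", -1},
           {"t1 := 4*i", -1}, {"...", -1},
           {"if i <= 10 goto (2)", 2},
       };
       int leader[5];
       find_leaders(code, 5, leader);
       for (int i = 0; i < 5; i++)
           if (leader[i]) printf("leader at (%d): %s\n", i, code[i].text);
       return 0;
   }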
•• Sizes of stack and heap can change during program execution.
For code generation there are two standard storage allocations: static allocation and stack allocation.

Example: consider the following fragment of code that computes the dot product of two vectors x and y of length 10:
begin
   Prod := 0;
   i := 1;
   do begin
      Prod := Prod + x[i] * y[i];
      i := i + 1;
   end while i <= 10
end

Its three-address code falls into two basic blocks:

B1  (1)  Prod := 0
    (2)  i := 1

B2  (3)  t1 := 4 * i
    (4)  t2 := x[t1]
    (5)  t3 := 4 * i
    (6)  t4 := y[t3]
    (7)  t5 := t2 * t4
    (8)  t6 := Prod + t5
    (9)  Prod := t6
    (10) t7 := i + 1
    (11) i := t7
    (12) if i <= 10 goto (3)

Local transformations on basic blocks:
•• Structure-preserving transformations
•• Algebraic transformations, e.g.
   x * 1 = 1 * x = x
   x – 0 = x
   x / 1 = x
Two basic blocks are equivalent if they compute the same set of expressions, where the expressions are the values of the names live on exit from the block.

Structure-preserving transformations:
1. Common sub-expression elimination:
   a := b + c        a := b + c
   b := a – d        b := a – d
   c := b + c   ⇒    c := b + c
   d := a – d        d := b

Next-Use Information
•• Next-use info is used in code generation and register allocation.
•• Remove variables from registers if not used.
•• A statement of the form A = B op C defines A and uses B and C.
•• Scan each basic block backwards.
•• Assume all temporaries are dead on exit and all user variables are live on exit.
For each statement i: x := y op z met in the backward scan, record the current next-use/liveness information of x, y and z, then mark x as dead and y, z as live with next use i.

Example: consider the block
   1: t1 = a * a
   2: t2 = a * b
   3: t3 = 2 * t2
   4: t4 = t1 + t3
   5: t5 = b * b
   6: t6 = t4 + t5
   7: x = t6

Backward scan:
   7: no temporary is live after this statement
   6: t6: use (7); t4, t5 not live
   5: t5: use (6)
   4: t4: use (6); t1, t3 not live
   3: t3: use (4); t2 not live
   2: t2: use (3)
   1: t1: use (4)

Symbol table:
   t1  dead, use in 4
   …
Using the next-use information, the temporaries can be packed into two names:
   1: t1 = a * a
   2: t2 = a * b
   3: t2 = 2 * t2
   4: t1 = t1 + t2
   5: t2 = b * b
   6: t1 = t1 + t2
   7: x = t1

Code Generator
•• Consider each statement in turn.
•• Remember if an operand is in a register.
•• Descriptors are used to keep track of register contents and addresses for names. There are 2 types of descriptors:
   1. Register descriptor: keeps track of what is currently in each register. Initially all registers are empty.
   2. Address descriptor: keeps track of the location(s) where the current value of a name can be found.

Code generation algorithm for x = y OP z:
•• Invoke getreg to determine the location L where the result of y OP z must be stored. Usually L is a register.
•• Consult the address descriptor of y to determine y′. Prefer a register for y′. If the value of y is not already in L, generate MOV y′, L.
•• Generate OP z′, L. Again prefer a register for z′. Update the address descriptor of x to indicate that x is in L. If L is a register, update its descriptor to indicate that it contains x, and remove x from all other register descriptors.
•• If the current values of y and/or z have no next use, are dead on exit from the block, and are in registers, then change the register descriptors to indicate that they no longer contain y and/or z.

Function getreg
1. If y is in a register that holds the value of no other name, and y has no next use after x = y OP z, then return the register of y for L.
2. Failing that, return an empty register if there is one.
3. Failing that, if x has a next use in the block, find a suitable occupied register, store its value to memory, and use it.
4. Else select the memory location of x as L.
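A minimal sketch of getreg in C, assuming a small fixed register-descriptor table (all names and structures here are illustrative, and the liveness query is a stub standing in for the next-use table):

   #include <string.h>

   #define NREGS 4
   static const char *reg_holds[NREGS];    /* register descriptor; NULL = empty */

   /* Stub: a real compiler reads this off the next-use information. */
   static int has_next_use(const char *name) { (void)name; return 0; }

   /* Returns a register number for L, or -1 meaning "use the memory
      location of x". Spilling (case 3) is reduced to picking victim 0. */
   int getreg(const char *x, const char *y) {
       /* 1. Reuse y's register if it holds only y and y is dead here. */
       for (int r = 0; r < NREGS; r++)
           if (reg_holds[r] && strcmp(reg_holds[r], y) == 0 &&
               !has_next_use(y))
               return r;
       /* 2. Otherwise return an empty register if there is one.      */
       for (int r = 0; r < NREGS; r++)
           if (reg_holds[r] == NULL)
               return r;
       /* 3. Otherwise spill a victim (caller must store reg_holds[0]). */
       if (has_next_use(x))
           return 0;
       /* 4. Else use the memory location of x itself.                */
       return -1;
   }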
Conditional statements
Example: three-address statement if x < y goto z.
1. It can be implemented by subtracting y from x in a register R, then jumping to z if the value of R is negative.
2. Alternatively, use a set of condition codes to indicate whether the last quantity computed or loaded into a location is negative, zero or positive:
   •• A compare instruction sets the codes without actually computing the value.
      Example: CMP x, y
               CJ< z
   •• Maintain a condition-code descriptor, which tells the name that last set the condition codes.
      Example: x := y + z
               if x < 0 goto z
      is implemented by
               MOV y, R0
               ADD z, R0
               MOV R0, x
               CJN z

DAG Representation of Basic Blocks
•• DAGs are useful data structures for implementing transformations on basic blocks; a DAG gives a picture of how the value computed by a statement is used in subsequent statements.
•• Interior nodes are labeled by an operator symbol.
•• Nodes are also optionally given a sequence of identifiers as labels.

Code Generation from DAG
Example: consider the block
   1: t1 := 4 * i
   2: t2 := a[t1]
   3: t3 := 4 * i
   4: t4 := b[t3]
   5: t5 := t2 * t4
   6: t6 := prod + t5
   7: prod := t6
   8: t7 := i + 1
   9: i := t7
   10: if i <= 20 goto (1)
Building the DAG exposes the common sub-expressions and redundant stores:

S1 = 4 * i            S1 = 4 * i
S2 = add(A) – 4       S2 = add(A) – 4
S3 = S2[S1]           S3 = S2[S1]
S4 = 4 * i
S5 = add(B) – 4       S5 = add(B) – 4
S6 = S5[S4]           S6 = S5[S4]
S7 = S3 * S6          S7 = S3 * S6
S8 = prod + S7        prod = prod + S7
prod = S8
S9 = I + 1
I = S9                I = I + 1
if I <= 20 goto (1)   if I <= 20 goto (1)

Rearranging order of the code
Consider the following basic block
   t1 := a + b
   t2 := c + d
   t3 := e – t2
   x := t1 – t3
and its DAG: x = t1 – t3 at the root, with t1 = a + b and t3 = e – t2 (t2 = c + d) as sub-trees.

Three-address code for the DAG (assuming only two registers are available):
   MOV a, R0
   ADD b, R0
   MOV c, R1
   ADD d, R1
   MOV R0, t1      (store t1: both registers are needed)
   MOV e, R0
   SUB R1, R0
   MOV t1, R1      (register reloading)
   SUB R0, R1
   MOV R1, x
If the statements are reordered as t2, t3, t1, x, the store and reload of t1 disappear.
Error Handling
The errors in a program can be classified as:
1. Lexical errors
2. Syntactic errors
3. Semantic errors
4. Run-time errors

Lexical errors If variables (or) constants are declared (or) defined not according to the rules of the language, or special symbols are included which are not part of the language, etc., it is a lexical error. The lexical analyzer is constructed based on pattern-recognizing rules to form tokens; when the source code is made into tokens and these tokens are not according to the rules, errors are generated.
Example: Consider the C program statement
   Printf ("Hello World");
Printf, (, ", Hello World, ", ), ; are tokens. Printf is not a recognizable pattern; actually it should be printf. It generates an error.

Syntactic errors These errors include missing semicolons, missing braces, etc., which go against the language's grammar rules.

Semantic errors These occur when an operation is performed over incompatible types of variables, double declaration, assigning values to undefined variables, etc. For example, we cannot add an integer to a Boolean variable.

Runtime errors The runtime errors are the ones which are detected at runtime. These include pointers assigned NULL values and accessing a variable which is out of its scope.

Error recovery strategies:
1. Panic mode recovery
2. Phrase level recovery
3. Error productions
4. Global correction

Peephole Optimization
Peephole optimization improves the target code by examining and replacing short sequences of instructions.

Eliminating redundant loads and stores
Example 1: (1) MOV R0, a
           (2) MOV a, R0
We can delete instruction (2), because the value of a is already in R0.
Example 2: Load x, R0
           Store R0, x
If there are no modifications to R0/x in between, the store instruction can be deleted.
Example 3: (1) Load x, R0
           (2) Store R0, x
Example 4: (1) Store R0, x
           (2) Load x, R0
The second instruction can be deleted from both examples 3 and 4.

Eliminating unreachable code
An unlabeled instruction immediately following an unconditional jump may be removed.
•• This may be due to updates in programs made without considering the whole program segment.
Example: Let print = 0
   if print = 1 goto L1          if print != 1 goto L2
   goto L2              ⇒        print instructions
   L1: print instructions        L2: …
   L2: …
Since print = 0, the test if 0 != 1 goto L2 always succeeds, so the print instructions are unreachable and can be removed.

Flow-of-control optimizations Jumps to jumps can be collapsed; sometimes this skips "goto L3", leaving only one jump:
   goto L1                       if a < b goto L2
   …                   ⇒         goto L3
   L1: if a < b goto L2          …
   L3: …                         L3: …

Reduction in strength Cheaper equivalent instructions can be used whenever possible.
Example: replace ADD #1, R by INC R.
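A minimal sketch of the redundant-load rule as a peephole pass over adjacent instruction pairs in C (the instruction format is an illustrative assumption):

   #include <stdio.h>
   #include <string.h>

   /* One two-operand instruction, e.g. {"MOV", "R0", "a"}. */
   struct ins { char op[8], src[8], dst[8]; };

   /* Delete "MOV a, R0" when it immediately follows "MOV R0, a":
      the value of a is already in R0 (Example 1 above).          */
   int peephole(struct ins *code, int n) {
       int out = 0;
       for (int i = 0; i < n; i++) {
           if (out > 0 &&
               strcmp(code[i].op, "MOV") == 0 &&
               strcmp(code[out-1].op, "MOV") == 0 &&
               strcmp(code[i].src, code[out-1].dst) == 0 &&
               strcmp(code[i].dst, code[out-1].src) == 0)
               continue;                    /* drop the redundant load */
           code[out++] = code[i];
       }
       return out;                          /* new instruction count   */
   }

   int main(void) {
       struct ins code[] = {
           {"MOV", "R0", "a"}, {"MOV", "a", "R0"}, {"ADD", "b", "R0"},
       };
       int n = peephole(code, 3);
       for (int i = 0; i < n; i++)
           printf("%s %s, %s\n", code[i].op, code[i].src, code[i].dst);
       return 0;
   }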
Exercises

Practice Problems 1
Directions for questions 1 to 15: Select the correct alternative from the given choices.

1. The binary operators used in this expression tree can …

   [expression tree: + at the root, with – nodes below and leaves r, s]

2. Consider the procedures:
   Procedure … ( )
   { Var …
     call A2; }
   Procedure A21 ( )
   { …
     Call A1; }
   …
   { …
     Call A1; }
   A21 ( ) → A1 ( ).
5. Suppose the processor has only two registers. The code optimization allowed is code motion. What is the minimum number of spills to memory in the compiled code?
   c = a + b;
   d = c*a;
   e = c + a;
   x = c*c;
   If (x > a)
   {
      y = a*a;
   }
   Else
   {
      d = d*d; e = e*e;
   }
   (A) 3  (B) 4
   (C) 5  (D) 6

6. Convert the following expression into postfix notation:
   a = (–a + 2*b)/a
   (A) …  (B) …
   (C) a2b * a/+  (D) a2b – * a/+

7. … For ( j = 0; j <= 10; j++) …

9. …
   (C) BBB ***– +  (D) –*/bc …

10. What is the final value of the postfix expression B C D A D – + – + where A = 2, B = 3, C = 4, D = 5?
    (A) 5  (B) 4
    (C) 6  (D) 7

11. Consider the expression x = (a + b)* –C/D. In the quadruple representation of this expression, in which instruction is the '/' operation used?
    (A) 3rd  (B) 4th
    (C) 5th  (D) 8th

12. In the triple representation of x = (a + b)* –c/d, in which instruction is the result of (a + b)* –c/d assigned to x?
    (A) 3rd  (B) 4th
    …

13. While (A <= D) do
       A = A + 3;
    How many temporaries are used?
    (A) 2  (B) 3
    (C) 4  (D) 0

14. Code generation can be done by
    (A) …  (B) …
    (C) Type checking  (D) Run time management
Practice Problems 2

3. In static allocation, names are bound to storage at _______ time.
   (A) Compile  (B) Runtime
   (C) Debugging  (D) Both (A) and (B)

4. The actual parameters are evaluated and their r-values are passed to the called procedure in
   (A) call-by-reference
   (B) call-by-name
   (C) call-by-value
   (D) copy-restore

5. If the expression – (a + b) * (c + d) + (a + b + c) is translated into quadruple representation, then how many quadruples are needed?
   (A) 5  (B) 6
   (C) 7  (D) 8

6. …
   (C) 5  (D) 8

7. In the indirect triple representation for the expression …
   (D) (E/F) * (C – D)

8. For the given assembly language, what is the cost for it?
   MOV b, a
   ADD c, a
   (A) 3  (B) 4
   (C) 6  (D) 2

9. Consider the expression ((4 + 2 * 3 + 7) + 8 * 5). The polish postfix notation for this expression is
   (A) 423* + 7 + 85*+   (B) 423* + 7 + 8 + 5*
   (C) 42 + 37 + *85* +  (D) 42 + 37 + 85** +

Common data for questions 10 to 15: Consider the following basic block, in which all variables are integers and ** denotes exponentiation:
   a: = b + c
   …
   t4 = t3 + t3
   a = t4
   …

10. After optimization, how many instructions will be modified?
    (A) 1  (B) 2
    (C) 4  (D) 5

11. After applying common sub expression elimination to the above code, which of the following are true?
    (A) a: = b + c  (B) y: = a
    (C) z = a + a   (D) None of these

12. Among the following instructions, which will be modified after applying copy propagation?
    (A) a: = b + c  (B) z: = a * a
    (C) y: = a      (D) w: = y * y

13. Which of the following is obtained after constant folding?
    (A) u: = 3  (B) v: = u + w
    (C) x: = 0  (D) Both (A) and (C)

14. Which are the dead statements to be eliminated?
    (A) x = 0
    …

15. How many instructions will be there after optimizing the above result further?
    …

16. Consider the following code:
    L0: e: = 0
        b: = 1
        d: = 2
    L1: a: = b + 2
        c: = d + 5
        e: = e + c
        f: = a*a
        If f < c goto L3
    L2: e: = e + f
        goto L4
    L3: e: = e + 2
    L4: d: = d + 4
        b: = b – 4
        If b != d goto 4
    L5:
    How many blocks are there in the flow graph for the above code?
    (A) 5  (B) 6
    (C) 8  (D) 7

17. A basic block can be analyzed by
    (A) Flow graph
    (B) A graph with cycles
    (C) DAG
    (D) None of these

18. In call by value the actual parameters are evaluated. What type of values is passed to the called procedure?
    (A) l-values
    (B) r-values
    (C) Text of actual parameters
    (D) None of these

19. Which of the following is FALSE regarding a Block?
    (A) The first statement is a leader.
    (B) Any statement that is a target of a conditional/unconditional goto is a leader.
    (C) The statement immediately following a goto is a leader.
    (D) The last statement is a leader.
Previous Years' Questions

1. The least number of temporary variables required to create a three-address code in static single assignment form for the expression q + r/3 + s – t * 5 + u * v/w is ________. [2015]

2. Consider the intermediate code given below.
   (1) i = 1
   (2) j = 1
   (3) t1 = 5 * i
   (4) t2 = t1 + j
   (5) t3 = 4 * t2
   (6) t4 = t3
   (7) a[t4] = –1
   (8) j = j + 1
   (9) if j <= 5 goto (3)
   (10) i = i + 1
   (11) if i < 5 goto (2)
   The number of nodes and edges in the control-flow graph constructed for the above code, respectively, are [2015]

3. Consider the following code segment.
   x = u – t;
   y = x * v;
   x = y + w;
   y = t – z;
   y = x * y;
   The minimum number of total variables required to convert the above code segment to static single assignment form is _____. [2016]

4. What will be the output of the following pseudo-code when parameters are passed by reference and dynamic scoping is assumed? [2016]
   a = 3;
   void n(x) { x = x * a; print (x); }
   void m(y) { a = 1; a = y – a; n(a); print (a); }
   void main( ) { m(a); }
   (A) 6, 2  (B) 6, 6
   (C) 4, 2  (D) 4, 4

5. Consider the following intermediate program in three-address code [2017]
   p = a − b
   q = p * c
   p = u * v
   q = p + q
   Which one of the following corresponds to a static single assignment form of the above code?
   (A) …               (B) …
       q1 = p1 + q1        q5 = p4 + q4
   (C) p1 = a − b      (D) p1 = a − b
       q1 = p2 * c         q1 = p * c
       p3 = u * v          p2 = u * v
       q2 = p4 + q3        q2 = p + q
Answer Keys

Exercises

Practice Problems 1
1. D  2. D  3. C  4. C  5. B  6. A  7. B  8. A  9. A  10. A
11. B  12. C  13. A  14. C  15. B

Practice Problems 2
1. B  2. B  3. A  4. B  5. B  6. A  7. B  8. C  9. A  10. A
11. B  12. D  13. A  14. C  15. C  16. A  17. C  18. B  19. D

Previous Years' Questions
1. 8  2. B  3. 10  4. D  5. B
Chapter 4
Code Optimization

LEARNING OBJECTIVES
•• Code optimization basics
•• Principle sources of optimization
•• Loop invariant code motion
•• Strength reduction on induction variables
•• Loops in flow graphs
•• Pre-header
•• Global data flow analysis
•• Definition and usage of variables
•• Use-definition (u-d) chaining
•• Data flow equations

The following criteria apply to code-improving transformations:
1. A transformation must preserve the meaning of programs.
2. A transformation must, on the average, speed up programs by a measurable amount.
3. A transformation must be worth the effort.
Places for improvements
Target CPU architecture The issues to be considered for the optimization are the machine characteristics:
1. …
2. …
3. Functional units
By altering the machine description parameters, one can optimize a single piece of compiler code.

Principle sources of optimization:
1. Common sub-expression elimination
2. Copy propagation
3. Dead-code elimination
4. Loop optimization
   – Code motion
   – Induction variable elimination
   – Reduction in strength

Common sub expression elimination The process of identifying common sub expressions and eliminating their computation multiple times is known as common sub expression elimination.

Example: Consider the following program segment:
int sum_n, sum_n2, sum_n3;
int sum (int n)
{
   sum_n = ((n)*(n+1))/2;
   sum_n2 = ((n)*(n+1)*(2*n+1))/6;
   sum_n3 = sum_n * sum_n;
}

Three-address code for the above input is
(0) proc-begin sum
(1) t0: = n + 1
(2) t1: = n * t0
(3) t2: = t1/2
(4) sum_n: = t2
… (the sub-expression n * (n + 1) is computed a second time for sum_n2) …
(9) proc-end sum

After common sub expression elimination, the value already in t1 is reused:
(0) proc-begin sum
(1) t0: = n + 1
(2) t1: = n * t0
(3) sum_n: = t1/2
(4) t5: = 2 * n
(5) t6: = t5 + 1
(6) t7: = t1 * t6
(7) sum_n2: = t7/6
(8) sum_n3: = sum_n * sum_n
(9) proc-end sum

Constant folding The constant expressions in the input source are evaluated and replaced by their equivalent values. For example, 10*3 and 6 + 101 are constant expressions, and they are replaced by 30 and 107, respectively.

Example: Consider the following code fragment:
int arr1 [2];
int main ( )
{
   arr1 [0] = 3;
   arr1 [1] = 4;
}

Unoptimized three-address code equivalent to the above 'C' code:
(0) proc-begin main
(1) t0: = 0
(2) t1: = &arr1
(3) t1 [t0]: = 3
(4) t2: = 4
(5) t3: = &arr1
(6) t3 [t2]: = 4
(7) Label L0
(8) proc-end main
After applying copy propagation, the optimized code will be
(0) proc-begin main
(1) t0: = 0
(2) t1: = &arr1
(3) t1 [0]: = 3
(4) t2: = 4
(5) t3: = &arr1
(6) t3 [4]: = 4
(7) Label L0
(8) proc-end main

In the three-address code shown above, quadruples (1) and (4) are no longer used in any of the following statements, so they can be removed as dead stores:
(0) proc-begin main
(1) t1: = &arr1
(2) t1 [0]: = 3
(3) t3: = &arr1
(4) t3 [4]: = 4
(5) Label L0
(6) proc-end main
In the above example, we are propagating constant values.

Variable propagation Propagating another variable instead of the existing one is known as variable propagation.

Example:
{
   d = a;
   If (a > 10)
   {
      e = d + b;
   }
   Else
   {
      e = d + c;
   }
   f = d*e;
   return (f);
}

The three-address code:
(0) proc-begin func
(1) d: = a
(2) If a > 10 goto .L0
(3) goto L1
(4) label: L0
(5) e: = d + b
(6) goto L2
(7) label: L1
(8) e: = d + c
(9) label: L2
(10) f: = d * e
(11) return f
(12) goto L3
(13) label: L3
(14) proc-end func

Three-address code after variable (copy) propagation, with d replaced by a:
(0) proc-begin func
(1) d: = a
(2) If a > 10 goto .L0
(3) goto L1
(4) label: L0
(5) e: = a + b
(6) goto L2
(7) label: L1
(8) e: = a + c
(9) label: L2
(10) f: = a * e
(11) return f
(12) goto L3
(13) label: L3
(14) proc-end func

In the above code, (1) d: = a is no more used. ∴ Eliminate the dead store d: = a:
(0) proc-begin func
(1) If a > 10 goto .L0
(2) goto L1
(3) label: L0
(4) e: = a + b
(5) goto L2
(6) label: L1
(7) e: = a + c
(8) label: L2
(9) f: = a*e
(10) return f
(11) goto L3
(12) label: L3
(13) proc-end func
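A minimal sketch of constant propagation with folding over a list of quadruples in C (the representation is an illustrative assumption, and this simple left-to-right version is only valid within a single basic block):

   #include <stdio.h>
   #include <string.h>

   struct quad { char op; char a1[8], a2[8], res[8]; };  /* res = a1 op a2 */

   /* Tiny table of names currently known to hold a constant. */
   static struct { char name[8]; int val; } known[16];
   static int nknown = 0;

   static int lookup(const char *s, int *v) {
       for (int i = 0; i < nknown; i++)
           if (strcmp(known[i].name, s) == 0) { *v = known[i].val; return 1; }
       return sscanf(s, "%d", v) == 1;        /* literal constant?    */
   }

   void propagate(struct quad *q, int n) {
       for (int i = 0; i < n; i++) {
           int x, y;
           if (q[i].op == '=' && lookup(q[i].a1, &x)) {
               snprintf(known[nknown].name, 8, "%s", q[i].res);
               known[nknown++].val = x;       /* res is now a constant */
           } else if (q[i].op == '+' && lookup(q[i].a1, &x)
                                     && lookup(q[i].a2, &y)) {
               snprintf(q[i].a1, 8, "%d", x + y);   /* fold x + y     */
               q[i].op = '=';
               q[i].a2[0] = '\0';
               snprintf(known[nknown].name, 8, "%s", q[i].res);
               known[nknown++].val = x + y;
           }
       }
   }

   int main(void) {
       struct quad q[] = {
           {'=', "0",  "", "t0"},
           {'+', "t0", "4", "t1"},            /* becomes t1 = 4       */
       };
       propagate(q, 2);
       printf("%s = %s\n", q[1].res, q[1].a1);  /* prints t1 = 4      */
       return 0;
   }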
Dead code elimination Eliminating the code that never gets executed by the program is known as dead code elimination. It reduces the memory required by the program.

Example: Consider the following unoptimized intermediate code:
(0) proc-begin func
(1) debug: = 0
(2) If debug == 1 goto L0
(3) goto L1
(4) label: L0
(5) param c
(6) param b
(7) param a
(8) param lc1
(9) call printf 16
(10) retrieve t0
(11) label: L1
(12) t1: = a + b
(13) t2: = t1 + c
(14) v1: = t2
(15) Return v1
(16) goto L2
(17) label: L2
(18) proc-end func

After constant propagation, (2) becomes If 0 == 1 goto L0, and 0 == 1 always returns false. This makes the statements (4) through (10) dead, so they can be eliminated:
(0) proc-begin func
(1) goto L1
(2) label: L1
(3) t1: = a + b
(4) t2: = t1 + c
…

Example: Consider the following code fragment:
struct mystruct
{
   int a [20];
   int b;
} xyz;
int func (int i)
{
   xyz.a[i] = 34;
}

The unoptimized three-address code:
(0) proc-begin func
(1) t0: = &xyz
(2) t1: = 0
(3) t2: = i*4
(4) t1: = t2 + t1
(5) t0 [t1] = 34
(6) label: L0
(7) proc-end func

Optimized code after copy propagation and dead code elimination is shown below; the statement t1: = 0 is eliminated:
(0) proc-begin func
(1) t0: = &xyz
(2) t2: = i*4
(3) t1: = t2 + 0
(4) t0 [t1]: = 34
(5) label: L0
(6) proc-end func

Reduction in strength Replacing an expensive operation by a cheaper equivalent one is called strength reduction.
For example y: = x*2 is replaced by y: = x + x, as addition is less expensive than multiplication.
Similarly,
   Replace y: = x*32 by y: = x << 5
   Replace y: = x/8 by y: = x >> 3

Loop optimization We can optimize loops by
(1) Loop invariant code motion transformation.
(2) Strength reduction on induction variable transformation.

Loop invariant code motion
The statements within a loop that compute values which do not vary throughout the life of the loop are called loop invariant statements.

Consider the following program fragment:
int a [100];
int func (int x, int y)
{
   int i;
   int n1, n2;
   i = 0;
   n1 = x*y;
   n2 = x – y;
   while (a[i] > n1*n2)
      i = i + 1;
   return (i);
}

After the loop invariant code motion transformation the code will be
(0) proc-begin func
(1) i: = 0
(2) n1: = x*y
(3) n2: = x – y
(4) t3: = &arr
(5) t5: = n1*n2
(6) label: L0
(7) t2: = i*4
(8) t4: = t3[t2]
(9) if t4 > t5 goto L1
(10) goto L2
(11) label: L1
(12) i: = i + 1
(13) goto L0
(14) label: L2
(15) return i
(16) goto L3
(17) label: L3
(18) proc-end func
The invariant computation n1*n2 (t5) has been moved out of the loop.

Strength reduction on induction variables
A variable whose value changes by a fixed amount on every iteration of a loop is an induction variable.

Example:
int i;
int a[20];
{
   i = 0;
   while (i < 20)
   {
      a[i] = 10;
      i = i + 1;
   }
}

The three-address code will be
(0) proc-begin func
(1) i: = 0
(2) label: L0
… (test i < 20; t0 holds the byte offset) …
(6) t1: = &a
(7) t1[t0]: = 10
(8) i: = i + 1
(8a) t0: = t0 + 4
(9) goto L0
(10) label: L2
(11) label: L3
(12) proc-end func
The multiplication t0: = i*4 inside the loop has been replaced by the additive update (8a) t0: = t0 + 4, with t0 initialized to 0 before the loop.
Loops in Flow Graphs
Loops in the code are detected during data flow analysis by using the concept called 'dominators' in the flow graph.

Dominators
A node d of a flow graph dominates node n if every path from the initial node to n passes through d.
Notes: every node dominates itself, and the initial node dominates all nodes in the flow graph.

Example: consider the code
int x, y;
x = a;
y = a;
Label L0: if a < 100 goto L1; goto L2; label L1: t0: = y*x; y: = t0; t1 = x + 1; …; goto L0; label L2: …; Label L3

The flow graph for the above code has blocks B0 (proc-begin func; x: = a; y: = a), B1 (Label L0: if a < 100 goto L1), B2 (goto L2) and B3 (label L1: t0: = y*x; y: = t0; t1 = x + 1; …), with B3 looping back to B1.

The dominators for each of the nodes in the flow graph are
dominators [0] = {0}
dominators [1] = {0, 1}
…

The presence of a back edge indicates the existence of a loop in a flow graph: a back edge is an edge whose head dominates its tail. In the previous graph, 3 → 1 is a back edge.

Consider the following table:

Edge   Head   Tail   Dominators [head]   Dominators [tail]
0→1    1      0      {0, 1}              {0}
1→2    2      1      {0, 1, 2}           {0, 1}
1→3    3      1      {0, 1, 3}           {0, 1}
3→1    1      3      {0, 1}              {0, 1, 3}
2→4    4      2      {0, 1, 2, 4}        {0, 1, 2}
4→5    5      4      {0, 1, 2, 4, 5}     {0, 1, 2, 4}

For the edge 3 → 1, the head 1 dominates the tail 3, so 3 → 1 is a back edge.

Example: Consider the flow graph below:

   [flow graph: B0 branches to B1 and B2; B1 → B3; B2 → B4 and B6; B4 → B5; B5, B6 → B7; with edges back from B6 to B2 and from B3 to B1]

Here {B6, B2, B4} form a loop (L1), and {B3, B1} form another loop (L2). In a loop, the entry of the loop dominates all nodes in the loop.

Header of the loop The entry of the loop is also called the header of the loop.

Loop exit block Loop L1 can be exited from the basic block B6; it is called the loop exit block. The block B3 is the loop exit block for the loop L2. It is possible to have multiple exit blocks in a loop.

Dominator tree
A tree which represents the dominance information in the form of a tree is a dominator tree. In this,
•• The initial node is the root.
•• Each node d dominates only its descendants in the tree.

Consider the flow graph on nodes 1-10:

   [flow graph: 1 → 2 → 3; 3 → 4; 4 → 5 and 6; paths continue through 7, which branches to 8; 8 → 9 and 10]

For example, dominators [1] = {1} and dominators [6] = {1, 3, 4, 6}.

The dominator tree will be:

   [dominator tree rooted at 1, with 8 having children 9 and 10]
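Dominator sets can be computed by the standard iterative method: initialize dominators[n] to the set of all nodes (except for the initial node, which dominates only itself) and shrink until stable. A minimal sketch in C with bit-sets, encoding the six-node example graph above (the encoding itself is illustrative):

   #include <stdio.h>

   #define N 6                        /* nodes 0..5, 0 is initial   */
   typedef unsigned int nodeset;

   /* Predecessor lists for edges 0->1, 1->2, 1->3, 3->1, 2->4, 4->5 */
   int pred[N][N] = {{-1}, {0,3,-1}, {1,-1}, {1,-1}, {2,-1}, {4,-1}};

   int main(void) {
       nodeset dom[N], all = (1u << N) - 1;
       dom[0] = 1u << 0;                      /* initial node: itself */
       for (int n = 1; n < N; n++) dom[n] = all;
       int changed = 1;
       while (changed) {
           changed = 0;
           for (int n = 1; n < N; n++) {
               nodeset d = all;
               for (int k = 0; pred[n][k] != -1; k++)
                   d &= dom[pred[n][k]];      /* intersect over preds */
               d |= 1u << n;                  /* a node doms itself   */
               if (d != dom[n]) { dom[n] = d; changed = 1; }
           }
       }
       for (int n = 0; n < N; n++)            /* bit i set: i dom n   */
           printf("dominators[%d] = %x\n", n, dom[n]);
       return 0;
   }

Running this reproduces the table above, e.g. dominators[5] = {0, 1, 2, 4, 5}.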
Pre-header
A pre-header is a basic block introduced during loop optimization to hold the statements that are moved from within the loop. It is a predecessor to the header block.

   [figure: a loop L with its header; after the transformation a new pre-header block is placed immediately before the header]

Reducible flow graphs
A flow graph G is reducible if and only if its edges can be partitioned into two disjoint groups:
(1) Forward edges.
(2) Backward edges, with the following properties:
    (i) The forward edges form an acyclic graph in which every node can be reached from the initial node.
    (ii) The back edges consist only of edges whose heads dominate their tails.

Example: In the flow graph on nodes 1-10 above, there are five back edges: 4 → 3, 7 → 4, 8 → 3, 9 → 1 and 10 → 7. Remove all back edges; the remaining edges must be the forward edges, and the remaining graph is acyclic. ∴ It is reducible.

Global Dataflow Analysis
Point: A point is a place of reference within a basic block; a point is defined either prior to or immediately after a statement.
Example 1: For a block containing the single statement a: = 10 there are two points, one before and one after the statement.
Example 2: Consider a small function (proc-begin func; goto .L1; v1: = 0; Label L1; V5: = v1 + v2; proc-end func) split into blocks b0 through b4. There are 4 points in the basic block b1, given by P1 – b1 through P4 – b1, and the numbering continues through the remaining blocks: P6 – b2, P7 – b2, P7 – b3, P8 – b3, P9 – b3, P9 – b4, P10 – b4 and P11 – b4.
Path: A path from one point to another is a sequence of points in which each consecutive pair either surrounds adjacent statements of the same block or is joined by an edge of the flow graph.
Path between P0 – b0 and P6 – b2: the sequence of points runs from the beginning of the whole program through b0 into b2.
Path between P0 – b0 and P7 – b3: there are two paths.
(1) Path 1 consists of the sequence of points P0 – b0, P1 – b0, …, P7 – b3.
(2) Path 2 consists of the sequence of points P0 – b0, P1 – b0, P2 – b0, P3 – b0, P4 – b2, P5 – b2, P6 – b2, P7 – b2 and P7 – b3.

Reaching definitions
A definition of a variable A reaches a point P if there is a path in the flow graph from that definition to P, such that no other definitions of A appear on the path.
Example:
B1: if A = B goto B3
B2: A: = 2
B3: if A = B goto B5
B4: A: = 3
B5: …
The definition A: = 2 in B2 reaches the beginning of B3, but it does not reach B5 along a path through B4, because the definition A: = 3 kills it.

Definition and Usage of Variables
•• Each definition is associated with a unique quadruple.
•• For each simple variable A, make a list of all definitions of A.
•• Compute two sets for each basic block B:
   1. Kill [B], which is the set of definitions outside of B that define a name that is also defined within B.
   2. Gen [B], the set of definitions generated within B that reach its end.

Data flow equations
1. OUT [B] = {d | (a) d ∈ IN [B] and is not killed by B, (or) (b) d is generated in B and is not subsequently redefined in B}
2. IN [B] = ∪ OUT [P], taken over all P preceding B
A definition reaches the beginning of B iff it reaches the end of one of its predecessors.

Computing u-d chains
If a use of variable 'a' is preceded in its block by a definition of 'a', this is the only one reaching it. If no such definition precedes its use, all definitions of 'a' in IN [B] are on its chain.

Uses of u-d chains
1. If the only definition of 'a' reaching this statement involves a constant, we can substitute that constant for 'a'.
2. If no definition of 'a' reaches this point, a warning can be given.
3. If a definition reaches nowhere, it can be eliminated. This is part of dead code elimination.
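A minimal iterative solver for these data flow equations, using bit-sets over definition numbers (the block structure, gen/kill values and set width are illustrative assumptions):

   #include <stdio.h>

   #define NBLK 4
   typedef unsigned int defset;          /* one bit per definition   */

   /* gen/kill per block; pred[b] lists predecessors (-1 = end).     */
   defset gen[NBLK]  = {0x1, 0x2, 0x4, 0x0};
   defset kill[NBLK] = {0x2, 0x1, 0x0, 0x0};
   int pred[NBLK][NBLK] = {{-1}, {0,-1}, {1,-1}, {1,2,-1}};

   defset in[NBLK], out[NBLK];

   int main(void) {
       int changed = 1;
       while (changed) {                 /* iterate to a fixed point */
           changed = 0;
           for (int b = 0; b < NBLK; b++) {
               defset i = 0;
               for (int k = 0; pred[b][k] != -1; k++)
                   i |= out[pred[b][k]];            /* IN[B] = U OUT[P] */
               defset o = gen[b] | (i & ~kill[b]);  /* OUT equation     */
               if (i != in[b] || o != out[b]) changed = 1;
               in[b] = i; out[b] = o;
           }
       }
       for (int b = 0; b < NBLK; b++)
           printf("B%d: IN=%x OUT=%x\n", b, in[b], out[b]);
       return 0;
   }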
Exercises

Practice Problems 1
Directions for questions 1 to 15: Select the correct alternative from the given choices.

1. Replacing the expression 2 * 3.14 by 6.28 is
   (A) Constant folding
   (B) Induction variable
   (C) Strength reduction
   (D) …

2. The input for the code generator is a:
   (A) …
   (B) Intermediate representation
   (C) Runtime output
   (D) …

…  with a single register without storing the value of (a*b) …

6. In block B, if x or y is assigned there and s is not in B, then s : x = y is
   (A) Generated  (B) Killed
   (C) Blocked    (D) Dead

7. Given the following code
   A = x + y;
   B = x + y;
   C = x + y;
   …
   if A = C;
   …

8. Can the loop invariant X = A – B from the following … ?

Common data for questions 10 and 11: Consider the following statements of a block:
   a: = b + c
   b: = a – d
   c: = b + c
   d: = a – d

10. In the above basic block, the value of b in the 3rd statement is
    (A) Same as b in 1st statement
    (B) Different from b in 1st statement
    (C) 0
    (D) 1

11. The above basic block contains
    (A) Two common sub expressions
    (B) Only one common sub expression
    (C) Dead code
    (D) Temporary variable

12. Find the induction variable from the following code:
    A = –0.2;
    B = A + 5.0;
    (A) A
    (B) B
    (C) Both A and B are induction variables
    (D) No induction variables

13. The analysis that cannot be implemented by a forward-operating data flow equations mechanism is
    (A) Interprocedural
    (B) Procedural
    (C) Live variable analysis
    (D) Data

14. Which of the following consists of a definition of a variable and all the uses, U, reachable from that definition without any other intervening definitions?
    (A) Ud-chaining  (B) Du-chaining
    (C) Spanning     (D) Searching

15. Consider the graph

    [graph: node 1 with successors 2 and 3]

    The graph is
    (A) Reducible graph
    (B) Non-reducible graph
    (C) Data insufficient
    (D) None of these

Practice Problems 2

1. In the labeling algorithm, let n be a binary node and its children be labeled …
   (B) i2 + 1
   (C) i2 – 1
   (D) i2 – i1

2. The input for the code generator is …

3. …
   a: for (j = 0; j < m; j++)
      {
      …
      }
   (A) There is a scope of common reduction in this code
   (B) There is a scope of strength reduction in this code.
   (C) There is scope of dead code elimination in this code
   (D) Both (A) and (C)

4. The following tries to keep a frequently used value in a fixed register throughout a loop:
   (A) Usage counts
   (B) Global register allocation
   (C) Conditional statement
   (D) Pointer assignment

5. Substitute y for x in copy statement s : x = y if the following condition is met: …
   (D) x and y are aliases

7. S1: In a dominance tree, the initial node is the root.
   S2: Each node d dominates only its ancestors in the tree.
   S3: If d ≠ n and d dom n, then d dom m.
   Which of the statements is/are true?
   (A) S1, S2 are true
   (B) S1, S2 and S3 are true
   …
9. …
   (B) Reducible graphs
   (C) Depth first ordering
   (D) All of these

10. A point cannot be found:
    (A) Between two adjacent statements
    (B) Before the first statement
    (C) After the last statement
    (D) Between any two statements

11. In the statement x = y*10 + z; which is/are defined?
    (A) x  (B) y
    (C) z  (D) Both (B) and (C)

12. Consider the following program:
    void main ( )
    {
       …
       printf ("%d", x);
    }
    The output is
    (A) 3 – 25  (B) 25 – 3
    (C) 3 – 3   (D) 25 – 25

13. The evaluation strategy which delays the evaluation of an expression until its value is needed and which avoids repeated evaluations is:
    (A) Early evaluation  (B) Late evaluation
    (C) Lazy evaluation   (D) Critical evaluation

14. If two or more expressions denote the same memory address …

Previous Years' Questions

1.–2. [activation-record questions: options (A)–(D) show different stackings of Main, A1, A2 and A21 on the runtime stack, with the FRAME POINTER and ACCESS LINKS marked; the figures are not recoverable here]

Common data for questions 3 and 4: The following code segment is executed on a processor which allows only register operands in its instructions. Each instruction can have at most two source operands and one destination operand. Assume that all variables are dead after this code segment.
    c = a + b;
    d = c * a;
    e = c + a;
    x = c * c;
    If (x > a) {
       y = a * a;
    }
    Else {
       d = d * d;
       e = e * e;
    }

3. What is the minimum number of registers needed in the instruction set architecture of the processor to compile this code segment without any spill to memory? Do not apply any optimization other than optimizing register allocation. [2013]
   (A) 3  (B) 4
   (C) 5  (D) 6

4. Suppose the instruction set architecture of the processor has only two registers. The only allowed compiler optimization is code motion, which moves statements from one place to another while preserving correctness. What is the minimum number of spills to memory in the compiled code? [2013]
   (A) 0  (B) 1
   (C) 2  (D) 3

5. Which one of the following is NOT performed during compilation? [2014]
   (A) Dynamic memory allocation
   (B) Type checking
   (C) Symbol table management
   (D) Inline expansion

6. Which of the following statements are CORRECT? [2014]
   (i) Static allocation of all data areas by a compiler makes it impossible to implement recursion.
   (ii) …
   (iii) Dynamic allocation of activation records is essential to implement recursion.
   (iv) …
   (A) (i) and (ii) only   (B) (ii) and (iii) only
   (C) (iii) and (iv) only (D) (i) and (iii) only

7. A variable x is said to be live at a statement Si in a program if the following three conditions hold simultaneously: [2015]
   1. There exists a statement Sj that uses x.
   2. There is a path from Si to Sj in the flow graph corresponding to the program.
   3. The path has no intervening assignment to x, including at Si and Sj.

       1: p = q + r
          s = p + q
          u = s * v
       2: v = r + u        3: q = s * u
       4: q = v + r

   The variables which are live both at the statement in basic block 2 and at the statement in basic block 3 of the above control flow graph are
   (A) p, s, u  (B) r, s, u
   (C) r, u     (D) q, v
8. Match the following [2015]
   P. Lexical analysis        1. Graph coloring
   Q. Parsing                 2. DFA minimization
   R. Register allocation     3. Post-order traversal
   S. Expression evaluation   4. Production tree
   (A) P–2, Q–3, R–1, S–4  (B) P–2, Q–1, R–4, S–3
   (C) P–2, Q–4, R–1, S–3  (D) P–2, Q–3, R–4, S–1

9. Consider the following directed graph:

   [directed graph on the vertices a, b, c, d, e, f; figure not recoverable]

   The number of different topological orderings of the vertices of the graph is _______. [2016]

10. Consider the following grammar:
    stmt −> if expr then expr else expr; stmt | ∈
    expr −> term relop term | term
    term −> id | number
    where relop is a relational operator (e.g., <, >, …), ∈ refers to the empty statement, and if, then, else are terminals.
    Consider a program P following the above grammar containing ten if terminals. The number of control flow paths in P is __________. For example, the program
    if e1 then e2 else e3
    has 2 control flow paths, e1 → e2 and e1 → e3. [2017]

11. Consider the expression (a – 1) * (((b + c)/3) + d). Let X be the minimum number of registers required by an optimal code generation (without any register spill) algorithm for a load/store architecture, in which (i) only load and store instructions can have memory operands and (ii) arithmetic instructions can have only register or immediate operands. The value of X is _______. [2017]

12. Match the following according to the input (from the left column) to the compiler phase (in the right column) that processes it: [2017]
    (P) Syntax tree                 (i) Code generator
    (Q) Character stream            (ii) Syntax analyzer
    (R) Intermediate representation (iii) Semantic analyzer
    (S) Token stream                (iv) Lexical analyzer
    (A) …
    (B) P → (i), Q → (iv), R → (ii), S → (iii)
    (C) P → (iii), Q → (iv), R → (i), S → (ii)
    (D) …

Answer Keys

Exercises

Practice Problems 1
1. A  2. C  3. B  4. A  5. B  6. B  7. C  8. B  9. B  10. B
…

Practice Problems 2
1. D  2. B  3. A  4. B  5. A  6. D  7. D  8. D  9. D  10. D
11. A  12. B  13. C  14. A  15. B
Test
Directions: Select the correct alternative from the given choices.

2. …
   (A) Lexical analysis
   (B) Code optimization
   (C) Syntax analysis
   (D) Semantic analysis

3. A shift reduce parser carries out the actions specified within braces immediately after reducing the corresponding rule of grammar, as below:
   S → aaD {Print "1"}
   S → b {Print "2"}
   D → Sc {Print "3"}
   What is the translation of 'aaaabcc' using the syntax directed translation scheme described by the above rules?
   (A) 33211  (B) 11233
   (C) Both (A) and (B)  (D) None of these

5. Consider the grammar
   E → TE′
   E′ → + TE′/∈
   T → FT′
   T′ → *FT′/∈
   F → (E)/id
   From the above grammar, FOLLOW (E) is …

6. Consider the grammar
   T → (T) | ∈
   …

9. …
   (A) Bottom up parsers
   (B) Top down parsers
   (C) Both (A) and (B)
   (D) None of these

10. Consider the following grammars:
    I.  E → TE′
        E′ → + TE′/∈
        T → FT′
        T′ → *FT′/∈
        F → (E)/id
    II. S → iCtSS′ | a
        S′ → eS | ∈
        C → b
    …

11. …
    (B) Left Factoring
    …

12. From the above grammar, Follow (S) is …

13. Find the LEADING (S) from the following grammar:
    S → a | ^ | (T)
    T → T, S / S

14.–15. Let the number of states of the SLR (1), LR (1) and LALR (1) parsers for the grammar be n1, n2 and n3 respectively. …

16. Consider the following grammar:
    S → AB
    A → aa
    A → a
    B → ab
    B → b
    The grammar is
    (A) Ambiguous
    (B) Unambiguous
    (C) Not predictable
    (D) None of these

17. If a handle has been found but there is no production with this handle as a right side, then we discover
    (A) Logical error
    (B) Runtime error
    (C) Syntactic error
    (D) All of the above

18. The function of the syntax phase is
    (A) To build a literal table
    (B) To build a uniform symbol table
    (C) To parse the tokens produced by the lexical analyzer
    (D) None of these

19. Which of the following are cousins of compilers? …

20. An error is detected in predictive parsing when ____ hold(s).
    (i) 'a' is on top of the stack and the next input symbol is 'b'.
    (ii) 'a' is on top of the stack, 'a' is the next input symbol and the parsing table entry M [A, a] is empty.
    (A) Neither (i) nor (ii)
    …

21. Which one indicates the abstract syntax tree (AST) of "a * (b + c)"?
    (A) * rooted with leaves a, b, b, c  (B) + rooted …
    (C) * with children a and +, where + has children b and c  (D) * with children + and a …

22. The parse tree is constructed and then it is traversed and the semantic rules are evaluated in a particular order by a
    (A) Recursive evaluator
    (B) Bottom up translation
    (C) Top down translation
    (D) Phase tree method

23. The following grammar
    S → a a b | b a c | a b
    S → a S | b
    S → a b b / a b
    S → a b d b / b
    indicates
    (A) LR (0) grammar
    (B) SLR grammar
    (C) Regular grammar
    (D) None of these

24. If the attributes of the child depend on the attributes of the parent node, then it is a ____ attribute.
    (A) Inherited
    (B) Directed
    (C) Synthesised
    (D) TAC

25. The semantic rule is evaluated and the intermediate code is generated when the production is expanded in …

26. Consider the grammar shown below:
    S → CC
    C → cC/a
    The grammar is
    (A) LL (1)
    (B) SLR (1) but not LL (1)
    …

27. The class of grammars for which we can construct predictive parsers looking k symbols ahead in the input is
    …
    (C) LALR (k)
    (D) LL (k)

28. A compiler is a program that
    (A) Places programs into memory and prepares them for execution.
    (B) Automates the translation of assembly language into machine language.
    (C) Accepts a program written in a high level language and produces an object program.
    (D) Appears to execute a source program as if it were machine language.

Common data for questions 29 and 30: Consider the grammar
    E → TE′
    E′ → + TE′ | ∈
    T → FT′
    T′ → * FT′ | ∈
    F → (E) | id

29. Which one is FOLLOW (F)?
    (A) {+, ), $}  (B) {+, (, ), *}
    (C) {*, ), $}  (D) {+, *, ), $}

30. FIRST (E) will be the same as
    (A) FIRST (T)  (B) FIRST (F)
    (C) Both (A) and (B)  (D) None of these

Answer Keys
1. A  2. D  3. D  4. A  5. B  6. C  7. B  8. A  9. A  10. B
11. C  12. A  13. A  14. C  15. D  16. A  17. C  18. C  19. A  20. B
21. C  22. A  23. D  24. A  25. C  26. A  27. D  28. C  29. D  30. C