CD Notes Unit 5


Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of RMK Group of Educational Institutions. If you have received this document through email in error, please notify the system manager. This document contains proprietary information and is intended only for the respective group / learning community. If you are not the addressee, you should not disseminate, distribute or copy it through e-mail. Please notify the sender immediately by e-mail if you have received this document by mistake, and delete it from your system. If you are not the intended recipient, you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
1. CONTENTS

1. Course Objectives
2. Pre Requisites
3. Syllabus
4. Course Outcomes
5. CO - PO/PSO Mapping
6. Lecture Plan
7. Activity Based Learning
8. UNIT - V Lecture Notes
9. Assignments
10. Part A Questions & Answers
11. Part B Questions
12. Supportive Online Certification Courses
13. Real Time Applications
14. Contents beyond the Syllabus
15. Assessment Schedule
16. Prescribed Text Books & Reference Books
17. Mini Project Suggestions
21CS601
Compiler Design

Department: CSE
Batch/Year: 2021-25 / III

Created by:
Dr. P. EZHUMALAI, Prof & Head/RMDEC
Dr. A. K. JAITHUNBI, Associate Professor/RMDEC
V.SHARMILA, Assistant Professor/RMDEC

Date: 26.03.2024

2. COURSE OBJECTIVES

• To study the different phases of a compiler.

• To understand the techniques for tokenization and parsing.

• To understand the conversion of source program into an intermediate


representation.

• To learn the different techniques used for assembly code generation.

• To analyze various code optimization techniques.

3. PRE REQUISITES
• Pre-requisite Chart

21CS601 - COMPILER DESIGN

Prerequisites:
• 21CS503 - Theory of Computation
• 21MA302 - Discrete Mathematics
• 21CS201 - Data Structures
• 21CS02 - Python Programming (Lab Integrated)
• 21GE101 - Problem Solving and C Programming
4. SYLLABUS
21CS601 COMPILER DESIGN (Lab Integrated)
L T P C : 3 0 2 4
OBJECTIVES

• To study the different phases of compiler.


• To understand the techniques for tokenization and parsing.
• To understand the conversion of source program into an intermediate
representation.
• To learn the different techniques used for assembly code generation.
• To analyze various code optimization techniques.

UNIT I INTRODUCTION TO COMPILERS 9


Introduction–Structure of a Compiler–Role of the Lexical Analyzer - Input
Buffering - Specification of Tokens - Recognition of Tokens – The Lexical Analyzer
Generator LEX- Finite Automata - From Regular Expressions to Automata -
conversion from NFA to DFA, Epsilon NFA to DFA – Minimization of Automata.
UNIT II SYNTAX ANALYSIS 9
Role of the Parser - Context-free grammars – Derivation Trees – Ambiguity in
Grammars and Languages- Writing a grammar – Top-Down Parsing –Bottom Up
Parsing -LR Parser-SLR, CLR - Introduction to LALR Parser -Parser Generators –
Design of a parser generator –YACC.
UNIT III INTERMEDIATE CODE GENERATION 9
Syntax Directed Definitions - Evaluation Orders for Syntax Directed Definitions–
Application of Syntax Directed Translation - Intermediate Languages - Syntax Tree
-Three address code – Types and Declarations - Translation of Expressions - Type
Checking.

UNIT IV RUN-TIME ENVIRONMENT AND CODE GENERATION 9


Run Time Environment: Storage Organization-Stack Allocation of space - Access
to nonlocal data on stack – Heap management - Parameter Passing - Issues in Code
Generation - Design of a simple Code Generator Code generator using DAG –
Dynamic programming based code generation.
UNIT V CODE OPTIMIZATION 9
Principal Sources of Optimization – Peep-hole optimization - Register allocation
and assignment - DAG -Basic blocks and flow graph - Optimization in Basic
blocks – Data Flow Analysis.


LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C. (Ex.
identifiers, constants, comments, operators etc.). Create a symbol
table, while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs and new lines, comments etc.
3. Implement a Lexical Analyzer using Lex Tool.
4. Design Predictive Parser for the given language.
5. Implement an Arithmetic Calculator using LEX and YACC.
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (Constant folding,
Strength reduction and Algebraic transformation).
8. Implement back-end of the compiler for which the three address code
is given as input and the 8086 assembly language code is produced as
output.

5. COURSE OUTCOME

• At the end of the course, the student should be able to:

COURSE OUTCOMES (with HKL)

CO1: Understand the different phases of a compiler and identify the tokens using automata. (K2)
CO2: Construct the parse tree and check the syntax of the given source program. (K3)
CO3: Generate intermediate code representation for any source program. (K4)
CO4: Analyze the different techniques used for assembly code generation. (K4)
CO5: Implement code optimization techniques with simple code generators. (K3)

• HKL = Highest Knowledge Level
6. CO - PO / PSO MAPPING

PROGRAM OUTCOMES (PO1-PO12) AND PSO (PSO1-PSO3)

CO   HKL  PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12  PSO1 PSO2 PSO3
CO1  K2    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO2  K3    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO3  K4    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO4  K4    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
CO5  K3    3   2   1   -   -   -   -   1   1   1    -    1     2    -    -
• Correlation Level - 1. Slight (Low) 2. Moderate (Medium)


3. Substantial (High) , If there is no correlation, put “-“.

7. LECTURE PLAN
UNIT – V CODE OPTIMIZATION

(Each row lists: S.No, Topic, Pertaining CO(s), Highest Cognitive Level, Mode of Delivery, Delivery Resources, and the outcome expected after successful completion of the course.)

1. Principal Sources of Optimization – CO5, K2, MD1, T1 – Optimize the given code using the principal sources of optimization.
2. Optimization problem – CO5, K2, MD1/MD5, T1 – Optimize the given problem step by step using the principal sources of optimization.
3. Peep-hole optimization – CO5, K3, MD1, T1 – Optimize the code within the basic blocks.
4. Register allocation and assignment – CO5, K2, MD1/MD2, T1 – Allocate registers to the optimized code.
5. DAG – CO5, K2, MD1/MD2, T1 – Identify the common subexpressions and construct the DAG.
6. Basic blocks and flow graph – CO5, K2, MD1, T1 – Group the dependent contiguous code and maintain the control flow and data flow.
7. Optimization in Basic blocks – CO5, K2, MD1, T1 – Optimize the code within the basic block (local optimization).
8. Data Flow Analysis – CO5, K3, MD1, T1 – Check the data flow and control flow in if statements, for loops, while loops and do-while loops.
8. ACTIVITY BASED LEARNING : UNIT – V

UNIT – V

• TO UNDERSTAND THE BASIC CONCEPTS OF COMPILERS, STUDENTS CAN TAKE A QUIZ AS AN ACTIVITY.

• THE LINK IS PROVIDED BELOW.

https://create.kahoot.it/share/cs8602-unit-1/2e5c742f-541a-4bd0-84b2-9e2ca3b47170
Join at www.kahoot.it or with the Kahoot! app, and use the game PIN to play the quiz.

9. LECTURE NOTES : UNIT – V
CODE OPTIMIZATION

PRINCIPAL SOURCES OF OPTIMIZATION


A transformation of a program is called local if it can be performed by looking
only at the statements in a basic block; otherwise, it is called global. Many
transformations can be performed at both the local and global levels. Local
transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes.
Function preserving transformations examples:

Common sub expression elimination


Copy propagation,
Dead-code elimination
Constant folding

The other transformations come up primarily when global optimizations are performed.

Frequently, a program will include several calculations of the offset in an array. Some of
the duplicate calculations cannot be avoided by the programmer because they lie below the
level of detail accessible within the source language.
Common Sub expressions elimination:
An occurrence of an expression E is called a common sub-expression if E was
previously computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
For example
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5

The above code can be optimized using common sub-expression elimination as
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common sub-expression t4 := 4*i is eliminated, as its value is already computed in t1 and the value of i has not changed between the definition and the use.
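The transformation above can be sketched as a small local pass. The sketch below is illustrative (it is not from the prescribed text): three-address instructions are encoded as tuples (dest, op, arg1, arg2), with op set to None for a plain copy.

```python
def local_cse(block):
    """Rewrite a recomputed expression as a copy of the name that already
    holds its value.  Instructions are (dest, op, arg1, arg2); a plain
    copy has op None.  The tuple encoding is illustrative, not a fixed IR."""
    available = {}            # (op, arg1, arg2) -> name holding that value
    out = []
    for dest, op, a1, a2 in block:
        key = (op, a1, a2)
        if op is not None and key in available:
            # the expression was computed earlier: emit dest := earlier name
            out.append((dest, None, available[key], None))
        else:
            out.append((dest, op, a1, a2))
        # dest is redefined: expressions stored in dest, or computed from
        # the old value of dest, are no longer available
        available = {k: v for k, v in available.items()
                     if v != dest and dest not in (k[1], k[2])}
        if op is not None and key not in available and dest not in (a1, a2):
            available[key] = dest
    return out
```

Running this on the block of the example turns t4 := 4*i into the copy t4 := t1, matching the hand optimization above.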

Copy Propagation:
Assignments of the form f := g are called copy statements, or copies for short. The idea behind the copy-propagation transformation is to use g for f wherever possible after the copy statement f := g. Copy propagation means the use of one variable instead of another. This may not appear to be an improvement, but as we shall see it gives us an opportunity to eliminate x.
For example:
x=Pi;
A=x*r*r;
The optimization using copy propagation can be done as follows: A=Pi*r*r;
Here the variable x is eliminated.
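Copy propagation over one basic block can be sketched in the same illustrative tuple encoding (dest, op, arg1, arg2), where a plain copy has op None; this sketch is ours, not from the prescribed text.

```python
def propagate_copies(block):
    """After a copy x := y, use y in place of x until x or y is redefined.
    Instructions are (dest, op, arg1, arg2); a plain copy has op None."""
    copies = {}                  # x -> y for copy statements still valid
    out = []
    for dest, op, a1, a2 in block:
        a1 = copies.get(a1, a1)  # substitute propagated values into uses
        a2 = copies.get(a2, a2)
        # dest is being redefined: any copy reading or writing dest dies
        copies = {x: y for x, y in copies.items() if dest not in (x, y)}
        if op is None:
            copies[dest] = a1    # record the new copy dest := a1
        out.append((dest, op, a1, a2))
    return out
```

On the example x = Pi followed by a use of x, the use is rewritten to Pi, after which x (if otherwise unused) becomes dead code.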

Dead-Code Eliminations:
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is dead or
useless code, statements that compute values that never get used. While
the programmer is unlikely to introduce any dead code intentionally, it
may appear as the result of previous transformations.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the ‘if’ statement is dead code because the condition i==1 will never be satisfied.
Constant folding:
Deducing at compile time that the value of an expression is a
constant and using the constant instead is known as constant folding. One
advantage of copy propagation is that it often turns the copy statement into
dead code.
For example,
a=3.14157/2 can be replaced by
a=1.570785, thereby eliminating a division operation.
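Constant folding is easy to sketch over the same illustrative tuple encoding of three-address code: when both operands are numeric literals, the expression is evaluated at compile time and the instruction becomes a plain copy. This is a sketch under our own encoding, not the text's implementation.

```python
import operator

# map three-address operators to compile-time evaluators
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def fold_constants(inst):
    """If both operands of (dest, op, a1, a2) are numeric literals,
    evaluate the expression now and emit a plain copy instead."""
    dest, op, a1, a2 = inst
    if op in OPS and isinstance(a1, (int, float)) and isinstance(a2, (int, float)):
        return (dest, None, OPS[op](a1, a2), None)
    return inst
```

Applied to a := 3.14157 / 2, the pass emits the copy a := 1.570785, exactly the replacement described above.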
Loop Optimizations:
In loops, especially in the inner loops, programs tend to spend the bulk of
their time. The running time of a program may be improved if the number
of instructions in an inner loop is decreased, even if we increase the
amount of code outside that loop.
Three techniques are important for loop optimization:
1. Code motion, which moves code outside a loop;
2. Induction-variable elimination, which removes redundant induction variables from inner loops;
3. Reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.

Code Motion:
An important modification that decreases the amount of code in a loop is
code motion. This transformation takes an expression that yields the same
result independent of the number of times a loop is executed (a loop-
invariant computation) and places the expression before the loop. Note
that the notion “before the loop” assumes the existence of an entry for the
loop. For example, evaluation of limit-2 is a loop-invariant computation
in the following while-statement:


• while (i <= limit-2) /* statement does not change limit*/

• Code motion will result in the equivalent of

• t= limit-2;
• while (i<=t) /* statement does not change limit or t */

Induction Variables :

Loops are usually processed inside out. For example consider the loop
around B3. Note that the values of j and t4 remain in lock-step; every
time the value of j decreases by 1, that of t4 decreases by 4 because 4*j is
assigned to t4. Such identifiers are called induction variables.

When there are two or more induction variables in a loop, it may be possible
to get rid of all but one, by the process of induction-variable elimination. For
the inner loop around B3 in Fig.5.3 we cannot get rid of either j or t4
completely; t4 is used in B3 and j in B4.

However, we can illustrate reduction in strength and illustrate a part of the


process of induction-variable elimination. Eventually j will be eliminated
when the outer loop of B2- B5 is considered.

Example: The relationship t4 = 4*j surely holds after such an assignment to t4 in Fig. 5.3, and t4 is not changed elsewhere in the inner loop around B3. It follows that just after the statement j := j-1 the relationship t4 = 4*j + 4 must hold. We may therefore replace the assignment t4 := 4*j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter block B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to the block B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1 in Fig. 5.3.

The replacement of a multiplication by a subtraction will speed up the object


code if multiplication takes more time than addition or subtraction, as is the
case on many machines.

Reduction In Strength:

Reduction in strength replaces expensive operations by equivalent cheaper


ones on the target machine. Certain machine instructions are considerably
cheaper than others and can often be used as special cases of more expensive
operators. For example, x² is invariably cheaper to implement as x*x than as a
call to an exponentiation routine. Fixed-point multiplication or division by a
power of two is cheaper to implement as a shift. Floating-point division by a
constant can be implemented as multiplication by a constant, which may be
cheaper.
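The two replacements named above (x**2 into x*x, and multiplication by a power of two into a shift) can be sketched as a per-instruction rewrite over the illustrative tuple encoding used earlier; the encoding and function name are ours, not from the text.

```python
def reduce_strength(inst):
    """Replace an expensive operation in (dest, op, a1, a2) by an
    equivalent cheaper one (a sketch, not an exhaustive catalogue)."""
    dest, op, a1, a2 = inst
    if op == '**' and a2 == 2:
        return (dest, '*', a1, a1)                    # x**2  ->  x * x
    if op == '*' and isinstance(a2, int) and a2 > 0 and a2 & (a2 - 1) == 0:
        # multiplying by 2**k is a left shift by k on fixed-point values
        return (dest, '<<', a1, a2.bit_length() - 1)  # x * 2**k -> x << k
    return inst
```

The power-of-two test a2 & (a2 - 1) == 0 is the usual bit trick: it holds exactly when a2 has a single set bit.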

PEEPHOLE OPTIMIZATION
A statement-by-statement code-generations strategy often produces target code
that contains redundant instructions and suboptimal constructs. The quality of
such target code can be improved by applying “optimizing” transformations to
the target program.

A simple but effective technique for improving the target code is peephole optimization: a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.

The peephole is a small, moving window on the target program. The code
in the peephole need not be contiguous, although some implementations do
require this. It is characteristic of peephole optimization that each improvement
may spawn opportunities for additional improvements.

Characteristics of peephole optimizations:

Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable code elimination
Redundant Loads And Stores:
If we see the instructions sequence
(1) MOV R0,a
(2) MOV a,R0

we can delete instruction (2), because whenever (2) is executed, (1) will have just been executed and ensured that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before it, and so we could not remove (2).
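This store-then-load pattern can be sketched as a peephole pass over a list of instruction strings. The sketch below is illustrative and deliberately simplistic (labels are assumed to be separate lines ending in ':', and operands never contain commas).

```python
def remove_redundant_stores_loads(code):
    """Drop 'MOV a,R0' when it immediately follows 'MOV R0,a' and carries
    no label, since the value is already where it needs to be (a sketch)."""
    out = []
    for inst in code:
        if out and not inst.endswith(':'):
            prev = out[-1]
            if prev.startswith('MOV ') and inst.startswith('MOV '):
                p_dst, p_src = prev[4:].split(',')
                c_dst, c_src = inst[4:].split(',')
                if p_dst == c_src and p_src == c_dst:
                    continue          # value already in place: delete (2)
        out.append(inst)
    return out
```

A labelled instruction is never deleted, mirroring the caveat above about (2) having a label.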

Unreachable Code:
Another opportunity for peephole optimizations is the removal of
unreachable instructions. An unlabeled instruction immediately following
an unconditional jump may be removed. This operation can be repeated
to eliminate a sequence of instructions. For example, for debugging
purposes, a large program may have within it certain segments that are
executed only if a variable debug is 1. In C, the source code might look
like:
#define debug 0
….
If ( debug ) {
Print debugging information
}
In the intermediate representation the if-statement may be translated as:

if debug = 1 goto L1
goto L2
L1: print debugging information
L2: …                         (a)

One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:

if debug ≠ 1 goto L2
print debugging information
L2: …                         (b)

Since debug is set to 0 at the beginning of the program, constant propagation replaces debug by 0, so (b) becomes:

if 0 ≠ 1 goto L2
print debugging information
L2: …                         (c)

As the argument of the first statement of (c) evaluates to a constant true, it can be replaced by goto L2. Then all the statements that print debugging information are manifestly unreachable and can be eliminated one at a time.
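The rule "an unlabeled instruction immediately following an unconditional jump may be removed" can be sketched over a list of instruction strings; as before, this encoding (labels as lines ending in ':', unconditional jumps spelled goto) is an illustrative assumption.

```python
def remove_unreachable(code):
    """Delete unlabeled instructions that follow an unconditional jump.
    A label line ends with ':'; an unconditional jump starts with 'goto'."""
    out, skipping = [], False
    for inst in code:
        if inst.endswith(':'):       # a label makes code reachable again
            skipping = False
        if not skipping:
            out.append(inst)
            if inst.startswith('goto'):
                skipping = True      # everything until the next label is dead
    return out
```

Repeated application is unnecessary here because the pass already skips the whole run of dead instructions up to the next label.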

Flows-Of-Control Optimizations:
The unnecessary jumps can be eliminated in either the intermediate code
or the target code by the following types of peephole optimizations. We
can replace the jump sequence

goto L1
….

L1: goto L2                   (d)


by the sequence
goto L2
….
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the
statement L1:goto L2 provided it is preceded by an unconditional jump
.Similarly, the sequence
if a < b goto L1
….

L1: goto L2 (e)

can be replaced by
If a < b goto L2

….

L1: goto L2

Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence

goto L1
…
L1: if a < b goto L2
L3:                           (f)

may be replaced by

if a < b goto L2
goto L3
…
L3:

While the number of instructions in the two sequences is the same, we sometimes skip the unconditional jump in the second, but never in the first.
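The jump-over-jump case, goto L1 where L1 itself labels goto L2, can be sketched as a retargeting pass. The string encoding is the same illustrative one as before; a cycle guard stops loops of jumps from spinning forever.

```python
def collapse_jump_chains(code):
    """Retarget 'goto L1' to 'goto L2' when L1 labels only 'goto L2'
    (a sketch; conditional jumps are left alone for brevity)."""
    # map each label to the instruction that immediately follows it
    target = {}
    for i, inst in enumerate(code):
        if inst.endswith(':') and i + 1 < len(code):
            target[inst[:-1]] = code[i + 1]
    out = []
    for inst in code:
        seen = set()                          # guard against jump cycles
        while inst.startswith('goto ') and inst not in seen:
            seen.add(inst)
            nxt = target.get(inst[5:], '')
            if nxt.startswith('goto '):
                inst = nxt                    # jump over the middle jump
            else:
                break
        out.append(inst)
    return out
```

After retargeting, the statement L1: goto L2 may become unjumped-to and can then be removed by the unreachable-code rule above.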
Algebraic Simplification:

There is no end to the amount of algebraic simplification that can be


attempted through peephole optimization. Only a few algebraic identities
occur frequently enough that it is worth considering implementing them.
For example, statements such as

x := x+0 or x := x * 1

are often produced by straightforward intermediate code-generation


algorithms, and they can be eliminated easily through peephole optimization.

Reduction in Strength:

Reduction in strength replaces expensive operations by equivalent


cheaper ones on the target machine. Certain machine instructions are
considerably cheaper than others and can often be used as special cases of
more expensive operators.

For example, x² is invariably cheaper to implement as x*x than as a call


to an exponentiation routine. Fixed-point multiplication or division by a
power of two is cheaper to implement as a shift. Floating-point division
by a constant can be implemented as multiplication by a constant, which
may be cheaper.

x² → x*x

Use of Machine Idioms:

The target machine may have hardware instructions to implement certain


specific operations efficiently. For example, some machines have auto-
increment and auto-decrement addressing modes. These add or subtract
one from an operand before or after using its value. The use of these
modes greatly improves the quality of code when pushing or popping a
stack, as in parameter passing. These modes can also be used in code for
statements like i := i+1 (i := i+1 → i++ and i := i-1 → i--).

DAG-DIRECTED ACYCLIC GRAPH
THE DAG REPRESENTATION FOR BASIC BLOCKS

A DAG for a basic block is a directed acyclic graph with the following labels on
nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels to store
the computed values.
 DAGs are useful data structures for implementing transformations on basic
blocks.
 It gives a picture of how the value computed by a statement is used
in subsequent statements.
 It provides a good way of determining common sub - expressions.
Algorithm for construction of DAG
Input: A basic block

Output: A DAG for the basic block containing the following information:

1. A label for each node. For leaves, the label is an identifier. For interior nodes, an
operator symbol.
2. For each node a list of attached identifiers to hold the computed values.
Case (i): x := y OP z
Case (ii): x := OP y
Case (iii): x := y
Method:

Step 1: If node(y) is undefined, create node(y). For case (i), if node(z) is undefined, create node(z).

Step 2: For case (i), determine whether there is a node labeled OP whose left child is node(y) and right child is node(z) (this is the check for a common sub-expression); if not, create such a node. Let n be this node.
For case (ii), determine whether there is a node(OP) with one child node(y). If not, create such a node. Let n be this node.
For case (iii), let n be node(y).

Step 3: Delete x from the list of attached identifiers for the node it was previously attached to, and append x to the list of attached identifiers for the node n found in Step 2.
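The three cases above can be sketched as a small Python class. This is an illustrative, hash-keyed implementation of the algorithm (the class and method names are ours, not from the text): nodes are looked up by a key of (operator, child ids), so a recomputed expression finds the existing node instead of creating a new one.

```python
class DAG:
    """Sketch of DAG construction for one basic block."""
    def __init__(self):
        self.nodes = {}     # key -> node id (keys make common nodes shared)
        self.labels = {}    # node id -> list of attached identifiers
        self.current = {}   # identifier -> node id currently holding its value
        self.n = 0

    def node(self, key):
        if key not in self.nodes:
            self.nodes[key] = self.n
            self.labels[self.n] = []
            self.n += 1
        return self.nodes[key]

    def leaf(self, name):
        # Step 1: the node for name's current value, or a fresh leaf
        if name in self.current:
            return self.current[name]
        return self.node(('leaf', name))

    def assign(self, x, op, y, z=None):
        if op is None:                               # case (iii): x := y
            n = self.leaf(y)
        elif z is None:                              # case (ii):  x := op y
            n = self.node((op, self.leaf(y)))
        else:                                        # case (i):   x := y op z
            n = self.node((op, self.leaf(y), self.leaf(z)))
        # Step 3: move x's label from its old node to n
        if x in self.current and x in self.labels[self.current[x]]:
            self.labels[self.current[x]].remove(x)
        self.labels[n].append(x)
        self.current[x] = n
```

Feeding it t1 := 4*i and later t4 := 4*i attaches both identifiers to one shared interior node, which is exactly how the DAG exposes common sub-expressions.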

Example: Consider the block of three-address statements:
1. t1 := 4*i
2. t2 := a[t1]
3. t3 := 4*i
4. t4 := b[t3]
5. t5 := t2*t4
6. t6 := prod+t5
7. prod := t6
8. t7 := i+1
9. i := t7
10. if i<=20 goto (1)

Stages in DAG construction

Application of DAGs:

1. We can automatically detect common sub expressions.


2. We can determine which identifiers have their values used
in the block.
3. We can determine which statements compute values that
could be used outside the block.

OPTIMIZATION OF BASIC BLOCKS

There are two types of basic block optimizations. They are :

Structure-Preserving Transformations
Algebraic Transformations

Structure-Preserving Transformations:

The primary Structure-Preserving Transformation on basic blocks are:

Common sub-expression elimination


Dead code elimination
Renaming of temporary variables
Interchange of two independent adjacent statements.

Common sub-expression elimination:

Common sub expressions need not be computed over and over again.
Instead they can be computed once and kept in store from where it’s
referenced when encountered aga in – of course providing the variable
values in the expression still remain constant.

Example:
a := b+c
b := a-d
c := b+c
d := a-d
The 2nd and 4th statements compute the same expression, namely a-d, so the basic block can be transformed to
a := b+c
b := a-d
c := b+c
d := b

Dead code elimination
It’s possible that a large amount of dead (useless) code may exist in
the program. This might be especially caused when introducing
variables and procedures as part of construction or error-correction
of a program – once declared and defined, one forgets to remove
them in case they serve no purpose. Eliminating these will definitely
optimize the code.
Renaming of temporary variables:

A statement t:=b+c where t is a temporary name can be


changed to u:=b+c where u is another temporary name, and
change all uses of t to u.
In this we can transform a basic block to its equivalent block called
normal-form block.

Interchange of two independent adjacent statements:


Two statements
t1=b+c
t2=x+y
can be interchanged or reordered in their computation in the basic block when the
value of t1 does not affect the value of t2.
Algebraic Transformations:

Algebraic identities represent another important class of


optimizations on basic blocks. This includes simplifying
expressions or replacing expensive operation by cheaper ones
reduction in strength.
Another class of related optimizations is constant folding. Here
we evaluate constant expressions at compile time and replace the
constant expressions by their values. Thus the expression 2*3.14
would be replaced by 6.28.
The relational operators <=, >=, <, >, + and = sometimes
generate unexpected common sub expressions.
Associative laws may also be applied to expose common sub
expressions. For example, if the source code has the
assignments

a=b+c
e=c+d+b
the following intermediate code may be generated:
a=b+c
t=c+d
e=t+b
Example:
x=x+0 can be removed
x=y**2 can be replaced by the cheaper statement x=y*y

The compiler writer should examine the language carefully to determine what rearrangements of computations are permitted, since computer arithmetic does not always obey the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z as x*(y-z), but it may not evaluate a+(b-c) as (a+b)-c.

GLOBAL DATA FLOW ANALYSIS


In order to do code optimization and a good job of code generation , compiler
needs to collect information about the program as a whole and to distribute this
information to each block in the flow graph.

A compiler could take advantage of "reaching definitions", such as knowing where a variable like debug was last defined before reaching a given block, in order to perform transformations. Reaching definitions are just one example of the data-flow information that an optimizing compiler collects by a process known as data-flow analysis.

Data-flow information can be collected by setting up and solving systems of


equations of the form :
out [S] = gen [S] U ( in [S] – kill [S] )
This equation can be read as “ the information at the end of a statement is
either generated within the statement , or enters at the beginning and is not
killed as control flows through the statement.”
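The same equation, applied block by block with in[B] taken as the union of out[P] over predecessors P, can be solved by iterating to a fixed point. The sketch below is illustrative (block names and the set representation of definitions are our own choices):

```python
def reaching_definitions(blocks, succ, gen, kill):
    """Iterative solver for out[B] = gen[B] U (in[B] - kill[B]) on a flow
    graph.  'blocks' lists block names; succ/gen/kill map names to sets."""
    pred = {b: set() for b in blocks}
    for b in blocks:
        for s in succ.get(b, ()):
            pred[s].add(b)
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}   # start from gen alone
    changed = True
    while changed:                           # repeat until nothing moves
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in pred[b])) if pred[b] else set()
            new = gen[b] | (IN[b] - kill[b])
            if new != OUT[b]:
                OUT[b] = new
                changed = True
    return IN, OUT
```

Because each OUT set only ever grows and the universe of definitions is finite, the iteration is guaranteed to terminate.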

The details of how data-flow equations are set and solved depend on three
factors.

The notions of generating and killing depend on the desired


information, i.e., on the data flow analysis problem to be
solved. Moreover, for some problems, instead of proceeding
along with flow of control and defining out[s] in terms of in[s],
we need to proceed backwards and define in[s] in terms of
out[s].

Since data flows along control paths, data-flow analysis is


affected by the constructs in a program. In fact, when we write
out[s] we implicitly assume that there is unique end point
where control leaves the statement; in general, equations are
set up at the level of basic blocks rather than statements,
because blocks do have unique end points.

There are subtleties that go along with such statements as


procedure calls, assignments through pointer variables, and
even assignments to array variables.
Points and Paths:
Within a basic block, we talk of the point between two adjacent statements, as
well as the point before the first statement and after the last. Thus, block B1
has four points: one before any of the assignments and one after each of the
three assignments.
Now let us take a global view and consider all the points in all the blocks. A path from p1 to pn is a sequence of points p1, p2, …, pn such that for each i between 1 and n-1, either:

pi is the point immediately preceding a statement and pi+1 is the point immediately following that statement in the same block, or

pi is the end of some block and pi+1 is the beginning of a successor block.

Reaching definitions:

A definition of variable x is a statement that assigns, or may assign, a value


to x. The most common forms of definition are assignments to x and
statements that read a value from an i/o device and store it in x.

These statements certainly define a value for x, and they are referred to as unambiguous definitions of x. There are certain kinds of statements that may define a value for x; they are called ambiguous definitions. The most usual forms of ambiguous definitions of x are:

A call of a procedure with x as a parameter, or of a procedure that can access x because x is in the scope of the procedure.

An assignment through a pointer that could refer to x. For example, the assignment *q := y is a definition of x if it is possible that q points to x. In the absence of better information, we must assume that an assignment through a pointer is a definition of every variable.

We say a definition d reaches a point p if there is a path from the point immediately following d to p, such that d is not "killed" along that path. Thus a point can be reached by an unambiguous definition and by an ambiguous definition of the same variable appearing later along one path.

Data-flow analysis of structured programs:

Flow graphs for control flow constructs such as do-while statements


have a useful property: there is a single beginning point at which
control enters and a single end point
that control leaves from when execution of the statement is over.
We exploit this property when we talk of the definitions reaching
the beginning and the end of statements with the following syntax.
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id

Expressions in this language are similar to those in the intermediate code, but
the flow graphs for statements have restricted forms.

We define a portion of a flow graph called a region to be a
set of nodes N that includes a header, which dominates all
other nodes in the region. All edges between nodes in N
are in the region, except for some that enter the header.
The portion of flow graph corresponding to a statement S
is a region that obeys the further restriction that control
can flow to just one outside block when it leaves the
region.
We say that the beginning points of the dummy blocks at
the entry and exit of a statement’s region are the
beginning and end points, respectively, of the statement.
The equations are inductive, or syntax-directed, definition
of the sets in[S], out[S], gen[S], and kill[S] for all
statements S.
gen[S] is the set of definitions “generated” by S while
kill[S] is the set of definitions that never reach the
end of S.
i) gen [S] = { d }
kill [S] = Da – { d }
out [S] = gen [S] U ( in[S] – kill[S] )

Observe the rules for a single assignment of variable a.


Surely that assignment is a definition of a, say d. Thus

gen[S] = {d}

On the other hand, d "kills" all other definitions of a, so we write

kill[S] = Da – {d}

where Da is the set of all definitions in the program for variable a.

ii) gen[S] = gen[S2] U (gen[S1] – kill[S2])
kill[S] = kill[S2] U (kill[S1] – gen[S2])
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]

Under what circumstances is a definition d generated by S = S1; S2? First of all, if it is generated by S2, then it is surely generated by S. If d is generated by S1, it will reach the end of S provided it is not killed by S2. Thus, we write

gen[S] = gen[S2] U (gen[S1] – kill[S2])

Similar reasoning applies to the killing of a definition, so we have

kill[S] = kill[S2] U (kill[S1] – gen[S2])
EFFICIENT DATA FLOW ALGORITHM

To efficiently optimize the code compiler collects all the information about the
program and distribute this information to each block of the flow graph. This
process is known as data-flow graph analysis.
Certain optimizations can only be achieved by examining the entire program; they cannot be achieved by examining just a portion of it.
Use-definition (ud-) chaining is one particular problem of this kind: using the value of a variable, we try to find out which definition of that variable is applicable in a statement.
Based on the local information a compiler can perform some optimizations.
For example, consider the following code:
x = a + b;
x=6*3
In this code, the first assignment to x is useless: the value computed for x is
never used in the program.
At compile time the expression 6*3 will be computed, simplifying the second
assignment statement to x = 18;
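That compile-time simplification is constant folding. A minimal sketch (the Operand type and fold_mul below are illustrative assumptions, not a real compiler API): if both operands of a multiplication are known constants, the compiler replaces the operation with its computed value.

```c
#include <assert.h>

/* An operand is either a known constant or an unknown runtime value. */
typedef struct { int is_const; int value; } Operand;

/* Fold a multiplication at compile time when both operands are
   constants, e.g. 6 * 3 becomes the constant 18. */
Operand fold_mul(Operand a, Operand b) {
    Operand r = {0, 0};
    if (a.is_const && b.is_const) {
        r.is_const = 1;
        r.value = a.value * b.value;
    }
    return r;       /* non-constant inputs stay unfolded */
}
```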
Some optimizations need more global information. For example, consider the
following code:
a = 1;
b = 2;
c = 3;
if (....) x = a + 5;
else x = b + 4;
c = x + 1;
In this code, the initial assignment on the third line (c = 3;) is useless, and the
expression x + 1 can be simplified to 7, since x is 6 on both branches.
But it is less obvious how a compiler can discover these facts by looking at
only one or two consecutive statements. A more global analysis is required,
so that the compiler knows the following things at each point in the program:
Which variables are guaranteed to have constant values
Which variables will be used before being redefined
Data-flow analysis is used to discover these kinds of properties. It can be
performed on the program's control-flow graph (CFG).
The control-flow graph of a program is used to determine those parts of a
program to which a particular value assigned to a variable might propagate.
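This propagation over a CFG is computed as an iterative fixed-point calculation. Below is a small sketch; the four-block graph and the gen/kill sets used in the usage example are assumptions chosen for illustration:

```c
#include <assert.h>

#define NBLOCKS 4
typedef unsigned Defs;               /* one bit per definition */

/* Hypothetical CFG: B0 -> B1, B1 -> B2, B1 -> B3, B2 -> B1.
   pred[b] is a bitmask of the predecessors of block b. */
static const unsigned pred[NBLOCKS] = {
    0,                               /* B0: entry, no predecessors */
    (1u << 0) | (1u << 2),           /* B1: from B0 and B2         */
    1u << 1,                         /* B2: from B1                */
    1u << 1                          /* B3: from B1                */
};

/* Iterate in[B] = union of out[P] over predecessors P, and
   out[B] = gen[B] U (in[B] - kill[B]), until nothing changes. */
void reaching_defs(const Defs gen[], const Defs kill[],
                   Defs in[], Defs out[]) {
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = gen[b]; }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            Defs newin = 0;
            for (int p = 0; p < NBLOCKS; p++)
                if (pred[b] & (1u << p)) newin |= out[p];
            Defs newout = gen[b] | (newin & ~kill[b]);
            if (newin != in[b] || newout != out[b]) {
                in[b] = newin;
                out[b] = newout;
                changed = 1;
            }
        }
    }
}
```

The loop is guaranteed to terminate because the in/out sets only grow and there are finitely many definitions.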

9. ASSIGNMENTS

1. Write a quicksort program and convert it into three-address code.
Construct the basic blocks and the flow graph, and optimize the code. (K3, CO5)
2. Construct a DAG for the factorial program. Identify and construct the
basic blocks, then optimize them. (K3, CO5)
3. Explain in detail about peephole optimization. (K3, CO5)
4. Explain in detail about data-flow analysis. Draw the flow diagrams for
the for, while, and do-while loops. (K4, CO5)
11. PART A : Q & A : UNIT – V
1. Define basic block and flow graph. (CO5, K1)
A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end without halt or
possibility of branching except at the end.
A flow graph is defined as the addition of flow-of-control information to
the set of basic blocks making up a program, by constructing a directed
graph.

2. Give the applications of DAG. (CO5, K1)
Automatically detect common subexpressions.
Determine which identifiers have their values used in the block.
Determine which statements compute values that could be used outside
the block.

3. Give the important classes of local transformations on basic blocks. (CO5, K1)
Structure-preserving transformations
Algebraic transformations

4. What are the structure-preserving transformations on basic blocks? (CO5, K1)
Common subexpression elimination
Dead-code elimination
Renaming of temporary variables
Interchange of two independent adjacent statements

5. Write the characteristics of peephole optimization. (CO5, K1)
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms

6. Define dead-code elimination with an example. (CO5, K1)
If the statement x = y + z appears in a basic block, where x is dead,
that is, never subsequently used, then this statement may be safely
removed without changing the value of the basic block.
7. Define use of machine idioms. (CO5, K1)
The target machine may have hardware instructions to implement certain
specific operations efficiently. Detecting situations that permit the use
of these instructions can reduce execution time significantly.

8. Define code optimization and optimizing compiler. (CO5, K1)
The term code optimization refers to techniques a compiler can employ in
an attempt to produce a better object-language program than the most
obvious one for a given source program.
Compilers that apply code-improving transformations are called
optimizing compilers.

9. Define common subexpression elimination. (CO5, K2)
It is the process of eliminating statements that recompute the same
expression; the basic block may then be transformed into an equivalent
block.

10. Define reduction in strength. (CO5, K1)
Reduction in strength replaces expensive operations with equivalent
cheaper ones on the target machine. Certain machine instructions are
cheaper than others and can often be used as special cases of more
expensive operators.

11. What is meant by copy propagation? (CO5, K2)
It is the process of replacing occurrences of targets of direct
assignments with their values.

12. What is meant by live-variable analysis? (CO5, K2)
It determines, for a variable x at a point p, whether the value of x at
p could be used along some path in the flow graph starting at p.
13. List the functions involved in semantics-preserving transformations. (CO5, K1)
Common-subexpression elimination, copy propagation, dead-code
elimination, and constant folding.

14. Define code motion. (CO5, K2)
The process of moving code either before the loop or after the loop is
called code motion. For example,
while (i <= limit-2)
becomes, after code motion,
t = limit-2;
while (i <= t)

15. What is the purpose of next-use information? (CO5, K2)
Knowing when the value of a variable will be used next is essential for
generating good code. It is computed by making a backward pass over each
basic block.

16. List out the criteria for code-improving transformations. (CO5, K2)
1. The transformation should speed up the program by a measurable amount.
2. The transformation must be worth the effort.
3. The transformation must preserve the meaning of the program.

17. Write the formula used for the dataflow equation and explain its terms. (CO5, K2)
out[S] = gen[S] U (in[S] – kill[S]), which is interpreted as "the
information at the end of the statement is either generated within the
statement, or enters at the beginning and is not killed as control flows
through the statement."
18. Write the labels on nodes in a DAG. (CO5, K1)
A DAG for a basic block is a directed acyclic graph with the following
labels on nodes:
Leaves are labeled by unique identifiers, either variable names or
constants.
Interior nodes are labeled by an operator symbol.
Nodes are also optionally given a sequence of identifiers as labels.

19. Give an example of eliminating unreachable code. (CO5, K1)
An unlabeled instruction immediately following an unconditional jump may
be removed. If a sequence of code will never be executed, it is
unreachable.

20. Write the steps to partition a sequence of three-address statements into basic blocks. (CO5, K1)
1. First determine the set of leaders, the first statements of basic
blocks, using the following rules:
The first statement is a leader.
Any statement that is the target of a conditional or unconditional goto
is a leader.
Any statement that immediately follows a goto or conditional goto
statement is a leader.
2. For each leader, its basic block consists of the leader and all
statements up to, but not including, the next leader or the end of
the program.
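The leader rules above can be sketched in code; the array encoding of the three-address program (is_jump and target) is an assumption made purely for illustration:

```c
#include <assert.h>

/* For each statement i: is_jump[i] != 0 if it is a conditional or
   unconditional goto, and target[i] is the statement it jumps to. */
void find_leaders(const int is_jump[], const int target[],
                  int n, int leader[]) {
    for (int i = 0; i < n; i++) leader[i] = 0;
    if (n > 0) leader[0] = 1;                 /* rule 1: first statement   */
    for (int i = 0; i < n; i++) {
        if (is_jump[i]) {
            leader[target[i]] = 1;            /* rule 2: the jump's target */
            if (i + 1 < n) leader[i + 1] = 1; /* rule 3: after the jump    */
        }
    }
}
```

Each basic block then runs from a leader up to, but not including, the next leader.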

12. PART B QUESTIONS : UNIT – V

(CO5, K4)

1. Explain in detail about the principal sources of optimization and
solve for quicksort.
2. Explain global data flow analysis with necessary equations.
3. (i) Write an algorithm for constructing the natural loop of a
back edge.
(ii) Explain peephole optimization.
4. For the given C program draw the syntax tree, Three
Address code and construct the DAG
while (i <= 10)
{
prod = prod + a[i]*b[i];
i = i + 1;
}
5. Define DAG. Write the algorithm for DAG and its application. Mention its
advantages and disadvantages.

PART C QUESTIONS
(CO5, K6)
1. For the given quicksort code, write the Three Address Code.

2. For the given quicksort code, apply the principal sources of optimization.

void quicksort(m,n)
int m,n;
{
    int i,j,v,x;
    if (n <= m) return;
    i = m-1; j = n; v = a[n];
    while(1) {
        do i = i+1; while (a[i] < v);
        do j = j-1; while (a[j] > v);
        if (i >= j) break;
        x = a[i];
        a[i] = a[j];
        a[j] = x;
    }
    x = a[i];
    a[i] = a[n];
    a[n] = x;
    quicksort(m,j);
    quicksort(i+1,n);
}

13. SUPPORTIVE ONLINE CERTIFICATION COURSES

UNITS : I TO V

UDEMY
The Ultimate : Compiler Design - Module - 1
Compiler Design

NPTEL
Compiler Design
https://onlinecourses.nptel.ac.in/noc21_cs07/preview

14. REAL TIME APPLICATIONS : UNIT – V

1. Design of an optimized compiler for different softwares available


2. Compare and contrast intraprocedural and interprocedural optimization.

15. CONTENTS BEYOND SYLLABUS : UNIT – V

Optimization levels of SPARC compiler

The compiler supports four levels of optimization:

 O1: Limited optimizations, performed only in the optimizing
components of the code generator.
 O2: Optimizes expressions not involving global, aliased
local, and volatile variables; automatic inlining,
software pipelining, loop unrolling and instruction
scheduling are skipped.
 O3: Optimizes expressions that involve global variables,
but makes worst-case assumptions on pointer aliases.
 O4: Makes worst-case assumptions on pointer aliases
only when necessary; it depends on the front ends to
provide alias information. The automatic inliner and
aliaser are used at this level of optimization.

16. ASSESSMENT SCHEDULE

• Tentative schedule for the assessments during the 2020-2021
even semester:

1. Unit Test 1 : 29.1.2024 – 3.2.2024 : Unit 1
2. IAT 1 : 10.2.2024 – 16.2.2024 : Units 1 & 2
3. Unit Test 2 : 11.3.2024 – 16.3.2024 : Unit 3
4. IAT 2 : 1.4.2024 – 6.4.2024 : Units 3 & 4
5. Revision 1 : 13.5.2024 – 16.5.2024 : Units 5, 1 & 2
6. Revision 2 : 17.4.2024 – 19.4.2024 : Units 3 & 4
7. Model : 20.4.2024 – 30.4.2024 : All 5 units

17. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS

• TEXT BOOKS:

• Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman,
"Compilers: Principles, Techniques and Tools", Second Edition,
Pearson Education, 2009.

• REFERENCE BOOKS:

• Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern
Architectures: A Dependence-based Approach", Morgan
Kaufmann Publishers, 2002.
• Steven S. Muchnick, "Advanced Compiler Design and Implementation",
Morgan Kaufmann Publishers - Elsevier Science, India, Indian Reprint
2003.
• Keith D. Cooper and Linda Torczon, "Engineering a Compiler",
Morgan Kaufmann Publishers, Elsevier Science, 2004.
• V. Raghavan, "Principles of Compiler Design", Tata McGraw Hill Education
Publishers, 2010.
• Allen I. Holub, "Compiler Design in C", Prentice-Hall Software Series, 1993.

18. MINI PROJECT SUGGESTION

• Objective:
This module facilitates the hands-on skills of the students (from the
practical courses more effectively); they can try the following mini
projects for a deeper understanding of Compiler Design.

• Planning:
• This method is mostly used to improve the ability of students in the
application domain and also to reinforce knowledge imparted during the
lecture.
• Being a technical institute, this method is extensively used to provide
empirical evidence of the theory learnt.
• Students are asked to prepare mini projects involving application of the
concepts, principles or laws learnt.
• The faculty guides the students at various stages of developing the project
and gives timely inputs for the development of the model.

Project Idea :
Design of a highly efficient optimizing compiler.
References:
"The Design of an Optimizing Compiler" by William A. Wulf, Richard K. Johnson,
Charles B. Weinstock, Steven O. Hobbs, Computer Science Department,
Carnegie-Mellon University, Pittsburgh, Pa., December 1973.

Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of RMK Group of Educational
Institutions. If you have received this document through email in error, please notify the system manager. This
document contains proprietary information and is intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate, distribute or copy through e-mail. Please notify
the sender immediately by e-mail if you have received this document by mistake and delete this document from
your system. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly prohibited.

