CD Notes Unit 5
Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
21CS601
Compiler Design
Department: CSE
Batch/Year: 2021-25 / III
Created by:
Dr. P. EZHUMALAI, Prof & Head/RMDEC
Dr. A. K. JAITHUNBI, Associate Professor/RMDEC
V.SHARMILA, Assistant Professor/RMDEC
Date: 26.03.2024
1. CONTENTS
S. No | Contents
1 | Course Objectives
2 | Pre Requisites
3 | Syllabus
4 | Course Outcomes
6 | Lecture Plan
8 | Lecture Notes
9 | Assignments
11 | Part B Questions
15 | Assessment Schedule
2. COURSE OBJECTIVES
3. PRE REQUISITES
• Pre-requisite Chart
21MA302 Discrete Mathematics
21CS201 Data Structures
21CS02 Python Programming (Lab Integrated)
21GE101 - Problem solving and C Programming
4. SYLLABUS
21CS601 COMPILER DESIGN (Lab Integrated) L T P C: 3 0 2 4
OBJECTIVES
LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C. (Ex.
identifiers, constants, comments, operators etc.). Create a symbol
table, while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs and new lines, comments etc.
3. Implement a Lexical Analyzer using Lex Tool.
4. Design Predictive Parser for the given language.
5. Implement an Arithmetic Calculator using LEX and YACC.
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (Constant folding,
Strength reduction and Algebraic transformation).
8. Implement the back end of the compiler for which the three address code
is given as input and the 8086 assembly language code is produced as
output.
5. COURSE OUTCOME
CO2 (K3): Construct the parse tree and check the syntax of the given source program.
• HKL = Highest Knowledge Level
6. CO - PO / PSO MAPPING
CO HKL PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
CO1 K2 3 2 1 - - - - 1 1 1 - 1 2 - -
CO2 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
CO3 K4 3 2 1 - - - - 1 1 1 - 1 2 - -
CO4 K4 3 2 1 - - - - 1 1 1 - 1 2 - -
CO5 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
7. LECTURE PLAN
UNIT – V CODE OPTIMIZATION
S. No | Proposed Lecture Period | Topic | Actual Lecture Period | Pertaining CO(s) | Highest Cognitive Level | Mode of Delivery | Delivery Resources | After successful completion of the course, the students should be able to (Outcomes) | Remarks
1 | | Principal Sources of Optimization | | CO5 | K2 | MD1 | T1 | Optimize the given code using principal sources of optimization |
2 | | Optimization problem | | CO5 | K2 | MD1, MD5 | T1 | Optimize the given step using principal sources of optimization |
3 | | Peep-hole optimization | | CO5 | K3 | MD1 | T1 | Optimize the code within the basic blocks |
4 | | Register allocation and assignment | | CO5 | K2 | MD1, MD2 | T1 | Allocate registers to the optimized code |
5 | | DAG | | CO5 | K2 | MD1, MD2 | T1 | Identify the common subexpression and construct DAG |
6 | | Basic blocks and flow graph | | CO5 | K2 | MD1 | T1 | Group the dependent contiguous code and maintain the control flow and data flow |
7 | | | | CO5 | K2 | MD1 | T1 | |
UNIT – V
https://create.kahoot.it/share/cs8602-unit-1/2e5c742f-541a-4bd0-84b2-9e2ca3b47170
Join at www.kahoot.it or with the Kahoot! app and use the game pin to play the Quiz.
9. LECTURE NOTES : UNIT – V
CODE OPTIMIZATION
The other transformations come up primarily when global optimizations are performed.
Frequently, a program will include several calculations of the offset in an array. Some of
the duplicate calculations cannot be avoided by the programmer because they lie below the
level of detail accessible within the source language.
Common Subexpression Elimination:
An occurrence of an expression E is called a common sub-expression if E was
previously computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
For example
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common subexpression elimination as
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4*i is eliminated because its value is already held in t1 and the
value of i has not changed between the definition of t1 and the use of t4.
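The elimination above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: statements are encoded as (dest, op, arg1, arg2) tuples, each temporary is assigned once, only plain arithmetic is treated as reusable, and the last statement is split into an index and an add (the name t7 is hypothetical, introduced only to keep every statement in three-address form).

```python
PURE_OPS = {"+", "-", "*", "/"}  # assumption: only plain arithmetic is reusable

def eliminate_common_subexpressions(block):
    """block is a list of (dest, op, arg1, arg2) tuples in three-address form."""
    available = {}  # (op, arg1, arg2) -> name that already holds the value
    renamed = {}    # eliminated temporary -> name that replaces it
    result = []
    for dest, op, arg1, arg2 in block:
        # Operands that referred to an eliminated temporary use its replacement.
        arg1 = renamed.get(arg1, arg1)
        arg2 = renamed.get(arg2, arg2)
        key = (op, arg1, arg2)
        if op in PURE_OPS and key in available:
            renamed[dest] = available[key]  # reuse the earlier result
            continue
        # dest gets a new value: forget expressions that read dest or lived in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        renamed.pop(dest, None)
        if op in PURE_OPS and dest not in (arg1, arg2):
            available[key] = dest
        result.append((dest, op, arg1, arg2))
    return result

block = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "j"),
    ("t4", "*", "4", "i"),      # recomputes 4*i
    ("t5", "copy", "n", None),
    ("t6", "[]", "b", "t4"),
    ("t7", "+", "t6", "t5"),    # hypothetical name for the final add
]
optimized = eliminate_common_subexpressions(block)
```

After the call, the statement defining t4 is gone and the array access reads b[t1], matching the optimized code listed above.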
Dead-Code Elimination:
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is dead or
useless code: statements that compute values that never get used. While
the programmer is unlikely to introduce any dead code intentionally, it
may appear as the result of previous transformations.
Example:
i=0;
if(i==1)
{
a=b+5;
}
Here, the body of the 'if' statement is dead code because the condition can
never be satisfied.
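The liveness definition above suggests a simple backward pass over a basic block: a statement is kept only if the variable it defines is live after it. The sketch below assumes statements are (dest, operands) pairs without side effects and that the set of variables live on exit from the block is supplied by the caller; both are simplifications made for illustration.

```python
def eliminate_dead_code(block, live_out):
    """block: list of (dest, operands); live_out: variables live after the block."""
    live = set(live_out)
    kept = []
    for dest, operands in reversed(block):
        if dest in live:
            kept.append((dest, operands))
            live.discard(dest)      # dest is redefined here, so it is dead above
            live.update(operands)   # its operands are needed above this point
        # otherwise the value is never used and the statement is dropped
    kept.reverse()
    return kept

block = [
    ("a", ("b", "c")),   # a = b + c
    ("d", ("a",)),       # d = a * 2
    ("e", ("b", "c")),   # e = b - c, never used afterwards
]
cleaned = eliminate_dead_code(block, live_out={"d"})
```

Here the assignment to e is removed because e is not live on exit and nothing later in the block reads it.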
Constant folding:
Deducing at compile time that the value of an expression is a
constant and using the constant instead is known as constant folding. A
related transformation, copy propagation, has the advantage that it often
turns the copy statement into dead code.
For example,
a=3.14157/2 can be replaced by
a=1.570785, thereby eliminating a division operation.
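A minimal sketch of constant folding over expression trees is shown below. The tuple encoding (op, left, right) and the decision to leave division by zero unfolded are assumptions made for this illustration, not part of the notes.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def fold(expr):
    """expr is a number, a variable name (str), or a tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    both_constant = (isinstance(left, (int, float))
                     and isinstance(right, (int, float)))
    if both_constant and not (op == "/" and right == 0):
        return OPS[op](left, right)   # evaluate now, at "compile time"
    return (op, left, right)          # something is only known at run time

half_pi = fold(("/", 3.14157, 2))
partly_folded = fold(("+", "x", ("*", 6, 3)))
```

The first call folds the whole expression to a single constant; the second folds only the constant subtree and keeps the addition with the variable x.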
Loop Optimizations:
Programs tend to spend the bulk of their time in loops, especially inner
loops. The running time of a program may be improved if the number
of instructions in an inner loop is decreased, even if we increase the
amount of code outside that loop.
Three techniques are important for loop optimization:
1. Code motion, which moves code outside a loop;
2. Induction-variable elimination, which we apply to remove redundant
induction variables from an inner loop;
3. Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.
Code Motion:
An important modification that decreases the amount of code in a loop is
code motion. This transformation takes an expression that yields the same
result independent of the number of times a loop is executed (a loop-invariant
computation) and places the expression before the loop. Note
that the notion “before the loop” assumes the existence of an entry for the
loop. For example, evaluation of limit-2 is a loop-invariant computation
in the following while-statement:
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent of
t = limit-2;
while (i <= t) /* statement does not change limit or t */
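The effect of code motion on the while-statement above can be made visible by counting how often limit-2 is evaluated. The two functions below are a hand-written before/after pair, not compiler output; the loop body (i += 1) and the counter are assumptions added so the example runs.

```python
def loop_without_motion(limit):
    i, subtractions = 0, 0
    while True:
        subtractions += 1            # limit - 2 is recomputed for every test
        if not (i <= limit - 2):
            break
        i += 1                       # loop body does not change limit
    return i, subtractions

def loop_with_motion(limit):
    i, subtractions = 0, 0
    t = limit - 2                    # loop-invariant computation hoisted
    subtractions += 1
    while i <= t:
        i += 1                       # loop body does not change limit or t
    return i, subtractions

before = loop_without_motion(10)
after = loop_with_motion(10)
```

Both versions leave i with the same final value, but the hoisted version performs the subtraction once instead of once per loop test.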
Induction Variables :
Loops are usually processed inside out. For example consider the loop
around B3. Note that the values of j and t4 remain in lock-step; every
time the value of j decreases by 1, that of t4 decreases by 4 because 4*j is
assigned to t4. Such identifiers are called induction variables.
When there are two or more induction variables in a loop, it may be possible
to get rid of all but one, by the process of induction-variable elimination. For
the inner loop around B3 in Fig.5.3 we cannot get rid of either j or t4
completely; t4 is used in B3 and j in B4.
Reduction In Strength:
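The lock-step relationship between j and t4 is what makes reduction in strength possible: since t4 drops by 4 whenever j drops by 1, the multiplication 4*j inside the loop can be replaced by a subtraction. The sketch below is an assumed stand-alone loop (the bounds and the list of collected offsets are illustrative, not taken from the flow graph the notes refer to).

```python
def offsets_with_multiply(j_start, j_end):
    offsets = []
    j = j_start
    while j > j_end:
        j = j - 1
        t4 = 4 * j          # multiplication on every iteration
        offsets.append(t4)
    return offsets

def offsets_with_subtract(j_start, j_end):
    offsets = []
    j = j_start
    t4 = 4 * j              # computed once, before the loop
    while j > j_end:
        j = j - 1
        t4 = t4 - 4         # cheaper update that keeps t4 equal to 4*j
        offsets.append(t4)
    return offsets

same = offsets_with_multiply(10, 0) == offsets_with_subtract(10, 0)
```

Both loops produce the same sequence of offsets; only the cost of each iteration changes.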
PEEPHOLE OPTIMIZATION
A statement-by-statement code-generation strategy often produces target code
that contains redundant instructions and suboptimal constructs. The quality of
such target code can be improved by applying “optimizing” transformations to
the target program.
The peephole is a small, moving window on the target program. The code
in the peephole need not be contiguous, although some implementations do
require this. It is characteristic of peephole optimization that each improvement
may spawn opportunities for additional improvements.
Unreachable Code:
Another opportunity for peephole optimization is the removal of
unreachable instructions. An unlabeled instruction immediately following
an unconditional jump may be removed. This operation can be repeated
to eliminate a sequence of instructions. For example, for debugging
purposes, a large program may have within it certain segments that are
executed only if a variable debug is 1. In C, the source code might look
like:
#define debug 0
….
if ( debug ) {
Print debugging information
}
In the intermediate representation the if-statement may be translated as:
if debug ≠ 1 goto L2
Print debugging information
L2: …………………………… (b)
Since debug is defined as 0, constant propagation replaces (b) by:
if 0 ≠ 1 goto L2
Print debugging information
L2: ………………………… (c)
As the argument of the first statement of (c) evaluates to a constant true, it can be
replaced by goto L2. Then all the statements that print debugging aids are manifestly
unreachable and can be eliminated one at a time.
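The rule "an unlabeled instruction immediately following an unconditional jump may be removed" can be sketched as a single pass over the instruction list. The (label, text) encoding and the assumption that every labeled instruction is still the target of some jump are simplifications made for this example.

```python
def remove_unreachable(instructions):
    """instructions: list of (label_or_None, text) pairs."""
    result = []
    after_jump = False
    for label, text in instructions:
        if after_jump and label is None:
            continue    # no label and no fall-through: nothing can reach it
        result.append((label, text))
        after_jump = text.startswith("goto ")
    return result

code = [
    (None, "goto L2"),
    (None, "print debugging information"),
    (None, "print more debugging information"),
    ("L2", "x = 1"),
]
cleaned = remove_unreachable(code)
```

Because the flag stays set while instructions are being skipped, a whole run of unlabeled instructions after the jump is removed, which mirrors eliminating them one at a time.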
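Flow-Of-Control Optimizations:
The unnecessary jumps can be eliminated in either the intermediate code
or the target code by the following types of peephole optimizations. We
can replace the jump sequence
goto L1
….
L1: goto L2
by the sequence
goto L2
….
L1: goto L2
If there are now no jumps to L1, the statement L1: goto L2 may be eliminated,
provided it is preceded by an unconditional jump. Similarly, the sequence
if a < b goto L1
….
L1: goto L2
can be replaced by
if a < b goto L2
….
L1: goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an
unconditional goto. Then the sequence
goto L1
….
L1: if a < b goto L2
L3:
may be replaced by
if a < b goto L2
goto L3
…….
L3: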
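Replacing a jump to a jump by a jump to the final target can be sketched as follows. The (label, text) encoding and the textual "goto" matching are assumptions for illustration; a real code generator would work on structured instructions instead of strings.

```python
def thread_jumps(instructions):
    """instructions: list of (label_or_None, text); retargets jumps to jumps."""
    # Labels whose instruction is itself an unconditional jump.
    forward = {label: text.split()[1]
               for label, text in instructions
               if label is not None and text.startswith("goto ")}

    def final_target(target):
        seen = set()
        while target in forward and target not in seen:
            seen.add(target)           # guard against jump cycles
            target = forward[target]
        return target

    result = []
    for label, text in instructions:
        words = text.split()
        if len(words) >= 2 and words[-2] == "goto":
            words[-1] = final_target(words[-1])
            text = " ".join(words)
        result.append((label, text))
    return result

code = [
    (None, "if a < b goto L1"),
    (None, "x = 1"),
    ("L1", "goto L2"),
    ("L2", "y = 2"),
]
threaded = thread_jumps(code)
```

The conditional jump now goes straight to L2. The instruction at L1 is kept; whether it can be deleted depends on whether any other jump still targets L1.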
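Algebraic Simplification:
Statements such as x := x+0 or x := x * 1 do not change the value of x and can be eliminated.
Reduction in Strength:
Expensive operations are replaced by cheaper equivalents, for example x² → x*x.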
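Both peephole rules above fit in one small rewriting function. The statement encoding (dest, op, left, right) is an assumption for this sketch, and only the identities named in the notes are handled.

```python
def simplify(dest, op, left, right):
    """Return None if the statement can be dropped, else a (possibly cheaper) statement."""
    if op == "+" and left == dest and right == 0:
        return None                        # x := x + 0 does nothing
    if op == "*" and left == dest and right == 1:
        return None                        # x := x * 1 does nothing
    if op == "**" and right == 2:
        return (dest, "*", left, left)     # squaring becomes one multiplication
    return (dest, op, left, right)

dropped = simplify("x", "+", "x", 0)
cheaper = simplify("x", "**", "y", 2)
```

A statement such as x := y + 0 is deliberately left alone here, since it still copies a value into x and would need a separate copy rule.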
DAG - DIRECTED ACYCLIC GRAPH
THE DAG REPRESENTATION FOR BASIC BLOCKS
A DAG for a basic block is a directed acyclic graph with the following labels on
nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels to store
the computed values.
DAGs are useful data structures for implementing transformations on basic
blocks.
A DAG gives a picture of how the value computed by a statement is used
in subsequent statements.
It provides a good way of determining common subexpressions.
Algorithm for construction of DAG
Input: A basic block
Output: A DAG for the basic block containing the following information:
1. A label for each node. For leaves, the label is an identifier. For interior nodes, an
operator symbol.
2. For each node, a list of attached identifiers to hold the computed values.
Case 1 x := y OP z
Case 2 x := OP y
Case 3 x := y
Method:
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached
identifiers for the node n found
Example: Consider the block of three-address statements:
1. t1 := 4*i
2. t2 := a[t1]
3. t3 := 4*i
4. t4 := b[t3]
5. t5 := t2*t4
6. t6 := prod+t5
7. prod := t6
8. t7 := i+1
9. i := t7
10. if i<=20 goto (1)
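A possible way to build the DAG for statements 1 to 9 of this block is sketched below. The class name, its methods, and the "[]" operator used for array indexing are assumptions for illustration; the sketch reuses an existing node whenever the same operator is applied to the same children, which is how common subexpressions show up as shared nodes. The conditional jump in statement 10 is left out.

```python
class BlockDAG:
    def __init__(self):
        self.nodes = []     # node id -> (label, left child id, right child id)
        self.index = {}     # (label, left, right) -> node id
        self.current = {}   # identifier -> node currently holding its value

    def _node(self, label, left=None, right=None):
        key = (label, left, right)
        if key not in self.index:          # create the node only if it is new
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def _operand(self, name):
        if name not in self.current:       # first use: make a leaf
            self.current[name] = self._node(name)
        return self.current[name]

    def assign(self, x, op, y, z=None):    # x := y OP z  or  x := OP y
        left = self._operand(y)
        right = self._operand(z) if z is not None else None
        self.current[x] = self._node(op, left, right)

    def copy(self, x, y):                  # x := y
        self.current[x] = self._operand(y)

dag = BlockDAG()
dag.assign("t1", "*", "4", "i")
dag.assign("t2", "[]", "a", "t1")
dag.assign("t3", "*", "4", "i")
dag.assign("t4", "[]", "b", "t3")
dag.assign("t5", "*", "t2", "t4")
dag.assign("t6", "+", "prod", "t5")
dag.copy("prod", "t6")
dag.assign("t7", "+", "i", "1")
dag.copy("i", "t7")
```

In the resulting structure t1 and t3 are attached to the same node, so 4*i is represented once, and prod shares the node of t6.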
Application of DAGs:
OPTIMIZATION OF BASIC BLOCKS
Two important classes of local transformations on basic blocks are:
• Structure-Preserving Transformations
• Algebraic Transformations
Structure-Preserving Transformations:
Common subexpressions need not be computed over and over again.
Instead, they can be computed once and stored, and the stored value
referenced when the expression is encountered again, provided the values
of the variables in the expression have not changed in between.
Example:
a := b+c
b := a-d
c := b+c
d := a-d
The 2nd and 4th statements compute the same expression, a-d, so the 4th
statement can reuse b. The 1st and 3rd statements both contain b+c, but b is
redefined by the 2nd statement, so they do not compute the same value and the
3rd statement must be kept. The basic block can be transformed to
a := b+c
b := a-d
c := b+c
d := b
Dead code elimination
It’s possible that a large amount of dead (useless) code may exist in
the program. This might be especially caused when introducing
variables and procedures as part of construction or error-correction
of a program – once declared and defined, one forgets to remove
them in case they serve no purpose. Eliminating these will definitely
optimize the code.
Renaming of temporary variables:
a=b+c
e=c+d+b
The following intermediate code may be generated
a=b+c
t=c+d
e=t+b
Example
x=x+0 can be removed
x=y**2 can be replaced by a cheaper statement x=y*y
The compiler writer should examine the language carefully to determine what
rearrangements of computations are permitted, since computer arithmetic does
not always obey the algebraic identities of mathematics. Thus, a compiler may
evaluate x*y-x*z as x*(y-z) but it may not evaluate a+(b-c) as (a+b)-c.
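The warning that computer arithmetic does not always obey algebraic identities can be checked directly with floating-point numbers. The sketch below uses Python floats (IEEE 754 doubles) and deliberately large values chosen for illustration so that one grouping overflows while the other does not.

```python
import math

def grouped_right(a, b, c):
    return a + (b - c)      # the subtraction happens first

def grouped_left(a, b, c):
    return (a + b) - c      # the addition happens first

big = 1e308                 # close to the largest representable double
right_result = grouped_right(big, big, big)   # b - c is 0.0, so the sum stays finite
left_result = grouped_left(big, big, big)     # a + b overflows to infinity
```

The two groupings are equal over the real numbers but give different machine results here, which is why a compiler may not reassociate such expressions freely.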
GLOBAL DATA FLOW ANALYSIS
The details of how data-flow equations are set and solved depend on three
factors.
Reaching definitions:
These statements certainly define a value for x, and they are referred to as
unambiguous definitions of x. There are certain kinds of statements that
may define a value for x; they are called ambiguous definitions. The most
usual forms of ambiguous definitions of x are:
Data-flow analysis of structured programs:
Expressions in this language are similar to those in the intermediate code, but
the flow graphs for statements have restricted forms.
We define a portion of a flow graph called a region to be a
set of nodes N that includes a header, which dominates all
other nodes in the region. All edges between nodes in N
are in the region, except for some that enter the header.
The portion of flow graph corresponding to a statement S
is a region that obeys the further restriction that control
can flow to just one outside block when it leaves the
region.
We say that the beginning points of the dummy blocks at
the entry and exit of a statement’s region are the
beginning and end points, respectively, of the statement.
The equations are an inductive, or syntax-directed,
definition of the sets in[S], out[S], gen[S], and kill[S]
for all statements S.
gen[S] is the set of definitions “generated” by S while
kill[S] is the set of definitions that never reach the
end of S.
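i) For a single assignment S that is a definition d of a variable a:
gen[S] = { d }
kill[S] = Da – { d }
out[S] = gen[S] U ( in[S] – kill[S] )
The assignment is a definition of a, say d, so gen[S] = { d }. On the other
hand, d kills all other definitions of a, so we write kill[S] = Da – { d },
where Da is the set of all definitions of a in the program.
ii) For a sequence S = S1; S2:
gen[S] = gen[S2] U (gen[S1] – kill[S2])
kill[S] = kill[S2] U (kill[S1] – gen[S2])
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]
Under what circumstances is definition d generated by
S = S1; S2? First of all, if it is generated by S2, then it is
surely generated by S. If d is generated by S1, it will
reach the end of S provided it is not killed by S2. Thus,
we write
gen[S] = gen[S2] U (gen[S1] – kill[S2])
Similar reasoning applies to the killing of a definition, so we have
kill[S] = kill[S2] U (kill[S1] – gen[S2])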
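The gen and kill rules for a single assignment and for a sequence translate almost directly into set operations. In the sketch below the function names, the dictionary layout, and the definition labels d1, d2, d3 are hypothetical, chosen only to exercise the equations.

```python
def assignment(d, defs_of_variable):
    """Single assignment d; defs_of_variable plays the role of Da."""
    return {"gen": {d}, "kill": set(defs_of_variable) - {d}}

def sequence(s1, s2):
    """S = S1; S2, following the gen/kill equations for a sequence."""
    return {
        "gen": s2["gen"] | (s1["gen"] - s2["kill"]),
        "kill": s2["kill"] | (s1["kill"] - s2["gen"]),
    }

def out_set(s, in_set):
    """out[S] = gen[S] U (in[S] - kill[S])"""
    return s["gen"] | (set(in_set) - s["kill"])

# Assumed program: d1: a := ..., d2: b := ..., d3: a := ...
defs_a, defs_b = {"d1", "d3"}, {"d2"}
s1 = assignment("d1", defs_a)
s2 = assignment("d2", defs_b)
s3 = assignment("d3", defs_a)
whole = sequence(sequence(s1, s2), s3)
reaching_end = out_set(whole, in_set=set())
```

For this three-statement sequence, d1 is killed by the later definition d3 of the same variable, so only d2 and d3 reach the end.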
EFFICIENT DATA FLOW ALGORITHM
To optimize code efficiently, the compiler collects information about the whole
program and distributes this information to each block of the flow graph. This
process is known as data-flow analysis.
Certain optimizations can only be achieved by examining the entire program. They
cannot be achieved by examining just a portion of the program.
Use-definition chaining is one such problem. Here, given a use of a variable in a
statement, we try to find out which definitions of that variable can reach
that use.
Based on local information alone, a compiler can perform some optimizations.
For example, consider the following code:
x = a + b;
x = 6 * 3;
In this code, the first assignment to x is useless. The value computed for x is
never used in the program.
At compile time the expression 6*3 will be computed, simplifying the second
assignment statement to x = 18;
Some optimizations need more global information. For example, consider the
following code:
a = 1;
b = 2;
c = 3;
if (....) x = a + 5;
else x = b + 4;
c = x + 1;
In this code, the initial assignment to c at line 3 is useless, and the expression
x + 1 can be simplified to 7.
But it is less obvious how a compiler can discover these facts by looking
only at one or two consecutive statements. A more global analysis is required
so that the compiler knows the following things at each point in the program:
• Which variables are guaranteed to have constant values
• Which variables will be used before being redefined
Data flow analysis is used to discover this kind of property. The data flow
analysis can be performed on the program's control flow graph (CFG).
The control flow graph of a program is used to determine those parts of a
program to which a particular value assigned to a variable might propagate.
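How a compiler might discover that x + 1 is 7 in the if/else example can be sketched with a tiny constant-propagation step: each branch is analysed with its own copy of the known constants, and at the join only the facts that agree on both paths are kept. The function names and the hand-coded walk over the example are assumptions; a real analysis would iterate over the control flow graph.

```python
def merge(env_then, env_else):
    """Keep a variable only if both paths give it the same constant."""
    return {name: value for name, value in env_then.items()
            if name in env_else and env_else[name] == value}

def analyse_example():
    env = {"a": 1, "b": 2, "c": 3}         # a = 1; b = 2; c = 3;
    env_then = dict(env)
    env_then["x"] = env_then["a"] + 5      # then-branch: x = a + 5
    env_else = dict(env)
    env_else["x"] = env_else["b"] + 4      # else-branch: x = b + 4
    env = merge(env_then, env_else)        # join point after the if/else
    env["c"] = env["x"] + 1                # c = x + 1
    return env

constants = analyse_example()
```

Both branches assign 6 to x, so x survives the merge as a known constant and the final assignment to c folds to 7.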
9. ASSIGNMENTS
S.No | Questions | K Level | CO Level
1 | Write a quicksort program, convert it into three address code. Construct the block and flow graph and optimize the code. | K3 | CO5
2 | Construct DAG for factorial program. | K3 | CO5
3 | Optimize the code, identify the basic block and construct the basic block. Optimize the basic block. | K3 | CO5
4 | Draw the diagram for for loop, while loop and do while loop. | K4 | CO5
11. PART A : Q & A : UNIT – V
1. Define basic block and flow graph. (CO5, K1)
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.
A flow graph is a directed graph constructed by adding flow-of-control information to the set of basic blocks making up a program.
2. Give the applications of DAG. (CO5, K1)
• Automatically detect the common subexpressions.
• Determine which identifiers have their values used in the block.
• Determine which statements compute values that could be used outside the block.
3. Give the important classes of local transformations on basic blocks. (CO5, K1)
• Structure-preserving transformations
• Algebraic transformations
4. What are the structure-preserving transformations on basic blocks? (CO5, K1)
• Common subexpression elimination
• Dead-code elimination
• Renaming of temporary variables
• Interchange of two independent adjacent statements
5. Write the characteristics of peephole optimization. (CO5, K1)
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
6. Define dead-code elimination with an example. (CO5, K1)
Suppose the statement x=y+z appears in a basic block, where x is dead, that is, never subsequently used. Then this statement may be safely removed without changing the value of the basic block.
7. Define use of machine idioms. (CO5, K1)
The target machine may have hardware instructions to implement certain specific operations efficiently. Detecting situations that permit the use of these instructions can reduce execution time significantly.
8. Define code optimization and optimizing compiler. (CO5, K1)
The term code optimization refers to techniques a compiler can employ in an attempt to produce a better object language program than the most obvious one for a given source program.
Compilers that apply code-improving transformations are called optimizing compilers.
9. Define common subexpression elimination. (CO5, K2)
It is the process of eliminating statements that recompute an expression whose value is already available. The basic block is thereby transformed into an equivalent block.
10. Define reduction in strength. (CO5, K1)
Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are cheaper than others and can often be used as special cases of more expensive operators.
13. List the functions involved in semantics-preserving transformations. (CO5, K1)
Common-subexpression elimination, copy propagation, dead-code elimination, and constant folding.
14. Define code motion. (CO5, K2)
The process of moving the code either above the loop or after the loop is called code motion.
while (i <= limit-2)
After code motion it becomes
t = limit-2
while (i <= t)
18. Write the labels on nodes in DAG. (CO5, K1)
A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
• Leaves are labeled by unique identifiers, either variable names or constants.
• Interior nodes are labeled by an operator symbol.
• Nodes are also optionally given a sequence of identifiers for labels.
19. Give an example of eliminating unreachable code. (CO5, K1)
An unlabeled instruction immediately following an unconditional jump may be removed. If a sequence of code will never be executed, then it is unreachable.
20. Write the steps to partition a sequence of three-address statements into basic blocks. (CO5, K1)
1. First determine the set of leaders, the first statements of basic blocks. The rules we can use are the following.
• The first statement is a leader.
• Any statement that is the target of a conditional or unconditional goto is a leader.
• Any statement that immediately follows a goto or conditional goto statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program.
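The leader rules in question 20 can be sketched as a short partitioning routine. Here each statement is a (text, target) pair, where target is the 0-based index of the jump destination or None; this encoding, and the two initialisation statements placed before the loop, are assumptions added for illustration.

```python
def partition_into_blocks(statements):
    """statements: list of (text, jump_target_index_or_None)."""
    if not statements:
        return []
    leaders = {0}                               # the first statement is a leader
    for position, (_, target) in enumerate(statements):
        if target is not None:
            leaders.add(target)                 # the target of a goto is a leader
            if position + 1 < len(statements):
                leaders.add(position + 1)       # so is the statement after a goto
    starts = sorted(leaders)
    ends = starts[1:] + [len(statements)]
    return [statements[start:end] for start, end in zip(starts, ends)]

program = [
    ("prod := 0", None),                        # assumed initialisation
    ("i := 1", None),                           # assumed initialisation
    ("t1 := 4*i", None),
    ("t2 := a[t1]", None),
    ("t3 := 4*i", None),
    ("t4 := b[t3]", None),
    ("t5 := t2*t4", None),
    ("t6 := prod+t5", None),
    ("prod := t6", None),
    ("t7 := i+1", None),
    ("i := t7", None),
    ("if i<=20 goto t1-statement", 2),          # jumps back to index 2
]
blocks = partition_into_blocks(program)
```

With this input the routine yields two basic blocks: the two initialisation statements, and the ten-statement loop body that starts at the jump target.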
12. PART B QUESTIONS : UNIT – V
(CO5, K4)
PART C QUESTIONS
(CO5, K6)
1. For the given quicksort code write the Three Address Code.
13. SUPPORTIVE ONLINE CERTIFICATION COURSES
UNITS : I TO V
UDEMY
The Ultimate : Compiler Design - Module - 1
Compiler Design
NPTEL
Compiler Design
https://onlinecourses.nptel.ac.in/noc21_cs07/preview
14. REAL TIME APPLICATIONS : UNIT – V
15. CONTENTS BEYOND SYLLABUS : UNIT – V
16. ASSESSMENT SCHEDULE
S. No | Name of the Assessment | Start Date | End Date | Portion
5 | Revision 1 | 13.5.2024 | 16.5.2024 | UNIT 5, 1 & 2
17. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS
• TEXT BOOKS:
• REFERENCE BOOKS:
18. MINI PROJECT SUGGESTION
• Objective:
This module facilitates the hands-on skills of the students (from the practical
courses more effectively), and they can try the following mini projects
for a deeper understanding of Compiler Design.
• Planning:
• This method is mostly used to improve the ability of students in
Project Idea :
Design of a highly efficient Optimizing Compiler
References:
The Design of an Optimizing Compiler by William A. Wulf, Richard K. Johnson,
Charles B. Weinstock, Steven O. Hobbs, Computer Science Department,
Carnegie-Mellon University, Pittsburgh, Pa. December 1973.