CD Unit 5
UNIT 5
CODE OPTIMIZATION & CODE GENERATION
I. CODE OPTIMIZATION
o The process of code optimization involves-
Eliminating the unwanted code lines
Rearranging the statements of the code
o Optimization is a program transformation technique that tries to improve the code
by making it consume fewer resources (i.e. CPU, memory) and run faster.
o In optimization, high-level general programming constructs are replaced by very
efficient low-level programming codes. A code optimizing process must follow the
three rules given below:
The output code must not, in any way, change the meaning of the program.
Optimization should increase the speed of the program and, if possible, the program
should demand fewer resources.
Optimization should itself be fast and should not delay the overall compiling process.
o Efforts to optimize code can be made at various levels of the compilation process.
At the beginning, users can change/rearrange the code or use better algorithms to
write the code.
After generating intermediate code, the compiler can modify the intermediate code
by improving address calculations and loops.
While producing the target machine code, the compiler can make use of memory
hierarchy and CPU registers.
o Advantages-
The optimized code has the following advantages-
Optimized code has faster execution speed.
Optimized code utilizes the memory efficiently.
Optimized code gives better performance.
o Optimization can be categorized broadly into two types: machine independent and
machine dependent.
1. Machine-Independent Optimization
o Machine independent optimization attempts to improve the intermediate code to get a
better target code. The part of the code which is transformed here does not involve any
absolute memory location or any CPU registers.
o The process of intermediate code generation introduces many inefficiencies, such as
using variables instead of constants, extra copies of variables, and repeated evaluation
of expressions. Code optimization removes such inefficiencies and improves the code.
o It can sometimes change the structure of the program beyond recognition: it unrolls
loops, inlines functions, and eliminates some programmer-defined variables.
Example:
pi = 3.14
radius = 10
Area of circle = pi x radius x radius
Here,
This technique substitutes the value of variables ‘pi’ and ‘radius’ at
compile time.
It then evaluates the expression 3.14 x 10 x 10.
The expression is then replaced with its result 314.
This saves the time at run time.
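The substitution and evaluation steps above can be sketched in a few lines. This is a minimal illustration, not a real compiler pass: the helper name fold and the table of known constants are assumptions made for the example, and only +, - and * are handled.

```python
import ast
import operator

# Hypothetical helper for illustration: supports only +, - and *.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def fold(expr, constants):
    """Return the compile-time value of expr, or None if it is not constant."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            # Constant propagation: substitute a variable whose value is known.
            return constants.get(node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left, right = walk(node.left), walk(node.right)
            if left is None or right is None:
                return None
            # Constant folding: evaluate the operation at "compile time".
            return _BIN_OPS[type(node.op)](left, right)
        return None
    return walk(ast.parse(expr, mode="eval").body)

# pi and radius are known constants, so the whole expression folds (to about 314).
area = fold("pi * radius * radius", {"pi": 3.14, "radius": 10})
```

An expression that mentions a variable with no known value, such as fold("pi * r", {"pi": 3.14}), is left alone (the sketch returns None).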
ii. Common Sub-Expression Elimination:
An expression that has already been computed and appears again in the code
is called a common sub-expression.
In this technique,
As the name suggests, it involves eliminating the common sub-expressions.
The redundant expressions are eliminated to avoid their re-computation.
The already computed result is used in the further program when
required.
Example:
Code before optimization:
S1 = 4 x i
S2 = a[S1]
S3 = 4 x j
S4 = 4 x i // Redundant Expression
S5 = n
S6 = b[S4] + S5
Code after optimization:
S1 = 4 x i
S2 = a[S1]
S3 = 4 x j
S5 = n
S6 = b[S1] + S5
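A small sketch of how this elimination might be automated for a single block is given below. It assumes that every name is assigned at most once in the block and that no array element is stored to, so two textually identical expressions always have the same value. The function name eliminate_cse and the (target, expression) representation are choices made for this example; * is written in place of x.

```python
import re

def eliminate_cse(block):
    """block: list of (target, expression_text) pairs in execution order."""
    available = {}   # expression text -> name that already holds its value
    alias = {}       # eliminated name -> surviving name
    result = []
    for target, expr in block:
        # Rewrite operands that refer to names we have already eliminated.
        expr = re.sub(r"[A-Za-z_]\w*",
                      lambda m: alias.get(m.group(), m.group()), expr)
        if expr in available:
            alias[target] = available[expr]   # redundant: reuse earlier result
        else:
            available[expr] = target
            result.append((target, expr))
    return result

block = [
    ("S1", "4 * i"),
    ("S2", "a[S1]"),
    ("S3", "4 * j"),
    ("S4", "4 * i"),        # redundant expression
    ("S5", "n"),
    ("S6", "b[S4] + S5"),
]
optimized = eliminate_cse(block)
# S4 is dropped and S6 becomes b[S1] + S5.
```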
Code before optimization:
{
x = y + z;
a[j] = 6 x j;
}
Code after optimization:
x = y + z;
{
a[j] = 6 x j;
}
Code before optimization:
i = 0;
if (i == 1)
{
a = x + 5;
}
Code after optimization:
i = 0;
v. Strength Reduction:
In this technique,
As the name suggests, it involves reducing the strength of expressions.
This technique replaces expensive operators with simpler and
cheaper ones.
Example:
Code before optimization: B = A x 2
Code after optimization: B = A + A
Here,
The expression “A x 2” is replaced with the expression “A + A”.
This is because the cost of the multiplication operator is higher than that of
the addition operator.
2. Machine-dependent Optimization
Machine-dependent optimization is done after the target code has been generated and
when the code is transformed according to the target machine architecture. It involves
CPU registers and may have absolute memory references rather than relative references.
Machine-dependent optimizers try to take maximum advantage of the memory
hierarchy.
i. Code motion
Code motion is used to decrease the amount of code in a loop. This
transformation takes a statement or expression that can be moved outside
the loop body without affecting the semantics of the program and moves it there.
Example:
In the while statement below, limit-2 is a loop-invariant expression.
while (i<=limit-2) /*statement does not change limit*/
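The effect of code motion on such a loop can be sketched as follows. The loop body (a simple increment of i) and the temporary name t are assumptions made for this illustration; the only point is that limit - 2 is evaluated once, before the loop, instead of on every test of the condition.

```python
def count_before(limit):
    i = 0
    while i <= limit - 2:    # limit - 2 is recomputed on every iteration
        i += 1               # the body does not change limit
    return i

def count_after(limit):
    t = limit - 2            # loop-invariant expression hoisted out of the loop
    i = 0
    while i <= t:            # the body changes neither limit nor t
        i += 1
    return i
```

Both versions return the same value for every limit, so the meaning of the program is unchanged.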
In this figure, we can replace the assignment t4 := 4*j by t4 := t4-4. The only
problem that arises is that t4 does not have a value when we enter
block B2 for the first time. So we place the assignment t4 := 4*j on entry to
block B2.
I. T. S. Engineering College, Greater Noida
Subject: Compiler Design Prachi Jain
Subject Code: KCS 502 Assistant Professor (CSE Dept.)
while (i<10)
{
j= 3 * i+1;
a[j]=a[j]-2;
i=i+2;
}
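One way strength reduction could be applied to the loop above is sketched below. Since i grows by 2 on each iteration, 3 * i + 1 grows by 6, so the multiplication can be replaced by an addition on a running value. The variable s and the two function names are introduced only for this illustration, and the array is assumed to have at least 26 elements.

```python
def original_loop(a):
    a = list(a)
    i = 0
    while i < 10:
        j = 3 * i + 1        # one multiplication per iteration
        a[j] = a[j] - 2
        i = i + 2
    return a

def reduced_loop(a):
    a = list(a)
    i = 0
    s = 3 * i + 1            # s always equals 3 * i + 1
    while i < 10:
        j = s
        a[j] = a[j] - 2
        i = i + 2
        s = s + 6            # i grew by 2, so 3 * i + 1 grew by 6
    return a
```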
o EXAMPLE:
1. Three Address Code for the expression a = b + c + d is-
Here,
All the statements execute in a sequence one after the other.
Thus, they form a basic block.
2. Three Address Code for the expression If A<B then 1 else 0 is-
Here,
The statements do not execute in a sequence one after the other.
Thus, they do not form a basic block.
o EXAMPLE:
1. Compute the basic blocks for the given three address statements-
(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)
Solution:
We have-
PROD = 0 is a leader since first statement of the code is a leader.
T1 = 4 x I is a leader since target of the conditional goto statement is a leader.
Now, the given code can be partitioned into two basic blocks as-
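The partitioning can be checked with a short sketch. It applies the two leader rules used in the solution, plus the standard rule that the statement immediately following a jump is also a leader (which adds nothing here, because the jump is the last statement). The function name basic_blocks and the string representation of the statements are assumptions made for this example.

```python
import re

def basic_blocks(statements):
    """Return the basic blocks as lists of 1-based statement numbers."""
    n = len(statements)
    leaders = {1}                                  # the first statement
    for number, text in enumerate(statements, start=1):
        match = re.search(r"GOTO \((\d+)\)", text)
        if match:
            leaders.add(int(match.group(1)))       # the target of a jump
            if number < n:
                leaders.add(number + 1)            # the statement after a jump
    ordered = sorted(leaders)
    # Each block runs from one leader up to, but not including, the next.
    return [list(range(start, end))
            for start, end in zip(ordered, ordered[1:] + [n + 1])]

statements = [
    "PROD = 0", "I = 1", "T2 = addr(A) - 4", "T4 = addr(B) - 4",
    "T1 = 4 x I", "T3 = T2[T1]", "T5 = T4[T1]", "T6 = T3 x T5",
    "PROD = PROD + T6", "I = I + 1", "IF I <= 20 GOTO (5)",
]
blocks = basic_blocks(statements)
# Two blocks: statements (1)-(4) and statements (5)-(11).
```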
2. Draw a flow graph for the three address statements given in problem-01.
Solution:
Firstly, we compute the basic blocks (already done above).
Secondly, we assign the flow control information.
o There are two types of basic block optimization. These are as follows:
i. Structure-Preserving Transformations
ii. Algebraic Transformations
i. Structure-Preserving Transformations
The primary structure-preserving transformations on basic blocks are as follows:
a) Common sub-expression elimination
b) Dead code elimination
c) Renaming of temporary variables
d) Interchange of two independent adjacent statements
b) Dead-code elimination
It is possible that a program contains a large amount of dead code.
This can happen when variables are declared and defined once and the programmer
forgets to remove them; in this case they serve no purpose.
Suppose the statement x := y + z appears in a block and x is dead, that is,
it is never subsequently used. Then you can safely remove this statement
without changing the value of the basic block.
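Dead-code elimination inside one basic block can be sketched with a single backward scan. The sketch assumes that statements have no side effects and that each statement is given as a (target, operands) pair; the function name and the set of names live on exit from the block are inputs invented for this example.

```python
def eliminate_dead_code(block, live_on_exit):
    """Remove assignments whose target is never used afterwards."""
    live = set(live_on_exit)          # names needed after the current point
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this statement defines target
            live.update(operands)     # and needs its operands
        # otherwise the target is dead here and the statement is dropped
    kept.reverse()
    return kept

block = [
    ("x", ("y", "z")),    # x := y + z, but x is never used again
    ("t", ("b", "c")),
    ("u", ("t", "y")),
]
cleaned = eliminate_dead_code(block, live_on_exit={"u"})
# The assignment to x is removed; t and u are kept.
```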
d) Interchange of statement
Suppose a block has the following two adjacent statements:
t1 : = b + c
t2 : = x + y
These two statements can be interchanged without affecting the value of the block
when the value of t1 does not affect the value of t2, that is, when neither x nor y
is t1 and neither b nor c is t2.
o PROPERTIES OF DAG:
Reachability relation forms a partial order in DAGs.
Both transitive closure & transitive reduction are uniquely defined for DAGs.
Topological Orderings are defined for DAGs.
o APPLICATIONS OF DAG
DAGs are used for the following purposes-
To determine the expressions which have been computed more than once
(called common sub-expressions).
To determine the names whose computation has been done outside the block
but used inside the block.
To determine the statements of the block whose computed value can be made
available outside the block.
To simplify the list of quadruples by not executing the assignment
instructions x:=y unless they are necessary and by eliminating the common
sub-expressions.
o CONSTRUCTION OF DAG
Rule 1:
In a DAG,
Interior nodes always represent the operators.
Exterior nodes (leaf nodes) always represent the names, identifiers or
constants.
Rule 2:
While constructing a DAG,
A check is made to find if there exists any node with the same value.
A new node is created only when there does not exist any node with
the same value.
This action helps in detecting the common sub-expressions and
avoiding the re-computation of the same.
Rule 3:
The assignment instructions of the form x:=y are not performed unless they
are necessary.
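The three rules above can be sketched as a small DAG builder. This is only an illustration under simplifying assumptions: statements are (target, operator, left, right) tuples, a copy x := y is written with operator None, and the names build_dag, nodes and current are invented for the example. The sample input is the three address code for (a+b)x(a+b+c), with * standing for x.

```python
def build_dag(code):
    nodes = []      # node id -> ("leaf", name) or (operator, left_id, right_id)
    index = {}      # node description -> node id, used for the Rule 2 check
    current = {}    # name -> id of the node that holds its current value

    def node_for(key):
        if key not in index:          # Rule 2: create a node only if none exists
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name not in current:       # Rule 1: names and constants are leaves
            current[name] = node_for(("leaf", name))
        return current[name]

    for target, op, left, right in code:
        if op is None:                # Rule 3: x := y only attaches a label
            current[target] = operand(left)
        else:                         # Rule 1: interior nodes are operators
            current[target] = node_for((op, operand(left), operand(right)))
    return nodes, current

code = [
    ("T1", "+", "a", "b"),
    ("T2", "+", "T1", "c"),
    ("T3", "*", "T1", "T2"),
]
nodes, current = build_dag(code)
# Six nodes: leaves a, b, c and three operator nodes; T1's node is shared.
```

If the same computation appears twice, for example T1 = a + b followed by T2 = a + b, both names end up attached to one node, which is how the common sub-expression is detected.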
o EXAMPLES:
1. Consider the following expression and construct a DAG for it-
(a+b)x(a+b+c)
Solution:
Three Address Code for the given expression is-
T1 = a + b
T2 = T1 + c
T3 = T1 x T2
Now, Step wise representation of Directed Acyclic Graph is-
(i)     +
       / \
      a   b
(ii)      +
         / \
        +   c
       / \
      a   b
(iii)
S7:= i+1
i := S7
if i<= 20 goto (1)
Solution:
In this code, the first assignment of x is useless. The value computed for x is
never used in the program.
At compile time the expression 6*3 will be computed, simplifying the second
assignment statement to x = 18;
2. Some optimization needs more global information. For example, consider the
following code:
a = 1;
b = 2;
c = 3;
if (. . .) x = a + 5;
else x = b + 4;
c = x + 1;
In this code, the initial assignment to c at line 3 is useless, because c is reassigned
before it is used, and the expression x + 1 can be simplified to 7, because x is 6 on
both branches (1 + 5 and 2 + 4).
o It is less obvious, however, how a compiler can discover these facts by looking only at
one or two consecutive statements. A more global analysis is required so that the
compiler knows the following things at each point in the program:
Which variables are guaranteed to have constant values
Which variables will be used before being redefined
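The first of these questions can be illustrated on the example above. The sketch below tracks the known constant values along each branch and, where the branches join, keeps only the variables that have the same constant on both paths. The function name meet and the dictionary representation are assumptions made for this illustration, not a full data-flow analysis.

```python
def meet(env_a, env_b):
    """Keep only the variables that hold the same constant on both paths."""
    return {name: value for name, value in env_a.items()
            if name in env_b and env_b[name] == value}

entry = {"a": 1, "b": 2, "c": 3}             # a = 1; b = 2; c = 3;
then_env = dict(entry, x=entry["a"] + 5)     # if branch:   x = a + 5
else_env = dict(entry, x=entry["b"] + 4)     # else branch: x = b + 4
after_if = meet(then_env, else_env)          # x is 6 on both paths
c_value = after_if["x"] + 1                  # c = x + 1 simplifies to 7
```

If the two branches had assigned different constants to x, meet would drop x and the expression x + 1 could not be simplified.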
o A CODE-GENERATION ALGORITHM:
The algorithm takes a sequence of three-address statements as input. For each
three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to find out the location L where the result of the
computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the
value of y is currently in both memory and a register, prefer the register for y'. If the
value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L where z' is the current location of z. If z is in
both memory and a register, prefer the register. Update the address
descriptor of x to indicate that x is in location L. If L is a register, update its
descriptor to indicate that it contains x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block,
and are in registers, alter the register descriptor to indicate that after execution of
x := y op z those registers will no longer contain y or z.
1. Consider the three address statement x:= y + z. It can have the following
sequence of codes:
MOV y, R0
ADD z, R0
MOV R0, x
2. The assignment statement d:= (a-b) + (a-c) + (a-c) can be translated into the
following sequence of three address code:
t:= a-b
u:= a-c
v:= t +u
d:= v+u
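To see how the code-generation algorithm described earlier handles this three address code, here is a simplified sketch. It assumes a machine with two registers R0 and R1, instructions of the form OP source, destination, a naive getreg that spills the first register when none is free, and a conservative next-use test. All function and variable names are invented for this illustration.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def generate(block, live_on_exit, registers=("R0", "R1")):
    """block: list of (x, op, y, z) tuples, each meaning x := y op z."""
    code = []
    holds = {r: None for r in registers}   # register descriptor

    def register_of(name):
        return next((r for r in registers if holds[r] == name), None)

    def has_next_use(name, i):
        # Conservative: ignores redefinitions between statement i and the use.
        used_later = any(name in (y, z) for _, _, y, z in block[i + 1:])
        return used_later or name in live_on_exit

    def getreg(i, y):
        ry = register_of(y)
        if ry is not None and not has_next_use(y, i):
            return ry                       # reuse the register holding y
        for r in registers:
            if holds[r] is None:
                return r                    # an empty register
        victim = registers[0]               # naive spill: store, then reuse
        code.append(f"MOV {victim}, {holds[victim]}")
        holds[victim] = None
        return victim

    for i, (x, op, y, z) in enumerate(block):
        L = getreg(i, y)
        y_loc = register_of(y) or y         # prefer a register over memory
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        z_loc = register_of(z) or z
        code.append(f"{OPCODES[op]} {z_loc}, {L}")
        for r in registers:                 # x now lives only in L
            if holds[r] == x:
                holds[r] = None
        holds[L] = x
        for name in (y, z):                 # free registers of dead operands
            r = register_of(name)
            if r is not None and name != x and not has_next_use(name, i):
                holds[r] = None
    for r in registers:                     # store live results at block end
        if holds[r] in live_on_exit:
            code.append(f"MOV {r}, {holds[r]}")
    return code

block = [("t", "-", "a", "b"), ("u", "-", "a", "c"),
         ("v", "+", "t", "u"), ("d", "+", "v", "u")]
target_code = generate(block, live_on_exit={"d"})
```

With d live on exit from the block, the sketch keeps t, u and v in registers and emits seven instructions, ending with a store of R0 into d.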
5. Register allocation
Instructions involving register operands are shorter and faster than those
involving operands in memory.
The use of registers is subdivided into two sub-problems:
Register allocation – the set of variables that will reside in registers at a
point in the program is selected.
Register assignment – the specific register that a variable will reside in is
picked.
6. Evaluation order
The order in which the computations are performed can affect the efficiency
of the target code. Some computation orders require fewer registers to hold
intermediate results than others.
X. TARGET MACHINE