CTCD Unit 5
Many transformations can be performed at both the local and global levels. Local
transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing
the function it computes.
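The transformations described below are common sub-expression elimination, copy propagation, dead-code elimination, and constant folding.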
• Frequently, a program will include several calculations of the same value, such as an
offset in an array. Some of the duplicate calculations cannot be avoided by the
programmer because they lie below the level of detail accessible within the source
language.
✓ Common Sub-expression Elimination:
• An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
• For example:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
The above code can be optimized using common sub-expression elimination as:
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
The common sub-expression t4 := 4*i is eliminated because its value is already held in t1 and the value of i has not changed between the two computations; any later use of t4 is replaced by t1.
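The rule above can be sketched as a local pass over one basic block. This is a minimal sketch in C, assuming a hypothetical `Quad` record for three-address statements and assuming no array stores or pointer writes, so that only an explicit assignment can change a value. Instead of deleting the repeated statement, it rewrites it into a copy of the earlier temporary (here t4 := t1), which copy propagation and dead-code elimination can then remove.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical quadruple "dst := a op b"; op "=" marks a copy "dst := a". */
typedef struct {
    char dst[8];
    char op[4];
    char a[8];
    char b[8];
} Quad;

/* Is `name` assigned by any instruction in code[from .. to-1]? */
static int assigned_between(const Quad *code, int from, int to, const char *name) {
    for (int k = from; k < to; k++)
        if (strcmp(code[k].dst, name) == 0)
            return 1;
    return 0;
}

/* Local common sub-expression elimination within one basic block.
   Assumes no array stores or pointer writes. A recomputation is rewritten
   as a copy of the temporary that already holds the value. */
static void local_cse(Quad *code, int n) {
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "=") == 0)
            continue;                          /* copies are not expressions */
        for (int j = 0; j < i; j++) {
            const Quad *p = &code[j];
            if (strcmp(p->op, code[i].op) != 0 || strcmp(p->a, code[i].a) != 0 ||
                strcmp(p->b, code[i].b) != 0)
                continue;                      /* different expression */
            if (strcmp(p->dst, p->a) == 0 || strcmp(p->dst, p->b) == 0)
                continue;                      /* e.g. i := i+1 changes its own operand */
            if (assigned_between(code, j + 1, i, p->a) ||
                assigned_between(code, j + 1, i, p->b) ||
                assigned_between(code, j + 1, i, p->dst))
                continue;                      /* operand or saved result changed */
            strcpy(code[i].op, "=");           /* reuse the earlier value */
            strcpy(code[i].a, p->dst);
            code[i].b[0] = '\0';
            break;
        }
    }
}
```

Run on the five statements of the example, the pass leaves t1, t2, t3 and t5 unchanged and turns t4 := 4*i into t4 := t1. If i were reassigned between the two computations, the `assigned_between` checks would leave 4*i alone.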
✓ Copy Propagation:
• Assignments of the form f := g are called copy statements, or copies for short. The idea
behind the copy-propagation transformation is to use g for f wherever possible after
the copy statement f := g. In other words, copy propagation uses one variable in place of
another.
This may not appear to be an improvement, but, as the example below shows, it gives us an opportunity to
eliminate x.
• For example:
x = Pi;
……
A = x*r*r;
After copy propagation, the use of x is replaced by Pi:
A = Pi*r*r;
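A minimal sketch of local copy propagation in C, again assuming a hypothetical `Quad` record and no aliasing through pointers or arrays. After each copy f := g, later uses of f are replaced by g until either f or g is reassigned; the copy itself is left in place for dead-code elimination to remove.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical quadruple "dst := a op b"; op "=" marks a copy "dst := a". */
typedef struct {
    char dst[8];
    char op[4];
    char a[8];
    char b[8];
} Quad;

/* Local copy propagation within one basic block (no pointer/array aliasing assumed). */
static void copy_propagate(Quad *code, int n) {
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "=") != 0)
            continue;
        const char *f = code[i].dst, *g = code[i].a;
        if (strcmp(f, g) == 0)
            continue;                       /* x := x propagates nothing */
        for (int k = i + 1; k < n; k++) {
            /* operands are read before the result is written, so rewrite uses first */
            if (strcmp(code[k].a, f) == 0) strcpy(code[k].a, g);
            if (strcmp(code[k].b, f) == 0) strcpy(code[k].b, g);
            if (strcmp(code[k].dst, f) == 0 || strcmp(code[k].dst, g) == 0)
                break;                      /* f == g no longer holds after this */
        }
    }
}
```

For x := Pi followed by t1 := x*r and A := t1*r, the pass rewrites the second statement to t1 := Pi*r, after which x is no longer used.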
✓ Dead-Code Elimination:
A variable is live at a point in a program if its value can be used subsequently; otherwise,
it is dead at that point. A related idea is dead or useless code: statements that compute values
that never get used. While the programmer is unlikely to introduce any dead code
intentionally, it may appear as the result of previous transformations. Eliminating dead code
is therefore a useful optimization.
Example:
i=0;
if(i==1)
a=b+5;
Here the 'if' statement, together with the assignment a=b+5 it guards, is dead code: i is always 0 at the test, so the condition can never be satisfied.
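A sketch of dead-code elimination for assignments whose values are never used, assuming a hypothetical single-letter `Instr` form and assuming no instruction has side effects (calls and stores would have to be kept). It walks one basic block backwards while tracking which variables are live. It does not by itself remove the `if` in the example above, since that first needs the known value of i to be propagated into the test.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical single-letter form "dst := a op b": variables are lowercase
   letters, digits are constants, and ' ' marks a missing operand. */
typedef struct { char dst, op, a, b; } Instr;

/* Bit for a variable; constants and blanks contribute nothing. */
static unsigned var_bit(char v) {
    return islower((unsigned char)v) ? 1u << (v - 'a') : 0u;
}

/* Liveness-driven dead-code elimination in one basic block.
   live_out holds the variables used after the block. Walking backwards,
   an instruction whose result is not live is dropped; otherwise its result
   stops being live above it and its operands become live.
   Kept instructions are compacted in order; returns the new count. */
static int eliminate_dead(Instr *code, int n, unsigned live_out) {
    int keep[64];
    unsigned live = live_out;
    assert(n <= 64);
    for (int i = n - 1; i >= 0; i--) {
        keep[i] = (live & var_bit(code[i].dst)) != 0;
        if (keep[i]) {
            live &= ~var_bit(code[i].dst);                     /* killed here */
            live |= var_bit(code[i].a) | var_bit(code[i].b);   /* used here */
        }
    }
    int m = 0;
    for (int i = 0; i < n; i++)
        if (keep[i])
            code[m++] = code[i];
    return m;
}
```

With only a live after the block, x := p (the kind of copy left behind by copy propagation) is dropped, while t := p*r and a := t*r are kept.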
✓ Constant folding:
• When the outcome of a test is known at compile time, both the test and the code it guards
(such as the printing of debugging output) can be eliminated from the object code. More generally,
deducing at compile time that the value of an expression is a constant and using the
constant instead is known as constant folding.
• One advantage of copy propagation is that it often turns the copy statement into dead
code.
• For example,
a=3.14157/2 can be replaced by a=1.570785, removing a division at run time.
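The folding step can be sketched over a small expression tree. This is a C sketch with a hypothetical `Node` type; it folds an operator node only when both children are already constants and leaves a division by zero for run time. A real compiler must also make its compile-time arithmetic match what the target machine would compute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node: op is 'k' (constant), 'v' (variable),
   or a binary operator '+', '-', '*', '/' with both children present. */
typedef struct Node {
    char op;
    double val;          /* used when op == 'k' */
    struct Node *l, *r;  /* used for binary operators */
} Node;

/* Fold bottom-up: when both children of an operator are constants,
   evaluate it now and turn the node into a constant. */
static void fold(Node *e) {
    if (e == NULL || e->op == 'k' || e->op == 'v')
        return;
    fold(e->l);
    fold(e->r);
    if (e->l->op != 'k' || e->r->op != 'k')
        return;                       /* not a constant expression */
    double x = e->l->val, y = e->r->val;
    switch (e->op) {
    case '+': e->val = x + y; break;
    case '-': e->val = x - y; break;
    case '*': e->val = x * y; break;
    case '/':
        if (y == 0.0)
            return;                   /* leave division by zero for run time */
        e->val = x / y;
        break;
    default:
        return;
    }
    e->op = 'k';
    e->l = e->r = NULL;
}
```

Applied to 3.14157/2 the node becomes the constant 1.570785; an expression such as x*2 with a variable x is left as it is.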
✓ Loop Optimizations:
• We now give a brief introduction to a very important place for optimizations, namely
loops, especially the inner loops where programs tend to spend the bulk of their time.
The running time of a program may be improved if we decrease the number of
instructions in an inner loop, even if we increase the amount of code outside that loop.
• Three techniques are important for loop optimization:
I. Code motion, which moves code outside a loop;
II. Induction-variable elimination, which removes redundant induction variables from an inner loop;
III. Reduction in strength, which replaces an expensive operation by a cheaper one, such
as a multiplication by an addition.
✓ Code Motion:
• An important modification that decreases the amount of code in a loop is code motion.
This transformation takes an expression that yields the same result independent of the
number of times a loop is executed (a loop-invariant computation) and places the expression
before the loop. Note that the notion “before the loop” assumes the existence of an entry for
the loop. For example, evaluation of limit-2 is a loop-invariant computation in the following
while-statement:
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent of
t = limit-2;
while (i <= t) /* statement does not change limit or t */
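The effect of the transformation can be checked by hand-applying it in C. In this sketch the array-summing loop body and the names `sum_before` and `sum_after` are hypothetical; only the hoisting of limit-2 comes from the example above. Both versions compute the same result because nothing in the loop body changes limit.

```c
#include <assert.h>

/* Before code motion: limit - 2 is re-evaluated on every test of the loop. */
static int sum_before(const int *v, int limit) {
    int i = 0, s = 0;
    while (i <= limit - 2) {   /* body does not change limit */
        s += v[i];
        i++;
    }
    return s;
}

/* After code motion: the loop-invariant limit - 2 is computed once, before the loop. */
static int sum_after(const int *v, int limit) {
    int i = 0, s = 0;
    int t = limit - 2;         /* hoisted loop-invariant computation */
    while (i <= t) {
        s += v[i];
        i++;
    }
    return s;
}
```

For v = {1, 2, 3, 4, 5} and limit = 5, both functions sum v[0] through v[3], but the second evaluates limit - 2 only once.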
✓ Induction Variables:
• Loops are usually processed inside out. For example, consider the loop around B3.
• Note that the values of j and t4 remain in lock-step: every time the value of j decreases
by 1, that of t4 decreases by 4, because 4*j is assigned to t4. Such identifiers are called induction
variables.
• When there are two or more induction variables in a loop, it may be possible to get rid
of all but one, by the process of induction-variable elimination. For the inner loop
around B3 in Fig. we cannot get rid of either j or t4 completely; t4 is used in B3 and j
in B4.
However, we can illustrate reduction in strength and a part of the process of
induction-variable elimination. Eventually, j will be eliminated when the outer loop B2-B5
is considered.
Example:
As the relationship t4 = 4*j surely holds after the assignment to t4 in Fig., and t4 is not
changed elsewhere in the inner loop around B3, it follows that just after the statement
j := j-1 the relationship t4 = 4*j+4 must hold. We may therefore replace the assignment
t4 := 4*j by t4 := t4-4. The only problem is that t4 does not have a value when we enter block B3
for the first time. Since we must maintain the relationship t4 = 4*j on entry to block B3,
we place an initialization t4 := 4*j at the end of the block where j itself is initialized.
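The replacement of t4 := 4*j by t4 := t4-4 can be checked with a hand-transformed C sketch. The loop shape (j counting down from a hypothetical start value n, with each t4 recorded in an array `out`) is an assumption standing in for block B3 of the figure. The initialization of t4 sits where j is initialized, and the subtraction keeps t4 = 4*j after every decrement.

```c
#include <assert.h>

/* Original form: t4 is recomputed with a multiplication on every iteration. */
static void original(int n, int *out) {
    int j = n, k = 0;
    while (j > 0) {
        j = j - 1;
        int t4 = 4 * j;
        out[k++] = t4;
    }
}

/* Strength-reduced form: t4 is initialized where j is initialized,
   then maintained by subtraction so that t4 == 4*j after each j = j - 1. */
static void reduced(int n, int *out) {
    int j = n, k = 0;
    int t4 = 4 * j;            /* initialization placed with j's */
    while (j > 0) {
        j = j - 1;
        t4 = t4 - 4;           /* replaces t4 = 4 * j */
        out[k++] = t4;
    }
}
```

Both functions produce 28, 24, ..., 0 for n = 8; the second does so without a multiplication inside the loop.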
✓ Reduction in Strength:
• Reduction in strength replaces expensive operations by equivalent cheaper ones on
the target machine. Certain machine instructions are considerably cheaper than others
and can often be used as special cases of more expensive operators.
• For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift. Floating-point division by a constant can be
implemented as multiplication by a constant, which may be cheaper.
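The cheaper forms mentioned above can be written out in C. This is an illustrative sketch with hypothetical function names. The shifts are shown on unsigned values because, for negative signed operands, a right shift is implementation-defined in C and typically rounds toward minus infinity, whereas division rounds toward zero. Multiplying by 0.25 is an exact replacement for dividing by 4.0 only because 0.25 is a power of two.

```c
#include <assert.h>

/* x squared as a multiplication instead of a call to an exponentiation routine. */
static double square(double x) { return x * x; }

/* Multiplication by 2^k as a left shift (requires k < bit width of unsigned). */
static unsigned times_pow2(unsigned x, unsigned k) { return x << k; }

/* Division by 2^k as a right shift; valid here because x is unsigned. */
static unsigned div_pow2(unsigned x, unsigned k) { return x >> k; }

/* Division by the constant 4.0 as multiplication by its exact reciprocal 0.25. */
static double quarter(double x) { return x * 0.25; }
```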
2. OPTIMIZATION OF BASIC BLOCKS
Structure-Preserving Transformations:
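The primary structure-preserving transformations on basic blocks are common sub-expression
elimination, dead-code elimination, renaming of temporary variables, and interchange of two
independent adjacent statements. For example, a statement that computes t1 and the adjacent statement
t2:=x+y
can be interchanged, or reordered within the basic block, when the value of t1 does not affect
the value of t2 (neither x nor y is t1) and the computation of t1 does not use t2.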
➢ Algebraic Transformations:
• Algebraic identities represent another important class of optimizations on
basic blocks. This includes simplifying expressions or replacing expensive
operations by cheaper ones, i.e., reduction in strength.
• Another class of related optimizations is constant folding. Here we evaluate constant
expressions at compile time and replace the constant expressions by their values. Thus
the expression 2*3.14 would be replaced by 6.28.
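• The relational operators <=, >=, <, >, != and = sometimes generate unexpected common
sub-expressions. For example, a test such as x > y may be implemented by subtracting y from x
and testing the sign of the result, so it can share the computation x-y with another statement.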
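• Associative laws may also be applied to expose common sub-expressions. For
example, if the source code has the assignments
a :=b+c
e :=c+d+b
then, because c+d+b = (b+c)+d by the associative and commutative laws for +, the second
assignment can reuse the value of a:
a :=b+c
e :=a+d
Reassociating floating-point additions can change the result, so this rewrite is not always
valid for floating-point operands.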
3. PEEPHOLE OPTIMIZATION
✓ Redundant-instructions elimination
✓ Flow-of-control optimizations
✓ Algebraic simplifications
✓ Use of machine idioms
✓ Unreachable Code
Unreachable Code:
• Another opportunity for peephole optimizations is the removal of
unreachable instructions. An unlabeled instruction immediately
following an unconditional jump may be removed. This operation can
be repeated to eliminate a sequence of instructions. For example, for
debugging purposes, a large program may have within it certain
segments that are executed only if a variable debug is 1. In C, the source
code might look like:
#define debug 0
….
if (debug) {
    print debugging information
}
• In the intermediate representation the if-statement may be translated
as:
if debug = 1 goto L1
goto L2
L1: print debugging information
L2: …   (a)
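The rule of deleting unlabeled instructions after an unconditional jump can be sketched as a single pass. In this C sketch the `Ins` record (an optional label plus the instruction text) is hypothetical, and every labeled instruction is conservatively treated as a possible jump target. On (a) as written nothing is removed, because the print instruction carries the label L1; if the code is instead reduced to an unconditional goto L2 followed by unlabeled print instructions (for example, after the known value 0 of debug has been used to simplify the test), those instructions are deleted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instruction: optional label plus instruction text. */
typedef struct {
    char label[8];   /* "" when the instruction is unlabeled */
    char text[40];
} Ins;

static int is_uncond_jump(const Ins *in) {
    return strncmp(in->text, "goto ", 5) == 0;
}

/* Delete unlabeled instructions that follow an unconditional jump; applying the
   rule repeatedly removes the whole unlabeled run up to the next label.
   Kept instructions are compacted in order; returns the new count. */
static int remove_unreachable(Ins *code, int n) {
    int m = 0;
    int dead = 0;              /* inside a run no jump or fall-through can reach */
    for (int i = 0; i < n; i++) {
        if (code[i].label[0] != '\0')
            dead = 0;          /* a label may be a jump target */
        if (dead)
            continue;
        code[m++] = code[i];
        if (is_uncond_jump(&code[i]))
            dead = 1;
    }
    return m;
}
```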
Flows-Of-Control Optimizations:
• The unnecessary jumps can be eliminated in either the intermediate code or
the target code by the following types of peephole optimizations. We can
replace the jump sequence
goto L1
….
L1: goto L2
by the sequence
goto L2
….
L1: goto L2
• If there are now no jumps to L1, then it may be possible to eliminate the
statement L1: goto L2 provided it is preceded by an unconditional jump.
Similarly, the sequence
if a < b goto L1
….
L1: goto L2
can be replaced by
If a < b goto L2
….
L1: goto L2
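• Finally, suppose there is only one jump to L1 and L1 is preceded by an
unconditional goto. Then the sequence
goto L1
….
L1: if a < b goto L2
L3: …   (1)
may be replaced by
If a < b goto L2
goto L3
…….
L3: …   (2)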
• While the number of instructions in (1) and (2) is the same, we sometimes skip
the unconditional jump in (2), but never in (1). Thus (2) is superior to (1) in
execution time.
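The first of these rewrites, retargeting a jump whose destination is itself goto L2, can be sketched in C. The `Jmp` record is hypothetical and stores only the label, kind and target, since the condition of a conditional jump is not needed for retargeting. The sketch does not delete L1: goto L2 afterwards; that is left to unreachable-code elimination once nothing jumps to L1.

```c
#include <assert.h>
#include <string.h>

enum Kind { OTHER, GOTO, COND };   /* plain instruction, goto, if ... goto */

/* Hypothetical instruction record for jump threading. */
typedef struct {
    char label[8];    /* "" if unlabeled */
    enum Kind kind;
    char target[8];   /* meaningful for GOTO and COND only */
} Jmp;

static int find_label(const Jmp *code, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (code[i].label[0] != '\0' && strcmp(code[i].label, name) == 0)
            return i;
    return -1;
}

/* A jump (conditional or not) to a label whose instruction is "goto L2"
   is retargeted straight to L2. Chains are followed at most n times so
   that a cycle of gotos cannot loop forever. */
static void thread_jumps(Jmp *code, int n) {
    for (int i = 0; i < n; i++) {
        if (code[i].kind == OTHER)
            continue;
        for (int steps = 0; steps < n; steps++) {
            int t = find_label(code, n, code[i].target);
            if (t < 0 || t == i || code[t].kind != GOTO)
                break;
            strcpy(code[i].target, code[t].target);
        }
    }
}
```

On goto L1 … L1: goto L2, and on if a < b goto L1, both jumps end up targeting L2 directly.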
Algebraic Simplification:
Reduction in Strength:
Knowledge about the behavior of a variable is essential for performing function preserving
transformations. In addition to the behavior of the variables, control flow information is also
required to do transformations across basic blocks. In addition, to apply global optimizations
on basic blocks, data-flow information is collected by solving systems of data-flow equations.
Data flow equations
A variable is defined at a particular statement, and the value it receives may reach later
statements in the program. A statement S that defines a variable is said to generate that
definition; the set of definitions generated by S is written gen[S]. The definition, with the
value it carries, is forwarded to subsequent statements until some other statement kills it by
redefining the same variable; the set of definitions killed by S is written kill[S]. Thus, to
determine the reaching definitions for a statement S, the following equation is defined in terms
of four sets, gen[S], kill[S], in[S] and out[S]:
out[S] = gen[S] ∪ (in[S] − kill[S])
In the above equation, in[S] is the set of definitions that reach the beginning of S, and out[S]
is the set of definitions that reach the end of S. out[S] is computed from the definitions that
S generates together with the definitions that enter S and are not killed by it. In other words,
the information at the end of a statement S is either generated within the statement or
enters at the beginning and is not killed as control flows through the statement.
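Because gen, kill, in and out are sets of definitions, they are commonly represented as bit vectors, and the equation becomes a few bitwise operations. This C sketch assumes at most 32 definitions numbered d0, d1, …, a hypothetical table `def_var` giving the variable each definition assigns, and a straight-line sequence in which every statement is a definition; kill[S] for definition d is every other definition of the same variable.

```c
#include <assert.h>

typedef unsigned DefSet;   /* bit d set  <=>  definition d is in the set */

/* Transfer function of one statement: out[S] = gen[S] | (in[S] & ~kill[S]). */
static DefSet transfer(DefSet in, DefSet gen, DefSet kill) {
    return gen | (in & ~kill);
}

/* kill set of definition d: all other definitions of the same variable.
   def_var[e] is the (hypothetical) variable number assigned by definition e. */
static DefSet kill_of(const int *def_var, int ndefs, int d) {
    DefSet k = 0;
    for (int e = 0; e < ndefs; e++)
        if (e != d && def_var[e] == def_var[d])
            k |= 1u << e;
    return k;
}

/* Push `in` through statements d0 .. d(ndefs-1) in order; returns out of the last. */
static DefSet run_sequence(const int *def_var, int ndefs, DefSet in) {
    assert(ndefs <= 32);
    DefSet cur = in;
    for (int d = 0; d < ndefs; d++)
        cur = transfer(cur, 1u << d, kill_of(def_var, ndefs, d));
    return cur;
}
```

For d0: i := …, d1: j := …, d2: i := … with nothing reaching the start, only d1 and d2 reach the end (bit mask 6), because d2 kills d0.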
The notion of killing and generating statements depends on the desired information and on
the data-flow analysis problem to be solved. For some problems out[S] is defined in terms of
in[S], and for others in[S] is defined in terms of out[S]. Data flow is also affected by the
control flow of the program: the definition of out[S] assumes that S has a unique end point.
In addition, assignments through pointer variables, procedure calls and assignments to array
variables influence the data flow and must be taken into account. Consider the example given
in the figure, involving three basic blocks. We will look at each statement as a definition of
the variable on its left-hand side.
A “point” is defined as the position between two adjacent statements. In addition, a “point”
can also be the position above the first statement and following the last statement. For
example, in figure, basic block B1 has 3 points while B2 has 2 points. If we consider all the
blocks then each will have many points. If we try to merge the last point of a current block
with the first point of its successor block then we could end up with the sequence of
statements that can be looked together to perform some optimization. On the other hand, a
path is the sequence of statements between any two points.
Reaching Definitions
A definition of a variable ‘x’ is a statement that assigns, or may assign, a value to ‘x’. The
definition is unambiguous if it is a simple assignment to ‘x’. On the other hand, if ‘x’ may be
assigned through a procedure call (for example, when ‘x’ is passed as a parameter) or through a
pointer, the definition of ‘x’ is said to be ambiguous. Definition ‘d’ reaches a point ‘p’ if
there is a path from the point immediately following ‘d’ to ‘p’ such that ‘d’ is not killed along
that path. A definition of ‘x’ is killed along a path if the path contains another definition of
‘x’. Consider the figure, for which we have three definitions. The table gives the definitions of
the various blocks.
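Across basic blocks, reaching definitions are usually computed iteratively: in[B] is the union of out[P] over the predecessors P of B, out[B] = gen[B] ∪ (in[B] − kill[B]), and the equations are re-evaluated until nothing changes. This is a minimal C sketch with a hypothetical `Flow` record holding bit-vector gen and kill sets and predecessor lists; it is not tied to the graph of the figure.

```c
#include <assert.h>

#define MAXB 8

typedef unsigned DefSet;   /* bit d set  <=>  definition d is in the set */

/* Hypothetical flow graph: pred[b][0..npred[b]-1] are the predecessors of block b. */
typedef struct {
    int nblocks;
    int npred[MAXB];
    int pred[MAXB][MAXB];
    DefSet gen[MAXB], kill[MAXB];
} Flow;

/* Iterative reaching definitions:
   in[B]  = union of out[P] over predecessors P of B
   out[B] = gen[B] | (in[B] & ~kill[B])
   Starts from out[B] = gen[B] and repeats until no out set changes. */
static void reaching_defs(const Flow *g, DefSet *in, DefSet *out) {
    assert(g->nblocks <= MAXB);
    for (int b = 0; b < g->nblocks; b++) {
        in[b] = 0;
        out[b] = g->gen[b];
    }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < g->nblocks; b++) {
            DefSet newin = 0;
            for (int p = 0; p < g->npred[b]; p++)
                newin |= out[g->pred[b][p]];
            DefSet newout = g->gen[b] | (newin & ~g->kill[b]);
            if (newout != out[b])
                changed = 1;
            in[b] = newin;
            out[b] = newout;
        }
    }
}
```

For example, with B0 defining d0 (i) and d1 (j), a loop block B1 (its own predecessor) defining d2 (i), and an exit block B2 defining d3 (j), the iteration settles after two passes with in[B1] = {d0, d1, d2} and out[B2] = {d2, d3}.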
Data flow analysis – structured programs
Let us consider the data flow analysis of structured programs. In order to understand the data
flow, we need to know the various forms of statements. The assumption with the statements
is that there is a single entry and single exit point.