Code Optimization Unit-4-II
6.1 CODE OPTIMIZATION
The code generated by the compiler can be made to run faster, occupy less space, or both.
Transformations applied to the code for this purpose are called optimizations, or optimizing
transformations.
Compilers that can apply optimizing transformations are called optimizing compilers.
Code optimization is an optional phase, and it must not change the meaning of the
program.
An optimizing compiler was found to consume about 40% extra compilation time due to
optimization, while the optimized program occupied 25% less storage and executed 3 times
faster than the unoptimized program.
Machine-level details such as instructions and addresses are not known to the
programmer, so optimizations that depend on them must be carried out by the compiler.
Optimization must be worth the effort, i.e., the effort put into optimization must be
justified by the improvement obtained.
Optimization can be applied in three places:
Source code
Intermediate code
Target code

Source code → Front End → Intermediate code → Code Generator → Target code
1. Local optimization
Transformations are applied over a small segment of the program called a basic block,
in which statements are executed in sequential order. The speed-up factor for local
optimization is about 1.4.
2. Global optimization
Transformations are applied over a larger segment of the program, such as loops,
procedures and functions. Local optimization must be done before applying global
optimization. The speed-up factor is about 2.7.
Basic Block
A basic block is a sequence of consecutive three-address statements which may be entered
only at the beginning; once entered, the statements are executed in sequence without
halting or branching.
To identify basic blocks, we first find the leader statements. The rules for leader
statements are:
1. The first statement of the program is a leader.
2. Any statement that is the target of a conditional or unconditional jump is a leader.
3. Any statement that immediately follows a conditional or unconditional jump is a leader.
A basic block then consists of a leader and all statements up to, but not including, the
next leader.
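As a rough illustration of these rules, here is a minimal C sketch of leader detection; the
instruction record with fields is_jump and jump_target is an assumed, simplified layout,
not something defined in this unit.

#include <stdbool.h>
#include <stddef.h>

/* Simplified three-address instruction: only the fields needed for
   leader detection are modelled (assumed layout). */
struct TacInstr {
    bool is_jump;     /* conditional or unconditional jump */
    int  jump_target; /* index of the instruction jumped to */
};

/* Mark leaders[i] = true for every instruction that starts a basic block. */
void find_leaders(const struct TacInstr *code, size_t n, bool *leaders)
{
    for (size_t i = 0; i < n; i++)
        leaders[i] = false;
    if (n > 0)
        leaders[0] = true;                        /* rule 1: first statement */
    for (size_t i = 0; i < n; i++) {
        if (code[i].is_jump) {
            leaders[code[i].jump_target] = true;  /* rule 2: jump target */
            if (i + 1 < n)
                leaders[i + 1] = true;            /* rule 3: statement after a jump */
        }
    }
}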
Flow Graphs
A flow graph is a directed representation of the basic blocks of a program and the flow
of control between them.
Nodes are basic blocks and edges are control flow. It is a directed graph G = (N, E, n0),
where n0 is the starting node.
If there is a directed edge from B1 to B2, control may transfer from the last statement
of B1 to the first statement of B2. B1 is called a predecessor of B2, and B2 is a successor
of B1.
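A flow graph can be represented as an adjacency structure over basic blocks; the following
C sketch uses an assumed layout for illustration only.

#include <stddef.h>

struct BasicBlock {
    int first_stmt;               /* index of the leader statement */
    int last_stmt;                /* index of the last statement in the block */
    struct BasicBlock *succ[2];   /* successors: fall-through and/or jump target */
    int nsucc;
};

struct FlowGraph {
    struct BasicBlock *blocks;    /* the nodes N (basic blocks) */
    size_t nblocks;
    struct BasicBlock *entry;     /* n0, the starting node */
};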
EXAMPLE
Flow graph for quick sort
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes.
Common Subexpression Elimination
An occurrence of an expression E is a common subexpression if E was previously computed
and the values of the variables in E have not changed since that computation. We can avoid
recomputing the expression if we can use the previously computed value. Two types are
local (within a single basic block) and global (across basic blocks).
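Before the flow-graph examples below, a small source-level sketch of the local case (the
values and variable names are invented for illustration):

#include <stdio.h>

int main(void)
{
    int a = 6, b = 7, c = 2, d = 3;

    /* before: a * b is computed twice within the same basic block */
    int t1 = a * b + c;
    int t2 = a * b - d;

    /* after local common subexpression elimination: a * b is computed once */
    int t      = a * b;
    int t1_opt = t + c;
    int t2_opt = t - d;

    printf("%d %d %d %d\n", t1, t2, t1_opt, t2_opt);
    return 0;
}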
EXAMPLE 1
Block B5 before and after local common subexpression elimination.
EXAMPLE 2
The common subexpression 4 * i in B5 is eliminated, as its value is already computed in t1
and the value of i has not changed from that definition to this use. Similarly, the
statements
t8 := 4 * j
t9 := a[t8]
a[t8] := x
in B5 can use t4 computed in block B3, and can be replaced by
t9 := a[t4]
a[t4] := x
In the flow graph given above, observe that as control passes from the evaluation of 4 * j
in B3 to B5, there is no change to j and no change to t4, so t4 can be used if 4 * j is
needed.
Another common subexpression comes to light in B5 after t4 replaces t8. The new
expression a[t4] corresponds to the value of a[j] at the source level.
Not only does j retain its value as control leaves B3 and then enters B5, but a[j], a value
computed into a temporary t5, does too, because there are no assignments to elements
of the array a in the interim.
A similar series of transformations has been done to B6 in the flow graph. The expression
a[t1] in blocks B1 and B6 is not considered a common subexpression, although t1 can
be used in both places.
After control leaves B1 and before it reaches B6, it can go through B5, where there are
assignments to a. Hence, a[t1] may not have the same value on reaching B6 as it did
on leaving B1, and it is not safe to treat a[t1] as a common sub expression.
B5 and B6 after common subexpression elimination.
Copy Propagation
Assignments of the form f := g are called copy statements, or copies for short. The idea
behind the copy-propagation transformation is to use g for f wherever possible after
the copy statement f := g.
Copy propagation means using one variable in place of another. Copy statements are often
introduced during common subexpression elimination.
EXAMPLE 1
This change may not appear to be an improvement, but it gives us the opportunity to eliminate
the assignment to x.
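The figure itself is not reproduced here; the change referred to has the following shape
(a C sketch, with the temporaries t2 to t5 standing in for those of the quicksort flow
graph):

/* Before copy propagation: block B5 ends with a copy and a later use of x. */
void b5_before(int a[], int t2, int t3, int t4, int t5)
{
    int x = t3;    /* copy statement x := t3 */
    a[t2] = t5;
    a[t4] = x;     /* use of x */
}

/* After copy propagation: the use of x is replaced by t3, so the copy
   x := t3 no longer feeds anything and can later be removed as dead code. */
void b5_after(int a[], int t2, int t3, int t4, int t5)
{
    a[t2] = t5;
    a[t4] = t3;
}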
EXAMPLE 2
When the common subexpression in c := d+e is eliminated in the figure given below, the
algorithm uses a new variable t to hold the value of d+e.
Since control may reach c := d+e either after the assignment to a or after the assignment to b,
it would be incorrect to replace c := d+e by either c := a or by c := b.
ADVANTAGE
One advantage of copy propagation is that it often turns the copy statement into dead code.
Dead-Code Elimination
A related idea is dead or useless code, statements that compute values that never get
used.
While the programmer is unlikely to introduce any dead code intentionally, it may
appear as the result of previous transformations.
EXAMPLE 1
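The original figure for this example is not reproduced. A sketch of the standard case,
continuing the copy-propagation example above (the temporaries are assumed names):

/* After copy propagation the copy x = t3 is dead: its value is never used. */
void with_dead_copy(int a[], int t2, int t3, int t4, int t5)
{
    int x = t3;    /* dead code: x is never read below (the compiler may warn) */
    a[t2] = t5;
    a[t4] = t3;
}

/* After dead-code elimination the copy is simply deleted. */
void without_dead_copy(int a[], int t2, int t3, int t4, int t5)
{
    a[t2] = t5;
    a[t4] = t3;
}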
EXAMPLE 2
i=0;
if(i==1)
{
a=b+5;
}
Here, the body of the ‘if’ statement is dead code because the condition can never be satisfied.
Constant Folding
If all the operands in an expression are constants, the expression can be evaluated at
compile time, and the result of the operation can replace the original expression in the
program. This improves run-time performance and reduces code size, since the evaluation
is no longer done at run time.
EXAMPLE
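A small C sketch (the constants are invented for illustration):

#include <stdio.h>

int main(void)
{
    double r = 2.0;

    /* before constant folding: 2 * 3.14 would be evaluated at run time */
    double c1 = 2 * 3.14 * r;

    /* after constant folding: the compiler evaluates 2 * 3.14 = 6.28 at compile time */
    double c2 = 6.28 * r;

    printf("%f %f\n", c1, c2);
    return 0;
}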
Loop Optimization
The principal loop optimizations are:
• Code Motion
• Induction Variables
• Reduction in Strength
Code Motion
An important modification that decreases the amount of code in a loop is code motion.
The execution time of a program can be reduced by moving code from a part of the program
that is executed very frequently to a part that is executed fewer times.
A fragment of code that resides in the loop and computes the same value on each
iteration is called loop-invariant code; it can be moved outside the loop.
EXAMPLE 1
Before code motion:
   while ( ... )
   {
      z := 1;
      x := 25 * a;    /* loop invariant */
      y := x + z;
   }
After code motion, the loop-invariant statement is moved out of the loop:
   x := 25 * a;
   while ( ... )
   {
      z := 1;
      y := x + z;
   }
EXAMPLE 2
The evaluation of limit - 2 is loop-invariant in
while (i <= limit - 2)
Code motion transforms it into
t = limit - 2;
while (i <= t) /* statement does not change limit or t */
Induction Variables
Loops are usually processed inside out. For example, consider the inner loop around B3.
Note that the values of j and t4 remain in lock step: every time the value of j decreases
by 1, that of t4 decreases by 4, because 4 * j is assigned to t4. Such identifiers are
called induction variables.
Reduction In Strength
When there are two or more induction variables in a loop, it may be possible to get
rid of all but one, by the process of induction-variable elimination. For the inner loop
around B3 we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4.
However, we can illustrate reduction in strength and a part of the process of
induction-variable elimination here. Eventually, j will be eliminated when the outer loop
of B2-B5 is considered.
EXAMPLE
As the relationship t4 := 4 * j surely holds after such an assignment to t4 in the figure,
and t4 is not changed elsewhere in the inner loop around B3, it follows that just after the
statement j := j - 1 the relationship t4 = 4 * j + 4 must hold.
We may therefore replace the assignment t4 := 4 * j by t4 := t4 - 4. The only problem is
that t4 does not have a value when we enter block B3 for the first time.
Since we must maintain the relationship t4 = 4 * j on entry to block B3, we place an
initialization of t4 at the end of the block where j itself is initialized, shown by the
dashed addition to block B1 in the figure.
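A C-level sketch of the same transformation (assuming 4-byte integers, so that the byte
offset t4 plays the role of 4 * j; the array and loop bound are invented for illustration):

#include <stdio.h>

int main(void)
{
    int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const char *base = (const char *)a;   /* byte-addressed view: a[t4] is *(int *)(base + t4) */
    int n = 8, j, t4, sum = 0;

    j = n;
    t4 = 4 * j;          /* initialization of t4 placed where j is initialized (block B1) */
    while (j > 0) {
        j = j - 1;
        t4 = t4 - 4;     /* strength reduction: replaces the multiplication t4 = 4 * j */
        sum += *(const int *)(base + t4);   /* t5 = a[t4] */
    }
    printf("sum = %d\n", sum);
    return 0;
}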
Directed Acyclic Graph
In compiler design, a DAG (directed acyclic graph) is an abstract syntax tree with a
unique node for each value. A DAG is a useful data structure for implementing
transformations on basic blocks, and it is constructed from three-address code.
For example, for the sequence
a = b + c
b = a - d
c = b + c
d = a - d
when we construct the node for the third statement c = b + c, we know that the use of b
in b + c refers to the node labeled -, because that is the most recent definition of b.
Applications of DAGs
Determine which names are used in the block but have their values computed outside the block.
Determine which statements of the block compute values that could be used outside the block.
1. In a DAG, leaf nodes represent identifiers, names or constants, and interior nodes
represent operators.
2. While constructing a DAG, a check is made to find whether there is an existing node with
the same operator and the same children. A new node is created only when such a node does
not exist. This allows us to detect common subexpressions and eliminate them, as the
sketch below illustrates.
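A minimal C sketch of this check (an assumed node layout; a production implementation
would hash the triple of operator, left child and right child rather than scan linearly):

#include <stdlib.h>
#include <string.h>

struct DagNode {
    char op;                        /* operator for interior nodes, 0 for leaves */
    const char *label;              /* identifier or constant label for leaves */
    struct DagNode *left, *right;   /* children of an interior node */
};

static struct DagNode *nodes[256];
static int nnodes = 0;

static struct DagNode *new_node(char op, const char *label,
                                struct DagNode *l, struct DagNode *r)
{
    struct DagNode *n = malloc(sizeof *n);
    n->op = op; n->label = label; n->left = l; n->right = r;
    nodes[nnodes++] = n;
    return n;
}

/* Leaf for an identifier or constant; an existing leaf is reused. */
struct DagNode *dag_leaf(const char *label)
{
    for (int i = 0; i < nnodes; i++)
        if (nodes[i]->op == 0 && strcmp(nodes[i]->label, label) == 0)
            return nodes[i];
    return new_node(0, label, NULL, NULL);
}

/* Interior node: reuse an existing node with the same operator and the same
   children. Reusing a node is exactly how a common subexpression is detected. */
struct DagNode *dag_op(char op, struct DagNode *l, struct DagNode *r)
{
    for (int i = 0; i < nnodes; i++)
        if (nodes[i]->op == op && nodes[i]->left == l && nodes[i]->right == r)
            return nodes[i];
    return new_node(op, NULL, l, r);
}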
EXAMPLE 1
EXAMPLE 2
d = b * c
e = a + b
b = b * c
a = e - d
EXAMPLE 3
(a+b)*(a+b+c)
t1 = a + b
t2 = t1 + c
t3 = t1 * t2
EXAMPLE 4
(((a+a)+(a+a))+((a+a)+(a+a)))
The Use of Algebraic Identities
Algebraic identities represent another important class of optimizations on basic blocks.
Identities such as x + 0 = 0 + x = x, x - 0 = x, x * 1 = 1 * x = x and x / 1 = x allow
useless computations to be eliminated. Commutativity and associativity can also expose
common subexpressions. For example, given
a = b + c
e = c + d + b
the right-hand side of the second statement equals (b + c) + d, so it can be computed as
e = a + d
reusing the value already held in a.
6.2.1 ISSUES IN THE DESIGN OF A CODE GENERATOR
The following issues arise during the code generation phase:
1. Input to the code generator
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order
Input to the code generator
The input to the code generator consists of the intermediate representation of the
source program produced by the front end, together with information in the symbol
table that is used to determine the run-time addresses of the data objects denoted by
the names in the intermediate representation.
There are several choices for the intermediate language, including postfix notation,
three-address representations such as quadruples, virtual machine representations such
as stack machine code, and graphical representations such as syntax trees and DAGs.
We assume that prior to code generation the front end has scanned, parsed and translated
the source program into a reasonably detailed intermediate representation, so that the
values of names appearing in the intermediate language can be represented by quantities
the target machine can directly manipulate. We also assume that type checking has taken
place, so type conversion operators have been inserted wherever necessary. The code
generation phase can therefore proceed on the assumption that its input is free of
errors.
Target programs
The output of the code generator is the target program. This output may take on a
variety of forms: absolute machine language, relocatable machine language or
assembly language. Producing an absolute machine language program as output has the
advantage that it can be placed in a fixed location in memory and immediately
executed.
Producing a relocatable machine language program as output allows subprograms to
be compiled separately. A set of relocatable object modules can be linked together and
loaded for execution by a linking loader.
The instruction set architecture of the target machine has a significant impact on the
difficulty of constructing a good code generator that produces high quality machine
code. The most common target machine architectures are RISC (reduced instruction
set computer), CISC (complex instruction set computer) and stack based.
The RISC machine has many registers, three-address instructions, simple addressing
modes and a relatively simple instruction set architecture. In contrast, a CISC machine
has few registers, two-address instructions, a variety of addressing modes, several
register classes, variable length instructions and instructions with side effects.
In stack based machine, operations are done by pushing operands onto the stack and
then performing the operations on the operands at the top of the stack. To achieve
high performance, the top of the stack is typically kept in registers.
Stack-based architectures were revived with the introduction of the Java Virtual Machine
(JVM). The JVM is a software interpreter for Java bytecodes, an intermediate language
produced by Java compilers. The interpreter provides software compatibility across
multiple platforms. To overcome the high performance penalty of interpretation, which can
be on the order of a factor of 10, just-in-time Java compilers translate bytecodes into
machine code immediately before execution.
Memory Management
Mapping variable names to addresses is done co-operatively by the front end and the
code generator. The name and width of each variable are obtained from the symbol table;
the width is the amount of storage needed for that variable. Each three-address statement
is translated into addresses and instructions during code generation, and relative
addresses are computed for each instruction. All labels must be resolved properly; a
backward jump is easier to manage than a forward jump, whose target address is not yet
known.
Instruction Selection
The code generator must map the IR program into a code sequence that can be
executed by the target machine. The complexity of performing this mapping is
determined by factors such as the level of the IR, the nature of the instruction-set
architecture, and the desired quality of the generated code.
If the IR is high level, the code generator may translate each IR statement into a
sequence of machine instructions using code templates. Such statement-by-statement
code generation often produces poor code that needs further optimization. If IR
reflects some of the low-level details of the underlying machine, then the code
generator can use this information to generate more efficient code sequence.
The nature of the instruction set of the target machine has a strong effect on the
difficulty of instruction selection. Uniformity and completeness of the instruction set
are important factors.
If the target machine does not support each data type in a uniform manner, then each
exception to the general rule requires special handling. For example, on some machines
floating-point operations are performed using separate registers. Instruction speeds and
machine idioms are other important factors.
If we do not care about the efficiency of the target program, instruction selection is
straightforward. For each common three-address statement, a general code can be
designed.
Eg: x = y + z
MOV y, R0
ADD z, R0
MOV R0, x
Eg:
a=b+c
d=a+e
MOV b, R0
ADD c, R0
MOV R0, a
ADD e, R0
MOV R0, d
The quality of the generated code is usually determined by its speed and size. On
most machines, a given IR program can be implemented by many different code
sequence, with significant cost difference between the different implementations.
Eg: if the target machine has an increment instruction INC, then the three-address
statement a = a+1 may be implemented more efficiently by the single instruction INC
a, rather than by a more obvious sequence that loads a into a register, adds one to the
register, and then store the result back into a.
MOV a, R0
ADD #1, R0
MOV R0, a
Register Allocation
A key problem in code generation is deciding what values to hold in which registers.
Registers are the fastest storage locations on the target machine, but we usually do not
have enough of them to hold all values. The use of registers is often subdivided into two
subproblems:
Register allocation, during which we select the set of variables that will reside
in registers at each point in the program.
Register assignment, during which we pick the specific register that a variable
will reside in.
The problem becomes more complicated if the target machine has certain conventions on
register use. For example, in the 8085, one of the operands of some operations must be
placed in register A.
Evaluation Order
The order in which computations are performed can affect the efficiency of the target
code. Some orders require fewer registers and instructions than others. Picking the best
order is an NP-complete problem; it can be addressed to some extent by code optimization,
in which the order of instructions may change.
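For instance (a hand-worked sketch, assuming a, b, c, d and e live in memory and only two
registers are available), consider the sequence
t1 := a + b
t2 := c + d
t3 := e - t2
t4 := t1 - t3
Generating code in the order written forces t1 to be stored to memory and reloaded while
t3 is being computed. Reordering the evaluation as t2, t3, t1, t4 lets t1 stay in a
register until it is used, saving a store and a load.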
Above all, the generated target code must be correct. Correctness takes on special
significance because of the number of special cases the code generator might face. Other
design goals are that the code generator should be easy to implement, test and maintain.
6.2.2 TARGET MACHINE
Familiarity with the target machine and its instruction set is a prerequisite for
designing a good code generator.
Our target computer is a byte-addressable machine with four bytes to a word and n
general-purpose registers R0, R1, ..., Rn-1. It has two-address instructions of the
form
op source, destination
in which op is an op-code and source and destination are data fields. It has the
following op-codes (among others):
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
The source and destination fields are not long enough to hold memory addresses, so
certain bit patterns in these fields specify that words following an instruction contain
operands and/or addresses.
The source and destination of an instruction are specified by combining registers and
memory locations with address modes. In what follows, contents(a) denotes the contents of
the register or memory address represented by a. The address modes, together with their
assembly-language forms and associated costs, are as follows:
MODE                FORM    ADDRESS                       ADDED COST
absolute            M       M                             1
register            R       R                             0
indexed             c(R)    c + contents(R)               1
indirect register   *R      contents(R)                   0
indirect indexed    *c(R)   contents(c + contents(R))     1
literal             #c      c                             1
Instruction Cost
The cost of an instruction is one plus the costs associated with the source and
destination address modes, indicated by the added-cost column in the table above.
This cost corresponds to the length of the instruction. Address modes involving
registers have cost zero, while those with a memory location or literal in them have
cost one, because such operands have to be stored with the instruction.
1. The instruction MOV R0, R1 copies the contents of register R0 into register R1.
This instruction has cost one, since it occupies only one word of memory.
2. The (store) instruction MOV R5 , M copies the contents of register R5 into memory
location M. This instruction has cost two, since the address of memory location M
is in the word following the instruction.
3. The instruction ADD #1, R3 adds the constant 1 to the contents of register R3, and
has cost two, since the constant 1 must appear in the next word following the
instruction.
4. The instruction SUB 4(R0), *12(R1) stores the value
contents(contents(12 + contents(R1))) - contents(4 + contents(R0))
into the destination *12(R1). The cost of this instruction is three, since the constants 4
and 12 are stored in the next two words following the instruction.
Consider the three-address statement a := b + c. It can be implemented by many different
code sequences, for example:
1. MOV b, R0
   ADD c, R0          cost = 6
   MOV R0, a
2. MOV b, a
ADD c, a cost = 6
Assuming R0, R1 and R2 contain the addresses of a, b and c, respectively, we can use:
3. MOV *R1, *R0
ADD *R2, *R0 cost = 2
Assuming R1 and R2 contain the values of b and c, respectively, and that the value of b is not
needed after the assignment, we can use:
4. ADD R2, R1
MOV R1, a cost = 3
For a three-address statement a := b + c, we can generate the single instruction
ADD Rj, Ri with cost one, leaving the result a in register Ri.
This sequence is possible only if register Ri contains b, Rj contains c, and b is not live
after the statement; that is, b is not used after the statement.
If Ri contains b but c is in memory, we can instead use
ADD c, Ri            cost = 2
or
MOV c, Rj
ADD Rj, Ri           cost = 3
The code generation algorithm uses descriptors to keep track of register contents and
the addresses of names.
1. A register descriptor keeps track of what is currently in each register. It is
consulted whenever a new register is needed. Initially, all registers are empty.
2. An address descriptor keeps track of the locations where the current value of a
name can be found at run time. The location might be a register, a stack location or a
memory address. This information can be stored in the symbol table and is used to
determine the accessing method for a name.
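A rough C sketch of the two descriptors (an assumed layout, for illustration only):

#define MAX_NAMES_PER_REG 4

/* Register descriptor: which names currently have their value in this register. */
struct RegDescriptor {
    const char *names[MAX_NAMES_PER_REG];
    int count;          /* 0 means the register is empty */
};

/* Address descriptor: where the current value of a name can be found. */
struct AddrDescriptor {
    int in_memory;      /* nonzero if the value is in the name's memory location */
    int in_register;    /* nonzero if the value is in a register ...             */
    int reg_no;         /* ... namely this one, when in_register is set          */
};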
A code-generation algorithm
For each three-address statement x := y op z, the algorithm performs the following
actions:
1. Invoke a function getreg to determine the location L where the result of the
computation y op z should be stored.
2. Consult the address descriptor for y to determine y’, the current location of y.
Prefer the register for y’ if the value of y is currently both in memory and a
register. If the value of y is not already in L, generate the instruction MOV y’ , L
to place a copy of y in L.
3. Generate the instruction OP z', L where z' is a current location of z. Prefer a
register to a memory location if z is in both. Update the address descriptor of x
to indicate that x is in location L. If L is a register, update its descriptor to
indicate that it contains the value of x, and remove x from all other register
descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the
block, and are in registers, alter the register descriptor to indicate that, after
execution of x := y op z, those registers will no longer contain y or z.
The function getreg returns the location L as follows:
1. If the name y is in a register that holds the value of no other names, and y is not
live and has no next use after x := y op z, then return the register of y for L. Update
the address descriptor of y to indicate that y is no longer in that register.
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if x has a next use in the block, or op is an operator (such as indexing)
that requires a register, find an occupied register R. Store the value of R into a memory
location (by MOV R, M) if it is not already in the proper memory location M, update the
address descriptor for M, and return R. If R holds the values of several variables, a MOV
instruction must be generated for each variable that needs to be stored. A suitable
occupied register might be one whose datum is referenced furthest in the future, or
one whose value is also in memory. We leave the exact choice unspecified, since there
is no one proven best way to make the selection.
4. If x is not used in the block, or no suitable occupied register can be found, select
the memory location of x as L.
Generating Code For Assignment Statements
The assignment d := (a - b) + (a - c) + (a - c) might be translated into the following
three-address code sequence:
t := a - b
u := a - c
v := t + u
d := v + u
with d live at the end.
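One possible result of running the code-generation algorithm on this sequence is sketched
below; the exact registers chosen depend on getreg.

Statement     Code generated      Register descriptor          Address descriptor
t := a - b    MOV a, R0           R0 contains t                t in R0
              SUB b, R0
u := a - c    MOV a, R1           R0 contains t                t in R0
              SUB c, R1           R1 contains u                u in R1
v := t + u    ADD R1, R0          R0 contains v                v in R0
                                  R1 contains u                u in R1
d := v + u    ADD R1, R0          R0 contains d                d in R0
              MOV R0, d                                        d in R0 and memory

The final MOV R0, d stores d because d is live on exit from the block.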
PEEPHOLE OPTIMIZATION
Peephole optimization is a simple and effective technique for locally improving target
code. The technique examines a short sequence of target instructions (called the
peephole) and replaces it by a shorter or faster sequence whenever possible. The peephole
is a small, moving window on the target program.
The following characteristic transformations can be applied through the peephole:
1. Elimination of redundant loads and stores
MOV R0, x
MOV x, R0
• We can eliminate the second instruction, since x is already in R0. However, if MOV x, R0
has a label, we cannot remove it, because we cannot be sure that the first instruction is
always executed immediately before it.
2. Unreachable Code
Unreachable code is a part of the program that is never executed because of the
surrounding programming constructs. Programmers may accidentally write a piece of code
that can never be reached.
EXAMPLE
int add_ten(int x)
{
    return x + 10;
    printf("x = %d\n", x);    /* unreachable */
}
In this code segment, the printf statement will never be executed, since control returns
from the function before reaching it; hence the printf can be removed.
3. Flow-of-control optimizations
Unnecessary jumps can be eliminated. Consider the following code:
...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
In this code, the statement at L1 can be removed, as it merely passes control to L2. So
instead of jumping to L1 and then to L2, the control can directly reach L2, as shown below:
...
MOV R1, R2
GOTO L2
...
L2 : INC R1
4. Algebraic simplification
There are occasions where algebraic expressions can be simplified. For example, the
statement a = a + 0 has no effect and can be eliminated, and a = a * 1 can likewise be
removed.
5. Reduction in strength
Some operations consume more time and space than equivalent, cheaper ones. Their
'strength' can be reduced by replacing them with other operations that consume less time
and space but produce the same result.
For example, x * 2 can be replaced by x << 1, which involves only a single left shift.
Although x^2 and x * x produce the same value, x * x is much cheaper to implement than a
call to an exponentiation routine.
6. Machine idioms
The target machine may have hardware instructions that implement certain specific
operations efficiently. Detecting situations that permit the use of these instructions
can reduce execution time significantly. For example, on machines with an increment
instruction, the statement a = a + 1 can be implemented by the single instruction INC a,
and auto-increment or auto-decrement addressing modes can be used where available.