Atcd-Unit-5 (1) - 2
Atcd-Unit-5 (1) - 2
UNIT-5
Machine Dependent Phases
Syllabus:
UNIT-V: Machine Dependent Phases
Intermediate Code Generation: Intermediate code, three address code,
quadruples, triples, directed acyclic graph.
Code Optimization: Common sub expression elimination, copy propagation, dead
code elimination, constant folding, strength reduction, loop optimization.
Code Generation: Basic blocks & flow graphs, Peephole optimization, Register
allocation and assignment.
Example: x = (a + b * c) / (a – b * c)
AST DAG
A node in a Directed Acyclic Graph (DAG) may have more than one
parent.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Problem: Consider the following expression and construct a DAG for it-
(((a+a)+(a+a))+((a+a)+(a+a)))
Solution-
Postfix Notation:
Also known as reverse Polish notation or suffix notation.
The ordinary (infix) way of writing the sum of a and b is with an
operator in the middle: a + b The postfix notation for the same
expression places the operator at the right end as ab+
In general, if e1 and e2 are any postfix expressions, and + is any binary
operator, the result of applying + to the values denoted by e1 and e2 is
postfix notation by e1e2 +
No parentheses are needed in postfix notation because the position and
arity (number of arguments) of the operators permit only one way to
decode a postfix expression.
Unlike infix notation, postfix notation places the operator after the two
operands. And, it doesn’t use any parenthesis to group expressions.
In postfix notation, the operator follows the operand.
Example 1: The postfix representation of the expression (a + b) * c is :
ab + c *
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Infix Expression: ( ( A + ( B * C ) ) / ( D + E ) )
Postfix Expression: A B C* + D E + /
Infix Expression: ( ( A + B ) * ( C + E ) )
Postfix Expression: A B + C E + *
Infix Expression: ( A * ( B * ( ( ( C + A) + B) * C ) ) )
Postfix Expression: ABC A + B + C * * *
Infix Expression: ( ( H * ( ( ( ( A + ( ( B + C ) * D ) ) * F ) * G ) * E ) ) + J)
Postfix Expression: H A B C + D * + F * G * E * * J +
Three-Address Code (TAC)
A statement involving no more than three addresses (two for operands
and one for result) is known as a three address statement. A sequence of
three address statements is known as a three address code.
Three address statement is of form x = y op z, where x, y, and z will
have address (memory location). Sometimes a statement might contain
less than three references but it is still called a three address statement.
Three address codes is a type of intermediate code which is easy to
generate and can be easily converted to machine code.
It makes use of at most three addresses and one operator to represent an
expression and the value computed at each instruction is stored in
temporary variable generated by compiler.
The compiler decides the order of operation given by three address
code.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
General representation:
a = b op c
Where a, b or c represents operands like names, constants or compiler
generated temporaries and op represents the operator
Example: The three address code for the expression a + b * c + d is
t1 = b * c
t2 = a + t1
t3 = t2 + d
Where t1 , t2 , t3 are temporary variables.
Implementation of Three Address Codes:
(Data structures for three address codes)
There are 3 representations of three address code namely
1. Quadruples
2. Triples
3. Indirect Triples
Quadruples:
Quadruple is defined as a record structure used to represent a three-
address statement.
It consists of four fields. The first field contains the operator, the second
and third fields contain the operand 1 and operand 2, respectively, and the
last field contains the result of that three-address statement.
Advantages:
Easy to rearrange code for global optimization.
One can quickly access value of temporary variables using
symbol table.
Disadvantage –
Contain lot of temporaries.
Temporary variable creation increases time and space complexity.
Example: Consider expression a = b * – c + b * – c.
The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Triples
A triple is also defined as a record structure that is used to represent a
three address statement.
In triples, for representing any three-address statement three fields are
used, namely, operator, operand 1 and operand 2, where operand 1 and
operand 2 are pointers to either symbol table or they are pointers to the
records (for temporary variables) within the triple representation itself.
In this representation, the result field is removed to eliminate the use of
temporary names referring to symbol table entries. Instead, we refer the
results by their positions.
The pointers to the triple structure are represented by parenthesized
numbers, whereas the symbol-table pointers are represented by the names
themselves.
Note that instead of referring the temporary t by its name, we refer it by
its position in the triple.
Disadvantages:
Temporaries are implicit and difficult to rearrange code.
It is difficult to optimize because optimization involves moving
intermediate code. When a triple is moved, any other triple
referring to it must be updated also. With help of pointer one can
directly access symbol table entry.
Example:Consider expression a = b * – c + b * – c
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Indirect Triples:
An indirect triple representation consists of an additional array that
contains the pointers to the triples in the desired order
Its similar in utility as compared to quadruple representation but
requires less space than it.
Temporaries are implicit and easier to rearrange code.
Question: Write quadruple, triples and indirect triples for following expression
: (x + y) * (y + z) + (x + y + z)
Explanation: The three address code is:
t1 = x + y
t2 = y + z
t3 = t1 * t2
t4 = t1 + z
t5 = t3 + t4
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Problem:
Convert S = -z/a * (x + y) into three address code,where Unary minus is represented
by -z.
The following is the three-address code:
t1 = x + y
t2 = a * t1
t3 = - z
t4 = t3/t2
S = t4
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Problem: Construct Quadruples, Triples, and Indirect Triples for the expression
-(a + b) * (c + d) - (a + b + c)
Solution
First of all this statement will be converted into Three Address Code as:
t1 = a + b
t2 = −t1
t3 = c + d
t4 = t2 ∗ t3
t5 = t1 + c
t6 = t4 − t5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Code Optimization
Code Optimization is an approach to improve the code by eliminating
unnecessary code lines and arranging the statement in such a sequence that
speed up the program execution without changing the meaning of the
program
The code produced by the straight forward compiling algorithms can often be made to r
faster or take less space, or both. This improvement is achieved by program transforma
that are traditionally called optimizations. Compilers that apply code-improving
transformations are called optimizing compilers.
The process of code optimization involves:
Eliminating the unwanted code lines
Rearranging the statements of the code
The criteria for code Optimization:
The transformation must preserve the meaning of programs. That
is, the optimization must not change the output produced by a
program for a given input.
A transformation must, on the average, speed up programs by a
measurable amount.
The transformation must be worth the effort. i.e., yield the most
benefit for the least effort.
Advantages of Code optimization:
Optimized code has faster execution speed.
Optimized code utilizes the memory efficiently.
Optimized code gives better performance.
Optimizations are classified into two categories. They are
Machine independent optimizations
Machine dependant optimizations
Machine independent optimizations: Machine independent optimizations
are program transformations that improve the target code without taking
into consideration any properties of the target machine.
Machine dependant optimizations: Machine dependant optimizations are
based on register allocation and utilization of special machine-instruction
sequences
Common Types of Code Optimization Techniques
(Principle Sources of Optimization (or) Structure-Preserving Transformation)
Local Optimization:
1. Common sub-expression elimination
2. Dead Code Elimination
3. Compile Time Evaluation
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Loop Optimization:
4. Code Movement
5. Strength Reduction
6. Induction Variable elimination
Common Sub-Expression Elimination:
An occurrence of an expression E is called a common sub-expression if E
was previously computed, and the values of variables in E have not
changed since the previous computation.
We can avoid re-computing the expression if we can use the previously
computed value.
In this technique,
As the name suggests, it involves eliminating the common sub
expressions.
The redundant expressions are eliminated to avoid their re-
computation.
The already computed result is used in the further program when
required.
Example 1:
a:= b+c
b:=a-d
c:=b+c
d:=a-d
in the above code a-d is common sub expression but b+c is not common
sub expression
The above code can be optimized using the common sub-expression
elimination as
a:= b+c
b:=a-d
c:=b+c
d:=b
Example 2:
Code Before Optimization Code After Optimization
i=0;
if (i == 1)
{ i=0;
a=x+5;
}
Here, ‘if’ statement is dead code because this condition will never get
satisfied
Example 1:
Circumference of Circle = (22/7) x Diameter
Here,
This technique evaluates the expression 22/7 at compile time.
The expression is then replaced with its result 3.14.
This saves the time at run time.
Example 2:
For example, a=3.14157/2 can be replaced by a=1.570 thereby eliminating a
division operation.
ii) Constant Propagation:
In this technique,
If some variable has been assigned some constant value, then it replaces
that variable with its constant value in the further program during
compilation.
The condition is that the value of variable must not get alter in between.
Example:
radius = 10
Area = pi x radius x radius
Here,
This technique substitutes the value of variables ‘pi’ and ‘radius’ at
compile time.
It then evaluates the expression 3.14 x 10 x 10.
The expression is then replaced with its result 314.
This saves the time at run time.
Code Movement or Frequency Reduction
(Moving loop-invariant computation)
An important modification that decreases the amount of code in a loop is
code motion. This transformation takes an expression that yields the same
result independent of the number of times a loop is executed ( a loop-
invariant computation) and places the expression before the loop.
This technique reduces the execution frequency of expression by moving
the code.
In this technique,
As the name suggests, it involves movement of the code.
The code present inside the loop is moved out if it does not matter
whether it is present inside or outside.
Such a code unnecessarily gets execute again and again with each
iteration of the loop.
This leads to the wastage of time at run time.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Example1:
Code Before Optimization Code After Optimization
Strength Reduction:
In this technique,
As the name suggests, it involves reducing the strength of expressions.
This technique replaces the expensive and costly operators with the
simple and cheaper ones.
For example, x*x cheaper to implement as x+ x. x² is invariably cheaper
to implement as x*x than as a call to an exponentiation routine. Fixed-
point multiplication or division by a power of two is cheaper to
implement as a shift. Floating-point division by a constant can be
implemented as multiplication by a constant, which may be cheaper.
Example:
B=Ax2 B=A+A
Here,
The expression “A x 2” is replaced with the expression “A + A”.
This is because the cost of multiplication operator is higher than that
of addition operator.
Induction variable Elimination:
A basic induction variable is a variable X whose only definitions within
the loop are assignments of the form: X = X+c or X = X-c, where c is
either a constant or a loop-invariant variable. (e.g., i)
III IT-I SEM Regulation: R20 ATCD: UNIT-5
s=3*i + 1;
s=3*i + 1; While(s<31)
While(i<10) {
{ a[s] = a[s]-2;
a[s] = a[s]-2; s=s+6;
i= i+2; }
s=s+6;
}
Here s and i are induction variables. We can eliminate i from the loop.
Code Generation
The final phase in compiler model is the code generator.
It is machine dependent phase.
It takes as input an intermediate representation of the source program and
produces as output an equivalent target program.
It is the most complex phase of a compiler, since it depends not only on
the characteristics of the source language but also on detailed information
about the target machine architecture, the structure of the run time
environment and the operating system running on the target machine.
The position of code generator in compilation process is illustrated by
following figure.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
5. Register allocation:
Instructions involving register operands are shorter and faster than those
involving operands in memory.
The use of registers is subdivided into two sub problems :
Register allocation – the set of variables that will reside in registers at
a point in the program is selected.
Register assignment – the specific register that a variable will reside in
is picked.
Certain machine requires even-odd register pairs for some operands and
results. For example , consider the division instruction of the form :
D x, y
Where, x – dividend even register in even/odd register pair
y – divisor
even register holds the remainder
odd register holds the quotient
6. Evaluation order:
The order in which the computations are performed can affect the
efficiency of the target code.
Some computation orders require fewer registers to hold intermediate
results than others.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Example 2:
1) i = 1
2) j = 1
3) t1 = 10 * i
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
B5 for statement 12
B6 for statement 13-17.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Advantage
Heavily used values reside in registers
Disadvantage
Does not consider non-uniform distribution of uses
Global register allocation
Local allocation does not take into account that some instructions (e.g.
those in loops) execute more frequently. It forces us to store/load at basic
block endpoints since each block has no knowledge of the context of
others.
To find out the live range(s) of each variable and the area(s) where the
variable is used/defined global allocation is needed. Cost of spilling will
depend on frequencies and locations of uses.
Register allocation depends on:
Size of live range
Number of uses/definitions
Frequency of execution
Number of loads/stores needed.
Cost of loads/stores needed.
Register Allocation by Usage Counts
A slightly more sophisticated method for global register allocation is
called usage counts.
In this method, registers are allocated first to the variables that are used
the most.
Example :
Consider the following loop:
LOOP: X = 2 * E
Z=Y+X+1
IF some condition THEN
Y=Z+Y
D=Y-1
ELSE Z = X - Y
D=2
ENDIF
X=Z+D
Z=X
ENDLOOP
Here, there are five references to X, and Z, five references
to Y, three references to D, and one to E. Thus, if there are three
registers, a reasonable approach would be to allocate X and Z to
two of them, saving the third for local computations.
III IT-I SEM Regulation: R20 ATCD: UNIT-5
Tutorial Questions:
1. What is role of intermediate Code generator in compilation process? Explain Various
Forms Of Intermediate Codes Used By Compiler.
2. Explain various methods of implementing three address statements with suitable
examples.
(or) Explain about quadruples, triples and indirect triples of three-address statements
of intermediate code
3. Write quadruples, triples and indirect triples for the expression:
-(a*b)+(c+d)-(a+b+c+d)
(or) For the given expression generate different kinds of three-address
codes -(a*b)+(c+d)-(a+b+c+d)
4. Write the short note on: (i) Abstract syntax tree (ii) Polish notation
(iii) Three address code
III IT-I SEM Regulation: R20 ATCD: UNIT-5
5. Construct abstract syntax tree & DAG for the assignment statement
x:=a*b+c-a*b+d
6. Write the quadruple, triple, indirect triples for the statement
a:= b * -c + b * -c
7. Translate the assignment A:= -B*(C+D) into following
i) Quadruple ii) Triples iii) Indirect triples
8. Translate the expression -(a+b)*(c+d)+(a+b+c) into the following
i) Quadruples ii) Triples iii) Indirect triples
9. Convert the following arithmetic expressions into Abstract syntax tree, DAG, postfix
notation and three-address code:
i) b*-(a+b)
ii) a+b*(a+b)+c+d
10. What is code optimization? Compare machine dependent and independent code
optimization techniques.
11. What is code optimization? Explain about various levels and types of optimizations
(or) Explain different principle sources of optimization techniques with suitable
examples
12. Discuss about principal sources of optimization.
13. Write short note on
a. Constant Folding
b. Dead Code Elimination
c. Code Motion
d. Induction Variable Elimination
14. Discuss briefly various loop optimization techniques.
15. Define flow graph. Explain the optimization of Basic Blocks.
16. Write about all issues in code generation. Describe it.
(or) Explain the different issues in the design of a code generator
17. Explain the peephole optimization Techniques?
(or) Discuss the transformations that are characteristic of peephole optimizations.
(or) What kinds of peephole techniques can be used to perform machine-dependent
optimizations?+
18. What is a basic block and flow graph? Explain how flow graph can be constructed for
a given program.
19. What is peephole optimization? How can it be performed? Give its role in code
generation.
20. Discuss about register allocation and assignment in target code generation.