Intermediate Code Generator
Graphical Representations
A syntax tree depicts the natural hierarchical structure of a source
program. A DAG (Directed Acyclic Graph) gives the same
information in a more compact way, because common subexpressions are identified. A syntax tree for
the assignment statement a:=b*-c+b*-c is shown below.
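Syntax Tree (array representation: each row is one node; the numbers after uminus, *, + and assign are the indices of that node's children):
0   id      b
1   id      c
2   uminus  1
3   *       0 2
4   id      b
5   id      c
6   uminus  5
7   *       4 6
8   +       3 7
9   id      a
10  assign  9 8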
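To make the tree/DAG difference concrete, here is a small Python sketch (not from the notes; `NodeArray` and `build_assignment` are hypothetical names). It builds the node array for a:=b*-c+b*-c twice: once as a plain tree, and once as a DAG, where a node whose (operator, children) triple already exists is reused instead of being created again.

```python
from typing import Dict, List, Optional, Tuple, Union

# One node: (op, left, right). For a leaf, left is the identifier name and
# right is None; for an interior node, left/right are indices of children.
Node = Tuple[str, Union[str, int, None], Optional[int]]


class NodeArray:
    """Array representation of a syntax tree (share=False) or a DAG (share=True)."""

    def __init__(self, share: bool) -> None:
        self.share = share
        self.nodes: List[Node] = []
        self.seen: Dict[Node, int] = {}

    def make(self, op: str, left: Union[str, int, None] = None,
             right: Optional[int] = None) -> int:
        node: Node = (op, left, right)
        if self.share and node in self.seen:
            return self.seen[node]            # common subexpression: reuse it
        self.nodes.append(node)
        self.seen.setdefault(node, len(self.nodes) - 1)
        return len(self.nodes) - 1


def build_assignment(arr: NodeArray) -> int:
    """Build a := b * -c + b * -c and return the index of the assign node."""

    def b_times_minus_c() -> int:
        b = arr.make("id", "b")
        minus_c = arr.make("uminus", arr.make("id", "c"))
        return arr.make("*", b, minus_c)

    plus = arr.make("+", b_times_minus_c(), b_times_minus_c())
    return arr.make("assign", arr.make("id", "a"), plus)


tree, dag = NodeArray(share=False), NodeArray(share=True)
build_assignment(tree)
build_assignment(dag)
print(len(tree.nodes), len(dag.nodes))   # 11 7
```

The tree needs nodes 0 to 10, in the same order as the array above; the DAG builds b*-c only once, so the + node points to the same child twice and only 7 nodes are needed.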
S → id := E
E → E1 + E2
E → E1 * E2
E → - E1
E → ( E1 )
E → id
Bottom-up translation of x := (y + z) * (-w + v); E.code accumulates the three-address statements generated so far:
1. E → id | E.place := y
2. E → id | E.place := z
3. E → E + E | E.place := t1; E.code := gen(t1 = y + z)
4. E → (E) | E.place := t1; E.code := gen(t1 = y + z)
5. E → id | E.place := w
6. E → -E | E.place := t2; E.code := gen(t2 = -w)
7. E → id | E.place := v
8. E → E + E | E.place := t3; E.code := gen(t2 = -w, t3 = t2 + v)
9. E → (E) | E.place := t3; E.code := gen(t2 = -w, t3 = t2 + v)
10. E → E * E | E.place := t4; E.code := gen(t1 = y + z, t2 = -w, t3 = t2 + v, t4 = t1 * t3)
11. S → id := E | S.code := gen(t1 = y + z, t2 = -w, t3 = t2 + v, t4 = t1 * t3, x = t4)
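The trace above can be reproduced with a short recursive translator. This is a sketch under stated assumptions: expressions are nested Python tuples (a hypothetical encoding, with "neg" standing for unary minus), and each call returns the pair (E.place, E.code) just as the synthesized attributes do.

```python
from itertools import count
from typing import List, Tuple, Union

# Expression trees: an identifier is a str; ("+", e1, e2), ("*", e1, e2) and
# ("neg", e1) stand for E -> E1 + E2, E -> E1 * E2 and E -> -E1.
Expr = Union[str, tuple]


def translate_assignment(target: str, expr: Expr) -> List[str]:
    """Return S.code for S -> target := expr."""
    temps = count(1)

    def newtemp() -> str:
        return f"t{next(temps)}"

    def gen(e: Expr) -> Tuple[str, List[str]]:
        """Return (E.place, E.code) for expression e."""
        if isinstance(e, str):                  # E -> id: no code, place is id
            return e, []
        if e[0] == "neg":                       # E -> -E1
            place1, code1 = gen(e[1])
            place = newtemp()
            return place, code1 + [f"{place} = -{place1}"]
        op, left, right = e                     # E -> E1 op E2
        place1, code1 = gen(left)
        place2, code2 = gen(right)
        place = newtemp()
        return place, code1 + code2 + [f"{place} = {place1} {op} {place2}"]

    place, code = gen(expr)
    return code + [f"{target} = {place}"]      # S -> id := E


# x := (y + z) * (-w + v); parentheses (E -> (E1)) only group, so they
# need no node of their own.
stmt = translate_assignment("x", ("*", ("+", "y", "z"), ("+", ("neg", "w"), "v")))
print(stmt)
# ['t1 = y + z', 't2 = -w', 't3 = t2 + v', 't4 = t1 * t3', 'x = t4']
```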
Productions and semantic actions:
1. S → L := E
{ if L.offset = null then /* L is a simple id */
emit(L.place := E.place)
else
emit(L.place [ L.offset ] := E.place) }
2. E → E1 + E2
{ E.place := newtemp;
emit(E.place := E1.place + E2.place) }
3. E → ( E1 )
{ E.place := E1.place }
4. E → L
{ if L.offset = null then /* L is a simple id */
E.place := L.place
else begin
E.place := newtemp;
emit(E.place := L.place [ L.offset ])
end }
5. L → Elist ]
{ L.place := newtemp;
L.offset := newtemp;
emit(L.place := c(Elist.array));
emit(L.offset := Elist.place * width(Elist.array)) }
6. L → id
{ L.place := id.place;
L.offset := null }
7. Elist → Elist1 , E
{ t := newtemp;
m := Elist1.ndim + 1;
emit(t := Elist1.place * limit(Elist1.array, m));
emit(t := t + E.place);
Elist.array := Elist1.array;
Elist.place := t;
Elist.ndim := m }
8. Elist → id [ E
{ Elist.array := id.place;
Elist.place := E.place;
Elist.ndim := 1 }
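Rules 5, 7 and 8 together compute the subscript part of an array reference one dimension at a time. The Python sketch below follows those three rules for a reference with any number of subscripts. The function name and the representation of limit(array, m) as a list `limits` are assumptions made for illustration.

```python
from itertools import count
from typing import Iterator, List, Tuple


def array_reference(array: str, subscripts: List[str], limits: List[str],
                    width: str, temps: Iterator[int],
                    code: List[str]) -> Tuple[str, str]:
    """Emit code for array[subscripts] and return (L.place, L.offset).

    limits[m - 1] plays the role of limit(array, m), the size of dimension m.
    """
    # Rule 8, Elist -> id [ E : the first subscript starts the recurrence.
    place = subscripts[0]
    # Rule 7, Elist -> Elist1 , E : once per further subscript (m = ndim + 1).
    for m in range(2, len(subscripts) + 1):
        t = f"t{next(temps)}"
        code.append(f"{t} := {place} * {limits[m - 1]}")
        code.append(f"{t} := {t} + {subscripts[m - 1]}")
        place = t
    # Rule 5, L -> Elist ] : constant part c(array) and the variable offset.
    base, offset = f"t{next(temps)}", f"t{next(temps)}"
    code.append(f"{base} := c({array})")
    code.append(f"{offset} := {place} * {width}")
    return base, offset


code: List[str] = []
print(array_reference("X", ["i", "j"], ["d1", "d2"], "width(X)", count(1), code))
print(code)
```

For X[i, j] this emits t1 := i * d2, t1 := t1 + j, t2 := c(X) and t3 := t1 * width(X), and returns ('t2', 't3') as L.place and L.offset.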
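Formula to remember
For a two-dimensional array stored row by row, with n2 elements in the second dimension and element width w, the address of A[i1, i2] is the sum of two parts:
1) ((i1*n2)+i2)*w, which depends on the subscripts;
2) baseAddr(array)-n2*w, which is known at compile time.
In general the constant subtracted in 2) is ((low1*n2)+low2)*w, where low1 and low2 are the lower bounds of the subscripts; n2*w is its value when low1 = 1 and low2 = 0.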
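A quick numeric check that the split form gives the same address as counting elements directly. This is a sketch that assumes row-major storage; the function names and the lower-bound parameters `low1` and `low2` are introduced here for illustration.

```python
def address_direct(base: int, i1: int, i2: int, low1: int, low2: int,
                   n2: int, w: int) -> int:
    """Row-major address of A[i1, i2]: skip (i1 - low1) rows, then (i2 - low2) elements."""
    return base + ((i1 - low1) * n2 + (i2 - low2)) * w


def address_split(base: int, i1: int, i2: int, low1: int, low2: int,
                  n2: int, w: int) -> int:
    """Same address, as formula 1) plus formula 2)."""
    variable_part = ((i1 * n2) + i2) * w        # 1) needs the subscripts
    c = ((low1 * n2) + low2) * w                # known at compile time
    return variable_part + (base - c)           # 2) baseAddr(array) - c


# Example: base 1000, 20 elements per row, width 4, low1 = 1, low2 = 0,
# so c = n2 * w = 80.
print(address_direct(1000, 3, 5, 1, 0, 20, 4), address_split(1000, 3, 5, 1, 0, 20, 4))
# 1180 1180
```

Only the variable part has to be computed at run time; base - c can be folded into a single constant per array by the compiler.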
Consider the statement X[i, j] = Y[i + j, k] + Z[k, l], where
d1, d2 are the dimensions of X;
d3, d4 are the dimensions of Y;
d5, d6 are the dimensions of Z;
and each element has width w = 4.
The three-address statements for this statement are as follows.
For array Y
t1 = i + j
t2 = t1 * d4
t3 = t2 + k
t4 = t3 * w
t5 = addr(Y) - c, where c = d4 * 4
t6 = t5[t4]
For array Z
t7 = k * d6
t8 = t7 + l
t9 = t8 * w
t10 = addr(Z) - c, where c = d6 * 4
t11 = t10[t9]
For array X
t12 = i * d2
t13 = t12 + j
t14 = t13 * w
t15 = addr(X) - c, where c = d2 * 4
t16 = t6 + t11
t15[t14] = t16
t1, t2, ..., t16 are temporaries (address locations) generated using newtemp
in syntax-directed translation.
Example:
Consider the statement:
X[i, j] := Y[i + j, k] + z
The maximum dimensions of X are [d1, d2] and of Y are [d3, d4].
The intermediate three-address code for this statement can be
written by intuition as follows:
1. t1 := i * d2
2. t2 := t1 + j
3. t3 := addrX - c, where c = d2 * width(X)
4. t4 := t2 * width(X)
5. t5 := i + j
6. t6 := t5 * d4
7. t7 := t6 + k
8. t8 := addrY - c, where c = d4 * width(Y)
9. t9 := t7 * width(Y)
10. t10 := t8[t9]
11. t11 := t10 + z
12. t3[t4] := t11
Figure: annotated parse tree for X[i, j] := Y[i + j, k] + z, with its nodes numbered 1 to 22.
NOTE: The number associated with each node is the step at which its production is applied (1 to 22, bottom-up); the semantic actions below are numbered the same way.
The corresponding semantic actions and three-address codes generated are as follows:
Step | Semantic action | Three-address code
1 | L.place := i |
2 | E.place := i |
3 | Elist.array := X; Elist.place := i; Elist.ndim := 1 |
4 | L.place := j |
5 | E.place := j |
6 | m := 2; Elist.array := X; Elist.place := t1; Elist.ndim := 2 | t1 := i * d2; t1 := t1 + j
7 | L.place := t2; L.offset := t3 | t2 := addr X - c; t3 := t1 * width(X)
8 | L.place := i |
9 | E.place := i |
10 | L.place := j |
11 | E.place := j |
12 | E.place := t4 | t4 := i + j
13 | Elist.array := Y; Elist.place := t4; Elist.ndim := 1 |
14 | L.place := k |
15 | E.place := k |
16 | m := 2; Elist.array := Y; Elist.place := t5; Elist.ndim := 2 | t5 := t4 * d4; t5 := t5 + k
17 | L.place := t6; L.offset := t7 | t6 := addr Y - c; t7 := t5 * width(Y)
18 | E.place := t8 | t8 := t6[t7]
19 | L.place := z |
20 | E.place := z |
21 | E.place := t9 | t9 := t8 + z
22 | | t2[t3] := t9
NOTE: Apart from the names of the temporaries, the three-address code generated by the above method is the same as the code
written by intuition, which confirms the method on this example.
Boolean Expressions:
Boolean expressions are composed of the Boolean operators (and, or, not) applied to
elements that are Boolean variables or relational expressions.
Example:
a or b and not c
t1 = not c
t2 = b and t1
t3 = a or t2
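Example:
The relational expression a<b is equivalent to "if a<b then 1 else 0" and is translated as follows, where nextstat is the index of the next three-address statement to be emitted (it advances by one after each statement):
if a<b goto nextstat+3
t1 := 0
goto nextstat+2
t1 := 1
If the first statement is placed at index 100, this gives 100: if a<b goto 103; 101: t1 := 0; 102: goto 104; 103: t1 := 1.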
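The nextstat arithmetic is easy to get wrong, so here is a small Python sketch of it. `Emitter` and `translate_relop` are hypothetical names; the only assumption is the one stated above, that each emit places a statement at nextstat and then advances nextstat by one.

```python
from typing import List, Tuple


class Emitter:
    """Hypothetical helper that numbers statements, like nextstat in the notes."""

    def __init__(self, start: int) -> None:
        self.nextstat = start                 # index of the next statement
        self.code: List[Tuple[int, str]] = []

    def emit(self, stmt: str) -> None:
        self.code.append((self.nextstat, stmt))
        self.nextstat += 1                    # emit advances nextstat


def translate_relop(em: Emitter, a: str, op: str, b: str, place: str) -> None:
    """E -> a relop b in numerical form: place gets 1 if the test holds, else 0."""
    em.emit(f"if {a}{op}{b} goto {em.nextstat + 3}")
    em.emit(f"{place} := 0")
    em.emit(f"goto {em.nextstat + 2}")
    em.emit(f"{place} := 1")


em = Emitter(100)
translate_relop(em, "a", "<", "b", "t1")
for index, stmt in em.code:
    print(f"{index}: {stmt}")
```

The target of each goto is computed before its own statement is emitted, which is why the offsets are +3 and +2 rather than +4 and +3.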
Translation scheme for producing three-address code for Boolean expressions:
E → E1 or E2
{
E.place := newtemp();
emit(E.place := E1.place or E2.place);
}
E → E1 and E2
{
E.place := newtemp();
emit(E.place := E1.place and E2.place);
}
E → not E1
{
E.place := newtemp();
emit(E.place := not E1.place);
}
E → ( E1 )
{
E.place := E1.place;
}
E → true
{
E.place := newtemp();
emit(E.place := 1);
}
E → false
{
E.place := newtemp();
emit(E.place := 0);
}
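The scheme can be sketched in Python as a recursive function over a tuple-encoded expression (the encoding is an assumption: ("or", e1, e2), ("and", e1, e2), ("not", e1), a str for a Boolean variable, and True/False for the constants). A variable simply supplies its own name as E.place, which is also an assumption, since the scheme above lists no rule for variables.

```python
from itertools import count
from typing import List, Tuple, Union

BoolExpr = Union[str, bool, tuple]


def translate_bool(expr: BoolExpr) -> Tuple[str, List[str]]:
    """Return (E.place, code) using the numerical representation of Booleans."""
    code: List[str] = []
    temps = count(1)

    def newtemp() -> str:
        return f"t{next(temps)}"

    def gen(e: BoolExpr) -> str:
        if isinstance(e, bool):                 # E -> true | false
            place = newtemp()
            code.append(f"{place} := {1 if e else 0}")
            return place
        if isinstance(e, str):                  # Boolean variable: its own place
            return e
        if e[0] == "not":                       # E -> not E1
            place1 = gen(e[1])
            place = newtemp()
            code.append(f"{place} := not {place1}")
            return place
        op, left, right = e                     # E -> E1 or E2 | E1 and E2
        place1 = gen(left)
        place2 = gen(right)
        place = newtemp()
        code.append(f"{place} := {place1} {op} {place2}")
        return place

    return gen(expr), code


# a or b and not c  (and binds tighter than or)
print(translate_bool(("or", "a", ("and", "b", ("not", "c")))))
# ('t3', ['t1 := not c', 't2 := b and t1', 't3 := a or t2'])
```

This reproduces the earlier example for a or b and not c.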
Short-Circuit Code
We can translate a Boolean expression into three-address code without generating code
for any of the Boolean operators and without necessarily evaluating the
entire expression. This is called short-circuit or jumping code: the value of the expression is represented by the position reached in the code.
Figure: translation of a<b or c<d and e<f (statements 100 to 111).
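As a separate illustration of jumping code (a sketch, not a reconstruction of the listing above), the Python below passes each subexpression a true target and a false target. The label names Ltrue, Lfalse, L1, L2 and the tuple encoding of relational tests as (relop, x, y) are assumptions.

```python
from itertools import count
from typing import List


def jumping_code(expr: tuple, true_label: str, false_label: str) -> List[str]:
    """Short-circuit code: jump to true_label if expr holds, else to false_label."""
    code: List[str] = []
    labels = count(1)

    def newlabel() -> str:
        return f"L{next(labels)}"

    def gen(e: tuple, t: str, f: str) -> None:
        op = e[0]
        if op == "or":                  # E1 true: whole E is true, skip E2
            l1 = newlabel()
            gen(e[1], t, l1)
            code.append(f"{l1}:")
            gen(e[2], t, f)
        elif op == "and":               # E1 false: whole E is false, skip E2
            l1 = newlabel()
            gen(e[1], l1, f)
            code.append(f"{l1}:")
            gen(e[2], t, f)
        elif op == "not":               # swap the targets; no code needed
            gen(e[1], f, t)
        else:                           # relational test (relop, x, y)
            _, x, y = e
            code.append(f"if {x} {op} {y} goto {t}")
            code.append(f"goto {f}")

    gen(expr, true_label, false_label)
    return code


expr = ("or", ("<", "a", "b"), ("and", ("<", "c", "d"), ("<", "e", "f")))
for line in jumping_code(expr, "Ltrue", "Lfalse"):
    print(line)
```

If a<b holds, control goes straight to Ltrue and neither c<d nor e<f is tested; no temporary ever holds a Boolean value.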
E → E1 + E2
{ E.type :=
if E1.type = integer and
E2.type = integer then integer
else real }
With type conversion, the semantic action for E → E1 + E2 also emits code:
E.place := newtemp;
if E1.type = integer and E2.type = integer then begin
emit(E.place := E1.place int+ E2.place);
E.type := integer
end
else if E1.type = real and E2.type = real then begin
emit(E.place := E1.place real+ E2.place);
E.type := real
end
else if E1.type = integer and E2.type = real then begin
u := newtemp;
emit(u := inttoreal E1.place);
emit(E.place := u real+ E2.place);
E.type := real
end
else if E1.type = real and E2.type = integer then begin
u := newtemp;
emit(u := inttoreal E2.place);
emit(E.place := E1.place real+ u);
E.type := real
end
else
E.type := type_error;
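The coercion rule maps directly onto a Python function. This sketch assumes types are the strings "integer" and "real" and that temporaries are numbered by a shared counter; as in the scheme, E.place is allocated before u.

```python
from itertools import count
from typing import Iterator, List, Tuple


def translate_add(place1: str, type1: str, place2: str, type2: str,
                  temps: Iterator[int], code: List[str]) -> Tuple[str, str]:
    """E -> E1 + E2 with int/real coercion; returns (E.place, E.type)."""
    place = f"t{next(temps)}"                       # E.place := newtemp
    if type1 == "integer" and type2 == "integer":
        code.append(f"{place} := {place1} int+ {place2}")
        return place, "integer"
    if type1 == "real" and type2 == "real":
        code.append(f"{place} := {place1} real+ {place2}")
        return place, "real"
    if type1 == "integer" and type2 == "real":
        u = f"t{next(temps)}"                       # u := newtemp
        code.append(f"{u} := inttoreal {place1}")
        code.append(f"{place} := {u} real+ {place2}")
        return place, "real"
    if type1 == "real" and type2 == "integer":
        u = f"t{next(temps)}"
        code.append(f"{u} := inttoreal {place2}")
        code.append(f"{place} := {place1} real+ {u}")
        return place, "real"
    return place, "type_error"


code: List[str] = []
print(translate_add("i", "integer", "x", "real", count(1), code))  # ('t1', 'real')
print(code)  # ['t2 := inttoreal i', 't1 := t2 real+ x']
```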
14th Nov 05
Figure: position of the type checker (syntax tree → type checker → intermediate representation).
Type checking can be of two types:
1. Static: done at compile time. No checks are left for run time, so the generated code is more efficient and the program runs faster.
2. Dynamic: done at run time. The checks make the code less efficient and slow down execution, but they add flexibility to the program.
Types:
6. Function: a function maps elements of one set, the domain D, to another set, the range R. Its type is written D → R, or T1 → T2 for type expressions T1 and T2.
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T
E → literal | num | id | E mod E | E [ E ]
Corresponding Actions:
1. T → char : { T.type = char }
2. T → integer : { T.type = integer }
3. D → id : T : { AddEntry(id.entry, T.type) }
4. T → array [ num ] of T1 : { T.type = array(num, T1.type) }
5. E → literal : { E.type = char }
6. E → num : { E.type = integer }
7. E → id : { E.type = lookup(id.entry) }
8. E → E1 mod E2 : {
if (E1.type == integer) && (E2.type == integer)
then E.type = integer
else type_error
}
9. E → E1 [ E2 ] : {
if (E2.type == integer) && (E1.type == array(s, t))
then E.type = t
else type_error
}
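A minimal Python sketch of this type checker, assuming a tuple encoding for expressions such as ("id", "a") or ("index", e1, e2), and ("array", num, T) for array[num] of T. Returning type_error for an undeclared identifier is also an assumption; the rules above do not say what lookup does in that case.

```python
from typing import Dict, Tuple, Union

# Type expressions: "char", "integer", "type_error", or ("array", num, element_type)
# for array[num] of T.
Type = Union[str, Tuple]


class TypeChecker:
    def __init__(self) -> None:
        self.symtab: Dict[str, Type] = {}        # id.entry -> type

    def declare(self, name: str, t: Type) -> None:
        """D -> id : T   { AddEntry(id.entry, T.type) }"""
        self.symtab[name] = t

    def check(self, e: Tuple) -> Type:
        """Return E.type for an expression node such as ("id", "a")."""
        kind = e[0]
        if kind == "literal":                    # E -> literal
            return "char"
        if kind == "num":                        # E -> num
            return "integer"
        if kind == "id":                         # E -> id: look the name up
            return self.symtab.get(e[1], "type_error")
        if kind == "mod":                        # E -> E1 mod E2
            t1, t2 = self.check(e[1]), self.check(e[2])
            return "integer" if t1 == "integer" and t2 == "integer" else "type_error"
        if kind == "index":                      # E -> E1 [ E2 ]
            t1, t2 = self.check(e[1]), self.check(e[2])
            if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
                return t1[2]                     # array(s, t) gives t
            return "type_error"
        return "type_error"


checker = TypeChecker()
checker.declare("a", ("array", 10, "integer"))   # a : array[10] of integer
checker.declare("k", "integer")
print(checker.check(("index", ("id", "a"), ("id", "k"))))   # integer
print(checker.check(("mod", ("id", "a"), ("num", 2))))      # type_error
```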
CODE GENERATION
The final phase in our compiler model is the code generator. It
takes as input an intermediate representation of the source program and produces
as output an equivalent target program.
The requirements traditionally imposed on a code
generator are severe. The output code must be correct and of high
quality, meaning that it should make effective use of the resources
of the target machine. Moreover, the code generator itself should
run efficiently.
fig. 1
2) Target programs
3) Memory management
4) Instruction selection
6) Evaluation order
BASIC BLOCKS
IDENTIFYING BASIC BLOCKS:
A basic block is a sequence of consecutive statements in
which flow of control enters at the beginning and leaves at the
end without halt or possibility of branching except at the end.
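1. We first determine the set of leaders, the first statements
of basic blocks. The rules we use are the following:
i) The first statement is a leader.
ii) Any statement that is the target of a conditional or unconditional goto is a leader.
iii) Any statement that immediately follows a conditional or unconditional goto is a leader.
2. For each leader, its basic block consists of the leader and all statements up to, but not including, the next leader or the end of the program.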
Example:
1. location = -1
2. i=0
3. if (i<100) goto 5
4. goto 13
5. t1 = 4*i
6. t2 = A[t1]
7. if t2 = x goto 9
8. goto 10
9. location = i
10. t3 = i+1
11. i = t3
12. goto 3
13. ..
Leaders :
1,3,4,5,8,9,10,13
B1 = 1,2
B2 = 3
B3 = 4
B4 = 5,6,7
B5 = 8
B6 = 9
B7 = 10 , 11 , 12
B8 = 13
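The leader rules can be checked mechanically on this example. The Python sketch below is illustrative only: it assumes each statement is a string keyed by its number and that every jump has the form "goto N". Statement 13 keeps the ".." of the listing.

```python
import re
from typing import Dict, List

PROGRAM: Dict[int, str] = {
    1: "location = -1", 2: "i = 0", 3: "if (i<100) goto 5", 4: "goto 13",
    5: "t1 = 4*i", 6: "t2 = A[t1]", 7: "if t2 = x goto 9", 8: "goto 10",
    9: "location = i", 10: "t3 = i+1", 11: "i = t3", 12: "goto 3", 13: "..",
}


def find_leaders(program: Dict[int, str]) -> List[int]:
    numbers = sorted(program)
    leaders = {numbers[0]}                            # rule i: first statement
    for k, n in enumerate(numbers):
        jump = re.search(r"goto (\d+)", program[n])
        if jump:
            leaders.add(int(jump.group(1)))           # rule ii: jump target
            if k + 1 < len(numbers):
                leaders.add(numbers[k + 1])           # rule iii: after a jump
    return sorted(s for s in leaders if s in program)


def basic_blocks(program: Dict[int, str]) -> List[List[int]]:
    leaders = set(find_leaders(program))
    blocks: List[List[int]] = []
    for n in sorted(program):
        if n in leaders:
            blocks.append([])                         # a leader opens a block
        blocks[-1].append(n)
    return blocks


print(find_leaders(PROGRAM))   # [1, 3, 4, 5, 8, 9, 10, 13]
print(basic_blocks(PROGRAM))
```

The blocks come out as [1, 2], [3], [4], [5, 6, 7], [8], [9], [10, 11, 12] and [13], matching B1 to B8.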
REGISTER ALLOCATION
The getreg function:
The function getreg returns the location L that is to hold the value of x for the assignment
x := y op z.
The algorithm for getreg:
1) If the name y is in a register that holds the value of no other names (in other words,
no other name is in the same register as y), and y is not live and has no next
use after the execution of x := y op z, then
a. return the register of y for L;
b. update the address descriptor of y to indicate that y is no longer in L.
2) Failing (1), return an empty register for L if there is one.
3) Failing (2), if x has a next use in the block, or if op is an operator that requires a register, then
a. find an occupied register R;
b. store the value of R into its memory location M (MOV R, M) if the value is not already in M, and return R. If R holds the values of several variables, generate a MOV for each of them.
4) Failing (3), select the memory location of x as L.
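A simplified Python sketch of getreg, under several assumptions: descriptors are plain dicts of sets, "mem" in an address descriptor means "the value is in the name's own memory location", liveness and next-use information is merged into one set, and in step 3 the register to free is simply the first one in sorted order (the notes do not say how R is chosen).

```python
from typing import Dict, List, Set


def getreg(x: str, y: str, regs: Dict[str, Set[str]],
           addr: Dict[str, Set[str]], next_use: Set[str],
           op_needs_reg: bool, code: List[str]) -> str:
    """Choose the location L for x := y op z (simplified sketch).

    regs:     register descriptor, register name -> names whose value it holds
    addr:     address descriptor, name -> locations holding its current value;
              "mem" stands for the name's own memory location
    next_use: names that are live or have a next use after this statement
    """
    # 1) y is alone in a register and is dead afterwards: reuse that register.
    for r, names in regs.items():
        if names == {y} and y not in next_use:
            addr[y].discard(r)                  # y is no longer in L
            return r
    # 2) Otherwise, an empty register.
    for r, names in regs.items():
        if not names:
            return r
    # 3) Otherwise, if x needs a register, free an occupied one.
    if x in next_use or op_needs_reg:
        r = sorted(regs)[0]                     # choice of R: arbitrary here
        for name in sorted(regs[r]):
            if "mem" not in addr[name]:         # value not yet in its M
                code.append(f"MOV {r}, {name}")
                addr[name].add("mem")
            addr[name].discard(r)
        regs[r] = set()
        return r
    # 4) Otherwise, use the memory location of x itself.
    return x


# v := t + u from the example below: t is alone in R0 and dead afterwards.
regs = {"R0": {"t"}, "R1": {"u"}}
addr = {"t": {"R0"}, "u": {"R1"}}
print(getreg("v", "t", regs, addr, {"u"}, False, []))   # R0
```

Updating the register descriptor for x after the instruction is emitted is left to the caller in this sketch.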
Example:
d := (a-b) + (a-c) + (a-c)
is translated into the three-address statements t = a-b, u = a-c, v = t+u and d = v+u. The code generated for them, and the descriptors after each statement, are:

Statements | Code generated | Register descriptor | Address descriptor
t = a-b | MOV a, R0; SUB b, R0 | R0 contains t | t in R0
u = a-c | MOV a, R1; SUB c, R1 | R0 contains t; R1 contains u | t in R0; u in R1
v = t+u | ADD R1, R0 | R0 contains v; R1 contains u | u in R1; v in R0
d = v+u | ADD R1, R0; MOV R0, d | R0 contains d | d in R0 and memory

Note that the cost of this code can be reduced from 12 to 11 by generating MOV R0, R1
immediately after the instruction MOV a, R0 and deleting MOV a, R1.
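The 12 and 11 figures can be checked with a few lines of Python. The cost model here is an assumption (the usual one for this instruction set): each instruction costs 1, plus 1 for every operand that is a memory name rather than a register. A tiny interpreter, reading OP src, dst as dst := dst op src, confirms that both sequences leave the same value in d.

```python
from typing import Dict, List, Tuple

REGISTERS = {"R0", "R1"}


def split_instr(instr: str) -> Tuple[str, str, str]:
    """'SUB b, R0' -> ('SUB', 'b', 'R0')."""
    op, operands = instr.split(None, 1)
    src, dst = (o.strip() for o in operands.split(","))
    return op, src, dst


def cost(code: List[str]) -> int:
    """Assumed cost model: 1 per instruction plus 1 per memory operand."""
    total = 0
    for instr in code:
        _, src, dst = split_instr(instr)
        total += 1 + sum(1 for o in (src, dst) if o not in REGISTERS)
    return total


def run(code: List[str], memory: Dict[str, int]) -> Dict[str, int]:
    """Execute OP src, dst (dst := dst op src); registers share one dict with memory."""
    state = dict(memory)
    for instr in code:
        op, src, dst = split_instr(instr)
        if op == "MOV":
            state[dst] = state[src]
        elif op == "ADD":
            state[dst] += state[src]
        elif op == "SUB":
            state[dst] -= state[src]
    return state


original = ["MOV a, R0", "SUB b, R0", "MOV a, R1", "SUB c, R1",
            "ADD R1, R0", "ADD R1, R0", "MOV R0, d"]
improved = ["MOV a, R0", "MOV R0, R1", "SUB b, R0", "SUB c, R1",
            "ADD R1, R0", "ADD R1, R0", "MOV R0, d"]
values = {"a": 10, "b": 3, "c": 4}
print(cost(original), cost(improved))                          # 12 11
print(run(original, values)["d"], run(improved, values)["d"])  # 19 19
```

The copy has to come before SUB b, R0, because after that instruction R0 no longer holds a.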
Efficient allocation of registers is one of the main factors in generating good code.
A simple design is to reserve a fixed number of registers for each purpose, for example:
1) Base address
2) Stack
3) Arithmetic Operations
The simplicity of this design compensates for its less efficient use of registers.
Figure: flow graph of a loop, with the set of live variables marked at each point; its statements are a = b+c, e = a+f, f = a-d, d = d-b, b = d+f, e = a-c and b = d+c.
Peephole Optimization
A statement-by-statement code-generation strategy often produces target code that contains
redundant instructions and suboptimal constructs. The quality of such target code can be
improved by applying optimizing transformations to the target program.
The term "optimization" is somewhat misleading here, because peephole optimization does not guarantee
that the resulting code is optimal.
Peephole optimization can be called a small but effective technique for locally improving the
target code. It is called so because at any time optimization is done by looking at a small
sequence of target instructions called the peephole and replacing the code by faster or shorter
code whenever possible.
In general, each improvement can create opportunities for further improvements, so repeated
passes over the code may be needed to get the maximum benefit from this type of optimization.
Typical peephole transformations are:
- Redundant-instruction elimination
- Flow-of-control optimizations
- Algebraic simplifications
- Use of machine idioms
- Removal of unreachable code
Instructions that do no productive work can be omitted or replaced, since they only waste
valuable CPU time.
Use of Machine Idioms: the target machine may have hardware instructions that implement certain
operations efficiently, and detecting situations that permit their use can make the code
considerably more efficient. For example, many machines support a single instruction INC x,
which increments x; without it, the increment takes at least three instructions (load x into
a register, add 1, store it back).
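A small peephole pass in Python showing two of the transformations: a redundant load right after the matching store, and the INC idiom. This is a sketch under stated assumptions: instructions use the OP source, destination form of the earlier examples, no instruction in the window carries a label, and the caller supplies the registers known to be dead afterwards (needed before the three-instruction increment can be replaced by INC x).

```python
from typing import List, Set


def operands(instr: str) -> List[str]:
    """Split 'OP a, b' into ['a', 'b']."""
    _, _, rest = instr.partition(" ")
    return [o.strip() for o in rest.split(",")] if rest else []


def one_pass(code: List[str], dead_regs: Set[str]) -> List[str]:
    out: List[str] = []
    i = 0
    while i < len(code):
        cur = code[i]
        # Machine idiom: MOV x, R / ADD #1, R / MOV R, x  ->  INC x,
        # allowed only if R is not needed afterwards.
        if cur.startswith("MOV ") and i + 2 < len(code):
            x, r = operands(cur)
            if (code[i + 1] == f"ADD #1, {r}" and code[i + 2] == f"MOV {r}, {x}"
                    and r in dead_regs):
                out.append(f"INC {x}")
                i += 3
                continue
        # Redundant load: "MOV R0, a" then "MOV a, R0"; the second adds nothing.
        if (cur.startswith("MOV ") and out and out[-1].startswith("MOV ")
                and operands(cur) == operands(out[-1])[::-1]):
            i += 1
            continue
        out.append(cur)
        i += 1
    return out


def peephole(code: List[str], dead_regs: Set[str]) -> List[str]:
    """Repeat passes until nothing changes."""
    while True:
        improved = one_pass(code, dead_regs)
        if improved == code:
            return code
        code = improved


print(peephole(["MOV R0, a", "MOV a, R0", "ADD R1, R0"], set()))
print(peephole(["MOV x, R1", "ADD #1, R1", "MOV R1, x"], {"R1"}))   # ['INC x']
```

Repeating passes until the code stops changing reflects the point above that one improvement can expose another.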