
Unit 3

The document discusses intermediate code generation in compilers. It describes how an intermediate language acts as a representation between the source code and machine code. Common intermediate languages include syntax trees, postfix notation, and triple/quadruple codes. The document focuses on triple/quadruple codes (three-address code) as an intermediate representation, explaining the different statement types in detail including binary operations, jumps, procedures, indexed assignments, and more. It provides examples of translating syntax trees to the three-address code.

Uploaded by karthi_gopal
Copyright © All Rights Reserved

23.11.09
Intermediate Code Generation
Intermediate codes are machine-independent, but they are close to machine instructions.
The intermediate code generator converts a program in the source language into an equivalent program in an intermediate language.
Many different intermediate languages are possible, and the compiler designer chooses which one to use.
Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples) can be used as an intermediate language.
We will use quadruples to discuss intermediate code generation.
Quadruples are close to machine instructions, but they are not actual machine instructions.
Some programming languages have well-defined intermediate languages:
Java: Java Virtual Machine (JVM) bytecode
Prolog: Warren Abstract Machine (WAM)
In fact, there are bytecode emulators (interpreters) that execute instructions in these intermediate languages.

Three-Address Code (Quadruples)
A quadruple is:
x := y op z
where x, y and z are names, constants or compiler-generated temporaries, and op is any operator.

We may also use the following notation for quadruples (a better notation, because it looks like a machine instruction):
op y,z,x
which applies operator op to y and z and stores the result in x.

We use the term three-address code because each statement usually contains three addresses (two for the operands, one for the result).
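A minimal sketch of how a compiler might hold one quadruple in memory and print it in both notations. The class name `Quad`, its field names and the small operator-symbol table are illustrative assumptions, not part of the original notes.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed spelling of a few operators for the "x := y op z" form.
SYMBOLS = {"add": "+", "sub": "-", "mult": "*", "div": "/"}


@dataclass(frozen=True)
class Quad:
    """One three-address statement: an operator plus up to three addresses."""
    op: str
    arg1: Optional[str] = None    # first operand (y)
    arg2: Optional[str] = None    # second operand (z)
    result: Optional[str] = None  # destination or jump target (x)

    def as_instruction(self) -> str:
        """Machine-like form 'op y,z,x'; unused fields are left empty."""
        fields = (self.arg1, self.arg2, self.result)
        return f"{self.op} " + ",".join(f or "" for f in fields)

    def as_assignment(self) -> str:
        """Assignment form 'x := y op z' (binary operators only)."""
        symbol = SYMBOLS.get(self.op, self.op)
        return f"{self.result} := {self.arg1} {symbol} {self.arg2}"


print(Quad("add", "y", "z", "x").as_instruction())      # add y,z,x
print(Quad("add", "y", "z", "x").as_assignment())       # x := y + z
print(Quad("uminus", "a", None, "c").as_instruction())  # uminus a,,c
```

Empty fields print as nothing, which is why a unary statement comes out as `uminus a,,c`.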
Three-Address Statements
Binary Operator: op y,z,result or result := y op z
where op is a binary arithmetic or logical operator. This binary operator is applied to
y and z, and the result of the operation is stored in result.
Ex: add a,b,c
gt a,b,c
addr a,b,c
addi a,b,c

Unary Operator: op y,,result or result := op y
where op is a unary arithmetic or logical operator. This unary operator is applied to y,
and the result of the operation is stored in result.
Ex: uminus a,,c
not a,,c
inttoreal a,,c

Three-Address Statements (cont.)
Move Operator: mov y,,result or result := y
where the content of y is copied into result.
Ex: mov a,,c
movi a,,c
movr a,,c

Unconditional Jumps: jmp ,,L or goto L
Control transfers to the three-address statement with label L, and execution continues from that statement.
Ex: jmp ,,L1 // jump to L1
jmp ,,7 // jump to the statement 7

Three-Address Statements (cont.)
Conditional Jumps: jmprelop y,z,L or if y relop z goto L
If y relop z is true, control transfers to the three-address statement with label L and execution continues from there. If it is false, execution continues with the statement following this conditional jump.
Ex: jmpgt y,z,L1 // jump to L1 if y>z
jmpgte y,z,L1 // jump to L1 if y>=z
jmpe y,z,L1 // jump to L1 if y==z
jmpne y,z,L1 // jump to L1 if y!=z

The relational operator can also be unary:
jmpnz y,,L1 // jump to L1 if y is not zero
jmpz y,,L1 // jump to L1 if y is zero
jmpt y,,L1 // jump to L1 if y is true
jmpf y,,L1 // jump to L1 if y is false
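The control-flow rule shared by all of these jumps can be sketched as one Python function that, given operand values that are already evaluated, returns the number of the next statement to execute. It assumes statements are numbered consecutively (as in `jmp ,,7`); the function name and the two lookup tables are hypothetical.

```python
import operator

# Two-operand conditional jumps: jmprelop y,z,L
BINARY_JUMPS = {
    "jmpgt": operator.gt,   # y >  z
    "jmpgte": operator.ge,  # y >= z
    "jmpe": operator.eq,    # y == z
    "jmpne": operator.ne,   # y != z
}

# One-operand conditional jumps: jmprelop y,,L
UNARY_JUMPS = {
    "jmpnz": lambda y: y != 0,
    "jmpz": lambda y: y == 0,
    "jmpt": lambda y: bool(y),
    "jmpf": lambda y: not y,
}


def next_statement(op, y, z, target, current):
    """Number of the statement executed after the jump at `current`."""
    if op == "jmp":                          # unconditional: always go to L
        return target
    if op in BINARY_JUMPS:
        taken = BINARY_JUMPS[op](y, z)
    elif op in UNARY_JUMPS:
        taken = UNARY_JUMPS[op](y)
    else:
        raise ValueError(f"not a jump: {op}")
    return target if taken else current + 1  # false: fall through
```

For example, `next_statement("jmpgt", 5, 3, 10, 4)` returns 10, while `next_statement("jmpgt", 1, 3, 10, 4)` returns 5.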
Three-Address Statements (cont.)
Procedure Parameters: param x,, or param x
Procedure Calls: call p,n, or call p,n
where x is an actual parameter, and call invokes the procedure p with n parameters.
Ex: the call p(x1,...,xn) is translated into
param x1,,
param x2,,
...
param xn,,
call p,n,

f(x+1,y) is translated into
add x,1,t1
param t1,,
param y,,
call f,2,
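A small sketch of the calling sequence above: arguments that are expressions are evaluated into temporaries first, then one `param` is emitted per actual parameter, then the `call`. Representing an argument expression as a tuple `(op, left, right)` is an assumption made for illustration, and `gen_call` is a hypothetical name.

```python
from itertools import count


def gen_call(proc, args):
    """Emit three-address code for proc(args...).

    Each argument is either a simple name/constant or a tuple
    (op, left, right) that must first be evaluated into a temporary.
    """
    temps = count(1)
    code, places = [], []
    for arg in args:
        if isinstance(arg, tuple):               # e.g. ("add", "x", 1) for x+1
            op, left, right = arg
            temp = f"t{next(temps)}"
            code.append(f"{op} {left},{right},{temp}")
            places.append(temp)
        else:                                    # already a name or constant
            places.append(str(arg))
    code += [f"param {p},," for p in places]     # one param per actual
    code.append(f"call {proc},{len(args)},")     # n = number of parameters
    return code


for line in gen_call("f", [("add", "x", 1), "y"]):
    print(line)
```

This prints the four statements of the f(x+1,y) example.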
Three-Address Statements (cont.)
Indexed Assignments:
mov y[i],,x or x := y[i]
mov x,,y[i] or y[i] := x


Address and Pointer Assignments:
moveaddr y,,x or x := &y
movecont y,,x or x := *y

Syntax-Directed Translation into Three-Address Code
S → id := E     S.code = E.code || gen(mov E.place ,, id.place)
E → E1 + E2     E.place = newtemp();
                E.code = E1.code || E2.code || gen(add E1.place , E2.place , E.place)
E → E1 * E2     E.place = newtemp();
                E.code = E1.code || E2.code || gen(mult E1.place , E2.place , E.place)
E → - E1        E.place = newtemp();
                E.code = E1.code || gen(uminus E1.place ,, E.place)
E → ( E1 )      E.place = E1.place;
                E.code = E1.code
E → id          E.place = id.place;
                E.code = null
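These rules can be sketched directly in Python, with a Python list standing in for a code sequence and `+` on lists standing in for `||`. The syntax tree is assumed to be nested tuples (a name is a string, `("uminus", e)` is unary minus, `(op, e1, e2)` is a binary operation); the rule E → ( E1 ) needs no case because parentheses only shape the tree. Class and method names are hypothetical.

```python
from itertools import count


class ExprTranslator:
    """S-attributed translation: every node returns (E.place, E.code)."""

    def __init__(self):
        self._temps = count(1)

    def newtemp(self):
        return f"t{next(self._temps)}"

    def expr(self, node):
        if isinstance(node, str):                 # E -> id
            return node, []                       # E.code = null
        if node[0] == "uminus":                   # E -> - E1
            place1, code1 = self.expr(node[1])
            place = self.newtemp()
            return place, code1 + [f"uminus {place1},,{place}"]
        op, left, right = node                    # E -> E1 + E2 | E1 * E2
        place1, code1 = self.expr(left)
        place2, code2 = self.expr(right)
        place = self.newtemp()
        return place, code1 + code2 + [f"{op} {place1},{place2},{place}"]

    def assign(self, name, node):                 # S -> id := E
        place, code = self.expr(node)
        return code + [f"mov {place},,{name}"]


# a := b * -c + b * -c
tree = ("add", ("mult", "b", ("uminus", "c")), ("mult", "b", ("uminus", "c")))
for line in ExprTranslator().assign("a", tree):
    print(line)
```

The output is `uminus c,,t1`, `mult b,t1,t2`, `uminus c,,t3`, `mult b,t3,t4`, `add t2,t4,t5`, `mov t5,,a`: each subexpression gets a fresh temporary, even when it repeats.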



Syntax-Directed Translation (cont.)
S → while E do S1
                S.begin = newlabel();
                S.after = newlabel();
                S.code = gen(S.begin :) || E.code ||
                         gen(jmpf E.place ,, S.after) || S1.code ||
                         gen(jmp ,, S.begin) ||
                         gen(S.after :)
S → if E then S1 else S2
                S.else = newlabel();
                S.after = newlabel();
                S.code = E.code ||
                         gen(jmpf E.place ,, S.else) || S1.code ||
                         gen(jmp ,, S.after) ||
                         gen(S.else :) || S2.code ||
                         gen(S.after :)
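A sketch of the two statement rules, again with Python lists standing in for code sequences. To keep it short, the condition's place and code and the bodies' code are passed in already translated (an assumption; in a full translator they would come from the expression rules). `newlabel` produces `L1`, `L2`, ...; the class and method names are hypothetical.

```python
from itertools import count


class StmtTranslator:
    """Code for while / if-then-else built by concatenation."""

    def __init__(self):
        self._labels = count(1)

    def newlabel(self):
        return f"L{next(self._labels)}"

    def while_do(self, cond_place, cond_code, body_code):
        begin, after = self.newlabel(), self.newlabel()
        return ([f"{begin}:"] + cond_code +
                [f"jmpf {cond_place},,{after}"] + body_code +
                [f"jmp ,,{begin}", f"{after}:"])

    def if_then_else(self, cond_place, cond_code, then_code, else_code):
        else_label, after = self.newlabel(), self.newlabel()
        return (cond_code + [f"jmpf {cond_place},,{else_label}"] + then_code +
                [f"jmp ,,{after}", f"{else_label}:"] + else_code +
                [f"{after}:"])


# while x<y do x := x+1
code = StmtTranslator().while_do("t1", ["lt x,y,t1"], ["add x,1,t2", "mov t2,,x"])
print("\n".join(code))
```

The loop test is re-evaluated on every iteration because `S.begin` labels the start of `E.code`, not the start of the body.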


Translation Scheme to Produce Three-Address Code
S → id := E     { p = lookup(id.name);
                  if (p is not nil) then emit(mov E.place ,, p)
                  else error(undefined-variable) }
E → E1 + E2     { E.place = newtemp();
                  emit(add E1.place , E2.place , E.place) }
E → E1 * E2     { E.place = newtemp();
                  emit(mult E1.place , E2.place , E.place) }
E → - E1        { E.place = newtemp();
                  emit(uminus E1.place ,, E.place) }
E → ( E1 )      { E.place = E1.place; }
E → id          { p = lookup(id.name);
                  if (p is not nil) then E.place = id.place
                  else error(undefined-variable) }
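In this scheme the actions `emit` statements into one global code array as a side effect, instead of building `E.code` strings. A sketch under the same tuple-tree assumption as before; the symbol table is reduced to a set of declared names and `None` plays the role of nil (both assumptions), and an undefined variable raises `NameError`.

```python
from itertools import count


class EmitTranslator:
    """Actions append to one global code array instead of building E.code."""

    def __init__(self, declared):
        self.symtab = set(declared)      # stand-in symbol table: declared names
        self.code = []
        self._temps = count(1)

    def newtemp(self):
        return f"t{next(self._temps)}"

    def emit(self, instr):
        self.code.append(instr)

    def lookup(self, name):
        return name if name in self.symtab else None   # None plays the role of nil

    def expr(self, node):
        """Return E.place; code is emitted as a side effect."""
        if isinstance(node, str):                       # E -> id
            if self.lookup(node) is None:
                raise NameError(f"undefined variable: {node}")
            return node
        if node[0] == "uminus":                         # E -> - E1
            place1 = self.expr(node[1])
            place = self.newtemp()
            self.emit(f"uminus {place1},,{place}")
            return place
        op, left, right = node                          # E -> E1 + E2 | E1 * E2
        place1 = self.expr(left)
        place2 = self.expr(right)
        place = self.newtemp()
        self.emit(f"{op} {place1},{place2},{place}")
        return place

    def assign(self, name, node):                       # S -> id := E
        place = self.expr(node)
        p = self.lookup(name)
        if p is None:
            raise NameError(f"undefined variable: {name}")
        self.emit(f"mov {place},,{p}")


t = EmitTranslator(["a", "b", "c"])
t.assign("a", ("add", "b", ("mult", "b", "c")))
print(t.code)   # ['mult b,c,t1', 'add b,t1,t2', 'mov t2,,a']
```

Because the children's code is emitted before the parent's `emit`, the array ends up in the same order as the concatenated `E.code` of the previous scheme.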

Translation Scheme with Locations
Here S.inloc (E.inloc) is the location where the first statement of the construct is placed, S.outloc (E.outloc) is the first free location after its code, and emit(l ...) places a statement at location l.

S → id := { E.inloc = S.inloc } E
      { p = lookup(id.name);
        if (p is not nil) then { emit(E.outloc mov E.place ,, p); S.outloc = E.outloc+1 }
        else { error(undefined-variable); S.outloc = E.outloc } }

E → { E1.inloc = E.inloc } E1 + { E2.inloc = E1.outloc } E2
      { E.place = newtemp(); emit(E2.outloc add E1.place , E2.place , E.place); E.outloc = E2.outloc+1 }

E → { E1.inloc = E.inloc } E1 * { E2.inloc = E1.outloc } E2
      { E.place = newtemp(); emit(E2.outloc mult E1.place , E2.place , E.place); E.outloc = E2.outloc+1 }

E → - { E1.inloc = E.inloc } E1
      { E.place = newtemp(); emit(E1.outloc uminus E1.place ,, E.place); E.outloc = E1.outloc+1 }

E → ( { E1.inloc = E.inloc } E1 )
      { E.place = E1.place; E.outloc = E1.outloc }

E → id { E.outloc = E.inloc; p = lookup(id.name);
         if (p is not nil) then E.place = id.place
         else error(undefined-variable) }
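A sketch of the location-passing version for assignments: each nested call receives `inloc` and returns the place together with `outloc`, and statements are stored in a dictionary keyed by location. The tuple tree (so E → ( E1 ) has no case of its own), the function name and the omission of the symbol-table check are assumptions made to keep the example short.

```python
from itertools import count


def translate_assign(target, tree, inloc=1):
    """Translate 'target := tree', numbering statements from inloc.

    Returns (code, outloc): code maps location -> statement and outloc
    is the first free location after the assignment.
    """
    code = {}
    temps = count(1)

    def expr(node, inloc):
        """Return (E.place, E.outloc) for an expression node."""
        if isinstance(node, str):                  # E -> id: no code emitted
            return node, inloc                     # E.outloc = E.inloc
        if node[0] == "uminus":                    # E -> - E1
            place1, out1 = expr(node[1], inloc)
            place = f"t{next(temps)}"
            code[out1] = f"uminus {place1},,{place}"
            return place, out1 + 1
        op, left, right = node                     # E -> E1 op E2
        place1, out1 = expr(left, inloc)           # E1.inloc = E.inloc
        place2, out2 = expr(right, out1)           # E2.inloc = E1.outloc
        place = f"t{next(temps)}"
        code[out2] = f"{op} {place1},{place2},{place}"
        return place, out2 + 1

    place, out = expr(tree, inloc)                 # E.inloc = S.inloc
    code[out] = f"mov {place},,{target}"
    return code, out + 1                           # S.outloc = E.outloc + 1


print(translate_assign("a", ("add", "b", ("mult", "c", "d"))))
# ({1: 'mult c,d,t1', 2: 'add b,t1,t2', 3: 'mov t2,,a'}, 4)
```

Knowing each statement's location while it is generated is what later allows jump targets to be filled in by location.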

Boolean Expressions
E → { E1.inloc = E.inloc } E1 and { E2.inloc = E1.outloc } E2
      { E.place = newtemp(); emit(E2.outloc and E1.place , E2.place , E.place); E.outloc = E2.outloc+1 }

E → { E1.inloc = E.inloc } E1 or { E2.inloc = E1.outloc } E2
      { E.place = newtemp(); emit(E2.outloc or E1.place , E2.place , E.place); E.outloc = E2.outloc+1 }

E → not { E1.inloc = E.inloc } E1
      { E.place = newtemp(); emit(E1.outloc not E1.place ,, E.place); E.outloc = E1.outloc+1 }

E → { E1.inloc = E.inloc } E1 relop { E2.inloc = E1.outloc } E2
      { E.place = newtemp();
        emit(E2.outloc relop.code E1.place , E2.place , E.place); E.outloc = E2.outloc+1 }




Translation Scheme (cont.)
A jump whose target is not yet known is emitted with the target NOTKNOWN; backpatch(l, t) later replaces the NOTKNOWN target of the jump at location l with t.

S → while { E.inloc = S.inloc } E do
      { emit(E.outloc jmpf E.place ,, NOTKNOWN);
        S1.inloc = E.outloc+1; } S1
      { emit(S1.outloc jmp ,, S.inloc);
        S.outloc = S1.outloc+1;
        backpatch(E.outloc, S.outloc); }

S → if { E.inloc = S.inloc } E then
      { emit(E.outloc jmpf E.place ,, NOTKNOWN);
        S1.inloc = E.outloc+1; } S1 else
      { emit(S1.outloc jmp ,, NOTKNOWN);
        S2.inloc = S1.outloc+1;
        backpatch(E.outloc, S2.inloc); } S2
      { S.outloc = S2.outloc;
        backpatch(S1.outloc, S.outloc); }
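The location-based rules for expressions, assignments, `while` and `if-then-else` can be combined into one sketch that reproduces the listing of the example program that follows. Assumptions not in the notes: a `seq` node for a list of statements (the slides give no production for statement sequences), constants as tree leaves, the relop codes `lt` and `eq`, and all class and method names.

```python
from itertools import count

NOTKNOWN = "NOTKNOWN"   # placeholder jump target, fixed later by backpatch


class Generator:
    """Location-numbered translation with backpatched jumps."""

    def __init__(self):
        self.code = {}                  # location -> instruction text
        self._temps = count(1)

    def emit(self, loc, instr):
        self.code[loc] = instr

    def backpatch(self, loc, target):
        self.code[loc] = self.code[loc].replace(NOTKNOWN, str(target))

    def expr(self, node, inloc):
        """Return (E.place, E.outloc). Leaves are names or constants."""
        if not isinstance(node, tuple):
            return str(node), inloc                   # no code: outloc = inloc
        op, left, right = node
        place1, out1 = self.expr(left, inloc)         # E1.inloc = E.inloc
        place2, out2 = self.expr(right, out1)         # E2.inloc = E1.outloc
        place = f"t{next(self._temps)}"
        self.emit(out2, f"{op} {place1},{place2},{place}")
        return place, out2 + 1

    def stmt(self, node, inloc):
        """Translate a statement starting at inloc; return S.outloc."""
        kind = node[0]
        if kind == "assign":                          # S -> id := E
            _, name, e = node
            place, out = self.expr(e, inloc)
            self.emit(out, f"mov {place},,{name}")
            return out + 1
        if kind == "seq":                             # assumed statement list
            out = inloc
            for s in node[1:]:
                out = self.stmt(s, out)
            return out
        if kind == "while":                           # S -> while E do S1
            _, cond, body = node
            place, e_out = self.expr(cond, inloc)
            self.emit(e_out, f"jmpf {place},,{NOTKNOWN}")
            s1_out = self.stmt(body, e_out + 1)
            self.emit(s1_out, f"jmp ,,{inloc}")
            self.backpatch(e_out, s1_out + 1)         # exit to S.outloc
            return s1_out + 1
        if kind == "if":                              # S -> if E then S1 else S2
            _, cond, then_s, else_s = node
            place, e_out = self.expr(cond, inloc)
            self.emit(e_out, f"jmpf {place},,{NOTKNOWN}")
            s1_out = self.stmt(then_s, e_out + 1)
            self.emit(s1_out, f"jmp ,,{NOTKNOWN}")
            self.backpatch(e_out, s1_out + 1)         # false branch starts S2
            out = self.stmt(else_s, s1_out + 1)
            self.backpatch(s1_out, out)
            return out
        raise ValueError(f"unknown statement: {kind}")


program = ("seq",
           ("assign", "x", 1),
           ("assign", "y", ("add", "x", 10)),
           ("while", ("lt", "x", "y"),
            ("seq",
             ("assign", "x", ("add", "x", 1)),
             ("if", ("eq", ("mod", "x", 2), 1),
              ("assign", "y", ("add", "y", 1)),
              ("assign", "y", ("sub", "y", 2))))))

g = Generator()
end = g.stmt(program, 1)
for loc in range(1, end):
    print(f"{loc:02d}: {g.code[loc]}")
print(f"{end:02d}:")
```

Only a single pass is needed: each NOTKNOWN target is patched as soon as the location it refers to becomes known.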

Three Address Codes - Example
Source program:
x:=1;
y:=x+10;
while (x<y) {
  x:=x+1;
  if (x%2==1) then y:=y+1;
  else y:=y-2;
}

Three-address code:
01: mov 1,,x
02: add x,10,t1
03: mov t1,,y
04: lt x,y,t2
05: jmpf t2,,17
06: add x,1,t3
07: mov t3,,x
08: mod x,2,t4
09: eq t4,1,t5
10: jmpf t5,,14
11: add y,1,t6
12: mov t6,,y
13: jmp ,,16
14: sub y,2,t7
15: mov t7,,y
16: jmp ,,4
17:
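To check that the listing behaves like the source program, a tiny interpreter can execute it. Assumptions: constants are integers, only the operators used in this listing are supported, statement k is the k-th list entry, and execution halts when control passes beyond the last statement (location 17 is empty). The function names are hypothetical.

```python
import operator

# Binary operators that appear in the listing.
BINARY = {
    "add": operator.add, "sub": operator.sub, "mult": operator.mul,
    "mod": operator.mod, "lt": operator.lt, "eq": operator.eq,
}


def value(env, field):
    """A field is either an integer constant or a variable/temporary name."""
    return int(field) if field.lstrip("-").isdigit() else env[field]


def run(program, max_steps=10_000):
    """Execute numbered three-address statements; statement k is program[k-1]."""
    env, pc = {}, 1
    for _ in range(max_steps):
        if not 1 <= pc <= len(program):      # past the last statement: halt
            return env
        op, rest = program[pc - 1].split(" ", 1)
        y, z, result = rest.split(",")
        pc += 1                              # default: next statement
        if op in BINARY:
            env[result] = BINARY[op](value(env, y), value(env, z))
        elif op == "mov":
            env[result] = value(env, y)
        elif op == "jmp":
            pc = int(result)
        elif op == "jmpf":
            if not value(env, y):
                pc = int(result)
        else:
            raise ValueError(f"unsupported operator: {op}")
    raise RuntimeError("step limit reached")


listing = [
    "mov 1,,x", "add x,10,t1", "mov t1,,y", "lt x,y,t2", "jmpf t2,,17",
    "add x,1,t3", "mov t3,,x", "mod x,2,t4", "eq t4,1,t5", "jmpf t5,,14",
    "add y,1,t6", "mov t6,,y", "jmp ,,16", "sub y,2,t7", "mov t7,,y",
    "jmp ,,4",
]
env = run(listing)
print(env["x"], env["y"])  # 8 6
```

The loop runs seven times (x goes from 1 to 8 while y alternates between increasing by 1 and decreasing by 2), ending with x = 8 and y = 6, the same result the source program gives.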
Arrays
Elements of arrays can be accessed quickly if the elements are stored in a block of consecutive locations.

A one-dimensional array A:
(Figure: array A stored in consecutive locations; labels base_A, low, i, width)

$base_A$ is the address of the first location of the array A,
$width$ is the width of each array element,
$low$ is the index of the first array element.

location of A[i] = $base_A + (i - low) \times width$

Arrays (cont.)
$$base_A + (i - low) \times width$$
can be rewritten as
$$i \times width + (base_A - low \times width)$$
The first term, $i \times width$, must be computed at run-time; the second term, $base_A - low \times width$, can be computed at compile-time.

So, the location of A[i] can be computed at run-time by evaluating the formula $i \times width + c$, where $c = base_A - low \times width$ is evaluated at compile-time.

The intermediate code generator should produce the code to evaluate this formula $i \times width + c$ (one multiplication and one addition operation).
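A quick numeric check of the rewriting, assuming a hypothetical array A[5..100] of 8-byte elements starting at address 1000 (these numbers are illustrative only). `compile_time_c` is what the compiler would fold into a constant; `address_fast` is the run-time part.

```python
def address_direct(base, low, width, i):
    """base_A + (i - low) * width, everything at run time."""
    return base + (i - low) * width


def compile_time_c(base, low, width):
    """c = base_A - low * width, known once the array is laid out."""
    return base - low * width


def address_fast(c, width, i):
    """Run-time part: one multiplication and one addition."""
    return i * width + c


c = compile_time_c(1000, 5, 8)
print(c)                       # 960
print(address_fast(c, 8, 7))   # 1016, same as address_direct(1000, 5, 8, 7)
```

The two formulas agree for every index; the rewrite only moves the subtraction `low * width` out of the run-time path.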
Two-Dimensional Arrays
A two-dimensional array can be stored in either row-major (row-by-row) or column-major (column-by-column) order.
Most programming languages use the row-major method.

Row-major representation of a two-dimensional array:
(Figure: row 1, row 2, ..., row n stored one after another in consecutive locations)

Two-Dimensional Arrays (cont.)
The location of $A[i_1, i_2]$ is
$$base_A + ((i_1 - low_1) \times n_2 + i_2 - low_2) \times width$$
where
$base_A$ is the location of the array A (the address of its first element),
$low_1$ is the index of the first row,
$low_2$ is the index of the first column,
$n_2$ is the number of elements in each row,
$width$ is the width of each array element.

Again, this formula can be rewritten as
$$((i_1 \times n_2) + i_2) \times width + (base_A - ((low_1 \times n_2) + low_2) \times width)$$
where the first term must be computed at run-time and the second term can be computed at compile-time.
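A worked instance of both forms, using the array of Example2 further on (A : 1..10x1..20 of 4-byte integers, so $n_2 = 20$, $low_1 = low_2 = 1$, $width = 4$) and an assumed base address $base_A = 1000$, for the element A[3,5]:

$$
\begin{aligned}
\text{direct:}\quad & 1000 + ((3 - 1) \times 20 + 5 - 1) \times 4 = 1000 + 44 \times 4 = 1176\\
\text{run-time part:}\quad & ((3 \times 20) + 5) \times 4 = 260\\
\text{compile-time part:}\quad & c = 1000 - ((1 \times 20) + 1) \times 4 = 916\\
\text{sum:}\quad & 260 + 916 = 1176
\end{aligned}
$$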
Multi-Dimensional Arrays
In general, the location of $A[i_1, i_2, \ldots, i_k]$ is
$$((\ldots((i_1 \times n_2) + i_2)\ldots) \times n_k + i_k) \times width + (base_A - ((\ldots((low_1 \times n_2) + low_2)\ldots) \times n_k + low_k) \times width)$$

So, the intermediate code generator should produce the code to evaluate the following formula (to find the location of $A[i_1, i_2, \ldots, i_k]$):
$$((\ldots((i_1 \times n_2) + i_2)\ldots) \times n_k + i_k) \times width + c$$

To evaluate the $((\ldots((i_1 \times n_2) + i_2)\ldots) \times n_k + i_k)$ portion of this formula, we can use the recurrence equation:
$$e_1 = i_1, \qquad e_m = e_{m-1} \times n_m + i_m$$

Translation Scheme for Arrays
If we use the following grammar to calculate addresses of array elements, we need inherited attributes.

L → id | id [ Elist ]
Elist → Elist , E | E

Instead of this grammar, we will use the following grammar to calculate addresses of array elements, so that we do not need inherited attributes (we will use only synthesized attributes).

L → id | Elist ]
Elist → Elist , E | id [ E

In the second grammar the array name id is part of the innermost Elist, so every Elist node can carry the array (Elist.array) upward as a synthesized attribute.

Translation Scheme for Arrays (cont.)
S → L := E      { if (L.offset is null) emit(mov E.place ,, L.place)
                  else emit(mov E.place ,, L.place [ L.offset ]) }

E → E1 + E2     { E.place = newtemp();
                  emit(add E1.place , E2.place , E.place) }

E → ( E1 )      { E.place = E1.place; }

E → L           { if (L.offset is null) E.place = L.place
                  else { E.place = newtemp();
                         emit(mov L.place [ L.offset ] ,, E.place) } }


Translation Scheme for Arrays (cont.)
L → id          { L.place = id.place; L.offset = null; }

L → Elist ]     { L.place = newtemp(); L.offset = newtemp();
                  emit(mov c(Elist.array) ,, L.place);
                  emit(mult Elist.place , width(Elist.array) , L.offset) }

Elist → Elist1 , E
                { Elist.array = Elist1.array; Elist.place = newtemp(); Elist.ndim = Elist1.ndim + 1;
                  emit(mult Elist1.place , limit(Elist.array,Elist.ndim) , Elist.place);
                  emit(add Elist.place , E.place , Elist.place); }

Elist → id [ E  { Elist.array = id.place; Elist.place = E.place; Elist.ndim = 1; }

Here c(array) is the compile-time constant c of the array, width(array) is the width of its elements, and limit(array, m) is $n_m$, the number of elements in dimension m.

Translation Scheme for Arrays Example1
A one-dimensional double array A : 5..100
$n_1 = 96$, width = 8 (double), $low_1 = 5$

Intermediate codes corresponding to x := A[y]

mov c,,t1         // where c = base_A - (5)*8
mult y,8,t2
mov t1[t2],,t3
mov t3,,x

Translation Scheme for Arrays Example2
A two-dimensional int array A : 1..10x1..20
$n_1 = 10$, $n_2 = 20$, width = 4 (integers), $low_1 = 1$, $low_2 = 1$

Intermediate codes corresponding to x := A[y,z]

mult y,20,t1
add t1,z,t1
mov c,,t2         // where c = base_A - (1*20+1)*4
mult t1,4,t3
mov t2[t3],,t4
mov t4,,x
Translation Scheme for Arrays Example3
A three-dimensional int array A : 0..9x0..19x0..29
$n_1 = 10$, $n_2 = 20$, $n_3 = 30$, width = 4 (integers), $low_1 = 0$, $low_2 = 0$, $low_3 = 0$

Intermediate codes corresponding to x := A[w,y,z]

mult w,20,t1
add t1,y,t1
mult t1,30,t2
add t2,z,t2
mov c,,t3         // where c = base_A - ((0*20+0)*30+0)*4
mult t2,4,t4
mov t3[t4],,t5
mov t5,,x
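The array translation scheme can be sketched in Python and checked against the three examples. Simplifying assumptions: each index expression is already a simple name (its E.place), `c` is emitted symbolically as in the examples, and a table maps an array name to its dimension sizes $n_1..n_k$ and element width. The class and method names are hypothetical.

```python
from itertools import count


class ArrayTranslator:
    """Synthesized-attribute scheme for x := A[i1,...,ik] (indices are names)."""

    def __init__(self, arrays):
        self.arrays = arrays          # name -> (dims n_1..n_k, element width)
        self.code = []
        self._temps = count(1)

    def newtemp(self):
        return f"t{next(self._temps)}"

    def emit(self, instr):
        self.code.append(instr)

    def elist(self, name, indices):
        """Return Elist.place after reducing 'id [ E' and each ', E'."""
        dims, _ = self.arrays[name]
        place = indices[0]                            # Elist -> id [ E
        for ndim in range(2, len(indices) + 1):       # Elist -> Elist1 , E
            new = self.newtemp()
            limit = dims[ndim - 1]                    # limit(array, ndim) = n_ndim
            self.emit(f"mult {place},{limit},{new}")
            self.emit(f"add {new},{indices[ndim - 1]},{new}")
            place = new
        return place

    def array_ref(self, name, indices):
        """L -> Elist ] : return (L.place, L.offset)."""
        elist_place = self.elist(name, indices)
        _, width = self.arrays[name]
        base, offset = self.newtemp(), self.newtemp()
        self.emit(f"mov c,,{base}")                   # c(Elist.array), kept symbolic
        self.emit(f"mult {elist_place},{width},{offset}")
        return base, offset

    def load(self, target, name, indices):
        """S -> L := E where E -> L (array element) and the left L is an id."""
        base, offset = self.array_ref(name, indices)
        value = self.newtemp()
        self.emit(f"mov {base}[{offset}],,{value}")
        self.emit(f"mov {value},,{target}")
        return self.code


for line in ArrayTranslator({"A": ((10, 20), 4)}).load("x", "A", ["y", "z"]):
    print(line)
```

Running it for Example2 prints the six statements listed there; the one- and three-dimensional arrays of Example1 and Example3 reproduce their listings too.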

Declarations
P → M D
M → ε           { offset = 0 }
D → D ; D
D → id : T      { enter(id.name, T.type, offset); offset = offset + T.width }
T → int         { T.type = int; T.width = 4 }
T → real        { T.type = real; T.width = 8 }
T → array [ num ] of T1
                { T.type = array(num.val, T1.type);
                  T.width = num.val * T1.width }
T → ↑ T1        { T.type = pointer(T1.type); T.width = 4 }

where enter creates a symbol table entry with the given values.
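A sketch of the offset computation: declarations are laid out in order, each variable gets the current offset, and the offset then advances by the width of its type. Encoding types as `"int"`, `"real"`, `("array", n, T1)` and `("pointer", T1)`, and using a dictionary as the symbol table, are assumptions for illustration.

```python
def type_width(t):
    """T.width for the type expressions built by the T productions."""
    if t == "int":
        return 4
    if t == "real":
        return 8
    if isinstance(t, tuple) and t[0] == "array":    # array(num.val, T1.type)
        return t[1] * type_width(t[2])
    if isinstance(t, tuple) and t[0] == "pointer":  # pointer(T1.type)
        return 4
    raise ValueError(f"unknown type: {t!r}")


def declare(decls):
    """P -> M D: lay out 'id : T' declarations in order.

    Returns the symbol table (name -> (type, offset)) and the total width.
    """
    table, offset = {}, 0                    # M -> ε { offset = 0 }
    for name, t in decls:                    # D -> id : T
        table[name] = (t, offset)            # enter(id.name, T.type, offset)
        offset += type_width(t)              # offset = offset + T.width
    return table, offset


table, total = declare([("i", "int"), ("x", "real"),
                        ("a", ("array", 10, "int")), ("p", ("pointer", "real"))])
print(table["a"], total)   # (('array', 10, 'int'), 12) 56
```

Here i, x, a and p receive offsets 0, 4, 12 and 52, and the whole block needs 56 bytes.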
Nested Procedure Declarations
For each procedure we create a separate symbol table.

mktable(previous): creates a new symbol table, where previous is the parent symbol table of the new table.

enter(symtable,name,type,offset): creates a new entry for a variable in the given symbol table.

enterproc(symtable,name,newsymbtable): creates a new entry for the procedure in the symbol table of its parent.

addwidth(symtable,width): puts the total width of all entries in the symbol table into the header of that table.

We will have two stacks:
tblptr, to hold the pointers to the symbol tables;
offset, to hold the current offsets in the symbol tables on the tblptr stack.

Nested Procedure Declarations
P → M D         { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }
M → ε           { t = mktable(nil); push(t, tblptr); push(0, offset) }
D → D ; D
D → proc id N D ; S
                { t = top(tblptr); addwidth(t, top(offset));
                  pop(tblptr); pop(offset);
                  enterproc(top(tblptr), id.name, t) }

D → id : T      { enter(top(tblptr), id.name, T.type, top(offset));
                  top(offset) = top(offset) + T.width }

N → ε           { t = mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
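A sketch of these actions with Python lists as the two stacks. The helpers mirror `mktable`, `enter`, `enterproc` and `addwidth`; the `SymbolTable` class, the `Declarations` driver and its method names are hypothetical, and the driver's methods are called in the order in which the corresponding reductions would happen for a program that declares `a : int`, then a procedure `p` with locals `x : real` and `y : int`, then `b : real`.

```python
class SymbolTable:
    def __init__(self, previous):
        self.previous = previous      # enclosing procedure's table (None at top)
        self.entries = {}
        self.width = 0                # header field set by addwidth


def mktable(previous):
    return SymbolTable(previous)


def enter(table, name, typ, offset):
    table.entries[name] = ("var", typ, offset)


def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)


def addwidth(table, width):
    table.width = width


class Declarations:
    """Semantic actions of the grammar above, driven by explicit calls."""

    def __init__(self):
        self.tblptr, self.offset = [], []      # the two stacks

    def begin_program(self):                   # M -> ε
        self.tblptr.append(mktable(None))
        self.offset.append(0)

    def begin_proc(self):                      # N -> ε
        self.tblptr.append(mktable(self.tblptr[-1]))
        self.offset.append(0)

    def var(self, name, typ, width):           # D -> id : T
        enter(self.tblptr[-1], name, typ, self.offset[-1])
        self.offset[-1] += width

    def end_proc(self, name):                  # D -> proc id N D ; S
        t = self.tblptr[-1]
        addwidth(t, self.offset[-1])
        self.tblptr.pop()
        self.offset.pop()
        enterproc(self.tblptr[-1], name, t)

    def end_program(self):                     # P -> M D
        t = self.tblptr.pop()
        addwidth(t, self.offset.pop())
        return t


d = Declarations()
d.begin_program()
d.var("a", "int", 4)
d.begin_proc()
d.var("x", "real", 8)
d.var("y", "int", 4)
d.end_proc("p")
d.var("b", "real", 8)
top = d.end_program()
print(top.width, top.entries["b"])   # 12 ('var', 'real', 4)
```

Because each procedure pushes a fresh offset of 0, the locals of p are numbered from 0 in p's own table, while b continues from a's offset in the outer table.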
Backpatching
The previous translations of Boolean expressions insert symbolic labels for jumps.

Therefore, a separate pass is needed to set them to the appropriate addresses.

We can use a technique named backpatching to avoid this.

We assume that instructions are saved into an array and that labels are indices into this array.

For a nonterminal B we use two attributes, B.truelist and B.falselist, together with the following functions:

makelist(i): creates a new list containing only i, an index into the array of instructions.
merge(p1,p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated list.
backpatch(p,i): inserts i as the target label for each of the instructions on the list pointed to by p.

Backpatching for Boolean Expressions


Backpatching for Boolean Expressions
(Figure: annotated parse tree for x < 100 || x > 200 && x != y)
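A sketch of makelist, merge and backpatch applied to the expression from the annotated parse tree, x < 100 || x > 200 && x != y. The figures for these slides are not reproduced, so the rules below are an assumption: the usual backpatching rules for || and &&, with a marker M → ε that records nextinstr. Also assumed: the starting index 100, the opcode `jmplt` (by analogy with `jmpgt`), and the class and method names. An unknown target is printed as `_`.

```python
def makelist(i):
    """New list containing only instruction index i."""
    return [i]


def merge(p1, p2):
    """Concatenation of two lists of instruction indices."""
    return p1 + p2


class BoolTranslator:
    """Jumping code for ||, && and relational tests, filled by backpatching."""

    def __init__(self, start=100):
        self.start = start
        self.code = []       # entries [op, y, z, target]; target None = unknown

    @property
    def nextinstr(self):
        return self.start + len(self.code)

    def emit(self, op, y="", z=""):
        self.code.append([op, y, z, None])

    def backpatch(self, p, i):
        for index in p:
            self.code[index - self.start][3] = i

    def translate(self, node):
        """Return (truelist, falselist) for a Boolean expression node."""
        if node[0] in ("||", "&&"):
            t1, f1 = self.translate(node[1])
            m = self.nextinstr                       # M -> ε: M.instr = nextinstr
            t2, f2 = self.translate(node[2])
            if node[0] == "||":                      # B -> B1 || M B2
                self.backpatch(f1, m)
                return merge(t1, t2), f2
            self.backpatch(t1, m)                    # B -> B1 && M B2
            return t2, merge(f1, f2)
        _, relop, left, right = node                 # B -> E1 relop E2
        t, f = makelist(self.nextinstr), makelist(self.nextinstr + 1)
        self.emit("jmp" + relop, left, right)        # conditional jump, target unknown
        self.emit("jmp")                             # unconditional jump, target unknown
        return t, f

    def listing(self):
        return [f"{self.start + k}: {op} {y},{z},{'_' if target is None else target}"
                for k, (op, y, z, target) in enumerate(self.code)]


# x < 100 || x > 200 && x != y   (&& binds tighter than ||)
tree = ("||", ("rel", "lt", "x", "100"),
        ("&&", ("rel", "gt", "x", "200"), ("rel", "ne", "x", "y")))
b = BoolTranslator()
truelist, falselist = b.translate(tree)
print("\n".join(b.listing()))
print(truelist, falselist)   # [100, 104] [103, 105]
```

The jumps left on `truelist` and `falselist` stay unfilled until the enclosing statement knows where the true and false exits go, which is exactly the job of a later backpatch call.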
Flow-of-Control Statements
Translation of a switch-statement