Code Generation
Issues involved in code generation – Register allocation – Conversion of three address code
to assembly code using code generation algorithm – examples – Procedure for converting
assembly code to machine code – Case study
Code Generation:
The final phase in the compiler model is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program.
The code generation techniques presented below can be used whether or not an optimizing
phase occurs before code generation.
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language: It can be placed in a fixed memory location and executed
immediately.
b. Relocatable machine language: It allows subprograms to be compiled separately and then
linked together and loaded for execution.
c. Assembly language: It makes code generation easier, at the price of an extra assembly step.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by
the front end and the code generator. This mapping makes use of the symbol table: a name in a
three-address statement refers to the symbol-table entry for that name.
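As a minimal sketch of this mapping, assuming a purely static layout in which every name occupies one 4-byte word starting at a hypothetical base address, a symbol table can record each name's address, and a three-address statement can then be rewritten in terms of those addresses. The names `build_symbol_table` and `resolve` are illustrative only.

```python
WORD_SIZE = 4  # the target machine has 4 bytes to a word

def build_symbol_table(names, base_address=0):
    """Give each distinct name one word of static storage (hypothetical layout)."""
    table = {}
    next_address = base_address
    for name in names:
        if name not in table:          # one symbol-table entry per name
            table[name] = next_address
            next_address += WORD_SIZE
    return table

def resolve(statement, table):
    """Rewrite x := y op z so that each name refers to its symbol-table address."""
    x, y, op, z = statement
    return (table[x], table[y], op, table[z])

symtab = build_symbol_table(["x", "y", "z", "y"], base_address=100)
print(symtab)                                  # {'x': 100, 'y': 104, 'z': 108}
print(resolve(("x", "y", "+", "z"), symtab))   # (100, 104, '+', 108)
```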
Labels in three-address statements have to be converted to addresses of instructions.
For example, the quadruple
j : goto i
generates a jump instruction as follows:
• If i < j, the jump is backward, and a jump instruction whose target address is the location of
the code for quadruple i is generated.
• If i > j, the jump is forward. We must store, on a list for quadruple i, the location of the first
machine instruction generated for quadruple j. When quadruple i is processed, the target
addresses of all instructions that jump forward to i are filled in.
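The backpatching scheme just described can be sketched in Python as follows. This is a simplified illustration, not a complete code generator. The quadruple encoding ("code" and "goto" tuples), the `JMP` mnemonic and the function name `translate` are assumptions made for the example, and instruction locations are list indices rather than byte addresses.

```python
from collections import defaultdict

def translate(quads):
    """Translate quadruples into instructions, backpatching forward jumps.

    Each quadruple is ("code", [instructions...]) or ("goto", i).
    """
    code = []
    start = {}                    # quadruple number -> location of its first instruction
    pending = defaultdict(list)   # quadruple i -> locations of jumps still waiting for i
    for j, quad in enumerate(quads):
        start[j] = len(code)
        for loc in pending.pop(j, []):         # quadruple j is now placed: fill jumps to it
            code[loc] = f"JMP {start[j]}"
        if quad[0] == "goto":
            i = quad[1]
            if i <= j:                         # backward jump: target location already known
                code.append(f"JMP {start[i]}")
            else:                              # forward jump: remember this location for i
                pending[i].append(len(code))
                code.append(None)              # placeholder until quadruple i is processed
        else:
            code.extend(quad[1])
    return code

program = [
    ("code", ["MOV y,R0", "ADD z,R0", "MOV R0,x"]),   # quadruple 0
    ("goto", 3),                                      # quadruple 1: forward jump
    ("goto", 0),                                      # quadruple 2: backward jump
    ("code", ["MOV R0,w"]),                           # quadruple 3
]
for loc, instr in enumerate(translate(program)):
    print(loc, instr)
```

Here the forward jump at location 3 is emitted as a placeholder and filled in with location 5 only when quadruple 3 is reached.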
Instruction selection:
1. The uniformity and completeness of the target machine's instruction set are important factors.
2. Instruction speeds and machine idioms are important factors when the efficiency of the target
program is considered.
3. The quality of the generated code is determined by its speed and size.
For example, every three-address statement of the form
x = y + z
where x, y and z are statically allocated, can be translated into the code sequence:
MOV y,R0 /* load y into register R0 */
ADD z,R0 /* add z to R0 */
MOV R0,x /* store R0 into x */
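A minimal sketch of this fixed-template, statement-by-statement translation makes its weakness easy to see. It assumes statements are encoded as hypothetical (x, y, op, z) tuples and supports only + and -; each statement is translated in isolation, so nothing is known about what R0 already holds.

```python
OPCODE = {"+": "ADD", "-": "SUB"}

def naive_codegen(statements):
    """Translate each x := y op z separately with the same three-instruction template."""
    code = []
    for x, y, op, z in statements:
        code.append(f"MOV {y},R0")              # load y into register R0
        code.append(f"{OPCODE[op]} {z},R0")     # R0 := R0 op z
        code.append(f"MOV R0,{x}")              # store R0 into x
    return code

for instr in naive_codegen([("a", "b", "+", "c"), ("d", "a", "+", "e")]):
    print(instr)
```

For a := b + c followed by d := a + e, the fourth instruction (MOV a,R0) reloads a immediately after the third one stored it. If a is not used later, the store is redundant as well.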
Unfortunately, this kind of statement-by-statement code generation often produces poor code,
for example a store into a variable immediately followed by a reload of the same variable.
A target machine with a rich instruction set may provide several ways of implementing a given
operation.
For example:
If the target machine has an “increment” instruction (INC), then the three-address statement
a = a + 1
may be implemented more efficiently by the single instruction
INC a
instead of
MOV a,R0
ADD #1,R0
MOV R0,a
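Instruction selection that recognizes this idiom can be sketched as a special case checked before the general template. The tuple encoding, the `#c` literal convention for numeric operands and the function names are assumptions for illustration; a real instruction selector would consider many more patterns.

```python
def operand(v):
    """Numeric constants use the literal mode #c; names are memory locations."""
    return f"#{v}" if v.isdigit() else v

def select(stmt):
    """Choose instructions for x := y op z, using INC for the idiom x := x + 1."""
    x, y, op, z = stmt
    if op == "+" and x == y and z == "1":
        return [f"INC {x}"]                    # one instruction instead of three
    opcode = {"+": "ADD", "-": "SUB"}[op]
    return [f"MOV {operand(y)},R0", f"{opcode} {operand(z)},R0", f"MOV R0,{x}"]

print(select(("a", "a", "+", "1")))   # ['INC a']
print(select(("b", "b", "+", "2")))   # ['MOV b,R0', 'ADD #2,R0', 'MOV R0,b']
```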
Register allocation:
Instructions involving register operands are shorter and faster than those involving operands
in memory. The use of registers is subdivided into two subproblems:
• Register allocation: the set of variables that will reside in registers at each point in the
program is selected.
• Register assignment: the specific register in which a variable will reside is selected.
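The two subproblems can be separated in a short sketch. The heuristic used here, keeping the k most frequently referenced names in registers, is a hypothetical stand-in chosen only to make the split visible; it is not the allocation strategy of any particular compiler, and the function names are illustrative.

```python
from collections import Counter

def allocate(statements, k):
    """Register allocation: choose WHICH names will reside in registers.

    Hypothetical heuristic: the k names referenced most often in (x, y, op, z) statements.
    """
    counts = Counter()
    for x, y, _op, z in statements:
        counts.update([x, y, z])
    return [name for name, _count in counts.most_common(k)]

def assign(chosen):
    """Register assignment: choose the SPECIFIC register R0, R1, ... for each chosen name."""
    return {name: f"R{i}" for i, name in enumerate(chosen)}

block = [("t", "a", "+", "b"), ("a", "t", "-", "c"), ("a", "a", "+", "t")]
chosen = allocate(block, 2)
print(chosen)           # ['a', 't']
print(assign(chosen))   # {'a': 'R0', 't': 'R1'}
```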
Certain machines require even/odd register pairs for some operands and results. For example,
consider a division instruction of the form:
Div x, y
where x, the dividend, occupies the even register of an even/odd register pair, and y is the
divisor. After division, the even register holds the remainder and the odd register holds the
quotient.
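Before emitting such a division, the code generator has to find an even/odd pair of free registers. A minimal sketch, assuming the free registers are given as a set of register numbers (a hypothetical representation):

```python
def find_even_odd_pair(free):
    """Return (even, odd) register numbers such as (2, 3) if both are free, else None."""
    for r in sorted(free):
        if r % 2 == 0 and r + 1 in free:
            return (r, r + 1)
    return None

print(find_even_odd_pair({1, 2, 3, 5}))   # (2, 3): R2 holds the dividend, R3 is its partner
print(find_even_odd_pair({0, 3, 4}))      # None: no complete pair is free
```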
Evaluation order:
Finally, the code generator decides the order in which the instructions will be executed. The
order in which the computations are performed can affect the efficiency of the target code:
some computation orders require fewer registers to hold intermediate results than others.
• Picking the best order is, in general, a difficult (NP-complete) problem.
• Initially, we avoid this problem by generating code for the three-address statements in the
order in which they have been produced by the intermediate code generator.
• Instruction scheduling may then reorder the generated instructions where this is safe.
When instructions are independent, their evaluation order can be changed to make better use of
registers and reduce instruction cost. Consider the following expression:
a+b-(c+d)*e
The three-address code, the corresponding target code and its reordered version are given
below:
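To see why the order matters, here is a sketch that generates code for the three-address statements of a+b-(c+d)*e on a hypothetical machine with a single register R0 and an assumed MUL op-code (the op-code list for the target machine shows only MOV, ADD and SUB). A value in R0 is stored only if it is still needed and not yet in memory. The function name and statement encoding are illustrative, and this is not a general reordering algorithm.

```python
COMMUTATIVE = {"+", "*"}
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL"}   # MUL is an assumed op-code

def gen_one_register(stmts):
    """Generate code for (x, y, op, z) statements using only R0.

    Assumes every x is a temporary; the result of the last statement is left in R0.
    """
    code = []
    acc = None            # name whose value is currently in R0
    in_memory = set()     # temporaries that have already been stored
    for i, (x, y, op, z) in enumerate(stmts):
        if acc == z and op in COMMUTATIVE:
            y, z = z, y                        # use the value in R0 as the left operand
        if acc != y:
            needed = any(acc in (s[1], s[3]) for s in stmts[i:])
            if acc is not None and needed and acc not in in_memory:
                code.append(f"MOV R0,{acc}")   # spill a value that is still needed
                in_memory.add(acc)
            code.append(f"MOV {y},R0")         # load the left operand
        src = "R0" if z == y else z            # y op y: the operand is already in R0
        code.append(f"{OPCODE[op]} {src},R0")  # R0 := R0 op z
        acc = x
    return code

original = [("t1", "a", "+", "b"), ("t2", "c", "+", "d"),
            ("t3", "t2", "*", "e"), ("t4", "t1", "-", "t3")]
reordered = [("t2", "c", "+", "d"), ("t3", "t2", "*", "e"),
             ("t1", "a", "+", "b"), ("t4", "t1", "-", "t3")]
print(len(gen_one_register(original)))    # 9
print(len(gen_one_register(reordered)))   # 7
```

Computing t1 last is legal because it does not depend on t2 or t3, and it leaves t1 in R0 for the final subtraction instead of storing and reloading it.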
TARGET MACHINE:
Familiarity with the target machine and its instruction set is a prerequisite for designing a
good code generator. The target computer is a byte-addressable machine with 4 bytes to a
word. It has n general-purpose registers, R0, R1, ..., Rn-1. It has two-address instructions of
the form:
op source, destination
where op is an op-code, and source and destination are data
fields. It has the following op-codes:
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
The source and destination of an instruction are specified by combining registers and
memory locations with address modes.
Table 5.1 Mode and address allocation table
Here, contents(a) denotes the contents of the register or memory address
represented by a. A memory location M or a register R represents itself when used as a source
or destination.
e.g. MOV R0,M stores the contents of register R0 into memory location M.
Instruction costs:
Instruction cost = 1 + cost of the source and destination address modes. This cost corresponds
to the length of the instruction in words.
Address modes that use only registers have cost zero. Address modes involving a memory
location or a literal have cost one, because that value must be stored in a word following the
instruction. Instruction length should be minimized if space is important; doing so also tends
to minimize the time taken to fetch and perform the instruction.
For example:
1. The instruction MOV R0, R1 copies the contents of register R0 into R1. It has cost
one, since it occupies only one word of memory.
2. The (store) instruction MOV R5,M copies the contents of register R5 into memory
location M. This instruction has cost two, since the address of memory location M is
in the word following the instruction.
3. The instruction ADD #1,R3 adds the constant 1 to the contents of register R3 and has
cost two, since the constant 1 must appear in the word following the instruction.
4. The instruction SUB 4(R0),*12(R1) stores the value
contents(contents(12 + contents(R1))) - contents(4 + contents(R0))
into the destination *12(R1). The cost of this instruction is three, since the constants 4 and
12 are stored in the next two words following the instruction.
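The cost rule can be checked mechanically. A minimal sketch, assuming instructions are written as plain strings and classifying operands only as far as the rules above require: R and *R cost zero, while a memory name, a literal #c or an indexed operand such as 4(R0) costs one extra word. Table 5.1 itself is not reproduced here, and the function names are illustrative.

```python
import re

REGISTER_MODE = re.compile(r"\*?R\d+")   # register R or indirect register *R

def mode_cost(operand):
    """Extra words needed by one operand's address mode."""
    if REGISTER_MODE.fullmatch(operand.strip()):
        return 0     # register modes need no extra word
    return 1         # memory name M, literal #c, or indexed c(R) / *c(R)

def instruction_cost(instr):
    """Cost = 1 + cost of the source and destination address modes."""
    _opcode, operands = instr.split(None, 1)
    return 1 + sum(mode_cost(op) for op in operands.split(","))

def sequence_cost(seq):
    return sum(instruction_cost(instr) for instr in seq)

for instr in ["MOV R0, R1", "MOV R5,M", "ADD #1,R3", "SUB 4(R0),*12(R1)"]:
    print(instr, "cost =", instruction_cost(instr))   # costs 1, 2, 2, 3
```

Summing over a sequence with sequence_cost gives 6, 6 and 2 for the three implementations of a := b + c discussed next.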
The three-address statement a := b + c can be implemented by many different instruction
sequences:
1. MOV b, R0
   ADD c, R0
   MOV R0, a          cost = 6
2. MOV b, a
   ADD c, a           cost = 6
3. Assuming R0, R1 and R2 contain the addresses of a, b and c:
   MOV *R1, *R0
   ADD *R2, *R0       cost = 2
In order to generate good code for the target machine, we must use its addressing capabilities
efficiently.
Run-Time Storage Management:
Information needed during an execution of a procedure is kept in a block of storage called an
activation record, which includes storage for names local to the procedure.
The two standard storage allocation strategies are:
• Static allocation
• Stack allocation
In static allocation, the position of an activation record in memory is fixed at compile time.
In stack allocation, a new activation record is pushed onto the stack for each execution of a
procedure. The record is popped when the activation ends.
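Stack allocation can be pictured with a small model in which each call pushes a fresh activation record and each return pops it. The record fields and class names are hypothetical; a real activation record also holds items such as parameters, saved machine status and temporaries.

```python
from dataclasses import dataclass, field

@dataclass
class ActivationRecord:
    procedure: str
    return_address: int
    local_names: dict = field(default_factory=dict)   # storage for names local to the procedure

class ControlStack:
    def __init__(self):
        self.records = []

    def call(self, procedure, return_address):
        """A new activation record is pushed for each execution of a procedure."""
        self.records.append(ActivationRecord(procedure, return_address))

    def ret(self):
        """The record is popped when the activation ends; its return address is used."""
        return self.records.pop().return_address

stack = ControlStack()
stack.call("p", 120)                     # main calls p
stack.call("p", 260)                     # p calls itself: a second, separate record for p
stack.records[-1].local_names["i"] = 0   # locals of the inner activation only
print(len(stack.records))                # 2
print(stack.ret())                       # 260
print(len(stack.records))                # 1
```

Because each activation gets its own record, the recursive call does not overwrite the first activation's locals, which a single statically allocated record per procedure could not guarantee.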
The following three-address statements are associated with the run-time allocation and
deallocation of activation records:
• Call
• Return
• Halt
We assume that the run-time memory is divided into areas for:
• Code
• Static data
• Stack
Static allocation:
• In this allocation scheme, data is bound to a fixed memory location at compile time, and this
binding does not change while the program executes.
• As the memory requirement and storage locations are known in advance, a run-time
support package for memory allocation and deallocation is not required.
Implementation of call statement:
The code needed to implement a call under static allocation is as follows:
MOV #here +20, callee.static_area /* saves the return address */
GOTO callee.code_area /* transfers control to the target code for the called procedure */
where,
callee.static_area – Address of the activation record
callee.code_area – Address of the first instruction of the called procedure
#here +20 – Literal return address, i.e., the address of the instruction following GOTO.
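The constant 20 follows from the instruction costs given earlier, assuming the GOTO instruction occupies two words (the op-code word plus one word for its target address). The MOV has a literal source and a memory destination, so it occupies 3 words; together with the 2-word GOTO that is 5 words, or 20 bytes on a machine with 4 bytes to a word. A sketch that emits the call sequence with the return address computed this way (the function name and the numeric addresses are illustrative):

```python
WORD_SIZE = 4  # bytes per word on the target machine

def static_call(here, callee_static_area, callee_code_area):
    """Emit the static-allocation call sequence that starts at address `here`."""
    mov_words = 1 + 1 + 1      # op-code word + literal return address + memory destination
    goto_words = 1 + 1         # op-code word + target address (assumed instruction size)
    return_address = here + (mov_words + goto_words) * WORD_SIZE   # here + 20
    code = [
        f"MOV #{return_address}, {callee_static_area}",  # save the return address
        f"GOTO {callee_code_area}",                     # transfer control to the callee
    ]
    return code, return_address

code, ret = static_call(100, 364, 200)
print(code)   # ['MOV #120, 364', 'GOTO 200']
```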
If the current values of y or z have no next uses, are not live on exit from the block, and are in
registers, alter the register descriptor to indicate that, after execution of x := y op z, those
registers will no longer contain y or z.
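This descriptor update can be sketched with the register descriptor held as a dictionary from register names to the set of names whose values each register contains. The next-use and live-on-exit sets are assumed to be supplied by an earlier analysis, and the function name and data layout are hypothetical. The sketch also records that the result register now holds x and removes x from every other register, the usual companion bookkeeping when x receives a new value.

```python
def update_after(reg_desc, stmt, result_reg, has_next_use, live_on_exit):
    """Update register descriptors after x := y op z has been computed into result_reg."""
    x, y, _op, z = stmt
    for names in reg_desc.values():
        names.discard(x)                       # old copies of x are now stale
        for v in (y, z):
            if v in names and v not in has_next_use and v not in live_on_exit:
                names.discard(v)               # dead y or z: the register no longer holds it
    reg_desc[result_reg] = {x}                 # result_reg now holds only x
    return reg_desc

regs = {"R0": {"y"}, "R1": {"z"}, "R2": {"w"}}
update_after(regs, ("x", "y", "+", "z"), "R0", has_next_use={"z"}, live_on_exit=set())
print(regs)   # {'R0': {'x'}, 'R1': {'z'}, 'R2': {'w'}}
```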
where,
- use(x, B) is the number of times x is used in B prior to any definition of x;
- live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and 0 otherwise.
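The two quantities can be computed directly from a basic block. In this sketch the block is a list of hypothetical (x, y, op, z) tuples, and the set of names live on exit from B is assumed to come from a prior data-flow analysis; only use(x, B) and live(x, B) themselves are computed.

```python
def use(x, block):
    """Number of times x is used in block B before any definition of x."""
    count = 0
    for dest, y, _op, z in block:
        count += (y == x) + (z == x)   # right-hand-side uses happen before the assignment
        if dest == x:
            break                      # x is redefined: later uses do not count
    return count

def live(x, block, live_on_exit):
    """1 if x is live on exit from B and is assigned a value in B, else 0."""
    assigned = any(dest == x for dest, _y, _op, _z in block)
    return 1 if (x in live_on_exit and assigned) else 0

B = [("a", "b", "+", "c"), ("d", "a", "-", "b"), ("b", "d", "+", "a")]
live_out = {"a", "b"}
for name in "abcd":
    print(name, use(name, B), live(name, B, live_out))
# a 0 1
# b 2 1
# c 1 0
# d 0 0
```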
Example: Consider the basic block in the inner loop in Fig 5.3, where jumps and conditional
jumps have been omitted. Assume R0, R1 and R2 are allocated to hold values throughout the
loop. Variables live on entry into and on exit from each block are shown in the figure.