Code Generation 5th Year Computer Science Course
Code Generation
Code generation is the final phase of a compiler. It takes as input an intermediate
representation of the source program, together with supplementary information in the
symbol table, and produces as output an equivalent target program.
The code generator produces target code for three-address statements and uses
registers to hold the operands of each statement.
Code generator main tasks:
Instruction ordering
Consider the three-address statement x := y + z. It can be translated into the
following code sequence:
MOV z, R0      R0 ← z
ADD y, R0      R0 ← y + R0
MOV R0, x      x ← R0
The code generation techniques presented below can be used whether or not an
optimizing phase occurs before code generation.
ISSUES IN THE DESIGN OF A CODE GENERATOR
The following issues arise during the code generation
phase:
Input to code generator
Target program
Memory management
Instruction selection
Register allocation
Evaluation order
Input to code generator
The input to the code generator consists of the intermediate
representation of the source program produced by the front end, together
with information in the symbol table that is used to determine the run-time
addresses of the data objects denoted by the names in the intermediate
representation.
Target program
The output of the code generator is the target program.
The output may be:
Absolute machine language
Producing an absolute machine language program as output has the advantage that
it can be placed in a fixed location in memory and immediately executed.
Relocatable machine language
A set of relocatable object modules can be linked together and
loaded for execution by a linking loader.
If the target machine does not handle relocation automatically,
the compiler must provide explicit relocation information
to the loader so that it can link the separately compiled
program segments.
Assembly language
Producing an assembly language program as output makes
the process of code generation somewhat easier.
Memory Management
Names in the source program are mapped to addresses of data
objects in run-time memory by the front end and code generator.
Instruction selection
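The instruction set of the target machine should be complete and uniform.
For example, translating a := b + c and d := a + e one statement at a time gives:
MOV b, R0      R0 ← b
ADD c, R0      R0 ← c + R0
MOV R0, a      a ← R0
MOV a, R0      R0 ← a
ADD e, R0      R0 ← e + R0
MOV R0, d      d ← R0
The fourth instruction is redundant, because R0 already holds the value of a.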
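A statement-by-statement translation of this kind can be sketched in a few lines of Python. This is only an illustration: the function names naive_translate and drop_redundant_loads, the tuple layout (dest, left, op, right) and the SUB mnemonic are assumptions made for the example, not part of the course material. The second function shows one simple way to detect a load that immediately follows a store of the same value.

```python
def naive_translate(statements):
    """Translate each (dest, left, op, right) statement in isolation."""
    opcodes = {"+": "ADD", "-": "SUB"}  # assumed mnemonic table
    code = []
    for dest, left, op, right in statements:
        code.append(f"MOV {left}, R0")               # R0 <- left
        code.append(f"{opcodes[op]} {right}, R0")    # R0 <- R0 op right
        code.append(f"MOV R0, {dest}")               # dest <- R0
    return code


def drop_redundant_loads(code):
    """Drop a MOV that only undoes the MOV directly before it."""
    result = []
    for instr in code:
        if result and instr.startswith("MOV "):
            src, dst = [part.strip() for part in instr[4:].split(",")]
            # 'MOV a, R0' right after 'MOV R0, a' copies a value that
            # is already in place, so it can be skipped.
            if result[-1] == f"MOV {dst}, {src}":
                continue
        result.append(instr)
    return result


if __name__ == "__main__":
    program = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
    for line in drop_redundant_loads(naive_translate(program)):
        print(line)
```

Running the sketch on the two statements gives the six-instruction sequence from naive_translate and a five-instruction sequence after the clean-up pass, with the reload of a removed.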
Directed Acyclic Graph
A directed acyclic graph (DAG) is a tool that depicts the structure of a
basic block, shows how values flow between its statements, and exposes
opportunities for optimization. A DAG makes transformations on basic
blocks easy to apply. Its nodes are read as follows:
Leaf nodes represent identifiers, names or constants.
Interior nodes represent operators.
Interior nodes also represent the results of expressions, or the
identifiers/names in which those values are to be stored or assigned.
t0 = a + b
t1 = t0 + c
d = t0 + t1
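The DAG for these three statements can be built mechanically. The Python sketch below is one possible construction, assuming statements are given as (dest, left, op, right) tuples; the name build_dag and the node encoding are hypothetical choices for this example. Identical (operator, left, right) combinations map to the same node, which is how a DAG shares common subexpressions.

```python
def build_dag(statements):
    """Build a DAG from (dest, left, op, right) three-address statements.

    Returns (nodes, labels): nodes[i] is ("leaf", name) or (op, left_id,
    right_id), and labels maps a node id to the names attached to it.
    """
    nodes = []      # node id -> node description
    index = {}      # node description -> node id, so equal nodes are shared
    current = {}    # name -> id of the node holding its current value
    labels = {}     # node id -> list of names whose value is that node

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        # A name with no defining statement so far becomes a leaf.
        if name not in current:
            current[name] = node_for(("leaf", name))
        return current[name]

    for dest, left, op, right in statements:
        left_id = operand(left)
        right_id = operand(right)
        node_id = node_for((op, left_id, right_id))
        # dest now names this node and no other.
        for names in labels.values():
            if dest in names:
                names.remove(dest)
        labels.setdefault(node_id, []).append(dest)
        current[dest] = node_id
    return nodes, labels


if __name__ == "__main__":
    block = [("t0", "a", "+", "b"), ("t1", "t0", "+", "c"), ("d", "t0", "+", "t1")]
    nodes, labels = build_dag(block)
    for node_id, node in enumerate(nodes):
        print(node_id, node, labels.get(node_id, []))
```

For the block above the sketch creates three leaves (a, b, c) and three interior + nodes labelled t0, t1 and d; the node for t0 has two parents, one from t1 and one from d.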
Descriptors
The code generator has to track both the registers (for availability) and
addresses (location of values) while generating the code. For both of
them, the following two descriptors are used:
Register descriptor :
It is used to inform the code generator about the availability of registers.
It keeps track of values stored in each register.
Whenever a new register is required during code generation, this
descriptor is consulted for register availability.
Initially, the register descriptor shows that all registers are empty.
Address descriptor :
An address descriptor stores the location where the current
value of a name can be found at run time.
The values of names (identifiers) used in the program might be
stored at different locations during execution.
It is used to keep track of the memory locations where the values of
identifiers are stored.
These locations may include CPU registers, heaps, stacks, memory
or a combination of the mentioned locations.
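One way to picture the two descriptors is as a pair of dictionaries. The Python sketch below is an assumption-laden illustration, not a prescribed layout: the class name Descriptors, its methods, and the convention that a name's memory location is denoted by the name itself are all invented for this example.

```python
class Descriptors:
    """Register and address descriptors for one basic block (a sketch)."""

    def __init__(self, registers, names):
        # Register descriptor: every register starts out empty.
        self.register = {reg: set() for reg in registers}
        # Address descriptor: assume each name starts in its own memory
        # location, written here as the name itself.
        self.address = {name: {name} for name in names}

    def free_registers(self):
        """Registers that currently hold no value."""
        return [reg for reg, held in self.register.items() if not held]

    def load(self, reg, name):
        """Record the effect of 'MOV name, reg'."""
        # Whatever reg held before is no longer available there.
        for old in self.register[reg]:
            self.address[old].discard(reg)
        self.register[reg] = {name}
        self.address[name].add(reg)


if __name__ == "__main__":
    d = Descriptors(["R0", "R1"], ["a", "b"])
    d.load("R0", "a")
    print(d.free_registers())      # registers still empty
    print(sorted(d.address["a"]))  # every location that holds a
```

After load("R0", "a") the register descriptor records that R0 holds a, and the address descriptor records that a can be found both in memory and in R0.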
getReg Function
getReg: The code generator uses the getReg function to
determine the status of available registers and the location
of name values. getReg works as follows:
If variable Y is already in register R, it uses that register.
Else if some register R is available, it uses that register.
Else, if neither of the above options is possible, it chooses a
register that requires the minimal number of load and store
instructions.
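The three cases can be written down directly. The sketch below is a simplified reading of the rules above, under stated assumptions: descriptors are plain dictionaries, the function is named get_reg, and the cost of evicting a register is counted only as the number of stores needed to save values that exist nowhere else. It returns the chosen register together with any store instructions, and leaves updating the descriptors to the caller.

```python
def get_reg(y, register_desc, address_desc):
    """Pick a register for y; return (register, spill_instructions).

    register_desc maps a register name to the set of names it holds.
    address_desc maps a name to the set of locations holding its value.
    """
    # Case 1: y is already in some register, so reuse it.
    for reg, held in register_desc.items():
        if y in held:
            return reg, []

    # Case 2: some register is empty, so take it.
    for reg, held in register_desc.items():
        if not held:
            return reg, []

    # Case 3: evict the register that needs the fewest stores. A store is
    # needed only for a name whose sole copy lives in that register.
    def stores_needed(reg):
        return [n for n in sorted(register_desc[reg])
                if address_desc[n] == {reg}]

    victim = min(register_desc, key=lambda reg: len(stores_needed(reg)))
    return victim, [f"MOV {victim}, {n}" for n in stores_needed(victim)]


if __name__ == "__main__":
    regs = {"R0": {"a"}, "R1": {"b"}}
    addrs = {"a": {"R0"}, "b": {"R1", "b"}, "c": {"c"}}
    print(get_reg("c", regs, addrs))
```

In the usage example both registers are occupied. R1 is chosen for c because b also has a copy in memory, so evicting it costs no store, whereas evicting R0 would first require saving a.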
A code-generation algorithm
The algorithm takes a sequence of three-address statements as input. For each
three-address statement of the form x := y op z, it performs the following actions:
Invoke the function getReg to determine the location L where the result of the computation
y op z should be stored.
Consult the address descriptor for y to determine y', the current location of y. If the value of y
is currently both in memory and in a register, prefer the register for y'. If the value of y is not
already in L, generate the instruction MOV y', L to place a copy of y in L.
Generate the instruction OP z', L, where z' is the current location of z. If z is in both a register
and memory, prefer the register. Update the address descriptor of x to indicate that x is in
location L. If L is a register, update its descriptor to indicate that it holds x, and remove x
from all other register descriptors.
If the current values of y or z have no next uses, are not live on exit from the block, and are in
registers, alter the register descriptor to indicate that, after execution of x := y op z, those
registers will no longer contain y or z.
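The four actions can be put together in a small Python sketch. It is a simplified illustration under several assumptions rather than a definitive implementation: the name generate_code, the tuple layout (x, y, op, z), the opcode table, and the set live_on_exit are all hypothetical. Its inner get_reg also differs slightly from the rule list given for getReg: it reuses the register holding y only when y is not needed afterwards, and when no register is free it naively evicts the first one. Names that are live on exit and held only in a register are stored at the end of the block.

```python
def generate_code(statements, registers, live_on_exit):
    """Generate code for a basic block of (x, y, op, z) statements."""
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL"}  # assumed mnemonics
    reg_desc = {r: set() for r in registers}  # register -> names held
    addr_desc = {}                            # name -> locations
    code = []

    def locations(name):
        # A name not seen before is assumed to live in memory.
        return addr_desc.setdefault(name, {name})

    def best_location(name):
        for loc in sorted(locations(name)):
            if loc in reg_desc:               # prefer a register copy
                return loc
        return name

    def used_later(name, i):
        for x, y, _, z in statements[i + 1:]:
            if name in (y, z):
                return True
            if name == x:                     # redefined before any use
                return False
        return name in live_on_exit

    def get_reg(y, i):
        for r in registers:                   # reuse y's register if y is dead
            if reg_desc[r] == {y} and not used_later(y, i):
                return r
        for r in registers:                   # otherwise an empty register
            if not reg_desc[r]:
                return r
        r = registers[0]                      # assumption: naive victim choice
        for name in sorted(reg_desc[r]):
            if locations(name) == {r}:        # only copy: save it first
                code.append(f"MOV {r}, {name}")
                locations(name).add(name)
            locations(name).discard(r)
        reg_desc[r] = set()
        return r

    for i, (x, y, op, z) in enumerate(statements):
        L = get_reg(y, i)                                 # action 1
        y_loc = best_location(y)                          # action 2
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        code.append(f"{opcodes[op]} {best_location(z)}, {L}")  # action 3
        for name in reg_desc[L]:
            locations(name).discard(L)
        reg_desc[L] = {x}
        for r in registers:
            if r != L:
                reg_desc[r].discard(x)
        addr_desc[x] = {L}
        for name in (y, z):                               # action 4
            if name != x and not used_later(name, i):
                for r in registers:
                    if name in reg_desc[r]:
                        reg_desc[r].discard(name)
                        locations(name).discard(r)

    for name in sorted(live_on_exit):         # save live values at block end
        if name not in locations(name):
            code.append(f"MOV {best_location(name)}, {name}")
            locations(name).add(name)
    return code


if __name__ == "__main__":
    block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
             ("v", "t", "+", "u"), ("d", "v", "+", "u")]
    print("\n".join(generate_code(block, ["R0", "R1"], {"d"})))
```

The usage example translates d := (a - b) + (a - c) + (a - c), split into four three-address statements, with two registers and only d live on exit. Because t and v have no next use when they are consumed, their register is reused for the result and no intermediate stores are generated.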