Code Generation 5th Year Computer Science Course


Code Generation

It is the final phase of a compiler. It takes as input an intermediate representation
of the source program, together with supplementary information in the symbol table,
and produces as output an equivalent target program.

The source code, written in a higher-level language, is transformed into lower-level
object code, which should have at least the following properties:
 It should carry the exact meaning of the source code.

 It should be efficient in terms of CPU usage and memory management.

The code generator produces the target code for the three-address statements, using
registers to hold the operands of each three-address statement.
 The main tasks of the code generator are:

 Instruction selection: determined by the level of the IR, the nature of the ISA
(instruction set architecture) and the desired quality of the generated code
 Register allocation and assignment

 Instruction ordering

Consider the three-address statement x := y + z. It can be translated into the
following sequence of instructions:
MOV z, R0
ADD y, R0
MOV R0, x
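As a rough illustration, a statement-by-statement code generator can use a fixed template for x := y op z. The Python sketch below is hypothetical: the function name gen_binop, the single scratch register R0 and the SUB/MUL/DIV mnemonics are assumptions, and it follows the "OP source, destination" convention used in these slides. It loads y first so that non-commutative operators such as - come out right; for + the order used above (load z, then ADD y) is equivalent.

```python
def gen_binop(x: str, y: str, op: str, z: str, reg: str = "R0") -> list[str]:
    """Translate x := y op z with a fixed three-instruction template."""
    # Hypothetical mnemonics; only MOV and ADD appear in the slides.
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}[op]
    return [
        f"MOV {y}, {reg}",       # reg <- y
        f"{opcode} {z}, {reg}",  # reg <- reg op z
        f"MOV {reg}, {x}",       # x <- reg
    ]


for instr in gen_binop("x", "y", "+", "z"):
    print(instr)
```

Because every statement gets the same template regardless of its neighbours, this approach is simple but tends to produce redundant loads and stores.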
 The code-generation techniques presented below can be used whether or not an
optimizing phase occurs before code generation.

.
4
ISSUES IN THE DESIGN OF A CODE GENERATOR
 The following issues arise during the code-generation phase:
 Input to code generator
 Target program
 Memory management
 Instruction selection
 Register allocation
 Evaluation order
Input to code generator
 The input to the code generator consists of the intermediate
representation of the source program produced by the front end, together
with information in the symbol table that is used to determine the run-time
addresses of the data objects denoted by the names in the intermediate representation.

 The intermediate representation can be:


 Linear representation such as postfix notation
 Three address representation such as quadruples
 Virtual machine representation such as stack machine code
 Graphical representations such as syntax trees and DAGs

Target program
 The output of the code generator is the target program.
The output may be:
 Absolute machine language
 Producing an absolute machine language program as output has the advantage that
it can be placed in a fixed location in memory and immediately executed.

 Relocatable machine language


 Producing a relocatable machine language program as output allows subprograms
to be compiled separately.

 A set of relocatable object modules can be linked together and
loaded for execution by a linking loader.
 If the target machine does not handle relocation automatically,
the compiler must provide explicit relocation information
to the loader, to link the separately compiled program
segments.
 Assembly language
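 Producing an assembly language program as output makes
the process of code generation somewhat easier, because the code generator can
emit symbolic instructions and leave the final translation to the assembler.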
Memory Management
 Names in the source program are mapped to addresses of data
objects in run-time memory by the front end and code generator.

 It makes use of the symbol table: a name in a three-address
statement refers to the symbol-table entry for that name.

 Labels in three-address statements have to be converted to
addresses of instructions.
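Labels can be converted to instruction addresses in two passes: the first records the address of the instruction each label marks, and the second rewrites jump targets. The sketch below uses instruction indices as addresses and a line of the form "L1:" to mark a label; both conventions, like the function name resolve_labels, are assumptions for illustration. A real code generator would work with byte offsets, and may instead patch forward jumps in a single pass (backpatching).

```python
def resolve_labels(code: list[str]) -> list[str]:
    """Two passes: record label addresses, then rewrite 'goto LABEL' targets."""
    addresses: dict[str, int] = {}
    body: list[str] = []
    # Pass 1: a line "L:" names the address of the next real instruction.
    for line in code:
        if line.endswith(":"):
            addresses[line[:-1]] = len(body)
        else:
            body.append(line)
    # Pass 2: replace the symbolic target of each jump with its address.
    resolved: list[str] = []
    for instr in body:
        words = instr.split()
        if words[-2:-1] == ["goto"]:
            words[-1] = str(addresses[words[-1]])
        resolved.append(" ".join(words))
    return resolved


code = ["i := 0", "L1:", "if i >= 10 goto L2", "i := i + 1", "goto L1", "L2:", "halt"]
for address, instr in enumerate(resolve_labels(code)):
    print(address, instr)
```

Here L1 resolves to address 1 and L2 to address 4, so the two jumps become "goto 4" and "goto 1".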

Instruction selection
 Ideally, the instruction set of the target machine should be complete and uniform.

 Instruction speeds and machine idioms are important factors when
the efficiency of the target program is considered.

 The quality of the generated code is determined by its speed and size.

 The factors to be considered during instruction selection are:


 The uniformity and completeness of the instruction set.
 Instruction speed.
 Size of the instruction set.
 Translating each three-address statement independently, with a fixed code
template, often produces poor code.

For example, for the three-address code

a := b + c
d := a + e

statement-by-statement translation gives the following inefficient assembly code:

MOV b, R0     R0 ← b
ADD c, R0     R0 ← c + R0
MOV R0, a     a ← R0
MOV a, R0     R0 ← a
ADD e, R0     R0 ← e + R0
MOV R0, d     d ← R0

Here the fourth statement is redundant, and so is the third statement if a is not
subsequently used.
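The redundant fourth statement can be caught by a simple check on adjacent instructions. The sketch below is a hypothetical helper, not a standard API: it drops "MOV a, R" when it immediately follows "MOV R, a", because R still holds a. It assumes both instructions lie in the same basic block (no label between them). Removing the third statement as well would need liveness information about a, which this sketch does not have.

```python
def remove_redundant_loads(code: list[str]) -> list[str]:
    """Drop 'MOV a, R' when it directly follows 'MOV R, a' in one basic block."""
    def operands(instr: str) -> tuple[str, str]:
        src, dst = instr[len("MOV "):].split(",")
        return src.strip(), dst.strip()

    out: list[str] = []
    for instr in code:
        if out and instr.startswith("MOV ") and out[-1].startswith("MOV "):
            src, dst = operands(instr)
            prev_src, prev_dst = operands(out[-1])
            if src == prev_dst and dst == prev_src:
                continue  # the destination already holds this value
        out.append(instr)
    return out


code = ["MOV b, R0", "ADD c, R0", "MOV R0, a",
        "MOV a, R0", "ADD e, R0", "MOV R0, d"]
print(remove_redundant_loads(code))
```

Applied to the six instructions above, it keeps five and drops MOV a, R0.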


Register allocation
 Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Therefore efficient utilization of registers is particularly
important in generating good code.

 The use of registers is subdivided into two subproblems:


 Register allocation - the set of variables that will reside in
registers at a point in the program is selected.


 Register assignment - the specific register in which a variable will reside
is picked.
Evaluation order
 The evaluation order affects the efficiency of the target code.

 Some computation orders require fewer registers to
hold intermediate results than others.

 Picking the best order in the general case is an NP-complete problem.

 Initially, we shall avoid the problem by generating code
for the three-address statements in the order in which they
have been produced by the intermediate code generator.
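To make the register claim concrete, the sketch below counts the registers needed for a + (b + c) * (d + e) under two evaluation orders. It assumes, for simplicity, that every operand must first be loaded into a register and that the expression is a tree (no shared subexpressions); the best-order rule is a simplified form of Sethi-Ullman numbering, and the function names are hypothetical. For trees, a best order can be found efficiently in this way; the hard general case involves shared subexpressions (DAGs).

```python
from typing import Union

# A leaf is a variable name; an interior node is (op, left, right).
Tree = Union[str, tuple]


def regs_left_first(t: Tree) -> int:
    """Registers needed if the left operand is always evaluated first."""
    if isinstance(t, str):
        return 1                        # assumption: each operand is loaded into a register
    _, left, right = t
    need_l, need_r = regs_left_first(left), regs_left_first(right)
    return max(need_l, need_r + 1)      # left result occupies a register during the right side


def regs_best_order(t: Tree) -> int:
    """Registers needed if the more demanding operand is evaluated first."""
    if isinstance(t, str):
        return 1
    _, left, right = t
    need_l, need_r = regs_best_order(left), regs_best_order(right)
    # Evaluating the larger side first means only a tie costs an extra register.
    return max(need_l, need_r) if need_l != need_r else need_l + 1


expr = ("+", "a", ("*", ("+", "b", "c"), ("+", "d", "e")))  # a + (b + c) * (d + e)
print(regs_left_first(expr), regs_best_order(expr))         # 4 3
```

Evaluating the product before loading a saves one register, because a would otherwise sit in a register for the whole right-hand computation.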
Basic Blocks and Control Flow Graphs
 A basic block is a maximal sequence of consecutive three-address statements with the
following properties:
 Control enters the block only through the first three-address statement.
 Control leaves the block only through the last three-address statement.

 A control-flow graph is a directed graph $G = (V, E)$, where the nodes are
the basic blocks and the edges correspond to the flow of control from
one basic block to another. For example, the edge $e_{ij} = (v_i, v_j)$
corresponds to the transfer of control from basic block $v_i$ to basic
block $v_j$.

Directed Acyclic Graph
 A directed acyclic graph (DAG) is a tool that depicts the structure of a basic
block. It shows how the values computed by statements in the block flow into
later statements, and it exposes optimization opportunities such as common
subexpressions, which makes transformations on basic blocks easy. In a DAG:
 Leaf nodes represent identifiers, names or constants.
 Interior nodes represent operators.
 Interior nodes are also labeled with the identifiers (names) to which the
results of the expressions are assigned.
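 Example: consider the basic block
t0 = a + b
t1 = t0 + c
d = t0 + t1
In its DAG, the leaves are a, b and c. The interior node for a + b is labeled t0,
the node for t0 + c is labeled t1, and the node for t0 + t1 is labeled d. The
node labeled t0 has two parents, because its value is used twice.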
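A DAG for a basic block can be built by giving each distinct (operator, left child, right child) combination exactly one node, so repeated computations share a node. The sketch below is a minimal version of that idea under stated assumptions: statements are (target, left, op, right) tuples, commutativity is ignored (a + b and b + a get different nodes), and build_dag is a hypothetical name.

```python
def build_dag(block: list[tuple[str, str, str, str]]):
    """Build a DAG for statements (target, left, op, right): target = left op right."""
    nodes: list[tuple] = []             # node id -> ("leaf", name) or (op, left_id, right_id)
    index: dict[tuple, int] = {}        # node contents -> id, so repeats share one node
    current: dict[str, int] = {}        # name -> node holding its current value
    labels: dict[int, list[str]] = {}   # node id -> names attached to that node

    def node_for(key: tuple) -> int:
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name: str) -> int:
        # A name not yet assigned in the block stands for its initial value: a leaf.
        if name not in current:
            current[name] = node_for(("leaf", name))
        return current[name]

    for target, left, op, right in block:
        n = node_for((op, operand(left), operand(right)))
        old = current.get(target)
        if old is not None and target in labels.get(old, []):
            labels[old].remove(target)  # target now names a different value
        current[target] = n
        labels.setdefault(n, []).append(target)
    return nodes, labels


block = [("t0", "a", "+", "b"), ("t1", "t0", "+", "c"), ("d", "t0", "+", "t1")]
nodes, labels = build_dag(block)
for i, node in enumerate(nodes):
    print(i, node, labels.get(i, []))
```

For this block, the node for t0 (id 2) is a child of both later interior nodes. For a block such as x := a + b; y := a + b, both x and y end up labeling the same node, which is how the DAG exposes the common subexpression.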

Descriptors
 While generating code, the code generator has to keep track of both the registers
(for availability) and the addresses (locations of values). It uses the following
two descriptors for this purpose:

 Register descriptor :
 It is used to inform the code generator about the availability of registers.
 It keeps track of values stored in each register.
 Whenever a new register is required during code generation, this
descriptor is consulted for register availability.
 Initially, the register descriptor shows all registers as empty.

 Address descriptor :
 An address descriptor stores the location(s) where the current
value of a name can be found at run time.
 Values of the names (identifiers) used in the program might be
stored at different locations during execution.
 It is used to keep track of the memory locations where the values of
identifiers are stored.
 These locations may include CPU registers, heaps, stacks, memory
or a combination of the mentioned locations.
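The two descriptors can be kept as dictionaries. In the sketch below, which is an illustration under assumptions rather than a fixed design, a variable's memory location is written as the variable's own name, registers are named R0, R1 and so on, and the class and method names are hypothetical. The usage lines record MOV y, R0 followed by ADD z, R0 for x := y + z.

```python
class Descriptors:
    """Register and address descriptors, as a minimal sketch."""

    def __init__(self, registers: list[str]) -> None:
        # Register descriptor: register -> names whose current value it holds.
        # Initially every register is empty.
        self.reg: dict[str, set[str]] = {r: set() for r in registers}
        # Address descriptor: name -> places where its current value can be found.
        self.addr: dict[str, set[str]] = {}

    def locations(self, name: str) -> set[str]:
        # Assumption: a name not seen yet has its value only in its memory home,
        # which is written as the name itself.
        return self.addr.setdefault(name, {name})

    def _clear(self, r: str) -> None:
        # r is about to be overwritten, so it no longer holds its old values.
        for name in self.reg[r]:
            self.locations(name).discard(r)
        self.reg[r] = set()

    def load(self, name: str, r: str) -> None:
        """Record the effect of MOV name, r."""
        self._clear(r)
        self.reg[r] = {name}
        self.locations(name).add(r)

    def result(self, target: str, r: str) -> None:
        """Record that r now holds the newly computed value of target."""
        self._clear(r)
        for names in self.reg.values():
            names.discard(target)       # other registers now hold a stale target
        self.reg[r] = {target}
        self.addr[target] = {r}         # the memory copy of target is stale too

    def store(self, target: str, r: str) -> None:
        """Record the effect of MOV r, target."""
        self.locations(target).add(target)


d = Descriptors(["R0", "R1"])
d.load("y", "R0")     # MOV y, R0
d.result("x", "R0")   # ADD z, R0: R0 now holds x, not y
print(d.reg)          # register descriptor
print(d.addr)         # address descriptor
```

After these two steps, R0 holds only x, x is found only in R0, and y is found only in memory.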
getReg Function
 getReg: the code generator uses the getReg function to
determine the status of the available registers and the locations
of name values. getReg works as follows:
 If variable y is already in a register R that holds no other names, and y
has no next use after the statement, it uses that register.
 Else, if some register R is available (empty), it uses that register.
 Otherwise, it chooses the register whose reuse requires the minimal number of
load and store instructions.
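A sketch of getReg over dictionary descriptors follows. Its names are hypothetical, and live_after (the set of names still needed after the statement) is assumed to come from next-use analysis. It applies the first rule only when R holds nothing but y and y is not needed afterwards, since otherwise overwriting R with the result would lose a live value. In the last case it also counts z as needed, because z is still read by the current statement, and it returns the stores required to save values held only in the chosen register.

```python
def get_reg(y: str, z: str, live_after: set[str],
            reg_desc: dict[str, set[str]],
            addr_desc: dict[str, set[str]]) -> tuple[str, list[str]]:
    """Choose L for x := y op z; also return stores needed before reusing L."""
    # Rule 1: y is already in R; reuse R only if it holds just y and y is dead.
    for r, names in reg_desc.items():
        if names == {y} and y not in live_after:
            return r, []
    # Rule 2: some register is empty.
    for r, names in reg_desc.items():
        if not names:
            return r, []

    # Rule 3: pick the register whose reuse needs the fewest stores, i.e. the
    # fewest needed names whose only copy is in that register.
    needed = live_after | {z}

    def stores(r: str) -> list[str]:
        return [f"MOV {r}, {n}" for n in sorted(reg_desc[r])
                if n in needed and addr_desc.get(n, set()) == {r}]

    best = min(reg_desc, key=lambda r: len(stores(r)))
    return best, stores(best)


reg_desc = {"R0": {"a"}, "R1": {"b"}}
addr_desc = {"a": {"R0", "a"}, "b": {"R1"}}   # b's value exists only in R1
print(get_reg("a", "c", {"b"}, reg_desc, addr_desc))       # ('R0', [])
print(get_reg("e", "f", {"a", "b"}, reg_desc, addr_desc))  # ('R0', [])
```

In the second call no register is free, and R0 is chosen because a also has a copy in memory, so reusing R0 needs no store.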
A code-generation algorithm
 The algorithm takes as input a sequence of three-address statements forming a basic block.
For each three-address statement of the form x := y op z, it performs the following
actions:
 Invoke the function getreg to find the location L where the result of the computation y op z should
be stored.
 Consult the address descriptor for y to determine y', the current location of y. If the value of y is
currently in both memory and a register, prefer the register as y'. If the value of y is not already
in L, generate the instruction MOV y', L to place a copy of y in L.
 Generate the instruction OP z', L, where z' is the current location of z. If z is in both a register
and memory, prefer the register. Update the address descriptor of x to indicate that x is in location L.
If L is a register, update its descriptor to indicate that it contains the value of x, and remove x from
all other register descriptors.
 If the current values of y and/or z have no next uses, are not live on exit from the block, and are in
registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers
no longer contain y and/or z.
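Putting the pieces together, the sketch below runs the algorithm above on one basic block. It is an illustration under stated assumptions rather than a reference implementation: statements are (x, y, op, z) tuples, live_out (the names live on exit) is supplied by the caller, next-use information comes from a backward scan of the block, memory locations are written as the variable names (which must not clash with register names), and generate and the opcode table are hypothetical. Live names whose values are only in registers are stored at the end of the block.

```python
# Hypothetical opcode table; the slides only use MOV and ADD.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(block, live_out, registers=("R0", "R1")):
    """Code for a basic block of (x, y, op, z) tuples, each meaning x := y op z."""
    reg = {r: set() for r in registers}  # register descriptor, initially all empty
    addr = {}                            # address descriptor: name -> locations
    code = []

    def where(n):
        # Assumption: an unseen name lives only in its memory home, written as n.
        return addr.setdefault(n, {n})

    def best(n):
        # Prefer a register holding n over n's memory location.
        regs = sorted(loc for loc in where(n) if loc in reg)
        return regs[0] if regs else n

    def free(r):
        # r is about to be overwritten, so it no longer holds its old values.
        for n in reg[r]:
            where(n).discard(r)
        reg[r] = set()

    def getreg(y, z, live_after):
        for r, names in reg.items():
            if names == {y} and y not in live_after:
                return r                 # y's register, since y is dead afterwards
        for r, names in reg.items():
            if not names:
                return r                 # an empty register
        # Spill the register needing the fewest stores; z still has to be read.
        needed = live_after | {z}

        def stores(q):
            return sorted(n for n in reg[q] if n in needed and where(n) == {q})

        r = min(reg, key=lambda q: len(stores(q)))
        for n in stores(r):
            code.append(f"MOV {r}, {n}")
            where(n).add(n)
        return r

    # Next-use information: live[i] = names still needed after statement i.
    live, after = [set() for _ in block], set(live_out)
    for i in range(len(block) - 1, -1, -1):
        live[i] = set(after)
        x, y, _, z = block[i]
        after = (after - {x}) | {y, z}

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(y, z, live[i])
        y_loc = L if L in where(y) else best(y)
        if y_loc != L:                   # MOV y', L places a copy of y in L
            code.append(f"MOV {y_loc}, {L}")
            free(L)
            reg[L] = {y}
            where(y).add(L)
        code.append(f"{OPCODES[op]} {best(z)}, {L}")
        free(L)                          # L now holds x and nothing else
        for names in reg.values():
            names.discard(x)
        reg[L] = {x}
        addr[x] = {L}
        for n in {y, z} - {x}:           # forget register copies of dead operands
            if n not in live[i]:
                for r in reg:
                    if n in reg[r]:
                        reg[r].discard(n)
                        where(n).discard(r)

    # On exit, store live names whose current value is only in a register.
    for n in sorted(live_out):
        if n not in where(n):
            code.append(f"MOV {best(n)}, {n}")
    return code


block = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
print(generate(block, live_out={"d"}))
```

For a := b + c; d := a + e with only d live on exit, it returns MOV b, R0; ADD c, R0; ADD e, R0; MOV R0, d, which is the earlier inefficient sequence without its third and fourth statements.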
