UNIT-5 Compiler Design
The final phase in the compiler model is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program. The
code generation techniques presented below can be used whether or not an optimizing phase
occurs before code generation.
The executing target program runs in its own logical address space in which each program
value has a location.
The management and organization of this logical address space is shared between the compiler,
operating system and target machine. The operating system maps the logical addresses into
physical addresses, which are usually spread throughout memory.
Code
Static Data
Stack
free memory
Heap
Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory.
Four bytes form a machine word. Multibyte objects are stored in consecutive bytes and given the
address of the first byte.
This run-time storage might be subdivided to hold:
1. The generated target code,
The storage layout for data objects is strongly influenced by the addressing constraints of the target
machine.
Although a character array of length 10 needs only enough bytes to hold 10 characters, a compiler
may allocate 12 bytes to satisfy alignment, leaving 2 bytes unused.
This unused space due to alignment considerations is referred to as padding.
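The padding arithmetic can be sketched as follows. This is a minimal illustration, assuming objects are aligned to a 4-byte word; the function names padded_size and padding are hypothetical and chosen only for this example.

```python
def padded_size(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (both in bytes)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (size + alignment - 1) // alignment * alignment


def padding(size: int, alignment: int) -> int:
    """Number of unused bytes added to reach the aligned size."""
    return padded_size(size, alignment) - size


# A 10-character array with an assumed 4-byte alignment requirement.
print(padded_size(10, 4), padding(10, 4))  # prints "12 2"
```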
The size of some program objects may be known at compile time, and these objects may be placed
in an area called static.
The dynamic areas used to maximize the utilization of space at run time are the stack and the heap.
Activation records:
Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record on the control stack, with the root of the activation
tree at the bottom; the most recent activation has its record at the top of the stack.
The contents of the activation record vary with the language being implemented. The diagram
below shows the contents of an activation record.
Temporaries
Local Data
Machine Status
Control Link
Access Link
Actual Parameters
Return Value
Static allocation
In static allocation, names are bound to storage as the program is compiled, so there is no need for
a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its names
are bound to the same storage locations.
Therefore values of local names are retained across activations of a procedure. That is, when
control returns to a procedure the values of the locals are the same as they were when control left
the last time.
From the type of a name, the compiler decides the amount of storage for the name and decides
where the activation records go. At compile time, we can fill in the addresses at which the target
code can find the data it operates on.
Some limitations of using static allocation:
1. The size of a data object and constraints on its position in memory must be known at
compile time.
2. Recursive procedures are restricted, because all activations of a procedure use the same
bindings for local names.
3. Data structures cannot be created dynamically, since there is no mechanism for storage
allocation at run time.
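The retention of locals and the restriction on recursion can be illustrated with a small simulation. This is only a sketch: the dictionary STATIC stands in for the static area, and the procedures counter and fact are hypothetical examples, not taken from any particular compiler.

```python
# Static area: one fixed slot per (procedure, local name), bound before running.
STATIC = {("counter", "n"): 0, ("fact", "k"): 0}


def counter() -> int:
    # The local n lives in the same slot on every activation,
    # so its value is retained from the previous call.
    STATIC[("counter", "n")] += 1
    return STATIC[("counter", "n")]


def fact(k: int) -> int:
    # Every activation of fact shares one slot for the local k.
    STATIC[("fact", "k")] = k
    if k <= 1:
        return 1
    sub = fact(k - 1)                   # the inner activation overwrites the slot
    return STATIC[("fact", "k")] * sub  # reads the overwritten value, not k


first, second = counter(), counter()
print(first, second)  # prints "1 2": n is retained across activations
print(fact(3))        # prints "1" rather than 6: recursion clobbers the shared slot
```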
FORTRAN uses static storage allocation.
Stack allocation
All compilers for languages that use procedures, functions or methods as units of user-defined
actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack, and when the
procedure terminates, that space is popped off the stack.
Calling sequences:
Procedure calls are implemented by what are known as calling sequences, which consist of code
that allocates an activation record on the stack and enters information into its fields.
A return sequence is similar code that restores the state of the machine so the calling procedure
can continue its execution after the call.
The code in calling sequence is often divided between the calling procedure (caller) and the
procedure it calls (callee).
When designing calling sequences and the layout of activation records, the following principles
are helpful:
Values communicated between caller and callee are generally placed at the beginning of
the callee’s activation record, so they are as close as possible to the caller’s activation
record.
Fixed length items are generally placed in the middle. Such items typically include the control
link, the access link, and the machine status fields.
Items whose size may not be known early enough are placed at the end of the activation record.
The most common example is a dynamically sized array, where the value of one of the callee’s
parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the
end of fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed
offsets, known to the intermediate-code generator, relative to the top-of-stack pointer.
Division of tasks between caller and callee:

              Parameters and returned values
caller’s      Control link
activation    Links and saved status
record        Temporaries and local data        \
              Parameters and returned values     |  caller’s
callee’s      Control link                       |  responsibility
activation    Links and saved status            /   <-- top_sp
record        Temporaries and local data            callee’s responsibility
The calling sequence and its division between caller and callee are as follows.
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value of top_sp into the callee’s activation
record. The caller then increments top_sp to its new position, past the caller’s local data and
temporaries and the callee’s parameters and status fields.
3. The callee saves the register values and other status information.
4. The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
1. The callee places the return value next to the parameters.
2. Using the information in the machine-status field, the callee restores top_sp and other
registers, and then branches to the return address that the caller placed in the status field.
3. Although top_sp has been decremented, the caller knows where the return value is, relative to the
current value of top_sp; the caller therefore may use that value.
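The calling and return sequences can be traced with a toy model of the stack. The sketch below is illustrative only: the stack is a Python list with one slot per word, saving of registers is omitted, the record layout (return-value slot, parameters, control link, return address, then locals) is an assumption made for the example, and the names call_sequence and return_sequence are hypothetical.

```python
def call_sequence(stack, top_sp, args, return_addr, n_locals):
    """Caller steps followed by callee steps; returns the new top_sp."""
    # Caller: values shared with the callee go at the start of the callee's record.
    stack.append(None)         # slot for the returned value
    stack.extend(args)         # evaluated actual parameters
    # Caller: store the old top_sp (control link) and the return address.
    stack.append(top_sp)
    stack.append(return_addr)
    new_top_sp = len(stack)    # top_sp marks the end of the fixed-length fields
    # Callee: initialise local data (register saving is not modelled).
    stack.extend([0] * n_locals)
    return new_top_sp


def return_sequence(stack, top_sp, n_args, value):
    """Callee steps on return; returns (restored top_sp, return address)."""
    stack[top_sp - n_args - 3] = value   # return value goes next to the parameters
    old_top_sp = stack[top_sp - 2]       # control link stored by the caller
    return_addr = stack[top_sp - 1]      # return address stored by the caller
    del stack[top_sp:]                   # discard the callee's local data
    return old_top_sp, return_addr


# The caller's record is modelled as two placeholder slots with no fixed fields.
stack = ["caller local", "caller temporary"]
ret_slot = len(stack)                    # the callee's record starts here
top_sp = call_sequence(stack, 0, [3, 4], "L_after_call", n_locals=2)
top_sp, target = return_sequence(stack, top_sp, n_args=2, value=7)
result = stack[ret_slot]                 # the caller reads the value at a known offset
print(top_sp, target, result)            # prints "0 L_after_call 7"
```

Placing the return-value slot first means the caller can find it without knowing anything about the callee's locals or temporaries.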
their space.
The same scheme works for objects of any type if they are local to the procedure called and have a
size that depends on the parameters of the call.
Heap allocation
Stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records or
other objects.
Pieces may be deallocated in any order, so over time the heap will consist of alternating
areas that are free and in use.
Position in the       Activation records       Remarks
activation tree       in the heap

      s               s
     / \                control link
    r   q(1,9)        r                        Retained activation
                        control link           record for r
                      q(1,9)
                        control link
The record for an activation of procedure r is retained when the activation ends.
Therefore, the record for the new activation q(1,9) cannot follow that for s physically.
If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.
For large blocks of storage use the heap manager. This approach results in fast allocation
and deallocation of small amounts of storage, since taking and returning a block from a
linked list are efficient operations.
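One way such a scheme might look is sketched below. It is a sketch under stated assumptions, not a definitive allocator: it assumes a free list is kept for each of a few small block sizes (16, 32 and 64 bytes, chosen arbitrarily), and a simple bump pointer stands in for the general heap manager. The class name SizeClassAllocator is hypothetical.

```python
class SizeClassAllocator:
    """Free list per small block size; other requests go to a general heap manager."""

    def __init__(self, small_sizes=(16, 32, 64)):
        self.free_lists = {size: [] for size in small_sizes}
        self.next_addr = 0  # bump pointer standing in for the general heap manager

    def _heap_manager_alloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        return addr

    def allocate(self, size):
        """Return (address, block size actually reserved)."""
        for block_size in sorted(self.free_lists):
            if size <= block_size:
                free = self.free_lists[block_size]
                # Taking a block from a free list is a constant-time operation.
                addr = free.pop() if free else self._heap_manager_alloc(block_size)
                return addr, block_size
        # Large request: hand it to the general heap manager.
        return self._heap_manager_alloc(size), size

    def free(self, addr, block_size):
        if block_size in self.free_lists:
            self.free_lists[block_size].append(addr)
        # Returning large blocks to the heap manager is not modelled here.


alloc = SizeClassAllocator()
a = alloc.allocate(10)     # served by a fresh 16-byte block
alloc.free(*a)
b = alloc.allocate(12)     # reuses the block just freed
big = alloc.allocate(1000)
print(a, b, big)           # prints "(0, 16) (0, 16) (16, 1000)"
```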
1. Input to the code generator:
Prior to code generation, the source program must be scanned, parsed and translated into an
intermediate representation by the front end, along with the necessary type checking. Therefore,
the input to code generation is assumed to be error-free.
2. Target program:
The output of the code generator is the target program. The output may be:
a. Absolute machine language
- It can be placed in a fixed memory location and can be executed immediately.
b. Relocatable machine language
- It allows subprograms to be compiled separately.
c. Assembly language
- Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time memory by
the front end and code generator.
It makes use of the symbol table, that is, a name in a three-address statement refers to a
symbol-table entry for the name.
Labels in three-address statements have to be converted to addresses of instructions. For
example, j : goto i generates a jump instruction as follows:
If i < j, the jump is backward: a jump instruction is generated with target address equal to the
location of the code for quadruple i.
If i > j, the jump is forward. We must store on a list for quadruple i the location of the
first machine instruction generated for quadruple j. When i is processed, the machine
locations for all instructions that are forward jumps to i are filled in.
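The handling of backward and forward jumps can be sketched as follows. This is a minimal illustration that assumes quadruples are numbered from 0 and that each quadruple produces exactly one machine instruction; the function name resolve_jumps and the JMP mnemonic are hypothetical.

```python
def resolve_jumps(quads):
    """quads is a list of ("goto", target_index) or ("op", instruction_text)."""
    code = []        # generated instructions
    location = {}    # quadruple index -> address of its first instruction
    pending = {}     # target quadruple -> addresses of jumps waiting for it
    for j, (kind, arg) in enumerate(quads):
        location[j] = len(code)
        # Quadruple j is being processed: fill in the forward jumps to it.
        for addr in pending.pop(j, []):
            code[addr] = f"JMP {location[j]}"
        if kind == "goto":
            i = arg
            if i <= j:
                # Backward jump: the target location is already known.
                code.append(f"JMP {location[i]}")
            else:
                # Forward jump: leave a hole and record it on the list for i.
                pending.setdefault(i, []).append(len(code))
                code.append("JMP ?")
        else:
            code.append(arg)
    return code


quads = [("op", "MOV a, R0"), ("goto", 3), ("op", "ADD b, R0"),
         ("op", "MOV R0, c"), ("goto", 0)]
print(resolve_jumps(quads))
# prints ['MOV a, R0', 'JMP 3', 'ADD b, R0', 'MOV R0, c', 'JMP 0']
```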
4. Instruction selection:
The instruction set of the target machine should be complete and uniform.
Instruction speeds and machine idioms are important factors when the efficiency of the target
program is considered.
The quality of the generated code is determined by its speed and size.
The former statement can be translated into the latter statement as shown below:
5. Register allocation
Instructions involving register operands are shorter and faster than those involving operands in
memory.
The use of registers is subdivided into two subproblems:
Register allocation: the set of variables that will reside in registers at a point in the program is selected.
Register assignment: the specific register that a variable will reside in is picked.
Certain machines require even-odd register pairs for some operands and results. For
example, consider the division instruction of the form:
D x, y
6. Evaluation order
The order in which the computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
A code generator generates target code for a sequence of three-address statements and effectively
uses registers to store operands of the statements.
For example, consider the three-address statement a := b + c:
ADD Rj, Ri      Cost = 1      // if Ri contains b and Rj contains c
(or)
(or)
ADD Rj, Ri
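Costs such as the one quoted above can be computed mechanically. The sketch below assumes a common textbook cost model, in which an instruction costs 1 plus 1 for each operand that names a memory location; it also assumes registers are written as R followed by digits or a single lowercase letter. The alternative instruction sequences in the example are illustrative, and the function names are hypothetical.

```python
import re


def is_register(operand: str) -> bool:
    """Assumed naming convention: R0, R1, ... or Ri, Rj."""
    return re.fullmatch(r"R(\d+|[a-z])", operand) is not None


def instruction_cost(instr: str) -> int:
    """1 for the instruction itself plus 1 per memory operand (assumed model)."""
    _, _, operand_text = instr.partition(" ")
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return 1 + sum(1 for op in operands if not is_register(op))


def sequence_cost(instrs) -> int:
    return sum(instruction_cost(i) for i in instrs)


print(instruction_cost("ADD Rj, Ri"))              # prints "1": both operands in registers
print(instruction_cost("ADD c, Ri"))               # prints "2": c is in memory
print(sequence_cost(["MOV c, Rj", "ADD Rj, Ri"]))  # prints "3": load c, then add
```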
Register and Address Descriptors:
A register descriptor is used to keep track of what is currently in each register. The register
descriptors show that initially all the registers are empty.
An address descriptor stores the location where the current value of the name can be found at run
time.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic block. For each
three-address statement of the form x := y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should
be stored.
2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the register for
y’ if the value of y is currently both in memory and a register. If the value of y is not already in L,
generate the instruction MOV y’, L to place a copy of y in L.
3. Generate the instruction OP z’, L, where z’ is a current location of z. Prefer a register to a memory
location if z is in both. Update the address descriptor of x to indicate that x is in location L. If L is a
register, update its descriptor to indicate that it contains the value of x, and remove x from all other
register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in
registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers
will no longer contain y or z.
Generating Code for Assignment Statements:
The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following three-address code
sequence:
t := a – b
u := a – c
v := t + u
d := v + u
with d live at the end.
Code sequence for the example is:
Register empty
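The code-generation algorithm described earlier can be tried on this example with a small sketch. It is a simplified illustration rather than a definitive implementation: it assumes two registers R0 and R1, lets each register hold one name at a time, picks spill victims naively, and stores names that are live on exit at the end of the block. The names generate, getreg, reg_desc and addr_desc are chosen for this example.

```python
OPCODES = {"+": "ADD", "-": "SUB"}


def generate(block, live_on_exit, num_regs=2):
    """Generate code for a basic block of (x, y, op, z) tuples meaning x := y op z."""
    code = []
    reg_desc = {f"R{i}": None for i in range(num_regs)}  # register -> name it holds
    addr_desc = {}                                        # name -> locations of its value

    def locs(name):
        # A name not yet computed in the block lives only in its own memory cell.
        return addr_desc.setdefault(name, {name})

    def best(name):
        # Prefer a register copy over the memory copy.
        regs = sorted(loc for loc in locs(name) if loc in reg_desc)
        return regs[0] if regs else name

    def used_later(name, idx):
        for x, y, _, z in block[idx + 1:]:
            if name in (y, z):
                return True
            if name == x:
                return False
        return name in live_on_exit

    def getreg(idx, y):
        for reg, held in reg_desc.items():   # reuse y's register if y is dead afterwards
            if held == y and not used_later(y, idx):
                return reg
        for reg, held in reg_desc.items():   # otherwise take an empty register
            if held is None:
                return reg
        # Otherwise spill a register (naive choice, avoiding y's register if possible).
        reg = next((r for r, h in reg_desc.items() if h != y), next(iter(reg_desc)))
        victim = reg_desc[reg]
        if victim not in locs(victim):       # memory copy is stale, so store first
            code.append(f"MOV {reg}, {victim}")
            locs(victim).add(victim)
        locs(victim).discard(reg)
        reg_desc[reg] = None
        return reg

    for idx, (x, y, op, z) in enumerate(block):
        L = getreg(idx, y)                               # step 1
        if L not in locs(y):                             # step 2
            code.append(f"MOV {best(y)}, {L}")
        code.append(f"{OPCODES[op]} {best(z)}, {L}")     # step 3
        old = reg_desc[L]
        if old is not None:
            locs(old).discard(L)
        for reg, held in reg_desc.items():
            if held == x:
                reg_desc[reg] = None
        reg_desc[L] = x
        addr_desc[x] = {L}
        for name in (y, z):                              # step 4
            if name != x and not used_later(name, idx):
                for reg, held in reg_desc.items():
                    if held == name:
                        reg_desc[reg] = None
                        locs(name).discard(reg)
    for name in sorted(live_on_exit):                    # assumed end-of-block stores
        if name not in locs(name):
            code.append(f"MOV {best(name)}, {name}")
            locs(name).add(name)
    return code


block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
for line in generate(block, live_on_exit={"d"}):
    print(line)
# prints: MOV a, R0 / SUB b, R0 / MOV a, R1 / SUB c, R1 / ADD R1, R0 / ADD R1, R0 / MOV R0, d
```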
The table shows the code sequences generated for the indexed assignment statements a := b[i]
and a[i] := b
The table shows the code sequences generated for the pointer assignments a := *p and *p := a
a := *p        MOV *Rp, a
*p := a        MOV a, *Rp
Generating Code for Conditional Statements
Statement              Code
if x < y goto z        CMP x, y
                       CJ< z      /* jump to z if condition code is negative */
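The translation in the table can be expressed as a small emitter. This is a sketch only: the function name gen_conditional is hypothetical, and the set of relational operators accepted, along with the CJ mnemonics formed from them, is an assumption extrapolated from the single CJ< case shown.

```python
def gen_conditional(x: str, relop: str, y: str, label: str) -> list:
    """Emit code for 'if x relop y goto label' as a compare plus a conditional jump."""
    if relop not in {"<", "<=", "=", "!=", ">", ">="}:
        raise ValueError(f"unsupported relational operator: {relop}")
    # CMP sets the condition code; CJ<relop> jumps if the code satisfies relop.
    return [f"CMP {x}, {y}", f"CJ{relop} {label}"]


print(gen_conditional("x", "<", "y", "z"))  # prints "['CMP x, y', 'CJ< z']"
```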