15Cs314J - Compiler Design: Unit 4
UNIT 4
Prepared by:
Dr.R.I.Minu
Associate Professor
CONTENT
• Issues in the design of code generator.
• The target machine
• Runtime Storage management
• Basic Blocks and Flow Graphs
• Next-use Information
• A simple Code generator
• DAG representation of Basic Blocks
• Peephole Optimization
• Cross Compiler – T diagrams
INTRODUCTION TO CODE GENERATOR
• A code generator has three primary tasks: instruction selection, register allocation and assignment, and instruction ordering.
• The input to the code generator is the intermediate representation of the source
program.
• Here it is assumed that all syntactic and static semantic errors have been detected, that
the necessary type checking has taken place, and that type conversion operators have
been inserted wherever necessary.
• The code generator can therefore proceed on the assumption that its input is free of
these kinds of errors.
The Target Program
• The instruction-set architecture of the target machine has a
significant impact on the difficulty of constructing a good code
generator that produces high-quality machine code.
• The most common target-machine architectures are
• RISC (reduced instruction set computer)
• It has many registers, three-address instructions, simple addressing modes, and a
relatively simple instruction-set architecture
• CISC (complex instruction set computer),
• It has few registers, two-address instructions, a variety of addressing modes, several
register classes, variable-length instructions, and instructions with side effects
• Stack based
• operations are done by pushing operands onto a stack and then performing the
operations on the operands at the top of the stack
The Target Program (Cont)
• A compiler may instead produce bytecodes for a virtual machine, which are then
interpreted. To overcome the high performance penalty of interpretation, just-in-time
(JIT) Java compilers have been created.
• These JIT compilers translate bytecodes during run time to the native
hardware instruction set of the target machine.
• Producing an absolute machine-language program as output has the
advantage that it can be placed in a fixed location in memory and
immediately executed
• Producing a relocatable machine-language program (often called an object
module) as output allows subprograms to be compiled separately. A set of
relocatable object modules can be linked together and loaded for
execution by a linking loader.
Instruction Selection
• The code generator must map the IR program into a code sequence
that can be executed by the target machine.
• The complexity of performing this mapping is determined by factors
such as
• The level of the IR
• The nature of the instruction-set architecture
• The desired quality of the generated code.
Instruction Selection
• If the IR is high level, the code generator may translate each IR statement
into a sequence of machine instructions using code templates. Such
statement-by-statement code generation, however, often produces poor
code that needs further optimization.
• If the IR reflects some of the low-level details of the underlying machine,
then the code generator can use this information to generate more
efficient code sequences.
• The quality of the generated code is usually determined by its speed and
size.
• On most machines, a given IR program can be implemented by many
different code sequences, with significant cost differences between the
different implementations
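• For example, on a target machine that provides an increment instruction INC (an assumed feature, as in the classic textbook illustration), the statement a = a + 1 can be implemented by the single instruction INC a, whereas naive statement-by-statement translation would produce the longer sequence LD R0, a; ADD R0, R0, #1; ST a, R0.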
Register Allocation
• A key problem in code generation is deciding what values to hold in what registers.
• Registers are the fastest storage locations on the target machine, but we usually do not
have enough of them to hold all values.
• Values not held in registers need to reside in memory.
• Instructions involving register operands are invariably shorter and faster than those
involving operands in memory, so efficient utilization of registers is particularly
important.
• The use of registers is often subdivided into two sub-problems:
• Register allocation, during which we select the set of variables that will reside in registers at each
point in the program.
• Register assignment, during which we pick the specific register that a variable will reside in.
• Finding an optimal assignment of registers to variables is difficult, even with single-
register machines.
• Mathematically, the problem is NP-complete.
• The problem is further complicated because the hardware and/or the operating system
of the target machine may require that certain register-usage conventions be observed.
Evaluation Order
• The order in which computations are performed can affect the
efficiency of the target code.
• Some computation orders require fewer registers to hold
intermediate results than others.
• However, picking a best order in the general case is a difficult NP-
complete problem.
• Initially, we shall avoid the problem by generating code for the three-
address statements in the order in which they have been produced
by the intermediate code generator.
Target Machine
A Simple Target Machine Model
• Let us consider a target computer that models a three-address machine with
load and store operations, computation operations, jump
operations, and conditional jumps.
• The underlying computer is a byte-addressable machine with n
general-purpose registers, R0, R1, . . . , Rn - 1.
Simple Commands
• Load operations: An instruction of the form LD dst, addr loads the value in
location addr into dst. The special case LD r1, r2 is a register-to-register copy
in which the contents of register r2 are copied into register r1.
• Store operations: The instruction ST x, r stores the value in register r into
the location x. This instruction denotes the assignment x = r.
• Computation operations of the form OP dst, src1, src2, where OP is an
operator like ADD or SUB, and dst, src1, and src2 are locations, not
necessarily distinct.
• Unconditional jumps: The instruction BR L causes control to branch to the
machine instruction with label L. (BR stands for branch.)
• Conditional jumps of the form Bcond r, L, where r is a register, L is a label,
and cond stands for any of the common tests on values in the register r. For
example, BLTZ r, L causes a branch to L if the value in register r is less than zero.
Simple Commands - Example
• A memory location can be an integer indexed by a register.
• For example, LD R1, 100(R2)
It has the effect of setting R1 = contents(100 + contents(R2))
• For example, LD R1, *100(R2)
R1 = contents(contents(100 + contents(R2)))
(loading into R1 the value in the memory location stored in the
memory location obtained by adding 100 to the contents of register
R2.)
• The instruction LD R1, #100 loads the integer 100 into register R1,
and ADD R1, R1, #100 adds the integer 100 to the contents of register R1.
Program and Instruction Costs
• We often associate a cost with compiling and running a program,
depending on what aspect of the program we are interested in
optimizing.
• Some common cost measures are
• The length of compilation time and the size of the generated code,
• The running time and power consumption of the target program.
Program and Instruction Costs
• The instruction LD R0, R1 copies the contents of register R1 into
register R0. This instruction has a cost of one because no additional
memory words are required.
• The instruction LD R0, M loads the contents of memory location M
into register R0. The cost is two since the address of memory location
M is in the word following the instruction.
• The instruction LD R1, *100(R2) loads into register R1 the value given
by contents(contents(100 + contents(R2))). The cost is three because
the constant 100 is stored in the word following the instruction.
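• As a worked illustration (assuming a, b, and c are memory-resident variables), one possible code sequence for a = b + c is LD R0, b (cost 2), ADD R0, R0, c (cost 2), and ST a, R0 (cost 2), giving a total cost of 6, since each instruction has one memory operand stored in the word following it.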
Runtime Storage management
• The executing program runs in its own logical address space that is partitioned
into four code and data areas:
1. A statically determined area Code that holds the executable target code. The
size of the target code can be determined at compile time.
2. A statically determined data area Static for holding global constants and other
data generated by the compiler. The size of the global constants and compiler
data can also be determined at compile time.
3. A dynamically managed area Heap for holding data objects that are allocated
and freed during program execution. The size of the Heap cannot be
determined at compile time.
4. A dynamically managed area Stack for holding activation records as they are
created and destroyed during procedure calls and returns. Like the Heap, the
size of the Stack cannot be determined at compile time.
Static Allocation
• To illustrate code generation for simplified procedure calls and returns, a call
to procedure callee can be implemented by the sequence
ST callee.staticArea, #here + 20
BR callee.codeArea
• The ST instruction saves the return address at the beginning of the
activation record for callee, and the BR transfers control to the target
code for the called procedure callee.
• The constant #here + 20 is the address of the instruction following the BR:
assuming four-byte instruction words, the ST occupies three words (its opcode
word plus two constant operands) and the BR two words.
• A return statement in callee can be implemented by a simple jump
instruction
BR *callee.staticArea
which transfers control to the return address saved at the beginning of
callee's activation record.
Stack Allocation
• Static allocation can become stack allocation by using relative
addresses for storage in activation records.
• In stack allocation, however, the position of an activation record for a
procedure is not known until run time.
• This position is usually stored in a register, so words in the activation
record can be accessed as offsets from the value in this register
Stack Allocation
• The code for the first procedure initializes the stack by setting SP to
the start of the stack area in memory:
LD SP, #stackStart // initialize the stack
(code for the first procedure)
HALT // terminate execution
• A procedure call sequence increments SP, saves the return address,
and transfers control to the called procedure:
ADD SP, SP, #caller.recordSize // increment stack pointer
ST *SP, #here + 16 // save return address
BR callee.codeArea // jump to the callee
• Here #here + 16 is the address of the instruction following the BR: the ST
and the BR each occupy two four-byte words.
Stack Allocation
• The return sequence consists of two parts. The called procedure
transfers control to the return address using
BR *0(SP) // return to caller
• The second part of the return sequence is in the caller, which
decrements SP, thereby restoring SP to its previous value.
SUB SP, SP, #caller.recordSize // decrement stack pointer
Basic Block
• Partition the intermediate code into basic blocks, which are maximal
sequences of consecutive three-address instructions with the
properties that
• The flow of control can only enter the basic block through the first
instruction in the block. That is, there are no jumps into the middle of
the block.
• Control will leave the block without halting or branching, except
possibly at the last instruction in the block.
• The basic blocks become the nodes of a flow graph, whose edges
indicate which blocks can follow which other blocks.
Partitioning three-address instructions into
basic blocks
INPUT: A sequence of three-address instructions.
OUTPUT: A list of the basic blocks for that sequence in which each
instruction is assigned to exactly one basic block.
METHOD: First, we determine those instructions in the intermediate code
that are leaders, that is, the first instructions in some basic block. The
instruction just past the end of the intermediate program is not included as a
leader. The rules for finding leaders are:
1. The first three-address instruction in the intermediate code is a leader.
2. Any instruction that is the target of a conditional or unconditional jump is
a leader.
3. Any instruction that immediately follows a conditional or unconditional
jump is a leader
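The rules above can be sketched in a few lines of Python. This is a minimal sketch under an assumed representation: each instruction is a pair (op, target), where target is the 0-based index of the jump target for the assumed opcodes "goto" and "if_goto", and None otherwise.

    # Minimal sketch; the (op, target) pairs and opcode names are assumptions.
    # Indices are 0-based here, while the text numbers instructions from 1.
    def find_leaders(instrs):
        leaders = {0}                          # rule 1: the first instruction
        for i, (op, target) in enumerate(instrs):
            if op in ("goto", "if_goto"):      # a conditional or unconditional jump
                leaders.add(target)            # rule 2: the jump target is a leader
                if i + 1 < len(instrs):
                    leaders.add(i + 1)         # rule 3: the instruction after a jump
        return sorted(leaders)

    def basic_blocks(instrs):
        leaders = find_leaders(instrs)
        ends = leaders[1:] + [len(instrs)]
        # each block runs from its leader up to, but not including, the next leader
        return [instrs[b:e] for b, e in zip(leaders, ends)]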
Example - Finding Leaders
(The intermediate-code figure for this example, a sequence of 17 three-address
instructions, is not reproduced here.)
• First, instruction 1 is a leader by rule (1) of the partitioning algorithm.
• To find the other leaders, we first need to find the jumps.
• In this example, there are three jumps, all conditional, at
instructions 9, 11, and 17.
• By rule (2), the targets of these jumps are leaders; they
are instructions 3, 2, and 13, respectively.
• Then, by rule (3), each instruction following a jump is a
leader; those are instructions 10 and 12.
• We conclude that the leaders are instructions 1, 2, 3, 10, 12,
and 13.
• The basic block of each leader contains all the instructions
from itself until just before the next leader.
• Thus, the basic block of leader 1 is just instruction 1, and for leader 2
the block is just instruction 2.
• Leader 3, however, has a basic block consisting of
instructions 3 through 9, inclusive.
• Instruction 10's block is instructions 10 and 11; instruction 12's block
is just instruction 12; and instruction 13's block is instructions 13 through 17.
Next-Use Information
• Knowing when the value of a variable will be used next is essential for
generating good code
• The use of a name in a three-address statement is defined as follows.
• Suppose three-address statement i assigns a value to x.
• If statement j has x as an operand, and control can flow from
statement i to j along a path that has no intervening assignments to x,
then we say statement j uses the value of x computed at statement i.
• We further say that x is live at statement i.
Next-Use Information
Algorithm: Determining the liveness and next-use information for each statement in a basic
block.
INPUT: A basic block B of three-address statements. We assume that the symbol table initially
shows all nontemporary variables in B as being live on exit.
OUTPUT: At each statement i: x = y + z in B, we attach to i the liveness and next-use information
of x, y, and z.
METHOD: We start at the last statement in B and scan backwards to the beginning of B. At each
statement i: x = y + z in B, we do the following:
1. Attach to statement i the information currently found in the symbol table regarding the next
use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.
Note that steps (2) and (3) may not be interchanged, because x may be the same as y or z.
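A short Python sketch of this backward scan, assuming each statement is a triple (x, y, z) standing for x = y op z and the symbol table maps each name to a (live, next_use) pair (both representations are assumptions made for the sketch):

    def next_use_info(block, live_on_exit):
        table = {name: (True, None) for name in live_on_exit}
        info = {}
        for i in range(len(block) - 1, -1, -1):     # scan backwards through the block
            x, y, z = block[i]
            # step 1: attach the table's current facts about x, y, z to statement i
            info[i] = {v: table.get(v, (False, None)) for v in (x, y, z)}
            table[x] = (False, None)                # step 2: x is redefined here
            table[y] = (True, i)                    # step 3: y and z are next used at i
            table[z] = (True, i)
        return info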
Flow Graphs
• Suppose we have two blocks, B and C.
• There is an edge from block B to block C if and only if it is possible for
the first instruction in block C to immediately follow the last
instruction in block B.
• There are two ways that such an edge could be justified:
1. There is a conditional or unconditional jump from the end of B to
the beginning of C.
2. C immediately follows B in the original order of the three-address
instructions, and B does not end in an unconditional jump.
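Continuing the Python sketch from the basic-block partitioning (same assumed (op, target) representation, with "goto" unconditional and "if_goto" conditional), the edges follow directly from these two rules:

    def flow_graph(instrs):
        jumps = [(i, t) for i, (op, t) in enumerate(instrs) if op in ("goto", "if_goto")]
        leaders = sorted({0} | {t for _, t in jumps}
                             | {i + 1 for i, _ in jumps if i + 1 < len(instrs)})
        block_of = {l: b for b, l in enumerate(leaders)}   # leader index -> block number
        ends = leaders[1:] + [len(instrs)]
        edges = set()
        for b, end in enumerate(ends):
            op, target = instrs[end - 1]                   # last instruction of block b
            if op in ("goto", "if_goto"):
                edges.add((b, block_of[target]))           # rule 1: edge for the jump
            if op != "goto" and b + 1 < len(leaders):
                edges.add((b, b + 1))                      # rule 2: fall-through edge
        return edges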
Optimization of Basic Blocks
Topic Repeated in Unit 5
The DAG Representation of Basic Blocks
1. There is a node in the DAG for each of the initial values of the variables
appearing in the basic block.
2. There is a node N associated with each statement s within the block. The
children of N are those nodes corresponding to statements that are the
last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is
the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose
variables are live on exit from the block; that is, their values may be used
later, in another block of the flow graph. Calculation of these "live
variables" is a matter for global flow analysis
The DAG Representation of Basic Blocks
Using the DAG, the following code-improving transformations can be done:
1. We can eliminate local common sub-expressions, that is,
instructions that compute a value that has already been computed.
2. We can eliminate dead code, that is, instructions that compute a
value that is never used.
3. We can reorder statements that do not depend on one another;
such reordering may reduce the time a temporary value needs to
be preserved in a register.
4. We can apply algebraic laws to reorder operands of three-address
instructions, and sometimes thereby simplify the computation.
Finding Local Common Subexpressions
• Common subexpressions can be detected by noticing, as a new node
M is about to be added, whether there is an existing node N with the
same children, in the same order, and with the same operator.
• If so, N computes the same value as M and may be used in its place.
This technique was introduced as the "value-number" method of
detecting common subexpressions
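A compact Python sketch of the value-number idea (the class layout and method names are illustrative assumptions, not a fixed API):

    class DAG:
        def __init__(self):
            self.node_of = {}   # (op, child ids) signature -> node id
            self.defs = {}      # variable name -> node holding its current value

        def _node(self, sig):
            # reuse an existing node with the same operator and children,
            # in the same order; otherwise create a fresh node
            return self.node_of.setdefault(sig, len(self.node_of))

        def use(self, name):
            # node for the current value of a name; a leaf for its initial value
            return self.defs.setdefault(name, self._node(("leaf", name)))

        def assign(self, x, op, y, z):
            # process the statement x = y op z
            n = self._node((op, self.use(y), self.use(z)))
            self.defs[x] = n
            return n

For instance, after d = DAG(), the calls d.assign("a", "+", "b", "c") and d.assign("e", "+", "b", "c") return the same node, detecting the common subexpression b + c.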
Dead Code Elimination
• The operation on DAG's that corresponds to dead-code elimination
can be implemented as follows.
• We delete from a DAG any root (node with no ancestors) that has
no live variables attached.
• Repeated application of this transformation will remove all nodes
from the DAG that correspond to dead code.
The Use of Algebraic Identities
1. Algebraic identities represent another important class of
optimizations on basic blocks. For example, we may apply
arithmetic identities, such as
x + 0 = 0 + x = x        x - 0 = x
x * 1 = 1 * x = x        x / 1 = x
to eliminate computations from a basic block.
Pointer Assignments
x = *p
*q = y
• In effect, x = *p is a use of every variable
• *q = y is a possible assignment to every variable.
• As a consequence, the operator =* must take all nodes that are currently associated
with identifiers as arguments, which is relevant for dead-code elimination.
• More importantly, the *= operator kills all other nodes so far constructed in the DAG.
• There are global pointer analyses one could perform that might limit the set of variables
a pointer could reference at a given place in the code.
• Even local analysis could restrict the scope of a pointer.
Simple Code Generator
Registers have four principal uses: holding operands of operations, holding
temporary values, holding values computed in one basic block and used in
another, and helping with run-time storage management (for example,
holding the stack pointer).
• A register descriptor keeps track of the variable names whose current value
is in that register. Initially, every register descriptor is empty.
• As the code generation progresses, each register will hold the value of zero
or more names.
• An address descriptor for each program variable keeps track of the location or locations where
the current value of that variable can be found.
• The information can be stored in the symbol-table entry for that variable
name.
The Code-Generation Algorithm
For the copy statement x = y
• We assume that getReg will always choose the same register for both
x and y.
• If y is not already in that register Ry, then generate the machine
instruction LD Ry, y.
• If y was already in Ry, we do nothing.
• It is only necessary that we adjust the register descriptor for Ry so
that it includes x as one of the values found there.
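In the same Python style as the earlier sketches, the copy case might look as follows (reg_desc, addr_desc, and the register R0 standing in for getReg are all assumptions of the sketch):

    def gen_copy(x, y, reg_desc, addr_desc, emit):
        # find a register already holding y, if any
        ry = next((r for r, names in reg_desc.items() if y in names), None)
        if ry is None:
            ry = "R0"                       # stand-in for getReg; assumes R0 is free
            emit("LD %s, %s" % (ry, y))     # load y only when it is not in a register
            reg_desc[ry] = {y}
            addr_desc.setdefault(y, set()).add(ry)
        reg_desc[ry].add(x)                 # Ry now also holds x; no code is emitted
        addr_desc[x] = {ry}                 # x's current value lives only in Ry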
The Code-Generation Algorithm - Ending the Basic Block
• Variables used by the block may wind up with their only location being a
register.
• If the variable is a temporary used only within the block, that is fine;
• When the block ends, we can forget about the value of the temporary and
assume its register is empty.
• However, if the variable is live on exit from the block, or if we don't know
which variables are live on exit, then we need to assume that the value of
the variable is needed later.
• In that case, for each variable x whose address descriptor does not say that its value
is located in the memory location for x,
• we must generate the instruction ST x, R, where R is a register in which x's value
exists at the end of the block.
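A short Python sketch of this end-of-block step, under the same assumed descriptor structures as before (it presumes, as the text does, that some register still holds x):

    def end_block_stores(live_on_exit, addr_desc, emit):
        for x in live_on_exit:
            locs = addr_desc.get(x, set())
            if x not in locs:               # x's value is not in its own memory location
                r = next(l for l in locs if l.startswith("R"))   # a register holding x
                emit("ST %s, %s" % (x, r))
                addr_desc[x] = locs | {x}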
The Code-Generation Algorithm - Managing Register and Address Descriptors
The rules to update the register and address descriptors (condensed from the standard formulation):
1. For LD R, x: set the register descriptor of R to hold only x, and add R to the address descriptor of x.
2. For ST x, R: add x's own memory location to the address descriptor of x.
3. For an operation such as ADD Rx, Ry, Rz implementing x = y + z: set the register descriptor of Rx to hold only x, make Rx the only entry in the address descriptor of x, and remove Rx from the address descriptor of every other variable.
4. For the copy x = y, after a load LD Ry, y if needed: add x to the register descriptor of Ry, and make Ry the only entry in the address descriptor of x.
Peephole Optimization
• A simple but effective technique for locally improving the target code
is peephole optimization.
• This is done by examining a sliding window of target instructions
(called the peephole) and replacing instruction sequences within the
peephole by a shorter or faster sequence, whenever possible.
• Peephole optimization can also be applied directly after intermediate
code generation to improve the intermediate representation.
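As a minimal Python sketch (assuming instructions are plain strings such as "ST a, R0"; the textual matching is purely illustrative), one classic peephole rule drops a load that immediately follows a store of the same register to the same location:

    def peephole(instrs):
        out = []
        for ins in instrs:
            if out and ins.startswith("LD ") and out[-1].startswith("ST "):
                x_st, r_st = [s.strip() for s in out[-1][3:].split(",")]
                r_ld, x_ld = [s.strip() for s in ins[3:].split(",")]
                if x_st == x_ld and r_st == r_ld:
                    # ST x, R followed by LD R, x: the load is redundant,
                    # provided the LD is not the target of a jump
                    continue
            out.append(ins)
        return out

For example, peephole(["ST a, R0", "LD R0, a"]) returns ["ST a, R0"].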
Redundant instruction elimination
• At the compilation level, the compiler searches for instructions that are
redundant in nature. A sequence of loads and stores may carry the same
meaning even if some of its instructions are removed.
Flow of control optimization
• There are instances in code where the program control jumps back
and forth without performing any significant task. These jumps can be
removed, as the example below illustrates.
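The original slide's code chunk is not reproduced; a typical instance, written in the branch notation used earlier, would be:

    BR L1
    ...
L1: BR L2

Since the instruction at L1 does nothing but branch again to L2, the first branch can be retargeted to jump directly to L2, and if nothing else jumps to L1, the instruction at L1 can then be deleted.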
Strength reduction
• There are operations that consume more time and space. Their
'strength' can be reduced by replacing them with other operations
that consume less time and space but produce the same result.
• For example, x * 2 can be replaced by x << 1, which involves only one
left shift. Similarly, although a * a and a² produce the same value, the
single multiplication a * a is much cheaper to implement than a general
exponentiation.
Accessing machine instructions
• The target machine may provide sophisticated instructions that perform
specific operations especially efficiently. If the code generator can use
these instructions directly, it improves both the quality and the efficiency
of the generated code.