15CS314J - Compiler Design: Unit 4


15CS314J – COMPILER DESIGN

UNIT 4
Prepared by:
Dr.R.I.Minu
Associate Professor

CONTENT
• Issues in the design of a code generator
• The target machine
• Runtime Storage management
• Basic Blocks and Flow Graphs
• Next-use Information
• A simple Code generator
• DAG representation of Basic Blocks
• Peephole Optimization
• Cross Compiler – T diagrams
INTRODUCTION TO CODE GENERATOR
• A code generator has three primary tasks:
• Instruction selection: involves choosing appropriate target-machine instructions to implement the IR statements.
• Register allocation and assignment: involves deciding what values to keep in which registers.
• Instruction ordering: involves deciding in what order to schedule the execution of instructions.
Issues in the Design of a Code Generator
• Input to the Code Generator
• The Target Program
• Instruction Selection
• Register Allocation
• Evaluation Order
Input to the Code Generator

• The input to the code generator is the intermediate representation of the source program, in one of several forms:
• Three-address representations: quadruples, triples, indirect triples
• Virtual machine representations: bytecodes and stack-machine code
• Linear representations: postfix notation
• Graphical representations: syntax trees and DAGs

• Here it is assumed that all syntactic and static semantic errors have been detected, that the necessary type checking has taken place, and that type conversion operators have been inserted wherever necessary.
• The code generator can therefore proceed on the assumption that its input is free of
these kinds of errors.
The Target Program
• The instruction-set architecture of the target machine has a
significant impact on the difficulty of constructing a good code
generator that produces high-quality machine code.
• The most common target-machine architectures are
• RISC (reduced instruction set computer)
• It has many registers, three-address instructions, simple addressing modes, and a
relatively simple instruction-set architecture
• CISC (complex instruction set computer)
• It has few registers, two-address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects
• Stack-based
• Operations are done by pushing operands onto a stack and then performing the operations on the operands at the top of the stack
The Target Program (Cont)
• To overcome the high performance penalty of interpretation, just-in-time
(JIT) Java compilers have been created.
• These JIT compilers translate bytecodes during run time to the native
hardware instruction set of the target machine
• Producing an absolute machine-language program as output has the
advantage that it can be placed in a fixed location in memory and
immediately executed
• Producing a relocatable machine-language program (often called an object
module) as output allows subprograms to be compiled separately. A set of
relocatable object modules can be linked together and loaded for
execution by a linking loader.
Instruction Selection
• The code generator must map the IR program into a code sequence
that can be executed by the target machine.
• The complexity of performing this mapping is determined by factors such as
• The level of the IR
• The nature of the instruction-set architecture
• The desired quality of the generated code.
Instruction Selection
• If the IR is high level, the code generator may translate each IR statement into a sequence of machine instructions using code templates. Such statement-by-statement code generation, however, often produces poor code that needs further optimization.
• If the IR reflects some of the low-level details of the underlying machine,
then the code generator can use this information to generate more
efficient code sequences.
• The quality of the generated code is usually determined by its speed and
size.
• On most machines, a given IR program can be implemented by many
different code sequences, with significant cost differences between the
different implementations
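
For instance, a sketch in the target notation used later in this unit: naive template-based translation of a = b + c followed by d = a + e produces a redundant load, since a is still in R0 when the fourth instruction executes.

LD R0, b      // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0      // a = R0
LD R0, a      // redundant: a is already in R0
ADD R0, R0, e // R0 = R0 + e
ST d, R0      // d = R0

A better code generator would omit the fourth instruction, and could omit the third as well if a is not used later.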
Register Allocation
• A key problem in code generation is deciding what values to hold in what registers.
• Registers are the fastest computational unit on the target machine, but we usually do not
have enough of them to hold all values.
• Values not held in registers need to reside in memory.
• Instructions involving register operands are invariably shorter and faster than those
involving operands in memory, so efficient utilization of registers is particularly
important.
• The use of registers is often subdivided into two sub-problems:
• Register allocation, during which we select the set of variables that will reside in registers at each
point in the program.
• Register assignment, during which we pick the specific register that a variable will reside in.
• Finding an optimal assignment of registers to variables is difficult, even with single-
register machines.
• Mathematically, the problem is NP-complete.
• The problem is further complicated because the hardware and/or the operating system
of the target machine may require that certain register-usage conventions be observed.
Evaluation Order
• The order in which computations are performed can affect the
efficiency of the target code.
• Some computation orders require fewer registers to hold
intermediate results than others.
• However, picking a best order in the general case is a difficult NP-
complete problem.
• Initially, we shall avoid the problem by generating code for the three-
address statements in the order in which they have been produced
by the intermediate code generator.
Target Machine
A Simple Target Machine Model
• Let us consider a target computer modeled as a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.
• The underlying computer is a byte-addressable machine with n general-purpose registers R0, R1, ..., Rn-1.
Simple Commands
• Load operations: An instruction of the form LD r1, r2 is a register-to-
register copy in which the contents of register r2 are copied into register r1.
• Store operations: The instruction ST x, r stores the value in register r into
the location x. This instruction denotes the assignment x = r.
• Computation operations: An instruction of the form OP dst, src1, src2, where OP is an operator like ADD or SUB, and dst, src1, and src2 are locations, not necessarily distinct.
• Unconditional jumps: The instruction BR L causes control to branch to the machine instruction with label L. (BR stands for branch.)
• Conditional jumps: An instruction of the form Bcond r, L, where r is a register, L is a label, and cond stands for any of the common tests on values in the register r. For example, BLTZ r, L causes a jump to label L if the value in register r is less than zero.
Simple Commands - Examples
• A memory location can be an integer indexed by a register.
• For example, LD R1, 100(R2) has the effect of setting
R1 = contents(100 + contents(R2))
• The indirect form LD R1, *100(R2) has the effect of setting
R1 = contents(contents(100 + contents(R2)))
(loading into R1 the value in the memory location stored in the memory location obtained by adding 100 to the contents of register R2).
• The instruction LD R1, #100 loads the integer 100 into register R1, and ADD R1, R1, #100 adds the integer 100 to the contents of register R1.
Program and Instruction Costs
• We often associate a cost with compiling and running a program, depending on what aspect of a program we are interested in optimizing.
• Some common cost measures are
• The length of compilation time
• The size, running time, and power consumption of the target program.
Program and Instruction Costs
• We take the cost of an instruction to be one plus the costs associated with the addressing modes of its operands: register operands add nothing, while memory locations and constants add one each, since they occupy the word following the instruction.
• The instruction LD R0, R1 copies the contents of register R1 into register R0. This instruction has a cost of one because no additional memory words are required.
• The instruction LD R0, M loads the contents of memory location M into register R0. The cost is two since the address of memory location M is in the word following the instruction.
• The instruction LD R1, *100(R2) loads into register R1 the value given by contents(contents(100 + contents(R2))). The cost is two because the constant 100 is stored in the word following the instruction.
Runtime Storage Management
• The executing program runs in its own logical address space, which is partitioned into four code and data areas:
1. A statically determined area Code that holds the executable target code. The
size of the target code can be determined at compile time.
2. A statically determined data area Static for holding global constants and other
data generated by the compiler. The size of the global constants and compiler
data can also be determined at compile time.
3. A dynamically managed area Heap for holding data objects that are allocated
and freed during program execution. The size of the Heap cannot be
determined at compile time.
4. A dynamically managed area Stack for holding activation records as they are
created and destroyed during procedure calls and returns. Like the Heap, the
size of the Stack cannot be determined at compile time.
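
A typical arrangement of these areas in the address space is sketched below (an assumption about layout; details vary by platform): the fixed-size Code and Static areas sit at one end, while the Heap and Stack grow toward each other through the free memory between them.

Code | Static | Heap -->   ...free memory...   <-- Stack
(low addresses)                                (high addresses)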
Static Allocation
• To illustrate code generation for simplified procedure calls and returns, consider the following call sequence:
ST callee.staticArea, #here + 20
BR callee.codeArea
• The ST instruction saves the return address at the beginning of the activation record for callee, and the BR transfers control to the target code for the called procedure callee. (Here #here + 20 is the address of the instruction following the BR, assuming the ST and BR instructions together occupy 20 bytes.)
• A return from callee can be implemented by a simple jump instruction:
BR *callee.staticArea
Stack Allocation
• Static allocation can become stack allocation by using relative
addresses for storage in activation records.
• In stack allocation, however, the position of an activation record for a
procedure is not known until run time.
• This position is usually stored in a register, so words in the activation
record can be accessed as offsets from the value in this register
Stack Allocation
• The code for the first procedure initializes the stack by setting SP to
the start of the stack area in memory:
LD SP, #stackStart // initialize the stack
... // code for the first procedure
HALT // terminate execution
• A procedure call sequence increments SP, saves the return address, and transfers control to the called procedure:
ADD SP, SP, #caller.recordSize // increment stack pointer
ST *SP, #here + 16 // save return address
BR callee.codeArea // jump to the callee
Stack Allocation
• The return sequence consists of two parts. The called procedure
transfers control to the return address using
BR *0(SP) // return to caller
• The second part of the return sequence is in the caller, which decrements SP, thereby restoring SP to its previous value:
SUB SP, SP, #caller.recordSize // decrement stack pointer
Basic Block
• Partition the intermediate code into basic blocks, which are maximal
sequences of consecutive three-address instructions with the
properties that
• The flow of control can only enter the basic block through the first
instruction in the block. That is, there are no jumps into the middle of
the block.
• Control will leave the block without halting or branching, except
possibly at the last instruction in the block.
• The basic blocks become the nodes of a flow graph, whose edges
indicate which blocks can follow which other blocks.
Partitioning three-address instructions into
basic blocks
INPUT: A sequence of three-address instructions.
OUTPUT: A list of the basic blocks for that sequence in which each
instruction is assigned to exactly one basic block.
METHOD: First, we determine those instructions in the intermediate code
that are leaders, that is, the first instructions in some basic block. The
instruction just past the end of the intermediate program is not included as a
leader. The rules for finding leaders are:
1. The first three-address instruction in the intermediate code is a leader.
2. Any instruction that is the target of a conditional or unconditional jump is
a leader.
3. Any instruction that immediately follows a conditional or unconditional
jump is a leader
Example
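
The figure from the original slide is not reproduced in this text. The instruction numbers in the discussion below match the classic example from Aho et al., which sets a 10 x 10 matrix a to the identity matrix:

1) i = 1
2) j = 1
3) t1 = 10 * i
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)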
Leader
• First, instruction 1 is a leader by rule (1) of the partitioning algorithm above.
• To find the other leaders, we first need to find the jumps.
• In this example, there are three jumps, all conditional, at
instructions 9, 11, and 17.
• By rule (2), the targets of these jumps are leaders; they
are instructions 3, 2, and 13, respectively.
• Then, by rule (3), each instruction following a jump is a
leader; those are instructions 10 and 12.
• We conclude that the leaders are instructions 1, 2, 3, 10, 12,
and 13.
• The basic block of each leader contains all the instructions
from itself until just before the next leader.
• Thus, the basic block of leader 1 is just instruction 1, and the block of leader 2 is just instruction 2.
• Leader 3, however, has a basic block consisting of
instructions 3 through 9, inclusive.
• Instruction 10's block is instructions 10 and 11; instruction 12's block is just instruction 12; and instruction 13's block is instructions 13 through 17.
Next-Use Information
• Knowing when the value of a variable will be used next is essential for
generating good code
• The use of a name in a three-address statement is defined as follows.
• Suppose three-address statement i assigns a value to x.
• If statement j has x as an operand, and control can flow from
statement i to j along a path that has no intervening assignments to x,
then we say statement j uses the value of x computed at statement i.
• We further say that x is live at statement i.
Next-Use Information
Algorithm: Determining the liveness and next-use information for each statement in a basic
block.
INPUT: A basic block B of three-address statements. We assume that the symbol table initially
shows all nontemporary variables in B as being live on exit.
OUTPUT: At each statement i: x = y + z in B, we attach to i the liveness and next-use information
of x, y, and z.
METHOD: We start at the last statement in B and scan backwards to the beginning of B. At each
statement i: x = y + z in B, we do the following:
1. Attach to statement i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
2. In the symbol table, set x to "not live" and "no next use."
3. In the symbol table, set y and z to "live" and the next uses of y and z to i.
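
As a small worked example (a hypothetical block; a, b, c, and d are nontemporary and hence assumed live on exit, while t is a temporary), consider:

(1) t = a - b
(2) d = t + c

Scanning backward:
• At (2): attach the table's current information (d live on exit, no next use in the block; t not live; c live on exit). Then set d to "not live, no next use" and set t and c to "live, next use (2)".
• At (1): attach the current information (t live, next use (2); a and b live on exit). Then set t to "not live" and set a and b to "live, next use (1)".
• The code generator thus learns that the value of t computed at (1) is used at (2) and is dead afterwards, so t's register can be reused after statement (2).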
Flow Graphs
• Suppose we have two blocks B and C.
• There is an edge from block B to block C if and only if it is possible for
the first instruction in block C to immediately follow the last
instruction in block B.
• There are two ways that such an edge could be justified:
1. There is a conditional or unconditional jump from the end of B to
the beginning of C.
2. C immediately follows B in the original order of the three-address instructions, and B does not end in an unconditional jump.
• In the example above, for instance, the block of instructions 3 through 9 has an edge to itself (the conditional jump at instruction 9 targets leader 3) as well as a fall-through edge to the block {10, 11}.
Optimization of Basic Blocks
Topic Repeated in Unit 5
The DAG Representation of Basic Blocks
1. There is a node in the DAG for each of the initial values of the variables
appearing in the basic block.
2. There is a node N associated with each statement s within the block. The
children of N are those nodes corresponding to statements that are the
last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is
the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose
variables are live on exit from the block; that is, their values may be used
later, in another block of the flow graph. Calculation of these "live
variables" is a matter for global flow analysis
The DAG Representation of Basic Blocks
Using the DAG, the following code-improving transformations can be done:
1. We can eliminate local common sub-expressions, that is,
instructions that compute a value that has already been computed.
2. We can eliminate dead code, that is, instructions that compute a
value that is never used.
3. We can reorder statements that do not depend on one another;
such reordering may reduce the time a temporary value needs to
be preserved in a register.
4. We can apply algebraic laws to reorder operands of three-address instructions, and sometimes thereby simplify the computation.
Finding Local Common Subexpressions
• Common subexpressions can be detected by noticing, as a new node
M is about to be added, whether there is an existing node N with the
same children, in the same order, and with the same operator.
• If so, N computes the same value as M and may be used in its place.
This technique was introduced as the "value-number" method of
detecting common subexpressions
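
For example, in the block below, when the node for statement (4) is about to be constructed, a "-" node with the same children (the node for a and the initial value d0) already exists from statement (2), so it is reused and carries both labels b and d:

(1) a = b + c
(2) b = a - d
(3) c = b + c
(4) d = a - d

Note that statement (3) does not match statement (1): b has been redefined at (2), so the children of the "+" node for (3) differ from those of the node for (1).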
Dead Code Elimination
• The operation on DAG's that corresponds to dead-code elimination
can be implemented as follows.
• We delete from a DAG any root (node with no ancestors) that has
no live variables attached.
• Repeated application of this transformation will remove all nodes
from the DAG that correspond to dead code.
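
For example, consider the block below, assuming a and b are live on exit but c and e are not:

a = b + c
b = b - d
c = c + d
e = b + c

The node for e is a root with no live variable attached, so it is deleted; the node labeled c then becomes a root, and since c is also dead, it is deleted as well. The nodes for a and b remain.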
The Use of Algebraic Identities
1. Algebraic identities represent another important class of optimizations on basic blocks. For example, we may apply arithmetic identities such as
x + 0 = 0 + x = x        x * 1 = 1 * x = x
x - 0 = x                x / 1 = x
2. Another class of algebraic optimizations includes local reduction in strength, that is, replacing a more expensive operator by a cheaper one, as in
x² = x * x        2 * x = x + x        x / 2 = x * 0.5
The Use of Algebraic Identities
3. A third class of related optimizations is constant folding. Here we
evaluate constant expressions at compile time and replace the constant
expressions by their value
• Thus the expression 2 * 3.14 would be replaced by 6.28.
• Many constant expressions arise in practice because of the frequent
use of symbolic constants in programs.
Representation of Array References
The proper way to represent array accesses in a DAG is as follows.
1. An assignment from an array, like x = a[i] , is represented by
creating a node with operator =[] and two children representing the
initial value of the array, a0 in this case, and the index i. Variable x
becomes a label of this new node.
2. An assignment to an array, like a[j] = y, is represented by a new
node with operator []= and three children representing a0, j and y.
There is no variable labelling this node.
• The creation of this node kills all currently constructed nodes whose value depends on a0. A node that has been killed cannot receive any more labels; that is, it cannot become a common subexpression.
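
For example, in the sequence below, the node created for statement (1) is killed by the array assignment at (2), since i and j might be equal; statement (3) must therefore build a fresh =[] node rather than reuse the node from (1):

(1) x = a[i]
(2) a[j] = y
(3) z = a[i]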
Pointer Assignments
• Consider the pointer assignments
x = *p
*q = y
• In effect, x = *p is a use of every variable
• *q = y is a possible assignment to every variable.
• As a consequence, the operator =* must take all nodes that are currently associated
with identifiers as arguments, which is relevant for dead-code elimination.
• More importantly, the *= operator kills all other nodes so far constructed in the DAG.
• There are global pointer analyses one could perform that might limit the set of variables
a pointer could reference at a given place in the code.
• Even local analysis could restrict the scope of a pointer.
Simple Code Generator
Four principal uses of registers
• In most machine architectures, some or all of the operands of an operation must be in registers in order to perform the operation.
• Registers make good temporaries - places to hold the result of a sub-
expression while a larger expression is being evaluated, or more generally,
a place to hold a variable that is used only within a single basic block.
• Registers are used to hold (global) values that are computed in one basic
block and used in other blocks, for example, a loop index that is
incremented going around the loop and is used several times within the
loop.
• Registers are often used to help with run-time storage management, for
example, to manage the run-time stack, including the maintenance of stack
pointers and possibly the top elements of the stack itself.
Register and Address Descriptors
• Register descriptor: for each available register, keeps track of the variable names whose current value is in that register.
• Since we shall use only those registers that are available for local use within a basic block, we assume that initially all register descriptors are empty.
• As code generation progresses, each register will hold the value of zero or more names.
• Address descriptor: for each program variable, keeps track of the location or locations where the current value of that variable can be found.
• The location might be a register, a memory address, a stack location, or some set of more than one of these.
• The information can be stored in the symbol-table entry for that variable name.
The Code-Generation Algorithm
• Machine Instructions for Operations
• Machine Instructions for Copy Statements
• Ending the Basic Block
• Managing Register and Address Descriptors
The Code-Generation Algorithm - getReg
• An essential part of the algorithm is a function getReg(I), which selects registers for each memory location associated with the three-address instruction I.
• Function getReg has access to the register and address descriptors for all the variables of the basic block.
• It may also have access to certain useful data-flow information, such as the variables that are live on exit from the block.
The Code-Generation Algorithm - Machine Instructions for Operations

For a three-address instruction such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), then issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
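
For instance, for the instruction t = a - b with all registers empty, getReg might choose R1 for a and R2 for both b and t (reusing b's register for the result, since b is not used again within the block); steps 2 to 4 then emit:

LD R1, a
LD R2, b
SUB R2, R1, R2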
The Code-Generation Algorithm - Machine Instructions for Copy Statements

For a copy statement x = y:
• We assume that getReg will always choose the same register for both x and y.
• If y is not already in that register Ry, then generate the machine instruction LD Ry, y.
• If y was already in Ry, we do nothing.
• It is only necessary that we adjust the register descriptor for Ry so that it includes x as one of the values found there.
The Code-Generation Algorithm - Ending the Basic Block

• Variables used by the block may wind up with their only location being a
register.
• If the variable is a temporary used only within the block, that is fine;
• When the block ends, we can forget about the value of the temporary and
assume its register is empty.
• However, if the variable is live on exit from the block, or if we don't know
which variables are live on exit, then we need to assume that the value of
the variable is needed later.
• In that case, for each variable x whose location descriptor does not say that its value
is located in the memory location for x,
• we must generate the instruction ST x, R, where R is a register in which x's value
exists at the end of the block.
The Code-Generation Algorithm - Managing Register and Address Descriptors
The rules to update the register and address descriptors:
1. For the instruction LD R, x
• Change the register descriptor for register R so it holds only x.
• Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
The Code-Generation Algorithm - Managing Register and Address Descriptors
3. For an operation such as ADD Rx, Ry, Rz implementing a three-address instruction x = y + z
• Change the register descriptor for Rx so that it holds only x.
• Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is not now in the address descriptor for x.
• Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing descriptors as for all load statements (per rule 1):
• Add x to the register descriptor for Ry.
• Change the address descriptor for x so that its only location is Ry.
Example: consider the block t = a - b; u = a - c; v = t + u; a = d; d = v + u, where a, b, c, and d are live on exit and t, u, v are temporaries.
• For t = a - b, since nothing is in a register initially, we load a and b into registers R1 and R2, and the value t is produced in register R2.
• Notice that we can use R2 for t because the value b previously in R2 is not needed within the block.
• Since b is presumably live on exit
from the block, had it not been in
its own memory location (as
indicated by its address descriptor),
we would have had to store R2 into
b first. The decision to do so, had we
needed R2, would be taken by
getReg.
• The second instruction, u = a - c, does not require a load of a, since it is already in register R1.
• We can reuse R1 for the result, u, since the value of a, previously in that register, is no longer needed within the block,
• and a's value is in its own memory location if a is needed outside the block.
• Note that we change the address
descriptor for a to indicate that it is
no longer in R1, but is in the
memory location called a.
• The third instruction, v = t + u,
requires only the addition.
• Further, we can use R3 for the result,
v, since the value of c in that register
is no longer needed within the block
• c has its value in its own memory
location.
• The copy instruction, a = d, requires a load of d, since d is not in a register.
• We show register R2's descriptor holding both a and d.
• The addition of a to the register descriptor is the result of our processing the copy statement and is not the result of any machine instruction.
• The fifth instruction, d = v + u, uses two values that are in registers.
• Since u is a temporary whose value is no longer needed, we have chosen to reuse its register R1 for the new value of d.
• Notice that d is now in only R1, and
is not in its own memory location.
• The same holds for a, which is in R2
and not in the memory location
called a.
• As a result, we need a "coda" to the
machine code for the basic block
that stores the live-on-exit variables
a and d into their memory locations.
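
Putting the steps together, the code emitted for the whole block is the following sketch (register choices as in the narration above), ending with the coda that stores the live-on-exit variables:

LD R1, a       // t = a - b
LD R2, b
SUB R2, R1, R2 // t in R2
LD R3, c       // u = a - c
SUB R1, R1, R3 // u in R1
ADD R3, R2, R1 // v = t + u; v in R3
LD R2, d       // a = d; R2 now holds both a and d
ADD R1, R3, R1 // d = v + u; d in R1
ST a, R2       // coda: store live-on-exit a
ST d, R1       // coda: store live-on-exit d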
Peephole Optimization

• A simple but effective technique for locally improving the target code
is peephole optimization
• This is done by examining a sliding window of target instructions
(called the peephole) and replacing instruction sequences within the
peephole by a shorter or faster sequence, whenever possible.
• Peephole optimization can also be applied directly after intermediate
code generation to improve the intermediate representation.
Redundant instruction elimination
• At the compilation level, the compiler searches for instructions that are redundant. Multiple load and store instructions may carry the same meaning even if some of them are removed.
• Eliminating redundant loads and stores is the most common case; see the example below.
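
For example, in the pair below the store is redundant and can be deleted (provided it carries no label, so it cannot be reached by a jump): after the load, the value of a is already in R0, and location a already holds that value.

LD R0, a // R0 = a
ST a, R0 // redundant: a already holds this value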


Unreachable code
• Unreachable code is a part of the program that is never executed because of the surrounding programming constructs. Programmers may have accidentally written a piece of code that can never be reached.
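
The slide's code segment is not reproduced in this text; a minimal C reconstruction of the kind of segment meant (add_ten is a hypothetical function) is:

#include <stdio.h>

int add_ten(int x) {
    return x + 10;
    /* unreachable: control returns before this call */
    printf("value of x is %d\n", x);
}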

• In a segment like this, the printf statement will never be executed, as the program control returns back before it can reach the call; hence the printf can be removed.
Flow of control optimization

• There are instances in code where the program control jumps back and forth without performing any significant task. These jumps can be removed. Consider the chunk of code sketched below:
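
The slide's code is not reproduced; a reconstruction of the usual pattern, in this unit's branch notation, is:

BR L1
...
L1: BR L2
...
L2: ADD R1, R1, #1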

• In this code, label L1 can be removed as it merely passes the control to L2. So instead of jumping to L1 and then to L2, the control can directly reach L2, as shown below:
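
With the useless intermediate jump removed, the same reconstruction becomes:

BR L2
...
L2: ADD R1, R1, #1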
Algebraic expression simplification
• There are occasions where algebraic expressions can be made simpler.
• For example, the assignment a = a + 0 has no effect and can be removed.
• The expression a = a + 1 can simply be replaced by INC a.
Strength reduction

• There are operations that consume more time and space. Their
‘strength’ can be reduced by replacing them with other operations
that consume less time and space, but produce the same result.
• For example, x * 2 can be replaced by x << 1, which involves only one left shift. Although a * a and a² produce the same result, computing a * a directly is much more efficient than calling an exponentiation routine.
Accessing machine instructions
• The target machine can deploy more sophisticated instructions, which can have the capability to perform specific operations much more efficiently.
• If the target code can accommodate those instructions directly, that
will not only improve the quality of code, but also yield more efficient
results.
Cross Compiler
• A cross compiler is a compiler capable of creating executable code for a
platform other than the one on which the compiler is running.
• For example, a compiler that runs on a Windows 7 PC but generates code that runs on an Android smartphone is a cross compiler.
• The fundamental use of a cross compiler is to separate the build
environment from target environment.
• Use of virtual machines (such as Java's JVM) resolves some of the reasons
for which cross compilers were developed. The virtual machine paradigm
allows the same compiler output to be used across multiple target systems,
although this is not always ideal because virtual machines are often slower
and the compiled program can only be run on computers with that virtual
machine.
T-diagram
• In computing, tombstone diagrams (or T-
diagrams) consist of a set of “puzzle
pieces” representing compilers and
other related language processing
programs.
• They are used to illustrate and reason about transformations from a source language (left of T) to a target language (right of T) realised in an implementation language (bottom of T).
• For example, a compiler that translates C to x86 code and is itself written in x86 machine code is drawn as a T with C on the left arm, x86 on the right arm, and x86 on the stem; cross compilation and bootstrapping can then be reasoned about by fitting such pieces together.
