Principles of Compiler Design (Seng 3043) : Chapter - 8 Code Generation
Chapter – 8
Code Generation
Debremarkos Institute of Technology
School of Computing
Software Engineering Academic Program
By Lamesginew A. ([email protected])
Outline
Introduction
Issues in the design of code generation
The Target Machine
Basic Blocks and flow graphs
Introduction
Code generation is the final phase of a compiler.
The code generator receives an intermediate representation (IR) from the
front end, together with supplementary information in the symbol table.
It produces a semantically equivalent target program.
Requirements for a code generator
The target program must preserve the semantic meaning of the source program.
The target program must be of high quality (it must make effective use of
the available resources of the target machine).
The code generator itself must run efficiently.
Issues in the Design of a Code Generator
Designing a code generator involves three main tasks: instruction selection,
register allocation and assignment, and instruction ordering.
The details of each task depend on the specifics of the intermediate
representation, the target language, and the run-time system.
The most important criterion for a code generator is that it produce correct code.
Instruction Selection
• The code generator maps the IR program into a code sequence that can be
executed by the target machine.
• Factors affecting this mapping are the following:
  • the level of the IR
  • the nature of the instruction-set architecture
  • the desired quality of the generated code
If the IR is high level, the code generator may translate using code
templates. However, such statement-by-statement code generation
often produces poor code that needs further optimization. If the IR
reflects the low-level details of the machine, then the code generator
can use this information to generate more efficient code sequences.
The nature of the instruction set of the target machine (e.g. uniformity
and completeness of the instruction set) has a strong effect on the
difficulty of instruction selection. If the target machine does not
support each data type in a uniform manner, then each exception to the
general rule requires special handling.
Instruction speeds and machine idioms are other important factors.
For each type of three-address statement, we can design a code
skeleton that defines the target code to be generated for that construct.
For example, the three-address statement x = y + z can be translated
into the code sequence
LD R0, y // R0 = y (load y into register R0)
ADD R0, R0, z // R0 = R0 + z (add z to R0)
ST x, R0 // x = R0 (store R0 into x)
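Applying the skeleton statement by statement, however, often produces
redundant loads and stores. For example, the sequence of three-address statements
a = b + c
d = a + e
would be translated into
LD R0, b // R0 = b
ADD R0, R0, c // R0 = R0 + c
ST a, R0 // a = R0
LD R0, a // R0 = a
ADD R0, R0, e // R0 = R0 + e
ST d, R0 // d = R0
Here the fourth instruction is redundant, since it loads a value that has just been stored.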
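The skeleton-based translation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statement pattern, the function names (translate, generate, drop_redundant_loads), and the restriction to + and - are hypothetical and not part of the course material. The last function removes a load only when it directly follows a store of the same variable from the same register, which is safe for straight-line code with no jump into the middle.

```python
import re

# Hypothetical pattern for statements of the form "x = y + z" or "x = y - z".
STATEMENT = re.compile(r"\s*(\w+)\s*=\s*(\w+)\s*([+-])\s*(\w+)\s*$")
OPCODES = {"+": "ADD", "-": "SUB"}


def translate(statement):
    """Expand one three-address statement with the fixed LD / OP / ST skeleton."""
    match = STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    x, y, op, z = match.groups()
    return [f"LD R0, {y}", f"{OPCODES[op]} R0, R0, {z}", f"ST {x}, R0"]


def generate(statements):
    """Translate statement by statement, with no knowledge of neighbours."""
    code = []
    for statement in statements:
        code.extend(translate(statement))
    return code


def drop_redundant_loads(code):
    """Drop 'LD r, v' when the previous instruction was 'ST v, r'."""
    result = []
    for instr in code:
        if result and instr.startswith("LD "):
            reg, var = [part.strip() for part in instr[3:].split(",")]
            if result[-1] == f"ST {var}, {reg}":
                continue  # the register already holds the value of var
        result.append(instr)
    return result


naive = generate(["a = b + c", "d = a + e"])
improved = drop_redundant_loads(naive)
```

Here naive holds six instructions and improved holds five, which shows how one IR program can map to code sequences of different cost.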
The quality of the generated code is usually determined by its speed
and size.
A given IR program can be implemented by many different code sequences,
with significant cost differences between them.
Evaluation order
The order in which computations are performed can affect the
efficiency of the target code.
Some computation orders require fewer registers to hold
intermediate results than others.
However, picking a best order in the general case is a difficult
problem.
Initially, we shall avoid the problem by generating code for the
three-address statements in the order in which they have been
produced by the intermediate code generator.
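The effect of evaluation order on register usage can be illustrated with a small Python sketch. It rests on a simplified model that is an assumption of this example, not something stated in the slides: every operand is loaded into a register before use, and the result of an operation reuses one of its operand registers. Expression trees are nested tuples (op, left, right) with variable names as leaves, and both function names are hypothetical.

```python
def registers_left_first(tree):
    """Registers needed when the left operand is always evaluated first."""
    if isinstance(tree, str):        # a leaf is loaded into one register
        return 1
    _, left, right = tree
    # The left result occupies one register while the right side is evaluated.
    return max(registers_left_first(left), registers_left_first(right) + 1)


def registers_best_order(tree):
    """Registers needed when each node evaluates its costlier operand first."""
    if isinstance(tree, str):
        return 1
    _, left, right = tree
    l = registers_best_order(left)
    r = registers_best_order(right)
    # Try both orders at this node and keep the cheaper one.
    return min(max(l, r + 1), max(r, l + 1))


# a + (b + (c + d))
expr = ("+", "a", ("+", "b", ("+", "c", "d")))
left_first = registers_left_first(expr)
best = registers_best_order(expr)
```

For this expression the fixed left-to-right order needs 4 registers, while choosing the order per node needs only 2. Under the same model, the second recurrence matches what the literature calls Ershov numbers.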
The Target Machine
Familiarity with the target machine architecture and its instruction set is a
prerequisite for designing and implementing a good code generator.
A Simple Target Machine Model
The target computer models a three-address machine with load and store
operations, computation operations, jump operations, and conditional jumps.
The following instructions are available:
Load operations: LD dst, addr loads the value in location addr into location
dst (dst = contents of addr). E.g.: LD r, x loads the value in location x into
register r, and LD r1, r2 copies the contents of register r2 into register r1.
Store operations: ST x, r stores the value in register r into location x.
Computation operations: OP dst, src1, src2. E.g.: SUB r1, r2, r3 computes r1 = r2 - r3.
Unary operators do not have a src2.
Unconditional jumps: BR L causes control to branch to label L.
Conditional jumps: Bcond r, L, where r is a register, L is a label, and
cond stands for any of the common tests on the value in register r. For example,
BLTZ r, L causes a jump to label L if the value in register r is less than zero.
Indirect addressing mode: *r (the memory location found in the location represented
by the contents of register r) and *100(r) (the memory location found in the location
obtained by adding 100 to the contents of r). For example, LD R1, *100(R2) has the
effect of setting R1 = contents(contents(100 + contents(R2))).
Immediate constant addressing mode: for example, LD R1, #100 loads the integer 100 into
register R1.
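The addressing modes can be made concrete with a short Python sketch that computes the value an LD instruction would fetch for a given operand. It is a minimal illustration with several assumptions: load_value is a hypothetical name, registers and memory are modelled as dictionaries, the form c(r) without a star (used in some of the examples in this chapter) is read as contents(c + contents(r)), and *r is treated as *0(r). Operands that are variable names are left out.

```python
import re

# Matches the operand forms 100(R2) and *100(R2); the offset may be omitted.
INDEXED = re.compile(r"(\*?)(\d*)\((\w+)\)")


def load_value(operand, regs, mem):
    """Return the value LD would place in its destination for this operand."""
    if operand.startswith("#"):              # immediate constant, e.g. #100
        return int(operand[1:])
    if operand in regs:                      # plain register, e.g. R2
        return regs[operand]
    if operand.startswith("*") and operand[1:] in regs:
        operand = f"*0({operand[1:]})"       # assumption: *r behaves like *0(r)
    match = INDEXED.fullmatch(operand)
    if match is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    star, offset, reg = match.groups()
    address = int(offset or 0) + regs[reg]   # c + contents(r)
    if star:                                 # one extra level of indirection
        address = mem[address]
    return mem[address]


regs = {"R2": 8}
mem = {8: 108, 108: 500, 500: 42}
indexed = load_value("100(R2)", regs, mem)     # contents(100 + contents(R2))
indirect = load_value("*100(R2)", regs, mem)   # contents(contents(100 + contents(R2)))
```

With the sample register and memory contents, the indexed form reads address 108, while the indirect form follows the address stored there and reads address 500.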
Eg-1: The three-address statement x = y - z can be implemented by the
machine instructions:
LD R1, y // R1 = y
LD R2, z // R2 = z
SUB R1, R1, R2 // R1 = R1 - R2
ST x, R1 // x = R1
Eg-4: for x = *p
LD R1, p // R1 = p
LD R2, 0(R1) // R2 = contents(0 + contents(R1))
ST x, R2 // x = R2
Eg-5: for *p = y
LD R1, p // R1 = p
LD R2, y // R2 = y
ST 0(R1), R2 // contents(0 + contents(R1)) = R2
Eg-6: for conditional-jump, if x < y goto M
LD R1, x // R1 = x
LD R2, y // R2 = y
SUB R1, R1, R2 // R1 = R1 - R2
BLTZ R1, M // if R1 < 0 jump to M
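To check such instruction sequences by hand, a tiny interpreter for a subset of the target machine can help. The Python sketch below is an assumption-laden illustration, not a definition of the machine: it supports only LD, ST, ADD, SUB, BR, and BLTZ, models memory as a dictionary keyed by variable name, and writes a label as a line of the form "M:", a notation chosen for this example. The instructions after the BLTZ in COMPARE are a hypothetical continuation added so that the jump has a visible effect.

```python
def is_register(name):
    """Registers are written R0, R1, ... as in the examples above."""
    return name[:1] == "R" and name[1:].isdigit()


def run(program, memory):
    """Execute the instruction strings in program; return the updated memory."""
    labels, code = {}, []
    for line in program:
        line = line.split("//")[0].strip()       # drop trailing comments
        if line.endswith(":"):                   # label line such as "M:"
            labels[line[:-1]] = len(code)
        elif line:
            opcode, _, rest = line.partition(" ")
            code.append((opcode, [arg.strip() for arg in rest.split(",")]))

    regs = {}
    arithmetic = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}

    def value(operand):
        if operand.startswith("#"):              # immediate constant
            return int(operand[1:])
        if is_register(operand):
            return regs[operand]
        return memory[operand]                   # variable name

    pc = 0
    while pc < len(code):
        opcode, args = code[pc]
        pc += 1
        if opcode == "LD":
            regs[args[0]] = value(args[1])
        elif opcode == "ST":
            memory[args[0]] = regs[args[1]]
        elif opcode in arithmetic:
            regs[args[0]] = arithmetic[opcode](value(args[1]), value(args[2]))
        elif opcode == "BR":
            pc = labels[args[0]]
        elif opcode == "BLTZ":
            if regs[args[0]] < 0:
                pc = labels[args[1]]
        else:
            raise ValueError(f"unsupported instruction: {opcode}")
    return memory


# x = y - z
SUBTRACT = ["LD R1, y", "LD R2, z", "SUB R1, R1, R2", "ST x, R1"]

# if x < y goto M, followed by a hypothetical continuation that records the outcome
COMPARE = [
    "LD R1, x", "LD R2, y", "SUB R1, R1, R2", "BLTZ R1, M",
    "LD R3, #0", "ST taken, R3", "BR END",
    "M:", "LD R3, #1", "ST taken, R3",
    "END:",
]
```

For instance, run(SUBTRACT, {"y": 7, "z": 3}) stores 4 in x, and run(COMPARE, {"x": 2, "y": 5}) sets taken to 1 because x - y is negative.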
Basic Blocks and Flow graphs
Flow graph: a graph representation of intermediate code.
The representation is constructed as follows:
Partition the intermediate code into basic blocks, with the
properties that
The flow of control can only enter the basic block
through the first instruction in the block. That is, there
are no jumps into the middle of the block.
Control will leave the block without halting or
branching, except possibly at the last instruction in the
block.
The basic blocks become the nodes of a flow graph, whose
edges indicate which blocks can follow which other blocks.
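Once the code has been split into basic blocks, the edges of the flow graph can be derived from the last instruction of each block. The Python sketch below assumes a notation that the slides do not fix: jumps are written "goto N" or "if ... goto N", where N is the 1-based position of the target instruction, and blocks are given as a dictionary from the position of a block's first instruction to its list of instructions. The name flow_graph and the sample blocks are hypothetical.

```python
import re

# A jump ends with "goto N", where N is a 1-based instruction position.
GOTO = re.compile(r"\bgoto\s+(\d+)\s*$")


def flow_graph(blocks):
    """Map each block's start position to the start positions of its successors."""
    starts = sorted(blocks)
    edges = {start: [] for start in starts}
    for index, start in enumerate(starts):
        last = blocks[start][-1]
        match = GOTO.search(last)
        if match:                                  # edge to the jump target
            edges[start].append(int(match.group(1)))
        falls_through = not last.startswith("goto")
        if falls_through and index + 1 < len(starts):
            following = starts[index + 1]          # edge to the next block in order
            if following not in edges[start]:
                edges[start].append(following)
    return edges


blocks = {
    1: ["i = 1"],
    2: ["t1 = i * i", "s = s + t1", "i = i + 1", "if i <= 10 goto 2"],
    6: ["r = s"],
}
edges = flow_graph(blocks)
```

In the sample, the block starting at position 2 has two successors: itself (the loop back edge) and the block at position 6 (the fall-through when the condition is false).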
Basic block
A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end without halt or
possibility of branching except at the end.
We begin a new basic block with the first instruction and keep adding
instructions until we meet either a jump, a conditional jump, or a label on the
following instruction.
• Algorithm: Partitioning three-address instructions into basic blocks.
• Input: A sequence of three-address instructions.
• Output: A list of the basic blocks for that sequence in which each instruction
is assigned to exactly one basic block.
• Method: First, determine the leaders, that is, the first instructions of basic blocks.
The rules for finding leaders are:
1. The first three-address instruction in the intermediate code is a leader.
2. Any instruction that is the target of a conditional or unconditional jump is a leader.
3. Any instruction that immediately follows a conditional or unconditional jump is a
leader.
• Then, for each leader, its basic block consists of itself and all instructions up
to but not including the next leader or the end of the intermediate program.
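The partitioning algorithm above can be sketched in Python as follows. The instruction notation is an assumption made for this example: instructions are plain strings, and a jump is written "goto N" or "if ... goto N", where N is the 1-based position of the target instruction. The names find_leaders and basic_blocks and the sample program are hypothetical.

```python
import re

# A jump ends with "goto N", where N is a 1-based instruction position.
GOTO = re.compile(r"\bgoto\s+(\d+)\s*$")


def find_leaders(instructions):
    """Return the sorted 1-based positions of all leaders."""
    leaders = set()
    if instructions:
        leaders.add(1)                              # first instruction
    for position, instr in enumerate(instructions, start=1):
        match = GOTO.search(instr)
        if match:
            target = int(match.group(1))
            if 1 <= target <= len(instructions):
                leaders.add(target)                 # target of a jump
            if position < len(instructions):
                leaders.add(position + 1)           # instruction after a jump
    return sorted(leaders)


def basic_blocks(instructions):
    """Split the instructions so that each block runs from one leader to the next."""
    leaders = find_leaders(instructions)
    bounds = leaders + [len(instructions) + 1]
    return [instructions[start - 1:end - 1]
            for start, end in zip(bounds, bounds[1:])]


program = [
    "i = 1",                # 1
    "t1 = i * i",           # 2
    "s = s + t1",           # 3
    "i = i + 1",            # 4
    "if i <= 10 goto 2",    # 5
    "r = s",                # 6
]
leaders = find_leaders(program)
blocks = basic_blocks(program)
```

For the sample program the leaders are instructions 1, 2, and 6, which gives three basic blocks: the initialization, the loop body, and the final assignment. Every instruction lands in exactly one block, as the algorithm's output condition requires.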