Chapter 7
Code Generation
Outline
– Position of code generator
– Code Generation
– Issue in the Design of a Code Generator
– Input to the Code Generator
– The Target Program
– Instruction Selection
– Register Allocation
– Choice of Evaluation Order
– A Simple Target Machine Model
– Program and Instruction Costs
Position of a Code Generator
[Figure: Source program → Front end → IC → Code optimizer → IC → Code generator → Target program. The front end reports lexical, syntax, and semantic errors. A symbol table is also shown.]
Code Generation
• The final phase in our compiler model is the code generator.
• It takes as input the intermediate representation (IR)
produced by the front end of the compiler, along with
relevant symbol table information, and produces as output a
semantically equivalent target program.
Input to the Code Generator
• The input to the code generator is
– the intermediate representation of the source program produced
by the frontend along with
– information in the symbol table that is used to determine the run-
time address of the data objects denoted by the names in the IR.
• Choices for the IR
– Three-address representations: quadruples, triples, indirect triples
– Virtual machine representations: such as byte codes and stack-
machine code
– Linear representations: such as postfix notation
– Graphical representations: such as syntax trees and DAGs
• Assumptions
– The front end has scanned, parsed, and translated the source program into a relatively low-level IR.
– All syntactic and static semantic errors have already been detected.
The Target Program
• The most common target-machine architectures
are RISC, CISC, and stack based.
The Target Program…
• Producing the target program as
– Absolute machine code (executable code)
– Relocatable machine code (Object files for linker and
loader)
– Assembly language (assembler)
– Byte code forms for interpreters (e.g. JVM)
• In this chapter
– Use very simple RISC-like computer as the target
machine.
– Add some CISC-like addressing modes
– Use assembly code as the target language.
Instruction Selection
• The code generator must map the IR program
into a code sequence that can be executed by
the target machine.
• The complexity of the mapping is determined by
factors such as:
Instruction Selection…
• If the IR is high level, use code templates to translate each IR statement into a sequence of machine instructions.
– This produces poor code that needs further optimization.
Instruction Selection…
• The nature of the instruction set of the target machine has a strong
effect on the difficulty of instruction selection. For example,
– The uniformity and completeness of the instruction set are important
factors.
– Instruction speeds are another important factor.
• If we do not care about the efficiency of the target program, instruction
selection is straightforward.
x = y + z        LD  R0, y
                 ADD R0, R0, z
                 ST  x, R0

a = b + c        LD  R0, b
                 ADD R0, R0, c
                 ST  a, R0
d = a + e        LD  R0, a         // redundant: R0 already holds a
                 ADD R0, R0, e
                 ST  d, R0
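The statement-by-statement translation above can be sketched as a small template expander. This is only an illustration: the function name naive_codegen and the tuple layout (dst, left, op, right) are assumptions made for this sketch, not part of the slides.

```python
def naive_codegen(statements):
    """Expand each three-address statement with a fixed LD / OP / ST template."""
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    code = []
    for dst, left, op, right in statements:
        code.append(f"LD R0, {left}")                    # load the first operand
        code.append(f"{opcodes[op]} R0, R0, {right}")    # combine with the second
        code.append(f"ST {dst}, R0")                     # store the result
    return code


# a = b + c ; d = a + e  (hypothetical input encoding)
program = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
generated = naive_codegen(program)
for line in generated:
    print(line)
```

Because every statement is translated in isolation, the fourth emitted instruction reloads a value that the third one has just stored, which is the redundancy marked in the listing.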
Instruction Selection…
• Example: consider the following statement:
x := x + 1
• Use ADD instruction (straightforward)
– Costly
• Use INC instruction
– Less costly
Instruction Selection…
• Suppose we translate three-address code:
x := y + z       LD  R0, y
                 ADD R0, z
                 ST  x, R0
x := x + 1       LD  R0, x
                 ADD R0, #1
                 ST  x, R0         (cost = 6)
Register Allocation
• Efficient and careful management of registers results in a faster
program.
• A key problem in code generation is deciding what values to hold in
what registers.
– Use of registers involves two subproblems:
• Register allocation: select the variables that will reside in registers.
• Register assignment: pick the register that a variable will reside in.
• Finding an optimal assignment of registers to variables is difficult: the problem is NP-complete.
• In addition, the hardware/OS may require some register usage rules to be followed.
Register Allocation…
Three-address code:
t := a * b
t := t + a
t := t / d

Translation using one register:
LD  R1, a
MUL R1, b
ADD R1, a
DIV R1, d
ST  t, R1

Translation using two registers:
LD  R0, a
LD  R1, R0
MUL R1, b
ADD R1, R0
DIV R1, d
ST  t, R1
Choice of Evaluation Order
• The order in which computations are performed can
affect the efficiency of the target code.
Choice of Evaluation Order…
• When instructions are independent, their evaluation order
can be changed.
Expression: a + b – (c + d) * e

Original order:
t1 := a + b      LD  R0, a
                 ADD R0, b
                 ST  t1, R0
t2 := c + d      LD  R1, c
                 ADD R1, d
t3 := e * t2     LD  R0, e
                 MUL R0, R1
t4 := t1 – t3    LD  R1, t1
                 SUB R1, R0
                 ST  t4, R1

Reordered:
t2 := c + d      LD  R0, c
                 ADD R0, d
t3 := e * t2     LD  R1, e
                 MUL R1, R0
t1 := a + b      LD  R0, a
                 ADD R0, b
t4 := t1 – t3    SUB R0, R1
                 ST  t4, R0
A Simple Target Machine Model
• Implementing code generation requires complete
understanding of the target machine architecture and its
instruction set.
Store operations
• The instruction ST x, r stores the value in register r into the location x.
Computation operations
• Has the form OP dst, src1, src2, where OP is an operator like ADD or SUB, and dst, src1, and src2 are locations, not necessarily distinct.
Conditional Jumps
• Has the form Bcond r, L, where r is a register, L is a label, and cond is any of the common tests on values in the register r.
• For example, BLTZ r, L causes a jump to label L if the value in register r is less than zero, and allows control to pass to the next machine instruction if not.
The Target Machine: Addressing
Modes
• We assume that our target machine has a variety of
addressing modes:
– In instructions, a location can be a variable name x referring to
the memory location that is reserved for x.
– Indexed address, a(r), where a is a variable and r is a register.
The Target Machine: Addressing
Modes
• Op-codes (op), for example:
LD and ST (move content of source to destination)
ADD (add content of source to destination)
SUB (subtract content of source from destination)
• Address modes:
Mode                Form     Address                          Added Cost
Absolute            M        M                                1
Register            R        R                                0
Indexed             a(R)     a + contents(R)                  1
Indirect Register   *R       contents(R)                      0
Indirect Indexed    *a(R)    contents(a + contents(R))        1
Literal             #c       c                                1
A Simple Target Language (Assembly Language)
• Example:
x = y – z        LD  R1, y         // R1 = y
                 LD  R2, z         // R2 = z
                 SUB R1, R1, R2    // R1 = R1 - R2
                 ST  x, R1         // x = R1
b = a[i]         LD  R1, i         // R1 = i
                 MUL R1, R1, 8     // R1 = R1 * 8
                 LD  R2, a(R1)     // R2 = contents(a + contents(R1))
                 ST  b, R2         // b = R2
a[j] = c         LD  R1, c         // R1 = c
                 LD  R2, j         // R2 = j
                 MUL R2, R2, 8     // R2 = R2 * 8
                 ST  a(R2), R1     // contents(a + contents(R2)) = R1
A Simple Target Language (Assembly Language)…
x = *p           LD  R1, p         // R1 = p
                 LD  R2, 0(R1)     // R2 = contents(0 + contents(R1))
                 ST  x, R2         // x = R2
*p = y           LD  R1, p         // R1 = p
                 LD  R2, y         // R2 = y
                 ST  0(R1), R2     // contents(0 + contents(R1)) = R2
Program and Instruction Costs
• Cost is associated with compiling and running a program.
• Cost measures include:
– The length of compilation time
– The size, running time, and power consumption of the target program
• In this model, the cost of an instruction is one plus the added costs of its operands' addressing modes:
– modes involving only registers have zero additional cost,
– modes involving a memory location or a constant have an additional cost of one.
Examples
Instruction            Operation                                            Cost
LD R1, R0              Load contents(R0) into register R1                   1
ST M, R0               Store contents(R0) into memory location M            2
LD R0, M               Load contents(M) into register R0                    2
ST M, 4(R0)            Store contents(4 + contents(R0)) into M              3
ST M, *4(R0)           Store contents(contents(4 + contents(R0))) into M    3
LD R0, #1              Load 1 into R0                                       2
ADD *12(R1), 4(R0)     Add contents(4 + contents(R0)) to the value at       3
                       location contents(12 + contents(R1))
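The costs in the table can be reproduced with a few lines of code. The sketch below assumes that an instruction costs one plus the added cost of each operand's addressing mode, and that register names look like R0, R1, ... while memory names never do. The function names are hypothetical.

```python
import re

# R0 or *R0: the register and indirect-register modes, both with added cost 0.
REGISTER_MODE = re.compile(r"^\*?R\d+$")


def operand_cost(operand):
    """Added cost of one operand: 0 for register modes, 1 for all others."""
    return 0 if REGISTER_MODE.match(operand.strip()) else 1


def instruction_cost(instruction):
    """One for the instruction itself plus the added cost of each operand."""
    _opcode, _, rest = instruction.strip().partition(" ")
    operands = [o for o in rest.split(",") if o.strip()]
    return 1 + sum(operand_cost(o) for o in operands)


def program_cost(instructions):
    """Total cost of a sequence of instructions."""
    return sum(instruction_cost(i) for i in instructions)


for text in ["LD R1, R0", "ST M, R0", "ST M, *4(R0)", "ADD *12(R1), 4(R0)"]:
    print(text, instruction_cost(text))
```

Applied to the three-instruction translation of x := x + 1 (LD R0, x / ADD R0, #1 / ST x, R0), program_cost gives 6, matching the figure quoted in the instruction-selection example.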
Exercises
• Exercise 8.2.1
• Exercise 8.2.2
• Exercise 8.2.3
• Exercise 8.2.4
• Exercise 8.2.5
• Exercise 8.2.6
Basic Blocks and Flow Graphs
• Introduce a graph representation of intermediate
code that is helpful for discussing code generation;
• Useful for:
– Register allocation and Instruction selection
– Local and global optimization
Source code:
begin
    prod := 0;
    i := 1;
    do begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 20
end

Three-address code:
(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)
Example: Finding Leaders
The following code computes the inner product of two vectors.
Example: Forming the Basic Blocks
Basic Blocks:
B1   (1) prod := 0
     (2) i := 1
B2   (3) t1 := 4 * i
     (4) t2 := a[t1]
     (5) t3 := 4 * i
     (6) t4 := b[t3]
     (7) t5 := t2 * t4
     (8) t6 := prod + t5
     (9) prod := t6
     (10) t7 := i + 1
     (11) i := t7
     (12) if i <= 20 goto (3)
B3   (13) …
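The partition above can be computed mechanically. The sketch below assumes the standard leader rules (the first instruction, every jump target, and every instruction that immediately follows a jump are leaders) and assumes jumps are written as "goto (n)". The function names are hypothetical.

```python
import re


def find_leaders(instructions):
    """Return the sorted 1-based numbers of the leader instructions."""
    leaders = {1}                                     # the first instruction
    for number, text in enumerate(instructions, start=1):
        match = re.search(r"goto \((\d+)\)", text)
        if match:
            leaders.add(int(match.group(1)))          # the target of a jump
            if number < len(instructions):
                leaders.add(number + 1)               # the instruction after a jump
    return sorted(leaders)


def basic_blocks(instructions):
    """Each block runs from a leader up to, but not including, the next leader."""
    leaders = find_leaders(instructions)
    ends = leaders[1:] + [len(instructions) + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


inner_product = [
    "prod := 0", "i := 1", "t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i",
    "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
    "t7 := i + 1", "i := t7", "if i <= 20 goto (3)",
]
print(find_leaders(inner_product))
print(basic_blocks(inner_product))
```

For the twelve instructions of the inner-product code this yields two blocks, instructions (1) to (2) and (3) to (12). Block B3 in the listing begins at instruction (13), which lies outside the fragment.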
Example (More)
(1) i := m – 1 (16) t7 := 4 * i
(2) j := n (17) t8 := 4 * j
(3) t1 := 4 * n (18) t9 := a[t8]
(4) v := a[t1] (19) a[t7] := t9
(5) i := i + 1 (20) t10 := 4 * j
(6) t2 := 4 * i (21) a[t10] := x
(7) t3 := a[t2] (22) goto (5)
(8) if t3 < v goto (5) (23) t11 := 4 * i
(9) j := j - 1 (24) x := a[t11]
(10) t4 := 4 * j (25) t12 := 4 * i
(11) t5 := a[t4] (26) t13 := 4 * n
(12) If t5 > v goto (9) (27) t14 := a[t13]
(13) if i >= j goto (23) (28) a[t12] := t14
(14) t6 := 4*i (29) t15 := 4 * n
(15) x := a[t6] (30) a[t15] := x
Example: Leaders
For the three-address code of the previous example, the leaders are instructions (1), (5), (9), (13), (14), and (23):
– (1) is the first instruction.
– (5), (9), and (23) are targets of jumps.
– (9), (13), (14), and (23) immediately follow a jump.
Example: Basic Blocks
Each leader starts a block that extends up to the next leader:
B1: (1) – (4)
B2: (5) – (8)
B3: (9) – (12)
B4: (13)
B5: (14) – (22)
B6: (23) – (30)
Control Flow Graph (CFG)
Example: Control Flow Graph Formation
B1   (1) prod := 0
     (2) i := 1
B2   (3) t1 := 4 * i
     (4) t2 := a[t1]
     (5) t3 := 4 * i
     (6) t4 := b[t3]
     (7) t5 := t2 * t4
     (8) t6 := prod + t5
     (9) prod := t6
     (10) t7 := i + 1
     (11) i := t7
     (12) if i <= 20 goto (3)
B3   (13) …

Edges of the flow graph:
B1 → B2   Rule (2)
B2 → B2   Rule (1)
B2 → B3   Rule (2)
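The edges can be derived from the last instruction of each block. The sketch below assumes two kinds of edges: a jump edge from a block to the block that starts at its jump target, and a fall-through edge to the next block unless the block ends in an unconditional goto. Reading the slide labels, these appear to correspond to Rule (1) and Rule (2) respectively. The block encoding and function name are hypothetical.

```python
import re


def flow_edges(blocks):
    """blocks: list of (number of first instruction, [instruction text, ...])."""
    start_of = {first: index for index, (first, _) in enumerate(blocks)}
    edges = []
    for index, (_, body) in enumerate(blocks):
        last = body[-1].strip()
        jump = re.search(r"goto \((\d+)\)", last)
        if jump:
            edges.append((index, start_of[int(jump.group(1))]))   # jump edge
        ends_in_unconditional_jump = last.startswith("goto")
        if not ends_in_unconditional_jump and index + 1 < len(blocks):
            edges.append((index, index + 1))                      # fall-through edge
    return edges


blocks = [
    (1, ["prod := 0", "i := 1"]),
    (3, ["t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i", "t4 := b[t3]",
         "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
         "t7 := i + 1", "i := t7", "if i <= 20 goto (3)"]),
    (13, ["…"]),
]
edges = flow_edges(blocks)
print([(f"B{a + 1}", f"B{b + 1}") for a, b in edges])
```

Blocks are indexed from zero internally, so the pair (1, 1) is the loop edge from B2 back to itself.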
Example: CFG (More)
For the same three-address code, with blocks B1 to B6 starting at instructions (1), (5), (9), (13), (14), and (23), the flow graph has these edges:
B1 → B2
B2 → B2, B2 → B3
B3 → B3, B3 → B4
B4 → B5, B4 → B6
B5 → B2
A Simple Code Generator
• One of the primary issues during code generation:
deciding how to use registers to best advantage
• Four principal uses of registers:
– In most machine architectures, some or all of the operands
of an operation must be in registers in order to perform the
operation.
– Registers make good temporaries to hold the result of a
sub expression or a variable that is used only within a
single basic block.
– Registers are used to hold (global) values that are
computed in one basic block and used in other blocks.
– Registers are often used to help with run-time storage
management (e.g., stack-pointer).
A Simple Code Generator
• Assumption of the code-generation algorithm in this
section:
– Some set of registers is available to hold the values that are
used within the block.
– The basic block has already been transformed into a
preferred sequence of three-address instructions
– For each operator, there is exactly one machine instruction that takes the necessary operands in registers and performs that operation, leaving the result in a register.
– The machine instructions have the forms:
• LD reg, mem
• ST mem, reg
• OP reg, reg, reg
– Computed results are left in registers as long as possible, until:
a) their register is needed for another computation, or
b) just before a procedure call, jump, or labeled statement.
Register and Address Descriptors
• Descriptors are used by the code-generation algorithm to keep track of register contents and of the addresses for names.
• Descriptors are needed to decide when to load and store variables.
• Register descriptor
– One for each available register
– Keeps track of the variable names whose current value is in that register; it is consulted when a new register is needed.
– Initially, all register descriptors are empty.
– Example: after MOV R0, a the descriptor records "R0 contains a".
• Address descriptor
– One for each program variable
– Keeps track of the location(s) where the current value of that variable can be found (register, memory address, stack location)
– Stored in the symbol-table entry for that variable name.
– Example: after MOV R0, a followed by MOV R1, R0 the descriptor records "a in R0 and R1".
The Code-Generation Algorithm
• There are basically three parts to (this simple algorithm
for) code generation.
• Choosing registers
• Generating instructions
• Managing descriptors
The Code-Generation Algorithm
• Ending the Basic Block
• We need to ensure that all variables needed by (dynamically)
subsequent blocks (i.e., those live-on-exit) have their current
values in their memory locations.
• Managing Register and Address Descriptors
1. For the instruction LD R, x
(a) Change the register descriptor for R so that it holds only x.
(b) Change the address descriptor for x by adding register R as an additional location of x.
2. For the instruction ST x, R
(a) Change the address descriptor for x to include its own location.
3. For an operation such as ADD Rx, Ry, Rz for x = y + z
(a) Change the register descriptor for Rx so that it holds only x.
(b) Change the address descriptor for x so that its only location is Rx.
– Note that the memory location for x is not now in the address descriptor for x.
(c) Remove Rx from the address descriptor of any variable other than x.
The Code-Generation Algorithm
• Managing Register and Address Descriptors
• For R a register, let Desc(R) be its register descriptor. For x a
program variable, let Desc(x) be its address descriptor.
1. Load: LD R, x
– Desc(R) = x (removing everything else from Desc(R))
– Add R to Desc(x) (leaving alone everything else in Desc(x))
– Remove R from Desc(w) for all w ≠ x
2. Store: ST x, R
– Add the memory location of x to Desc(x)
3. Operation: OP Rx, Ry, Rz implementing the quad OP x, y, z
– Desc(Rx) = x
– Desc(x) = Rx (Now Desc(x) does not contain x's memory location!)
– Remove Rx from Desc(w) for all w ≠ x
4. Copy: For x = y after processing the load (if needed)
– Add x to Desc(Ry) (recall that Ry=Rx).
– Desc(x) = Ry.
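The four update rules can be sketched as a small class. The class name Descriptors, the use of a variable's own name to stand for its memory location, and the assumption that variable names never look like register names are all choices made for this illustration. Removing a redefined name from other register descriptors is not stated in the rules above; it is added here so that register descriptors do not keep stale names.

```python
from collections import defaultdict


class Descriptors:
    def __init__(self, variables):
        self.reg = defaultdict(set)                   # Desc(R): names held in R
        self.addr = {v: {v} for v in variables}       # Desc(x): x starts in memory

    def _remove_register_elsewhere(self, r, keep):
        for name, locations in self.addr.items():
            if name != keep:
                locations.discard(r)                  # Remove R from Desc(w), w != keep

    def _forget_old_copies(self, x, keep):
        for r, names in self.reg.items():             # assumption, see lead-in
            if r != keep:
                names.discard(x)

    def load(self, r, x):                             # 1. LD R, x
        self.reg[r] = {x}
        self.addr.setdefault(x, {x}).add(r)
        self._remove_register_elsewhere(r, keep=x)

    def store(self, x, r):                            # 2. ST x, R
        self.addr.setdefault(x, set()).add(x)

    def op(self, rx, x):                              # 3. OP Rx, Ry, Rz for x = y OP z
        self._forget_old_copies(x, keep=rx)
        self.reg[rx] = {x}
        self.addr[x] = {rx}
        self._remove_register_elsewhere(rx, keep=x)

    def copy(self, x, ry):                            # 4. x = y with y already in Ry
        self._forget_old_copies(x, keep=ry)
        self.reg[ry].add(x)
        self.addr[x] = {ry}


d = Descriptors(["a", "b"])
d.load("R1", "a")       # LD R1, a
d.load("R2", "b")       # LD R2, b
d.op("R2", "t")         # SUB R2, R1, R2 for t = a - b
print(dict(d.reg), d.addr)
```

After these three instructions R2 holds only t, and b is again found only in memory because its register copy was overwritten.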
Example
• Since we haven't specified getReg() yet, we will assume there
are an unlimited number of registers so we do not need to
generate any spill code (saving the register's value in memory).
• One of getReg()'s jobs is to generate spill code when a register
needs to be used for another purpose and the current value is
not presently in memory.
• Despite having ample registers and thus not generating spill
code, we will not be wasteful of registers.
– When a register holds a temporary value and there are no
subsequent uses of this value, we reuse that register.
– When a register holds the value of a program variable and there
are no subsequent uses of this value, we reuse that register
providing this value is also in the memory location for the variable.
– When a register holds the value of a program variable and all
subsequent uses of this value are preceded by a redefinition, we
could reuse this register. But to know about all subsequent uses
may require live/dead-on-exit knowledge.
t = a - b
u = a - c
v = t + u
a = d
d = v + u
Assume: t, u, v are temporaries; a, b, c, d are variables live on exit.
Design of the Function getReg
• Pick a register Ry for y in x=y+z
1. If y is currently in a register, pick that register.
2. If y is not in a register, but there is an empty register, pick that register.
3. If y is not in a register, and there is no empty register:
• Let R be a candidate register, and suppose v is one of the variables in its register descriptor.
• We need to make sure that v's value either is not needed, or that there is somewhere else we can go to get the value held in R.
(a) OK if the address descriptor for v says that v is somewhere besides R.
(b) OK if v is x, and x is not one of the other operands of the instruction (z in this example).
(c) OK if v is not used later.
(d) Otherwise, generate the store instruction ST v, R to place a copy of v in its own memory location. This operation is called a spill.
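The selection rules can be sketched as follows. The function signature, the dictionary-based descriptors, and the used_later set are assumptions for this illustration. The slide does not say how to choose among several candidates in case 3, so this sketch picks the register that needs the fewest stores, which is one reasonable policy.

```python
def get_reg(y, x, other_operands, registers, reg_desc, addr_desc, used_later):
    """Pick a register for operand y of x = y op z; return (register, spill code)."""
    # Case 1: y is already in a register.
    for r in registers:
        if y in reg_desc.get(r, set()):
            return r, []
    # Case 2: some register is empty.
    for r in registers:
        if not reg_desc.get(r):
            return r, []
    # Case 3: examine each candidate and collect the stores it would require.
    best = None
    for r in registers:
        spills = []
        for v in sorted(reg_desc[r]):
            elsewhere = any(loc != r for loc in addr_desc.get(v, set()))   # (a)
            overwritten = v == x and x not in other_operands               # (b)
            dead = v not in used_later                                     # (c)
            if not (elsewhere or overwritten or dead):
                spills.append(f"ST {v}, {r}")                              # (d)
        if best is None or len(spills) < len(best[1]):
            best = (r, spills)                      # assumed policy: fewest stores
    return best


registers = ["R1", "R2"]
reg_desc = {"R1": {"a"}, "R2": {"t"}}
addr_desc = {"a": {"a", "R1"}, "t": {"R2"}, "b": {"b"}}
choice = get_reg("b", "u", {"c"}, registers, reg_desc, addr_desc, {"a", "t"})
print(choice)
```

Here R1 is chosen without a spill because a is also available in memory, whereas taking R2 would require storing t first.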
Peephole “Optimization”
• The peephole is a small, sliding window on a program.
Goals:
- improve performance
- reduce memory footprint
- reduce code size
Method:
1. Examine short sequences of target instructions.
2. Replace each sequence by a more efficient one.
• redundant-instruction elimination
• flow-of-control optimizations
• algebraic simplifications
• use of machine idioms
Eliminating Redundant Loads and Stores
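A naive translation of a = b + c followed by d = a + e emits ST a, R0 and then LD R0, a, and the load is redundant. The sketch below is one possible peephole pass for this pattern. It assumes instructions are plain strings, that labels are written as "L1: ..." prefixes, and that a labeled instruction must be kept because a jump may reach it with a different register state. The function names are hypothetical.

```python
def parse(instruction):
    """Split 'L1: LD R0, a' into ('LD', ['R0', 'a']), ignoring any label."""
    body = instruction.split(":")[-1].strip()
    opcode, _, rest = body.partition(" ")
    return opcode, [operand.strip() for operand in rest.split(",")]


def remove_redundant_memory_ops(code):
    """Drop an unlabeled LD or ST that repeats the transfer just performed."""
    result = []
    for instruction in code:
        if result and ":" not in instruction:
            prev_op, prev_args = parse(result[-1])
            op, args = parse(instruction)
            if {prev_op, op} == {"LD", "ST"} and prev_args == args[::-1]:
                register, memory = args if op == "LD" else args[::-1]
                # Keep the pair if the register is part of the address, e.g.
                # LD R0, a(R0) followed by ST a(R0), R0 touches two locations.
                if register not in memory:
                    continue
        result.append(instruction)
    return result


code = ["LD R0, b", "ADD R0, R0, c", "ST a, R0",
        "LD R0, a", "ADD R0, R0, e", "ST d, R0"]
cleaned = remove_redundant_memory_ops(code)
print(cleaned)
```

The pass compares each instruction only with the one kept immediately before it, which is the "small, sliding window" idea of a peephole.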
Eliminating Unreachable Code
An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.
Source Code:
debug = 0
...
if (debug) {
    print debugging information
}
Intermediate Code:
debug = 0
...
if debug = 1 goto L1
goto L2
L1: print debugging information
L2:
Eliminate Jump after Jump
One obvious peephole optimization is to eliminate jumps over jumps.
Before:
debug = 0
...
if debug = 1 goto L1
goto L2
L1: print debugging information
L2:
After:
debug = 0
...
if debug ≠ 1 goto L2
print debugging information
L2:
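The jump-over-jump rewrite can be sketched as a three-instruction window. This sketch assumes instructions are strings in the form used on the slide, writes the negated test as != instead of ≠, and keeps the label L1 on the fall-through instruction because other jumps might still target it (the slide drops it). All names are hypothetical.

```python
import re

NEGATED = {"=": "!=", "!=": "=", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}
COND_JUMP = re.compile(r"^if (\w+) (=|!=|<=|>=|<|>) (\w+) goto (\w+)$")
JUMP = re.compile(r"^goto (\w+)$")


def eliminate_jump_over_jump(code):
    """Rewrite 'if c goto L1; goto L2; L1: s' as 'if not c goto L2; L1: s'."""
    result, i = [], 0
    while i < len(code):
        cond = COND_JUMP.match(code[i])
        jump = JUMP.match(code[i + 1]) if i + 1 < len(code) else None
        labeled = code[i + 2] if i + 2 < len(code) else ""
        if cond and jump and labeled.startswith(cond.group(4) + ":"):
            left, op, right, _target = cond.groups()
            result.append(f"if {left} {NEGATED[op]} {right} goto {jump.group(1)}")
            result.append(labeled)          # label kept: other jumps may use it
            i += 3
        else:
            result.append(code[i])
            i += 1
    return result


before = ["debug = 0", "if debug = 1 goto L1", "goto L2",
          "L1: print debugging information", "L2:"]
after = eliminate_jump_over_jump(before)
print(after)
```

One conditional jump replaces the pair of jumps, so the common path executes one instruction fewer.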
Constant Propagation
If debug is set to 0 at the beginning of the program, constant propagation would transform the "Before" sequence into the "After" sequence.
Before:
debug = 0
...
if debug ≠ 1 goto L2
print debugging information
L2:
After:
debug = 0
...
if 0 ≠ 1 goto L2
print debugging information
L2:
Deleting Unreachable Code
(dead code elimination)
Now the argument of the first statement always evaluates to true, so the statement can be replaced by goto L2.
Before:
debug = 0
...
if 0 ≠ 1 goto L2
print debugging information
L2:
b:= x + y
b:= x + y
…
…
Transformations on Basic Blocks
• Structure-Preserving Transformations:
Algebraic Simplification and Reduction in Strength
• Algebraic simplification can be used to eliminate three-address statements such as
x = x + 0
x = x * 1
• Reduction-in-strength transformations replace an expensive operation by a cheaper one:
– x² or power(x, 2) → x * x
– 2 * x → x + x
– Fixed-point multiplication or division by a power of two → shift
– Floating-point division by a constant can be approximated as multiplication by a constant
Examples of Transformations
Algebraic transformations:
Before           After
x := x + 0       (eliminated)
x := x * 1       (eliminated)
x := y**2        x := y*y
z := 2*x         z := x + x
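These rewrites are simple pattern matches on individual three-address statements. The sketch below assumes statements are tuples (dst, left, op, right) with operands given as strings; the function name simplify is hypothetical, and only the four patterns from the example are handled.

```python
def simplify(block):
    """Apply algebraic simplification and strength reduction to one basic block."""
    result = []
    for dst, left, op, right in block:
        if op == "+" and right == "0" and dst == left:
            continue                                   # x := x + 0 does nothing
        if op == "*" and right == "1" and dst == left:
            continue                                   # x := x * 1 does nothing
        if op == "**" and right == "2":
            result.append((dst, left, "*", left))      # y ** 2  becomes  y * y
        elif op == "*" and left == "2":
            result.append((dst, right, "+", right))    # 2 * x   becomes  x + x
        else:
            result.append((dst, left, op, right))
    return result


block = [("x", "x", "+", "0"), ("x", "x", "*", "1"),
         ("x", "y", "**", "2"), ("z", "2", "*", "x")]
simplified = simplify(block)
print(simplified)
```

A fuller pass would also cover the commuted forms (0 + x, 1 * x, x * 2); they are omitted to keep the sketch short.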
Examples of Transformations
Common subexpression elimination:
remove redundant computations
Before             After
a := b + c         a := b + c
b := a - d         b := a - d
c := b + c         c := b + c
d := a - d         d := b

Before             After
t1 := b*c          t1 := b * c
t2 := a – t1       t2 := a – t1
t3 := b*c          t4 := t2 + t1
t4 := t2 + t3

Before             After
t1 := b + c        t1 := b + c
t2 := a – t1       t2 := a – t1
t1 := t1 * d       t3 := t1 * d
d := t2 + t1       d := t2 + t3
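One way to find common subexpressions inside a basic block is local value numbering, which is a technique chosen for this sketch and not necessarily the one the slides have in mind. Each distinct value gets a number, and an expression is redundant only if its operands carry the same value numbers as an earlier expression and the variable that received the earlier result still holds it. Statements are assumed to be tuples (dst, left, op, right), and a copy is written as (dst, src, None, None).

```python
import itertools


def local_cse(block):
    """Replace recomputed expressions by copies, using local value numbering."""
    counter = itertools.count()
    value_of = {}        # variable name -> current value number
    computed = {}        # (op, left number, right number) -> (number, holder)

    def number(name):
        if name not in value_of:
            value_of[name] = next(counter)
        return value_of[name]

    result = []
    for dst, left, op, right in block:
        key = (op, number(left), number(right))
        known = computed.get(key)
        if known and value_of.get(known[1]) == known[0]:
            value, holder = known
            result.append((dst, holder, None, None))   # dst := holder
            value_of[dst] = value
        else:
            value = next(counter)
            computed[key] = (value, dst)
            value_of[dst] = value
            result.append((dst, left, op, right))
    return result


block = [("a", "b", "+", "c"), ("b", "a", "-", "d"),
         ("c", "b", "+", "c"), ("d", "a", "-", "d")]
optimized = local_cse(block)
print(optimized)
```

On the first example the third statement is kept, because b was redefined after the first b + c, while the fourth statement becomes the copy d := b.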
Use of Machine Idioms
• The target machine may have hardware instructions to
implement certain specific operations efficiently.
• Using these instructions can reduce execution time
significantly.
• Example:
– some machines have auto-increment and auto-decrement
addressing modes.
– The use of the modes greatly improves the quality of code
when pushing or popping a stack as in parameter passing.
– These modes can also be used in code for statements like x = x + 1, which becomes INC x.