Code Generation
[email protected] 19/07/2023
Logical Structure of a Compiler
Issues in the Design of a Code Generator
● Instruction selection, register allocation and assignment, and instruction
ordering are encountered in the design of almost all code generators.
● The most important criterion for a code generator is that it produces correct code
● Given the premium on correctness, designing a code generator so it can be
easily implemented, tested, and maintained is an important design goal
Input to the Code Generator
● The input to the code generator is the intermediate representation (IR) of the
source program
● All syntactic and static semantic errors should be detected before code
generation begins
● We assume that type checking has taken place and that type-conversion operators
have been inserted wherever necessary.
The Target Program
● The instruction-set architecture of the target machine has a significant impact
on the difficulty of constructing a good code generator
● Common target-machine architectures are:
○ A RISC (reduced instruction set computer) machine typically has many
registers, three-address instructions, simple addressing modes, and a
relatively simple instruction-set architecture.
○ A CISC (complex instruction set computer) machine typically has few
registers, two-address instructions, a variety of addressing modes,
several register classes, variable-length instructions, and instructions with
side effects.
The Target Program
● Stack-based machine - operations are done by pushing operands onto a
stack and then performing the operations on the operands at the top of the
stack.
● Fell out of favor over time because the stack organization was too limiting
and required too many swap and copy operations
● Revived with the introduction of the Java Virtual Machine (JVM).
○ The JVM is a software interpreter for Java bytecodes, an intermediate
language produced by Java compilers.
Instruction Selection
● The code generator must map the IR program into a code sequence that can
be executed by the target machine.
● Complexity of this mapping is determined by factors such as
○ the level of the IR
○ the nature of the instruction-set architecture
○ the desired quality of the generated code.
Instruction Selection
● For example, every three-address statement of the form x = y + z, where x, y, and
z are statically allocated, can be translated into the code sequence
○ LD R0, y // R0 = y (load y into register R0)
○ ADD R0, R0, z // R0 = R0 + z (add z to R0)
○ ST x, R0 // x = R0 (store R0 into x)
● This strategy often produces redundant loads and stores.
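The statement-by-statement strategy above can be sketched in a few lines of Python. This is only an illustration: the function name translate_add is hypothetical, it handles only statements of the exact form x = y + z, and it always uses register R0, as in the slide.

```python
def translate_add(stmt: str) -> list[str]:
    """Translate one statement 'x = y + z' into LD/ADD/ST (naive strategy)."""
    # Split 'x = y + z' into the target x and the operands y and z.
    target, expr = (part.strip() for part in stmt.split("="))
    left, right = (operand.strip() for operand in expr.split("+"))
    return [
        f"LD R0, {left}",        # R0 = y
        f"ADD R0, R0, {right}",  # R0 = R0 + z
        f"ST {target}, R0",      # x = R0
    ]


for line in translate_add("x = y + z"):
    print(line)
```

Because each statement is translated in isolation, the sketch has no way of knowing that a value is already in R0, which is where the redundant loads and stores come from.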
Instruction Selection
● This strategy often produces redundant loads and stores.
● For example, the sequence of three-address statements
○ a=b+c
○ d=a+e
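● would be translated into the following code, in which the fourth instruction
(LD R0, a) is redundant because it reloads a value that was just stored; the
third is also redundant if a is not used later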
○ LD R0, b // R0 = b
○ ADD R0, R0, c // R0 = R0 + c
○ ST a, R0 // a = R0
○ LD R0, a // R0 = a
○ ADD R0, R0, e // R0 = R0 + e
○ ST d, R0 // d = R0
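One simple way to remove this kind of redundancy is a peephole pass over the generated code. The sketch below is an assumption-laden illustration, not a method given in the slides: the name remove_redundant_loads is hypothetical, instructions are plain strings without comments, and the pass assumes the dropped load is not the target of a jump.

```python
def remove_redundant_loads(code: list[str]) -> list[str]:
    """Drop 'LD r, x' when the previously kept instruction is 'ST x, r'."""
    result: list[str] = []
    for instr in code:
        if result and instr.startswith("LD ") and result[-1].startswith("ST "):
            st_loc, st_reg = [s.strip() for s in result[-1][3:].split(",")]
            ld_reg, ld_loc = [s.strip() for s in instr[3:].split(",")]
            if st_loc == ld_loc and st_reg == ld_reg:
                continue  # the register already holds the value just stored
        result.append(instr)
    return result


naive = [
    "LD R0, b", "ADD R0, R0, c", "ST a, R0",
    "LD R0, a", "ADD R0, R0, e", "ST d, R0",
]
for line in remove_redundant_loads(naive):
    print(line)
```

The store to a is kept here, because the pass cannot tell on its own whether a is used later.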
Instruction Selection
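● The instruction set of the target machine also affects instruction selection.
For example, the statement a = a + 1 can be implemented by the sequence
○ LD R0, a // R0 = a
○ ADD R0, R0, #1 // R0 = R0 + 1
○ ST a, R0 // a = R0
● If the target machine has an increment instruction (INC), the single
instruction INC a may be more efficient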
Register Allocation
● Register allocation, during which we select the set of variables that
will reside in registers at each point in the program.
● Register assignment, during which we pick the specific register
that a variable will reside in.
● Finding an optimal assignment of registers to variables is difficult,
even with single-register machines.
● Mathematically, the problem is NP-complete.
Register Allocation
● Three-address code
○ t = a + b
○ t = t * c
○ t = t / d
● Machine code
○ L R0,a
○ A R0,b
○ M R0,c
○ D R0,d
○ ST R1,t
Target Machine
● Need to know details about the target machine architecture
● Helps to generate better code
● We will use a basic assembly language for the target machine
Simple Target Machine Model
● The target computer models a three-address machine with
○ Load and Store operation
○ Computation operations,
○ Jump operations, and
○ Conditional jumps
● Assume the machine is byte-addressable with n general-purpose registers (R0,
R1, ..., Rn-1)
Simple Target Machine Model
● Assume the following instructions are available
● Load operations:
○ Instruction LD dst, addr loads the value in location addr into location dst.
■ denotes the assignment dst = addr.
○ Common form of this instruction is LD r, x which loads the value in
location x into register r.
○ Instruction of the form LD r1, r2 is a register-to-register copy in which the
contents of register r2 are copied into register r1 .
Simple Target Machine Model
● Store operations:
○ The instruction ST x, r stores the value in register r into the location x.
○ Denotes the assignment x = r.
● Computation operations:
○ OP dst, src1, src2, where OP is an operator like ADD or SUB, and dst, src1,
and src2 are locations, not necessarily distinct.
○ For example, SUB r1, r2, r3 computes r1 = r2 - r3.
■ Any value formerly stored in r1 is lost,
■ but if r1 is r2 or r3, the old value is read first.
Simple Target Machine Model
● Computation operations
○ Example:
■ Three-address statement x = y - z
■ Assembly Code:
● LD R1, y // R1 = y
● LD R2, z // R2 = z
● SUB R1, R1, R2 // R1 = R1 - R2
● ST x, R1 // x = R1
Simple Target Machine Model
● Unconditional jumps:
○ Instruction BR L causes control to branch to the machine instruction with
label L. (BR stands for branch.)
● Conditional jumps
○ Instruction Bcond r, L, where r is a register, L is a label, and cond stands
for any of the common tests on values in the register r.
○ For example, BLTZ r, L causes a jump to label L if the value in register r
is less than zero, and allows control to pass to the next machine
instruction if not.
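To make the machine model concrete, here is a minimal interpreter sketch for the instructions described so far (LD, ST, ADD, SUB, BR, BLTZ). Several details are assumptions made for illustration and are not part of the slides: instructions are tuples, memory is a dictionary keyed by variable name, labels map to instruction indices, LD always writes to a register, and the name run is hypothetical.

```python
def run(program, memory, labels=None, num_regs=4):
    """Execute a list of instruction tuples and return the final memory."""
    labels = labels or {}
    regs = {f"R{i}": 0 for i in range(num_regs)}

    def value(operand):
        if operand in regs:              # register operand
            return regs[operand]
        if operand.startswith("#"):      # immediate constant such as #1
            return int(operand[1:])
        return memory[operand]           # named memory location

    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "LD":                   # LD r, x   : r = x
            regs[args[0]] = value(args[1])
        elif op == "ST":                 # ST x, r   : x = r
            memory[args[0]] = regs[args[1]]
        elif op in ("ADD", "SUB"):       # OP dst, src1, src2
            a, b = value(args[1]), value(args[2])  # read sources before writing dst
            regs[args[0]] = a + b if op == "ADD" else a - b
        elif op == "BR":                 # BR L      : unconditional jump
            pc = labels[args[0]]
        elif op == "BLTZ":               # BLTZ r, L : jump if r < 0
            if regs[args[0]] < 0:
                pc = labels[args[1]]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# x = y - z, using the code sequence from the earlier slide
program = [
    ("LD", "R1", "y"),
    ("LD", "R2", "z"),
    ("SUB", "R1", "R1", "R2"),
    ("ST", "x", "R1"),
]
print(run(program, {"y": 7, "z": 3})["x"])
```

Reading both sources before writing the destination is what makes an instruction such as SUB R1, R1, R2 behave as described: the old value of R1 is read first.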
Target Machine Model - Addressing Modes
● Self Study
● Link: https://www.geeksforgeeks.org/addressing-modes/
Program and Instruction Costs
● We often associate a cost with compiling and running a program
● Some common cost measures are
○ the length of compilation time, and
○ the size,
○ running time, and
○ power consumption of the target program.
● Determining the actual cost of compiling and running a program is a complex problem
Program and Instruction Costs
● The instruction LD R0, R1 copies the contents of register R1 into register R0.
○ This instruction has a cost of one because no additional memory words
are required.
● The instruction LD R0, M loads the contents of memory location M into
register R0.
○ The cost is two since the address of memory location M is in the word
following the instruction.
● The instruction LD R1, *100(R2) loads into register R1 the value given by
contents(contents(100 + contents(R2))).
○ The cost is two because the constant 100 is stored in the word following
the instruction.
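The three examples above are consistent with a simple rule: an instruction costs one, plus one for each operand whose addressing mode involves a memory location or a constant, while register operands add nothing. The sketch below applies that rule. The rule as coded is inferred from the examples, the function names are hypothetical, and any operand that does not look like R<number> (optionally prefixed by *) is treated as needing an extra word.

```python
import re

# Assumption: a plain register (R0) or an indirect register (*R0) adds no cost.
REGISTER = re.compile(r"^\*?R\d+$")


def operand_cost(operand: str) -> int:
    """Return 0 for register operands, 1 for memory or constant operands."""
    return 0 if REGISTER.match(operand) else 1


def instruction_cost(instr: str) -> int:
    """Cost = 1 for the instruction word + the cost of each operand."""
    _, operands = instr.split(maxsplit=1)
    return 1 + sum(operand_cost(o.strip()) for o in operands.split(","))


for instr in ["LD R0, R1", "LD R0, M", "LD R1, *100(R2)"]:
    print(instr, "->", instruction_cost(instr))
```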
Next Use Information
● Next-use information records when the value of a variable will be used next
● If a variable held in a register won't be used soon, the register can be
assigned to another variable
Algorithm - Next Use Information and Liveness
● INPUT: A basic block B of three-address statements. We assume that the symbol table
initially shows all nontemporary variables in B as being live on exit.
● OUTPUT: At each statement i: x = y + z in B , we attach to i the liveness and next-use
information of x, y, and z .
● METHOD: We start at the last statement in B and scan backwards to the beginning of B . At
each statement i: x = y + z in B , we do the following:
○ Attach to statement i the information currently found in the symbol table regarding the
next use and liveness of x, y, and z .
○ In the symbol table, set x to "not live" and "no next use."
○ In the symbol table, set y and z to "live" and the next uses of y and z to i.
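The backward scan can be sketched as follows. The representation is an assumption made for illustration: each statement x = y op z is a tuple (x, y, z), statement numbers are list indices, the symbol table maps a name to a (live, next_use) pair, and the name next_use_info is hypothetical.

```python
def next_use_info(block, live_on_exit):
    """Annotate each statement (x, y, z) with liveness and next-use info."""
    # Symbol table: name -> (is_live, next_use). Only the variables assumed
    # live on exit (the nontemporaries) start out live.
    table = {}
    for stmt in block:
        for name in stmt:
            table[name] = (name in live_on_exit, None)

    annotations = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):      # scan backwards
        x, y, z = block[i]
        # Step 1: attach the current symbol-table information to statement i.
        annotations[i] = {name: table[name] for name in (x, y, z)}
        # Step 2: x is overwritten here, so it is not live just before i.
        table[x] = (False, None)
        # Step 3: y and z are used here, so they are live with next use i.
        table[y] = (True, i)
        table[z] = (True, i)
    return annotations


block = [
    ("t", "a", "b"),   # 0: t = a + b
    ("u", "t", "c"),   # 1: u = t + c
    ("a", "u", "t"),   # 2: a = u + t
]
for i, info in enumerate(next_use_info(block, live_on_exit={"a", "b", "c"})):
    print(i, info)
```

Steps 2 and 3 must be done in that order: in a statement such as x = x + y, the use of x on the right-hand side has to win over the definition on the left.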
Runtime Organization
● Compiler creates and manages a run-time environment in which it assumes
its target programs are being executed.
● This environment deals with
○ layout and allocation of storage locations for the objects named in the
source program
○ mechanisms used by the target program to access variables
○ linkages between procedures
○ mechanisms for passing parameters
○ interfaces to the operating system, input/output devices, and other
programs.
Storage Organization
● Regions of run-time memory shown in the figure: Code, Static, Heap, Free Memory
● Static: storage allocation that can be made by the compiler looking only
at the text of the program, not at what the program does when it executes.
● Heap storage: data that may outlive the call to the procedure that created it
is usually allocated on a "heap" of reusable storage. This is one type of
dynamic allocation.
General structure of an activation record
● Returned values
● Control link
● Access link
● Local data
● Temporaries
Activation Records
● Temporary values, such as those arising from the evaluation of expressions,
when they cannot be held in registers
● Local data belonging to the procedure whose activation record this is.
● A saved machine status, with information about the state of the machine just
before the call to the procedure.
○ Includes the return address and the contents of registers used by the calling
procedure
● An “access link" may be needed to locate data needed by the called
procedure but found elsewhere, e.g., in another activation record.
● A control link, pointing to the activation record of the caller.
● Space for returned values of the called function
● Actual parameters used by the calling procedure
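The fields listed above can be pictured as a record type. The sketch below is only an illustration of the structure: the class, field, and function names are hypothetical, real activation records are laid out in memory by the compiler rather than built as objects, and only the control link is exercised here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_parameters: list
    control_link: Optional["ActivationRecord"] = None  # caller's record
    access_link: Optional["ActivationRecord"] = None   # record holding nonlocal data
    saved_machine_status: dict = field(default_factory=dict)  # e.g. return address
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)
    returned_value: Any = None


def call(caller, procedure, args, return_address):
    """Create the callee's record, linked back to the caller's record."""
    return ActivationRecord(
        procedure,
        list(args),
        control_link=caller,
        saved_machine_status={"return_address": return_address},
    )


def call_chain(record):
    """Follow control links from the current record back to the first caller."""
    names = []
    while record is not None:
        names.append(record.procedure)
        record = record.control_link
    return names


main = call(None, "main", [], return_address=0)
f = call(main, "f", [1], return_address=10)
g = call(f, "g", [2], return_address=20)
print(call_chain(g))
```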