Advanced Compiler Design and Implementation: Run-Time Support

The document discusses various aspects of run-time support for compiled code, including: 1. Data type representations and register usage for different integer sizes and long arithmetic. 2. Organization of the run-time stack, activation records, and how different links are used to support procedure calls and nested procedures. 3. Parameter passing modes and the division of responsibilities between caller and callee procedures in setting up the stack frame and argument passing.


Advanced Compiler Design and Implementation: Run-Time Support
Juhana Helovuo
Data type representations and instruction set support
Register set and register usage
Activation records and run-time stack
Parameter passing modes
Code for subroutine calls
Shared object code
Dynamic typing, heap management, function polymorphism
Data type representations
Fixed-size integers: word, halfword, byte
How to treat integers with size < register size
Example: Add 5 to signed byte @ sp+72
Different sizes of loads, stores and arithmetic (M68k)
addi.b #5,(72,a7) ; add immediate byte
Sign/zero-extend on load instructions (Sparc)
ldsb [%sp+72],%l2 ; load signed byte (and extend)
add %l2, 5, %l2 ; add (32-bit)
stb %l2, [%sp+72] ; store byte
Sign/zero-extend and align with separate instructions
(Alpha)
ldq_u r2, 72(sp) ; load quadword unaligned -> r2
lda r1, 72(sp) ; load address -> r1
extbl r2, r1, r3 ; extract 1 byte from r2 -> r3
mskbl r2, r1, r2 ; mask (clear) byte from r2
addq r3, 5, r3 ; add quadword (64-bit)
insbl r3, r1, r3 ; shift byte back in position
or r2, r3, r2 ; combine result & rest of qword
stq_u r2, (r1) ; write quadword back to memory
The general case seems very complex, but case-specific
optimizations often simplify this (register allocation,
known alignment, the BWX byte/word extension)
Very simple memory unit (Alpha before BWX): only aligned 32- and 64-bit loads & stores, no byte or 16-bit accesses
For integer size > register size: Use two or four registers
Architecture may provide double load for two consecutive
registers (Sparc) or multiple load (ARM)
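The Alpha sequence can be mirrored in C to show why it works: the byte is updated inside a register copy of its aligned quadword, and memory is touched only by whole 64-bit loads and stores. This is a minimal sketch under stated assumptions: extbl, mskbl, insbl and add_to_byte are hypothetical C functions named after the instructions (not compiler intrinsics), memory is modeled as an array of quadwords indexed by byte offset, and byte numbering is little-endian as on Alpha.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the low 3 bits of the byte offset select the byte in a quadword
   (little-endian numbering, as on Alpha). */
static uint64_t extbl(uint64_t q, uint64_t off) {   /* extract byte, zero-extended */
    return (q >> ((off & 7) * 8)) & 0xFF;
}

static uint64_t mskbl(uint64_t q, uint64_t off) {   /* clear that byte */
    return q & ~((uint64_t)0xFF << ((off & 7) * 8));
}

static uint64_t insbl(uint64_t v, uint64_t off) {   /* low byte of v, shifted into place */
    return (v & 0xFF) << ((off & 7) * 8);
}

/* Add k to the byte at byte offset 'off' in mem[0..nwords-1],
   using only aligned 64-bit loads and stores. */
static void add_to_byte(uint64_t *mem, size_t nwords, uint64_t off, int64_t k) {
    assert(off / 8 < nwords);
    uint64_t *qp = &mem[off / 8];   /* ldq_u: aligned quadword holding the byte */
    uint64_t q = *qp;
    uint64_t b = extbl(q, off);     /* extbl */
    q = mskbl(q, off);              /* mskbl */
    b += (uint64_t)k;               /* addq: bits above the byte are dropped by insbl */
    q |= insbl(b, off);             /* insbl, then or */
    *qp = q;                        /* stq_u: write the whole quadword back */
}
```

For example, if mem[9] holds 0xAABBCCDDEEFD1122, then add_to_byte(mem, 16, 74, 5) changes byte 2 of that quadword from 0xFD (-3 as a signed byte) to 0x02 and leaves the other seven bytes alone. Because the sum is truncated to 8 bits, signedness only matters when the byte is later widened.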
Long arithmetic
Use carry flag for addition & subtraction (Sparc)
addcc %i1, %i3, %l0 ; add low words, generate carry
addx %i2, %i4, %l1 ; add high words + carry
Or use unsigned less than-comparison (Alpha)
addq a0, a2, t0 ; add low words
addq a1, a3, t1 ; add high words
cmpult t0, a0, t2 ; generate carry: t2 = (t0<a0 ? 1 : 0)
addq t1, t2, t1 ; add carry to result high word
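The Alpha trick works because unsigned addition wraps modulo 2^64: the low-word sum is smaller than its operand exactly when a carry came out of it. A minimal C sketch of a 128-bit add over two 64-bit words that uses this comparison instead of a carry flag (the u128 type and add128 name are illustrative, not from the slides):

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit unsigned integer held in two 64-bit words ("registers"). */
typedef struct { uint64_t lo, hi; } u128;

static u128 add128(u128 a, u128 b) {
    u128 r;
    r.lo = a.lo + b.lo;             /* addq a0, a2, t0   (wraps mod 2^64) */
    r.hi = a.hi + b.hi;             /* addq a1, a3, t1 */
    uint64_t carry = r.lo < a.lo;   /* cmpult t0, a0, t2 */
    r.hi += carry;                  /* addq t1, t2, t1 */
    return r;
}
```

For instance, adding {lo = UINT64_MAX, hi = 1} and {lo = 1, hi = 2} gives {lo = 0, hi = 4}: the low word wraps to 0, which is less than UINT64_MAX, so a carry of 1 is added to the high word.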
Character strings
C-style strings: Array of characters, end of string marked by
character code 0
Pascal-style strings: Character count (integer) followed by an
array of characters
Instruction set support
x86: store string or move string instructions + repeat prefix,
byte-sized operations
Sparc: byte loads and stores
Alpha: insert, extract, mask, zap, cmpbge
PowerPC: load/store string (and compare)
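The two string layouts differ mainly in how the length is found: a C string must be scanned for its 0 byte, while a Pascal string stores the count up front. This C sketch contrasts them; pstring, make_pstring and the one-byte length field (which caps the length at 255, as in classic Pascal strings) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C-style: scan for the terminating 0 byte, so finding the length is O(n). */
static size_t c_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Pascal-style: a count stored before the characters, so the length is
   O(1) and the characters may include 0 bytes. */
typedef struct {
    unsigned char len;    /* one-byte count: at most 255 characters */
    char chars[255];      /* no terminator needed */
} pstring;

static pstring make_pstring(const char *s) {
    pstring p;
    size_t n = strlen(s);
    assert(n <= 255);
    p.len = (unsigned char)n;
    memcpy(p.chars, s, n);
    return p;
}
```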
Pointers
Usually 32/64-bit words (same as register size)
Naturally aligned: pointer mod sizeof(pointed data) = 0
Array access often requires pointer arithmetic
base pointer + (index * element size)
Element size is often 4 or 8
Special support for address computation
ARM: Data path for second operand contains a shifter unit
Alpha: s4add, s8add, s4sub, s8sub
PowerPC, ARM, Sparc: Indexed addressing mode
lwzx r0,r9,r2 ; r0 := M[r9+r2] (PowerPC)
ld [%i2+%i3], %l1 ; l1 := M[i2+i3] (Sparc)
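Written out in C, the array address computation is base + (index << 2) or base + (index << 3) for 4- and 8-byte elements, which is the pattern the scaled-add instructions above fold into a single operation. The helper names below are hypothetical, and the asserts check the natural-alignment rule stated above.

```c
#include <assert.h>
#include <stdint.h>

/* &a[i] for 4-byte elements: base + (i * 4), i.e. a shift by 2.
   Alpha: s4addq i, base, r */
static uintptr_t elem_addr4(uintptr_t base, uintptr_t i) {
    assert(base % 4 == 0);          /* naturally aligned base */
    return base + (i << 2);
}

/* &a[i] for 8-byte elements: base + (i * 8), i.e. a shift by 3.
   Alpha: s8addq i, base, r */
static uintptr_t elem_addr8(uintptr_t base, uintptr_t i) {
    assert(base % 8 == 0);
    return base + (i << 3);
}
```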
Register Usage
Typical RISC has 32 integer registers
(ARM: 16, Itanium: 128, Sparc: register windows, x86: ~8)
Compiler typically has several uses for registers
stack pointer and frame pointer
global offset table pointer (global pointer)
dynamic link and static link
call arguments and return values
local variables
frequently used global variables
temporary values
Register Usage (continued)
The compiler should maximize the use of the register set in
order to avoid memory accesses
The partitioning of the register set may be partially
determined by
ISA (Instruction Set Architecture = hardware platform) and
ABI (Application Binary Interface = system software)
ISA usually defines or recommends a stack pointer, possibly
also frame pointer and link register
ABI may define argument and return value registers
ABI must be followed to maintain interoperability with other
compilers and libraries
Register partitioning example (Alpha)
v0 = return value, a0..a5 = call arguments, ra=return address
s0..s5 = local/global variables, preserved across calls
t0..t11 = local variables and temporaries, not preserved
pv = call address, gp = global pointer, AT = assembler temp.
Register map:
r0 = v0
r1..r8 = t0..t7
r9..r14 = s0..s5
r15 = fp
r16..r21 = a0..a5
r22..r25 = t8..t11
r26 = ra
r27 = pv
r28 = AT
r29 = gp
r30 = sp
r31 = zero
The Run-Time Stack
The run-time stack is used to store activation records (stack
frames)
Activation records represent procedure activations and they may contain
dynamic link and static link
call arguments and return values
local variables
saved registers (by caller and callee)
procedure call return address
The stack is maintained and accessed through the stack
pointer register, often also by the frame pointer
(Figure: the current frame lies between sp and fp, with the previous frame beyond fp; slots in the current frame are addressed as sp+N or fp-M)
The activation record is used to communicate between the
caller (main program) and callee (subroutine)
These procedures may be compiled separately
The compiler must adhere to a call convention, or a
procedure call protocol
Parts of the activation record are constructed by the caller
and some parts by the callee
Only the caller may know the size of argument list (C)
Only the callee knows the storage required for local
variables
Both have to be able to access arguments, return value and
links (dynamic, static, return address)
Links in Stack Frame
Dynamic Link
Used to find the calling stack frame on return
If the frame size is fixed and known statically, there is no need for
this: just use a constant offset in the code
Static Link
Used to find the most recent activation of the static parent of the
current frame
Required only in languages allowing nested, local
procedures (e.g. Pascal, Ada, not in C)
Return Address
Used to find the code of the caller on procedure exit
RISCs store the return address into a link register on the call (jump-and-link) instruction
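C has no nested procedures, so this sketch models the frames explicitly to show how a static link differs from a dynamic link. The hypothetical Pascal-like program has an outer procedure declaring x and two procedures nested in it, middle and helper; middle calls its sibling helper, which updates outer's x. helper's dynamic link points to middle's frame, but its static link points to outer's frame. The frame struct and all names are assumptions for illustration, not any compiler's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activation record, reduced to the fields used here. */
typedef struct frame {
    struct frame *static_link;   /* frame of the lexically enclosing procedure */
    struct frame *dynamic_link;  /* frame of the caller */
    int x;                       /* local variable; only outer's copy is used */
} frame;

/* Follow 'hops' static links outward. */
static frame *follow_static(frame *f, int hops) {
    while (hops-- > 0) {
        assert(f != NULL);
        f = f->static_link;
    }
    return f;
}

/* helper (nested in outer): x lives one static level up. */
static void helper(frame *self) {
    follow_static(self, 1)->x += 1;
}

/* middle (nested in outer) calls its sibling helper twice. Caller and
   callee are at the same nesting level, so the callee's static link is
   found by following 1 static link from the caller's frame, while its
   dynamic link is simply the caller's frame. */
static void middle(frame *self) {
    for (int i = 0; i < 2; i++) {
        frame h = { .static_link = follow_static(self, 1),
                    .dynamic_link = self, .x = 0 };
        helper(&h);
    }
}

/* outer: owns x and calls middle, which is nested directly inside it. */
static int outer(void) {
    frame f = { .static_link = NULL, .dynamic_link = NULL, .x = 10 };
    frame m = { .static_link = &f, .dynamic_link = &f, .x = 0 };
    middle(&m);
    return f.x;
}
```

Here outer() returns 12. The rule applied in middle generalizes: a caller at nesting level d calling a procedure declared at level d' follows d - d' + 1 static links from its own frame (0 links when outer calls middle, so middle's static link is outer's own frame).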
Parameter passing modes
Call by value: Argument value is copied into the callee. The
original variable of the caller is not modified during the call.
Default in most languages (except Fortran and Perl)
Call by result: Argument is copied from the callee to the
caller. Used to return values.
Call by value-result: Argument is copied both ways.
Call by reference: Callee gets a reference (pointer) to a
memory location holding the argument. Callee can modify
the argument.
Call by name: Like call by reference, but the argument pointer
expression is recomputed at each access.
The callee is passed a small anonymous function to
compute the address of the argument.
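C itself only has call by value, but the other modes can be emulated with pointers and a thunk, which makes the differences concrete. In this sketch all function names are hypothetical: by_result and by_value_result copy out (and in) explicitly, and sum_by_name is a version of Jensen's device in which the argument a[i] is re-evaluated through a thunk (a code pointer plus an environment) on every access.

```c
#include <assert.h>

/* Call by value: the callee changes only its own copy. */
static int by_value(int a) { a = a + 1; return a; }

/* Call by reference: the callee updates the caller's variable directly. */
static void by_reference(int *a) { *a = *a + 1; }

/* Call by result: the value is only copied out on return. */
static void by_result(int *out) { int local = 42; *out = local; }

/* Call by value-result: copy in on entry, copy out on return. */
static void by_value_result(int *a) {
    int local = *a;   /* copy in */
    local = local + 1;
    *a = local;       /* copy out */
}

/* Call by name: a thunk recomputes the argument's address at each use. */
typedef struct {
    int *(*fn)(void *env);
    void *env;
} thunk;

typedef struct { int *arr; int *i; } elem_env;   /* environment for "a[i]" */

static int *elem_thunk(void *env) {
    elem_env *e = env;
    return &e->arr[*e->i];          /* uses the current value of i */
}

/* Sum 'term' for i = lo..hi; i and term are both passed "by name". */
static int sum_by_name(int *i, int lo, int hi, thunk term) {
    int s = 0;
    for (*i = lo; *i <= hi; (*i)++)
        s += *term.fn(term.env);
    return s;
}
```

With arr = {1, 2, 3, 4}, sum_by_name over i = 0..3 yields 10, because each use of the thunk sees the current i. Value-result differs observably from reference only when the same variable is reachable under two names during the call, which this sketch does not exercise.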
Procedure Call and Return
Caller's view of a subroutine call
Call
1. Evaluate each argument and place them in argument
registers or stack frame
2. Determine the address of the subroutine (mostly done by the
linker)
3. Store caller-save registers in stack frame
4. Compute a static link for the subroutine, if necessary
5. Save the return address and jump to the subroutine
Return
1. Restore saved registers from stack
2. Use the return value
Epilogue and Prologue
Callee's view of the call
Prologue
1. Save frame pointer, copy stack pointer to frame pointer,
compute new stack pointer, i.e. allocate new stack frame
2. Save callee-save registers, if necessary
3. Construct a display (cache of static links), if necessary
Procedure body is executed between the prologue and the
epilogue
Epilogue
1. Restore saved callee-save registers
2. Restore SP from frame pointer and FP from dynamic link
3. Place return value in appropriate register or stack location
4. Jump to return address
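The caller's and callee's steps can be traced on a toy machine whose stack is an array of words growing toward lower indices, with sp and fp as ordinary variables. This is a simplified sketch under stated assumptions: the return address and register saves are omitted (C's own call performs the control transfer), arguments are pushed right to left and popped by the caller, and every name is hypothetical. test_proc mimics the shape of the test_proc call example (two int parameters, two int locals).

```c
#include <assert.h>
#include <stdint.h>

/* Toy stack: grows toward lower indices; sp indexes the last pushed word. */
enum { STACK_WORDS = 256 };
static intptr_t stack[STACK_WORDS];
static int sp = STACK_WORDS;
static int fp = STACK_WORDS;

static void push(intptr_t v) { assert(sp > 0); stack[--sp] = v; }
static intptr_t pop(void)    { assert(sp < STACK_WORDS); return stack[sp++]; }

/* Prologue: save the caller's fp (dynamic link), set fp = sp,
   then allocate the locals by moving sp down. */
static void prologue(int nlocals) {
    push(fp);
    fp = sp;
    assert(sp >= nlocals);
    sp -= nlocals;
}

/* Epilogue: free the locals (sp = fp), restore fp from the dynamic link. */
static void epilogue(void) {
    sp = fp;
    fp = (int)pop();
}

/* Callee with two arguments (at fp+1, fp+2) and two locals (fp-1, fp-2). */
static intptr_t test_proc(void) {
    prologue(2);
    stack[fp - 1] = stack[fp + 1] * 10;   /* lv1 = a1 * 10 */
    stack[fp - 2] = stack[fp + 2];        /* lv2 = a2 */
    intptr_t result = stack[fp - 1] + stack[fp - 2];
    epilogue();
    return result;                        /* stands in for the return-value register */
}

/* Caller: evaluate and push arguments, call, then pop the arguments. */
static intptr_t call_test_proc(intptr_t a1, intptr_t a2) {
    push(a2);
    push(a1);
    intptr_t r = test_proc();
    sp += 2;
    return r;
}
```

After call_test_proc(3, 4) returns 34, sp and fp are back at their starting values, which is the invariant the prologue/epilogue pair is meant to preserve.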
Call Example
Sample C code
int test_proc(int a1, int a2)
{
int lv1, lv2;
...
return ...;
}
...
r = test_proc(r,4);
Subroutine with two int parameters and two int locals
PowerPC calling convention (MacOS X)
stack frames are of static and fixed size
no frame pointer
callee saves as many registers as it uses
frame contains outgoing arguments (incoming arguments are in the previous frame)
callee may store incoming args in the caller's frame if it needs them in memory
Register partitioning:
r0 = zero/temp
r1 = stack ptr
r2 = temp
r3 = arg0/ret.v.
r4 = arg1/ret.v.
r5 = arg2
...
r10 = arg7
r11 = temp
r12 = indir. branch target
r13..r31 = local variables
special registers: link, count, cond, exception
Stack frame structure (from old SP down to SP):
prev. frame (above old SP)
saved registers
local variables and temps in memory
outgoing args: argN ... arg1, arg0 (arg0 at SP+24)
reserved words (SP+12..SP+23)
saved link (SP+8)
saved cond (SP+4)
old SP (at SP)
PowerPC assembly for example call
Prologue and epilogue
_test_proc:
mflr r0 ; r0 <- link
stmw r26,-24(r1) ; store r26..r31 below SP (24 bytes)
stw r0,8(r1) ; M[SP+8] <- r0 (to caller's frame)
stwu r1,-96(r1) ; M[SP-96] <- SP ; SP <- SP-96
...body...
lwz r0,104(r1) ; r0 <- M[SP+96+8]
addi r1,r1,96 ; SP <- SP + 96
mtlr r0 ; link <- r0
lmw r26,-24(r1) ; restore r26..r31
blr ; branch to link register
Call site
li r3,3 ; load arg0
li r4,4 ; load arg1
bl _test_proc ; branch and link
mr r4,r3 ; r4 <- r3 (use return value)
Procedure-valued variables
Rare in imperative languages, routine in functional languages
C provides function pointers
Simple to implement as plain code pointers
This is sufficient, since there are no local procedures
Nested procedures require procedure values to contain both
code pointer and static link (= closure)
Static link is required to find the local variables of the enclosing
scope
Now activation records may have to live even after the
function execution has ended. Stack allocation is not
sufficient for all procedures
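A closure for a nested procedure can be represented as a pair of a code pointer and an environment pointer that plays the role of the static link. In this C sketch the hypothetical make_adder returns a closure over its parameter n. Because the closure outlives make_adder's activation, the environment is allocated on the heap rather than in a stack frame, and in this sketch the caller must free it (a language with closures would typically leave that to a garbage collector). All names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct closure {
    int (*code)(void *env, int arg);   /* code pointer */
    void *env;                         /* environment (plays the static link's role) */
} closure;

/* Variables of make_adder's activation that the nested code needs. */
typedef struct { int n; } adder_env;

/* Body of the nested function: reaches n through its environment. */
static int adder_code(void *env, int arg) {
    const adder_env *e = env;
    return arg + e->n;
}

/* The returned closure outlives this activation, so the environment
   is allocated on the heap, not in make_adder's stack frame. */
static closure make_adder(int n) {
    adder_env *e = malloc(sizeof *e);
    assert(e != NULL);
    e->n = n;
    return (closure){ adder_code, e };
}

static int call_closure(closure c, int arg) { return c.code(c.env, arg); }
```

Two closures built from the same code, make_adder(5) and make_adder(7), differ only in their environments, so calling them with 10 gives 15 and 17.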
Position-Independent Code
Required for shared libraries - and more generally - for any
dynamically loadable code, e.g. plugin modules
Only one copy of shared code is kept in memory, so the code cannot be
modified at load time
PIC must be loadable to an arbitrary memory location
Code and data references must work regardless of code
location
Local data references are SP-based, so they are ok
Jumps within the same object module can use relative
addressing, so they are ok
Global data references and jumps from one object module to
another cannot be absolute, so they use indirect addressing
Global Offset Table
Global Offset Table (GOT) is a pointer table used to point to
global symbols, whose addresses are not known until
program load time.
Data References
The compiler generates indirect references through the GOT
The link-editor relocates the reference as a GOT offset
The run-time linker fills the GOT with actual symbol
addresses, when it knows where the object will be loaded
Code References
Calls to shared code jump to an element of the Procedure
Linkage Table (PLT)
A PLT element contains code to load an address from the GOT and
a jump to that address
GOT Example (Sparc)
Procedure prologue
.LLGETPC0: ; helper function
retl ; to read program counter
add %o7, %l7, %l7 ; %l7 += %o7 (address of the call instruction)
so_func: ; actual procedure start
save %sp, -112, %sp ; allocate stack frame
sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %l7
call .LLGETPC0
add %l7, %lo(_GLOBAL_OFFSET_TABLE_+4), %l7
; now %l7 contains the address of GOT
; _GLOBAL_OFFSET_TABLE_ is a PC-relative symbol
Loading from global data symbol si
; symbol si has relocation type GOT, i.e. it is treated as
; an offset into GOT, not actual memory address
sethi %hi(si), %g1
or %g1, %lo(si), %g1 ; %g1 = GOT offset for si
ld [%l7+%g1], %g1 ; load address of si from GOT
ld [%g1], %i0 ; load value of si
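The effect of the GOT can be simulated in C: the functions below never embed a symbol's address, only a fixed slot number, and a load-time step fills the table once the actual addresses are known. This is a minimal sketch; the symbol table, the slot enum, fill_got, load_si and bump_counter are hypothetical stand-ins for the link-editor's GOT offsets and the run-time linker, not any real loader interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table of the loaded program: name -> address. */
typedef struct { const char *name; void *addr; } symbol;

/* The code only knows GOT slot numbers, fixed at link-edit time. */
enum { GOT_SI, GOT_COUNTER, GOT_SIZE };
static const char *got_names[GOT_SIZE] = { "si", "counter" };
static void *got[GOT_SIZE];

/* "Run-time linker": fill each GOT slot with the symbol's address. */
static int fill_got(const symbol *syms, size_t nsyms) {
    for (size_t slot = 0; slot < GOT_SIZE; slot++) {
        got[slot] = NULL;
        for (size_t k = 0; k < nsyms; k++)
            if (strcmp(syms[k].name, got_names[slot]) == 0)
                got[slot] = syms[k].addr;
        if (got[slot] == NULL)
            return -1;                 /* unresolved symbol */
    }
    return 0;
}

/* Load the address from the GOT, then the value,
   like the two ld instructions in the Sparc example. */
static int load_si(void) {
    int *p = got[GOT_SI];              /* ld [%l7+%g1], %g1 */
    assert(p != NULL);
    return *p;                         /* ld [%g1], %i0 */
}

static void bump_counter(void) {
    int *p = got[GOT_COUNTER];
    assert(p != NULL);
    (*p)++;
}
```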
Calling via PLT
call so_aux, 0 ; looks normal, but symbol so_aux
; has relocation type PLT
; linker relocates this to .PLT2
The call goes to the object module's own local PLT, not directly to the actual subroutine
in another object
.PLT2: ; PLT entry 2: already linked at run time
sethi (. - .PLT0), %g1
sethi %hi(so_aux), %g1
jmp %g1+%lo(so_aux)
.PLT3: ; PLT entry 3: not yet linked, branches to .PLT0
sethi (. - .PLT0), %g1
ba,a .PLT0
nop
.PLT0: ; PLT entry 0: first entry, calls the dynamic linker
save %sp, -64, %sp
call dyn_linker
so_aux: ; code from the shared object
save ...
...
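The lazy binding shown by these PLT entries (an entry first routes to the dynamic linker, and is then patched to jump straight to the target) can be imitated with a function-pointer slot that initially points at a resolver. A hypothetical C sketch: so_aux stands in for the shared-object function, plt_resolver for the run-time linker, and the resolver simply assigns a known function instead of searching symbol tables.

```c
#include <assert.h>

typedef int (*fn_ptr)(int);

/* Stand-in for the function in another shared object. */
static int so_aux(int x) { return x * 2; }

static int resolve_count = 0;       /* how often the "dynamic linker" ran */

static int plt_resolver(int x);

/* The slot the call site jumps through: starts out "not yet linked". */
static fn_ptr so_aux_slot = plt_resolver;

/* First call: bind the slot to the real target, then forward the call. */
static int plt_resolver(int x) {
    assert(so_aux_slot == plt_resolver);    /* only reached before binding */
    resolve_count++;
    so_aux_slot = so_aux;                   /* a real linker would look up the symbol */
    return so_aux_slot(x);
}

/* Call site: always an indirect call through the slot. */
static int call_so_aux(int x) { return so_aux_slot(x); }
```

Only the first call pays for resolution; later calls go through the patched slot directly, which mirrors the difference between entries .PLT3 and .PLT2.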
Dynamic typing and polymorphism
Dynamic typing: The programming language does not
associate types to variables, but rather to data values
Variable name can refer to value of any type
Dynamic typing is usually implemented by tagging data
values. Each value carries a type tag with it.
The compiler should generate efficient code for resolving the
types of data values and selecting the corresponding
(polymorphic) operation on them, e.g. a+b on integers, floats,
strings or bignums.
Modern architectures have very little built-in hardware
support for this
Sparc provides tagged add and subtract instructions
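One common tagging scheme, and the one Sparc's tagged add and subtract (taddcc, tsubcc) are designed for, keeps a small type tag in the low two bits of a word and gives small integers the tag 00. Two tagged integers can then be added with one ordinary add, because 4a + 4b = 4(a + b); only a nonzero tag or an overflow needs the slower type dispatch. In this C sketch the representation and names are assumptions, and the slow path simply reports failure instead of handling floats, strings or bignums.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: the low 2 bits of a 64-bit word are the
   tag; tag 0 marks a small integer ("fixnum") stored as n * 4. */
typedef uint64_t value;
#define TAG_MASK   ((value)3)
#define TAG_FIXNUM ((value)0)

static value make_fixnum(int64_t n) {
    assert(n >= -(INT64_C(1) << 61) && n < (INT64_C(1) << 61));  /* fits in 62 bits */
    return (value)n << 2;
}
static bool is_fixnum(value v)       { return (v & TAG_MASK) == TAG_FIXNUM; }
static int64_t fixnum_value(value v) { return (int64_t)v / 4; }

/* Fast path of a polymorphic '+': succeeds only for two fixnums without
   overflow, roughly what taddcc checks in hardware. Otherwise the caller
   would fall back to a slower, type-dispatching routine (not shown). */
static bool tagged_add(value a, value b, value *out) {
    if (!is_fixnum(a) || !is_fixnum(b))
        return false;
    int64_t sa = (int64_t)a, sb = (int64_t)b;
    if ((sb > 0 && sa > INT64_MAX - sb) || (sb < 0 && sa < INT64_MIN - sb))
        return false;                    /* overflow: result needs a bignum */
    *out = (value)(sa + sb);             /* tag bits stay 00 */
    return true;
}
```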
Storage management
Fully manual: malloc and free in C, or new and delete
in C++
Automatic deallocation: new but no delete in Java
Fully automatic: All memory management operations
implicit, in e.g. Lisp or Haskell
Automatic deallocation is usually based on reference
counting, garbage collection, or combination of both
Manual allocation is usually implemented as a library call
e.g. Doug Lea's dlmalloc library has been shown to match or
outperform many custom memory allocation routines
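A minimal reference-counting sketch in C, with a hypothetical object header: each object counts the references to it and is freed when the count drops to zero. The live_objects counter exists only so the behavior can be checked. Plain reference counting cannot reclaim cycles, which is one reason it is often combined with garbage collection, as noted above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct rc_obj {
    size_t refcount;
    int payload;
} rc_obj;

static size_t live_objects = 0;   /* illustration only: objects not yet freed */

static rc_obj *rc_new(int payload) {
    rc_obj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->refcount = 1;              /* the creator holds the first reference */
    o->payload = payload;
    live_objects++;
    return o;
}

static rc_obj *rc_retain(rc_obj *o) { o->refcount++; return o; }

/* Drop a reference; free the object when the last one goes away. */
static void rc_release(rc_obj *o) {
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        free(o);
        live_objects--;
    }
}
```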
Summary
Language semantics and run-time services (dynamic
loading, code sharing, memory management) may require
complicated run-time support
It should be possible to optimize away costly parts of the
procedure call mechanism to obtain good call performance.
The amount of required run-time support code depends on
the language and hardware.
Modern RISCs do not have much explicit architectural
support for specific high-level languages, but this can be
compensated in software
