An efficient implementation of SELF
Abstract
We have developed and implemented techniques that double the performance of dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF’s lack of classes and explicit variables.
To compensate for the absence of classes, our system uses implementation-level maps to transparently group objects cloned from the same prototype, providing data type information and eliminating the apparent space overhead for prototype-based systems. To compensate for dynamic typing, user-defined control structures, and the lack of explicit variables, our system dynamically compiles multiple versions of a source method, each customized according to its receiver’s map. Within each version the type of the receiver is fixed, and thus the compiler can statically bind and inline all messages sent to self. Message splitting and type prediction extract and preserve even more static type information, allowing the compiler to inline many other messages. Inlining dramatically improves performance and eliminates the need to hard-wire low-level methods such as +, ==, and ifTrue:.
Despite inlining and other optimizations, our system still supports interactive programming environments. The system traverses internal dependency lists to invalidate all compiled methods affected by a programming change. The debugger reconstructs inlined stack frames from compiler-generated debugging information, making inlining invisible to the SELF programmer.

1. Introduction
SELF [US87] is a dynamically-typed object-oriented language inspired by the Smalltalk-80** language [GR83]. Like Smalltalk, SELF has no type declarations, allowing programmers to rapidly build and modify systems without interference from out-of-date type declarations. Also, SELF provides blocks (lexically-scoped function objects akin to closures [Ste76, SS76]) so that SELF programmers may define their own control structures; even the standard control structures for iteration and boolean selection are constructed out of blocks. However, unlike Smalltalk and most other object-oriented languages, SELF has no classes.*** Instead it is based on the prototype object model, in which each object defines its own object-specific behavior, and inherits shared behavior from its parent objects. Also unlike Smalltalk, SELF accesses state solely by sending messages; there is no special syntax for accessing a variable or changing its value. These two features, combined with SELF’s multiple inheritance rules, help keep programs concise, malleable, and reusable.
In a straightforward implementation, SELF’s prototype-based model would consume much more storage space than other dynamically-typed object-oriented programming languages, and its reliance on message

* This work has been generously supported by a National Science Foundation Presidential Young Investigator Grant # CCR-8657631, and by IBM, Texas Instruments, NCR, Tandem Computers, Apple Computer, and Sun Microsystems.
** Smalltalk-80 is a trademark of ParcPlace Systems, Inc. Hereafter when we write “Smalltalk” we will be referring to the Smalltalk-80 system or language.
*** To illustrate how unusual this is, note that some well-respected authorities have gone so far as to require that “object-oriented” languages provide classes [Weg87]. Other prototype models are discussed in [Bor86, Lie86, LTP86, Ste87].

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0-89791-333-7/89/0010/0049 $1.50
object references.
A SELF Memory Space
Since the elements of byte arrays are represented using packed bytes rather than tagged words, byte array elements may masquerade as object references. Smalltalk systems typically handle this problem by scanning the heap object-by-object rather than word-by-word. For each object, the system checks to see whether the object contains object references or only bytes. Only if the object contains object references does the system scan the object for matching references, iterating up to the length of the object. Then the scanner proceeds to the next object. This procedure avoids the problems caused by scanning byte arrays, but slows down the scan with the overhead to parse object headers and compute object lengths.

In our SELF system, we avoid the problems associated with scanning byte arrays without degrading the object reference scanning speed by segregating the byte arrays from the other SELF objects. Each Generation Scavenging memory space is divided into two areas, one for byte arrays and one for objects with references. To scan all object references, only the object reference area of each space needs to be scanned. This optimization speeds scans in two ways: byte array objects are never scanned, and object headers are never parsed.

To avoid slowing the tight scanning loop with an explicit end-of-space check, the word after the end of the space is temporarily replaced with a sentinel reference that matches the scanning criterion. The scanner checks for the end of the space only on a matching reference, instead of on every word. Early measurements on 68020-based Sun-3/50’s showed that our SELF system scanned memory at the rate of approximately 3 megabytes per second. Measurements of the fastest Smalltalk-80 implementation on the same machine indicated a scanning speed for non-segregated memory spaces of only 1.6 megabytes per second.

For some kinds of scans, such as finding all objects that refer to a particular object, the scanner needs to find the objects that contain a matching reference, rather than the reference itself. Our system can perform these types of searches nearly as fast as a normal scan. We use a special tag for the first header word of every object (called the mark word) to identify the beginning of the object. The scanner proceeds normally, searching for matching references. Once a reference is found, the object containing the reference can be found by simply scanning backwards to the object’s mark word, and then converting the mark’s address into an object reference.

3.3. Object Formats
A SELF memory space is organized as a linear array of aligned 32-bit words. Each word contains a low-order 2-bit tag field, used to interpret the remaining 30 bits of information. A reference to an integer or floating point number encodes the number directly in the reference itself. Converting between a tagged integer immediate and its corresponding hardware representation requires only a shift instruction. Adding, subtracting, and comparing tagged integers require no conversion at all.
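The tagging scheme above and the sentinel-terminated scan described before Section 3.3 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the text says the tag occupies the low-order 2 bits but does not give the tag values, so `INT_TAG = 00` and `REF_TAG = 01` are hypothetical choices, and all function names are invented for this sketch. The memory space is modelled as a Python list of words, and appending then popping the sentinel stands in for temporarily replacing the word after the end of the space.

```python
TAG_BITS = 2
INT_TAG = 0b00  # assumed tag value: with tag 00, tagged integers add directly
REF_TAG = 0b01  # assumed tag value for references to heap objects


def tag_int(n):
    """Encode an integer as a tagged immediate using a single shift."""
    return n << TAG_BITS  # the two low bits are INT_TAG (00)


def untag_int(word):
    """Recover the plain integer using a single arithmetic shift."""
    return word >> TAG_BITS


def tag_ref(word_index):
    """Encode a reference to the word at word_index (sketch only)."""
    return (word_index << TAG_BITS) | REF_TAG


def scan_for(space, target):
    """Return the indices of all words in `space` equal to `target`.

    A sentinel equal to `target` sits just past the end of the space,
    so the inner loop needs no end-of-space check.
    """
    end = len(space)
    space.append(target)  # temporary sentinel word
    hits = []
    i = 0
    try:
        while True:
            while space[i] != target:  # tight loop: one comparison per word
                i += 1
            if i == end:  # the end is checked only when a word matches
                break
            hits.append(i)
            i += 1
    finally:
        space.pop()  # restore the word after the end of the space
    return hits
```

For example, `tag_int(5) + tag_int(7)` equals `tag_int(12)` with no untagging step, and scanning `[tag_int(1), tag_ref(4), tag_int(4), tag_ref(4)]` for `tag_ref(4)` finds the words at indices 1 and 3 while ignoring the integer 4.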
[Figure: object and map formats for the Cartesian point example. Recoverable labels: constant slot description, data slot description, assignment slot description; each slot description lists a slot name, a slot type, and either slot contents, a slot offset, or a data slot index; dependency links.]

The representation of two Cartesian point objects. The objects on the left are the point “instances,” containing the values of the x and y assignable data slots. The right object is the shared map for all Cartesian points, containing the value of the constant parent slot and the offsets of the assignable x and y slots.

From the above object formats, we can determine that the total space cost to represent a clone family of n objects (each with s slots, a of which are assignable) is (2 + a)n + 5s + 8 words. For the simple Cartesian point example, s is 5 (x, x:, y, y:, and parent) and a is 2 (x and y), leading to a total space cost to represent all point objects of 4n + 33 words. Published accounts of Smalltalk-80 systems [DS84, Ung86] indicate that these systems use at least two extra words per object: one for its class pointer and another for either its address or its hash code and flags. Therefore, maps allow objects in a prototype-based system like SELF to be represented just as space-efficiently as objects in a class-based system like Smalltalk.
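The space formula above is easy to check numerically. The following Python sketch simply evaluates (2 + a)n + 5s + 8; the function name is hypothetical, and the split into a per-object term and a one-time shared term is a reading of the formula rather than something the text spells out.

```python
def clone_family_words(n, s, a):
    """Words needed for a clone family, per the formula (2 + a)n + 5s + 8.

    n: number of objects in the clone family
    s: number of slots per object
    a: number of those slots that are assignable
    """
    per_object = 2 + a      # term multiplied by n in the formula
    shared = 5 * s + 8      # term paid once for the whole family
    return per_object * n + shared


# Cartesian point example from the text: s = 5, a = 2, so the cost is 4n + 33.
point_cost = clone_family_words(100, s=5, a=2)
```

With 100 points the total is 4 * 100 + 33 = 433 words, so the one-time 33-word term is quickly amortized as the family grows.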
min: arg = (
< arg ifTrue: [self] False: [arg] ).
The next message considered by our compiler is the ifTrue:False: message. If arg is an integer (the common case), the receiver of ifTrue:False: will be either true or false; otherwise it will be the result of the value message (unknown at compile-time). Normally, this would prevent inlining of the ifTrue:False: message, since the type of its receiver cannot be uniquely determined. However, by compiling multiple versions of the ifTrue:False: message, one version for each statically-known receiver type, our SELF compiler can handle and optimize each case separately. This technique is explained next.
[Figure: compiler control flow graphs for the min: example. Recoverable node labels: push [self], push [arg], create-closure, send value, [self] value, [arg] value, send ifTrue:False:.]
The two value messages can be inlined, replaced by the bodies of the blocks. Since none of the receiver and arguments of the inlined ifTrue:False: messages

compiler won’t inline the value message, and so the value message’s result type is unknown at compile-time. Thus the receiver type of the ifTrue:False: message is unknown, and a simple SELF compiler wouldn’t be able to inline this message away either. However, the next subsection describes how our compiler uses well-known patterns of usage to predict that the receiver of the ifTrue:False: message will be a boolean and optimizes the message accordingly.
[Figure: layout of a compiled method. Recoverable labels: header, native machine code, slot locations, scavenging info, dependency links, byte code mapping.]

A compiled method contains more than just instructions. It includes a list of the offsets within the instructions of embedded object references, used by the scavenger to modify the compiled code if a referenced object is moved. The compiled method includes dependency links to support selective invalidation. It also includes descriptions of the inlined method scopes, which are used to find the values of local slots of the method and to display source-level call stacks, and a bidirectional mapping between source-level byte codes and actual program counter values.

In the common case of taking the minimum of two integers, our compiler executes only two simple compare-and-branch sequences, for fast execution. A similar savings will be seen if the user calls min: on two floating point numbers or two strings, since our compiler customizes and optimizes special versions for each of these receiver types. But even in the case of taking the minimum of two values of different types, such as an integer and a floating point number, our compilation techniques preserve the message passing semantics of the original source code, and execute the source code faithfully.
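The shape of the code described above for min: with an integer receiver can be approximated in Python. This is a rough sketch, not the compiler’s actual output: the `send` helper is a hypothetical stand-in for a full dynamically-bound message send, and the choice of which two compare-and-branch sequences run in the common case (a type test on the argument, then the integer comparison) is an assumption made for illustration.

```python
def send(receiver, selector, *args):
    """Hypothetical stand-in for a full, dynamically-bound message send."""
    if selector == "<":
        return receiver < args[0]
    if selector == "ifTrue:False:":
        true_block, false_block = args
        return true_block() if receiver else false_block()
    raise ValueError(f"unknown selector {selector!r}")


def min_for_integer_receiver(self_int, arg):
    """Sketch of min: customized for an integer receiver."""
    if type(arg) is int:       # compare-and-branch 1: is arg an integer?
        if self_int < arg:     # compare-and-branch 2: inlined integer <
            return self_int    # inlined body of the block [self]
        return arg             # inlined body of the block [arg]
    # Uncommon case: fall back to real message sends so that the
    # message passing semantics of the source are preserved.
    condition = send(self_int, "<", arg)
    return send(condition, "ifTrue:False:", lambda: self_int, lambda: arg)
```

Calling `min_for_integer_receiver(3, 5)` takes the fast path, while `min_for_integer_receiver(3, 2.5)` takes the fallback path and still returns the smaller value.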
[Figure: dependency links for the min: method. Recoverable labels: “(map dependency)”, “in case min: is added”, “is changed”.]

four ways:
• When a method is being compiled, the system
source-level debugger

The SELF debugger presents the program execution state in terms of the programmer’s execution model: the state of the byte code interpreter, with no optimizations. This requires that the debugger be able to examine the state of the compiled, optimized SELF program, and construct a view of that state (the virtual state) in terms of the byte-coded execution model. Examining the execution state is complicated by having methods in the virtual call stack actually be inlined within other methods in the compiled method call stack, and by allocating the slots of virtual methods to registers and/or stack locations in the compiled methods. To allow the debugger to reconstruct the virtual call stack from the physical optimized call stack, the SELF compiler appends debugging information to each compiled method. For each scope compiled (the initial method, and any methods or block methods inlined within it), the compiler outputs information describing that scope’s place in the virtual call chain within the compiled method’s physical stack frame. For each argument and local slot in the scope, the compiler outputs either the value of the slot (if it’s a constant known at compile-time, as many slots are) or the register or stack location allocated to hold the value of the slot at run-time.

[Figure: scope descriptions for min:, with self in register r1 and arg in register r2.]

The debugging information for the min: method. Each scope description points to its calling scope description (black arrows); a block scope also points to its lexically-enclosing scope description (gray arrows). For each slot within a scope, the debugging information identifies either the slot’s compile-time value or its run-time location. For the min: example, only the initial arguments have run-time locations (registers r1 and r2 in this case); all other slot contents are known statically at compile-time.

Our SELF compiler also outputs debugging information to support computing and setting breakpoints. This information takes the form of a bidirectional mapping between program counter addresses and byte code instructions within a particular scope. One complexity with this mapping is that it is not one-to-one: several byte codes may map to the same program counter address (as messages get inlined and optimized away), and several program counter addresses may map to the same byte code (as messages get split and compiled in more than one place). To determine the current state of the program in byte code terms at any program counter address, the debugger first finds the latest program counter address in the mapping that is less than or equal to the current program counter, and then selects the latest byte code mapped to that address; this algorithm returns the last byte code that has been started but not completed for any program counter address. The execution stack displayer uses this mapping information to find the bottommost virtual stack frame for each physical stack frame to display the call stack whenever the program is halted.

We have not implemented the breakpointing facilities in our debugger yet; the current “debugger” displays the virtual execution stack and immediately continues execution whenever the _DumpSelfStack primitive is called. However, our mapping system is designed to support computing and setting breakpoints in anticipation of breakpointing and process control primitives. To set a breakpoint at a particular source-level byte code, the debugger would find all those program counter addresses associated with the byte code and set breakpoints there. In cases where several byte codes map to the same program counter address, single stepping from one byte code to the next wouldn’t actually cause any instructions to be executed; the debugger would pretend to execute instructions to preserve the illusion of byte-coded execution.
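The two lookups described above (program counter to byte code, and byte code to breakpoint addresses) can be sketched with a sorted list and a binary search. This is a minimal illustration, not the system’s actual data structure: the class and method names are hypothetical, byte codes are represented as integer indices within one scope, and “latest byte code” is assumed to mean the one with the highest index.

```python
from bisect import bisect_right


class PcByteCodeMap:
    """Many-to-many mapping between program counter addresses and byte codes."""

    def __init__(self, pairs):
        # pairs: iterable of (pc_address, byte_code_index) tuples
        self._by_pc = {}
        for pc, byte_code in pairs:
            self._by_pc.setdefault(pc, []).append(byte_code)
        self._pcs = sorted(self._by_pc)

    def byte_code_at(self, pc):
        """Return the last byte code started but not completed at `pc`."""
        i = bisect_right(self._pcs, pc)  # count of mapped addresses <= pc
        if i == 0:
            return None                  # pc precedes every mapped address
        latest_pc = self._pcs[i - 1]     # latest mapped address <= pc
        return max(self._by_pc[latest_pc])  # latest byte code at that address

    def breakpoint_addresses(self, byte_code):
        """Return every address where a breakpoint on `byte_code` must be set."""
        return [pc for pc in self._pcs if byte_code in self._by_pc[pc]]


# Byte codes 0 and 1 share address 100 (inlined away); byte code 2 was
# split and compiled at both 108 and 132.
mapping = PcByteCodeMap([(100, 0), (100, 1), (108, 2), (120, 3), (132, 2)])
```

Here `mapping.byte_code_at(110)` resolves to byte code 2 via address 108, and `mapping.breakpoint_addresses(2)` yields both 108 and 132, matching the case where a split message is compiled in more than one place.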