
An Efficient Implementation of SELF,
a Dynamically-Typed Object-Oriented Language
Based on Prototypes*

Craig Chambers, David Ungar, and Elgin Lee
Center for Integrated Systems, Stanford University
sel@@self.stanford.edu

Abstract

We have developed and implemented techniques that double the performance of dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast as the fastest Smalltalk implementation, despite SELF’s lack of classes and explicit variables.

To compensate for the absence of classes, our system uses implementation-level maps to transparently group objects cloned from the same prototype, providing data type information and eliminating the apparent space overhead for prototype-based systems. To compensate for dynamic typing, user-defined control structures, and the lack of explicit variables, our system dynamically compiles multiple versions of a source method, each customized according to its receiver’s map. Within each version the type of the receiver is fixed, and thus the compiler can statically bind and inline all messages sent to self. Message splitting and type prediction extract and preserve even more static type information, allowing the compiler to inline many other messages. Inlining dramatically improves performance and eliminates the need to hard-wire low-level methods such as +, ==, and ifTrue:.

Despite inlining and other optimizations, our system still supports interactive programming environments. The system traverses internal dependency lists to invalidate all compiled methods affected by a programming change. The debugger reconstructs inlined stack frames from compiler-generated debugging information, making inlining invisible to the SELF programmer.

1. Introduction

SELF [US87] is a dynamically-typed object-oriented language inspired by the Smalltalk-80** language [GR83]. Like Smalltalk, SELF has no type declarations, allowing programmers to rapidly build and modify systems without interference from out-of-date type declarations. Also, SELF provides blocks (lexically-scoped function objects akin to closures [Ste76, SS76]) so that SELF programmers may define their own control structures; even the standard control structures for iteration and boolean selection are constructed out of blocks. However, unlike Smalltalk and most other object-oriented languages, SELF has no classes.*** Instead it is based on the prototype object model, in which each object defines its own object-specific behavior, and inherits shared behavior from its parent objects. Also unlike Smalltalk, SELF accesses state solely by sending messages; there is no special syntax for accessing a variable or changing its value. These two features, combined with SELF’s multiple inheritance rules, help keep programs concise, malleable, and reusable.

In a straightforward implementation, SELF’s prototype-based model would consume much more storage space than other dynamically-typed object-oriented programming languages, and its reliance on message passing to access state would exact an even higher penalty in execution time. We have developed and implemented techniques that eliminate the space and time costs of these features. In addition, we have implemented other optimizations that enable SELF to run twice as fast as the fastest Smalltalk system. These same techniques could improve implementations of class-based object-oriented languages such as Smalltalk, Flavors [Moo86], CLOS [Bob88], C++ [Str86], Trellis/Owl [Sch86], and Eiffel [Mey86].

This paper describes our implementation for SELF, which has been running for over a year. First we review SELF’s object and execution model in section 2. Then we describe SELF’s object storage system in section 3, introducing maps and segregation and presenting object formats. Section 4 explains our byte-coded representation for source code. Section 5 reviews the compiler techniques, originally published in [CU89]. Section 6 explains how these optimizations can coexist with an exploratory programming environment that supports incremental recompilation and source-level debugging. Section 7 compares the performance of SELF to the fastest available Smalltalk system and an optimizing C compiler. It also proposes a new performance metric, MiMS, for object-oriented language implementations. We conclude with a discussion of open issues and future work.

* This work has been generously supported by a National Science Foundation Presidential Young Investigator Grant # CCR-8657631, and by IBM, Texas Instruments, NCR, Tandem Computers, Apple Computer, and Sun Microsystems.
** Smalltalk-80 is a trademark of ParcPlace Systems, Inc. Hereafter when we write “Smalltalk” we will be referring to the Smalltalk-80 system or language.
*** To illustrate how unusual this is, note that some well-respected authorities have gone so far as to require that “object-oriented” languages provide classes [Weg87]. Other prototype models are discussed in [Bor86, Lie86, LTP86, Ste87].

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 089791-333-7/89/0010/0049 $1.50

October 1-6, 1989 OOPSLA ‘89 Proceedings 49

[Figure: six SELF objects. From top to bottom: general traits; point traits, whose print slot holds the method "x print. ', ' print. y print"; cartesian point traits and polar point traits; and one cartesian and one polar point object.]

Six SELF objects. The bottom objects are two-dimensional point objects, the left one using cartesian coordinates and the right one using polar coordinates. The ← represents the assignment primitive operation, which is invoked to modify the contents of corresponding data slots. The cartesian point traits object is the immediate parent object shared by all cartesian point objects, and defines four methods for interpreting cartesian points in terms of polar coordinates; the polar point traits object does the same for polar point objects. The point traits object is a shared ancestor of all point objects, and defines general methods for printing and adding points, regardless of coordinate system. This object inherits from the top object, which defines even more general behavior, such as how to copy objects.

2. Overview of SELF

SELF was initially designed by the second author and Randall B. Smith at Xerox PARC. The subsequent design evolution and implementation were undertaken beginning in mid-1987 by the authors at Stanford University.

SELF objects consist of named slots, each of which contains a reference to some other object. Some slots may be designated as parent slots (by appending asterisks to their names). Objects may also have SELF source code associated with them, in which case the object is called a method (similar to a procedure). To make a new object in SELF, an existing object (called the prototype) is simply cloned (shallow-copied).

When a message is sent to an object (called the receiver of the message), the object is searched for a slot with the same name as the message. If a matching slot is not found, then the contents of the object’s parent slots are searched recursively, using SELF’s multiple inheritance rules to disambiguate any duplicate matching slots. Once a matching slot is found, its contents is evaluated and the result is returned as the result of the message send.

An object without code evaluates to itself (and so the slot holding it acts like a variable). An object with code (a method) is a prototype activation record. When evaluated, the method object clones itself, fills in its self slot with the receiver of the message, fills in its argument slots (if any) with the arguments of the message, and executes its code. The self slot is a parent slot so that the cloned activation record inherits from the receiver of the message send.

For instance, in the point example shown above, sending the x message to the cartesian point object finds the x slot immediately. The contents of the slot is the integer 3, which evaluates to itself (it has no associated code), producing 3 as the result of the x message. If x were sent to the polar point object, however, x wouldn’t be found immediately. The object’s parents would be searched, finding the x slot defined in the polar point traits object. That x slot contains a method that computes the x coordinate from the rho and theta coordinates. The method would get cloned and executed, producing the floating point result 1.25.

If the print message were sent to a point object, the print slot defined in the point traits object would be found. The method contained in the slot prints out the point object in cartesian coordinates. If the point were represented using cartesian coordinates, the x and y messages would access the corresponding data slots of the point object. But the print method works fine even for points represented using polar coordinates: the x and y messages would find the conversion methods defined in the polar point traits object to compute the correct x and y values.

SELF supports assignments to data slots by associating an assignment slot with each assignable data slot. The assignment slot contains the assignment primitive object. When the assignment primitive is evaluated as the result of a message send, it stores its argument into the associated data slot. A data slot with no corresponding assignment slot is called a constant or read-only slot, since a running program cannot change its value. For example, most parent slots are constant slots. However, our object model allows a parent slot to be assignable just like any other slot, simply by defining its corresponding assignment slot. Such an assignable parent slot permits an object’s inheritance to change on-the-fly, perhaps as a result of a change in the object’s state. For example, a collection object may wish to provide different behavior depending on whether the collection is empty or not. This dynamic inheritance is one of SELF’s linguistic innovations, and has proven to be a useful addition to the set of object-oriented programming techniques.

SELF allows programmers to define their own control structures using blocks. A block contains a method in a slot named value; this method is special in that when it is invoked (by sending value to the block), the method runs as a child of its lexically enclosing activation record (either a “normal” method activation or another block method activation). The self slot is not rebound when invoking a block method, but instead is inherited from the lexically enclosing method. Block methods may be terminated with a non-local return expression, which returns a value not to the caller of the block method, but to the caller of the lexically-enclosing non-block method, much like a return statement in C.

Two other kinds of objects appear in SELF: object arrays and byte arrays. Arrays contain only a single parent slot pointing to the parent object for that kind of array, but contain a variable number of element objects. As their names suggest, object arrays contain elements that are arbitrary objects, while byte arrays contain only integer objects in the range 0 to 255, but in a more compact form. Primitive operations support fetching and storing elements of arrays as well as determining the size of an array and cloning a new array of a particular size.

The SELF language described here is both simple and powerful, but resists efficient implementation. SELF’s prototype object model, in which each object can have unique format and behavior, poses serious challenges for the economical storage of objects. SELF’s exclusion of type declarations and commitment to message passing for all computation, even for control structures and variable accesses, defeats existing compiler technology. The remainder of this paper describes our responses to these challenges.

3. The Object Storage System
The object storage system (also referred to as the memory system) must represent the objects of the SELF user’s world, including references between objects. It creates new objects and reclaims the resources consumed by inaccessible objects. An ideal memory system would squeeze as many objects into as little memory as possible, for high performance at low cost. An earlier version of our SELF memory system was documented in [Lee88].

Much of our memory system design exploits technology proven in existing high-performance Smalltalk systems. For minimal overhead in the common case, our SELF system represents object references using direct tagged pointers, rather than indirectly through an object table. Allocation and garbage collection in our SELF system uses Generation Scavenging with demographic feedback-mediated tenuring [Ung86, UJ88], augmented with a traditional mark-and-sweep collector to reclaim tenured garbage. The following two subsections describe our new techniques for efficient object storage systems; the third subsection describes our object formats in detail.
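Before turning to storage, the object model of section 2 can be made concrete with a minimal Python sketch of slots, parent lookup, cloning, and assignment slots. This is an illustration under simplifying assumptions, not the SELF implementation: the names `SelfObject`, `Method`, and `Assignment` are hypothetical, parents are searched depth-first in declaration order instead of by SELF's multiple inheritance rules, methods are plain Python callables rather than cloned activation records, the print method returns its text instead of printing it, and the rho and theta values of the polar point are made up.

```python
import math


class Method:
    """An object with code; modeled as a callable taking the receiver first."""

    def __init__(self, code):
        self.code = code


class Assignment:
    """Stand-in for the assignment primitive, tied to one data slot."""

    def __init__(self, data_slot):
        self.data_slot = data_slot


class SelfObject:
    def __init__(self, slots, parents=()):
        self.slots = dict(slots)       # slot name -> referenced object
        self.parents = tuple(parents)  # names of the slots marked as parents

    def clone(self):
        # Shallow copy: new slot table, same referenced objects.
        return SelfObject(self.slots, self.parents)

    def lookup(self, name):
        """Return (holder, contents) of the first matching slot, or None."""
        if name in self.slots:
            return self, self.slots[name]
        for parent_name in self.parents:
            found = self.slots[parent_name].lookup(name)
            if found is not None:
                return found
        return None

    def send(self, name, *args):
        found = self.lookup(name)
        if found is None:
            raise LookupError("message not understood: " + name)
        holder, contents = found
        if isinstance(contents, Method):
            # self stays bound to the receiver, not to the slot's holder.
            return contents.code(self, *args)
        if isinstance(contents, Assignment):
            holder.slots[contents.data_slot] = args[0]
            return self
        return contents  # an object without code evaluates to itself


point_traits = SelfObject({
    "print": Method(lambda p: "{}, {}".format(p.send("x"), p.send("y"))),
})
cartesian_traits = SelfObject({"parent": point_traits}, parents=["parent"])
polar_traits = SelfObject({
    "parent": point_traits,
    "x": Method(lambda p: p.send("rho") * math.cos(p.send("theta"))),
    "y": Method(lambda p: p.send("rho") * math.sin(p.send("theta"))),
}, parents=["parent"])

cartesian = SelfObject({
    "parent": cartesian_traits,
    "x": 3, "x:": Assignment("x"),
    "y": 4, "y:": Assignment("y"),
}, parents=["parent"])
polar = SelfObject({  # hypothetical coordinates
    "parent": polar_traits,
    "rho": 2.0, "theta": 0.0,
}, parents=["parent"])

other = cartesian.clone()
other.send("x:", 10)  # changes the clone only
```

Note that every object in this sketch carries its own table of slot names and contents, which is the per-object cost a naive implementation of the prototype model would pay.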



3.1. Maps
A naive implementation of SELF’s prototype object model would waste space. If SELF were based on classes, the class objects would contain the format (names and locations of the instance variables), methods, and superclass information for all their instances; the instances would contain only the values of their instance variables and a pointer to the shared class object. Since SELF uses the prototype model, each object must define its own format, behavior, and inheritance, and presumably an implementation would have to represent both the class-like format, method, and inheritance information and the instance-like state information in every SELF object.

Luckily, we can regain the storage efficiency of classes even in SELF’s prototype object model. Few SELF objects have totally unique format and behavior. Almost all objects are created by cloning some other object and then modifying the values of the assignable slots. Wholesale changes in the format or inheritance of an object, such as those induced by the programmer, can only be accomplished by invoking special primitives. We say that a prototype and the objects cloned from it, identical in every way except for the values of their assignable slots, form a clone family.

We have invented maps as an implementation technique to efficiently represent members of a clone family. In our SELF object storage system, objects are represented by the values of their assignable slots, if any, and a pointer to the object’s map; the map is shared by all members of the same clone family. For each slot in the object, the map contains the name of the slot, whether the slot is a parent slot, and either the offset within the object of the slot’s contents (if it’s an assignable slot) or the slot’s contents itself (if it’s a constant slot, such as a non-assignable parent slot). If the object has code (i.e., is a method), the map stores a pointer to a SELF byte code object representing the source code of the method (byte code objects are described in section 4).

Maps are immutable so that they may be freely shared by objects in the same clone family. However, when the user changes the format of an object or the value of one of an object’s constant slots, the map no longer applies to the object. In this case, a new map is created for the changed object, starting a new clone family. The old map still applies to any other members of the original clone family.

[Figure, two panels. Without Maps: cartesian point traits and two cartesian points. With Maps: cartesian point traits map, cartesian point traits, cartesian point map, and two cartesian points.]

An example of the representations for two cartesian points and their parent. Without maps, each slot would require at least two words: one for its name and another for its contents. This means that each point would occupy at least 10 words. With maps, each point object only needs to store the contents of its assignable slots, plus one more word to point to the map. All constant slots and all format information are factored out into the map. Maps reduce the 10 words per point to 3 words. Since the cartesian point traits object has no assignable slots, all of its data are kept in its map.

From the implementation point of view, maps look much like classes, and achieve the same sorts of space savings for shared data. But maps are totally transparent at the SELF language level, simplifying the language and increasing expressive power by allowing objects to change their formats at will. In addition, the map of an object conveys its static properties to the SELF compiler. Section 5 explains how the compiler can exploit this information to optimize SELF code.
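The division of labor between an object and its map can be sketched in a few lines of Python. This is a minimal illustration assuming hypothetical names (`Map`, `MappedObject`, `set_constant`), not the layout used by the actual system; assignment slots are left out of the map for brevity, and `words()` follows the simplified count of the figure caption (one map pointer plus the assignable slots).

```python
class Map:
    """Shared, never-mutated format description for one clone family."""

    def __init__(self, slots):
        # slot name -> ("const", contents) or ("data", offset into values)
        self.slots = dict(slots)


class MappedObject:
    def __init__(self, map_, values):
        self.map = map_             # one word: pointer to the shared map
        self.values = list(values)  # one word per assignable slot

    def clone(self):
        # Clones share the map and copy only the assignable slot values.
        return MappedObject(self.map, self.values)

    def get(self, name):
        kind, info = self.map.slots[name]
        return info if kind == "const" else self.values[info]

    def set(self, name, value):
        kind, offset = self.map.slots[name]
        if kind != "data":
            raise TypeError(name + " is a constant slot")
        self.values[offset] = value

    def set_constant(self, name, value):
        # The old map no longer applies: build a new one instead of mutating
        # it, which starts a new clone family for this object alone.
        changed = dict(self.map.slots)
        changed[name] = ("const", value)
        self.map = Map(changed)

    def words(self):
        return 1 + len(self.values)  # map pointer plus assignable slots


cartesian_traits = MappedObject(Map({}), [])  # no assignable slots at all
point_map = Map({
    "parent": ("const", cartesian_traits),
    "x": ("data", 0),
    "y": ("data", 1),
})
p1 = MappedObject(point_map, [3, 4])
p2 = p1.clone()
p2.set("x", 7)
```

Here `p1` and `p2` share one map and each occupies three words under the caption's count, while a call such as `p2.set_constant("parent", ...)` would leave `p1` on the original map and move `p2` to a fresh one.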



3.2. Segregation
A common operation of the memory system is to scan all object references for those that meet some criterion:
• The scavenger scans all objects for references to objects in from-space.
• The reflective object modification and programming primitives have to redirect all references to an object if its size changes and it has to be moved.
• The browser may want to scan all objects for those that contain a reference to a particular object that interests the SELF user.

To support these and other functions, our SELF implementation has been designed for rapid scanning of object references.

[Figure: A SELF Memory Space. The object reference area grows upward; it contains all object references but no confusing byte arrays.]

Since the elements of byte arrays are represented using packed bytes rather than tagged words, byte array elements may masquerade as object references. Smalltalk systems typically handle this problem by scanning the heap object-by-object rather than word-by-word. For each object, the system checks to see whether the object contains object references or only bytes. Only if the object contains object references does the system scan the object for matching references, iterating up to the length of the object. Then the scanner proceeds to the next object. This procedure avoids the problems caused by scanning byte arrays, but slows down the scan with the overhead to parse object headers and compute object lengths.

In our SELF system, we avoid the problems associated with scanning byte arrays without degrading the object reference scanning speed by segregating the byte arrays from the other SELF objects. Each Generation Scavenging memory space is divided into two areas, one for byte arrays and one for objects with references. To scan all object references, only the object reference area of each space needs to be scanned. This optimization speeds scans in two ways: byte array objects are never scanned, and object headers are never parsed.

To avoid slowing the tight scanning loop with an explicit end-of-space check, the word after the end of the space is temporarily replaced with a sentinel reference that matches the scanning criterion. The scanner checks for the end of the space only on a matching reference, instead of on every word. Early measurements on 68020-based Sun-3/50’s showed that our SELF system scanned memory at the rate of approximately 3 megabytes per second. Measurements of the fastest Smalltalk-80 implementation on the same machine indicated a scanning speed for non-segregated memory spaces of only 1.6 megabytes per second.

For some kinds of scans, such as finding all objects that refer to a particular object, the scanner needs to find the objects that contain a matching reference, rather than the reference itself. Our system can perform these types of searches nearly as fast as a normal scan. We use a special tag for the first header word of every object (called the mark word) to identify the beginning of the object. The scanner proceeds normally, searching for matching references. Once a reference is found, the object containing the reference can be found by simply scanning backwards to the object’s mark word, and then converting the mark’s address into an object reference.

3.3. Object Formats
A SELF memory space is organized as a linear array of aligned 32-bit words. Each word contains a low-order 2-bit tag field, used to interpret the remaining 30 bits of information. A reference to an integer or floating point number encodes the number directly in the reference itself. Converting between a tagged integer immediate and its corresponding hardware representation requires only a shift instruction. Adding, subtracting, and comparing tagged integers require no conversion at all.
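The claim that tagged integers need only a shift to convert, and no conversion at all to add, subtract, or compare, can be checked with a small Python sketch. It assumes the integer tag is 00 in the two low-order bits, models 32-bit words with an explicit mask, and ignores overflow detection; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
TAG_BITS = 2
INT_TAG = 0b00  # assumed tag for integer immediates


def to_signed32(word):
    """Reinterpret a 32-bit word as a signed machine integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def tag_int(n):
    """Hardware integer -> tagged immediate: a single left shift."""
    assert -(1 << 29) <= n < (1 << 29), "must fit in 30 bits"
    return (n << TAG_BITS) & MASK32


def untag_int(word):
    """Tagged immediate -> hardware integer: a single arithmetic right shift."""
    assert (word & 0b11) == INT_TAG
    return to_signed32(word) >> TAG_BITS


def add_tagged(a, b):
    # (4x) + (4y) == 4(x + y): the sum is already correctly tagged.
    return (a + b) & MASK32


def sub_tagged(a, b):
    return (a - b) & MASK32


def less_tagged(a, b):
    # Scaling both operands by 4 preserves their order.
    return to_signed32(a) < to_signed32(b)


a = tag_int(3)
b = tag_int(-5)
total = add_tagged(a, b)       # tagged form of -2
difference = sub_tagged(a, b)  # tagged form of 8
```

Multiplication and division are not covered by the text above and would need an extra shift to keep the result tagged, so they are left out of the sketch.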



[Figure: the four tag formats. Bits 31 to 2 hold the payload and bits 1 to 0 hold the tag.]

    payload (bits 31..2)                     tag   meaning
    30-bit signed integer                    00    integer immediate (or virtual machine address)
    top 30 bits of word-aligned address      01    reference to SELF heap object
    30 bits of IEEE floating point number    10    floating point immediate (or v. m. address)
    scavenging fields and hash field         11    mark header word (begins SELF heap object)

References to other SELF objects and references to map objects embed the address of the object in the reference (remember that there is no object table). The remaining tag format is used to mark the first header word of each object, as required by the scanning scheme discussed in the previous subsection. Pointers to virtual machine functions and other objects not in the SELF heap are represented using raw machine addresses; since their addresses are at least 16-bit half-word aligned the scavenger will interpret them as immediates and won’t try to relocate them.

[Figure: layouts of an object with slots, an object array, and a byte array.]

Each object begins with two header words. The first word is the mark word, marking the beginning of the object. The mark contains several bitfields used by the scavenger and an immutable bitfield used by the SELF hash primitive. The second word is a tagged reference to the object’s map. A SELF object with assignable slots contains additional words to represent their contents. An array object contains its length (tagged as a SELF integer to prevent interactions with scavenging and scanning) and its elements (either 32-bit tagged object references or 8-bit untagged bytes, padded out to the nearest 32-bit boundary).

The representation of a map is similar. Map objects begin with mark and map words. All map objects share the same map, called the “map map.” The map map is its own map. All maps in new-space are linked together by their third words; after a scavenge the system traverses this list to finalize inaccessible maps. The fourth word of a map contains the virtual machine address of an array of function pointers;* these functions perform format-dependent operations on objects or their maps.

For maps of objects with slots, the fifth word specifies the size of the object in words. The sixth word indicates the number of slots in the object. The next two words contain a change dependency link for the map, described in section 6.1. These four words are tagged as integers. If the map is for a method, the ninth word references the byte code object representing the method’s source code.

[Figure: layouts of the map for a data object, the map for a method, and the map for an array, showing the dependency link, byte code reference, and slot descriptions.]

* This function pointer array is exactly the virtual function array generated by the C++ compiler.
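With the mark tag reserved for the first header word of every object, the scanning scheme of section 3.2 (sentinel word plus backward search for the mark) can be sketched as follows. This is a hedged Python illustration, not the system's scanner: the space is modeled as a list of integers, the criterion is assumed to be equality with a single target reference, the function returns word indices rather than object references, and all concrete word values are made up.

```python
TAG_MASK = 0b11
MARK_TAG = 0b11  # tag of the first header word of every object


def is_mark(word):
    return (word & TAG_MASK) == MARK_TAG


def find_holders(space, used, target):
    """Return the indices of the mark words of objects that refer to target.

    space:  list of tagged words; space[0:used] is the object reference area
    used:   number of words in use; space[used] must exist as spare room
    target: a tagged heap reference (tag 01), so it never equals a mark word
    """
    saved = space[used]
    space[used] = target  # sentinel: guarantees that the inner loop stops
    holders = []
    try:
        i = 0
        while True:
            while space[i] != target:  # tight loop, no end-of-space check
                i += 1
            if i == used:              # end test happens only on a match
                break
            j = i
            while not is_mark(space[j]):  # walk back to the mark word
                j -= 1
            if not holders or holders[-1] != j:
                holders.append(j)
            i += 1
    finally:
        space[used] = saved  # put the overwritten word back
    return holders


MAP_REF = 0x101  # hypothetical heap reference (tag 01)
TARGET = 0x205   # hypothetical heap reference (tag 01)
space = [
    0x03, MAP_REF, 12, TARGET,      # object starting at word 0
    0x07, MAP_REF, 8,               # object starting at word 4
    0x0B, MAP_REF, TARGET, TARGET,  # object starting at word 7
    0,                              # spare word after the end of the space
]
holders = find_holders(space, 11, TARGET)
```

The sketch relies on two properties stated in the text: byte arrays live elsewhere, so every word in the scanned area is a tagged word, and only mark words carry the 11 tag, so the backward walk cannot stop inside an object.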



Finally, the map includes a five-word description for each of the object’s slots. The first word points to the SELF string object representing the name of the slot; the next word describes both the type of the slot (either constant data slot, assignable data slot, or assignment slot) and whether the slot is a parent slot.* The third word of a slot description contains either the contents of the slot (if it’s a constant slot), the offset within the object of the contents of the slot (if it’s an assignable data slot), or the index of the corresponding data slot (if it’s an assignment slot). The last two words of each slot contain a change dependency link for that slot, described in section 6.1.

* In SELF parents are prioritized; the priority of a parent slot is stored in the second word of the slot description.

    word(s)   constant slot description   data slot description   assignment slot description
    1         slot name                   slot name               slot name
    2         slot type                   slot type               slot type
    3         slot contents               slot offset             data slot index
    4-5       dependency link             dependency link         dependency link

[Figure: two cartesian point objects and their shared map.]

    point objects        first point   second point
    word 0               mark          mark
    word 1               map           map
    word 2 (contents)    3             7.5
    word 3 (contents)    4             -24

    shared map           value
    mark                 ...
    map                  map map
    scavenging link      ...
    function array       ...
    object length        4
    slot count           5
    dependency link      ...

    slot name   slot type            third word
    'parent'    const. parent slot   slot contents: cart. point traits
    'x'         data slot            slot offset: 2
    'y'         data slot            slot offset: 3
    'x:'        assignment slot      data slot index: 1
    'y:'        assignment slot      data slot index: 2
    (each slot description ends with its dependency link)

The representation of two cartesian point objects. The objects on the left are the point “instances,” containing the values of the x and y assignable data slots. The right object is the shared map for all cartesian points, containing the value of the constant parent slot and the offsets of the assignable x and y slots.

From the above object formats, we can determine that the total space cost to represent a clone family of n objects (each with s slots, a of which are assignable) is (2 + a)n + 5s + 8 words. For the simple cartesian point example, s is 5 (x, x:, y, y:, and parent) and a is 2 (x and y), leading to a total space cost to represent all point objects of 4n + 33 words. Published accounts of Smalltalk-80 systems [DS84, Ung86] indicate that these systems use at least two extra words per object: one for its class pointer and another for either its address or its hash code and flags. Therefore, maps allow objects in a prototype-based system like SELF to be represented just as space-efficiently as objects in a class-based system like Smalltalk.
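The space formula can be reproduced term by term from the formats in this section: two header words plus a words of contents per object, five words per slot description, and eight fixed map words (mark, map, scavenging link, function array, object size, slot count, and the two-word dependency link). The short Python sketch below does the arithmetic; the function names are hypothetical, and `naive_words` uses only the rough two-words-per-slot estimate from the caption in section 3.1.

```python
OBJECT_HEADER_WORDS = 2     # mark word and map pointer
SLOT_DESCRIPTION_WORDS = 5  # name, type, third word, two-word dependency link
MAP_FIXED_WORDS = 8         # the eight fixed map words listed above


def clone_family_words(n, s, a):
    """Words for n objects with s slots each, a of them assignable."""
    return ((OBJECT_HEADER_WORDS + a) * n
            + SLOT_DESCRIPTION_WORDS * s
            + MAP_FIXED_WORDS)


def naive_words(n, s):
    """Rough cost without maps: a name word and a contents word per slot."""
    return 2 * s * n


# Cartesian points: s = 5 (x, x:, y, y:, parent), a = 2 (x and y).
map_only = clone_family_words(0, 5, 2)           # the 33-word constant term
thousand_points = clone_family_words(1000, 5, 2)
thousand_naive = naive_words(1000, 5)
```

Under these estimates the shared map pays for itself quickly: 4n + 33 drops below 10n once the clone family has six members.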



4. The Parser
To minimize parsing overhead, textual SELF programs method object
(prototype activation record)
are parsed once when entered into the system, generating
SELF-level byte code objects, much like Smalltalk
CompiledMethod instances. Each method object
represents its source code by storing a reference to the
pre-parsed byte code object in the method’s map; all
cloned invocations of the method thus share the same
byte code object. A byte code object contains a byte
array holding the byte codes for the source, and an
object array holding the message names and object
literals used in the source. Each byte code in the byte
array represents a single byte-sized virtual machine SELF SEND #I0 (x)
instruction, and is divided into two parts: a 3-bit opcode and a 5-bit object array index. The opcodes are specified as if for execution by a stack-oriented interpreter; in actuality, our SELF compiler dynamically translates byte code objects into native machine instructions just prior to execution. The only opcodes used to represent SELF programs are the following:

SELF
    push self onto the execution stack
LITERAL <value index>
    push a literal value onto the execution stack
SEND <message name index>
    send a message, popping the receiver and arguments off the execution stack and pushing the result
SELF SEND <message name index>
    send a message to self, popping the arguments off the execution stack and pushing the result
SUPER SEND <message name index>
    send a message to self, delegated to all parents, popping the arguments off the execution stack and pushing the result
DELEGATEE <parent name index>
    delegate the next message send to the named parent
NON-LOCAL RETURN
    execute a non-local return from the lexically-enclosing method activation
INDEX-EXTENSION <index extension>
    extend the next index by prepending the index extension

The index for the opcodes is an index into the accompanying object array. The 5-bit offset allows the first 32 message names and literals to be referred to directly; indices larger than 32 are constructed using extra INDEX-EXTENSION instructions.

[Figure: the byte code object for the point print method; recoverable byte codes: SEND #1 (print), LITERAL #2, SEND #1 (print), SELF SEND #3 (y), SEND #1 (print).]

The representation of the point print method. The top object is the prototype activation record, containing placeholders for the local slots of the method (in this case, just the self slot) plus a reference to the byte code object representing the source code (actually stored in the method’s map). The byte code object contains a byte array for the byte codes themselves, and a separate object array for the constants and message names used in the source code.

In SELF source code, primitive operations are invoked with the same syntax used to send a message, except that the message name begins with an underscore (“_”). Every call of a primitive operation may optionally pass in a block to be invoked if the primitive fails by appending IfFail: to the message name. If invoked, the block is passed an error code identifying the nature of the failure (e.g. overflow, divide by zero, or incorrect argument type). The normal SEND byte codes are used to represent all primitive operation invocations, simplifying the byte codes and facilitating extensions to the set of available primitive operations. By contrast, Smalltalk-80 primitives are invoked by number rather than name, and may only be called at the beginning of a method. The rest of the method is executed if the primitive fails, without any indication of why the primitive failed.

The byte codes needed to express SELF programs fall into only three classes: base values (LITERAL and SELF), message sends, and non-local return. This small number results from both the simplicity and elegance of the SELF language and the lack of elaborate space-saving encodings. Smalltalk-80 defines a much larger set of byte codes [GR83], tuned to minimize space and

56 OOPSLA‘89 Proceedings October 1-6, 1989


maximize interpretation speed, and includes byte codes to fetch and store local, instance, class, pool, and global variables and shortcut byte codes for common-case operations such as loading constants like nil, true, and 0.

Smalltalk-80 systems also use special control flow byte codes to implement common boolean messages like ifTrue:ifFalse: and whileTrue:; the Smalltalk parser translates the message sends into conditional and unconditional branch byte codes, open-coding the argument blocks. Similarly, the == message is automatically translated into the identity comparison primitive operation byte code. A similar optimization is included for messages like + and <, which the parser translates into special byte codes. When executed, these byte codes either directly invoke the corresponding integer primitive operation (if the receiver is an integer), or perform the message send (if the receiver isn’t an integer).

Although this special processing for common messages may significantly improve the performance of existing Smalltalk systems, especially interpreted ones, they violate the extensible and flexible spirit of Smalltalk:

• The source code for the hard-wired methods is relegated to documentation, and all changes to the hard-wired source code are ignored by the system. Any definitions of ifTrue:ifFalse:, whileTrue:, and == for other types of objects are ignored.

• The receiver of an ifTrue:ifFalse: message must evaluate to either the true or the false object at run-time and the arguments must be block literals at parse-time; the receiver and argument to whileTrue: must be block literals at parse-time, and the receiver block must evaluate to either the true or the false object at run-time.

• Perhaps the worst aspect of these parser “optimizations” is that they tempt programmers to select inappropriate control structures like whileTrue: instead of to:do: to obtain the performance of the hard-wired message.

In effect, these hard-wired messages have become the non-object-oriented built-in operators of Smalltalk. Our SELF system incorporates none of these tricks. Instead our compilation techniques achieve better performance without compromising the language’s conceptual simplicity and elegance, preserving the message passing model for all messages.

5. The Compiler

The SELF compiler is a significant part of our efficient implementation [CU89]. It is similar to the Deutsch-Schiffman translator described in [DS84] (and implemented in the ParcPlace Smalltalk-80 system) in that it supports dynamic translation of byte-coded methods into machine code transparently on demand at run-time, and it uses an inline caching technique to reduce the cost of non-polymorphic message sends. However, although the Deutsch-Schiffman system is the fastest Smalltalk system (as of July 1989), it still runs about 10 times slower than optimized C. By combining traditional optimizing compiler technology, techniques from high-performance Smalltalk systems, and some critical new techniques we developed, our SELF compiler has already achieved a level of performance more than twice as fast as the Deutsch-Schiffman system, and only 4 to 5 times slower than optimized C. We hope that our second-generation system under construction (and described in section 7) will achieve even better levels of performance.

The main obstacle to generating efficient code from Smalltalk programs, as many people have noted before [Atk86, JGZ88, BMW86], is that very little static type information is available in the Smalltalk source. Only literal constants have a known class at compile-time; without detailed analysis, no other types are known. Type inferencing is difficult for Smalltalk programs, especially when the compiler is using the inferred types to improve performance [Suz81, BI82, Cur89]. Even if the Smalltalk programmer were willing to sacrifice many of the benefits of his exploratory programming environment and annotate his programs with static type declarations, designing an adequate type system for Smalltalk would be hard [Atk86, JGZ88]; the more flexible the type system, the smaller the performance improvement possible and the smaller the reward for including type declarations in the first place.

SELF programs are even harder to compile efficiently than Smalltalk programs. All the problems of missing static type information that Smalltalk compilers face are also faced by our SELF compiler. In addition, all variables in SELF are accessed by sending messages, rather than being explicitly identified as variables in the source code and byte codes. And since there are no classes in SELF, some of the class-based techniques used



to optimize Smalltalk programs, such as inline caching, type inferencing, and static type checking, cannot be directly used in our SELF system.

Rather than compromising the flexibility of SELF programs with a static type system, or compromising the execution speed of programs by interpreting dynamic type information, we have developed compilation techniques that automatically derive much of the type information statically specified in other type systems. By combining this extra information with a few general-purpose techniques from optimizing compilers for traditional languages like Fortran and C, our compiler achieves good performance without sacrificing any of the comforts of an interactive, exploratory programming environment: fast turnaround for programming changes, complete source-level debugging, and a simple, elegant programming language unfettered by static type declarations. The next few subsections summarize our new compilation techniques; a more detailed discussion may be found in [CU89].

5.1. Customized Compilation

The Deutsch-Schiffman Smalltalk-80 system compiles a single machine code method for a given source code method. Since many classes may inherit the same method, the Smalltalk-80 compiler cannot know the exact class of the receiver. Our SELF compiler, on the other hand, compiles a different machine code method for each type of receiver that runs a given source method. The advantage of this approach is that our SELF compiler can know the type of the receiver of the message at compile-time, and can generate much better code for each of the specific versions of a method than it could for a single general-purpose compiled method. We call this technique of dynamic translation of multiple specially-compiled methods for a single source-code method customized compilation.

Consider the min: method defined for all objects:

min: arg = (
    < arg ifTrue: [self] False: [arg] ).

This method could be invoked on integers, floating point numbers, strings, or any other objects that can be compared using <. Like other dynamic compilation systems, our SELF system waits until the min: method is first invoked before compiling any code for this method. Other systems would compile this method once for all receiver and argument types, which would require generating the code for a full message dispatch to select the right < comparison routine. Since our SELF compiler generates a separate compiled version for each receiver type, it can customize the version to that specific receiver type, and use the new-found type information to optimize the < message.

Let’s trace the operations of our SELF compiler to evaluate the expression i min: j, where i contains an integer at run-time. Assuming this is the first time min: has been sent to an integer, our compiler will generate code for a version of min: that is customized for integer receivers. The compiler first builds the following internal flow graph (expensive operations are in bold face):*

[Figure: initial flow graph for min:; recoverable node labels: send <, push [self], create-closure, send ifTrue:False:.]

Many of the expensive operations can be eliminated by inlining messages sent to receivers of known type, as described next.

* To simplify the discussion, message sends that access local slots within the executing activation record (e.g. arguments) are assumed to be replaced with local register accesses immediately.
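The customization strategy of this subsection can be illustrated with a small Python sketch. It is not the SELF compiler: `code_cache`, `compile_customized`, and `send` are hypothetical names, Python's `type()` stands in for the receiver's map, and "compiling" merely builds a closure. The sketch only shows the bookkeeping: one compiled version per (receiver type, message name) pair, created lazily on the first send.

```python
# Hypothetical stand-ins: the cache is keyed by (receiver type, selector),
# so each receiver type gets its own compiled version of a source method.
code_cache = {}
compile_log = []  # records which customized versions were compiled


def compile_customized(selector, receiver_type):
    """Pretend to compile `selector` specialized for one receiver type."""
    compile_log.append((selector, receiver_type.__name__))
    if selector == "min:":
        # A real compiler would use the known receiver type here to
        # look up and inline '<' at compile time.
        def compiled_min(receiver, arg):
            return receiver if receiver < arg else arg
        return compiled_min
    raise LookupError(selector)


def send(receiver, selector, arg):
    """Dispatch a message, compiling a customized version on first use."""
    key = (type(receiver), selector)
    method = code_cache.get(key)
    if method is None:
        method = compile_customized(selector, type(receiver))
        code_cache[key] = method
    return method(receiver, arg)
```

Sending `min:` to two integers and then to two floating point numbers would, under these assumptions, leave two entries in `code_cache`, one per receiver type.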



5.2. Message Inlining

Our compiler uses sources of type information, such as the types of source-code literals and the type of self gleaned from customized compilation, to perform compile-time message lookup and message inlining. If the type of the receiver of a message is known at compile-time, the compiler can perform the message lookup at compile-time rather than wait until run-time. If this lookup is successful, which it will be in the absence of dynamic inheritance and programming errors, our compiler will do one of the following:

• If the slot contains a method, the compiler will inline the body of the method at the call site, if the method is short enough and nonrecursive.

• If the slot contains a block value method, the compiler will inline the body of the block value method at the call site, if it is short enough. If after inlining there are no remaining uses of the block object, the compiler will eliminate the code to create the block at run-time.

• If the slot is a constant data slot, the compiler will replace the message send with the value of the slot (a constant known at compile-time).

• If the slot is an assignable data slot, the compiler will replace the message send with code to fetch the contents of the slot (e.g. a load instruction).

• If the slot is an assignment slot, the compiler will replace the message send with code to update the contents of the slot (e.g. a store instruction).

After inlining all messages sent to receivers of known type, the compiler will have inlined all messages that in an equivalent Smalltalk program would have been variable references or assignments, thus eliminating the overhead in SELF of using message passing to access variables. In addition, many more messages have been inlined that in a Smalltalk system would have remained full message sends.

For example, in the version of min: customized for integers, the compiler can statically look up the definition of < defined for integers:

< arg = (
    _IntLTPrim: arg IfFail: [...] ).

This method simply calls the integer less-than primitive with a failure block (omitted here for brevity). The compiler inlines this < method to get the following flow graph:

[Figure: flow graph after inlining the < method; recoverable node labels: push self, push arg, create-closure, create-closure, send ifTrue:False:.]

The overhead for sending the < message has been eliminated, but calling a procedure to compare integers is still expensive. The next section explains how our compiler open-codes common primitive built-in operations to further increase performance.

5.3. Primitive Inlining

Primitive inlining can be viewed as a simpler form of message inlining. Calls to primitive operations are normally implemented using a simple procedure call to an external function in the virtual machine. However, like most other high-performance systems, including some Smalltalk systems [DS84, JGZ88], our SELF compiler replaces calls of certain common primitives, such as integer arithmetic, comparisons, and array accesses, with their hard-wired definitions. This significantly improves performance since some of these primitives can be implemented in two or three machine instructions if the overhead of the procedure call is removed. If the arguments to a side-effect-free primitive, such as an arithmetic or comparison primitive, are known at compile-time, the compiler actually calls the primitive at compile-time, replacing the call to the primitive with the result of the primitive; this is SELF’s form of constant folding.
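The constant-folding rule just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the compiler's actual data structures: the node classes `Const`, `Var`, and `PrimCall` are hypothetical, `_IntAddPrim:` is an invented primitive name used alongside the paper's integer less-than primitive, and overflow and failure blocks are ignored.

```python
import operator
from dataclasses import dataclass

# Hypothetical table of side-effect-free integer primitives.
PURE_PRIMITIVES = {
    "_IntLTPrim:": operator.lt,
    "_IntAddPrim:": operator.add,  # invented name, for illustration only
}


@dataclass(frozen=True)
class Const:
    value: object  # a value known at compile time


@dataclass(frozen=True)
class Var:
    name: str  # a value known only at run time


@dataclass(frozen=True)
class PrimCall:
    primitive: str
    receiver: object
    arg: object


def fold(node):
    """Replace pure primitive calls on integer constants by their result."""
    if not isinstance(node, PrimCall):
        return node
    receiver, arg = fold(node.receiver), fold(node.arg)
    fn = PURE_PRIMITIVES.get(node.primitive)
    both_int_consts = all(
        isinstance(n, Const) and type(n.value) is int for n in (receiver, arg)
    )
    if fn is not None and both_int_consts:
        # Call the primitive now, at "compile time".
        return Const(fn(receiver.value, arg.value))
    return PrimCall(node.primitive, receiver, arg)
```

A call whose operands are both constants collapses to a `Const`; a call with a run-time operand is kept, with any foldable subexpressions already folded.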



In our ongoing min: example, the compiler inlines the _IntLTPrim:IfFail: call (the definition of the integer less-than primitive, but not the integer less-than method, is hard-wired into the compiler) to get the flow graph:

[Figure: flow graph after primitive inlining; recoverable node labels: and #3, arg; bzero; cmp self, arg; push [self]; create-closure; push [arg]; create-closure.]

The first compare-and-branch sequence verifies that the argument to the _IntLTPrim:IfFail: call is also an integer (the receiver is already known to be an integer courtesy of customization); if not, the failure block is created and invoked. If the argument is an integer, then the two integers are compared, and either the true object or the false object is returned as the result of the < message.

The next message considered by our compiler is the ifTrue:False: message. If arg is an integer (the common case), the receiver of ifTrue:False: will be either true or false; otherwise it will be the result of the value message (unknown at compile-time). Normally, this would prevent inlining of the ifTrue:False: message, since the type of its receiver cannot be uniquely determined. However, by compiling multiple versions of the ifTrue:False: message, one version for each statically-known receiver type, our SELF compiler can handle and optimize each case separately. This technique is explained next.

5.4. Message Splitting

When type information is lost because the flow of control merges (such as happens just prior to the ifTrue:False: message in the min: example), our SELF compiler may elect to split the message following the merge into separate messages at the end of each of the preceding branches; the merge is postponed until after the split message. The compiler knows the type of the receiver for some of the copies of the message, and can perform compile-time message lookup and message inlining to radically improve performance for these versions. The proper semantics of the original unsplit message is preserved by compiling a real message send along those branches with unknown receiver types. Message splitting can be thought of as an extension to customized compilation, by customizing individual messages along particular control flow paths, with similar improvements in run-time performance.

For the min: example, the SELF compiler will split the ifTrue:False: message into three separate versions:

[Figure: flow graph after splitting the ifTrue:False: message into three copies.]



Now the compiler can inline the definition of ifTrue:False: for the true object:

ifTrue: trueBlk False: falseBlk = (
    trueBlk value ).

and for the false object:

ifTrue: trueBlk False: falseBlk = (
    falseBlk value ).

to get to the following flow graph:

[Figure: flow graph after inlining ifTrue:False: for the true and false objects; the two known branches end in [self] value and [arg] value, and the remaining branch still ends in send ifTrue:False:.]

The two value messages can be inlined, replaced by the bodies of the blocks. Since none of the receiver and arguments of the inlined ifTrue:False: messages need to be created at run-time any more, the compiler eliminates them from the control flow graph, producing the following flow graph:

[Figure: flow graph after eliminating the unneeded closure creations; recoverable node labels on the remaining branch: push [...], create-closure, send value, push [self], create-closure, push [arg], create-closure.]

Let’s assume that the failure block for integer comparisons is too complex to inline away. The compiler won’t inline the value message, and so the value message’s result type is unknown at compile-time. Thus the receiver type of the ifTrue:False: message is unknown, and a simple SELF compiler wouldn’t be able to inline this message away either. However, the next subsection describes how our compiler uses known patterns of usage to predict that the receiver of the ifTrue:False: message will be a boolean and optimizes the message accordingly.



5.5. Type Prediction

When the type of the receiver of a message is unknown at compile-time, the SELF compiler uses static type prediction to generate better code for some common situations. Certain messages are known to the compiler to be likely to be sent to receivers of certain types: + and < are likely to be sent to integers, and ifTrue:False: is likely to be sent to either true or false. The compiler generates a run-time test based on the expected type or value of the receiver, followed by a conditional branch to one of two sections of code; along the “success” branch, the type (or value) of the receiver is known (at compile-time), while along the “failure” branch, the type is unknown. The compiler then uses the message splitting techniques to split the predicted message, compiling a copy of the message along each branch. Because the compiler now knows the type of the receiver of the split message along the “success” branch, it can inline that version of the message away, significantly improving performance for common operations like integer arithmetic and boolean testing. A real message send is executed in the case that the prediction fails, preserving the original message’s semantics for all possible receivers.

This type prediction scheme requires little additional implementation work, since message splitting and inlining are already implemented. It is also much better than hard-wiring the ifTrue:ifFalse:, whileTrue:, ==, +, and < messages into the parser and compiler as Smalltalk-80 systems do, since it achieves the same sorts of performance improvements but preserves the message passing semantics of the language and allows the programmer to modify the definitions of all SELF methods, including those that are optimized through type prediction.

Let’s apply type prediction to the remaining ifTrue:False: message in the min: example. The compiler first inserts run-time tests for the true object and the false object, followed by several copies of the ifTrue:False: message (we’ll just look at the remaining unoptimized branch):

[Figure: the remaining unoptimized branch after type prediction; recoverable node labels: create-closure, send value, followed by run-time tests for true and false and three copies of ifTrue:False:.]

In the left branch, the receiver of ifTrue:False: is known to be the value true; for the middle branch, the receiver is known to be the value false. As before, the compiler inlines these two ifTrue:False: messages, plus the corresponding value messages, and eliminates the closure creations to get to the final flow graph for the entire method, pictured at the top of the next page.
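The code shape that type prediction produces for ifTrue:False: can be approximated by the following Python sketch. All names are hypothetical: `TRUE` and `FALSE` stand in for the SELF true and false objects, `full_send` models an ordinary dynamic message send, and the `counts` dictionary exists only to make the two kinds of path observable.

```python
class SelfTrue:
    def if_true_false(self, true_block, false_block):
        return true_block()


class SelfFalse:
    def if_true_false(self, true_block, false_block):
        return false_block()


TRUE, FALSE = SelfTrue(), SelfFalse()


def full_send(receiver, selector, *args):
    """Unoptimized path: a real message send with dynamic lookup."""
    return getattr(receiver, selector)(*args)


def predicted_if_true_false(receiver, true_block, false_block, counts):
    """Run-time tests for the predicted receivers, then split copies."""
    if receiver is TRUE:       # "success" branch: receiver known, send inlined
        counts["inlined"] += 1
        return true_block()
    if receiver is FALSE:      # second "success" branch
        counts["inlined"] += 1
        return false_block()
    counts["sent"] += 1        # "failure" branch: semantics preserved
    return full_send(receiver, "if_true_false", true_block, false_block)


class Unknown:
    """A user-defined receiver that the prediction does not cover."""
    def if_true_false(self, true_block, false_block):
        return "neither"
```

Because the fallback is a genuine send, a programmer-defined receiver such as `Unknown` still gets its own definition of the message, which is the property the hard-wired Smalltalk-80 byte codes give up.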



[Figure: final flow graph for the entire min: method; recoverable node labels: create-closure, send value, create-closure, create-closure, send ifTrue:False:.]

In the common case of taking the minimum of two integers, our compiler executes only two simple compare-and-branch sequences, for fast execution. A similar savings will be seen if the user calls min: on two floating point numbers or two strings, since our compiler customizes and optimizes special versions for each of these receiver types. But even in the case of taking the minimum of two values of different types, such as an integer and a floating point number, our compilation techniques preserve the message passing semantics of the original source code, and execute the source code faithfully.

6. Supporting the Programming Environment

Our SELF system supports a high-productivity programming environment. This environment requires both rapid turn-around time for programming changes and complete source-level debugging at the byte code level. These features must coexist with our optimizing compiler techniques, including message inlining. The next two subsections describe the compiler-maintained change dependency links that support incremental recompilation of compiled code affected by programming changes, and the compiler-generated debugging information that allows the debugger to reconstruct inlined stack frames at debug-time. This information is appended to each compiled method object in the compiled code cache.

[Figure: a compiled method (header, native machine code, scavenging info, dependency links), a scope description (slot locations), and a byte code mapping.]

A compiled method contains more than just instructions. It includes a list of the offsets within the instructions of embedded object references, used by the scavenger to modify the compiled code if a referenced object is moved. The compiled method includes dependency links to support selective invalidation. It also includes descriptions of the inlined method scopes, which are used to find the values of local slots of the method and to display source-level call stacks, and a bidirectional mapping between source-level byte codes and actual program counter values.



6.1. Support for Incremental Recompilation

A high-productivity programming environment requires that programming changes take effect within a fraction of a second. This is accomplished in our SELF system by selectively invalidating only those compiled methods that are affected by the programming change, recompiling them from new definitions when next needed. The compiler maintains two-way change dependency links between each cached compiled method and the slots that the compiled method depends on. The information used to compile code (object formats and the contents of non-assignable slots) is precisely the information stored in maps. Therefore we can confine our dependency links to maps. These links are formed in four ways:

• When a method is being compiled, the system creates a dependency link between the map slot description containing the method and the compiled code in case the definition of the method changes or its slot is removed.

• When the compiler inlines a message, the system creates a dependency link between the matching slot description (either a method slot, a data slot, or an assignment slot) and the compiled code in case the definition of the inlined method changes or its slot is removed.

• When the compiler searches a parent object during the course of a compile-time lookup, the system creates a dependency link between the slot description containing the parent and the compiled code in case the parent pointer changes and alters the result of the lookup.

• When the compiler searches an object unsuccessfully for a matching slot during compile-time lookup, the system creates a dependency link between the map of the object searched and the compiled code in case a matching slot is added to the object later.

These rules ensure that no out-of-date compiled methods survive programming changes, while limiting invalidations to those methods actually affected by a change.

A dependency link is represented by a circular list that connects a slot description or map to all dependent compiled methods. When the system changes the contents of a constant slot or removes a slot, it traverses the corresponding dependency list and invalidates all compiled code objects on the list. When the system adds a slot, it similarly traverses the map’s dependency list and invalidates linked compiled code objects. Links must be removed from their lists when a method is invalidated or a map is garbage-collected; lists are doubly-linked to speed these removals.

[Figure: the compiled code for integer min: and its dependency lists, linking it to slot descriptions and maps; the links are labeled with the changes they guard against: in case min: is changed, in case min: is added (map dependency), in case < is changed, in case < or min: is added (map dependency), in case a parent change affects min: lookup, in case a parent change affects < or min: lookups, and in case ifTrue:False: is changed.]

The dependency lists for the compiled min: method customized for integers. The gray line represents eight separate circularly-linked dependency lists. Each list connects a slot description to its dependent compiled code objects. If any of the map information linked to the compiled code changes, the compiled code for min: (and for any other compiled methods that depend on the same changed information) will be thrown away and recompiled when next needed.

Selective invalidation is complicated by methods that are executing when a programming change requires that they be invalidated. The methods cannot really be flushed, because they are still executing, and some code must exist. But neither can they remain untouched, since they have been optimized based on information that is no longer correct. One solution, which has not been implemented yet, would be to recompile executing methods immediately and to rebuild the execution stack for the new compiled methods. We do not know yet if this procedure would be fast enough to keep programming turn-around time short.
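The selective invalidation scheme of this subsection can be sketched in Python. This is an illustration under assumptions, not the system's representation: where the paper uses circular doubly-linked lists, the sketch uses dictionaries of sets, which give the same two-way links; `DependencyTable` and the tuple keys such as `("slot", "integer traits", "<")` are hypothetical names.

```python
from collections import defaultdict


class DependencyTable:
    """Two-way links between slot descriptions or maps and compiled code."""

    def __init__(self):
        self.code_cache = {}                  # method id -> compiled code
        self._dependents = defaultdict(set)   # key -> dependent method ids
        self._keys_of = defaultdict(set)      # method id -> keys (back links)

    def install(self, method_id, code, depends_on):
        """Cache compiled code and record everything it was compiled against."""
        self.code_cache[method_id] = code
        for key in depends_on:
            self._dependents[key].add(method_id)
            self._keys_of[method_id].add(key)

    def changed(self, key):
        """A slot or map changed: invalidate exactly the dependent methods."""
        invalid = sorted(self._dependents.pop(key, set()))
        for method_id in invalid:
            self._drop(method_id)
        return invalid

    def _drop(self, method_id):
        # Flush the code and unlink it from every other dependency set,
        # mirroring the removal of links when a method is invalidated.
        self.code_cache.pop(method_id, None)
        for key in self._keys_of.pop(method_id, set()):
            deps = self._dependents.get(key)
            if deps is not None:
                deps.discard(method_id)
                if not deps:
                    del self._dependents[key]
```

The back links in `_keys_of` play the role that double linking plays in the paper: they let an invalidated method be removed from all of its lists without searching them.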



6.2. Support for Source-Level Debugging

A good programming environment must include a source-level debugger. The SELF debugger presents the program execution state in terms of the programmer’s execution model: the state of the byte code interpreter, with no optimizations. This requires that the debugger be able to examine the state of the compiled, optimized SELF program, and construct a view of that state (the virtual state) in terms of the byte-coded execution model. Examining the execution state is complicated by having methods in the virtual call stack actually be inlined within other methods in the compiled method call stack, and by allocating the slots of virtual methods to registers and/or stack locations in the compiled methods. To allow the debugger to reconstruct the virtual call stack from the physical optimized call stack, the SELF compiler appends debugging information to each compiled method. For each scope compiled (the initial method, and any methods or block methods inlined within it), the compiler outputs information describing that scope’s place in the virtual call chain within the compiled method’s physical stack frame. For each argument and local slot in the scope, the compiler outputs either the value of the slot (if it’s a constant known at compile-time, as many slots are) or the register or stack location allocated to hold the value of the slot at run-time.

[Figure: scope descriptions for the min: method; the min: scope records self: r1 and arg: r2.]

The debugging information for the min: method. Each scope description points to its calling scope description (black arrows); a block scope also points to its lexically-enclosing scope description (gray arrows). For each slot within a scope, the debugging information identifies either the slot’s compile-time value or its run-time location. For the min: example, only the initial arguments have run-time locations (registers r1 and r2 in this case); all other slot contents are known statically at compile-time.

Our SELF compiler also outputs debugging information to support computing and setting breakpoints. This information takes the form of a bidirectional mapping between program counter addresses and byte code instructions within a particular scope. One complexity with this mapping is that it is not one-to-one: several byte codes may map to the same program counter address (as messages get inlined and optimized away), and several program counter addresses may map to the same byte code (as messages get split and compiled in more than one place). To determine the current state of the program in byte code terms at any program counter address, the debugger first finds the latest program counter address in the mapping that is less than or equal to the current program counter, and then selects the latest byte code mapped to that address; this algorithm returns the last byte code that has been started but not completed for any program counter address. The execution stack displayer uses this mapping information to find the bottommost virtual stack frame for each physical stack frame to display the call stack whenever the program is halted.

We have not implemented the breakpointing facilities in our debugger yet; the current “debugger” displays the virtual execution stack and immediately continues execution whenever the _DumpSelfStack primitive is called. However, our mapping system is designed to support computing and setting breakpoints in anticipation of breakpointing and process control primitives. To set a breakpoint at a particular source-level byte code, the debugger would find all those program counter addresses associated with the byte code and set breakpoints there. In cases where several byte codes map to the same program counter address, single stepping from one byte code to the next wouldn’t actually cause any instructions to be executed; the debugger would pretend to execute instructions to preserve the illusion of byte-coded execution.
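The lookup algorithm described above (latest mapped address not exceeding the current program counter, then the latest byte code at that address) and the reverse lookup for breakpoints can be sketched in Python. The class name `PcMap`, the method names, and the representation of the mapping as (address, byte code index) pairs are assumptions made for illustration; the paper does not specify the encoding.

```python
import bisect


class PcMap:
    """Many-to-many mapping between program counter addresses and byte codes."""

    def __init__(self, entries):
        # entries: iterable of (pc_address, byte_code_index) pairs.
        # Sorting by address, then by byte code index, places the latest
        # byte code for each address last among its equals.
        self._entries = sorted(entries)
        self._pcs = [pc for pc, _ in self._entries]

    def byte_code_at(self, pc):
        """Last byte code started but not completed at address `pc`."""
        i = bisect.bisect_right(self._pcs, pc)
        if i == 0:
            return None  # pc precedes every mapped address
        return self._entries[i - 1][1]

    def breakpoint_addresses(self, byte_code):
        """All addresses at which a breakpoint for `byte_code` would be set."""
        return sorted({pc for pc, bc in self._entries if bc == byte_code})
```

In this sketch, two byte codes sharing one address model messages that were inlined away, and one byte code appearing at two addresses models a message that was split and compiled in more than one place.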



7. Performance Comparison

SELF is implemented in 33,000 lines of C++ code and 1,000 lines of
assembler, and runs on both the Sun-3 (a 68020-based machine) and the
Sun-4 (a SPARC-based machine). We have written almost 9,000 lines of
SELF code, including a hierarchy of collection objects, a recursive
descent parser for SELF, and a prototype graphical user interface.

We compare the performance of our first-generation SELF implementation
with a fast Smalltalk implementation and the standard Sun optimizing C
compiler on a Sun-4/260 workstation. The fastest Smalltalk system
currently available (excluding graphics performance) is the ParcPlace
V2.4 Smalltalk-80 virtual machine, rated at about 4 Dorados* [PP88];
this system includes the Deutsch-Schiffman techniques described
earlier. We compare transliterations from C into Smalltalk and SELF of
the Stanford integer benchmarks [Hen88] and the Richards operating
system simulation benchmark [Deu88], as well as the following small
benchmarks, adapted from Smalltalk-80 systems [McC83]:

    sumToTest = ( 1 sumTo: 10000 ).
    sumTo: arg = (
        | total <- 0 |
        to: arg Do: [ | :index |
            total: total + index.
        ].
        total ).

    recurseTest = ( 14 recurse ).
    recurse = (
        = 0 ifFalse: [
            (- 1) recurse. (- 1) recurse.
        ] ).

We also rewrote most of the Stanford integer benchmarks in a more
SELFish programming style, using the first argument of a C function as
the receiver of the corresponding SELF method. Measurements for the
rewritten benchmarks are presented in columns labeled SELF'; times in
parentheses mark those benchmarks that were not rewritten.

The following table presents the actual running times of the
benchmarks on the specified platform. All times are in milliseconds of
CPU time, except for the Smalltalk times, which are in milliseconds of
real time; the real time measurements for the SELF system and the
compiled C program are practically identical to the CPU time numbers,
so comparisons in measured performance between the ParcPlace Smalltalk
system and the other two systems are valid.

                          Raw Running Times

              Smalltalk    SELF         SELF'        C
              (real ms)    (cpu ms)     (cpu ms)     (cpu ms)
perm          1559         660          420          120
towers        2130         900          [illegible]  [illegible]
queens        859          520          [illegible]  [illegible]
intmm         1490         970          [illegible]  160
puzzle        16510        5290         (5290)       770
quick         1239         [illegible]  [illegible]  110
bubble        2970         [illegible]  1230         170
tree          1760         1750         1480         820

richards      7740         2760         (2760)       730

sumToTest     25           18           (18)         4
recurseTest   169          52           (52)         32

The entries in the following table are the ratios of the running times
of the benchmarks for the given pair of systems. From our point of
view, bigger numbers are better in the first two columns, while
smaller numbers are better in the last two columns. The most
meaningful rows of the table are probably the rows for the median of
the Stanford integer benchmarks and the row for the Richards
benchmark.

                     Relative Performance of SELF

              Smalltalk/   Smalltalk/   Smalltalk/   SELF/        SELF'/
              SELF         SELF'        C            C            C
perm          [illegible]  3.7          13.0         [illegible]  3.5
towers        [illegible]  3.8          11.2         [illegible]  [illegible]
queens        [illegible]  1.8          [illegible]  [illegible]  [illegible]
intmm         1.5          [illegible]  [illegible]  6.1          (6.1)
puzzle        [illegible]  [illegible]  21.4         [illegible]  [illegible]
quick         [illegible]  [illegible]  11.3         [illegible]  [illegible]
bubble        1.8          2.4          17.5         9.8          7.2
tree          [illegible]  [illegible]  2.1          [illegible]  [illegible]

min           [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
median        [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
max           [illegible]  [illegible]  21.4         9.8          7.2

richards      2.8          (2.8)        10.6         3.8          (3.8)

sumToTest     1.4          (1.4)        6.2          4.5          (4.5)
recurseTest   3.2          (3.2)        5.3          1.6          (1.6)
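
Each ratio entry is one raw running time divided by another. As a cross-check, the following Python sketch recomputes three of the ratio columns for the richards, sumToTest, and recurseTest rows. It is an illustration only and not part of the original measurements; the names RAW_MS, ratio, and relative_performance are hypothetical.

```python
# Raw running times in milliseconds, copied from the "Raw Running Times"
# table. These three benchmarks were not rewritten, so SELF' equals SELF.
RAW_MS = {
    "richards":    {"Smalltalk": 7740, "SELF": 2760, "C": 730},
    "sumToTest":   {"Smalltalk": 25,   "SELF": 18,   "C": 4},
    "recurseTest": {"Smalltalk": 169,  "SELF": 52,   "C": 32},
}


def ratio(times, numerator, denominator):
    """Running time of one system divided by the running time of another."""
    return times[numerator] / times[denominator]


def relative_performance(times):
    """Return the ratios used as columns of the relative performance table."""
    return {
        "Smalltalk/SELF": ratio(times, "Smalltalk", "SELF"),
        "Smalltalk/C": ratio(times, "Smalltalk", "C"),
        "SELF/C": ratio(times, "SELF", "C"),
    }


for name, times in RAW_MS.items():
    rel = relative_performance(times)
    print(name, {column: f"{value:.2f}" for column, value in rel.items()})
```

For the Richards benchmark, for instance, 7740 / 2760 is about 2.8 and 2760 / 730 is about 3.8, which agrees with the published row.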

* A "Dorado" is a measure of the performance of Smalltalk
implementations. One Dorado is defined as the performance of an early
Smalltalk implementation in microcode on the 70ns Xerox Dorado
[Deu83]; until recently it was the fastest available Smalltalk
implementation.

Our SELF implementation outperforms the Smalltalk implementation on
every benchmark; in many cases SELF runs more than twice as fast as
Smalltalk. Not surprisingly, an optimizing C compiler does better than
the SELF compiler. Some of the difference in
performance results from significantly poorer implementation in the
SELF compiler of standard compiler techniques such as register
allocation and peephole optimization. Some of the difference may be
attributed to the robust semantics of primitive operations in SELF:
arithmetic operations always check for overflow, array accesses always
check for indices out of bounds, method calls always check for stack
overflow. The rest of the difference is probably caused by the lack of
type information, especially for arguments and assignable data slots.
We are remedying these deficiencies to a large extent in the
second-generation SELF system described in the next section.

The above table shows that the performance of object-oriented systems
is improving dramatically. As a new metric for comparing the
performance of these systems, we propose the millions of messages per
second (MiMS) measure, analogous to the millions of instructions per
second (MIPS) measure for processors. This number measures the
performance of an object-oriented system in executing messages. To
compute the MiMS rating of a system for a specific benchmark on a
particular hardware platform, divide the number of messages the
benchmark sends by its total running time. We define message sends as
those invocations whose semantics include a dispatch; for SELF, this
includes references to slots in the receiver ("instance variable"
accesses), since the same reference could invoke a method, but
excludes references to slots local to a method invocation ("local
variable" accesses), since these could never do anything other than
access data. We computed the MiMS rating of our first-generation SELF
system for the Richards benchmark on the SPARC-based Sun-4/260 to be
3.3 MiMS, or a message executed every 300ns [Lee88].

The efficiency of an object-oriented system is inversely proportional
to the number of instructions executed per message sent. The cycle
time on the Sun-4/260 is 60ns [Nam88], giving our SELF system a cost
per message of about 5 cycles. Since the SPARC has been clocked at 1.6
cycles per instruction [Nam88] (accounting for cache misses and
multicycle instructions), this would give our SELF system an
efficiency rating of around 3 instructions per message sent. We are
not aware of any other implementations of dynamically-typed
object-oriented languages that approach this level of efficiency.

Other researchers have attempted to speed Smalltalk systems by adding
type declarations to Smalltalk programs. Atkinson's Hurricane compiler
compiles a subset of Smalltalk annotated with type declarations
[Atk86]. He reports a performance improvement of a factor of 2 for his
Hurricane compiler over the Deutsch-Schiffman system on a 68020-based
Sun-3; our initial SELF system already achieves the same performance
improvement over the Deutsch-Schiffman system without type
declarations. Johnson's TS Typed Smalltalk system type-checks and
compiles Smalltalk-80 programs fully annotated with type declarations
[JGZ88]. He reports a performance improvement of a factor of between 5
and 10 over the Tektronix Smalltalk-80 interpreter on a 68020-based
Tektronix 4405. For a benchmark almost identical to our sumToTest
benchmark, he reports an execution time of 62ms, which we executed in
18ms on a machine 3 to 4 times faster than his machine. This makes his
system's performance roughly comparable to our system's performance,
even though his system relies on type declarations while ours does
not. These results suggest that our compilation techniques do a good
job of extracting as much type information as is available to these
other systems through programmer-supplied type declarations.

8. Future Work

SELF has not reached its final state. Although we have established the
feasibility and rewards of the implementation techniques described in
this paper, much work remains.

8.1. The Second-Generation SELF System

We are in the process of reimplementing our entire SELF system to
clean up our code, simplify our design, and include better compilation
algorithms. As of this writing (July 1989), we have completely
rewritten the object storage system and unified the
run-time/compile-time message lookup system. We have implemented the
core of the second-generation compiler, and it now compiles and
executes about half of our SELF code.

The new compiler performs type flow analysis to determine the types of
many local slots at compile-time. It also includes a significantly
more powerful message splitting system. The initial message splitter
described in this paper only splits a message based on the type of the
result of the previous message; the second-generation message
splitting system can use any type information constructed during type
flow analysis,
especially the types of local slots. The message splitter may elect to
split messages even when the message is not immediately after a merge
point, splitting all messages that intervene between the merge that
lost the type information and the message that needs the type
information.

Our goal for the combined type analyzer and extended message splitter
is to allow the compiler to split off entire sections of the control
flow graph, especially loop bodies, that manipulate the most common
data types. Along these common-case sections, the types of most
variables will be known at compile-time, leading to maximally-inlined
code with few run-time type checks; in the other sections, less type
information is available to the compiler, and more full message sends
are generated. Under normal conditions the optimized code will be
executed, and the method will run fast, possibly just as fast as for a
C program. However, in exceptional situations, such as when an
overflow actually occurs, the flow of control will transfer to a less
optimized section of the method that preserves the message passing
semantics.

Our second-generation compiler also performs data flow analysis,
common subexpression elimination, code motion, global register
allocation, and instruction scheduling. We hope that the addition of
these optimizations will allow our new SELF compiler to compete with
high-quality production optimizing compilers.

8.2. Open Issues

Method arguments are one of the largest sources of "unknown" type
information in the current compiler. We want to extend our
second-generation system to customize methods by the types of their
arguments in addition to the receiver type. This extension would
provide the compiler with static type information about arguments so
it could generate faster code. These benefits have to be balanced
against the costs of verifying the types of arguments in the prologue
of the method at run-time.

The compile-time lookup strategy works nicely as long as all the
parents that get searched are constant parents; if any are assignable,
then the compile-time lookup fails, and the message cannot be inlined.
Our second-generation system provides limited support for
dynamically-inherited methods by adding the types of any assignable
parents traversed in the run-time lookup to the customization
information about the method; the method prologue tests the values of
the assignable parents in addition to the type of the receiver. We
plan to investigate techniques to optimize dynamically-inherited
methods.

The message inliner needs to make better decisions about when to
inline a method, and when not to. The inliner should use information
about the call site, such as whether it's in a loop or in a failure
block, to help decide whether to inline the send, without wasting too
much extra compile time and compiled code space. It should also do a
better job of deciding if a method is short enough to inline
reasonably; counting the byte codes with a fixed cut-off value as it
does now is not a very good algorithm. Finally, our implementation of
type prediction hard-wires both the message names and the predicted
type; a more dynamic implementation that used dynamic profile
information or analysis of the SELF inheritance hierarchy might
produce better, more adaptive results.

The current implementation of the compiler, though speedy by
traditional batch optimizing compiler standards, is not yet fast
enough for our interactive programming environment. The compiler takes
over seven seconds to compile and optimize the Stanford integer
benchmarks (almost 900 lines of SELF code), and almost three seconds
to compile and optimize the Richards benchmark (over 400 lines of SELF
code). We plan to experiment with strategies in which the compiler
executes quickly with little optimization whenever the user is waiting
for the compiler, queuing up background jobs to recompile unoptimized
methods with full optimization later.

Work remains in making sure that our techniques are practical for
larger systems than we have tested. To fully understand the
contributions of our work, we need to analyze the relative performance
gains and the associated space and time costs of our techniques. This
analysis will be performed as part of the first author's forthcoming
dissertation.
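
The two-phase compilation strategy proposed above (compile quickly while the user waits, then recompile with full optimization in the background) could be organized roughly as follows. This is a minimal Python sketch under our own assumptions, not the SELF system's design: the class TwoPhaseCompiler, its method names, and the stand-in _compile step are all hypothetical, and no real compiler is invoked.

```python
from collections import deque


class TwoPhaseCompiler:
    """Toy model: quick compile on demand, full optimization later."""

    def __init__(self):
        self.sources = {}       # method name -> latest source text
        self.code_cache = {}    # method name -> (level, compiled code)
        self.pending = deque()  # method names awaiting full optimization
        self._queued = set()    # names currently in the pending queue

    def _compile(self, name, optimize):
        # Stand-in for a real compiler: only records what would be produced.
        level = "full" if optimize else "quick"
        return (level, f"<{level} code for {name}: {self.sources[name]}>")

    def compile_while_user_waits(self, name, source):
        """Compile with little optimization and queue a background job."""
        self.sources[name] = source
        self.code_cache[name] = self._compile(name, optimize=False)
        if name not in self._queued:
            self.pending.append(name)
            self._queued.add(name)
        return self.code_cache[name]

    def run_background_jobs(self, max_jobs=None):
        """Recompile queued methods with full optimization; return the count."""
        done = 0
        while self.pending and (max_jobs is None or done < max_jobs):
            name = self.pending.popleft()
            self._queued.discard(name)
            # Always recompile from the latest source, so a method edited
            # while queued is never replaced by stale optimized code.
            self.code_cache[name] = self._compile(name, optimize=True)
            done += 1
        return done


compiler = TwoPhaseCompiler()
compiler.compile_while_user_waits("sumTo:", "( | total <- 0 | ... )")
print(compiler.code_cache["sumTo:"][0])   # level installed while waiting
compiler.run_background_jobs()
print(compiler.code_cache["sumTo:"][0])   # level after background work
```

Keeping only method names in the queue, with sources stored separately, is one simple way to make sure a background job never installs optimized code for an out-of-date definition.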


9. Conclusions

Many researchers have attempted to boost the performance of
dynamically-typed object-oriented languages. The designers of
Smalltalk-80 hard-wired the definitions of user-level arithmetic and
control methods into the compiler, preventing the users from changing
or overriding them. Other researchers added type declarations to
Smalltalk, thereby hindering reuse and modification of code. We
devised dynamic customized compilation, static type prediction, type
flow analysis, message splitting, and message inlining to
automatically extract and preserve static type information. Our
measurements suggest that our system runs just as fast as Smalltalk
systems with type declarations and at least twice as fast as those
with hard-wired methods. Researchers seeking to improve performance
should improve their compilers instead of compromising their
languages.

SELF's novel features do not cost the user either execution time or
storage space. Our virtual machine supports the prototype object model
just as space- and time-efficiently as similar class-based systems;
maps act as implementation-level classes and thus reclaim the
efficiency of classes for the implementation without inflicting
class-based semantics on the SELF user. SELF's use of messages to
access variables has absolutely no effect on the final performance of
SELF programs, since these message sends are the first to get inlined
away. Once an implementation reaches this level of sophistication and
performance, the information provided by classes and explicit
variables becomes redundant and unnecessary. Prototype-based languages
can run just as fast as class-based languages.

Our implementation introduces new techniques to support the
programming environment. The segregation of object references from
byte arrays speeds scavenging and scanning operations. Dependency
lists reduce the response time for programming changes. Detailed
debugging information maps the execution state into the user's
source-level execution model, transparently "undoing" the effects of
method inlining and other optimizations.

Our techniques are not restricted to SELF; they apply to other
dynamically-typed object-oriented languages like Smalltalk, Flavors,
and CLOS. Many of our techniques could even be applied to
statically-typed object-oriented languages like C++ and Trellis/Owl.
For example, customization and automatic inlining could be used to
eliminate many C++ virtual function calls, encouraging broader use of
object-oriented features and programming styles by reducing their
cost. Compiler-generated debugging information could be used by the
C++ debugger to hide the inlining from the user, just as our compiler
generates debugging information to reconstruct the SELF virtual call
stack.

SELF is practical: our implementation of SELF is twice as fast as any
other dynamically-typed purely object-oriented language documented in
the literature. The SELF compiler achieves this level of efficiency by
combining traditional optimizing compiler technology like procedure
inlining and global register allocation, specialized techniques
developed for high-speed Smalltalk systems like dynamic translation
and inline caching, and new techniques like customization, message
splitting, and type prediction to bridge the gap between them. The
resulting synergy of old and new results in good performance.

Acknowledgments

We owe much to Randy Smith, one of the original designers of SELF. We
also would like to thank Peter Deutsch for many instructive
discussions and seminal ideas for the design and implementation of
SELF. Bay-Wei Chang implemented our graphical SELF object browser and
contributed to discussions on the future of the SELF language and
implementation.

References

[ASU86] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers:
Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.
[Atk86] Robert G. Atkinson. Hurricane: An Optimizing Compiler for
Smalltalk. In OOPSLA '86 Conference Proceedings, pp. 151-158,
Portland, OR, 1986. Published as SIGPLAN Notices 21(11), November,
1986.
[BMW86] Mark B. Ballard, David Maier, and Allen Wirfs-Brock.
QUICKTALK: A Smalltalk-80 Dialect for Defining Primitive Methods. In
OOPSLA '86 Conference Proceedings, pp. 140-150, Portland, OR, 1986.
Published as SIGPLAN Notices 21(11), November, 1986.
[Bob88] D. G. Bobrow et al. Common Lisp Object System Specification.
In SIGPLAN Notices 23(Special Issue), September, 1988.
[BI82] A. H. Borning and D. H. H. Ingalls. A type declaration and
inference system for Smalltalk. In Conference Record of the Ninth
Annual Symposium on Foundations of Computer Science, pp. 133-139,
1982.
[Bor86] A. H. Borning. Classes Versus Prototypes in Object-Oriented
Languages. In Proceedings of the ACM/IEEE Fall Joint Computer
Conference, pp. 36-40, Dallas, TX, 1986.
[Cur89] Pavel Curtis. Type inferencing in Smalltalk. Personal
communication, March, 1989.
[CU89] Craig Chambers and David Ungar. Customization: Optimizing
Compiler Technology for SELF, a Dynamically-Typed Object-Oriented
Programming Language. In Proceedings of the SIGPLAN '89 Conference on
Programming Language Design and Implementation, Portland, OR, June,
1989. Published as SIGPLAN Notices 24(7), July, 1989.
[Deu83] L. Peter Deutsch. The Dorado Smalltalk-80 Implementation:
Hardware Architecture's Impact on Software Architecture. In [Kra83],
pp. 113-126.
[Deu88] L. Peter Deutsch. Richards benchmark. Personal communication,
October, 1988.
[Deu89] L. Peter Deutsch. Expanded byte codes for primitives. Personal
communication, June, 1989.
[DS84] L. Peter Deutsch and Allan M. Schiffman. Efficient
Implementation of the Smalltalk-80 System. In Proceedings of the 11th
Annual ACM Symposium on the Principles of Programming Languages, pp.
297-302, Salt Lake City, UT, 1984.
[GR83] Adele Goldberg and David Robson. Smalltalk-80: The Language and
Its Implementation. Addison-Wesley, Reading, MA, 1983.
[Hen88] John Hennessy. Stanford integer benchmarks. Personal
communication, June, 1988.
[JGZ88] Ralph E. Johnson, Justin O. Graver, and Lawrence W. Zurawski.
TS: An Optimizing Compiler for Smalltalk. In OOPSLA '88 Conference
Proceedings, pp. 18-26, San Diego, CA, 1988. Published as SIGPLAN
Notices 23(11), November, 1988.
[Kra83] Glenn Krasner, editor. Smalltalk-80: Bits of History, Words of
Advice. Addison-Wesley, Reading, MA, 1983.
[Lee88] Elgin Lee. Object Storage and Inheritance for SELF, a
Prototype-Based Object-Oriented Programming Language. Engineer's
thesis, Stanford University, 1988.
[Lie86] Henry Lieberman. Using Prototypical Objects to Implement
Shared Behavior in Object-Oriented Systems. In OOPSLA '86 Conference
Proceedings, pp. 214-223, Portland, OR, 1986. Published as SIGPLAN
Notices 21(11), November, 1986.
[LTP86] Wilf R. LaLonde, Dave A. Thomas, and John R. Pugh. An Exemplar
Based Smalltalk. In OOPSLA '86 Conference Proceedings, pp. 322-330,
Portland, OR, 1986. Published as SIGPLAN Notices 21(11), November,
1986.
[McC83] Kim McCall. The Smalltalk-80 Benchmarks. In [Kra83], pp.
153-174.
[Mey86] Bertrand Meyer. Genericity versus Inheritance. In OOPSLA '86
Conference Proceedings, pp. 391-405, Portland, OR, 1986. Published as
SIGPLAN Notices 21(11), November, 1986.
[Moo86] David A. Moon. Object-Oriented Programming with Flavors. In
OOPSLA '86 Conference Proceedings, pp. 1-16, Portland, OR, 1986.
Published as SIGPLAN Notices 21(11), November, 1986.
[Nam88] Masood Namjoo et al. CMOS Gate Array Implementation of the
SPARC Architecture. In COMPCON '88 Conference Proceedings, pp. 10-13,
San Francisco, CA, 1988.
[PP88] ParcPlace Newsletter, Winter 1988, Vol. 1, No. 2. ParcPlace
Systems, Palo Alto, CA, 1988.
[Sch86] Craig Schaffert et al. An Introduction to Trellis/Owl. In
OOPSLA '86 Conference Proceedings, pp. 9-16, Portland, OR, 1986.
Published as SIGPLAN Notices 21(11), November, 1986.
[Ste76] Guy Lewis Steele Jr. LAMBDA: The Ultimate Declarative. AI Memo
379, MIT Artificial Intelligence Laboratory, November, 1976.
[Ste87] Lynn Andrea Stein. Delegation Is Inheritance. In OOPSLA '87
Conference Proceedings, pp. 138-146, Orlando, FL, 1987. Published as
SIGPLAN Notices 22(12), December, 1987.
[SS76] Guy Lewis Steele Jr. and Gerald Jay Sussman. LAMBDA: The
Ultimate Imperative. AI Memo 353, MIT Artificial Intelligence
Laboratory, March, 1976.
[Str86] Bjarne Stroustrup. The C++ Programming Language.
Addison-Wesley, Reading, MA, 1986.
[Suz81] N. Suzuki. Inferring Types in Smalltalk. In 8th Annual ACM
Symposium on Principles of Programming Languages, pp. 187-199, 1981.
[Ung86] David Michael Ungar. The Design and Evaluation of a
High-Performance Smalltalk System. Ph.D. dissertation, the University
of California at Berkeley, February, 1986. Published by the MIT Press,
Cambridge, MA, 1987.
[UJ88] David Ungar and Frank Jackson. Tenuring Policies for
Generation-Based Storage Reclamation. In OOPSLA '88 Conference
Proceedings, pp. 1-17, San Diego, CA, 1988. Published as SIGPLAN
Notices 23(11), November, 1988.
[US87] David Ungar and Randall B. Smith. SELF: The Power of
Simplicity. In OOPSLA '87 Conference Proceedings, pp. 227-241,
Orlando, FL, 1987. Published as SIGPLAN Notices 22(12), December,
1987.
[Weg87] Peter Wegner. Dimensions of Object-Based Language Design. In
OOPSLA '87 Conference Proceedings, pp. 227-241, Orlando, FL, 1987.
Published as SIGPLAN Notices 22(12), December, 1987.

70 OOPSLA ‘89 Proceedings October 1-6, 1989
