Columnar Objects

Improving the Performance of Analytical Applications

Toni Mattis¹, Johannes Henning¹, Patrick Rein¹, Robert Hirschfeld¹,², and Malte Appeltauer³
¹ Hasso-Plattner-Institute, University of Potsdam, Germany; {first.last}@hpi.uni-potsdam.de
² Communications Design Group (CDG), SAP Labs; Viewpoints Research Institute
³ SAP Innovation Center Potsdam, Germany; [email protected]

Abstract

Growing volumes of data increase the demand to use it in analytical applications to make informed decisions. Unfortunately, object-oriented runtimes experience performance problems when dealing with large data volumes. Similar problems have been addressed by column-oriented in-memory databases, whose memory layout is tailored to analytical workloads. As a result, data storage and processing are often delegated to such a database. However, the more domain logic is moved to this separate system, the more benefits of object-orientation are lost.

We propose modifications to dynamic object-oriented runtimes to store collections of objects in a column-oriented memory layout and leverage a just-in-time compiler (JIT) to take advantage of the adjusted layout by mapping object traversal to array operations. We implemented our concept in PyPy, a Python interpreter equipped with a tracing JIT. Finally, we show that analytical algorithms, expressed through object-oriented code, are up to three times faster due to our optimizations, without substantially impairing the paradigm. Hopefully, extending these concepts will mitigate some problems originating from the paradigm mismatch between object-oriented runtimes and databases.

Categories and Subject Descriptors D1.5 [Programming Techniques]: Object-oriented Programming; D3.4 [Programming Languages]: Processors - Run-time environments, Optimization; E2 [Data storage representations]: Object representation

Keywords Column-oriented Object Layout, Dynamic Languages, Data Science, Just-in-Time Compilation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
Onward!'15, October 25–30, 2015, Pittsburgh, PA, USA
ACM 978-1-4503-3688-8/15/10...$15.00
http://dx.doi.org/10.1145/2814228.2814230

1. Introduction

Advances in information technology have led to an ever increasing amount of available data, creating demand to analyze it using algorithms from a variety of methodologies, e.g. statistics, clustering, and forecasting. Such analyses are typically executed on database systems, as these provide an optimized execution and efficient scaling on the data volume used. For an interactive analysis, the response time is crucial. Thus, the improvement of analytical algorithm performance has been the goal of recent developments in relational database technology, such as columnar in-memory databases [19].

In contrast, dynamic object-oriented execution environments have been optimized for different use-cases, in particular for systems with manifold interactions of elements in a complex domain. Hence, they suffer from suboptimal execution times for analytical algorithms. The requirement of short response times can force programmers to give up on object-oriented principles, like abstractions close to the application domain. Instead, they have to program using languages or libraries with less suited abstractions or switch the programming paradigm altogether.

Relational databases, which are a common approach to data-intensive applications, implement a different paradigm. To take advantage of the functions implemented in the database, we have to incorporate them into our object-oriented application. There are three common options for this: using an object-relational mapper (ORM), which cancels out most performance benefits gained through the database and eventually reduces maintainability [16]; employing Structured Query Language (SQL) directly, which constrains developers to a different paradigm and restricts the set of expressible algorithms; or using a stored procedure language, which provides good performance but often lacks concepts to adequately express the application domain.

None of these options is satisfactory regarding performance, expressiveness, and maintainability. An ideal solution would allow us to keep programming with an object-
oriented language and get the performance of a stored procedure execution.

    class Player: pass

    class Match:
        def __init__(self, black, white, result):
            self.black = black
            self.white = white
            self.result = result

        def predict_result(self):
            d = self.white.rating - self.black.rating
            return 1. / (1. + 10. ** (d / 40.))

    for match in matches:
        expected = match.predict_result()
        delta = 2 * (match.result - expected)
        match.black.rating += delta
        match.white.rating -= delta

Listing 1: Example implementation of the Elo chess ranking algorithm in Python with columns. Constructor of class Player omitted.

    Column<Int32> playerCol =
        match_results.getColumn<Int32>("PLAYER");
    Column<Int32> opponentCol =
        match_results.getColumn<Int32>("OPPONENT");
    // 16 additional declarations omitted ...
    while(row < length) {
        current_player = Size(playerCol[row]);
        current_opponent = Size(opponentCol[row]);
        current_result = matchResultCol[row];
        expected = predict_result(
            ratingCol[current_player],
            ratingCol[current_opponent]);
        delta = 2 * (Double(current_result) - expected);
        ratingCol[current_player] += delta;
        ratingCol[current_opponent] -= delta;
        row = row + 1z;
    }

Listing 2: Elo implementation in the column-oriented stored procedure language “L”. Most column and variable declarations omitted; prediction function omitted.

We argue that the current performance deficiencies of object-oriented runtimes are not inherently linked to object-orientation but to the way most runtimes are implemented. Considering the optimizations databases implement for the execution of analytical algorithms, there are several possibilities not yet explored in the realm of dynamic object-oriented runtimes.

The observation which motivates these optimizations is that analytical algorithms often run on homogeneously structured data [19], i.e. all objects in the processed collection have the same fields. We aim to improve the performance of object-oriented runtimes regarding algorithm execution on homogeneous data, while maintaining constructs and mechanisms for object-oriented domain abstractions. Our approach utilizes the concept of a columnar data layout to implement objects in dynamically typed, object-oriented execution environments and optimizes execution by leveraging recent advances in just-in-time compiler technology. We implemented our approach in a prototype based on PyPy¹. Subsequently, we evaluated its implementation regarding performance of real-world analytical algorithms and its integration into Python. In particular, our contributions are:

1. A column-oriented object layout for a dynamically typed language, such that a tracing just-in-time compiler can generate code optimized for the traversal of large collections of objects
2. A prototypical implementation as a library based on the Python interpreter PyPy
3. A performance evaluation of our implementation

¹ PyPy is a meta-traced Python interpreter built in RPython [22]

2. Background

In this section, we define the scope of our contribution and expand on our motivation by examining related solutions. Additionally, we will explain concepts our approach is based on, namely ideas from in-memory databases as well as JIT technologies.

2.1 Analytical Applications

Our approach targets the execution of data-intensive analytical applications, which involve reporting, data mining, forecasting, and similar algorithms. They typically process large amounts of data in a read-intensive way, free of side-effects, and produce an aggregate value, a model from which patterns and prognoses can be drawn, or similar decision-supporting results.

The processed data is homogeneously structured, i.e. from a database perspective there are a lot of entities per table. Further, the data often has a high dimensionality, which corresponds to many columns per database table. However, most analytical algorithms only use a subset of these columns [19, 20]. We target neither transactional nor write-intensive processing. For our concepts we assume that the data is already available and is primarily read.

2.2 Limitations of Current Approaches

We identified the following approaches for implementing analytical applications:

1. Implement the full application inside a single language and execution environment
2. Move the data to a database, but maintain object-oriented abstractions by using an ORM
3. Move the data to a database, but give up object-oriented abstractions and implement performance-critical logic as stored procedures or SQL
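As a point of reference for approach 1, the loop of Listing 1 can be made self-contained as sketched below. The Player constructor, its default rating of 100.0, the helper update_ratings and the single sample match are assumptions added for illustration only; the paper omits the constructor and does not prescribe any of these values.

```python
class Player:
    # The paper omits this constructor; the default rating is an assumption.
    def __init__(self, rating=100.0):
        self.rating = rating


class Match:
    def __init__(self, black, white, result):
        self.black = black
        self.white = white
        self.result = result

    def predict_result(self):
        # Same formula as in Listing 1.
        d = self.white.rating - self.black.rating
        return 1.0 / (1.0 + 10.0 ** (d / 40.0))


def update_ratings(matches):
    """Apply the update loop of Listing 1 to every match, in order."""
    for match in matches:
        expected = match.predict_result()
        delta = 2 * (match.result - expected)
        match.black.rating += delta
        match.white.rating -= delta


# Hypothetical sample data: two equally rated players and one recorded match.
alice, bob = Player(), Player()
update_ratings([Match(black=alice, white=bob, result=1)])
print(alice.rating, bob.rating)
```

Every iteration dereferences a Match object, two Player objects and their boxed rating values, which is the access pattern whose cost the remainder of the paper analyzes.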
The first approach is the most preferable in terms of convenience and maintainability, but its performance and memory efficiency are often insufficient in practice. We explain the reasons in section 3.

The perhaps most convenient way to integrate a database into an object-oriented application is to use an ORM. It maps domain classes to tables of the database. Whenever the developer accesses them, the ORM generates the required SQL commands, executes them and parses the response into objects. However, the performance degradation of this approach outweighs its benefits for data-intensive applications. For instance, when looping over all entries of a table, the ORM will query and materialize each object separately before any computation. The resulting objects take up regular object memory space and offer none of the database optimizations like compression or column-wise traversal.

To take advantage of these optimizations, the performance-critical algorithm needs to be executed close to the data, and therefore is implemented as a stored procedure. While databases provide different stored procedure languages, we have observed that none offers support for in-place execution of dynamically typed object-oriented languages. Stored procedure languages offer abstractions close to the database domain which seldom fit our object-oriented abstractions and do not offer the aforementioned maintainability benefits. All in all, we lose advantages of object-oriented programming by introducing code into our system that is more difficult to maintain, often impossible to debug and needs to be integrated and managed through elaborate database interfaces. A simplified example of a stored procedure that operates on column abstractions is given in listing 2.

2.3 Columnar In-Memory Data Storage

In-memory databases perform well executing analytical algorithms. This performance stems from the fact that the data already resides in main memory, as well as from an optimized data layout which mitigates issues caused by the memory wall [2, 19]. With regard to the execution of algorithms on large data volumes, dynamic runtimes might benefit from this data layout.

Main memory is a performance bottleneck, as its latency is high in comparison to CPU computations. To mitigate this problem, hardware caches were introduced. As an analytics application accesses large amounts of data, the optimal utilization of the cache determines the overall performance of operations on an in-memory database. For sequential access, a columnar data layout can improve cache utilization in comparison to row-based storage, by storing all the values of one field for all entities in one sequence (see fig. 1) [1].

As a column stores values which have the same characteristics (e.g. type, range, or distribution), most compression methods, for example run-length encoding, become more effective. Through this compression, columnar databases can also store larger amounts of data in memory than row-based databases. Overall, the columnar data layout provides better performance for analytical applications compared to a row-based layout. At the same time, elaborate mechanisms are now needed for write-intensive operations or single entity selects. One example of a columnar database based on the described concepts is SanssouciDB, which influenced the design of SAP HANA [19, 21].

[Figure 1 (diagram): values 0, 1, 2, ... stored consecutively in memory are loaded together into an L2 cache line and then served to successive attribute accesses.]
Figure 1: Sequential access to values stored in a sequence in memory leads to a high cache utilization.

2.4 Technology of Meta-Tracing JIT Compilers

Just-in-time compilation is a common mechanism to improve performance in interpreted languages. In contrast to ahead-of-time compilation, JIT compilation arrives at optimization decisions based on the observed types and control flows during program execution. This leads to optimized code which may be very specific to currently processed inputs, but can compete with low-level languages in terms of performance.

2.4.1 Tracing JITs

A tracing JIT is a particular strategy to collect assumptions regarding the control flow of a program. When a code region, e.g. a loop body or method, gets executed very often, a tracer starts to record all operations that are executed during the next run; tracing includes if-branches and descends into method calls. Whenever the trace could have taken another path, it records a guard storing the assumption that led to this particular trace, e.g. the class handling a polymorphic call or the if-condition causing a branch. The recorded trace is optimized using common subexpression elimination, constant folding and the elimination of redundant guards by propagating assumptions (types, non-negativeness, array bounds, etc.). The resulting trace undergoes a register allocation step and is compiled to architecture-specific native code [4].

2.4.2 Meta-Tracing

A meta-tracing JIT does not trace the actual user program, but the interpreter running that program. It detects hot paths and loops by observing whether certain parts of the interpreter state repeat, e.g. a repeating program counter may indicate a loop inside the user program. Meta-tracing JITs can be reused across languages. They do not come into direct contact with the high-level dynamicity of the user language but only observe how it is implemented using lower-level
primitives. From a modularity viewpoint, they “scrape” the cross-cutting concerns of JIT compilation (especially tracing) from the actual language implementation. Program design that optimizes for a meta-tracing JIT is likely transferable to other language implementations based on it [4].

2.4.3 Allocation Removal and Escape Analysis

The most important optimization we will address in this paper is the removal of object allocations inside a trace [5]. Dynamic languages usually wrap primitive values, e.g. integers, inside boxes that can be passed around like any other object. However, if a box is only referenced inside a trace and is proven to never escape the trace by a process called escape analysis, there is no need to allocate that box. The surrounding trace can be re-written to directly operate on the primitive and garbage collector invocations are removed completely. Even if the object escapes, its allocation can be deferred to the end of the trace, allowing JITted code to operate on the raw value before it gets wrapped. This optimization can be applied recursively to explode structured objects into register types, i.e. integers, floating point numbers and pointers.

3. A Columnar Object Layout

The following section elaborates on how we adapted the memory layout of objects and contrasts it with common implementations found in dynamic languages. We describe at a conceptual level how these changes help a tracing JIT transform object-oriented loop code into low-level array operations.

3.1 Common Implementation of Objects

Current object-oriented runtimes represent an object as a contiguous block of memory prefixed by a header and followed by the values assigned to its attributes. References to an object are implemented as pointers to the object’s memory [12]¹.

¹ A notable exception are object tables known from Smalltalk, which implement a location-independent notion of object identity [8]

Maps The runtime can read an attribute by loading the object memory at a given offset. Dynamic languages usually link an attribute name to its offset via a structure called map [3, 6, 23] or hidden class [9] referenced by the object header and shared between objects with the same structure. In some implementations (e.g. Dart [23] and PyPy [3]), object header and attribute storage can be separated, so the attribute storage can be relocated in order to grow.

Boxed Primitives In order to treat everything like an object and refer to it via pointers, primitives like integers, Boolean values and other numbers are boxed. Figure 2 shows the memory structure of a full-fledged object using maps and boxed primitives.

3.2 Problems of Traditional Object Layouts

The flexibility of common object implementations in dynamic languages reduces their efficiency regarding the processing of large collections. The following indirections are some of the problems associated with processing collections of traditionally structured objects:

• Boxed values need unboxing to process the raw value.
• Individually boxed values introduce memory overhead.
• Unboxed values need to be re-wrapped in order to be stored in an object.
• Large objects fill the cache with adjacent attributes that are probably not needed right now.
• Collections store pointers to objects, so traversal switches between collection memory, object memory and boxed value memory all the time, reducing overall memory locality.
• Modern JIT compilers may omit repeated map lookups by compiling the positions directly into native code. However, an object’s attribute layout (map) can change quickly, so the JIT needs frequent tests whether the generated code is still valid by checking whether it is still the same map.

3.3 Columnar Object Layouts

Ideally, traversing through many objects and reading one attribute from each of them should fill the CPU cache with the same, fully unboxed attribute of the next objects and neither pollute the cache with unused attributes nor cause memory accesses to maps and classes. We achieve this behavior by storing corresponding values of the same attribute in a consecutive chunk of memory, conceptually known as a column in the context of databases (see Figure 3).

We introduce an additional type of class that explodes the attributes of its instances into arrays and reduces the instance itself to just an offset at which its attributes reside in their respective columns. This idea has been explored before in compiler-based transformations to speed up simulations in Java [17] and Kedama [18] (see sections 6.1.1 and 6.1.3). Our approach, in contrast to previous approaches, works without modifications to the language or compiler. It heavily relies on a JIT to take full advantage of the vertical object layout. Moreover, we allow mixed compositions of both columnar and non-columnar objects and retain full run-time reflection and metaprogramming capabilities on columnar objects.

3.4 Classes, Identity and Associations

Our model changes the implementation of object identity. Columnar objects are uniquely identified by their class (which refers to the columns) and their offset inside the columns, which we call ID. Referring to an object therefore means referring to a class at a given ID. From this point
of view, classes behave like a collection of all their instances addressable via their IDs.

In order to embed our identity model into the traditional pointer-based model, we introduce a proxy², which stores class and ID. The proxy represents a columnar instance and redirects attribute access to the columns of its class.

Associations to other columnar classes can be represented by an integer column storing only the IDs of referenced instances, while the column itself stores the target class. Associations to non-columnar classes can be represented simply as a column of pointers to the corresponding object.

² also known as surrogate in the context of object identity [12]

[Figure 2 (diagram): an instance referencing its class and its map; the storage holds the attributes result, white and black, where result points to a boxed int 1 and white and black point to further instances.]
Figure 2: Memory layout of an object using maps and boxed primitives

[Figure 3 (diagram). Conceptual view: class LineItem with attributes quantity: int and price: decimal; the instance item : LineItem has quantity = 4 and price = 40.00 and is represented by a proxy object holding class and id = 3. Memory layout:]

    id   quantity column   price column
    0    10                99.50
    1    2                 18.00
    2    2                 25.50
    3    4                 40.00
    4    1                 8.00
    5    3                 21.80
    6    1                 9.00

Figure 3: Memory layout of columnar objects and their proxies. Classes maintain one column per attribute. Proxies consist of an ID that is used as offset into the column corresponding to an attribute.

3.5 Polymorphism and Encapsulation

A columnar class can implement methods and class members like any non-columnar class; only attribute access will be re-interpreted by the runtime. This way, we do not interfere with encapsulation and information hiding.

We explicitly allow subclassing of columnar classes. A subclass inherits all columnar attributes of its base class and may introduce additional columnar attributes. Inherited columns are shared between instances of all subclasses in a hierarchy and newly introduced columns will have null-values at all offsets not belonging to their declaring class. We also introduce a class ID column to record the most specific subclass responsible for a particular column offset and to guarantee correct polymorphic message dispatch in co-variant collections and attributes.

3.6 Gradual Typing

In contrast to the fully dynamic nature of Python, columnar attributes should have a type. For value types, e.g. integers, this allows allocating a plain array without any boxing or run-time type checks. Also, knowing the exact columnar class of a reference allows us to unpack the ID of the referenced instance into a plain integer array and reconstruct the proxy whenever the association is read.

However, types are optional. A column may receive type Object and store boxed values, proxy objects to other columnar classes and arbitrary objects from the language. This only causes performance issues if the dynamically typed column is frequently read during an analytic computation.

3.7 Leveraging the JIT compiler

Our primary goal is to improve speed despite having proxy objects as additional indirection. This can be achieved by reducing the lifetime of proxies containing just the class and the ID to a minimum. The more limited an object’s lifetime and scope, the more effective the allocation removal will be. Also, whenever a proxy object “escapes”, i.e. there are references to the proxy which survive a particular loop or method body, it cannot be optimized away, but becomes an actual heap object. Lifetime reduction can be achieved at multiple positions:

• Iterators traversing collections of columnar objects always emit a fresh proxy. As long as this proxy is only used inside the loop, allocation removal will explode the proxy into ID and class. Loop code will subsequently be compiled to work with a plain integer ID instead of a proxy object.
• Collections should deconstruct inserted proxies into ID and class and reconstruct the proxy on each read access. This saves memory and prevents the proxy from “escaping” due to a reference by the collection.
• When following an association, a new proxy representing the instance of the target class is created. However, if only a primitive attribute is read from that proxy (e.g. match.black.rating) it will never be allocated and the
operation is folded into a nested array lookup (equivalent to evaluating rating_column[black_column[match_id]] without ever allocating a proxy)

Table 1 shows the effects of allocation removal on an iteration over columnar objects in contrast to iterating over unmodified, traditional objects.

4. Implementation

Our implementation consists of a plain Python library which provides an API for working with columnar objects and can be loaded at run-time. We only target the PyPy implementation of Python due to its meta-tracing JIT.

Our prototype uses proxies for each columnar object. The proxies redirect attribute access to the respective columns. We rely on PyPy’s allocation removal and inlining to prevent the proxy from causing any overhead in JIT-compiled code.

4.1 Example

Listing 3 shows an implementation of the Elo chess ranking algorithm from listing 1 using our library. The base class Columnar implements instance and proxy creation, Float('rating') and Integer('result') create primitive columnar attributes named rating and result, while Player.one creates an association attribute. There are no changes to methods and to the analytical computation.

    class Player(Columnar, Float('rating'),
                 String('name')): pass

    class Match(Columnar, Player.one('black'),
                Player.one('white'),
                Integer('result')):

        def predict_result(self):
            d = self.white.rating - self.black.rating
            return 1. / (1. + 10. ** (d / 40.))

    for match in matches:
        expected = match.predict_result()
        delta = 2 * (match.result - expected)
        match.black.rating += delta
        match.white.rating -= delta

Listing 3: Elo code with columnar attribute definitions added to the classes. The integer 'result' is 0 if the black player won or 1 if the white player won.

4.2 Proxy Implementation

What appears to be an instance of a class, e.g. a Match object, is in fact just a proxy object. In our Python implementation, its state consists of __class__ and __id__ fields, as illustrated in fig. 3, and is managed by the Columnar base class. This is analogous to normal objects, which consist of a __class__ field but have their attributes directly attached to the object. The attributes of a proxy are implemented by Python’s property objects, which instruct the runtime to explicitly invoke getters and setters associated with the respective property.

4.3 Attribute Mixins

The term Integer('result') in the class header of Listing 3 creates a mix-in, whose purpose is to provide an integer column and a property that accesses the column whenever the result attribute is read or written. This way, we can compose our columnar class by inheriting single-attribute mix-ins. As attribute lookup is late-bound, new columnar attributes can still be added and removed from the class dynamically. The following code illustrates the implementation of the Integer() mix-in factory, which spawns integer columns with accessors:

    def Integer(name):
        # column construction (allocate_int_column is a library helper, not shown)
        column = allocate_int_column()

        # getter reads column at instance offset
        def getter(instance):
            return column[instance.__id__]

        # setter writes column at instance offset
        def setter(instance, value):
            column[instance.__id__] = value

        prop = property(getter, setter)

        # type/mixin construction; type() takes its arguments positionally
        return type(
            '',            # anonymous class
            (),            # no base classes
            {name: prop},  # redirect 'name' to prop
        )

Listing 4: The Integer factory creating a mixin redirecting attribute access to a column.

The resulting inheritance hierarchies do not impact performance, because the tracing JIT will remove super-class lookups and inline accessors, regardless of where they appear in the hierarchy.

4.4 Inspecting the Trace

Apart from continuous speed measurements, the effectiveness of optimizations can be analyzed by inspecting the trace produced by the JIT. Consider the following microbenchmark counting how often the white player won:

    def white_player_wins():
        count = 0
        for match in Match.instances:
            # result: 0 = black wins, 1 = white wins
            count += match.result
        return count

Listing 5: Example aggregation

Iterating over the instances of a class yields new proxy Match instances for each offset allocated by instances of this
Optimized operations during iteration
...on traditional objects ...on columnar objects
Check loop condition Check loop condition
Read object pointer match Read object ID id
Increment iterator Increment iterator
Check map of match for result Read integer result = result_column[id]
Read boxed_result = match.result
boxed_result is boxed integer?
Read integer result inside boxed_result
Process result and loop process result and loop

Table 1: Comparison between plain objects and columnar objects with regard to a JITed loop

class. After creating and aggregating several million instances, the JIT converged to the following set of instructions (operation names and variable names are renamed for better readability, # starts a comment):

    iterator = <set up iterator>
    max = <get upper limit of iterator>
    column = <get result column>
    column_len = column.size
    c = 0 # the unwrapped count variable
    i = 0 # the unwrapped iterator state
    loop:
    # --- inlined iterator call ---
    guard(i < max)
    k = i + 1
    # --- inlined item.quantity lookup ---
    guard(k < column_len)
    guard(k >= 0)
    qty = column[k]
    # --- addition, overflow check ---
    s = s + qty
    guard_no_overflow
    # --- write iterator state back ---
    cur = wrap_int(k)
    iterator.current = cur
    i = k
    jump(loop)
    count = wrap_int(c)

Listing 6: Optimized JIT trace for listing 5

We can observe that actual match objects have never been allocated and just exist as the fully unwrapped raw ID k. The JIT defensively guards our column access, which is acceptable considering that the column length is deemed constant and held in a register. The most expensive operations are the instantiations of int-objects at the end of each loop run. This happens because the iterator state escapes the loop, so allocation removal can only defer allocation, not prevent it. However, the more complex the algorithm gets, the less time is lost during deferred allocation in relation to the overall computation.

4.5 Associations
An association to another class is represented as a column of IDs. Instead of reading and writing proxies from and to an array, the associated instance’s ID is stored inside the column and the wrapper object is reconstructed when the field is read. This mechanism is exposed to the user by the one() class method, which returns the appropriate type to inherit. See the usage of Player.one() in the Match class in listing 3. Evaluating an expression like match.black.rating now results in the following operations:

1. Look up id = match.__id__
2. Look up the player ID id2 = Match.black_column[id]
3. Construct the player proxy pl = Player[id2]
4. Look up id3 = pl.__id__
5. Look up the rating rating = Player.rating_column[id3]

The allocation removal of the JIT will prevent the player proxy from being allocated as it only serves the purpose of looking up its rating and would be garbage collected instantly. Instead, the five lookups above will be collapsed into two nested lookups, which do the same as:

    rating_column[black_column[id]]

where id is the fully unwrapped Match instance.

4.6 Inheritance
By introducing a metaclass for columnar classes, we override class creation. When our metaclass constructor detects that a columnar class is being subclassed, it adds a class_id column to the topmost columnar class if not already present, and each class receives an integer representing its ID.
When a proxy is created, which happens in iterators or when following an association, it sets its __class__ field depending on the value of class_id at the instance’s offset. This implements covariance: Declaring an attribute as Player.one('black') also allows subclasses of Player to be written to and read from the black attribute.
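
The association and inheritance mechanics of Sections 4.5 and 4.6 can be illustrated with a minimal sketch. It is not the prototype's implementation: the register(), create() and proxy() helpers, the Grandmaster subclass, and the plain-list columns are hypothetical stand-ins, and no metaclass is used. Only the names black_column, rating_column, class_id and __id__ follow the text.

```python
class Player:
    """Short-lived proxy for one row of the Player columns (hypothetical)."""
    rating_column = []     # one rating per instance
    class_id_column = []   # integer class ID per instance (Section 4.6)
    classes = []           # maps a class ID back to the concrete class

    def __init__(self, instance_id):
        self.__id__ = instance_id

    @classmethod
    def register(cls, klass):
        # Each class in the hierarchy receives an integer ID.
        klass.class_id = len(Player.classes)
        Player.classes.append(klass)
        return klass

    @classmethod
    def create(cls, rating):
        Player.rating_column.append(rating)
        Player.class_id_column.append(cls.class_id)
        return cls(len(Player.rating_column) - 1)

    @staticmethod
    def proxy(instance_id):
        # Pick the proxy class from the class_id stored at the instance's offset.
        klass = Player.classes[Player.class_id_column[instance_id]]
        return klass(instance_id)

    @property
    def rating(self):
        return Player.rating_column[self.__id__]

    def title(self):
        return "player"


Player.register(Player)


@Player.register
class Grandmaster(Player):
    def title(self):
        return "grandmaster"


class Match:
    black_column = []  # the association is a column of Player IDs

    def __init__(self, instance_id):
        self.__id__ = instance_id

    @classmethod
    def create(cls, black):
        Match.black_column.append(black.__id__)  # store the ID, not the proxy
        return cls(len(Match.black_column) - 1)

    @property
    def black(self):
        # Reconstruct the wrapper object when the field is read.
        return Player.proxy(Match.black_column[self.__id__])


alice = Player.create(1500)
bob = Grandmaster.create(2600)
match = Match.create(black=bob)

# match.black.rating is equivalent to two nested column lookups.
nested = Player.rating_column[Match.black_column[match.__id__]]
```

Here match.black yields a fresh Grandmaster proxy on every read, so a subclass instance can be stored in and read from an attribute declared for Player, while the rating lookup reduces to the nested column access shown in the last line.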


Figure 4: Unoptimized lists with proxies (left) and lists with ID arrays as storage (right).
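
The right-hand layout of Figure 4 can be sketched as a list type that unwraps proxies to their IDs on insertion and re-wraps them on access. This is an illustrative sketch only: the Item and ColumnarList classes and the use of the standard array module are assumptions, not the data types shipped with the prototype library.

```python
from array import array


class Item:
    """Hypothetical proxy: carries only the row ID of a columnar instance."""
    quantity_column = array("q", [5, 7, 11])

    def __init__(self, instance_id):
        self.__id__ = instance_id

    @property
    def quantity(self):
        return Item.quantity_column[self.__id__]


class ColumnarList:
    """List of proxies that stores raw integer IDs instead of heap objects."""

    def __init__(self, proxy_class):
        self._proxy_class = proxy_class
        self._ids = array("q")  # compact storage: one machine integer per entry

    def append(self, proxy):
        self._ids.append(proxy.__id__)  # unwrap on insertion

    def __getitem__(self, index):
        return self._proxy_class(self._ids[index])  # re-wrap on access

    def __len__(self):
        return len(self._ids)

    def __iter__(self):
        for instance_id in self._ids:
            yield self._proxy_class(instance_id)


items = ColumnarList(Item)
for row in (2, 0):
    items.append(Item(row))
total = sum(item.quantity for item in items)
```

The collection never holds a reference to a proxy, so proxies created during iteration remain short-lived and are candidates for allocation removal.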

4.7 Collections
When proxies are put into collections, such as lists, the proxy object is usually stored as a full heap object, because it will be referenced by the collection for a long time. This can be avoided by unpacking the proxy ID once it is inserted into the collection and reconstructing the proxy when it is accessed. The resulting memory layout is depicted in Figure 4 and generally improves speed and memory efficiency of collections of columnar instances. In our library, we provide custom data types for lists, sets and dictionaries, which implement this unwrapping behavior.

5. Evaluation
Our motivation for using a column-based object layout is to decrease the execution time of analytical algorithms written in an object-oriented dynamic language. We evaluated our concept by running four analytical algorithms and three microbenchmarks on our prototype.
Our approach also aims to provide an abstraction to program analytical algorithms in an object-oriented fashion. To determine the effects of the changed memory layout on the abstractions available to the programmer, we also qualitatively evaluated our integration of the columnar objects into the object-oriented abstractions of Python.

5.1 Performance Benchmarks Setup
All benchmarks were executed on a server architecture with the following specification:
• CPU: 2 hexa-core Intel Xeon E5-2630 (24 logical cores), maximal clock speed of 2301 MHz
• Memory: 128GB main memory consisting of 8GB DDR3 modules with 1333 MHz clock speed
• System software: SUSE Linux Enterprise Server 11 (Kernel version 3.0.80-0.7-default)
• Runtimes and compilers: PyPy version 2.5.0-alpha0 and gcc version 4.3.4 revision 152973
We measured the execution time of a benchmark by wrapping the benchmark in a function and measuring the time between calling the function and it returning. Each benchmark configuration was measured 60 times. To correct for systematic bias created by our choice of input data, each run used a different seed to generate random test data. All PyPy measurements are conducted using the standard PyPy JIT configuration and with a warm JIT. This means that with an input data size smaller than one million the benchmark was run 100 times before a measurement was taken. Above an input data size of one million the benchmark was run once and then a measurement was taken. This is sufficient because, for the PyPy JIT compiler, only the total number of loop iterations influences the optimizations.

5.2 Benchmarked Algorithms
There are established benchmark suites for object-oriented runtime environments and analytical database systems, but none of them fit the perspective from which we approached application development. Typical benchmarks for object-oriented and mixed-paradigm languages, e.g. “Richards” and “Deltablue”, are computation-intensive rather than data-intensive and often involve a significant portion of writing and side effects, which makes them unrepresentative for analytical scenarios. Common database benchmarks, e.g. “TPC-H”, are expressed in SQL, which does not directly map to Python constructs.
We therefore acquired implementations of algorithms that are used as stored procedures in production business applications (“ATP”, “KM”), added other benchmarks covering the spectrum between data- and computation-intensive analytical algorithms (“Elo”, “Balance”), and ran them on differently sized collections of objects up to 10,000,000 items. Except for the “Balance” benchmark, all these benchmarks access several different attributes of each instance.

Available to Promise Benchmark (ATP). Available-to-promise answers the question whether a customer order can be fulfilled at a specific due date regarding given past and future stock changes and other customer requests. Our implementation of an ATP algorithm has a set of fixed stock changes and a set of orders, both of which include a time and an amount of stock. The algorithm checks availability in an iterative, backtracking fashion. It simulates the progress in time and correspondingly applies the fixed changes. When an order is due it tries to satisfy it as soon as possible. If it is satisfied and it later turns out that there is a future fixed

stock change which is rendered impossible by this order, the order is revoked and simulation starts again from the time of the order.

Kaplan-Meier Estimator (KM). The Kaplan-Meier estimator estimates the survival curve of a population based on a sample of lifetimes. Amongst other applications, it is used in medical research to determine the survival rates of patients after a specific treatment based on observed survival times. The estimator is mathematically defined and has a straightforward implementation as a product [11].

Elo-Ranking. Given a set of competing players and a large amount of data recording which player or strategy outperformed or defeated an opponent, the Elo rating [7] puts a rating on each competitor, quantifying its overall performance. Using the rating of two competing players, a win chance can be predicted beforehand. The algorithm is used in competitive Chess and Go, but also for matchmaking in online games, where it needs to quantify player performances live to assign equally skilled opponents.

Balance Aggregation. Sequential aggregations are often implemented by looping over an input set, modifying an internal state at each iteration. Our example involves computing an account balance while the input is only a set of transactions with positive and negative balance changes, with the addition that days with negative balance are counted. The additional criterion makes the algorithm difficult to express in terms of relational operators, as the decision whether to count the day or not depends on all records before this day. Test transactions are drawn from a uniform distribution, e.g. over the interval [−100, 100].

Microbenchmarks. We used three basic list traversal operations to compare the execution on columnar objects with the execution on arrays: aggregating a sum (Aggregate Sum), adding a fixed number to all elements (Map Addition), and extracting elements which fulfill a simple condition into a new list (Filter).

5.3 Benchmark Results
Statistical Methods. We make no assumptions on the underlying distribution and provide normalized Tukey boxplots in Figure 5 to visualize the median and variation of the measured timings compared to unoptimized PyPy.
Exact median timings are given in Table 2. Speedups are computed by dividing the median execution time from the respective platform by the median execution time of our columnar implementation. Confidence bounds of this statistic are given by the 2.5-th and 97.5-th percentile of the bootstrap distribution of the computed ratio.

Analysis. From the benchmark results, we see that the columnar layout outperforms ordinary objects consistently for 1,000,000 or more instances. However, it is not faster when dealing with small input sizes (below 100,000 traversed records), but, except for the balance benchmark, this meets our expectations exactly and reflects what the columnar layout was intended for.
The balance benchmark is a particularly difficult case for control-flow observing optimizations like those in a tracing JIT, as its loop contains a condition with an activation likelihood which varies a lot while traversing the input. Assumptions on the probability of a certain branch being traversed are often ineffective. Also, non-negativity assumptions can hold for the beginning while being violated frequently at later stages of the input data. We see a wide confidence interval, which indicates high data-dependent variance with a considerable chance of improvement in certain situations, but not in general. We assume that, among others, two types of applications would suffer from switching to our layout: transactional applications, which select a few single objects and use most of their attributes; and technical modules, like web frameworks, which create heterogeneous object graphs. However, at this point we cannot back these claims and further evaluation is needed.
The microbenchmarks provide clear evidence that the columnar runtime scales better than ordinary objects with increasing improvements over larger numbers of objects. The highest potential shows in the map operation, which effectively updates a full column without having an if-condition or maintaining an aggregate across multiple loop runs.

5.4 Integration with Object-Orientation
To evaluate the integration of the columnar layout into Python, we qualitatively describe the features of object-orientation supported by the new layout. We will thereby distinguish between working features of our prototype, limitations of the prototype and limitations of our concept. This does not cover all features of Python but focuses on features which are affected by our changed layout.

5.4.1 Features of the Prototype
Object Identity. The identity of an object can be obtained via the id() function (usually an integer representing the memory address). We can override the id() function to return objects which compare equal for proxies representing the same instance (e.g. a tuple of class and instance ID).

State and Methods. Our implementation uses the Python attribute facilities to translate attribute access. Therefore, columnar objects exhibit the same lookup behavior as ordinary Python objects. Both attributes and methods are defined in the class of an object.

Inheritance and Polymorphism. Python supports multiple inheritance and polymorphism. Our prototype supports multiple inheritance of behavior in the style of traits, meaning that at most one columnar class may appear as base class and the rest are required to carry only behavior. The steady construction and destruction of proxies retains the correct class relation and polymorphism works as expected.

[Figure 5 consists of four box-plot panels: (a) ATP benchmark timings, (b) Kaplan-Meier benchmark timings, (c) Elo benchmark timings, (d) Balance benchmark timings. Each panel plots Normalized Execution Time (PyPy = 1.0) against Benchmark Size / Environment, comparing PyPy with PyPy + Columns at each input size.]

Figure 5: A box plot of the performance benchmark results, normalized to the PyPy median. The normalization is the quotient of the execution time of our approach and the time of a PyPy solution. Generally speaking, if the second box is higher than the first box, our approach is slower; if it is lower, our approach is faster. The red line indicates the median, the lower and upper edges of the box are the first and third quartile, and the ends of the whiskers are the most outlying values within a distance of 1.5 times the inter-quartile range from the first and third quartile.
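
For orientation, the workload behind panel (d) can be sketched in a few lines. This is a reading of the Balance Aggregation description in Section 5.2, not the benchmark code itself: the function name, the integer amounts, and the assumption of one transaction per day are illustrative.

```python
import random


def negative_balance_days(changes):
    """Return the final balance and the number of days with a negative balance."""
    balance = 0
    negative_days = 0
    for change in changes:
        balance += change
        if balance < 0:  # data-dependent branch inside the loop
            negative_days += 1
    return balance, negative_days


# Synthetic input: balance changes drawn uniformly from [-100, 100].
rng = random.Random(0)
transactions = [rng.randint(-100, 100) for _ in range(10000)]
final_balance, days = negative_balance_days(transactions)
```

Whether the branch is taken depends on the running sum of all earlier records, which is the property that makes this loop hard to express with relational operators.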

Analytical algorithm benchmarks

benchmark             size  PyPy [ms]  Col. [ms]  speedup vs. PyPy
ATP                 10 000       1.32      28.05  0.05 [0.04 – 0.05]
                   100 000      21.79      48.41  0.45 [0.44 – 0.46]
                 1 000 000     235.71     228.91  1.03 [0.93 – 1.16]
                10 000 000    2739.53    1902.78  1.44 [1.4 – 1.49]
KM                  10 000       0.59       2.99  0.20 [0.2 – 0.2]
                   100 000      28.65      17.89  1.60 [1.59 – 1.62]
                 1 000 000     535.12     174.31  3.07 [3.06 – 3.08]
                10 000 000    4393.76    1631.58  2.69 [2.53 – 2.91]
Elo                 10 000       6.36       3.38  1.88 [1.8 – 1.95]
                   100 000      41.54      28.32  1.47 [1.43 – 1.51]
                 1 000 000     359.06     247.49  1.45 [1.41 – 1.47]
                10 000 000    3506.49    2418.84  1.45 [1.41 – 1.49]
Balance             10 000       0.11       0.16  0.70 [0.58 – 0.88]
                   100 000       1.06       1.67  0.63 [0.54 – 0.94]
                 1 000 000      18.22      17.70  1.03 [0.61 – 1.91]
                10 000 000     119.85     170.17  0.70 [0.43 – 3.48]

Microbenchmarks

benchmark             size  PyPy [ms]  Col. [ms]  speedup vs. PyPy
Aggregate Sum       10 000       0.08       0.10  0.79 [0.79 – 0.8]
                   100 000       0.63       0.59  1.06 [1.05 – 1.07]
                 1 000 000       8.15       5.75  1.42 [1.41 – 1.42]
                10 000 000      81.22      54.64  1.49 [1.45 – 1.49]
Map Addition        10 000       0.33       0.19  1.69 [1.68 – 1.7]
                   100 000      11.60       1.61  7.23 [7.22 – 7.24]
                 1 000 000     299.42      14.39  20.81 [20.78 – 20.83]
                10 000 000    2924.87     132.15  22.13 [22.11 – 22.15]
Filter              10 000       0.25      21.85  0.01 [0.01 – 0.01]
                   100 000       2.85      20.11  0.14 [0.14 – 0.14]
                 1 000 000      52.88      52.49  1.01 [1.0 – 1.02]
                10 000 000     495.70     316.90  1.56 [1.56 – 1.68]

Table 2: Analytical algorithm benchmarks (top) and microbenchmarks (bottom). All timings are median benchmark timings in milliseconds. Speedups are given as ratio of medians with 95% confidence intervals.
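
The speedup statistic reported in Table 2 (ratio of medians with percentile bootstrap bounds, as described in Section 5.3) can be sketched as below. The function name, the number of resamples, the independent resampling of both samples, and the nearest-rank percentile rule are assumptions; the paper does not specify these details. The timings in the usage lines are synthetic and are not measurements from the paper.

```python
import random
import statistics


def speedup_with_ci(baseline, columnar, resamples=2000, seed=0):
    """Ratio of medians with a 95% percentile-bootstrap interval."""
    rng = random.Random(seed)
    point = statistics.median(baseline) / statistics.median(columnar)
    ratios = []
    for _ in range(resamples):
        # Resample each set of timings with replacement (assumed independent).
        b = [rng.choice(baseline) for _ in baseline]
        c = [rng.choice(columnar) for _ in columnar]
        ratios.append(statistics.median(b) / statistics.median(c))
    ratios.sort()
    lower = ratios[int(0.025 * (resamples - 1))]  # 2.5th percentile
    upper = ratios[int(0.975 * (resamples - 1))]  # 97.5th percentile
    return point, lower, upper


# Synthetic timings in milliseconds, 60 runs per configuration.
noise = random.Random(1)
pypy_ms = [100.0 + noise.uniform(-5, 5) for _ in range(60)]
columnar_ms = [70.0 + noise.uniform(-5, 5) for _ in range(60)]
point, lower, upper = speedup_with_ci(pypy_ms, columnar_ms)
```

A value above 1 means the columnar variant is faster; an interval that straddles 1, as in the Balance rows, indicates that the direction of the effect is not settled by the data.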

Metaprogramming. It is still possible to use metaprogramming without degrading performance, e.g. if a user-supplied attribute needs to be read, we can use the function getattr(obj, user_attr) and it will be eliminated during optimization if the attribute name stays constant inside a long-running loop.
As everything is manifested in application-level Python structures, full reflection over objects and classes is retained. The underlying columnar model can be inspected by accessing the column_attributes dictionary of the class, which adds to existing reflection capabilities.

Tooling. The Python ecosystem provides tooling for several purposes, e.g. debuggers or editors with auto-completion. As the runtime was not significantly altered but only extended with a user library, these mechanisms still work for any Python code. They also work for columnar code as the new physical layout was merely introduced by using existing Python metaprogramming features that are recognized by most tools.

5.4.2 Limitations of the Prototype
Typed Fields. We require that the same attribute of all instances of a class has the same primitive or columnar type. Also, these types have to be explicitly stated in the form of mix-ins, as can be seen in listing 3. This is the most significant difference to dynamically typed ordinary Python objects. However, as our scenarios involve homogeneous data, we see no urgent need to support full dynamicity. We merely provide a contract between programmer and runtime stating that a family of objects is in fact homogeneous in attribute types. We could also use types observed during run-time instead of manual type annotations (see section 6.2).

Object Identity. Another way to determine object identity is to compare objects using the is operator. We cannot adapt the is operator without modifying the VM. Therefore, proxies may compare non-identical using the is operator despite representing the same columnar instance.

Object-Specific State. Object-specific attributes and methods are only defined on proxies and not stored in columns. Therefore, they are lost when the proxy is garbage collected. However, per-instance attributes and methods can be implemented using a global mapping from the object identity to a mapping from attribute names to state or functions.

Associations. Associations are usually constructed in an ad-hoc fashion in Python, e.g. if an instance needs references to multiple other instances, some method creates a list where those references are stored. In contrast, our model requires the association and its multiplicity to be defined in the class definition (see listing 3). One-to-many associations produce a read-only (but not immutable) collection on the instance side, which cannot be replaced by an externally provided collection, but can be modified through methods with side effects like append(). However, the interface and run-time complexity of that framework-provided collection can be influenced by specifying whether it should behave like a set, list or dictionary, thus the impact on code operating on these collections can be mitigated well. It is possible to store a columnar object in an ordinary Python object. The inverse is also possible, given the columnar class declares an Object attribute.

5.4.3 Conceptual Limitations
Garbage-Collection (GC). As classes are considered collections of their instances, an instance needs to be explicitly removed from that collection to be removed from the system. This means that our columnar objects are effectively excluded from automatic GC and need a separate algorithm.
Any GC algorithm for columnar objects suffers from a conceptual problem. To preserve performance, the GC algorithm has to avoid “holes” in columns caused by invalidated objects. Therefore, we have to reorder objects in columns and as a result we need to update the corresponding proxies. To be able to update the proxies we need to keep a reference to them, which hinders the allocation removal optimization. Further work is needed to find an algorithm which creates continuous sequences of living objects in columns on-the-fly while preserving lookup performance.

Generalization of the Approach. For our approach, we only considered a meta-tracing JIT. It allows us to use metaprogramming instead of an interpreter modification without losing performance, because only the resulting low-level control flow is considered. Therefore, we are confident that the approach generalizes to other meta-tracing runtimes, such as HippyVM (footnote 3) for PHP or Topaz for Ruby (footnote 4). A normal tracing JIT or method-based JIT could cause a much higher implementation effort, as the JIT itself might need to be aware of the new memory layout.

Footnote 3: http://hippyvm.com/, 2015-07-16
Footnote 4: http://docs.topazruby.com/, 2015-07-16

5.5 Summary of Evaluation
Our performance measurements show that analytical algorithms can generally benefit from a columnar object layout when traversing large amounts of data (roughly 1,000,000 objects or more). As the optimization does not apply to all object-oriented algorithms, the programmer has to actively decide when a columnar layout is adequate. The qualitative analysis of our concept shows that most object-oriented features are supported and limitations of the prototype can be overcome by integrating these features into the virtual machine. A comparison of listing 3 and listing 1 illustrates how the programmer currently has to adjust the code to provide the type information required by the columnar layout. Regarding seamless integration of our columnar object layout into a dynamically-typed object-oriented language, the interface available to the programmer leaves room for improvement. Overall, the results suggest that a columnar object layout provides performance benefits for analytical algorithms expressed with object-oriented abstractions of the application domain.

6. Related and Future Work
6.1 Related Work
The idea of columnar objects has already been applied to other types of runtimes and libraries.

6.1.1 Kedama
The Kedama [18] educational parallel programming system allows users to program “turtles”, a kind of agent that interacts with its environment, which is organized as a grid. It is integrated into the eToys system, which itself is built on top of Squeak [10], a Smalltalk system. A group of turtles (“breed”) that share the same properties can be instructed to perform a collective action at the same time, implementing what is known as Single Instruction, Multiple Data (SIMD) in parallel programming.
In order to efficiently modify the state of a breed, the turtle properties are stored in columns. A property update on a breed gets compiled to a vectorized operation, which uses functions implemented in C (“primitives”) to process arrays. For example, Kedama provides primitives for arithmetic operations, such as adding two arrays. These functions are platform-level code and not editable by application developers, thus limiting them to the types supported by the primitives. This works well for the domain of simulations for educational purposes. In comparison, our approach targets applications in general and therefore allows the developer to use arbitrary classes, and the JIT compiler optimizes the code to an efficient array operation automatically.

6.1.2 OOPAL
The OOPAL model [15] aims to extend the object-oriented model with concepts from array programming, as found for example in APL. In particular, the model extends the message dispatch to allow the expression of operations on sets of objects without explicitly stating any form of iteration. This concept is evaluated through an implementation in F-Script. This implementation is optimized e.g. by using so-called “smart arrays” which change their representation according to their content, e.g. double-precision numbers are stored in their native platform representation. As a result, method calls to elements of such an array can directly be mapped to native operations.
While an array programming interface would be a suitable extension to our approach, we aim to improve performance for large data sets without changing the dynamic object-oriented model. Nevertheless, our approach indirectly makes use of similar optimization techniques, as the smart arrays of the OOPAL implementation are similar to the storage strategies for collections in the PyPy JIT compiler.

6.1.3 Exploded Java Classes
Noth [17] proposed a modification to the Java language introducing the exploded keyword. Classes attributed with this keyword store their properties inside columnar arrays. Exploded objects can be used in specialized collections and iterators, which are generated by instantiating code templates at compile-time. Field access, instance creation, and iteration on exploded classes and specialized collections also undergo a source-to-source transformation to reflect column access. Subclass polymorphism is implemented by maintaining a type ID for each exploded instance and generating a switch block with one case for each possible method implementation. This implementation strategy results in a discrepancy between the code specified by the programmer and the code actually executed at run-time. In particular, this might become an issue when debugging an application using exploded objects. Our approach does not alter the source code before execution, but improves the performance by re-interpreting the object-oriented execution as array-based computation at run-time. Thus, during debugging or pure interpretation, the code is executed as written down by the programmer. Further, an exploded class must neither contain associations to ordinary Java classes nor inner classes, thereby limiting its composability with the Java object model. Also, reflection and metaprogramming on exploded instances are not supported.

6.1.4 Bcolz
The bcolz [2] Python library implements array and table abstractions for in-memory analyses of large bulks of structured data. It makes use of a columnar data layout and column-wise compression to save memory and CPU cache space and subsequently speed up read-intensive algorithms. A special query interface can be used to execute some computations with highly optimized compression-aware algorithms. However, the library does not integrate with object-oriented abstractions and does not use JIT-based optimizations.

6.1.5 GemStone
The GemStone/S system [13] is an object database which is capable of running a full application. Due to seamless integration with the Smalltalk-80 language, there is no boundary between application logic and database: Persisted objects can be queried using the Smalltalk collection protocol and handled inside domain logic as if they were native heap objects. To our knowledge, there are no publications on the fundamental implementation of GemStone, and thus we cannot provide an appropriate comparison to our approach. However, instead of also providing database features, like persisting objects, establishing transaction boundaries and versioning, we are merely focused on improving analytical algorithm performance.

6.2 Future Work
Based on our proposal to implement a columnar object layout, we see several remaining opportunities for improving runtimes with database technology.

6.2.1 Optimized Collection Protocols
Python’s collection protocol, including built-in operations like map, filter, reduce, and list comprehensions, can be optimized for columnar data. Thus, we are currently working on a prototype collection protocol that transforms the inner Python expressions into a query plan. We can optimize this plan similarly to the optimizations applied by an SQL query optimizer. Based on this, we can map the resulting algorithm onto faster operations working directly on the columns.

6.2.2 Sharing Columns with an In-Memory Database
Our efforts to improve runtime performance for data-heavy algorithms are also part of a project concerned with improving the interface between databases and object-oriented runtimes. For instance, if objects can be implemented in an object-oriented runtime in a structure similar to how data is stored inside an in-memory database, the interface between them could be vastly different from the current ones (i.e. ORM or stored procedures). For example, shared memory between runtime and the database could allow the runtime to map objects to native database data directly, given that security and transactional properties can still be maintained.
Therefore, shared data allows one to manipulate database data through objects, while also using optimized database operations. For example, when filtering a set of objects, instead of using the built-in generic filter operation, the execution environment could map it to the database operation, optimized for the database data layout.

6.2.3 Columnar Runtimes
Another opportunity is a dynamic object-oriented execution environment based completely on a columnar object layout. In particular, it will be interesting to see the performance trade-offs resulting from such an approach. To assess this, we need detailed benchmarks on the impact of a columnar layout on the performance of typical object-oriented applications. Suitable benchmarks are “Richards” or “Deltablue”.
A hybrid solution might create interesting opportunities. Currently the programmer has to decide which object layout fits the anticipated access patterns best. Manual optimization is an extra effort and should be offloaded to the runtime whenever possible. The question remains whether it is feasible for the execution environment to automatically switch between object layouts, based on observed access patterns.

6.2.4 Transactional Object-Oriented Runtimes with Persistence
There has been progress towards software-transactional memory (STM) in PyPy [14], which could be the foundation for a scalable execution environment for large data sets. We might be able to move important database functionality into the runtime itself. One major challenge is an implementation of a transactional persistence on top of transactional objects, only causing minimal overhead. The ideal execution environment would merge the functionality of databases and

traditional runtimes, resulting in a system similar to GemStone/S. As a result, the programmer would not have to switch the paradigm at all and can program in one development environment, i.e. one language and one set of tools.

7. Conclusion
To mitigate the performance deficiencies of dynamic object-oriented runtimes regarding analytical workloads, we introduced a column-oriented object layout which leverages tracing JIT technology to execute object-oriented code on columnar data structures. We developed an interpretation of object identity, associations, attribute access, and collections in terms of the columnar object layout. We have demonstrated the feasibility of our approach with a prototype implemented in PyPy. Performance measurements with this prototype showed that analytical algorithms running on columnar objects perform significantly better than running on native objects. The dynamic object-oriented mechanisms and concepts remain largely unchanged. Overall, our approach contributes to the ways programmers can be relieved of the task of manual optimization.

References
[1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In VLDB, volume 1, pages 169–180, 2001.
[2] F. Alted. Out-of-core columnar datasets, 2014. URL http://blosc.org/docs/bcolz-EuroPython-2014.pdf. EuroPython 2014, Berlin. Accessed: 2015-03-18.
[3] C. F. Bolz. Efficiently implementing Python objects with maps, 2010. URL http://morepypy.blogspot.de/2010/11/efficiently-implementing-python-objects.html. Accessed: 2014-03-13.
[4] C. F. Bolz, A. Cuni, M. Fijalkowski, and A. Rigo. Tracing the meta-level: PyPy’s tracing JIT compiler. In Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems, pages 18–25. ACM, 2009.
[5] C. F. Bolz, A. Cuni, M. Fijalkowski, M. Leuschel, S. Pedroni, and A. Rigo. Allocation removal by partial evaluation in a tracing JIT. In Proceedings of the 20th ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, PEPM ’11, pages 43–52, New York, NY, USA, 2011. ACM.
[6] C. Chambers, D. Ungar, and E. Lee. An efficient implementation of SELF a dynamically-typed object-oriented language based on prototypes. In Conference Proceedings on Object-oriented Programming Systems, Languages and Applications, OOPSLA ’89, pages 49–70, New York, NY, USA, 1989. ACM. ISBN 0-89791-333-7.
[9] Google. Chrome V8 design elements, 2012. URL https://developers.google.com/v8/design. Accessed: 2014-03-18.
[10] D. Ingalls, T. Kaehler, J. Maloney, S. Wallace, and A. Kay. Back to the future: The story of Squeak, a practical Smalltalk written in itself. In ACM SIGPLAN Notices, volume 32, pages 318–326. ACM, 1997.
[11] E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481, 1958.
[12] S. N. Khoshafian and G. P. Copeland. Object identity. In Conference Proceedings on Object-oriented Programming Systems, Languages and Applications, OOPSLA ’86, pages 406–416, New York, NY, USA, 1986. ACM. ISBN 0-89791-204-7.
[13] D. Maier, J. Stein, A. Otis, and A. Purdy. Development of an object-oriented DBMS. In Conference Proceedings on Object-oriented Programming Systems, Languages and Applications, OOPSLA ’86, pages 472–482, New York, NY, USA, 1986. ACM. ISBN 0-89791-204-7.
[14] R. Meier and A. Rigo. A way forward in parallelising dynamic languages. In Proceedings of the 9th International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems PLE, page 4. ACM, 2014.
[15] P. Mougin and S. Ducasse. OOPAL: Integrating array programming in object-oriented programming. In R. Crocker and G. L. S. Jr., editors, Proceedings of the 2003 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, OOPSLA 2003, October 26-30, 2003, Anaheim, CA, USA, pages 65–77. ACM, 2003.
[16] T. Neward. The Vietnam of computer science. The Blog Ride, Ted Neward’s Technical Blog, 2006.
[17] M. E. Noth. Exploding Java Objects for Performance. PhD thesis, University of Washington, 2003.
[18] Y. Ohshima. An End-User Programming System for Constructing Massively Parallel Simulations. PhD thesis, 2006.
[19] H. Plattner. A Course in In-Memory Data Management. Springer. ISBN 978-3-642-36523-2.
[20] H. Plattner. A common database approach for OLTP and OLAP using an in-memory column database. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, SIGMOD ’09, pages 1–2, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-551-2.
[21] H. Plattner. SanssouciDB: An in-memory database for processing enterprise workloads. In Proceedings of the GI-Fachtagung Datenbanksysteme für Business, Technologie und Web 2011, volume 20, pages 2–21, 2011.
[22] A. Rigo and S. Pedroni. PyPy’s approach to virtual machine construction. In Companion to the 21st ACM SIGPLAN symposium on Object-oriented programming systems, languages, and applications, pages 944–953. ACM, 2006.
[7] A. Elo. The rating of chessplayers, past and present. Arco [23] F. Schneider. Compiling Dart to effi-
Publishing, 1978. ISBN 9780668047210. cient machine code, 2012. URL https:
[8] A. Goldberg and D. Robson. Smalltalk-80: The Language and //www.dartlang.org/slides/2013/04/
Its Implementation. Addison-Wesley Longman Publishing compiling-dart-to-efficient-machine-code.
Co., Inc., Boston, MA, USA, 1983. ISBN 0-201-11371-6. pdf. Accessed: 2015-03-18.
