Unboxing Using Specialisation
All content following this page was uploaded by Simon Loftus Peyton Jones on 23 July 2013.
Abstract
In performance-critical parts of functional programs substantial perfor-
mance improvements can be achieved by using unboxed, instead of boxed,
data types. Unfortunately, polymorphic functions and data types cannot
directly manipulate unboxed values, precisely because they do not con-
form to the standard boxed representation. Instead, specialised, monomor-
phic versions of these functions and data types, which manipulate the
unboxed values, have to be created. This can be a very tiresome and
error-prone business, since specialising one function often requires the
functions and data types it uses to be specialised as well.
In this paper we show how to automate these tiresome consequential
changes, leaving the programmer to concentrate on where to introduce
unboxed data types in the first place.
1 Introduction
Non-strict semantics certainly add to the expressive power of a language [8].
Sometimes the performance cost of this extra expressiveness is slight, but not
always. It can happen that an inner loop of a program is made seriously less
efficient by non-strictness. For example, consider the following fragment of a
complex-number arithmetic package:
data Complex = Cpx Float Float
Alas, none of the usual list-manipulating functions (map, filter etc) work over
LFloat, so new versions of them have to be defined, and so it goes on.
In general, Peyton Jones & Launchbury [16] put forward the restriction
that: polymorphic functions (and data constructors) cannot be used at unboxed
types. We use the term "creeping monomorphism" to describe the sad necessity
to declare new functions simply because of this restriction. The goal of this
paper is to lift the restriction, by automating the production of new versions
of existing functions and data constructors.
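
The creeping monomorphism described above can be made concrete with a small sketch in present-day GHC Haskell, using the MagicHash extension and the primitives exported by GHC.Exts. Only Complex, Cpx, addCpx and LFloat are names taken from the text. The body of addCpx, the constructors NilF and ConsF, and the helpers mapLFloat, fromList, toList, doubleAll and addBoxed are illustrative assumptions.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Float (F#), Float#, plusFloat#)

-- Complex numbers whose components are unboxed floats.
data Complex = Cpx Float# Float#

-- Assumed definition: component-wise addition on the unboxed fields.
addCpx :: Complex -> Complex -> Complex
addCpx (Cpx r1 i1) (Cpx r2 i2) = Cpx (plusFloat# r1 r2) (plusFloat# i1 i2)

-- A hand-written monomorphic list of unboxed floats. The ordinary
-- list type cannot be used here, so the data type is duplicated.
data LFloat = NilF | ConsF Float# LFloat

-- The ordinary map cannot traverse an LFloat, so it is duplicated too.
mapLFloat :: (Float# -> Float#) -> LFloat -> LFloat
mapLFloat _ NilF = NilF
mapLFloat f (ConsF x xs) = ConsF (f x) (mapLFloat f xs)

-- Boxing and unboxing helpers, used only to observe results.
fromList :: [Float] -> LFloat
fromList = foldr (\(F# x) rest -> ConsF x rest) NilF

toList :: LFloat -> [Float]
toList NilF = []
toList (ConsF x xs) = F# x : toList xs

doubleAll :: [Float] -> [Float]
doubleAll = toList . mapLFloat (\x -> plusFloat# x x) . fromList

addBoxed :: (Float, Float) -> (Float, Float) -> (Float, Float)
addBoxed (F# r1, F# i1) (F# r2, F# i2) =
  case addCpx (Cpx r1 i1) (Cpx r2 i2) of
    Cpx r i -> (F# r, F# i)

main :: IO ()
main = do
  print (doubleAll [1, 2.5])
  print (addBoxed (1, 2) (3, 4.5))
```

Every further list function needed at Float# (filter, foldr and so on) would need the same manual duplication, which is the burden the paper sets out to remove.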
Even the humble + function in addCpx is an example. When the Complex
data type is changed, type inference will find that r1 and r2 are of type Float#.
Since + is an overloaded function, with type
(+) :: Num a => a -> a -> a
Peyton Jones & Launchbury would prohibit + from being applied to a value
of type Float#. However, if the restriction is lifted, and Float# is made an
instance of class Num, then the code for addCpx will compile without
modification.
Our goal is to allow the programmer to use profiling information [19] to
improve run-time performance by making minimal changes to data type decla-
rations and type signatures. The system we describe in this paper propagates
these changes throughout the program, compiling specialised versions of poly-
morphic functions and constructors where they are now used at unboxed types.
We begin by describing our Core language (Section 2). We then examine
the use of polymorphic functions and data types in the presence of unboxed val-
ues (Section 3), formalising Peyton Jones & Launchbury's unboxing restriction,
describing the process of specialisation which enables us to relax this restriction
(Section 3.1), and presenting a partial evaluator which performs this speciali-
sation (Section 3.2). In Section 4 we discuss the practical implications, before
presenting some preliminary results (Section 5) and discussing related work
(Section 6).
has two constructors, Nil and Cons. The implied constructor declarations
might be expressed in the higher-order calculus as follows:
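[Figure: syntax of the Core language, where $\nu$ ranges over unboxed types such as Int# and Float#]
\[
\begin{array}{llcl}
\text{Expression} & e & ::= & x \\
 & & \mid & \lambda x{:}\tau.\, e \\
 & & \mid & e\ x \\
 & & \mid & \mathbf{let}\ x{:}\sigma = e_1\ \mathbf{in}\ e_2 \\
 & & \mid & \Lambda\alpha.\, e \\
 & & \mid & e\ \{\tau\} \\
 & & \mid & C\ \{\tau_1 \ldots \tau_n\}\ x_1 \ldots x_a \\
 & & \mid & \mathbf{case}\ e\ \mathbf{of}\ \{\, C_j\ x_{j1}{:}\tau_{j1} \ldots x_{ja_j}{:}\tau_{ja_j} \to e_j \,\}_{j=1}^{m} \\
\text{PolyType} & \sigma & ::= & \forall\alpha.\, \sigma \mid \tau \\
\text{MonoType} & \tau & ::= & \nu \mid \pi \\
\text{BoxedType} & \pi & ::= & \alpha \mid \tau_1 \to \tau_2 \mid \chi\ \tau_1 \ldots \tau_n
\end{array}
\]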
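\[
\begin{array}{lcl}
\mathit{Nil} & : & \forall\alpha.\ \mathit{List}\ \alpha \\
 & = & \Lambda\alpha.\ [\mathit{Nil}] \\
\mathit{Cons} & : & \forall\alpha.\ \alpha \to \mathit{List}\ \alpha \to \mathit{List}\ \alpha \\
 & = & \Lambda\alpha.\ \lambda v_1{:}\alpha.\ \lambda v_2{:}\mathit{List}\ \alpha.\ [\mathit{Cons}\ v_1\ v_2]
\end{array}
\]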
where [ ] indicates actual construction of the data object. Even though the
constructor Nil has an arity of zero the higher-order constructor still requires
a type parameter to indicate what type it is being used at, e.g. Nil {Int}. In
general, a data declaration has the form
\[
\mathbf{data}\ \chi\ \alpha_1 \ldots \alpha_n \;=\; C_1\ \tau_{11} \ldots \tau_{1a_1} \;\mid\; \cdots \;\mid\; C_m\ \tau_{m1} \ldots \tau_{ma_m}
\]
where $n$ is the number of type parameters of the data type and $a_j$ is the arity
of the data constructor $C_j$. Since the number of type parameters is determined by
the arity of the type constructor, $\chi$, it is the same for all data constructors of
that type.
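
The explicit type parameter carried by every constructor has a close analogue in present-day GHC, where the TypeApplications extension makes the instantiation visible in source code. The sketch below is an illustration under that assumption, not the paper's Core language. The type List mirrors the declaration discussed above, while nilInt, oneTwo and lengthList are hypothetical names.

```haskell
{-# LANGUAGE TypeApplications #-}
module Main where

-- The list type from the text: one type parameter, two constructors.
data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

-- Nil has arity zero, yet it is still instantiated at a type,
-- corresponding to Nil {Int} in the higher-order calculus.
nilInt :: List Int
nilInt = Nil @Int

-- Every constructor of List takes the same single type argument.
oneTwo :: List Int
oneTwo = Cons @Int 1 (Cons @Int 2 (Nil @Int))

lengthList :: List a -> Int
lengthList Nil = 0
lengthList (Cons _ rest) = 1 + lengthList rest

main :: IO ()
main = do
  print nilInt
  print oneTwo
  print (lengthList oneTwo)
```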
2.2 Well-formed expressions
We call an expression $e$ well-formed under type assumption $\Gamma$ if we can derive
the typing judgement $\Gamma \vdash e : \sigma$. The typing rules are quite standard [11] and
are not given here (but see Section 3).
2.3 Notation
For notational convenience we abbreviate sequences such as $x_1 \ldots x_n$ with $\overline{x}$,
where $n = \mathit{length}\ \overline{x}$. This is extended to sequences of pairs, which are
abbreviated with paired sequences, e.g. $x_{j1}{:}\tau_{j1} \ldots x_{ja_j}{:}\tau_{ja_j}$ is abbreviated with $\overline{x_j{:}\tau_j}$.
We use $\equiv$ for syntactical identity of expressions.
A similar restriction is imposed in the typing rule for data constructors. This
prohibits the construction of polymorphic data objects with unboxed compo-
nents, e.g. List Float#.
These restrictions cause the "creeping monomorphism" described in
Section 1, since the programmer must declare suitable monomorphic versions of
any polymorphic functions and data types used at unboxed types. This can be
exceedingly tedious and error-prone.
To address this problem we propose to relax the unboxing restriction,
allowing the programmer unrestricted use of unboxed values. During
compilation we undertake to generate the necessary monomorphic
function versions automatically, converting the unrestricted program into one
which satisfies the unboxing restriction. We can then generate code which directly
manipulates the unboxed values since their type, and hence their representation,
is known at compile time. For example, here is the monomorphic version of C
which manipulates Float#s:
\[
C' \;=\; \lambda f{:}\mathtt{Float\#} \to \mathtt{Float\#} \to \mathtt{Float\#}.\ \lambda x{:}\mathtt{Float\#}.\ \lambda y{:}\mathtt{Float\#}.\ f\ y\ x
\]
Since the code generator knows that x and y have type Float# it produces code
which manipulates floating point numbers, instead of pointers. This is the only
difference between the code produced for C and C'.

¹ The representation information that is typically required is the size of a value and the
position of any heap pointers (so that all roots can be identified during garbage collection).
When more sophisticated calling conventions are used, such as passing arguments in registers,
the actual type may also affect the treatment of a value. For example, a boxed value may
be passed in a pointer register, an Int# in an integer register, and a Float# in a dedicated
floating point register.
² This restriction is equivalent to "Restriction 1: loss of polymorphism" in Peyton Jones
& Launchbury [16].
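
The relationship between C and its Float# version can be sketched in present-day GHC Haskell with the MagicHash extension. The polymorphic definition flipC is inferred from the monomorphic version shown above and is an assumption, as are the names flipCFloat, subBoxed and subUnboxed.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Float (F#), Float#, minusFloat#)

-- Polymorphic C combinator: its arguments are boxed pointers.
-- GHC rejects any use of it at Float#.
flipC :: (a -> b -> c) -> b -> a -> c
flipC f x y = f y x

-- Hand-specialised version at Float#. The body is identical; only
-- the types, and hence the representation of x and y, differ.
flipCFloat :: (Float# -> Float# -> Float#) -> Float# -> Float# -> Float#
flipCFloat f x y = f y x

-- Boxed use of the polymorphic version.
subBoxed :: Float -> Float -> Float
subBoxed = flipC (-)

-- Unboxed use of the specialised version, boxed only at the edges.
subUnboxed :: Float -> Float -> Float
subUnboxed (F# a) (F# b) = F# (flipCFloat (\p q -> minusFloat# p q) a b)

main :: IO ()
main = do
  print (subBoxed 10 3)
  print (subUnboxed 10 3)
```

Writing flipCFloat by hand is exactly the step that the specialiser is intended to perform automatically.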
3.1 Specialisation
The transformation of a program with unrestricted use of unboxed types into
one which satisfies the unboxing restriction above is performed using a partial
evaluator. The idea is to remove all type applications involving unboxed types
by creating new versions of the functions being applied, specialised on the
unboxed types. These specialised versions are created by partially evaluating
the unboxed type applications.
Before launching into the definition of the partial evaluator itself, we give an
overview of the algorithm. Each time a function (or constructor) is applied to
a sequence of types, a new version of the function (or constructor), specialised
on any unboxed types in the application, is created, unless such a version has
already been created. For example, given the code³
    append {Int#} xs (map {[Int#] Int#} (sum {Int#}) (append {[Int#]} yss zss))
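
The bookkeeping in this overview (create a specialised version for each type application unless one already exists) can be sketched as a small Haskell program. This is a simplified model under stated assumptions, not the paper's partial evaluator: the representation Ty, the naming scheme in versionName, and the choice to specialise only on arguments that are themselves unboxed are all hypothetical. The name spectys is borrowed from the paper, but its definition here is a guess at its role.

```haskell
module Main where

import Data.List (intercalate)
import qualified Data.Map.Strict as Map

-- A tiny model of types: unboxed primitives versus everything else.
data Ty
  = Unboxed String     -- e.g. Unboxed "Int#"
  | Boxed String [Ty]  -- e.g. Boxed "List" [Unboxed "Int#"]
  deriving (Eq, Ord, Show)

isUnboxed :: Ty -> Bool
isUnboxed (Unboxed _) = True
isUnboxed _ = False

-- Split type arguments into a specialisation key (Just at unboxed
-- positions, Nothing at positions left polymorphic) and the boxed
-- arguments that remain as ordinary type applications.
spectys :: [Ty] -> ([Maybe Ty], [Ty])
spectys tys =
  ( [if isUnboxed t then Just t else Nothing | t <- tys]
  , filter (not . isUnboxed) tys
  )

-- Versions created so far: (function, key) -> specialised name.
type Versions = Map.Map (String, [Maybe Ty]) String

versionName :: String -> [Maybe Ty] -> String
versionName f key
  | all (== Nothing) key = f
  | otherwise = f ++ "_" ++ intercalate "_" (map slot key)
  where
    slot (Just (Unboxed n)) = n
    slot _ = "*"

-- Resolve one type application, creating a version only if needed.
specialise :: Versions -> String -> [Ty] -> (Versions, String, [Ty])
specialise seen f tys =
  case Map.lookup (f, key) seen of
    Just name -> (seen, name, rest)
    Nothing -> (Map.insert (f, key) fresh seen, fresh, rest)
  where
    (key, rest) = spectys tys
    fresh = versionName f key

specialiseAll :: [(String, [Ty])] -> (Versions, [String])
specialiseAll = foldl step (Map.empty, [])
  where
    step (seen, names) (f, tys) =
      let (seen', name, _) = specialise seen f tys
       in (seen', names ++ [name])

versionCount :: [(String, [Ty])] -> Int
versionCount = Map.size . fst . specialiseAll

intU :: Ty
intU = Unboxed "Int#"

listOf :: Ty -> Ty
listOf t = Boxed "List" [t]

-- The four type applications in the example above.
exampleApps :: [(String, [Ty])]
exampleApps =
  [ ("append", [intU])
  , ("map", [listOf intU, intU])
  , ("sum", [intU])
  , ("append", [listOf intU])
  ]

main :: IO ()
main = mapM_ putStrLn (snd (specialiseAll exampleApps))
```

In this model the two uses of append resolve to different versions because only one of them is applied at an unboxed type, and a repeated request for the same version reuses the existing entry.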
[Figure: the partial evaluator $\mathcal{T}$. Rules (5) to (8) cover let-bindings of polymorphic type, type applications $x\ \{\overline{\tau}\}$, constructor applications $C\ \{\overline{\tau}\}\ \overline{x}$, and case expressions; they are defined in terms of the auxiliary functions spectys and specfn.]
4 Practical Considerations
The specialiser described above interacts with a number of other language
features, including overloading and separate module compilation. We address
these issues and discuss the process of introducing unboxing below.
4.1 Overloaded functions
In Haskell many primitive functions, such as comparison and addition, are
overloaded. This allows these operations to be applied to a number of different
types. For example, addition belongs to the class Num
    class Num a where
      (+) :: a -> a -> a
      ...
which has instances for types such as Int, Integer, Float and Complex. Each
of these instances provides a definition of the function which is called when it
is used at that type. For example:
    instance Num Int where
      (+) x y = plusInt x y
      ...
Since an overloaded function can now be applied to an unboxed type (it was
prohibited by the unboxing restriction before), it makes sense to introduce
new instance declarations for these unboxed types. For example:
    instance Num Int# where
      (+) x y = plusInt# x y
      ...
This allows us to manipulate unboxed values in the same way as we manipulate
their boxed counterparts, greatly reducing the code modifications required
when introducing unboxing. It also overloads the literals, allowing us to write
1 instead of 1#, where an Int# is required.
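
Present-day GHC does not accept instance Num Int#, because the standard Num class ranges over boxed types only. The idea can nevertheless be approximated with a class whose parameter is representation-polymorphic. The following is a minimal sketch under that assumption: the class NumU, its method plusU, and the functions addBoxed and addViaUnboxed are hypothetical names, and the GHC primitive +# stands in for the paper's plusInt#.

```haskell
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE PolyKinds #-}
module Main where

import GHC.Exts (Int (I#), Int#, TYPE, (+#))

-- A cut-down Num whose parameter may be boxed or unboxed.
class NumU (a :: TYPE r) where
  plusU :: a -> a -> a

-- Boxed instance: ordinary addition on Int.
instance NumU Int where
  plusU x y = x + y

-- Unboxed instance: primitive addition on Int#.
instance NumU Int# where
  plusU x y = x +# y

-- The same overloaded operation used at a boxed type ...
addBoxed :: Int -> Int -> Int
addBoxed x y = plusU x y

-- ... and at an unboxed type, with boxing only at the edges.
addViaUnboxed :: Int -> Int -> Int
addViaUnboxed (I# x) (I# y) = I# (plusU x y)

main :: IO ()
main = do
  print (addBoxed 2 3)
  print (addViaUnboxed 2 3)
```

Code written against plusU does not change when the type of its operands is switched between Int and Int#, which is the reduction in source modifications that this section describes.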
After introducing this unboxing signature, type errors may occur where the
unboxed data structure is created and used.⁴ In modifying the code to correct
these type errors the programmer has to introduce explicit unboxing/boxing
coercions at the "boundaries" of the unboxed values. We believe this is a "good
thing", since the programmer is forced to identify these boundaries and consider
the strictness implications. If the unboxing does cause the program to bottom,
the boundaries can be moved or the unboxing modifications abandoned.
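
A boundary coercion of this kind can be sketched in present-day GHC Haskell, where the constructor I# converts between Int and Int#. The function sumTo and its local loop go are hypothetical, and the example assumes the MagicHash extension and the primitives exported by GHC.Exts.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (>#))

-- Boxed interface, unboxed interior. Matching on I# is the unboxing
-- coercion and forces the argument, so sumTo is strict in it.
-- Applying I# to the result is the boxing coercion.
sumTo :: Int -> Int
sumTo (I# n) = I# (go 0# 1#)
  where
    -- The loop manipulates only unboxed integers.
    go :: Int# -> Int# -> Int#
    go acc i
      | isTrue# (i ># n) = acc
      | otherwise = go (acc +# i) (i +# 1#)

main :: IO ()
main = print (sumTo 10)
```

The pattern match on I# marks the point at which the strictness implications mentioned above have to be considered: an undefined argument makes the whole call diverge.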
It remains to be seen what the practical overheads of introducing this form
of explicit unboxing are. However, we believe that when performance is an issue,
and resources are allocated to improving it⁵, it is essential that the programmer
has access to language features, such as this, which enable them to optimise
the execution.
5 Preliminary Results
We have not yet completed the implementation of the specialiser. However,
we do have some preliminary results for programs in which we have introduced
some unboxing and performed the necessary specialisation by hand. These are
summarised in Figure 4. The Unbox column reports the modifications required
to introduce the unboxing while the Boundary column reports the modifications
required to coerce data at the boundaries of the unboxing (see Section 4.4). The
Specialise and Overload columns report the modifications which we expect to
be automated (either by specialisation or as a result of extending the class
operations to the unboxed types). The small number of changes required to
introduce the unboxing is very encouraging.

Program      Brief Description                                            Unboxing Modifications
clausify     converts logical formulae to their clausal form [17,18]      unbox the character symbols
life         list-based implementation of Conway's Life algorithm [2]     unbox the integers used in the board representation
pseudoknot   floating point intensive molecular biology application [5]   unbox all integers and floating point numbers

             #lines   #lines modified/added
Program      code     Unbox   Boundary   Specialise   Overload   Speedup
clausify      112     1       3          25           3          1.25x
life           75     1       1          42           9          1.06x
pseudoknot   3146     3       1          10           2366⁶      4.42x

Figure 4: Preliminary results

⁴ These boundary type errors will not occur where the unboxed values are created/used
by overloaded functions which have instances for the unboxed type (see Section 4.1).
⁵ We would also advocate that such improvements are carefully directed at the actual
hot-spots identified by an execution profiler [18].
6 Related Work
Other treatments of polymorphism in the presence of unboxed values fall into
two categories. The first automatically introduces coercions which box/unbox
values when they are passed to/from a polymorphic function [6,10,13,20]. The
costs of creating and manipulating boxed values are only incurred when
polymorphic code is used. This has the unfortunate consequence that it penalises
the performance of polymorphic code, since unboxing is not possible.
The second approach compiles each polymorphic function in such a way
that it can manipulate unboxed values. This is done by passing enough addi-
tional information at runtime to describe the representation of the values being
manipulated [4,14]. This scheme also penalises the performance of polymor-
phic code since it must interpret the representation information. It has the
unfortunate property that the performance penalty is paid even when the code
is manipulating boxed values.
In contrast, our scheme generates specialised versions of polymorphic
functions and data types which directly manipulate unboxed values. The
performance of polymorphic code is not penalised since the polymorphism is removed
precisely where it would impose a performance penalty. It also enables
arbitrary data types to contain unboxed components. Traded off against this is the
resulting code expansion and the difficulties associated with separate module
compilation.

⁶ The large number of modifications for pseudoknot were due to the amount of literal data
(about 70% of the program) which had to be unboxed by adding #s.
In a strict language, such as ML, both boxed and unboxed values have the
same semantics. Consequently, the approaches to unboxing in strict languages
focus on automatically unboxing values, because doing so is always possible
when the type is known at compile time [4,6,10,14,20]. In a non-strict language,
such as Haskell, unboxed values can only be introduced if we can be sure that
the implied strictness will not change the behaviour of the program. Rather
than relying on the often poor results of a strictness analyser, we ask the
programmer to indicate where the unboxing is to be introduced. A similar
approach is taken by Nöcker & Smetsers [13]. They require the programmer to
introduce explicit strictness annotations which they then use to safely unbox
values.
The use of partial evaluation to produce specialised code is not new. Hall [3]
uses partial evaluation of special type arguments to create specialised versions
which produce and consume an optimised list representation, while both
Augustsson [1] and Jones [9] use partial evaluation to eliminate the overheads
of overloading, creating versions which are specialised on their dictionary
arguments.⁷ Jones [9] also proposes an overloaded implementation of data
types which causes the specialisation of overloading to specialise the data types
as well. Unfortunately this requires a more powerful system of type classes.

⁷ Within our compiler we also use our partial evaluator to eliminate overloading by
specialising on all overloaded type arguments, in addition to any unboxed type arguments. Care
must be taken to ensure that the dictionary argument(s), introduced by the translation into
the Core language, are also eliminated.

7 Future Work
Our immediate goal is to complete the implementation of the specialiser, and
develop a scheme for automatically propagating information about the required
specialisations back to the declaring module. This should enable us to
experiment with the unboxing of large programs by examining the practicalities of
introducing the unboxing and the performance improvements which result.
We also plan to experiment with monomorphisation in general. By modifying
the definition of spectys (Figure 3) the partial evaluator can be directed to
introduce an arbitrary degree of monomorphisation. For example, if we define
spectys $\overline{\tau}$ = ($\overline{\tau}$, [ ]) we get a completely monomorphic program. Our intention
is to explore the practical benefits of optimisations which require monomorphic
code to produce good results.
Bibliography
[1] L Augustsson, "Implementing Haskell overloading," Conference on
Functional Programming Languages and Computer Architecture, Copenhagen,
Denmark, June 1993, 65-73.
[2] M Gardner, "Wheels, Life and Other Mathematical Amusements," W.H.
Freeman and Company, New York, 1993.
[3] CV Hall, "Using Hindley-Milner type inference to optimise list
representation," Conference on Lisp and Functional Programming, Orlando,
Florida, June 1994.
[4] R Harper & G Morrisett, "Compiling Polymorphism Using Intensional
Type Analysis," Technical Report CMU-CS-94-185, School of Computer
Science, Carnegie Mellon University, Sept 1994.
[5] PH Hartel et al., "Pseudoknot: a float-intensive benchmark for functional
compilers," in Proc Sixth International Workshop on the Implementation
of Functional Languages, Norwich, JRW Glauert, ed., University of East
Anglia, Norwich, Sept 1994.
[6] F Henglein & J Jørgensen, "Formally optimal boxing," 21st ACM
Symposium on Principles of Programming Languages, Portland, Oregon, Jan
1994, 213-226.
[7] P Hudak, SL Peyton Jones, PL Wadler, Arvind, B Boutel, J Fairbairn, J
Fasel, M Guzman, K Hammond, J Hughes, T Johnsson, R Kieburtz, RS
Nikhil, W Partain & J Peterson, "Report on the functional programming
language Haskell, Version 1.2," ACM SIGPLAN Notices 27(5), May 1992.
[8] John Hughes, "Why functional programming matters," The Computer
Journal 32(2), April 1989.
[9] MP Jones, "Partial evaluation for dictionary-free overloading," Research
Report YALE/DCS/RR-959, Dept of Computer Science, Yale University,
April 1993.
[10] X Leroy, "Unboxed objects and polymorphic typing," 19th ACM
Symposium on Principles of Programming Languages, Albuquerque, New
Mexico, Jan 1992, 177-188.
[11] JC Mitchell & R Harper, "On the type structure of Standard ML," ACM
Transactions on Programming Languages and Systems 15(2), April 1993,
211-252.
[12] R Morrison, A Dearle, RCH Connor & AL Brown, "An ad-hoc approach to
the implementation of polymorphism," ACM Transactions on Programming
Languages and Systems 13(3), July 1991, 342-371.
[13] E Nöcker & S Smetsers, "Partially strict non-recursive data types,"
Journal of Functional Programming 3(2), April 1993, 191-217.
[14] A Ohori & T Takamizawa, "A polymorphic unboxed calculus and efficient
compilation of ML," Research Institute for Mathematical Sciences, Kyoto
University, Japan, 1994.
[15] SL Peyton Jones, CV Hall, K Hammond, WD Partain & PL Wadler,
"The Glasgow Haskell compiler: a technical overview," Joint Framework
for Information Technology (JFIT) Technical Conference Digest, Keele,
March 1993, 249-257.
[16] SL Peyton Jones & J Launchbury, "Unboxed values as first class
citizens," Conference on Functional Programming Languages and Computer
Architecture, Cambridge, Massachusetts, Sept 1991.
[17] C Runciman & D Wakeling, "Heap profiling of lazy functional programs,"
Journal of Functional Programming 3(2), April 1993, 217-245.
[18] PM Sansom, "Execution profiling for non-strict functional languages,"
PhD thesis, Research Report FP-1994-09, Dept of Computing Science,
University of Glasgow, Sept 1994.
[19] PM Sansom & SL Peyton Jones, "Time and space profiling for non-strict,
higher-order functional languages," 22nd ACM Symposium on Principles
of Programming Languages, San Francisco, California, Jan 1995.
[20] PJ Thiemann, "Unboxed values and polymorphic typing revisited," in
Proc Sixth International Workshop on the Implementation of Functional
Languages, Norwich, JRW Glauert, ed., University of East Anglia,
Norwich, Sept 1994.