
Implementing Higher-Kinded Types in Dotty

Martin Odersky, Guillaume Martres, Dmitry Petrashko


EPFL, Switzerland: {first.last}@epfl.ch

Abstract

dotty is a new, experimental Scala compiler based on DOT, the calculus of Dependent Object Types. Higher-kinded types are a natural extension of first-order lambda calculus, and have been a core construct of Haskell and Scala. As long as such types are just partial applications of generic classes, they can be given a meaning in DOT relatively straightforwardly. But general lambdas on the type level require extensions of the DOT calculus to be expressible. This paper is an experience report where we describe and discuss four implementation strategies that we have tried out in the last three years. Each strategy was fully implemented in the dotty compiler. We discuss the usability and expressive power of each scheme, and give some indications about the amount of implementation difficulties encountered.

Categories and Subject Descriptors D.3.3 [Language Constructs and Features]: Polymorphism

General Terms Languages, Compilers, Experimentation

Keywords type constructor polymorphism, higher-kinded types, higher-order genericity, Scala, dotty, DOT, dependent object types

1. Introduction

Scala has first-class support for higher-kinded types [3]; they can be defined by users as follows:

type Foo[A] = List[A] // Foo has kind * -> *

and abstracted over:

def return[F[_], A](x: A): F[A]
type Bar[M[_]] = M[Int] // Bar has kind (* -> *) -> *

Implementing sound support for these higher-kinded types in dotty [5] without restricting their expressive power proved to be challenging, so much so that we evaluated four different strategies before settling on the current direct representation encoding. The strategies are summarized as follows:

- A simple encoding in the DOT-inspired [9] core type structures that can express partial applications and not much more.
- A direct representation that adds support for full type lambdas and higher-kinded applications, without re-using much of the existing concepts of the calculus and the compiler.
- A projection encoding, that encodes higher-kinded types as first-order generic types using type projections T#A.
- A refinement encoding, that encodes higher-kinded types as first-order generic types using refinements and path-dependent types.

Neither of the encodings is fully transparent, in that some type checking operations still needed special provisions for encoded types.

These four strategies were implemented in the dotty research compiler for Scala over the course of three years (2013-2016). The purpose of the present paper is to give a high-level overview of the implementations and the language design choices they entail.

The perspective of the paper is experimental rather than theoretical. One can regard it as a kind of lab notebook describing and contrasting different experiments. The raw data for the experiments exists in the form of commits in the repository lampepfl/dotty on GitHub. Given the considerable implementation effort that went into higher-kinded types, we wanted to create a record of what was done, what worked out, and what did not work as well as hoped for. Overall, it's fair to say that there were more failed than successful experiments, but failures are at least as important to record as successes.

The rest of this paper is organized as follows. Section 3 describes the simple encoding of partial applications into core DOT. Section 4 describes the direct representation of higher-kinded types. Section 5 and Section 6 describe two encodings based on projections and refinements, respectively.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
SCALA'16, October 30-31, 2016, Amsterdam, Netherlands
ACM 978-1-4503-4648-1/16/10...$15.00
http://dx.doi.org/10.1145/2998392.2998400
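To make the introduction's example concrete, here is a small sketch (Scala 3 syntax; the object and method names `HKIntro` and `headOf` are ours, not the paper's) of defining a higher-kinded alias and abstracting over a type constructor:

```scala
object HKIntro:
  type Foo[A] = List[A]   // Foo has kind * -> *
  type Bar[M[_]] = M[Int] // Bar has kind (* -> *) -> *

  // Abstracting over the type constructor M, not just over an element type:
  def headOf[M[X] <: Iterable[X], A](xs: M[A]): A = xs.head

  def main(args: Array[String]): Unit =
    val b: Bar[Foo] = List(1, 2, 3) // Bar[Foo] reduces to Foo[Int] = List[Int]
    assert(headOf(b) == 1)
    println("ok")
```

Note that `Bar[Foo]` instantiates the higher-kinded parameter `M` with the unary alias `Foo`, which is exactly the kind of application the paper's encodings must give meaning to.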
Section 7 compares the four implementation strategies described previously. Section 8 concludes.

2. Background

If we combine generics and subtyping in a language like Java or Scala, we face the problem that we want to express a generic type where the type argument is an unknown type that can range over a set of possible types. The prototypical case is where the argument ranges over all subtypes or supertypes of some type bound, as in List[_ <: Fruit].

Such partially undetermined types come up when we want to express variance. We would like to express, say, that List[Apple] is a subtype of List[Fruit] since Apple is a subtype of Fruit. An equivalent way to express this is to say that the type List[Fruit] includes lists where the elements are of an arbitrary subtype of Fruit. By that reasoning, List[Apple] is a special case of List[Fruit]. We can also express this notion directly using the wildcard type List[_ <: Fruit]. Definition-site variance can be regarded as a user-friendly notation that expands into use-site variance expressions using such wildcards.

The problem is how to model a wildcard type such as List[_ <: Fruit]. Igarashi and Viroli's original interpretation [1] was as an existential type ∃T <: Fruit. List[T], which would be written

List[T] forSome { type T <: Fruit }

in current Scala. However, existential types usually come with explicit pack and unpack constructs [2], which are absent in Scala's setting. Moreover, actual subtyping rules as e.g. implemented in the reference compilers for Java and Scala are more powerful than what can be expressed with existential types alone [12]. The theory of the rules that are actually implemented is not fully known and the issues look complicated. Tate, Leung and Lerner have explored some possible explanations in [11], but their treatment raises about as many questions as it answers.

2.1 A Uniform Representation of Types

The problem is solved in DOT and dotty by a radical reduction. Type parameters and type arguments are not primitive, but are seen as syntactic sugar for type members and type refinements. For instance, if List is declared like this:

trait List[Elem] { ... }

then this would be expanded to a parameterless trait with a type member, like this:

trait List { type Elem; ... }

(For simplicity we re-use the name of the parameter Elem as the name of the type member, whereas in practice the compiler would choose a mangled name like List$Elem in order to avoid name clashes.)

An application such as List[String] is then expanded to List { type Elem = String }. If List were declared as covariant (using [+Elem]), the type application is instead expanded to a refinement with an upper bound:

List { type Elem <: String }

Analogously, applications of contravariant types lead to refinements with lower bounds.

This scheme has two immediate benefits. First, we only need to explain one concept instead of two. Second, the interaction between the two concepts, which was so difficult before, now becomes trivial. Indeed, a type like

List[_ <: Fruit]

is simply

List { type Elem <: Fruit }

That is, wildcard parameters translate directly to refinements with the same type bounds.

3. The Simple Encoding

Following DOT, we model type parameters as type members and type arguments as refinements. For instance, a parameterized class such as

Map[K, V]

is treated as equivalent to a type with type members:

class Map { type Map$K; type Map$V }

The type members are name-mangled (i.e. Map$K) to ensure that they do not conflict with other user-defined members or parameters named K or V.

A type instance such as Map[String, Int] would then be treated as equivalent to

Map { type Map$K = String; type Map$V = Int }

whereas a wildcard type such as Map[_, Int] is equivalent to:

Map { type Map$V = Int }

That is, _ arguments correspond to type members that are left abstract. Wildcard arguments can have bounds. E.g.

Map[_ <: AnyRef, Int]

is equivalent to:

Map { type Map$K <: AnyRef; type Map$V = Int }

3.1 Type Parameters and Partial Applications

The notion of type parameters makes sense even for encoded types, which do not contain parameter lists in their syntax. Specifically, the type parameters of a type are a sequence of
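The member-and-refinement encoding of Section 3 can be written out by hand in ordinary Scala. A hedged sketch (the names `MapRep`, `StringIntMap` and `Histogram` are ours, and we use unmangled member names for readability):

```scala
object SimpleEncoding:
  // Hand-written analogue of the encoding of a class Map[K, V]:
  trait MapRep { type K; type V } // parameters become abstract type members

  // Map[String, Int] becomes a refinement instantiating both members:
  type StringIntMap = MapRep { type K = String; type V = Int }
  // Map[_, Int] leaves K abstract: a wildcard type / partial application:
  type Histogram = MapRep { type V = Int }

  // The fully applied type is a special case of the partially applied one:
  summon[StringIntMap <:< Histogram]

  def main(args: Array[String]): Unit =
    val m = new MapRep { type K = String; type V = Int }
    assert(m.isInstanceOf[MapRep])
    println("ok")
```

The `summon` line checks at compile time that the refinement with more instantiated members is a subtype of the one with fewer, which is the subtyping behind treating `Map[String, Int]` as an instance of the partial application.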
type fields that correspond to parameters in the unencoded type. They are determined as follows.

- The type parameters of a class or trait type are those parameter fields declared in the class that are not yet instantiated, in the order they are given. Type parameter fields of parents are not considered.
- The type parameters of an abstract type are the type parameters of its upper bound.
- The type parameters of an alias type are the type parameters of its right-hand side.
- The type parameters of every other type are the empty sequence.

This definition of type parameters leads to a simple model of partial applications. Consider for instance:

type Histogram = Map[_, Int]

Histogram is a higher-kinded type that still has one type parameter. Histogram[String] would be a possible type instance, and it would be equivalent to Map[String, Int].

One interesting consequence of this definition is that higher-kinded types and existential types are identified with each other by virtue of being mapped to the same construct. Indeed, the type Map[_, Int] can be interpreted both as an existential type, where the K field is unspecified, and as a higher-kinded type that takes a type argument for the K field and produces an instance of Map.

3.2 Modeling Polymorphic Type Declarations

The partial application scheme gives us a new and quite elegant way to express certain higher-kinded types. But how do we interpret the polymorphic types that exist in Scala?

More concretely, Scala allows us to write parameterized type definitions, abstract types, and type parameters. In the new scheme, only classes (and traits) can have parameters and these are treated as equivalent to type members. Type aliases and abstract types do not allow the definition of parameterized types, so we have to interpret polymorphic type aliases and abstract types specially.

Parameterized Aliases. A simple and quite common case of parameterized type definitions in Scala are parameterized aliases. For instance, we find in the scala package the definition

type List[+T] = scala.collection.immutable.List[T]

Aliases like these can be expanded under the simple encoding by simply dropping the parameters on the left-hand side and the arguments on the right-hand side of the equals sign.

Partial Applications. Type definitions representing partial applications like Histogram above are straightforward.

Non-linear parameter occurrences. It is also possible to express some patterns where type parameters occur non-linearly on the right-hand side. An example is the definition of Pair below.

type Pair[T] = Tuple2[T, T]

where Tuple2 is declared as

class Tuple2[+T1, +T2] ...

The definition of Pair is expanded to the following parameterless type alias:

type Pair = Tuple2 { type Tuple2$T2 = Tuple2$T1 }

More generally, each type parameter of the left-hand side must appear as a type member of the right-hand side type. Type members must appear in the same order as their corresponding type parameters. References to the type parameter are then translated to references to the type member. The type member itself is left uninstantiated.

3.3 Limitations

The technique described in the previous section can expand most polymorphic type aliases appearing in Scala codebases, but not all of them. Here are some examples of types that cannot be expressed:

1. type Rep[T] = T

   This fails because the right-hand side T does not have a type field named T.

2. type LL[Elem] = List[List[Elem]]

   This fails because the occurrence of the parameter Elem on the right-hand side is not a member binding of the outer List.

3. type RMap[V, K] = Map[K, V]

   This fails because the order of type parameters of the left- and right-hand sides of the definition differ.

Another restriction concerns the bounds of higher-kinded type parameters. Consider the following pattern:

class Seq[X] extends Iterable[X] ...

def f[C[X] <: Iterable[X]]: C[String] = ...

def g[C[X] <: Seq[X]]: C[String] = f[C]

According to our rules for type parameters, the result type of f is encoded as

C { type Iterable$X = String }

On the other hand, the result type of g is encoded as

C { type Seq$X = String }
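It is worth noting that the three aliases rejected by the simple encoding are perfectly ordinary Scala. A sketch (Scala 3; the enclosing object name is ours) confirming that the unencoded language accepts them, which is what makes the encoding's restrictions visible to users:

```scala
object SimpleEncodingLimits:
  type Rep[T] = T                  // identity: rhs has no type field for T
  type LL[Elem] = List[List[Elem]] // Elem is bound in the inner List
  type RMap[V, K] = Map[K, V]      // parameter order reversed

  def main(args: Array[String]): Unit =
    val r: Rep[Int] = 1
    val l: LL[Int] = List(List(1))
    val m: RMap[Int, String] = Map("one" -> 1)
    assert(r == 1 && l.head.head == 1 && m("one") == 1)
    println("ok")
```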
The two types are incompatible, hence the example above would lead to an ill-typed encoding, even though it seems completely natural. The problem here is that type parameters are encoded as type fields with mangled names that contain the name of the enclosing class. This means that narrowing of bounds for type parameters is not supported. The root problem in the example above is that the type parameter C in g has a type parameter field named Seq$X whereas the type parameter C in f has a type parameter named Iterable$X. Therefore, it should not be allowed to pass C from f to g.

In a sense the simple encoding abandons the traditional notion of kinds, but replaces it with the notion that the kind of a type is the sequence of the names of its type parameter fields. According to the new notion, the call f[C] above would not be kind-correct.

This discussion also points to a need for a mechanism to enforce that the type parameters of a class have the same names as the type parameters of a superclass. In the example above, we would like to enforce that the type parameter of class Seq has the same (encoded) name as the type parameter of class Iterable. A possible way to do this would be by allowing explicitly named parameters that are available under the same name as public fields. E.g.,

class Seq[type X] extends Iterable[X]

A more detailed discussion of named type parameters is beyond the scope of this paper.

3.4 Discussion

The simple encoding has the advantage that no new concepts beyond those already covered by DOT are needed. It supports all forms of partial application naturally, with minimal notational overhead.

On the other hand, the limitations of the simple encoding make it less expressive than the current implementation of higher-kinded types in Scala. Furthermore, the distinction between what can be expressed and what cannot looks somewhat arbitrary and not well connected with the source-level parameter syntax.

4. The Direct Representation

The direct representation of higher-kinded types keeps the encoding of type parameters of traits and classes in terms of type members as before. Higher-kinded abstractions and applications are modeled by their own constructs. In particular, we add explicit internal representations for:

- Type lambdas

  [v1 X1, ..., vn Xn] -> T

  where v1, ..., vn are the variances of the type parameters. This is used internally but can also be written explicitly by the user (see Section 4.2). In fact,

  type Foo[+X] = T

  is now just syntactic sugar for:

  type Foo = [+X] -> T

- Higher-kinded applications

  C[T1, ..., Tn]

  where C is a higher-kinded type constructor and T1, ..., Tn are argument types. In such an application C is always a higher-kinded abstract or alias type or a type parameter. If C is a class, the usual encoding with refinement types is applied. If C is a lambda abstraction, beta reduction is applied:

  ([v1 X1, ..., vn Xn] -> T)[U1, ..., Un]  -->  [X1 := U1, ..., Xn := Un]T

  Reducing applied aliases proceeds similarly, but this is not done eagerly in general as it affects type inference; see the example at the end of this section.

We now sketch the extra subtyping rules as they are implemented in the compiler (a more formal treatment would require us to extend the soundness proof of DOT). For simplicity of presentation we only cover the case of single-parameter lambdas and we do not consider F-bounds or kind-checking. The rule governing type lambdas is as follows (conforms(v1, v2) specifies the conformance relation between variances; it is true iff v1 = v2 or if v2 is non-variant):

  conforms(v1, v2)
  Γ ⊢ L1 <: L2
  Γ ⊢ U2 <: U1
  Γ, X >: L1 <: U1 ⊢ T1 <: T2
  --------------------------------------------------
  Γ ⊢ [v1 X >: L1 <: U1] T1 <: [v2 X >: L2 <: U2] T2

Subtyping rules for higher-kinded applications are as follows (here, the syntax Γ ⊢ S <:! T or Γ ⊢ S >:! T means that the closest known upper (respectively, lower) bound of type S is T).

  Γ ⊢ A1 <:! [v X] U1
  Γ ⊢ [X := T1]U1 <: U2
  ---------------------
  Γ ⊢ A1[T1] <: U2

  Γ ⊢ A2 >:! [v X] U2
  Γ ⊢ U1 <: [X := T2]U2
  ---------------------
  Γ ⊢ U1 <: A2[T2]

  Γ ⊢ A <:! [+X] U
  Γ ⊢ T1 <: T2
  ------------------
  Γ ⊢ A[T1] <: A[T2]
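The type lambdas and the variance-sensitive application rules described above survive almost unchanged in released Scala 3, except that the lambda arrow is spelled `=>>` rather than `->`. A hedged sketch (our object name) illustrating explicit lambdas, the covariant application rule, and beta reduction:

```scala
object DirectRep:
  // The paper's  type Foo[+X] = List[X]  written as an explicit type lambda:
  type Foo = [+X] =>> List[X]

  // Covariant application rule: T1 <: T2 implies Foo[T1] <: Foo[T2].
  summon[Foo[String] <:< Foo[AnyRef]]
  // Beta reduction: Foo[Int] reduces to List[Int].
  summon[Foo[Int] =:= List[Int]]

  def main(args: Array[String]): Unit =
    val xs: Foo[Int] = List(1, 2)
    assert(xs.sum == 3)
    println("ok")
```

Both `summon` calls are compile-time checks: they only typecheck because the subtyping and reduction rules sketched above hold.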
  Γ ⊢ A <:! [-X] U
  Γ ⊢ T2 <: T1
  ------------------
  Γ ⊢ A[T1] <: A[T2]

  Γ ⊢ A <:! [X] U
  Γ ⊢ T1 <: T2    Γ ⊢ T2 <: T1
  ----------------------------
  Γ ⊢ A[T1] <: A[T2]

Type inference also has to be adapted to higher-kinded types. The main addition needed concerns the case where the compiler needs to satisfy a subtyping constraint

S <: C[T1, ..., Tn]

where C is an instantiable higher-kinded type parameter and S and T1, ..., Tn are types. We need to find an instantiation of C that satisfies the constraint.

The scheme to find this instantiation is essentially the same as for the most recent version of scalac (including support for partial unification of type constructors [10]). We first find a base type of S that has a constructor with at least n type parameters where the variances of the rightmost n parameters conform to those of C's type parameters. Let that base type be

B[S1, ..., Sm, U1, ..., Un].

Then, try to instantiate C to

[X1, ..., Xn] -> B[S1, ..., Sm, X1, ..., Xn]

and, if this succeeds, continue with the subtyping check

B[S1, ..., Sm, U1, ..., Un] <: B[S1, ..., Sm, T1, ..., Tn]

The search for suitable base types proceeds according to the linearization order of S. This is a deviation from scalac, which uses a slightly different order in which base types are visited. In both compilers, once a base type satisfies both the type parameter instantiation and the subtyping check, the type parameter stays instantiated to that base type, even if subsequent subtyping checks fail. This is analogous to Prolog's cut operator that prevents backtracking from undoing a partial success. The cut is necessary to prevent a combinatorial explosion by limiting the search space.

Example: Assume the following definitions:

trait A[X]
trait B[X, Y]
object O extends A[String] with B[Int, String]
def f[C[X], Z](x: C[Z]): C[Z] = x

Then the type parameters for f in the call f(O) are inferred as follows.

1. The constraint to be satisfied is O <: C[Z], where C and Z are instantiable type variables.

2. The argument O does not have the right number of type parameters to match the pattern C[Z] and is therefore discarded.

3. The next base type in linearization order from O is B[Int, String]. This type has enough type parameters, and we therefore instantiate C to [X] -> B[Int, X].

4. After instantiation we obtain the constraint B[Int, String] <: B[Int, Z], which leads to the instantiation Z = String.

5. The call is hence expanded to

   f[[X] -> B[Int, X], String](O)

   and its result type is B[Int, String].

6. One could have alternatively chosen to instantiate C to [X] -> A[X]. But since B came first in linearization order, this alternative was discarded. If we were subsequently faced with the constraint that the result type of f should be a subtype of A[String], this constraint would fail.

Inference takes abstract types and type aliases into account when trying to find possible type parameter instances. For instance, given the type alias

type Transform[X] = Map[X, X]

and the definition

val trans: Transform[String]

the call f(trans) would be expanded to

f[Transform, String](trans)

That is, Transform is a valid candidate for the decomposition of the type of trans into a type constructor and a type argument. For this to work, it is important that aliases are not dereferenced eagerly in the compiler. If the compiler had expanded the binding

trans: Transform[String]

to

trans: Map[String, String]

type inference would have yielded a different, and less intuitive, expansion:

f[[X] -> Map[String, X], String](trans)

4.1 Higher-Kinded Wildcard Applications

Recall that one of the main motivations of dotty's encoding of type parameters was to give a simple semantics to wildcard arguments. With the introduction of higher-kinded applications, the problem resurfaces. For example, consider the definition:
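The worked example above can be reproduced almost verbatim in Scala 3, which implements the inference scheme described here. A hedged sketch (the explicit instantiation in step 5 is spelled out with the `=>>` lambda syntax; that current inference also finds it for the bare call `f(O)` is our assumption based on the algorithm described in the text):

```scala
object HKInference:
  trait A[X]
  trait B[X, Y]
  object O extends A[String] with B[Int, String]

  def f[C[_], Z](x: C[Z]): C[Z] = x

  def main(args: Array[String]): Unit =
    // Step 5, with the instantiation C = [X] -> B[Int, X] written explicitly:
    val r: B[Int, String] = f[[X] =>> B[Int, X], String](O)
    assert(r == O)
    // Inference is expected to find the same instantiation from O's base types:
    val r2 = f(O)
    assert(r2 == O)
    println("ok")
```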
type M[X] <: Map[X, X]

What should the meaning of M[_] be? One might be tempted to simply disallow higher-kinded applications to wildcard arguments. But unfortunately, Scala libraries do contain occurrences of such applications, which are hard to work around. Another possible interpretation would be as an existential type, i.e. M[_] corresponds to

Map[X, X] forSome { type X }.

If we follow that line, every existential type in Scala could be expressed as a higher-kinded application to wildcard arguments. Indeed,

T forSome { type X >: L <: H }

is equivalently expressed as

([X] -> T)[_ >: L <: H].

On the other hand, getting rid of existential types was another design objective of the dotty project. In the absence of explicit pack and unpack constructs, their interactions with many other concepts are unclear. Furthermore, existential types are semantically quite close to path-dependent types and it seems undesirable to have two concepts that largely overlap.

The solution pursued in dotty is to disallow applications of higher-kinded types to wildcards unless these applications can be ultimately reduced to wildcard arguments of class types. More precisely, we restrict applications to wildcard arguments to reducible type constructors, where a type constructor is reducible if one of the following is true.

- The constructor is a reference to a class or trait.
- The constructor is a type lambda of the form

  [X] -> B[... X ...]

  where B is a reference to a class or trait and X appears at most once, in argument position to B.
- The constructor is an alias of a reducible constructor.
- The constructor is an abstract type, and any bounds given for it are reducible constructors.

Example: Assume the declarations

type C[X] <: Iterable[X]
type M[X] = Map[X, X]

Then C[_] is legal because C is reducible, but M[_] is illegal because M is irreducible.

The idea is that it is safe to beta-reduce an application of a type lambda of the form given above to a wildcard argument. For instance,

([X] -> C[X])[_]

is simply C[_]. On the other hand,

([X] -> Map[X, X])[_]

is semantically not the same as Map[_, _]. The former type implies a coupling between the two unknown type parameters which the latter type lacks.

The reducibility restriction does not seem to be very burdensome in practice. The dotty test suite, which includes Scala's standard collection library, did not contain any occurrences of irreducible applications that would have to be rejected.

4.2 Implementation

The changes for supporting the direct representation are contained in pull request #1343 of the lampepfl/dotty repository on GitHub. The base-line of that pull request is the refinement encoding presented in Section 6.

The changes can be summarized as follows.

- New syntax for type lambdas. The additional syntax is:

  Type              ::= HkTypeParamClause -> Type
  HkTypeParamClause ::= [ HkTypeParam {, HkTypeParam} ]
  HkTypeParam       ::= {Annotation} [+ | -]
                        (Id [HkTypeParamClause] | _) TypeBounds

- Internal representations for type lambdas and higher-kinded applications as two new forms of types.
- A smart constructor for type application C[T1, ..., Tn] that picks the member-based encoding if C is a class reference, beta-reduces if C is a type lambda, and returns a higher-kinded application otherwise.
- Extractors for type lambdas and type applications that work independently of the underlying representation.
- Implementation of subtyping and inference rules for type lambdas and applications as outlined above.

4.3 Comparison with scalac

The direct representation shares many characteristics with scalac's implementation of higher-kinded types, which was originally done by Adriaan Moors. In particular, the algorithms for subtyping and type inference are quite similar. But there are also differences to note.

In scalac, all forms of type applications are represented the same way. Type arguments are recorded as an additional list-valued field in a TypeRef node, which is one of the fundamental constructors with which the compiler represents types. Furthermore, all type definitions have a field that records the definition's type parameters. Type lambdas are not a primitive concept in scalac; the Scala community has instead settled on a rather elaborate encoding using structural types with type members and type projection, which bears some resemblance to the projection encoding in the next section.
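The smart constructor for applications described in Section 4.2 can be illustrated with a toy model. This is our own miniature sketch, not the compiler's actual data structures: class references take the member-based refinement encoding, lambdas beta-reduce, and everything else becomes an HKApply node.

```scala
object SmartApply:
  // A miniature type representation, loosely following the text:
  sealed trait Tpe
  case class ClassRef(name: String) extends Tpe
  case class ParamRef(name: String) extends Tpe
  case class Refined(parent: Tpe, members: Map[String, Tpe]) extends Tpe
  case class TypeLambda(params: List[String], body: Tpe) extends Tpe
  case class HKApply(tycon: Tpe, args: List[Tpe]) extends Tpe

  // Capture-naive substitution of parameter references, enough for the sketch:
  def subst(t: Tpe, env: Map[String, Tpe]): Tpe = t match
    case ParamRef(n)       => env.getOrElse(n, t)
    case Refined(p, ms)    => Refined(subst(p, env), ms.view.mapValues(subst(_, env)).toMap)
    case TypeLambda(ps, b) => TypeLambda(ps, subst(b, env -- ps))
    case HKApply(c, as)    => HKApply(subst(c, env), as.map(subst(_, env)))
    case _                 => t

  // The smart constructor for C[T1, ..., Tn]:
  def appliedTo(tycon: Tpe, args: List[Tpe]): Tpe = tycon match
    case c: ClassRef =>
      // Member-based encoding: C { type C$hk0 = T1; ... }
      Refined(c, args.zipWithIndex.map((a, i) => s"${c.name}$$hk$i" -> a).toMap)
    case TypeLambda(ps, body) =>
      subst(body, ps.zip(args).toMap) // beta reduction
    case _ =>
      HKApply(tycon, args)            // abstract or alias constructor: keep as-is

  def main(args: Array[String]): Unit =
    val listC = ClassRef("List")
    assert(appliedTo(listC, List(ClassRef("Int"))) ==
      Refined(listC, Map("List$hk0" -> ClassRef("Int"))))
    val id = TypeLambda(List("X"), ParamRef("X"))
    assert(appliedTo(id, List(ClassRef("Int"))) == ClassRef("Int"))
    println("ok")
```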
By contrast, in dotty type parameters of classes are simply specially marked type members. For alias and abstract types, type parameters are expressed in the form of type lambdas. E.g., a source-level definition like

type C[X] <: Iterable[X]

is represented in the equivalent form

type C <: [X] -> Iterable[X].

In summary, type parameters in dotty are a derived concept, not a fundamental one.

Type arguments are represented in dotty as refinements as long as the type constructor is a reference to a class or a trait. For type constructors that are abstract or alias types, there is a special type node called HKApply which has the type constructor and its arguments as fields.

4.4 Discussion

The direct encoding supports higher-kinded types in their full generality. Partial applications are supported through the introduction of type lambdas, which are notationally heavier than the solution of the simple encoding, but are much more legible than the workarounds using structural types and type projections in current Scala.

The direct representation is in a sense less elegant and economical than the simple encoding. It feels a bit awkward that type applications are encoded as refinements in the first-order case but remain as primitive constructs in the higher-order case. On the other hand, this aligns well with the handling of wildcard arguments, which were the original motivation for encoding type applications as type member refinements. Wildcard arguments are expressible only if it can be guaranteed that they can be eventually reduced away. So in a sense, one of the main benefits of making the distinction between encoded and unencoded applications is that this obviates the need for existential types.

The conceptual and implementation cost of the direct representation suggests that it might be advantageous to study other encodings of higher-kinded types. Two such candidate encodings are presented in the next sections.

5. The Projection Encoding

The type projection approach was originally suggested by Adriaan Moors. It uses the following basic encodings.

- A type lambda [X >: S <: U] -> T is encoded as the refined type

  Lambda$I {
    type $hk0 >: S <: U
    type $Apply = [X := this.$hk0]T
  }

  This makes use of a family of synthetic base traits Lambda$..., one for each vector of variances of possible higher-kinded parameters. A suffix of I indicates a non-variant parameter, P (positive) a covariant parameter, and N (negative) a contravariant parameter. An n-ary base trait defines type members $hk0, ..., $hk(n-1) with the given variances, as well as an abstract type member $Apply. For instance, the base trait

  trait Lambda$NP {
    type $hk0
    type $hk1
    type $Apply
  }

  is used for binary type lambdas where the first type parameter is contravariant and the second is covariant.

- An application of a non-variant higher-kinded type C to an argument T is encoded as

  C { type $hk0 = T } # $Apply

  Covariant and contravariant type applications lead to refinements with upper and lower bounds instead.

- Beta reduction is supported by dereferencing the type projection. Indeed,

  ([X] -> T)[A]

  is encoded as

  Lambda$I {
    type $hk0
    type $Apply = [X := this.$hk0]T
  } {
    type $hk0 = A
  } # $Apply

  which reduces to

  [this.$hk0 := A][X := this.$hk0]T

  which is equivalent to

  [X := A]T.

Ideally, an encoding of higher-kinded types into type members and refinements would be sufficiently expressive; an encoded term should be type-checkable in the base calculus without special provisions that account for the fact that types were originally higher-kinded. Unfortunately, there are a number of areas where higher-kinded types do shine through. To make, e.g., the standard Scala collections compile, all of the following tweaks are necessary:

1. $Apply refinements are covariant. If T <: U then

   S { type $Apply = T } <: S { type $Apply = U }

   This subtyping relationship does not hold for ordinary type refinements. It would hold for upper bound refinements, of course. But we cannot model $Apply
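The projection operator `#` at the heart of this encoding can still be seen, in its restricted form, in released Scala 3. A hedged sketch (our `Box` class is illustrative): projecting from a class prefix remains legal, while the encoding's key step, projecting `$Apply` out of a refined prefix such as `(Lambda$I { type $hk0 = Int })#$Apply`, is exactly the generalized projection being phased out.

```scala
object ProjectionDemo:
  class Box { type Elem = Int }

  // Projection with a class prefix is the form future Scala keeps:
  summon[Box#Elem =:= Int]

  def main(args: Array[String]): Unit =
    val x: Box#Elem = 3
    assert(x == 3)
    println("ok")
```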
ments as upper bound refinements because that would 6. The Type Refinement Encoding
lose beta reduction. Whereas the type projection encoding makes use of an op-
2. Encodings of type lambdas distribute over intersections erator (type projection #) not covered in DOT but present in
and unions. For instance, current Scala, the type refinement encoding uses general re-
cursive types which are part of DOT but absent in Scala. The
Lambda$I { ... type $Apply = T } &
Lambda$P { ... type $Apply = U }
idea is as follows.
A type lambda [X >: S <: U] -> T is encoded as the re-
needs to be normalized to fined type
Lambda$I { ... type $Apply = T & U }
[X := this.$hk0]T {
type $hk0 >: S <: U
3. A subtype test of the encoded version of }

    ([X1, ..., Xn] -> T) <: C

where C is a class constructor, is rewritten to

    T <: C[X1, ..., Xn].

Analogously, the subtype test of the encoded version of

    C <: ([X1, ..., Xn] -> T)

is rewritten to

    C[X1, ..., Xn] <: T.

4. Inference of higher-kinded type parameters is handled using an
algorithm analogous to the one described in Section 4.

One problematic aspect of the projection encoding is that generalized
type projections have been shown to be unsound [7]. The known examples
of unsoundness do not overlap with the image of the projection
encoding, so it is conceivable that one could have a restricted form
of type projection that permits the encoding of higher-kinded types
and that is at the same time sound. But the rules for such a
restricted form of type projection have not been worked out, and
indeed the plan for future Scala is to allow only classes as prefixes
of type projections. This can still model Java's inner classes (i.e.,
C#I is Scala's version of Java's inner class reference C.I), but it
cannot model higher-kinded types, which rely on the abstract type
$Apply appearing in prefix position of type projections.

5.1 Discussion

The projection encoding can model full higher-kinded types. It is
based on two concepts already present in current Scala: type members
and type projections. However, the latter concept is about to be
phased out because it has been shown to be unsound.

The implementation overhead of the projection encoding was
considerable and debugging was hard because encoded types can become
quite large. But ultimately, it succeeded in representing all
higher-kinded types in the standard library.

An application of a non-variant higher-kinded type C to an argument T
is encoded as

    C { type $hk0 = T }

Covariant and contravariant type applications lead to refinements with
upper and lower bounds instead.

Beta reduction is a little bit problematic:

    ([X] -> T)[A]

is encoded as

    [X := this.$hk0]T {
      type $hk0
    } {
      type $hk0 = A
    }

This is equivalent to (i.e., both a subtype and a supertype of):

    [this.$hk0 := A][X := this.$hk0]T {
      type $hk0 = A
    }

which simplifies to

    [X := A]T { type $hk0 = A }

The latter type is a subtype of the beta-reduced type [X := A]T, but
it is not a supertype, because of the occurrence of the additional
refinement { type $hk0 = A }. To make beta reduction work correctly,
we have to add a garbage collection rule along the following lines:

    T is a first-order type
    -------------------------
    T <: T { type $hki = U }
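The non-recursive core of the refinement encoding can be illustrated
in current Scala, which already supports refinements of this shape.
This is only a sketch: the trait C1 and the member name $hk0 are
stand-ins following the paper's naming, not compiler internals.

```scala
object RefinementEncodingSketch {
  // A unary type constructor C is modeled as a type with an
  // (initially abstract) type member $hk0 standing for its parameter.
  trait C1 { type $hk0 }

  // The application C[T] is encoded as the refinement C { type $hk0 = T }:
  type CInt = C1 { type $hk0 = Int }

  // The refinement really fixes the member: values can be converted
  // between the path-dependent member type and Int in both directions.
  def into(c: CInt)(x: Int): c.$hk0 = x
  def outOf(c: CInt)(x: c.$hk0): Int = x
}
```

What current Scala cannot express, as the text goes on to explain, is
the recursive case where the refinement's body must refer back to the
refined type's own members.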

Encoding         Lines of code   Full hk types   Type lambdas   Full inference   Implementation effort
Simple                       0   no              no             no               1 person-month
Projection [4]             719   yes             no             no               4 person-months
Refinement [8]            1216   yes             yes            no               1 person-month
Direct [6]                2000   yes             yes            yes              1 person-month

Table 1. Implementation characteristics

To encode all higher-kinded types, the refinement encoding needs the
full power of recursive types. For instance, the type

    type Rep[T] = T

would lead to the encoding

    type Rep = { z => z.$hk0; type $hk0 }

Here, the right-hand side is a path-dependent recursive type, where
self is represented by the variable z. Scala cannot currently express
types like this, but DOT can. We have extended the dotty compiler to
be able to cope with such general recursive types.

Like the projection encoding, the refinement encoding needed several
tweaks in the compiler. The necessary changes are contained in pull
request #1282 of the lampepfl/dotty repository on GitHub.

The main changes necessary were, in addition to tweaks (2)-(4) of the
type projection encoding:

1. Support for general recursive types, as outlined above.

2. Two normalization functions that essentially perform beta
reduction. One of these (called betaReduce) was applied eagerly
whenever a type application was formed; the other (normalizeHkApply)
was applied every time the application was accessed.

3. A special case that disregards superfluous bindings of
higher-kinded type parameters, as outlined in the garbage collection
rule above.

4. A special case that disregards parameter bounds checking when
comparing two encodings of type lambdas. The problem here is that
parameter bounds are naturally contravariant, whereas in the encoding
they become member bounds, which are covariant. Disabling bounds
checking for encodings of type lambdas thus avoids spurious type
errors. Type soundness can still be guaranteed if one type-checks all
type applications instead. In that case, type errors are simply
reported later, on first-order type formation. However, it turned out
subsequently that checking all type applications, including those in
types inferred by the compiler, is not very practical; so one might be
better off enforcing the proper contravariant bounds relationship for
type lambdas.

A recurring problem in the implementation of the refinement encoding
was that circular types would arise during type simplification. An
example of such a circular type is

    C { type $hk0 = this.$hk0 }

In theory, such circular types are harmless, but naive implementations
of most type operations would send the compiler into an infinite
loop. So cycles like these had to be detected and eliminated, which
turned out to be difficult.

6.1 Discussion

The refinement encoding has the advantage that it is very closely
integrated with DOT. It uses the full power of the recursive types of
DOT to model higher-kinded types. Unlike the projection encoding, it
does not need additional fundamental concepts like type projection,
whose status in future Scala is unclear. On the other hand, the
abstraction presented by the refinement encoding is also leaky.
Additional subtyping rules for garbage collection and type lambdas are
needed, and the compiler needed a subtle combination of two type
normalization rules. Also, one of these type normalization rules
follows higher-kinded aliases when a type was applied, which leads to
suboptimal type inference.

7. Comparison of Implementations

Table 1 gives some of the characteristics of the different
implementations. The lines-of-code number gives approximate additional
lines of code relative to the simple encoding. It includes whitespace,
comments, and other documentation but excludes tests. The numbers are
taken from the pull requests that implemented the proposals; smaller
changes to the different proposals that occurred after the initial
pull requests are not taken into account.

The other columns in Table 1 indicate whether higher-kinded types are
supported in full generality, whether type lambdas are supported, and
whether type inference is as complete as in current scalac with the
inclusion of the fix to SI-2712.

The implementation of the simple encoding is smallest, but lacks all
three of these properties. The refinement encoding is about 500 lines
larger than the projection encoding, but includes syntactic support
for type lambdas, and also includes several ameliorations in the
handling of recursive types. The direct representation has the largest
implementation footprint. On the other hand, it is the only one that
supports type inference on a par with scalac.
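For readers unfamiliar with SI-2712: the issue was that scalac would
not infer a unary type constructor F[_] by partially applying a binary
one such as Either. A minimal sketch of the kind of call the fix
enables (hypothetical method names; assumes a Scala version where
partial unification is on, i.e. 2.13 or later):

```scala
object PartialUnificationSketch {
  // A generic operation that abstracts over an arbitrary unary
  // type constructor F[_]:
  def orElse[F[_], A](fa: F[A], default: A): A = default

  // To type-check this call, the compiler must unify F[A] with
  // Either[String, Int], i.e. infer F as the partial application
  // [X] Either[String, X], which is exactly what the SI-2712 fix adds.
  val r: Int = orElse(Right(3): Either[String, Int], 0)
}
```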
The final column in Table 1 gives estimated implementation cost in
(full-time) person-months. These are rough estimates derived from
personal recollections with the help of some GitHub archaeology. Like
all effort estimates, they have to be taken with a grain of salt. The
projection encoding took much longer than the others, not just because
of its implementation difficulties but also because it was our first
attempt at implementing full higher-kinded types. Subsequent
implementation schemes were implemented faster in part because of what
was learned before.

8. Conclusion

The work on the four different implementations of higher-kinded types
was done 2013-2016 in the context of the dotty compiler. Initially,
dotty supported the simple encoding. Because of its lack of
expressiveness, this was discarded in favor of the projection
encoding. Once it became clear that general type projection was
unsound, we investigated the refinement encoding as an alternative to
the projection encoding.

Neither encoding turned out to be fully satisfactory. Both were leaky
in the sense that they demanded certain rules that applied
specifically to constructs that resulted from encodings. Both also
posed considerable difficulties for implementation and debugging. In
retrospect, the biggest problem with the projection encoding was the
size of the encoded types, which made diagnostics and debugging
hard. The refinement encoding added somewhat less bulk, but suffered
from the fact that cyclic bindings were often created
inadvertently. Both encodings posed the problem that, being encodings,
they were not reflected in static types. So the safety net of static
typing was largely unavailable to the type checker itself.

In the end, dotty settled for a direct representation of higher-kinded
types. This implementation was larger than the others, due to the fact
that less typing infrastructure could be re-used. On the other hand,
each of the higher-kinded constructs of type lambdas and type
applications was now represented by its own static type, which was a
big help in ensuring the correctness and completeness of the
implementation.

In a sense, the direct representation gives an honest account of the
additional implementation overhead caused by higher-kinded types. The
overhead is non-negligible: about 2000 lines, compared to a total of
about 28000 lines taken up by core data structures and the type
checker.

In retrospect, we believe the simple encoding is an interesting
alternative for a language that wants to provide most of the benefits
of higher-kinded types at minimal cost to specification and
implementation, provided one can arrive at a crisp definition of what
is legal and what is not. But Scala is not that language, since it has
a large installed code base that makes essential use of full
higher-kinded types. The lesson learned from the work on the dotty
compiler was that one is best off supporting full higher-kinded types
directly. Encodings seem attractive at first for the code reuse they
can provide, but in the end they cause more difficulties than they
remove.

Acknowledgments

The dotty compiler has profited from the contributions of many people;
main contributors besides the authors include Felix Mulder, Ondřej
Lhoták, Liu Fengyun, Vladimir Nikolayev, Samuel Grütter, Vera Salvis,
Sébastien Doeraene, Jason Zaugg, and Nicolas Stucki. Adriaan Moors did
the original implementation of higher-kinded types in scalac, which
informed our implementation to no small degree. He as well as Rex
Kerr, Daniel Spiewak, Sandro Stucki and Jason Zaugg provided important
feedback on some of the implementations presented in this paper.

References

[1] A. Igarashi and M. Viroli. On variance-based subtyping for
parametric types. In ECOOP 2002 - Object-Oriented Programming, 16th
European Conference, Málaga, Spain, June 10-14, 2002, Proceedings,
pages 441-469, 2002. URL
http://link.springer.de/link/service/series/0558/bibs/2374/23740441.htm.

[2] J. Mitchell and G. Plotkin. Abstract types have existential
types. ACM Trans. on Programming Languages and Systems, 10(3):470-502,
1988.

[3] A. Moors, F. Piessens, and M. Odersky. Generics of a higher
kind. In Proc. OOPSLA, pages 432-438, 2008.

[4] M. Odersky. Projection encoding of higher-kinded types, 2014. URL
https://github.com/lampepfl/dotty/pull/137.

[5] M. Odersky. Compilers are databases. In JVM Languages Summit,
2015. URL https://www.youtube.com/watch?v=WxyyJyB_Ssc.

[6] M. Odersky. Direct representation of higher-kinded types,
2016. URL https://github.com/lampepfl/dotty/pull/1343.

[7] M. Odersky. Type projection is unsound, 2016. URL
https://github.com/lampepfl/dotty/issues/1050.

[8] M. Odersky. Type refinement encoding of higher-kinded types,
2016. URL https://github.com/lampepfl/dotty/pull/1282.

[9] T. Rompf and N. Amin. Type Soundness for Dependent Object Types
(DOT). OOPSLA, 2016. To appear.

[10] M. Sabin. SI-2712: add support for partial unification of type
constructors, 2016. URL https://github.com/scala/scala/pull/5102.

[11] R. Tate, A. Leung, and S. Lerner. Taming wildcards in Java's type
system. In Proceedings of the 32nd ACM SIGPLAN Conference on
Programming Language Design and Implementation, PLDI 2011, San Jose,
CA, USA, June 4-8, 2011, pages 614-627, 2011.

[12] M. Torgersen, E. Ernst, C. P. Hansen, P. von der Ahé, G. Bracha,
and N. M. Gafter. Adding wildcards to the Java programming
language. Journal of Object Technology, 3(11):97-116, 2004. URL
http://dx.doi.org/10.5381/jot.2004.3.11.a5.
