Implementing Higher-Kinded Types in Dotty: Martin Odersky, Guillaume Martres, Dmitry Petrashko
tively. Section 7 compares the four implementation strategies described previously. Section 8 concludes.

2. Background

If we combine generics and subtyping in a language like Java or Scala, we face the problem that we want to express a generic type where the type argument is an unknown type that can range over a set of possible types. The prototypical case is where the argument ranges over all subtypes or supertypes of some type bound, as in List[_ <: Fruit].

Such partially undetermined types come up when we want to express variance. We would like to express, say, that List[Apple] is a subtype of List[Fruit] since Apple is a subtype of Fruit. An equivalent way to express this is to say that the type List[Fruit] includes lists where the elements are of an arbitrary subtype of Fruit. By that reasoning, List[Apple] is a special case of List[Fruit]. We can also express this notion directly using the wildcard type List[_ <: Fruit]. Definition-site variance can be regarded as a user-friendly notation that expands into use-site variance expressions using such wildcards.

The problem is how to model a wildcard type such as List[_ <: Fruit]. Igarashi and Viroli's original interpretation [1] was as an existential type ∃T <: Fruit. List[T], which would be written

    List[T] forSome { type T <: Fruit }

in current Scala. However, existential types usually come with explicit pack and unpack constructs [2], which are absent in Scala's setting. Moreover, actual subtyping rules as e.g. implemented in the reference compilers for Java and Scala are more powerful than what can be expressed with existential types alone [12]. The theory of the rules that are actually implemented is not fully known and the issues look complicated. Tate, Leung and Lerner have explored some possible explanations in [11], but their treatment raises about as many questions as it answers.

2.1 A Uniform Representation of Types

The problem is solved in DOT and dotty by a radical reduction. Type parameters and type arguments are not primitive, but are seen as syntactic sugar for type members and type refinements. For instance, if List is declared like this:

    trait List[Elem] { ... }

then this would be expanded to a parameterless trait with a type member, like this:

    trait List { type Elem }

(For simplicity we re-use the name of the parameter Elem as the name of the type member, whereas in practice the compiler would choose a mangled name like List$Elem in order to avoid name clashes.)

An application such as List[String] is then expanded to List { type Elem = String }. If List were declared as covariant (using [+Elem]), the type application is instead expanded to a refinement with an upper bound:

    List { type Elem <: String }

Analogously, applications of contravariant types lead to refinements with lower bounds.

This scheme has two immediate benefits. First, we only need to explain one concept instead of two. Second, the interaction between the two concepts, which was so difficult before, now becomes trivial. Indeed, a type like

    List[_ <: Fruit]

is simply

    List { type Elem <: Fruit }

That is, wildcard parameters translate directly to refinements with the same type bounds.

3. The Simple Encoding

Following DOT, we model type parameters as type members and type arguments as refinements. For instance, a parameterized class such as

    Map[K, V]

is treated as equivalent to a type with type members:

    class Map { type Map$K; type Map$V }

The type members are name-mangled (i.e. Map$K) to ensure that they do not conflict with other user-defined members or parameters named K or V.

A type instance such as Map[String, Int] would then be treated as equivalent to

    Map { type Map$K = String; type Map$V = Int }

whereas a wildcard type such as Map[_, Int] is equivalent to:

    Map { type Map$V = Int }

That is, _ arguments correspond to type members that are left abstract. Wildcard arguments can have bounds. E.g.

    Map[_ <: AnyRef, Int]

is equivalent to:

    Map { type Map$K <: AnyRef; type Map$V = Int }

3.1 Type Parameters and Partial Applications

The notion of type parameters makes sense even for encoded types, which do not contain parameter lists in their syntax. Specifically, the type parameters of a type are a sequence of type fields that correspond to parameters in the unencoded type. They are determined as follows. The type parameters of a class or trait type are those parameter fields declared in the class that are not yet instantiated, in the order they are given.

This definition of type parameters leads to a simple model of partial applications. Consider for instance:

    type Histogram = Map[_, Int]

Histogram is a higher-kinded type that still has one type parameter. Histogram[String] would be a possible type instance, and it would be equivalent to Map[String, Int].

One interesting consequence of this definition is that higher-kinded types and existential types are identified with each other by virtue of being mapped to the same construct. Indeed, the type Map[_, Int] can be interpreted both as an existential type, where the K field is unspecified, and as a higher-kinded type that takes a type argument for the K field and produces an instance of Map.

3.2 Modeling Polymorphic Type Declarations

The partial application scheme gives us a new and quite elegant way to express certain higher-kinded types. But how do we interpret the polymorphic types that exist in Scala?

More concretely, Scala allows us to write parameterized type definitions, abstract types, and type parameters. In the new scheme, only classes (and traits) can have parameters, and these are treated as equivalent to type members. Type aliases and abstract types do not allow the definition of parameterized types, so we have to interpret polymorphic type aliases and abstract types specially.

Parameterized Aliases. A simple and quite common case of parameterized type definitions in Scala are parameterized aliases. For instance, we find in the Scala package the definition

Non-linear parameter occurrences. It is also possible to express some patterns where type parameters occur non-linearly on the right-hand side. An example is the definition of Pair below.

    type Pair[T] = Tuple2[T, T]

More generally, each type parameter of the left-hand side must appear as a type member of the right-hand side type. Type members must appear in the same order as their corresponding type parameters. References to the type parameter are then translated to references to the type member. The type member itself is left uninstantiated.

3.3 Limitations

The technique described in the previous section can expand most polymorphic type aliases appearing in Scala codebases, but not all of them. Here are some examples of types that cannot be expressed:

1. type Rep[T] = T

   This fails because the right-hand side T does not have a type field named T.

2. type LL[Elem] = List[List[Elem]]

   This fails because the occurrence of the parameter Elem on the right-hand side is not a member binding of the outer List.

3. type RMap[V, K] = Map[K, V]

   This fails because the order of type parameters of the left- and right-hand sides of the definition differ.

Another restriction concerns the bounds of higher-kinded type parameters. Consider the following pattern:

    class Seq[X] extends Iterable[X] ...
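The example that originally accompanied this pattern is only partly preserved; the discussion that follows refers to two methods f and g whose higher-kinded parameters carry the encoded fields Iterable$X and Seq$X respectively. A plausible sketch of the kind of code being discussed, under that assumption and with hypothetical local names (Box, the method bodies), is:

```scala
// Hypothetical reconstruction: local traits stand in for the
// Seq/Iterable pattern above.
trait Iterable[X]
trait Seq[X] extends Iterable[X]

// f's higher-kinded parameter C is bounded by Iterable, so under the
// simple encoding its type parameter field is named Iterable$X.
def f[C[X] <: Iterable[X]](x: C[String]): C[String] = x

// g's parameter C is bounded by Seq, giving it the field name Seq$X.
// The call f[C] is well-typed in current Scala (the bound Seq[X] is
// stronger than the bound Iterable[X] that f requires), but under the
// simple encoding the two kinds, i.e. field-name sequences, differ.
def g[C[X] <: Seq[X]](x: C[String]): C[String] = f[C](x)

// A concrete instantiation with a hypothetical class Box:
class Box[X] extends Seq[X]
val box = new Box[String]
val result = g[Box](box)
```

Both methods are identities, so `result` is the same object as `box`; the point is only that scalac and dotty accept the call f[C] that the simple encoding would reject.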
The two types are incompatible, hence the example above would lead to an ill-typed encoding, even though it seems completely natural. The problem here is that type parameters are encoded as type fields with mangled names that contain the name of the enclosing class. This means that narrowing of bounds for type parameters is not supported. The root problem in the example above is that the type parameter C in g has a type parameter field named Seq$X whereas the type parameter C in f has a type parameter named Iterable$X. Therefore, it should not be allowed to pass C from g to f.

In a sense the simple encoding abandons the traditional notion of kinds, but replaces it with the notion that the kind of a type is the sequence of the names of its type parameter fields. According to the new notion, the call f[C] above would not be kind-correct.

This discussion also points to a need for a mechanism to

A parameterized alias such as type Foo[+X] = T is now just syntactic sugar for:

    type Foo = [+X] -> T

Higher-kinded applications. These are applications of the form

    C[T1, ..., Tn]

where C is a higher-kinded type constructor and T1, ..., Tn are argument types. In such an application C is always a higher-kinded abstract or alias type or a type parameter. If C is a class, the usual encoding with refinement types is applied. If C is a lambda abstraction, beta reduction is applied:

    ([v1 X1, ..., vn Xn] -> T)[U1, ..., Un]
      -->
    [X1 := U1, ..., Xn := Un]T
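The beta-reduction rule above can be observed in released Scala 3, where the paper's [X] -> T type-lambda syntax eventually became [X] =>> T. A small sketch (the alias name Pair is illustrative):

```scala
// ([X] =>> (X, X))[Int] beta-reduces to (Int, Int).
type Pair = [X] =>> (X, X)

// The compiler witnesses the reduction when summoning a type equality:
val ev = summon[Pair[Int] =:= (Int, Int)]

val p: Pair[Int] = (1, 2)
```

The `=:=` evidence is only found because the compiler reduces the type-lambda application exactly as the rule describes.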
(Only fragments of the subtyping rules displayed here survive extraction. The one recoverable rule derives a subtyping between applications of a bounded abstract constructor:

    ⊢ A <:! [X] U    ⊢ T2 <: T1
    ---------------------------
    ⊢ A[T1] <: A[T2]

Of a second rule, only the premise ⊢ A <:! [X] U1 remains.)

The search for suitable base types proceeds according to the linearization order of C. This is a deviation from scalac, which uses a slightly different order in which base types are visited. In both compilers, once a base type satisfies both the type parameter instantiation and the subtyping check, the type parameter stays instantiated to that base type, even if subsequent subtyping checks fail. This is analogous to Prolog's cut operator that prevents backtracking from undoing a partial success. The cut is necessary to prevent a combinatorial explosion by limiting the search space.

Example: Assume the following definitions:

    trait A[X]
    trait B[X, Y]
    object O extends A[String]
                 with B[Int, String]
    def f[C[X], Z](x: C[Z]): C[Z] = x

Then the type parameters for f in the call f(O) are inferred as follows.

1. The constraint to be satisfied is O <: C[Z], where C and Z are instantiable type variables.

2. The argument O does not have the right number of type parameters to match the pattern C[Z] and is therefore discarded.

3. The next base type in linearization order from O is B[Int, String]. This type has enough type parameters,

That is, Transform is a valid candidate for the decomposition of the type of t into a type constructor and a type argument. For this to work, it is important that aliases are not dereferenced eagerly in the compiler. If the compiler had expanded the binding

    trans: Transform[String]

to

    trans: Map[String, String]

type inference would have yielded a different, and less intuitive, expansion:

    f[[X] -> Map[String, X], String](trans)

4.1 Higher-Kinded Wildcard Applications

Recall that one of the main motivations of dotty's encoding of type parameters was to give a simple semantics to wildcard arguments. With the introduction of higher-kinded applications, the problem resurfaces. For example, consider the definition:
    type M[X] <: Map[X, X]

What should the meaning of M[_] be? One might be tempted to simply disallow higher-kinded applications to wildcard arguments. But unfortunately, Scala libraries do contain occurrences of such applications, which are hard to work around. Another possible interpretation would be as an existential type, i.e. M[_] corresponds to

    Map[X, X] forSome { type X }.

If we follow that line, every existential type in Scala could be expressed as a higher-kinded application to wildcard arguments. Indeed,

    T forSome { type X >: L <: H }

is equivalently expressed as

    ([X] -> T)[_ >: L <: H].

On the other hand, getting rid of existential types was another design objective of the dotty project. In the absence of explicit pack and unpack constructs, their interactions with many other concepts are unclear. Furthermore, existential types are semantically quite close to path-dependent types and it seems undesirable to have two concepts that largely overlap.

    ([X] -> M[X, X])[_]

is semantically not the same as M[_, _]. The former type implies a coupling between the two unknown type parameters which the latter type lacks.

The reducibility restriction does not seem to be very burdensome in practice. The dotty test suite, which includes Scala's standard collection library, did not contain any occurrences of irreducible applications that would have to be rejected.

4.2 Implementation

The changes for supporting the direct representation are contained in pull request #1343 of the lampepfl/dotty repository on GitHub. The base-line of that pull request is the refinement encoding presented in Section 6. The changes can be summarized as follows.

New syntax for type lambdas. The additional syntax is:

    Type              ::= HkTypeParamClause -> Type
    HkTypeParamClause ::= [ HkTypeParam {, HkTypeParam} ]
    HkTypeParam       ::= {Annotation} [+ | -]
                          (Id [HkTypeParamClause] | _) TypeBounds
By contrast, in dotty type parameters of classes are simply specially marked type members. For alias and abstract types, type parameters are expressed in the form of type lambdas. E.g., a source level definition like

    type C[X] <: Iterable[X]

is represented in the equivalent form

    type C <: [X] -> Iterable[X].

In summary, type parameters in dotty are a derived concept, not a fundamental one.

Type arguments are represented in dotty as refinements as long as the type constructor is a reference to a class or a trait. For type constructors that are abstract or alias types, there is a special type node called HKApply which has the type constructor and its arguments as fields.

4.4 Discussion

The direct encoding supports higher-kinded types in their full generality. Partial applications are supported through the introduction of type lambdas, which are notationally heavier than the solution of the simple encoding, but are much more legible than the workarounds using structural types and type projections in current Scala.

The direct representation is in a sense less elegant and economical than the simple encoding. It feels a bit awkward that type applications are encoded as refinements in the first-order case but remain primitive constructs in the higher-order case. On the other hand, this aligns well with the handling of wildcard arguments, which were the original motivation for encoding type applications as type member refinements. Wildcard arguments are expressible only if it can be guaranteed that they can eventually be reduced away. So in a sense, one of the main benefits of making the distinction between encoded and unencoded applications is that it obviates the need for existential types.

The conceptual and implementation cost of the direct representation suggests that it might be advantageous to study other encodings of higher-kinded types. Two such candidate encodings are presented in the next sections.

5. The Projection Encoding

The type projection approach was originally suggested by Adriaan Moors. It uses the following basic encodings.

A type lambda [X >: S <: U] -> T is encoded as the refined type

    Lambda$I {
      type $hk0 >: S <: U
      type $Apply = [X := this.$hk0]T
    }

This makes use of a family of synthetic base traits Lambda$..., one for each vector of variances of possible higher-kinded parameters. A suffix of I indicates a non-variant parameter, P (positive) a covariant parameter, and N (negative) a contravariant parameter. An n-ary base trait defines parameters $hk_i for i = 1, ..., n with the given variances, as well as an abstract type member $Apply. For instance, the base trait

    trait Lambda$NP {
      type $hk0
      type $hk1
      type $Apply
    }

is used for binary type lambdas where the first type parameter is contravariant and the second is covariant.

An application of a non-variant higher-kinded type C to an argument T is encoded as

    C { type $hk0 = T } # $Apply

Covariant and contravariant type applications lead to refinements with upper and lower bounds instead.

Beta reduction is supported by dereferencing the type projection. Indeed,

    ([X] -> T)[A]

is encoded as

    Lambda$I {
      type $hk0
      type $Apply = [X := this.$hk0]T
    } {
      type $hk0 = A
    } # $Apply

which reduces to

    [this.$hk0 := A][X := this.$hk0]T

which is equivalent to

    [X := A]T.

Ideally, an encoding of higher-kinded types into type members and refinements would be sufficiently expressive; an encoded term should be type-checkable in the base calculus without special provisions that account for the fact that types were originally higher-kinded. Unfortunately, there are a number of areas where higher-kinded types do shine through. To make, e.g., the standard Scala collections compile, all of the following tweaks are necessary:

1. $Apply refinements are covariant. If T <: U then

       S { type $Apply = T } <: S { type $Apply = U }

   This subtyping relationship does not hold for ordinary type refinements. It would hold for upper bound refinements, of course. But we cannot model $Apply refinements as upper bound refinements because that would lose beta reduction.

2. Encodings of type lambdas distribute over intersections and unions. For instance,

       Lambda$I { ... type $Apply = T } &
       Lambda$P { ... type $Apply = U }

   needs to be normalized to

       Lambda$I { ... type $Apply = T & U }

3. A subtype test of the encoded version of

   Analogously, the subtype test of the encoded version of

       C <: ([X1, ..., Xn] -> T)

   is rewritten to

6. The Type Refinement Encoding

Whereas the type projection encoding makes use of an operator (type projection #) not covered in DOT but present in current Scala, the type refinement encoding uses general recursive types which are part of DOT but absent in Scala. The idea is as follows.

A type lambda [X >: S <: U] -> T is encoded as the refined type

    [X := this.$hk0]T {
      type $hk0 >: S <: U
    }

Covariant and contravariant type applications lead to refinements with upper and lower bounds instead.

Beta reduction is a little bit problematic:

    ([X] -> T)[A]
Table 1. Characteristics of the four implementations.

                    Lines of code  Full hk types  Type lambdas  Full inference  Implementation effort
    Simple                      0  no             no            no              1 person-month
    Projection [4]            719  yes            no            no              4 person-months
    Refinement [8]           1216  yes            yes           no              1 person-month
    Direct [6]               2000  yes            yes           yes             1 person-month
would lead to the encoding

    type Rep = { z => z.$hk0; type $hk0 }

Here, the right-hand side is a path-dependent recursive type, where self is represented by the variable z. Scala cannot currently express types like this, but DOT can. We have extended the dotty compiler to be able to cope with such general recursive types.

Like the projection encoding, the refinement encoding needed several tweaks in the compiler. The necessary changes are contained in pull request #1282 of the lampepfl/dotty repository on GitHub. The main changes necessary were, in addition to tweaks (2) - (4) of the type projection encoding:

1. Support for general recursive types, as outlined above.

2. Two normalization functions that essentially perform beta reduction. One of these (called betaReduce) was applied eagerly whenever a type application was formed; the other (normalizeHkApply) was applied every time the application was accessed.

3. A special case that disregards superfluous bindings of higher-kinded type parameters, as outlined in the garbage collection rule above.

4. A special case that disregards parameter bounds checking when comparing two encodings of type lambdas. The problem here is that naturally parameter bounds are contravariant whereas in the encoding they become member bounds, which are covariant. Disabling bounds checking for encodings of type lambdas thus avoids spurious type errors. Type soundness can still be guaranteed if one type-checks all type applications instead. In that case, type errors are simply reported later, on first-order type formation. However, it turned out subsequently that checking all type applications, including in types inferred by the compiler, is not very practical; so one might be better off enforcing the proper contravariant bounds relationship for type lambdas.

A recurring problem in the implementation of the refinement encoding was that circular types would arise during type simplification. An example of such a circular type is

    C { type $hk0 = this.$hk0 }.

In theory, such circular types are harmless, but naive implementations of most type operations would send the compiler into an infinite loop. So cycles like these had to be detected and eliminated, which turned out to be difficult.

6.1 Discussion

The refinement encoding has the advantage that it is very closely integrated with DOT. It uses the full power of recursive types of DOT to model higher-kinded types. Unlike the projection encoding it does not need additional fundamental concepts like type projection, whose status in future Scala is unclear. On the other hand, the abstraction presented by the refinement encoding is also leaky. Additional subtyping rules for garbage collection and type lambdas are needed, and the compiler needed a subtle combination of two type normalization rules. Also, one of these type normalization rules follows higher-kinded aliases when a type was applied, which leads to suboptimal type inference.

7. Comparison of Implementations

Table 1 gives some of the characteristics of the different implementations. The lines of code number gives approximate additional lines of code relative to the simple encoding. It includes whitespace, comments, and other documentation but excludes tests. The numbers are taken from the pull requests that implemented the proposals; smaller changes to the different proposals that occurred after the initial pull requests are not taken into account.

The other columns in Table 1 indicate whether higher-kinded types are supported in full generality, whether type lambdas are supported, and whether type inference is as complete as in current scalac with the inclusion of the fix to SI-2712.

The implementation of the simple encoding is smallest, but lacks all three of these properties. The refinement encoding is about 500 lines larger than the projection encoding, but includes syntactic support for type lambdas, and also includes several ameliorations in the handling of recursive types. The direct representation has the largest implementation footprint. On the other hand, it is the only one that supports type inference on a par with scalac.

The final column in Table 1 gives estimated implementation cost in (full-time) person months. These are rough estimates derived from personal recollections with the help of some GitHub archaeology. As with all effort estimates, they have to be taken with a grain of salt. The projection encoding took
much longer than the others not just because of its implementation difficulties but also because it was our first attempt at implementing full higher-kinded types. Subsequent implementation schemes were implemented faster in part because of what was learned before.

8. Conclusion

The work on the four different implementations of higher-kinded types was done 2013-2016 in the context of the dotty compiler. Initially, dotty supported the simple encoding. Because of its lack of expressiveness this was discarded in favor of the projection encoding. Once it became clear that general type projection was unsound, we investigated the refinement encoding as an alternative to the projection encoding.

Neither encoding turned out to be fully satisfactory. Both were leaky in the sense that they demanded certain rules that applied specifically to constructs that resulted from encodings. Both also posed considerable difficulties for implementation and debugging. In retrospect, the biggest problem with the projection encoding was the size of the encoded types, which made diagnostics and debugging hard. The refinement encoding added somewhat less bulk, but suffered from the fact that cyclic bindings were often created inadvertently. Both encodings posed the problem that, being encodings, they were not reflected in static types. So the safety net of static typing was largely unavailable to the type checker itself.

In the end, dotty settled for a direct representation of higher-kinded types. This implementation was larger than the others, due to the fact that less typing infrastructure could be re-used. On the other hand, each of the higher-kinded constructs of type lambdas and type applications now was represented by its own static type, which was a big help in ensuring the correctness and completeness of the implementation.

In a sense, the direct representation gives an honest account of the additional implementation overhead caused by higher-kinded types. The overhead is non-negligible: about 2000 lines compared to a total of about 28000 lines taken up by core data structures and the type checker.

In retrospect, we believe the simple encoding is an interesting alternative for a language that wants to provide most of the benefits of higher-kinded types at minimal cost to specification and implementation, provided one can arrive at a crisp definition of what is legal and what is not. But Scala is not that language, since it has a large installed code base that makes essential use of full higher-kinded types. The lesson learned from the work on the dotty compiler was that one is best off supporting full higher-kinded types directly. Encodings seem attractive at first for the code reuse they can provide, but in the end they cause more difficulties than they remove.

Acknowledgments

The dotty compiler has profited from the contributions of many people; main contributors besides the authors include Felix Mulder, Ondřej Lhoták, Liu Fengyun, Vladimir Nikolayev, Samuel Grütter, Vera Salvis, Sébastien Doeraene, Jason Zaugg, and Nicolas Stucki. Adriaan Moors did the original implementation of higher-kinded types in scalac which informed our implementation to no small degree. He as well as Rex Kerr, Daniel Spiewak, Sandro Stucki and Jason Zaugg provided important feedback on some of the implementations presented in this paper.

References

[1] A. Igarashi and M. Viroli. On variance-based subtyping for parametric types. In ECOOP 2002 - Object-Oriented Programming, 16th European Conference, Malaga, Spain, June 10-14, 2002, Proceedings, pages 441-469, 2002. URL http://link.springer.de/link/service/series/0558/bibs/2374/23740441.htm.

[2] J. Mitchell and G. Plotkin. Abstract types have existential types. ACM Trans. on Programming Languages and Systems, 10(3):470-502, 1988.

[3] A. Moors, F. Piessens, and M. Odersky. Generics of a higher kind. In Proc. OOPSLA, pages 432-438, 2008.

[4] M. Odersky. Projection encoding of higher-kinded types, 2014. URL https://github.com/lampepfl/dotty/pull/137.

[5] M. Odersky. Compilers are databases. In JVM Languages Summit, 2015. URL https://www.youtube.com/watch?v=WxyyJyB_Ssc.

[6] M. Odersky. Direct representation of higher-kinded types, 2016. URL https://github.com/lampepfl/dotty/pull/1343.

[7] M. Odersky. Type projection is unsound, 2016. URL https://github.com/lampepfl/dotty/issues/1050.

[8] M. Odersky. Type refinement encoding of higher-kinded types, 2016. URL https://github.com/lampepfl/dotty/pull/1282.

[9] T. Rompf and N. Amin. Type Soundness for Dependent Object Types (DOT). OOPSLA, 2016. To appear.

[10] M. Sabin. SI-2712 add support for partial unification of type constructors, 2016. URL https://github.com/scala/scala/pull/5102.

[11] R. Tate, A. Leung, and S. Lerner. Taming wildcards in Java's type system. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011, pages 614-627, 2011.

[12] M. Torgersen, E. Ernst, C. P. Hansen, P. von der Ahé, G. Bracha, and N. M. Gafter. Adding wildcards to the Java programming language. Journal of Object Technology, 3(11):97-116, 2004. URL http://dx.doi.org/10.5381/jot.2004.3.11.a5.