GADTs
Oege de Moor, Jeremy Gibbons and Geraint Jones
Haskell is renowned for its many extensions to the Hindley-Milner type system
(type classes, polymorphic recursion, rank-n types, existential types, functional
dependencies, just to name a few). In this chapter we look at yet another extension.
I can hear you groaning, but this is quite a mild extension and one that fits
nicely within the Hindley-Milner framework. Of course, whenever you add a new
feature to a language, you should throw out an existing one (especially if the
language at hand is named after a logician). Now, for this chapter we abandon type
classes; judge for yourself how well we get along without Haskell’s most beloved
feature.
The types are essentially those of the corresponding Haskell functions except that
every argument and every result type has Term wrapped around it. For instance,
the Haskell function succ :: Int → Int corresponds to the constructor
Succ :: Term Int → Term Int.
This term representation meets the typing requirement: we can apply Succ
only to an arithmetic expression; applying Succ to a Boolean expression results
in a type error.
6 Fun with phantom types
The with clause that is attached to each constructor records its type constraints.
For instance, Zero has type Term t with the additional constraint t = Int. Note that
the with clause of the If constructor is not strictly necessary: we could simply have
replaced a by t. Its main purpose is to illustrate that a type equation may
contain type variables that do not appear on the left-hand side of the declaration.
These variables can be seen as being existentially quantified.
Let us move on to defining an interpreter for the expression language. The
interpreter takes an expression of type Term t to a value of type t. The definition
proceeds by straightforward structural recursion.
eval :: ∀t . Term t → t
eval (Zero) = 0
eval (Succ e) = eval e + 1
eval (Pred e) = eval e − 1
eval (IsZero e) = eval e == 0
eval (If e1 e2 e3 ) = if eval e1 then eval e2 else eval e3
Even though eval is assigned the type ∀t . Term t → t, each equation—with the
notable exception of the last one—has a more specific type as dictated by the type
constraints. As an example, the first equation has type Term Int → Int as Zero
constrains t to Int.
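Standard Haskell compilers do not accept the with clauses as written. Here is a sketch that assumes GHC's GADTs extension instead. In this form each constructor states its own result type, and that result type carries the constraint the with clause recorded. The If constructor keeps a separate variable a, which, as noted above, could equally be t.

```haskell
{-# LANGUAGE GADTs #-}

-- Term t: expressions that evaluate to a value of type t. The result type
-- of each constructor plays the role of its with clause, e.g.
-- Zero :: Term Int corresponds to "Zero with t = Int".
data Term t where
  Zero   :: Term Int
  Succ   :: Term Int -> Term Int
  Pred   :: Term Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

-- A tag-free interpreter: matching on a constructor refines t, so each
-- equation may return a value of the more specific type.
eval :: Term t -> t
eval Zero         = 0
eval (Succ e)     = eval e + 1
eval (Pred e)     = eval e - 1
eval (IsZero e)   = eval e == 0
eval (If c e1 e2) = if eval c then eval e1 else eval e2

-- Rejected by the type checker, as intended:
--   bad = Succ (IsZero Zero)
```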
The interpreter is noteworthy in that it is tag-free. If it receives a Boolean
expression, then it returns a Boolean. By contrast, a more conventional interpreter
of type Term → Val has to inject the Boolean into the Val data type. Conversely,
when evaluating a conditional it has to untag the evaluated condition and,
furthermore, check whether the value is actually a Boolean. To make a long
story short, we are experiencing the benefits of static typing. Here is a short
interactive session that shows the interpreter in action (:type displays the type
of an expression).
Hinze 7
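For comparison, here is a minimal sketch of the conventional, tagged approach. The names UTerm, Val, VInt, VBool, asInt and asBool are hypothetical. The interpreter also returns Either String Val rather than plain Val, so that a failed untagging is reported instead of crashing. Note that every arithmetic step and every conditional must inspect a tag at run time.

```haskell
-- A hypothetical untyped expression type, for comparison only.
data UTerm
  = UZero
  | USucc UTerm
  | UPred UTerm
  | UIsZero UTerm
  | UIf UTerm UTerm UTerm

-- Values must be tagged so that a single result type covers both cases.
data Val = VInt Int | VBool Bool
  deriving (Eq, Show)

-- Untagging checks whether a value has the expected shape; ill-typed
-- terms are only detected when they are evaluated.
evalU :: UTerm -> Either String Val
evalU UZero       = Right (VInt 0)
evalU (USucc e)   = VInt . (+ 1) <$> (evalU e >>= asInt)
evalU (UPred e)   = VInt . subtract 1 <$> (evalU e >>= asInt)
evalU (UIsZero e) = VBool . (== 0) <$> (evalU e >>= asInt)
evalU (UIf c t e) = do
  b <- evalU c >>= asBool
  if b then evalU t else evalU e

asInt :: Val -> Either String Int
asInt (VInt n) = Right n
asInt v        = Left ("expected an integer, got " ++ show v)

asBool :: Val -> Either String Bool
asBool (VBool b) = Right b
asBool v         = Left ("expected a Boolean, got " ++ show v)
```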
Come to think of it, the type Term t is quite unusual. Though Term is
parameterized, it is not a container type: an element of Term Int, for instance, is an
expression that evaluates to an integer; it is not a data structure that contains
integers. This means, in particular, that we cannot define a mapping function
(a → b) → (Term a → Term b) as for many other data types. How could we
possibly turn expressions of type Term a into expressions of type Term b? The
type Term b might not even be inhabited: there are, for instance, no terms of type
Term String. Clearly, types with this characteristic deserve a special name. Since the
type argument of Term is not related to any component, we call Term a phantom
type. The purpose of this chapter is to demonstrate the usefulness and the beauty
of phantom types.
2 Generic functions
Suppose you are developing an application where the need arises to compress data
to strings of bits. As it happens, you have data of many different types and you
want to program a compression function that works for all of these types. This
sounds like a typical case for Haskell’s type classes. Alas, I promised to do without
type classes. Fortunately, phantom types offer an intriguing alternative.
The basic idea is to define a type whose elements represent types. For
concreteness, assume that we need compression functions for types built from Int and
Char using the list and the pair type constructors.
We assume that compressInt :: Int → [Bit] and compressChar :: Char → [Bit] are
given. Consider the definition of compress (RList ra). Since the list data type has
two constructors, we emit one bit to distinguish between the two cases. In the case
of a non-empty list, we recursively encode the head and the tail. As an aside, if we
extend compress to data types with more than two constructors, we must ensure
that the codes for the constructors have the unique prefix property, that is, no code
is a prefix of another code. However, we can use the same code for constructors
of different types, since compression (as well as decompression) is driven by the type.
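A possible rendering of the type representations and compress in GHC's GADT syntax follows. Only RList appears in the text. The other constructor names, the Bit type with constructors O and I, and the fixed-width encodings for compressInt and compressChar are assumptions made for illustration. The encodings use all the bits of an Int and 21 bits per character, which is enough for any Unicode code point.

```haskell
{-# LANGUAGE GADTs #-}
import Data.Bits (finiteBitSize, testBit)

-- Type representations: an element of Type t describes the type t.
data Type t where
  RInt  :: Type Int
  RChar :: Type Char
  RList :: Type a -> Type [a]
  RPair :: Type a -> Type b -> Type (a, b)

data Bit = O | I deriving (Eq, Show)

toBit :: Bool -> Bit
toBit True  = I
toBit False = O

-- Hypothetical fixed-width encoders, most significant bit first.
compressInt :: Int -> [Bit]
compressInt n = [toBit (testBit n j) | j <- [w - 1, w - 2 .. 0]]
  where w = finiteBitSize n

-- 21 bits are enough for every Unicode code point.
compressChar :: Char -> [Bit]
compressChar c = [toBit (testBit (fromEnum c) j) | j <- [20, 19 .. 0]]

-- The generic function: the representation selects the encoding.
compress :: Type t -> t -> [Bit]
compress RInt n             = compressInt n
compress RChar c            = compressChar c
compress (RList _) []       = [O]
compress (RList ra) (x : xs) = I : compress ra x ++ compress (RList ra) xs
compress (RPair ra rb) (x, y) = compress ra x ++ compress rb y
```

For example, compress (RList RChar) "ab" yields 45 bits: three bits for the list constructors plus 21 bits for each of the two characters.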
We can view Type as representing a family of types and compress as implementing
a family of functions. Through the first argument of compress we specify
which member of the family we wish to apply. Functions that work for a family
of types are commonly called generic functions. Using a phantom type of type
representations, we can define generic functions easily. Typical examples of generic
functions include equality and comparison functions, pretty printers and parsers.
Actually, pretty printing is quite a nice example, so let us consider this next.
In Haskell, the Show class takes care of converting values into string
representations. We will define a variant of its show method building upon the
pretty-printing combinators of Chapter ??. The implementation of the Show class is
complicated by the desire to print lists of characters differently from lists of other
types: a list of characters is shown using string syntax whereas any other list is
shown as a comma-separated sequence of elements enclosed in square brackets.
Using type representations we can easily single out this special case by supplying
an additional equation.
Here, prettyInt :: Int → Doc, prettyChar :: Char → Doc, and prettyString :: String →
Doc are predefined functions that pretty print integers, characters, and strings,
respectively.
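The Doc combinators of the other chapter are not reproduced here. This sketch therefore assumes plain String results in place of Doc and uses show in place of prettyInt, prettyChar and prettyString. The equation for RList RChar is the additional case that singles out strings. Because it precedes the general list equation, it takes priority.

```haskell
{-# LANGUAGE GADTs #-}
import Data.List (intercalate)

data Type t where
  RInt  :: Type Int
  RChar :: Type Char
  RList :: Type a -> Type [a]
  RPair :: Type a -> Type b -> Type (a, b)

-- String-based stand-in for the Doc-based pretty printer.
pretty :: Type t -> t -> String
pretty RInt n               = show n
pretty RChar c              = show c
pretty (RList RChar) s      = show s   -- the additional equation: strings
pretty (RList ra) xs        = "[" ++ intercalate ", " (map (pretty ra) xs) ++ "]"
pretty (RPair ra rb) (x, y) = "(" ++ pretty ra x ++ ", " ++ pretty rb y ++ ")"
```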
3 Dynamic values
Note that Type and Dynamic are now defined by mutual recursion.
Dynamic values and generic functions go well together. In a sense, they are
complementary concepts. It is not too difficult, for instance, to extend the generic
functions of the previous section so that they also work for dynamic values (see
Exercises 7 and 8): a dynamic value contains a type representation, which a generic
function requires as a first argument. The following interactive session illustrates
the use of dynamics and generics (note that the identifier it always refers to the
previously evaluated expression).
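This section does not reproduce the definitions of Dynamic or of the equality check tequal. One way to write them with GADTs is sketched below. It assumes that tequal returns a conversion function from t to u when the two representations agree. Under that assumption, cast can simply apply the conversion to the stored value. Adding RDyn to Type makes Type and Dynamic mutually recursive, as described above.

```haskell
{-# LANGUAGE GADTs #-}

-- Type representations, now including Dynamic itself.
data Type t where
  RInt  :: Type Int
  RChar :: Type Char
  RList :: Type a -> Type [a]
  RPair :: Type a -> Type b -> Type (a, b)
  RDyn  :: Type Dynamic

-- A dynamic value pairs a value with the representation of its type;
-- the type t itself is hidden (existentially quantified).
data Dynamic where
  Dyn :: Type t -> t -> Dynamic

-- If the representations are equal, return a conversion t -> u
-- (which is the identity at run time).
tequal :: Type t -> Type u -> Maybe (t -> u)
tequal RInt RInt   = Just id
tequal RChar RChar = Just id
tequal RDyn RDyn   = Just id
tequal (RList ra1) (RList ra2) = fmap map (tequal ra1 ra2)
tequal (RPair ra1 rb1) (RPair ra2 rb2) = do
  f <- tequal ra1 ra2
  g <- tequal rb1 rb2
  Just (\(a, b) -> (f a, g b))
tequal _ _ = Nothing

-- cast succeeds exactly when the stored and requested types agree.
cast :: Dynamic -> Type t -> Maybe t
cast (Dyn ra a) rt = fmap (\f -> f a) (tequal ra rt)
```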
By pairing a value with its type representation we turn a static value into a
dynamic one. The other way round involves a dynamic check. This operation, usually
termed cast, takes a dynamic value and a type representation and checks whether
the type representation of the dynamic value and the supplied one are identical.
The equality check is defined
Exercise 7 Use the results of the previous exercise to implement functions that
compress and uncompress dynamic values. To compress a dynamic value, first
compress the type representation and then compress the static value. Conversely,
to uncompress a dynamic value first uncompress the type representation and then
use the type representation to read in a static value of this type. Finally, extend
the generic functions compress and uncompress to take care of dynamic values.
Exercise 8 Implement functions that pretty print and parse dynamic values and
extend the definitions of pretty and parse accordingly.
Exercise 9 Extend the type of type representations Type and the dynamic type
equality check tequal to include functional types of the form a → b.
Let us develop the theme of Section 2 a bit further. Suppose you have to write a
function that traverses a complex data structure representing a university’s
organisational structure, and that increases the age of a given person. The
interesting part of this function, namely the increase of age, is probably dominated
by the boilerplate code that recurses over the data structure. The boilerplate code
is not only tiresome to program, it is also highly vulnerable to changes in the
underlying data structure. Fortunately, generic programming saves the day as it
allows us to write the traversal code once and use it many times. Before we look at
an example let us first introduce a data type of persons.
Now, the aforementioned function that increases the age can be programmed as
follows (this is only the interesting part without the boilerplate code):
The function tick s is a so-called traversal, which can be used to modify data of
any type (the type Traversal will be defined shortly). In our case, tick s changes
values of type Person whose name equals s; integers, characters, lists, etc. are left
unchanged.
The following interactive session shows the traversal tick in action. The
combinator everywhere, defined below, implements the generic part of the traversal: it
The second and the third examples illustrate generic queries: age computes the age
of a person, sizeof yields the size of an object (the number of occupied memory
cells), and total applies an integer query to every component of a value and sums up
the results.
Turning to the implementation, the type of generic traversals is given by:
type Traversal = ∀t . Type t → t → t
A generic traversal takes a type representation and transforms a value of the
specified type. The universal quantifier makes explicit that the function works for
all representable types. The simplest traversal is copy, which does nothing.
copy :: Traversal
copy rt = id
Traversals can be composed using the operator ‘◦’, which has copy as its identity.
The function imap can be seen as a ‘traversal transformer’. Note that imap has
a so-called rank-2 type: it takes polymorphic functions to polymorphic functions.
The combinator everywhere enjoys the same type.
Actually, there are two flavours of the combinator: everywhere f applies f after
the recursive calls (it proceeds bottom-up), whereas everywhere′ applies f before
(it proceeds top-down). And yes, everywhere and everywhere′ have the structure
of generic folds and unfolds; only the types are different (Chapter ?? treats folds
and unfolds in detail).
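The traversal machinery can be sketched in GHC with the RankNTypes extension. Several pieces are assumptions because their definitions do not appear here: the Person type, the RPerson constructor, the operator <.> standing in for ‘◦’, and the particular effect of tick (adding one year). The definitions of imap and of everywhere′ (written everywhere' in ASCII) are likewise one plausible reading of the description above, not the chapter's own code.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

-- Hypothetical data type of persons.
type Name = String
type Age = Int
data Person = Person Name Age deriving (Eq, Show)

-- Type representations, extended with a case for Person.
data Type t where
  RInt    :: Type Int
  RChar   :: Type Char
  RList   :: Type a -> Type [a]
  RPair   :: Type a -> Type b -> Type (a, b)
  RPerson :: Type Person

rString :: Type String
rString = RList RChar

-- A generic traversal transforms a value of any representable type.
type Traversal = forall t. Type t -> t -> t

copy :: Traversal
copy _ = id

-- Composition of traversals (the text's ‘◦’); copy is its identity.
(<.>) :: Traversal -> Traversal -> Type t -> t -> t
(f <.> g) rt = f rt . g rt

-- Apply a traversal to the immediate components of a value.
imap :: Traversal -> Type t -> t -> t
imap _ RInt n                = n
imap _ RChar c               = c
imap _ (RList _) []          = []
imap f (RList ra) (x : xs)   = f ra x : f (RList ra) xs
imap f (RPair ra rb) (x, y)  = (f ra x, f rb y)
imap f RPerson (Person n a)  = Person (f rString n) (f RInt a)

-- Bottom-up (everywhere) and top-down (everywhere') application of f.
everywhere, everywhere' :: Traversal -> Type t -> t -> t
everywhere f rt  = f rt . imap (everywhere f) rt
everywhere' f rt = imap (everywhere' f) rt . f rt

-- Hypothetical tick: add one year to the age of the person named s.
tick :: Name -> Type t -> t -> t
tick s RPerson (Person n a) | n == s = Person n (a + 1)
tick _ _ v = v
```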
Generic queries have a similar type except that they yield a value of some fixed
type:
type Query x = ∀t . Type t → t → x
In the rest of this section we confine ourselves to queries of type Query Int.
Exercise 11 deals with the general case. The definition of the combinator total follows
the model of everywhere. We first define a non-recursive, auxiliary function that
sums up the immediate components of a value and then tie the recursive knot.
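Queries follow the same pattern. The sketch below repeats the hypothetical Person type and type representations from the traversal setting so that it stands alone. The name isum for the non-recursive helper is an illustrative assumption, as is this particular definition of age.

```haskell
{-# LANGUAGE GADTs, RankNTypes #-}

-- Same hypothetical Person type and representations as for traversals.
type Name = String
type Age = Int
data Person = Person Name Age

data Type t where
  RInt    :: Type Int
  RChar   :: Type Char
  RList   :: Type a -> Type [a]
  RPair   :: Type a -> Type b -> Type (a, b)
  RPerson :: Type Person

rString :: Type String
rString = RList RChar

-- A generic query maps a value of any representable type to an x.
type Query x = forall t. Type t -> t -> x

-- Non-recursive helper: sum the query results of the immediate components.
isum :: Query Int -> Type t -> t -> Int
isum _ RInt _               = 0
isum _ RChar _              = 0
isum _ (RList _) []         = 0
isum q (RList ra) (x : xs)  = q ra x + q (RList ra) xs
isum q (RPair ra rb) (x, y) = q ra x + q rb y
isum q RPerson (Person n a) = q rString n + q RInt a

-- Tie the knot: q at the root plus total q on everything below it.
total :: Query Int -> Type t -> t -> Int
total q rt v = q rt v + isum (total q) rt v

-- The age of a person; every other value contributes 0.
age :: Query Int
age RPerson (Person _ a) = a
age _ _ = 0
```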
Exercise 10 Prove the following properties of imap (which justify its name).
5 Normalization by evaluation
Suppose you have defined the combinators
s = λx y z → (x z) (y z)
k = λx y → x
i = λx → x
and you want to normalize combinator expressions. The function reify, defined
below, allows you to do that: it takes a type representation (where b represents the
base type and ‘:→’ functional types) and yields the normal form of a Haskell value
of this type, where the normal form is given as an element of a suitable expression
data type.
Main> reify (b :→ b) (s k k)
Fun (λa → a)
Main> reify (b :→ (b :→ b)) (s (k k) i)
Fun (λa → Fun (λb → a))
Main> let e = (s ((s (k s)) ((s (k k)) i))) ((s ((s (k s)) ((s (k k)) i))) (k i))
Main> :type e
∀t . (t → t) → t → t
Main> reify ((b :→ b) :→ (b :→ b)) e
Fun (λa → Fun (λb → App a (App a b)))
The last test case is probably the most interesting one as the expression e is quite
involved. We first use Haskell’s type inferencer to determine its type, then we
call reify passing it a representation of the inferred type and e itself. And voilà:
the computed result shows that e normalizes to a function that applies its first
argument twice to its second.
Now, since we want to represent simply typed lambda terms, we change the
type of type representations to
infixr :→
data Type t = RBase with t = Base
| Type a :→ Type b with t = a → b
b :: Type Base
b = RBase.
Here, Base is the base type of the simply typed lambda calculus. We won’t reveal
its definition until later. To represent lambda terms we use higher-order abstract
syntax. For instance, the lambda term λf.λx.f (f x) is represented by the Haskell
term Fun (λf → Fun (λx → App f (App f x))), that is, abstractions are represented
by Haskell functions.
Note that since we use higher-order abstract syntax there is no need to represent
variables.
The function reify takes a Haskell value of type t to an expression of type
Term t. It is defined by induction over the structure of types, that is, it is driven
by the type representation of t. Let us consider functional types first. In this case,
reify has to turn a value of type a → b into an expression of type Term (a → b).
The constructor Fun constructs terms of this type, so we are left with converting an
a → b value to a Term a → Term b value (unfortunately, Term does not give rise
to a mapping function). Suppose that there is a transformation of type Term a → a
available. Then we can reflect a Term a to an a, apply the given function, and
finally reify the resulting b to a Term b. In other words, to implement reify we
need its converse, as well. Turning to the base case, this means that we require
functions of type Base → Term Base and Term Base → Base. Fortunately, we
are still free in the choice of the base type. An intriguing option is to set Base to
the fixed point of Term.
Then the isomorphisms out and In constitute the required functions. Given these
prerequisites we can finally define reify and its inverse reflect.
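Below is a GHC rendering of reify and reflect, with the with clauses expressed as GADT constructor types. The Term declaration (App and Fun) follows the higher-order abstract syntax described above. The extra constructor Var and the printer render are hypothetical additions in the spirit of the hint to Exercise 12. They are needed only to make results observable. Bound variables are printed as x0, x1, … rather than a, b.

```haskell
{-# LANGUAGE GADTs #-}

infixr 5 :->

-- Type representations for the simply typed lambda calculus.
data Type t where
  RBase :: Type Base
  (:->) :: Type a -> Type b -> Type (a -> b)

b :: Type Base
b = RBase

-- Higher-order abstract syntax; Var is a hypothetical extra constructor.
data Term t where
  App :: Term (a -> c) -> Term a -> Term c
  Fun :: (Term a -> Term c) -> Term (a -> c)
  Var :: String -> Term t

-- The base type is the fixed point of Term; In and out are the isomorphisms.
newtype Base = In { out :: Term Base }

-- reify and reflect are mutually recursive, driven by the type representation.
reify :: Type t -> t -> Term t
reify RBase v       = out v
reify (ra :-> rb) v = Fun (\x -> reify rb (v (reflect ra x)))

reflect :: Type t -> Term t -> t
reflect RBase tm       = In tm
reflect (ra :-> rb) tm = \x -> reflect rb (App tm (reify ra x))

-- Print a term, naming bound variables by their binding depth.
render :: Int -> Term t -> String
render _ (Var v)   = v
render n (App f a) = "(" ++ render n f ++ " " ++ render n a ++ ")"
render n (Fun f)   = "(\\" ++ v ++ " -> " ++ render (n + 1) (f (Var v)) ++ ")"
  where v = "x" ++ show n

-- The combinators and the involved example from the session above.
s :: (a -> b -> c) -> (a -> b) -> a -> c
s x y z = (x z) (y z)

k :: a -> b -> a
k x _ = x

i :: a -> a
i x = x

e :: (t -> t) -> t -> t
e = (s ((s (k s)) ((s (k k)) i))) ((s ((s (k s)) ((s (k k)) i))) (k i))
```

With these definitions, render 0 (reify ((b :-> b) :-> b :-> b) e) produces (\x0 -> (\x1 -> (x0 (x0 x1)))), matching the last line of the session.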
Exercise 12 Implement a show function for Term t. Hint: augment the expression
type Term t by an additional constructor Var of type String → Term t.
6 Functional unparsing
Can we program C’s printf function in a statically typed language such as Haskell?
Yes, we can, provided we use a tailor-made type of format directives (rather than
a string). Here is an interactive session that illustrates the puzzle (we renamed
printf to format).
The format directive Lit s means: emit s literally. The directives Int and String
instruct format to take an additional argument of type Int and String,
respectively, which is then shown. The operator ‘:^:’ is used to concatenate two
directives.
The type of format depends on its first argument, the format directive. This
is something we have already seen a number of times: the type of compress, for
instance, depends on its first argument, the type representation. Of course, the
dependence here is slightly more involved. Yet, this smells like a case for phantom
types.
The format directive can be seen as a binary tree of type representations:
Lit s, Int, and String form the leaves; ‘:^:’ constructs the inner nodes. The type of
format is essentially obtained by linearizing the binary tree, mapping, for instance,
String :^: Lit " is " :^: Int to String → Int → String.
Before tackling the puzzle proper, it is useful to reconsider flattening binary
trees (see IFPH, Section 7.3.1). To avoid the repeated use of the expensive ‘++’
operation, one typically defines an auxiliary function that makes use of an
accumulating parameter.
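As a reminder of the technique, here is a sketch of flattening with an accumulating parameter. The Tree type and the function names are assumptions for illustration, not taken from IFPH.

```haskell
-- Hypothetical binary tree type with values at the leaves.
data Tree a = Leaf a | Fork (Tree a) (Tree a)

-- Naive version: repeated (++) is quadratic on left-nested trees.
flattenSlow :: Tree a -> [a]
flattenSlow (Leaf a)   = [a]
flattenSlow (Fork l r) = flattenSlow l ++ flattenSlow r

-- Accumulating version, satisfying flatten' t acc = flattenSlow t ++ acc.
flatten :: Tree a -> [a]
flatten t = flatten' t []

flatten' :: Tree a -> [a] -> [a]
flatten' (Leaf a) acc   = a : acc
flatten' (Fork l r) acc = flatten' l (flatten' r acc)
```

Observe that flatten' (Fork l r) is just flatten' l . flatten' r, so each use of ++ is replaced by function composition.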
Note that format′ (d1 :^: d2) can be simplified to format′ d1 · format′ d2, where
‘·’ is ordinary function composition. This is not a coincidence. In fact, the type
(String → x) → (String → y) = MapTrans String x y constitutes an arrow (see
Chapter ??).
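The directive type itself is not shown here. The following minimal sketch assumes GHC's GADTs extension and a directive type Dir x y indexed by two types. With that indexing, format′ d has type (String → x) → (String → y), which matches MapTrans String x y. The constructor names Lit, Int, String and ‘:^:’ follow the text. The two-index scheme and the continuation-passing style of format′ (written format' in ASCII) are assumptions of this sketch.

```haskell
{-# LANGUAGE GADTs #-}

infixr 5 :^:

-- Directives indexed by two types: a Dir x y turns a continuation
-- String -> x into a function String -> y, where y records the
-- extra arguments the directive consumes.
data Dir x y where
  Lit    :: String -> Dir x x
  Int    :: Dir x (Int -> x)
  String :: Dir x (String -> x)
  (:^:)  :: Dir y z -> Dir x y -> Dir x z

-- Continuation-passing worker; the String argument accumulates output.
format' :: Dir x y -> (String -> x) -> (String -> y)
format' (Lit s) k     = \out -> k (out ++ s)
format' Int k         = \out i -> k (out ++ show i)
format' String k      = \out s -> k (out ++ s)
format' (d1 :^: d2) k = format' d1 (format' d2 k)

format :: Dir String y -> y
format d = format' d id ""
```

Here format (String :^: Lit " is " :^: Int) has type String → Int → String, as predicted by linearizing the directive tree. The equation for ‘:^:’ is exactly the composition format′ d1 · format′ d2.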
We have seen in the previous sections that with clauses add considerably to the
expressiveness of Haskell. Rather surprisingly, with clauses need not be a primitive
concept: they can be simulated using polymorphic types. The resulting programs
are more verbose (this is why we have used with clauses in the first place), but
they can be readily evaluated using a Haskell 98 implementation that additionally
supports existential types.
The principal idea is to represent type equations by a type equality type: the
data declaration
data T t = · · · | C t1 . . . tn with t = u | · · ·
becomes
data T t = · · · | C (u :=: t) t1 . . . tn | · · · ,
where ‘:=:’ is a binary type constructor, the type equality type. This type has
the intriguing property that it is non-empty if and only if its argument types are
equal.¹ Even more intriguing, its definition goes back to Leibniz. According to
Leibniz, two terms are equal if one may be substituted for the other. Adapting
this principle to types, we define
newtype a :=: b = Proof {apply :: ∀f . f a → f b}
Note that the universally quantified type variable f ranges over type constructors
of kind ∗ → ∗. Thus, an element of a :=: b is a function that converts an element
of type f a into an element of f b for any type constructor f . This function can
be seen as constituting a proof of the type equality a = b. The identity function,
for instance, serves as the proof of reflexivity.
refl :: ∀a . a :=: a
refl = Proof id
¹ We ignore the fact that in Haskell every type contains the bottom element.
f (C p1 . . . pn) = e
becomes
f (C p p1 . . . pn) = apply p e.
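The encoding can be tried out in GHC with the RankNTypes and TypeOperators extensions. The sketch below defines ‘:=:’ following the Leibniz reading and then applies the translation scheme to the expression language from the beginning of the chapter. Because apply p converts an f u into an f t, a bare value first needs a wrapper. The identity newtype Id, the helper convert, and the Flip newtype used to prove symmetry are hypothetical names introduced here. The symmetry and transitivity proofs are one possible construction.

```haskell
{-# LANGUAGE RankNTypes, TypeOperators #-}

-- Leibniz equality: a proof of a :=: b converts f a into f b
-- for every type constructor f.
newtype a :=: b = Proof { apply :: forall f. f a -> f b }

-- Reflexivity: the identity function is a proof.
refl :: a :=: a
refl = Proof id

-- Hypothetical helpers for choosing a suitable f.
newtype Id a = Id { unId :: a }
newtype Flip a x = Flip { unFlip :: x :=: a }

-- Convert a plain value along a proof (f is instantiated to Id).
convert :: a :=: b -> a -> b
convert p = unId . apply p . Id

-- Symmetry: rewrite a :=: a into b :=: a, using f = Flip a.
sym :: a :=: b -> b :=: a
sym p = unFlip (apply p (Flip refl))

-- Transitivity: compose the two conversions.
trans :: a :=: b -> b :=: c -> a :=: c
trans p q = Proof (apply q . apply p)

-- The expression language without with clauses: a constructor whose
-- with clause reads t = u carries a proof of type u :=: t.
data Term t
  = Zero (Int :=: t)
  | Succ (Int :=: t) (Term Int)
  | Pred (Int :=: t) (Term Int)
  | IsZero (Bool :=: t) (Term Int)
  | If (Term Bool) (Term t) (Term t)

-- Each equation uses its proof to turn the specific result into a t.
eval :: Term t -> t
eval (Zero p)     = convert p 0
eval (Succ p e)   = convert p (eval e + 1)
eval (Pred p e)   = convert p (eval e - 1)
eval (IsZero p e) = convert p (eval e == 0)
eval (If c e1 e2) = if eval c then eval e1 else eval e2
```

Building terms now requires supplying the proofs explicitly, as in Succ refl (Zero refl). This is the extra verbosity that the with clauses avoid.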
newtype F′ a = In {out :: F a}
Turning back to the type equality type, it is interesting to note that it has all
the properties of a congruence relation. We have already seen that it is reflexive.
It is furthermore symmetric, transitive, and congruent. Here are programs that
Exercise 18 We have defined congruence proofs for the list and the pair type
constructors. Generalize the construction to an arbitrary n-ary data type that is
not necessarily a functor.
8 Chapter notes
This chapter is based on a paper by Cheney and Hinze [2], which shows how to
combine generics and dynamics in a type-safe manner. The term phantom type
was coined by Leijen and Meijer [8] to denote parameterized types that do not use
their type argument.
There is an abundance of work on generic programming; see, for instance,
[6, 5]. For a gentle introduction to the topic the interested reader is referred to [1].
Section 4 draws from a paper by Lämmel and Peyton Jones [7]. Sections 5
and 6 adapt two pearls by Danvy, Rhiger and Rose [4] and by Danvy [3],
respectively. An alternative approach to unparsing is described by Hinze [?].
Acknowledgement
I would like to thank Andres Löh for his helpful and immediate feedback on a draft
version of this chapter.
References
[1] Roland Backhouse, Patrik Jansson, Johan Jeuring, and Lambert Meertens.
Generic Programming: An Introduction. In S. Doaitse Swierstra,
Pedro R. Henriques, and Jose N. Oliveira, editors, 3rd International Summer
School on Advanced Functional Programming, Braga, Portugal, volume 1608
of Lecture Notes in Computer Science, pages 28–115. Springer-Verlag, Berlin,
1999.
[2] James Cheney and Ralf Hinze. A lightweight implementation of generics and
dynamics. In Manuel M.T. Chakravarty, editor, Proceedings of the 2002 ACM
SIGPLAN Haskell Workshop, pages 90–104. ACM Press, October 2002.
[7] Ralf Lämmel and Simon Peyton Jones. Scrap your boilerplate: a practical
approach to generic programming. Available from
http://research.microsoft.com/~simonpj/papers/hmap/, 2002.