Type Indexed Data
Type Indexed Data
Abstract
1 Introduction
More advanced examples of polytypic functions are functions for digital search-
ing [3], pattern matching [4], unification [5,6], rewriting [7], and structure edit-
This paper shows how to define a type-indexed data type, discusses several
examples of type-indexed data types, and shows how to specialize a type-
indexed data type. The specialization is illustrated with example translations
to Haskell. The approach has been implemented in Generic Haskell, a generic
programming extension of the functional language Haskell. Generic Haskell
can be obtained from http://www.generic-haskell.org/. This paper is a
revised version of [9].
where FMap stands for ‘finite map’. Such a trie for strings would typically be
used for an index on texts. The first component of the constructor Trie String
contains the value associated with Nil . The second component of Trie String
is derived from the constructor Cons :: Char → String → String. We assume
that a suitable data structure, FMapChar, and an associated look-up function
lookupChar ::∀v . Char → FMapChar v → Maybe v for characters are predefined.
We use the following naming convention: names such as FMap String where
an underscore separates the name of two types are used for instances of type-
indexed entities. The goal of the paper is to describe how to generate such types
automatically from a generic definition. Compound names (such as FMapChar)
1 The examples are given in Haskell [1]. Deviating from Haskell, universal quantifi-
cation of types is always made explicit by means of ∀·’s in the type.
2
are used when we assume that a type or function is predefined or defined by
the user.
Given the definitions of String and FMap String, we can define a look-up func-
tion for strings as follows:
Consider now the data type Bush of binary trees with characters in the leaves:
Again, we have two components, one to store values constructed by Leaf , and
one for values constructed by Fork . The corresponding look-up function is
given by
One can easily recognize that not only the look-up functions, but also the
data types for the tries are instances of an underlying generic pattern. In the
following section we will show how to define a trie and associated functions
generically for arbitrary data types. The material is taken from Hinze [3],
and it is repeated here because it serves as a nice and simple example of a
type-indexed data type.
3
Example 2: Pattern matching. The polytypic functions for the maximum
segment sum problem [10] and pattern matching [4] use labelled data types.
These labelled data types, introduced in [10], can be used to store at each node
the subtree rooted at that node, or a set of patterns (trees with variables)
matching at a subtree, etc. For example, the data type of labelled bushes is
defined by
It can be constructed from the Bush data type by extending each constructor
with an additional field to store the label. In the following section we show
how to define such a labelled data type generically, and how this data type is
used in a (specification of a) generic pattern matching program.
Using the type of locations we can efficiently navigate through a tree. For
example:
The navigation function down Bush moves the focus of attention to the left-
most subtree of the current node; right Bush moves the focus to its right
sibling.
Huet [11] defines the zipper data structure for rose trees and for the data type
Bush, and gives the generic construction in words. In Section 5 we describe
the zipper in more detail and show how to define a zipper for an arbitrary
data type.
4
Other examples. Besides these three examples, a number of other exam-
ples of type-indexed data types have appeared in the literature [13–16]. We
expect that type-indexed data types will also be useful for generic DTD trans-
formations [17]. Generally, we believe that type-indexed data types are almost
as important as type-indexed functions.
5
2 Defining type-indexed data types
This section shows how to define type-indexed data types. Section 2.1 briefly
reviews the concepts of polytypic programming necessary for defining type-
indexed data types. The subsequent sections define type-indexed data types
for the problems described in the introduction. We assume a basic familiarity
with Haskell’s type system and in particular with the concept of kinds [32].
For a more thorough treatment the reader is referred to Hinze’s work [31,30].
data 1 = ()
data a + b = Inl a | Inr b
data a × b = (a, b).
equal ht :: ?i :: t → t → Bool
equal h1i () () = True
equal hChari c1 c2 = equalChar c1 c2
equal ht1 + t2 i (Inl a1 ) (Inl a2 ) = equal ht1 i a1 a2
equal ht1 + t2 i (Inl a1 ) (Inr b2 ) = False
6
equal ht1 + t2 i (Inr b1 ) (Inl a2 ) = False
equal ht1 + t2 i (Inr b1 ) (Inr b2 ) = equal ht2 i b1 b2
equal ht1 × t2 i (a1 , b1 ) (a2 , b2 ) = equal ht1 i a1 a2 ∧ equal ht2 i b1 b2
This simple definition contains all ingredients needed to specialize equal for
arbitrary data types. Note that the definition does not mention type abstrac-
tion, type application, and fixed points. Instances of polytypic functions on
types with these constructions can be generated automatically from just the
cases given above. For example, if we used equal at the data type Bush, the
generated specialization would behave exactly as the following hand-written
code.
show ht :: ?i :: t → String
show h1i () = ""
show hChari c = showChar c
show ht1 + t2 i (Inl a) = show ht1 i a
show ht1 + t2 i (Inr b) = show ht2 i b
show ht1 × t2 i (a, b) = show ht1 i a ++" "++ show ht2 i b
show hc of ti t = "(" ++c+ +" "++ show hti t +
+ ")".
7
of natural numbers
is represented by
is viewed as
The functions equal and show are indexed by a type of kind ?. A polytypic
function may also be indexed by type constructors of kind ? → ? (and, of
course, by type constructors of other kinds, but these are not needed in the
sequel). We need slightly different base cases for generic functions operating
on types of kind ? → ?:
Id = Λa . a
Kt = Λa . t
f1 + f2 = Λa . f1 a + f2 a
f1 × f2 = Λa . f1 a × f2 a
c of f = Λa . c of f a.
Here, Λa . t denotes abstraction on the type level. We have the constant functor
K, which lifts a type of kind ? to kind ? → ?. We will need K 1 as well as
K Char (or more general, K t for all primitive types). We overload +, ×, and
c of to be the lifted versions of their previously defined counterparts. The
only new type index in this set of indices of kind ? → ? is the identity functor
Id. Hinze [30] shows that these types are the normal forms of types of kind
? → ?.
maphf :: ? → ?i :: ∀a b . (a → b) → (f a → f b)
maphIdi ma =ma
maphK 1i mc =c
maphK Chari m c =c
8
maphf1 + f2 i m (Inl f ) = Inl (maphf1 i m f )
maphf1 + f2 i m (Inr g) = Inr (maphf2 i m g)
maphf1 × f2 i m (f , g) = (maphf1 i m f , maphf2 i m g)
Using map we can, for instance, define generic versions of cata- and anamor-
phisms [33]. To this end we assume that data types are given as fixed points
of so-called pattern functors. In Haskell the fixed point combinator can be
defined as follows:
It follows that the constructor In and the ‘destructor’ out have the following
types:
In :: ∀f . f (Fix f) → Fix f
out :: ∀f . Fix f → f (Fix f)
For example, we could have defined the type of bushes by Bush = Fix BushF,
where
It is easy to convert between this data type defined as a fixed point and the
original type definition of bushes.
catahf :: ? → ?i :: ∀a . (f a → a) → (Fix f → a)
catahfi ϕ = ϕ · maphfi (catahfi ϕ) · out
anahf :: ? → ?i :: ∀a . (a → f a) → (a → Fix f)
anahfi ψ = In · maphfi (anahfi ψ) · ψ.
Note that both functions are parameterized by the pattern functor f rather
than by the fixed point Fix f. For example, the catamorphism on the functor
of bushes, BushF, would be defined by
9
Both cata and ana are so-called generic abstractions, i.e. generic functions
that are not defined by induction on base types, but in terms of other generic
functions. Generic Haskell supports generic abstractions [34].
2.2 Tries
Tries are based on the following isomorphisms, also known as the laws of
exponentials.
1 →fin v ∼
= v
(t1 + t2 ) →fin v ∼
= (t1 →fin v) × (t2 →fin v)
(t1 × t2 ) →fin v ∼
= t1 →fin (t2 →fin v)
There are more laws for exponentials, but these are the ones we need in our
definition of tries. Here, t →fin v denotes the type of finite maps from t to v. Us-
ing the isomorphisms above as defining equations, we can give a type-indexed
definition for the data type FMaphti v of finite maps from t to v, which gener-
alizes FMap String from the introduction to arbitrary data types. This is our
first example of a type-indexed data type.
FMapht :: ?i :: ? → ?
FMaph1i v = Maybe v
FMaphChari v = FMapChar v
FMapht1 + t2 i v = FMapht1 i v × FMapht2 i v
FMapht1 × t2 i v = FMapht1 i (FMapht2 i v)
10
On sums the look-up function selects the appropriate map; on products it
‘composes’ the look-up functions for the component keys. The second argu-
ment to the look-up function is an element of the type-indexed type that we
have defined before. Note how the definition of lookup relies on the fact that
the second argument is a pair in the +-case and a nested finite map in the
×-case. This generic look-up function is a generalization of the type-specific
look-up functions on strings and bushes that we have seen in the introduction.
Another generic function can be used to produce the empty trie for any data
type:
emptyht :: ?i :: ∀v . FMaphti v
emptyh1i = Nothing
emptyhChari = emptyChar
emptyht1 + t2 i = (emptyht1 i, emptyht2 i)
emptyht1 × t2 i = emptyht1 i,
where emptyChar is the empty value of type FMapChar. The empty function
serves as a simple example for a function that constructs values in a generic
way.
The pattern matching problem (for exact patterns) can be informally specified
as follows: given a pattern and a text, find all occurrences of the pattern in
the text. The pattern and the text may both be lists, or they may both be
trees, etc. This section specifies a generic pattern-matching program for data
types specified as fixed points of pattern functors. The specification is a rather
inefficient program, but it can be transformed into an efficient program [4].
The efficient program is a generalization of the Knuth, Morris, and Pratt
algorithm on lists [35] to arbitrary data types.
A pattern is a value of a type extended with variables. For example, the data
type Bush is extended with a constructor for variables as follows:
In general, we want to extend a data type given as the fixed point of a functor
Fix f with a case for variables. We can perform the extension on the functor
directly, and we can parametrize over the functor f in question:
11
data VarF f r = Var Int
| Val (f r).
With this definition, Fix (VarF f) is the extension of Fix f with variable case
that we are interested in. In particular, one can easily define isomorphisms
to confirm that Fix (VarF BushF) is equivalent to the previously defined type
Var Bush.
We start with function match that matches a pattern against a value. A pat-
tern matches a value if it is a variable, or if it has the same top-level constructor
as the value, and all children match pairwise. On Bush, for example:
For the general case we use the function zipWith to match all children of a
constructor pairwise:
where zipWith and and are the generalizations of the list-processing func-
tions defined in the Haskell prelude. On BushF, function zipWith is defined as
follows:
12
zipWith BushF :: ∀a b c . (a → b → c)
→ BushF a → BushF b → Maybe (BushF c)
zipWith BushF f (LeafF c1 ) (LeafF c2 ) =
if equalChar c1 c2 then Just (LeafF c1 ) else Nothing
zipWith BushF f (ForkF a1 b1 ) (ForkF a2 b2 ) =
Just (ForkF (f a1 a2 ) (f b1 b2 ))
zipWith BushF f = Nothing.
zipWithhf :: ? → ?i :: ∀a b c . (a → b → c)
→ f a → f b → Maybe (f c)
zipWithhIdi f a b = Just (f a b)
zipWithhK 1i f u u = Just u
zipWithhK Chari f c1 c2 = if equalChar c1 c2
then Just c1
else Nothing
zipWithhf1 + f2 i f (Inl a1 ) (Inl a2 ) = do {x ← zipWithhf1 i f a1 a2 ;
return (Inl x )}
zipWithhf1 + f2 i f (Inl a1 ) (Inr b2 ) = Nothing
zipWithhf1 + f2 i f (Inr b1 ) (Inl a2 ) = Nothing
zipWithhf1 + f2 i f (Inr b1 ) (Inr b2 ) = do {y ← zipWithhf2 i f b1 b2 ;
return (Inr y)}
zipWithhf1 × f2 i f (a1 , b1 ) (a2 , b2 ) = do {x ← zipWithhf1 i f a1 a2 ;
y ← zipWithhf2 i f b1 b2 ;
return (x , y)}.
Having defined match, we need to define suffixes that computes the suffixes of
a data structure generically. For lists, a suffix is a tail of the list. For example,
the string per is a suffix of the string paper . We will now generalize the concept
13
of suffixes in the following way: given a set of patterns, the generic pattern-
matching problem will require finding for each suffix the subset of patterns
matching (in the sense of match) the suffix. How do we compute all suffixes
of a value of a data type? On lists, the suffixes of a list can be represented
as a list of tails, computed by tails, a standard Haskell function that can be
defined as follows:
For a value of an arbitrary data type we construct a value of a new data type,
a labelled data type, that can be used to store all suffixes.
The data type Labelled labels a data type given by a pattern functor:
Labelledhf :: ? → ?i :: ? → ?
Labelledhfi m = Fix (Labelhfi m).
Here we use a generic abstraction, see Section 2.1, on the type level. The idea
is the same as generic abstractions on functions. The type-indexed data type
Label adds a label type to each constructor of a data type. In its definition, we
make use of the fact that a Haskell data type is viewed as a sum of constructor
applications, where the fields of a constructor form a product. In Label, we
traverse the sum structure, and add the label type once we reach a constructor.
There are no recursive calls in the constructor case, therefore the product of
fields is never traversed, and no ×-case is needed. We want to label the whole
data type, but Label does not work recursively. Therefore, we compute a fixed
point using Label in Labelled.
Labelhf :: ? → ?i :: ? → ? → ?
Labelhf1 + f2 i m r = Labelhf1 i m r + Labelhf2 i m r
Labelhc of fi m r = f r × m
The type-indexed function suffixes, defined below, labels a value of a data type
with the subtree rooted at each node. It uses a helper function add , which adds
a label to a value of type f t, returning a value of type Labelhfi m t. As for
the type-indexed type Label, we omit the ×-case for add : the function only
inspects the sum structure and the constructors of a data type.
add hf :: ? → ?i :: ∀m t . m → f t → Labelhfi m t
add hf1 + f2 i m (Inl x ) = Inl (add hf1 i m x )
add hf1 + f2 i m (Inr y) = Inr (add hf2 i m y)
add hc of fi m x = (x , m)
14
The function suffixes is then defined as a recursive function that adds the
subtrees rooted at each level to the tree. It adds the argument tree to the top
level, and applies suffixes to the children by means of function map. It is the
generalization of function tails to arbitrary data types.
The data type Labelled that has been introduced in this section has other
applications: for instance, it can also be used in the generic maximum segment
sum problem [10], which requires finding a subtree of a tree with maximum
sum.
We illustrate the main ideas by translating the digital search tree example to
Haskell. This translation shows in particular how type-indexed data types are
specialized in Generic Haskell: the Haskell code given here will be automat-
ically generated by the Generic Haskell compiler. The example is structured
into three sections: a translation of data types, a translation of type-indexed
data types, and a translation of type-indexed functions that operate on type-
indexed data types.
15
a data type. By applying such a transformation, concepts that are usually
built-in in the Haskell data statement, such as a data type having multiple
constructors, with a variable number of fields per constructor, are replaced by
just type abstraction, type application and some basic type constructors. This
implies, of course, that values of user-defined data types have to be translated
to generic representation types. For example, the type Nat of natural numbers
defined by
is translated to the following type (in which Nat itself still appears), together
with two conversion functions.
The conversion functions from Nat and to Nat transform the top-level struc-
ture of a natural number; they are not recursive.
where the constructor Node takes three arguments. The generic representation
type for Tree is
16
data Iso a b = Iso{from :: a → b, to :: b → a}
iso Nat :: Iso Nat Nat0
iso Nat = Iso from Nat to Nat
iso Tree :: Iso Tree Tree0
iso Tree = Iso from Tree to Tree.
The conversion functions only affect the top-level structure of a data type.
For recursive data types, the generic representation type still contains the
original data type. The isomorphisms will be used in the translation of type-
indexed data types and type-indexed functions to move between the structural
view and the original data type as needed. If the function is recursive and
operates on a recursive data type, then the conversion functions will be applied
recursively, as well.
FMaph1i v = Maybe v
FMaphChari v = FMapChar v
FMapht1 + t2 i v = FMapht1 i v × FMapht2 i v
FMapht1 × t2 i v = FMapht1 i (FMapht2 i v).
The constructor names are generated automatically. This implies that a value
of a type-indexed data type can only be constructed by means of a generic
function. Thus, a type-indexed data type can be viewed as an abstract type.
Finally, for each data type t on which we want to use a trie we generate a
suitable instance FMap t.
17
Note that we use newtype for FMap Nat because it is not possible to define
recursive types in Haskell. The types FMap Nat and FMap Nat0 can easily be
converted into each other by means of the following pair of isomorphisms:
lookuph1i () t = t
lookuphChari c t = lookupChar c t,
Note that we have to wrap the trie constructors around the second argument
of the function.
For the type indices of kind ? → ? → ? (i.e. ‘+’ and ‘×’) we generate types
that take two functions as arguments, corresponding to the instances of the
18
generic function on the arguments of ‘+’ and ‘×’, and return a function of the
combined type, see Hinze [31]. For example, the following lines
The expression lookup Either lookup Unit lookup Nat is generated directly
from the type Nat0 , which is defined as 1 + Nat: each of the type constants has
been replaced by the corresponding case or specialization of the lookup func-
tion, and type application is translated into value application. Unfortunately,
this expression does not have the type we require for lookup Nat – the type
given in the type signature – but rather the type
However, this type is isomorphic to the type we need, because Nat0 is iso-
morphic to Nat, and FMap Nat0 is isomorphic to FMapNatT . The conversion
function conv lookup Nat witnesses this isomorphism:
19
conv lookup Nat :: (∀v . Nat0 → FMap Nat0 v → Maybe v)
→ (∀v . Nat → FMap Nat v → Maybe v)
conv lookup Nat lu
= λt fmt → lu (from iso Nat t) (from iso FMap Nat fmt).
Note that the functions to iso Nat and to FMap Nat are not used on the
right-hand side of the definition of conv lookup Nat. This is because no values
of type Nat or FMap Nat are built for the result of the function. If we look at
the instance of empty for Nat, we are in a different situation. Here we have
where
We will use the explicit style introduced in Section 3.3 throughout the rest of
the paper.
20
class FMap fma a | a → fma where
lookup :: ∀v . a → fma v → Maybe v
instance FMap Maybe () where
lookup () fm = fm
data FMap Either fma fmb v = FMap Either (fma v, fmb v)
instance (FMap fma a, FMap fmb b)
⇒ FMap (FMap Either fma fmb) (a + b) where
lookup (Inl a) (FMap Either fma fmb) = lookup a fma
lookup (Inr b) (FMap Either fma fmb) = lookup b fmb
data FMap Product fma fmb v = FMap Product (fma (fmb v))
instance (FMap fma a, FMap fmb b)
⇒ FMap (FMap Product fma fmb) (a × b) where
lookup (a, b) (FMap Product fma) = (lookup a 3 lookup b) fma
Fig. 1. Implementing FMap in Haskell directly.
This section briefly introduces kinds, types, type schemes, and terms.
21
We distinguish between type terms and type schemes: the language of type
terms comprises the types that may appear as type indices; the language of
type schemes comprises the constructs that are required for the translation of
generic definitions (such as polymorphic types).
Type terms are built from type constants and type variables using type appli-
cation and type abstraction.
In order to be able to model Haskell’s data types the set of type constants
should include at least the types 1, Char, ‘+’, ‘×’, and ‘c of ’ for all known
constructors in the program. Furthermore, it should include a family of fixed
point operators indexed by kind: FixT :: (T → T) → T. In the examples, we
will often omit the kind annotation T in FixT . We may additionally add the
function space constructor ‘→’ or universal quantifiers ∀U :: (U → ?) → ? to
the set of type constants (see Section 4.5 for an example).
22
and universal abstraction λa :: U . t (here a is a type variable). We assume
that the set of value constants includes at least the polymorphic fixed point
operator
fix :: ∀a . (a → a) → a
and suitable functions for each of the other type constants (such as () for
‘1’, Inl , Inr , and case for ‘+’, and outl , outr , and (,) for ‘×’). To improve
readability we will usually omit the type argument of fix .
We omit the standard typing rules for the polymorphic lambda calculus.
Almost all inductive definitions of type-indexed functions and data types given
in Section 2 take the form of a catamorphism:
catah1i = cata1
catahChari = cataChar
cataht1 + t2 i = cata+ (cataht1 i) (cataht2 i)
cataht1 × t2 i = cata× (cataht1 i) (cataht2 i)
catahc of t1 i = catac of (cataht1 i).
These equations implicitly define the family of functions cata1 , cataChar , cata+ ,
cata× , and catac of . In the sequel, we will assume that type-indexed functions
and data types are explicitly defined as a catamorphism. For example, for
digital search trees we have
FMap1 = Λv . Maybe v
FMapChar = Λv . FMapChar v
FMap+ = ΛfMapa fMapb . Λv . fMapa v × fMapb v
FMap× = ΛfMapa fMapb . Λv . fMapa (fMapb v)
FMapc of = ΛfMapa . Λv . fMapa v.
Some inductive definitions, such as the definition of Label, also use the ar-
gument types themselves in their right-hand sides. Such functions are called
paramorphisms [19], and are characterized by:
23
parah1i = para1
parahChari = paraChar
paraht1 + t2 i = para+ t1 t2 (paraht1 i) (paraht2 i)
paraht1 × t2 i = para× t1 t2 (paraht1 i) (paraht2 i)
parahc of t1 i = parac of t1 (paraht1 i).
Section 4.3 below describes how to specialize type-indexed data types with
type indices that appear in the set of type constants: 1, Char, ‘+’, ‘×’, and
‘c of ’. However, we have also used the type indices Id· , K 1, K Char, and
lifted versions of ‘+’ and ‘×’. How are type-indexed data types with these
type indices specialized? The specialization of type-indexed data types with
higher-order type indices proceeds in much the same fashion as in the following
section. Essentially, the process only has to be lifted to higher-order type
indices. For the details of this lifting process see Hinze [37, Section 3.2].
Generalizing the above example, we have that a type-indexed data type pos-
sesses a kind-indexed kind:
Datat::T :: DataT ,
24
DataT::2 :: 2
Data? =
DataA→B = DataA → DataB .
Here, ‘2’ is the superkind: the type of kinds. Note that only the definition of
Data? , as indicated by the box, has to be given to complete the definition of
the kind-indexed kind. The definition of Data· on functional kinds is dictated
by the specialization process. Since type application is interpreted by type
application, the kind of a type with a functional kind is functional.
For example, the kind of the type-indexed data type FMapt , where t is a type
of kind ? is:
FMap? = ? → ?.
MT = Type DataT / E
appT,U [t] [u] = [t u]
const(C) = [DataC ].
The domain of the applicative structure for a kind T is the equivalence class
of the set of types of kind DataT , under an appropriate set of equations E
between type terms, that is, β- and η-equality and f (FixT f) = FixT f for
all kinds T and type constructors f of kind T → T. The application of two
equivalence classes of types (denoted by [t] and [u]) is the equivalence class
of the application of the types. The definition of the constants is obtained
from the definition as a catamorphism. It can be verified that the applicative
structure defined thus is an environment model.
25
It remains to specify the interpretation of the fixed point operators, which is
the same for all type-indexed data types:
const(FixT ) = [FixDataT ].
Again, note that only an equation for Poly? has to be given to complete the
definition of the kind-indexed type. The definition of Poly· on functional kinds
is dictated by the specialization process. The presence of type-indexed data
types slightly complicates the type of a type-indexed value. In Hinze [31]
PolyT takes n arguments of kind T. Here PolyT takes n possibly different type
arguments obtained from the type-indexed data type arguments. For example,
for the type of the look-up function we have:
where Id· is the identity function on kinds. From the definition of the generic
look-up function we obtain the following equations:
26
Just as with type-indexed data types, type-indexed values on type-indexed
data types are specialized by means of an interpretation of the simply typed
lambda calculus. The environment model used for the specialization is some-
what more involved than the one given in Section 4.3. The domain of the
environment model is now a dependent product: the type of the last compo-
nent (the equivalence class of the terms of type PolyT d1 . . . dn ) depends on
the first n components (the equivalence classes of the type schemes d1 . . . dn
of kind T). Note that the application operator applies the term component of
its first argument to both the type and the term components of the second
argument.
1 n
MT = ([d1 ] ∈ Scheme DataT / E, . . . , [dn ] ∈ Scheme DataT / E;
Term PolyT d1 ... dn / E)
appT,U ([r1 ], . . . , [rn ]; [t ]) ([s1 ], . . . , [sn ]; [u ])
= ([r1 s1 ], . . . , [rn sn ]; [t s1 . . . sn u ])
const(C) = ([Data1C ], . . . , [DatanC ]; [poly C ]).
Again, the interpretation of fixed points is the same for different type-indexed
values:
These conversion functions are easily generated, both for type-indexed values
and data types, and can be stored in pairs, as values of type Iso. The only
difficult task is to plug them in at the right positions. This problem is solved by
lifting the conversion functions to the type of the specialized generic function.
This again is a generic program [37, Section 6.1.3], which makes use of the
bimap · function displayed in Figure 2 (we omit the type arguments for function
27
BimapT::2 :: IdT → IdT → ?
Bimap? = Λt1 . Λt2 . Iso t1 t2
bimap t::T :: BimapT Idt Idt
bimap 1 = Iso id id
bimap Char = Iso id id
bimap + = λa1 a2 bimap a . λb1 b2 bimap b .
Iso (λab → case ab of
{Inl a → (Inl · from a1 a2 bimap a ) a;
Inr b → (Inr · from b1 b2 bimap b ) b })
(λab → case ab of
{Inl a → (Inl · to a1 a2 bimap a ) a;
Inr b → (Inr · to b1 b2 bimap b ) b })
bimap × = λa1 a2 bimap a . λb1 b2 bimap b .
Iso (λ(a, b) → (from a1 a2 bimap a a, from b1 b2 bimap b b))
(λ(a, b) → (to a1 a2 bimap a a, to b1 b2 bimap b b))
bimap → = λa1 a2 bimap a . λb1 b2 bimap b .
Iso (λab → from b1 b2 bimap b · ab · to a1 a2 bimap a )
(λab → to b1 b2 bimap b · ab · from a1 a2 bimap a )
bimap ∀? = λf1 f2 bimap f .
Iso (λf v . from (f1 v) (f2 v)
(bimap f v v (Iso id id )) (f v))
(λf v . to (f1 v) (f2 v)
(bimap f v v (Iso id id )) (f v))
bimap c of = λa1 a2 bimap a . bimap a
Fig. 2. Lifting isomorphisms with a generic function.
composition and identity functions).
Let isoDatat denote iso tT if Datat = Idt , and iso Data tT otherwise. The
conversion function can now be derived as
For example, the conversion function for the specialization of lookup to Nat is
given by
conv lookup Nat = to (bimap Lookup? iso Nat iso FMap Nat),
Note that the definition of bimap · must include a case for the quantifier ∀? ::
(? → ?) → ? since Lookup? is a polymorphic type. In this specific case,
28
however, polymorphic type indices can be easily handled, see Figure 2. The
further details are exactly the same as for type-indexed values [39,37], and are
omitted here.
4.6 Summary
• For each data type, the corresponding generic representation type is gener-
ated, together with a pair of isomorphisms.
• Each type-indexed type is translated into a series of newtype statements,
one for each case.
• Analogously, each case of each type-indexed function is translated into one
ordinary function definition.
• Finally, each call to a generic function is replaced by a call to the appropriate
specialization.
lookuphList Chari
can be simplified to
lookuphListi lookuphChari,
hence only the specializations of lookup to List and Char are required. If generic
functions involve type-indexed types, then specializations for those are needed
as well. The same observation holds for type-indexed types, though: special-
izations to type constants suffice.
It is thus obvious that the additional code size of the translated program is
in the order the number of generic functions times the number of data types
in the program. Careful analysis of which calls actually appear in a program
can be used to reduce the number of specializations that is generated.
29
5 An advanced example: the Zipper
This section shows how to define a so-called zipper for an arbitrary data
type. This is a more complex example demonstrating the full power of a
type-indexed data structure together with a number of type-indexed functions
working on it.
The zipper is a data structure that is used to represent a tree together with a
subtree that is the focus of attention, where that focus may move left, right,
up or down in the tree. The zipper is used in tools where a user interactively
manipulates trees, for instance, in editors for structured documents such as
proofs or programs. For the following it is important to note that the focus of
the zipper may only move to recursive components. Consider as an example
the data type Tree:
If the left subtree of a Node constructor is the current focus, moving right
means moving to the right tree, not to the a-label. This implies that recursive
positions in trees play an important rôle in the definition of a generic zipper
data structure. To obtain access to these recursive positions, we have to be
explicit about the fixed points in data type definitions. The zipper data struc-
ture is then defined by induction on the so-called pattern functor of a data
type.
The tools in which the zipper is used, allow the user to repeatedly apply
navigation or edit commands, and to update the focus accordingly. In this
section we define a type-indexed data type for locations, which consist of a
subtree (the focus) together with a context, and we define several navigation
functions on locations.
30
following picture illustrates the idea (the filled circle marks the current cursor
position).
c c c
up left
K ⇐= K1 ⇐= K2
down right
=⇒ =⇒
t1 t2 ··· tm t1 t2 ··· tm t1 t2 ··· tm
5.2 Locations
A location is a subtree, together with a context, which encodes the path from
the top of the original tree to the selected subtree. The type-indexed data type
Loc returns a type for locations given an argument pattern functor.
Lochf :: ? → ?i :: ?
Lochfi = (Fix f, Contexthfi (Fix f))
Contexthf :: ? → ?i :: ? → ?
Contexthfi r = Fix (LMaybe (Ctxhfi r))
data LMaybe f a = LNothing | LJust (f a),
where LMaybe is the lifted version of Maybe. The type Loc is defined in terms
of Context, which constructs the context parameterized by the original tree
type. The Context of a value is either empty (represented by LNothing in the
LMaybe type), or it is a path from the root down into the tree. Such a path
is constructed by means of the argument type of LMaybe: the type-indexed
data type Ctx. The type-indexed data type Ctx is defined by induction on the
pattern functor of the original data type. It can be seen as the derivative (as
in calculus) of the pattern functor f [40,41]. If the derivative of f is denoted
by f 0 , we have
const0 = Void
(f + g)0 = f 0 + g0
(f × g)0 = f 0 × g + f × g0
It follows that in the definition of Ctx we will also need access to the type
arguments themselves on the right-hand side of the definition.
Ctxhf :: ? → ?i :: ? → ? → ?
CtxhIdi rc=c
CtxhK 1i r c = Void
CtxhK Chari r c = Void
31
Ctxhf1 + f2 i r c = Ctxhf1 i r c + Ctxhf2 i r c
Ctxhf1 × f2 i r c = (Ctxhf1 i r c × f2 r) + (f1 r × Ctxhf2 i r c)
For example, for natural numbers with pattern functor K 1 + Id, and for
trees of type Bush with pattern functor BushF, which can be represented by
K Char + (Id × Id) we obtain
McBride [40,41] also defines a type-indexed zipper data type. His zipper slightly
deviates from Huet’s and our zipper: the navigation functions on McBride’s
zipper are not constant time anymore. The observation that the Context of a
data type is its derivative (as in calculus) is due to McBride.
32
The instantiation of down to the data type Bush has been given in Section 1.
The function down satisfies the following property:
where the function up goes up in a tree. So first going down the tree and
then up again is the identity function on locations in which it is possible to
go down.
Since down moves down to the leftmost recursive child of the current node,
the inverse equality downhfi · uphfi = id does not hold in general. However,
there does exist a natural number n such that
where the function right goes right in a tree. These properties do not com-
pletely specify function down. The other properties it should satisfy are that
the selected subtree of downhfi m is the leftmost tree-child of the selected
subtree of m, and the context of downhfi m is the context of m extended with
all but the leftmost tree-child of m.
To find the leftmost recursive child, we have to pattern match on the pattern
functor f, and find the first occurrence of Id. The helper function first is a
type-indexed function that possibly returns the leftmost recursive child of a
node, together with the context (a value of type Ctxhfi c t) of the selected
child. The function down then turns this context into a value of type Context
by inserting it in the right (‘non-top’) component of a sum by means of LJust,
and applying the fixed point constructor In to it.
33
Here, return is obtained from the Maybe monad, and the operator (+
+) is the
standard monadic plus, called mplus in Haskell, given by
(+
+) :: ∀a . Maybe a → Maybe a → Maybe a
Nothing ++ m = m
Just a ++ m = Just a.
The function first returns the value and the context at the leftmost Id position.
So in the product case, it first tries the left component, and only if it fails, it
tries the right component.
The definitions of functions up, right and left are not as simple as the definition
of down, since they are defined by pattern matching on the context instead of
on the tree itself. We will just define functions up and right, and leave function
left as an exercise.
Function up. The function up moves up to the parent of the current node,
if the current node is not the top node.
Remember that LNothing denotes the empty top context. The navigation
function up uses two helper functions: insert and extract. The latter returns
the context of the parent of the current node. Note that each element of type
Ctxhfi c t has at most one c component (by an easy inductive argument), which
marks the context of the parent of the current node. The generic function
extract extracts this context.
34
Function insert takes a context and a tree, and inserts the tree in the current
focus of the context, effectively turning a context into a tree.
Note that the extraction and insertion is happening in the identity case Id;
the other cases only pass on the results.
do {(t, c 0 ) ← firsthfi ft c;
c 00 ← extracthfi c 0 ;
0
ft ← inserthfi c 0 t;
return (c c 00 ∧ ft ft 0 )}
Function right. The function right moves the focus to the next (right) sib-
ling in a tree, if it exists. The context is moved accordingly. The instance of
right on the data type Bush has been given in Section 1. The function right
satisfies the following property:
that is, first going right in the tree and then left again is the identity function
on locations in which it is possible to go to the right. Of course, the dual
equality holds on locations in which it is possible to go to the left. Furthermore,
the selected subtree of righthfi m is the sibling to the right of the selected
subtree of m, and the context of righthfi m is the context of m in which the
context is replaced by the selected subtree of m, and the first subtree to the
right of the context of m is replaced by the context of m.
35
righthf :: ? → ?i :: Lochfi → Lochfi
righthfi (t, c) = case out c of
{LNothing → (t, c);
LJust c 0 → case nexthfi t c 0 of
{Just (t 0 , c 00 ) → (t 0 , In (LJust c 00 ));
Nothing → (t, c)}}
The helper function next is a type-indexed function that returns the first
location that has the recursive value to the right of the selected value as its
focus. Just as there exists a function left such that lefthfi · righthfi = id (on
locations in which it is possible to go to the right), there exists a function
previous, such that
do {(t 0 , c 0 ) ← nexthfi t c;
(t 00 , c 00 ) ← previoushfi t 0 c 0 ;
return (c c 00 ∧ t t 00 )}
The first three lines in this definition show that it is impossible to go to the
right in an identity or constant context. If the context argument is a value of
a sum, we select the next element in the appropriate component of the sum.
The product case is the most interesting one. If the context is in the right
component of a pair, next returns the next value of that context, properly
combined with the left component of the tuple. On the other hand, if the
context is in the left component of a pair, the next value may be either in that
left component (the context), or it may be in the right component (the value).
36
If the next value is in the left component, it is returned by the first line in
the definition of the product case. If it is not, next extracts the context c (the
context of the parent) from the left context cx , it inserts the given value in the
context cx giving a ‘tree’ value x , and selects the first component in the right
component of the pair, using the extracted context c for the new context. The
new context that is thus obtained is combined with x into a context for the
selected tree.
6 Conclusion
We have shown how to define type-indexed data types, and we have given sev-
eral examples of type-indexed data types: digital search trees, generic pattern-
matching using a labelled data type, and the zipper. Furthermore, we have
shown how to specialize type-indexed data types and type-indexed functions
that take values of type-indexed data types as arguments. The treatment gen-
eralizes the specialization of type-indexed functions given in Hinze [31], and
used in the implementation of Generic Haskell, a generic programming exten-
sion of the functional language Haskell, see http://www.generic-haskell.
org/. A technical overview of the compiler can be found in De Wit’s thesis [42].
The current release of Generic Haskell contains an experimental implementa-
tion of type-indexed data types. The syntax for type-indexed types used in the
current Generic Haskell compiler differs from the syntax used in this paper in
a few places. There is a tutorial by Hinze and Jeuring [43] that explains the
syntax used in the implementation.
The approach taken in this paper is powerful enough to be used for sets of
mutually recursive type-indexed data types. Hagg [8] uses mutually recursive
type-indexed data types to specify data types with holes, for use in a generic
editor.
37
References
[8] P. Hagg, A framework for developing generic XML Tools, Master’s thesis,
Department of Information and Computing Sciences, Utrecht University (2002).
[9] R. Hinze, J. Jeuring, A. Löh, Type-indexed data types, in: Proceedings of the
6th Mathematics of Program Construction Conference, MPC’02, Vol. 2386 of
LNCS, 2002, pp. 148–174.
[11] G. Huet, The zipper, Journal of Functional Programming 7 (5) (1997) 549–554.
[12] G. Huet, Linear contexts and the sharing functor: Techniques for symbolic
computation, in: F. Kamareddine (Ed.), Thirty Five Years of Automating
Mathematics, Kluwer, 2003.
38
[14] R. Harper, G. Morrisett, Compiling polymorphism using intensional type
analysis, in: 22nd Symposium on Principles of Programming Languages, POPL
’95, 1995, pp. 130–141.
[20] M. Fokkinga, Law and order in algorithmics, Ph.D. thesis, University of Twente,
Dept INF, Enschede, The Netherlands (1992).
[24] Z. Yang, Encoding types in ML-like languages, in: Proceedings ICFP 1998:
International Conference on Functional Programming, ACM Press, 1998, pp.
289–300.
URL citeseer.nj.nec.com/zhe99encoding.html
[26] K. Crary, S. Weirich, Flexible type analysis, in: Proceedings ICFP 1999:
International Conference on Functional Programming, ACM Press, 1999, pp.
233–248.
URL citeseer.nj.nec.com/crary99flexible.html
39
[27] V. Trifonov, B. Saha, Z. Shao, Fully reflexive intensional type analysis, in:
Proceedings ICFP 2000: International Conference on Functional Programming,
ACM Press, 2000, pp. 82–93.
URL citeseer.nj.nec.com/saha00fully.html
[34] D. Clarke, A. Löh, Generic Haskell, specifically, in: J. Gibbons, J. Jeuring (Eds.),
Generic Programming, Vol. 243 of IFIP, Kluwer Academic Publishers, 2003, pp.
21–48.
[35] D. Knuth, J. Morris, V. Pratt, Fast pattern matching in strings, SIAM Journal
on Computing 6 (1978) 323–350.
[36] M. P. Jones, Type classes with functional dependencies, in: G. Smolka (Ed.),
Proceedings of the 9th European Symposium on Programming, ESOP 2000,
Berlin, Germany, Vol. 1782 of LNCS, Springer-Verlag, 2000, pp. 230–244.
[38] J. C. Mitchell, Foundations for Programming Languages, The MIT Press, 1996.
[39] R. Hinze, S. Peyton Jones, Derivable type classes, in: G. Hutton (Ed.),
Proceedings of the 2000 ACM SIGPLAN Haskell Workshop, Vol. 41.1 of
Electronic Notes in Theoretical Computer Science, Elsevier Science, 2001, the
preliminary proceedings appeared as a University of Nottingham technical
report.
40
[40] C. McBride, The derivative of a regular type is its type of one-hole contexts,
unpublished manuscript (2001).
[44] P. Jansson, The WWW home page for polytypic programming, Available from
http://www.cs.chalmers.se/~patrikj/poly/ (2001).
41